Source Code, HTML validation and the Search Engines

Friday, October 26, 2007
Posted by Alex Hlinski @ 3:33 pm

Coming from a software engineering background, I am sometimes disappointed to see that many web sites are poorly coded, containing incorrect and bloated HTML. This article aims to explain why creating good HTML source code, though not a direct ranking factor, can help the search engines in their analysis of your pages.

All programming languages have semantic and syntactic rules that must be followed if the program is to run correctly, and HTML, although a markup language rather than a programming language, is no exception. Unlike ‘strict’ programming language compilers, web browsers can be very ‘loose’ in their interpretation of HTML, leading many web developers to take short cuts and ignore HTML code compliance. The W3C publishes the HTML specifications and provides the W3C Validator to aid developers in creating compliant code.

HTML validation can be helpful to SEOs as it can highlight code errors, missing alt attributes, incorrect elements and tags, and character encoding issues. A web page that validates successfully has a better chance of displaying consistently across compliant web browsers. For the search engines, valid markup saves processing time and resources when the markup is separated from the content, and reduces the chance that content will be stripped out along with the markup.
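As an illustration of one of the checks mentioned above, here is a minimal Python sketch that lists img tags with no alt attribute. It is not a substitute for the W3C Validator and checks nothing else; the names MissingAltFinder and find_missing_alt are made up for this example, and it assumes the page source is already available as a string.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect every <img> tag that has no alt attribute (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.missing = []  # (line, column, src) for each offending tag

    def handle_starttag(self, tag, attrs):
        # Self-closing <img /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "img":
            return
        attr_map = dict(attrs)
        if "alt" not in attr_map:
            line, column = self.getpos()
            self.missing.append((line, column, attr_map.get("src")))


def find_missing_alt(html_text):
    """Return a list of (line, column, src) for images lacking alt."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing
```

An empty alt="" counts as present here, since an empty value is valid for purely decorative images.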

Code bloat is a double whammy: it increases page load time for users and increases the time a search engine spider needs to crawl a whole site. Spiders only have a certain amount of time on a site during a crawl, so anything that can be done to reduce the amount of code per page will increase the number of pages that can be indexed during this time.
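The relationship can be sketched with some simple arithmetic. This is a deliberately simplified model that assumes a fixed time budget and pages fetched one after another; the function name and the figures in the usage note are invented for illustration and are not measurements from any search engine.

```python
def pages_crawled(budget_seconds, seconds_per_page):
    """Pages a spider could fetch in a fixed time budget, one at a time.

    Simplified model: ignores parallel fetching, politeness delays and
    any other factor a real crawler takes into account.
    """
    if seconds_per_page <= 0:
        raise ValueError("seconds_per_page must be positive")
    return int(budget_seconds // seconds_per_page)
```

With a hypothetical budget of 600 seconds, pages_crawled(600, 2.0) gives 300 pages, while cutting the download time to half a second, pages_crawled(600, 0.5), gives 1200.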

There are a few simple steps that can be taken to make sure your code and resulting page size are as small as possible.

1. Move your JavaScript code out of the main HTML source into an external include file.
2. Externalise CSS in the same way (where appropriate).
3. Use CSS to format content instead of HTML font tags, table elements and other markup.
4. To reduce page load time on dynamic websites, reduce the number of database queries per page and the number of records returned by each query.
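
To judge how much steps 1 and 2 might save on a given page, a rough Python sketch like the following can add up the bytes sitting in inline script and style blocks. The names InlineCodeMeter and inline_code_report are hypothetical, and the figures it produces are only an estimate of what could move to external files; it does not measure inline style attributes or font tags.

```python
from html.parser import HTMLParser


class InlineCodeMeter(HTMLParser):
    """Add up the bytes of inline <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.inline_bytes = {"script": 0, "style": 0}
        self._current = None  # tag whose text content we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.inline_bytes:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Externally referenced scripts have no text content, so they add 0.
        if self._current is not None:
            self.inline_bytes[self._current] += len(data.encode("utf-8"))


def inline_code_report(html_text):
    """Return page size and the share taken up by inline script and style."""
    meter = InlineCodeMeter()
    meter.feed(html_text)
    meter.close()
    total = len(html_text.encode("utf-8"))
    inline = sum(meter.inline_bytes.values())
    return {
        "total_bytes": total,
        "script_bytes": meter.inline_bytes["script"],
        "style_bytes": meter.inline_bytes["style"],
        "inline_share": inline / total if total else 0.0,
    }
```

A high inline_share suggests the page is a good candidate for external files, which the browser can also cache between page views.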

The direct effect of these methods on page size can be seen in the following example from Google Webmaster Tools. The crawl rate data is for a website of around 3500 pages, each of which originally contained many hundreds of lines of JavaScript code and a large number of CSS declarations. The pages also had multiple sections and incorrectly defined elements. After all of the unnecessary code was removed and the HTML syntax errors were corrected, the pages validated and were around 20% of their original size in KB.

As can be seen in the graph below, the time spent downloading a page has fallen (red), and consequently Googlebot has been able to crawl more pages per day (blue).

[Figure: Improved Google Crawl Data]
