Your free guide to search engine optimization.
Web pages are written in computer languages (most notably HTML) that Web browsers interpret for display. How the page appears in a browser window depends on its source text as well as which browser you are using.
Search engine spiders are designed to crawl and index Web pages quickly, so they work best with simple HTML code and plain HTML links. Web developers write HTML as simple text with basic formatting instructions that describe how the browser should display the content. These instructions also specify how to link to other documents or content, which helps authors organize information within or across documents.
Today, HTML pages often include embedded scripts written in other languages, such as JavaScript. Spiders cannot crawl or index these embedded scripts. As a result, the HTML source behind a Web page is often a complex morass of tags and scripting that makes it difficult for a search engine spider to reach the relevant content. In some cases it can prevent the page or site from being indexed.
The major search engines, particularly MSN, recognize this limitation and note in their "Help" sections that Web pages should have clean HTML to ease a spider's crawl through a website.
For more specific information, please visit the following pages from the top three search engines:
- http://www.google.com/intl/en/webmasters/guidelines.html
- http://help.yahoo.com/help/us/ysearch/ranking/ranking-02.html
- http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_GuidelinesforOptimizingSite.htm
Beyond indexing difficulties, clumsy HTML can hinder a search engine from understanding a Web page's theme or subject matter. Spiders could deem Web pages that do not offer clearly stated themes irrelevant to a given user's search query. One way search engines determine a page's theme is through keyword phrase density. If the HTML source of a Web page has a high ratio of keyword phrases compared to the other words or characters in the code, the page has a better chance of being understood as relevant to that keyword phrase.
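The ratio described above can be illustrated with a short Python sketch. Search engines do not publish their formulas, so the definition used here (the share of word tokens in the raw source that belong to occurrences of the phrase) is only an assumption, and the function name keyword_phrase_ratio is hypothetical.

```python
import re


def keyword_phrase_ratio(source: str, phrase: str) -> float:
    """Share of word tokens in `source` that belong to occurrences of `phrase`.

    Assumption: every alphanumeric run in the raw source counts as a token,
    so tag names, attribute values, and inline script text all dilute the ratio.
    """
    tokens = re.findall(r"[a-z0-9]+", source.lower())
    phrase_tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    if not tokens or not phrase_tokens:
        return 0.0

    n = len(phrase_tokens)
    hits = 0
    i = 0
    # Count non-overlapping occurrences of the phrase, scanning left to right.
    while i <= len(tokens) - n:
        if tokens[i:i + n] == phrase_tokens:
            hits += 1
            i += n
        else:
            i += 1
    return hits * n / len(tokens)
```

Under this definition, the same visible text scores lower once extra tags or inline scripts surround it, because those add tokens without adding occurrences of the phrase.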
Search engine spiders index Web pages' source HTML from top to bottom. Therefore, the important parts of the HTML (title and meta tags, body content, and links) should appear as close to the top of the page as possible. Reorganizing the HTML and making it keyword phrase-dense results in a page optimized to rank on its targeted keyword phrase.
JavaScript, which spiders cannot index, should be referenced externally. It is possible to take the JavaScript itself off the page and place it in another file on the server. The Web page will display the same way if you replace the JavaScript with a simple one-line call to the external file. A page whose JavaScript lives in an external file will have a better keyword phrase ratio, because the script text no longer counts against the page's own source.
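As a rough illustration of moving script off the page, the Python sketch below pulls inline script blocks out of an HTML string and leaves a single one-line call in their place. The function name externalize_scripts and the default file name scripts.js are hypothetical. The sketch also assumes the scripts do not depend on where they sit in the page, and it uses a regular expression rather than a real HTML parser, so treat it as a starting point only.

```python
import re
from typing import Tuple

# Matches <script> blocks that have no src attribute, i.e. inline scripts.
INLINE_SCRIPT = re.compile(
    r"<script\b(?![^>]*\bsrc\s*=)[^>]*>(.*?)</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def externalize_scripts(html: str, js_path: str = "scripts.js") -> Tuple[str, str]:
    """Return (new_html, extracted_js).

    The first inline script is replaced by a one-line call to `js_path`;
    any later inline scripts are removed. The extracted code is returned
    in source order so it can be saved to the external file.
    """
    bodies = []

    def replace(match):
        bodies.append(match.group(1).strip())
        if len(bodies) == 1:
            return '<script src="{}"></script>'.format(js_path)
        return ""

    new_html = INLINE_SCRIPT.sub(replace, html)
    return new_html, "\n".join(bodies)
```

The second return value is what would be written to the external file on the server; the first is the slimmer page source that the spider sees.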
Cascading Style Sheets (CSS) embedded in HTML can complicate a search engine's indexing of a Web page as well. Embedded CSS causes the same problems as embedded JavaScript: it pushes the important content further down the page and dilutes the keyword density. It is best practice to move the CSS code off the page and into an external file.
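To see how embedded styles push content down the page, the following Python sketch measures what fraction of the source comes before the opening body tag. The function name content_offset and the two sample pages are made up for illustration, and the metric is only a rough proxy for "how far from the top" the content sits.

```python
import re


def content_offset(html: str) -> float:
    """Fraction of the source that precedes the opening <body> tag.

    Returns 0.0 when the source is empty or has no <body> tag.
    """
    match = re.search(r"<body\b", html, re.IGNORECASE)
    if match is None:
        return 0.0
    return match.start() / len(html)


# Hypothetical page with a block of CSS embedded in the head.
inline_page = (
    "<html><head><style>"
    + "p { margin: 0; } " * 20
    + "</style></head><body><p>blue widgets</p></body></html>"
)

# The same page with the CSS moved to an external file (assumed name).
linked_page = (
    '<html><head><link rel="stylesheet" href="styles.css"></head>'
    "<body><p>blue widgets</p></body></html>"
)

print(round(content_offset(inline_page), 2))
print(round(content_offset(linked_page), 2))
```

The page with the embedded style block reports a larger offset than the page that links to an external stylesheet, which matches the point made above.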