Optimizing Dynamic Sites

November 1st, 2004


Search engines tend to have problems fully indexing dynamic websites (in other words, sites that are hooked up to a database of content).

Search engines have the biggest trouble with sites whose URL structures are overly complex and carry numerous variables (signaled by multiple ampersands and equals signs, as well as session IDs, user IDs, and referral tracking codes).

Such URLs are unfriendly to users, who might want to copy a URL into an email to a friend or link from their own website to a particular page deep within your site. They are also unfriendly to search engine spiders, because they are a tip-off that the page is dynamically generated and could lead the spider into what is called a spider trap.

A spider trap exists when a search engine spider keeps following links to URLs that appear to differ from URLs it has already explored, but that actually lead to the same content.

Imagine, for example, a search engine spider that comes to your site and is assigned a session ID, which is then embedded in the URLs of all the links on the page. The next time the spider comes to this page, it gets a brand-new session ID because your web server can’t detect that it is the same spider that visited a few minutes ago. The result is numerous copies of the exact same page getting indexed, which is bad for the search engine and bad for the search engine’s users because of all the duplicated content.
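To make the duplication concrete, here is a minimal Python sketch of two spider visits to the same page, each with a fresh session ID, and of how stripping that parameter collapses them back into one URL. The parameter names in SESSION_PARAMS and the example.com URLs are assumptions for illustration; your server may name its session parameter differently.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; adjust to whatever your server uses.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}

def strip_session_id(url: str) -> str:
    """Return the URL with any session-ID query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

# Two spider visits to the same page, each assigned a new session ID.
visit_1 = "http://www.example.com/product.asp?id=345&sessionid=a1b2c3"
visit_2 = "http://www.example.com/product.asp?id=345&sessionid=d4e5f6"

print(visit_1 == visit_2)  # False: the spider sees two different URLs
print(strip_session_id(visit_1) == strip_session_id(visit_2))  # True: same page
print(strip_session_id(visit_1))  # http://www.example.com/product.asp?id=345
```

A search engine has no way of knowing which parameter is the session ID, so the real fix is not to put it in the URL at all.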

The worst spider traps present the spider with an infinite variety of URLs for the same limited set of pages. Each search engine has its own tolerance for how many variables in a URL are acceptable. The goal, however, is to eliminate all signs of the dynamic nature of your pages from the URL: remove stop characters such as question marks, ampersands, and equals signs, along with cgi-bin, user IDs, and session IDs, to make the pages far more palatable to the spiders.
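As a rough illustration of what “eliminating all signs of the dynamic nature” can look like, the sketch below turns a dynamic URL into a static-looking path. The /product/<cat>/<id>.html scheme and the cat and id parameter names are hypothetical; only the parameters listed in keep survive, so session IDs, tracking codes, and the cgi-bin directory are dropped.

```python
from urllib.parse import urlsplit, parse_qs

def friendly_url(dynamic_url: str, keep: tuple = ("cat", "id")) -> str:
    """Turn a dynamic URL into a static-looking path (hypothetical scheme).

    Only the parameters named in `keep` survive, in that order; everything
    else in the query string, and the directory the script lives in, is dropped.
    """
    parts = urlsplit(dynamic_url)
    params = parse_qs(parts.query)
    # "/cgi-bin/product.asp" -> "product"
    script = parts.path.rsplit("/", 1)[-1].split(".")[0]
    segments = [script] + [params[name][0] for name in keep if name in params]
    return "/" + "/".join(segments) + ".html"

print(friendly_url("/cgi-bin/product.asp?cat=12&id=345&sessionid=a1b2c3&ref=email"))
# /product/12/345.html
```

Whatever scheme you choose, the server must also be able to map the friendly path back to the real script, which is what URL rewriting handles.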

A clean, simple URL does more than eliminate potential problems with getting the page indexed. As a bonus, you’re also more likely to garner “deep links” from other sites (i.e. links directly to a page deep within your site), because the URL looks user-friendly, stable, and easy to copy and paste (into a web browser, email message, or web page editor).

The best approach is to replace all dynamic-looking links with search-engine-friendly ones. Don’t be tempted to take a shortcut by creating a site map that links to all the friendly URLs while leaving the remaining links across your site as they are. The URLs you haven’t fixed will not enhance the PageRank score of the pages with the friendly URLs. To maximize your PageRank score, keep the number of URL variations for each page as small as possible. Variations in the URLs lead to PageRank dilution because not all possible votes go to the same page: they are spread out, with some votes going to one version of the page’s URL and others going to other versions.
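A deliberately simplified sketch of the dilution effect follows. It treats every inbound link as one equal vote, which is not how PageRank is actually computed, and the link list is invented for illustration. The point is only that five links spread across five URL variants give no single URL more than one vote.

```python
from collections import Counter

# Hypothetical inbound links from other sites, all meant for the same product page.
inbound_links = [
    "/product.asp?id=345",
    "/product.asp?id=345&ref=email",
    "/product.asp?id=345&sessionid=a1b2c3",
    "/product.asp?id=345&sessionid=d4e5f6",
    "/products/345.html",
]

# Each URL variant is counted as a separate page, so the five votes are split.
split_votes = Counter(inbound_links)

# If every site had linked to the one friendly URL, all five votes would land together.
consolidated = Counter("/products/345.html" for _ in inbound_links)

print(len(split_votes), max(split_votes.values()))  # 5 1
print(consolidated["/products/345.html"])  # 5
```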

Choose the right solution
Assuming you have a dynamic site that is not yet search engine friendly, but you would like to make it so, you have two options.

One is to fix the URLs on your own server; the other is to use a third-party hosted proxy serving solution.

The first option is preferable if you have the IT resources to implement it on your server and your server supports the technology required for URL rewriting (for example, mod_rewrite for Apache or ISAPI_rewrite for Microsoft’s IIS Server).
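Conceptually, a rewrite module matches the friendly path that a visitor or spider requests against a pattern, then hands the request to the real script with the variables restored, without the visitor ever seeing the dynamic URL. The Python sketch below models only that mapping; it is not mod_rewrite or ISAPI_rewrite syntax, and the /product/<cat>/<id>.html pattern and product.php script are hypothetical.

```python
import re

# Hypothetical rule: /product/<cat>/<id>.html -> /product.php?cat=<cat>&id=<id>
REWRITE_RULES = [
    (re.compile(r"^/product/(\d+)/(\d+)\.html$"), r"/product.php?cat=\1&id=\2"),
]

def internal_rewrite(request_path: str) -> str:
    """Map a friendly request path to the script URL the server actually runs."""
    for pattern, target in REWRITE_RULES:
        match = pattern.match(request_path)
        if match:
            # Substitute the captured groups into the target template.
            return match.expand(target)
    return request_path  # no rule matched; serve the path unchanged

print(internal_rewrite("/product/12/345.html"))  # /product.php?cat=12&id=345
```

Because the rewrite happens inside the server, the existing script keeps reading its query-string variables as before, which is why this option usually needs fewer code changes.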

If such rewriting modules or plug-ins are not available, you could instead recode your scripts to look for variables embedded in the directory names or file names rather than in the “query string.” However, this tends to be quite a bit more complicated to implement.
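If you recode your scripts instead, each script has to pull its variables out of the path itself rather than out of the query string. A minimal sketch, assuming a hypothetical /product/12/345.html layout in which the first segment names the script and the following segments are the category and item IDs:

```python
def params_from_path(path: str, names: tuple = ("cat", "id")) -> dict:
    """Read variables from directory and file names, e.g. /product/12/345.html."""
    segments = [s for s in path.strip("/").split("/") if s]
    if segments and "." in segments[-1]:
        segments[-1] = segments[-1].rsplit(".", 1)[0]  # drop the .html extension
    values = segments[1:]  # segments[0] is the script name, e.g. "product"
    return dict(zip(names, values))

print(params_from_path("/product/12/345.html"))  # {'cat': '12', 'id': '345'}
```

Every script that reads variables needs a change like this, and every link the site generates has to be rewritten to match, which is part of why this route is more work than a rewriting module.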

The latter option, a third-party hosted proxy serving solution such as our gravityStream solution, is more appropriate when you have very limited IT resources for implementing rewritten URLs across your site, or if you are caught in the middle of a code freeze (such as during the holiday season) and need to increase indexation without making significant changes to your website or web server.
