Spider trap for pages created on the fly

September 26th, 2006


Q: Please explain again the “spider trap” as it pertains to pages created on the fly. How big of a disadvantage does this present? If redesigning the site isn’t an option, what can be done to lessen this disadvantage?

A: A search engine spider can get caught in a “spider trap” when it keeps encountering links to what is essentially the same content under endlessly varying URLs — for example, URLs that differ only by a nonessential variable, flag, or session ID in the query string. Caught in such a trap, the spider would download the same pages over and over, overloading the site’s web server and cluttering up Google’s index with a slew of duplicates. To sidestep these problems, Googlebot often chooses to skip over various dynamic pages altogether. That can have very deleterious results, such as the majority of a dynamic site getting passed over by Googlebot. If revamping the site isn’t an option, you might consider an outsourced “dynamic feed” service such as GravityStream.
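To make the duplicate-URL problem concrete, here is a minimal sketch of how a crawler might canonicalize URLs so that pages differing only by a session ID or other nonessential parameter collapse to a single key. The parameter names in `SESSION_KEYS` are illustrative assumptions, not an exhaustive or authoritative list, and this is not how Googlebot actually works internally — just a way to picture the idea.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Query-string keys that commonly carry session IDs or tracking flags.
# These names are assumptions for illustration only.
SESSION_KEYS = {"sessionid", "sid", "phpsessid", "jsessionid"}

def canonicalize(url: str) -> str:
    """Strip nonessential parameters so duplicate URLs map to one key."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in SESSION_KEYS]
    kept.sort()  # make comparison independent of parameter order
    return urlunparse(parts._replace(query=urlencode(kept)))

# Two URLs that differ only by session ID reduce to the same canonical
# form, so a crawler can tell it has already fetched this page.
a = canonicalize("http://example.com/product?id=42&sessionid=abc123")
b = canonicalize("http://example.com/product?sessionid=xyz789&id=42")
print(a == b)  # → True
```

A site that embeds the session ID in the URL path rather than the query string defeats this simple approach, which is one reason such sites end up needing a redesign or a workaround service.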