Del.icio.us Find – Metamend SEO Notes: Weird Robots.txt Files

Monday, February 18, 2008
Posted by Jim Hedger @ 3:00 pm

Metamend SEO specialist Colin Cochrane posted an interesting find on his blog Saturday. While collecting materials for work on one of his personal sites, Colin attempted to retrieve material he had previously bookmarked at del.icio.us. Using a Firefox add-on that had its User-Agent set to Googlebot, Colin searched for the specific file he wanted but was instead greeted with a 404 error page.

After running through the list of mistakes he might have made and eliminating each of them, he remembered he had set the User-Agent to Googlebot while researching whether another site was cloaking. In other words, the search he was conducting on del.icio.us looked as if it was being performed by Googlebot.
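The check Colin ran by hand can be sketched with Python's standard library. The helper below is hypothetical (not from Colin's post): it fetches a URL under a chosen User-Agent and returns the HTTP status, so the response seen by a normal browser can be compared with the one served to a crawler identity.

```python
from urllib import error, request

def status_for(url: str, user_agent: str) -> int:
    """Fetch `url` with the given User-Agent and return the HTTP status.

    A minimal sketch of a cloaking check: call it twice with different
    User-Agent strings and compare the results.
    """
    req = request.Request(url, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code  # e.g. a 404 served only to crawler User-Agents

# Usage (any test URL will do):
# status_for(url, "Mozilla/5.0")
# status_for(url, "Googlebot/2.1 (+http://www.google.com/bot.html)")
# Differing statuses for the same URL suggest User-Agent-based cloaking.
```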

After resetting the User-Agent to its default and resubmitting his search query, del.icio.us delivered the results Colin originally expected. Puzzled by the experience, Colin checked del.icio.us’ robots.txt file and found it was disallowing the bots of all the major search engines, including that of the site’s owner, Yahoo.

“Puzzled by this, I took a look at del.icio.us’ robots.txt and found that it was disallowing Googlebot, Slurp, Teoma, and msnbot for the following:

Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss

Seeing that the robots.txt was blocking these search engine spiders, I tried accessing del.icio.us with my User-Agent switcher set to each of the disallowed User-Agents and received the same 404 response for each one.”
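The quoted rules can be verified with Python's standard-library robots.txt parser. Note that the User-agent grouping below is an assumption reconstructed from the four crawlers Colin names; the quote itself shows only the Disallow lines.

```python
from urllib import robotparser

# Reconstruction of the quoted rules. The User-agent grouping is an
# assumption: the quote lists only the Disallow lines, with the four
# blocked crawlers named separately in the post.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: Teoma
User-agent: msnbot
Disallow: /inbox
Disallow: /subscriptions
Disallow: /network
Disallow: /search
Disallow: /post
Disallow: /login
Disallow: /rss
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "/inbox"))    # disallowed for Googlebot
print(parser.can_fetch("Mozilla/5.0", "/inbox"))  # no rule for browser agents
```

A robots.txt rule only asks compliant crawlers to stay away; serving 404s based on the User-Agent header, as Colin observed, is a separate server-side measure.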

Colin surmises that Yahoo (which recently announced the integration of del.icio.us results into its SERPs) is trying to limit the value of del.icio.us posts to other search engines; other commentators in the search marketing sector, however, disagree.

In a comment at Sphinn, where Colin’s article is receiving a lot of attention, Antigua-based SEO “Sebastian” notes that recently cached del.icio.us pages could still be found in Google’s index, though Colin did mention he felt the change had only occurred within the previous two or three days. Sebastian also added a comment to Colin’s blog (where the piece originally appeared) stating, “If in a week or so we can’t find crawler fetches after Feb/13 that’s worth further investigation.”

Another comment at Sphinn from long-serving SEO Dan Thies suggests del.icio.us has been trying to deal with bad bots for a while. Dan points to an August 30, 2006 post at SEOSpeedwagon by Eric Dafforn which clearly documents earlier changes to del.icio.us’ robots.txt file. At the time, Eric supposed del.icio.us was attempting to ward off link-spam posts by outwardly lowering the perceived value of links from the bookmarking network.

As Colin and Sebastian agree, it will be at least a week before SEOs know for sure whether del.icio.us was purposefully blocking search spiders. If we do not see del.icio.us documents in Google’s index that have been crawled since February 13, we’ll have a better idea about the purpose or intent of the mysterious robots.txt file.
