Robots/Search Engines

Oldiesmann:
I was browsing the server error logs today and noticed errors coming from an IP address owned by the search engine Alexa. The errors said that ".../forum/robots.txt" does not exist. I know what they're looking for, but I can't figure out why I would get an error message just because the file doesn't exist. I am more than willing to let search engines visit my board (the more traffic, the better), but I want to control what they can view. I only want them to be able to get as far as viewing the posts, and nothing further (i.e. board index, categories, etc.). How can I control how far they go? I have been looking online for info on the meta robots tags, but that hasn't gotten me very far.

Spaceman-Spiff:

--- Quote from: Oldiesmann on August 16, 2003, 03:21:13 PM ---The errors were saying that ".../forum/robots.txt" does not exist. I know what they're looking for, but can't figure out why I would get an error message if it didn't exist.

--- End quote ---
Well, your server _should_ log every 404 error.


--- Quote ---How can I control how far they go? I have been looking online for info on the meta robots tags, but that hasn't gotten me very far.
--- End quote ---
I made one to stop the bots from crawling all the download links:
--- Code: ---User-agent: *
Disallow: /download.php
--- End code ---

tutorials: http://www.google.com/search?q=robots.txt
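
A rule like the one above can be sanity-checked before uploading it, for example with Python's standard `urllib.robotparser`. This is only a minimal sketch: the host `example.com` and the bot name `SomeBot` are placeholders, not values from this thread, and real crawlers may interpret rules slightly differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above.
RULES = """\
User-agent: *
Disallow: /download.php
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeBot are hypothetical placeholders.
    for url in ("http://example.com/download.php?id=1",
                "http://example.com/index.php"):
        print(url, is_allowed(RULES, "SomeBot", url))
```

Rules are matched as path prefixes, so the download link with a query string is covered by the same `Disallow` line.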

Oldiesmann:
Well, I figured out how to keep the really aggressive bots from Fast Search, Inc. (alltheweb.com) off my board. I browsed their extensive FAQ section, and found that putting


--- Code: ---User-agent: fast
Disallow: /
--- End code ---

in a robots.txt file in the forum directory will keep them off the board. They are the most aggressive bots I've seen, so I'm glad I got them off my board now.
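
To see which crawlers a per-agent block like this would affect, a quick check with Python's standard `urllib.robotparser` is one option. This is a rough sketch only: the user-agent strings `FAST-WebCrawler/3.8` and `Googlebot/2.1` and the host `example.com` are illustrative assumptions, and a real crawler may match its name against the `User-agent` line differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post above.
RULES = """\
User-agent: fast
Disallow: /
"""


def blocked_agents(rules, agents, url):
    """Return the agents from `agents` that may NOT fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Assumed user-agent strings and host, for illustration only.
    agents = ["FAST-WebCrawler/3.8", "Googlebot/2.1"]
    print(blocked_agents(RULES, agents, "http://example.com/index.php"))
```

Agents with no matching `User-agent` block and no `*` block are left unrestricted, which is why only the crawler whose name contains "fast" ends up blocked here.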

sensovision:

--- Quote ---Well, I figured out how to keep the really aggressive bots from Fast Search, Inc. (alltheweb.com) off my board. I browsed their extensive FAQ section, and found that putting

--- End quote ---
Oldiesmann, do you realize that by doing this you'll also lose potential visitors from this engine? I agree that disallowing Alexa will not make a big difference; many big forums ban it to save on bandwidth. As for ATW, though, I would let it spider your pages. Many people think that in the near future it could become as important as Google, or even more so: if Yahoo makes a wise move, ATW would be a serious threat to Google and the rest of the engines. About 10% of my search engine traffic already comes from AllTheWeb, so even though it crawls about 2000 pages at a time, and even taking into account the 166.39 MB of bandwidth it used this month, I think it's worth it.
As for disallowing: if you really want to save on bandwidth, you could let it spider your site fully first and disallow it later, so ATW would keep your pages (even if not updated) in its index and still send you some visitors.

Ben_S:
I get shedloads of traffic from alltheweb. It's a top search engine; why you would want to keep it out, I don't know...
