Simple Machines Community Forum

SMF Support => SMF 2.0.x Support => Topic started by: pjn on July 27, 2010, 04:41:21 AM

Title: .htaccess / robots.txt
Post by: pjn on July 27, 2010, 04:41:21 AM
I want to block "Twiceler", but I still want to let Googlebot crawl my site.

Here is what I did.
In robots.txt:
User-agent: *
Allow: /

In .htaccess:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^cuill.com
RewriteRule ^.* - [F,L]


But it still crawls my site...

What do I need to do?

Thanks
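One likely reason the .htaccess rule never fires: `^cuill.com` is anchored to the start of the User-Agent header and spells the domain "cuill" with two l's, while the crawler belongs to cuil.com (see the webmaster link further down the thread) and its user-agent does not begin with that text. The Python sketch below mimics how a RewriteCond pattern is tested against the header. The Twiceler and Googlebot user-agent strings are assumed examples, not taken from this thread, so check your access log for the real ones. In .htaccess the corresponding change would be something like `RewriteCond %{HTTP_USER_AGENT} Twiceler [NC]`, where `[NC]` makes the match case-insensitive.

```python
import re

# Assumed example user-agent strings; check your own access log for the real ones.
TWICELER_UA = "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# The condition from the original .htaccess: "^" anchors it to the start of
# the header and "cuill" has an extra "l", so a UA that starts with
# "Mozilla/5.0" can never match it.
ORIGINAL = re.compile(r"^cuill.com")

# Hypothetical fix: look for the bot name anywhere in the header, ignoring
# case (roughly what `RewriteCond %{HTTP_USER_AGENT} Twiceler [NC]` does).
FIXED = re.compile(r"twiceler", re.IGNORECASE)


def is_forbidden(condition: re.Pattern, user_agent: str) -> bool:
    """True if the condition matches, i.e. `RewriteRule ^.* - [F,L]` would answer 403."""
    # Like mod_rewrite, search anywhere in the string unless the pattern is anchored.
    return condition.search(user_agent) is not None


if __name__ == "__main__":
    for name, ua in (("Twiceler", TWICELER_UA), ("Googlebot", GOOGLEBOT_UA)):
        print(f"{name}: original rule blocks={is_forbidden(ORIGINAL, ua)}, "
              f"fixed rule blocks={is_forbidden(FIXED, ua)}")
```

Matching on the bot's name rather than its domain keeps the condition short and avoids depending on where the URL appears inside the header.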
...
Title: Re: .htaccess / robots.txt
Post by: chilly on July 27, 2010, 05:16:27 AM
Quote from: pjn on July 27, 2010, 04:41:21 AM

In robots.txt:
User-agent: *
Allow: /

I'm not familiar with .htaccess, but I know that your robots.txt allows every user agent, so why would that bot stop crawling your website?
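This point can be checked with Python's standard `urllib.robotparser`, used here only as a stand-in for how a well-behaved crawler reads the file (real crawlers may parse it differently, and robots.txt is a request, not access control). With the original file, every user agent, Twiceler included, is allowed everywhere. The path `/index.php` is just an example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post: one catch-all group that allows everything.
ORIGINAL_ROBOTS = """\
User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, agent: str, url: str = "/index.php") -> bool:
    """Ask Python's robots.txt parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Twiceler", "Googlebot"):
        print(agent, "allowed:", may_crawl(ORIGINAL_ROBOTS, agent))
```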
Title: Re: .htaccess / robots.txt
Post by: pjn on July 27, 2010, 06:05:15 AM
Can you give me the code to block that bot in robots.txt?
Title: Re: .htaccess / robots.txt
Post by: busterone on July 27, 2010, 08:56:12 AM
User-agent: Twiceler
Disallow: /
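These two lines can be checked the same way with Python's `urllib.robotparser` (again only a stand-in for a compliant crawler). The sketch assumes they are added alongside the existing catch-all group rather than replacing it: Twiceler matches its own group and is refused, while Googlebot falls back to `User-agent: *` and stays allowed. This only works if the bot honours robots.txt, and crawlers may cache the file, so a change may not take effect immediately.

```python
from urllib.robotparser import RobotFileParser

# Assumed combined file: the Twiceler-specific group plus the original
# catch-all group. A crawler is expected to follow the group that names it.
COMBINED_ROBOTS = """\
User-agent: Twiceler
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, agent: str, url: str = "/index.php") -> bool:
    """Ask Python's robots.txt parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Twiceler", "Googlebot"):
        print(agent, "allowed:", may_crawl(COMBINED_ROBOTS, agent))
```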
Title: Re: .htaccess / robots.txt
Post by: pjn on July 27, 2010, 11:09:26 AM
OK

Should I delete these lines?
User-agent: *
Allow: /

And delete this file?
.htaccess

Or what do I need to do?
Title: Re: .htaccess / robots.txt
Post by: pjn on July 27, 2010, 11:10:54 AM
It's still crawling my site :(

http://pjn-il.com/index.php?action=who
Title: Re: .htaccess / robots.txt
Post by: YogiBear on July 27, 2010, 11:39:30 AM
The problem is that Twiceler doesn't stick to any specific IP range over time, but take a look at this...


http://www.cuil.com/info/webmaster_info/


...apparently, you can contact them and ask them not to crawl your pages.
Title: Re: .htaccess / robots.txt
Post by: pjn on July 28, 2010, 01:09:03 AM
I will try this, OK...
Title: Re: .htaccess / robots.txt
Post by: Adish - (F.L.A.M.E.R) on August 16, 2010, 09:17:05 PM
Any updates over this? Do you still require any assistance?