How to block Baidu spider?

Started by peps1, November 27, 2009, 12:59:37 AM


mbanusick2

What do you mean by stealing? Do they also count as my site's visitors?
So if I have a lot of bandwidth to spare, should I allow them?

Arantor

They also take up processing time which, on a shared host, means genuine users can't get to the forum properly.

mbanusick2

Sorry to bother you guys, but how do I know if I'm on a shared host? Does that also mean that the bots are recorded as my guests? And what do they do with the info they collect?
Thanks a lot.

clyde4210

If your host does not allow mod_rewrite, then you could use:
# Flag any request whose User-Agent contains "baidu" (case-insensitive).
# Note: an anchored pattern like "^baidu" would miss UA strings such as
# "Mozilla/5.0 (compatible; Baiduspider/2.0; ...)", so the anchor is dropped.
SetEnvIfNoCase User-Agent "baidu" bad_bot=1
<FilesMatch ".*">
  Order Allow,Deny
  Allow from all
  Deny from env=bad_bot
</FilesMatch>
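
For hosts that do allow mod_rewrite, a minimal sketch of the equivalent rule (assuming mod_rewrite is enabled; the pattern and the 403 response are illustrative):

<IfModule mod_rewrite.c>
  RewriteEngine On
  # Refuse (403 Forbidden) any request whose User-Agent contains "baidu"
  RewriteCond %{HTTP_USER_AGENT} baidu [NC]
  RewriteRule ^ - [F,L]
</IfModule>

Either approach keys on the user-agent rather than the IP, so it keeps working when the spider moves to a new IP range.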

While I don't agree with blocking hundreds of IPs to ban bots, it can be done that way. The time spent processing hundreds of IPs or a rewrite is about the same. I think this hxxp:www.webmasterworld.com/forum92/1145.htm [nonactive] post really sums it up. It all boils down to your server's processor and RAM.

Grinch mentions that the .htaccess rules have to be processed for every image and script. Well, you can set up caching in the .htaccess too, which cuts down on repeat requests and speeds up the processing time.
<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault A0

# Note: the Header directives below also require mod_headers.

# Set up caching on media files for 1 year (forever?)
<FilesMatch "\.(ico|flv|pdf|mov|mp3|wmv|ppt)$">
  ExpiresDefault A29030400
  Header append Cache-Control "public"
</FilesMatch>

# Set up caching on image files for 1 week
<FilesMatch "\.(gif|jpg|jpeg|png|swf|bmp)$">
  ExpiresDefault A604800
  Header append Cache-Control "public"
</FilesMatch>

# Set up 2-hour caching on commonly updated files
<FilesMatch "\.(xml|txt|html|js|css)$">
  ExpiresDefault A7200
  Header append Cache-Control "private, proxy-revalidate, must-revalidate"
</FilesMatch>
</IfModule>


I have NukeSentinel(tm), so I don't need .htaccess as much for banning bots and/or people.

midlandshunnies

I have just blocked a Baidu spider using this code - not sure if it was real or not, since it was a different IP range - but I was seeing about 90 visits every 15 minutes or so, so hopefully my site will see a load-time improvement and lower processing requirements!

<Files *.*>
        # Apache 2.2 syntax: allow everyone, then deny the 180.76.0.0/16 range
        order allow,deny
        allow from all
        deny from 180.76.
</Files>
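
Note that order/allow/deny is Apache 2.2 syntax; on Apache 2.4 the equivalent would be built from mod_authz_core's Require directives. A rough sketch:

<Files *.*>
        <RequireAll>
                # Let everyone in except the 180.76.0.0/16 range
                Require all granted
                Require not ip 180.76
        </RequireAll>
</Files>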

Martine M

I blocked 2 IP ranges and he was gone for a few days, then came back with another IP range.
I blocked that one too, and now I think he is gone.
Running SMF 2.09 - Diego Andrés Theme Elegant Mind - TP 1.0 - Main Forum language English - Browser Firefox 33


Quexinos

Hey guys, sorry to bump this, but I used the IP Deny tool in cPanel to block Baidu and a couple of other search engines. I just wasn't happy with how often they came by, and that seems to work.

Does using that take up a lot of resources? It just writes it to .htaccess, right? I want to make sure I'm using as few resources as possible, and I've only blocked like 5 IPs, so I should be okay, right?
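
For reference, cPanel's IP Deny Manager generally just appends plain deny rules to the .htaccess; a sketch of what those entries typically look like (the addresses here are placeholders, not real Baidu IPs):

# Entries of the sort the IP Deny Manager writes (placeholder addresses)
deny from 203.0.113.15
deny from 198.51.100.24

Apache scans a list like this on each request, so a handful of entries adds no measurable overhead.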

a10

Quote: "I've only blocked like 5 IPs, so I should be okay, right?"
Yes. You can block a lot more if you need to. See this post.
2.0.19, php 8.0.23, MariaDB 10.5.15. Mods: Contact Page, Like Posts, Responsive Curve, Search Focus Dropdown, Add Join Date to Post.

Igal-Incapsula

Hi
Please note that BaiduSpider can, and will, access your site from several different IP ranges - not only from 180.76.*.
Here are a few other IPs that Baidu will use:
125.39.78.0
123.125.66.15
220.181.7.13
119.63.193.0
and more.
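
To cover those as well, the earlier <Files> block could be extended along these lines (a sketch only; the prefixes are inferred from the single addresses above, so verify the actual ranges before deploying):

<Files *.*>
        order allow,deny
        allow from all
        # Prefixes inferred from the addresses listed above
        deny from 180.76.
        deny from 125.39.78.
        deny from 123.125.66.
        deny from 220.181.7.
        deny from 119.63.193.
</Files>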

You can verify Baidu spider IPs and/or user-agents via hxxp:botopedia.org [nonactive]

Martine M

Thanks for the URL, I'll bookmark it.
At this moment I have successfully blocked Baidu, and it has stayed blocked for a while now.
Running SMF 2.09 - Diego Andrés Theme Elegant Mind - TP 1.0 - Main Forum language English - Browser Firefox 33

