msnbot causing errors

Started by Sir Osis of Liver, February 19, 2020, 10:22:41 PM

Sir Osis of Liver

msnbot is hosing one of my forums. I'm getting thousands of errors:

Database Error: Out of range value for column 'page_hits' at row 1

page_hits has been stuck at 65535 every day for the past week or two. Is there any way to block it?
Ashes and diamonds, foe and friend,
 we were all equal in the end.

                                     - R. Waters

shawnb61

Hmmm...  First time we've seen this in 2.0.

This was reported in 2.1:
https://github.com/SimpleMachines/SMF2.1/issues/5890

And the fix was to increase the size of the field:
https://github.com/SimpleMachines/SMF2.1/pull/5896
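For context, some hedged arithmetic: 65535 is exactly the largest value a 16-bit unsigned integer can hold (for example, MySQL's SMALLINT UNSIGNED). That suggests page_hits is stored in a 16-bit column, so once a busy spider pushes the daily count past that ceiling, every further update fails with the out-of-range error. Widening the column raises the ceiling. The 32-bit INT UNSIGNED below is only an illustration, not necessarily the type the PR chose:

```latex
\begin{aligned}
\text{16-bit unsigned maximum} &= 2^{16} - 1 = 65\,535 \\
\text{32-bit unsigned maximum} &= 2^{32} - 1 = 4\,294\,967\,295
\end{aligned}
```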


I had thought there was a difference in how these were counted between 2.0 & 2.1, but I guess not...
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

drewactual

It's been my experience that MSN does a good job of observing the rules in robots.txt. You might give that a shot as a first line of defense and see what happens. After you change it, make sure to go to their console and re-index it.

shawnb61

My experience is that they not only ignored robots.txt, they ignored bing webmaster tools as well.

I block them in .htaccess for not playing nice, eating significant CPU, and not providing any meaningful traffic to the forum.

Sir Osis of Liver

I'll try blocking it with robots.txt; if that doesn't work, I'll use .htaccess. Some of these forums don't like blocking spiders, but it's logged over 50k errors in 3 days. They've had a chronic problem with slow page loads, and this doesn't help.

drewactual

i speak of my experiences... ymmv.

shawnb61

Quote from: drewactual on February 19, 2020, 10:55:38 PM
i speak of my experiences... ymmv.

True, I blocked 'em back in 2015. Maybe they've started playing nice since then. We don't miss 'em, so we never gave 'em another chance.

It would probably make sense to attempt a robots.txt solution first, especially if your forum stats indicate you get traffic from them.

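My solution was adding this in .htaccess:
# Flag any request whose User-Agent contains "bingbot" (case-insensitive)
SetEnvIfNoCase User-Agent bingbot bad_bot

# Apache 2.2-style access control: allow everyone except flagged requests
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>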
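On Apache 2.4, Order/Allow/Deny only work when mod_access_compat is loaded. The newer mod_authz_core syntax expresses the same rule with Require. Below is a minimal sketch under that assumption, not what was actually used in this thread. It also adds a pattern for msnbot, since that is the agent reported above; adjust the patterns to whatever User-Agent strings actually appear in your access log. Using Require in .htaccess assumes the server's AllowOverride setting permits AuthConfig, and this version applies to all request methods rather than only GET, POST and HEAD.

```apache
# Flag requests whose User-Agent contains "bingbot" or "msnbot" (case-insensitive)
SetEnvIfNoCase User-Agent "bingbot" bad_bot
SetEnvIfNoCase User-Agent "msnbot" bad_bot

# Apache 2.4 (mod_authz_core): grant access to everyone unless the request was flagged
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Use one style or the other. Apache's documentation warns that mixing the old Order/Allow/Deny directives with Require in the same context can give unexpected results.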


At the time, this eliminated the CPU spikes.

Sir Osis of Liver

It rolled right over robots.txt: 600 errors in 8 minutes. The .htaccess block seems to have stopped it. I thought msnbot was retired a while ago.

shawnb61

Yep. Now you'll get warnings in your Apache log instead... BUT you have likely eliminated the CPU spikes caused by these guys.

Sir Osis of Liver

It's running clean with no errors, and the forum is faster than at any time I can remember. We'll see how long it lasts.