MSN spiders cause massive numbers of errors

Started by woolly bugger, December 12, 2019, 09:36:25 AM


woolly bugger

When I see hundreds of MSN spiders crawling my board, I get a huge number of errors.
A lot of them are looking at the grep

How should I put a stop to this?

Illori

I don't see that you have attached a screenshot of your error log, just that you have errors and the spiders on the Who's Online page. Without knowing the errors, we cannot help fix them.

woolly bugger

With 229,374 errors that vary in degree, I didn't bother... but I will the next time they occur, as I deleted them this time... my bad


Illori


mantu2

Hi,

I have had the same problem for a little while now, with around 30,000 error hits at the moment. The errors occur daily at the same time; the amount just varies a bit. The error message is the same on my forum. The version is 2.1 RC2. I hope there is a solution.

Illori

You should upgrade to the latest version on GitHub if you are not using it already. If you still get the error, let us know.


shawnb61

Confirming:  You're saying you uploaded a whole new set of files as of 11/22?  (Not just that one PR, correct?)

If so... 

How many rows do you have in log_spider_stats?

Do you see entries in log_spider_stats across multiple days for MSN?

What spider logging level do you have?  (Standard, moderate, aggressive?  Under Admin | Forum | Search Engines | Settings)

If you're on current code, this appears to be two separate issues:
1) Too many pagehits; the field has a 65K max that is being exceeded
2) A bunch of undefined entries; it's possible that they're not defined for guests/bots

I am wondering if the date isn't being incremented somehow...  OTOH, if Bing is hitting you >65K times in a day, well, we have another problem.
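To illustrate the first issue, here is a minimal Python sketch of the range check involved. It assumes page_hits is an unsigned 16-bit SMALLINT (maximum 65,535), which is what the "65K max" above suggests; the sample hit counts are made up for illustration only.

```python
# Upper bounds of unsigned MySQL integer column types.
# Assumption: page_hits is currently an unsigned SMALLINT.
SMALLINT_UNSIGNED_MAX = 2**16 - 1  # 65,535
INT_UNSIGNED_MAX = 2**32 - 1       # 4,294,967,295


def fits_in_column(page_hits: int, column_max: int) -> bool:
    """Return True if a daily hit count can be stored without going out of range."""
    return 0 <= page_hits <= column_max


# Hypothetical daily totals for one spider, for illustration only.
for hits in (30_000, 65_535, 65_536, 70_000):
    print(
        hits,
        "smallint ok:", fits_in_column(hits, SMALLINT_UNSIGNED_MAX),
        "int ok:", fits_in_column(hits, INT_UNSIGNED_MAX),
    )
```

Any daily total above 65,535 fails the SMALLINT check, which would be consistent with an error being logged each time the counter is updated past that point.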

Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

woolly bugger

I was getting ready to upgrade to the latest GitHub release before I read your reply, so I put the forum in maintenance mode, emptied out unimportant logs and exported the database.

Then I checked this forum and read your reply.

I did the upgrade of all files on 11/22

The Search Engine Tracking level is Standard

What is up with smf_log_search_words? See attached.






shawnb61

Quote from: shawnb61 on December 13, 2019, 08:16:19 PM
Do you see entries in log_spider_stats across multiple days for MSN?

Could you dump some of that content?  It would help to see what that looks like...


(log_search_words is your search index when using a custom index - that's normal...)

woolly bugger

see attached

also showing my mods...



shawnb61

Perfect. 

Yep, it looks like during a recrawl, the maximum value of page_hits can legitimately be exceeded when tracking stats.
Logged:  https://github.com/SimpleMachines/SMF2.1/issues/5890

The 'undefined' issues are possibly a byproduct, not sure.  We should try a fix for that & see if they go away.

I am going to move this to the Bug Reports board.


shawnb61

Quote from: woolly bugger on December 12, 2019, 09:36:25 AM
How should I put a stop to this?

To eliminate these errors going forward, I would suggest changing the page_hits column in the smf_log_spider_stats table from smallint to int. 

I cannot replicate the undefined errors you have (my first suspicion is Tapatalk...).  But I'd suggest changing page_hits to INT as a first step to cleaning up your logs.
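For anyone applying this by hand, here is a small Python sketch that builds the statement to run in phpMyAdmin or the mysql client. It is only a sketch under a few assumptions: a MySQL/MariaDB database, the default smf_ table prefix, and a column that is UNSIGNED NOT NULL DEFAULT 0. Check your existing definition with SHOW CREATE TABLE and back up the database first. The helper name build_alter_statement is made up for this example.

```python
import re


def build_alter_statement(table_prefix: str = "smf_") -> str:
    """Build a MySQL statement that widens page_hits from SMALLINT to INT.

    Assumptions: MySQL/MariaDB syntax, and a column defined as
    UNSIGNED NOT NULL DEFAULT 0. Adjust to match your own schema.
    """
    # Only allow simple prefixes so the generated SQL stays predictable.
    if not re.fullmatch(r"[A-Za-z0-9_]*", table_prefix):
        raise ValueError(f"unexpected table prefix: {table_prefix!r}")
    return (
        f"ALTER TABLE {table_prefix}log_spider_stats "
        "MODIFY page_hits INT UNSIGNED NOT NULL DEFAULT 0;"
    )


# Print the statement so it can be reviewed before it is run.
print(build_alter_statement())
```

If your forum uses a different table prefix, pass it in, e.g. build_alter_statement("myforum_").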

PR submitted:
https://github.com/SimpleMachines/SMF2.1/pull/5896

shawnb61

The fix for this issue has been merged, so this will be closed.

The fix is available on the latest version of 2.1 over on GitHub.
