Forum getting swarmed by 500+ guests?

Started by Rhindeer, May 10, 2024, 02:17:40 PM


Rhindeer

Sooo, I'm having a strange issue: my forum (http://spiritsoftheearth.net/smf) is getting swarmed by guests. No registrations, since I have a spam mod installed. However, I was able to trace them back to Amazon.

I tried banning a range of them, but that ended up banning some actual users. So I undid that.

Anyway, the issue is that it's causing errors because it's maxing out the "max_questions" resource.

Is there a way to resolve this? I'm not actually sure what to do since my host has already raised "max_questions" to the limit.

I attached an image showing some of the guests as an example. As I send this there are 520 guests. Ahhh!

shawnb61

I assume you mean "max_connections".

I'm seeing the same thing - and for me, there has been an explosion of new bots recently.  You need to find a good way to identify & block those bots.  In some cases, existing bots are just not honoring crawl-delays. 

I've been periodically reviewing the actual web access logs to identify the bots, and adding entries to .htaccess to block what I find.
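If you want a quick way to see who's hitting you, something like this tallies the most frequent user agents in a combined-format Apache access log (the log path varies by host - that one is just an example):

awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20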

It's been kind of like whack-a-mole...  Every time I think I have it covered, a new bot shows up.  Some of them unabashedly AI related. 

One recent surprise is GoogleOther.  This fairly new bot has been hammering my site - and with very strange hackish behavior (many many thousands of attempts at verification codes?!?!?).  And yes, I've confirmed they are Google IPs.  Not honoring robots.txt at all....
https://thriveagency.com/news/google-introduces-new-googlebot-web-crawler-named-googleother-what-you-need-to-know/

WTF Google...

Anyway @vbgamer45 generously shared a sample .htaccess here:
https://www.simplemachines.org/community/index.php?msg=4170375

Other helpful reading:
https://www.imperva.com/blog/most-active-good-bots/
https://radar.cloudflare.com/traffic/verified-bots
https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/_generator_lists/bad-user-agents.list

Hope this helps.

Rhindeer

Quote from: shawnb61 on May 10, 2024, 02:51:58 PM: I assume you mean "max_connections". [...]

Thank you so much for this! This really does help! <3 Though I admit I've never edited the .htaccess file before. The code that was linked--where would I put it in this file? Here's my current .htaccess file. I apologize, I'm a n00b when it comes to a lot of this. D:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

Also, strangely enough, it wasn't "max_connections" - it was "max_questions"! I've had the max_connections error in the past, but this one keeps popping up as max_questions.

But oof! Glad to know I'm not alone in this! Well, kinda--I'm sorry everyone else has to deal with the bot plight too. Dx The hackish behavior is disturbing. So far I haven't seen any of that, thank goodness.

Aleksi "Lex" Kilpinen

Slava
Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

shawnb61

Quote from: Rhindeer on May 13, 2024, 02:42:17 AM: Though I admit I've never edited the .htaccess file before. The code that was linked--where would I put it in this file? [...]

I would place the new code below the mod_rewrite block you posted. Then TEST... First off, make sure you haven't cut yourself off! (Typos can really mess things up...) You should still be able to access the site as normal.

I test using Chrome, because it has a nice little interface for overriding your user agent for testing purposes. In Chrome, open the dev console and go to the "Network conditions" tab. In the User agent section, uncheck the "Use browser default" box and specify a "Custom" user agent - you can type anything in there. Enter each bad bot, navigate to your site, and you should be forbidden from accessing it.
https://developer.chrome.com/docs/devtools/device-mode/override-user-agent
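If you'd rather test from a terminal, curl can spoof a user agent too - for example (swap in your own forum's URL):

curl -s -o /dev/null -w "%{http_code}\n" -A "Amazonbot" https://example.com/smf/

A blocked agent should print 403; your normal browser agent should still get 200.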

(When done testing, change it back to the browser default if you actually use Chrome...)

Quote from: Rhindeer on May 13, 2024, 02:42:17 AM: Also, strangely enough, it wasn't "max_connections" - it was "max_questions"! I've had the max_connections error in the past, but this one keeps popping up as max_questions.

Interesting.  Apparently "max_questions" is in fact a standard per-hour limit.  Learn something new every day.
https://dev.mysql.com/doc/refman/8.0/en/user-resources.html
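For illustration only - the cap lives on the MySQL account itself, and on shared hosting only your host can change it. These hypothetical statements (account name is made up) just show the mechanism:

ALTER USER 'forum_user'@'localhost' WITH MAX_QUERIES_PER_HOUR 500000;
SELECT max_questions FROM mysql.user WHERE user = 'forum_user';

Once max_questions is hit, every further query that hour fails - which is exactly the error the swarm was triggering.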

Rhindeer

Quote from: shawnb61 on May 14, 2024, 03:44:25 PM: I would place the new code below the mod_rewrite block you posted. Then TEST... [...]

I just tried putting it in my .htaccess file (pasted directly under my existing rewrite block) and it gave me a 500 Internal Server Error, so I removed it and the site is back to normal. D: So I'm not quite sure how to edit this file, ahh!

For reference, I was using this suggested code:

<Location />
<Limit GET POST PUT>

# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase wordpress bad_bot
BrowserMatchNoCase istellabot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Cliqzbot bad_bot
BrowserMatchNoCase SocialRankIOBot bad_bot
BrowserMatchNoCase Mail.RU_Bot bad_bot
BrowserMatchNoCase Clickag Bot bad_bot
BrowserMatchNoCase Mediatoolkitbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase DotBot/1.1 bad_bot
BrowserMatchNoCase DataForSeoBot bad_bot
BrowserMatchNoCase www.timpi.io bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase trendictionbot bad_bot
BrowserMatchNoCase BLEXBot/1.0 bad_bot
BrowserMatchNoCase SeekportBot bad_bot
BrowserMatchNoCase Turnitin bad_bot
BrowserMatchNoCase omgili/0.5 bad_bot
BrowserMatchNoCase CheckHost bad_bot
BrowserMatchNoCase Amazonbot bad_bot
BrowserMatchNoCase SEOkicks bad_bot
<RequireAll>
Require all granted
<RequireNone>
Require env bad_bot
</RequireNone>
</RequireAll>

</Limit>
</Location>

shawnb61

There have been some changes to Apache's access-control syntax over the years - the newer recommended style is 'Require', which is what @vbgamer45 uses. (Incidentally, that's almost certainly why you got the 500: the <Location> wrapper in that sample isn't allowed in .htaccess files, only in the main server config.)

I still use the old syntax. So I took your list, added a few agents that have been bugging me lately (including GoogleOther and ClaudeBot), and converted it to the old syntax. I also enclosed bot names with embedded spaces in single quotes.

Try this:

# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase 'magpie-crawler 1.1' bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase 'NetSeer crawler/2.0' bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase 'Jakarta Commons-HttpClient/3.0' bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase wordpress bad_bot
BrowserMatchNoCase istellabot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Cliqzbot bad_bot
BrowserMatchNoCase SocialRankIOBot bad_bot
BrowserMatchNoCase Mail.RU_Bot bad_bot
BrowserMatchNoCase 'Clickag Bot' bad_bot
BrowserMatchNoCase Mediatoolkitbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase DotBot/1.1 bad_bot
BrowserMatchNoCase DataForSeoBot bad_bot
BrowserMatchNoCase www.timpi.io bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase trendictionbot bad_bot
BrowserMatchNoCase BLEXBot/1.0 bad_bot
BrowserMatchNoCase SeekportBot bad_bot
BrowserMatchNoCase Turnitin bad_bot
BrowserMatchNoCase omgili/0.5 bad_bot
BrowserMatchNoCase CheckHost bad_bot
BrowserMatchNoCase Amazonbot bad_bot
BrowserMatchNoCase SEOkicks bad_bot
BrowserMatchNoCase Claudebot bad_bot
BrowserMatchNoCase bomborabot bad_bot
BrowserMatchNoCase commoncrawl bad_bot
BrowserMatchNoCase dataforseo-bot bad_bot
BrowserMatchNoCase GoogleOther bad_bot
BrowserMatchNoCase keys-so-bot bad_bot
BrowserMatchNoCase MojeekBot bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
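One caveat: on Apache 2.4 the old Order/Allow/Deny directives only work if mod_access_compat is enabled (most shared hosts still have it). If yours doesn't, a sketch of the 2.4-native equivalent of that final stanza would be:

<Limit GET POST HEAD>
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>
</Limit>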

Rhindeer

Quote from: shawnb61 on May 14, 2024, 05:12:56 PM: I still use the old syntax. So I took your list, added a few agents that have been bugging me lately (including GoogleOther and ClaudeBot), and converted it to the old syntax. Try this: [...]

Thank you SO MUCH! <3 The site didn't break, and now I'm gonna test everything in Chrome! I'll update ya! I really appreciate you!

shawnb61

Note that some in the above list are valid search engines. 

But most of us are running on some form of shared host with limited resources.  We just can't let everyone crawl...  You have to think of it as a budget.  Who can you afford to let in?

I have a crawl delay in my robots.txt.  I try to allow all legit search engine crawlers that honor the crawl delay to crawl.  I want folks to find my content - even if they are in Asia or Russia...

So for now I have allowed even yandex, baidu, sogou, bingbot, mj12bot, mail_ru.  I block all 'market research', 'seo research' and AI bots.  Literally anything I cannot find a search engine for.
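As a rough sketch, the relevant part of my robots.txt looks something like this (the delay value and bot names are examples, not my exact file):

User-agent: *
Crawl-delay: 10

User-agent: SemrushBot
Disallow: /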

I have a love/hate relationship with yandex & bingbot...  At various times I have had them blocked, because both can go off & crawl far too aggressively.  But they both appear to be following crawl-delay at the moment. 

Note also that Googlebot does NOT honor crawl-delay.  You need to use their search console to limit the rate.  On occasion, even Googlebot can hit you pretty hard, if they get it in their head you need a complete recrawl...

Just things to be aware of.  For now, I suggest you stay as restrictive as possible until you are stable. 

Rhindeer

@shawnb61 IT WORKED! Tested in Chrome and it was perfect, and I also got to watch my guest list drop from 710 users to 17. Thank you so much! My community and I really appreciate you! <3

Rhindeer

Quote from: shawnb61 on May 14, 2024, 05:25:40 PM: Note that some in the above list are valid search engines. [...] For now, I suggest you stay as restrictive as possible until you are stable.

Thank you for this - that makes total sense. I MAY do a test run in the future allowing Google, Yandex, and Bing back, since they've never given us issues (when they popped in we might get 20 extra guests, but never hundreds!), but for now I'm happy to keep everyone blocked. xD The Amazon bot was the worst one this round! I've never seen anything like it before.

shawnb61

To be clear - the above blocks the new bot called "GoogleOther", which is not the normal search-engine Googlebot. So Google search is still allowed to crawl.

GoogleOther is supposedly their research bot??? 

It does not honor crawl delay or any other robots.txt directive.  At all.  It has absolutely smashed my site. 

And get this - it was doing hundreds of 'action=verificationcode' calls per hour on my site...  From a valid Google IP. 

Blocked.

Rhindeer

Quote from: shawnb61 on May 14, 2024, 05:46:57 PM: It does not honor crawl delay or any other robots.txt directive. At all. [...] it was doing hundreds of 'action=verificationcode' calls per hour on my site... From a valid Google IP. Blocked.

Ewwww wtf???

Yeah, no thanks, that one'll stay blocked!

Steve

It appears the OP's problem has been resolved. Marking solved.

shawnb61

Quote from: shawnb61 on May 14, 2024, 05:25:40 PM: I have a love/hate relationship with yandex & bingbot... At various times I have had them blocked, because both can go off & crawl far too aggressively. But they both appear to be following crawl-delay at the moment.

For the record, yandex is not honoring crawl-delay anymore, so I blocked them again.

Oddly, they DO read robots.txt.  If they are disallowed there, they will honor that.  But not crawl-delay.

Their support site says they do not honor crawl-delay, and suggests you create an account with them to control crawls (like Google...)
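For a bot that reads robots.txt but ignores crawl-delay, the disallow is simple enough - something like:

User-agent: Yandex
Disallow: /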

