Forum getting swarmed by 500+ guests?

Started by Rhindeer, May 10, 2024, 02:17:40 PM


Rhindeer

Sooo, I'm having a strange issue: my forum (http://spiritsoftheearth.net/smf) is getting swarmed by guests. No registrations, since I have a spam mod installed. However, I was able to trace them back to Amazon.

I tried banning a range of them, but that ended up banning some actual users. So I undid that.

Anyway, the issue is that it's causing errors because it's maxing out the "max_questions" resource.

Is there a way to resolve this? I'm not actually sure what to do since my host has already raised "max_questions" to the limit.

I attached an image showing some of the guests as an example. As I send this there are 520 guests. Ahhh!

shawnb61

I assume you mean "max_connections".

I'm seeing the same thing - and for me, there has been an explosion of new bots recently.  You need to find a good way to identify & block those bots.  In some cases, existing bots are just not honoring crawl-delays. 

I've been periodically reviewing the actual web access logs to identify the bots, and adding entries to .htaccess to block what I find.
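If you want a quick way to see who's hitting you, something like this tallies the most frequent user agents in a combined-format Apache access log (the log path varies by host - that one is just an example):

awk -F'"' '{print $6}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20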

It's been kind of like whack-a-mole...  Every time I think I have it covered, a new bot shows up.  Some of them unabashedly AI related. 

One recent surprise is GoogleOther.  This fairly new bot has been hammering my site - and with very strange hackish behavior (many many thousands of attempts at verification codes?!?!?).  And yes, I've confirmed they are Google IPs.  Not honoring robots.txt at all....
https://thriveagency.com/news/google-introduces-new-googlebot-web-crawler-named-googleother-what-you-need-to-know/

WTF Google...

Anyway @vbgamer45 generously shared a sample .htaccess here:
https://www.simplemachines.org/community/index.php?msg=4170375

Other helpful reading:
https://www.imperva.com/blog/most-active-good-bots/
https://radar.cloudflare.com/traffic/verified-bots
https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/master/_generator_lists/bad-user-agents.list

Hope this helps.

Rhindeer

Quote from: shawnb61 on May 10, 2024, 02:51:58 PM: I assume you mean "max_connections". [...]

Thank you so much for this! This really does help! <3 Though I admit I've never edited the .htaccess file before. The code that was linked--where would I put it in this file? Here's my current .htaccess file. I apologize, I'm a n00b when it comes to a lot of this. D:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

Also, strangely enough, it wasn't "max_connections" - it was "max_questions"! I've had the max_connections error in the past, but this one keeps popping up as max_questions.

But oof! Glad to know I'm not alone in this! Well, kinda--I'm sorry everyone else has to deal with the bot plight too. Dx The hackish behavior is disturbing. So far I haven't seen any of that, thank goodness.

Aleksi "Lex" Kilpinen

Slava
Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

shawnb61

Quote from: Rhindeer on May 13, 2024, 02:42:17 AM: Though I admit I've never edited the .htaccess file before. The code that was linked--where would I put it in this file? [...]

I would place the new code below the mod_rewrite block you posted. Then TEST... First off, make sure you haven't cut yourself off! (Typos can really mess things up...) You should still be able to access the site as normal.

I test using Chrome, because it has a nice little interface for overriding your user agent for testing purposes. In Chrome, open the dev console and go to the "Network conditions" tab. In the User agent section, uncheck the "Use browser default" box and specify a "Custom" user agent - you can type anything in there. Enter each bad bot, navigate to your site, and you should be forbidden from accessing it.
https://developer.chrome.com/docs/devtools/device-mode/override-user-agent
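If you'd rather test from a terminal, curl can spoof a user agent too - for example (swap in your own forum's URL):

curl -s -o /dev/null -w "%{http_code}\n" -A "Amazonbot" https://example.com/smf/

A blocked agent should print 403; your normal browser agent should still get 200.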

(When done testing, change it back to the browser default if you actually use Chrome...)

Quote from: Rhindeer on May 13, 2024, 02:42:17 AM: Also, strangely enough, it wasn't "max_connections" - it was "max_questions"! I've had the max_connections error in the past, but this one keeps popping up as max_questions.

Interesting.  Apparently "max_questions" is in fact a standard per-hour limit.  Learn something new every day.
https://dev.mysql.com/doc/refman/8.0/en/user-resources.html
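For illustration only - the cap lives on the MySQL account itself, and on shared hosting only your host can change it. These hypothetical statements (account name is made up) just show the mechanism:

ALTER USER 'forum_user'@'localhost' WITH MAX_QUERIES_PER_HOUR 500000;
SELECT max_questions FROM mysql.user WHERE user = 'forum_user';

Once max_questions is hit, every further query that hour fails - which is exactly the error the swarm was triggering.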

Rhindeer

Quote from: shawnb61 on May 14, 2024, 03:44:25 PM: I would place the new code below the mod_rewrite block you posted. Then TEST... [...]

I just tried putting it in my .htaccess file (pasted directly under my existing rewrite block) and it gave me a 500 Internal Server Error, so I removed it and the site is back to normal. D: So I'm not quite sure how to edit this file, ahh!

For reference, I was using this suggested code:

<Location />
<Limit GET POST PUT>

# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase magpie-crawler 1.1 bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase NetSeer crawler/2.0 bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase Jakarta Commons-HttpClient/3.0 bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase wordpress bad_bot
BrowserMatchNoCase istellabot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Cliqzbot bad_bot
BrowserMatchNoCase SocialRankIOBot bad_bot
BrowserMatchNoCase Mail.RU_Bot bad_bot
BrowserMatchNoCase Clickag Bot bad_bot
BrowserMatchNoCase Mediatoolkitbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase DotBot/1.1 bad_bot
BrowserMatchNoCase DataForSeoBot bad_bot
BrowserMatchNoCase www.timpi.io bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase trendictionbot bad_bot
BrowserMatchNoCase BLEXBot/1.0 bad_bot
BrowserMatchNoCase SeekportBot bad_bot
BrowserMatchNoCase Turnitin bad_bot
BrowserMatchNoCase omgili/0.5 bad_bot
BrowserMatchNoCase CheckHost bad_bot
BrowserMatchNoCase Amazonbot bad_bot
BrowserMatchNoCase SEOkicks bad_bot
<RequireAll>
Require all granted
<RequireNone>
Require env bad_bot
</RequireNone>
</RequireAll>

</Limit>
</Location>

shawnb61

There have been some changes to Apache's access-control syntax over the years - the newer recommended style is 'Require', which is what @vbgamer45 uses. (Incidentally, that's almost certainly why you got the 500: the <Location> wrapper in that sample isn't allowed in .htaccess files, only in the main server config.)

I still use the old syntax. So I took your list, added a few agents that have been bugging me lately (including GoogleOther and ClaudeBot), and converted it to the old syntax. I also enclosed bot names with embedded spaces in single quotes.

Try this:

# Begin Bad Bot Blocking
BrowserMatchNoCase OmniExplorer_Bot/6.11.1 bad_bot
BrowserMatchNoCase omniexplorer_bot bad_bot
BrowserMatchNoCase Baiduspider bad_bot
BrowserMatchNoCase Baiduspider/2.0 bad_bot
BrowserMatchNoCase yandex bad_bot
BrowserMatchNoCase yandeximages bad_bot
BrowserMatchNoCase Spinn3r bad_bot
BrowserMatchNoCase sogou bad_bot
BrowserMatchNoCase Sogouwebspider/3.0 bad_bot
BrowserMatchNoCase Sogouwebspider/4.0 bad_bot
BrowserMatchNoCase sosospider+ bad_bot
BrowserMatchNoCase jikespider bad_bot
BrowserMatchNoCase ia_archiver bad_bot
BrowserMatchNoCase PaperLiBot bad_bot
BrowserMatchNoCase ahrefsbot bad_bot
BrowserMatchNoCase ahrefsbot/1.0 bad_bot
BrowserMatchNoCase SiteBot/0.1 bad_bot
BrowserMatchNoCase DNS-Digger/1.0 bad_bot
BrowserMatchNoCase DNS-Digger-Explorer/1.0 bad_bot
BrowserMatchNoCase boardreader bad_bot
BrowserMatchNoCase radian6 bad_bot
BrowserMatchNoCase R6_FeedFetcher bad_bot
BrowserMatchNoCase R6_CommentReader bad_bot
BrowserMatchNoCase ScoutJet bad_bot
BrowserMatchNoCase ezooms bad_bot
BrowserMatchNoCase CC-rget/5.818 bad_bot
BrowserMatchNoCase libwww-perl/5.813 bad_bot
BrowserMatchNoCase 'magpie-crawler 1.1' bad_bot
BrowserMatchNoCase jakarta bad_bot
BrowserMatchNoCase discobot/1.0 bad_bot
BrowserMatchNoCase MJ12bot bad_bot
BrowserMatchNoCase MJ12bot/v1.2.0 bad_bot
BrowserMatchNoCase MJ12bot/v1.2.5 bad_bot
BrowserMatchNoCase SemrushBot/0.9 bad_bot
BrowserMatchNoCase MLBot bad_bot
BrowserMatchNoCase butterfly bad_bot
BrowserMatchNoCase SeznamBot/3.0 bad_bot
BrowserMatchNoCase HuaweiSymantecSpider bad_bot
BrowserMatchNoCase Exabot/2.0 bad_bot
BrowserMatchNoCase netseer/0.1 bad_bot
BrowserMatchNoCase 'NetSeer crawler/2.0' bad_bot
BrowserMatchNoCase NetSeer/Nutch-0.9 bad_bot
BrowserMatchNoCase psbot/0.1 bad_bot
BrowserMatchNoCase moreoverbot/5.0 bad_bot
BrowserMatchNoCase 'Jakarta Commons-HttpClient/3.0' bad_bot
BrowserMatchNoCase SocialSpider-Finder/0.2 bad_bot
BrowserMatchNoCase wordpress bad_bot
BrowserMatchNoCase istellabot bad_bot
BrowserMatchNoCase SeznamBot bad_bot
BrowserMatchNoCase Cliqzbot bad_bot
BrowserMatchNoCase SocialRankIOBot bad_bot
BrowserMatchNoCase Mail.RU_Bot bad_bot
BrowserMatchNoCase 'Clickag Bot' bad_bot
BrowserMatchNoCase Mediatoolkitbot bad_bot
BrowserMatchNoCase SemrushBot bad_bot
BrowserMatchNoCase DotBot/1.1 bad_bot
BrowserMatchNoCase DataForSeoBot bad_bot
BrowserMatchNoCase www.timpi.io bad_bot
BrowserMatchNoCase DotBot bad_bot
BrowserMatchNoCase trendictionbot bad_bot
BrowserMatchNoCase BLEXBot/1.0 bad_bot
BrowserMatchNoCase SeekportBot bad_bot
BrowserMatchNoCase Turnitin bad_bot
BrowserMatchNoCase omgili/0.5 bad_bot
BrowserMatchNoCase CheckHost bad_bot
BrowserMatchNoCase Amazonbot bad_bot
BrowserMatchNoCase SEOkicks bad_bot
BrowserMatchNoCase Claudebot bad_bot
BrowserMatchNoCase bomborabot bad_bot
BrowserMatchNoCase commoncrawl bad_bot
BrowserMatchNoCase dataforseo-bot bad_bot
BrowserMatchNoCase GoogleOther bad_bot
BrowserMatchNoCase keys-so-bot bad_bot
BrowserMatchNoCase MojeekBot bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
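One caveat: on Apache 2.4 the old Order/Allow/Deny directives only work if mod_access_compat is enabled (most shared hosts still have it). If yours doesn't, a sketch of the 2.4-native equivalent of that final stanza would be:

<Limit GET POST HEAD>
<RequireAll>
Require all granted
Require not env bad_bot
</RequireAll>
</Limit>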

Rhindeer

Quote from: shawnb61 on May 14, 2024, 05:12:56 PM: I still use the old syntax. So I took your list, added a few agents that have been bugging me lately (including GoogleOther and ClaudeBot), and converted it to the old syntax. Try this: [...]

Thank you SO MUCH! <3 The site didn't break, and now I'm gonna test everything in Chrome! I'll update ya! I really appreciate you!

shawnb61

Note that some in the above list are valid search engines. 

But most of us are running on some form of shared host with limited resources.  We just can't let everyone crawl...  You have to think of it as a budget.  Who can you afford to let in?

I have a crawl delay in my robots.txt.  I try to allow all legit search engine crawlers that honor the crawl delay to crawl.  I want folks to find my content - even if they are in Asia or Russia...

So for now I have allowed even yandex, baidu, sogou, bingbot, mj12bot, mail_ru.  I block all 'market research', 'seo research' and AI bots.  Literally anything I cannot find a search engine for.
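As a rough sketch, the relevant part of my robots.txt looks something like this (the delay value and bot names are examples, not my exact file):

User-agent: *
Crawl-delay: 10

User-agent: SemrushBot
Disallow: /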

I have a love/hate relationship with yandex & bingbot...  At various times I have had them blocked, because both can go off & crawl far too aggressively.  But they both appear to be following crawl-delay at the moment. 

Note also that Googlebot does NOT honor crawl-delay.  You need to use their search console to limit the rate.  On occasion, even Googlebot can hit you pretty hard, if they get it in their head you need a complete recrawl...

Just things to be aware of.  For now, I suggest you stay as restrictive as possible until you are stable. 

Rhindeer

@shawnb61 IT WORKED! Tested in Chrome and it was perfect, and I also got to watch my guest list drop from 710 users to 17. Thank you so much! My community and I really appreciate you! <3

Rhindeer

Quote from: shawnb61 on May 14, 2024, 05:25:40 PM: Note that some in the above list are valid search engines. [...] For now, I suggest you stay as restrictive as possible until you are stable.

Thank you for this - that makes total sense. I MAY do a test run in the future allowing Google, Yandex, and Bing back, since they've never given us issues (when they popped in we might get 20 extra guests, but never hundreds!), but for now I'm happy to keep everyone blocked. xD The Amazon bot was the worst one this round! I've never seen anything like it before.

shawnb61

To be clear - the above blocks the new bot called "GoogleOther", which is not the normal search-engine Googlebot. So Google search is still allowed to crawl.

GoogleOther is supposedly their research bot??? 

It does not honor crawl delay or any other robots.txt directive.  At all.  It has absolutely smashed my site. 

And get this - it was doing hundreds of 'action=verificationcode' calls per hour on my site...  From a valid Google IP. 

Blocked.

Rhindeer

Quote from: shawnb61 on May 14, 2024, 05:46:57 PM: It does not honor crawl delay or any other robots.txt directive. At all. [...] it was doing hundreds of 'action=verificationcode' calls per hour on my site... From a valid Google IP. Blocked.

Ewwww wtf???

Yeah, no thanks, that one'll stay blocked!

Steve

It appears the OP's problem has been resolved. Marking solved.

shawnb61

Quote from: shawnb61 on May 14, 2024, 05:25:40 PM: I have a love/hate relationship with yandex & bingbot... At various times I have had them blocked, because both can go off & crawl far too aggressively. But they both appear to be following crawl-delay at the moment.

For the record, yandex is not honoring crawl-delay anymore, so I blocked them again.

Oddly, they DO read robots.txt.  If they are disallowed there, they will honor that.  But not crawl-delay.

Their support site says they do not honor crawl-delay, and suggests you create an account with them to control crawls (like Google...)
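For a bot that reads robots.txt but ignores crawl-delay, the disallow is simple enough - something like:

User-agent: Yandex
Disallow: /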

