Forum getting swarmed by 500+ guests?

Started by Rhindeer, May 10, 2024, 02:17:40 PM

bayonetbrant

Current .htaccess file looks like it was already modified in an attempt to clear out some bots.

@shawnb61 would the code from your GitHub acc't replace this?  or add to it?

Thanks!


#Simple .htaccess Bot Blocker Mod
# Rate limit search engine crawlers
<IfModule mod_ratelimit.c>
  SetEnvIfNoCase User-Agent "^Baiduspider" crawler_bot
  SetEnvIfNoCase User-Agent "^AhrefsBot" crawler_bot
  SetEnvIfNoCase User-Agent "^Ezooms" crawler_bot
  SetEnvIfNoCase User-Agent "^Googlebot" crawler_bot
  SetEnvIfNoCase User-Agent "^Slurp" crawler_bot
  SetEnvIfNoCase User-Agent "^msnbot" crawler_bot
  SetEnvIfNoCase User-Agent "^Yandex" crawler_bot

  RLimitCPU crawler_bot 1 1
  RLimitIO crawler_bot 512 512
  RLimitMEM crawler_bot 128 128
</IfModule>

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston\ Project|BravoBrian\ SpiderEngine\ MarcoPolo|Bot\ mailto:[email protected]|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent\ Internet\ ToolPak|Custo|cyberalert|DA$|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo\ Pump|DISCoFinder|Download\ Demon|Download\ Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx\.net|Email\ Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express\ WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites\ Sweeper|Fetch|FEZhead|FileHound|FlashGet\ WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go\!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green\ Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http\ generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image\ Stripper|Image\ Sucker|imagefetch|IncyWincy|Indy*Library|Indy\ Library|informant|Ingelin|InterGET|Internet\ Ninja|InternetLinkagent|Internet\ Ninja|InternetSeer\.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC\ Web\ Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS\ ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac\ Finder|Mag-Net|Mass\ Downloader|MCspider|Memo|Microsoft.URL|MIDown\ tool|Mirror|Missigua\ Locator|Mister\ PiX|MMMtoCrawl\/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS\ FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net\ Vampire|NetZIP|NetZip\ Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline\ Explorer|Offline\ 
Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa\ Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms\.it|Second\ Street\ Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport\ Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web\ Downloader|w3mir|Web\ Data\ Extractor|Web\ Image\ Collector|Web\ Sucker|Wweb|WebAuto|WebBandit|web\.by\.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo\ IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website\ eXtractor|Website\ Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon\ WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus [NC]
RewriteRule ^.* - [F,L]
</IfModule>
#Simple .htaccess Bot Blocker Mod

#Simple .htaccess Bot Blocker Mod
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ADSARobot|ah-ha|almaden|aktuelles|Anarchie|amzn_assoc|ASPSeek|ASSORT|ATHENS|Atomz|attach|attache|autoemailspider|BackWeb|Bandit|BatchFTP|bdfetch|big.brother|BlackWidow|bmclient|Boston\ Project|BravoBrian\ SpiderEngine\ MarcoPolo|Bot\ mailto:[email protected]|Buddy|Bullseye|bumblebee|capture|CherryPicker|ChinaClaw|CICC|clipping|Collector|Copier|Crescent|Crescent\ Internet\ ToolPak|Custo|cyberalert|DA$|Deweb|diagem|Digger|Digimarc|DIIbot|DISCo|DISCo\ Pump|DISCoFinder|Download\ Demon|Download\ Wonder|Downloader|Drip|DSurf15a|DTS.Agent|EasyDL|eCatch|ecollector|efp@gmx\.net|Email\ Extractor|EirGrabber|email|EmailCollector|EmailSiphon|EmailWolf|Express\ WebPictures|ExtractorPro|EyeNetIE|FavOrg|fastlwspider|Favorites\ Sweeper|Fetch|FEZhead|FileHound|FlashGet\ WebWasher|FlickBot|fluffy|FrontPage|GalaxyBot|Generic|Getleft|GetRight|GetSmart|GetWeb!|GetWebPage|gigabaz|Girafabot|Go\!Zilla|Go!Zilla|Go-Ahead-Got-It|GornKer|gotit|Grabber|GrabNet|Grafula|Green\ Research|grub-client|Harvest|hhjhj@yahoo|hloader|HMView|HomePageSearch|http\ generic|HTTrack|httpdown|httrack|ia_archiver|IBM_Planetwide|Image\ Stripper|Image\ Sucker|imagefetch|IncyWincy|Indy*Library|Indy\ Library|informant|Ingelin|InterGET|Internet\ Ninja|InternetLinkagent|Internet\ Ninja|InternetSeer\.com|Iria|Irvine|JBH*agent|JetCar|JOC|JOC\ Web\ Spider|JustView|KWebGet|Lachesis|larbin|LeechFTP|LexiBot|lftp|libwww|likse|Link|Link*Sleuth|LINKS\ ARoMATIZED|LinkWalker|LWP|lwp-trivial|Mag-Net|Magnet|Mac\ Finder|Mag-Net|Mass\ Downloader|MCspider|Memo|Microsoft.URL|MIDown\ tool|Mirror|Missigua\ Locator|Mister\ PiX|MMMtoCrawl\/UrlDispatcherLLL|^Mozilla$|Mozilla.*Indy|Mozilla.*NEWT|Mozilla*MSIECrawler|MS\ FrontPage*|MSFrontPage|MSIECrawler|MSProxy|multithreaddb|nationaldirectory|Navroad|NearSite|NetAnts|NetCarta|NetMechanic|netprospector|NetResearchServer|NetSpider|Net\ Vampire|NetZIP|NetZip\ Downloader|NetZippy|NEWT|NICErsPRO|Ninja|NPBot|Octopus|Offline\ Explorer|Offline\ 
Navigator|OpaL|Openfind|OpenTextSiteCrawler|OrangeBot|PageGrabber|Papa\ Foto|PackRat|pavuk|pcBrowser|PersonaPilot|Ping|PingALink|Pockey|Proxy|psbot|PSurf|puf|Pump|PushSite|QRVA|RealDownload|Reaper|Recorder|ReGet|replacer|RepoMonkey|Robozilla|Rover|RPT-HTTPClient|Rsync|Scooter|SearchExpress|searchhippo|searchterms\.it|Second\ Street\ Research|Seeker|Shai|Siphon|sitecheck|sitecheck.internetseer.com|SiteSnagger|SlySearch|SmartDownload|snagger|Snake|SpaceBison|Spegla|SpiderBot|sproose|SqWorm|Stripper|Sucker|SuperBot|SuperHTTP|Surfbot|SurfWalker|Szukacz|tAkeOut|tarspider|Teleport\ Pro|Templeton|TrueRobot|TV33_Mercator|UIowaCrawler|UtilMind|URLSpiderPro|URL_Spider_Pro|Vacuum|vagabondo|vayala|visibilitygap|VoidEYE|vspider|Web\ Downloader|w3mir|Web\ Data\ Extractor|Web\ Image\ Collector|Web\ Sucker|Wweb|WebAuto|WebBandit|web\.by\.mail|Webclipping|webcollage|webcollector|WebCopier|webcraft@bea|webdevil|webdownloader|Webdup|WebEMailExtrac|WebFetch|WebGo\ IS|WebHook|Webinator|WebLeacher|WEBMASTERS|WebMiner|WebMirror|webmole|WebReaper|WebSauger|Website|Website\ eXtractor|Website\ Quester|WebSnake|Webster|WebStripper|websucker|webvac|webwalk|webweasel|WebWhacker|WebZIP|Wget|Whacker|whizbang|WhosTalking|Widow|WISEbot|WWWOFFLE|x-Tractor|^Xaldon\ WebSpider|WUMPUS|Xenu|XGET|Zeus.*Webster|Zeus [NC]
RewriteRule ^.* - [F,L]
</IfModule>
#Simple .htaccess Bot Blocker Mod

shawnb61

Quote from: bayonetbrant on June 02, 2025, 11:42:58 AM
Current .htaccess file looks like it was already modified in an attempt to clear out some bots.

@shawnb61 would the code from your GitHub acc't replace this?  or add to it?

You have 3 blocks of code there...  The first uses mod_ratelimit.c which I'm not familiar with (though taken at face value, appears to be quite useful...)
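For reference, a hedged note on that first block: as far as the Apache documentation goes, RLimitCPU and RLimitMEM are core directives that cap resources for processes Apache forks (such as CGI scripts), they do not accept an environment variable like crawler_bot, and RLimitIO does not appear among the documented directives at all. What mod_ratelimit itself provides is a RATE_LIMIT output filter controlled by a rate-limit environment variable, in KiB/s. A minimal sketch of that documented form follows; the value 512 is only illustrative, and it assumes mod_env is loaded and the host allows these directives in .htaccess.

```apache
# Sketch only: this throttles every response served from this directory,
# not just crawlers.
<IfModule mod_ratelimit.c>
  SetOutputFilter RATE_LIMIT
  # rate-limit is expressed in KiB/s; 512 is an illustrative value
  SetEnv rate-limit 512
</IfModule>
```

Note that this limits bandwidth per response, which is a different thing from limiting CPU or memory use per crawler.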

The next two blocks use a pretty standard method for blocking via user agent, and appear to be exact clones.  One of those two is unnecessary.

A couple of thoughts:

If these are getting put in your .htaccess via a cPanel function, then they should be updated via cPanel.  Removing them manually won't work...

The bots that appear to be throttled in the first block are eliminated in my code, so that first block would be unnecessary.

The bots blocked in the next two overlap with my set, but only block by useragent & not by IP. 

So it comes down to how many of those useragents were put there deliberately due to specific issues with your forum.  It's up to you whether to merge with my list or not.

My initial thought is to simply replace it with mine.  If some of those guys unique to your old list come back, you can add them.  Keep a backup.

And read all my notes about my selection criteria over on GitHub.  Make sure you're comfy with the overall approach. 
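
To make the user agent versus IP distinction above concrete, here is a minimal sketch for Apache 2.4. It is not the list from GitHub. The bad_bot variable name, the ExampleBot pattern, and the address ranges (reserved documentation ranges) are all placeholders, and it assumes mod_setenvif and mod_authz_host are loaded and that the host permits these directives in .htaccess.

```apache
# Hypothetical example: flag requests by user agent (placeholder pattern).
<IfModule mod_setenvif.c>
  SetEnvIfNoCase User-Agent "ExampleBot" bad_bot
</IfModule>

# Apache 2.4 access control: allow everyone except flagged user agents
# and the listed ranges (documentation-only placeholder addresses).
<IfModule mod_authz_core.c>
  <RequireAll>
    Require all granted
    Require not env bad_bot
    Require not ip 192.0.2.0/24
    Require not ip 198.51.100.0/24
  </RequireAll>
</IfModule>
```

Try it on a copy first; a syntax error in .htaccess typically returns a 500 error for the whole directory.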
A question worth asking is born in experience & driven by necessity. - Fripp

bayonetbrant

Quote from: shawnb61 on June 02, 2025, 05:14:38 PM
My initial thought is to simply replace it with mine.  If some of those guys unique to your old list come back, you can add them.  Keep a backup.

And read all my notes about my selection criteria over on GitHub.  Make sure you're comfy with the overall approach. 

I will definitely keep a backup

Our .htaccess file was set up by one of our members who does some of this stuff professionally, but who left the community a while back b/c of some personality issues and is no longer available.

I did read your release notes (thank you for being comprehensive about them!) and I think the approach seems pretty sound.

Rhindeer

@shawnb61 I made that change to session.php and boom, that handled the issue! Our guests are now in the low double digits. Thank you so much!

shawnb61

Quote from: Rhindeer on June 04, 2025, 01:54:20 PM
@shawnb61 I made that change to session.php and boom, that handled the issue! Our guests are now in the low double digits. Thank you so much!

I'm glad it worked out.  I made that change back in January (https://github.com/SimpleMachines/SMF/pull/8394), & I haven't seen a spike in guests since.  More importantly, I haven't seen the corresponding spikes in MySQL CPU either.  Both charts got a serious buzz cut...

a10

So, the South America / Vietnam / Indonesia rats are back again: 17,000+ guests in 720 minutes, from a zillion unique IPs. Normal is 500 to 750 guests and bots. For pageviews over the last few days, see the attachment; normal is around 5,000.

I set the forum to login only and am testing some country blocks: Brazil, Vietnam, Indonesia (10,500 lines of ranges in .htaccess!). It helped, but there is always the risk of chaos and of blocking legit members and guests. Still lots of hits from Colombia, Chile, Ecuador, Argentina, etc.

Total madness, attempting to scrape everything, even trying members' profiles, probably a script counting up from forum/index.php?action=profile;u=1

But amazingly zero disruption to normal forum behaviour, speed etc.
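
For the profile enumeration described above, one possible stopgap is to refuse profile requests that carry no forum cookie. This is a sketch under assumptions, not a tested rule: it assumes the .htaccess sits in the forum directory, that mod_rewrite is available, and that the login cookie name contains "SMFCookie" (the name is configurable in SMF, so check yours). It also locks real guests out of profiles, and a bot that sends a fake cookie would get past it.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine On
  # Match action=profile anywhere in the query string (SMF separates with ; or &)
  RewriteCond %{QUERY_STRING} (^|[;&])action=profile([;&]|$) [NC]
  # Assumed cookie name: only requests without it are refused
  RewriteCond %{HTTP_COOKIE} !SMFCookie [NC]
  RewriteRule ^index\.php$ - [F,L]
</IfModule>
```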
2.0.19, php 8.0.30, MariaDB 10.6.18. Mods: Contact Page, Like Posts, Responsive Curve, Search Focus Dropdown, Add Join Date to Post.
Stand with 🇺🇦

HITG

Did you guys know a premium DNS can solve this without touching the .htaccess?

Kindred

Did you know that we can do it via .htaccess without paying for a "premium" DNS?
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

HITG

Quote from: Kindred on Yesterday at 05:54:32 PM
Did you know that we can do it via .htaccess without paying for a "premium" DNS?

The only thing is that .htaccess is meant to be left largely untouched when it comes to big forum engines.

vbgamer45

Cloudflare on the free plan is working well for me, and it can block by ASN as well. I have an article on country blocking at https://www.simplemachines.org/community/index.php?topic=591920.msg4190552#new

So far, the free plan has handled 8.7 million requests and 731k unique visitors in three days.

HITG

Quote from: vbgamer45 on Yesterday at 06:06:40 PM
Cloudflare on the free plan is working well for me, and it can block by ASN as well. I have an article on country blocking at https://www.simplemachines.org/community/index.php?topic=591920.msg4190552#new

So far, the free plan has handled 8.7 million requests and 731k unique visitors in three days.

If they ever start charging, you can just use a premium DNS, which can stop spam too. It's cheap for a year, and you can stop the traffic at the DNS level.