Blocking bots ruining forum UX

Started by nakedscientist, March 17, 2025, 03:18:26 AM


nakedscientist

Dear all

We've been using SMF for nearly two decades. We have / had a very successful forum with close to 100,000 pages of content.

Unfortunately, in the last month or so we have begun to be assailed by bots arriving in their thousands, scraping the forum.

We have a dedicated server, which is quite capable, but even so it cannot cope with the relentless pummelling. Caching doesn't help much: the bots examine every post and message, and it's the sheer volume of requests that paralyses the system. MySQL just slows to a crawl.

We're on SMF 2.0.19 (at the moment).

We urgently need a way to block this. It's impossible to use standard throttling because there are so many bots and they rotate their IPs literally every few minutes.

Implementing an invisible captcha behind the scenes would be an ideal solution so it doesn't harm guest visits (we have a lot of factual content that people just want to browse without logging in).

Does anyone have any solutions, or can help me? I've had to lock the forum temporarily to avoid it compromising the rest of our website.

Thanks

Chris

a10

AI world = bot world :O)

On my forum, crazy bots = crazy page views. On normal days it's approx. 7,500 (a combination of 'gentle' bots + guests + members); in special bot periods it can go up to 100,000, with zero apparent impact or problems. Hosting is a low-cost shared server.

Am curious about your server's capacity, how many page views daily?
2.0.19, php 8.0.30, MariaDB 10.6.18. Mods: Contact Page, Like Posts, Responsive Curve, Search Focus Dropdown, Add Join Date to Post.
Stand with 🇺🇦

Kindred

well, for one, you should upgrade to 2.1.4

for two, there have been several discussions recently on dealing with the bots -- the only way to deal with many/most is to use .htaccess and block them from the site entirely

Quote from: shawnb61 on March 12, 2025, 04:25:05 PM
Yeah, unfortunately crawlers are running amok.

If you haven't already, you probably need to work on a robots.txt & .htaccess file.

More here:
https://www.simplemachines.org/community/index.php?msg=4179600

And here:
https://www.simplemachines.org/community/index.php?msg=4186334

Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

nakedscientist

Thanks - I got one idea from there - to disable host lookups; I'd overlooked that.

The problem we have is that we're getting nearly 10,000 simultaneous users, with 100,000 pages of content, and it's killing the database. Because they keep rotating their IPs, and many of them are nondescript anonymous / US / Brazilian / Mexican ISPs, it's hard to stop them.

We're also using nginx as a reverse proxy, and I've implemented bad bot blocker there, but they're still getting through.

The main site has a cache accelerator, which can outrun the bad guys, but because the forum is dynamic, once enough of them pile on, it just grinds MySQL to a standstill.

Sir Osis of Liver

You can try adding this to .htaccess. We're using it to block mostly Chinese bots on a forum that was crashing from process spikes. We update it regularly with new IPs, but the traffic has slacked off recently and we're getting very few new ones. Disabling hostname lookups prevents the forum from getting into a death loop.


# Updated 2-2-25
Order Allow,Deny
Deny from 47.128.0.0/16
Deny from 59.56.0.0/16
Deny from 59.57.0.0/16
Deny from 59.58.0.0/16
Deny from 59.59.0.0/16
Deny from 59.60.0.0/16
Deny from 59.61.0.0/16
Deny from 60.166.0.0/16
Deny from 60.167.0.0/16
Deny from 60.169.0.0/16
Deny from 60.170.0.0/16
Deny from 60.171.0.0/16
Deny from 60.172.0.0/16
Deny from 60.174.0.0/16
Deny from 60.175.0.0/16
Deny from 84.108.0.0/16
Deny from 110.81.0.0/16
Deny from 110.82.0.0/16
Deny from 110.85.0.0/16
Deny from 110.86.0.0/16
Deny from 110.87.0.0/16
Deny from 113.98.0.0/16
Deny from 113.99.0.0/16
Deny from 113.100.0.0/16
Deny from 113.101.0.0/16
Deny from 113.102.0.0/16
Deny from 113.103.0.0/16
Deny from 113.105.0.0/16
Deny from 113.109.0.0/16
Deny from 113.110.0.0/16
Deny from 113.111.0.0/16
Deny from 113.116.0.0/16
Deny from 113.117.0.0/16
Deny from 113.118.0.0/16
Deny from 113.119.0.0/16
Deny from 113.132.0.0/16
Deny from 113.133.0.0/16
Deny from 113.134.0.0/16
Deny from 113.224.0.0/16
Deny from 113.225.0.0/16
Deny from 113.226.0.0/16
Deny from 113.227.0.0/16
Deny from 113.228.0.0/16
Deny from 113.229.0.0/16
Deny from 113.230.0.0/16
Deny from 113.231.0.0/16
Deny from 113.232.0.0/16
Deny from 113.233.0.0/16
Deny from 113.235.0.0/16
Deny from 113.234.0.0/16
Deny from 113.236.0.0/16
Deny from 113.237.0.0/16
Deny from 113.238.0.0/16
Deny from 113.239.0.0/16
Deny from 116.224.0.0/16
Deny from 116.226.0.0/16
Deny from 116.230.0.0/16
Deny from 116.231.0.0/16
Deny from 116.232.0.0/16
Deny from 116.233.0.0/16
Deny from 116.235.0.0/16
Deny from 116.237.0.0/16
Deny from 117.24.0.0/16
Deny from 117.26.0.0/16
Deny from 117.27.0.0/16
Deny from 117.28.0.0/16
Deny from 117.29.0.0/16
Deny from 117.30.0.0/16
Deny from 117.31.0.0/16
Deny from 117.192.0.0/16
Deny from 117.200.0.0/16
Deny from 117.203.0.0/16
Deny from 117.208.0.0/16
Deny from 117.209.0.0/16
Deny from 117.213.0.0/16
Deny from 117.216.0.0/16
Deny from 117.219.0.0/16
Deny from 117.221.0.0/16
Deny from 117.222.0.0/16
Deny from 117.231.0.0/16
Deny from 117.232.0.0/16
Deny from 117.235.0.0/16
Deny from 117.247.0.0/16
Deny from 117.248.0.0/16
Deny from 117.253.0.0/16
Deny from 117.252.0.0/16
Deny from 117.254.0.0/16
Deny from 118.112.0.0/16
Deny from 118.113.0.0/16
Deny from 118.114.0.0/16
Deny from 118.116.0.0/16
Deny from 118.118.0.0/16
Deny from 118.119.0.0/16
Deny from 118.248.0.0/16
Deny from 118.249.0.0/16
Deny from 118.250.0.0/16
Deny from 118.251.0.0/16
Deny from 118.254.0.0/16
Deny from 118.255.0.0/16
Deny from 119.162.0.0/16
Deny from 120.32.0.0/16
Deny from 120.33.0.0/16
Deny from 120.34.0.0/16
Deny from 120.35.0.0/16
Deny from 120.36.0.0/16
Deny from 120.37.0.0/16
Deny from 120.38.0.0/16
Deny from 120.40.0.0/16
Deny from 120.41.0.0/16
Deny from 120.42.0.0/16
Deny from 120.43.0.0/16
Deny from 121.205.0.0/16
Deny from 121.206.0.0/16
Deny from 121.207.0.0/16
Deny from 123.180.0.0/16
Deny from 123.181.0.0/16
Deny from 123.182.0.0/16
Deny from 123.183.0.0/16
Deny from 125.77.0.0/16
Deny from 125.78.0.0/16
Deny from 175.44.0.0/16
Deny from 219.131.0.0/16
Deny from 219.133.0.0/16
Deny from 219.136.0.0/16
Deny from 219.137.0.0/16
Deny from 220.161.0.0/16
Deny from 222.208.0.0/16
Deny from 222.209.0.0/16
Deny from 222.210.0.0/16
Deny from 222.211.0.0/16
Deny from 222.212.0.0/16
Deny from 222.213.0.0/16
Deny from 222.214.0.0/16
Allow from all

When in Emor, do as the Snamors.
                              - D. Lister

nakedscientist

Thanks; we've got a standalone firewall in front of the server, so if the logs confirm that these IPs are part of the onslaught, I'll create an address group for this little lot!
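Many of the /16 ranges in the Deny list above are adjacent, so they can be merged into fewer, larger blocks before being entered into a firewall address group. A minimal sketch, assuming the list has been saved as text in the same `Deny from <cidr>` form; the function name `collapse_deny_list` is hypothetical, and the merging is done by Python's standard `ipaddress` module:

```python
import ipaddress


def collapse_deny_list(htaccess_text):
    """Merge the networks named by 'Deny from <cidr>' lines into the fewest CIDR blocks."""
    networks = []
    for line in htaccess_text.splitlines():
        parts = line.split()
        # Only consider lines of the exact form: Deny from <something>
        if len(parts) != 3 or parts[0].lower() != "deny" or parts[1].lower() != "from":
            continue
        try:
            networks.append(ipaddress.ip_network(parts[2], strict=False))
        except ValueError:
            # Skip non-CIDR targets such as "all" or a hostname
            continue
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


# A short excerpt of the list above, used as sample input
sample = """\
Order Allow,Deny
Deny from 59.56.0.0/16
Deny from 59.57.0.0/16
Deny from 59.58.0.0/16
Deny from 59.59.0.0/16
Deny from 59.60.0.0/16
Deny from 59.61.0.0/16
Allow from all
"""

print(collapse_deny_list(sample))  # ['59.56.0.0/14', '59.60.0.0/15']
```

Merging only combines ranges that are already listed, so it never blocks more address space than the original list does.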

Is there a way to write Google reCAPTCHA v3 into the threads so that it detects and autoblocks activity that is bot-like?

Doesn't seem like it would be rocket science, but I don't know the forum software anatomy sufficiently well to know where to implement it...
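For orientation, the server-side half of a reCAPTCHA v3 check is small: the page obtains a token in the browser, and the server posts it to Google's `siteverify` endpoint and inspects the returned `success`, `score` and `action` fields. The sketch below shows only that logic, in Python for brevity; it is not an SMF hook (SMF is PHP), and the 0.5 threshold, the action name `view_topic` and both function names are assumptions for illustration:

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def looks_human(result, expected_action, min_score=0.5):
    """Decide from a siteverify response whether to serve the page.

    min_score is an assumed starting threshold; tune it against real traffic.
    """
    return (
        result.get("success") is True
        and result.get("action") == expected_action
        and float(result.get("score", 0.0)) >= min_score
    )


def verify_token(secret, token, remote_ip=None):
    """POST the browser's token to Google and return the decoded JSON reply."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=5) as resp:
        return json.load(resp)


# Example decision on a response shaped like Google's reply (values made up)
reply = {"success": True, "score": 0.9, "action": "view_topic"}
print(looks_human(reply, "view_topic"))
```

Each verification is an extra outbound request per page view, so under a flood it would probably need to be combined with a short-lived cookie or session flag so that a visitor is checked once rather than on every topic.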

a10

Quote from: nakedscientist on March 17, 2025, 12:51:09 PM
The problem we have is that we're getting nearly 10,000 simultaneous users

Wow! World going mad ( like Elon Musk & Co )

Can only suggest htaccess as mentioned above, large ranges. A pain, as it's lots of work to pick IPs and maintain the list, but should give results over time. Best of luck.

nakedscientist

Apache is the backend, so it sees requests only from the cache accelerator, which is 127.0.0.1 (localhost); so I have to do the config on nginx at the front. Not a problem, but time consuming.

The firewall is easier to configure, and blocks the traffic right out of the fibre, before it even sees the network, which is far safer, and reduces server load.

I think I need a firewall mod that is anti-bot...

AlanDewey

I installed a Ubiquiti EdgeRouter between the ISP gateway and my server. I add nasty IPs to the "drop outgoing packets" rule (outgoing from the router to my server). So my server is never again bothered by those IPs; it is completely unaware of the requests.

When server logs show a jump in size from 4 megabytes to 30 megabytes, I peruse them to see what new IPs I need to block.
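Perusing a 30 megabyte log by eye is slow, so the tallying can be scripted. A rough sketch, assuming an access log whose first field on each line is the client IPv4 address (as in the common and combined log formats) and that the log comes from whichever machine sees the real client IP; the function name `busiest_ranges` and the sample lines are hypothetical:

```python
import ipaddress
from collections import Counter


def busiest_ranges(log_lines, prefix=16, top=10):
    """Count requests per IPv4 range of the given prefix length."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address
        if addr.version != 4:
            continue  # this sketch groups IPv4 only
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts.most_common(top)


# Made-up sample lines in combined log format
sample_log = [
    '47.128.1.10 - - [17/Mar/2025:03:18:26 +0000] "GET /index.php?topic=1.0 HTTP/1.1" 200 5120',
    '47.128.44.7 - - [17/Mar/2025:03:18:27 +0000] "GET /index.php?topic=2.0 HTTP/1.1" 200 4096',
    '47.128.200.3 - - [17/Mar/2025:03:18:27 +0000] "GET /index.php?topic=3.0 HTTP/1.1" 200 4096',
    '203.0.113.5 - - [17/Mar/2025:03:18:28 +0000] "GET /index.php HTTP/1.1" 200 2048',
]

for network, hits in busiest_ranges(sample_log):
    print(network, hits)
```

Grouping by /16 matches the granularity of the block list earlier in the thread, and it means bots that rotate addresses within one provider's range still add up to a single large count.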

The stupid thing is I can block a range of annoying/hammering IPs and a month later those bots are still hammering my router with requests even though they have not gotten a response from my IP address for a month.
Causing lots of electrons to push each other around since 1985.

nakedscientist

It's been quite eye-opening doing this: the amount of data that Amazon and Facebook alone are slurping up is huge... Not any more from our site though!
