Possible 'new' way to control A.I. bots?

Started by AlanDewey, April 30, 2025, 05:32:24 PM

AlanDewey

crawler.amazonbot.amazon is killing my forum.  My forum has over 200,000 posts, so it would take quite a while for the bots to go through it all.  They are using dozens of IP addresses, so it is probably impossible to ban them manually by IP.

So I turned on hostname lookups and made a permanent ban on amazonbot.amazon.  There have been 22,000 ban log entries in the last 26 hours and they are not slowing down.  I understand that hostname lookups put a lot of overhead on the server, but that would be less work than serving 200,000 posts, plus attachments.

So I have two thoughts/questions:

1) Instead of SMF replying with the ban information, why not just ignore all requests (do not reply with even one packet)? Maybe they would go away after some time.  My server indicates that it replies with 3386 bytes to each bot request; that was about 74 megabytes in the last 26 hours. Or, instead of serving the ban reason, just reply with HTTP 500?  (or 418  :-)

2) Instead of 22,000 hostname lookups each day, why not build a store of the IPs the bot has used today?  After the 2nd or 3rd request from an IP used by that bot, the ban would check the store and ban that IP without another lookup.  This "IP ban" database(?) would be cleared once(?) per day(?) and start over.  I think there would then be fewer than 100 hostname lookups per day (depending on how many IPs they are using).
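
The idea in point 2 could be sketched roughly as follows. This is only an illustration in Python, not SMF code: the class name `BanCache`, the `resolver` and `clock` parameters, and the 24-hour lifetime are assumptions made for the example. Each IP is resolved once, the decision is remembered, and the entry expires after a day so that reassigned IPs get re-checked.

```python
import socket
import time

BANNED_SUFFIX = ".amazonbot.amazon"   # hostname suffix taken from the thread
CACHE_TTL_SECONDS = 24 * 60 * 60      # "cleared once per day" (assumed value)


class BanCache:
    """Hypothetical helper: remember per-IP ban decisions for a limited time."""

    def __init__(self, resolver=None, clock=time.time, ttl=CACHE_TTL_SECONDS):
        self._resolver = resolver or self._reverse_dns
        self._clock = clock
        self._ttl = ttl
        self._decisions = {}   # ip -> (is_banned, expires_at)
        self.lookups = 0       # number of hostname lookups actually performed

    @staticmethod
    def _reverse_dns(ip):
        """Return the reverse-DNS name for ip, or "" if it cannot be resolved."""
        try:
            return socket.gethostbyaddr(ip)[0]
        except OSError:
            return ""

    def is_banned(self, ip):
        now = self._clock()
        cached = self._decisions.get(ip)
        if cached is not None and cached[1] > now:
            return cached[0]            # answered from the store, no lookup
        self.lookups += 1
        hostname = self._resolver(ip).lower().rstrip(".")
        banned = hostname.endswith(BANNED_SUFFIX)
        self._decisions[ip] = (banned, now + self._ttl)
        return banned
```

With a store like this, the number of lookups per day is bounded by the number of distinct IPs seen rather than the number of requests, which is the saving the post is after.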



"AI" is simply a huge industry plagiarizing everybody's work.
Causing lots of electrons to push each other around since 1985.

Aleksi "Lex" Kilpinen

#1
1) The ban is designed to be informative, because it is designed with actual users in mind. Consider banning on the server level, through htaccess for example, if you want to stop the bots from ever reaching SMF in the first place. https://httpd.apache.org/docs/2.0/mod/mod_access.html#deny

2) See number 1, this sounds more suited to be a mod. Some mod might even already exist that does something similar. You could also just find out the IP range used, and ban them all at once, through htaccess for example.
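
One way to "find out the IP range used" is to group the IPs from the ban log by network prefix and see which blocks dominate. The sketch below is only an illustration in Python: the function name `summarize_by_prefix`, the /24 grouping, and the sample addresses are assumptions, and the real ranges would have to come from your own logs.

```python
import ipaddress
from collections import Counter


def summarize_by_prefix(ips, prefix_len=24):
    """Count how many observed IPs fall into each network of the given size."""
    counts = Counter(
        # strict=False lets a host address stand in for its enclosing network
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in ips
    )
    # Busiest networks first; ties are ordered by network address
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample; replace with the IPs from your ban log
observed = ["18.209.137.234", "18.209.137.10", "3.80.1.1"]
for network, hits in summarize_by_prefix(observed):
    print(network, hits)
```

The networks at the top of the list are the candidates for a single range ban at the server or firewall level.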
Slava
Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

sudoku

It is only going to get worse as time goes by...  O:)

AlanDewey

I am using IIS 10 on Windows Server 2016 which does not use .htaccess

I can add, and have added, huge IP ranges in "IP Address and Domain Restrictions", but it will not let me put hostnames in there.

I have not found help on this by searching the internet.

All suggestions will be highly appreciated.

Aleksi "Lex" Kilpinen

I'm not very familiar with IIS configuration, but I'd assume Domain Restrictions to actually mean hostnames.
https://learn.microsoft.com/en-us/iis/configuration/system.webserver/security/ipsecurity/add

vbgamer45

On Windows with IIS, I use firewall rules to block IP ranges.

AlanDewey

Success !

I tried this a few days ago, but every time I typed in the hostname it replied "invalid IP address".

So it looks like I changed something so now it works.  I do not know what I did, though.

The nice thing about this is it blocks them from all my websites  :laugh:

Thanks to everyone for the help.

AlanDewey

The entry should be   *.crawl.amazonbot.amazon   which works better.

I did not realize that 18-209-137-234.crawl.amazonbot.amazon  would not be stopped by my initial entries.  (Well, that is how it works in IIS 10.)
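
The difference between an exact entry and a wildcard entry can be illustrated with glob-style matching. This is a sketch in Python using `fnmatch`, not a description of how IIS matches names internally, and the two non-wildcard entries are hypothetical examples rather than the entries originally tried.

```python
from fnmatch import fnmatchcase

# Hostname reported in the thread
host = "18-209-137-234.crawl.amazonbot.amazon"

entries = [
    "amazonbot.amazon",           # hypothetical exact entry
    "crawl.amazonbot.amazon",     # hypothetical exact entry
    "*.crawl.amazonbot.amazon",   # wildcard entry from the thread
]

for entry in entries:
    # An entry without "*" only matches a hostname identical to it
    print(f"{entry!r} matches: {fnmatchcase(host, entry)}")
```

Only the wildcard entry matches, because the bot's hostnames carry a per-IP label in front of the shared domain.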