Plagued by FB bots

Started by Dave J, June 04, 2024, 04:41:17 AM

Dave J

Firstly, sorry if this is not the place to post this.

SMF 2.1.4
PHP 8.1

I have an issue with plagues of Facebook bots on my site. Under normal circumstances I would add the first two sets of numbers of the IP address (e.g. '192.168') to the .htaccess file so it blocks them. But as you will see from the attachment, they are now using IPv6 addresses, and the variation in the addresses is so huge that blocking one IP has really no effect.

I have looked to see if there is a wildcard '*' you can use for IPv6, but it seems it's very limited and in most cases doesn't work. You can shorten the address, but that doesn't help either; see the second attachment.
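Rather than a '*' wildcard, IPv6 ranges are usually written in CIDR notation, where a prefix length covers every address that shares the same leading bits. A minimal sketch, assuming Apache 2.4 or later and a host that allows authorization directives in .htaccess. The prefix 2a03:2880::/32 is an assumption (a range commonly associated with Facebook) and should be checked against the addresses in your own logs:

```apache
# Apache 2.4+ (mod_authz_core with mod_authz_host)
<RequireAll>
    # Allow everyone by default...
    Require all granted
    # ...except the whole IPv6 block below.
    # Assumed prefix: confirm it against your own server logs.
    Require not ip 2a03:2880::/32
</RequireAll>
```

On Apache 2.2 the rough equivalent would be a "Deny from 2a03:2880::/32" line inside an Order Allow,Deny block.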

Does anyone know of anything that might help to stop the bots? Some of you may have seen that I did try BotBanish, which does work, but it has a detrimental effect on both the site and members trying to get to the site, even deleting the built-in spider data.

Any help is much appreciated.

Doug Heffernan

Have a look at the Bot Buster mod. I used it several months ago on the forum of a client who had the same problem with the Facebook bots, and surprisingly it worked quite well in blocking most of them. The mod will need to be upgraded to be compatible with SMF 2.1.x though.

AlanDewey

See this thread from 2 weeks ago:
Quote: What is up with facebook hammering my forum?
https://www.simplemachines.org/community/index.php?topic=588989.msg4174762#msg4174762

Those were all 'regular' IP addresses, so easy to block.

I have never seen an IPv6 address in the server logs for any of my forums or websites on my server.  I have no idea why.
Causing lots of electrons to push each other around since 1985.

Dave J

Quote from: Doug Heffernan on June 04, 2024, 05:57:23 AM
Have a look at the Bot Buster mod. I used it on a client's forum several months ago, who had the same problem with the facebook bots and surprisingly it worked quite well in blocking most of them. The mod will need to be upgraded to be compatible with Smf 2.1.x though.

Thanks Doug, I'll look into that.

Alan,

I read your topic, and I no longer have a Facebook account.

Kindred

You can block the Facebook user agent in .htaccess.
Glory to
Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Dave J

Quote from: Kindred on June 04, 2024, 11:36:11 AM
you can block the facebook agent in htaccess

I have already added some of the IPv6 addresses to it, Kindred, and I've also added 'fb.com' to deny... but like in The War of the Worlds, 'but still they come'.

Kindred

no...   don't ban by IP. ban by client

Option 1 using robots.txt - this should work:

User-agent: facebookexternalhit/1.1
Disallow: /

Option 2 using htaccess - this should work:

#   Block Facebook bot
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1  [NC]
RewriteRule ^ - [F,L]

Dave J

Quote from: Kindred on June 04, 2024, 11:44:53 AM
no...   don't ban by IP. ban by client

Option 1 using robots.txt - this should work:

User-agent: facebookexternalhit/1.1
Disallow: /

Option 2 using htaccess - this should work:

#   Block Facebook bot
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1  [NC]
RewriteRule ^ - [F,L]


Thank you very much Kindred, I'll try those out

Dave J

I tried the last one first, but that has made no difference. I added the code to the .htaccess file and, although it's probably a coincidence, the numbers have gone up.

I added the code around 17.00 and you can see what the numbers are now.

I have now created the robots.txt file and added it to the root; we'll see if that works.

shawnb61

I think there's a typo in the .htaccess string above.  The caret ^ anchors the pattern to the beginning of the string, so it will only match if the user agent starts with facebookexternalhit/1.1.
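To illustrate the point about the caret, here is a sketch of the same mod_rewrite block without the leading anchor and without the version number, so the pattern can match anywhere in the User-Agent header. It assumes mod_rewrite is enabled and that the rules sit above any other rewrite rules in the file:

```apache
# Requires mod_rewrite
RewriteEngine On
# No leading ^, so the pattern may match anywhere in the User-Agent header.
# [NC] makes the comparison case-insensitive.
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
# Answer every matching request with 403 Forbidden
RewriteRule ^ - [F]
```

Whether the missing match is really down to the anchor depends on the exact user agent strings in your logs, so it is worth checking those first.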

I use a different syntax:
BrowserMatchNoCase facebookexternalhit bad_bot
BrowserMatchNoCase claudebot bad_bot
<Limit GET POST HEAD>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>

And every time I find a new bad bot I just add another row to the list.
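The Order/Allow/Deny lines above are Apache 2.2 style access control. On Apache 2.4 they only work through the compatibility module, so a possible equivalent in the newer syntax is sketched below. It assumes Apache 2.4 or later with mod_setenvif available, and it reuses the bad_bot variable name from the block above:

```apache
# Flag matching user agents (mod_setenvif); add one line per bot
BrowserMatchNoCase facebookexternalhit bad_bot
BrowserMatchNoCase claudebot bad_bot

# Apache 2.4+ authorization: allow everyone except flagged requests
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Use one style or the other in a given .htaccess file; mixing the old Order/Allow/Deny directives with Require can give unexpected results.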
A question worth asking is born in experience & driven by necessity. - Fripp

Dave J

Thanks Shawn, I'll give that a go. The robots.txt hasn't worked either.

shawnb61

On a tangent, but... "facebookexternalhit" is supposedly the thumbnail generator.  This bot is crawling all content - clearly beyond the bounds of thumbnail generation. 

Dave J

Hmmm...no change. Currently 103 bots and 379 errors in the log

shawnb61

What errors are in the log?

I would test that the .htaccess is working by changing your browser's user agent temporarily.

This is very easy to do in Chrome, because it has a nice little interface for overriding your user agent for testing purposes.  In Chrome, open the dev console, then open the "Network Conditions" tab.  In the User agent section, uncheck the "Use browser default" box and choose "Custom"; you can then type anything in there.  Enter each bad bot's user agent in turn, navigate to your site, and you should be forbidden from accessing it.
https://developer.chrome.com/docs/devtools/device-mode/override-user-agent

Dave J

OK I tried that and I can still get to the site. See below. I have also attached my .htaccess file, maybe you'll see what's stopping it from working.

I deliberately took that screenshot so you can see the number of errors in the log. As the errors are related to the quiz, maybe Diego has a fix to stop that.

I would still like to stop the bots from being on the site

It's 23.30 here and time for bed. I'll pick it all up in the morning. Thanks guys

shawnb61

I believe all that stuff from line 27 down is from your cPanel function to block by IP.  I don't think you need that anymore - the block by user agent will be far more comprehensive.  I would remove everything from line 27 down, but keep the new block, and retest.

Dave J

I've removed all those lines. Although we haven't been plagued with them today, the Facebook bots are still coming in ones and twos.

shawnb61

Hmmm...  Should work.  Can you share what your .htaccess currently looks like?

Also, where is it located in your folder structure?


I blocked facebookexternalhit on my site as well.  Turns out it was using a fair amount of CPU.  Why spend my CPU coaching their AI?

The only downside is that links from FB to my site look kinda plain now.  Not a big deal. 

Dave J

See attached Shawn

It's in the same folder as the SMF files and folders, 'public_html'.

shawnb61

Try deleting lines 28-33