Choopa invasion

Started by Chalky, October 24, 2012, 11:20:09 AM

Chalky

Does anyone know what Choopa is, and why they appear to be attempting a DDoS on my forum, with over 30 instances showing in my Who's Online?

The IPs are all of the form 173.199.1** and here are four of them.

173.199.115.107
173.199.119.155
173.199.115.3
173.199.120.91
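
For anyone wanting to turn a handful of addresses like these into a single range, here is a rough sketch using Python's standard `ipaddress` module. The helper name `smallest_common_network` is made up for illustration, and the result is only the tightest block covering the four addresses listed above, not necessarily the range the host actually owns.

```python
import ipaddress

def smallest_common_network(addresses):
    """Return the smallest IPv4 network containing every address given."""
    ips = [ipaddress.ip_address(a) for a in addresses]
    # Start with a /32 holding only the first address, then widen one bit at a time
    # until every address in the list falls inside the network.
    net = ipaddress.ip_network(f"{ips[0]}/32")
    while not all(ip in net for ip in ips):
        net = net.supernet()
    return net

# The four sample addresses from the post above.
seen = ["173.199.115.107", "173.199.119.155", "173.199.115.3", "173.199.120.91"]
print(smallest_common_network(seen))  # 173.199.112.0/20
```

More addresses from the log may widen the result, so treat it as a starting point for a deny rule rather than a definitive range.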


ApplianceJunk

What do you mean by 30 instances?

How long do ips stay on your Whos online list?

kat

Having done a bit more research, it seems they're using "Ahrefs Web Crawler - Website Extractor", which is a bit naughty.

Maybe you ought to restrict them, using .htaccess or by going to your site's cPanel and, under the "Security" tab, clicking "IP Deny Manager".

You'll figure the rest, I feel sure. :)

Or, put these two lines into the /robots.txt file on your server:

user-agent: AhrefsBot
disallow: /
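
To see what those two lines ask of a well-behaved crawler, here is a small sketch using Python's standard `urllib.robotparser`. The `allowed` helper and the `/index.php` path are just illustrative assumptions. It only shows how a compliant bot would interpret the rules; it does nothing to stop a bot that ignores robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines suggested above.
ROBOTS_TXT = """\
user-agent: AhrefsBot
disallow: /
"""

def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler with this user-agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(allowed("AhrefsBot", "/index.php"))  # False: the named bot is asked to stay out
print(allowed("Googlebot", "/index.php"))  # True: no rule applies to other bots
```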

Chalky

My Who's Online is set to 30 minutes. They have dropped to 9 now. Damned annoying when they don't seem to be a search engine or anything. Would robots.txt work when SMF doesn't recognize them as spiders? They have been lurking for several days now but today they went mad!

kat

robots.txt is a web thing, not an SMF thing. :)

It won't block those IP addresses outright; it just asks the crawler, by name, to stay out. According to my research, it's not a "bad" spider. It actually seems to obey robots.txt.

Jade Elizabeth

What does this "Ahrefs Web Crawler - Website Extractor" do exactly?
Once proud Documentation Writer and Help Squad Leader | Check out my new adult coloring career: Color With Jade/Patreon.

mrintech

If you use cPanel, then block the complete IP Range using IP Deny Manager: http://docs.cpanel.net/twiki/bin/view/AllDocumentation/EnkompassHelp/IpDeny (Implied Range)

Bad bots don't follow robots.txt

kat

Quote from: Jade Elizabeth on October 24, 2012, 11:42:24 PM
What does this "Ahrefs Web Crawler - Website Extractor" do exactly?

Essentially what Google's bots do. Being an extractor, though, it could be trying to download entire sites. That's SUPPOSED to be for browsing a site offline.

Obviously, though, they can be a bit more nefarious...

http://www.websitescraping.com

Chalky

Thanks K@ and mrintech! There were over 50 of them in my Who's Online at one time this morning, so I have blocked the 173.199 range in cPanel and added what K@ said to robots.txt. They've gone now. Do you really think the little bastards were scraping my content? I'd certainly guess at them being malicious anyway....
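
As a sanity check on how much a "173.199" block covers, here is a minimal sketch with Python's standard `ipaddress` module. It assumes the implied range corresponds to 173.199.0.0/16; the `is_blocked` helper is hypothetical and only mimics the membership test, it is not how cPanel itself applies the rule.

```python
import ipaddress

# Assumption: blocking "173.199" means every address starting 173.199.x.x
BLOCKED = ipaddress.ip_network("173.199.0.0/16")

def is_blocked(addr):
    """Return True if addr falls inside the blocked range."""
    return ipaddress.ip_address(addr) in BLOCKED

print(BLOCKED.num_addresses)          # 65536 addresses are covered by a /16
print(is_blocked("173.199.115.107"))  # True
print(is_blocked("173.200.0.1"))      # False
```

A range that wide may also shut out legitimate visitors on the same network, so a narrower block is worth considering once the offending addresses are known.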

kat

Some places harvest sites, like that.

Google does it.

When you do a search, look for the word "Cached", under the links, and you'll see loads of them. Some sites don't even exist, now, but Google have cached versions of them. They even have old versions of this site cached, somewhere.

mrintech

Quote from: K@ on October 25, 2012, 08:47:10 AM

When you do a search, look for the word "Cached", under the links, and you'll see loads of them. Some sites don't even exist, now, but Google have cached versions of them. They even have old versions of this site cached, somewhere.

???

Google keeps cached versions of non-existent sites for some months and then drops them completely. The Wayback Machine (that IA Archiver bot), though, maintains a very good archive of the past:

2012: http://wayback.archive.org/web/*/http://www.simplemachines.org/
.
.
.
.
2003: http://wayback.archive.org/web/20030101000000*/http://www.simplemachines.org/

:)

kat

Must've changed, then, coz they sure used to.

Anyway, what you added just further illustrates my point. :)

Jade Elizabeth

I think there's a new line I need to add to my robots.txt :-\

Kindred

Mind you... robots.txt is only useful for bots that look for and respect the instructions.

In other words, if it is a scraper, it probably won't respect instructions in robots.txt and you'll have to add ban instructions in your host manager or .htaccess.

Jade Elizabeth

Yeah, but until I know the scrapers' IPs this is my best bet lol.

waris

Quote from: ChalkCat on October 24, 2012, 03:17:06 PM
My who's online is set to 30 minutes. They have dropped to 9 now. Damned annoying when they don't seem to be a search engine or anything. Would robots.txt work when smf doesn't recognize them as spiders?  They have been lurking for several days now but today they went mad!

SMF will show them as "Spiders" in Online Users if you set your Registration security to "High" or "Very High".

The downside is that your CAPTCHA will be harder for registrants to read.

Jade Elizabeth

You don't need to set rego security to high to see spiders....it's a setting by itself.

Chalky

I have the spider setting set.  Google, Bing, Alexa, Yahoo, etc all show as spiders.  These 50-odd Choopa IP addresses did not, they simply showed as guests.  I have had no more of them since I blocked them in cPanel and robots.txt  ;)

Jade Elizabeth

Ahh good. I believe you can add spiders too, if you need to :).
