
Choopa invasion

Started by Chalky, October 24, 2012, 11:20:09 AM


Chalky

Does anyone know what Choopa is and why they appear to be attempting a DDoS on my forum, with over 30 instances showing on my Who's Online list?

The IPs are all of the form 173.199.1** and here are four of them.

173.199.115.107
173.199.119.155
173.199.115.3
173.199.120.91


ApplianceJunk

What do you mean by 30 instances?

How long do IPs stay on your Who's Online list?

kat

Having done a bit more research, it seems they're using "Ahrefs Web Crawler - Website Extractor", which is a bit naughty.

Maybe you ought to restrict them, either with .htaccess or by going to your site's cPanel and clicking "IP Deny Manager" under the "Security" section.

You'll figure the rest, I feel sure. :)

Or, put these two lines into the /robots.txt file on your server:

user-agent: AhrefsBot
disallow: /
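To check what those two lines actually ask of a crawler, Python's standard-library `urllib.robotparser` can evaluate them. This is a minimal sketch: the forum URL is a made-up placeholder, and it only shows what the rules request, since a crawler still has to choose to obey them. Note that `robotparser` matches on the product token before the first "/", so it is given `AhrefsBot` rather than the full `Mozilla/5.0 (compatible; AhrefsBot/...)` header.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt lines suggested above
ROBOTS_TXT = """\
user-agent: AhrefsBot
disallow: /
"""

# Placeholder forum URL (assumption, not from the thread)
FORUM_URL = "http://example.com/index.php?topic=1.0"


def allowed(agent_token: str, url: str = FORUM_URL) -> bool:
    """Return True if the rules above permit agent_token to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    print(allowed("AhrefsBot"))  # False: "disallow: /" covers every path
    print(allowed("Googlebot"))  # True: no rule names Googlebot
```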

Chalky

My Who's Online is set to 30 minutes. They have dropped to 9 now. Damned annoying when they don't seem to be a search engine or anything. Would robots.txt work when SMF doesn't recognize them as spiders? They have been lurking for several days now, but today they went mad!

kat

robots.txt is a web thing, not an SMF thing. :)

It works by the bot's user-agent name rather than by IP address, so it doesn't matter whether SMF recognizes them as spiders. According to my research, it's not a "bad" spider. It actually seems to obey robots.txt.

Jade Elizabeth

What does this "Ahrefs Web Crawler - Website Extractor" do exactly?
Once proud Documentation Writer and Help Squad Leader | Check out my new adult coloring career: Color With Jade/Patreon.

mrintech

If you use cPanel, block the complete IP range with IP Deny Manager, which accepts an implied range: http://docs.cpanel.net/twiki/bin/view/AllDocumentation/EnkompassHelp/IpDeny

Bad bots don't follow robots.txt.
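For reference, assuming an implied range of 173.199. stands for every 173.199.x.x address, that is 173.199.0.0 to 173.199.255.255, or 173.199.0.0/16 in CIDR notation. A minimal sketch with Python's standard `ipaddress` module, checking the four addresses Chalky posted against that block. Treating "the 173.199 range" as the whole /16 is an assumption; a WHOIS lookup is the way to confirm who actually owns it.

```python
import ipaddress

# Assumption: "the 173.199 range" means the whole 173.199.0.0/16 block
BLOCKED = ipaddress.ip_network("173.199.0.0/16")

# The four addresses Chalky saw on Who's Online
SEEN = ["173.199.115.107", "173.199.119.155", "173.199.115.3", "173.199.120.91"]


def is_blocked(addr: str) -> bool:
    """Return True if addr lies inside the blocked network."""
    return ipaddress.ip_address(addr) in BLOCKED


if __name__ == "__main__":
    for addr in SEEN:
        print(addr, is_blocked(addr))   # all True
    print(BLOCKED.num_addresses)        # 65536 addresses in a /16
    print(is_blocked("173.200.0.1"))    # False: outside the block
```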

kat

Quote from: Jade Elizabeth on October 24, 2012, 11:42:24 PM
What does this "Ahrefs Web Crawler - Website Extractor" do exactly?

What Google's bots do, essentially. Being an extractor, though, it could be trying to download entire sites. It's SUPPOSED to be for using a site offline.

Obviously, though, they can be a bit more nefarious...

http://www.websitescraping.com

Chalky

Thanks K@ and mrintech! There were over 50 of them on my Who's Online at one time this morning, so I have blocked the 173.199 range in cPanel and added what K@ said to robots.txt. They've gone now. Do you really think the little bastards were scraping my content? I'd certainly guess they're malicious anyway...

kat

Some places harvest sites like that.

Google does it.

When you do a search, look for the word "Cached", under the links, and you'll see loads of them. Some sites don't even exist, now, but Google have cached versions of them. They even have old versions of this site cached, somewhere.

mrintech

Quote from: K@ on October 25, 2012, 08:47:10 AM

When you do a search, look for the word "Cached", under the links, and you'll see loads of them. Some sites don't even exist, now, but Google have cached versions of them. They even have old versions of this site cached, somewhere.

???

Google keeps cached versions of sites that no longer exist for a few months and then drops them completely. The Wayback Machine (that's the ia_archiver bot), however, maintains a very good archive of the past:

2012: http://wayback.archive.org/web/*/http://www.simplemachines.org/
...
2003: http://wayback.archive.org/web/20030101000000*/http:////www.simplemachines.org//

:)

kat

Must've changed, then, coz they sure used to.

Anyway, what you added just further illustrates my point. :)

Jade Elizabeth

I think there's a new line I need to add to my robots.txt :-\

Kindred

Mind you... robots.txt is only useful for bots that look for and respect its instructions.

In other words, if it is a scraper, it probably won't respect the instructions in robots.txt, and you'll have to add ban instructions in your host manager or .htaccess.
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Jade Elizabeth

Yeah, but until I know the scrapers' IPs, this is my best bet lol.

waris

Quote from: ChalkCat on October 24, 2012, 03:17:06 PM
My who's online is set to 30 minutes. They have dropped to 9 now. Damned annoying when they don't seem to be a search engine or anything. Would robots.txt work when smf doesn't recognize them as spiders?  They have been lurking for several days now but today they went mad!

SMF will show them in Online Users as "Spiders" if you set your registration security to "High" or "Very High".

The downside is that your CAPTCHA will be harder for registrants to read.

Jade Elizabeth

You don't need to set rego security to high to see spiders... it's a separate setting.

Chalky

I have the spider setting enabled. Google, Bing, Alexa, Yahoo, etc. all show as spiders. These 50-odd Choopa IP addresses did not; they simply showed as guests. I have had no more of them since I blocked them in cPanel and robots.txt ;)

Jade Elizabeth

Ahh good. You can add spiders too, I believe, if you need to :)

Ricky.

Hmm... I keep facing the Choopa invasion on my forum, and the only discussion of it on the net seems to be here on our very own SMF community. I guess blocking their IP range is the best bet...

busterone

I had 78 Choopa IPs crawling all at one time last week. There was a bit of talk about them over at Stop Forum Spam's forum, but other than that, the only other place I have seen them mentioned is here. I blocked the entire 173.199 range last week, right after I started looking up the various IPs, all from 173.199.*.*. It all belongs to Choopa. Even if by some rare chance they are legit, I don't want 50 to 75 bots crawling my site all at once, but I suspect they are up to no good myself.
I see no reason for hundreds of bots from a bank of dedicated servers to be crawling in such mass numbers unless spammers are involved.

Ricky.

Or they are working hard to make a Google search rival :P

PS: I have also blocked the 173.199 range.

Jade Elizabeth

Now I have approximately 40 of them online :(. I'm adding to robots.txt now.

Jade Elizabeth

Well, since they're either completely ignoring robots.txt or I have the wrong entry, I'm just going to deny them in cPanel.

Jade Elizabeth

Why is it that when I add them to the spider list they aren't showing as spiders, even when they pop up later?

EDIT: they're adding to stats (in 5 minutes there have been 16 hits), and the log says they've visited... so why isn't the Who's Online list showing them as spiders?



Working now lol. They're averaging 20+ hits a minute.

Jade Elizabeth

Alright, so I added them by user agent to my spider list, and since then I have legitimately had 100 spiders online instead of 100 guests. It's horrible: now I have only 4 guests. I'd rather have real guests than fake ones, though!

Also, I noticed my bandwidth almost doubled last month out of nowhere, and this month, at the rate it was going, it would have used over two-thirds of my bandwidth allowance. So I'm glad I figured out what it was!
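If your host gives you raw access logs, one way to see which visitors are eating bandwidth is to total the response bytes per user agent. A minimal sketch, assuming Apache's "combined" log format; the sample lines are invented for illustration, not taken from a real log, and the simple regex doesn't handle user-agent strings containing escaped quotes.

```python
import re
from collections import Counter

# Assumed Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-) "[^"]*" "([^"]*)"'
)

AHREFS = "Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)"
BROWSER = "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0"

# Invented sample lines, for illustration only
SAMPLE = [
    f'173.199.115.107 - - [24/Oct/2012:11:20:09 +0000] "GET /index.php?topic=1.0 HTTP/1.1" 200 5120 "-" "{AHREFS}"',
    f'173.199.119.155 - - [24/Oct/2012:11:20:10 +0000] "GET /index.php?board=2.0 HTTP/1.1" 200 2048 "-" "{AHREFS}"',
    f'192.0.2.10 - - [24/Oct/2012:11:21:00 +0000] "GET /index.php HTTP/1.1" 304 - "-" "{BROWSER}"',
]


def bytes_by_agent(lines):
    """Sum the response size (in bytes) logged for each user agent."""
    totals = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that aren't in combined format
        size, agent = match.groups()
        totals[agent] += 0 if size == "-" else int(size)  # "-" means no body was sent
    return totals


if __name__ == "__main__":
    for agent, total in bytes_by_agent(SAMPLE).most_common():
        print(f"{total:>10} bytes  {agent}")
```

To run it on a real log, pass an open file instead of `SAMPLE`, since iterating a file yields one line at a time.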


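Adding this to my .htaccess:
# BLOCK USER AGENTS
# Return 403 Forbidden for everything except robots.txt when the user agent contains either string
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ahrefs [NC]
RewriteRule !^robots\.txt$ - [F]

# BLOCK BLANK USER AGENTS
# A RewriteCond only applies to the RewriteRule right after it, so this needs its own rule
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule !^robots\.txt$ - [F]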



Considering installing this:
http://perishablepress.com/blackhole-bad-bots/

Also, if you're interested... here are the spiders I added that use AhrefsBot. I added the last one once I realised I could just use keywords lol.

1. Utel Datacenter Networks (over 500 an hour)
Mozilla/5.0 (compatible; AhrefsBot/3.1; +http://ahrefs.com/robot/)
213.186.1**.***

2. Utel Datacenter Networks
Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)

3. Choopa (over 1000 an hour)
Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)

4. AhrefsBot
AhrefsBot
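The reason one keyword is enough: if the spider list is matched as a case-insensitive substring of the User-Agent header, the keyword AhrefsBot catches every version above, whichever network it comes from. A hypothetical sketch of that kind of matching; it is not SMF's actual code, and the keyword table is made up for illustration.

```python
# Hypothetical keyword table: lowercase keyword -> display name
SPIDER_KEYWORDS = {
    "googlebot": "Google",
    "bingbot": "Bing",
    "ahrefsbot": "AhrefsBot",
}


def classify(user_agent: str) -> str:
    """Return the spider name whose keyword appears in user_agent, else "Guest"."""
    ua = user_agent.lower()
    for keyword, name in SPIDER_KEYWORDS.items():
        if keyword in ua:
            return name
    return "Guest"


if __name__ == "__main__":
    agents = [
        "Mozilla/5.0 (compatible; AhrefsBot/3.1; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (compatible; AhrefsBot/3.0; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)",
        "AhrefsBot",
        "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0",
    ]
    for agent in agents:
        print(classify(agent), "<-", agent)
```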

Chalky

They are little swines, Jade. I blocked the range in my cPanel and haven't heard a peep out of them since.

Jade Elizabeth

That's not even the worst of it, darls... just add this to your .htaccess:

# BLOCK USER AGENTS
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DigExt [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sogou [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} majestic12 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} 80legs [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SISTRIX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Semrush [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TalkTalk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ahrefs [NC]
RewriteRule !^robots\.txt$ - [F]

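# BLOCK BLANK USER AGENTS
# A RewriteCond only applies to the RewriteRule right after it, so this needs its own rule
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule !^robots\.txt$ - [F]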
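To see how the user-agent block decides, here is a rough Python model of it: RewriteCond patterns without ^ or $ match anywhere in the header, [NC] makes them case-insensitive, [OR] chains them, and the RewriteRule pattern !^robots\.txt$ lets robots.txt through so blocked bots can still read it. This is a sketch for reasoning about the rules, not how Apache is implemented. It assumes the request path is given without its leading slash (the way mod_rewrite sees it in .htaccess), and it leaves out the blank-user-agent condition, which only takes effect when a RewriteRule follows it.

```python
import re

# Strings from the RewriteCond lines above
BLOCKED_AGENTS = [
    "AhrefsBot", "spbot", "DigExt", "Sogou", "MJ12", "majestic12", "80legs",
    "SISTRIX", "HTTrack", "Semrush", "Ezooms", "CCBot", "TalkTalk", "Ahrefs",
]

# An [OR] chain of unanchored [NC] patterns acts like one case-insensitive alternation
AGENT_RE = re.compile("|".join(map(re.escape, BLOCKED_AGENTS)), re.IGNORECASE)
# The RewriteRule pattern; the leading "!" in .htaccess negates it
ROBOTS_RE = re.compile(r"^robots\.txt$")


def is_forbidden(path: str, user_agent: str) -> bool:
    """True if the rules would answer 403 Forbidden ([F]).

    path is assumed relative, e.g. "index.php" or "robots.txt".
    """
    if ROBOTS_RE.match(path):
        return False  # !^robots\.txt$ means robots.txt is never blocked
    return AGENT_RE.search(user_agent) is not None


if __name__ == "__main__":
    ahrefs = "Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)"
    google = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(is_forbidden("index.php", ahrefs))   # True
    print(is_forbidden("robots.txt", ahrefs))  # False
    print(is_forbidden("index.php", google))   # False
```

Because each pattern is a plain substring match, a short one can also catch agents you didn't mean to block, so it's worth trying a few real user-agent strings from your logs before relying on the list.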


Some of those are malicious, some are harvesters, and the rest are just nuisances. I've been researching a LOT, and there's even an old one (TalkTalk) that might be malicious, but I can't tell. Better safe than sorry!! I've noticed an increase in speed and a HUGE drop in "guests": I went from 100+ to 3!

I got my user agents from http://myip.ms/, or by searching the IPs on Google.

busterone

They are bandwidth suckers for sure. Glad you got 'em blocked.   :)

waris

It is a US company and has been described by stopforumspam as:

Toxic IP address or "bad" email domain
Highlighted   Hot IP or disposable email address


They are now knocking at my door and being zapped by Bad Behaviour and Forum Firewall, and the validation questions help too.

Click on the guest user and you will see the IP; if you have the GeoIP mod installed, you will know exactly where in the US they are from.

CMOBOSS_OLD

I like to leave my domains open for a bit while I work on them, just to collect a good list of IPs to ban. ;)

Kindred

Banning IPs is basically pointless.
