Content Scrapers...spammer?

Started by ~DS~, February 12, 2010, 07:28:27 PM

~DS~

Today I got many hits from 174.133.177.66 in the error logs.
I banned the IP, but it did no good: it still indexes the forum and accesses every URL and page. I was told it is a content scraper. I unbanned the IP because I thought it would help the traffic.
"There is no god, and that's the simple truth. If every trace of any single religion were wiped out and nothing were passed on, it would never be created exactly that way again. There might be some other nonsense in its place, but not that exact nonsense. If all of science were wiped out, it would still be true and someone would find a way to figure it all out again."
~Penn Jillette – God, NO! – 2011

Garou

That IP, if not spoofed, is static and belongs to ThePlanet.com. You may try contacting them about the scraper.

I don't know why banning it wouldn't solve the issue for SMF. Perhaps try blocking the IP through your ISP's control panel.

busterone

I banned it in .htaccess quite a while back. It was attempting all sorts of access, and totally ignoring my rules set up in robots.txt. 

~DS~

Quote from: busterone on February 12, 2010, 10:33:06 PM
I banned it in .htaccess quite a while back. It was attempting all sorts of access, and totally ignoring my rules set up in robots.txt.
Rules in robots.txt? Sorry, I am a newbie.

busterone

That's ok. I was too at one time. Take a look here and it will explain it better than I can.
There's a lot of stuff here on simplemachines.org about it too.  :)
http://www.robotstxt.org/robotstxt.html
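
To give a rough idea of what such rules look like, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the paths, and the bot names (`GoodBot`, `BadBot`) are made up for illustration and are not taken from anyone's real file. It only shows how a well-behaved crawler would read the rules; nothing forces a bot to check them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; paths and bot names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if a crawler that honors robots.txt may fetch url."""
    return parser.can_fetch(agent, url)


# example.com is a placeholder host.
print(allowed("GoodBot", "http://example.com/index.php"))
print(allowed("GoodBot", "http://example.com/admin/"))
print(allowed("BadBot", "http://example.com/index.php"))
```

A compliant bot would skip `/admin/` and `/private/`, and `BadBot` would skip the whole site, but only if it chooses to ask.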

vbgamer45

Scrapers don't normally respect robots.txt, though.

busterone

Very true. That is why I just blocked it from all my sites.  :)

~DS~

Quote from: busterone on February 12, 2010, 11:11:41 PM
Very true. That is why I just blocked it from all my sites.  :)
Blocked? You mean banned? If so, where and how do you block it?

busterone

You can ban the IP in your host's cPanel, or manually in your .htaccess file.
Here is an example from my .htaccess file. I x'd out the other addresses,
and did not include the entire file but you can get an idea of what it looks like.

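## USER IP BANNING
# Only the request methods listed in <Limit> are restricted.
<Limit GET POST>
# Allow rules are checked first, then Deny rules; a matching Deny rejects the request.
order allow,deny
deny from 174.133.177.66
deny from XXX.XXX
deny from XXX
deny from XXX.XXX
allow from all
</Limit>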
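
In case it is unclear why `allow from all` on the last line does not undo the bans: with `order allow,deny` in Apache 2.2, a request has to match an Allow rule and must not match any Deny rule, regardless of the order the lines are written in. The Python below is only a rough simulation of that documented rule, not Apache code. The function names are hypothetical, `192.0.2.10` is an arbitrary example address, and the sketch handles only `all`, full IPs, and partial IPs (no hostnames or CIDR ranges).

```python
def matches(spec, ip):
    """True if an Allow/Deny spec covers ip: 'all', a full IP, or a partial IP."""
    if spec == "all":
        return True
    spec_octets = spec.rstrip(".").split(".")
    # A partial IP such as "174.133" matches on whole leading octets.
    return ip.split(".")[:len(spec_octets)] == spec_octets


def order_allow_deny(ip, allow, deny):
    """Simulate 'Order allow,deny': need an Allow match and no Deny match."""
    if not any(matches(spec, ip) for spec in allow):
        return False  # nothing allowed it, so the default is to deny
    return not any(matches(spec, ip) for spec in deny)


allow_rules = ["all"]
deny_rules = ["174.133.177.66"]

print(order_allow_deny("174.133.177.66", allow_rules, deny_rules))  # False
print(order_allow_deny("192.0.2.10", allow_rules, deny_rules))      # True
```

So the banned address is rejected while everyone else still gets in.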
