More Spiders

Started by SleePy, April 10, 2008, 11:01:45 PM

karlbenson

SMF 2.x adds the ability to detect (and optionally restrict) spiders on your forum.
By default only a few are detected: Googlebot, Slurp (Yahoo) and msnbot.
This mod adds many more.

PrizeLive.com

Just wondering though, what good do spiders do on your forum?
Get Paid Instantly via PayPal (or other options) at PrizeLive.com!

karlbenson

Spiders get you in search engines.
If you don't allow spiders, then don't expect to appear in Google, Yahoo, MSN or any other search engine.

Some of the spiders/bots/crawlers in this mod are not for search engines but are tools like the W3C Validator, so you can see when it's being run on your forum.

PrizeLive.com

I just installed the "Googlebot & Spiders" mod and made the changes you said to make, and now my homepage won't load.

karlbenson

Did it work after installing the mod, but before making the changes I posted?
You may have made a small mistake in one of the edits, which would give you a white page.

Double check your edits.

If you still can't spot it, upload your Sources/Subs.php here and I'll take a quick look.

PrizeLive.com

I deleted all the edits I made and will re-attempt later and let you know :).

next-evolution

Thank you very much for the mod :D

FragaCampos

In this who.template I don't have a "$known_spiders = array (".
I have a "$known_agents = array (" instead.

Is it the same? Can I add the spiders' list you provided to this who.template?

Thanks once more.

karlbenson

I'm not sure what you're using it with.

The attached edits I posted are for the Googlebot mod, which has them in Sources/Subs.php.
Not who.template

karlbenson

1.1 - 4th May 2008
o Fixed Alexa/InternetArchive
o 25 More Spiders added (which have been detected on my forum in the past month)

FragaCampos

Quote from: karlbenson on May 02, 2008, 07:15:34 PM
I'm not sure what you're using it with.

The attached edits I posted are for the Googlebot mod, which has them in Sources/Subs.php.
Not who.template

Got it ;) Thanks again!

rumfa

Will it work the same if I copy the spider list into the who.template?

karlbenson

@rumfa are you referring to the Who.template.php spiders mod?
http://www.simplemachines.org/community/index.php?topic=19243.0

You would have to manipulate the array as posted in the attachment.

rumfa

OK, I did it and added the whole list. How do I add a custom spider? I have some local spiders here. One of them is:

(85.10.36.115, Mozilla/5.0 (compatible; Pogodak.co.yu/3.1))
Do I just add the following?
array(
    'agent' => 'Pogodak',
    'spidername' => 'Pogodak',
    'spider' => true,
),

And of course do the same in Subs.php, but without 'spider' => true?
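
For anyone following along, here is a minimal sketch of how an entry like the one above might be matched against a visitor's user agent. It assumes the list is a plain array of entries with 'agent' and 'spidername' keys and that matching is a case-insensitive substring test. The function name findSpider and the sample Googlebot entry are hypothetical and are not taken from SMF or from the mod.

```php
<?php
// Hypothetical spider list in the shape quoted above (not the mod's real list).
$known_spiders = array(
    array(
        'agent' => 'Googlebot',
        'spidername' => 'Google',
        'spider' => true,
    ),
    array(
        'agent' => 'Pogodak',
        'spidername' => 'Pogodak',
        'spider' => true,
    ),
);

// Return the 'spidername' of the first entry whose 'agent' string occurs
// in the user agent (case-insensitive), or null when nothing matches.
function findSpider($user_agent, array $spiders)
{
    foreach ($spiders as $spider) {
        if (stripos($user_agent, $spider['agent']) !== false) {
            return $spider['spidername'];
        }
    }
    return null;
}

echo findSpider('Mozilla/5.0 (compatible; Pogodak.co.yu/3.1)', $known_spiders), "\n";
```

If the mod really does match by substring, a short and distinctive 'agent' value such as 'Pogodak' should be enough, with no need to list the bot's IP address.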

karlbenson


2Ntense

There's one I've seen a lot that isn't in your list (haven't looked at the regular 2.0b code yet):

oBot = Cobion.com

From what I could glean from the skimpy info on their site last year, it was most likely hunting for copyright violations.  It was producing twice as many hits on the database per day as Google and Yahoo combined.  I banned it in the robots.txt file and it ignored the ban, so I blocked the sucker in the .htaccess file.

That forum didn't have any problems with hot software.  Not all of us are into warez...
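
The block described above was done in .htaccess, and the exact rule is not given in this thread. As a rough PHP-level analogue only, the sketch below refuses a request when the user agent contains a blocked token. The function name isBlockedAgent is hypothetical, the token 'oBot' is taken from the post above, and the exact user agent string that bot sends is an assumption.

```php
<?php
// Tokens to refuse. 'oBot' comes from the post above; the bot's full
// user agent string is not known here, so treat this as a placeholder.
$blocked_agents = array('oBot');

// True when the user agent contains any blocked token. The comparison is
// case-sensitive to reduce accidental matches on such a short token.
function isBlockedAgent($user_agent, array $blocked)
{
    foreach ($blocked as $token) {
        if (strpos($user_agent, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The header may be missing (for example when run from the command line).
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (isBlockedAgent($user_agent, $blocked_agents)) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
```

A check like this only helps against bots that send an honest user agent, and it still costs a PHP request, so a server-level block remains the cheaper option where it is available.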

karlbenson

There are a few companies employed by the content mafiaa who go around browsing sites.
However I've not included any of them. Mainly because most people wouldn't want them showing on the list.

Thanks for reporting anyway.

rumfa

Here are two more that arrive 4-7 at a time :S

Mozilla/3.0 (x86 [en] Windows NT 5.1; Sun)
IPs (probably not all of them yet):
217.23.31.154
61.35.100.131
218.28.213.194
76.104.218.228

Mozilla/4.8 [en] (X11; U; Linux 2.4.20-8 i686)
I have no IP for this one yet...

karlbenson

Those will be proxies, content scrapers, email harvesters, hackers, spammers, etc.