Hiding PHPSESSID from Spiders does not work.

Started by RickW, September 09, 2005, 09:41:09 AM

RickW

Or rather, it does not work well (at the moment).

After digging through past threads for a while, I did come across an explanation of how SMF deals with spiders.  Apparently it recognizes the spider, then doesn't present a session ID.  I've seen several mods that do just this on phpBB, and it can work quite well, as long as you have a 100% up-to-date list of all the spiders.  I don't think SMF does.
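
In rough terms, I imagine the logic looks something like this (just a sketch of the idea, not SMF's actual code; the agent list is only illustrative):

<?php
// Sketch: detect a crawler from its User-Agent and skip appending PHPSESSID.
$spider_agents = array('Googlebot', 'Slurp', 'msnbot');

function looks_like_spider($agent, $spider_agents)
{
    $agent = strtolower($agent);
    foreach ($spider_agents as $needle)
    {
        // A partial, case-insensitive match on the User-Agent is enough.
        if (strpos($agent, strtolower($needle)) !== false)
            return true;
    }
    return false;
}

$is_spider = looks_like_spider($_SERVER['HTTP_USER_AGENT'], $spider_agents);

// Only append the session ID for normal visitors who don't accept cookies.
$link = 'index.php?topic=123.0';
if (!$is_spider && !isset($_COOKIE[session_name()]))
    $link .= ';' . session_name() . '=' . session_id();
?>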

Take a look at one of my sites and the google index:

site:www.scifi-fans.com

All of those different session IDs are killing me.  But I think this is an easy fix.  We just need to update the routine responsible for tracking spiders.  So where is this located?  Is there a dedicated file just for this?  On one of my phpBB sites that uses a Spider Mod, here is the list of IPs and agents per search engine:

Googlebot
agent match: Google
216.239.46.|64.68.8|64.68.9|164.71.1.|192.51.44.|66.249.71.|66.249.66.|66.249.65.|66.249.64.|82.208.29.204|209.185.253.|209.185.108.|64.208.33.33|64.209.181.5

Inktomi
agent match: Slurp/|Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
216.35.116.|66.196.|66.94.230.|202.212.5.|68.142.249.|68.142.25

MSN
agent match: msnbot/
131.107.3.|204.95.98.|131.107.1|65.54.164.95|207.46.98.|65.54.188.11

Partial matches are allowed.  The particular mod I'm using will also put possible spiders into a queue for approval if they match the agent string but not any of the IPs.
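
If it helps, the decision the mod makes is roughly this (my own rough illustration, not the actual mod code; the IP list is just an excerpt from the Googlebot entry above):

<?php
$google_agent = 'Google';
$google_ips = array('216.239.46.', '64.68.8', '66.249.66.');

function ip_has_prefix($ip, $prefixes)
{
    foreach ($prefixes as $prefix)
    {
        // Prefix match, so '216.239.46.' covers the whole 216.239.46.* range.
        if (strpos($ip, $prefix) === 0)
            return true;
    }
    return false;
}

$agent = $_SERVER['HTTP_USER_AGENT'];
$ip = $_SERVER['REMOTE_ADDR'];

if (strpos($agent, $google_agent) !== false)
{
    if (ip_has_prefix($ip, $google_ips))
        $status = 'confirmed spider';     // agent and IP both match
    else
        $status = 'queued for approval';  // agent matches but no IP does
}
?>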

My phpBB mod covers many more than those three bots, but those three are the main ones, and I haven't received any hits yet from the other spiders.

So all we have to do is update SMF's spider tracking routine and we're all set.  ;)  So the question is, how...

[Unknown]

That's not how SMF does it.

Notice everything is a "Supplemental Result."  Likely, these are URLs coming to Google by another means, such as through watching the pages users visit.  When your forum is well-indexed, you won't have that problem.

And, if you continue to have this problem even after that, make sure session.auto_start and session.use_trans_sid are both off.
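
If you're not sure how those are set, a quick way to check (assuming you can drop a small PHP script on the server) is:

<?php
// Quick check; both values should report 0 (off).
echo 'session.auto_start = ', ini_get('session.auto_start'), '<br />';
echo 'session.use_trans_sid = ', ini_get('session.use_trans_sid'), '<br />';
?>

To turn them off, set session.auto_start = 0 and session.use_trans_sid = 0 in php.ini, or use the equivalent php_flag lines in .htaccess if PHP runs as an Apache module.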

-[Unknown]

exposage

Quote from: [Unknown] on September 10, 2005, 05:53:30 AM
That's not how SMF does it.

Notice everything is a "Supplemental Result."  Likely, these are URLs coming to Google by another means, such as through watching the pages users visit.  When your forum is well-indexed, you won't have that problem.

And, if you continue to have this problem even after that, make sure session.auto_start and session.use_trans_sid are both off.

-[Unknown]

I'm wondering how this works as well.  Could someone explain where said session.auto_start and session.use_trans_sid options are located?  Thanks

