Improve SEO axe sessionid for spiders

Started by dupont24, November 18, 2006, 07:18:24 AM


dupont24

I have no clue whether this is the magic wand that fixes the SEO issues with SMF, but it would be a step in the right direction.

Everything I have read points to spiders having a difficult time indexing pages that carry a session ID. I have seen some SMF forums indexed at or above 70 percent of their thread/post count, which is a decent number, I guess. Most are nowhere near that and are lucky to reach 10 percent.

One thing they have in common: when I see the results on Google, I know instantly that it is an SMF board, because the result always shows the guest login information instead of the post content. There must be a way to keep that out of the index, but how? With robots.txt?


I am begging for a mod that solves this session ID issue.

Is it possible to create a group for known spiders that do not need session IDs?
Then they could crawl at will.

Someone please help. I am not the only one suffering from poor indexing.

http://www.simplemachines.org/community/index.php?topic=127715.0


Dannii

In 1.1 sessionids aren't shown to known spiders.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

dupont24

If that is the case, why are they showing PHPSESSID in the URL? I have read that the session ID was removed from spiders' view in RC2. However, I have also read that this is not the case, and that spiders still see the session ID because they are treated as guests and not as spiders. Case in point: look at indexed threads from SMF boards; most carry the guest activation message in the body. That message is repeated in most indexed threads.

If you enter the URL of a thread created in RC3 into this simulator, scroll down to the bottom and look at the results.

http://www.webconfs.com/search-engine-spider-simulator.php

I entered http://sprintcupfans.com/sportsforum/index.php/topic,11484.0.html

and the results at the bottom carry the ?PHPSESSID=c37e9171f4... that spiders see.

There is another post about this exact issue here; see the second post by jonks.

http://www.simplemachines.org/community/index.php?topic=59676.165

SleePy

Using that simulator, it still shows up as a guest on my forum. So it's not correctly identifying itself as a spider, which is why that happens.
Jeremy D — Site Team / SMF Developer

MrCue

Right before


return $buffer;


in Sources/QueryString.php in the ob_sessrewrite function.

Put this there; it will remove the SID for anyone and everyone.


$buffer = preg_replace("~PHPSESSID=([0-9A-Za-z]*)~i", '', $buffer);
$buffer = str_replace('?&', '?', $buffer);
$buffer = str_replace('?amp;', '?', $buffer);
$buffer = str_replace('?/', '/', $buffer);
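
To see what those four lines do outside of SMF, here is a minimal sketch that applies them to a made-up piece of page output. The wrapper name stripSessionId, the session ID abc123, and the sample links are assumptions for illustration only; none of them come from SMF.

```php
<?php
// Hypothetical helper: wraps the four replacements so they can be tried
// on their own. It is not part of SMF.
function stripSessionId(string $buffer): string
{
    // Remove the PHPSESSID=<value> pair wherever it appears.
    $buffer = preg_replace('~PHPSESSID=([0-9A-Za-z]*)~i', '', $buffer);
    // "?&amp;" is left behind; the next two steps reduce it to "?".
    $buffer = str_replace('?&', '?', $buffer);
    $buffer = str_replace('?amp;', '?', $buffer);
    // "?/" is left behind in the index.php/topic,... style of URL.
    $buffer = str_replace('?/', '/', $buffer);
    return $buffer;
}

// Made-up links of the kind a guest might be served.
$html = '<a href="index.php?PHPSESSID=abc123&amp;topic=1.0">one</a> '
      . '<a href="index.php?PHPSESSID=abc123/topic,1.0.html">two</a>';

echo stripSessionId($html), "\n";
```

Note that a link where PHPSESSID is the only parameter (index.php?PHPSESSID=abc123) would come out as index.php? with the question mark still attached, since none of the clean-up steps match that case.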


The other choice is to find this comment

// This isn't meant to be reliable, it's just meant to catch most bots to prevent PHPSESSID from showing up.

in Load.php and make the check below it detect bots/spiders better.
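
As a rough idea of what "detect better" could mean, here is a minimal sketch of a user-agent check. The function name looksLikeSpider and the token list are assumptions made up for this example; this is not the code that ships in Load.php, and a real list would need to be longer and kept up to date.

```php
<?php
// Hypothetical helper: guess whether a User-Agent string belongs to a crawler.
// The token list is illustrative only and is not SMF's own list.
function looksLikeSpider(string $userAgent): bool
{
    $tokens = ['googlebot', 'slurp', 'msnbot', 'bingbot', 'crawler', 'spider'];
    $ua = strtolower($userAgent);
    foreach ($tokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true;
        }
    }
    return false;
}

// The header may be missing entirely, so fall back to an empty string.
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$isSpider = looksLikeSpider($agent);
// A forum could then skip adding the session ID to links when $isSpider is true.
```

A simulator that sends an ordinary browser user agent would not match any token here, which would fit with it still being treated as a guest.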

dupont24

Quote from: SleePy on November 18, 2006, 11:27:17 AM
Using that simulator, it still shows up as a guest on my forum. So it's not correctly identifying itself as a spider, which is why that happens.

That is probably accurate... but that also means Google does not correctly identify itself either, and neither do many other spiders.

Just look at any successfully indexed SMF site and you will see that the body of the indexed post shows the guest login info.


Thank you, MrCue.
I will try that and see if it helps.

Also, any ideas on how to make the detection in Load.php better, or how to stop spiders from indexing this same line over and over?
Welcome, Guest. Please login or register. Did you miss your activation email?

Dannii

Make your theme more semantic, so that the content is more prominent and easier to access.

dupont24

Quote from: eldʌkaː on November 18, 2006, 10:09:48 PM
Make your theme more semantic, so that the content is more prominent and easier to access.

What would you suggest changing? I am using the default theme, which I like very much. The only reason is how easy it makes installing mods.



MrCue, since I changed the file with that mod, I have had spiders in the threads/posts area, specifically Google. I saw this for the first time within minutes of changing the file. However, only about a third make it to the threads; the rest only request index.php and never crawl beyond that.

Nonetheless, what you suggested has made an improvement.


Daniel Hofverberg

Quote from: dupont24 on November 18, 2006, 11:16:06 AM
If that is the case, why are they showing PHPSESSID in the URL?
Check to make sure that the PHP setting session.use_trans_sid is disabled. If that is enabled, then session IDs may appear with no way for SMF to stop it. As long as that setting is disabled, it seems to work properly - I've checked all indexed threads from my forum, and none include PHPSESSID.

dupont24

Quote from: Daniel Hofverberg on November 20, 2006, 04:11:54 AM
Check to make sure that the PHP setting session.use_trans_sid is disabled. If that is enabled, then session IDs may appear with no way for SMF to stop it. As long as that setting is disabled, it seems to work properly - I've checked all indexed threads from my forum, and none include PHPSESSID.


Where would I locate this?  I am totally new to SMF.. sorry.

Daniel Hofverberg

Check with phpinfo(), if you have such a file. Otherwise, create one by creating a text file containing the following:
<?php phpinfo(); ?>


Call it for instance phpinfo.php, and upload it to the server. Go to that URL in your browser, and scroll down to the heading "Session". Check the value use_trans_sid there, and see if it is set to On or Off.
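
If you would rather see just that one setting instead of the whole phpinfo() page, a small script along these lines should also work. This is only a sketch: the helper name isIniFlagOn is made up for the example, and it assumes the value comes back as one of the usual on/off spellings.

```php
<?php
// Hypothetical helper: interpret a raw php.ini flag value as on/off.
// ini_get() returns the setting as a string such as "0", "1" or "",
// or false if the setting does not exist.
function isIniFlagOn($raw): bool
{
    return in_array(strtolower((string) $raw), ['1', 'on', 'yes', 'true'], true);
}

$raw = ini_get('session.use_trans_sid');
echo 'session.use_trans_sid is ', (isIniFlagOn($raw) ? 'On' : 'Off'), "\n";
```

Upload it the same way as phpinfo.php and open it in your browser.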

dupont24


SleePy

If your host allows a custom php.ini, you can turn session.use_trans_sid off in there so the PHP session ID doesn't show.

Daniel Hofverberg

Or if your host has PHP running as an Apache module, then you can most likely change the setting via .htaccess. Just create a file called ".htaccess" (or edit the file, if it already exists), and add the following line in there:
php_flag session.use_trans_sid Off

dupont24

Thanks for all the help. I am trying this to see if it improves the indexing. I hope so, or I will have no choice but to change forum software again.

I have another site that I set up just 3 weeks ago and added to my webmaster account. Within 10 days Google visited the site.

72 hours later all 8 HTML pages were indexed. After 3 weeks, only the forum's index.php has been indexed. There is a definite SEO issue with SMF, and I can only hope this helps.

My sports site has not had a single thread or post indexed in almost 3 weeks of using SMF; before, with vB, as many as 50-plus were indexed a day. I hate that people defend the SEO in SMF when my board is almost invisible to spiders at this point. That is really a shame, because I really believe SMF is better than vB for ease of use alone.

Is there any way to make spiders ignore the "Welcome, Guest. Please login or register. Did you miss your activation email?" line? It appears in every indexed SMF thread/post.

WintermuteX

Quote from: MrCue on November 18, 2006, 11:42:40 AM
Right before


return $buffer;


in Sources/QueryString.php in the ob_sessrewrite function.

Put this, it will remove SID for anyone and everyone.


$buffer = preg_replace("~PHPSESSID=([0-9A-Za-z]*)~i", '', $buffer);
$buffer = str_replace('?&', '?', $buffer);
$buffer = str_replace('?amp;', '?', $buffer);
$buffer = str_replace('?/', '/', $buffer);


Other choice is to find this

// This isn't meant to be reliable, it's just meant to catch most bots to prevent PHPSESSID from showing up.

In Load.php and make it detect bots/spiders better.

Your code helped me for now, as switching session.use_trans_sid off didn't help (neither in php.ini nor in .htaccess).
The only thing I would love to see is getting the URLs without the "?":

not: http://www.gothic-chat-community.net/forum/index.php?board=1.0
but: http://www.gothic-chat-community.net/forum/index.php/board,1.0.html

as I use an Apache server with mod_rewrite.
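
For what it's worth, the mapping between those two URL forms can be sketched in a few lines. toFriendlyUrl is a hypothetical name, and the pattern only covers a URL that ends in a single numeric board or topic parameter; this is an illustration of the idea, not how SMF itself rewrites links.

```php
<?php
// Hypothetical helper: turn index.php?board=1.0 into index.php/board,1.0.html.
// Only handles a lone board= or topic= parameter with a numeric value.
function toFriendlyUrl(string $url): string
{
    $result = preg_replace(
        '~index\.php\?(board|topic)=(\d+(?:\.\d+)?)$~',
        'index.php/$1,$2.html',
        $url
    );
    // preg_replace() returns null on a regex error; fall back to the input.
    return $result ?? $url;
}

echo toFriendlyUrl('http://www.gothic-chat-community.net/forum/index.php?board=1.0'), "\n";
```

Any URL that does not match the pattern, such as one with extra parameters, is returned unchanged.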

GC

This works just fine.

http://www.webmasterstalks.com/seo-4-smf-b46.0/

Try it out. I had it enabled for a week, and Google indexed 400+ pages in 2 scans.
