[Accepted] Search Engines

Started by BiErLeEuW, August 03, 2003, 05:21:34 PM


BiErLeEuW

Is Simple Machines better for search engines? I hope a search engine will be able to follow the path to the right message.

Joshua Dickerson

Pretty sure it is, because it contains less crap in the query string. It hasn't gotten much testing, but there is a feature in discussion for non-IIS users to allow for even better SEO.

Juvenall Wilson

You know, I've never had any problems getting my SE installs spidered so even if it's the same thing I'd be happy..lol ;D

Metho

Actually, looks like this site is once again using [Unknown]'s output buffering already? Is that gonna be standard or did he just put that in for fun? :D

Methonis
Joshua "Methonis" Frazer
Support Specialist
The Simple Machines Team

David

This site is running the build from the CVS, thus if you see it here it is part of the package.
This space for rent.

Metho

Alrighty. :D So to answer the original, yes, they'll be more spiderable.


[Unknown]

Quote from: David on August 03, 2003, 06:50:38 PM
This site is running the build from the CVS, thus if you see it here it is part of the package.

The theme here is NOT in the CVS.  But, yes the code is the same.

It's not using output buffering, it's built in.

-[Unknown]

David

Quote from: [Unknown] on August 03, 2003, 08:31:33 PM
The theme here is NOT in the CVS.
Ok fine, you got me there.  ;)

sensovision

Hehe, seems this thread appeared just when I was in the process of creating a similar one ;)
And yeah, Unknown's URL system rules. It's the best solution IMHO, and it made my forums crawlable in one day... I've never seen crawlers take so many pages in one session. I've already posted about it in the thread on YaBB SE.

BTW Unknown, will you be able to make the buttons which switch between categories use the same format as the rest of the URLs? Currently they look like this:
Quote
index.php?board=3;start=20

[Unknown]

I'm still considering making the start be /whatever (like with the YaBB SE mod).  As a built-in feature it'd be EASY to implement...

-[Unknown]

Compuart

Took a shot at it. Converted the URLs to look like index.php/t123/s20. Only modified the URLs that lead straight to the messages (so only the links on boardindex and messageindex), as that's where the crawler should go, and the less confusion the better I guess...

As the url-form doesn't seem to be supported on IIS, it's an option turned off by default.

(now let's see if this will be getting more than 5 results)
Hendrik Jan Visser
Former Lead Developer & Co-founder www.simplemachines.org
Personal Signature:
Realitynet.nl -> ExpeditieRobinson.net / PekingExpress.org / WieIsDeMol.Com
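
A minimal sketch of how a URL in the index.php/t123/s20 form described above could be taken apart. This is only an illustration in Python, not the forum's actual code; the pattern and the function name parse_path_style are assumptions made up for this example.

```python
import re

# Hypothetical pattern for the path-style URLs mentioned in the post:
# /t<topic id>, optionally followed by /s<start offset>.
PATH_STYLE = re.compile(r"index\.php/t(?P<topic>\d+)(?:/s(?P<start>\d+))?$")


def parse_path_style(url):
    """Return (topic, start) for a path-style URL, or None if it does not match."""
    match = PATH_STYLE.search(url)
    if match is None:
        return None
    # A missing /s part is treated as the first page (offset 0).
    return int(match.group("topic")), int(match.group("start") or 0)


if __name__ == "__main__":
    print(parse_path_style("index.php/t123/s20"))
```

Anything that does not follow the /t.../s... layout, such as index.php?board=3;start=20, simply returns None in this sketch.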

[Unknown]

Quote from: Compuart on August 17, 2003, 12:24:08 AM
Took a shot at it. Converted the URLs to look like index.php/t123/s20. Only modified the URLs that lead straight to the messages (so only the links on boardindex and messageindex), as that's where the crawler should go, and the less confusion the better I guess...

As the url-form doesn't seem to be supported on IIS, it's an option turned off by default.

(now let's see if this will be getting more than 5 results)

Compuart, using my style: ?threadid=X/Y (where X is the threadid and Y is the start) works fine and is already finished in the CVS... plus it works on IIS.

-[Unknown]
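
For comparison, a rough sketch of reading the ?threadid=X/Y style with Python's standard library. The function name parse_query_style is hypothetical, and treating a missing Y as offset 0 is an assumption, not something stated in the thread.

```python
from urllib.parse import parse_qs, urlsplit


def parse_query_style(url):
    """Return (thread, start) for a ?threadid=X/Y URL, or None if it is absent or malformed."""
    values = parse_qs(urlsplit(url).query).get("threadid")
    if not values:
        return None
    # Split "X/Y" into the thread id and the start offset.
    thread, _, start = values[0].partition("/")
    if not thread.isdigit() or (start and not start.isdigit()):
        return None
    # Assumption: no "/Y" part means the first page.
    return int(thread), int(start or 0)


if __name__ == "__main__":
    print(parse_query_style("index.php?threadid=123/20"))
```

Because the value lives in an ordinary query string, this form does not depend on the web server passing extra path segments to the script.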

dschwab9

Quote from: [Unknown] on August 17, 2003, 01:08:23 AM

Compuart, using my style: ?threadid=X/Y (where X is the threadid and Y is the start) works fine and is already finished in the CVS... plus it works on IIS.

-[Unknown]

Don't some search engines dislike URLs with question marks, though?  It seems that's what tells them the page is dynamic and not to index it.

Compuart

That, combined with the session ID that is added if a user has cookies disabled (as most crawlers do):
index.php?PHPSESSID=914e6a644c0e8d549685b0cfafeb5621&threadid=X/Y
vs.
index.php/p914e6a644c0e8d549685b0cfafeb5621/tX/sY
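
A small sketch of how the two forms above could be generated, with the session ID included only when one is supplied (for example, for a visitor without cookies). The helper names query_style_url and path_style_url are made up for this illustration and are not taken from the forum code.

```python
def query_style_url(thread, start, session_id=None):
    """Build the ?threadid=X/Y form, prefixing PHPSESSID when a session ID is given."""
    prefix = f"PHPSESSID={session_id}&" if session_id else ""
    return f"index.php?{prefix}threadid={thread}/{start}"


def path_style_url(thread, start, session_id=None):
    """Build the /tX/sY form, prefixing a /p<session> segment when a session ID is given."""
    prefix = f"/p{session_id}" if session_id else ""
    return f"index.php{prefix}/t{thread}/s{start}"


if __name__ == "__main__":
    sid = "914e6a644c0e8d549685b0cfafeb5621"
    print(query_style_url(123, 20, sid))
    print(path_style_url(123, 20, sid))
```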

BiErLeEuW

But I also want messages to be findable by search engines, like phpBB has :)

Spaceman-Spiff

I like Compuart's version better,
but won't that require .htaccess?

Compuart

Apache can do this without .htaccess 8)

sensovision

I think both styles of URLs will work perfectly, since they don't contain a session ID and don't have semicolons, which cause problems for spiders. So I don't think this issue is one to worry about.

[Unknown]

Compuart, the bigger issue is the fact that your way causes the use of two kinds of URLs.  Not only does this make the code terrible, but it will also DECREASE search engine ranking, because links are counted toward the ONE URL, not toward different versions of it.

And I have a solution for the session id, okay?  I just hadn't done it yet - didn't know it was IMPERATIVE or anything.

-[Unknown]
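
To illustrate the point about two kinds of URLs, here is a toy sketch that tallies links per URL. It is not a model of how any real search engine ranks pages; the patterns and the names canonical_key and count_links are assumptions for this example only. Counted as raw strings, the two forms of the same page split the tally; mapped to one canonical key, they add up.

```python
import re
from collections import Counter

# Hypothetical patterns for the two URL styles discussed in the thread,
# each with an optional 32-character session ID.
PATH_STYLE = re.compile(r"index\.php(?:/p[0-9a-f]{32})?/t(\d+)(?:/s(\d+))?$")
QUERY_STYLE = re.compile(
    r"index\.php\?(?:PHPSESSID=[0-9a-f]{32}&)?threadid=(\d+)(?:/(\d+))?$"
)


def canonical_key(url):
    """Map either URL style to a (thread, start) pair, or None if neither matches."""
    for pattern in (PATH_STYLE, QUERY_STYLE):
        match = pattern.search(url)
        if match:
            return int(match.group(1)), int(match.group(2) or 0)
    return None


def count_links(urls, canonicalize=True):
    """Tally links per page, either by canonical key or by the raw URL string."""
    return Counter(canonical_key(u) if canonicalize else u for u in urls)


if __name__ == "__main__":
    links = [
        "index.php?threadid=123/20",
        "index.php/t123/s20",
        "index.php?threadid=123/20",
    ]
    print(count_links(links, canonicalize=False))
    print(count_links(links))
```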

lbyard

Quote from: Juvenall Wilson on August 03, 2003, 06:47:46 PM
You know, I've never had any problems getting my SE installs spidered so even if it's the same thing I'd be happy..lol ;D

That hasn't been my experience: http://www.simplemachines.org/community/index.php?threadid=1486.

Also, I think there is a need for an archiving capability to separate the day-to-day, short-term garbage from the good stuff that should be kept.  Otherwise, the search engines, especially Google, are going to assign a much lower quality rating to the overall content.

Larry
