Almost ready to give up on SMF

Started by dupont24, November 17, 2006, 05:31:04 PM


destalk

#80
Quote from: Motoko-chan on December 31, 2006, 01:16:32 AM
Most of those have been removed as a problem since 1.1 final by telling spiders not to index duplicate links.

Just to be pedantic, the noindex tag will still allow robots to 'see' the page (obviously, they have to see the page to see the noindex meta tag), whereas with robots.txt they are 'excluded' from following any links to the disallowed page/s.

No-one knows whether the search engines make a record of the content of a page that has the noindex tag. The key point is that the pages will not be returned in search results. Or, which is what we are after, the page will be returned, but only using the URLs that we want - i.e. the root of the thread, rather than any of the duplicates, such as those with '.msg' in them.
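
To illustrate the distinction above, here is a minimal Python sketch using only the standard library. The robots.txt rule, the /profile/ path and the example.com URLs are assumptions made up for illustration, not SMF defaults. The point it shows: a robots.txt rule is checked before any request is made, while a noindex directive can only be found by fetching and parsing the page.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(p.strip().lower() for p in content.split(","))


def may_fetch(robots_txt, url, agent="*"):
    """Decided from robots.txt alone, before the page is requested."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def may_index(page_html):
    """Decided only after the page has been downloaded and parsed."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    return "noindex" not in parser.directives


# Hypothetical rule and page, for illustration only.
ROBOTS_TXT = "User-agent: *\nDisallow: /profile/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(may_fetch(ROBOTS_TXT, "http://example.com/profile/42"))  # rule applies without a fetch
print(may_index(PAGE))  # known only once the page itself has been read
```

In this sketch the crawler never requests the disallowed URL, but it has to download the second page in full before it learns that the page should not be indexed.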

Overall, it seems clear that duplicate URL issues are not the reason why some people are having problems getting SMF forums indexed by search engines. Reading the Google blog link shows us that Google doesn't penalise sites for having different links to the same content.

The thing is that many people with all sorts of web sites using all sorts of technology (and even static html pages) often have problems getting their sites indexed. There are many many reasons for this.

The special reason why forums may have problems is that their very structure is often so 'cluttered' with 'features' that the spiders can get confused as to which content is important and what is not. Various membership systems, profiles, archives, shout boxes, galleries, online games and so on, all add several complex layers of information for robots to try and work out. As opposed to a simple html website with a few images. With this in mind, it may be best to choose a theme with a fairly simple layout.

With regard to SE indexing, the only SMF-specific issue that I would like to see improved is the addition of a second line for some descriptive text in addition to the topic heading. A page with hundreds of links and no ordinary text content is not always liked by search engines, as they may think that there is no real content there. A second line of descriptive text would help to make the message index page more SE friendly.

For example:

Almost ready to give up on SMF
Some descriptive text to go here might help a list of links on the message index be more 'indexable'...


Kindred

Quote from: shawn911 on December 30, 2006, 09:22:02 PM
You are using Cutenews and not Joomla.
From my point of view, if you use Joomla as CMS, a Bridge for SMF, and SMF NOT WRAPPED, you will hit lots of problems indexing your SMF forum.
Duplicates are the main reason, I think:
All 3 URLs below are duplicates, and you can browse the whole forum with each of them:

- /index.php?option=com_smf&ItemId=92
- /forum/index.php
- /component/option,com_smf/ItemId,92/   (if SEO is activated)

Yet, I am using Joomla and I have no problems with indexing, and no discussbot installed.
I don't have SEO, I have disabled direct access to the forum, and so, .../index.php?option=com_smf&ItemId=92 is the ONLY link to the forum on my site.
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

xtremecruiser

It could be that you are in the Google sandbox 8)

青山 素子

#83
Quote from: destalk on December 31, 2006, 02:08:56 AM
Quote from: Motoko-chan on December 31, 2006, 01:16:32 AM
Most of those have been removed as a problem since 1.1 final by telling spiders not to index duplicate links.

Just to be pedantic, the noindex tag will still allow robots to 'see' the page (obviously, they have to see the page to see the noindex meta tag), whereas with robots.txt they are 'excluded' from following any links to the disallowed page/s.

True.

Quote from: destalk on December 31, 2006, 02:08:56 AM
The thing is that many people with all sorts of web sites using all sorts of technology (and even static html pages) often have problems getting their sites indexed. There are many many reasons for this.

The special reason why forums may have problems is that their very structure is often so 'cluttered' with 'features' that the spiders can get confused as to which content is important and what is not. Various membership systems, profiles, archives, shout boxes, galleries, online games and so on, all add several complex layers of information for robots to try and work out. As opposed to a simple html website with a few images. With this in mind, it may be best to choose a theme with a fairly simple layout.

I agree here. I'm using a simple variation on the classic theme (a very popular layout at that) and I seem to be indexed well. No distracting content or other crud to scare away the spiders. As I said before, there is more than just SMF itself involved here. Now to just figure out how the combination of factors involved influences things.


Quote from: destalk on December 31, 2006, 02:08:56 AM
With regard to SE indexing, the only SMF-specific issue that I would like to see improved is the addition of a second line for some descriptive text in addition to the topic heading. A page with hundreds of links and no ordinary text content is not always liked by search engines, as they may think that there is no real content there. A second line of descriptive text would help to make the message index page more SE friendly.

For example:

Almost ready to give up on SMF
Some descriptive text to go here might help a list of links on the message index be more 'indexable'...

Putting the first post's content (stripped of tags) as the META Description might be helpful. I know modern search engines don't really pay attention to META tags, but it makes for nicer return results at least and might help a bit. (This can probably be done in the themes themselves.)
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


destalk

#84
Quote from: Motoko-chan on December 31, 2006, 02:40:44 PM
Putting the first post's content (stripped of tags) as the META Description might be helpful. I know modern search engines don't really pay attention to META tags, but it makes for nicer return results at least and might help a bit. (This can probably be done in the themes themselves.)

It's true the KEYWORDS tag is all but redundant. It's also true that the META Description tag is not used for ranking purposes. However, the META Description tag is utilised in a number of other ways. As you point out the description is much more user friendly in search results.

But also, I have read and seen much that points to the lack of a unique META Description tag being a factor in Google seeing pages created by 'dynamic' web tools - such as content management systems and forums - as duplicate content, or 'boilerplate' template pages.

Currently SMF puts the title as the description, which is fairly OK. Repeating the text from the first post is also an interesting idea, although it's not strictly 'accurate' (after all, it's not a description ;) ). I think the ideal solution (from a usability point of view as much as from a search engine point of view) would be to have an option for the thread starter to add some description text.
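
A rough sketch of the first-post idea, in Python for illustration only (SMF itself is PHP, and this is not how SMF or any of its themes builds the tag). The function name, the 155-character limit and the simple tag-stripping patterns are all assumptions.

```python
import html
import re

# Deliberately simple patterns: anything that looks like [tag] or <tag>.
BBCODE_TAG = re.compile(r"\[/?[a-zA-Z*][^\]]*\]")
HTML_TAG = re.compile(r"<[^>]+>")


def meta_description(first_post, limit=155):
    """Build a META description tag from the text of a topic's first post."""
    text = BBCODE_TAG.sub(" ", first_post)   # drop [b], [/quote], [url=...] and so on
    text = HTML_TAG.sub(" ", text)           # drop any HTML tags
    text = html.unescape(text)               # turn &amp; and friends back into characters
    text = " ".join(text.split())            # collapse runs of whitespace
    if len(text) > limit:
        # Cut at the last full word inside the limit and mark the truncation.
        text = text[:limit].rsplit(" ", 1)[0].rstrip(",.;:") + "..."
    return '<meta name="description" content="%s" />' % html.escape(text, quote=True)


print(meta_description("[b]Almost ready[/b] to give up on <i>SMF</i>"))
```

Because each topic starts with different text, every topic page would get its own description instead of a repeated one.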

destalk

Quote from: Motoko-chan on December 30, 2006, 07:18:27 PM
Yet I have no wrapper on my forum and quite decent indexing. There are more factors in play than just SMF here, we just need to figure out what...

http://www.google.com/search?q=site%3Aforum.staticsubs.org

Currently 823 indexed items.

Interestingly, all those results (other than the home page) are Supplemental Results, which means they are in Google's Supplemental Index. This is often the case when Google sees those pages as having duplicate content. I notice that the description for each of those pages is very similar: "Welcome, Guest. Please login or register. Did you miss your activation email?...".

Hopefully, that will sort itself out once the noindex rules get indexed.

Dannii

A site:site.com is not really a search, so it just shows the first things in the page. If you did a proper search it would probably show a proper description excerpt.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Stüldt Håjt

#87
This is my robots.txt for smf:
User-agent: *
Disallow: /index.php?action
Disallow: /index.php?wap
Disallow: /index.php?wap2
Disallow: /index.php?imode
Disallow: /index.php?type=rss
Disallow: /index.php*msg
Disallow: /index.php*sort
Disallow: /index.php*prev_next

User-agent: Googlebot-Mobile
Allow: /index.php?wap
Allow: /index.php?wap2

User-agent: Slurp
Allow: /index.php?wap
Allow: /index.php?wap2


I think I'm pretty close to perfection with this robots.txt. Google only indexes my forum index, board indexes and topic pages. And Google Mobile gets the wap versions for mobile search (I'm not sure how well this works since I did this just yesterday).

Use this Google remove URL tool to get unwanted results removed from Google. For example, I had /index.php?action pages such as profile, login, register and search in the results. I just gave the remove URL tool a link to my robots.txt file, and within one day every unwanted search result was gone.

As for the Disallow: /index.php*msg, /index.php*sort and /index.php*prev_next lines: I know wildcards are not supported, but Google Webmaster Tools shows that Googlebot understands them. So when Googlebot is crawling, it ignores every /index.php?topic=XXX.msgYYY#msgZZZ, /index.php?board=X.Y;sort=subject and /index.php?topic=XYZ.0;prev_next=prev#new link instead of going to those pages and seeing noindex. Googlebot now only follows the important links and ignores everything else, which makes crawling more efficient.
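
As a rough illustration of how the wildcard rules above behave, the following Python sketch translates each Disallow pattern into a regular expression, with '*' matching any run of characters. It is only an approximation of wildcard matching, not Google's actual implementation. It ignores the Allow lines and the per-agent groups, and the topic, board and message numbers in the sample paths are made up.

```python
import re

# The Disallow patterns from the "User-agent: *" group above.
DISALLOW = [
    "/index.php?action",
    "/index.php?wap",
    "/index.php?wap2",
    "/index.php?imode",
    "/index.php?type=rss",
    "/index.php*msg",
    "/index.php*sort",
    "/index.php*prev_next",
]


def rule_to_regex(rule):
    """Treat '*' as 'any characters'; everything else is matched literally."""
    return re.compile(".*".join(re.escape(part) for part in rule.split("*")))


def is_disallowed(path, rules=DISALLOW):
    """True if any rule matches from the start of the path (prefix semantics)."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# Hypothetical paths in the URL shapes mentioned in the post.
for path in (
    "/index.php?topic=123.msg456",        # caught by /index.php*msg
    "/index.php?board=1.0;sort=subject",  # caught by /index.php*sort
    "/index.php?topic=123.0",             # plain topic page, still crawlable
):
    print(path, is_disallowed(path))
```

Under these assumptions, only the plain topic URL stays crawlable, which matches the behaviour described in the post.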

Edit: I found out that you can submit mobile pages to Yahoo too, so I added Slurp to my robots.txt as well. http://search.yahoo.com/info/submit.html

destalk

Nice robots file Stüldt Håjt

I didn't know about the Googlebot-Mobile spider.

However, I wouldn't use the Google Remove Tool for just general removal issues. This is for emergency removal of content for legal reasons and so on. The recommended method for removing pages is to let Google find a 404 error, or to use robots.txt

motumbo

#89
By the way folks, if this hasn't already been said in this thread, Google is a screwed up search engine.  Google sucks bigtime.

Try reading some of the other message boards about Google dropping websites or dropping the number of pages indexed for a site for no reason.

It happens all the time.

I, too, do not like the SMF URL structure. topic837.msg1002, or whatever it is. I don't like it at all.

Take a look at how many parameters are in the SMF URL structure--4.  Google says that URLs with too many parameters (how many is too many?) won't be placed in its normal index.

http://www.google.com/support/webmasters/bin/answer.py?answer=34473&query=supplemental+index&topic=0&type=f

For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.







青山 素子

Quote from: motumbo on January 02, 2007, 06:44:45 PM
By the way folks, if this hasn't already been said in this thread, Google is a screwed up search engine.  Google sucks bigtime.

I wouldn't go that far, although I agree to a point. Google is awfully complicated because it has to avoid spammers and other bad sites, and that can cause many problems for legitimate code when it gets caught up in the detection.

Unfortunately, it is also the most popular search engine, so we have to try to work with it.


Quote from: motumbo on January 02, 2007, 06:44:45 PM
Try reading some of the other message boards about Google dropping websites or dropping the number of pages indexed for a site for no reason.

It happens all the time.

Indeed, all too true, although this usually happens around the time when they do updates to their search algorithms.


Quote from: motumbo on January 02, 2007, 06:44:45 PM
I, too, do not like the SMF URL structure. topic837.msg1002, or whatever it is. I don't like it at all.

Take a look at how many parameters are in the SMF URL structure--4.  Google says that URLs with too many parameters (how many is too many?) won't be placed in its normal index.

A quick check shows only one parameter for entering into a topic.

For instance, this topic is at:
http://www.simplemachines.org/community/index.php?topic=127715.0

The second page is at:
http://www.simplemachines.org/community/index.php?topic=127715.15

I only count one parameter. For posts on the board index, they do have a msg parameter in them (although that is now indicating that it should not be indexed).
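
One way to check the parameter count is to split the query string yourself. The Python sketch below is only illustrative. It assumes that both '&' and ';' separate parameters (the board sort links mentioned earlier in the thread use ';'), and the example.com URL with its board number is made up.

```python
import re
from urllib.parse import urlsplit


def query_parameters(url):
    """Return the key=value pairs found in the query string of url."""
    query = urlsplit(url).query  # the text between '?' and '#'
    return [pair for pair in re.split(r"[&;]", query) if pair]


topic_url = "http://www.simplemachines.org/community/index.php?topic=127715.0"
sorted_board_url = "http://example.com/index.php?board=1.0;sort=subject"  # hypothetical

print(query_parameters(topic_url))         # ['topic=127715.0']
print(query_parameters(sorted_board_url))  # ['board=1.0', 'sort=subject']
```

Counted this way, the page offset after the dot is part of the topic value rather than a separate parameter.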

Can you think of a better scheme?
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


geezmo

Quote from: destalk on January 02, 2007, 06:18:05 PM

However, I wouldn't use the Google Remove Tool for just general removal issues. This is for emergency removal of content for legal reasons and so on. The recommended method for removing pages is to let Google find a 404 error, or to use robots.txt

Actually, the Google Removal Tool is effective for IMMEDIATELY deleting a page or two that you don't want indexed by Google. When you want a handful of pages deleted, you will be asked to use robots.txt too. It only takes a day or two before Google removes them from the SERPs; the only catch is that you can't request to have the pages back in Google for the next 6 months. I had several pages that returned a 404 error, but even after 8 months they were still in Google. So I decided to use the Removal Tool and voila, in just one day all those pages were gone.

geezmo

#92
Quote from: Motoko-chan on January 02, 2007, 07:41:07 PM
Unfortunately, it is also the most popular search engine, so we have to try to work with it.

Unfortunately, this is true. I'm beginning to hate Google too; I think the search results are starting to suck. I hate the fact that the search results can disappear in an instant. Yahoo's search is getting better and better in my opinion. However, as pointed out, Google still holds the majority of the search engine market, so we can only follow whatever they say.

Edit: Fixed nested quotes giving wrong credits. - Motoko

khoking

I am surprised that a post I made on a vB board on 1st Jan 2007 has already appeared in a Google search on 3rd Jan 2007!

I am using SMF 1.1.1. Should I turn ON the SEO in it? Or just leave it unchecked (off)?
Kho King
www.ShaShinKi.com
www.PentaxWorld.com

青山 素子

Quote from: khoking on January 02, 2007, 10:43:45 PM
I am surprised that a post I made on a vB board on 1st Jan 2007 has already appeared in a Google search on 3rd Jan 2007!

It all depends on when Google crawls the site. You might have posted before the site was recently crawled, and perhaps that won't happen again for another month, or it may happen in a few days. It is impossible to determine a schedule (I've seen the bot crawl a site every day for a week, then nothing for a month).


Quote from: khoking on January 02, 2007, 10:43:45 PM
I am using SMF 1.1.1. Should I turn ON the SEO in it? Or just leave it unchecked (off)?

I'm guessing you mean the friendly URLs? I haven't seen much difference, but if your server setup supports them, you can always try.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


geezmo

Quote from: Motoko-chan on January 03, 2007, 12:34:46 AM
It all depends on when Google crawls the site. You might have posted before the site was recently crawled, and perhaps that won't happen again for another month, or it may happen in a few days. It is impossible to determine a schedule (I've seen the bot crawl a site every day for a week, then nothing for a month).

It's true that it depends when the Googlebot spiders the site, but given all things constant, IMHO vB posts get to appear faster in Google compared to SMF posts.

Dannii

There is no 'all things constant' when dealing with search engines.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

geezmo

"Given all things constant" there means that given two forums -- one vB and one SMF -- being spidered by the Googlebot AT THE SAME TIME, the vB post gets to show up in Google much faster than an SMF post does. You've seen it in THIS example.

destalk

Quote from: geezmo on January 03, 2007, 01:34:12 AM
It's true that it depends when the Googlebot spiders the site, but given all things constant, IMHO vB posts get to appear faster in Google compared to SMF posts.

I disagree that either forum software is better than the other. I had a forum in SMF which had real issues getting more than the main boards indexed. I converted it to vBulletin and it took more than a year for Google to index the vB forum and get the new threads out of the supplemental index. This is despite the fact that the old board urls were 301 redirected to the new vB board urls and the old SMF forum deleted.

Google is a funny old fish sometimes.

Dannii

But they're not the same. That vB forum has 100k more posts than this one, and I don't know how many more it did a year ago. And they had different content.
And... the vB one isn't in Google now. The topic name shows in an archive of the board.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."
