Do SMF posts appear in search engines? (Something to be concerned about)

Started by geezmo, September 06, 2006, 09:16:08 PM


青山 素子

Quote from: geezmo on January 02, 2007, 07:13:00 PM
Good to hear that. When I first posted this issue last year, a few "Simple Machines Heroes" criticized me and some other people for being too interested in having our sites appear in Google. They even said they don't care whether their sites appear in Google and it doesn't matter if their sites don't come up on the first pages of Google searches. Anyway that's been a long time, just glad that finally SMF has realized that search engine optimization is a must these days, unless of course you're not concerned with getting more customers or more hits.

The "Hero" part is just a postcount title. The things to look for are the badges of team members.

SEO is one of those love/hate things. If we can do a bit to help and it doesn't affect the software's performance, it will likely be done. If it is something that will affect performance it will likely not be done. (things like auto-generation of sitemaps and stuff would affect performance, so that will usually be left for mods to take care of).

If you see anything that can improve SMF in any way, don't be afraid to mention it (even if it meets with criticism). The more participation and opinions that are given, the better SMF will be, and the team will also know where the community wants SMF to evolve.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


geezmo

Good to hear that Motoko-Chan. That kind of attitude will make SMF a very competitive software in the future. A lot of us SMF users are on the verge of moving to vB just because of this SEO issue but because it's now being addressed, I think we can stay on. Thanks again.

Anyway, I'd just like to raise my concern again. Do you know what caused my Google SERPs to decrease right after I upgraded to 1.1?

Quote from: geezmo on December 31, 2006, 02:43:09 PM
For several months now, I've been using Nikolas's SMF Archive mod and have put the following in my robots.txt:

Quote
Disallow: /forum/index.php?referrerid*
Disallow: /forum/index.php?action=calendar
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=help
Disallow: /forum/index.php?action=search
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=register
Disallow: /forum/index.php?action=login

I was satisfied to see that Google has indexed more than 10,000 entries from my forum, almost equivalent to the number of topics.

But surprise! I upgraded the forum from 1.1 RC3 to 1.1 and voila! The Google entries from my site have dropped to 200. I'm sure the cause was the upgrade because I never changed any settings in the forum or modified robots.txt, and I have kept the sitemaps intact for months now. Also, the drop in Google SERPs occurred 3 days after I upgraded the forum to 1.1.

Any ideas why this happened? One week after the upgrade, the Google result count for my forum is still 200.

青山 素子

Quote from: geezmo on January 02, 2007, 07:34:24 PM
Good to hear that Motoko-Chan. That kind of attitude will make SMF a very competitive software in the future. A lot of us SMF users are on the verge of moving to vB just because of this SEO issue but because it's now being addressed, I think we can stay on. Thanks again.

I hear there are also topics on vB sucking at SEO on their boards (haven't checked personally), so my guess is the problems are shared across everything.


Quote from: geezmo on January 02, 2007, 07:34:24 PM
Anyway, I'd just like to raise my concern again. Do you know what caused my Google SERPs to decrease right after I upgraded to 1.1?

It might be related to the noindex we are adding to certain URLs to avoid duplicate content being indexed. Then again, it might be because Google felt like it; they don't exactly give out info like that. Has your count increased? You might want to look into their webmaster tools and check what they have on your site. It can be very informative (including telling you whether they are having problems indexing pages).
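To check whether a given page actually carries a noindex directive, something like the following could be used. This is only a sketch: the two HTML snippets are made-up examples, not copied from SMF's output, and the helper names (RobotsMetaParser, has_noindex) are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex, follow"
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page heads (assumption: not real SMF markup).
msg_page = '<html><head><meta name="robots" content="noindex" /></head></html>'
topic_page = "<html><head><title>Some topic</title></head></html>"

print(has_noindex(msg_page))
print(has_noindex(topic_page))
```

Feeding it the saved source of a .msg URL and of the plain topic URL from your own forum would show which of the two is being kept out of the index.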


Toadmund

I took a bit of this, and a bit of that just mentioned, and now I have this:
User-agent: *
Disallow: /forum/Themes/
Disallow: /forum/index.php?action=login
Disallow: /forum/index.php?referrerid*
Disallow: /forum/index.php?action=calendar
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=help
Disallow: /forum/index.php?action=search
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=register

User-agent: Googlebot
Disallow: *action=login*
Disallow: /forum/Themes/

User-agent: MSNBot
Disallow: *action=login*
Disallow: /forum/Themes/

User-agent: Slurp
Disallow: *action=login*
Disallow: /forum/Themes/


That look OK?
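One thing worth checking in a file laid out like this: a crawler that finds a group naming it specifically follows only that group and ignores the "User-agent: *" group. The sketch below uses Python's standard urllib.robotparser on a cut-down version of the file to show that. It leaves out the wildcard rules on purpose, since as far as I know that parser does plain prefix matching and does not expand "*" in paths. "SomeOtherBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Cut-down version of the robots.txt above (no wildcard rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/Themes/
Disallow: /forum/index.php?action=calendar

User-agent: Googlebot
Disallow: /forum/Themes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

calendar = "/forum/index.php?action=calendar"

# Googlebot has its own group, so the calendar rule in the "*" group
# does not apply to it.
googlebot_ok = parser.can_fetch("Googlebot", calendar)

# A crawler without its own group falls back to the "*" group.
otherbot_ok = parser.can_fetch("SomeOtherBot", calendar)

print("Googlebot may fetch calendar:", googlebot_ok)
print("SomeOtherBot may fetch calendar:", otherbot_ok)
```

If that matches how the real crawlers behave, every rule meant for Googlebot, MSNBot and Slurp has to be repeated inside their own groups, or those groups dropped so that the "*" group applies to them.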


From Google Webmaster Tools:
Quote
Home page crawl:
Googlebot last successfully accessed your home page on Dec 24, 2006.

That's good, 'cause just yesterday it said Sept. 9th was the last crawl date; finally updated a bit, it seems.
But I have yet to see anything updated: no new posts and no evidence of robots.txt kicking in yet. C'mon Google, speed 'er up!

geezmo

Quote from: Motoko-chan on January 02, 2007, 07:47:25 PM
I hear there are also topics on vB sucking at SEO on their boards (haven't checked personally), so my guess is the problems are shared across everything.

Yes, there are SEO problems that have started to crop up in vB. This was caused by their constant upgrading of the forum features (and of course, changes in Google rules that we never get to hear of). I think it's a problem with any growing and improving forum software. But we should never trade away "search engine-ability" for new features, because most site owners still rely on organic searches for visitors.

What I look forward to in SMF, though, is the ability to get the posts themselves into Google's results. As far as I know, only SMF thread titles are being indexed, not the actual posts yet. vB used to be very good at having each and every forum post indexed in Google.

Quote from: Motoko-chan on January 02, 2007, 07:47:25 PM
It might be related to the noindex we are adding to certain URLs to avoid duplicate content being indexed. Then again, it might be because Google felt like it; they don't exactly give out info like that. Has your count increased? You might want to look into their webmaster tools and check what they have on your site. It can be very informative (including telling you whether they are having problems indexing pages).

That's what I'm thinking too, but I can't pinpoint exactly what's wrong. Of course, Google won't bother to explain to me what's really going on. I have a Google Webmasters account, but it doesn't say anything about the reasons for the drop. It still says that they are properly indexing my site. I'll wait a week to see if there are changes; probably Google just got "surprised" by the 1.1 upgrade, so it decided to hide some of my pages. Hopefully they will come back by next week.

Dannii

The problem with SEO is that no one knows anything absolutely. It's very hard to make decisions about what to include as default when we don't know what all the effects will be.

Quote
What I know is that only SMF thread titles are being indexed, not the actual posts yet. vB used to be very good in having each and every forum post indexed in Google.
As to that... I don't understand how you can have a topic indexed without its posts?
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Stüldt Håjt

I made a longer post here, but I'll show you my robots.txt:

Quote
User-agent: *
Disallow: /index.php?action
Disallow: /index.php?wap
Disallow: /index.php?wap2
Disallow: /index.php?imode
Disallow: /index.php?type=rss
Disallow: /index.php*msg
Disallow: /index.php*sort
Disallow: /index.php*prev_next

User-agent: Googlebot-Mobile
Allow: /index.php?wap
Allow: /index.php?wap2

User-agent: Slurp
Allow: /index.php?wap
Allow: /index.php?wap2

This one blocks everything except the index, board and topic pages. The mobile versions are allowed for Google's and Yahoo's mobile search.
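To see which forum URLs the "User-agent: *" rules above would let through, the patterns can be run against a few sample paths. This is a rough sketch, assuming the crawler treats "*" as a wildcard and every rule as a match from the start of the path (Google documents this behaviour; not every crawler supports it). The sample paths and the helper names are illustrative only.

```python
import re

# Disallow patterns from the "User-agent: *" group quoted above.
DISALLOW = [
    "/index.php?action",
    "/index.php?wap",
    "/index.php?wap2",
    "/index.php?imode",
    "/index.php?type=rss",
    "/index.php*msg",
    "/index.php*sort",
    "/index.php*prev_next",
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_blocked(path):
    # re.match anchors at the start, so each rule acts as a prefix match.
    return any(rule.match(path) for rule in RULES)


# Sample paths (assumed typical SMF URLs with the forum at the site root).
for path in [
    "/index.php",
    "/index.php?board=1.0",
    "/index.php?topic=112100.0",
    "/index.php?topic=112100.msg718725",
    "/index.php?action=profile",
]:
    print(path, "->", "blocked" if is_blocked(path) else "allowed")
```

Under those assumptions the index, board and plain topic URLs stay crawlable, while the .msg and action URLs are blocked.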

You can submit your forum's mobile pages to Yahoo here: http://search.yahoo.com/info/submit.html

Of course, if you think your forum's users will use i-mode too, add it to the Slurp and Googlebot-Mobile groups.

Edit: And by the way, using SMF's "search engine friendly URLs" is useless. I have had them on since the beginning of my forum, and most of Google's results were without them.

青山 素子

Quote from: eldʌkaː on January 02, 2007, 09:45:38 PM
The problem with SEO is that no one knows anything absolutely. It's very hard to make decisions about what to include as default when we don't know what all the effects will be.

Quote
What I know is that only SMF thread titles are being indexed, not the actual posts yet. vB used to be very good in having each and every forum post indexed in Google.
As to that... I don't understand how you can have a topic indexed without its posts?

What I think the user meant is that each post counted as a separate item in the index. This is because vB has the "feature" to show just one post. SMF doesn't display at that level of granularity, so you can only index at the topic level. Of course, IMHO, indexing individual posts is pointless, as you don't see the context when you see just those, and if the content on the page is indexed, you are fine anyway.


Quote from: Toadmund on January 02, 2007, 08:47:11 PM
From Google Webmaster Tools:
Quote
Home page crawl:
Googlebot last successfully accessed your home page on Dec 24, 2006.

That's good, 'cause just yesterday it said Sept. 9th was the last crawl date; finally updated a bit, it seems.
But I have yet to see anything updated: no new posts and no evidence of robots.txt kicking in yet. C'mon Google, speed 'er up!

Google doesn't crawl some sites all that often, especially if they are unpopular. Viewing their tools page for your site will help speed things up a bit (usually), as will putting their ads on your site (their bot must crawl the site to determine context).

If you don't like seeing their ads on your site, fake it by "testing". Use the following URL, replacing the placeholder items (such as the url value at the end) with what you want:

http://pagead2.googlesyndication.com/pagead/ads?format=728x90&client=ca&adtest=on&url=your site's main url

This previews what ads would look like, and if you do it once a week, it seems to influence Google to index your site a bit faster so the content will be more accurate.
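If you want to generate that preview URL for your own site, the site address should be URL-encoded before it goes into the url parameter. A minimal sketch follows; the endpoint and parameters are taken from the URL above, while the function name and the example.com address are placeholders. Whether the endpoint still behaves as described has not been checked here.

```python
from urllib.parse import urlencode

# Endpoint and parameter names as given in the post above.
AD_PREVIEW_ENDPOINT = "http://pagead2.googlesyndication.com/pagead/ads"


def ad_preview_url(site_url, ad_format="728x90"):
    """Build the ad-test preview URL, encoding the site address safely."""
    query = urlencode(
        {"format": ad_format, "client": "ca", "adtest": "on", "url": site_url}
    )
    return AD_PREVIEW_ENDPOINT + "?" + query


# Placeholder site address (assumption).
print(ad_preview_url("http://www.example.com/forum/"))
```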


geezmo

Ok, I was about to reply here so I referred to my first post in this thread posted September 6, 2006. I saw the test I made during that time and tried to check whether there have been changes.

Try this yourself. I'm afraid this test ultimately proved that SMF has a big problem regarding search engines.

Here's the test again:

Quote from: geezmo on September 06, 2006, 09:16:08 PM

Let's compare posts from an SMF forum and a vBulletin forum:

http://www.simplemachines.org/community/index.php?topic=48256.0
Posted on September 5, 2005

Quote
Migrating Database from PHPBB2 to Simple Machines

Hello!

I *really* like this forums script, and since my World of Warcraft guild has been using PHPBB2 for a pretty long time now, I would really love to switch them to these forums but I don't want to loose all of the posts and member accounts.

Is there a way to migrate the database file from PhPBB to SimpleMachines so that people don't need to create new accounts, etc.? I know the structure is completely different and all, but I would really love to make the switch.

Please advise. Your help is greatly appreciated!

Here's a post on the same date in vBulletin.org, a site that uses vBulletin.

http://www.vbulletin.org/forum/showthread.php?t=95681
Posted on September 5, 2005

Quote
Getting an editor on moderated posts/threads in AdminCP

I asked this on vB.com and have been advised by Jake to ask here...   I have a few forum categories where my team gets to moderate the threads and posts made in them. However, quite often we need to edit these threads not just in terms of content, but in terms of the format of the post, and the fact that we feel they may not be appropriate to validate in the moderated forum but they would be fine in another forum.  It would be nice if there was a way of editing the format of a post/thread whilst within AdminCP Moderation area as well as ammending the forum category to where the post will show - instead of validating it first then editing it straight away.  Is there a way to accomplish this? How difficult would this be folks?  Thanks.

THE TEST: Search whether the threads will appear in Google.

Google Search for an SMF post: "Migrating Database from PHPBB2 to Simple Machines"
Search Results: NONE

Google Search for a vBulletin post:
"Getting an editor on moderated posts/threads in AdminCP"
Search Results: 1

I tried this test again and STILL, the SMF post above doesn't show up in Google but the vB post has been there all along. Remember that the SMF post I mentioned was posted in this forum on September 5, 2005 -- that was more than a year ago!

SMF still has a long way to go before solving this search engine problem.

Dannii

Agreed, there is a problem. But can you define the problem and what causes it?

geezmo

Here's another issue. Try this in Google: "site:simplemachines.org/community" and you'll get 4,940 results. That's waaaay low compared to the 852,662 Posts in 111,432 Topics in the forum.

And if you click Page 10 of the search results, you only get as far as Page 4; the rest of the results have been omitted.

I'm only pointing this out because I thought the new version 1.1.1 has already addressed the search engine issue. It looks like the problem even got worse.

What's causing this? I have no idea. But perhaps to sort things out, the developers (with some help from SMF users) can make an intensive comparison of how vB and SMF work: the code, the functions, etc. That might show why vB posts appear in Google while SMF posts do not.

destalk

If you do a search for vBulletin duplicate URL and indexing issues you will find the web is full of criticism of vBulletin. In fact, many people feel that vBulletin is the worst forum software for duplication issues. One thread I read on the subject pointed out that each vB thread has "at least 10 URLs that can access it".

I am not trying to 'excuse' SMF. The .msg urls are a problem from a duplicate url point of view, but moving to a different forum software is rarely an answer IMHO.

Most forum software has duplicate URL issues (as do most content management systems) because developers generally concentrate on making the software work fast, rather than concentrate on SEO. As a result we have to utilise noindex or robots.txt solutions. It's a shame, because this is rather like shutting the stable door after the horse has bolted.

It's also a shame that many webmasters/developers don't see this issue as important, because SEO is not just about search engines and web site promotion. It's also about good practice for real human users. Google's reason for excluding duplicate URLs is that they don't want users to be bombarded with lots of different links to the same places. It's a good ideal, and as webmasters we should be helping them with that.


destalk

Quote
As to that... I don't understand how you can have a topic indexed without its posts?

This can happen if the topic is excluded by a robots.txt file or by noindex. It can also happen if Google thinks that URL is a duplicate. In these cases, Google may only index the title of the page.

destalk

Quote
I'm only pointing this out because I thought the new version 1.1.1 has already addressed the search engine issue. It looks like the problem even got worse.

It's also worth noting that it can take many months for Google to reindex and sort out its results when any changes have been made to a web site. In my experience, the bigger the change the longer it takes for Google to 'trust' the web site again.

I mentioned on another thread how a site I moved to vBulletin from SMF took over a year to be reindexed properly by Google.

The new SMF 1.1.1 has added noindex tags for all .msg urls. This means that a site that has thousands of URLs indexed with the .msg URLs, will suddenly be asking Google to exclude those URLs from its search results.

Once Google has dropped all those .msg urls, Google then has to make the effort to reindex those threads using the 'correct' root url. For example;

http://www.simplemachines.org/community/index.php?topic=112100.0

instead of;

http://www.simplemachines.org/community/index.php?topic=112100.msg718725
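To make the mapping between the two URL forms concrete, here is a small sketch that rewrites a .msg URL to its root topic URL. It assumes the root form is always topic=ID.0, as in the example above, and it also drops a trailing #msg fragment if one is present (that part is my assumption). The function name is hypothetical.

```python
import re

# Matches "topic=<id>.msg<id>" plus an optional "#msg<id>" fragment.
MSG_SUFFIX = re.compile(r"(topic=\d+)\.msg\d+(#msg\d+)?")


def root_topic_url(url):
    """Rewrite a .msg topic URL to the root form ending in '.0'."""
    return MSG_SUFFIX.sub(r"\1.0", url)


print(root_topic_url(
    "http://www.simplemachines.org/community/index.php?topic=112100.msg718725"
))
```

A script like this could be run over a list of indexed URLs (for example, from a site: search) to see how many distinct topics are really behind them.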

This is one reason that I think we should try to find a way to only apply the noindex rules to threads that were started *after* a forum upgrades to version 1.1. If anyone knows how to do this, it might seriously lessen the negative effect of upgrading to version 1.1.

I started a thread asking for help on how to do this here, but no-one seems to know how to do this.

Dannii

Don't triple post please.

As to the .msg, you should have excluded them in your robots.txt file, so the 1.1 change of using a noindex tag wouldn't make a difference.

destalk

Quote from: eldʌkaː on January 03, 2007, 04:16:45 AM
Don't triple post please.

Is this a Simple Machines forum rule? I wasn't aware of this. I was always under the impression that it was good BB protocol, when addressing separate issues from different users, to deal with them in separate posts. My apologies if I have broken the rules.

Quote from: eldʌkaː on January 03, 2007, 04:16:45 AM
As to the .msg, you should have excluded them in your robots.txt file, so the 1.1 change of using a noindex tag wouldn't make a difference.

Well, perhaps. But what should or shouldn't have been done in the past doesn't really help with the current situation. It would have been great if SMF had had this feature from the beginning, but it didn't. That's life, we all live and learn. ;)

Most SMF forum owners have probably not even heard of robots.txt. And it seems clear from the many threads on this subject that many more are unclear as to how to even format the file properly.

That has led to a situation where most SMF forums will have had many of their threads with .msg urls indexed in search engines. These URLs will now all be dropped by search engines. Ideally, these will eventually be replaced by the root urls, but this can take a very long time and sometimes can cause other problems. Wouldn't it be a good idea if we could find a solution to this by finding a way to only apply the new noindex rules to newly created threads?

Dannii

That should be possible. Try this:
Display.php
find:
// Create a previous next string if the selected theme has it as a selected option.
replace:
// Don't use noindex if an old topic. 1000 is a placeholder topic ID:
// topics with a lower ID are treated as old and stay indexable.
if ($topic < 1000)
	$context['robot_no_index'] = false;

// Create a previous next string if the selected theme has it as a selected option.

destalk

Wow. Brilliant. Thanks eldʌkaː.

I'll give that a go. I assume that the 1000 in if ($topic < 1000) is simply the topic number that you want the noindex rules to start working from?

Dannii


