Do SMF posts appear in search engines? (Something to be concerned about)

Started by geezmo, September 06, 2006, 09:16:08 PM


Dannii

Because these lines block them:
User-agent: Googlebot
Disallow: /forum/*.msg*
Disallow: /forum/*sa=showPosts*
Disallow: /forum/*prev_next*
Disallow: /forum/*action=printpage*
Disallow: /forum/*action=recent*

That's good. They're supposed to be blocked.
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Farmacija

Yes, but a lot of messages and posts that could be indexed are not, because of the robots.txt file.
I suppose the problem is this line: "Disallow: /forum/*.msg*". What exactly does that line do with the posts?
www.farmaceuti.com
www.farmaceuti.com/tekstovi

Dannii

It blocks the links to individual posts to reduce duplicate content. If you're running 1.1 you can remove that line.
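To see which URLs a rule like that catches, here is a minimal sketch of Googlebot-style wildcard matching. It assumes "*" matches any run of characters and that rules are matched from the start of the path; the helper names and the sample ".msg" URL are made up for illustration. (Python's built-in urllib.robotparser is not used because it does not expand "*" wildcards.)

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard Disallow pattern into a regex.

    Assumption: '*' matches any run of characters, a trailing '$'
    anchors the end, and everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

# The Disallow lines quoted earlier in this topic.
DISALLOW = [
    "/forum/*.msg*",
    "/forum/*sa=showPosts*",
    "/forum/*prev_next*",
    "/forum/*action=printpage*",
    "/forum/*action=recent*",
]
RULES = [rule_to_regex(p) for p in DISALLOW]

def is_blocked(path: str) -> bool:
    """Return True if any Disallow rule matches the path plus query string."""
    return any(rule.match(path) for rule in RULES)

# Hypothetical per-post link versus a normal topic page link.
print(is_blocked("/forum/index.php?topic=3424.msg51234"))  # True
print(is_blocked("/forum/index.php?topic=3424.40"))        # False
```

So the normal topic pages stay crawlable under these rules; only the per-post ".msg" variants and the other listed actions are matched.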

Farmacija

Yes, I'm running 1.1.1, and I will remove it, because with that line Googlebot can't get to all the posts in certain topics.

Ben_S

If you remove it, those links still won't be indexed, because of the meta tag that tells Google not to index them, and you will be wasting bandwidth on Google visiting them in the first place (assuming it currently doesn't bother reading them because of the robots.txt rule; it may anyway).

The .msg links are just duplicate links to posts that are already indexed through the proper topic links, e.g. index.php?topic=3424 and index.php?topic=3424.40 etc.
Liverpool FC Forum with 14 million+ posts.

Farmacija

Aha, OK, I thought Google wasn't indexing those posts at all.
Thank you, Ben_S.

farmer77

I see this in the default index.template:

Quote
<meta name="robots" content="noindex" />

Is that good for SEO? I always thought it was good to be indexed.
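One way to find out which pages actually carry that tag is to look at the HTML each page sends. The following is only a rough sketch using Python's standard html.parser; the class and function names are made up here, and fetching the page is left out.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())

def has_noindex(html: str) -> bool:
    """Return True if the page asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex" /></head></html>'
print(has_noindex(page))  # True
```

Running this against a normal topic page and against one of the ".msg" links would show whether the tag is on every page or only on the duplicates Ben_S describes above.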

Dannii

"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."


青山 素子

Quote from: Isaac on December 28, 2006, 12:18:14 AM
Google Officially Address Duplicate Content for Forums

Oddly, when one of our testers did something similar to the fix we added in 1.1 final, their indexed count shot up. I had similar results when moving from 1.1 RC3 (before this fix) to the final 1.1 for one of my boards.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


pushkin22

I tested my forum (SMF 1.1.1) with this tool: http://www.linkvendor.com/seo-tools/se-spider.html
In the inbound links I see only links like "index.php?PHPSESSID=9ed92b606c1c0053772a8f601d5919ea&board=42.0" or "index.php?PHPSESSID=c966a4ff67834db2d7be410517b56320&topic=1475.0"

Maybe "PHPSESSID=something&" is the problem?

geezmo

For several months now, I've been using Nikolas's SMF Archive mod and have put the following in my robots.txt:

Quote
Disallow: /forum/index.php?referrerid*
Disallow: /forum/index.php?action=calendar
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=help
Disallow: /forum/index.php?action=search
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=register
Disallow: /forum/index.php?action=login

I was satisfied to see that Google has indexed more than 10,000 entries from my forum, almost equivalent to the number of topics.

But surprise! I upgraded the forum from 1.1 RC3 to 1.1 and voila! The Google entries from my site have dropped to 200. I'm sure the cause was the upgrade, because I never changed any settings in the forum or modified robots.txt, and I have kept the sitemaps intact for months now. Also, the drop in Google results occurred 3 days after I upgraded the forum to 1.1.

Any ideas why this happened? One week after the upgrade, the number of Google results for my forum is still 200.

destalk

Quote from: pushkin22 on December 31, 2006, 05:47:45 AM
I tested my forum (SMF 1.1.1) with this tool: http://www.linkvendor.com/seo-tools/se-spider.html
In the inbound links I see only links like "index.php?PHPSESSID=9ed92b606c1c0053772a8f601d5919ea&board=42.0" or "index.php?PHPSESSID=c966a4ff67834db2d7be410517b56320&topic=1475.0"

Maybe "PHPSESSID=something&" is the problem?


Since RC2, SMF will not show PHPSESSID URLs to spiders. I guess that the SEO Tools spider is not recognised as a spider. The best way to see what Google is seeing is to check the cached pages.
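If you still want to compare the tool's output with the links a recognised spider would get, you can strip the session parameter yourself. This is only a sketch with Python's urllib.parse; the function name is made up, and it assumes "&"-separated query strings like the ones quoted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_id(url: str, param: str = "PHPSESSID") -> str:
    """Return the URL with the session parameter removed.

    Assumption: the query string uses '&' separators, as in the
    links quoted above; other separators are not handled here.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_session_id(
    "index.php?PHPSESSID=9ed92b606c1c0053772a8f601d5919ea&board=42.0"
))  # index.php?board=42.0
```

If the stripped links match the board and topic links you expect, then the session ID is the only difference the tool is seeing.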

Farmacija

Now there are over 8,000 pages of my forum indexed by Google, although I restricted nearly 5,000 posts by editing the robots.txt file. :)

Isaac

Geezmo, my forum currently has 454 pages indexed in Google.  Yesterday, I changed my robots.txt file to what you posted above.  It'll be interesting to see the results.

madfiddler

If you have mkportal installed you have to turn off the "search engine friendly urls" in admin :(

fshagan

I've just started using SMF, but have a tip for a good sitemap maker ... gsitecrawler.com

When you use it, use the Wizard to add your site by clicking on the "Add New Project" button ... it will step you through setting it up to crawl your site.

In the "Filter" section, you list any part of a URL you want to have excluded from the sitemap file.  "action=", "Themes", "sort=" and a few others are in mine ... like "login.php", my phpAdsNew directory, my cgi-bin directory, etc. 

The way I built my list of exclusions was to have the spider crawl my site and look at the URLs.  I wanted to get to the point where only URLs with the format "index.php?board=10.0" and "index.php?topic=23.0" were listed.

It will take a while to crawl your site because it pauses periodically to reduce the load on the server.  But once you have it set up, you simply "Recrawl" the site, review the URL list, and then have the program FTP the sitemap.xml file to your site and ping Google that it's there.  It also creates Yahoo-style "urllist.txt" files.
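For anyone who would rather script the same idea, here is a rough sketch of the filtering step: drop every URL containing an excluded substring, keep only board and topic links, and write them out as a sitemap. This is not how gsitecrawler works internally; the example.com URLs and function names are made up, and only three of the exclusions mentioned above are included.

```python
import re
import xml.etree.ElementTree as ET

# Substrings to exclude, taken from the filter list described above.
EXCLUDE = ("action=", "Themes", "sort=")
# Keep only links shaped like index.php?board=10.0 or index.php?topic=23.0.
KEEP = re.compile(r"index\.php\?(board|topic)=\d+\.\d+$")

def filter_urls(urls):
    """Return the crawled URLs that belong in the sitemap."""
    return [
        url for url in urls
        if not any(part in url for part in EXCLUDE) and KEEP.search(url)
    ]

def build_sitemap(urls) -> str:
    """Serialize the URLs as a minimal sitemap.xml document."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical crawl results.
crawled = [
    "http://example.com/forum/index.php?board=10.0",
    "http://example.com/forum/index.php?topic=23.0",
    "http://example.com/forum/index.php?action=login",
    "http://example.com/forum/index.php?board=10.0;sort=subject",
    "http://example.com/forum/Themes/default/style.css",
]
print(build_sitemap(filter_urls(crawled)))
```

Only the first two links survive the filter, which matches the goal of listing nothing but board and topic pages.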

nitins60

How did I miss this topic? Many of my friends suggest using vB for this very reason! Let's see what happens in the next version. It would be good if the project managers looked here.

青山 素子

Quote from: nitins60 on January 02, 2007, 03:05:40 AM
How did I miss this topic? Many of my friends suggest using vB for this very reason! Let's see what happens in the next version. It would be good if the project managers looked here.

Are you asking the SMF Project Managers to read through this? If so, they are aware of it. Development doesn't stop and any good info here will be considered. (This is how some of the changes for SEO were added to 1.1 Final.)


geezmo

Good to hear that. When I first posted this issue last year, a few "Simple Machines Heroes" criticized me and some other people for being too interested in having our sites appear in Google. They even said they don't care whether their sites appear in Google and it doesn't matter if their sites don't come up on the first pages of Google searches. Anyway that's been a long time, just glad that finally SMF has realized that search engine optimization is a must these days, unless of course you're not concerned with getting more customers or more hits.
