Do SMF posts appear in search engines? (Something to be concerned about)

Started by geezmo, September 06, 2006, 09:16:08 PM


Toadmund

# robots.txt generated at www.mcanerin.com
User-agent: *
Disallow: *action=login*
Disallow: /Themes/

Like this?
Then I stick it in my forum root?

Sorry to be so dense, but I am dense!

Dannii

"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

destalk

Quote from: Toadmund on December 17, 2006, 10:14:02 PM
# robots.txt generated at www.mcanerin.com
User-agent: *
Disallow: *action=login*
Disallow: /Themes/

Like this?
Then I stick it in my forum root?

Sorry to be so dense, but I am dense!

You may want to specify the wildcard rules only for the big three search engines: Google, Yahoo and MSN. They are the only ones that understand wildcards (*). Bots from other, smaller search engines will just get confused by wildcards or ignore them, as wildcards are not part of the robots.txt standard.

Also, there have been issues with Googlebot defaulting to the rules specified in "User-agent: Googlebot", as opposed to "User-agent: *".

So it may be worth repeating any general rules for each specific bot.

So something like this may be more useful.

User-agent: *
Disallow: /Themes/
Disallow: /community/index.php?action=login

User-agent: Googlebot
Disallow: *action=login*
Disallow: /Themes/

User-agent: MSNBot
Disallow: *action=login*
Disallow: /Themes/

User-agent: Slurp
Disallow: *action=login*
Disallow: /Themes/
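
The point about repeating the general rules can be tried out locally. What follows is only a rough sketch: it uses Python's standard-library `urllib.robotparser` as a stand-in for a crawler, which is an assumption, since real Googlebot behaviour may differ. The robots.txt text, the bot name `SomeOtherBot` and the example.com URL are made up for illustration. The Googlebot group here deliberately does not repeat the /Themes/ rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Googlebot group does NOT repeat the /Themes/ rule.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Themes/

User-agent: Googlebot
Disallow: *action=login*
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse the robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up URL of a theme file on an imaginary site.
THEME_URL = "http://www.example.com/Themes/default/style.css"

# A bot without its own group falls back to the "User-agent: *" rules.
generic_bot_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", THEME_URL)
# A bot with its own group is checked against that group only.
googlebot_allowed = is_allowed(ROBOTS_TXT, "Googlebot", THEME_URL)

print("SomeOtherBot may fetch theme file:", generic_bot_allowed)
print("Googlebot may fetch theme file:", googlebot_allowed)
```

With this parser, the bot that has its own group is not held to the "User-agent: *" rules, which is why the example file above repeats /Themes/ in every group.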


Toadmund

So I just put all that on notepad in a .txt file, and then shove it into my forum root.

Thanks! :D
I'll do it when I get up tomorrow.

destalk

Quote from: Toadmund on December 18, 2006, 12:52:53 AM
So I just put all that on notepad in a .txt file, and then shove it into my forum root.

Thanks! :D
I'll do it when I get up tomorrow.

Well... something like that. I was just giving an example. You may want to customise it for your particular forum. For example, if your forum is in a directory called community, then it would look something like

Disallow: /community/Themes/

You really want to get this kind of thing right for your site, otherwise you can end up excluding files that you wanted indexed or including pages that you didn't want indexed.

A good way to check if your robots.txt file is properly configured is to get a Google Sitemaps account. They have an analysis tool which will let you type in some URLs to see if the robots rules that you have set are functioning properly.
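
If you want a quick local check before signing up for anything, something along these lines may help. It is a minimal sketch using Python's standard-library `urllib.robotparser`, not Google's tool, and this parser only does plain prefix matching, so it is only suitable for non-wildcard rules. The bot name `ExampleBot` and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rule from the post above, for a forum installed in /community/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /community/Themes/
"""


def check_urls(robots_txt, user_agent, urls):
    """Return a dict mapping each URL to True (allowed) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


# Hypothetical URLs to test, much like typing them into an analysis tool.
results = check_urls(ROBOTS_TXT, "ExampleBot", [
    "http://www.example.com/community/Themes/default/style.css",
    "http://www.example.com/community/index.php",
    "http://www.example.com/Themes/default/style.css",
])

for url, allowed in results.items():
    print("allowed" if allowed else "blocked", url)
```

The third URL is there to show why the directory prefix matters: a rule for /community/Themes/ says nothing about a /Themes/ folder at the site root.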

windyweather

I looked through this whole post and maybe I missed it, but it seemed to me that while we have mentioned google sitemaps, nobody said:
Quote: "Go here and get the SMF plugin to build google sitemaps."

Maybe this is old news to everyone, but I didn't see it mentioned clearly here.


  • Google has a technology to build sitemaps. Sitemaps are XML files that list every page you have, with options about how often they might change so the robot can come back to check.
  • Many CMSs - WP and others - have plugins that build google sitemaps, and then ping google to suck them up.
  • The information can be easily found on the google webmasters pages.
  • If SMF doesn't have a google sitemap builder, then it should - IMHO. An optional plugin that those of us concerned about page traffic can use to put every topic and reply into Google.
  • Without a sitemap, Google does not search all the pages of a CMS site.
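
To make the first bullet a bit more concrete, here is a minimal sketch of what building such an XML file could look like. It assumes the sitemaps.org 0.9 format (`urlset`, `url`, `loc`, `changefreq`); the function name `build_sitemap` and the topic URLs are made up for illustration, and this is not how any particular SMF mod does it.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (assumed format for this sketch).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Hint to the robot about how often the page is expected to change.
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical forum topics; a real generator would read these from the database.
sitemap_xml = build_sitemap([
    ("http://www.example.com/forum/index.php?topic=1.0", "daily"),
    ("http://www.example.com/forum/index.php?topic=2.0", "weekly"),
])
print(sitemap_xml)
```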

I found all this out by searching google when a client of mine wanted to ensure that she had more traffic for the WordPress site that I built for her.

fwiw, and sorry if I've repeated things, but I didn't see it in the 6 pages.

ww

青山 素子

There is a new mod out called SEO4SMF that is compatible with 1.1. It currently isn't approved for the official site yet, but will hopefully be soon.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


Dannii

Quote: Without a sitemap, Google does not search all the pages of a CMS site.
This is not true. A sitemap just helps Google find your pages faster. If it thinks your site is interesting enough it will still index everything.

Toadmund

OK, so here is what I put into my forum root, root being 'forum':
User-agent: *
Disallow: /forum/Themes/
Disallow: /forum/index.php?action=login

User-agent: Googlebot
Disallow: *action=login*
Disallow: /forum/Themes/

User-agent: MSNBot
Disallow: *action=login*
Disallow: /forum/Themes/

User-agent: Slurp
Disallow: *action=login*
Disallow: /forum/Themes/


Is that fine?

And I would have to concur with windyweather: it asks for a sitemap (Google Webmaster Tools), but how does one make one?

Dannii

"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

destalk

You don't actually have to make a sitemap to use the Google robots.txt tool (in fact many people don't like them). Just sign up for an account and type in various URLs to check that the robots.txt file is working as it should (for Google anyway).

destalk

Quote from: Toadmund on December 18, 2006, 11:56:40 PM
OK, so here is what I put into my forum root, root being 'forum'

The robots.txt file needs to go into the web site root, not the root folder of the forum, i.e. www.yourdomain.com/robots.txt
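
Put another way, whatever page of the site you start from, the robots.txt address is always the host plus /robots.txt. A tiny sketch using Python's standard library shows the idea; the helper name `robots_txt_url` and the forum URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the site-root robots.txt URL for any page on the same host."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical forum page living in a /forum/ subfolder.
print(robots_txt_url("http://www.yourdomain.com/forum/index.php?topic=1.0"))
```

So a forum installed in /forum/ still needs its rules in the robots.txt at the top of the site, with /forum/ written into each path.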

Toadmund

That's why I was being so clear as to where I put it, so someone may correct me.
I was wondering where exactly it goes.
Thanks!

destalk

For those interested in duplicate content issues, here is a brand new blog post from the horse's mouth:

Google's view of Duplicate Content.


farmer77

Now that 1.1 has been out for a while, anyone see Google indexing more of their pages?

Toadmund

Yes, but it's still inadequate. It only indexes as far as the title of the post, so all that is indexed under the link is what I had in my meta tag. And since all the links in Google are identical (all but the title), they are relegated to the omitted results department.

I went and deleted my meta tag, and put in a robots.txt tag, now I wait......

青山 素子

Not sure if I posted in this thread or not, but I went from about 95 items indexed to just over 600 in under a week after moving from RC3 to final (I did open up some more boards as well, but the older setup should have still had over 100 threads). I just checked and see 813 items indexed now.


Farmacija

A couple of weeks ago I had about 3000 links in Google, but now
maybe it's because of the robots.txt file or Nicolas' archive script, I don't know... :wink:
www.farmaceuti.com
www.farmaceuti.com/tekstovi

farmer77

I just got Nick's SMF archive script this morning.  Really cool guy.  Let's hope it works for me.

Farmacija

I put in this robots.txt file:
User-agent: Googlebot
Disallow: /forum/*.msg*
Disallow: /forum/*sa=showPosts*
Disallow: /forum/*prev_next*
Disallow: /forum/*action=printpage*
Disallow: /forum/*action=recent*
User-agent: *
Disallow: /Themes/
Disallow: /community/index.php?action=login

User-agent: Googlebot
Disallow: *action=login*
Disallow: /Themes/

User-agent: MSNBot
Disallow: *action=login*
Disallow: /Themes/

User-agent: Slurp
Disallow: *action=login*
Disallow: /Themes/

and I got this result from the Google sitemap site:

Why are there so many restricted pages?
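
One possible reason, offered only as a guess: under wildcard matching, a rule like /forum/*.msg* blocks every URL under /forum/ that contains ".msg" anywhere, and SMF links to individual messages look like that. The sketch below is a hand-written matcher for the `*` (any characters) and trailing `$` (end of URL) extensions. It is an illustration, not Google's actual code, and the topic URLs are hypothetical.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern with * and trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """True if the path (including query string) matches the Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None


RULE = "/forum/*.msg*"
# Hypothetical SMF-style URLs: one pointing at a message, one at a topic start.
topic_with_msg = "/forum/index.php?topic=123.msg456"
topic_plain = "/forum/index.php?topic=123.0"

print(is_blocked(RULE, topic_with_msg))
print(is_blocked(RULE, topic_plain))
```

If most of the links Google has found on the forum are of the message kind, a rule like that would restrict a large share of them. Note also that the file has two separate "User-agent: Googlebot" groups, and how a crawler combines duplicate groups is not something to rely on.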
