SMF Version: SMF 1.1 RC2
I notice Google has indexed the following two URLs:
http://forumposters.org/forum/index.php?action=help;page=loginout
and
http://forumposters.org/forum/index.php?action=help;page=pm
Neither of these is a URL that it makes sense for Google to index. How can I tell Google not to index these pages?
You should have a robots.txt file in your site root; edit that.
Would you be so kind as to post a sample robots.txt file that works well for smf?
http://www.simplemachines.org/community/index.php?topic=24726
Awesome. So, according to that thread this is all I need in robots.txt:
User-Agent: *
Disallow: /index.php?action=search
Disallow: /index.php?action=calendar
Disallow: /index.php?action=login
Disallow: /index.php?action=register
Disallow: /index.php?action=profile
Disallow: /index.php?action=stats
Also:
Disallow: /index.php?action=help*
Disallow: /index.php?action=printpage*
I failed to realize that the * at the end of these URLs is essential. I've just added that.
After peeking at your robots.txt file, I noticed you have these two lines:
Disallow: /forum/index.php?*all*
Disallow: /forum/index.php?*msg*
May I ask why you do that? Wouldn't that prevent Google from spidering the forum posts?
It will still index the pages whose URLs don't contain "all" or "msg".
Also, you can remove the /forum/index.php? and replace that with a * unless you have multiple forums and want to block only one.
Mine has:
User-agent: *
Disallow: *action=admin*
Disallow: *action=chat*
Disallow: *action=help*
Disallow: *action=login*
Disallow: *action=mlist*
Disallow: *action=post*
Disallow: *action=register*
Disallow: *action=search*
Disallow: *action=who*
Disallow: /Themes/
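For anyone who wants to check a rule before waiting on a recrawl, here is a minimal Python sketch of the wildcard matching discussed above. It assumes Google-style semantics (a rule is matched from the start of the URL path, * matches any run of characters, a trailing $ anchors the end); other crawlers may not honor wildcards at all. The function names are made up for this example, and the msg345 URL is a hypothetical SMF-style link, not one taken from the thread.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Turn one Disallow value into a regex (assumes Google-style wildcards)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, rules: Iterable[str]) -> bool:
    """True if any Disallow rule matches the path (Allow rules are ignored here)."""
    return any(rule_to_regex(rule).match(path) is not None for rule in rules)


help_url = "/forum/index.php?action=help;page=pm"

# A rule without the /forum prefix never matches a forum installed under /forum/.
print(is_blocked(help_url, ["/index.php?action=help*"]))
# A leading * sidesteps the install path entirely.
print(is_blocked(help_url, ["*action=help*"]))
# Under prefix matching, the trailing * adds nothing.
print(is_blocked(help_url, ["/forum/index.php?action=help"]))
# Hypothetical per-message link versus a plain topic link.
print(is_blocked("/forum/index.php?topic=12.msg345", ["/forum/index.php?*msg*"]))
print(is_blocked("/forum/index.php?topic=12.0", ["/forum/index.php?*msg*"]))
```

Under these assumptions, the thing that matters is whether the rule accounts for the /forum/ prefix (or starts with *), not whether it ends in *.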
Quote from: forumposters - June 28, 2006, 03:57:02 AM
I failed to realize that the * at the end of these URLs is essential. I've just added that.
After peeking at your robots.txt file, I noticed you have these two lines:
Disallow: /forum/index.php?*all*
Disallow: /forum/index.php?*msg*
May I ask why you do that? Wouldn't that prevent Google from spidering the forum posts?
Some of SMF's links could be misinterpreted as duplicate content, so I block individual message links and links that combine multiple pages ;)
I still seem to be having problems and I can't figure this out. Here's my robots.txt file:
User-agent: *
Disallow: *action=admin*
Disallow: *action=help*
Disallow: *action=login*
Disallow: *action=mlist*
Disallow: *action=post*
Disallow: *action=register*
Disallow: *action=search*
Disallow: *action=trader*
Disallow: *action=profile*
Disallow: *action=who*
Disallow: /forum/Themes/
Disallow: /forum/admin/
Disallow: /forum/attachments/
Disallow: /cgi-bin/
For some reason Google is indexing hundreds of URLs with action=trader in them, and I don't want this.
Are they still being newly indexed, or are you just seeing the old pages? It takes a long time for a page to be removed from the index.
Good question. After taking a closer look, I think it's just old pages that were indexed a couple months ago before I added the line
Disallow: *action=trader*