Simple Machines Community Forum

SMF Support => SMF 1.1.x Support => Topic started by: forumposters - June 01, 2006, 02:38:50 AM

Title: Google Indexing Pages I Don't Want Indexed
Posted by: forumposters - June 01, 2006, 02:38:50 AM
SMF Version: SMF 1.1 RC2
I notice Google has indexed the following two URLs:

http://forumposters.org/forum/index.php?action=help;page=loginout
and
http://forumposters.org/forum/index.php?action=help;page=pm

Neither of these is a URL that it makes sense for Google to index. How can I tell Google not to index these pages?
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: moviespot - June 01, 2006, 07:26:50 AM
You should have a robots.txt file; edit it.
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: forumposters - June 01, 2006, 10:45:09 AM
Would you be so kind as to post a sample robots.txt file that works well for SMF?
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: H - June 01, 2006, 11:42:16 AM
http://www.simplemachines.org/community/index.php?topic=24726
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: forumposters - June 01, 2006, 03:19:42 PM
Awesome. So, according to that thread, this is all I need in robots.txt:

User-Agent: *
Disallow: /index.php?action=search
Disallow: /index.php?action=calendar
Disallow: /index.php?action=login
Disallow: /index.php?action=register
Disallow: /index.php?action=profile
Disallow: /index.php?action=stats
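
A quick way to sanity-check rules like these is to remember that a Disallow value is compared against the start of the URL path. The sketch below assumes plain prefix matching and assumes the forum lives under /forum/, as the URLs in the first post suggest; `is_disallowed` is a hypothetical helper, not part of SMF or of any crawler.

```python
def is_disallowed(path, disallow_prefixes):
    """Plain prefix matching: a rule blocks any path that starts with it."""
    return any(path.startswith(prefix) for prefix in disallow_prefixes)

# Assumed search URL, following the /forum/ layout seen in the first post.
search_url = "/forum/index.php?action=search"

# Rule written as if the forum sat at the site root.
print(is_disallowed(search_url, ["/index.php?action=search"]))        # False
# Same rule with the /forum/ directory included.
print(is_disallowed(search_url, ["/forum/index.php?action=search"]))  # True
```

If the forum is installed in a subdirectory, the rules would need that directory in front of index.php (or a leading wildcard, where the crawler supports one) before they match anything.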
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: H - June 01, 2006, 04:57:44 PM
Also:

Disallow: /index.php?action=help*
Disallow: /index.php?action=printpage*
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: forumposters - June 28, 2006, 03:57:02 AM
I failed to realize that the * at the end of these URLs is essential. I've just added that.
After peeking at your robots.txt file, I noticed you have these two lines:

Disallow: /forum/index.php?*all*
Disallow: /forum/index.php?*msg*

May I ask why you do that?  Wouldn't that prevent Google from spidering the forum posts?
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: Dannii - June 28, 2006, 04:00:56 AM
It will still index the pages whose URLs don't contain all or msg.

Also, you can remove the /forum/index.php? and replace that with a * unless you have multiple forums and want to block only one.

Mine has:

User-agent: *
Disallow: *action=admin*
Disallow: *action=chat*
Disallow: *action=help*
Disallow: *action=login*
Disallow: *action=mlist*
Disallow: *action=post*
Disallow: *action=register*
Disallow: *action=search*
Disallow: *action=who*
Disallow: /Themes/
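
To see which URLs wildcard rules like these would catch, here is a minimal sketch. It assumes Google-style matching, where `*` stands for any run of characters and the rule is matched from the beginning of the path; not every crawler supports wildcards. `rule_matches` is a hypothetical helper written for this example.

```python
import re

def rule_matches(rule, path):
    """Return True if a Disallow rule matches the URL path plus query string."""
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    # re.match anchors at the start of the path, like robots.txt matching.
    return re.match(regex, path) is not None

help_url = "/forum/index.php?action=help;page=loginout"  # from the first post
topic_url = "/forum/index.php?topic=100.0"               # made-up topic URL

print(rule_matches("*action=help*", help_url))   # True
print(rule_matches("*action=help*", topic_url))  # False
print(rule_matches("*action=help", help_url))    # True
```

Under these assumed semantics the leading `*` is what lets the rule ignore the /forum/index.php? part, while the trailing `*` changes nothing, because a rule already matches anything that continues past it.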
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: H - June 28, 2006, 12:39:56 PM
Quote from: forumposters - June 28, 2006, 03:57:02 AM
I failed to realize that the * at the end of these URLs is essential. I've just added that.
After peeking at your robots.txt file, I noticed you have these two lines:

Disallow: /forum/index.php?*all*
Disallow: /forum/index.php?*msg*

May I ask why you do that?  Wouldn't that prevent Google from spidering the forum posts?

Some of SMF's links could be misinterpreted as duplicate content, so I block individual message links and links that combine multiple pages ;)
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: forumposters - October 24, 2006, 10:31:06 PM
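
As a rough illustration of that idea, the sketch below runs the two quoted rules against three URL shapes for the same topic. The URLs and IDs are made up for the example, and the matching again assumes Google-style wildcards; `blocked` is a hypothetical helper.

```python
import re

def blocked(rule, path):
    # Assumed Google-style matching: '*' is a wildcard and the rule
    # is matched from the beginning of the path.
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.match(regex, path) is not None

rules = ["/forum/index.php?*all*", "/forum/index.php?*msg*"]

# Hypothetical URL shapes for one topic; the IDs are invented.
urls = [
    "/forum/index.php?topic=100.0",       # normal topic page
    "/forum/index.php?topic=100.msg250",  # link to a single message
    "/forum/index.php?topic=100.0;all",   # whole topic on one page
]

for url in urls:
    verdict = "blocked" if any(blocked(r, url) for r in rules) else "crawlable"
    print(verdict, url)
```

Only the plain topic page stays crawlable, so the same posts are reachable under one URL instead of three. Note that a pattern as broad as `*all*` would also catch any other query string that happens to contain "all".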
I still seem to be having problems, and I can't figure this out. Here's my robots.txt file:

User-agent: *
Disallow: *action=admin*
Disallow: *action=help*
Disallow: *action=login*
Disallow: *action=mlist*
Disallow: *action=post*
Disallow: *action=register*
Disallow: *action=search*
Disallow: *action=trader*
Disallow: *action=profile*
Disallow: *action=who*
Disallow: /forum/Themes/
Disallow: /forum/admin/
Disallow: /forum/attachments/
Disallow: /cgi-bin/

For some reason Google is indexing hundreds of URLs with action=trader in them, and I don't want this.
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: Dannii - October 24, 2006, 10:38:19 PM
Are they still being newly indexed, or are you just seeing the old pages? It takes a long time for a page to be removed from the index.
Title: Re: Google Indexing Pages I Don't Want Indexed
Posted by: forumposters - October 24, 2006, 10:51:05 PM
Good question. After taking a closer look, I think it's just old pages that were indexed a couple of months ago, before I added the line

Disallow: *action=trader*