Creating a good ROBOTS.TXT for SMF (search engine friendly)

Started by Mr. Jinx, January 27, 2006, 05:32:03 AM


Mr. Jinx

Quote from: geoffs on March 12, 2006, 11:59:46 AM
Disallow: *forum/index.php?action=admin*
Disallow: *forum/index.php?action=help*
.
.
.

This works as expected when tested on the Google robots.txt validation page.


Why don't you use "Disallow: /forum/index.php?action=........"?
That's what the first post tells you :)
Eldacar's suggestions are a nice try, but they miss the point.
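
Under the original robots.txt convention, a Disallow value is a plain prefix of the URL path (query string included), which is why root-based rules like these work without any wildcards. A minimal sketch using Python's standard `urllib.robotparser`, which implements this prefix matching; `example.com` and the `/photos/forum/` second install are placeholders, not part of the original posts:

```python
from urllib.robotparser import RobotFileParser

# Root-based rules in the style of the first post; "/forum/" is a placeholder path.
RULES = [
    "User-agent: *",
    "Disallow: /forum/index.php?action=admin",
    "Disallow: /forum/index.php?action=help",
]

parser = RobotFileParser()
parser.parse(RULES)  # parse() takes the file's lines directly, no network access


def allowed(path: str) -> bool:
    """True if a generic crawler ("*") may fetch this path on the placeholder host."""
    return parser.can_fetch("*", "http://example.com" + path)


for path in (
    "/forum/index.php?action=admin",         # starts with a Disallow value
    "/forum/index.php?topic=42.0",           # ordinary topic page
    "/photos/forum/index.php?action=admin",  # hypothetical second forum: not covered
):
    print(path, "->", "allowed" if allowed(path) else "blocked")
```

Each root-based rule covers exactly one forum path, which is the limitation the wildcard versions in this thread try to work around.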


geoffs

Quote from: statornic on March 12, 2006, 06:55:46 PM
hxxp:www.searchengineworld.com/cgi-bin/robotcheck.cgi?ti=1142207612&action=reset [nonactive]

When you test your robots.txt file with the link above, you will find some warnings.

Hi. I am not sure whether this was posted in response to my message, but in case it was...

That validator will produce those warnings for any robots.txt file that includes wildcard characters in the Disallow strings. Whether the wildcards are at the head or tail of the Disallow spec, or both, makes no difference. As huwnet already mentioned in a previous post, Google is probably not the only search engine that can handle wildcards.
hxxp:www.geoffshapirophotography.com [nonactive]

geoffs

Quote from: Mr. Jinx on March 15, 2006, 03:21:03 PM
Why don't you use "Disallow: /forum/index.php?action=........"

Hi there, Mr. Jinx. Your follow-up posts after your original post in this topic seem to indicate that you, too, are using wildcards in your robots.txt. The only thing I did differently was to also put the wildcard at the head of each Disallow spec, so that Google would correctly handle specs that are not specified from the site root.

My own site would work with root-based specs, but I was illustrating a case for the benefit of others, for example sites with multiple forums installed. Without wildcards at the head and tail, you'd have to put a separate set of root-based specs into robots.txt for each forum path.
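
To see the difference, here is a rough sketch of Google-style matching, where `*` matches any run of characters, a trailing `$` anchors the end, and matching always starts at the beginning of the path. The `rule_matches` helper is illustrative (it ignores percent-encoding), and `/photos/forum/` stands in for a hypothetical second forum install:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Google-style robots.txt matching (sketch, no percent-encoding handling).

    '*' matches any run of characters, a trailing '$' anchors the end,
    and matching always starts at the beginning of the path.
    """
    end = "$" if pattern.endswith("$") else ""
    core = pattern[:-1] if end else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*")) + end
    return re.match(regex, path) is not None


ROOT_BASED = "/forum/index.php?action=admin"
WILDCARD = "*forum/index.php?action=admin*"

# Hypothetical site with two SMF installs.
for path in ("/forum/index.php?action=admin", "/photos/forum/index.php?action=admin"):
    print(path, "root-based:", rule_matches(ROOT_BASED, path),
          "wildcard:", rule_matches(WILDCARD, path))
```

The price of the head wildcard is precision: `*forum/` also matches a path such as `/myforum/index.php?action=admin`.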

Dannii

This is my current one:
User-agent: *
Disallow: *action=admin*
Disallow: *action=help*
Disallow: *action=login*
Disallow: *action=mlist*
Disallow: *action=post*
Disallow: *action=register*
Disallow: *action=search*
Disallow: *action=who*
Disallow: /Themes/
I went through every page of my forum that Google had indexed and built my list from that. I dropped the index.php? part because it didn't work as well and wasn't needed. Of course, if other parts of your site also use action= and you only want to block SMF, you'd have to make the rules more specific.
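
Dropping the `index.php?` part means each rule matches `action=admin` anywhere in a URL, which is the trade-off mentioned above. A sketch with an illustrative Google-style matcher; `/shop/cart.php` is a hypothetical non-SMF page on the same site:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Sketch of Google-style matching: '*' = any characters, trailing '$' = end."""
    end = "$" if pattern.endswith("$") else ""
    core = pattern[:-1] if end else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*")) + end
    return re.match(regex, path) is not None


BROAD = "*action=admin*"                  # style used in the list above
NARROW = "/forum/index.php?action=admin"  # more specific, SMF-only

for path in ("/forum/index.php?action=admin", "/shop/cart.php?action=admin"):
    print(path, "broad:", rule_matches(BROAD, path), "narrow:", rule_matches(NARROW, path))
```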
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

geoffs

That's a good refinement, eldacar!

Actually, I don't see why search engine developers haven't written their robots.txt parsers to allow regular expressions in Disallow specs. It needn't be required, just parsed when a spec is marked as one.
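
For what it's worth, the wildcard syntax Google accepts already behaves like a tiny subset of regular expressions: `*` corresponds to `.*`, a trailing `$` is an end anchor, every other character is literal, and the match is anchored at the start of the path. A sketch of that translation; `pattern_to_regex` is a hypothetical helper, not part of any crawler:

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a robots.txt Disallow pattern into an equivalent regex string.

    '*' -> '.*', a trailing '$' stays an end anchor, and every other
    character is escaped so it matches literally. Use with re.match,
    which anchors at the start of the path.
    """
    end = "$" if pattern.endswith("$") else ""
    core = pattern[:-1] if end else pattern
    return ".*".join(re.escape(part) for part in core.split("*")) + end


for pattern in ("*action=admin*", "/forum/*.php$"):
    print(pattern, "=>", pattern_to_regex(pattern))
```

Full regular expressions would go beyond this: alternation, character classes and so on have no equivalent in the wildcard syntax.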


Shonick

User-agent: *
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=calendar*
Disallow: /forum/index.php?action=login*
Disallow: /forum/index.php?action=register*
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=stats*
Disallow: /forum/index.php?action=arcade*
Disallow: /forum/index.php?action=printpage*
Disallow: /forum/index.php?PHPSESSID=*
Disallow: /forum/index.php?*rss*
Disallow: /forum/index.php?*wap*
Disallow: /forum/index.php?*wap2*
Disallow: /forum/index.php?*imode*
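
Under Google-style matching, a rule already matches every URL that begins with it, so the trailing `*` on lines like `/forum/index.php?action=search*` changes nothing, and `/forum/index.php?*wap*` already covers the `wap2` URLs. A sketch checking both points with an illustrative matcher; the sample paths are assumptions:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Sketch of Google-style matching: '*' = any characters, trailing '$' = end."""
    end = "$" if pattern.endswith("$") else ""
    core = pattern[:-1] if end else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*")) + end
    return re.match(regex, path) is not None


search_path = "/forum/index.php?action=search;advanced"
# Trailing '*' versus no trailing '*': same result, since matching is by prefix.
print(rule_matches("/forum/index.php?action=search*", search_path),
      rule_matches("/forum/index.php?action=search", search_path))
# The *wap* rule also matches the wap2 URL.
print(rule_matches("/forum/index.php?*wap*", "/forum/index.php?wap2"))
```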


Does this code work well with other search engines? Do they index everything on my site if they don't follow this code?

Quote from: Mr. Jinx on January 27, 2006, 05:32:03 AM
I've been using this robots.txt for a few months now, and currently everything is indexed the way I like it. You may have to use Google's auto removal tool to speed things up. (Be sure of what you're doing: a wrong wildcard could remove your complete site.)

What happens if someone uses this tool to remove my site? That would be too bad.

Mr. Jinx

Quote from: viet on January 21, 2007, 09:41:48 AM
Does this code work well with other search engines? Do they index everything on my site if they don't follow this code?

Sure! If a search engine doesn't understand a line in your robots.txt, it will ignore that line. Major engines like Yahoo and MSN are still indexing.
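
How a crawler without wildcard support handles such a line depends on its parser. Python's standard `urllib.robotparser` is one example of a plain prefix matcher: it reads `*action=admin*` literally, and since every URL path starts with `/`, that rule never matches, so in practice it is ignored while root-based rules keep working. A sketch, with `example.com` as a placeholder host:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: *action=admin*",                # wildcard rule, read literally here
    "Disallow: /forum/index.php?action=help",  # plain prefix rule
])


def allowed(path: str) -> bool:
    """True if a generic crawler ("*") may fetch this path on the placeholder host."""
    return parser.can_fetch("*", "http://example.com" + path)


print(allowed("/forum/index.php?action=admin"))  # wildcard rule never applies
print(allowed("/forum/index.php?action=help"))   # prefix rule applies
```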

Quote from: Mr. Jinx on January 27, 2006, 05:32:03 AM
What happens if someone uses this tool to remove my site? That would be too bad.

Of course the guys at Google are smart enough not to let that happen. You have to put something on your site to activate this process, so only the site owner can do it.

vagrant

I also have these in mine (my forum shows friendly URLs):

Disallow: index.php?action=activate
Disallow: index.php?action=reminder
Disallow: *topicseen*
Disallow: *html#msg*


Toadmund

How do I block the 'news' (the part near the top left of the forum)?

Disallow: /forum/index.php?action=news*


Would that be it?

Dannii

What do you mean block it? You can only block pages, not parts of pages.

Toadmund

I just find it annoying that it lists what I have in the news section (perhaps I'm not that terribly annoyed).
It's usually little wisecracks, which I change every few weeks depending on how much I like them (if I like one: 4 weeks! Perhaps more!).
The search engines seem to pick up my weird humour quite regularly.

So, eld^ka, are you saying that I would basically have to disallow index.php for this not to be listed?

Ultimately, what I want is for the search engines to concentrate on listing the content of my forum posts.
That's my goal and mission with robots.txt.


Dannii

Quote from: Toadmund on February 25, 2007, 02:25:38 AM
I just find it annoying that it lists what I have in the news section (perhaps I'm not that terribly annoyed).
It's usually little wisecracks, which I change every few weeks depending on how much I like them (if I like one: 4 weeks! Perhaps more!).
The search engines seem to pick up my weird humour quite regularly.

So, eld^ka, are you saying that I would basically have to disallow index.php for this not to be listed?

Ultimately, what I want is for the search engines to concentrate on listing the content of my forum posts.
That's my goal and mission with robots.txt.
You could change your template to hide it from guests; that might be better.

motumbo

Quote from: eldʌkaː on March 15, 2006, 08:25:09 PM
This is my current one:
User-agent: *
Disallow: *action=admin*
Disallow: *action=help*
Disallow: *action=login*
Disallow: *action=mlist*
Disallow: *action=post*
Disallow: *action=register*
Disallow: *action=search*
Disallow: *action=who*
Disallow: /Themes/
I went through every page of my forum that Google had indexed and built my list from that. I dropped the index.php? part because it didn't work as well and wasn't needed. Of course, if other parts of your site also use action= and you only want to block SMF, you'd have to make the rules more specific.

Should these Disallow directives go in a robots.txt file in the root folder or in the /forums/ folder? I want to be sure I've got this right so I don't screw anything up.

Google and Yahoo both index a lot of stuff that shouldn't be indexed.

Dannii

It can be placed anywhere. If you put it in the root folder, add /forums/ to the beginning of each one.
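
One caveat, per the Robots Exclusion Protocol (RFC 9309): crawlers request robots.txt only from the top-level path of the host, so a copy inside a subfolder such as /forums/ is not fetched, and the rules belong in the root file with the /forums/ prefix. A sketch of how a crawler derives that location from any page URL; `robots_txt_url` and the example URL are illustrative:

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks for the host serving page_url."""
    # An absolute path ("/robots.txt") replaces the page URL's whole path and query.
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://example.com/forums/index.php?topic=1.0"))
```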

motumbo

Quote from: eldʌkaː on March 02, 2007, 09:51:05 PM
It can be placed anywhere. If you put it in the root folder, add /forums/ to the beginning of each one.

Thanks for the reply.

forums/*action=admin*

Will that work? Is the wildcard before "action" OK?
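
The wildcard before `action` is fine for Google-style matching; the part worth checking is the leading slash. Rules are matched from the start of the URL path, which always begins with `/`, so in this sketch of a strict prefix-anchored matcher `forums/*action=admin*` never matches while `/forums/*action=admin*` does. Some crawlers may be more forgiving, but the slashed form is the safe choice. The `rule_matches` helper is illustrative:

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Sketch of Google-style matching: '*' = any characters, trailing '$' = end."""
    end = "$" if pattern.endswith("$") else ""
    core = pattern[:-1] if end else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*")) + end
    return re.match(regex, path) is not None


URL_PATH = "/forums/index.php?action=admin"
print(rule_matches("forums/*action=admin*", URL_PATH))   # no leading slash
print(rule_matches("/forums/*action=admin*", URL_PATH))  # with leading slash
```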

I just want to be sure. I'd hate to screw something up!
