Creating a good ROBOTS.TXT for SMF (search engine friendly)

Started by Mr. Jinx, January 27, 2006, 05:32:03 AM


Dannii

"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

destalk

Quote: forums/*action=admin*

Would spiders even find that URL? Isn't it just shown to logged in admin users?

I find that Disallow: /forum/*action=

solves the need for most other disallow rules. But then I only want search engines to index the content of my threads. If you want profiles indexed, do not use that rule.

I use the above in conjunction with:

Disallow: /forum/*sort
Disallow: /forum/Themes/*
Disallow: /forum/*.from
Disallow: /forum/*prev_next*


For Googlebot I also add Disallow: /forum/*wap2*
Although it's probably not necessary, as Google has its own bot for mobile content and seems smart enough to know the difference.

Also, if you are not using the 'catch all' rule above, I would add action=printpage into the mix.
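
To see what the catch-all rule above would and would not block, here is a minimal Python sketch of the wildcard matching that the major search engines document for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else is a plain prefix match. The function name `rule_matches` and the sample URLs are made up for illustration, and the sketch ignores how real crawlers weigh Allow against Disallow rules.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is treated as a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*'
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

# Hypothetical URLs for a forum installed under /forum/
catch_all = "/forum/*action="
print(rule_matches(catch_all, "/forum/index.php?action=printpage;topic=12.0"))  # True
print(rule_matches(catch_all, "/forum/index.php?action=profile;u=3"))           # True
print(rule_matches(catch_all, "/forum/index.php?topic=12.0"))                   # False
```

Plain topic URLs stay crawlable, while anything carrying `action=` (printpage, profile, search and so on) is excluded, which is why the single rule replaces most of the longer lists.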

Owdy

Mine:

User-agent: *
Disallow: /*;wap
Disallow: /*;wap2
Disallow: /*;imode
Disallow: /*action*
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$

User-agent: Mediapartners-Google
Allow:
Former Lead Support Specialist

Need help with your SMF forum? I take on paid jobs, read: http://www.simplemachines.org/community/index.php?topic=375918.0


periscope

I posted my robots.txt here in another thread: http://www.simplemachines.org/community/index.php?topic=316463.msg2101632#msg2101632

How about someone start a thread with a good sitemap.xml for their site. You could set the sitemap to tell search engines to revisit the topic list every 3 days for instance.
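
As a rough illustration of that idea, the sketch below builds a tiny sitemap with Python's standard library. Note that the sitemap protocol has no "every 3 days" setting: `changefreq` only accepts always, hourly, daily, weekly, monthly, yearly or never, and crawlers treat it as a hint that they may ignore. The example.com URLs and the helper name `build_sitemap` are placeholders, not anything taken from a real forum.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a sitemap document from (url, changefreq) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical URLs: the topic list changes often, a single topic less so
sitemap_xml = build_sitemap([
    ("http://www.example.com/forum/index.php", "daily"),
    ("http://www.example.com/forum/index.php?topic=1.0", "weekly"),
])
print(sitemap_xml)
```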

Keep in mind that when you go through the site with your web browser looking for links to keep search engines away from, you should be logged out and viewing as a guest, because that is how a search engine sees the site. Some links only appear when you are logged in, and there is no need to block those, since a search engine only ever views the forum as a guest.

Mr. Jinx

Quote from: periscope on June 10, 2009, 12:10:31 AM
How about someone start a thread with a good sitemap.xml for their site. You could set the sitemap to tell search engines to revisit the topic list every 3 days for instance.
Why? There are some good mods that generate sitemaps.

minos

Hello, this is my robots.txt. Any suggestions on what to add or remove? It is uploaded to my forum root directory, not my domain root.

User-agent: *
Disallow: /Sources/*
Disallow: /Smileys/*
Disallow: /Packages/*
Disallow: /header/*
Disallow: /avatars/*
Disallow: /attachments/*
Disallow: /Themes/*
Disallow: /index.php?action=printpage*
Disallow: /index.php?action=post*
Disallow: /index.php?action=permissions*
Disallow: /index.php?action=pm*
Disallow: /index.php?action=help*
Disallow: /index.php?action=register*
Disallow: /index.php?action=login*
Disallow: /index.php?action=mlist*
Disallow: /index.php?action=search*
Disallow: /index.php?action=who*
Disallow: /index.php?*rss*
Disallow: /index.php?action=stats
Disallow: /index.php?PHPSESSID=
Disallow: /index.php?*wap*
Disallow: /index.php?*imode*
Disallow: /index.php?action=post*
Disallow: /index.php?action=activate*
Disallow: /index.php?action=reminder*
Disallow: /index.php?wap2*
Disallow: /*sort
Sitemap: http://www.metalminos.com/foro/index.php?action=sitemap



By the way, bing.com does not accept ;xml in the Sitemap tag. If I add the tag without ;xml, will that affect Google and Yahoo?

Astro1

I use the following:

User-agent: *
Disallow: /Sources/*
Disallow: /Smileys/*
Disallow: /Packages/*
Disallow: /header/*
Disallow: /avatars/*
Disallow: /attachments/*
Disallow: /Themes/*
Disallow: /index.php?action=printpage*
Disallow: /index.php?action=post*
Disallow: /index.php?action=permissions*
Disallow: /index.php?action=pm*
Disallow: /index.php?action=help*
Disallow: /index.php?action=register*
Disallow: /index.php?action=login*
Disallow: /index.php?action=mlist*
Disallow: /index.php?action=search*
Disallow: /index.php?action=who*
Disallow: /index.php?*rss*
Disallow: /index.php?action=stats
Disallow: /index.php?PHPSESSID=
Disallow: /index.php?*wap*
Disallow: /index.php?*imode*
Disallow: /index.php?action=post*
Disallow: /index.php?action=activate*
Disallow: /index.php?action=reminder*
Disallow: /index.php?wap2*
Disallow: /*sort
Disallow: /*sort,
Disallow: /*action=
Disallow: /*.new.html
Disallow: /*.msg
Disallow: /*.prev_next

esttecb

All this can be easily fixed by using canonical tags.
Except for wap, which requires some code tweaks. The SMF team should be advised about this.

Deju

Sorry, know this is an oldie, but...

I'm running 1.1.11 with LOTS of mods that I can't fathom how to transfer over to 2.0.

My question:
If I put this robots.txt file in the right directory, shouldn't it prohibit me from browsing those pages while I'm signed out of my forum?

I can still go to print page.

What have I done wrong?

www.troutlegend.com/forum

robots.txt file in the root "html" folder on the server:
Quote:
User-agent: *
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=calendar*
Disallow: /forum/index.php?action=login*
Disallow: /forum/index.php?action=register*
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=stats*
Disallow: /forum/index.php?action=arcade*
Disallow: /forum/index.php?action=printpage*
Disallow: /forum/index.php?PHPSESSID=*
Disallow: /forum/index.php?*rss*
Disallow: /forum/index.php?*wap*
Disallow: /forum/index.php?*wap2*
Disallow: /forum/index.php?*imode*

Mr. Jinx

The robots.txt file only stops search engines from collecting those files and directories; it does not block visitors using a browser. This prevents duplicate content in search engines (which is also better for page rankings).

kibtwane

I'm a little confused as to where to put the robots.txt file in my directory.

The root directory for my website is

http://www.thechristianidentityforum.net

But the directory for my forum is

http://www.thechristianidentityforum.net/smf

So which url should I upload the file to?

Many thanks.
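
For reference, crawlers only request robots.txt from the root of the host, never from a subfolder, so the location can be derived from any page URL. The small Python sketch below shows that derivation; the helper name `robots_url` is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the one robots.txt location a crawler checks for this page."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.thechristianidentityforum.net/smf/index.php?topic=1.0"))
# http://www.thechristianidentityforum.net/robots.txt
```

With the forum installed in /smf, the file therefore goes in the site root, and each rule needs the /smf/ prefix (for example Disallow: /smf/index.php?action=search). A forum on its own subdomain counts as a separate host and needs its own robots.txt at that subdomain's root.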


mrtarkhan

This is my robots.txt:


User-agent: *
Disallow: index.php?action=help*
Disallow: index.php?action=search*
Disallow: index.php?action=login*
Disallow: index.php?action=register*
Disallow: index.php?action=stats*
Disallow: index.php?action=arcade*
Disallow: index.php?action=printpage*
Disallow: index.php?*rss*
Disallow: index.php?*wap*
Disallow: index.php?*wap2*
Disallow: index.php?*imode*
Disallow: /index.php?action=who*
Disallow: /index.php?action=permissions*
Disallow: /index.php?action=pm*
Disallow: /index.php?action=activate*
Disallow: /index.php?action=reminder*
Disallow: /avatars/
Disallow: /Packages/
Disallow: /Smileys/
Disallow: /Sources/
Disallow: /Themes/
Disallow: index.php/board,136.0.html
allow: index.php?action=sitemap


Is it good for a forum on a subdomain?
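
One thing worth checking in a file like this: the robots.txt specification expects every Allow or Disallow path to begin with `/`, and how a crawler treats a rule that does not is not guaranteed. The Python sketch below flags such lines; the function name `rules_missing_slash` is hypothetical, and the sample is a shortened copy of the rules above.

```python
def rules_missing_slash(robots_txt: str):
    """Return Allow/Disallow lines whose path does not start with '/' or '*'."""
    flagged = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        path = value.strip()
        if sep and field.strip().lower() in ("allow", "disallow"):
            # An empty path is legal (it means "no restriction"), so skip it
            if path and not path.startswith(("/", "*")):
                flagged.append(line.strip())
    return flagged

sample = """User-agent: *
Disallow: index.php?action=help*
Disallow: /index.php?action=who*
Disallow: /avatars/
allow: index.php?action=sitemap"""

print(rules_missing_slash(sample))
# ['Disallow: index.php?action=help*', 'allow: index.php?action=sitemap']
```

Adding the leading slash to the flagged lines makes them consistent with the other rules in the same file.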
Every wise question holds half of its answer within itself. (Solomon ibn Gabirol)

DeepBlueGXP

I used this site to test mine and I received errors for the asterisks in the disallow section:
Quote: Wildcard characters (like "*") are not allowed here

User-agent: *
Disallow: /home/
Disallow: /test/
Disallow: /cgi-bin/
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=login*
Disallow: /forum/index.php?action=register*
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=stats*
Disallow: /forum/index.php?action=printpage*
Disallow: /forum/index.php?PHPSESSID=*
Disallow: /forum/index.php?*rss*

http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php
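
Some background on that error: the original robots.txt convention had no wildcards at all, and `*` in a path is a later extension that the large search engines support but strict validators may reject. Because every rule is already a prefix match, a trailing `*` adds nothing and can simply be dropped. The Python sketch below does that cleanup; the helper name `strip_trailing_wildcards` is made up for illustration.

```python
def strip_trailing_wildcards(line: str) -> str:
    """Drop redundant trailing '*' characters from an Allow/Disallow line."""
    field, sep, value = line.partition(":")
    if not sep or field.strip().lower() not in ("allow", "disallow"):
        return line  # leave User-agent, Sitemap and other lines untouched
    path = value.strip()
    trimmed = path.rstrip("*")
    # An empty Disallow means "allow everything", so never trim down to nothing
    return f"{field.strip()}: {trimmed or path}"

print(strip_trailing_wildcards("Disallow: /forum/index.php?action=search*"))
# Disallow: /forum/index.php?action=search
print(strip_trailing_wildcards("Disallow: /forum/index.php?*rss*"))
# Disallow: /forum/index.php?*rss
```

The second rule still contains a wildcard in the middle, so a strict validator would keep flagging it even though the engines that support the extension read it as intended.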



donjazzy

Nice topic.
Can someone please explain how to tell all robots to index only topics and stay clear of everything else on the site?

Secondly, URLs carrying topicseen and msg should not be indexed either.

Illori

donjazzy, you have a thread on this already. Please stick to your thread, and please don't bump old topics just to get attention for your issue.
