Robots.txt and SMF

Started by borisz, January 10, 2008, 10:57:22 AM

borisz

Hi, I have a question about search engines and my forum. A couple of months ago I blocked all of my forum's pages with robots.txt because search engines had indexed some areas that I didn't want indexed, for example pages with forum members' info, the Help section, the login page, the logout page, etc. Is there any rule I can use in robots.txt, or any script for SMF, that blocks only these pages and not the entire forum?

Thanks!



karlbenson

#1
Robots don't just spider all of your topics; they spider your forum pages in whatever order they find them. Sometimes that means they spider many profile pages first.

But overall SMF has no problem getting indexed by search engines.

You can block any page of smf you want in a robots.txt.

A good one I've seen posted around here is

User-agent: *
Disallow: /forum/*.msg*
Disallow: /forum/*sa=showPosts*
Disallow: /forum/*prev_next*
Disallow: /forum/*action=emailuser*
Disallow: /forum/*action=printpage*
Disallow: /forum/*action=recent*
Disallow: /forum/*action=help*
Disallow: /forum/*action=login*
Disallow: /forum/*action=profile*
Disallow: /forum/*action=register*
Disallow: /forum/*action=search*
Disallow: /forum/*action=stats*
Disallow: /forum/*action=unread*
Disallow: /forum/*action=verificationcode*
Disallow: /forum/*action=who*
Disallow: /forum/Themes/

If you've got it installed in a different folder than /forum/, you'll need to remove or change that part.
E.g. installed at the root/base of the domain:
Disallow: /Themes/
E.g. installed in a different folder called smf:
Disallow: /smf/Themes/
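
To check which forum URLs a wildcard rule would actually catch, here is a minimal Python sketch of wildcard matching as the major search engines describe it: `*` matches any run of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path. The helper names `rule_to_regex` and `is_blocked` and the sample URLs are made up for illustration, and the sketch ignores `Allow` lines and rule precedence, so treat it as a rough test aid rather than how any particular crawler works.

```python
import re

def rule_to_regex(pattern):
    """Turn one robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(url_path, disallow_rules):
    """Return True if any Disallow pattern matches the path (hypothetical helper)."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)

# A few of the rules posted above, for a forum installed in /forum/.
RULES = [
    "/forum/*.msg*",
    "/forum/*action=profile*",
    "/forum/*action=search*",
    "/forum/Themes/",
]

# Sample URLs are assumptions about what typical SMF links look like.
print(is_blocked("/forum/index.php?action=profile;u=1", RULES))
print(is_blocked("/forum/index.php?topic=123.0", RULES))
```

Under these assumptions the profile URL is reported as blocked and the plain topic URL is not, which is the behaviour the rule list is aiming for.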

humbleworld

Google obeys robots.txt, but other search engines may ignore it.

borisz

Thanks karlbenson ;), I really appreciate your help. I made those changes and will see how they work. I know that some search engines will ignore my robots.txt, but 95%+ of my visitors come from Google, Yahoo and MSN search, so I care more about serving these engines the necessary info than about the other 5%. Anyway, I hope this will work well, and I will test it in Google Webmaster Tools. Once again, thanks karlbenson.



karlbenson

Borisz

Sorry, but when I copied and pasted the above, it was missing
User-agent: *

(which I've now added in)

davebauer

So, am I correct in understanding that if the forum is installed in the root, the lines should read like this?

Disallow: /*action=search*
Some days you're the bug -- Some days you're the windshield
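
Going by the root-install example earlier in the thread (`Disallow: /Themes/`), dropping the `/forum` part does give lines like the one quoted here. As a rough sketch, the same rule list can be generated for any install folder; the function name `build_robots_txt` is invented for this example, and the patterns are simply the ones posted above.

```python
# Patterns copied from the rule list earlier in this thread,
# with the install folder stripped off.
SMF_PATTERNS = [
    "*.msg*",
    "*sa=showPosts*",
    "*prev_next*",
    "*action=emailuser*",
    "*action=printpage*",
    "*action=recent*",
    "*action=help*",
    "*action=login*",
    "*action=profile*",
    "*action=register*",
    "*action=search*",
    "*action=stats*",
    "*action=unread*",
    "*action=verificationcode*",
    "*action=who*",
    "Themes/",
]

def build_robots_txt(install_dir=""):
    """Build the rule list for a forum in install_dir ('' means the domain root).

    Hypothetical helper: it only prefixes each pattern with the folder.
    """
    base = "/" + install_dir.strip("/")
    if not base.endswith("/"):
        base += "/"
    lines = ["User-agent: *"]
    lines += ["Disallow: " + base + pattern for pattern in SMF_PATTERNS]
    return "\n".join(lines) + "\n"

# Root install: rules come out as "Disallow: /*action=search*" and so on.
print(build_robots_txt(""))
```

Calling `build_robots_txt("smf")` would produce the `/smf/...` variant instead.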

Ben_S

Wildcards are not valid for all robots.
Liverpool FC Forum with 14 million+ posts.
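
To illustrate the point about wildcards, here is a small Python sketch of the original prefix-only matching, where a rule blocks any path that starts with the rule text and `*` is just an ordinary character. The function name `prefix_blocked` is made up, and the URL assumes the usual `index.php?action=...` link format with `action` as the first query parameter; a rule written this way would miss URLs where something else (such as a session ID) comes first.

```python
def prefix_blocked(url_path, disallow_rules):
    """Prefix-only matching: no wildcard support, '*' is a literal character.

    An empty Disallow value blocks nothing, so empty rules are skipped.
    """
    return any(rule and url_path.startswith(rule) for rule in disallow_rules)

# Assumed URL shape for the search page of a forum installed in /forum/.
url = "/forum/index.php?action=search"

wildcard_rules = ["/forum/*action=search*"]        # needs wildcard support
prefix_rules = ["/forum/index.php?action=search"]  # plain prefix

print(prefix_blocked(url, wildcard_rules))  # the wildcard rule never matches here
print(prefix_blocked(url, prefix_rules))    # the plain prefix does
```

For robots that only do prefix matching, a plain-prefix line next to each wildcard line is one possible fallback, at the cost of a longer file.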

calamine

But is there any method to completely stop bots from scraping email IDs / user info / files (pics etc.) from an SMF forum?
I'm using SMF 1.14 + Tiny Portal 0.9.8.

青山 素子

General rule: If it's viewable it can be indexed. Many bots don't respect robots.txt (the major search engines usually do).

For profiles, you can block guests from being able to view the user profiles. Just edit the permissions of the guest group.
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


daleroe

If I block all guests will that block all search engines?

karlbenson

YES!

And note that it is against the search engines' terms of use to show spiders content that you don't show to guests.
Doing so could result in your site being banned from Google and other search engines.

daleroe

Are you saying that blocking all guests does not block the spiders?  If so can I block the spiders?  How?

青山 素子

Blocking guests blocks the search engines.

Karl was just noting that if you block guests but code it so you allow search engines to index, it will often get your site banned from the search engines completely.


karlbenson

To block ALL spiders from your forum
1. Create a robots.txt in the base of your domain

2. Add the code.
User-agent: *
Disallow: /
Disallow: /*


If you want to block ALL guests & Spiders
Admin > Features & Options
Uncheck "Allow guests to browse the forum"

But what you can't do is allow spiders, but not guests.
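
As a quick sanity check of the block-everything robots.txt above, the rules can be fed to Python's standard `urllib.robotparser` module. This is only a sketch: the user agent name and the path are arbitrary examples, and it shows how a well-behaved parser reads the file, not what every bot will actually do.

```python
from urllib.robotparser import RobotFileParser

# The block-all rules from the post above.
rules = [
    "User-agent: *",
    "Disallow: /",
    "Disallow: /*",
]

parser = RobotFileParser()
parser.parse(rules)

# "Googlebot" and the path are arbitrary examples; any agent and path
# fall under "User-agent: *" and "Disallow: /".
print(parser.can_fetch("Googlebot", "/forum/index.php"))
```

The plain `Disallow: /` line already covers every path by prefix, so this check does not depend on the wildcard line.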
