Simple Machines Community Forum

Archived Boards and Threads... => Archived Boards => SMF Feedback and Discussion => Topic started by: borisz - January 10, 2008, 10:57:22 AM

Title: Robots.txt and SMF
Posted by: borisz - January 10, 2008, 10:57:22 AM
Hi, I have a question about search engines and my forum. A couple of months ago I blocked all of my forum's pages with robots.txt because search engines had indexed some areas that I didn't want indexed, for example pages with forum members' info, the Help section, the login page, the logout page, etc. Is there a robots.txt rule or an SMF script I can use to block only these pages and not the entire forum?

Thanks!
Title: Re: Robots.txt and SMF
Posted by: karlbenson - January 10, 2008, 12:19:07 PM
Robots don't just spider all of your topics; they spider your forum pages in whatever order they find them. Sometimes that means they spider many profile pages first.

But overall SMF has no problem getting indexed by search engines.

You can block any page of smf you want in a robots.txt.

A good one I've seen posted around here is

User-agent: *
Disallow: /forum/*.msg*
Disallow: /forum/*sa=showPosts*
Disallow: /forum/*prev_next*
Disallow: /forum/*action=emailuser*
Disallow: /forum/*action=printpage*
Disallow: /forum/*action=recent*
Disallow: /forum/*action=help*
Disallow: /forum/*action=login*
Disallow: /forum/*action=profile*
Disallow: /forum/*action=register*
Disallow: /forum/*action=search*
Disallow: /forum/*action=stats*
Disallow: /forum/*action=unread*
Disallow: /forum/*action=verificationcode*
Disallow: /forum/*action=who*
Disallow: /forum/Themes/

(If you've got it installed in a folder other than /forum/, you'll need to remove or change that part.)
E.g. installed at the root/base of the domain:
Disallow: /Themes/
E.g. installed in a different folder called smf:
Disallow: /smf/Themes/
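Changing the folder by hand in every line is easy to get wrong, so here is a minimal Python sketch that rebuilds the rule list above for any install folder. The function name build_robots_txt and its install_path parameter are made up for illustration; they are not part of SMF.

```python
# Hypothetical helper (not part of SMF): regenerate the rules posted above
# for whatever folder the forum is installed in.
PATTERNS = [
    "*.msg*",
    "*sa=showPosts*",
    "*prev_next*",
    "*action=emailuser*",
    "*action=printpage*",
    "*action=recent*",
    "*action=help*",
    "*action=login*",
    "*action=profile*",
    "*action=register*",
    "*action=search*",
    "*action=stats*",
    "*action=unread*",
    "*action=verificationcode*",
    "*action=who*",
]


def build_robots_txt(install_path: str = "/forum") -> str:
    """Return robots.txt text with every rule prefixed by install_path.

    Pass "" (or "/") when the forum sits at the root of the domain.
    """
    folder = install_path.strip("/")
    prefix = "/" + folder if folder else ""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}/{pattern}" for pattern in PATTERNS]
    lines.append(f"Disallow: {prefix}/Themes/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Forum installed in a folder called smf
    print(build_robots_txt("smf"))
```

Calling build_robots_txt("") gives the root-install variant, where the Themes line becomes Disallow: /Themes/.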
Title: Re: Robots.txt and SMF
Posted by: humbleworld - January 10, 2008, 04:29:31 PM
Google obeys robots.txt, but other search engines may ignore it.
Title: Re: Robots.txt and SMF
Posted by: borisz - January 10, 2008, 05:47:30 PM
Thanks karlbenson ;), I really appreciate your help. I made those changes and will see how they work. I know that some search engines will ignore my robots.txt, but 95%+ of my visitors come from Google, Yahoo and MSN search, so I care more about serving those engines the necessary info than about the remaining 5% or less. Anyway, I hope this works well, and I will test it in Google Webmaster Tools. Once again, thanks karlbenson.
Title: Re: Robots.txt and SMF
Posted by: karlbenson - January 10, 2008, 06:14:36 PM
Borisz

Sorry, but when I copied and pasted the above, it was missing
User-agent: *

(which I've now added in)
Title: Re: Robots.txt and SMF
Posted by: borisz - January 11, 2008, 09:01:15 AM
I already added that, thanks!
Title: Re: Robots.txt and SMF
Posted by: davebauer - January 16, 2008, 02:07:59 AM
So, am I correct in understanding that if the forum is installed in the root, then the lines should read like this:

Disallow: /*action=search*
Title: Re: Robots.txt and SMF
Posted by: karlbenson - January 16, 2008, 11:49:04 AM
That is correct.
Title: Re: Robots.txt and SMF
Posted by: Ben_S - January 16, 2008, 01:35:55 PM
Wildcards are not valid for all robots.
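To make the difference concrete, here is a rough Python sketch of the two ways a robot might read a Disallow line: as a wildcard pattern, where * matches any run of characters, or as a plain literal path prefix. Both function names are made up for illustration, and the wildcard behaviour shown is an assumption about how wildcard-aware robots treat *, not a copy of any engine's actual matcher.

```python
import re


def wildcard_match(rule: str, url_path: str) -> bool:
    """Wildcard-aware reading: '*' in the rule matches any run of characters."""
    # Escape the literal pieces and join them with '.*' where a '*' stood.
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match anchors at the start of the path, like a prefix rule does.
    return re.match(regex, url_path) is not None


def prefix_match(rule: str, url_path: str) -> bool:
    """Literal reading: the rule is just a path prefix, '*' has no meaning."""
    return url_path.startswith(rule)


url = "/forum/index.php?action=search"
rule = "/forum/*action=search*"

print(wildcard_match(rule, url))  # True: the search page is blocked
print(prefix_match(rule, url))    # False: the rule never applies
print(prefix_match("/forum/Themes/", "/forum/Themes/default/style.css"))  # True
```

Under the literal reading only wildcard-free rules such as Disallow: /forum/Themes/ have any effect, so the action= lines protect nothing against robots of that kind.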
Title: Re: Robots.txt and SMF
Posted by: davebauer - January 16, 2008, 11:40:22 PM
Thanks Karl
Title: Re: Robots.txt and SMF
Posted by: calamine - January 24, 2008, 12:18:59 PM
But is there any method to completely stop bots from scraping email IDs, user info, and files (pics etc.) from an SMF forum?
Title: Re: Robots.txt and SMF
Posted by: 青山 素子 - January 24, 2008, 12:27:44 PM
General rule: If it's viewable it can be indexed. Many bots don't respect robots.txt (the major search engines usually do).

For profiles, you can block guests from being able to view the user profiles. Just edit the permissions of the guest group.
Title: Re: Robots.txt and SMF
Posted by: daleroe - March 18, 2008, 12:40:39 PM
If I block all guests will that block all search engines?
Title: Re: Robots.txt and SMF
Posted by: karlbenson - March 18, 2008, 12:43:15 PM
YES!

Also note that it is against the search engines' terms of use to show content to spiders that you don't show to guests.
Doing so could result in your site being banned from Google and other search engines.
Title: Re: Robots.txt and SMF
Posted by: daleroe - March 19, 2008, 12:10:42 PM
Are you saying that blocking all guests does not block the spiders? If so, can I block the spiders? How?
Title: Re: Robots.txt and SMF
Posted by: 青山 素子 - March 19, 2008, 12:42:48 PM
Blocking guests blocks the search engines.

Karl was just noting that if you block guests but code it so you allow search engines to index, it will often get your site banned from the search engines completely.
Title: Re: Robots.txt and SMF
Posted by: karlbenson - March 19, 2008, 01:03:43 PM
To block ALL spiders from your forum:
1. Create a robots.txt in the base of your domain.

2. Add the following rules:
User-agent: *
Disallow: /
Disallow: /*
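As a quick sanity check of the block-all rules, here is a small Python sketch using the standard library's urllib.robotparser. The example.com URLs and the ExampleBot user agent are placeholders, and this only shows how that one parser reads the rule; a robot that ignores robots.txt is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# The block-all rule from the post. The first Disallow line already covers
# every path for a prefix-matching parser, so it is enough on its own here.
RULES = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Placeholder URLs and bot name, for illustration only.
for url in (
    "http://example.com/",
    "http://example.com/forum/index.php?topic=1.0",
):
    print(url, parser.can_fetch("ExampleBot", url))  # False for both
```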


If you want to block ALL guests and spiders:
Admin > Features & Options
Uncheck "Allow guests to browse the forum"

But what you can't do is allow spiders, but not guests.