Making your SMF forum as SEO friendly as possible? How-to tips

Started by dobizo, July 23, 2008, 08:31:15 PM

dobizo

Thanks for any advice. I'm a web entrepreneur with a popular blog for the apparel industry, and I recently added a business community with a bridged SMF forum.

I have some backlinks from my own blogs and other sites. Traffic is decent for a fairly new forum: over 30,000 pageviews per month and over 2,000 members, mostly coming from my main blog. I have some great articles and threads people are discussing in the forum. I want to find the best way to get them indexed by the search engines.

Any mods, tips, or general advice is greatly appreciated. I host my site with Media Temple if that helps with anything and you can look at the forum by going to hxxp:fashionnetwork.dobizo.com/modules/smf/index.php [nonactive]

Thanks,

Dobizo


ellion

That's a really useful list of tips; I have done some of this stuff already.

Does anybody know about this robots.txt? Do I just create a text file in Notepad, name it robots, and put this code in it:

User-agent: *
Disallow: /*?action*
Disallow: /*sort=*
Disallow: /*msg*

and then upload that? I am guessing that once it is done there will be no feedback from the site or the robots as to whether the robots are making use of it, so I want to check before I do it.

H

Indeed, you just upload a file called robots.txt with that text inside. It needs to go in your main site's folder (so it can be accessed via http://www.mysite.example/robots.txt rather than http://www.mysite.example/forum/robots.txt).

I'd recommend you do a search as SEO is a frequently discussed topic. You may also want to do a search for robots.txt as the one confusion has linked to is missing quite a few good entries.
-H
Former Support Team Lead

karlbenson

Re robots.txt
and
User-agent: *
Disallow: /*?action*
Disallow: /*sort=*
Disallow: /*msg*

A * inside the Disallow line is NOT supported by most search engines (Google and Yahoo support it; MSN, for one, does not).
A * at the end of the Disallow line is unnecessary because it is IMPLIED. The implied trailing wildcard is supported by all robots, which, given the point above, is why it's better not to write it out if you can avoid it.

My robots.txt can be found
http://www.youposted.com/robots.txt
I have different sections for different robots and based on the parameters that each can support.
- Yahoo is too aggressive so I limit it to everything but topics.
- Google I allow free rein

masternewbie

Quote from: H on August 04, 2008, 04:00:14 PM
Indeed, you just upload a file called robots.txt with that text inside. It needs to go in your main site's folder (so it can be accessed via http://www.mysite.example/robots.txt rather than http://www.mysite.example/forum/robots.txt).

I'd recommend you do a search as SEO is a frequently discussed topic. You may also want to do a search for robots.txt as the one confusion has linked to is missing quite a few good entries.

What if I have my cPanel redirect my site domain (www.luckie8.com) to the path www.luckie8.com/forum/ ?
Will the robots.txt still work if I put it in the root directory?

Deprecated

Quote from: masternewbie on August 05, 2008, 11:36:57 PM
Will the robots.txt still work if I put it in the root directory?

The method used to exclude robots from a server is to create a file on the server which specifies an access policy for robots. This file must be accessible via HTTP on the local URL "/robots.txt".

I take that to mean that it must be located at www.example.com/robots.txt
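
A minimal Python sketch of that reading, assuming a crawler simply keeps the scheme and host of whatever page it is looking at and swaps the path for /robots.txt. The function name robots_url is made up for this example; it is not part of any crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that would govern page_url.

    Only the scheme and host are kept; the path, query string and
    fragment of the page are discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A forum page in a subfolder still points back at the root file.
print(robots_url("http://www.example.com/forum/index.php?topic=150.0"))
```

Under that assumption a file at /forum/robots.txt is never requested, which is why the root copy has to stay reachable.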

I've used my Apache .htaccess file to forward my entire domain with or without the www to /forum/ but I have added a line that excludes the robots.txt file:

RewriteCond %{REQUEST_URI} !robots.txt [NC]

Even though every other access to my domain with or without the www is forwarded to /forum/ my robots.txt is available to robots in my domain's root directory with or without the www.

I suggest you do the same.

kopchev

Quote from: H on August 04, 2008, 04:00:14 PM
Indeed, you just upload a file called robots.txt with that text inside. It needs to go in your main site's folder (so it can be accessed via http://www.mysite.example/robots.txt rather than http://www.mysite.example/forum/robots.txt).

Disallow: /forum/*?action*
Disallow: /forum/*sort=*
Disallow: /forum/*msg*

Is this correct? I mainly target Googlebot, since MSN and Yahoo have no worth for my site.

karlbenson

No
- No * is needed at the end of each line; like I said, it's implied.

Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

Also note, the above will only work for Google and Yahoo.
It is an invalid robots.txt for other search engines (including MSN).

Therefore I'd suggest doing different blocks for each


User-agent: Googlebot
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

User-agent: Slurp
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

User-agent: *
Disallow: /forum/index.php?action


Note, the * (catch-all) user agent always comes last.
And there isn't a way to block msg and sort links for the catch-all, since the basic robots.txt standard, which most robots follow, doesn't support the wildcard *. So you leave them off.

kopchev

So,

User-agent: Googlebot
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

User-agent: Slurp
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

User-agent: *
Disallow: /forum/index.php?action


is fine?

My final robots.txt looks like (I run joomla)
edit 2:

User-agent: Googlebot
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

User-agent: Slurp
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /forum/index.php?action

karlbenson

There is an issue.
Each spider will ONLY follow ONE block.
So Google won't follow both the Googlebot section and the * section.
You need to specify the FULL lot for each spider.

eg
User-agent: Googlebot
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/

User-agent: Slurp
Disallow: /forum/*?action
Disallow: /forum/*sort=
Disallow: /forum/*msg
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /editor/
Disallow: /help/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /mambots/
Disallow: /media/
Disallow: /modules/
Disallow: /templates/
Disallow: /installation/
Disallow: /forum/index.php?action


You could also block the smf source and theme files if necessary
Disallow: /forum/Themes
Disallow: /forum/Sources
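
The one-block-per-spider behaviour can be sketched in Python. This is a simplified illustration, not any search engine's actual parser: the names parse_groups and rules_for are made up for the example, only User-agent and Disallow lines are read, and a crawler is assumed to pick the longest User-agent token contained in its own name before falling back to the * block.

```python
def parse_groups(robots_txt: str) -> dict[str, list[str]]:
    """Map each lowercased User-agent token to its list of Disallow paths."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []   # agents named by the group being read
    in_rules = False          # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            for agent in current:
                groups[agent].append(value)
    return groups


def rules_for(groups: dict[str, list[str]], crawler_name: str) -> list[str]:
    """Return the single block a crawler would obey: its own, else '*'."""
    name = crawler_name.lower()
    matches = [agent for agent in groups if agent != "*" and agent in name]
    if matches:
        return groups[max(matches, key=len)]
    return groups.get("*", [])


sample = """
User-agent: Googlebot
Disallow: /forum/*?action

User-agent: *
Disallow: /administrator/
Disallow: /forum/index.php?action
"""
groups = parse_groups(sample)
print(rules_for(groups, "Googlebot"))      # ['/forum/*?action']
print(rules_for(groups, "SomeOtherBot"))   # ['/administrator/', '/forum/index.php?action']
```

With the sample above, Googlebot never sees the /administrator/ line, which is why the full list has to be repeated in each named block.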


kopchev

Ok, I will make the changes you suggested. Thanks very much... I have another question. Will this solve the problem with the duplicate meta descriptions and title tags errors in Google Webmasters, that the forum causes?

Deprecated

Karl: Thank you so much for your post. I just read Jerry Bell's article where I presume the suggested SMF robots file came from, added his mods (with the asterisks) to my site yesterday after satisfying myself with what each line did, but I had no idea the end of line wild card wasn't valid. I've updated my own robots.txt file per your suggestion. :)

Kopchev: while there's no harm, your file is probably needlessly overcomplicated. There is no need to list folders if they do not have links pointing to them on your site. Robots cannot see folders unless they are linked. I don't know which of those can be eliminated but I'm sure there is no link to your /cache/ folder, and there are probably others. Being executed by calls from other PHP files (e.g. index.php) does not qualify as being linked.

All: It is important to check your robots.txt file not by eyeballing it but with an automated checker, such as the one provided in Google Webmaster Tools. The file suggested by Karl may be perfectly fine for most SMF forums, but if your forum is located in your root, as mine is, the /forum/ part is incorrect. I hadn't realized this until I tested my robots.txt using the Webmaster Analyze Robots.txt tool.

You can paste your tentative robots.txt file into the Google page, add a URL you want to test against it, then hit the button and see the result, either allowed or disallowed. Go ahead and test using some action, sort and msg URLs and verify that they work right. You can assume that if you've got it right for Google that it will be right for Yahoo too.

The "official" unofficial robots.txt site is located at http://www.robotstxt.org/orig.html and I suggest that everybody should read this quasi-standard and consider your robots.txt file in that context.

karlbenson

Also, please remember that any changes you make to your robots.txt can take around 24 hours to take effect (as search engines cache the file).

The problem with robots.txt is that most robots only support the basic robots.txt standard. Later extensions were added, but each search engine adopted different parts of them, so it is a nightmare.

Deprecated, the end * is valid only for search engines which support a wildcard character.
However a wildcard is already implied by default.

so
Disallow: /cache
also matches the URLs
/cachefile
/cache/

Google and Yahoo support the $ character to prevent the IMPLIED wildcard, eg
Disallow: /cache/$
would only match that, but none of the others I posted above.

MSN only supports $ when used in conjunction with * for filenames ONLY.
Disallow: /avatars/*.png$
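
To make these matching rules concrete, here is a minimal Python sketch of Google/Yahoo-style matching as described in this thread: a rule is a prefix match (the implied trailing wildcard), * stands for any run of characters, and a trailing $ switches off the implied wildcard. The function name rule_matches is made up for the example, and robots that only follow the basic standard would treat * and $ as literal characters instead.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Check one Disallow pattern against a URL path (wildcard-aware)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with "any characters".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        # '$' given: the whole path must match, nothing may follow.
        return re.fullmatch(regex, path) is not None
    # No '$': a prefix match is enough (the implied trailing wildcard).
    return re.match(regex, path) is not None


print(rule_matches("/cache", "/cachefile"))                             # True
print(rule_matches("/cache/$", "/cache/file.txt"))                      # False
print(rule_matches("/forum/*?action", "/forum/index.php?action=login")) # True
print(rule_matches("/avatars/*.png$", "/avatars/me.png"))               # True
```

An empty Disallow value (which normally means "allow everything") is not special-cased here; it would need handling before calling this function.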

Deprecated

I agree with your syntax although I'm not so sure there's any point in including your /cache/ directory since I doubt there is any link a searchbot could follow to discover your cache directory. I presume you just stated that as an example of syntax.

For my own forum I focus 80% of my attention towards Google. They are actively friendly in soliciting the cooperation of website operators and provide personalized tools that allow you to specify the interaction between the Googlebot and your site, and then provide you with regularly updated feedback on how they perceive your site. Google even allows you to set a sitemap.xml to aid their finding your pages.

Yahoo seems to have found me and they visit fairly regularly, but I've never been able to find any way to get their attention or how to optimize my site to get the best results with Yahoo. They're just like the weather, no point in doing anything about it except being prepared.

MSN? Pfffff!!! They are inscrutable, monolithic, impervious. Besides, who uses MSN search anyway? They can come, they can see, and then they can go conquer some place else. ;)

I'm pleased that Google and I have a very nice arrangement. My final lines for User-agent: * suffice for whatever else and I don't care what it is, devil take the hindmost. If they send me any visitors, fine. If not, that's fine too.

The only sure thing I ask is that any Disallow directories I include better well not be indexed! I've got a static site that enforces my robots.txt disallowed directories, including robot bait and auto-ban. I'm not that concerned about my forum site. When I want to feel superior to bad robots I go look at my banned robots log at the other site. :P :P :P

Deprecated

We have become focused on the topic of robots.txt files, so I would like to expand the discussion and get back to the original titled topic: Making your forum as SEO friendly as possible. Here are the steps I've taken so far:

1.) Register for Google Webmaster Tools: Visit their site, sign up for a free account. Validate your authority over your site by either placing a coded META tag on your index page (not that easy for SMF) or placing a coded name file in your root (easy). Have a look around and discover the various reports including error conditions.

2.) Install SMF Sitemap: SlammedDime's mod to add an XML sitemap to your forum, compatible with SMF 1.1.* thru 2.0b3.1p. This creates a new action for your index.php that dynamically generates an XML sitemap of the current state of your site whenever it is accessed. Then go back to your Google Webmaster Tools and enter the URL to your sitemap so that Google can find it.

3.) Install vBulletin Style Meta Tags: rsw686's mod that allows SMF to generate different META tags depending on the board or topic accessed. This allows each topic on your site to have different meta tags, and the tags are more relevant to those specific pages. The description tags are generated from the initial words of the post. It's not likely you will be able to educate your members to get to their point in the first sentence, but I'm pleased as long as my members don't write in textese. ;) Anyway the mod's tags are much better than static tags for the entire site. Note also that you must enter the default tags such as are used on your index page. The mod has a place to configure what you want to use.

4.) Have a close look at your .htaccess file and give it a lot of thought (Apache servers only). If you have none, your site probably appears as both example.com and www.example.com, which, it has been said, may confuse the robots. You can tell Google which you prefer in your Webmaster Tools, but why not just fix it on your end for everybody? I've got two versions of this file for two different forum sites. One causes all requests to be redirected to www.example1.com while the other causes all requests to be redirected to www.example2.com/forum/ (different URLs, different forums). Both automatically adjust to whether the www is present or not, and both use a 301 (Moved Permanently) redirect when the requested URL is wrong, to nudge the search engines into remembering. Note also that .htaccess redirections should be designed to permit your robots.txt file to function properly. If you redirect EVERYTHING to /forum/, the robots.txt file in your root is inaccessible. There is some uncertainty whether any robots.txt file in /forum/ would be honored or even found. I could test for this but honestly I couldn't be bothered. I just made sure my own file in the root is accessible.

5.) Make sure that the parts of your forum that you want indexed are accessible by guests. From your end all robots look like guests, and if you restrict your site material to require registration, the searchbots are going to get nada.

6.) Get your site linked to from other sites. If you control several websites, put links to your forum on each, and enlist friends to do the same. The more sites that link to you, the more valuable the search engines consider your content. Just remember that there's no use unless the other sites are crawled too, so you may need to take steps and at least get the page with the link indexed. I've gotta do this myself... Been too busy getting #1-#5 done.
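
The redirect logic from step 4 can be written out as a small Python sketch. It is only a model of the decision, not the actual .htaccess rules: the host www.example2.com and the /forum/ prefix are taken from the example in step 4, the function name redirect_target is made up, and query strings are ignored for brevity.

```python
from typing import Optional

CANONICAL_HOST = "www.example2.com"   # preferred hostname (from step 4's example)
FORUM_PREFIX = "/forum/"              # where every other request should end up


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None to serve the request as-is."""
    # robots.txt in the root is exempt, with or without the www.
    if path.lower() == "/robots.txt":
        return None
    in_forum = path.startswith(FORUM_PREFIX)
    if host.lower() == CANONICAL_HOST and in_forum:
        return None                   # already the canonical URL
    new_path = path if in_forum else FORUM_PREFIX
    return f"http://{CANONICAL_HOST}{new_path}"


print(redirect_target("example2.com", "/"))                      # http://www.example2.com/forum/
print(redirect_target("example2.com", "/robots.txt"))            # None
print(redirect_target("www.example2.com", "/forum/index.php"))   # None
```

Whatever form the real rewrite rules take, the point to check is the first branch: a request for /robots.txt must come back without a redirect.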

Okay, that's my tips to share with y'all. Now what I want from youse is to tell me if I have missed anything, and share your ideas on how to get the attention of the search engines. Remember, if you aren't on the search engines your forum will find it difficult to grow. If you get lots of hits on the search engine, people will be drawn to your site from web searches, and your forum will be more likely to gain popularity.

karlbenson

I've done a lot of SEO-type stuff on my forum. I've probably done everything apart from SEO URLs (which, in my opinion, have little effect, and would actively result in a short-to-medium-term negative PageRank and SEO impact if I switched on my 2-year-old forum).

Indeed, Google is the big cheese.
Yahoo and MSN are of less importance, but Yahoo needs controlling purely because it's the most aggressive spider in the world.

Indeed, I've written a sitemap-generating script (similar to the SMF sitemap mod), and have submitted it via Google Webmaster Tools.
And I'm also linking to it in my robots.txt with the auto-discovery feature of robots.txt.
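
For anyone wondering what such a script boils down to, here is a minimal Python sketch. It is not the script described above: the function name build_sitemap and the example.com topic URLs are placeholders, and a real generator would pull the topic list from the forum database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Build a bare-bones XML sitemap with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url   # special characters are escaped on output
    return ET.tostring(urlset, encoding="unicode")


# Placeholder topic URLs in SMF's default format.
topics = [f"http://www.example.com/index.php?topic={n}.0" for n in (150, 151)]
print('<?xml version="1.0" encoding="UTF-8"?>')
print(build_sitemap(topics))
```

The auto-discovery part is a single extra line in robots.txt giving the full sitemap address, for example:

Sitemap: http://www.example.com/sitemap.xml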

Deprecated

I had a lot of fun one day trying to adjust my .htaccess file and write a custom php script to make it appear that robots.txt was being accessed while instead my robots.php was spitting out the text. I wanted to be able to dynamically decide what to send depending on who asked. I almost got it working but there was one final, little flaw and I can't remember what it was now. But that's a story for another day...

I can't see how SMF's default URLs could be any more SEO friendly. I doubt search engines care how the actual URL is spelled, whether it's http://www.example.com/index.php?topic=150.0 or http://www.example.com/what_a_pleasant_title_for_a_topic.html ... Actually that's what I've done with my vanity cooking site: pretty thread names dynamically generated by one PHP script and an .htaccess file, but that too is a story for another day...

I've researched the topic of SEO friendly URLs and as far as I can tell all the big search engines handle the ?query=selector form just fine.

Well my list above must have covered practically all of it since I know you're a SMF veteran and if you don't have anything to add then I must be almost there. :)

confusion

I am glad to have stirred up some discussion here :)

A few notes from my side:
1. I have observed a notable difference in the volume of traffic to my busiest forum from Google referrals after switching to pretty URLs. I had been using the old-style URLs for a long time and was a bit worried about taking a hit when I switched, but I did not notice a decline.
2. The sitemaps mod does not work when using pretty URLs.
3. vBulletin meta tags only work for SMF 2.
4. Focusing on Google makes a lot of sense. Take a look at these stats from one of my sites (copied from AWStats output):

- Google 7044 7081
- Windows Live 790 790
- Yahoo! 378 378
- Google (cache) 30 536
- Dogpile 21 21
- MSN Search 17 17
- AltaVista 16 16
- Unknown search engines 11 79
- AllTheWeb 10 10
- Baidu 9 9
- Swik (Social Bookmark) 7 7
- Mozbot 6 6
- MetaCrawler (Metamoteur) 4 4
- Yandex 4 4
- del.icio.us (Social Bookmark) 3 3
- Netscape 3 3
- Search.com 2 2
- AOL 2 2
- Avantfind 1 1
- WebCrawler 1 1
- Stumbleupon (Social Bookmark) 1 1
- MetaBot 1 1
- Scroogle 1 1
- Ask 1 1

It's nearly a factor of 10 difference between Google referrals and the other SEs. I am #2 on Google for my keyword, but #1 on just about all other SEs.
5. There are many, many off-page activities to increase SERP ranking; I was focusing only on on-page improvements. If you are looking for a good place to get some ideas for off-page improvements, read some of the threads over at http://forums.digitalpoint.com.

Deprecated

Could you please describe exactly what "pretty URLs" are, and give an example? I don't want to install the mod just to be able to see what they are.

I may be mistaken but I believe that Google/Yahoo/MSN are quite capable of understanding SMF's default URLs. I couldn't be bothered with the other web crawlers.

Perhaps your 10x difference between Google and the other SEs could be due to Google's vast size?


And what about SMF's "Search engine friendly URLs" ? As I said above, the default URLs seem to work fine. Actually since Google indexes my site regularly, I could care less what the other SEs do. Google is the 800 pound gorilla.
