Making your SMF forum as SEO friendly as possible? How-to Tips

Started by dobizo, July 23, 2008, 08:31:15 PM


青山 素子

Here it is from my site. I only modified it to use the "forum" path. You'll need to adjust to the path your forum is viewable at:


###################################
# Robots.txt
#
# Based on:
# YouPosted.com Smart Robots v3.05
###################################
#
# My Sitemap - I don't provide it just for the fun of it
Sitemap: http://YOURDOMAINHERE/sitemap.xml

# Google - Most Important bot
# Unfortunately a robots.txt will only stop it crawling certain urls, and NOT adding any
# urls which it comes across into its index. So we're relying on a meta noindex tag.
User-agent: Googlebot
# Don't index mobile versions
Disallow: /forum/index.php?*;wap
Disallow: /forum/index.php?*;wap2
Disallow: /forum/index.php?*;imode

# Yahoo - Too aggressive
# So limit it as much as possible.
User-agent: Slurp
# Disallow Everything
Disallow: /forum/
# Now allow bits and then disallow bits
Allow: /sitemap.xml$
Allow: /robots.txt$
Allow: /forum/index.php$
Allow: /forum/index.php?topic=*.0$
Allow: /forum/index.php?topic=*.*0$
Allow: /forum/index.php?topic=*.*5$
Allow: /forum/index.php?board=*.0$
Allow: /forum/index.php?board=*.*0$
Allow: /forum/index.php?board=*.*5$
# But don't allow these
Disallow: /forum/index.php?*.msg
Disallow: /forum/index.php?topic=*.msg*0$
Disallow: /forum/index.php?topic=*.msg*5$
Disallow: /forum/index.php?*.new
# Anything with a ; disallow
Disallow: /forum/index.php?*;*

# Bad bot - Often ignores robots.txt - Waste of bandwidth
# Despite claiming on their website to be a search engine in development
# I'm suspicious as to whether they are a harvester pretending to be SE
User-agent: Twiceler
Disallow: /

User-Agent: W3C-checklink
Disallow: /

User-agent: TurnitinBot
Disallow: /

# Stop following PHPSESSID's
User-Agent: MJ12bot
Disallow: /forum/index.php?PHPSESSID

# Catch all (remainder)
# Will be followed by any bots other than ones identified above
# Uses BASIC robots.txt directives without wildcards, end-anchors etc
# So Spiders should understand these (including MSNBOT)
User-agent: *
# Default SMF Folders
Disallow: /forum/attachments/
Disallow: /forum/Packages/
Disallow: /forum/Smileys/
Disallow: /forum/Sources/
Disallow: /forum/Themes/
# Default SMF Actions
Disallow: /forum/index.php?action=activate
Disallow: /forum/index.php?action=admin
Disallow: /forum/index.php?action=calendar
Disallow: /forum/index.php?action=emailuser
Disallow: /forum/index.php?action=findmember
Disallow: /forum/index.php?action=help
Disallow: /forum/index.php?action=helpadmin
Disallow: /forum/index.php?action=login
Disallow: /forum/index.php?action=logout
Disallow: /forum/index.php?action=mlist
Disallow: /forum/index.php?action=modifykarma
Disallow: /forum/index.php?action=pm
Disallow: /forum/index.php?action=post
Disallow: /forum/index.php?action=printpage
Disallow: /forum/index.php?action=profile
Disallow: /forum/index.php?action=recent
Disallow: /forum/index.php?action=register
Disallow: /forum/index.php?action=reminder
Disallow: /forum/index.php?action=search
Disallow: /forum/index.php?action=theme
Disallow: /forum/index.php?action=unread
Disallow: /forum/index.php?action=unreadreplies
Disallow: /forum/index.php?action=verificationcode
Disallow: /forum/index.php?action=who
Disallow: /forum/index.php?theme
Disallow: /forum/index.php?action=stats;expand
Disallow: /forum/index.php?action=stats;collapse
Motoko-chan
Director, Simple Machines

Note: Unless otherwise stated, my posts are not representative of any official position or opinion of Simple Machines.


dlackner

Quote from: Motoko-chan on December 04, 2008, 11:08:53 PM
Here it is from my site. I only modified it to use the "forum" path. You'll need to adjust to the path your forum is viewable at

My forum is in the root of my domain, so I would remove every /forum prefix and leave it like this:


# Google - Most Important bot
#   Unfortunately a robots.txt will only stop it crawling certain urls, and NOT adding any
#   urls which it comes across into its index. So we're relying on a meta noindex tag.
User-agent: Googlebot
# Don't index mobile versions
Disallow: /index.php?*;wap
Disallow: /index.php?*;wap2
Disallow: /index.php?*;imode

# Yahoo - Too aggressive
#   So limit it as much as possible.
User-agent: Slurp
# Disallow Everything
Disallow: /forum/
# Now allow bits and then disallow bits
Allow: /sitemap.xml$
Allow: /robots.txt$
Allow: /index.php$
Allow: /index.php?topic=*.0$
Allow: /index.php?topic=*.*0$
Allow: /index.php?topic=*.*5$
Allow: /index.php?board=*.0$
Allow: /index.php?board=*.*0$
Allow: /index.php?board=*.*5$
# But don't allow these
Disallow: /index.php?*.msg
Disallow: /index.php?topic=*.msg*0$
Disallow: /index.php?topic=*.msg*5$
Disallow: /index.php?*.new
# Anything with a ; disallow
Disallow: /forum/index.php?*;*

# Bad bot - Often ignores robots.txt - Waste of bandwidth
#   Despite claiming on their website to be a search engine in development
#   I'm suspicious as to whether they are a harvester pretending to be SE
User-agent: Twiceler
Disallow: /

User-Agent: W3C-checklink
Disallow: /

User-agent: TurnitinBot
Disallow: /

# Stop following PHPSESSID's
User-Agent: MJ12bot
Disallow: /index.php?PHPSESSID

# Catch all (remainder)
#   Will be followed by any bots other than ones identified above
#   Uses BASIC robots.txt directives without wildcards, end-anchors etc
#   So Spiders should understand these (including MSNBOT)
User-agent: *
# Default SMF Folders
Disallow: /attachments/
Disallow: /Packages/
Disallow: /Smileys/
Disallow: /Sources/
Disallow: /Themes/
# Default SMF Actions
Disallow: /index.php?action=activate
Disallow: /index.php?action=admin
Disallow: /index.php?action=calendar
Disallow: /index.php?action=emailuser
Disallow: /index.php?action=findmember
Disallow: /index.php?action=help
Disallow: /index.php?action=helpadmin
Disallow: /index.php?action=login
Disallow: /index.php?action=logout
Disallow: /index.php?action=mlist
Disallow: /index.php?action=modifykarma
Disallow: /index.php?action=pm
Disallow: /index.php?action=post
Disallow: /index.php?action=printpage
Disallow: /index.php?action=profile
Disallow: /index.php?action=recent
Disallow: /index.php?action=register
Disallow: /index.php?action=reminder
Disallow: /index.php?action=search
Disallow: /index.php?action=theme
Disallow: /index.php?action=unread
Disallow: /index.php?action=unreadreplies
Disallow: /index.php?action=verificationcode
Disallow: /index.php?action=who
Disallow: /index.php?theme
Disallow: /index.php?action=stats;expand
Disallow: /index.php?action=stats;collapse


Correct?

Or do I remove all of the / that are still present as well?
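A minimal sketch of the path adjustment being discussed, assuming the forum moves from /forum to the site root. The function name `strip_prefix` is hypothetical and not part of SMF. It rewrites only Allow and Disallow lines and leaves comments, User-agent lines, and the Sitemap line untouched:

```python
import re

# Matches "Allow: <path>" or "Disallow: <path>", keeping the three parts separate.
RULE = re.compile(r"^(\s*(?:allow|disallow)\s*:\s*)(\S*)(.*)$", re.IGNORECASE)

def strip_prefix(robots_txt: str, prefix: str = "/forum") -> str:
    """Remove `prefix` from the start of every Allow/Disallow path (hypothetical helper)."""
    out = []
    for line in robots_txt.splitlines():
        m = RULE.match(line)
        # Only touch rules whose path really starts with "<prefix>/".
        if m and m.group(2).startswith(prefix + "/"):
            line = m.group(1) + m.group(2)[len(prefix):] + m.group(3)
        out.append(line)
    return "\n".join(out)

sample = "User-agent: Slurp\nDisallow: /forum/\nDisallow: /forum/index.php?*;*"
print(strip_prefix(sample))
```

Under this approach the leading / always stays, because robots.txt paths are written relative to the site root. Note that `Disallow: /forum/` becomes `Disallow: /`, which blocks the whole site for that bot apart from what the Allow lines re-open.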

青山 素子



dlackner

Thanks

One last question: since, as I mentioned, my forum is installed in my root directory, could you edit the following section to remove the /forum references? I do not know how that should be done.


# Yahoo - Too aggressive
#   So limit it as much as possible.
User-agent: Slurp
# Disallow Everything
Disallow: /forum/
# Now allow bits and then disallow bits
Allow: /sitemap.xml$
Allow: /robots.txt$
Allow: /index.php$
Allow: /index.php?topic=*.0$
Allow: /index.php?topic=*.*0$
Allow: /index.php?topic=*.*5$
Allow: /index.php?board=*.0$
Allow: /index.php?board=*.*0$
Allow: /index.php?board=*.*5$
# But don't allow these
Disallow: /index.php?*.msg
Disallow: /index.php?topic=*.msg*0$
Disallow: /index.php?topic=*.msg*5$
Disallow: /index.php?*.new
# Anything with a ; disallow
Disallow: /forum/index.php?*;*

青山 素子



Hassan Omar

No. 3 on the list below doesn't work with SMF 1.1.7. Is there another workaround? Also, I have SMF installed in a subdirectory that Google can't seem to get to. How do I work around this? Path: mysite.com/forum/index.php

Quote from: Deprecated on August 07, 2008, 01:33:46 PM
We have become focused on the topic of robots.txt files, so I would like to expand the discussion and get back to the original titled topic: Making your forum as SEO friendly as possible. Here are the steps I've taken so far:

1.) Register for Google Webmaster Tools: Visit their site, sign up for a free account. Validate your authority over your site by either placing a coded META tag on your index page (not that easy for SMF) or placing a coded name file in your root (easy). Have a look around and discover the various reports including error conditions.

2.) Install SMF Sitemap: SlammedDime's mod to add an XML sitemap to your forum, compatible with SMF 1.1.* thru 2.0b3.1p. This creates a new action for your index.php that dynamically generates an XML sitemap of the current state of your site whenever it is accessed. Then go back to your Google Webmaster Tools and enter the URL to your sitemap so that Google can find it.

3.) Install vBulletin Style Meta Tags: rsw686's mod that allows SMF to generate different META tags depending on the board or topic accessed. This allows each topic on your site to have different meta tags, and the tags are more relevant to those specific pages. The description tags are generated from the initial words of the post. It's not likely you will be able to educate your members to get to their point in the first sentence, but I'm pleased as long as my members don't write in textese. ;) Anyway the mod's tags are much better than static tags for the entire site. Note also that you must enter the default tags such as are used on your index page. The mod has a place to configure what you want to use.

4.) Have a close look and give a lot of thought to your .htaccess file (Apache servers only). If you have none, your site probably appears as both example.com and www.example.com which, it has been said, may confuse the robots. You can tell Google which you prefer in your Webmaster Tools, but why not just fix it on your end for everybody? I've got two versions of this file for two different forum sites. One causes all requests to be redirected to www.example1.com while the other causes all requests to be redirected to www.example2.com/forum/ (different URLs, different forums). Both automatically adjust to whether the www is present or not, and both use a 301 (Moved Permanently) redirect when the requested URL is wrong, to nudge the search engines into remembering. Note also that .htaccess redirections should be designed to permit your robots.txt file to function properly. If you redirect EVERYTHING to /forum/ the robots.txt file in your root is inaccessible. There is some uncertainty whether any robots.txt file in /forum/ would be honored or even found. I could test for this but honestly I couldn't be bothered. I just made sure my own file in the root is accessible.

5.) Make sure that the parts of your forum that you want indexed are accessible by guests. From your end all robots look like guests, and if you restrict your site material to require registration, the searchbots are going to get nada.

6.) Get your site linked to from other sites. If you control several websites, put links to your forum on each, and enlist friends to do the same. The more sites that link to you, the more valuable the search engines consider your content. Just remember that there's no use unless the other sites are crawled too, so you may need to take steps and at least get the page with the link indexed. I've gotta do this myself... Been too busy getting #1-#5 done.

Okay, that's my tips to share with y'all. Now what I want from youse is to tell me if I have missed anything, and share your ideas on how to get the attention of the search engines. Remember, if you aren't on the search engines your forum will find it difficult to grow. If you get lots of hits on the search engine, people will be drawn to your site from web searches, and your forum will be more likely to gain popularity.
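The redirect behaviour described in tip 4 can be sketched as a small decision function. This is Python for illustration only, not .htaccess syntax; the name `redirect_target`, the default host, and the `forum_root` parameter are all assumptions made for the example:

```python
from typing import Optional

def redirect_target(host: str, path: str,
                    canonical_host: str = "www.example.com",
                    forum_root: str = "/") -> Optional[str]:
    """Return the URL to answer with a 301, or None to serve the request as-is."""
    wanted_path = path
    # robots.txt must stay reachable in the site root, so it is never moved.
    if path != "/robots.txt" and not path.startswith(forum_root):
        wanted_path = forum_root
    if host.lower() == canonical_host and wanted_path == path:
        return None
    return "http://" + canonical_host + wanted_path

print(redirect_target("example.com", "/index.php?topic=3.0"))
print(redirect_target("www.example.com", "/robots.txt", forum_root="/forum/"))
```

The first call models the "always use www" variant; the second models the variant that sends everything to /forum/ while leaving the root robots.txt alone.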

H

Welcome to SMF. Both those mods are compatible with SMF 2.x beta. If you wanted, you could Upgrade :)
-H
Former Support Team Lead
I recommend:
Namecheap (domains)
Fastmail (e-mail)
Linode (VPS)

Hassan Omar

Quote from: H on December 15, 2008, 06:08:50 AM
Welcome to SMF. Both those mods are compatible with SMF 2.x beta. If you wanted, you could Upgrade :)

I use Yahoo for hosting, which unfortunately doesn't support the .htaccess file (it won't allow it to be uploaded). How do I adjust for this? The issue is that none of my SMF forum pages or posts show up when I search Google for www.atlantainvesting.net/forum/index.php. Could the cause be that the real index page contains an instant redirect to the forum (from www.atlantainvesting.net to www.atlantainvesting.net/forum/index.php)?

Please give me a clue here. I'm using 1.1.7.

青山 素子

Try using Google Webmaster Tools and look at the index stats. It should let you know why no indexing is being done on that.


Hassan Omar

Quote from: Motoko-chan on December 15, 2008, 10:18:20 AM
Try using Google Webmaster Tools and look at the index stats. It should let you know why no indexing is being done on that.

From google webmaster:

Sitemap summary
Most sites will not have all of their pages indexed.

Property                     Status
Sitemap type                 Web
Format                       Sitemap
Submitted                    10 hours ago
Last downloaded by Google    10 hours ago
Status                       OK
Total URLs in Sitemap        57
Indexed URLs in Sitemap      No data available. Please check back soon.

Sitemap errors and warnings
No errors or warnings found.

What does this tell me?

青山 素子

That isn't the correct section (Diagnostics -> Web crawl is where you want to look for errors), but it did make me think. Your main sitemap doesn't list any of your forum, so Google is ignoring it. Check http://www.atlantainvesting.net/sitemap.xml.

Sitemaps cut both ways. They let you make sure all the URLs are listed, but if you don't list something, it doesn't get indexed. I suggest you either delete the sitemap files or generate a full sitemap at that location.

Search engines are good at picking up URLs. I only suggest using a sitemap when you need to modify some behavior or you see big gaps in indexing. Sitemaps only help you get indexed; they don't help you rank in searches or anything like that.
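One way to check whether a sitemap covers the forum at all is to list the `<loc>` entries that fall under the forum path. A minimal sketch, assuming the file follows the sitemaps.org 0.9 schema; the function name `urls_under` and the example.com URLs are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org 0.9 files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def urls_under(sitemap_xml: str, prefix: str) -> list:
    """Return every <loc> URL in the sitemap that starts with `prefix`."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return [url for url in locs if url.startswith(prefix)]

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
</urlset>"""

# An empty list means the sitemap lists no forum pages.
print(urls_under(sample, "http://www.example.com/forum/"))
```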


ccraciun

Hello!

I followed some of the tips on optimizing my SMF board and am now stuck on the following problems:

1. I use the sitemap mod and submitted my sitemap to Google; after that I created a robots.txt file, as described here. Unfortunately, it appears that /*?action* stops Google from following my sitemap or my gallery. I still want to prevent crawlers from following Help, Search, Statistics and other pages that, I guess, may hurt my ranking.
Now, in Webmaster Tools I have:
URL restricted by robots.txt
We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit.

What do you suggest?


2. In Diagnostics | Content analysis, I see the following issues:
- duplicate meta descriptions;
- short meta descriptions;
- duplicate title tags.
The downloaded tables containing all the information can be found in the attachment.
I am most annoyed by the duplicate title tags issue and I don't understand why it is happening; for example:
http://www.forum-brasov.ro//index.php?topic=2.0;prev_next=prev;
http://www.forum-brasov.ro/index.php?topic=3.0
http://www.forum-brasov.ro/index.php?topic=3.msg%msg_id
http://www.forum-brasov.ro/index.php?topic=3.msg3
all of them open the same page.


Motoko-chan, I would like to use your robots.txt file, but I am more concerned about preventing the duplicate content created by ?action=print than about the imode or wap variants, because I don't use those formats on my forum yet.

Many thanks!

[Later edit]
I forgot to mention that I enabled Search engine friendly URLs on the Basic features page of SMF, and the topic used as an example in question #2 is accessible by default at:
http://www.forum-brasov.ro/index.php/topic,3.0.html
Also, I installed the metatags 1.1 mod for 1.1.x forums, which should have solved the duplicate and short meta description issues; maybe the problem reported by Google appeared before I installed this mod?

H

You shouldn't be blocking: *action* unless you're using a rewrite mod that doesn't use action urls.
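To see why a blanket action pattern also blocks the sitemap, here is a minimal sketch of robots.txt wildcard matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper name `rule_matches` is hypothetical, and the sketch does not model how a crawler chooses between competing Allow and Disallow rules:

```python
import re

def rule_matches(pattern: str, url_path: str) -> bool:
    """True if a robots.txt path pattern applies to url_path (hypothetical helper)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * stand for "anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Rules are prefix matches, so only the start of the path is pinned.
    return re.match(regex, url_path) is not None

# The blanket rule catches the sitemap action too.
print(rule_matches("/*?action*", "/index.php?action=sitemap;xml"))
# A specific rule, like those in the catch-all section earlier in the thread, does not.
print(rule_matches("/index.php?action=search", "/index.php?action=sitemap;xml"))
```

Listing the unwanted actions one by one keeps Help, Search and similar pages blocked while leaving the sitemap action reachable.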

ccraciun

I didn't know that; I have now modified my robots.txt and hope it will be OK.
Thanks and Happy new year to all!  :D

I AM Legend

Hi all,
I am creating my first robots.txt file. I want both the Google and Yahoo bots to be able to index my entire forum except for my staff area: the Admin, Global, Staff and Recycle Bin boards.
My forum is in the root dir on my host, but I don't want the robots to be able to access anything else in the root dir (cgi, for example), just the boards on my forum.
How do I go about doing this?
Thanks in advance
Rob

青山 素子

If guests don't have access to those boards, the search engines will not either.


I AM Legend

Oh, OK. Guests do have limited access on my forum, so is it a choice between giving guests full forum access and getting bots (and opening up to spammers), or giving little or no guest access and getting no bots?

青山 素子

Basically, a search engine bot is like a guest. They can only see the stuff that guests can see. They don't need posting permissions, however. In SMF 2.0, you can restrict recognized search engines to see less than a normal guest, but never more.

You don't want a search engine to see more than a normal guest. That technique is usually grouped with a few others that are considered cloaking. This can get you in trouble and get you removed from the index.


zerog12avity

Quote from: Deprecated on August 07, 2008, 01:33:46 PM
We have become focused on the topic of robots.txt files, so I would like to expand the discussion and get back to the original titled topic: Making your forum as SEO friendly as possible. Here are the steps I've taken so far:

1.) Register for Google Webmaster Tools: Visit their site, sign up for a free account. Validate your authority over your site by either placing a coded META tag on your index page (not that easy for SMF) or placing a coded name file in your root (easy). Have a look around and discover the various reports including error conditions.

2.) Install SMF Sitemap: SlammedDime's mod to add an XML sitemap to your forum, compatible with SMF 1.1.* thru 2.0b3.1p. This creates a new action for your index.php that dynamically generates an XML sitemap of the current state of your site whenever it is accessed. Then go back to your Google Webmaster Tools and enter the URL to your sitemap so that Google can find it.


I registered a Google Webmaster Tools account and submitted my sitemap (Sitemap 2.0.0 mod).

I keep getting this error after submission...
Unsupported file format
Your Sitemap does not appear to be in a supported format. Please ensure it meets our Sitemap guidelines and resubmit.

Is there a different format available? Or which one should I use?

ccraciun

@zerog12avity: add ;xml to your sitemap address (url/forum/index.php?action=sitemap;xml) and it will work just fine.
I tested it and it's OK; Google will accept it this way.
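A quick way to tell the two responses apart before resubmitting is to look at the root element of whatever the URL returns. This is a minimal sketch; it assumes, which the thread does not confirm, that the plain action URL serves a human-readable page rather than XML. The function name `sitemap_kind` is made up for the example:

```python
import xml.etree.ElementTree as ET

def sitemap_kind(body: str) -> str:
    """Classify a response body by its XML root element (hypothetical helper)."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "not xml"
    # Drop the "{namespace}" part that ElementTree puts in front of the tag name.
    tag = root.tag.rsplit("}", 1)[-1]
    return tag if tag in ("urlset", "sitemapindex") else "unknown xml"

xml_body = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            '<url><loc>http://www.example.com/forum/index.php?topic=3.0</loc></url>'
            '</urlset>')
print(sitemap_kind(xml_body))
print(sitemap_kind("Sitemap for my forum"))
```

A result of urlset or sitemapindex is what an XML sitemap should produce; anything else is likely to be reported as an unsupported format.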
