Simple Machines Community Forum

Customizing SMF => Modifications and Packages => Topic started by: Aaron on November 09, 2006, 01:06:34 PM

Title: SEO: Duplicate Content Preventer
Post by: Aaron on November 09, 2006, 01:06:34 PM
Link to Mod (http://mods.simplemachines.org/index.php?mod=534)
Rate this Mod (http://mods.simplemachines.org/index.php?action=review;sa=add;mod=534)

This mod will tell robots not to index topics that are being accessed with .msg, prev_next, ;all, or by printing the topic (?action=printpage), by adding <meta name="robots" content="noindex" /> to these pages.

Note: this mod requires a modification in index.template.php. It changes only the default theme's index.template.php, so you'll have to apply the changes manually in any custom theme!
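
For readers who want to see the rule spelled out, here is a minimal Python sketch of how a topic URL could be classified as one of the duplicate views listed above. It is not the mod's actual PHP code: the helper names `is_duplicate_view` and `robots_meta_for`, the sample URL shapes, and the matching logic are assumptions made for illustration only.

```python
import re

# Markers taken from the post above: .msg, prev_next, ;all, action=printpage.
# The regular expression and the splitting on ";" / "&" are illustrative
# assumptions, not the mod's real implementation.
_MSG_PATTERN = re.compile(r"topic=\d+\.msg\d+")

NOINDEX_TAG = '<meta name="robots" content="noindex" />'


def is_duplicate_view(url):
    """Return True if the URL looks like a duplicate view of a topic."""
    query = url.partition("?")[2]
    if not query:
        return False
    parts = re.split(r"[;&]", query)
    if "action=printpage" in parts:
        return True
    if "all" in parts:
        return True
    if any(part.startswith("prev_next=") for part in parts):
        return True
    return bool(_MSG_PATTERN.search(query))


def robots_meta_for(url):
    """Return the noindex tag for duplicate views, or an empty string."""
    return NOINDEX_TAG if is_duplicate_view(url) else ""
```

With these assumptions, a plain topic URL such as /index.php?topic=50.0 gets no tag, while the same topic reached through .msg, ;all, prev_next or the print page gets the noindex tag.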
Title: Re: SEO: Duplicate Content Preventer
Post by: Niteblade on November 10, 2006, 09:45:03 PM
I installed the mod, but when I do a 'view source' on a "print page," I do not see the '<meta name="robots" content="noindex" />'

Granted, I'm not using the default theme, per se, but the custom theme that I am using was copied into the Themes/default/ directory. In essence, I overwrote the default theme that ships with SMF with the custom theme.


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Print Page - For members: A photo gallery is installed.</title>
<style type="text/css">
body
{
color: black;
background-color: white;
}
body, td, .normaltext
{
font-family: Verdana, arial, helvetica, serif;
font-size: small;
}
*, a:link, a:visited, a:hover, a:active
{
color: black !important;
}
table
{
empty-cells: show;
}
.code
{
font-size: x-small;
font-family: monospace;
border: 1px solid black;
margin: 1px;
padding: 1px;
}
.quote
{
font-size: x-small;
border: 1px solid black;
margin: 1px;
padding: 1px;
}
.smalltext, .quoteheader, .codeheader
{
font-size: x-small;
}
.largetext
{
font-size: large;
}
hr
{
height: 1px;
border: 0;
color: black;
background-color: black;
}
</style>
</head>
<body>
<h1 class="largetext">Midessa</h1>

<h2 class="normaltext">Forum happenings => Feature announcements => Topic started by: Nite on October 07, 2006, 03:30:21 PM</h2>

<table width="90%" cellpadding="0" cellspacing="0" border="0">
<tr>
<td><!--Headers--><div align="center">
Title: Re: SEO: Duplicate Content Preventer
Post by: vbgamer45 on November 10, 2006, 09:48:40 PM
You will need to modify the custom theme as well. Open the package and find the .xml files; the change is pretty easy to do.
Title: Re: SEO: Duplicate Content Preventer
Post by: Dannii on November 10, 2006, 09:49:20 PM
You should never overwrite the default theme.
Title: Re: SEO: Duplicate Content Preventer
Post by: Niteblade on November 10, 2006, 09:55:31 PM
Quote from: eldʌkaː on November 10, 2006, 09:49:20 PM
You should never overwrite the default theme.

Ya.

But you know, there are so many modifications that are made just for the default theme. And adding numerous modifications to any custom theme is troublesome. To make it less troublesome, I made my custom theme my new default theme. Not all of the actual default theme files were overwritten -- just the ones that the custom theme needed to use in order to differentiate it from the real default.
Title: Re: SEO: Duplicate Content Preventer
Post by: Niteblade on November 10, 2006, 10:00:49 PM
As a side note, the package installed without any errors. And when I open the two modified files, I can see that the code was successfully added.
Title: Re: SEO: Duplicate Content Preventer
Post by: vbgamer45 on November 10, 2006, 10:11:49 PM
Hmm, it seems to work on the other pages, just not the print page on your site.
Title: Re: SEO: Duplicate Content Preventer
Post by: diegolyanky on November 12, 2006, 04:09:07 PM
Aäron :

You are a genius ... It's a great idea ... As you know, robots can make your board very slow while they index the whole board ... Today Google's bot indexed my board and I didn't notice it ...

I checked the source code of one of the topics, and the noindex instruction is there ...

Thanks guy ! ...  ;)
Title: Re: SEO: Duplicate Content Preventer
Post by: udeze on November 28, 2006, 08:05:12 AM
This is truly a great mod, a very powerful SEO tool  :)
Title: Re: SEO: Duplicate Content Preventer
Post by: keith021773 on November 28, 2006, 08:29:33 AM
How is this working with Google? Does it help the Google spider index your site better? And have you noticed a better listing in Google?
Title: Re: SEO: Duplicate Content Preventer
Post by: winrules on November 28, 2006, 09:03:47 PM
It stops Google from indexing some duplicate pages. Having duplicate content can often decrease page rank.

Also this is a default feature on 1.1 final.
Title: Re: SEO: Duplicate Content Preventer
Post by: Jiveturkey on December 02, 2006, 09:25:00 PM
When you say it's a default in 1.1 final does that mean that you don't need to install the mod?

I tried to install it but it says that the file is corrupted.
Title: Re: SEO: Duplicate Content Preventer
Post by: vbgamer45 on December 02, 2006, 09:32:54 PM
Quote from: Jiveturkey on December 02, 2006, 09:25:00 PM
When you say it's a default in 1.1 final does that mean that you don't need to install the mod?

I tried to install it but it says that the file is corrupted.
If you have SMF 1.1 you do not need to install this mod since it is built into this release.
Title: Re: SEO: Duplicate Content Preventer
Post by: golfhos on December 12, 2006, 05:55:18 PM
Thank you very much for this mod!

I know that this change is included in version 1.1, but I've made a lot of customizations to my RC 3 installation and it's going to take me a while to upgrade. But the SEO duplicate content fix was the 1.1 change that I was most interested in.

I made the change last week, and I've seen some nice increases in the number of pages being indexed by Google since then.

I also ran into the issue of my print page not getting modified, but it wasn't a big deal to make the change manually to my Printpage.template.




Title: Re: SEO: Duplicate Content Preventer
Post by: simonm on January 14, 2007, 03:09:56 PM
Quote: I also ran into the issue of my print page not getting modified, but it wasn't a big deal to make the change manually to my Printpage.template.

How do I change the Printpage.template manually to prevent robots from indexing it?
Title: Re: SEO: Duplicate Content Preventer
Post by: Rick_M on February 05, 2007, 09:04:50 PM
I'm using 1.1.1 and I couldn't find a way to block duplicate content already built into the software.  If there is an admin setting, can someone point me to the right spot?

I went ahead and installed the mod and it works great, except I found that duplicate content still shows up on the printpage and on reply pages - which google has indexed.

I've just gone ahead and blocked URLs with those terms in them with my robots.txt file. I'm honestly not sure which is better for SEO (robots.txt vs meta), but I am guessing that duplicate content is a big reason why many SMF forums don't get ranked well by Google. I guess I'll see how Google likes my SMF forum. I'm pretty knowledgeable on SEO and I've just switched over from xoops to SMF. I love the forum software, but if it doesn't get good search engine traffic, it isn't worth much.
Title: Re: SEO: Duplicate Content Preventer
Post by: Rick_M on February 06, 2007, 08:57:07 PM
Okay - finding more issues with duplicate content that Google is picking up, and possibly causing ranking problems for the forum.

Google is indexing the recent posts page linked to from the front page - and the links on that page go to the specific messages, which I don't want indexed.  I'd want the thread to get indexed, not the specific sub-posts.

Google is also indexing the recent posts under each user ID, but again, it is linking to the specific sub-post message, not the thread.

Finally - on the front page, I'd much rather have the links to the most recent threads that had activity, instead of the specific sub-post.

Any help in addressing these issues is appreciated.

I can't find any options built into 1.1.1 to prevent Google from spidering these duplicate content pages and indexing them.

Title: Re: SEO: Duplicate Content Preventer
Post by: bluegray on February 09, 2007, 09:30:52 AM
Looks like this mod is now part of SMF 1.1.1
There are no options to disable/enable it. But you can check the source of your webpage to see if there is a ' <meta name="robots" content="noindex" /> ' in the header section.

Also remember that Google might still have some old pages in its cache and will follow URLs from there, unless you block them in your robots.txt file.
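
Checking the page source by hand can also be scripted. The following is a minimal Python sketch, using only the standard library, that looks for a robots meta tag containing noindex in a piece of HTML. The names `RobotsMetaFinder` and `has_noindex` are hypothetical, and the sketch assumes you have already saved or fetched the page source yourself.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

Calling has_noindex() on the source of a .msg or print-page URL should return True if the tag is in place, and False for a normal topic page.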
Title: Re: SEO: Duplicate Content Preventer
Post by: Aaron on February 11, 2007, 05:47:04 AM
Quote from: bluegray on February 09, 2007, 09:30:52 AM
Looks like this mod is now part of SMF 1.1.1

It is indeed an integrated part of SMF 1.1.1. :)
Title: Re: SEO: Duplicate Content Preventer
Post by: Rick_M on March 25, 2007, 05:56:15 PM
If this is integrated, I was not able to tell. 1.1.1 is the first version I installed, and I also upgraded to 1.1.2 when it became available.

The pages that were getting indexed by google included none of the main forum threads (even after I installed and submitted a google sitemap), but instead the posts with urls including:

/index.php?action=post;topic=50.11

and

/index.php?action=recent;start=20

and

/index.php?action=post;topic=6.0;num_replies=2

and even on the simplemachines.org website, the main URLs for posts are not indexed; instead there are URLs like:

/index.php?topic=159487.msg1017701;topicseen

I finally gave up after 6 weeks - I don't know what it is about the URL structure that Google dislikes, but there is something there.  I switched over to bbPress (because phpBB site was down) and within a week, I have all of my threads indexed as they should be, with many of the threads ranking first for their title. 

My site had been established using xoops in the past, but I didn't need the whole content management - just a forum.  I like SMF the best for usability and features, but if it doesn't get indexed properly, there won't be any traffic. 

If someone has an example of a site that runs SMF that has gotten spidered and indexed well, as well as gets top rankings for the titles of the posts, I'd love to take a look at it.

Don't get me wrong - I love the software.  Not getting indexed properly is a deal-breaker for me though.

Title: Re: SEO: Duplicate Content Preventer
Post by: bluegray on March 27, 2007, 11:11:56 AM
Google has no problem spidering a standard SMF install. But it can take some time for pages to show up in Google search, depending on when the index is updated, so it might take a while if you got spidered just after an update. A proper sitemap might help Google spider your important pages faster, but it does not guarantee they will show up in results. Googlebot will follow all the URLs, but the duplicate pages will not be indexed normally; instead they show up as 'Supplemental Result' or are listed with just the URL. Some of the links might already be in the Google index, and those will take a while to disappear unless you specifically ask Google to remove them.

Check which pages are in the google index by searching for 'site:yourwebsite.com'

Having your pages show up in the top ten is a whole other story. You will have to provide content that is relevant to the search words. Although the standard SMF theme is fine, there are plenty of optimizations that can be done (header tags, title, words in the URL, etc.). The prettyurls mod, while not necessary to get spidered, can provide extra keywords for Google to rank the page by. I get spidered every day, and most pages are not more than 3 days old in Google's cache. Relevant keywords also provide results in the top ten for my site.

To keep Googlebot away from certain pages, I also use a robots.txt file. It's not necessary, but that way Googlebot doesn't waste time spidering content that people probably don't want to see:


User-agent: *
Disallow: /cgi-bin/
Disallow: /index.php?action=activate
Disallow: /index.php?action=admin
Disallow: /index.php?action=arcade
Disallow: /index.php?action=calendar
Disallow: /index.php?action=collapse
Disallow: /index.php?action=deletemsg
Disallow: /index.php?action=editpoll
Disallow: /index.php?action=help
Disallow: /index.php?action=helpadmin
Disallow: /index.php?action=lock
Disallow: /index.php?action=login
Disallow: /index.php?action=logout
Disallow: /index.php?action=markasread
Disallow: /index.php?action=mergetopics
Disallow: /index.php?action=mlist
Disallow: /index.php?action=modifykarma
Disallow: /index.php?action=movetopic
Disallow: /index.php?action=notify
Disallow: /index.php?action=notifyboard
Disallow: /index.php?action=pm
Disallow: /index.php?action=post
Disallow: /index.php?action=profile
Disallow: /index.php?action=register
Disallow: /index.php?action=removetopic2
Disallow: /index.php?action=reporttm
Disallow: /index.php?action=search
Disallow: /index.php?action=sendtopic
Disallow: /index.php?action=splittopics
Disallow: /index.php?action=stats
Disallow: /index.php?action=sticky
Disallow: /index.php?action=trackip
Disallow: /index.php?action=unread
Disallow: /index.php?action=unreadreplies
Disallow: /index.php?action=who
Disallow: /Themes/

Disallow: *.msg
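
To see which URLs a list like this would cover, here is a rough Python sketch of robots.txt matching. It is a simplified model, not any search engine's real parser: it assumes plain prefix matching for Disallow rules, treats `*` as a wildcard and a trailing `$` as an end anchor (extensions that Google documents but that not every crawler honours), skips blank lines, and ignores Allow rules and rule precedence. The function names are hypothetical.

```python
import re


def disallow_rules(robots_txt, agent="*"):
    """Collect the Disallow values from groups that name the given agent."""
    rules = []
    agents = []            # user agents of the group currently being read
    reading_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue       # blank lines are skipped (an assumption)
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:          # a new group starts here
                agents, reading_rules = [], False
            agents.append(value.lower())
        elif field == "disallow":
            reading_rules = True
            if value and agent.lower() in agents:
                rules.append(value)
    return rules


def rule_to_regex(rule):
    """Translate one Disallow value into a regex anchored at the start."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(path, rules):
    """Return True if any rule matches the path (including its query)."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


# A few rules copied from the list above.
SAMPLE = """\
User-agent: *
Disallow: /index.php?action=post
Disallow: /index.php?action=profile

Disallow: *.msg
"""
RULES = disallow_rules(SAMPLE)
```

Under these assumptions, /index.php?action=post;topic=50.11 is blocked by the action=post prefix and /index.php?topic=159487.msg1017701;topicseen is blocked only by the *.msg wildcard line, while /index.php?action=recent;start=20 is not covered by any rule in the sample.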
Title: Re: SEO: Duplicate Content Preventer
Post by: Niteblade on June 17, 2007, 04:24:59 PM
Nice robots.txt file ...

Here's mine ... with some Tinyportal additions.

User-agent: Fasterfox
Disallow: /

User-agent: *
Disallow: /arcade/
Disallow: /arcade
Disallow: /attachments/
Disallow: /attachments
Disallow: /avatars/
Disallow: /avatars
Disallow: /chat/
Disallow: /chat
Disallow: /FCKeditor/
Disallow: /FCKeditor
Disallow: /gallery/
Disallow: /gallery
Disallow: /Packages/
Disallow: /Packages
Disallow: /Smileys/
Disallow: /Smileys
Disallow: /Sources/
Disallow: /Sources
Disallow: /Themes/
Disallow: /Themes
Disallow: /tp-downloads/
Disallow: /tp-downloads
Disallow: /tp-images/
Disallow: /tp-images
Disallow: /wysiwyg/
Disallow: /wysiwyg
Disallow: /apc.php
Disallow: /ssi_examples.php
Disallow: /ssi_examples.shtml
Disallow: /status.php
Disallow: /status.php?php
Disallow: /Settings.php
Disallow: /Settings_bak.php
Disallow: /index.php?action=admin
Disallow: /index.php?action=activate
Disallow: /index.php?action=arcade
Disallow: /index.php?action=calendar
Disallow: /index.php?action=collapse
Disallow: /index.php?action=deletemsg
Disallow: /index.php?action=editpoll
Disallow: /index.php?action=gallery
Disallow: /index.php?action=help
Disallow: /index.php?action=helpadmin
Disallow: /index.php?action=lock
Disallow: /index.php?action=login
Disallow: /index.php?action=logout
Disallow: /index.php?action=markasread
Disallow: /index.php?action=mergetopics
Disallow: /index.php?action=mlist
Disallow: /index.php?action=modifykarma
Disallow: /index.php?action=movetopic
Disallow: /index.php?action=notify
Disallow: /index.php?action=notifyboard
Disallow: /index.php?action=pm
Disallow: /index.php?action=post
Disallow: /index.php?action=printpage
Disallow: /index.php?action=profile
Disallow: /index.php?action=register
Disallow: /index.php?action=removetopic2
Disallow: /index.php?action=reporttm
Disallow: /index.php?action=search
Disallow: /index.php?action=sendtopic
Disallow: /index.php?action=splittopics
Disallow: /index.php?action=stats
Disallow: /index.php?action=sticky
Disallow: /index.php?action=tpadmin
Disallow: /index.php?action=tpmod
Disallow: /index.php?action=trackip
Disallow: /index.php?action=unread
Disallow: /index.php?action=unreadreplies
Disallow: /index.php?action=who
Title: Re: SEO: Duplicate Content Preventer
Post by: Chantal Matar on August 04, 2007, 02:39:50 PM
Guys where do I put the robots.txt file?

I just checked my site on Google and they've indexed all the profiles and really irrelevant stuff!

Do I install this mod first, and then upload a robots.txt file somewhere?

Your help would be much appreciated.  :)
Title: Re: SEO: Duplicate Content Preventer
Post by: Neorics on September 20, 2007, 03:55:26 PM
Does this work for SMF 1.1.3 and in conjunction with the SEO4SMF mod?
Title: Re: SEO: Duplicate Content Preventer
Post by: Flying Drupalist on September 21, 2007, 06:00:55 PM
Quote from: eldʌkaː on November 10, 2006, 09:49:20 PM
You should never overwrite the default theme.

Why not? The default theme can always be easily restored from the clean SMF.
Title: Re: SEO: Duplicate Content Preventer
Post by: Aileen on December 12, 2007, 06:06:13 PM
Does this mod still work on 1.1.4? Thanks.
Title: Re: SEO: Duplicate Content Preventer
Post by: vbgamer45 on December 12, 2007, 07:56:14 PM
If you have SMF 1.1.x you do not need to install this mod since it is built into this release.
Title: Re: SEO: Duplicate Content Preventer
Post by: TrueSatan on December 12, 2007, 08:13:54 PM
Quote from: Miraploy on September 21, 2007, 06:00:55 PM
Quote from: eldʌkaː on November 10, 2006, 09:49:20 PM
You should never overwrite the default theme.

Why not? The default theme can always be easily restored from the clean SMF.

Mods may well not install properly if you overwrite the default theme, and having it intact is a fallback for any problems in other themes, even if you install each mod manually. The advice given not to overwrite it is absolutely correct!
Title: Re: SEO: Duplicate Content Preventer
Post by: pcigre on November 20, 2008, 09:14:54 AM
Could anyone update it for 1.1.7?
Title: Re: SEO: Duplicate Content Preventer
Post by: AlexAcosta on February 12, 2009, 05:28:03 AM
Quote from: Aäron on November 09, 2006, 01:06:34 PM
Link to Mod (http://mods.simplemachines.org/index.php?mod=534)
Rate this Mod (http://mods.simplemachines.org/index.php?action=review;sa=add;mod=534)

This mod will tell robots not to index topics that are being accessed with .msg, prev_next, ;all, or by printing the topic (?action=printpage), by adding <meta name="robots" content="noindex" /> to these pages.

Note: this mod requires a modification in index.template.php. It changes only the default theme's index.template.php, so you'll have to apply the changes manually in any custom theme!
I believe you should also add rel=nofollow to all links pointing to pages like these, not only the noindex tag. Noindex will avoid duplicate content, but PR will still spread to these useless pages.