Robot_no-index code

Started by helpdivya, April 24, 2009, 06:35:38 PM

helpdivya

When I do view source on the post page I see below code. Can anyone tell me what this code does?
      // Please don't index these Mr Robot.
   if (!empty($context['robot_no_index']))
      echo '
   <meta name="robots" content="INDEX, FOLLOW">';

karlbenson

That code looks incorrect to me.

It should be noindex.
if (!empty($context['robot_no_index']))
      echo '
   <meta name="robots" content="noindex">';

The code tells search engines not to index duplicate content pages and other pages that you don't want to be included in google. (SEO)

I'd recommend changing the code back to how smf has it set by default.
There is no need to alter it.  SMF is designed to allow your topics to be indexed.

helpdivya

Yes, I changed it. But since I migrated to RC1 I don't find any new pages in the Google index. When I checked the old file I found that it had the index code, so I changed it back to what it was in the old file before migrating to RC1.

BTW, what will the above code do? And what is the code that tells Google to index the page?

karlbenson

Your modified code has broken the intended functionality.

It was intended to disallow indexing of .msg links/duplicate content and ALLOW indexing of topics and boards.
However, with your modified code it will allow indexing of topics and boards AND allow the junk pages to get indexed.

We do not recommend modifying that code as there is NO NEED OR REASON TO.
So I'd suggest altering it back asap.

You need to understand that that line is conditional on $context['robot_no_index'].
That line is ONLY echoed into the source when SMF doesn't want that page indexed. It won't affect topics/boards.

If your site isn't indexing correctly then either you aren't giving search engines enough time, or there are other reasons why it's not (e.g. low PageRank, bad robots.txt, search engine TOS violations, private site).
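
The conditional described above can be sketched outside the template. This is only a rough Python analogue of the PHP check, not SMF code: the function name robots_meta and the dict standing in for $context are assumptions made for illustration, and Python truthiness only approximates PHP's !empty().

```python
def robots_meta(context):
    """Return the robots meta tag only for pages flagged as no-index.

    `context` is a plain dict standing in for SMF's $context (assumption).
    """
    if context.get("robot_no_index"):
        # Page is flagged (e.g. a duplicate-content link): ask robots to skip it.
        return '<meta name="robots" content="noindex" />'
    # Normal topics and boards get no robots tag at all, so they stay indexable.
    return ""


print(repr(robots_meta({"robot_no_index": True})))
print(repr(robots_meta({})))
```

The point of the sketch is that the tag is absent on ordinary pages, so changing its content to "index" only affects the pages that were flagged to be kept out.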

helpdivya

It was getting indexed before RC1 and there was no issue. When I checked my old pre-RC1 index.template.php file I found the following code:

<meta name="description" content="', $context['page_title'], '" />', empty($context['robot_no_index']) ? '' : '

   <meta name="robots" content="index" />', '

karlbenson

That code gives a free-for-all.
You're undoing SMF's search engine optimizations.

SMF 1.1.x or 2.x doesn't have any problems getting indexed.
Like I said, if you're having particular issues, then it's probably down to something else, not this code.

helpdivya

It was getting indexed before RC1 and there was no issue. When I checked my old pre-RC1 index.template.php file I found the following code:

<meta name="description" content="', $context['page_title'], '" />', empty($context['robot_no_index']) ? '' : '

   <meta name="robots" content="index" />', '

karlbenson

There isn't much else I can say. I've explained what's wrong with your code and why it's wrong.

My last comment in this topic will just point out that in allowing the junk / duplicate content pages to get indexed by Google, you are likely to receive (if you haven't already) an SEO penalty.

helpdivya

I agree and value your comments, as I always do. Please answer this: was WAP part of 1.8 SMF? As far as I know, WAP was introduced in 2.0 RC1. Am I right? I ask because I don't see any Wireless.template.php file in pre-2.0 RC1. Please answer.

helpdivya

Any reply to my previous post?

karlbenson

Themes/default/Wireless.template.php is a file in 2.0

If you're using a custom theme and it doesn't have one, then it will pull it from the default theme.

helpdivya

Yes, my friend, that proves my point. My WAP pages are getting into the Google index within 2 days, so that means there is no problem with TOS or privacy policy as you mentioned above. If that were the problem then none of the pages would have been indexed. See my post on a similar topic here:

http://www.simplemachines.org/community/index.php?topic=303498.msg2021079#msg2021079

There is something wrong in 2.0 RC1. As I mentioned, before RC1 we didn't have any such code; in fact it was asking robots to index.

karlbenson

Wap/Wap2/Imode pages are NOT noindexed. The noindex code is not used in Wireless.template.php

All it means is that the search engines are finding your wireless pages first. (Generally speaking, since wireless pages are lighter, the mobile spider tends to spider them a lot faster than the normal spider.)

You can easily solve this by adding the following to the <head> section of each wireless type (wap, wap2 and imode) in Wireless.template.php:
<meta name="robots" content="noindex">

(note Wireless pages do not run through index.template.php so any change in that file does not affect wireless)

As for getting your site indexed normally, the BEST advice is to
- register with google webmaster tools
- submit an xml sitemap.
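
To confirm that a rendered page (wireless or normal) really carries the noindex tag, its HTML source can be checked. Below is a minimal sketch using Python's standard html.parser; the names RobotsMetaFinder and has_noindex are made up for this example, and the HTML strings are samples, not real page output.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated and case-insensitive.
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def has_noindex(html):
    """Return True if the page source asks robots not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


print(has_noindex('<head><meta name="robots" content="noindex" /></head>'))
print(has_noindex('<head><meta name="robots" content="INDEX, FOLLOW"></head>'))
```

Feeding it the view-source output of a wap2 page and of a normal topic page shows which of them robots are being asked to skip.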

helpdivya

The sitemap for Google has been submitted since 2008. As I have said several times before, the whole indexing problem started after 2.0 RC1.

Some of the issues that I am facing with 2.0 RC1 are


  • Only the pages that have no CSS styling are getting indexed. This was not the case pre-2.0 RC1.
  • No new pages are getting indexed.

karlbenson

In RC1, we added the link to wap2 in the footer.

So it could be that prior to RC1 Google was unaware that your site had a wap version, then discovered the link in the footer in RC1 and so has started indexing all of them.

If google has got a large backlog of them it is likely to attempt to spider all of the existing ones before spidering new ones.

helpdivya

I want to remove all the wap pages from the Google index. They say we need to provide the directory in Webmaster Tools to remove the URLs from the index. Please check this: http://www.google.com/support/webmasters/bin/answer.py?answer=59819&hl=en

Could you please tell me the directory that I need to enter in Webmaster Tools to remove the wap2 pages from the Google index? You can also check my robots file at http://www.ekhichdi.com/robots.txt. I have put the code in it that you gave me earlier.

karlbenson

Thanks for posting the link to your robots.txt

There are several issues with your robots.txt.

Your robots.txt is the reason action=tags links are still in Google.
Disallow: *tags*
Remember that robots.txt rules are crawler directives, not indexing ones. They also do not make the search engine remove pages that are already indexed.
In order for Google to drop those pages you must either remove them manually (one by one), OR remove that line from robots.txt and allow the search engines to CRAWL the pages, see the <meta noindex>, and then realise they need to drop the existing action=tags pages from the index.

You can only have ONE User-agent: * block.

Wildcards in Disallow lines are not supported by most spiders (only Google, Yahoo, Microsoft and Ask).
For this reason you should only use them for specific named spiders. (Other spiders will not just ignore that line; it will probably break the file for them and they will probably ignore the entire robots.txt.)

(Note: wildcards at the end are IMPLIED. The implied wildcard is supported by all versions of robots.txt and all spiders.)
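
The implied trailing wildcard means a Disallow path is treated as a prefix. Here is a small sketch with Python's standard urllib.robotparser, which is one parser among many and not necessarily how any particular search engine behaves; the bot name ExampleBot, the host www.example.com and the helper build_parser are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# A single rule with no explicit wildcard.
rules = build_parser("User-agent: *\nDisallow: /main\n")

for path in ("/main", "/main/page.html", "/mainpage", "/index.php"):
    url = "http://www.example.com" + path
    print(path, rules.can_fetch("ExampleBot", url))
```

Every path that starts with /main is refused and /index.php is still allowed, so a rule like Disallow: /main needs no asterisk at the end.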



So what do I recommend

1. Reversing the noindex at the top. Just trust me on this.

Make sure in each index.template.php you have this code

// Please don't index these Mr Robot.
if (!empty($context['robot_no_index']))
echo '
<meta name="robots" content="noindex" />';


2. Themes/default/Wireless.template.php
In that file you'll find 3 occurrences of

<head>

(one is for wap, one for wap2, and one for imode)

ADD AFTER EACH of them (and before the closing </head>)
<meta name="robots" content="noindex" />

3. Robots.txt
Since robots.txt only controls the crawler, it will only stop Google crawling. It won't remove existing pages from the index, and if there are enough backlinks to those pages, Google will still index them without ever crawling them (so without a title, description or cached version of the page; it will only have a URL).

The meta noindex changes above for Wireless.template.php will ONLY work if you are NOT blocking those pages with robots.txt.

Here's my suggested robots.txt:
User-agent: gotdotnet.ch proxy services
Disallow: /

User-agent: TerrawizBot
Disallow: /

User-agent: *
Disallow: /main
Disallow: /_db_backups/

Sitemap: http://www.ekhichdi.com/index.php?action=sitemap;xml


Even then, to be honest, I'd advise removing the /_db_backups/ line.
If you store your DB backups in that folder, then that's a security risk that you are making public.
DB backups shouldn't be in a publicly accessible area.
That aside, there are bots which scan robots.txt looking for possibly sensitive areas to hack (e.g. it's advisable not to list private/admin areas in it).

Well, hopefully that is of use.
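
As a sanity check, the suggested robots.txt can be run through Python's standard urllib.robotparser to see what it leaves crawlable. This is only a sketch: real crawlers may interpret the file differently, and "Googlebot" is used purely as an example of a spider that falls under the User-agent: * block.

```python
from urllib.robotparser import RobotFileParser

# The suggested robots.txt from this post, copied as-is.
SUGGESTED = """\
User-agent: gotdotnet.ch proxy services
Disallow: /

User-agent: TerrawizBot
Disallow: /

User-agent: *
Disallow: /main
Disallow: /_db_backups/

Sitemap: http://www.ekhichdi.com/index.php?action=sitemap;xml
"""

suggested = RobotFileParser()
suggested.parse(SUGGESTED.splitlines())

site = "http://www.ekhichdi.com"
printpage = site + "/fashion-and-styles/tattoo-care-for-the-summer/0/?action=printpage"

# The named bot is shut out of the whole site.
print(suggested.can_fetch("TerrawizBot", site + "/"))
# Other spiders may still crawl the printpage URL, so they can see a meta noindex on it.
print(suggested.can_fetch("Googlebot", printpage))
# Anything under /main stays off limits for everyone.
print(suggested.can_fetch("Googlebot", site + "/main/"))
```

The middle check is the important one for point 3: a page has to stay crawlable for its meta noindex to be read at all.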

helpdivya

Thanks for the details. But it has left me more confused now.

First of all, I had exactly the same robots.txt that you suggest. All the extra code in my current robots.txt was given to me by one of your support specialists, named H. Please check this topic that I raised: http://www.simplemachines.org/community/index.php?topic=303498.msg2021079#msg2021079

Also, as I understand it now, having the noindex code in Wireless.template.php will block only the wap, wap2 and imode pages, but what about the printpage? Pages like this one, http://www.ekhichdi.com/fashion-and-styles/tattoo-care-for-the-summer/0/?action=printpage, are getting into the Google index.

Once again, I repeat that the code in robots.txt was given by your support colleague. Please verify, as I am now more confused than before.

Hope I am clear now.

karlbenson

H may be less familiar with the various robots.txt standards than I am.
I spent over a year tweaking my forum for SEO with changes to SMF and robots.txt
(a lot of my results were then incorporated into SMF 2.x).

Printpages should already be noindexed (they are for default smf 2.0)
I'll look into this and report back.

karlbenson

Ok, I appear to have discovered the source of your issues.
This is not down to SMF, but to the PrettyUrls plugin. (without the mod installed smf noindexes fine)

I have posted this to the author of the mod
http://www.simplemachines.org/community/index.php?topic=146969.msg2032479#msg2032479
