How to prevent Google bot from indexing wap and wap2

Started by humbleworld, June 02, 2010, 01:45:54 AM

humbleworld

What is the correct format to use in robots.txt?
I want to stop Google from crawling the wap and wap2 pages.
I also want to block the sendtopic and printpage pages.

Kays

Hi, I don't think it's wise to prevent bots from viewing wireless pages, as the major search engines are compiling a separate index of wireless-enabled pages. There is a trick which should prevent wireless pages from being displayed to non-wireless browsers.

Try this for your robots.txt file. Change "boards" to the name of the folder your forum is installed in.


User-agent: *
Disallow: /boards/index.php?action=login*
Disallow: /boards/index.php?action=register*
Disallow: /boards/index.php?action=reminder*
Disallow: /boards/index.php?action=printpage*
Disallow: /boards/index.php?action=profile*
Disallow: /boards/index.php?action=emailuser*
Disallow: /boards/index.php?action=help*
Disallow: /boards/index.php?action=search*
Disallow: /boards/index.php?action=viewers*
Disallow: /boards/index.php?action=unread*
Disallow: /boards/index.php?action=verificationcode*


If at first you don't succeed, use a bigger hammer. If that fails, read the manual.
My Mods

humbleworld

Hello Kays,

I am using the prettyURL mod, so my forum's URLs have changed. What should robots.txt look like?

Kays

Blah, I hate the prettyURL mod. It can cause all sorts of problems. ::)

What does the url for an action, such as search or profile, look like?



humbleworld

Kays, can the prettyURL mod cause a high CPU load average?
What sort of problems are you referring to?

Kays

It probably does increase the server load a bit, but I can't say how much since I've never used it.

What I meant by problems is that since it changes the url it can cause support issues. As might be the case here. :)

Can you post the url for a couple of actions so I can see what they look like? Or a link to your site if guest viewing is enabled.


ke

Quote from: Kays on June 05, 2010, 07:44:02 AM
Hi, I don't think it's wise to prevent bots from viewing wireless pages, as the major search engines are compiling a separate index of wireless-enabled pages. There is a trick which should prevent wireless pages from being displayed to non-wireless browsers.

Try this for your robots.txt file. Change "boards" to the name of the folder your forum is installed in.


User-agent: *
Disallow: /boards/index.php?action=login*
Disallow: /boards/index.php?action=register*
Disallow: /boards/index.php?action=reminder*
Disallow: /boards/index.php?action=printpage*
Disallow: /boards/index.php?action=profile*
Disallow: /boards/index.php?action=emailuser*
Disallow: /boards/index.php?action=help*
Disallow: /boards/index.php?action=search*
Disallow: /boards/index.php?action=viewers*
Disallow: /boards/index.php?action=unread*
Disallow: /boards/index.php?action=verificationcode*



Does this still work with version 2.0.4? And how does this trick work? What do these actions have to do with the WAP versions?

Arantor

Quote: What do these actions have to do with the WAP versions?

Absolutely nothing.
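To illustrate why, here is a minimal Python sketch that checks a few made-up SMF URLs against Kays' rules. It assumes Google-style prefix matching, where a trailing `*` is redundant, and the topic number 123 is hypothetical. Only the printpage URL is caught; the `;wap2` and `;imode` URLs fall through because every rule starts with `?action=`.

```python
from typing import Sequence

# Kays' rules, with "boards" as the forum folder. Google treats a trailing "*"
# as redundant, so each rule behaves as a plain prefix match.
KAYS_RULES = [
    "/boards/index.php?action=login*",
    "/boards/index.php?action=register*",
    "/boards/index.php?action=reminder*",
    "/boards/index.php?action=printpage*",
    "/boards/index.php?action=profile*",
    "/boards/index.php?action=emailuser*",
    "/boards/index.php?action=help*",
    "/boards/index.php?action=search*",
    "/boards/index.php?action=viewers*",
    "/boards/index.php?action=unread*",
    "/boards/index.php?action=verificationcode*",
]


def is_blocked(path: str, rules: Sequence[str]) -> bool:
    """Return True if any Disallow rule (minus its trailing '*') prefixes the path."""
    return any(path.startswith(rule.rstrip("*")) for rule in rules)


# Hypothetical URLs; the topic number is made up.
for path in [
    "/boards/index.php?action=printpage;topic=123.0",
    "/boards/index.php?topic=123.0;wap2",
    "/boards/index.php?topic=123.0;imode",
]:
    print(path, "->", "blocked" if is_blocked(path, KAYS_RULES) else "allowed")
```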

Mayhem30

This is what I'm using - works great.

User-agent: *
Disallow: /forum/index.php?action=help*
Disallow: /forum/index.php?action=login*
Disallow: /forum/index.php?action=register*
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=reminder*
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=unread*
Disallow: /forum/index.php?action=verificationcode*
Disallow: /forum/index.php?action=printpage*

Disallow: /forum/index.php?*;wap
Disallow: /forum/index.php?*;wap2
Disallow: /forum/index.php?*;imode
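A rough way to check which URLs these rules catch is to mimic Google's wildcard matching: `*` matches any run of characters, a trailing `$` anchors the end, and the longest matching rule wins. The Python sketch below assumes those semantics (other crawlers may not support wildcards at all), and the topic and board numbers are made up. The `;imode` rule targets SMF's i-mode output, the third legacy mobile format alongside WAP and WAP2.

```python
import re
from typing import List, Tuple

# Mayhem30's rules as (directive, pattern) pairs; the forum lives in /forum.
RULES: List[Tuple[str, str]] = [
    ("disallow", "/forum/index.php?action=help*"),
    ("disallow", "/forum/index.php?action=login*"),
    ("disallow", "/forum/index.php?action=register*"),
    ("disallow", "/forum/index.php?action=search*"),
    ("disallow", "/forum/index.php?action=reminder*"),
    ("disallow", "/forum/index.php?action=profile*"),
    ("disallow", "/forum/index.php?action=unread*"),
    ("disallow", "/forum/index.php?action=verificationcode*"),
    ("disallow", "/forum/index.php?action=printpage*"),
    ("disallow", "/forum/index.php?*;wap"),
    ("disallow", "/forum/index.php?*;wap2"),
    ("disallow", "/forum/index.php?*;imode"),
]


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a regex matched from the start.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """Longest matching pattern wins; Allow beats Disallow on a tie.

    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and directive == "allow"):
                best_len = length
                allowed = directive == "allow"
    return allowed


# Hypothetical SMF URLs (path plus query string).
for path in [
    "/forum/index.php?topic=123.0",
    "/forum/index.php?topic=123.0;wap2",
    "/forum/index.php?board=4.0;imode",
    "/forum/index.php?action=printpage;topic=123.0",
]:
    print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Because a rule without `$` is a prefix match, the `;wap` line already covers `;wap2` URLs, so the `;wap2` line is redundant but harmless.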

Seo-luntan

It's interesting and useful.
Mayhem30, the only thing I don't understand is what this line is for: Disallow: /forum/index.php?*;imode

Simple Machinist

Can you also use this code method to prevent Google from indexing certain private boards on your forum? For example:

User-agent: *
Disallow: /forum/index.php/board,44.0.html


Would something like this work?

Arantor

Not really. It would hide the board listing, sure, but it wouldn't block any of the other ways a crawler can reach the topics, e.g. the recent posts list.
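A quick way to see this: the rule contains no wildcards, so it is a plain prefix match, and only URLs that start with that exact board path are caught. The minimal Python sketch below uses hypothetical URLs (topic 1234 stands in for any topic in board 44, and `board,44.20.html` assumes the default 20 topics per page), so treat it as an illustration rather than a test of a real forum.

```python
# The single rule from the question. No wildcards, so it is a prefix match.
BOARD_RULE = "/forum/index.php/board,44.0.html"


def is_blocked(path: str) -> bool:
    """Return True if the board rule is a prefix of the path."""
    return path.startswith(BOARD_RULE)


# Hypothetical URLs that all lead to content in board 44.
paths = {
    "board index": "/forum/index.php/board,44.0.html",
    "board index, page 2": "/forum/index.php/board,44.20.html",
    "topic in board 44": "/forum/index.php/topic,1234.0.html",
    "recent posts list": "/forum/index.php?action=recent",
}

for label, path in paths.items():
    print(f"{label}: {'blocked' if is_blocked(path) else 'allowed'}")
```

Only the first page of the board listing is blocked; the second page, the topics themselves and the recent posts list remain reachable.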

If the board is private, don't show it to guests.

nend

I would just remove the old mobile stuff from index.php. I believe there is little use for it nowadays, since mobile browsers have advanced so far.

Simple Machinist

Thank you both. I am not worried about mobile stuff. I just have a couple of private boards which I would rather not have indexed at all by Google. So what other Disallow code lines would I need to add to prevent that one board from being indexed?

Also, if I want to prevent my whole forum from being cached by Google, I believe I can add the following code:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

But to which page on my forum should I add the above code to prevent caching?

Arantor

Make the boards not visible to guests, like I said.

robots.txt is the wrong way to do this.

Simple Machinist

Yes, the board is already not visible to guests. Does this alone prevent Google from indexing the board?

robots.txt is the wrong way to do which?

Arantor

Google appears as a guest, therefore if a board is not visible to guests, it's not visible to Google.

Using robots.txt (or meta tags, for that matter) is not the way to block an entire board from Google, because you would have to block every topic in it as well. There's no way to automate that in robots.txt, no way to do it in the code without modifying it, and even if you mark the links nofollow or even noindex, Google will STILL follow them...

Simple Machinist

Thanks. I am not trying to stop Google from indexing the whole site, just that particular board, which is not accessible to guests. So I think by making it inaccessible to guests I have that sorted out.

But I put this code into my index.template.php file for my theme and I am hoping this at least stops Google from caching the entire site:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">



Arantor

Maybe it would, maybe it wouldn't. Google is free to ignore any and all such directions. They are requests, not instructions.

Gwenwyfar

Not sure if this helps, but you can also create a group for bots (I believe it covers any bot, not just Google: anything SMF recognizes as a bot) and then give it whatever permissions you want. The option should be somewhere in the spider settings; you then need to create a normal membergroup for them.
"It is impossible to communicate with one that does not wish to communicate"

Advertisement: