Simple Machines Community Forum

SMF Support => SMF 1.1.x Support => Topic started by: humbleworld on June 02, 2010, 01:45:54 AM

Title: How to prevent Google bot from indexing wap and wap2
Post by: humbleworld on June 02, 2010, 01:45:54 AM
What is the correct format to use in robots.txt?
I want to stop Google from crawling wap and wap2,
as well as the sendtopic and printpage pages.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Kays on June 05, 2010, 07:44:02 AM
Hi, I don't think it's wise to prevent bots from viewing wireless pages, as the major ones are compiling a separate index of wireless-enabled pages. There is a trick which should prevent wireless pages from being displayed for non-wireless browsers.

Try this for your robots.txt file. Do change boards to the name of the folder your forums are in.


User-agent: *
Disallow: /boards/index.php?action=login*
Disallow: /boards/index.php?action=register*
Disallow: /boards/index.php?action=reminder*
Disallow: /boards/index.php?action=printpage*
Disallow: /boards/index.php?action=profile*
Disallow: /boards/index.php?action=emailuser*
Disallow: /boards/index.php?action=help*
Disallow: /boards/index.php?action=search*
Disallow: /boards/index.php?action=viewers*
Disallow: /boards/index.php?action=unread*
Disallow: /boards/index.php?action=verificationcode*
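
If you would rather generate the list for your own folder name than edit it by hand, something like the following could do it. This is only a sketch in Python: the function name `build_robots_txt` and the `ACTIONS` list are made up for illustration, and the action names are simply copied from the rules above.

```python
# Action names copied from the rules above.
ACTIONS = [
    "login", "register", "reminder", "printpage", "profile", "emailuser",
    "help", "search", "viewers", "unread", "verificationcode",
]


def build_robots_txt(folder, actions=ACTIONS):
    """Return robots.txt text with one Disallow line per action.

    `folder` is the directory the forum lives in ("" for the site root).
    """
    folder = folder.strip("/")
    prefix = f"/{folder}" if folder else ""
    lines = ["User-agent: *"]
    for action in actions:
        lines.append(f"Disallow: {prefix}/index.php?action={action}*")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: forum installed in /forum instead of /boards.
    print(build_robots_txt("forum"), end="")
```

The trailing `*` is kept only to match the rules as posted. Crawlers that follow the robots.txt standard treat each rule as a prefix, so it should make no difference whether it is there.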

Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: humbleworld on June 05, 2010, 07:58:04 AM
Hello Kays,

I am using the prettyURL mod, so the URL of my forum has changed. What should the robots.txt look like?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Kays on June 05, 2010, 08:09:38 AM
Blah, I hate the prettyURL mod. It can cause all sorts of problems. ::)

What does the url for an action, such as search or profile, look like?

Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: humbleworld on June 05, 2010, 07:53:23 PM
Kays, can the prettyURL mod cause a high cpu load average?
What sort of problems are you experiencing?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Kays on June 06, 2010, 09:30:28 AM
It probably does increase the server load a bit. But I can't say how much as I've never used it.

What I meant by problems is that since it changes the url it can cause support issues. As might be the case here. :)

Can you post the url for a couple of actions so I can see what they look like? Or a link to your site if guest viewing is enabled.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: ke on May 01, 2013, 05:13:49 AM
Quote from: Kays on June 05, 2010, 07:44:02 AM
Hi, I don't think it's wise to prevent bots from viewing wireless pages, as the major ones are compiling a separate index of wireless-enabled pages. There is a trick which should prevent wireless pages from being displayed for non-wireless browsers.

Try this for your robots.txt file. Do change boards to the name of the folder your forums are in.


User-agent: *
Disallow: /boards/index.php?action=login*
Disallow: /boards/index.php?action=register*
Disallow: /boards/index.php?action=reminder*
Disallow: /boards/index.php?action=printpage*
Disallow: /boards/index.php?action=profile*
Disallow: /boards/index.php?action=emailuser*
Disallow: /boards/index.php?action=help*
Disallow: /boards/index.php?action=search*
Disallow: /boards/index.php?action=viewers*
Disallow: /boards/index.php?action=unread*
Disallow: /boards/index.php?action=verificationcode*



Does this still work with version 2.0.4? And how does this trick work? What do these actions have to do with the WAP versions?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on May 01, 2013, 06:59:17 AM
Quote: What do these actions have to do with the WAP versions?

Absolutely nothing.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Mayhem30 on May 01, 2013, 01:50:32 PM
This is what I'm using - works great.

Disallow: /forum/index.php?action=help*
Disallow: /forum/index.php?action=login*
Disallow: /forum/index.php?action=register*
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=reminder*
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=unread*
Disallow: /forum/index.php?action=verificationcode*
Disallow: /forum/index.php?action=printpage*

Disallow: /forum/index.php?*;wap
Disallow: /forum/index.php?*;wap2
Disallow: /forum/index.php?*;imode
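
To see which URLs the three wildcard lines would catch, here is a rough Python sketch of wildcard-style matching (`*` matches any run of characters, a trailing `$` anchors the end, everything else is a prefix match). The helper names and the sample URLs are assumptions for illustration, and this is not Google's actual matcher.

```python
import re

# The three wireless rules from the post above.
RULES = [
    "/forum/index.php?*;wap",
    "/forum/index.php?*;wap2",
    "/forum/index.php?*;imode",
]


def rule_to_regex(rule):
    """Translate one Disallow pattern into a compiled regular expression."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal parts and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, rules=RULES):
    """True if any rule matches the path (plus query string) from its start."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the rules.
    for path in [
        "/forum/index.php?topic=123.0;wap2",
        "/forum/index.php?board=4.0;imode",
        "/forum/index.php?topic=123.0",
        "/forum/index.php?wap2",
    ]:
        print("blocked" if is_blocked(path) else "allowed", path)
```

Two things fall out of this. The `;wap` rule already covers `;wap2`, because rules match as prefixes. And a URL where `wap2` comes straight after the `?` with no `;` in front is not matched by any of the three, so if your forum produces links like that they would need a rule of their own. The lines also need a `User-agent:` line above them to form a valid group.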
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Seo-luntan on June 22, 2013, 10:50:32 AM
It's interesting and useful.
Mayhem30, the only part I don't understand is this line: Disallow: /forum/index.php?*;imode
What is it for?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Simple Machinist on December 18, 2014, 06:38:13 AM
Can you also use this code method to prevent Google from indexing certain private boards on your forum? For example:

User-agent: *

Disallow: /forum/index.php/board,44.0.html


Would something like this work?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 18, 2014, 09:30:29 AM
Not really, it would hide the board listing, sure, but it wouldn't hide any other method by which they can get to topics, e.g. recent list.

If the board is private, don't show it to guests.
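
To illustrate why the single rule falls short, here is a small Python sketch. A rule without wildcards is a plain prefix match, so it can be imitated with `startswith`. The URLs other than the board index are hypothetical examples of other routes into the same content.

```python
# The rule proposed above; without wildcards it is a plain prefix match.
RULE = "/forum/index.php/board,44.0.html"

# Hypothetical URLs that could all lead to content from board 44.
CANDIDATES = [
    "/forum/index.php/board,44.0.html",     # first page of the board index
    "/forum/index.php/board,44.20.html",    # a later page of the board index
    "/forum/index.php/topic,1234.0.html",   # a topic inside the board
    "/forum/index.php?action=recent",       # recent posts list
]


def is_disallowed(path, rule=RULE):
    """True if the rule, treated as a prefix, matches the path."""
    return path.startswith(rule)


if __name__ == "__main__":
    for path in CANDIDATES:
        print("blocked" if is_disallowed(path) else "allowed", path)
```

Only the first URL is covered. Every topic would need its own line, which is why hiding the board from guests is the practical fix.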
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: nend on December 18, 2014, 10:34:55 AM
I would just remove the old mobile stuff from index.php. There is little use for it nowadays, I believe, since mobile browsers have advanced so far.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Simple Machinist on December 19, 2014, 02:43:51 AM
Thank you both. I am not worried about mobile stuff. I just have a couple of private boards which I would rather not have indexed at all by Google. So what other Disallow code lines would I need to add to prevent that one board from being indexed?

Also, if I want to prevent my whole forum from being cached by Google, I believe I can add the following code:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

But to which page on my forum should I add the above code to prevent caching?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 19, 2014, 03:20:14 AM
Make the boards not visible to guests, like I said.

robots.txt is the wrong way to do this.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Simple Machinist on December 19, 2014, 03:52:38 AM
Yes, the board is already not visible to guests. Does this alone prevent Google from indexing the board?

robots.txt is the wrong way to do which?
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 19, 2014, 03:56:38 AM
Google appears as a guest, therefore if a board is not visible to guests, it's not visible to Google.

Using robots.txt - or meta tags for that matter - is not the way to block an entire board from Google. You would have to block every topic in it as well; there's no way to automate that in robots.txt, and no way to do it in the code without a change. And even if you mark it nofollow or even noindex, Google will STILL follow it...
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Simple Machinist on December 19, 2014, 05:18:00 AM
Thanks. I am not trying to stop Google from indexing the whole site. Just to prevent indexing on that particular board which is not accessible by guests. So I think by not making it accessible to guests I got that sorted out.

But I put this code into my index.template.php file for my theme and I am hoping this at least stops Google from caching the entire site:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">
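
One way to confirm the tag actually reaches guests is to look for it in the HTML the forum serves. Below is a rough Python sketch using only the standard library. The names `RobotsMetaFinder` and `robots_directives` and the sample markup are made up for illustration; in practice you would paste in the source of a page viewed while logged out.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    # Sample markup standing in for a real page's source.
    sample = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
    print("noarchive" in robots_directives(sample))
```

This only shows that the tag is present in the output. Whether a crawler honours it is a separate question.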


Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 19, 2014, 11:04:05 AM
Maybe it would, maybe it wouldn't. Google is free to ignore any and all such directions. They are requests, not instructions.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Gwenwyfar on December 20, 2014, 07:09:37 PM
Not sure this helps, but you can also create a group for bots (it will apply to any bot SMF recognizes, not just Google, I believe) and then give it whatever permissions you want. The option should be in the spider settings somewhere; you then need to create a normal group for them.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 20, 2014, 07:35:54 PM
No, it doesn't work *quite* like that. Yes, you can create such a group but the group only *denies* permissions, it does not grant any extras, and I don't think it gives any changes to board access.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Gwenwyfar on December 20, 2014, 07:52:41 PM
Ahh, I see. And yes, I just tested, it doesn't affect boards even if you're giving an extra permission. Would be a good addition though.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 20, 2014, 08:01:29 PM
Why would it? Adding permissions or access like this is a bad idea, since it can earn you penalties, as Experts Exchange discovered some time ago.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Gwenwyfar on December 21, 2014, 12:56:03 AM
I was talking about board access more specifically; I can't think of any use for permissions themselves (as in permissions to do something in the forum), actually. I do have sections I want visible to guests but not to bots, for instance, since they'd be basically "spam" in their eyes and wouldn't be useful for anyone making a search. I know many forums that have a similar section, or simply people who wouldn't like Google indexing some things.

Experts Exchange discovered what? (Sorry, I don't even know what that site is :P)

Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 21, 2014, 01:01:17 AM
Yeah, there are issues around *that* too. Google is very fussy about content you show to guests but not to clients that identify themselves as Google.
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Gwenwyfar on December 21, 2014, 01:09:09 AM
And do they really check for that? What would the issues be? I know it already considers these access-denied pages a "soft 404 error" regardless. I guess that's one more thing on the pile of useless things to try with a forum for SEs then...
Title: Re: How to prevent Google bot from indexing wap and wap2
Post by: Arantor on December 21, 2014, 01:11:39 AM
They have been known to, yes. Like I said, Experts Exchange ran afoul of this in their scheme to convince people to pay for access but show the results to Google for free.