What is the correct format to use in robots.txt?
I want to stop Google from crawling the wap and wap2 pages, as well as the sendtopic and printpage pages.
Hi, I don't think it's wise to prevent bots from viewing wireless pages, as the major ones are compiling a separate index of wireless-enabled pages. There is a trick which should prevent wireless pages from being displayed to non-wireless browsers.
Try this for your robots.txt file. Change boards to the name of the folder your forum is in.
User-agent: *
Disallow: /boards/index.php?action=login*
Disallow: /boards/index.php?action=register*
Disallow: /boards/index.php?action=reminder*
Disallow: /boards/index.php?action=printpage*
Disallow: /boards/index.php?action=profile*
Disallow: /boards/index.php?action=emailuser*
Disallow: /boards/index.php?action=help*
Disallow: /boards/index.php?action=search*
Disallow: /boards/index.php?action=viewers*
Disallow: /boards/index.php?action=unread*
Disallow: /boards/index.php?action=verificationcode*
Hello Kays,
I am using the prettyURL mod, so the URLs of my forum have changed. What should the robots.txt look like?
Blah, I hate the prettyURL mod. It can cause all sorts of problems. ::)
What does the URL for an action, such as search or profile, look like?
Kays, can the prettyURL mod cause a high CPU load average?
What sort of problems are you experiencing?
It probably does increase the server load a bit. But I can't say how much as I've never used it.
What I meant by problems is that, since it changes the URLs, it can cause support issues, as might be the case here. :)
Can you post the URLs for a couple of actions so I can see what they look like? Or a link to your site if guest viewing is enabled.
Quote from: Kays on June 05, 2010, 07:44:02 AM
Hi, I don't think it's wise to prevent bots from viewing wireless pages, as the major ones are compiling a separate index of wireless-enabled pages. There is a trick which should prevent wireless pages from being displayed to non-wireless browsers.
Try this for your robots.txt file. Change boards to the name of the folder your forum is in.
User-agent: *
Disallow: /boards/index.php?action=login*
Disallow: /boards/index.php?action=register*
Disallow: /boards/index.php?action=reminder*
Disallow: /boards/index.php?action=printpage*
Disallow: /boards/index.php?action=profile*
Disallow: /boards/index.php?action=emailuser*
Disallow: /boards/index.php?action=help*
Disallow: /boards/index.php?action=search*
Disallow: /boards/index.php?action=viewers*
Disallow: /boards/index.php?action=unread*
Disallow: /boards/index.php?action=verificationcode*
Does this still work with version 2.0.4? And how does this trick work? What do these actions have to do with the WAP versions?
Quote: What do these actions have to do with the WAP versions?
Absolutely nothing.
This is what I'm using - works great.
Disallow: /forum/index.php?action=help*
Disallow: /forum/index.php?action=login*
Disallow: /forum/index.php?action=register*
Disallow: /forum/index.php?action=search*
Disallow: /forum/index.php?action=reminder*
Disallow: /forum/index.php?action=profile*
Disallow: /forum/index.php?action=unread*
Disallow: /forum/index.php?action=verificationcode*
Disallow: /forum/index.php?action=printpage*
Disallow: /forum/index.php?*;wap
Disallow: /forum/index.php?*;wap2
Disallow: /forum/index.php?*;imode
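As a rough illustration of how the wildcard lines above are read, here is a minimal Python sketch of Google-style pattern matching: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and every rule is otherwise a prefix match. The helper name `rule_matches` and the sample URLs are made up for this example; it is not the code any crawler actually runs.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches a URL path.

    Assumes Google-style semantics: '*' matches any sequence of characters,
    a trailing '$' anchors the end of the URL, and otherwise the pattern
    only needs to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece and join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives prefix matching.
    return re.match(regex, path) is not None


# Hypothetical URLs for illustration.
wap_rule = "/forum/index.php?*;wap"
print(rule_matches(wap_rule, "/forum/index.php?topic=5.0;wap"))   # True
print(rule_matches(wap_rule, "/forum/index.php?topic=5.0;wap2"))  # True
print(rule_matches(wap_rule, "/forum/index.php?topic=5.0"))       # False

# A trailing '*' adds nothing, because rules are prefix matches anyway.
url = "/forum/index.php?action=login;sa=x"
print(rule_matches("/forum/index.php?action=login*", url))  # True
print(rule_matches("/forum/index.php?action=login", url))   # True
```

Under these semantics the `;wap` line already covers `;wap2` URLs, so the separate `;wap2` line is harmless but redundant. Crawlers that implement only the original plain-prefix standard may not honor a `*` in the middle of a path.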
It's interesting and useful.
Mayhem30, the only thing I don't understand is what this line is for: Disallow: /forum/index.php?*;imode
Can you also use this code method to prevent Google from indexing certain private boards on your forum? For example:
User-agent: *
Disallow: /forum/index.php/board,44.0.html
Would something like this work?
Not really. It would hide the board listing, sure, but it wouldn't block any other route by which they can reach the topics, e.g. the recent posts list.
If the board is private, don't show it to guests.
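The point about other routes can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the host `example.com`, the topic URL, and the recent-posts URL are hypothetical stand-ins for pages that belong to, or list content from, board 44.

```python
from urllib.robotparser import RobotFileParser

# The rule proposed above, as it would appear in robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /forum/index.php/board,44.0.html",
]

parser = RobotFileParser()
parser.parse(rules)

base = "http://example.com"  # hypothetical host
board_url = base + "/forum/index.php/board,44.0.html"
topic_url = base + "/forum/index.php/topic,123.0.html"  # hypothetical topic in board 44
recent_url = base + "/forum/index.php?action=recent"    # hypothetical recent posts page

for url in (board_url, topic_url, recent_url):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```

Only the board index itself should come back as blocked. The topic and recent-posts URLs do not start with the disallowed path, so a crawler that reaches them another way is still free to fetch them.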
I would just remove the old mobile stuff from index.php. There is little use for it nowadays, I believe, since mobile browsers have advanced so much.
Thank you both. I am not worried about mobile stuff. I just have a couple of private boards which I would rather not have indexed at all by Google. So what other Disallow code lines would I need to add to prevent that one board from being indexed?
Also, if I want to prevent my whole forum from being cached by Google, I believe I can add the following code:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
But to which page on my forum should I add the above code to prevent caching?
Make the boards not visible to guests, like I said.
robots.txt is the wrong way to do this.
Yes, the board is already not visible to guests. Does this alone prevent Google from indexing the board?
robots.txt is the wrong way to do which?
Google appears as a guest, therefore if a board is not visible to guests, it's not visible to Google.
Using robots.txt - or meta tags for that matter - is not the way to block an entire board from Google. You would have to block every topic in it as well, and there's no way to automate that in robots.txt and no way to do it in the code without a change. And even if you mark it nofollow or even noindex, Google will STILL follow it...
Thanks. I am not trying to stop Google from indexing the whole site. Just to prevent indexing on that particular board which is not accessible by guests. So I think by not making it accessible to guests I got that sorted out.
But I put this code into my index.template.php file for my theme and I am hoping this at least stops Google from caching the entire site:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
Maybe it would, maybe it wouldn't. Google is free to ignore any and all such directions. They are requests, not instructions.
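Whether a crawler honors the tag is up to the crawler, but it is at least possible to confirm that the tag reaches the rendered page. A minimal Python sketch, assuming the page source is already available as a string; the class and function names and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noarchive(html: str) -> bool:
    """Return True if any robots meta tag in the HTML lists 'noarchive'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noarchive" in finder.directives


# Hypothetical page source, standing in for what the theme actually outputs.
sample = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE">'
    "<title>Forum</title></head><body></body></html>"
)
print(has_noarchive(sample))
```

Running the same check against the saved source of a board page, a topic page, and the forum index would show whether the template change applies to all of them.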
Not sure this helps, but you can also create a group for bots (it will apply to any bot SMF recognizes as one, not just Google, I believe) and then give it whatever permissions you want. The option should be in the spider settings somewhere; you need to create a normal group for them first.
No, it doesn't work *quite* like that. Yes, you can create such a group but the group only *denies* permissions, it does not grant any extras, and I don't think it gives any changes to board access.
Ahh, I see. And yes, I just tested, it doesn't affect boards even if you're giving an extra permission. Would be a good addition though.
Why would it? Adding permissions or access like this is a bad idea, since it can earn you penalties, as Experts Exchange discovered some time ago.
I was talking about board access more specifically; actually, I can't think of any use for permissions themselves (as in permissions to do something in the forum). I do have sections I want visible to guests but not to bots, for instance, since in their eyes the content would basically be "spam" and wouldn't be useful to anyone making a search. I know many forums that have a similar section, or simply people who wouldn't like Google indexing some things.
Experts Exchange discovered what? (sorry, I don't even know what that site is :P)
Yeah, there are issues around *that* too. Google is very fussy about things you show to guests but not to things that identify themselves as Google.
And do they really check for that? What issues would they be? I know it already considers these access denied pages a "soft 404 error" regardless. I guess that's one more thing on the pile of useless things to try with a forum for SEs then...
They have been known to, yes. Like I said, Experts Exchange ran afoul of this in their scheme to convince people to pay for access but show the results to Google for free.