Simple Machines Community Forum

SMF Support => SMF 2.0.x Support => Topic started by: JWJ on May 24, 2021, 03:57:45 PM

Title: Robots NoIndex
Post by: JWJ on May 24, 2021, 03:57:45 PM
I can't help but feel I'm being really stupid here, but can someone please help me? My forum posts are being indexed by Google and I don't want them to be. I've been doing lots of searches to find out how to add a robots noindex meta tag to all the forum pages, but all the topics I find are about how to help Google index, not how to stop it. When I look at the source of my pages I can't see a robots meta tag in the HTML, even though everything I've read here today suggests it's there by default. Sorry to be so dense, but what is the best way of getting <meta name="robots" content="noindex"> into all of my pages?
Thank you.
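
For anyone wanting to check a page's source for a robots meta tag without reading it by eye, here is a minimal sketch using Python's standard html.parser. The class name, function name, and sample HTML are made up for illustration and are not part of SMF; it only inspects the HTML string it is given.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                part = part.strip().lower()
                if part:
                    self.directives.append(part)


def robots_directives(html):
    """Return the list of robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    # Hypothetical page source, for illustration only.
    sample = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(robots_directives(sample))
```

An empty list means no robots meta tag was found in that page source.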
Title: Re: Robots NoIndex
Post by: Kindred on May 24, 2021, 04:06:14 PM
Why do you not want Google to catalog your site?
(And if they already have, it's too late; you are in the catalog.)

But what is your reasoning?
Also note that many search bots do not actually follow the instructions of robots tags.

The short of it is, you'll have to edit code.  But if you tell us why, there might be a better option.
Title: Re: Robots NoIndex
Post by: JWJ on May 24, 2021, 06:50:03 PM
My forum is for beginner artists, many of whom are 'shy' about showing their work and appreciate the apparent 'privacy' of not having their work displayed around the 'net. I made the mistake of thinking that Disallow in my robots.txt would keep search engines out, but I have now found that it doesn't stop the posts being indexed.
Title: Re: Robots NoIndex
Post by: shawnb61 on May 24, 2021, 07:53:46 PM
I use the following in robots.txt in the root public folder on the server where my personal & test environments are, and it seems to work.  The content is not on Google, Bing, Yahoo, etc.:
User-agent: *
Disallow: /
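
As a rough illustration of how a crawler that honors robots.txt would read those two lines, here is a minimal sketch using Python's standard urllib.robotparser. The helper name and the forum.example URL are assumptions made up for the example, and this only models a well-behaved crawler's fetch decision, not whether a search engine lists the URL.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical topic URL, for illustration only.
    url = "https://forum.example/index.php?topic=1.0"
    for agent in ("Googlebot", "bingbot"):
        print(agent, is_crawl_allowed(ROBOTS_TXT, agent, url))
```

With these rules, every path is off limits to every user agent that chooses to comply.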


What Kindred says above is accurate: robots.txt is merely a suggestion, so any unscrupulous bots will ignore it.  The major commercial ones will generally try to honor it.  Bing has some syntax quirks, but the above is working for Bing as well for me.

Once the content is indexed, the options are a little more difficult.  You can use the Google Search Console to clear it from Google.  Or, more brute force, you can try moving the forum to a (different) sub-folder, so the old URLs don't really exist anymore.  Then they'll get removed as broken links.

I think this article is helpful, and offers some more options for de-indexing:
https://www.upbuild.io/blog/how-to-deindex-pages-from-google/
Title: Re: Robots NoIndex
Post by: Kindred on May 24, 2021, 09:12:41 PM
Or you can turn off guest browsing.
Title: Re: Robots NoIndex
Post by: JWJ on May 25, 2021, 04:44:52 AM
shawnb61, I hadn't appreciated that the robots.txt directive only blocks crawling, not indexing, so I think the damage is done. That link will prove helpful as I explore ways to get my forum de-indexed.

Kindred, I hadn't considered guest browsing. That seems such a simple solution. I'll give it a try.

Thank you both, I'm very grateful.
Title: Re: Robots NoIndex
Post by: Steve on May 25, 2021, 07:48:34 AM
Marking solved as the original questions have been answered. If you feel you need to continue this topic, by all means mark it unsolved. :)