Robots NoIndex

Started by JWJ, May 24, 2021, 03:57:45 PM

JWJ

I can't help but feel I'm being really stupid here, but can someone please help me. My forum posts are being indexed by Google and I don't want them to be. I've been doing lots of searches to find out how to add a Robots Noindex meta tag to all the forum pages but all the topics I find are about how to help Google index, not how to stop it. When I look at the source of my pages I can't see a robots meta tag in the html even though everything I've read here today suggests it's there by default. Sorry to be so dense but what is the best way of getting <meta name="robots" content="noindex"> into all of my pages?
Thank you.

Kindred

Why do you not want Google to catalog your site?
(And if they already have, it's too late: you are in the catalog.)

But what is your reasoning?
Also note that many search bots do not actually follow the instructions of robots tags.

The short of it is, you'll have to edit code. But if you tell us why, there might be a better option.
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

JWJ

My forum is for beginner artists, many of whom are 'shy' about showing their work and appreciate the apparent 'privacy' of not having their work displayed around the 'net. I made the mistake of thinking that Disallow in my robots.txt would keep search engines out, but I have now found that it doesn't stop the posts being indexed.

shawnb61

I use the following in robots.txt in the root public folder on the server where my personal & test environments are, and it seems to work.  The content is not on Google, Bing, Yahoo, etc.:
User-agent: *
Disallow: /


What Kindred says above is accurate: robots.txt is merely a suggestion, so any unscrupulous bots will ignore it.  The major commercial ones will generally try to honor it.  Bing has some syntax quirks, but the above is working for Bing as well for me.
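
As a rough way to see what a well-behaved crawler would conclude from the two-line robots.txt above, here is a minimal sketch using Python's standard `urllib.robotparser`. The forum URL and the helper name `is_crawl_allowed` are made up for illustration, and the result only describes bots that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical forum URL, used only as an example.
    url = "https://forum.example/index.php?topic=1.0"
    for agent in ("Googlebot", "bingbot"):
        print(agent, "may crawl:", is_crawl_allowed(ROBOTS_TXT, agent, url))
```

This only answers "may this bot fetch the page?". It says nothing about whether a URL that is already known to a search engine stays in its index.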

Once indexed, options are a little difficult.  You can use the Google Search Console to clear it from Google.  Or, more brute force, you can try moving it to a (different) sub-folder, so the old URLs don't really exist anymore.  Then they'll get removed as broken links.

I think this article is helpful, and offers some more options for de-indexing:
https://www.upbuild.io/blog/how-to-deindex-pages-from-google/
A question worth asking is born in experience & driven by necessity. - Fripp

Kindred

Or you can turn off guest browsing. Search engine crawlers visit as guests, so they would no longer be able to read the posts.

JWJ

shawnb61, I hadn't appreciated that the robots.txt directive only blocks crawling, not indexing, so I think the damage is done. That link will prove helpful as I explore ways to get my forum de-indexed.
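
For anyone who wants to confirm whether a page's source actually carries a robots noindex meta tag, the following is a small sketch using Python's standard `html.parser`. The class and function names are invented for this example, and it only inspects HTML you pass in as a string. Keep in mind that a crawler has to be able to fetch a page in order to see such a tag, so a blanket Disallow in robots.txt can keep the tag from ever being read.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.append(directive)


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex"></head></html>'
    print("noindex present:", has_noindex(sample))
```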

Kindred, I hadn't considered guest browsing. That seems such a simple solution. I'll give it a try.

Thank you both, I'm very grateful.

Steve

Marking solved as the original questions have been answered. If you feel you need to continue this topic, by all means mark it unsolved. :)
My pet rock is not feeling well. I think it's stoned.