Index.template and "robots noindex" issue

Started by davo88, January 11, 2024, 12:01:59 PM

davo88

SMF v2.0.19
Mods installed
Ad Managment 3.2
Simple Audio Video Embedder 4.5.3a
Stop Spammer 2.3.9
Join date and Location in Posts 1.1
Remove Child Boards 2.2.1
Theme - Default - Curve

This line is appearing on about 3000 pages of my site, right below the page title, and is blocking Google and other search bots:
<meta name="robots" content="noindex" />
I notice that index.template.php has this code, but I don't understand exactly what triggers that line to be inserted, or whether it might be the cause of the problem.

echo '
    <meta http-equiv="Content-Type" content="text/html; charset=', $context['character_set'], '" />
    <meta name="description" content="', $context['page_title_html_safe'], '" />', !empty($context['meta_keywords']) ? '
    <meta name="keywords" content="' . $context['meta_keywords'] . '" />' : '', '
    <title>', $context['page_title_html_safe'], '</title>';

// Please don't index these Mr Robot.
if (!empty($context['robot_no_index']))
    echo '
    <meta name="robots" content="noindex" />';

// Present a canonical url for search engines to prevent duplicate content in their indices.
if (!empty($context['canonical_url']))
    echo '
    <link rel="canonical" href="', $context['canonical_url'], '" />';

Can anyone explain the above code, and whether it might be what inserts <meta name="robots" content="noindex" /> into the pages?

davo88

Below is an example of the lines that appear above the "noindex" line on the affected pages, in case that helps provide a clue.

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<meta name="description" content="Crankcase Breathing on Post War Models" />
<meta name="keywords" content="douglas,motorcycle,forum" />
<title>Crankcase Breathing on Post War Models</title>
<meta name="robots" content="noindex" />

Kindred

Are you sure that you don't have other mods?
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Arantor

Oh, there are plenty of reasons SMF might legitimately (and correctly) put that tag there.

Can you give some examples of URLs that are producing it? It may be that they *should* have noindex on them.

shawnb61

I believe SMF puts noindex whenever the URL does not match the canonical form for that page.

(Something to keep in mind with some of the sitemap generators out there: most of their entries are ignored because they don't follow that rule...)
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp
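To make the sitemap point concrete, here is a minimal PHP sketch of a generator that only emits page-aligned topic URLs (topic=ID.0, topic=ID.50, ...), which, per the explanations in this thread, are the forms that are not marked noindex. The function name, the $topics array shape (topic ID => number of posts), the example.com address and the 50-posts-per-page value are all assumptions for illustration; this is not code from SMF or from any sitemap mod.

```php
<?php
// Hypothetical sitemap builder: one <url> entry per topic page, using only
// page-aligned start offsets so every listed URL is in canonical form.
function buildTopicSitemap(string $scriptUrl, array $topics, int $perPage): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    // $topics maps topic ID => number of posts in that topic.
    foreach ($topics as $topicId => $numPosts) {
        // Start offsets 0, perPage, 2 * perPage, ... (at least one page per topic).
        for ($start = 0; $start < max(1, $numPosts); $start += $perPage) {
            $loc = $scriptUrl . '?topic=' . $topicId . '.' . $start;
            $xml .= '  <url><loc>' . htmlspecialchars($loc, ENT_XML1 | ENT_QUOTES) . '</loc></url>' . "\n";
        }
    }

    return $xml . '</urlset>' . "\n";
}

// A 120-post topic yields three entries: 4675.0, 4675.50 and 4675.100.
echo buildTopicSitemap('https://www.example.com/index.php', [4675 => 120], 50);
```

A generator that instead listed offsets such as 4675.5 would be feeding the crawler exactly the URLs that carry noindex.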

davo88

Quote from: Kindred on January 11, 2024, 12:54:43 PM
Are you sure that you don't have other mods?
I did have the httpBL and honeypot mod installed but have now removed it and the problem persists.

Quote from: Arantor on January 11, 2024, 01:49:46 PM
Can you give some examples of URLs that are producing it? It may be that they *should* have noindex on them.
The URLs I have checked have all been publicly viewable and not in protected forums. Some examples:
https://www.douglasmotorcycles.net/index.php?topic=4675.5
https://www.douglasmotorcycles.net/index.php?topic=6140.10
https://www.douglasmotorcycles.net/index.php?topic=7014.60

Quote from: shawnb61 on January 11, 2024, 02:08:32 PM
I believe SMF puts noindex whenever the URL does not match the canonical form for that page.
Could you elaborate on this please? This sounds like a possibility.

Arantor

And SMF is *correctly* no-indexing all of them.

So, from a guest's perspective, your pagination setting is 50 posts per page. None of these aligns to the posts-per-page setting so SMF will figure out which page it should be on and deliver the correct content but deliberately flag these as noindex so they won't duplicate.

E.g. your first link is 4675.5 - this indicates 5 posts into the topic. 4675.0 (the first page) is correctly not showing the noindex because that *should* be indexed.
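The rule described above boils down to an alignment check on the number after the dot. The PHP below is a simplified sketch of that rule, not SMF's actual source: the function and key names are hypothetical, 50 posts per page matches the guest setting mentioned above, and the real code has other noindex conditions that are not modelled here.

```php
<?php
// Simplified sketch with hypothetical names: given the start offset from a
// topic URL (topic=ID.start), report which page-aligned offset the content
// really comes from and whether the URL should carry noindex.
function topicPageInfo(int $start, int $perPage): array
{
    // Round the requested offset down to the first post of its page.
    $canonicalStart = intdiv($start, $perPage) * $perPage;

    return [
        'canonical_start' => $canonicalStart,
        // Offsets that are not multiples of the page size duplicate the
        // page that begins at $canonicalStart.
        'noindex' => $start % $perPage !== 0,
    ];
}

// The start offsets from the three example URLs, with 50 posts per page.
foreach ([5, 10, 60] as $start) {
    $info = topicPageInfo($start, 50);
    printf("start %2d -> canonical start %d, noindex: %s\n",
        $start, $info['canonical_start'], $info['noindex'] ? 'yes' : 'no');
}
```

All three come out as noindex; 4675.60, for instance, duplicates the page that starts at 4675.50, while offsets such as 0 or 50 would not be flagged.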

davo88

Quote from: Arantor on January 11, 2024, 03:06:48 PM
None of these aligns to the posts-per-page setting so SMF will figure out which page it should be on and deliver the correct content but deliberately flag these as noindex so they won't duplicate.
I don't understand what "None of these aligns to the posts-per-page setting..." means. Could you explain that a little more, please?

Also, how do I correct the problem? Should the posts per page setting be reduced?

Kindred

There is no problem. It is all working as intended, and as Google wants.

You have 50 posts per page defined in the admin settings.
That's good.

But the individual links to any post within 1-50, 51-100, 101-150, etc. are DUPLICATE links to the same page content, so they get noindex tags. (For example, post 5 and post 25 are both part of the same 1-50 page and should not be indexed individually.)
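Posts are counted from 1 here (1-50, 51-100, ...), whereas the number after the dot in a topic URL is a 0-based offset. A small hypothetical PHP helper, assuming 50 posts per page, shows how individual post numbers collapse onto the same page:

```php
<?php
// Hypothetical helper: map a 1-based post number (post 1, 2, 3, ... as shown
// on the page) to its 1-based page number and that page's 0-based start offset.
function pageForPost(int $postNumber, int $perPage): array
{
    $page = intdiv($postNumber - 1, $perPage) + 1;
    $start = ($page - 1) * $perPage; // the number after the dot in topic=ID.start

    return ['page' => $page, 'start' => $start];
}

foreach ([5, 25, 50, 51, 101] as $post) {
    $p = pageForPost($post, 50);
    printf("post %3d -> page %d (start=%d)\n", $post, $p['page'], $p['start']);
}
```

Posts 5, 25 and 50 all land on page 1 (start=0); post 51 begins page 2 (start=50) and post 101 begins page 3 (start=100).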

davo88

Thanks, Kindred. The following was written while you were posting...

Quote from: Arantor on January 11, 2024, 03:06:48 PM
E.g. your first link is 4675.5 - this indicates 5 posts into the topic. 4675.0 (the first page) is correctly not showing the noindex because that *should* be indexed.
I think I get it now. The code is written so that 4675.0 is the only page that gets indexed and it considers 4675.5 as a duplicate.

Is this a good idea? It seems to be gradually eroding the number of successfully indexed pages.

Kindred

It is required by Google's ToS! If Google sees duplicate content under different URLs, they will rank you lower and lower until they blacklist your site.

Arantor

Quote from: davo88 on January 11, 2024, 03:50:52 PM
The code is written so that 4675.0 is the only page that gets indexed and it considers 4675.5 as a duplicate.

Exactly. 4675.0 through 4675.49 are all numbers that *could* come up (plus some other variations), but they're *all* duplicates of 4675.0 in terms of content. So all of them except 4675.0 itself need to be marked noindex.

This is normal and has been for well over a decade at this point (probably longer; I'm fairly sure SMF 1.1 did this in 2006).

And yes, that means the share of indexed pages will go down: as the forum grows you'll naturally get more topics with more faux-duplicate pages, and all of those need to be excluded from the total. So while the total number of pages goes up, the percentage being indexed goes down (as expected).

davo88

Many thanks to all you guys for the fast and helpful replies - very much appreciated.

I've been sifting through URLs in Google Search Console all day and finding out about the various issues. The Google crawler has only indexed about 1500 out of approximately 7000 topics. It has also used links entered by members in their posts (pointing to other topics) as the canonical link, and then not indexed the real canonical link. I think I need to speed up the conversion to 2.1.4 and get a good sitemap mod installed. Hopefully that will help the crawler find all the topics and the true canonical links.
 

Arantor

It should be able to find them normally from the list of topics though...

davo88

Yes, you would think so. But looking through all the link results in Google Search Console yesterday, I get the impression the crawler goes off on all sorts of side roads as it works its way through a forum site.
Unlike a human reader, who would scan down the list of topics within a board, the crawlers seem to zigzag all over the place. And then they seem to reach dead ends and stop crawling.
As soon as I resubmitted a link that the crawler had deemed to be non-canonical, it checked the link and almost immediately agreed it was in fact canonical. I could sit there for weeks re-submitting links and correcting things, but how do I know the crawler isn't going to get lost again and undo all that work? For medium to large forums, a sitemap looks like a good aid to keep crawlers on the right track and finish the job. It looks to me like they need a bit of help on a forum site because there are so many links on every page.
I have put one on the 2.0 site using an old mod. I don't know if the format is still acceptable but we'll see what happens.

Arantor

The thing is, you don't. I gave up dealing with Google Search Console because nearly every issue it reported to me was in fact a false positive: I didn't have to do anything before it figured it out by itself.

I don't actually think a sitemap will help significantly because all you're doing is just giving it a list of URLs to crawl which it will figure out for itself, at least based on my observations.

The real solution is to actually perform redirects but that has other, user-facing consequences that are awkward to solve effectively.
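For comparison, the redirect approach could look roughly like the hypothetical PHP sketch below; it is not something SMF does, and the function name is made up for illustration. Instead of serving a misaligned URL with noindex, it answers with a permanent (301) redirect to the aligned page. It also hints at the user-facing cost: a link meant to show a particular post would now land at the top of its page unless a message anchor were appended, which would need a database lookup this sketch leaves out.

```php
<?php
// Hypothetical: return the URL to 301-redirect to when a topic is requested
// with a start offset that is not page-aligned, or null if it is already canonical.
function redirectTarget(string $scriptUrl, int $topic, int $start, int $perPage): ?string
{
    if ($start % $perPage === 0) {
        return null; // already canonical: serve the page normally
    }

    // Round down to the first offset of the page that contains $start.
    $alignedStart = intdiv($start, $perPage) * $perPage;
    return $scriptUrl . '?topic=' . $topic . '.' . $alignedStart;
}

$target = redirectTarget('https://www.example.com/index.php', 4675, 60, 50);
if ($target !== null) {
    // In a web request this would send the redirect instead, e.g.:
    //   header('Location: ' . $target, true, 301);
    //   exit;
    echo 'Would redirect to: ', $target, "\n";
}
```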

davo88

This is the sitemap mod I used: SEO sitemap and XML sitemap v2.2.1, since my site is currently running SMF 2.0.19.
Google wasn't keen on the HTML version of the sitemap (the bottom one in my screenshot). It doesn't look like a fatal error; it just says it would prefer XML.

So I submitted an XML one using the same mod and within minutes it successfully found 6584 links. This looks healthier than the 1440 it found on its own. So we'll see if it can manage to stay on track and index those links properly.
