

Any way to make pages of non-existent boards return 404?

Started by Arskrigitsioniets, December 24, 2023, 03:47:25 AM


Arskrigitsioniets

Yandex Webmaster sent me a diagnostic alert saying that many pages on my forum don't return a 404 response; for example, pages for non-existent boards return 200 OK. They say this can negatively affect page indexing and search results.

Is there any way to solve this? I found this mod https://custom.simplemachines.org/index.php?mod=3969, but it looks outdated.



Arantor

Weird. SMF 2.1 out of the box already sends a 40x code for a missing topic or a missing board, though it sends a 403 rather than a 404 to avoid information leakage.
Holder of controversial views, all of which my own.


Arskrigitsioniets

Quote from: Arantor on December 24, 2023, 11:32:19 AM
Weird, SMF 2.1 out of the box already sends a 40x code for missing topic and missing board. Though it sends a 403 not a 404 to avoid information leakage.
I don't know, but my SMF 2.1.4 doesn't do that for pages like https://example.com/index.php?topic=non-exising-topic-numbers. Maybe the Russian localization is the source of the problem, since as I understand it these page titles come from there.

Arantor

I tested it before my original reply: feeding in a non-existent board or topic number produces a 403 error (whatever the page might look like, that is literally the status code the server sends back, which is what spiders should be looking at).

I was there when that change was made in 2.1 (because 2.0 didn't do that). I can't vouch for whatever mods you might have, though.


Arskrigitsioniets

UPD: I checked, and the problem is not the Russian locale. Even this very forum doesn't return 404/403; check this page, for example: https://www.simplemachines.org/community/index.php?board=24004.0 I don't see 403/404 in the source code. Am I missing something?

Arantor

Yes, you're missing something: it has nothing to do with the *content* of the page, but with the information sent alongside it in the headers.


I went to the link you gave, opened the Network tab in Firefox's developer tools, and sure enough, the server returns the page with a 403 status code.


Arskrigitsioniets

It seems that Yandex checks for that 404 the same way various online validators do. I tried some of them and they all show 200 OK.

Examples:
https://http.app/test/ravXewbq04use9Norzad
https://www.webfx.com/tools/http-status-tool/

Arskrigitsioniets

I also tried the Firefox Network tab. I don't know why, but I see 200 there.

UPD: it seems that if you are logged in you get a 403, and if you are not, you get a 200. Since search bots are not logged in, they see 200.
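A quick way to see what a logged-out spider sees is to request the page with no session cookie and read only the status line of the response headers, ignoring the HTML entirely. Below is a minimal sketch, assuming PHP 8 run from the command line with `allow_url_fopen` enabled; `get_headers()` sends no forum cookies, so the request is made as a guest. The helper names `status_code_from_line()` and `guest_status_code()` are hypothetical, not part of SMF.

```php
<?php
// Extract the numeric status code from a status line such as
// "HTTP/1.1 403 Forbidden" or "HTTP/2 200".
// Returns null if the line does not look like an HTTP status line.
function status_code_from_line(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})\b#', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Request $url without any cookies (i.e. as a guest) and return the status
// code of the first response. Redirects are not followed, so a redirect to
// the login page is reported as 302 rather than as the login page's status.
function guest_status_code(string $url): ?int
{
    $context = stream_context_create([
        'http' => [
            'method' => 'GET',
            'follow_location' => 0,
            'ignore_errors' => true,
        ],
    ]);
    $headers = get_headers($url, false, $context);
    if ($headers === false || !isset($headers[0])) {
        return null;
    }
    return status_code_from_line($headers[0]);
}

// Usage (needs network access):
// var_dump(guest_status_code('https://www.simplemachines.org/community/index.php?board=24004.0'));
```

Comparing this result with what the browser's Network tab shows while logged in reproduces the 200-versus-403 difference described above.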

Arantor

Argh, yes, I forgot about that behaviour: the assumption is that if you're not logged in, you might be able to log in and change that.

That said, it *does* still produce the <meta name="robots" content="noindex"> tag as a hint to search engines not to index that page.

In that case, you can tweak Security.php so that all "pages that require you to log in" shown in this fashion return a 403 (technically it should be a 401, but I can't test right now whether that would have other side effects). Find:

else
{
	loadTemplate('Login');
	$context['sub_template'] = 'kick_guest';
	$context['robot_no_index'] = true;
}

and replace it with:

else
{
	// Send a 403 so logged-out visitors (and spiders) no longer get 200 OK.
	header('HTTP/1.1 403 Forbidden');
	loadTemplate('Login');
	$context['sub_template'] = 'kick_guest';
	$context['robot_no_index'] = true;
}
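To make the behaviour discussed so far concrete, here is a simplified, hypothetical model (not SMF's actual code) of the status code a board request receives. It assumes that "board does not exist" and "board exists but you may not see it" take the same path, which is the information-leakage point made earlier, and that a guest is shown the login prompt instead of the error page. The `$send403ToGuests` flag stands in for the Security.php tweak; the function name is invented for illustration.

```php
<?php
// Hypothetical model, not SMF source: which status code does a board request get?
function modeled_board_status(bool $boardVisibleToUser, bool $isGuest, bool $send403ToGuests): int
{
    if ($boardVisibleToUser) {
        return 200; // normal board page
    }
    // Missing and forbidden boards are treated identically, so the status
    // code never reveals whether a hidden board exists.
    if ($isGuest) {
        // Guests get the "kick_guest" login prompt (marked robots noindex).
        // Unpatched, that prompt is sent as 200 OK; with the tweak, as 403.
        return $send403ToGuests ? 403 : 200;
    }
    return 403; // logged-in users get the 403 error page
}
```

The unpatched guest branch is exactly the case Yandex sees: a 200 OK for a board that does not exist, softened only by the noindex tag.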


Arskrigitsioniets

Quote
That said, it *does* still produce the <meta name="robots" content="noindex"> tag as a guide to search engines to not index that page.
That changes everything: it means Yandex found some pages (a lot of them, if it sent a notification) that don't contain "noindex", don't exist, and return 200 OK. Unfortunately, the alert doesn't show examples, so I really don't know which pages are to blame, but it's definitely not the ones with "noindex". I'll investigate, and if I find something I'll post here again.
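One way to narrow this down is to check candidate URLs for exactly that combination: a 200 OK response whose HTML carries no robots `noindex` directive. Below is a minimal sketch of the check itself, assuming PHP 8 with the bundled DOM extension, and assuming you already have each page's status code and HTML (for example by fetching URLs taken from your server's access logs; collecting them is left to you). The function names are hypothetical.

```php
<?php
// True if the HTML contains <meta name="robots" content="...noindex...">.
function has_robots_noindex(string $html): bool
{
    if (trim($html) === '') {
        return false; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world forum markup is rarely valid; silence libxml warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower(trim($meta->getAttribute('name')));
        $content = strtolower($meta->getAttribute('content'));
        if ($name === 'robots' && str_contains($content, 'noindex')) {
            return true;
        }
    }
    return false;
}

// A page is suspicious when it answers 200 OK but does not tell
// search engines to skip it.
function is_suspicious_page(int $status, string $html): bool
{
    return $status === 200 && !has_robots_noindex($html);
}
```

Pages that return 403, or that return 200 with `noindex` (such as the guest login prompt), are not flagged; anything left over is a candidate for what Yandex reported.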

Kindred

Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."
