Yandex Webmaster sent me a diagnostic alert saying that many pages on my forum don't return a 404 response; for example, pages for non-existent boards return 200 OK. They say this can negatively affect page indexing and search results.
Is there any way to solve this? I found this mod https://custom.simplemachines.org/index.php?mod=3969, but it looks outdated.
https://custom.simplemachines.org/index.php?mod=2659
Quote from: Bugo on December 24, 2023, 04:22:43 AM
https://custom.simplemachines.org/index.php?mod=2659
Cool, thanks!
Weird, SMF 2.1 out of the box already sends a 40x code for missing topic and missing board. Though it sends a 403 not a 404 to avoid information leakage.
Quote from: Arantor on December 24, 2023, 11:32:19 AM
Weird, SMF 2.1 out of the box already sends a 40x code for missing topic and missing board. Though it sends a 403 not a 404 to avoid information leakage.
I don't know, but my SMF 2.1.4 doesn't do it for pages like https://example.com/index.php?topic=non-exising-topic-numbers. Maybe the Russian localization is the source of the problem since, as I understand it, these page titles come from there.
I tested it before replying originally: feeding a non-existing board number or topic number produces a 403 error (despite what the page might look like, that's literally the error code being sent back by the server, which is what the spiders should be looking at).
I was there when that change was made in 2.1 (because 2.0 didn't do that). Can't vouch for whatever mods you might have though.
UPD: I checked, and the problem is not in the Russian locale. Even this very forum doesn't return 404/403; check this page for example: https://www.simplemachines.org/community/index.php?board=24004.0 I don't see 403/404 in the source code. Maybe I'm missing something?
Yes, you're missing something - it's nothing to do with the *content* of the page, but the information sent alongside it in the headers.
I went to the link you gave, pulled out the network traffic tab in Firefox and sure enough the page is returned by the server with a 403 status code.
It seems that Yandex checks for that 404 in the content as well, as do various validators. I tried some of them and they all show 200 OK.
Examples:
https://http.app/test/ravXewbq04use9Norzad
https://www.webfx.com/tools/http-status-tool/
Also, I tried the Firefox network tab. I don't know why, but I see 200 there.
UPD: it seems that if you are logged in, you get 403; if you are not, you get 200. As search bots are not logged in, they see 200.
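For anyone who wants to reproduce the guest result without a browser, here is a minimal sketch in Python. It sends a request with no cookies, so the forum should treat it as a guest, much like a crawler, and it reads the status code from the response headers rather than from the page content. The function name `guest_status` and the User-Agent string are made up for illustration and are not part of SMF.

```python
import urllib.error
import urllib.request


def guest_status(url, timeout=10.0):
    """Return the HTTP status code that a cookie-less (guest) client receives."""
    # No cookies are attached, so the request carries no login session.
    # The User-Agent value is an arbitrary, hypothetical label.
    request = urllib.request.Request(url, headers={"User-Agent": "status-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Successful responses (after any redirects) expose the code here.
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return error.code
```

Calling `guest_status(...)` with the URL of a board or topic that does not exist on your forum should show whatever a logged-out visitor gets, which by the observation above is expected to differ from what you see while logged in.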
Argh, yes, I forgot about that behaviour - the assumption that if you're not logged in, you might be able to log in to change it.
That said, it *does* still produce the <meta name="robots" content="noindex"> tag as a guide to search engines to not index that page.
In which case you can tweak Security.php to flag all 'pages that require you to log in' in this fashion as returning a 403 (technically it should be a 401, but I can't test right now that that won't have other side effects).
Find:
	else
	{
		loadTemplate('Login');
		$context['sub_template'] = 'kick_guest';
		$context['robot_no_index'] = true;
	}
Replace with:
	else
	{
		// Send an explicit 403 so guests (and search bots) don't get a 200.
		header('HTTP/1.1 403 Forbidden');
		loadTemplate('Login');
		$context['sub_template'] = 'kick_guest';
		$context['robot_no_index'] = true;
	}
Quote
That said, it *does* still produce the <meta name="robots" content="noindex"> tag as a guide to search engines to not index that page.
That changes everything. It means that Yandex found some pages (a lot of them, if it sent a notification) that don't contain "noindex", don't exist, and return 200 OK. Unfortunately, the alert doesn't show examples, so I really don't know which pages are to blame, but definitely not those with "noindex". I'll try to investigate this, and if I find something I'll write here again.
Yandex tends to ignore/violate standards anyway