Simple Machines Community Forum

SMF Support => SMF 2.1.x Support => Topic started by: Arskrigitsioniets on December 24, 2023, 03:47:25 AM

Title: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 24, 2023, 03:47:25 AM
Yandex Webmaster sent me a diagnostic alert saying that many pages on my forum don't return a 404 response; for example, pages of non-existent boards return 200 OK. They say this can negatively affect page indexing and search results.

Is there any way to solve this? I found this mod https://custom.simplemachines.org/index.php?mod=3969, but it looks outdated.
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Bugo on December 24, 2023, 04:22:43 AM
https://custom.simplemachines.org/index.php?mod=2659

Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 24, 2023, 09:59:11 AM
Quote from: Bugo on December 24, 2023, 04:22:43 AM
https://custom.simplemachines.org/index.php?mod=2659


Cool, thanks!
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arantor on December 24, 2023, 11:32:19 AM
Weird, SMF 2.1 out of the box already sends a 40x code for a missing topic or a missing board, though it sends a 403 rather than a 404 to avoid information leakage.
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 24, 2023, 01:33:15 PM
Quote from: Arantor on December 24, 2023, 11:32:19 AM
Weird, SMF 2.1 out of the box already sends a 40x code for missing topic and missing board. Though it sends a 403 not a 404 to avoid information leakage.
I don't know, but my SMF 2.1.4 doesn't do this for pages like https://example.com/index.php?topic=non-exising-topic-numbers. Maybe the Russian localization is the source of the problem, since, as I understand it, these titles come from there.
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arantor on December 24, 2023, 01:44:01 PM
I tested it before my original reply: feeding in a non-existent board or topic number produces a 403 error. Whatever the page itself might look like, that is the status code the server actually sends back, and that is what spiders should be looking at.

I was there when that change was made in 2.1 (2.0 didn't do this). I can't vouch for whatever mods you might have installed, though.
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 24, 2023, 01:45:38 PM
Update: I checked, and the problem is not the Russian locale. Even this forum doesn't return a 404/403; see this page, for example: https://www.simplemachines.org/community/index.php?board=24004.0 I don't see 403/404 anywhere in the page source. Am I missing something?
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arantor on December 24, 2023, 01:48:24 PM
Yes, you're missing something: it has nothing to do with the *content* of the page, but with the information sent alongside it in the HTTP headers.

[Screenshot attached: 2023-12-24 18_46_27-Window.png]

I went to the link you gave, opened the Network tab in Firefox's developer tools, and sure enough the server returns the page with a 403 status code.
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 24, 2023, 01:57:54 PM
It seems that Yandex checks for that 404 the same way various validators do. I tried several of them, and they all show 200 OK.

Examples:
https://http.app/test/ravXewbq04use9Norzad
https://www.webfx.com/tools/http-status-tool/
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 24, 2023, 01:59:47 PM
I also tried the Firefox Network tab. I don't know why, but I see 200 there.

Update: it seems that if you are logged in, you get a 403, and if you are not, you get a 200. Since search bots are not logged in, they see 200.
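Since the status code depends on whether the request carries a logged-in session, one way to see what a crawler sees is to request the page with no cookies at all and read the status line from the response headers rather than from the HTML. The following is a minimal PHP sketch of that check, not part of SMF; `status_code_from_status_line()` and `guest_status_code()` are hypothetical helper names, and it assumes `allow_url_fopen` is enabled so that PHP's built-in `get_headers()` can fetch the URL.

```php
<?php
// Minimal sketch (not part of SMF): report the HTTP status code that a
// cookie-less client, such as a search engine crawler, receives for a URL.
// Function names are hypothetical.

/**
 * Extract the numeric status code from a status line such as
 * "HTTP/1.1 403 Forbidden". Returns null if the line is not a status line.
 */
function status_code_from_status_line(string $line): ?int
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

/**
 * Fetch the response headers for $url without sending any cookies and
 * return the status code of the final response.
 */
function guest_status_code(string $url): ?int
{
    $headers = get_headers($url);
    if ($headers === false) {
        return null; // network error or invalid URL
    }
    $code = null;
    // get_headers() follows redirects and returns one status line per hop,
    // so keep the last one found.
    foreach ($headers as $line) {
        $parsed = status_code_from_status_line($line);
        if ($parsed !== null) {
            $code = $parsed;
        }
    }
    return $code;
}
```

For example, `guest_status_code('https://www.simplemachines.org/community/index.php?board=24004.0')` returns the code an anonymous visitor gets for the non-existent board mentioned in this thread, which can differ from what a logged-in browser shows.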
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arantor on December 24, 2023, 02:56:51 PM
Argh, yes, I forgot about that behaviour: the assumption is that if you're not logged in, you might be able to log in and change that.

That said, it *does* still produce the <meta name="robots" content="noindex"> tag as a guide to search engines to not index that page.

In that case, you can tweak Security.php so that all 'pages that require you to log in' in this fashion return a 403 (technically it should be a 401, but I can't test right now whether that would have other side effects):

Find:

	else
	{
		loadTemplate('Login');
		$context['sub_template'] = 'kick_guest';
		$context['robot_no_index'] = true;
	}

Replace with:

	else
	{
		// Send a real 403 status so guests (and crawlers) don't receive 200 OK.
		header('HTTP/1.1 403 Forbidden');
		loadTemplate('Login');
		$context['sub_template'] = 'kick_guest';
		$context['robot_no_index'] = true;
	}
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Arskrigitsioniets on December 25, 2023, 12:57:32 PM
Quote:
That said, it *does* still produce the <meta name="robots" content="noindex"> tag as a guide to search engines to not index that page.
That changes everything: it means Yandex found some pages (a lot of them, if it sent a notification) that don't contain "noindex", don't exist, and return 200 OK. Unfortunately, the alert doesn't show examples, so I really don't know which pages are to blame, but it's definitely not the ones with "noindex". I'll try to investigate this, and if I find something I'll post here again.
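To narrow down which URLs match that description, fetched pages could be checked for the two signals discussed in this thread: the status code and the robots `noindex` meta tag. The PHP sketch below is a heuristic under stated assumptions: it only inspects a status code and HTML body you have already fetched, the function names `has_robots_noindex()` and `lacks_not_found_signals()` are hypothetical, and the regex-based tag matching is a simplification rather than a full HTML parser.

```php
<?php
// Minimal sketch (heuristic, not part of SMF): does a fetched page lack both
// "not found" signals discussed in the thread, i.e. a non-200 status code and
// a robots noindex meta tag? Function names are hypothetical.

/**
 * True if $html contains a <meta name="robots"> tag whose content attribute
 * includes "noindex", e.g. <meta name="robots" content="noindex">.
 * Regex matching is a simplification; it is not a full HTML parser.
 */
function has_robots_noindex(string $html): bool
{
    // Collect every <meta ...> tag, case-insensitively.
    if (!preg_match_all('#<meta\b[^>]*>#i', $html, $tags)) {
        return false; // no meta tags (or a regex error)
    }
    foreach ($tags[0] as $tag) {
        // name="robots" (quoted or unquoted), not e.g. data-name or name="robots-x".
        $isRobots = preg_match('#(?<![\w-])name\s*=\s*["\']?robots["\'\s/>]#i', $tag) === 1;
        // content="..." containing the whole word noindex, e.g. "noindex,nofollow".
        $hasNoindex = preg_match('#(?<![\w-])content\s*=\s*["\'][^"\']*\bnoindex\b#i', $tag) === 1;
        if ($isRobots && $hasNoindex) {
            return true;
        }
    }
    return false;
}

/**
 * True if the response answered 200 OK and carries no noindex hint, the
 * combination described for the pages Yandex flagged. It cannot tell whether
 * the page is supposed to exist; that still needs a human check.
 */
function lacks_not_found_signals(int $status, string $html): bool
{
    return $status === 200 && !has_robots_noindex($html);
}
```

A page flagged by this check is only a candidate: normal topic and board pages also return 200 without `noindex`, so the result is most useful when run over URLs that are already suspected of pointing at non-existent boards or topics.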
Title: Re: Any way to make pages of non-existing boards 404?
Post by: Kindred on December 25, 2023, 01:59:54 PM
Yandex tends to ignore or violate standards anyway.