Simple Machines Community Forum

SMF Support => SMF 2.0.x Support => Topic started by: SirLouen on June 23, 2018, 07:41:32 PM

Title: Avoid recreating new PHPSESSID
Post by: SirLouen on June 23, 2018, 07:41:32 PM
How can I avoid recreating new PHPSESSID every few minutes?
I'm testing a crawl with Screaming Frog, and after 5 minutes the system issues a new PHPSESSID, so the crawl never ends.

(https://i.imgur.com/pf3Zcjy.png)

I thought I could solve this with a longer gc_maxlifetime

(https://i.imgur.com/uwVJe3q.png)

But this is not working; the system still creates a new PHPSESSID after a short while.

Any other ideas?
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 23, 2018, 08:11:38 PM
Can't Screaming Frog handle actual cookies?
Title: Re: Avoid recreating new PHPSESSID
Post by: GigaWatt on June 23, 2018, 10:57:30 PM
Had the same problem with archive.org. Never solved it.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 04:17:57 AM
Quote from: Arantor on June 23, 2018, 08:11:38 PM
Can't Screaming Frog handle actual cookies?

It can, but most crawlers do not handle cookies. I'm assuming that, apart from Google's crawler, which may (or may not) have figured out how to avoid infinite loops caused by the constantly changing PHPSESSID, less capable crawlers such as YandexBot and BingBot, or the crawlers of tools like Ahrefs, may have difficulty crawling the site if a different PHPSESSID changes the URLs they have previously stored.

@GigaWatt gives a great example of exactly what I'm referring to. The Archive.org crawler may find this continuous change a hindrance to crawling properly without loops.
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 04:23:07 AM
There is a solution but it'll break the tracking of online guests for these crawlers.

The session isn't being regenerated, it's multiple instances getting new sessions, and none of them smart enough to use cookies or respect canonical URLs.

If you're ok with guest stats being screwed up, I can look up the code changes required to stop it spitting out PHPSESSID.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 04:25:48 AM
Quote from: Arantor on June 24, 2018, 04:23:07 AM
If you're ok with guest stats being screwed up, I can look up the code changes required to stop it spitting out PHPSESSID.

Yes, I'm ok with this.
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 05:07:05 AM
Hmm, which of the pretty URLs mods are you using?
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 05:12:39 AM
SMF Hacks version
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 05:24:09 AM
So, Pretty URLs? I don't use Pretty URLs any more, but it looks like you'd need to edit Sources/PrettyUrls-Filters.php:

Code (find):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($PHPSESSID[0]) ? $PHPSESSID[0] : '') . ';' . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Code (replace):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Note that I haven't tested this.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 05:27:35 AM
Quote from: Arantor on June 24, 2018, 05:24:09 AM
So, Pretty URLs? I don't use Pretty URLs any more, but it looks like you'd need to edit Sources/PrettyUrls-Filters.php:

Code (find):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($PHPSESSID[0]) ? $PHPSESSID[0] : '') . ';' . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Code (replace):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Note that I haven't tested this.

Yes, Pretty URLs by SMF Hacks:
http://custom.simplemachines.org/mods/index.php?mod=636

I'm not 100% sure, but it seems that this modification effectively kills the PHPSESSID system across the whole SMF platform?
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 05:31:14 AM
Well, yes, that's rather the point.

Regular users use cookies, which carry the PHPSESSID with them. Platforms that don't carry cookies have the session ID shoved into the URL so that you can identify how many 'guest' users there are. Invariably those guest users are bots, and knowing approximately how many bots are online is interesting/useful for performance tracking purposes.

This should stop it being reinjected into the URL - the thing you asked about.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 05:32:46 AM
But isn't this essentially the same as setting PHP variables like these?

session.use_trans_sid = 0
session.use_only_cookies = 1
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 05:35:15 AM
Um, yes, that's again kind of the point? Except for the fact that Pretty URLs (and typically SMF) will just rewrite the URL for you regardless of that setting.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 05:42:17 AM
Ok, I was a little bit "scared" of killing PHPSESSID because I read somewhere that it could cause security problems.
If the problem is only related to guest stats, it doesn't matter much, because I track everything through GA, never through the internal stat platform.

I can't think of any other issues that removing the PHPSESSID rewriting could cause.
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 05:45:18 AM
If I thought it would have security implications across SMF, I'd not have pointed out the relevant section of code ;)

It's only used to differentiate non-cookie users, and all data editing actions have a secondary check anyway.

It will inflate the reported 'most online' figures in the board index info center, simply because what was previously identifiable as one 'guest' making multiple requests will no longer be identifiable as such.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 05:46:00 AM
By the way, I'm currently scraping this forum, and I can see that the PHPSESSID (named P=) is not rewritten as aggressively as in the Pretty URLs mod. The Pretty URLs mod rewrites it on every single hop, but this forum by default only introduces the variable on index.php URLs.
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 05:47:53 AM
Of course it isn't. In core, it's only introduced on the first hit without a cookie (because it can't know if there's a cookie or not) and if it sees subsequent requests with a cookie, it won't attempt to rewrite it - because it knows there's a better solution.

Pretty URLs, on the other hand... isn't that clever.
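
The difference described above can be sketched roughly as follows. This is a simplified illustration and not SMF's or Pretty URLs' actual code: the helper name add_session_to_url and its parameters are hypothetical, and for simplicity it appends the ID at the end of the URL, which is not necessarily where SMF itself places it.

```php
<?php
// Hypothetical helper (not part of SMF): append the session ID to a URL
// only when the client sent no cookies, mimicking the behaviour described above.
function add_session_to_url(string $url, string $sessionName, string $sessionId, array $cookies): string
{
    // A client that sent cookies back can carry the session itself.
    if (!empty($cookies)) {
        return $url;
    }

    // Do not add the parameter a second time.
    if (strpos($url, $sessionName . '=') !== false) {
        return $url;
    }

    // SMF-style URLs separate parameters with ';' after the first '?'.
    $separator = strpos($url, '?') === false ? '?' : ';';

    return $url . $separator . $sessionName . '=' . $sessionId;
}

// First hit, no cookie: the ID has to travel in the URL.
echo add_session_to_url('https://example.com/index.php?topic=1.0', 'PHPSESSID', 'abc123', []), "\n";

// Later hit with a cookie: the URL is left alone.
echo add_session_to_url('https://example.com/index.php?topic=1.0', 'PHPSESSID', 'abc123', ['PHPSESSID' => 'abc123']), "\n";
```

A rewriter that skips the cookie check and adds the ID on every link is what gives a cookie-less crawler an endless supply of "new" URLs.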
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 05:55:51 AM
Quote:
Pretty URLs, on the other hand... isn't that clever.

And now I'm starting to believe that this is the underlying problem behind the many posts I've seen about this. The SMF Hacks Pretty URLs mod and PHPSESSID do not appear to be a good combination.

I'm currently testing the SMF Packs P-URLs mod to see how it handles this. It seems they do it the right way.
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 06:03:36 AM
The majority of the issues with Pretty URLs have nothing to do with its adding PHPSESSID to things but mostly stem from how deeply it makes changes, lots of changes, and tries to do so without leaving any IDs around, and how many server setups don't play nicely out of the box with it.

Pretty URLs also hails from a much earlier period where PHPSESSID still had some legitimate uses in SMF 1.x but in 2.0, it's mostly just for stats. SMF Packs's mod on the other hand, is much newer and doesn't have any historical baggage in that department. I haven't seen it lately but the first few versions of SMF Packs had at least as many issues as Pretty URLs, and I've had to fix a number of sites that it broke trying to be 'clever'. The SMF Packs author has a history of doing sloppy work, though he may have gotten better lately, haven't looked.
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 06:17:01 AM
The thing is that the topic of pretty URLs has always been disregarded by this platform. URL exact match (UEM) has always been an SEO topic, and although there is no clear evidence that it correlates with better rankings, it's at least one of the most powerful placebos that every SEO manager needs in order to feel confident. So when deciding on the whole on-site strategy, not having this is clearly either a put-off or something you have to make happen, which means immediately resorting to these faulty solutions :(
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 06:21:01 AM
Yup, it's long been a matter of contention within the team and the industry about the value of nice looking URLs, something that the paid platforms have long since figured out and that the free ones keep trying to argue isn't relevant (mostly because it's effort to fix properly)
Title: Re: Avoid recreating new PHPSESSID
Post by: vbgamer45 on June 24, 2018, 08:35:16 AM
Arantor do you suggest ripping out $PHPSESSID?

I see SMF uses the php constant SID
Title: Re: Avoid recreating new PHPSESSID
Post by: SirLouen on June 24, 2018, 08:35:39 AM
Quote:
mostly because it's effort to fix properly

I assumed that, and it makes sense.

All current forum platforms have found this important and moved forward on it: NodeBB, Vanilla, even bbPress... I think the only two stuck in this position are phpBB and SMF.

Since this forum is maintained by the community, I can't argue much about this, because I recognize that it ultimately takes a lot of programming time. If one day I have some time to start helping out, I think I will definitely start with this  ;D

Meanwhile, I will be dealing with the currently available mods.
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 08:44:19 AM
Quote from: vbgamer45 on June 24, 2018, 08:35:16 AM
Arantor do you suggest ripping out $PHPSESSID?

I see SMF uses the php constant SID


I think SID is only set when cookies don't cover it - while $PHPSESSID is set in all cases, which is why SMF doesn't set PHPSESSID all the time. I'd check on the differences between those two and go from there.
Title: Re: Avoid recreating new PHPSESSID
Post by: GigaWatt on June 24, 2018, 09:22:33 AM
@Arantor: How would I do this on a forum that doesn't have Pretty URLs? Remove PHPSESSID for guests I mean :).
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on June 24, 2018, 09:33:24 AM
@GigaWatt: same caveat I mentioned still applies: it screws up your ability to track how many guests are online. But you'd do a mild change inside QueryString.php, in ob_sessrewrite(). The exact change depends on whether you use index.php/topic,1.0.html URLs or not.
Title: Re: Avoid recreating new PHPSESSID
Post by: GigaWatt on June 24, 2018, 09:37:57 AM
No, I don't use queryless URLs, I use the regular ones, with queries ;).

I'll give it a shot and holler if I need any help ;).
Title: Re: Avoid recreating new PHPSESSID
Post by: Daretary on January 13, 2024, 10:53:47 AM
I just commented this out to stop this horror:
if (empty($_COOKIE) && SID != '' && !isBrowser('possibly_robot'))
// $buffer = preg_replace('/(?<!<link rel="canonical" href=)"' . preg_quote($scripturl, '/') . '(?!\?' . preg_quote(SID, '/') . ')\\??/', '"' . $scripturl . '?' . SID . '&amp;', $buffer);
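
For anyone wondering what that line actually does, here is a rough standalone sketch that runs the same regex on a small sample page. The values of $scripturl and $sid and the sample HTML are assumptions made up for illustration; in SMF the session string comes from PHP's SID constant and the buffer is the full page output.

```php
<?php
// Assumed sample values; in a running forum these come from SMF and PHP.
$scripturl = 'https://example.com/index.php';
$sid = 'PHPSESSID=abc123'; // stands in for the SID constant

// Sample page output: one canonical link and two ordinary links.
$buffer = '<link rel="canonical" href="https://example.com/index.php?topic=1.0" />'
    . '<a href="https://example.com/index.php?topic=1.0">Topic</a>'
    . '<a href="https://example.com/index.php">Home</a>';

// Same pattern as the line above: match a quoted $scripturl that is not
// inside the canonical link and is not already followed by the session string.
$pattern = '/(?<!<link rel="canonical" href=)"' . preg_quote($scripturl, '/')
    . '(?!\?' . preg_quote($sid, '/') . ')\\??/';

// Inject the session string right after the script URL.
$rewritten = preg_replace($pattern, '"' . $scripturl . '?' . $sid . '&amp;', $buffer);

echo $rewritten, "\n";
```

In this sample the canonical link is left untouched, while both ordinary links gain ?PHPSESSID=abc123&amp; directly after index.php.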
Title: Re: Avoid recreating new PHPSESSID
Post by: Kindred on January 13, 2024, 12:25:33 PM
That is really the wrong thing to do
Title: Re: Avoid recreating new PHPSESSID
Post by: Arantor on January 13, 2024, 12:43:30 PM
The only circumstance in which it actually matters is if you expect to have users who don't have cookies enabled, because that's what it was originally implemented for: tracking guests/search engines that couldn't handle cookies.

Since cookies are now effectively mandatory anyway you might as well do away with it. You'll note that my comments about it above were 5 years ago - the user end stopped being a problem a while ago (subject to acknowledging the necessity of cookies for guests and search engines) and every crawler handles cookies these days because it's almost more work not to.
Title: Re: Avoid recreating new PHPSESSID
Post by: Irisado on January 13, 2024, 01:49:38 PM
Quote from: Daretary on January 13, 2024, 10:53:47 AM
I just commented this out to stop this horror

Please do not revive support topics that are five and a half years old.  Topic locked.