Avoid recreating new PHPSESSID

Started by SirLouen, June 23, 2018, 07:41:32 PM

SirLouen

How can I avoid getting a new PHPSESSID every few minutes?
I'm testing a crawl with Screaming Frog, and after 5 minutes the system issues a new PHPSESSID, so the crawl never ends.



I thought I could solve this with a longer gc_maxlifetime



But this is not working: the system still creates a new PHPSESSID after a short while.

Any other ideas?

Arantor

Can't Screaming Frog handle actual cookies?

GigaWatt

Had the same problem with archive.org. Never solved it.

SirLouen

Quote from: Arantor on June 23, 2018, 08:11:38 PM
Can't Screaming Frog handle actual cookies?

It can, but most crawlers do not handle cookies. I'm assuming that, apart from Google's crawler, which may (or may not) have figured out how to avoid infinite loops caused by the constantly changing PHPSESSID, less capable crawlers like YandexBot, BingBot, or tool crawlers like Ahrefs may have difficulty crawling the site if a different PHPSESSID changes the URLs they have previously stored.

@GigaWatt gives a great example of what I'm referring to. The Archive.org crawler may find this continuous change a hindrance to crawling properly without loops.

Arantor

There is a solution but it'll break the tracking of online guests for these crawlers.

The session isn't being regenerated, it's multiple instances getting new sessions, and none of them smart enough to use cookies or respect canonical URLs.

If you're ok with guest stats being screwed up, I can look up the code changes required to stop it spitting out PHPSESSID.
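To illustrate the canonical-URL point: a client that does not keep cookies could still collapse session-tagged URLs into one address by dropping the PHPSESSID parameter before storing them. The following is only a rough sketch of that idea; `strip_session_id` is a made-up helper name, not something SMF, Pretty URLs, or any crawler is known to ship, and it assumes SMF-style URLs where parameters are separated by `;` or `&`.

```php
<?php
// Hypothetical helper (an assumption for illustration, not SMF code):
// removes the session parameter so session-tagged variants of a URL
// collapse to a single address.
function strip_session_id(string $url, string $name = 'PHPSESSID'): string
{
    // Split off the fragment so it can be re-attached unchanged.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $queryPos = strpos($url, '?');
    if ($queryPos === false) {
        return $url . $fragment;
    }

    $base = substr($url, 0, $queryPos);
    $query = substr($url, $queryPos + 1);

    // Parameters may be separated by ';' or '&'; drop the session one.
    $kept = [];
    foreach (preg_split('/[;&]/', $query, -1, PREG_SPLIT_NO_EMPTY) as $part) {
        if (strpos($part, $name . '=') !== 0) {
            $kept[] = $part;
        }
    }

    // Note: remaining parameters are re-joined with ';' (SMF style).
    return $base . ($kept ? '?' . implode(';', $kept) : '') . $fragment;
}
```

With this, two URLs that differ only in their PHPSESSID value come out identical, which is what stops a crawl from looping over "new" pages.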

SirLouen

Quote from: Arantor on June 24, 2018, 04:23:07 AM
If you're ok with guest stats being screwed up, I can look up the code changes required to stop it spitting out PHPSESSID.

Yes, I'm ok with this.

Arantor

Hmm, which of the pretty URLs mods are you using?

SirLouen


Arantor

So, Pretty URLs? I don't use Pretty URLs any more, but it looks like you'd need to edit Sources/PrettyUrls-Filters.php:

Code (find):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($PHPSESSID[0]) ? $PHPSESSID[0] : '') . ';' . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Code (replace):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Note that I haven't tested this.

SirLouen

Quote from: Arantor on June 24, 2018, 05:24:09 AM
So, Pretty URLs? I don't use Pretty URLs any more, but it looks like you'd need to edit Sources/PrettyUrls-Filters.php:

Yes, Pretty URLs by SMF Hacks:
http://custom.simplemachines.org/mods/index.php?mod=636

I'm not 100% sure, but doesn't that modification virtually kill the PHPSESSID system across the whole SMF platform?

Arantor

Well, yes, that's rather the point.

Regular users use cookies, which carry the PHPSESSID with them. Clients that don't carry cookies have the session ID shoved into the URL so that you can identify how many 'guest' users there are. Invariably the guest users are bots, and knowing approximately how many bots are online is interesting/useful for performance tracking purposes.

This should remove it being reinjected back into the URL - the thing you asked about.

SirLouen

But isn't this essentially the same as setting PHP ini values like these?

session.use_trans_sid = 0
session.use_only_cookies = 1

Arantor

Um, yes, that's again kind of the point? Except for the fact that Pretty URLs (and typically SMF) will just rewrite the URL for you regardless of that setting.
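For reference, a minimal sketch of applying those two settings at runtime rather than in php.ini, assuming it runs before the session is started. It only controls PHP's own handling of the session ID; as noted above, it does nothing about URLs that Pretty URLs or SMF rewrite themselves.

```php
<?php
// Must run before session_start(); the same values can go in php.ini instead.
ini_set('session.use_trans_sid', '0');    // PHP will not append the session ID to URLs
ini_set('session.use_only_cookies', '1'); // PHP will ignore session IDs passed in the URL
```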

SirLouen

OK, I was a little bit "scared" of killing PHPSESSID because I read somewhere that it could cause security issues.
If the problem only affects guest stats, it doesn't matter much, because I track everything through GA, never through the internal stats.

I can't think of any other issues that removing the PHPSESSID rewriting could cause.

Arantor

If I thought it would have security implications across SMF, I'd not have pointed out the relevant section of code ;)

It's only used to differentiate non-cookie users, and all data editing actions have a secondary check anyway.

It will inflate the reported 'most online' figures in the board index info center, just because what was previously identifiable as one 'guest' making multiple requests will no longer be identifiable as such.
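A toy illustration of why the figure goes up (this is not SMF's actual counting logic, and `count_guests` is a made-up name): when requests carry a session ID they can be grouped into one guest, and when they carry nothing each request looks like a new visitor.

```php
<?php
// Toy model (an assumption for illustration, not SMF code).
// Each entry is the session ID seen on one request, or null if none was sent.
function count_guests(array $sessionIdsPerRequest): int
{
    $known = [];
    $anonymous = 0;
    foreach ($sessionIdsPerRequest as $sessionId) {
        if ($sessionId === null) {
            // Nothing ties this request to an earlier one, so it looks like a new guest.
            $anonymous++;
        } else {
            // Repeated IDs collapse into a single guest.
            $known[$sessionId] = true;
        }
    }
    return count($known) + $anonymous;
}
```

So a bot making three requests with the same ID in the URL counts once, while the same three requests without any ID count three times.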

SirLouen

By the way, I'm currently scraping this forum, and I can see that the PHPSESSID (named P=) is not rewritten as aggressively as in the Pretty URLs mod. The Pretty URLs mod rewrites it on every single hop, but this forum by default only introduces the variable on index.php URLs.

Arantor

Of course it isn't. In core, it's only introduced on the first hit without a cookie (because it can't know if there's a cookie or not) and if it sees subsequent requests with a cookie, it won't attempt to rewrite it - because it knows there's a better solution.
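Roughly, the behaviour described above could be sketched like this. It is a simplified illustration under my own assumptions, not SMF's real code: the function names and the position of the parameter in the URL are invented for the example.

```php
<?php
// Hypothetical sketch (not SMF's implementation): only put the session ID
// in a link when the request did not carry the session cookie.
function needs_session_in_url(array $cookies, string $sessionName = 'PHPSESSID'): bool
{
    // A request that sent the cookie can be tracked without touching URLs.
    return !isset($cookies[$sessionName]);
}

function build_link(string $url, array $cookies, string $sessionId, string $sessionName = 'PHPSESSID'): string
{
    if (!needs_session_in_url($cookies, $sessionName)) {
        return $url;
    }
    // '?' for the first parameter, ';' afterwards (SMF-style separator).
    $separator = strpos($url, '?') === false ? '?' : ';';
    return $url . $separator . $sessionName . '=' . $sessionId;
}
```

A first hit with no cookie gets the ID appended; once a request arrives with the cookie, links are left alone.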

Pretty URLs, on the other hand... isn't that clever.

SirLouen

Quote: Pretty URLs, on the other hand... isn't that clever.

And now I'm starting to believe that this is the main reason I've seen so many posts about this issue. The SMF Hacks Pretty URLs mod and PHPSESSID do not appear to be a good combination.

I'm currently testing the SMF Packs P-URLs mod to see how it handles this. It seems they do it the right way.

Arantor

The majority of the issues with Pretty URLs have nothing to do with its adding PHPSESSID to things but mostly stem from how deeply it makes changes, lots of changes, and tries to do so without leaving any IDs around, and how many server setups don't play nicely out of the box with it.

Pretty URLs also hails from a much earlier period where PHPSESSID still had some legitimate uses in SMF 1.x, but in 2.0 it's mostly just for stats. SMF Packs's mod, on the other hand, is much newer and doesn't have any historical baggage in that department. I haven't seen it lately, but the first few versions of SMF Packs had at least as many issues as Pretty URLs, and I've had to fix a number of sites that it broke trying to be 'clever'. The SMF Packs author has a history of doing sloppy work, though he may have gotten better lately; I haven't looked.

SirLouen

The thing is that the Pretty URLs topic has always been disregarded by this platform. URL exact match (UEM) has always been part of SEO, and although there is no clear evidence that it correlates with better rankings, it's at least one of the most powerful placebos that every SEO manager needs in order to feel confident. So when deciding the whole on-site strategy, not having this is either a clear put-off or something you have to make happen yourself, which means immediately resorting to these faulty solutions :(
