Avoid recreating new PHPSESSID

Started by SirLouen, June 23, 2018, 07:41:32 PM

SirLouen

How can I avoid recreating new PHPSESSID every few minutes?
I'm testing a crawl with Screaming Frog, and after 5 minutes the system issues a new PHPSESSID, so the crawl never ends.



I thought I could solve this with a longer gc_maxlifetime



But this is not working; the system still creates a new PHPSESSID after a short while.

Any other ideas?

Arantor

Can't Screaming Frog handle actual cookies?

GigaWatt

Had the same problem with archive.org. Never solved it.
"This is really a generic concept about human thinking - when faced with large tasks we're naturally inclined to try to break them down into a bunch of smaller tasks that together make up the whole."

"A 500 error loosely translates to the webserver saying, "WTF?"..."

SirLouen

Quote from: Arantor on June 23, 2018, 08:11:38 PM
Can't Screaming Frog handle actual cookies?

It can, but most crawlers do not handle cookies. So I'm assuming that, apart from Google's crawler, which may (or may not) have figured out how to avoid infinite loops caused by the constantly changing PHPSESSID, less capable crawlers such as YandexBot, BingBot, or tool crawlers like Ahrefs may find it difficult to crawl the site, because a different PHPSESSID changes the URLs they have previously stored.

@GigaWatt gives a great example of what I'm referring to. The Archive.org crawler may find this continuous change a hindrance to crawling the site properly without loops.

Arantor

There is a solution but it'll break the tracking of online guests for these crawlers.

The session isn't being regenerated; it's multiple instances each getting new sessions, and none of them are smart enough to use cookies or respect canonical URLs.

If you're ok with guest stats being screwed up, I can look up the code changes required to stop it spitting out PHPSESSID.

SirLouen

Quote from: Arantor on June 24, 2018, 04:23:07 AM
If you're ok with guest stats being screwed up, I can look up the code changes required to stop it spitting out PHPSESSID.

Yes, I'm ok with this.

Arantor

Hmm, which of the pretty URLs mods are you using?

SirLouen


Arantor

So, Pretty URLs? I don't use Pretty URLs any more, but it looks like you'd need to edit Sources/PrettyUrls-Filters.php:

Code (find):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($PHPSESSID[0]) ? $PHPSESSID[0] : '') . ';' . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Code (replace):
$replacement .= (strpos($replacement, '?') === false ? '?' : ';') . (isset($sesc[0]) ? $sesc[0] : '') . (isset($session_var[0]) ? $session_var[0] : '') . (isset($fragment[0]) ? $fragment[0] : '');

Note that I haven't tested this.

SirLouen

Quote from: Arantor on June 24, 2018, 05:24:09 AM
So, Pretty URLs? I don't use Pretty URLs any more, but it looks like you'd need to edit Sources/PrettyUrls-Filters.php:

Yes, Pretty URLs by SMF Hacks:
http://custom.simplemachines.org/mods/index.php?mod=636

I'm not 100% sure, but it seems that this modification virtually kills the PHPSESSID system across the whole SMF platform?

Arantor

Well, yes, that's rather the point.

Regular users use cookies, which carry the PHPSESSID with them. Platforms that don't carry cookies have the session ID shoved into the URL so that you can identify how many 'guest' users there are; invariably the guest users are bots, and knowing approximately how many bots are online is interesting/useful for performance tracking purposes.

This should stop it being reinjected into the URL - the thing you asked about.
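
To illustrate what "the session ID shoved into the URL" means for the links a crawler stores, here is a minimal sketch. The helper name strip_session_id is hypothetical and is not part of SMF or Pretty URLs, and it assumes raw URLs that use ; or & as separators (not HTML-escaped ones with &amp;).

```php
<?php
// Hypothetical helper for illustration only; not part of SMF or Pretty URLs.
// Removes a "PHPSESSID=value" pair from a raw (non-HTML-escaped) URL.
function strip_session_id(string $url, string $name = 'PHPSESSID'): string
{
    $quoted = preg_quote($name, '/');

    // Drop "NAME=value" plus the separator that follows it, if any.
    $url = preg_replace('/(?<=[?;&])' . $quoted . '=[^;&#]*[;&]?/', '', $url);

    // Tidy up a dangling "?", ";" or "&" left at the end or before a fragment.
    return preg_replace('/[?;&]+(?=#|$)/', '', $url);
}

// Two URLs that differ only by session ID collapse to the same address.
$a = strip_session_id('http://example.com/index.php?PHPSESSID=abc123;topic=1.0');
$b = strip_session_id('http://example.com/index.php?PHPSESSID=def456;topic=1.0');
var_dump($a === $b); // bool(true)
```

A crawler without cookies sees the first form with a different ID on each visit, which is why it treats every page as new.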

SirLouen

But isn't this essentially the same as setting PHP settings like these?

session.use_trans_sid = 0
session.use_only_cookies = 1

Arantor

Um, yes, that's again kind of the point? Except for the fact that Pretty URLs (and typically SMF) will just rewrite the URL for you regardless of that setting.

SirLouen

Ok, I was a little bit "scared" of killing PHPSESSID because it could cause security problems, as I read somewhere.
If the problem is only related to guest stats, it doesn't matter much, because I track everything through GA, never through the internal stats.

I can't think of any other issues that removing PHPSESSID rewriting could cause.

Arantor

If I thought it would have security implications across SMF, I'd not have pointed out the relevant section of code ;)

It's only used to differentiate non-cookie users, and all data editing actions have a secondary check anyway.

It will inflate the reported 'most online' figures in the board index info center, just because what was previously identifiable as one 'guest' making multiple requests will no longer be identifiable as such.

SirLouen

By the way, I'm currently scraping this forum, and I can see that the PHPSESSID (named as P=) is not rewritten as aggressively as with the Pretty URLs mod. The Pretty URLs mod rewrites it on every single hop, but this forum by default only introduces the variable on index.php URLs.

Arantor

Of course it isn't. In core, it's only introduced on the first hit without a cookie (because it can't know if there's a cookie or not) and if it sees subsequent requests with a cookie, it won't attempt to rewrite it - because it knows there's a better solution.
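
As a rough sketch of that behaviour (not SMF's actual code): rewrite links only when the request carried no cookies and a session ID string is available, and skip links that already contain it. The function name add_sid_to_links and its parameters are hypothetical.

```php
<?php
// Hypothetical illustration of "only add the session ID when there is no cookie".
// $sid is expected in "PHPSESSID=value" form, like PHP's SID constant.
function add_sid_to_links(string $buffer, string $scripturl, string $sid, array $cookies): string
{
    // A cookie is present (or there is no session string): leave the output alone.
    if (!empty($cookies) || $sid === '') {
        return $buffer;
    }

    $quotedUrl = preg_quote($scripturl, '/');
    $quotedSid = preg_quote($sid, '/');

    // Match links to the script URL that do not already start their query with the SID,
    // then put the SID first in the query string.
    return preg_replace(
        '/"' . $quotedUrl . '(?!\?' . $quotedSid . ')\??/',
        '"' . $scripturl . '?' . $sid . '&amp;',
        $buffer
    );
}
```

With an empty cookie array, a link to index.php?topic=1.0 gains the session ID at the front of its query string; with any cookie present, the buffer comes back unchanged.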

Pretty URLs, on the other hand... isn't that clever.

SirLouen

Quote: Pretty URLs, on the other hand... isn't that clever.

And now I'm starting to believe that this is the main reason I've seen so many posts about this. The SMF Hacks Pretty URLs mod and PHPSESSID do not appear to be a good combination.

I'm currently testing the SMF Packs P-URLs mod to see how it handles this. It seems they do it the right way.

Arantor

The majority of the issues with Pretty URLs have nothing to do with its adding PHPSESSID to things but mostly stem from how deeply it makes changes, lots of changes, and tries to do so without leaving any IDs around, and how many server setups don't play nicely out of the box with it.

Pretty URLs also hails from a much earlier period where PHPSESSID still had some legitimate uses in SMF 1.x but in 2.0, it's mostly just for stats. SMF Packs's mod on the other hand, is much newer and doesn't have any historical baggage in that department. I haven't seen it lately but the first few versions of SMF Packs had at least as many issues as Pretty URLs, and I've had to fix a number of sites that it broke trying to be 'clever'. The SMF Packs author has a history of doing sloppy work, though he may have gotten better lately, haven't looked.

SirLouen

The thing is that the Pretty URLs topic has always been disregarded by this platform. URL exact match (UEM) has always been a topic in SEO, and although there is no clear evidence that it correlates with better rankings, it is at least one of the most powerful placebos that every SEO manager needs in order to feel confident. So when deciding the whole on-site strategy, not having this is clearly either a put-off or something you have to make happen, which means immediately resorting to these faulty solutions :(

Arantor

Yup, it's long been a matter of contention within the team and the industry about the value of nice looking URLs, something that the paid platforms have long since figured out and that the free ones keep trying to argue isn't relevant (mostly because it's effort to fix properly)

vbgamer45

Arantor do you suggest ripping out $PHPSESSID?

I see SMF uses the php constant SID

SirLouen

Quote: mostly because it's effort to fix properly
I assumed as much, and it makes sense.

All the current forum platforms found this important and moved forward on it: NodeBB, Vanilla, even bbPress... I think the only two stuck in this position are phpBB and SMF.

Since this forum is maintained by the community, I can't argue much about this, because I recognize that it ultimately takes a lot of programming time. If one day I have some time to start helping out, I think I will definitely start with this ;D

In the meantime, I will make do with the mods currently available.

Arantor

Quote from: vbgamer45 on June 24, 2018, 08:35:16 AM
Arantor do you suggest ripping out $PHPSESSID?

I see SMF uses the php constant SID


I think SID is only set when cookies don't cover it - while $PHPSESSID is set in all cases, which is why SMF doesn't set PHPSESSID all the time. I'd check on the differences between those two and go from there.

GigaWatt

@Arantor: How would I do this on a forum that doesn't have Pretty URLs? Remove PHPSESSID for guests I mean :).

Arantor

@GigaWatt: the same caveat I mentioned still applies: it screws up your ability to track how many guests are online. But you'd make a small change inside QueryString.php, in ob_sessrewrite(). The exact change depends on whether you use index.php/topic,1.0.html URLs or not.

GigaWatt

No, I don't use queryless URLs, I use the regular ones, with queries ;).

I'll give it a shot and holler if I need any help ;).

Daretary

I just commented this out to stop this horror:
if (empty($_COOKIE) && SID != '' && !isBrowser('possibly_robot'))
// $buffer = preg_replace('/(?<!<link rel="canonical" href=)"' . preg_quote($scripturl, '/') . '(?!\?' . preg_quote(SID, '/') . ')\\??/', '"' . $scripturl . '?' . SID . '&amp;', $buffer);

Kindred

That is really the wrong thing to do

Arantor

The only circumstance in which it actually matters is if you expect to have users who don't have cookies enabled, because that's what it was originally implemented for: being able to track guests/search engines that couldn't handle cookies.

Since cookies are now effectively mandatory anyway you might as well do away with it. You'll note that my comments about it above were 5 years ago - the user end stopped being a problem a while ago (subject to acknowledging the necessity of cookies for guests and search engines) and every crawler handles cookies these days because it's almost more work not to.

Irisado

Quote from: Daretary on January 13, 2024, 10:53:47 AM
I just commented this out to stop this horror

Please do not revive support topics that are five and a half years old.  Topic locked.