News:

Want to get involved in developing SMF, then why not lend a hand on our github!

Main Menu

improper implementation of preg_replace_callback

Started by Chen Zhen, February 08, 2014, 12:54:24 PM

Previous topic - Next topic

Chen Zhen


Attention SMF devs,

  There is no denying that the changes in SMF 2.0.7 concerning preg_replace_callback are causing issues for many due to the use of create_function for the callback. In many circumstances it can cause excessive memory usage which is crashing many peoples forums. I have already posted one possible solution for these issues.

  Imo at this time if a solution has not been developed, I suggest to post an official notice to people with an apology for the inconvenience and perhaps for all SMF software users to revert to SMF 2.0.6 so long as their OS PHP version is less than 5.5. 
Packages, modlog, news, posts, search, etc.. Many have reported the issue for these and many more will be reported until it is resolved.

Just one example:
As it is even for those without any issues as of yet, if a user adds 50 smileys to a post on a forum it will most likely crash the page. 

  If a fix has been developed and is currently in testing, perhaps an official notice to tell people of the mishap and to be patient for an update would be a respectful thing to do.

Regards.


My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

Chen Zhen


margarett,

  I am very glad to see you are a support specialist giving the proper advise. Hopefully your peers will follow your lead or perhaps better yet, you will be promoted.

ref. http://www.simplemachines.org/community/index.php?topic=518130.0

Respect.

My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

Chen Zhen

  Sinan seems to have come up with a better solution than mine. One that seems it should work for PHP 4.1+ .. a smart dev I tell ya!

Btw - purposely setting the memory limit to 8M in the parsesmileys function (albeit lower than it will be set for most people) will allow for testing this in posts. Stick about 50 smileys in one post and you will see what I mean.

Regards.


My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

[SiNaN]

#3
Just copying what I have posted there, if it helps:

After doing a little research on the web, the cleanest possible solution seems to be using an extra class for this. Something like this:

Subs.php

Code (Find) Select
static $smileyPregSearch = array(), $smileyPregReplacements = array();

Code (Replace) Select
static $smileyPregSearch = array(), $smileyPregReplacements = array(), $callback;

Code (Find) Select
$smileyPregSearch = '~(?<=[>:\?\.\s' . $non_breaking_space . '[\]()*\\\;]|^)(' . implode('|', $searchParts) . ')(?=[^[:alpha:]0-9]|$)~e' . ($context['utf8'] ? 'u' : '');
}

// Replace away!
$message = preg_replace($smileyPregSearch, 'isset($smileyPregReplacements[\'$1\']) ? $smileyPregReplacements[\'$1\'] : \'\'', $message);
}


Code (Replace) Select
$smileyPregSearch = '~(?<=[>:\?\.\s' . $non_breaking_space . '[\]()*\\\;]|^)(' . implode('|', $searchParts) . ')(?=[^[:alpha:]0-9]|$)~' . ($context['utf8'] ? 'u' : '');
$callback = new ParseSmileysReplacement;
$callback->replacements = $smileyPregReplacements;
}

// Replace away!
$message = preg_replace_callback($smileyPregSearch, array($callback, 'callback'), $message);
}

class ParseSmileysReplacement
{
function callback($matches)
{
if (isset($this->replacements[$matches[0]]))
return $this->replacements[$matches[0]];
else
return '';
}
}


I think that should work with anything that SMF is required to work on. Can you confirm?

I believe there aren't a lot preg_replace_callback() calls in SMF in which you need to pass multiple arguments to the callback function. Maybe the one above is the only one. So the others can just use permanent functions.

Edit: Updated as per the issues Arantor pointed out.

Edit 2: Put a line back, which was removed accidentally.
Former SMF Core Developer | My Mods | SimplePortal

Arantor

SMF 2.0 still officially supports PHP < 5, for which class scope is not supported at all, for which loadClassFile exists to support. Additionally, PHP < 5 does not support use of __construct, but instead requires the constructor function to be called the same as the class itself. (But using the class name as constructor is also itself deprecated as of 5.3.3)

While PHP 4 should not be in use, officially it is still supported.

Of slight concern is the fact that a new class instance is still being created every iteration of the loop, as opposed to a new function every iteration of the loop, which is definitely lower in memory usage but whether it is enough is another matter.

I'm also not sure this solves the problematic pregCurry function which appears to be called in other contexts.

Chen Zhen

  I didn't look up absolutely everything in Sinan's code for compatibility at first glance. I understand the idea is to have it function for PHP 4.1+ as the current SMF requirements state.  At least we're communicating (and comparing code) to find a solution.

Here is more or less what I posted at SP for a better explanation of what I propose.




   My solution is to use 2 files.. one for people using PHP 5.3+ and the other for people using anything less than PHP 5.3.0. Each file will have identical function names for the instances where preg_replace_callback (PHP 5.3+) or preg_replace with /e modifier (less than PHP 5.3.0) is to be used. SMF should only include 1 of those files depending on PHP version.

ie.

if (version_compare(PHP_VERSION, '5.3.0') >= 0)
    require_once($sourcedir . '/PregReplaceCallback.php');
else
    require_once($sourcedir . '/PregReplace.php');


For the instances where the character(s) replacements are needed, simply call the permanent function that is located in the file that was loaded. Each file will have like function names because only one will be loaded.

This should work so that SMF will be compatible with PHP 4.1+ without causing any issues as it now most certainly does.

ref. link: Click Here




  So the < PHP 5.3.0 file would have the preg_replace with the /e modifier as it was in SMF 2.0.6 which I do not really need to post.

This is the example code I propose to be put in the PHP 5.3.0+ file for parsesmileys ($smileyPregSearch, $smileyPregReplacements & $message passed to the function):

function smf_parse_smileys($smileyPregSearch, $smileyPregReplacements, $message)
{
    return preg_replace_callback($smileyPregSearch, function($matches) use (&$smileyPregReplacements)
                                        {                                           
return $smileyPregReplacements[$matches[1]];
                                           
                                        }, $message);
}


Regards.

My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

[SiNaN]

Quote from: Arantor on February 08, 2014, 03:14:02 PM
SMF 2.0 still officially supports PHP < 5, for which class scope is not supported at all, for which loadClassFile exists to support. Additionally, PHP < 5 does not support use of __construct, but instead requires the constructor function to be called the same as the class itself. (But using the class name as constructor is also itself deprecated as of 5.3.3)

While PHP 4 should not be in use, officially it is still supported.

What I know about OOP is close to nothing and that's why I wasn't sure if it would be of any help. Thank you for pointing those issues out. I've updated the code a little. Can you check it again to see if that one would work? Or can you think of any other way to make that extra class work with all PHP versions supported? If that doesn't work, the only option left seems to be declaring the variable as global where they are needed in the callback function.

Quote from: Arantor on February 08, 2014, 03:14:02 PM
Of slight concern is the fact that a new class instance is still being created every iteration of the loop, as opposed to a new function every iteration of the loop, which is definitely lower in memory usage but whether it is enough is another matter.

Which loop is that? From what I understand looking at the code, there is only one instance of the class and that's used by the replace.

Quote from: Arantor on February 08, 2014, 03:14:02 PM
I'm also not sure this solves the problematic pregCurry function which appears to be called in other contexts.

I only provided an example for the one in parsesmileys function. From what I can see looking at the update package, there are only two replaces that need an extra argument to be passed to the callback function. So if those can be fixed, the rest can just use permanent functions.
Former SMF Core Developer | My Mods | SimplePortal

Arantor

Well... let me explain.

For the issues to be occurring that we've seen, where 128M or even 256M is exceeded, a very very large amount of dynamic functions needs to be created. Like, crazy amounts, since even 32MB should be more than enough.

Even if we overinflate the figures for a moment, and say that each dynamic function consumes 64KB of RAM (which it doesn't), that means that suddenly we're using several *thousand* instances of these lambda functions. Even with the typical number of calls to parse_bbc per page, we don't get into the thousands of runs of parse_bbc per page. We don't get thousands of calls to parse_smileys per page.

Which means that must be occurring in a loop for it to be doing that.

Now, the pregCurry function is a bit special as you may have seen. It's a function that creates a function that can return a new function. So that's where part of it stems from, but as I said, there's no way this can be just a few calls, this must be being run into the thousands to cause the issues we're seeing.

The true-closure solves this by not being dynamically created repeatedly while the late binding allows scope to be preserved, something that a second glance of the class instance suggests it may not be doing.

QuoteOr can you think of any other way to make that extra class work with all PHP versions supported?

The only way to do that properly with all required PHP versions within SMF is to:
* put it in its own class
* use conventional scope techniques
* load that class with loadClassFile (which will deal with the conventional scope techniques)
* don't use a constructor but have a function that you push the values to it

Mind you, I'd be suggesting at this point that PHP 5 should just be declared the real minimum version for 2.0 and just ignore PHP 4 at this stage, it's been EOL for years.

[SiNaN]

Quote from: Arantor on February 08, 2014, 04:04:32 PM
Well... let me explain.

I understand the issue overall and the problems with pregCurry function. That's why I'm trying to come up with an alternative that does not use it. In the example I provided above (correct me if I'm wrong) there is only one instance of ParseSmileysReplacement class created and that instance is kept as a static variable. So I can't really see the loop which causes multiple instances of ParseSmileysReplacement to be created.

Quote from: Arantor on February 08, 2014, 04:04:32 PM
The only way to do that properly with all required PHP versions within SMF is to:
* put it in its own class
* use conventional scope techniques
* load that class with loadClassFile (which will deal with the conventional scope techniques)
* don't use a constructor but have a function that you push the values to it

From what I understand, you don't need to define the variable scope or use the contructor function. Will the class I updated above still cause problems over different PHP versions?

Quote from: Arantor on February 08, 2014, 04:04:32 PM
Mind you, I'd be suggesting at this point that PHP 5 should just be declared the real minimum version for 2.0 and just ignore PHP 4 at this stage, it's been EOL for years.

I agree with that.
Former SMF Core Developer | My Mods | SimplePortal

Arantor

If that were the case, that only one instance gets created (because it's not in a loop)... why do we have so many lambda functions created? Why do *thousands* get created? (and that's a conservative estimate, it's really likely to in the tens of thousands)

QuoteFrom what I understand, you don't need to define the variable scope or use the contructor function. Will the class I updated above still cause problems over different PHP versions?

Not using a constructor in this case would be recommended.

Using a property without *some* declaration is not recommended, though and can throw errors in some cases (because you generally have to start declaring __set() and stuff), using var as the scope will work but I don't know if it was subsequently deprecated in PHP or not since I haven't used that in years :P. I do know that loadClassFile will silently rewrite it for PHP 4 though.

[SiNaN]

Quote from: Arantor on February 08, 2014, 04:24:58 PM
If that were the case, that only one instance gets created (because it's not in a loop)... why do we have so many lambda functions created? Why do *thousands* get created? (and that's a conservative estimate, it's really likely to in the tens of thousands)

Isn't that because of the pregReplaceCurry function? The callback is called for every replace (like if you had 50 smileys that is 50 times) and the callback is pregReplaceCurry which has create_function in it. What I have suggested above has an instance given as the callback and it's created once and the creation is done a few lines above preg_replace_callback.

Quote from: Arantor on February 08, 2014, 04:24:58 PM
Not using a constructor in this case would be recommended.

Using a property without *some* declaration is not recommended, though and can throw errors in some cases (because you generally have to start declaring __set() and stuff), using var as the scope will work but I don't know if it was subsequently deprecated in PHP or not since I haven't used that in years :P. I do know that loadClassFile will silently rewrite it for PHP 4 though.

I know that using var will result in warnings in recent versions. However, I haven't come across anything that suggests undeclared instance properties to be technically problematic, though I agree it's not the "right" way.
Former SMF Core Developer | My Mods | SimplePortal

Chen Zhen

  I also would like to mention something about the xml parser for installing packages where that callback is being used. A simple permanent function (from ie. Load.php) saves on memory consumption.

  When the package-info.xml file (specifically) of a mod is parsed, removing the excess white spaces also saves on memory consumption.  This is only for that specific file obviously because the other files that are parsed to do edits need those to find their place markers and also to insert their edits. The package-info.xml file on the other hand, is just read & executed. It's not much but perhaps when a person has many mods installed & they are browsing their entire list,  the memory usage is not as compounded.  At first I thought this might be counter-productive to add this in a delimiter but it seems to use less memory when tested.

My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

Arantor

I'd suggest putting it in Subs-Package.php personally, saves a bit of memory the rest of the time.

But to really save some memory, not using Class-Package at all and replacing it with SimpleXML would do wonders for memory consumption.

Chen Zhen

Quote from: Arantor on February 10, 2014, 07:16:17 PM
I'd suggest putting it in Subs-Package.php personally, saves a bit of memory the rest of the time.

But to really save some memory, not using Class-Package at all and replacing it with SimpleXML would do wonders for memory consumption.

It does save on memory & I don't know why I didn't think of it at the time since I was also messing around with that file.
As for SimpleXML, yes you are definitely correct. Imo just change the SMF requirement to PHP 5 as you guys have already suggested here in this thread.

My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

Advertisement: