REPORT: Potentially "Fatal Flaw" in SMF's bbcode parser

Started by dougiefresh, July 31, 2015, 09:12:39 AM

Previous topic - Next topic

dougiefresh



POTENTIALLY "FATAL FLAW" IN SMF'S PARSER



Several days ago, I was notified about a potentially "fatal flaw" in SMF's bbcode parser.  It occurs when a particular bbcode, such as my Post and PM Inline Attachments mod, has a large number of parameters, such as 14+.

What happens in the original SMF code:
(1) A simple array ($preg) is built with a regex expression for each of the parameters.
(1) Another array ($orders) is built with all possible permutations of the parameters from $preg.
(2) The $orders array is checked for a match, up until a match is found.

Why is this a problem?  Well, the permutation process is the issue here.  A bbcode with 14 parameters generates 98,306 permutations, taking little more than quarter a second on my system PER USE and using 90MB of memory!!  Checking for 15 parameters crashes my forum with an out-of-memory error because it attempts to allocate more than 128MB of memory!

Using this array of permutations, I started my tests with the included script below:
$parameters = array(
'id' => array('match' => '(\d+)', 'validate' => 'ILA_Param_ID'),
'width' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Width'),
'height' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Height'),
'float' => array('optional' => true, 'match' => '(left|right|center)', 'validate' => 'ILA_Param_Float'),
'margin' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Margin'),
'margin-left' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Margin_Left'),
'margin-right' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Margin_Right'),
'margin-top' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Margin_Top'),
'margin-bottom' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Margin_Bottom'),
'border-style' => array('match' => '(dotted|dashed|solid|double|groove|ridge|inset|outset)', 'validate' => 'ILA_Param_Border_Style'),
'border-width' => array('optional' => true, 'match' => '(\d+)', 'validate' => 'ILA_Param_Border_Width'),
'border-color' => array('optional' => true, 'match' => '(#[\da-fA-F]{3}|#[\da-fA-F]{6}|[A-Za-z]{1,20}|rgb\(\d{1,3}, ?\d{1,3}, ?\d{1,3}\))', 'validate' => 'ILA_Param_Border_Color'),
'scale' => array('optional' => true, 'match' => '(true|false|yes|no)', 'validate' => 'ILA_Param_Scale'),
'msg' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg2' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg3' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg4' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg5' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg6' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg7' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg8' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msg9' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msgA' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
'msgB' => array('optional' => true, 'match' => '(new|\d+)', 'validate' => 'ILA_Param_Msg'),
);

(Yes, I added some parameters that aren't in my Post & PM Inline Attachments mod....  Specifically, everything from msg2 and down!)

I've attached a test script to show what happens when the permutations array is built.  Here is the output from my localhost system:
Quote
TEST # 5 : 5 parameters
=================================================
Size before "permute" function is executed: 5,246,024
Size after "permute" function is executed: 5,266,576
Time Taken: 0.00011181831359863 seconds
Array Elements: 50

TEST # 6 : 6 parameters
=================================================
Size before "permute" function is executed: 5,246,200
Size after "permute" function is executed: 5,306,368
Time Taken: 0.00025010108947754 seconds
Array Elements: 130

TEST # 7 : 7 parameters
=================================================
Size before "permute" function is executed: 5,246,376
Size after "permute" function is executed: 5,410,520
Time Taken: 0.00060391426086426 seconds
Array Elements: 322

TEST # 8 : 8 parameters
=================================================
Size before "permute" function is executed: 5,246,552
Size after "permute" function is executed: 5,675,480
Time Taken: 0.0015487670898438 seconds
Array Elements: 770

TEST # 9 : 9 parameters
=================================================
Size before "permute" function is executed: 5,246,760
Size after "permute" function is executed: 6,388,688
Time Taken: 0.0037760734558105 seconds
Array Elements: 1794

TEST # 10 : 10 parameters
=================================================
Size before "permute" function is executed: 5,246,976
Size after "permute" function is executed: 8,066,328
Time Taken: 0.0089709758758545 seconds
Array Elements: 4098

TEST # 11 : 11 parameters
=================================================
Size before "permute" function is executed: 5,247,208
Size after "permute" function is executed: 12,023,288
Time Taken: 0.020859003067017 seconds
Array Elements: 9218

TEST # 12 : 12 parameters
=================================================
Size before "permute" function is executed: 5,247,392
Size after "permute" function is executed: 21,272,456
Time Taken: 0.052258014678955 seconds
Array Elements: 20482

TEST # 13 : 13 parameters
=================================================
Size before "permute" function is executed: 5,247,640
Size after "permute" function is executed: 42,637,392
Time Taken: 0.11318302154541 seconds
Array Elements: 45058

TEST # 14 : 14 parameters
=================================================
Size before "permute" function is executed: 5,247,816
Size after "permute" function is executed: 91,495,328
Time Taken: 0.24853491783142 seconds
Array Elements: 98306

TEST # 15 : 15 parameters
=================================================
Size before "permute" function is executed: 5,247,968

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 35 bytes) in D:\Website\clean\Sources\Subs.php on line 874
The time for under 5 parameters is negliable.  Notice that the time taken for up to 8 parameters is almost non-existance, but more parameters than that cause the permute function to take more and more time to generate all the possible permutations of the parameters.  14 parameters take just under a quarter second PER TAG USE!!!  15 parameters actually crashed the script....

By the way, the error message returned by the script referred to this function (pointed to by >>>):
// This gets all possible permutations of an array.
function permute($array)
{
$orders = array($array);

$n = count($array);
$p = range(0, $n);
for ($i = 1; $i < $n; null)
{
$p[$i]--;
$j = $i % 2 != 0 ? $p[$i] : 0;

$temp = $array[$i];
>>> $array[$i] = $array[$j];
$array[$j] = $temp;

for ($i = 1; $p[$i] == 0; $i++)
$p[$i] = 1;

$orders[] = $array;
}

return $orders;
}

dougiefresh

#1


STEPS TO CORRECT THE FLAW


In Sources/Subs.php, find this code:
Code (Find) Select
return $orders;
}

and add this after it:
Code (Add After) Select


// Rearranges all parameters to be in the right order.  Returns TRUE if no parameters are leftover.
function_param_order($message, &$parameters, &$out)
{
$params = explode(' ', substr($message, 0, strpos($message, ']')));
unset($params[0]);
$order = array();
$out = $old = '';
foreach ($params as $param)
{
if (strpos($param, '=') === false)
$order[$old] .= ' ' . $param;
else
$order[$old = substr($param, 0, strpos($param, '='))] = substr($param, strpos($param, '=') + 1);
}
foreach ($parameters as $key => $ignore)
{
$out .= (isset($order[$key]) ? ' ' . $key . '=' . $order[$key] : '');
unset($order[$key]);
}
return count($order) == 0;
}


Find this code:
Code (Find) Select
$preg = array();
foreach ($possible['parameters'] as $p => $info)
$preg[] = '(\s+' . $p . '=' . (empty($info['quoted']) ? '' : '&quot;') . (isset($info['match']) ? $info['match'] : '(.+?)') . (empty($info['quoted']) ? '' : '&quot;') . ')' . (empty($info['optional']) ? '' : '?');

// Okay, this may look ugly and it is, but it's not going to happen much and it is the best way of allowing any order of parameters but still parsing them right.
$match = false;
$orders = permute($preg);
foreach ($orders as $p)
if (preg_match('~^' . implode('', $p) . '\]~i', substr($message, $pos1 - 1), $matches) != 0)
{
$match = true;
break;
}

// Didn't match our parameter list, try the next possible.
if (!$match)
continue;

and replace it with this:
Code (Replace) Select

// START: Dougiefresh's faster parse_bbc function
if (!fix_param_order(($test = substr($message, $pos1 - 1)), $possible['parameters'], $replace_str))
continue;
$preg = '';
foreach ($possible['parameters'] as $p => $info)
$preg .= '(\s+' . $p . '=' . (empty($info['quoted']) ? '' : '&quot;') . (isset($info['match']) ? $info['match'] : '(.+?)') . (empty($info['quoted']) ? '' : '&quot;') . ')' . (empty($info['optional']) ? '' : '?');
if (!preg_match('~^' . $preg . '\]~i', ($test = $replace_str . substr($test, strpos($test, ']'))), $matches))
continue;
$message = substr($message, 0, $pos1 - 1) . $test;
// END: Dougiefresh's faster parse_bbc function





Okay, this isn't plain English, so I'll translate what the modifications do.....

My solution to this issue has several steps:
(1) Internally rearrange the parameters into the proper order and return the number of unused parameters, as well as the replacement string.
(2) If there are unused parameters, then this can't be the bbcode we're looking for.  Continue to next bbcode...
(3) TEST the replacement string made in STEP 2 to see if the parameters match what that particular bbcode expects.
(4) If STEP 3 is not a match, continue to next bbcode...
(5) Replace the bbcode in question with the rearranged version.

Why is this important?  Several reasons:
(1) Memory is saved by not having to generate all possible permutations of a particular parameter set.
(2) Time is saved (at least with larger sets) by NOT having to generate a large number of permutations.
(3) Time is saved (at least with larger sets) by NOT having to test a large number of permutations.
(4) Irritation by the user because they don't have to deal with "blank screens of death" due to out-of-memory situations.

Please note that this mod leaves the permute function in place, as other mods or future forum versions may need the built-in functionality.




EDIT: Replaced erroneous "replace" operation with "add after"....  Whoops!

dougiefresh

I've also posted a new mod that corrects this issue.  It's called Faster SMF BBCode Parser.  No, as of this writing, it hasn't been approved yet, so please don't comment that there is nothing there!

Kindred

Hmmm.... needs some review... but, Dougie... would you be willing to submit this as a PR on github?
(there are potential license issues with us including fixes offered on the forum, while a gitHub PR clearly indicates the intention to submit...


BTW: Nice job!
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

dougiefresh

Thanks!  Tell me how to submit one on GitHub and I will.

margarett

A submission to GH is to be added to 2.1's codebase, not 2.0 ;)

Have you worked with GH before? If not, you need to fork 2.1's repo then do the changes on your own (forked) repo, then finally create a PR for ours. Don't forget to sign your PR's!
I use SourceTree for handling the code flow from GH to my PC.
Se forem conduzir, não bebam. Se forem beber... CHAMEM-ME!!!! :D

QuoteOver 90% of all computer problems can be traced back to the interface between the keyboard and the chair

Kindred

Well, anything submitted to 2.1 can also be backported to 2.0 :P
(and I don't think the BBC parser has changed between 2.0 and 2.1)

It's just the Intellectual Property/copyright rights that need to be covered, since we are an actual company
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

dougiefresh

It's been submitted at GitHub.  Hopefully, I did it right.....

Kindred

Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Illori

https://github.com/SimpleMachines/SMF2.1/pull/2956

each commit has to be signed off

something like

Signed-off-by: yournamehere [email protected]

that PR can not be merged until it is signed off.


Illori

you signed off the PR not the commit. if you dont know how to modify the commit, it may be easier to create a new branch commit the change and sign off then make a new PR [and close the old.]

Kindred

you know...   the sign off on the PR should be sufficient. We do not need to beat people over the head when they try to contribute
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Illori

actually it is not as the PR is not what is merged it is the commits that we see in the log. if the commit is not signed off the commit log will not have the sign off and we may never know if they did sign off.

we have been telling people on github to sign off the commits for years.

Antes

Quote from: Kindred on July 31, 2015, 04:46:33 PM
you know...   the sign off on the PR should be sufficient. We do not need to beat people over the head when they try to contribute

People can edit their PR messages afaik, so they can remove the "sign-off" part and cause trouble deeply.

live627



Kindred

Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

dougiefresh

I'm sorry.  I'm new to GitHub and I didn't understand.  Let me fix that....

EDIT: Tried to edit commit.  Wasn't able to.  Tried to delete commit.  Couldn't figure out how to do that, either.  Deleted account and recreated account, recreated commit and PR.  Signed off on the commit....

Kindred

Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Advertisement: