2.0.18 issue with 'substr' function and UTF8 characters

@rjen:
Found an issue in 2.0.18.

The cause is the changed 'substr' function in Load.php.
It was updated in 2.0.18, and as a result special characters such as &, ", < and > are now processed incorrectly.

Where does this appear?

We first noticed this in the recent topics block in TinyPortal. This block uses the output of the PHP function ssi_recentTopics, specifically the short_subject string it provides: if the subject of a message contains an &, short_subject presents it as &amp.
The same happens with " (shown as "quot) and < (shown as <lt).

It seems that the latest change to this code no longer takes into account that SMF 2.0 is not entirely UTF-8, so these characters are no longer handled correctly.
TinyPortal relies on the substr function in SMF to shorten topic texts and titles, therefore the issue becomes more visible there.

The old SMF code in 2.0.17 works fine:

Load.php

--- Code: ---
'substr' => create_function('$string, $start, $length = null', '
global $smcFunc;
$ent_arr = preg_split(\'~(&#' . (empty($modSettings['disableEntityCheck']) ? '\d{1,7}' : '021') . ';|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '') . '\', ' . implode('$string', $ent_check) . ', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
return $length === null ? implode(\'\', array_slice($ent_arr, $start)) : implode(\'\', array_slice($ent_arr, $start, $length));'),

--- End code ---
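
For readers unfamiliar with the technique, here is a minimal standalone sketch of what the 2.0.17 pattern does: it splits the string into pieces that are each either one whole entity or one single character, then slices the list of pieces rather than the raw bytes. The function name entity_substr is hypothetical and not part of SMF, and the sketch leaves out the $ent_check step and the disableEntityCheck setting.

```php
<?php
// Hypothetical standalone helper for illustration; not part of SMF.
function entity_substr(string $string, int $start, ?int $length = null, bool $utf8 = true): string
{
    // Each piece is either one complete entity or one single character.
    $pattern = '~(&#\d{1,7};|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '');
    $pieces = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Slice the pieces, so an entity is never cut in half.
    $slice = $length === null ? array_slice($pieces, $start) : array_slice($pieces, $start, $length);

    return implode('', $slice);
}

$subject = 'Q&amp;A for &lt;b&gt; tags';
echo entity_substr($subject, 0, 3), "\n"; // Q&amp;A  (the entity counts as one character)
echo substr($subject, 0, 3), "\n";        // Q&a      (plain substr cuts the entity)
```

The point of the comparison is that the entity-aware version counts &amp; as a single character, so a shortened subject never ends in a broken entity.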

The new code in 2.0.18 does not:


--- Code: ---
'substr' => function($string, $start, $length = null) use ($utf8, $ent_check, $ent_list, $modSettings)
{
$ent_arr = preg_split('~(' . $ent_list . '|.)~' . ($utf8 ? 'u' : '') . '', $ent_check($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
return $length === null ? implode('', array_slice($ent_arr, $start)) : implode('', array_slice($ent_arr, $start, $length));
},

--- End code ---
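
The contents of $ent_list are not shown in this thread, so the following is only a sketch of one way such symptoms could arise. It assumes, purely for illustration, that the entity list contains a capturing group of its own. With PREG_SPLIT_DELIM_CAPTURE, preg_split returns every capturing group of a match, so a nested group adds the bare entity name as an extra piece. Both patterns below are hypothetical.

```php
<?php
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Assumed shape: the entity list has its own capturing group (hypothetical).
$nested = '~(&(#\d{1,7}|quot|amp|lt|gt|nbsp);|.)~u';
// Same list, but the inner group is non-capturing.
$flat = '~(&(?:#\d{1,7}|quot|amp|lt|gt|nbsp);|.)~u';

$withNested = preg_split($nested, 'A&amp;B', -1, $flags);
$withFlat = preg_split($flat, 'A&amp;B', -1, $flags);

print_r($withNested);               // pieces: A, &amp;, amp, B
print_r($withFlat);                 // pieces: A, &amp;, B
echo implode('', $withNested), "\n"; // A&amp;ampB
```

Rendered as HTML, A&amp;ampB displays as A&ampB, which would be consistent with the &amp, "quot and <lt output described above.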

This should be patched in 2.0.19.
This code works correctly:


--- Code: ---
        'substr' => function($string, $start, $length = null) use ($utf8, $ent_check, $ent_list, $modSettings)
        {
            $ent_arr = preg_split('~(&#' . (empty($modSettings['disableEntityCheck']) ? '\d{1,7}' : '021') . ';|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '') . '', $ent_check($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
            return $length === null ? implode('', array_slice($ent_arr, $start)) : implode('', array_slice($ent_arr, $start, $length));
        },

--- End code ---

shadav:
Is this the same issue as this one?
https://www.simplemachines.org/community/index.php?topic=576612.0

If so, the patch is here:
https://www.simplemachines.org/community/index.php?topic=576612.msg4081408#msg4081408

shawnb61:
Yes, I believe it's the same.

@rjen:
Just checked the patch: a different solution, but the same result, so yes, that fixes it too.