Simple Machines Community Forum

SMF Development => Bug Reports => Topic started by: @rjen on May 08, 2021, 03:52:52 AM

Title: 2.0.18 issue with 'substr' function and UTF8 characters
Post by: @rjen on May 08, 2021, 03:52:52 AM
Found an issue in 2.0.18.

The cause is the changed 'substr' function in Load.php.
It was rewritten in 2.0.18, and as a result special characters such as &, ", < and > are processed incorrectly.

Where does this appear?

We first noticed this in the recent topics block in TinyPortal. This block uses the output of the SSI function ssi_recentTopics, specifically the short_subject string it provides: if the subject of a message contains an &, the short_subject result displays it as &amp.
The same happens with " (shown as "quot) and < (shown as <lt).

It seems that the latest change to this code no longer takes into account that SMF 2.0 is not all UTF-8, so these characters are no longer handled correctly.
TinyPortal relies on the substr function in SMF to shorten topic texts and titles, therefore the issue becomes more visible there.

The old SMF code in 2.0.17 works fine:

Load.php
'substr' => create_function('$string, $start, $length = null', '
global $smcFunc;
$ent_arr = preg_split(\'~(&#' . (empty($modSettings['disableEntityCheck']) ? '\d{1,7}' : '021') . ';|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '') . '\', ' . implode('$string', $ent_check) . ', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
return $length === null ? implode(\'\', array_slice($ent_arr, $start)) : implode(\'\', array_slice($ent_arr, $start, $length));'),


The new code in 2.0.18 does not:

'substr' => function($string, $start, $length = null) use ($utf8, $ent_check, $ent_list, $modSettings)
{
    $ent_arr = preg_split('~(' . $ent_list . '|.)~' . ($utf8 ? 'u' : '') . '', $ent_check($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    return $length === null ? implode('', array_slice($ent_arr, $start)) : implode('', array_slice($ent_arr, $start, $length));
},


This should be patched in 2.0.19.
This code works correctly:


        'substr' => function($string, $start, $length = null) use ($utf8, $ent_check, $ent_list, $modSettings)
        {
            $ent_arr = preg_split('~(&#' . (empty($modSettings['disableEntityCheck']) ? '\d{1,7}' : '021') . ';|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '') . '', $ent_check($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
            return $length === null ? implode('', array_slice($ent_arr, $start)) : implode('', array_slice($ent_arr, $start, $length));
        },
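
To illustrate the symptom, here is a minimal standalone sketch. It is not SMF code: demo_substr and both list variables are hypothetical stand-ins, and $ent_check is left out entirely. The nested-group pattern is only an assumption about how $ent_list might be built (its definition is not shown in this topic); the flat pattern is the alternation from the working code above with the entity check enabled.

```php
<?php
// Hypothetical demo, not taken from Load.php.

// ASSUMPTION: an entity list with its own capturing group inside.
// The real $ent_list from 2.0.18 is not shown in this topic.
$nested_list = '&(#\d{1,7}|quot|amp|lt|gt|nbsp);';

// Flat alternation, as in the working code (entity check enabled).
$flat_list = '&#\d{1,7};|&quot;|&amp;|&lt;|&gt;|&nbsp;';

// Entity-aware substr: split the string into "characters",
// where a whole entity such as &amp; counts as one character.
function demo_substr($ent_list, $string, $start, $length = null, $utf8 = true)
{
    $ent_arr = preg_split(
        '~(' . $ent_list . '|.)~' . ($utf8 ? 'u' : ''),
        $string,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    return $length === null
        ? implode('', array_slice($ent_arr, $start))
        : implode('', array_slice($ent_arr, $start, $length));
}

$subject = 'Q &amp; A';

// The inner group is captured as well, so "amp" is returned a second time.
$broken = demo_substr($nested_list, $subject, 0);

// Only the outer group captures, so each entity is returned once.
$fixed = demo_substr($flat_list, $subject, 0);

echo $broken, "\n"; // Q &amp;amp A  (a browser shows: Q &amp A)
echo $fixed, "\n";  // Q &amp; A     (a browser shows: Q & A)
```

If a nested capturing group is indeed the cause, making it non-capturing with (?: ... ) should behave like the flat list, because preg_split with PREG_SPLIT_DELIM_CAPTURE only returns the text of capturing groups.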
Title: Re: 2.0.18 issue with 'substr' function and UTF8 characters
Post by: shadav on May 08, 2021, 09:28:39 AM
Is this the same issue as this one:
https://www.simplemachines.org/community/index.php?topic=576612.0

If so, the patch is here:
https://www.simplemachines.org/community/index.php?topic=576612.msg4081408#msg4081408
Title: Re: 2.0.18 issue with 'substr' function and UTF8 characters
Post by: shawnb61 on May 08, 2021, 03:01:14 PM
Yes, I believe it's the same.
Title: Re: 2.0.18 issue with 'substr' function and UTF8 characters
Post by: @rjen on May 08, 2021, 03:15:40 PM
Just checked the patch: different solution, but the same results... so yes, that fixes it too.
Title: Re: 2.0.18 issue with 'substr' function and UTF8 characters
Post by: dcmouser on August 25, 2021, 04:58:26 AM
Was just coming here to report the same problem; thanks for the fix!