Author Topic: 2.0.18 issue with 'substr' function and UTF8 characters  (Read 3168 times)

Online @rjen

  • Sr. Member
  • ****
  • Posts: 761
  • Gender: Male
    • FJR-club Nederland
2.0.18 issue with 'substr' function and UTF8 characters
« on: May 08, 2021, 03:52:52 AM »
Found an issue in 2.0.18.

The cause is the changed 'substr' function in Load.php.
It was updated in 2.0.18, and since that change special characters such as &, ", < and > are processed incorrectly.

Where does this appear?

We first noticed this in the recent topics block in TinyPortal. This block uses the output of the PHP function ssi_recentTopics, specifically the short_subject string it provides: if the subject of a message contains an &, short_subject presents it as &amp.
The same happens with " (shown as "quot) and < (shown as <lt).

It seems that the latest change in this code no longer takes into account that SMF 2.0 is not all UTF-8, so these characters are no longer handled correctly.
TinyPortal relies on the substr function in SMF to shorten topic texts and titles, so the issue becomes more visible there.

The old SMF code in 2.0.17 works fine:

Load.php
Code: [Select]
'substr' => create_function('$string, $start, $length = null', '
global $smcFunc;
$ent_arr = preg_split(\'~(&#' . (empty($modSettings['disableEntityCheck']) ? '\d{1,7}' : '021') . ';|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '') . '\', ' . implode('$string', $ent_check) . ', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
return $length === null ? implode(\'\', array_slice($ent_arr, $start)) : implode(\'\', array_slice($ent_arr, $start, $length));'),

The new code in 2.0.18 does not:

Code: [Select]
'substr' => function($string, $start, $length = null) use ($utf8, $ent_check, $ent_list, $modSettings)
{
$ent_arr = preg_split('~(' . $ent_list . '|.)~' . ($utf8 ? 'u' : '') . '', $ent_check($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
return $length === null ? implode('', array_slice($ent_arr, $start)) : implode('', array_slice($ent_arr, $start, $length));
},

This should be patched in 2.0.19.
This code works correctly:

Code: [Select]
        'substr' => function($string, $start, $length = null) use ($utf8, $ent_check, $ent_list, $modSettings)
        {
            $ent_arr = preg_split('~(&#' . (empty($modSettings['disableEntityCheck']) ? '\d{1,7}' : '021') . ';|&quot;|&amp;|&lt;|&gt;|&nbsp;|.)~' . ($utf8 ? 'u' : '') . '', $ent_check($string), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
            return $length === null ? implode('', array_slice($ent_arr, $start)) : implode('', array_slice($ent_arr, $start, $length));
        },
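
For anyone who wants to reproduce the symptom outside SMF, here is a minimal standalone sketch. It rests on an assumption that this post does not show: that $ent_list in 2.0.18 contains its own capturing group. With PREG_SPLIT_DELIM_CAPTURE, preg_split returns every capturing group, so a nested group would add an extra "amp", "quot" or "lt" element after each entity, which matches what we see. The function entity_substr and both patterns below are hypothetical and written only for illustration; they are not copied from Load.php.

```php
<?php
// Hypothetical stand-in for the 'substr' closure: splits the string into
// entities and single characters, then slices that list.
function entity_substr(string $string, int $start, ?int $length, string $entityPattern): string
{
    $parts = preg_split(
        '~(' . $entityPattern . '|.)~u',
        $string,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // array_slice() with a null length returns everything from $start on.
    return implode('', array_slice($parts, $start, $length));
}

// ASSUMPTION: an entity list with a nested capturing group.
$nested = '&(#\d{1,7}|quot|amp|lt|gt|nbsp);';

// Same list with a non-capturing group (?:...), so only the outer group is returned.
$flat = '&(?:#\d{1,7}|quot|amp|lt|gt|nbsp);';

$subject = 'A &amp; B';

echo entity_substr($subject, 0, null, $nested), "\n"; // A &amp;amp B
echo entity_substr($subject, 0, null, $flat), "\n";   // A &amp; B
echo entity_substr($subject, 0, 3, $flat), "\n";      // A &amp;
```

In a browser the first result renders as "A &amp B", which is the output described above. The pattern from 2.0.17 has no nested group, which would explain why restoring it makes the problem go away.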
Running SMF 2.0 with Tinyportal 2.0.1 at www.fjr-club.nl
Testing SMF 2.1 with Tinyportal 2.1.0 at test2.fjr-club.nl


Offline shawnb61

  • Developer
  • SMF Hero
  • *
  • Posts: 3,319
    • sbulen on GitHub
Re: 2.0.18 issue with 'substr' function and UTF8 characters
« Reply #2 on: May 08, 2021, 03:01:14 PM »
Yes, I believe it's the same.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Online @rjen

Re: 2.0.18 issue with 'substr' function and UTF8 characters
« Reply #3 on: May 08, 2021, 03:15:40 PM »
Just checked the patch: a different solution, but the same result. So yes, that fixes it too.