XML functions broken for PHP 5.3.10

Started by RetroX, March 24, 2012, 02:41:03 PM

RetroX

It appears that the cleanXml() function (QueryString.php) is broken under SMF 2.0.2 when using a Unicode encoding. I've found that this code doesn't work on my system:
preg_replace('~[\x00-\x08\x0B\x0C\x0E-\x19' . ($context['utf8'] ? (@version_compare(PHP_VERSION, '4.3.3') != -1 ? '\x{D800}-\x{DFFF}\x{FFFE}\x{FFFF}' : "\xED\xA0\x80-\xED\xBF\xBF\xEF\xBF\xBE\xEF\xBF\xBF") : '') . ']~' . ($context['utf8'] ? 'u' : ''), '', $string)
It returns the following error:
Warning: preg_replace(): Compilation failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 34

I'm using an Arch Linux system locally, which is why I have such a bleeding-edge PHP version. On most servers, you won't find a version of PHP that's this up-to-date, but it's still a very big issue.

I've tried running the code outside of SMF, and it does appear to work when $context['utf8'] == false; however, I'm assuming that this still isn't the intended behaviour.

Is this just an error in the setup on my system, or is this an error in SMF? I don't think that forcing $context['utf8'] to be false within this function would be a good idea, because quite honestly, I don't know what it does.

Essentially, this error breaks all of the XML functions of the forum, such as quick edit, quoting, auto-suggest, etc.

Spuds

Well that stinks ! ... looks like it's a change in the PCRE engine starting at version 8.30, where it will throw that warning if you specify the surrogate range D800–DFFF.  These are not technically characters, and are invalid in UTF-8 (strict) ... which is why they were stripped ... so it looks like preg now complains  :-X

High Surrogates (D800–DB7F)
High Private Use Surrogates (DB80–DBFF)
Low Surrogates (DC00–DFFF)
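
As a rough illustration of why those code points count as invalid, here is a minimal sketch that asks PCRE itself whether a string is valid UTF-8. The helper name is_valid_utf8_sketch() is made up for this example and is not part of SMF; it relies on preg_match() returning false when the u modifier is used on a subject that is not valid UTF-8.

```php
<?php
// Hypothetical helper (not an SMF function): true if PCRE accepts $string
// as valid UTF-8. The empty pattern matches any valid subject.
function is_valid_utf8_sketch($string)
{
    // With the u modifier PCRE validates the subject before matching;
    // preg_match() returns false if that validation fails.
    return @preg_match('//u', $string) === 1;
}

// U+D800 (a high surrogate) written out as the three bytes ED A0 80.
$surrogate = "\xED\xA0\x80";

var_dump(is_valid_utf8_sketch('plain text'));
var_dump(is_valid_utf8_sketch($surrogate));
```

The second call should report false, which is the same strictness that now also applies to the pattern.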

Do the following replace and see if it fixes the problem on your setup. I don't have a setup to test it on ATM.

Code (find)
return preg_replace('~[\x00-\x08\x0B\x0C\x0E-\x19' . ($context['utf8'] ? (@version_compare(PHP_VERSION, '4.3.3') != -1 ? '\x{D800}-\x{DFFF}\x{FFFE}\x{FFFF}' : "\xED\xA0\x80-\xED\xBF\xBF\xEF\xBF\xBE\xEF\xBF\xBF") : '') . ']~' . ($context['utf8'] ? 'u' : ''), '', $string);

Code (replace)
return preg_replace('~[\x00-\x08\x0B\x0C\x0E-\x19' . ($context['utf8'] ? '\x{FFFE}\x{FFFF}' : '') . ']~' . ($context['utf8'] ? 'u' : ''), '', $string);
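
For anyone who wants to try the replacement pattern outside SMF first, here is a minimal standalone sketch. The function name clean_xml_sketch() and its $utf8 parameter are assumptions made for this example; the real cleanXml() reads $context['utf8'] instead. The character ranges are copied from the replacement line above.

```php
<?php
// Hypothetical standalone version of the patched expression. SMF's own
// cleanXml() uses the global $context['utf8'] rather than a parameter.
function clean_xml_sketch($string, $utf8 = true)
{
    // Control characters stripped by the original code.
    $class = '\x00-\x08\x0B\x0C\x0E-\x19';

    // In UTF-8 mode also strip U+FFFE and U+FFFF. The surrogate range
    // U+D800-U+DFFF is left out on purpose, because newer PCRE builds
    // refuse to compile it.
    if ($utf8) {
        $class .= '\x{FFFE}\x{FFFF}';
    }

    return preg_replace('~[' . $class . ']~' . ($utf8 ? 'u' : ''), '', $string);
}

// A NUL byte and a vertical tab are removed; ordinary text is untouched.
var_dump(clean_xml_sketch("a\x00b\x0Bc"));
var_dump(clean_xml_sketch('quick edit'));
```

If this runs without a compilation warning on your server, the same pattern should be safe to use in QueryString.php.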

RetroX


reklipz

I'm having this same/similar issue.  When clicking "Insert Quote" while forming a reply, this error is generated and nothing seems to happen.

Quote
FreeBSD c16 8.2-RELEASE-p5 FreeBSD 8.2-RELEASE-p5 #13: Thu Dec 29 10:33:00 UTC 2011 root@x3:/usr/obj/usr/src/sys/NFSN32 i386

How do I go about applying your fix to this specific case?  I'm familiar with PHP, etc., but the information in the post isn't leaving me with a clear picture of how to apply it for some reason...

:-[ -- I totally skipped the first line of the original post.  I think I've got it now, :).

So, this is a system-specific fix.  How long until we receive an officially supported fix?

butchs

Hey, I myself am happy to see a late monkey rather than no monkey at all.  Thank you for showing up!    :D
I have been truly inspired by the SUGGESTIONS as I sit on my throne and contemplate the wisdom imposed upon me.

reklipz

Quote from: Spuds on March 29, 2012, 08:29:13 PM
Welcome to SMF

It looks to be more of a PCRE library issue than a system-specific issue ... if PHP is compiled against the latest external version of the PCRE library, it looks like that causes this warning.
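
A quick way to check which case a given server falls into is sketched below. PCRE_VERSION is a standard PHP constant; the helper name pcre_accepts_surrogates() is made up for this example and is not part of SMF.

```php
<?php
// Show which PCRE library this PHP build is actually using.
echo 'PHP ', PHP_VERSION, ' / PCRE ', PCRE_VERSION, "\n";

// Hypothetical helper: true if this PCRE build still compiles a UTF-8
// character class containing the surrogate range, false if it rejects it.
function pcre_accepts_surrogates()
{
    // @ hides the compile warning; preg_match() returns false on failure.
    return @preg_match('~[\x{D800}-\x{DFFF}]~u', '') !== false;
}

echo pcre_accepts_surrogates()
    ? "The old pattern still compiles here.\n"
    : "The old pattern is rejected here; the fix is needed.\n";
```

Probing the pattern directly avoids guessing from version numbers alone, since distributions may ship PHP with a different PCRE than the bundled one.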

It will be fixed in SMF 2.1, I'm not sure if there will be a maintenance release (2.0.3) before then or if this fix would make it in to that TBH.

If you're looking for timelines, well, you came to the wrong place :P ... things happen as they can around here, never sooner.  We are just happy that people are willing to volunteer their time to do all this for free :)

Understood, :).  No complaints here.  I was just interested in how things work around here.  Everything is working fine on my end now, so I'm in no hurry.  Thanks for the fix!

Norv

Just to note: I have yet to look at this closely, but as far as I can see we're including the fix for this in 2.0.3. Thank you for the reports and investigation! :)
To-do lists are for deferral. The more things you write down the later they're done... until you have 100s of lists of things you don't do.


Kryten15

I have the same issue in terms of getting that same error when trying to use the quick quote, etc. I have tried the above fix but it did not work. Instead I got an error about an unexpected return on that line.

I have a moderate knowledge of general coding but I'm far from being an expert, and I have limited knowledge of PHP. Is there something I am missing that could help fix this issue?

If the fix will be included in 2.0.3 is there any approximate timeframe for that, or is it best to seek a fix before 2.0.3 is released?

EDIT: Apologies. I did a copy and paste job on the fix suggested by Spuds above but it inserted a second return statement which I didn't pick up on. The fix seems to work for me now.

4LP3RUZ1

Quote from: Spuds on March 24, 2012, 07:54:55 PM
Old topic, I know, but after a Plesk update a few days ago (I am assuming, because I changed nothing else...) I ran into this same problem and the above took care of it. Thanks!
Frozen frogs are back :(

Aaruni

I am getting the following error when I try to quote or quick-edit:

2: preg_replace(): Compilation failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 34

I have recently made my database UTF-8, and converted HTML elements to UTF-8 characters.

emanuele

If you have SMF 2.0, the fix proposed here is included in the recent patch, so update your forum and the issue should be fixed.

Take a peek at what I'm doing! ;D

Need support in Italian?

Help us help you: explain your problem clearly. No, "it doesn't work" is not an explanation!!
1) What you do,
2) what you expect,
3) what you get.

Arvacon

Actually, if you make this change and then try to update SMF to the new version, you get an error about this file and this specific change.
I tried ignoring that error, and my local test installation works without problems.
But because I don't like ignoring errors, on my live installation I restored the old QueryString.php file that I had kept (from before I made any change to it), so when I applied the patch, I didn't get any errors.
Of course, after restoring the old file, quick edit stopped working again, but once the patch was installed, the code had been changed to what was suggested here and quick edit works again.

I'm just writing this in case someone runs into the same trouble I did; I hope it saves you some time.  ;)
