[4368] Word censor should disable option to be limited to whole words in UTF-8

Started by Arantor, May 15, 2010, 03:07:54 PM

Arantor

The word censor provides the option of restricting censor matches to whole words, or simply matching the word wherever it appears.

When matching whole words, this relies on using the \w marker (or is it \W, I forget, either way it's a PCRE control character), which is fine -- until you're in UTF-8 mode.

This is covered in http://www.simplemachines.org/community/index.php?topic=363219.msg2612779#msg2612779 and, specifically, I'll requote the paragraph from the PHP manual on the subject:
Quote: Matching characters by Unicode property is not fast, because PCRE has to search a structure that contains data for over fifteen thousand characters. That is why the traditional escape sequences such as \d and \w do not use Unicode properties in PCRE.

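Since \w (and \W) in PCRE only recognise ASCII word characters even in UTF-8 mode, accented and other non-ASCII letters are treated as word boundaries. That makes the whole-word option effectively pointless there (it doesn't work reliably), so the option should be removed when in UTF-8 mode.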
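
To make the failure concrete, here's a rough sketch of the effect. It is not SMF's code and it is not PCRE: it uses Python's re module, with the re.ASCII flag standing in for an ASCII-only \w / \W, and the function name, pattern and sample words are all made up for illustration.

```python
import re

def censor_whole_word(text, bad_word, replacement="***", ascii_only=True):
    """Hypothetical whole-word censor, for illustration only.

    ascii_only=True imitates a \\W that knows only ASCII word characters
    (the behaviour described in the PHP manual quote); ascii_only=False
    uses Python's Unicode-aware \\W for comparison.
    """
    flags = re.IGNORECASE | (re.ASCII if ascii_only else 0)
    # "Whole word" here means: preceded by start-of-text or a non-word
    # character, and followed by end-of-text or a non-word character.
    pattern = r"(^|\W)" + re.escape(bad_word) + r"(?=$|\W)"
    # Keep whatever preceded the word and swap in the replacement.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text, flags=flags)

# Plain ASCII text behaves as expected either way.
print(censor_whole_word("a test here", "test"))

# With an ASCII-only \W, "é" counts as a non-word character, so "test"
# is wrongly treated as a whole word inside "détest".
print(censor_whole_word("détest", "test", ascii_only=True))

# With a Unicode-aware \W, "é" is a letter and the word is left alone.
print(censor_whole_word("détest", "test", ascii_only=False))
```

The middle case is the one that matters: the censor fires inside a longer word purely because the neighbouring letter isn't ASCII.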

Norv
