Simple Machines Community Forum

SMF Development => Bug Reports => Fixed or Bogus Bugs => Topic started by: Arantor on May 15, 2010, 03:07:54 PM

Title: [4368]Word censor should disable option to be limited to whole words in UTF-8
Post by: Arantor on May 15, 2010, 03:07:54 PM
The word censor provides an option to restrict censor matches to whole words, or to simply match in place.

When matching whole words, this relies on the \w marker (or is it \W, I forget; either way it's a PCRE escape sequence), which is fine -- until you're in UTF-8 mode.

This is covered in http://www.simplemachines.org/community/index.php?topic=363219.msg2612779#msg2612779; specifically, I'll requote the paragraph from the PHP manual on the subject:
Quote: Matching characters by Unicode property is not fast, because PCRE has to search a structure that contains data for over fifteen thousand characters. That is why the traditional escape sequences such as \d and \w do not use Unicode properties in PCRE.

Since PCRE's \w does not match non-ASCII letters in UTF-8 mode, the option is effectively pointless there (whole-word matching doesn't work correctly), so it should be removed when in UTF-8 mode.
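
To make the failure concrete, here is a rough sketch of the effect. It is written in Python rather than PHP, and Python's re module is not PCRE: the re.ASCII flag is used only to imitate an ASCII-only \w, which is an assumption about how closely the two behave. The function name censor_whole_word and the sample word are hypothetical, not taken from SMF.

```python
import re

def censor_whole_word(text, vulgar, proper, ascii_only):
    """Replace `vulgar` with `proper` only where it stands as a whole word.

    Hypothetical helper for illustration; not SMF's actual censor code.
    ascii_only=True imitates a \\w that knows only ASCII word characters.
    """
    flags = re.ASCII if ascii_only else 0
    # A "whole word" here means: no word character directly before or after.
    pattern = r'(?<!\w)' + re.escape(vulgar) + r'(?!\w)'
    return re.sub(pattern, proper, text, flags=flags)

sample = "the caf\u00e9 is nice"  # "café" with a precomposed e-acute

# ASCII-only \w: the accented letter counts as a word boundary,
# so "caf" is wrongly treated as a whole word and censored.
print(censor_whole_word(sample, "caf", "***", ascii_only=True))

# Unicode-aware \w: the accented letter is a word character,
# so "caf" is only part of a longer word and is left alone.
print(censor_whole_word(sample, "caf", "***", ascii_only=False))
```

With the ASCII-only setting the censor fires in the middle of a word, which is exactly what the whole-word option is supposed to prevent.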
Title: Re: Word censor should disable option for being limited to whole words in UTF-8
Post by: Norv on July 18, 2010, 05:01:26 AM
Tracked.
Title: Re: [4368]Word censor should disable option to be limited to whole words in UTF-8
Post by: ziycon on January 14, 2014, 04:32:12 AM
Moved to GitHub.
https://github.com/SimpleMachines/SMF2.1/issues/1189