Word Censor List

Started by dougiefresh, November 05, 2013, 03:00:43 AM


dougiefresh

Link to Mod



WORD CENSOR LIST v1.5
By Dougiefresh -> Link to Mod



Introduction
So, you want to run a family-friendly community without any vulgar words appearing on your site. The easiest way to prevent that is to use SMF's word censor feature, but you have an empty list of words and don't want to spend an hour filling in every naughty word you know and some you don't.

Word Censor List will help you by adding a list of some commonly censored words and some uncommon ones to your forum.

Compatibility Notes
This mod was tested on SMF 2.0.5, but should work on earlier versions of SMF 2.0.x.  SMF 1.x is not and will not be supported.

Changelog
The changelog can be viewed at XPtsp.com.

License
Copyright (c) 2015 - 2018, Douglas Orend
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

dougiefresh

Updated to v1.1.  Upgrading from v1.0 to v1.1 is not necessary, as it does not replace the functionality provided, only fixes the settings installer.

TonyG

I have a list of censor words that I carry around from one family-oriented site to another. Interested in an update to the list you have in edit_db.php? Do you already have a place for this or some preferred mechanism for doing this?
Thanks!

Kindred

Quote from: dougiefresh on November 05, 2013, 03:00:43 AM
. The easiest way to prevent that is to use phpBB's word censor feature,

really? :)
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

margarett

If you're going to drive, don't drink. If you're going to drink... CALL ME!!!! :D

Quote: Over 90% of all computer problems can be traced back to the interface between the keyboard and the chair

dougiefresh

Always interested in submissions....  Please share!

Biology Forums

Always wanted a mod like this, thanks.

dougiefresh

Quote from: Kindred on December 22, 2014, 04:26:12 PM
Quote from: dougiefresh on November 05, 2013, 03:00:43 AM
. The easiest way to prevent that is to use phpBB's word censor feature,

really? :)
:o Whoops!!!  I meant that you should use SMF's word censor feature.....  Fixed that in the first post!  ::)  I guess I should admit that I copied the description from a phpBB mod and didn't pay that much attention....

TonyG

I just updated the list. Based on other entries, I added and modified a lot of words to include RegExp tests, but it doesn't look like any of those are working. I'm using the Advanced Censor mod, which uses the PHP function strstr, and Block Censor Words.

Has anyone here modified their filter to do regex tests with the censor list?

Thanks!

Arantor

Considering that the internals of the censor function already use regex, I wish you the *very* best of luck performing the rewrite required to make that work as intended.

TonyG

So am I to understand that this Word Censor List was invalid from the start?

I'll have to look at the regexp code because it doesn't look like it's working with the masks being used.

So which is it? Are we using the wrong kind of regex? Is the regex not working? Is there any documentation for the syntax supported by the current regex mechanism? If that's a preg_match, can we assume that if the word list in the database has a string that can be interpreted by preg_match, we'll get good censor matching?

And now that I'm thinking about this I'm thinking that the mods might be using strstr() while SMF might be using preg_match, which leaves text to get filtered in different ways along the chain of execution - that can just lead to confusion and embarrassment.

Let's not leave this unresolved - what SHOULD work there?

Thanks.

Arantor

I don't know what you understood from what I said, to be honest, but clearly there is some misunderstanding somewhere.

This adds them to the database in the way SMF's own interface does. This is then internally converted into a regex for processing purposes. It doesn't support full regex syntax for this reason. Hence my comment.

But multiple times I have seen comments... you clearly know best, of course. Best of luck to you.

TonyG

We do have a misunderstanding. I'm trying to understand how this stuff is working so that we can do better filtering.
I completely understand that this Word Censor List mod just inserts text strings into the database.
From there, what happens to each string?

The list already includes some strings with regexp. I just asked if that was valid or not.
You said "This is then internally converted into a regex for processing purposes. It doesn't support full regex syntax for this reason. "
OK, so what sort of conversion is done there? Knowing that will allow us to make better improvements to this list.

From the examples already in the list, it seemed to me that "b[4a@][!l][!l][0o][0o]n" should match balloon, b@l!00n, and b4l!0on. Is that not correct? If not then all I was saying is that a number of entries already in the list are bad and we need to change how this is approached.

TonyG

Coming back to this topic. Can anyone tell us exactly what Regex syntax is supported for words found in the censor list?

I see the Load.php code referred to by @Arantor:

if ($censor_vulgar == null)
{
    $censor_vulgar = explode("\n", $modSettings['censor_vulgar']);
    $censor_proper = explode("\n", $modSettings['censor_proper']);

    // Quote them for use in regular expressions.
    for ($i = 0, $n = count($censor_vulgar); $i < $n; $i++)
    {
        $censor_vulgar[$i] = strtr(preg_quote($censor_vulgar[$i], '/'), array('\\\\\\*' => '[*]', '\\*' => '[^\s]*?', '&' => '&amp;'));
        $censor_vulgar[$i] =
            (empty($modSettings['censorWholeWord']) ?
                '/' . $censor_vulgar[$i] . '/' :
                '/(?<=^|\W)' . $censor_vulgar[$i] . '(?=$|\W)/') .
            (empty($modSettings['censorIgnoreCase']) ? '' : 'i') .
            ((empty($modSettings['global_character_set']) ?
                $txt['lang_character_set'] :
                $modSettings['global_character_set']) === 'UTF-8' ? 'u' : '');

        if (strpos($censor_vulgar[$i], '\'') !== false)
        {
            $censor_proper[count($censor_vulgar)] = $censor_proper[$i];
            $censor_vulgar[count($censor_vulgar)] = strtr($censor_vulgar[$i], array('\'' => '&#039;'));
        }
    }
}

// Censoring isn't so very complicated :P.
$text = preg_replace($censor_vulgar, $censor_proper, $text);


I broke up that meaty assignment statement just for readability. I understand that's adjusting each word element to account for server-specific settings. But can anyone explain exactly what the reformatting code is doing which might preclude using Regex syntax in elements of $modSettings['censor_vulgar'] ?
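For anyone following along, here is a minimal sketch of what that quoted code appears to do to a single entry. `build_censor_pattern()` is a hypothetical helper written for this post, not an SMF function, and its boolean parameters are assumptions standing in for the `$modSettings` checks. The point it tries to show: `preg_quote()` escapes every regex metacharacter in the entry, so brackets end up as literal characters, and the only thing translated back into regex is `*`.

```php
<?php
// Hypothetical helper (not part of SMF) mirroring the Load.php snippet quoted above.
function build_censor_pattern(string $word, bool $wholeWord = false, bool $ignoreCase = true, bool $utf8 = true): string
{
    // preg_quote() turns brackets, dots, pipes, etc. into literals.
    // Only "*" is translated back: an escaped "\*" in the entry becomes a
    // literal star, a bare "*" becomes a lazy run of non-whitespace characters.
    $quoted = strtr(preg_quote($word, '/'), array(
        '\\\\\\*' => '[*]',
        '\\*' => '[^\s]*?',
        '&' => '&amp;',
    ));

    $pattern = $wholeWord
        ? '/(?<=^|\W)' . $quoted . '(?=$|\W)/'
        : '/' . $quoted . '/';

    return $pattern . ($ignoreCase ? 'i' : '') . ($utf8 ? 'u' : '');
}

$bracketed = build_censor_pattern('b[4a@][!l][!l][0o][0o]n');
$wildcard = build_censor_pattern('ball*n');

echo $bracketed, "\n";
var_dump(preg_match($bracketed, 'b@l!00n'));                  // int(0): the brackets were quoted
var_dump(preg_match($bracketed, 'b[4a@][!l][!l][0o][0o]n'));  // int(1): only the literal entry matches
var_dump(preg_match($wildcard, 'balloon'));                   // int(1): "*" works as a wildcard
```

If this sketch reflects the real behaviour, character classes in the list can never match anything useful, and entries would need to be rewritten with `*` or listed as separate spellings.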

Note: I just looked at the Advanced Censor mod. It does not process the $modSettings['censor_vulgar'] list using Regex as seen above. It looks for specific text:
if (strstr($pMessageBody, $vCensorVulgar[$i])) return true;

However, I believe that code could easily be retrofitted with the code from Load.php.

Thanks.

dougiefresh

Hmmmm....  I don't have a copy of version 1.0 of this mod, so I'm gonna have to figure something out regarding the broken censor list....

TonyG

I don't understand @dougiefresh.

To get the Regexp in your word list to work, I think one just needs to understand what's being done in that core code to each element before it does the final preg_replace. It might be helpful to write that data to a file to see what's been done to it. Then we can revise each element to confirm.

As to the Advanced Censor mod, it returns true before posting if the text contains a censored word. So all that's needed there is the same code from Load.php, and a final check:
if (preg_replace($censor_vulgar, $censor_proper, $text) !== $text) return true;
Someone should advise him that his mod is invalid if the wordlist contains Regexp. I guess I'll do this after we get through this discussion.
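A rough sketch of that check, for illustration only. `contains_censored_word()` is a hypothetical name, and the two sample patterns are hand-written in the shape the Load.php code produces; none of this is taken from the Advanced Censor mod itself.

```php
<?php
// Hypothetical pre-post check. $patterns are already-built regexes of the kind
// Load.php produces; $replacements are the matching "proper" words.
function contains_censored_word(string $text, array $patterns, array $replacements): bool
{
    // If censoring changes the text, at least one pattern matched.
    return preg_replace($patterns, $replacements, $text) !== $text;
}

// Sample patterns, assumed for this example (whole-word, case-insensitive).
$patterns = array('/(?<=^|\W)darn(?=$|\W)/i', '/(?<=^|\W)heck[^\s]*?(?=$|\W)/i');
$replacements = array('*', '*');

var_dump(contains_censored_word('Well, darn it', $patterns, $replacements));    // bool(true)
var_dump(contains_censored_word('A polite message', $patterns, $replacements)); // bool(false)
```

Two caveats with this approach: an entry whose replacement is identical to the censored word would go undetected, and `preg_replace()` returns null on a regex error, which this check would report as a match.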

HTH

dougiefresh

Uploaded v1.2 - January 12th, 2015
o Removed most wildcards from the word censor list.
o Corrected link to the mod in the descriptions.

TonyG

So is the answer to the ongoing question that regex is simply not supported at all for censored words?
If so, then removing the wildcards from the list in this mod is the right solution for this mod.

I think the better long-term solution however is to find out what regexp is possible in the code from Load.php, and then get words in the list to conform within the constraints.

metallicgloss

I installed this package and it is now turning all 'hello' into *o and it is REALLY ANNOYING.
I edited the file in the package, but nothing has changed. I reinstalled my forum and re-added a couple of packages, with the exception of this one. It is still doing it; it is doing something with the database. Where can I remove it so it no longer counts "hell" as a swear word?

dougiefresh

@metallicgloss: Go into the Admin panel, under Forum => Posts and Topics => Censored Words.  Put a check in the option saying Check only whole words:.  That should solve the problem....
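To illustrate why that option helps, here is a small sketch. The two patterns are written by hand in the shape the Load.php code quoted earlier in this topic builds them, and the `*` replacement is an assumption based on the report above.

```php
<?php
// Hand-written patterns, assumed to match what SMF builds for the entry "hell".
$anywhere  = '/hell/i';                    // "Check only whole words" off
$wholeWord = '/(?<=^|\W)hell(?=$|\W)/i';   // "Check only whole words" on

echo preg_replace($anywhere, '*', 'hello from hell'), "\n";   // *o from *
echo preg_replace($wholeWord, '*', 'hello from hell'), "\n";  // hello from *
```

With the option on, the lookbehind and lookahead require a non-word character (or the start or end of the text) on both sides, so "hello" is left alone.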

metallicgloss

@dougiefresh THANK YOU!!!!! This will really help the community. Thank you for your help.

KensonPlays

Thanks for the update! I had one by @Labradoodle-360, but I could not find it for the life of me. You're a life-saver. :P

Owner of Mesozoic Haven

dougiefresh

Uploaded v1.4 - April 5th, 2015
o Updated for SMF 2.1 Beta 1

skb

I uninstalled the mod, yet the words remain in the Censored Word List?

SMF 2.1.4 / TP 2.2.2

Arantor

That would make sense, the mod just adds to the existing censor list.

dougiefresh

Uploaded v1.5 - November 9th, 2018
o No functionality change.
o Updated documentation to point to new website.

skb

I have the "Allow users to turn off word censoring:" option enabled, but I don't see the setting in My Profile / Account Settings / Forum Profile or Look and Layout where a user can exercise this option.

SMF 2.1.4 / TP 2.2.2

landyvlad

Can I use this to add all the words etc to my censored word list and then delete the mod to clean up but leave all those  words in the censor list?
"Put as much effort into your question as you'd expect someone to give in an answer"

Please do not PM, IM or Email me with questions on astrophysics or theology.  You will get better and faster responses by asking homeless people in the street. Thank you.

Be the person your dog thinks you are.

efk

Quote from: landyvlad on October 01, 2021, 12:40:43 AM
Can I use this to add all the words etc to my censored word list and then delete the mod to clean up but leave all those  words in the censor list?
I believe that you don't need a mod for that. Simply go to Admin/Forum/Posts and Topics.../Censored Words and do what you need to do; it's simple to use.

Kindred

efk --   the thing is, this mod simplifies the adding of a WHOLE BUNCH of words rather than entering them by hand.

landyvlad -- based on the comments above, the answer is yes.

shadav

yes you don't need the mod, in fact I only installed the mod on one of my forums then set up the censored words how I wanted then carefully extracted them from the settings table and imported them into the rest of my forums  ;D
don't uninstall the mod though
but you can delete the mod as all this really does is run an sql to your db

it is very helpful as it has a lot of words in its list and a lot of variations of said words and in other languages as well.
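For anyone wanting to copy lists between forums by hand, here is a rough sketch of the merge step. It assumes the censor list is stored as two newline-delimited settings, `censor_vulgar` and `censor_proper`, which is what the Load.php code quoted earlier in this topic reads. `merge_censor_lists()` is a hypothetical helper, not part of SMF or this mod, and reading or writing the settings table is left out.

```php
<?php
// Hypothetical helper: merge forum B's censor lists into forum A's.
// Each argument is the raw newline-delimited setting value.
function merge_censor_lists(string $vulgarA, string $properA, string $vulgarB, string $properB): array
{
    $merged = array();
    foreach (array(array($vulgarA, $properA), array($vulgarB, $properB)) as $pair) {
        $vulgar = explode("\n", $pair[0]);
        $proper = explode("\n", $pair[1]);
        foreach ($vulgar as $i => $word) {
            // Skip blank entries and keep the first replacement seen for a word.
            if ($word !== '' && !array_key_exists($word, $merged)) {
                $merged[$word] = $proper[$i] ?? '';
            }
        }
    }

    return array(
        'censor_vulgar' => implode("\n", array_keys($merged)),
        'censor_proper' => implode("\n", $merged),
    );
}

$result = merge_censor_lists("darn\nheck", "*\n*", "heck\ndrat", "h***\n#");
var_dump($result['censor_vulgar'] === "darn\nheck\ndrat"); // bool(true)
var_dump($result['censor_proper'] === "*\n*\n#");          // bool(true)
```

Keeping the two lists the same length and in the same order matters, because each word is paired with its replacement by position.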

landyvlad

Quote from: shadav on October 05, 2021, 12:24:16 PM
don't uninstall the mod though
but you can delete the mod as all this really does is run an sql to your db

Why not uninstall?  Would that remove the words from the censored list?

shadav

I'm not entirely sure but I think it removes them from the db if you uninstall...looking at the files it appears to remove the words but again I'm not entirely sure
