Simple Machines Community Forum

SMF Development => Feature Requests => Applied or Declined Requests => Topic started by: PokémonS on January 07, 2013, 07:08:59 AM

Title: Censored word should not censoring URL
Post by: PokémonS on January 07, 2013, 07:08:59 AM
SMF 2.0.2
Hi, when I set up my censored words, some URLs get censored too, even though I checked this option:

(http://puu.sh/1KLCR)

I type this:
[url=http://arkeis-pokemon.deviantart.com/]pokemon Arkeis[/url]

And this is the output:
[url=http://arkeis-Pokémon.deviantart.com/]Pokémon Arkeis[/url]




Well, it should be like this:
[url=http://arkeis-pokemon.deviantart.com/]Pokémon Arkeis[/url]

So, a URL inside a [url] tag should not be censored.




Aaand what would happen if...
[img]http://puu.sh/hore[/img]
[url=http://www.youtube.com/watch?v=v8YHorEjV_k]Youtube Video[/url]


There is a h*re word in the URL D:
Title: Re: Censored word should not censoring URL
Post by: Kindred on January 07, 2013, 07:38:28 AM
the system is doing exactly what you asked...   it is checking the whole word (pokemon) and ignoring the case to set it to what you have defined.

and why should the censor ignore urls?
many admins actually use the censor to force certain url corrections or updates.
Title: Re: Censored word should not censoring URL
Post by: PokémonS on January 07, 2013, 07:47:32 AM
Quote from: Kindred on January 07, 2013, 07:38:28 AM
the system is doing exactly what you asked...   it is checking the whole word (pokemon) and ignoring the case to set it to what you have defined.

and why should the censor ignore urls?
many admins actually use the censor to force certain url corrections or updates.

How about adding an Ignore URL tickbox?
Some admins need to exempt URLs from the censored words ._.
Title: Re: Censored word should not censoring URL
Post by: Kindred on January 07, 2013, 09:02:56 AM
seems like an unusual situation kinda specific to your own personal setup. I have never seen anyone else complain about it before.

So, the extra processing is probably not worth it.
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 07, 2013, 09:04:20 AM
The extra processing is very seriously not worth it because of the amount of processing that would actually be required.
Title: Re: Censored word should not censoring URL
Post by: dimspace on January 09, 2013, 05:13:41 PM
This is a long standing problem on cycling forums when you want to link to an article on spaziociclismo.it :D
Title: Re: Censored word should not censoring URL
Post by: Kindred on January 09, 2013, 08:45:31 PM
Quote from: Kindred on January 07, 2013, 07:38:28 AM
the system is doing exactly what you asked...   it is checking the whole word (pokemon) and ignoring the case to set it to what you have defined.

and why should the censor ignore urls?
many admins actually use the censor to force certain url corrections or updates.

in other words... it is not actually a problem
Title: Re: Censored word should not censoring URL
Post by: oaqm on January 14, 2013, 10:56:42 PM
Actually, I have gotten complaints about the word censor messing up urls and wouldn't mind an option to "exempt" urls.

In my case, my board automagically turns "Duke" into "Duke LaCrosse University". Obviously, whenever anyone links to an online story about Duke, the url is almost always trashed, which leads to much wailing and gnashing of teeth and rending of garments and so on and so forth.

Don't even ask me what "NFL" turns into, children might be reading this.......
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 14, 2013, 10:59:25 PM
Aside from the fact that it would seriously tank performance on your site, you could just uncensor Duke, of course...
Title: Re: Censored word should not censoring URL
Post by: oaqm on January 15, 2013, 08:23:55 AM
That will not happen on my watch. Duke EARNED that one, an unforced error in the name of political correctness that they are still chewing on.

This is the Feature Request forum, right?

I'll run my board my way and I will accord you that honor as well. The request for a url exemption to the word censor strikes me as worthy on its face. If the gurus of SMF say it can't be done, then it can't be done, but this IS the Feature Request forum, right??
Title: Re: Censored word should not censoring URL
Post by: Kindred on January 15, 2013, 08:31:15 AM
almost anything CAN be done...
the question is - "Is it worth doing"

Arantor is one of the gurus of gurus with SMF code...   and I believe him when he says that such a change would adversely affect performance.
Additionally, as I have pointed out, contrary to your request, many admins actually use the censor to force a redirect and/or correction of URLs... 

So, I don't see any reason to do this. If you don't want the url words replaced, then change the censored word... or get someone to make a mod (honestly, the way most features make it into releases is by starting as mods and becoming popular enough to be noticed as frequently implemented updates).
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 15, 2013, 10:05:03 AM
Well, making it an option wouldn't prohibit the (many more) admins who use it for correction.

The problem is actually doing the detection. Censoring is deliberately done in a naive (in code this has a specific meaning) fashion: it knows nothing about the content it is working on. Now imagine that for every term you search for, instead of merely going forwards, letter by letter, looking to see if you matched something, you also have to search backwards through a fairly surprising number of permutations to work out whether you are inside a URL.

Now do this 30-40 times a page.
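
To illustrate the difference described above, here is a minimal Python sketch. It is not SMF's code (SMF is written in PHP), and the names `CENSOR`, `URL_RE`, `censor_naive` and `censor_skip_urls` are made up for the example. The URL pattern is a deliberately crude assumption; a real post would also need handling for [url=...] and [img] tags, which is where the extra work comes from.

```python
import re

# Hypothetical censor list; the "duke" entry is borrowed from this thread.
CENSOR = {"duke": "Duke LaCrosse University"}

# Crude URL detector, an assumption for this sketch only.
URL_RE = re.compile(r"https?://\S+")


def censor_naive(text):
    """One forward pass per censored term; knows nothing about URLs."""
    for word, replacement in CENSOR.items():
        # A function replacement avoids backslash handling in the new text.
        text = re.sub(re.escape(word), lambda m, r=replacement: r,
                      text, flags=re.IGNORECASE)
    return text


def censor_skip_urls(text):
    """Find the URLs first, then censor only the text between them."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        parts.append(censor_naive(text[last:match.start()]))
        parts.append(match.group(0))  # the URL is copied through untouched
        last = match.end()
    parts.append(censor_naive(text[last:]))
    return "".join(parts)


post = "Duke won: http://www.example.com/duke/story"
print(censor_naive(post))
print(censor_skip_urls(post))
```

The second function has to locate every URL before it can censor anything, and that is only the plain-link case. Whether that cost matters in practice depends on the board; the sketch only shows where the additional pass comes from.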
Title: Re: Censored word should not censoring URL
Post by: kat on January 15, 2013, 01:55:55 PM
When censoring, you're given the option to "Check only whole words:". Surely, that would preclude URLs, anyway, to some degree, wouldn't it?
Title: Re: Censored word should not censoring URL
Post by: Shambles on January 15, 2013, 01:56:44 PM
^-- I tried that earlier K@ and URLs got mashed just like regular [whole] words. Dunno why it did that tho.


Then again, I only tried it with censoring "fred" to "bill" and checking http://www.fred.com
Title: Re: Censored word should not censoring URL
Post by: kat on January 15, 2013, 01:58:57 PM
I might have to experiment, with that, then.
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 15, 2013, 01:59:32 PM
Not really. The option is an all-or-nothing deal; it's not per-word (and there's no good way to make it so without revamping the entire system).

Even with 'check only whole words' turned on, it still wouldn't prevent all URL damage. To reuse the earlier example of censoring 'duke' to something else: a URL such as www.example.com/duke/ would still match, because the check looks for the word book-ended by non-word characters, and / is one of those.

It's a lot more complicated than that, but that's the basic version. (The problem is that \W and \w match all sorts of funny characters. Especially if setlocale has been called, and then pretty much all bets are off)
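
A rough Python sketch of the "book-ended by non-word characters" check, to show why the slash case still matches. This is an illustration only, not SMF's actual pattern; `censor_whole_word` is a hypothetical helper. Note that in Python 3 `\w` is Unicode-aware for str patterns, so accented letters count as word characters too, which is one small instance of how broad that class can be.

```python
import re


def censor_whole_word(text, word, replacement):
    """Replace `word` only where no word character touches it on either side."""
    pattern = re.compile(
        r"(?<!\w)" + re.escape(word) + r"(?!\w)",  # not preceded or followed by \w
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: replacement, text)


# '/' and '.' are non-word characters, so these URLs are still rewritten.
print(censor_whole_word("www.example.com/duke/", "duke", "XXXX"))
print(censor_whole_word("www.duke.com", "duke", "XXXX"))
# A following letter, accented or not, blocks the match.
print(censor_whole_word("dukedom", "duke", "XXXX"))
```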
Title: Re: Censored word should not censoring URL
Post by: kat on January 15, 2013, 02:03:17 PM
I just censored "tla", replacing it with "lta", ensuring that I checked "Check only whole words:".

My forum has NOT changed this URL:

http://www.tlakoc.org.uk

I guess different things react in different ways, ay?

I substituted the entire "tlakoc" bit and it censored it, so it must see the "tlakoc" thing as a single word.
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 15, 2013, 02:08:00 PM
Well... yes... if 'check only whole words' is selected, tla is not a whole word, because it's followed by a character that is a word character (that's the exact case it's designed for, e.g. to not censor a certain word to 'butt', e.g. clbuttic)
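
The two test results in this thread (www.fred.com changed, www.tlakoc.org.uk untouched) can be reproduced with a small Python sketch. Again, this is an assumption-laden illustration rather than SMF's implementation, and `censor` with its `whole_words` flag is a hypothetical stand-in for the admin setting.

```python
import re


def censor(text, word, replacement, whole_words):
    """Case-insensitive replace, optionally limited to whole words."""
    core = re.escape(word)
    if whole_words:
        # Reject matches that touch a word character on either side.
        core = r"(?<!\w)" + core + r"(?!\w)"
    return re.sub(core, lambda m: replacement, text, flags=re.IGNORECASE)


# Without the whole-word check, the word inside "classic" is hit.
print(censor("classic", "ass", "butt", whole_words=False))
# With it, "classic" survives, and so does "tla" inside "tlakoc"...
print(censor("classic", "ass", "butt", whole_words=True))
print(censor("http://www.tlakoc.org.uk", "tla", "lta", whole_words=True))
# ...but "fred" sits between two dots, so it counts as a whole word.
print(censor("http://www.fred.com", "fred", "bill", whole_words=True))
```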
Title: Re: Censored word should not censoring URL
Post by: Shambles on January 15, 2013, 02:12:04 PM
That adjective - "book-ended" - is a superb summary of why my test failed, but the word-within-word passed the censor check.

I like that phrase. "book-ended". Brilliant.
Title: Re: Censored word should not censoring URL
Post by: kat on January 15, 2013, 02:15:48 PM
I guess it matters how the censor decides what a single word is defined as. Actually giving it that definition's gonna be a nightmare if, as I suspect, it's defined as "A word that's followed by a space or punctuation mark". What other definitions could it have, to stop it blocking-out URLs?

Nigh-on impossible, I'd guess.
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 15, 2013, 02:19:01 PM
No, it is as I said - a word is defined as a collection of characters that make up a word, with a character at each end that is not a character that can be part of a word. Except that the definition of a 'word character' is very, very broad and varies depending on language.
Title: Re: Censored word should not censoring URL
Post by: kat on January 15, 2013, 02:58:10 PM
I thought that's what I said, in a rather more garbled way. ;)

Either way, a 100% accurate censor is nigh-on impossible, especially where URLs are concerned, ay?
Title: Re: Censored word should not censoring URL
Post by: Arantor on January 15, 2013, 03:02:26 PM
Well, yes and no. The censor has no idea what a single word is. The concept physically does not exist in the way the censor works.

All it knows is that there are certain characters that can be part of a word, and a lot of other characters that can't. And that a 'word' is where it finds a run of the first kind, with one of the second kind before and after it.
Title: Re: Censored word should not censoring URL
Post by: Kindred on January 15, 2013, 03:24:52 PM
and k@...   your definition means that the duke in www.duke.com would be replaced.... since it is separated from the rest by periods...


Seriously, there is no good way to do it.
Title: Re: Censored word should not censoring URL
Post by: kat on January 15, 2013, 03:54:18 PM
No, I can see that.

Thing is, however it's done, something's bound to screw it up, innit?
Title: Re: Censored word should not censoring URL
Post by: Arantor on November 07, 2013, 06:22:47 PM
After re-reviewing this, it's clear that we can't have the censor work in this fashion. Unfortunately that does mean moving to the declined board :(