Censored words should not censor URLs

Started by PokémonS, January 07, 2013, 07:08:59 AM


PokémonS

SMF 2.0.2
Hi, when I set up my censored words, some URLs get censored too, even though I checked this:



I type this:
[url=http://arkeis-pokemon.deviantart.com/]pokemon Arkeis[/url]

And this is the output:
[url=http://arkeis-Pokémon.deviantart.com/]Pokémon Arkeis[/url]




Well, it should look like this:
[url=http://arkeis-pokemon.deviantart.com/]Pokémon Arkeis[/url]

So, the URL inside a [url] tag should not be censored.




Aaand what would happen if...
[img]http://puu.sh/hore[/img]
[url=http://www.youtube.com/watch?v=v8YHorEjV_k]Youtube Video[/url]


Both of those URLs contain "h*re" D:
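Let's hold hands with you, and when times are hard, even more so
Let's start again from zero, come on, come on, let's hold hands
Everyone, let's hold hands, and when times are hard, even more so
Let's join our strength, come on, come on, let's hold hands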

Kindred

the system is doing exactly what you asked...   it is checking the whole word (pokemon) and ignoring the case to set it to what you have defined.
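A rough way to picture what Kindred describes is the Python sketch below. It assumes the censor is a plain regex substitution that knows nothing about BBCode, with "whole word" taken to mean "no word character directly before or after"; the `censor()` function and its parameters are hypothetical, not SMF's own PHP code.

```python
import re


def censor(text: str, vulgar: str, proper: str,
           whole_word: bool = True, ignore_case: bool = True) -> str:
    """Hypothetical naive censor: swap every match of `vulgar` for `proper`.

    It only sees characters, so a URL is censored like any other text.
    """
    pattern = re.escape(vulgar)
    if whole_word:
        # "Whole word": no word character (\w) directly before or after.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    flags = re.IGNORECASE if ignore_case else 0
    # A function as the replacement stops re from interpreting backslashes in `proper`.
    return re.sub(pattern, lambda _m: proper, text, flags=flags)


post = "[url=http://arkeis-pokemon.deviantart.com/]pokemon Arkeis[/url]"
print(censor(post, "pokemon", "Pokémon"))
# [url=http://arkeis-Pokémon.deviantart.com/]Pokémon Arkeis[/url]
```

The "-" and "." around "pokemon" in the hostname are non-word characters, so even with whole-word matching the URL gets rewritten, which reproduces the output reported in the first post.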

and why should the censor ignore urls?
many admins actually use the censor to force certain url corrections or updates.
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

PokémonS

Quote from: Kindred on January 07, 2013, 07:38:28 AM
the system is doing exactly what you asked...   it is checking the whole word (pokemon) and ignoring the case to set it to what you have defined.

and why should the censor ignore urls?
many admins actually use the censor to force certain url corrections or updates.

How about adding an "Ignore URLs" tickbox?
Some admins need URLs to be left alone by the word censor ._.
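One way such an "Ignore URLs" option could work is sketched below in Python, assuming a single regex per censored word: its first alternative matches URL-bearing spans (the target of [url=...], [url]...[/url], [img]...[/img] and bare http(s) links) and returns them untouched, and its second alternative matches the censored word. The function name and the list of protected forms are assumptions for illustration; a real BBCode parser has more tags and edge cases to cover.

```python
import re

# Hypothetical set of URL-bearing forms an "Ignore URLs" tickbox might protect.
URL_SPANS = (r"(?is:\[url=[^\]]*\]"   # link target inside [url=...]
             r"|\[url\].*?\[/url\]"    # [url]...[/url]
             r"|\[img\].*?\[/img\]"    # [img]...[/img]
             r"|https?://\S+)")        # bare links


def censor_outside_urls(text: str, vulgar: str, proper: str,
                        ignore_case: bool = True) -> str:
    word = r"(?<!\w)" + re.escape(vulgar) + r"(?!\w)"
    pattern = re.compile(r"(?P<url>" + URL_SPANS + r")|" + word,
                         re.IGNORECASE if ignore_case else 0)
    # At each position the URL alternative is tried first; if it matches, the
    # span is copied back unchanged, otherwise the match is the censored word.
    return pattern.sub(lambda m: m.group("url") or proper, text)


post = "[url=http://arkeis-pokemon.deviantart.com/]pokemon Arkeis[/url]"
print(censor_outside_urls(post, "pokemon", "Pokémon"))
# [url=http://arkeis-pokemon.deviantart.com/]Pokémon Arkeis[/url]
```

Since many admins rely on the censor to correct URLs, something like this would only make sense as an opt-in setting, and it adds an extra alternation that is tried at every position of every post.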

Kindred

seems like an unusual situation kinda specific to your own personal setup. I have never seen anyone else complain about it before.

So, the extra processing is probably not worth it.

Arantor

The extra processing is very seriously not worth it because of the amount of processing that would actually be required.
Holder of controversial views, all of which my own.


dimspace

This is a long standing problem on cycling forums when you want to link to an article on spaziociclismo.it :D

Kindred

Quote from: Kindred on January 07, 2013, 07:38:28 AM
the system is doing exactly what you asked...   it is checking the whole word (pokemon) and ignoring the case to set it to what you have defined.

and why should the censor ignore urls?
many admins actually use the censor to force certain url corrections or updates.

in other words... it is not actually a problem

oaqm

Actually, I have gotten complaints about the word censor messing up urls and wouldn't mind an option to "exempt" urls.

In my case, my board automagically turns "Duke" into "Duke LaCrosse University". Obviously, whenever anyone links to an online story about Duke, the url is almost always trashed, which leads to much wailing and gnashing of teeth and rending of garments and so on and so forth.

Don't even ask me what "NFL" turns into, children might be reading this.......

Arantor

Aside from the fact that it would seriously tank performance on your site... you could just uncensor Duke, of course...


oaqm

That will not happen on my watch. Duke EARNED that one, an unforced error in the name of political correctness that they are still chewing on.

This is the Feature Request forum, right?

I'll run my board my way and I will accord you that honor as well. The request for a url exemption to the word censor strikes me as worthy on its face. If the gurus of SMF say it can't be done, then it can't be done, but this IS the Feature Request forum, right??

Kindred

almost anything CAN be done...
the question is - "Is it worth doing"

Arantor is one of the gurus of gurus with SMF code... and I believe him when he says that such a change would adversely affect performance.
Additionally, as I have pointed out, contrary to your request, many admins actually use the censor to force a redirect and/or correction of URLs... 

So, I don't see any reason to do this. If you don't want the URL words replaced, then change the censored word... or get someone to make a mod (honestly, the way most features make it into releases is by starting as mods and becoming popular enough to be noticed as frequently implemented updates).

Arantor

Well, making it an option wouldn't prohibit the (many more) admins who use it for correction.

The problem is actually doing the detection. Censoring is deliberately done in a naive (in code this has a specific meaning) fashion; it knows nothing about the content it is censoring. Now imagine that for every term you search for, instead of merely going forwards, letter by letter, looking to see whether you matched something, you also have to search backwards for a fairly surprising number of permutations.

Now do this 30-40 times a page.
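To get a rough feel for the cost on your own setup, the hypothetical Python micro-benchmark below times a naive whole-word censor against a URL-aware variant, repeated 40 times to mimic the "30-40 times a page" figure. The word list, the sample post and the URL pattern are made up for illustration, and the simple pattern is far less thorough than the detection Arantor describes, so whatever timings it prints say little about a real forum.

```python
import re
import timeit

# Hypothetical censor list and sample post; swap in your own data.
WORDS = {"duke": "Duke LaCrosse University", "pokemon": "Pokémon", "fred": "bill"}
POST = ("See http://www.example.com/duke/news and "
        "[url=http://arkeis-pokemon.deviantart.com/]pokemon Arkeis[/url], ask fred. ") * 20
URL = r"(?is:\[url=[^\]]*\]|\[url\].*?\[/url\]|\[img\].*?\[/img\]|https?://\S+)"


def naive(text: str) -> str:
    # One forward scan per censored word; URLs are not treated specially.
    for vulgar, proper in WORDS.items():
        pattern = r"(?<!\w)" + re.escape(vulgar) + r"(?!\w)"
        text = re.sub(pattern, lambda _m: proper, text, flags=re.IGNORECASE)
    return text


def url_aware(text: str) -> str:
    # Same scan, but URL spans are matched first and copied back unchanged.
    for vulgar, proper in WORDS.items():
        pattern = r"(?P<url>" + URL + r")|(?<!\w)" + re.escape(vulgar) + r"(?!\w)"
        text = re.sub(pattern, lambda m: m.group("url") or proper, text,
                      flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    for fn in (naive, url_aware):
        seconds = timeit.timeit(lambda: fn(POST), number=40)
        print(f"{fn.__name__}: {seconds * 1000:.2f} ms for 40 posts")
```

`naive("http://www.fred.com")` returns `http://www.bill.com`, the same mangling Shambles reports from testing "fred" to "bill".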


kat

When censoring, you're given the option to "Check only whole words:". Surely, that would preclude URLs, anyway, to some degree, wouldn't it?

Shambles

^-- I tried that earlier K@ and URLs got mashed just like regular [whole] words. Dunno why it did that tho.


Then again, I only tried it with censoring "fred" to "bill" and checking http://www.fred.com

kat

I might have to experiment with that, then.

Arantor

Not really. The option is an all-or-nothing deal, it's not per-word (and there's no good way to make it so without revamping the entire system)

Even with 'check only whole words' enabled, it still wouldn't prevent all URL damage. Taking the earlier example of censoring 'duke' to something else, a URL such as www.example.com/duke/ would still match, because the check matches the word book-ended by non-word characters, and / is one of those.

It's a lot more complicated than that, but that's the basic version. (The problem is that \W and \w match all sorts of funny characters. Especially if setlocale has been called, and then pretty much all bets are off)
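The book-ending rule can be checked with a few lines of Python. This is a sketch, assuming whole-word mode means "no word character on either side", written here as the lookarounds `(?<!\w)` and `(?!\w)`; the `whole_word()` helper is hypothetical. It also shows that what counts as a word character depends on the regex mode. Python's Unicode vs ASCII modes stand in for the locale effects mentioned above; they are not the same mechanism as PHP's setlocale.

```python
import re


def whole_word(word: str, flags: int = re.IGNORECASE) -> re.Pattern:
    # Match `word` only when book-ended by non-word characters (or the string edges).
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", flags)


duke = whole_word("duke")
print(bool(duke.search("www.example.com/duke/")))  # True: '/' is a non-word character
print(bool(duke.search("www.dukes.com")))          # False: followed by 's', a word character

# In Python's default Unicode mode 'é' is a word character, so "dukeé" does not
# match; in ASCII mode 'é' is a non-word character, so it does.
print(bool(whole_word("duke").search("dukeé")))                            # False
print(bool(whole_word("duke", re.IGNORECASE | re.ASCII).search("dukeé")))  # True
```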


kat

I just censored "tla", replacing it with "lta", ensuring that I checked "Check only whole words:".

My forum has NOT changed this URL:

http://www.tlakoc.org.uk

I guess different things react in different ways, ay?

I substituted the entire "tlakoc" bit and it censored it, so it must see the "tlakoc" thing as a single word.

Arantor

Well... yes... if 'check only whole words' is selected, tla is not a whole word, because it's followed by a word character (that's exactly the case it's designed for, e.g. so that censoring a certain word to 'butt' doesn't produce 'clbuttic')
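Kat's experiment can be replayed with a hypothetical `censor()` helper in Python, assuming whole-word mode works as described: "tla" inside "tlakoc" is followed by a word character, so it is left alone, while the whole token "tlakoc" is book-ended by "." on both sides and gets replaced. The helper, its parameters and the replacement "xyz" are illustrative only.

```python
import re


def censor(text: str, vulgar: str, proper: str, whole_word: bool) -> str:
    pattern = re.escape(vulgar)
    if whole_word:
        # Require a non-word character (or string edge) on both sides.
        pattern = r"(?<!\w)" + pattern + r"(?!\w)"
    return re.sub(pattern, lambda _m: proper, text, flags=re.IGNORECASE)


url = "http://www.tlakoc.org.uk"
print(censor(url, "tla", "lta", whole_word=True))     # http://www.tlakoc.org.uk
print(censor(url, "tla", "lta", whole_word=False))    # http://www.ltakoc.org.uk
print(censor(url, "tlakoc", "xyz", whole_word=True))  # http://www.xyz.org.uk
```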


Shambles

That adjective - "book-ended" - is a superb summary of why my test failed, but the word-within-word passed the censor check.

I like that phrase. "book-ended". Brilliant.

kat

I guess it depends on how the censor defines a single word. Actually giving it that definition's gonna be a nightmare if, as I suspect, it's defined as "a word that's followed by a space or punctuation mark". What other definition could it use that would stop it blocking out URLs?

Nigh-on impossible, I'd guess.
