SMF Development > Bug Reports

[4981] [1.x, 2.0] handling MS Smart Quotes

MrPhil:
The support boards for all versions of SMF are clogged with reports of certain characters cutting off the rest of a post or otherwise causing mischief. The root cause is that people cut and paste text from Microsoft products (especially Word) that contains MS's "Smart Quotes", which occupy bytes 0x80-0x9F in the CP-1252 encoding; ISO-8859-1 reserves that same range for control characters (see the "Reserved Use" column below). My proposal is that all incoming text (from TEXT, TEXTAREA, and possibly other input fields) be scanned for Smart Quote bytes, and that any found be replaced with HTML entities. str_replace() might do the job. Here are all the Smart Quotes:

Smart Quote | Glyph    | Closest ASCII | UTF-16 value | HTML entity       | SQ Description                         | Reserved Use
------------+----------+---------------+--------------+-------------------+----------------------------------------+----------------------------------------
80          | &euro;   | C=            | 20AC         | euro              | Euro                                   | reserved control
81          |          |               |              |                   |                                        | reserved control
82          | &sbquo;  | "             | 201A         | sbquo             | Low-"9" opening quotation mark         | Break Permitted Here
83          | &#402;   | f             | 0192         | fnof [1] or 402   | Florin/script f/folder                 | No Break Here
84          | &bdquo;  | "             | 201E         | bdquo             | Low-"99" opening quotation mark        | Index
85          | &hellip; | ...           | 2026         | hellip            | Ellipsis                               | Next Line
86          | &dagger; | +             | 2020         | dagger            | Single dagger                          | Start of Selected Area
87          | &Dagger; | ++            | 2021         | Dagger            | Double dagger                          | End of Selected Area
88          | &circ;   | ^             | 02C6         | circ              | Circumflex ^ accent (combining?)       | Character Tabulation Set
89          | &permil; | o/oo          | 2030         | permil            | o/oo per mille                         | Character Tabulation with Justification
8A          | &#352;   |               | 0160         | Scaron [1] or 352 | S + caron accent                       | Line Tabulation Set
8B          | &lsaquo; | <             | 2039         | lsaquo            | Single left angle quote < (guillemet)  | Partial Line Down
8C          | &OElig;  | OE            | 0152         | OElig             | OE ligature                            | Partial Line Up
8D          |          |               |              |                   |                                        | Reverse Line Feed
8E          | &#381;   |               | 017D         | Zcaron [1] or 381 | Z + caron accent                       | Single Shift Two
8F          |          |               |              |                   |                                        | Single Shift Three
90          |          |               |              |                   |                                        | Device Control String
91          | &lsquo;  | '             | 2018         | lsquo             | "6" opening quotation mark             | Private Use One
92          | &rsquo;  | '             | 2019         | rsquo             | "9" closing quotation mark/apostrophe  | Private Use Two
93          | &ldquo;  | "             | 201C         | ldquo             | "66" opening quotation mark            | Set Transmit State
94          | &rdquo;  | "             | 201D         | rdquo             | "99" closing quotation mark            | Cancel Character
95          | &bull;   | *             | 2022         | bull              | Solid bullet                           | Message Waiting
96          | &ndash;  | -             | 2013         | ndash             | En-dash                                | Start of Guarded Area
97          | &mdash;  | --            | 2014         | mdash             | Em-dash                                | End of Guarded Area
98          | &tilde;  | ~             | 02DC         | tilde             | Tilde ~ accent (combining?)            | Start of String
99          | &trade;  | (tm)          | 2122         | trade             | Trademark TM                           | reserved control
9A          | &#353;   |               | 0161         | scaron [1] or 353 | s + caron accent                       | Single Character Introducer
9B          | &rsaquo; | >             | 203A         | rsaquo            | Single right angle quote > (guillemet) | Control Sequence Introducer
9C          | &oelig;  | oe            | 0153         | oelig             | oe ligature                            | String Terminator
9D          |          |               |              |                   |                                        | Operating System Command
9E          | &#382;   |               | 017E         | zcaron [1] or 382 | z + caron accent                       | Privacy Message
9F          | &Yuml;   |               | 0178         | Yuml              | Y + diaeresis/umlaute accent           | Application Program Command

* [1] Named entity may not be widely supported, especially on older browsers.
* Additional Smart Quotes may have been added since I first made this list.
* Unfortunately, I do not seem to be able to add either numeric or named entities, to show the glyphs. See http://www.catskilltech.com/freeSW/SMF/projects/index.html#smrtquotes to see the glyphs.
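To make the scan-and-replace idea concrete, here is a minimal sketch of it. It is written in Python purely for illustration (SMF itself is PHP, where str_replace() would play the same role). The function name clean_smart_quotes and the UTF-8 guard are assumptions of this sketch, not existing SMF code. The guard is there because bytes 0x80-0x9F also occur inside valid UTF-8 multi-byte sequences, so a blind byte replacement would corrupt UTF-8 input.

```python
# Sketch only: Python for illustration; SMF itself is PHP.
# Maps CP-1252 bytes 0x80-0x9F to HTML entities, following the table above.
# Numeric entities are used where the table flags the named one as
# possibly unsupported.
SMART_QUOTE_ENTITIES = {
    0x80: "&euro;", 0x82: "&sbquo;", 0x83: "&#402;", 0x84: "&bdquo;",
    0x85: "&hellip;", 0x86: "&dagger;", 0x87: "&Dagger;", 0x88: "&circ;",
    0x89: "&permil;", 0x8A: "&#352;", 0x8B: "&lsaquo;", 0x8C: "&OElig;",
    0x8E: "&#381;", 0x91: "&lsquo;", 0x92: "&rsquo;", 0x93: "&ldquo;",
    0x94: "&rdquo;", 0x95: "&bull;", 0x96: "&ndash;", 0x97: "&mdash;",
    0x98: "&tilde;", 0x99: "&trade;", 0x9A: "&#353;", 0x9B: "&rsaquo;",
    0x9C: "&oelig;", 0x9E: "&#382;", 0x9F: "&Yuml;",
}


def clean_smart_quotes(raw: bytes) -> bytes:
    """Replace CP-1252 Smart Quote bytes with HTML entities.

    Assumption: input that is not valid UTF-8 is in a single-byte
    encoding (CP-1252 or ISO-8859-1). Valid UTF-8 is returned unchanged.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (or plain ASCII): leave it alone
    except UnicodeDecodeError:
        pass

    out = bytearray()
    for byte in raw:
        entity = SMART_QUOTE_ENTITIES.get(byte)
        if entity is None:
            out.append(byte)  # not a Smart Quote byte: copy through
        else:
            out += entity.encode("ascii")
    return bytes(out)
```

The guard is only a heuristic: a short CP-1252 string can, in rare cases, also happen to be valid UTF-8, so a real fix would preferably rely on the character set the forum is configured to use.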
Why call this a bug? Because it's been a festering problem for a long time, and it really degrades the public image of SMF that it cannot handle something as common as pasting in text from a Word document. Just because your average user is too stupid to realize that a difference in encodings is the cause of the problem doesn't mean that we can't work around it. It's also very simple to fix: just define a function to clean the string and call it from wherever SMF takes in user text. Depending on whether BBCode is recognized and whether HTML entities work, it might be possible to create either a BBCode for each character or a single BBCode to handle generic HTML entities, [ent=nnnn] or [ent=name]. Where BBCode is not processed, replace each character with the closest ASCII character(s).
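The BBCode and ASCII fallbacks could be sketched along the following lines. Again this is Python for illustration only, not SMF code. The [ent=...] tag is the hypothetical BBCode proposed in this post, not an existing SMF tag, and the function name replace_smart_quotes is made up for the sketch. The table leaves "Closest ASCII" blank for 8A, 8E, 9A, 9E and 9F, so the bare base letters used for those rows are an assumption. The input is assumed to be already known to be CP-1252, since the same byte values also appear inside valid UTF-8 sequences.

```python
# Sketch only: Python for illustration; SMF itself is PHP.
# byte -> (entity name or decimal code, closest ASCII), following the table.
# ASCII values for 0x8A, 0x8E, 0x9A, 0x9E and 0x9F are blank in the table;
# the bare base letters used here are an assumption of this sketch.
SMART_QUOTES = {
    0x80: ("euro", "C="), 0x82: ("sbquo", '"'), 0x83: ("402", "f"),
    0x84: ("bdquo", '"'), 0x85: ("hellip", "..."), 0x86: ("dagger", "+"),
    0x87: ("Dagger", "++"), 0x88: ("circ", "^"), 0x89: ("permil", "o/oo"),
    0x8A: ("352", "S"), 0x8B: ("lsaquo", "<"), 0x8C: ("OElig", "OE"),
    0x8E: ("381", "Z"), 0x91: ("lsquo", "'"), 0x92: ("rsquo", "'"),
    0x93: ("ldquo", '"'), 0x94: ("rdquo", '"'), 0x95: ("bull", "*"),
    0x96: ("ndash", "-"), 0x97: ("mdash", "--"), 0x98: ("tilde", "~"),
    0x99: ("trade", "(tm)"), 0x9A: ("353", "s"), 0x9B: ("rsaquo", ">"),
    0x9C: ("oelig", "oe"), 0x9E: ("382", "z"), 0x9F: ("Yuml", "Y"),
}


def replace_smart_quotes(raw: bytes, bbcode: bool) -> bytes:
    """Rewrite Smart Quote bytes in CP-1252 input.

    bbcode=True  -> emit the hypothetical [ent=...] tag
    bbcode=False -> emit the closest ASCII character(s)
    """
    out = bytearray()
    for byte in raw:
        pair = SMART_QUOTES.get(byte)
        if pair is None:
            out.append(byte)  # ordinary byte: copy through unchanged
            continue
        entity, ascii_text = pair
        replacement = f"[ent={entity}]" if bbcode else ascii_text
        out += replacement.encode("ascii")
    return bytes(out)
```

For example, replace_smart_quotes(b"\x93Hi\x94", bbcode=True) yields b"[ent=ldquo]Hi[ent=rdquo]", which a BBCode parser would then have to turn into the matching HTML entity at display time.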

K@:
I can't see how anyone could ignore such a well-presented, detailed report, Mr. P.

Joshua Dickerson:

--- Quote from: K@ on April 25, 2012, 01:29:56 PM ---I can't see how anyone could ignore such a well-presented, detailed report, Mr. P.

--- End quote ---
Wow, yes... couldn't agree more. If there was an award for best bug report (without the fix), I think this might be it.

K@:
Let's keep our fingers crossed, then, ay? ;)

vbgamer45:
I think it is a great idea. I run into these issues all the time, where users post those special characters, and I am for anything that fixes it.
