Yeah, if you edit the posts at some point to change the "smart quote" items to normal, properly encoded characters, they should display correctly. The problem is that they won't be as "pretty" as the original text, especially if your forum is running in Latin-x rather than UTF-8.
If you find yourself frequently cutting and pasting from sources that don't match your forum's database and display encoding (e.g., the other pages are UTF-8 or CP-1252 and your forum is Latin-1), you may want to consider converting your forum to another encoding (UTF-8). Note that the conversion process assumes that what's currently in your database (for posts) is correct Latin-1 encoded text, not some hybrid mishmash of encodings. When you cut and paste, you're bringing over the byte codes for the text you see, in whatever encoding that page is displayed in. You're not bringing over an "em-dash", say; you're bringing over the byte 0x97, which is what CP-1252 uses for an em-dash. If that encoding doesn't match your forum's, you will see the strange symbols.
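To make the byte mismatch concrete, here is a small Python sketch (Python is my choice for illustration, not something SMF uses). It shows the bytes that stand for an em-dash in CP-1252 and in UTF-8, and what comes out when those bytes are read under the wrong encoding. The variable names are made up for the example.

```python
# An em-dash as a Unicode character (U+2014).
em_dash = "\u2014"

# The bytes a CP-1252 page and a UTF-8 page would each use for it.
cp1252_bytes = em_dash.encode("cp1252")   # single byte 0x97
utf8_bytes = em_dash.encode("utf-8")      # three bytes 0xE2 0x80 0x94

# Read the CP-1252 byte as Latin-1: you get U+0097, an invisible control
# character, not an em-dash.
cp1252_read_as_latin1 = cp1252_bytes.decode("latin-1")

# Read the UTF-8 bytes as CP-1252: three unrelated characters appear
# (a-circumflex, euro sign, right double quote), the classic "strange symbols".
utf8_read_as_cp1252 = utf8_bytes.decode("cp1252")

print(cp1252_bytes, utf8_bytes)
print(repr(cp1252_read_as_latin1), repr(utf8_read_as_cp1252))
```

Either way the bytes themselves are untouched; only the interpretation differs, which is why the same post can look fine on one page and garbled on another.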
A forum (and database) in UTF-8 will be able to display any "reasonable" symbol in a post. Note that whether a UTF-8 character can actually be displayed depends upon the fonts installed on the viewer's system, not on anything found on your site! I think all the "smart quotes" should be found on just about any PC, so that should be safe. The big problem will be converting cut-and-pasted text from Latin-1 or CP-1252 or whatever to UTF-8. It won't happen automatically, and most browsers do not make it easy to enter characters not found on the keyboard. Some options:
- You can manually edit the posts and approximate the offending characters with something found on your keyboard (crude, but fast). That would work even if you stay with Latin-1.
- You can stay with a Latin-1 database, but change your page encoding to CP-1252 (if that's what you're primarily copying from). That might require a tweak to the SMF code to list CP-1252 as the page encoding instead of Latin-1 (ISO-8859-1). The "smart quote" characters would go into the database unchanged, and properly display upon retrieval. Note that if you pull in any text from a non-CP-1252 page, this won't work.
- Change to UTF-8 database and page display encoding. You can learn which original characters are problems (especially "smart quotes") and hopefully replace them with their Unicode equivalents (I think &# followed by the character's Unicode code point in decimal, followed by a semicolon, will work; give it a try). For example, &#8220; and &#8221; are the opening and closing double quotes. For a small set of offending characters, that might be feasible, and at least you'll get the proper text displayed. For massive amounts of text, that's too much work.
- Convert to UTF-8 (database and page encoding). You can cut and paste to a file on your PC, and run a little utility to search out "smart quotes" and replace them with their UTF-8 byte sequences. They'll probably look very odd on your PC if your editor isn't showing the file as UTF-8, but you should be able to cut and paste them to your forum posts and have them show up correctly (as UTF-8).
- Something else?
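As a rough sketch of the "little utility" idea, the Python below covers three of the options above: approximating the characters with keyboard ones, swapping in numeric character references, and decoding a saved CP-1252 file so it can be re-saved as UTF-8. The function names and the list of characters are my own picks for illustration. Whether your forum passes &#...; references through untouched is something you'd have to test.

```python
# Typographic characters Word commonly produces, with keyboard stand-ins.
ASCII_APPROXIMATIONS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def decode_pasted_bytes(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Decode bytes saved from a page, assuming you know the page's encoding."""
    return raw.decode(source_encoding)


def to_ascii_approximation(text: str) -> str:
    """Replace each listed character with a plain-keyboard stand-in."""
    for fancy, plain in ASCII_APPROXIMATIONS.items():
        text = text.replace(fancy, plain)
    return text


def to_numeric_references(text: str) -> str:
    """Replace each listed character with &#<decimal code point>;."""
    for fancy in ASCII_APPROXIMATIONS:
        text = text.replace(fancy, f"&#{ord(fancy)};")
    return text


# Bytes as they might sit in a file saved from a CP-1252 page.
sample = decode_pasted_bytes(b"\x93Hi\x94 \x97 it\x92s")
print(to_ascii_approximation(sample))
print(to_numeric_references(sample))
print(sample.encode("utf-8"))  # the same text as UTF-8 bytes
```

The first two outputs are pure ASCII, so they survive in a Latin-1 forum as well as a UTF-8 one. The last line is only useful once the forum itself is UTF-8.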
How do I know if it's from Word?
You can't really tell if a Web page was itself cut and pasted from Word. Sometimes you can see quotes that are screwed up (much like the text in your first post). If something online uses "typographically correct" opening and closing quotation marks, suspect that it came from Word. You can look at the page source (View > Page source) and see whether it declares a page encoding (e.g., CP-1252, usually written as windows-1252).
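If you'd rather not hunt through the page source by eye, a few lines of Python can pull out the declared encoding. This is only a sketch: the function name is made up, it looks only at the meta tag, and a server can also announce the encoding in its HTTP Content-Type header, which this doesn't see.

```python
import re

# Finds charset=... inside a <meta> tag. Covers both the short form
#   <meta charset="utf-8">
# and the older form
#   <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
META_CHARSET = re.compile(
    r"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_:.-]+)",
    re.IGNORECASE,
)


def declared_charset(page_source: str):
    """Return the charset named in the page's <meta> tag, or None if absent."""
    match = META_CHARSET.search(page_source)
    return match.group(1).lower() if match else None


old_style = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
new_style = '<META CHARSET="UTF-8">'
print(declared_charset(old_style))
print(declared_charset(new_style))
print(declared_charset("<html><head></head></html>"))
```

A result of windows-1252 is a decent hint that smart quotes on that page are single CP-1252 bytes. It still says nothing about whether the text started life in Word.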
@Kindred's warning applies primarily to text directly cut and pasted from Word on a PC into an SMF post (or anything else online, not just SMF). You have no way of really knowing where HTML in a page came from, except by following hints given in the previous paragraph. Of course you'll know whether or not you're cutting and pasting directly from Word!
P.S. You should review your use of material cut and pasted from other Web pages. Make sure it would be considered "fair use" of copyrighted material, and not theft.