Simple Machines Community Forum

SMF Support => SMF 1.1.x Support => Topic started by: meca1 on March 10, 2010, 12:31:42 PM

Title: Strange symbols are appearing in my topics
Post by: meca1 on March 10, 2010, 12:31:42 PM
I'm not sure if this is the correct place to ask this -- but these strange symbols keep appearing in some of my older topics:

ÃÆ'ƒÂÃ,¢ÃÆ'Ã,¢Ã¢â‚¬Ã...¡Ã‚Ã,¬ÃÆ'Ã,¢Ã¢â‚¬Ã...¾Ã‚Ã,¢t ÃÆ'ƒÂÃ,¢ÃÆ'Ã,¢Ã¢â‚¬Ã...¡Ã‚Ã,¬ÃÆ'Ã,¢Ã¢â‚¬Ã...¾Ã‚Ã,Â

They always appear in older posts where an apostrophe, quotes or dashes should be. Can anyone tell me why? It's very frustrating and time-consuming to go back through all the threads and posts to fix them.... PLEASE HELP!
Title: Re: Strange symbols are appearing in my topics
Post by: Kindred on March 10, 2010, 02:29:43 PM
Someone is pasting from Word, would be my first guess... in other words, quotes which are special characters instead of just a simple "


But, if this is on your own forum, you should ask for help in the support section for your version of SMF (1.1.x or 2.0).
Also, when reporting a problem that you would like help with, please include:
SMF version, mods installed, and the URL to your forum/where the issue is appearing.
Title: Re: Strange symbols are appearing in my topics
Post by: kat on March 10, 2010, 02:51:53 PM
Hmmmm...


I'm wondering if you may have been hacked.


Do you have any strange files in your forum directory?


As Kindred said, if you tell us which version of SMF you're using, we'd have a clue, at least.
Title: Re: Strange symbols are appearing in my topics
Post by: meca1 on March 10, 2010, 04:15:42 PM
No, there are no strange posts. And I'm very careful about my members. I check them all out before approval to keep spammers out. I'm hoping this is what you're asking: it says Version SMF 1.1.11
Title: Re: Strange symbols are appearing in my topics
Post by: kat on March 10, 2010, 04:17:48 PM
Not "Posts". Files. Are all your files OK?
Title: Re: Strange symbols are appearing in my topics
Post by: meca1 on March 10, 2010, 04:23:43 PM
Oops - sorry. There are no strange files that I'm aware of. And I haven't noticed anything else that's strange -- just this.
Title: Re: Strange symbols are appearing in my topics
Post by: kat on March 10, 2010, 04:30:06 PM
In that case, I'd suspect a language problem.


Can you give us a link to a post showing this? (One that's open to guest-viewing, obviously).
Title: Re: Strange symbols are appearing in my topics
Post by: meca1 on March 10, 2010, 05:44:22 PM
Thank you Kat - Go to this thread: http://meca1.com/forum/index.php?topic=211.0

It's only open to members but for the moment, I've changed that so guests can see it. Please let me know when you're finished.

Title: Re: Strange symbols are appearing in my topics
Post by: meca1 on March 10, 2010, 11:33:01 PM
I changed it back for now - please tell me when you want to go in there. Or I can PM you with a password.
Title: Re: Strange symbols are appearing in my topics
Post by: kat on March 11, 2010, 04:26:42 AM
Password might be good. We're obviously in different time-zones. :)
Title: Re: Strange symbols are appearing in my topics
Post by: perplexed on March 11, 2010, 06:14:19 AM
I've seen this before on other forums, when they have converted to smf from other forum s/ware, and once for me when I changed hosts.  Can't remember the cause now, a different version of... something *scratches head*
Title: Re: Strange symbols are appearing in my topics
Post by: Kindred on March 11, 2010, 08:41:42 AM
oh..  good catch perplexed... it was in the back of my mind too....   it looks like a character failure.

What is the database language set to?
Are the text values (i.e. the apostrophe and quotes, etc) in the DATABASE correct (i.e. does the failure occur in the database storage or in the retrieval/display?)
Title: Re: Strange symbols are appearing in my topics
Post by: MrPhil on March 11, 2010, 11:16:44 AM
This is only where "apostrophe, quotes or dashes" should be? As @Kindred pointed out, that sounds a lot like someone used Word to create their posting, and brought over its (sadly misnamed) "smart quotes" with the text. If that's what happened, you'll have no choice but to edit the posts in question to clean them up with proper characters.

Now, what's different about these "older posts", that it doesn't happen (?) in newer posts? Were these "older posts" imported from another forum system? Was the forum encoding (database and/or page display) changed after these "older" posts? It looks like maybe your old text was stored in UTF-8, but now is being displayed in Latin-1 encoding.
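
To illustrate that last guess, here is a minimal Python sketch (not SMF code; the sample word and the function name `misread` are made up for the example). It assumes the wrong decoding is CP-1252, which is what most browsers actually apply when a page claims Latin-1. Each round of misreading makes the junk longer, which may be why the runs of symbols in the first post are so long.

```python
def misread(text: str, rounds: int = 1) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as CP-1252, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("cp1252")
    return text


original = "don\u2019t"          # "don't" with a curly (smart) apostrophe, U+2019
once = misread(original)         # the 3 UTF-8 bytes E2 80 99 show up as 3 separate symbols
twice = misread(original, 2)     # misreading the result again makes it longer still

# A single round can be undone by reversing the two steps.
repaired = once.encode("cp1252").decode("utf-8")

print(repr(once), repr(twice), repr(repaired))
```

If the stored text has only been through one such round, the reverse step shown for `repaired` may recover it; text that went through several rounds, or through a mix of encodings, usually needs more care.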
Title: Re: Strange symbols are appearing in my topics
Post by: meca1 on March 11, 2010, 11:30:54 AM
MrPhil -- I believe you may have hit it on the head. They were actually text that I had copied and pasted from different things online that were items of interest. The reason they were 'older' was due to the fact that I had done them a few months ago. So... in the future, if I copy and paste such stuff, then delete the quotes and dashes and re-type them, will that alleviate the problem?
Title: Re: Strange symbols are appearing in my topics
Post by: Kindred on March 11, 2010, 12:04:37 PM
You should never paste from Word to online...

If you need to paste, then paste to a text editor (Notepad, if you must, but there are better ones like Notepad++ and others) before pasting to the online box.
Title: Re: Strange symbols are appearing in my topics
Post by: meca1 on March 11, 2010, 12:12:52 PM
How do I know if it's from Word?
Title: Re: Strange symbols are appearing in my topics
Post by: MrPhil on March 11, 2010, 12:26:31 PM
Yeah, if you edit the posts at some point to change the "smart quote" items to normal, properly encoded characters, they should display correctly. The problem is that they won't be as "pretty" as the original text, especially if your forum is running in Latin-x rather than UTF-8.

If you find yourself frequently cutting and pasting from sources that don't match your forum database and display encoding (e.g., UTF-8 or CP-1252 on other pages, and your forum is Latin-1), you may want to consider converting your forum to another encoding (UTF-8). Note that the process assumes that what's currently in your database (for posts) is correct Latin-1 encoded text, not some hybrid mishmash of encodings.  When you cut and paste, you're bringing over the byte codes for the text you see, in whatever encoding that page is displayed in. You're not bringing over an "em-dash", say, you're bringing over the byte code x97 that CP-1252 uses for it. If that encoding doesn't match your forum, you will experience the strange symbols.
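
A small Python sketch of the "same bytes, different encoding" point, using made-up sample bytes (0x91 and 0x92 are the CP-1252 single smart quotes and 0x97 is the em-dash, per the CP-1252 code page). It only shows how one byte string reads under three encodings; it says nothing about how SMF itself stores posts.

```python
# Hypothetical bytes as they might arrive from a CP-1252 source.
pasted = b"\x91quoted\x92 \x97 done"

as_cp1252 = pasted.decode("cp1252")    # what the original author meant
as_latin1 = pasted.decode("latin-1")   # never fails, but the three bytes become invisible control codes

try:
    pasted.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # These bytes are not valid on their own in UTF-8.
    utf8_ok = False

print(repr(as_cp1252), repr(as_latin1), utf8_ok)
```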

A forum (and database) in UTF-8 will be able to display any "reasonable" symbol in a post. Note that whether a UTF-8 character can be displayed depends upon the fonts installed on the viewer's computer, not on anything found on your site! I think all the "smart quotes" should be found on just about any PC browser, so that should be safe. The big problem will be converting cut-and-pasted text from Latin-1 or CP-1252 or whatever to UTF-8. It won't happen automatically. Most browsers do not make it easy to enter characters not found on the keyboard.
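
For what the conversion step amounts to, a rough Python sketch follows. The function name `to_utf8` and the default source encoding are assumptions for the example, and it only works if you really do know what encoding the bytes are in. This is not SMF's own conversion routine.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from a known source encoding into UTF-8.

    Bytes that are undefined in the source encoding are replaced with
    U+FFFD (the replacement character) instead of raising an error.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


sample = b"caf\xe9 \x96 \x93hi\x94"   # made-up CP-1252 bytes: e-acute, en-dash, smart double quotes
converted = to_utf8(sample)
print(converted.decode("utf-8"))
```

Running this on text that is already UTF-8, or on a mishmash of encodings, would make things worse rather than better, so it is worth testing on a copy first.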


Quote: How do I know if it's from Word?
You can't really tell if a Web page was itself cut and pasted from Word. Sometimes you can see quotes that are screwed up (much like the text in your first post). If something online uses "typographically correct" opening and closing quotation marks, suspect that it came from Word. You can look at the page source (View > Page source) and see which page encoding it declares (e.g., CP-1252).
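
If you would rather not read the page source by eye, something like this Python sketch can pull out the declared charset. The function name `declared_charset` and the sample markup are made up, and it only reports what the page claims, which may not match the bytes actually in it.

```python
import re
from typing import Optional


def declared_charset(html: str) -> Optional[str]:
    """Return the first charset named in the markup (lowercased), or None."""
    match = re.search(r'charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


old_style = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
new_style = '<meta charset="UTF-8">'
print(declared_charset(old_style), declared_charset(new_style))
```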

@Kindred's warning applies primarily to text directly cut and pasted from Word on a PC into an SMF post (or anything else online, not just SMF). You have no way of really knowing where HTML in a page came from, except by following hints given in the previous paragraph. Of course you'll know whether or not you're cutting and pasting directly from Word!

P.S. You should review your use of material cut and pasted from other Web pages. Make sure it would be considered "fair use" of copyrighted material, and not theft.
Title: Re: Strange symbols are appearing in my topics
Post by: lloydb on March 11, 2010, 12:41:51 PM
Working from Word to html has embarrassed me a few times.

But you can turn off the dumb 'smart quotes' stuff. I'm too busy to look it up now but you can find it if you check all your options. Since doing that I have been able to copy from Word and then paste into my html editor without a single problem.

The 3 dots in a row that most people are ending their sentences with these days is also an MS special character. It can make a similar mess. It is rare in anything I do. Can't remember if I have that turned off in Word or if I just go and rub it out after it is in the editor.
Title: Re: Strange symbols are appearing in my topics
Post by: MrPhil on March 11, 2010, 05:26:29 PM
Yeah, but most people have no idea that "smart quotes" is even in operation, or that it can be turned off. As long as people use Word to type in text, we're going to have this problem. I have no idea why people would even use Word just to type in text for an SMF post -- maybe it's just force of habit for them? After all, the formatting and stuff isn't even preserved during cut and paste.

The "three dots in a row" is an ellipsis. That's another thing that Word pretties up, along with various dashes and quotation marks.

Anybody up for a mod to look for "smart quote" byte codes and turn them into HTML entities or the appropriate numeric character reference (& #nnnn; where nnnn is the decimal Unicode code point)? Here's the official Microsoft list for CP-1252:

   Hex code    equivalent                      name

    80         U+20AC  & euro;                 Euro
    82         U+201A  & sbquo;                Low-9 opening quotation mark
    83         U+0192  & fnof;  & #402; *      Florin/script f/folder     
    84         U+201E  & bdquo;                Low-99 opening quotation mark
    85         U+2026  & hellip;               Ellipsis
    86         U+2020  & dagger;               Single dagger
    87         U+2021  & Dagger;               Double dagger
    88         U+02C6  & circ;                 Circumflex ^ accent (combining?)
    89         U+2030  & permil;               o/oo per mille
    8A         U+0160  & Scaron;  & #352; *    S + caron accent
    8B         U+2039  & lsaquo;               Single left angle quote < (guillemet)
    8C         U+0152  & OElig;                OE ligature
    8E         U+017D  & Zcaron;  & #381; ?    Z + caron accent
    91         U+2018  & lsquo;                6 opening quotation mark
    92         U+2019  & rsquo;                9 closing quotation mark/apostrophe
    93         U+201C  & ldquo;                66 opening quotation mark
    94         U+201D  & rdquo;                99 closing quotation mark
    95         U+2022  & bull;                 Solid bullet
    96         U+2013  & ndash;                En-dash
    97         U+2014  & mdash;                Em-dash
    98         U+02DC  & tilde;                Tilde ~ accent (combining?)
    99         U+2122  & trade;                Trademark TM
    9A         U+0161  & scaron;  & #353; *    s + caron accent
    9B         U+203A  & rsaquo;               Single right angle quote > (guillemet)
    9C         U+0153  & oelig;                oe ligature
    9E         U+017E  & zcaron;  & #382; ?    z + caron accent
    9F         U+0178  & Yuml;                 Y + diaeresis/umlaute accent

* recent addition, may not work on all browsers
? very few, if any, browsers support this

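In Latin-1 these byte values are reserved for control codes rather than printable characters, and in UTF-8 they are not valid as standalone bytes; I don't know for sure if any other legitimate encoding uses them.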
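
As a starting point for such a mod, here is a rough sketch of the lookup in Python rather than PHP, so it is not drop-in SMF code. The code points are copied from the table above; the function name `smart_bytes_to_entities` is made up, and it assumes the input is Latin-1 text with stray CP-1252 bytes mixed in (it would mangle UTF-8 input).

```python
# Hex byte -> Unicode code point, copied from the CP-1252 table above.
CP1252_TO_CODEPOINT = {
    0x80: 0x20AC, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
    0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030, 0x8A: 0x0160,
    0x8B: 0x2039, 0x8C: 0x0152, 0x8E: 0x017D, 0x91: 0x2018, 0x92: 0x2019,
    0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
    0x9E: 0x017E, 0x9F: 0x0178,
}


def smart_bytes_to_entities(raw: bytes) -> str:
    """Replace CP-1252 'smart' bytes with decimal numeric character references.

    Every other byte is passed through as its Latin-1 character.
    """
    pieces = []
    for byte in raw:
        code_point = CP1252_TO_CODEPOINT.get(byte)
        if code_point is not None:
            pieces.append(f"&#{code_point};")
        else:
            pieces.append(bytes([byte]).decode("latin-1"))
    return "".join(pieces)


cleaned = smart_bytes_to_entities(b"It\x92s \x93fine\x94 \x97 ok\x85")
print(cleaned)
```

Decimal references are used here instead of named entities because the table flags some of the named ones as not working in all browsers.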