Simple Machines Community Forum

SMF Support => SMF 2.0.x Support => Topic started by: joeyjojoshabadoo on February 21, 2013, 10:34:00 AM

Title: Not recognising some ascii characters suddenly
Post by: joeyjojoshabadoo on February 21, 2013, 10:34:00 AM
Sorry gents, another problem. I'm not having much fun lately with SMF, and my head is melted.

For some reason, since last night, the forum won't recognise some characters, namely £ and the euro sign.

Anyone have any idea why this might be happening? I ran the basic maintenance checks last night and nothing more; I didn't screw with any languages or anything, and now it's happening.

An example is here, in the signature: the diamond characters with question marks.

http://www.dropkickrugby.com/forum/international-rugby/six-nations-betting-round-3-2324-feb/msg4134/#msg4134
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 10:38:47 AM
Neither of those are ASCII characters. And I'm not sure of their status within the ISO-8859-1 encoding, which is what your forum is using.

Back it up and convert to UTF-8 from the admin panel. It might not fix existing posts but it will ensure future posts will work.
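The status of each character can be checked by simply trying to encode it in each charset. This is a minimal sketch using Python's built-in codecs (`ascii`, `latin-1`, `cp1252`, `utf-8`); the helper name `encodable_in` is hypothetical.

```python
def encodable_in(ch: str, encoding: str) -> bool:
    """Return True if the character `ch` can be represented in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in ("$", "£", "€"):
    row = {enc: encodable_in(ch, enc) for enc in ("ascii", "latin-1", "cp1252", "utf-8")}
    print(repr(ch), row)
```

Run as-is, this shows that `$` fits everywhere, `£` fits in ISO-8859-1 (Latin-1) but not ASCII, and `€` fits in neither ASCII nor ISO-8859-1; only Windows-1252 (`cp1252`) and UTF-8 can hold both.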
Title: Re: Not recognising some ascii characters suddenly
Post by: joeyjojoshabadoo on February 21, 2013, 11:11:18 AM
Does it make any difference that it only started happening last night, Arantor?

And will UTF-8 slow things down or mess with anything else?

Thanks again for your help.

PS: any idea why it happened in the first place?
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 11:17:52 AM
If it only started last night, something must have changed somewhere. Things do not randomly stop working.

So did you change anything anywhere last night?

UTF-8 should not slow things down in any meaningful way.
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 21, 2013, 12:39:14 PM
The Euro sign is not in Latin-1, but it is in Windows-1252 ("Smart Quotes"). The Pound sign is in Latin-1. Your forum is already (or at least currently) in UTF-8. What was the original source of the text showing the <?> signs? Was it cut and pasted from Word or Outlook? If so, the Euro and Pound signs will be single bytes, which are invalid when read as UTF-8. If the text was originally working on a Latin-1 encoded system, did you convert over to UTF-8 at some point? Maybe this particular text failed to be converted. Or it was actually Windows-1252: many browsers use that (and display it correctly) when Latin-1 is requested, since for all practical purposes it is a superset of Latin-1. However, it won't cleanly convert to UTF-8 if the converter treats it as Latin-1.
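The Windows-1252 versus Latin-1 conversion problem can be shown at the byte level. In this sketch, `to_utf8` is a hypothetical helper (not how SMF's own converter is implemented): it decodes stored bytes with an assumed source charset and re-encodes them as UTF-8. If Windows-1252 bytes are decoded as Latin-1, the euro byte 0x80 becomes the invisible control character U+0080 instead of €.

```python
def to_utf8(stored: bytes, source_charset: str) -> bytes:
    """Re-encode bytes from `source_charset` into UTF-8 (hypothetical helper)."""
    return stored.decode(source_charset).encode("utf-8")


# "€5 or £5" as a Windows-1252 application (e.g. Word) would store it
stored = "€5 or £5".encode("cp1252")
print(stored)                                   # b'\x805 or \xa35'

as_latin1 = to_utf8(stored, "latin-1")          # wrong assumption about the source
as_cp1252 = to_utf8(stored, "cp1252")           # correct assumption

print(as_latin1.decode("utf-8") == "€5 or £5")  # False: 0x80 became U+0080
print(as_cp1252.decode("utf-8") == "€5 or £5")  # True
```

The pound sign survives either way because 0xA3 means £ in both charsets; only the bytes in the 0x80 to 0x9F range differ between them.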
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 01:16:11 PM
Quote
Your forum is already (or at least currently) in UTF-8.

No, it isn't. Look at the link provided and its source.

var smf_charset = "ISO-8859-1";

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

Code (HTTP response headers)
Content-Type:text/html; charset=ISO-8859-1
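The same check can be repeated programmatically: read the charset from the `Content-Type` response header and from the `<meta http-equiv>` tag, then compare them. This is a minimal standard-library sketch; fetching the page is not shown, and `header_charset` and `MetaCharsetFinder` are hypothetical names.

```python
from email.message import Message
from html.parser import HTMLParser


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, lowercased (or None)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Collect charsets declared by <meta charset> or <meta http-equiv="Content-Type">."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> are routed here by the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("charset"):
            self.charsets.append(attrs["charset"].strip().lower())
        elif attrs.get("http-equiv", "").lower() == "content-type" and attrs.get("content"):
            charset = header_charset(attrs["content"])
            if charset:
                self.charsets.append(charset)


# Values copied from the page source and response header quoted above
finder = MetaCharsetFinder()
finder.feed('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />')
print(header_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(finder.charsets)                                    # ['iso-8859-1']
```

If both declarations agree but a browser still reports another encoding, the difference may be coming from somewhere other than these two declarations.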
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 21, 2013, 04:04:35 PM
Something weird is going on, then. Firefox swears it's UTF-8 encoded. That would explain the ?-in-diamonds, if they're Latin-1/Windows-1252 text (Euro or Pound bytes) being interpreted as UTF-8 by the browser.

Might be time to give the host a nudge and ask them why a Latin-1 site shows up as UTF-8.
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 04:13:29 PM
Everything about the page says it's being served in ISO. As far as I can see, there is no reason the server would ever be under the impression it is UTF-8, and no point at which it actually serves UTF-8.

Seems to me that the only thing a host could do is take one look at the DB to see if any of it is in UTF-8 or not...
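Checking whether stored text is UTF-8 mostly comes down to whether its bytes decode strictly as UTF-8. This is a rough heuristic sketch, assuming the raw bytes of a post have already been pulled out of the database (not shown); `classify` is a hypothetical helper. Pure ASCII is identical in every charset involved here, and bytes that fail a strict UTF-8 decode probably come from a single-byte charset such as ISO-8859-1 or Windows-1252.

```python
def classify(raw: bytes) -> str:
    """Rough guess at the encoding family of a byte string."""
    if raw.isascii():
        return "ascii"          # identical in ASCII, Latin-1 and UTF-8
    try:
        raw.decode("utf-8")     # strict decode: raises on malformed sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte (e.g. ISO-8859-1 / Windows-1252)"


for sample in (b"odds $5", "odds £5".encode("utf-8"), "odds £5".encode("latin-1")):
    print(sample, "->", classify(sample))
```

This is only a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 as well, so a "utf-8" result is suggestive rather than proof.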
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 21, 2013, 05:38:47 PM
The following misplaced line is in the <head> section:
Quote
Rugby betting tips, previews, odds and betting forum

Maybe that's messing up something? Was it supposed to be a title or a <meta description>?
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 06:02:35 PM
Oh yes, so it is. Is it possible that that's why FF is misidentifying the content type?
Title: Re: Not recognising some ascii characters suddenly
Post by: joeyjojoshabadoo on February 21, 2013, 07:08:19 PM
Guys, I really appreciate the help here.

I found where that was coming from: I had installed the Global Headers and Footers mod. I have just removed the line that was floating there, and the problem is still there.

Some more info: when I type the dollar sign on my keyboard it shows up, but the £ (Pound) sign typed on the keyboard shows as a diamond.
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 07:11:12 PM
Yes, the dollar sign is an ASCII character (ASCII = American Standard Code for Information Interchange), so it is normal it would show. The £ sign is not an ASCII character so it doesn't.
Title: Re: Not recognising some ascii characters suddenly
Post by: joeyjojoshabadoo on February 21, 2013, 07:16:40 PM
Hi Arantor, I've updated that post just above; I found out where it was coming from.

I can't figure out why it only stopped working (the pound sign displaying) last night.

I think I MAY have found something. When I go to Maintenance and run the "Check all files against current versions" test, the below is the only difference:

Language Files    2.0    2.0.4

So I'm on 2.0, and the latest is 2.0.4. The forum is on 2.0.4 overall, so I'm not sure what's happening here. Could this be the key?
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 07:22:56 PM
It's extremely unlikely to be the cause.

Seriously, take a backup and then convert to UTF-8.
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 21, 2013, 10:11:19 PM
The Euro and Pound characters would have been valid if the page were displayed in Latin-1 (actually, Windows-1252). They are invalid UTF-8 characters. Because the page is being displayed as UTF-8, they don't show. Once you figure out why the browser is displaying it as UTF-8 (and fix it), the characters should display correctly.

If you removed that stray line from the HEAD, did it have no effect on the encoding problem, or have you not removed the line yet?
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 21, 2013, 10:20:21 PM
It appears to be gone, which means we're back to something that is inserted which isn't valid ISO-8859-1.

I find no evidence to support it displaying UTF-8. Every indication is that it is ISO-8859-1. I would wonder if your browser is not, in fact, faulty and claiming the page is UTF-8 when it is not.
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 22, 2013, 09:27:13 AM
Everything is valid in ISO-8859-1. I don't know why you [Arantor] insist that it's not displaying in UTF-8, despite everything in the source specifying Latin-1. Both FF 19 and IE 9 display the page in UTF-8 (View > Encoding), and then there are the ?-in-black-diamonds, which are UTF-8's marker for invalid character codes. I have seen misconfigured servers that override the meta charset to force display in Latin-1; there's no reason to suppose one couldn't be forcing UTF-8. If you use View > Encoding to force it from UTF-8 to ISO-8859-1, the page displays fine, which proves that the source text is OK.

I say "bring this to the attention of your hosting company, so they can see if some misconfiguration (changed the other night) is overriding the character set specification and forcing UTF-8". They may be able to trap the header information and find something being sent there.
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 22, 2013, 09:31:40 AM
Quote
I don't know why you [Arantor] insist that it's not displaying in UTF-8, despite everything in the source specifying Latin-1

Because I find absolutely zero evidence to explain why the browser is treating it as UTF-8 when everything about it says it should be in ISO. As I outlined, the HTTP headers and the page content clearly say it is ISO encoded. If you have browsers that are changing it to UTF-8, they're doing so incorrectly.

Quote
They may be able to trap the header information and find something being sent there.

I already looked through the headers. I even posted the HTTP header to that effect. The page is screaming ISO in its meta content. What more can the host do?

Oh, you know best, you fix it.
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 22, 2013, 10:03:01 AM
I will let the evidence speak for itself. When JoeyJJS comes back and says, "the host admits they misconfigured the server the other day and was forcing UTF-8, and now they've fixed it and the pages display fine," I will enjoy watching you eat crow.

"Joey Jojo Shabadoo? That's the stupidest name I ever heard of!"
                                                              -- Moe Szyslak, The Simpsons
Title: Re: Not recognising some ascii characters suddenly
Post by: joeyjojoshabadoo on February 22, 2013, 12:12:54 PM
Quote from: Arantor on February 21, 2013, 10:20:21 PM
It appears to be gone, which means we're back to something that is inserted which isn't valid ISO-8859-1.

I find no evidence to support it displaying UTF-8. Every indication is that it is ISO-8859-1. I would wonder if your browser is not, in fact, faulty and claiming the page is UTF-8 when it is not.

I actually just amended the sig to take the offending characters out.

Lads, I will do the backup as advised and convert to UTF-8 on Monday, and let you know what happens. Thanks for all your help, much appreciated.
Title: Re: Not recognising some ascii characters suddenly
Post by: joeyjojoshabadoo on February 26, 2013, 12:13:21 PM
Quote from: Arantor on February 21, 2013, 10:38:47 AM
Neither of those are ASCII characters. And I'm not sure of their status within ISO-8859-1 encoding which is what your forum is using.

Back it up and convert to UTF-8 from the admin panel. It might not fix existing posts but it will ensure future posts will work.

Did it, and all looks good now, mate. Thanks again for your help.
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 26, 2013, 12:20:00 PM
Quote from: joeyjojoshabadoo on February 26, 2013, 12:13:21 PM
did it, and all looks good now mate. Thanks again for your help.

Awesome :)

Quote from: MrPhil on February 22, 2013, 10:03:01 AM
I will let the evidence speak for itself. When JoeyJJS comes back and says, "the host admits they misconfigured the server the other day and was forcing UTF-8, and now they've fixed it and the pages display fine," I will enjoy watching you eat crow.

I think the evidence does speak for itself. Pie's in the oven.
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 26, 2013, 02:00:29 PM
Not the same thing. JJJS changed his forum to work around the hosting problem, rather than having the host fix the problem. All the same to him, but the original cause was never addressed.
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 26, 2013, 04:17:44 PM
The only person who ever saw it as anything other than ISO-8859-1 was you... but it's funny how I suggested it up front and now mysteriously it all works, isn't it? According to you that should never have worked, right?
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 26, 2013, 10:25:21 PM
Quote
The only person who ever saw it as anything other than ISO-8859-1 was you...
And both Firefox and Internet Explorer.
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 26, 2013, 10:33:27 PM
For *you*.
Title: Re: Not recognising some ascii characters suddenly
Post by: MrPhil on February 27, 2013, 09:05:34 AM
And for the original poster (the only way to get the ?-in-black-diamond glyph is to be displaying in UTF-8).
Title: Re: Not recognising some ascii characters suddenly
Post by: Arantor on February 27, 2013, 09:09:51 AM
Which would be funny if that was what I saw... which it wasn't.