Not recognising some ASCII characters suddenly

Started by joeyjojoshabadoo, February 21, 2013, 10:34:00 AM


joeyjojoshabadoo

sorry gents, another problem - not having much fun lately with SMF, and my head is melted.

For some reason, since last night the forum won't recognise some characters, namely £ and the euro sign.

Anyone any idea why this might be happening? I ran the basic maintenance checks last night and nothing more. I didn't touch any language settings or anything else, and it seems to be happening now.

an example is here, in the signature - the Diamond characters with question marks.

http://www.dropkickrugby.com/forum/international-rugby/six-nations-betting-round-3-2324-feb/msg4134/#msg4134

Arantor

Neither of those are ASCII characters. And I'm not sure of their status within ISO-8859-1 encoding which is what your forum is using.

Back it up and convert to UTF-8 from the admin panel. It might not fix existing posts but it will ensure future posts will work.
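A minimal sketch of what such a conversion does to one stored string, assuming the existing data is Latin-1/Windows-1252 bytes. This is not SMF's converter; `to_utf8` and the sample post text are hypothetical, for illustration only.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return raw re-encoded as UTF-8, assuming it was stored as source_encoding."""
    try:
        # If the bytes already decode as UTF-8, leave them untouched.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise treat them as the legacy encoding and transcode.
        return raw.decode(source_encoding).encode("utf-8")


# Hypothetical old post containing a single-byte pound sign (0xA3).
old_post = b"Stake \xa35 at 2/1"
converted = to_utf8(old_post)
print(converted)
```

The "already valid UTF-8" check is only a heuristic: pure ASCII passes through unchanged, and legacy text that happens to form valid UTF-8 sequences would be left alone. That is one reason a backup before converting is sensible.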

joeyjojoshabadoo

Does it make any difference that it only started happening last night, Arantor?

And will UTF-8 slow things down or mess with anything else?

thanks again for your help.

ps- any idea why it happened in the first place?

Arantor

If it only started last night, something must have changed somewhere. Things do not randomly stop working.

So did you change anything anywhere last night?

UTF-8 should not slow things down in any meaningful way.

MrPhil

The Euro sign is not in Latin-1, but is in Windows-1252 ("Smart Quotes"). The Pound sign is in Latin-1. Your forum is already (or at least currently) in UTF-8. What was the original source of this text showing the <?> signs? Was it cut and pasted from Word or Outlook? If so, the Euro and Pound sign will be single bytes but wrongly encoded for UTF-8. If the text was originally working on a Latin-1 encoded system, did you convert over to UTF-8 at some point? Maybe this particular text failed to be converted. Or, it was actually Windows-1252 and many browsers actually use that (display correctly) when Latin-1 is requested, as it is a superset of Latin-1 for all practical intents and purposes. However, it won't cleanly convert to UTF-8.
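The claims about which encoding contains which character can be checked with a short Python sketch. The helper name `encode_or_none` and the `table` layout are illustrative, not from the thread; `cp1252` is Python's codec name for Windows-1252.

```python
# Which encodings can represent the pound and euro signs, and as which bytes?
CHARS = {"pound": "\u00a3", "euro": "\u20ac"}
ENCODINGS = ("ascii", "iso-8859-1", "cp1252", "utf-8")


def encode_or_none(ch, encoding):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


table = {
    (name, enc): encode_or_none(ch, enc)
    for name, ch in CHARS.items()
    for enc in ENCODINGS
}

for (name, enc), data in table.items():
    shown = data.hex() if data is not None else "not representable"
    print(f"{name:5} {enc:10} {shown}")
```

The pound sign is the single byte A3 in both Latin-1 and Windows-1252, the euro sign exists only in Windows-1252 (byte 80), and in UTF-8 both take more than one byte.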

Arantor

Quote: Your forum is already (or at least currently) in UTF-8.

No, it isn't. Look at the link provided and its source.

var smf_charset = "ISO-8859-1";

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

HTTP response headers:
Content-Type: text/html; charset=ISO-8859-1
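For anyone wanting to repeat this check, here is a rough Python sketch that pulls the declared charset out of a Content-Type header value and out of a meta tag. The function names and the regular expression are assumptions made for illustration; the regex is a simplification and not a full HTML parser.

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset lowercased, or None if no charset is declared.
    return msg.get_content_charset()


# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_RE = re.compile(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def charset_from_meta(html):
    """Return the charset declared in the first matching meta tag, or None."""
    match = META_RE.search(html)
    return match.group(1).lower() if match else None


header = "text/html; charset=ISO-8859-1"
meta = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />'
print(charset_from_header(header), charset_from_meta(meta))
```

When the HTTP header and the meta tag disagree, browsers give the header priority, so the header is the first thing to compare against what the browser reports.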

MrPhil

Something weird is going on then. Firefox swears it's UTF-8 encoded. That would explain the ?-in-diamonds, if they're Latin-1/Windows-1252 bytes (Euro or Pound) being interpreted as UTF-8 by the browser.

Might be time to give the host a nudge and ask them why a Latin-1 site shows up as UTF-8.

Arantor

Everything about the page says it's being served in ISO. As far as I can see, there is no reason the server would ever be under the impression it is UTF-8, nor any point at which it serves the page as such.

Seems to me that the only thing a host could do is take one look at the DB to see if any of it is in UTF-8 or not...

MrPhil

The following misplaced line is in the <head> section:
Quote: Rugby betting tips, previews, odds and betting forum

Maybe that's messing up something? Was it supposed to be a title or a <meta description>?

Arantor

Oh yes, so it is. Is it possible that that's why FF is misidentifying the content type?

joeyjojoshabadoo

Guys, I really appreciate the help here.

I found where that was coming from: I had installed the global headers and footers mod. I have just removed the line that was floating there, and the problem is still there.

Some more info - when I type in the dollar sign using my keyboard it shows up, but the £ (Pound) sign using the keyboard is a diamond

Arantor

Yes, the dollar sign is an ASCII character (ASCII = American Standard Code for Information Interchange), so it is normal it would show. The £ sign is not an ASCII character so it doesn't.
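As a quick illustration of that distinction, ASCII covers only code points 0 to 127. A small Python sketch (the helper `is_ascii` is a hypothetical name) shows where the three characters fall:

```python
def is_ascii(ch: str) -> bool:
    """True if the single character lies in the 7-bit ASCII range."""
    return ord(ch) < 128


# Dollar, pound and euro signs, written as escapes to avoid encoding issues.
for ch in ("$", "\u00a3", "\u20ac"):
    print(f"U+{ord(ch):04X} ascii={is_ascii(ch)}")
```

The dollar sign is U+0024 and fits in ASCII; the pound sign (U+00A3) and euro sign (U+20AC) do not, so how they are stored depends on the encoding in use.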

joeyjojoshabadoo

Hi Arantor, I've updated that post just above; I found out where it was coming from.

I can't figure out why it only stopped working last night (the pound sign displaying).

I think I MAY have found something. When I go to maintenance and run the "Check all files against current versions" test, the below is the only difference:

Language Files    2.0    2.0.4

So I'm on 2.0, and the latest is 2.0.4. The forum is on 2.0.4 overall, so not sure what's happening here. Could this be the key?

Arantor

It's extremely unlikely to be the cause.

Seriously, take a backup and then convert to UTF-8.

MrPhil

The Euro and Pound characters would have been valid if the page were displayed in Latin-1 (actually, Windows-1252). As single bytes they are not valid UTF-8, so with the page being displayed as UTF-8 they don't show. Once you figure out why the browser is displaying it in UTF-8 (and fix it), the characters should display correctly.

If you removed that stray line from the HEAD, did it have no effect on the encoding problem, or have you not removed the line yet?

Arantor

It appears to be gone, which means we're back to something that is inserted which isn't valid ISO-8859-1.

I find no evidence to support it displaying UTF-8. Every indication is that it is ISO-8859-1. I would wonder if your browser is not, in fact, faulty and claiming the page is UTF-8 when it is not.

MrPhil

Everything is valid in ISO-8859-1. I don't know why you [Arantor] insist that it's not displaying in UTF-8, despite everything in the source specifying Latin-1. Both FF 19 and IE 9 display the page in UTF-8 (View > Encoding), plus there is the presence of ?-in-black-diamonds, the replacement character (U+FFFD) shown for bytes that are invalid in UTF-8. I have seen misconfigured servers that override the meta charset to force display in Latin-1; there's no reason to suppose it couldn't be forcing UTF-8. If you View > Encoding, and force it from UTF-8 to ISO-8859-1, the page displays fine, proving that the source text is OK.
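The mechanism described here, the same bytes rendering differently depending on the decoder, can be reproduced with a short Python sketch. The sample bytes are made up for illustration, and the sketch only shows the decoding behaviour; it says nothing about why a given browser picked one encoding over the other.

```python
# Hypothetical signature text stored as single-byte Windows-1252:
# 0xA3 is the pound sign, 0x80 is the euro sign.
raw = b"Stake: \xa310 or \x8012"

# Decoded as UTF-8, the lone high bytes are invalid and each becomes U+FFFD,
# which browsers draw as a question mark in a black diamond.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoded as Windows-1252, the same bytes are the intended characters.
as_cp1252 = raw.decode("cp1252")

print(as_utf8.encode("unicode_escape").decode("ascii"))
print(as_cp1252.encode("unicode_escape").decode("ascii"))
```

The two decodes start from identical bytes, which matches the observation that switching the browser's encoding makes the page display correctly without the source changing.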

I say "bring this to the attention of your hosting company, so they can see if some misconfiguration (changed the other night) is overriding the character set specification and forcing UTF-8". They may be able to trap the header information and find something being sent there.

Arantor

Quote: I don't know why you [Arantor] insist that it's not displaying in UTF-8, despite everything in the source specifying Latin-1

Because I find absolutely zero evidence to explain why the browser is treating it as UTF-8 when everything about it says it should be in ISO. As I outlined, the HTTP headers and the page content clearly say it is ISO encoded. If you have browsers that are changing it to UTF-8, they're doing so incorrectly.

Quote: They may be able to trap the header information and find something being sent there.

I already looked through the headers. I even posted the HTTP header to that effect. The page is screaming ISO in its meta content. What more can the host do?

Oh, you know best, you fix it.

MrPhil

I will let the evidence speak for itself. When JoeyJJS comes back and says, "the host admits they misconfigured the server the other day and was forcing UTF-8, and now they've fixed it and the pages display fine," I will enjoy watching you eat crow.

"Joey Jojo Shabadoo? That's the stupidest name I ever heard of!"
                                                              -- Moe Szyslak, The Simpsons

joeyjojoshabadoo

Quote from: Arantor on February 21, 2013, 10:20:21 PM
It appears to be gone, which means we're back to something that is inserted which isn't valid ISO-8859-1.

I find no evidence to support it displaying UTF-8. Every indication is that it is ISO-8859-1. I would wonder if your browser is not, in fact, faulty and claiming the page is UTF-8 when it is not.

I actually just amended the sig to take the offending characters out.

Lads, I will do the backup as advised and convert to UTF-8 on Monday and let you know what happens. Thanks for all your help, much appreciated.

joeyjojoshabadoo

Quote from: Arantor on February 21, 2013, 10:38:47 AM
Neither of those are ASCII characters. And I'm not sure of their status within ISO-8859-1 encoding which is what your forum is using.

Back it up and convert to UTF-8 from the admin panel. It might not fix existing posts but it will ensure future posts will work.

did it, and all looks good now mate. Thanks again for your help.

Arantor

Quote from: joeyjojoshabadoo on February 26, 2013, 12:13:21 PM
did it, and all looks good now mate. Thanks again for your help.

Awesome :)

Quote from: MrPhil on February 22, 2013, 10:03:01 AM
I will let the evidence speak for itself. When JoeyJJS comes back and says, "the host admits they misconfigured the server the other day and was forcing UTF-8, and now they've fixed it and the pages display fine," I will enjoy watching you eat crow.

I think the evidence does speak for itself. Pie's in the oven.

MrPhil

Not the same thing. JJJS changed his forum to work around the hosting problem, rather than having the host fix the problem. All the same to him, but the original cause was never addressed.

Arantor

The only person who ever saw it as anything other than ISO-8859-1 was you... but it's funny how I suggested it up front and now mysteriously it all works, isn't it? According to you that should never have worked, right?

MrPhil

Quote: The only person who ever saw it as anything other than ISO-8859-1 was you...
And both Firefox and Internet Explorer.

Arantor


MrPhil

And for the original poster (the only way to get the ?-in-black-diamond glyph is to be displaying in UTF-8).

Arantor

Which would be funny if that was what I saw... which it wasn't.
