I'm on a dedicated server, and my ISP maintains everything.
I'm running 1.1.4. It's a large database that has run extremely well for years.
Yesterday my ISP upgraded PHP to v. 5.2.3.
Well, my forum started spitting out funky characters...
I've never run "Convert the database and data to UTF-8"... yet have never had any need to... and never wanted to risk it.
Well, in the coming hours - I guess I am going to try to convert it.... what are the risks?
... or is there some other solution?
Thanks
1.1.4 is EXTREMELY obsolete and vulnerable to hacks. You're 10 upgrades behind.
AFAIK even something that old should be PHP 5 compatible, but I won't swear to it. Converting to UTF-8 isn't necessary and isn't going to help -- the problem is elsewhere. Did your host also upgrade MySQL at the same time? Maybe in the process they changed your database encoding to UTF-8? If server software (e.g., Apache) was also upgraded at this time, it's possible that your host botched it and set something to override your declared encoding (e.g., display in Latin-1 instead of UTF-8, which I've seen a number of times).
What sort of "funky characters" are you getting in your browser? Do they include a lot of question-mark-in-black-diamonds? If so, you're displaying Latin-1 text (or some other single-byte encoding) as UTF-8. Or, are accented non-ASCII characters replaced by a stream of two or three odd accented characters each? That would be a sign that UTF-8 is now being displayed in a single-byte encoding such as Latin-1. Do you know enough about your forum to say what the database encoding/collation is, what languages are supported in what encodings, and what the pages are supposed to be displayed in?
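To make the two symptoms concrete, here is a small Python sketch (the sample string is made up for illustration, it is not from the forum in question). It encodes the same text both ways and then decodes it with the wrong encoding, which is roughly what a browser does when the declared charset doesn't match the bytes.

```python
# Illustration only: the sample text is hypothetical.
text = "café » naïve"

utf8_bytes = text.encode("utf-8")      # accented characters take 2 bytes each
latin1_bytes = text.encode("latin-1")  # accented characters take 1 byte each

# Symptom A: UTF-8 bytes read as Latin-1.
# Each accented character turns into two odd accented characters.
utf8_shown_as_latin1 = utf8_bytes.decode("latin-1")

# Symptom B: Latin-1 bytes read as UTF-8.
# Each accented character is an invalid sequence and becomes U+FFFD,
# which browsers draw as a question mark in a black diamond.
latin1_shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_shown_as_latin1)
print(latin1_shown_as_utf8)
```

If what you see resembles the first printed line, UTF-8 data is being shown as Latin-1; if it resembles the second, it's the other way around.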
I can't see why an upgrade of PHP would affect the data in the database. At most it would only affect the output of the data. Before doing anything, check with your host that it's not a PHP setting that differs from your old version, as the data itself wouldn't have been touched by a PHP upgrade.
All the function does is convert HTML-entities to UTF-8 characters.
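As a rough picture of what "convert HTML entities to UTF-8 characters" means, here is a Python sketch. It is not SMF's code (SMF is PHP, and I haven't checked what its conversion routine actually does); the function name and sample string are invented for illustration.

```python
import html

def entities_to_utf8(stored_text: str) -> bytes:
    """Turn HTML entities into real characters, then encode them as UTF-8.

    Hypothetical helper for illustration; not taken from SMF.
    """
    return html.unescape(stored_text).encode("utf-8")

# "&#187;" is the numeric entity for the right-pointing double angle quote.
print(entities_to_utf8("&#187; Caf&eacute;"))
```

The point is that text stored as entities is plain ASCII and displays the same in any encoding, while the converted text only displays correctly if the page is really served as UTF-8.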
Quote from: MrPhil on August 30, 2011, 07:12:50 PM
1.1.4 is EXTREMELY obsolete and vulnerable to hacks. You're 10 upgrades behind.
I'll bear that in mind - thanks.
Quote from: MrPhil on August 30, 2011, 07:12:50 PM
AFAIK even something that old should be PHP 5 compatible, but I won't swear to it. Converting to UTF-8 isn't necessary and isn't going to help -- the problem is elsewhere.
Ok, cool...
Quote from: MrPhil on August 30, 2011, 07:12:50 PM
Did your host also upgrade MySQL at the same time? Maybe in the process they changed your database encoding to UTF-8? If server software (e.g., Apache) was also upgraded at this time, it's possible that your host botched it and set something to override your declared encoding (e.g., display in Latin-1 instead of UTF-8, which I've seen a number of times).
I do not know.... Botched in an unfixable way?
Quote from: MrPhil on August 30, 2011, 07:12:50 PM
What sort of "funky characters" are you getting in your browser? Do they include a lot of question-mark-in-black-diamonds? If so, you're displaying Latin-1 text (or some other single-byte encoding) as UTF-8. Or, are accented non-ASCII characters replaced by a stream of two or three odd accented characters each?
It looks like... HTML entities such as » are being replaced with accented characters... (see attachment) so I guess the latter.
Quote from: MrPhil on August 30, 2011, 07:12:50 PM
Do you know enough about your forum to say what the database encoding/collation is, what languages are supported in what encodings, and what the pages are supposed to be displayed in?
Sadly, no - but I use phpmyadmin all the time, so I guess I could look....
Hey I appreciate the response!
Quote from: digit on August 30, 2011, 07:58:18 PM
I do not know.... Botched in an unfixable way?
Mostly fixable, depending on how long you let this go and how much effort you're willing to put in. The biggest danger is that the page character encoding has changed, and no longer matches what's already in your database. If users type in accented and non-Latin characters, they'll be in a different encoding and will have to be manually corrected at some point (editing the database via phpMyAdmin).
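If a table does end up with a mix of encodings, a common heuristic for sorting it out is "try UTF-8 first, fall back to Latin-1", since Latin-1 text with accented characters is almost never valid UTF-8 by accident. A minimal Python sketch of that idea (the function name is made up, and this is a heuristic, not a guaranteed repair):

```python
def decode_mixed(raw: bytes) -> str:
    """Decode bytes that may be either UTF-8 or Latin-1.

    Hypothetical helper: strict UTF-8 decoding fails on most Latin-1
    accented text, so a failure is treated as "this was Latin-1".
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(decode_mixed("é".encode("utf-8")))    # stored while pages were UTF-8
print(decode_mixed("é".encode("latin-1")))  # stored while pages were Latin-1
```

Back up the database before trying anything like this on real data.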
Quote
It looks like... HTML entities such as » are being replaced with accented characters... (see attachment) so I guess the latter.
That looks like UTF-8 characters being displayed in a single-byte encoding such as Latin-1 or Windows-1252. Quite possible that your new server settings (Latin-1) are overriding what the page is asking to be displayed in (UTF-8). Talk to your host immediately. Meanwhile, your browser can be told to display in another encoding... try forcing UTF-8 for a page and see if it looks "right" now. If it does, see if the <meta> tag for character encoding on your page is still UTF-8 or it got changed somehow. If it still says "UTF-8", the server setup is wrong.
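The "server overrides the page" case can be checked by comparing the charset in the HTTP Content-Type header with the one in the page's <meta> tag; when both are present, browsers go by the header. Here is a Python sketch of that comparison. The function names are invented for illustration, and you would need to get the header value yourself (for example from your browser's network tools).

```python
import re
from typing import Optional, Tuple

_CHARSET = r"charset\s*=\s*[\"']?([\w.:-]+)"

def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Pull the charset out of a Content-Type header value, if present."""
    if not content_type:
        return None
    match = re.search(_CHARSET, content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None

def charset_from_meta(page_html: str) -> Optional[str]:
    """Pull the charset out of the first <meta> tag that declares one."""
    match = re.search(r"<meta[^>]+?" + _CHARSET, page_html, re.IGNORECASE)
    return match.group(1).lower() if match else None

def diagnose(content_type: Optional[str],
             page_html: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (header charset, meta charset, short verdict)."""
    header = charset_from_header(content_type)
    meta = charset_from_meta(page_html)
    if header and meta and header != meta:
        verdict = "mismatch: the server header overrides the <meta> tag"
    elif header or meta:
        verdict = "no conflict found"
    else:
        verdict = "no charset declared anywhere"
    return header, meta, verdict

print(diagnose(
    "text/html; charset=ISO-8859-1",
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />',
))
```

A mismatch like the one in the example is what a server-level default charset typically produces.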
Quote
Meanwhile, your browser can be told to display in another encoding... try forcing UTF-8 for a page and see if it looks "right" now. If it does, see if the <meta> tag for character encoding on your page is still UTF-8 or it got changed somehow. If it still says "UTF-8", the server setup is wrong.
Thanks...
Thanks my current meta tag is.....
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
How do I set that to UTF-8 (or otherwise force browsers to display as UTF-8?) ... I can't seem to find where that is being set - is the character encoding a variable I can change/set somewhere?
I don't see it in the admin area.... hmmmm....
Thanks.
There's a database entry (smf_settings?) for that. I think it's "character_encoding" or something like that (I don't have code to look at -- it might be in an SMF admin panel somewhere). Yours is either missing (the default in such a case is to use Latin-1/ISO-8859-1) or has been set to ISO-8859-1. Was your forum originally in UTF-8?
Browsers have something like View > Character Encoding or something similar. They'll tell you what the current encoding being used is (it should match your <meta> tag) and let you change it to anything else for this page, this one time (i.e., not permanently).
Did you upgrade to 1.1.14 at some point? I see your signature says you're no longer at 1.1.4.
Quote from: MrPhil on August 31, 2011, 01:35:15 PM
There's a database entry (smf_settings?) for that. I think it's "character_encoding" or something like that (I don't have code to look at -- it might be in an SMF admin panel somewhere). Yours is either missing (the default in such a case is to use Latin-1/ISO-8859-1) or has been set to ISO-8859-1. Was your forum originally in UTF-8?
Forum has never been UTF-8...I still can't find where the character encoding is set. I don't see it in the settings table.. or in the admin pages. :-P
Quote from: MrPhil on August 31, 2011, 01:35:15 PM
Browsers have something like View > Character Encoding or something similar. They'll tell you what the current encoding being used is (it should match your <meta> tag) and let you change it to anything else for this page, this one time (i.e., not permanently).
Found it - thanks...
Quote from: MrPhil on August 31, 2011, 01:35:15 PM
Did you upgrade to 1.1.14 at some point? I see your signature says you're no longer at 1.1.4.
Woops - my mistake... I am, and have been running 1.1.14...
In the interim, my ISP (after reading this post) downgraded my PHP and the trouble has gone away. woo hoo... (before I could change the encoding view in my browser to test unfortunately, but thanks for the info)
Then I think he's gonna help convert to UTF-8 - then re-update my PHP... barring you steering me in some other direction!
I appreciate your help!
Well, those characters in the image sure look like UTF-8 being displayed as Latin-1 or perhaps Windows-1252. If your forum has always been Latin-1 (ISO-8859-1), I'm wondering where that UTF-8 text slipped in! Part of the message (up to ...) looks like something that would have come out of the database, rather than a language file, but stuff in the page list (?) looks like something that would either be hardcoded into your code file (a bad practice), or coming from a language file. It would have been helpful to know if that funny text was in fact UTF-8. If this stuff was hardcoded into some code file (e.g., a linktree call), it should have been giving trouble regardless of PHP level.
Your host swears they never touched MySQL or Apache? I can't imagine what in PHP would be producing UTF-8 in one version and Latin-1 otherwise. What was your PHP level before (and now) -- 4.4.x? I can't think of what would produce different text output from database or language files or from PHP code, if just the PHP version changed.
Did you look for SMF and system error logs, including a directory-by-directory search for something like "error_log" files? Maybe one PHP version or the other is logging error messages that you're overlooking. I would be especially concerned about "headers already sent" errors.
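The directory-by-directory search can be scripted. Below is a Python sketch that walks a directory tree, opens every file named "error_log", and reports lines mentioning "headers already sent". The file name and the search phrase are assumptions based on common PHP setups; your host may log somewhere else or under a different name.

```python
import os
from typing import Iterator, Tuple

def find_log_lines(root: str,
                   log_name: str = "error_log",
                   needle: str = "headers already sent") -> Iterator[Tuple[str, str]]:
    """Yield (log path, line) for every matching line under root.

    log_name and needle are assumed defaults, adjust them for your host.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if log_name not in filenames:
            continue
        path = os.path.join(dirpath, log_name)
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle.lower() in line.lower():
                    yield path, line.rstrip("\n")

# Example use, run from the forum's root directory:
#     for path, line in find_log_lines("."):
#         print(path, line)
```

Pass an empty string as needle to see every line in every log it finds.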
If your posts (messages), subject lines, category and board titles, PMs, user names, and any other text coming out of the database are fine, I wouldn't bother switching the database to UTF-8. Do that only if you have other reasons (wanting to support text in any language). Was the image sample you gave the only place where there are problems? It's probably a hardcoded << and >> (or something similar) in your theme's "linktree" call. I see that all the time. The solution is to replace hardcoded non-ASCII characters with HTML entities in code files (not necessary in language files). Assuming you're not using a standard SMF theme, try switching to another theme (such as theme=1 standard core theme) and see if you still get funny characters in PHP 5.2.
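For the "replace hardcoded non-ASCII characters with HTML entities" step, here is a Python sketch of the transformation itself. It is only meant to show what the result should look like; the function name is made up, and you'd still apply the change by hand (or with your editor) in the theme's PHP files.

```python
def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity.

    Hypothetical helper: the result is pure ASCII, so it displays the
    same whether the page is served as Latin-1 or UTF-8.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_html_entities("« Home »"))
```

Numeric entities such as &#171; and &#187; are equivalent to the named ones (&laquo; and &raquo;), so either form works in a template.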
These funky characters were also appearing in messages and titles. (sorry can't supply the images) - so this was not caused by language, source or template files.
From my ISP...
The DB and MySql were not touched. The only update was from
PHP-5.3.6 -> PHP-5.3.8 (from Gentoo Portage).
If all that happened was going from PHP 5.3.6 to 5.3.8, and these character corruptions are widespread, all I can think of is that it was a bad update, possibly with corrupted or dummy mb_* routines. 5.3.6 is plenty bleeding edge as it is, so if rolling back to that level works, don't sweat it.