Re: charset=ISO-8859-1 and UTF-8

Started by JaVa^, March 28, 2018, 01:34:36 PM

JaVa^

Sorry for bringing this topic up again, but I have a problem with encodings on our forum. Our service provider changed their database setup so that the default encoding is UTF-8; our previous encoding was ISO-8859-1. This broke the special characters we use here in Finland, so I took a dump of the database and ran it through iconv to convert those characters. That did not go well: it didn't convert the euro sign properly, and I could not import the dump because of this error: #1267 - Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='. I fixed those occurrences manually by replacing them with €, which seemed to do the job. Then I realised that iconv had not converted the Ö character properly either, so I fixed that with find and replace in Notepad++. That sorted out the rest of the problems. I also tried the forum's own database option for converting HTML entities to UTF-8, but I don't think it is intended for that purpose. We are using SMF version 1.1.21.
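One plausible reason the euro sign came out wrong: ISO-8859-1 has no euro sign at all, while MySQL's `latin1` character set is in practice Windows-1252 (cp1252), where byte 0x80 is €. A conversion with `iconv -f ISO-8859-1` would therefore map that byte to the invisible control character U+0080 instead. The Python 3 sketch below is only an illustration (not part of SMF); it compares the two code pages for the characters mentioned in this thread.

```python
# Compare how the euro sign and the Finnish letters map in ISO-8859-1 vs. Windows-1252.
samples = ["€", "ä", "ö", "Ö"]

def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

for ch in samples:
    print(f"{ch!r}: latin-1={try_encode(ch, 'latin-1')!r} cp1252={try_encode(ch, 'cp1252')!r}")

# Byte 0x80 is the euro sign in cp1252, but a C1 control character (U+0080)
# when read as ISO-8859-1, which is what `iconv -f ISO-8859-1` assumes.
raw = b"\x80"
print(repr(raw.decode("cp1252")))   # '€'
print(repr(raw.decode("latin-1")))  # '\x80'
```

If a dump really is single-byte latin1 data, converting with `-f CP1252` rather than `-f ISO-8859-1` may match what MySQL stored more closely; as it turned out later in this thread, though, the data here was already UTF-8.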

The problem now is that all the posts, categories, etc. are fine and displayed properly, but in the personal messages section those special characters are displayed wrong. Could someone explain why personal messages behave differently from the rest of the forum?

We have set this in Settings.php:
$db_character_set = 'utf8';

Also we have modified the index.finnish.php and index.english.php files to have this line:
$txt['lang_character_set'] = 'UTF-8';

Kindred

Well, since 1.1.x is at end of life (and I am not sure how well it handles UTF-8 anyway), you should upgrade to SMF v2.0.15.
Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

GigaWatt

I also had a similar problem with Cyrillic characters in the database (SMF 1.1.16). The SMF conversion tool (in the admin menu) didn't do its job as expected. Some of the tables were converted, but not all of them. For example, thread titles were converted but post contents were not, and board names showed as expected (in Cyrillic) but board descriptions did not, so I had to convert the rest of the database manually. I ran this in phpMyAdmin:

ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE db_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

Here db_name is the name of your database and db_table is the name of the specific table you're converting to UTF-8. I converted the database and all of its tables, installed a fresh copy of SMF 2.0.15 and set the database as UTF-8 during installation (I think the installer asks whether your database is UTF-8 compatible). Everything was fine :), but just in case, I also ran the UTF-8 conversion tool in the admin panel.
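The two statements above have to be repeated for every table. A small Python sketch can generate them from a list of table names (for example, the output of `SHOW TABLES`); the database and table names in the usage line are hypothetical. Note that `CONVERT TO CHARACTER SET` assumes the stored bytes really are latin1: if the columns are declared latin1 but actually hold UTF-8 bytes, it will double-encode them, so check a few rows and keep a backup before running anything.

```python
from __future__ import annotations


def conversion_statements(db_name: str, tables: list[str],
                          charset: str = "utf8",
                          collation: str = "utf8_general_ci") -> list[str]:
    """Build the ALTER DATABASE / ALTER TABLE statements from the post for each table."""

    def q(identifier: str) -> str:
        # Backtick-quote an identifier, doubling embedded backticks (MySQL quoting rule).
        return "`" + identifier.replace("`", "``") + "`"

    stmts = [f"ALTER DATABASE {q(db_name)} DEFAULT CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        stmts.append(
            f"ALTER TABLE {q(table)} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return stmts


# Hypothetical names; take the real table list from SHOW TABLES.
for stmt in conversion_statements("forum_db", ["smf_messages", "smf_personal_messages"]):
    print(stmt)
```

The generated SQL can then be pasted into phpMyAdmin the same way as the hand-written statements.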

More info: https://stackoverflow.com/questions/12756877/how-to-convert-latin1-swedish-ci-data-into-utf8-general-ci

I haven't tested this in SMF 1.1.x, only in 2.0.15, so I have no idea whether everything will show as expected in SMF 1.1.21 (Kindred pointed out why).
"This is really a generic concept about human thinking - when faced with large tasks we're naturally inclined to try to break them down into a bunch of smaller tasks that together make up the whole."

"A 500 error loosely translates to the webserver saying, "WTF?"..."

JaVa^

We know that we have to upgrade to 2.0.x at some point, but it requires some testing, so it's not an option at the moment; we have to get things sorted as soon as possible.

I got this partially working. The problem was that the database was set like this:
DEFAULT CHARACTER SET latin1
but the data in it was in fact originally UTF-8, so when I tried to convert it with iconv, it messed things up even more.
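When UTF-8 bytes sit in a column declared as latin1 and are then converted as if they really were latin1, every ä turns into two characters (Ã¤). The Python sketch below is an illustration, not SMF code: it reproduces that double encoding and undoes one round of it, using cp1252 because that is what MySQL's latin1 actually is. On the MySQL side, the commonly suggested equivalent is to change the affected column to a binary type (such as BLOB) and then to a utf8 text type, so MySQL relabels the bytes instead of converting them.

```python
def double_encode(text: str) -> str:
    """Simulate UTF-8 bytes being read as latin1 (cp1252) and then stored again as text."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo one round of double encoding: recover the original bytes, then decode them as UTF-8.

    Caveat: Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined,
    so text whose UTF-8 form contains them (e.g. "Á" = C3 81) raises an error here.
    Finnish letters and the euro sign are not affected.
    """
    return text.encode("cp1252").decode("utf-8")


original = "Kesäkuu"
broken = double_encode(original)
print(broken)          # KesÃ¤kuu
print(repair(broken))  # Kesäkuu
```

Running `repair` on text that was never double-encoded would corrupt it, so apply it only to rows that visibly show the Ã-style pattern.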

Now the only thing I'm struggling with is the timestamps of the posts. One of our month names has Scandinavian characters, and it is broken like this:
Kes?kuu 19, 2011, 21:07:54
I tried to find those timestamps in the database to fix them manually, but I can't seem to find them.
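The broken month name is most likely not stored in the database at all: as far as I know, SMF stores post times as integer Unix timestamps (in SMF 1.1 the `posterTime` column of `smf_messages`) and builds the date text, month name included, only when the page is rendered. The Python sketch below imitates that rendering with the month array from index.finnish.php; the timestamp value is hypothetical, chosen to fall on 19 June 2011, and the output format only mimics the one shown above.

```python
from datetime import datetime, timedelta, timezone

# Same order as $txt['months'] in index.finnish.php. The PHP array starts at
# index 1, this Python list at 0, hence the "- 1" below.
FINNISH_MONTHS = ['Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu',
                  'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu']


def format_post_time(unix_ts: int, tz: timezone = timezone.utc) -> str:
    """Render a stored integer timestamp as 'Month day, year, HH:MM:SS'."""
    dt = datetime.fromtimestamp(unix_ts, tz)
    month = FINNISH_MONTHS[dt.month - 1]
    return f"{month} {dt.day}, {dt.year}, {dt:%H:%M:%S}"


# Hypothetical stored value; the database holds only this number, not the text.
print(format_post_time(1308506874))                                # Kesäkuu 19, 2011, 18:07:54
print(format_post_time(1308506874, timezone(timedelta(hours=3))))  # Kesäkuu 19, 2011, 21:07:54
```

Because the month name is produced at display time, the fix belongs in the language file or locale settings rather than in the posts table.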

Aleksi "Lex" Kilpinen

I'd guess that your language files contain ISO-8859-1 characters while the forum uses UTF-8.
Glory to Ukraine!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

JaVa^

They are like this:

index.finnish.php:$txt['months'] = array(1 => 'Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu', 'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu');
index.finnish.php:$txt['months_titles'] = array(1 => 'Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu', 'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu');
index.finnish.php:$txt['months_short'] = array(1 =>  'Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu', 'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu');

Aren't those correct?
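An editor showing the right letters does not prove that a file is saved as UTF-8, since many editors detect the encoding and display ISO-8859-1 text correctly too. A quick check is to read the raw bytes and attempt a strict UTF-8 decode; the Python sketch below does that and reports the first offending byte. The script name and the language-file path in the comment are assumptions.

```python
from __future__ import annotations

import sys


def first_invalid_utf8_offset(data: bytes) -> int | None:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if the data is valid."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return None
    except UnicodeDecodeError as err:
        return err.start


def check_file(path: str) -> None:
    with open(path, "rb") as fh:  # read raw bytes; no decoding yet
        data = fh.read()
    offset = first_invalid_utf8_offset(data)
    if offset is None:
        print(f"{path}: valid UTF-8")
    else:
        line = data.count(b"\n", 0, offset) + 1
        print(f"{path}: not UTF-8, first bad byte 0x{data[offset]:02x} on line {line}")


if __name__ == "__main__":
    # Hypothetical usage: python check_utf8.py Themes/default/languages/index.finnish.php
    for p in sys.argv[1:]:
        check_file(p)
```

A pure-ASCII file also passes this check, so it only tells you something about files that actually contain ä, ö or €.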

Kindred

Instead of trying to debug an issue on a version which is barely supported, I'd spend your efforts on actually getting the upgrade in place.

Aleksi "Lex" Kilpinen

Quote from: JaVa^ on March 30, 2018, 03:22:09 AM
They are like this:

index.finnish.php:$txt['months'] = array(1 => 'Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu', 'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu');
index.finnish.php:$txt['months_titles'] = array(1 => 'Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu', 'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu');
index.finnish.php:$txt['months_short'] = array(1 =>  'Tammikuu', 'Helmikuu', 'Maaliskuu', 'Huhtikuu', 'Toukokuu', 'Kesäkuu', 'Heinäkuu', 'Elokuu', 'Syyskuu', 'Lokakuu', 'Marraskuu', 'Joulukuu');

Aren't those correct?
Could you give me a link to a post where this is visible? Those do look right, but I still think it's the language files, the charset settings, or a combination of those.

EDIT:
Also, just making sure: does your index.finnish-utf8.php define both the locale and the character set like this:


// Again, SPELLING SHOULD BE '' 99% OF THE TIME!!  Please read this!
$txt['lang_locale'] = 'fi_FI.utf8';
$txt['lang_dictionary'] = 'fi';
$txt['lang_spelling'] = '';

// Character set and right to left?
$txt['lang_character_set'] = 'UTF-8';
$txt['lang_rtl'] = false;

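The locale line matters because, assuming SMF formats month names through PHP's locale-dependent `strftime()` under the locale in `$txt['lang_locale']`, a plain `fi_FI` locale on glibc systems uses ISO-8859-1 and emits ä as the single byte 0xE4. That byte is invalid inside a UTF-8 page, so the browser shows a replacement character (often rendered as ? or �), which would match "Kes?kuu" even though the `$txt['months']` arrays look correct. The Python sketch below only simulates this; the locale-to-charset table is a hypothetical stand-in for what the server's locales actually use (`locale -a` lists them on most Linux systems).

```python
# Hypothetical mapping from a lang_locale value to the charset its output uses.
LOCALE_CHARSETS = {"fi_FI": "iso-8859-1", "fi_FI.utf8": "utf-8"}


def month_bytes(month: str, locale_name: str) -> bytes:
    """Bytes a locale-aware formatter would emit for the month name under that locale."""
    return month.encode(LOCALE_CHARSETS[locale_name])


def as_shown_on_utf8_page(data: bytes) -> str:
    """Decode as a browser would for a page declared UTF-8: invalid bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


for loc in ("fi_FI", "fi_FI.utf8"):
    print(loc, repr(as_shown_on_utf8_page(month_bytes("Kesäkuu", loc))))
```

Under `fi_FI` the result is "Kes\ufffdkuu"; under `fi_FI.utf8` it is "Kesäkuu", provided that locale is actually installed on the server.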
