Switching to PHP 7 causes German umlauts to be displayed incorrectly

Started by orktown, June 10, 2017, 05:43:34 AM

orktown

Hi,

We recently tried to switch to PHP 7 after upgrading to 2.0.14. When we do, German umlauts in posts (smf_messages) are broken. German text in menus and general forum strings is still correct; for example, the German translation of "posts", "Beiträge", is still displayed correctly.

When I manually change the browser encoding to UTF-8, the posts display correctly again, but the umlauts in the menus break. When we switch back to PHP 5, the forum is fine again. So I guess some default has changed, maybe in PHP? Are messages stored as UTF-8 and converted later?

Forum language: German (ISO-8859-1)
Collation of smf_messages table: latin1_swedish_ci

Any hints/ideas would be helpful.
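
The symptom described above is what you would expect if a single page mixes two encodings. The following is a minimal Python sketch under two assumptions that are not confirmed in the post: that the menu strings come from ISO-8859-1 language files, and that the post bodies are stored as UTF-8 bytes. The helper name `render` is hypothetical and only imitates how a browser decodes bytes for a given page charset.

```python
# Illustrative sketch only: one page, two byte encodings.
menu_bytes = "Beiträge".encode("iso-8859-1")  # assumed: ISO-8859-1 language file
post_bytes = "Beiträge".encode("utf-8")       # assumed: UTF-8 bytes in the database


def render(raw: bytes, page_charset: str) -> str:
    """Decode raw bytes the way a browser would for the given page charset."""
    return raw.decode(page_charset, errors="replace")


for charset in ("iso-8859-1", "utf-8"):
    # With either charset, one of the two strings comes out wrong.
    print(charset, "| menu:", render(menu_bytes, charset),
          "| post:", render(post_bytes, charset))
```

Whichever charset the page declares, one of the two sources is decoded incorrectly, which matches the observation that fixing the posts in the browser breaks the menus.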

Arantor

This changed in PHP 5.4 to be UTF-8 by default. Ideally the forum should be converted; it will be forced in SMF 2.1 anyway.

orktown

What's "this"? What changed? The collation/default encoding of the tables?
How should I convert it? I tried clicking the convert button somewhere in the administration area, but it "did nothing". Or at least, I didn't notice anything.

What do you recommend?

Arantor

No, the default in the PHP language itself when it talks to your database.

Conversion requires first going through your database, then switching to UTF-8 language files, which you are not currently using.

shawnb61

I think the SMF admin function to convert to UTF-8 may help here; it does a good job of converting latin1 to utf8 (no matter what actually ended up stored in that latin1 database).
I'd back up the system first, to be safe. A backup is a hard requirement for this type of activity.
Then convert the database and choose the proper language files, as Arantor said.

More here:
https://wiki.simplemachines.org/smf/UTF-8_Readme
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

orktown

Quote from: Arantor on June 10, 2017, 10:38:25 AM
No, the default in the PHP language itself when it talks to your database.
Conversion requires first going through your database, then switching to UTF-8 language files, which you are not currently using.

Um, you said there was a change in PHP 5.4, but we switched to PHP 5.6 years ago. Why does it affect us only now, when switching to PHP 7? I also compared the PHP apache2 ini files. Apart from some unrelated changes, they are nearly identical. In particular, default_charset was UTF-8 in PHP 5.6 too.

Quote from: shawnb61 on June 10, 2017, 11:14:35 AM
I think the SMF admin function to convert to UTF-8 may help here; it does a good job of converting latin1 to utf8 (no matter what actually ended up stored in that latin1 database).
https://wiki.simplemachines.org/smf/UTF-8_Readme


I have already tried that admin function; it didn't change anything. I only checked the frontend, though, and didn't check whether the collation in the database changed. I also tried changing the collation manually, to no avail.
Maybe the table content is already UTF-8. Then I could just follow the steps in the link, switch the language packs and update all members' language settings. I will try that on a backup.
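
One way to test the "content is already UTF-8" guess is to look at the raw bytes of a post that contains an umlaut. The sketch below is a heuristic, not an SMF tool: the function name `classify` is hypothetical, and the hex strings are hand-made examples of what the bytes of "Beiträge" look like in each encoding (for instance as shown by MySQL's HEX() function). Pure ASCII text proves nothing either way, so it is reported separately.

```python
def classify(raw: bytes) -> str:
    """Heuristically classify raw column bytes as ascii, utf8 or not-utf8."""
    if raw.isascii():
        return "ascii"  # identical in latin1 and UTF-8, so inconclusive
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"  # probably a single-byte encoding such as latin1
    return "utf8"


# Hand-made sample bytes for "Beiträge" in each encoding.
print(classify(bytes.fromhex("42656974724ce46765".replace("4c", ""))))  # latin1 bytes
print(classify(bytes.fromhex("4265697472c3a46765")))                    # UTF-8 bytes
```

A latin1 string can be valid UTF-8 by coincidence, so it is safer to check several posts rather than one.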

orktown

Ok, that did it!
I downloaded the -utf8 language packs and then updated all users. I had to be a bit more careful, since we already had some utf8 languages installed and the query in the documentation would have changed them to polish-utf8-utf8 ...   :P
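
The double-suffix pitfall mentioned above can be avoided by only appending the suffix where it is missing. This is a small Python sketch of that rule, not the query from the documentation; the function name `to_utf8_language` is hypothetical, and leaving an empty value untouched is an assumption (treating it as "use the forum default").

```python
def to_utf8_language(lngfile: str) -> str:
    """Return the -utf8 variant of a language name without doubling the suffix."""
    if not lngfile or lngfile.endswith("-utf8"):
        return lngfile  # empty (assumed: forum default) or already converted
    return lngfile + "-utf8"


for name in ("german", "polish-utf8", ""):
    print(repr(name), "->", repr(to_utf8_language(name)))
```

The same condition can be expressed in an UPDATE statement by excluding rows whose language name already ends in -utf8.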

Is it safe to delete the old language files? They have 0 users according to that table in the admin area.
I guess it is best practice to change the collation of the tables too?

shawnb61

Can you confirm the tables are no longer latin1_swedish_ci?
They should now be utf8_general_ci.

orktown

I have now tried the conversion, but it had an interesting effect: afterwards the umlauts were broken again.
I believe the content of the tables is already utf8 and only the collation is wrong. I have seen that before with other software.

I am unsure what the best way to proceed is now. Alter the tables to utf8? Leave them "as is"? I lean toward the first option. Well, no need to rush it.

shawnb61

Can you confirm the collation of your tables?

Did you check prior to running the utf8 conversion as I asked? 

Pretty important, as converting utf8 tables to utf8 corrupts data.

(Converting latin1 tables to utf8, even with utf8 content in them, is ok....)
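
One plausible mechanism for this kind of corruption is double encoding: bytes that are already UTF-8 get treated as latin1 and are encoded to UTF-8 a second time. The Python sketch below shows that effect on a single string. It is not a description of what the SMF converter does internally, and the function names are hypothetical.

```python
def double_encode(utf8_bytes: bytes) -> bytes:
    """Treat UTF-8 bytes as latin1 and encode them to UTF-8 again."""
    return utf8_bytes.decode("latin-1").encode("utf-8")


def undo_double_encode(damaged: bytes) -> bytes:
    """Reverse one round of double encoding (only valid for this exact damage)."""
    return damaged.decode("utf-8").encode("latin-1")


stored = "Beiträge".encode("utf-8")   # content that is already UTF-8
damaged = double_encode(stored)       # what a second conversion would produce
print(damaged.decode("utf-8"))        # mojibake instead of the umlaut
print(undo_double_encode(damaged).decode("utf-8"))
```

In this simple case the damage is reversible, but a restore from the backup taken before the conversion is the safer route.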
