Major PITA on the utf8 and utf-8 and $db_charset etc.

Started by richardwbb, March 12, 2012, 07:00:36 PM

Aleksi "Lex" Kilpinen

From that, everything looks OK on both - The only thing I can think of, is that the actual exporting/importing of the database broke the contents - There is a way to export and import using specific charsets, so perhaps the wrong one was used in the transfer process? Or perhaps you reconverted after the import? You should probably start over with the test forum.

Make sure you export using the correct charset, import using the same charset and connection, and before upgrading or doing anything else, make sure the test forum has all the UTF8 languages and settings correct.
Only after you have everything working without doing a "reconversion", try to upgrade the test forum.
Slava
Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas

How you can help SMF

richardwbb

Quote from: Aleksi "Lex" Kilpinen on March 21, 2012, 03:31:44 AM
From that, everything looks OK on both

That is what makes me puzzled too. I have no clue where to start looking.

Quote from: Aleksi "Lex" Kilpinen on March 21, 2012, 03:31:44 AM
Make sure you export using the correct charset, import using the same charset and connection, and before upgrading or doing anything else, make sure the test forum has all the UTF8 languages and settings correct.
Only after you have everything working without doing a "reconversion", try to upgrade the test forum.

I have the correct UTF-8 languages there. The installer first complained about it and offered to try English, which I did. Later on, to be sure, I gave it the language files it was looking for, and through the whole upgrade process I satisfied the installer to the point where it did not warn about anything anymore, besides the unknown charset response (you know, the other topic on this forum I linked to).

I assume this is a minor issue everyone will have when upgrading a UTF-8 database (only I feel like I am the only one having a UTF-8 database here???)

I assume an ISO-8859-1 database would work.

On the other hand, where is the problem then? I have exported in this ISO format, I have set bigdump to it, and I verified that the imported 1.1.16 database on my Linux shows correctly.

Still, after upgrade.php, the characters are mangled.

Is this a bug?

I have to use bigdump.php, because I cannot import the forum any other way with the ISP. Their upload limit is way smaller than the size of the database, and uploading such a big file isn't very reliable either.

Maybe you will tell me, bigdump.php is suspect, but then again, I imported the 1.1.16 correctly, and after upgrade.php, it *looks* like the problem is back.

I really think not many people have used a UTF-8 database when upgrading from 1.x to 2.x.

Also, I don't see how to circumvent this.

Please tell me what you recommend I do. I cannot think of another reason why a proper database goes bad after upgrading. But as I mentioned, I also exported the live database in ISO-8859-1 and imported it in that format too, with the same result.

AFAIK, I experience the same result which is true, but it can not be true, according to the logic I explain here.

What I could try is not to use bigdump.php, but what would it matter when I verified I have imported the 1.1.16 database correctly?

Do you agree, what is happening to my database, does not make sense?
If my post in this topic looks ambiguous to you, then I'm with Murphy's law and General Stupidity. In other words, trial and error.

Aleksi "Lex" Kilpinen

Firstly, I'd just like to mention that I have a live forum running UTF8, and I did upgrade it from 1.1 as such.

Even now, I'm unsure of your method of transferring the database though.
If the live forum is using utf8, and the connection is utf8, then you should export using utf8, and import to the new database as utf8. Basically, it should be kept utf8 through the whole thing. Unless I'm wrong somehow. How did you import the backup to the local db?
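To make the "keep it utf8 through the whole thing" point concrete, here is a minimal Python sketch. It is only an illustration, not what MySQL or any dump tool actually runs: the `transfer` helper and the sample text are made up for the example, and it simply encodes on "export" and decodes on "import".

```python
# Illustration only: simulate a dump going through an export and an import step.
text = "caf\u00e9 co\u00f6rdinatie"  # "café coördinatie", made-up sample text


def transfer(s, export_charset, import_charset):
    """Hypothetical helper: encode on export, decode on import."""
    return s.encode(export_charset).decode(import_charset, errors="replace")


# Same charset on both sides: the text survives unchanged.
same = transfer(text, "utf-8", "utf-8")

# UTF-8 on export, Latin-1 on import: every accented letter becomes two characters.
mixed = transfer(text, "utf-8", "latin-1")

print(same)
print(mixed)
```

If the two charsets differ at any single step, the result is the typical two-character garbage in place of each accented letter, even though nothing went wrong with the data itself.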

You could try Mysqldumper - it's a bit bigger tool than bigdump, but the great thing about it is that it can do export and import, and it can do it one row at a time, so you should never run into time or size limits. It can also handle multiple different charsets, and I've never had trouble with it myself.

You could also try to put your forum in maintenance mode, make a copy of the live database on the server, and just try switching the live forum to the copy db to see if it works OK with the copy. If no problems are to be seen with the copy db, then you could revert the live forum to the original db - and this time try exporting and importing a copy of the db to your local machine to test with.

You should make sure that the transferred db works on your test machine, with the same files and setup that the live forum has before upgrading - so simply copy everything over, and edit settings.php or run repair_settings.php to get it working in the new environment.

If everything is working correctly, and everything is showing correctly, then unpack the 2.0 large upgrade package directly over the current 1.1 files, and download 2.0 versions of the utf8 languages and do the same - before running upgrade.php

If the upgrade.php gives any errors or warnings, make sure to note them.

MrPhil

Quote from: richardwbb on March 20, 2012, 06:04:59 PM
It used to be latin_swedish_ci and I always wondered, why it wasn't saying English or Dutch.
This is a combination of the character encoding and the language-sensitive collation. MySQL originated in Sweden, so not surprisingly, the collation (specific sorting order, especially for non-ASCII characters) defaults to the Swedish language's. For most users of MySQL, the collation doesn't matter all that much and they just leave it as the default. "latin" is Latin-1 (ISO-8859-1) encoding, which is quite usable for English and Western European language sites.

Anyway, your specific language in use (English, Dutch, etc.) does not show up in the database encoding/collation string, at least if you haven't explicitly changed it. When you change the database to UTF-8, you will see a change to "utf..." and some (often "general") collation.
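A small Python sketch of the difference between the two halves of that string, the encoding and the collation. This is an analogy only, not how MySQL implements collations: the `fold_accents` key is a hypothetical stand-in for a "general" style rule, and plain code point order just happens to put "ä" after "z", much like the Swedish alphabet does.

```python
import unicodedata

# Encoding: how a character is stored as bytes.
word = "\u00eb"                        # the letter e with diaeresis
latin1_bytes = word.encode("latin-1")  # one byte in Latin-1
utf8_bytes = word.encode("utf-8")      # two bytes in UTF-8

# Collation: how strings are ordered, independent of how they are stored.
words = ["zebra", "apple", "\u00e4pple"]  # last one starts with a-umlaut

# Rule 1: plain code point order, so the a-umlaut word lands after "zebra".
by_code_point = sorted(words)


def fold_accents(s):
    """Hypothetical sort key: strip accents so a-umlaut sorts like plain 'a'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Rule 2: accent-insensitive order, so the a-umlaut word sits next to "apple".
by_folded = sorted(words, key=fold_accents)

print(latin1_bytes, utf8_bytes)
print(by_code_point)
print(by_folded)
```

The same three words come out in a different order under each rule, while their stored bytes are not affected by the choice of rule at all.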

richardwbb

This is good info, thank you for clearing up the swedish in the collation, Phil.

And Lex, this will take me some time. At first glance it seems I already tried this, but indeed I can try some other things too. Since you have UTF-8 db, that means the upgrade.php 2.0.2 should be fine.

I received a new laptop and ehr, it only takes a day or two to get a decent OS on it ;)

But I will get back to this, not much choice here.  :laugh:

richardwbb

Ok, I start now with working on this issue, by starting all over again.

My first attempt is to export the live database in ISO-8859 format, to check if that will import correctly and show correctly.

I am not expecting this to work. If it doesn't, I will try upgrading an 'empty' database (where I first post my own special characters in it) to 2.0, and I expect that will work.

And I try to use mysqldumper on my Linux, assuming the exported database from the live forum is correct.

But I wonder, is there a setting in the database itself that I can check or compare? Could it be that it still says ISO-8859 somewhere, while MySQL reports the utf8 collation together with the db_charset setting in Settings.php?

richardwbb

I have it working!!!  :o :laugh: :laugh: :laugh:

I first installed a clean 1.1.16 forum with an empty database so to say.

I learned that settings.php I have on the live forum, contains:

$db_character_set = 'UTF-8';

but the fresh 1.1.16 contained:

$db_character_set = 'utf8';

(I am not sure where and how I picked up the UTF-8 instead of utf8)

I checked the special chars after disabling this $db_character_set by commenting it out:

#$db_character_set = 'utf8';

Still, the html header of the forum, was showing UTF-8, which changed to ISO-8859-1 when I changed the forum language from 'language-utf8' to 'language'

I checked the special chars, they showed mangled. Here I made a slight jump in the air of happiness.

I copied the 2.0.2 upgrade package over the 1.1.16 forum and set the language to english-utf8, which required me to upload the 2.0.2 english utf8 language package.

That way I preserved my (in my case) dutch-utf8, modifications settings.

Then I ran upgrade.php and checked the special characters, still showing properly.

I opened Settings.php to look for a possible $db_character_set = 'utf8', which it doesn't have. This tells me this setting is ambiguous.

However, and here is what really was holding me back:

I could not export the database of the live forum in .gz or .bz2 format, only leaving .zip.

I tried to export a database on my Linux server, also with phpmyadmin, and surprisingly, it also doesn't export in .gz or bz2.

However, I did receive files from both the live phpmyadmin and my Linux server, which winrar would not decompress

So I thought, maybe I have to unpack this within Linux. Somehow, this also *does not* work.

Now I found that I was sent to the woods this way, because when I unpack the .sql exported with phpmyadmin in .zip format, with Notepad++, the format of the file was ANSI.

Setting it to UTF-8 without BOM (not converting) showed that:

Ã« flipped to ë (Here I again made a little jump of joy in to the air)

Saving the .sql, upgrading as usual

Presto, all characters now show correctly!!!
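What happened in the editor can be reproduced in a few lines of Python. This is a sketch of the general effect, not of what Notepad++ does internally: the bytes stay the same, only the charset used to read them changes. It also shows why "not converting" matters, since converting from the wrong view would bake the garbage in.

```python
# The two bytes that UTF-8 uses for the letter e with diaeresis.
raw = "\u00eb".encode("utf-8")          # b'\xc3\xab'

# Read as "ANSI" (Windows-1252): two separate characters, A-tilde and a guillemet.
as_ansi = raw.decode("cp1252")

# Read as UTF-8: the single intended character. The bytes did not change.
as_utf8 = raw.decode("utf-8")

# Converting (instead of just re-reading) while the file is shown as ANSI
# re-encodes the two wrong characters, giving four bytes: double encoding.
converted = as_ansi.encode("utf-8")

print(as_ansi, as_utf8, converted)
```

Switching the view only reinterprets the file, so saving afterwards keeps the original two bytes per character; a real conversion from the ANSI view would have doubled them.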

I think phpmyadmin exports .zip in Windows format, while it *could* leave it to Unix UTF-8 format, but it doesn't.

Together with the $db_character_set from Settings.php and the choice in language file being -utf8 or not, I can imagine I was in the woods, especially not knowing that phpmyadmin exports .zip in ANSI.

Woohoo!

Aleksi "Lex" Kilpinen

Now, this is not exactly my strongest area of expertise, but while I'm happy you got things working OK in the end, I think there were some unnecessary steps in that, caused by a simple mistake to begin with.

Quote from: richardwbb on March 27, 2012, 05:04:28 PM
My first attempt is to export the live database in ISO-8859 format, to check if that will import correctly and show correctly.
If the database is actually in utf8, exporting it in an ISO format will result in the contents of the dump being ISO instead of UTF8. This is what I believe to have caused you some unnecessary steps in the rest of the export/import process. The ANSI character set, also known as Windows-1252, is a superset of ISO-8859-1 with the addition of 27 characters in locations that ISO designates for control codes.
So in effect you just partly converted the database contents from UTF8 to ISO, while still trying to continue to use it as UTF8.
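The two claims above can be checked with a short Python sketch. It relies on Python's own `cp1252` and `latin-1` codecs, which is an assumption about how closely they match what MySQL and phpMyAdmin do; the euro sign is just a convenient test character.

```python
# Count the byte values in 0x80-0x9F that Windows-1252 maps to a printable
# character. ISO-8859-1 reserves this whole range for control codes.
extra = []
for byte in range(0x80, 0xA0):
    try:
        ch = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        continue  # position left unassigned in Python's cp1252 codec
    extra.append(ch)

print(len(extra))

# A character outside ISO-8859-1 cannot survive an export in that charset;
# with errors="replace" it silently turns into a question mark.
euro_as_latin1 = "\u20ac".encode("latin-1", errors="replace")

# The same character fits in a single byte in Windows-1252.
euro_as_cp1252 = "\u20ac".encode("cp1252")

print(euro_as_latin1, euro_as_cp1252)
```

So an export to an ISO charset converts what it can and loses the rest, which is the "partly converted" state described above.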

richardwbb

I am a little lost on your answer.

I agree that what you say is like 'A > B', while I took 'A > Z > B'.

What I know for sure is that, using the phpmyadmin of the ISP and phpmyadmin on my local Linux, I was forced to use a zipped file on both. But it might as well be my way of using Windows here; do you use Windows at all?

I am able to get a Linux install running properly as an internet server, but I use Samba for SMB access to the directory '/var/www'

However, I also use Notepad++ and the 'format' menu is the one to keep an eye on. I suppose when being in 'ISO-8859-1 mode', everything will work fine.

I also learned that another option within SMF, 'convert to UTF-8', must be bulletproof, while from an administrator's point of view it looks tricky. Just like the 'search and repair all errors' option, it is bulletproof.

The fun starts when I unpack the zipped SQL file (I have to let the ISP zip it, it is about 50 Megs.) Notepad++ opens this file, containing UTF-8 'steering codes', while the header of this file, somehow makes it show as an ANSI file.

So just setting it to UTF-8 without BOM, seems to correct phpmyadmin export or Windows or Notepad++ itself.

At least SMF can't help that. The $db_character_set setting in Settings.php confused me: when I set the language in SMF to 'non UTF-8', the same mangled characters appear as when I use a UTF-8 language with an ISO-8859-1 database.

So I am happy I have found out what was going on, I got myself in to this trouble by not checking in Notepad++ what format it was reading and in what format it *must* be. (the database.sql I mean)
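A check like this can also be done without an editor. The following is a rough Python sketch under a few assumptions: both function names are made up for the example, and the double-encoding test is only a heuristic that looks for text which still decodes cleanly after being pushed back through Windows-1252.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: valid UTF-8 whose characters, written back as Windows-1252
    bytes, form valid UTF-8 again and give a different text."""
    try:
        text = data.decode("utf-8")
        repaired = text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return False  # not UTF-8 at all, or not reversible this way
    return repaired != text


good = "\u00eb".encode("utf-8")             # proper UTF-8
ansi = "\u00eb".encode("cp1252")            # single ANSI byte, not valid UTF-8
doubled = "\u00c3\u00ab".encode("utf-8")    # UTF-8 of the mangled two characters

print(looks_like_utf8(good), looks_like_utf8(ansi))
print(looks_double_encoded(good), looks_double_encoded(doubled))
```

To use it on a real dump, read the file in binary mode, for example `open("database.sql", "rb").read()` (the filename is only a placeholder), and pass the bytes to both functions before importing.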

That is because I never really understood the whole thing with charsets, there seem to be many. All I know is from my DOS experience. I started using Windows 95, not using 3.x at all, and started to use IE5, not using the older IE at all (or internet).

And I do remember the Windows 1252 thing, I also remember I was told to use ISO-8859-1 and now I was told (however you explained, it was not necessary) to use UTF-8.

I'll stick with UTF-8 now, assuming never to run in this trouble again (besides missing out on an ANSI formatted .sql)

Still I again thank you *very much*, because you have shown me what it *not* could be.

So actually, I learned that SMF wasn't the problem, MySQL wasn't the problem and hey presto, I can blame everything on Windows.  O:)

I do know for sure that SMF added the $db_character_set in Settings.php somehow, possibly when I was upgrading things, and later on the setting 'UTF-8' had to be changed to 'utf8'. I do indeed appear to be the only one capable of using an ANSI formatted .sql and overlooking this for close to forever.

I assume here the $db_character_set in Settings.php is meant for people using a UTF-8 database with an ISO-8859-1 language, however, I suppose that would give mangled chars too? ???

Also, since the Dutch language does not contain a lot of special chars, I found out when it was too late that I had a corrupted database, where users had already inputted *new* special chars.

In the end, my database has still mangled chars beyond repair, but that was my own fault.

Glad it is now only in really old topics and not in the running topics, so users won't complain.

*doing another small dance of joy*  ;D