utf-8 conversion error

Started by Krashsite, November 05, 2016, 05:54:20 PM

Previous topic - Next topic

Sir Osis of Liver

The problems with that are :

1. - Who's to know?

2. - If you run the utf-8 conversion on the imported database, it fails with this error -


You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'key key varbinary(255) NOT NULL default '',
CHANGE COLUMN member_name mem' at line 2
File: /public_html/smf20test/Sources/ManageMaintenance.php
Line: 666



It had already been tried by the OP of the original thread.  Haven't had time to try this, but it appears that if you want to import a utf-8 database into a clean install, you have to:

1. - Upload full install package.
2. - Create new database.
3. - Run Install.php to populate db with tables.
4. - Run utf-8 conversion on new database from ACP in new forum.
5. - Use phpmyadmin to drop all database tables.
6. - Import utf-8 database.

If you don't do it exactly in that order, imported db will display incorrectly.  The only other way to fix it is to manually
add $db_character_set = 'utf8'; to Settings.php.

Would be a whole lot easier to have a setting in smf_settings that SMF could use to identify the database as utf-8, and handle it correctly.
Ashes and diamonds, foe and friend,
 we were all equal in the end.

                                     - R. Waters

Kindred

ummmm....Sir Osis...  if you just imported your files and database, you don't have to bother with that, since the Settings.php file from your original site will already HAVE that line.   I think, this case is due to your strange desire to use an overly complicated way to backup/restore/move a site and database.

I mean:
Quote
1. - Upload full install package.
2. - Create new database.
3. - Run Install.php to populate db with tables.
4. - Run utf-8 conversion on new database from ACP in new forum.
5. - Use phpmyadmin to drop all database tables.
6. - Import utf-8 database.
can be replaced by
1. - create a blank database on the new host
2. - import the database backup and backed up forum files
3. - run repair_settings.php to adjust the paths and urls.
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

Sir Osis of Liver

No, you've missed the point.  This discussion began in another thread, OP's forum had been hacked, account suspended, support had jerked him around, and done things unknown to the forum.  He wanted to move the forum to a new host, and it was recommended that he do a completely new install, to avoid carrying over any crap in forum files from old to new host.  Before attempting the move, he tried it in his current host, did clean install in another directory, exported production db, imported it into new db, that's when the problem showed up.  I tried importing it into clean install on my host, same result. 

If you do a completely new install, Settings.php out-of-box does not contain $db_character_set = 'utf8';, it's added by running utf-8 conversion on default latin1 database.  You can't run the conversion on the imported db, it errors out, and you obviously can't run it on a blank db. 

Nothing strange about it, the forum owner is experienced running SMF forums, and Sir Osis has a passing familiarity with the software, and it took a couple of weeks, with input from many folks, to figure this out.

What's most annoying is, when the utf-8 coversion was run on the original db, it added
global_character_set UTF-8
to smf_settings, so the db is identified as being utf-8, but SMF ignores it.  Why doesn't SMF use the db setting, rather than Settings.php, to determine the correct collation?  That would eliminate this problem altogether.
Ashes and diamonds, foe and friend,
 we were all equal in the end.

                                     - R. Waters

richardwbb

Quote from: Sir Osis of Liver on November 24, 2016, 04:04:13 PM
What's most annoying is, when the utf-8 coversion was run on the original db, it added
global_character_set UTF-8
to smf_settings, so the db is identified as being utf-8, but SMF ignores it.  Why doesn't SMF use the db setting, rather than Settings.php, to determine the correct collation?  That would eliminate this problem altogether.


Regarding to what I'm quoting here, I can tell you that this can not be easy, since the way the UTF-8 data is stored is to be decided by how MySQL is configured with the ISP. Or in other words, it wouldn't make it easy to support all possible configurations.

And for the db setting in Settings.php, you will have to click UTF-8 when installing SMF. Arantor also told me that.

However, I can tell from your remarks, that you have been on the same road as I was, only difference, I inherited the SMF installation and database.

On the other hand, I still believe that having a db_character_set setting and a language template ISO-8859-1 that must not be uninstalled [stuff depends on that even with UTF-8], and a SMF that switches encoding, when you enable the UTF-8 template [that you will have to install first, SMF should have the UTF-8 language too and work from there i make of that]

And also, on the other hand [someone else's, haha], it might not be possible for a PHP coded program without root privileges retrieve the proper collation and charset from MySQL, since even a latin 1 database supports all encodings and charsets [and does that very precise, fast]

If MySQL, is configured [be able[ to talk UTF-8 with the client, it doesn't matter what is sent by that client. You only can't run with a ISO-8859-1 language template forum, because that will show Latin-1 encoded UTF-8, tadaa.

And some final words; somewhere the Site Admin has to be told and should gather understanding of what is a MySQL database [or how data looks in that dumpfile], to know about language templates and that db-setting switch.  phpMyAdmin's way of reporting how MySQL is collated/ encoded, doesn't make me feel attracted to that. because my, MySQL is reporting that it talks Latin1 with the [any] client and phpMyAdmin is reporting UTF8mb4, and I'm awaiting for it to fail, haha.

If my post in this topic looks ambiguous to you, then I'm with Murphy's law and General Stupidity. In other words, trial and error.

Sir Osis of Liver

In this case, it's basically the same difference.  Retrieving $db_character_set = 'utf8'; from Settings.php is handled in Load.php here -



// Most database systems have not set UTF-8 as their default input charset.
if (!empty($db_character_set))
$smcFunc['db_query']('set_character_set', '
SET NAMES ' . $db_character_set,
array(
)
);



Just as easy to do this -



/// Check db for UTF-8 collation
if ($modSettings['global_character_set'] == 'UTF-8')
$smcFunc['db_query']('set_character_set', '
SET NAMES ' . 'utf8',
array(
)
);



... and retrieve the setting from the database, so it's available wherever the database is imported.

But what do I know.

Ashes and diamonds, foe and friend,
 we were all equal in the end.

                                     - R. Waters

richardwbb

More then I do, with programming I suppose, haha.

I'm not sure on what '$modSettings', means [what it does], and this might work for you. But if it is possible that a user inserts a Latin encoded database dumpfile into a UTF-8 talking MySQL database, all special characters will end up 'ich'.

But that might relate to the situations I've been, because that settings, I saw in the database file, and it was empty. It also wasn't showing ISO-8859-1 and I believe there is no other option for encoding scheme's. [and how on earth did that got lost]

So I would, test the whole database, for non-Latin characters, that is easy, I've done it, then, manually convert it with iconv [possibly also the way it is done with SMF, and don't ask me how this is done on a WAMP server, haha] and then send the required command with phpMyAdmin to MySQL, to change character set and collation of all the tables. <- this doesn't change the content inside the database on any table.

What I don't get, is that people that are programmers, still have been able to create, MyISAM tables while InnoDB is prefered, and, that they are able to, still, put the encoding to Latin1, or 'swedish chef', haha. [or that a UTF-8 running SMF forum, still modifications aren't updating language in the UTF-8 version of that language file]. The latter is easy to overcome though.

And a final point. A UTF-8 running SMF forum, does not, display non-Latin characters properly, with the 'okay users' message inside Settings.php, tadaa.
If my post in this topic looks ambiguous to you, then I'm with Murphy's law and General Stupidity. In other words, trial and error.

nend

Quote from: Sir Osis of Liver on November 28, 2016, 09:35:18 PM
In this case, it's basically the same difference.  Retrieving $db_character_set = 'utf8'; from Settings.php is handled in Load.php here -



// Most database systems have not set UTF-8 as their default input charset.
if (!empty($db_character_set))
$smcFunc['db_query']('set_character_set', '
SET NAMES ' . $db_character_set,
array(
)
);



Just as easy to do this -



/// Check db for UTF-8 collation
if ($modSettings['global_character_set'] == 'UTF-8')
$smcFunc['db_query']('set_character_set', '
SET NAMES ' . 'utf8',
array(
)
);



... and retrieve the setting from the database, so it's available wherever the database is imported.

But what do I know.



That will not work as modsettings doesn't exist at that point in the code.

Sir Osis of Liver

#67
You don't replace the existing code where it is, it has to be added after the array is loaded, about a half page down.  Makes no difference if the old code remains, they both do the same thing.  Makes more sense (imo) to ask the database if it's utf-8, rather than using Settings.php.
Ashes and diamonds, foe and friend,
 we were all equal in the end.

                                     - R. Waters

richardwbb

That wouldn't be sufficient for an SMF forum with ISO-8859-1 language template. Then the connection with the database can be utf8 and the contents of the posts and messages will be Latin encoded to that utf8 database [èéëòóöàáä -> èéëòóöàáä]. But I did see it drop something right there and told Arantor about that. And what I told in this thread, that Settings.php itself isn't utf8 aware.

Quote from: Sir Osis of Liver on December 02, 2016, 11:26:10 AM
You don't replace the existing code where it is, it has to be added after the array is loaded, about a half page down.  Makes no difference if the old code remains, they both do the same thing.  Makes more sense (imo) to ask the database if it's utf-8, rather than using Settings.php.

If my post in this topic looks ambiguous to you, then I'm with Murphy's law and General Stupidity. In other words, trial and error.

Advertisement: