Hi everyone,
I have SMF 2.0.17 with Ver 10.15 Distrib 10.0.38-MariaDB and PHP 7.2.
I want to export the SQL and import it into Ver 15.1 Distrib 10.6.18-MariaDB. When I import the encoding changes to utfbmb3 from utf8.
In the DB viewer I can see the same thing before and after, only place it changes is on the SMF site where it shows as I see in the DB, not how it was before on the site.
How can I make so they still show fine after changing to the new MariaDB and encoding?
"UTF8" = "UTF8MB3". More recent versions of MySQL are explicit when showing mb3 vs mb4. SMF uses mb3.
That after pic appears to show double encoding. If it looks like that in the database, then I would re-do the restore.
I am not familiar with the MariaDB mysql import utility. My first guess is that it sees that the text string "UTF8MB3" is not "UTF8" and decides to do the UTF8 encoding again, causing the double encoding. If so, the fix may be as simple as clearing the db, changing the references to "UTF8MB3" to "UTF8" in the backup .sql, and re-running the import.
Fixing the restore is preferable to using any db-fix utility.
In the backup it's already UTF8
DROP TABLE IF EXISTS `smf_messages`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `smf_messages` (
`subject` varchar(255) DEFAULT NULL)
) ENGINE=MyISAM AUTO_INCREMENT=78256 DEFAULT CHARSET=utf8;
/*!40101 SET character_set_client = @saved_cs_client */;
To be clear - when looking at the DB, the old DB and the new DB look different, and the new one looks corrupted, is that correct?
OR... Is it that the DB contents look the same, and the display is different?
Another question: Does it look OK within the backup .sql?
When I export from production using PHPMyAdmin the dump looks like
CREATE DATABASE IF NOT EXISTS `database` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `smf_sp_shouts` (
`id_shout` mediumint(8) NOT NULL,
`id_shoutbox` int(10) NOT NULL DEFAULT '0',
`id_member` mediumint(8) NOT NULL DEFAULT '0',
`member_name` varchar(80) DEFAULT '0',
`log_time` int(10) NOT NULL DEFAULT '0',
`body` text
) ENGINE=MyISAM DEFAULT CHARSET=utf8;I import it using
docker exec -i mariadb mysql -u root -ppassword < localhost.sql and export it
mysqldump -h 192.168.0.149 --port 3307 -u newuser -ppassword --databases database > database.sql.
Now it looks like
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `database` /*!40100 DEFAULT CHARACTER SET utf8 */;
DROP TABLE IF EXISTS `smf_sp_shouts`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `smf_sp_shouts` (
`id_shout` mediumint(8) NOT NULL AUTO_INCREMENT,
`id_shoutbox` int(10) NOT NULL DEFAULT '0',
`id_member` mediumint(8) NOT NULL DEFAULT '0',
`member_name` varchar(80) DEFAULT '0',
`log_time` int(10) NOT NULL DEFAULT '0',
`body` text,
PRIMARY KEY (`id_shout`)
) ENGINE=MyISAM AUTO_INCREMENT=57699 DEFAULT CHARSET=utf8;At this point, even though I haven't done anything beside import and export DB, and it broke the site, saying "ă" instead of "ă", and so on.
Only difference between the production server and docker container is PHP 7.2. The production runs on 5.6.40
Quote from: shawnb61 on September 20, 2024, 01:20:55 PMTo be clear - when looking at the DB, the old DB and the new DB look different, and the new one looks corrupted, is that correct?
Another question: Does it look OK within the backup .sql?
On the production site in PHPMyAdmin I see chars like "ă", but on the site they are displayed corectly like "ă". All dumps have it like "ă".
Quote from: mosukeric on September 20, 2024, 02:08:30 PMOn the production site in PHPMyAdmin I see chars like "ă", but on the site they are displayed corectly like "ă". All dumps have it like "ă".
So the data is in fact corrupted in the source.
Question: in the settings table, what is the value of global_character_set in both environments? They must be the same, but just checking...
It's crazy, all I had to do was
$db_character_set = 'latin1';Dumps had latin1
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `database` /*!40100 DEFAULT CHARACTER SET latin1 */;Production did not had it included in Settings.php and worked fine without, but not the new version it seems.
When upgrading to 2.1, the upgrade.php had a notice and it would not go over to finish, it was undefined charset_detected, has a search on google and this (https://github.com/SimpleMachines/SMF/issues/8125) is the problem I had.
Have added this and the upgrade finished, I hope it did what it should considering I've changed the upgrade script.
$upcontext['charset_detected'] = isset($lang_charsets[$language]) ? $lang_charsets[$language] : 'ISO-8859-1';Thank you
@shawnb61, you made me explore a lot today. I still don't fully grasp encoding/colation, but I'm closer.
// I see now in Settings.php
# UTF-8 is now the only character set supported in 2.1.
$db_character_set = 'latin1';So I guess the upgrade did not go as expected since I need latin1.
Unfortunately, your db is actually utf8 (aka utf8mb3) with some double-encoded characters, not latin1. SMF is designed to work with your db as utf8/utf8mb3.
I believe that solution appears to be working as your browser is trying to display utf8 characters as latin1, and the downgrade from utf8 to latin1 is undoing the double-encoding.
It sort of works... But my recommendation would be to actually properly reflect everything as utf8. This is the best long-term solution, and will survive any upgrades, ports, or eventual migrations to utf8mb4.
For your SMF db to be fully utf8, 3 things must be true:
- All of your tables are utf8-general-ci (or utf8mb3-general-ci, which is a synonym)
- The value of $db_character_set in your Settings.php file should be: 'utf8'
- The value of the 'global_character_set' entry in your smf_settings table should be: 'UTF-8'
Then you will have to find a utility to fix your double-encoded characters in the database. Double-encoded characters are basically corrupt, broken. You need to find a way to fix them.
It appears you have one of my utilities to fix double encoding from a while back. I would let that run all the way thru and see if it works. (I took that utility offline when it did some bad things to a MySQL 5.7 db that I couldn't figure out... Worked fine on 5.6 & 8.0, but not 5.7...???)
If it doesn't, you can search around for comparable utilities, e.g.:
https://stackoverflow.com/questions/11436594/how-to-fix-double-encoded-utf8-characters-in-an-utf-8-table
I hope this helps...
SELECT hex(subject) FROM `database`.`smf_messages` WHERE `id_msg` LIKE '%78151%' ORDER BY `id_msg`;
52653A2048C3AFC2BFC2BD7274696120766120666920616C6261737472C384C69220283230303629
UPDATE database.smf_messages
SET subject = CONVERT(CAST(CONVERT(subject USING latin1) AS BINARY) USING utf8mb4)
WHERE id_msg = 78151;
SELECT hex(subject) FROM `database`.`smf_messages` WHERE `id_msg` LIKE '%78151%' ORDER BY `id_msg`;
52653A2048EFBFBD7274696120766120666920616C6261737472C48320283230303629
Hexcode of the text, before and after, here it does not work, it changes from a wrong char to another.
If I run for the whole table I get error, but if I restore the DB and try only with the ID of the record that had error, it works somehow!
Quoteutf8mb4
Per SHawn's instructions, you should be using utf8-general-ci (or utf8mb3-general-ci, which is a synonym)
not mb4
SELECT hex(subject) FROM goldies.smf_messages WHERE `id_msg` = 61;
4D6F6D656E74652064652061757220486F7261746975204D616C61656C6520323030342044564435
UPDATE goldies.smf_messages
SET body = CONVERT(CAST(CONVERT(body USING latin1) AS BINARY) USING utf8mb3)
WHERE id_msg = 61;Invalid utf8mb3 character string
Same error for utf8mb4.
That is quite literally what it means - characters are invalid.
You'll probably have to fix those by hand.
BTW, you showed the subject in hex, but the error is with the body. The subject is fine.