News:

Join the Facebook Fan Page.

Main Menu

Collation problem

Started by Sir Osis of Liver, March 16, 2025, 03:49:54 PM

Previous topic - Next topic

Sir Osis of Liver

Having a problem with collation in 2.1.4.  Personal text is displaying like this -

    ‘70 Guzzi, ‘73 Norton, ‘73 V7 Sport, ‘12 V7 Racer

You can see it here.

Collation is set to utf8_general_ci.  When I try to change it with phpmyadmin to utf8mb4_general_ci (utf8mb3 is not available) it stalls at members table with this error -


ALTER TABLE `smf_members` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci

MySQL said: Documentation
#1071 - Specified key was too long; max key length is 1000 bytes


All tables before members are converted.  Personal text is still scrambled.

When in Emor, do as the Snamors.
                              - D. Lister

shawnb61

#1
Quote from: shawnb61 on March 03, 2025, 05:43:53 PMThe supported collation for 2.1 is utf8_general_ci.  This is what the smf upgrader & other tools will convert to.  (In recent versions of mysql this is displayed as utf8mb3_general_ci; they are synonymous.)
In addition, there are 2 important settings to look at:
- $db_character_set in your Settings.php file should be 'utf8'
- The 'global_character_set' entry in your smf_settings table should be 'UTF-8'

The upgrader & smf2.0 utf8 conversion utilities will set these if they run properly to completion.

A question worth asking is born in experience & driven by necessity. - Fripp

shawnb61

A question worth asking is born in experience & driven by necessity. - Fripp

Chen Zhen


I don't know if it will make a difference for this scenario but I would go with unicode instead of general in any case.

Here's an informative read for you from stack : utf8_unicode_ci vs utf8_general_ci

My SMF Mods & Plug-Ins

WebDev

SMF support staff should be shaping a positive community experience & not provoking an argument or emotional reaction.

Sir Osis of Liver

Quote from: shawnb61 on March 16, 2025, 04:22:19 PMThe upgrader & smf2.0 utf8 conversion utilities will set these if they run properly to completion.

That's been the problem from the beginning, the 2.1.4 upgrade completed but didn't run correctly.  We have problems with attachments and avatars, and now collation.  Collation was set to utf8_general_ci after upgrade, Settings.php contains $db_character_set = 'utf8'; but not global_character_set.  Will try to get collation back to utf8_general_ci, if that fails I'll have to import the  clone database off my server again and start from there.

Just had a look, clone database is in utf8mb3_general_ci, per this -

Quote from: shawnb61 on March 12, 2025, 03:53:30 PM- Convert any tables that are still latin1 to utf8mb3_general_ci using phpmyadmin.


Chen, will have a look at that if all else fails, don't want to introduce ant more variables into this battered database.
When in Emor, do as the Snamors.
                              - D. Lister

Sir Osis of Liver

Quote from: shawnb61 on March 16, 2025, 04:22:19 PM- The 'global_character_set' entry in your smf_settings table should be 'UTF-8'

Misread that, setting is correct.  Everything is back to utf8_general_ci, collation still borked.


This is what's in the database for that member -

personal_text - ‘70 Guzzi, ‘73 Norton, ‘73 V7 Sport, ‘12 V7 Racer
When in Emor, do as the Snamors.
                              - D. Lister

shawnb61

If it's in the DB like that, the core data has been corrupted.

I'd double-check the source.  My suspicion is that it happened when they did the initial utf8 conversion in 1.1.  The 2.1 upgrader is generally good at avoiding double-encoding, but it doesn't fix what's already broke.

If they've been living with it, then they've been living with it... 

If you want to fix it, you need to find a repair utility.  Lots of scripts online to do that, map the funky chars to something better.  I suspect that †is a single quote, '. 
A question worth asking is born in experience & driven by necessity. - Fripp

shawnb61

Quote from: Sir Osis of Liver on March 17, 2025, 12:57:22 PMMisread that, setting is correct.

Did you change that setting before or after the 2.1 upgrade?
A question worth asking is born in experience & driven by necessity. - Fripp

Chen Zhen


I made a SMF utility for such a thing years ago that uses 3rd party scripts to convert characters.
As long as you have an original backup of the db it won't hurt to try it.
ref. https://web-develop.ca/index.php?action=downloads;area=stable_smf_utilities;file=36

My SMF Mods & Plug-Ins

WebDev

SMF support staff should be shaping a positive community experience & not provoking an argument or emotional reaction.

Sir Osis of Liver

Quote from: shawnb61 on March 17, 2025, 01:43:33 PMDid you change that setting before or after the 2.1 upgrade?
No, upgrade changed it.

Quote from: Chen Zhen on March 17, 2025, 03:52:50 PMI made a SMF utility for such a thing years ago that uses 3rd party scripts to convert characters.
As long as you have an original backup of the db it won't hurt to try it.
ref. https://web-develop.ca/index.php?action=downloads;area=stable_smf_utilities;file=36
Will give it a shot on backup.
When in Emor, do as the Snamors.
                              - D. Lister

shawnb61

If the upgrade changed the settings, I suspect the upgrade ran the utf8 conversion.

The point of making those settings changes *before* the upgrade was to avoid that.

If there was no double-encoding before the upgrade, it may make sense to rerun the upgrade, but this time correct the settings first.  That will avoid the utf8 conversion step, and hopefully prevent the double-encoding.
A question worth asking is born in experience & driven by necessity. - Fripp

Sir Osis of Liver

I have the 1.1.21 database, but I believe OP attempted at least one upgrade before I got to it, would have failed with RENAME errors.  It's quite an old forum, first post was in Jan 2008, and has been migrated several times, most recently on GoDaddy.  Most of the contact seems to be intact.  We lost attachments, was able to restore avatars.  Wonder what would happen if I reverted collation to latin1_swedish_ci. 
When in Emor, do as the Snamors.
                              - D. Lister

Sir Osis of Liver

Quote from: Chen Zhen on March 17, 2025, 03:52:50 PMI made a SMF utility for such a thing years ago that uses 3rd party scripts to convert characters.
As long as you have an original backup of the db it won't hurt to try it.
ref. https://web-develop.ca/index.php?action=downloads;area=stable_smf_utilities;file=36

Mod installs ok, won't run - session verification error.

When in Emor, do as the Snamors.
                              - D. Lister

Sir Osis of Liver

If all else fails, do it the hard way -


update smf_members
set personal_text = REPLACE(personal_text,"‘","'")


Worked, but only found one row.  Just a matter of finding them all.

When in Emor, do as the Snamors.
                              - D. Lister

Chen Zhen


The collation tool is a bit dated & I didn't re-test it thoroughly (using the current SMF 2.1.4, PHP 8.3, MySQL 8.0.X, MariaDB 11.X, etc.) before you reported an issue.
Scrap the file you downloaded & try version 2.6 which is available from the same link.
I had to fix 3 things in it & just tested it about 10 times without any problems.

My SMF Mods & Plug-Ins

WebDev

SMF support staff should be shaping a positive community experience & not provoking an argument or emotional reaction.

Sir Osis of Liver

I think we'll let it go.  I've replaced several special characters with queries and gotten results from 1 to over 8000 rows.  It's an old forum (2008) with 16,374 members and 1,393,576 messages in database.  Has a lot of battle damage but running fine in 2.1.4.  Collation shouldn't be a problem going forward, production database is in utf8_general_ci and settings are correct.  Will try your updated script on copy of database on my server.

When in Emor, do as the Snamors.
                              - D. Lister

Advertisement: