News:

Wondering if this will always be free?  See why free is better.

Main Menu

Problem with UTF8 after moving to new server?

Started by perkele, May 01, 2020, 09:25:26 AM

Previous topic - Next topic

perkele

Hello!

TLDR: I am having problems which I suspect come from an incomplete UTF8 conversion, in a forum which was recently moved to a different server, with DB restored from an SQL dump. (And I think the error did not happen on the old server just before the move, which puzzles me.)

SQL error "illegal mix of collations for operation ' IN '", happens when using the Mention plugin in a post with some special characters, different scripts than Latin, or usernames with such characters or umlauts.
Looking at the database tables, I see that some are in utf8_general_ci and some in latin1_swedish_ci collation.

My questions:
Can this be correct?
Should not all the tables be in utf8_general_ci when using UTF8?
Can I simply dump and re-create the database, while just modifying all the collations in the dump to utf8_general_ci? Or would it break something/produce garbled data?


Longer story:

I recently moved an older long-running SMF forum (version 2.0.14, but not sure if all upgrades went correctly in the past) from one server to another.
The database was restored from an SQL dump of the old one into a new empty MariaDB database with default characterset set to UTF8MB4 and default collation to utf8_general_ci. (The forum was already working with UTF8 before, but I did not check the collation of every table how it was before the move, and I don't have the old database dump anymore to look. I thought by simply re-importing the dump, all collations would be correct automatically.)

Almost everything seems to work fine. However, there is now a problem with the "Mention" plugin. In some cases, for example when users have umlauts like 'ä' etc. in their name, or there is for example Khmer script text in the post, an error appears:

QuoteIllegal mix of collations for operation ' IN '
File: /var/www/forum.sangham.net/Sources/Mentions.php
Line: 191

Note: It appears that your database may require an upgrade. Your forum's files are currently at version SMF 2.0.14, while your database is at version 2.0.2. The above error might possibly go away if you execute the latest version of upgrade.php.

(The "Note: ..." above is also something that has been appearing whenever something goes wrong for a longer time. I have been shying away from running "upgrade.php" without understanding what it actually does.)
(The line number may be slightly off due to my own debugging efforts, maybe adding or deleting some empty lines in the code.)

Looking at what SQL query is actually causing the problem, it appears to be this one (from Mention.php, from the Mention plugin):

// Attempt to fetch all the valid usernames along with their required metadata
$request = $smcFunc['db_query']('', '
SELECT id_member, real_name, email_mentions, email_address, unread_mentions, id_group, id_post_group, additional_groups
FROM {db_prefix}members
WHERE real_name IN ({array_string:names})
ORDER BY LENGTH(real_name) DESC
LIMIT {int:count}',
array(
'names' => $names,
'count' => count($names),
)
  );


The real_name IN ({array_string:names}) seems to cause the problem when {array_string:names} contains special characters like umlauts or non-latin scripts.
Since SQL is complaining about mix of collations, I looked at the collations as they are in the database:

MariaDB [forum]> show table status from forum;

The result is a mix of tables with utf8_general_ci and others with latin1_swedish_ci. The members table in particular is in latin1_swedish_ci:

NameEngineVersionRow_formatCollationChecksumCreate_optionsComment
admin_info_filesMyISAM10Dynamiclatin1_swedish_ciNULL
approval_queueMyISAM10Fixedlatin1_swedish_ciNULL
attachmentsMyISAM10Dynamiclatin1_swedish_ciNULL
ban_groupsMyISAM10Dynamiclatin1_swedish_ciNULL
ban_itemsMyISAM10Dynamiclatin1_swedish_ciNULL
board_permissionsMyISAM10Dynamiclatin1_swedish_ciNULL
boardsMyISAM10Dynamiclatin1_swedish_ciNULL
calendarMyISAM10Dynamiclatin1_swedish_ciNULL
calendar_holidaysMyISAM10Dynamiclatin1_swedish_ciNULL
categoriesMyISAM10Dynamiclatin1_swedish_ciNULL
collapsed_categoriesMyISAM10Fixedlatin1_swedish_ciNULL
custom_fieldsMyISAM10Dynamiclatin1_swedish_ciNULL
group_moderatorsMyISAM10Fixedlatin1_swedish_ciNULL
log_actionsMyISAM10Dynamiclatin1_swedish_ciNULL
log_activityMyISAM10Fixedlatin1_swedish_ciNULL
log_bannedMyISAM10Dynamiclatin1_swedish_ciNULL
log_boardsMyISAM10Fixedlatin1_swedish_ciNULL
log_commentsMyISAM10Dynamiclatin1_swedish_ciNULL
log_digestMyISAM10Dynamiclatin1_swedish_ciNULL
log_errorsMyISAM10Dynamiclatin1_swedish_ciNULL
log_floodcontrolMyISAM10Dynamiclatin1_swedish_ciNULL
log_group_requestsMyISAM10Dynamiclatin1_swedish_ciNULL
log_karmaMyISAM10Fixedlatin1_swedish_ciNULL
log_mark_readMyISAM10Fixedlatin1_swedish_ciNULL
log_member_noticesMyISAM10Dynamiclatin1_swedish_ciNULL
log_mentionsMyISAM10Fixedutf8_general_ciNULL
log_notifyMyISAM10Fixedlatin1_swedish_ciNULL
log_onlineMyISAM10Dynamiclatin1_swedish_ciNULL
log_packagesMyISAM10Dynamiclatin1_swedish_ciNULL
log_pollsMyISAM10Fixedlatin1_swedish_ciNULL
log_reportedMyISAM10Dynamiclatin1_swedish_ciNULL
log_reported_commentsMyISAM10Dynamiclatin1_swedish_ciNULL
log_scheduled_tasksMyISAM10Fixedlatin1_swedish_ciNULL
log_search_messagesMyISAM10Fixedlatin1_swedish_ciNULL
log_search_resultsMyISAM10Fixedlatin1_swedish_ciNULL
log_search_subjectsMyISAM10Dynamiclatin1_swedish_ciNULL
log_search_topicsMyISAM10Fixedlatin1_swedish_ciNULL
log_search_wordsInnoDB10Compactutf8_general_ciNULL
log_spider_hitsMyISAM10Dynamiclatin1_swedish_ciNULL
log_spider_statsMyISAM10Fixedlatin1_swedish_ciNULL
log_subscribedMyISAM10Dynamiclatin1_swedish_ciNULL
log_topicsMyISAM10Fixedlatin1_swedish_ciNULL
login_securityMyISAM10Dynamicutf8_general_ciNULL
login_security_logMyISAM10Dynamicutf8_general_ciNULL
mail_queueMyISAM10Dynamiclatin1_swedish_ciNULL
membergroupsMyISAM10Dynamiclatin1_swedish_ciNULL
membersMyISAM10Dynamiclatin1_swedish_ciNULL
menu_itemsMyISAM10Dynamicutf8_general_ciNULL
message_iconsMyISAM10Dynamiclatin1_swedish_ciNULL
messagesMyISAM10Dynamiclatin1_swedish_ciNULL
messages_historyMyISAM10Dynamicutf8_general_ciNULL
moderatorsMyISAM10Fixedlatin1_swedish_ciNULL
newshoutboxMyISAM10Dynamicutf8_general_ciNULL
newtableMyISAM10Dynamicutf8_general_ciNULL
openid_assocMyISAM10Dynamiclatin1_swedish_ciNULL
package_serversMyISAM10Dynamiclatin1_swedish_ciNULL
permission_profilesMyISAM10Dynamiclatin1_swedish_ciNULL
permissionsMyISAM10Dynamiclatin1_swedish_ciNULL
personal_messagesMyISAM10Dynamiclatin1_swedish_ciNULL
personalized_bbcMyISAM10Dynamicutf8_general_ciNULL
pm_recipientsMyISAM10Dynamiclatin1_swedish_ciNULL
pm_rulesMyISAM10Dynamiclatin1_swedish_ciNULL
poll_choicesMyISAM10Dynamiclatin1_swedish_ciNULL
pollsMyISAM10Dynamiclatin1_swedish_ciNULL
quizMyISAM10Dynamicutf8_general_ciNULL
quiz_answerMyISAM10Dynamicutf8_general_ciNULL
quiz_categoryMyISAM10Dynamicutf8_general_ciNULL
quiz_disputeMyISAM10Dynamicutf8_general_ciNULL
quiz_infoboardMyISAM10Dynamicutf8_general_ciNULL
quiz_leagueMyISAM10Dynamicutf8_general_ciNULL
quiz_league_resultMyISAM10Fixedutf8_general_ciNULL
quiz_league_tableMyISAM10Fixedutf8_general_ciNULL
quiz_questionMyISAM10Dynamicutf8_general_ciNULL
quiz_question_typeMyISAM10Dynamicutf8_general_ciNULL
quiz_resultMyISAM10Fixedutf8_general_ciNULL
quiz_sessionMyISAM10Dynamicutf8_general_ciNULL
scheduled_tasksMyISAM10Dynamiclatin1_swedish_ciNULL
search_termsMyISAM10Dynamicutf8_general_ciNULL
sessionsMyISAM10Dynamiclatin1_swedish_ciNULL
settingsMyISAM10Dynamiclatin1_swedish_ciNULL
smileysMyISAM10Dynamiclatin1_swedish_ciNULL
spidersMyISAM10Dynamiclatin1_swedish_ciNULL
subscriptionsMyISAM10Dynamiclatin1_swedish_ciNULL
tagsMyISAM10Dynamicutf8_general_ciNULL
tags_logMyISAM10Fixedutf8_general_ciNULL
themesMyISAM10Dynamiclatin1_swedish_ciNULL
topicsMyISAM10Fixedlatin1_swedish_ciNULL
tp_articlesMyISAM10Dynamicutf8_general_ciNULL
tp_blocksMyISAM10Dynamicutf8_general_ciNULL
tp_dataMyISAM10Fixedutf8_general_ciNULL
tp_dldataMyISAM10Fixedutf8_general_ciNULL
tp_dlmanagerMyISAM10Dynamicutf8_general_ciNULL
tp_eventsMyISAM10Dynamicutf8_general_ciNULL
tp_modulesMyISAM10Dynamicutf8_general_ciNULL
tp_ratesMyISAM10Dynamicutf8_general_ciNULL
tp_ratestatsMyISAM10Dynamicutf8_general_ciNULL
tp_settingsMyISAM10Dynamicutf8_general_ciNULL
tp_shoutboxMyISAM10Dynamicutf8_general_ciNULL
tp_variablesMyISAM10Dynamicutf8_general_ciNULL
welcomeMyISAM10Dynamicutf8_general_ciNULL


Is this database structure correct for an UTF8 forum? Or should ALL tables be in utf8_general_ci collation?
Can I simply dump and reimport the database, just setting the collation for all tables to utf8_general_ci? Or would it break things?

I am shying away from using the built-in "convert to UTF8" function of the forum admin center, because I remember it made problems far in the past. Somehow the UTF8 conversion was achieved some years back, but I don't know exactly how the process was and if maybe something was/is missing.
I think the forum worked on the old server before the move, without this error. So I am wondering why it appears now

BTW: Te old databae was some (relatively recent, I think) version of MySQL, while the new one is MariaDB version "10.1.43-MariaDB-0ubuntu0.18.04.1". Not sure how relevant this is. I think a correctly set-up forum should work with all recent versions of MySQL and MariaDB?

Thanks in advance for any hints.

Kindred

to have a UTF8 forum, you need three places...
1- the table collation
2- the columns in the table itself
3- the setting in the forum (in Settings.php, there should be a line/setting indicating utf8)
Слaва
Украинi

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

perkele

#2
Thanks!

Quote from: Kindred on May 01, 2020, 09:43:16 AM
to have a UTF8 forum, you need three places...
1- the table collation
2- the columns in the table itself
3- the setting in the forum (in Settings.php, there should be a line/setting indicating utf8)


Hmm... And when the database's default collation is set to utf8_general_ci and one imports from an SQL dump, then all tables should have utf8_general_ci if not specified otherwise in the dump, and all columns should have utf8_general_ci if not specified otherwise in the dump? Is that correct?

I don't actually know much about how collation settings work on all the different levels.

So if I dump and then strip the dump of all collation settings and then re-import, it should all be in the importing database's default collation, right?

perkele

Quote from: Kindred on May 01, 2020, 09:43:16 AM
to have a UTF8 forum, you need three places...
1- the table collation
2- the columns in the table itself

Hmm... I just checked, doing
SHOW FULL COLUMNS FROM ...
for each table in the database.
All text and varchar columns of every table have utf8_general_ci as their collation.
But as written before, the collation of tables themselves was latin1_swedish_ci for some tables.
So what does that actually mean?

Quote from: Kindred on May 01, 2020, 09:43:16 AM
3- the setting in the forum (in Settings.php, there should be a line/setting indicating utf8)

And in Settings.php, I have:

$language = 'english-utf8'; # The default language file set for the forum.


That is all I see there in relation to UTF8. Should there be anything else?

shawnb61

I think you still need to run the SMF function to convert your DB to UTF8.  Everything BUT smf is UTF8...

Detailed steps here, worth reading this, in case you have other things to do, e.g., language changes for users:
https://wiki.simplemachines.org/smf/UTF-8_Readme

The UTF8 conversion function ("Convert the database and data to UTF-8") is located under Admin | Maintenance | Forum Maintenance | Database.

Backup first... 
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

perkele

The problem seems to be resolved now.

Not quite sure exactly, but I think the only change really necessary was adding the line
$db_character_set = 'utf8';
in Settings.php.

Here is what I tried and how I come to this conclusion:

I dumped the database and replaced all occurrences of "DEFAULT CHARACTER SET latin1;" with "DEFAULT CHARACTER SET utf8;" in it. Then re-imported.
Now the DB query
MariaDB [forum]> show table status from forum;
would show utf8-general-ci collation everywhere.

Forum seemed to work fine as before, but still with the same error with the "mention" plugin. So no apparent change in functionality visible.

I dumped the database again to be able to restore it from here later.

Then I tried the "convert database to UTF8" function in admin settings. After that, the "mention" problem was gone.

I noticed the new line
$db_character_set = 'utf8';
in Settings.php

I dumped the database again. So now I have three database dumps that I can re-import and try out again. One before the manual conversion (before replacing "DEFAULT CHARACTER SET latin1;" with "DEFAULT CHARACTER SET utf8;" in the dump), one after the manual conversion. And one after the additional "convert to utf8" function from SMF.

Between the last two, I am not sure what the difference is. (The diff between the two .sql files is not empty, and it is difficult to see through. So there are some changes between the dumps.) Collation information as gathered from
MariaDB [forum]> show table status from forum;
and MariaDB [forum]> show full columns for ... (for each table) seems to be the same.

The first dump has latin1 character set here and there (as shown in the first post above).

But all three seem to work well, and the mention problem seems to come up only when
$db_character_set = 'utf8';
is removed from Settings.php.

Strange... as this line was not included in Settings.php before the move to the new server, and on the old server, all seemed to work fine already.

But anyway, glad this seems to be solved. Databases work in mysterious ways.

shawnb61

The settings & the DB must be in sync.

You converted the DB manually - so you needed to change the settings manually.

If you had used the SMF function to convert the DB, SMF would have changed the settings also.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Chen Zhen


After using SMF to convert your forum to UTF8, use this SMF utility:
https://web-develop.ca/index.php?action=downloads;area=stable_smf_utilities;file=36

Only use the collation part of the utility under its heading of "Adjust Collation Settings" where you can opt utf8_general_ci and then click save at the bottom left.
This should resolve your problem.

My SMF Mods & Plug-Ins

WebDev

"Either you repeat the same conventional doctrines everybody is saying, or else you say something true, and it will sound like it's from Neptune." - Noam Chomsky

Advertisement: