Hello!
TLDR: I am having problems which I suspect come from an incomplete UTF8 conversion, in a forum which was recently moved to a different server, with DB restored from an SQL dump. (And I think the error did not happen on the old server just before the move, which puzzles me.)
SQL error "illegal mix of collations for operation ' IN '", happens when using the Mention plugin in a post with some special characters, different scripts than Latin, or usernames with such characters or umlauts.
Looking at the database tables, I see that some are in
utf8_general_ci and some in
latin1_swedish_ci collation.
My questions:Can this be correct?
Should not all the tables be in
utf8_general_ci when using UTF8?
Can I simply dump and re-create the database, while just modifying all the collations in the dump to
utf8_general_ci? Or would it break something/produce garbled data?
Longer story:I recently moved an older long-running SMF forum (version 2.0.14, but not sure if all upgrades went correctly in the past) from one server to another.
The database was restored from an SQL dump of the old one into a new empty MariaDB database with default characterset set to UTF8MB4 and default collation to utf8_general_ci. (The forum was already working with UTF8 before, but I did not check the collation of every table how it was before the move, and I don't have the old database dump anymore to look. I thought by simply re-importing the dump, all collations would be correct automatically.)
Almost everything seems to work fine. However, there is now a problem with the "Mention" plugin. In some cases, for example when users have umlauts like 'รค' etc. in their name, or there is for example Khmer script text in the post, an error appears:
QuoteIllegal mix of collations for operation ' IN '
File: /var/www/forum.sangham.net/Sources/Mentions.php
Line: 191
Note: It appears that your database may require an upgrade. Your forum's files are currently at version SMF 2.0.14, while your database is at version 2.0.2. The above error might possibly go away if you execute the latest version of upgrade.php.
(The "Note: ..." above is also something that has been appearing whenever something goes wrong for a longer time. I have been shying away from running "upgrade.php" without understanding what it actually does.)
(The line number may be slightly off due to my own debugging efforts, maybe adding or deleting some empty lines in the code.)
Looking at what SQL query is actually causing the problem, it appears to be this one (from
Mention.php, from the Mention plugin):
// Attempt to fetch all the valid usernames along with their required metadata
$request = $smcFunc['db_query']('', '
SELECT id_member, real_name, email_mentions, email_address, unread_mentions, id_group, id_post_group, additional_groups
FROM {db_prefix}members
WHERE real_name IN ({array_string:names})
ORDER BY LENGTH(real_name) DESC
LIMIT {int:count}',
array(
'names' => $names,
'count' => count($names),
)
);
The
real_name IN ({array_string:names}) seems to cause the problem when
{array_string:names} contains special characters like umlauts or non-latin scripts.
Since SQL is complaining about mix of collations, I looked at the collations as they are in the database:
MariaDB [forum]> show table status from forum;
The result is a mix of tables with
utf8_general_ci and others with
latin1_swedish_ci. The
members table in particular is in
latin1_swedish_ci:
Name | Engine | Version | Row_format | Collation | Checksum | Create_options | Comment |
admin_info_files | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
approval_queue | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
attachments | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
ban_groups | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
ban_items | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
board_permissions | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
boards | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
calendar | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
calendar_holidays | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
categories | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
collapsed_categories | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
custom_fields | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
group_moderators | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_actions | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_activity | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_banned | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_boards | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_comments | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_digest | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_errors | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_floodcontrol | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_group_requests | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_karma | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_mark_read | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_member_notices | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_mentions | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
log_notify | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_online | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_packages | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_polls | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_reported | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_reported_comments | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_scheduled_tasks | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_search_messages | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_search_results | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_search_subjects | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_search_topics | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_search_words | InnoDB | 10 | Compact | utf8_general_ci | NULL | | |
log_spider_hits | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_spider_stats | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
log_subscribed | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
log_topics | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
login_security | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
login_security_log | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
mail_queue | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
membergroups | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
members | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
menu_items | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
message_icons | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
messages | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
messages_history | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
moderators | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
newshoutbox | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
newtable | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
openid_assoc | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
package_servers | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
permission_profiles | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
permissions | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
personal_messages | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
personalized_bbc | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
pm_recipients | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
pm_rules | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
poll_choices | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
polls | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
quiz | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_answer | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_category | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_dispute | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_infoboard | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_league | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_league_result | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
quiz_league_table | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
quiz_question | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_question_type | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
quiz_result | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
quiz_session | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
scheduled_tasks | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
search_terms | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
sessions | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
settings | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
smileys | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
spiders | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
subscriptions | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
tags | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tags_log | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
themes | MyISAM | 10 | Dynamic | latin1_swedish_ci | NULL | | |
topics | MyISAM | 10 | Fixed | latin1_swedish_ci | NULL | | |
tp_articles | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_blocks | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_data | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
tp_dldata | MyISAM | 10 | Fixed | utf8_general_ci | NULL | | |
tp_dlmanager | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_events | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_modules | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_rates | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_ratestats | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_settings | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_shoutbox | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
tp_variables | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
welcome | MyISAM | 10 | Dynamic | utf8_general_ci | NULL | | |
Is this database structure correct for an UTF8 forum? Or should ALL tables be in utf8_general_ci collation?
Can I simply dump and reimport the database, just setting the collation for all tables to utf8_general_ci? Or would it break things?
I am shying away from using the built-in "convert to UTF8" function of the forum admin center, because I remember it made problems far in the past. Somehow the UTF8 conversion was achieved some years back, but I don't know exactly how the process was and if maybe something was/is missing.
I think the forum worked on the old server before the move, without this error. So I am wondering why it appears now
BTW: Te old databae was some (relatively recent, I think) version of MySQL, while the new one is MariaDB version "10.1.43-MariaDB-0ubuntu0.18.04.1". Not sure how relevant this is. I think a correctly set-up forum should work with all recent versions of MySQL and MariaDB?
Thanks in advance for any hints.
to have a UTF8 forum, you need three places...
1- the table collation
2- the columns in the table itself
3- the setting in the forum (in Settings.php, there should be a line/setting indicating utf8)
Thanks!
Quote from: Kindred on May 01, 2020, 09:43:16 AM
to have a UTF8 forum, you need three places...
1- the table collation
2- the columns in the table itself
3- the setting in the forum (in Settings.php, there should be a line/setting indicating utf8)
Hmm... And when the database's default collation is set to
utf8_general_ci and one imports from an SQL dump, then all tables should have
utf8_general_ci if not specified otherwise in the dump, and all columns should have
utf8_general_ci if not specified otherwise in the dump? Is that correct?
I don't actually know much about how collation settings work on all the different levels.
So if I dump and then strip the dump of all collation settings and then re-import, it should all be in the importing database's default collation, right?
Quote from: Kindred on May 01, 2020, 09:43:16 AM
to have a UTF8 forum, you need three places...
1- the table collation
2- the columns in the table itself
Hmm... I just checked, doing
SHOW FULL COLUMNS FROM ...
for each table in the database.
All text and varchar columns of every table have
utf8_general_ci as their collation.
But as written before, the collation of tables themselves was
latin1_swedish_ci for some tables.
So what does that actually mean?
Quote from: Kindred on May 01, 2020, 09:43:16 AM
3- the setting in the forum (in Settings.php, there should be a line/setting indicating utf8)
And in Settings.php, I have:
$language = 'english-utf8'; # The default language file set for the forum.
That is all I see there in relation to UTF8. Should there be anything else?
I think you still need to run the SMF function to convert your DB to UTF8. Everything BUT smf is UTF8...
Detailed steps here, worth reading this, in case you have other things to do, e.g., language changes for users:
https://wiki.simplemachines.org/smf/UTF-8_Readme
The UTF8 conversion function ("Convert the database and data to UTF-8") is located under Admin | Maintenance | Forum Maintenance | Database.
Backup first...
The problem seems to be resolved now.
Not quite sure exactly, but I think the only change really necessary was adding the line
$db_character_set = 'utf8';
in Settings.php.
Here is what I tried and how I come to this conclusion:
I dumped the database and replaced all occurrences of "DEFAULT CHARACTER SET latin1;" with "DEFAULT CHARACTER SET utf8;" in it. Then re-imported.
Now the DB query
MariaDB [forum]> show table status from forum;
would show utf8-general-ci collation everywhere.
Forum seemed to work fine as before, but still with the same error with the "mention" plugin. So no apparent change in functionality visible.
I dumped the database again to be able to restore it from here later.
Then I tried the "convert database to UTF8" function in admin settings. After that, the "mention" problem was gone.
I noticed the new line
$db_character_set = 'utf8';
in Settings.php
I dumped the database again. So now I have three database dumps that I can re-import and try out again. One before the manual conversion (before replacing "DEFAULT CHARACTER SET latin1;" with "DEFAULT CHARACTER SET utf8;" in the dump), one after the manual conversion. And one after the additional "convert to utf8" function from SMF.
Between the last two, I am not sure what the difference is. (The diff between the two .sql files is not empty, and it is difficult to see through. So there are some changes between the dumps.) Collation information as gathered from
MariaDB [forum]> show table status from forum;
and MariaDB [forum]> show full columns for ...
(for each table) seems to be the same.
The first dump has latin1 character set here and there (as shown in the first post above (https://www.simplemachines.org/community/index.php?topic=573021.msg4054467#msg4054467)).
But all three seem to work well, and the mention problem seems to come up only when
$db_character_set = 'utf8';
is removed from Settings.php.
Strange... as this line was not included in Settings.php before the move to the new server, and on the old server, all seemed to work fine already.
But anyway, glad this seems to be solved. Databases work in mysterious ways.
The settings & the DB must be in sync.
You converted the DB manually - so you needed to change the settings manually.
If you had used the SMF function to convert the DB, SMF would have changed the settings also.
After using SMF to convert your forum to UTF8, use this SMF utility:
https://web-develop.ca/index.php?action=downloads;area=stable_smf_utilities;file=36
Only use the collation part of the utility under its heading of "Adjust Collation Settings" where you can opt utf8_general_ci and then click save at the bottom left.
This should resolve your problem.