Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2

Started by spiros, December 31, 2005, 07:03:12 PM

Previous topic - Next topic

spiros

[Yes, I have downloaded the 2 latest fix files]

Initially I thought there was something wrong with the upgrade. So I deleted everything and did a clean install.

It seems that non-Latin characters are corrupted in board names, categories and posts although in a quite insonsistent manner!  For example:

Χιο�?μο�? > Γενικά

http://www.nonsmokersclub.com/forum/index.php/topic,4.0.html

Grudge

I'll make a note of this for Compuart, he deals with charset problems (Although he may not realise this :P )
I'm only a half geek really...

spiros

Thanks Grudge, I consider UTF compatibility an absolutely major issue. Thank God I did not try the upgrade/install of RC2 on my language site.

You could just as well tell him to have a look at this post, where there are some good solutions to the older, minor issues.

spiros

#3
This is what I get when I click on my messages:

Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
File: /home/free/public_html/forum/Sources/PersonalMessage.php
Line: 380


FROM {$db_prefix}pm_recipients AS pmr
WHERE pmr.ID_MEMBER = $ID_MEMBER
AND pmr.deleted = 0$labelQuery", __FILE__, __LINE__);
list ($max_messages) = mysql_fetch_row($request);
mysql_free_result($request);


My database is a UTF-8 one and the collation of all tables is        utf8_general_ci.

At the end of the tables (in phpmyadmin) I can see:

  117 table(s)       Sum       2,105       --        latin1_swedish_ci         610.0 KB      

This is the same as in previous installation of (RC1) where no problems occured.

Grudge

Are all columns in that table the same collation, especially the labels column?
I'm only a half geek really...

spiros

Yes, here you go

Field    Type  Collation  Attributes  Null  Default  Extra  Action
ID_PM  int(10)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
ID_MEMBER  mediumint(8)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
labels  varchar(60) utf8_general_ci   No  -1    Change Drop Primary Index Unique Fulltext
bcc  tinyint(3)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
is_read  tinyint(3)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
deleted  tinyint(3)   UNSIGNED No  0

Grudge

Is the table itself also utf8_general_ci? I know the database is, and the column, but what about the actual table?
I'm only a half geek really...

spiros

Grudge,

All tables are. I have checked this, even made an SQL dump to have a closer look.



Table  Action  RecordsTip  Type  Collation  Size  Overhead
Drop 2 MyISAM  utf8_general_ci     4.1 KB     -
smf_attachments    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_ban_groups    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_ban_items    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_board_permissions    Browse Structure Search Insert Empty Drop 69 MyISAM  utf8_general_ci     5.4 KB     -
smf_boards    Browse Structure Search Insert Empty Drop 4 MyISAM  utf8_general_ci     6.6 KB     -
smf_calendar    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_calendar_holidays    Browse Structure Search Insert Empty Drop 167 MyISAM  utf8_general_ci     9.0 KB     -
smf_categories    Browse Structure Search Insert Empty Drop 1 MyISAM  utf8_general_ci     2.0 KB     -
smf_collapsed_categories    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_actions    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_activity    Browse Structure Search Insert Empty Drop 1 MyISAM  utf8_general_ci     4.0 KB     -
smf_log_banned    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_boards    Browse Structure Search Insert Empty Drop 3 MyISAM  utf8_general_ci     2.0 KB     -
smf_log_errors    Browse Structure Search Insert Empty Drop 31 MyISAM  utf8_general_ci     9.4 KB     -
smf_log_floodcontrol    Browse Structure Search Insert Empty Drop 1 MyISAM  utf8_general_ci     2.1 KB     -
smf_log_karma    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_mark_read    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_notify    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_online    Browse Structure Search Insert Empty Drop 2 MyISAM  utf8_general_ci     5.1 KB    648 Bytes
smf_log_polls    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_search_messages    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_search_results    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_search_subjects    Browse Structure Search Insert Empty Drop 15 MyISAM  utf8_general_ci     3.4 KB     -
smf_log_search_topics    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_log_topics    Browse Structure Search Insert Empty Drop 4 MyISAM  utf8_general_ci     2.0 KB     -
smf_membergroups    Browse Structure Search Insert Empty Drop 8 MyISAM  utf8_general_ci     3.3 KB     -
smf_members    Browse Structure Search Insert Empty Drop 1 MyISAM  utf8_general_ci     10.2 KB     -
smf_message_icons    Browse Structure Search Insert Empty Drop 12 MyISAM  utf8_general_ci     3.3 KB     -
smf_messages    Browse Structure Search Insert Empty Drop 4 MyISAM  utf8_general_ci     12.3 KB     -
smf_moderators    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_package_servers    Browse Structure Search Insert Empty Drop 1 MyISAM  utf8_general_ci     2.1 KB     -
smf_permissions    Browse Structure Search Insert Empty Drop 39 MyISAM  utf8_general_ci     2.9 KB     -
smf_personal_messages    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_pm_recipients    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_poll_choices    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_polls    Browse Structure Search Insert Empty Drop 0 MyISAM  utf8_general_ci     1.0 KB     -
smf_sessions    Browse Structure Search Insert Empty Drop 5 MyISAM  utf8_general_ci     7.6 KB    3,296 Bytes
smf_settings    Browse Structure Search Insert Empty Drop 163 MyISAM  utf8_general_ci     11.6 KB     -
smf_smileys    Browse Structure Search Insert Empty Drop 19 MyISAM  utf8_general_ci     2.5 KB     -
smf_themes    Browse Structure Search Insert Empty Drop 44 MyISAM  utf8_general_ci     4.5 KB     -
smf_topics    Browse Structure Search Insert Empty Drop 4 MyISAM  utf8_general_ci     7.1 KB     -
117 table(s)  Sum 2,106 -- latin1_swedish_ci    610.3 KB   4.6 KB
With selected: Check All  /  Uncheck All  /  Check tables having overhead   
Print view       Data Dictionary
Create new table on database free_joomla:
Name: 
Number of fields: 


In fact, the only thing latin in the SQL dump was the actual error

INSERT INTO `smf_log_errors` VALUES (27, 1136129606, 1, '87.202.118.46', '?action=pm', 'Database Error: Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation ''find_in_set''<br />File: /home/free/public_html/forum/Sources/PersonalMessage.php<br />Line: 380', '4a8937e39e6dd5d9610fcb06f935510a');

Grudge

OK, going out on a limb really, what does this do:


ALTER TABLE smf_pm_recipients CHANGE labels labels VARCHAR(60) NOT NULL DEFAULT -1


That should use the default MySQL collation stuff, that may bring it back in line?
I'm only a half geek really...

spiros

I run it, nothing changed. The error still appears.

  ID_PM    int(10)      UNSIGNED  No    0          Change      Drop      Primary      Index      Unique    Fulltext
ID_MEMBER  mediumint(8)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
labels  varchar(60) utf8_general_ci   No  -1    Change Drop Primary Index Unique Fulltext
bcc  tinyint(3)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
is_read  tinyint(3)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
deleted  tinyint(3)   UNSIGNED No    Change Drop Primary Index Unique Fulltext
With selected: Check All  /  Uncheck All     With selected:    Change Drop Primary Index Unique Fulltext

Grudge

Is this your own server? If so can you confirm that my.ini has:

default-character-set=utf8
character_set_server=utf8
collation_server=utf8_general_ci


In it?
I'm only a half geek really...

Grudge

I'm only a half geek really...

spiros

I run the command

show variables like 'colla%';

in phpmyadmin and I get this:

Variable_name      Value
collation_connection    utf8_unicode_ci
collation_database    latin1_swedish_ci
collation_server    latin1_swedish_ci

Funnily enough it was all the same in RC1 where there were no problems.

I checked the messages in phpmyadmin and they appear very strange:

In one case entities:
& # 913 ; α Β β Γ γ<br />Δ δ Ε ε Ζ ζ<br />Η η Θ θ

(although they do not appear as entities in the html!)

And in another

Î'Ï...Ï,,ÏŒ είναι ένα νέο θέμα Î'Ï...Ï,,ÏŒ είναι ένα νέο θέμα Î'Ï...Ï,,ÏŒ εÎ

Grudge

I assume that is in someway the problem. I'm just a little hesitant to tell you to change it, as I'm really not sure what is the right way :/
I'm only a half geek really...

spiros

Grudge,

I doubt this is the problem (at least as far as character corruption is concerned) because with the exact same configuration, the exact same server, and the exact same database, RC1 had no problems at all.

At the same time, I also run other CMSs in the same server which have no problem whatsoever with Unicode.

So I think you might need to do some more work on your side  ;)

Grudge

Believe you me, I haven't forgotten this. It's top of my priority list, and once I get to speak to Compuart I really hope we can sort this out.
I'm only a half geek really...

spiros

Thank you Grudge,

Much appreciated. To prove what I am saying I did another installation of RC1 this time in the same server, same database. I even left the default table collation as latin1_swedish_ci.

Despite all that:

1) Unicode worked perfectly
2) The collation error did not appear in pm

see for yourself this test post

http://www.nonsmokersclub.com/rc1/index.php?topic=2.0

    jos_weblinks    Browse    Structure    Search    Insert    Empty    Drop      MyISAM      utf8_general_ci        4.1 KB         -
rc1_attachments    Browse Structure Search Insert Empty Drop 0 MyISAM  latin1_swedish_ci     1.0 KB     -
rc1_ban_groups    Browse Structure Search Insert Empty Drop 0 MyISAM  latin1_swedish_ci     1.0 KB     -
rc1_ban_items    Browse Structure Search Insert Empty Drop 0 MyISAM  latin1_swedish_ci     1.0 KB     -
rc1_board_permissions    Browse Structure Search Insert Empty Drop 69 MyISAM  latin1_swedish_ci     5.4 KB     -
rc1_boards    Browse Structure Search Insert Empty Drop 1 MyISAM  latin1_swedish_ci     6.1 KB     -
rc1_calendar    Browse Structure Search Insert Empty Drop 0 MyISAM  latin1_swedish_ci     1.0 KB     -

spiros

I changed my my.cnf (linux ini) to what you said

default-character-set=utf8
character_set_server=utf8
collation_server=utf8_general_ci

and resulted in:

1) messing up with most of my db powered sites (even UTF ones) and making non latin characters appear as question marks.
2) Did not resolve the way non latin characters characters were displayed in RC2

I rolled back the changes and everything was back to normal  :)

So I guess, this too, is not an appopriate option.

agridoc

Hi, I did also a test in the forum link that spiros gave.
http://www.nonsmokersclub.com/forum/index.php/topic,6.0.html
Other languages seem to be affected too, special characters seem to work OK.
There are also some strange things as that preview works OK but not in Greek translation (it hangs) and I see a font change when I change to Greek language (MSIE), not obvious with Mozilla/Firefox.

I am Greek also and interested in multilingual altough I use a successful different approach (not UTF-8), with spiros' s suggestion to use a script change by [Uknown] http://www.simplemachines.org/community/index.php?topic=19572.msg160840#msg160840

I used a test string from a multilingual test http://www.aeromodelling.gr/ForumS/index.php?topic=44.msg2540#msg2540

The same string was put in a test RC2 install in my PC and behaves as expected, there is only problem with special characters and Greek codepage. However it is not a UTF installation, I will try to find some time to play a little with UTF and I will inform you if I find something useful.
  For Greek aeromodellers and our friends around the world  - Greek Button sets for SMF - Greeklish to Greek mod
Δeν αφιερώνω χρόνο για μηνύματα σε greeklish.

spiros

agridoc,

It is a different case with UTF. To properly test it you need to download RC2 UTF Greek files - or if not available I can send them to you for testing.

In your test link multilingual text works OK because it is converted into entities (standard SMF with non unicode). When we talk about unicode we mean not converting to entities non latin scripts.

Advertisement: