Simple Machines Community Forum

SMF Support => SMF 1.1.x Support => Topic started by: spiros on December 31, 2005, 07:03:12 PM

Title: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: spiros on December 31, 2005, 07:03:12 PM
[Yes, I have downloaded the 2 latest fix files]

Initially I thought there was something wrong with the upgrade. So I deleted everything and did a clean install.

It seems that non-Latin characters are corrupted in board names, categories and posts, although in a quite inconsistent manner! For example:

Χιο�?μο�? > Γενικά

http://www.nonsmokersclub.com/forum/index.php/topic,4.0.html
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on December 31, 2005, 08:22:27 PM
I'll make a note of this for Compuart, he deals with charset problems (Although he may not realise this :P )
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on December 31, 2005, 08:30:45 PM
Thanks Grudge, I consider UTF compatibility an absolutely major issue. Thank God I did not try the upgrade/install of RC2 on my language site.

You could just as well tell him to have a look at this post (http://www.simplemachines.org/community/index.php?topic=63146.msg437576#msg437576), where there are some good solutions to the older, minor issues.
Title: Illegal mix of collations (latin1_swedish_ci,COERCIBLE)
Post by: spiros on January 01, 2006, 10:37:50 AM
This is what I get when I click on my messages:

Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
File: /home/free/public_html/forum/Sources/PersonalMessage.php
Line: 380


FROM {$db_prefix}pm_recipients AS pmr
WHERE pmr.ID_MEMBER = $ID_MEMBER
AND pmr.deleted = 0$labelQuery", __FILE__, __LINE__);
list ($max_messages) = mysql_fetch_row($request);
mysql_free_result($request);


My database is a UTF-8 one and the collation of all tables is utf8_general_ci.

At the end of the table list (in phpMyAdmin) I can see:

117 table(s)   Sum   2,105   --   latin1_swedish_ci   610.0 KB

This is the same as in my previous installation (RC1), where no problems occurred.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 10:44:12 AM
Are all columns in that table the same collation, especially the labels column?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 10:47:21 AM
Yes, here you go

Field      Type          Collation        Attributes  Null  Default
ID_PM      int(10)                        UNSIGNED    No
ID_MEMBER  mediumint(8)                   UNSIGNED    No
labels     varchar(60)   utf8_general_ci              No    -1
bcc        tinyint(3)                     UNSIGNED    No
is_read    tinyint(3)                     UNSIGNED    No
deleted    tinyint(3)                     UNSIGNED    No    0
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 10:54:58 AM
Is the table itself also utf8_general_ci? I know the database is, and the column, but what about the actual table?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 11:00:01 AM
Grudge,

All tables are. I have checked this, even made an SQL dump to have a closer look.



Table                     Records  Type    Collation          Size      Overhead
(name cut off)            2        MyISAM  utf8_general_ci    4.1 KB    -
smf_attachments           0        MyISAM  utf8_general_ci    1.0 KB    -
smf_ban_groups            0        MyISAM  utf8_general_ci    1.0 KB    -
smf_ban_items             0        MyISAM  utf8_general_ci    1.0 KB    -
smf_board_permissions     69       MyISAM  utf8_general_ci    5.4 KB    -
smf_boards                4        MyISAM  utf8_general_ci    6.6 KB    -
smf_calendar              0        MyISAM  utf8_general_ci    1.0 KB    -
smf_calendar_holidays     167      MyISAM  utf8_general_ci    9.0 KB    -
smf_categories            1        MyISAM  utf8_general_ci    2.0 KB    -
smf_collapsed_categories  0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_actions           0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_activity          1        MyISAM  utf8_general_ci    4.0 KB    -
smf_log_banned            0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_boards            3        MyISAM  utf8_general_ci    2.0 KB    -
smf_log_errors            31       MyISAM  utf8_general_ci    9.4 KB    -
smf_log_floodcontrol      1        MyISAM  utf8_general_ci    2.1 KB    -
smf_log_karma             0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_mark_read         0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_notify            0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_online            2        MyISAM  utf8_general_ci    5.1 KB    648 Bytes
smf_log_polls             0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_search_messages   0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_search_results    0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_search_subjects   15       MyISAM  utf8_general_ci    3.4 KB    -
smf_log_search_topics     0        MyISAM  utf8_general_ci    1.0 KB    -
smf_log_topics            4        MyISAM  utf8_general_ci    2.0 KB    -
smf_membergroups          8        MyISAM  utf8_general_ci    3.3 KB    -
smf_members               1        MyISAM  utf8_general_ci    10.2 KB   -
smf_message_icons         12       MyISAM  utf8_general_ci    3.3 KB    -
smf_messages              4        MyISAM  utf8_general_ci    12.3 KB   -
smf_moderators            0        MyISAM  utf8_general_ci    1.0 KB    -
smf_package_servers       1        MyISAM  utf8_general_ci    2.1 KB    -
smf_permissions           39       MyISAM  utf8_general_ci    2.9 KB    -
smf_personal_messages     0        MyISAM  utf8_general_ci    1.0 KB    -
smf_pm_recipients         0        MyISAM  utf8_general_ci    1.0 KB    -
smf_poll_choices          0        MyISAM  utf8_general_ci    1.0 KB    -
smf_polls                 0        MyISAM  utf8_general_ci    1.0 KB    -
smf_sessions              5        MyISAM  utf8_general_ci    7.6 KB    3,296 Bytes
smf_settings              163      MyISAM  utf8_general_ci    11.6 KB   -
smf_smileys               19       MyISAM  utf8_general_ci    2.5 KB    -
smf_themes                44       MyISAM  utf8_general_ci    4.5 KB    -
smf_topics                4        MyISAM  utf8_general_ci    7.1 KB    -
117 table(s)              2,106    --      latin1_swedish_ci  610.3 KB  4.6 KB


In fact, the only Latin thing in the SQL dump was the error itself:

INSERT INTO `smf_log_errors` VALUES (27, 1136129606, 1, '87.202.118.46', '?action=pm', 'Database Error: Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation ''find_in_set''<br />File: /home/free/public_html/forum/Sources/PersonalMessage.php<br />Line: 380', '4a8937e39e6dd5d9610fcb06f935510a');
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 11:06:40 AM
OK, going out on a limb here: what does this do?


ALTER TABLE smf_pm_recipients CHANGE labels labels VARCHAR(60) NOT NULL DEFAULT '-1';


That should make the column use MySQL's default collation, which may bring it back in line.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 11:09:30 AM
I ran it and nothing changed. The error still appears.

Field      Type          Collation        Attributes  Null  Default
ID_PM      int(10)                        UNSIGNED    No    0
ID_MEMBER  mediumint(8)                   UNSIGNED    No
labels     varchar(60)   utf8_general_ci              No    -1
bcc        tinyint(3)                     UNSIGNED    No
is_read    tinyint(3)                     UNSIGNED    No
deleted    tinyint(3)                     UNSIGNED    No
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 11:12:38 AM
Is this your own server? If so can you confirm that my.ini has:

default-character-set=utf8
character_set_server=utf8
collation_server=utf8_general_ci


In it?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 11:17:05 AM
You may find this useful:
http://www.simplemachines.org/community/index.php?topic=27367.msg211371#msg211371
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 11:24:26 AM
I ran the command

show variables like 'colla%';

in phpmyadmin and I get this:

Variable_name      Value
collation_connection    utf8_unicode_ci
collation_database    latin1_swedish_ci
collation_server    latin1_swedish_ci

Funnily enough it was all the same in RC1 where there were no problems.

I checked the messages in phpmyadmin and they appear very strange:

In one case entities:
& # 913 ; α Β β Γ γ<br />Δ δ Ε ε Ζ ζ<br />Η η Θ θ

(although they do not appear as entities in the html!)

And in another

Î'Ï...Ï,,ÏŒ είναι ένα νέο θέμα Î'Ï...Ï,,ÏŒ είναι ένα νέο θέμα Î'Ï...Ï,,ÏŒ εÎ
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 11:39:15 AM
I assume that is in some way the problem. I'm just a little hesitant to tell you to change it, as I'm really not sure what the right way is :/
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 02:14:08 PM
Grudge,

I doubt this is the problem (at least as far as character corruption is concerned) because with the exact same configuration, the exact same server, and the exact same database, RC1 had no problems at all.

At the same time, I also run other CMSs on the same server, which have no problem whatsoever with Unicode.

So I think you might need to do some more work on your side  ;)
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 01, 2006, 03:51:37 PM
Believe you me, I haven't forgotten this. It's top of my priority list, and once I get to speak to Compuart I really hope we can sort this out.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 04:26:27 PM
Thank you Grudge,

Much appreciated. To prove what I am saying, I did another installation of RC1, this time on the same server and in the same database. I even left the default table collation as latin1_swedish_ci.

Despite all that:

1) Unicode worked perfectly
2) The collation error did not appear in pm

See this test post for yourself:

http://www.nonsmokersclub.com/rc1/index.php?topic=2.0

Table                  Records  Type    Collation          Size    Overhead
jos_weblinks                    MyISAM  utf8_general_ci    4.1 KB  -
rc1_attachments        0        MyISAM  latin1_swedish_ci  1.0 KB  -
rc1_ban_groups         0        MyISAM  latin1_swedish_ci  1.0 KB  -
rc1_ban_items          0        MyISAM  latin1_swedish_ci  1.0 KB  -
rc1_board_permissions  69       MyISAM  latin1_swedish_ci  5.4 KB  -
rc1_boards             1        MyISAM  latin1_swedish_ci  6.1 KB  -
rc1_calendar           0        MyISAM  latin1_swedish_ci  1.0 KB  -
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 01, 2006, 05:08:39 PM
I changed my my.cnf (the Linux equivalent of my.ini) to what you suggested:

default-character-set=utf8
character_set_server=utf8
collation_server=utf8_general_ci

and it resulted in:

1) messing up most of my db-powered sites (even UTF ones), with non-Latin characters appearing as question marks.
2) no change in the way non-Latin characters were displayed in RC2.

I rolled back the changes and everything was back to normal  :)

So I guess this, too, is not an appropriate option.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on January 01, 2006, 08:49:24 PM
Hi, I also ran a test in the forum that spiros linked to.
http://www.nonsmokersclub.com/forum/index.php/topic,6.0.html
Other languages seem to be affected too, while special characters seem to work OK.
There are also some strange things: preview works OK, but not with the Greek translation (it hangs), and I see a font change when I switch to the Greek language in MSIE, which is not obvious in Mozilla/Firefox.

I am Greek too and interested in multilingual support, although I use a different approach (not UTF-8) that works, following spiros's suggestion to use a script change by [Unknown]: http://www.simplemachines.org/community/index.php?topic=19572.msg160840#msg160840

I used a test string from a multilingual test: http://www.aeromodelling.gr/ForumS/index.php?topic=44.msg2540#msg2540

The same string was put into a test RC2 install on my PC and behaves as expected; the only problem is with special characters and the Greek codepage. However, it is not a UTF installation. I will try to find some time to play with UTF and will let you know if I find something useful.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 03:00:06 AM
agridoc,

It is a different case with UTF. To test it properly you need to download the RC2 UTF Greek files; if they are not available, I can send them to you for testing.

In your test link, multilingual text works OK because it is converted into entities (standard SMF without Unicode). When we talk about Unicode, we mean not converting non-Latin scripts to entities.
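
To illustrate the distinction made above, here is a minimal Python sketch. Python is used only for illustration (SMF itself is PHP), the function names are hypothetical, and this is not SMF's actual code. It shows what text looks like once non-Latin characters have been turned into numeric HTML entities, and why a plain-text search for the original letter then finds nothing.

```python
import html


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_entities(text: str) -> str:
    """Turn HTML entities back into real characters."""
    return html.unescape(text)


stored = to_entities("\u0391\u03b1 test")   # Greek capital and small alpha
print(stored)                   # &#913;&#945; test
print("\u03b1" in stored)       # False: the letter itself is no longer in the stored text
print(from_entities(stored))    # the original string with real Greek letters
```

With real UTF-8 storage the stored text would contain the letters themselves, so no decoding step would be needed before searching or displaying.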
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on January 02, 2006, 03:41:15 AM
spiros, I have quite a few things to learn about UTF.

I think I can test UTF for multilingual use without the Greek UTF translation, as the codepage is the same in all languages.

I would really appreciate it if you could email me the Greek UTF 1.1 RC2 translation, or PM me a link. I have seen your message saying that you sent the official SMF 1.1 RC2 translation to SMF, which will now have two versions, Greek and UTF-8. However, it might take a few days to appear in the download section.

Thank you for your work on SMF in Greek and for your help with multilingual support so far.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 05:41:38 AM
In fact, you do not even need the Greek files.

Do this:

1) Save the English language files as UTF-8 (especially index.english.php) using Notepad
2) Change the encoding in index.english.php to UTF-8
3) Upload them

You are ready to test RC2 with UTF-8.
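
For anyone who prefers a script to an editor for step 1, the re-encoding can be sketched in Python as below. This is only an illustration under assumptions: the function name is hypothetical, and the source files are assumed to be ISO-8859-1. It does not perform step 2 (changing the encoding declared inside index.english.php), which still has to be done by hand.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Read src in its legacy encoding and write the same text to dst as UTF-8."""
    text = src.read_bytes().decode(source_encoding)
    # Write raw bytes so that no byte order mark or newline translation is added.
    dst.write_bytes(text.encode("utf-8"))
```

Some editors prepend a byte order mark when saving as UTF-8; this sketch deliberately writes none, so the output starts with the same first character as the input.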
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Compuart on January 02, 2006, 03:32:31 PM
The problem with the collations is that before 4.0 MySQL didn't use them, between 4.0 and 4.1 MySQL had an intertwined version of character sets and collations, and from 4.1 on it cannot work properly without the collation specified. If MySQL has been upgraded from an earlier version, the columns will likely be configured with a latin1_general collation, while columns changed afterwards will likely get the default collation of the latin1 character set, which surprisingly is latin1_swedish_ci (I guess because MySQL is a Swedish company).

I'll see if I can write a tool to fix the column collations.
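
As a rough idea of what such a tool might generate, here is a hedged Python sketch that only builds SQL text. It is not Compuart's tool; the function name and defaults are hypothetical. The statement form it emits, ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ..., is standard MySQL 4.1+ syntax.

```python
def conversion_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        # Identifiers are backtick-quoted, so refuse names that contain a backtick.
        if "`" in table:
            raise ValueError(f"unsafe table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for line in conversion_statements(["smf_pm_recipients", "smf_messages"]):
    print(line)
```

Note that CONVERT TO transcodes the stored data as well as relabelling the columns. If a column already holds UTF-8 bytes under a latin1 label, running such a statement would encode the text a second time, so a backup and a test on a copy are advisable first.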
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 03:39:01 PM
compuart,

I had changed column collations to utf. This is not the point. RC1 had no such problems with unicode.

In fact, if you read the previous posts, I have installed in the same db, with latin columns, RC1 and it works perfectly fine with UTF!!!

Also, today I finished the Greek translation of minibb and it works perfectly with UTF-8 on the same database with latin columns!!!!

So I think it is mostly a matter of finding what broke in the changes from RC1 to RC2.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Compuart on January 02, 2006, 03:45:36 PM
Have you tried a clean install of RC2? I assume it'll work just as well as RC1. The problem is most likely the upgrader changing some of the text columns, causing a mismatch of collations.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 03:47:55 PM
Compuart,

If you read the thread carefully you will see that I tried

1) upgrade (failed)
2) clean install (failed)

I also tried 2 clean installs of RC1, both with the same successful results.

It would be nice if I had some feedback from other people testing it with UTF-8.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on January 02, 2006, 04:29:20 PM
I have tried two clean installs of 1.1 RC2 (RC2 charter and RC2 public) with UTF-8 on my PC, and there is a problem, although not as severe as in spiros's case, mainly with the capital "Π" (at first I thought the rectangles in MSIE were UTF errors, then spiros revealed that they were polytonic Greek). It happens not only in Greek but in other languages too (Russian for sure). I don't know Russian; it was just obvious in a test page I use. :D

That's for spiros (in Greek)

ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΦnbsp;ΡΣΤΥΦΧΨΩ
αβγδεζηθικλμνξοπρστυφχψω
άέήίόύώ
ΆΈΉΊΌΎΏ


Русский (Russian) RC2 - ISO-8859-1 (or 7)
Цnbsp;усский (Russian) RC2 UTF-8
(It's the title of the Russian SMF's support board)

I think many languages may have problems with UTF-8. Because it affects only some characters, it is hard to diagnose without knowing the language.

I didn't change the collation in MySQL; it's latin1_swedish_ci.

I also have to report a problem with special characters and UTF-8 in RC2: a few characters do not display correctly.

P.S. By the way, Compuart, please also take a look at Multilingual search in SMF's site? (http://www.simplemachines.org/community/index.php?topic=60277.0), which Grudge wanted to inform you about. Gri's messages there are a bit off topic.  ;D
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 04:35:19 PM
Another clean install, new db, new everything. Again, same configuration, same problems, see sample post here:

http://www.nonsmokersclub.com/forums/index.php?topic=2.0

From this last test SMF RC2 displays a strong aversion to the Greek capital Π!!!!
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 02, 2006, 04:37:11 PM
What's wrong with that sample post? It looks like Greek to me?

EDIT: Oh, I assume the numbers are not supposed to be there?!

EDIT2: Attached a screenshot. It's odd that one character is an accent on one subject line but a square on the next.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Compuart on January 02, 2006, 04:40:22 PM
Hmm, there seem to be two topics merged, with two different problems. Are both problems still occurring (specifically in a clean 1.1 RC2 install)?

Quote from: spiros on January 01, 2006, 10:37:50 AM
This is what I get when I click on my messages:

Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
File: /home/free/public_html/forum/Sources/PersonalMessage.php
Line: 380


FROM {$db_prefix}pm_recipients AS pmr
WHERE pmr.ID_MEMBER = $ID_MEMBER
AND pmr.deleted = 0$labelQuery", __FILE__, __LINE__);
list ($max_messages) = mysql_fetch_row($request);
mysql_free_result($request);

and
Quote from: spiros on December 31, 2005, 07:03:12 PM
It seems that non-Latin characters are corrupted in board names, categories and posts, although in a quite inconsistent manner! For example:

Χιο�?μο�? > Γενικά

http://www.nonsmokersclub.com/forum/index.php/topic,4.0.html
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 05:01:46 PM
Hello Compuart,

With the new install I do not get the illegal collation thingy.

However, the real problem is not the square in the title (that can be fixed with a hack) but if you notice at the bottom right of your screenshot there is a Φnbsp which was actually a Π letter.

In fact, in Mozilla it does not display it as Φnbsp but as �.

Here is another test with polytonic Greek.

This is what it looks like in RC2:
http://www.nonsmokersclub.com/forums/index.php?topic=3.0

And this is what it looks like in RC1 (No problem here)
http://www.nonsmokersclub.com/rc1/index.php?topic=3.0

And here is the original text
http://www.mikrosapoplous.gr/syntipas.html

(use both mozilla and IE to see differences. Most of the boxes in IE would be eliminated if a unicode font was used in style sheet - not all though! Edit: I changed theme so these were sorted and you only get the Φnbsp; thingy replacing "Π".  As I said, the main problematic character appears to be the capital Π!!!)
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on January 02, 2006, 05:28:07 PM
Oops, I made a typo mistake in the alphabet, I corrected the Greek alphabet display in my previous message.

With spiros's new clean installs the problems are now the same as mine, with the capital "Π". He uses a UTF-8 collation; I use latin1.

I noticed that "Π" was displaying in a quoted word "Παρνασός". I tried this in my installation without success.

The Greek "Π" might have a relation with the Russian "Р" that seems also to have a problem with 1.1 RC2 UTF-8.

Spiros, I also noticed that greek search doesn't give results in your test forum http://www.nonsmokersclub.com/forums/index.php?topic=2.0, try it yourself.

So far also the special character "à" is not displaying in 1.1 RC2 UTF-8.
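
One thing the three characters reported so far (Greek "Π", Russian "Р" and "à") have in common is that their UTF-8 encoding ends in the byte 0xA0, which is the non-breaking space in ISO-8859-1. The Python sketch below is only a hypothesis consistent with the symptoms in this thread, not a confirmed diagnosis of SMF's code. Both functions are hypothetical: one swaps the byte 0xA0 for "&nbsp;" as single-byte code might, the other imitates a forgiving decoder that accepts any byte after a two-byte lead byte.

```python
def naive_nbsp_replace(raw: bytes) -> bytes:
    """Treat the input as single-byte text and swap byte 0xA0 for '&nbsp;'."""
    return raw.replace(b"\xa0", b"&nbsp;")


def lenient_decode(raw: bytes) -> str:
    """Hypothetical forgiving decoder: after a two-byte lead byte (0xC0-0xDF),
    use the low six bits of whatever byte follows, valid or not."""
    out = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if 0xC0 <= byte <= 0xDF and i + 1 < len(raw):
            out.append(chr(((byte & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(byte))
            i += 1
    return "".join(out)


SAMPLES = {
    "Greek capital pi": "\u03a0",
    "Cyrillic capital er": "\u0420",
    "a with grave": "\u00e0",
}
for name, char in SAMPLES.items():
    raw = char.encode("utf-8")
    shown = lenient_decode(naive_nbsp_replace(raw))
    print(f"{name}: bytes {raw.hex(' ')} -> {shown}")
# Greek capital pi: bytes ce a0 -> Φnbsp;
# Cyrillic capital er: bytes d0 a0 -> Цnbsp;
```

Under these assumptions the sketch reproduces the "Φnbsp;" and "Цnbsp;" strings quoted earlier in the thread, while a strict decoder would instead reject the damaged byte pair and show a replacement character.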
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 05:40:48 PM
Hmmm, you are correct: no search results for monotonic (plain) Greek, but in that post the characters were converted to entities for some reason!

Try this one:
http://www.nonsmokersclub.com/forums/index.php?topic=3.0
(polytonic works)

and this new monotonic one
http://www.nonsmokersclub.com/forums/index.php?topic=4.0

In fact, this is why you cannot search Greek in this forum. Since it uses latin charset, all characters are converted to entities, and this is why UTF is so bloody important!
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 05:52:11 PM
I ran another test and changed the collation of all tables from Latin to Unicode, and I had the same serious problems as with my previous 2 RC2 installations!!! When I rolled the change back to Latin the extra errors remained, but when I overwrote the posts with the same clean text only the Π error remained.

The real problem though is this:

Unicode text looks like this in both RC1 and RC2 databases:

μή Ï,,οι χλιδῇ δοκεῖÏ,,ε μηδ᾽ αá½?θαδίᾳ

In the former case it displays fine in the browser; in the latter there are problems with the Π character.
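
Garbled strings of this kind are what UTF-8 bytes look like when a tool displays them as a single-byte Latin character set. The Python sketch below illustrates that general effect with hypothetical helper names; it assumes plain ISO-8859-1 (Latin-1), whereas a Windows-1252 viewer maps a few bytes differently. It says nothing about why RC1 and RC2 render the same stored bytes differently.

```python
def as_seen_in_latin1(text: str) -> str:
    """Show UTF-8 encoded text the way a Latin-1 tool would display its bytes."""
    return text.encode("utf-8").decode("latin-1")


def recover(garbled: str) -> str:
    """Undo the above by re-reading the same bytes as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "\u03b5\u03af\u03bd\u03b1\u03b9"  # the Greek word for "is"
garbled = as_seen_in_latin1(original)
print(garbled)                       # ÎµÎ¯Î½Î±Î¹
print(recover(garbled) == original)  # True
```

Because the round trip is lossless, text that looks like this in a database viewer is not necessarily damaged; the bytes may be intact and only the viewer's interpretation wrong.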
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: taka on January 02, 2006, 05:59:27 PM
Hi.

I've just posted a patch on the SMF Coding Discussion forum and thought you guys might be interested in it. You can find the link to the zip in this thread:

http://www.simplemachines.org/community/index.php?topic=63778.0

Note that the patch contains two fixes. You probably aren't interested in the e-mail fix; because it's disabled by default, it's harmless anyway.

Taka
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 06:13:42 PM
Hello Taka,

The serious problems appeared in RC2, but you mention RC1. Have you read this thread?

Quote
http://hiko-ki.com/patch_20060102.zip

NOTE: The zip file contains modified 1.1 RC1 files.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: agridoc on January 02, 2006, 06:16:25 PM
Spiros, I changed my message; I had already noticed the polytonic search and the differences in search.

As far as Multilingual search in SMF's site (http://www.simplemachines.org/community/index.php?topic=60277.0) is concerned, I don't think this is the cause. Please read my messages there and let's discuss this topic in the proper place, where you can help with your experience too. There is more than one way to succeed.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 02, 2006, 06:35:09 PM
spiros,

Please don't get too annoyed over this. We *are* seriously looking into this and are attempting to fix it. Some of your posts are coming off as a little too aggressive. Please bear with us, as I'm sure you appreciate these things are quite difficult to emulate, and hence fix. In addition, it's a combination of many different factors complicating it further. I can assure you that you're not being ignored :)

Grudge
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: taka on January 02, 2006, 07:18:07 PM
Quote from: spiros on January 02, 2006, 06:13:42 PM
The serious problems appeared to RC2 but you mention RC1. Have you read this thread?
There might be a new problem introduced by RC2, but RC1 has a couple of UTF-8 related problems.  As Grudge mentioned, I think you are probably seeing a combination of different problems.

Speaking of RC2 with non-Latin languages, you can take a look at Logue's site:

http://forum.logue.tk/index.php

He upgraded to RC2 yesterday. As you can see, he's also using lots of non-Latin boards and topics. I'm still seeing some character corruption, though. I believe it's due to the same bug that existed in RC1.

In any case, I've sent my patch to Grudge and I expect him to look at what I've done in it. It should at least fix the rendering and posting problems that we've seen in RC1.

There might be a problem in the database migration code (this is 100% guesswork). That needs to be fixed by the dev team or some other volunteer, since I haven't upgraded to RC2 yet.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Grudge on January 02, 2006, 07:23:29 PM
taka,

We're looking at your code, thanks. We are considering that the upgrade may have affected things; we'll keep ploughing on ;)

Thanks,

Grudge
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: taka on January 02, 2006, 08:22:54 PM
Grudge,

No problem :-)

I have a suggestion to the core development team.  Would you pass it on?

I think it's time for the SMF project to switch the default charset from ISO-8859-1 to UTF-8. If you look at major web sites such as Google, Yahoo, and MSN, they are all using UTF-8 now. Using UTF-8 makes things a lot easier than ISO-8859-1 once your presence reaches a certain point, with more and more people around the world using your product.

You may think it's too late to make such change, but if you delay one day, there will be more data in the database for sure.

Please consider using UTF-8 for 1.1 GM release.

Taka
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 02, 2006, 09:02:56 PM
Taka,

I wholeheartedly second your suggestion: I think unicode is the only way ahead for globalization. Joomla developers were smart enough to have core UTF support for 1.1. I guess people in SMF must have realized the importance of unicode to a certain extent.

We are here to test and report (at least myself - you can do more).

Grudge,

I am sorry if some of my posts came across as aggressive. They were not meant to be  :)
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 03, 2006, 03:50:40 AM
Compuart,

I do not know if this helps, but I checked with phpmyadmin on the same db where I have installed Joomla and the unicode entries there look perfect, ie:

Πόσες φορές άραγε κάποιοι Έλληνες πολίτες (γύρω ...    Έχοντας ζήσει στο εξωτερικό και έχοντας

Whereas SMF entries (both RC1 and RC2) look like this

μή Ï,,οι χλιδῇ δοκεῖÏ,,ε μηδ᾽...
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 04, 2006, 10:04:36 AM
Sorry to be a nuisance, but do we have any updates on the UTF front?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Compuart on January 04, 2006, 10:25:27 AM
I'm still working on it. All changes have to be thoroughly tested, especially when it comes to character sets. Although we'll have better support for UTF-8, we'll also keep supporting other character sets. ISO-8859-1 will remain the default character set for most languages, but it will be possible to override the character set independently of the chosen language.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 04, 2006, 11:22:22 AM
Ok Compuart, thanks for your feedback. If there is anything I can do to help in terms of testing please do not hesitate to contact me.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Compuart on January 04, 2006, 11:44:52 AM
Thanks, I'll certainly contact you as soon as I'm no longer able to find any bugs when using UTF-8.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on January 04, 2006, 12:30:10 PM
I started a new topic Multilingual in SMF 1.1RC2 without UTF for Greek and other languages? (http://www.simplemachines.org/community/index.php?topic=64142.msg443254#msg443254) and I would like your opinion and support there too.

Quote
I don't want to argue with this against UTF-8.

I want to report that a working multilingual solution exists in a previous SMF version without using UTF-8.

Quote
Please Compuart, as you are focused on multilingual, take a look at this solution too and [Unknown]'s patch. Spiros's help would be appreciated in this topic too.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 04, 2006, 05:20:49 PM
Agridoc,

This is hardly a solution. In fact, this forum works on this principle: converting all non-Latin scripts to HTML entities (just use view source in this case, say on the Greek board, and compare with view source on a UTF page: what do you see?). The only real solution is Unicode. It is not by chance that Google and other major internet companies use Unicode instead of converting all non-Latin output to... entities! ;)

As far as working solutions go, RC1 was quite near full UTF support. A few hacks here and there, and all the minor character corruption problems in titles and Last Posts / Latest posts were solved.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on January 04, 2006, 07:58:45 PM
This is not a quarrel between a UTF and a non-UTF solution. I think both should be developed.
Quote from: agridoc on January 04, 2006, 12:19:49 PM
SMF's team should consider all the trends and possibilities. Compuart's message (http://www.simplemachines.org/community/index.php?topic=63235.msg443044#msg443044) shows they take care of all this.

We can discuss the limitations and other things about the non-UTF solution in http://www.simplemachines.org/community/index.php?topic=64142.0

I will also try to help, if I can, in the development of the UTF solution here.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: taka on January 05, 2006, 10:10:07 AM
Quote from: spiros on January 04, 2006, 05:20:49 PM
This is hardly a solution. In fact, this forum works on this principle: converting all non-Latin scripts to HTML entities (just use view source in this case, say on the Greek board, and compare with view source on a UTF page: what do you see?). The only real solution is Unicode. It is not by chance that Google and other major internet companies use Unicode instead of converting all non-Latin output to... entities! ;)
Agreed.  A solution with entities is too Europe-centric.  It won't work for Asian languages.  Technically, you can replace all characters with their Unicode code point entity representation, but entities shouldn't be used that way.

BTW, another UTF-8 related bug.  In News.php, there's a wrong use of substr, which only works for ISO-8859-X.

$row['body'] = strtr(substr(str_replace('<br />', "\n", $row['body']), 0, $modSettings['xmlnews_maxlen'] - 3), array("\n" => '<br />')) . '...';


Because of this code, the RSS feed doesn't show the body properly.  It's chopped off in the middle of a UTF-8 sequence.  This substr should be replaced by mb_substr if it's available.  That solved my problem.
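
To illustrate the point about substr, here is a minimal sketch of a character-aware cut. utf8_truncate() is a hypothetical helper made up for this example, not a function from News.php or SMF; it assumes the text is UTF-8 and falls back to a PCRE match when the mbstring extension is not loaded.

```php
<?php
// Hypothetical helper (not SMF code): cut a UTF-8 string to at most
// $max characters without splitting a multi-byte sequence.
function utf8_truncate($text, $max)
{
    // Prefer mbstring when the extension is loaded.
    if (function_exists('mb_substr'))
        return mb_substr($text, 0, $max, 'UTF-8');

    // Fallback: PCRE in UTF-8 mode matches whole characters.
    if (preg_match('/^.{0,' . (int) $max . '}/us', $text, $m))
        return $m[0];

    // Invalid UTF-8 input: leave the text untouched rather than guess.
    return $text;
}

// The Greek word from the first post: six letters, twelve bytes in UTF-8.
$body = "\xCE\x93\xCE\xB5\xCE\xBD\xCE\xB9\xCE\xBA\xCE\xAC";

$byte_cut = substr($body, 0, 3);     // stops in the middle of the 2nd letter
$char_cut = utf8_truncate($body, 3); // keeps the first three letters whole

echo bin2hex($byte_cut), "\n"; // ce93ce
echo bin2hex($char_cut), "\n"; // ce93ceb5cebd
```

The byte cut ends with a lone CE lead byte, which is the kind of broken sequence that makes a feed reader show a garbled body.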
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 05, 2006, 02:48:51 PM
Taka,

Thanks for another excellent solution!
I hope Compuart has noted this.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: mcgrelio on January 06, 2006, 04:00:59 PM
What about this?
http://www.simplemachines.org/community/index.php?topic=23509.0

I'm setting up a test server to try that solution, but I'm doubtful...
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 06, 2006, 06:38:40 PM
mcgrelio,

Yes, this is the standard procedure for converting a non-UTF site to a UTF one. However, the point here is how SMF handles posting in a UTF environment.

Different kettle of fish  ;)
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 09, 2006, 04:22:22 PM
Just another (strange) reason why UTF-8 is recommended. In my case, I use a Greek encoding and Google indexes the text in my forum incorrectly. That is to say, it is indexed as if the encoding were Latin rather than Windows-1253. So in search results one sees extended ASCII characters (gobbledygook).

I contacted Google support and they recommended switching to UTF-8. So I guess that, after all, the more topics are indexed correctly in other languages, the greater the publicity for the forums using SMF, and for SMF as a consequence.

Click for a detailed report on this problem and Google's reply (http://www.simplemachines.org/community/index.php?topic=21930.0).
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: agridoc on January 09, 2006, 04:57:15 PM
I use windows-1253 as the codepage in all of SMF's languages I have installed (a cheat).

I did a search in Google for my site and I get 10.300 results (not all from the forum), correct Greek titles and text in the display.

Test it here (http://www.google.com.gr/search?as_q=&num=50&hl=el&newwindow=1&btnG=%CE%91%CE%BD%CE%B1%CE%B6%CE%AE%CF%84%CE%B7%CF%83%CE%B7+Google&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=aeromodelling.gr).

This is a test search for a Greek word in my site (http://www.google.com.gr/search?q=%CE%B1%CE%B5%CF%81%CE%BF%CE%BC%CE%BF%CE%BD%CF%84%CE%AD%CE%BB%CE%BF+site%3Aaeromodelling.gr&btnG=%CE%91%CE%BD%CE%B1%CE%B6%CE%AE%CF%84%CE%B7%CF%83%CE%B7&num=50&hl=el&newwindow=1&as_qdr=all). Seems to work again.

I am not arguing against UTF-8; however, I prefer another solution that also works with SMF 1.0.x and that I believe can be made to work with 1.1 too, with help from SMF's support team. See Multilingual in SMF 1.1RC2 without UTF for Greek and other languages? (http://www.simplemachines.org/community/index.php?topic=64142.msg443254#msg443254)
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 09, 2006, 05:35:32 PM
Yes, I know, in some cases it does work, in others not!

On my site, about 5% of Greek posts are indexed correctly and the rest are not!

This is very strange because it is not a uniform behaviour and hence cannot be easily ascribed to a specific reason.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: NTA on January 10, 2006, 05:09:27 AM
I have this problem too. When I used the upgrade package to upgrade from SMF 1.0.5 to 1.1 RC2, all messages, posts, board names and member names in UTF-8 were gone after converting :D (thank God I only tested on localhost). I found that it is not a database problem; I think the convert tool has a problem when working with UTF-8 encoding...

I'm looking forward to your solution :D
Title: preparsecode(str, bool) is not utf-8 safe.
Post by: bubux on January 10, 2006, 02:01:48 PM
I found that void preparsecode(string &message, boolean previewing = false)
(Subs-Post.php)
is not utf-8 safe.

When I comment out preparsecode() in Post.php, I can write my message safely, but with preparsecode() I lose the whole message body.

I tested with 1.1 RC2, MySQL 5.0.16, Apache 2.0.55 and PHP 5.1.1 on NetBSD 3.0 macppc.

MySQL is compiled with utf8 as the default character set, and the database was created in utf8.

I also set $txt['lang_character_set'] = 'UTF-8'; in index.english.php
(db: utf-8, php: utf-8 environment).

Here is my test message.
Quote
[쿠키 스포츠] ○...오른무릎 근육 부상으로 최대 보름간의 재활에 들어간 박지성(25·맨체스터 유나이티드)의 공백에 대해 영국 현지 언론이 깊은 우려를 나타냈다.

맨체스터 지역신문 '맨체스터 이브닝 뉴스'는 10일(한국시간) "박지성이 버튼 앨비언과의 FA컵 3라운드 경기를 앞두고 가진 워밍업 도중 입은 무릎 부상으로 당분간 경기에 나설 수 없게 됐다"며 "알렉스 퍼거슨 감독은 박지성의 공백이 길어지는 것을 결코 원하지 않고 있다"고 전했다.

이 신문은 "박지성이 지난 해 여름 400만파운드의 몸값을 기록하며 PSV 에인트호벤에서 이적해왔지만 아직까지는 팀에 정착해나가고 있는 단계"라며 "박지성이 출장한 29경기에서 14차례 교체멤버로 나왔고 자신이 유용한 대체 요원임을 입증해왔다"고 평가했다.

신문은 이어 "박지성이 버밍엄 시티와의 칼링컵 8강전에서 맨유 입단 후 첫 골을 기록했고,그런 이유로 퍼거슨 감독은 박지성 없는 '빡빡한 1월'을 원하지 않을 것"이라고 덧붙였다.

축구 전문 사이트 ESPN 사커넷도 맨유 코너에서 박지성이 부상으로 칼링컵 4강 1차전에 출장할 수 없다는 '맨체스터 이브닝 뉴스'의 보도를 비중있게 다뤘다.

살인적인 일정을 소화해야 하는 맨유 입장에선 쉴틈 없이 그라운드를 누비는 박지성의 공백이 크게 느껴질 수 밖에 없다. 최소 10일,최대 보름의 재활 진단을 받은 박지성이 어느 정도 컨디션을 회복한다면 예상보다 빨리 팀에 복귀할 가능성을 읽을 수 있는 대목이다. 국민일보 쿠키뉴스 조상운 기자
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: scripter on January 12, 2006, 01:38:10 AM
Hello,

I got the same thing with UTF-8 encoding, but I used Vietnamese.

It always attempts to change the character "à" to "ænbsp;"

I commented out the function preparsecode(), but it did not help.

See it here http://www.simplemachines.org/community/index.php?topic=65438.msg451703#new
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 12, 2006, 05:51:39 AM
This is interesting. It does the same to the Greek character Π.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: GravuTrad on January 12, 2006, 06:39:21 PM
We see a similar problem in the French translation with the email notifications. Could it be the same problem?

We get this in the mails:

Quote
Une r=E9ponse a =E9t=E9 post=E9e dans un fil de discussion que vous
surveillez par GravuTrad.

Voir la r=E9ponse au=A0: http://www.gravure-et-traductions.com/Testsmf/index.php=3Ftop
ic=3D1.new;topicseen#new
D=E9sabonnement =E0 ce fil de discussion en cliquant ici: http://www.gravure-et-traductions.com/Testsmf/index.php=3Fact
ion=3Dnotify;topic=3D1.0
D'autres r=E9ponses peuvent avoir =E9t=E9 post=E9es entre temps, mais vous
ne recevrez pas d'autre notification tant que vous ne lirez pas ce fil de
discussion.

Cordialement,
L'=E9quipe My Community.

As you can see, the accented characters are not displayed properly, though this differs between mail providers (for example, Caramail shows the mail as above, but Hotmail reads it fine...)

Could it be a consequence of the problem you are discussing in this topic?
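
The =E9 and =3D sequences in the quoted mail look like quoted-printable escapes, where each =XX stands for one byte. The sketch below decodes a fragment of that mail with PHP's quoted_printable_decode(); that the mail was sent as quoted-printable ISO-8859-1 is an assumption based on the byte values, since the mail headers are not shown in this topic.

```php
<?php
// A fragment in the style of the notification mail quoted above.
$raw = "Une r=E9ponse a =E9t=E9 post=E9e";

// quoted_printable_decode() turns each =XX escape back into one byte.
$decoded = quoted_printable_decode($raw);

// Byte 0xE9 is "é" in ISO-8859-1; on its own it is not valid UTF-8.
echo bin2hex(substr($decoded, 5, 1)), "\n"; // e9
```

A mail client normally does this decoding itself when the headers announce the transfer encoding, so raw escapes in one inbox but not in another may point at the headers or the client rather than at the post parsing discussed here.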
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 13, 2006, 10:23:07 AM
Do you use UTF encoding? These problems seem to appear only when using UTF.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: GravuTrad on January 13, 2006, 02:02:13 PM
ansi or utf-8, same problem...
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 16, 2006, 06:35:50 AM
Hello Compuart and Grudge,

Any updates on the UTF issue?

I do not mean to put any pressure on you, but I have a UTF portal on standby and I would prefer to use SMF rather than a different forum...

So, if possible, I would really appreciate it if you could give us an estimate of the time needed to release any relevant patches.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: scripter on January 16, 2006, 10:42:25 AM
Quote from: spiros on January 12, 2006, 05:51:39 AM
This is interesting. It does the same to the Greek character Π.

I found the problem.
This is where the error occurs when the function runs:

// Put it back together!
if (!$previewing)
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />', "\xA0" => '&nbsp;'));
else
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\xA0" => '&nbsp;'));


It tries to replace the byte \xA0 with the HTML non-breaking space entity &nbsp;

When I change it to a normal space (press the space bar) like this
"\xA0" => ' '

the problem is fixed.
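
Why this rule bites UTF-8 text can be shown with a few bytes. In UTF-8, "à" (U+00E0) is encoded as C3 A0 and the Greek "Π" (U+03A0) as CE A0, so both end in the byte A0 that the table replaces. The sketch below reuses the preview-branch table quoted above; the variable names are made up for this example.

```php
<?php
// The replacement table from the snippet above, preview branch.
$table = array('  ' => '&nbsp; ', "\xA0" => '&nbsp;');

// "à" (U+00E0) is the two bytes C3 A0 in UTF-8.
$a_grave = "\xC3\xA0";
// Greek capital pi (U+03A0) is CE A0.
$pi = "\xCE\xA0";

// strtr() works on bytes, so the trailing A0 is replaced on its own.
$broken_a = strtr($a_grave, $table);
$broken_pi = strtr($pi, $table);

echo bin2hex($broken_a), "\n";  // c3266e6273703b (C3 followed by "&nbsp;")
echo bin2hex($broken_pi), "\n"; // ce266e6273703b (CE followed by "&nbsp;")
```

In both cases the lead byte is left behind without its continuation byte, which is no longer valid UTF-8.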
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on January 16, 2006, 11:14:06 AM
scripter,

You mean you change the instances of

"\xA0" => '&nbsp;'

to

"\xA0" => ' '

I tried it. But it did not fix the problem with the Greek Π.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: Logue on January 17, 2006, 03:01:26 AM
The same problem occurs in Japanese:

だ -> nbsp;
頻 -> 馮bsp;?

// Put it back together!
if (!$previewing)
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />'));
else
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; '));


I think this is probably because the byte \xA0 is contained in some characters. After I deleted the \xA0 part, as above, the garbled characters stopped being generated, at least in the SQL data. (Thanks to Turkey)
http://forum.logue.tk/index.php/topic,95.0.html

Aren't all the languages that use kanji (Japanese, Chinese, Korean, etc.) hit by the same trouble?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Du! on January 31, 2006, 07:59:45 PM
I use UTF-8 and I have the same problem with the Russian "Р" on my SMF 1.1 RC2 forum.
Nothing above fixed it.
Any idea or update for this problem?
Thanks
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on February 01, 2006, 12:46:52 AM
Hi Du!

I noted this problem in http://www.simplemachines.org/community/index.php?topic=63235.msg440773#msg440773, pointing out that there might be a correlation.

I use the catalog of SMF's language-specific support boards as a multilingual test string, so it is by luck that I noticed this problem with Russian.

There might be problems with other languages too, but those must be found by people using them.

I also made a request for multilingual support with a simpler approach in Multilingual in SMF 1.1RC2 without UTF for Greek and other languages? (http://www.simplemachines.org/community/index.php?topic=64142.msg443254#msg443254) that might interest some of the Russian community, as I believe many use windows-1251 and not UTF-8.

Compuart is looking after these two cases but no solution has yet appeared.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: BadCluster on February 01, 2006, 05:45:52 AM
I noticed the following 2 problems in 1.1 RC2.

The 1st is about making categories and subforums with Greek characters. When I make the category, I can see the Greek with no problem at all, but when I try to edit the category, I see the Greek characters like this:

http://img227.imageshack.us/img227/4029/problem24zp.jpg

The 2nd problem has to do with searching in Greek. I did a search for the word "άθλημα" and I found results, only with the same signs as above appearing:

http://img227.imageshack.us/img227/1462/problem4xo.jpg
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: agridoc on February 01, 2006, 07:06:57 AM
Hi BadCluster

It is obvious that you are not using UTF-8 but the ISO-8859-1 (English) codepage, with English as the default or user language. You must install the Greek language. If you want Greek to work with English menus, replace ISO-8859-1 with windows-1253 or ISO-8859-7 in index.english.php, found in .../Themes/default/languages
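
As a sketch of the change described above: the variable name is the one quoted earlier in this topic for index.english.php, and the rest of that file is assumed to stay as it is.

```php
<?php
// In .../Themes/default/languages/index.english.php
// Before: $txt['lang_character_set'] = 'ISO-8859-1';
$txt['lang_character_set'] = 'windows-1253'; // or 'ISO-8859-7'
```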

If you want more on this, search the Greek language support board http://www.simplemachines.org/community/index.php?board=78.0 or post a new topic there, as it is outside this topic's subject.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: NTA on February 10, 2006, 09:44:27 AM
I have this problem too  :(

In RC2, the character "à" becomes "?", and when I edit the post, I see that "à" becomes "ænbsp;". Please fix this soon  :(
I'm using UTF-8 encoding.

I also found that if I use the quick edit button (http://www.simplemachines.org/community/Themes/newsite/images/icons/modify_inline.gif) to edit the post, this bug does not happen and everything is OK. Maybe this helps you find a fix sooner.

Best regards
NTA
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: SpooK on February 11, 2006, 06:33:53 AM
Simple and tested solution @ http://www.simplemachines.org/community/index.php?topic=70750.0
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: CalCal on February 27, 2006, 11:28:25 AM
This helped:

In file Subs-Post.php change

// Put it back together!
if (!$previewing)
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />', "\xA0" => '&nbsp;'));
else
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\xA0" => '&nbsp;'));


to

// Put it back together!
if (!$previewing)
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />'));
else
    $message = strtr(implode('', $parts), array('  ' => '&nbsp; '));


However, I have only tested this solution on localhost, because it may break something else. What do you (developers) think about this?
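
An alternative to deleting the rule outright, sketched here under assumptions and not taken from any SMF release: keep the non-breaking-space replacement but match the byte sequence that U+00A0 has in the charset in use. nbsp_table() and its $is_utf8 flag are hypothetical names for this illustration.

```php
<?php
// Hypothetical helper (not SMF code): build the replacement table for the
// charset in use, so the non-breaking-space rule only matches U+00A0 itself.
function nbsp_table($is_utf8, $previewing)
{
    // U+00A0 is the single byte A0 in ISO-8859-1 but C2 A0 in UTF-8.
    $nbsp = $is_utf8 ? "\xC2\xA0" : "\xA0";

    $table = array('  ' => '&nbsp; ', $nbsp => '&nbsp;');
    if (!$previewing)
        $table["\n"] = '<br />';

    return $table;
}

// "à", a space, "Π", a real UTF-8 no-break space, then "x".
$text = "\xC3\xA0 \xCE\xA0\xC2\xA0x";
$out = strtr($text, nbsp_table(true, true));

echo bin2hex($out), "\n"; // c3a020cea0266e6273703b78
```

In valid UTF-8 the pair C2 A0 can only be U+00A0, so "à" and "Π" pass through untouched while a real no-break space is still converted.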
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: eyespark on March 01, 2006, 05:54:29 PM
Thank you!!!!! That did it!!!  :P I tried all proposed solutions but only this one worked. Thanks

Quote from: CalCal on February 27, 2006, 11:28:25 AM
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: CalCal on March 04, 2006, 09:47:59 PM
It would be very useful if the SMF developers could say something about this "hack"... Is it safe to remove the thing I removed:

"\xA0" => '&nbsp;'


???
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: NTA on March 05, 2006, 02:19:52 AM
Quote from: CalCal on February 27, 2006, 11:28:25 AM

Thanks!  :D My problem is fixed now  :P
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: spiros on March 05, 2006, 09:41:50 AM
As far as I know, in the latest build, the UTF problems have been fixed. Just wait for the SMF 1.1 final.
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: quocnht on March 29, 2006, 01:16:48 PM
http://forum.logue.tk/index.php/topic,98.msg631.html#new

Hope this helps!
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: n3452323 on April 27, 2006, 05:31:58 AM
Quote from: CalCal on February 27, 2006, 11:28:25 AM

Thanks, I made the change and it works on my site (http://yohosting.net/forrum);
even the news is fixed as well.

Once again, thanks
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: n3452323 on May 01, 2006, 03:51:29 PM
But I found another problem: it is OK for short messages only. If I post a long message, the error happens again and every 'à' letter becomes 'a?'

Can anyone help?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: Panagioths on May 08, 2006, 10:28:43 AM
Any fixes on Greek Character 'Π' problem?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 RC2
Post by: RMIT on May 10, 2006, 08:46:40 AM
Quote from: CalCal on February 27, 2006, 11:28:25 AM

I tried this and the result is good except for one problem: if your post has any link in it (without the link BBCode tag), the character "à" becomes "?" as before patching. So, what can I do now?
Title: Re: Wrong parsing of UTF characters in board names categories and posts in 1.1 R
Post by: n3452323 on May 12, 2006, 06:02:44 AM
A tip (not so nice, and only for temporary use while waiting for the 1.1 final release) that resolves the 'à' becoming '?' problem:
turn OFF the option Automatically link posted URLs

( admin -> Manage Posts and Topics -> Bulletin Board Code Settings )


I think the problem is caused by this feature.
demo (http://yohosting.net/foum)