[Yes, I have downloaded the 2 latest fix files]
Initially I thought something had gone wrong with the upgrade, so I deleted everything and did a clean install.
It seems that non-Latin characters are corrupted in board names, categories and posts, although in a rather inconsistent way! For example:
Χιο�?μο�? > Γενικά
http://www.nonsmokersclub.com/forum/index.php/topic,4.0.html
I'll make a note of this for Compuart; he deals with charset problems (although he may not realise this :P)
Thanks Grudge, I consider UTF compatibility an absolutely major issue. Thank God I did not try the upgrade/install of RC2 on my language site.
You could just as well tell him to have a look at this post (http://www.simplemachines.org/community/index.php?topic=63146.msg437576#msg437576), where there are some good solutions to the older, minor issues.
This is what I get when I click on my messages:
Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
File: /home/free/public_html/forum/Sources/PersonalMessage.php
Line: 380
FROM {$db_prefix}pm_recipients AS pmr
WHERE pmr.ID_MEMBER = $ID_MEMBER
AND pmr.deleted = 0$labelQuery", __FILE__, __LINE__);
list ($max_messages) = mysql_fetch_row($request);
mysql_free_result($request);
My database is a UTF-8 one and the collation of all tables is utf8_general_ci.
At the end of the tables (in phpmyadmin) I can see:
117 table(s) Sum 2,105 -- latin1_swedish_ci 610.0 KB
This is the same as in my previous installation (RC1), where no problems occurred.
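In that error, COERCIBLE marks a string literal and IMPLICIT marks a column. MySQL gives a literal the collation of the connection (character_set_connection / collation_connection), not the collation of the table, so utf8 tables alone do not prevent the clash; presumably SMF's connection is still on the latin1 default. The contents of `$labelQuery` are not shown above, so this sketch assumes it is a FIND_IN_SET() test on `pmr.labels` with a numeric label ID, and the helper name is hypothetical. It tags the literal with an explicit utf8 introducer and collation:

```php
<?php
// Hypothetical helper (not SMF code): build the FIND_IN_SET() condition with a
// literal that is explicitly utf8, instead of one that inherits the
// connection's latin1_swedish_ci collation.
function label_condition(int $labelId, string $column = 'pmr.labels'): string
{
    // _utf8'...' is a character set introducer; COLLATE gives the literal an
    // explicit collation that matches the utf8_general_ci column.
    return "FIND_IN_SET(_utf8'" . $labelId . "' COLLATE utf8_general_ci, " . $column . ')';
}

echo label_condition(-1), "\n"; // FIND_IN_SET(_utf8'-1' COLLATE utf8_general_ci, pmr.labels)
```

The other route is to send `SET NAMES utf8` once after connecting, so every literal arrives as utf8. On a board whose existing text was written over a latin1 connection, that also changes how stored text is read back, so it has to go together with a data conversion.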
Are all columns in that table the same collation, especially the labels column?
Yes, here you go
Field      Type          Collation        Attributes  Null  Default
ID_PM      int(10)                        UNSIGNED    No    0
ID_MEMBER  mediumint(8)                   UNSIGNED    No    0
labels     varchar(60)   utf8_general_ci              No    -1
bcc        tinyint(3)                     UNSIGNED    No    0
is_read    tinyint(3)                     UNSIGNED    No    0
deleted    tinyint(3)                     UNSIGNED    No    0
Is the table itself also utf8_general_ci? I know the database is, and the column, but what about the actual table?
Grudge,
All tables are. I have checked this, even made an SQL dump to have a closer look.
Table                     Records  Type    Collation          Size     Overhead
...                       2        MyISAM  utf8_general_ci    4.1 KB   -
smf_attachments           0        MyISAM  utf8_general_ci    1.0 KB   -
smf_ban_groups            0        MyISAM  utf8_general_ci    1.0 KB   -
smf_ban_items             0        MyISAM  utf8_general_ci    1.0 KB   -
smf_board_permissions     69       MyISAM  utf8_general_ci    5.4 KB   -
smf_boards                4        MyISAM  utf8_general_ci    6.6 KB   -
smf_calendar              0        MyISAM  utf8_general_ci    1.0 KB   -
smf_calendar_holidays     167      MyISAM  utf8_general_ci    9.0 KB   -
smf_categories            1        MyISAM  utf8_general_ci    2.0 KB   -
smf_collapsed_categories  0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_actions           0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_activity          1        MyISAM  utf8_general_ci    4.0 KB   -
smf_log_banned            0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_boards            3        MyISAM  utf8_general_ci    2.0 KB   -
smf_log_errors            31       MyISAM  utf8_general_ci    9.4 KB   -
smf_log_floodcontrol      1        MyISAM  utf8_general_ci    2.1 KB   -
smf_log_karma             0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_mark_read         0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_notify            0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_online            2        MyISAM  utf8_general_ci    5.1 KB   648 Bytes
smf_log_polls             0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_search_messages   0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_search_results    0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_search_subjects   15       MyISAM  utf8_general_ci    3.4 KB   -
smf_log_search_topics     0        MyISAM  utf8_general_ci    1.0 KB   -
smf_log_topics            4        MyISAM  utf8_general_ci    2.0 KB   -
smf_membergroups          8        MyISAM  utf8_general_ci    3.3 KB   -
smf_members               1        MyISAM  utf8_general_ci    10.2 KB  -
smf_message_icons         12       MyISAM  utf8_general_ci    3.3 KB   -
smf_messages              4        MyISAM  utf8_general_ci    12.3 KB  -
smf_moderators            0        MyISAM  utf8_general_ci    1.0 KB   -
smf_package_servers       1        MyISAM  utf8_general_ci    2.1 KB   -
smf_permissions           39       MyISAM  utf8_general_ci    2.9 KB   -
smf_personal_messages     0        MyISAM  utf8_general_ci    1.0 KB   -
smf_pm_recipients         0        MyISAM  utf8_general_ci    1.0 KB   -
smf_poll_choices          0        MyISAM  utf8_general_ci    1.0 KB   -
smf_polls                 0        MyISAM  utf8_general_ci    1.0 KB   -
smf_sessions              5        MyISAM  utf8_general_ci    7.6 KB   3,296 Bytes
smf_settings              163      MyISAM  utf8_general_ci    11.6 KB  -
smf_smileys               19       MyISAM  utf8_general_ci    2.5 KB   -
smf_themes                44       MyISAM  utf8_general_ci    4.5 KB   -
smf_topics                4        MyISAM  utf8_general_ci    7.1 KB   -
Sum (117 tables)          2,106    --      latin1_swedish_ci  610.3 KB 4.6 KB
In fact, the only thing latin in the SQL dump was the actual error
INSERT INTO `smf_log_errors` VALUES (27, 1136129606, 1, '87.202.118.46', '?action=pm', 'Database Error: Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation ''find_in_set''<br />File: /home/free/public_html/forum/Sources/PersonalMessage.php<br />Line: 380', '4a8937e39e6dd5d9610fcb06f935510a');
OK, going out on a limb really, what does this do:
ALTER TABLE smf_pm_recipients CHANGE labels labels VARCHAR(60) NOT NULL DEFAULT -1
That should use the default MySQL collation stuff, that may bring it back in line?
I run it, nothing changed. The error still appears.
Field      Type          Collation        Attributes  Null  Default
ID_PM      int(10)                        UNSIGNED    No    0
ID_MEMBER  mediumint(8)                   UNSIGNED    No    0
labels     varchar(60)   utf8_general_ci              No    -1
bcc        tinyint(3)                     UNSIGNED    No    0
is_read    tinyint(3)                     UNSIGNED    No    0
deleted    tinyint(3)                     UNSIGNED    No    0
Is this your own server? If so can you confirm that my.ini has:
default-character-set=utf8
character_set_server=utf8
collation_server=utf8_general_ci
In it?
You may find this useful:
http://www.simplemachines.org/community/index.php?topic=27367.msg211371#msg211371
I ran the command
show variables like 'colla%';
in phpMyAdmin and got this:
Variable_name Value
collation_connection utf8_unicode_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci
Funnily enough it was all the same in RC1 where there were no problems.
I checked the messages in phpmyadmin and they appear very strange:
In one case entities:
& # 913 ; α Β β Γ γ<br />Δ δ Ε ε Ζ ζ<br />Η η Θ θ
(although they do not appear as entities in the html!)
And in another
Î'Ï...Ï,,ÏŒ είναι Îνα νÎο θÎμα Î'Ï...Ï,,ÏŒ είναι Îνα νÎο θÎμα Î'Ï...Ï,,ÏŒ εÎ
I assume that is in some way the problem. I'm just a little hesitant to tell you to change it, as I'm really not sure what the right way is :/
Grudge,
I doubt this is the problem (at least as far as character corruption is concerned) because with the exact same configuration, the exact same server, and the exact same database, RC1 had no problems at all.
At the same time, I also run other CMSs on the same server, and they have no problems whatsoever with Unicode.
So I think you might need to do some more work on your side ;)
Believe you me, I haven't forgotten this. It's top of my priority list, and once I get to speak to Compuart I really hope we can sort this out.
Thank you Grudge,
Much appreciated. To prove what I am saying, I did another installation of RC1, this time on the same server and in the same database. I even left the default table collation as latin1_swedish_ci.
Despite all that:
1) Unicode worked perfectly
2) The collation error did not appear in pm
see for yourself this test post
http://www.nonsmokersclub.com/rc1/index.php?topic=2.0
Table                  Records  Type    Collation          Size     Overhead
jos_weblinks           2        MyISAM  utf8_general_ci    4.1 KB   -
rc1_attachments        0        MyISAM  latin1_swedish_ci  1.0 KB   -
rc1_ban_groups         0        MyISAM  latin1_swedish_ci  1.0 KB   -
rc1_ban_items          0        MyISAM  latin1_swedish_ci  1.0 KB   -
rc1_board_permissions  69       MyISAM  latin1_swedish_ci  5.4 KB   -
rc1_boards             1        MyISAM  latin1_swedish_ci  6.1 KB   -
rc1_calendar           0        MyISAM  latin1_swedish_ci  1.0 KB   -
I changed my my.cnf (the Linux counterpart of my.ini) as you suggested:
default-character-set=utf8
character_set_server=utf8
collation_server=utf8_general_ci
The result:
1) It messed up most of my database-driven sites (even UTF-8 ones), making non-Latin characters appear as question marks.
2) It did not fix the way non-Latin characters were displayed in RC2.
I rolled back the changes and everything was back to normal :)
So I guess this, too, is not an appropriate option.
Hi, I also ran a test in the forum that spiros linked.
http://www.nonsmokersclub.com/forum/index.php/topic,6.0.html
Other languages seem to be affected too, special characters seem to work OK.
There are also some oddities: preview works, but not with the Greek translation (it hangs), and in MSIE I see a font change when I switch to Greek, which is not noticeable in Mozilla/Firefox.
I am also Greek and interested in multilingual forums, although I use a different approach that works (not UTF-8), following spiros's suggestion to use a script change by [Unknown] http://www.simplemachines.org/community/index.php?topic=19572.msg160840#msg160840
I used a test string from a multilingual test http://www.aeromodelling.gr/ForumS/index.php?topic=44.msg2540#msg2540
I put the same string into a test RC2 install on my PC and it behaves as expected; the only problem is with special characters and the Greek codepage. However, it is not a UTF-8 installation. I will try to find some time to experiment with UTF-8 and will let you know if I find something useful.
agridoc,
It is a different case with UTF. To properly test it you need to download RC2 UTF Greek files - or if not available I can send them to you for testing.
In your test link, multilingual text works because it is converted into entities (standard SMF without Unicode). When we talk about Unicode, we mean not converting non-Latin scripts to entities.
spiros, I have quite a few things to learn about UTF-8.
I think I can test UTF for multilingual without the Greek UTF translation as the codepage is the same in all languages.
I would really appreciate it if you could email me the Greek UTF-8 1.1 RC2 translation, or PM me a link. I have seen your message saying that you sent the official SMF 1.1 RC2 translation to SMF, which will now have two versions, Greek and UTF-8. However, it might take a few days to appear in the download section.
Thank you for your work on SMF in Greek and for your help with multilingual support so far.
In fact, you do not even need the Greek files.
Do this:
1) Save the English language files as UTF-8 (especially index.english.php) using Notepad
2) Change the character set declared in index.english.php to UTF-8 ($txt['lang_character_set'] = 'UTF-8';)
3) Upload them
You are ready to test RC2 with UTF-8.
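One pitfall with step 1: Notepad's "Save as UTF-8" writes a byte-order mark (the bytes EF BB BF) at the very start of the file, and PHP sends those bytes to the browser before `<?php` is even reached, which can cause "headers already sent" errors. A minimal sketch of a check for a re-saved language file; the function names are hypothetical, the `$txt['lang_character_set']` setting is the one used for UTF-8 elsewhere in this thread, and ext-mbstring is assumed:

```php
<?php
// Hypothetical helpers (not part of SMF) for checking a language file such as
// index.english.php after it has been re-saved as UTF-8. Requires ext-mbstring.
const UTF8_BOM = "\xEF\xBB\xBF";

function check_language_file(string $contents): array
{
    $problems = [];

    // A leading BOM is output before any header() call can run.
    if (strncmp($contents, UTF8_BOM, 3) === 0) {
        $problems[] = 'starts with a UTF-8 BOM';
    }

    // After re-saving, the whole file must be valid UTF-8.
    if (!mb_check_encoding($contents, 'UTF-8')) {
        $problems[] = 'is not valid UTF-8';
    }

    // Step 2: the declared character set should be UTF-8.
    if (!preg_match('/\$txt\[\'lang_character_set\'\]\s*=\s*\'UTF-8\'/i', $contents)) {
        $problems[] = 'does not declare lang_character_set as UTF-8';
    }

    return $problems;
}

// Remove a leading BOM so the file can be written back without it.
function strip_bom(string $contents): string
{
    return strncmp($contents, UTF8_BOM, 3) === 0 ? substr($contents, 3) : $contents;
}
```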
The problem with collations is that before 4.0 MySQL didn't use them, between 4.0 and 4.1 MySQL had an intertwined version of character sets and collations, and from 4.1 on it cannot work properly without the collation being specified. If MySQL has been upgraded from an earlier version, the columns are likely configured with a latin1_general collation, while columns changed afterwards will likely get the default collation of the latin1 character set, which surprisingly is latin1_swedish_ci (I guess because MySQL is a Swedish company).
I'll see if I can write a tool to fix the column collations.
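One caution for such a tool, assuming text was written through a latin1 connection (as the garbled phpMyAdmin output in this thread suggests): the latin1 columns then already hold UTF-8 bytes, and `ALTER TABLE ... CONVERT TO CHARACTER SET utf8` would encode them a second time. A commonly used workaround is to pass each text column through a binary type, which relabels the bytes without converting them. A minimal sketch that only generates the SQL; the function is hypothetical, handles NOT NULL VARCHAR columns only, and the statements should be tried on a backup first:

```php
<?php
// Hypothetical helper (not an SMF tool): build the two ALTER TABLE statements
// that relabel a VARCHAR column as utf8 without converting its bytes.
// Assumes the column is NOT NULL and already holds UTF-8 byte sequences.
function utf8_relabel_column(string $table, string $column, int $length, string $default): array
{
    // Escape the default value for a single-quoted SQL literal.
    $default = "'" . str_replace(["\\", "'"], ["\\\\", "''"], $default) . "'";

    return [
        // Step 1: a binary type, so MySQL stops treating the bytes as latin1.
        "ALTER TABLE `$table` MODIFY `$column` VARBINARY($length) NOT NULL DEFAULT $default",
        // Step 2: back to text, now labelled utf8; the bytes themselves are kept.
        "ALTER TABLE `$table` MODIFY `$column` VARCHAR($length) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT $default",
    ];
}

foreach (utf8_relabel_column('smf_pm_recipients', 'labels', 60, '-1') as $sql) {
    echo $sql, ";\n";
}
```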
compuart,
I had changed the column collations to UTF-8. That is not the point: RC1 had no such problems with Unicode.
In fact, as the previous posts show, I installed RC1 in the same database, with latin1 columns, and it works perfectly with UTF-8!!!
Also, today I finished the Greek translation of minibb and it works perfectly with UTF-8 on the same database with latin columns!!!!
So I think it is mostly a matter of finding what broke between RC1 and RC2.
Have you tried a clean install of RC2? I assume it'll work just as well as RC1. The problem is most likely the upgrader changing some of the text columns, causing a collation mismatch.
Compuart,
If you read the thread carefully you will see that I tried
1) upgrade (failed)
2) clean install (failed)
I also tried 2 clean installs of RC1, with the same successful results.
It would be nice if I had some feedback from other people testing it with UTF-8.
I have tried two clean installs of 1.1 RC2 (RC2 charter and RC2 public) with UTF-8 on my PC, and there is a problem, although not as severe as in spiros's case, mainly with the capital "Π" (at first I thought the rectangles in MSIE were UTF-8 errors, then spiros revealed that they were polytonic Greek). It affects not only Greek but other languages too (Russian for sure). I don't know Russian; it was just obvious on a test page I use. :D
That's for spiros (in Greek)
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΦnbsp;ΡΣΤΥΦΧΨΩ
αβγδεζηθικλμνξοπρστυφχψω
άέήίόύώ
ΆΈΉΊΌΎΏ
Русский (Russian) RC2 - ISO-8859-1 (or 7)
Цnbsp;усский (Russian) RC2 UTF-8
(It's the title of the Russian SMF's support board)
I think many languages may have problems with UTF-8, since the problem is limited to certain characters and is hard to diagnose without knowing the language.
I didn't change the collation in MySQL; it's latin1_swedish_ci.
I have also to report a problem with special characters and UTF-8 in RC2, a few characters do not display correctly.
P.S. By the way, Compuart, please also take a look at Multilingual search in SMF's site? (http://www.simplemachines.org/community/index.php?topic=60277.0), which Grudge wanted to tell you about. Gri's messages there are a bit off-topic. ;D
Another clean install, new db, new everything. Again, same configuration, same problems, see sample post here:
http://www.nonsmokersclub.com/forums/index.php?topic=2.0
From this last test SMF RC2 displays a strong aversion to the Greek capital Π!!!!
What's wrong with that sample post? It looks like greek to me?
EDIT: Oh, I assume the numbers are not supposed to be there?!
EDIT2: Attached screen, it's odd that one character is an accent on one subject line, but square on the next.
Hmm, there seem to be two topics merged here, with two different problems. Are both problems still occurring (specifically in a clean 1.1 RC2 install)?
Quote from: spiros on January 01, 2006, 10:37:50 AM
This is what I get when I click on my messages:
Illegal mix of collations (latin1_swedish_ci,COERCIBLE) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
File: /home/free/public_html/forum/Sources/PersonalMessage.php
Line: 380
FROM {$db_prefix}pm_recipients AS pmr
WHERE pmr.ID_MEMBER = $ID_MEMBER
AND pmr.deleted = 0$labelQuery", __FILE__, __LINE__);
list ($max_messages) = mysql_fetch_row($request);
mysql_free_result($request);
and
Quote from: spiros on December 31, 2005, 07:03:12 PM
It seems that non-Latin characters are corrupted in board names, categories and posts, although in a rather inconsistent way! For example:
Χιο�?μο�? > Γενικά
http://www.nonsmokersclub.com/forum/index.php/topic,4.0.html
Hello Compuart,
With the new install I do not get the illegal collation thingy.
However, the real problem is not the square in the title (that can be fixed with a hack) but if you notice at the bottom right of your screenshot there is a Φnbsp which was actually a Π letter.
In fact, in Mozilla it does not display it as Φnbsp but as �.
Here is another test with polytonic Greek.
This is what it looks like in RC2:
http://www.nonsmokersclub.com/forums/index.php?topic=3.0
And this is what it looks like in RC1 (No problem here)
http://www.nonsmokersclub.com/rc1/index.php?topic=3.0
And here is the original text
http://www.mikrosapoplous.gr/syntipas.html
(use both mozilla and IE to see differences. Most of the boxes in IE would be eliminated if a unicode font was used in style sheet - not all though! Edit: I changed theme so these were sorted and you only get the Φnbsp; thingy replacing "Π". As I said, the main problematic character appears to be the capital Π!!!)
Oops, I made a typo in the alphabet; I have corrected the Greek alphabet display in my previous message.
With spiros's new clean installs, the problems are now the same as mine, with the capital "Π". He uses the UTF-8 collation; I use latin1.
I noticed that "Π" was displaying in a quoted word "Παρνασός". I tried this in my installation without success.
The Greek "Π" might have a relation with the Russian "Р" that seems also to have a problem with 1.1 RC2 UTF-8.
Spiros, I also noticed that greek search doesn't give results in your test forum http://www.nonsmokersclub.com/forums/index.php?topic=2.0, try it yourself.
Also, the special character "à" does not display in 1.1 RC2 with UTF-8.
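Π, Р and à do have something in common at the byte level: in UTF-8 each is two bytes, and the second byte is 0xA0, which on its own is the non-breaking space in ISO-8859-1. Any code that treats a lone "\xA0" byte as a non-breaking space will cut these characters in half. A short sketch that prints the bytes (the helper names are hypothetical; PHP 7+ for the "\u{...}" escapes):

```php
<?php
// Print the UTF-8 bytes of a string as hex pairs, e.g. "ce a0" for Π.
function utf8_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

// True if any byte equals 0xA0, the non-breaking space in ISO-8859-1.
function contains_a0_byte(string $s): bool
{
    return strpos($s, "\xA0") !== false;
}

// Greek capital Pi, Cyrillic capital Er, Latin small a-grave, Greek capital Alpha.
foreach (["\u{03A0}", "\u{0420}", "\u{00E0}", "\u{0391}"] as $ch) {
    printf("%s  %s  %s\n", $ch, utf8_bytes($ch), contains_a0_byte($ch) ? 'contains 0xA0' : 'no 0xA0');
}
```

Π (ce a0), Р (d0 a0) and à (c3 a0) all end in a0, while Α (ce 91) does not, which matches the pattern reported in this thread.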
Hmmm, you are correct: no search results for monotonic (plain) Greek, but in that post the characters were converted to entities for some reason!
Try this one:
http://www.nonsmokersclub.com/forums/index.php?topic=3.0
(polytonic works)
and this new monotonic one
http://www.nonsmokersclub.com/forums/index.php?topic=4.0
In fact, this is why you cannot search Greek in this forum. Since it uses latin charset, all characters are converted to entities, and this is why UTF is so bloody important!
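What "converted to entities" means in practice, sketched with PHP's mbstring functions (assumes UTF-8 input; the helper names are hypothetical): every Greek letter is stored as a numeric character reference, so a search for the letters themselves matches nothing unless the search converts them too.

```php
<?php
// Code points U+0080..U+10FFFF become &#N; (offset 0, full mask).
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Roughly what a non-UTF-8 board stores for Greek: Π becomes &#928;.
function to_entities(string $utf8): string
{
    return mb_encode_numericentity($utf8, NON_ASCII_MAP, 'UTF-8');
}

function from_entities(string $stored): string
{
    return mb_decode_numericentity($stored, NON_ASCII_MAP, 'UTF-8');
}

$word   = "\u{03A0}\u{03B1}\u{03C1}";              // "Παρ"
$stored = to_entities($word);                      // "&#928;&#945;&#961;"

var_dump(strpos($stored, $word));                  // bool(false): a raw search misses
var_dump(strpos(from_entities($stored), $word));   // int(0) after decoding
```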
I ran another test and changed the collation of all tables from latin1 to Unicode, and I had the same serious problems as in my previous two RC2 installations!!! When I rolled the changes back to latin1, the extra errors remained, but when I overwrote them with the same clean text, only the Π error remained.
The real problem though is this:
Unicode text looks like this in both RC1 and RC2 databases:
μή Ï,,οι χλιδῇ δοκεῖÏ,,ε μηδ᾽ αá½?θαδίᾳ
In the former case it displays fine in the browser; in the latter there are problems with the Π character.
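The database text above looks like UTF-8 bytes that were read one byte at a time as a Western single-byte charset and then encoded as UTF-8 again ("double encoding"). One plausible reading, not confirmed in the thread, is that SMF talks to MySQL over a latin1 connection while phpMyAdmin uses utf8: the browser gets its original bytes back, but phpMyAdmin shows every byte as a separate character. This sketch reproduces the effect with ISO-8859-1 (helper names are hypothetical); MySQL's latin1 is actually closer to Windows-1252, so a few symbols differ from the text above.

```php
<?php
// Read UTF-8 bytes as if they were ISO-8859-1 and encode them as UTF-8 again.
function double_encode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Undo it: map each character back to the single byte it came from.
function undo_double_encode(string $garbled): string
{
    return mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
}

$original = "\u{03BC}\u{03AE}";                   // "μή"
$garbled  = double_encode($original);

echo bin2hex($original), ' -> ', bin2hex($garbled), "\n"; // cebcceae -> c38ec2bcc38ec2ae
echo undo_double_encode($garbled) === $original ? "round trip ok\n" : "mismatch\n";
```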
Hi.
I've just posted a patch on the SMF Coding Discussion forum, but thought you guys might be interested in it too. You can find the link to the zip in this thread:
http://www.simplemachines.org/community/index.php?topic=63778.0
Note that the patch contains two fixes. You probably aren't interested in the e-mail fix; since it's disabled by default, it's harmless anyway.
Taka
Hello Taka,
The serious problems appeared in RC2, but you mention RC1. Have you read this thread?
Quote
http://hiko-ki.com/patch_20060102.zip
NOTE: The zip file contains modified 1.1 RC1 files.
Spiros, I changed my message; I had already noticed the polytonic search and the differences in search.
As far as Multilingual search in SMF's site (http://www.simplemachines.org/community/index.php?topic=60277.0) is concerned, I don't think this is the cause. Please read my messages there; let's discuss that topic in the proper place, and you can help there too with your experience. There is more than one way to succeed.
spiros,
Please don't get too annoyed over this. We *are* seriously looking into this and are attempting to fix it. Some of your posts are coming off as a little too aggressive. Please bear with us, as I'm sure you appreciate these things are quite difficult to emulate, and hence fix. In addition, it's a combination of many different factors complicating it further. I can assure you that you're not being ignored :)
Grudge
Quote from: spiros on January 02, 2006, 06:13:42 PM
The serious problems appeared to RC2 but you mention RC1. Have you read this thread?
There might be a new problem introduced by RC2, but RC1 has a couple of UTF-8 related problems. As Grudge mentioned, I think you are probably seeing a combination of different problems.
Speaking of RC2 with non Latin language, you can take a look at Logue's site:
http://forum.logue.tk/index.php
He upgraded to RC2 yesterday. As you can see, he also uses lots of non-Latin board names and topics. I'm still seeing some character corruption, though. I believe it's due to the same bug that existed in RC1.
In any case, I've sent my patch to Grudge and I expect him to look at what I've done in it. It should at least fix the rendering and posting problems that we've seen in RC1.
There might be a problem in database migration code (I'm writing this from 100% guessing). That needs to be fixed by dev team or some other volunteer since I haven't upgraded to RC2 yet.
taka,
We're looking at your code thanks. We are considering that upgrade may have affected things, we'll keep ploughing on ;)
Thanks,
Grudge
Grudge,
No problem :-)
I have a suggestion to the core development team. Would you pass it on?
I think it's time for the SMF project to switch the default charset from ISO-8859-1 to UTF-8. If you look at major web sites such as Google, Yahoo, and MSN, they are all using UTF-8 now. Once your product reaches a certain presence and more and more people around the world use it, UTF-8 makes things a lot easier than ISO-8859-1.
You may think it's too late to make such change, but if you delay one day, there will be more data in the database for sure.
Please consider using UTF-8 for 1.1 GM release.
Taka
Taka,
I wholeheartedly second your suggestion: I think unicode is the only way ahead for globalization. Joomla developers were smart enough to have core UTF support for 1.1. I guess people in SMF must have realized the importance of unicode to a certain extent.
We are here to test and report (at least myself - you can do more).
Grudge,
I am sorry if some of my posts came across as aggressive. They were not meant to be :)
Compuart,
I do not know if this helps, but I checked with phpmyadmin on the same db where I have installed Joomla and the unicode entries there look perfect, ie:
Πόσες φορές άραγε κάποιοι Έλληνες πολίτες (γύρω ... Έχοντας ζήσει στο εξωτερικό και έχοντας
Whereas SMF entries (both RC1 and RC2) look like this
μή Ï,,οι χλιδῇ δοκεῖÏ,,ε μηδ᾽...
Sorry to be a nuisance, but do we have any updates on the UTF front?
I'm still working on it. All changes have to be thoroughly tested, especially when it comes to character sets. Although we'll have better support for UTF-8, we'll also keep supporting other character sets. ISO-8859-1 will remain the default character set for most languages, but it will be possible to override the character set independently of the chosen language.
Ok Compuart, thanks for your feedback. If there is anything I can do to help in terms of testing please do not hesitate to contact me.
Thanks, I'll certainly contact you as soon as I'm no longer able to find any bugs when using UTF-8.
I started a new topic, Multilingual in SMF 1.1RC2 without UTF for Greek and other languages? (http://www.simplemachines.org/community/index.php?topic=64142.msg443254#msg443254), and I would like your opinion and support there too.
Quote
I don't want to argue against UTF-8 with this.
I want to report that a working multilingual solution exists in a previous SMF version without using UTF-8.
Quote
Please, Compuart, as you are focused on multilingual support, take a look at this solution too, and at [Unknown]'s patch. Spiros's help would be appreciated in this topic too.
Agridoc,
This is hardly a solution. In fact this forum works on this principle: converting all non-latin scripts to html entities (just use view source on this case - say the Greek board- and compare with view source in a UTF page - what do you see?). The only real solution is unicode. It is not by chance that google and other major internet companies use unicode instead of converting all non latin output to... entities! ;)
As far as working solutions go, RC1 was quite near full UTF support. A few hacks here and there, and all the minor character corruption problems in titles and Last Posts / Latest posts were solved.
This is not a quarrel between a UTF and a non-UTF solution. I think both should be developed.
Quote from: agridoc on January 04, 2006, 12:19:49 PM
SMF's team should consider all the trends and possibilities. Compuart's message (http://www.simplemachines.org/community/index.php?topic=63235.msg443044#msg443044) shows they take care of all this.
We can discuss the limitations and other things about the non-UTF solution in http://www.simplemachines.org/community/index.php?topic=64142.0
I will also try to help, if I can, in the development of the UTF solution here.
Quote from: spiros on January 04, 2006, 05:20:49 PM
This is hardly a solution. In fact this forum works on this principle: converting all non-latin scripts to html entities (just use view source on this case - say the Greek board- and compare with view source in a UTF page - what do you see?). The only real solution is unicode. It is not by chance that google and other major internet companies use unicode instead of converting all non latin output to... entities! ;)
Agreed. A solution with entities is too Euro-centric. It won't work for Asian languages. Technically, you can replace every character with its Unicode code point entity, but Unicode entities shouldn't be used that way.
BTW, another UTF-8 related bug: News.php misuses substr(), which only works with ISO-8859-X.
$row['body'] = strtr(substr(str_replace('<br />', "\n", $row['body']), 0, $modSettings['xmlnews_maxlen'] - 3), array("\n" => '<br />')) . '...';
Because of this code, RSS feed doesn't show body properly. It's chopped off in the middle of UTF-8 sequence. This substr should be replaced by mb_substr if it's available. That solved my problem.
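Taka's fix, sketched as a helper (the function name is hypothetical; `$maxlen` stands for `$modSettings['xmlnews_maxlen']`). It follows the quoted line step by step, but counts characters with mb_strlen()/mb_substr() when mbstring is loaded and falls back to the byte-based substr() otherwise. Unlike the quoted line, it only appends "..." when something was actually cut; that check is an addition of this sketch.

```php
<?php
// Hypothetical helper modelled on the News.php line above: shorten a post body
// for the RSS feed without splitting a UTF-8 sequence.
function truncate_body(string $body, int $maxlen): string
{
    // Work on newlines so that '<br />' is never cut in half.
    $text = str_replace('<br />', "\n", $body);

    if (function_exists('mb_substr')) {
        // Character-based: "Π" counts as one character, not two bytes.
        if (mb_strlen($text, 'UTF-8') <= $maxlen) {
            return $body;
        }
        $text = mb_substr($text, 0, $maxlen - 3, 'UTF-8');
    } else {
        // Byte-based fallback, same as the original substr() call.
        if (strlen($text) <= $maxlen) {
            return $body;
        }
        $text = substr($text, 0, $maxlen - 3);
    }

    return strtr($text, ["\n" => '<br />']) . '...';
}

echo truncate_body(str_repeat("\u{03A0}", 10), 5), "\n"; // ΠΠ...
```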
Taka,
Thanks for another excellent solution!
I hope Compuart has noted this.
What about this?
http://www.simplemachines.org/community/index.php?topic=23509.0
I'm setting up a test server to try that solution, but I'm doubtful...
mcgrelio,
Yes, this is the standard procedure for converting a non-UTF site to a UTF one. However, the point here is how SMF handles posting in a UTF environment.
Different kettle of fish ;)
Just another (strange) reason UTF-8 is recommended: in my case, I use a Greek encoding and Google indexes the text in my forum incorrectly. That is to say, it is indexed as if the encoding were Latin rather than Windows-1253, so search results show extended ASCII characters (gobbledygook).
I contacted Google support and they recommended switching to UTF-8. So I guess that, after all, the more topics are indexed correctly in other languages, the more publicity the forum using it gets, and SMF as a consequence.
Click for a detailed report on this problem and Google's reply (http://www.simplemachines.org/community/index.php?topic=21930.0).
I use Windows-1253 as the codepage in all the SMF languages I have installed (a cheat).
I did a search in Google for my site and I get 10.300 results (not all from the forum), correct Greek titles and text in the display.
Test it here (http://www.google.com.gr/search?as_q=&num=50&hl=el&newwindow=1&btnG=%CE%91%CE%BD%CE%B1%CE%B6%CE%AE%CF%84%CE%B7%CF%83%CE%B7+Google&as_epq=&as_oq=&as_eq=&lr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=any&as_dt=i&as_sitesearch=aeromodelling.gr).
This is a test search for a greek word in my site (http://www.google.com.gr/search?q=%CE%B1%CE%B5%CF%81%CE%BF%CE%BC%CE%BF%CE%BD%CF%84%CE%AD%CE%BB%CE%BF+site%3Aaeromodelling.gr&btnG=%CE%91%CE%BD%CE%B1%CE%B6%CE%AE%CF%84%CE%B7%CF%83%CE%B7&num=50&hl=el&newwindow=1&as_qdr=all). Seems to work again.
I don't argue against UTF-8; however, I prefer another solution that also works with SMF 1.0.x and that, I believe, can be made to work with 1.1 too, with help from SMF's support team. See Multilingual in SMF 1.1RC2 without UTF for Greek and other languages? (http://www.simplemachines.org/community/index.php?topic=64142.msg443254#msg443254)
Yes, I know, in some cases it does work, in others not!
The case with my site is like 5% of Greek posts are indexed correctly and the rest are not!
This is very strange because it is not a uniform behaviour and hence cannot be easily ascribed to a specific reason.
I have this problem too. When I used the upgrade package to upgrade from SMF 1.0.5 to 1.1 RC2, all messages, posts, board names and member names were lost after the conversion to UTF-8 :D (thank God I only tested on localhost). I found that it is not the database's problem; I think the conversion tool has a problem when working with UTF-8 encoding...
I'm looking forward to your solution :D
I found that void preparsecode(string &message, boolean previewing = false)
(Subs-Post.php)
is not UTF-8 safe.
When I comment out preparsecode() in Post.php, I can write my message safely,
but with preparsecode(..) I lose my whole message body.
I tested with 1.1 RC2 mysql 5.0.16, apache 2.0.55 , and php 5.1.1 on NetBSD 3.0 macppc.
mysql compiled with utf8 default character set, database created in utf8.
and $txt['lang_character_set'] = 'UTF-8'; in index.english.php
( db: utf-8, php: utf-8 environment)
here is my test message.
Quote
[쿠키 스포츠] ○...오른무릎 근육 부상으로 최대 보름간의 재활에 들어간 박지성(25·맨체스터 유나이티드)의 공백에 대해 영국 현지 언론이 깊은 우려를 나타냈다.
맨체스터 지역신문 '맨체스터 이브닝 뉴스'는 10일(한국시간) "박지성이 버튼 앨비언과의 FA컵 3라운드 경기를 앞두고 가진 워밍업 도중 입은 무릎 부상으로 당분간 경기에 나설 수 없게 됐다"며 "알렉스 퍼거슨 감독은 박지성의 공백이 길어지는 것을 결코 원하지 않고 있다"고 전했다.
이 신문은 "박지성이 지난 해 여름 400만파운드의 몸값을 기록하며 PSV 에인트호벤에서 이적해왔지만 아직까지는 팀에 정착해나가고 있는 단계"라며 "박지성이 출장한 29경기에서 14차례 교체멤버로 나왔고 자신이 유용한 대체 요원임을 입증해왔다"고 평가했다.
신문은 이어 "박지성이 버밍엄 시티와의 칼링컵 8강전에서 맨유 입단 후 첫 골을 기록했고,그런 이유로 퍼거슨 감독은 박지성 없는 '빡빡한 1월'을 원하지 않을 것"이라고 덧붙였다.
축구 전문 사이트 ESPN 사커넷도 맨유 코너에서 박지성이 부상으로 칼링컵 4강 1차전에 출장할 수 없다는 '맨체스터 이브닝 뉴스'의 보도를 비중있게 다뤘다.
살인적인 일정을 소화해야 하는 맨유 입장에선 쉴틈 없이 그라운드를 누비는 박지성의 공백이 크게 느껴질 수 밖에 없다. 최소 10일,최대 보름의 재활 진단을 받은 박지성이 어느 정도 컨디션을 회복한다면 예상보다 빨리 팀에 복귀할 가능성을 읽을 수 있는 대목이다. 국민일보 쿠키뉴스 조상운 기자
Hello,
I got the same thing with UTF-8 encoding, but with Vietnamese.
It always changes the character "à" to "ænbsp;".
I commented out the function preparsecode(), but it didn't help.
See it here http://www.simplemachines.org/community/index.php?topic=65438.msg451703#new
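Neither post pins down which part of preparsecode() loses the text, so the following is only one possible mechanism, stated as an assumption: once a multi-byte character has been cut in half (for example by the "\xA0" replacement discussed further down), the string is no longer valid UTF-8, and any PCRE call with the /u modifier rejects it and returns null instead of a string. Code that does not check for null then carries on with an empty message.

```php
<?php
// A Greek capital Pi (CE A0) after the lone-byte replacement "\xA0" => '&nbsp;':
// the lead byte 0xCE is left without its continuation byte.
$broken = strtr("\u{03A0}", ["\xA0" => '&nbsp;']);

var_dump(mb_check_encoding($broken, 'UTF-8'));       // bool(false)

// A UTF-8-aware regex (the /u modifier) refuses the whole subject...
$result = preg_replace('/\s+/u', ' ', $broken);
var_dump($result);                                    // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR);  // bool(true)

// ...and code that does not check for null carries on with an empty message.
$message = (string) $result;
var_dump($message === '');                            // bool(true)
```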
This is interesting. It does the same to the Greek character Π.
We are seeing a similar problem with email notifications in the French translation; could it be the same problem?
This is what we get in the emails:
Quote
Une r=E9ponse a =E9t=E9 post=E9e dans un fil de discussion que vous
surveillez par GravuTrad.
Voir la r=E9ponse au=A0: http://www.gravure-et-traductions.com/Testsmf/index.php=3Ftop
ic=3D1.new;topicseen#new
D=E9sabonnement =E0 ce fil de discussion en cliquant ici: http://www.gravure-et-traductions.com/Testsmf/index.php=3Fact
ion=3Dnotify;topic=3D1.0
D'autres r=E9ponses peuvent avoir =E9t=E9 post=E9es entre temps, mais vous
ne recevrez pas d'autre notification tant que vous ne lirez pas ce fil de
discussion.
Cordialement,
L'=E9quipe My Community.
As you can see, the encoded characters are not displayed correctly, but the result differs between mail services (for example, Caramail shows the text above, while Hotmail displays it correctly).
Could it be related to the problem discussed in this thread?
Do you use UTF encoding? These problems seem to appear only when using UTF.
ANSI or UTF-8, same problem...
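The =E9 and =A0 sequences in that mail look like quoted-printable transfer encoding, in which =XX stands for the byte 0xXX; a mail client is expected to decode it according to the Content-Transfer-Encoding header. That it shows up with both ANSI and UTF-8, and only in some webmail services, suggests a header or client issue rather than the \xA0 problem, but that is an assumption, not something confirmed in this thread. A short Python check (not SMF code) that the underlying text is intact, assuming the mail body is ISO-8859-1:

```python
import quopri

# First line of the notification mail quoted above, exactly as received.
raw = b"Une r=E9ponse a =E9t=E9 post=E9e dans un fil de discussion que vous"

# quopri.decodestring() turns each =XX escape back into the byte 0xXX.
decoded = quopri.decodestring(raw)

# Assumption: the mail body is ISO-8859-1, where 0xE9 is "é".
text = decoded.decode("iso-8859-1")
print(text)

# "=A0" before the colon is the no-break space French typography puts there.
colon = quopri.decodestring(b"au=A0:").decode("iso-8859-1")
print(repr(colon))
```

Here the byte 0xA0 legitimately is a no-break space, because ISO-8859-1 is a single-byte charset.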
Hello Compuart and Grudge,
Any updates on the UTF issue?
I don't mean to put any pressure on you, but I have a UTF portal on standby and I would prefer to use SMF rather than another forum...
So, if possible, I would really appreciate an estimate of when any relevant patches will be released.
Quote from: spiros on January 12, 2006, 05:51:39 AM
This is interesting. It does the same to the Greek character Π.
I found the problem.
The error happens when this part of the function runs:
// Put it back together!
if (!$previewing)
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />', "\xA0" => '&nbsp;'));
else
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\xA0" => '&nbsp;'));
It tries to replace the byte \xA0 with the HTML entity for a non-breaking space (&nbsp;).
When I change it to a normal space (pressing the space bar), like this:
"\xA0" => ' '
the problem is fixed.
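Why replacing "\xA0" breaks some letters but not others can be seen at the byte level. In UTF-8 the no-break space itself is the two bytes C2 A0, but 0xA0 is also a common continuation byte: à is C3 A0, Greek Π is CE A0, Cyrillic Р is D0 A0 (assuming that is the Russian "P" reported further down), だ is E3 81 A0 and 頻 is E9 A0 BB. The Python sketch below is only an illustration; rc2_style_replace() and is_corrupted() are hypothetical helpers standing in for the strtr() call, not SMF's code.

```python
def rc2_style_replace(raw: bytes, replacement: bytes = b"&nbsp;") -> bytes:
    # Hypothetical stand-in for strtr($message, array("\xA0" => '&nbsp;')):
    # every 0xA0 byte is replaced, even when it is only the continuation
    # byte of a longer UTF-8 character.
    return raw.replace(b"\xa0", replacement)


def is_corrupted(ch: str, replacement: bytes = b"&nbsp;") -> bool:
    """True if the byte-level replacement leaves invalid UTF-8 behind."""
    mangled = rc2_style_replace(ch.encode("utf-8"), replacement)
    try:
        mangled.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# à, Π, Р, だ, 頻, é (written as escapes so the code points are exact)
for ch in "\u00e0\u03a0\u0420\u3060\u983b\u00e9":
    print(ch, ch.encode("utf-8").hex(" "), "corrupted:", is_corrupted(ch))
```

Replacing 0xA0 with a plain space instead of &nbsp; still splits these characters (CE A0 becomes CE 20), which would be consistent with the Π report below; only leaving other characters' bytes alone avoids the damage. Even the UTF-8 no-break space itself comes out as a stray C2 byte followed by &nbsp;.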
scripter,
So you changed every instance of
"\xA0" => '&nbsp;'
to
"\xA0" => ' '
I tried it, but it did not fix the problem with the Greek Π.
The same problem occurs in Japanese:
だ -> nbsp;
頻 -> 馮bsp;?
As a workaround I removed the \xA0 entry:
// Put it back together!
if (!$previewing)
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />'));
else
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; '));
I think this is probably because the byte \xA0 is part of some multibyte characters. After I deleted the \xA0 entry, garbled characters stopped being generated, at least in the SQL data. (Thanks to Turkey)
http://forum.logue.tk/index.php/topic,95.0.html
Wouldn't the same trouble hit other languages that use kanji (Japanese, Chinese, Korean, etc.)?
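The byte collision suggests the answer is yes, for Hangul as well as kanji. A small Python sketch lists which characters in a Unicode range contain the byte 0xA0 in their UTF-8 form and would therefore be damaged by the RC2 strtr() call. affected_chars() is a hypothetical helper; the ranges are the standard Unicode blocks for Hiragana, Hangul syllables and CJK Unified Ideographs.

```python
def affected_chars(start: int, end: int) -> list:
    """Characters in [start, end] whose UTF-8 form contains the byte 0xA0."""
    found = []
    for cp in range(start, end + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        if 0xA0 in chr(cp).encode("utf-8"):
            found.append(chr(cp))
    return found


hiragana = affected_chars(0x3040, 0x309F)
hangul = affected_chars(0xAC00, 0xD7A3)
cjk = affected_chars(0x4E00, 0x9FFF)

print("Hiragana:", "".join(hiragana))
print("Hangul syllables:", len(hangul), "of", 0xD7A3 - 0xAC00 + 1)
print("CJK ideographs:", len(cjk), "of", 0x9FFF - 0x4E00 + 1)
```

In the Hiragana block only だ (U+3060) is affected, which matches the だ example above.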
I use UTF-8 and I have the same problem with the Russian "P" on my SMF 1.1 RC2 forum.
Nothing above fixed it.
Any ideas or updates on this problem?
Thanks
Hi Du!
I noted this problem in http://www.simplemachines.org/community/index.php?topic=63235.msg440773#msg440773, pointing out that there might be a correlation.
As a multilingual test string I use the list of SMF's language-specific support boards, so I noticed this problem with Russian by luck.
There might be problems with other languages too, but those must be found by the people who use them.
I also posted a request for a simpler multilingual approach in Multilingual in SMF 1.1RC2 without UTF for Greek and other languages? (http://www.simplemachines.org/community/index.php?topic=64142.msg443254#msg443254), which might interest some of the Russian community, as I believe many use windows-1251 rather than UTF-8.
Compuart is looking into these two cases, but no solution has appeared yet.
I noticed the following two problems in 1.1 RC2.
The first is about creating categories and subforums with Greek text. When I create the category, the Greek displays without any problem, but when I try to edit the category, the Greek text looks like this:
http://img227.imageshack.us/img227/4029/problem24zp.jpg
The second problem has to do with searching for Greek text. I searched for the word "άθλημα" and the results showed the same symbols as above:
http://img227.imageshack.us/img227/1462/problem4xo.jpg
Hi BadCluster
It looks like you are not using UTF-8 but the ISO-8859-1 (English) codepage, with English as the default or user language. You need to install the Greek language pack. If you want Greek to work with English menus, replace ISO-8859-1 with windows-1253 or ISO-8859-7 in index.english.php, found in .../Themes/default/languages
If you want more on this, search the Greek language support board http://www.simplemachines.org/community/index.php?board=78.0 or post a new topic there, as it is outside this topic's subject.
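A rough illustration of why the codepage matters, in Python rather than PHP: ISO-8859-1 has no slots for Greek letters at all, while windows-1253 and ISO-8859-7 store each of them in one byte. When a page's charset cannot hold a character, browsers typically submit it as a numeric character reference such as &#940; instead; that this is what ends up in the edit box and the search index here is an assumption, not something verified in this thread.

```python
# "άθλημα", the search term from the post above, written with escapes so
# the exact code points are unambiguous (ά is U+03AC).
word = "\u03ac\u03b8\u03bb\u03b7\u03bc\u03b1"

# ISO-8859-1 has no Greek letters, so a strict encode fails.
try:
    word.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print("fits in ISO-8859-1:", latin1_ok)

# windows-1253 and ISO-8859-7 store each Greek letter in a single byte.
greek_bytes = word.encode("windows-1253")
print("windows-1253:", greek_bytes.hex(" "))

# xmlcharrefreplace mimics the browser fallback to &#NNN; references.
as_refs = word.encode("iso-8859-1", errors="xmlcharrefreplace").decode("ascii")
print("as references:", as_refs)
```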
I have this problem too :(
In RC2, the character "à" becomes "?", and when I edit the post I see that "à" has become "ænbsp;". Please fix this soon :(
I'm using UTF-8 encoding.
I also found that if I use the quick edit button to edit the post, this bug does not happen and everything is OK. Maybe this helps you find a fix soon.
Best regards
NTA
A simple, tested solution: http://www.simplemachines.org/community/index.php?topic=70750.0
This helped:
In file Subs-Post.php change
// Put it back together!
if (!$previewing)
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />', "\xA0" => '&nbsp;'));
else
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\xA0" => '&nbsp;'));
to
// Put it back together!
if (!$previewing)
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; ', "\n" => '<br />'));
else
	$message = strtr(implode('', $parts), array('  ' => '&nbsp; '));
However, I have only tested this solution on localhost, since it might break something else. What do you (the developers) think about this?
Thank you!!!!! That did it!!! :P I tried all proposed solutions but only this one worked. Thanks
Quote from: CalCal on February 27, 2006, 11:28:25 AM
It would be very useful if the SMF developers could say something about this "hack"... Is it safe to remove what I removed:
"\xA0" => '&nbsp;'
???
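Whether the "\xA0" entry can be dropped safely is for the developers to say; its apparent purpose is to turn no-break spaces into &nbsp;, so removing it would presumably just leave them as raw characters. A charset-aware alternative keeps that conversion but only matches a whole no-break space. The sketch below is Python, and nbsp_to_entity() with its charset argument is hypothetical; the equivalent idea in PHP would be to use "\xC2\xA0" as the strtr() key when the forum charset is UTF-8 and "\xA0" only for single-byte charsets.

```python
NBSP_UTF8 = "\u00a0".encode("utf-8")  # the two bytes C2 A0


def nbsp_to_entity(raw: bytes, charset: str) -> bytes:
    """Replace no-break spaces with &nbsp; without splitting other characters.

    In single-byte charsets such as ISO-8859-1 or windows-1251 the byte
    0xA0 is the no-break space. In UTF-8 only the complete sequence C2 A0
    is; 0xC2 is never a continuation byte, so in valid UTF-8 this cannot
    match inside another character.
    """
    if charset.lower().replace("_", "-") in ("utf-8", "utf8"):
        return raw.replace(NBSP_UTF8, b"&nbsp;")
    return raw.replace(b"\xa0", b"&nbsp;")


sample = "\u00e0 \u03a0\u00a0\u0420".encode("utf-8")  # à, space, Π, no-break space, Р
print(nbsp_to_entity(sample, "UTF-8").decode("utf-8"))
```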
Quote from: CalCal on February 27, 2006, 11:28:25 AM
Thanks! :D My problem is fixed now :P
As far as I know, the UTF problems have been fixed in the latest build. Just wait for SMF 1.1 final.
http://forum.logue.tk/index.php/topic,98.msg631.html#new
Hope this helps!
Quote from: CalCal on February 27, 2006, 11:28:25 AM
Thanks, I made the change and it works on my site (http://yohosting.net/forrum);
even the news is fixed.
Once again, thanks.
But I found another problem: it only works for short messages. If I post a long message, the error happens again and every 'à' becomes 'a?'.
Can anyone help?
Any fixes for the Greek character 'Π' problem?
Quote from: CalCal on February 27, 2006, 11:28:25 AM
I tried this and the result is good except for one problem: if a post contains a link (without the link BBCode tag), the character "à" becomes "?", as it did before the patch. So, what can I do now?
A workaround (not very nice, and only meant to be temporary while waiting for the 1.1 final release) that stops 'à' from becoming '?' is to turn OFF the option "Automatically link posted URLs"
(Admin -> Manage Posts and Topics -> Bulletin Board Code Settings).
I think the problem is caused by this feature.
Demo: http://yohosting.net/foum