[3474] UTF-8 Handling

Started by ncrawler99, May 06, 2009, 02:34:08 PM

ncrawler99

In the example below, the UTF-8 text does not display properly: it is garbled and cut off. If you quote this post, use the print function, or edit it, the text displays properly. This bug is present in SMF 1.1.8 and 2.0 RC1. The problem starts at the '[' character in the post, and smileys must be enabled to reproduce it.

See this topic for more information on the specific code in 'Subs.php' that should be looked at:
http://www.simplemachines.org/community/index.php?topic=308539.0

On my server, I can get the text to display properly only if I use mbstring override functionality in php.ini on all string functions. SMF uses multi-byte unsafe functions and doesn't seem to handle UTF-8 'properly' at all.
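To make "multi-byte unsafe" concrete, here is a minimal sketch in Python (not SMF's PHP, and not code from Subs.php). The helper names `truncate_bytes` and `truncate_chars` are made up for illustration. It only shows the general failure mode: a function that counts bytes instead of characters can cut a UTF-8 sequence in half.

```python
# Illustrative only: hypothetical helpers, not SMF code.
text = "クリスタルを[LB]"          # 6 katakana/hiragana characters + 4 ASCII characters
raw = text.encode("utf-8")        # each Japanese character here takes 3 bytes

char_count = len(text)            # counts characters
byte_count = len(raw)             # counts bytes


def truncate_bytes(s: str, limit: int) -> bytes:
    """Cut after `limit` bytes, as a byte-oriented string function would."""
    return s.encode("utf-8")[:limit]


def truncate_chars(s: str, limit: int) -> str:
    """Cut after `limit` characters, as a multi-byte aware function would."""
    return s[:limit]


# A 4-byte cut keeps the first character and one stray byte of the second.
cut = truncate_bytes(text, 4)
try:
    cut.decode("utf-8")
    byte_cut_is_valid = True
except UnicodeDecodeError:
    byte_cut_is_valid = False

print(char_count, byte_count, byte_cut_is_valid, truncate_chars(text, 4))
```

The byte-oriented cut leaves an incomplete sequence, which is the kind of damage that shows up as garbled or cut-off text.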

Example of failure:

// へいし「さあ おとなしく クリスタルを[LB]
//  わたすのだ![LB]
// ちょうろう「ワシらが いったい なにを[LB]
//  したというのじゃ......[LB]
// 『いのちが おしければ おとなしく クリスタルを わたすのだ.[LB]
// くろまどうし「ことわる![LB]
// [LB]
// 『しかたない...かかれ...![End]

// くろまどうし「うわ―っ![End]

// しろまどうし「やめてください![LB]
// へいし「じゃまだてするか![End]

// ちょうろう「なんということを......[LB]
//  わかった......[LB]
//  クリスタルを もってゆくがよい......[LB]
// [LB]
// へいし「さいしょから すなおに[LB]
//  わたせば よいものを![End]

// ちょうろう「なぜじゃ! あのバロンおうが[LB]
//  こんなやりかたを するなぞ......[LB]
//  なにゆえ そなたらは こうまでして[LB]
//  クリスタルを もとめる!?[LB]
// 『...[End]

// しろまどうし「キャ―ッ![End]

// へいし「われわれ あかいつばさは[LB]
//  ほこりたかき ひくうていだん![LB]
//  かよわいものから りゃくだつなど![End]

// 『やめるんだ![End]

Spaceman-Spiff

Replied in your other topic.

Özgür

So Long

ncrawler99

Yes. There's also a bug when handling 4-byte Unicode characters. It fails to store a subject when using Unicode characters at or above U+10000 (the ones that take four bytes in UTF-8). I can provide additional details if any devs want them.
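For anyone who wants to check whether a subject contains such characters, here is a small Python sketch. The function names `utf8_width` and `has_supplementary` are hypothetical, and this says nothing about where in SMF the subject is actually lost; it only identifies the characters in question.

```python
# Illustrative only: hypothetical helpers for spotting 4-byte UTF-8 characters.
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def has_supplementary(s: str) -> bool:
    """True if any character lies at or above U+10000 (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in s)


widths = {
    "A": utf8_width("A"),                    # ASCII
    "Ö": utf8_width("Ö"),                    # Latin-1 range
    "へ": utf8_width("へ"),                   # hiragana, inside the BMP
    "\U0001F600": utf8_width("\U0001F600"),  # U+1F600, outside the BMP
}

print(widths)
print(has_supplementary("へいし"), has_supplementary("subject \U0001F600"))
```

A subject made only of characters like the Japanese sample stays within three bytes per character, so it would not trigger this particular problem.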

It seems SMF leaves a bit to be desired with UTF-8 handling in several cases. I'd be happy to continue to provide information on the matter, but it seems so far, no devs have cared to comment.  :'(

Özgür

Yeah, I had this issue, but it is solved now. I also have the subject error, and I can't fix that one.
I have tried everything:
- Downloaded my database (48 MB) and converted all non-UTF-8 characters to UTF-8.
- Replaced all SMF files (Sources, Themes, index.php and SSI.php) with the ones from the 2.0 RC1 package.
- And so on.
Maybe once your problem is solved, mine will be solved too.

SleePy

Özgür,

You said you fixed this?

ncrawler99,

I believe the multibyte workarounds were added to cope with hosts that do not have this set up properly, which would otherwise keep UTF-8 from working at all.

Does this problem exist on this forum? You can use the test board to see if it works.
Jeremy D ~ Site Team / SMF Developer ~ GitHub Profile ~ Join us on IRC @ Libera.chat/#smf ~ Support the SMF Support team!

Özgür

Quote from: SleePy on May 28, 2009, 07:00:48 PM
Özgür,

You said you fixed this?

ncrawler99,

I believe the multibyte workarounds were added to cope with hosts that do not have this set up properly, which would otherwise keep UTF-8 from working at all.

Does this problem exist on this forum? You can use the test board to see if it works.
Yeah, I used this method:
Quote from: Spaceman-Spiff on May 06, 2009, 07:41:30 PM
Yes, it seems to be smiley problem.

In v1.1.8, changing this line in Subs.php seems to fix it in my testing forum:

Search for:
            $smileyfromcache[] = '/(?<=[>:\?\.\s' . $non_breaking_space . '[\]()*\\\;]|^)(' . preg_quote($smileysfrom[$i], '/') . '|' . preg_quote(htmlspecialchars($smileysfrom[$i], ENT_QUOTES), '/') . ')(?=[^[:alpha:]0-9]|$)/' . ($context['utf8'] ? 'u' : '');

Replace with:
            $smileyfromcache[] = '/(?<=[>:\?\.\s' . $non_breaking_space . '[\]()*\\\;]|^)(' . preg_quote($smileysfrom[$i], '/') . '|' . preg_quote(htmlspecialchars($smileysfrom[$i], ENT_QUOTES), '/') . ')(?=[^[:alpha:]0-9]|$)/';

Basically it removes the /u (PCRE_UTF8). Now, I'm not an expert in regex, so I can't tell you if removing this may have any side effects. Maybe someone else can explain better.

The 2.0 code is different here, but if you remove ' . ($context['utf8'] ? 'u' : '') from the smileyparse function, the issue is solved.
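On the question of side effects: without the u modifier, PCRE treats the subject as a sequence of bytes rather than UTF-8 characters. The sketch below is only an analogy in Python, whose `re` module is not PCRE; it compares matching on a `str` (character by character) with matching on the encoded `bytes` (byte by byte). The pattern is a made-up example, not the smiley regex.

```python
import re

# Illustrative only: Python's re module stands in for PCRE with and without /u.
sample = "へいし[LB]"
data = sample.encode("utf-8")

# Character mode: every match is one whole character.
chars = re.findall(r"[^\[\]A-Z]", sample)

# Byte mode: every match is a single byte, so each 3-byte character
# is seen as three separate "characters".
units = re.findall(rb"[^\[\]A-Z]", data)

print(len(chars), len(units))
```

So a pattern that relies on "one character" before or after a smiley may, in byte mode, look at a fragment of a multi-byte character instead. Whether that matters for the smiley pattern in Subs.php is not established in this thread.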

This site has this bug too.
You can see it here: http://www.simplemachines.org/community/index.php?topic=313953.new#new

SleePy

Thanks. Using that test message, I was able to confirm this in the latest code as well.

Bug #3474: UTF-8 mishandled for Smileys
