Author Topic: "shorten_subject" and UTF-8 problem in SMF 1.1 RC3  (Read 8837 times)

Offline badbuta

  • Newbie
  • *
  • Posts: 4
"shorten_subject" and UTF-8 problem in SMF 1.1 RC3
« on: August 23, 2006, 06:51:33 AM »
I just upgraded my site to RC3 with Bridge 1.1.6.
I found that the shorten_subject function does not work correctly for UTF-8 (Chinese) characters.
It shows a strange character at the end of the shortened subject: "�..."
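
My guess at the cause: the subject is being cut at a byte offset that falls inside a multi-byte UTF-8 character, which leaves an incomplete sequence that the browser renders as "�". Here is a minimal sketch of that effect. It is not SMF's actual code; the sample string and cut lengths are made up for illustration, and it assumes the mbstring extension is available and the file is saved as UTF-8:

```php
<?php
// Hypothetical sample subject (not from SMF): 4 Chinese characters.
// Each of these characters takes 3 bytes in UTF-8, so 12 bytes in total.
$subject = "中文標題";

// Byte-based truncation: 4 bytes = one whole character plus 1 stray byte.
$byte_cut = substr($subject, 0, 4);

// Character-based truncation with an explicit encoding: 2 whole characters.
$char_cut = mb_substr($subject, 0, 2, 'UTF-8');

// mb_check_encoding() reports whether a string is valid in the given encoding.
var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // bool(false): broken sequence
var_dump(mb_check_encoding($char_cut, 'UTF-8')); // bool(true): intact
```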

Please have a look at my site: [nonactive]

Besides, I tried modifying "src/load.php @#192" based on another fix for RC2, but it failed. I changed:
Code: [Select]
return strlen(preg_replace.....
to:
Code: [Select]
return mb_strlen(preg_replace.....
Any suggestions?

Thank you very much.

Offline Compuart

  • SMF Friend
  • SMF Hero
  • *
  • Posts: 5,774
  • Gender: Male
  • Zéeeeal
    • RealityNet
Re: "shorten_subject" and UTF-8 problem in SMF 1.1 RC3
« Reply #1 on: August 24, 2006, 08:06:28 AM »
Which character set are you using?
Hendrik Jan Visser
Former Lead Developer & Co-founder

Offline badbuta

Re: "shorten_subject" and UTF-8 problem in SMF 1.1 RC3
« Reply #2 on: August 26, 2006, 12:06:36 AM »
Sorry, I don't know how to find out precisely which character set you mean.
I am using Lunarpages hosting, which provides phpMyAdmin. The SQL server is MySQL 4.0.25-standard. The only character set information I can find is in phpMyAdmin:

Code: [Select]
Server variables and settings:

character set: latin1

character sets: latin1 big5 czech euc_kr gb2312 gbk latin1_de sjis tis620 ujis dec8 dos german1 hp8 koi8_ru latin2 swe7 usa7 cp1251 danish hebrew win1251 estonia hungarian koi8_ukr win1251ukr greek win1250 croat cp1257 latin5

The problem appeared after I upgraded from RC2 to RC3. I remember making some changes to the code and the database, but I have forgotten the details....  :-[

So, would you mind telling me how to get the information you need? For example, how to find the character set information in phpMyAdmin.

I am using Joomla 1.0.10.  Joomla and SMF are sharing the same database.
« Last Edit: August 26, 2006, 12:12:29 AM by badbuta »

Offline badbuta

Re: "shorten_subject" and UTF-8 problem in SMF 1.1 RC3
« Reply #3 on: September 01, 2006, 01:17:59 AM »
I finally retried and patched the source code: Load.php
The key: I had missed setting mb_internal_encoding to UTF-8.
It is working now (at least~~), YEAH!!!  :D
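
As far as I understand it, mb_strlen() without an explicit encoding argument counts according to the current internal encoding, which is why the earlier change on its own made no difference. A small sketch of that behaviour, assuming the mbstring extension is loaded; the sample string is made up, and the single-byte encoding is set explicitly only to imitate a non-UTF-8 server default:

```php
<?php
// Hypothetical sample string: 4 Chinese characters, 12 bytes in UTF-8.
$subject = "中文標題";

// Imitate a server whose internal encoding is a single-byte one.
mb_internal_encoding('ISO-8859-1');
$before = mb_strlen($subject); // every byte is counted as one character

// With the internal encoding set to UTF-8, whole characters are counted.
mb_internal_encoding('UTF-8');
$after = mb_strlen($subject);

echo $before, ' ', $after, "\n"; // prints "12 4"
```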

Add a line that sets mb_internal_encoding after line #24, like this:

Code: [Select]
if (!defined('SMF'))
	die('Hacking attempt...');

mb_internal_encoding('UTF-8');

Then replace line ~#192. The first block below is the original line and the second is the replacement:

Code: [Select]
return strlen(preg_replace(\'~' . $ent_list . ($utf8 ? '|.~u' : '~') . '\', \'_\', ' . implode('$string', $ent_check) . '));'),
Code: [Select]
return mb_strlen(preg_replace(\'~' . $ent_list . ($utf8 ? '|.~u' : '~') . '\', \'_\', ' . implode('$string', $ent_check) . '));'),
I understand this is not the best (or a complete) solution. However, I still hope a new release will handle UTF-8 better ...
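
For anyone who would rather not change the global setting, here is a rough sketch of a character-aware shortening helper that passes the encoding explicitly. shorten_utf8() is a hypothetical name of my own, not an SMF function, and it assumes the mbstring extension is installed:

```php
<?php
// Hypothetical helper, not part of SMF: shorten a UTF-8 string to at most
// $max_chars characters and append "..." only when something was cut off.
function shorten_utf8($string, $max_chars)
{
    // Passing the encoding explicitly avoids relying on the global
    // mb_internal_encoding() setting.
    if (mb_strlen($string, 'UTF-8') <= $max_chars)
        return $string;

    // mb_substr() cuts on character boundaries, never inside a character.
    return mb_substr($string, 0, $max_chars, 'UTF-8') . '...';
}

echo shorten_utf8("中文標題測試", 4), "\n"; // 中文標題...
echo shorten_utf8("short", 10), "\n";      // short
```

The trade-off is that every call has to name the encoding, but the helper keeps working even if another script changes the internal encoding.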
« Last Edit: September 01, 2006, 02:07:38 AM by badbuta »