Converting to utf-8

Started by dxyy, November 26, 2008, 04:39:34 AM


dxyy

I have a forum on which both English and Chinese are often used. Sometimes the Chinese characters were not displayed properly for me and many other members, but a few had no problems. We soon discovered that the problem seemed to lie in the character encoding used by each member. For example, members using Unicode had no problems viewing any of the Chinese characters, whereas other members could not see the characters entered by members using Unicode on their computers or browsers. Anyone who changed the character encoding in their browser [View > Character Encoding > Unicode (UTF-8)] could see everything displayed properly. This tells me that the characters in the database ought to be fine.

I searched around this forum and found that I should try converting the database and files to utf-8. I did this without any errors I am aware of. However, I still don't see those characters from before being displayed correctly. I also later used the "Convert HTML-entities to UTF-8 characters" but this also didn't seem to do the trick. :(

I'm really not sure what else needs to be done, but I'd truly like to get everything sorted out. I'll really appreciate any help or guidance anyone has to offer. Thanks! :)



dxyy

I feel bad to bump this again, but I still don't have any answers. :(

ThorstenE

The HTML source of your forum shows charset=ISO-8859-1.
This is a charset issue. The main problem now is that some of your characters are saved in UTF-8 (from users who changed the encoding in their browser), while others were saved in ISO-8859-1.

Hope this helps:
UTF-8 Readme
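
To illustrate why the same stored bytes look fine to one member and garbled to another, here is a minimal Python sketch. The sample text is made up for illustration and is not taken from the forum; it only shows what happens when UTF-8 bytes are read under a page declared as ISO-8859-1.

```python
# Illustrative sample text (an assumption, not data from the forum).
text = "中文"

# What a browser set to UTF-8 submits: three bytes per Chinese character.
utf8_bytes = text.encode("utf-8")

# A browser that trusts "charset=ISO-8859-1" reads every byte as one character.
seen_as_latin1 = utf8_bytes.decode("iso-8859-1")

# A browser manually switched to UTF-8 reads the same bytes correctly.
seen_as_utf8 = utf8_bytes.decode("utf-8")

print(len(utf8_bytes), len(seen_as_latin1), seen_as_utf8 == text)
```

The bytes themselves never change; only the encoding used to interpret them does, which matches the observation that switching the browser to UTF-8 makes the posts readable.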

dxyy

Oh, I have only converted my test forum and not my real forum as yet. ;)

I always make changes to the test forum before touching my real forum

Here is the test forum: Test

ThorstenE

I cannot see any difference between the test-forum and the real forum. Maybe my browsers (IE 7, Firefox 3) cannot show the characters. Tested this with UTF-8, GB2312, GBK, GB18030..

Can you post a screenshot of how it should look?

dxyy

I think you should only see a difference in areas where Chinese characters are inputted by users.

Take for example:

TEST

versus

REAL

If you scroll through those threads you should notice that some of the characters are properly displayed whereas others are not. User Leah's posts are mainly not, whereas Phoebe's are displayed properly.

Ok something that even further puzzles me is that on the test forum (which is where I did the database conversion) some of the characters are not displayed properly even though my browser is set to utf-8. Conversely, however, on the real forum (has not yet undergone the database change from ISO-8859-1 to utf-8) if I set my browser's encoding to utf-8 everything is displayed correctly. :o

Hope everything I said makes sense, and I greatly appreciate your help with this!

ThorstenE

Quote from: hugodiaz on November 29, 2008, 01:36:27 PM

Ok something that even further puzzles me is that on the test forum (which is where I did the database conversion) some of the characters are not displayed properly even though my browser is set to utf-8. Conversely, however, on the real forum (has not yet undergone the database change from ISO-8859-1 to utf-8) if I set my browser's encoding to utf-8 everything is displayed correctly. :o
That's because the characters are already UTF-8 (stored in a database with a non-UTF-8 collation). You cannot convert UTF-8 characters to UTF-8 a second time.
I think the only solution is to export the database, convert it to UTF-8, and then re-import the dump of the old non-UTF-8 database. You must change the CREATE TABLE statements in the .sql file and replace the collation with a UTF-8 one. I prefer Notepad++ for this because it can handle most charsets and can convert between UTF-8 and ANSI.
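
A small Python sketch of the double-conversion problem described above. It assumes the conversion treats the stored bytes as Latin-1 and re-encodes them as UTF-8; the function names `double_encode` and `repair` are hypothetical, and Python's `latin-1` codec is only an approximation of MySQL's latin1 character set.

```python
def double_encode(text: str) -> str:
    """Simulate a latin1-to-UTF-8 conversion run on bytes that were already UTF-8."""
    # Each UTF-8 byte is misread as a separate Latin-1 character.
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake: str) -> str:
    """Undo the misreading by mapping the characters back to the original bytes."""
    return mojibake.encode("latin-1").decode("utf-8")


original = "中文"                      # illustrative sample text
stored = double_encode(original)      # what the converted table would now hold
print(len(original.encode("utf-8")))  # size of the correct UTF-8 bytes
print(len(stored.encode("utf-8")))    # twice as large after the second conversion
print(repair(stored) == original)
```

This would explain why the converted test forum shows broken characters even with the browser set to UTF-8, while the unconverted real forum still displays them.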

Only a few tips, I hope they help a bit..


dxyy

Haha, my non-programmer brain has been thoroughly confused by your last post. :P

What should I do for the real forum? With the real forum I have not yet done anything. With the test forum I have used the admin panel to convert the database.

Also what sql file exactly? Sorry, but I'm not too familiar with all of these things so it basically all currently sounds like gibberish. :(

ThorstenE

When you back up your database, all database information is written into a single file (.sql). This is a single text file containing all of your database entries and structure. You can edit this file with a text editor (Notepad++, for example). I would guide you through that, but my browsers don't support the Chinese charset, so it's impossible for me to help.
Have you tried asking for help in our Chinese Language Board?
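
For readers who prefer a script to hand-editing, the CREATE TABLE change mentioned earlier in the thread could be sketched in Python roughly as follows. This is an assumption-laden sketch, not an SMF tool: the function name `retarget_create_tables`, the target values `utf8` and `utf8_general_ci`, and the mysqldump-style layout of the dump are all assumptions, and it only rewrites declarations, not the data itself.

```python
import re

# Match "DEFAULT CHARSET=x", "CHARSET x" or "CHARACTER SET x" declarations.
CHARSET_RE = re.compile(
    r"\b((?:DEFAULT\s+)?(?:CHARSET|CHARACTER\s+SET)(?:\s*=\s*|\s+))\w+",
    re.IGNORECASE,
)
# Match "COLLATE x" or "COLLATE=x" declarations.
COLLATE_RE = re.compile(r"\b(COLLATE(?:\s*=\s*|\s+))\w+", re.IGNORECASE)


def retarget_create_tables(dump_sql, charset="utf8", collation="utf8_general_ci"):
    """Rewrite charset/collation declarations inside CREATE TABLE statements only."""
    out = []
    in_create = False
    for line in dump_sql.splitlines(keepends=True):
        if line.lstrip().upper().startswith("CREATE TABLE"):
            in_create = True
        if in_create:
            line = CHARSET_RE.sub(lambda m: m.group(1) + charset, line)
            line = COLLATE_RE.sub(lambda m: m.group(1) + collation, line)
            # A line ending in ";" closes the CREATE TABLE statement.
            if line.rstrip().endswith(";"):
                in_create = False
        out.append(line)
    return "".join(out)


sample_dump = (
    "CREATE TABLE `smf_messages` (\n"
    "  `body` text COLLATE latin1_swedish_ci NOT NULL\n"
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
    "INSERT INTO `smf_messages` VALUES ('DEFAULT CHARSET=latin1');\n"
)
print(retarget_create_tables(sample_dump))
```

INSERT lines are deliberately left untouched so that post content mentioning these keywords is not altered. Always work on a copy of the dump.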

dxyy

I've posted in the Chinese support board, but that area seems to be extremely inactive. :(

Is it that hard to convert a database? :(

Antechinus

Hugodiaz, do you still require assistance with this problem?


Dannii

Your forum is sending the ISO-8859-1 encoding. You need to change your index template and language files, and if you have integrated your forum with any other software, check if they ever call the header() function.
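
As a rough way to check what encoding a page declares, the charset can be pulled out of a Content-Type header value or meta tag. The Python sketch below is only illustrative; the function name `declared_charset` is hypothetical and the sample strings are made up.

```python
import re


def declared_charset(content_type):
    """Return the lowercased charset named in a Content-Type value, or None."""
    match = re.search(r"charset\s*=\s*[\"']?([\w.:-]+)", content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


header_value = "text/html; charset=ISO-8859-1"
meta_tag = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'

print(declared_charset(header_value))
print(declared_charset(meta_tag))
```

If the header and the meta tag disagree, or either still names ISO-8859-1 after a conversion, that points to a template, language file, or header() call that was not updated.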
"Never imagine yourself not to be otherwise than what it might appear to others that what you were or might have been was not otherwise than what you had been would have appeared to them to be otherwise."

Norv

Please log in to phpMyAdmin and export your database using the "Export" tab. Uncheck "Data" so that only the structure is exported. Set it to save the result to a file, and attach that file here.

Sarge

This is certainly an interesting case. Perhaps some users (Leah for example) use special software to enable them to read UTF-8 text but type in other character encodings like Big5 or GB2312?
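
One way to test this hypothesis on the raw bytes of a stored post is a strict UTF-8 validity check, sketched here in Python. The function name `looks_like_utf8` is hypothetical and the sample text is illustrative; a passing check does not prove the text was typed as UTF-8, but a failing one shows it was not.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


sample = "中文"  # illustrative text, valid in both GB2312 and Big5
for codec in ("utf-8", "gb2312", "big5"):
    print(codec, looks_like_utf8(sample.encode(codec)))
```

Posts that fail the check would need to be decoded with the encoding they were actually typed in before being stored as UTF-8.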

UTF-8 supports both Simplified and Traditional Chinese. You and other users may need to install additional input languages in the operating system (Windows XP etc.) to be able to read Traditional Chinese and/or Simplified Chinese. You also may need to install and use fonts that support at least one of these Chinese standard sets, preferably both.

I suggest installing a fresh test forum as UTF-8 (there's a checkbox for UTF-8 in the SMF install screen) and asking your forum members (Leah and Phoebe, for example) to register in the new test forum and type some posts there. Do not import any databases in the new forum. Please let us know when this is done.

And last but not least: update SMF! Both your main forum and the test one are running SMF 1.1.7, an old version with security issues. 1.1.10 is the latest version of the 1.1.x line and you're strongly advised to upgrade to this version as soon as possible. You should normally be able to update 1.1.7 to 1.1.8, then to 1.1.9 and to 1.1.10 from inside SMF.


dxyy

Sorry to bump this, but I still need some help.

I'd like someone to assist me with this database issue, because it's something that I have been meaning to deal with for the longest while but still haven't managed to. I think I'll start a new topic to refer to this thread in the help wanted board.

Thanks for all of your suggestions though. :)

Antechinus

That's fine. You're allowed to bump it if you are still having problems. Have you tried Sarge's last suggestion about the test site yet?

Norv

Your forum pages are still ISO-8859-1. Please consider doing the following:
1) Log in to phpMyAdmin (What is repair_settings.php?), and select your database on the left. Then make sure to choose the "Structure" tab. You will get a listing of all tables in your database, and one of the columns will say "Collation". What is the collation of all tables? If possible, please take a screenshot.

2) Connect to your account with FTP, and download the file Settings.php from your forum directory to your computer. Open it in a code editor and see if it contains a line like:

$db_character_set = 'utf8';

Please tell us whether you have that line. (Don't modify the file yet.)

3) In phpMyAdmin, select the "SQL" tab, and paste there the following SQL code:

SELECT value FROM smf_settings WHERE variable = 'global_character_set';

and post the result here.

Note: as an alternative to 1), it might be even more useful if you could follow the previous advice:
Quote from: Norv on August 08, 2009, 08:36:56 AM
Please log in to phpMyAdmin and export your database using the "Export" tab. Uncheck "Data" so that only the structure is exported. Set it to save the result to a file, and attach that file here.
It would contain all the necessary information, including for columns, not only tables, thus allowing us to understand faster what is going on.
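
The check in step 2) can also be done with a short script instead of reading the file by eye. The Python sketch below is only an illustration: the function name `find_db_character_set` is hypothetical, and the sample file contents are made up apart from the `$db_character_set` line quoted above.

```python
import re


def find_db_character_set(settings_php):
    """Return the value assigned to $db_character_set, or None if the line is absent."""
    match = re.search(
        r"^\s*\$db_character_set\s*=\s*['\"]([^'\"]*)['\"]\s*;",
        settings_php,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Made-up Settings.php excerpts for illustration only.
with_line = "<?php\n$db_name = 'smf';\n$db_character_set = 'utf8';\n?>"
without_line = "<?php\n$db_name = 'smf';\n?>"

print(find_db_character_set(with_line))
print(find_db_character_set(without_line))
```

The script only reads the text, so it is consistent with the advice not to modify the file yet.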