Converting to utf-8

Started by dxyy, November 26, 2008, 04:39:34 AM


dxyy

I have a forum on which both English and Chinese are often used. Sometimes the Chinese characters were not displayed properly for me and many other members, but a few had no problems. We soon discovered the problem seemed to lie in the character encoding used by each member. For example if someone used unicode then they had no problems viewing any of the Chinese characters, whereas other members would not be able to see the characters inputted by the members using unicode on their computers or browsers. If anyone changed the character encoding on their browser [view-character encoding-(unicode)utf-8] they would be able to see everything displayed properly. This tells me that the characters in the database ought to be fine.

I searched around this forum and found that I should try converting the database and files to utf-8. I did this without any errors I am aware of. However, I still don't see those characters from before being displayed correctly. I also later used the "Convert HTML-entities to UTF-8 characters" but this also didn't seem to do the trick. :(

I'm really not sure what else needs to be done, but I'd truly like to get everything sorted out. I'll really appreciate any help or guidance anyone has to offer. Thanks! :)



dxyy

I feel bad to bump this again, but I still don't have any answers. :(

ThorstenE

The HTML source of your forum shows charset=ISO-8859-1.
This is a charset issue. The main problem now is that some of your characters were saved as UTF-8 (by the users who changed the encoding in their browser), while others were saved as ISO-8859-1.

Hope this helps:
UTF-8 Readme
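
To make the mismatch concrete, here is a minimal Python sketch. Python is used only for illustration, and the sample text and the helper name `view_as` are assumptions rather than anything taken from the forum. It reads the same stored bytes once as UTF-8 and once as ISO-8859-1:

```python
def view_as(raw: bytes, charset: str) -> str:
    """Decode the same bytes the way a browser would for a given charset."""
    return raw.decode(charset)


# Hypothetical sample: two Chinese characters typed by a member whose
# browser was sending UTF-8.
typed = "中文"
stored = typed.encode("utf-8")  # 6 bytes: e4 b8 ad e6 96 87

as_utf8 = view_as(stored, "utf-8")         # the original two characters
as_latin1 = view_as(stored, "iso-8859-1")  # six unrelated Latin-1 characters

print(as_utf8)
print(len(as_latin1))
```

Nothing in the stored bytes changes between the two views; only the label the page sends decides whether a member sees Chinese or garbage.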

dxyy

Oh, I have only converted my test forum and not my real forum as yet. ;)

I always make changes to the test forum before touching my real forum

Here is the test forum: Test

ThorstenE

I cannot see any difference between the test-forum and the real forum. Maybe my browsers (IE 7, Firefox 3) cannot show the characters. Tested this with UTF-8, GB2312, GBK, GB18030..

Can you post a screenshot of how it should look?

dxyy

I think you should only see a difference in areas where Chinese characters are inputted by users.

Take for example:

TEST

versus

REAL

If you scroll through those threads you should notice some of the characters are properly displayed whereas others are not. User Leah's posts are mainly not, whereas Phoebe's are displayed properly.

Ok something that even further puzzles me is that on the test forum (which is where I did the database conversion) some of the characters are not displayed properly even though my browser is set to utf-8. Conversely, however, on the real forum (has not yet undergone the database change from ISO-8859-1 to utf-8) if I set my browser's encoding to utf-8 everything is displayed correctly. :o

Hope everything I said makes sense, and I greatly appreciate your help with this!

ThorstenE

Quote from: hugodiaz on November 29, 2008, 01:36:27 PM

Ok something that even further puzzles me is that on the test forum (which is where I did the database conversion) some of the characters are not displayed properly even though my browser is set to utf-8. Conversely, however, on the real forum (has not yet undergone the database change from ISO-8859-1 to utf-8) if I set my browser's encoding to utf-8 everything is displayed correctly. :o
That's because the characters are already UTF-8 (in a database with a non-UTF-8 collation). You cannot convert UTF-8 characters to UTF-8.
I think the only solution is to export the database, convert it to UTF-8, and then re-import the old non-UTF-8 dump. You must edit the CREATE TABLE statements in the .sql file and change the collation to UTF-8. I prefer Notepad++ for this because it can handle most charsets and can convert a file between UTF-8 and ANSI.

Only a few tips, I hope they help a bit..
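
A rough Python sketch of why the conversion on the test forum made things worse, under the assumption that the posts really are UTF-8 bytes sitting in latin1 columns. The function names and the sample text are made up for illustration, and Python's `latin-1` codec only approximates MySQL's latin1, which is closer to windows-1252:

```python
def convert_latin1_to_utf8(stored: bytes) -> bytes:
    """What a charset conversion does when it believes the bytes are Latin-1:
    every byte becomes one character, which is then re-encoded as UTF-8."""
    return stored.decode("latin-1").encode("utf-8")


def relabel_as_utf8(stored: bytes) -> str:
    """What re-importing an untouched dump into UTF-8 tables amounts to:
    the bytes stay the same and are simply read as UTF-8."""
    return stored.decode("utf-8")


original = "中文"                   # hypothetical post content
stored = original.encode("utf-8")   # UTF-8 bytes sitting in a latin1 column

converted = convert_latin1_to_utf8(stored)
print(len(stored), len(converted))  # the converted copy is twice as long
print(relabel_as_utf8(stored) == original)
```

The converted bytes are valid UTF-8, but they now spell the garbled Latin-1 characters instead of the Chinese text, which would match what the test forum shows.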


dxyy

Haha, my non-programmer brain has been thoroughly confused by your last post. :P

What should I do for the real forum? With the real forum I have not yet done anything. With the test forum I have used the admin panel to convert the database.

Also what sql file exactly? Sorry, but I'm not too familiar with all of these things so it basically all currently sounds like gibberish. :(

ThorstenE

During a backup of your database, all database information is written into a single file (.sql). This is a single text file containing all of your database entries and structure. You can edit this file with a text editor (Notepad++, for example). I would guide you through that, but my browsers don't support the Chinese charset, so it's impossible for me to help.
Have you tried asking for help in our Chinese Language Board?
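
For anyone who would rather script the edit to the .sql file than do it by hand in a text editor, here is a hedged Python sketch. The function name `retarget_dump_charset` is hypothetical, and the sketch assumes a dump where each INSERT statement sits on its own line; multi-line INSERTs would need more care so that post text is never rewritten:

```python
import re

# Matches "CHARSET=latin1", "CHARSET = latin1" and "CHARACTER SET latin1".
_CHARSET = re.compile(r"\b(CHARSET|CHARACTER SET)(\s*=\s*|\s+)\w+", re.IGNORECASE)
# Matches "COLLATE=latin1_swedish_ci" and "COLLATE latin1_swedish_ci".
_COLLATE = re.compile(r"\bCOLLATE(\s*=\s*|\s+)\w+", re.IGNORECASE)


def retarget_dump_charset(dump_sql, charset="utf8", collation="utf8_general_ci"):
    """Rewrite charset/collation declarations in the structure part of a dump.

    Lines that start with INSERT are left alone so that stored data is
    not modified.
    """
    rewritten = []
    for line in dump_sql.splitlines():
        if not line.lstrip().upper().startswith("INSERT"):
            line = _CHARSET.sub(lambda m: m.group(1) + m.group(2) + charset, line)
            line = _COLLATE.sub(lambda m: "COLLATE" + m.group(1) + collation, line)
        rewritten.append(line)
    return "\n".join(rewritten)
```

Work on a copy of the dump and keep the original untouched, so the attempt can be repeated if the result is not what you expected.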

dxyy

I've posted in the Chinese support board, but that area seems to be extremely inactive. :(

Is it that hard to convert a database? :(

Antechinus

Hugodiaz, do you still require assistance with this problem?


Dannii

Your forum is sending the ISO-8859-1 encoding. You need to change your index template and language files, and if you have integrated your forum with any other software, check if they ever call the header() function.
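
One way to check what a page declares is to look for `charset=` in the Content-Type header or in the meta tag of the HTML source. The Python sketch below does only that; the function name `declared_charset` is an assumption, and the regular expression is a simplification rather than a full HTML parser:

```python
import re

_CHARSET_RE = re.compile(r"""charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def declared_charset(header_or_html: str):
    """Return the first declared charset in lower case, or None if absent."""
    match = _CHARSET_RE.search(header_or_html)
    return match.group(1).lower() if match else None


meta = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />'
print(declared_charset(meta))
```

If a header sent by integrated software and the meta tag in the template disagree, both need to be brought in line with the encoding of the stored data.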

Norv

Please log in to phpMyAdmin and export your database using the "export" tab. Uncheck "data" so that only the structure is exported. Set it to save the result into a file, and attach that file here.
To-do lists are for deferral. The more things you write down the later they're done... until you have 100s of lists of things you don't do.

File a security report | Developers' Blog | Bug Tracker


Also known as Norv on D* | Norv N. on G+ | Norv on Github

Sarge

This is certainly an interesting case. Perhaps some users (Leah for example) use special software to enable them to read UTF-8 text but type in other character encodings like Big5 or GB2312?

UTF-8 supports both Simplified and Traditional Chinese. You and other users may need to install additional input languages in the operating system (Windows XP etc.) to be able to read Traditional Chinese and/or Simplified Chinese. You also may need to install and use fonts that support at least one of these Chinese standard sets, preferably both.

I suggest installing a fresh test forum as UTF-8 (there's a checkbox for UTF-8 in the SMF install screen) and asking your forum members (Leah and Phoebe, for example) to register in the new test forum and type some posts there. Do not import any databases in the new forum. Please let us know when this is done.

And last but not least: update SMF! Both your main forum and the test one are running SMF 1.1.7, an old version with security issues. 1.1.10 is the latest version of the 1.1.x line and you're strongly advised to upgrade to this version as soon as possible. You should normally be able to update 1.1.7 to 1.1.8, then to 1.1.9 and to 1.1.10 from inside SMF.


dxyy

Sorry to bump this, but I still need some help.

I'd like someone to assist me with this database issue, because it's something that I have been meaning to deal with for the longest while but still haven't managed to. I think I'll start a new topic to refer to this thread in the help wanted board.

Thanks for all of your suggestions though. :)

Antechinus

That's fine. You're allowed to bump it if you are still having problems. Have you tried Sarge's last suggestion about the test site yet?

Norv

Your forum pages are still ISO-8859-1. Please consider doing the following:
1) log in to phpMyAdmin (What is repair_settings.php?), and select your database on the left. Then make sure to choose the "Structure" tab. You will get a listing of all tables in your database, and one of the columns will say "Collation". What is the collation of all the tables? If possible, please take a screenshot.

2) connect with FTP to your account, and download to your computer the file Settings.php from your forum directory. Open it in a code editor and see if it contains a line like:

$db_character_set = 'utf8';

Please tell if you have that line. (don't modify the file yet)

3) In phpMyAdmin, select the "SQL" tab, and paste there the following SQL code:

SELECT value FROM smf_settings WHERE variable = 'global_character_set';

and post the result here.

Note: as an alternative to 1), it might be even more useful to follow the previous advice:
Quote from: Norv on August 08, 2009, 08:36:56 AM
Please log in to phpMyAdmin and export your database using the "export" tab. Uncheck "data" so that only the structure is exported. Set it to save the result into a file, and attach that file here.
It would contain all the necessary information, for columns as well as tables, allowing us to understand faster what is going on.
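
The three checks above amount to comparing three settings that should agree. A small Python sketch of that comparison follows. The function name is hypothetical, and the expected values (`utf8` collations, `$db_character_set = 'utf8'`, and `global_character_set` equal to `UTF-8`) are an assumption about what a fully converted SMF 1.1 forum looks like:

```python
def charset_mismatches(table_collations, db_character_set, global_character_set):
    """List the settings that do not look like a UTF-8 forum.

    table_collations: dict of table name -> collation (check 1)
    db_character_set: value from Settings.php, or None if the line is missing (check 2)
    global_character_set: value from smf_settings, or None if the query is empty (check 3)
    """
    problems = []
    for table, collation in sorted(table_collations.items()):
        if not collation.lower().startswith("utf8"):
            problems.append(f"{table}: collation is {collation}")
    if (db_character_set or "").lower() != "utf8":
        problems.append(f"$db_character_set is {db_character_set!r}")
    if (global_character_set or "").lower() != "utf-8":
        problems.append(f"global_character_set is {global_character_set!r}")
    return problems


# Hypothetical values: UTF-8 tables and Settings.php, but an empty query result.
print(charset_mismatches({"smf_messages": "utf8_general_ci"}, "utf8", None))
```

An empty list would mean the three places agree; anything else points at the setting that was missed.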

dxyy

Norv thanks a lot for your suggestions. I have a quick question about exporting the database. Does structure only mean that I will not be compromising the data of my users? Haha, it might seem like a silly question to experienced heads like yourself, but bear with me please.

Thanks!

Norv

Quote from: hugodiaz on December 26, 2009, 02:10:49 PM
Norv thanks a lot for your suggestions. I have a quick question about exporting the database. Does structure only mean that I will not be compromising the data of my users?

Yes, structure-only means that absolutely none of your users' data (no users, no posts, not even the default post that's installed with SMF) will be in the file. What will be in it are SQL instructions for creating a similar, but empty, database.
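
As an illustration of what can be read out of such a structure-only file, here is a hedged Python sketch that lists the default charset declared for each table. The function name `table_charsets` is made up, and splitting on `;` is only safe because a structure-only export contains no post text:

```python
import re


def table_charsets(structure_sql: str) -> dict:
    """Map each table name to the charset in its CREATE TABLE statement.

    Tables without an explicit charset declaration map to None.
    """
    result = {}
    for statement in structure_sql.split(";"):
        name = re.search(
            r"CREATE TABLE\s+(?:IF NOT EXISTS\s+)?`?(\w+)`?", statement, re.IGNORECASE
        )
        if not name:
            continue
        charset = re.search(r"CHARSET\s*=\s*(\w+)", statement, re.IGNORECASE)
        result[name.group(1)] = charset.group(1) if charset else None
    return result
```

Running it over the exports of the real forum and the test forum would show at a glance which tables differ between the two.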

dxyy

Ok, thanks for the reply.

Here is the structure of both my regular forum (no changes have yet been implemented, but Chinese characters are usually displayed properly if I change my browser's encoding to utf-8) and my test forum (database was already converted to utf-8, but old Chinese characters cannot be displayed no matter what I do)


Norv

Can you please verify and let us know the answers to the other two questions above?

Quote from: Norv on December 11, 2009, 06:45:24 PM
2) connect with FTP to your account, and download to your computer the file Settings.php from your forum directory. Open it in a code editor and see if it contains a line like:

$db_character_set = 'utf8';

Please tell if you have that line. (don't modify the file yet)

3) In phpMyAdmin, select the "SQL" tab, and paste there the following SQL code:

SELECT value FROM smf_settings WHERE variable = 'global_character_set';

and post the result here.

dxyy

Norv, just in case you're not aware of this I want to reiterate that I have my regular forum and the test forum. I haven't modified anything on the regular forum as yet, but I already changed the database to utf-8 through phpMyAdmin on the test forum.

Now to answer your questions. All of these are with respect to the test forum.

Settings.php contains the line
$db_character_set = 'utf8';

As for the stuff in phpMyAdmin, maybe the result is ... nothing? Here is a screenshot for your reference.

Norv

Yes, thank you, I know, and I've actually re-read the whole topic and checked out the two forums again to refresh my memory.
And yes, please consider these questions to be about the test forum, if the situation has remained the same as in previous posts.

Quote from: hugodiaz on December 27, 2009, 07:02:34 AM
I haven't modified anything on the regular forum as yet, but I already changed the database to utf-8 through phpMyAdmin on the test forum.
How did you do that? I thought you ran "convert to utf8" from the forum administration tools, not with phpMyAdmin. Please correct my understanding if wrong.

What language is selected for the test forum, currently, as default language? Please specify if English, or English-utf8, or is English-utf8 installed at all?

dxyy

I believe I later tried installing English utf-8, but the default language is still set to English.

As for changing the database, if I remember correctly I went to phpMyAdmin, then to operations tab and changed the collation to utf8_general_ci.  I also think I then went to the admin panel in my smf forum and .... oh no, actually I'm confusing myself now, because I seem to also remember using the smf admin panel to convert the database.

It's really unfortunate that I didn't deal with this back then when everything was still fresh in my mind.

Anyway, I still have my original forum, so any suggestions as to what I should do with that database? Of course I will set up yet another test forum to see how everything goes before messing around with my actual forum.

Norv

Yes, that's what I would propose.
Please consider copying the whole "real" forum to another test forum: copy the files and restore the database into a new one. You might want to create the new database as UTF-8, to avoid potential annoyances later.
Then run repair_settings.php on the new test forum to correct the paths and database settings.

Download and install the English-UTF8 language pack.
Switch the default language to English UTF-8. Disable allowing people to set their own language (if enabled).

Then, please let us know how it goes, and what is going wrong with the new test forum at that point.

Side note: Are you using mods? If so, they may need edits in the English-UTF8 files; most likely you would need to make those manually.

dxyy

How do I create the database as utf-8?

Yes, I'm using quite a few mods... haha, so I'm not really looking forward to having to make all those manual edits.

Norv

If you can create a database in phpMyAdmin, then there must be a "collation" option on that page, where you can choose utf8_general_ci.
If you create it using your host's panel interface, then just create it, go into phpMyAdmin, and see what collation it has, if possible. You might be able to change it directly there as well.

Well, only those that modified language files will need edits.
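
If phpMyAdmin's "SQL" tab is available, the database-creation step can also be written as a single statement. The Python sketch below only builds that statement as text; the function name and the example database name `smf_test` are assumptions, and many shared hosts only allow creating databases from their own panel:

```python
import re


def create_database_sql(name, charset="utf8", collation="utf8_general_ci"):
    """Build a CREATE DATABASE statement with an explicit charset and collation."""
    # Refuse anything that is not a plain identifier, since the name is
    # pasted straight into the SQL text.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unexpected database name: {name!r}")
    return f"CREATE DATABASE `{name}` CHARACTER SET {charset} COLLATE {collation};"


print(create_database_sql("smf_test"))
```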

dxyy

Ok, I'm supposedly on vacation right now so I should have enough time to finally tackle this issue.

I'm confused and concerned about one thing. My database is currently not utf-8 and due to the fact that both English and Chinese have been used by members to make posts wouldn't that affect the way these posts are stored in the database tables? So if I create a new database as utf-8 and upload my current database there, wouldn't I still have a problem with those characters that were not originally stored as utf-8?

What I'm really asking is: do I need to somehow convert my database to utf-8 (on my computer, say) and then upload it to the newly created database (which is also set up as utf-8)? Hope my question makes sense.

I just want to start from scratch. Take my real forum's database and figure out how to make it work the way it should i.e. individual members do not have to mess with browser encoding settings and everyone can see the characters being displayed properly! Haha, but there still needs to be the option for users to choose between English (I guess utf-8) and simplified Chinese for the forum's display language. ;)
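
A hedged Python sketch of how mixed rows could be handled one value at a time, assuming the problem rows are UTF-8 bytes that were stored in latin1 columns and are read back as Latin-1 text. The function name `normalize_row` is made up, and the sketch does not cover posts typed in other encodings such as GB2312 or Big5:

```python
def normalize_row(value: str) -> str:
    """Undo one layer of 'UTF-8 read as Latin-1' where that is possible.

    Values that are plain ASCII, genuine Latin-1, or already proper
    Unicode text are returned unchanged.
    """
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


garbled = "中文".encode("utf-8").decode("latin-1")  # how a damaged row reads
print(normalize_row(garbled))
print(normalize_row("hello"))
```

The check is a heuristic: a few genuine Latin-1 strings happen to be valid UTF-8 as well, so the result is worth spot-checking on a test copy before touching the real forum.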

JimM

If you are doing this on a test forum then I would try it both ways.  Import your database and then convert it from the adminCP and see what the results are.  If everything is good, then you can move on to trying the other methods you mentioned.  The key is to keep a backup untouched so you can keep trying till you get it the way you want.
Jim "JimM" Moore
Former Support Specialist

dxyy

Question; if after going to Admin - Forum Maintenance - Convert the database and data to UTF-8 and proceeding, I still see the same option "Convert the database and data to UTF-8" does that mean it was not successful?

babjusi

The easiest solution would be to write a PHP script that connects to the db and changes the collation and character set to utf-8. Or you could do it table by table in the db, either by changing it in the Operations tab or by running a few queries.
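
A minimal sketch of the "few queries" route, written in Python rather than PHP purely for illustration. It only generates the statements as text to paste into phpMyAdmin's SQL tab; the function name and the table names are assumptions. Note that `CONVERT TO CHARACTER SET` re-encodes the stored text, so on tables that already hold UTF-8 bytes in latin1 columns it may produce the doubled encoding discussed earlier in this topic:

```python
import re


def convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Return one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        if not re.fullmatch(r"[A-Za-z0-9_]+", table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for statement in convert_statements(["smf_messages", "smf_members"]):
    print(statement)
```

Try it on a copy of the database first and compare a few Chinese posts before and after.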

dxyy

Thanks for your reply, but your easy solution is actually quite complex to me since I know next to nothing about writing scripts.

Care to help? ;)

http://www.simplemachines.org/community/index.php?topic=362048.0

JimM

Once you complete this be sure and mark this topic as solved. :)

dxyy

Quote from: JimM on January 26, 2010, 04:55:58 PM
Once you complete this be sure and mark this topic as solved. :)
I'll be sure to do that, but I'm still struggling with this. I'm going to hire someone to write a script for me, and hopefully that will put this issue to bed. ;)
