Author Topic: How to convert to UTF-8  (Read 29285 times)

Offline agridoc

  • Language Moderator
  • SMF Hero
  • *
  • Posts: 3,269
  • Gender: Male
    • Aeromodelling GR - Aeromodelling in Greece
How to convert to UTF-8
« on: April 01, 2012, 04:25:33 AM »
* Addition: I realized that I posted on April 1st. Don't be afraid, it's not an April Fools' joke.

The original post, "Re: It's all Greek to me .... :)", is structured as a guide that I planned to finish some day. The topic "[3.0] Full UTF8 support" reminded me of it, so I worked on it a bit.

Have a look at it and, if you find it useful, use it as you wish, after any corrections and/or additions.


You have SMF installed and in use with an ISO/ANSI character set, and you are faced with converting it to UTF-8. You know that this will have to be done sooner or later, but you may be afraid of the conversion.

It's a procedure that needs care and must be done with proper planning.

Let's look at the starting point.

- You have an SMF ISO installation, originally in English ISO-8859-1.
- Database tables and text fields are in latin1_swedish_ci.
- Another language may have been added. Check this line in the index.[language].php file.
Code: [Select]
$txt['lang_character_set'] = 'ISO-8859-1';

The character set might be different from ISO-8859-1. In that case, that character set, not ISO-8859-1, is what must be used as the Data character set in step 8. If more than one language with a character set other than ISO-8859-1 has been used, the database can't be properly converted to UTF-8 with the integrated SMF converter. Several languages that all use ISO-8859-1 are no problem. Selecting the proper Data character set is essential for success; it should be examined and decided BEFORE starting the procedure.
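As a rough illustration of the check above, here is a minimal Python sketch that reads the declared character set from the text of a language file and then shows why the choice matters. The helper name declared_charset and the sample file contents (including the Greek ISO-8859-7 entry) are assumptions made for this example; they are not part of SMF.

```python
import re

# Matches a line such as: $txt['lang_character_set'] = 'ISO-8859-1';
CHARSET_RE = re.compile(
    r"""\$txt\[\s*['"]lang_character_set['"]\s*\]\s*=\s*['"]([^'"]+)['"]\s*;"""
)


def declared_charset(php_source):
    """Return the character set declared in a language file's text, or None."""
    match = CHARSET_RE.search(php_source)
    return match.group(1) if match else None


# Sample file contents, assumed for illustration only.
samples = {
    "index.english.php": "$txt['lang_character_set'] = 'ISO-8859-1';",
    "index.greek.php": "$txt['lang_character_set'] = 'ISO-8859-7';",
}
charsets = {name: declared_charset(text) for name, text in samples.items()}
for name, charset in charsets.items():
    print(name, "->", charset)

# Why the Data character set matters: the same stored bytes read differently.
word = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"  # the Greek word for Greece
stored = word.encode("iso-8859-7")  # bytes as an ISO-8859-7 forum would store them
right = stored.decode("iso-8859-7")  # declared character set: the text survives
wrong = stored.decode("iso-8859-1")  # wrong character set: accented Latin letters
print(right == word, wrong == word)
```

On a real forum the strings in samples would be read from the index.[language].php files in the languages directory. The second half only shows the principle: a converter told the wrong Data character set has no way to know what the stored bytes were meant to say.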

You are going to convert to UTF-8. It's a procedure that might not finish properly. Some mods may conflict so it's better to do it without them. To use clean files, a Large Upgrade will be done first.

Steps (if other software is bridged with SMF additional steps might be needed).

In small forums with simple mods only, steps 1, 4, 5 and 7 might not be necessary. If the conversion fails, use the backup to restore the database and follow all steps, or consider testing locally on your PC first.

In medium to large forums it's better to test on your PC first. The safest way is to back up, restore locally on a PC with XAMPP or similar and run tests, following the steps. This way, an admin can train safely and test whether additional problems might occur in a particular conversion.

1. Download the latest SMF 1.1.x or 2.0.x Large UPGRADE package (not Update or Install) and the additional language pack(s) for this version to a directory on your PC from http://download.simplemachines.org and http://download.simplemachines.org/?smflanguages or, for 1.1.x, from http://download.simplemachines.org/?archive, selecting the version.

2. Set the forum's default language to English (ISO-8859-1) and put the forum in maintenance mode.

3. Back up the SMF directory and database, so you can go back if something goes wrong.

4. Upload the contents of the SMF Upgrade package and the additional language pack(s) (they must use the same character set as the one already in use) to the SMF directory on the server with FTP, or upload and unpack them with the CP File Manager, if available. Unpacking usually overwrites old files, but this is not always the case.

5. Run upgrade.php with your browser. Use the English language. Remember to delete upgrade.php after the procedure finishes. The upgrade will remove the file changes made by mods. Check the result of the upgrade. If everything seems OK, go to the next step.

6. In Administration Center » Search » Search Method
   Delete any text index you might have created.
   Select "No index" as search method.

7. Go to phpMyAdmin and empty (NOT drop) all smf_log_search_* tables. Note that the table prefix can be different from "smf_" if the admin selected another one.

8. Using English, go to Admin -> Forum Maintenance -> Forum Maintenance - General Maintenance: Convert the database and data to UTF-8 <- Click here
You will then see the following text:

Data character set : ISO-8859-1 <- or another language's character set.
Database character set : ISO-8859-1
Convert data and database to : UTF-8

English text in ISO-8859-1 does not differ significantly from UTF-8; text in other character sets may.
Note: SMF will attempt to detect the character set of your data and will propose that of the forum's default language. That might not always be the right choice; the languages in use and the proper character set are better examined and decided BEFORE starting, and then selected here.
Proceed to conversion. You may have to wait for quite some time. Don't rush things.

If the procedure finishes correctly check the forum.

There are a few more steps but number 8 is the critical procedure. If everything worked as expected you should have converted to UTF-8. Messages should show correctly.

9. Go to phpMyAdmin and check the SMF database tables. Text fields must have been set to the utf8_general_ci collation. If the table collation has not been set, it can be set manually with phpMyAdmin.

10. To complete the conversion, run "Convert HTML-entities to UTF-8 characters". Check again. Go to Administration Center » Forum Maintenance » Routine and run "Find and repair any errors". This will put proper data in the search log tables.

11. Delete the ISO/ANSI language files except English and add their UTF-8 versions. Add the English UTF-8 language too. Select the default UTF-8 forum language in Admin > Server Settings.

12. If everything seems to be OK, either reinstall the mods or upgrade to another SMF version. The latter might require another theme, and some of the mods might not be available, so plan carefully. You may also want to change the search method and build a new text index.

13. You may convert the language settings of each user by running the following query:
Code: [Select]
UPDATE smf_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != ''
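
One thing to watch with the query above: it appends '-utf8' to every non-empty lngfile value, so running it a second time would produce names such as greek-utf8-utf8. The Python sketch below mirrors the rule with a guard against that. The function name and the sample values are hypothetical, used only to show the logic.

```python
def utf8_lngfile(lngfile):
    """Return the UTF-8 variant of a member's language file name.

    Empty values (member uses the forum default) and names that already
    end in '-utf8' are returned unchanged, so the rule is safe to repeat.
    """
    if lngfile == "" or lngfile.endswith("-utf8"):
        return lngfile
    return lngfile + "-utf8"


# Sample values, assumed for illustration only.
for value in ["", "english", "greek", "greek-utf8"]:
    print(repr(value), "->", repr(utf8_lngfile(value)))
```

In SQL the same guard could probably be expressed by adding a condition such as AND lngfile NOT LIKE '%-utf8' to the WHERE clause; test it on a copy of the database first.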


Blue text: Based on suggestion by emanuele.
Green text: Based on suggestion by Dzonny.
« Last Edit: April 09, 2012, 02:54:31 AM by agridoc »
  For Greek aeromodellers and our friends around the world  - Greek Button sets for SMF - Greeklish to Greek mod
I don't spend time on messages in Greeklish.

Offline agridoc

Re: How to convert to UTF-8
« Reply #1 on: April 02, 2012, 12:09:57 AM »
I realized that I posted this topic on April 1st.  :D
Don't be afraid, it's not an April Fools' joke.

I also added a notice in the first message.

Offline emanuele

  • SMF Super Hero
  • *******
  • Posts: 14,161
  • Gender: Male
  • THERE'S JUST ME
Re: How to convert to UTF-8
« Reply #2 on: April 03, 2012, 06:51:56 AM »
Charsets are one of the things I hate most...
Unfortunately I can't give you any feedback, because I have only ever tested conversions locally in "ideal conditions"... :(


Take a peek at what I'm doing! ;D



Need support in Italian?

Help us help you: explain your problem clearly. No, "it doesn't work" is not an explanation!!
1) What you do,
2) what you expect,
3) what you get.

Offline agridoc

Re: How to convert to UTF-8
« Reply #3 on: April 03, 2012, 07:44:45 AM »
The original topic is a good story of a conversion to UTF-8, although it was a special case that I had to test first before giving support.

I tried to gather the most common problems that may occur. Step 13 was added from the Wiki and needs some polishing.

Character sets and characters  ;D Things are much better now. Being Greek and having started way back in 1985, I have had to work a lot on them through the years, including screen and printer fonts and sometimes system hacks.

Offline emanuele

Re: How to convert to UTF-8
« Reply #4 on: April 08, 2012, 09:57:47 AM »
I was wondering: is it really necessary to empty the log_search_* tables?
topics and messages should be two (almost) temporary tables (so their content is created and removed periodically).
results contains only numbers.
And subjects is one of the tables that get converted, and truncating it would remove the ability to search by topic title in topics created before the conversion (unless a repair boards maintenance is performed after the conversion).



Offline agridoc

Re: How to convert to UTF-8
« Reply #5 on: April 08, 2012, 02:29:28 PM »
No, it's not always necessary but their content is a known cause of trouble in conversion, as well as a search index. Their content will also probably be useless after conversion. Emptying them will do no harm, as far as I know.

I examined the structure of the [smf]_log_search tables, and only the [smf]_log_search_subjects table contains a text field. The [smf]_log_search_results table is also a known cause of trouble, although it does not contain a text field. I had seen the suggestion to empty all log_search tables and I adopted it in the guide.
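
For anyone who prefers to paste statements into the phpMyAdmin SQL tab instead of clicking through each table, here is a small Python sketch that builds them for a given table prefix. The four table names are the ones mentioned in this topic; the helper name and the "forum_" prefix are only examples. TRUNCATE TABLE empties a table and keeps its structure, which is what "empty (NOT drop)" asks for.

```python
# The four search log tables mentioned in this topic; the prefix is configurable.
SEARCH_LOG_TABLES = ("messages", "results", "subjects", "topics")


def truncate_statements(prefix="smf_"):
    """Build one TRUNCATE statement per search log table for the given prefix."""
    return [
        f"TRUNCATE TABLE `{prefix}log_search_{name}`;"
        for name in SEARCH_LOG_TABLES
    ]


# Example with a non-default prefix (hypothetical).
for statement in truncate_statements("forum_"):
    print(statement)
```

If log_search_subjects is emptied this way, remember that a board repair is needed afterwards to rebuild it, as discussed in this topic.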

I tried to gather the most common causes and give instructions that would not affect useful data. I doubt that any guide can guarantee a successful conversion every time, but a more detailed guide is needed to prevent known causes of trouble where possible, as Illori also pointed out.

Quote
it seems like http://wiki.simplemachines.org/smf/UTF-8_Readme is slightly out of date, i am not that familiar with utf-8 and converting to it. can someone take a look and update the page?

Offline agridoc

Re: How to convert to UTF-8
« Reply #6 on: April 08, 2012, 02:35:56 PM »
As you are concerned about search: in this case all log_search tables were emptied.
Quote
I'm happy to say that after 2 days nobody has found anything wrong and all are EXTREMELY happy to have a functional search again. I think you've solved this and are a hero to many :)  Thank you!

Offline Dzonny

  • Localizer
  • SMF Super Hero
  • *
  • Posts: 10,324
  • Gender: Male
  • No sleep...
    • dzontra.nikola on Facebook
    • @opusteniforum on Twitter
    • Samo opusteno
Re: How to convert to UTF-8
« Reply #7 on: April 08, 2012, 02:40:51 PM »
Thanks agridoc for your help! :)

Now, I'm not sure why the upgrade would be a "must" here; I've done many conversions without upgrading first.
|Sistem za razmenu banera|Servisi za webmastere| My Mods

Don't fear the reaper...
mail: dzonny (@) simplemachines.org

Offline emanuele

Re: How to convert to UTF-8
« Reply #8 on: April 08, 2012, 02:49:47 PM »
Quote
No, it's not always necessary but their content is a known cause of trouble in conversion, as well as a search index. Their content will also probably be useless after conversion. Emptying them will do no harm, as far as I know.
Hopefully it wouldn't be useless, because as I wrote log_search_subjects is in the list of converted tables, but since it is possible to recover the data it wouldn't be such a big issue. Maybe you can add a note to run "maintenance > repair boards" at the end of the procedure in order to restore all the search features. ;)

Quote
As you are concerned about search, in this case all log_search tables were emptied
No, it's not that I'm concerned about the tables being emptied; I'm more concerned about people opening phpMyAdmin! :P
I just suggested it because I know many people are afraid of having to dig into the database, and others are more prone to causing disasters... :P

BTW, I'm not against it, I was just asking because of my inexperience with charset conversions. ;)



Offline agridoc

Re: How to convert to UTF-8
« Reply #9 on: April 08, 2012, 04:56:21 PM »
Quote
Thanks agridoc for your help!:)

Now, i'm not sure why would upgrade be a "must" here, i've done many conversions without upgrading first?

There is a possibility that some mod might interfere with conversion process.

A Large upgrade is recommended to start with "clean" files. It's rather a cleanup. No real upgrade is necessary.

I also recommend default theme and English language to avoid possible errors from code in a custom theme or an error in language file.

Offline Dzonny

Re: How to convert to UTF-8
« Reply #10 on: April 08, 2012, 05:20:46 PM »
That is right, although I never had any huge problems with mods after converting to UTF-8. Maybe it would be good to note this in the first post, and also to note that the upgrade will "remove" all installed mods.
For example, if the forum is "new" and doesn't have many mods (or has a few small mods), the user can skip some steps (1, 4 and 5). If the forum has just been installed, for example, the user should just do step 8 IMO (there's no need to upgrade or empty any tables in this case).

Offline agridoc

Re: How to convert to UTF-8
« Reply #11 on: April 08, 2012, 05:33:06 PM »
emanuele, you are right in your comments about log_search_subjects, and a board repair must be recommended. Thank you, it's something I had forgotten. Database conversions are not done very often.

About worrying about phpMyAdmin: I had written in [3.0] Full UTF8 support
Quote
No problem for a new installation. A difficult task would be database conversion in upgrades, it's a procedure that must be done with care, prone to problems and possible data loss. Unless convinced otherwise in the future,  I believe that it should be a separate process.

A user friendly, trouble free, click and go conversion to UTF-8 would be ideal but is it possible?

I am afraid that, before starting a conversion to UTF-8, an admin must be sure of his ability to safely back up and restore the database and site; in large sites more than phpMyAdmin may be needed. Otherwise he should obtain assistance.

The safest way is to back up, restore locally on a PC with XAMPP or similar and run tests. It might be useful to add this to the guide. This way, an admin can train safely and test to see the problems that might occur in a particular conversion.

Offline agridoc

Re: How to convert to UTF-8
« Reply #12 on: April 08, 2012, 05:49:02 PM »
From the guide:
Quote
Some mods may conflict so it's better to do it without them. To use clean files, a Large Upgrade will be done first.
One can add that mods must be installed again, although I believe that this should be understood.

I have seen mods hiding the database conversion option, as well as interfering with the conversion process. I don't remember which mods did what, and we can't check all mods for such behavior. So the safe way is to do it without mods.

In small forums a restore doesn't take much time, so an unsuccessful attempt is no big deal. Not so with a medium or big forum.

There are other things that I may not remember now, or that I don't know, that can be added to the guide. With cooperation it can be made as complete as possible.

Offline agridoc

Re: How to convert to UTF-8
« Reply #13 on: April 23, 2012, 01:23:08 AM »
I made some changes based on suggestions by Dzonny and emanuele quite a few days ago.

Offline Dzonny

Re: How to convert to UTF-8
« Reply #14 on: April 23, 2012, 09:48:44 AM »
Thanks agridoc for your help :)

I would suggest making a new wiki page about this rather than editing the UTF-8 Readme page (as it's kind of out of date); it would be easier for all of us to keep track of it on a new page that is only about converting to UTF-8. Would you like to create it, agridoc, so we can all help with rephrasing it later?

Offline agridoc

Re: How to convert to UTF-8
« Reply #15 on: April 25, 2012, 12:14:39 AM »
I do agree that it must be a separate page; it's too big to be added to the UTF-8 Readme page. A link should be added there to "a more comprehensive guide" or something like that.

I am thinking of adding a section for a special case that must be quite frequent: SMF started as an ISO/ANSI installation, then a UTF-8 language was added and mostly used. While it shows as UTF-8, it isn't a UTF-8 installation: the data is stored as UTF-8 in latin1_swedish_ci tables. I have done, tested and helped with such conversions with 1.1.x, not with 2.0.x yet. I will test when I find time. It should work the same, but some tests must be done to be completely sure.
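
To make this special case more concrete, here is a minimal Python sketch of what "UTF-8 data in latin1 tables" means at the byte level. Python's latin-1 codec stands in for the table's declared character set, which is a simplifying assumption, and the sample word is only an example. It illustrates the situation; it is not the conversion procedure.

```python
# The Greek word for Greece, written with escapes so the example is unambiguous.
original = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1"

# A forum showing UTF-8 pages sends UTF-8 bytes to the database...
utf8_bytes = original.encode("utf-8")

# ...but a latin1 column treats every single byte as one latin1 character.
as_seen_in_latin1 = utf8_bytes.decode("latin-1")

# Each of these Greek letters takes two bytes in UTF-8, so the latin1 view
# has twice as many characters and looks like garbage in phpMyAdmin.
print(len(original), len(as_seen_in_latin1))

# The bytes themselves are intact: reading them as UTF-8 again recovers the text.
recovered = as_seen_in_latin1.encode("latin-1").decode("utf-8")
print(recovered == original)
```

This is why such a forum displays correctly while the database looks wrong: the data is only mislabelled. Converting it as if it really were latin1 text would encode it a second time, which is what makes this case different from the main guide.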

Offline Angelina Belle

  • SMF Friend
  • SMF Hero
  • *
  • Posts: 7,589
Re: How to convert to UTF-8
« Reply #16 on: April 25, 2012, 08:21:44 AM »
So I think it would be perfect if you go to http://wiki.simplemachines.org/smf/How_to_convert_your_forum_to_UTF-8
You will see it does not exist, but it will tell you how to get started on that document.

Put in there all the instructions you have so far.

After that, everyone who is allowed to edit the wiki (all users with at least 10 posts) will be able to make changes.
Usually, these changes are improvements. But you can watch that page to see how it develops.

Thanks!
Never attribute to malice that which is adequately explained by stupidity. -- Hanlon's Razor

Offline Antechinus

  • SMF Friend
  • SMF Super Hero
  • *
  • Posts: 22,778
  • Master of BBC Abuse
Re: How to convert to UTF-8
« Reply #17 on: April 29, 2012, 08:07:12 PM »
Suggestion, quoted from one of my posts in the dev boards:

Quote
And no way would I try and convert my live db and rely on restoring a backup if things went wrong. I've got more sense than that. I'd just copy the existing db over to a new one and convert that, then point the forum at it. Much easier to fix if things go wrong.
Mods - Manky Old Themes - Apocalypse theme (WIP)

Going through 2.0.x in detail is like picking through something that dropped out of the rear end of a vulture. Every couple of seconds I'm like "Oooo, that's gruesome. WTF is that doing in there?"

It doesn't matter if the glass is half empty or half full. There is clearly room for more beer.

Offline agridoc

Re: How to convert to UTF-8
« Reply #18 on: April 29, 2012, 11:40:29 PM »
There is this that is similar but not the same.

Quote
In small forums with simple mods only, steps 1, 4, 5 and 7 might not be necessary. If the conversion fails, use the backup to restore the database and follow all steps, or consider testing locally on your PC first.

In medium to large forums it's better to test on your PC first. The safest way is to back up, restore locally on a PC with XAMPP or similar and run tests, following the steps. This way, an admin can train safely and test whether additional problems might occur in a particular conversion.
...
Green text: Based on suggestion by Dzonny.

Still, there would be much work to reinstall mods, themes etc. However, one might:
- Put the forum in maintenance mode.
- Copy the SMF tables to another DB.
- Install a small, simple mod.
- Point SMF to the new DB.
Then follow the conversion steps.

So in case something goes wrong, it will be easy to point back to the original DB and revert to the backup made before the last mod install.
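
The "point SMF to the new DB" part can be sketched as a small edit of the $db_name line in Settings.php. The Python below works on the text of the file and returns the old name, so pointing back is the same call with that name. The function name, the database names and the sample contents are hypothetical; keep a copy of the real Settings.php before changing anything.

```python
import re

# Matches an assignment such as: $db_name = 'smf_live';
DB_NAME_RE = re.compile(r"""(\$db_name\s*=\s*)(['"])(.*?)\2(\s*;)""")


def point_to_database(settings_source, new_db_name):
    """Return (updated Settings.php text, previous database name)."""
    match = DB_NAME_RE.search(settings_source)
    if match is None:
        raise ValueError("no $db_name assignment found")
    old_name = match.group(3)
    updated = DB_NAME_RE.sub(
        # Keep the original spacing and quote style, swap only the name.
        lambda m: f"{m.group(1)}{m.group(2)}{new_db_name}{m.group(2)}{m.group(4)}",
        settings_source,
        count=1,
    )
    return updated, old_name


# Sample Settings.php fragment, assumed for illustration only.
sample = "$db_server = 'localhost';\n$db_name = 'smf_live';\n$db_prefix = 'smf_';\n"
updated, old_name = point_to_database(sample, "smf_utf8_copy")
print(updated)
print("previous database:", old_name)
```

The copy of the tables itself still has to be made with phpMyAdmin or another database tool; this only covers the switch.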

Offline Angelina Belle

Re: How to convert to UTF-8
« Reply #19 on: May 03, 2012, 03:19:32 PM »
So -- is this information ready to create a how-to on the wiki?

As long as the information is correct, please put it on the wiki and discuss here. We can continue to edit and improve.

Thanks!