How to convert to UTF-8
agridoc:
* Addition: I realized that I posted this on April 1st. Don't be afraid, it's not an April Fools' joke.
The original "Re: It's all Greek to me .... :)" is a post with the structure of a guide that I planned to finish some day. "[3.0] Full UTF8 support" reminded me of it, so I worked on it a bit.
Have a look at it and, if you find it useful, use it as you wish, after any corrections and/or additions.
You run SMF installed in ISO/ANSI and you face converting it to UTF-8. You know this will have to be done sooner or later, but you may be afraid of the conversion.
It's a procedure that needs care and must be done with proper planning.
Let's see how things stand.
- You have an SMF ISO installation, originally in English ISO-8859-1.
- Database tables and text fields are in latin1_swedish_ci.
- Another language may have been added. Check this line in its index.[language].php file:
--- Code: ---$txt['lang_character_set'] = 'ISO-8859-1';
--- End code ---
The character set might be different from ISO-8859-1. In that case, use that character set, not ISO-8859-1, as the Data character set in step 8. If more than one language with a character set other than ISO-8859-1 has been used, the database can't be properly converted to UTF-8 using the integrated SMF converter. Several languages that all use ISO-8859-1 are no problem. Selecting the proper Data character set is essential for success, so it should be examined and decided BEFORE starting the procedure.
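To make the check above less manual, here is a minimal sketch that reads every index.[language].php file in a directory and reports which character set each one declares. It is only an illustration: the function names are made up for this example, and the directory you pass in (for a standard install, something like Themes/default/languages) is an assumption you should verify on your own forum. The second helper follows the rule of thumb from this post, treating ISO-8859-1 as the harmless baseline.

```python
import re
import tempfile
from pathlib import Path

# Matches a line such as:  $txt['lang_character_set'] = 'ISO-8859-1';
CHARSET_LINE = re.compile(
    r"""\$txt\[['"]lang_character_set['"]\]\s*=\s*['"]([^'"]+)['"]"""
)


def find_language_charsets(languages_dir):
    """Return {file name: declared character set} for each index.*.php file."""
    found = {}
    for path in sorted(Path(languages_dir).glob("index.*.php")):
        # latin-1 maps every byte to a character, so reading never fails
        text = path.read_text(encoding="latin-1")
        match = CHARSET_LINE.search(text)
        if match:
            found[path.name] = match.group(1)
    return found


def non_latin1_charsets(charsets):
    """Distinct declared character sets other than ISO-8859-1 and UTF-8."""
    baseline = {"ISO-8859-1", "UTF-8"}
    return sorted({c.upper() for c in charsets.values()} - baseline)


if __name__ == "__main__":
    # Demo on throwaway files; point find_language_charsets() at your own
    # languages directory instead (the path is an assumption, check it).
    with tempfile.TemporaryDirectory() as demo_dir:
        Path(demo_dir, "index.english.php").write_text(
            "<?php\n$txt['lang_character_set'] = 'ISO-8859-1';\n",
            encoding="latin-1",
        )
        Path(demo_dir, "index.greek.php").write_text(
            "<?php\n$txt['lang_character_set'] = 'ISO-8859-7';\n",
            encoding="latin-1",
        )
        charsets = find_language_charsets(demo_dir)
        for name, charset in charsets.items():
            print(name, "->", charset)
        others = non_latin1_charsets(charsets)
        if len(others) > 1:
            print("More than one non-ISO-8859-1 character set:", others)
```

If the second helper returns more than one entry, that is the situation described above where the integrated converter can't be expected to handle all the data properly.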
You are going to convert to UTF-8. It's a procedure that might not finish properly. Some mods may conflict with it, so it's better to do it without them. To get clean files, a Large Upgrade will be done first.
Steps (if other software is bridged with SMF, additional steps might be needed).
In small forums, with simple mods only, steps 1, 4, 5 and 7 might not be necessary. If the conversion fails, use the backup to restore the database and follow all steps, or consider testing locally on your PC first.
In medium to large forums it's better to test on your PC first. The safest way is to back up, restore locally on a PC with XAMPP or similar, and run tests following the steps. This way an admin gets safe training and can see whether additional problems might occur in a particular conversion.
1. Download the latest SMF 1.1.x or 2.0.x Large UPGRADE (not Update or Install) and the additional language pack(s) for this version to a dir on your PC from http://download.simplemachines.org and http://download.simplemachines.org/?smflanguages or, for 1.1.x, from http://download.simplemachines.org/?archive, selecting the version.
2. Put the forum in maintenance mode and set English (ISO-8859-1) as the default language.
3. Back up the SMF dir and database, so you can go back if something goes wrong.
4. Upload the contents of the SMF Upgrade and the additional language(s) (they must use the same character set as already used) with FTP to the SMF dir on the server, or upload and unpack with the CP File Manager, if available. Unpacking usually overwrites old files, but this is not always the case.
5. Run upgrade.php with your browser. Use the English language. Remember to delete upgrade.php after the procedure finishes. The Upgrade will remove file changes made by mods. Check the upgrade. If everything seems OK, go to the next step.
6. In Administration Center » Search » Search Method
Delete any text index you might have created.
Select "No index" as search method.
7. Go to phpMyAdmin and empty (NOT drop) all smf_log_search_* tables. Note that the table prefix can be different from "smf_", if the admin selected another one.
8. Using English, go to Admin -> Forum Maintenance -> Forum Maintenance - General Maintenance: Convert the database and data to UTF-8 <- Click here
After the text you will see:
Data character set : ISO-8859-1 <- or another language's character set.
Database character set : ISO-8859-1
Convert data and database to : UTF-8
Plain English text in ISO-8859-1 is byte-for-byte the same in UTF-8; accented characters and other character sets are not.
Note: SMF will attempt to detect the character set of your data as that of the forum's default language. That might not always be the right choice; language use and the proper character set are better examined and decided BEFORE, then selected here.
Proceed to conversion. You may have to wait for quite some time. Don't rush things.
If the procedure finishes correctly check the forum.
There are a few more steps but number 8 is the critical procedure. If everything worked as expected you should have converted to UTF-8. Messages should show correctly.
9. Go to phpMyAdmin and check the SMF database tables. Text fields must have been set to utf8_general_ci collation. If a table's collation has not been set, it can be set manually with phpMyAdmin.
10. To complete the conversion, run "Convert HTML-entities to UTF-8 characters". Check again. Go to Administration Center » Forum Maintenance » Routine and run "Find and repair any errors". This will put proper data in the search log tables.
11. Delete the ISO/ANSI language files except English and add their UTF-8 versions. Add the English UTF-8 language too. Select the default UTF-8 forum language in Admin > Server Settings.
12. If everything seems to be OK, either reinstall the mods or upgrade to another SMF version. The latter might require another theme, and some of the mods might not be available, so plan carefully. You may also want to change the search method and build a new text index.
13. You may convert the language settings of each user by running the following query:
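--- Code: ---UPDATE smf_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != ''
  -- skip members already converted, so running the query twice is harmless
  AND lngfile NOT LIKE '%-utf8';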
--- End code ---
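For anyone who wants to try the step 13 query before touching a live forum, here is a rough sketch that runs the same idea against a throwaway in-memory SQLite database. Everything in it is a stand-in: SQLite replaces MySQL, the `||` operator replaces CONCAT, the table has only the two columns needed for the demo, and the function name is made up. It includes a check that skips values already ending in -utf8, so a second run changes nothing.

```python
import sqlite3


def add_utf8_suffix(conn, table="smf_members"):
    """Append '-utf8' to each non-empty lngfile that lacks it; return rows changed."""
    # The table name is interpolated only because the prefix may differ from
    # "smf_"; never build it from untrusted input.
    cur = conn.execute(
        f"UPDATE {table} "
        "SET lngfile = lngfile || '-utf8' "
        "WHERE lngfile != '' AND lngfile NOT LIKE '%-utf8'"
    )
    conn.commit()
    return cur.rowcount


def make_demo_db():
    """Build a tiny stand-in for the members table (column set is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE smf_members ("
        "id_member INTEGER PRIMARY KEY, lngfile TEXT NOT NULL DEFAULT '')"
    )
    conn.executemany(
        "INSERT INTO smf_members (lngfile) VALUES (?)",
        [("english",), ("",), ("greek",), ("english-utf8",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print("first run changed:", add_utf8_suffix(db))
    print("second run changed:", add_utf8_suffix(db))
    for row in db.execute("SELECT id_member, lngfile FROM smf_members"):
        print(row)
```

Members with an empty lngfile are left alone, since they simply follow the forum default language.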
Blue text: Based on suggestion by emanuele.
Green text: Based on suggestion by Dzonny.
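To show why the Data character set chosen in step 8 matters so much, here is a small standalone sketch. It is not what SMF does internally, only the same idea in a few lines: stored bytes are decoded using the selected character set and written back as UTF-8. The helper name and the sample word are made up for the illustration.

```python
def to_utf8(raw, data_charset):
    """Decode bytes stored in data_charset and re-encode them as UTF-8."""
    return raw.decode(data_charset).encode("utf-8")


# Bytes as a forum running in ISO-8859-7 (Greek) would have stored them.
stored = "Καλημέρα".encode("iso-8859-7")

# Correct Data character set: the text survives the conversion.
right = to_utf8(stored, "iso-8859-7").decode("utf-8")

# Wrong Data character set: the conversion "succeeds" but the text is garbage.
wrong = to_utf8(stored, "iso-8859-1").decode("utf-8")

# Plain English (ASCII) bytes are identical before and after conversion.
ascii_unchanged = to_utf8(b"Hello", "iso-8859-1") == b"Hello"

# An accented ISO-8859-1 character becomes two bytes in UTF-8.
accent_changed = to_utf8(b"caf\xe9", "iso-8859-1") != b"caf\xe9"

if __name__ == "__main__":
    print("right charset:", right)
    print("wrong charset:", wrong)
    print("ASCII unchanged:", ascii_unchanged)
    print("accented text changed:", accent_changed)
```

Note that the wrong choice raises no error at all, which is why the character set has to be decided before starting and why the backup from step 3 is important.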
agridoc:
I realized that I posted this topic on April 1st. :D
Don't be afraid, it's not an April Fools' joke.
I also added a notice in the first message.
emanuele:
Charsets are one of the things I hate most...
Unfortunately I can't give you any feedback because I always tested the conversions only locally in "ideal conditions"... :(
agridoc:
The original topic is a good story of a conversion to UTF-8, although it was a special case that I had to test first before giving support.
I tried to gather the most common problems that may occur. Number 13 was added from the Wiki and needs some polish.
Character sets and characters ;D Things are much better now. Being Greek and having started way back in 1985 made me work a lot on them through the years, even screen and printer fonts, as well as system hacks sometimes.
emanuele:
I was wondering: is it really necessary to empty the log_search_* tables?
topics and messages should be two (almost) temporary tables (so the content is created and removed periodically).
results contains only numbers.
And subjects is one of the tables that get converted, and truncating it would remove the possibility to search by topic title in topics created before the conversion (unless a repair boards maintenance is performed after the conversion).