September 23, 2021, 11:23:51 AM



UTF-8 Conversion

Started by JayBachatero, January 08, 2007, 09:39:10 PM


CSpili

I'm having some problems converting a Greek-language MyBB 1.2 forum to SMF 1.1.3. It still shows alien characters, unfortunately :(

JayBachatero

Did you set up SMF with the correct charset? Also, did you download the correct language pack?
Follow me on Twitter

"HELP!!! I've fallen and I can't get up"
This moment has been brought to you by LifeAlert

beate_r

What about running recode or iconv on a database dump and then importing the new database to SMF? Has anyone tried?

Sarge

August 19, 2007, 07:52:57 AM #23 Last Edit: August 19, 2007, 08:04:23 AM by Sarge
Be sure to export and import the database with the correct charsets:
http://textsnippets.com/posts/show/84

The sed syntax posted at the link above has never worked for me on CentOS or Ubuntu (invalid arguments, I think), but the alternate syntax (posted in the last comment) has always worked fine:
sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql
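If sed isn't available, the same substitution can be sketched in Python. The helper name relabel_charset is hypothetical, not part of SMF or any converter. Like the sed command, it blindly replaces every occurrence of "latin1", so any post text containing that word changes too. It works on raw bytes, so the data itself is not re-encoded.

```python
import sys


def relabel_charset(src: str, dst: str, old: bytes = b"latin1", new: bytes = b"utf8") -> int:
    """Copy src to dst, replacing every occurrence of `old` with `new`.

    Works on raw bytes, line by line, so the text data is not re-encoded;
    only the charset labels (and any other matching text) change.
    Returns the number of replacements made.
    """
    count = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            count += line.count(old)
            fout.write(line.replace(old, new))
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python relabel.py dump.sql dump_utf.sql
    n = relabel_charset(sys.argv[1], sys.argv[2])
    print(f"{n} replacement(s)")
```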

If you can't import from the command line because you don't have shell access, you can add
SET NAMES utf8;
or
SET NAMES 'utf8';
at the start of the database dump file. If you do this, make sure the dump file itself is saved as UTF-8! Alternatively, use or modify a restore script that runs "SET NAMES utf8" when importing.
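A minimal Python sketch of the "add SET NAMES and save the file in the right charset" step. It assumes you know which encoding the dump is currently saved in (the source_encoding parameter); the helper name add_set_names is hypothetical. It prepends the statement and writes the whole file back out as UTF-8. Note that a later SET NAMES statement inside the dump, such as the one mysqldump usually writes near the top, will override the one added here.

```python
def add_set_names(src: str, dst: str, source_encoding: str = "utf-8-sig") -> None:
    """Write dst as UTF-8: a SET NAMES header followed by the dump's text.

    source_encoding must be the encoding the dump is really saved in
    (an assumption you have to check). The default "utf-8-sig" reads
    plain UTF-8 and also drops a leading BOM if there is one.
    Decoding is strict, so a wrong guess raises an error instead of
    silently garbling the text. newline="" on both files keeps the
    original line endings.
    """
    with open(src, "r", encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write("SET NAMES utf8;\n")
        fout.write(text)
```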

    Please do not PM me with support requests unless I invite you to.

http://www.zeriyt.com/   ~   http://www.galeriashqiptare.net/


<H> I had zero posts when I started posting

kicho

Quote from: Sarge on August 19, 2007, 07:52:57 AM
Be sure to export and import the database with the correct charsets:
http://textsnippets.com/posts/show/84

The sed syntax posted in the link above has never been correct for me in CentOS and Ubuntu (invalid arguments, I think), but the alternate syntax (posted in the last comment) always worked fine:
sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql

If you can't import because you don't have shell access, you can add
SET NAMES utf8
or
SET NAMES 'utf8'
at the start of the database dump file. If you do this, save the dump file in the correct charset! Alternatively, use/modify a restore script that can "SET NAMES utf8" when importing.


Can anyone do this for me? Pretty, pretty, pretty please. :(

Sarge

kicho, I can try. Send me the details via PM. I prefer access to the original (unconverted) phpBB database, if possible.


kicho


CSpili

August 27, 2007, 12:52:37 PM #27 Last Edit: August 27, 2007, 02:09:54 PM by CSpili
Jay, sorry for the delay, I was on vacation... At the moment it's still showing alien characters. The DB config is UTF-8, MyBB uses latin1 as usual, and SMF is installed with UTF-8 support.

So the problem persists. The MySQL database is set to UTF-8, MyBB uses latin1 as usual, and the board is in Greek (ISO-8859-7). However, the converter probably reads the text as ISO-8859-1, and there's no option to change that to ISO-8859-7, either in the database conversion tool in the admin panel or while converting the forum.
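To illustrate what "the converter probably reads it as ISO-8859-1" means, here is a small Python sketch. It assumes the Greek text was stored as ISO-8859-7 bytes and models MySQL's latin1 as ISO-8859-1 (MySQL's latin1 is really closer to cp1252, but the two agree on the byte range used by Greek letters).

```python
greek = "Καλημέρα"  # sample Greek text ("good morning")

# Assumed storage: ISO-8859-7 bytes sitting in a latin1 column.
raw = greek.encode("iso-8859-7")

# A converter (or a utf8 connection) that assumes ISO-8859-1 turns
# each byte into one Latin-1 character: the "alien" text.
misread = raw.decode("iso-8859-1")
print(misread)

# ISO-8859-1 maps every byte, so the misreading is reversible:
# recover the original bytes, then decode with the right codepage.
repaired = misread.encode("iso-8859-1").decode("iso-8859-7")
print(repaired == greek)
```

The repair only works as long as nothing has re-saved the misread text through a charset that drops characters.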

Please advise.
Thanks in advance,
Constantinos

Sarge

CSpili, if you don't hear back from Jay sooner, I will try to help you as soon as I finish helping kicho (above) :)


sektor

Hello,

Any idea why I get a "Hacking attempt..." message when I try to convert to UTF-8?

Sarge

Quote from: al0000 on August 27, 2007, 06:20:14 PM
Any idea why i get a "Hacking attempt..." message when i try to convert to UTF8?

Can you please provide more information? For example, the forum software you're converting from, where you downloaded convert.php and the .sql file, and how you tried the conversion.

A step-by-step example would help us in determining whether the issue can be replicated or is specific to your case.


Sarge

CSpili, I managed to repair kicho's forum, so I can work on yours. Send me a PM. :)


CSpili


Sarge

September 02, 2007, 04:40:32 AM #33 Last Edit: September 02, 2007, 04:42:06 AM by Sarge
Quote from: CSpili on September 01, 2007, 08:02:09 AM
you've got PM Sarge

OK, I got the info. Text and varchar columns in your MyBB database tables seem to be in latin1, whereas connections to MySQL are made in utf8. This causes Greek characters to get garbled during transfer.

First of all, put your MyBB forum in "Maintenance Mode" (I don't know what the relevant setting is in MyBB). Then make a full backup of the MyBB database tables, or delete the SMF tables and back up the whole database. I have found backups generated by cPanel to be pretty reliable. Keep this backup somewhere safe; don't use it for the procedure described below.

Create a second database backup using the modified create_backup.php (attached). This modified tool creates the backup using the latin1 charset, regardless of the charset MySQL uses by default to transfer data (utf8 in your case). You can specify the table prefix (probably mybb_) so that you get only the MyBB tables.
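Why a latin1 backup helps can be sketched in Python: every byte value maps to exactly one ISO-8859-1 character, so dumping a latin1 column over a latin1 connection writes the stored bytes out unchanged, whereas a utf8 connection makes MySQL re-encode each byte. The two functions below are hypothetical models of that behaviour, again treating MySQL's latin1 as ISO-8859-1; they are not how create_backup.php is implemented.

```python
def dump_via_latin1(stored: bytes) -> bytes:
    """Model: column and connection are both latin1, so the stored
    bytes reach the dump unchanged."""
    return stored.decode("iso-8859-1").encode("iso-8859-1")


def dump_via_utf8(stored: bytes) -> bytes:
    """Model: a latin1 column read over a utf8 connection, so each
    latin1 character is re-encoded as UTF-8 (bytes >= 0x80 become two)."""
    return stored.decode("iso-8859-1").encode("utf-8")


# Every byte value 0x00-0xFF round-trips through ISO-8859-1.
all_bytes = bytes(range(256))
assert dump_via_latin1(all_bytes) == all_bytes

greek = "Καλημέρα".encode("iso-8859-7")
print(len(greek), len(dump_via_latin1(greek)), len(dump_via_utf8(greek)))  # prints: 8 8 16
```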

Assuming you're running Windows on your home computer, download and install the Win32 version of GNU sed from here:
http://gnuwin32.sourceforge.net/downlinks/sed.php

Open a command prompt (Start > All Programs > Accessories > Command Prompt, in Windows XP), type the following command and hit Enter:

path-to-sed s/latin1/utf8/g path-to-db-backup.sql > path-to-db-backup-sed.sql

Replace path-to-sed, path-to-db-backup.sql and path-to-db-backup-sed.sql with correct values. For example, if your (uncompressed) database dump is saved as backup.sql in C:\ and sed.exe is located in C:\Program Files\GnuWin32\bin, the correct command would be:

"C:\Program Files\GnuWin32\bin\sed" s/latin1/utf8/g C:\backup.sql > C:\backup-sed.sql

Then import the created backup-sed.sql into phpMyAdmin, cPanel or whatever you use to import database backups. Your MyBB installation should then be in UTF-8 and, hopefully, you will be able to convert to SMF without any problems.

There may be some issues with codepage conversions; for those, you can use a text editor that lets you specify the codepage. UltraEdit-32 seems to support this.
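For the codepage issue (and beate_r's iconv suggestion above), a Python sketch that converts a dump from one codepage to another, roughly what `iconv -f ISO-8859-7 -t UTF-8` does. ISO-8859-7 is an assumption based on the board being Greek (ISO-8859-2 would be the one for Polish, for example), and the function name convert_codepage is hypothetical. Since "latin1" and "utf8" are plain ASCII, it shouldn't matter whether this runs before or after the sed step.

```python
import sys


def convert_codepage(src: str, dst: str, from_enc: str = "iso-8859-7",
                     to_enc: str = "utf-8", chunk_chars: int = 1 << 20) -> None:
    """Re-encode a text file from from_enc to to_enc in chunks.

    newline="" on both files leaves line endings exactly as they were.
    Decoding is strict (the default), so bytes that are undefined in
    from_enc raise UnicodeDecodeError instead of being guessed.
    """
    with open(src, "r", encoding=from_enc, newline="") as fin, \
         open(dst, "w", encoding=to_enc, newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)
            if not chunk:
                break
            fout.write(chunk)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python convert.py backup.sql backup-utf8.sql [from_enc]
    convert_codepage(sys.argv[1], sys.argv[2], *sys.argv[3:4])
```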

If you run into any issues or are afraid of doing the above by yourself, you can send me cPanel login details via PM and I will carry out the conversion process for you.


designer0307

September 05, 2007, 08:01:34 AM #34 Last Edit: September 05, 2007, 08:03:18 AM by designer0307
Hi,
I have a similar problem to yours, but I still can't manage to fix it.
The matter is simple. I have just converted phpBB to SMF and everything works fine except the encoding. The Polish characters ("ąśźć") are missing from the converted posts; I only see "? ? ?". In phpMyAdmin, the phpBB database collation is "latin1_swedish_ci", and the same goes for the SMF tables. SMF is a fresh installation, installed without the "utf8" option checked. I tried checking it and then converting, but the problem remains the same.
Simply put, the converter changes the Polish characters to question marks. How can I fix this?

If someone can help me, I would appreciate it.
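Question marks are what you typically get when text is forced into a character set that has no slot for those letters. A small Python illustration, not a diagnosis of this particular forum: Polish "ąśźć" exists in ISO-8859-2 but not in latin1, so a forced conversion substitutes "?". MySQL behaves similarly for characters it cannot convert, and once a "?" is stored, the original letter cannot be recovered from it, which is why the fix has to be applied to the source data.

```python
polish = "ąśźć"

# latin1 (ISO-8859-1) has no slot for these letters, so a forced
# conversion substitutes "?" for each one.
forced = polish.encode("iso-8859-1", errors="replace")
print(forced)  # prints: b'????'

# The substitution is one-way: "?" decodes back to "?", not to "ą".
print(forced.decode("iso-8859-1") == polish)  # prints: False

# ISO-8859-2 (Latin-2) does contain them, one byte per letter.
latin2 = polish.encode("iso-8859-2")
print(len(latin2), latin2.decode("iso-8859-2") == polish)  # prints: 4 True
```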

Sarge

designer0307, first of all, welcome to SMF! :)

I suspect the accented characters problem has to do with database table charset and/or collation.

Post the configuration variables mentioned in this post:
http://www.simplemachines.org/community/index.php?topic=165442.msg1056581#msg1056581


designer0307

September 05, 2007, 03:33:33 PM #36 Last Edit: September 05, 2007, 03:37:41 PM by designer0307
That's what I thought too, but it seems that's not the case. The phpBB tables use latin1_swedish_ci, so I changed the SMF tables to latin1_swedish_ci and then converted, but there were still question marks in the posts ;/

I tried installing SMF in UTF-8, but it didn't help at all. When I did that, the phpBB tables were latin1_swedish_ci and the SMF tables (in the same database) were utf8_general_ci.

Here are the variables you asked for. I hope this helps a bit.


character_set_client            utf8                 (global value: latin1)
character_set_connection        utf8                 (global value: latin1)
character_set_database          latin1
character_set_results           utf8
default_charset (global value)  latin1
character_set_server            latin1
character_set_system            utf8
character_sets_dir              /usr/share/mysql/charsets/
collation_connection            utf8_unicode_ci      (global value: latin1_swedish_ci)
collation_database              latin1_swedish_ci
collation_server                latin1_swedish_ci


default_charset                   no value
_SERVER["HTTP_ACCEPT_CHARSET"]    ISO-8859-2, utf-8;q=0.7,*;q=0.7
_ENV["HTTP_ACCEPT_CHARSET"]       ISO-8859-2, utf-8;q=0.7,*;q=0.7
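One way to read these settings, sketched in Python with the session values copied from the post (the helper connection_mismatches is hypothetical): list the connection-side charsets that differ from the database charset. Any mismatch means MySQL converts text between the utf8 connection and the latin1 tables on every read and write.

```python
# Session values as posted above (MySQL variable names).
posted = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_results": "utf8",
    "character_set_database": "latin1",
}


def connection_mismatches(v: dict) -> dict:
    """Return the connection-side charset variables that differ from
    character_set_database, i.e. where MySQL will transcode text."""
    db = v["character_set_database"]
    keys = ("character_set_client", "character_set_connection", "character_set_results")
    return {k: v[k] for k in keys if v[k] != db}


print(connection_mismatches(posted))
```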


P.S. I installed the Polish language pack as well, and I used the default SMF template. I tried both packs:

* smf-1-1-2_polish_iso-8859-2.zip
* smf-1-1-2_polish_utf-8.zip

Sarge

Link to your phpBB forum? Send it to me via PM if, for some reason, you don't want to post it publicly.

Are you using SMF now? When converting from phpBB, new posts, members etc. in your new SMF forum will be lost, although I can help with posts.


designer0307

September 05, 2007, 05:32:31 PM #38 Last Edit: September 05, 2007, 05:34:50 PM by designer0307
Sure thing. I have a modified phpBB called "phpBB by Przemo". It converts with ease, though; I am using the phpBB2 converter for this. The only problem is the encoding.
hxxp:forum.forumwow.net [nonactive]

I want to convert it to SMF and add TinyPortal to it, so I am not currently using SMF; it's only a fresh, clean installation.

This is what i managed to convert:
hxxp:forumwow.net/smf [nonactive]

I don't care about anything but the posts.

Thx for helping me ;)

Sarge

I see. The data gets saved as ISO-8859-2, which is also indicated by the encoding used in the phpBB forum pages (View -> Character Encoding in Firefox).

Do Polish characters show up correctly in new posts in SMF?

I will try to help Jay add support for multiple character sets in the converters, but meanwhile I can solve the issue for you. PM me your access details (cPanel, FTP, SMF Admin accounts) and I will do the conversion for you.

Let me know if you want to keep the existing SMF installation (and new posts and members), or redo the conversion from phpBB and lose the new posts. Personally, I prefer the second method, because it's cleaner (read my previous post), but it's up to you.

