Character encoding (UTF-8) does not work on clean install of 2.0 RC5

Started by sakaal, May 01, 2011, 06:37:23 PM


sakaal

I have installed 2.0 RC5 and the Finnish language pack on a clean server.
smf_2-0-rc5_install.tar.gz
smf_2-0-rc5_finnish.tar.gz

The database is PostgreSQL 8.4 running on a separate host.

Both hosts are Ubuntu 10.10 LTS servers.

I have full root access to the whole environment, database, Apache 2.2, and everything else.

The database has been created as:

create database smf with owner smf encoding 'UTF8';

Running the Simple Machines installer works fine when I do everything in English only (without installing the Finnish language pack and without using Scandinavian letters, i.e. umlauts).

If I run the installer after installing the Finnish language pack, it fails when inserting some rows into the database.

When I try to use Scandinavian letters (either with the English installation or after installing the Finnish language pack manually), I get the same character-encoding errors.
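The pattern described above (English-only text works, umlauts fail) fits how the two common encodings relate: plain ASCII produces identical bytes in ISO-8859-1 (Latin-1) and UTF-8, while letters such as ä, ö and å do not. A minimal Python sketch of that difference; the helper name `byte_forms` and the sample strings are illustrative only, not part of SMF.

```python
def byte_forms(text: str) -> tuple[bytes, bytes]:
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for the given text."""
    return text.encode("latin-1"), text.encode("utf-8")


if __name__ == "__main__":
    # ASCII-only text: both encodings give identical bytes,
    # so an English-only install never exposes a mismatch.
    latin1, utf8 = byte_forms("Welcome to the forum")
    print(latin1 == utf8)

    # Scandinavian letters: one byte in Latin-1, two bytes in UTF-8.
    for ch in "äöå":
        l1, u8 = byte_forms(ch)
        print(ch, l1.hex(), u8.hex())
```

Because a UTF-8 database only rejects bytes that are not valid UTF-8, a Latin-1 mismatch stays invisible until the first non-ASCII character is stored.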

All the errors are similar to this:

ERROR: invalid byte sequence for encoding "UTF8": 0xf6e42020
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
File: /srv/www/www.jutska.net-ssl/htdocs/foorumi/Sources/Subs-Post.php
Line: 2468
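The byte sequence in the error, 0xf6e42020, is not valid UTF-8, but it decodes cleanly as ISO-8859-1 (Latin-1) to "öä" followed by two spaces. That suggests the text reached PostgreSQL in a single-byte Latin-1 encoding rather than as UTF-8. A short Python check; the function name `is_valid_utf8` is a hypothetical helper for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Bytes reported by PostgreSQL in the error message above.
raw = bytes.fromhex("f6e42020")

if __name__ == "__main__":
    print(is_valid_utf8(raw))           # False: 0xf6 cannot start a UTF-8 sequence
    print(repr(raw.decode("latin-1")))  # 'öä  '
```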

I always get messages like this when trying to use umlauts and similar characters.

I tried setting $db_character_set = 'utf8'; and even "'UTF8'" (with double quoting), as suggested in an older forum post. I tried many other values as well, with no success. I restarted Apache 2 between changes and tried all the usual steps.

This should be a very basic use case: to do a clean install on a UTF-8 PostgreSQL 8.4 server with one language pack, don't you think?

Could someone tell me the correct settings for a UTF-8 PostgreSQL database? I don't care about the configuration wizards; I just need to know what to put in the configuration files.

And yes, I do need to use UTF-8. I'm integrating it with other systems, which are also based on UTF-8. I do need to use PostgreSQL, because it's a real database and the other systems need it (for example PostGIS).

Everybody should use UTF-8 on the web anyway. Web is global, so locally limited character sets should be deprecated. UTF-8 is the way to go.

Oldiesmann

Make sure you install the Finnish-UTF8 language pack, and not the regular Finnish one.
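If it is unclear which pack is installed, the files themselves can be checked. A minimal Python sketch, assuming the language files are `*.php` files under a directory such as `Themes/default/languages` (path hypothetical; adjust to the actual install) and that any file which is not valid UTF-8 is ISO-8859-1. The optional conversion only re-encodes bytes; it does not change any character-set declaration the pack itself may contain, so installing the Finnish-UTF8 pack remains the simpler route.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes ASSUMED to be ISO-8859-1 as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def scan_language_dir(directory: Path, convert: bool = False) -> list[Path]:
    """Return *.php files under directory whose bytes are not valid UTF-8.

    With convert=True, each such file is rewritten in place as UTF-8,
    assuming (not verifying) that its current encoding is Latin-1.
    Pure-ASCII files are valid UTF-8 and are left untouched.
    """
    bad = []
    for path in sorted(directory.rglob("*.php")):
        data = path.read_bytes()
        if not is_utf8(data):
            bad.append(path)
            if convert:
                path.write_bytes(latin1_to_utf8(data))
    return bad


if __name__ == "__main__":
    # Hypothetical location of the forum's language files.
    lang_dir = Path("Themes/default/languages")
    if lang_dir.is_dir():
        for p in scan_language_dir(lang_dir):
            print("not valid UTF-8:", p)
```

Running it without `convert=True` first gives a read-only report, which is safer on a live forum.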
Michael Eshom
Christian Metal Fans
