Wrong characterset?

Started by HunterP, April 11, 2011, 08:21:25 PM

HunterP


Hi,

I recently converted a phpBB3 board to SMF 2.0. Before the conversion, the board had been moved to another host. I guess something went wrong when the DB was imported, because I'm getting weird characters like:

Skarsterlân

This should be:

Skarsterlân

How can I solve this?

MrPhil

It looks like your text is in UTF-8, while your page is in some single-byte encoding such as Latin-n (ISO-8859-n). Though, I've never seen a common Western European character like a+hat take up four bytes in UTF-8, unless possibly it's some sort of base + combining accent. In your browser, check by changing the encoding on the fly (View > Character Encoding, or something like that, and select UTF-8) and see if you can determine what the correct encoding is. Anyway, check which encoding your database and language files use, and then which encoding the page is being displayed in. They should all be the same, and probably aren't.
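
To illustrate the mismatch described above, here is a minimal Python sketch of UTF-8 bytes being read as Latin-1. The double-encoding step at the end is only one possible explanation for a four-byte "â" (an assumption, not something confirmed in this topic), and the variable names are made up for the example.

```python
# Sketch: what UTF-8 text looks like when decoded as a single-byte encoding.
text = "Skarsterl\u00e2n"  # "Skarsterlân", written with an escape for portability

utf8_bytes = text.encode("utf-8")        # "â" becomes two bytes: 0xC3 0xA2
misread = utf8_bytes.decode("latin-1")   # each byte is shown as its own character
print(misread)

# Assumption: if the misread text is then stored as UTF-8 again (for example
# during a DB import with the wrong connection charset), "â" grows to four bytes.
double_encoded = misread.encode("utf-8")
print(len(double_encoded) - len(utf8_bytes))

# Undoing the wrong decode step recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired == text)
```

The same round trip is a quick way to test whether stored text is merely misread (fixable by changing the declared encoding) or has actually been re-encoded in the database.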

HunterP


Found the solution. Weirdly enough, I encountered this twice, but the two cases were a bit different, which is why I didn't recognize it the second time. There seems to be something wrong with the Display.template.php from the full package. I think there is a BOM in the file: when I copy its contents to a clean file, without any changes, the file becomes a few bytes smaller and the problem is solved!

Maybe someone can check the zipfiles in the download section?
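
For anyone who would rather not copy and paste by hand, the same cleanup can be sketched in Python. This assumes the stray bytes are a UTF-8 BOM at the very start of the file; `strip_bom` is a hypothetical helper name, not part of SMF.

```python
import codecs
from pathlib import Path


def strip_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`.

    Returns True if a BOM was found and removed, False if the file was
    left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Rewrite the file without its first three bytes (EF BB BF).
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling something like `strip_bom("Display.template.php")` on a copy of the file should make it exactly three bytes smaller if a BOM was present. Keep a backup before rewriting files on a live forum.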

MrPhil

Odd. A BOM in a file should have no influence on what's in the database, or what encoding a page is displayed in. What it should do is cause a "headers already sent" error message. Anyway, you can check for yourself by putting your PC or editor into Latin-x encoding (not UTF-8) and taking a look at the PHP file in question. You should then clearly see the three-byte BOM at the beginning.
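
As a rough illustration of that check, this Python sketch shows what the three BOM bytes look like when viewed as Latin-1. The sample bytes are invented for the example, and `has_utf8_bom` is a hypothetical helper name.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Invented sample: a PHP file that was saved with a BOM in front of "<?php".
sample = codecs.BOM_UTF8 + b"<?php echo 'hi';"

print(has_utf8_bom(sample))
# Viewed as Latin-1, the BOM appears as three visible characters.
print(sample[:3].decode("latin-1"))
```

Those three characters in front of `<?php` are what an editor set to a Latin encoding would display at the top of an affected file.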

HunterP

Quote from: MrPhil on April 12, 2011, 08:12:43 PM
Odd. A BOM in a file should have no influence on what's in the database, or what encoding a page is displayed in. What it should do is cause a "headers already sent" error message. Anyway, you can check for yourself by putting your PC or editor into Latin-x encoding (not UTF-8) and taking a look at the PHP file in question. You should then clearly see the three-byte BOM at the beginning.

I've seen this on two separate forums. When I view the page source, the BOM is shown as the first three characters, and I guess this messes up the line which declares the charset. Some browsers might render this line correctly (Firefox does), some don't (IE doesn't).

The HTML files started with:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

I assume that some browsers want to see <!DOCTYPE starting at the beginning of a line, or at the beginning of a file? Anyway, when this was fixed, the problem was solved.

MrPhil

The BOM must have been part of, or near, whatever output the DOCTYPE tag. Usually, when a BOM is sitting at the top of a .php file, it is outside of the PHP code (before <?php ) and gets sent directly to the browser as HTML code. If it's early enough in the process of creating a page, the application may attempt to send "headers" after the BOM has already been sent, which causes the "headers already sent" error (sending any text or HTML will trigger the sending of whatever HTTP headers have been set so far). If it's late enough in the process, as it evidently was in your case, no headers are being sent after the BOM happened to be sent, so no harm done.

Most browsers want to see any DOCTYPE tag at the very beginning of the page, and may ignore it (go into "quirks mode") if it isn't the very first thing. That might affect which character set encoding is used. So, it's still a good idea to get rid of any BOM that happens to be in your files. I believe that there is some sort of "file_check.php" utility floating around SMF that, among other things, checks for stray BOMs.
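
In case that utility isn't at hand, a stray-BOM scan can be sketched in a few lines of Python. This is not the SMF tool mentioned above, just a stand-in under the assumption that only UTF-8 BOMs in `.php` files matter; `find_bom_files` and its parameters are hypothetical names.

```python
import codecs
from pathlib import Path


def find_bom_files(root, pattern="*.php"):
    """Return the files under `root` matching `pattern` that start with a UTF-8 BOM."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # Only the first three bytes are needed to detect the BOM.
            if fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
                hits.append(path)
    return hits
```

Pointing `find_bom_files` at the forum's root directory gives a list of files to inspect. It only reads files, so it should be safe to run against a live install.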

HunterP


I didn't touch the files before I installed SMF, and as it happened on two forums, the BOM has to be in the SMF package.

Anyway, if someone else experiences the same problem in the future, this topic might be helpful.
