index.english-utf8.php is missing

frakme:
I am still having trouble with this issue. Does anyone have any more input?

As a refresher: the Latin characters appear only in the body of messages and simple portal pages AFTER the Post or Preview button has been pressed. The text appears correctly in the database and in the edit/modify screens of the forum itself.

MrPhil:

--- Quote from: frakme on March 23, 2012, 07:19:49 PM ---I added the utf-8 files in the themes/default/languages file, the forum is set to english/utf8 but foreign characters are still not showing. This is where I lack knowledge because I thought the purpose of utf8 was so that one didn't have to add each individual language file.

--- End quote ---

Setting the forum language to English merely means that prompts and labels and messages come out in (American) English (sorry, K@!). You can add other languages if you want, selected by the viewers, so long as they all use the same encoding (UTF-8, in your case). The "encoding" is a different matter, giving what alphabet(s) to use. Latin-1 (ISO-8859-1) is the default for SMF, and supports English and other "Western European" languages. If someone wants to submit a post about the Greek economic crisis in Greek text, they're outta luck. By using UTF-8, anyone can submit a post in any language known to humanity (plus Klingon, Elvish, and other made-up languages).

Several things need to match, for "foreign characters" to show up. Your database itself needs to be in UTF-8, and its content should have been converted from Latin-1 to UTF-8 at some point (e.g., an é encoded in Latin-1 as a single byte is now a two-byte UTF-8 character). All the language support files for all the languages you want to support need to be in UTF-8. Finally, each page needs to be output specifying UTF-8 as the charset (encoding). That will show up in the <meta ... charset=UTF-8... tag in each page.
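To make the byte-count point concrete, here is a minimal Python sketch (not SMF code, just an illustration) showing how the same é is stored under each encoding. The variable names are made up for the example.

```python
# Illustration only: the same character stored under two encodings.
ch = "\u00e9"  # é

latin1_bytes = ch.encode("latin-1")  # a single byte, 0xE9
utf8_bytes = ch.encode("utf-8")      # two bytes, 0xC3 0xA9

print(latin1_bytes.hex(), len(latin1_bytes))
print(utf8_bytes.hex(), len(utf8_bytes))
```

If the database content was never converted, the stored bytes are still the one-byte form even though the tables are labelled UTF-8.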

I don't think you've told us yet how accented characters "fail" to show. Are you seeing a string of two or more European accented characters for each expected accented character? If so, that means the text is being provided in UTF-8, but the page is displaying in Latin-1. Are you seeing a ?-in-black-diamond glyph instead of the expected accented character? If so, that means the text is in Latin-1, but the page display is in UTF-8. Is there any difference between fixed language file text and content from the database (such as posts)? If they show accented characters differently, that could be a clue.
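The two failure modes described above can be reproduced in a few lines of Python. This is only a sketch to show what each mismatch looks like; it assumes nothing about how SMF itself handles text.

```python
text = "\u00e9"  # é

# Case 1: text is UTF-8, but the bytes are interpreted as Latin-1.
# Each of the two UTF-8 bytes turns into its own accented character.
utf8_read_as_latin1 = text.encode("utf-8").decode("latin-1")

# Case 2: text is Latin-1, but the bytes are interpreted as UTF-8.
# The lone byte 0xE9 is not valid UTF-8, so it becomes U+FFFD,
# the replacement character (the ?-in-black-diamond glyph).
latin1_read_as_utf8 = text.encode("latin-1").decode("utf-8", errors="replace")

print(repr(utf8_read_as_latin1))
print(repr(latin1_read_as_utf8))
```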

N.B.: some servers are misconfigured so that they force the page to be displayed in Latin-1, even though the <meta> tag for charset says UTF-8. If your meta tag says UTF-8 yet you're seeing multiple accented European characters for each accented or non-Latin character, tell your browser to show the encoding in use: View > Character Set (or something similar). If it says Latin-1/ISO-8859-1/Western European, change it to UTF-8 and see if the problem goes away for that page. If so, talk to your host and ask why their server is forcing Latin-1.
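A rough Python sketch of why the server setting matters: when the HTTP Content-Type header names a charset, browsers use it in preference to the <meta> tag. The function names and the sample header and page below are hypothetical, and the regex is a simplification rather than a full HTML parser.

```python
import re
from email.message import Message

def header_charset(content_type):
    """Charset named in an HTTP Content-Type header value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def meta_charset(html):
    """Charset named in the first <meta ... charset=...> tag, lowercased, or None."""
    match = re.search(r'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None

def effective_charset(content_type, html):
    """The HTTP header wins over the <meta> tag when both are present."""
    return header_charset(content_type) or meta_charset(html)

# Hypothetical values for a server that forces Latin-1
header = "text/html; charset=ISO-8859-1"
page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
print(effective_charset(header, page))
```

To check a real site, compare the Content-Type response header shown in the browser's developer tools against the <meta> tag in the page source.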

Finally, if you're missing an English UTF-8 file, just copy the corresponding regular English file to the UTF-8 name or directory (whichever is appropriate). English does not make use of non-ASCII characters (no accents), so it's unlikely that there would be any difference in the text between Latin-1 and UTF-8. If a Pound Sterling character, a hard-coded non-breaking space character, or a guillemet quotation mark is used, you might have to edit the file to fix those, but otherwise it's unlikely you'll have to make any edits. Same thing for any missing image/icon files: just copy them from the regular English area into the UTF-8 English area.
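One way to check whether a copied file needs any edits is to look for bytes outside the ASCII range. The helper below is a hypothetical sketch, and the sample bytes are invented for the example (they are not taken from a real SMF language file).

```python
def non_ascii_lines(data: bytes):
    """Return (line_number, line) pairs for lines containing any byte >= 0x80."""
    return [
        (number, line)
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte >= 0x80 for byte in line)
    ]

# Invented sample: the second line holds a Latin-1 Pound Sterling sign (0xA3).
sample = b"$txt['hello'] = 'Hello';\n$txt['price'] = '\xa35';\n"

for number, line in non_ascii_lines(sample):
    print(number, line)
```

An empty result means the file is pure ASCII, so its bytes are identical in Latin-1 and UTF-8 and a straight copy is enough. To run it on a real file, read the file in binary mode and pass the bytes in.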

P.S. Not a "foreign character", but certain punctuation characters cut and pasted from Microsoft Word are "Smart Quotes" which will mess up UTF-8 and (often) Latin-1 displays. That is a separate issue. Let us know if the problem is actually that certain punctuation characters are disappearing and/or taking the rest of a post with them.
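A short Python sketch of what goes wrong with such quotes, assuming the pasted text arrives as Windows-1252 bytes (an assumption for this example, since that is the encoding in which curly quotes occupy 0x93 and 0x94):

```python
# Opening and closing curly quotes as single Windows-1252 bytes.
curly = b"\x93Hello\x94"

as_cp1252 = curly.decode("cp1252")                 # real curly quotes
as_latin1 = curly.decode("latin-1")                # invisible C1 control characters
as_utf8 = curly.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD

print(repr(as_cp1252))
print(repr(as_latin1))
print(repr(as_utf8))
```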

frakme:

--- Quote --- By using UTF-8, anyone can submit a post in any language known to humanity (plus Klingon, Elvish, and other made-up languages).

--- End quote ---
Uh, yeah. That's why my original post was looking for the missing UTF8 language pack: the pack that "disappeared" after the website was upgraded to SMF 2.0.2. The forum and database have always been set to UTF8 for that very reason; our posters use many languages to enhance their stories.


--- Quote ---I don't think you've told us yet how accented characters "fail" to show. Are you seeing a string of two or more European accented characters for each expected accented character?
--- End quote ---
I'm sorry to say I've said this in multiple posts above. They appear as Latin characters, Å, Ä,ƒ ¾Ã and so forth. They appear "correct" with the correct language letter (be it Czech, Japanese or Arabic) in the database. They appear correct in every area of the website and forum EXCEPT the actual post body. If you edit any post body using the modify button, the same post appears correct but will return to the incorrect (i.e. Latin character) presentation after pressing the "post" button.


--- Quote ---Are you seeing a string of two or more European accented characters for each expected accented character? If so, that means the text is being provided in UTF-8, but the page is displaying in Latin-1.
--- End quote ---
Yes, that is what I believe the problem is. I posted the above posts asking how I can fix that because I don't know why it is happening. The database and forum are set to UTF8, with the appropriate language packs installed. All tables, columns, etc. are set to utf8 unicode, and the database is as well. UTF8 is set as the language via the SMF Cpanel and the meta tags indicate the website is set to utf8.

So what is my next step? Again, I appreciate the help and patience. Thank you.

MrPhil:
If your text is definitely in UTF-8, is your page specifying UTF-8 charset (encoding)? If you are (everything everywhere is UTF-8), in your browser go to View > Character Encoding and see if it's showing UTF-8. If not, select UTF-8 and see if the bad characters clear up. Then ask your host why their server is overriding your encoding specification.

Regarding missing UTF-8 language files, for English, simply copy over the missing ones from regular (Latin-1) English. For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.
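For the conversion step, a minimal Python sketch follows. The function name is made up, and it assumes the source really is Latin-1. The up-front check is only a heuristic to avoid converting content that is already UTF-8, since converting twice is one way to end up with strings of accented characters.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8, leaving already-valid UTF-8 untouched."""
    try:
        # Heuristic: real Latin-1 text with accents is very unlikely to also
        # be valid UTF-8, so valid UTF-8 (including plain ASCII) is left alone.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode("latin-1").encode("utf-8")

print(latin1_to_utf8(b"caf\xe9"))      # Latin-1 input gets converted
print(latin1_to_utf8(b"caf\xc3\xa9"))  # UTF-8 input is returned unchanged
```

For a file, read it in binary mode, pass the bytes through the function, and write the result to the UTF-8 file name.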

frakme:

--- Quote ---If you are (everything everywhere is UTF-8), in your browser go to View > Character Encoding and see if it's showing UTF-8. If not, select UTF-8 and see if the bad characters clear up.
--- End quote ---

The same characters appear correctly (an umlaut appears as an umlaut, for example) in certain sections of the website, so it is not a browser issue.



--- Quote ---Regarding missing UTF-8 language files, for English, simply copy over the missing ones from regular (Latin-1) English. For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.
--- End quote ---

I was missing the php file and the 2.0.2 version of the UTF-8 language pack. Luckily that issue was solved at the beginning of this thread.


--- Quote ---Then ask your host why their server is overriding your encoding specification.
--- End quote ---
According to my host, they are not.



--- Quote ---For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.
--- End quote ---
According to my database collation information, everything has been converted to UTF8 unicode.
