index.english-utf8.php is missing

Started by frakme, March 23, 2012, 06:27:24 PM


frakme

Shall I post more or is that enough to get someone to reply?

frakme

Is there any more assistance available on this matter?

ziycon

I had a brief look over this topic. I see that the database is set to utf8_unicode_ci, while the database tables and the connection are set to utf8_general_ci. Would you be able to back up the whole database and change everything to one or the other? It might be easiest to change everything to utf8_unicode_ci, as the server seems to already be set to that.

frakme

Thank you for your reply. I will try that and see how it goes.

Norv

Hello frakme,

Sorry to have gotten so late to this issue. Please, let us know if the problem of your forum is still unsolved.
To-do lists are for deferral. The more things you write down the later they're done... until you have 100s of lists of things you don't do.

File a security report | Developers' Blog | Bug Tracker


Also known as Norv on D* | Norv N. on G+ | Norv on Github

frakme

QuoteHello frakme,

Sorry to have gotten so late to this issue. Please, let us know if the problem of your forum is still unsolved.

Unfortunately, no. It did not fix the errors that appear only in the post and preview area of new forum posts. 

Norv

Can you please:
- log in to phpMyAdmin and select your SMF database
- run the following query:

SELECT * FROM `smf_settings` WHERE `variable` = 'global_character_set';

(replace 'smf_' accordingly with your actual database prefix)
Please do tell what the result is.

Also, could you possibly post a link to your forum?

frakme

Running that query didn't work either. I just noticed the verification image for the registration page isn't working either. That is usually related to language, so I can't imagine they aren't related.

Norv

Sorry, I wasn't clear enough: the query wasn't supposed to fix the problem, but I need to know its exact result. Could you please paste it here?

The entire message you get from MySQL.

frakme

No, that's on me; it's what I get for trying to work when I should have been sleeping. Sorry about that. I ran the query and I did not get any kind of error message.

frakme

I am still having trouble with this issue. Does anyone have any more input?

As a refresher, the garbled Latin characters appear only in the body of messages and simple portal pages, and only AFTER the post or preview button has been pushed. The text appears correctly in the database and in the edit/modify screens of the forum itself.

MrPhil

Quote from: frakme on March 23, 2012, 07:19:49 PM
I added the UTF-8 files to the themes/default/languages folder, and the forum is set to english/utf8, but foreign characters are still not showing. This is where I lack knowledge, because I thought the purpose of UTF-8 was that one didn't have to add each individual language file.

Setting the forum language to English merely means that prompts and labels and messages come out in (American) English (sorry, K@!). You can add other languages if you want, selected by the viewers, so long as they all use the same encoding (UTF-8, in your case). The "encoding" is a different matter, giving what alphabet(s) to use. Latin-1 (ISO-8859-1) is the default for SMF, and supports English and other "Western European" languages. If someone wants to submit a post about the Greek economic crisis in Greek text, they're outta luck. By using UTF-8, anyone can submit a post in any language known to humanity (plus Klingon, Elvish, and other made-up languages).

Several things need to match, for "foreign characters" to show up. Your database itself needs to be in UTF-8, and its content should have been converted from Latin-1 to UTF-8 at some point (e.g., an é encoded in Latin-1 as a single byte is now a two-byte UTF-8 character). All the language support files for all the languages you want to support need to be in UTF-8. Finally, each page needs to be output specifying UTF-8 as the charset (encoding). That will show up in the <meta ... charset=UTF-8... tag in each page.
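To make the one-byte versus two-byte point concrete, here is a minimal Python sketch. Python is used purely for illustration (it is not part of SMF), and the variable names are made up for this example. It encodes the same é both ways and prints the raw bytes.

```python
# The same character, encoded two different ways.
text = "é"

latin1_bytes = text.encode("latin-1")  # one byte in Latin-1 (ISO-8859-1)
utf8_bytes = text.encode("utf-8")      # two bytes in UTF-8

print(latin1_bytes, len(latin1_bytes))  # b'\xe9' 1
print(utf8_bytes, len(utf8_bytes))      # b'\xc3\xa9' 2
```

If the database, the language files, and the page charset do not all agree on which of these two byte sequences means é, the character will not display correctly.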

I don't think you've told us yet how accented characters "fail" to show. Are you seeing a string of two or more European accented characters for each expected accented character? If so, that means the text is being provided in UTF-8, but the page is displaying in Latin-1. Are you seeing a ?-in-black-diamond glyph instead of the expected accented character? If so, that means the text is in Latin-1, but the page display is in UTF-8. Is there any difference between fixed language file text and content from the database (such as posts)? If they show accented characters differently, that could be a clue.
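The two failure modes described above can be reproduced in a few lines of Python. This is only an illustrative sketch: `shown_as` is a hypothetical helper written for this example, not an SMF or browser function. It encodes text one way and decodes it the other way, which is roughly what happens when the stored text and the page charset disagree.

```python
def shown_as(text, stored_as, displayed_as):
    """Encode text with one charset and decode the bytes with another."""
    raw = text.encode(stored_as)
    # errors="replace" substitutes U+FFFD (the ?-in-black-diamond glyph)
    # for bytes that are invalid in the display charset.
    return raw.decode(displayed_as, errors="replace")

# Case 1: text is UTF-8, page is displayed as Latin-1.
utf8_read_as_latin1 = shown_as("é", "utf-8", "latin-1")
print(ascii(utf8_read_as_latin1))  # '\xc3\xa9', i.e. the two characters Ã©

# Case 2: text is Latin-1, page is displayed as UTF-8.
latin1_read_as_utf8 = shown_as("é", "latin-1", "utf-8")
print(ascii(latin1_read_as_utf8))  # '\ufffd', the replacement character

# Diagnostic: if reversing case 1 gives clean text back, the garbled
# string was UTF-8 that had been read as Latin-1.
repaired = utf8_read_as_latin1.encode("latin-1").decode("utf-8")
print(repaired == "é")  # True
```

Two or more accented characters per expected character corresponds to case 1, and the diamond glyph corresponds to case 2.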

N.B.: some servers are misconfigured so that they force the page to be displayed in Latin-1, even though the <meta> tag for charset says UTF-8. If your meta tag says UTF-8 yet you're seeing multiple accented European characters for each accented or non-Latin character, tell your browser to show the encoding in use: View > Character Set (or something similar). If it says Latin-1/ISO-8859-1/Western European, change it to UTF-8 and see if the problem goes away for that page. If so, talk to your host and ask why their server is forcing Latin-1.
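The reason a server can override the <meta> tag is that a charset given in the HTTP Content-Type header takes precedence over one declared inside the page. The sketch below is a simplified model of that rule, not a full implementation of what browsers do (it ignores byte-order marks and manual overrides). The helper names `header_charset` and `effective_charset` are made up for this example; the header values are sample strings, not output from any real server.

```python
from email.message import Message

def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased charset, or None

def effective_charset(content_type, meta_charset):
    """Simplified rule: the HTTP header wins; the <meta> tag is a fallback."""
    from_header = header_charset(content_type)
    return from_header or meta_charset.lower()

# Server forces Latin-1 even though the page's <meta> tag says UTF-8.
print(effective_charset("text/html; charset=ISO-8859-1", "UTF-8"))  # iso-8859-1

# Server sends no charset, so the <meta> tag is used.
print(effective_charset("text/html", "UTF-8"))  # utf-8
```

To apply this to a real page, the Content-Type value would have to be read from the response headers in the browser's developer tools or with a command-line HTTP client.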

Finally, if you're missing an English UTF-8 file, just copy the corresponding regular English file to the UTF-8 name or directory (whichever is appropriate). English does not make use of non-ASCII characters (no accents), so it's unlikely that there would be any difference in the text between Latin-1 and UTF-8. Possibly if a Pound Sterling character, or a hard coded non-breaking space character, or a guillemet quotation mark is used, you might have to edit the file to fix those, but otherwise it's unlikely you'll have to make any edits. Same thing for any missing image/icon files: just copy them from regular English into UTF-8 English area.

P.S. Not a "foreign character", but certain punctuation characters cut and pasted from Microsoft Word are "Smart Quotes" which will mess up UTF-8 and (often) Latin-1 displays. That is a separate issue. Let us know if the problem is actually that certain punctuation characters are disappearing and/or taking the rest of a post with them.

frakme

QuoteBy using UTF-8, anyone can submit a post in any language known to humanity (plus Klingon, Elvish, and other made-up languages).
Uh, yeah. That's why my original post was looking for the missing UTF8 language pack- the pack that "disappeared" after the website was upgraded to smf 2.02. The forum and database have always been set to UTF8 for that very reason-our posters use many languages to enhance their stories. 

QuoteI don't think you've told us yet how accented characters "fail" to show. Are you seeing a string of two or more European accented characters for each expected accented character?
I'm sorry to say I've said this in multiple posts above. They appear as Latin characters: Å, Ä,ƒ ¾Ã and so forth. They appear correctly, with the right letters for the language (be it Czech, Japanese or Arabic), in the database. They appear correctly in every area of the website and forum EXCEPT the actual post body. If you edit any post body using the modify button, the same post appears correctly, but it returns to the incorrect (i.e. Latin character) presentation after pressing the "post" button.

QuoteAre you seeing a string of two or more European accented characters for each expected accented character? If so, that means the text is being provided in UTF-8, but the page is displaying in Latin-1.
Yes, that is what I believe the problem is. I posted the above posts asking how I can fix that because I don't know why they are doing that. The database and forum are set to UTF8, with the appropriate language packs installed. All tables, columns etc are set to utf8 unicode, the database is as well. UTF8 is set as the language via the smf Cpanel and the meta tags indicate the website is set to utf8.

So what is my next step? Again, I appreciate the help and patience. Thank you.

MrPhil

If your text is definitely in UTF-8, is your page specifying UTF-8 charset (encoding)? If you are (everything everywhere is UTF-8), in your browser go to View > Character Encoding and see if it's showing UTF-8. If not, select UTF-8 and see if the bad characters clear up. Then ask your host why their server is overriding your encoding specification.

Regarding missing UTF-8 language files, for English, simply copy over the missing ones from regular (Latin-1) English. For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.

frakme

QuoteIf you are (everything everywhere is UTF-8), in your browser go to View > Character Encoding and see if it's showing UTF-8. If not, select UTF-8 and see if the bad characters clear up.

The same characters appear correctly (an umlaut appears as an umlaut, for example) in certain sections of the website, so it is not a browser issue.


QuoteRegarding missing UTF-8 language files, for English, simply copy over the missing ones from regular (Latin-1) English. For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.

I was missing the php file and the 2.02 version of the UTF-8 language pack. Luckily that issue was solved at the beginning of this topic.

QuoteThen ask your host why their server is overriding your encoding specification.
According to my host, they are not.


QuoteFor other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.
According to my database collation information, everything has been converted to UTF8 unicode.


frakme

I kind of gave up on this problem and was ready to say screw it. Then the website owner came back from vacation and schooled me a little bit. She fixed the problem and gave clear step-by-step instructions while we worked. The solution sounds like a lot of work, but it really was just a basic "clean install" of SMF 2.02. It only took about 2 hours in total, and that was only because we had so many tables to go through.

Solution
1) created a new empty database
2) exported the data and structure of the current "live" database to the hard drive of a local computer
3) opened the export in EmEditor and checked for any incorrect syntax or mixed collation
4) saved it ("save as") with the encoding set to UTF-8, then imported it into the newly created database

At this point, she verified in phpMyAdmin that all charsets and other information were correctly collated. The website was pointed at the new database. Characters were still appearing garbled on the website.

5) created a new empty database
6) looked through the index, ini and source php files stored on the host server to make sure all charsets were set to utf8; finding no errors, we went on to the next step
7) deleted all SMF files from the hosting directory/server
8) manually installed SMF 2.02 onto the host server, using FileZilla, to create a clean live site
9) dropped a single table and imported the same table from the exported file of the old database
10) double-checked that the collation for the rows of each table (by looking in the Structure tab of phpMyAdmin) and for the table column (by using the Operations tab for each table) was still set to utf8_general_ci
11) repeated for each standard table
12) used the SMF package manager on cpanel to add packages
13) dropped each package's tables one at a time and imported the same table from the exported file of the old database
14) characters appear correctly

She said she could probably have imported the entire database at once, verified the collation, and it would have worked fine. But she also said she wanted to double-check that there wasn't some weird error in any single table, since the errors appeared suddenly. It is her belief that "at some point during the original upgrade via the hosting panel, something broke." At least that is a way to explain why all the information/communication was correctly coded/collated but errors still appeared.
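The manual check for mixed collation in the exported file (step 3 of the solution above) could be partly automated. The following Python sketch is not what was actually done here; it is one possible way to list every charset and collation name that a SQL dump declares. The function name `charset_declarations` and the sample dump text are assumptions made up for this example (the table definition is illustrative only, not copied from a real SMF export).

```python
import re
from collections import Counter

# Matches "CHARSET=x", "CHARACTER SET x" and "COLLATE x" / "COLLATE=x".
DECLARATION = re.compile(
    r"(?:CHARSET|CHARACTER SET|COLLATE)\s*=?\s*([A-Za-z0-9_]+)",
    re.IGNORECASE,
)

def charset_declarations(dump_text):
    """Count each charset/collation name declared in a SQL dump."""
    return Counter(name.lower() for name in DECLARATION.findall(dump_text))

# Illustrative dump fragment with a column/table collation mismatch.
sample_dump = """
CREATE TABLE smf_messages (
  body text CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
"""

counts = charset_declarations(sample_dump)
collations = sorted(name for name in counts if name.endswith("_ci"))
print(collations)           # ['utf8_general_ci', 'utf8_unicode_ci']
print(len(collations) > 1)  # True: the dump mixes collations
```

For a real export, `sample_dump` would be replaced by the contents of the dump file, read with the encoding the file was saved in.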

