index.english-utf8.php is missing

Started by frakme, March 23, 2012, 06:27:24 PM


frakme

We recently converted our forum from 1.14 to 2.02. We followed the usual steps and did so without any noticeable issue, except for a bug in the subs.php menu referenced HERE. Mods installed are SMF Gallery Lite and Simple Portal.

A few days ago, members started reporting problems with characters in new posts. When members have used other languages (likely copied directly from a translator)  Å shows instead of some of the typed characters. In older posts, everything from the first "foreign" character on will have disappeared. I checked the database and that information is still "there" it just is not showing up on the smf side of things.

The admin panel on the smf side of the forum shows that ISO-8859-1/English is the language installed. I did convert HTML-entities to UTF-8 characters through the SMF control panel. The file index.english-utf8.php is missing from the files on the host. I thought it was there when we ran 1.14. The other account manager on this forum/host says he did not remove the file when he upgraded to 2.02, but said that when he upgraded, index.english.php replaced index.english-utf8.php in the themes/default/languages folder.

So my question is

1) is this just a language problem  that can be fixed by finding and adding the english-utf8.php file?
2) if not, what do I need to do from here?

kat

Finding the language files, here, is a bit of a nightmare, I'm afraid. They're in a really illogical place, for some reason.

English UTF-8 is here:

http://download.simplemachines.org/?smflanguages;lang=english

PROPER English UTF-8 ;) is here:

http://download.simplemachines.org/?smflanguages;lang=english_british

frakme

Thank you so much for the quick reply. I have been looking for that languages page for days! :)

I added the utf-8 files to the themes/default/languages folder and the forum is set to english/utf8, but foreign characters are still not showing. This is where I lack knowledge, because I thought the purpose of utf8 was so that one didn't have to add each individual language file. It is impossible for me to know which language someone might use, as the forum itself is in english but many people use various languages as the traditional language of characters in certain parts of the stories they are telling.

So, what should I try next?

frakme

Just an update because I failed to mention this in the earlier post.
1) Database through phpmyadmin shows that all tables are collated with utf8 and character sets are utf8 as well.
2) HTML convert to utf8 was done and shows completed.
3) The Settings.php file on the server also lists $language = 'english-utf8'; in Forum Information and $db_character_set = 'utf8'; in Database Information.
4) All foreign characters are still displaying incorrectly once posted. For example bohové becomes bůh sakra BUT in the modify post screen the characters appear as they should. For example bůh sakra becomes bohové.

Any help on where to go from here is appreciated.

Kermit

Are you sure that all English UTF-8 language files have been replaced properly? I don't have much idea about this UTF-8 thing, but the problem is mostly that the language files are not completely in UTF-8 format.
My Mods
Please don't PM/mail me for support,unless i invite you
Formerly known as Duncan85
Quote
"Two things are infinite: the universe and human stupidity; and I'm not sure about the universe."

A. Einstein

kat

Quote from: frakme on March 23, 2012, 07:19:49 PMforeign characters are still not showing.

Thinking logically, the foreign characters won't show, because, by definition, they're not... er... English! ;)

So, one would surmise that you need the language packs for those other languages, no?

Those well-hidden language packs are here:

http://download.simplemachines.org/?smflanguages

Edit: Thinking further on it, and after a nudge from Kermit (Ta, Kermit) ;) what actual characters do you mean, exactly?

I was thinking of umlauts and that kinda thing, which we English just don't have. But, of course, we have words like "Café" that have accents, although most of us are too lazy to use 'em. ;)

frakme

QuoteThinking logically, the foreign characters won't show, because, by definition, they're not... er... English!

To respond logically, they showed without issue until the upgrade to smf 2.02. We only had english UTF8 installed, both on the SMF control panel and on the host in our php files, when it was using the 1.14 version. This is the reason for my initial post. Even though the forum itself is in English,  we have members that often insert character language into their story posts, so we have always used UTF8 instead of basic english.

QuoteAre you sure,that all english-UTF-8 language files have been replaced properly ? I don't have much idea about this UTF-8 thing,but the problem is mostly,that languages files are not completely in UTF-8 format

If by properly, you mean the suddenly missing utf8 files are now placed in the appropriate folders on the host? Then yes. They are located in the themes/default/languages file. If you mean something else, then I won't know unless you explain step by step.

QuoteEdit: Thinking further on it, and after a nudge from Kermit (Ta, Kermit)  what actual characters do you mean, exactly?

Yes, that is correct. Any foreign character with any kind of non typical character is appearing as (I think) something that looks like latin.  Někdy jsou lidé kreténi. Život někdy je sračky. becomes Někdy jsou lidé kreténi. Život někdy je sračky. Again, please note that the correct form is shown in the  post or modify post box but  if you hit the post button/preview button the character issue appears.

The above is Czech and upon the above advice, I installed the czech utf8 files on the server, in the appropriate file just to check. Nothing has changed.

kat

Just a hunch, this.

Try going to Admin>Languages>Add Language and search.

frakme

QuoteTry going to Admin>Languages>Add Language and search.

When I uploaded the language files to the server, host-side, the SMF admin language panel automatically added the czech language options, both standard and UTF8. However, it did not fix the character issue within the posts.


The only thing it does (which is what it is meant to do) is change the entire forum to Czech when I select that radio button. The characters on the rest of the forum do appear correct. For example on the admin menu, any forum menu or button a č remains a č.  However, inside the posts the errors still occur. Latin-esque characters replace the Czech characters (or any other language except standard English).


Edit: I am working off the default smf theme to ensure this isn't just a language issue specific to our custom themes. I contacted the custom theme creator, who verified that his theme supports multiple languages.

frakme

Scratch what I said about the czech utf8 files doing anything.

Installing the individual language utf8 files does not actually have anything to do with whether the other parts of the forum work correctly. I just tested by adding menu buttons via the subs.php file as well as using foreign language in other parts of the global template, and those areas show the characters correctly. I tested in german, hebrew, arabic, greek, romanian, latvian, urdu and turkish, and I do not have those individual utf8 language files installed anywhere in my server side files or on the SMF language panel. The only issue that remains is in the post and preview areas. Those two areas display the characters incorrectly. How is it possible that the characters work in some areas of the site but not others? Does that help clarify whether the database, rather than any smf php files, is the culprit?

Again, any help is appreciated. This is a creative arts/hobby forum and without the ability to post their work and writings, it really doesn't serve its purpose.

kat

This is getting a bit over my head, now.

What I'll do, is give the localisation team a nudge, to see if they have any ideas. :)

kat

I've had a suggestion, from someone who knows WAY more than I do, that you need to convert your database to UTF-8.

This, apparently, is a bit out of date.

http://wiki.simplemachines.org/smf/UTF-8_Readme

If it is, maybe you can ask your host to do it?

kat

By the way, in case you're not a GOOD admin, read my sig, before you attempt this, woncha? ;)

frakme

QuoteI've had a suggestion, from someone who knows WAY more than I do, that you need to convert your database to UTF-8.

The database is already in UTF-8. I've attached the screen shots of the phpadmin screen. Did I miss something?

Thanks so much for looking into this. I'm happy to PM anyone you think I should talk to or happy to have anyone PM me.

Dzonny

Hello there.

I would first suggest re-uploading the english and other custom language packs manually. As you were missing some files, there may be more problems with the packs you have there. Those are 2.0.2 language packs, I believe?

You have something in your error log?

When you open *new* posts with those special characters in phpMyAdmin, can you see them properly?

frakme

QuoteI would first suggest re-uploading the english and other custom language packs manually. As you were missing some files, there may be more problems with the packs you have there. Those are 2.0.2 language packs, I believe?

Okay, I re-uploaded the utf8 language packs today. I had already done that when advised to do so in the above posts, but I did it again tonight. There is no change on the forums. The type appears correctly when you type or paste it in the edit post screen. It appears correctly in the modify post screen. It appears with (what looks like to me) latin type characters once the preview and post buttons have been pressed. If you hit modify after posting, the type appears correctly in the edit box but is once again in strange characters in the thread after the post button is selected.

QuoteYou have something in your error log?

There are two errors recorded over the course of the last 4 days. 

1."The attachments upload directory is not writable. Your attachment or avatar cannot be saved." (1)


2.  "Sorry Guest, you are banned from using this forum! (15)
Approved players only
This ban is not set to expire."




QuoteWhen you open *new* posts with those special characters in phpMyAdmin, can you see them properly?
Yes, all foreign characters appear correctly in the phpmyadmin browse tabs.

Also wanted to add that, in case it wasn't clear above, special characters, foreign or otherwise, all appear normal in other sections of the live forum. Menu bars, topic bars, topic titles within the posts -all show normal characters. It is only the body of posts, simple portal pages and the shoutbox that are affected. Articles and news blocks of simple portal post fine with the characters displayed properly.


Dzonny

So, i think it's clear now that problem is not in language packs. It's in database probably.

Quote from: frakme on March 28, 2012, 12:18:38 AM
QuoteWhen you open *new* posts with those special characters in phpMyAdmin, can you see them properly?
Yes, all foreign characters appear correctly in the phpmyadmin browse tabs.

And when you open old posts, is there any broken characters, or messed up signs?

Your error log is not related to this problem, so it's okay.

frakme

Alright, so to "fix" the database, what is my next step?


QuoteAnd when you open old posts, is there any broken characters, or messed up signs?

Yes, the problem occurs in older posts too. The account executive who handled the forum before I took over to try and fix the issue, said he noticed the problem right after the original upgrade to 2.02. They do not appear correct in the database when I looked at them.

Dzonny

Can you please copy/paste some of the lines from the database (with "broken" characters) just to see them firstly?

frakme

Palom’s 
war…

Those are the common characters that appear in the old posts.
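
For what it's worth, those two samples look like UTF-8 punctuation that has been read as Windows-1252: ’ and … are what a curly apostrophe (U+2019) and an ellipsis (U+2026) turn into when that happens. A minimal Python sketch of reversing it, assuming that really is the cause (the function name is made up for illustration, and this is not something SMF provides):

```python
def undo_cp1252_mojibake(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes read as Windows-1252'.

    Returns the text unchanged if it does not fit that pattern.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The two samples from the old posts.
for sample in ["Palom’s", "war…"]:
    print(sample, "->", undo_cp1252_mojibake(sample))
```

If the round trip turns the samples back into ordinary curly punctuation, the stored text was most likely mangled once on the way in or out, rather than lost.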


frakme

Shall I post more or is that enough to get someone to reply?

frakme

Is there any more assistance available on this matter?

ziycon

I had a brief look over this topic. I see that the database is set to utf8_unicode_ci, while the database tables and the connection are set to utf8_general_ci. Would you be able to back up the whole database and change everything to one or the other? It might be easiest to change everything to utf8_unicode_ci, as the server seems to already be set to that.
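
A rough sketch of what "change everything to one collation" could look like. It only builds the SQL text, so the statements can be reviewed before anything is run in phpMyAdmin. The database name and table list are placeholders, the helper name is made up, and CONVERT TO rewrites column data, so this assumes a full backup has been taken first:

```python
def collation_statements(database, tables, charset="utf8", collation="utf8_unicode_ci"):
    """Build ALTER statements that set one charset and collation everywhere."""
    statements = [
        f"ALTER DATABASE `{database}` CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# Hypothetical names; substitute the real database name and table prefix.
for line in collation_statements("forum_db", ["smf_messages", "smf_topics"]):
    print(line)
```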

frakme

Thank you for your reply. I will try that and see how it goes.

Norv

Hello frakme,

Sorry to have gotten so late to this issue. Please, let us know if the problem of your forum is still unsolved.
To-do lists are for deferral. The more things you write down the later they're done... until you have 100s of lists of things you don't do.

File a security report | Developers' Blog | Bug Tracker


Also known as Norv on D* | Norv N. on G+ | Norv on Github

frakme

QuoteHello frakme,

Sorry to have gotten so late to this issue. Please, let us know if the problem of your forum is still unsolved.

Unfortunately, no. It did not fix the errors that appear only in the post and preview area of new forum posts. 

Norv

Can you please:
- log in phpmyadmin, your SMF database
- and run the following query:

SELECT * FROM `smf_settings` WHERE `variable` = 'global_character_set';

(replace 'smf_' accordingly with your actual database prefix)
Please do tell what the result is.

Also, could you eventually post a link to your forum?

frakme

Running that query didn't work either. I just noticed the verification image for the registration page isn't working either. That is usually related to language, so I can't imagine they aren't related.

Norv

Sorry, I wasn't clear enough: the query wasn't supposed to fix the problem, but I would need to know exactly its result. Please, could you paste it here?

The entire message you get from MySQL.

frakme

No, that's on me; it's what I get for trying to work when I should have been sleeping. Sorry about that. I ran the query and I did not get any kind of error message.

frakme

I am still having trouble with this issue. Does anyone have any more input?

For a refresher, the latin characters only appear in the body of messages and simple portal pages AFTER the post or preview button has been pushed. They appear correctly in the database and in the edit/modify screens of the forum itself.

MrPhil

Quote from: frakme on March 23, 2012, 07:19:49 PM
I added the utf-8 files to the themes/default/languages folder and the forum is set to english/utf8, but foreign characters are still not showing. This is where I lack knowledge, because I thought the purpose of utf8 was so that one didn't have to add each individual language file.

Setting the forum language to English merely means that prompts and labels and messages come out in (American) English (sorry, K@!). You can add other languages if you want, selected by the viewers, so long as they all use the same encoding (UTF-8, in your case). The "encoding" is a different matter, giving what alphabet(s) to use. Latin-1 (ISO-8859-1) is the default for SMF, and supports English and other "Western European" languages. If someone wants to submit a post about the Greek economic crisis in Greek text, they're outta luck. By using UTF-8, anyone can submit a post in any language known to humanity (plus Klingon, Elvish, and other made-up languages).

Several things need to match, for "foreign characters" to show up. Your database itself needs to be in UTF-8, and its content should have been converted from Latin-1 to UTF-8 at some point (e.g., an é encoded in Latin-1 as a single byte is now a two-byte UTF-8 character). All the language support files for all the languages you want to support need to be in UTF-8. Finally, each page needs to be output specifying UTF-8 as the charset (encoding). That will show up in the <meta ... charset=UTF-8... tag in each page.

I don't think you've told us yet how accented characters "fail" to show. Are you seeing a string of two or more European accented characters for each expected accented character? If so, that means the text is being provided in UTF-8, but the page is displaying in Latin-1. Are you seeing a ?-in-black-diamond glyph instead of the expected accented character? If so, that means the text is in Latin-1, but the page display is in UTF-8. Is there any difference between fixed language file text and content from the database (such as posts)? If they show accented characters differently, that could be a clue.
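
To make those two symptoms concrete, here is a small Python sketch that reproduces both mismatches with a single é. It only illustrates the encoding arithmetic and says nothing about where in SMF the mismatch would be introduced:

```python
text = "é"

latin1_bytes = text.encode("latin-1")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9

# Case 1: UTF-8 bytes on a page that is read as Latin-1.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")

# Case 2: Latin-1 bytes on a page that is read as UTF-8.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(len(latin1_bytes), len(utf8_bytes))
print(utf8_read_as_latin1)
print(latin1_read_as_utf8)
```

Case 1 yields two accented Latin characters for the single é, and case 2 yields the replacement glyph (the ?-in-black-diamond).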

N.B.: some servers are misconfigured so that they force the page to be displayed in Latin-1, even though the <meta> tag for charset says UTF-8. If your meta tag says UTF-8 yet you're seeing multiple accented European characters for each accented or non-Latin character, tell your browser to show the encoding in use: View > Character Set (or something similar). If it says Latin-1/ISO-8859-1/Western European, change it to UTF-8 and see if the problem goes away for that page. If so, talk to your host and ask why their server is forcing Latin-1.
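
One way to check for that kind of override without relying on the browser menu is to compare the charset in the HTTP Content-Type header with the one in the page's meta tag. A minimal Python sketch; the header and page strings are made-up stand-ins for a real response, and both helper names are hypothetical:

```python
import re
from email.message import Message


def charset_from_header(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def charset_from_meta(html):
    """Return the charset named in the first <meta ... charset=...> tag, or None."""
    match = re.search(r"<meta[^>]*charset=[\"']?([\w-]+)", html, re.IGNORECASE)
    return match.group(1).lower() if match else None


# Made-up values standing in for a real response from the forum.
header = "text/html; charset=ISO-8859-1"
page = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'

print(charset_from_header(header), charset_from_meta(page))
```

If the two values differ, the header generally wins in browsers, which would match the "server is forcing Latin-1" case described above.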

Finally, if you're missing an English UTF-8 file, just copy the corresponding regular English file to the UTF-8 name or directory (whichever is appropriate). English does not make use of non-ASCII characters (no accents), so it's unlikely that there would be any difference in the text between Latin-1 and UTF-8. Possibly if a Pound Sterling character, or a hard coded non-breaking space character, or a guillemet quotation mark is used, you might have to edit the file to fix those, but otherwise it's unlikely you'll have to make any edits. Same thing for any missing image/icon files: just copy them from regular English into UTF-8 English area.
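
A quick way to see whether a copied English file needs any editing is to look for bytes outside the ASCII range. A minimal Python sketch, using a made-up two-line sample in place of a real language file (the helper name is hypothetical):

```python
def non_ascii_lines(data: bytes):
    """Return (line number, raw line) pairs for lines containing bytes above 0x7F."""
    return [
        (number, line)
        for number, line in enumerate(data.splitlines(), start=1)
        if any(byte > 0x7F for byte in line)
    ]


# Stand-in content with a Latin-1 pound sign on line 2;
# for a real file use open(path, "rb").read().
sample = b"$txt['hello'] = 'Hello';\n$txt['price'] = '\xa35';\n"
print(non_ascii_lines(sample))
```

An empty result means the file is plain ASCII and is byte-for-byte the same in Latin-1 and UTF-8.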

P.S. Not a "foreign character", but certain punctuation characters cut and pasted from Microsoft Word are "Smart Quotes" which will mess up UTF-8 and (often) Latin-1 displays. That is a separate issue. Let us know if the problem is actually that certain punctuation characters are disappearing and/or taking the rest of a post with them.

frakme

QuoteBy using UTF-8, anyone can submit a post in any language known to humanity (plus Klingon, Elvish, and other made-up languages).
Uh, yeah. That's why my original post was looking for the missing UTF8 language pack- the pack that "disappeared" after the website was upgraded to smf 2.02. The forum and database have always been set to UTF8 for that very reason-our posters use many languages to enhance their stories. 

QuoteI don't think you've told us yet how accented characters "fail" to show. Are you seeing a string of two or more European accented characters for each expected accented character?
I'm sorry to say I've said this in multiple posts above. They appear as latin characters, Å, Ä,ƒ ¾Ã and so forth. They appear "correct" with the correct language letter (be it czech, japanese or arabic) in the database. They appear correct in every area of the website and forum EXCEPT the actual post body. If you edit any post body using the modify button, the same post appears correct but will return to the incorrect (ie latin character) presentation after pressing the "post" button.

QuoteAre you seeing a string of two or more European accented characters for each expected accented character? If so, that means the text is being provided in UTF-8, but the page is displaying in Latin-1.
Yes, that is what I believe the problem is. I posted the above posts asking how I can fix that because I don't know why they are doing that. The database and forum are set to UTF8, with the appropriate language packs installed. All tables, columns etc are set to utf8 unicode, the database is as well. UTF8 is set as the language via the smf Cpanel and the meta tags indicate the website is set to utf8.

So what is my next step? Again, I appreciate the help and patience. Thank you.

MrPhil

If your text is definitely in UTF-8, is your page specifying UTF-8 charset (encoding)? If you are (everything everywhere is UTF-8), in your browser go to View > Character Encoding and see if it's showing UTF-8. If not, select UTF-8 and see if the bad characters clear up. Then ask your host why their server is overriding your encoding specification.

Regarding missing UTF-8 language files, for English, simply copy over the missing ones from regular (Latin-1) English. For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.

frakme

QuoteIf you are (everything everywhere is UTF-8), in your browser go to View > Character Encoding and see if it's showing UTF-8. If not, select UTF-8 and see if the bad characters clear up.

The same characters appear correct (an umlaut appears as an umlaut for example) in certain sections of the website  so it is not a browser issue.


QuoteRegarding missing UTF-8 language files, for English, simply copy over the missing ones from regular (Latin-1) English. For other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.

I was missing the php file and the 2.02 version of the utf8 language pack. Luckily that issue was solved at the beginning of this topic.

QuoteThen ask your host why their server is overriding your encoding specification.
According to my host, they are not.


QuoteFor other languages, you'll have to convert from Latin-1 or other encoding to UTF-8.
According to my database collation information, everything has been converted to UTF8 unicode.


frakme

I kind of gave up on this problem and was ready to say screw it. Then the website owner came back from vacation and schooled me a little bit. She fixed the problem and gave clear step by step instructions while we worked.   The solution sounds like a lot of work but it really was just a basic "clean install" of smf 2.02. It really only took about 2 hours total and that was only because we had so many tables to go through.

Solution
1) create new empty database
2) export data and structure of current "live" database to hard drive of local computer
3) open in Emeditor, checked for any incorrect syntax or mixed collation
4) save as with encoding set to UTF8; then import into newly created database

At this point, she verified that all charsets and other information were correctly collated in phpmyadmin. Website was set to new database. Characters were still appearing garbled on website.

5) create new empty database
6) looked through index, ini and source php files, stored on host server, to make sure all charsets were set to utf8, finding no errors we went on to the next step
7) deleted all SMF files from hosting directory/server
8) manually installed SMF 2.02, onto host server, using filezilla to create clean live site
9) dropped single table and imported same table from exported file of the old database
10) double checked that collation for rows of each table (by looking in structure tab of phpmyadmin) and table column (by using operation tab for each table) was still set to utf8 general ci
11) repeated for each standard table
12) used SMF package manager on cpanel to add packages
13) dropped single package tables and imported same table from exported file of the old database
14) Characters appear correctly

She said she could have probably imported the entire database at once, verified the collation, and it would have worked fine. But she also said she wanted to double check there wasn't some weird error in any single table, since the errors appeared suddenly. It is her belief that "at some point during the original upgrade via the hosting panel, something broke." At least that is a way to explain why all the information/communication was correctly coded/collated but errors still appeared.
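
For anyone repeating the "check the export for mixed collation" step without EmEditor, here is a rough Python sketch that does a similar check on an exported dump: it tests whether the bytes decode as UTF-8 and lists every charset and collation the dump declares. The sample dump line and the function name are invented for illustration, and the pattern match is crude (it would also pick up the word COLLATE inside post text):

```python
import re


def dump_report(raw: bytes):
    """Check that a SQL dump decodes as UTF-8 and list its declared charsets and collations."""
    try:
        text = raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        text = raw.decode("utf-8", errors="replace")
        valid_utf8 = False
    charsets = set(re.findall(r"(?:CHARSET|CHARACTER SET)[ =](\w+)", text, re.IGNORECASE))
    collations = set(re.findall(r"COLLATE[ =](\w+)", text, re.IGNORECASE))
    return valid_utf8, charsets, collations


# Stand-in for a real export; for a file use open(path, "rb").read().
sample = (
    b"CREATE TABLE `smf_messages` (`body` text COLLATE utf8_unicode_ci)"
    b" ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;"
)
print(dump_report(sample))
```

More than one entry in either set would be the "mixed collation" situation mentioned earlier in the topic.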

