"?" instead of Turkish Characters

Started by Ilkharnos, July 10, 2017, 07:11:55 AM

Ilkharnos

Hello,

First of all, I am sorry if this subject has already been brought up here, or if I am in the wrong place.

I have a small Turkish-language SMF forum (2.0.14). I have been using it since 2009 without any language or character problems. Today I noticed a serious problem with Turkish characters.

Turkish characters seem to show properly in all of the forum areas...



...except when you click on a topic to view its posts. Then the Turkish characters break across the entire page.



If you go back to other sections of the forum after visiting a topic's index, the Turkish characters return to normal. This issue occurs only in the topic index.

I haven't got any idea about its cause. Yesterday it wasn't there. What I did yesterday was (in order):

- Installing the bookmarks modification manually (I edited the code by hand with care, and it worked fine. There weren't any character problems)

- Making a filezilla backup of "public_html" (done)

- Trying to make a database backup through phpMyAdmin. It wasn't successful, and it was my first failure ever. I didn't receive any error messages, but the backup was stuck with a constantly turning loading icon. There were no results. I tried the same operation many times but couldn't get it to work. I gave up and went to sleep without checking my forum. Today, when I woke up, my forum was in this state.

I tried hard to find a solution, and I performed some operations on my database (like changing the default charset to utf8 for all tables), but it didn't change anything.

Finally, I have zero knowledge of code or databases. I do what I do by trial and error. So I might have caused this by making a mistake somewhere.

I would be grateful for any assistance you may give. Thank you very much.

Regards

Ilkharnos

Note: Posts themselves are not corrupt. When I am in the modify post screen, I see Turkish characters without problem, both in the post itself and the other sections of the same page (main menu, for example). The problem only shows itself when you are in the topic index (of any topic).

shawnb61

There are a lot of variables that are in play here, and it would be helpful to see them... 

There is a small utility that will dump all the variables related to charset & collation for your installation.   

It may be found here:
https://github.com/sbulen/sjrbTools/blob/master/SMF_UTF8_Diag.php

If you could download the file, place it in your SMF directory (where settings.php is), and share the output, that would help.

The output would look like this:
https://shawnbulen.com/VAN20/SMF_UTF8_Diag.php

I wrote & use this utility to audit my environments' UTF8 settings.  (It's not an official SMF utility.)   
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Ilkharnos

Thank you very much for your assistance. I downloaded the file and ran the diagnostic. The results are as follows:

http://karamigfer.com/SMF_UTF8_Diag.php

I also attached it as a document in case you don't see it.

I will look forward to hearing your comments.

shawnb61

I think that is good news. 

Although your table defaults are all utf8mb4 (because you changed them in phpMyAdmin), your raw data is still in latin1. The actual database data hasn't been converted to UTF8 yet; only the defaults used for newly added columns have changed.

And none of the forum settings ($db_character_set & global_character_set) think you're UTF8 yet. 
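As a rough illustration of why a latin1 setting can turn Turkish letters into "?": latin1 (ISO-8859-1) has no code points for ş, ğ, ı or İ, so text forced through it loses those letters. The Python sketch below is only an analogy for what a database connection or page encoding may be doing; the sample string and variable names are made up for the example and are not taken from the forum.

```python
# Sample text is an assumption chosen for illustration, not data from the forum.
sample = "Türkçe: ş ğ ı İ"

# latin1 can represent ü and ç, but not ş, ğ, ı or İ.
# errors="replace" substitutes "?" for every character it cannot encode.
forced_through_latin1 = sample.encode("latin-1", errors="replace").decode("latin-1")

print(forced_through_latin1)  # Türkçe: ? ? ? ?
```

Note that ü and ç survive because latin1 does include them, which matches the pattern of only some letters being replaced.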

So...

For you the solution is very simple:
(1) Back up your database!!!!!    Just in case...
(2) Use the SMF utility to convert to UTF8.  This is found under Admin | Maintenance | Forum Maintenance | Database.   (leave the dropdown at its default value of ISO_8859_1)
(3) Install a Turkish UTF8 character set language file
(4) Whenever you install a new language file, some mods may need re-installation to properly use the language file

Your database will be utf8_general_ci throughout afterwards. 

Hope this helps.   


(NOTE: **IF** your database data/columns had already been UTF8, the solution would have been more difficult.  I do NOT recommend running SMF's UTF8 conversion utility when your data is already UTF8.)
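To show why converting data that is already UTF8 is risky, here is a minimal Python sketch of the general "double encoding" effect. It is not SMF's conversion code; it only assumes that a converter treats the stored bytes as latin1 and re-encodes them as UTF8. The sample string and variable names are hypothetical.

```python
# Hypothetical sample text; the bytes below stand in for data already stored as UTF-8.
original = "ığşİ"
stored_bytes = original.encode("utf-8")

# A conversion that wrongly assumes the bytes are latin1 reads each byte as
# its own character, then writes the result out as UTF-8 a second time.
misread = stored_bytes.decode("latin-1")
double_encoded = misread.encode("utf-8")

# Every Turkish letter now occupies four bytes instead of two and displays as garbage.
garbled = double_encoded.decode("utf-8")

# The damage is reversible as long as no bytes were dropped along the way:
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)  # True
```

The round trip at the end only works in this idealized case; real tools may drop or substitute bytes, which is why a backup comes first.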

shawnb61

I would suggest one more step:
    (5)  Set your database level default collation to utf8_general_ci.

Unfortunately, most mods just create new tables at the DB default.  This will prevent collation mismatches when that happens.  You don't want some tables at latin5_turkish_ci and some at utf8_general_ci.   

Setting a good default at the DB level is good practice & will prevent issues in the future as you add or reinstall mods. 
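For step (5), the database-level default can be changed with a MySQL `ALTER DATABASE` statement. The small Python helper below only builds that statement as a string so it can be pasted into phpMyAdmin or adminer; the function name and the database name `smf_forum` are placeholders invented for this sketch, so substitute your own database name.

```python
import re


def build_alter_database(db_name: str,
                         charset: str = "utf8",
                         collation: str = "utf8_general_ci") -> str:
    """Return an ALTER DATABASE statement that sets the default charset and collation."""
    # Allow only plain identifiers so nothing unexpected ends up in the SQL text.
    for value in (db_name, charset, collation):
        if not re.fullmatch(r"[A-Za-z0-9_]+", value):
            raise ValueError(f"unexpected characters in {value!r}")
    return f"ALTER DATABASE `{db_name}` CHARACTER SET {charset} COLLATE {collation};"


# "smf_forum" is a hypothetical database name.
statement = build_alter_database("smf_forum")
print(statement)
```

This changes only the default for tables created later; existing tables and their data are left as they are.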

Ilkharnos

Thank you for this good advice. I feel that we are close to a solution.

When I followed these steps, the "?" problem in the topic index was solved, and I am thankful for that. However, the Turkish characters in all the entries (posts and portal blocks) are now garbled. The bad thing is that the entries themselves seem to have been changed. When I click on "modify" in a post, what I see is still corrupted text. Correcting these entries manually would take forever.

I can roll back to the database backup and fix the issue that way, but that puts me back at the start. Do you have any ideas for getting past this problem?

Note: You posted one more step while I was writing this. I will try to do as you say. I started using adminer instead of phpMyAdmin because of performance issues. I will try to find that setting.

Regards.

shawnb61

Definitely roll back.

Do you have a backup from before the utf8mb4 changes?   

If so, I'd restore to that version and again attempt the steps above.  I am guessing that the utf8mb4 table settings confused the conversion. 

In my experience, the SMF conversion will do a good job converting latin1_swedish_ci to utf8_general_ci - even when utf8 Turkish characters are inserted in the latin1 database (which is very common). 

Ilkharnos

I tried many language/charset combinations in order to keep my Turkish characters intact, but I have had no luck so far. The Turkish characters keep getting corrupted when I use SMF's conversion tool.

Before using SMF's conversion, I tried changing the DB collations (every single one of them) back to latin1_swedish_ci (the default), but it didn't solve my problem.

There is an older database backup. After I saved that backup, I worked 15 more days on my website. So I am not sure if it would be usable, but I will use it if I have no other choice.

From what I have seen so far, the character corruption depends on the language/charset settings. It keeps taking different forms as I change the combination. If I choose the specific setting that corrects the Turkish characters, my "?" problem comes back.

I also tried opening the database backup .sql file in a text editor and using the "replace" command to correct the letters manually, but that damages the file and makes it unusable.

I will keep trying to find a solution. If I find something I will write it here. If you have got any ideas, I would be happy to hear them.

And again, thank you very much.

Ilkharnos

I haven't found a solution for my "?" problem yet. There is one more detail I have recently noticed. Maybe it gives another clue about the problem.

When I view a topic's index (this is where the Turkish characters become "?"), and only in a topic's index, there is a difference in the browser.



This double line doesn't appear anywhere else in my forum, just like the "?" marks, which only appear in the topic index. I think there is a code error (my site might contain more than one code error, but unfortunately repairing them is beyond my knowledge).

I would like to know which code files govern the topic index and what can be done about it. Maybe the solution for the "?" problem lies in it?

Thank you in advance.

Ilkharnos

I have found out that the horizontal lines I pointed out originate from GenericControls.php. I have just solved this problem.

I have also solved my Turkish character problem thanks to an application called Notepad++.

Regards.

shawnb61

Can you describe exactly how the Turkish character issue was solved?   I'm curious.

Ilkharnos

After all of my efforts with phpMyAdmin, adminer and the SMF conversion tool failed, I was convinced that the characters in the database themselves were broken/corrupt, because whenever I opened a .sql file with Notepad or Notepad++, I saw weird characters instead of Turkish characters. So I focused on changing the data in the database, and I did it with the help of Notepad++.

When I opened the .sql file with Notepad++, I first chose the correct encoding options (regarding UTF8). After that, I used the "replace" command. There were 4 (or 5) characters in total that needed to be corrected. I replaced them by applying the "replace all" command for each character.

I don't know if it sounds crazy, but it worked.
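For anyone who would rather script this than do it by hand, the "replace all" approach can be sketched in Python. The thread does not say which garbled sequences were in the dump, so the mapping below is an assumption: it covers the common case where UTF8 bytes were read as Windows-1252. The names `TURKISH_LETTERS`, `MOJIBAKE_TO_LETTER` and `repair_dump_text`, and the sample SQL line, are all hypothetical.

```python
TURKISH_LETTERS = "çÇğĞıİöÖşŞüÜ"

# For each letter, compute how it looks when its UTF-8 bytes are misread as cp1252.
# This assumed mapping may not match the corruption in every dump.
MOJIBAKE_TO_LETTER = {
    letter.encode("utf-8").decode("cp1252"): letter for letter in TURKISH_LETTERS
}


def repair_dump_text(text: str) -> str:
    """Apply one 'replace all' pass per garbled sequence, like the manual fix."""
    for broken, correct in MOJIBAKE_TO_LETTER.items():
        text = text.replace(broken, correct)
    return text


# Hypothetical line from a dump, garbled on purpose for the demonstration.
clean_line = "INSERT INTO smf_messages (body) VALUES ('Türkçe ş ğ İ');"
broken_line = clean_line.encode("utf-8").decode("cp1252")
fixed_line = repair_dump_text(broken_line)
print(fixed_line == clean_line)  # True
```

Work on a copy of the backup, and read and write the file with an explicit `encoding="utf-8"` so the editor or script does not change the encoding again.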


shawnb61

Thanks for the explanation. 

If the raw data was in error, I believe that means the data was somehow mis-encoded initially.

It looks like your DB & forum are now properly configured as UTF8.  Hopefully you will have no issues going forward.

You can delete that utility now, no need to keep it around. 

Ilkharnos

I removed the diagnostic tool.

Thank you very much for your assistance. I have learned much thanks to you.
