Started by rcane, December 04, 2021, 10:39:14 PM
Quote from: shadav on December 05, 2021, 12:24:14 AM
Turn off the WYSIWYG editor. It's buggy and doesn't like copied-and-pasted messages. Or rather, it does, but it pastes exactly what you copied from the other site, markup and all. So you can either turn it off, or paste the text into something like Notepad first to strip out all the site's code that you copied, then copy it from Notepad into your editor/reply box.
Quote from: Steve on December 05, 2021, 08:01:47 PM
I'm not sure I understand your response to shadav's suggestion. The WYSIWYG editor is loaded with bugs. It's highly recommended not to use it at all.
Quote from: shawnb61 on December 06, 2021, 10:29:58 AM
Two other questions come to mind:
Is your forum utf8? Non-utf8 forums have issues with certain characters. The right-quote being one...
What are they copying and pasting from? Lots of other sources include non-printable characters. In general it's not advised to c&p from Word or other apps. Even web pages can cause issues. If you must, you need to look into a tool like Clean Text (Mac) to strip out non-printable characters. There are other threads on this.
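For anyone who wants to see what "stripping non-printable characters" amounts to, here is a minimal Python sketch. It is not what Clean Text or SMF does internally; the function names are made up for illustration, and the choice of which characters to drop (Unicode categories Cc and Cf, plus turning non-breaking spaces into plain spaces) is an assumption you may want to adjust.

```python
import unicodedata

def strip_nonprintable(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping newlines and tabs.

    Caveat: Cf also covers zero-width joiners, which some scripts
    legitimately need, so this filter is too aggressive for such text.
    """
    keep = {"\n", "\t"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) not in ("Cc", "Cf")
    )

def normalize_pasted(text: str) -> str:
    """Hypothetical helper: tidy text pasted from Word or a web page."""
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace non-breaking spaces with ordinary spaces.
    text = text.replace("\u00a0", " ")
    return strip_nonprintable(text)

# A right-quote, a zero-width space, a non-breaking space and a bell character.
sample = "It\u2019s\u200b fine\u00a0here\x07\r\n"
print(repr(normalize_pasted(sample)))
```

Note that the right-quote itself survives: it is a perfectly printable character, and it only causes trouble when the forum is not on utf8.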
Quote from: rcane on December 06, 2021, 05:38:06 PM
It's UTF8, though I'm not sure where folks were pasting from.
Quote from: shawnb61 on December 06, 2021, 06:39:42 PM
That explains the problem with the right quote. Pretty common UTF8 error. Or non-UTF8 error...
As always, run backups before doing anything. Twice. You need to be able to recover if things go funky.
7 = Some SQL you need to run in a SQL window. If you have phpMyAdmin (or Adminer), you can do it in the SQL window there.
8 = Look under Admin | Configuration | Languages | Edit Languages. You want all -utf8 languages now.
9 = Look for funky stuff... One pretty common issue is double-encoding. ISO-8859-1, under certain configs, will actually allow you to post utf8 content, and it will look OK. If you then convert that already-utf8 content to utf8, it will be double-encoded, e.g., the Euro symbol becomes "â‚¬". At that point you have corrupt data... There are ways to fix it. Not fun, but fixable.
10 = A new function available where you found the 'convert to utf8' function.
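To make the double-encoding point concrete, here is a small Python sketch of how the Euro symbol turns into "â‚¬", and one way it can sometimes be reversed. This is only an illustration under stated assumptions: it uses Python's cp1252 codec as a stand-in for what MySQL calls latin1, the function names are invented, and it is not the repair procedure SMF itself uses. Try anything like this on a backup copy only.

```python
def double_encode(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as single-byte cp1252 text."""
    return text.encode("utf-8").decode("cp1252")

def repair(text: str) -> str:
    """Undo one round of double-encoding, leaving the text alone if it does not fit."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or not valid UTF-8 afterwards:
        # most likely the text was never double-encoded.
        return text

for clean in ("\u20ac", "It\u2019s"):  # Euro symbol, right-quote
    broken = double_encode(clean)
    print(repr(clean), "->", repr(broken), "->", repr(repair(broken)))
```

The try/except is the important design choice: text that was stored correctly usually fails the round trip and is returned unchanged, so it is not corrupted a second time. It is still a heuristic, not a guarantee.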
Quote from: rcane on December 07, 2021, 03:01:28 PM
I tried running this in SQL:
UPDATE smfqg_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != ''
I had to change the manual's smf_members to what you see up there, as that's the only members table I have.
No changes were made when I tested it.
I have only English, but I'm not sure if that is correct. I need a baseline here. A reference datum from which to move forward.
Quote from: rcane on December 07, 2021, 03:01:28 PM
So, I first need to learn where to be looking to confirm all these things.
1. Also unsure how to check languages in the Users column; was that in the members php? I didn't see one in there if so; and
Quote from: rcane on December 07, 2021, 03:01:28 PM
2. running the replacement query in SQL; I read that can be done, but I need a handle on all the places I need to be checking. I'm still learning the environment.
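On where to look: one way to get a baseline is to count how many members have each lngfile value before running the update. The sketch below does that against an in-memory SQLite table standing in for the real MySQL one. The table and column names (smfqg_members, lngfile) come from the query above; the sample rows are invented, the schema is simplified, and SQLite's `||` replaces MySQL's CONCAT. One possible reason an update like this reports no changes, not confirmed in this thread, is that every member's lngfile is empty, so the WHERE clause matches nothing.

```python
import sqlite3

# In-memory stand-in for the forum database (assumption: the real one is MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE smfqg_members ("
    " id_member INTEGER PRIMARY KEY,"
    " lngfile TEXT NOT NULL DEFAULT '')"
)
# Invented sample data: two members with no language set, two with one set.
conn.executemany(
    "INSERT INTO smfqg_members (lngfile) VALUES (?)",
    [("",), ("",), ("english",), ("english-utf8",)],
)

def language_counts(db):
    """Return (lngfile, number of members) pairs, i.e. the baseline."""
    return db.execute(
        "SELECT lngfile, COUNT(*) FROM smfqg_members"
        " GROUP BY lngfile ORDER BY lngfile"
    ).fetchall()

before = language_counts(conn)

# Same idea as the update above, but skipping rows that already end in -utf8
# so the statement is safe to run more than once.
cursor = conn.execute(
    "UPDATE smfqg_members SET lngfile = lngfile || '-utf8'"
    " WHERE lngfile != '' AND lngfile NOT LIKE '%-utf8'"
)
changed = cursor.rowcount
conn.commit()

after = language_counts(conn)
print("before:", before)
print("rows changed:", changed)
print("after:", after)
```

In phpMyAdmin the equivalent check would be the same SELECT ... GROUP BY statement typed into the SQL window. If it shows only an empty lngfile, an update with `WHERE lngfile != ''` has nothing to change.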
Quote from: shawnb61 on December 08, 2021, 12:50:34 AM
No, it will not correct the issues. But you will see far fewer character issues going forward now you are on utf8.
You should run the entity conversion at least once.
When you are NOT on utf8, multi-byte characters are stored as 'html entities', which is a codified form of the characters. They must be, because your DB doesn't support multi-byte characters yet. Once you are using utf8, they don't need to be stored as entities; you can simply use the actual characters. This function will replace entities with the actual characters.
E.g., if your forum is NOT utf8, then 'काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥' is stored in entity form, e.g., '&#2325;&#2366;&#2330;&#2306; &#2358;&#2325;&#2381;&#2344;&#2379;&#2350;&#2381;&#2351;&#2340;&#2381;&#2340;&#2369;&#2350;&#2381; &#2404; &#2344;&#2379;&#2346;&#2361;&#2367;&#2344;&#2360;&#2381;&#2340;&#2367; &#2350;&#2366;&#2350;&#2381; &#2405;'.
After you are on utf8, the Convert Entities to Characters function cleans these up where possible so they are stored as 'काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥' as expected.
Even in "English", there are a lot of multi-byte characters, e.g., the Euro symbol, the copyright symbol, the right-quote, etc... UTF8 is definitely the way to go.
Hope this helps...
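The entity round trip described above can be sketched in a few lines of Python. This only illustrates the idea using the standard library; it is not SMF's Convert Entities to Characters code, and the function names are invented.

```python
import html

def to_entities(text: str) -> str:
    """Store non-ASCII characters as numeric HTML entities."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

def to_characters(text: str) -> str:
    """Turn entities back into the actual characters.

    Caveat: html.unescape also decodes named entities such as &amp; and &lt;,
    which a forum would normally want to leave escaped.
    """
    return html.unescape(text)

word = "काचं"
stored = to_entities(word)
print(stored)
print(to_characters(stored) == word)
```

The entity form is plain ASCII, which is why a non-utf8 database can hold it. Once the database is on utf8, the same text can be stored directly as characters.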