Having trouble with pasted text in posts and %u2019

Started by rcane, December 04, 2021, 10:39:14 PM


rcane

Hi,

Is there a way to allow the single quotation mark (’) without %u2019 showing up in its place?

Kindred

Glory to Ukraine

Please do not PM, IM or Email me with support questions.  You will get better and faster responses in the support boards.  Thank you.

"Loki is not evil, although he is certainly not a force for good. Loki is... complicated."

shadav

Turn off the WYSIWYG editor.
It's buggy and doesn't like copy-and-pasted messages; rather, it does, but it pastes exactly what you copied from another site.

So you can either turn it off, or paste the text into something like Notepad first to strip out all the site's code you copied, then copy it from Notepad into your editor/reply box.

rcane

Quote from: shadav on December 05, 2021, 12:24:14 AM
turn off the WYSIWYG editor

I'm not certain whether it's coming in as pasted text from another app (like Notepad or Word), or whether it's being typed into SMF's post editor.  I tried pasting and typing with the global setting unchecked and noticed no errors.

I also don't see a difference in the editor buttons displayed, regardless of the WYSIWYG setting.  Should I?


I searched the forum for %u2019 and it's replete with instances of it.



I made sure the global setting stays off.

Steve

I'm not sure I understand your response to shadav's suggestion.

The WYSIWYG is loaded with bugs. It's highly recommended to not use it at all.
DO NOT pm me for support!

rcane

Quote from: Steve on December 05, 2021, 08:01:47 PM
I'm not sure I understand your response to shadav's suggestion.

The WYSIWYG is loaded with bugs. It's highly recommended to not use it at all.

I was curious whether the editing console looks any different depending on which setting you've got, off or on.

I turned it off yesterday to be safe.  I wasn't aware of the bugs.

I wish I could do a master find & replace of %u2019 to clear those out. :)
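%u2019 is the non-standard "%uXXXX" escape for U+2019, the right single quotation mark (’). It is the form JavaScript's legacy escape() function produces, though whether that is what generated it on this forum is an assumption. Because every such escape follows one pattern, a find-and-replace can decode them all at once instead of one code at a time. Here is a minimal Python sketch, assuming you have already exported the affected post text after a backup. The function name and the as_entity switch are hypothetical.

```python
import re

# "%u" plus exactly four hex digits: the non-standard escape that JavaScript's
# legacy escape() function emits for characters above U+00FF.
PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")


def decode_percent_u(text: str, as_entity: bool = False) -> str:
    """Replace each %uXXXX escape with its character, or with a numeric
    entity such as &#8217; when as_entity is True."""
    def repl(m):
        code = int(m.group(1), 16)
        return f"&#{code};" if as_entity else chr(code)

    return PERCENT_U.sub(repl, text)


post = "It%u2019s here"
print(decode_percent_u(post) == "It\u2019s here")  # True
print(decode_percent_u(post, as_entity=True))      # It&#8217;s here
```

On a forum still set to ISO-8859-1, the entity form (&#8217;) is the safer replacement, since the raw character has no ISO-8859-1 slot. There are two caveats. First, characters outside the Basic Multilingual Plane arrive as two escapes (a surrogate pair), and this sketch does not recombine them. Second, any literal "%u" followed by four hex-looking letters (say, "%uface") also matches, so review the hits before writing anything back.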

shawnb61

Two other questions come to mind:

Is your forum utf8?  Non-utf8 forums have issues with certain characters.  The right-quote being one...

What are they copying and pasting from?  Lots of other sources include non-printable characters.  In general it's not advised to c&p from Word or other apps.  Even web pages can cause issues.  If you must, you need to look into a tool like Clean Text (mac) to strip out non-printable characters.  There are other threads on this.
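Clean Text is a separate app, and the Python below is not how it works internally. It is a hypothetical sketch of the same idea, assuming "non-printable" means Unicode control (Cc) and format (Cf) characters such as zero-width spaces and byte-order marks.

```python
import unicodedata

# Line breaks and tabs are control characters too, but they carry layout.
KEEP = {"\n", "\r", "\t"}


def strip_non_printable(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, e.g. zero-width spaces."""
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in ("Cc", "Cf")
    )


dirty = "Hello\u200b world\x07\n"
print(repr(strip_non_printable(dirty)))  # 'Hello world\n'
```

The right quote itself is ordinary punctuation and passes through this filter unchanged; on a UTF-8 forum it needs no cleaning. Dropping every Cf character is also a blunt approach. Zero-width joiners (U+200D) hold emoji sequences and some scripts together, so a real cleaner may want a narrower list.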
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

rcane

Quote from: shawnb61 on December 06, 2021, 10:29:58 AM
Two other questions come to mind:

Is your forum utf8?  Non-utf8 forums have issues with certain characters.  The right-quote being one...

It's UTF8, though I'm not sure from where folks were pasting.


shawnb61

Quote from: rcane on December 06, 2021, 05:38:06 PM
It's UTF8, though I'm not sure from where folks were pasting.

You need to confirm that...  You cannot just post anything in 2.0 without cleaning it first...  SMF wants text.

The first step here is to understand exactly what is triggering the error, so you can reproduce it at will.

Once YOU can reproduce it, we can!   :)

rcane

ISO-8859-1 English is the character set shown within SMF.

In the php center it shows:  UTF-8 Unicode (utf8mb4)

From this page, I'm not sure about steps 7, 8, and 10 when converting to UTF8; I haven't done that before.

https://wiki.simplemachines.org/smf/UTF-8_Readme

shawnb61

That explains the problem with the right quote.  Pretty common UTF8 error.  Or Non-UTF8 error...

As always, run backups before doing anything.  Twice.  You need to be able to recover if things go funky.

7 = some SQL you need to run in a SQL window.  If you have phpMyAdmin (or Adminer), you can do it in the SQL window there.

8 = Look under Admin | Configuration | Languages | Edit Languages.  You want all -utf8 languages now.

9 = Look for funky stuff...  One pretty common issue that happens is double-encoding.  ISO-8859-1, under certain configs, will actually allow you to post utf8 content, and it will look OK.  If you then convert that already-utf8 content to utf8, it will be double-encoded.  At that point you have corrupt data...  There are ways to fix it.  Not fun, but fixable.  E.g., the Euro symbol becomes "â‚¬".

10 = a new function available where you found the 'convert to utf8' function.
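The garbled euro in step 9 is what you get when the three UTF-8 bytes of € (E2 82 AC) are read back one byte at a time as Windows-1252. Here is a minimal Python sketch of both the damage and the reverse repair, assuming that particular mix-up; other charset pairs garble differently. The function names are hypothetical.

```python
def double_encode(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as Windows-1252 and kept as text."""
    return text.encode("utf-8").decode("cp1252")


def repair_double_encoding(text: str) -> str:
    """Undo one round of that damage by reversing both steps."""
    return text.encode("cp1252").decode("utf-8")


euro = "\u20ac"
broken = double_encode(euro)
print(ascii(broken))                           # '\xe2\u201a\xac', i.e. "â‚¬"
print(repair_double_encoding(broken) == euro)  # True
```

The same round trip turns ’ into "â€™". Windows-1252 leaves five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so Python's codec raises an error for characters whose UTF-8 form contains one of them. Running the repair on text that was never double-encoded will either fail or corrupt it. Apply it only to rows you have confirmed are damaged, and only after a backup.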

rcane

Quote from: shawnb61 on December 06, 2021, 06:39:42 PM
That explains the problem with the right quote.  Pretty common UTF8 error.  Or Non-UTF8 error...


OK, I have backups; I make them daily.  But I've never run any SQL queries, so that is new territory for me.

This is all I see under Admin > Configuration > Languages > Edit Languages:





utf8mb4_unicode_ci is what's showing in phpMyAdmin.



shawnb61

How many languages do you have?

For each, go to Add Languages and add the utf8 version.

That's a good first step.

rcane


shawnb61

Good.  Simple.  Add the utf8 one.

Then find the sql window in phpmyadmin for step #7.

rcane

I tried running this in SQL:

-- Point members who picked a specific language at its -utf8 variant
UPDATE smfqg_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != '';

I had to change the manual's table name, smf_members, to what you see above, as that's the only members table I have.

No changes were made when I tested it.

I have only english, but I'm not sure if that is correct.  I need a baseline here.  A reference datum from which to move forward.

My SMF language is as shown in the image I attached.

If I look under Admin > Languages, I see two "english" entries: one says UTF8 and one says NO UTF8.


So, I first need to learn where to be looking to confirm all these things.


1. Also unsure how to check languages in the Users column; was that in the members php? I didn't see one in there if so; and

2. running the replacement query in SQL; I read that can be done, but I need a handle on all the places I need to be checking.  I'm still learning the environment.

shawnb61

Quote from: rcane on December 07, 2021, 03:01:28 PM
I tried running this in SQL:

No changes were made when I tested it.
I have only english, but I'm not sure if that is correct.  I need a baseline here.  A reference datum from which to move forward.

If you only ever had one language, then folks would never have needed to change the language from the default, so getting 0 results makes sense.  The point here was to ensure that folks who made language decisions (not using the default) will use the utf8 version going forward.  I think you're good.
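To see why the readme's query reported no changes, here is the same UPDATE run against a tiny stand-in for smfqg_members in an in-memory SQLite database, using Python's built-in sqlite3. SQLite's || operator stands in for MySQL's CONCAT(). The table name "members", its columns, and the helper names are assumptions made for the demo.

```python
import sqlite3

# The readme's query, with SQLite's || operator standing in for MySQL's CONCAT().
UPDATE_SQL = "UPDATE members SET lngfile = lngfile || '-utf8' WHERE lngfile != ''"


def make_db(lngfiles):
    """Build a hypothetical members table; an empty lngfile means 'forum default'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE members (id INTEGER PRIMARY KEY, lngfile TEXT NOT NULL DEFAULT '')"
    )
    conn.executemany("INSERT INTO members (lngfile) VALUES (?)", [(f,) for f in lngfiles])
    return conn


def run_update(conn):
    """Run the language update and return how many rows it changed."""
    return conn.execute(UPDATE_SQL).rowcount


everyone_default = make_db(["", "", ""])
print(run_update(everyone_default))  # 0: nobody picked a non-default language

one_picked_english = make_db(["", "english"])
print(run_update(one_picked_english))  # 1
print(one_picked_english.execute("SELECT lngfile FROM members WHERE id = 2").fetchone()[0])  # english-utf8
```

Running the update a second time would append the suffix again ("english-utf8-utf8"). That is why the query is meant to be run only once.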

Quote from: rcane on December 07, 2021, 03:01:28 PM
So, I first need to learn where to be looking to confirm all these things.
1. Also unsure how to check languages in the Users column; was that in the members php? I didn't see one in there if so; and

This is displayed in the same place, under Admin|Configuration|Languages.  If everything worked, you'll see everyone on your UTF8 version of the language.  See the screenshot below.

(If not, don't sweat it - I don't think there is any real difference between the english & english.utf8... This is mainly an issue for other languages.)

Quote from: rcane on December 07, 2021, 03:01:28 PM
2. running the replacement query in SQL; I read that can be done, but I need a handle on all the places I need to be checking.  I'm still learning the environment.

You only needed to do the replacement query once, in your SQL window. You're done with that.  If all your users are on the proper language, you're good.  Done.


rcane

All of them are showing the -utf8 version now.  Thanks.

What about Convert HTML-entities to UTF-8 characters?

Can it correct the other typos out there?

shawnb61

No, it will not correct the issues.  But you will see far fewer character issues going forward now you are on utf8.

You should run the entity conversion at least once.

When you are NOT on utf8, multi-byte characters are stored as 'html entities', which is a codified form of the characters.  They must, because your DB doesn't support multi-byte characters yet.  Once you are using utf8, they don't need to be stored as entities, you can simply use the actual characters.  This function will replace entities with the actual characters.

E.g., if your forum is NOT utf8, then 'काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥' is stored in entity form, one numeric entity per character, e.g., '&#2325;&#2366;&#2330;&#2306;' for just the first word, 'काचं'.

After you are on utf8, the Convert Entities to Characters function cleans these up where possible so they are stored as 'काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥' as expected.
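For a feel of what the conversion does, here is a hypothetical Python sketch. It is not SMF's code, just the same idea, assuming the stored form uses numeric entities such as &#2325; or &#x20AC;. It deliberately converts only non-ASCII code points and leaves &amp;, &#039; and similar entities alone, because those escape characters that matter to HTML.

```python
import re

# Numeric character references: decimal (&#8217;) or hexadecimal (&#x2019;).
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def entities_to_chars(text: str) -> str:
    """Replace numeric entities for non-ASCII characters with the characters.

    ASCII-range entities (such as &#039;) and named ones (such as &amp;) are
    left alone, since they escape characters that matter to HTML.
    """
    def repl(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Skip ASCII, UTF-16 surrogates and values past the Unicode range.
        if 0x80 <= code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF:
            return chr(code)
        return m.group(0)

    return NUMERIC_ENTITY.sub(repl, text)


print(entities_to_chars("It&#8217;s &amp; &#039;ok&#039;") == "It\u2019s &amp; &#039;ok&#039;")  # True
```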

Even in "English", there are a lot of multi-byte characters, e.g., the Euro symbol, the copyright symbol, the right-quote, etc...  UTF8 is definitely the way to go.
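Here is a quick way to check that claim, sketched in Python. The helper name is hypothetical. The check relies only on the fact that ISO-8859-1 covers exactly the first 256 Unicode code points.

```python
def fits_latin1(ch: str) -> bool:
    """True if ch exists in ISO-8859-1, i.e. its code point is U+0000..U+00FF."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Apostrophe, copyright sign, right single quotation mark, euro sign.
for ch in ("'", "\u00a9", "\u2019", "\u20ac"):
    utf8 = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(utf8)} UTF-8 byte(s), in ISO-8859-1: {fits_latin1(ch)}")
```

The right quote and the euro sign each take three bytes in UTF-8 and have no slot at all in ISO-8859-1. That is why a non-UTF-8 forum has to fall back to entities for them. The copyright sign fits in ISO-8859-1 but still takes two bytes in UTF-8.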

Hope this helps...

rcane

Quote from: shawnb61 on December 08, 2021, 12:50:34 AM
No, it will not correct the issues.  But you will see far fewer character issues going forward now you are on utf8.

Hope this helps...

It does. Thank you.
