2.0.16 non-Latin character corruption bug upon inline edit

Started by spiros, January 02, 2020, 05:40:35 AM

shawnb61

I cannot replicate this exactly, but I am definitely seeing some issues.  Logged #123.

A few questions:
- Is your DB UTF8?
- What browser are you using? Is it current?
- MySQL or pg?   Version?
- PHP version?

Using the inline edit buttons, I do see some erroneous "subject empty" and "message empty" errors when multi-byte characters are encountered. Sometimes the edit also gets no response at all.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

Illori

If the issue is being seen here: the database on this site is UTF-8, and we are using
Database Server: MySQL
PHP: 7.1.28

spiros

- UTF8
- All major current browsers
- MariaDB 10.3
- PHP 7.1/7.2/7.3

Quote from: shawnb61 on January 02, 2020, 02:04:04 PM
I cannot replicate this exactly

Please try inline-editing a post here and entering some Greek or Cyrillic text, for example, in the title or body.

shawnb61

OK, I have since been able to replicate.  It depends which characters are being used. 

If there are any 4-byte UTF8 characters, you get the old erroneous empty message/subject. 

If there are only 2- or 3-byte UTF8 characters, the corruption reported in this topic can occur.

spiros

Actually, I got empty message warning even with 3-byte UTF8 characters (I presume that this is what Greek characters are).

shawnb61

Yes - there are variations depending on if you are editing an existing message - & what mixture of content existed before & after. 

shawnb61

Quote from: spiros on January 02, 2020, 02:16:53 PM
Actually, I got empty message warning even with 3-byte UTF8 characters (I presume that this is what Greek characters are).

I think Greek monotonic characters are 2-bytes, & classical & polytonic are 3-bytes. 

Helpful resources:
https://www.utf8-chartable.de/unicode-utf8-table.pl
http://kermitproject.org/utf8.html
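
For anyone who wants to check byte lengths without a chart, here is a minimal Python sketch. The helper name `utf8_len` and the sample characters are only illustrative choices, not anything from SMF; the code simply encodes each character as UTF-8 and counts the bytes.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes the single character `ch` occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# Sample characters, written as escapes so the exact code points are unambiguous.
samples = {
    "a": "Latin small a",
    "\u03b1": "Greek small alpha (monotonic)",
    "\u03ac": "Greek small alpha with tonos (monotonic)",
    "\u1f00": "Greek small alpha with psili (polytonic)",
    "\u0416": "Cyrillic capital zhe",
    "\U0001F600": "emoji, outside the Basic Multilingual Plane",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X}  {utf8_len(ch)} byte(s)  {label}")
```

Code points below U+0800 (which covers the basic Greek and Cyrillic blocks) take at most 2 bytes, the Greek Extended block used for polytonic text takes 3, and anything above U+FFFF takes 4.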


spiros

I guess some multibyte-unsafe string was used somewhere along the line?

shawnb61

No.  It should all be safe.  This is an SMF bug.

At the core of the issue is that the early MySQL implementation of UTF8 was incomplete, & only supported 1-to-3 byte UTF8 characters.  4-byte UTF8 characters were not supported.  SMF started supporting UTF8 in this state, and thus requires special logic for 4-byte characters.  So... SMF has a lot of history & configurations to support with the same logic - non-UTF8, as well as a brain-damaged version of UTF8. 
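
To illustrate what "special logic for 4-byte characters" can look like in general, here is a rough Python sketch. It is not SMF's actual code: the function names `has_4byte_chars` and `astral_to_entities` are made up for this example, and replacing the characters with numeric HTML entities is just one common workaround for a 3-byte-only `utf8` column, assumed here for illustration.

```python
import re

# Characters above U+FFFF are the ones that need 4 bytes in UTF-8.
_FOUR_BYTE = re.compile("[\U00010000-\U0010FFFF]")

def has_4byte_chars(text: str) -> bool:
    """True if `text` contains any character that needs 4 bytes in UTF-8."""
    return _FOUR_BYTE.search(text) is not None

def astral_to_entities(text: str) -> str:
    """Replace each 4-byte character with a decimal HTML entity (hypothetical workaround)."""
    return _FOUR_BYTE.sub(lambda m: f"&#{ord(m.group())};", text)

if __name__ == "__main__":
    greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"  # Greek text, 2-byte characters only
    emoji = "ok \U0001F600"                                     # contains one 4-byte character
    print(has_4byte_chars(greek))     # no 4-byte characters here
    print(astral_to_entities(emoji))  # emoji becomes a numeric entity
```

With logic like this, text made up only of 2- and 3-byte characters passes through untouched, so any corruption seen with such text has to come from somewhere else in the handling.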

spiros

Well, I never had any such issues up to 2.0.14, and I have tested this issue with 2-to-3 byte UTF8 characters. I guess you are talking about using utf8mb4 to resolve 4-byte UTF8 character issues, which is a much broader thing: https://github.com/SimpleMachines/SMF2.1/issues/5108. And, of course, no 2-to-3 byte UTF8 character issues occur with the standard modify.

shawnb61

This is a dupe, so I'll close out this one & keep the other, which has a bit more discussion.

Dupe of:  https://www.simplemachines.org/community/index.php?topic=571082.0