Simple Machines Community Forum

SMF Development => Bug Reports => Fixed or Bogus Bugs => Topic started by: spiros on January 02, 2020, 05:40:35 AM

Title: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: spiros on January 02, 2020, 05:40:35 AM
This can be seen even on the SMF site: https://www.simplemachines.org/community/index.php?topic=570570.msg4042163#msg4042163

Reported here: https://www.simplemachines.org/community/index.php?topic=571082.new#new
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: shawnb61 on January 02, 2020, 02:04:04 PM
I cannot replicate this exactly, but I am definitely seeing some issues.  Logged #123.

A few questions:
- Is your DB UTF8?
- What browser are you using? Is it current?
- MySQL or PostgreSQL? Which version?
- PHP version?

Using the inline edit buttons, I do see some erroneous "subject empty" and "message empty" errors when multi-byte characters are encountered. Sometimes there is no response at all.
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: Illori on January 02, 2020, 02:05:55 PM
If the issue is being seen here: the database on this site is UTF-8. We are using:
Database Server: MySQL
PHP: 7.1.28
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: spiros on January 02, 2020, 02:13:20 PM
- UTF8
- All major current browsers
- MariaDB 10.3
- PHP 7.1/7.2/7.3

Quote from: shawnb61 on January 02, 2020, 02:04:04 PM
I cannot replicate this exactly

Please try editing inline a specific post here and entering in title or body Greek or Cyrillic text for example.
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: shawnb61 on January 02, 2020, 02:14:32 PM
OK, I have since been able to replicate.  It depends which characters are being used. 

If there are any 4-byte UTF8 characters, you get the old erroneous empty message/subject. 

If there are only 2- or 3-byte UTF8 characters, the corruption reported in this topic can occur.

Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: spiros on January 02, 2020, 02:16:53 PM
Actually, I got the empty message warning even with 3-byte UTF8 characters (I presume that this is what Greek characters are).
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: shawnb61 on January 02, 2020, 02:20:15 PM
Yes - there are variations depending on whether you are editing an existing message, & on what mixture of content existed before & after.
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: shawnb61 on January 02, 2020, 02:29:14 PM
Quote from: spiros on January 02, 2020, 02:16:53 PM
Actually, I got the empty message warning even with 3-byte UTF8 characters (I presume that this is what Greek characters are).

I think Greek monotonic characters are 2 bytes, & classical & polytonic are 3 bytes.

Helpful resources:
https://www.utf8-chartable.de/unicode-utf8-table.pl
http://kermitproject.org/utf8.html
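
A quick way to check byte widths without a chart is to encode single characters and count the bytes. The sketch below is only an illustration in Python (not SMF code); the helper name utf8_len and the sample characters are chosen here for the example.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters picked for illustration only.
samples = [
    ("Latin small a", "a"),                                # U+0061
    ("Greek alpha with tonos (monotonic)", "\u03ac"),      # U+03AC
    ("Greek alpha with psili (polytonic)", "\u1f00"),      # U+1F00
    ("Cyrillic small zhe", "\u0436"),                      # U+0436
    ("Grinning face emoji", "\U0001F600"),                 # U+1F600
]

for name, ch in samples:
    print(f"U+{ord(ch):04X} {name}: {utf8_len(ch)} byte(s)")
```

Monotonic Greek and Cyrillic come out at 2 bytes, polytonic Greek (the Greek Extended block) at 3 bytes, and the emoji at 4 bytes.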

Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: spiros on January 02, 2020, 02:30:01 PM
I guess some multibyte-unsafe string was used somewhere along the line?
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: shawnb61 on January 02, 2020, 02:41:23 PM
No.  It should all be safe.  This is an SMF bug.

At the core of the issue is that the early MySQL implementation of UTF8 was incomplete, & only supported 1-to-3 byte UTF8 characters.  4-byte UTF8 characters were not supported.  SMF started supporting UTF8 in this state, and thus requires special logic for 4-byte characters.  So... SMF has a lot of history & configurations to support with the same logic - non-UTF8, as well as a brain-damaged version of UTF8. 
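
To make the "special logic for 4-byte characters" concrete: one common workaround for a 3-byte-only database charset is to turn every code point above U+FFFF into a numeric HTML entity before saving. The sketch below shows that general technique in Python. It is an assumption for illustration, not SMF's actual code, and the function names are made up here.

```python
def has_four_byte_chars(text: str) -> bool:
    """True if any character needs 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def escape_four_byte_chars(text: str) -> str:
    """Replace each 4-byte character with a decimal HTML entity.

    Characters that fit in 1 to 3 bytes (Latin, Greek, Cyrillic, ...)
    are left untouched.
    """
    return "".join(
        f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


subject = "\u0393\u03b5\u03b9\u03ac \U0001F600"  # Greek greeting plus an emoji
print(has_four_byte_chars(subject))     # the emoji is a 4-byte character
print(escape_four_byte_chars(subject))  # only the emoji becomes an entity
```

The Greek text passes through unchanged, so a step like this should never be the reason 2- or 3-byte characters get corrupted or rejected.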
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: spiros on January 02, 2020, 02:55:42 PM
Well, I never had any such issues up to 2.0.14, and I have tested this issue with 2-to-3 byte UTF8 characters. I guess you are talking about using utf8mb4 to resolve 4-byte UTF8 character issues, which is a much broader thing: https://github.com/SimpleMachines/SMF2.1/issues/5108. And, of course, no 2-to-3 byte UTF8 character issues occur with standard modify.
Title: Re: 2.0.16 non-Latin character corruption bug upon inline edit
Post by: shawnb61 on January 07, 2020, 12:52:58 AM
This is a dupe, so I'll close out this one & keep the other, which has a bit more discussion.

Dupe of:  https://www.simplemachines.org/community/index.php?topic=571082.0