You can see it even on the SMF site: https://www.simplemachines.org/community/index.php?topic=570570.msg4042163#msg4042163
Reported here: https://www.simplemachines.org/community/index.php?topic=571082.new#new
I cannot replicate this exactly, but I am definitely seeing some issues. Logged #123.
A few questions:
- Is your DB UTF8?
- What browser are you using? Is it current?
- MySQL or pg? Version?
- PHP version?
Using the inline edit buttons I do see some erroneous "subject empty" and "message empty" items when multi-byte characters are encountered. Also some non-responses.
If the issue is being seen here, we are UTF-8 in the database. We are using:
Database Server: MySQL
PHP: 7.1.28
- UTF8
- All major current browsers
- MariaDB 10.3
- PHP 7.1/7.2/7.3
Quote from: shawnb61 on January 02, 2020, 02:04:04 PM
I cannot replicate this exactly
Please try editing inline a specific post here and entering in title or body Greek or Cyrillic text for example.
OK, I have since been able to replicate. It depends which characters are being used.
If there are any 4-byte UTF8 characters, you get the old erroneous empty message/subject.
If there are only 2 or 3-byte UTF8 characters, this can occur.
Actually, I got empty message warning even with 3-byte UTF8 characters (I presume that this is what Greek characters are).
Yes - there are variations depending on if you are editing an existing message - & what mixture of content existed before & after.
Quote from: spiros on January 02, 2020, 02:16:53 PM
Actually, I got empty message warning even with 3-byte UTF8 characters (I presume that this is what Greek characters are).
I think Greek monotonic characters are 2 bytes, & classical & polytonic are 3 bytes.
Helpful resources:
https://www.utf8-chartable.de/unicode-utf8-table.pl
http://kermitproject.org/utf8.html
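For a quick check without the tables, here is a minimal Python sketch that prints how many bytes a character takes in UTF-8. The helper name utf8_len and the sample characters are just my picks for illustration, nothing from SMF itself.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters chosen for illustration only.
samples = {
    "Latin a (U+0061)": "a",
    "Greek monotonic alpha (U+03B1)": "\u03b1",
    "Cyrillic de (U+0434)": "\u0434",
    "Greek polytonic alpha with psili (U+1F00)": "\u1f00",
    "Emoji grinning face (U+1F600)": "\U0001F600",
}

for label, ch in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")
```

Monotonic Greek and Cyrillic sit below U+0800, so they come out as 2 bytes; the polytonic block starts at U+1F00, so those are 3 bytes; anything above U+FFFF is 4 bytes.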
I guess some multibyte-unsafe string was used somewhere along the line?
No. It should all be safe. This is an SMF bug.
At the core of the issue is that the early MySQL implementation of UTF8 was incomplete, & only supported 1-to-3 byte UTF8 characters. 4-byte UTF8 characters were not supported. SMF started supporting UTF8 in this state, and thus requires special logic for 4-byte characters. So... SMF has a lot of history & configurations to support with the same logic - non-UTF8, as well as a brain-damaged version of UTF8.
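To illustrate the kind of special logic I mean, here is a rough Python sketch (not SMF's actual code, which is PHP) that detects characters needing 4 bytes in UTF-8 and swaps them for numeric HTML entities so the text fits in a 3-byte-only column. The function names are hypothetical, and the entity replacement is just one possible workaround, assumed here for illustration.

```python
import re

# Code points above U+FFFF are exactly the ones that need 4 bytes in UTF-8.
_FOUR_BYTE = re.compile("[\U00010000-\U0010FFFF]")


def has_four_byte_chars(text: str) -> bool:
    """True if the text contains any character that needs 4 bytes in UTF-8."""
    return _FOUR_BYTE.search(text) is not None


def escape_four_byte_chars(text: str) -> str:
    """Replace each 4-byte character with a decimal HTML entity (hypothetical workaround)."""
    return _FOUR_BYTE.sub(lambda m: f"&#{ord(m.group())};", text)


if __name__ == "__main__":
    subject = "Test \u03b1\u1f00 \U0001F600"
    print(has_four_byte_chars(subject))
    print(escape_four_byte_chars(subject))
```

The 2- and 3-byte characters pass through untouched; only the 4-byte ones get rewritten, which is why bugs in this path tend to depend on which characters are in the post.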
Well, I never had any such issues up to 2.0.14, and I have tested this issue with 2-to-3 byte UTF8 characters. I guess you are talking about using utf8mb4 to resolve 4-byte UTF8 character issues, which is a much broader thing: https://github.com/SimpleMachines/SMF2.1/issues/5108. And, of course, no 2-to-3 byte UTF8 character issues occur with the standard modify.
This is a dupe, so I'll close out this one & keep the other, which has a bit more discussion.
Dupe of: https://www.simplemachines.org/community/index.php?topic=571082.0