Database Error: Incorrect string value

Started by zselby, April 17, 2023, 11:41:07 AM

Previous topic - Next topic

zselby

Hi,

I recently upgraded to 2.1.3 from 1.1.x.  Things seemed to go fine, however I'm occasionally seeing the following errors in the log and a few users are getting database errors when trying to post. 

Errors are:

Database Error: Incorrect string value: '\xE2\xEF\xBF\xBD\xEF\xBF...' for column `larrivee_forumtemp`.`smf_messages`.`body` at row 1

and

Database Error: Incorrect string value: '\xC2 ...' for column `larrivee_forumtemp`.`smf_messages`.`body` at row 1

Collation of all tables and columns is utf8_general_ci which I updated via phpMyAdmin.  Only mod installed is EzPortal.  I searched and tried everything I could find, but continue getting these errors.  Any help would be greatly appreciated.

Doug Heffernan

Quote from: zselby on April 17, 2023, 11:41:07 AMCollation of all tables and columns is utf8_general_ci which I updated via phpMyAdmin. 

Can you make a screenshot of the structure area of the smf_messages table and post it here?

zselby


Doug Heffernan

What about the language, what is the default language set to and what language are those members that get the error using?

Can you try something? Make first a backup of your database and then go to your Acp->Forum Maintenance->Database and run the following option: Convert HTML-entities to UTF-8 characters?

I also noticed that the Type for the body column is set to mediumtext. That should be set to text.

zselby

Quote from: Doug Heffernan on April 17, 2023, 01:58:13 PMWhat about the language, what is the default language set to and what language are those members that get the error using?

Can you try something? Make first a backup of your database and then go to your Acp->Forum Maintenance->Database and run the following option: Convert HTML-entities to UTF-8 characters?

I also noticed that the Type for the body column is set to mediumtext. That should be set to text.

Only language I have installed in English and all users (I think) are using English.  Is there a place to check that?

I don't have Convert HTML-entities to UTF-8 characters as an option in ACP->forum maintenance->database.  Only two options I have are Optimize All Tables and Convert to Text.

Arantor

How *exactly* did you do the conversion to UTF-8? Did you do it manually before the upgrade?
Holder of controversial views, all of which my own.


Doug Heffernan

Quote from: zselby on April 17, 2023, 02:08:57 PMOnly language I have installed in English and all users (I think) are using English.  Is there a place to check that?

You can check it by going to your Admin Panel->Languages. What does it say under the Character Set for the language that you are using?

Quote from: zselby on April 17, 2023, 02:08:57 PMI don't have Convert HTML-entities to UTF-8 characters as an option in ACP->forum maintenance->database.  Only two options I have are Optimize All Tables and Convert to Text.

It should be there, the last option. Can you post a screenshot of how it looks like at your Admin Panel?

Additionally, check out the following link and make sure that your conversion to UTF-8 has been done properly/thoroughly.

https://wiki.simplemachines.org/smf/UTF-8_Readme#:~:text=Go%20to%20Forum%20Maintenance%20%3E%20Convert,of%20your%20default%20language%20file.

zselby

Quote from: Arantor on April 17, 2023, 02:09:59 PMHow *exactly* did you do the conversion to UTF-8? Did you do it manually before the upgrade?

To be honest I don't remember.  I vaguely remember converting to UTF-8 via admin when I was running 1.x.x, but maybe that's not possible.  The forum was running 1.x.x for almost 19 years.

I didn't make the change in phpMyAdmin until after the upgrade when I starting seeing these errors.

zselby

Quote from: Doug Heffernan on April 17, 2023, 02:21:31 PM
Quote from: zselby on April 17, 2023, 02:08:57 PMOnly language I have installed in English and all users (I think) are using English.  Is there a place to check that?

You can check it by going to your Admin Panel->Languages. What does it say under the Character Set for the language that you are using?

Quote from: zselby on April 17, 2023, 02:08:57 PMI don't have Convert HTML-entities to UTF-8 characters as an option in ACP->forum maintenance->database.  Only two options I have are Optimize All Tables and Convert to Text.

It should be there, the last option. Can you post a screenshot of how it looks like at your Admin Panel?

Additionally, check out the following link and make sure that your conversion to UTF-8 has been done properly/thoroughly.

https://wiki.simplemachines.org/smf/UTF-8_Readme#:~:text=Go%20to%20Forum%20Maintenance%20%3E%20Convert,of%20your%20default%20language%20file.

English is the only language listed and the character set is UTF-8.

Screenshot is attached.

Doug Heffernan

Quote from: zselby on April 17, 2023, 02:28:43 PMEnglish is the only language listed and the character set is UTF-8.

It appears that you are using a custom theme. Try switching to the default one and see if the Convert HTML-entities to UTF-8 characters will appear then.


zselby

Quote from: Doug Heffernan on April 17, 2023, 02:32:51 PM
Quote from: zselby on April 17, 2023, 02:28:43 PMEnglish is the only language listed and the character set is UTF-8.

It appears that you are using a custom theme. Try switching to the default one and see if the Convert HTML-entities to UTF-8 characters will appear then.



Thanks.  I found that suggestion via search the other day and just tried it again.  I'm still not seeing it listed.

Doug Heffernan

Quote from: zselby on April 17, 2023, 02:35:31 PMThanks.  I found that suggestion via search the other day and just tried it again.  I'm still not seeing it listed.

That 's very strange. That option should be there.

What about this?

Quote from: Doug Heffernan on April 17, 2023, 02:21:31 PMAdditionally, check out the following link and make sure that your conversion to UTF-8 has been done properly/thoroughly.

https://wiki.simplemachines.org/smf/UTF-8_Readme#:~:text=Go%20to%20Forum%20Maintenance%20%3E%20Convert,of%20your%20default%20language%20file.

Arantor

Quote from: Doug Heffernan on April 17, 2023, 02:39:09 PMThat option should be there.

Only if SMF thinks it's in UTF-8 mode. Evidence suggests it does not think this because it's trying to write ISO-encoded content into UTF-8 tables.
Holder of controversial views, all of which my own.


zselby

Quote from: Doug Heffernan on April 17, 2023, 02:39:09 PM
Quote from: zselby on April 17, 2023, 02:35:31 PMThanks.  I found that suggestion via search the other day and just tried it again.  I'm still not seeing it listed.

That 's very strange. That option should be there.

What about this?

Quote from: Doug Heffernan on April 17, 2023, 02:21:31 PMAdditionally, check out the following link and make sure that your conversion to UTF-8 has been done properly/thoroughly.

https://wiki.simplemachines.org/smf/UTF-8_Readme#:~:text=Go%20to%20Forum%20Maintenance%20%3E%20Convert,of%20your%20default%20language%20file.


3. Go to Forum Maintenance > Convert the database and data to UTF-8 (this option will only be available if SMF detects a database version which supports UTF-8).

That's what I believe I did when running 1.1, and I don't have it as an option now.

UPDATE smf_members
SET lngfile = CONCAT(lngfile, '-utf8')
WHERE lngfile != ''


I had not run that query but just did.

shawnb61

Some more questions:
- What is the value of $db_character_set in your Settings.php file?  (It should be: 'utf8')
- What is the value of the 'global_character_set' entry in your smf_settings table?  (It should be: 'UTF-8')
- Are they getting the errors when updating existing posts, or adding new posts?
- Do you know what the source of the material is?

xC2 is not a valid utf8 character.  (Is there any more to that string?)

Neither is  xE2EFBFBDEFBF. 

SMF2.1 will strip out/cleanse invalid utf8 characters when posting, so I don't think these are fresh posts.  My first suspicion is that this invalid data was introduced back in SMF1.x, and these messages are popping up as the records are being updated directly or indirectly somehow.

A question worth asking is born in experience & driven by necessity. - Fripp

zselby

Quote from: shawnb61 on April 17, 2023, 02:53:16 PMSome more questions:
- What is the value of $db_character_set in your Settings.php file?  (It should be: 'utf8')
'utf8'
Quote- What is the value of the 'global_character_set' entry in your smf_settings table?  (It should be: 'UTF-8')
I don't have a global_character_set in smf_settings
Quote- Are they getting the errors when updating existing posts, or adding new posts?
I don't know for all cases, but in at least one it was replying to a recent post that was created after upgrading.
Quote- Do you know what the source of the material is?
I don't
QuotexC2 is not a valid utf8 character.  (Is there any more to that string?)
No - Database Error: Incorrect string value: '\xC2 ...'
QuoteNeither is  xE2EFBFBDEFBF. 

SMF2.1 will strip out/cleanse invalid utf8 characters when posting, so I don't think these are fresh posts.  My first suspicion is that this invalid data was introduced back in SMF1.x, and these messages are popping up as the records are being updated directly or indirectly somehow.


Sesquipedalian

In the first error message, the sequence of bytes indicated by "\xE2\xEF\xBF\xBD\xEF\xBF..." is not valid UTF-8. The substring "\xEF\xBF\xBD" is the sequence of bytes for the Unicode replacement character, "�", but the single byte before it, "\xE2", can never occur alone in a valid UTF-8 string. The database is rejecting the string because it is malformed.

My suspicion is that the values of $context['utf8'] and/or $context['character_set'] somehow have gotten messed up, and that this is causing problems in the sanitize_chars() function in Subs.php. In particular, if these values are incorrect in just the right way, and the input string contains some disallowed control characters, it might be possible that sanitize_chars() could end up trying to use the wrong method to replace those disallowed characters.

@zselby:

1. Can you give us the entire error message from the error log, and especially the backtrace information?

2. Please open the ./Themes/default/languages/index.english.php file and verify that the following line appears, written exactly as shown:

$txt['lang_character_set'] = 'UTF-8';
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.

zselby

Quote from: Sesquipedalian on April 17, 2023, 03:08:55 PMIn the first error message, the sequence of bytes indicated by "\xE2\xEF\xBF\xBD\xEF\xBF..." is not valid UTF-8. The substring "\xEF\xBF\xBD" is the sequence of bytes for the Unicode replacement character, "�", but the single byte before it, "\xE2", can never occur alone in a valid UTF-8 string. The database is rejecting the string because it is malformed.

My suspicion is that the values of $context['utf8'] and/or $context['character_set'] somehow have gotten messed up, and that this is causing problems in the sanitize_chars() function in Subs.php. In particular, if these values are incorrect in just the right way, and the input string contains some disallowed control characters, it might be possible that sanitize_chars() could end up trying to replace those disallowed characters with "�" rather than "?".

@zselby:

1. Can you give us the entire error message from the error log, and especially the backtrace information?

Error
Type of error
Database
Error message
Database Error: Incorrect string value: '\xF0\xEF\xBF\xBD\xA4\xEF...' for column `larrivee_forumtemp`.`smf_personal_messages`.`body` at row 1
File
/home/larrivee/public_html/forum/Sources/Subs-Post.php
Line
1145
URL of page causing the error
http://www.larriveeforum.com/forum/index.php?action=pm;sa=send2 [nofollow]
Backtrace information
#0: smf_db_error()
Called from /home/larrivee/public_html/forum/Sources/Subs-Db-mysql.php on line 494
#1: smf_db_query()
Called from /home/larrivee/public_html/forum/Sources/Subs-Db-mysql.php on line 811
#2: smf_db_insert()
Called from /home/larrivee/public_html/forum/Sources/Subs-Post.php on line 1145
#3: sendpm()
Called from /home/larrivee/public_html/forum/Sources/PersonalMessage.php on line 2560
#4: MessagePost2()
Called from /home/larrivee/public_html/forum/Sources/Subs.php on line 6016
#5: call_helper()
Called from /home/larrivee/public_html/forum/Sources/PersonalMessage.php on line 220
#6: MessageMain()
Called from /home/larrivee/public_html/forum/index.php on line 192

----------------------------------------------------------------------------------------------

Error
Type of error
Database
Error message
Database Error: Incorrect string value: '\xC2 ...' for column `larrivee_forumtemp`.`smf_messages`.`body` at row 1
File
/home/larrivee/public_html/forum/Sources/Subs-Post.php
Line
1897
URL of page causing the error
http://www.larriveeforum.com/forum/index.php?action=post2;start=0;board=5 [nofollow]
Backtrace information
#0: smf_db_error()
Called from /home/larrivee/public_html/forum/Sources/Subs-Db-mysql.php on line 494
#1: smf_db_query()
Called from /home/larrivee/public_html/forum/Sources/Subs-Db-mysql.php on line 811
#2: smf_db_insert()
Called from /home/larrivee/public_html/forum/Sources/Subs-Post.php on line 1897
#3: createPost()
Called from /home/larrivee/public_html/forum/Sources/Post.php on line 2380
#4: Post2()
Called from /home/larrivee/public_html/forum/index.php on line 192

Quote2. Please open the ./Themes/default/languages/index.english.php file and verify that the following line appears, written exactly as shown:

$txt['lang_character_set'] = 'UTF-8';

Yes:  $txt['lang_character_set'] = 'UTF-8';

shawnb61

First suggestion is to set global_character_set...
A question worth asking is born in experience & driven by necessity. - Fripp

zselby


Advertisement: