Incorrect string value following upgrade to 2.1 RC3 from 2.0.17

Started by ozbob, December 23, 2020, 09:01:43 PM

Previous topic - Next topic

ozbob

Hello

Upgraded a test forum from 2.0.17 to 2.1RC3.

Manged to solve a few minor issues and is running well except for one repeating error with database.

Type of error: Database
: Incorrect string value: '\xE0\xAE\x9A\xE0\xAE\xBF...' for column `railbotforum_smf21`.`smf_settings`.`value` at row 1
http://railbotforum.org/forum/cron.php
/home/railbotforum/public_html/forum/Sources/Subs.php (Line 528)

Anyone know how to resolve this error please?

Thanks.
Bob

ozbob

I put in the correct overall time offset and I thought the error had gone away.

Still happening, but much slower rate.

ozbob

Interesting.  Error seems to have gone away now.  Might have cleared out some incorrect strings or something.

Edit: No, still occurring but just the odd error now.

Might see if it eventually goes away.


albertlast


ozbob

Error still occurring:

Type of error: Database
Error message:
: Incorrect string value: '\xE0\xAE\x9A\xE0\xAE\xBF...' for column `railbotforum_smf21`.`smf_settings`.`value` at row 1
File: /home/railbotforum/public_html/forum/Sources/Subs.php
Line: 528
URL of page causing the error: https://railbotforum.org/forum/index.phphttp://railbotforum.org/forum/cron.php

=====

Backtrace information

#0: smf_db_error()
Called from /home/railbotforum/public_html/forum/Sources/Subs-Db-mysql.php on line 489
#1: smf_db_query()
Called from /home/railbotforum/public_html/forum/Sources/Subs-Db-mysql.php on line 804
#2: smf_db_insert()
Called from /home/railbotforum/public_html/forum/Sources/Subs.php on line 528
#3: updateSettings()
Called from /home/railbotforum/public_html/forum/Sources/Subs.php on line 6605
#4: set_tld_regex()
Called from /home/railbotforum/public_html/forum/Sources/tasks/UpdateTldRegex.php on line 31
#5: execute()
Called from /home/railbotforum/public_html/forum/cron.php on line 233
#6: perform_task()
Called from /home/railbotforum/public_html/forum/cron.php on line 127

albertlast

so it's an error from set_tld_regex function in the subs.php file maybe in line 6605.

ozbob

This is around line 6605 on Subs.php

// Kill anything else.  < line 6605


function smf_serverResponse($data = '', $type = 'content-type: application/json')
{
global $db_show_debug, $modSettings;

// Defensive programming anyone?
if (empty($data))
return false;

// Don't need extra stuff...
$db_show_debug = false;

// Kill anything else.
ob_end_clean();

if (!empty($modSettings['CompressedOutput']))
@ob_start('ob_gzhandler');

else
ob_start();

// Set the header.
header($type);

// Echo!
echo $data;

// Done.
obExit(false);
}

ozbob

From Subs.php  from line 6640


function set_tld_regex($update = false)
{
global $sourcedir, $smcFunc, $modSettings;
static $done = false;

// If we don't need to do anything, don't
if (!$update && $done)
return;

// Should we get a new copy of the official list of TLDs?
if ($update)
{
$tlds = fetch_web_data('https://data.iana.org/TLD/tlds-alpha-by-domain.txt');
$tlds_md5 = fetch_web_data('https://data.iana.org/TLD/tlds-alpha-by-domain.txt.md5');

/**
* If the Internet Assigned Numbers Authority can't be reached, the Internet is GONE!
* We're probably running on a server hidden in a bunker deep underground to protect
* it from marauding bandits roaming on the surface. We don't want to waste precious
* electricity on pointlessly repeating background tasks, so we'll wait until the next
* regularly scheduled update to see if civilization has been restored.
*/
if ($tlds === false || $tlds_md5 === false)
$postapocalypticNightmare = true;

// Make sure nothing went horribly wrong along the way.
if (md5($tlds) != substr($tlds_md5, 0, 32))
$tlds = array();
}
// If we aren't updating and the regex is valid, we're done
elseif (!empty($modSettings['tld_regex']) && @preg_match('~' . $modSettings['tld_regex'] . '~', null) !== false)
{
$done = true;
return;
}

// If we successfully got an update, process the list into an array
if (!empty($tlds))
{
// Clean $tlds and convert it to an array
$tlds = array_filter(explode("\n", strtolower($tlds)), function($line)
{
$line = trim($line);
if (empty($line) || strlen($line) != strspn($line, 'abcdefghijklmnopqrstuvwxyz0123456789-'))
return false;
else
return true;
});

// Convert Punycode to Unicode
require_once($sourcedir . '/Class-Punycode.php');
$Punycode = new Punycode();
$tlds = array_map(function($input) use ($Punycode)
{
return $Punycode->decode($input);
}, $tlds);
}
// Otherwise, use the 2012 list of gTLDs and ccTLDs for now and schedule a background update
else
{
$tlds = array('com', 'net', 'org', 'edu', 'gov', 'mil', 'aero', 'asia', 'biz',
'cat', 'coop', 'info', 'int', 'jobs', 'mobi', 'museum', 'name', 'post',
'pro', 'tel', 'travel', 'xxx', 'ac', 'ad', 'ae', 'af', 'ag', 'ai', 'al',
'am', 'ao', 'aq', 'ar', 'as', 'at', 'au', 'aw', 'ax', 'az', 'ba', 'bb', 'bd',
'be', 'bf', 'bg', 'bh', 'bi', 'bj', 'bm', 'bn', 'bo', 'br', 'bs', 'bt', 'bv',
'bw', 'by', 'bz', 'ca', 'cc', 'cd', 'cf', 'cg', 'ch', 'ci', 'ck', 'cl', 'cm',
'cn', 'co', 'cr', 'cu', 'cv', 'cx', 'cy', 'cz', 'de', 'dj', 'dk', 'dm', 'do',
'dz', 'ec', 'ee', 'eg', 'er', 'es', 'et', 'eu', 'fi', 'fj', 'fk', 'fm', 'fo',
'fr', 'ga', 'gb', 'gd', 'ge', 'gf', 'gg', 'gh', 'gi', 'gl', 'gm', 'gn', 'gp',
'gq', 'gr', 'gs', 'gt', 'gu', 'gw', 'gy', 'hk', 'hm', 'hn', 'hr', 'ht', 'hu',
'id', 'ie', 'il', 'im', 'in', 'io', 'iq', 'ir', 'is', 'it', 'je', 'jm', 'jo',
'jp', 'ke', 'kg', 'kh', 'ki', 'km', 'kn', 'kp', 'kr', 'kw', 'ky', 'kz', 'la',
'lb', 'lc', 'li', 'lk', 'lr', 'ls', 'lt', 'lu', 'lv', 'ly', 'ma', 'mc', 'md',
'me', 'mg', 'mh', 'mk', 'ml', 'mm', 'mn', 'mo', 'mp', 'mq', 'mr', 'ms', 'mt',
'mu', 'mv', 'mw', 'mx', 'my', 'mz', 'na', 'nc', 'ne', 'nf', 'ng', 'ni', 'nl',
'no', 'np', 'nr', 'nu', 'nz', 'om', 'pa', 'pe', 'pf', 'pg', 'ph', 'pk', 'pl',
'pm', 'pn', 'pr', 'ps', 'pt', 'pw', 'py', 'qa', 're', 'ro', 'rs', 'ru', 'rw',
'sa', 'sb', 'sc', 'sd', 'se', 'sg', 'sh', 'si', 'sj', 'sk', 'sl', 'sm', 'sn',
'so', 'sr', 'ss', 'st', 'su', 'sv', 'sx', 'sy', 'sz', 'tc', 'td', 'tf', 'tg',
'th', 'tj', 'tk', 'tl', 'tm', 'tn', 'to', 'tr', 'tt', 'tv', 'tw', 'tz', 'ua',
'ug', 'uk', 'us', 'uy', 'uz', 'va', 'vc', 've', 'vg', 'vi', 'vn', 'vu', 'wf',
'ws', 'ye', 'yt', 'za', 'zm', 'zw',
);

// Schedule a background update, unless civilization has collapsed and/or we are having connectivity issues.
if (empty($postapocalypticNightmare))
{
$smcFunc['db_insert']('insert', '{db_prefix}background_tasks',
array('task_file' => 'string-255', 'task_class' => 'string-255', 'task_data' => 'string', 'claimed_time' => 'int'),
array('$sourcedir/tasks/UpdateTldRegex.php', 'Update_TLD_Regex', '', 0), array()
);
}
}

// Tack on some "special use domain names" that aren't in DNS but may possibly resolve.
// See https://www.iana.org/assignments/special-use-domain-names/ for more info.
$tlds = array_merge($tlds, array('local', 'onion', 'test'));

// Get an optimized regex to match all the TLDs
$tld_regex = build_regex($tlds);

// Remember the new regex in $modSettings
updateSettings(array('tld_regex' => $tld_regex));

// Redundant repetition is redundant
$done = true;
}

shawnb61

The thing is that \xE0\xAE\x9A\xE0\xAE\xBF is valid utf8.  0xE0AE9A is ச & 0xEOAEBF is ி (which seems to say the prior character is a vowel???). 

So it is having issues with சி in the downloaded tld string.  Which, I just checked, is the very first character...

Questions:
Is your settings table utf8? 
Are all the columns in your settings table utf8?

It should look something like the attached when the Structure tab is viewed in phpmyadmin. 


See also:
https://github.com/SimpleMachines/SMF2.1/pull/5707
https://github.com/SimpleMachines/SMF2.1/issues/5672
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

ozbob

Thanks.

Appears not.  I thought the db was converted to UTF8 during the upgrade process.

Should I attempt to convert  to UTF8?  I see there is a change tab in phpmyadmin.


shawnb61

It is converted during the upgrade process.  It appears something happened during the conversion that caused it to skip some tables/columns.

The real question is how bad is it?

Try to run this query from the Sql prompt in phpmyadmin:
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE COLLATION_NAME IS NOT NULL
AND COLLATION_NAME != 'utf8_general_ci'
AND TABLE_SCHEMA = '??'
AND TABLE_NAME LIKE 'smf_%';


Substitute your $db_name for ?? and your $db_prefix for smf_ in the last 2 lines. 

This will let us know how big the problem is... 
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

ozbob


ozbob

Ran the query:

Look like a number of tables not converted.


shawnb61

Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp


shawnb61

OK....   I would suggest:

1) Backing everything up...  Very important, in case we break something even worse...
2) Convert the above each to utf8_general_ci.  Repeat until the above query comes back with no rows.
3) Confirm you have an entry in your Settings.php file for $db_character_set = 'utf8';
4) Confirm you have an entry in your smf_settings table for:
      variable:  global_character_set
      value: UTF-8

3 & 4 are hopefully not necessary, just double-checking to be safe.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp

ozbob


shawnb61

I would do this at the table level in phpmyadmin.

If you do so, there is an option to make it convert all the columns also.  I think this is the best way to do it.
Address the process rather than the outcome.  Then, the outcome becomes more likely.   - Fripp


ozbob

Successfully converted all tables as instructed to UTF8 general.

Checked settings.php


######### Legacy Settings #########
# UTF-8 is now the only character set supported in 2.1.
$db_character_set = 'utf8';


Checked smf_settings table.

global_character_set UTF-8

All tables are now UTF8, ran the query and null return.

And

https://railbotforum.org/forum/index.php is running without errors.

Thank you so much for your assistance.  I have learned a bit too.

Have a great day (or is it night  :D )  I am in Australia and it is 8am Christmas Day morning.

Merry Christmas to you all as we say.

Thanks for your assistance and a great forum.

Advertisement: