Simple Machines Community Forum

SMF Support => SMF 2.0.x Support => Topic started by: pabloalcorta on June 15, 2019, 01:10:45 PM

Title: Error Convert HTML-entities to UTF-8 characters
Post by: pabloalcorta on June 15, 2019, 01:10:45 PM
Hello, I'm following the tutorial https://wiki.simplemachines.org/smf/UTF-8_Readme. When I run "Convert HTML-entities to UTF-8 characters" I get this error:

Incorrect string value: '\xF0\x9F\x98\xB3
File: /storage/ssd3/905/9716905/public_html/prueba12/Sources/ManageMaintenance.php
Line: 950


I'm using a separate server for testing.

Thanks
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: GigaWatt on June 17, 2019, 08:13:30 AM
Could you paste what's around line 950 in ManageMaintenance.php (20 lines above and below line 950)?
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: pabloalcorta on June 17, 2019, 01:07:01 PM
OK, here it is:
Code:
        if (empty($max_value))
            continue;

        while ($context['start'] <= $max_value)
        {
            // Retrieve a list of rows that has at least one entity to convert.
            $request = $smcFunc['db_query']('', '
                SELECT {raw:primary_keys}, {raw:columns}
                FROM {db_prefix}{raw:cur_table}
                WHERE {raw:primary_key} BETWEEN {int:start} AND {int:start} + 499
                    AND {raw:like_compare}
                LIMIT 500',
                array(
                    'primary_keys' => implode(', ', $primary_keys),
                    'columns' => implode(', ', $columns),
                    'cur_table' => $cur_table,
                    'primary_key' => $primary_key,
                    'start' => $context['start'],
                    'like_compare' => '(' . implode(' LIKE \'%&#%\' OR ', $columns) . ' LIKE \'%&#%\')',
                )
            );
            while ($row = $smcFunc['db_fetch_assoc']($request))
            {
                $insertion_variables = array();
                $changes = array();
                foreach ($row as $column_name => $column_value)
                    if ($column_name !== $primary_key && strpos($column_value, '&#') !== false)
                    {
                        $changes[] = $column_name . ' = {string:changes_' . $column_name . '}';
                        $insertion_variables['changes_' . $column_name] = preg_replace_callback('~&#(\d{1,7}|x[0-9a-fA-F]{1,6});~', 'fixchar__callback', $column_value);
                    }

                $where = array();
                foreach ($primary_keys as $key)
                {
                    $where[] = $key . ' = {string:where_' . $key . '}';
                    $insertion_variables['where_' . $key] = $row[$key];
                }

                // Update the row.
                if (!empty($changes))
                    $smcFunc['db_query']('', '
                        UPDATE {db_prefix}' . $cur_table . '
                        SET
                            ' . implode(',
                            ', $changes) . '
                        WHERE ' . implode(' AND ', $where),
                        $insertion_variables
                    );
            }
            $smcFunc['db_free_result']($request);
            $context['start'] += 500;

            // After ten seconds interrupt.
            if (time() - $context['start_time'] > 10)
            {
                // Calculate an approximation of the percentage done.
                $context['continue_percent'] = round(100 * ($context['table'] + ($context['start'] / $max_value)) / $context['num_tables'], 1);
                $context['continue_get_data'] = '?action=admin;area=maintain;sa=database;activity=convertentities;table=' . $context['table'] . ';start=' . $context['start'] . ';' . $context['session_var'] . '=' . $context['session_id'];
                return;
            }
        }
        $context['start'] = 0;
    }

    // Make sure all serialized strings are all right.
    require_once($sourcedir . '/Subs-Charset.php');
    fix_serialized_columns();

    // If we're here, we must be done.
    $context['continue_percent'] = 100;
    $context['continue_get_data'] = '?action=admin;area=maintain;sa=database;done=convertentities';
    $context['last_step'] = true;
    $context['continue_countdown'] = -1;
}

Thanks
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: GigaWatt on June 19, 2019, 04:21:45 AM
Just looked it up. The bytes \xF0\x9F\x98\xB3 encode an emoji character, which a text editor shows as blank space or an unknown symbol if its font can't display emojis. If you haven't manually edited SMF's files, a mod probably made those edits and added the extra characters. It's best to replace that whole section by copying it from a fresh set of files. Overwrite the code you have around line 950 with this code.

Code:
    // Update the row.
    if (!empty($changes))
        $smcFunc['db_query']('', '
            UPDATE {db_prefix}' . $cur_table . '
            SET
                ' . implode(',
                ', $changes) . '
            WHERE ' . implode(' AND ', $where),
            $insertion_variables
        );
}
$smcFunc['db_free_result']($request);
$context['start'] += 500;
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: pabloalcorta on June 19, 2019, 10:03:50 PM
I replaced it with that code, and the same error keeps coming up.
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: albertlast on June 26, 2019, 12:50:25 AM
You could change the column type to utf8mb4;
this should fix the issue.
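
As a rough sketch of what that change could look like, the helper below only builds the MySQL statement that converts one table to utf8mb4. The function name, the table name smf_messages, and the utf8mb4_unicode_ci collation are assumptions for illustration, not something taken from SMF itself. Back up the database before running anything like this, and note that the DB connection also has to use utf8mb4 for 4-byte characters to survive.

```php
<?php
// Hypothetical helper (not part of SMF): build the ALTER TABLE statement
// that converts a single table, including its text columns, to utf8mb4.
function utf8mb4_alter_statement(string $table): string
{
    // Quote the identifier and escape any backticks inside it.
    $quoted = '`' . str_replace('`', '``', $table) . '`';

    // The collation is an assumption; pick whichever utf8mb4 collation suits the forum.
    return 'ALTER TABLE ' . $quoted
        . ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci';
}

// "smf_messages" assumes the default smf_ table prefix.
echo utf8mb4_alter_statement('smf_messages'), ";\n";
```

The printed statement can be reviewed and then run from phpMyAdmin or the mysql client, once per table that stores user text.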
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: shawnb61 on August 07, 2019, 04:40:46 PM
albertlast is correct.  That byte sequence \xF0\x9F\x98\xB3 is a 4-byte emoji, which a MySQL UTF8 DB cannot store natively (MySQL's utf8 charset holds at most 3 bytes per character).

So you either convert your DB to UTF8MB4 or you find a way to convert those sequences to htmlentities prior to loading them. 
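
A minimal sketch of the second option, assuming the text is valid UTF-8: it finds every 4-byte sequence and replaces it with a numeric HTML entity, which a 3-byte utf8 column can store. The function name encode_four_byte_chars is hypothetical and is not an SMF function.

```php
<?php
// Hypothetical helper (not part of SMF): replace each 4-byte UTF-8
// sequence with a numeric HTML entity such as &#128563;.
function encode_four_byte_chars(string $text): string
{
    // A lead byte F0-F4 followed by three continuation bytes (80-BF).
    return preg_replace_callback(
        '~[\xF0-\xF4][\x80-\xBF]{3}~',
        function (array $m): string {
            $b = array_map('ord', str_split($m[0]));

            // Reassemble the 21-bit code point from the four bytes.
            $cp = (($b[0] & 0x07) << 18)
                | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6)
                | ($b[3] & 0x3F);

            return '&#' . $cp . ';';
        },
        $text
    );
}

// The sequence from the error message is U+1F633.
echo encode_four_byte_chars("Hola \xF0\x9F\x98\xB3"), "\n"; // Hola &#128563;
```

Characters of one to three bytes pass through unchanged, so running this on text that is already safe does nothing.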


I *JUST* ran across this last night & remembered this recent thread...
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: shawnb61 on August 11, 2019, 06:25:12 PM
For the record, I've been experimenting with doing EXACTLY what pabloalcorta initially reported.  I think this is a bug, and is easily reproducible.

Background:

HOWEVER...  If you now attempt to convert the html entities to UTF8 using the SMF function, bad things happen, depending on whether your DB is in STRICT mode or not:

SMF 2.0.x under some circumstances sets the sql_mode to '', i.e., NON-strict mode, so there is danger of data loss using the "Convert HTML-entities to UTF8 characters" function.

To fix: We should have the entity-to-UTF8 conversion either leave 4-byte entities alone, or disable it altogether.  OR, cut 2.0.x over to STRICT mode.  Since 2.1 already uses STRICT mode, data is safe there, but the entity conversion should be looked at as well, since it likely produces the same error noted above.
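
To illustrate the "leave 4-byte entities alone" option, here is a sketch of a replacement callback. It is not SMF's fixchar__callback; both function names are hypothetical, and only the regular expression is taken from the ManageMaintenance.php code quoted earlier in the thread. Entities above U+FFFF, which would need 4 bytes in UTF-8, are returned unchanged.

```php
<?php
// Hypothetical callback (an illustration, not SMF's fixchar__callback).
function convert_entity_bmp_only(array $m): string
{
    // $m[1] is either decimal digits or "x" followed by hex digits.
    $cp = $m[1][0] === 'x' ? hexdec(substr($m[1], 1)) : (int) $m[1];

    // Leave 4-byte characters, surrogates and control codes as entities.
    if ($cp > 0xFFFF || $cp < 0x20 || ($cp >= 0xD800 && $cp <= 0xDFFF))
        return $m[0];

    // Encode the code point as 1, 2 or 3 UTF-8 bytes.
    if ($cp < 0x80)
        return chr($cp);
    if ($cp < 0x800)
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical wrapper using the same pattern as the quoted SMF code.
function convert_entities_safely(string $text): string
{
    return preg_replace_callback(
        '~&#(\d{1,7}|x[0-9a-fA-F]{1,6});~',
        'convert_entity_bmp_only',
        $text
    );
}

echo convert_entities_safely('caf&#233; &#128563;'), "\n"; // café &#128563;
```

With this approach the emoji stays stored as an entity, so the UPDATE never sends a 4-byte sequence to a utf8 column.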

I think that 4-byte chars (emojis, some rarer CJK characters) are at the root of a lot of aborted UTF8 conversions for this very reason.
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: pabloalcorta on August 16, 2019, 08:06:13 PM
shawnb61, albertlast, thanks. I am reading up and will try next week to see if I can do it... With my nonexistent PHP knowledge, I will try to do what you say. Thank you.
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: shawnb61 on August 16, 2019, 08:32:32 PM
pabloalcorta -

You do not need to run that task.  If things look good, I would leave things as-is!
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: pabloalcorta on August 27, 2019, 11:21:12 AM
shawnb61, OK, thanks. I still haven't tried it yet... I'm trying to get organized to run a test.
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: shawnb61 on August 27, 2019, 02:56:12 PM
Ok!  Just to be clear: due to the defect you helped identify above, I do NOT suggest running "convert html entities to utf8 characters". 

Just convert to utf8 & leave the entities alone for now.
Title: Re: Error Convert HTML-entities to UTF-8 characters
Post by: pabloalcorta on August 27, 2019, 03:23:06 PM
ok thanks