Simple Machines Community Forum

SMF Development => Bug Reports => Fixed or Bogus Bugs => Topic started by: Özgür on April 01, 2009, 02:25:55 PM

Title: [3738] shorten_subject issue
Post by: Özgür on April 01, 2009, 02:25:55 PM
',shorten_subject($something['some'], 20),'

This code should output this:
<a href="">arkadaslar özel kılıç...</a><br />
But this is what it actually outputs:
<a href="">arkadaslar özel kılı�...</a><br />

The � character is not valid UTF-8, so I get an XHTML validation error.
If I increase 20 to 21 it works fine and the "ç" is displayed, but if the twenty-first character is also a "ç" the issue appears again.

',shorten_subject($something['some'], 21),'

I have not seen any bug like this before. I changed my language files back to the original ones, but it still does not work.

I can't validate my page.
System: 2.0 RC1
Please look at the attachment.
Title: Re: shorten_subject issue
Post by: karlbenson on April 01, 2009, 02:35:26 PM
moving to bug reports.
Title: Re: shorten_subject issue
Post by: Özgür on April 01, 2009, 02:45:03 PM
I was not sure whether this is a bug, so I opened the topic in 2.0 Help. Sorry and thanks. =)
Title: Re: shorten_subject issue
Post by: SleePy on April 05, 2009, 01:44:50 PM
Are you using UTF-8?
Title: Re: shorten_subject issue
Post by: Özgür on April 05, 2009, 01:46:37 PM
Quote from: SleePy on April 05, 2009, 01:44:50 PM
Are you using UTF-8?
Yes.
Title: Re: shorten_subject issue
Post by: Özgür on April 06, 2009, 06:52:52 PM
I remember now.
I tried "Convert to UTF-8" in the admin panel a long time ago (I guess when RC1 was released), and the converter had a bug. Maybe this is the problem.

Has the converter bug been fixed?
Title: Re: shorten_subject issue
Post by: SleePy on April 26, 2009, 09:55:34 AM
It shouldn't be an issue with new posts, only old posts need the conversion.

We can try something.
Run this query in phpMyAdmin (What is phpMyAdmin? (http://www.simplemachines.org/community/index.php?topic=21919.0))

REPLACE INTO smf_settings VALUES ('disableEntityCheck', 1);


See if this works as it should now. shorten_subject appears to be using $smcFunc variables, so it should be UTF-8 safe. If this doesn't work, we will reverse that change.
Title: Re: shorten_subject issue
Post by: Özgür on April 26, 2009, 10:27:29 AM
I ran this query in phpMyAdmin and nothing changed, so I set the value back to "0".
How should UTF-8 encoded values look in the database?

My messages look like this in phpMyAdmin: Ynt: Gitar Alıcam Yardım Lütfen
Title: Re: shorten_subject issue
Post by: Özgür on April 29, 2009, 07:43:43 PM
Plus, I have this error in my error_log file.
Please help me with this.
Title: Re: shorten_subject issue
Post by: SleePy on April 30, 2009, 10:33:02 PM
Özgür,

That first error tells me your SMF is missing some core checks.
I would suggest reuploading all files from a large upgrade package (without the actual upgrade files). This will remove any modifications and custom themes.

Before you reapply your mods and themes, test to see if new posts still have this issue. If you are still having the issue, then you can reapply all the mods. Otherwise reapply them one by one, testing to see which mod breaks this.
Title: Re: shorten_subject issue
Post by: Özgür on June 19, 2009, 06:48:32 AM
This topic was moved, but the problem is not fixed yet. I changed my shorten_subject function:
// Shorten a subject + internationalization concerns.
function shorten_subject($subject, $len)
{
	global $smcFunc;

	// It was already short enough!
	if ($smcFunc['strlen']($subject) <= $len)
		return $subject;

	// Shorten it by the length it was too long, and strip off junk from the end.
	return mb_substr($subject, 0, $len, 'UTF-8') . '...';
}

But why does the SMF function not work correctly?
Title: Re: shorten_subject issue
Post by: karlbenson on June 19, 2009, 07:26:30 AM
There I moved it back.

Have you tried what Sleepy suggested above?
Title: Re: shorten_subject issue
Post by: Özgür on June 19, 2009, 07:34:18 AM
Yes, I actually tried hard.

I removed all files (Sources files, Theme files, etc., but not Settings.php) and uploaded clean SMF 2.0 files. I also uploaded the upgrade files (the upgrade SQL files and upgrade.php) and ran the upgrade again. The issue is not solved. If you need anything (SSH, FTP, phpMyAdmin or whatever), I will give you access.
Title: Re: shorten_subject issue
Post by: Özgür on June 27, 2009, 06:08:27 PM
The solution is a different function, because the problem is the "substr" function on the PHP side.

<?php
$ifade = 'Ne zaman seni düşünsem';
echo substr($ifade, 0, 20);
?>

Gives us => Ne zaman seni düş����
But with mb_substr:
<?php
$ifade = 'Ne zaman seni düşünsem';
echo mb_substr($ifade, 0, 20, 'UTF-8');
?>

Gives us => Ne zaman seni düşüns
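
The difference between the two calls comes down to bytes versus characters. The following is only a minimal sketch of that, assuming the script file is saved as UTF-8 and the mbstring extension is loaded; the variable names are made up for illustration.

```php
<?php
// Sample string from the post above (assumes this file is saved as UTF-8).
$text = 'Ne zaman seni düşünsem';

// strlen() counts bytes, mb_strlen() counts characters.
// "ü" and "ş" each take two bytes in UTF-8, so the two numbers differ.
$bytes = strlen($text);
$chars = mb_strlen($text, 'UTF-8');

// Byte-based cut: position 20 falls in the middle of the second "ü".
$byteCut = substr($text, 0, 20);

// Character-based cut: always ends on a character boundary.
$charCut = mb_substr($text, 0, 20, 'UTF-8');

echo $bytes, ' bytes, ', $chars, " characters\n";
echo 'substr result is valid UTF-8: ', var_export(mb_check_encoding($byteCut, 'UTF-8'), true), "\n";
echo 'mb_substr result is valid UTF-8: ', var_export(mb_check_encoding($charCut, 'UTF-8'), true), "\n";
```

A string that ends in half a character is not valid UTF-8, which would also explain why a page containing it fails XHTML validation.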
Title: Multi-byte safe substr()
Post by: k14 on July 18, 2009, 04:32:12 PM
Version(s) of SMF: SMF 2.0 RC1-1
Any non-English Language packs I have installed: polish-utf8
Am I using UTF-8? YES

Where the Error Occurred:
File: /Themes/default/Calendar.template.php
Line: 307
Is:
<td class="titlebg2" width="14%" align="center" ', $calendar_data['size'] == 'small' ? 'style="font-size: x-small;"' : '', '>', !empty($calendar_data['short_day_titles']) ? substr($txt['days'][$day], 0, 1) : $txt['days'][$day], '</td>';
Should be:
<td class="titlebg2" width="14%" align="center" ', $calendar_data['size'] == 'small' ? 'style="font-size: x-small;"' : '', '>', !empty($calendar_data['short_day_titles']) ? mb_substr($txt['days'][$day], 0, 1, $context['character_set']) : $txt['days'][$day], '</td>';

Additional Information:
In Polish, "Wednesday" is "Środa". Using short day names in the calendar with substr() truncates "Środa" to one byte only, not to one character. This results in displaying the first byte of a 2-byte UTF-8 sequence instead of "Ś".
I suggest changing all occurrences of substr() to mb_substr() in all files.
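
If the mbstring extension cannot be relied on, the first character can also be taken with a regular expression. This is only a sketch: first_utf8_char() is a hypothetical helper, not an SMF function, and it assumes the input is UTF-8.

```php
<?php
// Hypothetical helper (not part of SMF): return the first character of a
// UTF-8 string without needing the mbstring extension.
function first_utf8_char($text)
{
	// With the "u" modifier, "." matches one whole UTF-8 character.
	// The "s" modifier lets it match a newline as well.
	if (preg_match('/^./us', $text, $match) === 1)
		return $match[0];

	// Empty string or invalid UTF-8: nothing sensible to return.
	return '';
}

echo first_utf8_char('Środa'), "\n";          // the whole two-byte character
echo strlen(substr('Środa', 0, 1)), "\n";     // substr() keeps one byte only
```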
Title: Re: Multi-byte safe substr()
Post by: Özgür on July 19, 2009, 07:21:57 AM
Related shorten_subject issue (http://www.simplemachines.org/community/index.php?topic=302468.0)
Title: Re: shorten_subject issue
Post by: karlbenson on September 07, 2009, 03:08:54 PM
Finally got around to tracking this bug report inline with a dev team discussion topic.
http://dev.simplemachines.org/mantis/view.php?id=3738
Title: Re: [3738] shorten_subject issue
Post by: Özgür on September 08, 2009, 08:27:05 AM
I am glad to hear that. :)
Title: Re: [3738] shorten_subject issue
Post by: Elmacik on September 08, 2009, 08:43:59 AM
A good catch. Indeed, using the standard mb functions everywhere won't solve the issue: PHP's standard multibyte functions don't seem to work with converted entities. For example, if you use mb_substr() instead of substr(), it will help a lot for UTF-8, but it will break entities, because it counts every character of an entity separately. SMF will have to work around that, and I don't think it will be hard.

For example, mb_substr('some text &copy; some other text', 0, 12); normally returns "some text &c". But that displays incorrectly, so it should return "some text &copy; " instead, which will display correctly. (Just like the difference in behaviour between normal sorting and natural sorting.) This is not only about special entities: some UTF-8 characters are turned into HTML entities too, for example some letters of the Turkish alphabet, as Daydreamer pointed out.
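
One way such a workaround could look is sketched below. It is not SMF's actual implementation: entity_safe_substr() is a hypothetical name, the input is assumed to be UTF-8, and only entities of the forms &name;, &#123; and &#x7B; are recognised.

```php
<?php
// Hypothetical helper (not SMF's code): keep the first $len units of a
// UTF-8 string, counting each complete HTML entity as a single unit.
function entity_safe_substr($text, $len)
{
	// A unit is either a whole entity or one UTF-8 character.
	$pattern = '/&(?:#\d+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);|./us';

	// preg_match_all() returns 0 for an empty string and false on invalid UTF-8.
	if (!preg_match_all($pattern, $text, $matches))
		return '';

	// Keep the first $len units and glue them back together.
	return implode('', array_slice($matches[0], 0, $len));
}

echo entity_safe_substr('some text &copy; some other text', 12), "\n";
```

With the sample string from the post, the twelve units are the ten characters of "some text ", the entity &copy; and the following space, so the entity is never cut in half.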
Title: Re: [3738] shorten_subject issue
Post by: †MavN† on October 09, 2009, 03:05:44 AM

Still have issues in RC2 with utf-8.
For example, calendar's short day titles

<th class="titlebg2 days" scope="col" ', $calendar_data['size'] == 'small' ? 'style="font-size: x-small;"' : '', '>', !empty($calendar_data['short_day_titles']) ? substr($txt['days'][$day], 0, 1) : $txt['days'][$day], '</th>';

Should be replaced with

<th class="titlebg2 days" scope="col" ', $calendar_data['size'] == 'small' ? 'style="font-size: x-small;"' : '', '>', !empty($calendar_data['short_day_titles']) ? $smcFunc['substr']($txt['days'][$day], 0, 1) : $txt['days'][$day], '</th>';
Title: Re: [3738] shorten_subject issue
Post by: karlbenson on October 09, 2009, 01:31:24 PM
Mavn, we have not yet applied the fix because we want to make sure we do it safely throughout.
Rather than fix it for one or two areas, we want to make sure we fix it for all of them.

However there are some downsides (which is why we haven't just replaced all substr and strlen)
Title: Re: [3738] shorten_subject issue
Post by: Özgür on January 01, 2010, 07:13:13 PM
http://www.nabruk.com/forum/index.php RC2 still has this problem.
Title: Re: [3738] shorten_subject issue
Post by: karlbenson on January 02, 2010, 09:10:47 AM
Indeed DayDreamer
Title: Re: [3738] shorten_subject issue
Post by: digger on March 14, 2010, 07:50:12 PM
RC3 still has this problem.
Title: Re: [3738] shorten_subject issue
Post by: Arantor on March 14, 2010, 08:25:57 PM
Yup because the fix is still being debated (I was involved in the debate)