Problem with PM and special characters

Started by SamiBH, April 28, 2012, 08:24:35 AM


SamiBH

Hello,

When I try to send a PM to a member with special characters (stuff like @copy;) in their name, it will not send (error: username not found).

info:
smf_settings: global_character_set  UTF-8
default language: Arabic UTF-8
Database: utf8_general_ci
Settings.php: does not contain the line $db_character_set = 'utf8';
page source: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="rtl"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />



SMF 1.1.16

If I add $db_character_set = 'utf8'; to Settings.php, posts don't display correctly. They look like "�?�?�??�?�".

Any help would be appreciated.


MrPhil

Quote from: SamiBH on April 28, 2012, 08:24:35 AM
When I try to send a PM to a member with special characters (stuff like @copy;) in their name, it will not send (error: username not found).
You mean they have "special" characters in their name? Interesting... the names work properly otherwise? How are you typing in these characters? Are you cutting and pasting or are you using HTML entities? I don't think that an entity like & copy; will work -- you need to cut and paste in (or use a "glass keyboard" such as "abcTajpu" for Firefox) the character itself. Be careful that whatever source you cut from is the same character encoding as your forum, so that the character is recognized.

Quote
Settings.php: does not contain the line $db_character_set = 'utf8';
page source: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="rtl"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Is your forum displaying in UTF-8 or some other encoding? Your page header above seems to indicate that you are in UTF-8. Is this from a forum page or from an HTML-format email?

Quote
If I add $db_character_set = 'utf8'; to Settings.php, posts don't display correctly. They look like "�?�?�??�?�".
Hmm. That indicates that the text it's trying to display is not in UTF-8.

SamiBH

Hello,

Thank you for the reply.

Quote from: MrPhil on April 30, 2012, 11:36:35 AM
You mean they have "special" characters in their name? Interesting... the names work properly otherwise? How are you typing in these characters? Are you cutting and pasting or are you using HTML entities? I don't think that an entity like & copy; will work -- you need to cut and paste in (or use a "glass keyboard" such as "abcTajpu" for Firefox) the character itself. Be careful that whatever source you cut from is the same character encoding as your forum, so that the character is recognized.

Yes, the characters are in the names, and the names work fine everywhere else. Only when I try to send a PM to one of these members do I get "user not found".

I don't copy the name; I just click "send PM to the user". Of course, I also tried typing the name and putting it inside quotation marks, but no luck.



Quote
Is your forum displaying in UTF-8 or some other encoding? Your page header above seems to indicate that you are in UTF-8. Is this from a forum page or from an HTML-format email?

From the forum.

Quote
Hmm. That indicates that the text it's trying to display is not in UTF-8.

I think it's UTF-8. As I said, MySQL is in UTF-8 and the forum header is in UTF-8 too.

....................

Is there another file besides Settings.php that determines which encoding the forum uses?


MrPhil

So you're not actually typing in the name with the funny symbols, but replying to a PM or otherwise letting the system fill in the name? It also happens when you try to type in the name, with a character such as a (tm) cut and pasted from your own forum?

I seem to recall seeing complaints long ago about PM having trouble with user names containing odd characters. Have you done a thorough search of this forum? I don't remember if it was happening in your circumstances, or only when trying to type in a name (or use the auto-complete). Maybe if you find those discussions there will be some information there that can point us in the right direction.

By the way, was your forum originally in another encoding (Latin-x) and you converted it over to UTF-8? Are any of these offending names dating from pre-UTF-8 times? If so, they probably have the wrong byte(s) for the special (non-ASCII) characters. I don't know if the conversion to UTF-8 takes care of them. Can you tell if the characters in question are a single byte or multiple bytes? You might have to cut and paste them into a file and examine it under a hex editor, or an editor that can switch between UTF-8 and Latin-1. You might even be able just to display their name in your forum and see what switching your browser to Latin-1 does (View > Character Encoding)... do the special characters change to a stream of odd accented characters? If so, they are in UTF-8. If not, they never got converted to UTF-8, and may still be Latin-1 or even CP-1252 (originally cut and pasted from Word).
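The byte check described above can also be done with a few lines of PHP instead of a hex editor. This is only a minimal sketch: it assumes the mbstring extension is available, and describe_bytes is a hypothetical helper, not an SMF function. It prints the raw bytes of a string and reports whether they form valid UTF-8.

```php
<?php
// Hypothetical helper (not part of SMF): show how a string is stored, byte by byte.
function describe_bytes(string $text): array
{
    return [
        'hex' => bin2hex($text),                           // raw bytes as hex
        'bytes' => strlen($text),                          // byte count, not character count
        'valid_utf8' => mb_check_encoding($text, 'UTF-8'), // true if the bytes are well-formed UTF-8
    ];
}

$utf8_sample = "\xC2\xA9";   // copyright sign encoded as UTF-8 (two bytes)
$latin1_sample = "\xA9";     // the same sign in Latin-1 or CP-1252 (one byte)

print_r(describe_bytes($utf8_sample));
print_r(describe_bytes($latin1_sample));
```

To test a real name, paste it in place of the sample strings. A special character left over from Latin-1 or CP-1252 typically shows up as a single byte of 0x80 or above and fails the UTF-8 check.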

In your original post, you showed some text that was a string of ? and invalid character markers. It is not in UTF-8 and you are trying to display it in UTF-8. Is this text Arabic? Where did it come from -- language support file or a post?

SamiBH

Quote from: MrPhil on May 01, 2012, 10:10:27 AM
So you're not actually typing in the name with the funny symbols, but replying to a PM or otherwise letting the system fill in the name?

yes

Quote from: MrPhil on May 01, 2012, 10:10:27 AM
It also happens when you try to type in the name, with a character such as a (tm) cut and pasted from your own forum?

yes

Quote from: MrPhil on May 01, 2012, 10:10:27 AM
I seem to recall seeing complaints long ago about PM having trouble with user names containing odd characters. Have you done a thorough search of this forum? I don't remember if it was happening in your circumstances, or only when trying to type in a name (or use the auto-complete). Maybe if you find those discussions there will be some information there that can point us in the right direction.

Thank you, I did that already, but nothing helped.

Quote from: MrPhil on May 01, 2012, 10:10:27 AM
By the way, was your forum originally in another encoding (Latin-x) and you converted it over to UTF-8?

Yes, it was.

Quote from: MrPhil on May 01, 2012, 10:10:27 AM

Are any of these offending names dating from pre-UTF-8 times? If so, they probably have the wrong byte(s) for the special (non-ASCII) characters. I don't know if the conversion to UTF-8 takes care of them. Can you tell if the characters in question are a single byte or multiple bytes? You might have to cut and paste them into a file and examine it under a hex editor, or an editor that can switch between UTF-8 and Latin-1. You might even be able just to display their name in your forum and see what switching your browser to Latin-1 does (View > Character Encoding)... do the special characters change to a stream of odd accented characters? If so, they are in UTF-8. If not, they never got converted to UTF-8, and may still be Latin-1 or even CP-1252 (originally cut and pasted from Word).

These users registered after the conversion to UTF-8.


Quote from: MrPhil on May 01, 2012, 10:10:27 AM
In your original post, you showed some text that was a string of ? and invalid character markers. It is not in UTF-8 and you are trying to display it in UTF-8. Is this text Arabic? Where did it come from -- language support file or a post?

It's Arabic, from a post in the forum.

Arabic shows fine in the forum. Only after adding $db_character_set = 'utf8'; to Settings.php does Arabic turn into this: "�?�?�??�?�"


MrPhil

Without the $db_character_set = 'utf8'; line, Arabic text displays OK (both language support files and posts)? You've gone to your browser's View > Character Encoding and confirmed that the page is in UTF-8 (also, View > Page Source and see <meta> "charset=UTF-8" tag)? When you add the $db_character_set = 'utf8'; line, none of the Arabic posts display now? What do View > Character Encoding and the 'charset' meta tag show? To get the ?-in-diamond "invalid character" glyph, it must be in UTF-8 now but the text is still another encoding, which seems to imply that it may not have been UTF-8 you were looking at earlier. These are recent posts entered after the (supposed) conversion to UTF-8? Does phpMyAdmin show the database to be in UTF-8 ("collation" includes "utf8", not "latin1")?

Note that I've seen servers misconfigured to force Latin-1 encoding, despite having a "charset=UTF-8" meta tag. However, if your Arabic text shows up OK, I would suspect that the server is not forcing Latin-1.

SamiBH

OK, I found the problem.

I use a custom file to register users on my forum. If the username has special characters, the problem happens.

The reason: SMF applies "strtolower" when it registers a username, but my file doesn't.

With my file:
ĎӨӨレレα尺 >>becomes>> ĎӨӨレレα尺

With SMF:
ĎӨӨレレα尺 >>becomes>> ďөөレレα尺

So every time I try to send a PM to "ĎӨӨレレα尺", SMF turns the name into "ďөөレレα尺" and, of course, doesn't find the user.
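The mismatch can be reproduced outside SMF. This is a rough sketch, not SMF code: mb_strtolower (from the mbstring extension) stands in for $func['strtolower'], lower_name is a hypothetical helper, and it assumes the lookup ends up comparing the non-ASCII characters exactly, as the behaviour in this thread suggests.

```php
<?php
// Stand-in for SMF's $func['strtolower']; assumes the mbstring extension is loaded.
function lower_name(string $name): string
{
    return mb_strtolower($name, 'UTF-8');
}

$stored = 'ĎӨӨレレα尺';            // saved unchanged by the custom registration file
$lookup = lower_name($stored);    // what the PM code ends up searching for

// Exact comparison fails because only the lookup value was lowercased.
var_dump($stored === $lookup);

// Lowercasing both sides makes the two forms comparable again.
var_dump(lower_name($stored) === lower_name($lookup));
```

This also suggests an alternative to editing SMF: have the custom registration file lowercase names the same way SMF does, so that stored names and lookups agree.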

To fix this (if you have the same problem), make these changes:

Subs-Auth.php---
from:
// Add slashes, trim, and fix wildcards for each name.
$names[$i] = addslashes(trim($func['strtolower']($name)));

to:
// Add slashes, trim, and fix wildcards for each name.
$names[$i] = addslashes(trim($name));

.................................................
PersonalMessage.php--
from:
$input[$rec_type][$index] = $func['htmlspecialchars']($func['strtolower'](stripslashes(trim($member))));
to:
$input[$rec_type][$index] = $func['htmlspecialchars'](stripslashes(trim($member)));

from:
if (array_intersect(array($func['strtolower']($member['username']), $func['strtolower']($member['name']), $func['strtolower']($member['email'])), $to_members))
to:
if (array_intersect(array($member['username'], $func['strtolower']($member['name']), $func['strtolower']($member['email'])), $to_members))


from:
$input[$rec_type] = array_diff($input[$rec_type], array($func['strtolower']($member['username']), $func['strtolower']($member['name']), $func['strtolower']($member['email'])));

to:
$input[$rec_type] = array_diff($input[$rec_type], array($member['username'], $func['strtolower']($member['name']), $func['strtolower']($member['email'])));
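One side effect is worth checking before applying these edits. The recipient text is no longer lowercased, but the display name and email on the member side still are, so a recipient typed by display name with capital letters may stop matching. The sketch below only illustrates that comparison: mb_strtolower stands in for $func['strtolower'], the member row is made up, and $to_members is reduced to a one-element array.

```php
<?php
// Stand-in for SMF's $func['strtolower']; assumes the mbstring extension is loaded.
function lower(string $s): string
{
    return mb_strtolower($s, 'UTF-8');
}

// Hypothetical member row, for illustration only.
$member = [
    'username' => 'ĎӨӨレレα尺',
    'name' => 'Sami BH',
    'email' => 'Sami@Example.com',
];

// Candidate list as the edited line builds it: username as stored, name and email lowercased.
$candidates = [$member['username'], lower($member['name']), lower($member['email'])];

// Recipient entered as the username, exactly as stored: still matches.
$by_username = array_intersect($candidates, ['ĎӨӨレレα尺']);

// Recipient entered as the display name with capitals: no longer matches.
$by_name = array_intersect($candidates, ['Sami BH']);

var_dump(count($by_username) > 0);
var_dump(count($by_name) > 0);
```

If that matters on your forum, applying the same case handling to both sides of the comparison avoids the asymmetry.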


Thank you, MrPhil, and sorry if I didn't make my problem clear :)
