Question marks instead of apostrophes.

Started by Antechinus, December 28, 2008, 02:24:31 AM

Antechinus

Here's a strange one: some members' posts are showing a question mark where there should be an apostrophe. This only happens to some members, not all of them.

It's beginning to make the place look rather messy, so a solution would be good. I'm running SMF 1.1.7 with TP 0.983, and no error messages are showing up. The forum language is English (i.e., not English UTF-8).

Acans

Stupid question: is this (!!!) an apostrophe?

ThorstenE

' is an apostrophe.

Quote from: antechinus on December 28, 2008, 02:24:31 AM
Forum language is English (ie: not English UTF-8)
Are your database tables/fields also Latin-1 (or whatever charset you're using)? A question mark is often shown when you insert a Latin-1/ANSI-encoded special character into a UTF-8 collated database field.

A short test: try posting these characters (German umlauts): öäü
If you see a question mark, I think your database table/field collation is UTF-8.
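The umlaut test can be reproduced offline. This Python sketch is only an illustration: it assumes the characters were written as ISO-8859-1 bytes and then read back as UTF-8, which is the mismatch described above. Each umlaut turns into U+FFFD, the Unicode replacement character, which browsers usually draw as a question mark on a black diamond.

```python
def read_latin1_bytes_as_utf8(text: str) -> str:
    """Encode text as ISO-8859-1, then (wrongly) decode those bytes as UTF-8."""
    raw = text.encode("iso-8859-1")               # "öäü" -> b'\xf6\xe4\xfc'
    return raw.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD


umlauts = "\u00f6\u00e4\u00fc"  # öäü, written as escapes
garbled = read_latin1_bytes_as_utf8(umlauts)
print(garbled == "\ufffd" * 3)  # True: one replacement character per umlaut
```

Plain ASCII text such as "it's" passes through this round trip unchanged, which is why only the special characters break.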

Antechinus

The database is in latin1_swedish_ci.

This is not a case of non-English characters being inserted into posts. I know all about that. This is different. This is a case of people posting in standard English but when they submit their post the apostrophes in it show as question marks. Also, as I said, this only happens to some members. For instance it does not happen to me when I submit posts.

ThorstenE

Maybe the members with this issue have their browser set to the wrong encoding (fixed to UTF-8)?

MrPhil

Another path to pursue: when people type apostrophes and quotation marks and they get changed to oddball or unprintable characters, it's often the result of using a Microsoft editor. The "editor" (such as Word) "helpfully" changes the quotation marks and apostrophes to the typographically correct glyphs. This is called "smart quotes". Unfortunately, the character set MS uses (Windows-1252) is a Microsoft-specific one, so these quotation marks and apostrophes don't show up correctly on a Web page: their byte values fall in a range that ISO-8859-1 reserves for control characters.
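To see why smart quotes end up as control characters, here is a small Python sketch (an illustration, not SMF code). It assumes the text was produced in Windows-1252 (Python's "cp1252" codec), encodes each Word-style smart quote, and then reads the resulting byte back as ISO-8859-1, the charset this forum serves.

```python
# Illustration only: assumes the pasted text was encoded as Windows-1252 (cp1252).
SMART_QUOTES = {
    "\u2018": "left single quote",
    "\u2019": "right single quote / apostrophe",
    "\u201c": "left double quote",
    "\u201d": "right double quote",
}


def cp1252_byte_as_latin1(ch: str) -> tuple[int, str]:
    """Encode ch in cp1252, then decode that single byte as ISO-8859-1."""
    raw = ch.encode("cp1252")        # e.g. U+2019 -> b'\x92'
    misread = raw.decode("latin-1")  # 0x92 -> U+0092, a C1 control character
    return raw[0], misread


for ch, name in SMART_QUOTES.items():
    byte, misread = cp1252_byte_as_latin1(ch)
    print(f"{name:32} cp1252 0x{byte:02X} -> latin-1 U+{ord(misread):04X}")
```

All four quotes land on bytes 0x91 to 0x94, inside the 0x80 to 0x9F block that ISO-8859-1 leaves to control characters, so a Latin-1 page has no printable glyph for them.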

The $64 question is: how is an MS editor getting involved here in an SMF post? Are these particular users on Windows machines and using Word (or some other MS word processor) to edit their posts and cutting and pasting into SMF? That's the only way I can think of that incorrect characters could be dragged in, although it's morning and I haven't had my second cup of coffee yet... anyway, that's another thing to look into.

Antechinus

That's a good point that I hadn't thought of. I'll poll the members affected by this and see what turns up. Thanks.

Antechinus

Ok, nobody is using an external program like Word or whatever and everyone has their browser on the default settings as downloaded. IOW, exactly the same as people who aren't generating this problem.

Any more ideas?

MrPhil

Hmm. Since it doesn't affect everyone, it can't be some kind of trash in the censored words list. Does it affect only certain kinds of browsers, or a wide range? Is there any code anywhere that replaces ' with an HTML entity, or anything like that? Maybe some browsers don't recognize the particular HTML entity being used. There's got to be some common pattern: browser make, version, PC language configuration, keyboard, user group, or something.

I've seen some people posting on these SMF forums whose apostrophes are always escaped (\'), and I've always wondered if they were doing that manually out of habit, or what. It does suggest that somewhere, apostrophes are being singled out for special treatment. When they're put into the database, they're supposed to be escaped (and then unescaped on the flip side) -- maybe something there isn't working, but then again, I'd think it would affect everyone. Is data handed off from TinyPortal to SMF and vice-versa? Maybe they're doing some kind of irreversible change in the process?
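A minimal sketch of how a visible backslash can survive. It assumes, purely hypothetically, that an apostrophe gets escaped twice but unescaped only once somewhere between the posting form and the page. escape() and unescape() are made-up helpers for illustration, not SMF or TinyPortal functions.

```python
import re


def escape(s: str) -> str:
    """Backslash-escape backslashes and single quotes, as a naive SQL escaper might."""
    return s.replace("\\", "\\\\").replace("'", "\\'")


def unescape(s: str) -> str:
    """Undo one level of escape(): drop the backslash before each escaped character."""
    return re.sub(r"\\(.)", r"\1", s, flags=re.DOTALL)


text = "it's"
once = escape(text)    # it\'s
twice = escape(once)   # it\\\'s  (escaped a second time by mistake)
print(unescape(once))   # it's
print(unescape(twice))  # it\'s   -> the stray backslash readers see
```

If every layer escapes and unescapes exactly once, the round trip is lossless; a single extra escape is enough to leave \' in the post.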

When you say "question mark", is this just a plain "?", a white "?" on a black diamond, or something else? Are any of these users tech-savvy enough to View > Page Source and see just what the browser is being asked to display -- an HTML entity, a plain "?", or a binary byte (or double byte)? Is SMF outputting UTF-8 for its page? Is TinyPortal putting out the same page encoding? If there were a mixed character set encoding, I'd think that everyone would experience the same problem.

Can you give the SMF community a link to a posting so we can see if this happens on our browsers, and report back what we're running on?

Antechinus

Quote from: MrPhil on January 01, 2009, 06:44:36 PM
Hmm. Since it doesn't affect everyone, it can't be some kind of trash in the censored words list. Does it affect only certain kinds of browsers, or a wide range? Is there any code anywhere that replaces ' with an HTML entity, or anything like that? Maybe some browsers don't recognize the particular HTML entity being used. There's got to be some common pattern: browser make, version, PC language configuration, keyboard, user group, or something.
We don't censor words so the censored words list is extremely short.  ;)  The only thing in the list is &nbsp and that is censored to &nbsp. This may seem crazy but it is a dodge to get around a bug in the AEVC mod and it works.

You're right that there has to be a common pattern but tracking it could be fun. I'll see what other questions I can think of to ask them.

Quote
Is data handed off from TinyPortal to SMF and vice-versa? Maybe they're doing some kind of irreversible change in the process?
I've no idea.

Quote
When you say "question mark", is this just a plain "?"
Yes

Quote
Are any of these users tech-savvy enough to View > Page Source and see just what the browser is being asked to display -- an HTML entity, a plain "?", or a binary byte (or double byte)?
No such luck. They have trouble finding the keyboard.


Quote
Can you give the SMF community a link to a posting so we can see if this happens on our browsers, and report back what we're running on?
That's probably the best thing to do. Here's a link.

MrPhil

I can see the (') replaced with (?) in the supplied link. The page source shows the ISO-8859-1 (Latin-1) character set, the header info looks valid (the W3C validator doesn't show any gross errors), and the character appears to be a normal (?) (0x3F).
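One plausible way a genuine 0x3F could end up in a post, sketched in Python: when text containing a character that has no ISO-8859-1 code point is converted to Latin-1 with substitution, the encoder writes a literal "?". This is an assumption about where the damage happens, not something the thread confirms; store_as_latin1() is a hypothetical helper.

```python
# Hypothetical helper: mimics a conversion that substitutes '?' for any
# character the target charset cannot represent.
def store_as_latin1(text: str) -> bytes:
    """Encode text as ISO-8859-1, replacing unrepresentable characters with '?'."""
    return text.encode("iso-8859-1", errors="replace")


assert ord("?") == 0x3F  # the plain question mark seen in the page source

print(store_as_latin1("it\u2019s"))  # b'it?s': U+2019 has no Latin-1 code point
print(store_as_latin1("it's"))       # b"it's": the ASCII apostrophe survives
```

Once the substitution has happened, the original character is gone from the stored bytes, which would match a plain "?" showing up in the page source.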

I presume that as the poster typed in the message, they saw (') rather than (?). Are you comfortable enough with phpMyAdmin (or some other database maintenance tool) to locate the message and see what bytes are in the database entry itself? Can you ask your users if they see this problem only with IE (which version?) or with both IE and FF? I think one of the posts said that they "hadn't changed anything" when the problem occurred. If they didn't do any updates to their OS or browser (including security patches from MS), did you (or your hosting service) change anything on the server side? New MySQL level? New PHP level? Migrated server (along with software changes or lost/damaged configuration files)?

By the way, is TinyPortal running? I see only SMF in the link. Is there any difference in going through TP versus straight to SMF? I'm wondering if TP is "helpfully" modifying (') in some way (perhaps by escaping it), so that it's mangled by the time it gets to SMF. Of course, if that's the case, why doesn't everyone see this problem? Could there be some sensitivity to the PC's OS settings, the browser used, or specific settings?

You mentioned people running with UTF-8 as the default browser encoding. I don't think that would affect anything: both the (') and the (?) are in the ASCII range (the bottom 128 code points) and shouldn't be affected by character encoding. That does bring up the question as to whether these people are on English-language keyboards or if they're on Arabic or other (non-Latin alphabet) keyboards... could they possibly be pressing a key that looks like an English apostrophe ('), but is actually some other character?
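The claim that (') and (?) are immune to the encoding choice is easy to check. This Python sketch encodes them under a few encodings (the list is chosen for illustration) and shows each comes out as the same single byte; a smart apostrophe (U+2019) is included for contrast, because it does not.

```python
from typing import Optional

ENCODINGS = ["ascii", "iso-8859-1", "cp1252", "utf-8"]


def bytes_per_encoding(ch: str) -> dict[str, Optional[bytes]]:
    """Return ch's bytes in each encoding, or None where ch cannot be encoded."""
    result: dict[str, Optional[bytes]] = {}
    for enc in ENCODINGS:
        try:
            result[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            result[enc] = None
    return result


for ch in ["'", "?", "\u2019"]:  # apostrophe, question mark, smart apostrophe
    print(repr(ch), bytes_per_encoding(ch))
```

The first two lines show one identical byte everywhere; the smart apostrophe is missing from ASCII and Latin-1 and has different bytes in cp1252 and UTF-8.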

If none of this pans out, you've got to find out what's in common. If it's just, say, IE7 browsers on systems with a non-English locale, perhaps MS has decided to "help" us with turning on "smart quotes" in the browser? If that were the case, however, I'd think that (") would show up in a funny way.

I think you'll be asking your users a lot more, very detailed, questions before you get to the bottom of this.

Antechinus

Quote from: MrPhil on January 02, 2009, 10:01:01 AM
I presume that as the poster typed in the message, they saw (') rather than (?).
Correct. At least that's what they tell me. 


Quote
Are you comfortable enough with phpMyAdmin (or some other database maintenance tool) to locate the message and see what bytes are in the database entry itself?
I can do it but I'll need instructions as my phpMyAdmin skills are fairly basic. Tell me which queries to run or whatever and I'll take a look.

Quote
Can you ask your users if they see this problem only with IE (which version?) or with both IE and FF?
It happens with both browsers.


Quote
I think one of the posts said that they "hadn't changed anything" when the problem occurred. If they didn't do any updates to their OS or browser (including security patches from MS), did you (or your hosting service) change anything on the server side? New MySQL level? New PHP level? Migrated server (along with software changes or lost/damaged configuration files)?
This could be the key. I was messing around with the language settings as described in this thread. I don't recall the problem occurring before this, but it's odd that it only affects certain members.


Quote
By the way, is TinyPortal running? I see only SMF in the link. Is there any difference in going through TP versus straight to SMF? I'm wondering if TP is "helpfully" modifying (') in some way (perhaps by escaping it), so that it's mangled by the time it gets to SMF. Of course, if that's the case, why doesn't everyone see this problem? Could there be some sensitivity to the PC's OS settings, the browser used, or specific settings?
TP is running but I have no idea about the rest. The forum is a 1.1.7/TP 0.983 combination with several mods, but as I said I don't recall this problem ever happening before the language settings were messed with. 


Quote
That does bring up the question as to whether these people are on English-language keyboards or if they're on Arabic or other (non-Latin alphabet) keyboards... could they possibly be pressing a key that looks like an English apostrophe ('), but is actually some other character?
The woman whose post I linked to is German and is presumably using a German keyboard. OTOH her English is very good indeed; in fact it's better than that of many people who have English as a first language.


Quote
If it's just, say, IE7 browsers on systems with a non-English locale
Unfortunately it isn't. ;)

ThorstenE

I'm also using a German keyboard ;)
Please copy these characters and post them to the affected forum:

Example:
bro`s (with grave accent)
bro´s (with acute accent)
bro's (with single quote -> should be used as the apostrophe)

I don't have an English keyboard here, so I can't compare the layouts, but I think she may be using an acute or grave accent (both are on the German keyboard layout) instead of a single quote.
http://de.wikipedia.org/wiki/Apostroph#Aufstellung_.C3.A4hnlicher_Zeichen
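To tell these look-alikes apart without relying on the font, this Python sketch prints the code point and official Unicode name of each test character (written as escapes so the source stays plain ASCII). describe() and apostrophe_lookalikes() are illustrative helpers, and the look-alike set is an assumption covering the accents above plus the smart single quotes.

```python
import unicodedata

# The three candidates from the test post, as escapes: bro`s, bro´s, bro's
CANDIDATES = ["\u0060", "\u00b4", "\u0027"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def apostrophe_lookalikes(text: str) -> list[str]:
    """List characters in text that are often typed in place of an apostrophe."""
    return [describe(ch) for ch in text if ch in "\u0060\u00b4\u2018\u2019"]


for ch in CANDIDATES:
    print(describe(ch))
# U+0060 GRAVE ACCENT
# U+00B4 ACUTE ACCENT
# U+0027 APOSTROPHE
```

Running apostrophe_lookalikes() over a pasted post lists any characters that look like an apostrophe but aren't one.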

Antechinus

Bingo. :D The acute accent does it. The test post is here.

ThorstenE

OK, I see two possible solutions:
1) Convert the forum to UTF-8 (then ALL characters, including the acute accent, should work).
2) Tell your users to use a single quote instead of the acute accent.

I personally prefer 1) because it's a real solution ;) but it's not my forum.
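Converting to UTF-8 means re-encoding the stored text, not just changing the charset label. A minimal Python sketch of what happens to one string, assuming the existing data really is ISO-8859-1; latin1_to_utf8() is a hypothetical helper, not part of SMF.

```python
def latin1_to_utf8(stored: bytes) -> bytes:
    """Re-encode bytes stored as ISO-8859-1 into UTF-8, keeping every character."""
    return stored.decode("iso-8859-1").encode("utf-8")


legacy = "bro\u00b4s".encode("iso-8859-1")  # b'bro\xb4s': the acute (akut) accent is one byte
converted = latin1_to_utf8(legacy)           # b'bro\xc2\xb4s': two bytes in UTF-8

# Relabeling without re-encoding leaves 0xB4 as an invalid UTF-8 byte.
relabeled = legacy.decode("utf-8", errors="replace")

print(converted)                  # b'bro\xc2\xb4s'
print(relabeled == "bro\ufffds")  # True: the accent becomes a replacement character
```

The relabeled case shows why a half-done conversion can produce exactly the kind of stray question marks seen in this thread.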

Antechinus

Funny you should mention that. I'm planning on doing the UTF-8 conversion at the same time as I convert the place to SMF 2. This is currently scheduled for shortly after the public release of RC1. Might as well do it all at once.

To be honest, I think SMF really should be UTF-8 by default. Other forum software is and it works just fine. The fact that SMF isn't is really a demonstration of the "Central America means Kansas" mindset IMO.  ;)

BloodWings

*little bump*


I am having the issue of question marks for apostrophes only when text is copied and pasted into a post.  It converts all the apostrophes into question marks. 

Also, after the conversion from *stabs* Xsorbit to SMF on a new host, the older posts have a number of odd A's sprinkled throughout them.

The MySQL connection collation setting in my phpMyAdmin is utf8_unicode_ci. I'm VERY new to all this; can someone tell me whether I should have one of the other options selected?



MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci

MrPhil

Quote from: BloodWings on January 10, 2009, 04:24:07 PM
I am having the issue of question marks for apostrophes only when text is copied and pasted into a post.  It converts all the apostrophes into question marks. 

What program are you writing the original text in before cutting and pasting it into SMF? If you're using Word (or a similar Microsoft word processor), you will have to find a way to turn off its "smart quotes" function. It turns your apostrophes into typographically correct glyphs, but unfortunately, MS used nonstandard code points for these characters.

heavyccasey

Quote from: TE on January 03, 2009, 07:42:22 AM
OK, I see two possible solutions:
1) Convert the forum to UTF-8 (then ALL characters, including the acute accent, should work).
2) Tell your users to use a single quote instead of the acute accent.

I personally prefer 1) because it's a real solution ;) but it's not my forum.
You could also use the swear (censored words) filter. I use it to replace MS "smart" quotes.
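A sketch of the kind of substitution such a filter entry performs, written in Python with str.translate. The replacement table is an assumption for illustration, not heavyccasey's actual censor list. Mapping the acute accent to an apostrophe would also rewrite any intentional use of ´, so that entry is a trade-off.

```python
# Hypothetical replacement table: one entry per problem character.
REPLACEMENTS = {
    "\u2018": "'",  # left single smart quote
    "\u2019": "'",  # right single smart quote / typographic apostrophe
    "\u201c": '"',  # left double smart quote
    "\u201d": '"',  # right double smart quote
    "\u00b4": "'",  # acute accent typed in place of an apostrophe
}

TABLE = str.maketrans(REPLACEMENTS)


def normalize_quotes(text: str) -> str:
    """Replace smart quotes and a stray acute accent with plain ASCII quotes."""
    return text.translate(TABLE)


print(normalize_quotes("\u201cbro\u00b4s\u201d"))  # "bro's"
```

Like a censor entry, this cleans text on its way through; converting the forum to UTF-8 instead keeps the original characters intact.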

BloodWings

Actually, it's happening when we copy/paste text from a blog post.
