Author Topic: Question marks instead of apostrophes.  (Read 38177 times)

Offline Antechinus

  • SMF Friend
  • SMF Super Hero
  • *
  • Posts: 24,327
  • Master of BBC Abuse
Question marks instead of apostrophes.
« on: December 28, 2008, 02:24:31 AM »
Here's a strange one: some members' posts are showing a question mark where they should show an apostrophe. This only happens to some members, not all members.

It's beginning to make the place rather messy so a solution would be good.  Running 1.1.7 with TP 0.983 and there are no error messages showing up. Forum language is English (ie: not English UTF-8)

Offline ѕησω

  • SMF Friend
  • SMF Hero
  • *
  • Posts: 3,432
  • Gender: Male
  • Nisi credideritis, non intelligetis.
Re: Question marks instead of apostrophes.
« Reply #1 on: December 28, 2008, 03:08:12 AM »
Stupid question: is this (!!!) an apostrophe?
"The Book of Arantor, 17:3-5
  And I said unto him, thy database query shalt always be sent by the messenger of $smcFunc
  And $smcFunc shall protect you against injections and evil
  And so it came to pass that mysql_query was declared deprecated and even though he says he is not dead yet, the time was soon to come to pass when mysql_query shall be gone and no more."

ThorstenE

  • Guest
Re: Question marks instead of apostrophes.
« Reply #2 on: December 28, 2008, 08:16:08 AM »
’ is an apostrophe.

Quote
Forum language is English (ie: not English UTF-8)
Are the database tables / fields also latin, or which charset do they use? A question mark is often shown when you insert a latin/ANSI-encoded special character into a UTF-8 collated database field.

a short test: try it with these characters (german umlauts) öäü
when you see a question mark your database table / field collation is UTF-8 I think.
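
A minimal Python sketch of the mismatch described above, not a reproduction of what SMF or MySQL does internally. The byte that Latin-1 uses for ö is not valid UTF-8 on its own, and a character outside Latin-1 cannot be stored in a Latin-1 field at all. Python's `errors="replace"` stands in for whatever substitution the real software performs; that is an assumption made for illustration.

```python
# Latin-1 stores each umlaut as a single byte.
latin1_bytes = "ö".encode("latin-1")  # b'\xf6'

# Read as UTF-8, that lone byte is invalid; the decoder substitutes
# U+FFFD, which browsers usually draw as a white "?" on a black diamond.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Going the other way, a typographic apostrophe (U+2019) has no Latin-1
# code point, so the encoder substitutes a plain ASCII "?" (0x3F).
squeezed = "bro\u2019s".encode("latin-1", errors="replace")

print(latin1_bytes, repr(misread), squeezed)
```

The two directions leave different traces: a plain "?" suggests a character was forced into a smaller character set, while the diamond suggests bytes were decoded with the wrong charset.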

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #3 on: December 30, 2008, 01:58:55 AM »
Database is in latin1_swedish_ci

This is not a case of non-English characters being inserted into posts. I know all about that. This is different. This is a case of people posting in standard English but when they submit their post the apostrophes in it show as question marks. Also, as I said, this only happens to some members. For instance it does not happen to me when I submit posts.

ThorstenE

  • Guest
Re: Question marks instead of apostrophes.
« Reply #4 on: December 30, 2008, 02:32:53 AM »
maybe the members with this issue use a wrong (fixed UTF-8) encoding in their browser?

MrPhil

  • Guest
Re: Question marks instead of apostrophes.
« Reply #5 on: December 30, 2008, 08:50:54 AM »
Another path to pursue: when people type in apostrophes and quotation marks, and they're changed to oddball/unprintable characters, it's the result of using a Microsoft editor. The "editor" (such as Word) "helpfully" changes the quotation marks and apostrophes to the typographically correct glyphs. This is called "smart quotes". Unfortunately, the character set that MS uses is not a standard one, and thus quotation marks and apostrophes don't show up correctly on a Web page. They're in an area reserved for control characters.

The $64 question is: how is an MS editor getting involved here in an SMF post? Are these particular users on Windows machines and using Word (or some other MS word processor) to edit their posts and cutting and pasting into SMF? That's the only way I can think of that incorrect characters could be dragged in, although it's morning and I haven't had my second cup of coffee yet... anyway, that's another thing to look into.
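
To make the "smart quotes" point concrete, here is a small Python sketch. It assumes the Microsoft character set in question is Windows-1252 (`cp1252` in Python), which the thread does not name.

```python
import unicodedata

# Word's "smart" apostrophe is U+2019 (RIGHT SINGLE QUOTATION MARK).
smart = "\u2019"

# Windows-1252 puts it at byte 0x92.
cp1252_byte = smart.encode("cp1252")

# ISO-8859-1 (Latin-1) reserves 0x80-0x9F for C1 control characters,
# so the same byte read as Latin-1 is an unprintable control code.
as_latin1 = cp1252_byte.decode("latin-1")

print(cp1252_byte, hex(ord(as_latin1)), unicodedata.category(as_latin1))
```

The category printed for the misread character is `Cc` (control), which matches the "area reserved for control characters" remark.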

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #6 on: December 30, 2008, 07:16:12 PM »
That's a good point that I hadn't thought of. I'll poll the members affected by this and see what turns up. Thanks.

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #7 on: January 01, 2009, 04:43:27 PM »
Ok, nobody is using an external program like Word or whatever and everyone has their browser on the default settings as downloaded. IOW, exactly the same as people who aren't generating this problem.

Any more ideas?

MrPhil

  • Guest
Re: Question marks instead of apostrophes.
« Reply #8 on: January 01, 2009, 06:44:36 PM »
Hmm. Since it doesn't affect everyone, it can't be some kind of trash in the censored words list. Does it affect only certain kinds of browsers, or a wide range? Is there any code anywhere to replace ' by its HTML entity or anything like that? Maybe some browsers don't recognize the particular HTML entity being used. There's got to be some common pattern by browser make, version, PC language configuration, keyboard, user group, or something.

I've seen some people posting on these SMF forums whose apostrophes are always escaped (\'), and I've always wondered if they were doing that manually out of habit, or what. It does suggest that somewhere, apostrophes are being singled out for special treatment. When they're put into the database, they're supposed to be escaped (and then unescaped on the flip side) -- maybe something there isn't working, but then again, I'd think it would affect everyone. Is data handed off from TinyPortal to SMF and vice-versa? Maybe they're doing some kind of irreversible change in the process?

When you say "question mark", is this just a plain "?", a white "?" on a black diamond, or something else? Are any of these users tech-savvy enough to View > Page Source and see just what the browser is being asked to display -- an HTML entity, a plain "?", or a binary byte (or double byte)? Is SMF outputting UTF-8 for its page? Is TinyPortal putting out the same page encoding? If there were a mixed character set encoding, I'd think that everyone would experience the same problem.

Can you give the SMF community a link to a posting so we can see if this happens on our browsers, and report back what we're running on?

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #9 on: January 02, 2009, 01:59:14 AM »
Quote
Hmm. Since it doesn't affect everyone, it can't be some kind of trash in the censored words list. Does it affect only certain kinds of browsers, or a wide range? Is there any code anywhere to replace ' by its HTML entity or anything like that? Maybe some browsers don't recognize the particular HTML entity being used. There's got to be some common pattern by browser make, version, PC language configuration, keyboard, user group, or something.
We don't censor words so the censored words list is extremely short.  ;)  The only thing in the list is &nbsp and that is censored to &nbsp. This may seem crazy but it is a dodge to get around a bug in the AEVC mod and it works.

You're right that there has to be a common pattern but tracking it could be fun. I'll see what other questions I can think of to ask them.

Quote
Is data handed off from TinyPortal to SMF and vice-versa? Maybe they're doing some kind of irreversible change in the process?
I've no idea.

Quote
When you say "question mark", is this just a plain "?"
Yes

Quote
Are any of these users tech-savvy enough to View > Page Source and see just what the browser is being asked to display -- an HTML entity, a plain "?", or a binary byte (or double byte)?
No such luck. They have trouble finding the keyboard.


Quote
Can you give the SMF community a link to a posting so we can see if this happens on our browsers, and report back what we're running on?
That's probably the best thing to do. Here's a link.

MrPhil

  • Guest
Re: Question marks instead of apostrophes.
« Reply #10 on: January 02, 2009, 10:01:01 AM »
I can see the (') replaced with (?) in the supplied link. The page source shows ISO-8859-1 (Latin-1) character set, the header info looks valid (the W3C validator doesn't show any gross errors), and the character appears to be a normal (?) (x3F).

I presume that as the poster typed in the message, they saw (') rather than (?). Are you comfortable enough with phpMyAdmin (or some other database maintenance tool) to locate the message and see what bytes are in the database entry itself? Can you ask your users if they see this problem only with IE (which version?) or with both IE and FF? I think one of the posts said that they "hadn't changed anything" when the problem occurred. If they didn't do any updates to their OS or browser (including security patches from MS), did you (or your hosting service) change anything on the server side? New MySQL level? New PHP level? Migrated server (along with software changes or lost/damaged configuration files)?

By the way, is TinyPortal running? I see only SMF in the link. Is there any difference in going through TP versus straight to SMF? I'm wondering if TP is "helpfully" modifying (') in some way (perhaps by escaping it), so that it's mangled by the time it gets to SMF. Of course, if that's the case, why doesn't everyone see this problem? Could there be some sensitivity to the PC's OS settings, the browser used, or specific settings?

You mentioned people running with UTF-8 as the default browser coding. I don't think that would affect anything... both the (') and the (?) are in the ASCII range (bottom 128), and shouldn't be affected by character encoding. That does bring up the question as to whether these people are on English-language keyboards or if they're on Arabic or other (non-Latin alphabet) keyboards... could they possibly be pressing a key that looks like an English apostrophe ('), but is actually some other character?

If none of this pans out, you've got to find out what's in common. If it's just, say, IE7 browsers on systems with a non-English locale, perhaps MS has decided to "help" us with turning on "smart quotes" in the browser? If that were the case, however, I'd think that (") would show up in a funny way.

I think you'll be asking your users a lot more, very detailed, questions before you get to the bottom of this.

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #11 on: January 02, 2009, 04:40:00 PM »
Quote
I presume that as the poster typed in the message, they saw (') rather than (?).
Correct. At least that's what they tell me. 


Quote
Are you comfortable enough with phpMyAdmin (or some other database maintenance tool) to locate the message and see what bytes are in the database entry itself?
I can do it but I'll need instructions as my phpMyAdmin skills are fairly basic. Tell me which queries to run or whatever and I'll take a look.

Quote
Can you ask your users if they see this problem only with IE (which version?) or with both IE and FF?
It happens with both browsers.


Quote
I think one of the posts said that they "hadn't changed anything" when the problem occurred. If they didn't do any updates to their OS or browser (including security patches from MS), did you (or your hosting service) change anything on the server side? New MySQL level? New PHP level? Migrated server (along with software changes or lost/damaged configuration files)?
This could be the key. I was messing around with the language settings as described in this thread. I don't recall the problem occurring before this, but it's odd that it only affects certain members.


Quote
By the way, is TinyPortal running? I see only SMF in the link. Is there any difference in going through TP versus straight to SMF? I'm wondering if TP is "helpfully" modifying (') in some way (perhaps by escaping it), so that it's mangled by the time it gets to SMF. Of course, if that's the case, why doesn't everyone see this problem? Could there be some sensitivity to the PC's OS settings, the browser used, or specific settings?
TP is running but I have no idea about the rest. The forum is a 1.1.7/TP 0.983 combination with several mods, but as I said I don't recall this problem ever happening before the language settings were messed with. 


Quote
That does bring up the question as to whether these people are on English-language keyboards or if they're on Arabic or other (non-Latin alphabet) keyboards... could they possibly be pressing a key that looks like an English apostrophe ('), but is actually some other character?
The woman whose post I linked to is German and is presumably using a German keyboard. OTOH her English is very good indeed. In fact it's better than many people who have English as a first language.


Quote
If it's just, say, IE7 browsers on systems with a non-English locale
Unfortunately it isn't. ;)
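
For the database check discussed above, one option is to have MySQL return the stored message as hexadecimal with its `HEX()` function and then decode the result by eye or with a few lines of Python. The sketch below only covers the decoding step. The helper name `describe_hex` and the sample hex string are made up for illustration and are not taken from the affected forum.

```python
def describe_hex(hex_string):
    """Return (hex byte, Latin-1 character) pairs for a HEX() dump."""
    raw = bytes.fromhex(hex_string)
    return [(f"{b:02X}", bytes([b]).decode("latin-1")) for b in raw]

# Hypothetical dump of one stored word; 3F is a literal "?",
# whereas an intact ASCII apostrophe would appear as 27.
sample = "62726F3F73"
for byte_hex, char in describe_hex(sample):
    print(byte_hex, repr(char))
```

If the dump shows 3F where the apostrophe should be, the character was already replaced before or while it was stored, and changing the page encoding afterwards will not bring it back.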

ThorstenE

  • Guest
Re: Question marks instead of apostrophes.
« Reply #12 on: January 03, 2009, 02:52:21 AM »
I'm also using a German keyboard ;)
please copy these characters and post them to the affected forum:

example:
bro`s (with grave accent, German: Gravis)
bro´s (with acute accent, German: Akut)
bro's (with single quote -> should be used as apostrophe)

I have no English keyboard here, so I cannot compare the character layout, but I think she's using an acute (akut) or grave (gravis) accent (also part of our keyboard layout) instead of a single quote?
http://de.wikipedia.org/wiki/Apostroph#Aufstellung_.C3.A4hnlicher_Zeichen
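
Since these characters look nearly identical on screen, a quick way to tell them apart is to print their code points and Unicode names. A small Python sketch using the standard `unicodedata` module; the helper name `identify` is hypothetical, and the fourth sample (the typographic apostrophe U+2019) is an addition that does not appear in the list above.

```python
import unicodedata

def identify(text):
    """List each non-alphanumeric character with its code point and Unicode name."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isalnum()
    ]

for sample in ("bro`s", "bro\u00b4s", "bro's", "bro\u2019s"):
    print(sample, identify(sample))
```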

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #13 on: January 03, 2009, 06:02:08 AM »
Bingo.  :D  The akut does it. Test post is here.

ThorstenE

  • Guest
Re: Question marks instead of apostrophes.
« Reply #14 on: January 03, 2009, 07:42:22 AM »
OK, two possible solutions in my opinion:
1) convert the forum to UTF-8 (then ALL characters incl. the acute (akut) accent should work)
2) tell your users they should use a single quote instead of the acute (akut) accent.

I personally prefer 1) because it's a real solution ;) but it's not my forum..

Offline Antechinus

Re: Question marks instead of apostrophes.
« Reply #15 on: January 03, 2009, 03:54:04 PM »
Funny you should mention that. I'm planning on doing the UTF-8 conversion at the same time as I convert the place to SMF 2. This is currently scheduled for shortly after the public release of RC1. Might as well do it all at once.

To be honest, I think SMF really should be UTF-8 by default. Other forum software is and it works just fine. The fact that SMF isn't is really a demonstration of the "Central America means Kansas" mindset IMO.  ;)

Offline BloodWings

  • Newbie
  • *
  • Posts: 2
Re: Question marks instead of apostrophes.
« Reply #16 on: January 10, 2009, 04:24:07 PM »
*little bump*


I am having the issue of question marks for apostrophes only when text is copied and pasted into a post.  It converts all the apostrophes into question marks. 

Also, after the conversion from *stabs* Xsorbit to new host SMF, the older posts have a number of odd A's sprinkled throughout the posts.

The SQL connection collation setting under my phpMyAdmin is: utf8_unicode_ci. I am VERY new to all this; can someone tell me if I should have one of the other options ticked?



#  MySQL charset:  UTF-8 Unicode (utf8)
#  MySQL connection collation:  utf8_unicode_ci
« Last Edit: January 10, 2009, 04:25:58 PM by BloodWings »
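
One plausible cause of stray A-like characters after a migration, not confirmed anywhere in this thread, is UTF-8 text being read back as a single-byte character set. A minimal Python sketch of that effect, with the non-breaking space chosen purely as an illustrative example:

```python
# A non-breaking space (U+00A0) is two bytes in UTF-8: C2 A0.
utf8_bytes = "\u00a0".encode("utf-8")

# Read as Latin-1, byte C2 becomes a visible "Â" in front of the space.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes, repr(misread))
```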

MrPhil

  • Guest
Re: Question marks instead of apostrophes.
« Reply #17 on: January 10, 2009, 09:59:06 PM »
Quote
I am having the issue of question marks for apostrophes only when text is copied and pasted into a post.  It converts all the apostrophes into question marks. 

What are you producing the original text in, before cutting and pasting it into SMF? If you're using Word (or similar Microsoft word processors), you will have to find a way to turn off the "smart quotes" function. It turns your apostrophes into typographically correct glyphs, but unfortunately, MS used nonstandard code points for these characters.

Offline heavyccasey

  • Jr. Member
  • **
  • Posts: 212
Re: Question marks instead of apostrophes.
« Reply #18 on: January 11, 2009, 01:09:29 AM »
Quote
ok, two possible solutions in my opinion:
1) convert the forum to UTF-8 (then ALL characters incl. the acute (akut) accent should work)
2) tell your users they should use a single quote instead of the acute (akut) accent.

I personally prefer 1) because it's a real solution ;) but it's not my forum..
You could also use the swear filter. I use that for the MS "smart" quotes.
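
The same idea as the word-filter workaround, replacing look-alike characters with a plain apostrophe or quote, can be sketched in Python. This only illustrates the mapping and is not SMF's censor code; the function name `normalize_quotes` and the list of characters are assumptions about what users are likely to type or paste.

```python
# Map common apostrophe and quote look-alikes to their ASCII equivalents.
LOOKALIKES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark ("smart" apostrophe)
    "\u00b4": "'",   # acute accent
    "\u0060": "'",   # grave accent
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def normalize_quotes(text):
    """Replace typographic quotes and accent marks used as apostrophes."""
    return text.translate(LOOKALIKES)

print(normalize_quotes("bro\u00b4s and \u201cbro\u2019s\u201d"))
```

A blanket replacement like this also rewrites grave accents that were typed on purpose, so it is a workaround rather than a substitute for the UTF-8 conversion.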

Offline BloodWings

Re: Question marks instead of apostrophes.
« Reply #19 on: January 11, 2009, 01:18:46 AM »
Actually, it's happening when we copy/paste text from a posted blog.