If a post contains £, post body is not visible.

Started by increpatio, April 02, 2011, 09:49:42 AM

increpatio

When I post something with a £ on my forum (or – (en dash, not -), or the ellipsis character, or a bunch of other characters like that), the post body is not visible (though if I quote it I can see its contents in the editor area just fine).

This has been happening for a while, I think maybe since I upgraded to 2.  I just upgraded to RC5, and the problem hasn't resolved itself.

I've had a search around - this seems the most closely related thread I could find, but I wasn't able to figure out a solution from it.


  • This is happening for all users, not just me.
  • I tried switching to the British language pack, but that didn't make a difference, so I changed back.
  • I've tried running "convert to utf8" on my database.  It says it works just fine, but it didn't make a difference.  (Though I notice that the option to convert it is still there - I don't know if this is a problem or not...).
  • My settings.php file has the line  $db_character_set = 'utf8';.
  • I'm not too sure where there might be problems in MySQL, but the default character set of my _messages table, at least, is utf8.
  • This is happening to new posts as well.

I'd really appreciate some help/hints with this  :'(

Spoogs

Do you have the utf8 language pack installed? Admin>>Configuration>>Languages

increpatio

I have

en_GB.utf8
en_US
en_US.utf8 (< the selected one)

MrPhil

When you say you use characters such as &pound; or &ndash; or &hellip;, how are you inserting them? Are you cutting and pasting from a PC, specifically from a Microsoft word processor such as Word? If so, you should be aware that PCs use a nonstandard character encoding (Windows-1252) which has such characters in what should be reserved for control characters. If you paste them into a Latin-1 or UTF-8 Web application, you could be inserting control characters into your text, which could bend the mind of the browser. See my sig > Projects > Smart Quotes.
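
For reference, a minimal Python sketch of the byte values behind the point above. It uses only the standard codecs; the helper names (describe, is_c1_control, is_valid_utf8) are made up for illustration. It suggests that the en dash and ellipsis sit at 0x96 and 0x85 in Windows-1252, inside the 0x80-0x9F range that ISO-8859-1 reserves for control characters, while the pound sign sits at 0xA3, outside that range. Sent as raw Windows-1252 bytes these are not valid UTF-8, but the same characters properly encoded as UTF-8 are.

```python
# Illustrative only: the three characters named in this thread.
CHARS = {"pound sign": "\u00a3", "en dash": "\u2013", "ellipsis": "\u2026"}


def is_c1_control(byte_value):
    """True if the byte falls in 0x80-0x9F, the C1 control range of ISO-8859-1."""
    return 0x80 <= byte_value <= 0x9F


def describe(ch):
    """Return (Windows-1252 byte, in C1 range?, UTF-8 bytes) for one character."""
    cp1252 = ch.encode("cp1252")
    return cp1252, is_c1_control(cp1252[0]), ch.encode("utf-8")


def is_valid_utf8(data):
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name, ch in CHARS.items():
        cp1252, c1, utf8 = describe(ch)
        print(f"{name}: cp1252={cp1252.hex()} c1_control={c1} utf8={utf8.hex()}")
```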

increpatio

Quote
When you say you use characters such as &pound; or &ndash; or &hellip;, how are you inserting them?
With my keyboard.  Just like this

£ –

:D

I'm using OS X 10.6, but I asked someone on a Windows machine to double-check and they're getting the same behaviour.

MrPhil

I've seen keyboards with a British Pound Sterling, but never one with an en dash or an ellipsis. But, to get back to my question, are you directly typing them into an SMF text entry box, or are you building it in a PC word processor and cutting-and-pasting into your browser?

increpatio

Quote from: MrPhil on April 02, 2011, 03:38:17 PM
Are you directly typing them into an SMF text entry box, or are you building it in a PC word processor and cutting-and-pasting into your browser?
I am typing directly into the browser text box in SMF.

MrPhil

OK, I don't know why Pound Sterling would cause problems -- have you confirmed that your page is displaying in UTF-8, and the database is actually UTF-8? The Pound Sterling (if it's on your keyboard) should be stored as the right UTF-8 code and properly displayed. As for en-dash and ellipsis that you mentioned, are those on your keyboard? I've never seen them on a keyboard, so I assume that you must be cutting and pasting from a Word document. The problem there, as I've said, is that these special characters will cause problems because they're not valid in UTF-8.

Let's isolate the problem and make sure which characters are causing problems. Make posts with nothing but ASCII characters (normal text, no en-dash or ellipses, etc.) and one or two Pound Sterling signs, and see what happens. Does the preview work and display later doesn't? Go into phpMyAdmin and find and examine the post (smf_messages) and see if it appears to be stored correctly. Hopefully somewhere you'll isolate what went wrong.

increpatio

Thanks for your suggestions, they're giving some direction to my flailings.  :)

Quote
have you confirmed that your page is displaying in UTF-8, and the database is actually UTF-8?
I don't know how to check that the database is UTF-8 beyond what I've already said (points 3, 4 and 5 of my original post).

The page is displaying in UTF-8.

Quote
As for en-dash and ellipsis that you mentioned, are those on your keyboard?
Pound sign is from keyboard (entered regularly, and it works here fine).
en-dash is from keyboard (alt+hyphen on my keyboard).
Other ones come from pasting from TextEdit (the Notepad equivalent on OS X), which does some swaps.  However, I can paste these swaps in just fine on another, older SMF forum I have on the same server (V. 1.1.10), and here as well:

“a…b”  (note the ellipsis / quotes is special)

If I insert any of the characters mentioned, then nothing appears in the post body.

Quote
Let's isolate the problem and make sure which characters are causing problems. Make posts with nothing but ASCII characters (normal text, no en-dash or ellipses, etc.) and one or two Pound Sterling signs, and see what happens. Does the preview work and display later doesn't?
Preview displays nothing, and nothing appears in the post body (either visibly or in the source code, where the post body div is empty: <div id="msg_11438_quick_mod"></div>).

I make a post "kitten £ kitten"

In the MySQL database (I don't have phpMyAdmin, so I'm working from a command prompt), I get the body as being
"kitten ? kitten"

"kitten £ kitten" is accepted fine as a message title, and displays fine.

If I try to edit the post again, I see the contents of the edit post text box are correct, displaying "kitten £ kitten".  (resaving doesn't change anything).

Making the same post on the 1.1.10 forum I have on the same server, it displays fine on the website, and looking at it in the database I get the body as being
"kitten ? kitten"

One difference I notice between the field types in 1.1.10 and the 2.0RC5 versions is that in 1.1.10 body is mediumtext and in 2.0RC5 it's text.  I don't know if this is at all significant.
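
A "?" where the £ should be can mean two different things: either the stored byte really is a question mark because a lossy character-set conversion happened on the way in, or the stored bytes are fine and the command-line client simply cannot display them. The thread does not establish which applies here. The Python sketch below only illustrates the first case, using Python's own codecs as a stand-in for a database conversion; lossy_store is a hypothetical helper, not anything from SMF or MySQL.

```python
def lossy_store(text, target_charset):
    """Mimic a lossy conversion: characters the target charset cannot
    represent are replaced by '?' (Python codecs stand in for the database)."""
    return text.encode(target_charset, errors="replace").decode(target_charset)


if __name__ == "__main__":
    post = "kitten \u00a3 kitten"
    # ASCII cannot represent the pound sign, so it degrades to '?'.
    print(lossy_store(post, "ascii"))
    # Latin-1 can represent it, so the text survives unchanged.
    print(lossy_store(post, "latin-1"))
    # The en dash is not in Latin-1, so it degrades even there.
    print(lossy_store("a \u2013 b", "latin-1"))
```

Comparing the raw bytes of the stored body (rather than what the terminal prints) would be one way to tell the two cases apart.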

----

Next stop: try to re-download a fresh copy to a different location and see if I can view the forums okay with that, in case I've screwed up its insides somehow.

Downloaded + unzipped a fresh upgrade to a new folder.  Copied my settings.php file over.  Kitten post still has empty body.

Hmm.

----

Update:

Nothing suspicious in the error log either.

increpatio

So, if it's not something with the forum php files (with the possible exception of settings.php, but I'd be skeptical of that), one figures it must be a problem with the database, right?  But where should I go poking? Hmm.  Worst comes to worst I can set up a new forum (hopefully not having the bug) and compare the table headers side by side.

MrPhil

Hmm. You shouldn't accept that your page's code says "UTF-8" and leave it at that. I've seen servers that somehow override that and force Latin-1 (or some other encoding), so the meta tag is ignored. Try your browser's View > Character Encoding and play around to see if forcing other encodings at the browser level does anything interesting.

phpMyAdmin or SQL "SHOW" command (you'll have to look up the details) should tell you the encoding and collation that your database is using (overall, per table, and per field).

Do you know for sure that alt+hyphen produces a character code in the page's displayed encoding? That is, UTF-8 rather than whatever native encoding your PC (Mac?) is using? Sometimes "alternate" entry methods are not fully integrated into the browser/Web and end up producing a different encoding than the page is using. Case in point, on a Windows PC alt+nnn on the keyboard produces a Windows-1252 (or whatever encoding the native PC is) character rather than something in UTF-8 or whatever the page is, so be careful.

text vs. mediumtext shouldn't make any difference.
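
To make the "forced Latin-1" scenario above concrete, here is a small Python sketch of what the same bytes look like under different decodings. It is an illustration under assumptions, not a diagnosis of this forum: view_as is a made-up helper, and the sample text is the "kitten" test post from earlier in the thread.

```python
def view_as(raw, charset):
    """Decode raw bytes the way a browser forced into `charset` would,
    substituting U+FFFD for anything that cannot be decoded."""
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    utf8_bytes = "kitten \u00a3 kitten".encode("utf-8")
    cp1252_bytes = "kitten \u00a3 kitten".encode("cp1252")

    # UTF-8 bytes read as UTF-8: the text round-trips.
    print(view_as(utf8_bytes, "utf-8"))
    # UTF-8 bytes read as Latin-1: the two-byte pound sign shows as two characters.
    print(view_as(utf8_bytes, "latin-1"))
    # Windows-1252 bytes read as UTF-8: the lone 0xA3 byte is undecodable.
    print(view_as(cp1252_bytes, "utf-8"))
```

Either mismatch garbles the character but still leaves visible text, which differs from the completely empty post body described in this thread.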

increpatio

Quote from: MrPhil on April 03, 2011, 09:37:43 PM
Hmm. You shouldn't accept that your page's code says "UTF-8" and leave it at that. I've seen servers that somehow override that and force Latin-1 (or some other encoding), so the meta tag is ignored. Try your browser's View > Character Encoding and play around to see if forcing other encodings at the browser level does anything interesting.
The whole post's text is missing when viewing the source.  Changing character encoding is unlikely to fix that.  It works fine with other forums of the same version, and sharing the same SQL database.  The page has the same character encoding (looking at what encoding Firefox is telling me it's using for the page) as my working forum of the same version and of the same theme and on the same forum.

I tried changing character encoding before posting, but the bodies were still empty.

Quote
phpMyAdmin or SQL "SHOW" command (you'll have to look up the details) should tell you the encoding and collation that your database is using (overall, per table, and per field).
I installed a new SMF forum, using the same database (but not sharing any data - the new forum uses a different prefix), same version, and the new blank one works fine.  This is a blank install of SMF, and none of the problems occur.

I ran "show create table" on the _messages tables of both, and compared them.  The only difference other than mediumtext is that on the broken forum, AUTO_INCREMENT=11466, and on the good forum AUTO_INCREMENT=3.  I don't think this is significant.  I also checked "show table status", and both tables have collation utf8_general_ci.

The database has default charset latin1, but the messages tables both have default charset utf8.

I tried modifying the new (unbroken) forum to point at the old one's database, but that had the same problems.
I tried modifying the old (broken) forum to point at the new one's database, and the problem wasn't there.

This would lead me to think that there's a problem in the database somewhere.  But I can't imagine where...

Quote
Do you know for sure that alt+hyphen produces a character code in the page's displayed encoding?
Using http://javascript.internet.com/miscellaneous/ascii-character-code.html gives me a character code of 8211, which is the correct Unicode code point for an en dash.
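
A short aside on the number 8211: it is the character's Unicode code point (U+2013), which is what a JavaScript key-code checker reports; the UTF-8 encoding of that code point is a separate three-byte sequence. A minimal Python sketch, with code_point_and_utf8 as a purely illustrative helper name:

```python
def code_point_and_utf8(ch):
    """Return the Unicode code point of `ch` and its UTF-8 byte sequence."""
    return ord(ch), ch.encode("utf-8")


if __name__ == "__main__":
    for ch in ("\u2013", "\u00a3"):  # en dash, pound sign
        code_point, encoded = code_point_and_utf8(ch)
        print(f"U+{code_point:04X} = {code_point} decimal, UTF-8 bytes {encoded.hex()}")
```

So a check like this confirms that the right character reached the page, but says nothing about which bytes are later sent to the server or stored.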

going to scratch my chin some more...thanks for your suggestions so far, much appreciated to have a companion while looking into wretched matters like this :)

increpatio

Looking at the PHP, I've narrowed down the culprit to

parsesmileys

If I return on the first line of this, everything is cleared up.

A while back, I think, I had created an empty smiley set, which I was using.  I may have created it by nefarious means, I can't remember. (I couldn't find any other way, and still can't).

The line that nukes everything is

$message = preg_replace($smileyPregSearch, 'isset($smileyPregReplacements[\'$1\']) ? $smileyPregReplacements[\'$1\'] : \'\'', $message);

where

$smileyPregSearch = "~(?<=[>:\?\.\s\x{A0}[\]()*\\;]|^)()(?=[^[:alpha:]0-9]|$)~eu"
and the second argument  is ''.

Anyway, putting a return at the start of this function fixes everything.  So a temporary fix at least...
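
To see what the search pattern above does when the smiley set is empty, here is a rough Python approximation. It is a sketch under stated assumptions, not SMF's code: [[:alpha:]] is narrowed to ASCII letters, \x{A0} is written as \xa0, the lookbehind is split because Python requires fixed-width lookbehinds, and both helper names are invented. It lists where the empty group matches and checks whether the byte after the next one belongs to a multibyte UTF-8 character.

```python
import re

# Approximation of the quoted $smileyPregSearch for an empty smiley set.
# The empty group () stands where the smiley alternatives would normally be.
EMPTY_SET_PATTERN = re.compile(
    r"(?:(?<=[>:?.\s\xa0\[\]()*\\;])|^)()(?=[^a-zA-Z0-9]|$)"
)


def empty_match_offsets(message):
    """Character offsets at which the empty pattern matches."""
    return [m.start() for m in EMPTY_SET_PATTERN.finditer(message)]


def one_byte_step_splits_character(message, char_offset):
    """True if moving one *byte* past char_offset lands on a UTF-8
    continuation byte, i.e. in the middle of a multibyte character."""
    data = message.encode("utf-8")
    byte_offset = len(message[:char_offset].encode("utf-8"))
    nxt = byte_offset + 1
    return nxt < len(data) and (data[nxt] & 0xC0) == 0x80


if __name__ == "__main__":
    for text in ("kitten kitten", "kitten  kitten", "kitten \u00a3 kitten"):
        offsets = empty_match_offsets(text)
        flags = [one_byte_step_splits_character(text, o) for o in offsets]
        print(repr(text), offsets, flags)
```

In this approximation the pattern finds an empty match directly before the £, and stepping a single byte forward from there would land inside the two-byte character. Whether that is what makes the PHP call discard the message is not confirmed anywhere in this thread; it is only one hypothesis consistent with the symptoms.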
