Post not complete after hitting POST Button

Started by Johnny B, May 12, 2012, 01:06:46 AM


Johnny B

Hello all

First let me say thanks for helping in advance

PROBLEM:

Just installed, made a few posts, and all went well until tonight.

I've been testing with member accounts making posts. When a member makes a post, not all of the post shows up. For example, I made a post that had 10 sentences and only 2 show. I did toggle a few settings today, but for the life of me I don't remember setting a character limit on posts.

I'm using the latest version, 2.0.2, installed just yesterday.


Storman™

Was that a one off or does it happen all the time ?

Johnny B

Thanks for the reply.

Did a lot of testing last night with newly created members, rotating posts through various boards, and it seems to be one particular post. All members in all boards can post what they want and as much as they want. But no matter who posts it, including admin, this one post will only allow 2 sentences to be posted. When I write the post, I can view it and all is there and looks good, but when I hit the Post button only 2 sentences show. Strange, because there is no foul language. Anyway, here is the post:


Arantor

It's those quote marks, they're invalid code under some circumstances.

Take a backup, then convert the forum to UTF-8 from the maintenance area.

MrPhil

It's hard to see what exactly the original quotation marks are/were, but it's likely that the quotation marks are Microsoft "Smart Quotes" and are invalid in many browsers. Depending on the browser, a Smart Quote might be interpreted as a control code, which could terminate the post at that point. It won't do any good to convert to UTF-8 (both database and page display and any non-English language files), but it won't hurt.

Since this is such a common problem (cutting and pasting text from MS Word), I've asked the SMF developers to put in code to scan for Smart Quotes characters and convert them to HTML entities. For the time being, all you can do is edit the posts to replace the Smart Quotes with standard ASCII characters or HTML entities. See my sig > projects > Change "Smart Quotes" for a complete list of Smart Quotes to look for.
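As a rough illustration of the manual replacement described above, here is a minimal sketch that swaps a few common Smart Quotes bytes for plain ASCII. The function name smart_quotes_to_ascii is hypothetical (it is not part of SMF), and the sketch assumes the text is in a single-byte encoding such as Latin-1 or CP-1252. It should not be run on UTF-8 text, where these byte values can appear inside valid characters.

```php
<?php
// Hypothetical helper (not part of SMF): replace common CP-1252
// "Smart Quotes" bytes with plain ASCII equivalents.
// Assumes $text is in a single-byte encoding, NOT UTF-8.
function smart_quotes_to_ascii($text)
{
    $map = array(
        "\x85" => '...', // ellipsis
        "\x91" => "'",   // left single quotation mark
        "\x92" => "'",   // right single quotation mark / apostrophe
        "\x93" => '"',   // left double quotation mark
        "\x94" => '"',   // right double quotation mark
        "\x96" => '-',   // en dash
        "\x97" => '--',  // em dash
    );
    // strtr() with an array replaces every occurrence of each key
    return strtr($text, $map);
}

echo smart_quotes_to_ascii("\x93Hello\x94 \x96 it\x92s me\x85"), "\n";
// prints: "Hello" - it's me...
```

The map only covers the characters most often pasted from Word; the full list of x80 through x9F bytes is longer.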

Johnny B

Quote from: MrPhil on May 12, 2012, 11:00:08 AM
It's hard to see what exactly the original quotation marks are/were, but it's likely that the quotation marks are Microsoft "Smart Quotes" and are invalid in many browsers. Depending on the browser, a Smart Quote might be interpreted as a control code, which could terminate the post at that point. It won't do any good to convert to UTF-8 (both database and page display and any non-English language files), but it won't hurt.

Since this is such a common problem (cutting and pasting text from MS Word), I've asked the SMF developers to put in code to scan for Smart Quotes characters and convert them to HTML entities. For the time being, all you can do is edit the posts to replace the Smart Quotes with standard ASCII characters or HTML entities. See my sig > projects > Change "Smart Quotes" for a complete list of Smart Quotes to look for.

Thanks for the reply, but the text was copied to Notepad first to remove all that, then pasted into the thread, so that doesn't seem to be the problem. I'm no expert, but I was taught notepad does remove word coding. Am I wrong on that?

Quote from: Arantor on May 12, 2012, 10:34:33 AM
It's those quote marks, they're invalid code under some circumstances.

Take a backup, then convert the forum to UTF-8 from the maintenance area.

I went to the "Convert the database and data to UTF-8" Section, there is a drop down menu to select options. Which option do I select?

Thank you

MrPhil

Quote from: spinfactor on May 12, 2012, 02:17:45 PM
I was taught notepad does remove word coding. Am I wrong on that?

Partially wrong. While cutting and pasting to Notepad (or, usually, anything done via the Windows clipboard buffer) will remove the codes for italic, bold, etc., it will not translate "Smart Quotes" characters into other characters (e.g., standard ASCII). What happens is that you end up with single byte codes in your text that aren't valid in the encoding that SMF uses (typically either Latin-1 or UTF-8). Some browsers may fudge the results and display these particular characters as Smart Quotes characters under some encodings, but you can't count on it.
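To make the "single byte codes that aren't valid" point concrete, here is a small sketch that checks whether a string is well-formed UTF-8. It assumes the pasted quotes reach the server as raw CP-1252 bytes, as described above. The helper name is_valid_utf8 is made up for this example.

```php
<?php
// Hypothetical helper: true if $text is well-formed UTF-8.
// preg_match() with the /u modifier returns false when the subject is
// not valid UTF-8; otherwise the empty pattern matches and it returns 1.
function is_valid_utf8($text)
{
    return preg_match('//u', $text) === 1;
}

$pasted = "\x93quoted\x94";                 // Smart Quotes as single CP-1252 bytes
$proper = "\xE2\x80\x9Cquoted\xE2\x80\x9D"; // the same quotes encoded as UTF-8

var_dump(is_valid_utf8($pasted)); // bool(false)
var_dump(is_valid_utf8($proper)); // bool(true)
```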

MrPhil

Quote from: spinfactor on May 12, 2012, 02:17:45 PM
I went to the "Convert the database and data to UTF-8" Section, there is a drop down menu to select options. Which option do I select?

Before you go and convert your SMF installation to UTF-8, you should be aware that if the root cause of your problem is that Smart Quote characters were cut and pasted in, it is highly unlikely that a conversion to UTF-8 will work. You might be able to successfully convert existing database entries (posts) to UTF-8 if you tell phpMyAdmin that the existing database is CP-1252 rather than Latin-1, but it will do nothing for you in the future the next time someone cuts and pastes in a Word document containing Smart Quotes.
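The difference between treating the stored bytes as CP-1252 and as Latin-1 can be seen outside the database too. The following is only a sketch of the idea, not what phpMyAdmin does internally. It assumes PHP's iconv extension is available and recognizes the names 'CP1252' and 'ISO-8859-1'.

```php
<?php
// Assumption: the iconv extension is available and supports these names.
$raw = "\x93Hi\x94"; // Smart Quotes stored as single bytes

// Interpreting the bytes as CP-1252 yields real curly quotes (U+201C, U+201D).
$as_cp1252 = iconv('CP1252', 'UTF-8', $raw);

// Interpreting the same bytes as Latin-1 yields C1 control characters
// (U+0093, U+0094), which browsers will not show as quotes.
$as_latin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

echo bin2hex($as_cp1252), "\n"; // e2809c4869e2809d
echo bin2hex($as_latin1), "\n"; // c2934869c294
```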

Johnny B

Quote from: MrPhil on May 12, 2012, 03:09:28 PM
Quote from: spinfactor on May 12, 2012, 02:17:45 PM
I was taught notepad does remove word coding. Am I wrong on that?

Partially wrong. While cutting and pasting to Notepad (or, usually, anything done via the Windows clipboard buffer) will remove the codes for italic, bold, etc., it will not translate "Smart Quotes" characters into other characters (e.g., standard ASCII). What happens is that you end up with single byte codes in your text that aren't valid in the encoding that SMF uses (typically either Latin-1 or UTF-8). Some browsers may fudge the results and display these particular characters as Smart Quotes characters under some encodings, but you can't count on it.

Thank you, I learned something here

Quote from: MrPhil on May 12, 2012, 03:14:03 PM
Quote from: spinfactor on May 12, 2012, 02:17:45 PM
I went to the "Convert the database and data to UTF-8" Section, there is a drop down menu to select options. Which option do I select?

Before you go and convert your SMF installation to UTF-8, you should be aware that if the root cause of your problem is that Smart Quote characters were cut and pasted in, it is highly unlikely that a conversion to UTF-8 will work. You might be able to successfully convert existing database entries (posts) to UTF-8 if you tell phpMyAdmin that the existing database is CP-1252 rather than Latin-1, but it will do nothing for you in the future the next time someone cuts and pastes in a Word document containing Smart Quotes.

Gotcha. Well, this sounds far beyond my abilities. Besides, my past experience shows that when you tinker with something whose effects are not familiar to you, the chances of a major screw-up are plentiful. Thanks, guess I'll live with this and hope members don't copy and paste. Shame.

MrPhil

Maybe someone should come up with a mod to sanitize cut and pasted Smart Quotes into HTML entities? This may involve creating a new BBCode tag to handle entities, e.g.,  [ent=laquo]. We can then nudge the developers to build this into the base product (hopefully not waiting until 2.1). There may be work going on for this already, and if not, I'll try to take a look at it this weekend.
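Purely as a thought experiment, the entity tag idea might look something like the sketch below. None of this is existing SMF code: the function name expand_ent_tags, the whitelist, and the exact tag syntax are all assumptions, and it uses a closure, so it needs PHP 5.3 or later.

```php
<?php
// Hypothetical sketch of the proposed [ent=...] tag; not an SMF API.
// Only names on a small whitelist are turned into entities; anything
// else is left untouched.
function expand_ent_tags($text)
{
    $allowed = array('laquo', 'raquo', 'lsquo', 'rsquo', 'ldquo',
                     'rdquo', 'ndash', 'mdash', 'hellip');
    return preg_replace_callback(
        '/\[ent=([A-Za-z]+)\]/',
        function ($m) use ($allowed) {
            // $m[0] is the whole tag, $m[1] the entity name
            return in_array($m[1], $allowed, true) ? '&' . $m[1] . ';' : $m[0];
        },
        $text
    );
}

echo expand_ent_tags('[ent=laquo]quoted[ent=raquo] and [ent=bogus]'), "\n";
// prints: &laquo;quoted&raquo; and [ent=bogus]
```

A whitelist is used here so that arbitrary text after "ent=" cannot be turned into markup.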

Johnny B

Sweet, I like that. There's always room for improvement, my father always said. Fingers crossed.

MrPhil

I did some experimenting, and at least some browsers on some PCs will translate cut and pasted Smart Quotes into UTF-8 equivalents for forums set up as UTF-8. I don't know what will happen if the database is converted to UTF-8, but I suspect that any Smart Quotes already in the database won't be translated to UTF-8. Since this translation of Smart Quotes on the way in (or out) is probably done on the whim of the browser author, and I haven't heard of any official standard to handle it, I hate to recommend doing something major like converting your database and forum to UTF-8. There's no telling when that will work and when it won't.

Try this. It translates Smart Quotes "on the way out" to HTML entities. It should work for any single byte encoding (as well as UTF-8), but has only been tested on a UTF-8 SMF 2.0.2 forum, so use at your own risk and be ready to back it out. On UTF-8, it tries to determine if a byte sequence is already a legitimate UTF-8 character, and if so, doesn't touch it. Otherwise (and on single byte encodings such as Latin-1), it changes anything in the range x80 through x9F to HTML entities. UTF-8 is the only multibyte encoding I've tried -- it will probably fail on UTF-16 or other multibyte encodings, but very few people use those (and they should not install this patch).

It only is applied to text that is potentially subject to BBCode expansion (it is called from parse_bbc). Text that does not undergo BBCode processing (e.g., topic subjects) will not be fixed up. Perhaps someone with the time and inclination can fix that (it doesn't depend on BBCode; that was just a convenient place to call it from). This patch is just a Quick'n'Dirty; others are welcome to fix it up or redo it in more elegant code.

All changes are in Sources/Subs.php. Applying this to SMF 1.x will probably work, but you may have to work to find where in parse_bbc() to apply the changes.

Find

// Never show smileys for wireless clients.  More bytes, can't see it anyway :P.

and change to

// Just in case it wasn't determined yet whether UTF-8 is enabled.
if (!isset($context['utf8']))
  $context['utf8'] = (empty($modSettings['global_character_set']) ? $txt['lang_character_set'] : $modSettings['global_character_set']) === 'UTF-8';
// always fix up "Smart Quotes" going out to the browser, no matter what the encoding
$message = fix_SmartQuotes($message);
// Never show smileys for wireless clients.  More bytes, can't see it anyway :P.


Find

// Just in case it wasn't determined yet whether UTF-8 is enabled.
if (!isset($context['utf8']))
$context['utf8'] = (empty($modSettings['global_character_set']) ? $txt['lang_character_set'] : $modSettings['global_character_set']) === 'UTF-8';

and change to (comment out):

// // Just in case it wasn't determined yet whether UTF-8 is enabled.
// if (!isset($context['utf8']))
// $context['utf8'] = (empty($modSettings['global_character_set']) ? $txt['lang_character_set'] : $modSettings['global_character_set']) === 'UTF-8';


At the very end of the file, find
?>
and insert before it:

// Translate MS Smart Quotes into HTML entities, so browsers don't choke
// on them as control codes. Some browsers treat Latin-1 as CP-1252, and
// some pages may already be in CP-1252; in either case this should be safe.
// This is done upon text output, and only for text which can have BBCodes.
// Database functions which might be searching or sorting text on UTF-8, and
// are run against the text before this routine is called, might still fail.
// author: SMF's MrPhil (see catskilltech.com)
function fix_SmartQuotes($message) {
  global $context;
  // sometimes a numeric entity is used, because the named entity
  // may not yet be widely supported
  $SQ_list = array(
                        // UTF  SQ usage, UTF usage
                        // codepoint
    0x80 => '&euro;'  , // 20AC Euro, reserved control
    0x81 => '?',        //      unused, reserved control
    0x82 => '&sbquo;' , // 201A Low-"9" opening quotation mark, Break Permitted Here
    0x83 => '& #402;'  , // 0192 ƒ Florin/script f/folder, No Break Here
    0x84 => '&bdquo;' , // 201E Low-"99" opening quotation mark, Index
    0x85 => '&hellip;', // 2026 Ellipsis, Next Line
    0x86 => '&dagger;', // 2020 Single dagger, Start of Selected Area
    0x87 => '&Dagger;', // 2021 Double dagger, End of Selected Area
    0x88 => '&circ;'  , // 02C6 Circumflex ^ accent (combining?), Character Tabulation Set
    0x89 => '&permil;', // 2030 o/oo per mille, Character Tabulation with Justification
    0x8A => '& #352;'  , // 0160 &Scaron S + caron accent, Line Tabulation Set
    0x8B => '&lsaquo;', // 2039 Single left angle quote < (guillemet) Partial Line Down
    0x8C => '&OElig;' , // 0152 OE ligature, Partial Line Up
    0x8D => '?',        //      unused, Reverse Line Feed
    0x8E => '& #381;'  , // 017D &Zcaron Z + caron accent, Single Shift Two
    0x8F => '?',        //      unused, Single Shift Three
    0x90 => '?',        //      unused, Device Control String
    0x91 => '&lsquo;' , // 2018 "6" opening quotation mark, Private Use One
    0x92 => '&rsquo;' , // 2019 "9" closing quotation mark/apostrophe, Private Use Two
    0x93 => '&ldquo;' , // 201C "66" opening quotation mark, Set Transmit State
    0x94 => '&rdquo;' , // 201D "99" closing quotation mark, Cancel Character
    0x95 => '&bull;'  , // 2022 Solid bullet, Message Waiting
    0x96 => '&ndash;' , // 2013 En-dash, Start of Guarded Area
    0x97 => '&mdash;' , // 2014 Em-dash, End of Guarded Area
    0x98 => '&tilde;' , // 02DC Tilde ~ accent (combining?), Start of String
    0x99 => '&trade;' , // 2122 Trademark TM, reserved control
    0x9A => '& #353;'  , // 0161 &scaron s + caron accent, Single Character Introducer
    0x9B => '&rsaquo;', // 203A Single right angle quote > (guillemet), Control Sequence Introducer
    0x9C => '&oelig;' , // 0153 oe ligature, String Terminator
    0x9D => '?',        //      unused, Operating System Command
    0x9E => '& #382;'  , // 017E &zcaron z + caron accent, Privacy Message
    0x9F => '&Yuml;'  , // 0178 Y + diaeresis/umlaute accent, Application Program Command
  );
  $new_message = '';
  if ($context['utf8']) {
    // we are in multibyte UTF-8 mode, so need to skip legitimate UTF-8
    // sequences that may contain x80-9F bytes inside them
    // note that $message itself is never modified (entities are appended
    // to $new_message), so its length only needs to be computed once
    $len = strlen($message);
    for ($i = 0; $i < $len; $i++) {
      $c = ord($message[$i]);
      // lead byte 110x xxxx, followed by one 10xx xxxx, or
      //           1110 xxxx              two
      //           1111 0xxx              three
      // if so, is legitimate UTF-8 sequence, don't modify
      $utf8_seq = 0;  // not UTF-8 (zero 10xx xxxx bytes to follow)
      if (($c & 0xE0) == 0xC0) {
        $utf8_seq = 1;
      } elseif (($c & 0xF0) == 0xE0) {
        $utf8_seq = 2;
      } elseif (($c & 0xF8) == 0xF0) {
        $utf8_seq = 3;
      }

      for ($j = 0; $j < $utf8_seq; $j++) {
        // j+1st following byte should be 10xx xxxx
        // but first, are we running off the end of $message?
        // shouldn't happen with well-formed UTF-8 characters...
        if ($i+$j+1 >= $len) {
          $utf8_seq = 0;
          break;
        }
        $cm = ord($message[$i+$j+1]) & 0xC0;
        if ($cm != 0x80) {
          $utf8_seq = 0;
          break;
        }
      }

      // skip over next $utf8_seq bytes as a legitimate UTF-8 sequence
      // or process single byte as possible Smart Quote
      if ($utf8_seq == 0) {
        if ($c >= 0x80 && $c <= 0x9F) {
          $new_message .= $SQ_list[$c]; // replace by HTML entity
        } else {
          $new_message .= chr($c);      // use original character
          // note that originally malformed UTF-8 won't be fixed
        }
      } else {
        $new_message .= substr($message, $i, $utf8_seq+1); // use original bytes
        $i += $utf8_seq;  // end of loop adds another 1
      }
    } // end of for loop through $message
  } else {
    // we are in a single byte mode, so go ahead and fix any
    // x80-9F bytes
    for ($i = 0; $i < strlen($message); $i++) {
      $c = ord($message[$i]);
      if ($c >= 0x80 && $c <= 0x9F) {
        $new_message .= $SQ_list[$c];
      } else {
        $new_message .= chr($c);
      }
    }
  }
  return $new_message;
}

There are five (5) entries in the $SQ_list table where you need to close up & and # (remove the space) to form proper HTML entities. This is because numeric entities get processed in the forum display into the actual characters.

Add: That last chunk of text is available as an attachment SQ_fix.txt. Just cut and paste it into place (just before the closing ?> ). There is no fixup needed as described above.

MrPhil

Have you had a chance to try this code? Please let me know if it works for you, or if you experience problems. If it works well, it might get packaged up into a mod.

phlexx

Please, has this been tested by anyone using SMF 2.0.2? Does it work on 2.0.2?

MrPhil

It was developed and tested (briefly) on SMF 2.0.2. Just be sure to back up the file before editing it, in case something goes wrong.

phlexx

OK, I will do that ASAP, though I am not good at PHP coding. :-*

MrPhil

No PHP coding is involved. All you have to do is edit one file: look for a match to a group of lines, then make the changes listed.

phlexx

Thanks, MrPhil. The problem is solved now with your patch. Hope it remains a permanent solution. Many thanks.

Johnny B

Ahh, I thought this thread was closed. Could have sworn it was closed, that's why I started a different thread yesterday. Anyway, thanks for looking into this further. I'll give it a shot

BTW, where do I make these changes? How do I find the text I need to change? Sorry, this is all new to me.

Storman™

Quote: BTW, where do I make these changes?

In his post above, MrPhil advised that all of these changes are in Subs.php, which is located in the Sources folder.

;)
