How did emojis get into my SMF post on my site?

Started by njtweb, February 21, 2020, 01:16:22 PM

njtweb

A guest made a post. There's nothing special in the content wording-wise, but there is a crossed-fingers emoji and a praying emoji in the post. I clicked on modify to check the img tag formatting and there's no bbcode at all. How was this person able to do this?

aegersz

I'd say they used their phone or tablet etc. to post that

drewactual

https://unicode.org/emoji/charts/full-emoji-list.html

https://www.simplemachines.org/community/index.php?topic=569059.msg4026546

so far they aren't impacting my sites except to give people pause when they show up in a thread title on the main board... then i get messages like "how did they do that?"....

strange thing is, sometimes they 'work' and sometimes they render as code..... U+1FAD1   

njtweb

I tested it out myself in my admin forum from my laptop. I went to Twitter and opened a new tweet. I clicked on 3 emojis to add to the tweet and then highlighted and copied them. Then I pasted them into a new test topic.

Sure enough, all three showed up. When I click on modify they're there exactly the same way in the edit mode as they are in the post mode.

shawnb61

Yep, true emoji are just characters, & can be copied & pasted from anywhere.  Should work fine in messages & subjects.

efk

In 10 years this is the first time I've seen that something like this is possible, and this part was unexpected: "true emoji are just characters, & can be copied & pasted from anywhere". I tried to copy/paste from Facebook, but it gives something like :) and :D
I'm wondering how many people knew about this.

shawnb61

There's a difference between smileys & emoji. 

A smiley is a piece of text that SMF (or any app...) translates to a predefined image.  E.g., the text :) becomes a smiley-face image.

When you copy & paste the smiley, you are usually just copying the actual text.  That's why you saw that when you copied from FB. 

There is a range of characters called emojis/emoticons that are actually just characters - no translation to an image needed.  I find this site very illustrative - you can look up "emoticons" in the U+1F600 ... U+1F64F range:
https://www.utf8-chartable.de/unicode-utf8-table.pl

Your smartphone is actually just showing you an additional set of text characters you can plonk in there, no different from any of these:
  • ಬಾ ಇಲ್ಲಿ ಸಂಭವಿಸು ಇಂದೆನ್ನ ಹೃದಯದಲಿ
  • काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ॥
  • ⠊⠀⠉⠁⠝⠀⠑⠁⠞⠀⠛⠇⠁⠎⠎⠀⠁⠝⠙⠀⠊⠞⠀⠙⠕⠑⠎⠝⠞⠀⠓⠥⠗⠞⠀⠍⠑
  • 𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿 𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸.
  • ᛖᚴ ᚷᛖᛏ ᛖᛏᛁ ᚧ ᚷᛚᛖᚱ ᛘᚾ ᚦᛖᛋᛋ ᚨᚧ ᚡᛖ ᚱᚧᚨ ᛋᚨᚱ
(For a fantastic source of character test data, look here: http://kermitproject.org/utf8.html)
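
To make the "just characters" point concrete, here's a minimal Python sketch. It is not anything SMF runs, and `is_emoticon` is a made-up helper name. It treats an emoji like any other character and checks whether it falls in the U+1F600 ... U+1F64F range mentioned above:

```python
def is_emoticon(ch: str) -> bool:
    """Return True if ch is one code point in the Emoticons block (U+1F600..U+1F64F)."""
    return len(ch) == 1 and 0x1F600 <= ord(ch) <= 0x1F64F


# A Latin letter, a Kannada letter, and two emoticons: all are plain characters.
for ch in ["A", "\u0cac", "\U0001F600", "\U0001F64F"]:
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), "bytes", is_emoticon(ch))
```

The only thing that sets the emoticons apart in that loop is the byte count: they need four bytes in UTF-8 where the others need one to three.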

Where the problems come from...  Note that MySQL's implementation of UTF8 is brain-damaged, and does not support 4-byte characters.  Note that these emoji are 4-byte characters.  To work around this, SMF has had to do a lot of translation behind the scenes.  Basically, anywhere a character is entered or displayed needs special treatment for 4-byte chars (translation to/from an html entity).  If that special treatment is missed somehow, you see the goofy entity codes, as drewactual noted above. 
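
As a rough illustration of that kind of translation (a Python sketch under my own assumptions, not SMF's actual code; `encode_4byte` and `decode_4byte` are hypothetical names), characters above U+FFFF can be swapped for numeric HTML entities before storage and swapped back for display:

```python
import re


def encode_4byte(text: str) -> str:
    """Replace every character above U+FFFF (4 bytes in UTF-8) with a numeric HTML entity."""
    return "".join(f"&#{ord(c)};" if ord(c) > 0xFFFF else c for c in text)


def decode_4byte(text: str) -> str:
    """Turn numeric entities above U+FFFF back into the characters they stand for."""
    def repl(match: re.Match) -> str:
        code = int(match.group(1))
        # Leave smaller entities (and out-of-range numbers) untouched.
        return chr(code) if 0xFFFF < code <= 0x10FFFF else match.group(0)
    return re.sub(r"&#(\d+);", repl, text)


original = "fingers crossed \U0001F91E"
stored = encode_4byte(original)  # contains only characters of 3 bytes or fewer
assert decode_4byte(stored) == original
```

If the decode step is skipped on some code path, the reader sees the raw `&#...;` text instead of the emoji, which is the symptom described above.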

For basic posts & topics, they have worked fine for a while - the occasional odd bug aside, e.g., for a while there post previews didn't work...

probeman

Is there a way to force a text display instead of a smiley symbol in a post?

In other words, force :) to display as a colon and a right parenthesis?

Arantor

If you don't want *smileys*, delete them from SMF's configuration (they're configurable after all). Or use the nobbc tag for those specific one-off occasions, like so: [nobbc]:)[/nobbc], to just hide them.

SpacePhoenix

Quote from: shawnb61 on February 21, 2020, 03:41:34 PM
Where the problems come from...  Note that MySQL's implementation of UTF8 is brain-damaged, and does not support 4-byte characters.

Looks like that's changed:

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html

Quote
MySQL supports multiple Unicode character sets:

    utf8mb4: A UTF-8 encoding of the Unicode character set using one to four bytes per character.

    utf8mb3: A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utf8mb4 instead.

    utf8: An alias for utf8mb3. In MySQL 8.0, this alias is deprecated; use utf8mb4 instead. utf8 is expected in a future release to become an alias for utf8mb4.


    ucs2: The UCS-2 encoding of the Unicode character set using two bytes per character. Deprecated in MySQL 8.0.28; you should expect support for this character set to be removed in a future release.

    utf16: The UTF-16 encoding for the Unicode character set using two or four bytes per character. Like ucs2 but with an extension for supplementary characters.

    utf16le: The UTF-16LE encoding for the Unicode character set. Like utf16 but little-endian rather than big-endian.

    utf32: The UTF-32 encoding for the Unicode character set using four bytes per character.


Arantor

No, it's still brain-damaged. The entry called utf8 (now utf8mb3) is the one Shawn is referring to and is still often used as the default.

utf8mb4 has been around for years but the brain-damaged version hasn't been killed off yet.
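
For anyone wanting to check whether a given string would survive a 3-byte-only character set like utf8mb3, here's a small Python sketch. `fits_utf8mb3` is a hypothetical helper, and it assumes the only limit that matters is "nothing above U+FFFF":

```python
def fits_utf8mb3(text: str) -> bool:
    """True if no character lies above U+FFFF, i.e. each needs at most 3 bytes in UTF-8."""
    return all(ord(c) <= 0xFFFF for c in text)


samples = ["plain text", "\u0915\u093e\u091a\u0902", "ok \U0001F600"]
for s in samples:
    print(repr(s), fits_utf8mb3(s))
```

Ordinary scripts such as Devanagari pass; emoji fail, which is why they need utf8mb4 or a workaround.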
