db 🔨 Hammered?

Started by drewactual, August 07, 2019, 03:03:05 PM

drewactual

a user did this opening a thread on my forum... rendering the hammer emoji. 

i'm guessing there is no problem with this as it is just:

🔨
U+1F528;
etc.


insofar as the DB sees it.... but i thought i would ask if it could cause issues?  i don't want to jack my DB up.  and, i don't know how to go about preventing their use, either, if it's something i should be concerned about.

okay, THIS site doesn't parse the 🔨 or U+1F528 as a hammer.... what setting am i looking past? 🔨 

drewactual

is it simply the HTML setting?

my site apparently allows unicode decimal code.... and i don't know why.... 🔨

vbgamer45

On my non SMF projects that use emoji I have the database charset set to utf8mb4

drewactual

Mine is utf-8. 

I can't foresee it being a problem as it's just characters and not code per se.... But it's worth asking.  By the way, on my phone the hammer shows up here.  Didn't on laptop.

Arantor

OK so here's the deal... MySQL's older utf8 charset can't actually handle proper UTF-8 (it stops at 3 bytes per character), which is what utf8mb4 is all about, but SMF out of the box cannot use that setting for historical reasons.

So, emojis are converted into an entity form that can be handled in the database and displayed normally.

You can't use the hex form of the entity (the one with #x, like &#x1F528;); when I wrote that code I went for the decimal numeric format (&#128296;). The encoding is done in such a way that if you try to manually enter the & it'll get converted into &amp;, the entity form of &, so that an & in a post doesn't get randomly eaten (that's a thing that can happen, because & has a special meaning in HTML and in URL query strings).
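A rough illustration of that conversion, as a minimal Python sketch. This is not SMF's actual code (SMF is PHP), and the function name `to_db_safe` is made up for the example. It assumes the rule is simply: escape a literal & first, then write any character above U+FFFF as a decimal entity.

```python
import html


def to_db_safe(text: str) -> str:
    """Hypothetical helper: make text storable in a 3-byte utf8 column.

    A literal '&' becomes '&amp;', and any character above U+FFFF
    (the ones that need 4 bytes in UTF-8) becomes a decimal entity.
    """
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ord(ch) > 0xFFFF:
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


stored = to_db_safe("db \U0001F528 Hammered? & more")
print(stored)                   # db &#128296; Hammered? &amp; more
print(to_db_safe("&#128296;"))  # &amp;#128296;  (typed by hand, so it stays literal)
print(html.unescape(stored))    # decodes back to the original text
```

The second call shows why a hand-typed entity is not turned into a hammer under this scheme: its & is escaped before anything else happens.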

It's not the HTML setting, it's simply trying to accept the emoji as is and internally convert it to a format the DB can handle - unless you're actually having a problem with it, leave it alone.

As for the hammer showing up here on your phone, no surprise, your phone has a font that can support it, your laptop might not unless you're using a recent browser on Windows 10 (or Linux or Mac).

drewactual

no errors have registered.  i'm not going to tell them how it works, either, though.  some of them may figure it out.  hopefully it doesn't become a problem. 

it's showing on my laptop now, too. 

vbgamer:  i use pretty URLs on that site... i wonder how it is going to like it?
edited:  it reads "/holder-128296-hammered/" which is fine.

shawnb61

Yep.  What Arantor said.   

UTF8 based DBs cannot store the 4-byte sequences natively.  BUT...  If converted to html entities, they can be stored & retrieved & displayed fine.   IF...  Your browser/computer is current enough to know how to display them. 

UTF8MB4 DBs store the 4 byte sequences natively.  Even then, you still have a browser/computer dependency for proper display. 

Oddly, smartphones led the way here, as they were born more recently... 

Note this is true for a lot of languages.  This Vietnamese text includes 4-byte characters:
些 𣎏 世 咹 水 晶 𦓡 空 𣎏 害 咦

As does this cool Gothic text:
𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿 𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸.

The 4-byte characters in the above text examples are stored natively in UTF8MB4 MySQL.  But they must be stored as html entities in UTF8 MySQL, due to MySQL's stunted 3-byte UTF8 implementation.
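To see which characters in a piece of text would trip over the 3-byte limit, a small Python sketch like the following can help. The function name `four_byte_chars` is made up for this example; the only assumption is that "4-byte" means a code point above U+FFFF, which is where UTF-8 switches from 3 to 4 bytes.

```python
def four_byte_chars(text: str) -> list:
    """Return the characters that need 4 bytes in UTF-8 (code points above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# The Vietnamese sample from the post above.
sample = "些 𣎏 世 咹 水 晶 𦓡 空 𣎏 害 咦"

for ch in four_byte_chars(sample):
    # Print each offending character with its code point and encoded length.
    print(f"{ch}  U+{ord(ch):04X}  {len(ch.encode('utf-8'))} bytes")
```

Anything this reports is what a 3-byte utf8 column cannot hold as-is; the rest of the text goes in untouched.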

drewactual

i do appreciate the responses, y'all. 

i'm coming into my season.... quickly... right now there are 1k posts every three or so days- when the season starts there will be likely 3k a day- so.... if this was going to jerk my DB around, it would be a bad time for users to 'discover' it, ya know?

Arantor

More importantly for the 4 byte sequences, you have to tell the database when you connect to it that you want to use the 4-byte charset (which is the main historical hiccup).

It doesn't help that UTF-8 is weird where characters don't all need 4 bytes; English is 1 byte, most European and Middle Eastern scripts sit in the 2-byte region, and Japanese and most other commonly used languages fit into 3 bytes. 4 bytes covers emoji.
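The byte widths are easy to check for any given character. A minimal Python sketch; the sample characters are picked just for illustration:

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "Latin A": "A",
    "Latin e-acute": "\u00e9",
    "Greek lambda": "\u03bb",
    "Japanese hiragana a": "\u3042",
    "Han character": "\u4e16",
    "Hammer emoji": "\U0001F528",
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} -> {utf8_width(ch)} byte(s)")
```

Only the last one crosses into 4 bytes, which is the range a 3-byte utf8 column cannot store natively.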

And that's without the really complex part where combining emojis come into play (e.g. people emojis combined with another emoji to provide colour variants).

It's... complicated, and what we do in SMF is the simplest form of what there is out there... because there is a difference between a single code point and a grapheme, which can be made up of several code points that are rendered as a single glyph with a font or fallback fonts. And we largely pretend that isn't a thing... ;)
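A small Python sketch of what "combining" means in practice: a thumbs-up with a skin tone looks like one emoji on screen but is two code points underneath. The per-code-point entity encoding at the end is only an assumption about what the simplest approach would produce, not a copy of SMF's code.

```python
import unicodedata

thumbs_up = "\U0001F44D"    # base emoji
medium_tone = "\U0001F3FD"  # skin tone modifier
combined = thumbs_up + medium_tone  # shown as a single glyph where supported

print(len(combined))                  # 2 code points
print(len(combined.encode("utf-8")))  # 8 bytes, 4 for each code point
for ch in combined:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Encoding each code point on its own gives two entities for one visible emoji.
entities = "".join(f"&#{ord(ch)};" for ch in combined)
print(entities)  # &#128077;&#127997;
```

Treating each code point independently keeps the logic simple, at the cost of never knowing that the two belong together.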

You're not jerking your DB around by this stuff being there, it is how SMF has operated since... 2.0.10, I think it was, when that code got merged. Prior to that, an emoji would just cut off the post at the first place one was inserted.

drewactual

thank you for the reassurance, Sir.  I just about hit the panic button navigating to the error log to see what it impacted.... just to find none.
