Author Topic: RSS encoding issue  (Read 1824 times)

Offline AliG

  • Semi-Newbie
  • *
  • Posts: 13
RSS encoding issue
« on: December 12, 2018, 07:36:21 AM »
Hi,
there is an issue with RSS. When I use the BEL character (Ctrl-G on the command line, 0x07), it breaks the RSS feed. Shouldn't it be encoded somehow?

This is the character. Now, if you try to open the RSS feed, it will report an encoding issue and you won't be able to use RSS for some time.
Code: [Select]

« Last Edit: March 01, 2019, 04:13:23 PM by Gwenwyfar »

Offline Arantor

  • Resident Overthinker
  • SMF Friend
  • SMF Legend
  • *
  • Posts: 71,248
    • StoryBB/StoryBB on GitHub
Re: RSS encoding issue
« Reply #1 on: December 12, 2018, 07:38:11 AM »
Why are you using the BEL character in the first place?
Don’t try to tell me that some power can corrupt a person. You haven’t had enough to know what it’s like.

No good deed goes unpunished / No act of charity goes unresented.

Offline AliG

  • Semi-Newbie
  • *
  • Posts: 13
Re: RSS encoding issue
« Reply #2 on: December 12, 2018, 07:39:42 AM »
When you try to send a batch text file in the CODE block which contains it ...

Offline Arantor

  • Resident Overthinker
  • SMF Friend
  • SMF Legend
  • *
  • Posts: 71,248
    • StoryBB/StoryBB on GitHub
Re: RSS encoding issue
« Reply #3 on: December 12, 2018, 08:10:15 AM »
Well... it’s legal UTF-8 as evidenced by how it survives in the post itself.

It’s definitely an edge case in RSS feeds, as it isn’t supported in XML 1.0 though it is in XML 1.1 but listed as highly discouraged.

The problem is... what to actually encode it as in that situation? I could conceivably see it becoming a numeric entity but there’s no guarantee it would be handled correctly by RSS feed parsers.

Is it a bug? I think so, but it’s also one of those horrendous corner cases that wouldn’t normally ever come up (I didn’t think anyone emitted literal BELs in batch files as there should be better ways to express that), and if I’m truly honest part of me thinks the correct fix is to actually break your use case anyway. I’d argue that the character should be converted at post save time to U+2407 (SYMBOL FOR BELL), which is the Unicode glyph that actually has visible content. That is kind of important in a forum, but sucks for your use case.

The alternative is to simply attach the file rather than copy paste it into a code block.
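
As a rough illustration of the U+2407 idea above, here is a minimal Python sketch (not SMF code; the function name `to_control_pictures` is made up for this example). It assumes the conversion runs on the post text before saving, and it relies on the Unicode Control Pictures block U+2400..U+241F mirroring the control characters U+0000..U+001F one for one.

```python
def to_control_pictures(text: str) -> str:
    """Replace C0 control characters with their visible Control Pictures glyphs.

    Hypothetical helper for illustration only. Tab, line feed, and carriage
    return are left alone because they are ordinary whitespace in a post.
    """
    keep = {"\t", "\n", "\r"}
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 and ch not in keep:
            # U+2400 + code point gives the matching picture, e.g. 0x07 -> U+2407
            out.append(chr(0x2400 + cp))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A batch-file line containing a literal BEL
    print(to_control_pictures("echo \x07"))
```

The trade-off is the one already mentioned: the post stays readable and feed-safe, but the saved text no longer contains a real BEL, so a copied batch file will not beep.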

Offline AliG

  • Semi-Newbie
  • *
  • Posts: 13
Re: RSS encoding issue
« Reply #4 on: December 12, 2018, 08:16:41 AM »
Can it be encoded like &#7; or &#x07; ?

Offline Arantor

  • Resident Overthinker
  • SMF Friend
  • SMF Legend
  • *
  • Posts: 71,248
    • StoryBB/StoryBB on GitHub
Re: RSS encoding issue
« Reply #5 on: December 12, 2018, 08:22:15 AM »
That was what I meant about numeric entity form, but there’s no guarantee it would be handled correctly anyway. Especially as the core content is emitted as CDATA fields where entity form is “supposed” to not be used. So, no, probably not.

I still think the correct solution here is to attach the batch file rather than trying to fix a problem that ultimately is more specification level than anything else.
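
To show why entity form does not help inside CDATA, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element names `item` and `description` are only an assumed feed shape for the example, not a copy of what SMF emits.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section nothing is treated as markup, so "&#7;" and "&amp;"
# come back as the literal characters that were written.
cdata_doc = "<item><description><![CDATA[beep &#7; &amp; done]]></description></item>"
cdata_text = ET.fromstring(cdata_doc).find("description").text

# Outside CDATA, references are expanded by the parser.
plain_doc = "<item><description>beep &amp; done</description></item>"
plain_text = ET.fromstring(plain_doc).find("description").text

if __name__ == "__main__":
    print(repr(cdata_text))
    print(repr(plain_text))
```

So a numeric reference placed in a CDATA field would reach feed readers as the visible text of the reference, not as the character it names.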

Offline AliG

  • Semi-Newbie
  • *
  • Posts: 13
Re: RSS encoding issue
« Reply #6 on: December 12, 2018, 11:13:23 AM »
I understand your point, but people usually put short snippets into the CODE section.
It is a really rare situation. It can be marked as solved if there is no simple solution for it.

Offline Arantor

  • Resident Overthinker
  • SMF Friend
  • SMF Legend
  • *
  • Posts: 71,248
    • StoryBB/StoryBB on GitHub
Re: RSS encoding issue
« Reply #7 on: December 12, 2018, 11:30:06 AM »
True, but they don’t usually put BELs in there ;)

It’s ultimately not my call to make, though.

Offline Sesquipedalian

  • The Mad Doctor
  • On Hiatus
  • Sr. Member
  • *
  • Posts: 911
  • Gender: Male
  • It works! ... in theory.
    • Sesquipedalian on GitHub
Re: RSS encoding issue
« Reply #8 on: December 19, 2018, 02:09:18 AM »
According to the W3C spec, the only low ASCII control characters allowed in XML 1.0 are tab, line feed, and carriage return. Apparently this is true even when the characters are represented as entities (e.g. as &#7; for the BEL character). All the control characters except NULL can be used in XML 1.1, but virtually no one uses XML 1.1 and the RSS spec requires XML 1.0. So there is simply no way to include a BEL character in an RSS feed.
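
This can be checked quickly with an XML 1.0 parser. The sketch below uses Python's standard `xml.etree.ElementTree` (which is backed by expat); `is_well_formed` is a helper name invented for the example, and other parsers may word their errors differently.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "literal BEL": "<description>beep\x07</description>",
    "BEL as numeric reference": "<description>beep&#7;</description>",
    "tab as numeric reference": "<description>beep&#9;</description>",
}

if __name__ == "__main__":
    for label, doc in samples.items():
        print(f"{label}: {is_well_formed(doc)}")
```

Both BEL forms are rejected, while the tab reference is accepted, which matches the reading of the spec above.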

The only improvement we could make in SMF would be to strip out the disallowed characters when generating the feed.
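
A minimal sketch of that stripping step, written in Python for illustration and not taken from SMF. The name `strip_xml10_invalid` is hypothetical. The character class is the complement of the `Char` production in the XML 1.0 spec (tab, line feed, carriage return, U+0020..U+D7FF, U+E000..U+FFFD, U+10000..U+10FFFF).

```python
import re

# Anything outside the XML 1.0 Char production is removed.
_XML10_INVALID = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml10_invalid(text: str) -> str:
    """Drop characters that an XML 1.0 document may not contain."""
    return _XML10_INVALID.sub("", text)


if __name__ == "__main__":
    # The BEL disappears; ordinary whitespace is preserved.
    print(repr(strip_xml10_invalid("echo \x07done\t\n")))
```

Applying this only when the feed is generated would leave the stored post untouched, so the BEL would still survive in the post itself.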
I promise you nothing.

Sesqu... Sesqui... what?
Sesquipedalian, the best word in the English language.