RSS encoding issue

Started by AliG, December 12, 2018, 07:36:21 AM


There is an issue with RSS. When I use the BEL character (Ctrl-G on the command line, 0x07) in a post, it breaks the RSS feed. Shouldn't it be encoded somehow?

This is the char. Now, if you try to open the RSS feed, it will report an encoding error, and you won't be able to use RSS for some time.


Why are you using the BEL character in the first place?


It comes up when you post a batch file in a CODE block and the file contains it ...


Well... it's legal UTF-8 as evidenced by how it survives in the post itself.

It's definitely an edge case in RSS feeds, as it isn't supported in XML 1.0, though it is in XML 1.1, where it is listed as highly discouraged.

The problem is... what to actually encode it as in that situation? I could conceivably see it becoming a numeric entity but there's no guarantee it would be handled correctly by RSS feed parsers.

Is it a bug? I think so, but it's also one of those horrendous corner cases that wouldn't normally ever come up (I didn't think anyone emitted literal BELs in batch files as there should be better ways to express that), and if I'm truly honest part of me thinks the correct fix is to actually break your use case anyway. I'd argue that the character should be converted at post save time to U+2407 which is the Unicode glyph that actually has visible content, which is kind of important in a forum, but sucks for your use case.

The alternative is to simply attach the file rather than copy paste it into a code block.
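For what it's worth, the U+2407 conversion suggested above could look roughly like the sketch below. It is written in Python purely for illustration (it is not SMF code), the helper name `to_control_pictures` is made up, and it assumes tab, line feed and carriage return should be left untouched.

```python
# Hypothetical helper, not part of SMF: swap C0 control characters for the
# visible "Control Pictures" glyphs in the U+2400 block, so BEL (0x07)
# becomes U+2407. Tab, LF and CR are assumed to be worth keeping as-is.
KEEP = {"\t", "\n", "\r"}


def to_control_pictures(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 and ch not in KEEP:
            out.append(chr(0x2400 + code))  # e.g. 0x07 -> U+2407
        else:
            out.append(ch)
    return "".join(out)


# ascii() is used so the result is readable on any terminal encoding.
print(ascii(to_control_pictures("echo \x07Done\r\n")))
# prints 'echo \u2407Done\r\n'
```

The obvious trade-off is the one already mentioned: the post displays something visible, but a batch file copied back out of it no longer contains a real BEL.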


Can it be encoded like &#7; or &#x07; ?


That was what I meant about numeric entity form, but there's no guarantee it would be handled correctly anyway. Especially as the core content is emitted as CDATA fields where entity form is "supposed" to not be used. So, no, probably not.

I still think the correct solution here is to attach the batch file rather than trying to fix a problem that ultimately is more specification level than anything else.
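A quick way to check the numeric entity idea is to feed the variants to an XML 1.0 parser. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in for "some feed reader"; the `<description>` element and the sample text are only illustrative, and other parsers may be more or less forgiving.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only; a real feed would wrap these in <rss>/<item>.
candidates = {
    "literal BEL": "<description>beep \x07</description>",
    "numeric entity": "<description>beep &#7;</description>",
    "BEL inside CDATA": "<description><![CDATA[beep \x07]]></description>",
    "tab as entity": "<description>beep &#9;</description>",
}


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


results = {name: is_well_formed(doc) for name, doc in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {'parses' if ok else 'rejected'}")
```

With a conforming XML 1.0 parser I would expect only the tab variant to parse, which is why the entity form does not really help here.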


I understand your point, but people usually put short snippets into the CODE section.
It is a really rare situation. The topic can be marked as solved if there is no simple solution for it.


True, but they don't usually put BELs in there ;)

It's ultimately not my call to make, though.


According to the W3C spec, the only low ASCII control characters allowed in XML 1.0 are tab, line feed, and carriage return. Apparently this is true even when the characters are represented as entities (e.g. as &#7; for the BEL character). All the control characters except NUL can be used in XML 1.1, but virtually no one uses XML 1.1 and the RSS spec requires XML 1.0. So there is simply no way to include a BEL character in an RSS feed.

The only improvement we could make in SMF would be to strip out the disallowed characters when generating the feed.
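As a rough idea of what stripping would involve, here is a minimal sketch in Python (again, not SMF's actual PHP code). The character class is the complement of the `Char` production in the XML 1.0 spec; the function name `strip_xml10_invalid` and the sample feed fragment are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML10_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_xml10_invalid(text: str) -> str:
    """Drop characters that cannot appear in an XML 1.0 document."""
    return _XML10_INVALID.sub("", text)


# Hypothetical post body containing a BEL, as in the original report.
body = strip_xml10_invalid("@echo off\r\necho \x07Done")
feed_item = "<description><![CDATA[" + body + "]]></description>"

# Raises xml.etree.ElementTree.ParseError if the fragment were still invalid.
ET.fromstring(feed_item)
```

This only affects what goes into the feed; the stored post would keep its BEL, so the forum page itself would be unchanged.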