How to strip HTML tags from the RSS/Atom 'description'

Started by Butiks, April 10, 2023, 12:13:56 PM


Butiks

Hello

My forum (SMF v2.1.3) has an RSS feed and an Atom feed:
<link rel="alternate" type="application/rss+xml" title="... - RSS" href=".../index.php?action=.xml;type=rss2">
<link rel="alternate" type="application/atom+xml" title="... - Atom" href=".../index.php?action=.xml;type=atom">
I noticed that in the RSS feed the "description" tag contains HTML. That is not right!

<description><![CDATA[<div class="centertext"><span style="color: #2E8B57;" class="bbc_color"><span style="font-size: 1.35em;" class="bbc_size">Lorem ipsum</span></span></div><b> is simply dummy text of the printing and typesetting industry.</b><br>Lorem Ipsum has been the industry's standard dummy text ever since the 1500s<br><br>[url=&quot;https://github.com/...]]></description>
I need plain text!  :)

Can you tell me what changes are needed in the script ../Sources/News.php to strip all HTML from the "description" tags?
SMF: 2.1.3
Mods: Optimus, Hide Content, Quick Spoiler, Avatars Display Integration, Similar Topics, Simple Colorizer

Aleksi "Lex" Kilpinen

It is right: RSS2 description tags may include text and entity-encoded HTML. CDATA is a workaround to deliver post contents without messing with it.
CDATA stands for Character Data and means that the data could be interpreted as XML markup, but should not be.
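
To illustrate the point, here is a minimal Python sketch (not SMF code) of how an XML parser treats the same HTML with and without a CDATA wrapper. The sample markup is a shortened, made-up version of the description quoted above, and `parse_description` is a hypothetical helper name used only for this example.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical post body; note the unclosed <br>, which is fine in HTML.
HTML_BODY = '<div class="centertext"><b>Lorem ipsum</b></div><br>dummy text'


def parse_description(xml_text):
    """Parse a <description> element and return (child element count, text)."""
    element = ET.fromstring(xml_text)
    return len(element), element.text


# With CDATA the markup is delivered as plain character data.
with_cdata = "<description><![CDATA[" + HTML_BODY + "]]></description>"
child_count, text = parse_description(with_cdata)
print(child_count)        # no child elements were created from the HTML
print(text == HTML_BODY)  # the HTML arrives untouched as a single string

# Without CDATA the parser tries to read the HTML as XML and rejects it.
without_cdata = "<description>" + HTML_BODY + "</description>"
try:
    parse_description(without_cdata)
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```

In this sketch the unwrapped version fails because the unclosed `<br>` is not well-formed XML, which is the kind of breakage CDATA avoids.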
Slava Ukraini!
"Before you allow people access to your forum, especially in an administrative position, you must be aware that that person can seriously damage your forum. Therefore, you should only allow people that you trust, implicitly, to have such access." -Douglas


Arantor

Yup, CDATA is the correct way to send HTML markup in an RSS or Atom feed. That's how you embed content that looks like XML but isn't inside XML: without it, the parser would try to interpret the div and so on as part of the regular XML, and it would fail because those elements are not defined in the DTD or XML schema (or the namespace, for that matter).

But since there is value in exporting rich content in the feed (e.g. for something like Thunderbird to pick up with a formatted preview), the HTML is not stripped but passed along as faithfully as possible to the RSS client.
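
If a particular consumer really does want plain text, one option is to strip the tags on the reading side rather than in the feed itself. The following is only a rough Python sketch of that idea, not a change to Sources/News.php; `TextExtractor` and `html_to_text` are hypothetical names, the sample string is made up, and turning `<br>` into a newline is an assumption about what "plain text" should look like.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Assumption: a line break in the HTML should become a newline in the text.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the fragment's text with all tags dropped."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()


sample = (
    '<div class="centertext"><span class="bbc_size">Lorem ipsum</span></div>'
    "<b> is simply dummy text.</b><br>Second line &amp; more"
)
print(html_to_text(sample))
```

This would be applied to the string a feed reader gets from the description element, so the feed keeps its rich content for clients that can render it.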
Holder of controversial views, all of which my own.

