Overriding Core elements via extensions
- I've seen a recent suggestion on this list that Atom elements may be
used to replace core elements in RSS 2.0 that have perceived
deficiencies. Others have used Dublin Core, xhtml, and content
namespaces for similar purposes.
In general, there is no blanket statement that covers each case. In
some cases, it helps interop. In others, it detracts from interop.
Consider an RSS 2.0 item which contains only the following elements:
That's an extreme case, but in a minute I'll discuss a few real live
examples that have similar issues. The first point I would like to make
is that if we were to take a survey of existing implementations, I would
bet that the most common interpretation of this item would be to prefer
content:encoded, the least common interpretation would be to prefer
atom:summary. The core element, description, might very well be the
median, but the mode (i.e., most common behavior) is likely to be a
preference for content:encoded.
But before I go there, let's discuss xhtml:body. This was first
suggested by Don Box, and is present in dasBlog feeds, and is supported
by aggregators like Bloglines and RSS Bandit. DasBlog feeds also repeat
this information, essentially verbatim, in the description.
WordPress feeds put the full content in content:encoded. In addition,
they respect the original definition of description and put a plain text
version of the content there. Plain text descriptions are problematic
in 2006, but the inclusion of content:encoded significantly mitigates
the problem as most aggregators prefer it over description.
TypePad feeds put the full content in content:encoded. In addition,
they respect a different aspect of the original definition of
description and put an excerpt (or summary) there. Per the extant
conventions of the day, the description is escaped HTML.
Three feeds. Three sets of approaches. Undoubtedly, there are more.
All are valid. As has already been stated, the ship has already sailed:
content:encoded is too widely used to restrict. There may
be some room for tightening up when xhtml:body, atom:summary, and
atom:content elements are NOT RECOMMENDED.
Ideally, aggregator developers would get together and agree on a common
set of precedence rules. The Feed Validator could help to enforce these.
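To make the idea of a shared precedence rule concrete, here is a minimal sketch in Python. The order in PRECEDENCE is purely an assumption for illustration, not an agreed rule, and the sample item is hypothetical (it is not the example item from earlier in this message). The point is only that a fixed list gives the same answer regardless of the order in which the elements appear in the feed.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the extension elements discussed in this thread.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "xhtml": "http://www.w3.org/1999/xhtml",
    "atom": "http://www.w3.org/2005/Atom",
}

# Hypothetical precedence order (an assumption, not an agreed convention).
PRECEDENCE = [
    "content:encoded",
    "xhtml:body",
    "atom:content",
    "description",
    "atom:summary",
]

def pick_body(item):
    """Return (element name, text) for the highest-precedence element present."""
    for name in PRECEDENCE:
        el = item.find(name, NS)
        if el is not None:
            # itertext() also gathers text nested inside child markup,
            # which matters for xhtml:body.
            return name, "".join(el.itertext()).strip()
    return None, ""

# Hypothetical sample item; document order deliberately differs from PRECEDENCE.
SAMPLE = """<item xmlns:content="http://purl.org/rss/1.0/modules/content/"
      xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:summary>Atom summary</atom:summary>
  <description>Plain description</description>
  <content:encoded><![CDATA[<p>Full content</p>]]></content:encoded>
</item>"""

item = ET.fromstring(SAMPLE)
print(pick_body(item))  # ('content:encoded', '<p>Full content</p>')
```

A validator could, under the same assumption, warn whenever an item carries more than one of the listed elements.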
= = =
While the set of extensions that may possibly override core elements is
potentially unbounded, in practice there actually are very few cases.
And as I said, there is no one blanket statement that covers all cases.
Consider the other extreme: dc:subject. The only advantage I can
conceive of dc:subject over category is that it is easier to create a
subject for "AC/DC" - but even that is a stretch. Despite this, RSS 2.0
feeds with dc:subject can be found. What makes this case even more
interesting is that unlike description, multiple category elements are
permitted. So it may not be a simple matter of precedence; some
implementations may simply treat these elements as synonyms and collect
all of them.
As such, I think that dc:subject and category SHOULD NOT both be
present in the same item. In fact, a case could be made that dc:subject
should be discouraged altogether. It is common to use RFC 2119
terminology to express what amounts to Postel's law here: items SHOULD
NOT contain dc:subject elements, but feed processors SHOULD treat all
such items as if they were category elements.
I don't know if you want to go that far, I'm just tossing it out as an
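On the consuming side, treating dc:subject as a synonym for category could look roughly like the sketch below. The function name item_categories, the de-duplication step, and the sample item are all assumptions made for illustration; nothing here is taken from an existing aggregator.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def item_categories(item):
    """Collect category and dc:subject values as one de-duplicated list."""
    seen = []
    for el in item:
        # Treat dc:subject exactly as if it were a category element.
        if el.tag in ("category", f"{{{DC}}}subject"):
            value = (el.text or "").strip()
            if value and value not in seen:
                seen.append(value)
    return seen

# Hypothetical item that mixes both elements.
SAMPLE = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <category>music</category>
  <dc:subject>AC/DC</dc:subject>
  <dc:subject>music</dc:subject>
</item>"""

print(item_categories(ET.fromstring(SAMPLE)))  # ['music', 'AC/DC']
```

Document order is preserved, so neither element takes precedence over the other; they are simply merged.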
= = =
dc:date is widely used as an alternative for pubDate, and avoids some
nasty internationalization issues that affect a small percentage of
deployment platforms. Again, I think that both SHOULD NOT ever appear
in a single item, but feed processors SHOULD treat them as equivalent.
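A feed processor that treats the two as equivalent might normalize them along these lines. This is only a sketch: the helper name item_date, the choice to prefer pubDate when both are present, and the sample dates are assumptions. It also handles only the full date-time form of dc:date, not the shorter W3CDTF forms such as a bare year and month.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime

DC = "http://purl.org/dc/elements/1.1/"

def item_date(item):
    """Return a datetime from pubDate or dc:date, or None if neither exists."""
    pub = item.findtext("pubDate")
    if pub:
        # pubDate uses the RFC 822 style date format.
        return parsedate_to_datetime(pub.strip())
    dc = item.findtext(f"{{{DC}}}date")
    if dc:
        # Older versions of fromisoformat() reject a trailing "Z",
        # so rewrite it as an explicit UTC offset first.
        return datetime.fromisoformat(dc.strip().replace("Z", "+00:00"))
    return None

# Two hypothetical items carrying the same instant in the two formats.
a = ET.fromstring("<item><pubDate>Sat, 07 Sep 2002 09:42:31 GMT</pubDate></item>")
b = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:date>2002-09-07T09:42:31Z</dc:date></item>"
)
print(item_date(a) == item_date(b))
```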
= = =
dc:creator is perhaps second only to content:encoded in terms of
widespread usage. Unlike author, managingEditor, or webMaster,
dc:creator is designed as a display name instead of a contact.
= = =
Those few elements cover the vast majority of cases that I know of where
an extension element overrides a core element in widespread
implementations. Guid vs link merits an entirely separate discussion.
dc:rights, admin:generatorAgent, dc:language are less frequently used.
And, of course, there are some specialized applications with (allegedly)
special needs - itunes being the single biggest example of this.
There is one element from Atom that I have seen recommended, most
notably by Randy. Furthermore, this recommendation has been adopted by
FeedBurner. It is for an atom:link element with a rel="self". I don't
honestly know how widely supported this recommendation is or whether or
not the RSS Advisory Board would like to endorse this recommendation.
In any case, this is not the case of an extension overriding a core element.
= = =
All of the above is simply offered as observations and/or non-binding
advice. I simply thought there would be value in trying to scope out
the complete set. While it is entirely possible that I missed
something, it was my intent for the list above to be exhaustive.
Perhaps others reading this can endorse, amend, or disagree with any or
all of the above.
Of course, where any or all of this goes is up for discussion. My take,
for what it is worth, is that if you look at the complete list, it is
relatively small. And the interop value of including this information is very high.
- Sam Ruby
- Sam Ruby wrote:
> All of the above is simply offered as observations and/or non-binding
> advice. I simply thought there would be value in trying to scope out
> the complete set. While it is entirely possible that I missed
> something, it was my intent for the list above to be exhaustive.
> Perhaps others reading this can endorse, amend, or disagree with any or
> all of the above.
Of course, as soon as I sent this, I remembered one more. The
discussion that the draft-1 description of guid "could be interpreted as
meaning that the feed producer should allocate a new guid if an item
changes". Like Randy, I don't think that was the intent, but I do
believe that Andy has two points that should be considered (1) the
description of guid needs to be clarified (and I will have more on that
in a later post), and (2) there is a need for this. Dare has commented
on this need in his weblog:
Dare initially suggested dcterms:modified; atom:updated was proposed as
perhaps a better fit.
- Sam Ruby
- Sam Ruby wrote:
> Consider an RSS 2.0 item which contains only the following elements:
I ran a couple of tests through my little aggregator collection and these
were the results:
Basically everyone supported content:encoded (except Thunderbird), almost
nobody supported atom extensions (except Sharpreader), and around half
supported xhtml:body to some extent (prefixed xhtml generally caused
problems). When they're mixed together in a single item, extensions are
usually chosen before the standard description element and when multiple
extensions are supported by an aggregator, the last one encountered usually
Aggregators tested: Blogbridge, Bloglines, BottomFeeder, FeedDemon,
FeedReader, Google Reader, GreatNews, JetBrains Omea, Netvibes, Newsgator
Online, NewzCrawler, RSSBandit, RSSOwl, Sharpreader, Snarfer and Thunderbird.
Bloglines, BottomFeeder, JetBrains Omea, Newsgator Online, NewzCrawler,
RSSBandit, Sharpreader and Snarfer all supported xhtml:body. Only Snarfer
interpreted markup when it was prefixed (i.e. xhtml wasn't the default
namespace), BottomFeeder never interpreted the markup regardless of whether
it was prefixed or not, and Bloglines failed to display any content at all
when the markup was prefixed.
When all the elements were included in the order you listed, Bloglines,
JetBrains Omea, Newsgator Online, NewzCrawler, RSSBandit and Snarfer
displayed xhtml:body, Sharpreader displayed atom:content, Thunderbird
displayed the description, and everyone else displayed content:encoded.
When the elements were included in reverse order, FeedDemon, NewzCrawler and
Thunderbird displayed the description element, Newsgator Online displayed
xhtml:body, and everyone else displayed content:encoded.
It's probably also worth noting that everyone interpreted the markup
correctly when included in the description element although this wasn't
intended to be a markup test so the example used was as simple as possible.