Overriding Core elements via extensions

  • Sam Ruby
    Message 1 of 3 , Feb 21, 2006
      I've seen a recent suggestion on this list that Atom elements may be
      used to replace core elements in RSS 2.0 that have perceived
      deficiencies. Others have used Dublin Core, xhtml, and content
      namespaces for similar purposes.

      In general, there is no blanket statement that covers each case. In
      some cases, it helps interop. In others, it detracts from interop.

      Consider an RSS 2.0 item which contains only the following elements:
      description
      content:encoded
      xhtml:body
      atom:summary
      atom:content

      That's an extreme case, but in a minute I'll discuss a few real live
      examples that have similar issues. The first point I would like to make
      is that if we were to take a survey of existing implementations, I would
      bet that the most common interpretation of this item would be to prefer
      content:encoded, the least common interpretation would be to prefer
      atom:summary. The core element, description, might very well be the
      median, but the mode (i.e., most common behavior) is likely to be a
      preference for content:encoded.

      But before I go there, let's discuss xhtml:body. This was first
      suggested by Don Box, and is present in dasBlog feeds, and is supported
      by aggregators like Bloglines and RSS Bandit. DasBlog feeds also repeat
      this information, essentially verbatim, in the description.

      WordPress feeds put the full content in content:encoded. In addition,
      they respect the original definition of description and put a plain text
      version of the content there. Plain text descriptions are problematic
      in 2006, but the inclusion of content:encoded significantly mitigates
      the problem as most aggregators prefer it over description.

      TypePad feeds put the full content in content:encoded. In addition,
      they respect a different aspect of the original definition of
      description and put an excerpt (or summary) there. Per the extant
      conventions of the day, the description is escaped HTML.

      Three feeds. Three sets of approaches. Undoubtedly, there are more.
      All are valid. As has already been stated, the ship has already sailed:
      content:encoded is too widely used to be restricted. There may
      be some room for tightening up when xhtml:body, atom:summary, and
      atom:content elements are NOT RECOMMENDED.

      Ideally, aggregator developers would get together and agree on a common
      set of precedence rules. The Feed Validator could help to enforce these.
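
As an illustration of what an agreed set of precedence rules might look like in code, here is a minimal Python sketch. The order in PRECEDENCE is an assumption made up for this example (it simply puts content:encoded first, as most aggregators are said to do); the names pick_content and SAMPLE_ITEM are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the prefixes used in the discussion.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "xhtml": "http://www.w3.org/1999/xhtml",
    "atom": "http://www.w3.org/2005/Atom",
}

# Hypothetical precedence, most preferred first. This order is an
# assumption for illustration, not an agreed rule.
PRECEDENCE = [
    "content:encoded",
    "xhtml:body",
    "atom:content",
    "description",
    "atom:summary",
]

def pick_content(item):
    """Return the name of the first element in PRECEDENCE present in item."""
    for name in PRECEDENCE:
        if item.find(name, NS) is not None:
            return name
    return None

SAMPLE_ITEM = """
<item xmlns:content="http://purl.org/rss/1.0/modules/content/"
      xmlns:atom="http://www.w3.org/2005/Atom">
  <description>An excerpt</description>
  <atom:summary>A summary</atom:summary>
  <content:encoded>&lt;p&gt;The full post&lt;/p&gt;</content:encoded>
</item>
"""

item = ET.fromstring(SAMPLE_ITEM)
print(pick_content(item))
```

Because the rule is a fixed list, the result does not depend on the order in which the elements appear in the item.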

      = = =

      While the set of extensions that may possibly override core elements is
      potentially unbounded, in practice there actually are very few cases.
      And as I said, there is no one blanket statement that covers all cases.

      Consider the other extreme: dc:subject. The only advantage I can
      conceive of dc:subject over category is that it is easier to create a
      subject for "AC/DC" - but even that is a stretch. Despite this, RSS 2.0
      feeds with dc:subject can be found. What makes this case even more
      interesting is that unlike description, multiple category elements are
      permitted. So it may not be a simple matter of precedence: some
      implementations may simply treat these elements as synonyms and collect
      all of them.

      As such, I think that dc:subject and category SHOULD NOT both be
      present in the same item. In fact, a case could be made that dc:subject
      should be discouraged altogether. It is common to use RFC 2119
      terminology to express what amounts to Postel's law here: items SHOULD
      NOT contain dc:subject elements, but feed processors SHOULD treat all
      such items as if they were category elements.
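
On the consuming side, treating dc:subject as a synonym for category could look roughly like the following Python sketch. The function name categories and the sample item are hypothetical, and dropping duplicate values is a design choice of this example rather than something proposed above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def categories(item):
    """Collect category and dc:subject values in document order, skipping duplicates."""
    seen = []
    for el in item:
        if el.tag in ("category", f"{{{DC}}}subject"):
            text = (el.text or "").strip()
            if text and text not in seen:
                seen.append(text)
    return seen

SAMPLE_ITEM = """
<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <category>music</category>
  <dc:subject>AC/DC</dc:subject>
  <dc:subject>music</dc:subject>
</item>
"""

item = ET.fromstring(SAMPLE_ITEM)
print(categories(item))
```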

      I don't know if you want to go that far, I'm just tossing it out as an
      option.

      = = =

      dc:date is widely used as an alternative for pubDate, and avoids some
      nasty internationalization issues that affect a small percentage of
      deployment platforms. Again, I think that both SHOULD NOT ever appear
      in a single item, but feed processors SHOULD treat them as equivalent.
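
A feed processor that treats the two as equivalent has to parse two date formats: pubDate uses the RFC 822 style, while dc:date normally carries a W3CDTF (ISO 8601 profile) value. The Python sketch below is one way to do that with the standard library; item_date is a hypothetical helper, and checking pubDate first is an arbitrary choice since both are not expected to appear together.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

def item_date(item):
    """Return the item's date from pubDate or dc:date, or None if neither is present."""
    pub = item.findtext("pubDate")
    if pub:
        # RFC 822 style, e.g. "Tue, 21 Feb 2006 15:30:00 GMT"
        return parsedate_to_datetime(pub.strip())
    dc = item.findtext(f"{{{DC}}}date")
    if dc:
        text = dc.strip()
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        return datetime.fromisoformat(text)
    return None

rss_item = ET.fromstring(
    "<item><pubDate>Tue, 21 Feb 2006 15:30:00 GMT</pubDate></item>"
)
dc_item = ET.fromstring(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:date>2006-02-21T10:30:00-05:00</dc:date></item>"
)

# Both items describe the same instant, expressed in different formats.
print(item_date(rss_item) == item_date(dc_item))
```

This sketch only covers full date-time values with a time zone; W3CDTF also allows reduced forms such as a bare date, which would need extra handling.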

      = = =

      dc:creator is perhaps second only to content:encoded in terms of
      widespread usage. Unlike author, managingEditor, or webMaster,
      dc:creator is designed as a display name instead of a contact.

      = = =

      Those few elements cover the vast majority of cases that I know of where
      an extension element overrides a core element in widespread
      implementations. Guid vs link merits an entirely separate discussion.
      dc:rights, admin:generatorAgent, dc:language are less frequently used.
      And, of course, there are some specialized applications with (allegedly)
      special needs - itunes being the single biggest example of this.

      There is one element from Atom that I have seen recommended, most
      notably by Randy. Furthermore, this recommendation has been adopted by
      FeedBurner. It is for an atom:link element with a rel="self". I don't
      honestly know how widely supported this recommendation is or whether or
      not the RSS Advisory Board would like to endorse this recommendation.
      In any case, this is not the case of an extension overriding a core element.
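
For readers who have not seen it, the following Python sketch shows what such a channel might look like and how a consumer could read the self link. The feed URL is a made-up example.org address and self_link is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

SAMPLE_FEED = """
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Example</title>
    <link>http://example.org/</link>
    <description>An example feed</description>
    <atom:link rel="self" type="application/rss+xml"
               href="http://example.org/feed.xml"/>
  </channel>
</rss>
"""

def self_link(channel):
    """Return the href of the first atom:link with rel="self", or None."""
    for link in channel.findall(f"{{{ATOM}}}link"):
        if link.get("rel") == "self":
            return link.get("href")
    return None

channel = ET.fromstring(SAMPLE_FEED).find("channel")
print(self_link(channel))
```

Note that the core link element is still present and still points at the web site; the atom:link adds the feed's own address alongside it.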

      = = =

      All of the above is simply offered as observations and/or non-binding
      advice. I simply thought there would be value in trying to scope out
      the complete set. While it is entirely possible that I missed
      something, it was my intent for the list above to be exhaustive.
      Perhaps others reading this can endorse, amend, or disagree with any or
      all of the above.

      Of course, where any or all of this goes is up for discussion. My take,
      for what it is worth: if you look at the complete list, it is relatively
      small. And the interop value for including this information is very high.

      - Sam Ruby
    • Sam Ruby
      Message 2 of 3 , Feb 21, 2006
        Sam Ruby wrote:
        >
        > All of the above is simply offered as observations and/or non-binding
        > advice. I simply thought there would be value in trying to scope out
        > the complete set. While it is entirely possible that I missed
        > something, it was my intent for the list above to be exhaustive.
        > Perhaps others reading this can endorse, amend, or disagree with any or
        > all of the above.

        Of course, as soon as I sent this, I remembered one more. The
        discussion that the draft-1 description of guid "could be interpreted as
        meaning that the feed producer should allocate a new guid if an item
        changes". Like Randy, I don't think that was the intent, but I do
        believe that Andy has two points that should be considered: (1) the
        description of guid needs to be clarified (and I will have more on that
        in a later post), and (2) there is a need for this. Dare has commented
        on this need in his weblog:

        http://www.25hoursaday.com/weblog/CommentView.aspx?guid=4c8d83e9-bc7e-432d-b5b2-07965bd959ad

        Dare initially suggested dcterms:modified; atom:updated was proposed as
        perhaps a better fit.

        - Sam Ruby
      • James Holderness
        Message 3 of 3 , Feb 21, 2006
          Sam Ruby wrote:
          > Consider an RSS 2.0 item which contains only the following elements:
          > description
          > content:encoded
          > xhtml:body
          > atom:summary
          > atom:content

          I ran a couple of tests through my little aggregator collection and these
          were the results:

          Basically everyone supported content:encoded (except Thunderbird), almost
          nobody supported atom extensions (except Sharpreader), and around half
          supported xhtml:body to some extent (prefixed xhtml generally caused
          problems). When they're mixed together in a single item, extensions are
          usually chosen before the standard description element and when multiple
          extensions are supported by an aggregator, the last one encountered usually
          takes preference.

          Aggregators tested: Blogbridge, Bloglines, BottomFeeder, FeedDemon,
          FeedReader, Google Reader, GreatNews, JetBrains Omea, Netvibes, Newsgator
          Online, NewzCrawler, RSSBandit, RSSOwl, Sharpreader, Snarfer and
          Thunderbird.

          Bloglines, BottomFeeder, JetBrains Omea, Newsgator Online, NewzCrawler,
          RSSBandit, Sharpreader and Snarfer all supported xhtml:body. Only Snarfer
          interpreted markup when it was prefixed (i.e. xhtml wasn't the default
          namespace), BottomFeeder never interpreted the markup regardless of whether
          it was prefixed or not, and Bloglines failed to display any content at all
          when the markup was prefixed.

          When all the elements were included in the order you listed, Bloglines,
          JetBrains Omea, Newsgator Online, NewzCrawler, RSSBandit and Snarfer
          displayed xhtml:body, Sharpreader displayed atom:content, Thunderbird
          displayed the description, and everyone else displayed content:encoded.

          When the elements were included in reverse order, FeedDemon, NewzCrawler and
          Thunderbird displayed the description element, Newsgator Online displayed
          xhtml:body, and everyone else displayed content:encoded.
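
The order dependence described above is what you would expect from a parser that simply overwrites its chosen content each time it meets an extension it supports. The Python sketch below models that behaviour for a hypothetical aggregator supporting content:encoded and xhtml:body; it is a guess at the general pattern, not the actual logic of any aggregator tested.

```python
import xml.etree.ElementTree as ET

CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"
XHTML_BODY = "{http://www.w3.org/1999/xhtml}body"
ATOM_SUMMARY = "{http://www.w3.org/2005/Atom}summary"
ATOM_CONTENT = "{http://www.w3.org/2005/Atom}content"

def last_supported_wins(item, supported):
    """Pick the last supported extension in document order, else description."""
    chosen = None
    for el in item:
        if el.tag in supported:
            chosen = el.tag  # a later supported element overwrites an earlier one
    if chosen is None and item.find("description") is not None:
        chosen = "description"
    return chosen

def build_item(tags):
    """Build an item containing one empty child per tag, in the given order."""
    item = ET.Element("item")
    for tag in tags:
        ET.SubElement(item, tag)
    return item

ORDER = ["description", CONTENT_ENCODED, XHTML_BODY, ATOM_SUMMARY, ATOM_CONTENT]
SUPPORTED = {CONTENT_ENCODED, XHTML_BODY}

forward = build_item(ORDER)
reverse = build_item(list(reversed(ORDER)))

print(last_supported_wins(forward, SUPPORTED) == XHTML_BODY)
print(last_supported_wins(reverse, SUPPORTED) == CONTENT_ENCODED)
```

Under this model the same item content yields xhtml:body in the original order and content:encoded in the reversed order, which is why a fixed precedence list would give more predictable results.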

          It's probably also worth noting that everyone interpreted the markup
          correctly when included in the description element although this wasn't
          intended to be a markup test so the example used was as simple as possible.

          Regards
          James