
failing aggregator tests using HTML in RSS 1.0

  • Ken MacLeod
    Message 1 of 1, Jun 5, 2004
      Sam Ruby recently created some aggregator test cases for Atom, which
      he also ported to RSS 1.0 and RSS 2.0 [1]. Not surprisingly, it
      showed up some failing tests for aggregators using RSS 1.0 as well.

      In these particular tests, Sam is using utf-16 encoded feeds with the
      word Internationalization spelled using a variety of international
      characters and character references. In the Atom tests, Sam is using
      an Atom feature that allows entries to include escaped HTML markup in
      the <title> and <content> elements. In porting these to RSS, however,
      Sam kept the test cases that included escaped HTML markup, which RSS
      1.0 does not allow for in <title> and <description> (the Content
      module does support it in content:encoded, however).
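      To make the distinction concrete, here is a minimal Python sketch
      using the standard library's xml.etree.ElementTree. The feed is a
      made-up, trimmed RSS 1.0 item (no channel) encoded as UTF-16 like
      Sam's tests, and item_fields is a hypothetical helper. Following the
      reading above, after XML parsing the title and description are
      already the final text to show, while content:encoded still holds
      HTML that has to be interpreted as markup.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, RSS 1.0, and the RSS 1.0 Content module.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical, trimmed test feed (no <channel>), encoded as UTF-16 with a BOM.
FEED = """<?xml version="1.0" encoding="utf-16"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <item rdf:about="http://example.org/1">
    <title>I&#241;t&#235;rn&#226;ti&#244;n&#224;liz&#230;ti&#248;n</title>
    <description>&lt;b&gt;not bold&lt;/b&gt;</description>
    <content:encoded>&lt;b&gt;bold&lt;/b&gt; &amp;eacute;</content:encoded>
  </item>
</rdf:RDF>""".encode("utf-16")


def item_fields(feed_bytes):
    """Return (title, description, content:encoded) of the first item as parsed text."""
    root = ET.fromstring(feed_bytes)  # expat detects UTF-16 from the BOM
    item = root.find("rss:item", NS)
    return (
        item.findtext("rss:title", namespaces=NS),        # final plain text
        item.findtext("rss:description", namespaces=NS),  # final plain text
        item.findtext("content:encoded", namespaces=NS),  # HTML, interpreted once more
    )
```

      Here the description comes out as the literal characters
      "<b>not bold</b>", which is what the reader should see; only the
      content:encoded value should be handed to an HTML renderer.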

      It's not the misunderstanding of escaped markup in the tests that's an
      issue, but the rendering of the tests by aggregators and the
      misunderstanding of the results. For several aggregators that were
      tested using RSS 1.0, the testers indicated that the aggregator
      "passed" if it rendered the "expected" international characters, even
      in the escaped HTML markup.

      The correct results would be for the characters of the escaped HTML
      markup to appear literally to the user/tester. For example, this
      title in Atom, not using the escaped-HTML feature:

      <atom:title>An accented character: &amp;eacute;</atom:title>

      and this title in RSS 1.0:

      <rss1:title>An accented character: &amp;eacute;</rss1:title>

      should appear to the user as:

      An accented character: &eacute;

      For any readers whose display isn't showing that, those are the
      literal characters "& e a c u t e ;".
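      Here's a rough Python sketch of the rendering side, assuming an
      aggregator that builds an HTML page. The helper names are
      hypothetical, and displayed() is only a crude stand-in for what a
      browser shows for tag-free text. The title element in the XML
      contains "&amp;eacute;", so its parsed text is "&eacute;"; escaping
      that text before putting it into HTML keeps it literal, while
      pasting it in unescaped produces the "é" that some testers counted
      as a pass.

```python
import html
import xml.etree.ElementTree as ET


def parsed_title(xml_fragment):
    """Return the character data of a <title> element after XML parsing."""
    return ET.fromstring(xml_fragment).text


def render_title(text):
    """Correct: treat the title as plain text and escape it for the HTML page."""
    return "<h2>" + html.escape(text, quote=False) + "</h2>"


def render_title_buggy(text):
    """The failure mode: pasting plain text into HTML as if it were markup."""
    return "<h2>" + text + "</h2>"


def displayed(h2_html):
    """Crude stand-in for what a browser shows for a tag-free <h2> body."""
    inner = h2_html[len("<h2>"):-len("</h2>")]
    return html.unescape(inner)


if __name__ == "__main__":
    title = parsed_title(
        '<title xmlns="http://purl.org/rss/1.0/">'
        "An accented character: &amp;eacute;</title>"
    )
    print(title)                                 # An accented character: &eacute;
    print(displayed(render_title(title)))        # An accented character: &eacute;
    print(displayed(render_title_buggy(title)))  # An accented character: é
```

      The design choice is simply that plain-text fields get one round of
      HTML escaping on output, so whatever characters the feed delivered
      are exactly what the user sees.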

      What to do about this? Unfortunately, it is likely that this is a
      widespread error.

      The first step would be to file a bug report against the Feed
      Validator, with test cases and patches, so that it gives a warning
      for this issue (it's not an error, as the characters themselves are
      valid, but users might be unaware that they will be rendered
      literally).
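      As a starting point for such a patch, here's a minimal Python sketch
      of the check, assuming we only look at RSS 1.0 title and description
      text after XML parsing. The function name and both regular
      expressions are hypothetical heuristics of my own, not how the Feed
      Validator is actually implemented.

```python
import re
import xml.etree.ElementTree as ET

RSS1 = "http://purl.org/rss/1.0/"

# Hypothetical heuristics (not Feed Validator rules): parsed text that still
# looks like an HTML tag or an entity/character reference.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*\b[^<>]*>")
REF_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def escaped_markup_warnings(feed_bytes):
    """Return (element name, text) pairs for RSS 1.0 title/description text
    that will be displayed literally but looks like escaped HTML."""
    root = ET.fromstring(feed_bytes)
    warnings = []
    for name in ("title", "description"):
        # Clark notation "{namespace}local" matches RSS 1.0 elements anywhere.
        for elem in root.iter("{%s}%s" % (RSS1, name)):
            text = elem.text or ""
            if TAG_RE.search(text) or REF_RE.search(text):
                warnings.append((name, text))
    return warnings
```

      This reports a warning rather than rejecting the feed, matching the
      point that the characters are valid. It will have false positives,
      for example a title that is genuinely about HTML.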

      The next step would be to expand on Sam's tests with RSS 1.0 test
      feeds, in particular using his earlier UTF-8 tests which weren't
      ported to RSS 1.0.

      With the Feed Validator fixed and a suite of tests, we could then
      begin guiding aggregator developers and feed producers in correcting
      their software and feeds.

      Thoughts?

      -- Ken

      [1] http://www.intertwingly.net/blog/2004/06/03/Aggregator-utf-16-tests