
RFC822 date support

  • James Holderness
    Mar 21, 2006
      Twenty contestants! Thirty-one tests! Spanning three hundred years and
      fourteen time zones! ... Sorry, I got a little over excited there. These are
      the results of my RSS date tests.

      Executive Summary

      The only format guaranteed to work in all aggregators tested is of the form
      "Thu, 09 Feb 2006 23:59:45 +0000". The leading zero in the day is optional
      and the time zone may also be "-0000" or "GMT". The case of the weekday and
      the month should be exactly as recommended in the RFC. The year should
      always be four digits. All parts of the date listed as optional must be
      included (except, as mentioned, the leading zero in the day). No comments
      should be used, and no whitespace other than a single space between the
      various components as shown. Dates too far into the future or past should be
      avoided too.
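
      As a rough illustration of the "safe" form described above, here is a
      minimal Python sketch that emits it. The function name rss_safe_date is
      made up for this example, and the choice of Python's email.utils module
      is an assumption of the sketch, not something any tested aggregator or
      feed generator is known to use.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def rss_safe_date(dt):
    """Format an aware datetime in the conservative form, zone always +0000."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first so the zone is always written as +0000.
    # format_datetime emits the weekday, a two digit day, a four digit
    # year and the seconds, separated by single spaces.
    return format_datetime(dt.astimezone(timezone.utc))

print(rss_safe_date(datetime(2006, 2, 9, 23, 59, 45, tzinfo=timezone.utc)))
# prints: Thu, 09 Feb 2006 23:59:45 +0000
```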

      If you're not worried about working in ALL aggregators (that is, you can
      accept three or fewer failures), I can add that most aggregators supported
      US time zones, numeric time zones (although hours only), a missing weekday,
      and non-standard case for the weekday and month (time zones should always
      be uppercase though).

      For those of you that want more details, read on...

      Aggregators Tested

      Blogbridge, Bloglines, BottomFeeder, FeedDemon, FeedReader, Google Reader,
      GreatNews, Internet Explorer 7, JetBrains Omea, Newsgator Online,
      NewzCrawler, Pluck (Firefox plugin), RSSBandit, RSSOwl, Sharpreader,
      Snarfer, and Thunderbird.

      Not tested: Firefox and Netvibes don't seem to support dates (actually
      Firefox wouldn't even subscribe to my test feed, but I'm fairly sure that
      had nothing to do with the dates). MyYahoo! still doesn't subscribe to any
      of my test feeds (not sure why). I haven't been able to get hold of a Pocket
      PC to test FeederReader, but Greg may be running through the tests himself.
      10000 other aggregators haven't been tested because I don't have access to
      Linux, a Mac or I just haven't bothered downloading them.

      Testing Notes

      For the record, I'm one of the developers working on Snarfer and these test
      feeds were created from a small subset of my unit tests for our date
      routines. Obviously my code is going to pass all my own tests, so for the
      most part I'm going to discount it from the rest of this discussion. If I
      say something along the lines of "no aggregators were capable of parsing
      test X" assume I mean "No aggregator, other than Snarfer". When I say
      something like "4 aggregators failed to parse test X", you should probably
      read that as "4 of 16".

      Also, it should be noted that these tests were designed to find failures.
      The significance of this became apparent to me when Sam suggested testing
      for a single comment at the end of the date. My comment tests were all
      fairly complicated because I was trying to find failures. If you're trying
      to determine what elements of RSS (or in this case RSS dates) are likely to
      be well supported then Sam's kind of test is probably far more useful than
      those that I've been doing. Bear that in mind when considering these
      results.

      Time Zones

      As has been mentioned before, military time zones were improperly specified
      in RFC822. They are described as "carrying no information" in RFC1123, and
      RFC2822 recommends they be considered equivalent to -0000 (essentially UT).
      As a result it's hard to say what is right or wrong about military time zone
      interpretation. The most RFC-accurate interpretation is probably UT (and
      nine aggregators made this choice, myself included). Only one aggregator
      (RSSBandit) followed the original definition in RFC822, but I'm seriously
      considering doing that myself since it's actually the interpretation most
      likely to be correct. Three aggregators chose to interpret military time
      zones as equivalent to the user's local time zone (which I would say is
      probably wrong, but some might argue otherwise). Four failed to parse these
      tests in any meaningful way at all.

      US time zones were parsed successfully by most. Only two aggregators failed
      with any of the tests.

      Hour-based numeric time zones were also parsed successfully by most, also
      with only two failures. However, four aggregators failed to parse numeric
      time zones when minutes were used.

      Support for the various forms of Universal Time was disappointing. Everyone
      could handle "GMT", "+0000" and "-0000", but five failed to parse "UT" and
      four failed to parse "Z".
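
      For readers who want to see what handling all of these zone forms might
      look like, here is a small sketch. The function name zone_offset_minutes
      and the table layout are invented for the example. It takes the RFC2822
      route of treating military zones as equivalent to -0000, which is only
      one of the interpretations discussed above, and it returns None for
      anything it does not recognise.

```python
import re

# US zones named in RFC822, as whole-hour offsets from UT.
US_ZONES = {"EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
            "MST": -7, "MDT": -6, "PST": -8, "PDT": -7}

def zone_offset_minutes(zone):
    """Return the offset from UT in minutes, or None if unrecognised."""
    z = zone.strip().upper()          # zone names are case independent
    if z in ("UT", "GMT", "Z"):
        return 0
    if z in US_ZONES:
        return US_ZONES[z] * 60
    m = re.fullmatch(r"([+-])(\d{2})(\d{2})", z)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        # Include the minutes; an hours-only parser gets +0530 wrong.
        return sign * (int(m.group(2)) * 60 + int(m.group(3)))
    if re.fullmatch(r"[A-IK-Y]", z):
        # Military zone (J is unused): assumed equivalent to -0000 here.
        return 0
    return None
```

      With this sketch, "+0530" gives 330, "est" gives -300, and "UT", "Z" and
      the military "A" all give 0.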

      Case Independence

      Section 3.4.7 of RFC822 notes that "alphabetic strings may be represented in
      any combination of upper and lower case", although they suggest using the
      case shown in the specification when generating values.

      Only one aggregator failed to parse a date when the weekday used
      non-standard case (probably because most ignore the day altogether). Three
      failed when the month used non-standard case. Five failed to parse time
      zones that weren't all uppercase (actually nine failed to pass all the
      tests, but some of those were a result of basic time zone failures).
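
      One way a parser could sidestep case problems is to normalise the
      alphabetic parts before doing anything else. The sketch below is an
      assumption about how that might be done, with an invented function name
      (canonical_case). It does not attempt to handle comments, which would
      need to be removed first.

```python
import re

# Weekday and month names in the case shown in RFC822.
NAMES = {n.lower(): n for n in
         ("Mon Tue Wed Thu Fri Sat Sun "
          "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec").split()}

def canonical_case(date_text):
    """Put weekday/month names in RFC case and everything else in uppercase."""
    def fix(match):
        word = match.group(0)
        # Known names get their canonical form; other words (zones) go upper.
        return NAMES.get(word.lower(), word.upper())
    return re.sub(r"[A-Za-z]+", fix, date_text)

print(canonical_case("thu, 09 FEB 2006 23:59:45 gmt"))
# prints: Thu, 09 Feb 2006 23:59:45 GMT
```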

      Optional Elements

      The day of the month could be 1 or 2 digits according to RFC822. No
      aggregators had any problems with dates using a single digit day.

      The weekday element of a date is defined as optional in RFC822, but three
      aggregators failed to parse dates when it was missing.

      The seconds in the time element are also optional, but four aggregators
      failed to parse dates when that was missing.
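
      The optional weekday, single digit day and optional seconds can all be
      expressed in one pattern. This is only a sketch: DATE_RE and split_date
      are names invented here, it assumes comments and unusual whitespace
      have already been dealt with, and it makes no attempt to validate the
      values it extracts.

```python
import re

DATE_RE = re.compile(
    r"(?:(?P<weekday>[A-Za-z]{3}),\s+)?"          # weekday is optional
    r"(?P<day>\d{1,2})\s+"                         # day may be 1 or 2 digits
    r"(?P<month>[A-Za-z]{3})\s+"
    r"(?P<year>\d{2,4})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2}))?\s+"                  # seconds are optional
    r"(?P<zone>[+-]\d{4}|[A-Za-z]{1,3})"
)

def split_date(text):
    """Split a date into named string parts, or return None if it won't match."""
    m = DATE_RE.fullmatch(text.strip())
    if m is None:
        return None
    parts = m.groupdict()
    # Missing seconds are assumed to mean zero seconds.
    parts["second"] = parts["second"] or "00"
    return parts
```

      For example, split_date("9 Feb 2006 23:59 GMT") returns a dictionary in
      which "weekday" is None, "day" is "9" and "second" is "00".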

      Two digit years were the only allowed form in RFC822, but four digits were
      recommended in RFC1123 and the RSS spec itself. RFC2822 provides specific
      rules for interpreting two digit years (namely 00 to 49 are equivalent to
      2000 to 2049, and 50 to 99 are equivalent to 1950 to 1999). Based on these
      rules eight aggregators failed to parse two digit years successfully. Being
      more lenient and allowing that 96 could be equivalent to 2096 would drop the
      failure rate to five. 96AD as a further valid alternative would drop the
      failure rate to four.
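
      The RFC2822 rule is simple enough to write down directly. The function
      name expand_year is invented for this sketch. The three digit branch is
      not discussed above; it comes from the same paragraph of RFC2822, which
      says three digit years are also interpreted by adding 1900.

```python
def expand_year(year_text):
    """Interpret a year string using the RFC2822 rules for short years."""
    year = int(year_text)
    if len(year_text) == 2:
        # 00 to 49 mean 2000 to 2049; 50 to 99 mean 1950 to 1999.
        return year + (2000 if year <= 49 else 1900)
    if len(year_text) == 3:
        # Three digit years are interpreted by adding 1900.
        return year + 1900
    return year

print(expand_year("06"), expand_year("96"), expand_year("2006"))
# prints: 2006 1996 2006
```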


      Comments

      RFC822 allowed comments almost anywhere within a date. However, RFC2822 only
      permitted comments at the end of a date, although "conformant receivers" are
      still required to parse old-style RFC822 comments. In addition, section
      3.4.2 of RFC822 adds: "When passing text to processes that do not interpret
      text according to this standard [...] exactly ONE SPACE should be used in
      place of arbitrary linear-white-space and comment sequences".

      While RSS uses the RFC822 date format, it does not process RFC822 messages
      and it could be argued, therefore, that it does not interpret text according
      to the RFC822 standard. By my reading that would make comments "not
      recommended", although probably still allowed.

      Regardless of how you interpret the spec, the bottom line is that none of
      the aggregators tested were capable of parsing my initial comment tests. The
      simple comment test that Sam suggested (basically the RFC2822 format) still
      failed on four aggregators.
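
      A parser that wants to tolerate comments could remove them before
      parsing. The sketch below (strip_comments is an invented name) replaces
      each parenthesised comment with a single space, in the spirit of the
      RFC822 text quoted above, and handles nesting and backslash-quoted
      characters inside comments. It is a simplification: a comment in the
      middle of the time, for instance, would leave a stray space behind.

```python
def strip_comments(text):
    """Replace each (possibly nested) comment with one space, then tidy up."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and depth > 0 and i + 1 < len(text):
            i += 2                    # quoted pair inside a comment: skip both
            continue
        if ch == "(":
            if depth == 0:
                out.append(" ")       # the whole comment becomes one space
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)            # ordinary text outside any comment
        i += 1
    # Collapse the runs of whitespace left behind.
    return " ".join("".join(out).split())

print(strip_comments("Thu, 09 Feb 2006 23:59:45 +0000 (a comment)"))
# prints: Thu, 09 Feb 2006 23:59:45 +0000
```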


      Whitespace

      As with comments, RFC822 allowed any amount of whitespace (in the form of
      spaces and horizontal tabs) almost anywhere in a date. Its "folding" rules
      also permitted CRLF followed by at least one whitespace character in order
      to split long headers over multiple lines. RFC2822 is slightly more
      restrictive about where whitespace is allowed, but still requires conformant
      receivers to support the old format. As with comments, it could be argued
      that none of these rules really apply to RSS.

      As far as the testing goes, nine aggregators failed the whitespace tests.
      The worst support was for the test that used horizontal tabs (although the
      failure could also have been a result of where the whitespace was used
      rather than what was being used). The test that included "folding" but
      otherwise used only a single space in "expected" positions only failed on
      six aggregators.
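
      Tolerating tabs and folding mostly comes down to normalising the string
      first. A minimal sketch, with an invented function name
      (normalize_whitespace), assuming the whitespace only appears between
      components where a single space would be expected:

```python
import re

def normalize_whitespace(text):
    """Unfold folded lines and reduce runs of spaces and tabs to one space."""
    # Unfold: a line break followed by a space or tab continues the line.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", text)
    # Collapse every run of spaces and horizontal tabs to a single space.
    return re.sub(r"[ \t]+", " ", unfolded).strip()

print(normalize_whitespace("Thu,\t09 Feb 2006\r\n 23:59:45   +0000"))
# prints: Thu, 09 Feb 2006 23:59:45 +0000
```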

      Past and Future

      In order to test integer overflows for distant past and distant future dates
      I included a couple of tests with dates from 1806, 1906, 2016 and 2106.
      Eight aggregators failed to parse 1806, five failed to parse 2106, four
      failed to parse 1906 and only one failed to parse 2016.
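
      One plausible (but unconfirmed) cause of these failures is storing dates
      as a 32-bit count of seconds since 1970. The sketch below checks which
      of the tested years would fit such a counter. It uses 1 January of each
      year as an arbitrary stand-in for the actual test dates, and the
      function names are invented for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_seconds(dt):
    """Whole seconds between 1970-01-01 UTC and an aware datetime."""
    return (dt - EPOCH) // timedelta(seconds=1)

def fits_int32(n):
    """True if n fits a signed 32-bit integer."""
    return -2**31 <= n <= 2**31 - 1

def fits_uint32(n):
    """True if n fits an unsigned 32-bit integer."""
    return 0 <= n <= 2**32 - 1

for year in (1806, 1906, 2016, 2106):
    secs = epoch_seconds(datetime(year, 1, 1, tzinfo=timezone.utc))
    print(year, secs, fits_int32(secs), fits_uint32(secs))
```

      Under these assumptions 1806 fits neither counter, 1906 fits only the
      signed one, 2016 fits both, and 2106 fits only the unsigned one. That
      is at least consistent with 1806 and 2106 being the most common
      failures, though it says nothing certain about any particular
      aggregator.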


      Conclusion

      RFC822 date support is a whole lot worse than I would have expected. On
      average aggregators failed over 25% of the tests. The worst failed nearly
      half the tests; the best still failed more than 10%. Admittedly these
      numbers don't mean much, but I would have expected better. Another
      interesting point is that every aggregator failed in different ways. This
      would suggest that people are not using standard libraries or else they're
      all using different languages. Those that are using standard libraries don't
      seem to be getting the well-executed, thoroughly-tested implementations they
      might have been hoping for.
