
RFC822 date support

  • James Holderness
    Message 1 of 7 , Mar 21 10:27 AM
      Twenty contestants! Thirty-one tests! Spanning three hundred years and
      fourteen time zones! ... Sorry, I got a little over excited there. These are
      the results of my RSS date tests.


      Executive Summary

      The only format guaranteed to work in all aggregators tested is of the form
      "Thu, 09 Feb 2006 23:59:45 +0000". The leading zero in the day is optional
      and the time zone may also be "-0000" or "GMT". The case of the weekday and
      the month should be exactly as recommended in the RFC. The year should
      always be four digits. All parts of the date listed as optional must be
      included (except, as mentioned, the leading zero in the day). No comments
      should be used, and no whitespace other than a single space between the
      various components as shown. Dates too far into the future or past should be
      avoided too.
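
A minimal sketch of producing that safe form, assuming Python and its standard email.utils module (neither is mentioned in the tests; the helper name safe_pubdate is made up for illustration). It always writes the weekday, a two digit day, a four digit year, seconds, and a +0000 zone:

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def safe_pubdate(dt: datetime) -> str:
    """Format an aware datetime in the 'Thu, 09 Feb 2006 23:59:45 +0000' form."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC first so the zone is always written as +0000.
    return format_datetime(dt.astimezone(timezone.utc))

print(safe_pubdate(datetime(2006, 2, 9, 23, 59, 45, tzinfo=timezone.utc)))
# Thu, 09 Feb 2006 23:59:45 +0000
```

format_datetime does not depend on the current locale, so the weekday and month names keep the case recommended in the RFC.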

      If you're not worried about working in ALL aggregators (three or fewer
      failures), I can add that most aggregators supported US time zones,
      numeric time zones (although hours only), a missing weekday, and
      non-standard case for the weekday and month (time zones should always be
      uppercase though).

      For those of you that want more details, read on...


      Aggregators Tested

      Blogbridge, Bloglines, BottomFeeder, FeedDemon, FeedReader, Google Reader,
      GreatNews, Internet Explorer 7, JetBrains Omea, Newsgator Online,
      NewzCrawler, Pluck (Firefox plugin), RSSBandit, RSSOwl, Sharpreader,
      Snarfer, and Thunderbird.

      Not tested: Firefox and Netvibes don't seem to support dates (actually
      Firefox wouldn't even subscribe to my test feed, but I'm fairly sure that
      had nothing to do with the dates). MyYahoo! still doesn't subscribe to any
      of my test feeds (not sure why). I haven't been able to get hold of a Pocket
      PC to test FeederReader, but Greg may be running through the tests himself.
      10000 other aggregators haven't been tested, either because I don't have
      access to Linux or a Mac, or because I just haven't bothered downloading
      them.


      Testing Notes

      For the record, I'm one of the developers working on Snarfer and these test
      feeds were created from a small subset of my unit tests for our date
      routines. Obviously my code is going to pass all my own tests, so for the
      most part I'm going to discount it from the rest of this discussion. If I
      say something along the lines of "no aggregators were capable of parsing
      test X" assume I mean "No aggregator, other than Snarfer". When I say
      something like "4 aggregators failed to parse test X", you should probably
      read that as "4 of 16".

      Also, it should be noted that these tests were designed to find failures.
      The significance of this became apparent to me when Sam suggested testing
      for a single comment at the end of the date. My comment tests were all
      fairly complicated because I was trying to find failures. If you're trying
      to determine what elements of RSS (or in this case RSS dates) are likely to
      be well supported then Sam's kind of test is probably far more useful than
      those that I've been doing. Bear that in mind when considering these
      results.


      Time Zones

      As has been mentioned before, military time zones were improperly specified
      in RFC822. They are described as "carrying no information" in RFC1123, and
      RFC2822 recommends they be considered equivalent to -0000 (essentially UT).
      As a result it's hard to say what is right or wrong about military time zone
      interpretation. The most RFC-accurate interpretation is probably UT (and
      nine aggregators made this choice, myself included). Only one aggregator
      (RSSBandit) followed the original definition in RFC822, but I'm seriously
      considering doing that myself since it's actually the interpretation most
      likely to be correct. Three aggregators chose to interpret military time
      zones as equivalent to the user's local time zone (which I would say is
      probably wrong, but some might argue otherwise). Four failed to parse these
      tests in any meaningful way at all.

      US time zones were parsed successfully by most. Only two aggregators failed
      with any of the tests.

      Hour-based numeric time zones were also parsed successfully by most, also
      with only two failures. However, four aggregators failed to parse numeric
      time zones when minutes were used.

      Support for the various forms of Universal Time was disappointing. Everyone
      could handle "GMT", "+0000" and "-0000", but five failed to parse "UT" and
      four failed to parse "Z".
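
As a rough illustration of the zone handling discussed above, here is a sketch in Python (not the code of any aggregator tested; the names NAMED_ZONES and zone_offset_minutes are hypothetical). It accepts the three spellings of Universal Time, the US zones, numeric zones with minutes, and treats the remaining military letters as UT, which is the RFC2822 reading rather than the original RFC822 one:

```python
import re

# Named zones from RFC822, as offsets from UT in minutes.
NAMED_ZONES = {
    "UT": 0, "GMT": 0, "Z": 0,
    "EST": -300, "EDT": -240,
    "CST": -360, "CDT": -300,
    "MST": -420, "MDT": -360,
    "PST": -480, "PDT": -420,
}

# Military letters other than Z (J is not used).
MILITARY = set("ABCDEFGHIKLMNOPQRSTUVWXY")

def zone_offset_minutes(token: str) -> int:
    """Return the offset from UT in minutes for a zone token, in any case."""
    zone = token.strip().upper()
    if zone in NAMED_ZONES:
        return NAMED_ZONES[zone]
    match = re.fullmatch(r"([+-])(\d{2})(\d{2})", zone)
    if match:
        sign = -1 if match.group(1) == "-" else 1
        return sign * (int(match.group(2)) * 60 + int(match.group(3)))
    if zone in MILITARY:
        # Assumption: follow RFC2822 and treat these as carrying no offset.
        return 0
    raise ValueError(f"unrecognised time zone: {token!r}")
```

For example, zone_offset_minutes("pDt") gives -420 and zone_offset_minutes("+0530") gives 330. Upper-casing the token first also covers the non-standard case tests.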


      Case Independence

      Section 3.4.7 of RFC822 notes that "alphabetic strings may be represented in
      any combination of upper and lower case", although it suggests using the
      case shown in the specification when generating values.

      Only one aggregator failed to parse a date when the weekday used
      non-standard case (probably because most ignore the day altogether). Three
      failed when the month used non-standard case. Five failed to parse time
      zones that weren't all uppercase (actually nine failed to pass all the
      tests, but some of those were a result of basic time zone failures).


      Optional Elements

      The day of the month could be 1 or 2 digits according to RFC822. No
      aggregators had any problems with dates using a single digit day.

      The weekday element of a date is defined as optional in RFC822, but three
      aggregators failed to parse dates when it was missing.

      The seconds in the time element are also optional, but four aggregators
      failed to parse dates when that was missing.
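
To make the optional parts concrete, here is a sketch of a pattern that tolerates a missing weekday, a single digit day, missing seconds, and any case in the weekday and month. It is written in Python purely for illustration (DATE_RE and split_date are made-up names), it ignores comments and folding, and it returns the year and zone exactly as written:

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

DATE_RE = re.compile(
    r"""^\s*
        (?:[A-Za-z]{3}\s*,\s*)?        # optional weekday, ignored
        (?P<day>\d{1,2})\s+            # one or two digit day
        (?P<month>[A-Za-z]{3})\s+      # month name in any case
        (?P<year>\d{2,4})\s+           # year as written
        (?P<hour>\d{2}):(?P<minute>\d{2})
        (?::(?P<second>\d{2}))?        # optional seconds
        \s+(?P<zone>\S+)\s*$
    """,
    re.VERBOSE,
)

def split_date(text):
    """Return (day, month, year, hour, minute, second, zone) or None."""
    m = DATE_RE.match(text)
    if m is None or m.group("month").lower() not in MONTHS:
        return None
    return (
        int(m.group("day")),
        MONTHS.index(m.group("month").lower()) + 1,
        int(m.group("year")),
        int(m.group("hour")),
        int(m.group("minute")),
        int(m.group("second") or 0),   # missing seconds count as zero
        m.group("zone"),
    )
```

With this, split_date("9 fEb 2006 23:59 GMT") gives (9, 2, 2006, 23, 59, 0, "GMT").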

      Two digit years were the only allowed form in RFC822, but four digits were
      recommended in RFC1123 and the RSS spec itself. RFC2822 provides specific
      rules for interpreting two digit years (namely 00 to 49 are equivalent to
      2000 to 2049, and 50 to 99 are equivalent to 1950 to 1999). Based on these
      rules eight aggregators failed to parse two digit years successfully. Being
      more lenient and allowing that 96 could be equivalent to 2096 would drop the
      failure rate to five. 96AD as a further valid alternative would drop the
      failure rate to four.
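
The RFC2822 rule is small enough to show in full. This is a sketch in Python (expand_year is a hypothetical name); it also applies the RFC2822 rule that three digit years are offsets from 1900:

```python
def expand_year(text: str) -> int:
    """Interpret a year field using the RFC2822 rules for short years."""
    year = int(text)
    if len(text) == 2:
        # 00 to 49 become 2000 to 2049, 50 to 99 become 1950 to 1999.
        return year + 2000 if year < 50 else year + 1900
    if len(text) == 3:
        # RFC2822 treats three digit years as offsets from 1900.
        return year + 1900
    return year
```

Under these rules expand_year("96") is 1996, not 2096 or 96AD.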


      Comments

      RFC822 allowed comments almost anywhere within a date. However, RFC2822 only
      permitted comments at the end of a date, although "conformant receivers" are
      still required to parse old-style RFC822 comments. In addition, section
      3.4.2 of RFC822 adds: "When passing text to processes that do not interpret
      text according to this standard [...] exactly ONE SPACE should be used in
      place of arbitrary linear-white-space and comment sequences".

      While RSS uses the RFC822 date format, it does not process RFC822 messages
      and it could be argued, therefore, that it does not interpret text according
      to the RFC822 standard. By my reading that would make comments "not
      recommended", although probably still allowed.

      Regardless of how you interpret the spec, the bottom line is that none of
      the aggregators tested were capable of parsing my initial comment tests. The
      simple comment test that Sam suggested (basically the RFC2822 format) still
      failed on four aggregators.
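
For anyone who wants to support comments, a sketch of one way to remove them before parsing, in Python (strip_comments is a made-up name and this is not the code from any aggregator tested). It tracks nesting depth, skips backslash-escaped characters inside a comment, and leaves a single space where each comment was:

```python
def strip_comments(text: str) -> str:
    """Remove RFC822 style comments, honouring nesting and backslash escapes."""
    out = []
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if depth > 0 and ch == "\\" and i + 1 < len(text):
            i += 2                  # skip the escaped character in a comment
            continue
        if ch == "(":
            if depth == 0:
                out.append(" ")     # a comment acts as a separator
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    # Collapse whitespace runs to the single space suggested in section 3.4.2.
    return " ".join("".join(out).split())
```

Applied to "Thu, 9 Feb 2006 23:59:45 +0000 (GMT)" this returns "Thu, 9 Feb 2006 23:59:45 +0000". Note that a comment directly before the weekday comma leaves a space there ("Thu ,"), so whatever parses the result has to tolerate that.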


      Whitespace

      As with comments, RFC822 allowed any amount of whitespace (in the form of
      spaces and horizontal tabs) almost anywhere in a date. Its "folding" rules
      also permitted CRLF followed by at least one whitespace character in order
      to split long headers over multiple lines. RFC2822 is slightly more
      restrictive about where whitespace is allowed, but still requires conformant
      receivers to support the old format. As with comments, it could be argued
      that none of these rules really apply to RSS.

      As far as the testing goes, nine aggregators failed the whitespace tests.
      The worst support was for the test that used horizontal tabs (although the
      failure could also have been a result of where the whitespace was used
      rather than what was being used). The test that included "folding" but
      otherwise used only a single space in "expected" positions only failed on
      six aggregators.
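
A sketch of the corresponding clean-up step, again in Python and again only an illustration (normalise_whitespace is a hypothetical name). It unfolds CRLF followed by whitespace, collapses runs of spaces and tabs to one space, and removes a stray space before the weekday comma:

```python
import re

def normalise_whitespace(text: str) -> str:
    """Unfold CRLF + whitespace and collapse runs of spaces and tabs."""
    # "Folding" is CRLF followed by at least one space or tab.
    unfolded = re.sub(r"\r\n(?=[ \t])", "", text)
    # Collapse linear whitespace to one space and trim both ends.
    collapsed = re.sub(r"[ \t]+", " ", unfolded).strip()
    # Remove the space that may now sit before the weekday comma.
    return collapsed.replace(" ,", ",")
```

Both "\tThu , \t 09 Feb\t2006 23:59:45\t +0000" and "\r\n Thu,\r\n 09 Feb 2006\r\n 23:59:45 +0000\r\n" come out as "Thu, 09 Feb 2006 23:59:45 +0000".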


      Past and Future

      In order to test integer overflows for distant past and distant future dates
      I included a couple of tests with dates from 1806, 1906, 2016 and 2106.
      Eight aggregators failed to parse 1806, five failed to parse 2106, four
      failed to parse 1906 and only one failed to parse 2016.
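
One plausible source of such failures is storing the date as seconds since 1970 in a 32-bit integer. That is an assumption on my part about the implementations, not something the tests can show. The sketch below, in Python with made-up helper names, checks which of the four test dates fit in a signed or unsigned 32-bit count of seconds:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_seconds(dt: datetime) -> int:
    """Whole seconds between the Unix epoch and an aware datetime."""
    return (dt - EPOCH) // timedelta(seconds=1)

def fits_signed32(dt: datetime) -> bool:
    return -2**31 <= epoch_seconds(dt) <= 2**31 - 1

def fits_unsigned32(dt: datetime) -> bool:
    return 0 <= epoch_seconds(dt) <= 2**32 - 1

for year in (1806, 1906, 2016, 2106):
    dt = datetime(year, 2, 9, 23, 59, 45, tzinfo=timezone.utc)
    print(year, fits_signed32(dt), fits_unsigned32(dt))
# 1806 False False
# 1906 True False
# 2016 True True
# 2106 False False
```

The 2106 date falls just past the point where an unsigned 32-bit counter wraps, and 1806 is out of range for both representations.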


      Conclusions

      RFC822 date support is a whole lot worse than I would have expected. On
      average aggregators failed over 25% of the tests. The worst failed nearly
      half the tests; the best still failed more than 10%. Admittedly these
      numbers don't mean much, but I would have expected better. Another
      interesting point is that every aggregator failed in different ways. This
      would suggest that people are not using standard libraries or else they're
      all using different languages. Those that are using standard libraries don't
      seem to be getting the well-executed, thoroughly-tested implementations they
      might have been hoping for.

      Regards
      James
    • Charles Iliya Krempeaux
      Message 2 of 7 , Mar 21 11:36 AM
        Hello,

        On 3/21/06, James Holderness <j4_james@...> wrote:

        [...]
         

        Conclusions

        RFC822 date support is a whole lot worse than I would have expected. On
        average aggregators failed over 25% of the tests. The worst failed nearly
        half the tests; the best still failed more than 10%. Admittedly these
        numbers don't mean much, but I would have expected better. Another
        interesting point is that every aggregator failed in different ways. This
        would suggest that people are not using standard libraries or else they're
        all using different languages. Those that are using standard libraries don't
        seem to be getting the well-executed, thoroughly-tested implementations they
        might have been hoping for.

        Just out of curiosity, do you have a list of the "standard libraries" for handling these types of dates?


        See ya

        --
            Charles Iliya Krempeaux, B.Sc.

            charles @ reptile.ca
            supercanadian @ gmail.com

            developer weblog: http://ChangeLog.ca/

      • ecomputerd
        Message 3 of 7 , Mar 21 2:12 PM
          James,

          I have tested FeederReader on the Pocket PC and (now, after updates)
          it passes all 31 date parsing tests, including comments and nested
          comments.

          Greg Smith
        • robertsayre2000
          Message 4 of 7 , Mar 21 2:26 PM
            --- In rss-public@yahoogroups.com, "James Holderness" <j4_james@...>
            wrote:
            > Only one aggregator(RSSBandit) followed the original definition in
            > RFC822, but I'm seriously considering doing that myself since it's
            > actually the interpretation most likely to be correct.

            Thunderbird uses the reasonably well-tested Mozilla Mail/News library
            to parse RSS dates, but the syndication code screens things out with a
            relatively stringent regex. Could you list the test failures for
            Thunderbird specifically? I would like to file bugs on them, if necessary.

            thanks,
            Rob
          • James Holderness
            Message 5 of 7 , Mar 21 3:38 PM
              robertsayre2000 wrote:
              > Thunderbird uses the reasonably well-tested Mozilla Mail/News library
              > to parse RSS dates, but the syndication code screens things out with a
              > relatively stringent regex. Could you list the test failures for
              > Thunderbird specifically? I would like to file bugs on them, if necessary.

              Time Zones:
              Thunderbird interprets any military time zone as being equivalent to the
              user's local time zone. Personally I consider that wrong but for these tests
              I didn't count that against it. However I would expect it to be capable of
              parsing the Z time zone as GMT in the following example:

              Thu, 09 Feb 2006 23:59:45 Z

              Case Independence:
              These tests were all meant to test non-standard case, but the last one is
              presumably just a continuation of the military time zone bug.

              tHu, 09 Feb 2006 23:59:45 +0000
              Thu, 09 fEb 2006 23:59:45 +0000
              Thu, 09 Feb 2006 16:59:45 pDt
              Thu, 09 Feb 2006 23:59:45 z

              Comments:
              Note the backslash in the second example is part of the test.

              Thu(oh how I hate thursdays), 9(th) Feb(ruary) 2006 23:59:45 +0000(GMT r0x0rz)
              Thu, 09 (nested (comment) Mar)Feb(escaped comment\) 2005) 2006()23:59:45 +0000
              Thu, 9 Feb 2006 23:59:45 +0000 (GMT)

              Whitespace:
              Tabs and CR/LF are shown using C escaping.

              \tThu , \t 09 Feb\t2006 23:59:45\t +0000
              \r\n Thu,\r\n 09 Feb 2006\r\n 23:59:45 +0000\r\n

              Past and Future:
              The results Thunderbird produced for these were quite weird.

              Sun, 09 Feb 1806 23:59:45 +0000
              Fri, 09 Feb 1906 23:59:45 +0000
              Tue, 09 Feb 2106 23:59:45 +0000

              The tests were run on Windows XP with version 1.6a1 (20051215).

              Regards
              James
            • James Holderness
              Message 6 of 7 , Mar 21 3:39 PM
                Thanks Greg. That's excellent.

                ecomputerd wrote:

                > I have tested FeederReader on the Pocket PC and (now, after updates)
                > it passes all 31 date parsing tests, including comments and nested
                > comments.
              • James Holderness
                Message 7 of 7 , Mar 21 3:57 PM
                  Charles Iliya Krempeaux wrote:
                  > Just out of curiosity, do you have a list of the "standard libraries" for
                  > handling these types of dates?

                  I'm afraid not. I know PHP has a strtotime function which I believe is
                  supposed to be capable of parsing RFC822 dates. In .NET you should be able
                  to use DateTime.ParseExact with the "r" format. I would expect languages
                  like Python and Perl would have similar functions or libraries with
                  equivalent functionality. I do most of my development in low level C++ and
                  assembler, though, so my knowledge of libraries is pretty much zero.
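
For Python, the standard library's email.utils module does have such a function, parsedate_to_datetime. A minimal sketch of trying it against a few of the test strings from this thread follows (try_parse is a made-up wrapper, and no claim is made here about which of the 31 tests it passes). Depending on the Python version a failed parse surfaces as TypeError or ValueError, so both are caught:

```python
from email.utils import parsedate_to_datetime

def try_parse(text):
    """Return a datetime for an RFC822 style date, or None if parsing fails."""
    try:
        return parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None

samples = [
    "Thu, 09 Feb 2006 23:59:45 +0000",
    "Thu, 09 Feb 2006 23:59:45 Z",
    "tHu, 09 Feb 2006 23:59:45 +0000",
    "Thu, 9 Feb 2006 23:59:45 +0000 (GMT)",
]
for sample in samples:
    print(repr(sample), "->", try_parse(sample))
```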

                  Regards
                  James