[ISO8601] Re: Clarifications (message 196)

  • g1smd@amsat.org
    Aug 1, 2001
      On 2001-Jul-17 Pete Forman wrote:


      >> I agree that this is a bit sloppy. It needs rewriting, or some
      >> additional notes. The minimum required, would be to state that
      >> the year may be specified by two, or by four, or by more digits;
      >> and I see a problem here... 121212 is assumed to be YYMMDD,
      >> but this could be the YYYYYY 121212.

      > No, were an expanded representation to be used for a six digit
      > year it would be +121212.

      I did already refer to this in another paragraph, where I noted:
      >>> .... I see a problem here... 121212 is assumed to be YYMMDD,
      >>> but this could be the YYYYYY 121212. Having re-read the
      >>> standard I see that para 4.7 does cover this. Additionally,
      >>> para 4.8 does say that elements do all have a defined length,
      >>> and that leading zeroes must be used to fulfil this. ....
      but I forgot to repeat that note with the above text.
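The fixed-length rule is what keeps a six-digit year from colliding with YYMMDD. A minimal Python sketch, assuming a four-digit basic year and a six-digit expanded year (the expanded width is left to mutual agreement, six is only an example, and both function names are hypothetical):

```python
def format_year(year: int) -> str:
    """Basic year element: always four digits, zero-padded."""
    if not 0 <= year <= 9999:
        raise ValueError("year needs an expanded representation")
    return f"{year:04d}"


def format_expanded_year(year: int, digits: int = 6) -> str:
    """Expanded year: always signed, zero-padded to the agreed width (assumed 6)."""
    if abs(year) >= 10 ** digits:
        raise ValueError(f"year does not fit in {digits} digits")
    sign = "-" if year < 0 else "+"
    return f"{sign}{abs(year):0{digits}d}"


print(format_year(50))               # 0050
print(format_expanded_year(121212))  # +121212
print(format_expanded_year(-50))     # -000050
```

Because the expanded form always carries a sign, +121212 cannot be mistaken for the bare six-digit string 121212.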

      >> Also, the 'mutual agreement' problem appears here again.
      >> Representations that have the prescribed leading hyphens omitted
      >> can be used only by mutual agreement... except that the format at j
      >> seems to be the default, rather than the format stated in i. That
      >> is, for all the others, mutual agreement is required, but for j it
      >> has already been forced upon us to agree to this. In doing this,
      >> you get the 'logic error' with the RIGHT c entry being disallowed
      >> (in order to satisfy the {non written, as far as I can see} rule
      >> that any representation can have only one implied meaning unless
      >> mutual agreement has already been obtained).

      > As I said in my previous message I consider that there are two
      > possible reasons for omitting hyphens. Mutual agreement is one.
      > The other is that hyphens should/must only be used to stand for
      > omitted components in order to disambiguate.

      Unfortunately, since ISO mixed their logic in deciding on YYMMDD
      over YYYYMM, this skews the expected logical 'pattern' of allowed
      formats, as shown in the tables in my message posted 2001-Jul-16.
      I think their choices of 'default' formats are somewhat arbitrary.

      >> I guess they had to include YYMMDD and exclude YYYYMM simply
      >> because millions of computer systems were already using YYMMDD.

      > Not necessarily. We are talking about a date format; it
      > is reasonable to give preference to the full precision
      > interpretation over the reduced one.

      So why isn't 1212 decoded as MMDD, instead of YYYY?
      It seems very odd to me that the formats go:
      12121212 YYYYMMDD
      121212 YYMMDD
      1212 YYYY
      12 YY (19 of 1950)
      Surely, life would be much easier if 121212 were YYYYMM?
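Read as a lookup keyed on string length, the table above looks like the following Python sketch. It covers all-digit basic strings only, and `decode_basic` is a hypothetical name:

```python
# Length of an all-digit basic-format date string -> pattern,
# following the table above.
ISO_BASIC_BY_LENGTH = {
    8: "YYYYMMDD",  # complete calendar date
    6: "YYMMDD",    # truncated: century implied
    4: "YYYY",      # reduced precision: year only
    2: "YY",        # reduced precision: century only (19 of 1950)
}


def decode_basic(digits: str) -> str:
    if not digits.isdigit():
        raise ValueError("this sketch handles all-digit strings only")
    pattern = ISO_BASIC_BY_LENGTH.get(len(digits))
    if pattern is None:
        raise ValueError(f"no all-digit date form of length {len(digits)}")
    return pattern


for s in ["12121212", "121212", "1212", "12"]:
    print(s, decode_basic(s))
```

The length-6 slot is already occupied by YYMMDD, so there is no room left in this scheme for YYYYMM.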

      I was expecting one of the following patterns:
      12121212 YYYYMMDD
      121212 YYYYMM
      1212 YYYY
      12 YY (19 of 1950)

      12121212 YYYYMMDD
      121212 YYMMDD
      1212 MMDD
      12 DD

      12121212 YYYYMMDD
      121212 YYMMDD
      1212 YYYY
      12 YY (50 of 1950)
      The last three all have a logical pattern to them, whereas
      the first table (as derived from the ISO 8601 standard) does
      not have a logical pattern. Have another look at the various
      tables in my previous message (the one dated 2001-Jul-16)
      for further information.
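One way to make the 'logical pattern' argument testable, sketched in Python under one possible framing (not one stated in the standard): a table is consistent if every bare two-digit year field in it refers to the same part of the year. The ISO table uses YY for the year within the century in YYMMDD, but for the century (19 of 1950) when it stands alone; the three expected tables never mix the two. The table labels are hypothetical.

```python
# For each table: length -> (pattern, what a two-digit year field holds).
# "century" = first two digits of the year (19 of 1950);
# "year-of-century" = last two digits (50 of 1950); None = no such field.
TABLES = {
    "ISO 8601 (first table)": {
        8: ("YYYYMMDD", None), 6: ("YYMMDD", "year-of-century"),
        4: ("YYYY", None), 2: ("YY", "century"),
    },
    "Expected 1": {  # the YYYYMM pattern
        8: ("YYYYMMDD", None), 6: ("YYYYMM", None),
        4: ("YYYY", None), 2: ("YY", "century"),
    },
    "Expected 2": {  # the MMDD / DD pattern
        8: ("YYYYMMDD", None), 6: ("YYMMDD", "year-of-century"),
        4: ("MMDD", None), 2: ("DD", None),
    },
    "Expected 3": {  # YY always the last two digits
        8: ("YYYYMMDD", None), 6: ("YYMMDD", "year-of-century"),
        4: ("YYYY", None), 2: ("YY", "year-of-century"),
    },
}


def yy_is_consistent(table: dict) -> bool:
    """True if every two-digit year field in the table means the same thing."""
    meanings = {yy for _, yy in table.values() if yy is not None}
    return len(meanings) <= 1


for name, table in TABLES.items():
    print(f"{name}: {'consistent' if yy_is_consistent(table) else 'inconsistent'}")
```

Only the ISO table is reported as inconsistent.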

      > In general ISO 8601 does not say that two digit years are evil. It
      > passes them off as a specific case of a truncated representation.

      Most formats that have a two digit year have a leading hyphen.
      Only YYMMDD does not, at the expense of YYYYMM being disallowed.
      I don't understand why.

      >> The table can be rearranged to ask what a numerical format should
      >> be decoded as. To keep it simple, I have not divided it into Basic
      >> and Extended formats. Anything with a hyphen between elements is an
      >> Extended format.

      > As you probably realise, that contradicts

      That is so illogical. What is a Basic Format? What is an Extended Format?

      A simple answer would be (you would think) that an Extended Format
      includes separators between elements, and a Basic Format always has
      them omitted. However, because someone at ISO decided that 121212
      would be YYMMDD (the Basic version of YY-MM-DD), then YYYYMM has been
      disallowed. A Year and Month always has to have a hyphen separator:
      YYYY-MM. But why is it then called a Basic Format? This is the only
      Basic Format in the whole standard that includes any separators.

      I repeat, again, just what is a Basic Format? Give me a simple
      definition. Hyphen separators are not it; unless ISO have made
      a mistake and it is meant to be:

      Year and Month:
      *Extended* Format: YYYY-MM
      Basic Format: Not Applicable (because 121212 is YYMMDD)

      but as already stated, I think ISO made a fundamental error in
      allowing YYMMDD over YYYYMM in the first place. That is where
      the heart of the whole problem lies.
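The proposed definition (a hyphen between elements means Extended) can be checked mechanically against the labels quoted from the standard. A minimal Python sketch; the label table holds only the forms discussed in this thread, and the function name is hypothetical:

```python
# Basic/Extended labels for the forms discussed in this thread,
# as the thread reports them from the standard.
STANDARD_LABELS = {
    "YYYYMMDD": "basic",
    "YYYY-MM-DD": "extended",
    "YYMMDD": "basic",
    "YY-MM-DD": "extended",
    "YYYY-MM": "basic",  # Extended listed as 'Not Applicable'
    "YYYY": "basic",
}


def label_by_separators(pattern: str) -> str:
    """Proposed rule: any hyphen between elements makes it Extended."""
    # Leading hyphens stand for omitted components, not separators.
    return "extended" if "-" in pattern.lstrip("-") else "basic"


for pattern, label in STANDARD_LABELS.items():
    proposed = label_by_separators(pattern)
    if proposed != label:
        print(f"{pattern}: standard says {label}, separator rule says {proposed}")
```

Only YYYY-MM is flagged, which is exactly the anomaly described above.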

      >> Writing the table this way, I have included some formats that the
      >> ISO standard says are 'Not Applicable'. There cannot be a way to
      >> tell if '1950' is supposed to be a Basic format Year or an Extended
      >> format Year. I have ignored this and included it under both styles.

      > Again, the standard is clear that '1950' is basic format only.
      > I take 'Not Applicable' to mean "don't use this".

      Take a date like 1212-12-12, reduce the precision to 1212-12,
      then to 1212. Now do the same with 12121212: reduce to 1212-12
      (121212 is not allowed!!), then to 1212. So 1212-12-12 is an
      Extended Format and 12121212 is a Basic Format, but both reduce
      to 1212 for just the Year. So, really, 1212 could be a Basic
      Format or an Extended Format; there is no way to tell. I think
      what the ISO standard means by 'Not Applicable' is simply this:
      because 1212 does not contain any hyphen separators, 1212
      (Extended) is exactly the same as 1212 (Basic), i.e. the
      Extended Format does not have its own unique definition, so
      there is no need to repeat the definition already shown for the
      Basic Format. In other words, 'Not Applicable' just means that
      there is no unique representation to show for the Extended
      Format, so you use the same format as already listed for the
      Basic Format. However, I am also assuming that the difference
      between an Extended Format and a Basic Format is that the Basic
      Format does not include any hyphens used as separators.
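The two reduction paths can be sketched in Python to show that they meet at the same four-digit string. This assumes a plain four-digit, unsigned year; both function names are hypothetical:

```python
from typing import List


def reduce_extended(date: str) -> List[str]:
    """YYYY-MM-DD -> YYYY-MM -> YYYY by dropping the last hyphenated element."""
    steps = [date]
    while "-" in steps[-1]:
        steps.append(steps[-1].rsplit("-", 1)[0])
    return steps


def reduce_basic(date: str) -> List[str]:
    """YYYYMMDD -> YYYY-MM -> YYYY; the middle step needs a hyphen
    because a six-digit string would be read as YYMMDD."""
    year, month = date[:4], date[4:6]
    return [date, f"{year}-{month}", year]


print(reduce_extended("1212-12-12"))  # ['1212-12-12', '1212-12', '1212']
print(reduce_basic("12121212"))       # ['12121212', '1212-12', '1212']
```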

      > What we could do with is a rationale for the standard. I wonder
      > if one was produced.

      > The other useful production would be a general reader. Given any
      > input string it should be possible to determine whether it is basic
      > or extended, full or reduced precision, expanded or not, truncated
      > or not, calendar or ordinal or week. (At a higher level we need to
      > determine whether a string is a date, time, interval, etc.)

      There is NO pattern to the ISO standard. Many of the choices
      are arbitrary... viz YYMMDD vs YYYYMM and so on. This makes
      finding a 'simple' rule impossible.

      > A start for this might be

      > Parse as date:
      > Does it contain a 'W'?
      > => parse as week date
      > else Does it have an even number of digits? **
      > => parse as calendar date
      > else
      > => parse as ordinal date

      > Parse as calendar date:
      > Split into fields of hyphen or pair-of-digits or plus (1st only)
      > Match against candidate formats

      > Parse as ordinal date:
      > Split into fields of hyphen or pair-of-digits or plus (1st only)
      > or triple-of-digits (last only)
      > Match against candidate formats

      >**This assumes that expanded formats use an even number of digits for
      > the year. A different approach might tolerate an odd number.
      > Actually, in order to parse expanded formats the number of digits
      > for the year must be known otherwise there is no way to distinguish
      > days from years from centuries. Years before 0000 are problematic
      > as well. But according to the standard, mutual agreement is needed
      > for years prior to 1582 anyway.
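A runnable Python sketch of the dispatch and field-splitting steps quoted above, assuming (as the footnote does) that any expanded year has an even number of digits. Week dates are only detected, not parsed, and the regular expressions and function names are illustrative rather than taken from any library:

```python
import re
from typing import List, Optional, Tuple

# Calendar: optional leading '+', then any run of hyphens and digit pairs.
CALENDAR_RE = re.compile(r"(\+)?((?:-|\d\d)*)")
# Ordinal: as calendar, but the last field is a triple of digits (the day).
ORDINAL_RE = re.compile(r"(\+)?((?:-|\d\d)*)(\d{3})")
FIELD_RE = re.compile(r"-|\d\d")


def split_fields(s: str, ordinal: bool) -> Optional[List[str]]:
    """Split into '+' (first only), '-', digit pairs, and a final triple if ordinal."""
    m = (ORDINAL_RE if ordinal else CALENDAR_RE).fullmatch(s)
    if m is None:
        return None
    fields = ["+"] if m.group(1) else []
    fields += FIELD_RE.findall(m.group(2))
    if ordinal:
        fields.append(m.group(3))
    return fields


def classify(s: str) -> Tuple[str, Optional[List[str]]]:
    if "W" in s:
        return "week", None  # week-date parsing not sketched here
    digits = sum(ch.isdigit() for ch in s)
    if digits % 2 == 0:
        return "calendar", split_fields(s, ordinal=False)
    return "ordinal", split_fields(s, ordinal=True)


print(classify("1985-04-12"))  # ('calendar', ['19', '85', '-', '04', '-', '12'])
print(classify("1985-102"))    # ('ordinal', ['19', '85', '-', '102'])
print(classify("--0412"))      # ('calendar', ['-', '-', '04', '12'])
```

The digit-count test is what makes an odd-length expanded year unparseable here, which is the limitation the footnote points out.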

      > The table for calendar dates then starts:

      > Number     Fields          Format       Section  Note
      > of fields
      > 1          +               illegal
      > 1          -               illegal
      > 1          2               YY
      > 2          + -             illegal
      > 2          + 2             +YY                   1
      > 2          - -             illegal
      > 2          - 2             -YY                   2
      > 2          2 -             illegal
      > 2          2 2             YYYY
      > ...
      > 4          2 2 2 2         YYYYMMDD
      > ...
      > 6          2 2 - 2 - 2     YYYY-MM-DD
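The quoted table can be driven directly from the field shapes. A minimal Python sketch containing only the rows shown; the rows elided with '...' are left out, so an unknown shape returns None rather than a guess. The names are hypothetical:

```python
from typing import Optional, Sequence

ILLEGAL = "illegal"

# Only the rows quoted above; the rows elided with "..." are not filled in.
CALENDAR_TABLE = {
    ("+",): ILLEGAL,
    ("-",): ILLEGAL,
    ("2",): "YY",
    ("+", "-"): ILLEGAL,
    ("+", "2"): "+YY",  # note 1
    ("-", "-"): ILLEGAL,
    ("-", "2"): "-YY",  # note 2
    ("2", "-"): ILLEGAL,
    ("2", "2"): "YYYY",
    ("2", "2", "2", "2"): "YYYYMMDD",
    ("2", "2", "-", "2", "-", "2"): "YYYY-MM-DD",
}


def shape(field: str) -> str:
    """Reduce a field to its table symbol: '+', '-', or '2' for a digit pair."""
    return field if field in ("+", "-") else str(len(field))


def match_calendar(fields: Sequence[str]) -> Optional[str]:
    """Look up the format; None means the shape is not in this partial table."""
    return CALENDAR_TABLE.get(tuple(shape(f) for f in fields))


print(match_calendar(["19", "85", "04", "12"]))  # YYYYMMDD
print(match_calendar(["-", "85"]))               # -YY
print(match_calendar(["19", "-"]))               # illegal
```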

      I see that your table deals only with stuff that begins with the
      Year. That is all easy. See if you can finish it, when you deal
      with left-truncated stuff: both full and reduced precision.
      It becomes a LOT more difficult.

      > Notes:
      > 1 Implicitly assume that expanded representation years have 4 digits
      > 2 Implicitly assume that expanded representation years are positive
      > or have more than 4 digits

      > Different versions of tables would be needed for different expanded
      > representations. Expanded and truncated representations are mutually
      > exclusive. The agreement between parties has to be inspected to
      > establish whether a leading hyphen means a negative year or truncated
      > representation.
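Because the meaning of a leading hyphen depends on what the parties agreed, any reader has to take that agreement as an input. A minimal Python sketch, assuming the truncated -YYMM form and a signed four-digit expanded year (as in note 1 of the table); the agreement strings and the function name are hypothetical:

```python
def read_leading_hyphen(s: str, agreement: str, year_digits: int = 4) -> dict:
    """Interpret a string like '-0512' under a stated (hypothetical) agreement."""
    body = s[1:]
    if not (s.startswith("-") and body.isdigit()):
        raise ValueError(f"expected '-' followed by digits, got {s!r}")
    if agreement == "truncated":
        # Assumed truncated form -YYMM: year within an implied century, then month.
        if len(body) != 4:
            raise ValueError("this sketch only handles -YYMM")
        return {"year_of_century": int(body[:2]), "month": int(body[2:])}
    if agreement == "expanded":
        # The hyphen is a minus sign on a year of the agreed width.
        if len(body) != year_digits:
            raise ValueError(f"expected {year_digits} year digits")
        return {"year": -int(body)}
    raise ValueError(f"unknown agreement {agreement!r}")


print(read_leading_hyphen("-0512", "truncated"))  # {'year_of_century': 5, 'month': 12}
print(read_leading_hyphen("-0512", "expanded"))   # {'year': -512}
```

The same five characters yield two unrelated dates, which is why the agreement has to be inspected before parsing.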

      It gets very complicated, doesn't it? My tables of Allowed and
      Disallowed formats in the message dated 2001-Jul-16 may help
      to guide you to look for logic errors.
