
[ISO8601] Re: Clarifications:

  • g1smd@amsat.org
    Message 1 of 15 , Aug 1, 2001
      On 2001-Jul-16 Pete Forman wrote:


      >> In fact, the standard before the first truncated format in the
      >> opening paragraph of says "In each case hyphens that
      >> indicate components should be used only as indicated or shall be
      >> omitted."

      >> That to me hints that some of the choices are arbitrary, so don't play
      >> around with them. Also, there is no place that states that all
      >> formats are mutually unambiguous from each other. As I was reading
      >> I was looking for just such a statement or examples that violated
      >> the idea. I found neither, but that is no proof.

      > How about the last paragraph in 4.1.

      I agree with that. I didn't find the words '*mutually* unambiguous'.
      Instead, I found '... unique and unambiguous', which I think just
      about does the same job.

      > I agree that the choices seem arbitrary. There may be some logic
      > behind it though. My guess is that the rules are something like

      > Replace an omitted component in a truncated format with a hyphen.
      > Component may be century (first two digits of a four digit year)
      > or decade (first three digits of a four digit year) or last two
      > digits of the year or month or week. The term component is not
      > defined as such but the components are listed for each of the
      > truncated representations.

      Not quite. I think that you appear to say that -1 is a year like
      1981 or 2001, stated by omitting the decade. Adding the month to
      this, to increase precision, will make -111. This can now be
      confused with -DDD, day 111 of the year. So, you should modify
      your statement to say that: (except for Day of Year [DDD] elements,
      and Day of Week [D] elements) elements should always have an even
      number of digits: YYYY, YYMM, MMDD, YYMMDD, etc. However, I can
      see where you may have got this idea from. In the examples for
      the various 'Week-of-Year and Day-of-Week' formats, there are
      some three and some single digit year examples. However, in those
      examples, the placement of the 'W' always clarifies what is going
      on. In the Calendar and Ordinal date formats you cannot do this
      with Basic Formats. The Year must be two or four digits, except
      for some Extended Format dates that can have a three or single
      digit year, because these cannot be mixed up with other formats:
      -Y-DDD -YYY-DDD -Y-MM-DD -YYY-MM-DD and possibly -Y-MM and
      -YYY-MM (and you can probably omit the leading hyphen on all
      of these and get away with it). For most of these it is *not*
      possible to have a Basic format (if 'Basic' formats are taken
      to mean that hyphen separators *between* digits are omitted),
      as these *will* then be confused with other pre-defined formats.
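The collision described above can be checked mechanically. The following is a minimal Python sketch, not anything taken from the standard: the function name `candidate_readings` and the "-YMM" template (single-digit year followed by a two-digit month) are assumptions introduced only to illustrate the hypothetical rule being argued against.

```python
def candidate_readings(text: str) -> list[str]:
    """List plausible readings of a hyphen plus three digits, e.g. "-111".

    "-DDD" is the day-of-year reading discussed in the message; "-YMM" is
    the hypothetical single-digit-year-plus-month reading it argues against.
    """
    digits = text[1:]
    if not (text.startswith("-") and len(digits) == 3
            and digits.isascii() and digits.isdigit()):
        return []
    found = []
    day_of_year = int(digits)
    if 1 <= day_of_year <= 366:
        found.append(f"-DDD: day {day_of_year} of the year")
    month = int(digits[1:])
    if 1 <= month <= 12:
        found.append(f"-YMM: year ending in {digits[0]}, month {month:02d}")
    return found


for reading in candidate_readings("-111"):
    print(reading)
```

For "-111" both readings survive, which is the ambiguity the message points out; a string such as "-113" has only the day-of-year reading because 13 is not a month.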

      > Remove hyphens if the result is unambiguous.
      > 4.6 para 1 states that a hyphen may be necessary to represent
      > an omitted component. That implies to me that the hyphen
      > should not be used if possible.

      This 'ruling' seems arbitrary in the standard. A format like
      12-12 isn't permitted at all, but 121212 is read as YYMMDD,
      when I would expect YYYYMM to be the one. Similarly -121212
      doesn't appear anywhere, when I think that -YYMMDD is expected
      (like -1212 is -YYMM, for example).

      > (The above two rules may also be expressed as: Components may
      > be omitted, if the result is ambiguous then use a hyphen to
      > stand for the omitted component.)

      I almost agree with this, but I still don't understand why
      121212 has to be YYMMDD, when YYYYMM would be more logical,
      and this would then follow a 'pattern' with the other formats.
      See the various tables that I included in my previous message,
      posted 2001-Jul-16.

      > Add hyphens where a format is ambiguous. This is bound to be
      > arbitrary: if two formats collide one must be chosen to get
      > the extra hyphen.

      It isn't always arbitrary. '12' is always the first two digits
      of a Year, so the last two digits of the Year are -12, the Month
      is --12, and the Day is ---12. This is clear and logical. Note
      that 12-12 isn't permitted at all, as it could be either YY-MM
      or MM-DD. Instead -YY-MM and --MM-DD are used; so none of them
      'get the extra hyphen' (in this context)... they both include a
      hyphen or hyphens. So, two formats collide at '12-12', and rather
      than one getting a hyphen and the other not, the '12-12' format
      simply isn't defined or used at all.
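The leading-hyphen pattern described above (12, -12, --12, ---12) can be sketched as a lookup on the number of leading hyphens. This is an illustration of the message's description only, not a reading checked against the standard's text; the names `COMPONENT_BY_HYPHENS` and `classify_two_digits` are made up for the example.

```python
from typing import Optional

# Number of leading hyphens -> component, as described in the message.
COMPONENT_BY_HYPHENS = {
    0: "first two digits of the year",
    1: "last two digits of the year",
    2: "month",
    3: "day of month",
}


def classify_two_digits(text: str) -> Optional[str]:
    """Classify zero to three hyphens followed by exactly two digits."""
    digits = text.lstrip("-")
    hyphens = len(text) - len(digits)
    if len(digits) != 2 or not (digits.isascii() and digits.isdigit()):
        return None  # e.g. "12-12" is not covered by this pattern
    return COMPONENT_BY_HYPHENS.get(hyphens)


for sample in ("12", "-12", "--12", "---12", "12-12"):
    print(sample, "->", classify_two_digits(sample))
```

Note that "12-12" falls outside the pattern entirely, matching the point that neither YY-MM nor MM-DD is allowed to claim it.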

      > Note that omitting hyphens by these rules is a separate issue to
      > I take the latter to mean that the communicating parties
      > agree that, for example, two digits alone mean a month rather than
      > using four characters of proper.

      Yes, by mutual agreement I can say that '12' in one data element
      is MM, and in another is DD, rather than the default of YY.

      What comments have you got regarding the material in my message
      dated 2001-Jul-16, under the heading 'A BIG MISTAKE'?







    • g1smd@amsat.org
      Message 2 of 15 , Aug 1, 2001
        On 2001-Jul-17 Pete Forman wrote:


        >> I agree that this is a bit sloppy. It needs rewriting, or some
        >> additional notes. The minimum required, would be to state that
        >> the year may be specified by two, or by four, or by more digits;
        >> and I see a problem here... 121212 is assumed to be YYMMDD,
        >> but this could be the YYYYYY 121212.

        > No, were an expanded representation to be used for a six digit
        > year it would be +121212.

        I did already refer to this in another paragraph, where I noted:
        >>> .... I see a problem here... 121212 is assumed to be YYMMDD,
        >>> but this could be the YYYYYY 121212. Having re-read the
        >>> standard I see that para 4.7 does cover this. Additionally,
        >>> para 4.8 does say that elements do all have a defined length,
        >>> and that leading zeroes must be used to fulfil this. ....
        but I forgot to repeat that note with the above text.

        >> Also, the 'mutual agreement' problem appears here again.
        >> Representations that have the prescribed leading hyphens omitted
        >> can be used only by mutual agreement... except that the format at j
        >> seems to be the default, rather than the format stated in i. That
        >> is, for all the others, mutual agreement is required, but for j it
        >> has already been forced upon us to agree to this. In doing this,
        >> you get the 'logic error' with the RIGHT c entry being disallowed
        >> (in order to satisfy the {non written, as far as I can see} rule
        >> that any representation can have only one implied meaning unless
        >> mutual agreement has already been obtained).

        > As I said in my previous message I consider that there are two
        > possible reasons for omitting hyphens. Mutual agreement is one.
        > The other is that hyphens should/must only be used to stand for
        > omitted components in order to disambiguate.

        Unfortunately, since ISO mixed their logic in deciding on YYMMDD
        over YYYYMM, this skews the expected logical 'pattern' of allowed
        formats, as shown in the tables in my message posted 2001-Jul-16.
        I think their choices of 'default' formats are somewhat arbitrary.

        >> I guess they had to include YYMMDD and exclude YYYYMM simply
        >> because millions of computer systems were already using YYMMDD.

        > Not necessarily. We are talking about a date format, it
        > is reasonable to give preference to the full precision
        > interpretation over the reduced one.

        So why isn't 1212 decoded as MMDD, instead of YYYY?
        It seems very odd to me, that the formats go:
        12121212 YYYYMMDD
        121212 YYMMDD
        1212 YYYY
        12 YY (19 of 1950)
        Surely, life would be much easier if 121212 were YYYYMM?

        I was expecting one of the following patterns:

        12121212 YYYYMMDD
        121212   YYYYMM
        1212     YYYY
        12       YY (19 of 1950)

        12121212 YYYYMMDD
        121212   YYMMDD
        1212     MMDD
        12       DD

        12121212 YYYYMMDD
        121212   YYMMDD
        1212     YYYY
        12       YY (50 of 1950)

        The last three all have a logical pattern to them, whereas
        the first table (as derived from the ISO 8601 standard) does
        not have a logical pattern. Have another look at the various
        tables in my previous message (the one dated 2001-Jul-16)
        for further information.
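The four digit-only tables above can be compared side by side with a small lookup keyed on string length. This is a sketch under stated assumptions: "as_read" is the poster's reading of the standard, the other three are the patterns the poster expected, and all key names and function names are labels invented here.

```python
from typing import List, Optional

# Length of a digit-only string -> template, one table per pattern above.
TABLES = {
    "as_read": {8: "YYYYMMDD", 6: "YYMMDD", 4: "YYYY", 2: "YY (century)"},
    "reduce_right": {8: "YYYYMMDD", 6: "YYYYMM", 4: "YYYY", 2: "YY (century)"},
    "truncate_left": {8: "YYYYMMDD", 6: "YYMMDD", 4: "MMDD", 2: "DD"},
    "two_digit_year": {8: "YYYYMMDD", 6: "YYMMDD", 4: "YYYY",
                       2: "YY (year in century)"},
}


def decode(digits: str, table: str = "as_read") -> Optional[str]:
    """Return the template a digit-only string maps to in the given table."""
    if not (digits.isascii() and digits.isdigit()):
        return None
    return TABLES[table].get(len(digits))


def differing_lengths(table: str) -> List[int]:
    """Lengths at which a table disagrees with the 'as_read' table."""
    base = TABLES["as_read"]
    return [n for n in sorted(base, reverse=True) if base[n] != TABLES[table][n]]


print(decode("121212"))                    # poster's reading of the standard
print(decode("121212", "reduce_right"))    # the reading the poster expected
print(differing_lengths("truncate_left"))
```

Each expected pattern differs from the "as_read" table at only one or two lengths, which makes it easy to see where the six-digit choice breaks the regularity being argued for.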

        > In general ISO 8601 does not say that two digit years are evil. It
        > passes them off as a specific case of a truncated representation.

        Most formats that have a two digit year have a leading hyphen.
        Only YYMMDD does not, at the expense of YYYYMM being disallowed.
        I don't understand why.

        >> The table can be rearranged to ask what a numerical format should
        >> be decoded as. To keep it simple, I have not divided it into Basic
        >> and Extended formats. Anything with a hyphen between elements is an
        >> Extended format.

        > As you probably realise, that contradicts

        That is so illogical. What is a Basic Format? What is an Extended Format?

        A simple answer would be (you would think) that an Extended Format
        includes separators between elements, and a Basic Format always has
        them omitted. However, because someone at ISO decided that 121212
        would be YYMMDD (the Basic version of YY-MM-DD), then YYYYMM has been
        disallowed. A Year and Month always has to have a hyphen separator:
        YYYY-MM. But why is it then called a Basic Format? This is the only
        Basic Format in the whole standard that includes any separators.

        I repeat, again, just what is a Basic Format? Give me a simple
        definition. Hyphen separators are not it; unless ISO have made
        a mistake and it is meant to be:

        Year and Month:
        *Extended* Format: YYYY-MM
        Basic Format: Not Applicable (because 121212 is YYMMDD)

        but as already stated, I think ISO made a fundamental error in
        allowing YYMMDD over YYYYMM in the first place. That is where
        the heart of the whole problem lies.

        >> Writing the table this way, I have included some formats that the
        >> ISO standard says are 'Not Applicable'. There cannot be a way to
        >> tell if '1950' is supposed to be a Basic format Year or an Extended
        >> format Year. I have ignored this and included it under both styles.

        > Again, the standard is clear that '1950' is basic format only.
        > I take 'Not Applicable' to mean "don't use this".

        Take a date like 1212-12-12, reduce the precision to 1212-12,
        then to 1212. Now do the same with 12121212, reduce to 1212-12
        (121212 not allowed!!), then to 1212. So, 1212-12-12 is an
        Extended Format, and 12121212 is a Basic Format; but both reduce
        to 1212 for just the Year. So, really, 1212 could be a Basic
        Format or an Extended Format; there is no way to tell. What I
        think the ISO standard means by 'Not Applicable' is simply this:
        because 1212 contains no hyphen separators, 1212 (Extended) is
        exactly the same as 1212 (Basic), i.e. the Extended Format does
        not have its own unique definition, so there is no need to
        repeat the definition already shown for the Basic Format. In
        other words, 'Not Applicable' really just means that there is
        no unique representation to show for the Extended Format, so
        you use the format already listed for the Basic Format.
        However, I am also assuming that the difference between an
        Extended Format and a Basic Format is that the Basic Format
        does not include any hyphens used as separators.

        > What we could do with is a rationale for the standard. I wonder
        > if one was produced.

        > The other useful production would be a general reader. Given any
        > input string it should be possible to determine whether it is basic
        > or extended, full or reduced precision, expanded or not, truncated
        > or not, calendar or ordinal or week. (At a higher level we need to
        > determine whether a string is a date, time, interval, etc.)

        There is NO pattern to the ISO standard. Many of the choices
        are arbitrary... viz YYMMDD vs YYYYMM and so on. This makes
        finding a 'simple' rule impossible.

        > A start for this might be

        > Parse as date:
        >   Does it contain a 'W'?
        >     => parse as week date
        >   else Does it have an even number of digits? **
        >     => parse as calendar date
        >   else
        >     => parse as ordinal date

        > Parse as calendar date:
        >   Split into fields of hyphen or pair-of-digits or plus (1st only)
        >   Match against candidate formats

        > Parse as ordinal date:
        >   Split into fields of hyphen or pair-of-digits or plus (1st only)
        >   or triple-of-digits (last only)
        >   Match against candidate formats

        >**This assumes that expanded formats use an even number of digits for
        > the year. A different approach might tolerate an odd number.
        > Actually, in order to parse expanded formats the number of digits
        > for the year must be known otherwise there is no way to distinguish
        > days from years from centuries. Years before 0000 are problematic
        > as well. But according to mutual agreement is needed for
        > years prior to 1582 anyway.
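The top-level dispatch proposed in the quoted outline can be sketched in a few lines of Python. It implements only the three-way test as quoted (a 'W', then digit-count parity) and inherits the stated assumption that expanded years use an even number of digits; the function name `classify_date` is hypothetical.

```python
def classify_date(text: str) -> str:
    """Apply the quoted three-way test: week, calendar, or ordinal date."""
    if "W" in text:
        return "week"
    # Hyphens and a leading plus are ignored; only digits are counted.
    digit_count = sum(ch.isdigit() for ch in text)
    return "calendar" if digit_count % 2 == 0 else "ordinal"


for sample in ("19850412", "1985-04-12", "1985102", "1985-102", "1985W155"):
    print(sample, "->", classify_date(sample))
```

The sketch does not validate its input, so it only says which table to consult next, not whether the string is a legal representation.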

        > The table for calendar dates then starts:

        > Number of  Fields        Format      Section  Note
        > fields
        > 1          +             illegal
        > 1          -             illegal
        > 1          2             YY
        > 2          + -           illegal
        > 2          + 2           +YY                  1
        > 2          - -           illegal
        > 2          - 2           -YY                  2
        > 2          2 -           illegal
        > 2          2 2           YYYY
        > ...
        > 4          2 2 2 2       YYYYMMDD
        > ...
        > 6          2 2 - 2 - 2   YYYY-MM-DD

        I see that your table deals only with stuff that begins with the
        Year. That is all easy. See if you can finish it, when you deal
        with left-truncated stuff: both full and reduced precision.
        It becomes a LOT more difficult.
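The field-splitting and table-matching step for calendar dates can be sketched as follows. Only the rows actually quoted above are included; the rows elided by "..." are deliberately left out, so unknown shapes return `None` rather than a guess. The names `CALENDAR_TABLE`, `split_fields`, and `match_calendar` are assumptions for this example.

```python
import re
from typing import Optional, Tuple

# Field shapes from the quoted table: "+" and "-" are literal characters,
# "2" stands for a pair of digits.
CALENDAR_TABLE = {
    ("+",): "illegal",
    ("-",): "illegal",
    ("2",): "YY",
    ("+", "-"): "illegal",
    ("+", "2"): "+YY",
    ("-", "-"): "illegal",
    ("-", "2"): "-YY",
    ("2", "-"): "illegal",
    ("2", "2"): "YYYY",
    ("2", "2", "2", "2"): "YYYYMMDD",
    ("2", "2", "-", "2", "-", "2"): "YYYY-MM-DD",
}


def split_fields(text: str) -> Optional[Tuple[str, ...]]:
    """Split into plus (first only), hyphen, or pair-of-digits fields."""
    if not text or not re.fullmatch(r"\+?(?:-|[0-9]{2})*", text):
        return None
    tokens = re.findall(r"\+|-|[0-9]{2}", text)
    return tuple(tok if tok in "+-" else "2" for tok in tokens)


def match_calendar(text: str) -> Optional[str]:
    """Look up the shape of text; None means not covered by the excerpt."""
    shape = split_fields(text)
    return CALENDAR_TABLE.get(shape) if shape is not None else None


for sample in ("19850412", "1985-04-12", "-85", "12-", "--12"):
    print(sample, "->", match_calendar(sample))
```

The last sample, "--12", is a left-truncated form, and it falls straight through the excerpt, which illustrates the point that the quoted table only handles strings that begin with the year.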

        > Notes:
        > 1 Implicitly assume that expanded representation years have 4 digits
        > 2 Implicitly assume that expanded representation years are positive
        > or have more than 4 digits

        > Different versions of tables would be needed for different expanded
        > representations. Expanded and truncated representations are mutually
        > exclusive. The agreement between parties has to be inspected to
        > establish whether a leading hyphen means a negative year or truncated
        > representation.

        It gets very complicated, doesn't it? My tables of Allowed and
        Disallowed formats in the message dated 2001-Jul-16 may help
        to guide you to look for logic errors.







      • P A Hill & E V Goodall
        Message 3 of 15 , Aug 2, 2001
          g1smd@... wrote:
          > Adding the month to
          > this, to increase precision, will make -111. This can now be
          > confused with -DDD, day 111 of the year.

          I think that is why increased precision is only used when the
          exchanging parties agree. I think this gets around the problem
          of whether, given some arbitrary sequence, we can guess what it is.
