Re: [ISO8601] Re: Clarifications: 5.2.2.2
  • Pete Forman
    Message 1 of 15 , Jul 17, 2001
      g1smd@... writes:
      > [most of the material snipped]

      > I agree that this is a bit sloppy. It needs rewriting, or some
      > additional notes. The minimum required, would be to state that the
      > year may be specified by two, or by four, or by more digits; and I
      > see a problem here... 121212 is assumed to be YYMMDD, but this
      > could be the YYYYYY 121212.

      No, were an expanded representation to be used for a six digit year it
      would be +121212.


      > Also, the 'mutual agreement' problem appears here again.
      > Representations that have the prescribed leading hyphens omitted
      > can be used only by mutual agreement... except that the format at j
      > seems to be the default, rather than the format stated in i. That
      > is, for all the others, mutual agreement is required, but for j it
      > has already been forced upon us to agree to this. In doing this,
      > you get the 'logic error' with the RIGHT c entry being disallowed
      > (in order to satisfy the {non written, as far as I can see} rule
      > that any representation can have only one implied meaning unless
      > mutual agreement has already been obtained).

      As I said in my previous message I consider that there are two
      possible reasons for omitting hyphens. Mutual agreement is one. The
      other is that hyphens should/must only be used to stand for omitted
      components in order to disambiguate.


      > I guess they had to include YYMMDD and exclude YYYYMM simply
      > because millions of computer systems were already using YYMMDD.

      Not necessarily. We are talking about a date format; it is reasonable
      to give preference to the full precision interpretation over the
      reduced one.

      In general ISO 8601 does not say that two digit years are evil. It
      passes them off as a specific case of a truncated representation.

      > The table can be rearranged to ask what a numerical format should
      > be decoded as. To keep it simple, I have not divided it into Basic
      > and Extended formats. Anything with a hyphen between elements is an
      > Extended format.

      As you probably realise, that contradicts 5.2.1.2.

      > Writing the table this way, I have included some formats that the
      > ISO standard says are 'Not Applicable'. There cannot be a way to
      > tell if '1950' is supposed to be a Basic format Year or an Extended
      > format Year. I have ignored this and included it under both styles.

      Again, the standard is clear that '1950' is basic format only. I take
      'Not Applicable' to mean "don't use this".




      What we could do with is a rationale for the standard. I wonder if
      one was produced.

      The other useful production would be a general reader. Given any
      input string it should be possible to determine whether it is basic or
      extended, full or reduced precision, expanded or not, truncated or
      not, calendar or ordinal or week. (At a higher level we need to
      determine whether a string is a date, time, interval, etc.)

      A start for this might be

      Parse as date:
        Does it contain a 'W'?
          => parse as week date
        else Does it have an even number of digits? **
          => parse as calendar date
        else
          => parse as ordinal date

      Parse as calendar date:
        Split into fields of hyphen or pair-of-digits or plus (1st only)
        Match against candidate formats

      Parse as ordinal date:
        Split into fields of hyphen or pair-of-digits or plus (1st only)
          or triple-of-digits (last only)
        Match against candidate formats

      **This assumes that expanded formats use an even number of digits for
      the year. A different approach might tolerate an odd number.
      Actually, in order to parse expanded formats the number of digits
      for the year must be known otherwise there is no way to distinguish
      days from years from centuries. Years before 0000 are problematic
      as well. But according to 4.3.2.1 mutual agreement is needed for
      years prior to 1582 anyway.
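
      The top-level dispatch outlined above could be sketched in Python
      roughly as follows. This is only an illustration: the function name
      classify_date is hypothetical, it inherits the assumption that
      expanded years use an even number of digits, and it does not check
      the string against any candidate format.

```python
def classify_date(text: str) -> str:
    """Return 'week', 'calendar' or 'ordinal' for a date string.

    Assumption carried over from the outline: a 'W' marks a week date;
    otherwise an even digit count means a calendar date and an odd
    digit count means an ordinal date.
    """
    if "W" in text:
        return "week"
    # Count only ASCII digits; hyphens and a leading plus are ignored here.
    digit_count = sum(ch in "0123456789" for ch in text)
    if digit_count == 0:
        raise ValueError(f"no digits in {text!r}")
    return "calendar" if digit_count % 2 == 0 else "ordinal"
```

      For example, classify_date("1985-04-12") gives 'calendar' and
      classify_date("1985-102") gives 'ordinal'.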

      The table for calendar dates then starts

      Number of  Fields       Format      Section      Note
      fields
      1          +            illegal
      1          -            illegal
      1          2            YY          5.2.1.2.c.B
      2          + -          illegal
      2          + 2          +YY         5.2.1.4.d.B  1
      2          - -          illegal
      2          - 2          -YY         5.2.1.3.c.B  2
      2          2 -          illegal
      2          2 2          YYYY        5.2.1.2.b.B
      ...
      4          2 2 2 2      YYYYMMDD    5.2.1.1.B
      ...
      6          2 2 - 2 - 2  YYYY-MM-DD  5.2.1.1.E

      Notes:
      1 Implicitly assume that expanded representation years have 4 digits
      2 Implicitly assume that expanded representation years are positive
      or have more than 4 digits
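
      A minimal Python sketch of the "split into fields, then match
      against candidate formats" step, using only the rows listed in the
      partial table above. The names TOKEN, CALENDAR_FORMATS, split_fields
      and match_calendar_format are hypothetical; rows marked illegal, and
      rows elided by "...", are simply absent, so they come back as None.

```python
import re

# One field is a pair of digits, a hyphen, or a plus sign.
TOKEN = re.compile(r"[0-9]{2}|-|\+")

# Field patterns transcribed from the partial table; "2" stands for a
# pair of digits. Illegal and elided rows are deliberately not listed.
CALENDAR_FORMATS = {
    ("2",): "YY",
    ("+", "2"): "+YY",
    ("-", "2"): "-YY",
    ("2", "2"): "YYYY",
    ("2", "2", "2", "2"): "YYYYMMDD",
    ("2", "2", "-", "2", "-", "2"): "YYYY-MM-DD",
}


def split_fields(text):
    """Split text into '+', '-' and '2' fields; '+' is allowed first only."""
    fields = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None or (m.group() == "+" and pos != 0):
            raise ValueError(f"cannot split {text!r} at position {pos}")
        fields.append("2" if m.group().isdigit() else m.group())
        pos = m.end()
    return tuple(fields)


def match_calendar_format(text):
    """Return the format name for text, or None if no listed row matches."""
    return CALENDAR_FORMATS.get(split_fields(text))
```

      With these rows, match_calendar_format("19850412") returns
      'YYYYMMDD' and match_calendar_format("12-") returns None.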


      Different versions of tables would be needed for different expanded
      representations. Expanded and truncated representations are mutually
      exclusive. The agreement between parties has to be inspected to
      establish whether a leading hyphen means a negative year or truncated
      representation.
      --
      Pete Forman -./\.- Disclaimer: This post is originated
      WesternGeco -./\.- by myself and does not represent
      pete.forman@... -./\.- opinion of Schlumberger, Baker
      http://www.crosswinds.net/~petef -./\.- Hughes or their divisions.
    • P A Hill & E V Goodall
      Message 2 of 15 , Jul 17, 2001
        Pete Forman wrote:
        >
        > P A Hill & E V Goodall writes:
        > > [snip]
        > > In fact, the standard before the first truncated format in the
        > > opening paragraph of 5.2.1.3 says "In each case hyphens that
        > > indicate components should be used only as indicated or shall be
        > > omitted."
        > >
        > > That to me hints that some of choices are arbitrary, so don't play
        > > around with them. Also, there is no place that states that all
        > > formats are mutually unambiguous from each other.
        >
        > How about the last paragraph in 4.1.

        Very good! That does qualify as stating that are all tying to be unique.

        > Add hyphens where a format is ambiguous. This is bound to be
        > arbitrary: if two formats collide one must be chosen to get the
        > extra hyphen.

        >
        > Note that omitting hyphens by these rules is a separate issue to
        > 5.2.1.3. I take the latter to mean that the communicating parties
        > agree that, for example, two digits alone mean a month rather than
        > using four characters of 5.2.1.3.e proper.

        Yes that is how I read all of the paragraphs that read like that:

        "If, by agreement, truncated representations are used the basic formats shall
        be as specified below. In each case hyphens that indicate omitted components
        shall be used only as indicated or shall be omitted."

        It also tells anyone who wants to claim to be 8601 compliant not to make up
        a format that might be read just like one of the real formats but missing
        only some of the hyphens.

        For example, I can't claim to be 8601 compliant and take one leading
        hyphen off each of the examples in 5.2.1.3.

        Thanks for the comments on how you read 8601.

        -Paul
      • P A Hill & E V Goodall
        Message 3 of 15 , Jul 17, 2001
          g1smd@... wrote:
          > > The note at 5.2.3.3 would not list only one format, but mention
          > > all of those which one might think might have a leading dash
          > > for missing 'century' and another for missing year pointing
          > > out the simplification.
          >
          > Now I see what you are saying, I agree that the wording here
          > is sub-optimal. You reach a place where you see a format you
          > were not expecting, with no previous rationale as to why the
          > format is shown like it is. Yes, the standard is deficient
          > (unless 4.9 is where its at?) and requires extra notes.

          Actually, I was assuming just mutually unambiguous choices
          had been made, so was not expecting a particular format. I was
          thrown off by the writers of the standard expecting a particular
          format.

          > I wasn't sure why you were hung up on this one word 'should'.
          > Now you have explained more, then I am happy to agree with you.
          > You are right. Although the standard works the way I have said,
          > and the examples follow the method I have stated, nowhere in
          > the standard does it state clearly that this is the case, or
          > why it should be so, and several notes of clarification are
          > obviously missing on a few examples.

          I'm glad we got that worked out! Yes, it was more an editorial
          analysis stated as "did I miss something" than a criticism of
          a particular format.

          Thanks for the interesting tables! I was just starting to
          work on some like these myself.

          -Paul
        • P A Hill & E V Goodall
          Message 4 of 15 , Jul 18, 2001
            P A Hill & E V Goodall wrote:
            > That does qualify as stating that are all tying to be unique.

            Let's make that:

            That does qualify as stating that all are trying to be unique.

            -Paul
          • g1smd@amsat.org
            Message 5 of 15 , Aug 1, 2001
              On 2001-Jul-16 Pete Forman wrote:


              [2001-Aug-01]



              >> In fact, the standard before the first truncated format in the
              >> opening paragraph of 5.2.1.3 says "In each case hyphens that
              >> indicate components should be used only as indicated or shall be
              >> omitted."

              >> That to me hints that some of choices are arbitrary, so don't play
              >> around with them. Also, there is no place that states that all
              >> formats are mutually unambiguous from each other. As I was reading
              >> I was looking for just such a statement or examples that violated
              >> the idea. I found neither, but that is no proof.

              > How about the last paragraph in 4.1.

              I agree with that. I didn't find the words '*mutually* unambiguous'.
              Instead, I found '... unique and unambiguous', which I think just
              about does the same job.



              > I agree that the choices seem arbitrary. There may be some logic
              > behind it though. My guess is that the rules are something like

              > Replace an omitted component in a truncated format with a hyphen.
              > Component may be century (first two digits of a four digit year)
              > or decade (first three digits of a four digit year) or last two
              > digits of the year or month or week. The term component is not
              > defined as such but the components are listed for each of the
              > truncated representations.

              Not quite. I think that you appear to say that -1 is a year like
              1981 or 2001, stated by omitting the decade. Adding the month to
              this, to increase precision, will make -111. This can now be
              confused with -DDD, day 111 of the year. So, you should modify
              your statement to say that: (except for Day of Year [DDD] elements,
              and Day of Week [D] elements) elements should always have an even
              number of digits: YYYY, YYMM, MMDD, YYMMDD, etc. However, I can
              see where you may have got this idea from. In the examples for
              the various 'Week-of-Year and Day-of-Week' formats, there are
              some three and some single digit year examples. However, in those
              examples, the placement of the 'W' always clarifies what is going
              on. In the Calendar and Ordinal date formats you cannot do this
              with Basic Formats. The Year must be two or four digits, except
              for some Extended Format dates that can have a three or single
              digit year, because these cannot be mixed up with other formats:
              -Y-DDD -YYY-DDD -Y-MM-DD -YYY-MM-DD and possibly -Y-MM and
              -YYY-MM (and you can probably omit the leading hyphen on all
              of these and get away with it). For most of these it is *not*
              possible to have a Basic format (if 'Basic' formats are taken
              to mean that hyphen separators *between* digits are omitted),
              as these *will* then be confused with other pre-defined formats.



              > Remove hyphens if the result is unambiguous.
              > 4.6 para 1 states that a hyphen may be necessary to represent
              > an omitted component. That implies to me that the hyphen
              > should not be used if possible.

              This 'ruling' seems arbitrary in the standard. A format like
              12-12 isn't permitted at all, but 121212 is read as YYMMDD,
              when I would expect YYYYMM to be the one. Similarly -121212
              doesn't appear anywhere, when I think that -YYMMDD is expected
              (like -1212 is -YYMM, for example).



              > (The above two rules may also be expressed as: Components may
              > be omitted, if the result is ambiguous then use a hyphen to
              > stand for the omitted component.)

              I almost agree with this, but I still don't understand why
              121212 has to be YYMMDD, when YYYYMM would be more logical,
              and this would then follow a 'pattern' with the other formats.
              See the various tables that I included in my previous message,
              posted 2001-Jul-16.



              > Add hyphens where a format is ambiguous. This is bound to be
              > arbitrary: if two formats collide one must be chosen to get
              > the extra hyphen.

              It isn't always arbitrary. '12' is always the first two digits
              of a Year, so the last two digits of the Year are -12, the Month
              is --12, and the Day is ---12. This is clear and logical. Note
              that 12-12 isn't permitted at all, as it could be either YY-MM
              or MM-DD. Instead -YY-MM and --MM-DD are used; so none of them
              'get the extra hyphen' (in this context)... they both include a
              hyphen or hyphens. So, two formats collide at '12-12', and rather
              than one getting a hyphen and the other not, the '12-12' format
              isn't defined/used at all.
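
              The reading described above (the count of leading hyphens
              tells you which two-digit component is present) can be
              sketched in Python as below. The names and the component
              labels are hypothetical, chosen for illustration rather than
              taken from the standard, and the sketch covers only a single
              two-digit element.

```python
import re

# Index = number of leading hyphens, per the reading described above.
# Labels are illustrative only.
COMPONENT_BY_HYPHENS = ("century", "year of century", "month", "day")

PATTERN = re.compile(r"(-{0,3})([0-9]{2})")


def two_digit_component(text):
    """Return (component, value) for '12', '-12', '--12' or '---12'."""
    m = PATTERN.fullmatch(text)
    if m is None:
        # Anything else, including '12-12', is not a single two-digit element.
        raise ValueError(f"not a single two-digit element: {text!r}")
    return COMPONENT_BY_HYPHENS[len(m.group(1))], int(m.group(2))
```

              Here '12-12' is rejected outright, which mirrors the point
              that the colliding form is not defined at all.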



              > Note that omitting hyphens by these rules is a separate issue to
              > 5.2.1.3. I take the latter to mean that the communicating parties
              > agree that, for example, two digits alone mean a month rather than
              > using four characters of 5.2.1.3.e proper.

              Yes, by mutual agreement I can say that '12' in one data element
              is MM, and in another is DD, rather than the default of YY.



              What comments have you got regarding the material in my message
              dated 2001-Jul-16, under the heading 'A BIG MISTAKE'?



              Cheers,

              Ian.


              <mail://g1smd@...>

              <http://www.qsl.net/g1smd/>
              <http://home.freeuk.net/g1smd/>
              <http://ourworld.compuserve.com/homepages/dstrange/y2k.htm>

              <ftp://ftp.funet.fi/pub/ham/misc/g1smd.zip>
              <ftp://ftp.qsl.net/pub/g1smd/>


              [2001-08-01]

              .end
            • g1smd@amsat.org
              Message 6 of 15 , Aug 1, 2001
                On 2001-Jul-17 Pete Forman wrote:


                [2001-Aug-01]



                >> I agree that this is a bit sloppy. It needs rewriting, or some
                >> additional notes. The minimum required, would be to state that
                >> the year may be specified by two, or by four, or by more digits;
                >> and I see a problem here... 121212 is assumed to be YYMMDD,
                >> but this could be the YYYYYY 121212.

                > No, were an expanded representation to be used for a six digit
                > year it would be +121212.

                I did already refer to this in another paragraph, where I noted:
                >>> .... I see a problem here... 121212 is assumed to be YYMMDD,
                >>> but this could be the YYYYYY 121212. Having re-read the
                >>> standard I see that para 4.7 does cover this. Additionally,
                >>> para 4.8 does say that elements do all have a defined length,
                >>> and that leading zeroes must be used to fulfil this. ....
                but I forgot to repeat that note with the above text.



                >> Also, the 'mutual agreement' problem appears here again.
                >> Representations that have the prescribed leading hyphens omitted
                >> can be used only by mutual agreement... except that the format at j
                >> seems to be the default, rather than the format stated in i. That
                >> is, for all the others, mutual agreement is required, but for j it
                >> has already been forced upon us to agree to this. In doing this,
                >> you get the 'logic error' with the RIGHT c entry being disallowed
                >> (in order to satisfy the {non written, as far as I can see} rule
                >> that any representation can have only one implied meaning unless
                >> mutual agreement has already been obtained).

                > As I said in my previous message I consider that there are two
                > possible reasons for omitting hyphens. Mutual agreement is one.
                > The other is that hyphens should/must only be used to stand for
                > omitted components in order to disambiguate.

                Unfortunately, since ISO mixed their logic in deciding on YYMMDD
                over YYYYMM, this skews the expected logical 'pattern' of allowed
                formats, as shown in the tables in my message posted 2001-Jul-16.
                I think their choices of 'default' formats are somewhat arbitrary.



                >> I guess they had to include YYMMDD and exclude YYYYMM simply
                >> because millions of computer systems were already using YYMMDD.

                > Not necessarily. We are talking about a date format, it
                > is reasonable to give preference to the full precision
                > interpretation over the reduced one.

                So why isn't 1212 decoded as MMDD, instead of YYYY?
                It seems very odd to me, that the formats go:
                  12121212  YYYYMMDD
                  121212    YYMMDD
                  1212      YYYY
                  12        YY (19 of 1950)
                Surely, life would be much easier if 121212 were YYYYMM?

                I was expecting one of the following patterns:
                  12121212  YYYYMMDD
                  121212    YYYYMM
                  1212      YYYY
                  12        YY (19 of 1950)
                or:
                  12121212  YYYYMMDD
                  121212    YYMMDD
                  1212      MMDD
                  12        DD
                or:
                  12121212  YYYYMMDD
                  121212    YYMMDD
                  1212      YYYY
                  12        YY (50 of 1950)
                The last three all have a logical pattern to them, whereas
                the first table (as derived from the ISO 8601 standard) does
                not have a logical pattern. Have another look at the various
                tables in my previous message (the one dated 2001-Jul-16)
                for further information.
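
                To make the first table concrete, here is a minimal Python
                sketch that decodes a digits-only string purely by its
                length, following that table as I have derived it from the
                standard. The names LAYOUT_BY_LENGTH and decode_digits and
                the field labels are hypothetical; swapping in one of the
                other three patterns only means editing the dictionary.

```python
import re

# Length of a digits-only string -> (field name, width) pairs, following
# the first table: 8 = YYYYMMDD, 6 = YYMMDD, 4 = YYYY, 2 = YY (century).
LAYOUT_BY_LENGTH = {
    8: (("year", 4), ("month", 2), ("day", 2)),
    6: (("year_of_century", 2), ("month", 2), ("day", 2)),
    4: (("year", 4),),
    2: (("century", 2),),
}


def decode_digits(text):
    """Split a digits-only date string into named integer fields."""
    if re.fullmatch(r"[0-9]+", text) is None or len(text) not in LAYOUT_BY_LENGTH:
        raise ValueError(f"unsupported digits-only form: {text!r}")
    fields = {}
    pos = 0
    for name, width in LAYOUT_BY_LENGTH[len(text)]:
        fields[name] = int(text[pos:pos + width])
        pos += width
    return fields
```

                Under this table decode_digits("121212") is read as
                year-of-century 12, month 12, day 12, which is exactly the
                choice that shuts out YYYYMM.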



                > In general ISO 8601 does not say that two digit years are evil. It
                > passes them off as a specific case of a truncated representation.

                Most formats that have a two digit year have a leading hyphen.
                Only YYMMDD does not, at the expense of YYYYMM being disallowed.
                I don't understand why.



                >> The table can be rearranged to ask what a numerical format should
                >> be decoded as. To keep it simple, I have not divided it into Basic
                >> and Extended formats. Anything with a hyphen between elements is an
                >> Extended format.

                > As you probably realise, that contradicts 5.2.1.2.

                That is so illogical. What is a Basic Format? What is an Extended Format?

                A simple answer would be (you would think) that an Extended Format
                includes separators between elements, and a Basic Format always has
                them omitted. However, because someone at ISO decided that 121212
                would be YYMMDD (the Basic version of YY-MM-DD), then YYYYMM has been
                disallowed. A Year and Month always has to have a hyphen separator:
                YYYY-MM. But why is it then called a Basic Format? This is the only
                Basic Format in the whole standard that includes any separators.

                I repeat, again, just what is a Basic Format? Give me a simple
                definition. Hyphen separators are not it; unless ISO have made
                a mistake and it is meant to be:

                Year and Month:
                ---------------
                *Extended* Format: YYYY-MM
                Basic Format: Not Applicable (because 121212 is YYMMDD)

                but as already stated, I think ISO made a fundamental error in
                allowing YYMMDD over YYYYMM in the first place. That is where
                the heart of the whole problem lies.



                >> Writing the table this way, I have included some formats that the
                >> ISO standard says are 'Not Applicable'. There cannot be a way to
                >> tell if '1950' is supposed to be a Basic format Year or an Extended
                >> format Year. I have ignored this and included it under both styles.

                > Again, the standard is clear that '1950' is basic format only.
                > I take 'Not Applicable' to mean "don't use this".

                Take a date like 1212-12-12, reduce the precision to 1212-12,
                then to 1212. Now do the same with 12121212, reduce to 1212-12
                (121212 not allowed!!), then to 1212. So, 1212-12-12 is an
                Extended Format, and 12121212 is a Basic Format; but both reduce
                to 1212 for just the Year. So, really, 1212 could be a Basic
                Format or an Extended Format; there is no way to tell. What I
                think the ISO standard means by 'Not Applicable' is simply
                this: because 1212 does not contain any hyphen separators,
                1212 (Extended) is exactly the same as 1212 (Basic) (i.e. the
                Extended Format does not have its own unique definition), so
                there is no need to repeat the definition that was shown for
                the Basic Format. So I think
                that 'Not Applicable' really just means that there is no unique
                representation to show for the Extended Format, so you just use
                the same format as is already listed for the Basic Format.
                However, I am also assuming that the difference between an
                Extended Format and a Basic Format is that the Basic Format
                does not include any hyphens used as separators.



                > What we could do with is a rationale for the standard. I wonder
                > if one was produced.

                > The other useful production would be a general reader. Given any
                > input string it should be possible to determine whether it is basic
                > or extended, full or reduced precision, expanded or not, truncated
                > or not, calendar or ordinal or week. (At a higher level we need to
                > determine whether a string is a date, time, interval, etc.)

                There is NO pattern to the ISO standard. Many of the choices
                are arbitrary... viz YYMMDD vs YYYYMM and so on. This makes
                finding a 'simple' rule impossible.



                > A start for this might be

                > Parse as date:
                >   Does it contain a 'W'?
                >     => parse as week date
                >   else Does it have an even number of digits? **
                >     => parse as calendar date
                >   else
                >     => parse as ordinal date

                > Parse as calendar date:
                >   Split into fields of hyphen or pair-of-digits or plus (1st only)
                >   Match against candidate formats

                > Parse as ordinal date:
                >   Split into fields of hyphen or pair-of-digits or plus (1st only)
                >     or triple-of-digits (last only)
                >   Match against candidate formats

                >**This assumes that expanded formats use an even number of digits for
                > the year. A different approach might tolerate an odd number.
                > Actually, in order to parse expanded formats the number of digits
                > for the year must be known otherwise there is no way to distinguish
                > days from years from centuries. Years before 0000 are problematic
                > as well. But according to 4.3.2.1 mutual agreement is needed for
                > years prior to 1582 anyway.

                > The table for calendar dates then starts:

                > Number of  Fields       Format      Section      Note
                > fields
                > 1          +            illegal
                > 1          -            illegal
                > 1          2            YY          5.2.1.2.c.B
                > 2          + -          illegal
                > 2          + 2          +YY         5.2.1.4.d.B  1
                > 2          - -          illegal
                > 2          - 2          -YY         5.2.1.3.c.B  2
                > 2          2 -          illegal
                > 2          2 2          YYYY        5.2.1.2.b.B
                > ...
                > 4          2 2 2 2      YYYYMMDD    5.2.1.1.B
                > ...
                > 6          2 2 - 2 - 2  YYYY-MM-DD  5.2.1.1.E

                I see that your table deals only with stuff that begins with the
                Year. That is all easy. See if you can finish it, when you deal
                with left-truncated stuff: both full and reduced precision.
                It becomes a LOT more difficult.



                > Notes:
                > 1 Implicitly assume that expanded representation years have 4 digits
                > 2 Implicitly assume that expanded representation years are positive
                > or have more than 4 digits

                > Different versions of tables would be needed for different expanded
                > representations. Expanded and truncated representations are mutually
                > exclusive. The agreement between parties has to be inspected to
                > establish whether a leading hyphen means a negative year or truncated
                > representation.

                It gets very complicated, doesn't it? My tables of Allowed and
                Disallowed formats in the message dated 2001-Jul-16 may help
                to guide you to look for logic errors.



                Cheers,

                Ian.


                <mail://g1smd@...>

                <http://www.qsl.net/g1smd/>
                <http://home.freeuk.net/g1smd/>
                <http://ourworld.compuserve.com/homepages/dstrange/y2k.htm>

                <ftp://ftp.funet.fi/pub/ham/misc/g1smd.zip>
                <ftp://ftp.qsl.net/pub/g1smd/>


                [2001-08-01]

                .end
              • P A Hill & E V Goodall
                Message 7 of 15 , Aug 2, 2001
                  g1smd@... wrote:
                  > Adding the month to
                  > this, to increase precision, will make -111. This can now be
                  > confused with -DDD, day 111 of the year.

                  I think this is why increased precision is only done when the
                  exchanging parties agree. I think this gets around the problem of
                  whether, given some arbitrary sequence, we can guess what it is.

                  -Paul