189[ISO8601] Re: Clarifications: 5.2.2.2

  • g1smd@amsat.org
    Jul 16, 2001
      On 2001-Jul-14 Paul Hill <goodhill@...> wrote:


      [2001-Jul-16]



      These comments are concerning the text of ISO/TC 154 N 362 [PDF]
      document, which is the final draft version of the ISO 8601:2000
      standard. The final published version of ISO 8601:2000 is still
      not available online, but this draft (from only four days
      previously) can still be downloaded from [PDF00005.PDF]:
      <http://lists.ebxml.org/archives/ebxml-core/200104/msg00252.html>.



      >> Section 4.6 refers. 'These leading hyphens may be omitted in
      >> the applications where there is no risk of confusing these
      >> representations with others'. Any leading hyphen would always
      >> be to replace a missing element. Hyphens between elements are
      >> separators in Extended Formats. There are no separators in the
      >> Basic formats. There are never any separators before the first
      >> element, only 'replacement' hyphens for missing elements.

      > This is probably why the note which looks unobvious to me is
      > obvious to you. I read 4.6 with the most important opening
      > phrase "By mutual agreement of the partners in information
      > interchange" Thus, it provides a way in specific applications
      > of this standard to drop something that is stated in the
      > standard. I don't see 4.6 as suggesting that it is the
      > rationale which was used to come up with the formats which
      > are in standard.

      I am very familiar with the various allowed formats and their
      variations, and that familiarity makes it very easy for me to
      miss a point or hint in the wording, or to take for granted
      something which, although it isn't actually stated in the
      standard, is the way things are done. I do see that you have a
      point here. They state that the leading hyphens *may* be dropped,
      then go on to just automatically drop them in some of the
      examples, without putting a note against some of them. That is
      not very good. Each and every time they do this, it does need an
      extra note or clarification. However, is para 4.9 perhaps a
      poorly worded way of trying to tell us about this?



      >> It's all in paragraph 4.6 as far as I can see.

      > It says you can drop what is there, but it doesn't say that
      > the full representation would treat the hundreds part of the
      > year as a separate component from the tens and ones of the
      > year. Only a few examples hint at that, but not all of them
      > do and not all exceptions are noted.

      I agree that this is a bit sloppy. It needs rewriting, or some
      additional notes. The minimum required would be to state that
      the year may be specified by two, or by four, or by more
      digits; and I see a problem here... 121212 is assumed to be
      YYMMDD, but this could be the YYYYYY 121212. Having re-read the
      standard I see that para 4.7 does cover this. Additionally,
      para 4.8 does say that elements do all have a defined length,
      and that leading zeroes must be used to fulfil this. It
      doesn't fully answer your point, so I guess their wording
      needs improving.



      >> Does this still appear in the published ISO 8601:2000?

      > I wouldn't know, I don't have it, I just have the various
      > free downloads. Hopefully this was clear. If not it should
      > be by now.

      I am still waiting for someone on this email list to compile
      a note of all of the changes between the 2000-Dec-19 draft
      and the 2001-Jan-24 final published version of ISO 8601.
      Any volunteers?



      >>> -YY appears in truncated calendar dates, i.e. -YY-MM-DD,
      >>> see examples of section 5.2.1.3.

      >> You have misquoted the standard.

      > Sorry, my mistake. It should have read "-YY-MM".

      No problems. Typos happen, but it did hinder your argument a
      bit. Here, the first hyphen is replacing the 'old' 'CC' and
      the second hyphen is the separator. The Basic format of this
      is -YYMM, where the hyphen again replaces the 'old' 'CC'.
      In 'YY-MM-DD' ISO automatically dropped the leading hyphen,
      as a date like '12-12-12' cannot possibly be anything other
      than 'YY-MM-DD'. Is this hinted at in 4.9?



      >> In all these formats: -YYMM and -YY-MM, and -YY, the hyphen
      >> does replace the missing two digits of the 'century'. In
      >> YY-MM-DD it has been left out, as per para 4.6.

      > Again, that is not what a pedantic read of 4.6 says. 4.6 says
      > me and who ever I communicate with can cut what we see even
      > further when we are only using some agreed upon subset of
      > everything, it doesn't provide a rationale for what is in the
      > standard.

      As above, I do agree that the wording in 4.6 provides a reason
      for this, but does not provide a complete rationale of why this
      is done. I hope ISO clarifies it in the next edition. I do now
      see what your point is; that you *may* drop the hyphen, but
      for some reason, with YY-MM-DD and YYMMDD, ISO have *already*
      dropped it, without clearly saying why. However, is para 4.9
      perhaps a poorly worded way of trying to tell us about this?



      >>> 5.2.2.2 (a) YY-DDD <-- no dash for missing numeric century

      >> Yes, because in 05-005, the '005' has to be the Day of the
      >> Year, there is absolutely no other possibility. Therefore,
      >> the element before that has to be a two-digit year. The
      >> leading hyphen can therefore be omitted, exactly as per the
      >> YY-MM-DD example, above. That is, -YY-DDD is unnecessary,
      >> YY-DDD will suffice.

      > It is too bad that 4.6 doesn't actually introduce the idea that
      > the writers of the standard used the idea as you claim, to come
      > up with their various formats.

      Extra notes would be useful; but I also think there is a 'logic
      error' or 'precedence error' in providing some of the default
      formats. I'll explain more at the end of this message. I've
      hinted at it with the note about YYMMDD and YYYYYY above; but
      it also concerns YYYYMM.



      > Maybe, some discussion at 5.2.1 ... Year would actually set at
      > least me in the right mind set.

      > Also, if your suggestion as to the design is the case, I would
      > expect a note like the one I was surprised to see in 5.2.2.2
      > after 5.2.1.3 (a) and the note at 5.2.3.3 noting all of the
      > variations which are not "fully hyphenated". This would make
      > them all consistent.

      > The note at 5.2.3.3 would not list only one format, but mention
      > all of those which one might think might have a leading dash
      > for missing 'century' and another for missing year pointing
      > out the simplification.

      Now I see what you are saying, I agree that the wording here
      is sub-optimal. You reach a place where you see a format you
      were not expecting, with no previous rationale as to why the
      format is shown like it is. Yes, the standard is deficient
      (unless 4.9 is where it's at?) and requires extra notes.



      > In fact, the standard before the first truncated format in the
      > opening paragraph of 5.2.1.3 says "In each case hyphens that
      > indicate components should be used only as indicated or shall
      > be omitted."

      That paragraph, along with 4.6, and now seeing that several notes
      against examples are obviously missing, when all read together
      do raise some doubts. I agree that the wording is poor.



      > That to me hints that some of the choices are arbitrary, so don't
      > play around with them. Also, there is no place that states
      > that all formats are mutually unambiguous from each other.

      I knew that last statement to be true (mutually unambiguous),
      but it takes a bit of tracking down to find those words in the
      very last part of 4.1 '... unique and unambiguous'; but is
      that wording as strong as '*mutually* unambiguous'? I think
      that it probably is.

      I have never considered any of the defined formats to be
      'arbitrary' choices, but now that I have condensed all of the
      standard down to a short table, further on, that table does
      appear to show that to be the case for several of the formats,
      most notably with YYMMDD.



      > As I was reading I was looking for just such a statement or
      > examples that violated the idea. I found neither, but that
      > is no proof.

      I now see what you are trying to say; and I think I have found
      one... YYMMDD vs YYYYYY (excepting the note in 4.7). Also, I
      think that YYMMDD should have been disallowed in favour of
      YYYYMM, which is not currently permitted. See the table below.
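
      The clash between YYMMDD, YYYYMM and YYYYYY can be shown with a
      minimal Python sketch. The names LAYOUTS, split_by_layout and
      readings are hypothetical, and the layouts are simply the letter
      patterns used in this message, not a list taken from the standard.
      The month and day range checks are only there to discard readings
      that could never be a real date.

```python
# Hypothetical illustration: which of the layouts discussed in this
# thread could a run of digits be read as?
LAYOUTS = ["YYMMDD", "YYYYMM", "YYYYYY"]


def split_by_layout(digits, layout):
    """Split digits into fields named by the letters of layout.

    Returns a dict such as {"Y": "12", "M": "12", "D": "12"}, or None
    when the lengths differ or a month/day value is out of range.
    """
    if len(digits) != len(layout) or not digits.isdigit():
        return None
    fields = {}
    for ch, letter in zip(digits, layout):
        fields[letter] = fields.get(letter, "") + ch
    # Discard readings that cannot be a real month or day number.
    if "M" in fields and not 1 <= int(fields["M"]) <= 12:
        return None
    if "D" in fields and not 1 <= int(fields["D"]) <= 31:
        return None
    return fields


def readings(digits):
    """Return every layout in LAYOUTS that digits could be read as."""
    found = {}
    for layout in LAYOUTS:
        fields = split_by_layout(digits, layout)
        if fields is not None:
            found[layout] = fields
    return found


print(list(readings("121212")))  # all three layouts fit
print(list(readings("991231")))  # YYYYMM is ruled out (month 31)
```

      With the test value 121212 used throughout this message, nothing
      in the digits themselves rules out any of the three readings, so
      only a rule in the standard (or mutual agreement) can pick one.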



      > Hopefully this can all be clarified in the next edition.

      >> OVER TO YOU! There are over 100 people out there reading this.
      >> What say all of you? Am I right? Dan Kohn? Fred Bone? Pete
      >> Forman? Aron Roberts?

      > The question is not whether you are right, it is a question of
      > meaning of the standard. Or another way to put it: You may be
      > right, and I have no reason to think you aren't, but the
      > standard is still not clear on where the "should" comes from.

      I wasn't sure why you were hung up on this one word 'should'.
      Now that you have explained more, I am happy to agree with you.
      You are right. Although the standard works the way I have said,
      and the examples follow the method I have stated, nowhere in
      the standard does it state clearly that this is the case, or
      why it should be so, and several notes of clarification are
      obviously missing on a few examples. It takes someone else
      reading it 'fresh' to spot these errors. I am too 'familiar'
      with how it works to pick up a fundamental error like that.
      If you know how something already works, then it isn't always
      readily obvious that some little note or clarification is
      actually missing. There are 'hints' in 4.6 and 4.9, but not
      enough to satisfy, now that I have read it about 6 times.



      > I personally am now convinced by what you have provided that
      > I was misled by the facts that the very first truncated example
      > YYMMDD doesn't have a leading dash, doesn't have a note which
      > points this out and there is nothing up to that point which
      > says what the expected style is, and there is no consistency
      > in the following examples, so when I finally get to 5.2.2.2
      > "note: ... should be ..." I go back and read looking for
      > something that tells me what should be anywhere and all I see
      > are various examples without explanation that any are
      > exceptions to any expectations. Thus reading the standard
      > does not make me think there is any "should" involved other
      > than the examples as given. That is the source of my question
      > about this note.

      Fully understood. Now that you have explained it, I am tending to
      agree with you. Your point was not that there was an error in
      the format they were suggesting to use, but that there were no
      notes to explain why it should be formatted that way, when you
      were actually expecting to see something else there, after
      following the 'logic' of the previous few paragraphs. Para
      4.9 may be referring to this, but it isn't obvious or clearly
      worded. I keep saying *may* because I don't really know if it
      is, or if it isn't.



      I have checked the next part of this message very thoroughly
      but due to the huge complexity in compiling it, I cannot
      guarantee that I caught all of the initial typing errors.



      In the next part, I have used a Year of 1212, Month of 12,
      and Day of 12, so that you are not influenced by digits like
      '99' seemingly assuring you these are the last two digits of
      a year... when in fact two digits stated on their own are
      actually for the FIRST two digits of the year (unless by
      'mutual agreement', etc).



      This next part needs to be viewed using a NON-proportional
      typeface, so that it aligns in columns. Copy and paste to a
      Word Processor, if necessary, in order to achieve this.



      There is an inconsistency in the standard, which becomes
      obvious when I list the allowed and non-allowed formats in
      a table, like this (non-allowed are marked with x here;
      'allowed only by mutual agreement' are marked with z; and
      the '!' marking means 'see notes'; read the left and right
      half separately):


           YYYY-MM-DD YYYY-MM-DD     YYYYMMDD YYYYMMDD
           ---------- ----------     -------- --------

      a    YY         12             YY       12       a
      b    YYYY       1212           YYYY     1212     b
      c    YYYY-MM    1212-12     x! YYYYMM   121212   c
      d    YYYY-MM-DD 1212-12-12     YYYYMMDD 12121212 d
      e    -YY        -12            -YY      -12      e
      f z  YY         12          z  YY       12       f
      g    -YY-MM     -12-12         -YYMM    -1212    g
      h z  YY-MM      12-12       z  YYMM     1212     h
      i x! -YY-MM-DD  -12-12-12   x! -YYMMDD  -121212  i
      j !  YY-MM-DD   12-12-12    !  YYMMDD   121212   j
      k    --MM       --12           --MM     --12     k
      l x  -MM        -12         x  -MM      -12      l
      m z  MM         12          z  MM       12       m
      n    --MM-DD    --12-12        --MMDD   --1212   n
      o x  -MM-DD     -12-12      x  -MMDD    -1212    o
      p z  MM-DD      12-12       z  MMDD     1212     p
      q    ---DD      ---12          ---DD    ---12    q
      r x  --DD       --12        x  --DD     --12     r
      s x  -DD        -12         x  -DD      -12      s
      t z  DD         12          z  DD       12       t

      An additional note must also state that leading hyphens only
      replace elements, and are never separators, for this to work.
      Also note that one digit elements, and three, five and seven
      digit formats are not allowed (unless, I suppose 'by mutual
      agreement...'). In any case, a one digit element would always
      have to be the leftmost element in the expression (e.g.
      YYYMM, YMM, YMMDD) except for showing a decade ('197' for
      the 1970s?), but these are 'horrible' structures, the latter
      especially being possibly almost outside the scope of the
      ISO 8601 standard.


      Notes and Explanations:
      -----------------------

      LEFT:                               RIGHT:

      Not allowed (x left), because       c - In my opinion, this format
      (except z by mutual agreement):         should be allowed, not j !
      f - used by a, confuse with m t     f - used by a, confuse with m t
      h - could be confused with p        h - used by b, confuse with p
      i - logical, but ISO use j          i - logical, but ISO use j
      j - allowed!! In my opinion,        j - allowed!! In my opinion
          it should not be allowed.           it should not be allowed.
      l - used by e, confuse with s       l - used by e, confuse with s
      m - used by a, confuse with f t     m - used by a, confuse with f t
      o - format used by g                o - format used by g
      p - could be confused with h        p - used by b, confuse with h
      r - format used by k                r - format used by k
      s - used by e, confuse with l       s - used by e, confuse with l
      t - used by a, confuse with f m     t - used by a, confuse with f m


      I say 'allowed!!' against the LEFT 'j' format only because
      it is the only 'right-justified' Extended date format that
      is allowed (without a leading hyphen to show the missing
      elements), and it is out of place. On the RIGHT side,
      YYMMDD (j), which is also allowed, will therefore conflict
      with YYYYYY. In my opinion, usage of YYMMDD in the standard
      is NOT correct. They should have allowed the YYYYMM format
      instead. It is strange that the YYYYMM (RIGHT c) date format
      is not allowed, as it totally breaks the logic of the table,
      if you are looking for a pattern. The pattern to me is that
      dates are 'left justified', with hyphens in place of missing
      left elements, one hyphen per two digits omitted (which is a
      definition that neatly avoids having to use a word like
      'century'; as long as it is also stated that the year can be
      two or four digits; or more digits, just as long as only two
      at a time are added), and that reduced precision simply
      deletes digits two at a time from the right of the date.

      Now, looking at the standard condensed to this table, it is
      obvious that the problems also occur when there is a format
      that is allowed in the left column, but disallowed in the
      right column (e.g. c), as the standard does not provide
      enough information to support why this is done. To me that
      is an error.

      Also, the 'mutual agreement' problem appears here again.
      Representations that have the prescribed leading hyphens
      omitted can be used only by mutual agreement... except
      that the format at j seems to be the default, rather than
      the format stated in i. That is, for all the others, mutual
      agreement is required, but for j it has already been
      forced upon us to agree to this. In doing this, you get the
      'logic error' with the RIGHT c entry being disallowed (in
      order to satisfy the {non written, as far as I can see} rule
      that any representation can have only one implied meaning
      unless mutual agreement has already been obtained).

      I guess they had to include YYMMDD and exclude YYYYMM simply
      because millions of computer systems were already using YYMMDD.
      However that probably helped people to avoid thinking about
      Y2K problems for far longer than they should have done.
      1988 would actually have been early enough for every version
      of Windows (3.x onwards) to be completely free of all such
      problems, for example.



      This next part needs to be viewed using a NON-proportional
      typeface, so that it aligns in columns.



      The table can be rearranged to ask what a numerical format
      should be decoded as. To keep it simple, I have not divided
      it into Basic and Extended formats. Anything with a hyphen
      between elements is an Extended format. Writing the table
      this way, I have included some formats that the ISO standard
      says are 'Not Applicable'. There cannot be a way to tell if
      '1950' is supposed to be a Basic format Year or an Extended
      format Year. I have ignored this and included it under both
      styles. The table produces (again, a, b, etc, refer to notes
      after) the following result:


                   ALLOWED               DISALLOWED
                   -------               ----------

      1212-12-12   YYYY-MM-DD
      1212-12      YYYY-MM
      1212         YYYY               z  YYMM MMDD
      12           YY (19 of 1950)    z  YY (50 of 1950) MM DD

      12121212     YYYYMMDD
      121212     a YYMMDD !           x! YYYYMM !
      1212         YYYY               z  YYMM MMDD
      12           YY (19 of 1950)    z  YY (50 of 1950) MM DD

      1212-12-12   YYYY-MM-DD
      12-12-12   b YY-MM-DD !
      12-12      c n/a                z  YY-MM MM-DD !
      12           YY (19 of 1950)    z  YY (50 of 1950) MM DD

      -12-12-12  d n/a !              x! -YY-MM-DD (use YY-MM-DD) !
      -12-12       -YY-MM             x  -MM-DD
      -12          -YY (50 of 1950)   x  -MM -DD

      -121212    e n/a !              x! -YYMMDD ! (use YYMMDD) !
      -1212        -YYMM              x  -MMDD
      -12          -YY (50 of 1950)   x  -MM -DD

      --12-12      --MM-DD
      --12         --MM               x  --DD

      --1212       --MMDD
      --12         --MM               x  --DD

      ---12        ---DD

      12-1212      n/a                x  Not allowed at all.


      Notes and Explanations:
      -----------------------

      At a, I wish that 121212 were really YYYYMM, not YYMMDD.

      People use b, but logically the full date is -12-12-12.
      It would be useful if both b and YYMMDD were disallowed, or
      at least reverted to the 'use by mutual agreement' status.

      At c, I am glad that both are not valid, but logically since
      YY-MM-DD was allowed at b, then 12-12 would have to be MM-DD
      (both would then be 'right justified'). However these would be
      the only two 'right justified' dates in the table. Everything
      else is 'left justified, with a hyphen to replace each missing
      pair of digits'. So really it is b that breaks this unwritten
      rule. I wonder if ISO realise what they have done?

      At d and e, these are the formats you would logically expect
      to see being used, but ISO just automatically dropped the
      leading hyphen, producing the formats at a and b instead.
      By breaking the 'pattern' you are correct to say that some
      choices of format now appear to be arbitrary.

      And, z, just confirms the old 'by mutual agreement you can
      drop leading hyphens for any of these date formats' rule,
      and use all the meanings that I have currently placed in the
      'disallowed' column if you need to do so.

      If a line has an 'x', then entries in the 'disallowed' column
      to the right of the 'x' are never allowed to be used.

      For anything marked 'n/a !' in the allowed column, the '!'
      points to surprise at the blank; that ISO have disallowed
      what you would logically expect to be there.

      A date format in the allowed column, with a '!' included, you
      would logically expect to be disallowed, but ISO included it.

      Having allowed YYMMDD in, you would expect YY on its own to
      simply be the 50 of 1950, but it actually represents the 19
      part.

      Logically, both of the entries at ALLOWED 'a' and ALLOWED 'b'
      should really be 'by mutual agreement' formats, but ISO chose
      not to do this.

      From this, it is now obvious that ISO have 'jumped the gun'
      by automatically deleting leading hyphens on some formats
      that would be logically expected to have one or more leading
      hyphens, and have not provided a note to say that this is
      what has happened (unless 4.6 and 4.9 are the hint).



      Another part of the problem is that the date is written as
      4-2-2. If there were a separator between the 'Century' and
      the two digit Year, then semantic rules would be a lot easier.

      Today would be 20-01-07-15. So, leaving off left elements
      would give -01-07-15, --07-15, ---15, and leaving off right
      elements would give 20-01-07, 20-01, and 20. It's the fact
      that the Year is four digits, and the other elements are
      only two digits that skews the pattern; as well as the
      inclusion of YYMMDD in the default standard, rather than
      YYYYMM. For Basic format dates read 20010715, -010715,
      --0715, ---15, 200107, 2001, and 20 noting that here 200107
      is YYYYMM, because YYMMDD is actually done by using -YYMMDD.
      That IS consistent. ISO Ver 3?
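
      The 2-2-2-2 scheme just described can be sketched in a few lines
      of Python. This is only an illustration of the idea in this
      message, not anything ISO 8601 defines; the function name
      pair_forms and its sep parameter are made up for the example.

```python
def pair_forms(date8, sep=""):
    """Return (left_truncated, right_truncated) forms of an 8-digit date.

    The date is treated as four two-digit elements. Dropping elements
    from the left puts one hyphen in place of each dropped pair;
    dropping elements from the right just removes them.
    """
    if len(date8) != 8 or not date8.isdigit():
        raise ValueError("expected exactly eight digits")
    pairs = [date8[i:i + 2] for i in range(0, 8, 2)]  # 'CC', YY, MM, DD
    left = []
    for dropped in range(1, 4):
        # One replacement hyphen per dropped pair, then the rest.
        left.append("-" * dropped + sep.join(pairs[dropped:]))
    right = [sep.join(pairs[:kept]) for kept in range(3, 0, -1)]
    return left, right


print(pair_forms("20010715", "-"))
# (['-01-07-15', '--07-15', '---15'], ['20-01-07', '20-01', '20'])
print(pair_forms("20010715"))
# (['-010715', '--0715', '---15'], ['200107', '2001', '20'])
```

      The second call reproduces the Basic format list given above,
      with 200107 falling out as YYYYMM and YYMMDD only reachable as
      -YYMMDD.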



      Another way of solving the problem would be to set up a
      standard where the date is a full 8 digits, and has optional
      separators, but if any pair of digits on either the left or
      right end of the date are missing then they are replaced,
      each pair of missing digits, with a hyphen (whereas ISO 8601
      only ever places hyphens on the left side of a date), as
      well as there being a rule to say that there are no leading
      or trailing separators, only hyphens used as replacements.
      In other words, separators are only ever used in order to
      separate digits.


      This would produce something like:


      BASIC FORMAT          EXTENDED FORMAT
      ------------          ---------------

      12121212  YYYYMMDD    1212-12-12  YYYY-MM-DD
      121212-   YYYYMM-     1212-12-    YYYY-MM-
      1212--    YYYY--      1212--      YYYY--
      12---     YY---       12---       YY---
      -121212   -YYMMDD     -12-12-12   -YY-MM-DD
      -1212-    -YYMM-      -12-12-     -YY-MM-
      -12--     -YY--       -12--       -YY--
      --1212    --MMDD      --12-12     --MM-DD
      --12-     --MM-       --12-       --MM-
      ---12     ---DD       ---12       ---DD


      You can always check the date is correctly formed, by using
      a very simple set of rules... Ignore all hyphens BETWEEN
      digits. Group all digits into pairs. Count each leading
      hyphen, each digit pair, and each trailing hyphen. There
      should always be exactly FOUR units.
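
      The counting rule above is simple enough to write down directly.
      The following Python sketch assumes the hypothetical scheme in
      the table, not ISO 8601 itself; unit_count and is_well_formed
      are invented names. It checks only the unit count, exactly as
      the rule is worded, and does not check where the separators
      between digits have been placed.

```python
import re

# Leading hyphens, then a core that starts and ends with a digit,
# then trailing hyphens.
_SHAPE = re.compile(r"(-*)(\d(?:[\d-]*\d)?)(-*)")


def unit_count(text):
    """Count leading hyphens + digit pairs + trailing hyphens.

    Returns None if the text has no digits, contains other characters,
    or has an odd number of digits (so cannot be grouped into pairs).
    """
    m = _SHAPE.fullmatch(text)
    if m is None:
        return None
    leading, core, trailing = m.groups()
    digits = core.replace("-", "")  # ignore hyphens BETWEEN digits
    if len(digits) % 2:
        return None
    return len(leading) + len(digits) // 2 + len(trailing)


def is_well_formed(text, units=4):
    """True when the date has exactly the expected number of units."""
    return unit_count(text) == units


for s in ["1212-12-", "-1212-", "---12", "1212-12", "121212"]:
    print(s, unit_count(s), is_well_formed(s))
```

      The last two inputs are the ordinary ISO 8601 forms without a
      trailing hyphen; they count as three units and so are rejected
      under this scheme. The units parameter is there so the same
      check could be reused with a different unit count.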

      A further rule would state that in order to join the date
      to a time, the date should NOT have trailing hyphens, whilst
      still having four units, i.e. the right hand end MUST be
      the Day Number digits.

      This only works for a four digit year system. For all years
      beyond 9999, the standard has to be rewritten to add formats
      with the Year stated as six digits, an extra leading hyphen
      to be added to every one of the above formats, and with the
      format checking rule updated to say that there are now FIVE
      units, rather than only four.

      I don't propose this as a replacement to ISO 8601, but just
      mention it as all these points apply to the current standard
      in some way.



      The more I look at things, the more I am convinced that
      including YYMMDD rather than YYYYMM is the root cause of
      most of the problem. This looks like being a Y2K problem
      in another guise. The original standard was written well
      before the Year 2000. Including YYMMDD simply pandered
      to the, then current, vogue for a 'default' of a two-digit
      year, without thought for the overall logic of the whole
      document. The new standard makes reference to using all
      four digits for the Year to avoid these sorts of problems,
      but this could also have helped in revising the logic of
      the standard... by suggesting that users go back to using
      -YYMMDD in preference to just YYMMDD alone, or, even
      better still, just adopt the full four digit year. This
      would then allow 121212 to be 'YYYYMM' as originally, and
      logically, expected.



      Now, let's re-do those first two tables above for the Ordinal
      Day of Year Format. Note that I have repeated formats like
      YY and YYYY from the first table for completeness; you can't
      tell if YYYY is part of a 'normal' Gregorian Calendar date,
      an Ordinal Date, or a 'Week Number and Day of Week' date, so
      I repeat them here, whereas the official standard does not:


           YYYY-DDD   YYYY-DDD       YYYYDDD    YYYYDDD
           --------   --------       -------    -------

      a    YY         12             YY         12       a
      b    YYYY       1212           YYYY       1212     b
      c    YYYY-DDD   1212-121       YYYYDDD    1212121  c
      d    -YY        -12            -YY        -12      d
      e z  YY         12          z  YY         12       e
      f x! -YY-DDD    -12-121     x! -YYDDD     -12121   f
      g !  YY-DDD     12-121      !  YYDDD      12121    g
      h x! --DDD      --121       x! --DDD      --121    h
      i !  -DDD       -121        !  -DDD       -121     i
      j z  DDD        121         z  DDD        121      j
      k x  (-)(-)DD   12          x  (-)(-)DD   12       k
      l x  (-)(-)D    1           x  (-)(-)D    1        l

      An additional note must also state that leading hyphens only
      replace elements, and are never separators, for this to work.
      Again 'z' refers to 'mutual agreement' formats, and 'x' means
      not permitted.


      Notes and Explanations:
      -----------------------

      LEFT:                               RIGHT:

      c - complete                        c - 7 digits: unambiguous.
      e - used by a                       e - used by a
      f - is the logical default, but     f - is the logical default, but
      g - does not conflict with          g - does not conflict with
          anything else                       anything else (5 digits).
      h - logical default, but            h - logical default, but
      i - ISO dropped one leading '-'     i - ISO dropped one leading '-'
      j - mutual agreement, but three     j - mutual agreement, but three
          digits are unambiguous anyway       digits are unambiguous anyway
      k - not allowed, must be 3 digits   k - not allowed, must be 3 digits
      l - not allowed, must be 3 digits   l - not allowed, must be 3 digits

      The Year can be two or four digits (with a leading hyphen
      mandatory for some two digit formats), the Day of Year must
      be three digits.

      You would expect 'g' to be by 'mutual agreement' and for
      'i' to be 'not permitted' if you compare this table with
      the one for the Gregorian Calendar Date, but 4.6 and 4.9
      override this. Entry 'j' is still 'by mutual agreement'
      as would be expected.



      The table can be rearranged to ask what a numerical format
      should be decoded as. To keep it simple, I have not divided
      it into Basic and Extended formats. Anything with a hyphen
      between elements is an Extended format. Writing the table
      in this way, I have included some formats that the ISO
      standard says are 'Not Applicable'. There cannot be a way
      to tell if '1950' is supposed to be a Basic format Year or
      an Extended format Year. I have ignored this and included
      it under both styles. The table produces (again, a, b, etc,
      refer to notes after) the following result:


                   ALLOWED               DISALLOWED
                   -------               ----------

      1212-121     YYYY-DDD
      1212         YYYY               x  YDDD
      12           YY (19 of 1950)    z  YY (50 of 1950)

      1212121      YYYYDDD
      1212         YYYY               x  YDDD
      12           YY (19 of 1950)    z  YY (50 of 1950)

      1212-121     YYYY-DDD
      12-121     a YY-DDD !
      121          n/a !              z  DDD   x YYY

      1212121      YYYYDDD
      12121      b YYDDD !
      121          n/a !              z  DDD   x YYY

      -12-121    c n/a !              x! -YY-DDD (use YY-DDD) !
      -121       e -DDD !             x  -YYY (-YY or YYYY)
      -12          -YY (50 of 1950)   x  -DD (must be DDD)

      -12121     d n/a !              x! -YYDDD ! (use YYDDD) !
      -121       e -DDD !             x  -YYY (-YY or YYYY)
      -12          -YY (50 of 1950)   x  -DD (must be DDD)

      --121      f n/a !              x! --DDD ! (use -DDD) !
      --12       g n/a                x  --DD (must be DDD)


      Notes and Explanations:
      -----------------------

      At a, ISO automatically dropped the hyphen of -YY-DDD to
      make YY-DDD, just as they did at Gregorian Date 'a'.

      At b, ISO automatically dropped the hyphen of -YYDDD to
      make YYDDD, just as they did at Gregorian Date 'b'.

      For c and d, the disallowed format is the one that you would
      be logically expecting to be allowed, but see notes a and b.

      For e, you would logically expect --DDD, but with no risk
      of misinterpretation ISO automatically dropped the first
      hyphen, leaving -DDD. You could drop all hyphens, because
      this is the only three digit element in the standard
      (unless by mutual agreement you are exchanging YYY, or you
      have dropped the leading W from W527), but ISO insist here
      in having a minimum of one leading hyphen in place of all
      of the missing elements.

      For f, the disallowed format is the one that you would
      be logically expecting to be allowed, but see note e.

      At g, --12 always means --MM for Gregorian Calendar Date;
      two digit day is not allowed here.

      And, z, just confirms the old 'by mutual agreement you can
      drop leading hyphens for any of these date formats' rule,
      and use all the meanings that I have currently placed in the
      'disallowed' column if you need to do so.

      If a line has an 'x', then entries to the right of the 'x'
      in the 'disallowed' column are never allowed to be used.

      For anything marked 'n/a !' in the allowed column, the '!'
      points to surprise at the blank; that ISO have disallowed
      what you would logically expect to be there.

      A date format in the allowed column, with a '!' included, you
      would logically expect to be disallowed, but ISO included it.

      Having allowed YYMMDD in, you would expect YY on its own to
      simply be the 50 of 1950, but it actually represents the 19
      part.

      Of course, whilst formats here for Ordinal Day of Year should
      not conflict with each other, they also must not conflict with
      any format already being used for Gregorian Calendar Date, and
      vice versa. Ditto for the 'Week Number and Day of Week' format.
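
      One mechanical way to look for such conflicts is to reduce each
      format to its 'shape' (digits and hyphens only) and see whether
      two formats share one. The Python sketch below does this. The
      CALENDAR and ORDINAL lists are transcribed from the 'allowed'
      entries in the tables in this message, so they are an assumption
      and have not been checked against the standard; shape and
      collisions are hypothetical helper names.

```python
from collections import defaultdict

# 'Allowed' formats as listed in the tables of this message (assumed).
CALENDAR = ["YY", "YYYY", "YYYY-MM", "YYYY-MM-DD", "YYYYMMDD",
            "-YY", "-YY-MM", "-YYMM", "YY-MM-DD", "YYMMDD",
            "--MM", "--MM-DD", "--MMDD", "---DD"]
ORDINAL = ["YYYY-DDD", "YYYYDDD", "YY-DDD", "YYDDD", "-DDD"]


def shape(fmt):
    """Replace every element letter with 'd', keeping the hyphens."""
    return "".join("-" if c == "-" else "d" for c in fmt)


def collisions(*format_lists):
    """Map each shape shared by two or more formats to those formats."""
    by_shape = defaultdict(set)
    for formats in format_lists:
        for fmt in formats:
            by_shape[shape(fmt)].add(fmt)
    return {s: sorted(f) for s, f in by_shape.items() if len(f) > 1}


print(collisions(CALENDAR, ORDINAL))     # {}
print(collisions(CALENDAR, ["YYYYMM"]))  # {'dddddd': ['YYMMDD', 'YYYYMM']}
```

      On these lists the allowed calendar and ordinal formats do not
      collide, while adding YYYYMM collides with YYMMDD straight away,
      which is the six-digit conflict discussed earlier.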



      There is no need to do the tables for the Week format, because
      the letter 'W' built in, before the week number, will always
      show what is going on, and the single digit day (even if it is
      on its own) has to be the Day of Week, because a single digit
      is not allowed to be used for any other Date (or Time)
      representation (except by 'mutual agreement'...).

      However, a close look at 5.2.3.3 (c) and (d) reveals
      formats with a SINGLE digit YEAR. This is something not
      mentioned at all for Gregorian or Ordinal date. Why should
      the Week/Day format be any different? There is absolutely
      no explanation here, other than to say that I believe that
      the 1988 version of the standard used to say that formats
      were not limited to the examples listed in the standard,
      just as long as all formats followed the general rules
      about element ordering, use of separators, and consistent
      element length achieved by use of leading zeroes where
      required, etc.

      So, I guess that formats like YYY and Y-DDD and YYY-DDD
      are allowed 'by mutual agreement' but their Basic Formats
      of YYY and YDDD and YYYDDD have a real risk of being
      interpreted, wrongly, as DDD and YYYY and YYMMDD (or
      YYYYMM) respectively.
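
      To make the risk concrete, here is a minimal sketch (my own
      illustration; the table and the name possible_readings are
      assumptions, and only the formats mentioned in this message
      are listed) that shows how many readings an all-digit string
      can have once separators and leading hyphens are gone:

```python
# Hypothetical lookup: number of digits -> readings discussed in this
# message. It is not a complete list of what ISO 8601 defines.
READINGS_BY_LENGTH = {
    3: ["DDD", "YYY"],
    4: ["YYYY", "YDDD"],
    6: ["YYMMDD", "YYYYMM", "YYYDDD"],
}


def possible_readings(text):
    """Return every candidate reading for an all-digit string."""
    if not (text.isascii() and text.isdigit()):
        raise ValueError("expected ASCII digits only")
    return READINGS_BY_LENGTH.get(len(text), [])


for sample in ("365", "1999", "121212"):
    print(sample, possible_readings(sample))
```

      Any string with more than one entry in its list can only be
      exchanged safely if both partners have agreed in advance
      which reading applies.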



      Para 4.5, Note 1 says 'The hyphen may also be used to
      indicate omitted components', but it doesn't say whether
      these are components omitted on the left, right, or in the
      middle of a date. It isn't permitted to omit components
      from the middle of a date, but the note does give the
      initial impression that 1999-- is just as valid a date
      format as --1231 is.



      All of these complications with ISO 8601 are why, when most
      people implement the standard, they actually just list the
      formats that they are going to allow, these usually being
      just a small subset of what is actually possible. I *always*
      use a four digit year, for example.
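
      A minimal sketch of that approach might look like the
      following. The particular subset, the ALLOWED table, and the
      name match_allowed are all my own choices for illustration;
      the patterns check only the shape of the string, not whether
      the month, day, or week values are in range:

```python
import re

# Hypothetical subset of formats an application might accept.
# Every entry uses a four digit year; patterns are shape-only.
ALLOWED = {
    "YYYY-MM-DD": re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    "YYYYMMDD": re.compile(r"[0-9]{8}"),
    "YYYY-DDD": re.compile(r"[0-9]{4}-[0-9]{3}"),
    "YYYY-Www-D": re.compile(r"[0-9]{4}-W[0-9]{2}-[0-9]"),
}


def match_allowed(text):
    """Return the name of the first listed format whose shape matches,
    or None if the string is not in the accepted subset."""
    for name, pattern in ALLOWED.items():
        if pattern.fullmatch(text):
            return name
    return None


for sample in ("2001-07-16", "20010716", "2001-197", "2001-W29-1", "010716"):
    print(sample, match_allowed(sample))
```

      Anything not on the list, such as the two digit year form
      010716, is simply rejected rather than guessed at.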



      A BIG MISTAKE. This paragraph, and the next three, have been
      inserted after I had written the whole of this message, and
      was just about to hit the 'Send' button.

      Basic Formats do not include separators. So, how come para
      5.2.1.2 (a) does include a hyphen separator? The YYYY-MM
      format is a *Basic* Format according to the Standard. How
      can that be right? Extended Formats include separators.
      Basic Formats do not include separators, except, we are
      supposed to believe, YYYY-MM.

      Stand up ISO. You have been caught out. The choice of defined
      formats appears to be arbitrary. I expected to see, and have
      always understood it to be so, that the choice of formats
      was based on a logical pattern. This does not appear to be
      the case. I had already spotted a couple of dodges (regarding
      YYMMDD, YYDDD, and I thought YYYYMM), but this one point,
      'YYYY-MM', has eluded me for a very long time (circa 7
      years!). I have not adjusted my tables to reflect this
      last point, so you can work out for yourself the effect
      that the ISO kludge has on the logic. To be fair, I don't
      think ISO have ever claimed that the standard was based
      on a logical pattern, other than the highest element first
      'Year-Month-Day-Hour-Minute-Second' element ordering, and
      unambiguous representation of dates and times and zones.

      This definition for YYYY-MM carries over from the earlier
      1988 standard which can be downloaded from my FTP site at:
      <ftp://ftp.qsl.net/pub/g1smd/> if you haven't already got
      it. It means that in my first table, everything on the LEFT
      is an Extended Format, except that the entry at LEFT 'c'
      is to be included on the RIGHT and be called a Basic Format.
      That just does not make any sense at all. And, it's all
      because they wanted to define 121212 as YYMMDD, rather than
      accepting YYYYMM. The 'similar' YYYY-DDD (Ordinal Date)
      format is correctly listed as an Extended Format, so this
      listing of YYYY-MM as a Basic Format just does not make
      any sense at all. I've been duped!



      This is the most complicated thing I have written for a few
      weeks. I really hope I have caught all of the typos!



      Cheers,

      Ian.


      <mail://g1smd@...>

      <http://www.qsl.net/g1smd/>
      <http://home.freeuk.net/g1smd/>
      <http://ourworld.compuserve.com/homepages/dstrange/y2k.htm>

      <ftp://ftp.funet.fi/pub/ham/misc/g1smd.zip>
      <ftp://ftp.qsl.net/pub/g1smd/>


      [2001-07-16]

      .end