Re: [sgf-std] Re: SGF question regarding CA property

  • William M. Shubert
    Message 1 of 4 , Aug 19, 2006
      I'd just like to note that UTF-16 is not possible as a CA[] parameter.
      To get to the CA[] property itself, you must be able to parse at least
      some text without knowing the character set, so the character set must
      be backward compatible with ASCII. ISO-8859-1 and UTF-8 are both
      backward compatible, but UTF-16 is not, so the parser will never be able
      to find the CA[] property in the first place.
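
The point above can be checked with a small sketch. The sample record and the
helper name are made up for illustration; the only claim tested is whether the
ASCII bytes "CA[" survive encoding unchanged.

```python
# Hypothetical sample record, used only for illustration.
RECORD = "(;GM[1]CA[UTF-8]C[hello])"

def marker_is_findable(encoding: str) -> bool:
    """Return True if the ASCII bytes 'CA[' appear verbatim in the encoded record."""
    return b"CA[" in RECORD.encode(encoding)

# ISO-8859-1 and UTF-8 keep ASCII characters as single identical bytes;
# UTF-16 interleaves a zero byte with every ASCII character.
results = {
    enc: marker_is_findable(enc)
    for enc in ("iso-8859-1", "utf-8", "utf-16-le", "utf-16-be")
}

for enc, found in results.items():
    print(f"{enc}: {'found' if found else 'not found'}")
```

A parser that scans for ASCII property identifiers would therefore locate CA[]
in the first two encodings but not in either UTF-16 byte order.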

      On Fri, 2006-08-18 at 22:07 +0200, Arno Hollosi wrote:
      ...
      > I guess I should update the spec to make UTF-8 (and maybe other Unicode
      > encodings such as UTF-16) legal values as well.
      ...
    • Arno Hollosi
      Message 2 of 4 , Aug 24, 2006
        > backward compatible, but UTF-16 is not, so the parser will never be able
        > to find the CA[] property in the first place.

        Not necessarily. We could define that the CA[] property is only valid for
        text fields *after* the CA[] property occurs in the text.

        But there is one more argument against UTF-16: the parser should be able
        to find the end of a text property by looking for Latin1 ']' [*] (i.e.
        without knowing about the encoding of the text). I am not sure that UTF-16
        has no byte value of ']' occurring e.g. as part of some Chinese character.

        So the argument for Latin1 compatibility is a very strong one.
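
The open question about a stray ']' byte can be tested directly. The sketch
below uses one sample character, U+4E5D (the Chinese numeral nine), chosen
because its code point contains the value 0x5D; the function name is
hypothetical.

```python
CLOSE_BRACKET = 0x5D  # byte value of ']' in ASCII and Latin-1

def has_stray_bracket_byte(text: str, encoding: str) -> bool:
    """True if the encoded text contains byte 0x5D although the text has no ']'."""
    if "]" in text:
        raise ValueError("text must not contain a literal ']'")
    return CLOSE_BRACKET in text.encode(encoding)

nine = "\u4e5d"  # a single Chinese character, no ']' in it

stray_utf16_be = has_stray_bracket_byte(nine, "utf-16-be")  # bytes 4E 5D
stray_utf16_le = has_stray_bracket_byte(nine, "utf-16-le")  # bytes 5D 4E
stray_utf8 = has_stray_bracket_byte(nine, "utf-8")          # bytes E4 B9 9D

print(stray_utf16_be, stray_utf16_le, stray_utf8)
```

In UTF-8 every byte of a multi-byte sequence is 0x80 or higher, so a byte-level
search for ']' cannot be fooled there, whereas in UTF-16 it can.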

        /Arno

        [*] actually: Latin1 ']' that is not preceded by Latin1 '\'
      • Lauri Paatero
        Message 3 of 4 , Aug 24, 2006
          Hi,

          In order to promote compatibility and keep testing work sensible, I
          think the standard should say that UTF-8 is the only allowed encoding
          for Unicode. Other Unicode encodings do not give any additional value
          (or do they?).

          If CA affected only the properties after it, then the order of
          properties in the root node would become critical.
          This is quite a troublesome change, as currently the order is irrelevant.

          It is possible to parse any character set, even binary data, from text
          values. The process goes as follows:
          - The input is a stream of bytes.
          - These bytes are interpreted as Latin-1 to find the bytes that belong
          to text values. Recognition can understand quoting (\), as it is in
          Latin-1, so there is no need to know the actual character set.
          - The Latin-1 quoting bytes are removed from the text values.
          - The remaining bytes (quoting removed) are then interpreted using the
          character set found in the CA property.
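
The steps above can be sketched roughly as follows. This is a simplified
illustration, not a full SGF parser: the function names are hypothetical, only
the first value of each property is kept, tree structure and soft line breaks
are ignored, and the fallback to ISO-8859-1 when CA is absent is an assumption
based on the default given in the SGF spec.

```python
def scan_properties(data: bytes) -> dict:
    """Collect raw (still undecoded) property values, keyed by identifier.

    Bytes are compared only against the Latin-1/ASCII codes for '[', ']'
    and '\\', so the text encoding does not need to be known at this stage.
    """
    props = {}
    ident = b""
    i = 0
    while i < len(data):
        byte = data[i:i + 1]
        if b"A" <= byte <= b"Z":
            ident += byte
            i += 1
        elif byte == b"[" and ident:
            i += 1
            value = bytearray()
            while i < len(data) and data[i:i + 1] != b"]":
                if data[i:i + 1] == b"\\" and i + 1 < len(data):
                    i += 1  # drop the quoting byte, keep the byte it protects
                value += data[i:i + 1]
                i += 1
            props.setdefault(ident.decode("ascii"), bytes(value))
            ident = b""
            i += 1  # skip the closing ']'
        else:
            ident = b""
            i += 1
    return props

def decode_text(data: bytes, prop: str) -> str:
    """Decode one property value using the charset named in CA."""
    props = scan_properties(data)
    # Assumed default when CA is missing: ISO-8859-1.
    charset = props.get("CA", b"ISO-8859-1").decode("ascii")
    return props[prop].decode(charset)

# Hypothetical sample: a UTF-8 comment containing an escaped ']'.
sample = "(;GM[1]CA[UTF-8]C[\u4e5d \\] x])".encode("utf-8")
print(decode_text(sample, "C"))
```

Note that CA is looked up after the whole byte stream has been scanned, so in
this sketch the position of CA relative to the other properties does not matter.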

          While this is possible, I do not see any reason to expect applications
          to implement this. For example, in Java 1.3 this would create
          significant overhead.

          regards
          Lauri Paatero


          Arno Hollosi wrote:
          >> backward compatible, but UTF-16 is not, so the parser will never be able
          >> to find the CA[] property in the first place.
          >>
          >
          > Not necessarily. We could define that the CA[] property is only valid for
          > text fields *after* the CA[] property occurs in the text.
          >
          > But there is one more argument against UTF-16: the parser should be able
          > to find the end of a text property by looking for Latin1 ']' [*] (i.e.
          > without knowing about the encoding of the text). I am not sure that UTF-16
          > has no byte value of ']' occurring e.g. as part of some Chinese character.
          >
          > So the argument for Latin1 compatibility is a very strong one.
          >
          > /Arno
          >
          > [*] actually: Latin1 ']' that is not preceded by Latin1 '\'
          >
          >
          >
          > SGF spec: http://www.red-bean.com/sgf/
          > Contact: Arno Hollosi <ahollosi@...>