
Re: [sgf-std] Re: SGF question regarding CA property

  • Arno Hollosi
    Message 1 of 4 , Aug 24 5:07 AM
      > backward compatible, but UTF-16 is not, so the parser will never be able
      > to find the CA[] property in the first place.

      Not necessarily. We could define that the CA[] property is only valid for
      text fields *after* the CA[] property occurs in the text.

      But there is one more argument against UTF-16: the parser should be able
      to find the end of a text property by looking for Latin1 ']' [*] (i.e.
      without knowing about the encoding of the text). I am not sure that UTF-16
      never produces the byte value of ']', e.g. as part of some Chinese character.

      So the argument for Latin1 compatibility is a very strong one.

      /Arno

      [*] actually: Latin1 ']' that is not preceded by Latin1 '\'
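
To illustrate the concern above, here is a small sketch that checks whether the byte 0x5D (Latin1 ']') can turn up inside encoded text. The helper name `contains_close_byte` and the sample code point U+5D07 (a CJK ideograph) are chosen for illustration only; they do not come from the thread.

```python
CLOSE = b"]"  # byte 0x5D, the Latin1 closing bracket


def contains_close_byte(text: str, encoding: str) -> bool:
    """Report whether the encoded form of `text` contains the byte 0x5D."""
    return CLOSE in text.encode(encoding)


# Assumption for illustration: any code point with 0x5D as one of its two
# UTF-16 bytes will do; U+5D07 lies in the CJK Unified Ideographs block.
sample = "\u5d07"

utf16_hit = contains_close_byte(sample, "utf-16-be")  # bytes are 0x5D 0x07
utf8_hit = contains_close_byte(sample, "utf-8")       # multi-byte UTF-8 uses only bytes >= 0x80
```

In UTF-16 the ']' byte does occur inside this character, whereas in UTF-8 every byte of a multi-byte sequence is 0x80 or higher, so a scan for Latin1 ']' cannot be fooled by it.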
    • Lauri Paatero
      Message 2 of 4 , Aug 24 9:44 AM
        Hi,

        In order to promote compatibility and keep the testing work sensible, I
        think the standard should say that UTF-8 is the only allowed encoding for
        Unicode. Other Unicode encodings do not give any additional value (or do
        they?).

        If CA affected only the properties after it, then the order of properties
        in the root node would become critical.
        This would be quite a troublesome change, as currently the order is irrelevant.

        It is possible to parse any character set, even binary data, from text
        values. The process goes as follows:
        - The input is a stream of bytes.
        - These bytes are interpreted as Latin-1 to find the bytes that belong to
          text values. Recognition can handle quoting (\), as it is in Latin-1, so
          there is no need to know the actual character set.
        - The Latin-1 quoting bytes are removed from the text values.
        - The remaining bytes (quoting removed) are then interpreted using the
          character set found in the CA property.
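
The steps above can be sketched roughly as follows. This is a minimal illustration, not a full SGF parser: the function names, the partial `TEXT_PROPS` set, the Latin-1 default, and the simplified escape handling (a backslash just makes the next byte literal) are all assumptions made for the example.

```python
# Hypothetical, partial set of properties whose values are treated as text.
TEXT_PROPS = {"C", "GC", "GN", "PB", "PW"}


def scan_properties(data: bytes):
    """Scan bytes as Latin-1 and return (identifier, raw_value) pairs.

    Backslash quoting is removed here, before any charset is applied.
    """
    props = []
    ident = b""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if 0x41 <= b <= 0x5A:  # 'A'..'Z' starts a property identifier
            start = i
            while i < n and 0x41 <= data[i] <= 0x5A:
                i += 1
            ident = data[start:i]
            continue
        if b == 0x5B and ident:  # '[' opens a value
            i += 1
            value = bytearray()
            while i < n and data[i] != 0x5D:  # stop at unescaped ']'
                if data[i] == 0x5C and i + 1 < n:  # '\' quotes the next byte
                    i += 1
                value.append(data[i])
                i += 1
            props.append((ident.decode("latin-1"), bytes(value)))
            i += 1  # skip the closing ']'
            continue
        if b not in b" \t\r\n":
            ident = b""  # ';', '(' and ')' end the current property
        i += 1
    return props


def decode_values(data: bytes, default: str = "latin-1"):
    """Decode text values using the charset named in CA, wherever CA appears."""
    props = scan_properties(data)
    charset = default
    for ident, raw in props:
        if ident == "CA":
            charset = raw.decode("latin-1").strip()
            break
    return [
        (ident, raw.decode(charset if ident in TEXT_PROPS else "latin-1"))
        for ident, raw in props
    ]


# Usage: a UTF-16 text value whose bytes were escaped at the byte level.
raw = "\u5d07".encode("utf-16-be")  # b']\x07'
escaped = raw.replace(b"\\", b"\\\\").replace(b"]", b"\\]")
decoded = decode_values(b"(;CA[UTF-16BE]C[" + escaped + b"])")
```

Because the whole property list is scanned before anything is decoded, the position of CA within the node does not matter in this sketch. The cost is a second pass over the values, which is the kind of overhead in question.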

        While this is possible, I do not see any reason to expect applications to
        implement it. For example, in Java 1.3 this would create significant
        overhead.

        regards
        Lauri Paatero

