Re: SGF question regarding CA property

  • Arno Hollosi
    Message 1 of 4 , Aug 18 1:07 PM
      Dear Christian,

      I'm CCing the sgf-std list, as I guess this is of interest to others as
      well.

      > I've read http://www.red-bean.com/sgf/properties.html#CA, but it's
      > not clear to me what values the CA property can have.
      >
      > I understand that "US-ASCII" and "ISO-8859-1" are legal values
      > according to RFC 1345, but my gGo 1.0 generated properties like
      > "CA[UTF-8]" and as far as I can see, this isn't defined by RFC 1345,
      > so I'd think this would be an illegal value. Am I missing something?

      Technically speaking you are right. De facto, just about all programs
      that are able to deal with characters outside Latin1 use UTF-8 encoding.
      I guess I should update the spec to make UTF-8 (and maybe other Unicode
      encodings such as UTF-16) legal values as well.

      > Do you know to what extent SGF parsers actually deal with all the
      > character sets RFC 1345 defines? For example, how is Japanese
      > typically dealt with in terms of SGF?

      UTF-8.

      Thanks for bringing this to my attention. I shall update the spec soonish.

      regards,
      /Arno
    • William M. Shubert
      Message 2 of 4 , Aug 19 12:00 AM
        I'd just like to note that UTF-16 is not possible as a CA[] parameter.
        To get to the CA[] property itself, you must be able to parse at least
        some text without knowing the character set, so the character set must
        be backward compatible with ASCII. ISO-8859-1 and UTF-8 are both
        backward compatible, but UTF-16 is not, so the parser will never be able
        to find the CA[] property in the first place.
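
        The point can be checked with a short Python sketch. The sample
        record and the helper name can_locate_ca are made up for
        illustration; the CA value in the sample is only a label and is not
        changed when the text is re-encoded.

```python
# A parser that does not yet know the charset can only search for
# ASCII byte patterns such as b"CA[".
sgf_text = "(;GM[1]FF[4]CA[UTF-8]C[hello])"


def can_locate_ca(raw: bytes) -> bool:
    """Return True if the ASCII byte sequence 'CA[' occurs in raw."""
    return b"CA[" in raw


for codec in ("utf-8", "iso-8859-1", "utf-16-le"):
    # UTF-16 interleaves a zero byte with every ASCII character,
    # so the three bytes of "CA[" are never adjacent.
    print(codec, can_locate_ca(sgf_text.encode(codec)))
```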

        On Fri, 2006-08-18 at 22:07 +0200, Arno Hollosi wrote:
        ...
        > I guess I should update the spec to make UTF-8 (and maybe other Unicode
        > encodings such as UTF-16) legal values as well.
        ...
      • Arno Hollosi
        Message 3 of 4 , Aug 24 5:07 AM
          > backward compatible, but UTF-16 is not, so the parser will never be able
          > to find the CA[] property in the first place.

          Not necessarily. We could define that the CA[] property is only valid for
          text fields *after* the CA[] property occurs in the text.

          But there is one more argument against UTF-16: the parser should be able
          to find the end of a text property by looking for Latin1 ']' [*] (i.e.
          without knowing about the encoding of the text). I am not sure that UTF-16
          never has the byte value of ']' occurring, e.g. as part of some Chinese
          character.

          So the argument for Latin1 compatibility is a very strong one.

          /Arno

          [*] actually: Latin1 ']' that is not preceded by Latin1 '\'
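
          The doubt about UTF-16 can be tested directly. The sketch below
          picks one code point from the CJK Unified Ideographs block, U+5D07,
          purely because its high byte equals the byte value of ']' (0x5D).
          The variable names are illustrative only.

```python
CLOSE_BRACKET = 0x5D  # byte value of ']' in ASCII and Latin-1

ch = "\u5d07"  # a CJK ideograph whose code point has 0x5D as its high byte

utf16 = ch.encode("utf-16-be")  # two bytes: 0x5D 0x07
utf8 = ch.encode("utf-8")       # three bytes, all >= 0x80

print(utf16.hex(), CLOSE_BRACKET in utf16)
print(utf8.hex(), CLOSE_BRACKET in utf8)
```

          In UTF-16 the character contains a byte that a charset-agnostic
          parser would mistake for ']'. In UTF-8 every byte of a multi-byte
          sequence is 0x80 or above, so the same mistake cannot happen.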
        • Lauri Paatero
          Message 4 of 4 , Aug 24 9:44 AM
            Hi,

            In order to promote compatibility and keep the testing work sensible, I
            think the standard should say that UTF-8 is the only allowed encoding for
            Unicode. Other Unicode encodings do not give any additional value (or do
            they?).

            If CA affected only the properties after it, then the order of properties
            in the root node would become critical.
            This would be quite a troublesome change, as currently the order is
            irrelevant.

            It is possible to parse any character set, even binary data, from text
            values. The process goes as follows:
            - The input is a stream of bytes.
            - These bytes are interpreted as Latin-1 to find the bytes that belong to
            text values. Recognition can handle quoting (\), as it is in Latin-1, so
            there is no need to know the actual character set.
            - The Latin-1 quoting bytes ("\") are removed from the text values.
            - Now the bytes (with quoting removed) are interpreted using the character
            set found in the CA property.
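
            The steps above could be sketched in Python roughly as follows.
            This is a simplified illustration, not a complete SGF parser:
            the function names are hypothetical, the game-tree structure and
            soft line breaks are ignored, and it assumes the CA value is a
            codec name that Python recognizes. The fallback to ISO-8859-1 is
            the FF[4] default when CA is absent.

```python
def scan_properties(raw: bytes) -> list[tuple[str, bytes]]:
    """Find (identifier, unescaped value bytes) pairs without knowing the charset."""
    props = []
    ident = bytearray()
    last_ident = ""
    i = 0
    while i < len(raw):
        b = raw[i]
        if b == ord("["):
            if ident:
                last_ident = ident.decode("ascii")
                ident.clear()
            value = bytearray()
            i += 1
            # Read up to the first ']' that is not escaped by a backslash.
            while i < len(raw) and raw[i] != ord("]"):
                if raw[i] == ord("\\") and i + 1 < len(raw):
                    i += 1  # drop the backslash, keep the escaped byte
                value.append(raw[i])
                i += 1
            props.append((last_ident, bytes(value)))
        elif ord("A") <= b <= ord("Z"):
            ident.append(b)  # property identifiers are uppercase ASCII letters
        else:
            ident.clear()
        i += 1
    return props


def decode_sgf_text(raw: bytes) -> dict[str, list[str]]:
    """Decode all property values using the charset named in CA."""
    props = scan_properties(raw)
    charset = "iso-8859-1"  # default when no CA property is present
    for ident, value in props:
        if ident == "CA":
            charset = value.decode("ascii")
    decoded: dict[str, list[str]] = {}
    for ident, value in props:
        decoded.setdefault(ident, []).append(value.decode(charset))
    return decoded


sample = "(;FF[4]CA[UTF-8]C[囲碁 \\] ok])".encode("utf-8")
print(decode_sgf_text(sample)["C"])
```

            Because the scan collects every value as raw bytes before any
            decoding happens, the position of CA among the properties does
            not matter in this sketch.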

            While this is possible, I do not see any reason to expect applications to
            implement this. For example, in Java 1.3 this would create significant
            overhead.

            regards
            Lauri Paatero

