
Re: [sgf-std] Re: SGF question regarding CA property

  • Lauri Paatero
    Message 1 of 4 , Aug 24, 2006
      Hi,

      In order to promote compatibility and keep the testing work manageable, I
      think the standard should say that UTF-8 is the only allowed encoding for
      Unicode. Other Unicode encodings do not give any additional value (or do
      they?).
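
      A small check of the byte-level concern Arno raises in the quoted text
      below, as a sketch only. The helper name contains_close_bracket is
      made up for this illustration. It takes one Chinese character, U+4E5D,
      and looks for the Latin-1 ']' byte (0x5D) in its UTF-16 and UTF-8
      encodings:

```python
CLOSE_BRACKET = b"]"  # 0x5D, the byte a charset-agnostic parser scans for


def contains_close_bracket(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains the byte 0x5D."""
    return CLOSE_BRACKET in text.encode(encoding)


ch = "\u4e5d"  # a CJK ideograph whose code point ends in 0x5D

# UTF-16 (big-endian) stores U+4E5D as the two bytes 0x4E 0x5D,
# so the second byte looks exactly like a Latin-1 ']'.
print(contains_close_bracket(ch, "utf-16-be"))

# UTF-8 encodes every non-ASCII character using only bytes >= 0x80,
# so a stray ']' byte cannot appear inside a multi-byte character.
print(contains_close_bracket(ch, "utf-8"))
print(all(b >= 0x80 for b in ch.encode("utf-8")))
```

      So with UTF-8 a parser can still find '[', ']' and '\' by looking at
      bytes alone, while with UTF-16 it cannot.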

      If CA affected only the properties after it, then the order of
      properties in the root node would become critical.
      This would be quite a troublesome change, as currently the order is
      irrelevant.

      It is possible to parse any character set, even binary data, from text
      values. The process goes as follows:
      - The input is a stream of bytes.
      - These bytes are interpreted as Latin-1 to find the bytes that belong
      to text values. This recognition can handle escaping (\), since the
      escape character is a Latin-1 byte, so there is no need to know the
      actual character set.
      - The Latin-1 escape bytes (\) are removed from the text values.
      - The remaining bytes (escapes removed) are then interpreted using the
      character set found in the CA property.
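
      The steps above could look roughly like the following sketch. The
      function names scan_properties and decode_properties are made up for
      this illustration, and several simplifications are assumed: property
      identifiers are runs of uppercase ASCII letters, a backslash simply
      escapes the next byte (soft line breaks are not handled), the CA value
      is a codec name Python recognises, and Latin-1 is used when CA is
      absent.

```python
def scan_properties(data: bytes) -> list[tuple[bytes, bytes]]:
    """Find (identifier, raw value) pairs by looking at bytes only."""
    props = []
    ident = b""
    i = 0
    while i < len(data):
        b = data[i:i + 1]
        if b"A" <= b <= b"Z":
            # Collect a run of uppercase letters as the property identifier.
            start = i
            while i < len(data) and b"A" <= data[i:i + 1] <= b"Z":
                i += 1
            ident = data[start:i]
        elif b == b"[":
            # Copy value bytes up to the first unescaped ']'.
            i += 1
            value = bytearray()
            while i < len(data) and data[i:i + 1] != b"]":
                if data[i:i + 1] == b"\\" and i + 1 < len(data):
                    i += 1  # drop the escape byte, keep the escaped byte
                value += data[i:i + 1]
                i += 1
            props.append((ident, bytes(value)))
            i += 1  # skip the closing ']'
        else:
            i += 1
    return props


def decode_properties(data: bytes, default: str = "latin-1") -> list[tuple[str, str]]:
    """Decode all values with the charset named in CA, wherever CA appears."""
    props = scan_properties(data)
    charset = default
    for ident, raw in props:
        if ident == b"CA":
            charset = raw.decode("ascii").strip()
            break
    return [(ident.decode("ascii"), raw.decode(charset)) for ident, raw in props]


# CA comes last here on purpose: property order does not matter.
sample = "(;GM[1]C[\u4e5d [ok\\]]CA[UTF-8])".encode("utf-8")
for name, value in decode_properties(sample):
    print(name, repr(value))
```

      Because the charset is looked up only after the whole node has been
      scanned, CA can appear anywhere among the properties. This works only
      for encodings that keep '[', ']' and '\' at their Latin-1 byte values.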

      While this is possible, I do not see any reason to expect applications
      to implement it. For example, in Java 1.3 this would create significant
      overhead.

      regards
      Lauri Paatero


      Arno Hollosi wrote:
      >> backward compatible, but UTF-16 is not, so the parser will never be able
      >> to find the CA[] property in the first place.
      >>
      >
      > Not necessarily. We could define that the CA[] property is only valid for
      > text fields *after* the CA[] property occurs in the text.
      >
      > But there is one more argument against UTF-16: the parser should be able
      > to find the end of a text property by looking for Latin1 ']' [*] (i.e.
      > without knowing about the encoding of the text). I am not sure that UTF-16
      > has no byte value of ']' occurring, e.g., as part of some Chinese character.
      >
      > So the argument for Latin1 compatibility is a very strong one.
      >
      > /Arno
      >
      > [*] actually: Latin1 ']' that is not preceded by Latin1 '\'
      >
      >
      >
      > SGF spec: http://www.red-bean.com/sgf/
      > Contact: Arno Hollosi <ahollosi@...>