Re: [sgf-std] Re: SGF question regarding CA property
In order to promote compatibility and keep the testing work sensible, I
think the standard should say that UTF-8 is the only allowed encoding for
Unicode. Other Unicode encodings do not give any additional value (or do
If CA affected only the properties after it, then the order of properties
in the root node would become critical.
This would be quite a troublesome change, as currently the order is
irrelevant.
It is possible to parse any character set, even binary data, from text
values. The process goes as follows:
- The input is a stream of bytes.
- These bytes are interpreted as Latin-1 to find the bytes that belong to
text values. Recognition can understand quoting (\), as it is in Latin-1,
so there is no need to know the actual character set.
- The Latin-1 quoting bytes ("\") are removed from the text values.
- Now the bytes (with quoting removed) are interpreted using the character
set found in the CA property.
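
The steps above can be sketched roughly as follows. This is only an illustration, not a full SGF parser: the function names (raw_values, unquote, decode_values) are made up for this example, every property value is treated as text, soft line breaks are ignored, and it assumes the writer applied the "\" quoting at the byte level, after encoding.

```python
def raw_values(data: bytes) -> list[tuple[str, bytes]]:
    """Find property values by looking only at bytes, never at characters."""
    values = []
    ident = bytearray()
    last_ident = ""
    i = 0
    while i < len(data):
        byte = data[i]
        if 0x41 <= byte <= 0x5A:            # 'A'..'Z': part of an identifier
            ident.append(byte)
            i += 1
        elif byte == 0x5B:                  # '[' opens a value
            if ident:
                last_ident = ident.decode("ascii")
                ident.clear()
            start = j = i + 1
            while j < len(data) and data[j] != 0x5D:   # ']' closes the value...
                j += 2 if data[j] == 0x5C else 1       # ...unless quoted by '\'
            values.append((last_ident, data[start:j]))
            i = j + 1
        else:
            if byte not in b" \t\r\n":      # anything but whitespace ends an identifier
                ident.clear()
            i += 1
    return values


def unquote(raw: bytes) -> bytes:
    """Remove the quoting bytes: drop each '\\' and keep the byte after it."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == 0x5C and i + 1 < len(raw):
            i += 1
        out.append(raw[i])
        i += 1
    return bytes(out)


def decode_values(data: bytes, default: str = "latin-1") -> list[tuple[str, str]]:
    """Decode all values with the charset named in CA, wherever CA appears."""
    pairs = [(ident, unquote(raw)) for ident, raw in raw_values(data)]
    charset = default
    for ident, value in pairs:
        if ident == "CA":
            charset = value.decode("ascii").strip()
            break
    return [(ident, value.decode(charset)) for ident, value in pairs]


# CA comes last here on purpose: the order of properties does not matter.
sample = b"(;FF[4]C[" + "碁 [ok\\]".encode("utf-8") + b"]CA[UTF-8])"
print(decode_values(sample))
```

Because the character set is applied only in the last step, the position of CA within the node makes no difference to the result.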
While this is possible, I do not see any reason to expect applications to
implement this. For example in Java 1.3 this would create significant
Arno Hollosi wrote:
>> backward compatible, but UTF-16 is not, so the parser will never be able
>> to find the CA property in the first place.
> Not necessarily. We could define that the CA property is only valid for
> text fields *after* the CA property occurs in the text.
> But there is one more argument against UTF-16: the parser should be able
> to find the end of a text property by looking for Latin1 ']' [*] (i.e.
> without knowing about the encoding of the text) I am not sure that UTF-16
> has no byte value of ']' occurring e.g. as part of some Chinese character.
> So the argument for Latin1 compatibility is a very strong one.
> [*] actually: Latin1 ']' that is not preceded by Latin1 '\'
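
The concern about UTF-16 quoted above can be checked with a small sketch. The character chosen here, 九 (U+4E5D), is just one example picked for illustration, and the helper name has_bracket_byte is made up. In UTF-8 every byte of a multi-byte sequence is 0x80 or above, so it cannot collide with Latin-1 ']' (0x5D) or '\' (0x5C); UTF-16 gives no such guarantee.

```python
def has_bracket_byte(text: str, codec: str) -> bool:
    """True if the encoded form of text contains the byte 0x5D (Latin-1 ']')."""
    return b"]" in text.encode(codec)


# U+4E5D is an arbitrary example: its UTF-16 code unit is 0x4E5D.
for codec in ("utf-16-be", "utf-16-le", "utf-8"):
    print(codec, "九".encode(codec).hex(), has_bracket_byte("九", codec))
```

A byte-level scan for ']' would therefore cut a UTF-16 text value short at this character, while the same scan over UTF-8 is safe.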
> SGF spec: http://www.red-bean.com/sgf/
> Contact: Arno Hollosi <ahollosi@...>