Message 1 of 2, May 3, 2007
I have recently taken Arno's sgfc source and built a program that processes collections of
SGF files and produces statistics on property and value usage. I've been building a goban
in Second Life and working on an SGF viewer feature for it. I wanted to know if some of the
more obscure properties were used enough to warrant implementation.
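
For readers who want to see what such a tally involves, here is a rough sketch. It is not the program described in this post and it does not use sgfc's parser. It is a simplified stand-alone scanner, and the function name countProperties is made up for illustration. It counts each occurrence of a property identifier once, however many values follow it, and it ignores lowercase letters in identifiers as older SGF files allow.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: tally property identifiers in one SGF text.
// Assumption: a hand-rolled scanner is good enough for rough statistics;
// a real tool would rely on a full parser such as sgfc.
std::map<std::string, int> countProperties(const std::string& sgf) {
    std::map<std::string, int> counts;
    std::string ident;      // uppercase letters seen since the last value
    bool inValue = false;   // true while between '[' and the matching ']'

    for (std::size_t i = 0; i < sgf.size(); ++i) {
        const char c = sgf[i];
        if (inValue) {
            if (c == '\\') {
                ++i;                    // skip the escaped character
            } else if (c == ']') {
                inValue = false;        // end of this property value
            }
        } else if (c == '[') {
            if (!ident.empty()) {       // first value of a property
                ++counts[ident];
                ident.clear();          // further values are not recounted
            }
            inValue = true;
        } else if (std::isupper(static_cast<unsigned char>(c))) {
            ident += c;                 // build up the property identifier
        } else if (c == ';' || c == '(' || c == ')') {
            ident.clear();              // node or tree boundary
        }
    }
    return counts;
}
```

Calling this on the contents of each file and summing the resulting maps gives per-collection totals, which is enough to spot properties that never appear.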
I ran it over about five thousand SGF files from the GoDatabases page over at Sensei's and
all five thousand problems from goproblems.com with interesting results. For example:
* Other than text comments (C and N), move and node annotation is basically never used.
* The graphic markup properties AR, LN, and DD are never used. Nor is SL.
* The character set (CA), if ever specified, is almost always UTF-8.
The source is available:
It is a C++ file that includes its own main(). Just link it with all the object files from sgfc
except main.o and save.o.
The full output from my two 5,000-file runs is at:
Of course, I'd like to run it over a few other collections to get a wider representation, but I
thought this might be interesting data to anyone working on SGF FF and/or some future
XML go format.

Message 1 of 2, May 3, 2007
Mark,
nice work. Thanks for the information.
> and produces statistics on property and value usage.
It would be interesting to run this program also on e.g. the GTL archive.
It contains game reviews and therefore I would assume that it has more
markup properties than you would find in a collection of professional
games. I don't have time to do this myself during the next two weeks, so
if anyone beats me to it post your results here :o)
> I wanted to know if some of the
> more obscure properties were used enough to warrant implementation.
Heresy! Of course, one could argue that by leaving out support for such
properties, you contribute to the status quo and such properties will
never emerge from obscurity. I for one am planning to integrate AR & LN
into SenseisLibrary's diagrams.
> * The character set (CA), if ever specified, is almost always UTF-8.
Which goes to show that I should nail down UTF-8 usage in the
specification according to our discussion some months ago.