
SGF Statistics

  • Mark
    Message 1 of 2 , May 3 7:27 AM
      I have recently taken Arno's sgfc source and built a program that processes collections of
      SGF files and produces statistics on property and value usage. I've been building a goban
      in Second Life and working on an SGF viewer feature for it. I wanted to know if some of the
      more obscure properties were used enough to warrant implementation.

      I ran it over about five thousand SGF files from the GoDatabases page over at Sensei's and
      all five thousand problems from goproblems.com with interesting results. For example:

      * Other than text comments (C and N), move and node annotation is basically never used.
      * The graphic markup properties AR, LN, and DD are never used. Nor is SL.
      * The character set (CA), if ever specified, is almost always UTF-8.

      The source is available:

      http://www.ozonehouse.com/mark/sgf/stats.cpp

      It is a C++ file that includes its own main(). Just link it with all the object files from sgfc
      except main.o and save.o.
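
      For anyone who just wants the general idea without building sgfc, here is a rough
      sketch of property counting. It is not the code in stats.cpp and it does not use
      sgfc's parser; countProperties is a hypothetical helper that scans the raw SGF text,
      treats runs of uppercase letters outside of [...] as a property identifier, and
      honors backslash escapes inside values. It does no validation at all, so treat the
      numbers it gives as approximate.

      ```cpp
      #include <cctype>
      #include <cstddef>
      #include <map>
      #include <string>

      // Hypothetical helper (not part of sgfc or stats.cpp): count how many
      // times each property identifier occurs in a chunk of SGF text.
      std::map<std::string, int> countProperties(const std::string& sgf) {
          std::map<std::string, int> counts;
          std::string ident;     // identifier letters seen since the last value
          bool inValue = false;  // true while scanning inside [...]

          for (std::size_t i = 0; i < sgf.size(); ++i) {
              const char c = sgf[i];
              if (inValue) {
                  if (c == '\\') {
                      ++i;  // skip the escaped character, e.g. "\]"
                  } else if (c == ']') {
                      inValue = false;
                  }
                  continue;
              }
              if (c == '[') {
                  // The first value after an identifier counts the property once;
                  // further values of the same property (AB[aa][bb]) are skipped
                  // because ident is empty by then.
                  if (!ident.empty()) {
                      ++counts[ident];
                      ident.clear();
                  }
                  inValue = true;
              } else if (std::isupper(static_cast<unsigned char>(c))) {
                  ident += c;
              } else if (c == ';' || c == '(' || c == ')') {
                  ident.clear();  // node or tree boundary: drop stray letters
              }
              // Anything else (whitespace, lowercase letters from old-style
              // long property names) is ignored.
          }
          return counts;
      }
      ```

      Read each file into a std::string, call countProperties on it, and add the
      per-file maps together to get totals for a collection.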

      The full output of my two 5,000-file runs is at:

      http://www.ozonehouse.com/mark/sgf/game-stats.txt
      http://www.ozonehouse.com/mark/sgf/problem-stats.txt

      Of course, I'd like to run it over a few other collections to get a wider representation, but I
      thought this might be interesting data to anyone working on SGF FF[5] and/or some future
      XML go format.


      - Mark
    • Arno Hollosi
      Message 2 of 2 , May 3 8:45 AM
        Mark,

        nice work. Thanks for the information.

        > and produces statistics on property and value usage.

        It would be interesting to run this program also on e.g. the GTL archive.
        It contains game reviews and therefore I would assume that it has more
        markup properties than you would find in a collection of professional
        games. I don't have time to do this myself during the next two weeks, so
        if anyone beats me to it, post your results here :o)

        > I wanted to know if some of the
        > more obscure properties were used enough to warrant implementation.

        Heresy! Of course, one could argue that by leaving out support for such
        properties, you contribute to the status quo and such properties will
        never emerge from obscurity. I for one am planning to integrate AR & LN
        into SenseisLibrary's diagrams.

        > * The character set (CA), if ever specified, is almost always UTF-8.

        Which goes to show that I should nail down UTF-8 usage in the
        specification according to our discussion some months ago.

        /Arno