39720 Re: Speller data structures

  • Bram Moolenaar
    May 15, 2005
      Olaf Seibert wrote:

      > Recently I wrote about a trie data structure for spelling word lists.
      > There was some doubt as to the memory efficiency. Therefore I did a
      > small test with the Polish wordlist, which was reputed to be 60
      > megabytes. The one I found was only 40 megabytes, but 60 in some
      > expanded form for Aspell. My file was only 1 megabyte.
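      For readers unfamiliar with the structure: a trie stores every shared prefix only once, so the inflected forms of one stem reuse the stem's nodes. The sketch below is a minimal illustration, not Olaf's code (which is not shown here). It assumes lowercase ASCII words and a fixed node pool, and the names `trie_insert` and `trie_contains` are hypothetical. The Polish forms are only sample input.

      ```c
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical minimal trie: lowercase ASCII only, fixed node pool. */
      #define ALPHA 26
      #define MAX_NODES 1024

      typedef struct {
          int child[ALPHA];  /* index of child node; 0 = none (node 0 is the root) */
          int is_word;       /* 1 if a word ends at this node */
      } Node;

      static Node pool[MAX_NODES];
      static int node_count = 1;  /* node 0 is the root */

      /* Insert a lowercase word; returns 0 on success, -1 on bad input or a full pool. */
      static int trie_insert(const char *w) {
          for (const char *p = w; *p; p++)
              if (*p < 'a' || *p > 'z') return -1;
          int n = 0;
          for (; *w; w++) {
              int c = *w - 'a';
              if (pool[n].child[c] == 0) {      /* prefix not seen yet: add a node */
                  if (node_count == MAX_NODES) return -1;
                  pool[n].child[c] = node_count++;
              }
              n = pool[n].child[c];             /* shared prefix: reuse the node */
          }
          pool[n].is_word = 1;
          return 0;
      }

      /* Returns 1 if w was inserted as a whole word, 0 otherwise. */
      static int trie_contains(const char *w) {
          int n = 0;
          for (; *w; w++) {
              if (*w < 'a' || *w > 'z') return 0;
              n = pool[n].child[*w - 'a'];
              if (n == 0) return 0;
          }
          return pool[n].is_word;
      }

      int main(void) {
          /* Sample input: a few Polish word forms sharing stems. */
          const char *words[] = {"dom", "domu", "domem", "domy", "dobry", "dobra"};
          size_t nwords = sizeof words / sizeof words[0];
          size_t letters = 0;
          for (size_t i = 0; i < nwords; i++) {
              trie_insert(words[i]);
              letters += strlen(words[i]);
          }
          printf("%zu words, %zu letters, %d nodes\n", nwords, letters, node_count);
          printf("domem: %d, dome: %d\n", trie_contains("domem"), trie_contains("dome"));
          return 0;
      }
      ```

      It prints `6 words, 26 letters, 12 nodes`, because the prefixes `do` and `dom` are stored only once. A plain trie like this shares prefixes only, not word endings.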

      That's a good result. The Polish .spl file for Vim is currently about
      3 Mbyte. That includes flags and handling of non-word characters, so
      the two are not directly comparable.

      I'll have a better look at the code later. It looks like you could
      store a character as an int at a node to support Unicode. That should
      not increase memory use much (the struct size is often rounded up to a
      multiple of 4 bytes anyway).
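      The padding argument can be checked by comparing `sizeof` for two node layouts. Both structs below are hypothetical, with illustrative field names; the actual node layout in Olaf's code may differ. On common ABIs where `int32_t` is 4-byte aligned, the compiler pads a 1-byte character up to 4 bytes when it is followed by 4-byte fields.

      ```c
      #include <stdint.h>
      #include <stdio.h>

      /* Two hypothetical node layouts; field names are illustrative only. */
      struct node_narrow {
          unsigned char c;   /* 8-bit character */
          int32_t sibling;   /* index of the next sibling node */
          int32_t child;     /* index of the first child node */
      };

      struct node_wide {
          int32_t c;         /* full Unicode code point */
          int32_t sibling;
          int32_t child;
      };

      int main(void) {
          /* Where int32_t is 4-byte aligned, 3 padding bytes follow
           * node_narrow.c, so both structs end up the same size. */
          printf("narrow: %zu bytes, wide: %zu bytes\n",
                 sizeof(struct node_narrow), sizeof(struct node_wide));
          return 0;
      }
      ```

      On x86-64 with GCC or Clang this prints `narrow: 12 bytes, wide: 12 bytes`. A 1-byte character only saves memory when it can be packed together with other small fields, such as a flags byte.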

      > (Actually, using Polish is kind of cheating. Languages with fewer words,
      > or languages with less regular word endings, have a far lower
      > compression ratio. Dutch or English wordlists are probably about the
      > same size, on disk and in memory, as this Polish list of 3,073,375
      > words.)
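      Olaf does not say how his structure takes advantage of word endings; a plain trie only shares prefixes. One well-known technique that makes regular endings cheap is merging identical subtrees of the trie, which yields a DAWG (a minimal acyclic word automaton). The sketch below is an assumption about how such savings can arise, not a description of Olaf's code. It builds a trie, then counts how many nodes remain once equal subtrees are merged. The names `insert`, `minimize` and `reset`, the lowercase-ASCII alphabet and the English sample words are all hypothetical choices made for the illustration.

      ```c
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical sketch: lowercase ASCII only, fixed-size pools. */
      #define ALPHA 26
      #define MAX_NODES 1024

      typedef struct {
          int child[ALPHA];  /* 0 = no child */
          int is_word;       /* 1 if a word ends here */
      } Node;

      static Node pool[MAX_NODES];  /* the plain trie; node 0 is the root */
      static int node_count = 1;
      static Node sig[MAX_NODES];   /* one entry per distinct subtree */
      static int sig_count = 0;

      /* Clear both the trie and the table of merged subtrees. */
      static void reset(void) {
          memset(pool, 0, sizeof pool);
          memset(sig, 0, sizeof sig);
          node_count = 1;
          sig_count = 0;
      }

      /* Insert a lowercase word; returns -1 on bad input or a full pool. */
      static int insert(const char *w) {
          for (const char *p = w; *p; p++)
              if (*p < 'a' || *p > 'z') return -1;
          int n = 0;
          for (; *w; w++) {
              int c = *w - 'a';
              if (pool[n].child[c] == 0) {
                  if (node_count == MAX_NODES) return -1;
                  pool[n].child[c] = node_count++;
              }
              n = pool[n].child[c];
          }
          pool[n].is_word = 1;
          return 0;
      }

      /* Post-order walk that gives every distinct subtree one id. A node is
       * described by its is_word flag plus its children's ids, so nodes with
       * identical subtrees (e.g. the same set of endings) share an id.
       * Linear search over sig[] keeps the sketch short. */
      static int minimize(int n) {
          Node s;
          memset(&s, 0, sizeof s);
          s.is_word = pool[n].is_word;
          for (int c = 0; c < ALPHA; c++)
              if (pool[n].child[c])
                  s.child[c] = minimize(pool[n].child[c]) + 1;  /* +1 keeps 0 = none */
          for (int i = 0; i < sig_count; i++)
              if (memcmp(&sig[i], &s, sizeof s) == 0) return i;
          sig[sig_count] = s;
          return sig_count++;
      }

      int main(void) {
          const char *words[] = {"walk", "walks", "walked", "walking",
                                 "talk", "talks", "talked", "talking"};
          reset();
          for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
              insert(words[i]);
          minimize(0);
          printf("trie nodes: %d, after merging equal subtrees: %d\n",
                 node_count, sig_count);
          return 0;
      }
      ```

      This prints `trie nodes: 21, after merging equal subtrees: 9`. The `talk` and `walk` branches carry the same endings, so they collapse into one. With less regular endings, fewer subtrees are identical and less can be merged, which fits Olaf's remark about compression ratios.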

      Polish has many similar words, so this test may give a misleading
      impression. Can you do the same for English and/or Dutch?

      Before this could be used in Vim there would still be a lot of work
      (especially for handling non-word characters). I'm not sure it's worth
      trying this approach to see whether it works better than the current
      implementation. The trie code doesn't look much simpler.

      - Bram

      --
      "Hegel was right when he said that we learn from history that man can
      never learn anything from history." (George Bernard Shaw)

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///