Re: Spell suggestions with postponed prefixes

  • Bram Moolenaar
    Message 1 of 3, Jul 5, 2005
      Moshe Kaminsky wrote:

      > * Bram Moolenaar <Bram@...> [30/06/05 00:29]:
      > >
      > > I have now also implemented using postponed prefixes for making
      > > suggestions. Currently only Hebrew uses this.
      > >
      > > Please give it a try and see how well it works. I didn't do any
      > > specific scoring for prefixes, since I don't know how to do that.
      > >
      > > It does look like the suggestions are valid words or combinations with
      > > allowed prefixes. But you better check that no suggestions are actually
      > > wrongly spelled words.
      >
      > I checked it on several examples, it looks fine. I noticed that getting
      > a suggestion which is a rare word is quite rare... I guess it can be
      > tuned.
      >
      > I don't know if my original suggestions are used, but I saw the code a
      > few days ago, and it appears to me there is an implementation which is
      > both simpler and should give better results: Simply run the state machine
      > with both trees, the prefixes and the words, and when there is a word
      > split, continue each time with the other tree. When passing from the
      > word tree, there should be an actual splitting and a penalty, but not
      > vice versa (when the prefixes are not postponed, both can point to the
      > same tree). This way, the length of the prefix need not be considered,
      > since long prefixes will be penalised anyway for being rare.

      The mechanism simply considers every valid word, including words with
      prefixes. The edit distance from the badly spelled word is computed,
      and only words with a small distance are used. This doesn't take the
      length of the prefix into account; it doesn't matter where the prefix
      stops and the basic word starts. Would it be good to give a penalty to
      longer prefixes? You would need to experiment with this, using a list
      of actual misspellings. Just trying a few artificial misspellings may
      give a wrong impression.
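
To make the idea concrete, here is a rough Python sketch of "consider every valid word, with or without a prefix, and keep the ones with a small edit distance". It is not how the real code works (that walks the word trees instead of building a list), and the names `suggest`, `prefixes`, `words` and `max_dist` as well as the English sample data are made up for illustration.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def suggest(bad, prefixes, words, max_dist=2):
    """Return (distance, candidate) pairs, closest first.

    Assumption: every prefix may combine with every word. A real affix
    file restricts which combinations are allowed.
    """
    candidates = set(words)
    candidates.update(p + w for p, w in product(prefixes, words))
    scored = ((edit_distance(bad, c), c) for c in candidates)
    return sorted((d, c) for d, c in scored if d <= max_dist)


# Hypothetical sample data. Where the prefix ends plays no role in the score.
print(suggest("unhapy", ["un", "re"], ["happy", "do"]))
```

A penalty for longer prefixes would be an extra term added to the distance before the `max_dist` check. Whether that helps is exactly what would need testing against real misspellings.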

      > > It's in the snapshot that I will upload in a couple of hours.
      > >
      > > I also implemented a different method for sound folding. It's simpler
      > > and faster. I'm using it for Dutch to try out. Should be simple to add
      > > to any language.
      >
      > Is it correct that the (original) SAL mechanism is mainly useful when
      > there are combinations of several letters that sound like one? When
      > there are only several letters such that one sounds similar to another
      > (like c and k), the new method is equivalent to the original? Also, what
      > is the advantage/disadvantage in this case over specifying, say,
      > REP c k
      > (I guess it saves writing when there are more than two that sound the
      > same, but is there any other difference?)

      The SAL mechanism is only useful if you can turn a word into its
      "sound-a-like" equivalent. For English the mechanism is to leave out
      all vowels and do some tricks with "th", "gh", etc. In Dutch I would
      have "sch" sound the same as "s". In general the sound-a-like word is
      much shorter than the original, thus more words look alike.
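
A toy Python sketch of that kind of folding, to show why more words end up looking alike. The rule list is invented for the example (only the "sch" to "s" rule and the dropping of vowels come from the description above), and `sound_fold` is a hypothetical name, not something from the spell code.

```python
from collections import defaultdict

# Assumed, heavily simplified rules: multi-letter combinations first.
RULES = (("sch", "s"), ("th", "t"), ("gh", ""))
VOWELS = "aeiou"


def sound_fold(word, rules=RULES, vowels=VOWELS):
    """Fold a word to a crude sound-a-like form."""
    folded = word.lower()
    for src, dst in rules:
        folded = folded.replace(src, dst)
    # Leave out all vowels, as described for English.
    return "".join(ch for ch in folded if ch not in vowels)


def group_by_sound(words):
    """Map each folded form to the words that share it."""
    groups = defaultdict(list)
    for w in words:
        groups[sound_fold(w)].append(w)
    return dict(groups)


print(group_by_sound(["night", "nite", "schip", "sip"]))
```

The folded forms are short, so a misspelling and the intended word often fold to the same string even when their edit distance is large.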

      Using REP items with single letters isn't very useful, since
      single-letter replacements will be tried anyway. The only effect is
      that they may get a slightly better score that way. REP items count a
      lot more when they replace several characters at once.
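
The difference can be sketched with numbers. In the Python below the two cost constants are arbitrary values picked for the example, and `rep_candidates` and `substitution_cost` are hypothetical helpers. The only point is that one REP item costs the same however many characters it replaces, while plain edits are paid per character.

```python
REP_COST = 65    # assumed cost of applying one REP item, whatever its length
SUBST_COST = 93  # assumed cost of substituting a single character


def rep_candidates(bad, rep_items):
    """Yield every word made by applying one REP item at one position."""
    for src, dst in rep_items:
        start = bad.find(src)
        while start != -1:
            yield bad[:start] + dst + bad[start + len(src):]
            start = bad.find(src, start + 1)


def substitution_cost(bad, good):
    """Cost of fixing 'bad' by substitutions only, or None if lengths differ."""
    if len(bad) != len(good):
        return None
    return SUBST_COST * sum(a != b for a, b in zip(bad, good))


# Single letter: the REP item only shaves a little off the score.
print(list(rep_candidates("kat", [("k", "c")])), substitution_cost("kat", "cat"))
# Several characters: one REP item replaces two separate substitutions.
print(list(rep_candidates("recieve", [("ie", "ei")])),
      substitution_cost("recieve", "receive"))
```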

      --
      hundred-and-one symptoms of being an internet addict:
      248. You sign your letters with your e-mail address instead of your name.

      /// Bram Moolenaar -- Bram@... -- http://www.Moolenaar.net \\\
      /// Sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
      \\\ Project leader for A-A-P -- http://www.A-A-P.org ///
      \\\ Buy LOTR 3 and help AIDS victims -- http://ICCF.nl/lotr.html ///