Loading ...
Sorry, an error occurred while loading the content.

About Doug McIlroy's "Development of a Spelling List"

Expand Messages
  • Shlomi Fish
    In: http://tech.groups.yahoo.com/group/hackers-il/message/2532 Nadav mentioned this paper: http://www.cs.dartmouth.edu/~doug/spell.ps.gz For a long time I ve
    Message 1 of 2 , Dec 26, 2009
    • 0 Attachment
      In:

      http://tech.groups.yahoo.com/group/hackers-il/message/2532

      Nadav mentioned this paper:

      http://www.cs.dartmouth.edu/~doug/spell.ps.gz

      For a long time I've intended to read it as I wanted to seek wisdom about
      sophisticated and/or compact data structures and algorithms. A few days ago I
      finally took some time to read it and finished it in one day. Here is my
      impression of it:

      To sum up, I found it disappointing. There was very little of substance there
      regarding data structures and algorithms. McIlroy only briefly discussed it.
      Most of it was just about creating the word list, and how the English stems
      and transformations were implemented, and normally, that would have interested
      me, but that's not what I was looking for in that particular article given
      Nadav's introduction.

      I should also note that like most scientific papers I finished reading, I
      found that it avoided giving all the relevant details, which left me with too
      little understanding to implement them myself. Maybe scientific papers are not
      really intended to be a technical specification, but it's still a bit
      dissapointing to be hanging in mid-air like that.

      Regards,

      Shlomi Fish

      --
      -----------------------------------------------------------------
      Shlomi Fish http://www.shlomifish.org/
      First stop for Perl beginners - http://perl-begin.org/

      Bzr is slower than Subversion in combination with Sourceforge.
      ( By: http://dazjorz.com/ )
    • Nadav Har'El
      ... Funny, 8 years ago I mentioned a paper which was 20 year old at the time, and now we talk about it again ;-) ... I think you missed the point. McIlroy did
      Message 2 of 2 , Jan 5, 2010
      • 0 Attachment
        On Sun, Dec 27, 2009, Shlomi Fish wrote about "[hackers-il] About Doug McIlroy's "Development of a Spelling List"":
        > In:
        >
        > http://tech.groups.yahoo.com/group/hackers-il/message/2532
        >
        > Nadav mentioned this paper:
        >
        > http://www.cs.dartmouth.edu/~doug/spell.ps.gz

        Funny, 8 years ago I mentioned a paper which was 20 year old at the
        time, and now we talk about it again ;-)

        > For a long time I've intended to read it as I wanted to seek wisdom about
        > sophisticated and/or compact data structures and algorithms. A few days ago I
        > finally took some time to read it and finished it in one day. Here is my
        > impression of it:
        >
        > To sum up, I found it disappointing. There was very little of substance there
        > regarding data structures and algorithms. McIlroy only briefly discussed it.

        I think you missed the point. McIlroy did not invent any new magical
        algorithm. Heck, this is a 28 year old paper - any algorithm he used is
        bound to be old by now, by definition ;-) And of course, not the entire paper
        is about saving memory - which in those days wasn't a novelty, as all
        computers had very little memory!

        But if you read the paper (and I have to admit I haven't done so in 10
        years, so I don't remember any of the details) you get the sense of all the
        hoops he had to jump through to use less than a few dozen Ks.
        Nowadays, do you think that anybody would even bother with those things?
        You'd just create a hash-table (which is probably the biggest data structure
        you can imagine for a word list) an fill it with 30,000 words, and take
        roughly 1 MB. In a paper, you wouldn't even bother mention how you did it,
        Who cares about 1MB? Back then, people cared. He wanted to sqeeze it in 52 K.

        Nowadays, if you run "aspell" for English it takes 1.7 MB. "hunspell" for
        English it takes 6 MB. Does it have to be this way? No. Of course that this
        article, written 28 years ago, doesn't spell-out (pun intended) why - but
        it can give you a few ideas on what people focused on back then, and what
        they focus-on now - for better and for worse.

        > Most of it was just about creating the word list, and how the English stems
        > and transformations were implemented, and normally, that would have interested
        > me, but that's not what I was looking for in that particular article given
        > Nadav's introduction.

        I agree that calling it "An interesting paper about the problems people
        faced on older computers with very limited memories" was not accurate -
        it's not *about* the problems, it's about writing a spell-checker, but
        the problem of limited memory shows through many of his design decisions
        and throughout the discussion.

        You didn't expected an article about how to write a spell-checker to be
        just, or even mainly, about how to save memory, did you?
        I hope you at least learned new things and ideas while reading it. Maybe
        one day you'll want to write a spell-checker, and can apply some of these
        ideas. Stranger things have happened...

        > I should also note that like most scientific papers I finished reading, I
        > found that it avoided giving all the relevant details, which left me with too
        > little understanding to implement them myself. Maybe scientific papers are not
        > really intended to be a technical specification, but it's still a bit
        > dissapointing to be hanging in mid-air like that.

        I disagree. Scientific papers indeed often do not give you a line-by-line
        recipe of how to do something. The code is for that (like all of AT&T's
        Unix code, spell's source code *was* available for anybody who really looked
        for it). But the paper better explains *why* things were done the way they
        were done. And if you want to write something like this yourself, even if
        it doesn't spoon-feed you every detail, it can at least save you a lot of
        time by pointing you at the right direction.

        --
        Nadav Har'El | Wednesday, Jan 6 2010, 20 Tevet 5770
        nyh@... |-----------------------------------------
        Phone +972-523-790466, ICQ 13349191 |The world is coming to an end ... SAVE
        http://nadav.harel.org.il |YOUR BUFFERS!!!
      Your message has been successfully submitted and would be delivered to recipients shortly.