Loading ...
Sorry, an error occurred while loading the content.

Re: [hackers-il] About Doug McIlroy's "Development of a Spelling List"

Expand Messages
  • Nadav Har'El
    ... Funny, 8 years ago I mentioned a paper which was 20 year old at the time, and now we talk about it again ;-) ... I think you missed the point. McIlroy did
    Message 1 of 2 , Jan 5, 2010
    • 0 Attachment
      On Sun, Dec 27, 2009, Shlomi Fish wrote about "[hackers-il] About Doug McIlroy's "Development of a Spelling List"":
      > In:
      > http://tech.groups.yahoo.com/group/hackers-il/message/2532
      > Nadav mentioned this paper:
      > http://www.cs.dartmouth.edu/~doug/spell.ps.gz

      Funny, 8 years ago I mentioned a paper which was 20 year old at the
      time, and now we talk about it again ;-)

      > For a long time I've intended to read it as I wanted to seek wisdom about
      > sophisticated and/or compact data structures and algorithms. A few days ago I
      > finally took some time to read it and finished it in one day. Here is my
      > impression of it:
      > To sum up, I found it disappointing. There was very little of substance there
      > regarding data structures and algorithms. McIlroy only briefly discussed it.

      I think you missed the point. McIlroy did not invent any new magical
      algorithm. Heck, this is a 28 year old paper - any algorithm he used is
      bound to be old by now, by definition ;-) And of course, not the entire paper
      is about saving memory - which in those days wasn't a novelty, as all
      computers had very little memory!

      But if you read the paper (and I have to admit I haven't done so in 10
      years, so I don't remember any of the details) you get the sense of all the
      hoops he had to jump through to use less than a few dozen Ks.
      Nowadays, do you think that anybody would even bother with those things?
      You'd just create a hash-table (which is probably the biggest data structure
      you can imagine for a word list) an fill it with 30,000 words, and take
      roughly 1 MB. In a paper, you wouldn't even bother mention how you did it,
      Who cares about 1MB? Back then, people cared. He wanted to sqeeze it in 52 K.

      Nowadays, if you run "aspell" for English it takes 1.7 MB. "hunspell" for
      English it takes 6 MB. Does it have to be this way? No. Of course that this
      article, written 28 years ago, doesn't spell-out (pun intended) why - but
      it can give you a few ideas on what people focused on back then, and what
      they focus-on now - for better and for worse.

      > Most of it was just about creating the word list, and how the English stems
      > and transformations were implemented, and normally, that would have interested
      > me, but that's not what I was looking for in that particular article given
      > Nadav's introduction.

      I agree that calling it "An interesting paper about the problems people
      faced on older computers with very limited memories" was not accurate -
      it's not *about* the problems, it's about writing a spell-checker, but
      the problem of limited memory shows through many of his design decisions
      and throughout the discussion.

      You didn't expected an article about how to write a spell-checker to be
      just, or even mainly, about how to save memory, did you?
      I hope you at least learned new things and ideas while reading it. Maybe
      one day you'll want to write a spell-checker, and can apply some of these
      ideas. Stranger things have happened...

      > I should also note that like most scientific papers I finished reading, I
      > found that it avoided giving all the relevant details, which left me with too
      > little understanding to implement them myself. Maybe scientific papers are not
      > really intended to be a technical specification, but it's still a bit
      > dissapointing to be hanging in mid-air like that.

      I disagree. Scientific papers indeed often do not give you a line-by-line
      recipe of how to do something. The code is for that (like all of AT&T's
      Unix code, spell's source code *was* available for anybody who really looked
      for it). But the paper better explains *why* things were done the way they
      were done. And if you want to write something like this yourself, even if
      it doesn't spoon-feed you every detail, it can at least save you a lot of
      time by pointing you at the right direction.

      Nadav Har'El | Wednesday, Jan 6 2010, 20 Tevet 5770
      nyh@... |-----------------------------------------
      Phone +972-523-790466, ICQ 13349191 |The world is coming to an end ... SAVE
      http://nadav.harel.org.il |YOUR BUFFERS!!!
    Your message has been successfully submitted and would be delivered to recipients shortly.