Re: [hackers-il] About Doug McIlroy's "Development of a Spelling List"
- On Sun, Dec 27, 2009, Shlomi Fish wrote about "[hackers-il] About Doug McIlroy's "Development of a Spelling List"":
> In:Funny, 8 years ago I mentioned a paper which was 20 year old at the
> Nadav mentioned this paper:
time, and now we talk about it again ;-)
> For a long time I've intended to read it as I wanted to seek wisdom aboutI think you missed the point. McIlroy did not invent any new magical
> sophisticated and/or compact data structures and algorithms. A few days ago I
> finally took some time to read it and finished it in one day. Here is my
> impression of it:
> To sum up, I found it disappointing. There was very little of substance there
> regarding data structures and algorithms. McIlroy only briefly discussed it.
algorithm. Heck, this is a 28 year old paper - any algorithm he used is
bound to be old by now, by definition ;-) And of course, not the entire paper
is about saving memory - which in those days wasn't a novelty, as all
computers had very little memory!
But if you read the paper (and I have to admit I haven't done so in 10
years, so I don't remember any of the details) you get the sense of all the
hoops he had to jump through to use less than a few dozen Ks.
Nowadays, do you think that anybody would even bother with those things?
You'd just create a hash-table (which is probably the biggest data structure
you can imagine for a word list) an fill it with 30,000 words, and take
roughly 1 MB. In a paper, you wouldn't even bother mention how you did it,
Who cares about 1MB? Back then, people cared. He wanted to sqeeze it in 52 K.
Nowadays, if you run "aspell" for English it takes 1.7 MB. "hunspell" for
English it takes 6 MB. Does it have to be this way? No. Of course that this
article, written 28 years ago, doesn't spell-out (pun intended) why - but
it can give you a few ideas on what people focused on back then, and what
they focus-on now - for better and for worse.
> Most of it was just about creating the word list, and how the English stemsI agree that calling it "An interesting paper about the problems people
> and transformations were implemented, and normally, that would have interested
> me, but that's not what I was looking for in that particular article given
> Nadav's introduction.
faced on older computers with very limited memories" was not accurate -
it's not *about* the problems, it's about writing a spell-checker, but
the problem of limited memory shows through many of his design decisions
and throughout the discussion.
You didn't expected an article about how to write a spell-checker to be
just, or even mainly, about how to save memory, did you?
I hope you at least learned new things and ideas while reading it. Maybe
one day you'll want to write a spell-checker, and can apply some of these
ideas. Stranger things have happened...
> I should also note that like most scientific papers I finished reading, II disagree. Scientific papers indeed often do not give you a line-by-line
> found that it avoided giving all the relevant details, which left me with too
> little understanding to implement them myself. Maybe scientific papers are not
> really intended to be a technical specification, but it's still a bit
> dissapointing to be hanging in mid-air like that.
recipe of how to do something. The code is for that (like all of AT&T's
Unix code, spell's source code *was* available for anybody who really looked
for it). But the paper better explains *why* things were done the way they
were done. And if you want to write something like this yourself, even if
it doesn't spoon-feed you every detail, it can at least save you a lot of
time by pointing you at the right direction.
Nadav Har'El | Wednesday, Jan 6 2010, 20 Tevet 5770
Phone +972-523-790466, ICQ 13349191 |The world is coming to an end ... SAVE
http://nadav.harel.org.il |YOUR BUFFERS!!!