Re: Typical lexicon size in natlangs
- On Sat, May 11, 2013 at 12:08 PM, Alex Fink <000024@...> wrote:
> For instance, it's a number bandied around that knowing 500 hanzi will allow you
> to read 90% of the characters in a Chinese newspaper -- but usually by people
> who don't appreciate the fact that this includes all the grammatical and closed-class
> words, and a swathe of basic lexis, but probably not the interesting word or two
> in the headline you care about.
For example, if you know the most common 28 words in English you can
read 50% of everything written. But what does THAT mean if 50% means
that you can read only 50% of each sentence?
Or, if you get really ambitious you can learn 732 words and read 90%
of everything written in English. If you want to be able to read 99.9%
of everything written in English you will need to learn 2090 words.
(These figures are from my own million-word corpus taken from 20th
century fiction and non-fiction on Gutenberg.com.)
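
A minimal sketch of how coverage figures like the ones above could be computed from a corpus. This is not necessarily the method used for the numbers quoted here: the function name `coverage_lexicon_size` is hypothetical, and the tokenization (lowercased runs of letters and apostrophes, no lemmatization) is an assumption that will shift the results.

```python
import re
from collections import Counter


def coverage_lexicon_size(text, coverage):
    """Return how many of the most frequent word types are needed so that
    their tokens make up at least `coverage` (0.0 to 1.0) of all tokens."""
    # Assumed tokenization: lowercase, runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0
    counts = Counter(tokens)
    target = coverage * len(tokens)
    covered = 0
    # Walk the word types from most to least frequent, accumulating tokens.
    for size, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= target:
            return size
    return len(counts)


sample = "the cat the dog the bird a cat a dog"
print(coverage_lexicon_size(sample, 0.5))  # word types needed for 50% of tokens
print(coverage_lexicon_size(sample, 0.9))  # word types needed for 90% of tokens
```

Note that this counts running tokens, so closed-class words like "the" and "a" dominate the low end, which is exactly the effect Alex describes.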
So what does it really mean to say you can read 90% by knowing 732 words?
Maybe the only meaningful measure of lexicon size is how many words
you must know to cover some specified x% of the whole of the written
corpus. That's a very different number for Toki Pona than it is for
English. That way you could talk meaningfully about a specific
language's "90% coverage lexicon", and its "98% coverage lexicon", and
- On Mon, May 20, 2013 at 5:32 PM, Anthony Miles <mamercus88@...> wrote:
> Even in an impoverished environment humans or something like them will expand vocabulary.
Sure. But this thread discusses "typical lexicon size", and Gary
Shannon and H. S. Teoh proposed a "bootstrap lexicon size" as a
meaningful measure. And I'm just pointing out that I don't think it
would be a good metric, because if you use it for many languages, and
the resulting size varies, let's say, between X-10% and X+10% for some
X, that does not offer any insight about the *typical* lexicon size of
the languages so tested. Systems of vastly different complexity can
arise from similarly simple foundations (cellular automata are a clear
example of that).