Re: "English has the most words of any language"
> One way to measure that would be to determine the 80% and 90% vocabulary of
> English and compare it to the size of such vocabularies in other languages.
> That'd give you a rough sense of the size of the lexicon, it seems to me.
> I got the idea from this article, which puts the 80% vocabulary of English
> at 2400 lemmas (I have no idea where that figure comes from). If so, that
> puts English more or less on par with most modern languages.

Extracting the words responsible for the top 90% or so of discourse would
certainly provide a much more robust measure for the lexicon "size" than
attempting to come up with a usable definition for including a given word
in the so-called total lexicon. But I don't think it really measures the
same thing we intuitively mean when talking about the size of the lexicon
of a given language. Rather, it measures the number of basic lexical items
one tends to use in a given situation. If you consider compounds and
derivations to be independent lexical items, I'd expect all languages to
end up displaying roughly similar numbers here. In this case you'd really
be measuring how much meaning a language overloads onto its words, at
least if you've managed to analyse equivalent discourse events.
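To make the 80% or 90% vocabulary concrete, here is a minimal Python sketch. It assumes the text has already been split into lexical items by whatever definition we settle on (plain whitespace splitting here, with no lemmatisation), and the function name coverage_vocabulary is just something I made up for illustration.

```python
from collections import Counter

def coverage_vocabulary(tokens, coverage=0.9):
    """Return the most frequent items that together account for
    at least `coverage` of all tokens in the sample."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab, covered = [], 0
    # Walk the items from most to least frequent, accumulating coverage.
    for item, n in counts.most_common():
        if covered >= coverage * total:
            break
        vocab.append(item)
        covered += n
    return vocab

# Toy sample, purely illustrative; a real comparison needs a large corpus.
sample = "the cat sat on the mat and the dog sat on the rug".split()
vocab_80 = coverage_vocabulary(sample, 0.8)
print(len(vocab_80), vocab_80)
```

The size of the returned list is the "80% vocabulary" of the sample. Comparing it across languages only makes sense if the samples cover equivalent discourse.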
For a large sample of natural discourse you could also try to use a maximum
statistic, namely the total number of distinct lexical items within the data.
This seems much less robust to the size of the dataset and the discourse
topics it includes than the 90% vocabulary approach, however.
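The sensitivity to dataset size is easy to check by counting distinct items as the sample grows. The following is only a sketch under the same assumptions as before (pre-split tokens, a made-up helper name type_growth).

```python
def type_growth(tokens, step):
    """Return (tokens read, distinct items seen) after every `step`
    tokens, plus a final entry for the whole sample."""
    seen = set()
    growth = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0 or i == len(tokens):
            growth.append((i, len(seen)))
    return growth

# Toy sample, purely illustrative.
sample = "the cat sat on the mat and the dog sat on the rug".split()
for n_tokens, n_types in type_growth(sample, 5):
    print(n_tokens, n_types)
```

If the distinct-item count is still climbing at the end of the data, the total depends more on how much text was collected than on the language.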
The way I would proceed would be to label each word (or whatever
definition of a lexical item we'd be using) with its frequency in a
dataset of fixed length and analyse the full distribution. In other words,
I'd plot a histogram of the number of words in different frequency bins.
From the breadth of the histogram you could see the number of core words
used in basic discourse. There would also be the low-frequency tail of the
distribution, which would reveal how much specialised vocabulary the
speakers are comfortable using.
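As a rough sketch of that histogram, the Python below groups items into frequency bins and counts how many distinct items land in each. The power-of-two bins are my own assumption, chosen only because word frequencies span several orders of magnitude, and frequency_histogram is a hypothetical name.

```python
from collections import Counter

def frequency_histogram(tokens):
    """Map bin index k to the number of distinct items whose frequency
    lies between 2**k and 2**(k+1) - 1 occurrences."""
    counts = Counter(tokens)
    # n.bit_length() - 1 is floor(log2(n)) for any positive integer n.
    bins = Counter(n.bit_length() - 1 for n in counts.values())
    return dict(sorted(bins.items()))

# Toy sample, purely illustrative; use equal-length samples per language.
sample = "the cat sat on the mat and the dog sat on the rug".split()
for k, n_items in frequency_histogram(sample).items():
    print(f"{2**k}-{2**(k+1) - 1} occurrences: {n_items} items")
```

The high bins hold the core words of basic discourse, and bin 0 (items seen only once) is the low-frequency tail.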