## Re: AI-GEOSTATS: In need of some help.

• Dec 20, 2000
> I need a way to compare two small populations (very small sample sizes...
> 5 and 6... both of which lack normality). I would like to compare them
> based on 3-5 parameters. Because of the above limitations I have given up
> on the validity of a t-test (which assumes a normal distribution and
> larger sample sizes). My basic question is this: are these two small
> populations statistically different or do they belong to the same
> population?

My thinking is that, on the one hand, there is no theoretical lower limit on
the number of data, except the evident "at least one"! (or two, if some
information on spread is required). What changes with such very small
samples is the risk of the 2nd kind (the Type II error), if calculable,
which becomes high. Said differently, for a given risk of the 1st kind
(Type I error), the decision criterion becomes so "wide" that it has very
low power to discriminate between statistical coherence, which is the
question asked, and a bad-luck coincidence in the data (for instance, on
one plate of the scales, five or six realisations of a random variate
defined between 0 and 1 and having a bimodal distribution with modes at
0.25 and 0.75, and on the other plate, not many more data from a normal
random variate with mean = 0.5 and standard deviation = 0.25 -- hmm! just
a guess at a counter-example...).
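
To give a feeling for how low the power is in the counter-example above, here is a minimal simulation sketch in Python (standard library only). It is an illustration under assumptions, not a definitive analysis: the bimodal law is taken, arbitrarily, as an equal mixture of two normals centred at 0.25 and 0.75 with standard deviation 0.08, redrawn until the value falls in [0, 1]. The sample sizes 5 and 6, the 5% risk of the 1st kind, and all function names are likewise choices made for the example. The critical value comes from enumerating every way of splitting the 11 pooled ranks, which gives the null distribution of the maximum CDF difference when there are no ties.

```python
import random
from itertools import combinations


def ks_scaled(x, y):
    """Return n*m*D as an exact integer, D being the largest gap between the two empirical CDFs."""
    n, m = len(x), len(y)
    worst = 0
    for t in x + y:  # the largest gap is always reached at a data point
        cx = sum(v <= t for v in x)
        cy = sum(v <= t for v in y)
        worst = max(worst, abs(m * cx - n * cy))
    return worst


def critical_value(n, m, alpha):
    """Smallest c with P(n*m*D >= c) <= alpha, from all C(n+m, n) splits of the ranks (no ties)."""
    ranks = list(range(n + m))
    null = []
    for chosen in combinations(ranks, n):
        picked = set(chosen)
        rest = [r for r in ranks if r not in picked]
        null.append(ks_scaled(list(chosen), rest))
    return min(c for c in set(null)
               if sum(v >= c for v in null) <= alpha * len(null))


def draw_bimodal(rng):
    """Assumed bimodal law on [0, 1]: equal mix of N(0.25, 0.08) and N(0.75, 0.08), redrawn if outside."""
    while True:
        v = rng.gauss(rng.choice((0.25, 0.75)), 0.08)
        if 0.0 <= v <= 1.0:
            return v


def rejection_rate(n=5, m=6, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated sample pairs for which 'same distribution' is rejected."""
    rng = random.Random(seed)
    crit = critical_value(n, m, alpha)
    rejected = 0
    for _ in range(trials):
        x = [draw_bimodal(rng) for _ in range(n)]      # bimodal plate
        y = [rng.gauss(0.5, 0.25) for _ in range(m)]   # normal plate
        if ks_scaled(x, y) >= crit:
            rejected += 1
    return rejected / trials


if __name__ == "__main__":
    print("estimated power:", rejection_rate())
```

The returned fraction estimates the power, that is, one minus the risk of the 2nd kind, for this particular pair of laws. Changing `n`, `m` or the assumed mixture shows how quickly (or slowly) the test gains discriminating ability.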

On the other hand, practically, placing no restriction on the class of
plausible probability laws implies a non-parametric test, whose decision
intervals cannot necessarily be calculated to a known precision. More
precisely, in the present case, the test I am thinking of, to compare two
samples for being from the same parent distribution with no other
assumption, is the Kolmogorov-Smirnov test, which is based on the
distribution of the maximum absolute difference between the two empirical
cumulative distribution functions (CDFs), a distribution whose pdf
expression is only known _asymptotically_ (as far as I have learned from
stat books...; moreover, the asymptotic function is an infinite series,
which may show poor numerical convergence in some cases -- but this is
another story). By asymptotic is meant that the approximation becomes more
and more valid as the sample size increases. However, no indication is
given of the quality of this approximation! And in the present case, as the
question relates to very small samples, I have found nothing on the
validity of this criterion... So if some theoretical statistician can
confirm, or invalidate
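
One way to probe the quality of the asymptotic approximation for samples of 5 and 6 is to compare it with a brute-force calculation. The sketch below is an illustration under assumptions, not a statement about what any stat book prescribes. It relies on the fact that, when both samples come from the same continuous law (so no ties), each of the C(11, 5) = 462 ways of assigning the pooled ranks to the first sample is equally likely, which allows the null distribution of the maximum difference D to be enumerated directly. The asymptotic tail is taken as the usual series 2 Σ (-1)^(k-1) exp(-2 k² λ²) with λ = D·sqrt(nm/(n+m)). The function names and the cut-off `lam < 0.2` (below which the truncated alternating series is not trusted and the tail is set to 1) are choices made for this example.

```python
import math
from itertools import combinations


def null_distribution(n, m):
    """n*m*D for every one of the C(n+m, n) equally likely rank splits (no ties assumed)."""
    values = []
    for chosen in combinations(range(n + m), n):
        picked = set(chosen)
        cx = cy = worst = 0
        for r in range(n + m):  # walk up the pooled ranks, tracking both CDF counts
            if r in picked:
                cx += 1
            else:
                cy += 1
            worst = max(worst, abs(m * cx - n * cy))
        values.append(worst)
    return values


def exact_p(null, d_scaled):
    """Exact P(n*m*D >= d_scaled) under the null hypothesis."""
    return sum(v >= d_scaled for v in null) / len(null)


def asymptotic_p(n, m, d, terms=100):
    """Large-sample tail 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2), lam = d * sqrt(n*m/(n+m))."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:  # truncated alternating series is unreliable here; the tail is essentially 1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


if __name__ == "__main__":
    n, m = 5, 6
    null = null_distribution(n, m)
    for d_scaled in sorted(set(null), reverse=True):
        d = d_scaled / (n * m)
        print(f"D = {d:.3f}  exact = {exact_p(null, d_scaled):.4f}  "
              f"asymptotic = {asymptotic_p(n, m, d):.4f}")
```

For the extreme case D = 1 (the two samples completely separated), only 2 of the 462 splits qualify, so the exact probability is 2/462, about 0.0043, whereas the series gives about 0.0086. The two columns are not strictly comparable, since D is discrete and the exact column counts "greater than or equal", but the gap gives an idea of how rough the asymptotic law can be at these sizes.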

A last word: in case you (I mean, anyone on the list) are interested, I
have written a Matlab script (v.4.2) that does the job: asking for two
sample files, drawing these samples and their two associated empirical
CDFs, calculating the max difference, and evaluating the corresponding
probability according to the (asymptotic) K-S law.
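
For readers without Matlab, the steps listed above could look roughly like the following in Python. This is not the script mentioned in the message, only a sketch of the same sequence with the plotting left out. The file layout (whitespace-separated numbers), the function names, and the `lam < 0.2` guard on the series are assumptions made for the example.

```python
import bisect
import math


def read_sample(path):
    """Read whitespace-separated numbers from a text file (assumed file layout)."""
    with open(path) as handle:
        return [float(token) for token in handle.read().split()]


def ecdf(sample):
    """Return the empirical CDF of the sample as a function t -> fraction of values <= t."""
    data = sorted(sample)
    return lambda t: bisect.bisect_right(data, t) / len(data)


def ks_two_sample(x, y, terms=100):
    """Return (D, p): the largest gap between the two ECDFs and its asymptotic K-S probability."""
    fx, fy = ecdf(x), ecdf(y)
    # The largest gap between two step functions is reached at one of the data points.
    d = max(abs(fx(t) - fy(t)) for t in list(x) + list(y))
    lam = d * math.sqrt(len(x) * len(y) / (len(x) + len(y)))
    if lam < 0.2:  # truncated alternating series is unreliable here; the tail is essentially 1
        return d, 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return d, min(1.0, max(0.0, 2.0 * s))
```

With two hypothetical files `a.txt` and `b.txt`, the call would be `d, p = ks_two_sample(read_sample("a.txt"), read_sample("b.txt"))`. The returned `p` is the asymptotic probability only, so for samples as small as 5 and 6 it should be read with the caution discussed above.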

--E'ric Lewin

PS: I am not fully sure of the exact statistical English terminology (1st
or 2nd "kind risk", etc.); if I am wrong, thanks for correcting me.

+=[ Éric LEWIN <mailto:eric.lewin@...> Tél: (33/0)4 76 63 59 13 ]=+
+===[ LGCA (Labo. de Géodynamique des Chaînes Alpines), Grenoble (France) ]===+
