- Dec 20, 2000
> I need a way to compare two small populations (very small sample sizes...
> 5 and 6... both of which lack normality). I would like to compare them
> based on 3-5 parameters. Because of the above limitations I have given up
> on the validity of a t-test (which assumes a normal distribution and
> larger sample sizes). My basic question is this: are these two small
> populations statistically different or do they belong to the same
> population?

My thinking is that, on one hand, there is no theoretical lower limit on
the number of data, except the obvious "at least one" (or two, if some
information about spread is required). What changes with such very small
samples is the risk of the second kind (the type II error probability),
if it can be calculated at all, which becomes high. Said differently, for
a given risk of the first kind (type I error probability), the decision
criterion becomes so "wide" that the test has very little power to
discriminate between genuine statistical agreement, which is the question
asked, and an unlucky coincidence in the data (for instance, on one pan of
the scales, five or six realisations of a random variate defined between
0 and 1 with a bimodal distribution with modes at 0.25 and 0.75, and on
the other pan, not many more data from a normal random variate with
mean = 0.5 and standard deviation = 0.25 -- hmm, just a guess at a
counter-example...).
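
To get a feeling for how weak a test can be at these sample sizes, the
counter-example can be simulated. The following is only a minimal Python
sketch under stated assumptions: the bimodal law is taken (arbitrarily) to be
an equal mixture of two narrow normals centred at 0.25 and 0.75, redrawn until
the value falls in [0, 1]; the test statistic is the maximum difference
between the two empirical CDFs; and the critical value is itself estimated by
simulation under the null hypothesis. The function names and default settings
are illustrative, not taken from any existing script.

```python
import random


def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = best = 0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        # Integer numerator of |i/n - j/m|, so equal values compare equal.
        best = max(best, abs(i * m - j * n))
    return best / (n * m)


def draw_bimodal(rng):
    """ASSUMPTION: equal mixture of N(0.25, 0.08) and N(0.75, 0.08) on [0, 1]."""
    while True:
        v = rng.gauss(rng.choice((0.25, 0.75)), 0.08)
        if 0.0 <= v <= 1.0:
            return v


def estimate_power(n=5, m=6, alpha=0.05, trials=5000, seed=0):
    """Return (simulated critical value, estimated rejection rate)."""
    rng = random.Random(seed)
    # Null distribution of D: both samples drawn from the same continuous law.
    null_d = sorted(
        ks_statistic([rng.random() for _ in range(n)],
                     [rng.random() for _ in range(m)])
        for _ in range(trials)
    )
    crit = null_d[min(trials - 1, int((1 - alpha) * trials))]
    # Alternative: bimodal sample against normal(0.5, 0.25) sample.
    rejections = 0
    for _ in range(trials):
        x = [draw_bimodal(rng) for _ in range(n)]
        y = [rng.gauss(0.5, 0.25) for _ in range(m)]
        if ks_statistic(x, y) > crit:
            rejections += 1
    return crit, rejections / trials


if __name__ == "__main__":
    crit, power = estimate_power()
    print("critical D:", crit, "estimated power:", power)
```

The estimated power is the fraction of simulated data sets in which the two
different parent laws are actually told apart; one minus that value is the
risk of the second kind for this particular alternative.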

On the other hand, in practice, placing no restriction on the class of
plausible probability laws calls for a non-parametric test, whose decision
intervals cannot necessarily be calculated to a known precision. More
precisely, in the present case, the test I am thinking of, to check whether
two samples come from the same parent distribution with no other
assumption, is the Kolmogorov-Smirnov test. It is based on the
distribution of the maximum absolute difference between the two empirical
cumulative distribution functions (CDFs), a distribution whose expression
is only known _asymptotically_ (as far as I have learned from statistics
books...; moreover, the asymptotic function is an infinite series, which
may in some cases show poor numerical convergence -- but this is another
story). By asymptotic is meant that the approximation becomes more and
more valid as the sample size increases. However, no indication is given
of the quality of this approximation! And in the present case, as the
question relates to very small samples, I have found nothing on the
validity of this criterion... So if some theoretical statistician can
confirm, or invalidate and complete this, I'll be happy to learn more.
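
One way to probe the approximation numerically at these sizes, offered as a
hedged sketch and not as an answer from the literature: compute the maximum
CDF difference D, evaluate the usual asymptotic series
Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2) with
lambda = D * sqrt(n*m/(n+m)), and compare it with a permutation p-value
obtained by enumerating every way of splitting the pooled data into groups of
sizes n and m (only 462 splits for n = 5, m = 6). The permutation test is a
swapped-in reference technique, and the function names, the cut-off for small
lambda, and the sample values are all illustrative assumptions.

```python
import math
from itertools import combinations


def ks_numerator(x, y):
    """Max |F_x - F_y| scaled by len(x) * len(y), kept as an exact integer."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = best = 0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        best = max(best, abs(i * m - j * n))
    return best


def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical CDFs."""
    return ks_numerator(x, y) / (len(x) * len(y))


def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the alternating Kolmogorov series."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here; the limit is 1 to high accuracy.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ks_permutation_pvalue(x, y):
    """Fraction of all splits of the pooled data with D >= the observed D."""
    pooled = list(x) + list(y)
    n = len(x)
    observed = ks_numerator(x, y)
    count = total = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[k] for k in idx]
        b = [pooled[k] for k in range(len(pooled)) if k not in chosen]
        total += 1
        if ks_numerator(a, b) >= observed:
            count += 1
    return count / total


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    x = [0.21, 0.27, 0.24, 0.76, 0.71]
    y = [0.48, 0.35, 0.66, 0.52, 0.90, 0.12]
    d = ks_statistic(x, y)
    print("D =", d)
    print("asymptotic p =", ks_asymptotic_pvalue(d, len(x), len(y)))
    print("permutation p =", ks_permutation_pvalue(x, y))
```

The gap between the two p-values on a given pair of samples gives a rough,
data-specific idea of how far the asymptotic law can be trusted at that size.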

A last word: in case anyone on the list is interested, I have written a
Matlab script (v. 4.2) that does the job: it asks for two sample files,
plots the samples and their two associated empirical CDFs, calculates the
maximum difference, and evaluates the corresponding probability according
to the (asymptotic) K-S law.

--Éric Lewin

PS: I am not fully sure of the exact statistical English terminology (1st
or 2nd "kind risk", etc.); if I am wrong, thanks for correcting me.

+=[ Éric LEWIN <mailto:eric.lewin@...> Tél: (33/0)4 76 63 59 13 ]=+

+===[ LGCA (Labo. de Géodynamique des Chaînes Alpines), Grenoble (France) ]===+

--

* To post a message to the list, send it to ai-geostats@...

* As a general service to the users, please remember to post a summary of any useful responses to your questions.

* To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list

* Support to the list is provided at http://www.ai-geostats.org