## AI-GEOSTATS: In need of some help.

Message 1 of 7, Dec 19, 2000
Hi Folks!
This is my first post to this list. I hope it is not out of place. I need a
way to compare two small populations (very small sample sizes, 5 and 6, both
of which lack normality). I would like to compare them based on 3-5 parameters.
Because of these limitations I have given up on the validity of a t-test
(which assumes a normal distribution and larger sample sizes). My basic question
is this: are these two small populations statistically different, or do they
belong to the same population? I have asked many elementary-level stats folks
and have not been entirely satisfied with their solutions. So I pose this
'problem' to you.
Happy Holidays!
-Harland

--
* To post a message to the list, send it to ai-geostats@...
* As a general service to the users, please remember to post a summary of any useful responses to your questions.
* To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
* Support to the list is provided at http://www.ai-geostats.org
Message 2 of 7, Dec 20, 2000
> I need a way to compare two small populations (very small sample sizes...
>5 and 6... both of which lack normality). I would like to compare them
>based on 3-5 parameters. Because of the above limitations I have given up
>on the validity of a t-test (which assumes a normal distribution and
>larger sample sizes). My basic question is this: are these two small
>populations statistically different or do they belong to the same
>population ?

My thinking is that, on the one hand, there is no theoretical lower limit on
the number of data, apart from the obvious "at least one"! (or two, if some
information on spread is required). What changes with such very small
samples is the risk of the 2nd kind (type II error), if it can be calculated
at all, which becomes high. Put differently, for a given risk of the 1st kind
(type I error), the decision criterion becomes so "wide" that the test has
very little power to discriminate between statistical consistency, which is
the question asked, and a chance coincidence in the data. For instance, put
on one pan of the scales five or six realisations of a random variable
defined between 0 and 1 with a bimodal distribution (modes at 0.25 and
0.75), and on the other pan a similarly small number of values from a normal
random variable with mean 0.5 and standard deviation 0.25. Hmm! Just a guess
at a counter-example...
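To make the counter-example concrete, here is a small Monte Carlo sketch in Python (an illustration, not Éric's script). The bimodal law is an assumption, since the message does not specify its shape: an equal mixture of two triangular distributions on [0, 0.5] and [0.5, 1] with modes 0.25 and 0.75. The sketch draws 5 values from it and 6 from a normal law (mean 0.5, s.d. 0.25), applies a two-sample Kolmogorov-Smirnov test at an assumed 5% level, and counts rejections. The p-value is computed exactly by enumerating all 462 ways of assigning the 11 pooled ranks to groups of 5 and 6, so the normal-vs-normal rate estimates the type I risk and the bimodal-vs-normal rate estimates the power.

```python
import itertools
import random

N1, N2 = 5, 6   # sample sizes from the thread
ALPHA = 0.05    # assumed risk of the 1st kind

def ks_stat_int(labels):
    """n1*n2*D for the two-sample K-S statistic D.

    `labels` lists the pooled data in increasing order, True for a value from
    the first sample. Integer arithmetic avoids floating-point ties.
    """
    c1 = c2 = best = 0
    for is_first in labels:
        if is_first:
            c1 += 1
        else:
            c2 += 1
        best = max(best, abs(c1 * N2 - c2 * N1))
    return best

# Exact null distribution: under H0 (continuous data, no ties) every
# assignment of the 11 ranks to groups of 5 and 6 is equally likely.
NULL = []
for pos in itertools.combinations(range(N1 + N2), N1):
    first = set(pos)
    NULL.append(ks_stat_int([i in first for i in range(N1 + N2)]))

def exact_pvalue(d_int):
    """P(n1*n2*D >= d_int) under H0."""
    return sum(d >= d_int for d in NULL) / len(NULL)

def bimodal(rng):
    # Hypothetical bimodal law on [0, 1]: equal mixture of two triangular
    # distributions with modes 0.25 and 0.75.
    if rng.random() < 0.5:
        return rng.triangular(0.0, 0.5, 0.25)
    return rng.triangular(0.5, 1.0, 0.75)

def normal(rng):
    return rng.gauss(0.5, 0.25)

def rejection_rate(draw1, draw2, trials=2000, seed=1):
    """Share of simulated sample pairs for which the exact K-S test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        pooled = sorted([(draw1(rng), True) for _ in range(N1)] +
                        [(draw2(rng), False) for _ in range(N2)])
        d_int = ks_stat_int([is_first for _, is_first in pooled])
        rejections += exact_pvalue(d_int) <= ALPHA
    return rejections / trials

print("normal vs normal (type I risk):", rejection_rate(normal, normal))
print("bimodal vs normal (power):     ", rejection_rate(bimodal, normal))
```

Both laws have mean 0.5 and similar spread, so with 5 and 6 values the power should come out close to the type I risk, which is the weakness Éric describes.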

On the other hand, in practice, placing no restriction on the class of
plausible probability laws implies a non-parametric test, whose decision
intervals cannot necessarily be calculated to a known precision. More
precisely, in the present case, the test I have in mind for checking whether
two samples come from the same parent distribution, with no other
assumption, is the Kolmogorov-Smirnov (K-S) test. It is based on the
distribution of the maximum absolute difference between the two empirical
cumulative distribution functions (CDFs), a distribution whose pdf is known
only _asymptotically_ (as far as I have learned from statistics books;
moreover, the asymptotic function is an infinite series, which in some cases
may converge poorly numerically, but that is another story). "Asymptotic"
means that the approximation becomes more and more valid as the sample size
increases. However, no indication is given of the quality of this
approximation! And in the present case, since the question concerns very
small samples, I have found nothing on the validity of this criterion... So
if some theoretical statistician can confirm, or invalidate

A last word: in case you (I mean anyone on the list) are interested, I
have written a Matlab script (v.4.2) that does the job: it asks for two
sample files, plots the samples and their two empirical CDFs, calculates
the maximum difference, and evaluates the corresponding probability
according to the (asymptotic) K-S law.
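The Matlab script itself is not reproduced in the thread. As a rough Python analogue of the steps listed above (leaving out the file reading and plotting), the sketch below evaluates both empirical CDFs at every pooled data value, keeps the largest absolute difference D and where it occurs, and converts it to a probability with the asymptotic Kolmogorov series Q(λ) = 2 Σ (−1)^(k−1) exp(−2k²λ²), taking λ = D·sqrt(nm/(n+m)). The function names and the sample values are hypothetical.

```python
import math

def ecdf(sample, t):
    """Empirical CDF of `sample` at t: share of values <= t."""
    return sum(v <= t for v in sample) / len(sample)

def ks_two_sample(x, y, terms=100):
    """Return (D, value where D occurs, asymptotic K-S probability)."""
    d, where = 0.0, None
    # Both ECDFs are step functions, so the largest gap occurs at a data value.
    for t in sorted(set(x) | set(y)):
        gap = abs(ecdf(x, t) - ecdf(y, t))
        if gap > d:
            d, where = gap, t
    n, m = len(x), len(y)
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return d, where, 1.0  # the tail is 1 for all practical purposes
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return d, where, min(1.0, max(0.0, q))

# Hypothetical values standing in for the two sample files.
x = [0.12, 0.35, 0.41, 0.77, 0.90]
y = [0.48, 0.52, 0.55, 0.61, 0.66, 0.70]
d, where, p = ks_two_sample(x, y)
print(f"D = {d:.3f} at {where}, asymptotic probability = {p:.3f}")
```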

--Éric Lewin

PS: I am not fully sure of the exact statistical English terminology (1st
or 2nd "kind risk", etc.); if I am wrong, thanks for correcting me.

+=[ Éric LEWIN <mailto:eric.lewin@...> Tel: (33/0)4 76 63 59 13 ]=+
+===[ LGCA (Labo. de Géodynamique des Chaînes Alpines), Grenoble (France) ]===+
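On the question Éric leaves open (how good the asymptotic K-S law is for such small samples): for two samples of continuous data, the null distribution of D does not depend on the parent law, so for sizes like 5 and 6 it can also be obtained exactly by enumerating every assignment of the pooled ranks to the two groups. The Python sketch below (an illustration, not from the thread) prints, for each attainable value of D, the exact tail probability next to the asymptotic series value at λ = D·sqrt(nm/(n+m)). Some references apply a small-sample correction to λ; none is used here. Function names are hypothetical.

```python
import itertools
import math

def ks_asymptotic_pvalue(d, n, m, terms=100):
    """Asymptotic tail Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # Q is 1 to about 12 decimals here, and the alternating series
        # would need many terms to converge.
        return 1.0
    q = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))

def ks_exact_null(n, m):
    """All C(n+m, n) equally likely values of n*m*D under H0 (no ties)."""
    null = []
    for pos in itertools.combinations(range(n + m), n):
        first = set(pos)
        c1 = c2 = best = 0
        for i in range(n + m):  # walk through the pooled ranks in order
            if i in first:
                c1 += 1
            else:
                c2 += 1
            best = max(best, abs(c1 * m - c2 * n))
        null.append(best)
    return null

n, m = 5, 6
null = ks_exact_null(n, m)
for d_int in sorted(set(null)):
    exact = sum(v >= d_int for v in null) / len(null)
    approx = ks_asymptotic_pvalue(d_int / (n * m), n, m)
    print(f"D = {d_int:2d}/{n * m}: exact {exact:.4f}  asymptotic {approx:.4f}")
```

For complete separation of the samples (D = 1), the exact tail is 2/462 ≈ 0.0043, while the asymptotic series gives about 0.0086, so at least in that corner the asymptotic law is conservative by roughly a factor of two.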

Message 3 of 7, Dec 20, 2000
You could probably try the Mann-Whitney U test, which is a
distribution-free (nonparametric) statistical test (I assume you're
comparing two means). I'm not too sure about your very small sample sizes,
but it should be all right.
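For samples this small, the Mann-Whitney test can be run in its exact (permutation) form instead of using tables or the normal approximation. The Python sketch below is an illustration, not Hugo's procedure: it enumerates all C(11, 5) = 462 ways of splitting the pooled 5 + 6 values and counts how often U lies at least as far from its null centre nm/2 as the observed U (ties scored 1/2). Strictly, U counts how often a value from one sample exceeds a value from the other, so it compares ranks rather than means. Function names and sample values are hypothetical.

```python
import itertools

def mann_whitney_u(x, y):
    """U = number of (xi, yj) pairs with xi > yj, ties counting 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_mw_pvalue(x, y):
    """Two-sided exact p-value: enumerate every split of the pooled values
    into groups of len(x) and len(y)."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    centre = n * m / 2.0
    observed = abs(mann_whitney_u(x, y) - centre)
    count = total = 0
    for idx in itertools.combinations(range(n + m), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(gx, gy) - centre) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical, completely separated samples of sizes 5 and 6.
x = [1.2, 2.3, 2.9, 3.1, 4.0]
y = [4.4, 5.1, 5.6, 6.2, 7.0, 7.3]
print(mann_whitney_u(x, y), exact_mw_pvalue(x, y))
```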

Regards.

Hugo Pilkington
____________________________________________________________
EuroHIV - European Centre for the Epidemiological Monitoring of AIDS
Institut de Veille Sanitaire (InVS)
12, rue du Val d'Osne
94415 Saint-Maurice cedex
France

h.pilkington@...
Tel: +33 (0)141 79 68 68 http://www.ceses.org
Fax: +33 (0)141 79 68 02 http://www.invs.sante.fr

Message 4 of 7, Dec 20, 2000
You can use nonparametric tests such as bootstrap or permutation
(randomization) tests: they give better results with non-normal and small
samples. They can be used for univariate or multivariate tests.

The permutation test approximates the distribution of a statistic under the
H0 hypothesis (no difference):
- Enumerate all the permutations between the 2 samples, i.e. all the ways of
reassigning the pooled observations to two groups of the original sizes (if
the samples are too large, the permutations are randomly sampled).
- Evaluate a statistic (T, F, Mahalanobis, nonparametric...) for each
permutation. The statistic is only used as a distance measure.
- Order the simulated values to get the permutation distribution, and
find the position of the value computed for the true (observed) samples.

The test is significant (at the 5% level) if
- the true value is among the 2.5% most extreme values at either end
(two-sided test)
- it is among the 5% upper [lower] values (one-sided test)
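The three steps above can be sketched in Python for Harland's multivariate case as a full enumeration, which is feasible here (462 splits for 5 + 6 observations). The statistic is an assumption chosen for simplicity: each parameter is standardized over the pooled data, and the distance is the squared Euclidean distance between the two groups' mean vectors (a Mahalanobis distance would also account for correlation between parameters). Since this distance grows for a difference in any direction, the p-value is read from the upper tail only. Function names and data are hypothetical.

```python
import bisect
import itertools
import statistics

def standardize_rows(rows):
    """Rescale each parameter (column) to mean 0, s.d. 1 over the pooled rows."""
    scaled = []
    for col in zip(*rows):
        mu, sd = statistics.fmean(col), statistics.stdev(col)
        scaled.append([(v - mu) / sd for v in col])
    return list(zip(*scaled))

def mean_distance(rows, group_a):
    """Squared Euclidean distance between the mean vectors of two groups."""
    a = set(group_a)
    ga = [r for i, r in enumerate(rows) if i in a]
    gb = [r for i, r in enumerate(rows) if i not in a]
    return sum((statistics.fmean(ca) - statistics.fmean(cb)) ** 2
               for ca, cb in zip(zip(*ga), zip(*gb)))

def permutation_test(sample_a, sample_b):
    rows = standardize_rows(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    observed = mean_distance(rows, range(n_a))
    # Steps 1-2: every reassignment of the pooled rows, and its distance.
    null = sorted(mean_distance(rows, idx)
                  for idx in itertools.combinations(range(len(rows)), n_a))
    # Step 3: position of the observed value in the ordered distribution.
    rank = bisect.bisect_left(null, observed - 1e-12)
    return observed, (len(null) - rank) / len(null)

# Hypothetical samples: 5 and 6 observations of 3 parameters each.
a = [(1.0, 10.2, 0.31), (1.4, 11.0, 0.29), (0.9, 9.8, 0.35),
     (1.2, 10.5, 0.30), (1.1, 10.9, 0.33)]
b = [(2.1, 12.4, 0.41), (2.5, 13.0, 0.44), (1.9, 12.1, 0.39),
     (2.3, 12.8, 0.45), (2.0, 13.3, 0.40), (2.4, 12.6, 0.42)]
dist, p = permutation_test(a, b)
print(f"distance = {dist:.3f}, permutation p = {p:.4f}")
```

With a full enumeration the smallest possible p-value is 1/462, reached only when the observed split is the most extreme of all.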

jf

-------------------------------------------------------------
jean-francois LENAIN L.A.S.E.H.
faculte des Sciences
e-mail: lenain@... 87060 Limoges CEDEX (France)
-------------------------------------------------------------

Message 5 of 7, Dec 20, 2000
The Mann-Whitney U test does require that the distributions of the two
populations be similar in shape. It is distribution-free but not
assumption-free.

Resampling (or randomization) methods should work. Basically, you mix the
two samples, randomly separate them into two groups of the original sizes,
and calculate the parameter you are interested in (e.g., the difference
between averages); repeat the process a large number of times. You will get
a distribution of the parameter; compare the estimate from the original data
with this distribution to see how likely you would be to get the original
value by chance.
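The recipe above, written as a Python sketch rather than an Excel macro: the pooled values are shuffled, re-split into groups of the original sizes, and the absolute difference between averages is recorded each time. The p-value is the share of shuffles at least as extreme as the observed difference, with the observed arrangement itself counted once (a common convention that keeps p above zero). The number of shuffles, the seed, the function name and the example values are arbitrary assumptions. With samples of 5 and 6 there are only 462 distinct splits, so full enumeration is also possible; random shuffling is what scales to larger samples.

```python
import random
import statistics

def randomization_test(x, y, n_iter=10000, seed=0):
    """Two-sided Monte Carlo randomization test for a difference in averages."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # mix, then split into groups of the original sizes
        diff = abs(statistics.fmean(pooled[:n]) - statistics.fmean(pooled[n:]))
        if diff >= observed - 1e-12:
            hits += 1
    # Count the observed arrangement once so that p is never exactly zero.
    return (hits + 1) / (n_iter + 1)

# Hypothetical samples of sizes 5 and 6.
x = [1.2, 2.3, 2.9, 3.1, 4.0]
y = [4.4, 5.1, 5.6, 6.2, 7.0, 7.3]
print(randomization_test(x, y))
```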

You can run this in Excel by setting up a macro. Or check
www.resample.com, which has information about the methods and software.

Best,

Yong Wang

Message 6 of 7, Dec 28, 2000
Harland,

You could use the exact form of the Wilcoxon Rank Sum test, which is
appropriate for sample sizes of 10 or less per group. Computational details
are shown on p. 120 of "Statistical Methods in Water Resources," Helsel and
Hirsch, 1992, Elsevier. The test is commonly used to determine whether two
groups are from the same population (i.e. have the same median and other
percentiles), or alternatively whether the medians are different.
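For readers without the book at hand, the exact null distribution of the rank sum can also be counted directly. This Python sketch is an illustration and not necessarily the procedure given by Helsel and Hirsch: it uses dynamic programming to count how many ways n of the ranks 1..n+m add up to each possible sum W, then returns a two-sided p-value using the symmetry of that distribution about n(n+m+1)/2. It assumes no tied values; the function names are hypothetical.

```python
from math import comb

def rank_sum_counts(n, m):
    """counts[w] = number of ways n of the ranks 1..n+m can sum to w."""
    big_n = n + m
    max_sum = sum(range(m + 1, big_n + 1))  # sum of the n largest ranks
    # ways[k][s]: k-element subsets of the ranks processed so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for r in range(1, big_n + 1):
        for k in range(min(r, n), 0, -1):  # descending k: each rank used once
            for s in range(max_sum, r - 1, -1):
                ways[k][s] += ways[k - 1][s - r]
    return ways[n]

def rank_sum_pvalue(w, n, m):
    """Two-sided exact p-value for rank sum w of the group of size n (no ties)."""
    counts = rank_sum_counts(n, m)
    centre = n * (n + m + 1) / 2  # the null distribution is symmetric about this
    extreme = sum(c for s, c in enumerate(counts)
                  if abs(s - centre) >= abs(w - centre))
    return extreme / comb(n + m, n)

# Group of 5 holding the five smallest ranks (1..5): rank sum 15.
print(rank_sum_pvalue(15, 5, 6))
```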

Tom Nolan

Message 7 of 7, Dec 28, 2000
With five or six samples per population, concluding
anything from the tests would really be pushing it.
Complementing the results with any deterministic
knowledge of the underlying population (genesis,
noteworthy features, prior experience, etc.) could lend
some measure of validity to what you eventually
conclude from such tests (i.e., do the results make sense?).
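One way to put a number on this point: with an exact rank-sum (Mann-Whitney) test, the strongest possible evidence is complete separation of the two samples, and only the two fully separated arrangements are that extreme, so the two-sided p-value can never fall below 2 / C(n+m, n). The Python lines below (an illustration, not from the thread; the function name is hypothetical) compute that floor for a few sample sizes. Harland's 5 and 6 allow a p-value as small as 2/462 ≈ 0.004, whereas samples of 3 and 3 could never reach the 5% level.

```python
from math import comb

def smallest_two_sided_p(n, m):
    """Floor on the two-sided p-value of an exact rank-sum test for sizes n, m:
    only the two completely separated arrangements are that extreme."""
    return 2 / comb(n + m, n)

for n, m in [(3, 3), (4, 4), (5, 6), (10, 10)]:
    print(f"n = {n:2d}, m = {m:2d}: {comb(n + m, n):6d} splits, "
          f"smallest p = {smallest_two_sided_p(n, m):.2g}")
```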

Unfortunately, doing that often leaves one in the
unsavory position of realizing that there is more
uncertainty than first thought. Somewhat counter-intuitive,
but very true in my personal experience.

Syed
