## RE: [ai-geostats] F and T-test for samples drawn from the same p

Message 1 of 16 , Dec 3, 2004
> Hello everyone,
>
> I have two groups of several thousand samples analysed
> for various elements, and wish to determine if these
> samples are drawn from the same statistical population
> for later variography studies. I propose to test the two
> groups by using a F-test to test the sample variances,
> and a T-test to test the group means, at a given confidence limit.
>
> Before I do this, I wonder how I would interpret the results
> of the test if, for example:
>
> 1. The F-test suggests no significant statistical difference
> between the variances at a 90% confidence limit, BUT
> 2. The T-test suggests a significant statistical difference
> between the means at the same, or lower confidence limit.
>
> Has anyone come across this scenario before and how are they
> interpreted?

On the face of it, the scenario you describe corresponds to
a standard t-test (which involves an assumption that the
variances of the two populations do not differ), though I'm
not sure what you mean in (2) by significant "at the same,
or lower confidence limit." (Do I take it that in (1) you
mean that the P-value for the F test is 0.1 or less?)

However, if you get significant difference between the variances
in (1), then it may not be very good to use the standard
t test (depending on how different they are). A modified
version, such as the Welch test, should be used instead.

There is an issue with interpreting the results where the
samples have initially been screened by one test, before
another one is applied, since the sampling distribution
of the second test, conditional on the outcome of the
first, may not be the same as the sampling distribution of
the second test on its own. However, I feel inclined to
guess that this may not make any important difference.

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding@...>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 03-Dec-04 Time: 14:15:09
------------------------------ XFMail ------------------------------
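
For readers who want to try the two-stage procedure discussed in this message, here is a minimal sketch in Python (the original poster used Excel). It runs a two-sided variance-ratio F test and then either the standard pooled t-test or the Welch test, depending on the F result. The function names, the synthetic grades, and the 0.10 cut-off are illustrative assumptions rather than anything specified in the thread. The sketch also treats the samples as independent and roughly Normal.

```python
import numpy as np
from scipy import stats

def variance_ratio_test(x, y):
    """Two-sided F test for equal variances (assumes independent Normal data)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = x.size - 1, y.size - 1
    # Double the smaller tail probability to get a two-sided P-value.
    p = 2.0 * min(stats.f.cdf(f, dfx, dfy), stats.f.sf(f, dfx, dfy))
    return f, min(1.0, p)

def compare_means(x, y, alpha=0.10):
    """Pooled t-test if the F test is not significant, Welch test otherwise."""
    f, p_f = variance_ratio_test(x, y)
    equal_var = bool(p_f > alpha)
    t, p_t = stats.ttest_ind(x, y, equal_var=equal_var)
    return {"F": f, "p_F": p_f, "equal_var": equal_var, "t": t, "p_t": p_t}

# Hypothetical grades: same spread, slightly different means.
rng = np.random.default_rng(0)
low = rng.normal(2.0, 1.0, 3000)
high = rng.normal(2.3, 1.0, 3000)
result = compare_means(low, high)
print(result)
```

Because the choice of t-test depends on the outcome of the F test, the final P-value is conditional on that first screening step, which is the caveat raised in the last paragraph of the message.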
Message 2 of 16 , Dec 3, 2004
Hi Ted,

Thanks for your reply. I suspect my original query was too vague, so I will
illustrate it with a practical example here.

I have an ore horizon that splits into two separate horizons. One of these
split horizons has a lower average grade, and the other has a higher average
grade. I need to determine whether I should treat these two horizons as
separate entities during grade estimation. My geological observations tell
me that these two horizons derive from the same source, and on the face of
it are not different from one another in terms of mineral content and
genesis. I aim to back it up by proving, or attempting to prove, that
statistically these two horizons are the same, and can be treated as such as
far as grade estimation goes. Because the mean grades vary between the two,
I suspect that the T-test might fail, but I also suspect that the variance
in grade between the two might be very similar, and thus the F-test will
pass. Now I have a problem: a T-test tells me the populations differ
statistically, but the F-test tells me they don't.

The confidence limit I refer to in (2) by the way is the Alpha value used to
determine the confidence level for the test - I am using Excel to do the
test.

Thanks,
Colin

Message 3 of 16 , Dec 3, 2004
Standard t-tests make two assumptions: 1. both data sets are normally
distributed; 2. they have approximately equal variance. Test these
assumptions before applying a t-test. Violate these assumptions at your
own risk. If you fail either assumption, you need to consider your
options, but probably should not use a plain-vanilla t-test. You could
possibly use a data transform to "fix" the first assumption. You might
have to use a modified t-test (such as Satterthwaite's modification). Or
you might consider a non-parametric approach, such as the Mann-Whitney
U-test.

Tim Glover
Senior Environmental Scientist - Geochemistry
Geoenvironmental Department
MACTEC Engineering and Consulting, Inc.
Kennesaw, Georgia, USA
Office 770-421-3310
Fax 770-421-3486
Email ntglover@...
Web www.mactec.com
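
The options listed in this message can be sketched as follows in Python. The data are synthetic lognormal values standing in for skewed grades, and all variable names and parameters are illustrative assumptions. The sketch checks Normality with a Shapiro-Wilk test before and after a log transform, then applies a Welch (Satterthwaite) t-test on the transformed values and a Mann-Whitney U-test on the raw values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical skewed data sets with different location and spread.
a = rng.lognormal(mean=0.0, sigma=0.5, size=500)
b = rng.lognormal(mean=0.3, sigma=0.8, size=500)

# Assumption 1: Normality, checked before and after a log transform.
_, p_norm_raw = stats.shapiro(a)
_, p_norm_log = stats.shapiro(np.log(a))

# Assumption 2 relaxed: Welch's t-test does not require equal variances.
_, p_welch = stats.ttest_ind(np.log(a), np.log(b), equal_var=False)

# Non-parametric alternative on the untransformed values.
_, p_mwu = stats.mannwhitneyu(a, b, alternative="two-sided")

print(f"Shapiro raw p={p_norm_raw:.3g}, log p={p_norm_log:.3g}")
print(f"Welch p={p_welch:.3g}, Mann-Whitney p={p_mwu:.3g}")
```

All of these tests still assume independent samples, so the sketch only addresses the two assumptions named above.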

Message 4 of 16 , Dec 3, 2004
There is one other very important assumption about these standard statistical tests - namely that the samples are independent. This typically removes a large part of the usability of basic tests unless corrected for spatial variables. It is most likely the case that your samples within each horizon are not independent (unless the variogram has zero range), so your typical tests cannot be used. They will tend to give pessimistic results - in other words, you will tend to find differences in means when none exist. So these types of tests don't apply directly.

I don't know if there has been much work on trying to provide 'rigorous' methods (but given that it is impossible to give a statistical test that shows whether a random function is stationary or not (Matheron - 'Estimating and Choosing'), I guess the results would not be completely rigorous). You may be able to get an intuitive feel for the likely difference in means by trying to see how many quasi-independent points you have got. You could guess-timate this by assuming that points separated by more than a variogram range are independent, counting how many such 'range units' you have, and using this as the number of 'samples' (actually, you may be better off working with an integral range). But if you have any trends in the data then you will not get reliable estimates of the two means and so cannot 'prove' that the samples come from the same random function - even if they do.

Regards

Colin Daly
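
The 'range units' guess-timate described above might be sketched like this in Python. Everything numeric here is a made-up assumption: a one-dimensional extent of 2000 m, a variogram range of 100 m, 3000 samples per horizon, and the summary statistics. The helper `effective_n` is a hypothetical name, and the approach is only the rough heuristic from the message, not a rigorous test.

```python
import math
from scipy import stats

def effective_n(n_samples, extent, variogram_range):
    """Crude count of quasi-independent samples along one dimension."""
    range_units = math.floor(extent / variogram_range)
    # Never exceed the real sample count; keep at least 2 so a t-test is defined.
    return max(2, min(n_samples, range_units))

# Hypothetical summary statistics for the two horizons.
mean_a, mean_b, sd = 2.0, 2.3, 1.0
n = 3000
n_eff = effective_n(n, extent=2000.0, variogram_range=100.0)

# Welch t-test from summary statistics, first with the raw sample count ...
_, p_naive = stats.ttest_ind_from_stats(mean_a, sd, n, mean_b, sd, n,
                                        equal_var=False)
# ... then with the reduced count of quasi-independent samples.
_, p_eff = stats.ttest_ind_from_stats(mean_a, sd, n_eff, mean_b, sd, n_eff,
                                      equal_var=False)

print(f"n_eff={n_eff}, naive p={p_naive:.3g}, adjusted p={p_eff:.3g}")
```

With these assumed numbers the same difference in means is highly significant when all samples are counted as independent and not significant once the count is reduced to range units.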

Message 5 of 16 , Dec 3, 2004
Colin (Daly) is exactly correct. The spatial dependence is the main issue here when you use the t-test for spatial data. You might be able to transform your data for normality or even homogeneity, but the dependence is still there.

In this case, you need to incorporate the spatial dependence (described by the variogram) into the t-test. Try generalized least squares for a likelihood approach.

Din Chen
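
A minimal sketch of the generalized least squares idea, in Python with NumPy only. It assumes a one-dimensional layout, an exponential covariance with a known sill, practical range and nugget (in practice these would come from a fitted variogram), and a simulated data set in which the two groups share the same true mean. The function name `gls_fit` and all parameter values are hypothetical.

```python
import numpy as np

def gls_fit(X, z, C):
    """GLS estimate of beta in z = X beta + e, with cov(e) = C."""
    Ci_X = np.linalg.solve(C, X)
    Ci_z = np.linalg.solve(C, z)
    cov_beta = np.linalg.inv(X.T @ Ci_X)
    beta = cov_beta @ (X.T @ Ci_z)
    return beta, np.sqrt(np.diag(cov_beta))

rng = np.random.default_rng(2)
n = 200
x = np.sort(rng.uniform(0.0, 2000.0, n))      # sample positions (m)
group = (x >= 1000.0).astype(float)           # 0 = first half, 1 = second half

# Assumed covariance: sill 1.0, practical range 300 m, nugget 0.05.
h = np.abs(x[:, None] - x[None, :])
C = 1.0 * np.exp(-3.0 * h / 300.0) + 0.05 * np.eye(n)

# Simulate correlated values with the SAME mean (2.0) in both groups.
z = 2.0 + np.linalg.cholesky(C) @ rng.standard_normal(n)

# Design matrix: intercept plus group indicator (difference in means).
X = np.column_stack([np.ones(n), group])
beta_gls, se_gls = gls_fit(X, z, C)

# Naive OLS standard errors, which treat the samples as independent.
beta_ols, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta_ols
s2 = resid @ resid / (n - 2)
se_ols = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

print("difference (GLS):", beta_gls[1], "+/-", se_gls[1])
print("difference (OLS):", beta_ols[1], "+/-", se_ols[1])
```

The second coefficient is the estimated difference in means. Its GLS standard error accounts for the spatial correlation, so it is considerably larger than the naive OLS value, and a test based on it is correspondingly less inclined to declare a difference.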

Message 6 of 16 , Dec 3, 2004
Colin

You need to bear in mind that statistical tests such
as t and F are only testing a very simple hypothesis -
they do not test whether the samples are from the same
population.

The F test is to check whether the standard deviations
differ. If the ore is from the same genesis, it is
likely that the variability will be constant and your
F test will not be significant.

The t test is against the hypothesis that the average
values are the same. A significant result indicates that
one population has a higher average grade than the
other. You can have the same variability around the
mean, but have a zone where the minerals tend to
concentrate at a higher average.

Even if both tests are not significant, this does not
'prove' that the two populations are the same. You
could have two sets of data with the same mean and
standard deviation and completely different shapes,
for example.

To include the spatial element, you could try a cross
validation approach where one set of samples is the
'actual' values and you try to estimate those from the
other set. This will show up consistent differences in
average between the two as well as differences in
variability.

Strictly, all of the above requires a Normal
distribution but with your not-too-skewed data and
thousands of samples, the Central Limit Theorem should
take care of those problems.

Isobel
http://uk.geocities.com/drisobelclark
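
To illustrate the point about identical mean and standard deviation with different shapes, here is a small Python sketch on synthetic data. One sample is Normal; the other is a two-peaked mixture built to have the same mean (0) and variance (1). The two-sample Kolmogorov-Smirnov test is used here as one possible way of comparing whole distributions. It is not mentioned in the thread, and it too assumes independent samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 3000

# Sample A: Normal with mean 0 and standard deviation 1.
a = rng.normal(0.0, 1.0, n)

# Sample B: two peaks at -0.9 and +0.9, each with variance 0.19,
# so the overall mean is 0 and the overall variance is 0.81 + 0.19 = 1.
centres = rng.choice([-0.9, 0.9], size=n)
b = centres + rng.normal(0.0, np.sqrt(0.19), n)

# Comparison of means only.
_, p_t = stats.ttest_ind(a, b)

# Comparison of the full distributions.
_, p_ks = stats.ks_2samp(a, b)

print(f"means: {a.mean():.3f} vs {b.mean():.3f}")
print(f"sds:   {a.std(ddof=1):.3f} vs {b.std(ddof=1):.3f}")
print(f"t-test p={p_t:.3g}, KS p={p_ks:.3g}")
```

The means and standard deviations agree closely, yet the distribution-level comparison flags the two samples as different.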
Message 7 of 16 , Dec 5, 2004
Dear all,

I'm wondering if sample size (number of samples, n) is playing a role here.

Since Colin is using Excel to analyse several thousand samples, I have checked the t-test functions in Excel. In the Data Analysis Tools help, a formula is provided for "t-Test: Two-Sample Assuming Unequal Variances analysis". This formula is the same as those in many textbooks (there are other forms of it). Unfortunately, I cannot find the formula for "assuming equal variances" in Excel, but I assume they are similar, and should be the same as those in some textbooks.

From the formula, you can see that for a given difference in means, the t value grows as the sample size increases. When the sample size is large enough, even slight differences between the mean values of two data sets (x bar and y bar) can be detected, and this will result in rejection of the null hypothesis. This is in fact quite reasonable. When the sample size is large, you are confident in the mean values (Central Limit Theorem), with a very small standard error (s/sqrt(n)). Therefore, you are confident of detecting the differences between the two data sets. Even though there is only a slight difference, you can still say, yes, they are "significantly" different.

If you still remember some time ago, we had a discussion on large sample size problem for tests for normality. When the sample size is large enough, the result can always be expected (for real data sets), that is, rejection of the null hypothesis.

Cheers,

Chaosheng
--------------------------------------------------------------------------
Dr. Chaosheng Zhang
Lecturer in GIS
Department of Geography
National University of Ireland, Galway
IRELAND
Tel: +353-91-524411 x 2375
Direct Tel: +353-91-49 2375
Fax: +353-91-525700
E-mail: Chaosheng.Zhang@...
Web 1: www.nuigalway.ie/geography/zhang.html
Web 2: www.nuigalway.ie/geography/gis/index.htm
----------------------------------------------------------------------------
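
The sample-size effect described above can be made concrete with a short Python calculation. It assumes two groups of equal size n, a common standard deviation s, independent samples, and a fixed small difference in means of 0.05. All numbers are illustrative, and `two_sample_t` is a hypothetical helper name.

```python
import math
from scipy import stats

def two_sample_t(diff, s, n):
    """t statistic and two-sided P-value for equal group sizes n and common sd s."""
    se = s * math.sqrt(2.0 / n)          # standard error of the difference
    t = diff / se
    p = 2.0 * stats.t.sf(abs(t), df=2 * n - 2)
    return t, p

# Same difference in means, increasing sample size.
rows = [(n, *two_sample_t(0.05, 1.0, n)) for n in (100, 1000, 10000)]
for n, t, p in rows:
    print(f"n={n:>6}  t={t:6.3f}  p={p:.4f}")
```

Since the standard error shrinks like 1/sqrt(n), the t value for a fixed difference grows like sqrt(n), which is why a difference of 0.05 goes from unremarkable to "significant" as n increases.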

----- Original Message -----
From: "Isobel Clark" <drisobelclark@...>
To: "Donald E. Myers" <myers@...>
Sent: Saturday, December 04, 2004 11:49 AM
Subject: [ai-geostats] F and T-test for samples drawn from the same p

> Don
>
> Thank you for the extended clarification of F and t
> hypothesis test. For those unfamiliar with the
> concept, it is worth noting that the F test for
> multiple means may be more familiar under the title
> "Analysis of variance".
>
> My own brief answer was in the context of Colin's
> question, where it was quite clear that he was talking
> about the simplest F variance-ratio and t comparison of
> means test.
>
> Isobel
>
> * By using the ai-geostats mailing list you agree to follow its rules
> ( see http://www.ai-geostats.org/help_ai-geostats.htm )
>
> * To unsubscribe to ai-geostats, send the following in the subject or in the body (plain text format) of an email message to sympa@...
>
> Signoff ai-geostats
Message 8 of 16 , Dec 5, 2004
Hi

Sorry to repeat myself - but the samples are not independent. Independence is a fundamental assumption of these types of tests - and you cannot interpret the tests if this assumption is violated. In the situation where spatial correlation exists, the true standard error is nothing like as small as the (s/sqrt(n)) that Chaosheng discusses - because the sqrt(n) depends on independence.

Again, as I said before, if the data has any type of trend in it, then it is completely meaningless to try to use these tests - and with no trend but some 'ordinary' correlation, you must find a means of taking the data redundancy into account or risk getting hopelessly pessimistic results (in the sense of rejecting the null hypothesis of equal means far too often).

Consider a trivial example. A one-dimensional random function which takes constant values over intervals of length one - so, it takes the value a_0 in the interval [0,1[, then the value a_1 in the interval [1,2[, and so on (let us suppose that each a_n term is drawn at random from a Gaussian distribution with the same mean and variance, for example). Next suppose you are given samples on the interval [0,2]. You spot that there seems to be a jump between [0,1[ and [1,2[ - so you test for the difference in the means. If you apply an F test you will easily find that the mean differs (and more convincingly the more samples you have drawn!). However, by construction of the random function, the mean is not different. We have been lulled into the false conclusion of differing means by assuming that all our data are independent.

Regards

Colin Daly
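
The trivial example above can be simulated in a few lines of Python. Two details are added assumptions, not part of the message: a small measurement noise (standard deviation 0.1) is added to each sample so that the within-interval variance is not exactly zero, and a pooled two-sample t-test is used (for two groups this is equivalent to a one-way analysis-of-variance F test). The function name and all parameter values are hypothetical.

```python
import numpy as np
from scipy import stats

def rejection_rate(n_real=200, n_per_interval=50, noise_sd=0.1,
                   alpha=0.05, seed=3):
    """Fraction of realisations in which equal means are rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_real):
        # One realisation: the levels on [0,1[ and [1,2[ come from the
        # SAME Gaussian distribution, so the process mean does not differ.
        a0, a1 = rng.normal(0.0, 1.0, size=2)
        left = a0 + rng.normal(0.0, noise_sd, n_per_interval)
        right = a1 + rng.normal(0.0, noise_sd, n_per_interval)
        _, p = stats.ttest_ind(left, right)
        rejections += int(p < alpha)
    return rejections / n_real

rate = rejection_rate()
print(f"Equal means rejected in {rate:.0%} of realisations")
```

A test that respected its nominal level would reject in about 5% of realisations. Treating the strongly correlated samples within each interval as independent pushes the rejection rate far above that.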

-----Original Message-----
From:   Chaosheng Zhang [mailto:Chaosheng.Zhang@...]
Sent:   Sun 12/5/2004 11:42 AM
To:     ai-geostats@...
Cc:     Colin Badenhorst; Isobel Clark; Donald E. Myers
Subject:        Re: [ai-geostats] F and T-test for samples drawn from the same p
Dear all,

I'm wondering if sample size (number of samples, n) is playing a role here.

Since Colin is using Excel to analyse several thousand samples, I have checked the functions of t-tests in Excel. In the Data Analysis Tools help, a function is provided for "t-Test: Two-Sample Assuming Unequal Variances analysis". This function is the same as those from many text books (There are other forms of the function). Unfortunately, I cannot find the function for "assuming equal variances" in Excel, but I assume they are similar, and should be the same as those from some text books.

From the function, you can find that when the sample size is large you always get a large t value. When sample size is large enough, even slight differences between the mean values of two data sets (x bar and y bar) can be detected, and this will result in rejection of the null hypothesis. This is in fact quite reasonable. When the sample size is large, you are confident with the mean values (Central Limit Theorem), with a very small stand error (s/(sqrt(n)). Therefore, you are confident to detect the differences between the two data sets. Even though there is only a slight difference, you can still say, yes, they are "significantly" different.

As you may remember, some time ago we had a discussion on the large sample size problem in tests for normality. When the sample size is large enough, the result for real data sets can always be predicted: rejection of the null hypothesis.

Cheers,

Chaosheng

--------------------------------------------------------------------------

Dr. Chaosheng Zhang

Lecturer in GIS

Department of Geography

National University of Ireland, Galway

IRELAND

Tel: +353-91-524411 x 2375

Direct Tel: +353-91-49 2375

Fax: +353-91-525700

E-mail: Chaosheng.Zhang@...

Web 1: www.nuigalway.ie/geography/zhang.html

Web 2: www.nuigalway.ie/geography/gis/index.htm

----------------------------------------------------------------------------

----- Original Message -----
From: "Isobel Clark" <drisobelclark@...>
To: "Donald E. Myers" <myers@...>
Sent: Saturday, December 04, 2004 11:49 AM
Subject: [ai-geostats] F and T-test for samples drawn from the same p

> Don
>
> Thank you for the extended clarification of F and t
> hypothesis test. For those unfamiliar with the
> concept, it is worth noting that the F test for
> multiple means may be more familiar under the title
> "Analysis of variance".
>
> My own brief answer was in the context of Colin's
> question, where it was quite clear that he was talking
> about the simplest F variance-ratio and t comparison of
> means test.
>
> Isobel
>

--------------------------------------------------------------------------------

> * By using the ai-geostats mailing list you agree to follow its rules

> ( see http://www.ai-geostats.org/help_ai-geostats.htm )

>

> * To unsubscribe to ai-geostats, send the following in the subject or in the body (plain text format) of an email message to sympa@...

>

> Signoff ai-geostats

>

DISCLAIMER: This message contains information that may be privileged or confidential and is the property of the Roxar Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorised to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
• Hello, I am currently principal investigator on a major NIH grant that aims to develop software for test of hypothesis using alternate hypothesis specified by
Message 9 of 16 , Dec 5, 2004
Hello,

I am currently principal investigator on a major NIH grant
that aims to develop software for hypothesis testing
using alternative hypotheses that are specified by the user
and that differ from the omnibus "spatial independence";
we call them "spatial neutral models".
For example, you can test for clusters of cancer rates
"above and beyond" a regional background in exposure.
The p-values are computed using randomization and I applied
geostatistical simulation to generate multiple realizations
that are then used to derive the empirical distribution of
the test statistic.
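
[Editorial illustration, not part of the original message.] A randomization p-value of this general kind can be sketched as follows. How the realizations are generated under a neutral model is not described in the message, so the list `simulated` simply stands in for test statistics computed on those realizations; the function name and the one-sided convention are assumptions.

```python
def monte_carlo_p_value(observed, simulated):
    """One-sided Monte Carlo p-value from an empirical distribution.

    observed  : test statistic computed on the real data
    simulated : test statistics computed on realizations of the neutral model
    The observed value is counted as one more realization, which keeps the
    p-value strictly positive.
    """
    as_extreme = sum(1 for value in simulated if value >= observed)
    return (1 + as_extreme) / (1 + len(simulated))


if __name__ == "__main__":
    # Placeholder numbers only; real use would pass statistics from simulations.
    print(monte_carlo_p_value(3.0, [1.0, 2.0, 4.0, 5.0]))
```

With N realizations the smallest attainable p-value is 1/(N+1), so the number of realizations limits how small a significance level can be resolved.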

I presented an example during the last GeoEnv conference
and I put a PDF copy of the paper, which is in press for
the moment, on my website.

Cheers,

Pierre

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Dr. Pierre Goovaerts
President of PGeostat, LLC
Chief Scientist with Biomedware Inc.
710 Ridgemont Lane
Ann Arbor, Michigan, 48103-1535, U.S.A.

E-mail: goovaert@...
Phone: (734) 668-9900
Fax: (734) 668-7788
http://alumni.engin.umich.edu/~goovaert/

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

On Sun, 5 Dec 2004, Colin Daly wrote:

>
>
> Hi
>
> Sorry to repeat myself, but the samples are not independent. Independence is a fundamental assumption of these types of tests, and you cannot interpret the tests if this assumption is violated. In the situation where spatial correlation exists, the true standard error is nothing like as small as the (s/sqrt(n)) that Chaosheng discusses, because the sqrt(n) depends on independence.
>
> Again, as I said before, if the data has any type of trend in it, then it is completely meaningless to try to use these tests. With no trend but some 'ordinary' correlation, you must find a means of taking the data redundancy into account or risk getting hopelessly pessimistic results (in the sense of rejecting the null hypothesis of equal means far too often).
>
> Consider a trivial example: a one-dimensional random function which takes constant values over intervals of length one. It takes the value a_0 in the interval [0,1[, then the value a_1 in the interval [1,2[, and so on (let us suppose, for example, that each a_n term is drawn at random from a Gaussian distribution with the same mean and variance). Next suppose you are given samples on the interval [0,2]. You spot that there seems to be a jump between [0,1[ and [1,2[, so you test for the difference in the means. If you apply an F test you will easily find that the mean differs (and more convincingly the more samples you have drawn!). However, by construction of the random function, the mean is not different. We have been lulled into the false conclusion of differing means by assuming that all our data are independent.
>
> Regards
>
> Colin Daly
• Colin, Isn't a basic rule of geostatistics that all populations must follow the intrinsic hypothesis, i.e. stationarity, constant mean and variance, so you
Message 10 of 16 , Dec 5, 2004
Colin,

Isn't a basic rule of geostatistics that all populations must follow the
intrinsic hypothesis, i.e. stationarity (constant mean and variance), so that
you should split any populations that do not have the same mean and variance?
This is introduced on p. 33 of Mining Geostatistics, A.G. Journel & Ch. J. Huijbregts.

Regards Digby

----- Original Message -----
To: <ted.harding@...>
Cc: <ai-geostats@...>
Sent: Saturday, December 04, 2004 1:28 AM
Subject: RE: [ai-geostats] F and T-test for samples drawn from the same p

> Hi Ted,
>
> Thanks for your reply. I suspect my original query was too vague, so I
> will illustrate it with a practical example here.
>
> I have an ore horizon that splits into two separate horizons. One of these
> split horizons has a lower average grade, and the other has a higher
> average grade. I need to determine whether I should treat these two
> horizons as separate entities during grade estimation. My geological
> observations tell me that these two horizons derive from the same source,
> and on the face of it are not different from one another in terms of
> mineral content and genesis. I aim to back it up by proving, or attempting
> to prove, that statistically these two horizons are the same, and can be
> treated as such as far as grade estimation goes. Because the mean grades
> vary between the two, I suspect that the T-test might fail, but I also
> suspect that the variance in grade between the two might be very similar,
> and thus the F-test will pass. Now I have a problem: a T-test tells me the
> populations differ statistically, but the F-test tells me they don't.
>
> The confidence limit I refer to in (2), by the way, is the Alpha value
> used to determine the confidence level for the test. I am using Excel to
> do the test.
>
> Thanks,
> Colin

• Every resource model I have done, I always subdivide the populations into those of equal mean and variance, so stationarity is obeyed, is this the correct
Message 11 of 16 , Dec 5, 2004
In every resource model I have done, I have always subdivided the populations
into those of equal mean and variance, so that stationarity is obeyed. Is this
the correct procedure? I haven't read Mining Geostatistics in detail yet, but I
understood that this was a basic requirement for geostatistical modelling
procedures.

Digby
• RE: [ai-geostats] F and T-test for samples drawn from the same p Besides the discussions on the theory, I think we need a practical solution for Colin
Message 12 of 16 , Dec 6, 2004
RE: [ai-geostats] F and T-test for samples drawn from the same p
Besides the discussions on the theory, I think we need a practical solution for Colin Badenhorst's initial problem (this is not only his problem). He wants to compare two sets of spatial data with several thousand samples.

Spatial autocorrelation (or lack of independence) is a basic feature of spatial data, and thus we cannot make spatial data behave in a way that satisfies the statistical requirements. If your spatial data set lacks spatial autocorrelation, you may be asked to go back and take more samples. The ideal way is perhaps to develop a t-test (or whatever test) for spatial data, something like a "spatially weighted test". If such a test is not available, we have no choice but to use existing methods. They may not be exactly suitable for spatial data, but they are better than nothing.

For the time being, the best way to solve the problem is still to use statistical methods, but to explain the results carefully and appropriately. We have to acknowledge the discrepancies between the basic feature of spatial data and possible statistical requirements. Meanwhile, when the sample size (well, going back to my initial concern) is large, you will always get the result of rejecting the null hypothesis for REAL data, whether or not there is spatial dependence. In this case, what does such a result mean? I would say this result is not very meaningful, as it just proves the power of statistical tests. Simple graphs (e.g., histogram, box plot) and percentiles may be helpful for comparison.
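
[Editorial illustration, not part of the original message.] A percentile comparison of the kind suggested above might look like the sketch below. It uses only the Python standard library; the function names are hypothetical, and the quartiles follow the default ("exclusive") method of statistics.quantiles, which is one of several conventions.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles and maximum of one group of values."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def compare_groups(group_a, group_b):
    """Pair up the summaries so the two groups can be read side by side."""
    a = five_number_summary(group_a)
    b = five_number_summary(group_b)
    return {key: (a[key], b[key]) for key in a}


if __name__ == "__main__":
    # Placeholder values standing in for grades from two horizons.
    for key, pair in compare_groups(list(range(1, 10)), list(range(2, 11))).items():
        print(key, pair)
```

Reading the two columns side by side shows how large the difference between the groups is in the units of the data, which a p-value alone does not convey.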

Therefore, for Colin's initial problem, the solution is to explain the results properly, and maybe to try some other methods if available.

Cheers,

Chaosheng
--------------------------------------------------------------------------
Dr. Chaosheng Zhang
Lecturer in GIS
Department of Geography
National University of Ireland, Galway
IRELAND
Tel: +353-91-524411 x 2375
Direct Tel: +353-91-49 2375
Fax: +353-91-525700
E-mail: Chaosheng.Zhang@...
Web 1: www.nuigalway.ie/geography/zhang.html
Web 2: www.nuigalway.ie/geography/gis/index.htm
----------------------------------------------------------------------------

• Dear all I am having difficulty understanding why none of you want to try a spatial approach to statistics. Everyone is trying to make the independent
Message 13 of 16 , Dec 6, 2004
Dear all

I am having difficulty understanding why none of you
want to try a spatial approach to statistics. Everyone
is trying to make the 'independent' statistical tests
work on spatial data. Try turning this around and look
at the spatial aspect first.

(1) Testing variances: the sill on the semi-variogram
(total height of model) is theoretically a good
estimate for the sample variance when auto-correlation
or spatial dependence is present. Do your F test on
that. Yes, you still have degrees of freedom problems,
but with thousands of samples the 'infinity column'
should be sufficient.

(2) Testing means: the classic t-test in the presence
of 'equal variances' requires the 'standard error' of
each mean. For independent samples, this is s/sqrt(n).
For spatially dependent samples, this is the kriging
standard error for the global mean. Your only problem
then is getting a global standard error.
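
[Editorial illustration, not part of the original message.] The gap between s/sqrt(n) and a standard error that accounts for spatial dependence can be sketched numerically. This is not the kriging standard error itself: it is the standard error of the equally weighted mean, whose variance is the average of the covariances over all pairs of samples. The exponential covariance model, the regular one-dimensional sample layout and all parameter values are assumptions made for the example.

```python
import math


def exponential_covariance(h, sill, corr_range):
    """Assumed covariance model: C(h) = sill * exp(-|h| / corr_range)."""
    return sill * math.exp(-abs(h) / corr_range)


def se_of_mean(positions, sill, corr_range):
    """Standard error of the arithmetic mean of spatially correlated samples.

    Var(mean) = (1 / n^2) * sum over all pairs (i, j) of C(x_i - x_j).
    """
    n = len(positions)
    total = sum(
        exponential_covariance(xi - xj, sill, corr_range)
        for xi in positions
        for xj in positions
    )
    return math.sqrt(total) / n


def naive_se(n, sill):
    """The independent-samples formula s / sqrt(n), with s^2 taken as the sill."""
    return math.sqrt(sill / n)


# 200 samples at unit spacing along a line (hypothetical layout).
positions = [float(i) for i in range(200)]

if __name__ == "__main__":
    print("independent:", naive_se(len(positions), 1.0))
    print("correlated :", se_of_mean(positions, 1.0, 10.0))
```

When the correlation range is tiny compared with the sample spacing, the two figures coincide; as the range grows, the correlated standard error becomes several times larger than s/sqrt(n), which is the effect at issue for the t-test.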

Isobel
http://geoecosse.bizland.com/whatsnew.htm
• Isobel, Good idea, and that's a step forward. Any references or is it still an idea? Cheers, Chaosheng ... From: Isobel Clark To:
Message 14 of 16 , Dec 6, 2004
Isobel,

Good idea, and that's a step forward. Any references or is it still an idea?

Cheers,

Chaosheng

• There was a pretty good paper on global standard errors in the 1984 APCOM proceedings, so I am sure it should be in the major textbooks by now. Comparing the
Message 15 of 16 , Dec 6, 2004
There was a pretty good paper on global standard errors
in the 1984 APCOM proceedings, so I am sure it should
be in the major textbooks by now.

Comparing the sills is very straightforward, I think.

Isobel
http://geecosse.bizland.com/books.htm

• RE: [ai-geostats] F and T-test for samples drawn from the same p Comparisons of the sills of relative variograms may indicate whether the proportional effect is
Message 16 of 16 , Dec 6, 2004
RE: [ai-geostats] F and T-test for samples drawn from the same p
Comparisons of the sills of relative variograms may indicate whether the
proportional effect is present between the low and high grade zones, so a
test on the correlation coefficients could be relevant.

Digby
www.users.on.net/~digbym