[ai-geostats] Re: descriptive statistics or inference?
I agree with you that no estimation is needed if we have the population,
and that's what I said in the beginning of my last discussion. I'm saying
that when the variance and sill in a population don't match, I'm
concerned about having to use the sill in a sample to estimate the
On Wed, 8 Dec 2004, Isobel Clark wrote:
> > And just a personal opinion, I would like to think
> > geostatistical theories apply to a population of any size,
> > as small as 27, or as large as 1,000,000. If I'm making
> > an example where geostatistics doesn't apply, then
> > there's something to be concerned about in this approach.
> Geostatistics applies to any size of sample set but
> for the theory to work you have to have a relatively
> enormous population to draw from.
> Put in plain terms, the assumption is that the
> withdrawal of the samples does not materially affect
> the behaviour of the population.
> If you have the whole population, you don't need to do
> tests or estimates.
Even if your population variance and sill do not match identically,
the sample sill should still be a better estimate than the sample variance,
when you consider the amount of clustering which occurs in sampling.
- On Thu, 9 Dec 2004, Digby Millikan wrote:
> Even if your population variance and sill do not match identically,
> the sample variance,
> when you consider the amount of clustering which occurs in sampling.
All right, if you think the clustering of data values (I'm not talking
about clustering of locations) is not part of the representation of
I just found an example that I can use as a population, with 2500 points
in 2-D (in the GSLIB manual, the second realization of SGSIM.OUT, if
you happen to have this data set), and found that the sill and variance of
this population do not match (sill~20, variance=18.63).
I intended to use a smaller sample so everyone can have fun playing with the
data (even if you use M$-Excel to calculate the variogram), which also
makes my point more clearly. But it seems people are more
interested in discussing the size of the population. . . I'll leave it here
then, if nobody finds any problem with estimating population variance using
the sill value. Maybe I'm just psychologically not comfortable estimating
variance like that. . . (I'll probably follow you people if I find no
theoretical derivation for my thinking.)
It's fun discussing with you people though, and I'm happy to have this
much discussion for my debut.
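A possible way to see why a variance of 18.63 can sit below a sill of about 20: for any finite data set, the usual (n-1) variance is exactly the average of half the squared differences over every distinct pair, short separations included, whereas the sill only reflects pairs beyond the range. The Python sketch below checks that identity on a made-up 1-D transect. The values and the function names are invented for illustration and are not the SGSIM.OUT data.

```python
import statistics
from itertools import combinations


def semivariogram_1d(z, lag):
    """Experimental semivariogram of equally spaced 1-D data at an integer lag."""
    pairs = [(z[i], z[i + lag]) for i in range(len(z) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def mean_semivariance_all_pairs(z):
    """Average of half the squared difference over every distinct pair."""
    halves = [(a - b) ** 2 / 2 for a, b in combinations(z, 2)]
    return sum(halves) / len(halves)


# Hypothetical transect: values invented for illustration only.
z = [3.1, 3.4, 3.0, 4.2, 4.8, 4.5, 2.9, 2.5, 3.3, 3.9, 5.1, 4.7]

# Semivariance at the first few lags.
for lag in range(1, 6):
    print("lag", lag, "gamma", round(semivariogram_1d(z, lag), 3))

# These two numbers agree up to floating-point rounding.
print("variance (n-1 denominator):      ", statistics.variance(z))
print("mean semivariance over all pairs:", mean_semivariance_all_pairs(z))
```

If the short-lag pairs have semivariances below the sill, they pull the all-pairs average, and hence the variance, below the sill as well.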
I did mean the clustering of locations, if the sample is evenly and or
spread your s^2 estimate will be no problem, it's when the data is clustered
locations I believe removal of these data improves the estimate, the less
the less improvement, the higher the clustering, the higher the improvement.
clustered data has correlation which is picked up in the sub-range portion
So the variance is 93% of the sill for the population which adds credence
argument. What would be interesting is to see the results we get from a
some clustering i.e. a sample variogram sill closer to 18.63. I'm interested
method for use with mine sampling data. I have GSLIB, but no compiler. Are
developing a sample subset with some clustering, or can you send me the
> All right, if you think the clustering of data values (I'm not talking
> about clustering of locations) is not part of the representation of
Sorry I don't have time to process the data yet, but a summary so far
is that geostatistics was originally developed in the mining field,
for the application of mineral resource assessment with which I am involved,
where sampling patterns are generally clustered and the sill of the
variogram can be used as an estimate of the variance at your discretion.
As Isobel says, this is where the half originates in the equation for the
experimental variogram, as the sill will often approximate the variance.
However, in many applications of geostatistics much more detailed and
regular sampling patterns are used (even in mining, e.g. soil geochemistry),
in which case the sill is not an exact estimator (I'm not sure of the
terminology here) of the variance and should be treated as such, though
seems to be more familiar with the mathematics of this, and it might be
interesting to study this further.
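On the factor of one half mentioned above, a short derivation may help. It is only a sketch and assumes second-order stationarity (constant mean, a finite variance written here as \sigma^2, and a covariance C(h) that depends only on the separation h). These symbols are introduced for illustration and are not taken from the messages.

```latex
% Under second-order stationarity the two means cancel, so the expected
% squared difference is the variance of the difference:
\begin{align*}
E\big[(Z(x) - Z(x+h))^2\big]
  &= \operatorname{Var} Z(x) + \operatorname{Var} Z(x+h)
     - 2\operatorname{Cov}\big(Z(x), Z(x+h)\big) \\
  &= 2\sigma^2 - 2C(h), \\
\gamma(h) &= \tfrac{1}{2}\, E\big[(Z(x) - Z(x+h))^2\big] = \sigma^2 - C(h).
\end{align*}
% Beyond the range C(h) = 0, so \gamma(h) = \sigma^2.
```

Without the one half, the plateau would sit at twice the variance. With it, the theoretical sill equals the variance of the random function.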
The variance/sill relationship is theoretical and does
not depend on the layout of the samples, regular or
clustered. Since the sill only uses pairs where
samples are uncorrelated from one another, the
clustering is irrelevant.
It does depend on the distribution of the samples
values being 'stationary', that is having constant
mean and variance over the study area. It also depends
on that distribution having a valid variance. For
example, the variance of samples from a lognormal
distribution depends on the average of those samples -
hence the proportional effect.
All of this is explained in any basic geostatistics
book, including Matheron's original Theory of
Regionalised Variables and my Practical Geostatistics
(Chapter 3) which can be freely downloaded from
- Like Isobel mentioned, the sill only uses pairs where samples are
uncorrelated from one another, and in this case the clustering is
And I totally agree with that. The crucial thing, Digby, is that you want
to make sure the variance estimated reflects the characteristic of what
you actually wanted, regardless of the terminology that may or may not be
stated for different purposes.
On Sun, 12 Dec 2004, Isobel Clark wrote:
> The variance/sill relationship is theoretical and does
> not depend on the layout of the samples, regular or
> clustered. Since the sill only uses pairs where
> samples are uncorrelated from one another, the
> clustering is irrelevant.
> It does depend on the distribution of the samples
> values being 'stationary', that is having constant
> mean and variance over the study area. It also depends
> on that distribution having a valid variance. For
> example, the variance of samples from a lognormal
> distribution depends on the average of those samples -
> hence the proportional effect.
> All of this is explained in any basic geostatistics
> book, including Matheron's original Theory of
> Regionalised Variables and my Practical Geostatistics
> (Chapter 3) which can be freely downloaded from
Thank you. In relation to Colin's work, then, the variance
may be estimated from the sill of the variograms for
the two orebodies. If the two orebodies had lognormal
distributions, they may have different means and variances
but still display the proportional effect, i.e. similar
coefficients of variation, in which case, as M. David points
out in Geostatistical Ore Reserve Estimation (p. 172),
lognormal kriging may be avoided since, from what I
understand, relative variograms may be used instead
of lognormal variograms.
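To make the proportional effect concrete: for a lognormal distribution the variance is proportional to the square of the mean, so the coefficient of variation depends only on the spread of the logarithms. The Python sketch below uses the standard lognormal moment formulas. The parameter names mu and s (mean and standard deviation of the logs) and the example values are assumptions chosen for illustration, not figures from the orebodies discussed.

```python
import math


def lognormal_moments(mu, s):
    """Mean and variance of a lognormal variable whose log is Normal(mu, s^2)."""
    mean = math.exp(mu + s ** 2 / 2)
    variance = (math.exp(s ** 2) - 1) * mean ** 2
    return mean, variance


def coefficient_of_variation(mu, s):
    """Standard deviation divided by the mean."""
    mean, variance = lognormal_moments(mu, s)
    return math.sqrt(variance) / mean


# Same spread of the logs, different log-means (hypothetical values).
for mu in (0.0, 1.0, 2.0):
    mean, variance = lognormal_moments(mu, 0.5)
    print(mu, round(mean, 3), round(variance, 3),
          round(coefficient_of_variation(mu, 0.5), 4))
```

The mean and variance change with mu, but the coefficient of variation stays the same, which is the situation in which a relative variogram is a natural choice.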
- Hi people,
Finally I got the point of the arguments about the estimation of population
What I had in mind as an "overall" variance is the variance over all
possible locations in any one realization of the random field, while what
Isobel and some other people are trying to explain to me is the variance over
all possible realizations at any one location of the random field.
I realized, by noticing this, why stationarity and the existence of the
variance have been emphasized throughout the discussion. If the random field
is not stationary then we'll have no consistent population variance, as Isobel
explained. I also learned that my understanding of the population variance
has a name: "areal variance."
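The difference between the two variances can be illustrated with a small simulation. This is only a sketch under assumed settings: a 1-D stationary field built as a moving average of Gaussian white noise, with an invented domain length, window, number of realizations and seed. It compares the variance over realizations at one fixed location with the variance over locations inside each realization.

```python
import random
import statistics


def simulate_field(n, window, rng):
    """One realization: moving average of white noise, scaled to unit variance."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    scale = window ** 0.5
    return [sum(noise[i:i + window]) / scale for i in range(n)]


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
n_locations, window, n_realizations = 30, 10, 2000
fields = [simulate_field(n_locations, window, rng)
          for _ in range(n_realizations)]

# Variance over all realizations at one fixed location.
ensemble_var = statistics.pvariance([f[n_locations // 2] for f in fields])

# Variance over all locations within a realization, averaged over realizations.
areal_var = statistics.mean([statistics.pvariance(f) for f in fields])

print("variance over realizations at one location:", round(ensemble_var, 3))
print("mean variance over locations per realization:", round(areal_var, 3))
```

Because the domain here is not much longer than the correlation length, the within-realization variance comes out smaller than the variance over realizations. The gap shrinks as the domain grows relative to the range.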
Now it should be clear why I emphasized the expected variance of the
"future samples." This would be the variance of any possible sample
taken from the current realization (which I called the "population"
previously) by some planned sampling scheme. And this variance will
depend on the clustering or non-clustering of the future sampling
scheme. I'm aware, of course, that in practice "future" samples may no
longer be taken from the current realization, since in the real case
the study site would be changing. Calling it a "future" sample is
just a convenient way of referring to the expected variance based on possible
Hope I'm not making things more confusing.