## GEOSTATS: SUM(2): Kriging variance

Message 1 of 1 , Dec 20, 1905
Dear all,

Below you will find some interesting comments I received from
Donald Myers (thank you very much!) in reply to my question on kriging
variance.

Best regards

Gregoire

------------------------------------------------------

1. I agree that in some sense there is a discrepancy between using the whole
data set to estimate the variogram and then only using a local neighborhood
for kriging but I think that asking the question that way confuses things.

2. Since the data are not collected by "random sampling" (note that random
site selection is not quite the same thing as random sampling as the term is
customarily used in statistics), there is no theoretical connection between
the set of data locations and the region for which the modeled variogram might
be used. We often assume that there is one, but there is not.

3. There is also the question of the difference between a spatial average,
which is what the sample variogram is, and an ensemble (theoretical
probabilistic) average which is the way the variogram is defined. Although it
is not often mentioned, using the spatial average to estimate the ensemble
average implies the use of an ergodic property. Unlike in many statistical
techniques, there are no "replicates" to get around this problem. By its
definition, the variogram is not location dependent.
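
To make the "spatial average" in point 3 concrete, here is a minimal sketch of the classical sample variogram: for each lag bin, half the mean squared difference over all data pairs whose separation falls in that bin. The function name, the binning scheme and the toy data are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def sample_variogram(coords, values, lag_width, max_lag):
    """Classical estimator: half the mean squared difference per lag bin.

    Returns {bin_index: (gamma, number_of_pairs)}; bin k collects pairs
    whose separation h satisfies k*lag_width <= h < (k+1)*lag_width.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            if h > max_lag:
                continue  # pair is farther apart than the largest lag of interest
            k = int(h // lag_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k: (sums[k] / (2 * counts[k]), counts[k]) for k in sorted(counts)}

# Hypothetical data on a line, one unit apart.
coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
values = [1.0, 2.0, 4.0, 7.0]
gamma_hat = sample_variogram(coords, values, lag_width=1.0, max_lag=3.0)
for k, (g, npairs) in gamma_hat.items():
    print(k, g, npairs)
```

Each value is an average over pairs taken from a single realization, which is why treating it as an estimate of the ensemble average relies on an ergodic assumption.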

4. One way to see how the variogram relates to a region that is somehow
connected with the data locations is to look at the "dual" form of the kriging
estimator.

Z*(x) = SUM{i=1,...,n} Bi gamma(xi - x) + SUM{j=0,...,p} Aj Fj(x)

I have written it in the form corresponding to universal kriging; for ordinary
kriging the sum on the right has only one term, a constant. It is easy to show
that the sum of the Bi's is zero. First suppose that the variogram has a
range, and consider a region enclosing the data locations that is large enough
that for every point x outside it, the distance |xi - x| to every data location
is at least the range of the variogram. For points outside of that region, the
variogram value for all pairs xi-x is the same (the sill), and since the Bi's
sum to zero the first sum on the right hand side of the equation is zero. That
means that the interpolated value is entirely determined by the right hand
sum; for ordinary kriging this means a constant (the kriging estimate of the
mean, which is a weighted average of the data values and coincides with the
arithmetic mean only in special cases, e.g. when the data are mutually
uncorrelated). If the variogram does not have a range then the same argument
still applies but we have to consider a larger region, i.e., big enough so that
although the variogram values are not constant they are essentially the same
(when x is sufficiently far away from all the data locations, the magnitudes of
the xi-x's are essentially the same and hence the variogram values are
essentially the same).

We might then consider the "region" described above (the one containing the
data locations) as being the region intrinsically related to that set of data
locations. Unfortunately you can't compute the kriging variance from the
coefficients in the dual form.
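
The dual form in point 4 can be tried out numerically. The following is only a sketch for ordinary kriging in one dimension: the spherical model, its sill and range, the data, and the small `solve` helper are all assumptions introduced for illustration. It checks that the Bi's sum to zero and that the estimate far from the data reduces to the constant term.

```python
def spherical(h, sill=1.0, rng=10.0):
    """Spherical variogram without nugget; equals the sill for h >= rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

xs = [0.0, 3.0, 7.0, 12.0]   # hypothetical data locations
zs = [1.0, 2.5, 2.0, 4.0]    # hypothetical data values
n = len(xs)

# Dual system: [Gamma 1; 1' 0] [B; A0] = [z; 0]
K = [[spherical(abs(xi - xj)) for xj in xs] + [1.0] for xi in xs]
K.append([1.0] * n + [0.0])
coeffs = solve(K, zs + [0.0])
B, A0 = coeffs[:n], coeffs[n]

def estimate(x):
    """Z*(x) = SUM Bi gamma(xi - x) + A0 (single constant drift term)."""
    return sum(bi * spherical(abs(xi - x)) for bi, xi in zip(B, xs)) + A0

print("sum of Bi :", sum(B))
print("A0        :", A0)
print("Z*(100)   :", estimate(100.0))  # farther than the range from all data
print("Z*(3)     :", estimate(3.0))    # at a data location
```

The last row of the system is the constraint that the Bi's sum to zero, so beyond the range the first sum becomes the sill times zero and only A0 remains.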

5. Now to return to the question at hand, although it will not happen in all
instances, generally speaking the kriging variance will increase when fewer
data locations are used for the kriging. One way to see this is to consider a
system of kriging equations which includes all the data locations but now
"force" some weights (weights at some locations) to be zero so that the
solution vector now looks as though it corresponded to a local neighborhood.

This has to be a sub-optimal solution: the estimation variance obtained from
this solution cannot be smaller than the estimation variance obtained without
the "zero" constraints. That is, the system of equations obtained by using the
local neighborhood is a special case of the one using all the data
locations, special in the sense that additional constraints are imposed.

Imposing the additional constraints can not decrease the kriging variance. If
the larger variance is interpreted as meaning greater uncertainty then this is
what should happen when you leave out information.
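
The argument in point 5 can be checked on a toy case. This is a hedged sketch, assuming ordinary kriging in one dimension with the same (made-up) spherical variogram for both neighborhoods; the data locations, the target point and the helper names are hypothetical. It compares the kriging variance obtained from all data locations with the one obtained from only the two nearest.

```python
def spherical(h, sill=1.0, rng=10.0):
    """Spherical variogram without nugget; equals the sill for h >= rng."""
    if h >= rng:
        return sill
    r = h / rng
    return sill * (1.5 * r - 0.5 * r ** 3)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ok_variance(xs, x0):
    """Ordinary kriging variance at x0 from the data locations xs.

    Solves [Gamma 1; 1' 0] [lambda; mu] = [gamma0; 1] and returns
    SUM lambda_i gamma(xi - x0) + mu. The data values are not needed.
    """
    n = len(xs)
    K = [[spherical(abs(xi - xj)) for xj in xs] + [1.0] for xi in xs]
    K.append([1.0] * n + [0.0])
    g0 = [spherical(abs(xi - x0)) for xi in xs]
    sol = solve(K, g0 + [1.0])
    lam, mu = sol[:n], sol[n]
    return sum(l * g for l, g in zip(lam, g0)) + mu

xs = [0.0, 3.0, 7.0, 12.0, 20.0]   # hypothetical data locations
x0 = 5.0                           # hypothetical target location
neighbours = sorted(xs, key=lambda x: abs(x - x0))[:2]  # two nearest locations

var_all = ok_variance(xs, x0)
var_local = ok_variance(neighbours, x0)
print("all locations :", var_all)
print("local (2 pts) :", var_local)
```

Both variances are computed from the same variogram model, which is what the constrained-system argument assumes; if the model were refitted for each neighborhood the comparison would no longer be guaranteed.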

Having said all of the above, I would note that sometimes a data
location should be thought of as "dis-information" rather than as
"information".

6. Nearly everyone is familiar with the injunction to not use the sample
variogram plot for distances that exceed half the largest pair distance. In
fact, of course, one often does not use the sample variogram for lags that are
even that large and hence in looking at the size of local neighborhoods one
should keep in mind the largest lag used in modeling the variogram.
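
As a small illustration of the rule of thumb in point 6, the sketch below computes half the largest pair distance for a set of hypothetical 2-D locations. The function name and the coordinates are assumptions; in practice the largest lag actually used in modeling is often smaller than this limit.

```python
import math
from itertools import combinations

def max_usable_lag(coords):
    """Half the largest distance between any two data locations."""
    return 0.5 * max(math.dist(p, q) for p, q in combinations(coords, 2))

coords = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 1.0)]  # hypothetical locations
limit = max_usable_lag(coords)
print("do not use sample variogram beyond lag:", limit)
```

A local neighborhood whose radius exceeds the largest lag used in modeling relies on the part of the variogram model that the data did the least to constrain.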

An interesting reference on kriging variances and sample design is:

"The updated kriging variance and optimal sample design"
(Gao, Wang and Zhao)
Math. Geology 28, (1996) 295-314

Gregoire Dubois
Section of Earth Sciences
Institute of Mineralogy and Petrography
University of Lausanne
Switzerland

Currently detached in Italy

http://curie.ei.jrc.it/ai-geostats.htm
