You will find below some interesting comments I received from Donald Myers (thank you very much!) in reply to my question on kriging variance.

Enjoy reading,

Best regards,

Gregoire

------------------------------------------------------

1. I agree that in some sense there is a discrepancy between using the whole data set to estimate the variogram and then using only a local neighborhood for kriging, but I think that asking the question that way confuses things.

2. Since the data are not collected by "random sampling" (note that random site selection is not quite the same thing as random sampling as the term is customarily used in statistics), there is no theoretical connection between the set of data locations and the region for which the modeled variogram might be used. We often assume that there is, but there is not.

3. There is also the question of the difference between a spatial average, which is what the sample variogram is, and an ensemble (theoretical, probabilistic) average, which is how the variogram is defined. Although it is not often mentioned, using the spatial average to estimate the ensemble average implies the use of an ergodic property. Unlike in many statistical settings, there are no "replicates" with which to get around this problem. By its definition, the variogram is not location dependent.
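To make the "spatial average" concrete, here is a minimal Python/NumPy sketch of the classical (Matheron) sample variogram: for each lag class it takes half the mean squared difference over all data pairs whose separation falls in that class. The function name and the lag-binning scheme are illustrative assumptions, not taken from any particular package.

```python
import numpy as np

def sample_variogram(coords, values, bin_edges):
    """Classical estimator: gamma(h) = 1/(2 N(h)) * SUM (z(xi) - z(xj))^2
    over the N(h) pairs whose separation lies in the lag class (lo, hi]."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float).reshape(len(values), -1)
    i, j = np.triu_indices(len(values), k=1)        # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        n = int(in_bin.sum())
        if n == 0:
            continue                                 # skip lag classes with no pairs
        lags.append(dist[in_bin].mean())             # mean separation in the class
        gammas.append(0.5 * sqdiff[in_bin].mean())   # spatial average over pairs
        counts.append(n)
    return np.array(lags), np.array(gammas), np.array(counts)

lags, g, c = sample_variogram([[0.0], [1.0], [2.0], [3.0]], [0, 1, 0, 1],
                              [0.5, 1.5, 2.5, 3.5])
print(lags, g, c)
```

For four equally spaced points with alternating values 0, 1, 0, 1, the lag classes centred on 1, 2 and 3 give 0.5, 0.0 and 0.5, from 3, 2 and 1 pairs respectively. Every value is an average over pairs taken from a single realization, which is exactly where the ergodic assumption enters.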

4. One way to see how the variogram relates to a region that is somehow connected with the data locations is to look at the "dual" form of the kriging estimator:

Z*(x) = SUM{i=1,...,n} Bi gamma(xi - x) + SUM{j=0,...,p} Aj Fj(x)

I have written it in the form corresponding to universal kriging; for ordinary kriging the sum on the right has only one term, a constant. It is easy to show that the Bi sum to zero. First suppose that the variogram has a range, and consider a region that encloses the data locations and is large enough that, for every point x outside it, the distance |xi - x| exceeds the range of the variogram for every i. For points x outside that region, the variogram value is the same (the sill) for all pairs xi - x, so the first sum on the right-hand side equals the sill times the sum of the Bi, which is zero. That means that the interpolated value there is entirely determined by the second sum; for ordinary kriging this is a constant (in fact the generalized least-squares estimate of the mean, which coincides with the arithmetic mean of the data values only when the data are mutually uncorrelated). If the variogram does not have a range, the same argument still applies, but we have to consider a larger region, i.e., one big enough that, although the variogram values are not constant, they are essentially the same (when x is far enough away from all the data locations, the magnitudes of the xi - x are essentially equal, and hence so are the variogram values).

We might then regard the "region" described above (the one containing the data locations) as the region intrinsically related to that set of data locations. Unfortunately, the kriging variance cannot be computed from the coefficients of the dual form.
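The dual-form argument can be checked numerically. The Python/NumPy sketch below solves the ordinary kriging system in dual form, [Gamma 1; 1' 0][B; A0] = [z; 0], whose last row is exactly the condition that the Bi sum to zero. The spherical variogram, its sill and range, and the data values are assumptions chosen for illustration. Far from the data every gamma(xi - x) equals the sill, so the first sum vanishes and the estimate collapses to the constant A0.

```python
import numpy as np

def spherical(h, sill=1.0, a=2.0):
    # Illustrative bounded model: reaches the sill exactly at the range a.
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

def dual_ok(coords, z, gamma):
    """Solve [Gamma 1; 1^T 0][B; A0] = [z; 0] for the dual coefficients."""
    coords = np.asarray(coords, dtype=float)
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    sol = np.linalg.solve(A, np.append(np.asarray(z, dtype=float), 0.0))
    return sol[:n], sol[n]

def dual_estimate(x, coords, B, a0, gamma):
    # Z*(x) = SUM Bi gamma(xi - x) + A0 (ordinary kriging: one constant term)
    diff = np.asarray(coords, dtype=float) - np.asarray(x, dtype=float)
    return B @ gamma(np.linalg.norm(diff, axis=1)) + a0

# Three clustered points plus one isolated point (assumed data).
coords = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [3.0, 0.0]])
z = np.array([1.0, 1.0, 1.0, 5.0])
B, a0 = dual_ok(coords, z, spherical)

far = dual_estimate([100.0, 100.0], coords, B, a0, spherical)

# Generalized least-squares mean, using the covariance C(h) = sill - gamma(h).
C = 1.0 - spherical(np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1))
ones = np.ones(len(z))
gls_mean = ones @ np.linalg.solve(C, z) / (ones @ np.linalg.solve(C, ones))
print(B.sum(), far, a0, gls_mean, z.mean())
```

With these assumed data, A0 comes out at about 2.9 rather than the arithmetic mean of 2.0: the three clustered values carry roughly the weight of a single datum, which is the declustering effect of the generalized least-squares mean.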

5. Now to return to the question at hand: although it will not happen in every instance, generally speaking the kriging variance will increase when fewer data locations are used for the kriging. One way to see this is to consider a system of kriging equations that includes all the data locations, but now "force" some weights (the weights at some locations) to be zero, so that the solution vector looks as though it corresponded to a local neighborhood. This has to be a sub-optimal solution: the estimation variance obtained from it cannot be smaller than the estimation variance obtained without the "zero" constraints. That is, the system of equations obtained by using the local neighborhood is a special case of the one using all the data locations, special in the sense that additional constraints are imposed. Imposing additional constraints cannot decrease the kriging variance. If the larger variance is interpreted as meaning greater uncertainty, then this is what should happen when you leave out information.

Having said all of that, I would note that sometimes a data location should be thought of as "dis-information" rather than as "information".
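The "forced zero weights" argument can also be checked numerically. The Python/NumPy sketch below computes the ordinary kriging variance, sigma^2 = SUM lambda_i gamma(xi - x0) + mu, once with all data locations and once with only the k nearest ones as a stand-in for a local neighborhood. The spherical model, the random data locations and the choice k = 5 are illustrative assumptions.

```python
import numpy as np

def spherical(h, sill=1.0, a=2.0):
    # Illustrative bounded model (valid in 2-D): sill reached exactly at range a.
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

def ok_variance(coords, x0, gamma):
    """Ordinary kriging variance at x0 from Gamma lam + mu 1 = gamma0, 1^T lam = 1."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    g0 = gamma(np.linalg.norm(coords - np.asarray(x0, dtype=float), axis=1))
    sol = np.linalg.solve(A, np.append(g0, 1.0))
    lam, mu = sol[:n], sol[n]
    return lam @ g0 + mu

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 5.0, size=(30, 2))   # assumed data locations
x0 = np.array([2.5, 2.5])

# "Local neighborhood": keep only the k nearest data (all other weights forced to 0).
k = 5
nearest = np.argsort(np.linalg.norm(coords - x0, axis=1))[:k]

var_all = ok_variance(coords, x0, spherical)
var_local = ok_variance(coords[nearest], x0, spherical)
print(var_all, var_local)
```

Because the local system is the full system with extra constraints, var_local should never come out smaller than var_all (up to rounding); how much larger it is depends on how well the retained data screen the omitted ones.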

6. Nearly everyone is familiar with the injunction not to use the sample variogram for distances that exceed half the largest pair distance. In practice, of course, one often does not use the sample variogram even for lags that large, and hence, in looking at the size of local neighborhoods, one should keep in mind the largest lag used in modeling the variogram.
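One way to act on this advice, sketched below in Python/NumPy, is to compare the largest separation at which a given neighborhood makes the kriging system evaluate the variogram model (both data-to-target and data-to-data distances) with the largest lag actually used when fitting the model. The helper names, the transect data and the fitted-lag value are hypothetical, and checking data-to-data distances is an assumption about how one might apply the advice, not a standard procedure.

```python
import numpy as np

def max_pair_distance(coords):
    # Largest distance between any two data locations.
    c = np.asarray(coords, dtype=float).reshape(len(coords), -1)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    return d.max()

def neighborhood_max_lag(coords, x0, radius):
    """Hypothetical helper: largest distance at which the variogram model is
    evaluated when kriging at x0 with all data within `radius`."""
    c = np.asarray(coords, dtype=float).reshape(len(coords), -1)
    to_target = np.linalg.norm(c - np.asarray(x0, dtype=float).reshape(-1), axis=1)
    inside = to_target <= radius
    if not inside.any():
        return 0.0
    # Data-to-target distances and data-to-data distances inside the neighborhood.
    return max(to_target[inside].max(), max_pair_distance(c[inside]))

coords = np.arange(11.0)                  # assumed 1-D transect, spacing 1
cutoff = 0.5 * max_pair_distance(coords)  # rule of thumb: half the largest pair distance
max_lag_used = 4.0                        # assumed: largest lag actually fitted
lag = neighborhood_max_lag(coords, [5.0], radius=3.0)
print(cutoff, lag, lag <= max_lag_used)
```

Here the rule of thumb allows lags up to 5, the model is assumed to have been fitted only up to 4, yet a search radius of 3 already makes the kriging system use the model at a separation of 6.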

An interesting reference on kriging variances and sample design is:

Gao, Wang and Zhao (1996), "The updated kriging variance and optimal sample design", Mathematical Geology 28, 295-314.

Gregoire Dubois

Section of Earth Sciences

Institute of Mineralogy and Petrography

University of Lausanne

Switzerland

Currently detached in Italy

http://curie.ei.jrc.it/ai-geostats.htm

