of the responses below.

Thank you.

John Walter

Donald E. Myers wrote

There are a couple of underlying assumptions that are critical; you will then

have to ask how your problem/application relates to those.

1. The data is considered to be a non-random sample from one realization of a

random function.

Hence "probability basis" as it relates to the design of a sample

pattern is not relevant. "Pattern" in this case pertains to the data

locations, not the distribution of values.

2. The random function must satisfy certain stationarity assumptions

a. if you use a covariance and Simple Kriging then you need second order

stationarity

b. if you use a variogram and Ordinary Kriging then you need

"intrinsic" stationarity

c. In the case that the mean function (of the random function; this is

theoretically the same as a trend surface but is sometimes estimated

by a trend surface) is a polynomial function of the position

coordinates then you need either second order or intrinsic

stationarity of the residuals. You can either use Universal Kriging

or one of the above (the latter on the residuals)
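
For reference, the two stationarity conditions in items a and b are usually written as below. This is a sketch in common textbook notation; the symbols Z, m, C, and \gamma do not appear in the original message and are assumed here.

```latex
% Second-order stationarity (covariance + Simple Kriging):
E[Z(x)] = m, \qquad
\operatorname{Cov}\bigl(Z(x+h),\, Z(x)\bigr) = C(h)

% Intrinsic stationarity (variogram + Ordinary Kriging):
E\bigl[Z(x+h) - Z(x)\bigr] = 0, \qquad
\tfrac{1}{2}\operatorname{Var}\bigl(Z(x+h) - Z(x)\bigr) = \gamma(h)

% When second-order stationarity holds, the two are linked by
\gamma(h) = C(0) - C(h)
```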

Now some practical as well as theoretical questions and problems

A. What do you mean by "biased" data? In general in statistics, bias

pertains to

an estimator, i.e., when the expected value of the estimator is not the

same as

the quantity being estimated (estimator, not "estimate"). Authors will

sometimes use the word in an intuitive sense but this is not very precise

and is

hard to either check or utilize.

B. The Kriging estimator (any of the above three types) already compensates

somewhat for clustering in the data locations. Unlike inverse distance

weighting, when there are two data locations close together the weights are

decreased on each location.
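
The declustering effect described in B can be seen in a small numerical sketch. Everything below is an illustrative assumption rather than part of the original message: the exponential variogram, its sill and range, the three data locations, and the helper names `ok_weights` and `idw_weights`. Two of the three locations sit close together, and the ordinary kriging weights are compared with inverse distance weights.

```python
import numpy as np

def exp_variogram(h, sill=1.0, corr_range=10.0):
    """Exponential variogram without nugget (illustrative parameters)."""
    return sill * (1.0 - np.exp(-h / corr_range))

def ok_weights(coords, target, gamma=exp_variogram):
    """Ordinary kriging weights for a single target location."""
    n = len(coords)
    # Pairwise distances between data locations.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging matrix: variogram values bordered by the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    # Right-hand side: variogram between each datum and the target.
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    solution = np.linalg.solve(A, b)
    return solution[:n]  # the last entry is the Lagrange multiplier

def idw_weights(coords, target, power=2.0):
    """Inverse distance weights, normalised to sum to one."""
    d = np.linalg.norm(coords - target, axis=1)
    w = 1.0 / d**power
    return w / w.sum()

# One isolated point on the left, two nearly coincident points on the right.
coords = np.array([[-5.0, 0.0], [5.0, 0.0], [5.0, 0.5]])
target = np.array([0.0, 0.0])

w_ok = ok_weights(coords, target)
w_idw = idw_weights(coords, target)
print("kriging weights:", w_ok)
print("IDW weights:    ", w_idw)
```

In this configuration inverse distance weighting gives each point roughly one third, so the cluster receives about two thirds of the total weight. Kriging gives the isolated point about half, and the two clustered points share the remainder.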

C. Now there are aspects of the frequency distribution of the data that

have an

effect. The sample variogram is an average of squared differences, hence a

skewed distribution can distort the sample variogram. Likewise the Kriging

estimator is a weighted average and averages in general are sensitive to a few

"outliers". This is why it is sometimes useful or necessary to use a

non-linear

transform such as the logarithm. That is a big discussion in itself.

There are no distributional assumptions implicit in the derivation of

the kriging equations.
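
To make the sensitivity in C concrete, here is a minimal sketch of the classical sample variogram (half the mean squared difference per lag bin), applied to the same data with and without one extreme value. The function name `sample_variogram`, the simulated data, the lag bins, and the outlier value are all assumptions made for illustration.

```python
import numpy as np

def sample_variogram(coords, values, bin_edges):
    """Classical estimator: half the mean squared difference in each lag bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = (values[:, None] - values[None, :]) ** 2
    # Keep each pair once (upper triangle, no diagonal).
    iu = np.triu_indices(len(values), k=1)
    d, sq = d[iu], sq[iu]
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_bin = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * sq[in_bin].mean()
    return gamma

generator = np.random.default_rng(0)
coords = generator.uniform(0.0, 100.0, size=(200, 2))
z = generator.normal(size=200)

# Same data with a single extreme value substituted at one location.
z_outlier = z.copy()
z_outlier[0] = 50.0

edges = np.linspace(0.0, 50.0, 6)
g_clean = sample_variogram(coords, z, edges)
g_outlier = sample_variogram(coords, z_outlier, edges)
print("without outlier:", g_clean)
print("with outlier:   ", g_outlier)
```

Because every pair involving the extreme value contributes a very large squared difference, a single observation can raise the estimate in every lag bin it falls into. This is the motivation for transforms such as the logarithm.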

D. The Kriging estimator is always unbiased (separately at each location where

you want an estimate). That is, the equations for the coefficients in kriging

estimator are derived under the constraint of unbiasedness. This is

probably not

the same as an intuitive idea of unbiasedness.
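
The unbiasedness constraint in D can be written out for Ordinary Kriging. This is a sketch assuming a constant but unknown mean m; the symbols Z^{*}, \lambda_i, and m are notation introduced here, not taken from the original message.

```latex
\begin{align*}
Z^{*}(x_0) &= \sum_{i=1}^{n} \lambda_i\, Z(x_i) \\
E\bigl[Z^{*}(x_0) - Z(x_0)\bigr]
  &= \sum_{i=1}^{n} \lambda_i\, m - m
   = m \Bigl(\sum_{i=1}^{n} \lambda_i - 1\Bigr) \\
E\bigl[Z^{*}(x_0) - Z(x_0)\bigr] = 0 \ \text{for every } m
  &\iff \sum_{i=1}^{n} \lambda_i = 1
\end{align*}
```

The expectation here is taken over the random function at fixed data locations, not over possible sampling patterns, which is why this differs from the intuitive, design-based idea of an unbiased sample.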

E. Any valid choice of the variogram/covariance function will result in a

unique

solution for the kriging equations (valid means that the variogram is

conditionally negative definite or that the covariance function is positive

definite). However the solution and hence the estimated values are affected by

the choice of the variogram/covariance function, hence it is important to fit

the model well. In practice you will use a search neighborhood and the results

can be sensitive to the search neighborhood parameters.

The problem alluded to by Cressie is related to some tendency to change the

sampling design based on the data collected, i.e., when they found high grades

they tended to drill more exploratory holes nearby and when they found low

grades they tended to not drill more exploratory holes nearby. Thus they

altered

the distribution of the grades by the sampling plan.

Finally, note that a "good" sampling plan for kriging is not the same as a

"good" sampling plan for estimating and modeling the variogram or covariance

function. There are quite a number of papers in the literature on both of

these

issues but no absolute solution.

Donald E. Myers

http://www.u.arizona.edu/~donaldm

Isobel Clark wrote

When I saw the title of your email, I thought you

would be talking about data which was incorrectly

measured -- that is what we generally understand as

'bias'. For example, in the gold mines, the method of

determining how much gold is in a sample can be

consistently lower than the real value (or higher!).

Your problem seems to be in non-uniform (or

non-random) sampling with respect to both location and

value. Clustered/preferential sampling is not a

problem with ordinary geostatistics but can become one

if you use one of the mechanical transformation

methods such as 'normal score' or rank order transform

since these rely on 'random' sampling with respect

to value in order to get a representative histogram.

Using a lognormal or other parametric transform is not

affected by these problems unless the preferential

sampling is excessive.

Kriging estimates deal with the clustering and

preferential sampling provided you have either used a

parametric transform or have declustered before your

score or rank transform. So you should get unbiassed

answers for your overall parameters.
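
As an illustration of declustering before a score or rank transform, the sketch below uses cell declustering, one common technique and not necessarily the one meant above. The function name `cell_declustering_weights`, the cell size, and the toy data are assumptions. Each sample is weighted inversely to the number of samples sharing its grid cell, so a cluster in a high-value zone no longer dominates the histogram.

```python
import numpy as np

def cell_declustering_weights(coords, cell_size):
    """Weight each sample by 1 / (number of samples in its grid cell)."""
    cells = np.floor(coords / cell_size).astype(int)
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # guard against shape differences across NumPy versions
    w = 1.0 / counts[inverse]
    return w / w.sum()

# Four clustered samples in a high-value zone, two isolated low-value samples.
coords = np.array([
    [1.0, 1.0], [1.5, 1.2], [1.2, 1.8], [1.8, 1.4],
    [12.0, 3.0], [5.0, 14.0],
])
values = np.array([10.0, 11.0, 9.0, 10.0, 2.0, 1.0])

weights = cell_declustering_weights(coords, cell_size=5.0)
naive_mean = values.mean()
declustered_mean = np.sum(weights * values)
print("naive mean:      ", naive_mean)
print("declustered mean:", declustered_mean)
```

The same weights can be used to build a weighted histogram before a normal score transform. In practice the result depends on the cell size, which usually has to be chosen by trial.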

Hope this helps some

Isobel

Ruben Roa Ureta wrote:

> > Dear list members,

> >

> > I am wrestling with particular dilemma regarding how to incorporate data

> > collected without a design or probability basis into kriging estimators.

>

>Kriging estimators of interpolated values on a grid coming from intrinsic

>geostatistics do not depend on a sampling design, i.e. they are the same

>for all sampling designs. In transitive geostatistics they do depend on

>the sampling design. Transitive and intrinsic geostatistics represent the

>same divide as design- and model-based statistics in general. Estimates of

>the estimation variance of the mean or the total across the grid do depend

>on the sampling design both in intrinsic and transitive geostatistics,

>though in essentially different manners.

>

> > In particular I am dealing with data that has clustered and uneven

> > sampling as well as some bias towards higher data values. Is it

> > appropriate to use geostatistics to obtain means and variances in this

> > situation?

>

>Your language is a little imprecise. The bias is defined for estimators

>and not for values so it is rather strange to read "bias towards higher

>data values". I guess you mean that the people collecting the samples had

>an intention to collect more samples where the variable yielded higher

>values. If that is the case, geostatistics can be applied to those samples

>because contrary to design-based inference, the intrinsic geostatistical

>estimator of the mean or the total does not depend on the intentions of the

>people collecting the samples.

>

>Also, "to obtain mean and variances" is imprecise. In intrinsic

>least-squares geostatistics you have the 'kriging variance', which

>fulfills an analytical role in optimizing interpolation, and 'estimation

>variance', which is the second order statement about the quality of the

>estimate of the mean or the total. If your question refers to the

>estimation variance, then you can use intrinsic geostatistics to estimate

>the estimation variance because this estimation does not depend on the

>intentions of the people doing the sampling, though it may depend on the

>geometry of the actual sampling. In fact, it is convenient to perform some

>form of systematic sampling. The latest I have seen on estimation

>variances is:

>Aubry and Debouzie. 2000. Geostatistical estimation variance for the

>spatial mean in two-dimensional systematic sampling. Ecology 81:543-553.

>And there is a program, called EVA, written by Lafont and Petitgas. You

>can ask Pierre Petitgas for a copy of the program.

>

> > I understand that the use of biased data was part of the original dilemma

> > and impetus for the development of geostatistics in the gold mining

> > industry (Cressie, 2003. J Math. Geol. 22:239-252) but I cannot find a

> > satisfactory answer to the question of whether you can use biased data in

> > geostatistical estimation.

>

>Please see above.

>

> > Based on kriged estimates obtained from biased samples of simulated

> > spatially autocorrelated data sets with known parameters, I find that

> > kriging means are, on average, less biased than the corresponding

> > arithmetic sample mean. Is this a case where, in practice, the

> > differential spatial weighting of sample data provided by kriging,

> > results in less biased means but with little theoretical basis?

>

>The theoretical basis of model-based estimation in general is sound. I

>guess that is why most of statistics is model-based, i.e. in most of

>statistics expectations for the estimators are computed with reference to

>a model for a random variable rather than with reference to the

>probability of the sample under a sampling design.

>

> > Secondarily are the

> > geostatistical variance estimates obtained from biased data theoretically

> > valid? I guess that you could interpret them in the sense that "if one was

> > to sample the same random process with the same set of biased sample

> > locations, the geostatistical variance is the prediction error that one

> > would observe". The problem lies, I think, in how "representative" the

> > biased samples are of the random process and, with no design basis to the

> > sampling, one is left with the inherent logical confound of model-based

> > estimation methods- that estimates are model-unbiased, provided the model

> > is correct, but I will never know if the model is correct."

>

>When you work with models you are forced to try to understand the physics

>of the problem, how variables relate to each other in reality. You don't

>know if your model is correct for certain, but you can defend it by

>understanding the nature of the problem. On the other hand, when you base

>your judgement on blind random sampling, you never know if the sample you

>actually obtained shares the properties of all the possible samples that

>could have been obtained under the sampling design, though all your

>computations depend on this real and unique sample being replaced by all

>the possible samples that could have been obtained.

>

> > So does

> > geostatistics provide a "better" model for estimation with biased data in

> > practice in certain situations because of the spatial weighting of samples

> > or is this theoretically unsound?

>

>Samples are not biased. Bias is a property of estimators. When you say

>"biased samples" you seem to mean "non random, or intentional samples".

>There is no special problem with intentional samples and they are very

>good in some conditions.

>

> > I have searched the literature with limited definitive answers but wanted

> > to engage the group in this discussion and ask for any references on the

> > subject.

>

>We have discussed this issue a few times in this mail list. See the

>archives at the AI-GEOSTAT website.

>

>Cheers

>Ruben