
Re: AI-GEOSTATS: estimation with biased data- SUMMARY

  • john walter
    Message 1 of 1 , Mar 30, 2004
      Thank you for the prompt responses to my question. I have posted a summary
      of the responses below.

      Thank you.

      John Walter

      Donald E. Myers wrote

      There are a couple of underlying assumptions that are critical; you will then
      have to ask how your problem/application relates to those.
      1. The data is considered to be a non-random sample from one realization of a
      random function.
      Hence "probability basis" as it relates to the design of a sample
      pattern is not relevant. "Pattern" in this case pertains to the data
      locations, not the distribution of values.
      2. The random function must satisfy certain stationarity assumptions
      a. if you use a covariance and Simple Kriging then you need second-order
      stationarity
      b. if you use a variogram and Ordinary Kriging then you need
      "intrinsic" stationarity
      c. In the case that the mean function (of the random function; this is
      theoretically not the same as a trend surface but is sometimes estimated
      by a trend surface) is a polynomial function of the position
      coordinates then you need either second-order or intrinsic
      stationarity of the residuals. You can either use Universal Kriging
      or one of the above (the latter on the residuals)
      Now some practical as well as theoretical questions and problems
      A. What do you mean by "biased" data? In general in statistics, bias pertains to
      an estimator, i.e., when the expected value of the estimator is not the same as
      the quantity being estimated (estimator, not "estimate"). Authors will
      sometimes use the word in an intuitive sense but this is not very precise and is
      hard to either check or utilize.
      B. The Kriging estimator (any of the above three types) already compensates
      somewhat for clustering in the data locations. Unlike inverse distance
      weighting, when there are two data locations close together the weights are
      decreased on each location.
      C. Now there are aspects of the frequency distribution of the data that have an
      effect. The sample variogram is an average of squared differences, hence a
      skewed distribution can distort the sample variogram. Likewise the Kriging
      estimator is a weighted average and averages in general are sensitive to a few
      "outliers". This is why it is sometimes useful or necessary to use a
      transform such as the logarithm. That is a big discussion in itself.
      There are no distributional assumptions implicit in the derivation of
      the kriging equations.
      D. The Kriging estimator is always unbiased (separately at each location where
      you want an estimate). That is, the equations for the coefficients in the kriging
      estimator are derived under the constraint of unbiasedness. This is probably not
      the same as an intuitive idea of unbiasedness.
      E. Any valid choice of the variogram/covariance function will result in a
      solution for the kriging equations (valid means that the variogram is
      conditionally negative definite or that the covariance function is positive
      definite). However the solution and hence the estimated values are affected by
      the choice of the variogram/covariance function, hence it is important to fit
      the model well. In practice you will use a search neighborhood and the results
      can be sensitive to the search neighborhood parameters.
      The problem alluded to by Cressie is related to some tendency to change the
      sampling design based on the data collected, i.e., when they found high grades
      they tended to drill more exploratory holes nearby and when they found low
      grades they tended to not drill more exploratory holes nearby. Thus they
      affected the distribution of the grades by the sampling plan.
      Finally, note that a "good" sampling plan for kriging is not the same as a
      "good" sampling plan for estimating and modeling the variogram or covariance
      function. There are quite a number of papers in the literature on both of
      these issues but no absolute solution.
      Donald E. Myers
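
      Point B above can be illustrated with a small numerical sketch (added for
      this summary, not code from the original message). It solves the ordinary
      kriging system for three data locations that are all about one unit from
      the target, two of them nearly coincident, and compares the weights with
      inverse distance weights. The exponential covariance, its range of 2.0,
      the point coordinates, and the function names are all assumptions chosen
      for illustration.

```python
import numpy as np

def exp_cov(h, sill=1.0, a=2.0):
    """Exponential covariance model (an assumed model, chosen for illustration)."""
    return sill * np.exp(-h / a)

def ordinary_kriging_weights(data_xy, target_xy, cov=exp_cov):
    """Solve the ordinary kriging system and return the data weights."""
    data_xy = np.asarray(data_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    n = len(data_xy)
    # Distances between data locations, and from each data location to the target.
    d = np.linalg.norm(data_xy[:, None, :] - data_xy[None, :, :], axis=-1)
    d0 = np.linalg.norm(data_xy - target_xy, axis=-1)
    # Bordered system: the last row and column carry the unbiasedness
    # constraint (weights sum to one) through a Lagrange multiplier.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    solution = np.linalg.solve(A, b)
    return solution[:n]  # drop the Lagrange multiplier

def idw_weights(data_xy, target_xy, power=2.0):
    """Inverse distance weights, normalised to sum to one."""
    diff = np.asarray(data_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    d0 = np.linalg.norm(diff, axis=-1)
    w = 1.0 / d0 ** power
    return w / w.sum()

# One isolated point on the left, two nearly coincident points on the right.
data = [(-1.0, 0.0), (1.0, 0.05), (1.0, -0.05)]
target = (0.0, 0.0)

ok_w = ordinary_kriging_weights(data, target)
idw_w = idw_weights(data, target)
print("ordinary kriging:", np.round(ok_w, 3))
print("inverse distance:", np.round(idw_w, 3))
```

      Under these assumptions the isolated point receives roughly half of the
      kriging weight and the two clustered points share the rest, while inverse
      distance weighting gives all three points nearly equal weight. The weights
      summing to one is the unbiasedness constraint mentioned in point D.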

      Isobel Clark wrote

      When I saw the title of your email, I thought you
      would be talking about data which was incorrectly
      measured -- that is what we generally understand as
      'bias'. For example, in the gold mines, the method of
      determining how much gold is in a sample can be
      consistently lower than the real value (or higher!).
      Your problem seems to be in non-uniform (or
      non-random) sampling with respect to both location and
      value. Clustered/preferential sampling is not a
      problem with ordinary geostatistics but can become one
      if you use one of the mechanical transformation
      methods such as 'normal score' or rank order transform
      since these rely on 'random' sampling with respect
      to value in order to get a representative histogram.
      Using a lognormal or other parametric transform is not
      affected by these problems unless the preferential
      sampling is excessive.
      Kriging estimates deal with the clustering and
      preferential sampling provided you have either used a
      parametric transform or have declustered before your
      score or rank transform. So you should get unbiassed
      answers for your overall parameters.
      Hope this helps some
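
      The declustering step mentioned above can be sketched as follows (an
      illustration added for this summary, not code from the original message).
      It uses simple cell declustering: each occupied grid cell gets equal total
      weight, shared among the samples that fall in it. The function name, the
      cell size, the grid origin at zero, and the toy data are assumptions; in
      practice several cell sizes and origins are usually tried.

```python
import numpy as np

def cell_decluster_weights(coords, cell_size):
    """Cell-declustering weights: each occupied cell gets equal total weight.

    Assumes a grid anchored at the coordinate origin (a simplification).
    """
    coords = np.asarray(coords, dtype=float)
    # Integer cell index of every sample.
    cells = np.floor(coords / cell_size).astype(int)
    # Map each sample to its cell and count the samples per occupied cell.
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()  # shape differs between NumPy versions
    n_occupied = len(counts)
    return 1.0 / (counts[inverse] * n_occupied)

# Three isolated low values plus a cluster of four high values in one cell.
coords = [(0.5, 0.5), (2.5, 0.5), (0.5, 2.5),
          (2.5, 2.5), (2.4, 2.6), (2.6, 2.4), (2.5, 2.6)]
values = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])

weights = cell_decluster_weights(coords, cell_size=1.0)
naive_mean = values.mean()
declustered_mean = np.sum(weights * values)
print("naive mean:", naive_mean)
print("declustered mean:", declustered_mean)
```

      The same weights could then be used to build a weighted histogram before a
      normal score or rank transform, so that the cluster of high values does
      not dominate the transformed distribution.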

      Ruben Roa Ureta wrote:
      > > Dear list members,
      > >
      > > I am wrestling with particular dilemma regarding how to incorporate data
      > > collected without a design or probability basis into kriging estimators.
      >Kriging estimators of interpolated values on a grid coming from intrinsic
      >geostatistics do not depend on a sampling design, i.e. they are the same
      >for all sampling designs. In transitive geostatistics they do depend on
      >the sampling design. Transitive and intrinsic geostatistics represent the
      >same divide as design- and model-based statistics in general. Estimates of
      >the estimation variance of the mean or the total across the grid do depend
      >on the sampling design both in intrinsic and transitive geostatistics,
      >though in essentially different manners.
      > > In particular I am dealing with data that has clustered and uneven
      > > sampling as well as some bias towards higher data values. Is it
      > > appropriate to use geostatistics to obtain means and variances in this
      > > situation?
      >Your language is a little imprecise. The bias is defined for estimators
      >and not for values so it is rather strange to read "bias towards higher
      >data values". I guess you mean that the people collecting the samples had
      >an intention to collect more samples where the variable yielded higher
      >values. If that is the case, geostatistics can be applied to those samples
      >because contrary to design-based inference, the intrinsic geostatistical
      >estimator of the mean or the total does not depend on the intentions of the
      >people collecting the samples.
      >Also, "to obtain mean and variances" is imprecise. In intrinsic
      >least-squares geostatistics you have the 'kriging variance', which
      >fulfills an analytical role in optimizing interpolation, and 'estimation
      >variance', which is the second order statement about the quality of the
      >estimate of the mean or the total. If your question refers to the
      >estimation variance, then you can use intrinsic geostatistics to estimate
      >the estimation variance because this estimation does not depend on the
      >intentions of the people doing the sampling, though it may depend on the
      >geometry of the actual sampling. In fact, it is convenient to perform some
      >form of systematic sampling. The latest I have seen on estimation
      >variances is:
      >Aubry and Debouzie. 2000. Geostatistical estimation variance for the
      >spatial mean in two-dimensional systematic sampling. Ecology 81:543-553.
      >And there is a program, called EVA, written by Lafont and Petitgas. You
      >can ask Pierre Petitgas for a copy of the program.
      > > I understand that the use of biased data was part of the original dilemma
      > > and impetus for the development of geostatistics in the gold mining
      > > industry (Cressie, 2003. J Math. Geol. 22:239-252) but I cannot find a
      > > satisfactory answer to the question of whether you can use biased data in
      > > geostatistical estimation.
      >Please see above.
      > > Based on kriged estimates obtained from biased samples of simulated
      > > spatially autocorrelated data sets with known parameters, I find that
      > > kriging means are, on average, less biased than the corresponding
      > > arithmetic sample mean. Is this a case where, in practice, the
      > > differential spatial weighting of sample data provided by kriging,
      > > results in less biased means but with little theoretical basis?
      >The theoretical basis of model-based estimation in general is sound. I
      >guess that is why most of statistics is model-based, i.e. in most of
      >statistics expectations for the estimators are computed with reference to
      >a model for a random variable rather than with reference to the
      >probability of the sample under a sampling design.
      > > Secondarily are the
      > > geostatistical variance estimates obtained from biased data theoretically
      > > valid? I guess that you could interpret them in the sense that "if one was
      > > to sample the same random process with the same set of biased sample
      > > locations, the geostatistical variance is the prediction error that one
      > > would observe". The problem lies, I think, in how "representative" the
      > > biased samples are of the random process and, with no design basis to the
      > > sampling, one is left with the inherent logical confound of model-based
      > > estimation methods: that estimates are model-unbiased, provided the model
      > > is correct, but I will never know if the model is correct.
      >When you work with models you are forced to try to understand the physics
      >of the problem, how variables relate to each other in reality. You don't
      >know if your model is correct for certain, but you can defend it by
      >understanding the nature of the problem. On the other hand, when you base
      >your judgement on blind random sampling, you never know if the sample you
      >actually obtained shares the properties of all the possible samples that
      >could have been obtained under the sampling design, though all your
      >computations depend on this real and unique sample being replaced by all
      >the possible samples that could have been obtained.
      > > So does
      > > geostatistics provide a "better" model for estimation with biased data in
      > > practice in certain situations because of the spatial weighting of samples
      > > or is this theoretically unsound?
      >Samples are not biased. Bias is a property of estimators. When you say
      >"biased samples" you seem to mean "non random, or intentional samples".
      >There is no special problem with intentional samples and they are very
      >good in some conditions.
      > > I have searched the literature with limited definitive answers but wanted
      > > to engage the group in this discussion and ask for any references on the
      > > subject.
      >We have discussed this issue a few times on this mailing list. See the
      >archives at the AI-GEOSTATS website.
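
      The observation in the original question, that a kriged mean tends to sit
      closer to the truth than the arithmetic mean when sampling is concentrated
      on high values, can be reproduced with a small sketch (added for this
      summary, not code from any of the correspondents). It computes the
      model-based best linear unbiased estimate of a constant mean, with weights
      proportional to C^{-1} 1, for a one-dimensional transect with a dense
      cluster of samples on the peak. The deterministic profile, the sample
      locations, the exponential covariance, and its range of 2.0 are all
      assumptions for illustration; the profile is not a realization of a random
      function.

```python
import numpy as np

# Sparse regular samples plus a dense cluster around the peak at x = 5.
x = np.array([0.0, 2.0, 4.0, 4.8, 4.9, 5.0, 5.1, 5.2, 6.0, 8.0, 10.0])
z = np.exp(-0.5 * (x - 5.0) ** 2)  # assumed toy profile, highest at x = 5

# Assumed exponential covariance between all pairs of sample locations.
a = 2.0
C = np.exp(-np.abs(x[:, None] - x[None, :]) / a)

# Weights of the best linear unbiased estimator of a constant mean:
# proportional to C^{-1} 1, normalised to sum to one.
u = np.linalg.solve(C, np.ones_like(x))
w = u / u.sum()

arithmetic_mean = z.mean()
weighted_mean = w @ z

# Reference value: average of the profile over a fine grid on [0, 10].
grid = np.linspace(0.0, 10.0, 10001)
reference_mean = np.exp(-0.5 * (grid - 5.0) ** 2).mean()

print("arithmetic mean:", round(arithmetic_mean, 3))
print("weighted mean:  ", round(weighted_mean, 3))
print("reference mean: ", round(reference_mean, 3))
```

      The clustered samples receive much smaller weights than the isolated ones,
      so the weighted mean is pulled less towards the oversampled high values.
      The weights depend only on the sample geometry and the covariance model,
      not on the intentions behind the sampling, which is the point made in the
      reply above.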
