Loading ...
Sorry, an error occurred while loading the content.

Re: AI-GEOSTATS: Cracks in the foundations?

Expand Messages
  • Donald E. Myers
    Re: the article by S. Henley ( Geostatistics-cracks in the foundations? ) There was a debate of sorts in Mathematical Geology in 1987 but I don t think it
    Message 1 of 5 , Oct 28, 2001
    • 0 Attachment
      Re: the article by S. Henley ("Geostatistics-cracks in the foundations?")

      There was a debate of sorts in Mathematical Geology in
      1987 but I don't think it contributed much to understanding geostatistics
      including its strengths and weaknesses. Here are a few more thoughts.

      GENERAL COMMENTS

      All of the concerns raised in the article are well-known in the geostatistical
      literature, in particular in Mathematical Geology and Water Resources
      Research as well as in the publications emanating from the various
      APCOMS. Unfortunately the problems, strengths and weaknesses of
      geostatistics are not well described in the article, in some cases there are
      clear errors or mis-representations.

      I. Although geostatistics is clearly and deservedly associated with G. Matheron
      (and some of his early students such as A. Journel, J.-P. Chiles,
      P. Delfiner as well as others from the Fontainebleau center), his
      work is similar to or duplicates the work of Matern and Ghandin. Matheron
      acknowledges this in some of his writings. In addition as almost always
      happens, his work builds on work done earlier by many people in a
      number of fields. Finally, geostatistics is not a stagnant field, it
      has developed and evolved due to the work of many people, not always
      in directions directly related to Matheron's original work (this is
      not to belittle or denigrate Matheron's contribution but to recognize
      that geostatistics as it exists today is not the same as 30 years ago).
      Henley's article seems to imply that little has changed in the interim.

      II. Although there are aspects of the applications in hydrology that
      incorporate state equations derived from basic principles, geostatistics
      has most often been used for the analysis of spatial data when no
      state equations are available. This means that the problem is really
      ill-posed, i.e., the solution is not unique and to obtain a unique solution
      one must impose some form of model. The stochastic model implicit in
      Matheron's work serves this purpose. However, one should not be mis-led
      in thinking that this is totally artificial. There are clear connections
      with Bayesian statistics (see the work of Wahba as early as 1970),
      Thin Plate Splines and the more general interpolation methodology
      known in the numerical analysis literature as "Radial Basis Functions"
      which is a deterministic approach to the problem. Again Henley seems to
      ignore this background.

      MORE SPECIFIC COMMENTS

      III. Henley makes a great deal out of the point that kriging is a "linear
      method". This is true and not true. The kriging estimator (Simple,
      Ordinary, Universal) for the value at a non-data location or the
      spatial average over an area or volume (e.g., average grade in a mining
      block) is a linear combination of the data. Written as an interpolating
      function it is NOT linear, i.e., not a linear function of the position
      coordinates. Moreover in the case of multivariate normality, the
      Simple Kriging estimator is the conditional expectation (which is THE minimum
      variance estimator, linear or otherwise). Of course multivariate
      normality is a strong condition.

      IV. Henley claims that it is necessary that the error distribution be normal,
      this is absolutely wrong.

      V. While there are individuals who would identify themselves as "geostatisticians"
      it is more likely that individuals using geostatistics as well as
      contributing to new developments in the field would call themselves:
      mining engineers, petroleum engineers, geologists, soil scientists,
      hydrologists, statisticians, mathematicians, ecologists, geographers,
      etc. Hence the constant reference to (and laying blame on) "geostatisticians"
      is mis-leading.

      VI. Henley fails to distinguish between "estimation variance" and
      "kriging variance" (the latter being the minimized value of the former,
      i.e., the weights in the kriging estimator are obtained by this
      minimization).

      VII. Henley fails to distinguish between the sample/experimental
      variogram and the (theoretical) model. The sample variogram is AN
      estimator of the model (and certainly a number of authors would
      say that it is not the only choice). Ultimately however it is the
      variogram (theoretical) that is of interest and which is use in
      the kriging equations (to obtain the coefficients in the kriging
      estimator).

      VIII. It is true that there are similarities between the Inverse
      Distance Weighting (IWD) estimator and the kriging estimator. (1) As
      noted by Henley, both are weighted averages of the data (but in the
      case of IWD the coefficients are always non-negative, this is not
      true for the kriging estimator), (2) in the usual formulation, IWD
      is isotropic (only distance is used, not direction), (3) IWD is
      not a "perfect/exact" interpolator as is the kriging estimator since
      one can estimate at a data location using the data value at that location
      in IWD (this would involve a zero distance), (4) while IWD in some
      sense incorporates the spatial correlation between the value at the location
      to be estimated and a data location (all pairings), IWD does NOT
      incorporate the spatial correlation between the values at pairs of
      data locations hence some information is ignored in IWD. Finally
      one might attempt to optimize the choice of the exponent in IWD, one
      size does NOT fit all (see a paper by Kane et al, Computers & Geosciences
      1982).

      IX. It is certainly true that kriging will "smooth" the data, howeve
      this is true of all interpolation methods/algorithms. This is one
      reason why some advocate the use of conditional simulation as an
      alternative.

      X. All statistical techniques are subject to problems resulting from
      a lack of data, many of the problems Henley identifies or asserts are
      related to insufficient data. Unfortunately, data costs money and one
      will almost never have enough (of either). The related problem is how the
      data is collected. In more typical applications of statistics, one
      "designs" the experiment to ensure that the underlying statistical
      assumptions are satisifed. This simply will not work in geostatistics.

      XI. Henley correctly identifies "stationarity" or rather the lack of it
      as a critical problem. However he does not quite describe it correctly. The
      constant mean condition (which is only part of the definition of
      second order stationarity) pertains to the implied random function, not
      to the data. Since one has data from at most one realization of the
      random function one can not statistically "test" this assumption. This
      is one of the places where the lack of data is a serious problem,e.g,
      to decide whether to partition the region of interest into separate
      regions to obtain "stationarity" on each separately. Note that
      the condition on the variogram/covariance (being only a function of
      the separation vector) is critical and not implied by the constant
      mean condition.

      XII. The statement "The problem with this method was that the semivariogram
      itself was sensitive to the form of the deterministic surface.
      Therefore, it required a number of iterations of kriging, variogram
      computation, and model fitting, to converge towards a consistent
      solution." contains a germ of truth but it is also confused and mis-leading.
      What Henley is presumably referring to is that if one fits a trend surface to
      the data and then computes a sample variogram using the residuals, the
      resulting sample variogram is biased (see a paper by N. Cressie in
      Mathematical Geology for example and a much earlier paper by another
      author in the proceedings of the NATO conference of 1975). Then there
      was a dissertation in the Dept Hydrology (J. Samper) about 1987) on
      a maximum likelihood method (an iterative application) for fitting the
      sample variogram to residuals. This problem is not completely resolved
      because as is frequently the case in geostatistics or rather in the
      application of geostatistics there is a discrepancy between theory and
      application. It is well known (see Matheron's 1971 Fountainebleau Summer
      School Lecture Notes) that the optimal estimator of the drift, i.e., the
      non-constant mean of the random function, is obtained by kriging. However
      to apply this means that one must first have the variogram model, yet
      one can not model the variogram using the original data. The problem is
      circular. In practice the problem in fact often addressed by fitting a
      trend surface to the data and fitting the variogram to the residuals.

      The real problem is the fact that the sample variogram only estimates
      the variogram if the mean of the random function is constant. When the
      mean is not constant and in particular when the fitted trend surface is
      not just a constant, the sample variogram will exhibit a very rapid growth
      rate (quadratic or greater). There are no valid variogram models with
      this property. Hence if the sample variogram exhibits this growth condition
      this property is taken as evidence of a non-stationarity. Note that this
      property or characteristic is not absolute, it may only appear for large lag
      distances and hence one may be able to fit a variogram model to the
      sample variogram using only the information for short lags.


      XIII. The statements "These methods have found little practical application
      because of their complexity, and the inherent instability of their
      solutions. Furthermore, the resulting kriging system is no longer
      linear and thus loses its ideal "BLUE" properties." are partially correct and
      partially incorrect.

      It is well known that the form of the kriging estimator and the kriging
      equations when using generalized covariances is exactly the same as for
      Universal Kriging. It is also well known that the kriging estimator
      (written in the dual form), when using the right choice of a generalized
      covariance, is the same as the thin plate spline. One must use the so-called
      "spline covariance", this was implemented in the BLUEPACK software back
      in the 80's and is also in the current ISATIS. It is NOT correct to
      say that the resulting kriging system is not linear, the form of the kriging
      estimator is unchanged and the kriging equations ARE still linear. The
      kriging estimator is still a BLUE contrary to Henley's claim.

      There are several reasons why generalized covariances are not widely used
      (for a particularly good presentation see the recent book by Delfiner
      and Chiles). One is that to write software is much more complicated, i.e.,
      to determine the order of the non-stationarity. A second is that as
      contrasted with the family of known variograms there are only a few
      known generalized covariances (although every variogram corresponds to
      a generalized covariance) and these are all isotropic. Hence in practice
      one is likely to revert to using a variogram. Third, it is not so difficult
      to use geostatistics/kriging because of the availability of both free
      and moderately inexpensive software, the theoretical questions are largely
      taken care of in writing the software. There is essentially no free
      software that incorporates the use of generalized covariances (ISATIS is
      rather expensive because it is intended primarily by petroleum and mining
      companies)

      XIV. The first half of the statement "Although regionalised variable theory
      does not require normal (gaussian) distribution of the data, it does
      assume normal distribution of error terms." is almost true but the second half
      is completely false. Henley consistently fails to distinguish between
      properties of a particular data set and the properties of the random function.
      The kriging equations are derived without making any distributional
      assumptions. It is true that kriging (and any other interpolation method)
      will tend to smooth the data. This characteristic is exagerated when the
      distribution of the data is not approximately symmetric (but not necessarily
      normal) and if the distribution of the data has "fat tails". Both lognormal
      kriging and indicator kriging are ways to deal with this problem.

      In claiming that the kriging estimator is no longer "BLUE" when using the
      a log transformation, Henley is presumably referring to the fact that
      if one simply exponentiates the result then there is a bias. Under an
      assumption of multivariate lognormality one can compute the bias and hence
      compensate for it. This has received considerable attention in the geostatistics
      literature. The problem of course is that is not possible to absolutely
      know whether the underlying random function (for a particular data set) is
      multivariate lognormal (univariate lognormal is a much weaker condition).
      All one can do is to determine whether that assumption is reasonable.

      Finally, it is quite plausible to ask whether there might be a better way
      to approach the problem. There is nothing really wrong with Matheron's
      work, the problem is knowing whether the assumptions implicit in its
      use are valid. Since those assumptions pertain to the random function
      , RATHER THAN THE DATA, it is not possible to completely answer the
      question. The real question, particularly for the practitioner is whether
      geostatistics produces useful results. "useful results" does not have
      a very precise meaning in general but it will have a very real meaning
      to the practitioner. Geostatistics is not a cure-all nor is it useful
      for all problems. It is not a black box scheme and must be used with
      some care. In some cases, i.e., for some data sets and some objectives,
      a user need only have access to a software package such as GEOEAS,
      G-STAT or even the add-ons available in ARCVIEW, S-PLUS, etc. In other
      cases it may be necessary to seek the advice and assistance of someone
      more experienced and with a stronger understanding of the mathematics/
      statistics in order to adequately apply the geostatistics to the data
      set.

      For a slightly different take on the same general question see a recent book by M.L. Stein

      Donald E. Myers
      http://www.u.arizona.edu/~donaldm


      --
      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org
    Your message has been successfully submitted and would be delivered to recipients shortly.