
GEOSTATS: Normality, cross-validation, etc

  • Donald Myers
    Dec 7, 1998
      Some observations about some of the comments that have been appearing.
      Sometimes people are using terms with different meanings or are confusing
      the terms or ideas.

      1. There is a difference between saying that the "data" is normal
      (gaussian) and saying that the random function is multi-variate normal. If
      the data is viewed as a sample from one realization of the random function
      then the sample histogram is an estimator of the SPATIAL distribution but
      this is not the same as the ensemble distribution.

      It is not quite correct to say that the sample histogram is normal (or
      non-normal). The histogram is based on putting the data into classes or
      bins and hence it can NOT be normal since the data set is discrete (and in
      fact finite) whereas the normal distribution is continuous. Normality is
      not a "decision"; one may decide to make the ASSUMPTION of normality, i.e.,
      to use a hypothesis of normality.
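
      The distinction in point 1 between the spatial distribution (within one
      realization) and the ensemble distribution (across realizations) can be
      seen in a small simulation. What follows is only a sketch under stated
      assumptions: a one-dimensional Gaussian AR(1) sequence stands in for the
      random function, and the names ar1_realization, phi and the seed are
      illustrative choices, not anything from the discussion on the list.

```python
import random
from statistics import mean, pvariance

def ar1_realization(n, phi, rng):
    """One realization of a stationary Gaussian AR(1) sequence.

    Assumption for illustration: at every location the ensemble
    distribution is N(0, 1); phi controls the spatial correlation.
    """
    z = [rng.gauss(0.0, 1.0)]
    innovation_sd = (1.0 - phi * phi) ** 0.5
    for _ in range(n - 1):
        z.append(phi * z[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return z

rng = random.Random(1998)  # arbitrary seed, only for repeatability
n, phi = 200, 0.95         # 200 locations, strong spatial correlation

# Spatial mean of each of 300 independent realizations.
realization_means = [mean(ar1_realization(n, phi, rng)) for _ in range(300)]

# How much the spatial mean varies from one realization to the next,
# compared with what 200 independent N(0, 1) draws would give.
spread_of_means = pvariance(realization_means)
iid_spread = 1.0 / n

print("variance of spatial means across realizations:", spread_of_means)
print("value expected for independent data:          ", iid_spread)
```

      Every location has the same N(0, 1) ensemble distribution, yet the
      spatial mean of a single realization can sit well away from zero. A
      histogram built from one realization therefore estimates the spatial
      distribution of that realization, not the ensemble distribution.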

      2. There are at least six different statistics that can be computed from
      cross-validation; the (i) average error, (ii) the average squared error,
      (iii) the average standardized squared error, (iv) the sample correlation
      between the estimated and the observed data values, (v) the sample
      correlation between the estimated values and the "errors" (which could be
      standardized or not), (vi) the histogram of the errors (standardized or
      not). The expected value of (i) is zero since the kriging estimator is
      unbiased, the expected value of (ii) is the average of the kriging
      variances, and the expected value of (iii) is one. In the case of simple
      kriging the expected value of (iv) is one (but not in the case of ordinary
      kriging unless the Lagrange multipliers are all zero), the expected value
      of (v) is zero in the case of simple kriging. Unfortunately these
      statistics are not equally sensitive to changes in the variogram model (or
      its parameters) or to the choice of the search neighborhood. Cross-validation as
      a tool for evaluating possible variogram models is neither perfect nor
      uniformly imperfect. It is foolish to think that by using
      cross-validation (or MLE, or least squares fitting of the variogram, or some
      other favorite method) we can find the "right" variogram model. In
      some cases cross-validation may be more useful for identifying "unusual"
      data locations than for evaluating the variogram model. Cross-validation
      can be useful for comparing one variogram model against another (taking
      into account the effect of the search neighborhood).
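
      As a minimal sketch, the statistics listed in point 2 might be tabulated
      as follows once a leave-one-out run has produced an estimate and a
      kriging variance at each data location. The function name
      cross_validation_summary, the sign convention (error = estimate minus
      observed) and the four made-up data values are illustrative assumptions,
      not taken from any particular package or data set.

```python
from math import sqrt

def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_validation_summary(observed, estimated, kriging_var):
    """Summarize leave-one-out results (hypothetical helper).

    observed, estimated, kriging_var: one entry per data location.
    """
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    std_errors = [err / sqrt(v) for err, v in zip(errors, kriging_var)]
    return {
        "mean_error": sum(errors) / n,                               # (i)
        "mean_squared_error": sum(err ** 2 for err in errors) / n,   # (ii)
        "mean_std_squared_error": sum(s ** 2 for s in std_errors) / n,  # (iii)
        "corr_estimate_observed": pearson(estimated, observed),      # (iv)
        "corr_estimate_error": pearson(estimated, errors),           # (v)
        "standardized_errors": std_errors,  # (vi) input for a histogram
        # Reference value against which (ii) is compared.
        "mean_kriging_variance": sum(kriging_var) / n,
    }

# Made-up numbers, purely to show the call.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 1.5, 3.5, 3.5]
kriging_var = [0.25, 0.25, 0.25, 0.25]

summary = cross_validation_summary(observed, estimated, kriging_var)
for name, value in summary.items():
    print(name, value)
```

      Comparing two variogram models then amounts to running the
      cross-validation twice, with the same search neighborhood, and setting
      the two summaries side by side; no single entry should be read as a
      verdict on the model.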

      3. Authors frequently say that the kriging estimator is "robust" but one
      should ask exactly what that means. It is true that small changes in the
      variogram parameters and/or small changes in the data values do not produce
      large changes in the kriged estimates. In order to quantify this
      "robustness", however, one would need to be a bit more specific. As is
      well known, the kriging estimator has two parts: the weights (obtained
      from the kriging equations, which do not explicitly depend on the data) and
      the data. The weights then depend on only two things, the variogram model
      and the search neighborhood. To detect change in the kriging weights (which
      form a vector, not a scalar), one must first quantify change in the
      variogram model. At least two definitions of "neighborhood" for variograms
      (see Armstrong and Diamond) have been given. One is simply the maximum
      absolute difference between two variograms, the other is a ratio (both are
      sensitive to the maximum lag considered); a third is essentially
      differentiability.
      Unfortunately none of these is best for all circumstances nor are they
      equivalent or ranked (i.e., one implying the other(s)). In trying to
      quantify the "size" of the change vector (change in the weights vector) the
      two commonest measures would be the maximum absolute value (of an entry)
      and the sum of the squares of the entries. The second is easier to work
      with but has other disadvantages.
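
      To make point 3 concrete, here is a sketch that computes ordinary
      kriging weights for the same three data locations under two slightly
      different variogram models, then measures both the change in the
      variogram and the change in the weight vector. Everything specific is an
      assumption for illustration: the exponential model, the ranges 4 and 5,
      the one-dimensional locations, the lag grid, and in particular the form
      max |g2(h)/g1(h) - 1| used for the "ratio" measure, which is only one
      plausible reading of it.

```python
from math import exp

def exponential_variogram(sill, practical_range):
    """Return gamma(h) for an exponential model with no nugget."""
    return lambda h: sill * (1.0 - exp(-3.0 * h / practical_range))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging_weights(gamma, xs, x0):
    """Ordinary kriging weights (variogram form) for 1-D locations xs
    and target location x0; all of xs is used as the neighborhood."""
    n = len(xs)
    a = [[gamma(abs(xi - xj)) for xj in xs] + [1.0] for xi in xs]
    a.append([1.0] * n + [0.0])           # unbiasedness constraint
    b = [gamma(abs(xi - x0)) for xi in xs] + [1.0]
    return solve(a, b)[:n]                # drop the Lagrange multiplier

xs, x0 = [0.0, 1.0, 3.0], 2.0             # made-up configuration
g1 = exponential_variogram(1.0, 4.0)
g2 = exponential_variogram(1.0, 5.0)
lags = [0.5 * k for k in range(1, 13)]    # lags up to 6.0

# Two ways to measure the change in the variogram model.
max_abs_diff = max(abs(g1(h) - g2(h)) for h in lags)
max_ratio_diff = max(abs(g2(h) / g1(h) - 1.0) for h in lags)

# Two ways to measure the size of the change in the weight vector.
w1 = ordinary_kriging_weights(g1, xs, x0)
w2 = ordinary_kriging_weights(g2, xs, x0)
change = [v - u for u, v in zip(w1, w2)]
max_abs_change = max(abs(d) for d in change)
sum_sq_change = sum(d * d for d in change)

print("variogram change:", max_abs_diff, max_ratio_diff)
print("weight change:   ", max_abs_change, sum_sq_change)
```

      Changing the list of lags changes both variogram measures, which is the
      sensitivity to the maximum lag noted above, and dropping a location from
      xs shows the effect of the search neighborhood on the weights.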

      THE REAL PROBLEM is that in most (potential) applications of geostatistics
      there are no state equations from which one could derive the model
      parameters or verify the underlying assumptions. Hence the variogram and the
      assumption of second order or intrinsic stationarity (or multi-variate
      normality) are just that: assumptions. As yet all purported
      inference tests depend on random sampling (random site selection is not
      random sampling) or strong distributional assumptions such as multi-variate
      normality. The problem is ILL-POSED, i.e., we do not have enough
      information to make a unique inference about the values at non-data
      locations. To do so we must make model assumptions and then the question is
      how strongly do our results depend on the model assumptions and how
      strongly on the data?

      The data is an inanimate object, hence it does NOT "intervene" in anything,
      nor does it "honor" anything.

      There is no substitute for having some understanding of the particular
      phenomenon to which geostatistical tools are being applied; geostatistics
      should not be used in a vacuum. The bottom line is whether it produces
      useful results, and that is usually not a statistical or a mathematical
      question.

      Donald E. Myers
      Department of Mathematics
      University of Arizona
      Tucson, AZ 85721



      *To post a message to the list, send it to ai-geostats@....
      *As a general service to list users, please remember to post a summary
      of any useful responses to your questions.
      *To unsubscribe, send email to majordomo@... with no subject and
      "unsubscribe ai-geostats" in the message body.
      DO NOT SEND Subscribe/Unsubscribe requests to the list!