
AI-GEOSTATS: SUMMARY: Nscore transform & kriging of log normal data sets

  • Gregoire Dubois
    Mar 4, 2001
      Dear all,

      I'm sorry for being so late with the summary of the replies I got to the
      following questions:

      - What are the drawbacks of the normal score transformation?
      - What are the latest developments for properly handling data sets that
      have a log normal distribution?

      I have cut and pasted below bits and pieces of the many replies I
      received. Many thanks to:

      Andrew, Joao Felipe, Isobel Clark, Paulo Justiniano Ribeiro Jr, Warr Benjamin,
      Hirotaka Saito, Nelleke Swager, Syed Abdul Rahman Shibli, Raymond J. O'Connor

      I also received a two-page reply from Donald Myers. I have put the full
      text in the archives of AI-GEOSTATS.

      A. Comments on skewed data sets

      The skewness of a data set can have many different origins and its
      interpretation is of course highly subjective. Many assumptions therefore
      have to be made.

      Most of geostatistics is "distribution free", i.e., the derivations of the
      simple kriging, ordinary kriging and universal kriging equations do not depend
      on a distributional assumption (contrary to what is sometimes claimed).
      However if a distributional assumption is to be useful it should be
      multivariate rather than just univariate. Essentially none of the
      transformations that are used in geostatistics can really preserve or produce
      multivariate distributional properties; they are only univariate
      transformations. For example, a histogram might appear lognormal and the
      log-transformed data might then appear normal, but this does not imply
      anything about multivariate lognormality.

      When handling skewed data sets, one can

      1) remove the long tail and dismiss it as another population, i.e. work with
      the main subset

      2) dismiss the long tail as a set of "erroneous" data (this might be difficult
      to justify)

      3) use the data "as is" and use more robust measures, e.g. madogram, and do
      not work with squared differences which are quite sensitive to long tails. The
      choice of the sill becomes a problem in such a case. In the case of
      multivariate lognormality, one can compute the relationship between the
      variogram/covariance of the original and the variogram/covariance of the
      transformed. This relationship is essentially unknown in all other cases
      because it requires, again, knowing the multivariate distribution in analytic
      form (and being able to carry out certain complicated multiple
      integrations). The multivariate transform must be known in analytic form and
      have a unique inverse. There are
      examples in the literature of using power series approximations for the
      transformation but too often the approximation is reduced to a linear one.

      4) use a transformation and work in the transformed domain before
      backtransforming (watch out for possible biases, where applicable).

      5) use an indicator transform for different thresholds and treat the
      connectivity of extreme values as the main priority. This might be
      difficult to implement in practice, particularly with sparse data sets,
      because the number of "pairs" deteriorates at extreme thresholds, where you
      would normally want the best "resolution" anyway (median indicator kriging is
      a possible workaround).

      From the replies I have received, the last seems to be the most frequently
      chosen option.
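
As a rough illustration of option 5, here is a minimal Python sketch of an indicator transform, I(z <= t), at several thresholds. The sample values, the thresholds and the function name indicator_transform are all hypothetical and chosen only for illustration. The sketch also counts how few data (and pairs of data) remain above the highest thresholds, which is the deterioration mentioned above.

```python
def indicator_transform(values, thresholds):
    """Return, for each threshold t, the 0/1 indicators I(z <= t)."""
    return {t: [1 if z <= t else 0 for z in values] for t in thresholds}


# Hypothetical positively skewed sample (illustrative values only).
data = [0.1, 0.3, 0.4, 0.8, 1.1, 2.5, 9.0, 42.0]
thresholds = [0.4, 1.1, 9.0]

indicators = indicator_transform(data, thresholds)

for t in thresholds:
    n_above = len(data) - sum(indicators[t])
    # Number of pairs in which both values exceed the threshold.
    n_pairs_above = n_above * (n_above - 1) // 2
    print(f"threshold {t}: {n_above} values above, "
          f"{n_pairs_above} pairs among them")
```

Each indicator series would then get its own variogram. With only one value above the top threshold in this toy sample, no pair of high values is left to inform the structure of the extremes.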

      B. Problems with Normal Score Transformation (NST)

      NSTs are useful to reveal the spatial correlation of highly skewed data sets.
      Nevertheless, when a transformation is made prior to the estimation, several
      problems remain. First, one has introduced an element of ranking in place of
      the interval or ratio data of the original. Although one uses the NST data
      as satisfying the requirements of normality, the back transformation process
      can only recover the point estimates (e.g., for confidence limits) within the
      resolution afforded by the original data at that point. If you have sparsely
      distributed data there, the limit estimate has an uncertainty reflecting the
      corresponding coarse steps (more a measurement error than an estimation

      Second, if one has ties in the original data, the NST assigns them to the
      corresponding block of contiguous normal scores. Thus extra variance is
      introduced as a result of handling the ties.
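
The effect of ties can be seen in a minimal rank-based normal score transform, sketched below in Python. This is only an illustration under stated assumptions: the plotting position (r - 0.5) / n, the two tie rules ("ordinal" and "average") and the function name normal_scores are my own choices, not something prescribed in the replies, and the data are hypothetical.

```python
from statistics import NormalDist


def normal_scores(values, ties="ordinal"):
    """Rank-based normal score transform (illustrative sketch).

    ties="ordinal": tied values receive distinct, consecutive scores.
    ties="average": tied values share the score of their mean rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = float(r)

    if ties == "average":
        start = 0
        while start < n:
            end = start
            while end + 1 < n and values[order[end + 1]] == values[order[start]]:
                end += 1
            mean_rank = (start + 1 + end + 1) / 2.0
            for k in range(start, end + 1):
                ranks[order[k]] = mean_rank
            start = end + 1

    std_normal = NormalDist()
    # Assumed plotting position: keeps probabilities strictly inside (0, 1).
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical sample with three tied zeros and a long tail.
data = [0.0, 0.0, 0.0, 1.2, 3.5, 40.0]
ordinal = normal_scores(data, ties="ordinal")
averaged = normal_scores(data, ties="average")
print("ordinal :", [round(s, 3) for s in ordinal])
print("averaged:", [round(s, 3) for s in averaged])
```

With the "ordinal" rule the three identical zeros are spread over three different normal scores, so the transformed data show variability that the original data do not contain. With the "average" rule they share one score, but the transformed histogram is then no longer exactly normal.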

      There are two types of nscore transformation:

      1) a frequency based NST: data are transformed in order to get a histogram
      showing a normal distribution.

      Drawback: the ordering of the tied values introduces a bias when doing a
      back-transformation, especially if there are many zero values.

      2) an empirically based NST: the transformation uses the cumulative
      distribution and assigns the equivalent in the Gaussian space. When performing
      a back-transformation, one gets the original value.

      Drawback: the histogram of the transformed data is often not normal.
      Nevertheless, the results after kriging and simulation appear to be relatively

      C. Performing kriging with log normal data sets

      Most of the replies underlined the frequent use of an indicator approach.
      Although lognormal kriging seems to be the natural solution for log normal
      data sets, it rests on the strict assumption that the data set is log
      normal, an assumption which is almost impossible to verify unless one has
      extensive knowledge of the data set.
      If one is willing to assume multi-variate lognormality (univariate is not
      really sufficient) then the transformation is theoretically known and has a
      unique inverse that is also known. Even in this case there
      is the problem of a bias in the re-transformed estimates. A number of authors
      have written on this, Journel and Dowd being two of them (see various papers
      in Math. Geology). As pointed out in those papers, the correction in the case
      of Simple Kriging (punctual) is essentially solved, and a good approximation
      is available in the case of Ordinary Kriging (punctual). There are some
      theoretical problems in the case of block kriging that are usually handled in
      an almost ad-hoc way, e.g., if the point values are multi-variate lognormal
      then the block values theoretically should not be either univariate or
      multivariate lognormal. There seems to be little in the literature pertaining
      to the combination of lognormality and a non-constant drift (mean). If the non-constant
      mean is not first removed then the complications resulting from a non-linear
      transformation are much worse since the non-constant mean and the mean zero
      random component are not separately transformed.
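
The back-transformation bias can be illustrated with a short Python sketch. It assumes the commonly quoted correction for punctual simple kriging of log values, exp(Y* + sigma2_SK / 2), and checks it only in the degenerate non-spatial case where the estimate is the mean of the logs and the variance is the variance of the logs. The function name, the simulated data and the random seed are assumptions made for illustration; this is not a kriging implementation.

```python
import math
import random
import statistics


def lognormal_sk_backtransform(y_estimate, sk_variance):
    """Back-transform a simple kriging estimate made on log values.

    Half the simple kriging variance is added before exponentiating.
    """
    return math.exp(y_estimate + 0.5 * sk_variance)


# Simulated lognormal sample (hypothetical data, fixed seed).
rng = random.Random(0)
z = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(10_000)]
y = [math.log(v) for v in z]

y_mean = statistics.fmean(y)
y_var = statistics.pvariance(y)

arithmetic_mean = statistics.fmean(z)
naive = math.exp(y_mean)  # plain exponential of the log-domain estimate
corrected = lognormal_sk_backtransform(y_mean, y_var)

print(f"arithmetic mean of z      : {arithmetic_mean:.3f}")
print(f"naive back-transform      : {naive:.3f}")
print(f"corrected back-transform  : {corrected:.3f}")
```

The naive back-transform returns the geometric mean, which always lies below the arithmetic mean of skewed positive data. The corrected value lands much closer to it, but only because the simulated data really are lognormal.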

      For other non-linear transforms (other than the log in the case of
      multivariate lognormality), even knowing the inverse transform in analytic
      form is not sufficient to allow computing the bias adjustment unless
      one also knows the MULTIVARIATE distribution in analytic form. Even then, the
      actual mechanics of doing so can be very tedious or complicated. That is,
      while there is a nice theorem on change of variables in a multiple integral,
      the actual step of applying it to a specific problem can be very tedious and
      complicated. Moreover the theorem has moderately strong assumptions which are
      not always satisfied.

      In the case of multivariate lognormality, one can also determine the
      adjustment needed in the kriging variances. This aspect seems to have
      attracted little attention in the case of other non-linear transforms and it
      is at least as difficult a problem.

      Apparently, lognormal kriging and indicator kriging produce very similar

      D. Recent developments:

      The literature seems to be quite poor in publications on non-parametric

      The Box-Cox family of transformations, which has the log transformation as a
      particular case, has recently been proposed.
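
For readers unfamiliar with it, the Box-Cox family can be sketched in a few lines of Python. This uses the usual textbook definition, (z^lambda - 1) / lambda for lambda different from 0 and log(z) for lambda equal to 0. The function names are hypothetical, and the sketch says nothing about how the cited authors estimate lambda.

```python
import math


def box_cox(z, lam):
    """Box-Cox transform of a strictly positive value z."""
    if z <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(z)  # the log transform is the limiting case
    return (z ** lam - 1.0) / lam


def inverse_box_cox(y, lam):
    """Inverse of box_cox; assumes lam * y + 1 > 0 when lam != 0."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


# As lambda approaches 0 the transform approaches the logarithm.
for lam in (1.0, 0.5, 1e-6, 0):
    print(f"lambda = {lam}: box_cox(2.0) = {box_cox(2.0, lam):.6f}")
```

With lambda = 1 the data are only shifted, so the family spans the range from no transformation to a full log transformation.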


      CHRISTENSEN, O.F., DIGGLE, P.J. AND RIBEIRO JR, P.J. (2001). Analysing
      positive-valued spatial data: the transformed Gaussian model. In GeoENV III -
      Geostatistics for environmental applications, Quantitative Geology and
      Geostatistics, Kluwer Series (to appear)

      CLARK I. 1996 "Lognormal kriging applied to non-lognormal deposits: two case
      5th International Geostatistics Congress, Wollongong Australia, 22--27

      CLARK I. 1997. "Geostatistics applied to skewed data", Conference of the
      International Section on Mathematical Methods in Geology (Mining Příbram
      Symposia) of the International Association for Mathematical Geology, Prague,
      6--10 October, Matematicke Metody V Geologii: Příbram Scientiae Rerum

      CLARK I. 1998. Geostatistical estimation and the lognormal distribution
      Geocongress, Pretoria RSA, June

      SAITO, H. and P. GOOVAERTS. 2000. Geostatistical interpolation of positively
      skewed and censored data in a dioxin contaminated site. Environmental Science
      & Technology, vol.34, No.19: 4228-4235.

      Gregoire Dubois
      Institute of Mineralogy and Petrography
      Dept. of Earth Sciences
      University of Lausanne


