GEOSTATS: Non-normal distribution

  • Simon Brewer
    Message 1 of 2 , Dec 4, 1998
      Non-normal distribution

      I posted a question to the mailing list on 30 Nov concerning the
      non-normal distribution of some data that I am trying to map using
      kriging. To follow this up, I am posting a brief synopsis of the
      (many) replies I received. So first, a big THANK YOU to everyone who
      took the time to send me answers, suggestions and references. Where
      possible, I have given the names of the people who supplied the
      information. If I've missed anyone, I'm very sorry; please let me
      know.

      1) Should the data be normally distributed? The first point, stressed by
      a number of people, was that there is NO requirement for data to be
      normally distributed for use in the computation of semi-variograms, or
      predictions made by kriging. (Gilles Guillot, Daniel Guibal, Pierre
      Goovaerts). However, as it is a linear estimator, kriging is sensitive
      to a few large sample values, which may bias the results. Notably, the
      variogram may become unstructured, close to a pure nugget effect. As Med
      Bennett pointed out, normality is a rare find in mining and
      environmental data! On this point it is also worth reading Don Myers'
      message dated 25 June 1997, and the subsequent postings.

      A number of checks for normality were mentioned, and I list some of
      them here:
      Q-Q (quantile-quantile) plot
      Kolmogorov-Smirnov test
      Data density vs. Theoretical density
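
      A minimal sketch of the first two checks, assuming Python with NumPy
      and SciPy (neither was mentioned in the replies). The sample is
      synthetic and only stands in for real measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical skewed sample standing in for real measurements.
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Kolmogorov-Smirnov test against a normal with the sample's own mean and sd.
# Estimating the parameters from the data makes the p-value optimistic,
# so treat it as a rough guide rather than an exact test.
ks_stat, ks_p = stats.kstest(
    sample, "norm", args=(sample.mean(), sample.std(ddof=1))
)

# Q-Q plot coordinates: theoretical normal quantiles vs ordered data,
# plus the correlation r of the straight-line fit through them.
(theo_q, ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")

print(f"KS statistic = {ks_stat:.3f}, p = {ks_p:.3g}")
print(f"Q-Q correlation r = {r:.3f}")
```

      A small p-value, or Q-Q points that bend away from a straight line,
      both point to a departure from normality.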

      2) What can I do with the data?
      Three possibilities were suggested, as follows:

      a) Transformations - Skewed data may be transformed to a normal
      distribution by a non-linear transformation, e.g. logarithmic. If,
      however, the data are then back-transformed to real values, then the
      unbiased property of the kriging estimates is lost. Other
      transformations have been suggested: the family of Box-Cox power
      transformations, the square-root transformation (for count data), the
      arcsine square-root transformation for percentage data, and the normal
      score transform, of which, I'm afraid, I know nothing! (Joyce Witebsky,
      Daniel Guibal, Vera Pawlowsky-Glahn)
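
      For illustration, here is a rough sketch of three of these
      transformations, assuming Python with NumPy and SciPy. The percentages
      are made up, and the rank-based normal score transform shown is one
      common construction, not necessarily the one the respondents had in
      mind.

```python
import numpy as np
from scipy import stats

# Hypothetical percentages (0-100); values are illustrative only.
pct = np.array([0.0, 0.0, 0.0, 1.5, 3.0, 7.5, 12.0, 40.0])

# Arcsine square-root transform, applied to proportions in [0, 1].
arcsine = np.arcsin(np.sqrt(pct / 100.0))

# Box-Cox needs strictly positive input, so here it is applied to the
# non-zero values only; lam is the fitted power parameter.
positive = pct[pct > 0]
boxcox_vals, lam = stats.boxcox(positive)

# Normal score transform: rank -> plotting position -> standard normal
# quantile. Tied values share an average rank, hence one common score.
ranks = stats.rankdata(pct, method="average")
nscore = stats.norm.ppf((ranks - 0.5) / len(pct))

print(nscore.round(3))
```

      Note that the three zeros all receive the same normal score, so a
      spike of identical values survives the transformation.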

      However, for my data (the percentages of fossil pollen of various
      tree taxa in lake sediments), the distribution contains a large number
      (approx 50%) of zero values (where no pollen was found). Any attempt to
      transform this simply results in a large number of arbitrary values,
      replacing the zeros. So, another method was needed...

      b) Data partitioning. If the zero values were grouped in such a way that
      a physically different domain could be identified, then it would be
      possible to construct variograms for the different domains. I cannot do
      this, except by arbitrary domaining, which would defeat the object of
      the exercise! (Daniel Guibal, Vera Pawlowsky-Glahn)

      c) Indicator kriging. This would allow an estimate at each prediction
      location of whether or not the fossil pollen would have been present
      (zero or non-zero value). This could then be combined with ordinary
      kriging of the non-zero points. (Daniel Guibal).
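
      One possible way to combine the two results, sketched in Python with
      NumPy. This is an assumption about how the combination might be done,
      not something spelled out in the replies, and the numbers are invented
      rather than real kriging output.

```python
import numpy as np

# Hypothetical outputs at four grid nodes (not real kriging results).
p_present = np.array([0.05, 0.40, 0.80, 1.10])  # indicator kriging of presence
z_if_present = np.array([2.0, 3.5, 6.0, 8.0])   # ordinary kriging of non-zero data (%)

# Indicator kriging can return values slightly outside [0, 1]; clip before use.
p_present = np.clip(p_present, 0.0, 1.0)

# Expected percentage = P(present) * estimated value given presence.
expected = p_present * z_if_present
print(expected)
```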

      Alternatively, the range could be discretised, using a number of
      thresholds. This I have taken directly from Pierre Goovaerts' message:
      What I would suggest is to use an indicator approach,
      that is:
      1. discretize the range of variation of your data using
      a given number of thresholds, say 5: the first threshold
      would be 0% (which is close to the median of your sample
      distribution) and 4 other thresholds corresponding to the
      0.6, 0.7, 0.8 and 0.9 quantiles of your distribution.
      2. for each threshold, code each observation into an indicator value
      which is zero if the measured percentage is larger than the threshold
      and one otherwise.
      3. Compute and model the 5 corresponding indicator semivariograms,
      that is the semivariograms of indicator values.
      4. Use indicator kriging to interpolate the probabilities
      to be no greater than each of the 5 thresholds at the nodes
      of your interpolation grid.
      5. At each location, you can now model the conditional cumulative
      distribution function which provides you with the probability
      that the unknown percentage value is no greater than any
      given threshold. You could use the mean of that distribution
      as your estimate and the variance as a measure of uncertainty.
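
      The coding in steps 1-2 and the post-processing in step 5 can be
      sketched as follows, assuming Python with NumPy. The data, the kriged
      probabilities and the helper name etype are all hypothetical; the
      kriging itself (steps 3-4) is not shown. The order-relation fix and
      the use of class midpoints are simplifying assumptions.

```python
import numpy as np

# Hypothetical pollen percentages; half are zero, as in the data described.
z = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 2.0, 4.0, 9.0])

# Steps 1-2: thresholds at 0% plus the 0.6-0.9 quantiles, then indicator
# coding: indicators[i, k] = 1 if z[i] <= thresholds[k], else 0.
thresholds = np.concatenate(([0.0], np.quantile(z, [0.6, 0.7, 0.8, 0.9])))
indicators = (z[:, None] <= thresholds[None, :]).astype(int)


def etype(cdf_probs, thresholds, z_max):
    """Mean and variance of a discretised conditional cdf at one location."""
    # Simple order-relation correction: clip to [0, 1], force non-decreasing.
    cdf = np.maximum.accumulate(np.clip(cdf_probs, 0.0, 1.0))
    # Class probabilities: at/below first threshold, between thresholds,
    # and above the last threshold.
    class_p = np.diff(np.concatenate(([0.0], cdf, [1.0])))
    # Representative value per class: the first threshold itself, then the
    # midpoint of each interval (z_max is an assumed upper bound).
    edges = np.concatenate((thresholds, [z_max]))
    class_z = np.concatenate(([thresholds[0]], 0.5 * (edges[:-1] + edges[1:])))
    mean = np.sum(class_p * class_z)
    var = np.sum(class_p * (class_z - mean) ** 2)
    return mean, var


# Step 5 at one grid node, using made-up kriged probabilities; the second
# value violates the ordering on purpose to exercise the correction.
kriged_probs = np.array([0.55, 0.52, 0.75, 0.85, 0.95])
mean, var = etype(kriged_probs, thresholds, z_max=z.max())
print(indicators.mean(axis=0), mean, var)
```

      The column means of the indicator array give the sample proportion at
      or below each threshold, which is a quick check that the coding
      matches the chosen quantiles.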

      Indicator kriging seems the most appropriate method for the data I
      have, as it appears to be more robust. So I will try this at the
      weekend - wish me luck...

      Well, it's been quite a crash course!! I don't believe I have understood
      everything, so if anyone sees any blinding errors in this message -
      again please let me know.

      If anyone wants a copy of all the replies received, and a small
      collection of other messages related to the issue of non-normality,
      please contact me. I did not include all of this in this mail, to keep its
      size down. On the other hand, if it is generally felt that all replies
      should be posted, I would be happy to do that.

      Thanks again

      Simon
      --
      -------------------------------------------------------
      PLEASE NOTE NEW EMAIL ADDRESS : simon.brewer@...

      Simon Brewer

      European Pollen Database
      (Laboratoire de Botanique Historique et Palynologie)
      (IMEP CNRS URA 1152)
      (Faculte St Jerome - Aix Marseille III)

      Centre Universitaire d'Arles
      Place de la Republique
      13200 Arles - France
      Tel: (33)-(0)4 90 96 18 18 Fax: (33)-(0)4 90 93 98 03
      -------------------------------------------------------
      --
      *To post a message to the list, send it to ai-geostats@....
      *As a general service to list users, please remember to post a summary
      of any useful responses to your questions.
      *To unsubscribe, send email to majordomo@... with no subject and
      "unsubscribe ai-geostats" in the message body.
      DO NOT SEND Subscribe/Unsubscribe requests to the list!