
Re: AI-GEOSTATS: Normal score transform & skewed distributions

  • Syed Abdul Rahman Shibli
    Message 1 of 2, Feb 25, 2001
      on 23/02/01 17:14, Gregoire Dubois at gregoire.dubois@... wrote:

      > The normal score transformation seems to be, at least in the literature,
      > the magic solution for handling a skewed data set. Could anyone point me to
      > the main drawbacks of such a step?

      Decisions, decisions. Some variables are inherently skewed (e.g. rock
      permeability, mineral concentrations, etc.) while others would raise
      an eyebrow if found skewed (e.g. porosity). In the latter case, perhaps
      two different populations are at work. Some can be skewed because of
      outliers, but what exactly qualifies as an outlier is anybody's
      guess. My outlier might be your most significant data discovery. Yet others
      tend to ignore the skewness and work with more "robust" measures
      such as the madogram or the family of pairwise relative variograms. The
      problem with some transforms (e.g. logarithmic) is that a naive
      backtransform (e.g. simply exponentiating a kriged log value) will
      introduce a bias. Many transforms are possible (rank, uniform score,
      normal score, logarithmic, etc.), so a proper choice has to be made.
      In summary, one has the following choices with a skewed dataset:

      1) remove the long tail and dismiss it as another population, i.e. work
      with the main subset.
      2) dismiss the long tail as a set of "erroneous" data.
      3) use the data "as is" with more robust measures, e.g. the madogram, and
      do not work with squared differences, which are quite sensitive to long
      tails.
      4) use a transformation and work in the transformed domain before
      backtransforming (watch out for possible biases, where applicable).
      5) use an indicator transform at different thresholds and put the
      connectivity of extreme values foremost on your agenda.
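
      A minimal Python sketch of option (4) with a normal score transform,
      assuming the values sit in a plain list. The i-th smallest of n values
      (0-based i) is mapped to the standard normal quantile of (i + 0.5)/n, and
      ties are broken by data order. The back-transform interpolates linearly
      in the transform table and clamps beyond the data extremes. This is a
      simplification, since practical tools such as GSLIB's nscore/backtr
      programs also offer tail extrapolation options. The function names are
      hypothetical. Back-transforming a single kriged normal score is itself
      generally biased, so this only illustrates the mapping.

```python
import math
import random
from bisect import bisect_left
from statistics import NormalDist


def normal_score_transform(values):
    """Return (scores, table_z, table_y) for a list of values.

    scores[k] is the normal score of values[k]; table_z holds the sorted data
    and table_y the matching standard normal quantiles of (i + 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])  # ties: data order
    std_normal = NormalDist(0.0, 1.0)
    table_y = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    table_z = [values[k] for k in order]
    scores = [0.0] * n
    for i, k in enumerate(order):
        scores[k] = table_y[i]
    return scores, table_z, table_y


def back_transform(y_new, table_z, table_y):
    """Map a Gaussian-space value back to data units by linear interpolation.

    Values outside the table are clamped to the minimum/maximum datum
    (no tail model).
    """
    if y_new <= table_y[0]:
        return table_z[0]
    if y_new >= table_y[-1]:
        return table_z[-1]
    k = bisect_left(table_y, y_new)  # table_y[k - 1] < y_new <= table_y[k]
    y0, y1 = table_y[k - 1], table_y[k]
    z0, z1 = table_z[k - 1], table_z[k]
    return z0 + (z1 - z0) * (y_new - y0) / (y1 - y0)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]  # skewed sample
    scores, table_z, table_y = normal_score_transform(data)
    print(f"mean of scores: {sum(scores) / len(scores):.2e}")
    # Round trip: every datum is recovered from its own score.
    print(all(math.isclose(back_transform(s, table_z, table_y), v)
              for s, v in zip(scores, data)))
```
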

      (4) is the most widespread, at least in oil and gas studies. A
      porosity-perm correlation based on log-transformed permeabilities
      is almost de rigueur in any subsurface study. (5) is conceptually elegant
      but difficult to implement in practice, particularly with sparse datasets,
      because the number of "pairs" dwindles at the extreme thresholds where
      you would normally want the best "resolution" anyway (median indicator
      kriging is a possible workaround). (2) is difficult to justify unless
      something has gone drastically wrong. (1) is probably more hassle than
      it's worth. (3) would be OK, but what sill would one use? That matters
      less if the objective is just one representative map: use the range and
      assume any sill you want, since rescaling the whole variogram changes
      the kriging variance but not the kriging weights.
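
      A rough Python sketch on regularly spaced 1-D data shows two of these
      points: why squared differences suffer with long tails (option 3) and
      why pairs run out at extreme thresholds (option 5). It uses the usual
      definitions, where gamma(h) is the sum of squared lag-h differences
      divided by 2N(h) and the madogram is the sum of absolute lag-h
      differences divided by 2N(h). The "pairs with both values above the
      threshold" count, the simple empirical quantile, and the function names
      are illustrative assumptions, not a standard routine.

```python
import random


def semivariogram_and_madogram(z, lag):
    """Experimental semivariogram and madogram at one lag (assumes 0 < lag < len(z))."""
    diffs = [z[i + lag] - z[i] for i in range(len(z) - lag)]
    n_pairs = len(diffs)
    gamma = sum(d * d for d in diffs) / (2 * n_pairs)      # squared differences
    madogram = sum(abs(d) for d in diffs) / (2 * n_pairs)  # absolute differences
    return gamma, madogram


def high_pair_counts(z, lag, quantiles):
    """For each quantile threshold t, count lag pairs whose two values both exceed t."""
    ranked = sorted(z)
    counts = []
    for q in quantiles:
        t = ranked[int(q * (len(z) - 1))]  # simple empirical quantile
        above = [v > t for v in z]         # exceedance indicator: True if z > t
        counts.append(sum(above[i] and above[i + lag] for i in range(len(z) - lag)))
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    z = [rng.gauss(0.0, 1.0) for _ in range(100)]
    g0, m0 = semivariogram_and_madogram(z, 1)
    z_tail = list(z)
    z_tail[50] = 50.0  # a single hypothetical extreme value
    g1, m1 = semivariogram_and_madogram(z_tail, 1)
    print(f"semivariogram grows x{g1 / g0:.1f}, madogram grows x{m1 / m0:.1f}")
    skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
    print(high_pair_counts(skewed, 1, [0.5, 0.9, 0.99]))
```

      With independent draws the pair counts are only illustrative. Still, the
      drop from the median threshold to the 99th percentile shows how quickly
      the informative pairs disappear.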

      > I would also be curious to know what the latest developments are for
      > properly handling data sets that have a lognormal distribution.

      There's lognormal kriging, but that probably wouldn't qualify as the
      "latest" development.
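
      For the lognormal case, the backtransform bias is easy to see. If
      Y = ln Z is normal with mean m and variance s^2, then
      E[Z] = exp(m + s^2/2), whereas exp(m) is only the median. The classical
      simple lognormal kriging back-transform applies the same correction,
      Z* = exp(Y*_SK + sigma_SK^2 / 2), assuming the logs are multivariate
      normal. The sketch below checks this in the limit far from all data.
      There, simple kriging returns the (known) mean m with a kriging variance
      equal to the sill s^2. The function names and parameters are
      hypothetical.

```python
import math
import random


def lognormal_sk_backtransform(y_sk, var_sk):
    """Simple lognormal kriging back-transform exp(Y*_SK + sigma^2_SK / 2).

    Assumes y_sk and var_sk come from simple kriging of ln(z) and that the
    logs are multivariate normal; otherwise the correction is approximate.
    """
    return math.exp(y_sk + var_sk / 2.0)


def far_from_data_check(n=100_000, m=0.0, s=1.0, seed=7):
    """Compare the sample mean of Z with naive and corrected back-transforms.

    Far from all data, simple kriging of Y returns the known mean m with
    kriging variance equal to the sill s**2.
    """
    rng = random.Random(seed)
    z_mean = sum(math.exp(rng.gauss(m, s)) for _ in range(n)) / n
    naive = math.exp(m)  # the median of Z, not its mean
    corrected = lognormal_sk_backtransform(m, s * s)
    return z_mean, naive, corrected


if __name__ == "__main__":
    z_mean, naive, corrected = far_from_data_check()
    print(f"sample mean {z_mean:.3f}, naive {naive:.3f}, corrected {corrected:.3f}")
```

      With the defaults, the naive value is exp(0) = 1 and the corrected one
      is exp(0.5), about 1.649, which should sit close to the sample mean of Z.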

      Syed


      --
      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org