## Re: AI-GEOSTATS: Normal score transform & skewed distributions

Message 1 of 2 , Feb 25, 2001
on 23/02/01 17:14, Gregoire Dubois at gregoire.dubois@... wrote:

> the normal score transformation seems to be, at least in the literature,
> the magic solution to handle a skewed data set. Could anyone point me to the main
> drawbacks of such a step ?

Decisions, decisions. Some variables are inherently skewed (e.g. rock
permeability, mineral concentrations), while others would raise
an eyebrow if found skewed (e.g. porosity). In the latter case, perhaps
two different populations are at work. Some are skewed because of
outliers, but exactly what qualifies as an outlier is anybody's
guess: my outlier might be your most significant discovery. Others
ignore the skewness and work with more "robust" measures
such as the madogram or the family of pairwise relative variograms. The
problem with some transforms (e.g. logarithmic) is that a naive
backtransform introduces a bias. Many transforms are possible (rank,
uniform score, normal score, logarithmic, etc.), so a choice has to be
made. In summary, one has the following choices with
a skewed dataset:

1) remove the long tail and treat it as another population, i.e. work
with the main subset.
2) dismiss the long tail as a set of "erroneous" data.
3) use the data "as is" with more robust measures, e.g. the madogram,
and avoid squared differences, which are quite sensitive to long
tails.
4) use a transformation and work in the transformed domain before
backtransforming (watch out for possible biases, where applicable).
5) use an indicator transform at different thresholds and make the
connectivity of extreme values the priority.
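
Since the original question was about the normal score transform in option (4), here is a minimal sketch of how it is usually computed: rank the data, turn the ranks into cumulative probabilities, and read off standard normal quantiles. The plotting position `(rank + 0.5) / n`, the tie handling, the clamped tails in `back_transform`, and the permeability values are all assumptions made for illustration, not something prescribed in this thread.

```python
from statistics import NormalDist


def normal_score_transform(values):
    """Return the normal score of each value, in the original order."""
    n = len(values)
    # Rank the data; ties keep their input order, which is an arbitrary choice.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # Plotting position (rank + 0.5) / n keeps p strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return scores


def back_transform(score, values, scores):
    """Map a normal score back to data units via the (score, value) table."""
    table = sorted(zip(scores, values))
    # Assumption: clamp beyond the observed range instead of extrapolating tails.
    if score <= table[0][0]:
        return table[0][1]
    if score >= table[-1][0]:
        return table[-1][1]
    # Linear interpolation between the two bracketing table entries.
    for (s0, v0), (s1, v1) in zip(table, table[1:]):
        if s0 <= score <= s1:
            t = (score - s0) / (s1 - s0)
            return v0 + t * (v1 - v0)


# Hypothetical, strongly skewed permeabilities (mD).
perm = [0.5, 1.2, 3.0, 8.0, 20.0, 150.0, 900.0]
scores = normal_score_transform(perm)
recovered = [back_transform(s, perm, scores) for s in scores]
```

Note that the transform only uses ranks, so the 900 mD value gets the same score whether it is 900 or 9000; how the tails are handled on the way back is a modelling decision in its own right.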

(4) is most widespread, at least in oil and gas studies. The
porosity-permeability correlation based on log-transformed permeabilities
is almost de rigueur in any subsurface study. (5) is conceptually elegant
but difficult to implement in practice, particularly with sparse datasets,
because the number of "pairs" deteriorates at extreme thresholds, exactly
where you would want the best "resolution" (median indicator
kriging is a possible workaround). (2) is difficult to justify unless
something has gone drastically wrong. (1) is probably more hassle than
it's worth. (3) would be OK, but what sill would one use? That matters
less if the objective is just one representative map: rescaling the
whole variogram leaves the kriged estimates unchanged, so just use the
range and assume any sill you want.
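
To illustrate why option (3) avoids squared differences, here is a rough sketch comparing the semivariogram and the madogram on a regularly spaced 1-D series, with and without a single extreme value. The series, the spike of 100, and the lag are made up for illustration; the two estimators follow the usual textbook definitions (half the mean squared difference and half the mean absolute difference).

```python
def lag_differences(z, lag):
    """Differences z[i + lag] - z[i] for a regularly spaced 1-D series."""
    return [z[i + lag] - z[i] for i in range(len(z) - lag)]


def semivariogram(z, lag):
    """Half the mean squared difference at the given lag."""
    diffs = lag_differences(z, lag)
    return 0.5 * sum(d * d for d in diffs) / len(diffs)


def madogram(z, lag):
    """Half the mean absolute difference at the given lag."""
    diffs = lag_differences(z, lag)
    return 0.5 * sum(abs(d) for d in diffs) / len(diffs)


# Hypothetical series: a clean one, and a copy with one long-tail value.
clean = [1, 2, 1, 2, 1, 2, 1, 2]
spiked = [1, 2, 1, 2, 100, 2, 1, 2]

# How much each measure is inflated by the single extreme value at lag 1.
semivariogram_inflation = semivariogram(spiked, 1) / semivariogram(clean, 1)
madogram_inflation = madogram(spiked, 1) / madogram(clean, 1)

print(f"semivariogram inflated by a factor of {semivariogram_inflation:.1f}")
print(f"madogram inflated by a factor of {madogram_inflation:.1f}")
```

The two measures are in different units (squared data units versus data units), so the ratios are only a crude comparison, but the squared differences are clearly hit much harder by the one extreme value.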

> I would be also curious to know what the latest developments are that have
> been made to handle properly data sets that have a lognormal distribution.

There's lognormal kriging, but that probably wouldn't qualify as the
"latest" development.
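
For what it's worth, the backtransform bias that lognormal methods have to correct can be seen without any kriging at all. The sketch below is not lognormal kriging; it only uses the fact that a lognormal variable with log-mean mu and log-variance sigma^2 has mean exp(mu + sigma^2 / 2). The seed, sample size, and parameters are arbitrary assumptions.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
mu, sigma = 0.0, 1.0    # hypothetical log-space mean and standard deviation
samples = [rng.lognormvariate(mu, sigma) for _ in range(10_000)]

# Work in the log domain, as one would after a logarithmic transform.
logs = [math.log(x) for x in samples]
mean_log = statistics.fmean(logs)
var_log = statistics.pvariance(logs)

arithmetic_mean = statistics.fmean(samples)

# Naive backtransform: exp of the log-mean is the geometric mean,
# which underestimates the arithmetic mean of skewed data.
naive = math.exp(mean_log)

# Corrected backtransform: add half the log-variance before exponentiating.
corrected = math.exp(mean_log + 0.5 * var_log)

print(f"arithmetic mean of the data: {arithmetic_mean:.3f}")
print(f"naive backtransform:         {naive:.3f}")
print(f"corrected backtransform:     {corrected:.3f}")
```

The correction relies on the log-variance, so it is only as good as the lognormal assumption and the variance estimate behind it.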

Syed

--
* To post a message to the list, send it to ai-geostats@...
* As a general service to the users, please remember to post a summary of any useful responses to your questions.
* To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
* Support to the list is provided at http://www.ai-geostats.org