AI-GEOSTATS: SUMMARY: Nscore transform & kriging of log normal data sets
Dear all,
I'm sorry for being so late with the summary of the replies I got to the
following questions:
- What are the drawbacks of the normal score transformation?
- What are the latest developments for properly handling data sets that have
a lognormal distribution?
I have cut and pasted below parts of the many replies I received.
Thanks a lot to:
Andrew, Joao Felipe, Isobel Clark, Paulo Justiniano Ribeiro Jr, Warr Benjamin,
Hirotaka Saito, Nelleke Swager, Syed Abdul Rahman Shibli, Raymond J. O'Connor
I also received a two-page reply from Donald Myers. I have put the full
text in the archives of AI-GEOSTATS.
A. Comments on skewed data sets
The skewness of a data set can have many different origins and its
interpretation is of course highly subjective. Many assumptions have therefore
to be made.
Most of geostatistics is "distribution free", i.e., the derivation of the
simple kriging, ordinary kriging and universal kriging equations does not
depend on a distributional assumption (contrary to what is sometimes claimed).
However, if a distributional assumption is to be useful, it should be
multivariate rather than just univariate. Essentially none of the
transformations that are used in geostatistics can really preserve or produce
multivariate distributional properties; they are only univariate
transformations. For example, a histogram might appear lognormal and the
log-transformed data might then appear normal, but this does not imply
anything about the multivariate distribution.
When handling skewed data sets, one can:
1) remove the long-tail values and treat them as another population, i.e. work
with the main subset
2) dismiss the long tail as a set of "erroneous" data (this might be difficult)
3) use the data "as is" with more robust measures, e.g. the madogram, rather
than squared differences, which are quite sensitive to long tails. The
choice of the sill becomes a problem in such a case. In the case of
multivariate lognormality, one can compute the relationship between the
variogram/covariance of the original data and the variogram/covariance of the
transformed data. This relationship is
essentially unknown in all other cases because it again requires knowing the
multivariate distribution in analytic form (and being able to carry out
certain complicated multiple integrations). The multivariate
transform must be known in analytic form and have a unique inverse. There are
examples in the literature of using power series approximations for the
transformation, but too often the approximation is reduced to a linear one.
4) use a transformation and work in the transformed domain before
backtransforming (watch out for possible biases, where applicable).
5) use an indicator transform for different thresholds, which puts the
connectivity of extreme values foremost on your agenda. This might be
difficult to implement in practice, particularly with sparse data sets,
because the number of "pairs" deteriorates at extreme thresholds, where you
would normally want the best "resolution" anyway (median indicator kriging is
a possible workaround).
From the replies I have received, the last seems to be the most frequently
used.
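As a rough illustration of options 3 and 5 above, here is a minimal Python
sketch. It assumes a regularly spaced 1-D profile, and the function names
(semivariogram_1d, madogram_1d, indicator_transform) and the sample values are
hypothetical, chosen only for this example. It compares how a single extreme
value affects the classical semivariogram and the madogram, and shows an
indicator coding at a few thresholds.

```python
import numpy as np

def _lag_differences(z, lag):
    """Differences z(x + lag) - z(x) for a regularly spaced 1-D series."""
    z = np.asarray(z, dtype=float)
    if not 1 <= lag < z.size:
        raise ValueError("lag must be between 1 and len(z) - 1")
    return z[lag:] - z[:-lag]

def semivariogram_1d(z, lag):
    """Classical estimator: half the mean squared difference at this lag."""
    return 0.5 * np.mean(_lag_differences(z, lag) ** 2)

def madogram_1d(z, lag):
    """Madogram: half the mean absolute difference at this lag."""
    return 0.5 * np.mean(np.abs(_lag_differences(z, lag)))

def indicator_transform(z, thresholds):
    """One 0/1 column per threshold: 1 where z <= threshold, else 0."""
    z = np.asarray(z, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    return (z[:, None] <= t[None, :]).astype(int)

# Hypothetical profile with one extreme value (100.0).
z = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 100.0, 2.0])
print("semivariogram, lag 1:", semivariogram_1d(z, 1))
print("madogram, lag 1:     ", madogram_1d(z, 1))

# Indicator coding at an arbitrary threshold and at the median
# (the median threshold is the one used by median indicator kriging).
print(indicator_transform(z, [1.5, np.median(z)]))
```

Because the madogram uses absolute rather than squared differences, the
extreme value inflates it far less than the semivariogram. The indicator
columns would each need their own variogram, which is where the shortage of
pairs at extreme thresholds shows up.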
B. Problems with Normal Score Transformation (NST)
The NST is useful for revealing the spatial correlation of highly skewed data
sets. Nevertheless, when a transformation is made prior to the estimation,
several problems remain. First, one has introduced an element of ranking
rather than the interval or ratio scale of the original data. Although one
treats the NST data as satisfying the requirements of normality, the
back-transformation process
can only recover the point estimates (e.g., for confidence limits) within the
resolution afforded by the original data at that point. If you have sparsely
distributed data there, the limit estimate has an uncertainty reflecting the
corresponding coarse steps (more a measurement error than an estimation
error).
Second, if one has ties in the original data, the NST assigns them to the
corresponding block of contiguous normal scores. Thus extra variance is
introduced as a result of handling the ties.
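The two points above (coarse back-transformation and the effect of ties) can
be sketched in Python. This is only an illustration under stated assumptions:
the function names, the plotting position (rank - 0.5)/n and the sample data
are choices made for this example, not taken from any particular package.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(z, ties="ordinal"):
    """Rank-based normal score transform (illustrative sketch).

    ties="ordinal": tied values get distinct, contiguous scores
                    (broken by order of appearance).
    ties="average": tied values all share the score of their mean rank.
    """
    if ties not in ("ordinal", "average"):
        raise ValueError("ties must be 'ordinal' or 'average'")
    z = np.asarray(z, dtype=float)
    n = z.size
    order = np.argsort(z, kind="stable")
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    if ties == "average":
        for value in np.unique(z):
            mask = z == value
            ranks[mask] = ranks[mask].mean()
    # Assumed plotting position; keeps probabilities strictly inside (0, 1).
    p = (ranks - 0.5) / n
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(float(pi)) for pi in p])

def back_transform(y, scores, z):
    """Back-transform by linear interpolation between the data quantiles.

    Assumes `scores` are distinct (ordinal ties). Values of y outside the
    range of `scores` are clamped to min(z) / max(z).
    """
    return np.interp(y, np.sort(scores), np.sort(np.asarray(z, dtype=float)))

# Hypothetical sample with many zeros, as is common for skewed data.
z = np.array([0.0, 0.0, 0.0, 0.0, 1.2, 3.5, 8.0, 40.0])
y_ordinal = normal_scores(z, ties="ordinal")
y_average = normal_scores(z, ties="average")
print("scores of the four zeros (ordinal):", y_ordinal[:4])
print("scores of the four zeros (average):", y_average[:4])
print("back-transform of y = 1.0:", back_transform(1.0, y_ordinal, z))
```

With ordinal tie-breaking the four identical zeros receive four different
scores, which is the extra variance mentioned above. The back-transform can
only interpolate between observed data values, so its resolution is limited
where the data are sparse.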
There are two types of normal score transformation:
1) a frequency-based NST: data are transformed in order to get a histogram
showing a normal distribution.
Drawback: the ordering of the tied values introduces a bias when doing a
back-transformation, especially if there are many zero values.
2) an empirically based NST: the transformation uses the cumulative
distribution and assigns the equivalent in the Gaussian space. When performing
a back-transformation, one gets the original value.
Drawback: the histogram of the transformed data is often not normal.
Nevertheless, the results after kriging and simulation appear to be relatively
C. Performing kriging with log normal data sets
Most of the replies underlined the frequent use of an indicator approach.
Although lognormal kriging seems to be the solution for lognormal data sets,
it is based on the strict assumption that the data set is lognormal, an
assumption that is almost impossible to verify unless one has an extensive
knowledge of the data set.
If one is willing to assume multivariate lognormality (univariate is not
really sufficient) then the transformation is theoretically known and has a
unique inverse that is also known. Even in this case there
is the problem of a bias in the re-transformed estimates. A number of authors
have written on this, Journel and Dowd being two of them (see various papers
in Math. Geology). As pointed out in those papers, the correction in the case
of Simple Kriging (punctual) is essentially solved, and a good approximation
is available in the case of Ordinary Kriging (punctual). There are some
theoretical problems in the case of block kriging that are usually handled in
an almost ad-hoc way, e.g., if the point values are multivariate lognormal
then the block values theoretically should not be either univariate or
multivariate lognormal. There seems to be little in the literature pertaining
to a mixing of lognormality and non-constant drift (mean). If the non-constant
mean is not first removed, then the complications resulting from a non-linear
transformation are much worse, since the non-constant mean and the mean-zero
random component are not separately transformed.
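To make the bias in the re-transformed estimates concrete, here is a small
Python check for the simple kriging (punctual) case only. It assumes
multivariate lognormality, so that the log-scale value at the target location,
given the data, is normal with mean y_sk (the simple kriging estimate of the
logs) and variance var_sk (the simple kriging variance of the logs). The
numbers and the function name are hypothetical. The ordinary kriging
correction involves additional terms and is not shown.

```python
import numpy as np

def sk_lognormal_backtransform(y_sk, var_sk):
    """Back-transform for lognormal simple kriging: exp(y_sk + var_sk / 2).

    Valid only under the multivariate lognormal assumption stated above.
    """
    return np.exp(y_sk + 0.5 * var_sk)

# Hypothetical log-scale kriging results at one location.
y_sk, var_sk = 1.0, 0.8

# Monte Carlo draw from the assumed conditional distribution of the logs.
rng = np.random.default_rng(0)
samples = np.exp(rng.normal(y_sk, np.sqrt(var_sk), size=200_000))

naive = np.exp(y_sk)                                  # plain exponentiation
corrected = sk_lognormal_backtransform(y_sk, var_sk)  # with variance term

print("naive back-transform:    ", naive)
print("corrected back-transform:", corrected)
print("Monte Carlo mean:        ", samples.mean())
print("Monte Carlo median:      ", np.median(samples))
```

Under these assumptions the naive exponentiation lands near the conditional
median, not the mean, so it underestimates the expected value; the term
var_sk / 2 accounts for the difference.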
For other non-linear transforms (other than the log in the case of
multivariate lognormality), even knowing the inverse transform in analytic
form is not sufficient to allow computing the bias adjustment unless
one also knows the MULTIVARIATE distribution in analytic form. Even then, the
actual mechanics of doing so can be very tedious or complicated. That is,
while there is a nice theorem on change of variables in a multiple integral,
the actual step of applying it to a specific problem can be very tedious and
complicated. Moreover the theorem has moderately strong assumptions which are
not always satisfied.
In the case of multivariate lognormality, one can also determine the
adjustment needed in the kriging variances. This aspect seems to have
attracted little attention in the case of other non-linear transforms and it
is at least as difficult a problem.
Apparently, lognormal kriging and indicator kriging produce very similar
results.
D. Recent developments:
The literature seems to be quite poor in publications on non-parametric
The Box-Cox family of transformations, which includes the log transformation
(the lognormal case) as a particular case, has recently been proposed.
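For readers unfamiliar with it, the Box-Cox family can be sketched in a few
lines of Python. This is the standard one-parameter form only; the function
names are chosen for this example, and the inverse shown is the plain
algebraic inverse with no bias correction for back-transformed estimates.

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transform: (z**lam - 1) / lam, or log(z) when lam == 0."""
    z = np.asarray(z, dtype=float)
    if np.any(z <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(z)
    return (z ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Algebraic inverse of box_cox (no bias correction)."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)

# Hypothetical positive-valued data.
z = np.array([0.5, 1.0, 4.0, 20.0])
for lam in (1.0, 0.5, 0.0):
    print("lambda =", lam, "->", box_cox(z, lam))
```

As lam tends to 0 the transform tends to log(z), which is why the lognormal
model is the particular case lam = 0; lam = 1 only shifts the data by one.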
CHRISTENSEN, O.F., DIGGLE, P.J. AND RIBEIRO JR, P.J. (2001). Analysing
positive-valued spatial data: the transformed Gaussian model. In GeoENV III -
Geostatistics for environmental applications, Quantitative Geology and
Geostatistics, Kluwer Series (to appear)
CLARK I. 1996 "Lognormal kriging applied to non-lognormal deposits: two case
5th International Geostatistics Congress, Wollongong Australia, 22--27
CLARK I. 1997. "Geostatistics applied to skewed data", Conference of the
International Section on Mathematical Methods in Geology (Mining Příbram
Symposia) of the International Association for Mathematical Geology, Prague,
6--10 October, Matematicke Metody V Geologii: Příbram Scientiae Rerum
CLARK I. 1998. "Geostatistical estimation and the lognormal distribution",
Geocongress, Pretoria RSA, June
SAITO, H. and P. GOOVAERTS. 2000. Geostatistical interpolation of positively
skewed and censored data in a dioxin contaminated site. Environmental Science
& Technology, vol.34, No.19: 4228-4235.
Institute of Mineralogy and Petrography
Dept. of Earth Sciences
University of Lausanne