## AI-GEOSTATS: answers to "transformation of data"

Message 1 of 2, Apr 16, 2002
Hi!

Please find below my original message and the list of answers to my question concerning the transformation of negative binomial data derived from weed counts. Thanks, everybody, for your effort!

Sibylle

--------------------------------------------------------------------------------

I'm doing my diploma thesis on the spatial distribution of weeds and I'm an absolute beginner with geostatistics. Please take that into account when reading my question.

My data are weed counts with excess zeros and fit a negative binomial distribution. But as far as I know, semivariogram modelling can only be done with a more or less Gaussian distribution. If so, does anybody have an idea how to transform negative binomial data to get a Gaussian distribution? I would be very pleased if any of you could give me at least a tip on how to solve this problem, or recommend some literature.

Regards,
Sibylle

--------------------------------------------------------------------------------

I suggest you may want to transform the data in a different way, namely by recording them as a rate per something such as area, i.e. to make the data look like averages over cells. Presumably your counts already represent something like this, but the problem with pure counts is that they don't "add" right. The variogram corresponds to "point" data, and the theory provides a way to "regularize" the variogram when the support changes; pure counts will likely not behave properly in that respect.
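As an illustration of recording counts as rates, here is a minimal sketch, assuming the area of each quadrat is known in square metres (the function name and the example areas are hypothetical). Dividing by area puts counts from quadrats of different sizes on a common per-unit-area scale, which behaves like an average over a cell.

```python
def counts_to_density(counts, areas_m2):
    """Convert raw weed counts per quadrat into densities (plants per m^2).

    areas_m2 is a hypothetical list of quadrat areas, one per count.
    """
    if len(counts) != len(areas_m2):
        raise ValueError("counts and areas_m2 must have the same length")
    densities = []
    for count, area in zip(counts, areas_m2):
        if area <= 0:
            raise ValueError("quadrat area must be positive")
        densities.append(count / area)
    return densities


# The same count of 4 plants means very different densities
# in a 0.25 m^2 quadrat and in a 1 m^2 quadrat.
print(counts_to_density([4, 4, 0], [0.25, 1.0, 0.5]))  # [16.0, 4.0, 0.0]
```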

With respect to transforming a negative binomial to a normal, strictly speaking that can't be done, since the negative binomial is discrete and the normal is continuous. You might want to look at some of the literature relating to geostatistics and entomology; see, for example, papers by Liebhold and Hohn.
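A quick way to see the discreteness point: any one-to-one transformation sends every zero count to the same value, so the spike at zero survives the transformation. The sketch below uses log10(count + 1) purely as an example transform, on made-up counts.

```python
import math
from collections import Counter

counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 9]  # hypothetical weed counts

# Example transform; every zero count maps to log10(1) = 0.0.
transformed = [math.log10(c + 1) for c in counts]


def largest_atom_share(values):
    """Share of observations sitting on the single most common value."""
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


print(largest_atom_share(counts))       # 0.5
print(largest_atom_share(transformed))  # 0.5: still half the data on one value
```

A continuous distribution such as the normal puts zero probability on any single value, so no transformation of these counts can make them exactly Gaussian.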

Donald E. Myers
http://www.u.arizona.edu/~donaldm

--------------------------------------------------------------------------------

Hi,

just a quick suggestion. If you have enough data you could use a non-parametric indicator kriging technique. Often when there are many zeros this works well to delineate the regions of presence and absence.

Ben

Benjamin Warr

Research Associate
Centre for the Management of Environmental Resource (CMER)
Boulevard de Constance,
77305 Fontainebleau Cedex,
France

Tel: 33 (0)1 60 72 4456
Fax: 33 (0)1 60 74 55 64
e-mail: benjamin.warr@...

--------------------------------------------------------------------------------

Sibylle,

I am rather new to geostats too, and I went through the same problem when I
started. I work with soybean cyst nematode, so I also have count data with
a negative binomial distribution.
What I did with my data is a log10(counts + 1) transformation. Then you compute
the semivariogram, and if the data are not stationary, try to remove a linear or a
quadratic trend. If that solves the non-stationarity problem, then apply
universal kriging.
It is pretty simple to do with Surfer.
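The transform-then-detrend steps can be sketched as follows. This is a minimal illustration with NumPy, assuming sample coordinates x, y and counts are available as arrays; it fits a linear or quadratic trend surface by ordinary least squares and is not what Surfer does internally.

```python
import numpy as np


def log_transform(counts):
    """log10(count + 1): keeps zero counts defined (they map to 0)."""
    return np.log10(np.asarray(counts, dtype=float) + 1.0)


def detrend(x, y, z, order=1):
    """Fit a polynomial trend surface in the coordinates by least squares.

    order=1 fits a + b*x + c*y; order=2 adds x^2, x*y and y^2.
    Returns (residuals, coefficients).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    columns = [np.ones_like(x), x, y]
    if order == 2:
        columns += [x * x, x * y, y * y]
    elif order != 1:
        raise ValueError("order must be 1 or 2")
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return z - design @ coef, coef
```

The semivariogram would then be computed on the residuals. Strictly, universal kriging estimates the trend and the residual structure together; a variogram of least-squares residuals is a common first approximation rather than the exact procedure.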

Good luck!
Felicitas

--------------------------------------------------------------------------------

Hi Sibylle

You can fit variogram models for any kind of distribution. Gaussian distributions are required only by some
simulation algorithms, but the Gaussian transformation (or Gaussian anamorphosis) is a useful
tool to transform a raw variable into a Gaussian variable, with mean = 0 and variance = 1,
which can make structures clearer in variography.
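Because the empirical semivariogram is half the average squared difference between pairs of values at a given separation, it can be computed for raw counts, log counts or normal scores alike. Below is a minimal sketch of the classical (Matheron) estimator; the omnidirectional distance bins of width lag_width starting at zero are an assumption for illustration, and real software typically adds directional and tolerance options.

```python
import math


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: for each distance bin,
    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) over the N(h) pairs in the bin.

    Bin k covers distances in [k * lag_width, (k + 1) * lag_width).
    Returns (mean pair distance, gamma, number of pairs) for non-empty bins.
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    npairs = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            k = int(d // lag_width)
            if k < n_lags:
                sq_sums[k] += (values[i] - values[j]) ** 2
                dist_sums[k] += d
                npairs[k] += 1
    return [
        (dist_sums[k] / npairs[k], sq_sums[k] / (2 * npairs[k]), npairs[k])
        for k in range(n_lags)
        if npairs[k] > 0
    ]
```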

For that you can use GSLIB (the normal score transformation program nscore, with its parameter file nscore.par) or the Gaussian anamorphosis
in the Isatis software (Geovariances).
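For readers without GSLIB or Isatis, here is a minimal sketch of a rank-based normal score transform using only the Python standard library. It is an illustration, not GSLIB's nscore program (which also supports declustering weights and has its own handling of ties). Tied values, such as many zero counts, receive the average of their ranks and therefore all get the same score, so the spike at zero remains in the normal scores.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map each value's plotting position (rank - 0.5) / n to the
    standard normal quantile. Ties share the average of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end j of the group of values tied with values[order[i]].
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Results obtained on normal scores (for example kriged estimates) have to be back-transformed to the original count scale before interpretation.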

Alessandro Henrique Medeiros Silva
Geologist - Anglogold Brasil
alessandro@...

+55-31-3589-1687
+55-31-9953-0759

--------------------------------------------------------------------------------

Dear Sibylle,

are counts. Luckily, normality is not a requirement for variogram
calculation nor for kriging interpolation.

However, before calculating variograms it may be a good idea to
correct for non-stationarity in the variances (for counts, the variance
grows with the mean) and work with Pearson residuals.
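A minimal sketch of Pearson residuals for negative binomial counts, assuming the common variance function Var(y) = mu + mu^2 / k with dispersion parameter k. The fitted means and k are treated as given inputs here (in practice they would come from a fitted generalized linear model, e.g. with spatial trend terms); the function name is hypothetical.

```python
import math


def nb_pearson_residuals(counts, fitted_means, k):
    """Pearson residuals r_i = (y_i - mu_i) / sqrt(mu_i + mu_i**2 / k).

    Dividing by the model standard deviation removes the dependence of the
    spread on the mean, so the residuals have roughly constant variance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    residuals = []
    for y, mu in zip(counts, fitted_means):
        if mu <= 0:
            raise ValueError("fitted means must be positive")
        variance = mu + mu * mu / k
        residuals.append((y - mu) / math.sqrt(variance))
    return residuals


# Hypothetical example: mean 2 and k = 2 give variance 2 + 4/2 = 4.
print(nb_pearson_residuals([0, 6], [2.0, 2.0], k=2.0))  # [-1.0, 2.0]
```

The variogram would then be computed on these residuals rather than on the raw counts.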

See:

Gotway, C.A., Stroup, W.W. (1997) A Generalized Linear Model Approach
to Spatial Data Analysis and Prediction. Journal of Agricultural, Biological
and Environmental Statistics 2(2), pp. 157-178.

Diggle, P.J., Liang, K.-Y., Zeger, S.L. (1994) Analysis of Longitudinal
Data. Oxford University Press, Oxford.

or the more advanced approach of:

Diggle, P.J., Tawn, J.A., Moyeed, R.A. (1998) Model-based
geostatistics. Applied Statistics 47(3), pp. 299-350.
--
Edzer

--------------------------------------------------------------------------------

Just as a complement to Edzer's email:

The package geoRglm (www.maths.lancs.ac.uk/~christen)
does the analysis based on the Poisson/Binomial models
suggested in his last reference.

geoRglm is an add-on (package) to the R software (www.r-project.org)

Cheers
P.J.

Paulo Justiniano Ribeiro Jr
Departamento de Estatistica
Caixa Postal 19.081
CEP 81.531-990
Curitiba, PR - Brasil

e-mail: Paulo.Ribeiro@...
http://www.maths.lancs.ac.uk/~ribeiro (english)
http://www.est.ufpr.br/~ribeiro (portugues)

--------------------------------------------------------------------------------

Hi Sibylle,

You may want to look at using an indicator transformation for your data,
i.e. split the distribution into (ordered) intervals (say 5 or more, but it
will depend on your data) and code your variable as 1 if it is less than or
equal to the interval threshold, 0 if it is not. So you get a 'categorical' data set.
Zero could be one of the thresholds.
You would then use indicator kriging to interpolate.
This is usually more flexible and does not use a Gaussian model.
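A minimal sketch of the indicator coding step, assuming a set of thresholds chosen by hand (the thresholds and counts below are hypothetical). The sketch uses value <= threshold; with a strict "less than", a threshold of zero would code every count as 0. The proportion of 1s at each threshold is the sample estimate of the cumulative distribution at that threshold, and each indicator variable would get its own variogram in indicator kriging.

```python
def indicator_transform(values, thresholds):
    """Return one 0/1 list per threshold: 1 if value <= threshold, else 0."""
    return [[1 if v <= t else 0 for v in values] for t in thresholds]


counts = [0, 0, 3, 1, 0, 12, 5]  # hypothetical weed counts
thresholds = [0, 2, 10]          # zero as the first threshold: absence indicator

for t, ind in zip(thresholds, indicator_transform(counts, thresholds)):
    print(t, ind, sum(ind) / len(ind))
```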

I hope it helps,

Alessandro Gimona
Fisheries Research Services
Aberdeen
Scotland UK

Message 2 of 2, Apr 16, 2002
Hi Sibylle:

Sorry this is so late, but I have just been working on generating a negative
binomial as described by Pielou, Cressie and others (e.g., Diggle,
Ripley). Apparently it can be derived in two ways: one is a Poisson
distribution of clusters (of weeds) with a logarithmic distribution
describing the number of individual weeds per cluster; the other is a
Poisson distribution whose mean itself follows a gamma distribution.
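One standard way to generate negative binomial counts is the Poisson-gamma mixture: each quadrat's Poisson mean is drawn from a gamma distribution with shape k and mean mu. A minimal sketch with NumPy (the function name and parameter values are hypothetical); the simulated counts should have mean close to mu, variance close to mu + mu^2 / k, and a share of zeros close to (k / (k + mu))^k.

```python
import numpy as np


def simulate_negative_binomial(mu, k, size, seed=None):
    """Draw negative binomial counts as a Poisson-gamma mixture.

    lam   ~ Gamma(shape=k, scale=mu / k)   (mean mu, variance mu**2 / k)
    count ~ Poisson(lam)
    so that E[count] = mu and Var[count] = mu + mu**2 / k.
    """
    rng = np.random.default_rng(seed)
    lam = rng.gamma(shape=k, scale=mu / k, size=size)
    return rng.poisson(lam)


counts = simulate_negative_binomial(mu=2.0, k=0.5, size=100_000, seed=1)
# Theory: mean 2, variance 2 + 2**2 / 0.5 = 10, zero share (0.5 / 2.5)**0.5.
print(counts.mean(), counts.var(), (counts == 0).mean())
```

Small k gives strong clumping and many zeros, which is the "excess zeros" pattern described in the original question.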

You don't say what your objective is. If you are interested in kriging,
do you want to interpolate to find weed patches that you missed during
sampling, generate other possible realizations, or do you just want to find an
index of autocorrelation?
Because you are focusing on the semivariogram, I'm assuming it's the latter
you want. The ratio of the variance to the mean (counts/quadrat) and
Ripley's K are two indices of contagion used to describe point processes.
The semivariogram is not the best tool to analyze your data with. I would
look in Cressie's book, Chapter 8 on Spatial point patterns. If you want
to generate alternative realizations or describe your distribution, one (or
more) of these can be fitted to your data.
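The two contagion indices mentioned above can be sketched in a few lines of standard-library Python. The variance-to-mean ratio works on counts per quadrat; Ripley's K works on mapped plant positions. This Ripley's K is the naive estimator without edge correction (an assumption made to keep the sketch short), so it is biased downward for distances that are large relative to the study region; for complete spatial randomness K(r) is about pi * r^2.

```python
import math
from statistics import mean, variance


def variance_to_mean_ratio(quadrat_counts):
    """Index of dispersion: about 1 for a Poisson (random) pattern,
    greater than 1 for clumped patterns, less than 1 for regular ones."""
    return variance(quadrat_counts) / mean(quadrat_counts)


def ripley_k(points, r, area):
    """Naive Ripley's K at distance r, no edge correction:
    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}."""
    n = len(points)
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= r:
                close_pairs += 2  # count both (i, j) and (j, i)
    return area * close_pairs / (n * (n - 1))
```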

Good luck.

Yetta
