Hi!

Please find below my original message and the list of answers to my question concerning the transformation of negative binomial data derived from weed counts. Thanks, everybody, for your effort!

Sibylle

--------------------------------------------------------------------------------

I'm doing my diploma thesis on the spatial distribution of weeds and I'm an absolute beginner with geostatistics. Please take that into account when reading my question.

My data are weed counts with excess zeros and fit a negative binomial distribution. But as far as I know, semivariogram modelling can only be done with a more or less Gaussian distribution. If that is so, does anybody have an idea how to transform negative binomial data to get a Gaussian distribution? I would be very pleased if any of you could give me at least a tip on how to solve this problem, or maybe recommend some literature.

Thanks a lot in advance.

Regards,

Sibylle

--------------------------------------------------------------------------------

I suggest you may want to transform the data in a different way, namely by recording them as a rate per something such as area, i.e. to make the data look like averages over cells. Presumably your counts already represent something like this, but the problem with pure counts is that they don't "add" right. The variogram corresponds to "point" data, and the theory provides a way to "regularize" the variogram when the support changes; pure counts will likely not behave properly in that respect.

With respect to transforming a negative binomial to a normal, strictly speaking that can't be done, since the negative binomial is discrete and the normal is continuous. You might want to look at some of the literature relating to geostatistics and entomology; see for example papers by Liebhold and Hohn.

Donald E. Myers

http://www.u.arizona.edu/~donaldm

--------------------------------------------------------------------------------

Hi,

Just a quick suggestion. If you have enough data, you could use a non-parametric indicator kriging technique. When there are many zeros, this often works well to delineate the regions of presence and absence.

Ben

Benjamin Warr

Research Associate

Centre for the Management of Environmental Resource (CMER)

INSEAD

Boulevard de Constance,

77305 Fontainebleau Cedex,

France

Tel: 33 (0)1 60 72 4456

Fax: 33 (0)1 60 74 55 64

e-mail: benjamin.warr@...

http://www.insead.fr/CMER

--------------------------------------------------------------------------------

Sibylle,

I am rather new to geostats too, and I went through the same problem when I started. I work with soybean cyst nematode, so I also have count data with a negative binomial distribution.

What I did with my data is a log10(counts + 1) transformation. Then you compute the semivariogram and, if it is not stationary, try to remove a linear or a quadratic trend. If that solves the non-stationarity problem, then apply universal kriging.

It is pretty simple to do with Surfer.
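
For readers without Surfer, the transformation and the experimental semivariogram described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quadrat coordinates, the counts, the lag width and the function names are all hypothetical, the estimator is the classical (Matheron) one, and no trend removal is attempted.

```python
import math

def log10_plus_one(counts):
    """Apply the log10(count + 1) transformation to a list of counts."""
    return [math.log10(c + 1) for c in counts]

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: half the mean squared difference per lag bin."""
    sums = [0.0] * n_lags
    pairs = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])   # separation distance
            k = int(h // lag_width)               # index of the lag bin
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                pairs[k] += 1
    # gamma(h) = sum / (2 * N(h)); None where a bin holds no pairs
    return [s / (2 * n) if n else None for s, n in zip(sums, pairs)]

# Hypothetical quadrat centres (x, y) and weed counts
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
counts = [0, 0, 9, 0, 99, 0]

z = log10_plus_one(counts)
gamma = empirical_semivariogram(coords, z, lag_width=1.0, n_lags=3)
```

Each entry of `gamma` is the semivariance for one distance class; a model would then be fitted to these values, which is outside the scope of this sketch.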

Good luck!

Felicitas

--------------------------------------------------------------------------------

Hi Sibylle,

You can fit variogram models for any kind of distribution. Gaussian distributions are required only by some simulation algorithms, but a Gaussian transformation (or Gaussian anamorphosis) is a useful tool for transforming a raw variable into a Gaussian variable with mean = 0 and variance = 1, which makes structures clearer in the variography.

For that you can use GSLIB (normal score transformation, nscore.par) or the Gaussian anamorphosis in the Isatis software (Geovariances).
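
The idea behind a normal score transformation can be sketched as follows. This is a minimal rank-based version written for illustration, not the GSLIB or Isatis implementation; the function name, the plotting position (rank + 0.5) / n and the rule that tied values share one score are assumptions of this sketch, and the counts are hypothetical.

```python
from statistics import NormalDist

def normal_scores(values):
    """Rank-based normal scores; tied values share the average of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend over a run of tied values
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Average plotting position of the run, strictly inside (0, 1)
        p = ((start + end) / 2 + 0.5) / n
        score = NormalDist().inv_cdf(p)
        for k in range(start, end + 1):
            scores[order[k]] = score
        start = end + 1
    return scores

counts = [0, 0, 0, 0, 1, 2, 5, 12]  # hypothetical weed counts
scores = normal_scores(counts)
```

Note that all the zeros receive the same score here, so a data set with many zeros still shows a spike after the transformation.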

Alessandro Henrique Medeiros Silva

Geologist - Anglogold Brasil

alessandro@...

+55-31-3589-1687

+55-31-9953-0759

--------------------------------------------------------------------------------

Dear Sibylle,

I suspect your residuals will never become normal, because your data are counts. Luckily, normality is not a requirement for variogram calculation or for kriging interpolation.

However, before calculating variograms it may be a good idea to correct for non-stationarity in the variances, and work with Pearson residuals.

See:

Gotway, C.A., Stroup, W.W. (1997) A Generalized Linear Model Approach to Spatial Data Analysis and Prediction. Journal of Agricultural, Biological and Environmental Statistics 2(2), pp. 157-178.

Diggle, P.J., Liang, K-Y., Zeger, S.L. (1994) Analysis of Longitudinal Data. Oxford University Press, Oxford.

or the more advanced approach of:

Diggle, P.J., Tawn, J.A., Moyeed, R.A. (1998) Model-based geostatistics. Applied Statistics 47(3), pp. 299-350.
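
As a rough illustration of the Pearson residuals mentioned above: each residual is the observed count minus its fitted mean, divided by the standard deviation implied by the assumed variance function. The sketch below assumes a negative binomial variance function, mu + mu^2 / k; the fitted means, the dispersion parameter k and the function name are hypothetical and would in practice come from a fitted model.

```python
import math

def pearson_residuals(counts, fitted_means, k):
    """(y - mu) / sqrt(V(mu)) with the negative binomial variance mu + mu^2 / k."""
    return [
        (y - mu) / math.sqrt(mu + mu * mu / k)
        for y, mu in zip(counts, fitted_means)
    ]

counts = [0, 3, 12, 1]                # hypothetical observed counts
fitted_means = [1.0, 2.0, 6.0, 1.0]   # hypothetical means from a fitted trend model
residuals = pearson_residuals(counts, fitted_means, k=0.5)
```

The variogram would then be computed on `residuals` rather than on the raw counts.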

--

Edzer

--------------------------------------------------------------------------------

Just as a complement to Edzer's email:

The package geoRglm (www.maths.lancs.ac.uk/~christen) does the analysis based on the Poisson/Binomial models suggested in his last reference.

geoRglm is an add-on (package) to the R software (www.r-project.org).

Cheers

P.J.

Paulo Justiniano Ribeiro Jr

Departamento de Estatistica

Universidade Federal do Parana'

Caixa Postal 19.081

CEP 81.531-990

Curitiba, PR - Brasil

e-mail: Paulo.Ribeiro@...

http://www.maths.lancs.ac.uk/~ribeiro (english)

http://www.est.ufpr.br/~ribeiro (portugues)

--------------------------------------------------------------------------------

Hi Sibylle,

You may want to look at using an indicator transformation for your data. That is, split the distribution into (ordered) intervals (say 5 or more, but it will depend on your data) and code your variable as 1 if it does not exceed the interval threshold, 0 if it does. So you get a 'categorical' data set. Zero could be one of the thresholds.

You would then use indicator kriging to interpolate. This is usually more flexible and does not use a Gaussian model.
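
The indicator coding itself is simple to sketch. The example below assumes the common convention that the indicator is 1 when the value is at or below the threshold, so that a threshold of zero separates absence from presence; the counts, the thresholds and the function name are hypothetical.

```python
def indicator_transform(values, thresholds):
    """Return one 0/1 list per threshold: 1 where value <= threshold, else 0."""
    return {t: [1 if v <= t else 0 for v in values] for t in thresholds}

counts = [0, 0, 3, 0, 12, 1, 0, 7]   # hypothetical weed counts per quadrat
thresholds = [0, 2, 10]              # hypothetical cut-offs; 0 marks absence

indicators = indicator_transform(counts, thresholds)

# Proportion of quadrats at or below each threshold (an empirical CDF value)
proportions = {t: sum(ind) / len(ind) for t, ind in indicators.items()}
```

Each indicator list would get its own variogram and be kriged separately; the kriged values estimate the probability of not exceeding that threshold.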

I hope it helps,

Alessandro Gimona

Fisheries Research Services

Aberdeen

Scotland UK

--------------------------------------------------------------------------------

Hi Sibylle:

Sorry this is so late, but I have just been working on generating a negative binomial as described by Pielou, Cressie and others (e.g., Diggle, Ripley). Apparently it can be derived in two ways: as a Poisson count whose mean follows a gamma distribution, or as a Poisson distribution of clusters (of weeds) with a logarithmic distribution describing the number of individual weeds per cluster.
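
One standard way to generate negative binomial counts is as a Poisson count whose mean is itself drawn from a gamma distribution. The sketch below uses only the Python standard library; the function names, the parameter values (mean 2, dispersion k = 0.5) and the seed are assumptions chosen for illustration, and the simple Poisson sampler is only suitable for small to moderate means.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate lam."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def negative_binomial_draw(mean, k, rng):
    """Gamma-Poisson mixture: the Poisson mean is gamma with shape k, scale mean / k."""
    lam = rng.gammavariate(k, mean / k)
    return poisson_draw(lam, rng)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = [negative_binomial_draw(2.0, 0.5, rng) for _ in range(5000)]

sample_mean = sum(sample) / len(sample)
zero_fraction = sample.count(0) / len(sample)
```

With a small k the sample contains many zeros and a long right tail, which is the pattern typical of patchy weed counts.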

You don't say what your objective is. If you are interested in kriging, do you want to interpolate to find weed patches that you missed during sampling, generate other possible realizations, or just find an index of autocorrelation?

Because you are focusing on the semivariogram, I'm assuming it's the latter you want. The ratio of the variance to the mean (counts/quadrat) and Ripley's K are two indices of contagion used to describe point processes. The semivariogram is not the best tool for analyzing your data. I would look in Cressie's book, Chapter 8, on spatial point patterns. If you want to generate alternative realizations or describe your distribution, one (or more) of these can be fitted to your data.
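
The variance-to-mean ratio mentioned above is easy to compute from quadrat counts. The sketch below also gives a method-of-moments estimate of the negative binomial dispersion parameter k, which is an addition for illustration rather than something described in this message; the counts and function names are hypothetical.

```python
from statistics import mean, variance

def variance_to_mean_ratio(counts):
    """Index of dispersion: about 1 for a random (Poisson) pattern, above 1 for clumping."""
    return variance(counts) / mean(counts)

def moment_estimate_k(counts):
    """Method-of-moments k = mean^2 / (variance - mean); None if not overdispersed."""
    m, v = mean(counts), variance(counts)
    return m * m / (v - m) if v > m else None

counts = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 5.0, 12.0]  # hypothetical counts per quadrat
ratio = variance_to_mean_ratio(counts)
k_hat = moment_estimate_k(counts)
```

A ratio well above 1, together with a small k, indicates a strongly aggregated pattern.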

Good luck.

Yetta

At 09:07 AM 4/16/2002 +0200, you wrote:
> Hi!
