
AI-GEOSTATS: answers to "transformation of data"

  • Sibylle Eisenberger
    Message 1 of 2 , Apr 16, 2002
      Hi!

      Please find below my original message and the list of answers to my question concerning the transformation of negative binomial data deriving from weed counts. Thanks everybody for your effort!

      Sibylle

      --------------------------------------------------------------------------------


      I'm doing my diploma thesis on the spatial distribution of weeds and I'm an absolute beginner with geostatistics. Please take that into account when reading my question.

      My data are weed counts with excess zeros and fit a negative binomial distribution. But as far as I know, semivariogram modelling can only be done with a more or less Gaussian distribution. If so, does anybody have an idea how to transform negative binomial data to get a Gaussian distribution? I would be very pleased if any of you could give me at least a tip on how to solve this problem, or maybe recommend some literature.


      Thanks a lot in advance.

      Regards,
      Sibylle

      --------------------------------------------------------------------------------

      I suggest you may want to transform the data in a different way, namely by recording them as a rate per something such as area, i.e. to make the data look like averages over cells. Presumably your counts already represent something like this, but the problem with pure counts is that they don't "add" right. The variogram corresponds to "point" data, and the theory provides a way to "regularize" the variogram when the support changes; pure counts will likely not behave properly in that respect.

      With respect to transforming a negative binomial to a normal, strictly speaking that can't be done since the negative binomial is discrete and the normal is continuous. You might want to look at some of the literature relating to geostatistics and entomology, see for example papers by Liebhold and Hohn.

      Donald E. Myers
      http://www.u.arizona.edu/~donaldm


      --------------------------------------------------------------------------------


      Hi,

      just a quick suggestion. If you have enough data you could use a non-parametric indicator kriging technique. Often when there are many zeros this works well to delineate the regions of presence and absence.
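
A minimal sketch of the first step behind this suggestion: code the counts as presence/absence and compute an empirical indicator semivariogram with the classical (Matheron) estimator. The coordinates, counts, lag width and function name are hypothetical and chosen only for illustration; the kriging step itself is not shown.

```python
import math

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per distance bin."""
    sums = [0.0] * n_lags
    pairs = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])
            k = int(h // lag_width)  # bin k covers distances [k*lag_width, (k+1)*lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                pairs[k] += 1
    # Bins without any pair are reported as None.
    return [s / (2 * n) if n else None for s, n in zip(sums, pairs)]

# Hypothetical quadrats along a transect (assumed data).
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
counts = [0, 0, 3, 5]
presence = [1 if c > 0 else 0 for c in counts]  # indicator: weed present or absent
gamma = empirical_semivariogram(coords, presence, lag_width=1.0, n_lags=3)
```

With real data there would be far more pairs per bin; a handful of points like this only shows the mechanics.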

      Ben

      Benjamin Warr

      Research Associate
      Centre for the Management of Environmental Resource(CMER)
      INSEAD
      Boulevard de Constance,
      77305 Fontainebleau Cedex,
      France

      Tel: 33 (0)1 60 72 4456
      Fax: 33 (0)1 60 74 55 64
      e-mail: benjamin.warr@...
      http://www.insead.fr/CMER



      --------------------------------------------------------------------------------


      Sibylle,

      I am rather new on geostats too, and I went through the same problem when I
      started. I work with soybean cyst nematode, so I also have count data with
      a negative binomial distribution.
      What I did with my data is a log10(counts + 1) transformation. Then you compute
      the semivariogram and, if it is not stationary, try to remove a linear or a
      quadratic trend. If that solves the non-stationarity problem, then apply
      universal kriging.
      It is pretty simple to do with Surfer.
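
The two steps described above can be sketched outside Surfer as follows. This is only an illustration under simplifying assumptions: the data are hypothetical, and the trend is fitted along a single coordinate by ordinary least squares, whereas a real field survey would usually need a trend surface in both x and y.

```python
import math

def log_transform(counts):
    """log10(count + 1); adding 1 keeps zero counts defined."""
    return [math.log10(c + 1) for c in counts]

def remove_linear_trend(x, z):
    """Fit z = a + b*x by ordinary least squares and return the residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mean_z - b * mean_x
    return [zi - (a + b * xi) for xi, zi in zip(x, z)]

# Hypothetical positions (m) and counts along one transect (assumed data).
x = [0.0, 10.0, 20.0, 30.0, 40.0]
counts = [0, 9, 0, 99, 999]
z = log_transform(counts)
residuals = remove_linear_trend(x, z)
```

The semivariogram would then be computed on `residuals` rather than on the raw counts.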

      Good luck!
      Felicitas


      --------------------------------------------------------------------------------




      Hi Sibylle

      You can fit variogram models for any kind of distribution. Gaussian distributions are required only by some
      simulation algorithms, but a Gaussian transformation (or Gaussian anamorphosis) is a useful
      tool for transforming a raw variable into a Gaussian variable with mean = 0 and variance = 1,
      which makes structures clearer in the variography.

      For that you can use GSLIB (normal score transformation, nscore.par) or the Gaussian anamorphosis
      in the Isatis software (Geovariances).
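
To show what a normal score transformation does, here is a rank-based sketch. It is not the GSLIB nscore program and does not reproduce its options; the plotting position (rank + 0.5) / n and the handling of ties are assumptions made for this example.

```python
from statistics import NormalDist

def normal_scores(values):
    """Map each value to the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n  # assumed plotting position, strictly inside (0, 1)
        scores[i] = std_normal.inv_cdf(p)
    return scores

# Hypothetical counts (assumed data).
counts = [0, 5, 1, 12, 3]
scores = normal_scores(counts)
```

Note that tied values, such as the many zeros in weed counts, receive different scores here depending on their position in the list, so the transformed zeros would not form a genuine Gaussian lower tail.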


      Alessandro Henrique Medeiros Silva
      Geologist - Anglogold Brasil
      alessandro@...

      +55-31-3589-1687
      +55-31-9953-0759


      --------------------------------------------------------------------------------


      Dear Sibylle,

      I suspect your residuals will never become normal, because your data
      are counts. Luckily, normality is not a requirement for variogram
      calculation nor for kriging interpolation.

      However, before calculating variograms it may be a good idea to
      correct for non-stationarity in the variances, and work with Pearson
      residuals.

      See:

      Gotway, C.A., Stroup, W.W. (1997) A Generalized Linear Model Approach
      to Spatial Data Analysis and Prediction. Journal of Agricultural, Biological
      and Environmental Statistics 2(2), pp. 157--178.

      Diggle, P.J., Liang, K-Y., Zeger, S.L. (1994) Analysis of Longitudinal
      Data. Oxford University Press, Oxford.

      or the more advanced approach of:

      Diggle, P.J., J.A. Tawn, R.A. Moyeed (1998), Model-based
      geostatistics. Applied Statistics 47(3), pp 299-350.
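
As a rough illustration of Pearson residuals for counts, the sketch below standardises each count by a negative binomial variance function, var = mu + mu^2 / k. The fitted means and the dispersion parameter k are assumed to be given; in practice they would come from a fitted model, and this is not meant to reproduce the procedure in the references above.

```python
import math

def pearson_residuals(counts, means, k):
    """(y - mu) / sqrt(var(mu)), with negative binomial variance mu + mu^2 / k."""
    return [(y - m) / math.sqrt(m + m * m / k) for y, m in zip(counts, means)]

# Hypothetical counts, fitted means and dispersion parameter (assumed values).
counts = [0, 2, 6]
means = [2.0, 2.0, 2.0]
k = 1.0
res = pearson_residuals(counts, means, k)
```

The variogram would then be calculated on `res`, whose variance no longer grows with the local mean.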
      --
      Edzer


      --------------------------------------------------------------------------------


      Just as a complement to Edzer's email:

      The package geoRglm (www.maths.lancs.ac.uk/~christen)
      does the analysis based on the Poisson/Binomial models
      suggested in his last reference.

      geoRglm is an add-on (package) to the R software (www.r-project.org)

      Cheers
      P.J.


      Paulo Justiniano Ribeiro Jr
      Departamento de Estatistica
      Universidade Federal do Parana'
      Caixa Postal 19.081
      CEP 81.531-990
      Curitiba, PR - Brasil

      e-mail: Paulo.Ribeiro@...
      http://www.maths.lancs.ac.uk/~ribeiro (english)
      http://www.est.ufpr.br/~ribeiro (portugues)


      --------------------------------------------------------------------------------


      Hi Sibylle,

      You may want to look at using an indicator transformation for your data.
      I.e. split the distribution into (ordered) intervals (say 5 or more, but it
      will depend on your data) and code your variable as 1 if it is at or below
      the interval threshold, 0 if it is not. So you get a 'categorical' data set.
      Zero could be one of the thresholds.
      You would then use indicator kriging to interpolate.
      This is usually more flexible and does not use a gaussian model.
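
The coding step can be sketched as below, using the common convention that values at or below a threshold are coded 1, so that zero can serve as a threshold. The counts and thresholds are hypothetical; each resulting 0/1 series would get its own variogram before indicator kriging.

```python
def indicator_transform(counts, thresholds):
    """Return one 0/1 series per threshold: 1 where count <= threshold."""
    return {t: [1 if c <= t else 0 for c in counts] for t in thresholds}

# Hypothetical counts and thresholds (assumed values).
counts = [0, 0, 1, 4, 12]
indicators = indicator_transform(counts, thresholds=[0, 2, 10])
```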

      I hope it helps,

      Alessandro Gimona
      Fisheries Research Services
      Aberdeen
      Scotland UK






    • Yetta Jager
      Message 2 of 2 , Apr 16, 2002
        Hi Sibylle:

        Sorry this is so late, but I have just been working on generating a negative
        binomial as described by Pielou, Cressie and others (e.g., Diggle,
        Ripley). Apparently it can be derived in two ways, one of which is a
        Poisson distribution of clusters (of weeds) combined with a logarithmic
        distribution describing the number of individual weeds per cluster; the
        other is a Poisson count whose mean follows a gamma distribution.
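
One standard way to generate negative binomial counts is as a Poisson count whose mean is itself gamma distributed. The sketch below uses that construction with only the Python standard library; the mean, the dispersion parameter k, the seed and the function names are assumptions for illustration.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    n = 0
    prod = rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n

def negative_binomial(mean, k, rng):
    """Draw a gamma-distributed Poisson mean (shape k, scale mean / k), then a Poisson count."""
    lam = rng.gammavariate(k, mean / k)
    return poisson(lam, rng)

rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [negative_binomial(mean=2.0, k=0.5, rng=rng) for _ in range(5000)]
sample_mean = sum(sample) / len(sample)
```

A small k gives strongly clumped counts with many zeros, which is the pattern described for the weed data.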

        You don't say what your objective is -- if you are interested in kriging,
        do you want to interpolate to find weed patches that you missed during
        sampling, generate other possible realizations, or you just want to find an
        index of autocorrelation?
        Because you are focusing on the semivariogram, I'm assuming it's the latter
        you want. The ratio of the variance to the mean (counts/quadrat) and
        Ripley's K are two indices of contagion used to describe point processes.
        The semivariogram is not the best tool to analyze your data with. I would
        look in Cressie's book, Chapter 8 on Spatial point patterns. If you want
        to generate alternative realizations or describe your distribution, one (or
        more) of these can be fitted to your data.
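
The variance-to-mean ratio mentioned above is easy to compute from quadrat counts. The sketch below also adds a method-of-moments estimate of the negative binomial parameter k, which is an extra step not discussed in the message; the counts are hypothetical.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by the mean; values above 1 suggest clumping."""
    return variance(counts) / mean(counts)

def moment_estimate_k(counts):
    """Method-of-moments k = mean^2 / (variance - mean); needs variance > mean."""
    m = mean(counts)
    s2 = variance(counts)
    return m * m / (s2 - m)

# Hypothetical counts per quadrat (assumed data).
counts = [0.0, 0.0, 0.0, 1.0, 0.0, 7.0, 0.0, 2.0, 0.0, 10.0]
ratio = dispersion_index(counts)
k_hat = moment_estimate_k(counts)
```

A ratio near 1 is what a Poisson (random) pattern would give; the ratio depends on quadrat size, so it should be compared only between surveys with the same quadrats.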

        Good luck.

        Yetta


