## AI-GEOSTATS: data transformation and variograms

Message 1 of 2, Apr 19, 2001
Hi. I have a question about transforming data.

I have infection prevalence data for many points: the proportion of
trees infected at each point. Values are between 0 and 1. Sample size
varies among points (because the density of trees varies). When I plot a
variogram of the prevalence data, I get a nice sill for about 4000 meters
and then a rise in the variogram. If I take the residuals of prevalence
against elevation, the second rise goes away. Biologically this all makes
sense and makes a nice story.
However, for some other analyses that I also did with these data, I
was advised to logit transform the prevalence because it is a
proportion and should be binomially distributed.
If I plot the variogram of the logit transformed prevalence, the
first sill is much less distinct, if it is there at all. This seems to be
mostly due to one point, the last point before the rise, which now goes up
instead of being about even with the previous point. (I guess this
difference is due to the stretching of zero prevalence values that occurs
with the logit transformation.) And if I look at smaller lags, it looks
like a power function with no sill. Biologically, that is harder to
explain. If I take the residuals of the logit transformed prevalence
against elevation, the variogram has a nice sill and is similar to, even
prettier than, the analysis of the untransformed data (but based on the
previous variogram, I don't have a very good reason for plotting the
residuals).
My question, then, is whether the logit transformation is necessary
and/or appropriate for the geostatistical analysis. Does it make sense to
use the transformed data for both variograms, for just the residuals
(because the residuals are based on a regression, for which the
transformation ought to be done), or for neither?
Thank you very much.

Juliann
jaukema@...
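
The three variograms compared in this message (raw prevalence, logit transformed prevalence, and residuals against elevation) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the poster's actual analysis: the data are synthetic and invented purely so the code runs, the function names are hypothetical, the regression is assumed to be a simple straight-line fit on elevation, and the empirical logit with a 0.5 correction is one common way (assumed here, not mentioned in the message) of keeping the transform finite at zero prevalence.

```python
import numpy as np

def empirical_logit(infected, n):
    """Empirical logit log((y + 0.5) / (n - y + 0.5)); finite at y == 0 and y == n."""
    infected = np.asarray(infected, dtype=float)
    n = np.asarray(n, dtype=float)
    return np.log((infected + 0.5) / (n - infected + 0.5))

def ols_residuals(z, covariate):
    """Residuals of a straight-line least-squares fit of z on one covariate."""
    z = np.asarray(z, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta

def empirical_variogram(coords, z, bin_edges):
    """Classical (Matheron) estimator: half the mean squared difference per lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # every pair of points once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (z[i] - z[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1      # bin b holds edges[b] <= d < edges[b+1]
    nbins = len(bin_edges) - 1
    gamma = np.full(nbins, np.nan)                # NaN marks empty bins
    counts = np.zeros(nbins, dtype=int)
    for b in range(nbins):
        mask = which == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * sqdiff[mask].mean()
    return gamma, counts

# Synthetic stand-in data (NOT the real survey): prevalence driven by elevation.
rng = np.random.default_rng(0)
n_sites = 200
coords = rng.uniform(0, 10000, size=(n_sites, 2))           # metres
elevation = 0.1 * coords[:, 0] + rng.normal(0, 50, n_sites)
n_trees = rng.integers(5, 60, n_sites)                      # varying sample size
p_true = 1 / (1 + np.exp(-(-2 + 0.003 * elevation)))
infected = rng.binomial(n_trees, p_true)

prevalence = infected / n_trees
logit_prev = empirical_logit(infected, n_trees)
edges = np.arange(0, 8001, 1000)                            # 1000 m lag bins

for name, z in [
    ("raw prevalence", prevalence),
    ("logit prevalence", logit_prev),
    ("raw residuals", ols_residuals(prevalence, elevation)),
    ("logit residuals", ols_residuals(logit_prev, elevation)),
]:
    gamma, counts = empirical_variogram(coords, z, edges)
    print(name, np.round(gamma, 3))
```

Comparing the printed rows at the same lags shows how much of a rise at long lags is removed by the elevation trend, and how strongly the transform rescales points with few or no infected trees. The pair counts returned alongside each variogram are worth checking before reading much into a single lag bin.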

--
Message 2 of 2, Apr 19, 2001
Hi,
I think the problem might be even more subtle. Essentially you are looking
at a marked point process, and trying to apply methods designed
principally for data that is continuous throughout the sampling domain.

I would suggest looking at the following paper:
Stoyan and Waelder 2000. On variograms in point process statistics,
II: Models of markings and ecological interpretation. Biometrical Journal
42(2):171-187.

Another approach you might think about is spatial CDF estimation. Take a
look at the work of Cressie and friends.

Nicholas

On Thu, 19 Apr 2001, Juliann Aukema wrote:

--
