transformation. Here it is, better late than never, I hope. Thanks a lot to

all those who responded.

Juliann

The key question about transformations and geostatistics is whether one

needs to re-transform. For example, if one uses a log transform (not logit)

then usually one wants to re-transform to the original form

whereas in the case of the indicator transform one does not re-transform.

The two difficulties that arise are (1) how the variogram of the original

variable and the variogram of the transformed variable are related, and (2)

in the case of a re-transformation, how to compute the bias.

(1) is probably not a problem if you are not going to re-transform. To

actually compute the relationship, however, one would need to know the

multivariate density function (and even then it may be difficult), and

knowing it is very unlikely in most geostatistical applications.

Donald Meyers
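
As a rough illustration of the re-transformation bias in point (2), here is a minimal Python sketch for the log transform only. It assumes independent lognormal values (no spatial correlation, no kriging), and mu, sigma and the sample size are made-up numbers. It shows that exponentiating the mean of the logs underestimates the mean on the original scale, and that the usual lognormal correction exp(m + s2/2) recovers it when the logs really are normal. The kriging analogue involves further terms and is not shown.

```python
import math
import random

random.seed(0)

# Hypothetical lognormal sample: the logs are normal with mean mu and sd sigma.
mu, sigma = 1.0, 0.8
z = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]

# Work on the log scale, as one would after a log transform.
logs = [math.log(v) for v in z]
m = sum(logs) / len(logs)
s2 = sum((y - m) ** 2 for y in logs) / (len(logs) - 1)

naive = math.exp(m)               # plain back-transform: estimates the median
corrected = math.exp(m + s2 / 2)  # lognormal mean, valid only if the logs are normal
sample_mean = sum(z) / len(z)     # mean on the original scale, for comparison

print("naive back-transform:    ", naive)
print("corrected back-transform:", corrected)
print("sample mean:             ", sample_mean)
```

The gap between the naive value and the sample mean is the bias; it grows with the variance of the logs.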

It is always better to use untransformed data if you

can.

Every complexity you add to your modelling increases

your chances of things going wrong exponentially.

Prime rule: simpler is always better

What I do every time I get a new set of data is the

following:

(1) calculate semi-variograms and look at histograms.

If semi-variogram nice, model and continue. If not:

(2) take logarithms and repeat. If still not nice:

(3) try indicators (lots of) to see if you have mixed

distributions or something similar. If still not nice:

(4) try a rank order (uniform) transform. If you still

don't get nice semi-variograms there is something

BADLY WRONG with your data. Re-assess your basic

assumptions:

(a) precise reproducible data?

(b) accurate representative data?

(c) homogeneous sampling zones (single populations)?

(d) trend?

Isobel Clark
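
The sequence of transforms in steps (1) to (4) above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names, the four-point transect and the lag settings are all hypothetical, the estimator is the classical (Matheron) one with simple distance bins, and a real analysis would use a geostatistics package and far more data.

```python
import math

def semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: half the mean squared difference per distance bin.

    Bin k collects pairs with distance in [k * lag_width, (k + 1) * lag_width).
    Empty bins are returned as None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            k = int(math.dist(coords[i], coords[j]) // lag_width)
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

def indicator(values, cutoff):
    """Indicator transform: 1 where the value is at or below the cutoff."""
    return [1.0 if v <= cutoff else 0.0 for v in values]

def rank_uniform(values):
    """Rank-order transform to (0, 1); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = (rank - 0.5) / len(values)
    return out

# Hypothetical data: four samples along a transect, unit spacing.
coords = [(float(x), 0.0) for x in range(4)]
values = [1.0, 2.0, 4.0, 8.0]

versions = {
    "raw": values,
    "log": [math.log(v) for v in values],
    "indicator(<=2)": indicator(values, 2.0),
    "rank": rank_uniform(values),
}
for name, v in versions.items():
    print(name, semivariogram(coords, v, lag_width=1.0, n_lags=4))
```

In practice one would repeat the indicator step for several cutoffs, as step (3) suggests, and inspect each semi-variogram alongside the histogram.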

Handling correlation on the link scale vs handling it on the unadjusted

scale is apparently "a topic of discussion in statistics." However, the

following may help: if you handle covariance on the link scale you are

working with a subject-specific model, while a population-averaged model

refers to modeling the covariance in the error term. I'd recommend getting a

copy of Wolfinger R and M O'Connell 1993 on generalized linear mixed models.

Fundamentally, your approach may depend on your goals. Are you really

trying to explain outcomes using predictor variables? Are you

fundamentally interested in the covariance from an ecological

perspective? Or are you trying to predict the number of trees per given

area?

If your goal falls into the first two categories and if you have a

nonignorable source of nonstationarity, then you can adjust for that

nonstationarity using binary or binomial regression. If you have covariates

at the tree level, then you might want to use the binary route. You'll

need to pick a link but you might find that a logit link might get you

started. After modeling the mean using logistic regression, you can

assess the spatial structure of the residuals by building semivariograms

from the Pearson or deviance residuals. If you observe structure, you can

model both the nonstationarity *and* the covariance using

generalized linear mixed models. If you get this far, you should probably

have read the papers below (or their equivalent). You can model spatial

variability as either a random effect or as correlated errors. All

this can be done in SAS using PROC LOGISTIC, PROC VARIOGRAM and the

GLIMMIX macro, respectively.

Brian

z. Gotway, CA and WW Stroup. 1997. A generalized linear model approach to

spatial data analysis and prediction. JABES 2: 157-178.

aa. Gumpertz, ML, C Wu and JM Pye. 2000. Logistic regression for Southern

Pine Beetle outbreaks with spatial and temporal correlation. Forest Science

46: 95-107.

Wolfinger, R. 1993. Covariance structure selection in general mixed

models. Communications in Statistics - Simulation and Computation 22: 1079-1106.

Wolfinger, R. and M O'Connell. 1993. Generalized Linear Mixed Models: A

Pseudo-Likelihood Approach. Journal of Statistical Computation and

Simulation 48: 233-243

Brian Gray
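
For readers without SAS, the first two steps described above (a binomial regression with a logit link, then Pearson residuals for a semivariogram) might look roughly like this in plain Python. It is a sketch under assumptions, not the procedure from the cited papers: the data are invented, elevation is taken to be centred and scaled, the fit is a bare Newton-Raphson for one covariate, and no spatial correlation is modelled at this stage.

```python
import math

def fit_binomial_logit(x, y, n, iters=25):
    """Newton-Raphson for logit(p_i) = b0 + b1 * x_i, y_i ~ Binomial(n_i, p_i)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)   # binomial variance, the IRLS weight
            r = yi - ni * p          # raw residual
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X'WX) delta = score and update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pearson_residuals(x, y, n, b0, b1):
    """(observed - expected) / sqrt(binomial variance) for each plot."""
    out = []
    for xi, yi, ni in zip(x, y, n):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        out.append((yi - ni * p) / math.sqrt(ni * p * (1.0 - p)))
    return out

# Hypothetical plots: centred elevation, trees sampled, trees infected.
x_elev = [-1.0, -0.5, 0.0, 0.5, 1.0]
trees = [20, 15, 30, 10, 25]
infected = [3, 4, 14, 6, 19]

b0, b1 = fit_binomial_logit(x_elev, infected, trees)
resid = pearson_residuals(x_elev, infected, trees, b0, b1)
print("intercept, slope:", b0, b1)
print("Pearson residuals:", resid)
```

The residuals, paired with the plot coordinates, are what one would pass to a semivariogram routine to look for remaining spatial structure. Working with counts and totals rather than logit-transformed proportions lets the varying sample sizes enter through the binomial variance.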

I think the problem might be even more subtle. Essentially you are looking

at a marked point process, and trying to apply methods designed

principally for data that is continuous throughout the sampling domain.

I would suggest looking at the following paper:

Stoyan and Waelder 2000. On variograms in point process statistics

II. Models of markings and ecological interpretation. Biometrical Journal

42(2):171-187

Another approach you might think about is spatial cdf estimation. Take a

look at the work of Cressie and friends.

Nicholas Lewin-Koh

> >Juliann Aukema wrote:

> >

> >> Hi. I have a question about transforming data.

> >>

> >> I have infection prevalence data for many points- a proportion of

> >> trees infected. Numbers are between 0 and 1. Sample size varies for the

> >> different points (because density of trees varies). When I plot a variogram

> >> of the prevalence data, I get a nice sill for about 4000 meters and then a

> >> rise in the variogram. If I take the residuals of prevalence against

> >> elevation the second rise goes away. Biologically this all makes sense and

> >> makes a nice story.

> >> However for some other analyses that I also did with this data, I

> >> was advised to logit transform the prevalence data because it is a

> >> proportion and should be binomially distributed.

> >> If I plot the variogram of the logit transformed prevalence, the

> >> first sill is much less distinct if it is there at all - this seems to be

> >> mostly due to one point, the last point before the rise, which now goes up

> >> instead of being about even with the previous point. (I guess this

> >> difference is due to the stretching of zero prevalence values that occurs

> >> with the logit transformation.) And if I look at smaller lags, it looks

> >> like a power function with no sill. Biologically, that is harder to

> >> explain. If I plot the residuals of the (logit transformed prevalence)

> >> against (elevation), the variogram has a nice sill and is similar, even

> >> prettier than the analysis of the untransformed data (but based on the

> >> previous variogram, I don't have a very good reason for plotting the

> >> residuals).

> >> My question, then is whether the logit transformation is necessary

> >> and/or appropriate for the geostatistical analysis. Does it make sense to

> >> use the transformed data for both variograms, for just the residuals

> >> (because the residuals are based on regression for which the transformation

> >> ought to be done) or for neither?

> >> Thank you very much.

> >>

> >> Juliann

> >> jaukema@...

> >>
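
The stretching of zero-prevalence values mentioned in the question can be seen in a small Python sketch. The question does not say how zeros were handled before taking logits, so the 0.5 adjustment used here (the empirical logit) is an assumption, as are the function name and the counts. The point is that plots with the same prevalence of zero get very different transformed values depending on how many trees were sampled, which can distort a variogram when sample size varies between points.

```python
import math

def empirical_logit(infected, total):
    """log((y + 0.5) / (n - y + 0.5)); finite even when y == 0 or y == n."""
    return math.log((infected + 0.5) / (total - infected + 0.5))

# Hypothetical plots: (infected trees, trees sampled).
plots = [(0, 5), (0, 50), (1, 5), (10, 50), (5, 10)]

for y, n in plots:
    print(f"prevalence {y / n:.2f} (n={n:2d}) -> empirical logit {empirical_logit(y, n):.3f}")
```

Note that the plain logit log(p / (1 - p)) is undefined at p = 0, so some adjustment of this kind is always implicit when zero-prevalence points are kept.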

* To post a message to the list, send it to ai-geostats@...

* As a general service to the users, please remember to post a summary of any useful responses to your questions.

* To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list

* Support to the list is provided at http://www.ai-geostats.org