Here's a summary of the many helpful responses I received to a question about

heteroscedasticity of my residuals from a regression analysis. What I tried

was giving all of my zero values a very tiny, near-zero number proportional

to the mean value of all of the points in the neighborhood of the zero

value. I then normal score transformed the data, performed my regression

and the variography and kriging of the residuals, and then backtransformed

my final surface of normal scores. My residual plots then did not indicate

any bias, interestingly, and I couldn't really find any evidence that any of

my results performed very poorly, as judged by a cross validation procedure.

I'm currently trying to find some information about potential theoretical

problems with using normal scored data in this sort of analysis. It seems

that normal scored data are used in sequential gaussian conditional

simulation in gslib�It also seems to me that normal scoring is kind of like

a ranking procedure, not like the lognormal transformation, and thus the

same problems associated with kriging of lognormal data do not necessarily

exist. However, there are probably some other theoretical problems with

using normal scored data that I am not aware of. If anyone has any pointers

re: this, I'd be happy to hear them!
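For anyone curious about the mechanics, here is a minimal sketch of the normal score transform and back-transform (my own Python/NumPy illustration, not code from any of the responses; the plotting-position formula and tie handling are just one common choice):

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_score_transform(values):
    """Rank the data and map each rank to the corresponding
    standard-normal quantile (a classic normal score transform)."""
    n = len(values)
    ranks = rankdata(values)        # average ranks for ties
    # Plotting-position formula keeps quantiles strictly inside (0, 1)
    p = ranks / (n + 1)
    return norm.ppf(p)

def back_transform(scores, original):
    """Invert by interpolating scores back onto the empirical
    distribution of the original data."""
    n = len(original)
    sorted_orig = np.sort(original)
    sorted_scores = norm.ppf(np.arange(1, n + 1) / (n + 1))
    return np.interp(scores, sorted_scores, sorted_orig)
```

Note that the transform is rank-based, so it preserves the ordering of the data exactly, which is part of why it behaves differently from a lognormal transform.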

Again, thank you so much for your help! Original message first, then

responses:

ORIGINAL MESSAGE:

>I'm wondering if anyone might have some advice on this one: I'm basically

>trying to model the spatial distribution of the importance of a certain

>forest type using a combination of multiple linear regression and kriging

>(~universal kriging). I have wall to wall coverages of the independent

>variables: precipitation, slope, elevation, aspect, road density, distance

>to roads, and various satellite variables (vegetation indices); I have ~700

>spatially referenced field samples where there is a measurement of the

>dependent variable: amount of spruce-fir forest type. I transformed all

>the variables, trying to make them as normal as possible.


>

>I've found, using stepwise linear regression, that the best model uses 4 of

>the independent variables, and has an r^2 of about .45 (which I found kind of

>exciting, given the amount of noise in this sort of thing). My intention is

>to then krige the residuals of the estimates from this model, assuming they

>exhibit spatial autocorrelation, which they do. Adding the estimates from

>both procedures will hopefully yield me a "better" set of estimates than

>either procedure alone.

>

>My worry, however, is that when I examine the residuals from my multiple

>linear regression, I find that the plot of the residuals (y axis) vs. the

>fitted value (x axis) indicate heteroscedasticity (they are more

>concentrated around 0 at low values of x, and spread out as x increases (a

>megaphone form)). They are normally distributed around 0, however, and do

>not show any spatial pattern.

>

>I have transformed the heck out of everything, and I have tried in a rather

>clumsy way to implement weighted least squares regression (in Minitab--the

>online help is very weak on this!), with poor results (the residual plot

>remains very much the same). I also removed outliers to the point where I

>felt a little guilty, but without much impact (although Minitab still tells

>me that there are lots of "unusual observations"...)

>

>One clue: the dependent variable has all sorts of zero values--there are

>about 200 of the 700 that have a measurement of "0 spruce-fir" found at

>plot. I removed the zero values, then ran the regression on the remaining

>values, yielding a nearly normal distribution, but the residual plot did not

>change much (still the megaphone shape). I looked at scatterplots of all of

>the independents vs. the dependent, and saw a little evidence of nonconstant

>variance in the x across values of y, but it didn't seem dramatic. I also

>plotted the absolute values of the residuals vs. the independents, and

>didn't see any crazy relationship in terms of non-homogeneous variance.

>The correlation coefficients of the independents vs. the dependent are all

>between .3 and .5; the scatterplots are pretty fat...

>

>My next course of action is to try doing a principal components analysis on

>the independent variables, and using a PC in the regression analysis. I was

>also going to look into some sort of nonparametric regression. I'd really

>like to just stick with the model I came up with, however....

>

>Does anyone have any good ideas? Should I worry about the

>heteroscedasticity of the data, given my goals (from what I read, it seems

>like one mostly worries about heteroscedasticity when considering confidence

>intervals... However, I'd like my predictions not to be biased...)

>

>Sorry if this is an inane question!

>

>I'll post any responses...

>

>Thank you,

>Andrew
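Before the responses, here is a toy sketch of the two-step plan in the message (regression trend plus kriging of residuals, then adding the two). This is my own Python illustration, not anyone's production code: the data are synthetic, and the exponential covariance with made-up sill and range stands in for a variogram model fitted to real residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the field plots (the real case has ~700 samples
# and 4 retained predictors; the sizes here are purely illustrative)
coords = rng.uniform(0, 100, size=(60, 2))       # plot locations
X = rng.normal(size=(60, 3))                     # predictors at the plots
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=60)

# Step 1: ordinary least-squares trend
Xd = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta

# Step 2: krige the residuals, here with an assumed exponential
# covariance (in practice the sill and range come from the variogram)
def cov(h, sill=0.1, rang=30.0):
    return sill * np.exp(-h / rang)

d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
C = cov(d) + 1e-6 * np.eye(len(y))               # tiny nugget for stability
w = np.linalg.solve(C, resid)                    # dual-form kriging weights

def predict(new_coords, new_X):
    """Combined estimate = regression trend + kriged residual."""
    trend = np.column_stack([np.ones(len(new_X)), new_X]) @ beta
    h = np.linalg.norm(new_coords[:, None, :] - coords[None, :, :], axis=2)
    return trend + cov(h) @ w

yhat = predict(coords, X)    # combined estimate at the plot locations
```

The same `predict` function can be evaluated at grid nodes to produce the final surface.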


RESPONSES:

Andrew: Let me recommend generalized linear mixed modeling (GLiMM). See Gotway

and Stroup (1997), JABES 2: 157-178. Briefly, this may allow you to model a

binomial outcome (or other members of the exponential family) that exhibits

spatial covariance. I say "may" because you may need to face a few caveats and

you may need to go through a little trial and error. Allison (1999, pp 206-213)

in SAS's Logistic Regression text covers some concerns--as does the macro

documentation (see below). The big deal, however, is that you should be able to

avoid the outlier and transformation approaches you've used to date. If you

have access to SAS, GLiMM can be performed using the GLIMMIX macro. The

documentation is provided in the macro and in the PROC MIXED documentation

(which GLIMMIX calls). The macro is available from www.sas.com and as a sample

under STAT.

Brian Gray

xxxxxxxxxxxxx

To me the problem looks like functional misspecification and not like a

distributional problem. Have you tried squares or square roots of the driver

variables and interaction terms? You can either try these things, or do a

test for functional specification, or analyse the residuals versus the driver

variables. There is a very simple test for functional misspecification: the

Ramsey test. You calculate the residual, then square it and cube it. Add

the squared and cubed residual to the regression model and see whether the

r-square increases significantly (F-test if you want). I typically do all

these things iteratively until I am satisfied.
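A minimal sketch of that check, for concreteness. Note this is my own illustration of the textbook RESET, which augments the model with powers of the *fitted values* rather than of the residual, but the spirit is the same as described above:

```python
import numpy as np
from scipy import stats

def ramsey_reset(X, y):
    """Ramsey RESET check: refit with the squared and cubed fitted
    values added as regressors, then F-test the R^2 increase."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    yhat = Xd @ beta
    Xa = np.column_stack([Xd, yhat**2, yhat**3])   # augmented model
    beta_a, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    rss0 = np.sum((y - yhat) ** 2)
    rss1 = np.sum((y - Xa @ beta_a) ** 2)
    q, dof = 2, n - Xa.shape[1]                    # 2 added regressors
    F = ((rss0 - rss1) / q) / (rss1 / dof)
    return F, stats.f.sf(F, q, dof)
```

A small p-value suggests the linear specification is missing curvature or interactions, which fits the "develop the model manually" advice above.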

I would not worry about weighted regression and stuff like that, but I would

develop the model manually instead of using stepwise regression. Typically one

needs many runs (model respecifications) to get an understanding of the data.

I find graphs or tables of left-hand-side variables versus drivers always

very useful as a first step.

Otto Hellwig

xxxxxxxxxxxxxxxxxx

A few observations

1. You mention having transformed "everything" to obtain "normality", I

would assume that ultimately you want to obtain interpolates for the

original data (not the transformed data nor the residual data). Simply

exponentiating will lead to problems. Consider the simpler problem of a

constant mean (no auxiliary variables) and a log transformation for the

variable of interest. If you model a variogram from the transformed data,

use the transformed data and this variogram to krige then you will have a

multiplicative bias if you simply exponentiate the kriged values. Under an

assumption of multivariate lognormality for the original you can compute

the bias adjustment (see for example Journel, Math. Geology, circa 1980).

For any other transformation than logarithmic and without the lognormality

assumption it is not possible to compute the bias adjustment (even knowing

the multivariate distribution type and the analytic form of the transform

is usually not enough to actually do it).

As an ad-hoc solution (described in Journel and Huijbregts, 1978) for the

log transform case, compute the mean of the original untransformed data and

the mean of the kriged values. Take the quotient; this is your

multiplicative bias adjustment.
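A toy numeric illustration of this quotient adjustment (the numbers are made up, and the "kriged" values are a deliberate caricature of kriging's smoothing):

```python
import numpy as np

z = np.array([1.0, 2.0, 5.0, 10.0, 20.0])   # original data
logs = np.log(z)

# Caricature of kriging's smoothing: every kriged log-value equals the
# mean of the logs (simple kriging far from the data does exactly this)
kriged_logs = np.full_like(logs, logs.mean())

naive = np.exp(kriged_logs)                 # naive back-transform
factor = z.mean() / naive.mean()            # multiplicative adjustment
adjusted = naive * factor                   # means now agree
```

Here `naive` is the geometric mean of the data, which understates the arithmetic mean, so `factor` comes out greater than 1 and rescaling restores the correct mean.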

Did you transform the original dependent variable data or the residuals? If

the former then you have real problems, if the latter then your problem is

close at least to that described above.

You mention universal kriging but don't seem to have used it. Having

identified the 4 critical auxiliary variables you might try UK using

"external drift". This avoids having to go back and compute the value of

the regression term at non-data locations.

Although there are a lot of similarities between regression (multivariate)

and kriging, the underlying assumptions are quite different. One should be

a little careful mixing them.

Donald Myers

Xxxxxxxxxxxxxx

Andrew,

I agree with the post that suggested that you try other (nonlinear) forms

of regression if your data is not naturally continuous and/or linear.

Linearizing transforms are often the culprits when residuals are

heteroskedastic. If you are working with continuous data, which you have

transformed to obtain normal distributions, then the advice below may

apply.

Usually, parametric statistical tests like multiple regression are

much more robust to non-normality than they are to heterogeneity of variance

problems, so transforms are generally geared toward creating homoskedastic

residuals rather than normal data. The fan-shaped or

megaphone-shaped residual plot you describe suggests that you need to

transform the data one more time using a log or square root transform.

(Most stats books on linear regression describe this problem and suggest

appropriate transforms based on shape of the residual plot.)

Additionally, multiple regression tests lose power when independent

variables are correlated, which is why PCA is used to remove correlations

before doing multiple regression or alternatively, one variable from a set

of correlated variables is used for the regression test. If the variables

you are using are correlated, your r^2 may be even better than .45.
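A sketch of principal-components regression along these lines (my own Python illustration; the number of components k to keep is up to the analyst):

```python
import numpy as np

def pc_regression(X, y, k):
    """Regress y on the first k principal components of X, one way to
    sidestep correlated predictors."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred matrix gives the principal directions
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                     # component scores
    Z = np.column_stack([np.ones(len(y)), scores])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    yhat = Z @ beta
    r2 = 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
    return beta, r2
```

With k equal to the number of predictors this reproduces ordinary least squares; dropping the small-variance components is what removes the collinearity.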

It's also important to remember that step-wise regression may produce a

"best fit" set of variables that is unique to the particular data set on

which it is used, but the "best fit" set may not generalize to new data

sets. Therefore it should generally be used only to give you a feel for

the data sets and the variables, but you should try to base your published

work on either established theory or personal experience (your own

theories) about the variables involved (i.e., try to test a priori models

rather than post hoc models).

Joyce Witebsky

Xxxxxxxxxxxx

I would suggest assuming Poisson-like distributions for the regression

residuals, which would lead you to fit Generalized Linear (or Additive)

Models for the trend. This deals well with variances that are

proportional to the mean value.
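Such a Poisson trend can be fit with a few lines of iteratively reweighted least squares; this sketch (my own, with a log link, not code from the paper mentioned below) is only meant to show the mechanics:

```python
import numpy as np

def poisson_glm(X, y, iters=25):
    """Fit a Poisson GLM with log link by IRLS: the variance of the
    response is then proportional to (equal to) its mean."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        eta = Xd @ beta
        mu = np.exp(eta)
        # Working response and weights for the Poisson/log-link case
        z = eta + (y - mu) / mu
        W = mu
        WX = Xd * W[:, None]
        beta = np.linalg.solve(Xd.T @ WX, WX.T @ z)
    return beta
```

Each iteration is just a weighted least-squares solve, so no special software is strictly required.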

Residuals from such a regression are non-stationary by nature. Interpolating

them is possible, if you assume a non-stationary (mean-dependent)

covariogram, derived from a stationary correlogram. I wrote a paper (+

software) on this for the geostats conference in Cape Town, which is

available as a PDF on the web. Follow Conference paper Nr. 4

(Pebesma/Bio/Duin) in my publications list at http://www.geog.uu.nl/~pebesma/

--

Edzer Pebesma

Xxxxxxxxxxxxxxxxxxxx

I've done something similar and used blocking or stratification of the

residuals to deal with heteroscedasticity instead of transformation.

Transformation is really not advisable because of having to then add the

kriging variances in the back-transformation. See the pdf file on my

webpage below.

Yetta Jager

WEBpage: http://www.esd.ornl.gov/~zij/

Xxxxxxxxxx

In OLS regression, the assumption is that you have independent observations.

Your description gives me the sense that your observations are highly

spatially autocorrelated. Your correlations are thus invalid. Consider

strategies to get rid of the spatial autocorrelation. The PCA won't do it

since your observations are not of equal worth.

Art Getis

xxxxxxxxxxxxxxxx

Hi

I understand from your email that you are trying to map the amount of

spruce-fir forest type using wall-to-wall coverages of a number of

predictor variables.

Instead of multiple regression, you might find the predictions

provided by classification and regression trees (CART) more

useful. For an example of CART and kriging applied to a

geological data set, see http://www.isaaks.com, and click on

EDA (exploratory data analysis). Here, the response variable is

the amount of copper while the predictors are an assortment of

geological descriptors.
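For readers without access to a CART package, a bare-bones regression tree can be sketched as follows (my own illustration: greedy binary splits minimising within-node squared error; the depth and leaf-size limits are arbitrary):

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=3, min_leaf=5):
    """Grow a CART-style regression tree; leaves store node means."""
    if depth == max_depth or len(y) < 2 * min_leaf:
        return float(np.mean(y))
    best = None
    for j in range(X.shape[1]):                  # try every feature
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(min_leaf, len(y) - min_leaf):
            if xs[i] == xs[i - 1]:
                continue                         # no threshold between ties
            # Total squared error of the two candidate child nodes
            sse = np.var(ys[:i]) * i + np.var(ys[i:]) * (len(y) - i)
            if best is None or sse < best[0]:
                best = (sse, j, (xs[i] + xs[i - 1]) / 2)
    if best is None:
        return float(np.mean(y))
    _, j, thr = best
    left = X[:, j] <= thr
    return (j, thr,
            build_tree(X[left], y[left], depth + 1, max_depth, min_leaf),
            build_tree(X[~left], y[~left], depth + 1, max_depth, min_leaf))

def predict_tree(tree, x):
    """Follow the splits down to a leaf and return its mean."""
    while isinstance(tree, tuple):
        j, thr, lo, hi = tree
        tree = lo if x[j] <= thr else hi
    return tree
```

Because the predictions are node means, trees handle the many zero values and nonconstant variance in the original question more gracefully than a single linear surface.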

Ed Isaaks

