I'm wondering if anyone might have some advice on this one. I'm trying to
model the spatial distribution of the importance of a certain forest type
using a combination of multiple linear regression and kriging (roughly,
universal kriging). I have wall-to-wall coverages of the independent
variables: precipitation, slope, elevation, aspect, road density, distance
to roads, and various satellite variables (vegetation indices). I also have
~700 spatially referenced field samples with a measurement of the dependent
variable: amount of spruce-fir forest type. I transformed all of the
variables, trying to make them as close to normal as possible.

I've found, using stepwise linear regression, that the best model uses 4 of
the independent variables and has an r^2 of about .45 (which I found kind
of exciting, given the amount of noise in this sort of thing). My intention
is to then krige the residuals from this model, assuming they exhibit
spatial autocorrelation, which they do. Adding the estimates from both
procedures will hopefully yield a "better" set of estimates than either
procedure alone.
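
For concreteness, the two-step procedure described above (regression trend
plus kriged residuals) could be sketched roughly as follows. This is only a
minimal illustration on synthetic data: the function names, the exponential
variogram, and its nugget/sill/range values are all assumptions made for the
example, not anything fitted to the spruce-fir data, and in practice the
variogram would be estimated from the regression residuals. Note also that
OLS followed by ordinary kriging of the residuals only approximates
universal kriging.

```python
import numpy as np

def exp_variogram(h, nugget, sill, vrange):
    """Exponential semivariogram; parameters are assumed here, not fitted."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / vrange))

def ordinary_krige(coords, values, targets, nugget=0.0, sill=1.0, vrange=20.0):
    """Ordinary kriging of `values` at `targets` with a fixed variogram."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, nugget, sill, vrange)
    A[np.arange(n), np.arange(n)] = 0.0   # gamma(0) = 0 on the diagonal
    A[n, n] = 0.0                         # corner for the Lagrange multiplier
    d0 = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=2)
    B = np.ones((n + 1, len(targets)))    # last row enforces sum(weights) = 1
    B[:n, :] = exp_variogram(d0, nugget, sill, vrange)
    weights = np.linalg.solve(A, B)[:n]   # shape (n, n_targets)
    return weights.T @ values

def regression_krige(X, y, coords, X_new, coords_new, **variogram):
    """OLS trend at the new sites plus ordinary-kriged OLS residuals."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    trend_new = np.column_stack([np.ones(len(X_new)), X_new]) @ beta
    return trend_new + ordinary_krige(coords, resid, coords_new, **variogram)

# Synthetic stand-in data: 60 plots, 4 predictors, a smooth spatial signal.
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 100, size=(n, 2))
X = rng.normal(size=(n, 4))
y = 1.0 + X @ np.array([0.5, -0.3, 0.2, 0.1]) + np.sin(coords[:, 0] / 15.0)

# With a zero nugget, kriging is an exact interpolator, so predicting back
# at sampled plots should reproduce the observed values.
pred = regression_krige(X, y, coords, X[:5], coords[:5],
                        nugget=0.0, sill=1.0, vrange=30.0)
print(np.round(pred - y[:5], 6))
```

Predicting at unsampled cells works the same way: pass the wall-to-wall
predictor values and cell coordinates as `X_new` and `coords_new`.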

My worry, however, is that when I examine the residuals from my multiple
linear regression, the plot of the residuals (y axis) vs. the fitted values
(x axis) indicates heteroscedasticity: the residuals are concentrated
around 0 at low values of x and spread out as x increases (a megaphone
shape). They are normally distributed around 0, however, and do not show
any spatial pattern.
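
The megaphone shape can also be put into numbers rather than judged by eye.
The sketch below, on synthetic data whose noise grows with x, does two
things: it reports the residual standard deviation within bins of the
fitted values, and it computes a Breusch-Pagan-style statistic (n times the
R^2 from regressing the squared residuals on the fitted values), which is
compared against the 5% chi-square cutoff for 1 degree of freedom (3.84).
All names and the simulated data are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept; returns beta, fitted, resid."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    fitted = Xd @ beta
    return beta, fitted, y - fitted

def spread_by_bin(fitted, resid, n_bins=5):
    """Residual SD within equal-count bins of the fitted values."""
    order = np.argsort(fitted)
    return np.array([resid[idx].std(ddof=1)
                     for idx in np.array_split(order, n_bins)])

def bp_statistic(fitted, resid):
    """n * R^2 from regressing squared residuals on the fitted values."""
    u = resid ** 2
    _, u_hat, _ = ols(fitted, u)
    r2 = 1.0 - np.sum((u - u_hat) ** 2) / np.sum((u - u.mean()) ** 2)
    return len(u) * r2

# Synthetic megaphone: the noise SD is proportional to x.
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=700)
y = 2.0 + 0.5 * x + rng.normal(size=700) * 0.3 * x

_, fitted, resid = ols(x, y)
spread = spread_by_bin(fitted, resid)
lm = bp_statistic(fitted, resid)
print("residual SD by fitted-value bin:", np.round(spread, 2))
print("n * R^2 statistic:", round(float(lm), 1), "(5% cutoff, 1 df: 3.84)")
```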

I have transformed the heck out of everything, and I have tried in a rather
clumsy way to implement weighted least squares regression (in Minitab; the
online help is very weak on this!), with poor results (the residual plot
remains very much the same). I also removed outliers to the point where I
felt a little guilty, but without much impact (although Minitab still tells
me that there are lots of "unusual observations").
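
As a point of comparison for the weighted least squares attempt, here is a
minimal sketch of one common recipe, written with NumPy instead of Minitab:
fit OLS, regress the absolute residuals on the fitted values to estimate how
the spread grows, and refit with weights 1 / (estimated spread)^2. The data
are simulated and every name is hypothetical. One thing the sketch makes
visible: the raw residuals from a weighted fit keep their megaphone shape,
because weighting does not remove the unequal variance in y. It is the
weighted residuals (raw residual times the square root of the weight) that
should look roughly even if the weights are reasonable.

```python
import numpy as np

def design(X):
    """Add an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def wls(X, y, w):
    """Weighted least squares via row scaling by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design(X) * sw[:, None], y * sw, rcond=None)
    return beta

def spread_ratio(fitted, resid):
    """Residual SD in the top third of fitted values over the bottom third."""
    low, _, high = np.array_split(np.argsort(fitted), 3)
    return resid[high].std(ddof=1) / resid[low].std(ddof=1)

# Synthetic data with noise SD increasing in x (an assumed form).
rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=700)
y = 2.0 + 0.5 * x + rng.normal(size=700) * (0.2 + 0.3 * x)

# Step 1: ordinary fit (WLS with unit weights).
beta_ols = wls(x, y, np.ones_like(x))
fitted_ols = design(x) @ beta_ols
resid_ols = y - fitted_ols

# Step 2: model the spread by regressing |residual| on the fitted value,
# with a floor so that no weight blows up.
gamma = wls(fitted_ols, np.abs(resid_ols), np.ones_like(x))
s_hat = design(fitted_ols) @ gamma
s_hat = np.maximum(s_hat, 0.1 * np.median(np.abs(resid_ols)))
w = 1.0 / s_hat ** 2

# Step 3: weighted refit, then compare raw and weighted residual spread.
beta_wls = wls(x, y, w)
fitted_wls = design(x) @ beta_wls
resid_wls = y - fitted_wls
ratio_raw = spread_ratio(fitted_wls, resid_wls)
ratio_weighted = spread_ratio(fitted_wls, resid_wls * np.sqrt(w))
print("top/bottom spread ratio, raw residuals:     ", round(float(ratio_raw), 2))
print("top/bottom spread ratio, weighted residuals:", round(float(ratio_weighted), 2))
```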

One clue: the dependent variable has a lot of zero values. About 200 of the
700 plots have a measurement of "0 spruce-fir". I removed the zero values,
then ran the regression on the remaining values, which gave a nearly normal
distribution, but the residual plot did not change much (still the
megaphone shape). I looked at scatterplots of each of the independents vs.
the dependent, and saw a little evidence of nonconstant variance in the x
across values of y, but it didn't seem dramatic. I also plotted the
absolute values of the residuals vs. the independents, and didn't see any
strong relationship suggesting non-homogeneous variance. The correlation
coefficients of the independents vs. the dependent are all between .3 and
.5; the scatterplots are pretty fat...

My next course of action is to try a principal components analysis on the
independent variables, and to use a PC in the regression analysis. I was
also going to look into some sort of nonparametric regression. I'd really
like to just stick with the model I came up with, however...
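
The principal components idea could look something like the sketch below:
standardize the predictors, take the leading components from an SVD, and
regress the response on the component scores. The data are synthetic and
the function and variable names are made up for the example. Using all of
the components reproduces the ordinary regression fit exactly, so any
change comes only from dropping components; whether that does anything for
the residual pattern would have to be checked on the real data.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression on the first k standardized PCs."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd                       # standardized predictors
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:k].T                   # component scores, shape (n, k)
    D = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    explained = s ** 2 / np.sum(s ** 2)     # share of variance per component
    return coef, D @ coef, explained

def r_squared(y, fitted):
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in: 6 correlated predictors driven by 2 latent factors.
rng = np.random.default_rng(3)
n = 700
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(n, 6))
y = 1.0 + latent[:, 0] - 0.5 * latent[:, 1] + rng.normal(size=n)

_, fitted_1, explained = pcr_fit(X, y, k=1)
_, fitted_all, _ = pcr_fit(X, y, k=X.shape[1])

# Plain OLS on the original predictors, for comparison.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted_ols = Xd @ beta

print("variance share per PC:", np.round(explained, 3))
print("r^2 with 1 PC:  ", round(float(r_squared(y, fitted_1)), 3))
print("r^2 with all PCs:", round(float(r_squared(y, fitted_all)), 3))
print("r^2 with OLS:    ", round(float(r_squared(y, fitted_ols)), 3))
```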

Does anyone have any good ideas? Should I worry about the heteroscedasticity
of the data, given my goals? From what I read, it seems like one mostly
worries about heteroscedasticity when considering confidence intervals.
However, I'd like my predictions not to be biased.

Sorry if this is an inane question!

I'll post any responses...

Thank you,

Andrew
