Re: GEOSTATS: heteroscedasticity in mult. lin. regression
- and Stroup (1997), JABES 2: 157-178. Briefly, this may allow you to model a
binomial outcome (or other members of the exponential family) that exhibits
spatial covariance. I say "may" because you may need to face a few caveats and
go through a little trial and error. Allison (1999, pp. 206-213) in SAS's
Logistic Regression text covers some concerns, as does the macro documentation
(see below). The big deal, however, is that you should be able to avoid the
outlier and transformation approaches you've used to date.

If you have access to SAS, GLiMM can be performed using the GLIMMIX macro. The
documentation is provided in the macro and in the PROC MIXED documentation
(PROC MIXED is the procedure that GLIMMIX calls). The macro is available from
www.sas.com and as a sample under STAT.

Email or call me if you have questions.

Brian Gray
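
As a rough illustration of the non-spatial core of this suggestion, here is a
minimal Python sketch. It is not the GLIMMIX macro and it has no spatial
covariance term; it only fits a binomial GLM with a logit link by iteratively
reweighted least squares (Fisher scoring). The data, the single covariate, and
the names fit_logistic, x, successes and trials are all hypothetical. It also
assumes the response can be expressed as a count out of a known number of
subplots per plot, which the original post does not state.

```python
import math


def fit_logistic(x, successes, trials, iterations=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to binomial counts by Fisher scoring."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0              # score vector (gradient of log-likelihood)
        h00 = h01 = h11 = 0.0      # Fisher information matrix entries
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)  # binomial variance acts as the weight
            r = yi - ni * p         # raw residual on the count scale
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system (information) * (step) = (score) directly.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: one standardized covariate (say, elevation) and the
# number of subplots out of 20 classified as spruce-fir at each plot.
x = [-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
successes = [0, 1, 2, 3, 5, 8, 10, 13, 16, 17]
trials = [20] * len(x)

b0, b1 = fit_logistic(x, successes, trials)

# Fitted probabilities and the variance the model implies for each count.
fitted_p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
fitted_var = [n * p * (1.0 - p) for n, p in zip(trials, fitted_p)]

print("intercept, slope:", b0, b1)
for xi, p, v in zip(x, fitted_p, fitted_var):
    print(xi, round(p, 3), round(v, 3))
```

The point of the sketch is that the binomial variance n*p*(1-p) changes with
the fitted mean, so unequal spread across fitted values is part of the model
rather than a violation of it, and plots with zero counts need no special
handling.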
NEFIA NEFIA wrote:
> Hello list!--
> I'm wondering if anyone might have some advice on this one: I'm basically
> trying to model the spatial distribution of the importance of a certain
> forest type using a combination of multiple linear regression and kriging
> (~universal kriging). I have wall-to-wall coverages of the independent
> variables: precipitation, slope, elevation, aspect, road density, distance
> to roads, and various satellite variables (vegetation indices); I have ~700
> spatially referenced field samples where there is a measurement of the
> dependent variable: amount of spruce-fir forest type. I transformed all of
> the variables, trying to make them as normal as possible.
> I've found, using stepwise linear regression, that the best model uses 4 of
> the independent variables, and has an r^2 of about .45 (which I found kind of
> exciting, given the amount of noise in this sort of thing). My intention is
> to then krig the residuals of the estimates from this model, assuming they
> exhibit spatial autocorrelation, which they do. Adding the estimates from
> both procedures will hopefully yield me a "better" set of estimates than
> either procedure alone.
> My worry, however, is that when I examine the residuals from my multiple
> linear regression, I find that the plot of the residuals (y axis) vs. the
> fitted value (x axis) indicates heteroscedasticity (they are more
> concentrated around 0 at low values of x, and spread out as x increases (a
> megaphone form)). They are normally distributed around 0, however, and do
> not show any spatial pattern.
> I have transformed the heck out of everything, and I have tried in a rather
> clumsy way to implement weighted least squares regression (in Minitab--the
> online help is very weak on this!), with poor results (the residual plot
> remains very much the same). I also removed outliers to the point where I
> felt a little guilty, but without much impact (although Minitab still tells
> me that there are lots of "unusual observations").
> One clue: the dependent variable has all sorts of zero values--there are
> about 200 of the 700 that have a measurement of "0 spruce-fir" found at
> the plot. I removed the zero values, then ran the regression on the remaining
> values, yielding a nearly normal distribution, but the residual plot did not
> change much (still the megaphone shape). I looked at scatterplots of all of
> the independents vs. the dependent, and saw a little evidence of nonconstant
> variance in the x across values of y, but it didn't seem dramatic. I also
> plotted the absolute values of the residuals vs. the independents, and
> didn't see any crazy relationship in terms of nonhomogeneous variance.
> The correlation coefficients of the independents vs. the dependent are all
> between .3 and .5; the scatterplots are pretty fat...
> My next course of action is to try doing a principal components analysis on
> the independent variables, and using a pc in the regression analysis. I was
> also going to look into some sort of nonparametric regression. I'd really
> like to just stick with the model I came up with, however....
> Does anyone have any good ideas? Should I worry about the
> heteroscedasticity of the data, given my goals? From what I read, it seems
> like one mostly worries about heteroscedasticity when considering confidence
> intervals. However, I'd like my predictions not to be biased.
> Sorry if this is an inane question!
> I'll post any responses...
> Thank you,
> *To post a message to the list, send it to ai-geostats@....
> *As a general service to list users, please remember to post a summary
> of any useful responses to your questions.
> *To unsubscribe, send email to majordomo@... with no subject and
> "unsubscribe ai-geostats" in the message body.
> DO NOT SEND Subscribe/Unsubscribe requests to the list!
* Brian R. Gray
* Department of Epidemiology and Biostatistics
* School of Public Health
* University of South Carolina
* Columbia, SC 29208
* phone (803) 777-1765; fax (803) 777-8769; email brgray@...