Re: AI-GEOSTATS: LAI (leaf area index) from NDVI
- You might want to consider the implications of using data with
different supports: 3x3 neighborhoods and points. The 3x3
neighborhoods are probably larger than the areas associated with the
LAI field values. Thus the 3x3 mean NDVIs can be considered to be
estimates of the NDVIs at the points, with associated error. Error in
the independent (x) variable leads to an underestimated correlation
and an underestimated ordinary least squares (OLS) regression slope.
There are a number of alternatives to OLS, such as MA (major axis)
and RMA (reduced major axis) regression, that might lead to improved
slope estimates, but I prefer to correct the correlation and slope
estimates using estimates of the precision of the x variable.
You can roughly estimate the precision of the point NDVI by:
1) calculating the standard deviation of the nine pixel NDVIs
associated with each field observation;
2) plotting the standard deviations versus the means to check for
dependence of the variation on the mean; and
3) estimating the standard error of NDVI as the mean standard
deviation if there is no dependence, otherwise considering a
regression of LAI versus log(NDVI) or a log-log regression.
Note that if the "point" area is much smaller than a pixel, the x
error will be underestimated, but fixing this would involve either a
geostatistical analysis of point values for NDVI or estimation
involving fractal analysis.
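The three steps above can be sketched in Python. This is only an illustration under assumptions: the `windows` values are made-up NDVI numbers, not data from this thread, and the helper names are hypothetical. A Pearson correlation between the window means and standard deviations stands in for the visual check in step 2.

```python
import math
import statistics

def window_stats(window):
    """Mean and sample standard deviation of the nine NDVI pixels of one plot."""
    pixels = [v for row in window for v in row]
    return statistics.mean(pixels), statistics.stdev(pixels)

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def ndvi_error_estimate(windows):
    """Return per-plot means, per-plot SDs, and the mean SD (step 3)."""
    stats = [window_stats(w) for w in windows]
    means = [m for m, _ in stats]
    sds = [s for _, s in stats]
    return means, sds, statistics.mean(sds)

# Hypothetical 3x3 NDVI windows for four field plots (made-up numbers).
windows = [
    [[0.40, 0.42, 0.41], [0.39, 0.43, 0.40], [0.41, 0.42, 0.38]],
    [[0.55, 0.60, 0.52], [0.58, 0.57, 0.61], [0.54, 0.56, 0.59]],
    [[0.30, 0.28, 0.33], [0.31, 0.29, 0.35], [0.27, 0.30, 0.32]],
    [[0.70, 0.66, 0.72], [0.69, 0.74, 0.68], [0.71, 0.67, 0.73]],
]

means, sds, s_delta = ndvi_error_estimate(windows)
# Step 2: a correlation far from zero would suggest the spread depends on the mean.
print("correlation of SD with mean:", round(pearson(means, sds), 3))
# Step 3: mean SD as a rough standard error of the point NDVI.
print("estimated NDVI standard error:", round(s_delta, 4))
```

With real data, a scatter plot of `sds` against `means` is still worth looking at, since a single correlation coefficient can hide a nonlinear dependence.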
The error estimate can then be used to correct the estimates of
correlation and slope (my apologies
for cutting and pasting a Word file so that subscripts and
superscripts are lost, and maybe also Greek font, and illustrating
with S commands):
Preliminaries: First, consider regression of variable x (the
independent variable) versus y (the dependent variable). The usual
formula for the slope is:
    b = \sum_i (x_i - m_x)(y_i - m_y) / \sum_i (x_i - m_x)^2
where the summation is over the index i for individual data points,
and the means are m_x and m_y. This formula (section 1.2 in N. Draper
and H. Smith, Applied Regression Analysis, John Wiley & Sons, Inc.,
New York, 1966) is correct, computationally simple, and accurate,
that is, it works well to preserve floating point accuracy. However,
formulae involving descriptive statistics (the correlation or
covariance of x and y, and the standard deviations of x and y) convey
more information about the factors related to the slope:
    b = cor(x,y) s_y / s_x    or    b = cov(x,y) / s_x^2
where one can see that the magnitude of the slope increases with the
correlation and range of the dependent variable y (as measured by the
standard deviation), and decreases with range of the independent
variable. Because the (n-1) divisors cancel, either of these formulae
computed from n data points reproduces the least squares slope
exactly, and that slope is accurate (unbiased) if the usual
assumptions, including accurate values for the independent variable,
are correct. If the range of the
independent variable is inflated by errors, the slope will decrease,
that is, will be biased low.
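As a quick numerical check of the slope formulae above, the following Python sketch computes the slope three ways: from the sums of cross-products, from the correlation and standard deviations, and from the covariance and variance. The data values are made up for illustration, and sample (n-1) divisors are assumed throughout.

```python
import math

def slope_formulas(x, y):
    """Return the slope from sums, from cor*sy/sx, and from cov/sx^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)

    b_sums = sxy / sxx                      # sum-of-products formula

    cov = sxy / (n - 1)                     # sample covariance
    sx = math.sqrt(sxx / (n - 1))           # sample standard deviations
    sy = math.sqrt(syy / (n - 1))
    cor = cov / (sx * sy)

    return b_sums, cor * sy / sx, cov / sx ** 2

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1]

b_sums, b_cor, b_cov = slope_formulas(x, y)
print(b_sums, b_cor, b_cov)
```

The three values agree up to floating point rounding as long as the covariance and the standard deviations use the same divisor.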
Predicting the slope when precise values \xi of the independent
variable are replaced by estimated or measured values x, following
Section 29.56 in M. Kendall and A. Stuart, The Advanced Theory of
Statistics: Volume 2: Inference and Relationship, 4th Edition,
Charles Griffin & Company Limited, London, 1979 (copy in your
mailbox). Let's assume that the measurements are made without bias
and with a precision represented as a standard deviation in error:
the observed measurements (x_1, x_2, ...) of the independent variable
x can be considered as sums of the true values (\xi_1, \xi_2, ...),
which have 0 standard error, plus errors (\delta_1, \delta_2, ...)
with an average of 0 and standard deviation s_\delta. If the errors
are uncorrelated with \xi and y, the least squares regression slope
is
    cov(x,y) / s_x^2 = cov(\xi,y) / (s_\xi^2 + s_\delta^2),
where cov(\xi,y) is the covariance between \xi and y, i.e. the
correlation times the product of the standard deviations of \xi and
y. Now if the least squares slope with no errors is
cov(\xi,y) / s_\xi^2 = 1, then the slope with the errors is:
    cov(\xi,y) / (s_\xi^2 + s_\delta^2)
        = (cov(\xi,y) / s_\xi^2) * [s_\xi^2 / (s_\xi^2 + s_\delta^2)]
        = s_\xi^2 / (s_\xi^2 + s_\delta^2)
        = 1 / [1 + (s_\delta / s_\xi)^2]
The expression on the right is a function of the relative magnitude
s_\delta / s_\xi of the measurement error to the data range for the
independent variable, where standard deviation is the metric. The
range term s_\xi (the standard deviation of the true values) can be
approximated with s_x, the standard deviation of the measured values,
if the range of measurements is large compared to the measurement
errors. Otherwise, correct for the effect of measurement error by
using s_\xi = sqrt(s_x^2 - s_\delta^2), leading (as you have noted)
to (s_x^2 - s_\delta^2) / s_x^2 as the predicted slope. The estimate
for s_\delta, the standard deviation of the measurement errors, is
generally known from an independent source, such as instrument specs
or calibration analysis.
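A small simulation can illustrate the predicted slope. The sketch below is not part of the original analysis: the sample size, the standard deviations, and the seed are arbitrary assumptions, and the errors are drawn as independent normal values. It regresses y on the error-free values and on the error-contaminated values, then compares the second slope with the two predictions described above.

```python
import random
import statistics

def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Assumed values for the illustration.
N = 20000
S_XI = 1.0      # std. dev. of the true x values
S_DELTA = 0.5   # std. dev. of the measurement errors in x
S_NOISE = 0.3   # std. dev. of the scatter in y

rng = random.Random(1)
xi = [rng.gauss(0.0, S_XI) for _ in range(N)]       # true values
y = [v + rng.gauss(0.0, S_NOISE) for v in xi]       # true slope is 1
x = [v + rng.gauss(0.0, S_DELTA) for v in xi]       # measured values

slope_true_x = ols_slope(xi, y)
slope_measured_x = ols_slope(x, y)

# Prediction from the true range, and from the measured range s_x.
predicted = 1.0 / (1.0 + (S_DELTA / S_XI) ** 2)
s_x = statistics.stdev(x)
predicted_from_sx = (s_x ** 2 - S_DELTA ** 2) / s_x ** 2

print("slope with true x:      ", round(slope_true_x, 3))
print("slope with measured x:  ", round(slope_measured_x, 3))
print("predicted (true range): ", round(predicted, 3))
print("predicted (from s_x):   ", round(predicted_from_sx, 3))
```

With these settings the error-free slope should come out close to 1 and the slope with measured x close to the predicted 0.8, up to sampling variation.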
You will note that the predicted slope 1 / [1 + (s_\delta / s_\xi)^2],
where s_\delta is the standard deviation of the measurement errors
and s_\xi that of the true values, is always less than 1. The
mathematical cause is the inflation of the denominator from s_\xi^2
to s_\xi^2 + s_\delta^2. Perhaps what is counter-intuitive is that
the slope is biased by errors with mean zero. Shouldn't the errors
just degrade the precision of the least squares slope? No, because
least squares regression is asymmetrical in the way the independent
and dependent variables are treated: the quantities whose sum is
minimized are the squared residuals, which are distances of points to
the regression line in the y direction. One way to correct the
problem, i.e. predict the slope one would have if the true x values
were known, is by multiplying by [1 + (s_\delta / s_\xi)^2]. Another
approach is to use a least squares technique based on the distances
from the data points to the nearest point on the line.
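Both corrections can be tried on simulated data. The sketch below rests on assumptions that are not from this thread: the data are simulated with a true slope of 1, the error standard deviations are taken as known, and x and y are given equal error variances, which is the situation in which major axis (orthogonal distance) regression is appropriate. The function names are hypothetical.

```python
import math
import random

def moments(x, y):
    """Centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def ols_slope(x, y):
    """Slope minimizing squared vertical (y direction) distances."""
    sxx, _, sxy = moments(x, y)
    return sxy / sxx

def major_axis_slope(x, y):
    """Slope minimizing squared perpendicular distances to the line."""
    sxx, syy, sxy = moments(x, y)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)

# Assumed values for the illustration.
S_XI, S_ERR = 1.0, 0.4
rng = random.Random(7)
xi = [rng.gauss(0.0, S_XI) for _ in range(20000)]   # true x values
x = [v + rng.gauss(0.0, S_ERR) for v in xi]         # measured x
y = [v + rng.gauss(0.0, S_ERR) for v in xi]         # true slope is 1

b_ols = ols_slope(x, y)
b_corrected = b_ols * (1.0 + (S_ERR / S_XI) ** 2)   # multiplicative correction
b_ma = major_axis_slope(x, y)

print("OLS slope:          ", round(b_ols, 3))
print("corrected OLS slope:", round(b_corrected, 3))
print("major axis slope:   ", round(b_ma, 3))
```

The multiplicative correction needs an outside estimate of the x error, while the major axis fit instead assumes something about the ratio of the error variances in x and y, so the choice depends on which of those is better known.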
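You may also note that inflation of the variance and standard
deviation of the independent variable leads to degradation in the
correlation between the independent and dependent variables. With
\xi the true values, x the measured values, and s_\delta the standard
deviation of the errors:
    cor(x,y) = cov(x,y) / (s_x s_y) = cov(\xi,y) / (s_x s_y)
             = cov(\xi,y) / (sqrt(s_\xi^2 + s_\delta^2) s_y)
             = cor(\xi,y) s_\xi / sqrt(s_\xi^2 + s_\delta^2)
             = cor(\xi,y) sqrt(s_x^2 - s_\delta^2) / s_x
             = cor(\xi,y) sqrt(1 - (s_\delta / s_x)^2)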
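Read in the other direction, the relation above gives a correction for an observed correlation: divide it by sqrt(1 - (s_delta/s_x)^2), where s_delta is the standard deviation of the errors in x and s_x the standard deviation of the measured x values. The Python sketch below uses hypothetical function names, and the observed correlation of 0.60 is a made-up value; the two standard deviations are the ones quoted further down in this message.

```python
import math

def attenuation_factor(s_delta, s_x):
    """sqrt(1 - (s_delta/s_x)^2): observed correlation / error-free correlation."""
    ratio_sq = (s_delta / s_x) ** 2
    if ratio_sq >= 1.0:
        raise ValueError("error SD must be smaller than the SD of the measured x")
    return math.sqrt(1.0 - ratio_sq)

def corrected_correlation(r_obs, s_delta, s_x):
    """Estimate the correlation one would see with error-free x values."""
    return r_obs / attenuation_factor(s_delta, s_x)

r_obs = 0.60  # hypothetical observed correlation, for illustration only
r_corr = corrected_correlation(r_obs, s_delta=0.326, s_x=0.616)
print("attenuation factor:   ", round(attenuation_factor(0.326, 0.616), 3))
print("corrected correlation:", round(r_corr, 3))
```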
If the independent variable is a transformed variable, then the
standard deviations are statistics of the transformed variable rather
than the original variable. For example, your independent variable
was the log-transformed predicted leaf area per vine (log(LA)), using
a calibration equation from regression analysis with this same
variable (log(LA)) as the dependent variable and pruning weight as
the independent variable. So the standard deviation of the
calibration regression residuals is a good estimate of s_\delta. This
predicted log(LA) is the independent variable in the validation
regression you are concerned with, that is, the one with a slope of
less than one. So a good estimate of s_\xi is either the standard
deviation s of the validation values for log(LA) or the corrected
value sqrt(s^2 - s_\delta^2).
measurement std. dev. (s_\delta) = 0.326
std. dev. of independent variable (s) = 0.616
> 1/(1 + .326^2/.616^2)  # predicted slope
[1] 0.7812045
> sqrt(.616^2 - .326^2)  # corrected std dev of indep var
[1] 0.5226662
> 1/(1 + .326^2/.522^2)  # predicted slope with corrected std dev
[1] 0.7194107
The least squares slope was 0.67 +/- 0.14.
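For readers without S, the same arithmetic can be redone in Python. This sketch only restates the numbers quoted above; the variable names are hypothetical, and the last step applies the multiplicative correction described earlier to the observed slope. Using the unrounded corrected standard deviation gives a predicted slope of about 0.720 rather than the 0.7194 obtained with the rounded value .522.

```python
import math

S_DELTA = 0.326            # measurement std. dev. (from the message)
S_X = 0.616                # std. dev. of the independent variable (from the message)
B_OBS, B_ERR = 0.67, 0.14  # least squares slope and its quoted +/- value

def predicted_slope(s_delta, s_range):
    """Slope expected when the error-free slope is 1."""
    return 1.0 / (1.0 + (s_delta / s_range) ** 2)

s_xi = math.sqrt(S_X ** 2 - S_DELTA ** 2)        # corrected std. dev.
pred_uncorrected = predicted_slope(S_DELTA, S_X)
pred_corrected = predicted_slope(S_DELTA, s_xi)

# Observed slope scaled up by [1 + (s_delta/s_xi)^2].
b_adjusted = B_OBS * (1.0 + (S_DELTA / s_xi) ** 2)

print("predicted slope:                     ", round(pred_uncorrected, 4))
print("corrected std. dev.:                 ", round(s_xi, 4))
print("predicted slope, corrected std. dev.:", round(pred_corrected, 4))
print("observed slope after correction:     ", round(b_adjusted, 3))
print("observed within +/- of prediction:   ", abs(B_OBS - pred_corrected) <= B_ERR)
```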
>I'd like to know if the approach I used to derive LAI from NDVI is correct.--
>STEP 1: I've got 46 field point values of LAI (leaf area index, namely the
>cover of plant leafs)
>STEP 2: I derived the NDVI index from a multispectral image.
>STEP 3: For every field plot I calculated the mean NDVI of 3x3 neighbour
>STEP 4: I made a regression between mean_NDVI and LAI.
>STEP 5: r^2 was low (0.34), r being 0.70, but t, measured as
>r/sqr((1-r^2)/(n-2)) was over the minimum t 2.7, being my t 5.75
>STEP 6: Since the correlation was highly significant p<0.01 I applied the
>equation of the regression line y= 4.9053x + 0.2406 where y was LAI and x
>was NDVI to the NDVI map, obtaining the LAI map
>STEP 7: I made a control on the accuracy of the model by measuring the mbe
>(mean bias error) calculated as the mean of single errors for every plot
>(46 measures): mean of P-O
>where P was the estimated LAI value and O the observed, by obtaining a mbe
>1. Could I apply the equation as in step 6?
>2. Could I control my model by using the same input observed values as in
>* To post a message to the list, send it to ai-geostats@...
>* As a general service to the users, please remember to post a
>summary of any useful responses to your questions.
>* To unsubscribe, send an email to majordomo@... with no subject
>and "unsubscribe ai-geostats" followed by "end" on the next line in
>the message body. DO NOT SEND Subscribe/Unsubscribe requests to the
>* Support to the list is provided at http://www.ai-geostats.org
NASA/Ames Research Center 242-4
Moffett Field, CA 94035-1000
(650)604-3328 FAX 604-4680
but I didn't give information about spatial resolutions.
Plots have a dimension of 10*10 meters. The image (QuickBird) has a
resolution of 3 meters.
Now, since I calculated mean NDVI of 3*3 cells, I think that spatial
almost the same.