- You might want to consider the implications of using data with different supports: 3x3 neighborhoods and points. The 3x3 neighborhoods are probably larger than the areas associated with the LAI field values, so the 3x3 mean NDVIs can be considered estimates, with associated error, of the NDVI at the points. Error in the independent (x) variable leads to an underestimated correlation and an underestimated ordinary least squares (OLS) regression slope.

There are a number of alternatives to OLS, such as major axis (MA) and reduced major axis (RMA) regression, that might lead to improved slope estimates, but I prefer to correct the correlation and slope estimates using estimates of the precision of the x variable.
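A small simulation makes the attenuation concrete. This is only a sketch with made-up numbers (a hypothetical error-free NDVI `true_x` with SD 0.1, LAI with a true slope of 5, and NDVI error with SD 0.05); it compares the OLS slope and the correlation computed with and without noise added to x.

```python
import random
import statistics


def slope_and_r(x, y):
    """Return the OLS slope of y on x and the Pearson correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 20000
# Hypothetical error-free NDVI values (SD 0.1) and LAI with a true slope of 5.
true_x = [rng.gauss(0.5, 0.1) for _ in range(n)]
y = [5.0 * t + rng.gauss(0.0, 0.2) for t in true_x]
# The same NDVI observed with mean-zero error of SD 0.05.
obs_x = [t + rng.gauss(0.0, 0.05) for t in true_x]

slope_true, r_true = slope_and_r(true_x, y)
slope_obs, r_obs = slope_and_r(obs_x, y)
print(f"slope {slope_true:.2f} -> {slope_obs:.2f}, r {r_true:.2f} -> {r_obs:.2f}")
```

With these settings the slope should drop to roughly 0.8 of its true value, since 0.1^2 / (0.1^2 + 0.05^2) = 0.8, even though the added errors average zero.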

You can roughly estimate the precision of the point NDVI by:
1) calculating the standard deviation of the nine pixel NDVIs associated with each field observation;
2) plotting the standard deviations versus the means to check whether the variation depends on magnitude;
3) estimating the standard error of NDVI as the mean standard deviation if there is no dependence, and otherwise considering a regression of LAI versus log(NDVI) or a log-log regression.
Note that if the "point" area is much smaller than a pixel, the x error will be underestimated, but fixing this would involve either a geostatistical analysis of point values for NDVI or estimation involving fractal analysis.
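The three steps above can be sketched as follows, assuming the nine pixel values around each plot have already been extracted. The `windows` values are hypothetical. The correlation between window means and window SDs is only a rough numeric companion to the plot suggested in step 2, not a substitute for looking at it.

```python
import statistics


def pearson(x, y):
    """Pearson correlation, written out so it runs on Python 3.8+."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical 3x3 NDVI windows: the nine pixel values around each field plot.
windows = [
    [0.61, 0.63, 0.60, 0.62, 0.64, 0.61, 0.59, 0.62, 0.63],
    [0.45, 0.47, 0.44, 0.46, 0.48, 0.45, 0.43, 0.47, 0.46],
    [0.72, 0.70, 0.74, 0.71, 0.73, 0.69, 0.72, 0.75, 0.70],
    [0.38, 0.40, 0.37, 0.41, 0.39, 0.36, 0.40, 0.38, 0.42],
]

# 1) Mean and standard deviation of the nine pixels for each plot.
means = [statistics.fmean(w) for w in windows]
sds = [statistics.stdev(w) for w in windows]

# 2) Does the variation depend on magnitude? Plot sds against means;
#    this correlation is only a rough numeric summary of that plot.
dependence = pearson(means, sds)

# 3) If there is no clear dependence, take the mean SD as the NDVI error s_d.
s_d = statistics.fmean(sds)
print(f"r(mean, sd) = {dependence:.2f}, s_d = {s_d:.4f}")
```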

The error estimate can then be used to correct the estimates of correlation and slope (my apologies for cutting and pasting from a Word file, so that subscripts and superscripts are lost, and maybe also the Greek font, and for illustrating with S commands):

Preliminaries: First, consider the regression of y (the dependent variable) on x (the independent variable). The usual formula for the slope is:

$$ b = \frac{\sum_i (x_i - m_x)(y_i - m_y)}{\sum_i (x_i - m_x)^2} \qquad (1) $$

where the summation is over the index i of the individual data points, and $m_x$ and $m_y$ are the means. This formula (Section 1.2 in N. Draper and H. Smith, Applied Regression Analysis, John Wiley & Sons, Inc., New York, 1966) is correct, computationally simple and accurate, that is, it works well to preserve floating-point accuracy. However, formulae involving descriptive statistics (the correlation or covariance of x and y, and the standard deviations of x and y) convey more information about the factors related to the slope:

$$ b = \mathrm{cor}(x,y)\,\frac{s_y}{s_x} = \frac{\mathrm{cov}(x,y)}{s_x^2} \qquad (2) $$

where one can see that the magnitude of the slope increases with the correlation and with the range of the dependent variable y (as measured by its standard deviation), and decreases with the range of the independent variable. If one of the formulae in (2) is used with n data points, it will be accurate (unbiased) if multiplied by the square root of (n-2)/(n-1), to correct for the effect of using estimated rather than "true" means, and if the usual assumptions, including accurate values for the independent variable, hold. If the range of the independent variable is inflated by errors, the slope will decrease, that is, it will be biased low.
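To check numerically that formula (1) and the two forms in (2) describe the same quantity, here is a sketch on a handful of made-up NDVI/LAI pairs. With all sample statistics computed using the n - 1 divisor, the three values agree to rounding error; the function names are illustrative only.

```python
import statistics


def slope_eq1(x, y):
    """Formula (1): sum of cross-products over the sum of squares of x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def slope_eq2(x, y):
    """Both forms of formula (2), from sample statistics with the n - 1 divisor."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    cor = cov / (sx * sy)
    return cor * sy / sx, cov / sx ** 2


x = [0.31, 0.42, 0.47, 0.55, 0.58, 0.66, 0.71]  # hypothetical NDVI
y = [1.6, 2.3, 2.4, 2.9, 3.1, 3.5, 3.7]         # hypothetical LAI
b1 = slope_eq1(x, y)
b2_cor, b2_cov = slope_eq2(x, y)
print(f"(1): {b1:.6f}  (2) cor*sy/sx: {b2_cor:.6f}  (2) cov/sx^2: {b2_cov:.6f}")
```

Formula (1) is the one to compute with; the forms in (2) are for reasoning about what drives the slope.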

Predicting the slope when precise values $\xi$ of the independent variable are replaced by estimated or measured values x: the following is based on Section 29.56 of M. Kendall and A. Stuart, The Advanced Theory of Statistics, Volume 2: Inference and Relationship, 4th Edition, Charles Griffin & Company Limited, London, 1979 (copy in your mailbox). Let's assume that the measurements are made without bias and with a precision represented as a standard deviation of the error: the observed measurements $(x_1, x_2, \ldots)$ of the independent variable can be considered as sums of the true values $(\xi_1, \xi_2, \ldots)$, which have no error, plus errors $(d_1, d_2, \ldots)$ with mean 0 and standard deviation $s_d$. If the errors are uncorrelated with the true values and with y, then $\mathrm{cov}(x,y) = \mathrm{cov}(\xi,y)$ and $s_x^2 = s_\xi^2 + s_d^2$, so the least squares regression slope is

$$ \frac{\mathrm{cov}(x,y)}{s_x^2} = \frac{\mathrm{cov}(\xi,y)}{s_\xi^2 + s_d^2} $$

where $\mathrm{cov}(\xi,y)$ is the covariance between $\xi$ and y, i.e. the correlation times the product of the standard deviations of $\xi$ and y. Now if the least squares slope with no errors is $\mathrm{cov}(\xi,y)/s_\xi^2 = 1$, then the slope with the errors is:

$$ \frac{\mathrm{cov}(\xi,y)}{s_\xi^2 + s_d^2} = \frac{\mathrm{cov}(\xi,y)}{s_\xi^2}\cdot\frac{s_\xi^2}{s_\xi^2 + s_d^2} = \frac{s_\xi^2}{s_\xi^2 + s_d^2} = \frac{1}{1 + (s_d/s_\xi)^2} \qquad (3) $$

The expression on the right is a function of the relative magnitude $s_d/s_\xi$ of the measurement error to the data range of the independent variable, with the standard deviation as the metric. The range term $s_\xi$ can be approximated with $s_x$, the standard deviation of the measured values, if the range of the measurements is large compared to the measurement errors. Otherwise, correct for the effect of measurement error by using $s_\xi \approx \sqrt{s_x^2 - s_d^2}$, leading (as you have noted) to $(s_x^2 - s_d^2)/s_x^2$ as the predicted slope. The estimate of $s_d$ is generally known from an independent source, such as instrument specifications or a calibration analysis.
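Formula (3) and the corrected range term can be written as two small functions and checked against a simulation. The values used (true slope 1, $s_\xi$ = 0.5, $s_d$ = 0.3) are arbitrary choices for illustration, and `predicted_slope` and `corrected_s_xi` are hypothetical helper names, not part of any library.

```python
import math
import random
import statistics


def predicted_slope(s_d, s_xi):
    """Formula (3): expected OLS slope when the error-free slope is 1."""
    return 1.0 / (1.0 + (s_d / s_xi) ** 2)


def corrected_s_xi(s_x, s_d):
    """Estimate s_xi from the observed SD s_x as sqrt(s_x^2 - s_d^2)."""
    if s_d >= s_x:
        raise ValueError("s_d must be smaller than the observed SD s_x")
    return math.sqrt(s_x ** 2 - s_d ** 2)


# Simulation with illustrative values: true slope 1, s_xi = 0.5, s_d = 0.3.
rng = random.Random(1)
xi = [rng.gauss(0.0, 0.5) for _ in range(50000)]   # error-free x
x = [v + rng.gauss(0.0, 0.3) for v in xi]          # x measured with error
y = [v + rng.gauss(0.0, 0.1) for v in xi]          # y = xi + noise
mx, my = statistics.fmean(x), statistics.fmean(y)
ols = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

theory = predicted_slope(0.3, 0.5)                  # 1 / 1.36
s_x = statistics.stdev(x)
# Using the corrected s_xi; algebraically this equals (s_x^2 - s_d^2) / s_x^2.
from_data = predicted_slope(0.3, corrected_s_xi(s_x, 0.3))
print(f"OLS {ols:.3f}, formula (3) {theory:.3f}, with corrected s_xi {from_data:.3f}")
```

The simulated OLS slope should sit close to 1/1.36 (about 0.735), and the version that only uses the observed $s_x$ plus a known $s_d$ recovers nearly the same prediction.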

You will note that the slope predicted with (3) is always less than 1 when $s_d > 0$. The mathematical cause is the inflation of the denominator from $s_\xi^2$ to $s_\xi^2 + s_d^2$, where $s_\xi$ is the standard deviation of the error-free values. Perhaps what is counter-intuitive is that the slope is biased by errors that average zero. Shouldn't the errors just degrade the precision of the least squares slope? No, because least squares regression treats the independent and dependent variables asymmetrically: the sum of squares to be minimized consists of the squared residuals, which are the distances from the points to the regression line in the y direction. One way to correct the problem, i.e. to predict the slope one would have if the true x values were known, is to multiply the observed slope by $[1 + (s_d/s_\xi)^2]$. Another approach is to use a least squares technique based on the distances from the data points to the nearest point on the line.
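Both remedies can be sketched briefly, under stated assumptions. `corrected_slope` multiplies the observed slope by $[1 + (s_d/s_\xi)^2]$, estimating $s_\xi^2$ as $s_x^2 - s_d^2$. `major_axis_slope` is the standard closed-form orthogonal (major axis) slope, one version of the "nearest point on the line" approach; it implicitly assumes comparable error variances in x and y on the scales used, which the data do not guarantee. Both names are hypothetical.

```python
import math
import statistics


def corrected_slope(b_obs, s_d, s_x):
    """Multiply the observed OLS slope by 1 + (s_d/s_xi)^2, estimating
    s_xi^2 as s_x^2 - s_d^2 from the observed SD s_x."""
    s_xi2 = s_x ** 2 - s_d ** 2
    if s_xi2 <= 0:
        raise ValueError("s_d must be smaller than the observed SD s_x")
    return b_obs * (1.0 + s_d ** 2 / s_xi2)


def major_axis_slope(x, y):
    """Orthogonal (major axis) regression slope: the line minimizing the sum
    of squared perpendicular distances from the points to the line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("closed form not applicable when the covariance is zero")
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)


# Made-up check: points exactly on y = 2x + 1 give a major axis slope of 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * v + 1.0 for v in xs]
print(major_axis_slope(xs, ys), corrected_slope(0.75, 0.3, 0.6))
```

The multiplicative correction needs an external estimate of $s_d$; the major axis slope needs none, but it changes if x or y is rescaled, which is one reason to prefer the explicit correction when $s_d$ is known.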

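You may also note that inflation of the variance and standard deviation of the independent variable degrades the correlation between the independent and dependent variables (with $\xi$ the error-free values and $s_x^2 = s_\xi^2 + s_d^2$):

$$
\begin{aligned}
\mathrm{cor}(x,y) &= \frac{\mathrm{cov}(x,y)}{s_x s_y} = \frac{\mathrm{cov}(\xi,y)}{s_x s_y} = \frac{\mathrm{cov}(\xi,y)}{\sqrt{s_\xi^2 + s_d^2}\, s_y} \\
&= \mathrm{cor}(\xi,y)\,\frac{s_\xi}{\sqrt{s_\xi^2 + s_d^2}} = \mathrm{cor}(\xi,y)\,\frac{\sqrt{s_x^2 - s_d^2}}{s_x} = \frac{\mathrm{cor}(\xi,y)}{\sqrt{1 + (s_d/s_\xi)^2}}
\end{aligned}
$$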
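The correlation relation above can be applied in either direction. This sketch uses hypothetical helper names and made-up numbers (observed r = 0.60, $s_x$ = 0.616, $s_d$ = 0.326). If $s_d$ is overestimated, the corrected correlation can exceed 1, which is a sign that the error estimate is too large.

```python
import math


def attenuated_cor(cor_true, s_d, s_xi):
    """Correlation expected with x errors: cor(xi, y) / sqrt(1 + (s_d/s_xi)^2)."""
    return cor_true / math.sqrt(1.0 + (s_d / s_xi) ** 2)


def corrected_cor(cor_obs, s_d, s_x):
    """Invert the attenuation using the observed SD s_x:
    cor(xi, y) = cor(x, y) * s_x / sqrt(s_x^2 - s_d^2)."""
    if s_d >= s_x:
        raise ValueError("s_d must be smaller than the observed SD s_x")
    return cor_obs * s_x / math.sqrt(s_x ** 2 - s_d ** 2)


# Hypothetical numbers: observed r = 0.60, s_x = 0.616, s_d = 0.326.
r_fixed = corrected_cor(0.60, 0.326, 0.616)
s_xi = math.sqrt(0.616 ** 2 - 0.326 ** 2)
# Attenuating the corrected value again should give back the observed 0.60.
print(round(r_fixed, 3), round(attenuated_cor(r_fixed, 0.326, s_xi), 3))
```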

If the independent variable is a transformed variable, then the standard deviations are statistics of the transformed variable rather than of the original variable. For example, your independent variable was the log-transformed predicted leaf area per vine (log(LA)), obtained from a calibration equation estimated by regression with this same variable (log(LA)) as the dependent variable and pruning weight as the independent variable. So the standard deviation of the calibration regression residuals is a good estimate of $s_d$. This predicted log(LA) is the independent variable in the validation regression you are concerned with, that is, the one with a slope of less than one. So a good estimate of $s_\xi$ is either the standard deviation s of the validation values of log(LA) or the corrected value $\sqrt{s^2 - s_d^2}$.
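A sketch of that recipe, with all data hypothetical: fit the calibration regression of log(LA) on pruning weight, take the residual standard deviation (n - 2 divisor) as $s_d$ on the log scale, and combine it with the SD s of the predicted log(LA) values used in the validation regression. Everything stays on the transformed scale.

```python
import math
import statistics


def residual_sd(x, y):
    """SD of the OLS residuals of y on x, using the n - 2 divisor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    a0 = my - b * mx
    rss = sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))
    return math.sqrt(rss / (len(x) - 2))


# Hypothetical calibration data: pruning weight vs measured leaf area per vine.
pruning = [0.8, 1.1, 1.3, 1.6, 1.9, 2.2, 2.6]
leaf_area = [4.1, 5.6, 5.9, 7.8, 8.5, 10.9, 12.0]
log_la = [math.log(v) for v in leaf_area]      # work on the transformed scale

s_d = residual_sd(pruning, log_la)             # error SD of predicted log(LA)

# Hypothetical predicted log(LA) values at the validation vines; s is their SD.
validation_log_la = [math.log(v) for v in [3.9, 6.2, 7.1, 9.4, 11.8]]
s = statistics.stdev(validation_log_la)
if s <= s_d:
    raise ValueError("error SD is not smaller than the observed SD")
s_xi = math.sqrt(s ** 2 - s_d ** 2)            # corrected SD of error-free log(LA)
print(f"s_d = {s_d:.3f}, s = {s:.3f}, corrected s_xi = {s_xi:.3f}")
```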

Example calculation:

measurement std. dev. = 0.326
std. dev. of independent variable = 0.616
> 1/(1 + .326^2/.616^2)   # predicted slope
[1] 0.7812045
> sqrt(.616^2 - .326^2)   # corrected std dev of indep var
[1] 0.5226662
> 1/(1 + .326^2/.522^2)   # predicted slope with corrected std dev
[1] 0.7194107

The least squares slope was 0.67 +/- 0.14.
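For readers without S or R at hand, the same arithmetic is shown below in Python, plus one extra step: dividing the observed slope of 0.67 by the predicted attenuation, i.e. multiplying by $[1 + (s_d/s_\xi)^2]$ as described earlier. Computed with the unrounded $s_\xi$, the predicted slope is 0.7199 rather than the 0.7194 obtained above with 0.522. The adjusted slope of about 0.93 is only as reliable as the 0.326 error estimate.

```python
import math

s_d = 0.326      # measurement (calibration residual) std. dev.
s_x = 0.616      # std. dev. of the observed independent variable

pred_uncorrected = 1 / (1 + s_d**2 / s_x**2)   # 0.7812: s_x used in place of s_xi
s_xi = math.sqrt(s_x**2 - s_d**2)              # 0.5227: corrected std. dev.
pred_corrected = 1 / (1 + s_d**2 / s_xi**2)    # 0.7199: equals (s_x^2 - s_d^2) / s_x^2
b_obs = 0.67                                   # observed least squares slope
b_adjusted = b_obs / pred_corrected            # about 0.93
print(round(pred_uncorrected, 4), round(s_xi, 4), round(pred_corrected, 4), round(b_adjusted, 2))
```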

Chris

>I'd like to know if the approach I used to derive LAI from NDVI is correct.
>
>STEP 1: I've got 46 field point values of LAI (leaf area index, namely the
>cover of plant leaves)
>STEP 2: I derived the NDVI index from a multispectral image.
>STEP 3: For every field plot I calculated the mean NDVI of the 3x3
>neighbouring cells
>STEP 4: I made a regression between mean_NDVI and LAI.
>STEP 5: r^2 was low (0.34), r being 0.70, but t, measured as
>r/sqrt((1-r^2)/(n-2)), was over the minimum t of 2.7, my t being 5.75
>STEP 6: Since the correlation was highly significant (p<0.01), I applied the
>equation of the regression line y = 4.9053x + 0.2406, where y was LAI and x
>was NDVI, to the NDVI map, obtaining the LAI map
>STEP 7: I checked the accuracy of the model by computing the mbe
>(mean bias error), calculated as the mean of the single errors for every plot
>(46 measures): mean of P-O,
>where P was the estimated LAI value and O the observed one, obtaining an mbe
>of 0.03249587

>
>Questions:
>1. Could I apply the equation as in step 6?
>2. Could I check my model by using the same observed input values as in
>step 7?
>
>Thanks
>Duccio


>--
>* To post a message to the list, send it to ai-geostats@...
>* As a general service to the users, please remember to post a
>summary of any useful responses to your questions.
>* To unsubscribe, send an email to majordomo@... with no subject
>and "unsubscribe ai-geostats" followed by "end" on the next line in
>the message body. DO NOT SEND Subscribe/Unsubscribe requests to the
>list
>* Support to the list is provided at http://www.ai-geostats.org

***************************************
Chris Hlavka
NASA/Ames Research Center 242-4
Moffett Field, CA 94035-1000
(650)604-3328 FAX 604-4680
Christine.A.Hlavka@...
***************************************
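Duccio's steps 3 to 7 in the quoted message can be sketched on a toy grid to make the workflow, and his question 2, concrete. Everything here is hypothetical: a 5x5 NDVI array, five plots with made-up LAI, and the helper names `window_mean` and `fit_ols`. Plots are assumed to lie at least one pixel from the image edge. One property is worth seeing: when the intercept and slope are fitted by OLS on the same plots, the residuals sum to zero, so the mean bias error mean(P - O) on those plots is zero apart from rounding and cannot reveal bias.

```python
import statistics


def window_mean(grid, row, col):
    """Mean of the 3x3 block centred on (row, col) (step 3)."""
    return statistics.fmean(grid[r][c] for r in range(row - 1, row + 2)
                            for c in range(col - 1, col + 2))


def fit_ols(x, y):
    """Intercept and slope of the OLS line y = a + b*x (step 4)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


# Hypothetical 5x5 NDVI image and plot locations (row, col) with field LAI.
ndvi = [[0.30, 0.32, 0.35, 0.40, 0.42],
        [0.31, 0.34, 0.38, 0.43, 0.45],
        [0.33, 0.37, 0.41, 0.46, 0.50],
        [0.36, 0.40, 0.45, 0.49, 0.53],
        [0.38, 0.42, 0.47, 0.52, 0.56]]
plots = [(1, 1), (1, 3), (2, 2), (3, 1), (3, 3)]
lai_obs = [1.9, 2.4, 2.3, 2.1, 2.8]

x = [window_mean(ndvi, r, c) for r, c in plots]
a, b = fit_ols(x, lai_obs)

# Step 6: apply the fitted line pixel by pixel to produce an LAI map.
lai_map = [[a + b * v for v in row] for row in ndvi]

# Step 7: mean bias error, mean(P - O), on the same plots used for fitting.
pred = [a + b * v for v in x]
mbe = statistics.fmean(p - o for p, o in zip(pred, lai_obs))
print(f"LAI = {a:.3f} + {b:.3f} * NDVI, mbe on fitting plots = {mbe:.2e}")
```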

- Sorry, but I didn't give information about the spatial resolutions.
Plots have a dimension of 10*10 meters. The image (QuickBird) has a resolution of 3 meters. Now, since I calculated the mean NDVI of 3*3 cells, that is, over about 9*9 meters, I think that the spatial resolution is almost the same.

Duccio
