
Re: AI-GEOSTATS: LAI (leaf area index) fromNDVI

  • Chris Hlavka
    Message 1 of 3 , Aug 12, 2003
      You might want to consider the implications of using data with
      different supports: 3x3 neighborhoods and points. The 3x3
      neighborhoods are probably larger than the areas associated with
      the LAI field values. Thus the 3x3 mean NDVIs can be considered
      estimates, with associated error, of the NDVIs at the points.
      Error in the independent (x) variable leads to underestimates of
      both the correlation and the ordinary least squares (OLS)
      regression slope. There are a number of alternatives to OLS, such
      as MA (major axis) and RMA (reduced major axis) regression, that
      might lead to improved slope estimates, but I prefer to correct
      the correlation and slope estimates using estimates of the
      precision of the x variable.

      You can roughly estimate the precision of the point NDVI by:
      1) calculating the standard deviation of the nine pixel NDVIs
         associated with each field observation;
      2) plotting the standard deviations versus the means to check
         for dependence of variation on magnitude;
      3) estimating the standard error of NDVI as the mean standard
         deviation if there is no dependence; otherwise, consider
         regression of LAI versus log(NDVI) or log-log regression.
      Note that if the "point" area is much smaller than a pixel, the x
      error will be underestimated, but fixing this would involve either
      a geostatistical analysis of point values for NDVI or estimation
      involving fractal analysis.
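
      The three steps above can be sketched in Python roughly as follows.
      This is only an illustration under stated assumptions: the function
      name ndvi_precision and the input layout (one sequence of nine pixel
      NDVI values per field plot) are hypothetical, and a correlation
      coefficient stands in for the visual check of the plot in step 2.

```python
import math
import statistics


def ndvi_precision(windows):
    """Rough precision estimate for point NDVI.

    windows: one sequence of nine pixel NDVI values per field plot
             (an assumed input layout, not from the original analysis).
    Returns (means, sds, dependence, mean_sd).
    """
    # Step 1: mean and standard deviation of the nine pixels per plot.
    means = [statistics.mean(w) for w in windows]
    sds = [statistics.stdev(w) for w in windows]

    # Step 2: numeric stand-in for plotting sd versus mean; a Pearson
    # correlation near 0 suggests no dependence of variation on magnitude.
    mm, ms = statistics.mean(means), statistics.mean(sds)
    sxy = sum((m - mm) * (s - ms) for m, s in zip(means, sds))
    sxx = sum((m - mm) ** 2 for m in means)
    syy = sum((s - ms) ** 2 for s in sds)
    dependence = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

    # Step 3: if there is no dependence, the mean standard deviation
    # serves as the estimate of the NDVI standard error.
    mean_sd = statistics.mean(sds)
    return means, sds, dependence, mean_sd
```

      If the dependence value is far from zero, the mean standard
      deviation is not a good single summary, which is the case where a
      log transformation is worth considering.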

      The error estimate can then be used to correct the estimates of
      correlation and slope (my apologies for cutting and pasting a Word
      file so that subscripts and superscripts are lost, and maybe also
      Greek font, and for illustrating with S commands):

      Preliminaries: First, consider regression of y (the dependent
      variable) on x (the independent variable). The usual formula for
      the slope is:

      \sum_i (x_i - m_x)(y_i - m_y) / \sum_i (x_i - m_x)^2          (1)

      where summation is over the index i for individual data points, and
      the means are m_x and m_y. This formula (section 1.2 in N. Draper and
      H. Smith, Applied Regression Analysis, John Wiley & Sons, Inc., New
      York, 1966) is correct, and computationally simple and accurate, that
      is, it works well to preserve floating point accuracy. However,
      formulae involving descriptive statistics (correlation or covariance
      of x and y, and the standard deviations s_x and s_y of x and y)
      convey more information about the factors related to the slope:

      cor(x,y) * s_y / s_x    or    cov(x,y) / s_x^2                (2)

      where one can see that the magnitude of the slope increases with the
      correlation and the range of the dependent variable y (as measured by
      the standard deviation), and decreases with the range of the
      independent variable. If one of the formulae in (2) is used with n
      data points, it will be accurate (unbiased) if multiplied by the
      square root of (n-2)/(n-1) to correct for the effect of using
      estimated, rather than "true", means and if the usual assumptions,
      including accurate values for the independent variable, are correct.
      If the range of the independent variable is inflated by errors, the
      slope will decrease, that is, it will be biased low.

      Predicting the slope when precise values of the independent
      variable \xi are replaced by estimated or measured values x,
      following Section 29.56 in M. Kendall and A. Stuart, The Advanced
      Theory of Statistics: Volume 2: Inference and Relationship, 4th
      Edition, Charles Griffin & Company Limited, London, 1979 (copy in
      your mailbox): Let's assume that the measurements are made without
      bias and with a precision represented as a standard deviation in
      error: the observed measurements (x_1, x_2, ...) of the independent
      variable can be considered as sums of the true values
      (\xi_1, \xi_2, ...), which have 0 standard error, plus errors
      (\delta_1, \delta_2, ...) with average 0 and standard deviation
      s_d. Write s_x for the standard deviation of the measured values
      and \sigma_\xi for that of the true values. If the errors are
      uncorrelated with \xi and y, the least squares regression slope is
      cov(x,y)/s_x^2 = cov(\xi,y)/(\sigma_\xi^2 + s_d^2), where
      cov(\xi,y) is the covariance between \xi and y, i.e. the
      correlation times the product of the standard deviations of \xi
      and y. Now if the least squares slope with no errors is
      cov(\xi,y)/\sigma_\xi^2 = 1, then the slope with the errors is:

      cov(\xi,y)/(\sigma_\xi^2 + s_d^2)
          = (cov(\xi,y)/\sigma_\xi^2) * [\sigma_\xi^2/(\sigma_\xi^2 + s_d^2)]
          = \sigma_\xi^2/(\sigma_\xi^2 + s_d^2)
          = 1/[1 + (s_d/\sigma_\xi)^2]                              (3)

      The expression on the right is a function of the relative magnitude
      s_d/\sigma_\xi of the measurement error to the data range for the
      independent variable, with standard deviation as the metric: s_d is
      the standard deviation of the measurement errors and \sigma_\xi that
      of the true values. The range term \sigma_\xi can be approximated
      with s_x, the standard deviation of the measured values, if the
      range of measurements is large compared to the measurement errors.
      Otherwise, correct for the effect of measurement error by using
      sqrt(s_x^2 - s_d^2), leading (as you have noted) to
      (s_x^2 - s_d^2)/s_x^2 as the predicted slope. The estimate for s_d
      is generally known from an independent source, such as instrument
      specs or calibration analysis.
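
      A small simulation can make the attenuation concrete. The sketch
      below is in Python rather than S, and everything in it is an
      assumption chosen for illustration: the sample size, the seed, a
      true slope of 1, a true-value standard deviation of 1.0 and an
      error standard deviation of 0.5. The function names ols_slope and
      predicted_slope are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def predicted_slope(sd_error, sd_true):
    """Slope expected when a true slope of 1 is attenuated by x errors."""
    return 1.0 / (1.0 + (sd_error / sd_true) ** 2)


rng = random.Random(0)            # fixed seed so the run is repeatable
n = 20000                         # illustrative sample size
sd_true, sd_error = 1.0, 0.5      # illustrative values, not from the thread

true_x = [rng.gauss(0.0, sd_true) for _ in range(n)]
y = [t + rng.gauss(0.0, 0.2) for t in true_x]        # true slope is 1
x = [t + rng.gauss(0.0, sd_error) for t in true_x]   # x measured with error

print("slope using true x:    ", ols_slope(true_x, y))
print("slope using measured x:", ols_slope(x, y))
print("predicted attenuation: ", predicted_slope(sd_error, sd_true))
```

      With these settings the slope on the true values should come out
      near 1, and the slope on the error-laden values near the predicted
      value, up to sampling noise.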

      You will note that the slope predicted with (3) is always less than
      1. The mathematical cause is the inflation of the denominator from
      \sigma_\xi^2, the variance of the true values, to
      \sigma_\xi^2 + s_d^2, which adds the error variance s_d^2. Perhaps
      what is counter-intuitive is that the slope is biased by mean-zero
      errors. Shouldn't the errors just degrade the precision of the
      least squares slope? No, because least squares regression is
      asymmetrical in the way the independent and dependent variables
      are treated: the quantity minimized is the sum of squared
      residuals, which are distances from the points to the regression
      line in the y direction. One way to correct the problem, i.e.
      predict the slope one would have if true x values were known, is
      to multiply the least squares slope by [1 + (s_d/\sigma_\xi)^2].
      Another approach is to use a least squares technique based on the
      distance from each data point to the nearest point on the line.
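
      The second approach, minimizing perpendicular distances to the
      line, has a closed-form slope. The Python sketch below uses the
      standard major axis (orthogonal regression) formula; note that this
      technique implicitly assumes equal error variance in the x and y
      directions, which is an assumption to check rather than something
      established in this thread. The function name major_axis_slope is
      hypothetical.

```python
import math
import statistics


def major_axis_slope(x, y):
    """Slope of the line minimizing perpendicular distances to the points.

    Uses the closed form based on the sums of squares and cross products;
    requires a non-zero cross product between x and y.
    """
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    diff = syy - sxx
    return (diff + math.sqrt(diff ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
```

      Unlike the ordinary least squares slope, this one is symmetric:
      swapping x and y gives the reciprocal slope, so neither variable is
      treated as error-free.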

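      You may also note that inflation of the variance and standard
      deviation of the independent variable leads to degradation of the
      correlation between the independent and dependent variables. With
      x the measured values, \xi the true values, s_x and \sigma_\xi
      their standard deviations, and s_d the standard deviation of the
      errors:

      cor(x,y) = cov(x,y)/(s_x*s_y) = cov(\xi,y)/(s_x*s_y)
               = cov(\xi,y)/(sqrt(\sigma_\xi^2 + s_d^2)*s_y)
               = cor(\xi,y)*\sigma_\xi/sqrt(\sigma_\xi^2 + s_d^2)
               = cor(\xi,y)*sqrt(s_x^2 - s_d^2)/s_x
               = cor(\xi,y)*sqrt(1 - (s_d/s_x)^2)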
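
      The attenuation of the correlation can be written as a pair of
      small helper functions. This Python sketch assumes the errors are
      uncorrelated with the true values and with y; the function names
      are hypothetical, and the numbers in the usage line are the
      measurement and observed standard deviations from the example
      calculation further down.

```python
import math


def attenuation_factor(sd_error, sd_observed):
    """Factor by which x errors shrink the correlation with y.

    sd_error:    standard deviation of the measurement errors in x
    sd_observed: standard deviation of the measured x values
    """
    ratio_sq = (sd_error / sd_observed) ** 2
    if ratio_sq >= 1.0:
        raise ValueError("error variance must be below observed variance")
    return math.sqrt(1.0 - ratio_sq)


def attenuated_correlation(true_cor, sd_error, sd_observed):
    """Correlation expected between the measured x and y."""
    return true_cor * attenuation_factor(sd_error, sd_observed)


def corrected_correlation(observed_cor, sd_error, sd_observed):
    """Estimate of the correlation one would see with error-free x."""
    return observed_cor / attenuation_factor(sd_error, sd_observed)


print(attenuation_factor(0.326, 0.616))
```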

      If the independent variable is a transformed variable, then the
      standard deviations are statistics of the transformed variable rather
      than the original variable. For example, your independent variable
      was the log-transformed predicted leaf area per vine (log(LA)), using
      a calibration equation from regression analysis with this same
      variable (log(LA)) as the dependent variable and pruning weight as
      the independent variable. So the standard deviation of the
      calibration regression residuals is a good estimate of s_d, the
      standard deviation of the measurement errors. This predicted
      log(LA) is the independent variable in the validation regression
      you are concerned with, that is, the one with a slope of less than
      one. So a good estimate of the range term is either the standard
      deviation s of the validation values for log(LA) or the corrected
      value sqrt(s^2 - s_d^2).


      Example calculation:
      measurement std. dev. = 0.326
      std. dev. of independent variable = 0.616
      > 1/(1 + .326^2/.616^2) # predicted slope
      [1] 0.7812045
      > sqrt(.616^2 - .326^2) # corrected std dev of indep var
      [1] 0.5226662
      > 1/(1 + .326^2/.522^2) # predicted slope with corrected std dev
      [1] 0.7194107

      The least squares slope was 0.67 +/- 0.14.

      Chris


      >I'd like to know if the approach I used to derive LAI from NDVI is correct.
      >STEP 1: I've got 46 field point values of LAI (leaf area index, namely the
      >cover of plant leafs)
      >STEP 2: I derived the NDVI index from a multispectral image.
      >STEP 3: For every field plot I calculated the mean NDVI of 3x3 neighbour
      >cells
      >STEP 4: I made a regression between mean_NDVI and LAI.
      >STEP 5: r^2 was low (0.34), r being 0.70, but t, measured as
      >r/sqr((1-r^2)/(n-2)) was over the minimum t 2.7, being my t 5.75
      >STEP 6: Since the correlation was highly significant p<0.01 I applied the
      >equation of the regression line y= 4.9053x + 0.2406 where y was LAI and x
      >was NDVI to the NDVI map, obtaining the LAI map
      >STEP 7: I made a control on the accuracy of the model by measuring the mbe
      >(mean bias error) calculated as the mean of single errors for every plot
      >(46 measures): mean of P-O
      >where P was the estimated LAI value and O the observed, by obtaining a mbe
      >of 0.03249587
      >
      >Questions:
      >1. Could I apply the equation as in step 6?
      >2. Could I control my model by using the same input observed values as in
      >step 7?
      >
      >Thanks
      >Duccio
      >
      >
      >
      >
      >--
      >* To post a message to the list, send it to ai-geostats@...
      >* As a general service to the users, please remember to post a
      >summary of any useful responses to your questions.
      >* To unsubscribe, send an email to majordomo@... with no subject
      >and "unsubscribe ai-geostats" followed by "end" on the next line in
      >the message body. DO NOT SEND Subscribe/Unsubscribe requests to the
      >list
      >* Support to the list is provided at http://www.ai-geostats.org

      --
      ***************************************
      Chris Hlavka
      NASA/Ames Research Center 242-4
      Moffett Field, CA 94035-1000
      (650)604-3328 FAX 604-4680
      Christine.A.Hlavka@...
      ***************************************

    • rocchini@unisi.it
      Message 2 of 3 , Aug 13, 2003
        Sorry,
        but I didn't give information about spatial resolutions.
        Plots have a dimension of 10*10 meters. The image (QuickBird) has a
        resolution of 3 meters.
        Now, since I calculated the mean NDVI of 3*3 cells, I think that
        the spatial resolution is almost the same.
        Duccio


