Re: [ai-geostats] Cross-validation, sampling design and confirmation of my way of thinking...

  • Rajive Ganguli
    Message 1 of 2 , Nov 8 3:35 PM

      Some quick thoughts.

      Is it possible that you have a mixture of distributions in your data?
      That may be why, when you combine many distributions (as in your pooled
      data), you tend to see a normal distribution. Maybe you can try to
      segregate the data into classes based on magnitude (or some other
      criterion), and then model the classes separately. We had success
      doing that with placer gold: each class was better modeled using data
      from within the class.
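
A minimal sketch of the class-wise idea above, assuming the classes are
formed from quantiles of the measured values and that each class gets its
own empirical semivariogram (classical Matheron estimator). The function
names, the quantile split and the lag settings are illustrative
assumptions, not something taken from the placer gold work.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical estimator: mean of 0.5 * (z_i - z_j)^2 within each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (values[i] - values[j]) ** 2
    bins = np.floor(dist / lag_width).astype(int)       # bin b covers [b*w, (b+1)*w)
    gamma = np.full(n_lags, np.nan)                     # NaN marks an empty bin
    counts = np.zeros(n_lags, dtype=int)
    for b in range(n_lags):
        mask = bins == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = half_sq[mask].mean()
    return gamma, counts

def semivariograms_by_class(coords, values, n_classes, lag_width, n_lags):
    """Split the data into magnitude classes (quantile-based, an assumption)
    and compute one empirical semivariogram per class."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_classes + 1))
    labels = np.searchsorted(edges, values, side="right") - 1
    labels = np.clip(labels, 0, n_classes - 1)          # put the maximum in the top class
    result = {}
    for c in range(n_classes):
        members = labels == c
        if members.sum() >= 2:                          # need at least one pair
            result[c] = empirical_semivariogram(
                coords[members], values[members], lag_width, n_lags
            )
    return labels, result
```

Comparing the per-class curves with the pooled one shows whether any class
has structure that the pooled semivariogram hides. With few points per
class, the pair counts returned alongside each curve are worth checking
before reading anything into its shape.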

      Two of the three variograms look like pure nugget. In my
      experience that is not a good sign, i.e. tough times ahead. Your
      best bet seems to be to estimate at the pooled level (for which you
      have a variogram) and put a distribution around each ESU rather
      than go for individual predictions within the ESU.
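
One possible reading of "a distribution around each ESU", sketched under
stated assumptions: each ESU is summarised by its mean plus an empirical
spread of the individual measurements inside it, instead of predicting
individual points. The function name, the percentile bounds and the input
layout (one ESU id per measurement) are hypothetical.

```python
import numpy as np

def esu_summaries(esu_ids, values, lower=5.0, upper=95.0):
    """Per-ESU mean plus an empirical spread of the within-ESU measurements."""
    esu_ids = np.asarray(esu_ids)
    values = np.asarray(values, dtype=float)
    summaries = {}
    for esu in np.unique(esu_ids):
        v = values[esu_ids == esu]
        summaries[esu.item()] = {
            "n": int(v.size),
            "mean": float(v.mean()),
            # the sample standard deviation is undefined for a single measurement
            "std": float(v.std(ddof=1)) if v.size > 1 else float("nan"),
            "lower": float(np.percentile(v, lower)),
            "upper": float(np.percentile(v, upper)),
        }
    return summaries
```

The ESU means are the values that would go into the pooled-level
variogram; the percentile band describes the variability that the
all-nugget behaviour inside an ESU leaves unexplained.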

      Rajive Ganguli, Ph.D., P.E., C.O.I
      Associate Professor of Mining Engineering
      University of Alaska Fairbanks
      Office: 317 Duckering Building
      Mailing Add: Box 755800, Fairbanks, AK 99775
      ph: 907-474-7631, fax: 907-474-6635
      web: http://www.faculty.uaf.edu/ffrg/
      "He uses statistics as a drunken man uses lamp-posts... for support rather
      than illumination." - Andrew Lang (1844-1912)

      On Fri, 05 Nov 2004 11:56:48 +0100, Koen Hufkens
      <koen.hufkens@...> wrote:
      > Dear list,
      > First I want to thank you all for the help you gave me last year. It
      > resulted in a Master degree with honours! So, thank you for all the
      > support, tips and tricks.
      > So, here I am again with a brainstorm question...
      > The situation:
      > -------------------------------------------
      > I'll give an idea of the analysis.
      > In short, I tested a sampling design at three scale levels: an elementary
      > sample unit (ESU) level, a Cell level (1x1km) and a site level (3x3km).
      > The link shows you an illustration of the situation:
      > http://users.pandora.be/requested/images/sampling.jpg
      > In every ESU, leaf area indexes were measured at the given locations, in
      > the given pattern.
      > To check for spatial dependence at an ESU level I did a simple Moran's
      > I/Geary's C analysis. All results were negative, so I concluded that
      > under current conditions, in this vegetation, sampling could just as
      > well be done at random and location didn't matter. This had some
      > implications for further field surveys: not having to deal with complex
      > site descriptions and measuring problems => costs less time and money.
      > Because of tricky things like boundary situations, I skipped the
      > Cell level (1x1km) and went straight to the site level. The problem was
      > that the values of the whole pool weren't exactly normal and not fixable
      > by transformation. So for the whole pool of data I did an indicator
      > (kriging) analysis, avoiding the distribution problem. This came out
      > negative, with no spatial relations for any of the cutoff levels.
      > At all cutoff levels the semivariogram looked like this or close to it:
      > http://users.pandora.be/requested/images/cutoff2.gif
      > Just as an extra, the semivariogram of all the data points (far from
      > normally distributed) looks like this. Notice the downward curving of the
      > tail end of the semivariogram:
      > http://users.pandora.be/requested/images/semivario.png
      > (Lag distance in meter)
      > I also averaged the data at the ESU level. Given 38 ESUs this isn't much
      > to work with, but those data showed a normal distribution, so I calculated
      > a semivariogram for them.
      > The semivariogram for the averaged data:
      > http://users.pandora.be/requested/images/variogram.png
      > This would suggest a sill of around 0.10 and a range of some 1200 m.
      > Some wobbly sinusoidal movements can be seen in the semivariogram; this
      > could be due to the dune-like environment of the site... but this is a
      > rather bold statement given the veeeery few sampling points (38 average
      > ESU values).
      > -----------------------------------------------------
      > So, if I need to interpolate between measured values for validation of
      > satellite images (the final goal of all this), I will need a model. And
      > more importantly, I can't use the images to validate the model because I
      > need the model to validate the images. So the model has to be rather good.
      > Is cross-validation of the semivariogram model a valid option to check
      > the model or not (given the small amount of data used for the
      > semivariogram and the model based on it)? Any other tests? Other ways to
      > optimise the current model?
      > Any ideas on approaches to test the sampling design itself, and not so
      > much the model that would be needed for an actual application
      > (validation of the sat. images)?
      > Any remarks on my current way of thinking? Mistakes I could have made?
      > Any ideas to get more out of the data, given the fact that the whole
      > pool (individual measurements) is faaaaaaaaaaaaaaaar from normal, but
      > the averaged data per ESU is?
      > Thank you for reading it all,
      > Koen.
      > * By using the ai-geostats mailing list you agree to follow its rules
      > ( see http://www.ai-geostats.org/help_ai-geostats.htm )
      > * To unsubscribe to ai-geostats, send the following in the subject or in the body (plain text format) of an email message to sympa@...
      > Signoff ai-geostats
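
On the cross-validation question in the quoted message, a minimal
leave-one-out sketch with ordinary kriging might look like the following.
It assumes a spherical model; the sill of 0.10 and the range of 1200 m
are the values read off the averaged-data semivariogram in the message,
while the zero nugget, the function names and the choice of diagnostics
are assumptions made for illustration. Coordinates must be distinct,
otherwise the kriging system is singular.

```python
import numpy as np

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram; `sill` is the total sill, gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    s = np.minimum(h / rng, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)
    return np.where(h == 0.0, 0.0, gamma)

def ordinary_kriging(coords, values, target, model):
    """Ordinary kriging estimate and kriging variance at one target point."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = model(d)
    A[n, n] = 0.0                       # Lagrange multiplier block
    b = np.ones(n + 1)
    b[:n] = model(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ values
    variance = weights @ b[:n] + mu     # semivariogram form of the kriging variance
    return estimate, variance

def loo_cross_validation(coords, values, model):
    """Drop each point in turn, predict it from the rest, summarise the errors."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    errors = np.empty(n)
    variances = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k
        est, var = ordinary_kriging(coords[keep], values[keep], coords[k], model)
        errors[k] = est - values[k]
        variances[k] = var
    return {
        "mean_error": float(errors.mean()),                 # near 0 if unbiased
        "rmse": float(np.sqrt((errors ** 2).mean())),
        "msdr": float((errors ** 2 / variances).mean()),    # near 1 if variances are realistic
    }

# Model with the sill and range suggested in the message; nugget = 0 is an assumption.
model = lambda h: spherical(h, nugget=0.0, sill=0.10, rng=1200.0)
```

With only 38 ESU means these statistics are themselves noisy, so they
are better used to compare candidate models (different nugget, sill or
range) than as proof that any single model is good.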
