
GEOSTATS: SUM: Quantifying Sample Clustering

  • Bill Thayer
    Message 1 of 1, Jan 16, 1999
      Thanks to the people who responded to my question regarding how to quantify the degree of sample clustering present on a site.  The original question appears at the end of the message.  The summary of the replies is:

      1)  Chuck Fritz suggested the following reference:
               The best (and most easily understood) reference I found was:
                  Ludwig, John A. and J. F. Reynolds. 1988.  Statistical Ecology: A Primer on
                  Methods and Computing.  John Wiley and Sons, Inc.

      2)  John Kern described his experience comparing global estimates of the mean and its kriging variance with confidence intervals obtained from conditional simulations.  He found that the two approaches yielded similar results.

      3) Jeff Myers suggested a couple of different ways to quantify the degree of clustering:
          a)    You could take a visual approach and plot a histogram of the
      distances between neighboring points. A regular grid would show a spike, an
      imperfect grid a roughly normal distribution, and clustered sampling a skewed one.
          b)    You could also use your variogram program, which probably calculates an average lag distance. By using just one lag (the entire site), you could modify the program to output the variability of the distances between location pairs as a site clustering index.

      4) Since my original e-mail I have also come across another source that suggests the following method:
          Calculate the standard deviation (Sa) of the weights obtained from the Thiessen polygons, where each weight is the polygon's area as a fraction of the total site area.  If the samples are regularly spaced (e.g., on a grid), all weights are equal and Sa = 0.  If the samples are randomly distributed across the site, Sa = sqrt(0.280)/n.  If the samples are strongly clustered, Sa approaches 1/sqrt(n).
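A minimal Python sketch of suggestion 3a follows. It reads "distances between points" as the distance from each sample to its nearest neighbor (an assumption; that is the reading under which a perfect grid produces a single spike). The helper names and the example layouts are hypothetical, and the brute-force search is O(n^2), which should be acceptable for the few hundred samples typical of a site. The direction of the skew depends on the layout (a few tight hotspot clusters on an otherwise regular grid give a long tail toward short distances), so the sketch reports a signed moment skewness rather than assuming a sign.

```python
import math
from collections import Counter


def nearest_neighbor_distances(points):
    """Distance from each sample to its closest other sample (brute force, O(n^2))."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def skewness(values):
    """Moment coefficient of skewness; returns 0.0 when all values are identical."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def print_histogram(values, bin_width):
    """Crude text histogram: one '#' per value falling in each bin."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    for b in sorted(counts):
        lo = b * bin_width
        print(f"{lo:5.2f}-{lo + bin_width:5.2f} {'#' * counts[b]}")


# Hypothetical layouts: a 4 x 4 grid, and the same grid plus three hotspot samples.
grid = [(x, y) for x in range(4) for y in range(4)]
clustered = grid + [(1.1, 1.1), (1.2, 1.05), (1.15, 1.2)]

for name, pts in [("grid", grid), ("clustered", clustered)]:
    d = nearest_neighbor_distances(pts)
    print(name, "skewness:", round(skewness(d), 2))
    print_histogram(d, bin_width=0.1)
```

For the grid every nearest-neighbor distance is identical, so the histogram is a single bar and the skewness is zero; adding the hotspot samples pulls a few distances sharply downward and produces a clearly skewed distribution.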
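Suggestion 3b can also be sketched without a variogram program. With a single lag class spanning the whole site, every pair of sample locations falls into that lag, so the "variability of location pairs" can be summarized by the spread of all pairwise separation distances. Using the standard deviation and coefficient of variation of those distances is one possible reading of the suggestion, not necessarily what was intended, and the function name is hypothetical. Because the distribution of pair distances also depends on the size and shape of the site, the values are most comparable between sites of similar geometry.

```python
import math
import statistics
from itertools import combinations


def single_lag_pair_stats(points):
    """Summarize all pairwise separation distances, as if every pair of
    sample locations fell into one lag class covering the entire site."""
    d = [math.dist(p, q) for p, q in combinations(points, 2)]
    mean = statistics.fmean(d)
    sd = statistics.pstdev(d)
    # The coefficient of variation (sd / mean) removes the overall distance scale.
    return {"pairs": len(d), "mean": mean, "sd": sd, "cv": sd / mean}


# Hypothetical layouts: a 4 x 4 grid, and the same grid plus three hotspot samples.
grid = [(x, y) for x in range(4) for y in range(4)]
clustered = grid + [(1.1, 1.1), (1.2, 1.05), (1.15, 1.2)]

for name, pts in [("grid", grid), ("clustered", clustered)]:
    stats = single_lag_pair_stats(pts)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```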
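The Sa statistic in item 4 is simple to compute once the Thiessen polygon areas are available (for example, exported from the GIS package used to build the polygons). The sketch below assumes that each weight is the polygon area divided by the total site area, uses the population standard deviation, and prints the three reference values for comparison: 0 for regular spacing, sqrt(0.280)/n for random placement, and about 1/sqrt(n) for extreme clustering. The 0.280 figure appears to correspond to the variance of normalized cell areas in a Poisson-Voronoi tessellation, so it ignores edge effects from clipping polygons to the site boundary and should be treated as approximate. Function names and the example areas are hypothetical.

```python
import math
import statistics


def thiessen_weight_sd(areas):
    """Sa: population standard deviation of the Thiessen weights,
    where each weight is a polygon's area divided by the total area."""
    total = sum(areas)
    weights = [a / total for a in areas]
    return statistics.pstdev(weights)


def reference_sa(n):
    """Reference values of Sa for n samples."""
    return {
        # All polygons the same size.
        "regular": 0.0,
        # Random placement; approximate, ignores edge effects.
        "random": math.sqrt(0.280) / n,
        # One polygon takes nearly all of the area.
        "clustered_limit": 1 / math.sqrt(n),
    }


# Hypothetical polygon areas (any consistent unit) for a 10-sample site.
areas = [40.0, 35.0, 5.0, 4.0, 3.0, 3.0, 2.5, 2.5, 2.5, 2.5]
sa = thiessen_weight_sd(areas)
print("Sa =", round(sa, 4))
for label, value in reference_sa(len(areas)).items():
    print(f"  {label:>15}: {value:.4f}")
```

The clustered limit follows from the extreme case of one weight equal to 1 and the rest equal to 0, whose population standard deviation is sqrt(n - 1)/n, which is close to 1/sqrt(n) for all but very small n.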

      Original Question:

      I am looking at the effects of sample clustering on estimating
      confidence intervals of the mean.  I am working with five sites that
      have varying degrees of sample clustering around known or suspected
      "hotspots".  In addition to qualitatively characterizing the amount of
      clustering, I would like to use a numerical method to characterize the
      amount of sample clustering present within each site.

      I have used Thiessen polygons to assign "areas of influence" to each
      sample.  I am considering calculating a "clustering index" for each
      site by counting all the polygons whose area is less than the total
      site area / number of polygons (i.e., the area associated with an evenly
      spaced sample configuration) and dividing this count by the total
      number of polygons.  This would produce the ratio of "clustered
      samples" to total samples.  I anticipate needing some level of
      tolerance in selecting the polygons that are smaller than (total
      area / number of polygons), say +/- 10-20%.
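The clustering index proposed above can be sketched in a few lines of Python, again starting from the Thiessen polygon areas. How the +/- tolerance should be applied is not specified, so this sketch makes an assumption: a polygon counts as "clustered" only if its area falls more than `tolerance` below the evenly spaced area (total site area / number of polygons). The total site area defaults to the sum of the polygon areas, which assumes the polygons have been clipped to the site boundary. The function name and example areas are hypothetical.

```python
def clustering_index(areas, tolerance=0.1, site_area=None):
    """Fraction of Thiessen polygons smaller than the evenly spaced area.

    A polygon is counted as "clustered" if its area is below
    (1 - tolerance) * (site_area / n); this use of the tolerance is an assumption.
    """
    n = len(areas)
    if site_area is None:
        # Assumes the polygons are clipped to, and exactly tile, the site.
        site_area = sum(areas)
    threshold = (1 - tolerance) * site_area / n
    clustered = sum(1 for a in areas if a < threshold)
    return clustered / n


# Hypothetical polygon areas for a 10-sample site (same units throughout).
areas = [40.0, 35.0, 5.0, 4.0, 3.0, 3.0, 2.5, 2.5, 2.5, 2.5]
for tol in (0.0, 0.1, 0.2):
    print(f"tolerance {tol:.0%}: index = {clustering_index(areas, tol):.2f}")
```

Because the index only counts polygons, it does not distinguish a polygon that is slightly smaller than the evenly spaced area from a tiny one inside a tight hotspot cluster.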

      Am I reinventing the wheel here?  Is there a method that has been widely
      used?  All suggestions would be appreciated.
