GEOSTATS: SUM: Quantifying Sample Clustering
Thanks to everyone who responded to my question about how to quantify the degree of sample clustering present on a site. The original question appears at the end of this message. A summary of the replies follows:
1) Chuck Fritz suggested the following reference:
The best (and most easily understood) reference I found was:
Ludwig, John A and J. F. Reynolds. 1988. Statistical Ecology: a primer on
methods and computing. John Wiley and Sons, Inc.
2) John Kern discussed some experience he had comparing global estimates of the mean/kriging variance to confidence intervals from conditional simulations. He found the two approaches yielded similar results.
3) Jeff Myers suggested a couple of different ways to quantify the degree of clustering:
a) You could take a visual approach and plot a histogram of the
distances between points. A regular grid would produce a spike, an
imperfect grid a roughly normal distribution, and clustered sampling a
skewed distribution.
b) You could also use your variogram program, which probably calculates an average lag distance. By using just one lag (the entire site), you could alter the program to output the variability of location pairs for a site clustering index.
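Both of Jeff's suggestions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the histogram uses nearest-neighbour distances (my reading of "distances between points", since only then does a regular grid give a single spike), and the "single lag" index is taken to be the coefficient of variation of all pairwise distances. The function names and the two example layouts are made up for the sketch.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def nearest_neighbour_distances(points):
    """Distance from each sample to its closest neighbouring sample."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def histogram(values, n_bins=10):
    """Counts of values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # all values equal: use one dummy width
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def pair_distance_cv(points):
    """Coefficient of variation of all pairwise distances (one site-wide lag)."""
    d = [math.dist(p, q) for p, q in combinations(points, 2)]
    return pstdev(d) / mean(d)


# Hypothetical layouts: a regular 4 x 4 grid and 15 tightly spaced
# samples plus one isolated sample.
grid = [(x, y) for x in range(4) for y in range(4)]
clustered = [(0.1 * k, 0.0) for k in range(15)] + [(3.0, 3.0)]

print(histogram(nearest_neighbour_distances(grid), n_bins=5))
print(histogram(nearest_neighbour_distances(clustered), n_bins=5))
print(pair_distance_cv(grid), pair_distance_cv(clustered))
```

For the grid, every nearest-neighbour distance is identical, so all counts fall in one bin; the clustered layout puts most counts in the lowest bin with a long tail.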
4) I also ran into another source of information since my original e-mail that suggests the following method:
Calculate the standard deviation of the weights obtained from the Thiessen polygons (Sa). If the samples are evenly distributed (e.g., on a regular grid), Sa = 0. If the samples are randomly distributed across the site, Sa = sqrt(0.280)/n. If the samples are heavily clustered, Sa approaches 1/sqrt(n).
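A rough way to try this without a GIS is sketched below. It assumes a rectangular site and approximates each Thiessen polygon's weight (polygon area / site area) by assigning the cells of a fine raster to their nearest sample; the function names, the raster resolution, and the example layouts are all illustrative, not part of the method as it was described to me. A regular grid gives equal polygons and therefore Sa = 0, which makes a convenient sanity check.

```python
import math
from statistics import pstdev


def thiessen_weights(points, width, height, cells=200):
    """Approximate Thiessen weights on a width x height rectangular site.

    Each cell of a cells x cells raster is assigned to its nearest sample;
    a sample's weight is its share of the raster cells.
    """
    counts = [0] * len(points)
    for i in range(cells):
        x = (i + 0.5) * width / cells
        for j in range(cells):
            y = (j + 0.5) * height / cells
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - x) ** 2 + (points[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    total = cells * cells
    return [c / total for c in counts]


def weight_sd(weights):
    """Sa: (population) standard deviation of the Thiessen weights."""
    return pstdev(weights)


def reference_values(n):
    """Reference values of Sa for n samples, as quoted in the text."""
    return {
        "even": 0.0,
        "random": math.sqrt(0.280) / n,
        "clustered": 1.0 / math.sqrt(n),
    }


# Hypothetical 4 x 4 site: 16 samples on a regular grid versus
# 15 samples bunched along one edge plus one isolated sample.
grid = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
clustered = [(0.1 * k + 0.05, 0.05) for k in range(15)] + [(3.5, 3.5)]

sa_grid = weight_sd(thiessen_weights(grid, 4.0, 4.0, cells=40))
sa_clustered = weight_sd(thiessen_weights(clustered, 4.0, 4.0, cells=40))
print(sa_grid, sa_clustered, reference_values(16))
```

The raster approach trades accuracy for simplicity; for irregular site boundaries, true polygon areas clipped to the boundary would be needed.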
Original question:

I am looking at the effects of sample clustering on estimating
confidence intervals of the mean. I am working with five sites that
have varying degrees of sample clustering around known or suspected
"hotspots". In addition to qualitatively characterizing the amount of
clustering, I would like to use a numerical method to characterize the
amount of sample clustering present within each site.
I have used Thiessen polygons to assign "areas of influence" to each
sample. I am considering calculating a "clustering index" for each
site by counting the polygons whose area is less than (total site
area / # of polygons), i.e. the area associated with an evenly spaced
sample configuration, and dividing this count by the total number of
polygons. This would produce a ratio of the number of
"clustered samples" to total samples. I anticipate the need to use some
level of tolerance in selecting those polygons that are less than (total
area/# polygons), say +/- 10-20%.
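The index described above might be computed along these lines. This is a minimal sketch that assumes the polygon areas are already available and sum to the total site area, and that the tolerance is applied by lowering the threshold (so a 10% tolerance counts only polygons smaller than 90% of the evenly spaced area); the function name and the example areas are hypothetical.

```python
def clustering_index(areas, tolerance=0.10):
    """Fraction of Thiessen polygons smaller than the evenly spaced area.

    areas     -- polygon areas, assumed to sum to the total site area
    tolerance -- fractional allowance below (total area / # polygons);
                 0.10 means only polygons under 90% of that area count
    """
    n = len(areas)
    even_area = sum(areas) / n
    threshold = (1.0 - tolerance) * even_area
    clustered = sum(1 for a in areas if a < threshold)
    return clustered / n


# Hypothetical sites: ten equal polygons, and eight small polygons
# around a hotspot plus two large outlying polygons.
even_site = [1.0] * 10
hotspot_site = [0.1] * 8 + [4.6, 4.6]

print(clustering_index(even_site))
print(clustering_index(hotspot_site, tolerance=0.20))
```

An evenly sampled site scores 0, and the score rises toward 1 as more samples sit in polygons smaller than the evenly spaced area.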
Am I reinventing the wheel here? Is there a method that has been widely
used? All suggestions would be appreciated.