I am a graduate student at Dartmouth College and a new user of AI-GEOSTATS. I hope to use geostatistical techniques to help describe the distribution and abundance of bird species in a forested ecosystem, with the goal of understanding the mechanisms that influence spatial variation in abundance. I have censused bird species at approximately 350 locations in a 3,000 ha forested valley and am beginning to analyze the data. I also have access to a wide variety of other environmental and ecological variables for each census plot.

As a first cut, I hope to use variogram analysis and kriging to provide a thorough description of the spatial distribution of the bird census data. However, I am concerned about the nature of the census data: as with most bird census data, values generally range from 0 to 3 individuals per point (in increments of 0.33), and the data are highly skewed to the right (a long upper tail), with many zero values. For example, for one species at 373 plots the mean per plot is 0.55 and the range is 0 to 2.67, with 110 values of 0, 91 of 0.33, 81 of 0.66, 38 of 1, 30 of 1.33, 14 of 1.66, 5 of 2, 3 of 2.33 and 1 of 2.67.
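
The summary figures for this species can be checked directly from the frequency table. A minimal sketch, assuming NumPy; the moment-based skewness used here is one of several common estimators.

```python
import numpy as np

# Frequency table for one species, as reported above: count value -> number of plots.
freq = {0.0: 110, 0.33: 91, 0.66: 81, 1.0: 38, 1.33: 30,
        1.66: 14, 2.0: 5, 2.33: 3, 2.67: 1}

# Expand the table into one value per plot.
x = np.repeat(list(freq.keys()), list(freq.values()))

n = x.size
mean = x.mean()
zero_share = np.mean(x == 0.0)

# Moment-based sample skewness; a positive value means a long upper tail.
d = x - mean
skew = np.mean(d**3) / np.mean(d**2) ** 1.5

print(f"n = {n}, mean = {mean:.2f}, range = {x.min()}-{x.max()}, "
      f"zeros = {zero_share:.1%}, skewness = {skew:.2f}")
```

A positive skewness here reflects the few plots with 2 or more individuals, while almost 30% of the plots sit at exactly zero.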

Given how highly skewed these data are, do I have to transform them before attempting the variogram analysis and kriging? If so, what transformation would work best? I have searched the ai-geostats archive and the introductory texts but have not been able to answer these simple questions. Are there any other references that might help me with this analysis?
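
One quick diagnostic, sketched here assuming NumPy, is to compare the skewness of the raw counts with that of a few candidate transforms. The square root and the log with an offset of 1/3 are illustrative choices, not recommendations, and the offset is arbitrary. Whatever the transform, the zeros stay tied at the bottom of the distribution, because a strictly increasing transform maps them all to the same value.

```python
import numpy as np

freq = {0.0: 110, 0.33: 91, 0.66: 81, 1.0: 38, 1.33: 30,
        1.66: 14, 2.0: 5, 2.33: 3, 2.67: 1}
x = np.repeat(list(freq.keys()), list(freq.values()))

def skewness(v):
    """Moment-based sample skewness."""
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

# Illustrative candidates only; the 1/3 offset (roughly the smallest non-zero
# value) is an arbitrary choice, needed because log(0) is undefined.
candidates = {
    "raw": x,
    "sqrt": np.sqrt(x),
    "log(x + 1/3)": np.log(x + 1.0 / 3.0),
}

results = {}
for name, v in candidates.items():
    # Share of plots tied at the minimum: a strictly increasing transform maps
    # every zero to the same value, so this share cannot change.
    at_min = np.mean(v == v.min())
    results[name] = (skewness(v), at_min)
    print(f"{name:>13}: skewness = {results[name][0]:+.2f}, "
          f"tied at minimum = {at_min:.1%}")
```

Skewness responds to the transform; the spike of tied values does not.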

One should be a little careful about accepting the validity of a test for the "equality" of two variograms. If one uses an estimator such as the sample variogram, one only obtains estimates of the values of the variogram for a finite number of lags (dealing with a possible anisotropy makes it even more complicated). Moreover, the reliability of these estimates varies, in part because the number of pairs varies from lag to lag. If one is using the variogram for kriging or simulation, then one is most interested in the behavior of the variogram near the origin, i.e., the values for short lags, and unfortunately the short lags usually have the smallest numbers of pairs. If one uses least squares or maximum likelihood, then one must first choose a model (or models, in the case of a nested model), and only then is the chosen method used to estimate the parameters.
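
The classical (Matheron) sample variogram, with the number of pairs reported for each lag class, makes the point about pair counts concrete. A minimal sketch, assuming NumPy and isotropy (distance classes only, no directions); the point layout and data values are hypothetical placeholders.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Classical estimator: gamma(h) = 1 / (2 N(h)) * sum of (z_i - z_j)^2
    over the N(h) pairs in lag class h.
    Returns (lag class centres, gamma estimates, number of pairs per class)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    i, j = np.triu_indices(len(values), k=1)           # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1           # lag class of each pair
    nbins = len(bin_edges) - 1
    gamma = np.full(nbins, np.nan)
    npairs = np.zeros(nbins, dtype=int)
    for b in range(nbins):
        mask = which == b
        npairs[b] = mask.sum()
        if npairs[b] > 0:
            gamma[b] = 0.5 * sqdiff[mask].mean()
    centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centres, gamma, npairs

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))   # hypothetical 100 x 100 study area
values = rng.normal(size=200)                 # placeholder data, no spatial structure
edges = np.linspace(0, 50, 11)                # ten lag classes of width 5
h, g, n = empirical_variogram(coords, values, edges)
for hc, gc, nc in zip(h, g, n):
    print(f"lag {hc:5.1f}: gamma = {gc:.3f} from {nc} pairs")
```

With randomly placed points the first lag class collects far fewer pairs than the later ones, which is exactly the region that matters most for kriging.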

There is an old paper by Davis and Borgman in Mathematical Geology (circa 1980) on the distribution of the sample variogram. They give two results: (1) beginning with an assumption of multivariate normality (which is not testable) and an assumed model type, they obtain numerical results for the distribution; (2) they obtain asymptotic results, which are theoretically interesting but probably not much help in practice.
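
The flavour of result (1) can be imitated by Monte Carlo simulation; this is a stand-in, not Davis and Borgman's method. The sketch, assuming NumPy, fixes a hypothetical model (a Gaussian field on a transect of 50 equally spaced points with exponential covariance, sill 1, range parameter 5), simulates it via a Cholesky factor, and looks at the spread of the sample variogram at lag 1.

```python
import numpy as np

# Assumed model: zero-mean Gaussian field, C(h) = sill * exp(-h / a),
# observed at 50 equally spaced points along a transect.
n_pts, sill, a = 50, 1.0, 5.0
x = np.arange(n_pts, dtype=float)
cov = sill * np.exp(-np.abs(x[:, None] - x[None, :]) / a)
L = np.linalg.cholesky(cov)

rng = np.random.default_rng(1)
n_sims = 5000
z = rng.standard_normal((n_sims, n_pts)) @ L.T   # each row is one realisation

def sample_gamma(z, lag):
    """Sample variogram at an integer lag, for every realisation (row) of z."""
    d = z[:, lag:] - z[:, :-lag]
    return 0.5 * np.mean(d**2, axis=1)

lag = 1
g = sample_gamma(z, lag)
true_gamma = sill * (1 - np.exp(-lag / a))       # gamma(h) = C(0) - C(h)
print(f"model gamma({lag}) = {true_gamma:.3f}")
print(f"sample gamma: mean {g.mean():.3f}, 5%-95% range "
      f"{np.quantile(g, 0.05):.3f} to {np.quantile(g, 0.95):.3f}")
```

The result is only as good as the assumed normality and model type, which is the same limitation the text points out.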

There is also a paper in Mathematical Geology, circa 1990, on the "true" numbers of pairs. The problem, as is well known, is that there is an interdependence between the pairs used to estimate the variogram for one lag and those used for another. The author has to assume multivariate normality to derive the results.
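
The interdependence shows up as correlation between the estimates at different lags. The sketch below, assuming NumPy and the same kind of hypothetical Gaussian transect model (exponential covariance, sill 1, range parameter 5), simulates many realisations and correlates the sample variogram at lag 1 with that at lag 2.

```python
import numpy as np

# Hypothetical model: 50 equally spaced points on a transect,
# zero-mean Gaussian field with exponential covariance (sill 1, range parameter 5).
n_pts, n_sims = 50, 5000
x = np.arange(n_pts, dtype=float)
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / 5.0)
z = np.random.default_rng(2).standard_normal((n_sims, n_pts)) @ np.linalg.cholesky(cov).T

def sample_gamma(z, lag):
    """Sample variogram at an integer lag, for every realisation (row) of z."""
    d = z[:, lag:] - z[:, :-lag]
    return 0.5 * np.mean(d**2, axis=1)

g1, g2 = sample_gamma(z, 1), sample_gamma(z, 2)
# Pairs at lag 2 share data points with pairs at lag 1, so the two
# estimates rise and fall together across realisations.
corr = np.corrcoef(g1, g2)[0, 1]
print(f"correlation between sample gamma(1) and gamma(2): {corr:.2f}")
```

A strong positive correlation means the estimates at neighbouring lags cannot be treated as independent observations in a test.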

It is known that the kriging estimator is relatively robust with respect to the variogram, i.e., slight changes in the variogram will result in only slight changes in the kriging weight vector and hence, in general, only slight changes in the kriged values. There are at least two different ways to quantify the "distance" between two variograms; these correspond to a notion of continuity. A third corresponds to differentiability. None of the three implies the others.
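
The robustness can be probed numerically. The sketch below, assuming NumPy, solves the ordinary kriging system in its variogram form for a spherical model and for the same model with the range increased by 10%, then compares the weight vectors. The spherical model, the 10% perturbation and the random data layout are illustrative assumptions; the size of the change depends on the configuration.

```python
import numpy as np

def spherical(h, sill, a):
    """Spherical variogram with no nugget; a is the range."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

def ok_weights(coords, target, gamma):
    """Ordinary kriging weights from the system
    [Gamma 1; 1^T 0] [lambda; mu] = [gamma_0; 1]."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    return np.linalg.solve(A, b)[:n]   # drop the Lagrange multiplier

rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(12, 2))   # hypothetical data locations
values = rng.normal(size=12)                 # hypothetical data values
target = np.array([50.0, 50.0])

w_a = ok_weights(coords, target, lambda h: spherical(h, 1.0, 30.0))
w_b = ok_weights(coords, target, lambda h: spherical(h, 1.0, 33.0))  # range +10%

print("weights sum to", round(w_a.sum(), 6), "and", round(w_b.sum(), 6))
print("largest change in any weight:", np.abs(w_a - w_b).max())
print("kriged values:", w_a @ values, w_b @ values)
```

Multiplying the variogram by a constant leaves the weights unchanged, so only the shape of the variogram, not its scale, affects the kriged value.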

In practice one often uses a search neighborhood in kriging; hence it is only of interest whether the variograms match, or are at least close, up to some maximum lag. One will have very little information about the variogram at longer lags anyway.
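
As an illustration of comparing two models only up to a maximum lag, the sketch below, assuming NumPy, uses the largest absolute difference on [0, h_max] as a hypothetical closeness measure (not necessarily one of the measures mentioned above). The two spherical models are chosen to share the same slope at the origin but to have different sills and ranges.

```python
import numpy as np

def spherical(h, sill, a):
    """Spherical variogram with no nugget; a is the range."""
    r = np.minimum(np.asarray(h, dtype=float) / a, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

def max_gap(g1, g2, h_max, n=1001):
    """Largest absolute difference between two variogram models on [0, h_max]."""
    h = np.linspace(0.0, h_max, n)
    return np.max(np.abs(g1(h) - g2(h)))

# Hypothetical models: same slope at the origin (1.5 * sill / range = 0.05),
# different sills (1 vs 1.5) and ranges (30 vs 45).
g1 = lambda h: spherical(h, 1.0, 30.0)
g2 = lambda h: spherical(h, 1.5, 45.0)

print("max gap up to lag 10: ", round(max_gap(g1, g2, 10.0), 4))
print("max gap up to lag 100:", round(max_gap(g1, g2, 100.0), 4))
```

Up to lag 10 the models differ by about 0.01; over all lags they differ by 0.5, the difference in sills. Within a small search neighborhood, the two would give nearly the same kriging results.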

In general, statistical tests will require some distributional assumptions, and these are hard to obtain for variograms and variogram estimators. Whether the variograms for two different variables, or for the same variable in two different regions, are the same is an interesting question, but one that will be hard to test without making very strong (non-testable) assumptions.

Finally, one might want to consider the design of the sample location pattern relative to testing the equality of two variograms. I have an old paper with A.W. Warrick on the design of sampling plans in order to control the numbers of pairs for each lag. If one assumes isotropy (it is even more complicated in the case of anisotropy), then the pattern that generates an equal number of pairs for each lag is a spiral, not a very practical result.
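
The effect of the sampling design on the numbers of pairs is easy to tabulate. The sketch below, assuming NumPy and isotropic distance classes, counts pairs per lag class for two hypothetical layouts of 100 points: a regular 10 x 10 grid with spacing 10 and a uniform random scatter. It does not reproduce the spiral design.

```python
import numpy as np

def pairs_per_lag(coords, bin_edges):
    """Number of point pairs whose separation falls in each distance class."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    counts, _ = np.histogram(dist, bins=bin_edges)
    return counts

edges = np.arange(0.0, 55.0, 5.0)                 # lag classes 0-5, 5-10, ..., 45-50
gx, gy = np.meshgrid(np.arange(5.0, 100.0, 10.0), np.arange(5.0, 100.0, 10.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])  # hypothetical 10 x 10 grid, spacing 10
rand = np.random.default_rng(4).uniform(0, 100, size=(100, 2))  # 100 random points

print("grid:  ", pairs_per_lag(grid, edges))
print("random:", pairs_per_lag(rand, edges))
```

The grid has no pairs at all below its spacing and piles pairs onto a few discrete distances; the random layout covers short lags but only thinly. Neither comes close to equal counts per class.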

Note also that if one assumes normality, then the half-squared differences will have a chi-squared distribution (more precisely, a multiple of a chi-squared with one degree of freedom). One can see this effect in most sample variograms; the VARIO component of GEOEAS will provide histograms for these distributions. This is not a particularly nice distribution for testing because of the "fat" tails.
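
The chi-squared statement follows in a few lines, assuming a Gaussian random function whose increments have mean zero, so that $Z(x+h) - Z(x) \sim N(0, 2\gamma(h))$ by the definition $\gamma(h) = \tfrac12 \operatorname{Var}[Z(x+h) - Z(x)]$:

```latex
\frac{Z(x+h) - Z(x)}{\sqrt{2\gamma(h)}} \sim N(0,1)
\quad\Longrightarrow\quad
\frac{[Z(x+h) - Z(x)]^2}{2\gamma(h)} \sim \chi^2_1
\quad\Longrightarrow\quad
\tfrac12\,[Z(x+h) - Z(x)]^2 = \gamma(h)\,\chi^2_1 ,

\mathbb{E}\!\left[\tfrac12 (Z(x+h) - Z(x))^2\right] = \gamma(h),
\qquad
\operatorname{Var}\!\left[\tfrac12 (Z(x+h) - Z(x))^2\right] = 2\,\gamma(h)^2 .
```

Each half-squared difference is therefore unbiased for $\gamma(h)$ but has a standard deviation $\sqrt{2}$ times larger than its mean, and the $\chi^2_1$ distribution has skewness $\sqrt{8} \approx 2.83$, which is the long upper tail referred to above.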

At 05:02 PM 2/14/00 -0800, you wrote:
>Assume that we have two sets of geostatistical data. Is there any
>statistical test to determine whether variograms on those two sets are the
>same?

