Dec 5, 2004

Most of the tests of hypotheses that have been mentioned recently on this

listserv are non-spatial, i.e., there is nothing in the underlying statistical

assumptions that specifically pertains to spatial data. The one common

assumption is "random sampling" or "iid" (independent, identically

distributed). In many typical (non-spatial) applications, this assumption is

ensured by the "design of the experiment", i.e., the way the data is generated

and collected. Spatial data problems more often involve "observational data",

for which it is usually not possible to design the experiment in such

a way as to ensure this basic assumption.

In the case of spatial data, random site selection does not necessarily

correspond to "random sampling". In the case of the random function model

implicit in most of geostatistics, the data is a non-random sample from one

realization of the random function (in that context using random site selection

does not then make it a "random sample"). Note that not all spatial statistical

analysis methods are based on this random function model.
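
To make the point about spatially dependent data concrete, here is a minimal sketch in Python (standard library only). It is not a geostatistical model: it assumes, purely for illustration, values along a 1-D transect with an AR(1) correlation structure, and the names ar1_transect and compare_standard_errors are hypothetical. It compares the variance of the sample mean actually observed across many realizations with the usual iid estimate s^2/n.

```python
import random
import statistics


def ar1_transect(n, rho, rng):
    """Simulate n equally spaced values with lag-1 correlation rho (unit variance)."""
    x = rng.gauss(0.0, 1.0)  # start in the stationary distribution
    values = [x]
    innovation_sd = (1.0 - rho * rho) ** 0.5
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0.0, innovation_sd)
        values.append(x)
    return values


def compare_standard_errors(n=100, rho=0.9, n_realizations=2000, seed=0):
    """Return (variance of the sample mean across realizations, mean of s^2/n)."""
    rng = random.Random(seed)
    means = []
    naive_vars = []
    for _ in range(n_realizations):
        z = ar1_transect(n, rho, rng)
        means.append(statistics.fmean(z))
        # The textbook iid formula, applied as if the n values were independent.
        naive_vars.append(statistics.variance(z) / n)
    return statistics.variance(means), statistics.fmean(naive_vars)


actual_var, naive_var = compare_standard_errors()
print(f"variance of the sample mean across realizations: {actual_var:.4f}")
print(f"average naive iid estimate s^2/n:                {naive_var:.4f}")
print(f"ratio: {actual_var / naive_var:.1f}")
```

With strong positive correlation the ratio should come out well above 1, i.e., the iid formula understates the uncertainty of the mean, no matter how the sites along the transect were chosen.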

Normality is another common underlying assumption in many hypothesis tests. In

the case of random sampling from a distribution with a finite moment of order

2+delta, delta > 0, the distribution of the standardized sample mean will converge IN

DISTRIBUTION to a normal distribution. This means that a sequence of functions

is converging to another function. It is important to note that this convergence

may be pointwise or uniform or uniform on intervals. Pointwise is what you usually

get from the Central Limit Theorem; this means that the rate of convergence

depends on where you are on the curve. The difference between using a normal

statistic vs using a t-statistic usually is the difference between a known

variance and an unknown variance (and hence estimated). But in either case the

variance is assumed to exist and be finite. The sample variance can always be

computed from a data set but that does not ensure that the variance of the

distribution exists. The quotient of two independent standard normal random

variables has a Cauchy distribution, for which neither the mean nor the variance

is finite. Hence the Central Limit Theorem does not apply.
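
The Cauchy example can be checked by simulation. The following is a minimal sketch in Python (standard library only); the function names and the sample sizes are assumptions chosen for illustration. It measures the spread (interquartile range) of the sample mean over many replicates, for normal draws and for quotients of two independent standard normals.

```python
import random
import statistics


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_cauchy(rng):
    # Quotient of two independent standard normals: standard Cauchy.
    return rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)


def iqr_of_sample_means(draw, n, n_replicates, rng):
    """Interquartile range of the sample mean of n draws, over many replicates."""
    means = []
    for _ in range(n_replicates):
        means.append(statistics.fmean([draw(rng) for _ in range(n)]))
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(0)
spreads = {}
for name, draw in (("normal", draw_normal), ("cauchy", draw_cauchy)):
    for n in (10, 400):
        spreads[(name, n)] = iqr_of_sample_means(draw, n, 500, rng)
        print(f"{name:6s} n={n:4d}  IQR of sample mean = {spreads[(name, n)]:.3f}")
```

For the normal draws the spread shrinks like 1/sqrt(n). For the Cauchy draws it should stay at about the same level, because the mean of n independent standard Cauchy variables is again standard Cauchy. A sample variance can still be computed for every one of these Cauchy samples, but it does not estimate anything.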

In the case of a non-normal distribution one really needs to know how robust the

test is to deviations from normality; increasing the sample size does not really

solve this problem.

Finally, note that most tests of hypotheses are not exactly "neutral": there is a

tendency to accept the null hypothesis UNLESS there is evidence against it.

This is one of the reasons for the emphasis on the POWER of the

test. Often the null hypothesis is the "status quo", and this logical stance for

the null and alternative hypotheses is okay, but not in all circumstances.

However, in some tests for normality (which still depend on the assumption of

random sampling), e.g., Chi-square tests, the test is set up in such a way that

the null hypothesis corresponds to the conclusion of normality. If you are

trying to argue that it is safe to assume normality, then you want to accept the

null hypothesis and you should want a very high power for the test. You don't

want a small p-value; instead you want a very large p-value. Note that the

normal distribution is symmetric but not all symmetric distributions are normal.
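
The role of power when the null hypothesis is "normality" can be illustrated with a small simulation. This is a sketch under stated assumptions, in Python (standard library only). It does not use the Chi-square goodness-of-fit test; it swaps in the Jarque-Bera statistic, only because its asymptotic chi-square (2 degrees of freedom) p-value has the closed form exp(-JB/2). That p-value is only approximate for small samples. The data are drawn from a uniform distribution, which is symmetric but clearly not normal, and the function names are hypothetical.

```python
import math
import random


def jarque_bera_p_value(sample):
    """Asymptotic p-value of the Jarque-Bera statistic (null hypothesis: normality)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    return math.exp(-jb / 2.0)


def rejection_rate(n, n_replicates=500, alpha=0.05, seed=0):
    """Fraction of uniform (non-normal) samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_replicates):
        sample = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if jarque_bera_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_replicates


power_small = rejection_rate(n=20)
power_large = rejection_rate(n=500)
print(f"estimated power against uniform data, n=20:  {power_small:.2f}")
print(f"estimated power against uniform data, n=500: {power_large:.2f}")
```

When the estimated power is low, as it should be for the small sample here, a large p-value says very little: the test would have "accepted" normality for most non-normal samples of that size as well.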

Donald Myers