I'm an ecologist with an interest in wildlife disease epidemiology. I have
two unique datasets representing indices of occurrence of the same disease
in two species, both highly mobile terrestrial mammals. Visually (in
postings) the two maps are convincingly similar - i.e. these data are
ecologically very interesting indeed! I want to test the spatial
correlation between the two datasets, because it's likely that one species
is the reservoir infecting the other.

Leaving aside my inadequacies as a mathematician (which cripple me as a
reader of geostats texts), my problems fall into two categories:

(1) Spatial autocorrelation

A logical first step would seem to be to test whether each dataset is
spatially autocorrelated. This seems likely when one examines the postings,
but semivariograms suggest that a sill is very quickly reached (at about 20
km), a distance that seems improbably small as we are dealing with highly
mobile mammals and epidemics that look to be 100-200 km across. In both
datasets, the variogram value is thereafter highly variable with increasing
distance, and there is some suggestion of an oscillating 'hole effect'.
However, as distance increases the variogram is clearly being influenced by
the shape of mainland Britain, and the timid faith I have in the variogram
at small h falls away rapidly as h increases. For instance, a location in
south-west Wales is very close to north Cornwall for a bird (by Euclidean
distance), but quite far away for a terrestrial mammal that must travel by
land around the Severn estuary.
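
For reference, a minimal sketch of how an empirical (Matheron) semivariogram of the kind described above is usually computed. This is an illustration only, not the software actually used for these data; the function name, arguments and bin edges are all hypothetical, and NumPy is assumed.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Matheron estimator per distance bin:
    gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).
    Hypothetical helper for illustration; distances are straight-line."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # each pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)  # Euclidean distance
    sqdiff = (values[i] - values[j]) ** 2
    which = np.digitize(dist, bin_edges) - 1            # bin index for each pair
    nbins = len(bin_edges) - 1
    gamma = np.full(nbins, np.nan)                      # NaN where a bin is empty
    counts = np.zeros(nbins, dtype=int)
    for b in range(nbins):
        mask = which == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = sqdiff[mask].sum() / (2 * counts[b])
    return gamma, counts
```

The pair counts are returned alongside gamma because bins at large h often hold few or oddly distributed pairs, which is one reason the estimate becomes erratic there. The `dist` line is where an overland distance (around the Severn estuary, say) could be substituted for the Euclidean one, though standard variogram models are not guaranteed to remain valid under a non-Euclidean distance.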

A further consideration, if I have correctly understood the meaning of
stationarity, is that both datasets have an underlying trend, with values
increasing from west to east and north to south. Despite having read at
length in the AI-Geostats archives, I am still unsure how to deal with this
in practical terms.
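
One common practical option, sketched here under assumptions rather than offered as the recommended answer, is to fit a first-order trend surface by least squares and compute the variogram on the residuals. The function name is hypothetical, NumPy is assumed, and a planar trend in easting and northing is itself an assumption about the data.

```python
import numpy as np

def detrend_linear(coords, values):
    """Fit z = a + b*x + c*y by ordinary least squares and return
    the coefficients and the residuals (values minus fitted trend).
    Hypothetical helper for illustration."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Design matrix: intercept, easting, northing
    design = np.column_stack([np.ones(len(values)), coords[:, 0], coords[:, 1]])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    residuals = values - design @ coef
    return coef, residuals
```

The residuals, not the raw values, would then go into the variogram calculation. Note that residual variograms are known to be somewhat biased at larger lags, so this is a pragmatic first step rather than a full treatment of non-stationarity.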

For each species alone, the variogram is theoretically of tremendous
interest. How local are the epidemics? How close must an epidemic be
before a given animal is at risk? Is there genuinely a rippling effect
surrounding a disease epicentre? Unfortunately, from the outset there seems
to be a discrepancy between the variogram and my eye. I am a sworn disciple
of objectivity, but I'm not yet convinced that my variogram is doing the
right thing.

(2) Sampling locations and correlation between the datasets

Both datasets cover the whole UK (including some islands, which are easily
and logically excluded), but originate from two populations of people
(hunters and veterinarians) with necessarily different geographical
distributions - i.e. they are not colocated. I could convert both datasets
to a common regular grid, but this involves interpolation, a number of
assumptions, and the creation of quite a few new grid locations that have NO
data from one or both dataset(s). If I did convert to a common grid, I am
then at a loss to know how to proceed further. The two datasets do not have
similar underlying distributions. One is an incidence (count of diseased
animals per unit effort), and is easily normalised by a log transformation.
The other is a measure of prevalence, with many essential (meaningful) zeros
that make transformation awkward and perhaps undesirable; these prevalence
data can also be weighted by the sample size on which each is based.
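
One technique that avoids regridding, mentioned here only as a possibility and not as an established answer to the question, is the pseudo cross-variogram, which pairs each location of one variable with each location of the other. The sketch below uses one common definition (half the mean squared difference per distance bin); the function name and arguments are hypothetical, NumPy is assumed, and the two variables are assumed to have been put on a comparable scale (e.g. standardised) beforehand, which is a non-trivial step for zero-heavy prevalence data.

```python
import numpy as np

def pseudo_cross_variogram(coords1, z1, coords2, z2, bin_edges):
    """Half the mean squared difference between variable 1 at s_i and
    variable 2 at s_j, binned by the distance between s_i and s_j.
    Hypothetical helper; inputs should be on a comparable scale."""
    c1 = np.asarray(coords1, dtype=float)
    c2 = np.asarray(coords2, dtype=float)
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    # All cross pairs (i from dataset 1, j from dataset 2)
    dist = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2).ravel()
    sqdiff = ((z1[:, None] - z2[None, :]) ** 2).ravel()
    which = np.digitize(dist, bin_edges) - 1
    nbins = len(bin_edges) - 1
    gamma = np.full(nbins, np.nan)                 # NaN where a bin is empty
    counts = np.zeros(nbins, dtype=int)
    for b in range(nbins):
        mask = which == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = sqdiff[mask].sum() / (2 * counts[b])
    return gamma, counts
```

This describes joint spatial structure rather than providing a significance test, and it does not use the sample-size weights on the prevalence data.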

Please can anyone suggest a route forward? I have read (all the easy words
in) quite a number of textbooks. So far as I can judge (I pull up all too
soon), most books stop short of problems like this because no
self-respecting miner would burden himself by collecting data so awkwardly.
For me, this is a crude pilot study, hence a stratified sampling programme
to test a hypothesis will be the next stage IF I can formalise the
correlation that looks so blindingly obvious to the naked eye. So please
don't suggest I do my sampling differently.

Jonathan Reynolds

Dr Jonathan C Reynolds
The Game Conservancy Trust
Fordingbridge
Hampshire SP6 1EF
UK

tel: +44 (0)1425 652381
FAX: +44 (0)1425 651026
email: jreynolds@...
website: www.gct.org.uk/index.html

--
On Fri, 8 Sep 2000, Jonathan Reynolds wrote:

> I'm an ecologist with an interest in wildlife disease epidemiology. I have
> two unique datasets representing indices of occurrence of the same disease
> in two species, both highly mobile terrestrial mammals. Visually (in
> postings) the two maps are convincingly similar - i.e. these data are
> ecologically very interesting indeed! I want to test the spatial
> correlation between the two datasets, because it's likely that one species
> is the reservoir infecting the other.
>

Are the data in fact two sets of points of occurrences? If so, could the
hypothesised dependence between them be tested using point pattern analysis
rather than geostatistics, say with Ripley's K12-function? Or are they
some other quantity measured on some spatial basis?
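
For the point-pattern case, a naive sketch of the K12 (cross-K) estimate, with no edge correction, might look like the following. The function name and arguments are hypothetical and NumPy is assumed; a real analysis would need an edge correction and a study-region area appropriate to the data.

```python
import numpy as np

def cross_k_naive(points1, points2, radii, area):
    """Naive cross-K estimate without edge correction:
    K12(r) = area / (n1 * n2) * #{(i, j): d(p1_i, p2_j) <= r}.
    Hypothetical helper for illustration."""
    p1 = np.asarray(points1, dtype=float)
    p2 = np.asarray(points2, dtype=float)
    # Distance from every type-1 point to every type-2 point
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=2)
    n1, n2 = len(p1), len(p2)
    return np.array([area * (d <= r).sum() / (n1 * n2) for r in radii])
```

If the two patterns were independent, K12(r) would be expected to lie near pi * r^2; values well above that suggest the two types of point occur closer together than independence would give. Without edge correction the estimate is biased downwards at larger r.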

Just curious,

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand@...
and: Department of Geography and Regional Development, University of
Gdansk, al. Mar. J. Pilsudskiego 46, PL-81 378 Gdynia, Poland.

--
I am contemplating the purchase of ESRI's ResearchAnalyst, and I would
appreciate feedback from anyone who has used it. Is it stable? Worth
purchasing? (BTW: We have AV 3.2 w/ spatial analyst FWIW)

--Shane
Shane Hornibrook, (http://www.geologist.net/)

GIS Statistician/Programmer/Analyst
Department of Community Health and Epidemiology
Clinical Research Centre
5849 University Avenue, Halifax, Nova Scotia, B3H 4H7

--
