- Dear list,

I am working on a kriging problem of log-PCB concentrations in

river sediments (the coordinates have been "straightened"), using GSLib.

I have strong anisotropy with a ratio of about 1:6 (x:y). I have some

clustered locations as well as some sparsely sampled areas, and several

instances where the high and low concentrations are found very close to

each other. The distribution is lognormal and I am working with

log-transformed values. The variograms are rather nice in both

directions. Nevertheless, ordinary kriging gives a very peculiar-looking

map (of log-concentrations). It would be too difficult to put into

words, so I have included maps of estimates, variance and local mean as

an attachment.

Does anybody know what causes this "plaid" effect? Looking at the map

of variances, it appears that an estimation location has low variance

if it has a data point directly above and next to it, but intermediate

variance if those same two data points are in a diagonal direction

relative to the axes of anisotropy, even if the new position takes the

estimation point closer to the data points. I would like to understand the

reason for this effect, as well as whether there is something that can be

done about it.

Could the fact that there are high values embedded in low value locations

be partially responsible for these strange maps?

(I did experiment with octant search, various maximum search radii,

various min and max number of data points for estimation, and this effect

persists. I even reversed the angles of anisotropy and tried different

variogram ranges. The variogram ranges are about 20% of the width/length

of the domain, and the relative nugget effect is about 6% in both

directions.)

Thanks very much!

Noemi

[Non-text portions of this message have been removed]

- Hi,

I am also working with pollution data in soils, and I have very high

values very close to very low values, and a highly skewed

distribution. I am more and more concerned about kriging

transformed data. Doing so implies that we believe the data came

from only one population. But what if they come from 2 different

populations representing 2 different polluting processes? All the

more so if we believe there are no gross measurement errors. The

fact that high values are very close to low values would tell me that

the spatial autocorrelation is violated locally. I would try first to see

if the outliers (local and global) represent a different population, if

these values cluster or not, how significant the high-low association

is, and whether the global Moran's I increases if I eliminate the

"outliers". Maybe the majority of the data which have a higher

spatial autocorrelation belong to a "better expressed" diffusive

process, (maybe an older one) while the rest of the data which

were identified as outliers before, represent a more patch-y or point

source pollution process which didn't have time to diffuse over the

entire study area (a younger process, maybe?).
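The Moran's I check suggested above can be sketched in a few lines. This is only a minimal illustration, assuming binary distance-band weights (1 if two samples lie within `band` of each other, 0 otherwise); the function names, the toy coordinates and the choice of `band` are hypothetical and not taken from any dataset in this thread.

```python
import math

def morans_i(coords, values, band):
    """Global Moran's I with binary weights: w_ij = 1 if i != j and dist(i, j) <= band."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum over pairs of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if d <= band:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * (cross / sum(d * d for d in dev))

def morans_i_without(coords, values, band, drop):
    """Moran's I recomputed after removing the sample indices listed in `drop`."""
    keep = [k for k in range(len(values)) if k not in drop]
    return morans_i([coords[k] for k in keep], [values[k] for k in keep], band)

# Toy transect (assumed data): a smooth trend with one high value embedded at index 2.
coords = [(float(k), 0.0) for k in range(6)]
values = [1.0, 2.0, 9.0, 4.0, 5.0, 6.0]
print(morans_i(coords, values, band=1.5))
print(morans_i_without(coords, values, band=1.5, drop={2}))
```

If removing the suspected outliers raises Moran's I noticeably, that is consistent with (but does not prove) the idea that they disturb an otherwise well-structured spatial pattern. The result depends on the weighting scheme, so it is worth repeating with other bands.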

Of course if you have proof that the data came from only one

population then .... it is a different story.

I would really appreciate hearing other opinions about these thoughts.

Thanks,

Monica

--

* To post a message to the list, send it to ai-geostats@...

* As a general service to the users, please remember to post a summary of any useful responses to your questions.

* To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list

* Support to the list is provided at http://www.ai-geostats.org

> Hi,

>

> [...]

Exploratory analysis of the frequency distribution of the data (i.e. the

aggregated, non-spatial, frequency) could reveal the existence of two (or

more) populations. To evaluate the evidence in favour of such an

hypothesis, you could compare the hypothesis that the frequency

distribution is formed by a mixture of two (or more) specified

distributions versus the hypothesis that it is formed by only one. The

general topic in statistics is called 'mixture distribution analysis' (not

to be confused with 'mixture models'). Useful references are:

Everitt & Hand, 1981, Finite Mixture Distributions. Chapman & Hall

Chen & Chen, 2001, Statistics and Probability Letters 52:125

Hawkins et al., 2001, Computational Statistics & Data Analysis 38:15

http://www.math.mcmaster.ca/peter/mix/mix.html

Some robust regression methods, for example, are based on treating the

data as coming from a mixture of two distributions, the main one, and a

contaminating distribution.

If you conclude that there are two (or more) distributions, then you can

compute, for each data point, the conditional probability that it belongs

to each of the two (or more) distributions, and assign the point to the

most probable one. After this exploratory analysis, you could treat the two

(or more) populations differently, if there is evidence for a mixture, and

maybe even perform separate geostatistical analyses on the separate

populations.
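As a rough illustration of the strategy described above, here is a minimal sketch that fits a two-component Gaussian mixture to one-dimensional values by EM, compares its log-likelihood with that of a single Gaussian, and classifies each value by its largest conditional probability. The Gaussian components, the crude initialisation, the function names and the toy values are all assumptions made for the example. Note that the likelihood ratio for "one versus two components" does not follow the usual chi-square reference distribution, so the raw comparison is only indicative.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(data, weights, means, sigmas):
    """Conditional probability of each component given each data value."""
    out = []
    for x in data:
        dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
        total = sum(dens)
        out.append([d / total for d in dens])
    return out

def log_likelihood(data, weights, means, sigmas):
    return sum(math.log(sum(w * normal_pdf(x, m, s)
                            for w, m, s in zip(weights, means, sigmas)))
               for x in data)

def fit_two_gaussians(data, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture (fixed number of iterations)."""
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    grand_mean = sum(xs) / n
    spread = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    # Crude start: means of the lower and upper halves, common spread.
    weights = [0.5, 0.5]
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    sigmas = [spread, spread]
    for _ in range(n_iter):
        resp = posterior(data, weights, means, sigmas)       # E-step
        for k in range(2):                                   # M-step
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)            # guard against collapse
    return weights, means, sigmas

def classify(data, weights, means, sigmas):
    """Assign each value to the component with the largest conditional probability."""
    return [max(range(len(p)), key=lambda k: p[k])
            for p in posterior(data, weights, means, sigmas)]

# Toy log-concentrations (assumed): a background group and a high group.
data = [0.8, 0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 1.2, 4.7, 4.9, 5.0, 5.0, 5.1, 5.3]
w, m, s = fit_two_gaussians(data)
labels = classify(data, w, m, s)

# Single-Gaussian fit for comparison.
mu1 = sum(data) / len(data)
sd1 = math.sqrt(sum((x - mu1) ** 2 for x in data) / len(data))
print(labels)
print(log_likelihood(data, w, m, s), log_likelihood(data, [1.0], [mu1], [sd1]))
```

The labels could then be used to split the samples before computing separate variograms, keeping in mind that the classification ignores the sample locations entirely.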

I used this general strategy in the analysis of a time series of an index

of returns from investments in financial markets. The strategy was

proposed by Hamilton, 1994, Time Series Analysis, Ch. 22, Princeton U. P.

Ruben

- Hello,

I agree that in many environmental datasets we could question the

assumption of existence of a single population. Although there are

ways to split the data into several populations, the key issue is

that the study area also needs to be stratified into several populations.

In some fields, such as geology, geological maps can provide

a stratification of the study area and help delineate the boundaries

between populations. This is far less obvious for environmental

data sets.

Looking at Noemi's maps, I would agree with Richard's comment that

nothing seems to be out of the ordinary. Of course, when dealing with

streams the data configuration is far from optimal and screening effects

abound. Also, the strong anisotropy ratio means that we deal with

a "zonal-like" anisotropy which might cause sudden changes of covariance

for slight differences in angle. In particular, this covariance model

could lead to very small correlations off the two main axes of anisotropy,

which could explain the larger kriging variance observed along the

diagonal directions.
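To make the anisotropy argument concrete, the sketch below computes the model correlation between an estimation point and a datum under geometric anisotropy. It assumes a spherical model, a range of 1 along x and 6 along y (one reading of the 1:6 ratio mentioned in the thread), and the 6% relative nugget; the model type, the units and the two datum positions are illustrative assumptions, not values from Noemi's parameter file.

```python
import math

def reduced_distance(dx, dy, range_x, range_y):
    """Anisotropic lag: each component is scaled by the range along its axis."""
    return math.hypot(dx / range_x, dy / range_y)

def spherical_correlation(h, nugget=0.06):
    """Correlation for a unit-sill spherical model at reduced distance h."""
    if h == 0.0:
        return 1.0
    if h >= 1.0:
        return 0.0
    return (1.0 - nugget) * (1.0 - 1.5 * h + 0.5 * h ** 3)

RANGE_X, RANGE_Y = 1.0, 6.0   # assumed ranges giving a 1:6 (x:y) ratio

# Datum straight along the long-range axis, 3 units from the estimation point.
above = (0.0, 3.0)
# Datum on the diagonal, only about 1 unit away in Euclidean terms.
diagonal = (0.7, 0.7)

for name, (dx, dy) in (("above", above), ("diagonal", diagonal)):
    h = reduced_distance(dx, dy, RANGE_X, RANGE_Y)
    print(name, math.hypot(dx, dy), h, spherical_correlation(h))
```

Under these assumptions the diagonal datum is about three times closer in ordinary distance, yet its reduced distance is larger and its correlation with the estimation point is lower, because any offset along the short-range axis is penalised heavily. A lower correlation with the nearby data translates into a higher kriging variance, which fits the pattern seen on the variance map.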

Pierre

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Dr. Pierre Goovaerts

President of PGeostat, LLC

Chief Scientist with Biomedware Inc.

710 Ridgemont Lane

Ann Arbor, Michigan, 48103-1535, U.S.A.

E-mail: goovaert@...

Phone: (734) 668-9900

Fax: (734) 668-7788

http://alumni.engin.umich.edu/~goovaert/

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

On Tue, 9 Mar 2004, Monica Palaseanu-Lovejoy wrote:

> [...]