1468 Re: AI-GEOSTATS: mysterious kriging output

  • Ruben Roa Ureta
    Mar 9, 2004
      > Hi,
      >
      > I am working myself with pollution data in soils and i have very high
      > values very close to very low values, and highly skewed
      > distribution. I am more and more concerned with doing kriging on
      > transformed data. This simply means we believe the data came
      > from only one population. But what if it comes from 2 different
      > populations representing 2 different polluting processes? Much
      > more if we do believe there are no gross error measurements. The
      > fact that high values are very close to low values would tell me that
      > the spatial autocorrelation is violated locally. I would try first to see
      > if the outliers (local and global) represent a different population, if
      > these values cluster or not, how significant is the association high-
      > low values, and if the global Moran's I increases if i eliminate the
      > "outliers". Maybe the majority of the data which have a higher
      > spatial autocorrelation belong to a "better expressed" diffusive
      > process, (maybe an older one) while the rest of the data which
      > were identified as outliers before, represent a more patch-y or point
      > source pollution process which didn't have time to diffuse over the
      > entire study area (a younger process, maybe?).

      Exploratory analysis of the frequency distribution of the data (i.e. the
      aggregated, non-spatial, frequency) could reveal the existence of two (or
      more) populations. To evaluate the evidence in favour of such an
      hypothesis, you could compare the hypothesis that the frequency
      distribution is formed by a mixture of two (or more) specified
      distributions versus the hypothesis that it is formed by only one. The
      general topic in statistics is called 'mixture distribution analysis' (not
      to be confused with 'mixture models'). Useful references are:

      Everitt & Hand, 1981, Mixture distribution analysis. Chapman & Hall
      Chen & Chen, 2001, Statistics and Probability Letters 52:125
      Hawkins et al., 2001, Computational Statistics & Data Analysis 38:15
      http://www.math.mcmaster.ca/peter/mix/mix.html
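
      The comparison described above (one distribution versus a mixture of
      two) can be sketched as follows. This is only an illustration under
      stated assumptions: both components are taken to be normal (e.g. for
      log-transformed concentrations), the data are simulated, the fit uses a
      plain EM loop, and the two hypotheses are compared with BIC. BIC is my
      choice for the sketch, not something prescribed in the references; the
      usual chi-square likelihood-ratio test does not apply directly to the
      number of mixture components.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_single(data):
    """Maximum-likelihood fit of one normal distribution."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    loglik = sum(math.log(normal_pdf(x, mu, sigma)) for x in data)
    return (mu, sigma), loglik

def fit_two(data, n_iter=200):
    """EM fit of a two-component normal mixture (illustrative, not robust)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Crude starting values: means of the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    s0 = fit_single(data)[0][1]
    sigma = [s0, s0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            tot = max(p[0] + p[1], 1e-300)  # guard against underflow
            resp.append([p[0] / tot, p[1] / tot])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    loglik = sum(
        math.log(sum(w[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)))
        for x in data
    )
    return (w, mu, sigma), loglik

def bic(loglik, n_params, n):
    # Lower BIC means the model is preferred.
    return n_params * math.log(n) - 2.0 * loglik

# Simulated data (an assumption for this sketch): a "background" population
# plus a smaller population with higher values.
random.seed(0)
data = [random.gauss(1.0, 0.5) for _ in range(300)]
data += [random.gauss(4.0, 0.8) for _ in range(100)]

_, ll_single = fit_single(data)
(w, mu, sigma), ll_two = fit_two(data)
single_bic = bic(ll_single, 2, len(data))  # parameters: mu, sigma
two_bic = bic(ll_two, 5, len(data))        # 2 means, 2 sds, 1 free weight
print("BIC, one distribution: ", round(single_bic, 1))
print("BIC, two distributions:", round(two_bic, 1))
```

      With real data the component family (normal, lognormal, etc.) and the
      starting values matter, and EM should be rerun from several starts.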

      Some robust regression methods, for example, are based on treating the
      data as coming from a mixture of two distributions, the main one, and a
      contaminating distribution.

      If you conclude that there are two (or more) distributions, then you can
      compute, for each data point, the conditional probability that it belongs
      to each of the two (or more) distributions, and classify the point into
      the distribution for which that probability is highest. After this
      exploratory analysis, if there is evidence for a mixture, you could treat
      the two (or more) populations differently, and maybe even perform
      separate geostatistical analyses on the separate populations.
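
      A minimal sketch of that classification step, assuming a mixture of
      normal components has already been fitted. The weights, means and
      standard deviations below are hypothetical values chosen for
      illustration, not estimates from any real data set.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(x, weights, means, sds):
    """Conditional probability that x belongs to each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def classify(x, weights, means, sds):
    """Index of the component with the highest conditional probability."""
    post = posterior(x, weights, means, sds)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post[best]

# Hypothetical fitted mixture: component 0 = background, 1 = high values.
weights = [0.75, 0.25]
means = [1.0, 4.0]
sds = [0.5, 0.8]

samples = [0.8, 1.5, 3.9, 5.0]
labels = [classify(x, weights, means, sds)[0] for x in samples]
for x, k in zip(samples, labels):
    print(x, "-> component", k)
```

      The resulting labels could then be used to split the samples (with
      their coordinates) into subsets for separate variogram modelling.
      Points whose highest probability is close to 0.5 are ambiguous and
      probably deserve a closer look before being assigned.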

      I used this general strategy in the analysis of a time series of an index
      of returns from investments in financial markets. The strategy was
      proposed by Hamilton, 1994, Time Series Analysis, Ch. 22, Princeton U. P.

      Ruben

      --
      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org