
AI-GEOSTATS: mysterious kriging output

  • Noemi Barabas
    Message 1 of 5 , Mar 8, 2004
      Dear list,

      I am working on a kriging problem of log-PCB concentrations in
      river sediments (the coordinates have been "straightened"), using GSLib.
      I have strong anisotropy with a ratio of about 1:6 (x:y). I have some
      clustered locations as well as some sparsely sampled areas, and several
      instances where the high and low concentrations are found very close to
      each other. The distribution is lognormal and I am working with
      log-transformed values. The variograms are rather nice in both
      directions. Nevertheless, ordinary kriging gives a very peculiar-looking
      map (of log-concentrations). It would be too difficult to put into
      words, so I have included maps of estimates, variance and local mean as
      an attachment.

      Does anybody know what causes this "plaid" effect? Looking at the map
      of variances, it appears that an estimation location has low variance
      if it has a data point directly above and next to it, but intermediate
      variance if those same two data points are in a diagonal direction
      relative to the axes of anisotropy, even if the new position takes the
      estimation point closer to the data points. I would like to understand the
      reason for this effect, as well as whether there is something that can be
      done about it.

      Could the fact that there are high values embedded in low value locations
      be partially responsible for these strange maps?

      (I did experiment with octant search, various maximum search radii,
      various min and max number of data points for estimation, and this effect
      persists. I even reversed the angles of anisotropy, tried different
      variogram ranges. The variogram ranges are about 20% of the width/length
      of the domain, and the relative nugget effect is about 6% in both
      directions)

      Thanks very much!

      Noemi




      [Non-text portions of this message have been removed]
    • Monica Palaseanu-Lovejoy
      Message 2 of 5 , Mar 9, 2004
        Hi,

        I am also working with pollution data in soils, and I have very high
        values very close to very low values, and highly skewed
        distribution. I am more and more concerned with doing kriging on
        transformed data. This simply means we believe the data came
        from only one population. But what if it comes from 2 different
        populations representing 2 different polluting processes? All the
        more so if we believe there are no gross measurement errors. The
        fact that high values are very close to low values would tell me that
        the spatial autocorrelation is violated locally. I would try first to see
        if the outliers (local and global) represent a different population, if
        these values cluster or not, how significant the high-low
        association is, and whether the global Moran's I increases if I eliminate the
        "outliers". Maybe the majority of the data which have a higher
        spatial autocorrelation belong to a "better expressed" diffusive
        process, (maybe an older one) while the rest of the data which
        were identified as outliers before, represent a more patch-y or point
        source pollution process which didn't have time to diffuse over the
        entire study area (a younger process, maybe?).

        Of course if you have proof that the data came from only one
        population then .... it is a different story.
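The Moran's I check described above can be sketched roughly as follows. This is only a minimal illustration in Python: the function name `morans_i`, the binary distance-band weights, the band width, and the toy transect with one flagged "outlier" are all assumptions made for the example, not something taken from the actual data set.

```python
import numpy as np

def morans_i(coords, values, band):
    """Global Moran's I with binary distance-band weights (assumed weighting)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float) - np.mean(values)
    # Pairwise Euclidean distances between all sample locations.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # w_ij = 1 for distinct locations within the band, 0 otherwise.
    # Collocated samples (distance 0) are also excluded by this rule.
    w = ((dist > 0) & (dist <= band)).astype(float)
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)

# Hypothetical transect: a smooth trend with one high value among low ones.
coords = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [6, 0]], float)
values = np.array([1.0, 2.0, 3.0, 10.0, 4.0, 5.0, 6.0])
is_outlier = np.array([False, False, False, True, False, False, False])

i_all = morans_i(coords, values, band=1.5)
i_trimmed = morans_i(coords[~is_outlier], values[~is_outlier], band=1.5)
print(f"Moran's I, all data:        {i_all:.3f}")
print(f"Moran's I, outlier removed: {i_trimmed:.3f}")
```

A clear increase in Moran's I after removing the flagged values would be consistent with the idea that they disturb the local autocorrelation, although by itself it does not show that they belong to a separate population.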

        I would really appreciate hearing other opinions about these thoughts.

        Thanks,

        Monica

        --
        * To post a message to the list, send it to ai-geostats@...
        * As a general service to the users, please remember to post a summary of any useful responses to your questions.
        * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
        * Support to the list is provided at http://www.ai-geostats.org
      • Ruben Roa Ureta
        Message 3 of 5 , Mar 9, 2004
          > Hi,
          >
          > I am also working with pollution data in soils, and I have very high
          > values very close to very low values, and highly skewed
          > distribution. I am more and more concerned with doing kriging on
          > transformed data. This simply means we believe the data came
          > from only one population. But what if it comes from 2 different
          > populations representing 2 different polluting processes? All the
          > more so if we believe there are no gross measurement errors. The
          > fact that high values are very close to low values would tell me that
          > the spatial autocorrelation is violated locally. I would try first to see
          > if the outliers (local and global) represent a different population, if
          > these values cluster or not, how significant the high-low
          > association is, and whether the global Moran's I increases if I eliminate the
          > "outliers". Maybe the majority of the data which have a higher
          > spatial autocorrelation belong to a "better expressed" diffusive
          > process, (maybe an older one) while the rest of the data which
          > were identified as outliers before, represent a more patch-y or point
          > source pollution process which didn't have time to diffuse over the
          > entire study area (a younger process, maybe?).

          Exploratory analysis of the frequency distribution of the data (i.e. the
          aggregated, non-spatial, frequency) could reveal the existence of two (or
          more) populations. To evaluate the evidence in favour of such a
          hypothesis, you could compare the hypothesis that the frequency
          distribution is formed by a mixture of two (or more) specified
          distributions versus the hypothesis that it is formed by only one. The
          general topic in statistics is called 'mixture distribution analysis' (not
          to be confused with 'mixture models'). Useful references are:

          Everitt & Hand, 1981, Mixture distribution analysis. Chapman & Hall
          Chen & Chen, 2001, Statistics and Probability Letters 52:125
          Hawkins et al., 2001, Computational Statistics & Data Analysis 38:15
          http://www.math.mcmaster.ca/peter/mix/mix.html

          Some robust regression methods, for example, are based on treating the
          data as coming from a mixture of two distributions, the main one, and a
          contaminating distribution.

          If you conclude that there are two (or more) distributions, then you can
          compute the maximum conditional probability that any given data point
          belongs to each of the two (or more) distributions, and use this computation
          to classify data. After this exploratory analysis, you could treat the two
          (or more) populations differently, if there is evidence for a mixture, and
          maybe even perform separate geostatistical analyses on the separate
          populations.
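A minimal sketch of this strategy in Python, assuming two normal components fitted to the log-values by a plain EM algorithm. The function names (`fit_two_normals`, `posterior`, `mixture_loglik`), the starting values, and the simulated "log-concentrations" are hypothetical and only stand in for real data. The difference in log-likelihood between the two-component and one-component fits is shown as a descriptive indicator only, because the usual chi-square reference distribution does not apply directly to this comparison.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior(x, weights, means, sds):
    """Conditional probability of each component (rows) for each datum (columns)."""
    dens = weights[:, None] * normal_pdf(x[None, :], means[:, None], sds[:, None])
    return dens / dens.sum(axis=0)

def mixture_loglik(x, weights, means, sds):
    """Log-likelihood of the data under the fitted mixture."""
    dens = weights[:, None] * normal_pdf(x[None, :], means[:, None], sds[:, None])
    return float(np.log(dens.sum(axis=0)).sum())

def fit_two_normals(x, n_iter=200):
    """EM for a two-component normal mixture; crude start from a median split."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    means = np.array([x[x <= med].mean(), x[x > med].mean()])
    sds = np.array([x.std(), x.std()])
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        post = posterior(x, weights, means, sds)          # E-step
        nk = post.sum(axis=1)                             # M-step below
        weights = nk / len(x)
        means = (post @ x) / nk
        sds = np.sqrt((post * (x - means[:, None]) ** 2).sum(axis=1) / nk)
        sds = np.maximum(sds, 1e-6)                       # guard against collapse
    return weights, means, sds

# Simulated log-values: 300 "background" points and 100 "contaminated" points.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], [300, 100])
log_conc = np.where(truth == 0,
                    rng.normal(0.0, 0.5, 400),
                    rng.normal(4.0, 0.5, 400))

weights, means, sds = fit_two_normals(log_conc)
p_high = weights[1]

# Classify each datum by its maximum conditional (posterior) probability.
labels = posterior(log_conc, weights, means, sds).argmax(axis=0)

loglik_two = mixture_loglik(log_conc, weights, means, sds)
loglik_one = float(np.log(normal_pdf(log_conc, log_conc.mean(), log_conc.std())).sum())
print("fitted means:", means, "fitted weights:", weights)
print("log-likelihood, one normal:", loglik_one, " two normals:", loglik_two)
```

The `labels` array could then be used to split the data before separate variogram modelling, keeping in mind that the classification ignores the sample locations.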

          I used this general strategy in the analysis of a time series of an index
          of returns from investments in financial markets. The strategy was
          proposed by Hamilton, 1994, Time Series Analysis, Ch. 22, Princeton U. P.

          Ruben

        • Pierre Goovaerts
          Message 4 of 5 , Mar 9, 2004
            Hello,

            I agree that in many environmental datasets we could question the
            assumption of existence of a single population. Although there are
            ways to split the data into several populations, the key issue is
            that the study area needs also to be stratified into several populations.
            In some fields, such as geology, geological maps could provide
            a stratification of the study area and help delineate the boundaries
            between populations. This is far less obvious for environmental
            data sets.

            Looking at Noemi's maps, I would agree with Richard's comment that
            nothing seems to be out of the ordinary. Of course, when dealing with
            streams the data configuration is far from optimal and screening effects
            abound. Also, the strong anisotropy ratio means that we deal with
            a "zonal-like" anisotropy which might cause sudden changes of covariance
            for slight differences in angle. In particular, this covariance model
            could lead to very small correlations off the two main axes of anisotropy,
            which could explain the larger kriging variance observed along the
            diagonal directions.
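To make this concrete, here is a small Python sketch of the ordinary kriging variance under a strongly anisotropic covariance. Everything numeric in it is an assumption for illustration: a spherical model (the model type was not stated in the thread), ranges of 1 and 6 units along x and y to mimic the 1:6 ratio, a unit sill with a 6% nugget, and two hypothetical data configurations around a target at the origin. The function names `aniso_cov` and `ok_variance` are likewise made up for the example.

```python
import numpy as np

def aniso_cov(d, range_x=1.0, range_y=6.0, sill=1.0, nugget=0.06):
    """Spherical covariance with geometric anisotropy for separation vectors d."""
    d = np.asarray(d, dtype=float)
    # Rescale each axis by its range, so h = 1 is the range in any direction.
    h = np.sqrt((d[..., 0] / range_x) ** 2 + (d[..., 1] / range_y) ** 2)
    sph = np.where(h < 1.0, 1.5 * h - 0.5 * h ** 3, 1.0)
    # The nugget makes the covariance jump from the sill at h = 0.
    return np.where(h == 0.0, sill, (sill - nugget) * (1.0 - sph))

def ok_variance(data_xy, target_xy, **model):
    """Ordinary kriging variance at target_xy given data locations only."""
    data_xy = np.asarray(data_xy, dtype=float)
    n = len(data_xy)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = aniso_cov(data_xy[:, None, :] - data_xy[None, :, :], **model)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = aniso_cov(data_xy - np.asarray(target_xy, dtype=float), **model)
    sol = np.linalg.solve(lhs, rhs)
    w, mu = sol[:n], sol[n]          # kriging weights and Lagrange multiplier
    return float(aniso_cov(np.zeros(2), **model) - w @ rhs[:n] - mu)

target = (0.0, 0.0)
# Two data on the long (y) axis, each 3 units away from the target.
var_axis = ok_variance([(0.0, 3.0), (0.0, -3.0)], target)
# Two data on the diagonal, each only about 1.41 units away.
var_diag = ok_variance([(1.0, 1.0), (-1.0, -1.0)], target)
print(f"kriging variance, data along the y axis: {var_axis:.4f}")
print(f"kriging variance, data on the diagonal:  {var_diag:.4f}")
```

In this toy setting the diagonal data are closer in Euclidean distance, yet they lie beyond the short range once the x offset is rescaled, so they carry almost no correlation with the target and the kriging variance is larger.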

            Pierre

            <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

            Dr. Pierre Goovaerts
            President of PGeostat, LLC
            Chief Scientist with Biomedware Inc.
            710 Ridgemont Lane
            Ann Arbor, Michigan, 48103-1535, U.S.A.

            E-mail: goovaert@...
            Phone: (734) 668-9900
            Fax: (734) 668-7788
            http://alumni.engin.umich.edu/~goovaert/

            <><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

            On Tue, 9 Mar 2004, Monica Palaseanu-Lovejoy wrote:

            > Hi,
            >
            > I am also working with pollution data in soils, and I have very high
            > values very close to very low values, and highly skewed
            > distribution. I am more and more concerned with doing kriging on
            > transformed data. This simply means we believe the data came
            > from only one population. But what if it comes from 2 different
            > populations representing 2 different polluting processes? All the
            > more so if we believe there are no gross measurement errors. The
            > fact that high values are very close to low values would tell me that
            > the spatial autocorrelation is violated locally. I would try first to see
            > if the outliers (local and global) represent a different population, if
            > these values cluster or not, how significant the high-low
            > association is, and whether the global Moran's I increases if I eliminate the
            > "outliers". Maybe the majority of the data which have a higher
            > spatial autocorrelation belong to a "better expressed" diffusive
            > process, (maybe an older one) while the rest of the data which
            > were identified as outliers before, represent a more patch-y or point
            > source pollution process which didn't have time to diffuse over the
            > entire study area (a younger process, maybe?).
            >
            > Of course if you have proof that the data came from only one
            > population then .... it is a different story.
            >
            > I would really appreciate hearing other opinions about these thoughts.
            >
            > Thanks,
            >
            > Monica
