[ai-geostats] question about kriging with skewed distribution

  • Ing. Marek Brabec PhD
    Message 1 of 3 , Mar 4 7:39 AM
      hello,
      I have a question about what is/should be typically done when kriging is
      used for spatial interpolation of a process X(z) where z gives spatial
      location (e.g. z=(x,y) with cartesian coordinates x,y) and X(z) has a
      skewed continuous distribution with nonnegative support. For instance
      lognormal.
      Now,
      if all data are in the form of point samples, X(z)'s can obviously be
      transformed by taking logs to Y(z)=log(X(z)) which are exactly (with
      lognormal X's) or approximately Gaussian, so that kriging can be done
      comfortably (and the result backtransformed with easy correction for the
      fact that E f(X) is generally not equal to f(E X), based on the formula
      for lognormal expected value or Taylor expansion).
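The back-transformation correction mentioned above can be illustrated with the lognormal mean formula, E[X] = exp(mu + sigma^2/2) for X = exp(Y) with Y ~ Normal(mu, sigma^2). This is a minimal sketch: the function names and the numbers are hypothetical, and in an actual kriging workflow `mu` and `sigma2` would stand for the log-scale prediction and its variance; the exact correction for a given kriging variant may contain additional terms.

```python
import math

def lognormal_mean(mu, sigma2):
    """E[X] for X = exp(Y), Y ~ Normal(mu, sigma2)."""
    return math.exp(mu + 0.5 * sigma2)

def naive_backtransform(mu):
    """exp(E[Y]): this is the median of X, not its mean."""
    return math.exp(mu)

# Hypothetical log-scale prediction and variance, for illustration only.
mu, sigma2 = 1.0, 0.8
print("naive exp(mu):      ", naive_backtransform(mu))
print("corrected mean E[X]:", lognormal_mean(mu, sigma2))
```

The naive value is always smaller than the corrected one whenever the variance is positive, which is the E f(X) versus f(E X) gap in this special case.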
      If at least some data are not point samples, but correspond to
      regional averages, then a problem arises because: i) a sum
      of lognormals is not lognormal, ii) the log of a sum (or average)
      of lognormals is not normal.
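A quick moment check makes point i) concrete (and ii) follows, since log S is normal exactly when S is lognormal). A lognormal's skewness is fully determined by its mean and variance, so if the sum of two independent, identically distributed lognormals were lognormal, its skewness would have to equal that of the lognormal with the same mean and variance. The sketch below assumes independence, which spatial data generally do not satisfy; the function names are hypothetical.

```python
import math

def lognormal_moments(mu, sigma2):
    """Mean, variance and skewness of exp(Y), Y ~ Normal(mu, sigma2)."""
    w = math.exp(sigma2)
    mean = math.exp(mu + 0.5 * sigma2)
    var = (w - 1.0) * mean ** 2
    skew = (w + 2.0) * math.sqrt(w - 1.0)
    return mean, var, skew

def skew_of_iid_sum(skew, n):
    """Skewness of a sum of n iid variables, each with the given skewness."""
    return skew / math.sqrt(n)

def skew_of_matched_lognormal(mean, var):
    """Skewness of the lognormal that has this mean and variance."""
    w = 1.0 + var / mean ** 2
    return (w + 2.0) * math.sqrt(w - 1.0)

mean, var, skew = lognormal_moments(0.0, 1.0)
n = 2
skew_sum = skew_of_iid_sum(skew, n)                      # true skewness of the sum
skew_fit = skew_of_matched_lognormal(n * mean, n * var)  # what a lognormal would need
print(skew_sum, skew_fit)  # the two values differ, so the sum is not lognormal
```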
      Obviously, one can do:
      i) the kriging on logs anyway with some hand-waving (effectively
      replacing sums by products based on delta method),
      ii) or one can (quite inefficiently) work with original data without log
      transformation and argue that at least method of moments estimators
      are invoked (with proper weighting),
      iii) or one can use some kind of Monte Carlo computationally-intensive
      approach to compute likelihood (or posterior) based on sums of
      lognormals.
      At this point, I am not interested in any of the three. My question
      is whether people have used some other parametric family (it cannot be
      lognormal) of marginal distributions with positive support and positive
      skew that is closed under convolution (or under taking weighted
      averages, to be more general), so that the regional averages and point
      values have distributions of the same type, differing only in
      parameters (just as in the normal case with real support). One
      possibility would be the gamma; what about others?
      Thanks in advance for any suggestions.
      Best Regards
      Ing. Marek Brabec, PhD
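For the gamma suggestion in the question above, a small cumulant check shows where closure holds and where it does not. For independent gammas, cumulants add, and Gamma(shape k, scale s) has r-th cumulant k * s^r * (r-1)!. With a common scale the sum is again gamma (and so is an equal-weight average, since X/n just rescales the scale); with unequal scales, or equivalently unequal weights, it is not. This is only a sketch assuming independence, which is a strong assumption for spatially correlated data, and matching a finite number of cumulants is a necessary check, not a proof. All names are hypothetical.

```python
import math

def gamma_cumulant(shape, scale, r):
    """r-th cumulant of a Gamma(shape, scale) variable."""
    return shape * scale ** r * math.factorial(r - 1)

def sum_cumulant(parts, r):
    """r-th cumulant of a sum of independent gammas; parts = [(shape, scale), ...]."""
    return sum(gamma_cumulant(k, s, r) for k, s in parts)

def matched_gamma(parts):
    """Gamma (shape, scale) with the same mean and variance as the sum."""
    m, v = sum_cumulant(parts, 1), sum_cumulant(parts, 2)
    return m * m / v, v / m

def sum_looks_gamma(parts, max_order=6):
    """True if cumulants 1..max_order of the sum match the moment-matched gamma."""
    k, s = matched_gamma(parts)
    return all(
        math.isclose(sum_cumulant(parts, r), gamma_cumulant(k, s, r), rel_tol=1e-9)
        for r in range(1, max_order + 1)
    )

print(sum_looks_gamma([(1.5, 2.0), (0.7, 2.0)]))  # common scale
print(sum_looks_gamma([(1.5, 2.0 / 4)] * 4))      # equal-weight average of 4 iid Gamma(1.5, 2)
print(sum_looks_gamma([(1.5, 2.0), (0.7, 1.0)]))  # unequal scales (unequal weights)
```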
    • Ruben Roa Ureta
      Message 2 of 3 , Mar 5 5:55 AM
        > hello,
        > I have a question about what is/should be typically done when kriging is
        > used for spatial interpolation of a process X(z) where z gives spatial
        > location (e.g. z=(x,y) with cartesian coordinates x,y) and X(z) has a
        > skewed continuous distribution with nonnegative support. For instance
        > lognormal.

        As far as I know, traditional geostatistics as originated by Matheron is
        distribution-free. The analysis does not require a pre-experimental
        probability model for the data, thus it does not rely on any
        post-experimental likelihood function. So it does not matter, for kriging,
        if the data are skewed or have any shape whatsoever. That is the theory at
        least. People may still want to work with symmetrical distributions
        because they may not be entirely comfortable with the theory?

        > Now,
        > if all data are in the form of point samples, X(z)'s can obviously be
        > transformed by taking logs to Y(z)=log(X(z)) which are exactly (with
        > lognormal X's) or approximately Gaussian, so that kriging can be done
        > comfortably (and the result backtransformed with easy correction for the
        > fact that E f(X) is generally not equal to f(E X), based on the formula
        > for lognormal expected value or Taylor expansion).

        Yes, though the data may be only approximately lognormal. If they are
        exactly lognormal then the parameter of the Box-Cox transformation is 0,
        but values like -0.1 or +0.1 can produce more symmetrical distributions.
        This parameter can be estimated along with the spatial correlation function
        parameters to let the data decide which transformation makes them
        look most Gaussian. For this you would need to set up a formal statistical
        model for the data instead of following the traditional distribution-free
        methodology. Check the info on geoR, a contributed package for R.
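For reference, the Box-Cox family referred to here is (x^lambda - 1)/lambda for lambda != 0 and log(x) for lambda = 0, so the log transform is the limiting case. The sketch below only defines the transform; it does not estimate lambda jointly with the spatial correlation parameters, which is the part that needs a likelihood-based model. The function name is hypothetical.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)           # limiting case: the log transform
    return (x ** lam - 1.0) / lam

# Values of lam near 0 behave almost like the log transform.
for lam in (-0.1, 0.0, 0.1):
    print(lam, box_cox(5.0, lam))
```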

        > If at least some data are not point samples, but correspond to
        > regional averages, then a problem arises because: i) a sum
        > of lognormals is not lognormal, ii) the log of a sum (or average)
        > of lognormals is not normal.

        If you don't have raw data but averages then within the likelihood-based
        approach you may want to think of a marginal likelihood model to carry
        over the uncertainty associated with the averaging into the final
        analysis. I think this is rather complicated. On the other hand, maybe
        there is no such problem within the traditional distribution-free school
        because the uncertainty associated with the fitting of the spatial model
        is usually ignored.

        [snip the rest for brevity]

        Ruben
      • Isobel Clark
        Message 3 of 3 , Mar 5 6:21 AM
          Ruben (et al)

          It is true that Matheron's theory is based on no
          distributional assumptions. In fact, there is no
          requirement for the distribution to be the same at
          every location in the study area.

          The requirement for using traditional geostatistical
          theory is that the 'difference between two values'
          should have a common distribution for a specified
          distance (and possibly direction). The form of this
          distribution is irrelevant but it needs to possess a
          mean and variance.

          The problem lies not with the theory but with the
          practice. If you have the whole 'realisation' you can
          calculate the true average and variance and the shape
          of each distribution is irrelevant. If you have only a
          few samples, then you can only find estimates for the
          means and variances at each distance.
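The "estimates for the means and variances at each distance" are what the usual method-of-moments semi-variogram computes: half the average squared difference between pairs of samples separated by a given lag. A minimal sketch for regularly spaced samples along a line follows; the function name and the tiny data set are hypothetical, and real 2-D data would need distance (and possibly direction) binning.

```python
def empirical_semivariogram(z, lag):
    """Half the mean squared difference of values `lag` steps apart (1-D, regular spacing)."""
    if not 1 <= lag < len(z):
        raise ValueError("lag must be between 1 and len(z) - 1")
    diffs = [(z[i + lag] - z[i]) ** 2 for i in range(len(z) - lag)]
    return sum(diffs) / (2.0 * len(diffs))

z = [1.0, 2.0, 3.0, 4.0]  # hypothetical samples at unit spacing
for lag in (1, 2):
    print(lag, empirical_semivariogram(z, lag))
```

With few pairs per lag, each of these estimates rests on a handful of squared differences, which is why they become unstable for skewed data.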

          If the underlying distribution is highly skewed then,
          unless you have ideal conditions (large number of
          samples, regular sampling locations), your estimate of
          the variance will be unstable -- influenced by the
          average of the samples included in the particular
          estimate. There was a huge amount of debate about this
          "proportional effect" back in the 70s [search for
          'relative semi-variogram'].
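One common form of the relative semi-variogram divides the ordinary estimate at each lag by the squared mean of the samples that entered it, so that a variance which scales with the local mean is standardised. The sketch below uses that "general relative" form for 1-D regularly spaced samples; it is an assumption that this is the variant meant here, and the function name and data are hypothetical.

```python
def general_relative_semivariogram(z, lag):
    """Semi-variogram at `lag` divided by the squared mean of the paired values."""
    if not 1 <= lag < len(z):
        raise ValueError("lag must be between 1 and len(z) - 1")
    tails = z[: len(z) - lag]   # first member of each pair
    heads = z[lag:]             # second member of each pair
    n = len(tails)
    gamma = sum((h - t) ** 2 for t, h in zip(tails, heads)) / (2.0 * n)
    pair_mean = (sum(tails) + sum(heads)) / (2.0 * n)
    return gamma / pair_mean ** 2

z = [1.0, 2.0, 3.0, 4.0]          # hypothetical samples
z_scaled = [10.0 * v for v in z]  # same pattern at ten times the grade
print(general_relative_semivariogram(z, 1))
print(general_relative_semivariogram(z_scaled, 1))  # unchanged by the rescaling
```

The point of the design is visible in the last two lines: multiplying all values by a constant changes the ordinary semi-variogram by the square of that constant but leaves the relative version unchanged.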

          So, you have two potential problems:

          (1) you may not get any true picture of the
          semi-variogram due to the uncertainty associated with
          each point exacerbated by the proportional effect;

          (2) you may not wish to use an averaging technique
          such as kriging on skewed samples. All of Sichel's
          (mining) and much of Krige's work was motivated by the
          fact that local averaging is not sensible when your
          data has a coefficient of variation greater than
          around 1.
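To relate the "coefficient of variation greater than around 1" rule of thumb to the lognormal case: a lognormal with log-variance sigma^2 has CV = sqrt(exp(sigma^2) - 1), so the CV exceeds 1 once sigma^2 exceeds ln 2 (about 0.69). The sketch below computes both this theoretical CV and a plain sample CV; the function names and data are hypothetical.

```python
import math
import statistics

def lognormal_cv(sigma2):
    """Coefficient of variation of exp(Y), Y ~ Normal(mu, sigma2); independent of mu."""
    return math.sqrt(math.exp(sigma2) - 1.0)

def sample_cv(values):
    """Sample standard deviation divided by sample mean."""
    return statistics.stdev(values) / statistics.mean(values)

print(lognormal_cv(math.log(2.0)))    # the threshold case, CV of about 1
print(sample_cv([0.2, 0.5, 0.9, 6.0]))  # hypothetical skewed sample
```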

          The theory is terrific, witness its survival for over
          40 years and its proliferation over many fields of
          application. However, real life isn't so tidy at the
          sharp end ;-)

          Isobel
          http://geoecosse.bizland.com/whatsnew.htm