
Re: [ai-geostats] question about kriging with skewed distribution

  • Ruben Roa Ureta
    Message 1 of 3, Mar 5, 2005
      > hello,
      > I have a question about what is/should be typically done when kriging is
      > used for spatial interpolation of a process X(z) where z gives spatial
      > location (e.g. z=(x,y) with cartesian coordinates x,y) and X(z) has a
      > skewed continuous distribution with nonnegative support. For instance
      > lognormal.

      As far as I know, traditional geostatistics as originated by Matheron is
      distribution-free. The analysis does not require a pre-experimental
      probability model for the data, so it does not rely on any
      post-experimental likelihood function. For kriging, then, it does not
      matter whether the data are skewed or have any other shape whatsoever.
      That is the theory, at least. People may still want to work with
      symmetrical distributions because they may not be entirely comfortable
      with the theory?
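A minimal NumPy sketch of ordinary kriging illustrates the point: the weights come only from a variogram model and the sample geometry, so nothing in the kriging system refers to the shape of the data distribution. The spherical model, its parameter names and the example values are illustrative assumptions, not something taken from this thread.

```python
import numpy as np

def spherical_variogram(h, nugget, sill, rng):
    """Spherical semivariogram (assumed model); 'sill' is the partial sill."""
    h = np.asarray(h, dtype=float)
    inside = nugget + sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    gamma = np.where(h < rng, inside, nugget + sill)
    return np.where(h == 0, 0.0, gamma)  # gamma(0) = 0 by definition

def ordinary_kriging(coords, values, target, vario):
    """Ordinary-kriging estimate, kriging variance and weights at `target`."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Distances between data points, and from each data point to the target
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=-1)
    # Variogram-form kriging system; the extra row/column is the Lagrange
    # multiplier enforcing sum(weights) == 1 (unbiasedness)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = vario(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = vario(d0)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ values
    variance = weights @ b[:n] + mu
    return estimate, variance, weights

vario = lambda h: spherical_variogram(h, nugget=0.0, sill=1.0, rng=2.0)
est, var, w = ordinary_kriging([[0, 0], [1, 0], [0, 1], [1, 1]],
                               [1.0, 2.0, 3.0, 10.0], [0.5, 0.5], vario)
print(est, var, w)
```

Replacing the values by any skewed sample changes the estimate but not the weights, which depend only on the variogram and the locations.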

      > Now,
      > if all data are in the form of point samples, X(z)'s can obviously be
      > transformed by taking logs to Y(z)=log(X(z)) which are exactly (with
      > lognormal X's) or approximately Gaussian, so that kriging can be done
      > comfortably (and the result backtransformed with easy correction for the
      > fact that E f(X) is generally not equal to f(E X), based on the formula
      > for lognormal expected value or Taylor expansion).
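The correction the question alludes to is the lognormal mean: if Y = log X is Gaussian with mean mu and variance sigma^2, then E[X] = exp(mu + sigma^2/2), whereas exp(mu) is only the median of X. The sketch below checks this on simulated data (parameters chosen arbitrarily). It shows only this marginal identity, not the full lognormal-kriging back-transform, which also involves the kriging variance.

```python
import numpy as np

def lognormal_mean(mu, sigma2):
    """E[X] for X = exp(Y) with Y ~ N(mu, sigma2)."""
    return np.exp(mu + 0.5 * sigma2)

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.8            # assumed log-scale parameters
x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
y = np.log(x)

naive = np.exp(y.mean())        # exp(E[log X]): the median, biased low for the mean
corrected = lognormal_mean(y.mean(), y.var())
print(x.mean(), naive, corrected)
```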

      Yes, though the data may be only approximately lognormal. If they are
      exactly lognormal, the parameter of the Box-Cox transformation is 0, but
      values like -0.1 or +0.1 can produce more symmetrical distributions. This
      parameter can be estimated along with the spatial correlation function
      parameters, letting the data decide which precise transformation makes
      them look most Gaussian. For this you would need to set up a formal
      statistical model for the data instead of following the traditional
      distribution-free methodology. Check the information on geoR, a
      contributed package for R.
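As a rough sketch of letting the data choose the transformation, the code below estimates the Box-Cox parameter by maximising its profile log-likelihood over a grid. Unlike the spatial model suggested above (and unlike what geoR does), it assumes the transformed values are independent Gaussian, so spatial correlation is ignored entirely; the grid range, function names and simulated data are assumptions for illustration.

```python
import numpy as np

def boxcox(x, lam):
    """Box-Cox transform of positive data; lam == 0 is the log transform."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def boxcox_profile_loglik(x, lam):
    """Profile log-likelihood of lam, assuming i.i.d. Gaussian transformed
    values (no spatial correlation in this sketch)."""
    x = np.asarray(x, dtype=float)
    y = boxcox(x, lam)
    n = len(x)
    sigma2 = y.var()  # MLE of the variance (ddof=0)
    # The Jacobian of the transform contributes (lam - 1) * sum(log x)
    return -0.5 * n * np.log(sigma2) + (lam - 1.0) * np.log(x).sum()

def estimate_lambda(x, grid=np.linspace(-1.0, 1.0, 201)):
    """Grid-search maximiser of the profile log-likelihood."""
    ll = [boxcox_profile_loglik(x, lam) for lam in grid]
    return grid[int(np.argmax(ll))]

rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.0, sigma=0.5, size=5_000)
lam_hat = estimate_lambda(x)
print(lam_hat)  # expected to be near 0 for lognormal data
```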

      > If at least some data are not point samples but correspond to
      > regional averages, then a problem arises because: i) a sum
      > of lognormals is not lognormal, and ii) the log of the sum (or
      > average) of lognormals is not normal.
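Points i) and ii) are easy to check by simulation. The sketch below averages k independent lognormal values per block and measures the skewness of the log of each block average: for k = 1 it is essentially zero (the log of a lognormal is exactly normal), while for k = 2 it is clearly positive. Independence within a block and the value sigma = 2 are assumptions made for the illustration; real regional averages would involve spatially correlated values.

```python
import numpy as np

def sample_skewness(v):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    d = np.asarray(v, dtype=float) - np.mean(v)
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

rng = np.random.default_rng(2)
sigma, n_blocks = 2.0, 200_000   # assumed log-scale sd and number of blocks
skew = {}
for k in (1, 2, 8):
    # Each row holds k independent point values from the same lognormal
    x = rng.lognormal(mean=0.0, sigma=sigma, size=(n_blocks, k))
    skew[k] = sample_skewness(np.log(x.mean(axis=1)))
print(skew)
```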

      If you don't have raw data but averages, then within the likelihood-based
      approach you may want to think of a marginal likelihood model to carry
      the uncertainty associated with the averaging over into the final
      analysis. I think this is rather complicated. On the other hand, maybe
      there is no such problem within the traditional distribution-free school,
      because the uncertainty associated with fitting the spatial model
      is usually ignored.

      [snip the rest for brevity]

      Ruben
    • Isobel Clark
      Message 2 of 3, Mar 5, 2005
        Ruben (et al)

        It is true that Matheron's theory is based on no
        distributional assumptions. In fact, there is no
        requirement for the distribution to be the same at
        every location in the study area.

        The necessity for using traditional geostatistical
        theory is that the 'difference between two values'
        should have a common distribution for a specified
        distance (and possibly direction). The form of this
        distribution is irrelevant but it needs to possess a
        mean and variance.
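The 'difference between two values' statistic is what the classical (Matheron) semi-variogram estimator averages at each lag: gamma(h) is the sum of (z_i - z_j)^2 over the N(h) pairs separated by roughly h, divided by 2 N(h). A minimal omnidirectional sketch follows; the function name is hypothetical and the lag bins are a user choice.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 N(h)) per lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(z), k=1)  # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (z[i] - z[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        n_pairs = int(in_bin.sum())
        if n_pairs == 0:
            continue  # skip empty lag classes
        lags.append(dist[in_bin].mean())
        gammas.append(sqdiff[in_bin].sum() / (2 * n_pairs))
        counts.append(n_pairs)
    return np.array(lags), np.array(gammas), np.array(counts)

lags, gammas, counts = empirical_semivariogram(
    [[0, 0], [1, 0], [2, 0], [3, 0]], [0.0, 1.0, 0.0, 1.0], [0, 1.5, 2.5, 3.5])
print(lags, gammas, counts)
```

With only a few pairs per lag (the last bin here has one), each gamma value is itself a noisy estimate, which is the practical problem described next.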

        The problem lies not with the theory but with the
        practice. If you have the whole 'realisation' you can
        calculate the true average and variance and the shape
        of each distribution is irrelevant. If you have only a
        few samples, then you can only find estimates for the
        means and variances at each distance.

        If the underlying distribution is highly skewed then,
        unless you have ideal conditions (large number of
        samples, regular sampling locations), your estimate of
        the variance will be unstable -- influenced by the
        average of the samples included in the particular
        estimate. There was a huge amount of debate about this
        "proportional effect" back in the 70s [search for
        'relative semi-variogram'].
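One of the relative semi-variograms referred to above is the pairwise relative form, which divides each squared difference by the squared mean of the pair. The sketch below (function name hypothetical, omnidirectional binning assumed) shows its key property: rescaling the data leaves it unchanged, whereas the ordinary semi-variogram would scale with the square of the factor. Other variants, such as dividing gamma(h) by the squared mean of all data used at that lag, exist and may behave differently.

```python
import numpy as np

def pairwise_relative_semivariogram(coords, values, bin_edges):
    """Each squared difference is divided by the squared pair mean, which
    scales out a proportional effect. Values must be strictly positive.
    Returns rows of (mean lag distance, relative gamma)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(z), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    pair_mean = 0.5 * (z[i] + z[j])
    rel_sq = ((z[i] - z[j]) / pair_mean) ** 2
    out = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        if in_bin.any():
            out.append((dist[in_bin].mean(), rel_sq[in_bin].sum() / (2 * in_bin.sum())))
    return np.array(out)

coords = [[0, 0], [1, 0], [2, 0], [3, 0]]
edges = [0, 1.5, 2.5, 3.5]
small = pairwise_relative_semivariogram(coords, [1.0, 3.0, 1.0, 3.0], edges)
large = pairwise_relative_semivariogram(coords, [100.0, 300.0, 100.0, 300.0], edges)
print(small)
print(large)
```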

        So, you have two potential problems:

        (1) you may not get a true picture of the
        semi-variogram, because of the uncertainty associated with
        each point, exacerbated by the proportional effect;

        (2) you may not wish to use an averaging technique
        such as kriging on skewed samples. All of Sichel's
        (mining) and much of Krige's work was motivated by the
        fact that local averaging is not sensible when your
        data has a coefficient of variation greater than
        around 1.
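For lognormal data the coefficient of variation depends only on the log-scale standard deviation sigma: CV = sqrt(exp(sigma^2) - 1). A CV of about 1 therefore corresponds to sigma = sqrt(ln 2), roughly 0.83. A short sketch comparing a sample CV with this formula, assuming simulated lognormal data with sigma = 0.9:

```python
import numpy as np

def coefficient_of_variation(x):
    """Sample CV: standard deviation divided by the mean (positive data)."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

def lognormal_cv(sigma):
    """Theoretical CV of exp(Y), Y ~ N(mu, sigma^2); it does not depend on mu."""
    return np.sqrt(np.expm1(sigma ** 2))

# exp(sigma^2) - 1 = 1  =>  sigma = sqrt(ln 2), about 0.83
sigma_at_cv1 = np.sqrt(np.log(2.0))

rng = np.random.default_rng(3)
sample = rng.lognormal(mean=0.0, sigma=0.9, size=100_000)
print(sigma_at_cv1, coefficient_of_variation(sample), lognormal_cv(0.9))
```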

        The theory is terrific, witness its survival for over
        40 years and its proliferation over many fields of
        application. However, real life isn't so tidy at the
        sharp end ;-)

        Isobel
        http://geoecosse.bizland.com/whatsnew.htm