[ai-geostats] extreme values

  • kai.zosseder@gla.bayern.de
    Message 1 of 10 , Aug 30, 2004
      Hello list,

      I'm dealing with organic contamination in soil and groundwater and have some questions:

      1. I get quite a good fit with an omnidirectional spherical variogram, but the kriging standard deviation is relatively high and the cross-validation results aren't very good. How can I interpret that? Could extreme values in my data set be responsible?

      2. I've read that standardized variograms are useful for minimizing the influence of extreme values. Can I use the parameters of the standardized variogram as input to the kriging system, like the parameters of a 'normal' semivariogram?

      3. I get a very good fit to another data set with a power model. Can I interpret that as only a trend function?
    • Monica Palaseanu-Lovejoy
      Message 2 of 10 , Aug 30, 2004
        Hi,

        I am dealing with PAH contamination data in soils as well. In my
        experience, depending on where the contamination is (I mean whether
        it is an old industrial site, a dump site, or something else), you
        may actually have more than one population, so the data you want to
        krige is a mixture of populations. There are statistical tools with
        which you can check whether this is the case. But usually, if you
        suspect at least two different pollution processes, you will surely
        have a mixture of populations sampled.

        In this case, indicator kriging, probability kriging or disjunctive
        indicator kriging might be more appropriate than actually predicting
        values. For environmental purposes, most of the time we are
        interested in the probability that a contaminant is above (or below)
        the environmental threshold.
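
A minimal sketch of the first step of the indicator approach, assuming NumPy: each measurement is coded as 1 if it exceeds the threshold and 0 otherwise, and the variogram is then computed on these indicators rather than on the concentrations. The coordinates, concentrations, threshold and lag bins below are hypothetical values chosen for illustration, and the semivariogram is the classical estimator (half the mean squared difference per lag bin), not the output of any particular package.

```python
import numpy as np

def indicator_transform(values, threshold):
    """Code each measurement as 1.0 if it exceeds the threshold, else 0.0."""
    return (np.asarray(values, dtype=float) > threshold).astype(float)

def experimental_semivariogram(coords, z, bin_edges):
    """Classical estimator: half the mean squared difference per lag bin."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # all distinct pairs
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (z[i] - z[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)    # NaN marks an empty bin
    for k in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        if in_bin.any():
            gamma[k] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Hypothetical transect: four samples, two of them above the threshold.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
conc = [0.2, 0.5, 12.0, 15.0]
threshold = 1.0

indicator = indicator_transform(conc, threshold)
gamma = experimental_semivariogram(coords, indicator, [0.5, 1.5, 2.5, 3.5])
print(indicator, gamma)
```

Kriging the indicators with a model fitted to this variogram gives, at each location, an estimate of the probability of exceeding the threshold.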

        If you are still interested in predicting values, a better solution, in
        my experience, is to use a Bayesian kriging method. Such methods are
        implemented in R (which is free) in the geoR package
        (http://cran.r-project.org/). Using this method I always had smaller
        error standard deviations, and the precision and accuracy were better
        than with the "normal" kriging method.

        I hope this helps a little, good luck,

        Monica

        Monica Palaseanu-Lovejoy
        University of Manchester
        School of Geography
        Mansfield Cooper Bld. 3.21
        Oxford Road
        Manchester M13 9PL
        England, UK
        Tel: +44 (0) 275 8689
        Email: monica.palaseanu-lovejoy@...
      • Edzer J. Pebesma
        Message 3 of 10 , Aug 30, 2004
          Monica Palaseanu-Lovejoy wrote:
          ....

          >If you are still interested in predicting values, a better solution, in
          >my experience, is to use a bayesian kriging method. Such
          >methods are implemented in the package R (which is free) with the
          >geoR routine (http://cran.r-project.org/). Using this method i
          >always had smaller error standard deviations, and the precision and
          >accuracy are better than the "normal" kriging method.
          >
          Thanks for sharing your experiences with us, Monica. I wondered if you
          published your results somewhere, because there is, AFAIK, little
          published material on comparisons of the "traditional" and the
          "model based" geostatistical approaches.

          You mention smaller error standard deviations -- I assume that you refer
          to cross validation error standard deviations, and not kriging
          prediction standard errors? How did you calculate precision and
          accuracy? In addition to specifying a variogram model, you also need to
          specify prior distributions on all variogram parameters in the
          model-based approach; how did you choose these?
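
The distinction between the two kinds of standard deviation can be made concrete with a small sketch, assuming NumPy and assuming that leave-one-out predictions and their kriging standard deviations have already been produced by some kriging routine. The function name, the dictionary keys and the numbers are hypothetical. The cross-validation error standard deviation comes from observed minus predicted values, whereas the kriging standard deviation is what the model itself reports; the mean squared standardized residual compares the two and should be close to 1 when they agree.

```python
import numpy as np

def cross_validation_summary(observed, predicted, kriging_sd):
    """Summarise leave-one-out results against the model's own uncertainty."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    kriging_sd = np.asarray(kriging_sd, dtype=float)
    error = observed - predicted
    standardized = error / kriging_sd
    return {
        "mean_error": error.mean(),               # bias of the predictions
        "rmse": np.sqrt((error ** 2).mean()),     # overall prediction error
        "error_sd": error.std(ddof=1),            # cross-validation error sd
        "mean_kriging_sd": kriging_sd.mean(),     # sd reported by the model
        "mssr": (standardized ** 2).mean(),       # ~1 if the two are consistent
    }

# Hypothetical leave-one-out results for four sample locations.
summary = cross_validation_summary(
    observed=[1.0, 2.0, 3.0, 4.0],
    predicted=[1.5, 1.5, 3.5, 3.5],
    kriging_sd=[0.5, 0.5, 0.5, 0.5],
)
print(summary)
```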

          One paper that does the comparison is Moyeed and Papritz, Math Geol
          34(4), 365-386 but they found little improvement in using model-based
          as opposed to regular kriging; in their comparison case they used a
          large (n>2500) data set though.

          Anyone else who wants to shed light on this issue? Is there e.g. a minimum
          sample size above which both approaches become hard to distinguish?
          --
          Edzer
        • Glover, Tim
          Message 4 of 10 , Aug 30, 2004
            Just a quick point on PAHs - are you aware that there is a general
            background concentration of PAHs everywhere? These come from
            air-deposition of PAHs from many sources, including natural forest
            fires, auto exhaust, incinerators, jet engines, etc. There are also
            PAHs in asphalt. Any site that has PAH contamination WILL be at least
            bi-modal - the "background" and any contamination.

            Tim Glover
            Senior Environmental Scientist - Geochemistry
            Geoenvironmental Department
            MACTEC Engineering and Consulting, Inc.
            Kennesaw, Georgia, USA
            Office 770-421-3310
            Fax 770-421-3486
            Email ntglover@...
            Web www.mactec.com

          • Isobel Clark
            Message 5 of 10 , Aug 30, 2004
              Hello Kai

              > 1. I get a quite good fitting with an
              > omnidirectional spherical variogram but the kriging
              > standard deviation is relatively high and the results
              > of the cross validation aren't very good. How can I
              > interpret that? Is it possible that extreme values
              > in my data set can be responsible for that ?
              Your kriging standard deviation is a direct
              consequence of the semi-variogram model which you
              fitted. This, of course, is a direct reflection of the
              variance of your data. If your data follows a skewed
              distribution (or, at least, not very Normal) then the
              variance is affected by other factors than simple
              variability -- such as, extreme values in the 'tails'.

              You can probably get a much better semi-variogram by
              transforming your data in some way. Most software
              packages have a mechanism for this. This assumes that
              your extreme values are in the tail and not anomalies
              of some kind.
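
A small numerical sketch of this point, assuming NumPy and a logarithmic transform (one common choice for positive, skewed data; the message does not name a specific transform). The concentrations are hypothetical. The code compares how much a single extreme value inflates the sample variance on the raw scale and on the log scale.

```python
import numpy as np

# Hypothetical concentrations: five ordinary values and one extreme value.
conc = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 500.0])

def variance_inflation(x):
    """Ratio of the sample variance with and without the largest value."""
    x = np.sort(np.asarray(x, dtype=float))
    return x.var(ddof=1) / x[:-1].var(ddof=1)

raw_inflation = variance_inflation(conc)
log_inflation = variance_inflation(np.log(conc))

print(f"raw scale: variance inflated {raw_inflation:.0f}x by the extreme value")
print(f"log scale: variance inflated {log_inflation:.1f}x by the extreme value")
```

Since the sill of the semi-variogram reflects this variance, the same extreme value has far less influence on a semi-variogram computed from the transformed data.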

              > 2. I've read that it is useful to use standardized
              > variograms for minimize the influence of extreme
              > values. Can I use the variogram parameters of the
              > standardized variogram as an input for the kriging
              > system like the parameters of a 'normal'
              > semivariogram ?
              Not if you want to do cross validation. See my paper
              'Does Geostatistics Work', 1979. Download from
              http://uk.geocities.com/drisobelclark/resume or Noel
              Cressie's paper which I cited last week.

              > 3. I get a very good fitting with another data set
              > by a Power model. Can I interpret that as only a
              > trend function ?
              Only if the power is approaching 2 or greater.
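
A short worked equation behind this answer, using notation introduced here (c and \alpha for the power-model coefficients, a and b for a hypothetical linear drift in one dimension). The power model is a valid semi-variogram only for exponents strictly below 2, while a purely deterministic linear trend produces squared increments that grow exactly like h squared:

```latex
\gamma(h) = c\,h^{\alpha}, \qquad c > 0,\quad 0 < \alpha < 2

% pure linear drift, no random component:
Z(x) = a + b\,x
\;\Longrightarrow\;
\tfrac{1}{2}\bigl(Z(x+h) - Z(x)\bigr)^{2} = \tfrac{1}{2}\,b^{2}h^{2}
```

So a fitted exponent close to 2 suggests that the experimental semi-variogram is dominated by a trend, whereas a clearly smaller exponent is compatible with an intrinsic random function without drift.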

              Isobel Clark
              http://geoecosse.bizland.com/whatsnew.htm





            • Monica Palaseanu-Lovejoy
              Message 6 of 10 , Aug 30, 2004
                Hi,

                This is more or less the subject of my PhD thesis, which I hope to
                submit in January 2005 ;-))

                I am working with a small sample size (around 300 values).
                Specifying prior distributions can be tricky, and I have to admit
                that for me it is still a matter of trial and error. Besides, when
                I want to predict values or probabilities at different locations,
                I do that on a grid of 10 by 10 metres, which gives me about 9000
                cells. This stretches my computer to its limit.

                I represented the precision and accuracy graphically by plotting
                together the Bayesian and kriging density curves for a certain
                measured value, for example. For kriging I always get a
                bell-shaped curve, since the assumption is Gaussian. For the
                Bayesian method the curve may resemble a bell-shaped curve, but I
                have never got a true Gaussian shape so far. Usually the Bayesian
                density curve is narrower, yielding a narrower 95% prediction
                interval. The validation error standard deviations are usually
                smaller for the Bayesian method than for kriging. For the grid
                predictions, the Bayesian method always yields smaller error
                standard deviations, no matter how "good" the kriging was.

                I hope to be able to publish some of my results next year.
                Meanwhile I will test this Bayesian method on the SIC2004 data as
                well.

                Thanks for the encouragements,

                Monica
              • Monica Palaseanu-Lovejoy
                Message 7 of 10 , Aug 30, 2004
                  Hi,

                  Yes, I know. That is why I suggested checking whether the data
                  come from two different populations. Also, the background is
                  usually not above the environmental threshold, so indicator
                  kriging or probability kriging is more appropriate, in my
                  opinion, than making predictions from a data set that comes from
                  a mixture of populations.

                  Thanks for stressing that there is usually a "background" for
                  PAHs that we should take into consideration.

                  Monica
                • Soeren Nymand Lophaven
                  Message 8 of 10 , Aug 30, 2004
                    Based on my relatively limited knowledge on Bayesian kriging I have a few
                    comments to the current discussion:

                    - Bayesian kriging gives better predictions than the classical approach if
                    you have relatively few data points and at the same time are able to come
                    up with good prior distributions for your model parameters.

                    - The two approaches give similar predictions if you have many data
                    points.

                    - The Bayesian approach always results in higher prediction variances,
                    i.e. the classical kriging approach underestimates the prediction
                    variances, because it assumes that the parameters are known, which in
                    practice they are not.

                    - In chapter 2 of the reference below there is a figure showing predictions
                    computed by the two approaches. Predictions were computed from a subset of
                    the Swiss rainfall dataset (SIC97) consisting of 100 data values. The
                    predictions are very nearly equal. This means that if you are interested
                    in prediction and have more than 100 data values, it does not matter which
                    approach you use. If you are for some reason interested in prediction
                    variance, e.g. for comparing the efficiency of different designs, then
                    Bayesian kriging gives you the best answer.
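
The argument about prediction variances can be written with the law of total variance. The notation is introduced here and is not from the message: Y_0 is the value at the prediction location, y the observed data, and \theta the covariance (variogram) parameters.

```latex
\operatorname{Var}(Y_0 \mid y)
  = \mathbb{E}_{\theta \mid y}\!\left[\operatorname{Var}(Y_0 \mid y, \theta)\right]
  + \operatorname{Var}_{\theta \mid y}\!\left(\mathbb{E}[Y_0 \mid y, \theta]\right)
```

Classical kriging reports only \operatorname{Var}(Y_0 \mid y, \hat{\theta}) for a single fitted \hat{\theta}, which corresponds to the first term evaluated at one parameter value. The second term is non-negative and carries the contribution of parameter uncertainty, which is why the Bayesian prediction variance tends to be larger.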

                    Best regards / Venlig hilsen

                    Søren Lophaven
                    ******************************************************************************
                    Master of Science in Engineering | Ph.D. student
                    Informatics and Mathematical Modelling | Building 321, Room 011
                    Technical University of Denmark | 2800 kgs. Lyngby, Denmark
                    E-mail: snl@... | http://www.imm.dtu.dk/~snl
                    Telephone: +45 45253419 |
                    ******************************************************************************

                  • Gregoire Dubois
                    Message 9 of 10 , Aug 31, 2004
                      Hello everyone,

                      I'm profiting from the discussion about Bayesian kriging to update my
                      knowledge. Are there not various types of Bayesian kriging?

                      I remember having applied in 1998 methodologies and codes (in C)
                      developed in Klagenfurt by the team of Juergen Pilz (see
                      http://www.math.uni-klu.ac.at/?language=en ). If I remember correctly, I
                      used methods like:

                      - Subjective Bayesian kriging (SBK) is a scenario between simple
                      kriging (mean known) and ordinary kriging (mean unknown). In SBK, one
                      has some knowledge about the minimum and maximum values that the mean
                      of the analysed variable can take. In other words, the mean is
                      constrained. Various scenarios were implemented in the code depending
                      on the shape of the probability distribution function. As for the
                      kriging variance, the theory predicts a lower kriging variance for SBK
                      only if the experimental semivariogram is the true one. A case study in
                      my PhD aimed to improve estimates of radioactivity in Switzerland using
                      information provided by measurements made in a neighbouring country.
                      Although the statistical distributions of these two datasets were very
                      different, their mean values were similar, and this information could
                      be used efficiently to clearly reduce estimation errors. On the other
                      hand, I often got a higher kriging variance with SBK than with OK.

                      - Empirical Bayesian kriging (EBK): one has much better knowledge of
                      the pdf of the analysed dataset than in SBK. I applied it to
                      investigate two contaminated regions with similar distributions. Mean
                      errors were lower for EBK than for ordinary kriging. However, I also
                      encountered many cases in which I got terrible results with EBK.

                      Are other versions of Bayesian kriging not those with known
                      semivariograms (Cui & Stein?) or those for which some of the
                      semivariogram parameters are known, etc.? Thus, going back to my first
                      question, is there not a standard vocabulary that would allow readers
                      to distinguish the type of prior knowledge used when one is talking
                      about Bayesian kriging?

                      As for the number of points to be used etc., I don't understand the
                      discussion. Should the correct question not be "how far does the
                      number of samples used reflect the prior knowledge?"?

                      I hope I did not add too much confusion here :((

                      Cheers,

                      Gregoire

                      PS: useful resources about the above described methods:

                      Practically, the codes I used were written by Albrecht Gebhardt (I think
                      they are still available from his web site) and had a number of bugs at
                      that time (in 1998-1999). The codes may have been updated since.

                      As for the mathematical developments, I used papers from Klagenfurt
                      (all of them are in German, sorry). I enjoyed reading Pilz & Knospe
                      (1997): Eine Anwendung des Bayes Kriging in der
                      Lagerstaettenmodellierung. Glueckauf-Forschungshefte, 58(4): 670-677. I
                      also recommend the master's thesis of Gerhard Buchacher: Bayes'sche und
                      Empirisch Bayes'sche Methoden in der Geostatistik.

                      More recent codes and papers should be available from Juergen Pilz's and
                      Albrecht Gebhardt's homepages (again, see
                      http://www.math.uni-klu.ac.at/?language=en )

                      Hope this helps a bit.

                      __________________________________________
                      Gregoire Dubois (Ph.D.)
                      JRC - European Commission
                      IES - Emissions and Health Unit
                      Radioactivity Environmental Monitoring group
                      TP 441, Via Fermi 1
                      21020 Ispra (VA)
                      ITALY

                      Tel. +39 (0)332 78 6360
                      Fax. +39 (0)332 78 5466
                      Email: gregoire.dubois@...
                      WWW: http://www.ai-geostats.org
                      WWW: http://rem.jrc.cec.eu.int

                      "The views expressed are purely those of the writer and may not in any
                      circumstances be regarded as stating an official position of the
                      European Commission."





                    • Monica Palaseanu-Lovejoy
                      Message 10 of 10 , Aug 31, 2004
                        Hi,

                        Well, the Bayesian kriging methods you are describing are
                        somewhat different from what I am using. I am using R and geoR
                        by Ribeiro and Diggle (2001).

                        Web page for R: http://cran.r-project.org/

                        Web page for geoR: www.est.ufpr.br/geoR

                        Usually with Bayesian kriging you will have a higher variance
                        simply because the uncertainty in all (or some) of the parameters
                        is incorporated, while for geostatistical kriging (or the "other
                        kriging") no uncertainty is assumed for the semi-variogram model.
                        So, in a way, kriging is a particular case of Bayesian kriging as
                        described by Ribeiro and Diggle.

                        Uncertainty can be assumed for the nugget, variance, mean and
                        range, or only for one parameter, or for a combination of
                        parameters. Usually everything depends on how well one understands
                        the data, or at least so I think. Citing from Ribeiro: the
                        inference is done by Monte Carlo simulation, and samples are taken
                        from the posterior and predictive distributions and used for
                        inference and predictions. One of his algorithms looks like this:

                        1. Choose a range of values for phi (range parameter in
                        geostatistical kriging) which is sensible for the given data, and
                        assign a discrete uniform prior for phi on a set of values spanning
                        the chosen range;

                        2. compute the posterior probabilities on this discrete support set,
                        defining a discrete posterior distribution with probability mass
                        function pr(phi | y);

                        3. sample a value of phi from this discrete distribution pr(phi | y);

                        4. attach the sampled value phi to the distribution [beta, sigma
                        square |y, phi] and sample from this distribution (beta = mean
                        param., sigma square = variance, phi = range)

                        5. repeat steps 3 and 4 as many times as required / desired. the
                        resulting sample of the triplets (beta, sigma square, phi) is a
                        sample from the joint posterior distribution.
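
The five steps above can be sketched in Python with NumPy. This is not the geoR code. It is a minimal illustration under assumptions that are not stated in the message: a constant mean beta, no nugget, an exponential correlation function exp(-d / phi), a flat prior on beta and a prior proportional to 1 / sigma^2 on the variance. Under those assumptions the posterior of phi on the grid and the conditional distributions of sigma^2 and beta have closed forms. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def sample_joint_posterior(coords, y, phi_grid, n_draws, rng):
    """Draw (beta, sigma2, phi) triplets using a discrete uniform prior on phi."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    phi_grid = np.asarray(phi_grid, dtype=float)
    n = len(y)
    ones = np.ones(n)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Step 2: posterior probability of each phi on the discrete support.
    log_post = np.empty(len(phi_grid))
    beta_hat = np.empty(len(phi_grid))   # generalised least squares mean
    xtrx = np.empty(len(phi_grid))       # 1' R^-1 1
    resid_ss = np.empty(len(phi_grid))   # (y - beta_hat)' R^-1 (y - beta_hat)
    for k, phi in enumerate(phi_grid):
        corr = np.exp(-dist / phi)       # assumed exponential correlation
        xtrx[k] = ones @ np.linalg.solve(corr, ones)
        beta_hat[k] = (ones @ np.linalg.solve(corr, y)) / xtrx[k]
        resid = y - beta_hat[k]
        resid_ss[k] = resid @ np.linalg.solve(corr, resid)
        _, logdet = np.linalg.slogdet(corr)
        # The uniform prior on phi only adds a constant, so it is omitted.
        log_post[k] = (-0.5 * logdet - 0.5 * np.log(xtrx[k])
                       - 0.5 * (n - 1) * np.log(resid_ss[k]))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Step 3: sample phi from pr(phi | y).
    idx = rng.choice(len(phi_grid), size=n_draws, p=post)
    # Step 4: sample sigma2 given phi, then beta given sigma2 and phi.
    sigma2 = resid_ss[idx] / rng.chisquare(n - 1, size=n_draws)
    beta = rng.normal(beta_hat[idx], np.sqrt(sigma2 / xtrx[idx]))
    # Step 5: the stacked triplets are draws from the joint posterior.
    return post, np.column_stack([beta, sigma2, phi_grid[idx]])

# Step 1: a hypothetical range of phi values and synthetic data.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 100.0, size=(30, 2))
y = rng.normal(5.0, 1.0, size=30)
phi_grid = np.linspace(5.0, 50.0, 10)
post, draws = sample_joint_posterior(coords, y, phi_grid, 1000, rng)
print(post.round(3))
print(draws[:3])
```

Each row of `draws` can then be used to compute one kriging prediction, and averaging over the rows propagates the parameter uncertainty into the prediction.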

                        In my experience, if the data set is highly skewed and the spatial
                        autocorrelation is weak, Bayesian kriging does a better job than
                        geostatistical kriging, even if the data are transformed to
                        approach normality. From the literature (see the paper mentioned
                        by Edzer Pebesma: Moyeed and Papritz, Math Geol 34(4), 365-386) it
                        seems that for very large data sets (n > 2500) the advantage
                        Bayesian kriging has over geostatistical kriging is minimal, while
                        with the data sets I am using (random locations, weak spatial
                        autocorrelation, areas of spatial heterogeneity, n between 200 and
                        350 points), Bayesian kriging seems to be superior.

                        I hope this helps a little,

                        Monica

