AI-GEOSTATS: Extreme values?

  • Chaosheng Zhang
    Message 1 of 10, Dec 13, 2001
      Dear all,

      My question is: How to deal with the extreme/outlying values in a data set?

      I am dealing with heavy metal concentrations in soils from a mine area. There are 223 samples, evenly distributed in space at a sampling interval of 400 metres. Several samples have extremely high values, which makes me uncomfortable. The percentiles of the dataset are as follows (in mg/kg):

               Zn     Cu     Pb     Cd     As
      Min       4      1     25    0.0      2
      5%       35      6     35    0.1      6
      10%      40      7     41    0.2      7
      25%      65     13     62    0.3      9
      50%     122     18    168    0.6     15
      75%     338     27    821    1.5     28
      90%     907     56   2799    2.8     58
      95%    1986    116   4490    4.2     80
      96%    2462    151   4698    4.9     82
      97%    3493    178   5413    6.2     91
      98%    4697    207   7609    8.3    111
      99%    6712    247  11750   12.4    184
      Max   11473   1293  16305   48.5   1060

      When doing geostatistical and statistical analyses, we need some confidence in dealing with these very high extreme values, which account for less than 2% of the total number of samples.

      Any suggestions?

      Cheers,

      Chaosheng Zhang
      ===================================
      Dr. Chaosheng Zhang
      Department of Geography
      National University of Ireland
      Galway
      IRELAND

      Tel: +353-91-524411 ext. 2375
      Fax: +353-91-525700
      Email: Chaosheng.Zhang@...
      ===================================



    • Isobel Clark
      Message 2 of 10, Dec 13, 2001
        > My question is: How to deal with the
        > extreme/outlying values in a data set?
        The real priority is to establish why you have extreme
        highs. For example:

        (1) is there a high imprecision in measuring the
        values, so that the sample observations are actually
        inaccurate? If so, is it relative to the value or a
        flat error?

        (2) do you have a skewed distribution of values?

        (3) do you have two (or more) populations, only one of
        which gives the high values?

        and there may be others. Once you determine the reason
        for extreme values, then you can more objectively know
        how to deal with them.

        For example, if you think (2) is most likely, then look
        at transformations or distribution-free approaches to
        geostatistics. You can find some of my papers on
        dealing with positively skewed distributions at:

        http://uk.geocities.com/drisobelclark/resume/Publications.html

        If (3) is more likely - as may be probable if you are
        looking at an area where samples may be 'background'
        or 'contaminated' - you really need to identify the
        populations first. Then you may be able to apply a
        mixture model together with indicator geostatistical
        approaches.
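
To illustrate the indicator idea mentioned above: an indicator transform replaces each concentration by 0 or 1 according to whether it exceeds a threshold, so an extreme value counts no more than any other exceedance. The following is only a minimal Python sketch. It assumes thresholds taken from the Pb percentiles quoted in the first message (168, 821 and 2799 mg/kg); the sample values and the function name `indicator_transform` are made up for illustration.

```python
def indicator_transform(values, thresholds):
    """Return {threshold: [1 if value <= threshold else 0, ...]} for each threshold."""
    return {t: [1 if v <= t else 0 for v in values] for t in thresholds}

# Hypothetical Pb values (mg/kg); the last one mimics an extreme sample.
pb = [30.0, 150.0, 900.0, 3000.0, 16305.0]
# Thresholds: the 50%, 75% and 90% percentiles of Pb quoted in the first message.
thresholds = [168.0, 821.0, 2799.0]

indicators = indicator_transform(pb, thresholds)
for t in thresholds:
    print(t, indicators[t])
```

At every threshold the 16305 mg/kg sample is coded exactly like the 3000 mg/kg one, which is why indicator variograms tend to be much less sensitive to a handful of extremes than variograms of the raw values.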

        If (1) is your problem, then you may be able to use a
        rough non-parametric approach to get to cross
        validation. The 'error statistics' in a cross
        validation exercise will often assist in identifying
        erroneous sample measurements.
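
As a rough illustration of using cross-validation errors to spot suspect samples, here is a minimal Python sketch. It is built on assumptions not in the message: inverse-distance weighting stands in for the "rough non-parametric approach", the flagging rule (error more than `k` robust standard deviations from the median error) is one common choice among many, and the 3 x 3 grid of values is invented.

```python
import math
import statistics

def idw_predict(target, points, values, power=2.0):
    """Inverse-distance-weighted estimate at `target` from the given samples."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # coincident sample: return its value directly
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def loo_errors(points, values, power=2.0):
    """Leave-one-out cross-validation: estimate each sample from all the others."""
    errors = []
    for i in range(len(points)):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        errors.append(values[i] - idw_predict(points[i], rest_pts, rest_vals, power))
    return errors

def flag_suspects(errors, k=3.0):
    """Indices whose error lies more than k robust standard deviations from the median error."""
    med = statistics.median(errors)
    mad = statistics.median([abs(e - med) for e in errors])
    scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    if scale == 0.0:
        return []
    return [i for i, e in enumerate(errors) if abs(e - med) > k * scale]

# Hypothetical 3 x 3 grid at 400 m spacing; the centre sample is extreme.
points = [(x, y) for y in (0.0, 400.0, 800.0) for x in (0.0, 400.0, 800.0)]
values = [100.0, 120.0, 90.0, 110.0, 5000.0, 130.0, 95.0, 105.0, 115.0]

errors = loo_errors(points, values)
print(flag_suspects(errors))
```

A flagged sample is only a candidate for checking against field and laboratory records; a large cross-validation error can equally come from a genuine hot spot.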

        Hope this helps
        Isobel Clark





        --
        * To post a message to the list, send it to ai-geostats@...
        * As a general service to the users, please remember to post a summary of any useful responses to your questions.
        * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
        * Support to the list is provided at http://www.ai-geostats.org
      • Chaosheng Zhang
        Message 3 of 10, Dec 13, 2001
          Dear Isobel,

          Thanks for your quick and helpful reply!

          (1) I would like to trust both the accuracy and precision of the dataset,
          and the real problem is how we "play the computer game". The extreme values
          may come from samples which by chance contain many minerals.

          (2) From the percentiles I provided in the message, you can see that
          the dataset is heavily skewed indeed. Logarithmic transformation can make
          some of the variables follow the "normal distribution", but not all.
          Moreover, the extreme values still look extreme in the transformed dataset.

          (3) There may be two populations: "background" and "mineralised". However,
          there is really no way to "dichotomise" the two populations. Geographically
          or mathematically? Geographically, there are three areas of high values.
          Mathematically, we need some proof. Even though we could properly separate
          the datasets into two "populations", the extreme values may still be extreme
          in the "mineralised" population.
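
Point (2) can be checked roughly from the published percentiles alone. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses the Zn column of the percentile table, the quartile-based (Bowley) skewness as the skewness measure, and Tukey's 1.5 x IQR upper fence as the definition of "outlying". Both choices are conventions picked for this example.

```python
import math

# Zn percentiles (mg/kg) copied from the table in the first message.
zn = {"q25": 65.0, "q50": 122.0, "q75": 338.0, "p99": 6712.0, "max": 11473.0}

def bowley_skewness(q1, q2, q3):
    """Quartile-based skewness: 0 for symmetric quartiles, approaching +1 for a long right tail."""
    return (q3 + q1 - 2.0 * q2) / (q3 - q1)

def upper_fence(q1, q3, k=1.5):
    """Tukey's upper fence: values above it are conventionally flagged as outlying."""
    return q3 + k * (q3 - q1)

raw_skew = bowley_skewness(zn["q25"], zn["q50"], zn["q75"])

# Percentiles are preserved by a monotone transform, so take logs of the percentiles.
log_q = {name: math.log(v) for name, v in zn.items()}
log_skew = bowley_skewness(log_q["q25"], log_q["q50"], log_q["q75"])
log_fence = upper_fence(log_q["q25"], log_q["q75"])

print("Bowley skewness, raw:", round(raw_skew, 3))
print("Bowley skewness, log:", round(log_skew, 3))
print("99th percentile above log-scale fence:", log_q["p99"] > log_fence)
print("maximum above log-scale fence:", log_q["max"] > log_fence)
```

Under these conventions the log transform reduces the quartile skewness of Zn without removing it, and the top percentiles still fall above the fence on the log scale.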

          Since the really "bad" values are only <2% of the total number (such as 4 or
          5 values out of the total of 223, which can also be seen from the
          percentiles), I am unwilling to use nonparametric methods unless we cannot
          find a way to use the parametric methods.

          Another problem is when we carry out spatial interpolation, these values may
          produce artificial contour lines around these sampling locations, even
          though they can be smoothed. I don't think this is the realistic situation
          in the field.

          Well, I am still not sure what the best way is ... I know
          the worst way is to discard these "outlying" values, and the second worst
          way is to use non-parametric methods.

          Cheers,

          Chaosheng Zhang


        • Marcel Vallée
          Message 4 of 10, Dec 13, 2001
            Dear Chaosheng Zhang

            The sampling interval is so wide that the high values could easily be related to "hot spots" of
            higher grade contamination, i.e. dumping areas for particular kinds of slags, mineralized
            waste, etc. A property map might help.

            Have you contoured the data? Even if you have, the sampling interval is so wide that real hot spots of
            environmental significance might not show up as a 2D distribution on such a grid.

            Regards

            Marcel Vallée, Eng., Geo.
            Geoconseil Marcel Vallée Inc.
            706 Routhier Ave
            Québec, Québec G1X 3J9
            Canada
            Tel: (1) 418 652 3497
            Fax: (1) 418 652 9148
            Email: vallee.marcel@...

          • Chaosheng Zhang
            Message 5 of 10, Dec 14, 2001
              Dear Marcel Vallée,

              Thanks. I think the sampling density is good enough to reveal the spatial
              structure, and the extreme samples are located within the "hot spots". The
              problem is that the few values are still extremely high within the "hot
              spots". This may be what the "nugget effect" means.

              I'm just wondering if these few extreme values should really be "discarded"/
              "censored" or replaced. However, this could get some criticism as they may
              be "real".

              If it is hard to find the best way, I will have to "replace" all the extreme
              values with the 99th or 98th percentile. But I'm not sure if it is appropriate to
              do so.
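
For reference, replacing everything above a high percentile with that percentile (often called top-cutting or one-sided winsorizing) can be sketched in a few lines of Python. This is a hedged illustration only: the function names are invented, the percentile is computed by linear interpolation between order statistics (one of several conventions), and the test data are synthetic.

```python
def percentile(values, p):
    """p-th percentile (0-100) by linear interpolation between order statistics."""
    s = sorted(values)
    if not s:
        raise ValueError("empty data")
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def cap_at_percentile(values, p=98.0):
    """Replace every value above the p-th percentile with that percentile."""
    cap = percentile(values, p)
    return [min(v, cap) for v in values], cap

# Synthetic data: 99 ordinary values plus one extreme value.
data = [float(v) for v in range(1, 100)] + [10000.0]
capped, cap = cap_at_percentile(data, 98.0)

print("cap:", cap)
print("mean before:", sum(data) / len(data))
print("mean after: ", sum(capped) / len(capped))
```

Capping lowers both the mean and the variance, so it seems sensible to keep the original values alongside the capped ones and to report which samples were altered.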

              Cheers,

              Chaosheng Zhang


            • claudio.cocheo
              Message 6 of 10, Dec 14, 2001
                Dear Chaosheng,

                > Thanks. I think the sampling density is good enough to reveal the spatial
                > structure, and the extreme samples are located within the "hot spots". The
                > problem is that the few values are still extremely high within the "hot
                > spots". This may be what the "nugget effect" means.
                >
                > I'm just wondering if these few extreme values should really be
                > "discarded"/
                > "censored" or replaced. However, this could get some criticism as they may
                > be "real".

                Would it be possible, in your opinion, to model your variogram excluding those few
                extreme data and then to krige all the data, including the extreme values?
                In this way you would probably lose some spatial information about the
                variability of your data, but you could obtain a more reliable picture of the
                "background" values. It depends on what you are asking of your data.
                What do you, or anybody else, think?
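
The first half of this suggestion, computing the experimental variogram with and without the extremes, can be sketched as follows. The assumptions are mine: the classical (Matheron) estimator, omnidirectional distance classes centred on multiples of the lag, an arbitrary cutoff of 1000, and an invented 3 x 3 grid. The kriging step is not shown; it would use all the data with a model fitted to the trimmed variogram.

```python
import math

def experimental_variogram(points, values, lag, n_lags):
    """Classical estimator: half the mean squared difference per distance class.

    Class k (0-based) collects pairs whose separation lies within half a lag
    of (k + 1) * lag. Classes with no pairs are returned as None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            k = int(d / lag + 0.5) - 1
            if 0 <= k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2.0 * c) if c else None for s, c in zip(sums, counts)]

def without_extremes(points, values, cutoff):
    """Drop samples whose value exceeds `cutoff`."""
    kept = [(p, v) for p, v in zip(points, values) if v <= cutoff]
    return [p for p, _ in kept], [v for _, v in kept]

# Hypothetical 3 x 3 grid at 400 m spacing; the centre sample is extreme.
points = [(x, y) for y in (0.0, 400.0, 800.0) for x in (0.0, 400.0, 800.0)]
values = [100.0, 120.0, 90.0, 110.0, 5000.0, 130.0, 95.0, 105.0, 115.0]

gamma_all = experimental_variogram(points, values, lag=400.0, n_lags=3)
trim_pts, trim_vals = without_extremes(points, values, cutoff=1000.0)
gamma_trim = experimental_variogram(trim_pts, trim_vals, lag=400.0, n_lags=3)

print("all data:   ", gamma_all)
print("no extremes:", gamma_trim)
```

In this toy case a single extreme sample inflates the first-lag semivariance by orders of magnitude, which is the effect the trimmed variogram is meant to avoid.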

                regards
                Claudio

                ----------------------------------------------------------------------------

                Claudio Cocheo
                Fondazione Salvatore Maugeri - IRCCS
                Centro di Ricerche Ambientali
                via Svizzera, 16
                I 35127 - Padova
                ph. (39) 0498064511
                fax (39) 0498064555
                mailto:ccocheo@...
                website: http://www.fsm.it


              • Martin Roseveare
                Message 7 of 10, Dec 14, 2001
                  Chaosheng Zhang said

                  "Another problem is when we carry out spatial interpolation, these values may
                  produce artificial contour lines around these sampling locations, even
                  though they can be smoothed. I don't think this is the realistic situation
                  in the field."

                  This sounds like the crux of the problem. You sampled data and within it you
                  have discrete large values. You have confidence in the integrity of the data
                  but don't accept that for these values to be genuine you must have all these
                  'artificial' contour lines. This suggests to me that you are expecting the
                  data to behave so that these large values don't exist, yet you are saying
                  they should be regarded as valid. Is your sampling at a high enough spatial
                  resolution?

                  If you were to sample another point right next to one of these large values
                  would you expect another large value or a more 'normal' one? If you know the
                  answer to that then you should be able to decide whether the large values
                  are truly errors or simply unexpected but valid data. I would suggest the
                  problem here lies with understanding the underlying spatial variation of the
                  data set from which the samples were taken, rather than a problem of which
                  process to apply to the sampled data.

                  Just another way of looking at it!

                  regards,

                  Martin

                  ______________________________________

                  ArchaeoPhysica Ltd.
                  Reconnaissance & Geophysics for Archaeology

                  Telephone: +44 (0) 7050 369789
                  E-mail: mail@...
                  Website: http://www.archaeophysica.co.uk
                  ______________________________________

                • Pierre Goovaerts
                  Message 8 of 10, Dec 14, 2001
                    Hello,

                    The crux of the problem is the smoothing effect of kriging.
                    If you don't want to get artificial contour lines in your
                    map, you have 2 choices:
                    1. use stochastic simulation which generates maps that
                    are consistent with (reproduce) the variability of your data.
                    2. use a non-exact interpolator, that is, filter the
                    noise at data locations. An alternative is to slightly
                    shift the interpolation grid so that no interpolation
                    grid node coincides with a sampled location.
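
The grid-shift alternative is easy to check numerically. Kriging is an exact interpolator, so a grid node that sits on a sample reproduces that sample's value, extreme or not. The sketch below is an illustration with made-up geometry: a 4 x 4 sample grid at 400 m, an interpolation grid at 100 m, and a shift of half a cell (50 m). The helper names are hypothetical.

```python
def grid_nodes(x0, y0, spacing, nx, ny):
    """Regular grid of nx * ny nodes starting at (x0, y0)."""
    return [(x0 + i * spacing, y0 + j * spacing) for j in range(ny) for i in range(nx)]

def coincident(nodes, samples, tol=1e-6):
    """Count nodes lying within `tol` (in both x and y) of any sample location."""
    return sum(
        1
        for n in nodes
        if any(abs(n[0] - s[0]) <= tol and abs(n[1] - s[1]) <= tol for s in samples)
    )

samples = grid_nodes(0.0, 0.0, 400.0, 4, 4)      # hypothetical 400 m sample grid
aligned = grid_nodes(0.0, 0.0, 100.0, 13, 13)    # 100 m nodes starting on a sample
shifted = grid_nodes(50.0, 50.0, 100.0, 12, 12)  # same spacing, offset by half a cell

print("aligned grid, nodes on samples:", coincident(aligned, samples))
print("shifted grid, nodes on samples:", coincident(shifted, samples))
```

With the shifted grid no node coincides with a sample, so every mapped value is a weighted average of several data rather than a copy of a single one.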

                    Pierre
                    Pierre Goovaerts
                    Assistant professor
                    Dept of Civil & Environmental Engineering
                    The University of Michigan
                    EWRE Building, Room 117
                    Ann Arbor, Michigan, 48109-2125, U.S.A
                    E-mail: goovaert@...
                    Phone: (734) 936-0141
                    Fax: (734) 763-2275
                    http://www-personal.engin.umich.edu/~goovaert/


                  • Marcel Vallée
                    Message 9 of 10, Dec 14, 2001
                      Dear Chaosheng Zhang

                      This problem can be looked at from various perspectives. You have to fit the data into the broader
                      picture and objectives.

                      First, what do your soil samples represent? How were they collected, and what was their size?
                      Are they spot samples, or multiple takes in a cross pattern with x metres between takes, up to
                      y metres away from the centre?

                      A significant part of the nugget effect in rock or soil materials may be generated by sampling
                      and sample preparation. If these samples were assayed by AA, what was the size of the portion
                      used? A one-gram portion is much more liable to generate a nugget effect than a 5 or 10 gram
                      portion whenever the pulverised material is not fine and uniform enough.
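The point about portion size can be made concrete with a toy calculation. This is only a sketch under a strong assumption that is not stated in the message: the metal sits in discrete rich particles scattered at random through the pulverised material, so the number of particles caught in a portion follows a Poisson distribution. The function name `relative_sd` and the figure of 2 particles per gram are hypothetical, chosen purely for illustration.

```python
import math


def relative_sd(portion_grams, particles_per_gram):
    """Relative standard deviation of the particle count in one portion.

    Assumes (hypothetically) a Poisson count with mean
    portion_grams * particles_per_gram, for which SD / mean = 1 / sqrt(mean).
    """
    expected_count = portion_grams * particles_per_gram
    return 1.0 / math.sqrt(expected_count)


# Compare the portion sizes mentioned in the message: 1, 5 and 10 grams.
for grams in (1, 5, 10):
    print(f"{grams:>2} g portion: relative SD ~ {relative_sd(grams, 2.0):.2f}")
# ->  1 g portion: relative SD ~ 0.71
# ->  5 g portion: relative SD ~ 0.32
# -> 10 g portion: relative SD ~ 0.22
```

Under this assumption the scatter between repeat portions shrinks with the square root of the portion mass, which is one way a small portion can show up as an apparent nugget effect.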

                      Second, what is the purpose of your study? Academic work? Detection, remediation-restoration,
                      etc.? The high values might have physical significance in the latter perspective, and
                      smoothing them may not be the ideal solution. Lead and arsenic contamination cannot be
                      neglected or minimized.
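Before smoothing or replacing anything, it may help to see how much a cap actually changes the numbers. The sketch below caps values at a chosen percentile, as in the idea of replacing extremes with the 98% or 99% percentile, and reports the mean before and after. The function `cap_at_percentile` is a hypothetical helper, the nearest-rank percentile rule is my assumption, and the ten values are toy numbers borrowed from the Pb percentile column, not the real 223 samples.

```python
import math


def cap_at_percentile(values, pct):
    """Replace every value above the pct-th percentile with that percentile.

    Uses the nearest-rank definition of a percentile (an assumption; other
    definitions interpolate and give slightly different caps).
    Returns the capped list and the cap itself.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values], cap


# Toy Pb values in mg/kg (illustrative only).
pb = [25, 41, 62, 168, 821, 2799, 4490, 7609, 11750, 16305]
capped, cap = cap_at_percentile(pb, 90)

print("cap:", cap)                               # -> cap: 11750
print("mean before:", sum(pb) / len(pb))         # -> mean before: 4407.0
print("mean after:", sum(capped) / len(capped))  # -> mean after: 3951.5
```

Reporting both figures side by side keeps the original highs visible, which matters if they turn out to be real contamination rather than sampling noise.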

                      From an industry or regulatory perspective, the recommendation in that case might be to carry
                      out additional sampling around the hot spots to delineate them better, say samples at 100 m
                      spacing, as well as checking the original hot spots with a sampling method designed to be
                      representative. I am afraid I may not be easing you out of your problem, but such is physical
                      reality.
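As a rough illustration of the infill idea, the sketch below lists candidate sample locations at 100 m spacing around one hot spot, including the original location so that it can be re-checked. The function name `infill_points` and the 200 m half-width (half of the 400 m grid interval) are my assumptions, not part of the recommendation itself, and the coordinates are made up.

```python
def infill_points(x0, y0, spacing=100.0, half_width=200.0):
    """Candidate infill locations on a square grid centred on a hot spot.

    spacing    -- distance between infill samples, in metres
    half_width -- how far the infill grid extends from the hot spot, in metres
    The centre point (x0, y0) is included so the original hot spot is re-sampled.
    """
    steps = int(half_width // spacing)
    points = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            points.append((x0 + i * spacing, y0 + j * spacing))
    return points


# Hypothetical hot spot at easting 1200 m, northing 800 m on the 400 m grid.
pts = infill_points(1200.0, 800.0)
print(len(pts))  # -> 25 (a 5 x 5 grid, including the original location)
```

With these defaults each hot spot costs 25 samples, so the half-width is the main lever for trading delineation detail against sampling effort.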

                      Chapter 8 in Jeff Myers's book "Geostatistical Error Management" deals with sampling and
                      Chapter 16 with sampling strategy. I published a text on "Sampling Quality Control" from a
                      mineral exploration and development perspective in Exploration and Mining Geology, Vol 7,
                      No 1-2, p. 107-116 (1998). This issue has several other papers on sampling. If it is not
                      available to you, I could send you a file copy of my paper.

                      Cheers

                      Marcel Vallée

                      Geoconseil Marcel Vallée Inc.
                      706 Routhier Ave
                      Québec, Québec G1X 3J9
                      Canada
                      Tel: (1) 418 652 3497
                      Fax: (1) 418 652 9148
                      Email: vallee.marcel@...

                      ================================================

                      14/12/01 06:33:35, Chaosheng Zhang <Chaosheng.Zhang@...> wrote:

                      >Dear Marcel Vallée,
                      >
                      >Thanks. I think the sampling density is good enough to reveal the spatial
                      >structure, and the extreme samples are located within the "hot spots". The
                      >problem is that the few values are still extremely high within the "hot
                      >spots". This may be what the "nugget effect" means.
                      >
                      >I'm just wondering if these few extreme values should really be "discarded"/
                      >"censored" or replaced. However, this could get some criticism as they may
                      >be "real".
                      >
                      >If it is hard to find the best way, I will have to "replace" all the extreme
                      >values with 99% or 98% percentiles. But I'm not sure if it is appropriate to
                      >do so.
                      >
                      >Cheers,
                      >
                      >Chaosheng Zhang
                      >
                      >
                      >----- Original Message -----
                      >From: "Marcel Vallée" <vallee.marcel@...>
                      >To: <ai-geostats@...>; "Chaosheng Zhang" <Chaosheng.Zhang@...>
                      >Sent: Thursday, December 13, 2001 10:40 PM
                      >Subject: Re: AI-GEOSTATS: Extreme values?
                      >
                      >
                      >>
                      >> Dear Chaosheng Zhang
                      >>
                      >> The sampling interval is so wide that the high values could easily be
                      >> related to "hot spots" of higher grade contamination, i.e. dumping areas
                      >> for particular kinds of slags, mineralized waste, etc. A property map
                      >> might help.
                      >>
                      >> Have you contoured the data? If so, bear in mind that the sampling
                      >> interval is so wide that real hot spots of environmental significance
                      >> might not show their 2D distribution on such a wide sampling grid.
                      >>
                      >> Regards
                      >>
                      >> Marcel Vallée, Eng., Geo.
                      >> Geoconseil Marcel Vallée Inc.
                      >> 706 Routhier Ave
                      >> Québec, Québec G1X 3J9
                      >> Canada
                      >> Tel: (1) 418 652 3497
                      >> Fax: (1) 418 652 9148
                      >> Email: vallee.marcel@...
                    • Myers, Jeff
                      Message 10 of 10 , Dec 14, 2001
                        Chaosheng Zhang -

                        I think Marcel Vallée is headed in the right direction on your problem.
                        There is a good chance that the problem is one of sample and/or subsample
                        support. As mentioned, if you sampled within a foot or two of a location
                        that displays an extreme or "outlier" value, you may find values an order of
                        magnitude or more below the outlier. Similarly, you may also have
                        "inliers", where a sample near a location with a low concentration may
                        contain a significantly higher value. Of course, no one gets excited about
                        the inliers that may be unrepresentative, but we get very excited about the
                        outliers!

                        The possibility of extreme values should be planned for in the initial stage
                        of the sampling program. Pierre Gy's work has revealed that the physical
                        size, volume, and orientation of a sample and subsample (i.e. the support)
                        are crucial to the concentration estimate obtained. You are asking a lot to
                        have a 10-g sample represent 400 meters between sample locations in any
                        case. Unless the support of the original sample and all subsampling stages
                        was sufficient, there is little chance that the samples are highly
                        representative of the true concentration. Mine areas typically are very
                        heterogeneous, and proper sample support is essential.
                        Perhaps you can provide some details. If the underlying data are not
                        representative due to improper support, you are trying to "contour an
                        illusion", and typically the results are not pleasing.

                        The way in which the data are used in decision-making is also important.
                        For instance, if your purpose is to delineate hot spots for risk assessment,
                        extreme values do not pose a problem as they will be addressed. You may,
                        however, be most interested in getting your best information near an economic
                        cutoff value or risk threshold, since the treatment decision for values
                        far above or far below the action level is easy.
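One way to picture this is to sort values into three groups relative to the action level, so that attention goes to the ones where the decision is genuinely uncertain. The sketch below is only illustrative: the function `classify`, the factor of 2 that defines "near", and the action level of 500 mg/kg are all hypothetical and are not taken from any regulation or from this thread.

```python
def classify(value, action_level, factor=2.0):
    """Label a concentration relative to an action level.

    Values at least `factor` times the action level are 'clearly above',
    values at most 1/`factor` of it are 'clearly below', and everything
    in between is 'near action level', where better data matter most.
    """
    if value >= action_level * factor:
        return "clearly above"
    if value <= action_level / factor:
        return "clearly below"
    return "near action level"


# Pb median, 75% and 90% percentiles from the original post (mg/kg),
# against a hypothetical action level of 500 mg/kg.
for pb in (168, 821, 2799):
    print(pb, "->", classify(pb, 500.0))
# -> 168 -> clearly below
# -> 821 -> near action level
# -> 2799 -> clearly above
```

Under this reading the extreme highs are the easy cases, and any extra sampling or analytical effort is better spent on the locations that fall in the middle band.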

                        Jeff Myers
                        Westinghouse Safety Management Solutions
                        2131 S. Centennial Ave., SE
                        Aiken, SC 29803
                        803.502.9747 (direct)
                        803.502.9767 (main)
                        803.502.2747 (fax)
                        jeff.myers@...
                        http://www.gemdqos.com

