GEOSTATS: variograms and count data

  • Patrick J. Doran
    Message 1 of 6 , Feb 10 9:36 AM
      Greetings!

      I am a graduate student at Dartmouth College and a new user to AI-GEOSTATS. I am hoping to use geostatistical techniques to help describe the distribution and abundance of bird species in a forested ecosystem, with the goal of understanding the mechanisms that influence spatial variation in abundance. I have censused bird species at approx. 350 locations in a 3000ha forested valley and am beginning to analyze the data. Additionally, I have access to a wide variety of other environmental and ecological variables from each census plot.

      As a first cut I hope to use variogram analysis and kriging to provide a thorough description of the spatial distribution of the bird census data. However, I am concerned with the nature of the census data - as with most bird census data, the range is generally from 0-3 individuals/point (in intervals of 0.33) and the data are highly skewed to the right (most values sit near zero, with a long upper tail) and contain many zero values. For example, for one species at 373 plots the mean per plot is 0.55, the range is 0-2.67 and I have 110 values of "0", 91 values of "0.33", 81 of "0.66", 38 of "1", 30 of "1.33", 14 of "1.66", 5 of "2", 3 of "2.33" and 1 of "2.67".
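As a quick check on the figures quoted above, the sketch below rebuilds the summary statistics from the frequency table in the message. The moment-based skewness coefficient is an illustrative choice, not something the original post computed.

```python
# Frequency table quoted in the message: value -> number of plots.
counts = {0.0: 110, 0.33: 91, 0.66: 81, 1.0: 38, 1.33: 30,
          1.66: 14, 2.0: 5, 2.33: 3, 2.67: 1}

n = sum(counts.values())
mean = sum(v * c for v, c in counts.items()) / n
var = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
# Moment coefficient of skewness; a positive value means a long upper tail.
skew = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n / var ** 1.5
zero_fraction = counts[0.0] / n

print(n, round(mean, 2), round(zero_fraction, 2), skew > 0)
```

The table sums to the 373 plots and reproduces the stated mean of about 0.55; a little under a third of the plots are zeros.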

      Due to the highly skewed nature of this data, do I have to transform it before I attempt the variogram analysis and kriging? What transformation would work best? I have attempted to search the ai-geostats archive and the intro texts and have not been able to answer these simple questions. Are there any other references that may help me with this analysis?

      Any help would be very much appreciated and will be summarized for the list.

      Thanks,

      Pat Doran

      Patrick J. Doran
      Department of Biological Sciences
      Dartmouth College
      Hanover, NH 03755
      603-646-3688
      Patrick.J.Doran@...
      --
      *To post a message to the list, send it to ai-geostats@....
      *As a general service to list users, please remember to post a summary
      of any useful responses to your questions.
      *To unsubscribe, send email to majordomo@... with no subject and
      "unsubscribe ai-geostats" in the message body.
      DO NOT SEND Subscribe/Unsubscribe requests to the list!
    • Simon Brewer
      Message 2 of 6 , Feb 11 1:28 AM
        Hi Pat

        I asked this question a while ago, as I have a very similar problem with my data - too many zeros!!

        Here's the summary of the responses I received. I also made a copy of other emails I found on this topic. I will send these on to you as well.

        hth

        Simon

        Non-normal distribution

        I posted a question to the mailing list on the 30 Nov, concerning the
        non-normal distribution of some data, that I am trying to map using the
        kriging process. To follow this up, I am posting a brief synopsis of the
        (many) replies I received. So first, a big THANK YOU to everyone who
        took the time to send me answers, suggestions and references. I have put
        where possible the names of the people who have supplied the
        information. If I've missed any one - I'm very sorry, please let me
        know.

        1) Should the data be normally distributed? The first point, stressed by
        a number of people, was that there is NO requirement for data to be
        normally distributed for use in the computation of semi-variograms, or
        predictions made by kriging. (Gilles Guillot, Daniel Guibal, Pierre
        Goovaerts). However, as it is a linear estimator, kriging is sensitive
        to a few large samples, which may bias the results. Notably, the
        variogram may become unstructured, close to a pure nugget effect. As Med
        Bennett pointed out, normality is a rare find in mining and
        environmental data! On this point it is also worth reading Don Myers'
        message dated 25 June 1997, and the subsequent postings.

        A number of tests of normality have been mentioned, and I list some of
        them here:
        Q-Q (quantile-quantile) plot
        Kolmogorov-Smirnov test
        Data density vs. Theoretical density
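As a rough illustration of the Kolmogorov-Smirnov idea, the sketch below computes the largest gap between the empirical distribution and a normal distribution fitted by moments. Both samples are hypothetical, and only the statistic is computed: the usual critical values assume independent random sampling with known parameters, which spatial data rarely satisfy.

```python
from statistics import NormalDist, mean, pstdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a moment-fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), pstdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d


# Hypothetical samples: one roughly symmetric, one with many zeros.
symmetric = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 6.8, 7.3]
zero_heavy = [0, 0, 0, 0, 0, 0.33, 0.33, 0.66, 1.0, 2.67]

d_symmetric = ks_statistic_normal(symmetric)
d_zero_heavy = ks_statistic_normal(zero_heavy)
print(d_symmetric, d_zero_heavy)
```

The zero-heavy sample shows the larger gap, as one would expect from the spike at zero.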

        2) What can I do with the data?
        Three possibilities were suggested, as follows:

        a) Transformations - Skewed data may be transformed to a normal
        distribution by a non-linear transformation, e.g. logarithmic. If,
        however, the data are then back-transformed to real values, then the
        unbiased property of the kriging estimates is lost. Other
        transformations have been suggested: the family of Box-Cox power
        transformations, the square-root transformation (for count data), the
        arcsine square-root transformation for percentage data, and the normal
        score transform, of which, I'm afraid, I know nothing! (Joyce Witebsky,
        Daniel Guibal, Vera Pawlowsky-Glahn)
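The loss of unbiasedness on back-transformation can be seen with a plain average standing in for a kriging estimate (an assumption made for brevity; kriging uses unequal weights). The counts below are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical skewed counts with a long upper tail.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 5, 12]

arithmetic_mean = mean(counts)
# Average on the square-root scale, then square to return to the original scale.
back_transformed = mean(sqrt(c) for c in counts) ** 2

print(arithmetic_mean, back_transformed)
```

Because the square root is concave, the naively back-transformed average falls below the arithmetic mean whenever the values are not all equal.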

        However, for my data (which is the percentage fossil pollen of various
        tree taxa in lake sediments), the distribution contains a large number
        (approx 50%) of zero values (where no pollen was found). Any attempt to
        transform this simply results in a large number of arbitrary values,
        replacing the zeros. So, another method was needed...

        b) Data partitioning. If the zero values were grouped in such a way that a
        physically different domain could be identified, then it would be
        possible to construct variograms for the different domains. I cannot do
        this, except by arbitrary domaining, which would defeat the object of
        the exercise! (Daniel Guibal, Vera Pawlowsky-Glahn)

        c) Indicator kriging. This would allow an estimate at each prediction
        location of whether or not the fossil pollen would have been present
        (zero or non-zero value). This could then be combined with ordinary
        kriging of the non-zero points. (Daniel Guibal).

        Alternatively, the range could be discretised, using a number of
        thresholds. This I have taken directly from Pierre Goovaerts' message:
        What I would suggest is to use an indicator approach,
        that is:
        1. discretize the range of variation of your data using
        a given number of thresholds, say 5: the first threshold
        would be 0% (which is close to the median of your sample
        distribution) and 4 other thresholds corresponding to the
        0.6, 0.7, 0.8 and 0.9 quantiles of your distribution.
        2. for each threshold, code each observation into an indicator value
        which is zero if the measured percentage is larger than the threshold
        and one otherwise.
        3. Compute and model the 5 corresponding indicator semivariograms,
        that is the semivariograms of indicator values.
        4. Use indicator kriging to interpolate the probabilities
        to be no greater than each of the 5 thresholds at the nodes
        of your interpolation grid.
        5. At each location, you can now model the conditional cumulative
        distribution function which provides you with the probability
        that the unknown percentage value is no greater than any
        given threshold. You could use the mean of that distribution
        as your estimate and the variance as a measure of uncertainty.
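Steps 2 and 5 above can be sketched as follows. The percentages, thresholds, and the probabilities at the grid node are all hypothetical, and placing each class's probability at the class midpoint is an assumption made for illustration; the indicator kriging itself (steps 3 and 4) is not shown.

```python
def indicator_code(values, thresholds):
    """One 0/1 list per threshold: 1 if the value is <= threshold, else 0."""
    return [[1 if v <= t else 0 for v in values] for t in thresholds]


def etype_mean_variance(thresholds, cdf_probs, z_min, z_max):
    """Mean and variance of a discretised conditional distribution.

    Assumption (illustrative): each class's probability sits at the class
    midpoint; cdf_probs must already be nondecreasing and within [0, 1].
    """
    edges = [z_min] + list(thresholds) + [z_max]
    cum = [0.0] + list(cdf_probs) + [1.0]
    mids = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
    probs = [hi - lo for lo, hi in zip(cum, cum[1:])]
    mean = sum(p * m for p, m in zip(probs, mids))
    var = sum(p * (m - mean) ** 2 for p, m in zip(probs, mids))
    return mean, var


# Hypothetical pollen percentages and thresholds (first threshold is 0%).
percentages = [0.0, 0.0, 3.0, 12.0, 45.0]
thresholds = [0.0, 5.0, 10.0, 20.0, 40.0]
indicators = indicator_code(percentages, thresholds)

# Hypothetical indicator-kriging output at one grid node: P(Z <= threshold).
cdf_at_node = [0.5, 0.6, 0.7, 0.8, 0.9]
mean, var = etype_mean_variance(thresholds, cdf_at_node, z_min=0.0, z_max=100.0)
print(indicators[0], mean, var)
```

With the first threshold at zero, the first class collapses to the single value 0, so the spike of zeros is carried through as its own probability mass rather than being transformed away.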

        The method of indicator kriging seems most appropriate to the data I
        have, as it appears to be a more robust method. So I will try this at the
        weekend - wish me luck...

        Well, it's been quite a crash course!! I don't believe I have understood
        everything, so if anyone sees any blinding errors in this message -
        again please let me know.

        If anyone wants a copy of all the replies received, and a small
        collection of other messages related to the issue of non-normality,
        please contact me. I did not include all this in this mail, to keep its
        size down. On the other hand, if it is generally felt that all replies
        should be posted, I would be happy to do that.

        --

        -------------------------------------------------------
        Simon Brewer email: simon.brewer@...

        European Pollen Database
        (Laboratoire de Botanique Historique et Palynologie)
        (IMEP CNRS URA 1152)
        (Faculte St Jerome - Aix Marseille III)

        Centre Universitaire d'Arles
        Place de la Republique
        13200 Arles - France
        Tel: (33)-(0)4 90 96 18 18 Fax: (33)-(0)4 90 93 98 03
        -------------------------------------------------------



      • Donald Myers
        Message 3 of 6 , Feb 14 9:25 AM
          Just a word of caution about tests for normality, i.e., your question about
          whether normality is required for geostatistical tools.

          In geostatistics the data is considered to be a non-random sample from one
          realization of the random function (otherwise the underlying concepts of
          geostatistics would not apply). There are at least two different notions of
          normality to consider:

          1. If one considered the data obtained from all the locations in the region
          of interest, would the distribution of this data be normal? The data is a
          sample from this distribution but is not a random sample (random location
          selection is not quite the same thing as random sampling).

          Note that the region of interest is often defined after the fact, i.e.,
          after data has been collected.

          2. Assuming that the random function is strongly stationary, is the
          univariate distribution normal? Note that the data is not a sample from
          this distribution.

          The standard tests for normality are based on random sampling FROM the
          distribution, and it is difficult to modify the tests to allow for the spatial
          correlation (especially without knowing the "true" variogram/covariance).
          Hence one should be cautious in treating the results of a test of
          hypothesis (for normality) as really definitive.

          Although not often mentioned, we use a form of ergodicity in geostatistics,
          but the simplest form of this pertains to moments, not distributions. For
          example, the weak law of large numbers implies that the sample proportion
          (for a random sample) will converge (in a certain sense) to the population
          proportion. Similarly, we expect the sample mean to converge to the
          population mean (as the sample size increases), but note that the
          distribution of the standardized sample mean tends to the standard normal,
          not to the original distribution.

          We do know that if the random function is multivariate normal that the
          simple kriging estimator is the conditional expectation and hence is THE
          minimum variance, unbiased estimator/predictor. In general however it is
          only the minimum variance, unbiased LINEAR estimator/predictor.

          The bottom line, however, is probably not a statistical question: do the
          geostatistical tools produce useful results? "Useful" has to be decided by
          the user, not the statistician. Across a wide spectrum of applications the
          answer seems to be yes but in specific instances the answer may be no
          because of a lack of data, difficulty in estimating/modeling the variogram,
          sensitivity of any linear estimator to unusual data values, etc.

          Donald E. Myers
          Department of Mathematics
          University of Arizona
          Tucson, AZ 85721

          http://www.u.arizona.edu/~donaldm

        • Dragoljub Pokrajac - EECS
          Message 4 of 6 , Feb 14 5:02 PM
            Assume that we have two sets of geostatistical data. Is there any
            statistical test to determine whether variograms on those two sets are the
            same?

            Thanks,

            A. Lazarevic




          • Daniel Bebber
            Message 5 of 6 , Feb 15 1:27 AM
              I asked a similar question a while back, and was sent the following
              reference by Andrew Lister

              Kabrick J. M., Clayton M. K. & McSweeney K. 1997. Spatial patterns of carbon
              and texture on drumlins in northeastern Wisconsin. Soil Sci. Soc. Am. J.
              61(2):541-548.

              This contains a method for estimating the errors on a semivariogram value,
              and testing for differences between them. If anyone has any comments on this
              methodology I would be interested to hear them.

              Dan
              _____________________________________
              Mr. Daniel P. Bebber
              Department of Plant Sciences
              University of Oxford
              South Parks Road
              Oxford OX1 3RB
              UK
              Tel. 01865 275000 Fax. 01865 275074

            • Donald Myers
              Message 6 of 6 , Feb 15 9:52 AM
                One should be a little careful about accepting the validity of a test for
                the "equality" of two variograms. If one uses an estimator such as the
                sample variogram, one only obtains estimates of the values of the variogram
                for a finite number of lags (note that dealing with a possible anisotropy
                makes it even more complicated). Moreover the reliability of these
                estimates varies, in part because the numbers of pairs will vary. If one is
                using the variogram for kriging or simulation then one is most interested
                in the behavior of the variogram near the origin, i.e., the values for short
                lags, and unfortunately the short lags usually have the smallest numbers of pairs. If
                one uses least squares or maximum likelihood then one must first choose a
                model (or models in the case of a nested model) and then one of these is
                used to estimate the parameters.
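To make the point about finite lags and varying pair counts concrete, here is a minimal sketch of the classical (Matheron) sample variogram that also reports the number of pairs per lag. The transect data and the binning rule (bins centred on multiples of `lag_width`) are hypothetical choices for illustration.

```python
from math import hypot


def sample_variogram(coords, values, lag_width, n_lags):
    """Half the mean squared difference per lag bin, plus the pair counts.

    Bin k is centred on (k + 1) * lag_width.
    """
    sums = [0.0] * n_lags
    pairs = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            k = int(h / lag_width + 0.5) - 1
            if 0 <= k < n_lags:
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                pairs[k] += 1
    gamma = [s / p if p else None for s, p in zip(sums, pairs)]
    return gamma, pairs


# Hypothetical regular transect with unit spacing.
coords = [(float(i), 0.0) for i in range(6)]
values = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
gamma, pairs = sample_variogram(coords, values, lag_width=1.0, n_lags=3)
print(gamma, pairs)
```

On this regular transect the pair count falls as the lag grows; with scattered sample locations it is often the shortest lags that are sparsest, which is the situation the message describes. Either way, each estimate rests on a different number of pairs.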

                There is an old paper by Davis and Borgman in Mathematical Geology (circa
                1980) on the distribution of the sample variogram. They give two results:
                (1) beginning with an assumption of multivariate normality (which is not
                testable) and an assumed model type, they obtain numerical results for
                the distribution; (2) they obtain asymptotic results which are
                theoretically interesting but probably not much help in practice.

                There is also a paper in Mathematical Geology, circa 1990, on the "true"
                numbers of pairs. The problem as is well known is that there is an
                interdependence between the pairs used to estimate for one lag and those
                used to estimate for another. The author has to assume multivariate
                normality to derive the results.

                It is known that the kriging estimator is relatively robust with respect to
                the variogram, i.e., slight changes in the variogram will result in only
                slight changes in the kriging weight vector and hence in general only
                slight changes in the kriged values. There are at least two different ways
                to quantify the "distance" between two variograms, these correspond to a
                notion of continuity. A third one corresponds to differentiability, none of
                the three implies the others.
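The robustness claim can be probed numerically for one small case. The sketch below solves the ordinary kriging system in its variogram form for two exponential variograms whose ranges differ by 10% and compares the weights. The sample locations, target, and variogram parameters are hypothetical, and a single configuration is only an illustration, not a proof.

```python
from math import exp, hypot


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ok_weights(points, target, gamma):
    """Ordinary kriging weights from the variogram form of the kriging system."""
    def dist(p, q):
        return hypot(p[0] - q[0], p[1] - q[1])

    n = len(points)
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    return solve(a, b)[:n]  # drop the Lagrange multiplier


def exponential(sill, practical_range):
    """Exponential variogram model without nugget (hypothetical parameters)."""
    return lambda h: sill * (1.0 - exp(-3.0 * h / practical_range))


points = [(0.0, 0.0), (4.0, 0.0), (0.0, 5.0), (6.0, 6.0)]
target = (2.0, 2.0)
w_a = ok_weights(points, target, exponential(1.0, 10.0))
w_b = ok_weights(points, target, exponential(1.0, 11.0))
max_shift = max(abs(a - b) for a, b in zip(w_a, w_b))
print(w_a, w_b, max_shift)
```

Comparing `max_shift` with the size of the weights themselves gives a feel for how little a modest change in range moves the solution in this configuration.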

                In practice one often uses a search neighborhood in kriging hence it is
                only of interest whether the variograms match or are at least close for up
                to some maximum lag. One will have very little information about the
                variogram for longer lags anyway.

                In general statistical tests will require some distributional assumptions
                and these are hard to obtain for variograms/variogram estimators. Whether
                the variograms for two different variables, or for the same variable in two
                different regions, are the same is an interesting question, but one
                that will be hard to test without making very strong (non-testable)
                assumptions.

                Finally one might want to consider the question of sample location pattern
                design relative to testing the equality of two variograms. I have an old
                paper with A.W. Warrick on the design of sampling plans in order to control
                the numbers of pairs for each lag. If one assumes isotropy (it is even more
                complicated in the case of anisotropy) then the pattern that generates an
                equal number of pairs is a spiral, not a very practical result.

                Note also that if one assumes normality then the distribution of the
                half-squared differences will be a scaled Chi-Squared with one degree of
                freedom (one can see this effect in most sample variograms; the VARIO
                component of GEOEAS will provide histograms for these distributions). Not a
                particularly nice distribution for testing because of the "fat" tails.
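The shape of that distribution is easy to see by simulation. The sketch below draws independent Gaussian differences with variance 2 * gamma and looks at the half-squared values; the variogram value and sample size are hypothetical, and real pairs are not independent of one another, so this only illustrates the marginal shape.

```python
import random
from statistics import mean, median

random.seed(0)
gamma = 2.0        # hypothetical variogram value at some lag
n_pairs = 50_000

# Under normality, Z(x) - Z(x + h) is normal with mean 0 and variance 2 * gamma,
# so half its square is gamma times a chi-squared variable with 1 degree of freedom.
half_sq = [0.5 * random.gauss(0.0, (2.0 * gamma) ** 0.5) ** 2
           for _ in range(n_pairs)]

sim_mean = mean(half_sq)
sim_median = median(half_sq)
print(sim_mean, sim_median)
```

The mean lands near gamma while the median is less than half of it, a sign of the strong right skew that makes the half-squared differences awkward to test with.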

                Donald E. Myers
                Department of Mathematics
                University of Arizona
                Tucson, AZ 85721

                http://www.u.arizona.edu/~donaldm

                At 05:02 PM 2/14/00 -0800, you wrote:
                >Assume that we have two sets of geostatistical data. Is there any
                >statistical test to determine whether variograms on those two sets are the
                >same?
                >
                >Thanks,
                >
                >A. Lazarevic