
AI-GEOSTATS: Log transformation and zeros

  • Ernesto Jardim
    Message 1 of 8 , Oct 2, 2002
      Hi

      I'm analysing fisheries data (number of fish caught per hour) and I have
      some 0 values. When I log-transform I have to shift the values by adding
      some value.

      My question is: which value is best? Is there any work on this?

      I usually add 1, so that I get values between 0 and infinity (no negative
      values), but I have doubts about it.

      Regards

      EJ
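To see why the choice of constant matters, here is a minimal Python sketch with made-up catch rates (the data and the helper name `shifted_log` are illustrative, not from the thread). With log(x + c), every zero maps to log(c), so the gap between the zeros and the smallest positive catch x_min, which is log((x_min + c) / c), grows as c shrinks. Adding c = 1 keeps all transformed values non-negative, but "1" is tied to the measurement units (1 fish per hour), so the same choice behaves differently if the data are rescaled.

```python
import math

# Hypothetical catch rates (fish per hour); zeros are hauls with no catch.
catch_rate = [0.0, 0.0, 0.5, 1.2, 3.0, 8.5, 40.0]

def shifted_log(values, c):
    """Return log(x + c) for each x; c must be > 0 when the data contain zeros."""
    return [math.log(x + c) for x in values]

for c in (1.0, 0.1, 0.01):
    t = shifted_log(catch_rate, c)
    # t[0] is a transformed zero, t[2] the smallest positive catch (0.5)
    print(f"c={c}: zeros -> {t[0]:.3f}, gap to smallest catch = {t[2] - t[0]:.3f}")
```

The smaller the constant, the more the zeros are pulled away from the rest of the data as a separate cluster in the lower tail, which is one reason the question has no unit-free answer.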




      --
      * To post a message to the list, send it to ai-geostats@...
      * As a general service to the users, please remember to post a summary of any useful responses to your questions.
      * To unsubscribe, send an email to majordomo@... with no subject and "unsubscribe ai-geostats" followed by "end" on the next line in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list
      * Support to the list is provided at http://www.ai-geostats.org
    • Isobel Clark
      Message 2 of 8 , Oct 2, 2002
        Ernesto

        There are several ways of tackling skewed data with
        zeroes, and I am sure you will get emails from
        proponents of one approach or another.

        Ways which I have found useful:

        (1) Try a lognormal probability plot and see whether
        you have a straight line or whether it drops off the line
        at low values. A drop-off at low values is indicative of a
        three-parameter lognormal distribution, which needs an additive
        constant. Find the additive constant that makes the
        line straightest (my criterion) or the skewness
        closest to zero (Sichel's recommendation). You can
        find this described in my 1987 paper following
        Sichel's definitive works. Full copy at
        http://uk.geocities.com/drisobelclark/resume/Publications.html
        (paper titled "turning the tables....")

        (2) treat the zeroes as a different population. Are
        they zero because there are no fish there or because
        you didn't catch any? If the latter, use an indicator
        approach to separate the 'no fish' population from the
        'some fish' one. Then do your lognormal stuff on the
        'some fish' and recombine for final results.

        (3) - not so nice: use the probability plot as
        suggested above to choose a 'threshold' value to
        replace the zeroes. This assumes that all areas
        sampled are 'some fish' areas and you just didn't
        catch any.
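Isobel's option (1) can be sketched in Python as a grid search over candidate constants c, scoring log(x + c) by |skewness| (the Sichel-style criterion) and by the correlation of a normal probability plot (a numerical stand-in for "straightest line"). The grid search, the plotting positions (i + 0.5)/n and the function names are my assumptions, not the procedures in the cited papers; every candidate must be positive when the data contain zeros.

```python
import math
from statistics import NormalDist, mean, pstdev

def skewness(xs):
    """Moment coefficient of skewness (population form)."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def probplot_r(xs):
    """Correlation between sorted values and standard normal quantiles,
    using plotting positions (i + 0.5) / n; 1.0 means a perfectly straight plot."""
    ys = sorted(xs)
    n = len(ys)
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    my, mq = mean(ys), mean(qs)
    num = sum((y - my) * (q - mq) for y, q in zip(ys, qs))
    den = math.sqrt(sum((y - my) ** 2 for y in ys) * sum((q - mq) ** 2 for q in qs))
    return num / den

def choose_constant(values, candidates):
    """Return (c_skew, c_line): the candidate c giving log(x + c) the skewness
    closest to zero, and the candidate giving the straightest probability plot."""
    logs = {c: [math.log(x + c) for x in values] for c in candidates}
    c_skew = min(candidates, key=lambda c: abs(skewness(logs[c])))
    c_line = max(candidates, key=lambda c: probplot_r(logs[c]))
    return c_skew, c_line
```

For example, values built as exp(2 + 0.5 q) - 2 from normal quantiles q are a three-parameter lognormal with constant 2, and both criteria should pick c = 2 from a grid that contains it. In practice the grid would be refined around the best value and the result still checked on the plot itself.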

        Isobel Clark
        http://uk.geocities.com/geoecosse/news.html

      • Syed Abdul Rahman Shibli
        Message 3 of 8 , Oct 2, 2002
          If the skewness of the fish data is causing havoc with your
          variograms, try one of the more "robust" measures, e.g.
          the family of relative variograms (general/pairwise) or the
          non-ergodic covariance. A transformation would mask the extreme
          values, which may or may not be very significant in your
          problem domain. Then krige within a limited search
          neighborhood, or try an indicator approach at various
          thresholds.
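A minimal Python sketch of the pairwise relative variogram Syed mentions, using the usual definition gamma_PR(h) = 1/(2N(h)) * sum over pairs of (z_i - z_j)^2 / ((z_i + z_j)/2)^2, taken over the N(h) pairs separated by roughly h. For non-negative data each term is at most 4, which is why a few very large catches cannot dominate it. The distance-tolerance binning and the choice to skip pairs of two zeros (where the ratio is undefined) are my assumptions; software packages differ in such details.

```python
import math
from itertools import combinations

def pairwise_relative_variogram(coords, values, lags, tol):
    """Experimental pairwise relative variogram for each lag h in `lags`,
    using all pairs whose separation d satisfies |d - h| <= tol."""
    result = []
    for h in lags:
        total, npairs = 0.0, 0
        for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
            if abs(math.dist(p, q) - h) <= tol:
                pair_mean = (zp + zq) / 2
                if pair_mean == 0:      # two zero catches: ratio undefined, skip
                    continue
                total += (zp - zq) ** 2 / pair_mean ** 2
                npairs += 1
        result.append(total / (2 * npairs) if npairs else float("nan"))
    return result
```

Because each term is a ratio, multiplying all catches by a constant (e.g. changing units) leaves the variogram unchanged.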

          Syed

        • Ruben Roa
          Message 4 of 8 , Oct 2, 2002

            I guess you mean you have to do something arbitrary about the zeros before
            the log transform. The delta distribution is a generalization of the
            lognormal that allows for zeros. See:
            Pennington, M. 1983. Efficient estimators of abundance for fish and plankton
            surveys. Biometrics 39:281-286.
            If you are interested, I have template worksheets that compute the
            statistics of delta/lognormal distributions, including confidence bounds,
            by using Land's theory of linear combinations of the normal mean and
            variance.
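Ruben's suggestion can be sketched in Python as follows. This is my reading of Pennington's (1983) estimator of the mean for delta-lognormal data, to be checked against the paper rather than taken as a reference implementation: with n hauls of which m are non-zero, and ybar and s^2 the mean and sample variance of the logs of the non-zero values, the estimate is (m/n) * exp(ybar) * G_m(s^2/2), where G_m is Finney's series; it reduces to x_1/n when m = 1 and to 0 when m = 0. The function names are hypothetical, and the confidence bounds Ruben mentions (Land's method) are not covered.

```python
import math
from statistics import mean, variance

def finney_g(m, t, tol=1e-15, max_terms=500):
    """Finney's series G_m(t) = 1 + (m-1)t/m
    + sum_{j>=2} (m-1)^(2j-1) t^j / (m^j (m+1)(m+3)...(m+2j-3) j!), for t >= 0."""
    total = 1.0
    term = (m - 1) * t / m            # j = 1 term
    j = 1
    while term > tol and j < max_terms:
        total += term
        # ratio of consecutive terms a_{j+1} / a_j
        term *= (m - 1) ** 2 * t / (m * (m + 2 * j - 1) * (j + 1))
        j += 1
    return total

def delta_lognormal_mean(values):
    """Delta-lognormal estimate of mean density (sketch of Pennington 1983)."""
    n = len(values)
    positives = [v for v in values if v > 0]
    m = len(positives)
    if m == 0:
        return 0.0
    if m == 1:
        return positives[0] / n
    logs = [math.log(v) for v in positives]
    ybar, s2 = mean(logs), variance(logs)   # variance() divides by m - 1
    return (m / n) * math.exp(ybar) * finney_g(m, s2 / 2)
```

The structure mirrors Isobel's option (2): the proportion of non-zero hauls, m/n, times a lognormal estimate for the "some fish" hauls. For large m, G_m(t) approaches exp(t), so the estimate approaches the simple plug-in (m/n) * exp(ybar + s^2/2).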
            You can write to me in Spanish if you feel more comfortable.
            Regards
            Rubén
            http://webmail.udec.cl

          • Ernesto Jardim
            Message 5 of 8 , Oct 2, 2002
              Hi

              The data are not discrete. We record numbers per hour, so it's a yield!

              Thanks

              EJ

              On Wed, 2002-10-02 at 15:27, Nicholas Lewin-Koh wrote:
              > Hi,
              > If the data are counts, i.e. integer numbers of fish and not tons, you
              > might want to try a discrete model such as a negative binomial or
              > Poisson. I have listed some references below, the top two have a more
              > Bayesian flavor.
              >
              > Nicholas
              >
              >
              > Alexander, N., Moyeed, R., Stander, J. (2000). Spatial modelling of
              > individual-level parasite counts using the negative binomial
              > distribution, Biostatistics, 2000, 1, 453-463.
              >
              > Diggle, P. J., Moyeed, R. A., Tawn, J. A. (1998). Model-based
              > geostatistics (with discussion), J. R. Statist. Soc. C, 47, 299-350.
              >
              > Gotway, C.A., Stroup, W.W. (1997) A Generalized Linear Model Approach
              > to Spatial Data Analysis and Prediction. Journal of Agricultural, Biological
              > and Environmental Statistics 2(2), pp. 157-178.
            • Ernesto Jardim
              Message 6 of 8 , Oct 3, 2002
                Hi

                I was investigating my data, and it is possible to identify areas of
                zeros at the outer limits of the distribution, so it may be possible
                to model the spatial behaviour in two steps.

                My guess is that I can simply reduce the kriging area to leave the zero
                area out.

                My question is how to model the boundaries. I'm sure this is a common
                problem, so I would be grateful for any references.

                Thanks and regards

                EJ

                On Wed, 2002-10-02 at 19:33, Donald E. Myers wrote:
                > Adding a constant to all values will shift the distribution but will not
                > change its shape. If the fraction of zeros is large then you will likely
                > not have a lognormal distribution and hence taking logs may not solve
                > the problem. If you intend to use kriging (after applying a log
                > transform) then you will have to worry about the bias correction when
                > you re-transform; the theoretical solution for that requires
                > multivariate lognormality (univariate is not sufficient).
                >
                > You might want to look at the spatial pattern of the zeros, i.e., is it
                > plausible to separate the data set spatially and have most of the zeros
                > in only one region?
                >
                > Donald E. Myers
                > http://www.u.arizona.edu/~donaldm
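Myers's point about bias on back-transformation can be illustrated with a small Python sketch. Under a lognormal model, exp(mean of the logs) estimates the median, not the mean; the standard correction for simple kriging of Y = log Z is Z* = exp(Y*_SK + sigma^2_SK / 2), which, as Myers notes, rests on multivariate lognormality. The simulation uses the no-neighbour case, where the simple kriging estimate is just the mean of Y and the kriging variance is its variance; the function name and the simulated data are assumptions for illustration.

```python
import math
import random
from statistics import fmean, pvariance

def lognormal_backtransform(y_est, y_var):
    """Back-transform a log-scale estimate: exp(y_est + y_var / 2).
    For simple kriging of Y = log Z, y_est is the SK estimate, y_var the SK variance."""
    return math.exp(y_est + y_var / 2)

rng = random.Random(42)
z = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]   # simulated lognormal data
y = [math.log(v) for v in z]

naive = math.exp(fmean(y))                                    # median-type estimate
corrected = lognormal_backtransform(fmean(y), pvariance(y))
print(f"arithmetic mean {fmean(z):.3f}, naive {naive:.3f}, corrected {corrected:.3f}")
```

With sigma = 1 the naive back-transform lands near 1.0 while the arithmetic mean is near exp(0.5), about 1.65, so skipping the correction underestimates the mean by roughly 40%. The correction is only as good as the lognormal assumption, which is doubtful when a large share of the data are zeros.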
              • Ruben Roa
                Message 7 of 8 , Oct 3, 2002
                  Intrinsic geostatistics, the theory based on random functions, does not
                  allow for 'boundary effects'. There should be no interaction between the
                  random variable and its field (in practice, no decrease of density near the
                  borders). On the other hand, transitive geostatistics, the theory based on
                  purposive randomization, does allow for border effects and estimation of
                  boundaries, which may fall anywhere between zero and non-zero observations.
                  The difference between intrinsic and transitive geostatistics is as basic
                  as the difference between model-unbiased and design-unbiased statistical
                  inference.
                  See
                  Petitgas. 1993. Geostatistics for fish stock assessments: a review and an
                  acoustic application. ICES J Mar Sci 50:285-298.
                  Petitgas and Lafont. 1997. EVA2: estimation variance. Version 2. A
                  geostatistical software on Windows 95 for the precision of fish stock
                  assessment surveys. ICES CM 1997/Y:22.
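To make the transitive approach concrete, here is a minimal Python sketch for a single transect with regular spacing delta (a simplification of the 2-D setting in Petitgas 1993; the function names are hypothetical). The experimental transitive covariogram g(k * delta) = delta * sum_i z_i * z_(i+k) is a plain sum, not an average over pairs, so zero samples beyond the edge of the stock contribute nothing and the boundary is part of the model rather than something to delimit beforehand. The abundance estimate is Q = delta * sum_i z_i.

```python
def transitive_covariogram(z, spacing, max_lag):
    """Experimental transitive covariogram on a regular 1-D transect:
    g(k * spacing) = spacing * sum_i z[i] * z[i + k], for k = 0..max_lag.
    It is a sum, not an average over pairs, so zeros outside the stock add nothing."""
    return [spacing * sum(z[i] * z[i + k] for i in range(len(z) - k))
            for k in range(max_lag + 1)]

def abundance(z, spacing):
    """Transitive estimate of total abundance along the transect: Q = spacing * sum(z)."""
    return spacing * sum(z)

# Hypothetical densities along one transect, with zeros at both ends
z = [0.0, 1.0, 2.0, 1.0, 0.0]
g = transitive_covariogram(z, spacing=2.0, max_lag=len(z) - 1)
print(g, abundance(z, 2.0))
```

Two checks on the sketch: summing delta * g(k * delta) over all lags, positive and negative, gives Q^2, and padding the transect with extra zeros at either end changes neither g nor Q.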

                  Rubén
                  http://webmail.udec.cl

                • Brian R Gray
                  Message 8 of 8 , Oct 3, 2002
                    While possibly outside the domain of your original question, I suspect that
                    you may be able to treat your yield data as integers by treating the
                    denominator (hours fished) as an offset variable (as a technicality, I'd
                    argue that, even after dividing by a constant, your data remain discrete,
                    just not integers). This common technique would appear to take you back
                    into the discrete world that Nicholas touched on. Brian
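Brian's offset idea in a minimal Python/NumPy sketch: model the raw counts y_i with a Poisson log-linear model log E[y_i] = log(hours_i) + x_i . beta, where log(hours) is the offset with its coefficient fixed at 1, so exp(beta) is on the catch-per-hour scale and zero hauls need no special treatment. The data and function name are hypothetical, the Newton-Raphson fit is written out by hand, and spatial correlation is ignored; the model-based geostatistics references Nicholas cites add a spatial random effect, and a negative binomial would be the usual alternative if the counts are overdispersed.

```python
import numpy as np

def fit_poisson_offset(counts, hours, X, n_iter=100, tol=1e-10):
    """Fit log E[count] = log(hours) + X @ beta by Newton-Raphson
    (equivalent to IRLS for the Poisson model with its canonical log link)."""
    y = np.asarray(counts, dtype=float)
    offset = np.log(np.asarray(hours, dtype=float))   # offset = log of fishing effort
    X = np.asarray(X, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)                 # expected counts
        score = X.T @ (y - mu)                         # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])                 # Fisher information
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical hauls: fish counted and hours fished
counts = [0, 3, 5, 2]
hours = [0.5, 1.0, 2.0, 1.5]
X = np.ones((4, 1))                                    # intercept-only design
beta = fit_poisson_offset(counts, hours, X)
# For an intercept-only model, exp(beta[0]) is sum(counts) / sum(hours) = 10 / 5
print(np.exp(beta[0]))
```

If only rates and tow durations were recorded, the counts can be recovered as rate times hours before fitting.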

                    ****************************************************************
                    Brian Gray
                    USGS Upper Midwest Environmental Sciences Center
                    575 Lester Avenue, Onalaska, WI 54650
                    ph 608-783-7550 ext 19, FAX 608-783-8058
                    brgray@...
                    *****************************************************************


