[ai-geostats] On spatially well separated binary data

  • Tib
    Message 1 of 4 , Feb 21, 2005
      Greetings,

      This is a question that has troubled me for a long time.

      I am doing statistical analysis on species distribution data. The
      variable of interest is binary (i.e., presence/absence at sampled
      locations). However, the sampling scheme is strongly influenced by the
      accessibility of locations. For example, many areas have no
      observations at all, some areas have dense observations of absence,
      while other areas have dense observations of presence. Because of this
      "non-statistically designed" sampling scheme, and because the
      species tends to form colonies, the data contain clusters of
      presences and clusters of absences in different locations. The two are
      easy to tell apart visually in a two-dimensional plot. As a consequence,
      coordinate variables added to my logistic regression model are
      highly significant. Quadratic terms in the coordinates improve the
      model even further, to very good discrimination (area under the ROC
      curve more than 90%). However, this obviously makes it difficult to
      interpret the linear and quadratic coordinate terms.
      (This might make sense for large-scale data, where latitude reflects
      temperature and longitude reflects distance to the sea, but it is
      meaningless for small- or medium-scale data.)
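
To make the setup concrete, here is a minimal sketch of the kind of model described above: a logistic regression on coordinate terms only, scored by the area under the ROC curve. Everything in it is an assumption made for illustration. The data are synthetic (one presence cluster lying between two absence clusters), the helper names are hypothetical, and the fit is a plain gradient-ascent loop rather than whatever software was actually used.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, labels, lr=0.5, n_iter=500):
    """Batch gradient ascent on the logistic log-likelihood (no regularisation)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for row, t in zip(X, labels):
            err = t - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(p):
                grad[j] += err * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, X):
    return [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]

def auc(scores, labels):
    """AUC as P(score of a presence > score of an absence), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def linear_terms(x, y):
    return [1.0, x, y]

def quadratic_terms(x, y):
    # Full second-order surface, including the x*y interaction.
    return [1.0, x, y, x * x, x * y, y * y]

# Synthetic, centred coordinates (an assumption, not the poster's data):
# presences clustered in the middle, absences clustered on either side.
random.seed(0)

def cluster(cx, cy, n, label):
    return [(random.gauss(cx, 0.08), random.gauss(cy, 0.08), label) for _ in range(n)]

data = cluster(0.0, 0.0, 60, 1) + cluster(-0.4, -0.4, 40, 0) + cluster(0.4, 0.35, 40, 0)
labels = [t for _, _, t in data]

def fitted_auc(terms):
    X = [terms(x, y) for x, y, _ in data]
    return auc(predict(fit_logistic(X, labels), X), labels)

auc_lin = fitted_auc(linear_terms)
auc_quad = fitted_auc(quadratic_terms)
print("AUC, linear coordinate terms:   ", round(auc_lin, 3))
print("AUC, quadratic coordinate terms:", round(auc_quad, 3))
```

On clustered data like this, the quadratic surface discriminates well simply because it can enclose the presence cluster. The high AUC then says more about where the samples were taken than about the species' ecology, which is the interpretation problem raised above.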

      So my question is, is there any method to "pre-process" these data, or
      is logistic regression a suitable approach for them? Any other
      approaches? I doubt the autologistic approach (Besag 1974, Augustin
      1993, Huffer 1997, Hoeting 2000) will help much because there are so
      many unsampled areas (more than 70% if I discretize them into a
      regular lattice system).
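
The "more than 70% unsampled" figure can be checked for any candidate lattice with a few lines of code. The sketch below is only an illustration: the function name, the bounding box and the four sample points are all hypothetical, and it assumes every point lies inside the bounding box.

```python
def unsampled_fraction(points, xmin, xmax, ymin, ymax, nx, ny):
    """Fraction of cells in an nx-by-ny lattice over the bounding box with no sample."""
    occupied = set()
    for x, y in points:
        # Map each coordinate to a cell index; clamp so that points on the
        # upper edge of the box fall into the last cell.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        occupied.add((i, j))
    return 1.0 - len(occupied) / (nx * ny)

# Hypothetical sample locations on the unit square, 4 x 4 lattice.
points = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.5, 0.5)]
frac = unsampled_fraction(points, 0.0, 1.0, 0.0, 1.0, 4, 4)
print(frac)  # 3 of 16 cells are occupied, so 13/16 are unsampled
```

The fraction depends strongly on the cell size chosen, so it is worth computing for several lattice resolutions before judging whether a lattice-based model is workable.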

      Any suggestions will be highly appreciated!

      tib
    • Christof Bigler
      Message 2 of 4 , Feb 21, 2005
        I don't know if ecological-niche factor analysis is suitable for your
        kind of analysis, but this multivariate method is used when presence
        data of a species are available and absence data are not available or
        unreliable:

        Hirzel, A. H., J. Hausser, D. Chessel, and N. Perrin. 2002.
        Ecological-niche factor analysis: how to compute habitat-suitability
        maps without absence data? Ecology 83:2027-2036.

        There's software to do ENFA:
        http://www2.unil.ch/biomapper

        or e.g. in R:
        http://www.maths.lth.se/help/R/.R/library/adehabitat/html/enfa.html

        Christof

      • Rajive Ganguli
        Message 3 of 4 , Feb 21, 2005
          Tib,

          I forwarded the posting to a colleague of mine, Falk Huetman
          (fffh@...). Here is his response (you can obtain the paper he
          refers to from him):

          Rajive
          -----
          Thanks, that's a common issue with museum data (as well as with most
          data collected in wilderness and by humans), and is partly addressed
          by using Ecological Niche Factor Analysis (see the BIOMAPPER website
          by A. Hirzel; 'presence only' data), as well as with correction
          surfaces and weighting, e.g. when considering the distribution of
          samples across the range of predictors. Researchers in New Zealand
          and Switzerland did that (see GAM modeling on WWW).

          For some biological data in Israel, for instance, it was shown that
          a road bias would be very small.

          See attached a paper that deals with these issues a little bit, too.

          One can bring in the lat/lon predictors, but only if you don't want to
          generalize to other areas. Usually, this is not done by biologists
          because they are after the environmental predictors alone,
          e.g. describing pres/abs from geographical space into the biological
          space (see papers by Townsend Peterson).

          Let me know and we go from there; kind regards

          F.


          -----

          --
          Rajive
        • Edzer J. Pebesma
          Message 4 of 4 , Feb 22, 2005
            Tib wrote:

            >So my question is, is there any method to "pre-process" these data, or
            >is logistic regression a suitable approach for them? Any other
            >approaches? I doubt the autologistic approach (Besag 1974, Augustin
            >1993, Huffer 1997, Hoeting 2000) will help much because there are so
            >many unsampled areas (more than 70% if I discretize them into a
            >regular lattice system).
            >
            >Any suggestions will be highly appreciated!
            >
            >tib
            >
            >
            Tib, if you consider including x and y coordinates in your
            trend model, make sure that you add their interaction;
            otherwise the surface you predict is not rotation
            invariant. An alternative is to use a two-dimensional
            (smoothing) spline in x and y. An alternative closely
            related to the spline is to use kriging for spatial prediction.
            --
            Edzer
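
The rotation-invariance point can be made explicit with a short derivation. It assumes the trend in question is the second-order one from the original post (linear plus squared coordinate terms); a purely first-order trend in x and y is already closed under rotation. The symbols (rotated coordinates u and v, angle theta, coefficients beta) are notation introduced here, not taken from the thread.

```latex
% Rotate the coordinate axes by an angle \theta:
\[
  x = u\cos\theta - v\sin\theta, \qquad y = u\sin\theta + v\cos\theta .
\]
% Quadratic part of a trend surface fitted WITHOUT an interaction term:
\[
  \beta_3 x^2 + \beta_4 y^2
  = (\beta_3\cos^2\theta + \beta_4\sin^2\theta)\,u^2
  + (\beta_3\sin^2\theta + \beta_4\cos^2\theta)\,v^2
  + 2(\beta_4 - \beta_3)\sin\theta\cos\theta\;uv .
\]
```

Unless beta_3 = beta_4 or the rotation is a multiple of 90 degrees, a uv cross term appears. So the family of surfaces without the interaction term is not closed under rotation, and the fitted surface then depends on the arbitrary orientation of the coordinate axes. Including the xy term makes the family the full set of second-order polynomials, which a rotation maps onto itself.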