
Re: Standardized Sampling Methodologies and a Common Database

  • Sam Droege
    Message 1 of 28, Aug 15, 2008
      OK, I can see Matt's original message if I look on the listserv's web
      site...it was somehow corrupted by my email browser originally...

      For future reference all these messages are archived at:

      http://tech.groups.yahoo.com/group/beemonitoring/

      I believe that anyone can see these.

      So, this will be another important set of topics at any meeting.

      1. Standardized vs. Opportunistic samples or surveys

      2. Databasing and datasharing.

      Regarding topic one: both general approaches are very useful in their
      places, and there is no reason not to develop systems for both.

      A survey or set of surveys can be established (likely at several
      geographic scales) that is systematic, standardized, and repeatable,
      and that will provide the most statistically rigorous means of looking
      at change. Another, complementary system can be established that
      compiles unstandardized studies, data collections, museum information,
      general collecting, etc.

      Regarding topic two: sharing data and databasing are often big
      bottlenecks in collaborative projects. I have seen a number of ways
      for this NOT to work in the past, but only 3 that seem to work well.

      1. One agency or group pays for, collects, analyzes, databases ALL
      the data (relatively unrealistic in this case). North American
      Waterfowl Surveys or the Breeding Bird Survey are good examples of
      these.

      2. One group maintains a data entry web site that everyone shares,
      and produces reports and datasets of equal value to the
      stakeholders. The North American Amphibian Monitoring program and
      FrogwatchUSA are good examples.

      3. Everyone does their own thing and keeps data in whatever
      database/spreadsheet they like and periodically contributes a text
      file with column headers to a central repository. Each database is
      owned by the contributor and is maintained (and included or excluded)
      by that group. Another body provides a service for extraction or
      display of these datasets...Discoverlife is a good example of this.
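
As a rough illustration of option 3, here is a minimal Python sketch of how a central repository might combine header-bearing text files from several contributors. The function name, the tab delimiter, the contributor labels, and the sample rows are all assumptions made for the example; this is not how Discoverlife or any other existing service is implemented.

```python
import csv
import io

def merge_contributions(files):
    """Combine header-bearing text files into one list of records.

    `files` maps a contributor label to an open text file whose first row
    holds the column headers (tab-delimited is assumed here). Column sets
    may differ between contributors; each record keeps only the columns
    its owner supplied.
    """
    merged = []
    for contributor, handle in files.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            record = dict(row)
            # Tag every record so it stays attributable to its owner and
            # can be included or excluded per contributor.
            record["contributor"] = contributor
            merged.append(record)
    return merged

# Two hypothetical contributions with different column headers.
lab_a = io.StringIO("species\tstate\nBombus impatiens\tMD\n")
lab_b = io.StringIO("species\tmethod\tdate\nAndrena erigeniae\tbowl\t2008-04-20\n")

records = merge_contributions({"lab_a": lab_a, "lab_b": lab_b})
for record in records:
    print(record)
```

Because each record carries its contributor label, a group's data can be dropped or refreshed wholesale when it sends a new file.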

      sam






      --- In beemonitoring@yahoogroups.com, "Matthew Sarver" <mjsarver@...>
      wrote:
      >
      > All -
      >
      > Clearly, we each have different opinions on this topic, biased by our
      > own interests and specializations. Such is the challenge of
      > collaborative work in the age of academic globalization! The common
      > ground, as I read it, is threefold:
      >
      > (1) A desire for some level of standardization in methods of
      > inventorying bees for the specific purpose of monitoring long-term
      > population and distributional trends (Sam's original point, and the
      > goal of his work, if I understand it correctly)
      >
      > (2) A way to incorporate and make available the massive amount of
      > non-standardized data already available in museums, and that will
      > continue to be generated by taxonomists and ecological field workers.
      > This data, as John points out, is of tremendous importance in natural
      > history, taxonomy, and biogeography, and can add to the standardized
      > data in (1), and may supersede it in many cases of rare or
      > infrequently collected species.
      >
      > (3) Following from the first two points, and as has been alluded to by
      > John and others, the need for a collaborative and accessible
      > "clearinghouse" for the resultant data from both standardized and
      > non-standardized origins.
      >
      > As a bit of an outsider (I often find myself walking a tightrope
      > between academia, government, non-profits, etc) perhaps I can offer a
      > start.
      >
      > It seems to me that the standardization of protocols is only useful if
      > that data ends up in a common database for analysis and sharing. If we
      > are to build a common database for bee records, it would be foolish
      > not to include all of the records from non-standardized methodology,
      > including museum specimens, expert-identified photographs, etc.
      >
      > While the georeferenced specimen mapping tools in the Discover Life
      > guides are a good start, I would argue that an expanded version of
      > that database, with a much fuller feature set and search functions,
      > and including more fields, would be highly desirable. This North
      > American Bee Database (or whatever it might be called) could become
      > the standard location for storage of all bee specimen and photo
      > records for the continent, and could be made accessible on the web.
      >
      > Issues of standardization could be dealt with by populating, for each
      > import of records, a selection of fields indicating the type of
      > record, the collection methods used, etc. This would hopefully not be
      > as hard as it might seem. Most bee specimens could be assigned to one
      > of the following collection methods: malaise, net/hand, bowl, vane
      > trap, photograph only, or unknown method (for museum specimens).
      > Another field could ask for the specific protocol used. Still more
      > linked fields would hold floral association, habitat data, etc.
      >
      > In this way, all relevant data could be compiled in a centralized
      > clearing house. Researchers interested in monitoring trends could
      > simply filter the database and view only specimens from standardized
      > methods, while those interested in floral associations or
      > distributions could make use of the complete data set.
      >
      > Several challenges come to mind here:
      >
      > (1) Funding / Personnel - such a project would require full time
      > attention from at least a few people building and managing the
      > database, in addition to much time from taxonomists (who, as John
      > points out, are already overextended).
      > (2) Academic intellectual property - Regrettably, this is a major
      > issue when dealing with such an endeavor, but that is the nature of
      > our field, and everyone should get due credit for their contributions.
      > Perhaps this could be overcome by a lock that contributors could place
      > on data of their own specimens. This "lock" would allow the data to
      > show up in certain contexts (e.g. state species list queries), but not
      > in full detail until any relevant publications were completed.
      > (3) Data accuracy - a database such as this would require much effort
      > from competent individuals to ensure the accuracy of determinations,
      > etc. Including det. codes and dates in the database would be a minimal
      > step to help ensure the validity of records.
      > (4) Accessibility. Difficult decisions would need to be made about use
      > of the contributed data. I am in the open data-sharing camp, but many
      > are not, and I understand the reasons for that. If full funding could
      > be found to support the efforts of staff and taxonomists, it would
      > compel open access to the compiled data.
      >
      > I feel that this is the direction that we should be going in this
      > information age. We should all strive to overcome our own
      > self-interests and work toward a true collaborative effort!
      >
      > Sam, I apologize if I have hijacked your original intention, but it
      > seems to me that standardized methodologies are closely intertwined
      > with this idea.
      >
      >
      > My two cents
      > Matt Sarver
      >
    • Sam Droege
      Message 2 of 28, Aug 15, 2008
        Oligolectic species would be in one of the groups more likely to be
        missed...depending on the survey technique.

        Males and females may sometimes nectar off their host which would
        increase their probabilities of capture. Pantrap, malaise and other
        general traps often pick up oligolectic species, but there are many
        instances where they seem to be poor vehicles for capturing this group.

        This may be an instance where you would have to develop host-based
        special surveys, decide that general collecting would be sufficient, or
        decide that some groups simply will not be "monitored."

        I think that will be another topic area when surveys are being
        developed...that is, which species will be adequately covered, and
        which will not.

        sam

        --- In beemonitoring@yahoogroups.com, "Michael Wilson" <mwilso14@...>
        wrote:
        >
        > Just trying to understand,
        > To determine change in the health of oligolectic species, wouldn't
        > one need to follow plant communities that often move
        > dynamically across the landscape? How would this
        > be done with static locations?
        > Thanks,
        > Michael Wilson
        >
      • John S. Ascher
        Message 3 of 28, Aug 15, 2008
          This sounds good Sam. I have a few minor additions as follows:

          " 1. Standardized vs. Opportunistic samples or surveys"

          I'm not sure that these can be broken down so simply. My sense is that a
          wide array of sampling techniques are appropriate depending on the
          questions of interest and the circumstances. Much "Opportunistic" or
          taxonomically-focused sampling can be standardized to some degree, but
          using methods appropriate to descriptive and historical science (e.g.,
          historical biogeography) and therefore quite different from those applied
          to experimental studies such as those designed by statistically savvy bee
          ecologists.

          "3. Everyone does their own thing and keeps data in whatever
          database/spreadsheet they like and periodically contributes a text file
          with column headers to a central repository. Each database is owned by the
          contributor and is maintained (and included or excluded) by that group.
          Another body provides a service for extraction or
          display of these datasets...Discoverlife is a good example of this."

          A useful model, already implemented at Discoverlife, is for small
          contributors and those lacking computer resources to periodically send
          static data (e.g., from a spreadsheet) whereas larger and/or more
          computer savvy contributors can set up dynamic, continuously updating
          links (e.g. to a relational database) between their servers and the
          community resource.

          Many groups have already been developing useful standards for sharing
          pollinator data and we can usefully consult these and suggest that people
          adopt them. If people nonetheless persist in doing their own thing for
          whatever reason, much of their data may still be rendered useful to all
          if a clever computer scientist can extract it.

          It is extremely important to note that there are already multiple linked
          central repositories in place. All data sent to one central repository can
          and should be shared dynamically with other collaborating repositories.
          Local repositories can enhance centralized (global) data by providing
          additional more particular services (e.g., customizable dynamic local maps
          and potentially analyses based on these) and by sending corrections
          discovered locally back to the general repositories.

          As a specific example, note that bee specimen records sent to GBIF can
          also be sent to other centralized data sources. This map of Bombus
          includes 135,000+ GBIF records and many others, all error-checked by the
          Global Mapper:

          http://www.discoverlife.org/mp/20m?kind=Bombus

          This example shows how the community can and should take advantage of
          multiple central repositories, as these have different strengths and can
          usefully link to each other to collectively display and error-check data.

          When planning this or any other project we should try to take full
          advantage of existing tools. Of these, web-based collaborative tools are
          already very powerful and are being improved every day.

          Images in particular can have a very wide array of uses once copyright
          issues can be addressed.

          In summary I suggest that we as a community assemble globally relevant
          data, which can of course easily be repackaged for local use, and
          establish dynamic links among central repositories (plural) and between
          these and local repositories.

          John

          P.S. On the subject of sampling oligolectic bees, these are not
          efficiently sampled using single-site/ecological protocols designed to
          obtain an unbiased cross-section of the community from an unbiased sample
          of floral resources. However these can be found very effectively using
          taxonomically-oriented methods, such as targeted collecting at sites where
          the particular taxa of interest have been recorded historically or at
          biogeographically similar sites. In this case sampling bias in favor of
          the oligolectic species of interest is a very good thing.







          --
          John S. Ascher, Ph.D.
          Bee Database Project Manager
          Division of Invertebrate Zoology
          American Museum of Natural History
          Central Park West @ 79th St.
          New York, NY 10024-5192
          work phone: 212-496-3447
          mobile phone: 917-407-0378
        • Gretchen LeBuhn
          Message 4 of 28, Aug 15, 2008
            All-

            While I was at ESA, I spoke with Matt Jones, the bioinformatics guru at NCEAS, about how to archive bee data sets that used a common protocol. NCEAS has been working toward becoming a clearinghouse for exactly these types of data and has particular expertise in the issues of sharing scientific data that Matthew has outlined below. They archive all of the LTER and NRS datasets, among many others.

            Gretchen

            On Fri, Aug 15, 2008 at 11:36 AM, Matthew Sarver <mjsarver@...> wrote:

            All - 
             
            Clearly, we each have different opinions on this topic, biased by our own interests and specializations.  Such is the challenge of collaborative work in the age of academic globalization!  The common ground, as I read it, is threefold:
             
            (1) A desire for some level of standardization in methods of inventorying bees for the specific purpose of monitoring long-term population and distributional trends (Sam's original point, and the goal of his work, if I understand it correctly)
             
            (2) A way to incorporate and make available the massive amount of non-standardized data already available in museums, and that will continue to be generated by taxonomists and ecological field workers.  This data, as John points out, is of tremendous importance in natural history, taxonomy, and biogeography, and can add to the standardized data in (1), and may supersede it in many cases of rare or infrequently collected species.
             
            (3) Following from the first two points, and as has been alluded to by John and others, the need for a collaborative and accessible "clearinghouse" for the resultant data from both standardized and non-standardized origins
             
            As a bit of an outsider (I often find myself walking a tightrope between academia, government, non-profits, etc) perhaps I can offer a start.
             
            It seems to me that the standardization of protocols is only useful if that data ends up in a common database for analysis and sharing.  If we are to build a common database for bee records, it would be foolish not to include all of the records from non-standardized methodology, including museum specimens, expert-identified photographs, etc. 
             
            While the georeferenced specimen mapping tools in the Discover Life guides are a good start, I would argue that an expanded version of that database, with a much fuller feature set and search functions, and including more fields, would be highly desirable.  This North American Bee Database (or whatever it might be called) could become the standard location for storage of all bee specimen and photo records for the continent, and could be made accessible on the web.
             
            Issues of standardization could be dealt with by populating, for each import of records, a selection of fields indicating the type of record, the collection methods used, etc.  This would hopefully not be as hard as it might seem.  Most bee specimens could be assigned to one of the following collection methods: malaise, net/hand, bowl, vane trap, photograph only, or unknown method (for museum specimens).  Another field could ask for the specific protocol used.  Still more linked fields would hold floral association, habitat data, etc
             
            In this way, all relevant data could be compiled in a centralized clearing house.  Researchers interested in monitoring trends could simply filter the database and view only specimens from standardized methods, while those interested in floral associations or distributions could make use of the complete data set.
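
A minimal Python sketch of the field scheme described above may make it more concrete. The class and function names (`Method`, `BeeRecord`, `standardized_only`), the protocol label, and the sample records are hypothetical and chosen only for illustration; treating "has a named protocol" as the test for a standardized record is likewise an assumption, not an agreed rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Method(Enum):
    """Collection methods listed in the message above."""
    MALAISE = "malaise"
    NET_HAND = "net/hand"
    BOWL = "bowl"
    VANE_TRAP = "vane trap"
    PHOTOGRAPH = "photograph only"
    UNKNOWN = "unknown"  # e.g. older museum specimens

@dataclass
class BeeRecord:
    species: str
    method: Method
    protocol: Optional[str] = None     # specific protocol used, if any
    floral_host: Optional[str] = None  # linked floral association, if known

def standardized_only(records):
    """Keep only records collected under a named protocol (assumed rule)."""
    return [r for r in records if r.protocol is not None]

records = [
    BeeRecord("Bombus impatiens", Method.BOWL, protocol="example-bowl-transect"),
    BeeRecord("Andrena erigeniae", Method.NET_HAND, floral_host="Claytonia virginica"),
    BeeRecord("Osmia lignaria", Method.UNKNOWN),
]

# Trend monitoring would use the filtered subset ...
trend_data = standardized_only(records)
# ... while floral-association work can draw on every record with a host.
floral_data = [r for r in records if r.floral_host]

print(len(trend_data), len(floral_data))
```

The point of the sketch is that one table can serve both audiences: the filter is a query-time choice, not a reason to keep separate databases.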
             
            Several challenges come to mind here:
             
            (1) Funding / Personnel - such a project would require full time attention from at least a few people building and managing the database, in addition to much time from taxonomists (who, as John points out, are already overextended).
            (2) Academic intellectual property - Regrettably, this is a major issue when dealing with such an endeavor, but that is the nature of our field, and everyone should get due credit for their contributions.  Perhaps this could be overcome by a lock that contributors could place on data of their own specimens.  This "lock" would allow the data to show up in certain contexts (e.g. state species list queries), but not in full detail until any relevant publications were completed.
            (3) Data accuracy - a database such as this would require much effort from competent individuals to ensure the accuracy of determinations, etc.  Including det. codes and dates in the database would be a minimal step to help ensure the validity of records.
            (4) Accessibility.  Difficult decisions would need to be made about use of the contributed data.  I am in the open data-sharing camp, but many are not, and I understand the reasons for that.  If full funding could be found to support the efforts of staff and taxonomists, it would compel open access to the compiled data. 
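
The "lock" idea in point (2) could work roughly as follows. This is only a sketch under stated assumptions: the `Specimen` fields, the `locked` flag, and both function names are invented for the example, and which fields count as "full detail" is a policy decision the thread has not settled.

```python
from dataclasses import dataclass

@dataclass
class Specimen:
    species: str
    state: str
    locality: str
    collector: str
    locked: bool = False  # set by the contributor until publication

def state_species_list(specimens, state):
    """Summary query: locked records still contribute their species name."""
    return sorted({s.species for s in specimens if s.state == state})

def public_detail(specimen):
    """Return full detail for open records, summary fields only if locked."""
    if specimen.locked:
        return {"species": specimen.species, "state": specimen.state}
    return {
        "species": specimen.species,
        "state": specimen.state,
        "locality": specimen.locality,
        "collector": specimen.collector,
    }

# Hypothetical records: one locked pending publication, one open.
specimens = [
    Specimen("Bombus affinis", "MD", "hypothetical site A", "collector 1", locked=True),
    Specimen("Bombus impatiens", "MD", "hypothetical site B", "collector 2"),
]

print(state_species_list(specimens, "MD"))
print(public_detail(specimens[0]))
```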
             
            I feel that this is the direction that we should be going in this information age.  We should all strive to overcome our own self-interests and work toward a true collaborative effort! 
             
            Sam, I apologize if I have hijacked your original intention, but it seems to me that standardized methodologies are closely intertwined with this idea.
             
             
            My two cents
            Matt Sarver 



            --
            Gretchen LeBuhn
          • Matthew Sarver
            Message 5 of 28, Aug 15, 2008
              John -
               
              "It is extremely important to note that there are already multiple linked
              central repositories in place."
               
              Thanks for pointing this out.  I am obviously not as well-versed in bioinformatics databases as I could be.  I did not mean to suggest reinventing the wheel on this, but wasn't sure how many of these existing databases are flexible enough in their data input to allow us to work with the specific fields that the bee community would find useful / necessary.  Generating a map for a species is one thing, but a fully searchable database that allows one to find flower records, flight periods, etc. for a certain part of the world or a certain species is another.  Right now, the Discover Life specimen view includes a number of very useful data fields, but there are certainly many more that might be of interest, particularly in terms of habitat and floral associations.  As far as I know, there is no easy way to search the fields in that database, other than by viewing a specimen record from the mapper.  Likewise, GBIF is primarily biogeographical data.  I was thinking about the creation of a database web portal with a design and front end that would be specifically geared toward pollinator records, and the associated ecological data that might not fit the mold of available broader repositories.
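
To show the kind of search meant here, the following Python sketch filters a handful of made-up records by species and region, then reports the flight period and flower records for the hits. The record layout, the function names, and every data value are assumptions for illustration; no existing portal is being described.

```python
from datetime import date

# Hypothetical specimen records with ecological fields attached.
records = [
    {"species": "Andrena erigeniae", "region": "MD",
     "date": date(2008, 4, 2), "flower": "Claytonia virginica"},
    {"species": "Andrena erigeniae", "region": "MD",
     "date": date(2008, 5, 1), "flower": "Claytonia virginica"},
    {"species": "Bombus impatiens", "region": "MD",
     "date": date(2008, 7, 15), "flower": "Monarda fistulosa"},
    {"species": "Bombus impatiens", "region": "NY",
     "date": date(2008, 8, 3), "flower": None},
]

def search(records, species=None, region=None):
    """Return records matching every criterion that was supplied."""
    return [
        r for r in records
        if (species is None or r["species"] == species)
        and (region is None or r["region"] == region)
    ]

def flight_period(hits):
    """Earliest and latest collection dates among the hits, or None."""
    dates = [r["date"] for r in hits]
    return (min(dates), max(dates)) if dates else None

def flower_records(hits):
    """Distinct floral hosts recorded among the hits."""
    return sorted({r["flower"] for r in hits if r["flower"]})

hits = search(records, species="Andrena erigeniae", region="MD")
print(flight_period(hits))
print(flower_records(hits))
```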
               
              Such a customized portal could also be expanded to include an eBird- or BugGuide-like citizen science component, where photos could be posted by amateurs.  I agree that BugGuide already serves that purpose admirably, but its structure does not encourage the entry of scientifically useful data along with submitted records in the way that a custom-tailored user interface like eBird does.  The already useful information generated by BugGuide could be made even more useful by asking users for more information about their sighting.
               
              "Local repositories can enhance centralized (global) data by providing
              additional more particular services (e.g., customizable dynamic local maps
              and potentially analyses based on these) "
               
              I guess this is more along the lines of what I am thinking. But "local" in the sense of specificity of purpose or usage, rather than geography. Thoughts?
               
              Matt
               
               
            • Dan Kjar
              Message 6 of 28 , Aug 15, 2008
                As a database person I have to just say I am surprised 'savvy' and
                'relational database' ended up in the same sentence...

                ;)


                Remember that old saying "you can choose two of the following:
                quality, quantity, and currency. You cannot have all three."

                Dan
              • Dan Kjar
                Message 7 of 28 , Aug 15, 2008
                  Discoverlife's fields are whatever the submitter wants them to be.
                  The only thing required is a taxonomic name and hopefully a location
                  in whatever format you like.

                • John S. Ascher
                  Message 8 of 28 , Aug 15, 2008
                    Matt -

                    Thanks for another thoughtful response.

                    > I did not mean to suggest reinventing the wheel on this, but wasn't
                    > sure how many of these existing databases are flexible enough in their
                    > data input to allow us to work with the specific fields that the bee
                    > community would find useful / necessary.

                    As Dan already noted Discoverlife can accommodate virtually any field as
                    long as data are linked directly to a species name. Only fields with data
                    appear when you pull up specimen records; blank fields are not displayed.

                    > Generating a map for a species is one thing, but a fully searchable
                    > database that allows one to find flower records, flight periods, etc.
                    > for a certain part of the world or a certain species is another.

                    There are web portals being designed specifically to fulfill precisely
                    these needs, e.g.:

                    http://libraryportals.com/PCDL

                    Stuart Roberts in the UK is developing an excellent database optimized to
                    record these data.

                    > Right now, the Discover Life specimen view includes a number of very
                    > useful data fields, but there are certainly many more that might be of
                    > interest, particularly in terms of habitat and floral associations.

                    These can already be mapped. These and other fields you can dream up can
                    certainly be displayed. Sam even has a field where he notes brand of
                    soap!

                    > As far as I know, there is no easy way to search the fields in that
                    > database, other than by viewing a specimen record from the mapper.

                    You are correct. The search function needs improvement.

                    > Likewise, GBIF is primarily biogeographical data. I was thinking about
                    > the creation of a database web portal with a design and front end that
                    > would be specifically geared toward pollinator records, and the
                    > associated ecological data that might not fit the mold of available
                    > broader repositories.

                    As noted above this may already exist:

                    http://libraryportals.com/PCDL

                    > Such a customized portal could also be expanded to include an EBird or
                    > Bugguide-like citizen science component, where photos could be posted
                    > by amateurs. I agree that bugguide already serves that purpose
                    > admirably, but its structure does not encourage the entry of
                    > scientifically useful data along with submitted records in the way
                    > that a custom-tailored user interface like Ebird does. The already
                    > useful information generated by bugguide could be made even more
                    > useful by asking users for more information about their sighting.

                    I would advocate an all-of-the-above solution, i.e., improving Bugguide
                    itself, improving relevant tools at other sites such as Discoverlife, and
                    establishing useful links between sites with complementary emphases.

                    > "Local repositories can enhance centralized (global) data by providing
                    > additional more particular services (e.g., customizable dynamic local
                    > maps and potentially analyses based on these) "
                    >
                    > I guess this is more along the lines of what I am thinking. But "local"
                    > in the sense of specificity of purpose or usage, rather than geography.
                    > Thoughts?

                    I meant both.

                    In terms of geography, one example of a local site would be a global or
                    regional ID guide customized for a specific site by filtering out
                    extralimital taxa.

                    For example, here is the eastern Bee Genera guide customized for the
                    Fingerlakes region of NY:

                    http://www.discoverlife.org/mp/20q?guide=Bee_genera&cl=US/NY/Fingerlakes

                    In terms of specificity of purpose, a local site could highlight and
                    extend a subset of data, e.g., pollinator-plant interactions, derived by
                    querying one or more central repositories.

                    John




                    --
                    John S. Ascher, Ph.D.
                    Bee Database Project Manager
                    Division of Invertebrate Zoology
                    American Museum of Natural History
                    Central Park West @ 79th St.
                    New York, NY 10024-5192
                    work phone: 212-496-3447
                    mobile phone: 917-407-0378
                  • Matthew Sarver
                    Message 9 of 28 , Aug 15, 2008
                      Great!  I didn't know discoverlife was set up that way until Dan pointed it out.  A query interface for this database now seems like an obvious starting point.  As for PCDL - I thought they were only tackling literature, at least for now.  Do they have plans to incorporate specimen data as well?  I've certainly used it for plant/pollinator interactions a number of times already. 
                       
                      The "citizen science" thing for insects has great potential - as long as those who can ID the pics can keep up!  An integration of bugguide and discover life would be really cool!
                       
                      Matt



                    • Sam Droege
                      Message 10 of 28 , Aug 16, 2008
                        I wasn't aware of some of those new, more flexible database features.
                        It will be good to have representation at the meeting from that
                        group. While one could argue that you could develop those features
                        later, I think that, more and more, database functions will help
                        guide the development of what gets monitored. It's also clear that
                        internet functions can be built directly into monitoring schemes
                        rather than having paper surveys that get entered later.

                        The possibilities of expanding Bugguide.net are intriguing. It seems
                        particularly good at detecting the spread of introduced
                        species...and the digital libraries that are produced are going to
                        become invaluable.

                        sam


                      • Dan Kjar
                        Message 11 of 28 , Aug 16, 2008
                          Here is a quick breakdown of relational vs. flat databases.

                          Relational databases link tables to tables and those links allow you
                          to do some very powerful queries. However, as the tables grow the
                          queries slow and as the relationships become more complex the database
                          gets kludgy to deal with and nearly incomprehensible to people that
                          did not design it.
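
A minimal sketch of the "tables linked to tables" idea, using Python's built-in sqlite3 module. The table names, columns, and the three specimen rows are invented for illustration and are not taken from any of the databases discussed in this thread.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with two linked tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (
            taxon_id INTEGER PRIMARY KEY,
            genus    TEXT,
            species  TEXT
        );
        CREATE TABLE specimen (
            specimen_id INTEGER PRIMARY KEY,
            taxon_id    INTEGER REFERENCES taxon(taxon_id),
            locality    TEXT,
            host_plant  TEXT
        );
        -- Made-up example rows, for illustration only.
        INSERT INTO taxon VALUES (1, 'Bombus', 'impatiens'), (2, 'Andrena', 'nasonii');
        INSERT INTO specimen VALUES
            (10, 1, 'Ithaca, NY', 'Solidago'),
            (11, 1, 'Newark, DE', 'Monarda'),
            (12, 2, 'Ithaca, NY', 'Salix');
    """)
    return conn


def host_plants_by_genus(conn, genus):
    """Follow the specimen -> taxon link to list host plants recorded for a genus."""
    rows = conn.execute(
        """
        SELECT s.host_plant
        FROM specimen AS s
        JOIN taxon AS t ON t.taxon_id = s.taxon_id
        WHERE t.genus = ?
        ORDER BY s.host_plant
        """,
        (genus,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(host_plants_by_genus(conn, "Bombus"))
```

The join is where the power comes from: the genus lives in one table and the host plant in another, yet one query answers a question that spans both. It is also where the cost comes from, since every extra linked table adds another join for the database and for the person reading the schema.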

                          Flat file databases are always meaningful to any human that can read
                          text. Flat files do not allow you to do some of the more
                          whiz-bang pull it out of your *** searches that relational databases
                          allow. However, if you know what people are going to search
                          (genus/species/whatever), the way you make flat file databases scream
                          is by indexing the information and holding the indexes in hash tables
                          (at the file system/OS/Perl/C++ level). This is how Pick can put
                          300,000 points on a map in just a few seconds. His database currently
                          has over 1.4 million records and when he gets all of the GBIF info it
                          will be over 15 million records (if I remember correctly). The
                          difficult part here is that you need to predetermine what queries the
                          user will be doing. The big search engines all work along the same lines.
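
As a rough sketch of the indexing idea (not a description of how Discover Life is actually implemented), the following Python holds a hash table, here a dict, that maps each genus to the byte offsets of its lines in a tab-delimited flat file. The file layout, the function names, and the three records are assumptions made up for the example.

```python
import io
from collections import defaultdict

# Hypothetical flat file: one record per line, tab-delimited
# (genus, species, locality). The records are invented.
FLAT_FILE = (
    b"Bombus\timpatiens\tIthaca, NY\n"
    b"Andrena\tnasonii\tIthaca, NY\n"
    b"Bombus\tgriseocollis\tNewark, DE\n"
)


def build_genus_index(handle):
    """Scan the file once and map each genus to the byte offsets of its lines."""
    index = defaultdict(list)
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        genus = line.split(b"\t", 1)[0].decode("utf-8")
        index[genus].append(offset)
    return index


def lookup(handle, index, genus):
    """Jump straight to the matching lines instead of scanning the whole file."""
    records = []
    for offset in index.get(genus, []):
        handle.seek(offset)
        line = handle.readline().decode("utf-8").rstrip("\n")
        records.append(line.split("\t"))
    return records


handle = io.BytesIO(FLAT_FILE)
genus_index = build_genus_index(handle)
print(lookup(handle, genus_index, "Bombus"))
```

The index only answers the question it was built for. A search by locality would need its own index, built ahead of time, which is the "predetermine what queries the user will be doing" constraint in practice.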

                          I have mostly made relational databases, including my last one for the
                          Smithsonian. That database is limited to the exact number of ant type
                          specimens the museum holds. I made the decision that 1200 specimens
                          would not slow the searches to any appreciable level, so I went with
                          the ease and power of a relational database. If it were going to grow
                          to 30,000, I would go with a flat file design.

                          If you would like to see the difference do a search on aphaenogaster
                          at this website
                          http://ripley.si.edu/ent/nmnhtypdb

                          and compare it to an author search on wheeler
                          at this website
                          http://ripley.si.edu/ent/nmnhtypedb/wlb/wlbsearch.cfm

                          The first is relational and allows me to easily assign multiple
                          taxonomies and specimens for a single type. The second is a flat
                          file. The first has 1400 or so entries in the typetable hooked to a
                          variety of other tables through relationships. The second has 10,000
                          records and is not hooked to other tables.


                          Dan
                        • Matthew Sarver
                          Message 12 of 28 , Aug 16, 2008
                            Dan wrote: "the way you make flat file databases scream
                            is by indexing the information and holding the indexes in hash tables
                            (at the file system/OS/Perl/C++) level."

                            John replied: "Clearly I need to learn more about this, at least enough to understand
                            something about what the experts are doing."

                             
                            The whole topic is way over my head, but maybe this will help with some very basic info about different ways of indexing a database, including hash tables (I hope the info presented in this brief article is correct):

                            http://20bits.com/2008/05/13/interview-questions-database-indexes/

                            So, Dan - what you're telling us is that a db of the size that could store all of the potentially-contributed bee specimen records from North America would HAVE to be a flat db (e.g. Discover Life), rather than relational, right? So, the question is, is it possible to create some kind of front end web interface for a db like Discover Life that would allow queries on the basis of host plant, locality, collection method, month, etc.? Or would the amount of indexing required to do this screw up data entry? It doesn't seem very useful to store all this information with a specimen record, but effectively have no way to access it via a query. Being able to sort by collection method and collection protocol would go a long way toward the goal of increasing standardization without sacrificing information.
                             
                            I didn't realize how limited relational dbs were in terms of number of records - thanks for enlightening us on all of this!
                             
                            Apologies for ignorance about database design. :(
                             
                            Thanks
                            Matt

                          • Dan Kjar
                            Message 13 of 28 , Aug 16, 2008
                              There is no 'real' limit on the hashes since they can be stored in
                              various ways on filesystems. They can be loaded into memory and
                              accessed very quickly. The limit on this method is exactly what you
                              state... we need to know the searches a priori, before the visit. If
                              someone suddenly wants to map all of the five-legged male bees found in
                              southern Utah we will have a problem.

                              Relational databases get around this by caching common searches and
                              renewing the cache occasionally. Products like ColdFusion have
                              included this for years (yuck, but easy; that is what I wrote the
                              Smithsonian site in, with MySQL for the database if you are interested. Now
                              I only use Perl and MySQL. Pick uses Berkeley DB, luddite that he is).

                              Let me run down a simple search using a relational database.
                              You have three tables. One is taxonomic data, another is specimen
                              data, and another is locale data. You can have multiple specimens
                              tied to single entries in the taxonomic data table and multiple
                              specimens tied to the locale data (e.g. all the specimens of one
                              species, and all of the specimens from one site). You would do this
                              to avoid having the exact same taxonomic or locale data for all 150
                              million specimens. The more crap in the table the longer it takes to
                              search it.
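
A rough sketch of that three-table layout, using Python's built-in `sqlite3` module (the post's own database was MySQL, so this is a stand-in). The table names, column names, and sample rows are hypothetical and only meant to show how several specimens point at one taxonomic entry and one locale entry instead of repeating that data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon   (taxon_id  INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE locale  (locale_id INTEGER PRIMARY KEY, state TEXT, site TEXT);
CREATE TABLE specimen(
    specimen_id INTEGER PRIMARY KEY,
    taxon_id    INTEGER REFERENCES taxon(taxon_id),
    locale_id   INTEGER REFERENCES locale(locale_id)
);
-- Taxonomic and locale data are stored once, however many specimens use them.
INSERT INTO taxon  VALUES (1, 'Bombus', 'impatiens'), (2, 'Apis', 'mellifera');
INSERT INTO locale VALUES (1, 'Wisconsin', 'Site A'), (2, 'Minnesota', 'Site B');
INSERT INTO specimen VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 1, 2);
""")

def specimens_in_state(state):
    """Join specimen rows to their taxon and locale rows, filtered by state."""
    cursor = conn.execute(
        """
        SELECT s.specimen_id, t.genus, t.species
        FROM specimen AS s
        JOIN taxon  AS t ON t.taxon_id  = s.taxon_id
        JOIN locale AS l ON l.locale_id = s.locale_id
        WHERE l.state = ?
        ORDER BY s.specimen_id
        """,
        (state,),
    )
    return cursor.fetchall()

wisconsin = specimens_in_state("Wisconsin")
```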

                              The problem is if you search on the fly and you have 300,000 records,
                              a simple search for the bees of Wisconsin takes a very long time (but
                              not nearly as long as searching a flat file without the hash table).
                              If you have a hash table of locales all you need to do is search down
                              the locales and then grab all of the records included.

                              example hash table based on previously searched terms:
                              key        value
                              Minnesota  1,3,5,6,9,10,23,35
                              Wisconsin  2,3,4,8,11,20,34

                              It only takes a split second to reach into the flat database and grab
                              everything in records 2,3, etc. It takes a little longer to reach in
                              to a relational database and check each specimen record to see if it
                              has a link to a locale table entry that includes Wisconsin (or vice
                              versa, but you would still need to check the taxonomic table to make
                              sure it is a bee or whatever you are interested in). Every time there
                              is a comparison statement it takes much more time. Like I said though,
                              this only really matters with very large datasets and people at places
                              invested in relational datasets spend most of their time figuring out
                              how to make things move more quickly.
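
To make the comparison concrete, here is a small sketch that puts the example hash table above into a Python dict and contrasts a hash lookup with a record-by-record scan. The `flat_db` contents are invented placeholders (generated from the index just so there is something to fetch), and the function names are hypothetical.

```python
# The example table from the post, as a Python dict (a hash table).
locale_index = {
    "Minnesota": [1, 3, 5, 6, 9, 10, 23, 35],
    "Wisconsin": [2, 3, 4, 8, 11, 20, 34],
}

# Hypothetical flat database: record number -> one line of plain text.
flat_db = {}
for state, numbers in locale_index.items():
    for n in numbers:
        flat_db[n] = flat_db.get(n, f"record {n}:") + f" {state}"

def indexed_search(state):
    """One hash lookup, then grab only the listed records."""
    return [flat_db[n] for n in locale_index.get(state, [])]

def scan_search(state):
    """No index: one comparison per record in the whole file."""
    return [line for n, line in sorted(flat_db.items()) if state in line.split()]

# Both routes find the same records; the scan just touches every record to do it.
assert indexed_search("Wisconsin") == scan_search("Wisconsin")
```

With a handful of records the difference is invisible; it only shows up at the dataset sizes discussed in this thread.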

                              There are many other ways to get relational datasets moving fast but
                              in the business world it is a bit easier for the consumer. If you log
                              onto your bank account they can cache all information dealing with
                              your accounts so you can have quick access to it after a short login
                              wait. However, they know you are only going to look at your own stuff
                              (hopefully). Since it takes this kind of magic to get relational
                              databases to move I have decided that I might as well skip all that
                              nonsense and move to the indexing right away and leave the data in a
                              human readable format in case I kick off.

                              The other nice thing about flat files is that anyone can write queries
                              or index it however they see fit. As soon as you decide to put it
                              into a relational setup (e.g. speciesname table, genusname table,
                              specimen table, source table, locale table, alien invasive status
                              table, etc.), you are tied to that setup to create queries. Of course
                              you could write a query that would flatten it (I did this with some
                              fish data from STRI and it WAS AWFUL), but that raises the question: why
                              not just leave the data in human readable form and cut it up for
                              individual uses?
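
For anyone curious what "a query that would flatten it" looks like, here is a minimal sketch using Python's built-in `sqlite3` and `csv` modules. The schema is a cut-down, hypothetical version of the relational setup named above (it is not the STRI fish data), and `flatten` is an illustrative helper name.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genusname  (genus_id INTEGER PRIMARY KEY, genus TEXT);
CREATE TABLE speciesname(species_id INTEGER PRIMARY KEY, genus_id INTEGER, species TEXT);
CREATE TABLE locale     (locale_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE specimen   (specimen_id INTEGER PRIMARY KEY, species_id INTEGER, locale_id INTEGER);
INSERT INTO genusname   VALUES (1, 'Bombus');
INSERT INTO speciesname VALUES (1, 1, 'impatiens');
INSERT INTO locale      VALUES (1, 'Wisconsin');
INSERT INTO specimen    VALUES (1, 1, 1), (2, 1, 1);
""")

def flatten():
    """Join every table back together and write one text line per specimen."""
    cursor = conn.execute("""
        SELECT sp.specimen_id AS specimen_id, g.genus AS genus,
               s.species AS species, l.state AS state
        FROM specimen AS sp
        JOIN speciesname AS s ON s.species_id = sp.species_id
        JOIN genusname   AS g ON g.genus_id   = s.genus_id
        JOIN locale      AS l ON l.locale_id  = sp.locale_id
        ORDER BY sp.specimen_id
    """)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header line
    writer.writerows(cursor)                                  # one row per specimen
    return out.getvalue()

flat_text = flatten()
```

Every table added to the relational setup means another join in a query like this, which is part of why flattening a real schema gets painful.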

                              Not that any of this needs to be worried about at this point....

                              Dan


                              --- In beemonitoring@yahoogroups.com, "Matthew Sarver" <mjsarver@...> wrote:
                              >
                              > Dan wrote: "the way you make flat file databases scream
                              > is by indexing the information and holding the indexes in hash tables
                              > (at the file system/OS/Perl/C++) level."
                              >
                              > John replied: "Clearly I need to learn more about this, at least enough
                              > to understand something about what the experts are doing."
                              >
                              > The whole topic is way over my head, but maybe this will help with some
                              > very basic info about different ways of indexing a database, including
                              > hash tables (I hope the info presented in this brief article is correct):
                              >
                              > http://20bits.com/2008/05/13/interview-questions-database-indexes/
                              >
                              > So, Dan - what you're telling us is that a db of the size that could
                              > store all of the potentially-contributed bee specimen records from North
                              > America would HAVE to be a flat db (eg Discover Life), rather than
                              > relational, right? So, the question is, is it possible to create some
                              > kind of front end web interface for a db like Discover Life that would
                              > allow queries on the basis of host plant, locality, collection method,
                              > month, etc.? Or would the amount of indexing required to do this screw
                              > up data entry? It doesn't seem very useful to store all this information
                              > with a specimen record, but effectively have no way to access it via a
                              > query. Being able to sort by collection method and collection protocol
                              > would go a long way toward the goal of increasing standardization
                              > without sacrificing information.
                              >
                              > I didn't realize how limited relational dbs were in terms of number of
                              > records - thanks for enlightening us on all of this!
                              >
                              > Apologies for ignorance about database design. :(
                              >
                              > Thanks
                              > Matt