405 Re: Standardized Sampling Methodologies and a Common Database

  • Dan Kjar
    Aug 16, 2008
      Here is a quick breakdown of relational vs. flat file databases.

      Relational databases link tables to tables, and those links allow you
      to do some very powerful queries. However, as the tables grow the
      queries slow down, and as the relationships become more complex the
      database gets kludgy to deal with and nearly incomprehensible to
      people who did not design it.

      Flat file databases are plain text, so they are always meaningful to
      any human who can read. Flat files do not allow you to do some of the
      more whiz-bang, pull-it-out-of-your-*** searches that relational
      databases allow. However, if you know what people are going to search
      on (genus/species/whatever), the way you make flat file databases
      scream is by indexing the information and holding the indexes in hash
      tables (at the file system/OS/Perl/C++ level). This is how Pick can put
      300,000 points on a map in just a few seconds. His database currently
      has over 1.4 million records, and when he gets all of the GBIF info it
      will be over 15 million records (if I remember correctly). The
      difficult part here is that you need to predetermine what queries the
      user will be doing. The big search engines all work along the same lines.
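
      To make the indexing idea concrete, here is a minimal Python sketch
      (not the code behind any of the databases mentioned here). It assumes
      a hypothetical tab-separated file whose first column is the genus,
      builds a hash table from genus to the byte offsets of the matching
      records, and then answers a search by seeking straight to those
      offsets instead of scanning the file. The sample records and the
      names build_index and lookup are made up for illustration.

```python
import io
from collections import defaultdict

# Hypothetical flat file: genus, species, latitude, longitude (tab-separated).
# The records are invented sample data, held in memory to keep this runnable.
FLAT_FILE = (
    b"Aphaenogaster\trudis\t38.9\t-77.0\n"
    b"Camponotus\tpennsylvanicus\t39.1\t-76.8\n"
    b"Aphaenogaster\tfulva\t38.7\t-77.2\n"
)


def build_index(handle, column=0):
    """Map each lower-cased value in `column` to the byte offsets of its records."""
    index = defaultdict(list)
    handle.seek(0)
    while True:
        offset = handle.tell()          # where this record starts
        line = handle.readline()
        if not line:                    # end of file
            break
        key = line.rstrip(b"\n").split(b"\t")[column].decode().lower()
        index[key].append(offset)
    return index


def lookup(handle, index, key):
    """Fetch matching records by seeking to their offsets (no full scan)."""
    records = []
    for offset in index.get(key.lower(), []):
        handle.seek(offset)
        records.append(handle.readline().decode().rstrip("\n").split("\t"))
    return records


handle = io.BytesIO(FLAT_FILE)          # stands in for open(path, "rb")
genus_index = build_index(handle)       # the predetermined query: search by genus
hits = lookup(handle, genus_index, "aphaenogaster")
for record in hits:
    print(record)
```

      The catch is the one described above: the index only helps for the
      column you chose ahead of time. A search on a column with no index
      falls back to reading the whole file.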

      I have mostly made relational databases, including my last one for the
      Smithsonian. That database is limited to the exact number of type ant
      specimens the museum holds. I decided that 1200 specimens would not
      slow the searches to any appreciable degree, so I went with the ease
      and power of a relational database. If it were going to grow to
      30,000, I would go with a flat file design.

      If you would like to see the difference, do a search on aphaenogaster
      at this website

      and compare it to an author search on wheeler
      at this website

      The first is relational and allows me to easily assign multiple
      taxonomies and specimens for a single type. The second is a flat
      file. The first has 1400 or so entries in the typetable hooked to a
      variety of other tables through relationships. The second has 10,000
      records and is not hooked to other tables.
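
      For anyone who has not worked with a relational layout, here is a
      rough sketch of the "one type, many taxonomies, many specimens" idea
      using Python's built-in sqlite3 module. Only the table name typetable
      comes from the database described above. The other tables, all of the
      columns, and the placeholder names and catalog numbers are
      assumptions for illustration, not the actual Smithsonian schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per type, linked to any number of
# taxonomy rows and specimen rows through type_id.
conn.executescript("""
    CREATE TABLE typetable (
        type_id       INTEGER PRIMARY KEY,
        original_name TEXT
    );
    CREATE TABLE taxonomy (
        taxonomy_id  INTEGER PRIMARY KEY,
        type_id      INTEGER REFERENCES typetable(type_id),
        current_name TEXT
    );
    CREATE TABLE specimen (
        specimen_id INTEGER PRIMARY KEY,
        type_id     INTEGER REFERENCES typetable(type_id),
        catalog_no  TEXT
    );
""")

# Placeholder data: a single type with two taxonomies and two specimens.
conn.execute("INSERT INTO typetable VALUES (1, 'Genus alpha')")
conn.executemany(
    "INSERT INTO taxonomy (type_id, current_name) VALUES (?, ?)",
    [(1, "Genus alpha"), (1, "Othergenus alpha")],
)
conn.executemany(
    "INSERT INTO specimen (type_id, catalog_no) VALUES (?, ?)",
    [(1, "CAT-0001"), (1, "CAT-0002")],
)

# One query follows the relationships across all three tables.
rows = conn.execute(
    """
    SELECT t.original_name, x.current_name, s.catalog_no
    FROM typetable AS t
    JOIN taxonomy AS x ON x.type_id = t.type_id
    JOIN specimen AS s ON s.type_id = t.type_id
    WHERE t.original_name LIKE ?
    ORDER BY x.current_name, s.catalog_no
    """,
    ("genus%",),
).fetchall()

for row in rows:
    print(row)
```

      The joins are where the power comes from, and also where the cost
      comes from as the tables grow. The flat file version would hold all
      of this in one record per line with nothing to join.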
