
4459 Re: csv? Re: [baseball-databank] Proposed changes to master table

  • F. X. Flinn
    Dec 15, 2013
      It should be possible to supply a tab-delimited version or, even better, a strictly compliant CSV file with double quotes (") surrounding text fields.
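      As a rough illustration of those two options, Python's standard csv module can write either form. This is a minimal sketch, not the databank's actual export process; the sample row and the column name nameNick are hypothetical. QUOTE_NONNUMERIC wraps every string field in double quotes and leaves numbers bare, so an embedded comma is no longer mistaken for a separator; the tab-delimited version avoids the issue as long as no field contains a tab.

```python
import csv
import io

# Hypothetical sample data: the nickname field contains a comma.
rows = [
    ["playerID", "nameFirst", "nameNick", "birthYear"],
    ["doejo01", "John", "Lefty, Slim", 1901],
]


def to_quoted_csv(rows):
    """CSV with every non-numeric field wrapped in double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def to_tsv(rows):
    """Tab-delimited output; a comma inside a field is ordinary data here."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


print(to_quoted_csv(rows), end="")
# "playerID","nameFirst","nameNick","birthYear"
# "doejo01","John","Lefty, Slim",1901
```

      Any RFC 4180-style reader (Excel, R, Perl's Text::CSV, Python's csv.reader) should read the quoted version back with the comma intact.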

      By the way, I applaud the proposed changes. They will streamline and modernize the database. Further, I agree with Ted's suggestion that the birth and death dates continue to be provided as separate year/month/day fields, and I suggest that adding an ISO date field would be a good addition that would let us move to that format over time.
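      One way to add such a field while keeping the existing columns is to derive it from them. The Python sketch below is hypothetical: it assumes missing parts arrive as None and falls back to the reduced-precision forms YYYY-MM and YYYY that ISO 8601 permits. Whether the databank would publish partial dates that way is an open question, not a settled format. datetime.date rejects impossible dates and handles years before 1900 without trouble.

```python
from datetime import date
from typing import Optional


def iso_date(year: Optional[int], month: Optional[int], day: Optional[int]) -> str:
    """Derive an ISO 8601 string from separate year/month/day fields.

    Assumption: missing parts arrive as None. Missing trailing parts give the
    reduced-precision forms YYYY-MM or YYYY; a missing year gives "".
    """
    if year is None:
        return ""
    if month is None:
        return f"{year:04d}"
    if day is None:
        return f"{year:04d}-{month:02d}"
    # date() raises ValueError for impossible dates and accepts years before 1900.
    return date(year, month, day).isoformat()


print(iso_date(1871, 5, 4))     # 1871-05-04
print(iso_date(1871, 5, None))  # 1871-05
```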

      F. X. Flinn
      FXFlinn@gmail | c:802-369-0069


      On Sat, Dec 14, 2013 at 3:52 PM, John Rickert <rickert@...> wrote:
       

      This may not be a problem for anyone else, but I usually work with the csv files and treat the comma as the separator between variables. However, a few of the nickname and college/school tables use commas to list several names. Is it reasonable to change the sub-separators in the csv files?

      If not, I'll continue to rewrite the tables, but if it can be changed, now seems like a fine time.
      (I've been using Perl to process the files; it may be that I'm overlooking some commands because of my wolf-child background as a programmer.)
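      For what it's worth, a CSV-aware parser treats a comma inside a double-quoted field as data rather than as a separator; in Perl the usual tool for this is the CPAN module Text::CSV. The Python sketch below (the sample line is hypothetical) contrasts a naive split with the standard csv reader, then splits the recovered field on its own sub-separator. It only helps if the file actually quotes the fields that contain commas; an unquoted embedded comma is ambiguous to any parser.

```python
import csv
import io

# Hypothetical line whose third field holds two nicknames separated by a comma.
line = 'doejo01,John,"Lefty, Slim"\n'

# Splitting on every comma breaks the quoted field in two.
naive = line.rstrip("\n").split(",")

# csv.reader honors the double quotes and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

# The sub-separated list can then be split on its own.
nicknames = [n.strip() for n in parsed[2].split(",")]

print(naive)      # ['doejo01', 'John', '"Lefty', ' Slim"']
print(parsed)     # ['doejo01', 'John', 'Lefty, Slim']
print(nicknames)  # ['Lefty', 'Slim']
```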

      I'm fine with the other changes being made or not made, as is best for everyone else.

      john rickert



      On Dec 13, 2013, at 1:57 PM, <seanlahman@...> <seanlahman@...> wrote:

       

      I've been having some conversations offline with Ted Turocy about updating and improving the Master table, and I'd like to propose a few changes.  Please take a look and let me know if you have any objections to any of these.



      1) I'd like to eliminate the fields for lahman40, lahman45, and holtzID. The first two are IDs for older versions of the database, from 1999 and 2000 to be precise. They were maintained for backwards compatibility, but there doesn't seem to be any reason to retain them any longer. The same applies to the holtzID, which hasn't been updated since 2001.

      2) I'd also like to consolidate the birth and death dates from three fields each (year, month, day) into a single field (yyyy-mm-dd). When I created the three-field scheme in the mid-1990s, it was necessary because Microsoft Access could not handle dates earlier than 1/1/1900. That is no longer an issue. Folks who use the csv files in Excel will still encounter this problem, but to address that I'll resume the practice of creating human-readable Excel-native versions of the data.

      3) Unless there is significant objection, I'd like to eliminate the practice of having separate managerIDs and hofIDs. I'd remove those fields from the master table and revert to playerIDs as the key in the relevant tables (AwardsManagers, AwardsShareManagers, HallOfFame, Managers, ManagersHalf).

      4) I'm also proposing eliminating the "college" field in the master table in favor of the "Schools" and "SchoolsPlayers" tables. At present, all of the information in the "college" field exists in the "SchoolsPlayers" table. This will put us in a better position to make significant additions and upgrades to that dataset.
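      Taken together, proposals 1, 3, and 4 amount to a small schema migration. The following Python sketch assumes the tables are plain CSV text with a header row, that the old Master table carries playerID, managerID, and hofID columns, and that the rekeyed table has no playerID column of its own yet; the function names and sample IDs are hypothetical. It drops the retired columns and rewrites a managerID- or hofID-keyed table to use playerID. The lookup uses mapping[...] rather than .get() so that an ID with no matching player fails loudly instead of silently producing a blank key.

```python
import csv
import io
from typing import Dict

# Hypothetical set of Master columns retired by proposals 1, 3, and 4.
RETIRED = {"lahman40", "lahman45", "holtzID", "managerID", "hofID", "college"}


def drop_columns(master_csv: str) -> str:
    """Return the Master table CSV text without the retired columns."""
    reader = csv.DictReader(io.StringIO(master_csv))
    keep = [c for c in reader.fieldnames if c not in RETIRED]
    out = io.StringIO()
    # extrasaction="ignore" skips the dropped keys in each row dict.
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


def id_map(master_csv: str, id_column: str) -> Dict[str, str]:
    """Map each non-empty value of id_column (e.g. managerID) to its playerID."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(master_csv)):
        if row.get(id_column):
            mapping[row[id_column]] = row["playerID"]
    return mapping


def rekey(table_csv: str, mapping: Dict[str, str], id_column: str) -> str:
    """Rewrite a table keyed on id_column so that it is keyed on playerID."""
    reader = csv.DictReader(io.StringIO(table_csv))
    fields = ["playerID" if c == id_column else c for c in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # KeyError here means an ID has no playerID counterpart in Master.
        row["playerID"] = mapping[row.pop(id_column)]
        writer.writerow(row)
    return out.getvalue()
```

      A Managers-style table would then be converted with rekey(managers_csv, id_map(master_csv, "managerID"), "managerID"), and HallOfFame the same way with "hofID".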

      Thoughts? Questions? Concerns? Objections? Let's hear them!

      Regards,
      Sean Lahman


