2134 Re: [baseball-databank] Re: Means of Updating and Correcting BDB Data

  • Sean Forman
    May 6 6:45 AM
      > 1. Go to my working directory with my checked out copy of the database (or
      > run "cvs checkout" to create a local copy).
      >
      > 2. Run "cvs update" to get the freshest copy from the repository.
      >
      > 3. Make the change to the text file. This could either be done by text
      > editor at this point, or by loading into MySQL, running the update command
      > on that data point, and extracting the data back out.
      >
      > 4. For the tracking we suggested, we want a datasource and a datasupplier.
      > I'm just taking stabs at these, but here's what my first attempt at an
      > entry in DATASOURCES would be:
      >
      > 001,"BR Comment",2004
      >
      > 5. I'd add a row to DATASUPPLIER:
      > 001,"Michael Timmons"
      >
      > 6. If we wanted to put CVSversion in the DATACHANGE table, I'd want to run
      > "cvs status -v Master.txt" to see the current revision.
      >
      > 7. I'd add a row to DATACHANGE to link:
      > changeid,sourceid,supplierid,table,CVSversion,comment
      > 195,001,001,Master,2.08,"changed birth place of mccarfr01 to NYC from
      > Middletown, CT"
      >
      > 8. I'd then run "cvs commit" and add a descriptive log message:
      >
      > Changed birth place of mccarfr01 to NYC from Middletown, CT per BR comment
      > from great-grandson Michael Timmons.


      Steps 4 through 7 are where it starts to look onerous, though I may
      be overreacting. I'm just trying to imagine running through Bill
      Carle's bimonthly newsletters. There are probably 100 corrections per
      newsletter. I guess we would make all of the corrections and then make
      the following data entries in the DB.

      002,"SABR Biographical Committee Newsletter, March/April 2004",2004

      002,"SABR Biographical Committee, Bill Carle, Chair"

      196,002,002,Master,2.09,"made dozens of corrections to many player bio
      data, place and date of birth, place and date of death, debut date and
      removed two players who were found to be duplicates. foobarr01 was
      folded into foobarr02 and zippo01 was folded into ziper01."
      197,002,002,Pitching,2.09,"made dozens of corrections to many player bio
      data, place and date of birth, place and date of death, debut date and
      removed two players who were found to be duplicates. foobarr01 was
      folded into foobarr02 and zippo01 was folded into ziper01."
      198,002,002,Fielding,2.09,"made dozens of corrections to many player bio
      data, place and date of birth, place and date of death, debut date and
      removed two players who were found to be duplicates. foobarr01 was
      folded into foobarr02 and zippo01 was folded into ziper01."
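
One way to take some of the tedium out of a batch like this is to generate the DATACHANGE rows from a single description. The following is only a minimal sketch: the column order follows the changeid,sourceid,supplierid,table,CVSversion,comment layout quoted above, while the function name `datachange_rows` and the shortened comment text are made up for illustration.

```python
import csv
import sys

def datachange_rows(first_id, source_id, supplier_id, tables, cvs_version, comment):
    """Build one DATACHANGE row per affected table, with consecutive change ids."""
    return [
        [str(first_id + i), source_id, supplier_id, table, cvs_version, comment]
        for i, table in enumerate(tables)
    ]

# Hypothetical batch: one newsletter, three tables touched, same comment on each row.
rows = datachange_rows(
    196, "002", "002",
    ["Master", "Pitching", "Fielding"],
    "2.09",
    "removed duplicates: foobarr01 folded into foobarr02, zippo01 folded into ziper01",
)

# csv.writer quotes the comment because it contains a comma.
csv.writer(sys.stdout, lineterminator="\n").writerows(rows)
```

The ids are kept as strings so that leading zeros such as 002 survive.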


      I think that we might also put a blob field at the end, so that you
      can attach the entire e-mail to the datachange file. However, you run
      into issues entering that into a text file with line returns, etc. You
      would have to enter that into a DB and then dump the DB for that to work.
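
As a rough illustration of the line-return problem, a CSV writer that quotes its fields can keep a multi-line e-mail inside a single record, so the text file survives a round trip without going through a database. This is a sketch under assumptions: the extra seventh column for the e-mail body and the sample message text are hypothetical, and the helper names `dump_rows` and `load_rows` are invented here.

```python
import csv
import io

def dump_rows(rows):
    """Serialize rows to CSV text, quoting every string field."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

def load_rows(text):
    """Parse CSV text back into rows; quoted line breaks stay inside their field."""
    return list(csv.reader(io.StringIO(text, newline="")))

# Hypothetical DATACHANGE row with the full e-mail appended as a last column.
email_body = "Sean,\nPlease change the birth place.\nThanks"
row = ["195", "001", "001", "Master", "2.08", "changed birth place", email_body]

text = dump_rows([row])
assert load_rows(text) == [row]  # the embedded line breaks survive the round trip
```

The record then spans several physical lines in the file, which is fine for a CSV parser but awkward for hand editing and for any tool that assumes one record per line.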

      What if we set up a web form for tracking the changes and updates that
      are made?


      Another concern I have with loading and dumping between the DB and
      the text files is the ordering of the lines. It would be a real pain if
      an admin loaded and then dumped the files in some other order and
      everyone then had to update all of their files, but that is probably a
      minor issue. I know how I would handle it in Linux (I'd dump and then
      sort), but I'm not sure how to handle it in Windows or on a Mac.
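
A small script is one platform-neutral way to do the dump-then-sort step, since it would behave the same on Linux, Windows, and Mac. The sketch below is an assumption about how that could look, not an agreed procedure: the names `sort_dump` and `sort_file`, the `header_rows` option, and the latin-1 default encoding are all invented for illustration.

```python
def sort_dump(lines, header_rows=0):
    """Keep the first header_rows lines in place and sort the remaining lines."""
    head, body = lines[:header_rows], lines[header_rows:]
    # Plain string comparison orders by code point, independent of OS locale.
    return head + sorted(body)

def sort_file(path, header_rows=0, encoding="latin-1"):
    """Rewrite a dumped text file in sorted order with LF line endings."""
    # Default newline handling turns CRLF or CR into "\n" while reading.
    with open(path, "r", encoding=encoding) as f:
        lines = [line.rstrip("\n") for line in f]
    # newline="\n" writes LF on every platform, so CVS sees identical files.
    with open(path, "w", encoding=encoding, newline="\n") as f:
        for line in sort_dump(lines, header_rows):
            f.write(line + "\n")
```

Sorting line by line assumes one record per line, so it would not be safe for a file whose quoted fields contain line breaks.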

      I've looked for other projects trying to do this as well, but haven't
      found any. I sort of chose the name based on the ProteinDatabank, but
      that isn't really the same thing.

      The other advantage of doing it directly in MySQL rather than in a text
      file is that the formatting is done for free. We won't have to be as
      careful about column counts etc., but I guess CVS will allow us to undo
      any issues that might arise.
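
If edits are made directly in the text files, a quick check before "cvs commit" could catch column-count slips. This is a minimal sketch; the function name `bad_column_counts` is hypothetical, and the sample reuses the six-column DATACHANGE layout from the quoted message with a deliberately short second record.

```python
import csv
import io

def bad_column_counts(text, expected):
    """Return (record_number, field_count) for each record with the wrong field count."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [
        (number, len(record))
        for number, record in enumerate(reader, start=1)
        if len(record) != expected
    ]

sample = (
    '195,001,001,Master,2.08,"changed birth place of mccarfr01 to NYC from Middletown, CT"\n'
    "196,002,002,Master,2.09\n"  # comment field missing on purpose
)
print(bad_column_counts(sample, 6))  # [(2, 5)]
```

The comma inside the quoted comment does not count as a separator, which a naive split on "," would get wrong.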

      Sincerely,
      Sean Forman

      Baseball Stats! http://www.Baseball-Reference.com/
      Baseball Analysis! http://www.BaseballPrimer.com/