2138 Re: [baseball-databank] Re: Means of Updating and Correcting BDB Data

  • Derek Adair
    May 6, 2004
      On Thu, 6 May 2004, Sean Forman wrote:

      > Steps 4, 5, 6, 7 are where it is starting to look onerous, though I may
      > be overreacting. I'm just trying to imagine running through Bill
      > Carle's bimonthly newsletters. There are probably 100 corrections per
      > newsletter. I guess we would make all of the corrections and then make
      > the following data entries into the DB.

      I definitely think you'd want to group the corrections. I think of them
      more as a change set than a change to a single value. For example, "Added
      ROE column" would be one "change" even though it affected a number of
      rows. In the case of scattered data points, like you have with the Bio
      Committee stuff, the change source and note, as well as the cvs diff,
      provide you with all the tracking info you'd need.

      > 002,"SABR Biographical Committee Newsletter, March/April 2004",2004
      >
      > 002,"SABR Biographical Committee, Bill Carle, Chair"
      >
      > 196,002,002,Master,2.09,"made dozens of corrections to many player bio
      > data, place and date of birth, place and date of death, debut date and
      > removed two players who were found to be duplicates. foobarr01 was
      > folded into foobarr02 and zippo01 was folded into ziper01."
      > 197,002,002,Pitching,2.09,"made dozens of corrections to many player bio
      > data, place and date of birth, place and date of death, debut date and
      > removed two players who were found to be duplicates. foobarr01 was
      > folded into foobarr02 and zippo01 was folded into ziper01."
      > 198,002,002,Fielding,2.09,"made dozens of corrections to many player bio
      > data, place and date of birth, place and date of death, debut date and
      > removed two players who were found to be duplicates. foobarr01 was
      > folded into foobarr02 and zippo01 was folded into ziper01."

      I think you'd only need/want one of the entries in the Changes table.
      Change 196 was "(making) dozens of corrections...."
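
A minimal sketch of the "one change per change set" idea, in Python. The column meanings (change id, source id, author id, table, version, note) are an assumption read off the quoted sample rows, the note text is abbreviated, and `group_changes` is a hypothetical helper, not part of any BDB tooling. It collapses rows that differ only in the table they name into a single change that lists the affected tables.

```python
# Hypothetical rows in the layout of the quoted sample (column meanings assumed):
# (change_id, source_id, author_id, table, version, note)
rows = [
    (196, "002", "002", "Master", "2.09", "made dozens of corrections ..."),
    (197, "002", "002", "Pitching", "2.09", "made dozens of corrections ..."),
    (198, "002", "002", "Fielding", "2.09", "made dozens of corrections ..."),
]


def group_changes(rows):
    """Collapse rows sharing source, author, version and note into one change set."""
    grouped = {}
    for change_id, source_id, author_id, table, version, note in rows:
        key = (source_id, author_id, version, note)
        if key not in grouped:
            # The first change id seen becomes the id of the whole change set.
            grouped[key] = {
                "change_id": change_id,
                "source_id": source_id,
                "author_id": author_id,
                "version": version,
                "note": note,
                "tables": [],
            }
        grouped[key]["tables"].append(table)
    return list(grouped.values())


change_sets = group_changes(rows)
```

With the three sample rows this leaves one change (196) whose `tables` list names Master, Pitching and Fielding.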

      > I think that we might also put a blob field at the end, so that you
      > can attach the entire e-mail to the datachange file. However, you run
      > into issues entering that into a text file with line returns, etc. You
      > would have to enter that into a db and then dump the db for that to work.

      I agree that having a blob would make working with text difficult.
      Personally, I'd rather see a web archive than have the db bloated with the
      info.
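
To illustrate the line-return problem, here is a small sketch using Python's standard `csv` module. The row contents are made up and `dump_rows` / `load_rows` are hypothetical helper names. A quoted field can carry embedded newlines and still round-trip, but one record then spans several physical lines, so line-oriented tools (sort, a line-by-line diff) no longer see one record per line.

```python
import csv
import io


def dump_rows(rows):
    """Write rows as CSV text; fields containing newlines or commas get quoted."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def load_rows(text):
    """Read CSV text back into a list of rows, honouring quoted newlines."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up example: the last field stands in for an attached e-mail body.
sample = [["196", "002", "002", "Master", "2.09",
           "Line one of the e-mail\nLine two, with a comma"]]
text = dump_rows(sample)
```

Here `load_rows(text)` returns the original row, yet `text` occupies two physical lines for a single record, which is the kind of awkwardness a separate web archive would avoid.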

      > What if we set up a web form for tracking the changes and updates that
      > are made?

      That's definitely possible.

      > Another concern I have in the loading and dumping from and to the DB and
      > the text files is ordering of the lines. It would be a real pain to
      > have an admin load and then dump the files in some other order and then
      > everyone have to update all of their files, but that is probably a minor
      > issue. I know how I would handle it in Linux (I'd dump and then sort),
      > but I'm not sure how to handle it in Windows or on a Mac.

      I echo Tom here: what happens to make your lines go screwy? I've always
      gotten lines out in the order I put them in. I admit, I do a lot more
      "select * from Batting into outfile '/dump/Batting'" type queries than I
      do actual DB dumps. I do think the probable admins are tech-savvy enough
      that this won't be an issue.
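
If ordering ever did become a problem, one platform-neutral option is to sort the dump on its key columns before committing it, so the diff only shows rows that really changed. The sketch below is a rough Python equivalent of "dump and then sort". It assumes a comma-separated file with no header row, and the key columns (the first three fields, imagined here as player, year and stint) and sample rows are illustrative assumptions. An ORDER BY clause on the select would be another way to get a stable order.

```python
import csv
import io


def sort_dump(text, key_columns=(0, 1, 2)):
    """Return CSV text with its rows sorted on the given column indexes."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    # Compare as strings; adequate for ids and four-digit years.
    rows.sort(key=lambda row: tuple(row[i] for i in key_columns))
    out = io.StringIO(newline="")
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative rows only, in an arbitrary dump order.
sample = (
    "ruthba01,1927,1,NYA\n"
    "aaronha01,1957,1,ML1\n"
    "ruthba01,1926,1,NYA\n"
)
sorted_text = sort_dump(sample)
```

Running `sort_dump` on already-sorted text returns it unchanged, so it is safe to apply on every dump regardless of which admin produced the file.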

      > I've looked for other projects trying to do this as well, but haven't
      > found any. I sort of chose the name based on the ProteinDatabank, but
      > that isn't really the same thing.
      >
      > The other advantage of doing it directly in MySQL rather than to a text
      > file is that the formatting is done for free. We won't have to be as
      > careful for column counts etc., but I guess CVS will allow us to undo
      > any issues that might arise.

      CVS does allow you to roll back fairly easily. Hopefully this wouldn't be
      an issue, though. I know before I send out any file, I import and make
      sure it doesn't give errors or warnings.
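
A pre-flight check along these lines can also be done without a database. The following Python sketch flags rows whose column count differs from what the table expects. `check_column_counts` is a hypothetical helper, and the expected count has to be supplied by whoever knows the table layout. It is only a sanity check and does not replace a real test import.

```python
import csv
import io


def check_column_counts(text, expected):
    """Return (line_number, column_count) for every row with the wrong width."""
    problems = []
    reader = csv.reader(io.StringIO(text, newline=""))
    for row in reader:
        if len(row) != expected:
            # line_num is the last physical line read for this row.
            problems.append((reader.line_num, len(row)))
    return problems


# Made-up three-column data with one short row and one long row.
sample = "a,1,x\nb,2\nc,3,y,z\n"
problems = check_column_counts(sample, expected=3)
```

An empty result means every row had the expected width; anything else points at the physical line to inspect before the file goes out.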

      Regards,
      Derek