
4488 Alpha version of collegiate information

  • Theodore Turocy
    Feb 19, 2014

      First, thanks to Sean and all for the lovely words last week. Indeed, some of the glitches in the last release had to do with trying to synchronise systems. There may still be some hiccoughs to come. But the important thing is that this work only has to be done once. It is rather silly for multiple people to be compiling the same information on things like basic demographic data on MLBers. It's not a good use of anyone's scarce time; there are plenty of more interesting things for us all to be working on than constantly reinventing the wheel!

      In particular, it should be noted that the demographic information now in the Master table matches what is sent out to, among other places, baseball-reference. So if you find any errors, please do report them (to Sean). It is worth the effort because corrections will propagate out widely!

      With that out of the way, there are a few other datasets I hope to maintain as "sidecars" to the databank. Today I am making one of these available. This is an alternative set of information on college affiliations for MLBers. It is databank-compatible in that the tables are keyed on the databank IDs (as opposed to the underlying UUIDs you'll find in the main Chadwick register), so it should be possible to just drop this data into your database schema.

      You can fetch it from


      There is a README.txt file which I strongly suggest you read before doing anything with this information -- as there are many, many caveats that should be taken into account. This dataset will be far less mature than, e.g., MLBer basic demographics, and there will be correspondingly more discrepancies and outright errors.

      A key innovation in this dataset is that it explicitly separates attending a college from playing for it; there are separate attendance records and playing records. I believe this is the first dataset (at least, the first to see the light of day) to make this distinction systematically in the data model.
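
      To make the attendance/playing split concrete, here is a minimal sketch of how the two kinds of record might be held side by side. Every class name, field name, and sample value below is hypothetical and chosen only for illustration; the actual table layout is whatever the README describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttendanceRecord:
    # Hypothetical shape: a player was enrolled at a school.
    player_id: str                    # databank ID (assumed field name)
    school: str
    year_from: Optional[int] = None   # enrolment years may be unknown
    year_to: Optional[int] = None


@dataclass(frozen=True)
class PlayingRecord:
    # Hypothetical shape: a player appeared for a school's team in a season.
    player_id: str
    school: str
    year: int


def attended_without_playing(attendance, playing):
    """Return sorted (player_id, school) pairs with attendance but no playing record."""
    played = {(p.player_id, p.school) for p in playing}
    attended = {(a.player_id, a.school) for a in attendance}
    return sorted(attended - played)


# Made-up placeholder data, not taken from the dataset.
attendance = [
    AttendanceRecord("player01", "School A", 1990, 1992),
    AttendanceRecord("player01", "School B", 1993, 1994),
]
playing = [PlayingRecord("player01", "School A", 1991)]

only_attended = attended_without_playing(attendance, playing)
```

      Keeping the two record types apart means a question like "who attended but never played there" becomes a simple set difference on the databank ID, rather than something to be guessed from a single combined row.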

      Contributions are welcome, but again, please read the README before acting. THIS DATASET SURELY CONTAINS MANY ERRORS! The goal of making it available is to root out those errors. Flames will be cheerfully ignored.

      As with the demographics above, the attendance information is also furnished to baseball-reference, so corrections made here will ultimately propagate widely as well.

      Dr Theodore L Turocy -- ted.turocy@...
      Chadwick Baseball Bureau -- http://www.chadwick-bureau.com