Re: Brainstorming

  • tmasc
    Message 1 of 13, Mar 7, 2002
      Have Sean and Sean made any decisions yet on how to "open-source" the
      database?

      Just like to know where we stand, and how the rest of us can
      contribute.

      Thanks, Tom (aka Tangotiger)


      --- In baseball-databank@y..., Sean Forman <sean-forman@b...> wrote:
      >
      > > >I think we should place each file in a separate zip file and then allow
      > > >the download of each file as needed. I think this is the most likely to
      > > >keep bandwidth somewhat under control. Each 4MB costs around a penny
      > > >and I think when it is said and done the database is going to be pretty
      > > >large. This will also discourage folks who only want parts of the
      > > >database from taking more than they need.
      >
      > > I understand how bandwidth is a concern to you, but I suspect there are a great
      > > many people for whom it isn't, so I'd guess most people would want it all.
      > > Having both options would of course be nice.
      >
      >
      > tmasc followed up and was right. I'm worried about bandwidth on the
      > server end. I get charged $2.50 a GB/month, and BR.com is using about
      > 60GB/month as it is. I'm a bit nervous about starting this and having a
      > $500 bill come rolling into my inbox. ;-)
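
The bandwidth figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the $2.50 per GB rate comes from the message, while the decimal conversion (1 GB = 1000 MB) and the function names are assumptions made for illustration. Under those assumptions a 4 MB download costs about a penny, 60 GB a month comes to about $150, and a $500 bill corresponds to roughly 200 GB of transfer.

```python
# Rate quoted in the message; everything else here is an illustrative assumption.
PRICE_PER_GB_USD = 2.50
MB_PER_GB = 1000  # assumption: decimal gigabytes


def transfer_cost_usd(megabytes: float) -> float:
    """Cost in dollars of transferring the given number of megabytes."""
    return megabytes / MB_PER_GB * PRICE_PER_GB_USD


def gb_for_budget(budget_usd: float) -> float:
    """How many gigabytes of transfer a given dollar amount pays for."""
    return budget_usd / PRICE_PER_GB_USD


if __name__ == "__main__":
    print(f"4 MB download:  ${transfer_cost_usd(4):.2f}")
    print(f"60 GB traffic:  ${transfer_cost_usd(60 * MB_PER_GB):.2f}")
    print(f"$500 pays for:  {gb_for_budget(500):.0f} GB")
```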
      >
      >
      > > >We will likely need some documentation on how to load this into Access
      > > >as well, since I'm guessing that is what most people will need. As a
      > > >mysql user, I tend to want to just dump the tables, which means it prints
      > > >out the SQL needed to recreate the tables. I'm not sure how Access
      > > >feels about this sort of input. Any input? Small scripts to aid the
      > > >import process might be good as well. Again, I'm a Linux user, so I'm
      > > >open to feedback from the 99% of the world that uses Windows.
      >
      > > The DB used to be available in Access, no? Any particular reasons not to keep it
      > > that way? Just work or licensing? It's probably a lot easier to have an Access
      > > version downloadable than to maintain proper instructions.
      >
      >
      > To be honest, I was thinking this for three reasons.
      > 1) The file format is smaller in the non-Access case, unless I'm
      > mistaken.
      > 2) It would require the download of all the files. I don't want to
      > discourage that, but I also don't want folks to download all the files
      > just to get some small part of the database.
      > 3) If we have a "golden" DB then it makes sense to me to have it online
      > somewhere (accessible by trusted users) and then dump that into the
      > public's hands every so often. I'm worried about having to synch the
      > online database with the Access version, then having to upload the
      > Access version, etc., but I do understand that this is how folks may
      > want it, so I'm willing to be flexible. I'm just hoping that Access has
      > some sort of import scripting capabilities that will make importing
      > super easy.
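
One way to combine the "one zip per table" idea with an Access-friendly format is to dump each table of the master database to a delimited text file and zip it on its own. The sketch below is only an illustration of that idea, not anything the list has agreed on: it uses Python's built-in sqlite3 module as a stand-in for the MySQL master copy, and the function name and output layout are hypothetical. It assumes the resulting CSV files would then be loaded through Access's text import.

```python
import csv
import io
import sqlite3
import zipfile
from pathlib import Path
from typing import List


def export_tables(conn: sqlite3.Connection, out_dir: Path) -> List[Path]:
    """Write every table to <table>.csv inside its own <table>.zip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # sqlite_master lists the tables; a MySQL master copy would need a different query.
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    written = []
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buffer = io.StringIO(newline="")
        writer = csv.writer(buffer)
        # Header row with column names, then the data rows.
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
        zip_path = out_dir / f"{table}.zip"
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.writestr(f"{table}.csv", buffer.getvalue())
        written.append(zip_path)
    return written
```

Keeping one archive per table lets someone who only wants a single table download just that file, which is the bandwidth concern raised earlier in the thread.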
      >
      >
      >
      > > I think the SQL in each patch should be such that it is safe to run it multiple
      > > times. Thus, we should at any one point in time just keep one patch file, the
      > > one taking the user from the latest released version of the db to the current
      > > "patch level". I don't think it's a big concern for most people that this patch
      > > would contain changes already in his/her DB. I think this will make maintenance
      > > quite a bit simpler and it will decrease the possible confusion.
      >
      >
      > I'm trying to think of instances where this wouldn't be the case.
      > Perhaps if we had updates followed by deletes or something like that, it
      > could make things problematic. I'm not sure about this.
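
To make the "safe to run multiple times" idea concrete, here is a small sketch of a patch that can be applied twice with the same result. It is only an illustration under stated assumptions: the table layout, column names, and player IDs are made up, and SQLite (through Python's sqlite3 module) stands in for whatever database a user actually runs. The patch only sets absolute values, deletes by key, and uses SQLite's INSERT OR REPLACE, so an update followed by a delete stays harmless on a second run. A relative change such as `hr = hr + 1` would not be safe to repeat.

```python
import sqlite3

# Hypothetical patch: every statement describes an end state, not a delta.
PATCH = """
UPDATE batting SET hr = 40 WHERE player_id = 'examp01' AND year_id = 2001;
DELETE FROM batting WHERE player_id = 'dupe01' AND year_id = 2001;
INSERT OR REPLACE INTO batting (player_id, year_id, hr) VALUES ('newpl01', 2001, 5);
"""


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up rows to patch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE batting (
            player_id TEXT,
            year_id   INTEGER,
            hr        INTEGER,
            PRIMARY KEY (player_id, year_id)
        );
        INSERT INTO batting VALUES ('examp01', 2001, 39);
        INSERT INTO batting VALUES ('dupe01', 2001, 0);
        """
    )
    return conn


def snapshot(conn: sqlite3.Connection) -> list:
    """Return all rows in a stable order so two states can be compared."""
    return conn.execute(
        "SELECT player_id, year_id, hr FROM batting ORDER BY player_id, year_id"
    ).fetchall()


conn = make_db()
conn.executescript(PATCH)
once = snapshot(conn)   # state after the first application
conn.executescript(PATCH)
twice = snapshot(conn)  # state after applying the same patch again
assert once == twice
```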
      >
      >
      > > If it's ok with you, I think that is a good idea:
      > > * Mail to Sean F
      > > * Sean F checks to see if it's reasonable and if so posts on the list
      > > * If no comments are given in x hrs, Sean F incorporates the changes to a patch
      > > file.
      >
      >
      > I think x can be pretty large here.
      >
      >
      > > >Also, if you are looking just for 2001 data, try here for now
      > > >http://www.baseball-reference.com/data/2001/
      >
      > > This makes it interesting: how to manage the yearly updates conveniently? Let's
      > > see what files are needed (I use batting as an example, but this should go for
      > > any table):
      >
      > > * A complete batting file
      > > * A batting file for each year starting 2001 (so that one can add files
      > > incrementally in the future)
      > > * A patch file for the complete file fixing all known errors in that file
      >
      >
      > I'm tempted to just update the database every year, or I think it might
      > get too complicated.
      >
      >
      > > Let's then look at a proper naming standard. Perhaps:
      > > <table name>_<C/Y/P><year></P>_<Vyy>.txt
      >
      > > <table name> is obvious. C would be for the complete file, Y for a specific file
      > > and P for patches. <year> is year. Vxx is version nr.
      >
      >
      > <Naming standards deleted>
      >
      > I like the naming standards, though I do think we need to make it fairly
      > simple and probably don't need to have too many files each year.
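
A naming standard along these lines is easy to build and parse mechanically. The sketch below is one possible reading of the proposal, not the agreed format: it assumes names like `batting_Y2001_V01.txt`, with a four-digit year that is optional for complete files, a two-digit version, and table names made of letters only. The `DataFile` type and both function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DataFile(NamedTuple):
    table: str           # e.g. "batting"
    kind: str            # "C" complete, "Y" single year, "P" patch
    year: Optional[int]  # None when the file is not tied to a year
    version: int


# Assumed layout: <table>_<C|Y|P>[<year>]_V<nn>.txt
_PATTERN = re.compile(
    r"^(?P<table>[A-Za-z]+)_(?P<kind>[CYP])(?P<year>\d{4})?_V(?P<version>\d{2})\.txt$"
)


def build_name(data_file: DataFile) -> str:
    """Format a DataFile as a file name under the assumed layout."""
    year = "" if data_file.year is None else f"{data_file.year:04d}"
    return f"{data_file.table}_{data_file.kind}{year}_V{data_file.version:02d}.txt"


def parse_name(name: str) -> DataFile:
    """Parse a file name back into its parts, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised data file name: {name!r}")
    year = int(match["year"]) if match["year"] else None
    return DataFile(match["table"], match["kind"], year, int(match["version"]))
```

For example, `build_name(DataFile("batting", "Y", 2001, 1))` gives `batting_Y2001_V01.txt`, and `parse_name` turns that string back into the same tuple.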
      >
      > > /Micke
      >
      >
      >
      > Sincerely,
      > Sean Forman
      >
      > Baseball Stats! http://www.Baseball-Reference.com/
      > Baseball Analysis! http://www.BaseballPrimer.com/
    • Holmes, Dan
      Message 2 of 13, Mar 7, 2002
        I haven't heard anything other than it is supposed to be available soon.

        -----Original Message-----
        From: tmasc [mailto:tmasc@...]
        Sent: Thursday, March 07, 2002 11:44 AM
        To: baseball-databank@yahoogroups.com
        Subject: [baseball-databank] Re: Brainstorming


        Have Sean and Sean made any decisions yet on how to "open-source" the
        database?

        Just like to know where we stand, and how the rest of us can
        contribute.

        Thanks, Tom (aka Tangotiger)





        http://www.baseball-databank.org/

        To unsubscribe from this group, send an email to:
        baseball-databank-unsubscribe@yahoogroups.com



        Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
      • Sean Forman
        Message 3 of 13, Mar 8, 2002
          tmasc wrote:
          >
          > Have Sean and Sean made any decisions yet on how to "open-source" the
          > database?
          >
          > Just like to know where we stand, and how the rest of us can
          > contribute.
          >
          > Thanks, Tom (aka Tangotiger)

          Not to leave you hanging, but I think we'll have some progress on this
          by opening day. Sorry for all the delays.

          Sincerely,
          Sean Forman

          Baseball Stats! http://www.Baseball-Reference.com/
          Baseball Analysis! http://www.BaseballPrimer.com/