Re: Stints

  • railsplitter_44
    Message 1 of 26 , Mar 19, 2012
      Just for the sake of a survey, I also use the stints in the Batting, Fielding, and Pitching tables for my website. Not only to correctly order the teams in each season, but to also correctly join the tables for players who played for the same team in 2 different stints.
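
A minimal sketch of the join described above, using Python's built-in sqlite3. The column names (playerID, yearID, stint, teamID, G) are assumed to follow the Lahman layout, and the player and team codes are made up. It compares joining on the stint with joining on the team alone for a player who has two stints with the same club.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting  (playerID TEXT, yearID INT, stint INT, teamID TEXT, G INT);
CREATE TABLE Pitching (playerID TEXT, yearID INT, stint INT, teamID TEXT, G INT);
-- Made-up player who is with team AAA in stints 1 and 3 of the same season.
INSERT INTO Batting VALUES
  ('doejo01', 1986, 1, 'AAA', 20),
  ('doejo01', 1986, 2, 'BBB', 40),
  ('doejo01', 1986, 3, 'AAA', 10);
INSERT INTO Pitching SELECT * FROM Batting;
""")

# Joining on the stint pairs each batting line with exactly one pitching line.
with_stint = conn.execute("""
    SELECT b.stint, b.teamID, b.G, p.G
    FROM Batting b
    JOIN Pitching p
      ON p.playerID = b.playerID AND p.yearID = b.yearID AND p.stint = b.stint
    ORDER BY b.stint
""").fetchall()

# Joining on the team instead cross-matches the two AAA stints with each other.
without_stint = conn.execute("""
    SELECT b.stint, p.stint, b.teamID
    FROM Batting b
    JOIN Pitching p
      ON p.playerID = b.playerID AND p.yearID = b.yearID AND p.teamID = b.teamID
    ORDER BY b.stint, p.stint
""").fetchall()

print(len(with_stint), len(without_stint))
```

With these toy rows the stint join returns one row per stint, while the team join returns extra rows because each AAA stint also matches the other AAA stint.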

      I may be biased, but changing the structure of the database may cause many of us to have to change our existing code.

      I don't imagine it would be too difficult to make a Master Stint table that (for example) shows Mike MacDougal CHA 1, Mike MacDougal WAS 2 and then apply it to Batting, Pitching, and Fielding tables.
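
One way such a Master Stint table could be derived from the existing tables is sketched below with Python's sqlite3. This is only an illustration: the table name MasterStint, the player ID 'macdougal', and the year are placeholders, and the column names are assumed to follow the Lahman layout. Only the CHA 1 / WAS 2 ordering comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting  (playerID TEXT, yearID INT, stint INT, teamID TEXT);
CREATE TABLE Pitching (playerID TEXT, yearID INT, stint INT, teamID TEXT);
CREATE TABLE Fielding (playerID TEXT, yearID INT, stint INT, teamID TEXT, POS TEXT);

-- Placeholder ID and year; the toy rows are not real statistics.
INSERT INTO Batting  VALUES ('macdougal', 2009, 2, 'WAS');
INSERT INTO Pitching VALUES ('macdougal', 2009, 1, 'CHA'), ('macdougal', 2009, 2, 'WAS');
INSERT INTO Fielding VALUES ('macdougal', 2009, 1, 'CHA', 'P'), ('macdougal', 2009, 2, 'WAS', 'P');

-- UNION drops duplicates, so each (player, year, stint, team) appears once
-- no matter how many of the three tables mention it.
CREATE TABLE MasterStint AS
  SELECT playerID, yearID, stint, teamID FROM Batting
  UNION
  SELECT playerID, yearID, stint, teamID FROM Pitching
  UNION
  SELECT playerID, yearID, stint, teamID FROM Fielding;
""")

master = conn.execute("""
    SELECT playerID, stint, teamID
    FROM MasterStint
    ORDER BY playerID, yearID, stint
""").fetchall()
print(master)
```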

      Dan Hirsch
    • Tangotiger
      Message 2 of 26 , Mar 19, 2012
        Another advantage of having a STINTS table is that you can incorporate
        minor leagues as well. If someone has a Mets-Expos-Mets need for
        same-year players, then surely someone else would have a
        Boston-Pawtucket-Boston need for same-year players. Not that a STINTS
        table *must* do that, but it *can* be used to accommodate that requirement.
        If some guy gets sent down for three weeks, say JJ Hardy, then it might be
        good to know when that happened.

        And, as I said, a StintsBatting table will give the user what he needs, if
        he needs it at that detail.

        Basically, it's a question of doing a small redesign, and this will
        localize any data quality issues in a very tight manner that will affect
        only those people who need that data.

        Tom
      • Theodore Turocy
        Message 3 of 26 , Mar 19, 2012

          Without giving away (too many) trade secrets, I can state the following about the way I manage statistical data.

          There is only one statistical table, 'players', which incorporates all batting, pitching, and fielding totals, as well as relevant metadata.  These include stints (within leagues), stints (global across all professional leagues), and first and last played dates.  These all have UUIDs assigned (and published, as those of you with Baseball ID working group affiliations know) which are guaranteed to be stable.

          In my own experience, managing 'batting', 'pitching', and 'fielding' as separate tables is of dubious value - all you can do is screw things up.  Further, absence of a record is not necessarily indicative of zeroes; these are logically different.  Within the very limited scope of MLB statistics, it is not a bad assumption that a missing record (e.g., a pitching record) means the player did not pitch; outside that scope, the assumption is questionable at best.  So schemata based on this assumption do not scale well.
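
The difference between a missing record and a zero can be sketched with a LEFT JOIN in Python's sqlite3. The table and column names below are hypothetical and the rows are made up. The query keeps the raw value (NULL when there is no pitching row) next to a COALESCE'd value that bakes in the "missing means zero" assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batting  (player_id TEXT, year INT, hr INT);
CREATE TABLE pitching (player_id TEXT, year INT, so INT);
INSERT INTO batting  VALUES ('p1', 2011, 5), ('p2', 2011, 0);
INSERT INTO pitching VALUES ('p1', 2011, 30);  -- no pitching row at all for p2
""")

rows = conn.execute("""
    SELECT b.player_id,
           p.so              AS so_raw,          -- NULL: no record exists
           COALESCE(p.so, 0) AS so_assumed_zero  -- assumption: missing = did not pitch
    FROM batting b
    LEFT JOIN pitching p
      ON p.player_id = b.player_id AND p.year = b.year
    ORDER BY b.player_id
""").fetchall()
print(rows)
```

For p2 the raw column comes back as None, which says only that no record exists; turning it into 0 is a separate decision that may be safe for MLB data but not elsewhere.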

          Similarly, having separate 'postseason' tables adds complexity without value.  A 'season_phase' column accomplishes the same much more elegantly.  Similar scoping principles can accommodate e.g., splits if desired.
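
A rough sketch of the 'season_phase' idea in Python's sqlite3, assuming a single hypothetical table and made-up phase labels ('regular', 'postseason'); the actual schema is not described in the post. The same query serves either phase by changing one parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id TEXT, year INT, season_phase TEXT, g INT, hr INT);
-- Made-up rows; the phase labels are an assumption for illustration.
INSERT INTO players VALUES
  ('p1', 2011, 'regular',    150, 30),
  ('p1', 2011, 'postseason',  10,  3),
  ('p2', 2011, 'regular',    100, 12);
""")

def hr_totals(phase):
    """Home run totals per player for one season phase."""
    return conn.execute("""
        SELECT player_id, SUM(hr)
        FROM players
        WHERE season_phase = ?
        GROUP BY player_id
        ORDER BY player_id
    """, (phase,)).fetchall()

print(hr_totals("regular"))
print(hr_totals("postseason"))
```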

          TLT
          --
          Dr Theodore L Turocy
          Chadwick Baseball Bureau



          On 19 Mar 2012, at 16:19, KJOK wrote:

           

          Rod - I didn't say it wasn't useful, just that it should NOT be in batting, pitching, fielding tables.  Same applies for the minor league data - there should be a separate table with something like:
           
          ID         Team   Stint  StartDate  EndDate
          JoeBlowID  Team1  1      19480401   19480530
          JoeBlowID  Team2  2      19480601   19480615
          JoeBlowID  Team1  3      19480616   19480825
          Putting stints in batting, pitching, fielding is just a bad kluge. 
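
A separate stints table along these lines could look like the sketch below, written with Python's sqlite3. The table name Stints, the column names, and the helper team_on are assumptions for illustration, and the second end date is read as 19480615. Dates are stored as YYYYMMDD text, so plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stints (
  playerID  TEXT,
  teamID    TEXT,
  stint     INT,
  startDate TEXT,  -- YYYYMMDD
  endDate   TEXT,  -- YYYYMMDD
  PRIMARY KEY (playerID, startDate)
);
INSERT INTO Stints VALUES
  ('JoeBlowID', 'Team1', 1, '19480401', '19480530'),
  ('JoeBlowID', 'Team2', 2, '19480601', '19480615'),
  ('JoeBlowID', 'Team1', 3, '19480616', '19480825');
""")

def team_on(player_id, date):
    """Return (teamID, stint) covering the given YYYYMMDD date, or None."""
    return conn.execute("""
        SELECT teamID, stint
        FROM Stints
        WHERE playerID = ? AND ? BETWEEN startDate AND endDate
    """, (player_id, date)).fetchone()

print(team_on("JoeBlowID", "19480610"))
print(team_on("JoeBlowID", "19480531"))  # falls in the gap between stints
```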
           
          THANKS,
          Kevin
          From: Rod Nelson <rodericnelson@...>
          To: baseball-databank@yahoogroups.com
          Sent: Monday, March 19, 2012 11:07 AM
          Subject: Re: [baseball-databank] Stints

           
          I'm really surprised that Kevin would say that stints are not useful in batting, pitching, fielding tables since I know that he has dealt with historical minor league seasons. For many years, the guides showed players in some league who appeared for multiple clubs as a single entry and their performance stats were summed. That integration is very problematic and should be avoided forevermore - no matter what Tom Tango might argue - because it makes team and league totals suspect, for one reason.   This is NOT something that should ever be contemplated. Same as with breaking out outfield by position. You won't miss it until it's gone.

          Is it worth it?  Of course it is.  But then again, we have access to a superior dataset.

          --
          Rod Nelson, Managing Editor
          The Emerald Guide to Baseball 2012
          Download it Free!  http://www.sabr.org/


          On Mon, Mar 19, 2012 at 11:55 AM, Tangotiger <tom@...> wrote:
          I agree with Kevin that it should appear in some other table.

          Unless, well, do people need to track the batting line of someone who
          played with the Mets in 1986 in Apr-May, with the Expos in Jun-Aug, then
          again with the Mets in Sept 1986, so that the two 1986 Mets lines are
          distinct?  If you have an online DB like BR.com, or BaseballCube, or
          Fangraphs, sure, maybe.  For the rest of us though?

          If ever we were to create a "splits" table, say, performance by home/away,
          we wouldn't REALLY do an Apr-May Home split AND a Sept Home split, would
          we?  (I mean, if you wanted to do that, you'd want it for every player, so
          you have a Home-month split, and you wouldn't want it only for those who
          have two distinct stints.)

          We've seen already that it's tripped up Lahman for the 2011 data, and it's
          a lot of extra effort to get it right (and based on the recent post, it's
          still not right).  (Not criticizing Sean, just pointing out that it trips
          up even the most diligent of researchers.)  Imagine the rest of us who
          wouldn't necessarily always remember to link the stint IDs.  (I fell
          into this trap recently.)

          So, I go back to "is it worth it"?  Is it worth Sean's time to get it
          right, is it worth our time to validate it, and is it worth our time to
          first sum at the player-year-team level 99.99% of the time because we just
          don't need that stint ID?

          Note: The stint ID's existence predates my involvement with the DB, so I'm
          sure this was covered many years ago.

          Tom







        • anson2995
          Message 4 of 26 , Mar 20, 2012
            "Tangotiger" <tom@...> wrote:
            > I'm asking if the cost of that use is a justifiable cost.
            > We had bad data in the first release, and we have bad data
            > according to the recent post, and it's traced to the stint ID.

            The problem of bad data doesn't have anything to do with the database design. It's 100% attributable to me, the person who processed most of the updates. It's the first time in several years that I made the offseason updates rather than Sean Forman, and the scripts I used to make and check the updates were outdated. It shouldn't be a problem in the future.

            I think it's much more labor intensive to use and maintain a table that lists start and end dates in a separate transaction file, especially if we continue to maintain batting/pitching/fielding as separate files.

            But I'm certainly open to further discussion, on this or other design issues.

            Regards,
            Sean Lahman
          • Paul Golba
            Message 5 of 26 , Mar 24, 2012
              My vote is to keep the stints as is. 

              From a database perspective, it is much, much easier to take a stint divided table and sum it up to get the overall numbers than it is to try to take a combined table and then split it back up using a separate stint table.  I suspect it would be harder for the administrator to maintain two separate tables.
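
The "sum it up" direction is a single GROUP BY, as in this sketch using Python's sqlite3. Column names are assumed to follow the Lahman layout and the rows are made up. The second query sums at the player-year-team level, which merges two stints with the same club.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting (playerID TEXT, yearID INT, stint INT, teamID TEXT, AB INT, H INT);
-- Made-up player: AAA, then BBB, then back to AAA in one season.
INSERT INTO Batting VALUES
  ('doejo01', 1986, 1, 'AAA', 100, 30),
  ('doejo01', 1986, 2, 'BBB', 200, 50),
  ('doejo01', 1986, 3, 'AAA',  50, 10);
""")

# Season totals: collapse all stints for a player-year.
season = conn.execute("""
    SELECT playerID, yearID, SUM(AB), SUM(H)
    FROM Batting
    GROUP BY playerID, yearID
""").fetchall()

# Player-year-team totals: stints 1 and 3 with AAA become one line.
by_team = conn.execute("""
    SELECT playerID, yearID, teamID, SUM(AB), SUM(H)
    FROM Batting
    GROUP BY playerID, yearID, teamID
    ORDER BY teamID
""").fetchall()

print(season)
print(by_team)
```

Going the other way, from a combined line back to per-stint lines, is not possible from the totals alone, which is the asymmetry the paragraph above points to.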

              From a baseball perspective, the stint field is valuable to determine how a player moved from team to team during a season.  This is pretty basic information.  Does everyone need this information?  No.  Is it useful for people who do need this information?  Yes.

              Also, along with the team discrepancies on the stints that I noted to start this (unexpected) thread, there are also several hundred pitching stints in the years 2009-2011 that did not have a Batting record at all.  Almost all of them are in the AL and I suspect none of the pitchers involved ever batted.  This is not a huge deal, except that for all other seasons if a playerID had a Pitching record he always had a Batting record, even if he never batted.  You may already be aware of it at this point, but I mention it anyway.
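
A check for pitching stints with no matching Batting record can be written as an anti-join. The sketch below uses Python's sqlite3 with assumed Lahman-style column names and made-up rows; it is not the query actually used to find the discrepancy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting  (playerID TEXT, yearID INT, stint INT, teamID TEXT);
CREATE TABLE Pitching (playerID TEXT, yearID INT, stint INT, teamID TEXT);
-- Made-up rows: pitcher 'b' has a Pitching line but no Batting line.
INSERT INTO Pitching VALUES ('a', 2010, 1, 'XXX'), ('b', 2010, 1, 'XXX');
INSERT INTO Batting  VALUES ('a', 2010, 1, 'XXX');
""")

# LEFT JOIN keeps every pitching stint; rows where the Batting side is NULL
# are the stints that have no Batting record.
missing = conn.execute("""
    SELECT p.playerID, p.yearID, p.stint, p.teamID
    FROM Pitching p
    LEFT JOIN Batting b
      ON b.playerID = p.playerID AND b.yearID = p.yearID AND b.stint = p.stint
    WHERE b.playerID IS NULL
    ORDER BY p.yearID, p.playerID, p.stint
""").fetchall()
print(missing)
```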

              Paul Golba





            • Tangotiger
              Message 6 of 26 , Mar 24, 2012
                Paul,

                Your post makes clear why we do *not* want to keep the stints as-is.

                The requirement about chrono-stints can already be addressed by a Stints
                table that shows the stint order. Several posters have already responded
                positively to this.

                The "batting" table's dual role has already caused problems in the past.
                I think the official MLB position is that the "batting" table is the "all
                appearances" table, so that any game played gets recorded with a "batting"
                record, even if the player didn't bat. (I'm not exactly sure about this, as
                I'm going on memory here, but it's consistent with players who didn't bat
                still having a batting record.)

                Anyway, from a database perspective, we don't need to have a stint denoted
                in the batting *and* pitching *and* fielding tables, and ensure it
                matches. The key fields of a table are supposed to identify records
                without being so rigid that you have to alter the key whenever you find
                the data needs to be updated. That's why playerID fields should never
                change, even if a pitcher's name gets changed. You don't want to have the
                stint as a key field, if it means that it may change if we have new
                information. Imagine we introduce minor league data. Now, you've got
                MASSIVE changes in key fields for tons of players across multiple tables.
                (Think of players like JJ Hardy.)

                At least, with a Stints table, it will be localized to a single table,
                whose entire purpose is to track that. Indeed, the stintID wouldn't
                even need to be a key field.

                Had we started with a clean slate, the Stints would be treated
                equivalently to Home/Away splits or Inning splits or Starter/Relief
                splits. They'd be part of a child table.

                Tom


                ---------------------------------------------
                The Book--Playing The Percentages In Baseball
                http://www.InsideTheBook.com
              • chrislambrou
                Message 7 of 26 , Mar 26, 2012
                  Does anyone have a comment on the reply below?

                  I'm always running into problems with appearances and have to use the fielding table. Not the best table to JOIN with since almost every player has multiple records per season.
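
The JOIN problem described here can be sketched with Python's sqlite3, assuming Lahman-style column names (with POS for fielding position) and made-up rows. Joining Batting straight to Fielding repeats the batting line once per position, so the Fielding rows are collapsed to one row per player-year-stint first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batting  (playerID TEXT, yearID INT, stint INT, teamID TEXT, AB INT);
CREATE TABLE Fielding (playerID TEXT, yearID INT, stint INT, teamID TEXT, POS TEXT, G INT);
-- Made-up player with one batting line and three fielding positions.
INSERT INTO Batting  VALUES ('doejo01', 2011, 1, 'AAA', 400);
INSERT INTO Fielding VALUES
  ('doejo01', 2011, 1, 'AAA', '1B', 50),
  ('doejo01', 2011, 1, 'AAA', '3B', 60),
  ('doejo01', 2011, 1, 'AAA', 'LF', 20);
""")

# Naive join: the single batting row is repeated for each fielding row.
naive_ab = conn.execute("""
    SELECT SUM(b.AB)
    FROM Batting b
    JOIN Fielding f
      ON f.playerID = b.playerID AND f.yearID = b.yearID AND f.stint = b.stint
""").fetchone()[0]

# Collapse Fielding to one row per player-year-stint before joining.
safe_ab = conn.execute("""
    SELECT SUM(b.AB)
    FROM Batting b
    JOIN (SELECT playerID, yearID, stint, COUNT(*) AS n_pos
          FROM Fielding
          GROUP BY playerID, yearID, stint) f
      ON f.playerID = b.playerID AND f.yearID = b.yearID AND f.stint = b.stint
""").fetchone()[0]

print(naive_ab, safe_ab)
```

Note that summing fielding G across positions would not give games played, since a player can appear at more than one position in the same game; the subquery here only counts positions.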

                  Thanks,
                  -Chris

                  --- In baseball-databank@yahoogroups.com, "Tangotiger" <tom@...> wrote:
                  >
                  The "batting" table's dual role has already caused problems in the past.
                  I think the official MLB position is that the "batting" table is the "all
                  appearances" table, so that any game gets recorded with a "batting"
                  record, even if he didn't bat. (I'm not exactly sure about this, but I'm
                  going on memory here, but it's consistent with players not batting still
                  having a batting record.)
                • Tangotiger
                  Message 8 of 26 , Mar 26, 2012
                    There's an APPEARANCES table in the Lahman DB. I haven't verified it, but
                    that might help you.

                    Tom
                  • Clay Dreslough
                    Message 9 of 26 , Apr 1, 2012
                      Speaking of stints, is there a consensus on the abbreviation to use for
                      the single-game wild card round when displaying player stats?

                      For example, the current abbreviations are ALDS, NLDS, ALCS, NLCS and WS.

                      Like this:

                      http://bit.ly/HJ5beA

                      I don't have the luxury of waiting until October. So, FWIW, we're using
                      'ALWC' and 'NLWC' to label stat lines for the Wild Card Showdown.

                      Clay
                    • Clay Dreslough
                      Message 10 of 26 , Apr 6, 2012
                        I don't fully understand what is being proposed, but I just wanted to
                        take a moment to speak out in favor of backwards compatibility.

                        For example, people bought "Puresim 4" (a baseball simulation game by
                        Shaun Sullivan). It loads data in the "Lahman" format. Even though the
                        game was published in 2010, it can still load last year's database and
                        this year's database. If the stint column goes away, many users will be
                        unable to load next year's database.

                        If it's too difficult to ensure that the stint data is correct, I'd
                        rather see it 90% correct, but with documentation that (for example)
                        stint data after 2010 may not be 100% accurate.

                        Clay