
Re: [baseball-databank] Stints

  • Chris Lambrou
    Message 1 of 26, Mar 19, 2012

      Me too.  It's the only way to indicate the correct order in which a player appeared on teams within the same year.
      -Chris

      From: Alberto Perdomo <aperdomo@...>
      To: baseball-databank@yahoogroups.com
      Sent: Monday, March 19, 2012 9:58 AM
      Subject: Re: [baseball-databank] Stints

       
      Yes, I use stints to create statistics tables with the correct order.  


      On Mon, Mar 19, 2012 at 9:36 AM, Tangotiger <tom@...> wrote:
       
      I'd like to know if people actually use the stint information, or, if they
      end up summing at the player-team-year level.

      From my perspective, the value of the stint information is to get a
      chronology of team-movement. So, if a player was on the Mets in 1984, and
      started with the Mets in 1985, was traded to the Expos in 1985 and came
      back to the Mets in 1986, then it would be useful to see:
      1984 Mets
      1985 Mets
      1985 Expos
      1986 Mets

      as opposed to a random
      1984 Mets
      1985 Expos
      1985 Mets
      1986 Mets

      Is that the extent of it? Is there more use than that among the readers
      here?

      As for myself, I never need that stint information, and so, the first
      thing I have to do is sum at the player-team-year level.

      Is the stint id more trouble than it's worth?

      Tom
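To illustrate the summing step Tom describes, here is a minimal Python sketch that collapses stint rows to the player-team-year level while still keeping teams in the order the player first joined them. The row layout, the player id, the stat values, and the team codes are all made up for illustration; they are not taken from the database.

```python
from collections import defaultdict

# Hypothetical batting rows: (player_id, year, stint, team, at_bats, hits).
rows = [
    ("playerA", 1985, 1, "NYN", 200, 50),
    ("playerA", 1985, 2, "MON", 150, 45),
    ("playerA", 1985, 3, "NYN", 100, 30),
    ("playerA", 1984, 1, "NYN", 400, 110),
]

def collapse(rows):
    """Sum stats per (player, year, team), ordered by the earliest stint with each team."""
    totals = defaultdict(lambda: {"first_stint": None, "ab": 0, "h": 0})
    for player, year, stint, team, ab, h in rows:
        t = totals[(player, year, team)]
        # Remember the lowest stint number so chronology survives the collapse.
        if t["first_stint"] is None or stint < t["first_stint"]:
            t["first_stint"] = stint
        t["ab"] += ab
        t["h"] += h
    ordered = sorted(
        totals.items(),
        key=lambda item: (item[0][0], item[0][1], item[1]["first_stint"]),
    )
    return [(p, y, team, t["ab"], t["h"]) for (p, y, team), t in ordered]

for line in collapse(rows):
    print(line)
```

Sorting on the earliest stint rather than on the team code is what avoids the "random" ordering in Tom's second listing.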




    • KJOK
      Message 2 of 26, Mar 19, 2012
        Rod - I didn't say it wasn't useful, just that it should NOT be in batting, pitching, fielding tables.  Same applies for the minor league data - there should be a separate table with something like:
         
        ID         Team   Stint  StartDate  EndDate
        JoeBlowID  Team1  1      19480401   19480530
        JoeBlowID  Team2  2      19480601   19480615
        JoeBlowID  Team1  3      19480616   19480825
        Putting stints in batting, pitching, fielding is just a bad kluge. 
         
        THANKS,
        Kevin
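A separate stint table of the kind Kevin describes could be checked mechanically. The sketch below is only one possible reading of his layout: the `Stint` class, the `parse_yyyymmdd` helper, and the `check_chronology` function are hypothetical names, and dates are assumed to be eight-digit YYYYMMDD strings.

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class Stint:
    player_id: str
    team: str
    stint: int
    start: date
    end: date

def parse_yyyymmdd(text):
    """Parse an eight-digit YYYYMMDD string; reject anything else."""
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {text!r}")
    return datetime.strptime(text, "%Y%m%d").date()

def check_chronology(stints):
    """True if stint numbers follow the dates and no two stints overlap."""
    ordered = sorted(stints, key=lambda s: s.stint)
    if any(s.start > s.end for s in ordered):
        return False
    # Each stint must start after the previous one ended.
    return all(cur.start > prev.end for prev, cur in zip(ordered, ordered[1:]))

stints = [
    Stint("JoeBlowID", "Team1", 1, parse_yyyymmdd("19480401"), parse_yyyymmdd("19480530")),
    Stint("JoeBlowID", "Team2", 2, parse_yyyymmdd("19480601"), parse_yyyymmdd("19480615")),
    Stint("JoeBlowID", "Team1", 3, parse_yyyymmdd("19480616"), parse_yyyymmdd("19480825")),
]
print(check_chronology(stints))
```

A check like this would catch a mistyped date or a stint number that disagrees with the date order before the table is published.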
        From: Rod Nelson <rodericnelson@...>
        To: baseball-databank@yahoogroups.com
        Sent: Monday, March 19, 2012 11:07 AM
        Subject: Re: [baseball-databank] Stints

         
        I'm really surprised that Kevin would say that stints are not useful in batting, pitching, fielding tables since I know that he has dealt with historical minor league seasons. For many years, the guides showed players in some league who appeared for multiple clubs as a single entry and their performance stats were summed. That integration is very problematic and should be avoided forevermore - no matter what Tom Tango might argue - because it makes team and league totals suspect, for one reason.   This is NOT something that should ever be contemplated. Same as with breaking out outfield by position. You won't miss it until it's gone.

        Is it worth it?  Of course it is.  But then again, we have access to a superior dataset.

        --
        Rod Nelson, Managing Editor
        The Emerald Guide to Baseball 2012
        Download it Free!  http://www.sabr.org/


        On Mon, Mar 19, 2012 at 11:55 AM, Tangotiger <tom@...> wrote:
        I agree with Kevin that it should appear in some other table.

        Unless, well, do people need to track the batting line of someone who
        played with the Mets in 1986 in Apr-May, with the Expos in Jun-Aug, then
        again with the Mets in Sept 1986, so that the two 1986 Mets lines are
        distinct?  If you have an online DB like BR.com, or BaseballCube, or
        Fangraphs, sure, maybe.  For the rest of us though?

        If ever we were to create a "splits" table, say, performance by home/away,
        we wouldn't REALLY do an Apr-May Home split AND a Sept Home split, would
        we?  (I mean, if you wanted to do that, you'd want it for every player, so
        you have a Home-month split, and you wouldn't want it only for those who
        have two distinct stints.)

        We've seen already that it's tripped up Lahman for the 2011 data, and it's
        a lot of extra effort to get it right (and based on the recent post, it's
        still not right).  (Not criticizing Sean, just pointing out that it trips
        up even the most diligent of researchers.)  Imagine the rest of us who
        wouldn't necessarily always remember to link the stint IDs.  (I fell into
        this trap recently.)

        So, I go back to "is it worth it"?  Is it worth Sean's time to get it
        right, is it worth our time to validate it, and is it worth our time to
        first sum at the player-year-team level 99.99% of the time because we just
        don't need that stint id?

        Note: The stint id's existence predates my involvement with the DB, so I'm
        sure this has been covered many years ago.

        Tom
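The trap of forgetting to link the stint IDs can be shown with a small in-memory SQLite example. This is a sketch under simplifying assumptions: the table and column names are invented for illustration, the numbers are made up, and each table is reduced to one row per stint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE batting  (player_id TEXT, year INTEGER, stint INTEGER, team TEXT, ab INTEGER);
    CREATE TABLE fielding (player_id TEXT, year INTEGER, stint INTEGER, team TEXT, g INTEGER);
    INSERT INTO batting VALUES ('playerA', 1986, 1, 'NYN', 100),
                               ('playerA', 1986, 2, 'MON', 150),
                               ('playerA', 1986, 3, 'NYN',  40);
    INSERT INTO fielding VALUES ('playerA', 1986, 1, 'NYN', 30),
                                ('playerA', 1986, 2, 'MON', 45),
                                ('playerA', 1986, 3, 'NYN', 12);
""")

# Join on player, year and team only: the two NYN stints pair up with each other.
join_without_stint = """
    SELECT COUNT(*), SUM(b.ab)
    FROM batting b JOIN fielding f
      ON b.player_id = f.player_id AND b.year = f.year AND b.team = f.team
"""
# Adding the stint to the join condition restores one row per stint.
join_with_stint = join_without_stint + " AND b.stint = f.stint"

print(conn.execute(join_without_stint).fetchone())
print(conn.execute(join_with_stint).fetchone())
```

Without the stint in the join, the player's two stints with the same team are cross-matched, so the joined result has extra rows and any summed at-bats are inflated.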





      • Sean Forman
        Message 3 of 26, Mar 19, 2012
          Unless you want to show the stat lines for a player like Rob Ducey or Matt Luke who had multiple stints with a single team.

          sean
          ---
          Sean Forman
          Sports Reference LLC, President
          http://www.sports-reference.com/



          On Mon, Mar 19, 2012 at 12:09 PM, Mike Emeigh <mwe55innc@...> wrote:
           

          I agree that stint data should be in a separate table - it's not going to vary from, say, batting to pitching to fielding, I wouldn't think.

          Where it's known, it might also be useful to have a start date and end date for each stint.

          Sent from my iPhone



        • Derek Adair
          Message 4 of 26, Mar 19, 2012
            So... I will admit up front that I am biased here, since I added them to
            the database in the first place. I had two motivations to add them. First,
            I was annoyed by most of the database sites out there listing team
            breakdowns randomly before this information was available. Second, I was
            in a Strat-O-Matic retro league that was NL-only (and a similar AL
            counterpart). The league had crazy complicated eligibility rules revolving
            around stints, but looking up eligibility was difficult at best. So I
            added the concept of stint to the database. This allowed the league I was
            in to run successfully, but more importantly enabled sites like ESPN and
            b-r.com (and the myriad of smaller similar sites using this data) to order
            stints correctly in their views.

            Regards,
            Derek

          • Derek Adair
            Message 5 of 26, Mar 19, 2012
              Exactly, Sean. Was just coming to post this specifically. Stint being
              included in the pitching/fielding/batting subtables wasn't a kluge. It was
              a necessity to handle the players listed below. Yes, they are rare. But
              throwing away cases because they're rare seems ill-advised.

              If you do care about the stints, they're there for you. If you don't care,
              collapsing is straightforward. If we remove the stints, or move them to
              a different table, the folks who do care about the Rob Duceys have zero
              way to get the info they need.

              Regards,
              Derek

              On Mon, 19 Mar 2012, Sean Forman wrote:

              > Unless you want to show the stat lines for a player like Rob Ducey or Matt
              > Luke who had multiple stints with a single team.
              >
              > sean
              > ---
              > Sean Forman
              > Sports Reference LLC, President
              > http://www.sports-reference.com/
            • Mike Emeigh
              Message 6 of 26, Mar 19, 2012
                This goes back to Tom's original question - how are people using stints?

                In a volunteer effort one has to evaluate the cost of maintaining and validating the information vs the value of providing it. If split data across multiple stints takes several people several weeks to validate each year, and isn't of value to anyone except a couple of people, does it really make sense to keep it? (Not saying those numbers are correct, just presenting it as a for-example).

                Sent from my iPhone



              • railsplitter_44
                Message 7 of 26, Mar 19, 2012
                  Just for the sake of a survey, I also use the stints in the Batting, Fielding, and Pitching tables for my website. Not only to correctly order the teams in each season, but to also correctly join the tables for players who played for the same team in 2 different stints.

                  I may be biased, but changing the structure of the database may cause many of us to have to change our existing code.

                  I don't imagine it would be too difficult to make a Master Stint table that (for example) shows Mike MacDougal CHA 1, Mike MacDougal WAS 2 and then apply it to Batting, Pitching, and Fielding tables.

                  Dan Hirsch
                • Rod Nelson
                  Message 8 of 26, Mar 19, 2012
                    Zactly.  We agree. That's the ideal format.

                    Rod

                    On Mon, Mar 19, 2012 at 12:19 PM, KJOK <kjokbaseball@...> wrote:


                    Rod - I didn't say it wasn't useful, just that it should NOT be in batting, pitching, fielding tables.  Same applies for the minor league data - there should be a separate table with something like:
                     
                    ID         Team   Stint  StartDate  EndDate
                    JoeBlowID  Team1  1      19480401   19480530
                    JoeBlowID  Team2  2      19480601   19480615
                    JoeBlowID  Team1  3      19480616   19480825
                    Putting stints in batting, pitching, fielding is just a bad kluge. 
                     
                    THANKS,
                    Kevin

                  • Tangotiger
                    Message 9 of 26, Mar 19, 2012
                      Right, Mike has the spirit of my question.

                      When I ask if it's worth it, I'm not asking if it's useful. Clearly, it
                      has many uses, be it the chrono-order of listing teams, or, for those who
                      need it, splitting the stints of guys who left a team and came back in the
                      same year.

                      I'm asking if the cost of that use is a justifiable cost.

                      We had bad data in the first release, and we have bad data according to
                      the recent post, and it's traced to the stint ID.

                      One alternative is as Kevin suggested, and simply having a separate table,
                      that lists the stints for the players. It keeps it away from that
                      batting, pitching, fielding table. And, you can even enhance it by
                      including dates. So, this handles the chrono-order need that some have.

                      That leaves us with the guys who left a team and came back in the same
                      year, while playing for some other MLB team in the same year. Rob Ducey
                      and whatever other players are similarly affected. Do we need to see his
                      batting and fielding line split between stints? If not, then a separate
                      STINTS table handles that as well. Indeed, if you really really need it,
                      you can have a StintsBatting table as an offshoot of the Batting table.

                      Think of the StintsBatting table as a split table: just as you might have
                      a HomeawayBatting table, you have a StintsBatting table, and it'll be
                      comprised of Rob Ducey and the other handful of players around.

                      Everyone gets what they want here, without the headache of having to
                      remember to sum the Batting, Pitching, Fielding tables, and without
                      having to contend with the bad data we've seen so far because of the
                      time-cost associated with it.

                      Tom
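One way to picture the proposal is a Batting table with one row per player-year-team, plus a StintsBatting table that holds per-stint splits only for the few players who need them. The Python sketch below uses plain dictionaries as stand-ins for those tables; the keys, stat names, and numbers are hypothetical, and the check simply confirms that the splits add up to the parent row.

```python
# Stand-in for Batting: one total line per (player, year, team).
batting = {
    ("playerA", 1986, "NYN"): {"ab": 140, "h": 38},
    ("playerA", 1986, "MON"): {"ab": 150, "h": 41},
}
# Stand-in for StintsBatting: splits keyed by (player, year, team, stint),
# present only where a player had more than one stint with the same team.
stints_batting = {
    ("playerA", 1986, "NYN", 1): {"ab": 100, "h": 28},
    ("playerA", 1986, "NYN", 3): {"ab": 40, "h": 10},
}

def mismatched_splits(batting, stints_batting):
    """Return the player-year-team keys whose splits do not add up to the total."""
    sums = {}
    for (player, year, team, _stint), line in stints_batting.items():
        acc = sums.setdefault((player, year, team), {})
        for stat, value in line.items():
            acc[stat] = acc.get(stat, 0) + value
    return sorted(key for key, acc in sums.items() if batting.get(key) != acc)

print(mismatched_splits(batting, stints_batting))
```

With this split, any data-quality problem in the stint lines shows up as a mismatch against the summed row instead of silently changing the totals most users read.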
                    • KJOK
                      Message 10 of 26, Mar 19, 2012
                        Exactly. For example, LH/RH splits are probably much more useful, but we don't have those in this database, primarily, I suspect, because they can be derived from Retrosheet data.  The same applies to the Rob Ducey two-teams-in-one-year guys: for any player post-1950 at least, you can derive their stint-segregated batting, fielding, and pitching info from Retrosheet.
                        THANKS,
                        Kevin
                        From: Mike Emeigh <mwe55innc@...>
                        To: "baseball-databank@yahoogroups.com" <baseball-databank@yahoogroups.com>
                        Sent: Monday, March 19, 2012 11:35 AM
                        Subject: Re: [baseball-databank] Stints

                         
                        This goes back to Tom's original question - how are people using stints?

                        In a volunteer effort one has to evaluate the cost of maintaining and validating the information vs the value of providing it. If split data across multiple stints takes several people several weeks to validate each year, and isn't of value to anyone except a couple of people, does it really make sense to keep it? (Not saying those numbers are correct, just presenting it as a for-example).





                      • Tangotiger
                        Message 11 of 26, Mar 19, 2012
                          Another advantage of having a STINTS table is that you can incorporate
                          minor leagues as well. If someone has a Mets-Expos-Mets need for
                          same-year players, then surely someone else would have a
                          Boston-Pawtucket-Boston need for same-year players. Not that the STINTS
                          table *must* do that, but it *can* be used to accommodate that requirement.
                          Some guy gets sent down for three weeks, say JJ Hardy, then it might be
                          good to know when that happened.

                          And, as I said, a StintsBatting table will give the user what he needs, if
                          he needs it at that detail.

                          Basically, it's a question of doing a small redesign, and this will
                          localize any data quality issues in a very tight manner that will affect
                          only those people who need that data.

                          Tom
                        • Theodore Turocy
                          Message 12 of 26, Mar 19, 2012

                            Without giving away (too many) trade secrets, I can state the following about the way I manage statistical data.

                            There is only one statistical table, 'players', which incorporates all batting, pitching, and fielding totals, as well as relevant metadata.  These include stints (within leagues), stints (global across all professional leagues), and first and last played dates.  These all have UUIDs assigned (and published, as those of you with Baseball ID working group affiliations know) which are guaranteed to be stable.

                            In my own experience, managing 'batting', 'pitching', and 'fielding' as separate tables is of dubious value - all you can do is screw things up.  Further, absence of a record is not necessarily indicative of zeroes; these are logically different.  Within the very limited scope of MLB statistics, it is not a bad assumption that a missing, e.g., pitching record means the player did not pitch; outside that, it's a questionable to dubious assumption.  So schemata based on this assumption do not scale well.

                            Similarly, having separate 'postseason' tables adds complexity without value.  A 'season_phase' column accomplishes the same much more elegantly.  Similar scoping principles can accommodate e.g., splits if desired.
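
As a rough sketch of the two ideas above (a 'season_phase' column instead of separate postseason tables, and a missing value being different from a zero), here is one possible layout in Python's sqlite3. Only the names 'players' and 'season_phase' come from the description above; the other column names and all rows are hypothetical and say nothing about the actual Chadwick schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE players (
        playerID     TEXT,
        yearID       INTEGER,
        teamID       TEXT,
        season_phase TEXT,     -- e.g. 'regular', 'ALDS', 'WS'
        b_AB         INTEGER,  -- NULL = no batting data recorded
        b_H          INTEGER,
        p_IPouts     INTEGER   -- NULL = no pitching data recorded, not zero
    )
""")

# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("doejo01", 2011, "NYA", "regular", 500, 150, None),
        ("doejo01", 2011, "NYA", "ALDS", 20, 5, None),
        ("roeja01", 2011, "NYA", "regular", 0, 0, 600),
    ],
)

def hits_by_phase(conn, player_id, year_id):
    """Hits per season phase, all read from the one table."""
    rows = conn.execute(
        "SELECT season_phase, SUM(b_H) FROM players "
        "WHERE playerID = ? AND yearID = ? GROUP BY season_phase",
        (player_id, year_id),
    )
    return dict(rows)

def pitching_outs(conn, player_id, year_id):
    """Total outs recorded, or None when no pitching data exists at all."""
    (total,) = conn.execute(
        "SELECT SUM(p_IPouts) FROM players WHERE playerID = ? AND yearID = ?",
        (player_id, year_id),
    ).fetchone()
    return total
```

Here pitching_outs returns None rather than 0 for a player with no pitching data, so "unknown" and "did not pitch" stay distinguishable.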

                            TLT
                            --
                            Dr Theodore L Turocy
                            Chadwick Baseball Bureau



                            On 19 Mar 2012, at 16:19, KJOK wrote:

                             

                            Rod - I didn't say it wasn't useful, just that it should NOT be in batting, pitching, fielding tables.  Same applies for the minor league data - there should be a separate table with something like:
                             
                            ID         Team   Stint  StartDate  EndDate
                            JoeBlowID  Team1  1      19480401   19480530
                            JoeBlowID  Team2  2      19480601   19480615
                            JoeBlowID  Team1  3      19480616   19480825
                            Putting stints in batting, pitching, fielding is just a bad kluge. 
                             
                            THANKS,
                            Kevin
                            From: Rod Nelson <rodericnelson@...>
                            To: baseball-databank@yahoogroups.com
                            Sent: Monday, March 19, 2012 11:07 AM
                            Subject: Re: [baseball-databank] Stints

                             
                            I'm really surprised that Kevin would say that stints are not useful in batting, pitching, fielding tables since I know that he has dealt with historical minor league seasons. For many years, the guides showed players in some league who appeared for multiple clubs as a single entry and their performance stats were summed. That integration is very problematic and should be avoided forevermore - no matter what Tom Tango might argue - because it makes team and league totals suspect, for one reason.   This is NOT something that should ever be contemplated. Same as with breaking out outfield by position. You won't miss it until it's gone.

                            Is it worth it?  Of course it is.  But then again, we have access to a superior dataset.

                            --
                            Rod Nelson, Managing Editor
                            The Emerald Guide to Baseball 2012
                            Download it Free!  http://www.sabr.org/


                            On Mon, Mar 19, 2012 at 11:55 AM, Tangotiger <tom@...> wrote:
                            I agree with Kevin that it should appear in some other table.

                            Unless, well, do people need to track the batting line of someone who
                            played with the Mets in 1986 in Apr-May, with the Expos in Jun-Aug, then
                            again with the Mets in Sept 1986, so that the two 1986 Mets lines are
                            distinct?  If you have an online DB like BR.com, or BaseballCube, or
                            Fangraphs, sure, maybe.  For the rest of us though?

                            If ever we were to create a "splits" table, say, performance by home/away,
                            we wouldn't REALLY do a Apr-May Home split AND a Sept Home split, would
                            we?  (I mean, if you wanted to do that, you'd want it for every player, so
                            you have a Home-month split, and you wouldn't want it only for those who
                            have two distinct stints.)

                            We've seen already that it's tripped up Lahman for the 2011 data, and it's
                            a lot of extra effort to get it right (and based on the recent post, it's
                            still not right).  (Not criticizing Sean, just pointing out that it trips
                            up even the most diligent of researchers.)  Imagine the rest of us who
                            wouldn't necessarily always remember to link the stint IDs.  (I fell into
                            this trap recently.)

                            So, I go back to "is it worth it"?  Is it worth Sean's time to get it
                            right, is it worth our time to validate it, and is it worth our time to
                            first sum at the player-year-team level 99.99% of the time because we just
                            don't need that stint id?

                            Note: The stint id's existence predates my involvement with the DB, so I'm
                            sure this has been covered many years ago.

                            Tom







                          • anson2995
                            Message 13 of 26 , Mar 20, 2012
                              "Tangotiger" <tom@...> wrote:
                              > I'm asking if the cost of that use is a justifiable cost.
                              > We had bad data in the first release, and we have bad data
                              > according to the recent post, and it's traced to the stint ID.

                              The problem of bad data doesn't have anything to do with the database design. It's 100% attributable to me, the person who processed most of the updates. It's the first time in several years that I made the offseason updates rather than Sean Forman, and the scripts I used to make and check the updates were outdated. It shouldn't be a problem in the future.

                              I think it's much more labor intensive to use and maintain a table that lists start and end dates in a separate transaction file, especially if we continue to maintain batting/pitching/fielding as separate files.

                              But I'm certainly open to further discussion, on this or other design issues.

                              Regards,
                              Sean Lahman
                            • Paul Golba
                              Message 14 of 26 , Mar 24, 2012
                                My vote is to keep the stints as is. 

                                From a database perspective, it is much, much easier to take a stint divided table and sum it up to get the overall numbers than it is to try to take a combined table and then split it back up using a separate stint table.  I suspect it would be harder for the administrator to maintain two separate tables.
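
To illustrate the "easy direction", here is a small sketch in plain Python. The row layout and the numbers are hypothetical, chosen only to show the roll-up.

```python
from collections import defaultdict

# Hypothetical stint-level batting rows: (playerID, yearID, stint, teamID, AB, H)
stint_rows = [
    ("doejo01", 1985, 1, "NYN", 120, 30),
    ("doejo01", 1985, 2, "MON", 200, 60),
    ("doejo01", 1985, 3, "NYN", 80, 20),
]

def sum_stints(rows, by_team=True):
    """Collapse stint rows to one line per player-year(-team)."""
    totals = defaultdict(lambda: [0, 0])
    for player_id, year_id, _stint, team_id, ab, h in rows:
        key = (player_id, year_id, team_id) if by_team else (player_id, year_id)
        totals[key][0] += ab
        totals[key][1] += h
    return {key: tuple(value) for key, value in totals.items()}
```

Going the other way is not possible from the summed rows alone: once the two NYN stints are merged into a single 200 AB line, nothing in that line says how to split it back.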

                                From a baseball perspective, the stint field is valuable to determine how a player moved from team to team during a season.  This is pretty basic information.  Does everyone need this information?  No.  Is it useful for people who do need this information?  Yes.

                                Also, along with the team discrepancies on the stints that I noted to start this (unexpected) thread, there are also several hundred pitching stints in the years 2009-2011 that did not have a Batting record at all.  Almost all of them are in the AL and I suspect none of the pitchers involved ever batted.  This is not a huge deal, except that for all other seasons if a playerID had a Pitching record he always had a Batting record, even if he never batted.  You may already be aware of it at this point, but I mention it anyway.
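
A check along these lines can be sketched as a simple set difference in Python. The key layout (playerID, yearID, stint) and the sample keys are assumptions for illustration, not actual records from the database.

```python
def pitching_without_batting(pitching_keys, batting_keys):
    """Return pitching stints with no batting record for the same key."""
    return sorted(set(pitching_keys) - set(batting_keys))

# Hypothetical keys: (playerID, yearID, stint)
pitching = [("doejo01", 2010, 1), ("roeja01", 2010, 1), ("roeja01", 2010, 2)]
batting = [("doejo01", 2010, 1), ("roeja01", 2010, 2)]
missing = pitching_without_batting(pitching, batting)
```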

                                Paul Golba





                              • Tangotiger
                                Message 15 of 26 , Mar 24, 2012
                                  Paul,

                                  Your post makes clear why we do *not* want to keep the stints as-is.

                                  The requirement about chrono-stints can already be addressed by a Stints
                                  table that shows the stint order. Several posters have already responded
                                  positively to this.

                                  The "batting" table's dual role has already caused problems in the past.
                                  I think the official MLB position is that the "batting" table is the "all
                                  appearances" table, so that any game a player appears in gets recorded with
                                  a "batting" record, even if he didn't bat. (I'm not exactly sure about
                                  this, as I'm going on memory here, but it's consistent with players who
                                  didn't bat still having a batting record.)

                                  Anyway, from a database perspective, we don't need to have a stint denoted
                                  in the batting *and* pitching *and* fielding tables, and ensure it
                                  matches. The key fields of a table are supposed to identify records in a
                                  way that doesn't force you to alter the key when the data needs to be
                                  updated. That's why playerID fields should never change, even if a
                                  pitcher's name gets changed. You don't want to have the stint as a key
                                  field if it may change when we get new information. Imagine we introduce
                                  minor league data. Now, you've got MASSIVE changes in key fields for tons
                                  of players across multiple tables. (Think of players like JJ Hardy.)

                                  At least, with a Stints table, it will be localized to a single table,
                                  whose entire purpose is to track that. Indeed, you wouldn't even need the
                                  stintID to be a key field.
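
One way to read "not a key field" is that the stint number can be derived from dates whenever it is needed rather than stored. A small Python sketch under that assumption follows; the row layout, IDs, and dates are all hypothetical.

```python
from itertools import groupby

def number_stints(rows):
    """Assign stint numbers per player-year by sorting on start date.

    rows: iterable of (playerID, yearID, teamID, startDate) with ISO dates.
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[3]))
    numbered = []
    for _key, group in groupby(ordered, key=lambda r: (r[0], r[1])):
        for stint, (player_id, year_id, team_id, _start) in enumerate(group, 1):
            numbered.append((player_id, year_id, stint, team_id))
    return numbered

# Majors-only data: a single stint for the season.
mlb_only = [("roeja01", 2010, "BOS", "2010-04-04")]

# Same season once (hypothetical) minor league rows are added, in any order.
with_minors = [
    ("roeja01", 2010, "BOS", "2010-07-02"),
    ("roeja01", 2010, "BOS", "2010-04-04"),
    ("roeja01", 2010, "PAW", "2010-06-11"),
]
```

Adding the minor league rows renumbers the season inside this one derived view, and no key in the batting, pitching, or fielding tables has to change.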

                                  Had we started with a clean slate, the Stints would be treated
                                  equivalently to Home/Away splits or Inning splits or Starter/Relief
                                  splits. They'd be part of a child table.

                                  Tom





                                  ---------------------------------------------
                                  The Book--Playing The Percentages In Baseball
                                  http://www.InsideTheBook.com
                                • chrislambrou
                                  Message 16 of 26 , Mar 26, 2012
                                    Does anyone have a comment on the reply below?

                                    I'm always running into problems with appearances and have to use the fielding table. Not the best table to JOIN with since almost every player has multiple records per season.
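
For what it's worth, the usual workaround is to collapse the fielding rows to one entry per player-year before joining. A rough Python sketch, with hypothetical row layouts and made-up numbers:

```python
from collections import defaultdict

# Hypothetical fielding rows: (playerID, yearID, stint, teamID, POS, G)
fielding = [
    ("doejo01", 2011, 1, "NYA", "SS", 90),
    ("doejo01", 2011, 1, "NYA", "3B", 25),
    ("doejo01", 2011, 2, "BOS", "SS", 30),
]
# Hypothetical batting totals keyed by (playerID, yearID): at-bats
batting = {("doejo01", 2011): 520}

def games_by_position(rows):
    """One entry per player-year: {POS: games}, summed across stints."""
    out = defaultdict(lambda: defaultdict(int))
    for player_id, year_id, _stint, _team, pos, g in rows:
        out[(player_id, year_id)][pos] += g
    return {key: dict(value) for key, value in out.items()}

positions = games_by_position(fielding)
# One joined row per player-year instead of one per fielding record.
joined = {key: (ab, positions.get(key, {})) for key, ab in batting.items()}
```

Note that games at each position cannot simply be added up to get games played, since a player can field more than one position in the same game.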

                                    Thanks,
                                    -Chris

                                    --- In baseball-databank@yahoogroups.com, "Tangotiger" <tom@...> wrote:
                                    > The "batting" table's dual role has already caused problems in the past.
                                    > I think the official MLB position is that the "batting" table is the "all
                                    > appearances" table, so that any game a player appears in gets recorded with
                                    > a "batting" record, even if he didn't bat. (I'm not exactly sure about
                                    > this, as I'm going on memory here, but it's consistent with players who
                                    > didn't bat still having a batting record.)
                                  • Tangotiger
                                    Message 17 of 26 , Mar 26, 2012
                                      There's an APPEARANCES table in the Lahman DB. I haven't verified it, but
                                      that might help you.

                                      Tom

                                    • Clay Dreslough
                                      Message 18 of 26 , Apr 1 1:19 PM
                                        Speaking of stints, is there a consensus on the abbreviation to use for
                                        the single-game wild card round when displaying player stats?

                                        For example, the current abbreviations are ALDS, NLDS, ALCS, NLCS and WS.

                                        Like this:

                                        http://bit.ly/HJ5beA

                                        I don't have the luxury of waiting until October. So, FWIW, we're using
                                        'ALWC' and 'NLWC' to label stat lines for the Wild Card Showdown.

                                        Clay
                                      • Clay Dreslough
                                        Message 19 of 26 , Apr 6 3:33 PM
                                          I don't fully understand what is being proposed, but I just wanted to
                                          take a moment to speak out in favor of backwards compatibility.

                                          For example, people bought "Puresim 4" (a baseball simulation game by
                                          Shaun Sullivan). It loads data in the "Lahman" format. Even though the
                                          game was published in 2010, it can still load last year's database and
                                          this year's database. If the stint column goes away, many users will be
                                          unable to load next year's database.

                                          If it's too difficult to ensure that the stint data is correct, I'd
                                          rather see it 90% correct, but with documentation that (for example)
                                          stint data after 2010 may not be 100% accurate.

                                          Clay