
Re: Proposed Data Additions

  • tangotiger
    Message 1 of 11 , May 8, 2004
      Great stuff! My comments are interspersed...

      --- In baseball-databank@yahoogroups.com, "tjruane" <truane@v...>
      wrote:
      > gsl - starts against LHP
      > gsr - starts against RHP

      Now this is an interesting category. One thing that I do that others
      may like is simply to track %PA against LH and RH. (You only need one of them
      of course). This applies for batters and pitchers *and* fielders,
      and shows how much the platoon effect is working for or against the
      player. For fielders, it should only be on batted balls in park
      (excludes HR), excluding bunts.

      However, this might be too much work for the payoff, and I'm not sure
      whether many others will get the same benefit out of this that I have.
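As a rough illustration of the %PA idea above, here is a minimal Python sketch. The record layout (one `(player_id, opp_hand)` tuple per plate appearance) and the function name are assumptions made for this example; they do not describe any existing file. For a batter, `opp_hand` would be the pitcher's throwing hand; for a pitcher or fielder, it would be the side the batter hits from.

```python
from collections import Counter

def pct_pa_vs_lh(plate_appearances):
    """Return {player_id: share of PA against left-handers}.

    plate_appearances: iterable of (player_id, opp_hand) tuples, where
    opp_hand is "L" or "R" (hypothetical layout for this sketch).
    """
    total = Counter()
    vs_lh = Counter()
    for player_id, opp_hand in plate_appearances:
        total[player_id] += 1
        if opp_hand == "L":
            vs_lh[player_id] += 1
    # Only one of the two shares is needed: %RH = 1 - %LH.
    return {p: vs_lh[p] / total[p] for p in total}

sample = [
    ("batter_a", "L"), ("batter_a", "R"), ("batter_a", "R"), ("batter_a", "R"),
    ("batter_b", "L"), ("batter_b", "L"),
]
shares = pct_pa_vs_lh(sample)
```

For fielders, the input would first have to be filtered down to batted balls in park, excluding bunts, as described above.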

      > gb - ground balls
      > fb - fly balls

      Are these all batted balls, or just outs? (Batted balls would be
      better, if you've got it.) As well, Retro makes 5 classifications:
      - gb
      - fb
      - pops
      - liners
      - bunts

      Are pops and liners included in fb? If they are, I'd rename the
      field to "air balls". (You also have bunts that are popped to 1B.)
      Otherwise, I would keep them as 5 separate fields.

      Tom
    • tjruane
      Message 2 of 11 , May 8, 2004
        Tom wrote:

        > > gb - ground balls
        > > fb - fly balls
        >
        > Are these all batted balls, or just outs? (Batted balls would be
        > better, if you've got it.) As well, Retro makes 5 classifications:
        > - gb
        > - fb
        > - pops
        > - liners
        > - bunts

        This has the traditional meaning: balls caught before they hit the
        ground are coded "fb" and the rest are coded "gb". So all the F, L
        and P codes are grouped under fb. Bunts, if not modified with a P
        are treated as gb. Hits, where the data exists, are also treated
        in the same manner. I'm assuming here that hits that touch the
        ground in the infield are coded as /G and the rest are /L or /G.
        I'm pretty sure this is how this kind of data is usually handled and I
        don't think there would be much point to trying to differentiate
        between pop-ups, fly-outs and liners.
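The two-way grouping described above can be sketched in Python as follows. This is a simplified illustration, not the code used to build the posted files: it assumes event strings of the form `play/modifier/...` with an optional `.advances` suffix, recognizes only the modifiers G, F, L, P, BG and BP (optionally followed by a location digit), and returns `None` when no ball type is coded. The function and table names are hypothetical.

```python
import re

# Leading ball-type code of a modifier, optionally followed by a location
# (for example "G6" or "L7"). Simplified: other modifiers are ignored.
_BALL_TYPE = re.compile(r"^(BG|BP|G|F|L|P)(?=\d|$)")

# Grouping from the message above: F, L and P all count as "fb";
# bunts count as "gb" unless they are popped up (BP).
_GROUP = {"G": "gb", "BG": "gb", "F": "fb", "L": "fb", "P": "fb", "BP": "fb"}

def classify(event):
    """Return "gb", "fb", or None (no coded ball type) for an event string."""
    # Drop any runner advances after the first ".", then split off modifiers.
    _play, *modifiers = event.split(".", 1)[0].split("/")
    for mod in modifiers:
        m = _BALL_TYPE.match(mod)
        if m:
            return _GROUP[m.group(1)]
    return None
```

Under these assumptions, `classify("S7/G")` gives `"gb"`, `classify("8/F")` gives `"fb"`, and a bare `classify("S7")` gives `None`.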

        Tom Ruane
      • Mike Emeigh
        Message 3 of 11 , May 8, 2004
          At 12:41 PM 5/8/2004, Tom Ruane wrote:
          (snip)
          >This has the traditional meaning: balls caught before they hit the
          >ground are coded "fb" and the rest are coded "gb". So all the F, L
          >and P codes are grouped under fb. Bunts, if not modified with a P
          >are treated as gb. Hits, where the data exists, are also treated
          >in the same manner. I'm assuming here that hits that touch the
          >ground in the infield are coded as /G and the rest are /L or /G.
          >I'm pretty sure this is how this kind of data is usually handled and I
          >don't think there would be much point to trying to differentiate
          >between pop-ups, fly-outs and liners.

          It would be a good idea to split out line drives where the data is
          available (if it's not too much work) - there are a fair number of people
          who are interested in looking at line drive rates.

          I presume that you meant that hits that do not touch the ground in the
          infield are coded as /L or /F, not /G.

          Mike Emeigh
          piratefan1@...
        • tangotiger
          Message 4 of 11 , May 8, 2004
            --- In baseball-databank@yahoogroups.com, Mike Emeigh
            <piratefan1@b...> wrote:
            >
            > It would be a good idea to split out line drives where the data is
            > available (if it's not too much work) - there are a fair number of
            > people who are interested in looking at line drive rates.
            >
            > I presume that you meant that hits that do not touch the ground in
            > the infield are coded as /L or /F, not /G.
            >

            One good piece of research on this, by Dan Levitt, can be found
            here:
            http://www.baseballstuff.com/btf/scholars/levitt/articles/fielding_opps.htm

            If you look at the 2nd chart, you will note the % of hits fielded by
            each fielder. This is interesting at first, but then you realize
            that you would much prefer to see % of hits "passed through" each
            fielder's zone. You can figure that a huge portion of those numbers
            in Levitt's piece are really just the OF running in to get a hit that
            landed on the ground through the IF, but rolled to the OF.

            I bring this up because a hit to the OF is not necessarily a FB or LD
            but really a GB through the IF. I'm not sure how the earlier Retro
            years code this data, or whether they follow the standards that we expect.
            It will be interesting to see how the early Retro data compares to
            the 1989-1992 data, which is presumed to be far more reliable and
            consistent.

            Tom
          • Mike Emeigh
            Message 5 of 11 , May 8, 2004
              At 02:39 PM 5/8/2004, Tom wrote:
              (snip)

              >I bring this up because a hit to the OF is not necessarily a FB or LD
              >but really a GB through the IF.

              True, but Tom Ruane's comment was referring to the coding for hits based on
              where they "touched the ground"; e.g. a hit coded as S7/G would be
              considered to be a ground ball through the infield (probably past the SS)
              whereas a hit coded as S7/F or S7/L would be coded as having landed
              somewhere in the vicinity of the left fielder rather than an infielder. A
              hit coded just as S7 is going to be an unknown ball type, and there's
              really no good way to estimate the ball type for singles if it's not coded
              - one can generally assume that an extra-base hit without a coded ball type
              is a fly ball, but singles to the outfield could be anything. I usually
              treat singles without a coded ball type as line drives for those years when
              we have coded ball types for nearly all BIP, but we'd just be guessing at
              the ball type for those years where we don't have coded ball types for the
              majority of hits. I'd rather see those left as unknown rather than trying
              to estimate their distribution.
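One possible reading of the policy above, written as a small Python sketch. The function name, the `coverage` input (share of balls in play that season with a coded ball type) and the `min_coverage` cutoff standing in for "nearly all BIP" are all assumptions made for illustration; the message does not give a specific threshold, and gating the extra-base-hit assumption on coverage is an interpretation.

```python
def resolve_ball_type(hit, coded_type, coverage, min_coverage=0.95):
    """Pick a ball type for a hit, leaving it "unknown" when guessing is unsafe.

    hit: "S", "D" or "T" (single, double, triple).
    coded_type: "G", "F", "L", "P", or None when the event has no ball type.
    coverage: share of balls in play that season with a coded ball type.
    min_coverage: hypothetical cutoff for "nearly all BIP" (an assumption).
    """
    if coded_type is not None:
        return coded_type      # trust the scorer's notation when present
    if coverage < min_coverage:
        return "unknown"       # do not estimate a distribution
    if hit == "S":
        return "L"             # uncoded single treated as a line drive
    return "F"                 # uncoded extra-base hit assumed to be a fly ball
```

With this sketch, an uncoded S7 in a well-coded season comes back as `"L"`, while the same play in a sparsely coded season stays `"unknown"`.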

              Mike Emeigh
              piratefan1@...
            • tjruane
              Message 6 of 11 , May 8, 2004
                Mike Emeigh wrote:

                > It would be a good idea to split out line drives where the data is
                > available (if it's not too much work) - there are a fair number of
                > people who are interested in looking at line drive rates.

                I'm not sure we can make that distinction for the overwhelming
                majority of our games. One of the assumptions I make is that any
                unassisted putout in the outfield (absent any other qualifier) is
                a fly ball. Since I have no way of telling whether it is a liner
                or not (and since I would guess that well over
                75% of our games are missing "/L" and "/F" notations), I'm not sure
                how helpful it would be to attempt to make this distinction. What
                if I showed the following data for a pitcher's career:

                Year  gb%  ld%  fb%
                1981   41    3   56
                1982   38   22   40
                1983   43    2   55
                1984   40   21   39

                How could you tell if the batters were hitting more liners in 1982
                and 1984 or that we simply had better event files for those years?
                This would be even more deceptive within a given year, if one
                team's scorer included this notation and the other didn't. I fear
                that the quality of our data simply does not support distinctions
                this fine, and any attempt to overreach in this manner would do
                more harm than good to the researchers attempting to use this data.
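To make the concern concrete, here is a small Python sketch with invented data. It computes a naive line-drive share in which outs lacking any "/L" or "/F" notation fall through to the fly-ball default, next to the share of outs that carry a notation at all. The function name and input layout are hypothetical.

```python
def season_summary(outs):
    """Summarize caught-in-the-air outs for one season.

    outs: list of ball-type codes, one per out: "L", "F", or None when
    the event file has no /L or /F notation (hypothetical layout).
    """
    n = len(outs)
    coded = sum(1 for c in outs if c is not None)
    return {
        # Uncoded outs default to fly balls, so they dilute the liner share.
        "naive_ld": outs.count("L") / n if n else 0.0,
        # Share of outs where the scorer recorded a ball type at all.
        "coverage": coded / n if n else 0.0,
    }

# Two invented seasons: sparse notation versus complete notation.
sparse = ["L"] + [None] * 19
full = ["L"] * 5 + ["F"] * 15
sparse_stats = season_summary(sparse)
full_stats = season_summary(full)
```

Here the naive ld share is 5% for the sparse season and 25% for the complete one, yet nothing in those two numbers alone says whether batters or scorers changed; only the coverage figure (5% versus 100%) does.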

                > I presume that you meant that hits that do not touch the ground
                > in the infield are coded as /L or /F, not /G.

                Hopefully, that was obvious to everyone.

                Tom Ruane
              • Mike Emeigh
                Message 7 of 11 , May 8, 2004
                  At 04:55 PM 5/8/2004, Tom Ruane wrote:
                  >Mike Emeigh wrote:
                  >
                  > > It would be a good idea to split out line drives where the data is
                  > > available (if it's not too much work) - there are a fair number of
                  > > people who are interested in looking at line drive rates.
                  >
                  >I'm not sure we can make that distinction for the overwhelming
                  >majority of our games.

                  That's fine. But that leads to the problem to which I alluded in my reply
                  to Tom (Tango, that is). A play that is merely coded as

                  S7

                  with no other qualifier can't be assumed to be a fly ball - it is often a
                  ground ball through the hole at short. The data pre-Baseball Workshop is
                  almost never of the quality where we can make that distinction on hits.

                  Mike Emeigh
                  piratefan1@...
                • tjruane
                  Message 8 of 11 , May 8, 2004
                    Mike Emeigh wrote:

                    > That's fine. But that leads to the problem to which I alluded in
                    > my reply to Tom (Tango, that is). A play that is merely coded as
                    >
                    > S7
                    >
                    > with no other qualifier can't be assumed to be a fly ball - it is
                    > often a ground ball through the hole at short. The data
                    > pre-Baseball Workshop is almost never of the quality where we can
                    > make that distinction on hits.

                    Agreed. And I don't make any such assumption in the data that I
                    presented.

                    Tom Ruane
                  • tjruane
                    Message 9 of 11 , May 10, 2004
                      Michael Mavrogiannis was kind enough to send me a list of problems
                      with the data I posted a few days ago and I have updated the file
                      with the fixes.

                      I really have to start sending Michael these kinds of things BEFORE
                      making them public :-).

                      Tom Ruane
                    • Sean Forman
                      Message 10 of 11 , May 17, 2004
                        tjruane wrote:
                        > I have just uploaded a file containing a first pass at some of the
                        > data additions we have discussed, both here and over at Retrolist,
                        > during the last few months. There are three files in newdat.zip.


                        Tom,

                        Thank you for this data. We will work to incorporate it into the DB
                        this summer, so that everyone can use it at will.


                        --
                        Sincerely,
                        Sean Forman

                        Baseball Stats! http://www.Baseball-Reference.com/