Re: Proposed Data Additions

  • tjruane
    Message 1 of 11 , May 8, 2004
      Tom wrote:

      > > gb - ground balls
      > > fb - fly balls
      >
      > Are these all batted balls, or just outs? (Batted balls would be
      > better, if you've got it.) As well, Retro makes 5 classifications:
      > - gb
      > - fb
      > - pops
      > - liners
      > - bunts

      This has the traditional meaning: balls caught before they hit the
      ground are coded "fb" and the rest are coded "gb". So all the F, L
      and P codes are grouped under fb. Bunts, if not modified with a P
      are treated as gb. Hits, where the data exists, are also treated
      in the same manner. I'm assuming here that hits that touch the
      ground in the infield are coded as /G and the rest are /L or /G.
      I'm pretty sure this is how this kind of data is usually handled and I
      don't think there would be much point to trying to differentiate
      between pop-ups, fly-outs and liners.
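
For illustration, the two-bucket grouping described above could be sketched in Python roughly as follows. The function name `two_bucket` and its arguments are hypothetical, and returning `None` for an uncoded ball is an assumption of this sketch, not something stated in the message.

```python
from typing import Optional


def two_bucket(ball_type: Optional[str], is_bunt: bool = False) -> Optional[str]:
    """Collapse a batted-ball code into 'gb' or 'fb'.

    ball_type is the letter after the slash ('G', 'F', 'L' or 'P'),
    or None when the event carries no such code.
    """
    if ball_type in ("F", "L", "P"):
        return "fb"  # flies, liners and pops are all grouped under fb
    if ball_type == "G" or is_bunt:
        return "gb"  # grounders, plus bunts not modified with a P
    return None      # no code at all: left undecided in this sketch
```

A popped-up bunt goes to fb because the P check runs first: `two_bucket("P", is_bunt=True)` gives `"fb"`, while `two_bucket(None, is_bunt=True)` gives `"gb"`.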

      Tom Ruane
    • Mike Emeigh
      Message 2 of 11 , May 8, 2004
        At 12:41 PM 5/8/2004, Tom Ruane wrote:
        (snip)
        >This has the traditional meaning: balls caught before they hit the
        >ground are coded "fb" and the rest are coded "gb". So all the F, L
        >and P codes are grouped under fb. Bunts, if not modified with a P
        >are treated as gb. Hits, where the data exists, are also treated
        >in the same manner. I'm assuming here that hits that touch the
        >ground in the infield are coded as /G and the rest are /L or /G.
        >I'm pretty sure this is how this kind of data is usually handled and I
        >don't think there would be much point to trying to differentiate
        >between pop-ups, fly-outs and liners.

        It would be a good idea to split out line drives where the data is
        available (if it's not too much work) - there are a fair number of people
        who are interested in looking at line drive rates.

        I presume that you meant that hits that do not touch the ground in the
        infield are coded as /L or /F, not /G.

        Mike Emeigh
        piratefan1@...
      • tangotiger
        Message 3 of 11 , May 8, 2004
          --- In baseball-databank@yahoogroups.com, Mike Emeigh
          <piratefan1@b...> wrote:
          >
          > It would be a good idea to split out line drives where the data is
          > available (if it's not too much work) - there are a fair number of
          > people who are interested in looking at line drive rates.
          >
          > I presume that you meant that hits that do not touch the ground in
          > the infield are coded as /L or /F, not /G.
          >

          One good piece of research on this, by Dan Levitt, can be found
          here:
          http://www.baseballstuff.com/btf/scholars/levitt/articles/fielding_opps.htm

          If you look at the 2nd chart there, you will note the % of hits
          fielded by each fielder. This is interesting at first, but then you
          realize that you would much prefer to see the % of hits that passed
          through each fielder's zone. You can figure that a huge portion of
          the numbers in Levitt's piece are really just the OF running in to
          get a hit that went through the IF on the ground and rolled to the OF.

          I bring this up because a hit to the OF is not necessarily a FB or LD
          but really a GB through the IF. I'm not sure how the earlier Retro
          years code this data, or whether they follow the standards that we expect.
          It will be interesting to see how the early Retro data compares to
          the 1989-1992 data, which is presumed to be far more reliable and
          consistent.

          Tom
        • Mike Emeigh
          Message 4 of 11 , May 8, 2004
            At 02:39 PM 5/8/2004, Tom wrote:
            (snip)

            >I bring this up because a hit to the OF is not necessarily a FB or LD
            >but really a GB through the IF.

            True, but Tom Ruane's comment was referring to the coding for hits based on
            where they "touched the ground"; e.g. a hit coded as S7/G would be
            considered to be a ground ball through the infield (probably past the SS)
            whereas a hit coded as S7/F or S7/L would be coded as having landed
            somewhere in the vicinity of the left fielder rather than an infielder. A
            hit coded just as S7 is going to be an unknown ball type, and there's
            really no good way to estimate the ball type for singles if it's not coded
            - one can generally assume that an extra-base hit without a coded ball type
            is a fly ball, but singles to the outfield could be anything. I usually
            treat singles without a coded ball type as line drives for those years when
            we have coded ball types for nearly all BIP, but we'd just be guessing at
            the ball type for those years where we don't have coded ball types for the
            majority of hits. I'd rather see those left as unknown rather than trying
            to estimate their distribution.
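
As a rough illustration of the distinction drawn above, here is a minimal Python sketch that reads the ball type off a hit code and leaves uncoded singles as unknown. The function name `hit_ball_type`, the `guess_xbh` flag, and the simplified pattern are all hypothetical; real event files use more modifier forms than the plain `/G`, `/L`, `/F` shown in this thread, and this sketch only handles those simple ones.

```python
import re
from typing import Optional

# Hit kind (S, D or T), an optional fielder digit, then zero or more
# "/X" modifiers. This is a simplified, assumed pattern.
_HIT = re.compile(r"^(?P<kind>[SDT])(?P<fielder>\d?)(?P<mods>(?:/[A-Z0-9]+)*)$")


def hit_ball_type(event: str, guess_xbh: bool = False) -> Optional[str]:
    """Return 'G', 'L', 'F' or 'P' for a hit, or None if it is not coded."""
    m = _HIT.match(event)
    if m is None:
        raise ValueError(f"not a simple hit code: {event!r}")
    for mod in m.group("mods").split("/")[1:]:
        if mod in ("G", "L", "F", "P"):
            return mod  # explicit ball type wins
    if guess_xbh and m.group("kind") in ("D", "T"):
        return "F"      # optional guess: uncoded extra-base hit as a fly ball
    return None         # uncoded single (or guessing disabled): unknown
```

With this, `hit_ball_type("S7/G")` returns `"G"` and `hit_ball_type("S7")` returns `None`, so uncoded singles stay in an explicit unknown bucket instead of being assigned a guessed type.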

            Mike Emeigh
            piratefan1@...
          • tjruane
            Message 5 of 11 , May 8, 2004
              Mike Emeigh wrote:

              > It would be a good idea to split out line drives where the data is
              > available (if it's not too much work) - there are a fair number of
              > people who are interested in looking at line drive rates.

              I'm not sure we can make that distinction for the overwhelming
              majority of our games. One of the assumptions I make is that
              all unassisted putouts in the outfield (absent any other
              qualifier) are fly balls. Since I have no way of telling
              whether this is a liner or not (and since I would guess that well over
              75% of our games are missing "/L" and "/F" notations), I'm not sure
              how helpful it would be to attempt to make this distinction. What
              if I showed the following data for a pitcher's career:

              Year  gb%  ld%  fb%
              1981   41    3   56
              1982   38   22   40
              1983   43    2   55
              1984   40   21   39

              How could you tell whether the batters were hitting more liners in
              1982 and 1984 or whether we simply had better event files for those
              years? This would be even more deceptive within a given year, if
              one team's scorer included this notation and the other didn't. I
              fear that the quality of our data simply does not support
              distinctions this fine, and any attempt to overreach in this manner
              would do more harm than good to the researchers attempting to use
              this data.
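
One way to make the concern above visible in a summary is to report, next to the rates, what share of the batted balls carried an explicit code at all. The following Python sketch is only an illustration: the function name `batted_ball_summary`, the `coverage` field, and the choice to compute rates over coded balls only are assumptions, not part of the posted data.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def batted_ball_summary(codes: Iterable[Optional[str]]) -> Dict[str, Optional[float]]:
    """Summarize ball-type codes ('G', 'L', 'F', 'P', or None if uncoded).

    Rates are percentages of the coded balls only; 'coverage' is the
    fraction of all balls that had a code.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    coded = total - counts[None]
    summary: Dict[str, Optional[float]] = {
        "coverage": coded / total if total else 0.0
    }
    # Pops are folded into fb here; liners are kept separate.
    for label, letters in (("gb", "G"), ("ld", "L"), ("fb", "FP")):
        n = sum(counts[c] for c in letters)
        summary[label] = 100.0 * n / coded if coded else None
    return summary
```

A season whose `coverage` is low can then be flagged or excluded before its ld% is compared with that of a better-documented season.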

              > I presume that you meant that hits that do not touch the ground
              > in the infield are coded as /L or /F, not /G.

              Hopefully, that was obvious to everyone.

              Tom Ruane
            • Mike Emeigh
              Message 6 of 11 , May 8, 2004
                At 04:55 PM 5/8/2004, Tom Ruane wrote:
                >Mike Emeigh wrote:
                >
                > > It would be a good idea to split out line drives where the data is
                > > available (if it's not too much work) - there are a fair number of
                > > people who are interested in looking at line drive rates.
                >
                >I'm not sure we can make that distinction for the overwhelming
                >majority of our games.

                That's fine. But that leads to the problem to which I alluded in my reply
                to Tom (Tango, that is). A play that is merely coded as

                S7

                with no other qualifier can't be assumed to be a fly ball - it is often a
                ground ball through the hole at short. The data pre-Baseball Workshop is
                almost never of the quality where we can make that distinction on hits.

                Mike Emeigh
                piratefan1@...
              • tjruane
                Message 7 of 11 , May 8, 2004
                  Mike Emeigh wrote:

                  > That's fine. But that leads to the problem to which I alluded in
                  > my reply to Tom (Tango, that is). A play that is merely coded as
                  >
                  > S7
                  >
                  > with no other qualifier can't be assumed to be a fly ball - it is
                  > often a ground ball through the hole at short. The data
                  > pre-Baseball Workshop is almost never of the quality where we can
                  > make that distinction on hits.

                  Agreed. And I don't make any such assumption in the data that I
                  presented.

                  Tom Ruane
                • tjruane
                  Message 8 of 11 , May 10, 2004
                    Michael Mavrogiannis was kind enough to send me a list of problems
                    with the data I posted a few days ago and I have updated the file
                    with the fixes.

                    I really have to start sending Michael these kinds of things BEFORE
                    making them public :-).

                    Tom Ruane
                  • Sean Forman
                    Message 9 of 11 , May 17, 2004
                      tjruane wrote:
                      > I have just uploaded a file containing a first pass at some of the
                      > data additions we have discussed, both here and over at Retrolist,
                      > during the last few months. There are three files in newdat.zip.


                      Tom,

                      Thank you for this data. We will work to incorporate it into the DB
                      this summer, so that everyone can use it at will.


                      --
                      Sincerely,
                      Sean Forman

                      Baseball Stats! http://www.Baseball-Reference.com/