
Re: [baseball-databank] Digest Number 24

  • Sean Forman
    Message 1 of 6 , Jan 10, 2002
      P Kreutzer wrote:
      >
      > Hi. I think this discussion of race has been interesting. But there is an
      > obvious middle-ground.
      >
      > No, the racial definitions wouldn't be official, would be subjective and
      > would sometimes be wrong, but that doesn't mean they might not help
      > formulate interesting or useful studies. So, IMHO they should be compiled.
      >
      > But they are not official, are subjective and would sometimes be wrong, and
      > so should not be included as part of the official baseball database. But
      > there's no reason the larger database and race table couldn't be linked.
      >
      > Win-win?
      > Peter

      I agree with Peter here. A race table doesn't have to be part of the
      main biographical table. I think it is reasonable to have another
      table, player_race, where someone can make assertions about a player's
      ethnicity and it will be linked back to the main biographical table. As
      I see it, this is the point of this project: to link all of these
      disparate data sets together. Of course, we might end up with dueling
      race tables (using different definitions), but that doesn't really
      concern me at the moment.
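
A minimal sketch of the separate, linked table described above, using Python's built-in sqlite3 module. Only the name player_race comes from the message; the player table, every column name, and the placeholder values (p001, definition_a, category_x) are assumptions made for illustration, not the project's actual schema or data. The source column is what would let "dueling" tables built on different definitions coexist.

```python
import sqlite3

# Hypothetical schema: only the table name player_race appears in the thread.
SCHEMA = """
CREATE TABLE player (                 -- stand-in for the main biographical table
    player_id TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE player_race (            -- kept apart from the biographical table
    player_id TEXT NOT NULL REFERENCES player(player_id),
    source    TEXT NOT NULL,          -- who asserts it / which definition is used
    assertion TEXT NOT NULL,
    PRIMARY KEY (player_id, source)   -- one assertion per player per source
);
"""


def build_demo_db():
    """Create an in-memory database holding placeholder rows only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO player VALUES (?, ?)", ("p001", "Placeholder Player"))
    conn.executemany(
        "INSERT INTO player_race VALUES (?, ?, ?)",
        [
            ("p001", "definition_a", "category_x"),
            ("p001", "definition_b", "category_y"),
        ],
    )
    conn.commit()
    return conn


def assertions_for(conn, player_id):
    """Return every (source, assertion) pair linked to one player."""
    rows = conn.execute(
        "SELECT r.source, r.assertion "
        "FROM player AS p JOIN player_race AS r ON r.player_id = p.player_id "
        "WHERE p.player_id = ? ORDER BY r.source",
        (player_id,),
    )
    return rows.fetchall()
```

Anyone who does not want the assertions can ignore player_race entirely; the biographical table is unchanged either way.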

      Sincerely,
      Sean Forman

      Baseball Stats! http://www.Baseball-Reference.com/
      Baseball Analysis! http://www.BaseballPrimer.com/
    • Randy Cox
      Message 2 of 6 , Jan 10, 2002
        >In response to Sean L, and his list of Tiger-type players: I don't
        >know. However, I would classify them in any way that any definition
        >says that they should be classified. If the 1998 US Census
        >definition says that Jeter is white, then he is white.

        The problem is that the Census has thrown up their collective hands and
        said that each citizen gets to define their own race. Their sound
        logic is that no one can really say what race one is. And what
        happened in the last census is that we had a spike in the "Other"
        category, as people decided they were Euro-Asian or Mixed or
        Eskimo/Black or whatever (10,000+ actually reported a White/Black/American
        Indian/Asian 4-race mix and over 6 million reported two races). So,
        even getting the info straight from the horse's mouth ("Mr. Jeter, can
        you tell me your race for my records?") isn't going to net valid data.
        Case in point: Tiger Woods's "Cablinasian" self-classification.

        Now, the only reasonable specific data that has been suggested which
        would be truly discrete is the Yes/No question "Would this player have
        been allowed in the major leagues between 1900 and 1946?" However, to
        answer that question, we need to exhume Mountain Landis and ask for his
        racially-skewed judgment. There are some people around who might still
        be able to answer this question, but they're dying fast, and for the
        rest of us, it's simply conjecture.

        And even if we could get 80+% of this data from 1947 to today, I'm
        still not convinced that we could make valid conclusions. For
        instance, the data may show that Jackie Robinson and Larry Doby were
        hit by pitches more often than other batters (or not). But this
        doesn't mean it's racially charged. Maybe they were just tough guys
        who crowded the plate. Or maybe they were just unlucky. And if you
        extend the data for 10 or 20 years, how do you know when the results
        switch from being race-based to being simple statistical aberrations?
        If anti-black sentiments had faded from most pitchers' minds by, say,
        1960, is it valid to look at the data in 1961? 1958? When are the
        results just blips in the data and not based on skin color?

        Though I believe it to be a politically charged hot potato in the wrong
        hands, I also don't believe it would be terribly revealing in the
        "right" hands, either. It's not that it's "too hard," though it is,
        it's that it's pretty meaningless in the big picture. Would it be
        valid to check the stats of one pitcher and find that he hit
        proportionally more black batters than white and then insinuate that
        he's a bigot? Maybe his pitches that got away were just that. Maybe
        there's really no point. Just maybe.

        And, for the record, the three races used long ago to define people by
        skin color were Caucasian, Negroid, and Mongoloid. These are
        rightfully out of fashion now.

      • tmasc
        Message 3 of 6 , Jan 10, 2002
          I'm with Sean F on this one. We could end up with 10 "race" tables.
          The point is that you use a definition and apply it. Like the
          Florida federal vote and the hanging ballot. So not everything is
          cut and dried. I can live with it. Keep it in a separate table. I
          can live with that too. The salary table I'm sure is up to
          speculation with signing bonuses and deferred salaries.
          The "correct" way to do it is the "present value" method. That is
          what is really costing the team, and what the player is actually
          getting. But then you have to "guess" at the discount rate.
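
For reference, the standard present-value formula shows where that guess enters. The symbols are assumptions introduced here, not notation from the thread: C_t is the payment made t years after signing, T is the last payment year, and r is the discount rate.

```latex
\[
  PV = \sum_{t=0}^{T} \frac{C_t}{(1 + r)^{t}}
\]
```

Every term except r is a recorded payment; r is the only input that has to be chosen by whoever does the calculation.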

          However, we should get away from the "hot potato" issue. It's not a
          question of "ok, so what does this prove?", etc. That's another
          issue. You can't shoot down the conclusion, unless you look at the
          analysis. So, can we just forget the "hot potato" part of this?

          > And, for the record, the three races used long ago to define people
          > by skin color were Caucasian, Negroid, and Mongoloid. These are
          > rightfully out of fashion now.
          Thanks for pointing that out. It's been 20 years since I read about
          that, and I had the belief it was based on genetics, and not simply
          skin color.
        • Vinay Kumar
          Message 4 of 6 , Jan 10, 2002
            Going off on a tangent now...

            ----- Original Message -----
            From: "tmasc" <tmasc@...>
            To: <baseball-databank@yahoogroups.com>
            Sent: Thursday, January 10, 2002 7:22 AM


            > I'm with Sean F on this one. We could end up with 10 "race" tables.
            > The point is that you use a definition and apply it. Like the
            > Florida federal vote and the hanging ballot. So not everything is
            > cut and dried. I can live with it. Keep it in a separate table. I
            > can live with that too. The salary table I'm sure is up to
            > speculation with signing bonuses and deferred salaries.
            > The "correct" way to do it is the "present value" method. That is
            > what is really costing the team, and what the player is actually
            > getting. But then you have to "guess" at the discount rate.

            Actually, I think the "correct" way to do it would be to include the whole
            stream of payments. Then whoever is using the database can choose to
            discount to present value or whatever. Depending on what you're trying to
            measure, you may want to use different methods, and the database should
            contain the raw data to let you use any method.

            Obviously that's harder to model in a database. But this unofficial MLB
            contracts web site
            (http://www.bluemanc.demon.co.uk/baseball/mlbcontracts.htm) does a pretty
            good job of it (which makes me think that the best way to organize records
            would be by contract, not necessarily player/year).
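
A rough sketch of that idea in Python: store each contract's raw payment stream and leave discounting to whoever queries it. The Payment record, its field names, and the present_value helper are hypothetical, and the amounts used in any example call are made up rather than real contract figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    """One raw payment row, keyed by contract rather than by player/year."""
    contract_id: str   # hypothetical identifier for the contract
    year: int          # year the money is actually paid
    amount: float      # nominal amount paid in that year


def present_value(payments, discount_rate, base_year):
    """Discount a payment stream back to base_year at a rate the analyst picks.

    The stored rows stay nominal; the discount rate is an analysis-time
    choice, so it is a parameter here and not a column in the database.
    """
    return sum(
        p.amount / (1.0 + discount_rate) ** (p.year - base_year)
        for p in payments
    )


def nominal_total(payments):
    """Sum the payments with no discounting at all."""
    return sum(p.amount for p in payments)
```

For example, present_value(stream, 0.10, 2002) and nominal_total(stream) can both be computed from the same stored rows, which is the reason for keeping the raw data.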
          • tmasc
            Message 5 of 6 , Jan 10, 2002
              --- In baseball-databank@y..., Vinay Kumar <vinay@b...> wrote:
              > Going off on a tangent now...
              >
              > Actually, I think the "correct" way to do it would be to include
              > the whole stream of payments.

              I stand corrected. Yes, that's right. A database should worry about
              data, and not all the calculations I brought up. That's part of the
              analysis.