RE: [baseball-databank] Digest Number 24

  • P Kreutzer
    Message 1 of 6, Jan 10, 2002
      Hi. I think this discussion of race has been interesting. But there is an
      obvious middle-ground.

      No, the racial definitions wouldn't be official, would be subjective and
      would sometimes be wrong, but that doesn't mean they might not help
      formulate interesting or useful studies. So, IMHO they should be compiled.

      But they are not official, are subjective and would sometimes be wrong, and
      so should not be included as part of the official baseball database. But
      there's no reason the larger database and race table couldn't be linked.

      Win-win?
      Peter
    • Sean Forman
      Message 2 of 6, Jan 10, 2002
        P Kreutzer wrote:
        >
        > Hi. I think this discussion of race has been interesting. But there is an
        > obvious middle-ground.
        >
        > No, the racial definitions wouldn't be official, would be subjective and
        > would sometimes be wrong, but that doesn't mean they might not help
        > formulate interesting or useful studies. So, IMHO they should be compiled.
        >
        > But they are not official, are subjective and would sometimes be wrong, and
        > so should not be included as part of the official baseball database. But
        > there's no reason the larger database and race table couldn't be linked.
        >
        > Win-win?
        > Peter

        I agree with Peter here. A race table doesn't have to be part of the
        main biographical table. I think it is reasonable to have another
        table, player_race, where someone can make assertions about a player's
        ethnicity and it will be linked back to the main biographical table. As
        I see it this is the point of this project, to link all of these
        disparate data sets together. Of course, we might end up with dueling
        race tables (using different definitions), but that doesn't really
        concern me at the moment.
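
A minimal sketch of the separate-table idea, assuming SQLite and hypothetical table and column names (`player`, `player_race`, `source`, `label`); none of these are the project's actual schema. Keying each assertion on both the player and the source of the definition is one way to let "dueling" race tables coexist while still linking back to the main biographical table. All rows are placeholders.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a main table and a linked side table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for the main biographical table (hypothetical columns).
        CREATE TABLE player (
            player_id TEXT PRIMARY KEY,
            name      TEXT NOT NULL
        );
        -- Separate table of assertions; 'source' names the definition used,
        -- so several competing definitions can sit side by side.
        CREATE TABLE player_race (
            player_id TEXT NOT NULL REFERENCES player(player_id),
            source    TEXT NOT NULL,
            label     TEXT NOT NULL,
            PRIMARY KEY (player_id, source)
        );
        """
    )
    conn.execute("INSERT INTO player VALUES (?, ?)", ("p001", "Example Player"))
    conn.executemany(
        "INSERT INTO player_race VALUES (?, ?, ?)",
        [
            ("p001", "definition_a", "label_x"),  # placeholder assertion
            ("p001", "definition_b", "label_y"),  # a competing definition
        ],
    )
    return conn


def labels_for(conn, source):
    """Join the side table back to the main table for one chosen definition."""
    return conn.execute(
        """
        SELECT p.name, r.label
        FROM player AS p
        JOIN player_race AS r ON r.player_id = p.player_id
        WHERE r.source = ?
        ORDER BY p.player_id
        """,
        (source,),
    ).fetchall()
```

Anyone who does not want the assertions simply never joins against `player_race`; the main table is untouched.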

        Sincerely,
        Sean Forman

        Baseball Stats! http://www.Baseball-Reference.com/
        Baseball Analysis! http://www.BaseballPrimer.com/
      • Randy Cox
        Message 3 of 6, Jan 10, 2002
          >In response to Sean L, and his list of Tiger-type players: I don't
          >know. However, I would classify them in any way that any definition
          >says that they should be classified. If the 1998 US Census
          >definition says that Jeter is white, then he is white.

          The problem is that the Census has thrown up their collective hands and
          said that each citizen gets to define their own race. Their sound
          logic is that no one can really say what race one is. And what
          happened in the last census is that we had a spike in the "Other"
          category, as people decided they were Euro-Asian or Mixed or
          Eskimo/Black or whatever (10000+ actually reported White/Black/American
          Indian/Asian 4-race mix and over 6 million reported two races). So,
          even getting the info straight from the horse's mouth ("Mr. Jeter, can
          you tell me your race for my records?") isn't going to net valid data.
          Case in point: Tiger Woods's "Cablinasian" self-classification.

          Now, the only reasonable specific data that has been suggested which
          would be truly discrete is the Yes/No question "Would this player have
          been allowed in the major leagues between 1900 and 1946?" However, to
          answer that question, we need to exhume Mountain Landis and ask for his
          racially-skewed judgment. There are some people around who might still
          be able to answer this question, but they're dying fast, and for the
          rest of us, it's simply conjecture.

          And even if we could get 80+% of this data from 1947 to today, I'm
          still not convinced that we could make valid conclusions. For
          instance, the data may show that Jackie Robinson and Larry Doby were
          hit by pitches more often than other batters (or not). But this
          doesn't mean it's racially charged. Maybe they were just tough guys
          who crowded the plate. Or maybe they were just unlucky. And if you
          extend the data for 10 or 20 years, how do you know when the results
          switch from being race-based to being simple statistical aberrations?
          If anti-black sentiments had faded from most pitchers' minds by, say,
          1960, is it valid to look at the data in 1961? 1958? When are the
          results just blips in the data and not based on skin color?
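
A rough sketch of the "maybe they were just unlucky" point, assuming (purely for illustration) that each plate appearance is an independent trial with a fixed league-wide chance of a hit-by-pitch. The function name and any numbers passed to it are hypothetical, not real player data. It computes how likely a count at least as high as the observed one would be from chance alone.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (exact binomial sum)."""
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )
```

For example, `binomial_upper_tail(hbp_count, plate_appearances, league_rate)` gives the chance of seeing that many or more under the independence assumption. Even a small value does not separate race from plate-crowding or other batter traits, which is the objection raised above.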

          Though I believe it to be a politically charged hot potato in the wrong
          hands, I also don't believe it would be terribly revealing in the
          "right" hands, either. It's not that it's "too hard," though it is,
          it's that it's pretty meaningless in the big picture. Would it be
          valid to check the stats of one pitcher and find that he hit
          proportionally more black batters than white and then insinuate that
          he's a bigot? Maybe his pitches that got away were just that. Maybe
          there's really no point. Just maybe.

          And, for the record, the three races used long ago to define people by
          skin color were Caucasian, Negroid, and Mongoloid. These are
          rightfully out of fashion now.

        • tmasc
          Message 4 of 6, Jan 10, 2002
            I'm with Sean F on this one. We could end up with 10 "race" tables.
            The point is that you use a definition and apply it. Like the
            Florida federal vote and the hanging ballot. So not everything is
            cut and dried. I can live with it. Keep it in a separate table. I
            can live with that too. The salary table I'm sure is up to
            speculation with signing bonuses and deferred salaries.
            The "correct" way to do it is the "present value" method. That is
            what is really costing the team, and what the player is actually
            getting. But then you have to "guess" at the discount rate.
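
A minimal sketch of the "present value" method, assuming annual compounding and a hypothetical `present_value` helper. The discount rate is exactly the "guess" mentioned above, so it is left as a parameter.

```python
def present_value(payments, discount_rate):
    """Discount a stream of payments back to today.

    payments: iterable of (years_from_now, amount) pairs.
    discount_rate: annual rate as a fraction, e.g. 0.10 for 10 percent.
    """
    return sum(
        amount / (1.0 + discount_rate) ** years
        for years, amount in payments
    )
```

With a rate of 0 this is just the sum of the payments; the higher the assumed rate, the less a deferred payment counts.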

            However, we should get away from the "hot potato" issue. It's not a
            question of "ok, so what does this prove?", etc. That's another
            issue. You can't shoot down the conclusion, unless you look at the
            analysis. So, can we just forget the "hot potato" part of this?

            > And, for the record, the three races used long ago to define people
            > by skin color were Caucasian, Negroid, and Mongoloid. These are
            > rightfully out of fashion now.
            Thanks for pointing that out. It's been 20 years since I read about
            that, and I had the belief it was based on genetics, and not simply
            skin color.
          • Vinay Kumar
            Message 5 of 6, Jan 10, 2002
              Going off on a tangent now...

              ----- Original Message -----
              From: "tmasc" <tmasc@...>
              To: <baseball-databank@yahoogroups.com>
              Sent: Thursday, January 10, 2002 7:22 AM


              > I'm with Sean F on this one. We could end up with 10 "race" tables.
              > The point is that you use a definition and apply it. Like the
              > Florida federal vote and the hanging ballot. So not everything is
              > cut and dried. I can live with it. Keep it in a separate table. I
              > can live with that too. The salary table I'm sure is up to
              > speculation with signing bonuses and deferred salaries.
              > The "correct" way to do it is the "present value" method. That is
              > what is really costing the team, and what the player is actually
              > getting. But then you have to "guess" at the discount rate.

              Actually, I think the "correct" way to do it would be to include the whole
              stream of payments. Then whoever is using the database can choose to
              discount to present value or whatever. Depending on what you're trying to
              measure, you may want to use different methods, and the database should
              contain the raw data to let you use any method.

              Obviously that's harder to model in a database. But this unofficial MLB
              contracts web site
              (http://www.bluemanc.demon.co.uk/baseball/mlbcontracts.htm) does a pretty
              good job of it (which makes me think that the best way to organize records
              would be by contract, not necessarily player/year).
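
A minimal sketch of organizing records by contract, with the whole stream of payments stored as raw data. The class and field names (`Contract`, `Payment`, `kind`) are hypothetical and are not taken from the contracts site above. The database keeps only the payments; nominal totals, per-year views, and discounting are left to whoever is doing the analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Payment:
    year: int              # year the money is actually paid
    amount: float          # nominal amount paid in that year
    kind: str = "salary"   # e.g. "salary", "signing_bonus", "deferred"


@dataclass
class Contract:
    player_id: str
    team_id: str
    payments: list = field(default_factory=list)

    def nominal_total(self):
        """Plain sum of every payment, with no discounting."""
        return sum(p.amount for p in self.payments)

    def by_year(self):
        """Collapse the stream into a player/year style view."""
        totals = {}
        for p in self.payments:
            totals[p.year] = totals.get(p.year, 0.0) + p.amount
        return totals

    def discounted_total(self, rate, base_year):
        """Value of the stream in base_year terms at a user-chosen rate."""
        return sum(
            p.amount / (1.0 + rate) ** (p.year - base_year)
            for p in self.payments
        )
```

A player/year salary table can always be derived from this via `by_year()`, but the reverse is not possible once bonuses and deferrals have been folded into a single number.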
            • tmasc
              Message 6 of 6, Jan 10, 2002
                --- In baseball-databank@y..., Vinay Kumar <vinay@b...> wrote:
                > Going off on a tangent now...
                >
                > Actually, I think the "correct" way to do it would be to include
                > the whole stream of payments.

                I stand corrected. Yes, that's right. A database should worry about
                data, and not all the calculations I brought up. That's part of the
                analysis.