Re: [baseball-databank] Digest Number 24
- P Kreutzer wrote:
> Hi. I think this discussion of race has been interesting. But there is an
> obvious middle-ground.
> No, the racial definitions wouldn't be official, would be subjective and
> would sometimes be wrong, but that doesn't mean they might not help
> formulate interesting or useful studies. So, IMHO they should be compiled.
> But they are not official, are subjective and would sometimes be wrong, and
> so should not be included as part of the official baseball database. But
> there's no reason the larger database and race table couldn't be linked.
I agree with Peter here. A race table doesn't have to be part of the
main biographical table. I think it is reasonable to have another
table, player_race, where someone can make assertions about a player's
ethnicity and it will be linked back to the main biographical table. As
I see it this is the point of this project, to link all of these
disparate data sets together. Of course, we might end up with dueling
race tables (using different definitions), but that doesn't really
concern me at the moment.
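
A minimal sketch of what that linkage might look like, using Python's
built-in sqlite3 module. Only the name player_race comes from the
discussion above; the player table, every column name, and the
placeholder rows are assumptions made up for illustration. Keying each
assertion by its source lets "dueling" tables sit side by side without
touching the biographical data.

```python
import sqlite3

# Hypothetical layout: everything except the table name player_race is an
# illustrative assumption, not the project's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (
    player_id TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE player_race (
    player_id  TEXT NOT NULL REFERENCES player(player_id),
    source     TEXT NOT NULL,  -- who is making the assertion
    definition TEXT NOT NULL,  -- which definition that source applied
    category   TEXT NOT NULL,
    PRIMARY KEY (player_id, source)
);
""")

# Placeholder rows only; no real player or classification is implied.
conn.execute("INSERT INTO player VALUES ('p001', 'Example Player')")
conn.executemany(
    "INSERT INTO player_race VALUES (?, ?, ?, ?)",
    [
        ("p001", "compiler_a", "definition_a", "category_x"),
        ("p001", "compiler_b", "definition_b", "category_y"),
    ],
)

# Link the assertions back to the main biographical table.
rows = conn.execute("""
    SELECT p.name, r.source, r.category
    FROM player AS p
    JOIN player_race AS r ON r.player_id = p.player_id
    ORDER BY r.source
""").fetchall()

# Column names of the biographical table, to show it stays untouched.
player_columns = [col[1] for col in conn.execute("PRAGMA table_info(player)")]
```

A study would then pick one source (or compare several) in its WHERE
clause, while the main table carries no race column at all.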
Baseball Stats! http://www.Baseball-Reference.com/
Baseball Analysis! http://www.BaseballPrimer.com/
>In response to Sean L, and his list of Tiger-type players: I don't
>know. However, I would classify them in any way that any definition
>says that they should be classified. If the 1998 US Census
>definition says that Jeter is white, then he is white.
The problem is that the Census has thrown up their collective hands and
said that each citizen gets to define their own race. Their sound
logic is that no one can really say what race one is. And what
happened in the last census is that we had a spike in the "Other"
category, as people decided they were Euro-Asian or Mixed or
Eskimo/Black or whatever (10000+ actually reported White/Black/American
Indian/Asian 4-race mix and over 6 million reported two races). So,
even getting the info straight from the horse's mouth ("Mr. Jeter, can
you tell me your race for my records?") isn't going to net valid data.
Case in point: Tiger Woods's "Cablinasian" self-classification.
Now, the only reasonable specific data that has been suggested which
would be truly discrete is the Yes/No question "Would this player have
been allowed in the major leagues between 1900 and 1946?" However, to
answer that question, we need to exhume Mountain Landis and ask for his
racially-skewed judgment. There are some people around who might still
be able to answer this question, but they're dying fast, and for the
rest of us, it's simply conjecture.
And even if we could get 80+% of this data from 1947 to today, I'm
still not convinced that we could make valid conclusions. For
instance, the data may show that Jackie Robinson and Larry Doby were
hit by pitches more often than other batters (or not). But this
doesn't mean it's racially charged. Maybe they were just tough guys
who crowded the plate. Or maybe they were just unlucky. And if you
extend the data for 10 or 20 years, how do you know when the results
switch from being race-based to being simple statistical aberrations?
If anti-black sentiments had faded from most pitchers' minds by, say,
1960, is it valid to look at the data in 1961? 1958? When are the
results just blips in the data and not based on skin color?
Though I believe it to be a politically charged hot potato in the wrong
hands, I also don't believe it would be terribly revealing in the
"right" hands, either. It's not that it's "too hard," though it is;
it's that it's pretty meaningless in the big picture. Would it be
valid to check the stats of one pitcher and find that he hit
proportionally more black batters than white and then insinuate that
he's a bigot? Maybe his pitches that got away were just that. Maybe
there's really no point. Just maybe.
And, for the record, the three races used long ago to define people by
skin color were Caucasian, Negroid, and Mongoloid. These are
rightfully out of fashion now.
- I'm with Sean F on this one. We could end up with 10 "race" tables.
The point is that you use a definition and apply it. Like the
Florida federal vote and the hanging ballot. So not everything is
cut and dried. I can live with it. Keep it in a separate table. I
can live with that too. The salary table, I'm sure, is open to
speculation with signing bonuses and deferred salaries.
The "correct" way to do it is the "present value" method. That is
what is really costing the team, and what the player is actually
getting. But then you have to "guess" at the discount rate.
However, we should get away from the "hot potato" issue. It's not a
question of "ok, so what does this prove?", etc. That's another
issue. You can't shoot down the conclusion, unless you look at the
analysis. So, can we just forget the "hot potato" part of this?
> And, for the record, the three races used long ago to define people by
> skin color were Caucasian, Negroid, and Mongoloid. These are
> rightfully out of fashion now.
Thanks for pointing that out. It's been 20 years since I read about
that, and I had the belief it was based on genetics, and not simply
- Going off on a tangent now...
----- Original Message -----
From: "tmasc" <tmasc@...>
Sent: Thursday, January 10, 2002 7:22 AM
> I'm with Sean F on this one. We could end up with 10 "race" tables.
> The point is that you use a definition and apply it. Like the
> Florida federal vote and the hanging ballot. So not everything is
> cut and dried. I can live with it. Keep it in a separate table. I
> can live with that too. The salary table I'm sure is up to
> speculation with signing bonuses and deferred salaries.
> The "correct" way to do it is the "present value" method. That is
> what is really costing the team, and what the player is actually
> getting. But then you have to "guess" at the discount rate.
Actually, I think the "correct" way to do it would be to include the whole
stream of payments. Then whoever is using the database can choose to
discount to present value or whatever. Depending on what you're trying to
measure, you may want to use different methods, and the database should
contain the raw data to let you use any method.
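
To make that concrete, here is a rough sketch in Python of how a user
could discount a stored payment stream at whatever rate they choose.
The function name, the (year, amount) layout, and the dollar figures
are all hypothetical; nothing here comes from a real contract.

```python
def present_value(payments, rate, base_year):
    """Discount a stream of (year, amount) payments back to base_year.

    rate is the annual discount rate, chosen by whoever runs the analysis;
    the database itself would store only the raw payments.
    """
    return sum(
        amount / (1 + rate) ** (year - base_year)
        for year, amount in payments
    )


# Hypothetical contract: 1,000,000 paid in each of three seasons.
payments = [(2002, 1_000_000), (2003, 1_000_000), (2004, 1_000_000)]

nominal = present_value(payments, 0.0, 2002)      # no discounting
discounted = present_value(payments, 0.05, 2002)  # 5% is just one guess
```

With a rate of 0 the result is the plain sum of the payments; any other
rate is the user's assumption rather than something baked into the data.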
Obviously that's harder to model in a database. But this unofficial MLB
contracts web site
(http://www.bluemanc.demon.co.uk/baseball/mlbcontracts.htm) does a pretty
good job of it (which makes me think that the best way to organize records
would be by contract, not necessarily player/year).
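
One possible way to organize records by contract, sketched with
Python's sqlite3 module. This is not based on how the site above
structures its data; the table names, column names, and figures are
all assumptions for illustration. The per-year view is derived from
the contract rows rather than stored.

```python
import sqlite3

# Hypothetical contract-keyed layout; all names and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contract (
    contract_id INTEGER PRIMARY KEY,
    player_id   TEXT NOT NULL,
    team_id     TEXT NOT NULL,
    signed_year INTEGER NOT NULL
);
CREATE TABLE contract_payment (
    contract_id INTEGER NOT NULL REFERENCES contract(contract_id),
    pay_year    INTEGER NOT NULL,
    kind        TEXT NOT NULL,   -- e.g. 'salary', 'signing_bonus', 'deferred'
    amount      INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO contract VALUES (1, 'p001', 'TEAM', 2002)")
conn.executemany(
    "INSERT INTO contract_payment VALUES (?, ?, ?, ?)",
    [
        (1, 2002, "signing_bonus", 500_000),
        (1, 2002, "salary", 1_000_000),
        (1, 2003, "salary", 1_200_000),
        (1, 2010, "deferred", 300_000),
    ],
)

# Recover a player/year view from the contract-level records.
per_year = conn.execute("""
    SELECT c.player_id, p.pay_year, SUM(p.amount)
    FROM contract AS c
    JOIN contract_payment AS p ON p.contract_id = c.contract_id
    GROUP BY c.player_id, p.pay_year
    ORDER BY p.pay_year
""").fetchall()
```

Signing bonuses and deferred money stay visible as separate rows, so
anyone who wants a different treatment can filter on kind.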
- --- In baseball-databank@y..., Vinay Kumar <vinay@b...> wrote:
> Going off on a tangent now...
> Actually, I think the "correct" way to do it would be to include the whole
> stream of payments.
I stand corrected. Yes, that's right. A database should worry about
data, and not all the calculations I brought up. That's part of the