Re: [baseball-databank] Re: Games Played

• Message 1 of 28 , Aug 16, 2003
I think the biggest point Paul might be trying to
make, and one which I'm trying to get to as well, is:
how many total games did a player play?

Take Babe Ruth. If he has 120 OF games, and 20 games
as a pitcher, did he play 140 games?

If you have a defensive specialist with 50 games at
SS, and 48 games as a hitter, how many games did he
really play? 50? 52? 60?

So, what we need is a superset to the fielding,
hitting, pitching tables that tells you how many games
the player played.

Tom
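
[Editor's sketch] One way to make the question above concrete: if per-game records were available, total games would be the number of distinct games in which a player appeared, whatever the role. The Python below is only an illustration under that assumption. The game IDs are invented and `total_games` is a hypothetical helper, not part of the BDB. It shows that 120 OF games and 20 pitching games add up to 140 only when no game is counted under both roles.

```python
# Hypothetical per-game records: role -> set of game IDs the player appeared in.
# The IDs are invented for illustration; they are not real BDB data.
def total_games(games_by_role):
    """Count distinct games across all roles (set union)."""
    all_games = set()
    for game_ids in games_by_role.values():
        all_games |= game_ids
    return len(all_games)


# 120 games in the outfield, 20 as a pitcher, no game in both roles.
disjoint = {"OF": set(range(1, 121)), "P": set(range(121, 141))}

# Same role totals, but in 5 of the pitching games he also played the outfield.
overlapping = {"OF": set(range(1, 121)), "P": set(range(116, 136))}

print(total_games(disjoint))     # 140
print(total_games(overlapping))  # 135
```

Season totals by role cannot tell these two cases apart, which is why a separate games-played figure is needed.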

> I generally understand the rules to dictate as
> follows:
>

• Message 1 of 28 , Aug 16, 2003
2 cents:

First, we always seem to lose momentum about this time of year, but
it always picks up as soon as the season ends.

Second, with the season end only 6 weeks away, I would suggest the
next big, important issue coming up is getting 2003 data into the
database as quickly as possible once the season ends.

Accordingly, for the next 6 weeks, I think the focus should be on
getting the existing data elements for 1871-2002 "in order" and ready
to have the 2003 data added and a new version created. This means
correcting known errors, finalizing any design changes and table
additions, etc. and not focusing so much on "new" data to add at this
time.

So, if anyone has specific items that need to be addressed, please post
them. If it's a design/database issue, someone else will have to
tackle it, but if it's a data issue, I will volunteer to be one of
the people trying to shore up the data in the next 6 weeks.

THANKS,
Kevin

• Message 1 of 28 , Aug 16, 2003
Sorry I misunderstood. Yes, I guess this is an issue. Also, I'd be interested in a
breakout of games as a DH, PH, and PR.

Would a games played by position table do the trick? It could have Total Games,
Games as a P, C, 1B, 2B, 3B, SS, LF, CF, RF, DH, PH, and PR. Since we already have a
separate table for outfield games, it could go there.
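
[Editor's sketch] A games-played-by-position table along these lines might look like the following, built here with Python's standard sqlite3 module. The table name, the column names, and the one-row-per-player-season key are assumptions for illustration, not an agreed BDB schema, and the inserted row is made up.

```python
import sqlite3

# Position list taken from the proposal above; the G_ prefix is an assumption.
POSITIONS = ["P", "C", "1B", "2B", "3B", "SS", "LF", "CF", "RF", "DH", "PH", "PR"]


def create_games_by_position(conn):
    """Create a hypothetical GamesByPosition table, one row per player-season."""
    cols = ", ".join(f'"G_{pos}" INTEGER' for pos in POSITIONS)
    conn.execute(
        "CREATE TABLE GamesByPosition ("
        "playerID TEXT NOT NULL, yearID INTEGER NOT NULL, "
        f"G_total INTEGER NOT NULL, {cols}, "
        "PRIMARY KEY (playerID, yearID))"
    )


conn = sqlite3.connect(":memory:")
create_games_by_position(conn)

# Made-up row. Total games is stored directly rather than summed from the
# positional columns, because one game can count under several positions.
# Columns left out of the INSERT stay NULL, meaning "not recorded".
conn.execute(
    'INSERT INTO GamesByPosition (playerID, yearID, G_total, "G_P", "G_LF", "G_RF") '
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("example01", 1919, 130, 20, 100, 20),
)

row = conn.execute(
    'SELECT G_total, "G_P", "G_PH" FROM GamesByPosition WHERE playerID = ?',
    ("example01",),
).fetchone()
print(row)  # (130, 20, None)
```

Leaving a column NULL rather than 0 keeps "no games at this position" distinct from "count not available", which matters for seasons where PH and PR data are missing.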

• Message 1 of 28 , Aug 16, 2003
Agreed. My goal will be to do a release of the Access version of the
database by mid-November. At that point, we should have been able to
integrate 2003 playing stats, postseason data, and award winners. As Kevin
suggests, any other data elements that are going to be added really need to
get wrapped up over the next 4-6 weeks. If we need to have some discussion
about what those specific elements are, let's do it. Last year, some people
expressed disappointment that some pieces weren't incorporated, and I'd like
to avoid that. Also, if there are going to be any design changes, I'd like
to give a heads up to the folks that maintain third-party applications.

Regards,
Sean Lahman

• Message 1 of 28 , Aug 16, 2003
> From: Jeff Burk [mailto:arkyvaughan@...]
> Would a games played by position table do the trick? It could
> have Total Games,
> Games as a P, C, 1B, 2B, 3B, SS, LF, CF, RF, DH, PH, and PR.
> Since we already have a
> separate table for outfield games, it could go there.

I think this is an elegant way to address the problem, and I think a table
with this data would add value by making possible any number of other
queries. I would say that the data for games played as a PR or PH is
not available for all seasons, as far as I know.

Regards,
Sean Lahman
• Message 1 of 28 , Aug 16, 2003
--- Sean Lahman <slahman@...> wrote:
> Agreed. My goal will be to do a release of the
> Access version of the
> database by mid-November. At that point, we should
> have been able to
> integrate 2003 playing stats, postseason data, and
> award winners.
> Last year, some people
> expressed disappointment that some pieces weren't
> incorporated, and I'd like
> to avoid that. Also, if there are going to be any
> design changes, I'd like
> to give a heads up to the folks that maintain
> third-party applications.
>

Design changes were last described on, I believe, Jan
31, 2003. Going forward, I think that should be
incorporated.

Kevin's park data is the huge thing to deliver this
year. It's a remarkable database that needs to be
data.

A project plan should be issued if a mid-Nov target
date is required.

If the new design will be used, then you can count me
in for about 2-3 hours / week. Otherwise, I will
stand aside.

Finally, licensing should be discussed. I believe BDB
currently shows no disclaimer. KJOK has contributed
his great park data, Mike has contributed new data.
Michael W the Japan data. Tom Lewis and I produced
the normalized DB design.

Currently, Sean Lahman sells the database (with 90% of
the data probably originating from him), but incurs
all the bandwidth costs as well. Sean Forman sells ad
space at br.com, and again, incurs bandwidth costs,
plus the costs at bdb.com . I don't know where
Sinins, BP, and others get their data, but they also
profit from the data.

With all this new data, things aren't so cut/dried. I
know Sean/Sean mentioned various licensing models (GPL
and whatnot). This should be established for the
next release.

Tom

• Message 1 of 28 , Aug 16, 2003
KJOK-san wrote:

>2 cents:
>
>First, we always seem to lose momentum about this time of year, but
>it always picks up as soon as the season ends.
>
>[...]
>
>So, if anyone has specific items that need to be addressed, please post
>them. If it's a design/database issue, someone else will have to
>tackle it, but if it's a data issue, I will volunteer to be one of
>the people trying to shore up the data in the next 6 weeks.
>

I plan to start working toward integrating my Japanese baseball data
with the BDB once the season ends, so the main thing I'm concerned with
is having the latest table schema set by that time. I'm very interested
in what the DB Design Committee has decided on for the next version of
the BDB (more numeric primary keys?), with Parks and other new tables
included.

From what I've gathered, the Access version (to which I don't have
access since I'm all UNIX-like) contains the most recent schema. I'm
still using the MySQL version from December of 2002. When will the
latest schema (and data) be available in mysqldump format?

Also, I was wondering if the DB Design Committee is on a different
channel (mailing list). Not much in the way of design is discussed
here, but that's where much of my interest is.

--
Michael Westbay
http://JapaneseBaseball.com
• Message 1 of 28 , Aug 16, 2003
--- In baseball-databank@yahoogroups.com, Michael Mavrogiannis
<mmavrogi@o...> wrote:
> In the PitchingPost table, the fields GS, SHO, HR, and BAOpp are missing for
> most years, with sporadic values sprinkled in to confuse researchers. Did
> you know that Buck Becannon and Andy Pettitte started, apparently unopposed,
> the sole post-season games of the 1884 and 1995 seasons? You would if you
> were to believe the dirty lyin' database. As recently as 1999, there were an
> odd number of games started in the post-season.
>
> Hope this helps.

I've mostly concentrated on the 'regular season' tables, and had
never really looked at PitchingPost, but you're right, it's weak -
doesn't have GS, or even Runs Allowed, HR's allowed, etc. which are
all available for post season games (at least 20th century ones). If
Sean will indulge me, I'd like to create a new PitchingPost table
that would have all of these plus BFP, GF, IBB, WP, GDP and even SH
and SF allowed, and would replace completely the current PitchingPost
Table.

THANKS,
Kevin
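
[Editor's sketch] The "odd number of games started" observation suggests a simple sanity check for PitchingPost: every game has exactly two starting pitchers, so the GS total for a postseason should be even. A minimal Python version follows, assuming rows arrive as (yearID, GS) pairs with None for a missing GS. The function name and the sample values are invented for illustration.

```python
from collections import defaultdict


def suspicious_gs_years(rows):
    """Return the years whose recorded postseason GS total is odd.

    rows: iterable of (yearID, GS) pairs; GS may be None when missing.
    """
    totals = defaultdict(int)
    for year, gs in rows:
        if gs is not None:
            totals[year] += gs
    return sorted(year for year, total in totals.items() if total % 2 == 1)


# Invented sample rows, not real PitchingPost data.
rows = [
    (1995, 1),                 # a lone start with no opposing starter
    (1998, 2), (1998, 2), (1998, None),
    (1999, 2), (1999, 3),
]
print(suspicious_gs_years(rows))  # [1995, 1999]
```

An even total is necessary but not sufficient, and years where GS is missing for every pitcher never show up here, so this only catches the "sporadic values" case.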
• Message 1 of 28 , Aug 17, 2003
--- Michael Westbay
<westbaystars@...> wrote:
> Also, I was wondering if the DB Design Committee is
> on a different
> channel (mailing list). Not much in the way of
> design is discussed
> here, but that's where much of my interest is.
>

Michael,

Tom Lewis and I are the only members, and we delivered
the design in Jan, and we have not made any
changes/discussions since.

Starting a separate yahoo group sounds like a good
idea, and I'll send out a note Monday to that effect.
Anyone wishing to make contributions to the design
will be invited.

Things like keys and the like will be discussed
prominently, I'm sure.

I'll also look for past threads on this issue, so that
we'll all be on the same page.

Tom

• Message 1 of 28 , Aug 17, 2003
In response to Sean's email regarding roadmap and priorities, the
message below is the "section titles" of the "TO DO" list of
January. I would say this is what we should strive for.

As for the Nov. 30 launch date of the next release, this should contain
the latest BDB design, including the park data.

Tom

> > ==========================
> > PRIORITY 1 - Handle ASAP
> > Type:
> > (1) any item that contains errors in data
> > (2) data that existed, and is now missing
> > ==========================
> > ==========================
> > PRIORITY 2 - Handle Very Very Soon
> > Type:
> > (1) Organizational, procedural items for handling BDB
> > ==========================
> > ==========================
> > PRIORITY 3 - Handle Very Soon
> > Type:
> > (1) Primary Tables
> > ==========================
> > ==========================
> > PRIORITY 4 - Handle Soon
> > Type:
> > (1) Design Issues, normalization, keys
> > (2) XREF to other databases
> > (3) Standards
> > ==========================
> > ==========================
> > PRIORITY 5 - Handle At Some point
> > Type:
> > (2) Secondary (tertiary?) data/tables
> > ==========================
> > ==========================
> > PRIORITY 6 - Unknown
> > ==========================
• Message 1 of 28 , Nov 17, 2004
Did we ever resolve if the baseball-databank data was going to
contain games played information? Right now, there's no way from
the 2004 data that was recently posted to determine how many
games many AL pitchers actually appeared in. I know we discussed
this before, but this is a pretty basic piece of information to
be missing, and I can't believe that the resolution was to keep
the batting tables the way they are. At one point someone
suggested having an additional table to contain this single
piece of information, but this seems like a particularly inelegant
solution to this problem, especially when a much better and much
more obvious solution (use the games field in the batting table for
this) exists.

Tom Ruane
• Message 1 of 28 , Nov 17, 2004
I think I understand the concern. It is that pitchers are not credited with
a game played in the batting table if they appeared in a game where the
designated-hitter was in effect. In most cases, this isn't a problem
because if a pitcher only appeared in games as a pitcher, the fielding table
and the pitching table give a full and accurate report. The problem (I
believe) comes when an American League pitcher made appearances as a
pinch-hitter or pinch-runner.

What solution are you proposing, Tom? Is it that the batting table would
credit an AL pitcher for a game played if the DH was in effect? So then, a
pitcher who pinch hit in 2 games would have 30 games in the pitching table
and 32 in the batting table? (whereas now we show him with 30 in pitching
and 2 in batting).

If this is the case, can you update the batting table for the DH-era
pitchers?

Regards,
Sean Lahman

• Message 1 of 28 , Nov 17, 2004
I believe we said that the official league stats
actually use the "G" field in the batting record to
denote all games played, regardless of whether the
player ever batted.

My preferred solution is to have an APPEARANCES table,
where you'd have things like: Start, Pinch Hit, Pinch
Run, Field Sub. If we remember, the objective for the
BDB (eventually) is for it to be a DB geared for a DB
developer. That DB developer would generate a DB for
the user that he'd like to see, most notably, having
the "G" the way everyone is used to.

Tom
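
[Editor's sketch] One possible reading of the APPEARANCES idea is a season row that counts games by how the player entered each one. Under that reading (an assumption, as are all the names below), a player enters a given game exactly once, so the four counts do not overlap and the familiar "G" is simply their sum. The sample row reuses the pitcher from earlier in the thread, with 30 pitching games and 2 pinch-hitting games; the split between starts and relief appearances is made up, and relief appearances are counted as field substitutions.

```python
from dataclasses import dataclass


@dataclass
class Appearances:
    """Hypothetical season row: games counted by how the player entered."""
    player_id: str
    year: int
    start: int = 0
    pinch_hit: int = 0
    pinch_run: int = 0
    field_sub: int = 0

    def games(self) -> int:
        # Entry types are mutually exclusive per game, so the sum is the
        # number of distinct games played.
        return self.start + self.pinch_hit + self.pinch_run + self.field_sub


# 30 games pitched (10 starts + 20 in relief, an invented split) plus 2 as a
# pinch hitter in games where he did not otherwise appear.
pitcher = Appearances("example01", 2004, start=10, pinch_hit=2, field_sub=20)
print(pitcher.games())  # 32
```

A developer building a user-facing database could then publish `games()` as the conventional "G" column while keeping the breakdown available for other queries.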

• Message 1 of 28 , Nov 17, 2004
Tangotiger wrote:
> I believe we said that the official league stats
> actually use the "G" field in the batting record to
> denote all games played, regardless of whether the
> player ever batted.
>
> My preferred solution is to have an APPEARANCES table,
> where you'd have things like: Start, Pinch Hit, Pinch
> Run, Field Sub. If we remember, the objective for the
> BDB (eventually) is for it to be a DB geared for a DB
> developer. That DB developer would generate a DB for
> the user that he'd like to see, most notably, having
> the "G" the way everyone is used to.
>
> Tom

What do we need to get this table set up?

--
Sincerely,
Sean Forman

Baseball Stats! http://www.Baseball-Reference.com/
• Message 1 of 28 , Nov 17, 2004
--- Sean Forman <sean-forman@...>
wrote:

>
>
> What do we need to get this table set up?
>
> --

Let me go through the archives (tomorrow), as I seem
to remember several good suggestions by the members
here. I can write a script that can generate the
APPEARANCES table. I'll have to also look for Michael
Mavrogiannis' notes on the matter, regarding the
differing use of "G" pre/post 1996, and probably
necessitating multiple scripts.

Tom

• Message 1 of 28 , Nov 18, 2004
Sean Lahman wrote:

> What solution are you proposing, Tom? Is it that the batting
> table would credit an AL pitcher for a game played if the DH
> was in effect? So then, a pitcher who pinch hit in 2 games
> would have 30 games in the pitching table and 32 in the batting
> table? (whereas now we show him with 30 in pitching and 2 in
> batting).

Exactly. If you want to include a field indicating how many
times a player actually appeared in the lineup (as opposed to
appearing in the game), I would [...]
field for it. The games field in the batting table should
always equal the number of games played.

> If this is the case, can you update the batting table for the
> DH-era pitchers?

Last May I posted this data to the files section of the group.
It's in the "games" field of the ofdata.txt file (within the
newdat.zip file posted May 10th). I don't have the 2004 data
yet (actually, I had been hoping to get it from you :-)), but
I should have the 2004 event files shortly and can generate it
then.

Thanks.
Tom Ruane
• Message 1 of 28 , Nov 18, 2004
Thanks, Tom. Let me take a look at the file you posted, but I think I
should be able to provide the 2004 data.

Regards,
Sean Lahman

