## Statistical Difference in HRs hit

Message 1 of 2, Nov 19, 2007
Hi guys,

I apologize for the stat-based question. If this is not the
appropriate place, if you could point me in the right direction I
would greatly appreciate it.

I have had some statistics courses a while back, but I am trying to
figure out what statistical method should be used if you wanted to
evaluate if there is a difference in the number of home runs being
hit. A basic example would be the number of home runs hit in 2007
versus 2006. Extending it further, you could look at a team relative
to the remainder of the league.

From what I understand, the number of HRs hit could be modeled as a
continuous variable (basic tests of means), but I also look at it as
a discrete variable. What tests should you use if you were to look at
the difference between two counts? What about if you looked at it as
a rate (HR/game)? Can you perform hypothesis tests about a rate?
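One common way to test the rate question (a technique not discussed elsewhere in this thread) is the conditional test for two Poisson rates. If each sample's home-run count is treated as Poisson with mean rate × games, then under the hypothesis of equal HR/game rates, the first count given the combined total is binomial with success probability games1 / (games1 + games2). A minimal sketch, assuming that Poisson model; the function names and the counts are hypothetical:

```python
import math

def binom_pmf(i, n, p):
    """Binomial probability P(X = i), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)

def binom_two_sided_p(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance so floating-point ties still count
    return min(1.0, sum(q for q in probs if q <= cutoff))

def compare_hr_rates(hr1, games1, hr2, games2):
    """Test H0: the HR-per-game rate is the same in two samples.

    Assumes each HR count is Poisson with mean rate * games. Under H0, given
    the combined total n = hr1 + hr2, hr1 ~ Binomial(n, games1 / (games1 + games2)).
    """
    n = hr1 + hr2
    p0 = games1 / (games1 + games2)
    return binom_two_sided_p(hr1, n, p0)

# Hypothetical example: 180 HR in 162 games versus 150 HR in 162 games.
print(round(compare_hr_rates(180, 162, 150, 162), 3))
```

If SciPy is available, `scipy.stats.binomtest(hr1, n, p0)` should give essentially the same two-sided p-value as the hand-written helper.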

I have seen a lot of great work applying statistics to baseball data
on a number of blogs, so I am wondering what method would be
appropriate. I feel like I have just enough exposure to stats to
know that you can test for significant differences, but I am not sure
what methods are appropriate.

Again, I apologize if this post is beyond the scope of this forum.

- Brock
Message 2 of 2, Nov 20, 2007
Brock:
There are a number of ways to skin your cat. One of the oldest and
easiest to understand is the "chi-square contingency table".
In your case you would construct a table with two columns, "number of
homers" and "number of non-homers". Homers plus non-homers might
equal the total number of hits, or perhaps at-bats, etc. The rows of
this table are labeled 2006 and 2007, respectively.
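A minimal sketch of the table described above. The helper name `build_table` is hypothetical, the counts are made-up illustrative numbers (not actual 2006/2007 totals), and it assumes the denominator is total hits, one of the options mentioned above:

```python
def build_table(season_totals):
    """Build a contingency table with one row per season and two columns,
    [homers, non_homers], where non_homers = total hits - homers."""
    labels, rows = [], []
    for season, (homers, hits) in season_totals.items():
        if not 0 <= homers <= hits:
            raise ValueError(f"{season}: homers must be between 0 and total hits")
        labels.append(season)
        rows.append([homers, hits - homers])
    return labels, rows

# Made-up counts for illustration only; these are NOT real MLB totals.
labels, table = build_table({"2006": (100, 1000), "2007": (80, 1000)})
for label, row in zip(labels, table):
    print(label, row)
```

Using at-bats instead of hits only changes what is passed as the second number in each pair.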

You then sum across each row; the sums represent the total number of
homers plus non-homers for each year. You then sum down each column; this
yields the number of homers hit in both years combined and the number of
non-homers in both years combined.
These sums are known as "marginal totals" because they can be entered at
the end of every row and every column. If we now sum all of the row
totals (or column totals), we obtain the grand total of all homers plus
non-homers. This total is usually entered in the lower right corner of the table.
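The marginal totals can be sketched the same way. `marginal_totals` is a hypothetical helper, and the table holds made-up counts as a list of rows (columns are homers, non-homers):

```python
def marginal_totals(table):
    """Return (row_totals, column_totals, grand_total) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]          # one total per season
    col_totals = [sum(col) for col in zip(*table)]    # homers, non-homers across seasons
    grand_total = sum(row_totals)                     # equals sum(col_totals) as well
    return row_totals, col_totals, grand_total

# Hypothetical 2006/2007 table: columns are [homers, non_homers].
table = [[100, 900], [80, 920]]
print(marginal_totals(table))  # -> ([1000, 1000], [180, 1820], 2000)
```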

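From the marginal totals you can then calculate the expected count for
each cell (row total × column total / grand total), compare the expected
counts with the observed counts to get the chi-square statistic, and look
up that value in a chi-square table with (rows - 1) × (columns - 1)
degrees of freedom to determine whether the homers differ significantly.
You can add as many rows as you like (e.g., several years) and calculate
chi-square the same way.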
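A self-contained sketch of the full calculation (expected counts, Pearson chi-square statistic, degrees of freedom, and a p-value computed directly instead of read from a printed table), using only the Python standard library. The function names and counts are hypothetical; the p-value helper assumes an integer number of degrees of freedom, which always holds for a contingency table:

```python
import math

def chi2_sf(x, df):
    """P(chi-square with integer df >= x), via the upper regularized gamma Q(df/2, x/2)
    and the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q

def chi_square_test(table):
    """Pearson chi-square test for an r x c table of counts (rows = seasons)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the HR share were the same in every row.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts; rows are seasons, columns are [homers, non_homers].
stat, df, p = chi_square_test([[100, 900], [80, 920]])
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```

Adding a season is just adding a row; the degrees of freedom grow accordingly. For comparison, `scipy.stats.chi2_contingency(table, correction=False)` computes the same uncorrected statistic; by default SciPy applies Yates' continuity correction when df = 1, which gives a somewhat larger p-value. A common rule of thumb is that the chi-square approximation is reasonable when expected counts are at least about 5 per cell.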

If you want to know the logic behind chi-square and the formula, send
me an email. Otherwise you can look it up online or in a stats book.

Bob Ehrlich
