May 31, 2004

--- In APBR_analysis@yahoogroups.com, igor eduardo küpfer
<edkupfer@r...> wrote:
> Okay, I ran the test again, this time using 03-04 results. Before I show you
> what I got, let me address a couple of things.

>
> Dean Oliver wrote:
> > --- In APBR_analysis@yahoogroups.com, igor eduardo küpfer
> > <edkupfer@r...> wrote:
> >>>> I'm not quite sure what endogenous means. If it means being
> >
> > I should note that, not being an economist, I like throwing this word
> > around without as great an appreciation or understanding for it as I
> > should.
>
> Hell, that's nothing. Once during the course of an argument with an
> ex-girlfriend I used the word "heretofore." I still don't know what it
> means.

I've had those moments, often inspired by arguments with soon-to-be
ex-girlfriends. What the hell is "vis-a-vis"?

> >
> >> You'll have to help me out here, as I don't know anything about
> >> transforming data. Do you mean include Days^2 in addition to Days or
> >> instead of Days? I did both, and here's how they turned out:
> >
> > Include both, which you did in the first set below. Looks like it got
> > you significant on both days and days^2. And the signs are as
> > expected. It suggests optimal rest at about 3 days, longer than the 2
> > days we saw before. (Potentially important for the talk about rust vs
> > rest, esp if the Lakers wrap up on M.)
>
> Questions: I don't understand a couple of things about the squared
> term. How did you know that squaring the Days variable would give a better
> fit? And, just exactly how does it suggest the optimal 3-day rest?

I didn't _know_ it would give a better fit. I hoped it would because
of what we were observing -- that there was an optimal number of days
off. The only way to get an optimum out of a regression is to throw
in higher-order terms. Usually a squared term is plenty. It doesn't
answer the bigger question of whether teams get rusty, though. It
suggests an answer (another lesson in how to lie with statistics), one
that I wouldn't trust from this study.
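Here's the calculus behind "throw in a squared term," as a sketch. Write b_1 and b_2 for the Days and Days^2 coefficients (labels introduced here, not from the thread) and look only at the part of the prediction those two terms contribute:

```latex
\Delta(D) = b_1 D + b_2 D^2, \qquad
\frac{d\Delta}{dD} = b_1 + 2 b_2 D = 0
\quad\Longrightarrow\quad
D^{*} = -\frac{b_1}{2 b_2} \qquad (\text{a maximum when } b_2 < 0)
```

With b_2 = 0 the derivative is just the constant b_1, so the fitted effect rises or falls forever and there is no optimum to find. That's why the squared term is what makes an interior optimum possible.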

Look at the results of your regression. Take just the Days and Days^2
coefficients and calculate the marginal net points those terms
contribute for Days = 1, 2, 3, 4, etc. You'll see a max at 3.
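A minimal sketch of that calculation in Python. It uses the Days and Days^2 coefficients from the 03-04 output quoted later in this post, since the coefficients from the earlier fit aren't reproduced here; the function name is just for illustration.

```python
# Net points contributed by the Days terms alone, using the Days and Days^2
# coefficients from the 03-04 regression output quoted in this thread.
B_DAYS = 0.7221
B_DAYS_SQ = -0.1216


def days_contribution(days: int) -> float:
    """Marginal net points from B_DAYS * days + B_DAYS_SQ * days**2."""
    return B_DAYS * days + B_DAYS_SQ * days ** 2


if __name__ == "__main__":
    for d in range(1, 7):
        print(f"Days = {d}: {days_contribution(d):+.3f} pts")
    best = max(range(1, 7), key=days_contribution)
    print("max at Days =", best)
```

With these numbers the contributions run about +0.60, +0.96, +1.07, +0.94 and +0.57 points for Days = 1 through 5, so the peak is at Days = 3. Since Days = 1 is a back-to-back, that corresponds to 2 days off.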

> > Let me also ask -- is Days = 0
> > if a team plays back to back nights or is that Days = 1?
>
> The latter. I am subtracting game dates from each other.

So a max at Days = 3 means 2 days of rest is optimal.
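For reference, here is the Days variable as Igor describes it (subtracting game dates, so back-to-back nights give Days = 1), as a small hypothetical Python sketch; the function name and example dates are made up for illustration.

```python
# Sketch of the Days variable as described: subtract consecutive game
# dates, so back-to-back nights give Days = 1 (0 days of rest).
from datetime import date
from typing import List


def days_between_games(game_dates: List[date]) -> List[int]:
    """Days since the previous game, for every game after a team's first."""
    ordered = sorted(game_dates)
    return [(cur - prev).days for prev, cur in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    # Hypothetical schedule, not real game dates.
    schedule = [date(2004, 1, 2), date(2004, 1, 3), date(2004, 1, 6)]
    print(days_between_games(schedule))  # [1, 3]
```

The season opener has no previous game, so it needs its own rule (drop it or assign a fixed value); the thread doesn't say which was used. Days of rest in the everyday sense is Days - 1.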

>
> > I'm sure there are other ways to manipulate things, but this looks
> > like a pretty good thing. I'm saving it.
> >
> > Home is a binary 1/0 indicator for home/road, resp?
>
> Yes.
>
> Okay. Here are the results for 03-04. For the Matchup Probability, I used
> the team records heading into the game. For example, for two teams playing
> their first games of the season, I would use 0-0 records for each team in my
> probability calculation.

I was curious to see how you handled the early games of the season,
especially the times when one team was undefeated. It looks like you
used Pythagorean projections, rather than real records anyway. That
helps. But 0-0 usually requires some other assumption, like a
Bayesian prior that carries through the first few games.
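One way to set up that kind of prior, as a hypothetical sketch: pad every record with PRIOR_GAMES games of .500 ball (the posterior mean under a Beta prior centered on .500), then combine the two teams with Bill James's log5 formula. Both PRIOR_GAMES = 10 and the use of log5 are assumptions for illustration; the thread doesn't say exactly how the matchup probability was computed.

```python
# Hypothetical sketch: shrink early-season records toward .500, then turn two
# teams' estimates into a matchup probability with the log5 formula.
# PRIOR_GAMES and the choice of log5 are assumptions, not from the thread.
PRIOR_GAMES = 10.0


def regressed_pct(wins: int, losses: int, prior_games: float = PRIOR_GAMES) -> float:
    """Winning pct after adding prior_games of .500 ball (Beta-prior mean)."""
    return (wins + prior_games / 2) / (wins + losses + prior_games)


def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B, given their winning pcts."""
    num = p_a - p_a * p_b
    den = p_a + p_b - 2 * p_a * p_b
    return num / den if den else 0.5  # den is 0 only if both are 0 or both are 1


if __name__ == "__main__":
    print(log5(regressed_pct(0, 0), regressed_pct(0, 0)))
    print(log5(regressed_pct(5, 0), regressed_pct(0, 5)))
```

Two 0-0 teams come out at 0.5 instead of an undefined 0/0, and a 5-0 team facing an 0-5 team gets 0.8 rather than the certain win that raw records would give. How much weight the prior deserves is an empirical question.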

> Interestingly, this doesn't seem to affect the
> regression results too much. The effect of Days between games is reduced in
> this sample. Weird.

Not sure what to make of that weakening of the Days effect. What was the R2
of the previous version? We may have to improve the prior matchup P
to get back a reasonable estimate of the value of Days. If you just
look at games beyond the first 20 in the season, does R2 get better
and does Days become more significant?
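A rough sketch of that check in plain Python. It assumes each game is a row of (game_no, home, distance, win_prob, days, pts_diff), where game_no is the team's game number in the season. The row layout and function names are hypothetical, and the normal-equations solver is only a stand-in for whatever stats package produced the output in this thread.

```python
# Hypothetical sketch: refit PtsDiff on Home, Distance, WinProb, Days and
# Days^2 using only games after each team's first `min_game` games.
# Assumed row layout: (game_no, home, distance, win_prob, days, pts_diff).
from typing import List, Sequence, Tuple

Row = Tuple[int, float, float, float, float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_after(rows: Sequence[Row], min_game: int = 20) -> Tuple[List[float], float]:
    """OLS via the normal equations on rows with game_no > min_game.
    Returns ([const, home, distance, win_prob, days, days^2], R^2)."""
    kept = [r for r in rows if r[0] > min_game]
    if len(kept) <= 6:
        raise ValueError("need more games than coefficients")
    xs = [[1.0, h, dist, wp, d, d * d] for _, h, dist, wp, d, _ in kept]
    ys = [r[5] for r in kept]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return beta, 1.0 - ss_res / ss_tot
```

Calling fit_after(rows) and fit_after(rows, min_game=0) on the same data puts the R^2 and the Days/Days^2 coefficients side by side. A real stats package would also give standard errors and P values, which this sketch doesn't compute.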

DeanO

>
> PtsDiff = -13.6 + 7.31 Home + 0.000027 Distance + 18.1 WinProb
>           + 0.722 Days - 0.122 Days^2
>
> Predictor        Coef    SE Coef       T      P
> Constant      -13.582      1.173  -11.58  0.000
> Home           7.3056     0.5010   14.58  0.000
> Distance    0.0000269  0.0003734    0.07  0.943
> WinProb        18.054      1.163   15.53  0.000
> Days           0.7221     0.7202    1.00  0.316
> Days^2        -0.1216     0.1138   -1.07  0.286
>
> S = 11.48   R-Sq = 16.7%   R-Sq(adj) = 16.6%
>
> Analysis of Variance
>
> Source            DF      SS     MS      F      P
> Regression         5   62072  12414  94.19  0.000
> Residual Error  2343  308806    132
> Total           2348  370877
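As a sanity check on the output above, the summary lines can be rebuilt from the ANOVA table: S is the square root of the residual mean square, F is the ratio of the mean squares, and each T is Coef divided by SE Coef. A short sketch with the numbers copied from the 03-04 table:

```python
# Rebuild the regression summary statistics from the quoted ANOVA table.
import math

SS_REG, DF_REG = 62072.0, 5
SS_RES, DF_RES = 308806.0, 2343
SS_TOT, DF_TOT = 370877.0, 2348

ms_reg = SS_REG / DF_REG                    # mean square, regression
ms_res = SS_RES / DF_RES                    # mean square, residual
f_stat = ms_reg / ms_res                    # F
s = math.sqrt(ms_res)                       # S: residual std. error, in points
r_sq = SS_REG / SS_TOT                      # R-Sq
r_sq_adj = 1 - ms_res / (SS_TOT / DF_TOT)   # R-Sq(adj)

# T ratios for the two Days terms (Coef / SE Coef)
t_days = 0.7221 / 0.7202
t_days_sq = -0.1216 / 0.1138

if __name__ == "__main__":
    print(f"F = {f_stat:.2f}, S = {s:.2f}")
    print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
    print(f"T(Days) = {t_days:.2f}, T(Days^2) = {t_days_sq:.2f}")
```

It reproduces F = 94.19, S = 11.48, R-Sq = 16.7% and R-Sq(adj) = 16.6%, plus T ratios of about 1.00 and -1.07 for Days and Days^2. T ratios that close to 1 are why those P values sit around 0.3: in this sample the Days terms can't be distinguished from zero.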

Dean Oliver
Author, Basketball on Paper
http://www.basketballonpaper.com

"Excellent writing. There are a lot of math guys who just rush from
the numbers to the conclusion. . .they'll tell you that Shaq is a real
good player but his team would win a couple more games a year if he
could hit a free throw. Dean is more than that; he's really
struggling to understand the actual problem, rather than the
statistical after-image of it. I learn a lot by reading him." Bill
James, author Baseball Abstract