## RE: [ai-geostats] Why degree of freedom is n-1

Message 1 of 7 , Aug 25, 2005

Reza,

If you are just asking why n-1 appears in the formula commonly found in stat books for computing the sample variance s^2, it is so that we have an unbiased estimate of the population variance; see a good calculus-based probability and statistics book.

Other estimation methods (e.g., maximum likelihood) divide by n instead of n-1.

Oh, while the n-1 does make the sample variance s^2 an unbiased estimate of the population variance sigma^2, taking the square root to get the sample standard deviation s does not result in an unbiased estimate of the population standard deviation sigma. Another reason some prefer m.l.e.
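
A small simulation can make the three claims above concrete. This is only a sketch under assumed settings (normally distributed data with sigma = 2, samples of size n = 5, a fixed random seed, and variable names chosen for illustration): it averages s^2 with the n-1 divisor, s^2 with the n divisor, and s itself over many samples.

```python
import math
import random

random.seed(1)      # fixed seed so the run is repeatable (arbitrary choice)
sigma = 2.0         # assumed population standard deviation, so sigma^2 = 4
n = 5               # assumed sample size
trials = 20_000     # number of simulated samples

sum_unbiased = 0.0  # accumulates s^2 computed with divisor n-1
sum_mle = 0.0       # accumulates s^2 computed with divisor n
sum_s = 0.0         # accumulates s = sqrt(s^2 with divisor n-1)

for _ in range(trials):
    xs = [random.gauss(0.0, sigma) for _ in range(n)]
    xm = sum(xs) / n
    ss = sum((x - xm) ** 2 for x in xs)  # sum of squared deviations
    sum_unbiased += ss / (n - 1)
    sum_mle += ss / n
    sum_s += math.sqrt(ss / (n - 1))

mean_unbiased = sum_unbiased / trials
mean_mle = sum_mle / trials
mean_s = sum_s / trials

print("average s^2 with n-1:", mean_unbiased, "(target sigma^2 =", sigma ** 2, ")")
print("average s^2 with n  :", mean_mle)
print("average s           :", mean_s, "(target sigma =", sigma, ")")
```

Under these assumptions the first average should land close to sigma^2, the second should come out low by roughly the factor (n-1)/n, and the average of s should fall somewhat below sigma.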

Best,

Bill

--

William V Harper, Mathematical Sciences

Otterbein College, Towers Hall 139, 1 Otterbein College

Westerville OH 43081-2006   USA

Office phone: 614-823-1417     Office Fax 614-823-3201

Faculty page: http://www.otterbein.edu/home/fac/WLLVHRPR

For the best in geostatistics: http://geoecosse.hypermart.net/

From: Reza Nazarian [mailto:rnazarian@...]
Sent: Thursday, August 25, 2005 3:23 PM
To: ai-geostats@...
Subject: [ai-geostats] Why degree of freedom is n-1

Dear Experts
Sorry, maybe the question is too basic. After searching my statistics books for an answer with no great success, could you please explain to me why we take the degrees of freedom to be n-1 when calculating the variance? Thanks for your kind advice.

Very Best Regards
Reza Nazarian
Schlumberger Information Solutions
SONILS Oil Services Centre, Porto de Luanda , Angola

(Via UK : +44 (0)207 576 6306
* rnazarian@...
http://www.sis.slb.com

Message 2 of 7 , Aug 25, 2005
Reza:

With n samples, you start with n degrees of freedom.
One degree of freedom is used (or taken) by the calculation of the mean.
So when you calculate the variance or the standard deviation, you only have n-1 degrees of freedom left.
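
One way to see the "used up" degree of freedom is that the deviations from the sample mean always sum to zero, so any one of them can be recovered from the other n-1. A minimal sketch with made-up data (the four values and the variable names are purely illustrative):

```python
data = [4.0, 7.0, 1.0, 8.0]   # hypothetical sample, n = 4
n = len(data)
mean = sum(data) / n           # 5.0

# Deviations from the sample mean; these are what the variance is built from.
deviations = [x - mean for x in data]    # [-1.0, 2.0, -4.0, 3.0]

# They are constrained to sum to zero ...
total = sum(deviations)

# ... so the last one is fixed once the first n-1 are known.
last_from_others = -sum(deviations[:-1])

print(deviations, total, last_from_others)
```

Only n-1 of the n deviations can vary freely, which is the sense in which the variance has n-1 degrees of freedom.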

Regards,

Federico


* By using the ai-geostats mailing list you agree to follow its rules
( see http://www.ai-geostats.org/help_ai-geostats.htm )

* To unsubscribe to ai-geostats, send the following in the subject or in the body (plain text format) of an email message to sympa@...

Signoff ai-geostats
Message 3 of 7 , Aug 25, 2005
Hello Reza,
I think it is because the variance needs the mean to be estimated first. When
you estimate the mean, you assume that the observations are i.i.d. (so
you divide by n). That no longer holds when you estimate the variance, because
the deviations depend on the mean you previously calculated, so one of them
is not free (you can get the value of any observation from your sample,
knowing all the others and the mean value, right?)
Hmm, hope it helps.
PS: Please correct me if I'm wrong somewhere in my explanation (I don't think so).
Greetings, Manuel

Message 4 of 7 , Aug 27, 2005
Dear Reza,

the proof goes as follows:

your question, why does one set the empirical variance
s^2 = sum((xi-xm)^2)/(n-1), where xm := sum(xi)/n (i=1...n) is
the estimated mean, is equivalent to asking: is s^2 an unbiased estimator
of the variance of the parent distribution, i.e. is E(s^2) = Var(x) ?
(E = expectation value)

First, remember, Var(x) := E((x-Ex)^2) = E(x^2) - (Ex)^2.

Now, (n-1)*s^2 = sum((xi-xm)^2) = (expand the square) =
sum(xi^2) - 2*xm*sum(xi) + n*xm^2 = (use sum(xi) = n*xm) =
sum(xi^2) - n*xm^2

Next take the expectation value of this,

(n-1)*E(s^2) = n*(E(x^2)-E(xm^2))

For independent, identically distributed xi, linearity of expectation gives
E(xm) = E(x), and independence gives Var(xm) = Var(x)/n. Therefore,

(n-1)*E(s^2) = n*(E(x^2) - (Ex)^2 + (Exm)^2 - E(xm^2)) ....(the 2nd and
3rd terms cancel, because Exm = Ex)

.... = n*((E(x^2)-(Ex)^2) - (E(xm^2)-(Exm)^2)) =

= n*(Var(x) - Var(xm)) = n*Var(x)*(1-(1/n)) = (n-1) * Var(x), or

E(s^2) = Var(x) q.e.d.

There are different (very similar) versions of this proof; this one
closely follows Roger Barlow, Statistics, John Wiley & Sons 1989 (chapter
5.2.2.), which I find a good introduction to basic statistics.
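
Two ingredients of the proof can be checked numerically: the algebraic identity sum((xi-xm)^2) = sum(xi^2) - n*xm^2, and the relation Var(xm) = Var(x)/n. The sketch below assumes a uniform parent distribution on [0, 1] (so Var(x) = 1/12), n = 4 and a fixed seed; these choices and the variable names are for illustration only and are not taken from Barlow.

```python
import random
import statistics

random.seed(0)       # arbitrary fixed seed for repeatability
n = 4                # assumed sample size
trials = 50_000      # number of simulated samples
var_x = 1.0 / 12.0   # variance of the uniform distribution on [0, 1]

max_gap = 0.0        # largest mismatch seen in the algebraic identity
means = []           # one sample mean xm per simulated sample

for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    xm = sum(xs) / n
    lhs = sum((x - xm) ** 2 for x in xs)         # sum((xi - xm)^2)
    rhs = sum(x * x for x in xs) - n * xm * xm   # sum(xi^2) - n*xm^2
    max_gap = max(max_gap, abs(lhs - rhs))
    means.append(xm)

# Spread of the sample means across samples, to compare with Var(x)/n.
var_xm = statistics.pvariance(means)

print("largest identity mismatch:", max_gap)
print("Var(xm) from simulation  :", var_xm)
print("Var(x)/n                 :", var_x / n)
```

The identity holds up to floating-point rounding for every sample, while the agreement between the simulated Var(xm) and Var(x)/n is only approximate and improves with more trials.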

best regards,
Peter

=================================================================
Dr. Peter Bossew
Division of Physics and Biophysics, University of Salzburg, Austria

home: A-1090 Vienna, Austria, Georg Sigl-Gasse 13/11, ph: +43-1-3177627
telefonino: +43-650-8625623
peter.bossew@...

=================================================================
Message 5 of 7 , Aug 29, 2005
Dear Madam/Sir,
I have to thank all of you for such great answers to my question on degrees of freedom. I have gone through all of them. I have also found an excellent explanation and proof in Practical Geostatistics 2000, written by Dr. Isobel Clark and ... (congratulations). I couldn't find the concepts taught in such depth anywhere else. Again, thank you, each and every one.
Very Best Regards
Reza


Message 6 of 7 , Aug 31, 2005
This follow-up is slightly off the subject of the mailing list, but
as a geologist, this is the only statistically-flavoured one I am
subscribed to. Therefore:

Federico Pardo <federico.pardo@...> said:
> Having N samples, and then n degrees of freedom.
> One degree of freedom is used (or taken) by the mean calculation.
> Then when you calculate the variance or the standard deviation, you only
> have left n-1 degrees of freedom.

Apart from the rigorous calculation, which I am aware of in this very case (cf.
Peter Bossew's contribution to the same thread, which details it) and which gives a
proof of this rule of thumb, what more or less rigorous statistical
developments give it consistency?

I mean, for the empirical correlation coefficient,
rhoXiYi = SUM_i=1..N( (x_i - mx).(y_i - my) / sx / sy ) / WHAT_NUMBER
what must WHAT_NUMBER be, for a kind of unbiased estimate ("a kind of" meaning
"possibly after some Fisher z-transform"...):
* N, for simplicity,
* N-2, as I have most frequently seen in books that dare to give this formula
(N points, minus 1 for position and 1 for dispersion?),
* or 2N-4 (2N for the (x_i,y_i), minus 4 for {mx,my,sx,sy}), as a
strict application of the rule of thumb seems to suggest?
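
A quick consistency check one could run on the formula above uses exactly linear data, for which the correlation coefficient has to equal 1. The sketch assumes that sx and sy are computed with the N-1 divisor (the post does not say which divisor they use) and takes made-up points on the line y = 2x + 1; it then tries several candidate values of WHAT_NUMBER, including N-1, which is not in the list above. It says nothing about bias, only about which divisor keeps the formula self-consistent.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical data
y = [2.0 * xi + 1.0 for xi in x]     # exactly linear in x, so r must be 1
N = len(x)

mx = sum(x) / N
my = sum(y) / N

# Assumption: sx and sy use the N-1 divisor.
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (N - 1))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (N - 1))

# The SUM in the formula, before dividing by WHAT_NUMBER.
standardized_sum = sum((xi - mx) * (yi - my) / sx / sy for xi, yi in zip(x, y))

# Try the candidate divisors from the list, plus N-1 for comparison.
r_by_divisor = {d: standardized_sum / d for d in (N, N - 1, N - 2, 2 * N - 4)}

for d, r in sorted(r_by_divisor.items()):
    print("WHAT_NUMBER =", d, "-> r =", r)
```

With sx and sy defined this way, only the N-1 divisor returns r = 1 for perfectly linear data; N-2 gives a value above 1, which a correlation coefficient cannot take. If sx and sy were instead computed with divisor N, the same argument would point to N.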

And what about, when fitting for instance a 3-parameter non-linear
function, reducing the number of degrees of freedom to N-3 (number of
points, minus one for each function parameter)? I have never read any kind
of explanation to support it, though it seems widely used.

Thanks in advance for any enlightenment, or simply for pointers to other sources of
explanation.
-- Éric L.
Message 7 of 7 , Sep 17, 2005
Dear Reza

I was away from my office for quite a while. Going through my mail folder, I
came across your enquiry. I thought it might be helpful to share the following
thoughts with you and other colleagues on the list.

I prefer to approach your question from another angle.

First, one has to acknowledge that almost all measurements are
corrupted by noise in one way or another. Furthermore, the standard deviation is a
measure of the uncertainty in a measurement. Keeping these points in mind, look
at the formula for the standard deviation (or, for that matter, the
variance) when you have only ONE measurement. If you use
the formula with n in the denominator, you get 0 for the standard
deviation, implying that your single measurement is exact and not corrupted by
noise, which is not true. The formula with n-1 in the
denominator, on the other hand, gives 0/0, which is indeterminate and so more
compatible with the premises mentioned above.
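
The single-measurement case can be written out as a short sketch. The helper `variance` and its `ddof` argument are hypothetical names introduced only for this illustration, and the 0/0 case is reported as NaN by choice.

```python
import math

def variance(data, ddof):
    """Sum of squared deviations divided by (n - ddof).

    ddof=0 gives the divisor n, ddof=1 gives the divisor n-1.
    A zero divisor is reported as NaN to mark the result as indeterminate.
    """
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    denom = n - ddof
    if denom == 0:
        # With n = 1 and ddof = 1 the numerator is also 0, i.e. 0/0.
        return math.nan
    return ss / denom

single = [7.3]                  # one hypothetical measurement
var_n = variance(single, 0)     # divisor n: claims zero spread
var_n1 = variance(single, 1)    # divisor n-1: 0/0, no information about spread

print(var_n, var_n1)
```

The n divisor reports a variance of 0 from a single value, while the n-1 divisor declines to give a number, which matches the argument above.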

Another useful question might be the origin of that equation, which has
something to do with the Normal probability distribution. The first chapter of
"Nonlinear Parameter Estimation" by Bard (1974) might be useful to refer
to, as he resorts to entropy to derive the Normal distribution and its
associated parameters.

Hope this helps.

Thanks
Abedini
