
2177[ai-geostats] Re: Why degree of freedom is n-1

  • Isobel Clark
    Aug 31, 2005
      Hi Eric
       
      What complications! You should find, in any basic text on statistical inference, that the correlation is divided by (n-1) and has (n-2) degrees of freedom.
       
      The logic behind this is that the correlation is actually calculated as the covariance divided by the two standard deviations.
       
      The covariance is calculated from n PAIRS of samples, not 2n individual observations and has (n-1) degrees of freedom because it uses the pair of means (m1,m2) as its centroid.
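The (n-1) divisor for the covariance can be checked by brute force. The following is only a sketch under stated assumptions: the four-pair `population`, the sample size `n = 3` and the helper `sample_cov` are invented for illustration and are not from the thread. It enumerates every ordered sample of n pairs drawn with replacement and averages the covariance estimates exactly, using fractions.

```python
from fractions import Fraction
from itertools import product


def sample_cov(pairs, divisor):
    """Sum of cross-products about the pair of means, divided by `divisor`."""
    n = len(pairs)
    mx = Fraction(sum(x for x, _ in pairs), n)
    my = Fraction(sum(y for _, y in pairs), n)
    return sum((x - mx) * (y - my) for x, y in pairs) / divisor


# Hypothetical toy population of (x, y) pairs, chosen only for illustration.
population = [(1, 2), (2, 1), (3, 5), (4, 3)]
n = 3  # number of PAIRS per sample

# Covariance of the population itself (divisor = population size).
pop_cov = sample_cov(population, len(population))

# Every ordered sample of n pairs drawn with replacement, all equally likely.
samples = list(product(population, repeat=n))

# Average of the estimates over all samples, for each choice of divisor.
mean_nm1 = sum(sample_cov(s, n - 1) for s in samples) / len(samples)
mean_n = sum(sample_cov(s, n) for s in samples) / len(samples)

print("population covariance :", pop_cov)
print("average with n-1      :", mean_nm1)
print("average with n        :", mean_n)
```

Under these assumptions the (n-1) version averages out to the population covariance, while the n version comes out too small by the factor (n-1)/n.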
       
      Dividing by the pair (s1,s2) loses you the other degree of freedom. Tests on the correlation have (n-2) degrees of freedom.
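A small numerical sketch may help here. The data values and the helper `pearson_r` are made up for illustration. It shows that the divisor cancels as long as the covariance and both standard deviations use the same one, so the coefficient itself is the same with n or (n-1). The (n-2) only enters when the coefficient is tested, here through the usual t statistic for a correlation.

```python
import math


def pearson_r(x, y, ddof):
    """Covariance divided by the two standard deviations.

    The same divisor (n - ddof) is used for all three quantities.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    d = n - ddof
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / d
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / d)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / d)
    return cov / (sx * sy)


# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 1.9, 3.8, 4.1, 6.2, 5.9]

r_nm1 = pearson_r(x, y, ddof=1)  # everything divided by n-1
r_n = pearson_r(x, y, ddof=0)    # everything divided by n

n = len(x)
df = n - 2  # degrees of freedom for a test on the correlation
t_stat = r_nm1 * math.sqrt(df / (1.0 - r_nm1 ** 2))

print("r with n-1:", r_nm1)
print("r with n  :", r_n)
print("t statistic:", t_stat, "on", df, "degrees of freedom")
```

The t statistic would then be compared with Student's t tables on n-2 degrees of freedom.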
       
      If you use (say) a regression relationship with 'k' coefficients including the constraint of the means, you lose k degrees of freedom. Any book which deals with 'Analysis of variance' will explain this for you. We use exactly this approach for testing a trend surface (see free tutorial at http://geoecosse.bizland.com/softwares or download my SNARK (1977) paper from http://uk.geocities.com/drisobelclark/resume).
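As a minimal sketch of the 'lose k degrees of freedom' rule, consider the simplest case: a straight line with k = 2 coefficients (intercept and slope). The data and the helper `fit_line` are hypothetical and are not taken from the tutorial or paper mentioned above. Each fitted coefficient imposes one linear constraint on the residuals, which is why the residual variance is divided by n-k.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 1.9, 3.8, 4.1, 6.2, 5.9]

intercept, slope = fit_line(x, y)
k = 2  # number of fitted coefficients, counting the constant

residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# The k constraints the fit imposes on the residuals:
sum_resid = sum(residuals)                                # ~0 (constant term)
sum_resid_x = sum(e * a for e, a in zip(residuals, x))    # ~0 (slope term)

resid_df = len(x) - k
resid_var = sum(e * e for e in residuals) / resid_df

print("constraints:", sum_resid, sum_resid_x)
print("residual variance on", resid_df, "degrees of freedom:", resid_var)
```

With k = 2 this gives the same n-2 as the test on the correlation. A model with more coefficients would add one such constraint per coefficient.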
       
      Hope this helps.
      Isobel 

      Eric.Lewin@... wrote:
      This follow-up is slightly off the subject of the mailing list, but
      as a geologist, this is the only statistically-flavoured one I am
      subscribed to. Therefore:

      Federico Pardo said:
      > Having N samples, and then n degrees of freedom.
      > One degree of freedom is used (or taken) by the mean calculation.
      > Then when you calculate the variance or the standard deviation, you only
      > have left n-1 degrees of freedom.

      Apart from the rigorous calculation I am aware of for this very case (cf.
      Peter Bossew's contribution on the same thread, which details it) and which
      proves this rule of thumb, what more or less rigorous statistical
      developments give it consistency?

      I mean, for the empirical correlation coefficient,
      rhoXiYi = SUM_i=1..N( (x_i - mx).(y_i - my) / sx / sy ) / WHAT_NUMBER
      For a kind of unbiased estimate ("a kind of" meaning "possibly with some
      Fisher z-transform"...), must WHAT_NUMBER be:
      * N for simplicity,
      * N-2, as I have most frequently seen in books that dare to give this formula
      (N points, minus 1 for position and 1 for dispersion?),
      * or 2N-4 -- 2N for the (x_i,y_i), minus 4 for {mx,my,sx,sy} -- as a
      strict application of the rule-of-thumb seems to suggest ?

      And what about, when fitting for instance a 3-parameter non-linear
      function, reducing the number of degrees of freedom to N-3 (number of
      points, minus one for each function parameter)? I have never read any kind
      of explanation to support it, though it seems widely

      Thanks in advance for any enlightenment, or simply for pointers to other
      sources of explanation.
      -- Éric L.

      * By using the ai-geostats mailing list you agree to follow its rules
      ( see http://www.ai-geostats.org/help_ai-geostats.htm )

      * To unsubscribe to ai-geostats, send the following in the subject or in the body (plain text format) of an email message to sympa@...

      Signoff ai-geostats