[ai-geostats] How to Estimate Variance with Weighted Samples?

  • Edward Isaaks
    Message 1 of 2 , Oct 6, 2005
      Hello List
       
      My inquiry is quite straightforward. I require an unbiased estimate of variance using weighted samples. There are several equations commonly used to calculate an estimate of variance using weighted samples, but they are all slightly different, so they can't all be unbiased. Currently, my favorite equation is as follows:
      1. Calculate a weighted estimate of the mean as xbar  = Sum[(w_i  * x_i)] / Sum[w_i],   where x_i are sample values and w_i are the corresponding sample weights.
      2. Calculate the denominator D = Sum[w] - Sum[w * w] / Sum[w],  where Sum[w] is the sum of weights and Sum[w * w] is the sum of squared weights.
      3. Now, I believe an unbiased estimate of the variance is given by s2 = Sum[ w_i * (x_i - xbar) * (w_i - xbar)] / D where xbar is the weighted estimate of the mean. Do you agree?
       
      The thing that bothers me most is that JMP (an excellent EDA stat tool put out by SAS, for those of you not familiar with it) calculates the weighted estimate of variance as follows: s2 = Sum[w_i * (x_i - xbar) * (x_i - xbar)] / (N-1), where xbar is the weighted estimate of the mean and N is the number of samples. JMP Support insists that this equation is correct. However, it doesn't make any sense to me. Can anyone explain the theoretical basis or statistical model that might give some validity to this equation?
      Thank you for your response.
      Edward Isaaks.
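
For concreteness, the two formulas in the message above can be sketched in Python. This is only an illustration under stated assumptions: the function names and the sample data are made up, and step 3 is read with both factors equal to (x_i - xbar), which is an assumption about the intended formula.

```python
def weighted_mean(x, w):
    """Step 1: xbar = Sum[w_i * x_i] / Sum[w_i]."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_ss(x, w):
    """Weighted sum of squared deviations, Sum[w_i * (x_i - xbar)^2]."""
    xbar = weighted_mean(x, w)
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))


def var_corrected(x, w):
    """Steps 2-3: divide by D = Sum[w] - Sum[w*w] / Sum[w]."""
    d = sum(w) - sum(wi * wi for wi in w) / sum(w)
    return weighted_ss(x, w) / d


def var_n_minus_1(x, w):
    """The formula the post attributes to JMP: divide by N - 1."""
    return weighted_ss(x, w) / (len(x) - 1)


# Hypothetical data, chosen only for illustration.
x = [2.0, 4.0, 7.0, 11.0]
w = [1.0, 2.0, 3.0, 4.0]
w2 = [2.0 * wi for wi in w]  # same weights, doubled

print(var_corrected(x, w), var_corrected(x, w2))
print(var_n_minus_1(x, w), var_n_minus_1(x, w2))
```

Doubling every weight leaves the first estimate unchanged but doubles the second, which suggests the N-1 form is only meaningful when the absolute scale of the weights carries information.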
       
    • Colin Daly
      Message 2 of 2 , Oct 7, 2005

        Hi Ed

         I agree with your result, although you have a slight typo in your formula: the second factor should be (x_i - xbar), not (w_i - xbar).
         To show this:

         We might as well assume that the sum of the weights is 1 (otherwise it is easy to normalise them).

         Then the unbiased estimator is     s2 = (1/(1-sum(w*w))) * sum(w_i * (x_i-xbar)*(x_i-xbar))  (this is the same as yours)

         where, as in your notation, sum(w*w) is the sum of the squared weights. Note that if w_i=1/N then this reduces to the usual unbiased estimator of the variance.

         To demonstrate the above, first note that since the weights are normalised, E(xbar) = E(sum(w_i * x_i)) = E(x).

         Note that we must also assume that the samples are independent (and, as the unsubscripted E(x) and E(x**2) imply, that they share the same mean and variance).

         Substituting xbar = sum(w_j*x_j) and using sum(w_i) = 1:

         E( sum(w_i * (x_i-xbar)*(x_i-xbar)) ) = E( sum(w_i * (x_i*x_i  - 2*x_i*xbar + xbar*xbar)))
                                               = sum(w_i * E(x**2)) - 2*sum_over_i_and_j_of(E(w_i*w_j*x_i*x_j)) + sum_over_i_and_j_of(E(w_i*w_j*x_i*x_j))
                                               = sum(w_i * E(x**2)) - sum_over_i_and_j_of(E(w_i*w_j*x_i*x_j))
                                               = E(x**2) - sum_over_i_and_j_of(E(w_i*w_j*x_i*x_j))   ....(1)

          2nd term on right hand side of (1) = sum_over_i_of(E(w_i*w_i*x_i*x_i)) + sum_for_i_not_equal_to_j_of(E(w_i*w_j*x_i*x_j))
                                             = sum(w_i*w_i*E(x**2)) + sum_for_i_not_equal_to_j_of((E(x)**2) *w_i*w_j)  (as x_i and x_j are independent)
                                             = E(x**2)*sum(w_i*w_i) + (1-sum(w_i*w_i))*(E(x)**2)  (as 1 = sum(w_i)*sum(w_j) = sum_over_i_and_j_of(w_i*w_j), so the i-not-equal-to-j part is 1 - sum(w_i*w_i))

          replacing in (1) gives
         
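         E( sum(w_i * (x_i-xbar)*(x_i-xbar)) ) = E(x**2)*(1 - sum(w_i*w_i)) - (E(x)**2) * (1 - sum(w_i*w_i))
                                               = (1 - sum(w_i*w_i)) * (E(x**2) - E(x)**2)
                                               = (1 - sum(w_i*w_i)) * Var(x)

           and so s2 = (1/(1-sum(w*w))) * sum(w_i * (x_i-xbar)*(x_i-xbar))      (2)

           has expectation Var(x), i.e. it is an unbiased estimate of the variance,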

           as required
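
As a rough numerical sanity check of (2), one can average the estimator over many simulated samples and compare with the known variance. The sketch below is not part of the derivation: it assumes independent normal samples with a common mean and variance, and the weights, seed, trial count and function names are arbitrary choices made for illustration.

```python
import random


def weighted_var(x, w):
    """Equation (2), with the weights normalised to sum to 1 first."""
    total = sum(w)
    w = [wi / total for wi in w]
    xbar = sum(wi * xi for wi, xi in zip(w, x))
    ss = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return ss / (1.0 - sum(wi * wi for wi in w))


def mean_estimate(weights, sigma, trials, seed=0):
    """Average of weighted_var over many independent simulated samples."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        # Independent draws sharing one mean and one variance (assumption).
        x = [rng.gauss(10.0, sigma) for _ in weights]
        acc += weighted_var(x, weights)
    return acc / trials


weights = [0.1, 0.2, 0.3, 0.4]  # arbitrary unequal weights
sigma = 2.0                      # true variance is sigma**2 = 4.0
avg = mean_estimate(weights, sigma, trials=100_000)
print(avg)
```

If the estimator is unbiased, the printed average should sit close to 4.0, up to Monte Carlo noise.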





        If you assume that the weights are 1/N (all equal weights) then
                          1/(1-sum(w*w))  = 1/(1-1/N) = N/(N-1),  and since each w_i in equation (2) is 1/N, (2) becomes
                           s2 = 1/(N-1) * sum((x_i-xbar)*(x_i-xbar))
                      which is the usual unbiased estimate of the variance
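
The equal-weight reduction can also be checked numerically against the ordinary sample variance. This is a minimal sketch: the data values and the function name `weighted_var` are made up for illustration, and the standard-library `statistics.variance` is used as the reference because it divides by N-1.

```python
import statistics


def weighted_var(x, w):
    """Equation (2); assumes the weights already sum to 1."""
    xbar = sum(wi * xi for wi, xi in zip(w, x))
    ss = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return ss / (1.0 - sum(wi * wi for wi in w))


x = [3.1, 4.7, 5.0, 6.2, 9.4]  # hypothetical sample values
n = len(x)
equal = [1.0 / n] * n          # all weights equal to 1/N

# The two numbers should agree up to floating-point rounding.
print(weighted_var(x, equal), statistics.variance(x))
```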

        JMP have simply stuck with the 1/(N-1) term for the denominator instead of correcting...

        Best Regards



        Colin Daly


