Hello List

My inquiry is quite straightforward. I require an unbiased estimate of variance using weighted samples. There are several equations commonly used to calculate an estimate of variance using weighted samples. But they are all slightly different and thus, they can't all be unbiased. Currently, my favorite equation is as follows:

1. Calculate a weighted estimate of the mean as xbar = Sum[(w_i * x_i)] / Sum[w_i], where x_i are sample values and w_i are the corresponding sample weights.

2. Calculate the denominator D = Sum[w] - Sum[w * w] / Sum[w], where Sum[w] is the sum of weights and Sum[w * w] is the sum of squared weights.

3. Now, I believe an unbiased estimate of the variance is given by s2 = Sum[ w_i * (x_i - xbar) * (w_i - xbar)] / D where xbar is the weighted estimate of the mean. Do you agree?

The thing that bothers me most is that JMP (an excellent EDA stat tool put out by SAS for those of you not familiar with JMP) calculates the weighted estimate of variance as follows: s2 = Sum[w_i * (x_i - xbar) * (x_i - xbar)] / (N-1) where xbar is the weighted estimate of the mean. JMP Support insists that this equation is correct. However, it doesn't make any sense to me. Can anyone explain the theoretical basis or statistical model that might give some validity to this equation?

Thank you for your response.

Edward Isaaks.

RE: [ai-geostats] How to Estimate Variance with Weighted Samples?

Hi Ed,

I agree with your result, although you have a slight typo in your formula: the second factor should be (x_i - xbar), not (w_i - xbar).

To show this, we might as well assume that the sum of the weights is 1 (otherwise it is easy to normalise them by dividing each weight by the sum of the weights).

Then the unbiased estimator is s2 = (1/(1-sum(w*w))) * sum(w_i * (x_i-xbar)*(x_i-xbar)) (this is the same as yours once the weights are normalised)

where, as in your notation, sum(w*w) is the sum of the squared weights. Note that if w_i=1/N then this reduces to the usual unbiased estimator of the variance.
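
For anyone who wants to try the formula numerically, here is a minimal Python sketch of Ed's three steps with the typo corrected. The function name weighted_variance and the sample numbers are my own illustrations, not something from JMP or any library. It uses Ed's denominator D, so the weights need not be normalised beforehand.

```python
def weighted_variance(x, w):
    """Weighted variance estimate with D = Sum[w] - Sum[w*w] / Sum[w].

    Hypothetical helper for illustration; assumes independent samples
    that share a common mean and variance.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    sum_w = sum(w)
    sum_w2 = sum(wi * wi for wi in w)
    # Step 1: weighted mean
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    # Step 2: denominator D (zero when only one sample carries weight)
    d = sum_w - sum_w2 / sum_w
    if d <= 0:
        raise ValueError("need at least two samples with non-zero weight")
    # Step 3: weighted sum of squared deviations divided by D
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) / d


x = [2.0, 4.0, 7.0, 11.0]          # made-up sample values
w = [0.4, 0.3, 0.2, 0.1]           # made-up weights, summing to 1
print(weighted_variance(x, w))
# Rescaling every weight by the same factor leaves the estimate unchanged
print(weighted_variance(x, [10 * wi for wi in w]))
```

With weights that already sum to 1, D is just 1 - sum(w*w), so this agrees with the normalised form above.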

To demonstrate the above, first note that since the weights are normalised, E(xbar) = E(sum(w_i * x_i)) = E(x)

Note, we also must assume that the samples are independent and that they all share the same mean E(x) and second moment E(x**2).

E( sum(w_i * (x_i-xbar)*(x_i-xbar)) ) = E( sum(w_i * (x_i*x_i - 2*x_i*xbar + xbar*xbar)))

= sum(w_i * E(x**2)) - 2*sum(E(w_i*x_i*(sum(w_j*x_j)))) + sum(E(w_i*w_j*x_i*x_j)) (substituting xbar = sum(w_j*x_j); the last sum runs over all pairs i, j and uses sum(w_i) = 1)

= sum(w_i * E(x**2)) - sum(E(w_i*w_j*x_i*x_j))

= E(x**2) - sum(E(w_i*w_j*x_i*x_j)) ....(1)

2nd term on right hand side = sum_over_i_of(E(w_i*w_i*x_i*x_i)) + sum_for_i_not_equal_to_j_of(E(w_i*w_j*x_i*x_j))

= sum(w_i*w_i*E(x**2)) + sum_for_i_not_equal_to_j_of((E(x)**2) *w_i*w_j) (as x_i and x_j independent)

= E(x**2)*sum(w_i*w_i) + (1-sum(w_i*w_i))*(E(x)**2) (as 1 = sum(w_i)*sum(w_j) = sum(w_i*w_j))

replacing in (1) gives

E( sum(w_i * (x_i-xbar)*(x_i-xbar)) ) = E(x**2)*(1 - sum(w_i*w_i)) - (E(x)**2) * (1-sum(w_i*w_i))

= (1 - sum(w_i*w_i)) * (E(x**2) - E(x)**2)

= (1 - sum(w_i*w_i)) * Var(x), where Var(x) is the true variance

and so s2 = (1/(1-sum(w*w))) * sum(w_i * (x_i-xbar)*(x_i-xbar)) ....(2)

is an unbiased estimator of the variance, as required
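
The result can also be checked exactly on a toy case rather than by algebra. The sketch below is only an illustration under my own assumptions: three samples, each an independent fair draw of 0 or 1 (true variance 0.25), with made-up weights. It averages sum(w_i * (x_i-xbar)**2) over all eight equally likely outcomes. The helper names are hypothetical.

```python
from itertools import product


def weighted_sum_of_squares(x, w):
    """sum(w_i * (x_i - xbar)**2) for weights that sum to 1."""
    xbar = sum(wi * xi for wi, xi in zip(w, x))
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))


def exact_expectation(w, values=(0.0, 1.0)):
    """Average over every equally likely outcome of len(w) independent draws."""
    outcomes = list(product(values, repeat=len(w)))
    return sum(weighted_sum_of_squares(x, w) for x in outcomes) / len(outcomes)


w = [0.5, 0.3, 0.2]                 # made-up weights, summing to 1
true_variance = 0.25                # variance of a fair 0/1 draw
sum_w2 = sum(wi * wi for wi in w)

raw = exact_expectation(w)
# Uncorrected expectation should match (1 - sum(w*w)) * Var(x)
print(raw, (1 - sum_w2) * true_variance)
# Dividing by (1 - sum(w*w)) should recover the true variance
print(raw / (1 - sum_w2), true_variance)
```

Because every outcome is enumerated, there is no sampling noise here; the two numbers on each line should agree up to floating-point rounding.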

If you assume that the weights are 1/N (all equal weights) then sum(w*w) = N * (1/N)**2 = 1/N, so

1/(1-sum(w*w)) = 1/(1-1/N) = N/(N-1), and since the w_i in equation (2) is 1/N, then (2) becomes

s2 = (1/(N-1)) * sum((x_i-xbar)*(x_i-xbar))

which is the usual unbiased estimate of the variance
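
As a quick sanity check of the equal-weight case, the following sketch compares the weighted formula with w_i = 1/N against Python's statistics.variance, which uses the N-1 denominator. The data values are made up for illustration.

```python
import statistics

x = [3.1, 4.7, 5.0, 6.2, 8.4]      # made-up sample values
n = len(x)
w = [1.0 / n] * n                  # equal weights, summing to 1

xbar = sum(wi * xi for wi, xi in zip(w, x))
sum_w2 = sum(wi * wi for wi in w)  # equals 1/N for equal weights
s2 = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) / (1 - sum_w2)

# Both lines should print the same value up to rounding
print(s2)
print(statistics.variance(x))
```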

JMP have simply stuck with the 1/(N-1) term for the denominator instead of correcting...
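
One practical difference between the two denominators can be seen without any statistics: the N-1 form, taken exactly as written in your message, changes when all the weights are multiplied by a constant, while the corrected form does not. I do not know how JMP scales weights internally, so the sketch below only examines the formula as quoted. Function names and numbers are hypothetical.

```python
def variance_n_minus_1(x, w):
    """Sum[w_i * (x_i - xbar)**2] / (N - 1), the form quoted in the question."""
    sum_w = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) / (len(x) - 1)


def variance_corrected(x, w):
    """Same numerator divided by D = Sum[w] - Sum[w*w] / Sum[w]."""
    sum_w = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    d = sum_w - sum(wi * wi for wi in w) / sum_w
    return sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) / d


x = [2.0, 4.0, 7.0, 11.0]           # made-up sample values
w = [0.4, 0.3, 0.2, 0.1]            # made-up weights
w10 = [10 * wi for wi in w]         # same relative weights, scaled by 10

# The N-1 form scales with the weights; the corrected form does not
print(variance_n_minus_1(x, w), variance_n_minus_1(x, w10))
print(variance_corrected(x, w), variance_corrected(x, w10))
```

So with the N-1 form the answer depends on the convention used to scale the weights, which is worth checking before comparing results between packages.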

Best Regards

Colin Daly

-----Original Message-----

From: Edward Isaaks [mailto:ed@...]

Sent: Fri 10/7/2005 1:26 AM

To: AI-GEOSTATS

Subject: [ai-geostats] How to Estimate Variance with Weighted Samples?
