Kish's Effective Sample Size


When we work with weighted samples, we need a way to calculate the effective sample size. Previously, I used the sum of all weights normalized by the maximum weight. In most cases, it worked OK.

Recently, Ben Jann pointed out that it would be better to use Kish’s formula to calculate the effective sample size. In this post, you will find the formula and a few numerical simulations that illustrate the actual impact of the choice of sample size formula.

Let’s say we have a sample $x = \{ x_1, x_2, \ldots, x_n \}$ with a vector of corresponding weights $w = \{ w_1, w_2, \ldots, w_n \}$. In the non-weighted case (when all weights $w_i$ are equal), we can safely use the sample size $n$ in all equations that require the sample size. However, in the weighted case (when we have different values in $w$), we should perform some adjustments and calculate the effective sample size. Initially, I used the sum of all weights normalized by the maximum element:

$$ n_\textrm{eff/norm} = \dfrac{\sum_{i=1}^n w_i}{\max_{i=1}^{n} w_i}. $$
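As a minimal sketch, the max-normalized formula above can be written in a few lines of Python. The function name `ess_max_normalized` is hypothetical (it does not come from any library), and the sketch assumes non-negative weights with at least one positive value.

```python
def ess_max_normalized(weights):
    """Effective sample size as the sum of weights divided by the largest weight.

    Hypothetical helper for illustration; assumes non-negative weights
    with at least one strictly positive value.
    """
    weights = [float(w) for w in weights]
    if not weights:
        raise ValueError("weights must not be empty")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    largest = max(weights)
    if largest == 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights) / largest


# Equal weights give back the plain sample size n.
print(ess_max_normalized([1, 1, 1, 1]))   # 4.0
# One full-weight element plus two half-weight elements.
print(ess_max_normalized([1, 0.5, 0.5]))  # 2.0
```

With this definition, the element with the largest weight always counts as exactly one observation, and every other element counts as a fraction of it.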

In Kish (1965), there is a better way to estimate the effective sample size:

$$ n_\textrm{eff/kish} = \frac{\Big( \sum_{i=1}^n w_i \Big)^2}{\sum_{i=1}^n w_i^2 }. $$
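Kish's formula is just as easy to sketch in Python. The function name `ess_kish` is hypothetical, and the example weights are made up for illustration; the sketch assumes that at least one weight is non-zero.

```python
def ess_kish(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights).

    Hypothetical helper for illustration; assumes at least one non-zero weight.
    """
    weights = [float(w) for w in weights]
    sum_of_squares = sum(w * w for w in weights)
    if sum_of_squares == 0:
        raise ValueError("at least one weight must be non-zero")
    total = sum(weights)
    return total * total / sum_of_squares


# Equal weights give back the plain sample size n: 4^2 / 4 = 4.
print(ess_kish([1, 1, 1, 1]))
# Unequal weights: (1 + 0.5 + 0.5)^2 / (1 + 0.25 + 0.25) = 4 / 1.5, about 2.67.
print(ess_kish([1, 0.5, 0.5]))
# Multiplying all weights by a constant does not change the result.
print(ess_kish([2, 1, 1]))
```

For the weights $\{1, 0.5, 0.5\}$, the max-normalized formula gives $2 / 1 = 2$, while Kish's formula gives $4 / 1.5 \approx 2.67$, so the two approaches can differ noticeably even on tiny samples.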

References (1)