\(\mathcal{U} = \left\{ 1, \cdots, N \right\}\): index set of the finite population of size \(N\)
\(\mathcal{S}\): index set of the sample \((\mathcal{S} \subset \mathcal{U})\)
\(P\left( \mathcal{S} \right)\): probability of selecting sample \(\mathcal{S}\).
\(\pi_i = P( i \in \mathcal{S} )\): (the first-order) inclusion probability of unit \(i\). Note that \[ \pi_i = \sum_{\mathcal{S}; i \in \mathcal{S} } P( \mathcal{S}). \]
Sampling design (or sampling mechanism): the enumeration of all possible pairs \(\left( \mathcal{S}, P \left( \mathcal{S} \right) \right)\)
Probability sampling design: sampling design with \(\pi_i>0\) for all \(i \in \mathcal{U}\).
Definition: Simple random sampling (of size \(n\)) design \[ \iff P( \mathcal{S}) = \left\{ \begin{array}{ll} {N \choose n}^{-1} & \mbox{ if } |\mathcal{S} | = n \\ 0 & \mbox{ otherwise} \end{array} \right. \] where \[ {N \choose n} = \frac{ N!}{ n! (N-n)!} . \]
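As a concrete illustration, here is a minimal Python sketch (the toy sizes \(N=4\), \(n=2\) and all names are my own choices, not from the notes) that enumerates the pairs \(\left( \mathcal{S}, P(\mathcal{S}) \right)\) of an SRS design:

```python
from itertools import combinations
from math import comb

N, n = 4, 2                  # toy population size and sample size
U = range(1, N + 1)          # index set of the population, {1, ..., N}

# SRS design: every subset of size n gets probability 1 / C(N, n)
design = {S: 1 / comb(N, n) for S in combinations(U, n)}

for S, p in design.items():
    print(S, p)              # six samples, each with probability 1/6
```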
Define the following sampling indicator function \[ I_i = \left\{ \begin{array}{ll} 1 & \mbox{ if } i \in \mathcal{S} \\ 0 & \mbox{ otherwise} \end{array} \right. \]
By definition of \(I_i\), we have \[ \sum_{i=1}^N I_i = n \] where \(n\) is the (realized) sample size.
The first-order inclusion probability can be written as \[ \pi_i = P( I_i=1) = E( I_i) . \]
The sum of \(\pi_i\) over the finite population is equal to the sample size, since \[ \sum_{i=1}^N \pi_i = \sum_{i=1}^N E( I_i ) = E\left( \sum_{i=1}^N I_i \right) = n.\]
Under SRS, the \(\pi_i\) are all equal. That is, \(\pi_i= n/N\) for all \(i \in \mathcal{U}\), where \(n=|\mathcal{S}|\) and \(N=|\mathcal{U}|\).
Example (\(N=4\), \(n=2\)): the design below also satisfies \(\pi_i = n/N = 1/2\) for every unit, yet it is not SRS, because two of the six possible samples can never be selected.

| Case | Sample ID | Selection Prob. |
|---|---|---|
| 1 | 1,2 | 0 |
| 2 | 1,3 | 1/4 |
| 3 | 1,4 | 1/4 |
| 4 | 2,3 | 1/4 |
| 5 | 2,4 | 1/4 |
| 6 | 3,4 | 0 |
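A short sketch in the same style (the dictionary encoding of the table is my own) that computes \(\pi_i = \sum_{\mathcal{S}; i \in \mathcal{S}} P(\mathcal{S})\) for this design:

```python
# The design from the table: samples {1,2} and {3,4} have probability 0
design = {(1, 3): 1/4, (1, 4): 1/4, (2, 3): 1/4, (2, 4): 1/4}

# First-order inclusion probability: sum P(S) over samples containing i
pi = {i: sum(p for S, p in design.items() if i in S) for i in range(1, 5)}
print(pi)   # {1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5}
```

Every \(\pi_i\) equals \(n/N = 1/2\), confirming that equal first-order inclusion probabilities alone do not make a design SRS.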
Under draw-by-draw selection for SRS without replacement, each remaining unit is selected with conditional probability \[ 1/N, \ 1/(N -1), \ 1/(N -2), \ \cdots , \ 1/(N - n +1) \] at draws \(1, 2, \ldots, n\), respectively.
The sampling distribution of \(\hat{\theta}\) is defined as the enumeration of all possible \((\hat{\theta}(\mathcal{S}), P( \mathcal{S}))\), where \(P( \mathcal{S})\) is the selection probability of sample \(\mathcal{S}\).
Using the sampling distribution, we can derive \[ E( \hat{\theta}) = \sum_{\mathcal{S}} \hat{\theta} (\mathcal{S}) P( \mathcal{S}). \]
If \(E( \hat{\theta})=\theta\), then \(\hat{\theta}\) is unbiased for \(\theta\).
Also, the variance is \[ V( \hat{\theta}) = E [ \{ \hat{\theta} - E( \hat{\theta}) \}^2 ] \] where \[ E [ \{ \hat{\theta} - E( \hat{\theta}) \}^2 ] = \sum_{\mathcal{S}} \{ \hat{\theta}( \mathcal{S} ) - E( \hat{\theta}) \}^2 P( \mathcal{S} ) \]
Note that \[ V( \hat{\theta}) = E( \hat{\theta}^2) - \{E( \hat{\theta}) \}^2 \] where \[ E( \hat{\theta}^2) = \sum_{\mathcal{S}} \{ \hat{\theta} (\mathcal{S}) \}^2 P( \mathcal{S}). \]
Standard Error of \(\hat{\theta}\): \[ SE ( \hat{\theta}) = \sqrt{ V( \hat{\theta})} \]
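These definitions translate directly into code. A minimal sketch (the function name `moments` is my own) that computes \(E(\hat{\theta})\), \(V(\hat{\theta})\), and \(SE(\hat{\theta})\) by enumerating the sampling distribution:

```python
from math import sqrt

def moments(dist):
    """dist: list of (theta_hat(S), P(S)) pairs enumerating the design."""
    mean = sum(t * p for t, p in dist)                 # E(theta_hat)
    var = sum((t - mean) ** 2 * p for t, p in dist)    # V(theta_hat)
    return mean, var, sqrt(var)                        # E, V, and SE
```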
Example (SRS with \(N=4\), \(n=2\)): the sampling distribution of the sample mean \(\bar{y}\).

| Case | Sample ID | Statistic (Sample mean) | Selection Prob. |
|---|---|---|---|
| 1 | 1,2 | \((y_1 + y_2)/2\) | 1/6 |
| 2 | 1,3 | \((y_1+ y_3)/2\) | 1/6 |
| 3 | 1,4 | \((y_1 + y_4)/2\) | 1/6 |
| 4 | 2,3 | \((y_2 + y_3)/2\) | 1/6 |
| 5 | 2,4 | \((y_2 + y_4)/2\) | 1/6 |
| 6 | 3,4 | \((y_3+ y_4)/2\) | 1/6 |
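Feeding the table into the `moments` sketch above with a toy population (the \(y\) values are my own) verifies that \(E( \bar{y}) = \bar{Y}_U\) and that the variance agrees with the formula \(\frac{1}{n}\left(1-\frac{n}{N}\right)S^2\) derived below:

```python
from itertools import combinations

y = {1: 1.0, 2: 2.0, 3: 4.0, 4: 5.0}    # toy population values
dist = [(sum(y[i] for i in S) / 2, 1/6) for S in combinations(y, 2)]

mean, var, se = moments(dist)
print(mean)   # 3.0, equal to the population mean (1+2+4+5)/4
print(var)    # 0.8333..., equal to (1/2) * (1 - 2/4) * S^2 with S^2 = 10/3
```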
For the variance term, note that \[ V( \bar{y}) = E\{ (\bar{y}-\bar{Y}_U )^2 \} \]
We can express \[\begin{eqnarray*} \bar{y} - \bar{Y}_U &=& \frac{1}{n} \sum_{i=1}^N I_i ( y_i - \bar{Y}_U ):= \frac{1}{n} \sum_{i=1}^N I_i z_i \end{eqnarray*}\] where \(z_i = y_i - \bar{Y}_U\).
Thus, we have
\[\begin{eqnarray*}
(\bar{y} - \bar{Y}_U)^2 &=& \frac{1}{n^2} \left\{ \sum_{i=1}^N I_i z_i^2 + \sum_{i \neq j }I_i I_j z_i z_j \right\} \quad \left( \mbox{using } I_i^2 = I_i \right) \\
E\{ (\bar{y} - \bar{Y}_U)^2 \} &=& \frac{1}{n^2} \left\{ \sum_{i=1}^N E(I_i) z_i^2 + \sum_{i \neq j }E( I_i I_j) z_i z_j \right\}
\end{eqnarray*}\]
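Under SRS, \(E(I_i) = n/N\) and, for \(i \neq j\), \(E(I_i I_j) = \frac{n(n-1)}{N(N-1)}\). Moreover, \(\sum_{i=1}^N z_i = 0\) implies \(\sum_{i \neq j} z_i z_j = -\sum_{i=1}^N z_i^2\). A sketch of the remaining algebra (using \(\sum_{i=1}^N z_i^2 = (N-1) S^2\)):

\[\begin{eqnarray*}
E\{ (\bar{y} - \bar{Y}_U)^2 \} &=& \frac{1}{n^2} \left\{ \frac{n}{N} - \frac{n(n-1)}{N(N-1)} \right\} \sum_{i=1}^N z_i^2 \\
&=& \frac{1}{n} \cdot \frac{N-n}{N(N-1)} \sum_{i=1}^N z_i^2 = \frac{1}{n} \left( 1 - \frac{n}{N} \right) S^2 .
\end{eqnarray*}\]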
If \(n=N\), then \(V( \bar{y})=0\).
If \(n=1\), then \(\bar{y}\) is equal to the \(y\) value of the single selected element and
\[ V( \bar{y})= \frac{1}{N} \sum_{i=1}^N ( y_i - \bar{Y}_U)^2. \] It is the variance of selecting one element at random from \(\{ y_1, \cdots, y_N\}\).
For SRS with replacement, define \(a(k)\) to be the index of the element selected at the \(k\)-th draw.
The \(n\) sample elements, \(y_{a(1)}, \cdots, y_{a(n)}\), are independently and identically distributed with the distribution \[ Y_{a(k)} = \left\{ \begin{array}{ll} y_1 & \mbox{ with prob. } 1/N \\ y_2 & \mbox{ with prob. } 1/N \\ \vdots & \\ y_N & \mbox{ with prob. } 1/N \end{array} \right. \]
The sample mean is \[ \bar{y} = \frac{1}{ n} \sum_{k=1}^n Y_{a(k)} . \]
It can be shown that the sample mean \(\bar{y}\) satisfies \[ E( \bar{y} ) = \bar{Y}_U \] and \[ V( \bar{y}) = \frac{1}{n} \left( 1- \frac{1}{N}\right) S^2 , \] where \(S^2\) is the population variance defined below.
Comparing the factors, \(1 - n/N \le 1 - 1/N\) for \(n \ge 1\), with strict inequality when \(n > 1\); thus the variance is smaller for without-replacement sampling.
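A quick Monte Carlo sketch (the toy population and replication count are my own) comparing the two designs empirically:

```python
import random
from statistics import mean, variance

y = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]   # toy population, N = 6
n, reps = 3, 100_000

wor = [mean(random.sample(y, n)) for _ in range(reps)]     # without replacement
wr = [mean(random.choices(y, k=n)) for _ in range(reps)]   # with replacement

# Theory: V = 1.25 without replacement vs. about 2.08 with replacement
print(variance(wor), variance(wr))
```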
Finite population correction factor (FPC) \[ FPC = 1 - \frac{n}{N} \]
Sampling fraction is the proportion of the population sampled, or \(n/N\).
Often FPC is very close to 1.
In cases where the sampling fraction is very small and the FPC is very close to 1 (e.g., \(n = 100\) out of \(N = 100{,}000\) gives FPC \(= 0.999\)), the FPC has no practical effect on the variance or estimated variance of the parameter estimate.
\(T = \sum_{i=1}^N y_i\): population total of \(y\), parameter of interest
\(\hat{T}=N \bar{y}\): an estimator of \(T\).
Properties of \(\hat{T}\): under SRS, \(E( \hat{T} ) = T\) and \(V( \hat{T} ) = N^2 V( \bar{y}) = \frac{N^2}{n} \left( 1 - \frac{n}{N} \right) S^2\).
\(y_i\) is binary, taking the value 1 or 0, where \(y_i=1\) means that unit \(i\) has the characteristic of interest
Parameter: the proportion of units with \(y_i=1\), \(P = N^{-1} \sum_{i=1}^N y_i\)
\(\hat{P}=\bar{y}\): an estimator of \(P\).
Properties: under SRS, \(E( \hat{P} ) = P\), and since \(S^2 = \frac{N}{N-1} P(1-P)\) for binary \(y\), \(V( \hat{P} ) = \frac{1}{n} \left( 1 - \frac{n}{N} \right) \frac{N}{N-1} P(1-P)\).
Different concepts: the true variance \(V( \bar{y})\), its estimator \(\hat{V}( \bar{y})\), the population variance \(S^2\), and the sample variance \(s^2\).
We are interested in estimating \(V( \bar{y})\) under SRS. We use \(\hat{V} ( \bar{y})\) to denote an estimator of \(V( \bar{y})\).
Recall that, by Theorem 1,
\[V( \bar{y}) = \frac{1}{n} \left( 1- \frac{n}{N}\right) S^2\]
Population variance \[ \small S^2 \equiv \frac{1}{N-1}\sum_{i=1}^N \left( y_i - \bar{Y}_U \right)^2= \frac{1}{2N(N-1)} \sum_{i=1}^N \sum_{j=1}^N (y_i - y_j)^2 \]
To verify the equality, use
\[\begin{eqnarray*} {\small \sum_{i=1}^N \sum_{j=1}^N (y_i - y_j)^2 } & =&{\small \sum_{i=1}^N \sum_{j=1}^N \{ (y_i - \bar{Y}_U) - (y_j - \bar{Y}_U) \}^2 } \\ &=& {\small 2 N \sum_{i=1}^N (y_i - \bar{Y}_U)^2 - 2 \sum_{i=1}^N (y_i - \bar{Y}_U) \sum_{j=1}^N (y_j - \bar{Y}_U) } \\ &=& 2 N \sum_{i=1}^N (y_i - \bar{Y}_U)^2 , \end{eqnarray*}\] where the cross term vanishes because \(\sum_{i=1}^N (y_i - \bar{Y}_U) = 0\).
Sample variance \[ {\small s^2 = \frac{1}{n-1} \sum_{i \in \mathcal{S}} \left( y_i - \bar{y} \right)^2 = \frac{1}{2n(n-1)} \sum_{i\in \mathcal{S}} \sum_{j \in \mathcal{S}} (y_i - y_j)^2} \]
Property (under SRS) \[ E\left( s^2 \right) = S^2 \]
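The double-sum form of \(s^2\) makes this property quick to verify; a sketch under SRS, where \(E( I_i I_j ) = \frac{n(n-1)}{N(N-1)}\) for \(i \neq j\) (the \(i = j\) terms contribute nothing since \(y_i - y_j = 0\)):

\[\begin{eqnarray*}
E( s^2 ) &=& \frac{1}{2n(n-1)} \sum_{i=1}^N \sum_{j=1}^N E( I_i I_j ) (y_i - y_j)^2 \\
&=& \frac{1}{2n(n-1)} \cdot \frac{n(n-1)}{N(N-1)} \sum_{i=1}^N \sum_{j=1}^N (y_i - y_j)^2 = S^2 .
\end{eqnarray*}\]

Consequently, \(\hat{V}( \bar{y}) = \frac{1}{n} \left( 1 - \frac{n}{N} \right) s^2\) is an unbiased estimator of \(V( \bar{y})\).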