Convenience, volunteer samples
Take whoever is willing (e.g., volunteer web surveys)
Judgment, purposive, quota samples
May be useful for initial studies to probe a topic
CANNOT make inferences about a population from such studies
Undercoverage: the sampling frame omits a portion of the target population (e.g., homeless people in a telephone survey of U.S. residents)
Remedies
Recall that survey sampling has two important components
Selection bias occurs when the sample is not representative of the target population.
Measurement bias occurs when the measurement process produces observations on an observation unit (OU) that differ systematically from the true value for that OU.
When the measurements are biased in the same direction across the OUs in the sample, the resulting estimate of a population characteristic is biased.
Measurement error often results in increased variance in estimates (with or without bias) as well
Sampling error:
Nonsampling error: Everything else
In Stat 421, we will mostly focus on sampling error.
Sampling error can be controlled and understood better under probability sampling.
Statistic: a function of the observations (\(y_i\)’s) in the sample.
Estimator: a statistic that is used to estimate a parameter is called an estimator of that parameter.
Estimate: the numerical value obtained by applying the estimator to the actual observations in the sample.
Example: Population mean is a parameter. Mean of the sample observations (i.e. sample mean) is a statistic. If the sample mean is used to estimate the population mean, it is an estimator of the population mean.
Setup: Suppose that we wish to draw a sample of size \(n\) from a finite population of size \(N\).
There are \(N \choose n\) possible samples.
Recall that \[ {N \choose n} = \frac{ N!}{ n! (N-n)!} = \frac{ N \cdot (N-1) \cdots (N-n+1) }{ n \cdot (n-1) \cdots 1} \]
| ID | Size of farms (Acres) | Yield |
|---|---|---|
| 1 | 4 | 1 |
| 2 | 6 | 3 |
| 3 | 6 | 5 |
| 4 | 20 | 15 |
Instead of observing \(N=4\) farms, we want to select a sample of size \(n=2\).
There are \({4 \choose 2} = 6\) possible samples:
| Case | Sample ID | Sample mean \((=\bar{y})\) |
|---|---|---|
| 1 | 1,2 | 2 |
| 2 | 1,3 | 3 |
| 3 | 1,4 | 8 |
| 4 | 2,3 | 4 |
| 5 | 2,4 | 9 |
| 6 | 3,4 | 10 |
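The table above can be reproduced by enumerating every subset of size \(n=2\). The following is a minimal sketch in Python; the names `yields` and `all_sample_means` are illustrative choices, not part of the course material.

```python
import math
from itertools import combinations

# Yield values from the farm table, keyed by farm ID.
yields = {1: 1, 2: 3, 3: 5, 4: 15}

def all_sample_means(y, n):
    """Return {sample IDs: sample mean} for every size-n subset of the IDs in y."""
    return {ids: sum(y[i] for i in ids) / n for ids in combinations(sorted(y), n)}

means = all_sample_means(yields, 2)

# The number of enumerated samples should match N choose n.
assert len(means) == math.comb(len(yields), 2)

for ids, ybar in means.items():
    print(ids, ybar)
```

Each printed pair corresponds to one row (Sample ID, sample mean) of the table.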
Non-probability sampling approach: select a sample subjectively (e.g., based on the size of the farms).
Probability sampling approach: select a sample by a probability rule.
Definition: Probability sampling
| Case | Sample ID | Selection Prob. |
|---|---|---|
| 1 | 1,2 | 0 |
| 2 | 1,3 | 0 |
| 3 | 1,4 | 0 |
| 4 | 2,3 | 1 |
| 5 | 2,4 | 0 |
| 6 | 3,4 | 0 |
| Case | Sample ID | Selection Prob. |
|---|---|---|
| 1 | 1,2 | 1/6 |
| 2 | 1,3 | 1/6 |
| 3 | 1,4 | 1/6 |
| 4 | 2,3 | 1/6 |
| 5 | 2,4 | 1/6 |
| 6 | 3,4 | 1/6 |
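A design like the one in the table above, in which each of the six samples has selection probability 1/6, could be carried out in Python roughly as follows. This is a sketch only: the helper name `draw_srs` and the seed value are arbitrary assumptions. It relies on `random.sample`, which draws without replacement so that every subset of the requested size is equally likely.

```python
import random

farm_ids = [1, 2, 3, 4]

def draw_srs(ids, n, rng):
    """Draw n distinct IDs so that every size-n subset is equally likely."""
    return tuple(sorted(rng.sample(ids, n)))

rng = random.Random(421)  # fixed seed (arbitrary) so the draw is reproducible
sample = draw_srs(farm_ids, 2, rng)
print(sample)
```

Repeating the draw many times with different seeds should select each of the six samples about one sixth of the time.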
From the sampling design, we can derive the sampling distribution of a statistic.
Under simple random sampling (SRS), the sampling distribution of a statistic can be derived from the sampling design as follows.
| Case | Sample ID | Statistic (Sample mean) | Selection Prob. |
|---|---|---|---|
| 1 | 1,2 | \((y_1 + y_2)/2\) | 1/6 |
| 2 | 1,3 | \((y_1+ y_3)/2\) | 1/6 |
| 3 | 1,4 | \((y_1 + y_4)/2\) | 1/6 |
| 4 | 2,3 | \((y_2 + y_3)/2\) | 1/6 |
| 5 | 2,4 | \((y_2 + y_4)/2\) | 1/6 |
| 6 | 3,4 | \((y_3+ y_4)/2\) | 1/6 |
In survey sampling from a finite population, the statistic (e.g. sample mean) can be treated as a discrete random variable whose probability distribution is completely determined by the sampling design.
In the above example, the sampling distribution of the sample mean (\(\bar{y}\)) is derived as follows:
| Case | Sample mean | Selection Prob. |
|---|---|---|
| 1 | 2 | 1/6 |
| 2 | 3 | 1/6 |
| 3 | 8 | 1/6 |
| 4 | 4 | 1/6 |
| 5 | 9 | 1/6 |
| 6 | 10 | 1/6 |
Since the sampling distribution of \(\bar{y}\) is known, we can compute its mean and variance.
Mean of \(\bar{y}\) in this example is \[ E( \bar{y} ) = \frac{1}{6} \left( 2+ 3 + \cdots + 10 \right) = 6. \]
Here, the expectation is over the sampling distribution, the distribution obtained by repeatedly applying the sample selection from the sampling design.
The individual values of \(y_i\) are treated as fixed; only the selected sample (i.e., which elements are selected) is random.
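The expectation over the sampling distribution can be checked numerically by weighting each possible sample mean by its selection probability. The sketch below assumes the equal-probability design (1/6 per sample) from the example and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Fixed population values (yields), keyed by farm ID.
yields = {1: 1, 2: 3, 3: 5, 4: 15}
n = 2

samples = list(combinations(sorted(yields), n))
prob = Fraction(1, len(samples))  # every sample equally likely

# E(ybar) = sum over samples of P(sample) * (sample mean)
expected_ybar = sum(prob * Fraction(sum(yields[i] for i in s), n) for s in samples)
pop_mean = Fraction(sum(yields.values()), len(yields))

print(expected_ybar, pop_mean)
```

Only the set of selected IDs varies across the terms of the sum; the `yields` values themselves never change, which mirrors the fixed-population view described above.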
The bias of \(\hat{\theta}\) as an estimator of \(\theta\) is defined as \[ Bias ( \hat{\theta}) = E ( \hat{\theta}) - \theta \] where the expectation is with respect to the sampling mechanism, or the sampling distribution of \(\hat{\theta}\).
If the bias of an estimator is zero, the estimator is called unbiased, or design-unbiased (to emphasize that the expectation is taken with respect to the sampling design).
Under SRS, the sample mean is unbiased for the population mean. (It is not necessarily true for other sampling designs.)
The variance of \(\hat{\theta}\) is defined as \[ V( \hat{\theta} ) = E\left\{ \left( \hat{\theta} - E ( \hat{\theta}) \right)^2 \right\} \]
The MSE of \(\hat{\theta}\) as an estimator of \(\theta\) is defined as \[ MSE ( \hat{\theta}) = E\left\{ \left( \hat{\theta} - \theta\right)^2 \right\} \]
In general, we have \[ MSE ( \hat{\theta}) = \{Bias ( \hat{\theta}) \}^2 + V( \hat{\theta} ) . \]
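The decomposition can be verified by adding and subtracting \(E(\hat{\theta})\) inside the square. The cross term vanishes because \(E\{ \hat{\theta} - E(\hat{\theta}) \} = 0\) and \(E(\hat{\theta}) - \theta\) is a constant:
\[ \begin{aligned} MSE ( \hat{\theta}) &= E\left\{ \left( \hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta \right)^2 \right\} \\ &= E\left\{ \left( \hat{\theta} - E(\hat{\theta}) \right)^2 \right\} + 2 \left\{ E(\hat{\theta}) - \theta \right\} E\left\{ \hat{\theta} - E(\hat{\theta}) \right\} + \left\{ E(\hat{\theta}) - \theta \right\}^2 \\ &= V( \hat{\theta} ) + \{ Bias ( \hat{\theta}) \}^2 . \end{aligned} \]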
| Case | Sample ID | Selection Prob. |
|---|---|---|
| 1 | 1,2 | 1/3 |
| 2 | 1,3 | 0 |
| 3 | 1,4 | 1/3 |
| 4 | 2,3 | 0 |
| 5 | 2,4 | 1/3 |
| 6 | 3,4 | 0 |
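Taking the selection probabilities in the table above as a sampling design, the bias, variance, and MSE of the sample mean can be computed directly from their definitions. The sketch below is illustrative: it reuses the sample means from the farm example, the helper `design_summary` is a hypothetical name, and the equal-probability design is included only for comparison.

```python
from fractions import Fraction

# Sample means from the farm example (sample IDs -> mean yield).
sample_means = {(1, 2): 2, (1, 3): 3, (1, 4): 8, (2, 3): 4, (2, 4): 9, (3, 4): 10}
pop_mean = Fraction(1 + 3 + 5 + 15, 4)  # population mean yield

def design_summary(probs):
    """Return (bias, variance, MSE) of the sample mean under a design.

    probs maps sample IDs to selection probabilities; omitted samples have probability 0.
    """
    assert sum(probs.values()) == 1, "selection probabilities must sum to 1"
    mean = sum(p * sample_means[s] for s, p in probs.items())
    var = sum(p * (sample_means[s] - mean) ** 2 for s, p in probs.items())
    mse = sum(p * (sample_means[s] - pop_mean) ** 2 for s, p in probs.items())
    return mean - pop_mean, var, mse

third = Fraction(1, 3)
unequal = {(1, 2): third, (1, 4): third, (2, 4): third}  # design in the table above
srs = {s: Fraction(1, 6) for s in sample_means}          # equal-probability design

bias_u, var_u, mse_u = design_summary(unequal)
bias_s, var_s, mse_s = design_summary(srs)
print("unequal:", bias_u, var_u, mse_u)
print("SRS:    ", bias_s, var_s, mse_s)
```

Under the design in the table, the sample mean has bias 1/3, whereas the equal-probability design gives bias 0. In both cases the computed MSE equals the squared bias plus the variance.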
Recall that \[ MSE( \hat{\theta})= \{ Bias (\hat{\theta})\}^2 + V ( \hat{\theta}) \]
In general, the variance will be reduced if we increase the sample size. However, the bias is independent of the sample size.
Thus, in probability sampling, we can make \(MSE ( \hat{\theta}) \downarrow 0\) as \(n \rightarrow \infty\).
In non-probability sampling, the MSE does not necessarily decrease with the sample size.
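For the four-farm example, the effect of the sample size on the MSE under SRS can be illustrated by enumerating all samples for each \(n\). This is a sketch under the assumption of equal-probability sampling without replacement; `srs_mse` is an illustrative helper name. Because the population is finite, \(n\) can grow only up to \(N = 4\).

```python
from fractions import Fraction
from itertools import combinations

yields = [1, 3, 5, 15]
N = len(yields)
pop_mean = Fraction(sum(yields), N)

def srs_mse(y, n):
    """MSE of the sample mean when every size-n sample is equally likely."""
    samples = list(combinations(y, n))
    total = sum((Fraction(sum(s), n) - pop_mean) ** 2 for s in samples)
    return total / len(samples)

mse_by_n = {n: srs_mse(yields, n) for n in range(1, N + 1)}
for n, mse in mse_by_n.items():
    print(n, mse)
```

The MSE shrinks at each step and reaches zero at \(n = N\), where the sample is the whole population.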