Introduction

The sampling distribution of a statistic is the probability distribution of that statistic, i.e., the values the statistic can take on and how often we would see each value if we took every possible sample of size \(n\) from the population.

Consider taking repeated samples from a population and computing the statistic for each sample. You would get many different values of the statistic and some values would be more common than others.
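The idea of repeated sampling can be approximated by simulation. The sketch below is only an illustration: the values of `p`, `n`, and `reps` are assumptions chosen for the demonstration, and a finite number of simulated samples stands in for "every possible sample."

```r
# Illustrative values only: p, n, and reps are assumptions, not from the notes.
set.seed(1)
p    <- 0.30    # assumed population proportion
n    <- 50      # assumed sample size
reps <- 10000   # number of repeated samples to simulate

# Each rbinom() draw is the number of successes in one sample of size n;
# dividing by n turns that count into a sample proportion.
p_hat <- rbinom(reps, size = n, prob = p) / n

# The statistic takes many different values ...
length(unique(p_hat))

# ... and some values occur far more often than others.
head(sort(table(p_hat), decreasing = TRUE))
```

Calling `hist(p_hat)` afterwards gives a picture of the simulated sampling distribution.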

The Mean

Let \(\hat p\) denote the sample proportion of a random sample of \(n\) observations from a population with population proportion \(p\). The mean of the sampling distribution of \(\hat p\) is equal to the population proportion \(p\): \[E(\hat p) = p\]

The Standard Deviation

Let \(\hat p\) denote the sample proportion of a random sample of \(n\) observations from a population with population proportion \(p\). The standard deviation of the sampling distribution of \(\hat p\) is equal to: \[\sigma_{\hat p} = \sqrt{\frac{p (1 - p)}{n}}\]

Note: The standard error of \(\hat p\) is another name for the standard deviation of the sampling distribution of \(\hat p\).
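To see how the standard error formula behaves, the following minimal sketch evaluates it for several sample sizes. The value of `p` and the sample sizes in `n` are assumptions picked for illustration.

```r
# Illustrative values only: p and the sample sizes are assumptions.
p <- 0.30
n <- c(25, 100, 400, 1600)

# Standard error of p-hat from the formula sqrt(p * (1 - p) / n)
se <- sqrt(p * (1 - p) / n)

data.frame(n = n, se = round(se, 4))

# Ratio of each standard error to the next one
se[-length(se)] / se[-1]
```

Because \(n\) sits under a square root, each fourfold increase in the sample size cuts the standard error in half.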

The Shape

Let \(\hat p\) denote the sample proportion of a random sample of \(n\) observations from a population with population proportion \(p\). The sampling distribution of \(\hat p\) will be approximately normal if \(n p \ge 10\) and \(n (1- p) \ge 10\).
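The two conditions are easy to check in code. The helper below is hypothetical (the name `is_approx_normal` and its `threshold` argument are not part of base R or of these notes); it is a small sketch of the rule as stated above.

```r
# Hypothetical helper, not a built-in R function.
# Returns TRUE when both n*p and n*(1-p) are at least the threshold.
is_approx_normal <- function(n, p, threshold = 10) {
  n * p >= threshold && n * (1 - p) >= threshold
}

is_approx_normal(900, 0.50)  # n*p = 450 and n*(1-p) = 450
is_approx_normal(20, 0.10)   # n*p = 2, so the first condition fails
```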

Putting it Together

Let \(\hat p\) denote the sample proportion of a random sample of \(n\) observations from a population with population proportion \(p\). Then \[\hat p \sim \text{ approximately }N(p,\sqrt{\frac{ p (1- p)}{n}})\]

as long as \(n p \ge 10\) and \(n (1- p) \ge 10\). Here the second parameter of \(N(\cdot,\cdot)\) is the standard deviation, not the variance.

NOTE: From now on, we will say that \(\hat p\) has a normal distribution when we more accurately mean an approximately normal distribution.

Example 1: Suppose we have a population with proportion \(p = 0.50\) and a random sample of size \(n = 900\) drawn from the population.

  • Does \(\hat p\) have a normal distribution?
  • What is the mean of the sampling distribution of \(\hat p\)?
  • What is the standard error of \(\hat p\)?
  • What is the probability that the sample proportion is more than 0.52?
  • What is the probability that the sample proportion is less than 0.46?
  • What is the 40th percentile of the sampling distribution of the sample proportion?

  • Yes since \(np = 900 \cdot 0.50 \ge 10\) and \(n(1-p) = 900 \cdot (1-0.50) \ge 10\)
  • \(E(\hat p) = p = 0.50\)
  • \(\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5(1-0.5)}{900}} = 0.0167\)
  • \(P(\hat p > 0.52) = 1 - \text{pnorm}(0.52,0.50,\text{sqrt}(0.5*(1-0.5)/900))=0.1151\)
  • \(P(\hat p < 0.46) = \text{pnorm}(0.46,0.50,\text{sqrt}(0.5*(1-0.5)/900))=0.0082\)
  • \(\text{qnorm}(0.4,0.50,\text{sqrt}(0.5*(1-0.5)/900))=0.496\)

Note: You can do it all at once in R with the following code
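One way this might look, as a minimal sketch: the variable names `p`, `n`, and `se` are illustrative choices, and the calls use only base R's `pnorm()` and `qnorm()`.

```r
p  <- 0.50                      # population proportion
n  <- 900                       # sample size
se <- sqrt(p * (1 - p) / n)     # standard error of p-hat

c(np = n * p, nq = n * (1 - p)) # both must be at least 10
p                               # mean of the sampling distribution
se                              # standard error
1 - pnorm(0.52, mean = p, sd = se)  # P(p-hat > 0.52)
pnorm(0.46, mean = p, sd = se)      # P(p-hat < 0.46)
qnorm(0.40, mean = p, sd = se)      # 40th percentile
```

Storing the standard error in `se` once avoids retyping the square-root expression in every call.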


Example 2: A particular candidate for public office is favored by \(38\%\) of all registered voters in the district. A polling organization takes a random sample of \(100\) voters.

  • Does the sampling distribution of the sample proportion have a normal distribution?
  • What is the mean of the sampling distribution of the sample proportion?
  • What is the standard error of the sample proportion?
  • What is the probability that the sample proportion exceeds 0.4?
  • What is the probability that the sample proportion lies between 0.35 and 0.39?
  • What is the 20th percentile of the sampling distribution of \(\hat p\)?

  • Yes, since \(np = 100 \cdot 0.38 \ge 10\) and \(n(1-p) = 100 \cdot (1-0.38) \ge 10\)
  • \(E(\hat p) = p = 0.38\)
  • \(\sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.38 \cdot (1-0.38)}{100}} = 0.049\)
  • \(P(\hat p > 0.40) = 1 - \text{pnorm}(0.40,0.38,\text{sqrt}(0.38*(1-0.38)/100))=0.3402\)
  • \(P(0.35 \le \hat p \le 0.39) = \text{pnorm}(0.39,0.38,\text{sqrt}(0.38*(1-0.38)/100))-\text{pnorm}(0.35,0.38,\text{sqrt}(0.38*(1-0.38)/100))=0.3133\)
  • \(\text{qnorm}(0.2,0.38,\text{sqrt}(0.38*(1-0.38)/100))=0.3391\)

Note: You can do it all at once in R with the following code
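A minimal sketch of what such code might look like, again with illustrative variable names (`p`, `n`, `se`) and only base R's `pnorm()` and `qnorm()`:

```r
p  <- 0.38                      # proportion of voters favoring the candidate
n  <- 100                       # number of voters sampled
se <- sqrt(p * (1 - p) / n)     # standard error of p-hat

c(np = n * p, nq = n * (1 - p)) # both must be at least 10
p                               # mean of the sampling distribution
se                              # standard error
1 - pnorm(0.40, mean = p, sd = se)  # P(p-hat > 0.40)
# P(0.35 <= p-hat <= 0.39): area below 0.39 minus area below 0.35
pnorm(0.39, mean = p, sd = se) - pnorm(0.35, mean = p, sd = se)
qnorm(0.20, mean = p, sd = se)      # 20th percentile
```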