Introduction

The sampling distribution of a statistic is the probability distribution of that statistic: the values the statistic can take and how often each value would occur if we took every possible sample of size \(n\) from the population.

Consider taking repeated samples from a population and computing the statistic for each sample. You would get many different values of the statistic and some values would be more common than others.
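The idea of repeated sampling can be sketched with a short simulation in R. This is only an illustration under stated assumptions: the population (100,000 right-skewed values), the sample size, and the number of repetitions are all made up for the example, and a finite number of samples only approximates "every possible sample".

```r
# Hypothetical population: 100,000 values from a right-skewed distribution.
set.seed(1)
population <- rexp(100000, rate = 1 / 50)

n <- 25            # size of each sample (illustrative)
n_samples <- 5000  # number of repeated samples (illustrative)

# Draw many samples of size n and record the mean of each one.
sample_means <- replicate(n_samples, mean(sample(population, size = n)))

# The collection of sample means approximates the sampling distribution
# of the sample mean for this population and this sample size.
summary(sample_means)
hist(sample_means, breaks = 40,
     main = "Simulated sampling distribution of the sample mean",
     xlab = "Sample mean")
```

The histogram shows which values of the sample mean are common and which are rare.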



The Mean

Let \(\bar X\) denote the sample mean of a random sample of \(n\) observations from a population with mean \(\mu\) and standard deviation \(\sigma\). Then the mean of the sampling distribution of \(\bar X\) is equal to the population mean \(\mu\): \[E(\bar X) = \mu\]
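One way to see why this holds is to use the linearity of expectation. The sketch below writes the observations as \(X_1, \dots, X_n\) (notation introduced here for illustration) and assumes each has mean \(\mu\): \[E(\bar X) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu\]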

The Standard Deviation

Let \(\bar X\) denote the sample mean of a random sample of \(n\) observations from a population with mean \(\mu\) and standard deviation \(\sigma\). Then the standard deviation of the sampling distribution of \(\bar X\) is equal to the population standard deviation divided by the square root of \(n\): \[\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}}\] Note: The standard error of \(\bar X\) is another name for the standard deviation of the sampling distribution of \(\bar X\).
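The formula can be checked numerically. The following is a rough sketch, assuming a normal population with illustrative values \(\mu = 100\) and \(\sigma = 9\); it compares the standard deviation of many simulated sample means with \(\sigma/\sqrt{n}\) for a few sample sizes.

```r
set.seed(2)
mu <- 100    # population mean (illustrative value)
sigma <- 9   # population standard deviation (illustrative value)

# For each sample size, simulate 10,000 sample means from a normal population
# and compare their standard deviation with sigma / sqrt(n).
for (n in c(4, 16, 64)) {
  sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))
  cat("n =", n,
      " simulated SD =", round(sd(sample_means), 3),
      " sigma/sqrt(n) =", sigma / sqrt(n), "\n")
}
```

The simulated values should land close to the formula, and quadrupling \(n\) should roughly halve the standard error.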

The Shape

Let \(\bar X\) denote the sample mean of a random sample of \(n\) observations from a population with mean \(\mu\) and standard deviation \(\sigma\). Then the sampling distribution of \(\bar X\) will be approximately normal if either of the following is true:

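  • the population is normal (in this case \(\bar X\) is exactly normal for any \(n\))
  • \(n \ge 30\) (a common rule of thumb based on the Central Limit Theorem, which says that the sampling distribution of \(\bar X\) approaches a normal distribution as \(n\) increases)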
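The effect of the sample size can be illustrated with a simulation. This is a sketch under assumptions: the population is taken to be exponential with rate 1 (mean 1, standard deviation 1, strongly right-skewed), and the helper name within_one_se is hypothetical. It measures how often the sample mean falls within one standard error of \(\mu\); for a normal distribution that proportion is about 0.683.

```r
set.seed(3)

# Proportion of simulated sample means within one standard error of the
# population mean. For a normal distribution this is about 0.683.
within_one_se <- function(n, reps = 20000) {
  # Exponential population with rate 1: mean 1, standard deviation 1.
  sample_means <- replicate(reps, mean(rexp(n, rate = 1)))
  mean(abs(sample_means - 1) <= 1 / sqrt(n))
}

# Compare small and large sample sizes.
sapply(c(2, 5, 30, 100), within_one_se)
```

For very small \(n\) the proportion is noticeably off, and it moves toward the normal value as \(n\) grows.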

Putting it Together

Let \(\bar X\) denote the sample mean of a random sample of \(n\) observations from a population with mean \(\mu\) and standard deviation \(\sigma\). Then \[\bar X \sim \text{ approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)\] where the second parameter is the standard deviation (the standard error of \(\bar X\)), as long as one of the following is true:

  • the population is normal
  • \(n \ge 30\)

Note: From now on, we will say that \(\bar X\) has a normal distribution even when, strictly speaking, it is only approximately normal.

Example 1: A random sample of size 16 is taken from a normal population with mean \(\mu = 100\) and standard deviation \(\sigma = 9\).

  • Does \(\bar X\) have a normal distribution?
  • What is the mean of the sampling distribution of \(\bar X\)?
  • What is the standard error of \(\bar X\)?
  • What is the probability that \(\bar X\) is less than 97?
  • What is the probability that \(\bar X\) is greater than 101?
  • What is the 90th percentile of the sampling distribution of \(\bar X\)?

  • Yes, since the population is normal
  • \(E(\bar X) = \mu = 100\)
  • \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{9}{\sqrt{16}} = 2.25\)
  • \(P(\bar X < 97) = \text{pnorm}(97,100,2.25)=0.0912\)
  • \(P(\bar X > 101) = 1 - \text{pnorm}(101,100,2.25)=0.3284\)
  • \(\text{qnorm}(0.9,100,2.25)=102.884\)

Note: You can do the whole thing with the following R code
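A minimal sketch of those calculations in base R, using the same pnorm and qnorm calls as the answers above. The variable names are illustrative choices.

```r
mu <- 100     # population mean
sigma <- 9    # population standard deviation
n <- 16       # sample size

# Standard error of the sample mean: sigma / sqrt(n)
se <- sigma / sqrt(n)

# P(X-bar < 97): area to the left of 97
p_less_97 <- pnorm(97, mean = mu, sd = se)

# P(X-bar > 101): area to the right of 101
p_greater_101 <- 1 - pnorm(101, mean = mu, sd = se)

# 90th percentile of the sampling distribution of X-bar
pct_90 <- qnorm(0.9, mean = mu, sd = se)

round(c(se = se, p_less_97 = p_less_97,
        p_greater_101 = p_greater_101, pct_90 = pct_90), 4)
```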


Example 2: Carbon monoxide (CO) emissions for a certain kind of car vary with a mean of 2.9 grams per mile (g/mi) and a standard deviation of 0.9 g/mi. A company has 81 cars of this kind in its fleet.

  • Do the average CO emissions for the fleet have a normal distribution?
  • What is the mean of the sampling distribution of average CO emissions for the fleet?
  • What is the standard error of average CO emissions for the fleet?
  • What is the probability that the average CO emissions for the fleet is between 3.0 and 3.1 g/mi?
  • What is the 80th percentile of the average CO emissions for the fleet?

  • Yes (approximately), since \(n = 81 \ge 30\), treating the fleet as a random sample of such cars
  • \(E(\bar X) = \mu = 2.9\)
  • \(\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{0.9}{\sqrt{81}} = 0.1\)
  • \(P(3.0 \le \bar X \le 3.1) = \text{pnorm}(3.1,2.9,0.1)-\text{pnorm}(3.0,2.9,0.1)=0.1359\)
  • \(\text{qnorm}(0.8,2.9,0.1)=2.9842\)

Note: You can do the whole thing with the following R code
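A minimal sketch of those calculations in base R, following the same steps as the answers above. The variable names are illustrative choices, and the fleet is treated as a random sample of 81 cars.

```r
mu <- 2.9     # population mean CO emissions (g/mi)
sigma <- 0.9  # population standard deviation (g/mi)
n <- 81       # number of cars in the fleet

# Standard error of the fleet average: sigma / sqrt(n)
se <- sigma / sqrt(n)

# P(3.0 <= X-bar <= 3.1): area to the left of 3.1 minus area to the left of 3.0
p_between <- pnorm(3.1, mean = mu, sd = se) - pnorm(3.0, mean = mu, sd = se)

# 80th percentile of the sampling distribution of X-bar
pct_80 <- qnorm(0.8, mean = mu, sd = se)

round(c(se = se, p_between = p_between, pct_80 = pct_80), 4)
```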