1 Central Limit Theorem

Let \(X_1\), \(X_2\), …, \(X_n\) be \(n\) randomly sampled, independent observations from a population with mean \(E(X_i)=\mu\) and variance \(Var(X_i)=\sigma^2\). Then, for sufficiently large \(n\), the distribution of the sample mean \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\) can be approximated by a normal distribution. Furthermore,

  1. \(E(\bar{X})=\mu\), and
  2. \(Var(\bar{X})=\frac{\sigma^2}{n}\)

\(\mu_{\bar{X}} = E(\bar{X})\) and \(\sigma_{\bar{X}} = \sqrt{Var(\bar{X})}\) are called the mean and standard deviation of the sampling distribution of \(\bar{X}\).
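The two properties above can be checked numerically. The following is a minimal simulation sketch, not a proof. It assumes a uniform population (deliberately not normal), and the values \(\mu=50\), \(\sigma=10\), \(n=25\), as well as the helper name `simulate_sample_means`, are hypothetical choices made only for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of a uniform population."""
    rng = random.Random(seed)
    # A uniform population on [mu - h, mu + h] has mean mu and SD h / sqrt(3),
    # so h = sigma * sqrt(3) gives the requested mean and standard deviation.
    h = sigma * math.sqrt(3)
    means = []
    for _ in range(reps):
        sample = [rng.uniform(mu - h, mu + h) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

# Hypothetical values, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20_000)

observed_mean = statistics.fmean(means)   # should be close to mu
observed_sd = statistics.pstdev(means)    # should be close to sigma / sqrt(n)
predicted_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (theory: {mu})")
print(f"SD of sample means:   {observed_sd:.3f} (theory: {predicted_sd:.3f})")
```

With these assumed values the theoretical standard deviation of \(\bar{X}\) is \(10/\sqrt{25}=2\), and the simulated values should land close to 50 and 2 even though the population itself is not normal.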


Recall that if \(X\) and \(Y\) are random variables and \(a\) and \(b\) are constants, then:

  • \(E(X+Y) = E(X) + E(Y)\)
  • \(E(aX+b) = aE(X) + b\)
  • \(Var(X+Y)=Var(X)+Var(Y)\), provided \(X\) and \(Y\) are independent
  • \(Var(aX+b)=a^2 Var(X)\)
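As a sanity check on these four rules, here is a small numerical sketch. The distributions (a normal \(X\) and an exponential \(Y\), drawn independently) and the constants \(a=3\), \(b=7\) are assumptions made for illustration. The linear rules hold up to floating-point rounding, while the variance-of-a-sum rule holds only approximately in a finite sample.

```python
import random
import statistics

rng = random.Random(1)
N = 50_000

# Hypothetical, independently generated variables.
x = [rng.gauss(3.0, 2.0) for _ in range(N)]      # mean 3, variance 4
y = [rng.expovariate(0.5) for _ in range(N)]     # mean 2, variance 4
a, b = 3.0, 7.0

s = [xi + yi for xi, yi in zip(x, y)]            # X + Y
t = [a * xi + b for xi in x]                     # aX + b

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
var_x, var_y = statistics.pvariance(x), statistics.pvariance(y)
mean_s, var_s = statistics.fmean(s), statistics.pvariance(s)
mean_t, var_t = statistics.fmean(t), statistics.pvariance(t)

print(f"E(X+Y)    = {mean_s:.4f}  vs  E(X)+E(Y)     = {mean_x + mean_y:.4f}")
print(f"E(aX+b)   = {mean_t:.4f}  vs  aE(X)+b       = {a * mean_x + b:.4f}")
print(f"Var(X+Y)  = {var_s:.4f}  vs  Var(X)+Var(Y) = {var_x + var_y:.4f}")
print(f"Var(aX+b) = {var_t:.4f}  vs  a^2 Var(X)    = {a * a * var_x:.4f}")
```

The small gap in the third line is twice the sample covariance of \(X\) and \(Y\), which shrinks toward zero as the number of draws grows when the variables are independent.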

  1. Use these properties to prove parts 1 and 2 of the CLT.

  2. Suppose we sample \(n=10\) people, ask them a numeric question (e.g. How old are you?), and calculate the mean of their answers: \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\).

    1. If we assume the mean age is \(\mu=20\) and the standard deviation is \(\sigma=2\), what is the sampling distribution of \(\bar{X}\)?
    2. If we assume the mean age is \(\mu=30\) and the standard deviation is \(\sigma=3\), what is the sampling distribution of \(\bar{X}\)?
    3. If we assume the mean age is \(\mu=60\) and the standard deviation is \(\sigma=5\), what is the sampling distribution of \(\bar{X}\)?
    4. Which of these sampling distributions has the smallest standard deviation (margin of error)?
  3. Redo the previous question with \(n=100\). How does the standard deviation of the sampling distribution change? Which case would you choose if you wanted to reduce the margin of error?

2 Using the sampling distribution of a Bernoulli Trial

An event that has two outcomes (success or failure) is called a Bernoulli trial. If \(X\) is a Bernoulli random variable with probability of success \(p\), where \(X=1\) is success and \(X=0\) is failure, then:

  • \(E(X)=p\)
  • \(Var(X)=p(1-p)\)

Suppose you sample \(n\) Bernoulli trials, each with probability of success \(p\). What is the sample mean of those trials? A proportion!
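To see why the sample mean of 0/1 outcomes is a proportion, here is a minimal sketch. The values \(p=0.3\) and \(n=50\) are hypothetical, chosen only for illustration.

```python
import random

rng = random.Random(42)
p, n = 0.3, 50   # hypothetical values

# Each trial is 1 (success) with probability p, otherwise 0 (failure).
trials = [1 if rng.random() < p else 0 for _ in range(n)]

sample_mean = sum(trials) / n        # mean of the 0/1 values
proportion = trials.count(1) / n     # fraction of trials that succeeded

print(f"sample mean = {sample_mean}, proportion of successes = {proportion}")
```

Summing 0s and 1s simply counts the successes, so dividing that sum by \(n\) gives the same number as the fraction of successes.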

3 Central Limit Theorem for a Proportion

For \(X_1\), \(X_2\), …, \(X_n\) Bernoulli trials with probability of success \(p\), \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\) is the proportion of successes.

In this case, for sufficiently large \(n\), \(\bar{X}\) approximately follows a normal distribution with:

  1. mean: \(\mu = E(\bar{X})=p\), and
  2. variance: \(\sigma^2 = Var(\bar{X})=\frac{p(1-p)}{n}\).
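The mean and variance above can be compared against a simulation. This is a sketch under assumed values (\(p=0.3\), \(n=200\), 5,000 repeated samples); the helper name `sample_proportion` is hypothetical. It also checks one consequence of approximate normality: roughly 68% of sample proportions should fall within one standard deviation of \(p\).

```python
import math
import random
import statistics

def sample_proportion(p, n, rng):
    """Proportion of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p) / n

rng = random.Random(7)
p, n, reps = 0.3, 200, 5_000   # hypothetical values
props = [sample_proportion(p, n, rng) for _ in range(reps)]

theory_sd = math.sqrt(p * (1 - p) / n)   # sqrt(p(1-p)/n)
sim_mean = statistics.fmean(props)
sim_sd = statistics.pstdev(props)

# Share of simulated proportions within one theoretical SD of p,
# compared with the same share under a standard normal distribution.
within_one_sd = sum(abs(x - p) <= theory_sd for x in props) / reps
std_normal = statistics.NormalDist()
normal_within = std_normal.cdf(1) - std_normal.cdf(-1)

print(f"mean: simulated {sim_mean:.4f}, theory {p}")
print(f"SD:   simulated {sim_sd:.4f}, theory {theory_sd:.4f}")
print(f"within 1 SD: simulated {within_one_sd:.3f}, normal {normal_within:.3f}")
```

Because a proportion can only take the values \(0, 1/n, 2/n, \dots, 1\), the agreement with the normal curve is approximate and improves as \(n\) grows.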

  1. Use the CLT from Section 1 to prove the CLT for a proportion.

Often we don’t know what \(p\) is. But if we assume a value for \(p\), we can compute the probability of observing data at least as extreme as what we have already sampled, and use that probability to test our hypothesis.
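A minimal sketch of that calculation, using the normal approximation from the CLT for a proportion. All numbers here are hypothetical: an assumed \(p_0=0.5\), a sample of \(n=400\), and an observed proportion of 0.55. The variable names are illustrative as well.

```python
import math
from statistics import NormalDist

n = 400        # hypothetical sample size
p0 = 0.5       # assumed (hypothesized) probability of success
p_hat = 0.55   # hypothetical observed sample proportion

# Standard deviation of the sampling distribution under the assumed p0.
se = math.sqrt(p0 * (1 - p0) / n)

# Approximate sampling distribution of the sample proportion if p0 were true.
sampling_dist = NormalDist(mu=p0, sigma=se)

z = (p_hat - p0) / se                       # distance from p0 in SD units
upper_tail = 1 - sampling_dist.cdf(p_hat)   # P(sample proportion >= p_hat)
two_sided = 2 * upper_tail                  # at least as extreme in either direction

print(f"SE = {se:.4f}, z = {z:.2f}")
print(f"upper-tail probability = {upper_tail:.4f}")
print(f"two-sided probability  = {two_sided:.4f}")
```

Here the observed proportion sits two standard deviations above the assumed \(p_0\), so the upper-tail probability is about 0.023. A small probability like this suggests the data would be unusual if the assumed \(p_0\) were correct.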

  1. Suppose we sample \(n=100\) people, ask them a Yes/No question, and measure the proportion of people \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\) answering Yes.
    1. If we assume \(p=0.5\), what is the sampling distribution of \(\bar{X}\)?
    2. If we assume \(p=0.25\), what is the sampling distribution of \(\bar{X}\)?
    3. If we assume \(p=0.75\), what is the sampling distribution of \(\bar{X}\)?
    4. Which of these sampling distributions has the smallest standard deviation (margin of error)?
  2. Redo the previous question with \(n=1000\). How does the standard deviation of the sampling distribution change? Which case would you choose if you wanted to reduce the margin of error?