Let \(X_1\), \(X_2\), …, \(X_n\) be \(n\) randomly sampled observations from a population with mean \(E(X_i)=\mu\) and variance \(Var(X_i)=\sigma^2\). Then, for sufficiently large \(n\), the distribution of the sample mean \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\) can be approximated by a normal density function. Furthermore,
\(\mu_{\bar{X}} = E(\bar{X})\) and \(\sigma_{\bar{X}} = \sqrt{Var(\bar{X})}\) are called the mean and standard deviation of the sampling distribution of \(\bar{X}\).
Recall that if \(X\) and \(Y\) are random variables, then:
Use these properties to prove parts (a) and (b) of the CLT.
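A sketch of the argument, assuming the standard properties \(E(aX+bY)=aE(X)+bE(Y)\) and, for independent \(X\) and \(Y\), \(Var(aX+bY)=a^2Var(X)+b^2Var(Y)\), and assuming that parts (a) and (b) refer to the mean and standard deviation of the sampling distribution:

\[
\begin{aligned}
E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \frac{1}{n}\cdot n\mu = \mu, \\
Var(\bar{X}) &= Var\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\sigma_{\bar{X}} &= \sqrt{Var(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{aligned}
\]

The variance step relies on the observations being independent, which random sampling is taken to provide here.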
Suppose we sample \(n=10\) people, ask them a numeric question (e.g., "How old are you?"), and calculate the mean of their answers: \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\).
Redo #3 with \(n=100\). How does the standard deviation of the sampling distribution change? Which case would you choose if you wanted to reduce the margin of error?
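One way to see the effect of sample size is to simulate it. The sketch below is only an illustration: the population (ages uniform between 18 and 80), the function name `sampling_sd`, and the number of replicates are all assumptions that do not come from the exercise.

```python
import random
import statistics

def sampling_sd(n, reps=5000, seed=0):
    """Estimate the SD of the sample mean by drawing `reps` samples of size n."""
    rng = random.Random(seed)
    # Hypothetical population: ages uniform on [18, 80] (assumed for illustration).
    means = [
        statistics.fmean(rng.uniform(18, 80) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

sd_10 = sampling_sd(10)
sd_100 = sampling_sd(100)
print(f"n=10:  SD of sample means is about {sd_10:.2f}")
print(f"n=100: SD of sample means is about {sd_100:.2f}")
print(f"ratio is about {sd_10 / sd_100:.2f} (theory: sqrt(10), about 3.16)")
```

Because the standard deviation of the sampling distribution is \(\sigma/\sqrt{n}\), going from \(n=10\) to \(n=100\) should shrink it by a factor of roughly \(\sqrt{10}\), and the margin of error shrinks with it.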
An experiment that has exactly two possible outcomes (success or failure) is called a Bernoulli trial. If \(X\) is a Bernoulli random variable with probability of success \(p\), where \(X=1\) is success and \(X=0\) is failure, then:
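The mean and variance of such a variable follow directly from the definition of expectation. A short derivation, using only the setup above and the fact that \(X^2=X\) when \(X\) takes the values 0 and 1:

\[
\begin{aligned}
E(X) &= 1\cdot p + 0\cdot(1-p) = p, \\
Var(X) &= E(X^2) - [E(X)]^2 = p - p^2 = p(1-p).
\end{aligned}
\]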
Suppose you sample \(n\) Bernoulli trials, each with probability of success \(p\). What is the sample mean of those trials? A proportion!
For \(X_1\), \(X_2\), …, \(X_n\) Bernoulli trials with probability of success \(p\), \(\bar{X}=\frac{\sum_{i=1}^n X_i}{n}\) is the proportion of successes.
In this case, for large \(n\), \(\bar{X}\) approximately follows a normal distribution with:
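Applying the general results for a sample mean to Bernoulli trials gives a sampling distribution with mean \(p\) and standard deviation \(\sqrt{p(1-p)/n}\). A small simulation sketch to check these values follows. The choices \(p=0.3\), \(n=200\), and 4000 replicates are arbitrary illustrations, not values from the text.

```python
import math
import random
import statistics

p, n, reps = 0.3, 200, 4000   # illustrative values (assumptions, not from the text)
rng = random.Random(1)

# Each replicate is the proportion of successes among n Bernoulli(p) trials.
props = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

sim_mean = statistics.fmean(props)
sim_sd = statistics.stdev(props)
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"simulated mean {sim_mean:.4f} vs p = {p}")
print(f"simulated SD   {sim_sd:.4f} vs sqrt(p(1-p)/n) = {theory_sd:.4f}")
```

The simulated mean and standard deviation should land close to the theoretical values, and a histogram of `props` would look roughly bell-shaped.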
Often we don’t know what \(p\) is. However, if we assume a value for \(p\), we can compute the probability of observing data at least as extreme as what we have already sampled, and use that probability to test our hypothesis.
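This idea can be sketched as a two-sided test for a proportion based on the normal approximation. The function name `proportion_p_value` and the data (60 successes in 100 trials, hypothesized \(p_0=0.5\)) are hypothetical, chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist

def proportion_p_value(successes, n, p0):
    """Two-sided p-value for H0: p = p0, using the normal approximation."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # SD of the sample proportion if H0 is true
    z = (p_hat - p0) / se               # how many SDs the sample is from p0
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
p_val = proportion_p_value(60, 100, 0.5)
print(f"p-value is about {p_val:.4f}")
```

Here the observed proportion sits two standard deviations above the assumed \(p_0\), so the p-value is roughly 0.046. A small p-value suggests the data would be unusual if the assumed \(p\) were correct.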