Sampling Distributions and Confidence Intervals

M. Drew LaMar
April 13, 2022

“…a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data.”

- William C. Blackwelder

Random sampling

The main assumption of all statistical techniques is that your data come from a random sample.

Definition: In a random sample, each member of a population has an equal and independent chance of being selected.

Random sampling

minimizes bias (equal) and
makes it possible to measure the amount of sampling error, i.e., to quantify precision (independent)
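As a minimal sketch of the definition, the snippet below draws a simple random sample with Python's standard library. The population of 1,000 numbered members, the sample size of 20, and the seed are all hypothetical choices made for illustration. `random.sample` draws without replacement, so every member has an equal chance of selection; when the population is much larger than the sample, this closely approximates independent selection.

```python
import random

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the draw is reproducible

# Hypothetical population: members identified by the IDs 1..1000
population = list(range(1, 1001))

# Simple random sample of 20 members, drawn without replacement;
# every member has the same chance of ending up in the sample
sample = rng.sample(population, 20)

print(sorted(sample))
```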

Sampling Distributions

Definition: The sampling distribution represents the distribution of the point estimates based on samples of a fixed size from a certain population. It is useful to think of a particular point estimate as being drawn from such a distribution. Understanding the concept of a sampling distribution is central to understanding statistical inference.

Definition: The standard deviation associated with an estimate is called the standard error. It describes the typical error or uncertainty associated with the estimate.

The standard error is also the standard deviation of the sampling distribution.
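A simulation can make the link between the sampling distribution and the standard error concrete. The sketch below assumes a hypothetical population of 10,000 normally distributed values and a sample size of 25; it repeatedly draws samples, records each sample mean, and compares the standard deviation of those means with the standard formula for the standard error of a mean, \( \sigma / \sqrt{n} \) (a formula not stated in the slides, used here as the reference value).

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values with mean near 50 and SD near 10
population = [rng.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation

n = 25             # fixed sample size
n_samples = 5_000  # number of repeated samples

# Each sample mean is one draw from the sampling distribution of the mean
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(n_samples)
]

# The standard deviation of the sampling distribution is the standard error
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5

print(f"empirical SE   = {empirical_se:.3f}")
print(f"sigma / sqrt(n) = {theoretical_se:.3f}")
```

The two printed values should agree closely, and both shrink as `n` grows.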

http://www.zoology.ubc.ca/~whitlock/kingfisher/SamplingNormal.htm

Central Limit Theorem (informal)

If a sample consists of at least 30 independent observations and the data are not strongly skewed, then the sampling distribution for the mean is well approximated by a normal model even if the population is not normally distributed.
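The following sketch illustrates the theorem under stated assumptions: the population is taken to be exponential with rate 1 (mean 1, standard deviation 1, clearly right-skewed), and the sample size is 30. The `skewness` helper is a hypothetical function written for this example. The sample means come out far more symmetric than the raw observations, although some residual skew remains at this sample size.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is reproducible

def skewness(xs):
    """Moment-based skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

n = 30             # observations per sample
n_samples = 4_000  # number of repeated samples

# Raw observations from the skewed population
raw = [rng.expovariate(1.0) for _ in range(n_samples)]

# Sampling distribution of the mean for samples of size n
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

skew_raw = skewness(raw)
skew_means = skewness(means)

print(f"skewness of raw data     = {skew_raw:.2f}")
print(f"skewness of sample means = {skew_means:.2f}")
print(f"SD of sample means       = {statistics.stdev(means):.3f}")
```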

http://www.zoology.ubc.ca/~whitlock/kingfisher/CLT.htm

Confidence Intervals

Definition: The standard error represents the standard deviation associated with the estimate. When the sampling distribution is approximately normal, the estimate will be within about 2 standard errors of the parameter roughly 95% of the time.

An approximate 95% confidence interval for the parameter, built around a point estimate, is given by \[ \textrm{point estimate} \pm 1.96\times SE \]
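As a worked sketch of this formula for a sample mean, the code below estimates the standard error as \( s / \sqrt{n} \), the usual estimate for a mean (an assumption here, since the slides leave SE general). The function name `approx_ci_95` and the simulated sample of 40 measurements are hypothetical.

```python
import math
import random
import statistics

def approx_ci_95(data):
    """Return (lower, upper) of the approximate 95% CI for the mean."""
    n = len(data)
    point_estimate = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error of the mean
    margin = 1.96 * se
    return point_estimate - margin, point_estimate + margin

rng = random.Random(3)  # fixed seed so the sketch is reproducible

# Hypothetical sample: 40 measurements from a population with mean 100, SD 15
sample = [rng.gauss(100, 15) for _ in range(40)]

lower, upper = approx_ci_95(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With a sample this size, using 1.96 rather than a t-distribution multiplier gives a slightly narrower interval; the difference fades as the sample grows.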

Note: If a huge number of 95% confidence intervals are computed, each from a separate random sample, about 95% of them will contain the population parameter.
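This long-run interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population with a known mean of 10 and standard deviation of 2, builds 2,000 intervals from independent samples of size 50, and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

rng = random.Random(4)  # fixed seed so the sketch is reproducible

TRUE_MEAN, TRUE_SD = 10.0, 2.0  # hypothetical population parameters
n = 50                          # sample size for each interval
n_intervals = 2_000             # number of confidence intervals to compute

covered = 0
for _ in range(n_intervals):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Does this interval contain the true population mean?
    if mean - 1.96 * se <= TRUE_MEAN <= mean + 1.96 * se:
        covered += 1

coverage = covered / n_intervals
print(f"fraction of intervals containing the true mean = {coverage:.3f}")
```

The printed fraction should land near 0.95. Any single interval either contains the parameter or it does not; the 95% describes the procedure over many samples.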

http://www.zoology.ubc.ca/~whitlock/kingfisher/CIMean.htm