Sampling Distributions and Confidence Intervals

M. Drew LaMar
February 3, 2021

“…a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data.”

- William C. Blackwelder

Introduction to Statistical Inference

Populations vs Samples

Definition: A parameter is a quantity describing a population, whereas an estimate or statistic is a related quantity calculated from a sample.

Parameter examples: Averages, proportions, measures of variation, and measures of relationship
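As a minimal sketch of the distinction, the following Python snippet builds a made-up population and compares a parameter with an estimate of it. The population values, the sample size of 25, and all variable names are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 body lengths in mm (values are invented).
population = [rng.gauss(50.0, 8.0) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.fmean(population)

# Estimate: the same quantity computed from a sample of 25 individuals.
sample = rng.sample(population, 25)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"estimate  (sample mean):     {sample_mean:.2f}")
```

In practice the population is rarely fully observed, so only the estimate is available; the parameter is computable here only because the population is simulated.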

What is statistics?

Statistics is a technology that describes and measures aspects of nature from samples.

Statistics lets us quantify the uncertainty of these measures.

Statistics makes it possible to determine the likely magnitude of a measurement's departure from the “truth”.

Statistics is about estimation, the process of inferring an unknown quantity of a target population using sample data.

What is statistics?

The two sides of the statistical coin:

Parameter estimation
Hypothesis testing

Definition: A statistical hypothesis is a specific claim regarding a population parameter.

Definition: Hypothesis testing uses data to evaluate evidence for or against statistical hypotheses.

What is statistics? Parameter estimation

Example: A trapping study measures the rate of fruit fall in forest clear-cuts.

What is statistics? Hypothesis testing

Example: A clinical trial is carried out to determine whether taking large doses of vitamin C benefits health of advanced cancer patients.

What is probability?

Probability comes first!

…well, most of the time.

Many statistical techniques require assumptions about where your data come from (i.e., properties of the population).
In other words, an assumed probability model describes the population.
Statistical techniques that are based on probability models are called parametric techniques, while those that are not are called non-parametric techniques.

Data as Information

For your question, there is desired and undesired information in your data.

Goals:

Get accurate information by reducing bias
Get precise information by reducing sampling error due to random variation (increase signal-to-noise ratio)

Definition: Bias is a systematic discrepancy between the estimates we would obtain, if we could sample a population again and again, and the true population characteristic.

Definition: Sampling error is the difference between an estimate and the population parameter being estimated caused by chance.
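The difference between bias and sampling error can be sketched with a small simulation that does what the definition of bias describes: sample the same population again and again. The population, the "only individuals larger than 45 can be caught" scheme, and the helper `repeated_estimates` are all hypothetical, invented for this illustration.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population (values are invented).
population = [rng.gauss(50.0, 8.0) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Hypothetical biased scheme: only individuals larger than 45 can be caught.
catchable = [x for x in population if x > 45.0]

def repeated_estimates(source, n, repeats):
    """Return sample means from `repeats` separate samples of size n."""
    return [statistics.fmean(rng.sample(source, n)) for _ in range(repeats)]

random_estimates = repeated_estimates(population, 25, 2_000)
biased_estimates = repeated_estimates(catchable, 25, 2_000)

# Bias: systematic discrepancy between the estimates and the parameter.
bias_random = statistics.fmean(random_estimates) - true_mean
bias_biased = statistics.fmean(biased_estimates) - true_mean

# Sampling error of one estimate, and the typical size of that error.
one_sampling_error = random_estimates[0] - true_mean
spread_random = statistics.stdev(random_estimates)

print(f"bias, random sampling: {bias_random:+.2f}")
print(f"bias, biased sampling: {bias_biased:+.2f}")
print(f"sampling error of the first random sample: {one_sampling_error:+.2f}")
print(f"spread of random-sample estimates: {spread_random:.2f}")
```

Under these assumptions, the random-sample estimates scatter around the parameter (sampling error without bias), while the biased scheme shifts every estimate in the same direction no matter how many samples are taken.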

Precision vs Accuracy

Data as Information

For your question, there is desired and undesired information in your data.

Goals:

Isolate desired information by reducing or controlling for confounding factors (i.e. undesired information)

“The aim … is to provide a clear and rigorous basis for determining when a causal ordering can be said to hold between two variables or groups of variables in a model…”

- H. Simon

Random sampling

The main assumption of all statistical techniques is that your data come from a random sample.

Definition: In a random sample, each member of a population has an equal and independent chance of being selected.

Random sampling

minimizes bias (because every member has an equal chance of being selected) and
makes it possible to quantify precision by measuring the amount of sampling error (because selections are independent)
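The "equal chance" part of the definition can be checked with a short sketch. It draws many simple random samples with Python's `random.sample` (which samples without replacement) from a hypothetical population of 20 labelled individuals and records how often each one is selected. The population size, sample size, and number of repeats are arbitrary assumptions.

```python
import random
from collections import Counter

rng = random.Random(3)  # fixed seed so the sketch is repeatable

population_ids = list(range(20))  # hypothetical population of 20 individuals
n = 5                             # sample size
repeats = 20_000                  # number of samples drawn

# Count how many samples each individual ends up in.
counts = Counter()
for _ in range(repeats):
    counts.update(rng.sample(population_ids, n))

# Observed inclusion frequency per individual vs. the expected n / N.
inclusion = {i: counts[i] / repeats for i in population_ids}
expected = n / len(population_ids)

print(f"expected inclusion frequency: {expected:.3f}")
print(f"observed range: {min(inclusion.values()):.3f} "
      f"to {max(inclusion.values()):.3f}")
```

This sketch only illustrates equal selection chances; it does not by itself demonstrate independence between selections.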

Sampling Distributions

Definition: The sampling distribution represents the distribution of the point estimates based on samples of a fixed size from a certain population. It is useful to think of a particular point estimate as being drawn from such a distribution. Understanding the concept of a sampling distribution is central to understanding statistical inference.

Definition: The standard deviation associated with an estimate is called the standard error. It describes the typical error or uncertainty associated with the estimate.

The standard error is also the standard deviation of the sampling distribution.

http://www.zoology.ubc.ca/~whitlock/kingfisher/SamplingNormal.htm
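A sampling distribution can also be approximated by simulation. The sketch below assumes a normal population model with made-up values for the mean and standard deviation, draws many samples of a fixed size, and compares the standard deviation of the resulting sample means with the standard result for the standard error of a mean, \( \sigma / \sqrt{n} \). That formula is not stated in these slides and is brought in here for comparison.

```python
import math
import random
import statistics

rng = random.Random(4)  # fixed seed so the sketch is repeatable

mu, sigma = 50.0, 8.0   # hypothetical population mean and standard deviation
n = 25                  # fixed sample size
repeats = 5_000         # number of samples drawn

# Each entry is one point estimate (a sample mean) from a fresh sample.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(repeats)
]

# The standard deviation of the sampling distribution is the standard error.
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

print(f"mean of the sample means: {statistics.fmean(sample_means):.2f}")
print(f"empirical standard error: {empirical_se:.3f}")
print(f"sigma / sqrt(n):          {theoretical_se:.3f}")
```

Increasing `n` in this sketch narrows the sampling distribution, which is one way to see how a larger sample improves precision.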

Confidence Intervals

Definition: The standard error is the standard deviation associated with the estimate. When the sampling distribution is approximately normal, the estimate will be within about 2 standard errors of the parameter roughly 95% of the time.

An approximate 95% confidence interval for the parameter is given by \[ \textrm{point estimate} \pm 1.96 \times \mathrm{SE} \]

Note: Over a large number of computed 95% confidence intervals, about 95% of them will contain the population parameter.

http://www.zoology.ubc.ca/~whitlock/kingfisher/CIMean.htm
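The long-run interpretation of a 95% confidence interval can be sketched by simulation. The code below assumes a normal population with invented parameter values, estimates the standard error of the mean from each sample as the sample standard deviation divided by \( \sqrt{n} \), builds the interval as point estimate \( \pm 1.96 \times \mathrm{SE} \), and counts how often the interval contains the true mean. The helper name `approx_ci95` is hypothetical.

```python
import math
import random
import statistics

rng = random.Random(5)  # fixed seed so the sketch is repeatable

mu, sigma = 50.0, 8.0   # hypothetical population mean and standard deviation
n = 100                 # sample size
repeats = 2_000         # number of confidence intervals computed

def approx_ci95(sample):
    """Approximate 95% CI for the mean: point estimate +/- 1.96 * SE."""
    estimate = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return estimate - 1.96 * se, estimate + 1.96 * se

# Count the intervals that contain the true population mean.
covered = 0
for _ in range(repeats):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    low, high = approx_ci95(sample)
    if low <= mu <= high:
        covered += 1

coverage = covered / repeats
print(f"fraction of intervals containing the parameter: {coverage:.3f}")
```

Because the standard error is itself estimated from each sample, the 1.96 multiplier is only approximate; with small samples the coverage in a simulation like this tends to fall somewhat below 95%.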