Sampling Distributions

A sampling distribution is the probability distribution of a sample statistic. It is used to account for the variability naturally present when you estimate population parameters using sample statistics.

In previous chapters, we assumed we knew the parameters of the relevant example distribution, but in practice these kinds of quantities are often unknown. In these cases, we need to estimate the quantities from a sample.

Different samples from the same population will generally give different values for the same statistic, because a sample statistic is itself a random variable. As with any other probability distribution, the central “balance” point of a sampling distribution is its mean; its standard deviation, however, is referred to as the standard error.

Distributions for a Sample Mean

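Let \(X_1, X_2, \ldots, X_n\) be a random sample of size \(n\) from a population with finite mean \(\mu_X\) and standard deviation \(\sigma_X\). The sample mean \(\bar{X}\) is then a random variable with mean \(\mu_X\).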

Standard Deviation Known

If we know \(\sigma_X\) and \(X\) comes from a Normal distribution then:

\[\bar{X} \, \sim \, Normal \Bigg( \mu_X, \frac{\sigma_X}{\sqrt{n}}\Bigg).\]
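The result above can be checked by simulation. The following is a minimal sketch using only the Python standard library; the population values (\(\mu_X = 50\), \(\sigma_X = 10\)), the sample size, and the function name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size `n` from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 50.0, 10.0, 25
means = simulate_sample_means(mu, sigma, n, reps=2000)

centre = statistics.fmean(means)        # should be close to mu
simulated_se = statistics.stdev(means)  # should be close to sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5       # 10 / 5 = 2.0

print(f"mean of sample means:       {centre:.3f}")
print(f"simulated standard error:   {simulated_se:.3f}")
print(f"theoretical standard error: {theoretical_se:.3f}")
```

The simulated standard error should land near the theoretical value \(\sigma_X/\sqrt{n}\), with some run-to-run noise because the number of replications is finite.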

If we know \(\sigma_X\) but \(X\) does not come from a Normal distribution, then for sufficiently large \(n\):

\[\bar{X} \, \dot\sim \, Normal \Bigg( \mu_X, \frac{\sigma_X}{\sqrt{n}}\Bigg).\]

This last result is known as the Central Limit Theorem.
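To see the Central Limit Theorem at work on a clearly non-Normal population, the sketch below samples from an Exponential distribution with rate 1 (mean 1, standard deviation 1) and counts how often the sample mean falls within 1.96 standard errors of the true mean. The choice of population, the sample size of 40, and the helper name `fraction_within` are assumptions made for this example.

```python
import random
import statistics

def fraction_within(n, reps, z=1.96, seed=1):
    """Fraction of sample means within z standard errors of the true mean.

    The population is Exponential(rate=1): strongly right-skewed, with
    mean 1 and standard deviation 1 (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    se = sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if abs(xbar - mu) <= z * se:
            hits += 1
    return hits / reps

frac = fraction_within(n=40, reps=5000)
print(f"fraction within 1.96 standard errors: {frac:.3f}")
```

If the Normal approximation is reasonable at this sample size, the printed fraction should be close to 0.95 even though the individual observations are heavily skewed.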

Standard Deviation Unknown

In practice, \(\sigma_X\) is usually not known, so we must replace it with the sample standard deviation \(s_X\). This introduces more variability and affects the distribution: we can no longer use the Normal distribution, and instead we use the t distribution.

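\[\bar{X} \, \sim \, t_{n-1} \Bigg( \mu_X, \frac{s_X}{\sqrt{n}}\Bigg)\] where \(n-1\) is the number of degrees of freedom. This notation is shorthand for saying that the standardized quantity \((\bar{X} - \mu_X)/(s_X/\sqrt{n})\) follows a t distribution with \(n-1\) degrees of freedom. The higher the degrees of freedom, the more closely the t distribution resembles a Normal distribution.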
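The extra variability introduced by estimating the standard deviation can be illustrated by simulation. The sketch below standardizes each sample mean with its own sample standard deviation and records how often the result falls beyond \(\pm 1.96\), the cutoff that leaves 5 percent in the tails of a standard Normal. The Normal(0, 1) population, the sample sizes, and the name `tail_fraction` are assumptions for this example.

```python
import math
import random

def tail_fraction(n, reps, cutoff=1.96, seed=2):
    """Fraction of values (xbar - mu) / (s / sqrt(n)) lying beyond +/- cutoff."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    beyond = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the usual n - 1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        t_stat = (xbar - mu) / (s / math.sqrt(n))
        if abs(t_stat) > cutoff:
            beyond += 1
    return beyond / reps

small_n = tail_fraction(n=5, reps=10000)   # 4 degrees of freedom
large_n = tail_fraction(n=50, reps=10000)  # 49 degrees of freedom

print(f"n = 5:  fraction beyond 1.96 = {small_n:.3f}")
print(f"n = 50: fraction beyond 1.96 = {large_n:.3f}")
```

With only 4 degrees of freedom the tail fraction should be well above 0.05, reflecting the heavier tails of the t distribution; with 49 degrees of freedom it should sit much closer to the Normal value.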

Distribution for a Sample Proportion

The random variable of interest, \(\hat{p}\), represents the estimated proportion of successes over \(n\) trials, each resulting in some defined binary outcome. It is estimated as:

\[\hat{p}=\frac{x}{n}\] where x is the number of successes in a sample of size n.

The true (unknown) proportion of successes is denoted by \(\pi\) or \(p\).

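The sampling distribution of \(\hat{p}\) is approximately:

\[\hat{p} \, \dot\sim \, Normal \Bigg( \pi, \sqrt{\frac{\pi(1-\pi)}{n}}\Bigg).\]

Because \(\pi\) is unknown in practice, the standard error is estimated by substituting \(\hat{p}\) for \(\pi\), giving \(\sqrt{\hat{p}(1-\hat{p})/n}\).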

Note:

- This approximation is valid for large \(n\) and/or when \(\pi\) is not too close to 0 or 1.
- The rule of thumb for the validity of these assumptions is for \(np\) and \(n(1-p)\) to both be greater than 5.
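As a small worked illustration, the sketch below computes a sample proportion, its estimated standard error, and the rule-of-thumb check. Since the true proportion is unknown, the check uses \(\hat{p}\) in its place, which is an assumption of this sketch; the counts and the function name `proportion_summary` are hypothetical.

```python
import math

def proportion_summary(x, n):
    """Return the sample proportion, its estimated standard error, and
    whether the rule of thumb (both counts greater than 5) is met."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # p_hat stands in for the unknown true proportion in the check.
    rule_ok = n * p_hat > 5 and n * (1 - p_hat) > 5
    return p_hat, se, rule_ok

# Hypothetical data: 40 successes in 100 trials.
p_hat, se, rule_ok = proportion_summary(x=40, n=100)
print(f"p_hat = {p_hat:.2f}, standard error = {se:.4f}, rule of thumb met: {rule_ok}")
```

For these counts the estimated standard error is \(\sqrt{0.4 \times 0.6 / 100} \approx 0.049\), and both \(n\hat{p} = 40\) and \(n(1-\hat{p}) = 60\) comfortably exceed 5.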


Confidence Intervals

A confidence interval (CI) is an interval defined by a lower limit L and an upper limit U, used to describe possible values of a corresponding true population parameter in light of observed sample data.

A confidence interval allows us to state a “level of confidence” that a true parameter of interest falls between this upper and lower limit.

The level of confidence is usually expressed as a percentage, such that you’d construct a \(100 \times (1 - \alpha)\) percent confidence interval, where \(0 \lt \alpha \lt 1\) is an “amount of tail probability” or the amount of error we are comfortable with.

The three most common intervals are defined with either \(\alpha = 0.1\) (a 90 percent confidence interval), \(\alpha = 0.05\) (a 95 percent confidence interval), or \(\alpha = 0.01\) (a 99 percent confidence interval).

INTERPRETATION

For a given confidence interval (L, U) we say:

“I am \(100 \times (1 - \alpha)\) percent confident that the true parameter value lies somewhere between L and U.”

GENERAL STRUCTURE

A confidence interval is built as:

\[\text{statistic} \pm \text{critical value} \times \text{standard error}\]

When we are working with symmetric distributions, like Normal or t, and we want to have \((1 - \alpha)\) probability in the center, we need to find values such that we have \(\alpha/2\) probability in each tail.
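For the Normal distribution, the critical value that leaves \(\alpha/2\) in each tail is the \(1 - \alpha/2\) quantile. A minimal sketch using `statistics.NormalDist` from the Python standard library (version 3.8 or later) is given here; the helper name `z_critical` is an assumption for this example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Standard Normal quantile that leaves alpha / 2 in the upper tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    level = 100 * (1 - alpha)
    print(f"{level:.0f} percent interval: z = {z_critical(alpha):.3f}")
```

These are the familiar values of roughly 1.645, 1.960, and 2.576. The standard library has no t distribution, so critical values for the t case would have to come from a table or a statistics package.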

An Interval for a Mean

A \(100 \times (1 - \alpha)\) percent confidence interval for \(\mu\) is given by either:

\[\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \] or \[\bar{x} \pm t_{n-1;\>\alpha/2}\>\frac{s}{\sqrt{n}} \]

depending on whether the standard deviation is known or not.
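Both forms share the same structure and differ only in the critical value. The sketch below applies them to hypothetical summary values (\(\bar{x} = 50\), standard deviation 10, \(n = 25\)). The t critical value 2.064 for 24 degrees of freedom is taken from a standard t table rather than computed, and the function name `mean_ci` is an assumption for this example.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, sd, n, critical):
    """Interval xbar +/- critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

alpha = 0.05

# Standard deviation known: use the Normal critical value (about 1.96).
z = NormalDist().inv_cdf(1 - alpha / 2)
z_interval = mean_ci(xbar=50.0, sd=10.0, n=25, critical=z)

# Standard deviation estimated from the sample: use the t critical value
# for n - 1 = 24 degrees of freedom, copied here from a standard t table.
T_24_025 = 2.064
t_interval = mean_ci(xbar=50.0, sd=10.0, n=25, critical=T_24_025)

print(f"z interval: ({z_interval[0]:.2f}, {z_interval[1]:.2f})")
print(f"t interval: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

With identical summary values the t interval is slightly wider than the z interval, which reflects the additional uncertainty from estimating the standard deviation.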

An Interval for a Proportion

A \(100 \times (1 - \alpha)\) percent confidence interval for \(\pi\) is given by:

\[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} .\]
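A short sketch of this interval follows, using hypothetical counts (130 successes in 200 trials) and an assumed helper name, `proportion_ci`.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, alpha=0.05):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 130 successes in 200 trials, so p_hat = 0.65.
lower, upper = proportion_ci(x=130, n=200)
print(f"95 percent CI for the proportion: ({lower:.4f}, {upper:.4f})")
```

For these counts the interval runs from about 0.584 to about 0.716. The function does not check the rule of thumb for the Normal approximation, so that check would need to be made separately.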

Common Interpretations of Confidence Intervals

Over many samples of the same size and from the same population where a CI, of the same confidence level, is constructed with respect to the same statistic from each sample, you would expect the true corresponding parameter value to fall within the limits of \(100\times(1 - \alpha)\) percent of those intervals.
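This long-run interpretation can be illustrated by repeatedly sampling from a population whose mean is known, building an interval from each sample, and counting how many intervals contain the true mean. The sketch below assumes a Normal population with known standard deviation so that the z interval applies; the population values and the name `coverage` are assumptions for this example.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, reps, seed=3):
    """Fraction of z intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / n ** 0.5
    covered = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            covered += 1
    return covered / reps

rate = coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, reps=2000)
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains the true mean or does not; the 95 percent figure describes the procedure across repeated samples.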