Confidence Intervals
Intro to OLS Regression
POLS 3316: Statistics for Political Scientists

Tom Hanna

2023-11-12

Confidence Intervals

Review \(\chi^2\) and t-test
General hypothesis testing: scores and critical values
Confidence intervals

\(\chi^2\) and t-test

Note on formulas: Everything in the numerator or the denominator of a fraction gets solved together, as if it were in parentheses. PEMDAS is a useful shorthand, but the P should really be G for “grouping symbols,” which include parentheses ( ), brackets [ ], braces { }, and even fraction bars.
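A quick way to see why the fraction bar matters is to type the same fraction into R with and without parentheses. The numbers below are made up purely for illustration.

```r
# Made-up numbers: the written fraction is (a + b) over (m + k)
a <- 6
b <- 4
m <- 3
k <- 2

# What the written fraction means: group the top and the bottom first
grouped <- (a + b) / (m + k)     # 10 / 5

# What R computes if we forget the parentheses: only b / m is divided
ungrouped <- a + b / m + k       # 6 + 4/3 + 2

grouped
ungrouped
```

The two results differ, which is why the formulas in these slides need parentheses around any numerator or denominator with more than one term when we type them into R.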

Terms

null hypothesis
alternative hypothesis
critical value (cutoff point)
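The critical value is found using the degrees of freedom and the desired probability, or alpha. (For z-scores, only the probability is needed.)
probability (p) and confidence level: the confidence level is 1 minus alpha, so an alpha of .05 goes with a 95% confidence level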
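As a rough sketch of how these terms fit together in R, the lines below look up critical values for the three distributions mentioned so far. The alpha of .05 and the 17 degrees of freedom are illustrative choices, not values from a particular dataset.

```r
alpha <- 0.05   # desired probability level (illustrative)
df    <- 17     # degrees of freedom (illustrative)

# Two-tailed t critical value: alpha is split between the two tails
t_crit <- qt(1 - alpha / 2, df)

# z critical value: only the probability is needed, no degrees of freedom
z_crit <- qnorm(1 - alpha / 2)

# Chi-squared critical value: the whole alpha sits in the upper tail
chi_crit <- qchisq(1 - alpha, df)

c(t = t_crit, z = z_crit, chi_squared = chi_crit)
```

Note that the t and chi-squared lookups both need the degrees of freedom, while the z lookup does not.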

Test Scores and Critical Values

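If the test score is greater than the critical value, reject the null. (For a two-tailed test, compare the absolute value of the test score.)
The critical value is the minimum test score needed to reject the null.
Above that value, the probability of getting a test score this extreme if the null hypothesis were true is equal to or less than the target (alpha), so we reject.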
This all leads to confidence intervals
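The decision rule above can be sketched in a few lines of R. All of the numbers here are hypothetical, chosen only to show the comparison between a test score and its critical value for a one-sample t-test.

```r
# Hypothetical one-sample t-test
x_bar <- 52.6   # sample mean
mu_0  <- 50     # population mean under the null hypothesis
s     <- 5      # sample standard deviation
n     <- 18     # sample size
alpha <- 0.05   # target probability

# Test score: how many standard errors the sample mean is from the null value
t_score <- (x_bar - mu_0) / (s / sqrt(n))

# Critical value for a two-tailed test with n - 1 degrees of freedom
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Reject the null when the test score is beyond the critical value
reject_null <- abs(t_score) > t_crit

c(t_score = t_score, t_crit = t_crit)
reject_null
```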

# Confidence intervals

## What is a confidence interval?

The range where we could expect to find the true value, at the probability level we specify, if we had the full population data instead of just the sample.
The range where we expect to find the true population parameter, given the sample statistic.
If we say the 95% confidence interval of the mean is from 45 to 55, it means we are 95% confident that the true population mean lies somewhere in that range. Strictly speaking, the 95% describes the method: if we drew many samples and built an interval from each one the same way, about 95% of those intervals would contain the true population mean.
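One way to see what the 95% refers to is a small simulation. This is only a sketch: the population mean, standard deviation, sample size, and seed are invented, and the interval is built with a t critical value.

```r
set.seed(3316)       # arbitrary seed so the run is repeatable
true_mean <- 50      # invented population mean
true_sd   <- 5       # invented population standard deviation
n         <- 18      # size of each sample
reps      <- 10000   # number of samples to draw

# For each sample, build a 95% interval and record whether it
# contains the true mean
covered <- replicate(reps, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  (mean(x) - half_width <= true_mean) && (true_mean <= mean(x) + half_width)
})

# Share of intervals that contain the true mean; should be near 0.95
mean(covered)
```

In real research we only get one sample and one interval, so we never learn whether ours is one of the intervals that missed.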

Getting the confidence interval

Confidence intervals are based on the thing we actually measured (the sample statistic), the probability level, and the distribution.
If we know the population parameters (especially the standard deviation), we can create a simple CI using the 68-95-99.7 rule or a more accurate CI using z-scores.
For small samples or an unknown population standard deviation, we use an appropriate distribution, such as Student’s t, to adjust for both of these uncertainties.
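To see where the rule's percentages come from, and why z-scores are more accurate, we can ask R how much of a normal distribution falls within 1, 2, and 3 standard deviations of the mean. The helper name `within_k` is made up for this sketch.

```r
# Share of a normal distribution within k standard deviations of the mean
# (within_k is a made-up helper name)
within_k <- function(k) pnorm(k) - pnorm(-k)

# Roughly 68%, 95%, and 99.7%
round(within_k(1:3), 4)

# The exact multiplier for 95% is a little less than 2
qnorm(0.975)
```

Two standard deviations actually covers slightly more than 95%, which is why the rule is a shortcut and the z-score is the accurate version.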

Using t-scores to find confidence intervals

For small samples or an unknown population standard deviation, we use t-scores to adjust for these uncertainties.

- If we wanted a 95% confidence interval using the 68-95-99.7 rule, we would go about 2 standard errors (standard deviations of the sampling distribution) either side of the statistic we want a CI for.
- With t-scores, we divide the sample standard deviation by the square root of the sample size and multiply by a critical value from the t-table instead of by 2.
- We can get the critical value using the qt() function in R, as in the examples.
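The sketch below shows how the t critical value compares with the z value at a few degrees of freedom. The degrees of freedom listed are arbitrary picks for illustration.

```r
# Arbitrary degrees of freedom, from very small to very large
dfs <- c(5, 17, 30, 100, 1000)

# Two-tailed 95% critical values from the t distribution
t_crit <- qt(0.975, df = dfs)

# The matching z critical value does not depend on sample size
z_crit <- qnorm(0.975)

data.frame(df = dfs, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3))
```

The t value is always larger than the z value, so the interval is wider, and the gap shrinks as the sample grows. That extra width is the adjustment for uncertainty.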

\(CI_\mu\): Confidence interval of the population mean
\(CI_\mu = \bar{x} \pm t \frac{\sigma_{s}}{\sqrt{n}}\)

Where:

\(\bar{x}\) = sample mean
\(\mu\) = population mean
\(t\) = critical value from the t distribution with \(n - 1\) degrees of freedom
\(\sigma_{s}\) = sample standard deviation
\(n\) = sample size
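If it helps, the formula can be wrapped in a small R function. This is a sketch, not a built-in: the name `ci_t` and its arguments are made up, and the numbers in the call are hypothetical.

```r
# ci_t is a made-up helper that applies the formula above
#   x_bar: sample mean
#   s:     sample standard deviation
#   n:     sample size
#   level: confidence level, 0.95 by default
ci_t <- function(x_bar, s, n, level = 0.95) {
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 1)   # two-tailed critical value
  margin <- t_crit * s / sqrt(n)            # t times the standard error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Hypothetical sample: mean 72, standard deviation 8, 25 observations
ci_t(x_bar = 72, s = 8, n = 25)
```

If you have the raw data rather than summary numbers, the built-in `t.test()` function reports the same kind of interval.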

What about the normal distribution?

The 68-95-99.7 rule gives an oversimplified CI.

For normal distributions, the actual formula is:

\(CI_\mu\): Confidence interval of the population mean

\(CI_\mu = \bar{x} \pm z \frac{\sigma}{\sqrt{n}}\)

Where:

\(\bar{x}\) = sample mean
\(z\) = critical value from the standard normal distribution
\(\sigma\) = population standard deviation
\(n\) = sample size

We can get the z-score from a z-table or from the qnorm() function in R.
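As a sketch of how qnorm() feeds the formula, the lines below look up the z-score for three common confidence levels and apply each one. The sample mean, population standard deviation, and sample size are hypothetical.

```r
# Common confidence levels
conf_levels <- c(0.90, 0.95, 0.99)

# Two-tailed z-scores: half of (1 - level) goes in each tail
z <- qnorm(1 - (1 - conf_levels) / 2)

# Hypothetical values for the rest of the formula
x_bar <- 72    # sample mean
sigma <- 8     # population standard deviation
n     <- 400   # sample size

margin <- z * sigma / sqrt(n)

data.frame(
  level = conf_levels,
  z     = round(z, 3),
  lower = x_bar - margin,
  upper = x_bar + margin
)
```

Asking for more confidence means a larger z-score and therefore a wider interval.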

Examples:

We’ll go through a couple of example computations, then I’ll provide you some sample R lab scripts (Quarto files).

Example 1:

Using the t-score:

Sample mean: \(\bar{x}\) = 50
Sample standard deviation: \(\sigma_{s}\) = 5
Sample size: \(n\) = 18
We are looking for a 95% confidence interval.

We need the t-score for a two-tailed test with \(n - 1 = 17\) degrees of freedom, so we put .025 in each tail:

qt(.025,17)

[1] -2.109816

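qt() gives us the lower-tail value, about -2.11. The t distribution is symmetric, so the upper-tail value is +2.11. Then we can fill in the formula: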

\(CI_\mu = \bar{x} \pm t \frac{\sigma_{s}}{\sqrt{n}}\)

50 + c(-2.11, 2.11) * 5 / sqrt(18)

[1] 47.51334 52.48666
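The same computation can be broken into steps, keeping the unrounded critical value instead of 2.11. This is just a sketch of one way to lay it out; the variable names are my own.

```r
x_bar <- 50   # sample mean
s     <- 5    # sample standard deviation
n     <- 18   # sample size

# Step 1: upper-tail critical value (positive), unrounded
t_crit <- qt(0.975, df = n - 1)

# Step 2: standard error of the mean
se <- s / sqrt(n)

# Step 3: margin of error
margin <- t_crit * se

# Step 4: the interval
ci <- x_bar + c(-1, 1) * margin

c(t_crit = t_crit, se = se, margin = margin)
ci
```

The endpoints differ from the rounded version only in the fourth decimal place, so rounding the t-score to 2.11 is fine for most purposes.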

Example 2:

Using the z-score:

Sample mean: 50
Population standard deviation: 5
Sample size: 500
We need a z-score for 95% (2.5% above and 2.5% below):

qnorm(.025)

[1] -1.959964

Then we apply the formula:

\(CI_\mu = \bar{x} \pm z \frac{\sigma}{\sqrt{n}}\)

50 + c(-1.96, 1.96) * 5 / sqrt(500)

[1] 49.56173 50.43827
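This interval is much narrower than the one in the first example, mostly because of the larger sample. As a rough illustration, the sketch below holds the standard deviation at 5 and the z-score at the 95% level, then changes only the sample size. The sample sizes other than 18 and 500 are arbitrary.

```r
sigma <- 5                     # population standard deviation from Example 2
n     <- c(18, 50, 500, 5000)  # 18 and 500 come from the examples; the rest are arbitrary

# Margin of error: z times sigma over the square root of n
margin <- qnorm(0.975) * sigma / sqrt(n)

data.frame(n = n, margin = round(margin, 3))
```

Because n sits under a square root, we need about 100 times as many observations to make the interval 10 times narrower.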

Is this just about means?

NO!

The biggest use I see of confidence intervals is for regression coefficients.

\(y = \alpha + \beta X + \epsilon\)

If we estimate that the \(\beta\) in the equation defining the relationship between our explanatory (x) and dependent (y) variables is 0.9, a 95% confidence interval of 0.88 to 0.92 tells us we can be 95% confident that the true relationship lies between 0.88 and 0.92.
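In R, confidence intervals for regression coefficients come from the built-in `confint()` function. The sketch below uses simulated data, with the true slope set to 0.9 only to echo the example above; the seed, sample size, and noise level are invented.

```r
set.seed(3316)                            # arbitrary seed
n <- 200                                  # invented sample size
x <- rnorm(n)                             # simulated explanatory variable
y <- 2 + 0.9 * x + rnorm(n, sd = 0.5)     # simulated outcome, true beta = 0.9

fit <- lm(y ~ x)

# Estimated slope and its 95% confidence interval
beta_hat <- coef(fit)["x"]
ci <- confint(fit, "x", level = 0.95)

# The same interval by hand: estimate plus or minus t times the standard error
se     <- summary(fit)$coefficients["x", "Std. Error"]
manual <- beta_hat + c(-1, 1) * qt(0.975, df = df.residual(fit)) * se

beta_hat
ci
manual
```

The by-hand version has the same shape as the formula for the mean: an estimate, plus or minus a t critical value times a standard error.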

Even better, we can graph it…

Sample of graphing confidence interval in OLS
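The figure from the original slide is not reproduced here. As a stand-in, here is a minimal base R sketch of one way to draw a regression line with a 95% confidence band. The data are simulated, and every number below is invented for illustration.

```r
set.seed(3316)                           # arbitrary seed
x <- runif(100, min = 0, max = 10)       # simulated explanatory variable
y <- 1 + 0.9 * x + rnorm(100, sd = 2)    # simulated outcome

fit <- lm(y ~ x)

# Predicted values and confidence limits across the range of x
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

# Scatterplot, fitted line (solid), and confidence band (dashed)
plot(x, y, pch = 16, col = "grey50",
     main = "OLS fit with 95% confidence band")
lines(grid$x, band[, "fit"], lwd = 2)
lines(grid$x, band[, "lwr"], lty = 2)
lines(grid$x, band[, "upr"], lty = 2)
```

The band is narrowest near the middle of the data and widens toward the edges, where we have less information about the line.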

Additional Resources for Confidence Intervals

https://www.statisticshowto.com/probability-and-statistics/confidence-interval/#WhatisCI

https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/confidence-level/

https://www.statisticshowto.com/95-percent-confidence-interval/

## Authorship, License, Credits

Author: Tom Hanna
Website: tomhanna.me
License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
