Confidence Intervals

2024-10-31

Confidence Interval (CI)

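A CI is a range of plausible values for an unknown population parameter (here, the mean), computed from sample data. It has a confidence level, which is the proportion of such intervals, each built the same way from a new sample, that would be expected to contain the true parameter value. For example, at a 95% confidence level about 95 out of 100 such intervals would contain the true mean.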
CIs help in decision-making.
They can quantify uncertainty.
They also provide us with a range for estimating population parameters.
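
The meaning of the confidence level can be checked with a small simulation. The sketch below is illustrative only: the population mean and standard deviation (`true_mean`, `true_sd`), the seed, and the number of repetitions are assumed values that do not come from the mtcars data.

```r
# Simulate repeated sampling from a population whose mean is known,
# then count how often the 95% z-interval contains that mean.
set.seed(42)        # arbitrary seed, used only for reproducibility
true_mean <- 20     # hypothetical population mean (assumption)
true_sd   <- 6      # hypothetical population standard deviation (assumption)
n         <- 32     # sample size, chosen to match the 32 rows of mtcars
n_sims    <- 5000   # number of repeated samples

covered <- replicate(n_sims, {
  x      <- rnorm(n, mean = true_mean, sd = true_sd)
  margin <- qnorm(0.975) * sd(x) / sqrt(n)
  # TRUE when the interval [mean - margin, mean + margin] contains true_mean
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

coverage <- mean(covered)  # proportion of intervals that contain true_mean
print(coverage)
```

The printed proportion should be close to 0.95. It is typically slightly below 0.95 because the z critical value is combined with a standard deviation estimated from each sample.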

Sample Data

To demonstrate the CI and other attributes of CIs, we will use the mtcars built-in data set which contains details about different models of cars.
From this data set we will mainly use the miles per gallon data.
The following code selects the ‘mpg’ and ‘cyl’ columns from the data set and prints the first six rows.

mtcars_data <- mtcars[c("mpg", "cyl")]
head(mtcars_data)

##                    mpg cyl
## Mazda RX4         21.0   6
## Mazda RX4 Wag     21.0   6
## Datsun 710        22.8   4
## Hornet 4 Drive    21.4   6
## Hornet Sportabout 18.7   8
## Valiant           18.1   6

Sample Mean and Standard Deviation for mpg

The sample mean is the average of the data values.
The standard deviation measures how far the data values typically lie from the mean.
Both of these values will be used in calculating the Confidence Interval.
The sample mean (\(\bar{x}\)) and sample standard deviation (\(s\)) for the mpg variable are calculated as follows:

sample_mean <- mean(mtcars$mpg)
sample_size <- length(mtcars$mpg)
sample_sd <- sqrt( (sum((mtcars$mpg - sample_mean)^2)) / (sample_size - 1))
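
As a quick sanity check, the manual calculation can be compared with R's built-in `sd()`, which also divides by \(n - 1\). This is a minimal sketch; the names `manual_sd`, `builtin_sd`, and `same_sd` are introduced here for illustration.

```r
# Recompute the sample statistics for mpg
sample_mean <- mean(mtcars$mpg)
sample_size <- length(mtcars$mpg)

# Manual sample standard deviation (n - 1 in the denominator)
manual_sd <- sqrt(sum((mtcars$mpg - sample_mean)^2) / (sample_size - 1))

# Built-in equivalent
builtin_sd <- sd(mtcars$mpg)

# all.equal() compares with a small numerical tolerance
same_sd <- isTRUE(all.equal(manual_sd, builtin_sd))

print(c(mean = sample_mean, sd = builtin_sd, n = sample_size))
print(same_sd)
```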

Calculating Margin of Error

The formula for margin of error is:

\[ \text{Margin of Error} = z_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) \]

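The margin of error is the half-width of the interval: at the chosen confidence level, the sample mean is expected to lie within this distance of the true population mean.
We will use a 95% confidence level, so \(\alpha = 0.05\) and the critical value is \(z_{\alpha/2} \approx 1.96\), the 0.975 quantile of the standard normal distribution.
\(z_{\alpha/2}\) is the critical z-value, \(s\) is the sample standard deviation, and \(n\) is the sample size.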

z_alpha <- qnorm(0.975)

margin_error <- z_alpha * (sample_sd / sqrt(sample_size))
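
To see how the confidence level drives the critical value and the margin of error, the same calculation can be repeated for several levels. This is a sketch only; the levels 0.90 and 0.99 are added for comparison and are not used elsewhere in this document.

```r
# Confidence levels to compare (0.95 is the one used in this document)
conf_levels <- c(0.90, 0.95, 0.99)
alpha       <- 1 - conf_levels

# Two-sided critical z-values: the (1 - alpha/2) quantile of N(0, 1)
z_crit <- qnorm(1 - alpha / 2)

# Standard error of the mean for mpg
std_error <- sd(mtcars$mpg) / sqrt(length(mtcars$mpg))

# Margin of error at each confidence level
margins <- z_crit * std_error

print(data.frame(conf_level   = conf_levels,
                 z            = round(z_crit, 3),
                 margin_error = round(margins, 2)))
```

A higher confidence level gives a larger critical value and therefore a wider interval.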

Calculating the Confidence Interval

The formula for a confidence interval is:

\[ CI = \bar{x} \pm z_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) \]

Here \(\bar{x}\) is the sample mean, \(z_{\alpha/2}\) is the critical z-value, \(s\) is the sample standard deviation, and \(n\) is the sample size.

lower_bound <- sample_mean - margin_error
upper_bound <- sample_mean + margin_error

cat("Confidence Interval for mpg: [", round(lower_bound, 2), ", ", 
    round(upper_bound, 2), "]\n")

## Confidence Interval for mpg: [ 18 ,  22.18 ]
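
Because the population standard deviation is estimated from only 32 observations, a t-based interval is a common alternative to the z-based one. The sketch below swaps in the t critical value from `qt()` and cross-checks it against `t.test()`; it is offered as a comparison, not as the method used above.

```r
x         <- mtcars$mpg
n         <- length(x)
std_error <- sd(x) / sqrt(n)

# z-based interval, as computed in this document
z_ci <- mean(x) + c(-1, 1) * qnorm(0.975) * std_error

# t-based interval with n - 1 degrees of freedom
t_ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * std_error

# The same t-based interval as reported by t.test()
t_test_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(round(rbind(z = z_ci, t = t_ci), 2))
```

The t-based interval is slightly wider than the z-based one, reflecting the extra uncertainty from estimating the standard deviation.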

Histogram for MPG

The red line marks the sample mean of mpg. The blue lines mark the lower and upper bounds of the 95% CI.
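
A plot of this kind could be produced along the following lines. This is a minimal sketch using base R graphics; the original plotting code is not shown here, so the title, axis label, and styling are assumptions.

```r
mpg <- mtcars$mpg

# Sample mean and 95% z-based interval bounds
ci_mean   <- mean(mpg)
margin    <- qnorm(0.975) * sd(mpg) / sqrt(length(mpg))
ci_bounds <- c(ci_mean - margin, ci_mean + margin)

# Histogram of mpg (title and labels are illustrative choices)
hist(mpg,
     main   = "Histogram of MPG",
     xlab   = "Miles per gallon",
     col    = "grey90",
     border = "white")

# Red line: sample mean; blue dashed lines: CI bounds
abline(v = ci_mean,   col = "red",  lwd = 2)
abline(v = ci_bounds, col = "blue", lwd = 2, lty = 2)
```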

Scatter Plot of MPG and weight for cars

The line shows how the average miles per gallon changes with car weight: in this data set, heavier cars tend to get fewer miles per gallon.
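
One way to draw such a plot is sketched below with base R graphics. It assumes the line is a least-squares fit of mpg on weight (`wt`, recorded in units of 1000 lbs in mtcars); the original plotting code is not shown here, so the choice of fit and the styling are assumptions.

```r
# Least-squares fit of mpg on weight (assumed form of the trend line)
fit <- lm(mpg ~ wt, data = mtcars)

# Scatter plot of mpg against weight
plot(mtcars$wt, mtcars$mpg,
     main = "MPG vs. weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     pch  = 19)

# Overlay the fitted line
abline(fit, col = "red", lwd = 2)

# Intercept and slope of the fitted line
print(coef(fit))
```

A negative slope for `wt` indicates that the fitted average mpg falls as weight increases.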