2024-10-31

Confidence Interval (CI)

  • A CI is a range of values, computed from sample data, that is likely to contain the true value of a population parameter such as the mean. It has a confidence level (for example, 95%), which is the approximate proportion of such intervals, over repeated sampling, that would contain the true parameter.
  • CIs help in decision-making.
  • They can quantify uncertainty.
  • They also provide us with a range for estimating population parameters.
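
To make the meaning of the confidence level concrete, here is a small simulation sketch. The population mean, standard deviation, sample size and number of repetitions are illustrative assumptions, not values from the mtcars analysis. It draws many samples, builds a z-based 95% interval from each, and counts how often the interval contains the true mean.

```r
# Illustrative simulation: what does "95% confidence" mean?
# All constants below are assumed values chosen for the demonstration.
set.seed(42)
true_mean <- 20    # assumed population mean
true_sd   <- 6     # assumed population standard deviation
n         <- 32    # observations per simulated sample
n_reps    <- 2000  # number of simulated samples

covered <- replicate(n_reps, {
  x  <- rnorm(n, mean = true_mean, sd = true_sd)
  me <- qnorm(0.975) * sd(x) / sqrt(n)   # z-based margin of error
  (mean(x) - me <= true_mean) && (true_mean <= mean(x) + me)
})

# Proportion of simulated intervals that contain the true mean
coverage <- mean(covered)
coverage
```

The proportion should land close to 0.95. It is typically slightly lower, because the interval uses a z critical value together with an estimated standard deviation.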

Sample Data

  • To demonstrate how a CI is calculated, we will use R's built-in mtcars data set, which contains details about different models of cars.
  • From this data set we will mainly use the miles per gallon (mpg) data.
  • The following code selects the ‘mpg’ and ‘cyl’ columns and prints the first six rows.
mtcars_data <- mtcars[c("mpg", "cyl")]
head(mtcars_data)
##                    mpg cyl
## Mazda RX4         21.0   6
## Mazda RX4 Wag     21.0   6
## Datsun 710        22.8   4
## Hornet 4 Drive    21.4   6
## Hornet Sportabout 18.7   8
## Valiant           18.1   6

Sample Mean and Standard Deviation for mpg

  • The sample mean is the average of the data values.
  • The standard deviation measures how far the data values typically lie from the mean.
  • Both of these values will be used in calculating the Confidence Interval.
  • The sample mean (\(\bar{x}\)) and sample standard deviation (\(s\)) for the mpg variable are calculated as follows:
# Sample mean and number of observations
sample_mean <- mean(mtcars$mpg)
sample_size <- length(mtcars$mpg)
# Sample standard deviation, using the n - 1 denominator
sample_sd <- sqrt( (sum((mtcars$mpg - sample_mean)^2)) / (sample_size - 1))
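
As a quick sanity check, the manual calculation can be compared with R's built-in `sd()` function, which also uses the \(n - 1\) denominator. The variable names in this sketch are chosen for illustration and are independent of the code above.

```r
# Recompute the sample standard deviation by hand and compare with sd()
mpg <- mtcars$mpg
manual_sd  <- sqrt(sum((mpg - mean(mpg))^2) / (length(mpg) - 1))
builtin_sd <- sd(mpg)

# TRUE when the two values agree up to floating-point tolerance
isTRUE(all.equal(manual_sd, builtin_sd))
```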

Calculating Margin of Error

The formula for margin of error is:

\[ \text{Margin of Error} = z_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) \]

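  • This value is the largest distance we expect between the sample mean and the true population mean at the chosen confidence level.
  • We will use a 95% confidence level, so \(\alpha = 0.05\) and the critical z-value is the 0.975 quantile of the standard normal distribution (about 1.96).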
  • \(z_{\alpha/2}\) is the critical z-value, \(s\) is the sample standard deviation, \(n\) is the sample size.
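# Critical z-value for a 95% confidence level (alpha = 0.05, so 1 - alpha/2 = 0.975)
z_alpha <- qnorm(0.975)

# Margin of error = critical value * standard error of the mean
margin_error <- z_alpha * (sample_sd / sqrt(sample_size))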
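
The link between confidence level and critical value can be sketched as follows. The extra confidence levels (90% and 99%) are illustrative additions and are not used elsewhere in this analysis. A higher confidence level gives a larger critical value and therefore a wider margin of error.

```r
# Critical z-values and margins of error for several confidence levels
conf_levels <- c(0.90, 0.95, 0.99)   # illustrative choices
alpha  <- 1 - conf_levels
z_crit <- qnorm(1 - alpha / 2)       # two-sided critical values

# Standard error of the mean for mpg
se <- sd(mtcars$mpg) / sqrt(length(mtcars$mpg))

data.frame(
  conf_level   = conf_levels,
  z            = round(z_crit, 3),
  margin_error = round(z_crit * se, 3)
)
```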

Calculating the Confidence Interval

The formula for a confidence interval is:

\[ CI = \bar{x} \pm z_{\alpha/2} \left( \frac{s}{\sqrt{n}} \right) \]

  • \(\bar{x}\) is the sample mean, \(z_{\alpha/2}\) is the critical z-value, \(s\) is the sample standard deviation, \(n\) is the sample size.
lower_bound <- sample_mean - margin_error
upper_bound <- sample_mean + margin_error

cat("Confidence Interval for mpg: [", round(lower_bound, 2), ", ", 
    round(upper_bound, 2), "]\n")
## Confidence Interval for mpg: [ 18 ,  22.18 ]
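
For comparison, with a sample this small (n = 32) and an estimated standard deviation, a t-based interval is a common alternative to the z-based one. The sketch below is not part of the calculation above. It assumes the same 95% level and checks the manual result against `t.test()`.

```r
# t-based 95% confidence interval for the mean of mpg
mpg <- mtcars$mpg
n   <- length(mpg)

t_crit     <- qt(0.975, df = n - 1)            # critical t-value with n - 1 degrees of freedom
t_margin   <- t_crit * sd(mpg) / sqrt(n)
t_interval <- c(mean(mpg) - t_margin, mean(mpg) + t_margin)

# t.test() computes the same interval
builtin_interval <- as.numeric(t.test(mpg, conf.level = 0.95)$conf.int)

round(t_interval, 2)
```

The t-based interval is slightly wider than the z-based one, because the t critical value exceeds the z critical value at the same confidence level.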

Histogram for MPG

  • The red line represents the sample mean of the mpg. The blue lines are the upper and lower bounds of the CI.
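
The plotting code is not shown here. A minimal base-R sketch that would produce a histogram like the one described is given below. The titles, colours and line styles are assumptions, and the original figure may have been drawn differently.

```r
# Histogram of mpg with the sample mean (red) and z-based 95% CI bounds (blue)
mpg <- mtcars$mpg
me  <- qnorm(0.975) * sd(mpg) / sqrt(length(mpg))   # margin of error

h <- hist(mpg,
          main = "Histogram of MPG",
          xlab = "Miles per gallon",
          col = "grey85", border = "white")

abline(v = mean(mpg), col = "red", lwd = 2)                            # sample mean
abline(v = mean(mpg) + c(-1, 1) * me, col = "blue", lwd = 2, lty = 2)  # CI bounds
```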

Scatter Plot of MPG and weight for cars

  • The line shows how miles per gallon changes, on average, across cars of different weights.
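
The code for this plot is also not shown. One way to sketch it in base R is below, assuming the line is a simple linear regression of mpg on weight (the `wt` column, measured in 1000 lbs). The original figure may have used a different kind of trend line.

```r
# Scatter plot of mpg against weight with a fitted linear trend line
fit <- lm(mpg ~ wt, data = mtcars)   # assumed model: simple linear regression

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "MPG vs. weight",
     pch = 19)
abline(fit, col = "red", lwd = 2)    # fitted regression line

coef(fit)   # intercept and slope of the fitted line
```

Under this model the slope is negative, so heavier cars tend to have lower mpg.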

Scatter Plot to show distribution of MPG