2025-02-27

Interval Estimation

  • Using data to estimate an interval of potential values for a specific population parameter
  • There are many different types of Interval Estimations in Statistics
    • The most common type is Confidence Intervals
    • Other types include Prediction Intervals, Tolerance Intervals, and more
  • The counterpart of Interval Estimation is Point Estimation, which estimates the parameter with a single value instead of a range of values.

What are Confidence Intervals?

  • A Confidence Interval is a range of values, computed from a sample, that estimates a population parameter at a given confidence level
  • Examples of such population parameters include the mean, standard deviation, and proportion of a population
  • Formula for calculating the confidence interval of the mean of a population:

\[\text{Confidence Interval} = \bar{x} \pm z \frac{s}{\sqrt{n}}\]

In this formula, \(\bar{x}\) is the mean of the sample, \(z\) is the z-score for the given confidence level, \(s\) is the standard deviation of the sample, and \(n\) is the size of the sample.
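
As a minimal sketch of this formula in R, the helper below computes the interval directly from a raw numeric sample. The name `meanConfidenceInterval` and the example grades are hypothetical and not part of the slides, and the sketch assumes the normal (z-score) approximation is adequate for the sample at hand.

```r
# Hypothetical helper: z-based confidence interval for a population mean.
meanConfidenceInterval = function(sampleValues, confidenceLevel = 0.95) {
  n = length(sampleValues)                   # sample size
  xBar = mean(sampleValues)                  # sample mean
  s = sd(sampleValues)                       # sample standard deviation
  z = qnorm(1 - (1 - confidenceLevel) / 2)   # two-sided z-score
  marginOfError = z * s / sqrt(n)
  c(lower = xBar - marginOfError, upper = xBar + marginOfError)
}

# Made-up exam grades, for illustration only
exampleGrades = c(82, 88, 91, 79, 85, 87, 90, 84)
exampleInterval = meanConfidenceInterval(exampleGrades, confidenceLevel = 0.95)
print(exampleInterval)
```

Raising `confidenceLevel` widens the interval, because a larger z-score multiplies the same standard error.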

Example of Calculating the Confidence Interval of the Population Mean:

Assume we want to find the 95% confidence interval of the mean exam grade of a population of students. A sample of 12 students is selected, with a sample mean of 85.2 and a sample standard deviation of 4.5.

For this example, let us assign the following values: \(\bar{x} = 85.2, s = 4.5, n = 12\)

The z-score for a 95% confidence level is \(z \approx 1.96\).

Using the formula, we get \(\bar{x} \pm z \frac{s}{\sqrt{n}} = 85.2 \pm 1.96 \frac{4.5}{\sqrt{12}} \approx 85.2 \pm 2.546\).

Thus, the lower bound is 82.654 and the upper bound is 87.746.
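
The z-score for a two-sided interval comes from the standard normal quantile function, `qnorm` in base R. The sketch below lists the z-scores for three common confidence levels and then evaluates the interval for the sample values of this example at the 95% level. The variable names are illustrative only.

```r
# z-scores for common two-sided confidence levels
confidenceLevels = c(0.90, 0.95, 0.99)
zScores = qnorm(1 - (1 - confidenceLevels) / 2)
names(zScores) = c("90%", "95%", "99%")
print(round(zScores, 3))

# Sample values from the exam-grade example
xBar = 85.2
s = 4.5
n = 12

# Margin of error and interval at the 95% confidence level
marginOfError = zScores[["95%"]] * s / sqrt(n)
workedInterval = c(lower = xBar - marginOfError, upper = xBar + marginOfError)
print(round(workedInterval, 3))
```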

Confidence Intervals of the Population Means of Movie Ratings By Year:

Let’s assume we want to find the 99% confidence interval of the mean movie rating for each year from 1990 to 2000, based on a random sample of movies per year.

First, we will find the ratings of 50 random movies per year, from 1990 to 2000.

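library(ggplot2movies)  # provides the movies data set
set.seed(123)           # make the random samples reproducible
sampleRatings = c()
for(year in 1990:2000) {
  # Select the rating column by name so the result is always a plain numeric
  # vector (indexing column 5 of a tibble would return a one-column table)
  movieRatingsForSpecificYear = movies$rating[movies$year == year]
  # Draw 50 ratings without replacement and append them as a new column
  sampleRatings = cbind(sampleRatings, 
                sample(movieRatingsForSpecificYear, 50))
}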

Calculating the Confidence Intervals of the Population Means of Movie Ratings By Year:

Next, we will use the formula to calculate the confidence intervals of the means of the population.

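# Each column of sampleRatings holds the 50 sampled ratings of one year
sampleMeansOfPopulation = apply(sampleRatings, 2, mean)
sampleStandardDevs = apply(sampleRatings, 2, sd)
sampleSize = nrow(sampleRatings)
# Two-sided z-score for a 99% confidence level
zScoreForConfidenceLevel = qnorm(1 - ((1 - 0.99) / 2))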

lowerBoundOfConfidenceIntervals = sampleMeansOfPopulation - 
  zScoreForConfidenceLevel * (sampleStandardDevs / sqrt(sampleSize))

upperBoundOfConfidenceIntervals = sampleMeansOfPopulation + 
  zScoreForConfidenceLevel * (sampleStandardDevs / sqrt(sampleSize))
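
Before plotting, it can help to inspect the bounds side by side. The sketch below collects means and bounds into one data frame. To stay self-contained it uses a simulated 50 x 11 matrix as a stand-in for the sampled ratings; the simulated values are not real movie data, and the names `simulatedRatings` and `confidenceIntervalTable` are hypothetical.

```r
set.seed(123)
years = 1990:2000

# Simulated stand-in for the matrix of sampled ratings (NOT real movie data)
simulatedRatings = matrix(runif(50 * length(years), min = 1, max = 10),
                          nrow = 50, ncol = length(years))

# Same formula as above: mean plus or minus z times the standard error
z = qnorm(1 - (1 - 0.99) / 2)
means = colMeans(simulatedRatings)
standardErrors = apply(simulatedRatings, 2, sd) / sqrt(nrow(simulatedRatings))

# One row per year, with the lower bound, sample mean, and upper bound
confidenceIntervalTable = data.frame(
  year = years,
  lower = means - z * standardErrors,
  mean = means,
  upper = means + z * standardErrors
)
print(round(confidenceIntervalTable, 3))
```

With the real data, the same table could be built from `sampleMeansOfPopulation`, `lowerBoundOfConfidenceIntervals`, and `upperBoundOfConfidenceIntervals`.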

Setting up the Plot of Confidence Intervals using plotly

Next, let’s set up our plot using the plotly library.

library(plotly)
plotOfMovieRatingsCI = plot_ly(x = 1990:2000, y = sampleMeansOfPopulation, 
            type = "scatter", mode = "lines+markers", 
            marker = list(color = "green"), line = list(color = "green"), 
            name = "Sample Means") %>% 
  add_trace(x = 1990:2000, y = upperBoundOfConfidenceIntervals, 
            marker = list(color = "red", symbol = "square"), 
            line = list(color = "red"), name = "Upper Bounds") %>% 
  add_trace(x = 1990:2000, y = lowerBoundOfConfidenceIntervals, 
            marker = list(color = "red", symbol = "x"), 
            line = list(color = "red"), name = "Lower Bounds")  %>%
  layout(title = "Confidence Intervals of Mean Movie Rating By Year", 
         xaxis = list(title = "Years"), yaxis = list(title = "Ratings"))

Plotting the Confidence Intervals

plotOfMovieRatingsCI

Plotting 90% CI of Mean Sepal Length By Species in iris, using ggplot
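
One way such a plot could be produced is sketched below. It is an assumption about the approach, not the code behind the original slide: it applies the z-based formula from earlier to each species in the built-in `iris` data and draws the intervals with `geom_errorbar` from ggplot2. The names `sepalSummary` and `sepalPlot` are hypothetical.

```r
library(ggplot2)

# Two-sided z-score for a 90% confidence level
z90 = qnorm(1 - (1 - 0.90) / 2)

# Mean and interval bounds of Sepal.Length for each species
sepalSummary = do.call(rbind, lapply(
  split(iris$Sepal.Length, iris$Species),
  function(values) {
    standardError = sd(values) / sqrt(length(values))
    data.frame(mean = mean(values),
               lower = mean(values) - z90 * standardError,
               upper = mean(values) + z90 * standardError)
  }))
sepalSummary$species = rownames(sepalSummary)

# Points mark the sample means; error bars span the 90% intervals
sepalPlot = ggplot(sepalSummary, aes(x = species, y = mean)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(title = "90% CI of Mean Sepal Length By Species",
       x = "Species", y = "Sepal Length")
print(sepalPlot)
```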

Plotting 95% CI Plot of Mean Weight By Transmission in mtcars, using ggplot