A study investigated the distribution of speeds of passenger vehicles traveling on the Interstate 5 Freeway (I-5) in California. The researchers found that the distribution is nearly normal with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.

Packages

We’ll use the tidyverse package for data wrangling and visualisation.

You can load the package by running the following code:

library(tidyverse) 

Exercise 1

  1. What percent of passenger vehicles travel slower than 80 miles/hour?

Solution: The percent of passenger vehicles that travel slower than 80 miles/hour is calculated by

pnorm(80, mean = 72.6, sd = 4.78)
## [1] 0.939203

Therefore, approximately 93.92% of passenger vehicles on I-5 travel slower than 80 mph.
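
As a cross-check, the same probability can be obtained by first standardising 80 mph to a Z-score and then using the standard normal distribution. This is only a sketch; the names z_80 and p_below_80 are illustrative and not part of the original solution.

```r
# Standardise 80 mph: number of standard deviations above the mean
z_80 <- (80 - 72.6) / 4.78

# Area to the left of z_80 under the standard normal curve
p_below_80 <- pnorm(z_80)
p_below_80

# TRUE if both routes give the same probability
all.equal(p_below_80, pnorm(80, mean = 72.6, sd = 4.78))
```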

Exercise 2

  1. What percent of passenger vehicles travel between 60 and 80 miles/hour?

Solution: The percent of passenger vehicles that travel between 60 and 80 miles/hour is calculated by

pnorm(80, mean = 72.6, sd = 4.78) - pnorm(60, mean = 72.6, sd = 4.78)
## [1] 0.9350083

Therefore, approximately 93.5% of passenger vehicles on I-5 travel between 60 mph and 80 mph.
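
The difference of two pnorm() values is the area under the normal density between 60 and 80 mph. A minimal sketch of that interpretation, using numerical integration with base R's integrate(), could look as follows; speed_density and area_60_80 are hypothetical names introduced here.

```r
# Density of the assumed speed distribution N(72.6, 4.78)
speed_density <- function(x) dnorm(x, mean = 72.6, sd = 4.78)

# Numerically integrate the density from 60 to 80 mph
area_60_80 <- integrate(speed_density, lower = 60, upper = 80)

# Should be very close to the pnorm() difference computed above
area_60_80$value
```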

Exercise 3

  1. How fast do the fastest 5% of passenger vehicles travel?

Solution: The speed exceeded by the fastest 5% of passenger vehicles is the 95th percentile of the distribution, which is calculated by

qnorm(0.95, mean = 72.6, sd = 4.78)
## [1] 80.4624

Therefore, the fastest 5% of passenger vehicles on I-5 travel at approximately 80.46 mph or more.
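
An equivalent way to state the problem is to ask directly for the speed with 5% of the distribution above it, using the lower.tail = FALSE argument of qnorm(). The sketch below also feeds the cutoff back into pnorm() as a sanity check; cutoff and share_above are illustrative names.

```r
# Speed that is exceeded by 5% of vehicles (upper-tail quantile)
cutoff <- qnorm(0.05, mean = 72.6, sd = 4.78, lower.tail = FALSE)
cutoff

# Round trip: share of vehicles faster than the cutoff, expected to be 0.05
share_above <- pnorm(cutoff, mean = 72.6, sd = 4.78, lower.tail = FALSE)
share_above
```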

Exercise 4

  1. The speed limit on this stretch of the I-5 is 70 miles/hour. Compute the percentage of the passenger vehicles traveling above the speed limit on this stretch of the I-5.

Solution: The percentage of passenger vehicles that travel above the speed limit on this stretch of the I-5 is calculated by

1 - pnorm(70, mean = 72.6, sd = 4.78)
## [1] 0.7067562

Therefore, approximately 70.68% of passenger vehicles on I-5 travel above the speed limit.
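
The result can also be checked by simulation. The following sketch draws a large number of speeds from the assumed normal distribution and counts the share above 70 mph; the seed, the sample size, and the variable names are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# One million simulated speeds from N(72.6, 4.78)
simulated_speeds <- rnorm(1e6, mean = 72.6, sd = 4.78)

# Share of simulated vehicles above the 70 mph limit;
# should be close to 1 - pnorm(70, mean = 72.6, sd = 4.78)
share_speeding <- mean(simulated_speeds > 70)
share_speeding
```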

Exercise 5

  1. A highway patrol officer is hidden on the side of the freeway. What is the probability that 5 cars pass and none are speeding? Assume that the speeds of the cars are independent of each other.

Solution: If 5 cars are checked by the highway patrol officer on the side of the freeway, the probability that none are speeding is calculated by

z <- (70 - 72.6) / 4.78              # z-score of the 70 mph speed limit
pns <- pnorm(z, mean = 0, sd = 1)    # probability that a passing car is not speeding
pns^5                                # probability that 5 independent cars are all not speeding
## [1] 0.002168423

Therefore, the probability that 5 cars pass and none are speeding is approximately 0.22%.
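
Because the cars are assumed independent, the number of speeding cars among 5 follows a binomial distribution, so the same probability is the binomial probability of zero successes. A minimal sketch, with p_speeding and p_none_speeding as illustrative names:

```r
# Probability that a single passing car exceeds 70 mph
p_speeding <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)

# Probability of 0 speeding cars out of 5 independent cars
p_none_speeding <- dbinom(0, size = 5, prob = p_speeding)
p_none_speeding

# Same value as raising the non-speeding probability to the 5th power
all.equal(p_none_speeding, (1 - p_speeding)^5)
```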

Exercise 6

  1. On average, how many cars would the highway patrol officer expect to watch until the first car that is speeding? What is the standard deviation of the number of cars he would expect to watch?

Solution: The number of cars the highway patrol officer watches up to and including the first speeding car follows a geometric distribution with success probability p, the probability that a passing car is speeding. Its expected value (mean), 1/p, and its standard deviation, sqrt((1 - p)/p^2), are calculated by

p <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)
# probability of success: observing a passing car that is speeding

expected_cars <- 1 / p
expected_cars # expected value (mean): number of cars passing until one is speeding
## [1] 1.414915

sd_cars <- sqrt((1 - p) / p^2)
sd_cars # standard deviation of number of cars passing until one is speeding
## [1] 0.7662046

Therefore, on average, the highway patrol officer would expect to watch approximately 1.41 cars until the first car that is speeding. The standard deviation of the number of cars he would expect to watch is approximately 0.77.
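
The closed-form mean and standard deviation can be checked against the geometric probabilities themselves. Note that R's dgeom() counts the failures before the first success, so the number of cars watched is that count plus one. The sketch below truncates the infinite sum at 200 failures, an arbitrary cutoff beyond which the probabilities are negligible here; all names are illustrative.

```r
# Probability that a single passing car is speeding
p_speeding <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)

# dgeom() gives P(k non-speeding cars before the first speeding car)
failures <- 0:200                       # truncation point is an assumption
prob_failures <- dgeom(failures, prob = p_speeding)

# Cars watched = non-speeding cars + the first speeding car
cars_watched <- failures + 1

# Mean and standard deviation computed directly from the distribution
expected_cars <- sum(cars_watched * prob_failures)
sd_cars <- sqrt(sum((cars_watched - expected_cars)^2 * prob_failures))

expected_cars  # should match 1 / p_speeding
sd_cars        # should match sqrt((1 - p_speeding) / p_speeding^2)
```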

Exercise 7

Start by reading in the data:

speed_data <- read_csv("speed_data.csv")

  1. Compute the empirical mean and standard deviation of the variable speed. Based on these two values, decide whether the sample could have been taken on the Interstate 5, that is, whether the sample distribution could be nearly normal with a mean of 72.6 and a standard deviation of 4.78.

Solution: The empirical mean and standard deviation of the variable speed, computed below, both differ from the values of the original distribution (72.6 and 4.78), which suggests that this sample could not have been taken on the Interstate 5.

speed_data |>
  summarise(
    mean_speed = mean(speed),
    sd_speed = sd(speed)
  )
## # A tibble: 1 × 2
##   mean_speed sd_speed
##        <dbl>    <dbl>
## 1       73.1     5.12

Now create a normal probability plot for the variable speed. Use the normal probability plot to decide again whether the sample distribution is nearly normal. Justify your answer. If you decide that the sample distribution is not nearly normal, describe the shape of the distribution.

Solution: The following normal probability plot and histogram suggest that the sample is not nearly normally distributed, as there are too many extreme outliers. The plot shows the pattern of a long-tailed distribution (tails heavier than those of the normal distribution): the points start below the line, bend to follow it, and end above it.

# normal probability plot
ggplot(speed_data, aes(sample = speed)) +
  geom_qq() +
  geom_qq_line()

# histogram with the N(72.6, 4.78) density overlaid
ggplot(speed_data, aes(x = speed)) +
  geom_histogram(
    aes(y = after_stat(density)),
    bins = 20,
    colour = "brown"
  ) +
  geom_function(
    fun = dnorm,
    args = list(mean = 72.6, sd = 4.78),
    colour = "purple",
    linewidth = 1.5
  )
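
To illustrate the long-tailed pattern described in the solution (points below the line on the left, above it on the right), the following sketch reproduces it on simulated data. It does not use speed_data: the sample is drawn from a t distribution with 3 degrees of freedom, a hypothetical stand-in for a long-tailed distribution, and the reference line is drawn through the first and third quartiles, which is assumed here to mirror the default of geom_qq_line().

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical long-tailed sample: t distribution with 3 degrees of freedom
heavy_tailed <- rt(1000, df = 3)

# Coordinates of the normal probability plot, without drawing it
qq <- qqnorm(heavy_tailed, plot.it = FALSE)

# Reference line through the first and third quartiles
probs <- c(0.25, 0.75)
slope <- unname(diff(quantile(heavy_tailed, probs)) / diff(qnorm(probs)))
intercept <- unname(quantile(heavy_tailed, probs[1])) - slope * qnorm(probs[1])

# Vertical distance of each point from the reference line
gap <- qq$y - (intercept + slope * qq$x)

gap[which.min(qq$x)]  # negative: the leftmost point lies below the line
gap[which.max(qq$x)]  # positive: the rightmost point lies above the line
```

The signs of the two gaps correspond to the "starts below, ends above" shape that indicates tails heavier than those of a normal distribution.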