A study investigated the distribution of passenger vehicle speeds on the Interstate 5 Freeway (I-5) in California. It found that the distribution is nearly normal, with a mean of 72.6 miles/hour and a standard deviation of 4.78 miles/hour.
We’ll use the tidyverse package for data wrangling and visualisation.
You can load the package by running the following code:
library(tidyverse)
Solution: The percent of passenger vehicles that travel slower than 80 miles/hour is calculated by
pnorm(80, mean = 72.6, sd = 4.78)
## [1] 0.939203
Therefore, approximately 93.92% of passenger vehicles on I-5 travel slower than 80 mph.
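The `pnorm()` call above combines two steps. First it standardises the cutoff with z = (x − μ)/σ. Then it looks up the standard normal CDF at z. The sketch below does the two steps separately. The variable names `mu`, `sigma` and `z_80` are only for illustration.

```r
mu <- 72.6      # mean speed on I-5 (mph)
sigma <- 4.78   # standard deviation of speed (mph)

# Step 1: standardise the 80 mph cutoff
z_80 <- (80 - mu) / sigma
z_80            # about 1.55

# Step 2: standard normal CDF at z; matches pnorm(80, mean = mu, sd = sigma)
pnorm(z_80)
```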
Solution: The percent of passenger vehicles that travel between 60 and 80 miles/hour is calculated by
pnorm(80, mean = 72.6, sd = 4.78) - pnorm(60, mean = 72.6, sd = 4.78)
## [1] 0.9350083
Therefore, approximately 93.5% of passenger vehicles on I-5 travel between 60 mph and 80 mph.
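The same difference of two CDF values can be written as a single expression. `pnorm()` accepts a vector of cutoffs, and `diff()` subtracts the first result from the second. This is only an alternative way to write the line above, not a different method.

```r
# CDF at both cutoffs at once: c(F(60), F(80))
cdf_vals <- pnorm(c(60, 80), mean = 72.6, sd = 4.78)

# diff() returns F(80) - F(60), the probability of a speed between 60 and 80 mph
between_60_80 <- diff(cdf_vals)
between_60_80
```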
Solution: The speed at or above which the fastest 5% of passenger vehicles travel is the 95th percentile of the distribution. It is calculated by
qnorm(0.95, mean = 72.6, sd = 4.78)
## [1] 80.4624
Therefore, the fastest 5% of passenger vehicles on I-5 travel at approximately 80.46 mph or more.
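As a check, the 95th percentile can be rebuilt from the standard normal quantile using x = μ + zσ. Feeding the result back into `pnorm()` should return 0.95. This sketch only re-derives the `qnorm()` result above.

```r
z_95 <- qnorm(0.95)            # standard normal 95th percentile, about 1.645
cutoff <- 72.6 + z_95 * 4.78   # back-transform: x = mu + z * sigma
cutoff

# Round trip: the CDF at the cutoff recovers the 0.95 we started from
pnorm(cutoff, mean = 72.6, sd = 4.78)
```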
Solution: The percentage of passenger vehicles that travel above the speed limit of 70 miles/hour on this stretch of I-5 is calculated by
1 - pnorm(70, mean = 72.6, sd = 4.78)
## [1] 0.7067562
Therefore, approximately 70.68% of passenger vehicles on I-5 travel above the speed limit.
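You can also request the upper tail directly with `lower.tail = FALSE`, the form used below when computing `p` for the geometric distribution. It gives the same value as `1 - pnorm()` without the explicit subtraction.

```r
# P(X > 70) taken straight from the upper tail of N(72.6, 4.78^2)
p_speeding <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)
p_speeding
```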
Solution: If 5 cars are checked by the highway patrol officer on the side of the freeway, the probability that none are speeding is calculated by
z <- (70 - 72.6)/4.78 # z-score of the 70 mph speed limit
pns <- pnorm(z, mean = 0, sd = 1) # probability that a passing car is not speeding
pns^5 # probability that none of 5 independent cars is speeding
## [1] 0.002168423
Therefore, the probability that none of the 5 cars is speeding is approximately 0.22%.
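The same answer follows from a binomial model. If the 5 cars are independent, which is the assumption already behind `pns^5`, then the number of speeding cars among them follows a Binomial(5, p) distribution. The probability of zero speeding cars is then `dbinom(0, 5, p)`.

```r
# Probability that a single passing car is speeding
p_speeding <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)

# Number of speeding cars among 5 independent cars ~ Binomial(5, p_speeding)
p_none <- dbinom(0, size = 5, prob = p_speeding)  # P(no speeding cars)
p_none
```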
Solution: We want the number of cars the highway patrol officer would expect to watch until the first speeding car, and the standard deviation of that number. These are the mean and standard deviation of a geometric distribution whose success probability p is the probability that a passing car is speeding. They are calculated as μ = 1/p and σ = √((1 − p)/p²):
p <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)
# probability of success: a passing car is speeding
geom_mean <- 1/p # named geom_mean to avoid masking base R's mean()
geom_mean # expected number of cars watched, up to and including the first speeding car
## [1] 1.414915
geom_sd <- sqrt((1 - p)/p^2) # named geom_sd to avoid masking base R's sd()
geom_sd # standard deviation of the number of cars watched
## [1] 0.7662046
Therefore, the highway patrol officer would expect to watch approximately 1.41 cars on average, counting the first speeding car itself. The standard deviation of this number is approximately 0.77 cars.
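A quick simulation can sanity-check the geometric formulas. Note that R's `rgeom()` counts the failures *before* the first success, so the sketch adds 1 to include the speeding car itself. The seed and the number of draws (100,000) are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible
p <- pnorm(70, mean = 72.6, sd = 4.78, lower.tail = FALSE)

# rgeom() returns failures before the first success; +1 counts the speeding car too
cars_watched <- rgeom(100000, prob = p) + 1

mean(cars_watched)  # should be close to 1 / p
sd(cars_watched)    # should be close to sqrt((1 - p) / p^2)
```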
Start by reading in the data:
speed_data <- read_csv("speed_data.csv")
Solution: The empirical mean (73.1 mph) and standard deviation (5.12 mph) of the variable speed are close to the values reported for I-5 (72.6 mph and 4.78 mph). Judged by these summary statistics alone, the sample could plausibly have been taken on I-5.
speed_data |>
  summarise(
    mean_speed = mean(speed),
    sd_speed = sd(speed)
  )
## # A tibble: 1 × 2
## mean_speed sd_speed
## <dbl> <dbl>
## 1 73.1 5.12
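One way to judge how large these differences are is to express them relative to the I-5 values. This sketch uses the rounded summaries printed above (73.1 and 5.12), so the results are approximate. It also ignores sample size, which is not shown here.

```r
# Rounded sample summaries, as printed above
sample_mean <- 73.1
sample_sd <- 5.12

# Reference values reported for I-5
ref_mean <- 72.6
ref_sd <- 4.78

mean_shift <- (sample_mean - ref_mean) / ref_sd  # mean difference in I-5 SDs
sd_ratio <- sample_sd / ref_sd                   # ratio of standard deviations
mean_shift
sd_ratio
```

The sample mean sits about 0.1 I-5 standard deviations above the reference mean. The sample standard deviation is about 7% larger than the reference value.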
Now create a normal probability plot for the variable speed. Use the normal probability plot to decide again whether the sample distribution is nearly normal. Justify your answer. If you decide that the sample distribution is not nearly normal, describe the shape of the distribution.
Solution: The following normal probability plot and histogram suggest that the sample is not nearly normally distributed, because there are too many extreme values (outliers). The distribution is heavy-tailed, meaning its tails are wider than those of a normal distribution. In the normal probability plot, the points start below the line, bend to follow it in the middle, and end above it.
ggplot(speed_data, aes(sample = speed)) +
  geom_qq() +
  geom_qq_line() # normal probability plot
ggplot(speed_data, aes(x = speed)) + # histogram of the sample speeds
  geom_histogram(
    aes(y = after_stat(density)),
    bins = 20,
    colour = "brown") +
  geom_function( # overlay the normal density with the I-5 mean and SD
    fun = dnorm,
    args = list(mean = 72.6, sd = 4.78),
    colour = "purple", linewidth = 1.5)
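To back up the visual reading with a number, one option is to compare two shares. The first is the share of observations more than three standard deviations from the sample mean. The second is the share expected under a normal distribution, about 0.27%. The helper `tail_share()` is hypothetical and not part of the original analysis. Because `speed_data` is not reproduced here, the sketch runs it on simulated data: a normal sample and a heavy-tailed Laplace sample, built as the difference of two exponential draws. On the real data the call would be `tail_share(speed_data$speed)`.

```r
# Hypothetical helper: share of values more than k sample SDs from the sample mean
tail_share <- function(x, k = 3) {
  z <- (x - mean(x)) / sd(x)
  mean(abs(z) > k)
}

# Share expected under a normal distribution for k = 3, about 0.0027
normal_share <- 2 * pnorm(-3)

set.seed(42)                       # arbitrary seed for reproducibility
light <- rnorm(10000)              # simulated normal sample
heavy <- rexp(10000) - rexp(10000) # simulated Laplace sample (heavier tails)

tail_share(light)  # near normal_share
tail_share(heavy)  # noticeably larger than normal_share
```

A share well above the normal benchmark points the same way as the bent ends of the normal probability plot.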