Exercise 8.11 introduces data on the Coast Starlight Amtrak train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next on the Coast Starlight is 129 minutes, with a standard deviation of 113 minutes. The mean distance traveled from one stop to the next is 108 miles, with a standard deviation of 99 miles. The correlation between travel time and distance is 0.636.
The regression line has the general form: \(y={ b }_{ 1 }x+{ b }_{ 0 }\)
For this case the equation becomes: \(t={ b }_{ 1 }d+{ b }_{ 0 }\)
t: predicted travel time between stops, in minutes
d: distance between stops, in miles
\({ b }_{ 1 }\): slope
\({ b }_{ 0 }\): intercept
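Both coefficients can be recovered from the summary statistics alone. As a sketch in this write-up's notation (the symbols \(s_t\), \(s_d\), \(\bar{t}\) and \(\bar{d}\) for the standard deviations and means are labels introduced here, not taken from the exercise):

```latex
b_1 = r \cdot \frac{s_t}{s_d}, \qquad b_0 = \bar{t} - b_1 \, \bar{d}
```

The second identity holds because the least-squares line always passes through the point of means \((\bar{d}, \bar{t})\).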
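# Summary statistics from the exercise (time in minutes, distance in miles)
m_t <- 129
sd_t <- 113
m_d <- 108
sd_d <- 99
r <- 0.636
# NA (rather than "") keeps the Correlation column numeric
df <- data.frame(MEAN=c(m_t, m_d), SD=c(sd_t, sd_d), Correlation=c(r, NA))
df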
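# Slope: correlation times the ratio of standard deviations (response over predictor)
b1 <- (sd_t/sd_d)*r
b1 <- round(b1,2)
b1
## [1] 0.73
# Intercept: the line passes through the point of means (m_d, m_t)
# Note: this uses the already-rounded slope
b0 <- m_t - b1*m_d
b0 <- round(b0,2)
b0
## [1] 50.16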
So the equation becomes: \(t=0.73d+50.16\)
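As a quick sanity check, here is a sketch that assumes the same summary statistics. The fitted line should pass through the point of means, and the intercept is somewhat sensitive to rounding the slope first. The names `b1_exact`, `b0_exact` and `at_mean` are introduced here for illustration only.

```r
# Summary statistics from the exercise
m_t <- 129; sd_t <- 113   # travel time (minutes)
m_d <- 108; sd_d <- 99    # distance (miles)
r <- 0.636

# Rounded slope and the intercept derived from it, as in the write-up
b1 <- round((sd_t / sd_d) * r, 2)
b0 <- round(m_t - b1 * m_d, 2)

# Unrounded versions (illustrative names, not part of the original solution)
b1_exact <- (sd_t / sd_d) * r
b0_exact <- m_t - b1_exact * m_d

# The fitted line evaluated at the mean distance should return the mean time
at_mean <- b1 * m_d + b0

print(c(b0_rounded_slope = b0, b0_exact = round(b0_exact, 2), at_mean = at_mean))
```

With the unrounded slope the intercept comes out near 50.60 rather than 50.16. The gap is small here, but it shows why rounding is usually best left to the final step.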
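The slope indicates that, on average, each additional mile between stops is associated with about 0.73 more minutes of travel time. The intercept is the predicted travel time when the distance is 0 miles: 50.16 minutes. Taken literally, this says it takes 50.16 minutes to travel 0 miles, which is not meaningful on its own. It mainly sets the height of the line, though it could loosely be read as time spent waiting at stops.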
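The slope interpretation can be illustrated with a small helper, sketched here using the fitted coefficients above. `predict_time` is a hypothetical name, not part of the original solution.

```r
# Fitted coefficients from the write-up
b1 <- 0.73    # minutes per additional mile
b0 <- 50.16   # predicted minutes at a distance of 0 miles

# Hypothetical helper: predicted travel time (minutes) for a distance in miles
predict_time <- function(d) b1 * d + b0

# Predictions for a few distances not far from the mean distance of 108 miles
distances <- c(50, 51, 100, 150)
predictions <- predict_time(distances)
print(data.frame(distance = distances, predicted_time = predictions))

# One extra mile changes the prediction by the slope
extra_mile <- predict_time(51) - predict_time(50)
print(extra_mile)
```

Evaluating `predict_time(0)` returns the intercept, which is the literal reading of \({ b }_{ 0 }\) discussed above.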
R_sq <- r^2
R_sq
## [1] 0.404496
\({ R }^{ 2 }=0.404\). This means that about 40.4% of the variation in travel time is explained by the linear relationship with distance.
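To make this interpretation concrete, the following sketch splits the variance of travel time into the share accounted for by the linear model and the remainder. The residual standard deviation uses the approximation \(s_t\sqrt{1-R^2}\), which ignores the degrees-of-freedom correction because the sample size is not given here. The variable names are illustrative.

```r
sd_t <- 113    # standard deviation of travel time (minutes)
r <- 0.636     # correlation between travel time and distance

R_sq <- r^2
var_t <- sd_t^2                         # total variance of travel time
var_explained <- R_sq * var_t           # part accounted for by distance
var_unexplained <- (1 - R_sq) * var_t   # part left in the residuals

# Approximate residual standard deviation (no degrees-of-freedom correction)
sd_resid <- sd_t * sqrt(1 - R_sq)

print(round(c(explained = var_explained,
              unexplained = var_unexplained,
              sd_resid = sd_resid), 2))
```

Under this approximation the typical prediction error is roughly 87 minutes, down from the 113-minute spread of travel times when distance is ignored.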
# Predicted travel time (minutes) for a distance of 103 miles
t <- (b1*103) + b0
t
## [1] 125.35
# Residual = actual travel time (168 minutes) - predicted travel time
res <- 168 - t
res
## [1] 42.65
The residual is 42.65 minutes: the actual travel time (168 minutes) is 42.65 minutes longer than the predicted 125.35 minutes, so the model underestimates the travel time for this leg.
No, it would not be appropriate to use the linear model to predict the travel time from Los Angeles to a point 500 miles away. A distance of 500 miles is about 4 standard deviations above the mean distance between stops, so it is almost certainly far outside the range of distances the model is based on, and the prediction would be an extrapolation.
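The "4 standard deviations" figure can be checked directly. This is a minimal sketch that assumes the summary statistics and fitted line above. `z_500` and `t_500` are names introduced here.

```r
m_d <- 108; sd_d <- 99    # mean and SD of distance between stops (miles)
b1 <- 0.73; b0 <- 50.16   # fitted slope and intercept from the write-up

# How far 500 miles sits from the mean distance, in standard deviations
z_500 <- (500 - m_d) / sd_d
print(round(z_500, 2))

# The model still returns a number, but it is an extrapolation
t_500 <- b1 * 500 + b0
print(t_500)
```

The z-score is about 3.96. The model returns roughly 415 minutes for 500 miles, but no observed distances near 500 miles are likely to support that figure.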