DATA606: Homework Presentation

by Bikram Barua

8.23 The Coast Starlight, Part II

Exercise 8.11 introduces data on the Coast Starlight Amtrak train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next on the Coast Starlight is 129 mins, with a standard deviation of 113 minutes. The mean distance traveled from one stop to the next is 108 miles with a standard deviation of 99 miles. The correlation between travel time and distance is 0.636.

(a) Write the equation of the regression line for predicting travel time.

Correlation (R) = 0.636
Standard deviation of time( \(S_{y}\) ) = 113
Standard deviation of distance( \(S_{x}\) ) = 99

Slope of the least squares line: \(b_{1}\) = R x \(S_{y}\)/\(S_{x}\) = 0.636 x 113/99 = 0.726

Mean travel time ( \(\overline{y}\) ) = 129
Mean distance ( \(\overline{x}\) ) = 108

Regression line relationship based on \(\overline{x}\) and \(\overline{y}\) is
\(\overline{y}\) = \(b_{0}\) + \(b_{1}\) x \(\overline{x}\)
129 = \(b_{0}\) + 0.726 x 108
\(b_{0}\) = 129 - 78.408 = 50.592

Equation of least squares regression equation is
\(\hat{Y}\) = \(b_{0}\) + \(b_{1}\) * X
\(\hat{travel time}\) = 50.6 + 0.726 x distance

(b) Interpret the slope and the intercept in this context.

The slope (\(b_{1}\)) is 0.726 based on the calculations above. It means for every additional mile the travel time will increase by 0.726 minutes.
Also based on the equation, the y-intercept (\(b_{0}\)) is 50.6 at x=0. When distance travelled is 0 miles, time taken is 50.6 minutes, which does not make sense. So, it is meaningless except to adjust the height of the line in the graphical representation.

(c) Calculate R2 of the regression line for predicting travel time from distance traveled for the Coast Starlight, and interpret R2 in the context of the application.

The strength of the linear fit between the variables is represented by \(R^{2}\)
\(R^{2}\) = \((0.726)^{2}\) = 0.5
50% variability is accounted for by the model.

(d) The distance between Santa Barbara and Los Angeles is 103 miles. Use the model to estimate the time it takes for the Starlight to travel between these two cities.

\(\hat{travel time}\) = 50.6 + 0.726 x distance
= 50.6 + (0.726 * 103)
= 50.6 + 74.8 = 125.4
The estimated travel time is between 125 to 126 minutes.

(e) It actually takes the Coast Starlight about 168 mins to travel from Santa Barbara to Los Angeles. Calculate the residual and explain the meaning of this residual value.

\(e_{i}\) = 168 - 126 = 42 minutes
There is a positive residual which mean the model underestimates the travel time.

(f) Suppose Amtrak is considering adding a stop to the Coast Starlight 500 miles away from Los Angeles. Would it be appropriate to use this linear model to predict the travel time from Los Angeles to this point?

No, it would not be appropriate to use this linear model. More exploration is required on this model to improve predictability.