Chapter 8 : INTRODUCTION TO LINEAR REGRESSION

Exercise 8.23: The Coast Starlight, Part II.

Exercise 8.11 introduces data on the Coast Starlight Amtrak train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next on the Coast Starlight is 129 minutes, with a standard deviation of 113 minutes. The mean distance traveled from one stop to the next is 108 miles, with a standard deviation of 99 miles. The correlation between travel time and distance is 0.636.

Write the equation of the regression line for predicting travel time.

The general equation is: \(y = b_1 x + b_0\)

For this case the equation becomes: \(t = b_1 d + b_0\)

\(t\) : predicted travel time between stops in minutes

\(d\) : distance between the stops in miles

\(b_1\) : slope

\(b_0\) : intercept
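Because only summary statistics are given, the coefficients can be obtained from the standard least-squares identities for simple linear regression. The symbols \(s_t\), \(s_d\), \(\bar{t}\) and \(\bar{d}\) (standard deviations and means of travel time and distance) are notation assumed here; they correspond to `sd_t`, `sd_d`, `m_t` and `m_d` in the code.

\[ b_1 = r \, \frac{s_t}{s_d}, \qquad b_0 = \bar{t} - b_1 \bar{d} \]

The second identity follows from the fact that the least-squares line passes through the point of means \((\bar{d}, \bar{t})\).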

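# Summary statistics given in the exercise
m_t <- 129    # mean travel time (minutes)
sd_t <- 113   # SD of travel time (minutes)
m_d <- 108    # mean distance (miles)
sd_d <- 99    # SD of distance (miles)
r <- 0.636    # correlation between travel time and distance
# NA (rather than "") keeps the Correlation column numeric
df <- data.frame(MEAN=c(m_t, m_d), SD=c(sd_t, sd_d), Correlation=c(r, NA))
df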

# Slope: correlation times the ratio of SDs (response over predictor)
b1 <- (sd_t/sd_d)*r
b1 <- round(b1,2)
b1

## [1] 0.73

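# Intercept: the line passes through the point of means (m_d, m_t).
# b1 is already rounded at this point, so b0 inherits that rounding.
b0 <- m_t - (b1*m_d)
b0 <- round(b0,2)
b0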

## [1] 50.16

So the equation becomes: \(t = 0.73d + 50.16\)
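As a cross-check, the sketch below recomputes the coefficients without rounding the slope first. The `_full` variable names are illustrative and not part of the original solution. With the slope kept at full precision, the intercept comes out closer to 50.60 than 50.16, a shift of roughly 0.4 minutes that is purely a rounding effect.

```r
# Summary statistics from the exercise
m_t <- 129; sd_t <- 113   # travel time (minutes)
m_d <- 108; sd_d <- 99    # distance (miles)
r   <- 0.636              # correlation

# Full-precision coefficients (illustrative names, rounding deferred)
b1_full <- r * sd_t / sd_d
b0_full <- m_t - b1_full * m_d

# Round only for display
round(c(slope = b1_full, intercept = b0_full), 2)
```

Either version gives nearly the same predictions over the range of the data; the rest of this solution keeps the rounded values 0.73 and 50.16.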

Interpret the slope and the intercept in this context.

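The slope indicates that for each additional mile between stops, the predicted travel time increases by about 0.73 minutes. The intercept indicates that when the distance traveled is 0 miles, the predicted travel time is 50.16 minutes. Taken literally, this says that traveling 0 miles takes about 50 minutes, which is not meaningful as a travel time. It can loosely be read as time that does not depend on distance, such as wait time at a stop.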

Calculate \(R^2\) of the regression line for predicting travel time from distance traveled for the Coast Starlight, and interpret \(R^2\) in the context of the application.

R_sq <- r^2
R_sq

## [1] 0.404496

\(R^2 = 0.404\). This means that about 40.4% of the variability in travel time between stops is explained by the linear relationship with distance.

The distance between Santa Barbara and Los Angeles is 103 miles. Use the model to estimate the time it takes for the Starlight to travel between these two cities.

t <- (b1*103)+ b0
t

## [1] 125.35

It actually takes the Coast Starlight about 168 mins to travel from Santa Barbara to Los Angeles. Calculate the residual and explain the meaning of this residual value.

res <- 168-t
res

## [1] 42.65

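The residual is the observed value minus the predicted value: 168 - 125.35 = 42.65 minutes. Because the residual is positive, the model underestimates the actual travel time from Santa Barbara to Los Angeles by about 43 minutes.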
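The prediction and residual steps can be wrapped in a small helper, shown here as a minimal sketch. The function name `predict_time` is hypothetical, and its default coefficients are the rounded values 0.73 and 50.16 from the fitted line.

```r
# Fitted line with the rounded coefficients from this solution
predict_time <- function(d, b1 = 0.73, b0 = 50.16) {
  b1 * d + b0
}

observed  <- 168                    # actual Santa Barbara to Los Angeles time (minutes)
predicted <- predict_time(103)      # model estimate for 103 miles
residual  <- observed - predicted   # positive means the model underestimates
residual
```

Keeping the sign convention explicit (observed minus predicted) makes it easy to tell an underestimate from an overestimate.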

Suppose Amtrak is considering adding a stop to the Coast Starlight 500 miles away from Los Angeles. Would it be appropriate to use this linear model to predict the travel time from Los Angeles to this point?

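No. It would not be appropriate to use the linear model to predict the travel time from Los Angeles to a point 500 miles away. A distance of 500 miles is about 4 standard deviations above the mean distance between stops (108 miles), so it is very likely far outside the range of distances used to fit the model. Predicting there would be extrapolation, and there is no guarantee that the linear relationship holds that far out.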
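The "about 4 standard deviations" figure can be checked with a quick z-score calculation. This sketch uses only the summary statistics from the exercise. The actual minimum and maximum distances in the data are not given, so the z-score is only a rough indicator of how extreme 500 miles is.

```r
m_d  <- 108   # mean distance between stops (miles)
sd_d <- 99    # standard deviation of distance (miles)

# Number of standard deviations that 500 miles lies above the mean
z <- (500 - m_d) / sd_d
round(z, 2)
```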

Data606 Meetup Presentation

Irene Jacob

2020-11-01
