2026-02-08

Commute Times in Sampled Major Cities

This presentation examines the relationship between commute time and commute distance. The data contain 500 observations for each of four cities: Boston, Houston, Minneapolis, and Washington.

Linear Regression Model

\[Time_i = \beta_0 + \beta_1 \cdot Distance_i + \varepsilon_i\]

  • \(\beta_0\): intercept (baseline commute time)
  • \(\beta_1\): slope (average change in commute time per one-unit increase in distance)
  • \(\varepsilon_i\): random error
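
As a minimal sketch of how a model of this form can be fit in R, the block below calls `lm()` on simulated data. The data frame `commutes`, its sample size, and its true intercept and slope are made up for illustration and stand in for the presentation's data; only the column names `Time` and `Distance` follow the model above.

```r
# Synthetic stand-in data (NOT the presentation's data):
# 200 simulated commutes with a known intercept (10) and slope (1.5).
set.seed(42)
commutes <- data.frame(Distance = runif(200, min = 1, max = 40))
commutes$Time <- 10 + 1.5 * commutes$Distance + rnorm(200, sd = 8)

# Fit Time_i = beta_0 + beta_1 * Distance_i + epsilon_i by least squares.
fit <- lm(Time ~ Distance, data = commutes)

# Estimated intercept (beta_0) and slope (beta_1).
beta_hat <- coef(fit)
print(beta_hat)

# Predicted commute time for a hypothetical commute of 12 distance units.
predict(fit, newdata = data.frame(Distance = 12))
```

Because the data are simulated with a known slope, the estimate can be compared against the value used to generate them, which is not possible with real commute data.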

Hypothesis Test for Slope

\[ H_0: \beta_1 = 0 \]

\[ H_a: \beta_1 \neq 0 \]

  • Tests whether distance is linearly related to commute time
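
To make the test concrete, the sketch below computes the usual t statistic for the slope, \(t = \hat\beta_1 / SE(\hat\beta_1)\) with \(n - 2\) degrees of freedom, and the two-sided p-value. It again uses a simulated data frame `commutes` as a stand-in (an assumption for illustration, not the presentation's data), so the numbers it prints say nothing about the four cities.

```r
# Synthetic stand-in data (NOT the presentation's data).
set.seed(42)
commutes <- data.frame(Distance = runif(200, min = 1, max = 40))
commutes$Time <- 10 + 1.5 * commutes$Distance + rnorm(200, sd = 8)
fit <- lm(Time ~ Distance, data = commutes)

# Coefficient table: estimate, standard error, t value, p-value.
coefs <- summary(fit)$coefficients
slope_est <- coefs["Distance", "Estimate"]
slope_se  <- coefs["Distance", "Std. Error"]

# t statistic for H_0: beta_1 = 0, with n - 2 residual degrees of freedom.
t_stat   <- slope_est / slope_se
df_resid <- df.residual(fit)

# Two-sided p-value for H_a: beta_1 != 0.
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

print(c(t = t_stat, df = df_resid, p = p_value))
```

The hand-computed values should agree with the `t value` and `Pr(>|t|)` columns that `summary(fit)` reports for `Distance`.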

Scatter Plot vs Regression Line
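
A plot of this kind can be sketched with `ggplot2` as follows. This is not necessarily the code used for the slide: the use of `method = "lm"` is an assumption, and the data frame `commutes` is simulated purely so the example runs on its own.

```r
library(ggplot2)

# Synthetic stand-in data (NOT the presentation's data).
set.seed(42)
commutes <- data.frame(Distance = runif(200, min = 1, max = 40))
commutes$Time <- 10 + 1.5 * commutes$Distance + rnorm(200, sd = 8)

# Scatter plot of time against distance with the fitted least-squares line.
# Passing formula = y ~ x explicitly avoids the informational message that
# geom_smooth() prints when the formula is left at its default.
p <- ggplot(commutes, aes(x = Distance, y = Time)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Commute Time vs Distance (simulated data)",
    x = "Distance",
    y = "Commute Time"
  )

print(p)
```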


Residual Plot

R Code Used for Previous Plot

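library(ggplot2)

# Fit the simple linear regression of commute time on distance
g <- lm(Time ~ Distance, data = MetroCommutes)

# Plot residuals against fitted values; points scattered evenly around
# the dashed zero line are consistent with a linear relationship
ggplot(data.frame(
  fitted = fitted(g),
  residuals = resid(g)
), aes(x = fitted, y = residuals)) +
  geom_point(alpha = 0.4) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted Commute Time",
    y = "Residuals"
  )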

Commute Times by Distance and City Plot
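
One way to draw a per-city view is sketched below, with one panel and one fitted line per city. The column name `City`, the choice of `facet_wrap()`, and the simulated data frame `commutes` are all assumptions made for illustration; the slide's actual plot may have been built differently.

```r
library(ggplot2)

# Synthetic stand-in data (NOT the presentation's data):
# 50 simulated commutes for each of the four cities.
set.seed(42)
cities <- c("Boston", "Houston", "Minneapolis", "Washington")
commutes <- data.frame(
  City = rep(cities, each = 50),
  Distance = runif(200, min = 1, max = 40)
)
commutes$Time <- 10 + 1.5 * commutes$Distance + rnorm(200, sd = 8)

# One panel per city, each with its own least-squares line.
p_city <- ggplot(commutes, aes(x = Distance, y = Time, colour = City)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ City) +
  labs(
    title = "Commute Time by Distance and City (simulated data)",
    x = "Distance",
    y = "Commute Time"
  )

print(p_city)
```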

Conclusion

Based on the sampled data, commute time generally increases with commute distance. We modeled this relationship with a simple linear regression of time on distance. The residual analysis indicates that a linear model was appropriate for this data set.