- How waiting time until eruption predicts eruption duration
- The OLS model, key diagnostics, and interpretation
- A quick look at model uncertainty and significance
2025-10-16
| Eruption Duration (min) | Waiting Time (min) |
|---|---|
| 3.600 | 79 |
| 1.800 | 54 |
| 3.333 | 74 |
| 2.283 | 62 |
| 4.533 | 85 |
| 2.883 | 55 |

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -1.874 | 0.160 | -11.702 | < 0.001 |
| waiting | 0.076 | 0.002 | 34.089 | < 0.001 |

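A coefficient table like this one can be reproduced with a few lines of R. The sketch below assumes the data are R's built-in `faithful` data set (columns `eruptions` and `waiting`); the object names `fit` and `coef_table` are illustrative and not taken from the slides.

```r
# Assumption: the slides use the built-in `faithful` data set
fit <- lm(eruptions ~ waiting, data = faithful)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- round(summary(fit)$coefficients, 3)
print(coef_table)
```

Rounding to three decimals is why very small p-values display as 0; they are not exactly zero.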
Let \(x_i\) be waiting time and \(y_i\) be eruption duration.
\[ \hat{\beta}_{1} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})} {\sum_{i=1}^{n}(x_i-\bar{x})^2}, \quad \hat{\beta}_{0} = \bar{y} - \hat{\beta}_{1}\bar{x} \]
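As a quick check, the closed-form estimates can be computed directly and compared with `lm()`. This is a minimal sketch assuming the built-in `faithful` data set; the variable names `beta0_hat` and `beta1_hat` are illustrative.

```r
# Assumption: x = waiting time, y = eruption duration from `faithful`
x <- faithful$waiting
y <- faithful$eruptions

# Slope: sum of cross-deviations over sum of squared deviations of x
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: forces the fitted line through (mean(x), mean(y))
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with the estimates from lm()
fit <- lm(eruptions ~ waiting, data = faithful)
print(c(beta0 = beta0_hat, beta1 = beta1_hat))
print(all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat)))
```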
We test the null hypothesis against the alternative:
\[ H_{0} : \beta_{1} = 0 \quad \text{vs} \quad H_{a} : \beta_{1} \neq 0 \]
Under \(H_{0}\), the test statistic follows a \(t\) distribution with \(n-2\) degrees of freedom:
\[ t = \frac{\hat{\beta}_{1}}{\operatorname{SE}(\hat{\beta}_{1})} \sim t_{n-2} \]
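The test statistic can be computed by hand as a sanity check. The sketch below assumes the built-in `faithful` data set and uses the standard simple-regression formula for the slope's standard error (residual standard deviation divided by the square root of the sum of squared deviations of \(x\)), which is not spelled out on the slide.

```r
# Assumption: model fitted to the built-in `faithful` data set
fit <- lm(eruptions ~ waiting, data = faithful)
n <- nrow(faithful)
x <- faithful$waiting

# Residual standard deviation with n - 2 degrees of freedom
sigma_hat <- sqrt(sum(residuals(fit)^2) / (n - 2))
# Standard error of the slope (standard formula, not shown on the slide)
se_beta1 <- sigma_hat / sqrt(sum((x - mean(x))^2))

# t statistic and two-sided p-value under H0: beta1 = 0
t_stat <- coef(fit)[["waiting"]] / se_beta1
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
print(c(t = t_stat, p = p_value))
```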
The 95% confidence interval for the slope is:
\[ \hat{\beta}_{1} \; \pm \; t_{0.975,\,n-2} \cdot \operatorname{SE}(\hat{\beta}_{1}) \]
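The interval can be built from its pieces and checked against `confint()`. This is a sketch assuming the built-in `faithful` data set; `ci_manual` and `ci_builtin` are illustrative names.

```r
# Assumption: model fitted to the built-in `faithful` data set
fit <- lm(eruptions ~ waiting, data = faithful)
n <- nrow(faithful)

beta1_hat <- coef(fit)[["waiting"]]
se_beta1 <- summary(fit)$coefficients["waiting", "Std. Error"]

# 97.5th percentile of the t distribution with n - 2 degrees of freedom
t_crit <- qt(0.975, df = n - 2)

# Manual 95% interval and the built-in equivalent
ci_manual <- beta1_hat + c(-1, 1) * t_crit * se_beta1
ci_builtin <- confint(fit, "waiting", level = 0.95)
print(ci_manual)
print(ci_builtin)
```

An interval that excludes zero is consistent with rejecting \(H_{0}\) at the 5% level.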
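
    library(ggplot2)

    # Assumption: df is R's built-in Old Faithful data (columns: eruptions, waiting)
    df <- faithful

    # Scatter plot with the fitted OLS line and its 95% confidence band
    p_scatter <- ggplot(df, aes(x = waiting, y = eruptions)) +
      geom_point() +
      geom_smooth(method = "lm", se = TRUE) +
      labs(title = "Old Faithful: Eruption Duration vs Waiting Time",
           x = "Waiting Time (minutes)",
           y = "Eruption Duration (minutes)")
    p_scatter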