2025-09-17

Dataset Overview

The faithful data set, built into R, contains 272 observations of eruptions of the Old Faithful geyser in Yellowstone National Park.

Variables:

  - eruptions: eruption duration (minutes)
  
  - waiting: waiting time until the next eruption (minutes)
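
As a quick check of the description above, the data set can be inspected directly. This is a minimal sketch that assumes only base R, where faithful ships with the datasets package:

```r
# faithful ships with base R (datasets package), so nothing needs installing
data(faithful)

# Structure: number of rows and the type of each column
str(faithful)

# Minimum, quartiles, mean and maximum for both variables (in minutes)
summary(faithful)

# First few observations
head(faithful)
```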

We will use simple linear regression to predict eruption duration from waiting time.

The Regression Model

A simple linear regression has the form:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

For the Old Faithful data set:

\[ \text{eruption} = \beta_0 + \beta_1 \times \text{waiting} + \epsilon \]
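
The slides do not say how the coefficients are estimated. R's lm() uses ordinary least squares, and with a single predictor the estimates have a closed form. The sketch below computes them by hand and compares them with lm(); the names x, y, beta0_hat and beta1_hat are illustrative only:

```r
x <- faithful$waiting    # predictor
y <- faithful$eruptions  # response

# Closed-form least-squares estimates for a single predictor
beta1_hat <- cov(x, y) / var(x)              # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)   # intercept

c(intercept = beta0_hat, slope = beta1_hat)

# The hand-computed values should agree with lm() up to floating-point error
fit <- lm(eruptions ~ waiting, data = faithful)
all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat))
```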

R Code Summary

## 
## Call:
## lm(formula = eruptions ~ waiting, data = faithful)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29917 -0.37689  0.03508  0.34909  1.19329 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
## waiting      0.075628   0.002219   34.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4965 on 270 degrees of freedom
## Multiple R-squared:  0.8115, Adjusted R-squared:  0.8108 
## F-statistic:  1162 on 1 and 270 DF,  p-value: < 2.2e-16
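
The summary above can be reproduced with the call shown on its Call: line. The object name model is an assumption, since the slides do not show the original code:

```r
# Fit the simple linear regression shown in the Call: line of the output
model <- lm(eruptions ~ waiting, data = faithful)

# Coefficient table, residual standard error, R-squared and F-statistic
summary(model)

# Point estimates only
coef(model)

# 95% confidence intervals for the intercept and slope
confint(model, level = 0.95)
```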

ggplot Scatter Plot & Regression Line
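
The original figure is not reproduced here. A plot of this kind could be drawn roughly as follows. This is a sketch that assumes the ggplot2 package is installed; the labels and styling are guesses rather than the author's settings:

```r
library(ggplot2)

# Scatter plot of the raw data with the least-squares line overlaid
p_scatter <- ggplot(faithful, aes(x = waiting, y = eruptions)) +
  geom_point(alpha = 0.6) +
  # method = "lm" fits the same straight line as lm(eruptions ~ waiting);
  # se = TRUE shades a confidence band around the fitted line
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title = "Old Faithful: eruption duration vs. waiting time",
    x = "Waiting time (minutes)",
    y = "Eruption duration (minutes)"
  )

p_scatter
```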

ggplot Residuals Plot
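
The original figure is not reproduced here. One common form of residuals plot puts fitted values on the horizontal axis; whether the original slide did so is an assumption. A sketch, assuming ggplot2 is installed and using the illustrative names model and diag_df:

```r
library(ggplot2)

model <- lm(eruptions ~ waiting, data = faithful)

# Collect fitted values and residuals in a data frame for plotting
diag_df <- data.frame(
  fitted   = fitted(model),
  residual = resid(model)
)

# Residuals vs. fitted values; points should scatter around the zero line
# with no obvious pattern if the linear model is adequate
p_resid <- ggplot(diag_df, aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residuals vs. fitted values",
    x = "Fitted eruption duration (minutes)",
    y = "Residual (minutes)"
  )

p_resid
```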

Plotly Scatter Plot
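
The interactive figure is not reproduced here. A comparable plot could be built as below. This is a sketch that assumes the plotly package is installed; the title and axis labels are guesses:

```r
library(plotly)

# Interactive scatter plot; hovering over a point shows its values
fig <- plot_ly(
  data = faithful,
  x = ~waiting,
  y = ~eruptions,
  type = "scatter",
  mode = "markers"
)

# Add a title and axis labels
fig <- layout(
  fig,
  title = "Old Faithful eruptions",
  xaxis = list(title = "Waiting time (minutes)"),
  yaxis = list(title = "Eruption duration (minutes)")
)

fig
```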

Results

The fitted regression equation is:

\[ \text{eruption} = -1.87 + 0.076 \times \text{waiting} \]

  • \(R^2 \approx 0.81\): waiting time explains about 81% of the variance in eruption duration

  • p-value \(< 2 \times 10^{-16}\) for the slope → highly significant

Interpretation:
Each additional minute of waiting is associated with an eruption that is, on average, about
\(0.076\) minutes (~4.5 seconds) longer.
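
To make the interpretation concrete, the fitted model can be used for prediction. The waiting time of 80 minutes below is an arbitrary illustrative value inside the observed range, and model and new_wait are assumed names:

```r
model <- lm(eruptions ~ waiting, data = faithful)

# Slope converted from minutes to seconds per extra minute of waiting
slope_seconds <- coef(model)[["waiting"]] * 60
slope_seconds

# Predicted eruption duration for a hypothetical 80-minute wait,
# with a 95% prediction interval for a single new eruption
new_wait <- data.frame(waiting = 80)
pred <- predict(model, newdata = new_wait, interval = "prediction")
pred
```

With the coefficients reported in the summary, the point prediction works out to roughly 4.18 minutes.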

Conclusion

The simple linear regression model shows a strong positive linear association between waiting time and eruption duration, with waiting time accounting for roughly 81% of the variation in eruption length.
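
The strength of the relationship can also be checked with the Pearson correlation coefficient. In simple linear regression its square equals \(R^2\). A short sketch using base R only, with r as an illustrative name:

```r
# Pearson correlation between the two variables
r <- cor(faithful$waiting, faithful$eruptions)
r

# Squaring it recovers the R-squared reported by summary(lm(...))
r^2
summary(lm(eruptions ~ waiting, data = faithful))$r.squared

# Formal test of whether the correlation differs from zero
cor.test(faithful$waiting, faithful$eruptions)
```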