I found a neat dataset on the duration of eruptions and the waiting time between them for Old Faithful (see here).

Load the data

eruptions waiting
3.600 79
1.800 54
3.333 74
2.283 62
4.533 85
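
The loading step itself is not shown, so here is a minimal sketch of one way to get the same rows. It assumes that R's built-in `faithful` data set (from the `datasets` package) is the same data as the linked file; the first five rows match the ones above. The name `d` is taken from the `lm()` call further down.

```r
# Assumption: the built-in `faithful` data set stands in for the linked file.
d <- faithful

# Show the first five rows (eruption duration and waiting time, both in minutes).
head(d, 5)

# Quick look at the size and column types.
str(d)
```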

Create the Model

## 
## Call:
## lm(formula = eruptions ~ waiting, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29917 -0.37689  0.03508  0.34909  1.19329 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
## waiting      0.075628   0.002219   34.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4965 on 270 degrees of freedom
## Multiple R-squared:  0.8115, Adjusted R-squared:  0.8108 
## F-statistic:  1162 on 1 and 270 DF,  p-value: < 2.2e-16

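Based on the plot, it looks like we have a strong linear relationship here! The summary table confirms this: the slope for waiting is positive and highly significant (t = 34.09, p < 2e-16), and waiting time explains about 81% of the variance in eruption duration (R-squared = 0.8115).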
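
A sketch of code that would produce a scatter plot and a summary like the one above. The formula and the name `d` come from the `Call:` line, and `m` comes from the Shapiro-Wilk output below; using the built-in `faithful` data and a base-graphics scatter plot are assumptions, not necessarily what was originally run.

```r
# Assumption: the built-in `faithful` data set is the data used in this post.
d <- faithful

# Scatter plot of eruption duration against waiting time.
plot(eruptions ~ waiting, data = d,
     xlab = "Waiting time (min)", ylab = "Eruption duration (min)")

# Simple linear regression, same formula as in the Call: line.
m <- lm(eruptions ~ waiting, data = d)

# Overlay the fitted line and print the coefficient table.
abline(m, col = "red")
summary(m)
```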

Examine the Residuals

## 
##  Shapiro-Wilk normality test
## 
## data:  m$residuals
## W = 0.99278, p-value = 0.2106

Here we see that the residuals are reasonably normally distributed. Based on the Shapiro-Wilk test (W = 0.99278, p-value = 0.2106), we cannot reject, at the 0.05 level, the null hypothesis that the sample comes from a normally distributed population (… or process in this case, I suppose).
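
A minimal sketch of how this residual check might be run. `shapiro.test(m$residuals)` matches the `data:` line in the output above; the histogram and Q-Q plot are assumptions about which plots were looked at, and the built-in `faithful` data is again assumed.

```r
# Assumption: the built-in `faithful` data set is the data used in this post.
d <- faithful
m <- lm(eruptions ~ waiting, data = d)

# Visual checks: histogram and normal Q-Q plot of the residuals.
hist(m$residuals, breaks = 20, main = "Residuals", xlab = "Residual (min)")
qqnorm(m$residuals)
qqline(m$residuals)  # reference line through the first and third quartiles

# Formal check: Shapiro-Wilk test, null hypothesis = residuals are normal.
sw <- shapiro.test(m$residuals)
sw
```

A large p-value here only means the test found no evidence against normality; it does not prove the residuals are normal.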

Real Random Data

Just for visual comparison, I used qqnorm to look at truly random data, and it’s hard to distinguish from the model residuals. I’d say that the normality assumption for the regression residuals is likely valid here, but a larger sample size would be ideal.
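
For anyone who wants to repeat the comparison, a sketch under some assumptions: "truly random data" is taken to mean draws from a normal distribution, with the same sample size (272) and a standard deviation equal to the residual standard error reported above (0.4965). The seed and the variable name `x` are hypothetical.

```r
# Hypothetical seed so the comparison is reproducible.
set.seed(1)

# Assumption: "random data" means normal draws; size and spread are chosen
# to mimic the model residuals (n = 272, sd = 0.4965).
x <- rnorm(272, mean = 0, sd = 0.4965)

# Normal Q-Q plot of the simulated data, with a reference line.
qqnorm(x, main = "Normal Q-Q Plot: simulated normal data")
qqline(x)
```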