I found a neat dataset on the duration of eruptions and the waiting time between eruptions for the Old Faithful geyser (see here).

Load the data

library(datasets)
library(knitr)

data(faithful)
d <- data.frame(faithful)

#examine the data
kable(head(d,5))

eruptions	waiting
3.600	79
1.800	54
3.333	74
2.283	62
4.533	85

Create the Model

m <- lm(eruptions ~ waiting, data = d)
plot(d$eruptions ~ d$waiting)
abline(m)

summary(m)

## 
## Call:
## lm(formula = eruptions ~ waiting, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29917 -0.37689  0.03508  0.34909  1.19329 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
## waiting      0.075628   0.002219   34.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4965 on 270 degrees of freedom
## Multiple R-squared:  0.8115, Adjusted R-squared:  0.8108 
## F-statistic:  1162 on 1 and 270 DF,  p-value: < 2.2e-16

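Based on the plot, it looks like we have a strong linear relationship here! The summary table confirms this: the slope on waiting is highly significant (p < 2e-16), each additional minute of waiting is associated with about 0.076 more minutes of eruption, and the model explains about 81% of the variance in eruption duration (R-squared = 0.8115).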
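To make the fitted line concrete, here is a minimal sketch that uses the model to predict the eruption duration after a given wait. The 80-minute wait and the names new_obs and by_hand are illustrative assumptions, not part of the original analysis. Plugging the coefficients from the summary in by hand gives -1.874 + 0.0756 * 80, or roughly 4.18 minutes, and predict() should agree.

```r
# Refit the same model so this snippet runs on its own
data(faithful)
m <- lm(eruptions ~ waiting, data = faithful)

# Hypothetical new observation: an 80-minute wait (assumed for illustration)
new_obs <- data.frame(waiting = 80)

# Point prediction by hand from the fitted coefficients
by_hand <- unname(coef(m)["(Intercept)"] + coef(m)["waiting"] * 80)

# Same point prediction via predict(), plus a 95% prediction interval
pred <- predict(m, newdata = new_obs, interval = "prediction", level = 0.95)

by_hand
pred
```

The prediction interval is noticeably wider than a confidence interval for the mean would be, since it also accounts for the residual scatter around the line.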

Examine the Residuals

hist(m$residuals)

qqnorm(m$residuals)
qqline(m$residuals)

shapiro.test(m$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  m$residuals
## W = 0.99278, p-value = 0.2106

Here we see that the residuals are reasonably normally distributed. Based on the Shapiro-Wilk test (W = 0.993, p = 0.21), we cannot reject the null hypothesis that the sample comes from a normally distributed population (or process in this case, I suppose).
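The decision rule behind that reading of the test can be written out explicitly. This is only a sketch: the 5% significance level (alpha) is my assumption, since the analysis above does not state one, and the names sw and reject_normality are illustrative.

```r
# Refit the same model so this snippet runs on its own
data(faithful)
m <- lm(eruptions ~ waiting, data = faithful)

# Shapiro-Wilk test on the model residuals
sw <- shapiro.test(resid(m))

alpha <- 0.05  # assumed significance level, not stated in the original

# TRUE would mean we reject the null hypothesis of normality
reject_normality <- sw$p.value < alpha

sw$statistic
sw$p.value
reject_normality
```

Failing to reject is not proof of normality; it only says the data give no strong evidence against it.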

Real Random Data

Just for visual comparison, I drew 272 values (the same size as our sample) from a standard normal distribution and plotted them using qqnorm. The result is hard to distinguish from the Q-Q plot of the residuals. I'd say that the normality assumption for regression is likely valid here, but a larger sample size would be ideal.

# for comparison purposes: 272 draws from a standard normal
rnd.data <- rnorm(272)
qqnorm(rnd.data)
qqline(rnd.data)
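The comparison is easier to judge with the two Q-Q plots side by side. The sketch below is one way to do that, not what the original analysis ran: standardizing the residuals, the seed value, and the names res_std and rnd are all my assumptions, added so the panels share a scale and the simulated draw is reproducible.

```r
# Refit the same model so this snippet runs on its own
data(faithful)
m <- lm(eruptions ~ waiting, data = faithful)

set.seed(605)  # arbitrary seed, only so the simulated panel is reproducible

# Standardize the residuals (mean 0, sd 1) so both panels share a scale
res_std <- as.numeric(scale(resid(m)))

# Simulated standard normal sample of the same size
rnd <- rnorm(length(res_std))

old_par <- par(mfrow = c(1, 2))  # two panels side by side
qqnorm(res_std, main = "Model residuals (standardized)")
qqline(res_std)
qqnorm(rnd, main = "Simulated normal data")
qqline(rnd)
par(old_par)  # restore the previous plotting layout
```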

DATA 605 Week 11 - Old Faithful

Paul Britton

Nov 7, 2018
