Prediction of Old Faithful geyser eruption duration

Yellowstone National Park, Wyoming, USA

Pierre Paquay

Presentation

  • This Shiny application predicts the duration of an eruption of Old Faithful geyser (Yellowstone National Park, Wyoming, USA) as a linear function of the waiting time to the next eruption.

  • This application fits a linear regression model to the "faithful" dataset, then uses it to predict the eruption duration from the waiting time (in minutes) entered by the user.

  • The predicted eruption duration is displayed in minutes.

  • This application also plots the eruption duration against the waiting time to the next eruption. The user can choose which elements to display (sample points, regression line) and set the transparency level of the sample points.

  • Link to the application hosted at shinyapps.io: ppaquay.shinyapps.io/PredApp/
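The plot options described above can be reproduced outside Shiny with base R graphics. The following is a minimal sketch, not the application's actual code: the function name "plot_faithful()" and its arguments "show_points", "show_line" and "alpha" are hypothetical stand-ins for the app's input widgets.

```r
library(datasets)

# Hypothetical helper mimicking the app's plot controls (not the app's real code).
# show_points: draw the sample points; show_line: draw the regression line;
# alpha: opacity of the sample points, from 0 (invisible) to 1 (opaque).
plot_faithful <- function(show_points = TRUE, show_line = TRUE, alpha = 0.5) {
  stopifnot(alpha >= 0, alpha <= 1)
  model <- lm(eruptions ~ waiting, data = faithful)
  # Set up empty axes; points are added only if requested
  plot(eruptions ~ waiting, data = faithful, type = "n",
       xlab = "Waiting time to next eruption (min)",
       ylab = "Eruption duration (min)")
  if (show_points) {
    points(faithful$waiting, faithful$eruptions, pch = 19,
           col = adjustcolor("steelblue", alpha.f = alpha))
  }
  if (show_line) {
    abline(model, col = "red", lwd = 2)
  }
  invisible(model)
}

# Example: points at 30% opacity with the regression line
plot_faithful(show_points = TRUE, show_line = TRUE, alpha = 0.3)
```

Applying the transparency through "adjustcolor()" keeps the point colour fixed and only varies its alpha channel, which is one simple way to map a slider value onto the plot.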

About the data

The dataset used for this application is the "faithful" dataset from the "datasets" package.

library(datasets)
head(faithful)
##   eruptions waiting
## 1     3.600      79
## 2     1.800      54
## 3     3.333      74
## 4     2.283      62
## 5     4.533      85
## 6     2.883      55

This dataset consists of 272 observations of two variables: "eruptions", the eruption duration, and "waiting", the waiting time to the next eruption, both measured in minutes.
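These properties (272 rows, two numeric columns) can be checked directly in R; this is a short sketch using only base functions.

```r
library(datasets)

# Dimensions: 272 observations of 2 variables
dim(faithful)

# Column types and a first look at the values
str(faithful)

# Minimum and maximum of each variable, both in minutes
sapply(faithful, range)
```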

About the model

We use the "lm" function to build the regression model.

model <- lm(eruptions ~ waiting, data = faithful)
summary(model)$coef
##             Estimate Std. Error t value   Pr(>|t|)
## (Intercept) -1.87402   0.160143  -11.70  7.359e-26
## waiting      0.07563   0.002219   34.09 8.130e-100

The model obtained is the following.

\[\widehat{\mathtt{eruptions}} = 0.07563\times\mathtt{waiting} - 1.87402\]
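With a single predictor, least squares has a closed form: the slope is the sample covariance of "waiting" and "eruptions" divided by the sample variance of "waiting", and the intercept makes the line pass through the point of means. The sketch below recomputes both coefficients this way and compares them with "lm()". It illustrates the formula only; "lm()" itself solves the problem through a QR decomposition.

```r
library(datasets)

x <- faithful$waiting    # predictor (minutes)
y <- faithful$eruptions  # response (minutes)

# Closed-form simple linear regression estimates
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

c(intercept = intercept, slope = slope)

# Should match lm() up to floating-point error
model <- lm(eruptions ~ waiting, data = faithful)
all.equal(unname(coef(model)), c(intercept, slope))
```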

The \(R^2\) value of the model (about 0.81) shows that more than 81% of the variation in eruption duration is explained by the waiting time. This supports the choice of a linear model.
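The \(R^2\) value is not part of the coefficient table printed above; it is stored in "summary(model)$r.squared". The sketch below also recomputes it as one minus the ratio of the residual to the total sum of squares and, since there is a single predictor, as the squared correlation between the two variables.

```r
library(datasets)

model <- lm(eruptions ~ waiting, data = faithful)

# R^2 as reported by summary.lm
r2 <- summary(model)$r.squared

# R^2 = 1 - SS_res / SS_tot
ss_res <- sum(residuals(model)^2)
ss_tot <- sum((faithful$eruptions - mean(faithful$eruptions))^2)
r2_manual <- 1 - ss_res / ss_tot

# With one predictor, R^2 equals the squared Pearson correlation
r2_cor <- cor(faithful$waiting, faithful$eruptions)^2

round(c(r2 = r2, manual = r2_manual, cor2 = r2_cor), 4)
```

All three values agree, at a little over 0.81.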

Eruption duration prediction

Finally, the regression model is used to predict the eruption duration from the waiting time entered by the user. As an example, here is the predicted eruption duration for a waiting time of 70 minutes.
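Plugging 70 minutes into the fitted equation by hand gives the same value, up to rounding of the coefficients:

\[\widehat{\mathtt{eruptions}} = 0.07563\times 70 - 1.87402 = 5.2941 - 1.87402 = 3.42008 \approx 3.42\]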

predict(model, data.frame(waiting = 70))[[1]]
## [1] 3.42

The predicted eruption duration for a waiting time of 70 minutes is therefore about 3.42 minutes.
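A point prediction says nothing about its uncertainty. "predict()" for "lm" objects accepts interval = "prediction", which adds lower and upper bounds (95% by default) for a single new eruption. The sketch below wraps this in a hypothetical helper, "predict_duration()", roughly the kind of computation a Shiny server function could run on the user's input. It also warns when the waiting time lies outside the range observed in the data, where the linear fit becomes an extrapolation. The helper's name and the warning are assumptions, not features of the original app.

```r
library(datasets)

model <- lm(eruptions ~ waiting, data = faithful)

# Hypothetical helper (not from the original app): predicted eruption
# duration, with a prediction interval, for one or more waiting times.
predict_duration <- function(waiting, level = 0.95) {
  observed <- range(faithful$waiting)
  if (any(waiting < observed[1] | waiting > observed[2])) {
    warning("waiting time outside the observed range; prediction is an extrapolation")
  }
  predict(model, newdata = data.frame(waiting = waiting),
          interval = "prediction", level = level)
}

# Columns: fit (point prediction), lwr and upr (interval bounds), in minutes
predict_duration(c(50, 70, 90))
```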