Predict with lm and faithful R-dataset

Anat Kedem
April 25, 2015

Predict Waiting Time for the Next Geyser Eruption

Interactive Shiny application to predict with lm model

OVERVIEW

Prepared as an assignment in the Coursera course Developing Data Products.

Description of the Shiny Application

The Shiny page includes static and interactive elements:

  • Chart
    • A scatter plot shows the faithful R dataset, with eruption duration on the x axis and waiting time on the y axis; a red regression line shows the fitted lm model.
    • A small blue square and a blue line interactively show the predicted value for a given eruption time and the tolerance (prediction) interval for that prediction.
  • Input/Output
    • Two scrollbars let the user set the eruption time and the tolerance level for the prediction interval.
    • Three text boxes show the predicted waiting time and the lower and upper bounds of the prediction interval.
  • The next two slides describe the calculations behind the application.
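
The chart elements listed above can be sketched in base R graphics. This is only an illustration, not the application's actual source: the helper name `plot_prediction` and its arguments `x_new` and `level` are hypothetical, and the axis labels assume the units given in the faithful documentation (minutes).

```r
library(datasets)  # provides the faithful data set

# Hypothetical helper (not the original app code): draws the chart for one
# eruption duration x_new and one prediction-interval level.
plot_prediction <- function(x_new = 3, level = 0.9) {
  fit <- lm(waiting ~ eruptions, data = faithful)
  pred <- predict(fit, data.frame(eruptions = x_new),
                  interval = "prediction", level = level)

  # Static part: scatter plot of the data and the red regression line
  plot(faithful$eruptions, faithful$waiting,
       xlab = "Eruption duration (min)", ylab = "Waiting time (min)")
  abline(fit, col = "red", lwd = 2)

  # Interactive part: blue line for the interval, blue square for the fit
  segments(x_new, pred[1, "lwr"], x_new, pred[1, "upr"], col = "blue", lwd = 2)
  points(x_new, pred[1, "fit"], pch = 15, col = "blue")

  invisible(pred)  # return the fit and bounds so they can be displayed as text
}

pred <- plot_prediction(x_new = 3, level = 0.9)
```

In a Shiny server function, a helper like this could plausibly be called inside `renderPlot()`, with the two arguments taken from the input controls.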

The “lm” Model

The lm function creates a linear regression model that can be used to predict new values. The fitted model also contains information such as the Residual Standard Error (RSE) and the coefficients of the prediction equation.
RSE equation: \( S_{y}=\sqrt{\frac{\sum_{i=1}^{n} (y_{i}-\hat{y}_{i})^2}{(n-2)}}=5.91 \)
prediction: \( \hat{y}=10.73\cdot x^* + 33.47 \)

where \( y_{i} \) is each waiting value in faithful, \( \hat{y}_{i} \) is the corresponding predicted waiting time, \( (n-2) \) is the number of degrees of freedom, and \( x^* \) is the new eruption value to predict with. See code below:

library(datasets); data(faithful)
modFit <- lm(waiting ~ eruptions, data = faithful)      #simple linear regression
summary(modFit)$coefficients[1:2]                       #intercept and slope
[1] 33.47440 10.72964
summary(modFit)$sigma                                   #model RSE
[1] 5.914009
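
As a cross-check, the RSE and the prediction equation can be reproduced by hand from the fitted model. The sketch below refits the model so that it runs on its own; the names `rse_manual`, `rse_model`, `x_new`, and `y_hat` are introduced here for illustration only.

```r
library(datasets)  # provides the faithful data set

modFit <- lm(waiting ~ eruptions, data = faithful)
n <- nrow(faithful)  # number of observations

# RSE from its definition: residual sum of squares over n - 2 degrees of freedom
rse_manual <- sqrt(sum(residuals(modFit)^2) / (n - 2))
rse_model <- summary(modFit)$sigma
all.equal(rse_manual, rse_model)  # the two values should agree

# Prediction from the fitted equation for a new eruption duration x_new
x_new <- 3
b <- coef(modFit)
y_hat <- b[["(Intercept)"]] + b[["eruptions"]] * x_new
y_hat
```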

The Prediction Interval

The predict function uses the model coefficients, the mean of the predictor, the RSE, etc. to predict the waiting time and its interval for a new eruption.
The Shiny application interactively shows one predicted value at a time, so the interval presented is the prediction interval (tolerance) for a single new observation.
Tolerance equation:
\[ \hat{y}\pm t_{n-2}^*\cdot S_{y}\cdot\sqrt{1+\frac{1}{n}+\frac{(x^*-\bar{x})^2}{(n-1)S_x^2}}=55.88 \text{ to } 75.45 \]
for \( x^*=3 \) (new value) and a tolerance level of 0.9, where \( t_{n-2}^* \) is the t value for the given tolerance level and degrees of freedom (270), \( \bar{x} \) is the mean of eruptions in the faithful data, and \( S_x \) is the standard deviation of eruptions. See code below:

predict(modFit, data.frame(eruptions=3), interval="prediction", level=0.9)
       fit      lwr     upr
1 65.66332 55.88094 75.4457
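
The tolerance equation can also be evaluated term by term and compared with the predict output. This is a minimal sketch that refits the model so it runs on its own; the variable names (`t_star`, `half_width`, `manual`, and so on) are chosen here for illustration.

```r
library(datasets)  # provides the faithful data set

modFit <- lm(waiting ~ eruptions, data = faithful)

x_new <- 3    # new eruption duration
level <- 0.9  # tolerance level of the prediction interval

n <- nrow(faithful)
s_y <- summary(modFit)$sigma      # residual standard error
x_bar <- mean(faithful$eruptions) # mean of the predictor
s_x2 <- var(faithful$eruptions)   # sample variance of the predictor

# Two-sided t quantile with n - 2 degrees of freedom
t_star <- qt(1 - (1 - level) / 2, df = n - 2)

b <- coef(modFit)
y_hat <- b[["(Intercept)"]] + b[["eruptions"]] * x_new

# Half-width of the prediction interval, following the tolerance equation
half_width <- t_star * s_y *
  sqrt(1 + 1 / n + (x_new - x_bar)^2 / ((n - 1) * s_x2))

manual <- c(fit = y_hat, lwr = y_hat - half_width, upr = y_hat + half_width)
manual

# Reference values from predict() for comparison
reference <- predict(modFit, data.frame(eruptions = x_new),
                     interval = "prediction", level = level)
```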

Link to The Old Faithful Geyser

National Park Service site; you can also click on the links below:

click inside the iframe