Tom Withey
30/11/19
R ships with the faithful data set, which records the eruption time and the waiting time between eruptions for the Old Faithful geyser.
I have created a Shiny app which uses the Old Faithful data to do the following:
The proportion of data to be used for training is set using an interactive slider, and all plots and outputs are updated accordingly when the 'submit' button is pressed.
Based on the slider input value, the following code (which is made reactive in the server code) creates training and test sets, builds a linear model and calculates the values predicted by the model:
library(caret)
data(faithful)
train_prop <- 0.5 # normally set using the slider; in the server code this is input$train_prop
set.seed(333)
inTrain <- createDataPartition(y=faithful$waiting,p=train_prop,list=FALSE)
trainFaith <- faithful[inTrain,]
testFaith <- faithful[-inTrain,] # the remaining rows form the test set
lm1 <- lm(eruptions ~ waiting,data=trainFaith)
trainFaith$preds <- predict(lm1)
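As a rough sketch of how the code above might be made reactive, the split and fit can be wrapped in a helper and called from eventReactive, so that nothing is recomputed until the button is pressed. The input ID train_prop comes from the comment above; the button ID submit, the helper fit_faithful and the output ID train_rmse are hypothetical names chosen for illustration, not necessarily those used in the app.

```r
library(shiny)
library(caret)

# Hypothetical helper: split the data and fit the model for a given proportion
fit_faithful <- function(train_prop, seed = 333) {
  set.seed(seed)
  inTrain <- createDataPartition(y = faithful$waiting, p = train_prop, list = FALSE)
  trainFaith <- faithful[inTrain, ]
  testFaith <- faithful[-inTrain, ]
  lm1 <- lm(eruptions ~ waiting, data = trainFaith)
  trainFaith$preds <- predict(lm1)                     # fitted values on the training rows
  testFaith$preds <- predict(lm1, newdata = testFaith) # predictions on the held-out rows
  list(model = lm1, train = trainFaith, test = testFaith)
}

server <- function(input, output, session) {
  # Recompute only when the (assumed) 'submit' button is pressed
  fit <- eventReactive(input$submit, {
    fit_faithful(input$train_prop)
  })

  # Assumed output ID; reports the training-set RMSE as text
  output$train_rmse <- renderText({
    d <- fit()$train
    format(sqrt(mean((d$preds - d$eruptions)^2)), digits = 4)
  })
}
```

Keeping the modelling code in a plain function means it can be run and checked outside Shiny, for example with fit_faithful(0.5).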
The following code plots the training data (similar code plots the test data)
library(plotly)
p1 <- plot_ly(trainFaith,x=~waiting) %>%
  add_trace(y=~eruptions,type='scatter',mode='markers',name="Training data") %>%
  # a literal colour goes in line=list(color=...); color= is for mapping data values to colours
  add_trace(y=~preds,type='scatter',mode='lines',name="Fitted model",line=list(color='orange'))
Finally, the following code reports the linear model parameters and the root mean squared error (RMSE) on the training set (similar code reports the error on the test set):
modvals <- lm1$coefficients
paste0("Intercept: ", format(round(modvals[1],2),nsmall=2), " Slope: ",format(round(modvals[2],4),nsmall=4))
[1] "Intercept: -1.65 Slope: 0.0722"
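# print the intercept with an explicit sign so the equation also reads correctly when the intercept is positive
paste0("Eruption time = waiting time * ",format(round(modvals[2],4),nsmall=4),if (modvals[1] < 0) " - " else " + ",format(abs(round(modvals[1],2)),nsmall=2))
[1] "Eruption time = waiting time * 0.0722 - 1.65"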
sqrt(mean((trainFaith$preds-trainFaith$eruptions)^2))
[1] 0.4904734
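The test-set error mentioned above could be computed along the following lines. This is a sketch rather than the app's actual code: the names testFaith and rmse_test are assumptions, and the proportion is fixed at 0.5 as in the example. The key difference from the training case is that predict needs newdata to score rows the model has not seen.

```r
library(caret)
data(faithful)

set.seed(333)
train_prop <- 0.5 # assumed slider value, as in the example above
inTrain <- createDataPartition(y = faithful$waiting, p = train_prop, list = FALSE)
trainFaith <- faithful[inTrain, ]
testFaith <- faithful[-inTrain, ] # held-out rows (assumed name)

lm1 <- lm(eruptions ~ waiting, data = trainFaith)

# Without newdata, predict() would return the fitted values for the training rows
testFaith$preds <- predict(lm1, newdata = testFaith)

# Root mean squared error on the held-out rows
rmse_test <- sqrt(mean((testFaith$preds - testFaith$eruptions)^2))
rmse_test
```

Comparing this value with the training RMSE for different slider settings gives a quick sense of how well the model generalises.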