Tom Withey
30/11/19
R includes the faithful data set, which records the eruption time and the waiting time between eruptions for the Old Faithful geyser.
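I have created a Shiny app that uses the Old Faithful data to split the data into training and test sets in a proportion chosen with a slider, fit a linear model of eruption time against waiting time, plot the fitted model, and report the model parameters and the root mean squared error.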
The server and UI files for building the app can be found on my GitHub page: https://github.com/tw81/shiny_assignment
Based on the slider input value, the following code (which is made reactive in the server code) creates training and test sets, builds a model and calculates the values predicted by the model:
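library(caret)
data(faithful)
set.seed(333) # fixed seed so the split is reproducible
train_prop <- 0.5 # stands in for input$train_prop, which the slider sets in the server code
inTrain <- createDataPartition(y = faithful$waiting, p = train_prop, list = FALSE)
trainFaith <- faithful[inTrain, ] # rows selected for training
testFaith <- faithful[-inTrain, ] # remaining rows form the test set
lm1 <- lm(eruptions ~ waiting, data = trainFaith)
trainFaith$preds <- predict(lm1) # fitted values on the training set
testFaith$preds <- predict(lm1, newdata = testFaith) # predictions on the test set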
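As a rough illustration of what "made reactive" could look like, here is a minimal sketch of a server function that wraps the split and the model fit in a reactive expression. Only input$train_prop comes from the description above. The UI layout, the slider range, the output id rmse and the reactive name fit are assumptions made for this example, and the actual files in the repository may be organised differently.

```r
library(shiny)
library(caret)

# Hypothetical UI: one slider whose id matches input$train_prop.
ui <- fluidPage(
  sliderInput("train_prop", "Proportion of data used for training",
              min = 0.1, max = 0.9, value = 0.5, step = 0.05),
  verbatimTextOutput("rmse")
)

server <- function(input, output, session) {
  # Re-evaluated whenever the slider value changes.
  fit <- reactive({
    set.seed(333)
    inTrain <- createDataPartition(y = faithful$waiting,
                                   p = input$train_prop, list = FALSE)
    trainFaith <- faithful[inTrain, ]
    testFaith  <- faithful[-inTrain, ]
    lm1 <- lm(eruptions ~ waiting, data = trainFaith)
    list(model = lm1, train = trainFaith, test = testFaith)
  })

  # Anything that calls fit() is refreshed automatically after a slider change.
  output$rmse <- renderPrint({
    f <- fit()
    sqrt(mean((predict(f$model) - f$train$eruptions)^2))
  })
}

# shinyApp(ui, server)  # uncomment to launch the app interactively
```

Keeping the split and the fit inside one reactive expression means the plots and the reported error always come from the same partition of the data.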
The following code plots the training data (similar code plots the test data):
library(plotly)
p1 <- plot_ly(trainFaith, x = ~waiting) %>%
  add_trace(y = ~eruptions, type = 'scatter', mode = 'markers', name = "Training data") %>%
  # a literal colour goes in line = list(...); a bare color = 'orange' is treated as data to map
  add_trace(y = ~preds, type = 'scatter', mode = 'lines', name = "Fitted model", line = list(color = 'orange'))
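The "similar code" for the test data might look like the sketch below. It assumes the test set is simply the rows left out of the training partition, and the names testFaith and p2 are chosen for this example. The one real difference from the training plot is that the predictions need newdata, since predict(lm1) on its own returns fitted values for the training rows only.

```r
library(caret)
library(plotly)

data(faithful)
set.seed(333)
inTrain <- createDataPartition(y = faithful$waiting, p = 0.5, list = FALSE)
trainFaith <- faithful[inTrain, ]
testFaith  <- faithful[-inTrain, ]  # held-out rows
lm1 <- lm(eruptions ~ waiting, data = trainFaith)

# Predict on rows the model has not seen.
testFaith$preds <- predict(lm1, newdata = testFaith)

p2 <- plot_ly(testFaith, x = ~waiting) %>%
  add_trace(y = ~eruptions, type = "scatter", mode = "markers",
            name = "Test data") %>%
  add_trace(y = ~preds, type = "scatter", mode = "lines",
            name = "Fitted model", line = list(color = "orange"))
```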
Finally, the following code reports the linear model parameters and the root mean squared error (RMSE) on the training set (similar code reports the error on the test set):
modvals <- lm1$coefficients
paste0("Eruption time = waiting time * ",format(round(modvals[2],4),nsmall=4)," ",format(round(modvals[1],2),nsmall=2))
[1] "Eruption time = waiting time * 0.0722 -1.65"
sqrt(mean((trainFaith$preds-trainFaith$eruptions)^2))
[1] 0.4904734
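For the test-set error, a sketch along the following lines would work, assuming the test set is the complement of the training partition. The helper rmse is a name introduced here for illustration and is not part of the app. No expected values are shown because the exact numbers depend on the partition drawn.

```r
library(caret)

data(faithful)
set.seed(333)
inTrain <- createDataPartition(y = faithful$waiting, p = 0.5, list = FALSE)
trainFaith <- faithful[inTrain, ]
testFaith  <- faithful[-inTrain, ]
lm1 <- lm(eruptions ~ waiting, data = trainFaith)

# Hypothetical helper: root mean squared error of predictions against observations.
rmse <- function(observed, predicted) sqrt(mean((predicted - observed)^2))

train_rmse <- rmse(trainFaith$eruptions, predict(lm1))
test_rmse  <- rmse(testFaith$eruptions, predict(lm1, newdata = testFaith))
c(train = train_rmse, test = test_rmse)
```

Comparing the two values gives a quick check on overfitting: a test error far above the training error would suggest the model does not generalise well.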