1/21/2020

AirQuality Dataset

The “airquality” dataset built into base R contains daily air quality measurements taken in New York from May to September 1973. Its structure is as follows:

## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
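
The structure listing above can be reproduced in any R session. The short sketch below also counts the missing values that show up as NA in the listing; the names na_counts and n_complete are illustrative and do not come from the app.

```r
# Load the built-in dataset and print the same structure summary.
data(airquality)
str(airquality)

# Count missing values per column (Ozone and Solar.R contain NAs).
na_counts <- colSums(is.na(airquality))
print(na_counts)

# Number of rows with no missing value in any column.
n_complete <- sum(complete.cases(airquality))
print(n_complete)
```

Only the complete rows are usable for model fitting, which is why the count matters later on.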

These data provide a basis for exploring how the variables relate to one another.

This app was built to give a user an easy interface to explore this dataset, choose some predictor values, and calculate a predicted outcome using a simple randomForest model.

Exploratory Analysis

This Shiny app allows a user to explore this dataset using an interactive plot. The variables mapped to the X-axis, Y-axis, Color and Size are user-selectable, allowing the user to explore how the variables depend on each other.
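
As a rough illustration of the four selectable mappings, the sketch below draws a static scatter plot in base R graphics. It is not the app's plotting code: the helper name explore_plot, the 10-step palette and the point-size range are all assumptions made for this example.

```r
# Hypothetical helper: scatter plot with a user-chosen column for each aesthetic.
explore_plot <- function(data, x, y, color, size) {
  # Keep only rows that are complete in the four selected columns.
  d <- data[complete.cases(data[, c(x, y, color, size)]), ]

  # Bin the colour variable into 10 intervals and look up a palette colour.
  pal <- hcl.colors(10, "viridis")
  col_idx <- cut(d[[color]], breaks = 10, labels = FALSE)

  # Rescale the size variable linearly to point sizes between 0.5 and 3.
  rng <- range(d[[size]])
  cex <- 0.5 + 2.5 * (d[[size]] - rng[1]) / (rng[2] - rng[1])

  plot(d[[x]], d[[y]], col = pal[col_idx], cex = cex, pch = 19,
       xlab = x, ylab = y)
  invisible(d)  # return the plotted rows for inspection
}

plotted <- explore_plot(airquality, x = "Temp", y = "Ozone",
                        color = "Wind", size = "Solar.R")
```

Passing the column names as strings mirrors how a drop-down selection would arrive from the user interface.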

User Prediction Selection

Once the user has selected the variables, they are able to enter values for the predictors (X-axis, Color, Size) and predict an outcome (Y-axis).

The layout is deliberately simple: the user does not need to know anything about prediction algorithms or regression models.

To keep it simple, the code requires inputs for all three predictors, though with some additional time and effort, this could be modified to allow a user to supply values for any subset of them.
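
One possible way to relax the all-three requirement is sketched below. This is an assumption about how it could be done, not code from the app: the inputs vector, with NA marking a field left blank, and the names train_x and train_y are hypothetical.

```r
# Hypothetical user inputs: NA marks a predictor the user left blank.
inputs <- c(Temp = 80, Wind = NA, Solar.R = 200)
y <- "Ozone"

# Keep only the predictors that were actually supplied.
x <- names(inputs)[!is.na(inputs)]

# One-row data frame of predictor values, in the shape predict() expects.
pred <- as.data.frame(as.list(inputs[x]))

# Training data restricted to rows complete in the chosen columns.
# drop = FALSE keeps a data frame even when a single predictor remains.
rows <- complete.cases(airquality[, c(x, y)])
train_x <- airquality[rows, x, drop = FALSE]
train_y <- airquality[rows, y]
```

The resulting train_x, train_y and pred could then be passed to the model-fitting and prediction calls in place of the fixed three-predictor inputs.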

Results

The model is fit and the prediction calculated only when the user presses the Predict button. This keeps the interface responsive, because the server does not refit the randomForest model and recompute the prediction with every selection change.
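
In Shiny, this kind of button-gated computation is commonly written with eventReactive(). The sketch below shows the pattern only. The input IDs (x1_val, x2_val, x3_val, predict), the fixed choice of predictors and the use of lm() as a fast stand-in for the random forest fit are all assumptions and are not taken from the app's source.

```r
library(shiny)

# Hypothetical UI: one numeric input per predictor plus a Predict button.
ui <- fluidPage(
  numericInput("x1_val", "Temp", value = 80),
  numericInput("x2_val", "Wind", value = 10),
  numericInput("x3_val", "Solar.R", value = 200),
  actionButton("predict", "Predict"),
  verbatimTextOutput("result")
)

server <- function(input, output, session) {
  # eventReactive() re-runs its body only when input$predict changes,
  # i.e. when the button is pressed, not whenever a numeric input is edited.
  result <- eventReactive(input$predict, {
    pred <- data.frame(Temp = input$x1_val,
                       Wind = input$x2_val,
                       Solar.R = input$x3_val)
    # Stand-in model (assumption): a linear model instead of a random forest.
    # lm() drops rows with missing values by default.
    fit <- lm(Ozone ~ Temp + Wind + Solar.R, data = airquality)
    round(unname(predict(fit, pred)), 2)
  })

  output$result <- renderText(result())
}

# shinyApp(ui, server)  # uncomment to launch the sketch interactively
```

Because the numeric inputs are read inside the eventReactive() body rather than in its trigger expression, editing them does not start a new fit until the button is pressed again.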

Below is the code from server.R that fits the randomForest model and reports the prediction:

## Train a random forest model on the complete cases of the data.
## (x holds the predictor column names, y the outcome column name.)
model <- train(x=data[complete.cases(data),x],
               y=data[complete.cases(data),y], method="rf")
## Get the predicted value for the predictor values the user entered.
prediction <- predict(model, pred)
## Return the results to the user.
paste0("Predicted value of ",y,": ",round(prediction,2),
       "\nPredictors: ",x1,", ",x2,", ",x3,
       "\nUsing randomforest method inside caret package's train() function.",
       "\n(Only complete cases used for modeling.)",
       "\nTo make another prediction, 
       enter new values and select Predict, 
       or select new variables in the graph.")