Summary

This R Markdown document trains and validates a sample class prediction model: a random forest fitted to the UCI iris data set and then checked against a separate validation set. The code below, along with its comments, should be self-explanatory.

library(jsonlite)
library(randomForest)

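# read the training data set 'D' for the class prediction model:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
D = read.csv(url, header=FALSE)
# name the class column and make it a factor so that randomForest fits a classifier
# (since R 4.0, read.csv no longer converts strings to factors by default)
names(D)[5] = 'label'
D$label = factor(D$label)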

# train the model
set.seed(123)
model <- randomForest(label ~ ., data=D, na.action=na.omit)
# print the model summary, incl. the out-of-bag (OOB) estimate of its out-of-sample error rate:
model
## 
## Call:
##  randomForest(formula = label ~ ., data = D, na.action = na.omit) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 4%
## Confusion matrix:
##                 Iris-setosa Iris-versicolor Iris-virginica class.error
## Iris-setosa              50               0              0        0.00
## Iris-versicolor           0              47              3        0.06
## Iris-virginica            0               3             47        0.06
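# prepare the validation test set 'V'
V = fromJSON('example.json')
V = na.omit(V)
# 'info' holds the four predictor values of each record; spread them into columns V1..V4
VI = as.data.frame(t(simplify2array(V$info)))
names(VI) = c('V1','V2','V3','V4')
V$info = NULL
V$id = NULL
V = cbind(V,VI)
# drop incomplete rows before splitting off the labels, so predictors and labels stay aligned
V = na.omit(V)
# the predictors V and corresponding labels VL to be used for validating the model
VL = factor(V$label)
V$label = NULL

# predict labels of V using the model:
P = predict(model,V)
# convert predicted class names to 0-based codes (alphabetical level order);
# this assumes 'example.json' encodes its labels as 0, 1, 2 in that same order
PL = as.factor(as.numeric(P)-1)

# compare the predictions against VL (as character, so differing factor level sets cannot raise an error):
accuracy = sum(as.character(PL)==as.character(VL))/length(VL)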

# accuracy in range 0..1:
print(accuracy)
## [1] 1
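
The label conversion and accuracy computation above can be tried out on toy data. The following is a minimal sketch in base R, using made-up predictions and labels rather than the contents of example.json; the names `classes`, `pred`, `truth`, `codes`, and `confusion` are hypothetical and not part of the original code. It also builds a confusion table, which shows where any misclassifications fall rather than just how many there are.

```r
# Hypothetical predictions: a factor of class names, as predict() returns
# for a classification forest
classes <- c('Iris-setosa', 'Iris-versicolor', 'Iris-virginica')
pred <- factor(c('Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-virginica'),
               levels = classes)

# Hypothetical true labels, assumed to be encoded as 0, 1, 2
# in the same (alphabetical) class order
truth <- c(0, 2, 1, 1)

# as.numeric() on a factor returns 1-based level indices,
# so subtracting 1 yields 0-based codes
codes <- as.numeric(pred) - 1

# fraction of predictions that match the true labels
accuracy <- mean(codes == truth)

# confusion table: rows are true labels, columns are predicted labels
confusion <- table(truth = factor(truth, levels = 0:2),
                   predicted = factor(codes, levels = 0:2))

print(accuracy)
print(confusion)
```

Fixing the levels to `0:2` on both sides keeps the table square even when a class happens to be absent from the predictions or from the validation labels.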

What’s next

What’s shown above is a very basic use of R. The challenges include: