Classifier for Iris species

Summary

This R Markdown document shows the training and validation of a sample class-prediction model: a random forest that predicts the Iris species from four flower measurements. The code below, along with its comments, should be self-explanatory.

library(jsonlite)
library(randomForest)

# read the training data set 'D' for the class prediction model:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
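# stringsAsFactors=TRUE keeps the label a factor (required for classification; no longer the default since R 4.0)
D = read.csv(url, header=FALSE, stringsAsFactors=TRUE)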
names(D)[5]='label'

# train the model
set.seed(123)
model <- randomForest(label ~ ., data=D, na.action=na.omit)
# print performance metrics, incl. the out-of-bag (OOB) estimate of the out-of-sample error rate:
model

## 
## Call:
##  randomForest(formula = label ~ ., data = D, na.action = na.omit) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 4%
## Confusion matrix:
##                 Iris-setosa Iris-versicolor Iris-virginica class.error
## Iris-setosa              50               0              0        0.00
## Iris-versicolor           0              47              3        0.06
## Iris-virginica            0               3             47        0.06
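
The headline figures in this printout follow directly from the confusion matrix. As a quick check, the sketch below re-enters the printed counts by hand and recomputes the per-class and overall error rates. The names `cm`, `class_error`, and `oob_error` are illustrative only and are not part of the randomForest output.

```r
# Confusion matrix as printed above (rows: actual species, columns: OOB-predicted species).
species <- c('Iris-setosa', 'Iris-versicolor', 'Iris-virginica')
cm <- matrix(c(50,  0,  0,
                0, 47,  3,
                0,  3, 47),
             nrow = 3, byrow = TRUE, dimnames = list(species, species))

# Per-class error: share of each row that falls off the diagonal.
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall OOB error: share of all 150 samples that fall off the diagonal (6 of 150).
oob_error <- 1 - sum(diag(cm)) / sum(cm)

print(round(class_error, 2))
print(round(oob_error, 2))
```

Six misclassified samples out of 150 give the 4% OOB estimate, and three out of 50 give the 0.06 class error for versicolor and virginica.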

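# prepare the validation test set 'V'
V = fromJSON('example.json')
V = na.omit(V)
VI = as.data.frame(t(simplify2array(V$info)))  # one row per sample, one column per measurement
names(VI) = c('V1','V2','V3','V4')  # same predictor names as in D
V$info = NULL
V$id = NULL
VL = factor(V$label)
V$label = NULL
V = cbind(V,VI)
keep = complete.cases(V)  # drop incomplete rows from the predictors and the labels alike
V = V[keep,]   # the predictors V ...
VL = VL[keep]  # ... and corresponding labels VL to be used for validating the model

# predict labels of V using model, and compare against VL:
P = predict(model,V)
PL = as.factor(as.numeric(P)-1)  # recode species as 0, 1, 2 (factor level order), as the labels in V
# compare as character so that differing factor level sets cannot cause an error
accuracy = sum(as.character(PL)==as.character(VL))/length(VL)

# accuracy in range 0..1:
print(accuracy)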

## [1] 1
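
The validation code depends on two details of the data that are easy to miss: the four measurements of each sample arrive as a list column (`info`), and the labels are numeric codes rather than species names. The sketch below illustrates both steps on a hand-made stand-in for the parsed `example.json`. The three rows and the 0/1/2 label coding are assumptions inferred from the code above, not the actual file contents, and the vector `P` merely stands in for the output of `predict(model, V)`.

```r
# Hypothetical stand-in for the data frame returned by fromJSON('example.json'):
# one row per sample, with the four measurements held in a list column.
V <- data.frame(id = 1:3, label = c(0, 1, 2))
V$info <- list(c(5.1, 3.5, 1.4, 0.2),
               c(7.0, 3.2, 4.7, 1.4),
               c(6.3, 3.3, 6.0, 2.5))

# simplify2array() binds the vectors as columns (4 x n); t() turns that into n x 4.
VI <- as.data.frame(t(simplify2array(V$info)))
names(VI) <- c('V1', 'V2', 'V3', 'V4')

# Stand-in for predict(model, V): a factor whose levels are in alphabetical order.
species <- c('Iris-setosa', 'Iris-versicolor', 'Iris-virginica')
P <- factor(species, levels = species)

# as.numeric() returns the level index (1, 2, 3), so subtracting 1 yields 0, 1, 2.
PL <- as.numeric(P) - 1
accuracy <- mean(PL == V$label)
```

The recoding only works while the numeric labels follow the alphabetical order of the species names; a different coding in the JSON file would need an explicit lookup table instead.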

What’s next

Obviously, what’s shown above is a very basic use of R. The next challenges include:

classifying samples from streaming data sources,
continuously retraining the prediction model on the actual species labels of the samples it has classified, and
providing an API for using such a streaming, online-learning classifier.
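
As a rough illustration of the first two challenges, the sketch below replays R's built-in `iris` data frame as a simulated stream: each sample is classified first and used for training only afterwards, once its true label is "revealed". To keep the example short, it swaps the random forest for a simple nearest-centroid classifier that can be updated one sample at a time. The function names `new_classifier`, `classify`, and `learn` are hypothetical and not part of any package.

```r
# Hypothetical online nearest-centroid classifier (NOT the random forest used above).
new_classifier <- function() list(sums = list(), counts = list())

# Predict the label whose running centroid is closest to x; NA before any training.
classify <- function(clf, x) {
  if (length(clf$counts) == 0) return(NA_character_)
  dists <- sapply(names(clf$counts), function(k) {
    centroid <- clf$sums[[k]] / clf$counts[[k]]
    sum((x - centroid)^2)
  })
  names(dists)[which.min(dists)]
}

# Fold one labeled sample into the running per-class sums and counts.
learn <- function(clf, x, label) {
  if (is.null(clf$counts[[label]])) {
    clf$sums[[label]] <- x
    clf$counts[[label]] <- 1
  } else {
    clf$sums[[label]] <- clf$sums[[label]] + x
    clf$counts[[label]] <- clf$counts[[label]] + 1
  }
  clf
}

# Simulated stream: the built-in iris data in random order.
set.seed(123)
stream <- iris[sample(nrow(iris)), ]
clf <- new_classifier()
hits <- 0
for (i in seq_len(nrow(stream))) {
  x <- unlist(stream[i, 1:4], use.names = FALSE)
  truth <- as.character(stream$Species[i])
  guess <- classify(clf, x)                        # classify before the label is known
  if (!is.na(guess) && guess == truth) hits <- hits + 1
  clf <- learn(clf, x, truth)                      # then learn from the true label
}
prequential_accuracy <- hits / nrow(stream)
print(prequential_accuracy)
```

Scoring each sample before training on it gives an accuracy estimate that never uses a sample the classifier has already seen. An API for the third challenge could plausibly expose `classify` and `learn` as two separate calls, but that is beyond this sketch.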

Mark Sandstrom

Tuesday, February 02, 2016