A Prediction of Flower Species using the Iris Data Set in R

Teo Tse Tsong
13th January 2016

Scope

Using Fisher's Iris flower data set, a prediction model is derived upon which future predictions can be based. The data set consists of 150 observations (50 for each of three species), each with the following four attributes:

Sepal length
Sepal width
Petal length
Petal width

Visualizing the Data

The following plots allow the user to see how the four attributes in the data relate to one another.

[Figure: plots of the four attributes]
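The code that produced the original figure is not shown. As a minimal sketch, assuming a scatter-plot matrix is an acceptable stand-in, the relationships between the attributes could be inspected as follows (the name `cor_matrix` and the plot title are illustrative choices, not part of the original report):

```r
data(iris)

# Pairwise Pearson correlations between the four numeric attributes
cor_matrix <- cor(iris[, 1:4])
print(round(cor_matrix, 2))

# Scatter-plot matrix of the four attributes, one colour per species
pairs(iris[, 1:4], col = as.integer(iris$Species), pch = 19,
      main = "Iris attributes by species")
```

Petal length and petal width are strongly correlated, which is one reason the species separate well on the petal measurements.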

Data Partitioning

The “Random Forest” method is chosen to build the model because it handles complex classification tasks and works well even when there are non-linear interactions between features.

First, the data is split into a training set (70% of the observations, stratified by species) and a testing set (the remaining 30%).

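library(caret)
library(randomForest)
data(iris)
# Fix the random seed so the partition is reproducible
set.seed(12345)
# Row indices for the training set: 70% of each species
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
# All remaining rows form the testing set
testing <- iris[-inTrain, ]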
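As a quick sanity check (a sketch, assuming the `caret` package is installed), the sizes of the two sets and the number of observations per species can be printed. Because `createDataPartition` samples within each species, both sets should be balanced across the three classes:

```r
library(caret)

data(iris)
set.seed(12345)
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing <- iris[-inTrain, ]

# Number of rows and columns in each set
print(dim(training))
print(dim(testing))

# Observations per species in each set
print(table(training$Species))
print(table(testing$Species))
```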

Model Building

Training the model is straightforward:

modfit<-train(Species~.,method="rf",data=training)
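The call above relies on caret's default resampling scheme (bootstrap). As a hedged variation, not the configuration used in this report, the resampling method can be set explicitly and the fitted model inspected. The names `ctrl` and `modfit_cv` and the choice of 5-fold cross-validation are assumptions made for illustration:

```r
library(caret)
library(randomForest)

data(iris)
set.seed(12345)
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]

# Assumption: 5-fold cross-validation instead of the default bootstrap
ctrl <- trainControl(method = "cv", number = 5)
modfit_cv <- train(Species ~ ., method = "rf", data = training,
                   trControl = ctrl)

print(modfit_cv$results)   # resampled accuracy for each candidate mtry
print(modfit_cv$bestTune)  # the mtry value caret selected
print(varImp(modfit_cv))   # relative importance of the four predictors
```

Inspecting `results` and `bestTune` shows how `train` tuned the `mtry` parameter of the random forest, which the one-line call leaves implicit.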

The trained model is then used to predict the species of the testing observations (column 5, Species, is dropped before prediction):

predicted<-predict(modfit,testing[,-5])

Testing Results

The quality of the predictions can be evaluated with a confusion matrix. As the table below shows, only 1 of the 45 testing observations is misclassified: one virginica flower is predicted as versicolor.

table(predicted,testing$Species)


predicted    setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         15         1
  virginica       0          0        14
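The summary figures implied by this table can be worked out directly. The sketch below re-enters the reported counts by hand (rows are predictions, columns are actual species) and computes overall accuracy together with per-species recall and precision. The names `conf_mat`, `accuracy`, `recall` and `precision` are illustrative:

```r
# Counts copied from the confusion matrix above
# (rows: predicted species, columns: actual species)
species <- c("setosa", "versicolor", "virginica")
conf_mat <- matrix(
  c(15,  0,  0,
     0, 15,  1,
     0,  0, 14),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = species, actual = species)
)

accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)  # correct / total
recall    <- diag(conf_mat) / colSums(conf_mat)   # correct / actual count
precision <- diag(conf_mat) / rowSums(conf_mat)   # correct / predicted count

print(round(accuracy, 3))
print(round(recall, 3))
print(round(precision, 3))
```

Overall accuracy is 44/45, or about 97.8%. With the fitted model available, `confusionMatrix(predicted, testing$Species)` from caret reports the same kind of statistics without manual entry.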

The Shiny app version of this prediction model is deployed at: https://tsetsong.shinyapps.io/Project/