A Prediction of Flower Species using the Iris Data Set in R

Teo Tse Tsong
13th January 2016

Scope

Using Fisher's Iris flower data set, a prediction model is derived upon which future predictions can be based. The data set consists of 150 observations, each labelled with its species and described by the following four measured attributes:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width
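
As a quick check of the description above, the structure of the data set can be inspected with base R. This is a minimal sketch using the `iris` data frame that ships with R; the variable names `n_obs` and `attrs` are introduced here for illustration only.

```r
# Load the built-in iris data set (part of base R's datasets package)
data(iris)

# Number of observations and names of the four measured attributes
n_obs <- nrow(iris)
attrs <- names(iris)[1:4]

print(n_obs)
print(attrs)

# Number of observations per species
print(table(iris$Species))
```

The fifth column, `Species`, is the label that the model is trained to predict.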

Visualizing the Data

The following plots allow the user to see the correlations among the four attributes in the data.

[Figure: plots of the four attributes]
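
The plotting code is not shown in the original. One way such a plot could be produced, as a sketch and not necessarily the author's method, is a scatterplot matrix from base R's `pairs()`, colored by species, together with the correlation matrix from `cor()`. The name `cor_matrix` is illustrative.

```r
data(iris)

# Scatterplot matrix of the four numeric attributes, one color per species
pairs(iris[, 1:4], col = as.integer(iris$Species), pch = 19,
      main = "Iris attributes by species")

# Pairwise Pearson correlations between the four attributes
cor_matrix <- cor(iris[, 1:4])
print(round(cor_matrix, 2))
```

In this data set, petal length and petal width are strongly positively correlated, which shows up as a tight diagonal band in the corresponding panel.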

Data Partitioning

The “Random Forest” method is chosen to build the model because it handles complex classification tasks and works well even when there are non-linear interactions between features.

First, the data is split into a training set (70%) and a testing set (30%), stratified by species.

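library(caret)          # createDataPartition(), train()
library(randomForest)   # backend used by method = "rf"
data(iris)
set.seed(12345)         # make the random split reproducible

# Row indices for a 70% training sample, drawn within each species
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing <- iris[-inTrain, ]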
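
To see what a stratified split does, the idea can be sketched in base R by sampling 70% of the row indices within each species. This is an illustration of the concept, not caret's actual implementation, and the names `idx_by_species`, `train_idx`, `train_sketch`, and `test_sketch` are hypothetical. Because the random draws differ, the selected rows will not match those chosen by `createDataPartition`.

```r
data(iris)
set.seed(12345)

# Row indices grouped by species
idx_by_species <- split(seq_len(nrow(iris)), iris$Species)

# Sample 70% of the rows within each species (35 of 50)
train_idx <- unlist(lapply(idx_by_species, function(idx) {
  sample(idx, size = round(0.7 * length(idx)))
}))

train_sketch <- iris[train_idx, ]
test_sketch  <- iris[-train_idx, ]

# Each species should appear 35 times in training and 15 times in testing
print(table(train_sketch$Species))
print(table(test_sketch$Species))
```

Sampling within each species keeps the three classes balanced in both sets, which is why the test set contains 15 flowers of each species.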

Model Building

Training the model is straightforward:

modfit <- train(Species ~ ., method = "rf", data = training)
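
Once trained, the fitted object can be inspected to see what caret selected. The following is a minimal sketch, assuming the `caret` and `randomForest` packages are installed; it repeats the split and training call so that it runs on its own. With no `trControl` given, `train` resamples the training set by bootstrap and picks the value of `mtry` with the best resampled accuracy.

```r
library(caret)
library(randomForest)
data(iris)
set.seed(12345)

# Same 70/30 stratified split and training call as in the text
inTrain  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
modfit   <- train(Species ~ ., method = "rf", data = training)

# Resampling summary for each candidate value of mtry
print(modfit)

# The value of mtry that was selected
print(modfit$bestTune)

# Relative importance of the four attributes in the final model
print(varImp(modfit))
```

The exact accuracy figures and importance scores depend on the random seed and package versions, so no specific output is shown here.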

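The model is then used to predict the species of the observations in the testing set. The fifth column (Species) is dropped so that only the four measurements are passed to the model:

predicted <- predict(modfit, testing[, -5])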

Testing Results

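The quality of the predictions can be evaluated using a confusion matrix, in which rows are the predicted species and columns are the actual species. As the table below shows, only 1 of the 45 test observations is misclassified (a virginica flower predicted as versicolor), an accuracy of about 97.8%.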

table(predicted,testing$Species)

predicted    setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         15         1
  virginica       0          0        14
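
The summary figures can be derived directly from the table. The sketch below re-enters the reported counts by hand (so it runs without the fitted model) and computes overall accuracy along with per-class recall and precision; the names `conf_mat`, `accuracy`, `recall`, and `precision` are introduced here for illustration.

```r
# Confusion matrix as reported in the text (rows: predicted, columns: actual)
species <- c("setosa", "versicolor", "virginica")
conf_mat <- matrix(c(15,  0,  0,
                      0, 15,  1,
                      0,  0, 14),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(predicted = species, actual = species))

# Overall accuracy: correct predictions (diagonal) over all predictions
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions over actual members of each class
recall <- diag(conf_mat) / colSums(conf_mat)

# Per-class precision: correct predictions over predictions of each class
precision <- diag(conf_mat) / rowSums(conf_mat)

print(round(accuracy, 3))
print(round(recall, 3))
print(round(precision, 3))
```

Accuracy works out to 44/45, recall for virginica to 14/15, and precision for versicolor to 15/16. With the fitted objects available, caret's `confusionMatrix(predicted, testing$Species)` reports these and related statistics in one call.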

A Shiny app version of this prediction model is deployed at: https://tsetsong.shinyapps.io/Project/