Teo Tse Tsong
13th January 2016
Using Fisher's Iris flower data set, a prediction model is derived upon which future predictions can be based. The data set consists of 150 observations, each with four measurements in centimetres (sepal length, sepal width, petal length, petal width) and a species label (setosa, versicolor or virginica).
The following plots allow the user to see how the four measurements correlate with one another.
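A minimal sketch of how such plots might be produced, assuming base R graphics (the original plotting code is not shown, so the choice of `pairs` and the name `cors` are assumptions):

```r
data(iris)

# Pairwise Pearson correlations between the four numeric measurements
cors <- cor(iris[, 1:4])
print(round(cors, 2))

# Scatterplot matrix of the four measurements, one colour per species
pairs(iris[, 1:4],
      col = as.integer(iris$Species),
      pch = 19,
      main = "Iris measurements by species")
```

The petal measurements tend to separate the species more cleanly than the sepal measurements, which is visible in the petal panels of the scatterplot matrix.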
The “Random Forest” method is chosen to build the model because it handles complex classification tasks and works well even when there are non-linear interactions between features.
First, the data is split into a training set (70%) and a testing set (30%).
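library(caret)         # createDataPartition(), train()
library(randomForest)  # backend used by method = "rf"
data(iris)
set.seed(12345)        # make the random split reproducible
# Stratified 70/30 split on Species
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing <- iris[-inTrain, ]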
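As a quick sanity check on the split, the sketch below repeats the partition and counts the rows and species in each set. It assumes the `caret` package is installed and adds nothing beyond base R's `nrow` and `table`.

```r
library(caret)
data(iris)

set.seed(12345)
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing <- iris[-inTrain, ]

# Number of observations in each set
print(nrow(training))
print(nrow(testing))

# Species counts per set: createDataPartition samples within each class,
# so the three species stay balanced in both sets
print(table(training$Species))
print(table(testing$Species))
```

Because the partition is stratified on `Species`, each species contributes the same number of flowers to the testing set, which is why every column of the confusion matrix later in the text sums to 15.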
Training the model is straightforward:
modfit <- train(Species ~ ., method = "rf", data = training)
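By default, `train` estimates accuracy by bootstrap resampling and tunes the `mtry` parameter of the random forest. The sketch below is a variation on the call above, not the configuration used here: it assumes 5-fold cross-validation via `trainControl`, and the names `ctrl` and `modfit_cv` are illustrative.

```r
library(caret)
library(randomForest)
data(iris)

set.seed(12345)
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]

# Assumption: 5-fold cross-validation instead of caret's default bootstrap
ctrl <- trainControl(method = "cv", number = 5)
modfit_cv <- train(Species ~ ., method = "rf", data = training,
                   trControl = ctrl)

# Resampled accuracy for each candidate value of mtry
print(modfit_cv$results)

# The value of mtry that caret selected for the final model
print(modfit_cv$bestTune)
```

Inspecting `results` gives an estimate of out-of-sample accuracy before the testing set is touched.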
The model is then used to predict the species of each observation in the testing set, with the Species column (column 5) removed:
predicted <- predict(modfit, testing[, -5])
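The same `predict` call works for a single new observation. The sketch below retrains the model so that it runs on its own, then scores one hypothetical flower; the measurements in `new_flower` are made up for illustration.

```r
library(caret)
library(randomForest)
data(iris)

set.seed(12345)
inTrain <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
modfit <- train(Species ~ ., method = "rf", data = training)

# Hypothetical new flower; column names must match the training data
new_flower <- data.frame(Sepal.Length = 6.0, Sepal.Width = 3.0,
                         Petal.Length = 4.5, Petal.Width = 1.5)

# Predicted class label
pred <- predict(modfit, new_flower)
print(pred)

# Class probabilities (share of trees voting for each species)
probs <- predict(modfit, new_flower, type = "prob")
print(probs)
```

The probabilities indicate how confident the forest is, which can be more informative than the label alone for flowers near the versicolor/virginica boundary.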
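The quality of the predictions can be evaluated with a confusion matrix, in which rows are predicted species and columns are actual species. As can be seen from the table below, only 1 of the 45 test observations is misclassified (a virginica flower predicted as versicolor), giving an accuracy of 44/45, or about 97.8%.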
table(predicted, testing$Species)

predicted    setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         15         1
  virginica       0          0        14
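The figures in the table can be checked by hand. The sketch below re-enters the confusion matrix as a base R matrix and derives accuracy, per-class recall and per-class precision; the variable names are illustrative.

```r
# Confusion matrix from the text: rows = predicted, columns = actual.
# matrix() fills column by column, so each line below is one actual class.
species <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(15,  0,  0,    # actual setosa
                0, 15,  0,    # actual versicolor
                0,  1, 14),   # actual virginica
             nrow = 3,
             dimnames = list(predicted = species, actual = species))

# Overall accuracy: correct predictions (diagonal) over all predictions
accuracy <- sum(diag(cm)) / sum(cm)

# Recall per class: correct predictions over actual members of the class
recall <- diag(cm) / colSums(cm)

# Precision per class: correct predictions over predictions of the class
precision <- diag(cm) / rowSums(cm)

print(round(accuracy, 4))
print(round(recall, 4))
print(round(precision, 4))
```

With the model objects in memory, caret's `confusionMatrix(predicted, testing$Species)` reports these statistics directly.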
The Shiny app version of this prediction model is deployed at: https://tsetsong.shinyapps.io/Project/