1/1/2017

Predicting iris species

Biologists use tools such as dichotomous keys to aid in species classification. The iris prediction app, found at https://emchasen.shinyapps.io/iris_prediction/, can help botanists distinguish between three species that are similar in appearance.

By measuring sepal length, sepal width, petal length, and petal width, we can predict whether the species is Iris setosa, I. versicolor, or I. virginica.

Data and packages used

This app was developed by fitting a random forest model to the iris dataset, using the caret package to build the model.

library("caret")
data("iris")
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Measurements as species predictors

Some measurements are much better predictors than others
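One way to check this claim is to ask how strongly each measurement differs between species. The sketch below uses a one-way ANOVA F statistic per measurement as a rough separation score. This is an illustrative choice, not necessarily the comparison the slide originally showed, and the names `measurements` and `f_stats` are made up for this example.

```r
data("iris")

# The four numeric measurements (every column except the species label)
measurements <- setdiff(names(iris), "Species")

# For each measurement, fit a one-way ANOVA against species and keep the
# F statistic: larger values mean the species means are further apart
# relative to the spread within each species.
f_stats <- sapply(measurements, function(m) {
  fit <- lm(iris[[m]] ~ iris$Species)
  anova(fit)[["F value"]][1]
})

# Rank the measurements from most to least separating
ranked <- sort(f_stats, decreasing = TRUE)
print(round(ranked, 1))
```

A high F statistic only says that the species means differ. It does not guarantee that a single measurement separates all three species cleanly, so it is best read alongside a plot such as `boxplot(Petal.Length ~ Species, data = iris)`.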

Model accuracy

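set.seed(2017)  # make the random split and model fit reproducible
# Hold out 30% of the flowers for testing, stratified by species
inTrain <- createDataPartition(y = iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]
testing <- iris[-inTrain, ]
# Fit a random forest on all four measurements
model1 <- train(Species ~ ., method = "rf", data = training)
# Predict the held-out flowers and report overall accuracy
pred <- predict(model1, newdata = testing)
confusionMatrix(pred, testing$Species)$overall[1]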
##  Accuracy 
## 0.9777778

Model accuracy on the held-out test set is 97.8% (44 of 45 flowers classified correctly)
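To show how a fitted model of this kind could turn a user's four measurements into a species prediction, here is a minimal sketch. The app's own source is not shown in these slides, so the helper `predict_species()` and the example measurements are hypothetical. The model is refit here only so that the block runs on its own.

```r
library(caret)  # train() with method = "rf" also needs the randomForest package

data("iris")
set.seed(2017)
in_train <- createDataPartition(y = iris$Species, p = 0.7, list = FALSE)
model <- train(Species ~ ., method = "rf", data = iris[in_train, ])

# Hypothetical helper: turn four measurements (in cm) into a predicted species.
# Column names must match those used when the model was trained.
predict_species <- function(model, sepal_length, sepal_width,
                            petal_length, petal_width) {
  new_flower <- data.frame(
    Sepal.Length = sepal_length,
    Sepal.Width  = sepal_width,
    Petal.Length = petal_length,
    Petal.Width  = petal_width
  )
  as.character(predict(model, newdata = new_flower))
}

# Made-up example measurements for a single flower
predicted <- predict_species(model, 6.0, 3.0, 4.5, 1.5)
print(predicted)
```

In a Shiny app, the four arguments would typically come from input widgets, with the prediction recomputed whenever a value changes.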