Machine Learning with Iris Dataset

Sarthak Dasadia
August 17, 2016

Iris Data Set

The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by Ronald Fisher in his 1936 paper.
The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), for 150 samples in total. Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.

Dataset

Let's have a quick look at the dataset.

data(iris)
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

As you can see, we have four features (sepal length, sepal width, petal length and petal width) with which to predict the species.
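
Beyond the first six rows, a few base R calls confirm the overall shape of the data. This is only a quick sanity check on the built-in iris data frame, not part of the original analysis:

```r
data(iris)

# 150 rows (samples) and 5 columns (4 features plus the Species label)
dim(iris)

# Number of samples per species: 50 each
table(iris$Species)

# Minimum and maximum of each numeric feature, in centimetres
sapply(iris[, 1:4], range)
```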

Machine Learning with Iris Dataset

In this application, we will use two methods to predict the species of an iris flower.

i. Random Forest Method

Random forests operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
Reference: https://en.wikipedia.org/wiki/Random_forest
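
A minimal sketch of this approach, assuming the randomForest package is installed. The 70/30 train/test split, the seed, and the number of trees are assumptions chosen for illustration; the original write-up does not state which settings were used.

```r
library(randomForest)  # assumption: install.packages("randomForest") has been run

data(iris)
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

# Assumed 70/30 split into training and test sets
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a forest of 500 trees; each tree votes and the majority class wins
rf_model <- randomForest(Species ~ ., data = train, ntree = 500)

# Predict species for the held-out samples
rf_pred <- predict(rf_model, newdata = test)

# Confusion matrix and overall accuracy on the test set
rf_confusion <- table(Predicted = rf_pred, Actual = test$Species)
rf_accuracy  <- mean(rf_pred == test$Species)

print(rf_confusion)
print(rf_accuracy)
```

The accuracy obtained this way depends on the random split, so it will not necessarily match the figures reported in the Results section.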

ii. Decision Tree Method

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Used as a classifier, as it is here, each internal node tests one feature and each leaf assigns a species.
Reference: https://en.wikipedia.org/wiki/Decision_tree
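
A minimal sketch of a classification tree, assuming the rpart package (one of R's recommended packages) is used. The original write-up does not name its tree implementation, and the 70/30 split and seed below are likewise assumptions for illustration.

```r
library(rpart)  # assumption: rpart is the tree implementation

data(iris)
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

# Assumed 70/30 split into training and test sets
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a single classification tree on all four features
tree_model <- rpart(Species ~ ., data = train, method = "class")

# Predict a class label (rather than class probabilities) for each test sample
tree_pred <- predict(tree_model, newdata = test, type = "class")

# Confusion matrix and overall accuracy on the test set
tree_confusion <- table(Predicted = tree_pred, Actual = test$Species)
tree_accuracy  <- mean(tree_pred == test$Species)

print(tree_model)      # text view of the splits chosen by the tree
print(tree_confusion)
print(tree_accuracy)
```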

Results

Both methods give accurate results, but the Random Forest method is more accurate than the Decision Tree method.

i. Random Forest

It classified nearly all of the test data correctly, with an accuracy above 98%.

ii. Decision Tree

The accuracy ranges between 90% and 95%.
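
An accuracy figure quoted as a range usually reflects variation across random train/test splits. The sketch below shows one way such a range could be estimated for a decision tree, assuming the rpart package. The helper split_accuracy, the 70/30 split, the seed, and the 20 repetitions are all hypothetical choices, not taken from the original analysis, so the resulting numbers need not match those above.

```r
library(rpart)  # assumption: rpart is the tree implementation

data(iris)
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch

# Hypothetical helper: fit a tree on one random split and
# return its accuracy on the held-out samples
split_accuracy <- function(train_frac = 0.7) {
  idx  <- sample(nrow(iris), size = round(train_frac * nrow(iris)))
  fit  <- rpart(Species ~ ., data = iris[idx, ], method = "class")
  pred <- predict(fit, newdata = iris[-idx, ], type = "class")
  mean(pred == iris$Species[-idx])
}

# Repeat over 20 random splits and summarise the spread
acc <- replicate(20, split_accuracy())
range(acc)  # lowest and highest test accuracy observed
mean(acc)   # average test accuracy
```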