Iris Data Set

Introduction

The Iris flower data set, also known as Fisher’s Iris data set, is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis.

Preview of the Iris Dataset

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
         5.1          3.5           1.4          0.2  setosa
         4.9          3.0           1.4          0.2  setosa
         4.7          3.2           1.3          0.2  setosa
         4.6          3.1           1.5          0.2  setosa
         5.0          3.6           1.4          0.2  setosa
         5.4          3.9           1.7          0.4  setosa
         4.6          3.4           1.4          0.3  setosa
         5.0          3.4           1.5          0.2  setosa
         4.4          2.9           1.4          0.2  setosa
         4.9          3.1           1.5          0.1  setosa
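
To work with the preview programmatically, the rows can be parsed into records. The following is a minimal Python sketch using only the standard library. The names `PREVIEW` and `read_preview` are illustrative and not part of the original report, and only the ten rows shown above are included, not the full data set.

```python
import csv
import io

# The ten preview rows, copied from the table above (not the full data set).
PREVIEW = """\
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
4.6 3.4 1.4 0.3 setosa
5.0 3.4 1.5 0.2 setosa
4.4 2.9 1.4 0.2 setosa
4.9 3.1 1.5 0.1 setosa
"""


def read_preview(text):
    """Parse space-separated rows into dicts with float measurements."""
    reader = csv.DictReader(io.StringIO(text), delimiter=" ")
    rows = []
    for record in reader:
        species = record.pop("Species")
        # Every remaining column is a numeric measurement.
        row = {name: float(value) for name, value in record.items()}
        row["Species"] = species
        rows.append(row)
    return rows


rows = read_preview(PREVIEW)
print(len(rows), "rows; species:", sorted({r["Species"] for r in rows}))
```

Keeping the species label as a string and the four measurements as floats makes it straightforward to group rows by species later.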

Numeric Summaries by Species

The table below summarizes the four measurements (in cm) for the setosa species.

Summary of Species Setosa

Statistic  Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
Min.                4.3          2.3           1.0         0.10
1st Qu.             4.8          3.2           1.4         0.20
Median              5.0          3.4           1.5         0.20
Mean                5.0          3.4           1.5         0.25
3rd Qu.             5.2          3.7           1.6         0.30
Max.                5.8          4.4           1.9         0.60

The table below summarizes the four measurements (in cm) for the versicolor species.

Summary of Species Versicolor

Statistic  Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
Min.                4.9          2.0           3.0          1.0
1st Qu.             5.6          2.5           4.0          1.2
Median              5.9          2.8           4.4          1.3
Mean                5.9          2.8           4.3          1.3
3rd Qu.             6.3          3.0           4.6          1.5
Max.                7.0          3.4           5.1          1.8

The table below summarizes the four measurements (in cm) for the virginica species.

Summary of Species Virginica

Statistic  Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
Min.                4.9          2.2           4.5          1.4
1st Qu.             6.2          2.8           5.1          1.8
Median              6.5          3.0           5.6          2.0
Mean                6.6          3.0           5.6          2.0
3rd Qu.             6.9          3.2           5.9          2.3
Max.                7.9          3.8           6.9          2.5
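
The row labels in these tables (Min., 1st Qu., Median, Mean, 3rd Qu., Max.) resemble the output of R's `summary()`, although the report does not say how the tables were produced. As a hedged illustration, the Python sketch below computes the same six statistics for one column. The function name `six_number_summary` is hypothetical, and the choice of linearly interpolated quartiles (`method="inclusive"`) is an assumption about the convention used. The input is the Sepal.Length column of the ten setosa rows from the preview only, so the printed values are not expected to match the full setosa table.

```python
from statistics import mean, median, quantiles


def six_number_summary(values):
    """Return min, quartiles, median, mean and max of a numeric sequence."""
    # method="inclusive" interpolates linearly between order statistics;
    # this is an assumed convention, not one stated in the report.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median(values),
        "Mean": mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Sepal.Length of the ten setosa rows shown in the preview table.
preview_sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]

summary = six_number_summary(preview_sepal_length)
for label, value in summary.items():
    print(f"{label:<8} {value:.2f}")
```

Applying the same function to each measurement column, one species at a time, would yield one table per species in the layout used above.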

Visuals

Boxplots for Comparison

Below is a box plot of sepal length, grouped by species.
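
A box plot is typically drawn from each group's quartiles, with whiskers extending to the most extreme values that lie within 1.5 times the interquartile range of the box. The settings used for the figure are not stated in the report, so the Python sketch below assumes that common convention. The name `boxplot_stats` is hypothetical, and only the ten setosa sepal lengths from the preview table are used because the full data are not reproduced here.

```python
from statistics import quantiles


def boxplot_stats(values, whisker=1.5):
    """Box, whisker ends and outliers under the whisker * IQR rule."""
    data = sorted(values)
    # Linearly interpolated quartiles (an assumed convention).
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Only setosa appears in the preview; other species would be added as
# further keys once their measurements are available.
sepal_length_by_species = {
    "setosa": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9],
}

for species, lengths in sepal_length_by_species.items():
    print(species, boxplot_stats(lengths))
```

Computing these values per species gives the numbers a plotting library would need to draw one box for each group side by side.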

Multiple Distributions Present

Scatterplots