# ABSTRACT

Measurements of the sepals and petals of three varieties of iris were analyzed to determine whether any of these characteristics could be used to distinguish the varieties. The most useful variables were sepal length, sepal width, and petal width.

# INTRODUCTION

From the Wikipedia article “Iris flower data set” (https://en.wikipedia.org/wiki/Iris_flower_data_set):

The Iris flower data set or Fisher’s Iris data set is a multivariate data set introduced by Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems” as an example of linear discriminant analysis. It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the variation of Iris flowers.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.
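R ships its own copy of this data set as the `iris` data frame in the base `datasets` package, which is what the `summary(iris)` call in the next section uses. A minimal sketch, assuming that built-in object, for checking it against the description above (150 flowers, 50 per species, four measurements):

```r
# iris ships with base R (datasets package), so nothing needs to be loaded
dims <- dim(iris)              # number of rows (flowers) and columns
counts <- table(iris$Species)  # flowers sampled per species
print(dims)
print(counts)
print(names(iris)[1:4])        # the four measurements, in centimetres
```

The fifth column, `Species`, is a factor holding the variety label.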

# PROBLEM

Which features, either alone or in combination, can be used to distinguish among the three varieties of iris?

# DATA

```r
summary(iris)
```

```
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
```

# ANALYSIS

Setosa had the largest mean sepal width (3.428 cm) as well as the smallest mean sepal length (5.006 cm) and petal width (0.246 cm). Virginica had the largest mean sepal length (6.588 cm). Versicolor fell between the other two varieties in mean sepal length, petal length, and petal width, but had the smallest mean sepal width (2.770 cm).
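The comparison of means can be reproduced directly from the built-in data. The sketch below assumes base R's `aggregate()` with a formula interface; `vars`, `largest`, and `smallest` are illustrative names, not part of the original analysis.

```r
# Mean of each measurement by variety
means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(means)

# For each measurement, the variety with the largest and the smallest mean
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
largest  <- sapply(vars, function(v) as.character(means$Species[which.max(means[[v]])]))
smallest <- sapply(vars, function(v) as.character(means$Species[which.min(means[[v]])]))
print(rbind(largest, smallest))
```

Printing the two rows side by side makes it easy to see which variety sits at each extreme for every measurement.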

## Numerical Summaries

Here are the quartiles and means, by variety:

```
## iris$Species: setosa
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
##  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
##  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
##  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
##  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
##  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
##        Species  
##  setosa    :50  
##  versicolor: 0  
##  virginica : 0  
## -------------------------------------------------------- 
## iris$Species: versicolor
##   Sepal.Length    Sepal.Width     Petal.Length   Petal.Width   
##  Min.   :4.900   Min.   :2.000   Min.   :3.00   Min.   :1.000  
##  1st Qu.:5.600   1st Qu.:2.525   1st Qu.:4.00   1st Qu.:1.200  
##  Median :5.900   Median :2.800   Median :4.35   Median :1.300  
##  Mean   :5.936   Mean   :2.770   Mean   :4.26   Mean   :1.326  
##  3rd Qu.:6.300   3rd Qu.:3.000   3rd Qu.:4.60   3rd Qu.:1.500  
##  Max.   :7.000   Max.   :3.400   Max.   :5.10   Max.   :1.800  
##        Species  
##  setosa    : 0  
##  versicolor:50  
##  virginica : 0  
## -------------------------------------------------------- 
## iris$Species: virginica
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.900   Min.   :2.200   Min.   :4.500   Min.   :1.400  
##  1st Qu.:6.225   1st Qu.:2.800   1st Qu.:5.100   1st Qu.:1.800  
##  Median :6.500   Median :3.000   Median :5.550   Median :2.000  
##  Mean   :6.588   Mean   :2.974   Mean   :5.552   Mean   :2.026  
##  3rd Qu.:6.900   3rd Qu.:3.175   3rd Qu.:5.875   3rd Qu.:2.300  
##  Max.   :7.900   Max.   :3.800   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    : 0  
##  versicolor: 0  
##  virginica :50  
```

And here are the correlations between each pair of measurement variables:

```
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
```
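The code that produced the by-variety tables and the correlation matrix is not shown. Their layout matches what base R prints for `by()` and `cor()`, so the calls below are a plausible sketch under that assumption, not necessarily the author's exact code.

```r
# Quartiles and means for each variety; by() prints the "iris$Species: ..."
# headers and the dashed separators between groups
by(iris, iris$Species, summary)

# Pearson correlation matrix of the four numeric measurements
r <- cor(iris[, 1:4])
print(r)
```

Note that petal length and petal width correlate at about 0.96, so they carry largely overlapping information.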

## Graphical Summaries

Versicolor has a mean sepal length of about 5.9 cm, in between Setosa and Virginica. Virginica shows the greatest spread in sepal length, and its boxplot marks a single low outlier at 4.9 cm. Setosa has the smallest sepal lengths.
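The remarks on spread and outliers can be checked numerically. A minimal sketch, assuming base R's `boxplot.stats()`, which applies the same rule that `boxplot()` uses to draw outlying points (values beyond 1.5 times the hinge spread from the box); `sds` and `outliers` are illustrative names.

```r
# Standard deviation of sepal length within each variety
sds <- tapply(iris$Sepal.Length, iris$Species, sd)
print(round(sds, 3))

# Values that boxplot() would draw as individual outlying points
outliers <- lapply(split(iris$Sepal.Length, iris$Species),
                   function(x) boxplot.stats(x)$out)
print(outliers)
```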

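```r
# Boxplots of sepal length, one box per variety
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal Lengths of Irises", ylab = "centimeters")
```

The next plot shows the relationship between sepal length and sepal width, with a different symbol and colour for each variety.

```r
# Sepal length vs. width; symbol and colour are indexed by variety
sp <- as.numeric(iris$Species)
plot(Sepal.Width ~ Sepal.Length, data = iris, pch = c(21, 22, 23)[sp], col = c("red", "blue", "green")[sp])
legend("topright", legend = levels(iris$Species), pch = 21:23, col = c("red", "blue", "green"))
```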
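The relationship in the scatter plot can also be summarized with correlations. A minimal sketch, assuming base R's `cor()` and `split()`, comparing the pooled correlation of sepal length and width with the same correlation computed within each variety; `overall` and `within` are illustrative names.

```r
# Pooled correlation between sepal length and sepal width (all 150 flowers)
overall <- cor(iris$Sepal.Length, iris$Sepal.Width)

# The same correlation computed separately within each variety
within <- sapply(split(iris, iris$Species),
                 function(d) cor(d$Sepal.Length, d$Sepal.Width))
print(round(overall, 3))
print(round(within, 3))
```

On the built-in data the pooled value is slightly negative while each within-variety value is positive, because mixing the varieties (for example, Setosa's short but wide sepals) masks the trend inside each group.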

# FINDINGS AND DISCUSSION

Use sepal length and petal width to distinguish Setosa: it has the smallest sepal lengths (usually under 5.5 cm) and also the smallest petal widths. No Setosa flower has a petal width above 0.6 cm, while no Versicolor or Virginica flower has one below 1.0 cm.

Versicolor and Virginica are much harder to tell apart. Their measurements overlap considerably and differ mainly near the upper ends of their size ranges. Sepal length can be used to distinguish them: Versicolor sepal lengths fall mostly between 5.5 and 6.5 cm (interquartile range 5.6 to 6.3 cm), while Virginica sepal lengths are typically longer (median 6.5 cm) and reach up to 7.9 cm.
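The findings can be turned into simple decision rules and scored against the data. This is a hypothetical sketch, not a method from the original analysis: `classify_iris` is an illustrative name, the 0.8 cm petal-width cut-off is an assumption chosen between Setosa's maximum (0.6 cm) and the other varieties' minimum (1.0 cm), and the 6.5 cm sepal-length cut-off is taken from the discussion above.

```r
# Hypothetical rule-based classifier; both cut-offs are illustrative assumptions
classify_iris <- function(sepal_length, petal_width) {
  ifelse(petal_width < 0.8, "setosa",
         ifelse(sepal_length > 6.5, "virginica", "versicolor"))
}

pred <- classify_iris(iris$Sepal.Length, iris$Petal.Width)

# Confusion table: rows are true varieties, columns are predicted ones
print(table(actual = iris$Species, predicted = pred))
cat("Overall accuracy:", mean(pred == iris$Species), "\n")
```

The Setosa rule separates that variety perfectly, while the off-diagonal counts between Versicolor and Virginica show how much overlap remains when sepal length alone is used.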