July 10, 2017

Analysis of Iris Data Set

This is an R Markdown presentation that explores the characteristics of the Iris data set. The data were collected by the American botanist Edgar Anderson in or about 1929 to quantify the variation of Iris flowers in three related species. Anderson measured four features (sepal length, sepal width, petal length, and petal width) on flowers from each of the three species: setosa, virginica, and versicolor.

The British statistician Ronald Fisher published the data set in 1936 as part of his study on the use of linear discriminant analysis for classifying objects or events. As a result, the data set is sometimes referred to as Fisher's Iris data set.

(Source: Wikipedia)

Slide Descriptions

  • Summary of the Widths and Lengths Measured
  • Plot of Sepal Lengths versus Sepal Widths
  • Plot of Petal Lengths versus Petal Widths

Summary of Widths and Lengths

summary(iris[1:4])
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500

Comparison of Sepal Width to Sepal Length
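
The plot for this slide is not reproduced here. A minimal sketch of how such a plot could be drawn in base R from the built-in `iris` data frame follows. The axis assignment, the colors, and the name `species_cols` are assumptions made for illustration and are not taken from the original slide.

```r
# Scatter plot of sepal width against sepal length, colored by species.
# species_cols is a hypothetical name; the colors are arbitrary choices.
species_cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")

plot(iris$Sepal.Length, iris$Sepal.Width,
     col  = species_cols[as.character(iris$Species)],  # look up one color per row
     pch  = 19,
     xlab = "Sepal length (cm)",
     ylab = "Sepal width (cm)",
     main = "Sepal width vs. sepal length")

# Legend mapping each color back to its species
legend("topright", legend = names(species_cols), col = species_cols, pch = 19)
```

Indexing the named color vector by the species name, rather than by factor position, keeps the mapping correct even if the factor levels are reordered.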

Comparison of Petal Width to Petal Length
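
The plot for this slide is likewise missing from the text. One possible way to produce it in base R is sketched below. The name `petal_cols`, the colors, and the choice of petal length for the horizontal axis are illustrative assumptions, not the original presentation's settings.

```r
# Scatter plot of petal width against petal length, colored by species.
# petal_cols is a hypothetical name; colors follow the order of levels(iris$Species).
petal_cols <- c("red", "darkgreen", "blue")

plot(iris$Petal.Length, iris$Petal.Width,
     col  = petal_cols[as.integer(iris$Species)],  # factor code 1..3 picks the color
     pch  = 19,
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)",
     main = "Petal width vs. petal length")

# Legend placed top-left, away from the largest petal measurements
legend("topleft", legend = levels(iris$Species), col = petal_cols, pch = 19)
```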

Summary

  • The plots clearly show that the virginica plants have the greatest petal lengths and widths of the three species, while the setosa plants have the smallest.
  • The virginica plants also have the greatest sepal lengths, but the setosa plants have the greatest sepal widths.
  • The versicolor plants lie in the middle range for sepal length, petal length, and petal width; their sepal widths, however, are on average the smallest of the three species.
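
The comparisons in the bullets above can also be checked numerically rather than read off the plots. The sketch below computes the mean of each measurement per species with `aggregate()`. The variable name `species_means` is an assumption made for illustration, and the mean is only one of several summaries that could be used for this comparison.

```r
# Mean of every numeric column, grouped by species.
# species_means is a hypothetical name used only in this sketch.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Species with the largest and smallest mean for each measurement
measure_cols <- setdiff(names(species_means), "Species")
largest  <- sapply(species_means[measure_cols],
                   function(x) as.character(species_means$Species[which.max(x)]))
smallest <- sapply(species_means[measure_cols],
                   function(x) as.character(species_means$Species[which.min(x)]))
print(rbind(largest, smallest))
```

The first table has one row per species, so the ranking for each measurement can be read directly from its column.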