Iris Data Analysis

Author

Marco Vazquez

Intro

In this exercise we explore the iris dataset used in a few previous modules. It contains measurements of petal and sepal dimensions for three species of iris flowers: setosa, versicolor, and virginica.

Loading the Data

First we load the data from iris.csv and inspect its structure with str():

iris <- read.csv("iris.csv")
str(iris)
'data.frame':   150 obs. of  6 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
 $ Code        : int  1 1 1 1 1 1 1 1 1 1 ...
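
The structure shows a Code column next to Species. A quick cross-tabulation can confirm whether each species maps to a single code. The sketch below is only an illustration: it uses R's built-in iris data as a stand-in for iris.csv, and it builds a hypothetical Code column on the assumption that the codes are 1, 2, 3 in alphabetical species order, which the output above suggests but does not confirm.

```r
# Stand-in for iris.csv: R's built-in iris data (assumption, not the course file)
iris_df <- datasets::iris
iris_df$Species <- as.character(iris_df$Species)

# Hypothetical Code column: 1, 2, 3 in alphabetical species order (assumption)
iris_df$Code <- as.integer(factor(iris_df$Species))

# Cross-tabulate species against code; each row should have one non-zero cell
code_table <- table(iris_df$Species, iris_df$Code)
print(code_table)

# TRUE when every species appears under exactly one code
one_code_per_species <- all(rowSums(code_table > 0) == 1)
print(one_code_per_species)
```

With the real file, the same table() call on iris$Species and iris$Code would show directly how the two columns relate.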

Basic Plot

Next we make a scatterplot with petal length on the x-axis and petal width on the y-axis:

plot(iris$Petal.Length, iris$Petal.Width,
     xlab = 'Petal Length', ylab = 'Petal Width',
     main = 'Petal Length vs Petal Width')

Plot by Species

Here we differentiate each species with color to show the differences between the three groups:

plot(iris$Petal.Length, iris$Petal.Width,
     col = c('purple', 'darkorange', 'blue')[as.integer(as.factor(iris$Species))],
     pch = 16,
     xlab = 'Petal Length', ylab = 'Petal Width',
     main = 'Petal Length vs Width by Species')
legend('topleft', legend = c('setosa', 'versicolor', 'virginica'),
       col = c('purple', 'darkorange', 'blue'), pch = 16)
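
The color assignment above relies on the factor levels being in the same order as the color vector and the legend. One alternative, sketched here with R's built-in iris data as a stand-in for iris.csv, is to look colors up by species name so the mapping is explicit. The names species_colors and point_colors are illustrative and not part of the original exercise.

```r
# Stand-in for iris.csv: R's built-in iris data (assumption)
iris_df <- datasets::iris

# Named vector: each species name is tied directly to its color
species_colors <- c(setosa = 'purple',
                    versicolor = 'darkorange',
                    virginica = 'blue')

# Look up one color per row by species name
point_colors <- species_colors[as.character(iris_df$Species)]

# Count how many points receive each color
print(table(point_colors))
```

Here point_colors could be passed as col = point_colors in plot(), and the legend could use names(species_colors) and species_colors, so the labels and colors cannot fall out of step.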

Summary Statistics

Now, we look at the mean petal length for each species:

tapply(iris$Petal.Length, iris$Species, mean)
    setosa versicolor  virginica 
     1.462      4.260      5.552
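
The same per-species summary can be extended to all four measurements at once. The following is a minimal sketch using aggregate() on R's built-in iris data, which stands in for iris.csv here; the name species_means is illustrative.

```r
# Stand-in for iris.csv: R's built-in iris data (assumption)
iris_df <- datasets::iris

# Mean of each measurement column, grouped by species
species_means <- aggregate(
  cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
  data = iris_df,
  FUN = mean
)
print(species_means)
```

The Petal.Length column of this table should agree with the tapply() result above, provided the CSV holds the same values as the built-in data.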