Determining the relationship between variables in the iris data set

9 November 2018

Introducing the iris data set

The iris data set provides measurements, in cm, for 50 flowers from each of 3 species of iris (setosa, versicolor, and virginica) for the following variables:

  • Sepal length
  • Sepal width
  • Petal length
  • Petal width
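
The description above can be checked directly against the built-in data. The following is a minimal sketch using only base R; the variable names `counts` and `species_means` are illustrative and not part of the original slides.

```r
# iris ships with base R (datasets package), so nothing needs to be loaded

# Number of flowers recorded for each species
counts <- table(iris$Species)
print(counts)

# Mean of each of the four measurements (in cm), per species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```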

Inspecting the iris data set

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

App to determine the relationship between variables in the iris data set

The application provides two input boxes in which the user selects the x and y measurement variables. Based on this selection, it generates a plot showing the relationship between the two variables, with the points coloured by species.
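
One way such an application could be put together is sketched below with the shiny package. This is not the original app's source: the input IDs (`xvar`, `yvar`), the output ID (`scatter`), the layout, and the default selections are all assumptions made for illustration.

```r
# A minimal sketch of the described app; IDs and layout are assumptions
library(shiny)

# The four measurement columns (everything except Species)
measure_vars <- setdiff(names(iris), "Species")

ui <- fluidPage(
  titlePanel("Relationship between variables in the iris data set"),
  sidebarLayout(
    sidebarPanel(
      # Hypothetical input IDs "xvar" and "yvar"
      selectInput("xvar", "x variable", choices = measure_vars,
                  selected = "Sepal.Length"),
      selectInput("yvar", "y variable", choices = measure_vars,
                  selected = "Sepal.Width")
    ),
    mainPanel(plotOutput("scatter"))
  )
)

server <- function(input, output) {
  output$scatter <- renderPlot({
    # Plot the two selected columns, one colour per species
    plot(iris[[input$xvar]], iris[[input$yvar]],
         col = iris$Species, pch = 16,
         xlab = input$xvar, ylab = input$yvar)
    # Palette indices 1..3 match the colours used for the points
    legend("topright", legend = levels(iris$Species),
           col = seq_along(levels(iris$Species)), pch = 16)
  })
}

# Build the app object; start it with runApp(app) in an interactive session
app <- shinyApp(ui = ui, server = server)
```

Assigning the result of `shinyApp()` to `app` rather than printing it keeps the script from launching a server when it is merely sourced.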

The figure below shows an example plot of sepal width against sepal length.

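# Scatter plot of sepal width against sepal length, one colour per species
plot(iris$Sepal.Length, iris$Sepal.Width, col = iris$Species, pch = 16,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)")
# Palette indices 1:3 match the point colours under any R palette
legend("topright", c("Setosa", "Versicolor", "Virginica"),
       col = 1:3, pch = 16)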

[Figure: sepal width against sepal length, coloured by species]
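
The relationship shown in such a plot can also be summarised numerically. The sketch below is not part of the application as described; it assumes Pearson correlation is a suitable summary, and the names `overall` and `by_species` are illustrative.

```r
# Pearson correlation of sepal length and sepal width, all flowers pooled
overall <- cor(iris$Sepal.Length, iris$Sepal.Width)

# The same correlation computed separately within each species
by_species <- sapply(split(iris, iris$Species), function(d) {
  cor(d$Sepal.Length, d$Sepal.Width)
})

print(round(overall, 2))
print(round(by_species, 2))
```

On the built-in data the pooled correlation is weakly negative while the correlation within each species is positive, which is one reason grouping the points by species is useful.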

Thank you

Have fun with the application!