Include any libraries that we are going to use

# install.packages("openintro")  # only needed once, to install openintro on your computer
library(openintro)

Load the data

data(iris)
data(COL)

Inspect the data

str(iris)
names(iris)

What type of data structure is iris? How many observations (rows) are there? How many variables (columns) are there?
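These questions can also be answered directly instead of being read off the str() output. The short sketch below uses the base R functions class(), dim(), nrow() and ncol(), which are not part of the original notes. iris ships with base R, so no extra package is needed.

```r
# iris is part of base R (the "datasets" package), so no library() call is needed
class(iris)   # the type of data structure
dim(iris)     # rows and columns, in that order
nrow(iris)    # number of observations
ncol(iris)    # number of variables
```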

summary(iris)
head(iris)

Which of the columns contains categorical data? How many categories (levels) are there in this column?
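R stores categorical data as a factor, so one way to find the categorical column is to test every column with is.factor(). The sketch below does that, then lists the levels and counts the observations in each one, using only base R functions.

```r
# TRUE for every column stored as a factor (R's type for categorical data)
sapply(iris, is.factor)
# The categories (levels) of the factor column, and how many there are
levels(iris$Species)
nlevels(iris$Species)
# How many observations fall into each category
table(iris$Species)
```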

Scatter plots

plot(iris$Sepal.Width ~ iris$Sepal.Length)

or, equivalently,

plot(Sepal.Width ~ Sepal.Length, data=iris)
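The formula form reads "y against x", so the variable before the ~ goes on the vertical axis. The default form of plot() takes its arguments the other way round, as x then y. The sketch below draws the same points both ways; only the automatic axis labels differ.

```r
# Formula interface: y ~ x, so Sepal.Width goes on the vertical axis
plot(Sepal.Width ~ Sepal.Length, data = iris)
# Default interface: plot(x, y), so the order of the variables is reversed
plot(iris$Sepal.Length, iris$Sepal.Width)
```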

Plot different species with differently coloured symbols

What colours does R recognise?

It is possible to get R to use a wide range of attractive colour schemes, but by default base R picks colours by number from a small palette. What is this palette?

palette()
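The default palette has eight entries. The exact shades depend on your R version, because the default palette was revised in R 4.0.0. Colours can also be given by name rather than by number; colors() lists every name base R accepts. A short sketch:

```r
# The current palette: the colours that col = 1, col = 2, ... refer to
pal <- palette()
length(pal)
pal[2]
# Colours can also be given by name; colors() lists all the names base R knows
head(colors())
length(colors())
```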

So colour number 1 is black, number 2 is red, and so on. Let us use the first three colours, one for each of the different species of iris. Because Species is a factor, R uses its integer codes (1, 2 and 3) as colour numbers:

plot(Sepal.Width ~ Sepal.Length, data=iris, col=iris$Species)
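To make the species-to-colour mapping explicit, you can convert the factor to its integer codes yourself and look those codes up in the palette. The sketch below should draw the same coloured plot; the variable names codes and species_cols are illustrative only.

```r
# Each row of Species stores an integer code pointing at one of its levels
levels(iris$Species)
codes <- as.integer(iris$Species)
table(codes)
# Looking the codes up in the palette gives one colour per observation
species_cols <- palette()[codes]
plot(Sepal.Width ~ Sepal.Length, data = iris, col = species_cols)
```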

Add a legend

Whenever we plot different data sets on the same set of axes, as we have here, we need to add a legend to distinguish them. This can be done in R, once the plot has been drawn, with a single extra line:

legend(x = 6.5, y = 4.5, legend = levels(iris$Species), col = 1:3, pch = 1)

The x and y values dictate where the top-left corner of the legend is drawn, in the units of the plot axes. ‘levels(iris$Species)’ tells R what to write in the legend, in the same order as the colour numbers 1 to 3; ‘col’ tells it what colours to use, drawing from the numbered R palette; ‘pch’ tells R which plotting symbol to use, from a set of options numbered 0 to 25.
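Instead of coordinates, legend() also accepts a position keyword such as "topright" or "bottomleft", which saves working out where empty space is. The sketch below also varies the symbol as well as the colour by species, so the groups stay distinguishable in black and white. Using pch = as.integer(iris$Species) is one illustrative choice, not the only one.

```r
# Colour and symbol both follow the species' integer codes 1, 2 and 3
plot(Sepal.Width ~ Sepal.Length, data = iris,
     col = iris$Species, pch = as.integer(iris$Species))
# A keyword places the legend in a corner, so no coordinates are needed
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 1:3)
```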

Exercises

Adapt this code to write out the x and y axis labels as “Sepal length” and “Sepal width” respectively.

Make a scatter plot of petal width against petal length, with a legend and with each species appearing in a different colour.

Box plots

A single box plot for the whole data set

boxplot(iris$Sepal.Length, main="Box plot", ylab="Sepal Length")
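A box plot is drawn from a five-number summary. The box spans the lower and upper hinges, with a line at the median. The whiskers stop at the most extreme points lying within 1.5 times the box length, and any points beyond that are drawn separately as outliers. The base R functions fivenum() and boxplot.stats() return these numbers, as sketched below.

```r
# Minimum, lower hinge, median, upper hinge, maximum
fivenum(iris$Sepal.Length)
# The values actually drawn: whisker ends, hinges and median
stats <- boxplot.stats(iris$Sepal.Length)
stats$stats
# Points beyond the whiskers (drawn individually as outliers)
stats$out
```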

We can use box plots to explore the distribution of a continuous variable across the species.

Compare distributions of sepal length across species

boxplot(Sepal.Length ~ Species, data=iris,
     main="Box Plot",
     xlab="Species",
     ylab="Sepal Length")
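Adding plot = FALSE makes boxplot() return the numbers behind each box without drawing anything. This is a convenient way to compare the species numerically. In the sketch below, the middle row of $stats should agree with the per-species medians computed by tapply().

```r
# Compute the grouped box plot statistics without drawing them
bp <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
bp$names  # one box per species
bp$n      # number of observations in each group
# Rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
# The median row should match the per-species medians
tapply(iris$Sepal.Length, iris$Species, median)
```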

Exercise

Now do the same for petal width