26 June 2021

Our Dataset

  • We will work with the built-in airquality dataset.
  • It represents daily air quality measurements in New York from 1st May to 30th September 1973.
df <- airquality
str(df)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

Plotting of Numerical Data

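  • Histograms show the distribution of a numerical variable by grouping its values into intervals (bins) and drawing a bar for the number of observations in each bin. They suit continuous measurements as well as integer-valued data such as Ozone and Solar.R.
  • They are particularly good at displaying the shape of the distribution, such as its ‘skew’ relative to a measure of centre like the arithmetic mean or the median.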
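As a minimal sketch of what hist() does behind the scenes, calling it with plot = FALSE returns the bin edges and counts instead of drawing them. Comparing the mean with the median is one quick numerical check of skew: for a right-skewed variable such as Ozone, the long upper tail pulls the mean above the median.

```r
# Use the built-in airquality data
oz <- airquality$Ozone

# Compute the histogram without drawing it
h <- hist(oz, plot = FALSE)
h$breaks  # bin edges
h$counts  # number of observations in each bin (missing values are dropped)

# A long right tail pulls the mean above the median
mean(oz, na.rm = TRUE)
median(oz, na.rm = TRUE)
```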

For Ozone:

hist(df$Ozone)

For Solar Radiation:

## Compare this with the other histogram
hist(df$Solar.R)

Frequency Polygons

  • Like histograms, they show the distribution of a numerical variable, but join the bin counts with lines instead of drawing bars.
  • The function freqpolygon() is from the mosaic package.
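If mosaic is not installed, a frequency polygon can be approximated in base R by joining the mid-points of the histogram bins. This is a sketch of the idea only; it is not how freqpolygon() itself is implemented.

```r
oz <- airquality$Ozone

# Bin the data exactly as hist() would, but do not draw the bars
h <- hist(oz, plot = FALSE)

# Plot each bin's count at the bin mid-point and join the points with lines
plot(h$mids, h$counts, type = "b", pch = 16,
     main = "Frequency polygon of Ozone (base R sketch)",
     xlab = "Ozone (ppb)", ylab = "Frequency")
```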

For Ozone:

library(mosaic)
freqpolygon(df$Ozone)

For Solar Radiation:

freqpolygon(df$Solar.R)

Scatter Plots

  • Unlike the previous charts, which represent data as ‘collectives’, scatter plots are used to represent individual observations.
  • Quite useful in bivariate analyses, to examine the relationship between two numerical variables (e.g. their correlation).
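The correlation mentioned above can be quantified with cor(). Because Ozone contains missing values, use = "complete.obs" restricts the calculation to days where both variables are present; with the default setting the result is NA.

```r
df <- airquality

# Bivariate scatter plot: one point per day
plot(df$Temp, df$Ozone, pch = 16,
     xlab = "Temperature (F)", ylab = "Ozone (ppb)")

# Pearson correlation, ignoring rows with missing values
r <- cor(df$Temp, df$Ozone, use = "complete.obs")
r
```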

For Ozone:

## Not all that useful for univariate analysis
plot(df$Ozone)

For Solar Radiation:

plot(df$Solar.R)

Matrix of Scatter Plots

pairs(df)

Data Transformation

Create a brand new column called Mth.Name that contains the months May to September:

  • spelt out in full (rather than as numbers, as in the original dataset);
  • placed in chronological order (i.e. as an ordinal variable).
df$Mth.Name <- month.name[df$Month]
df$Mth.Name <-
  factor(df$Mth.Name, levels = month.name[5:9], ordered = TRUE)

View the results:

head(df$Mth.Name)
## [1] May May May May May May
## Levels: May < June < July < August < September
tail(df$Mth.Name)
## [1] September September September September September September
## Levels: May < June < July < August < September
  • We have created a new column called Mth.Name and made it an ordered factor (categorical variable) of the months May through September.
  • This can be confirmed with functions such as str(), levels() and dim().
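The checks mentioned above can be run directly. This sketch rebuilds the column from scratch so that it runs on its own; the final comparison only works because the factor is ordered.

```r
df <- airquality
df$Mth.Name <- factor(month.name[df$Month], levels = month.name[5:9], ordered = TRUE)

str(df$Mth.Name)     # an ordered factor with 5 levels
dim(df)              # one more column than airquality
levels(df$Mth.Name)

# Ordered factors support comparisons in level order
df$Mth.Name[1] < df$Mth.Name[153]   # TRUE: May comes before September
```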

Bar Charts

tab <- table(df$Mth.Name)
barplot(tab)
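Each bar's height is simply the corresponding count in tab, i.e. the number of days of data in each month. barplot() also invisibly returns the x positions of the bar centres; the sketch below uses them to write the counts above the bars, which is an addition to the original example.

```r
df <- airquality
df$Mth.Name <- factor(month.name[df$Month], levels = month.name[5:9], ordered = TRUE)

tab <- table(df$Mth.Name)
tab  # days of data per month

# barplot() invisibly returns the bar mid-points
mp <- barplot(tab, ylim = c(0, 35))
text(as.vector(mp), as.vector(tab), labels = as.vector(tab), pos = 3)
```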

Box-and-Whiskers Plot

For Ozone:

boxplot(df$Ozone)

For Solar Radiation:

boxplot(df$Solar.R)
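The box and whiskers are drawn from a handful of summary statistics, which boxplot.stats() returns without plotting: the lower whisker, lower hinge, median, upper hinge and upper whisker, plus any values lying more than 1.5 times the box height beyond the box (the default coef = 1.5), which are drawn individually as outliers.

```r
oz <- airquality$Ozone

s <- boxplot.stats(oz)   # missing values are removed first
s$stats                  # whisker, hinge, median, hinge, whisker
s$out                    # values drawn as individual outlier points

# The line inside the box is the median
median(oz, na.rm = TRUE)
```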

1-D Scatter Plot

The one-dimensional scatter plot is useful for displaying ‘spread’ in small datasets.

For Ozone:

stripchart(df$Ozone)

For Solar Radiation:

stripchart(df$Solar.R)
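In a 1-D scatter plot, tied values are drawn on top of each other by default (method = "overplot"), which can hide how many observations there are. stripchart() also accepts method = "stack" and method = "jitter"; this sketch uses jittering, with a fixed random seed chosen only for reproducibility.

```r
oz <- airquality$Ozone

# Ozone has repeated values, which would overlap in the default strip chart
any(duplicated(na.omit(oz)))

# Jitter the points vertically so that ties become visible
set.seed(1)  # jitter is random; fix the seed for a reproducible plot
stripchart(oz, method = "jitter", jitter = 0.2, pch = 1,
           xlab = "Ozone (ppb)")
```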

Time Series

var <- df$Ozone
# Daily series: 1 May 1973 is day 121 of the year, 365 observations per year
t <- ts(var, start = c(1973, 121), frequency = 365)
plot(t)
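When both start and end are given, ts() keeps only the number of observations that fit between them, (end - start) × frequency + 1, and silently drops the rest. With start = 5, end = 9 and frequency = 5 that is just 21 of the 153 daily values. The sketch below contrasts this with a daily series that keeps every day; the start of c(1973, 121) (1 May is day 121 of the non-leap year 1973) and frequency = 365 are assumptions chosen to match the dataset description.

```r
x <- airquality$Ozone

# start + end + frequency: only (9 - 5) * 5 + 1 = 21 values survive
short <- ts(x, start = 5, end = 9, frequency = 5)
length(short)

# Assumed daily series: 31 + 28 + 31 + 30 = 120 days precede 1 May 1973
daily <- ts(x, start = c(1973, 121), frequency = 365)
length(daily)
start(daily)

plot(daily, ylab = "Ozone (ppb)")
```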

For other variables in the data frame
Solar Radiation:

var <- df$Solar.R
t <- ts(var, start = c(1973, 121), frequency = 365)
plot(t)

Temperature:
Note: R matches unnamed arguments to parameters by position, so df$Temp can be passed straight to ts() as its first (data) argument.

t <- ts(df$Temp, start = c(1973, 121), frequency = 365)
plot(t)
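Positional matching can be checked directly: the first formal argument of ts() is data, so an unnamed first value is matched to it, and spelling the name out gives an identical result. The start and frequency values here are the same illustrative daily-series assumptions as above.

```r
# The formal arguments of ts(), in order
names(formals(ts))

by.position <- ts(airquality$Temp, start = c(1973, 121), frequency = 365)
by.name     <- ts(data = airquality$Temp, start = c(1973, 121), frequency = 365)
identical(by.position, by.name)   # TRUE
```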

Annotation

  • A graph or chart is incomplete without a title, axis labels, a legend and other kinds of annotation.
  • We will briefly examine some of the simpler approaches that could be employed in basic R plotting.

Titles

title <- "Wind Speed for New York, May - Sept. 1973"
hist(df$Wind, main = title)

Subtitles

sub <- "Source: NY State Dept. of Conservation & US National Weather Service"
hist(df$Wind, main = title, sub = sub)

Axis Labels

X-Axis

x.label <- "Wind Speed (mph)"
hist(df$Wind, main = title, sub = sub, xlab = x.label)

Y-Axis

y.label <- "No. of observations"
hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label
)

Additional Labels

labels <- LETTERS[1:11]   # one label per bar
hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label,
  labels = labels      # the default histogram of Wind has 11 bars
)
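Hard-coding LETTERS[1:11] relies on having counted the bars by eye. A sketch that derives the number of labels from the histogram itself, via hist(..., plot = FALSE), keeps the labels in step with the bins if the data or the breaks change; for Wind with the default breaks it gives the same 11 bars.

```r
wind <- airquality$Wind

# Compute the bins first, without drawing
h <- hist(wind, plot = FALSE)
n.bars <- length(h$counts)

# One letter per bar, however many bars there are (up to 26)
bar.labels <- LETTERS[seq_len(n.bars)]
hist(wind, labels = bar.labels, xlab = "Wind Speed (mph)")
```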

Additional Text

Explanatory notes can be added to the plot margins with mtext(); side = 4 places the note in the right-hand margin.

hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label,
  labels = labels
) 
mtext("The data are approximately normally distributed.", side = 4)
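The note makes a claim about the shape of the distribution. One informal way to support it visually is to overlay a normal curve with the sample mean and standard deviation, scaled from density to counts by multiplying by the number of observations and the bin width. This is an illustrative check, not a formal test of normality.

```r
wind <- airquality$Wind

h <- hist(wind, main = "Wind Speed with fitted normal curve",
          xlab = "Wind Speed (mph)", ylab = "No. of observations")

# The default breaks are equally spaced, so one bin width applies to all bars
bin.width <- diff(h$breaks)[1]

# Expected count per bin under a normal model: n * bin width * density
curve(length(wind) * bin.width * dnorm(x, mean = mean(wind), sd = sd(wind)),
      add = TRUE, col = "red", lwd = 2)

# side = 3 writes in the top margin (1 = bottom, 2 = left, 4 = right)
mtext("Red curve: normal density, sample mean and SD", side = 3, line = 0.25, cex = 0.8)
```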

Customisation

  • We can customise our graphs and charts in many ways.
  • R gives the user a wide range of options.
  • We are limited only by imagination and skill.

Axis limits

hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label,
  labels = labels,
  ylim = c(0, 40)
)

hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label,
  labels = labels,
  ylim = c(0, 40),
  xlim = c(0, 25)
)
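The limits c(0, 40) and c(0, 25) are set by hand. A sketch of choosing them from the data instead: round the tallest bar up to the next multiple of 10 for ylim, and take xlim from the bin edges so that no bar is cut off. The rounding rule is an arbitrary choice for illustration.

```r
wind <- airquality$Wind
h <- hist(wind, plot = FALSE)

# Round the tallest bar up to the next multiple of 10
y.lim <- c(0, ceiling(max(h$counts) / 10) * 10)
# Cover every bin edge
x.lim <- range(h$breaks)

hist(wind, ylim = y.lim, xlim = x.lim, xlab = "Wind Speed (mph)")
```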

Colour

Fill:

hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label,
  labels = labels,
  ylim = c(0, 40),
  xlim = c(0, 25),
  col = "cyan"
)

Border (bar outline):

hist(
  df$Wind,
  main = title,
  sub = sub,
  xlab = x.label,
  ylab = y.label,
  labels = labels,
  ylim = c(0, 40),
  xlim = c(0, 25),
  col = "cyan",
  border = "red"
)
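Colour names such as "cyan" and "red" are among those listed by colors(), and col2rgb() shows the red, green and blue intensities behind a name. A vector of colours can also be given, one per bar; the hcl.colors() palette used here (available from R 3.6.0) is an illustrative choice, not part of the original example.

```r
# Colour names are translated to red/green/blue intensities (0-255)
col2rgb(c("cyan", "red"))

# The same cyan written as a hexadecimal string
rgb(0, 255, 255, maxColorValue = 255)   # "#00FFFF"

# One colour per bar from a built-in palette
wind <- airquality$Wind
n.bars <- length(hist(wind, plot = FALSE)$counts)
hist(wind, col = hcl.colors(n.bars), border = "red", xlab = "Wind Speed (mph)")
```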

Plotting Bivariate Data

Data transformation

First, we will modify our data a bit by adding a column of temperature categories.

head(df$Temp)
## [1] 67 72 74 62 56 66
df$Heat <- cut(df$Temp,
               breaks = c(55, 70, 85, 100),
               labels = c("Cold", "Normal", "Hot"),
               include.lowest = TRUE,
               ordered_result = TRUE)
head(df$Heat, 10L)
##  [1] Cold   Normal Normal Cold   Cold   Cold   Cold   Cold   Cold   Cold  
## Levels: Cold < Normal < Hot
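By default cut() builds right-closed intervals, so these breaks define (55, 70], (70, 85] and (85, 100]; include.lowest = TRUE additionally closes the first interval at 55. A temperature that sits exactly on a break therefore falls into the lower category, as this check on a few boundary values shows.

```r
temps <- c(55, 70, 71, 85, 86, 100)
heat <- cut(temps,
            breaks = c(55, 70, 85, 100),
            labels = c("Cold", "Normal", "Hot"),
            include.lowest = TRUE,
            ordered_result = TRUE)
data.frame(temps, heat)
# 55 and 70 -> Cold, 71 and 85 -> Normal, 86 and 100 -> Hot
```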

table() counts how many observations fall into each category (level) of a factor.

table(df$Heat)
## 
##   Cold Normal    Hot 
##     33     86     34

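With two variables, table() produces a two-way contingency table; here it is 3 × 5 (three heat categories by five months). We store it as heatTable for the plots that follow.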

heatTable <- table(df$Heat, df$Mth.Name)
heatTable
##         
##          May June July August September
##   Cold    24    2    0      0         7
##   Normal   7   23   21     17        18
##   Hot      0    5   10     14         5
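A two-way table can be summarised further. addmargins() appends row and column totals (each month's column sums to its number of days), and prop.table() with margin = 2 converts counts to within-month proportions, which makes months of different lengths easier to compare. This sketch rebuilds the two columns so that it runs on its own.

```r
df <- airquality
df$Mth.Name <- factor(month.name[df$Month], levels = month.name[5:9], ordered = TRUE)
df$Heat <- cut(df$Temp, breaks = c(55, 70, 85, 100),
               labels = c("Cold", "Normal", "Hot"),
               include.lowest = TRUE, ordered_result = TRUE)

heatTable <- table(df$Heat, df$Mth.Name)

addmargins(heatTable)                         # adds a "Sum" row and column
round(prop.table(heatTable, margin = 2), 2)   # each column sums to 1
```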

The table forms the basis for our bar chart

barplot(heatTable)

This is the default call to barplot(). Use ?barplot to see all the arguments.

Change from the default stacked bar chart to a grouped bar chart

barplot(heatTable, beside = TRUE)

Let’s set the y-axis limits so that every bar fits comfortably.

barplot(heatTable, beside = TRUE, ylim = c(0, 30))

Our chart could do with some colour!

barplot(heatTable, beside = TRUE, ylim = c(0, 30),
        col = c("lightblue", "brown", "red"))

What do the colours represent? We need a legend…

barplot(heatTable, beside = TRUE, ylim = c(0, 30),
        col = c("lightblue", "brown", "red"), legend = TRUE)

Finally, we need to give this chart a title and a y-axis label.

barplot(heatTable, beside = TRUE, ylim = c(0, 30),
        col = c("lightblue", "brown", "red"), legend = TRUE, 
        main = "Ambient Cold/Heat in New York City, May - Sept. 1973",
        ylab = "No. of Days")
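With beside = TRUE, barplot() returns a matrix of bar mid-points with the same shape as the table (3 heat categories × 5 months). This sketch uses it to print each count above its bar; args.legend passes extra options to legend(), here only to shrink it slightly.

```r
df <- airquality
df$Mth.Name <- factor(month.name[df$Month], levels = month.name[5:9], ordered = TRUE)
df$Heat <- cut(df$Temp, breaks = c(55, 70, 85, 100),
               labels = c("Cold", "Normal", "Hot"),
               include.lowest = TRUE, ordered_result = TRUE)
heatTable <- table(df$Heat, df$Mth.Name)

# Grouped bars; mp holds the x position of every bar
mp <- barplot(heatTable, beside = TRUE, ylim = c(0, 30),
              col = c("lightblue", "brown", "red"), legend.text = TRUE,
              args.legend = list(x = "topright", cex = 0.8),
              ylab = "No. of Days")

# Write each count just above its bar (pos = 3 means "above")
text(as.vector(mp), as.vector(heatTable), labels = as.vector(heatTable),
     pos = 3, cex = 0.8)
```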

More fun plotting Air Quality Data

Scatter plots with fitted line

plot(
  df$Wind,
  df$Temp,
  main = "Wind vs. Temperature",
  ylab = "Temperature (Fahrenheit)",
  xlab = "Wind (mph)",
  pch = 16,
  col = "blue"
)
mod <- lm(Temp ~ Wind, data = df)
coef <- mod$coefficients
abline(coef = coef, col = "red", lwd = 2)
legend(
  "bottomleft",
  legend = paste(names(coef), round(coef, 2)),
  bg = "gray",
  cex = .7
)
grid(col = "darkgrey")
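The two numbers in the legend are the fitted intercept and slope. The slope is the estimated change in temperature, in degrees Fahrenheit, for each additional mph of wind, and its negative sign matches the downward trend in the scatter plot. coef() is the usual accessor for these values, and abline() also accepts the fitted model directly.

```r
df <- airquality
mod <- lm(Temp ~ Wind, data = df)

coef(mod)                  # (Intercept) and Wind slope
summary(mod)$r.squared     # share of temperature variance explained by wind

plot(df$Wind, df$Temp, pch = 16, col = "blue",
     xlab = "Wind (mph)", ylab = "Temperature (Fahrenheit)")
abline(mod, col = "red", lwd = 2)   # same line as abline(coef = coef(mod))
```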

plot(
  df$Wind,
  df$Ozone,
  main = "Wind vs. Ozone",
  ylab = "Ozone (ppb)",
  xlab = "Wind (mph)",
  pch = 16,
  col = "red"
)
mod1 <- lm(Ozone ~ Wind, data = df)
coef1 <- mod1$coefficients
abline(coef = coef1, col = "blue", lwd = 2)
legend(
  "topright",
  legend = paste(names(coef1), round(coef1, 2)),
  bg = "gray",
  cex = .7
)
grid(col = "darkgrey")
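Ozone has missing values, and under the default na.action setting (na.omit) lm() silently drops incomplete rows, so this second model is fitted to fewer days than the first. nobs() reports how many observations were actually used.

```r
df <- airquality
mod1 <- lm(Ozone ~ Wind, data = df)

sum(is.na(df$Ozone))   # days with no ozone reading
nobs(mod1)             # days used in the fit
nrow(df)               # all days in the data
```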

Study the code later…

cols <- c("blue", "green", "red", "black", "cyan")
plot(df$Ozone,
     df$Temp,
     main = "Ozone vs. Temperature",
     ylab = expression("Temperature (" * degree * "F)"),
     xlab = "Ozone (ppb)",
     pch = 2,
     col = sapply(df$Mth.Name, function(s)
       switch(as.character(s),
              May = cols[1],
              June = cols[2],
              July = cols[3],
              August = cols[4],
              September = cols[5])))

legend("bottomright", legend = levels(df$Mth.Name), fill = cols, cex = .7)

grid()
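The sapply()/switch() construction looks up one colour per point. Because Mth.Name is a factor whose levels are in the same order as cols, a shorter, vectorised alternative is to index cols by the factor's integer codes; the check below confirms that both give the same colours.

```r
df <- airquality
df$Mth.Name <- factor(month.name[df$Month], levels = month.name[5:9], ordered = TRUE)
cols <- c("blue", "green", "red", "black", "cyan")

# Original approach: one switch() call per observation
by.switch <- sapply(df$Mth.Name, function(s)
  switch(as.character(s),
         May = cols[1], June = cols[2], July = cols[3],
         August = cols[4], September = cols[5]))

# Vectorised approach: level 1 -> cols[1], level 2 -> cols[2], ...
by.index <- cols[as.integer(df$Mth.Name)]

identical(by.switch, by.index)   # TRUE
```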

In conclusion

  • R’s graphics capabilities are one of its greatest strengths.
  • We have only scratched the surface.
  • The key to learning to plot in R is practice and plenty of experimentation.
  • Click here to visit a very useful blog post on R plots

Fin