The ggplot2 Plotting System: Part 1

with(airquality, {
        plot(Temp, Ozone)
        lines(loess.smooth(Temp, Ozone))
})

library(ggplot2)
ggplot(airquality, aes(Temp, Ozone)) +
        geom_point() +
        geom_smooth(method = "loess", se = FALSE)
## Warning: Removed 37 rows containing non-finite values (stat_smooth).
## Warning: Removed 37 rows containing missing values (geom_point).
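
The two warnings above come from missing values in the data rather than from a problem with the plot code. A minimal sketch of how one might check this before plotting, using only base R (the variable names n_missing_ozone and n_incomplete are chosen here for illustration):

```r
# airquality ships with base R (datasets package), so no extra packages are needed.

# Count the missing Ozone readings.
n_missing_ozone <- sum(is.na(airquality$Ozone))

# Count rows where either plotted variable is missing; these are the rows
# that ggplot2 drops before drawing the points and the smoother.
n_incomplete <- sum(!complete.cases(airquality[, c("Temp", "Ozone")]))

print(n_missing_ozone)
print(n_incomplete)
```

Both counts should match the number of rows reported in the warnings, since Temp has no missing values in this dataset.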

The Basics: qplot()

Before You Start: Label Your Data

It is always good practice, and particularly useful with ggplot2, to use informative and descriptive labels on your data. More generally, your data should carry enough metadata that you can look at a dataset and quickly tell what each variable is and what its values mean.

This means that each column of a data frame should have a meaningful (but concise) variable name that accurately reflects the data stored in that column. Also, non-numeric or categorical variables should be coded as factor variables and have meaningful labels for each level of the factor.
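
As a rough illustration of this advice, the sketch below recodes a terse character column as a factor with descriptive level labels. The data frame vehicles and its values are made up for this example and are not part of the chapter's data.

```r
# Hypothetical data frame for illustration only (not from the chapter).
vehicles <- data.frame(
  drive   = c("f", "r", "4", "f"),
  mileage = c(29, 24, 19, 31),
  stringsAsFactors = FALSE
)

# Recode the one-character codes as a factor whose levels carry
# descriptive labels, so plot legends and panel titles are readable.
vehicles$drive <- factor(
  vehicles$drive,
  levels = c("f", "r", "4"),
  labels = c("front-wheel", "rear-wheel", "four-wheel")
)

print(levels(vehicles$drive))
print(table(vehicles$drive))
```

Because ggplot2 takes legend entries and facet labels directly from the factor levels, doing this once on the data avoids relabeling each plot by hand.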

ggplot2 “Hello, world!”

library(ggplot2)
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame':    234 obs. of  11 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ model       : chr  "a4" "a4" "a4" "a4" ...
##  $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr  "p" "p" "p" "p" ...
##  $ class       : chr  "compact" "compact" "compact" "compact" ...
qplot(displ, hwy, data = mpg)

Modifying aesthetics

qplot(displ, hwy, data = mpg, color = drv)

Adding a geom

Sometimes it’s nice to add a smoother to a scatterplot to highlight any trends. Trends can be difficult to see if the data are very noisy or if there are many data points obscuring the view. A smoother is a “geom” that you can add along with your data points.

qplot(displ, hwy, data = mpg, geom = c("point", "smooth"))

qplot(displ, hwy, data = mpg, color = drv, geom = c("point", "smooth"))
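
For comparison, the same plot can be written as explicit layers with ggplot(), which may be useful because recent ggplot2 releases mark qplot() as deprecated. This is a sketch of one equivalent form, not the only one; the object name p is chosen here for illustration.

```r
library(ggplot2)

# Map engine displacement to x, highway mileage to y, and drive class to
# color. Aesthetics set in ggplot() are inherited by every layer.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +    # one point per car
  geom_smooth()     # one smoother per drv group, default smoothing method

print(p)
```

Because color is mapped in the call to ggplot(), the smoother is fit separately for each drive class, just as in the qplot() call above.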

Histograms

qplot(hwy, data = mpg, fill = drv, binwidth = 2)

qplot(drv, hwy, data = mpg, geom = "boxplot")

Facets

Facets are a way to create multiple panels of plots based on the levels of a categorical variable. Here, we want to see a histogram of the highway mileages, with one panel for each level of the drive class variable. We can do that using the facets argument to qplot().

qplot(hwy, data = mpg, facets = drv ~ ., binwidth = 2)

qplot(displ, hwy, data = mpg, facets = . ~ drv)

qplot(displ, hwy, data = mpg, facets = . ~ drv) + geom_smooth()
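
The facets formula reads as rows ~ columns, with a dot meaning "no split in this direction". A sketch of the same panel layouts written with ggplot() and facet_grid() follows; the object names p_rows and p_cols are chosen here for illustration.

```r
library(ggplot2)

# One row of panels per drive class, analogous to facets = drv ~ .
p_rows <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_grid(drv ~ .)

# One column of panels per drive class, analogous to facets = . ~ drv,
# with a smoother added to each panel.
p_cols <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth() +
  facet_grid(. ~ drv)

print(p_rows)
print(p_cols)
```

Stacking the histograms in rows keeps them on a shared x-axis, which makes the mileage distributions easier to compare across drive classes.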

Case Study: MAACS Cohort

  1. The children all had persistent asthma, defined as having had an exacerbation in the past year.

# Load the DF for the MAACS data
load("maacs.rda")

# Spot Check
head(maacs)
##   id eno duBedMusM   pm25 mopos
## 1  1 141      2423 15.560   yes
## 2  2 124      2793 34.370   yes
## 3  3 126      3055 38.953   yes
## 4  4 164       775 33.249   yes
## 5  5  99      1634 27.060   yes
## 6  6  68       939 18.890   yes
# Check the structure of the data
str(maacs)
## 'data.frame':    750 obs. of  5 variables:
##  $ id       : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ eno      : num  141 124 126 164 99 68 41 50 12 30 ...
##  $ duBedMusM: num  2423 2793 3055 775 1634 ...
##  $ pm25     : num  15.6 34.4 39 33.2 27.1 ...
##  $ mopos    : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...

The key variables are:

qplot(log(eno), data = maacs)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 108 rows containing non-finite values (stat_bin).

qplot(log(eno), data = maacs, fill = mopos)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 108 rows containing non-finite values (stat_bin).

qplot(log(eno), data = maacs, geom = "density")
## Warning: Removed 108 rows containing non-finite values (stat_density).

qplot(log(eno), data = maacs, geom = "density", color = mopos)
## Warning: Removed 108 rows containing non-finite values (stat_density).

qplot(log(pm25), log(eno), data = maacs, geom = c("point", "smooth"))
## Warning: Removed 184 rows containing non-finite values (stat_smooth).
## Warning: Removed 184 rows containing missing values (geom_point).

qplot(log(pm25), log(eno), data = maacs, shape = mopos)
## Warning: Removed 184 rows containing missing values (geom_point).

qplot(log(pm25), log(eno), data = maacs, color = mopos)
## Warning: Removed 184 rows containing missing values (geom_point).

qplot(log(pm25), log(eno), data = maacs, color = mopos) + geom_smooth(method = "lm")
## Warning: Removed 184 rows containing non-finite values (stat_smooth).
## Warning: Removed 184 rows containing missing values (geom_point).

qplot(log(pm25), log(eno), data = maacs, facets = . ~ mopos) + geom_smooth(method = "lm")
## Warning: Removed 184 rows containing non-finite values (stat_smooth).
## Warning: Removed 184 rows containing missing values (geom_point).

Summary of qplot()