Plotting in R using Lattice


Library

Install the package -

install.packages("lattice")

Load the package -

library(lattice)

Histogram

Let’s use the USCancerRates dataset from the latticeExtra package (install it with install.packages("latticeExtra") if needed) -

data(USCancerRates, package = "latticeExtra")
str(USCancerRates)
'data.frame':   3041 obs. of  8 variables:
 $ rate.male   : num  364 346 341 336 330 ...
 $ LCL95.male  : num  311 274 304 289 293 ...
 $ UCL95.male  : num  423 431 381 389 371 ...
 $ rate.female : num  151 140 182 185 172 ...
 $ LCL95.female: num  124 103 161 157 151 ...
 $ UCL95.female: num  184 190 206 218 195 ...
 $ state       : Factor w/ 49 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ county      : 'AsIs' chr  "Pickens County" "Bullock County" "Russell County" "Barbour County" ...

Make a simple histogram -

histogram(x = ~ rate.male, data = USCancerRates)

By default, the y-axis of histogram() shows the percentage of observations in each bin (relative frequency), not the count.

The same histogram using base R -

hist(USCancerRates$rate.male)

The two outputs differ in the following ways -

  1. Visual appearance (colors, etc.)
  2. The y-axes represent different quantities (percent in histogram(), counts in hist())
  3. Bin boundaries
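One way to make the two histograms directly comparable is to give both the same bin boundaries and the same y-axis quantity. The sketch below assumes the built-in airquality dataset (Temp) as a stand-in for USCancerRates, so it runs without latticeExtra; the bin width of 5 is an arbitrary choice.

```r
library(lattice)

# Shared, explicit bin boundaries (arbitrary choice covering Temp's range 56-97)
br <- seq(50, 100, by = 5)

# Base R: compute the bins without drawing
base_h <- hist(airquality$Temp, breaks = br, plot = FALSE)

# lattice: same breaks, and counts on the y-axis instead of the default percent
p_lat <- histogram(~ Temp, data = airquality, breaks = br, type = "count")

base_h$counts    # bin counts that hist() would draw
# print(p_lat)   # draw the lattice version for a visual comparison
```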

Adding a title and axis labels -

histogram(x = ~ rate.male, data = USCancerRates,
          main = "County-wise deaths due to cancer (1999-2003)",
          xlab = "Rate among males (per 100,000)")

Specifying the number of intervals (bins) with nint -

histogram(x = ~ rate.male, data = USCancerRates,
          nint = 30)

In histogram(), the optional argument type controls what is plotted on the y-axis. It can take three values:

  1. “percent”, the default, gives the percentage of observations in each bin (relative frequency).
  2. “count” gives the bin count, which is what hist() shows by default.
  3. “density” gives a density histogram (the bar areas sum to 1).

histogram(x = ~ rate.male, data = USCancerRates,
          nint = 30, type = "density")

histogram(x = ~ rate.male, data = USCancerRates,
          nint = 30, type = "count")
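The three values of type differ only by a rescaling of the same bin counts: percent = 100 * count / n and density = count / (n * bin width), so the bars of a density histogram have total area 1. A minimal sketch of that arithmetic, using base R's hist() to get the counts (airquality$Temp and the 5-unit bins are assumptions chosen for illustration):

```r
# Bin counts from base R, without plotting
x <- airquality$Temp
h <- hist(x, breaks = seq(50, 100, by = 5), plot = FALSE)

n       <- length(x)            # number of observations
width   <- diff(h$breaks)       # bin widths (all 5 here)
count   <- h$counts             # what type = "count" shows
percent <- 100 * count / n      # what type = "percent" shows
dens    <- count / (n * width)  # what type = "density" shows

sum(percent)        # 100
sum(dens * width)   # 1: total area of the density bars
```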

Scatterplot

Make a simple scatterplot -

xyplot(rate.female ~ rate.male, data = USCancerRates)

To add axis labels -

xyplot(rate.female ~ rate.male, data = USCancerRates,
       xlab = "Rate among males (per 100,000)",
       ylab = "Rate among females (per 100,000)")

Adding a reference grid and the line y = x (abline = c(0, 1) means intercept 0, slope 1) -

xyplot(rate.female ~ rate.male, data = USCancerRates,
       abline = c(0,1), grid = TRUE)

Adding a linear regression line with a custom panel function -

xyplot(rate.female ~ rate.male, data = USCancerRates,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)   # draw the points
         panel.abline(lm(y ~ x))   # add the least-squares line
       })
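lattice also has a shortcut for this: in panel.xyplot(), type = "r" adds a least-squares regression line, so type = c("p", "r") draws the points plus the fitted line without a custom panel function. The sketch below uses the built-in airquality data as a stand-in and fits the same model with lm() to show which line is drawn.

```r
library(lattice)

# Points ("p") plus least-squares regression line ("r")
p_fit <- xyplot(Ozone ~ Temp, data = airquality,
                type = c("p", "r"), grid = TRUE)

# The same straight-line fit, computed explicitly (rows with NA Ozone are dropped)
fit <- lm(Ozone ~ Temp, data = airquality)
coef(fit)   # intercept and slope of the drawn line
```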

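Customizing the legend (this example uses the built-in airquality dataset, grouped by Month) -

xyplot(Ozone ~ Temp, data = airquality, groups = Month,
       # Legend on the right, with a title and month names instead of 5-9
       auto.key = list(space = "right", 
                       title = "Month", 
                       text = month.name[5:9]))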
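The labels in text are matched to the groups in the order of their sorted values, so hard-coding month.name[5:9] works only because Month takes the values 5 to 9. A slightly more robust sketch derives the labels from the data (the names months, labels and p_leg are illustrative):

```r
library(lattice)

# Distinct group values, in the order used for the legend entries
months <- sort(unique(airquality$Month))   # 5 6 7 8 9
labels <- month.name[months]               # "May" ... "September"

p_leg <- xyplot(Ozone ~ Temp, data = airquality, groups = Month,
                auto.key = list(space = "right", title = "Month",
                                text = labels))
```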

Conditioned scatterplot -

# Create 'state.ordered': the levels of state reordered by the mean of
# rate.male + rate.female within each state
library(dplyr)
USCancerRates <- 
  mutate(USCancerRates, 
         state.ordered = reorder(state, 
                                 rate.male + rate.female, 
                                 mean, na.rm = TRUE))

# Create conditioned scatter plot; the panel function passes '...' on to
# panel.xyplot() so that grid = TRUE actually takes effect
xyplot(rate.female ~ rate.male | state.ordered,
       data = USCancerRates, 
       grid = TRUE, 
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(lm(y ~ x))
       })
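reorder() keeps the data but re-sorts the factor levels by a summary (here the mean) of another variable, which is what puts the state panels in order of their combined cancer rate. A toy sketch with made-up values (g and v are hypothetical):

```r
# Hypothetical toy data: three groups with different means
g <- factor(c("a", "a", "b", "b", "c", "c"))
v <- c(10, 12, 1, 3, 5, 7)   # group means: a = 11, b = 2, c = 6

g2 <- reorder(g, v, mean)
levels(g2)               # "b" "c" "a": sorted by increasing mean
attr(g2, "scores")       # the group means used for sorting
```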

In a conditioned lattice plot, the panels are by default drawn starting from the bottom-left position, going right and then up. This is patterned on the Cartesian coordinate system where the x-axis increases to the right and the y-axis increases from bottom to top.

Often we want to change this so that the layout is similar to a matrix or table, where rows start at the top. The layout of any conditioned lattice plot can be changed to follow this scheme by adding the optional argument as.table = TRUE.

xyplot(rate.female ~ rate.male | state.ordered,
       data = USCancerRates, 
       grid = TRUE, 
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.abline(lm(y ~ x))
       },
       as.table = TRUE)

Density plot

Let’s use the ‘airquality’ dataset, which ships with R in the datasets package -

data(airquality)
str(airquality)
'data.frame':   153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

Create a density plot -

densityplot(~ Ozone, data = airquality)

A useful optional argument for densityplot() is plot.points, which can take the values -

  1. TRUE, the default, to plot the data points along the x-axis in addition to the density;
  2. FALSE, to suppress plotting the data points;
  3. “jitter”, to plot the points along the x-axis with some random jittering in the y-direction so that overlapping points are easier to see; and
  4. “rug”, to draw the points as a rug of short ticks along the x-axis.

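The examples that follow show TRUE and FALSE only. According to the lattice documentation for panel.densityplot(), plot.points also accepts the character values "jitter" and "rug"; a short sketch that just swaps in those values:

```r
library(lattice)

# Points jittered vertically so ties at the same Ozone value are visible
p_jitter <- densityplot(~ Ozone, data = airquality, plot.points = "jitter")

# Points drawn as a rug of ticks along the x-axis
p_rug <- densityplot(~ Ozone, data = airquality, plot.points = "rug")

# print(p_jitter); print(p_rug)   # draw them
```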
densityplot(~ Ozone, data = airquality,
    plot.points = TRUE)

densityplot(~ Ozone, data = airquality,
    plot.points = FALSE)

Box and Whisker Plot

Creating a box and whisker plot -

bwplot(x = ~ rate.male, data = USCancerRates)

Creating box and whisker plots by a factor (one box per state) -

bwplot(state ~ rate.male, data = USCancerRates)

Reordering the states by their median male rate -

bymedian <- with(USCancerRates, reorder(state, rate.male, median, na.rm = TRUE))
bwplot(bymedian ~ rate.male, data = USCancerRates)
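A quick way to check that reorder() did what was intended is to compute the per-level medians in the new level order and confirm that they are non-decreasing. The sketch uses the built-in InsectSprays dataset as a stand-in (spray plays the role of state, count that of rate.male):

```r
library(lattice)

# Reorder spray levels by the median insect count
bymed <- with(InsectSprays, reorder(spray, count, median))
p_box <- bwplot(bymed ~ count, data = InsectSprays)

# Medians in the new level order; they should be sorted
meds <- tapply(InsectSprays$count, bymed, median)
meds
!is.unsorted(meds)   # TRUE
```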

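Changing the strip labels. With outer = TRUE the panels follow the order of the terms in the formula (rate.female first, then rate.male), so the labels must be given in that order -

# Create box-and-whisker plot, one panel per variable
bwplot(state.ordered ~ rate.female + rate.male,
       data = USCancerRates, 
       outer = TRUE, 
       xlab = "Rate (per 100,000)", 
       # Add strip labels, in the same order as the formula terms
       strip = strip.custom(factor.levels = c("Female", "Male")))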
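dimnames() of a trellis object shows the panel order, which is the order in which strip labels are matched. A small check on the built-in iris data (pl_iris and panel_names are illustrative names):

```r
library(lattice)

pl_iris <- bwplot(Species ~ Sepal.Length + Petal.Length,
                  data = iris, outer = TRUE)

# Panels follow the order of the terms in the formula
panel_names <- dimnames(pl_iris)[[1]]
panel_names   # "Sepal.Length" "Petal.Length"

# So the custom strip labels are given in that same order
pl_iris <- update(pl_iris,
                  strip = strip.custom(factor.levels = c("Sepal", "Petal")))
```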

Using the plot as an object -

pl <- bwplot(state.ordered ~ rate.female + rate.male,
       data = USCancerRates, 
       outer = TRUE, 
       xlab = "Rate (per 100,000)")
pl

class(pl)
[1] "trellis"
summary(pl)

Call:
bwplot(state.ordered ~ rate.female + rate.male, data = USCancerRates, 
    outer = TRUE, xlab = "Rate (per 100,000)")

Number of observations:
rate.female   rate.male 
       3041        3041 
dimnames(pl)
[[1]]
[1] "rate.female" "rate.male"  

Updating a trellis object (update() returns a modified copy; pl itself is unchanged) -

update(pl, strip = strip.custom(factor.levels = c("Women", "Men")))
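Because update() returns a new trellis object rather than modifying its argument, the result has to be assigned (or printed) to be used. A minimal sketch with the built-in iris data (p0 and p1 are illustrative names):

```r
library(lattice)

p0 <- xyplot(Sepal.Width ~ Sepal.Length, data = iris)
p1 <- update(p0, main = "Iris sepals")   # modified copy

p1$main   # "Iris sepals"
p0$main   # unchanged: update() did not modify p0
```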

Another way to change the labels is to assign new dimnames, again in the order rate.female, rate.male -

dimnames(pl)[[1]] <- c("Female", "Male")

A trellis object can be subset like a matrix -

pl[2, ]  # only males (the second panel, rate.male)

Conditioning/Facetting

Conditioning a scatterplot on Species (iris dataset) -

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
xyplot(Sepal.Width ~ Sepal.Length | Species,   # facet by Species
       iris, grid = TRUE)

Conditioning a density plot of weight on group -

str(PlantGrowth)
'data.frame':   30 obs. of  2 variables:
 $ weight: num  4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ...
 $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
densityplot( ~ weight | group, PlantGrowth)

Plotting two different variables in separate panels -

histogram( ~ rate.male + rate.female, USCancerRates,
           outer = TRUE)

Notice that rate.male and rate.female are two different columns in the dataset, which means that USCancerRates is not a tidy (long-format) data frame. lattice, unlike ggplot2, lets you plot such wide-format data directly by combining the columns in the formula.
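For comparison, the long (tidy) layout that ggplot2 expects can be produced with base R's stack(), and lattice then conditions on the stacked indicator instead of adding columns in the formula. A sketch on the built-in iris data as a stand-in for the two rate columns (wide, long, p_long and p_wide are illustrative names):

```r
library(lattice)

# Wide: two numeric columns side by side
wide <- iris[, c("Sepal.Length", "Petal.Length")]

# Long: one 'values' column plus an 'ind' factor naming the source column
long <- stack(wide)

# The same two density panels (panel order may differ), from long vs. wide data
p_long <- densityplot(~ values | ind, data = long, plot.points = FALSE)
p_wide <- densityplot(~ Sepal.Length + Petal.Length, data = iris,
                      outer = TRUE, plot.points = FALSE)
```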

densityplot(~ rate.male + rate.female,
    data = USCancerRates, 
    plot.points = FALSE)    # Suppress data points

With outer = TRUE, each variable gets its own panel -

densityplot(~ rate.male + rate.female,
    data = USCancerRates, 
    outer = TRUE,
    plot.points = FALSE)    # Suppress data points

Changing the layout, given as layout = c(columns, rows) -

densityplot( ~ rate.male + rate.female, USCancerRates,
             outer = TRUE, layout = c(1, 2))  # 1 column, 2 rows

Computing the median rate per state for each gender, as a long-format summary data frame -

USCancerRates.state <- with(USCancerRates, {    
  rmale <- tapply(rate.male, state, median, na.rm= TRUE)    
  rfemale <- tapply(rate.female, state, median, na.rm= TRUE)  
  data.frame(
    Rate = c(rmale, rfemale),
    State = rep(names(rmale), 2),
    Gender = rep(c("Male", "Female"), each = length(rmale))
    )
  })
USCancerRates.state <- dplyr::mutate(USCancerRates.state,
                                     State = reorder(State, Rate))
head(USCancerRates.state, 10)
     Rate       State Gender
1  286.00     Alabama   Male
2  237.95      Alaska   Male
3  209.30     Arizona   Male
4  284.10    Arkansas   Male
5  221.30  California   Male
6  204.40    Colorado   Male
7  228.55 Connecticut   Male
8  268.25    Delaware   Male
9  250.20     Florida   Male
10 280.80     Georgia   Male

Conditioning by gender -

xyplot(State ~ Rate | Gender, USCancerRates.state, grid = TRUE)

Grouping by gender -

xyplot(State ~ Rate, groups = Gender, data = USCancerRates.state, grid = TRUE)

To add a legend -

xyplot(State ~ Rate, groups = Gender, data = USCancerRates.state, 
       grid = TRUE,
       auto.key = TRUE)

Positioning and formatting the legend -

xyplot(State ~ Rate, groups = Gender, data = USCancerRates.state, 
       grid = TRUE,
       auto.key=list(space="bottom", columns = 2,
                     title=NULL, cex.title = 1))

After adding state.ordered, USCancerRates has 9 variables -

str(USCancerRates)
'data.frame':   3041 obs. of  9 variables:
 $ rate.male    : num  364 346 341 336 330 ...
 $ LCL95.male   : num  311 274 304 289 293 ...
 $ UCL95.male   : num  423 431 381 389 371 ...
 $ rate.female  : num  151 140 182 185 172 ...
 $ LCL95.female : num  124 103 161 157 151 ...
 $ UCL95.female : num  184 190 206 218 195 ...
 $ state        : Factor w/ 49 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ county       : 'AsIs' chr  "Pickens County" "Bullock County" "Russell County" "Barbour County" ...
 $ state.ordered: Factor w/ 49 levels "Utah","Colorado",..: 40 40 40 40 40 40 40 40 40 40 ...
  ..- attr(*, "scores")= num [1:49(1d)] 450 428 351 457 383 ...
  .. ..- attr(*, "dimnames")=List of 1
  .. .. ..$ : chr [1:49] "Alabama" "Alaska" "Arizona" "Arkansas" ...
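
Overlaying both densities in a single panel (outer = FALSE), with a legend and a reference line at zero -

densityplot(~ rate.male + rate.female,
    data = USCancerRates,
    outer = FALSE,                # both variables in one panel
    xlab = "Rate (per 100,000)",
    auto.key = TRUE,              # legend identifying each variable
    plot.points = FALSE,
    ref = TRUE)                   # reference line along y = 0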

