Package Overview

The Lattice package is a tool for data visualization, written to provide a better alternative to base R graphics. Through the use of trellis graphics, Lattice allows the user to more easily display relationships within multivariate data. This focus makes it easy to produce “small multiple” plots, which are multiple charts arranged on a single grid. These are useful for comparing the same relationship across the levels of another variable.

Lattice includes functions that display multivariate relationships through quantile plots and matrix plots, as well as a wide range of traditional plots like bar charts and histograms.
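As a minimal sketch of a small multiple plot, the following uses the built-in mtcars data set, which is an assumption chosen for illustration and is not used elsewhere in this document. The variable after the | symbol splits the data, and each level gets its own chart on the grid.

```r
library(lattice)

# mtcars ships with base R; cyl (number of cylinders) defines the charts.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),   # 3 columns, 1 row
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
print(p)
```

All three charts share the same axes, which is what makes comparison across the levels of cyl straightforward.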

Version History

V0.2-3 through V0.20-40 are currently available, with V0.20-40 being the most recent version.

How it works

  • Lattice plots data onto panels. Input data is split into subsets called packets, one for each combination of levels of the conditioning variables, and each packet is drawn in its own panel by the panel function. Every high-level function has its own default panel function, which can be replaced to customize how each packet is displayed.
    • For example, in a formula of the form y ~ x | z, z is the conditioning variable that defines the packets.
  • High-level Lattice functions do not calculate the relationships between input variables, and they do not draw directly. They return an object, and the graph appears only when that object is printed, which happens automatically when the function is called at the console.
    • Note: Calling a Lattice function inside another function suppresses automatic plotting, so the result must be printed explicitly with print().
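The points above can be sketched as follows. This is an illustration under stated assumptions rather than code from our analysis: it uses the built-in mtcars data set, and draw_plot is a hypothetical helper name.

```r
library(lattice)

# The high-level call returns an object; nothing is drawn at this point.
# The custom panel function is called once per packet (one per level of cyl).
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            panel = function(x, y, ...) {
              panel.grid(h = -1, v = -1)  # reference grid aligned with the ticks
              panel.xyplot(x, y, ...)     # default drawing of this packet
            })
class(p)

# Inside a function the object is not printed automatically,
# so print() has to be called explicitly.
draw_plot <- function(plot_object) {
  print(plot_object)
  invisible(NULL)
}
draw_plot(p)
```

Storing the object first also makes it possible to print the same plot more than once, for example to different devices.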

Dependencies

Lattice depends on the following packages:

  • grid
  • grDevices
  • graphics
  • stats
  • utils
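The installed version and the declared dependencies can be checked from within R. The sketch below assumes lattice is installed and reads its description file with utils::packageDescription; the variable names desc and imports are only for illustration.

```r
# Read the metadata of the installed lattice package.
desc <- utils::packageDescription("lattice")

# Version of lattice installed on this machine.
desc$Version

# The Imports field is a single comma-separated string; split it into names.
imports <- trimws(strsplit(desc$Imports, ",")[[1]])
imports
```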

Examples of Usage

Dataset

For the following examples, we are using the USArrests data set included in base R. We also created three additional variables to make graphing easier; the code that creates them is not included in this document.

head(USArrests)
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
## Colorado      7.9     204       78 38.7
head(USArrests_Final)
##   Murder Assault UrbanPop Rape Region Total_Crime
## 1    2.2      48       32 11.2      1        61.4
## 2    5.7      81       39  9.3      2        96.0
## 3   13.0     337       45 16.1      2       366.1
## 4   14.4     279       48 22.5      2       315.9
## 5    8.8     190       50 19.5      2       218.3
## 6   16.1     259       44 17.1      2       292.2
##           UrbanPop_Quartile
## 1 Minority Urban Population
## 2 Minority Urban Population
## 3 Minority Urban Population
## 4 Minority Urban Population
## 5 Minority Urban Population
## 6 Minority Urban Population
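
Since the code for the derived variables is not shown, here is a rough sketch of how similar variables could be built. It is not the code we used. Total_Crime as the sum of Murder, Assault, and Rape is consistent with the rows printed above, but the region coding (taken here from the built-in state.region) and the quartile labels Q1 to Q4 are assumptions, and arrests_sketch is a hypothetical name.

```r
arrests_sketch <- USArrests

# Sum of the three arrest rates (matches the Total_Crime values shown above).
arrests_sketch$Total_Crime <- with(arrests_sketch, Murder + Assault + Rape)

# Assumption: use the census regions that ship with base R.
arrests_sketch$Region <- state.region[match(rownames(arrests_sketch), state.name)]

# Assumption: quartiles of UrbanPop with generic labels.
breaks <- quantile(arrests_sketch$UrbanPop, probs = seq(0, 1, by = 0.25))
arrests_sketch$UrbanPop_Quartile <- cut(arrests_sketch$UrbanPop,
                                        breaks = breaks,
                                        include.lowest = TRUE,
                                        labels = c("Q1", "Q2", "Q3", "Q4"))

head(arrests_sketch)
```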

3D Scatter Plot

library(lattice)

# used to show the relationship between 3 variables
# ~ usually specifies y ~ x or "y as a function of x"
# cloud requires a formula in the form of z ~ x * y

cloud(Murder ~ Assault * UrbanPop, data = USArrests,
      screen = list(x = -90, y = 70), distance = .4, zoom = .7)

Parallel Plot

# used for comparing many variables and seeing relationships between them
# each line represents a state
# easy to put together, & get quick feel for data

parallelplot(USArrests)

Box Plot

# box plots can be displayed to easily compare groups of univariate data

bwplot(Region ~ UrbanPop, data = USArrests_Region,
       xlab = "Urban Population", ylab = "Region")

Strip Plot

# strip plots are used for displaying univariate data (one variable)
# lattice makes simple plots like this fast and easy

stripplot(Region ~ UrbanPop, data = USArrests_Region,
          xlab = "Urban Population", ylab = "Region")

Dot Plot

# dot plots are commonly used for multivariate data.
# lattice makes it easy to sort data into groups
#   numeric values of y are graphed against qualitative levels of x
# aesthetics are difficult to manipulate

dotplot(Murder ~ UrbanPop_Quartile | Region, data = USArrests_Final,
        layout = c(5,1),
        xlab = "Region", ylab = "Murder Arrests per 100,000")

# another possible dot plot example? 
# dotplot(Region ~ UrbanPop, data = USArrests_Region,
#          xlab = "Urban Population", ylab = "Region")

Bar Chart

# example of small multiple plots
# bar charts can be used for both univariate and multivariate data

A <- barchart(Region ~ Assault,  data = USArrests_Final,
         groups = UrbanPop_Quartile,
         auto.key = TRUE,
         main = "Assault by Region",
         xlab = "Assault",
         ylab = "Region")
R <- barchart(Region ~ Rape,  data = USArrests_Final,
         groups = UrbanPop_Quartile,
         auto.key = TRUE,
         main = "Rape by Region",
         xlab = "Rape",
         ylab = "Region")
M <- barchart(Region ~ Murder,  data = USArrests_Final,
         groups = UrbanPop_Quartile,
         auto.key = TRUE,
         main = "Murder by Region",
         xlab = "Murder",
         ylab = "Region")

print(R, split = c(1, 1, 2, 2), more = TRUE)
print(A, split = c(2, 1, 2, 2), more = TRUE)
print(M, split = c(1, 2, 2, 2), more = FALSE)
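
The split argument used above takes the form c(column, row, number of columns, number of rows), and more = TRUE tells Lattice that more plots will follow on the same page. As a smaller sketch of the same technique, the following places two plots side by side. It uses the built-in mtcars data set, which is an assumption for illustration only.

```r
library(lattice)

h <- histogram(~ mpg, data = mtcars, xlab = "Miles per gallon")
b <- bwplot(~ mpg, data = mtcars, xlab = "Miles per gallon")

# One row with two columns: histogram on the left, box plot on the right.
print(h, split = c(1, 1, 2, 1), more = TRUE)
print(b, split = c(2, 1, 2, 1), more = FALSE)
```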

Two sample quantile plot

# two sample quantile plots are used to determine if two data sets come from populations with a similar distribution
# this example illustrates how to use the qq function

# Total Crime rates from Minority Urban and Majority Urban Populations
#   given the limited number of observations in each group, actual quantiles have not been calculated or graphed against each other
#   thus no conclusions can be drawn

qq(UrbanPop_Quartile ~ Total_Crime, 
   aspect = 1, 
   data = USArrests_Final,
   subset = (UrbanPop_Quartile == "Majority Urban Population" | UrbanPop_Quartile == "Minority Urban Population"))
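
To make the idea behind a two sample quantile plot concrete, the sketch below computes matching quantiles for two groups by hand and plots them against each other. It uses the built-in mtcars data set (an assumption for illustration), and the probabilities in probs are an arbitrary choice, so the points will not necessarily coincide with the ones qq picks.

```r
library(lattice)

# Two groups: fuel economy for automatic (am == 0) and manual (am == 1) cars.
mpg_automatic <- mtcars$mpg[mtcars$am == 0]
mpg_manual    <- mtcars$mpg[mtcars$am == 1]

# Quantiles of each group at the same probabilities.
probs <- seq(0.1, 0.9, by = 0.1)
q_automatic <- quantile(mpg_automatic, probs)
q_manual    <- quantile(mpg_manual, probs)

# Points close to the line y = x suggest similar distributions.
manual_qq <- xyplot(q_manual ~ q_automatic, aspect = 1,
                    panel = function(x, y, ...) {
                      panel.abline(a = 0, b = 1)
                      panel.xyplot(x, y, ...)
                    })
print(manual_qq)

# The corresponding lattice call on the same data.
print(qq(factor(am) ~ mpg, data = mtcars, aspect = 1))
```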

Compared to ggplot

xyplot(Assault ~ UrbanPop,
       groups=Region,
       data=USArrests_Final, 
       xlab="Urban Population", 
       auto.key = TRUE          # add legend 
       # sub =                  # add subtitle to each graph
       # main =                 # set main title
       )

library(ggplot2)

xy_ggplot <- ggplot(
  data=USArrests_Final, 
  mapping=aes(
    x=UrbanPop,
    y=Assault
  )
) 

xy_ggplot <- xy_ggplot + 
  geom_point(aes(color=Region)) + 
  labs(x="Urban Population", y="Assault")

xy_ggplot

Similar Packages

  • ggplot2 - popular because it makes customized charts easy to create in R. It also makes it easy to build complicated, multi-layered graphs.
  • highcharter - similar to ggplot and lattice, except that it is based on a pre-existing JavaScript library (Highcharts), and is well known for its easy-to-customize visual themes.
  • RColorBrewer - another package, used specifically for the color palettes it provides for creating aesthetically appealing graphs, based on the work of cartographer Cynthia Brewer.

Reflection

Coming from a background with ggplot, we found lattice difficult to pick up. Many of the graphing functions offer lots of parameters, which makes it feel like you are setting up the entire graph at once, instead of layer by layer as in ggplot. Lattice also does not make it particularly easy to edit the aesthetics of images, which we learned the hard way when creating our bar chart and dot plot. Many of the parameter names also did not feel intuitive, and we had to look them up in the documentation. While ggplot and lattice are very similar, one source claims that lattice is faster than ggplot and also allows users to fine-tune their graphs more than ggplot does. This might be true, but for the scope of this class and the graphs we are making, we are not sure these advantages are worthwhile.

Possible improvements include more intuitive parameter names and features that make aesthetics easier to manipulate. Overall, we felt that lattice is more difficult to learn than ggplot but offers many of the same features.