The Lattice package is a data visualization tool for R, written to provide a better alternative to base R graphics. Through the use of trellis graphics, Lattice makes it easier to display relationships within multivariate data. This focus on multivariate data lets it easily produce “small multiple” plots, which are several charts arranged on a single grid. These are useful for comparing subsets of the data side by side, with one panel per level of a grouping variable.
Lattice includes functions for displaying multivariate relationships through quantile plots and matrix plots, as well as a wide range of traditional plots such as bar charts and histograms.
At the time of writing, versions 0.2-3 through 0.20-40 are available, with 0.20-40 being the most recent.
The following examples use the USArrests data set included in base R. We also created three additional variables to make graphing easier; the code that creates them is not included in this document.
head(USArrests)
## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
## Alaska 10.0 263 48 44.5
## Arizona 8.1 294 80 31.0
## Arkansas 8.8 190 50 19.5
## California 9.0 276 91 40.6
## Colorado 7.9 204 78 38.7
head(USArrests_Final)
## Murder Assault UrbanPop Rape Region Total_Crime
## 1 2.2 48 32 11.2 1 61.4
## 2 5.7 81 39 9.3 2 96.0
## 3 13.0 337 45 16.1 2 366.1
## 4 14.4 279 48 22.5 2 315.9
## 5 8.8 190 50 19.5 2 218.3
## 6 16.1 259 44 17.1 2 292.2
## UrbanPop_Quartile
## 1 Minority Urban Population
## 2 Minority Urban Population
## 3 Minority Urban Population
## 4 Minority Urban Population
## 5 Minority Urban Population
## 6 Minority Urban Population
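Since the code that builds the derived columns is not shown, here is a minimal sketch of how similar columns could be created. Only `Total_Crime` can be checked against the printed output (for example, 2.2 + 48 + 11.2 = 61.4). The region coding and the quartile labels are assumptions: this sketch borrows the built-in `state.region` factor and uses placeholder labels, which will not match the original `Region` and `UrbanPop_Quartile` values.

```r
# Hypothetical reconstruction; the data frame name `arrests_sketch` is ours.
arrests_sketch <- USArrests

# Total_Crime: sum of the three arrest rates (consistent with the printed rows)
arrests_sketch$Total_Crime <- with(arrests_sketch, Murder + Assault + Rape)

# Region: ASSUMPTION, census regions from the built-in state.region factor,
# matched to the rows by state name
arrests_sketch$Region <- state.region[match(rownames(arrests_sketch), state.name)]

# UrbanPop_Quartile: ASSUMPTION, four bins cut at the quartiles of UrbanPop;
# the labels are placeholders, not the original ones
breaks <- quantile(arrests_sketch$UrbanPop, probs = seq(0, 1, by = 0.25))
arrests_sketch$UrbanPop_Quartile <- cut(
  arrests_sketch$UrbanPop,
  breaks = breaks,
  include.lowest = TRUE,
  labels = c("Q1 (least urban)", "Q2", "Q3", "Q4 (most urban)")
)

head(arrests_sketch)
```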
library(lattice)

# cloud() draws a 3D scatter plot, used to show the relationship between 3 variables
# ~ usually specifies y ~ x, or "y as a function of x"
# cloud() requires a formula of the form z ~ x * y
cloud(Murder ~ Assault * UrbanPop, data = USArrests,
      screen = list(x = -90, y = 70), distance = .4, zoom = .7)
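The `screen` argument controls the viewing angle: it is a list of rotations, in degrees, about the named axes. A rough sketch of how two different views of the same data could be compared side by side follows. The angle values and titles are arbitrary choices for illustration, not recommendations.

```r
library(lattice)

# Same formula in both plots; only the viewing angles differ.
# The angles below are arbitrary illustrative values.
view_a <- cloud(Murder ~ Assault * UrbanPop, data = USArrests,
                screen = list(z = 0, x = -90),
                main = "View A")
view_b <- cloud(Murder ~ Assault * UrbanPop, data = USArrests,
                screen = list(z = 40, x = -60),
                main = "View B")

# Place the two views in a 2-column, 1-row grid
print(view_a, split = c(1, 1, 2, 1), more = TRUE)
print(view_b, split = c(2, 1, 2, 1), more = FALSE)
```

Trying a few angles like this is often the quickest way to find a view where the points do not hide each other.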
# parallel coordinate plots are used for comparing many variables and seeing relationships between them
# each line represents one state
# easy to put together, and gives a quick feel for the data
parallelplot(USArrests)
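With 50 overlapping lines, a parallel plot can be hard to read. One possible refinement is to color the lines by group. The sketch below assumes census regions from the built-in `state.region` factor as the grouping, which is not necessarily the region coding used elsewhere in this document; `arrests_by_region` is a name introduced for illustration.

```r
library(lattice)

# ASSUMPTION: group states by census region (built-in state.region)
arrests_by_region <- USArrests
arrests_by_region$Region <-
  state.region[match(rownames(arrests_by_region), state.name)]

# Formula interface: ~ <data frame of the variables to draw>
# groups colors each state's line by its region
par_plot <- parallelplot(~ arrests_by_region[1:4],
                         data = arrests_by_region,
                         groups = Region,
                         horizontal.axis = FALSE,
                         auto.key = list(columns = 4, lines = TRUE, points = FALSE))
print(par_plot)
```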
# box plots can be displayed to easily compare groups of univariate data
bwplot(Region ~ UrbanPop, data = USArrests_Region,
       xlab = "Urban Population", ylab = "Region")
# strip plots are used for displaying univariate data (one variable)
# lattice makes simple plots like this fast and easy
stripplot(Region ~ UrbanPop, data = USArrests_Region,
xlab = "Urban Population", ylab = "Region")
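Because `USArrests_Region` is not defined in this document, the following is a self-contained sketch of the same two plots. It assumes census regions from the built-in `state.region` factor as a stand-in grouping. It also turns on `jitter.data` in the strip plot so that states with the same urban population do not overlap.

```r
library(lattice)

# ASSUMPTION: stand-in for USArrests_Region, using census regions
region_sketch <- USArrests
region_sketch$Region <- state.region[match(rownames(region_sketch), state.name)]

box_plot <- bwplot(Region ~ UrbanPop, data = region_sketch,
                   xlab = "Urban Population", ylab = "Region")

# jitter.data = TRUE adds a small random vertical offset to each point
strip_plot <- stripplot(Region ~ UrbanPop, data = region_sketch,
                        jitter.data = TRUE,
                        xlab = "Urban Population", ylab = "Region")

# Stack the two plots to compare the summaries with the raw points
print(box_plot,   split = c(1, 1, 1, 2), more = TRUE)
print(strip_plot, split = c(1, 2, 1, 2), more = FALSE)
```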
# dot plots are commonly used for multivariate data.
# lattice makes it easy to sort data into groups
# numeric values of y are graphed against qualitative levels of x
# aesthetics are difficult to manipulate
dotplot(Murder ~ UrbanPop_Quartile | Region, data = USArrests_Final,
        layout = c(5, 1),
        xlab = "Region", ylab = "Murder Arrests per 100,000")
# another possible dot plot example?
# dotplot(Region ~ UrbanPop, data = USArrests_Region,
# xlab = "Urban Population", ylab = "Region")
# example of small multiple plots
# bar charts can be used for both univariate and multivariate data
A <- barchart(Region ~ Assault, data = USArrests_Final,
groups = UrbanPop_Quartile,
auto.key = TRUE,
main = "Assault by Region",
xlab = "Assault",
ylab = "Region")
R <- barchart(Region ~ Rape, data = USArrests_Final,
groups = UrbanPop_Quartile,
auto.key = TRUE,
main = "Rape by Region",
xlab = "Rape",
ylab = "Region")
M <- barchart(Region ~ Murder, data = USArrests_Final,
groups = UrbanPop_Quartile,
auto.key = TRUE,
main = "Murder by Region",
xlab = "Murder",
ylab = "Region")
print(R, split = c(1, 1, 2, 2), more = TRUE)
print(A, split = c(2, 1, 2, 2), more = TRUE)
print(M, split = c(1, 2, 2, 2), more = FALSE)
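The `print(..., split = ...)` approach above arranges separately built plots by hand. Lattice can also produce small multiples from a single call: a formula with several terms on the right-hand side plus `outer = TRUE` gives one panel per term. The sketch below is an alternative under stated assumptions, not the method used above: it groups by census region (`state.region`) and plots regional means, and the names `region_means` and `multi_bar` are introduced for illustration.

```r
library(lattice)

# ASSUMPTION: census regions as the grouping variable
arrests <- USArrests
arrests$Region <- state.region[match(rownames(arrests), state.name)]

# Mean arrest rate per region for each crime
region_means <- aggregate(cbind(Murder, Assault, Rape) ~ Region,
                          data = arrests, FUN = mean)

# outer = TRUE draws each right-hand-side term in its own panel;
# free x scales let each crime use its own axis range
multi_bar <- barchart(Region ~ Murder + Assault + Rape,
                      data = region_means,
                      outer = TRUE,
                      layout = c(3, 1),
                      scales = list(x = list(relation = "free")),
                      xlab = "Mean arrests per 100,000",
                      ylab = "Region")
print(multi_bar)
```

The trade-off is that all panels share one title and one set of axis labels, whereas the split approach lets each plot be configured separately.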
# two-sample quantile plots are used to determine whether two data sets come from populations with a similar distribution
# this example illustrates how to use the qq function
# Total Crime rates from Minority Urban and Majority Urban Populations
# given the limited number of observations in each group, actual quantiles have not been calculated or graphed against each other
# thus no conclusions can be drawn
qq(UrbanPop_Quartile ~ Total_Crime,
aspect = 1,
data = USArrests_Final,
subset = (UrbanPop_Quartile == "Majority Urban Population" | UrbanPop_Quartile == "Minority Urban Population"))
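To make the idea behind a two-sample quantile plot concrete, the sketch below computes matching quantiles for two groups by hand and then draws the corresponding `qq()` plot. It is an illustration under assumptions: the states are split into two halves at the median of `UrbanPop` (not the quartile groups used above), and the names `Urban_Half`, `q_less`, and `q_more` are introduced here.

```r
library(lattice)

arrests <- USArrests
arrests$Total_Crime <- with(arrests, Murder + Assault + Rape)

# ASSUMPTION: two groups, split at the median urban population
more_urban <- arrests$UrbanPop > median(arrests$UrbanPop)
arrests$Urban_Half <- factor(ifelse(more_urban, "More urban", "Less urban"))

# Quantiles of Total_Crime at the same probabilities in each group;
# a Q-Q plot graphs one set of quantiles against the other
probs  <- seq(0.1, 0.9, by = 0.1)
q_less <- quantile(arrests$Total_Crime[!more_urban], probs)
q_more <- quantile(arrests$Total_Crime[more_urban], probs)
data.frame(prob = probs, less_urban = q_less, more_urban = q_more)

# qq() needs a factor with exactly two levels on the left-hand side
qq_plot <- qq(Urban_Half ~ Total_Crime, data = arrests, aspect = 1)
print(qq_plot)
```

If the two groups had similar distributions, the points would fall close to the diagonal reference line.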
# scatter plot in lattice, with points colored by Region
xyplot(Assault ~ UrbanPop,
       groups = Region,
       data = USArrests_Final,
       xlab = "Urban Population",
       auto.key = TRUE # add legend
       # sub = # add subtitle
       # main = # set main title
)
# the same scatter plot in ggplot2, for comparison
library(ggplot2)

xy_ggplot <- ggplot(
  data = USArrests_Final,
  mapping = aes(
    x = UrbanPop,
    y = Assault
  )
)
xy_ggplot <- xy_ggplot +
  geom_point(aes(color = Region)) +
  labs(x = "Urban Population", y = "Assault")
xy_ggplot
Coming from a ggplot background, we found lattice difficult to pick up. Many of the graphing functions take a large number of parameters, which makes it feel like you are setting up the entire graph at once instead of layer by layer as in ggplot. Lattice also does not make it particularly easy to edit the aesthetics of a plot, as we learned the hard way when creating our bar chart and dot plot. Many of the parameter names did not feel intuitive either, and we had to look them up. While ggplot and lattice are very similar, one source claims that lattice is faster than ggplot and also allows users to fine-tune their graphs more. This might be true, but for the scope of this class and the graphs we are making, we are not sure these advantages are worth the extra effort.
Improvements we would like to see include more intuitive parameter names and features that make manipulating aesthetics easier. Overall, we felt that lattice is more difficult to learn than ggplot but offers many of the same features.
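On the point about building a graph all at once: a lattice plot is an object, and it can be adjusted after creation with `update()`, which comes a little closer to an incremental workflow. The sketch below is a minimal illustration, assuming census regions from `state.region` as the grouping and an arbitrary color choice; it changes the title and point style through `main` and `par.settings`.

```r
library(lattice)

# ASSUMPTION: census regions as the grouping variable
arrests <- USArrests
arrests$Region <- state.region[match(rownames(arrests), state.name)]

# Step 1: a basic grouped scatter plot with a legend
base_plot <- xyplot(Assault ~ UrbanPop, data = arrests,
                    groups = Region,
                    auto.key = list(space = "right"))

# Step 2: adjust the existing object instead of rewriting the call;
# par.settings changes the symbols used for the groups (colors are arbitrary)
styled_plot <- update(base_plot,
                      main = "Assault vs. urban population",
                      xlab = "Urban Population",
                      par.settings = list(
                        superpose.symbol = list(
                          pch = 19,
                          col = c("steelblue", "firebrick", "darkgreen", "orange")
                        )
                      ))
print(styled_plot)
```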