Visualizing Data with GGPLOT2

GEOG 5023: Quantitative Methods In Geography

# need to install package for first time use
install.packages("ggplot2")
install.packages("hexbin")

Loading libraries

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 2.15.3
library(maptools)
## Loading required package: foreign
## Loading required package: sp
## Warning: package 'sp' was built under R version 2.15.3
## Loading required package: grid
## Loading required package: lattice
## Checking rgeos availability: FALSE Note: when rgeos is not available,
## polygon geometry computations in maptools depend on gpclib, which has a
## restricted licence. It is disabled by default; to enable gpclib, type
## gpclibPermit()

Loading Data

## Load the shapefile 'USA copy.shp'
## (choose.files() is Windows-only; file.choose() works on any platform)
USA <- readShapePoly(choose.files())
## Remove count fields and rows with missing data
USA <- USA[, c(1:8, 14:30)]
USA <- na.omit(USA)

Build a ggplot object

plot1 <- ggplot(data = USA@data, aes(x = Obese, y = homevalu))

Notice that the above line doesn't draw anything. It only stores the data and the aesthetic mapping specified by aes(); to draw the data we have to add a “geom” layer.

plot1 + geom_point()

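To see why a geom is needed, it can help to inspect the plot object itself. The following is a minimal sketch, assuming the mpg data set that ships with ggplot2 as a stand-in for the county data; the names p and p_pts are illustrative only.

```r
library(ggplot2)

# Stand-in data: mpg ships with ggplot2 (not the USA shapefile used above)
p <- ggplot(data = mpg, aes(x = displ, y = hwy))

# The object stores the data and the aesthetic mapping, but no layers yet
n_layers_before <- length(p$layers)

# Adding a geom creates a layer; the plot is drawn when the object is printed
p_pts <- p + geom_point()
n_layers_after <- length(p_pts$layers)

print(c(before = n_layers_before, after = n_layers_after))
```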

# transform the coordinates
plot1 + geom_point() + scale_x_log10() + scale_y_log10()

Add transparency to the points to make overplotting visible:

plot1 + geom_point(alpha = 1/10) + scale_x_log10() + scale_y_log10()

# add a fitted line to the plot
plot1 + geom_point(alpha = 1/10) + geom_smooth(method = "lm")

plot1 + geom_point(alpha = 1/10) + geom_smooth(method = "loess")

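The straight line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of y on x, while method = "loess" fits a local regression. A minimal sketch of the two models outside ggplot2, using simulated data (the variables x and y and the coefficients are made up for illustration, not taken from the county data):

```r
set.seed(1)

# Simulated stand-in data with a known linear relationship plus noise
x <- runif(200, min = 10, max = 40)
y <- 5 - 0.1 * x + rnorm(200, sd = 0.5)

# geom_smooth(method = "lm") fits y ~ x by default; this is the same model
fit_lm <- lm(y ~ x)
print(coef(fit_lm))

# loess() fits a local regression instead, so it has no single slope
fit_lo <- loess(y ~ x)

# Compare the two fits at a few x values inside the observed range
new_x <- data.frame(x = c(15, 25, 35))
comparison <- cbind(new_x,
                    lm = predict(fit_lm, new_x),
                    loess = predict(fit_lo, new_x))
print(comparison)
```

When the underlying relationship really is linear, as in this simulation, the two columns should be close; they diverge when the data curve.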

Other ways to deal with the overplotting problem:

library(hexbin)
## Warning: package 'hexbin' was built under R version 2.15.3
plot1 + stat_binhex()

plot1 + geom_bin2d()

plot1 + geom_density2d()

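The binned layers replace individual points with a count of how many points fall in each cell, and geom_density2d() draws contours of an estimated density instead. As a rough illustration of the idea behind rectangular binning (not ggplot2's internal code), the counts per cell can be computed in base R. The data and the 5 x 5 grid here are made up:

```r
set.seed(42)

# Simulated, heavily overplotted point cloud (illustrative only)
x <- rnorm(5000)
y <- x + rnorm(5000)

# Cut each axis into 5 equal-width intervals and count points per cell
x_bin <- cut(x, breaks = 5)
y_bin <- cut(y, breaks = 5)
counts <- table(x_bin, y_bin)

print(counts)
```

A binned plot maps each of these counts to a fill colour, so dense regions stay readable no matter how many points overlap.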

Incorporate qualitative variables:

# create a qualitative variable:
USA$good_states <- ifelse(USA$STATE_NAME %in% c("New York", "Massachusetts", 
    "Rhode Island", "Wyoming"), yes = "its good", no = "its ok")
USA$good_states <- as.factor(USA$good_states)

# colors by 'good_states'
plot2 <- ggplot(data = USA@data, aes(x = Obese, y = homevalu, color = good_states))
plot2 + geom_point()

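The recoding step used to build good_states can be tried on a small data frame first. This sketch uses a hypothetical toy data frame with made-up rows, standing in for the county attribute table:

```r
# Hypothetical toy data, standing in for the county attribute table
toy <- data.frame(STATE_NAME = c("New York", "Ohio", "Wyoming", "Texas"))

# %in% returns TRUE where the state is in the chosen set;
# ifelse() turns that logical vector into labels
toy$good_states <- ifelse(toy$STATE_NAME %in% c("New York", "Wyoming"),
                          yes = "its good", no = "its ok")

# Convert to a factor, which fixes the set and order of the levels
toy$good_states <- as.factor(toy$good_states)

print(table(toy$good_states))
```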

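# add a smoother; with the default method = "auto" and groups this large,
# ggplot2 fits a gam rather than a local (loess) fit, as the message reports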
plot2 <- ggplot(data = USA@data, aes(x = Obese, y = homevalu, color = good_states, 
    shape = good_states))
plot2 + stat_smooth()
## geom_smooth: method="auto" and size of largest group is >=1000, so using
## gam with formula: y ~ s(x, bs = "cs"). Use 'method = x' to change the
## smoothing method.


# use both
plot2 + geom_point() + stat_smooth(method = "lm", se = TRUE, lwd = 0.5, lty = 1)

Adding marginalia and changing the appearance of the plot is easy. Let's look at the percent college educated (pctcoled) and the per capita income (pcincome). These two variables have a correlation of r = 0.7, so our plots should show a fairly clear positive relationship.

plot3 <- ggplot(data = USA@data, aes(x = pctcoled, y = pcincome))
plot3 + geom_point() + ylab("Per Capita Income") + xlab("Percent College Educated") + 
    ggtitle("US Counties (2000)\nPercent College Educated by Per Capita Income")

Make a multidimensional plot. Let's say we want to add the unemployment variable to the plot by coloring the points according to the unemployment rate.

plot4 <- ggplot(data = USA@data, aes(x = pctcoled, y = pcincome, color = unemploy)) + 
    geom_point() + ylab("Per Capita Income") + xlab("Percent College Educated") + 
    ggtitle("US Counties (2000)\nPercent College Educated by Per Capita Income") + 
    scale_color_gradient2("Unemployment", breaks = c(min(USA$unemploy), mean(USA$unemploy), 
        max(USA$unemploy)), labels = c("Below Average", "Average", "Above Average"), 
        low = "green", mid = "yellow", high = "red", midpoint = mean(USA$unemploy))
plot4

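scale_color_gradient2() builds a diverging colour scale: values at midpoint get the mid colour, and values below and above it shade toward low and high. A minimal sketch on the built-in mpg data (a stand-in for the county data; the choice of cty as the colour variable and the name p_div are arbitrary):

```r
library(ggplot2)

# Centre the diverging scale on the mean of the colour variable
mid_cty <- mean(mpg$cty)

p_div <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point() +
  scale_color_gradient2("City mpg", low = "green", mid = "yellow",
                        high = "red", midpoint = mid_cty)
p_div
```

Without an explicit midpoint the scale is centred on 0, which is rarely useful for a variable that is always positive.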

Split the plot into panels based on the good_states variable. We create “facets”, or subplots, that each display only the data for one level of the factor:

plot4 + facet_grid(. ~ good_states)

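In facet_grid(rows ~ cols), a dot means "no faceting on this dimension", so . ~ good_states lays the panels out side by side as columns. A sketch with mpg as a stand-in, using its drv column as the factor (the object names are illustrative):

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One column of panels per level of drv
by_cols <- base + facet_grid(. ~ drv)

# The same panels stacked as rows instead
by_rows <- base + facet_grid(drv ~ .)

by_cols
```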

Create your own theme. Edit the code to make the title and legend title visible:

sethTheme <- theme(panel.background = element_rect(fill = "black"), title = element_text(colour = "white"), 
    plot.background = element_rect(fill = "black"), panel.grid.minor = element_blank(), 
    panel.grid.major = element_line(linetype = 3, colour = "white"), axis.text.x = element_text(colour = "grey80"), 
    axis.text.y = element_text(colour = "grey80"), axis.title.x = element_text(colour = "grey80"), 
    axis.title.y = element_text(colour = "grey80"), legend.key = element_rect(fill = "black"), 
    legend.text = element_text(colour = "white"), legend.title = element_text(colour = "white"), 
    legend.background = element_rect(fill = "black"), axis.ticks = element_blank())
plot4 + sethTheme

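Because a theme is an ordinary R object, it can be defined once and added to any number of plots. A small sketch, assuming mpg as stand-in data; darkTheme and its colours are illustrative and much shorter than the theme above:

```r
library(ggplot2)

# A small custom theme stored as an object (names and colours are illustrative)
darkTheme <- theme(panel.background = element_rect(fill = "black"),
                   plot.background = element_rect(fill = "black"),
                   text = element_text(colour = "white"),
                   axis.text = element_text(colour = "grey80"))

# The same theme object is reused on two different plots
p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "white") + darkTheme
p2 <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(colour = "white") + darkTheme
p1
```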

The ggsave() function makes it easy to save plots in most common graphics formats, and plots saved as PDFs move nicely into Adobe Illustrator. Using ggsave() is simple: ggsave("path/plotName.png") saves a PNG file. To save a PDF file, change the extension: ggsave("path/plotName.pdf").

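By default ggsave() saves the last plot that was displayed; passing the plot and a size explicitly tends to be more reproducible. A sketch that writes a PDF to a temporary directory (the file name and dimensions are arbitrary, and mpg stands in for the county data):

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Temporary output path so the example leaves no files in the project
pdf_path <- file.path(tempdir(), "plotName.pdf")

# The output format is chosen from the file extension
ggsave(pdf_path, plot = p, width = 6, height = 4, units = "in")

print(file.exists(pdf_path))
```

Changing the extension in pdf_path to .png would produce a PNG instead, in which case the dpi argument controls the resolution.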

Created by: Li Xu; Created on: 04/29/2013; Updated on: 05/04/2013