Healy Chapter 5 Part 3

Harold Nelson

9/26/2018

Setup

library(tidyverse)
## ── Attaching packages ───────────
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ────────────────────
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# devtools::install_github("kjhealy/socviz")
library(gapminder)
load("~/Dropbox/RProjects/Math 146 Notes/cdc.Rdata")

Two Histograms Overlaid?

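Here’s an example comparing histograms of a variable for two states. The first graph, from Healy’s text, draws both histograms on the same axes. Note that geom_histogram() stacks the bars by default (position = "stack"), so the two states are stacked rather than strictly overlaid; a true overlay requires position = "identity".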

oh_wi <- c("OH", "WI")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)
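
For comparison, here is a minimal sketch of a strictly overlaid version. It assumes only ggplot2 and its built-in midwest data; the object name p_overlay is made up for this illustration and does not come from Healy's text.

```r
library(ggplot2)

oh_wi <- c("OH", "WI")

# position = "identity" draws each state's bars from the baseline,
# so the two histograms overlap instead of stacking on each other.
p_overlay <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
                    mapping = aes(x = percollege, fill = state)) +
  geom_histogram(alpha = 0.4, bins = 20, position = "identity")
p_overlay
```

With the bars overlapping, the alpha setting matters more, since it is what lets the state drawn underneath show through.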

An alternative

Here’s an alternative using facet_wrap.

oh_wi <- c("OH", "WI")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege)) 
p + geom_histogram(alpha = 0.4, bins = 20) +
  facet_wrap(~ state, ncol = 1)

Which of these two makes the two histograms easier to compare?

Density Plots

Here’s an example of overlaid density plots.

p <- ggplot(data = midwest,
            mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)

Facet Instead

And then look at the version with facet_wrap.

p <- ggplot(data = midwest,
            mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3) +
  facet_wrap(~ state, ncol = 1)

Exercise

Here is a problem from the end of Chapter 4 in Healy.

Revisit the gapminder plots at the beginning of the chapter and experiment with different ways to facet the data. Try plotting population and per capita GDP while faceting on year, or even on country. In the latter case you will get a lot of panels, and plotting them straight to the screen may take a long time. Instead, assign the plot to an object and save it as a PDF file to your figures/ folder. Experiment with the height and width of the figure.

Let’s use a subset of gapminder to keep the size under control.

My Answer

# Keep only the years ending in 7 (1957, 1967, ..., 2007)
sgap <- gapminder %>% filter(year %% 10 == 7)

glimpse(sgap)
## Observations: 852
## Variables: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Europe, Europe, ...
## $ year      <int> 1957, 1967, 1977, 1987, 1997, 2007, 1957, 1967, 1977...
## $ lifeExp   <dbl> 30.332, 34.020, 38.438, 40.822, 41.763, 43.828, 59.2...
## $ pop       <int> 9240934, 11537966, 14880372, 13867957, 22227415, 318...
## $ gdpPercap <dbl> 820.8530, 836.1971, 786.1134, 852.3959, 635.3414, 97...
Exp1 <- sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_grid(year ~ continent)
Exp1

# Name the plot explicitly rather than relying on last_plot()
ggsave("Exp1.pdf", plot = Exp1, height = 11, width = 8, units = "in")
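
The exercise also asks for the PDF to go in a figures/ folder and suggests experimenting with height and width. The following is one possible sketch, not a prescribed solution: the folder name fig_dir and the two file names are assumptions chosen for illustration, and the plot is rebuilt here so the block runs on its own.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Same subset and plot as above, rebuilt so this block is self-contained
sgap <- gapminder %>% filter(year %% 10 == 7)

Exp1 <- sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_grid(year ~ continent)

# Hypothetical output folder; create it if it does not exist yet
fig_dir <- "figures"
dir.create(fig_dir, showWarnings = FALSE)

# Save the same plot at two sizes to compare the layouts
ggsave(file.path(fig_dir, "Exp1_portrait.pdf"), plot = Exp1,
       height = 11, width = 8, units = "in")
ggsave(file.path(fig_dir, "Exp1_landscape.pdf"), plot = Exp1,
       height = 8, width = 11, units = "in")
```

Opening the two files side by side shows how the panel aspect ratio changes when the page is wider than it is tall.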

Exercise

What happens if we replace year~continent with ~year+continent in this graph?

Answer

sgap <- gapminder %>% filter(year %% 10 == 7)
Exp2 <- sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_grid(~ year + continent)
# Name the plot explicitly rather than relying on last_plot()
ggsave("Exp2.pdf", plot = Exp2, height = 11, width = 8, units = "in")
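
To see why the result looks so different, it may help to count the panels. With nothing on the left of the ~, facet_grid() places every year and continent combination in its own column within a single row. The sketch below counts those combinations and, as one possible alternative that is not part of the original answer, uses facet_wrap() to break the same panels into rows. The names n_panels and Exp2_wrap are made up for this illustration.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

sgap <- gapminder %>% filter(year %% 10 == 7)

# Number of distinct year/continent combinations, i.e. the number of
# columns that facet_grid(~ year + continent) lays out in one row
n_panels <- sgap %>% distinct(year, continent) %>% nrow()
n_panels

# Alternative sketch: facet_wrap() fills the panels row by row, so
# ncol = 5 gives one row per year and one column per continent
Exp2_wrap <- sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_wrap(~ year + continent, ncol = 5)
Exp2_wrap
```

Six years times five continents gives 30 panels, which is why a single row on an 8-inch-wide page leaves each panel very narrow.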