Healy Chapter 5 Part 3

9/26/2018

Setup

library(tidyverse)
## ── Attaching packages ───────────
## ✔ ggplot2 3.0.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ────────────────────
## ✖ dplyr::lag()    masks stats::lag()
# devtools::install_github("kjhealy/socviz")
library(gapminder)
load("~/Dropbox/RProjects/Math 146 Notes/cdc.Rdata")

Two Histograms Overlaid?

Here's an example comparing histograms of a variable for two states. The first graph, from Healy's text, overlays the two histograms.

oh_wi <- c("OH", "WI")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)
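One detail worth knowing about the plot above: when a fill aesthetic is mapped, geom_histogram stacks the groups by default (position = "stack"), so one state's bars sit on top of the other's rather than overlapping them. Below is a minimal sketch of a version where the two histograms overlap, assuming position = "identity" is the behavior wanted. The names mw_sub and p_overlay are only illustrative and do not come from Healy's text.

```r
library(ggplot2)

oh_wi <- c("OH", "WI")

# Keep only the Ohio and Wisconsin counties (mw_sub is an illustrative name)
mw_sub <- subset(midwest, subset = state %in% oh_wi)

# position = "identity" draws each state's bars from the baseline,
# so the two histograms overlap instead of stacking.
p_overlay <- ggplot(data = mw_sub,
                    mapping = aes(x = percollege, fill = state)) +
  geom_histogram(alpha = 0.4, bins = 20, position = "identity")
p_overlay
```

With alpha = 0.4 the region where the two states overlap shows as a blend of the two fill colors, which makes the transparency setting do real work.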

An alternative

Here's an alternative using facet_wrap.

oh_wi <- c("OH", "WI")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
mapping = aes(x = percollege))
p + geom_histogram(alpha = 0.4, bins = 20) +
facet_wrap(~state,ncol=1)

Which of these two makes the comparison of the two histograms easiest?

Density Plots

Here's an example of overlaid density plots.

p <- ggplot(data = midwest,
mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3)
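With five states, the filled areas can hide one another. One possible variation, sketched here as an option rather than as the text's method, is to draw only the density curves with geom_line(stat = "density"). The name p_lines is illustrative.

```r
library(ggplot2)

# Map only color: geom_line() draws a curve and has no fill to show.
p_lines <- ggplot(data = midwest,
                  mapping = aes(x = area, color = state)) +
  geom_line(stat = "density")
p_lines
```

This drops the shaded regions and the baseline along the x-axis, so the five curves can be compared without one covering another.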

Facet Instead

And then look at the version with facet_wrap.

p <- ggplot(data = midwest,
mapping = aes(x = area, fill = state, color = state))
p + geom_density(alpha = 0.3) + facet_wrap(~state,ncol=1)

Exercise

Here is a problem from the end of Chapter 4 in Healy.

Revisit the gapminder plots at the beginning of the chapter and experiment with different ways to facet the data. Try plotting population and per capita GDP while faceting on year, or even on country. In the latter case you will get a lot of panels, and plotting them straight to the screen may take a long time. Instead, assign the plot to an object and save it as a PDF file to your figures/ folder. Experiment with the height and width of the figure.

Let's use a subset of gapminder to keep the size under control.

sgap = gapminder %>% filter(year %% 10 == 7)

glimpse(sgap)
## Observations: 852
## Variables: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Europe, Europe, ...
## $ year      <int> 1957, 1967, 1977, 1987, 1997, 2007, 1957, 1967, 1977...
## $ lifeExp   <dbl> 30.332, 34.020, 38.438, 40.822, 41.763, 43.828, 59.2...
## $ pop       <int> 9240934, 11537966, 14880372, 13867957, 22227415, 318...
## $ gdpPercap <dbl> 820.8530, 836.1971, 786.1134, 852.3959, 635.3414, 97...
Exp1 = sgap %>% ggplot(aes(x=gdpPercap,y=lifeExp)) +
geom_point(aes(size=pop),alpha=.1) +
scale_x_log10(breaks = c(1000,25000 ))  +
facet_grid(year~continent)
Exp1

ggsave("Exp1.pdf",height=11,width=8,units="in")
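The exercise also suggests faceting on country and saving the result to a figures/ folder. The following is a rough sketch of that idea, not a worked solution from the text. The object name by_country, the file name by_country.pdf, the choice of 10 columns, and the 30 by 20 inch page are all assumptions to experiment with.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

sgap <- gapminder %>% filter(year %% 10 == 7)

# One panel per country (142 panels), population against per capita GDP
by_country <- ggplot(sgap, aes(x = gdpPercap, y = pop)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  scale_y_log10() +
  facet_wrap(~country, ncol = 10)

# Create the output folder if it does not exist yet
dir.create("figures", showWarnings = FALSE)

# Passing plot = explicitly saves this object without drawing it on screen
out_file <- file.path("figures", "by_country.pdf")
ggsave(out_file, plot = by_country, height = 30, width = 20, units = "in")
```

When plot = is omitted, ggsave saves the most recently displayed plot, which is why the Exp1 example prints the plot before saving it.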

Exercise

What happens if we replace year~continent with ~year+continent in this graph?

sgap = gapminder %>% filter(year %% 10 == 7)
glimpse(sgap)
## Observations: 852
## Variables: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Europe, Europe, ...
## $ year      <int> 1957, 1967, 1977, 1987, 1997, 2007, 1957, 1967, 1977...
## $ lifeExp   <dbl> 30.332, 34.020, 38.438, 40.822, 41.763, 43.828, 59.2...
## $ pop       <int> 9240934, 11537966, 14880372, 13867957, 22227415, 318...
## $ gdpPercap <dbl> 820.8530, 836.1971, 786.1134, 852.3959, 635.3414, 97...
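# Same plot as Exp1, with year~continent replaced by ~year+continent
Exp2 = sgap %>% ggplot(aes(x=gdpPercap,y=lifeExp)) +
geom_point(aes(size=pop),alpha=.1) +
scale_x_log10(breaks = c(1000,25000 ))  +
facet_grid(~year+continent)
Exp2

ggsave("Exp2.pdf",height=11,width=8,units="in")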
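To explore the question in this exercise, it may help to count the panels first and then build the one-sided formula both ways. This is a sketch under the assumption that comparing facet_grid with facet_wrap is of interest. The names n_panels, p_base, grid_version, and wrap_version are illustrative.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

sgap <- gapminder %>% filter(year %% 10 == 7)

# Each year/continent combination present in the data becomes one panel
n_panels <- sgap %>% distinct(year, continent) %>% nrow()
n_panels

p_base <- ggplot(sgap, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000))

# facet_grid with a one-sided formula: every combination becomes a column,
# so all panels sit side by side in a single row
grid_version <- p_base + facet_grid(~year + continent)

# facet_wrap with the same formula: the same panels wrapped into several rows
wrap_version <- p_base + facet_wrap(~year + continent)
```

Printing grid_version and wrap_version in turn shows how the same set of panels reads in a single row and in a wrapped layout. On an 8 by 11 inch page the single-row version leaves each panel very narrow.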