Healy Chapter 4 Part 2

Harold Nelson

1/27/2021

Setup

library(socviz)
library(gapminder)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ dplyr   1.0.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Make the figures big enough to read.

knitr::opts_chunk$set(fig.width=8, fig.height=5) 

Two Histograms Overlaid?

Here’s an example comparing histograms of a variable, percollege, for two states. The first graph from Healy’s text overlays the two histograms.

oh_wi <- c("OH", "WI")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)
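
One detail worth knowing when reading this plot: geom_histogram() uses position = "stack" by default, so the bars for the two states are stacked on top of each other rather than drawn over the same baseline. If a true overlay is what you want, a minimal sketch is to set position = "identity". The names midwest_oh_wi and p_overlay are made up for this illustration and are not from Healy's text.

```r
library(ggplot2)

oh_wi <- c("OH", "WI")

# Keep only the Ohio and Wisconsin counties (illustrative helper name)
midwest_oh_wi <- subset(midwest, state %in% oh_wi)

# position = "identity" draws both histograms from the same baseline
# instead of stacking them; alpha keeps the one behind visible
p_overlay <- ggplot(data = midwest_oh_wi,
                    mapping = aes(x = percollege, fill = state)) +
  geom_histogram(alpha = 0.4, bins = 20, position = "identity")
p_overlay
```

Comparing this with the default stacked version can help when deciding whether overlaying or faceting makes the comparison clearer.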

An alternative

Here’s an alternative using facet_wrap.

oh_wi <- c("OH", "WI")

p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege)) 
p + geom_histogram(alpha = 0.4, bins = 20) +
  facet_wrap(~state,ncol=1)

Which of these two graphs makes it easier to compare the two histograms?

Density Plots

Here’s an example of overlaid density plots.

p <- ggplot(data = midwest,
            mapping = aes(x = percollege, fill = state, color = state))
p + geom_density(alpha = 0.3)

Remove “fill = state” from this graph. Which version is easier to read?

Answer

p <- ggplot(data = midwest,
            mapping = aes(x = percollege, color = state))
p + geom_density(alpha = 0.3)

Facet Instead

And then look at the version with facet_wrap.

Answer

p <- ggplot(data = midwest,
            mapping = aes(x = percollege, fill = state, color = state))
p + geom_density(alpha = 0.3) + facet_wrap(~state,ncol=1)

Exercise

Here is a problem from the end of Chapter 4 in Healy.

Revisit the gapminder plots at the beginning of the chapter and experiment with different ways to facet the data. Try plotting population and per capita GDP while faceting on year, or even on country. In the latter case you will get a lot of panels, and plotting them straight to the screen may take a long time.

Let’s use a subset of gapminder to keep the size under control. I used years ending in 7. Play with alpha to get a graph you like.

My Answer

sgap = gapminder %>% filter(year %% 10 == 7)

glimpse(sgap)
## Rows: 852
## Columns: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Europe, Europe, Europe,…
## $ year      <int> 1957, 1967, 1977, 1987, 1997, 2007, 1957, 1967, 1977, 1987,…
## $ lifeExp   <dbl> 30.332, 34.020, 38.438, 40.822, 41.763, 43.828, 59.280, 66.…
## $ pop       <int> 9240934, 11537966, 14880372, 13867957, 22227415, 31889923, …
## $ gdpPercap <dbl> 820.8530, 836.1971, 786.1134, 852.3959, 635.3414, 974.5803,…

Exp1 = sgap %>% ggplot(aes(x=gdpPercap,y=lifeExp)) +
  geom_point(aes(size=pop),alpha=.2) +
  scale_x_log10(breaks = c(1000,25000 ))  +
  facet_grid(year~continent)
Exp1
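
The filter above relies on %%, R's modulo operator. gapminder records every fifth year from 1952 to 2007, so year %% 10 == 7 keeps only the six years ending in 7. Here is a quick check of which years survive, as a sketch; sgap_check is a name invented here so that sgap is left untouched.

```r
library(gapminder)
library(dplyr)

# year %% 10 is the last digit of the year; keep years ending in 7
sgap_check <- gapminder %>% filter(year %% 10 == 7)

# One row per surviving year, with the number of countries in each
sgap_check %>% count(year)
```

Changing the 7 to a 2 would select the other half of the years instead.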

Exercise

What happens if we replace year~continent with ~year+continent in this graph?

Answer

sgap = gapminder %>% filter(year %% 10 == 7)
glimpse(sgap)
## Rows: 852
## Columns: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Europe, Europe, Europe,…
## $ year      <int> 1957, 1967, 1977, 1987, 1997, 2007, 1957, 1967, 1977, 1987,…
## $ lifeExp   <dbl> 30.332, 34.020, 38.438, 40.822, 41.763, 43.828, 59.280, 66.…
## $ pop       <int> 9240934, 11537966, 14880372, 13867957, 22227415, 31889923, …
## $ gdpPercap <dbl> 820.8530, 836.1971, 786.1134, 852.3959, 635.3414, 974.5803,…

Exp2 = sgap %>% ggplot(aes(x=gdpPercap,y=lifeExp)) +
  geom_point(aes(size=pop),alpha=.1) +
  scale_x_log10(breaks = c(1000,25000 ))  +
  facet_grid(~year+continent)
Exp2

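That didn’t work. With a one-sided formula, facet_grid puts all 30 year-continent panels in a single row, so each one is too narrow to read.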
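
A possible way to keep the ~year+continent formula and still get a readable graph, sketched here rather than taken from Healy's text, is to switch to facet_wrap, which wraps the panels into several rows. The choice of ncol = 5 (one column per continent) and the name Exp3 are assumptions made for this illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

sgap <- gapminder %>% filter(year %% 10 == 7)

# facet_wrap lays the year-continent panels out in a grid of
# ncol columns instead of forcing them into one row
Exp3 <- sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .2) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_wrap(~year + continent, ncol = 5)
Exp3
```

This produces many small panels, so a taller figure (for example, a larger fig.height in the chunk options) will probably be needed. Unlike facet_grid(year~continent), every panel repeats both labels in its strip.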
