Harold Nelson
1/27/2021
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.5     ✓ dplyr   1.0.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Make the figures big enough to read.
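The chunk that sets the figure size is not shown. In an R Markdown document, one common way to do this is to set knitr chunk options near the top of the file. The sketch below assumes that approach; the width and height are illustrative values, not the ones the author used.

```r
# Assumed setup chunk: make every figure in the document larger.
# fig.width and fig.height are in inches; 10 x 7 is an illustrative choice.
knitr::opts_chunk$set(fig.width = 10, fig.height = 7)
```

Individual chunks can still override these defaults with their own fig.width and fig.height options.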
Here’s an example comparing histograms of one variable, percollege, for two states. The first graph, from Healy’s text, overlays the two histograms.
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege, fill = state))
p + geom_histogram(alpha = 0.4, bins = 20)
Here’s an alternative using facet_wrap.
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi),
            mapping = aes(x = percollege))
p + geom_histogram(alpha = 0.4, bins = 20) +
  facet_wrap(~state, ncol = 1)
Which of these two makes it easier to compare the two histograms?
Here’s an example of overlaid density plots.
p <- ggplot(data = midwest,
            mapping = aes(x = percollege, fill = state, color = state))
p + geom_density(alpha = 0.3)
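# Same plot without a fill mapping: only the outlines are colored by state.
# alpha applies to the fill, so it has no visible effect here.
p <- ggplot(data = midwest,
            mapping = aes(x = percollege, color = state))
p + geom_density(alpha = 0.3)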
And then look at the version with facet_wrap.
p <- ggplot(data = midwest,
            mapping = aes(x = percollege, fill = state, color = state))
p + geom_density(alpha = 0.3) + facet_wrap(~state, ncol = 1)
Here is a problem from the end of Chapter 4 in Healy.
Revisit the gapminder plots at the beginning of the chapter and experiment with different ways to facet the data. Try plotting population and per capita GDP while faceting on year, or even on country. In the latter case you will get a lot of panels, and plotting them straight to the screen may take a long time.
Let’s use a subset of gapminder to keep the size under control. I used years ending in 7. Play with alpha to get a graph you like.
## Rows: 852
## Columns: 6
## $ country <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, Afghani…
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Europe, Europe, Europe,…
## $ year <int> 1957, 1967, 1977, 1987, 1997, 2007, 1957, 1967, 1977, 1987,…
## $ lifeExp <dbl> 30.332, 34.020, 38.438, 40.822, 41.763, 43.828, 59.280, 66.…
## $ pop <int> 9240934, 11537966, 14880372, 13867957, 22227415, 31889923, …
## $ gdpPercap <dbl> 820.8530, 836.1971, 786.1134, 852.3959, 635.3414, 974.5803,…
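The code that builds sgap is not shown. A minimal sketch that reproduces the dimensions printed above, assuming the data come from the gapminder package and that "years ending in 7" means year %% 10 == 7:

```r
library(tidyverse)
library(gapminder)

# Assumption: sgap is the gapminder data restricted to years ending in 7
# (1957, 1967, ..., 2007). The author's actual code may differ.
sgap <- gapminder %>%
  filter(year %% 10 == 7)

# Print the structure of the subset.
glimpse(sgap)
```

With 142 countries and six such years, this gives the 852 rows reported in the output.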
Exp1 = sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .2) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_grid(year ~ continent)
Exp1
What happens if we replace year~continent with ~year+continent in this graph?
Exp2 = sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_grid(~year + continent)
Exp2
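That didn’t work. facet_grid(~year+continent) treats every year-continent combination as a column, so all 30 panels land in a single row and each one is too narrow to read.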
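One possible alternative, offered as a sketch rather than the author's solution, is to keep the ~year+continent formula but switch to facet_wrap(), which wraps the panels into several rows. The name Exp3, the ncol value, and the definition of sgap (gapminder years ending in 7) are assumptions made for this example.

```r
library(tidyverse)
library(gapminder)

# Assumption: sgap is gapminder restricted to years ending in 7.
sgap <- gapminder %>%
  filter(year %% 10 == 7)

# Exp3 is a hypothetical name. facet_wrap() wraps the year-continent panels
# into a grid instead of placing them all in one row; with five continents,
# ncol = 5 should put one year on each row.
Exp3 = sgap %>% ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(size = pop), alpha = .1) +
  scale_x_log10(breaks = c(1000, 25000)) +
  facet_wrap(~year + continent, ncol = 5)
Exp3
```

The result resembles the year~continent grid, except that every panel carries its own strip label for both year and continent.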