Exercises on ggplot2 2

Harold Nelson

2/7/2022

Setup

Load the tidyverse and the dataset “county_clean.Rdata”. Import the file “state_region.csv” into the dataframe state_region. Use left_join to add the region data to county_clean. Glimpse county_clean to make sure the join worked.

Solution

library(tidyverse)
load("county_clean.Rdata")
state_region <- read_csv("state_region.csv")

county_clean <- county_clean %>%
  left_join(state_region, by = c("state" = "State"))

glimpse(county_clean)
## Rows: 3,135
## Columns: 17
## $ name              <fct> Autauga County, Baldwin County, Barbour County, Bibb…
## $ state             <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama…
## $ pop2000           <dbl> 43671, 140415, 29038, 20826, 51024, 11714, 21399, 11…
## $ pop2010           <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 20947, 11…
## $ pop2017           <int> 55504, 212628, 25270, 22668, 58013, 10309, 19825, 11…
## $ pop_change        <dbl> 1.48, 9.19, -6.22, 0.73, 0.68, -2.28, -2.69, -1.51, …
## $ poverty           <dbl> 13.7, 11.8, 27.2, 15.2, 15.6, 28.5, 24.4, 18.6, 18.8…
## $ homeownership     <dbl> 77.5, 76.7, 68.0, 82.9, 82.0, 76.9, 69.0, 70.7, 71.4…
## $ multi_unit        <dbl> 7.2, 22.6, 11.1, 6.6, 3.7, 9.9, 13.7, 14.3, 8.7, 4.3…
## $ unemployment_rate <dbl> 3.86, 3.99, 5.90, 4.39, 4.02, 4.93, 5.49, 4.93, 4.08…
## $ metro             <fct> yes, yes, no, yes, yes, no, no, yes, no, no, yes, no…
## $ median_edu        <fct> some_college, some_college, hs_diploma, hs_diploma, …
## $ per_capita_income <dbl> 27841.70, 27779.85, 17891.73, 20572.05, 21367.39, 15…
## $ median_hh_income  <int> 55317, 52562, 33368, 43404, 47412, 29655, 36326, 436…
## $ `State Code`      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ Region            <chr> "South", "South", "South", "South", "South", "South"…
## $ Division          <chr> "East South Central", "East South Central", "East So…
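
The glimpse confirms that the new columns exist, but not that every state found a match. As a minimal sketch of one way to check, the toy tables below (made-up stand-ins, not the course data) show that left_join keeps unmatched rows with NA in the new columns, and that anti_join lists the rows with no match. The names counties, regions, joined and unmatched are illustrative only.

```r
library(dplyr)

# Toy stand-ins for county_clean and state_region (made-up values)
counties <- tibble(
  name  = c("A County", "B County", "C County"),
  state = c("Alabama", "Alaska", "Atlantis")
)
regions <- tibble(
  State  = c("Alabama", "Alaska"),
  Region = c("South", "West")
)

joined <- counties %>%
  left_join(regions, by = c("state" = "State"))

# Rows without a match keep their county data but get NA for Region
sum(is.na(joined$Region))

# anti_join() returns the rows of counties that have no match in regions
unmatched <- counties %>%
  anti_join(regions, by = c("state" = "State"))
unmatched
```

Applied to the real data, an empty anti_join result would suggest that every state name in county_clean was found in state_region.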

Exercise 1

An Alternative for Two Categorical Variables

Examine the relationship between median educational level and region. Put region on the horizontal axis and median education on the vertical axis. Use geom_jitter(). Try a few different values of the size parameter in geom_jitter().

Solution

county_clean %>% 
  ggplot(aes(x = Region, y = median_edu)) +
  geom_jitter(size = .8)
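
To compare several point sizes side by side, the sketch below builds one plot per size. It assumes the mpg dataset that ships with ggplot2 as a stand-in, since drv and class are both categorical. The alpha, width and height arguments are extras not required by the exercise; they control transparency and how far points are scattered.

```r
library(ggplot2)

# mpg ships with ggplot2; drv and class are both categorical
sizes <- c(0.3, 0.8, 2)

# Build one jittered plot per candidate size
plots <- lapply(sizes, function(s) {
  ggplot(mpg, aes(x = drv, y = class)) +
    geom_jitter(size = s, alpha = 0.5, width = 0.3, height = 0.3) +
    ggtitle(paste("size =", s))
})

# Print the first plot; use plots[[2]] and plots[[3]] for the others
plots[[1]]
```

Smaller points and some transparency tend to help when many observations share the same pair of categories.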

Exercise 2

Another Alternative

Repeat the last exercise but use geom_count() instead of geom_jitter().

Solution

county_clean %>% 
  ggplot(aes(x = Region, y = median_edu)) +
  geom_count()
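
geom_count() counts the observations at each combination of x and y and maps that count to point size. The sketch below does the counting by hand with dplyr, again assuming the built-in mpg data as a stand-in, and should give roughly the same picture as geom_count().

```r
library(dplyr)
library(ggplot2)

# One row per combination of the two categorical variables, with its count n
combo_counts <- mpg %>%
  count(drv, class)

# One point per combination, sized by the number of observations
ggplot(combo_counts, aes(x = drv, y = class, size = n)) +
  geom_point()
```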

Cleveland

There is a style of plotting popularized by William Cleveland. We’ll start with something that doesn’t really work and improve it in steps.

Exercise 3

The Simple Version

Get a barplot of the number of counties in each state using geom_bar().

Solution

county_clean %>% 
  ggplot(aes(x = state)) + geom_bar()

Exercise 4

The graph is unreadable because the x-axis labels overlap. One solution is to add coord_flip() as a layer. Do that.

Solution

county_clean %>% 
  ggplot(aes(x = state)) + geom_bar() + coord_flip()

Exercise 5: dplyr

Use dplyr to create a dataframe state_counties with one row per state, containing the state name and the count of counties in that state.

Solution

state_counties <- county_clean %>% 
  group_by(state) %>% 
  summarize(count = n()) %>% 
  ungroup()

head(state_counties)
## # A tibble: 6 × 2
##   state      count
##   <chr>      <int>
## 1 Alabama       67
## 2 Alaska        25
## 3 Arizona       15
## 4 Arkansas      75
## 5 California    58
## 6 Colorado      63
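
The group_by / summarize / ungroup pattern can also be written with dplyr's count(). The sketch below uses a small made-up table (toy) to show both forms producing the same counts; the name argument sets the name of the new column.

```r
library(dplyr)

# Made-up data: one row per county
toy <- tibble(
  state = c("Alabama", "Alabama", "Alaska", "Arizona", "Arizona", "Arizona")
)

# The long form used in the solution
long_form <- toy %>%
  group_by(state) %>%
  summarize(count = n()) %>%
  ungroup()

# The shorter equivalent
short_form <- toy %>%
  count(state, name = "count")

long_form
short_form
```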

Exercise 6: Column Plot

Use state_counties as the data argument of ggplot. In the aes, map x to count and y to state. Use geom_col().

Solution

state_counties %>% 
  ggplot(aes(x = count, y = state)) + geom_col()

Exercise 7: Reorder

Instead of state in the previous exercise, use reorder(state,count).

Solution

state_counties %>% 
  ggplot(aes(x = count, y = reorder(state,count))) + geom_col()
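
reorder() turns its first argument into a factor whose levels are sorted by the second argument, and ggplot2 places discrete values on the axis in level order. A minimal sketch using the first three rows of state_counties shown above (toy is a hypothetical name):

```r
# Counts taken from the head(state_counties) output
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  count = c(67, 25, 15)
)

# Levels are sorted by increasing count
ordered_state <- reorder(toy$state, toy$count)
levels(ordered_state)
# → "Arizona" "Alaska" "Alabama"

# Negating the second argument reverses the order
levels(reorder(toy$state, -toy$count))
# → "Alabama" "Alaska" "Arizona"
```

Because the first level is drawn at the bottom of the y-axis, reorder(state, count) puts the state with the most counties at the top.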

Exercise 8: Point

Replace geom_col() in the previous exercise with geom_point().

Solution

state_counties %>% 
  ggplot(aes(x = count, y = reorder(state,count))) + geom_point()

Exercise 9: Column again

Add a geom_col() to the previous exercise. Set width = .1.

Solution

state_counties %>% 
  ggplot(aes(x = count, y = reorder(state,count))) + 
  geom_point() + 
  geom_col(width = .1)
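
A variation on the same idea, not part of the exercise, draws the thin line with geom_segment() instead of a narrow geom_col(). Adding the segment layer before geom_point() keeps the point on top of the line. The sketch assumes a toy data frame built from the six rows of state_counties shown earlier.

```r
library(ggplot2)

# Counts taken from the head(state_counties) output
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  count = c(67, 25, 15, 75, 58, 63)
)

p <- ggplot(toy, aes(x = count, y = reorder(state, count))) +
  # A line from 0 to the count for each state
  geom_segment(aes(x = 0, xend = count, yend = reorder(state, count))) +
  # The point is added last, so it is drawn over the end of the line
  geom_point() +
  labs(y = "state")
p
```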

Exercise 10

Get a histogram of pop2017.

Solution

county_clean %>% 
  ggplot(aes(x = pop2017)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

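The graph is hard to read because pop2017 is extremely right-skewed: most counties fall into the first few bins, while a handful of very large counties stretch the x-axis.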

Exercise 11

Use a logarithmic scale. Add scale_x_log10() as a layer.

Solution

county_clean %>% 
  ggplot(aes(x = pop2017)) +
  geom_histogram() +
  scale_x_log10()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
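
With scale_x_log10(), the bins have equal width on the log scale, so each step along the axis represents a multiplication rather than an addition. The sketch below assumes the midwest dataset that ships with ggplot2 as a stand-in (its poptotal column holds county populations) and sets bins explicitly, which also avoids the stat_bin() message. A log scale requires strictly positive values.

```r
library(ggplot2)

# midwest ships with ggplot2; poptotal is the total population of each county
# A log scale needs strictly positive values
all(midwest$poptotal > 0)

p <- ggplot(midwest, aes(x = poptotal)) +
  # Setting bins explicitly silences the "Pick better value" message
  geom_histogram(bins = 30) +
  scale_x_log10()
p
```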