Exercises on ggplot2 3

Harold Nelson

2/9/2022

Setup

Load the tidyverse and the dataset “county_clean.Rdata”. Import the file “state_region.csv” into the dataframe state_region. Use left_join to add the region data to county_clean. Glimpse county_clean to confirm that the region columns were added.

Solution

library(tidyverse)
load("county_clean.Rdata")
state_region <- read_csv("state_region.csv")

county_clean <- county_clean %>%
  left_join(state_region, by = c("state" = "State"))

glimpse(county_clean)

## Rows: 3,135
## Columns: 17
## $ name              <fct> Autauga County, Baldwin County, Barbour County, Bibb…
## $ state             <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama…
## $ pop2000           <dbl> 43671, 140415, 29038, 20826, 51024, 11714, 21399, 11…
## $ pop2010           <dbl> 54571, 182265, 27457, 22915, 57322, 10914, 20947, 11…
## $ pop2017           <int> 55504, 212628, 25270, 22668, 58013, 10309, 19825, 11…
## $ pop_change        <dbl> 1.48, 9.19, -6.22, 0.73, 0.68, -2.28, -2.69, -1.51, …
## $ poverty           <dbl> 13.7, 11.8, 27.2, 15.2, 15.6, 28.5, 24.4, 18.6, 18.8…
## $ homeownership     <dbl> 77.5, 76.7, 68.0, 82.9, 82.0, 76.9, 69.0, 70.7, 71.4…
## $ multi_unit        <dbl> 7.2, 22.6, 11.1, 6.6, 3.7, 9.9, 13.7, 14.3, 8.7, 4.3…
## $ unemployment_rate <dbl> 3.86, 3.99, 5.90, 4.39, 4.02, 4.93, 5.49, 4.93, 4.08…
## $ metro             <fct> yes, yes, no, yes, yes, no, no, yes, no, no, yes, no…
## $ median_edu        <fct> some_college, some_college, hs_diploma, hs_diploma, …
## $ per_capita_income <dbl> 27841.70, 27779.85, 17891.73, 20572.05, 21367.39, 15…
## $ median_hh_income  <int> 55317, 52562, 33368, 43404, 47412, 29655, 36326, 436…
## $ `State Code`      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
## $ Region            <chr> "South", "South", "South", "South", "South", "South"…
## $ Division          <chr> "East South Central", "East South Central", "East So…
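A left join keeps every row of the left table, so a state name that is missing from the lookup table shows up only as NA in the new columns. The sketch below shows one way to check for that. The two small data frames are hypothetical stand-ins, not the real county_clean and state_region, and "Atlantis" is an invented key used to force a mismatch.

```r
library(dplyr)

# Hypothetical miniature stand-ins for county_clean and state_region
counties <- data.frame(
  name  = c("Autauga County", "Baldwin County", "Example County"),
  state = c("Alabama", "Alabama", "Atlantis"),  # "Atlantis" has no region row
  stringsAsFactors = FALSE
)
regions <- data.frame(
  State  = c("Alabama", "Alaska"),
  Region = c("South", "West"),
  stringsAsFactors = FALSE
)

joined <- counties %>%
  left_join(regions, by = c("state" = "State"))

# Rows of the left table with no match get NA in the joined columns
n_unmatched <- sum(is.na(joined$Region))

# anti_join() returns the left-table rows that found no partner
unmatched <- counties %>%
  anti_join(regions, by = c("state" = "State"))
```

Running the same two checks on the real data after the join would show whether any county lacks a Region.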

Exercise 1:

Review Histogram. Display the distribution of per_capita_income using a histogram.

Solution

county_clean %>% 
  ggplot(aes(x = per_capita_income)) +
  geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
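The message above suggests choosing a bin width explicitly. Here is a minimal sketch of doing so. It uses simulated values in place of the real county data, and the width of 2500 is an arbitrary choice for illustration, not a recommendation from the exercise.

```r
library(ggplot2)

set.seed(1)
# Simulated right-skewed incomes; NOT the real county data
fake <- data.frame(per_capita_income = rlnorm(500, meanlog = 10, sdlog = 0.3))

# binwidth fixes the width of every bar; boundary = 0 aligns bin edges
# to multiples of the width
p <- ggplot(fake, aes(x = per_capita_income)) +
  geom_histogram(binwidth = 2500, boundary = 0)

# layer_data() exposes the bins that stat_bin() computed
bins <- layer_data(p)
```

With binwidth supplied, the `stat_bin()` message no longer appears.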

Exercise 2: Density

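The data displayed by default with a histogram is the count of observations in each bin. You can also ask that the data be normalized so that the total area of the histogram is equal to 1.0. To see this, insert aes(y = ..density..) in the call to geom_histogram().

The surrounding .. .. tells ggplot2 that this “variable” is not present in the dataframe; it is computed by the histogram's binning step and made available when the plot is drawn. In ggplot2 3.3.0 and later the same request can be written aes(y = after_stat(density)), and from 3.4.0 on the ..density.. form produces a deprecation warning.

Solution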

county_clean %>% 
  ggplot(aes(x = per_capita_income)) +
  geom_histogram(aes(y = ..density..))

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
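To see that the density scale really normalizes the total area to 1, the bar heights can be pulled out of the plot and multiplied by the bar widths. This is a sketch on simulated data rather than county_clean, and it uses after_stat(density), the newer spelling of ..density...

```r
library(ggplot2)

set.seed(1)
# Simulated right-skewed incomes; NOT the real county data
fake <- data.frame(per_capita_income = rlnorm(500, meanlog = 10, sdlog = 0.3))

p <- ggplot(fake, aes(x = per_capita_income)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# One row per bar, including the computed density and the bar edges
bars <- layer_data(p)
bar_width <- bars$xmax - bars$xmin

# Area of each bar is height * width; the areas should sum to 1
total_area <- sum(bars$density * bar_width)
```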

Exercise 3: geom_density()

geom_density() is an alternative to geom_histogram() for displaying the distribution of a single quantitative variable. It draws a smooth curve instead of bars.

Display the distribution of per_capita_income using geom_density() in place of geom_histogram().

Solution

county_clean %>% 
  ggplot(aes(x = per_capita_income)) +
  geom_density()

Exercise 4

Add geom_histogram() to the previous graph for comparison. Map y to ..density.. in the histogram so that both layers share the same vertical scale.

Solution

county_clean %>% 
  ggplot(aes(x = per_capita_income)) + 
  geom_histogram(aes(y = ..density..)) +
  geom_density()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Exercise 5: The adjust parameter

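geom_density() has an optional parameter, adjust, which multiplies the default smoothing bandwidth. The default value is 1; smaller values give a wigglier curve and larger values a smoother one. Try setting it to .5 and 2.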

Solution

county_clean %>% 
  ggplot(aes(x = per_capita_income)) + 
  geom_density(adjust = .5)

county_clean %>% 
  ggplot(aes(x = per_capita_income)) + 
  geom_density(adjust = 2)
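The effect of adjust can also be seen numerically with base R's density() function. This sketch assumes that geom_density() relies on the same kind of kernel estimate and applies adjust in the same multiplicative way, as its documentation describes. The data are simulated, not the county data.

```r
set.seed(1)
# Simulated right-skewed incomes; NOT the real county data
x <- rlnorm(500, meanlog = 10, sdlog = 0.3)

d_default <- density(x)                # adjust = 1
d_narrow  <- density(x, adjust = 0.5)  # half the bandwidth: wigglier curve
d_wide    <- density(x, adjust = 2)    # twice the bandwidth: smoother curve

# The bandwidth actually used is stored in the bw component
bandwidths <- c(narrow = d_narrow$bw, default = d_default$bw, wide = d_wide$bw)
```

The three bandwidths stand in the ratio 0.5 : 1 : 2, which is why the curves in the two plots above differ so much in smoothness.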

Exercise 6: A Rug

Use geom_density() and add geom_rug(). The rug draws one tick mark along the x-axis for each observation.

Solution

county_clean %>% 
  ggplot(aes(x = per_capita_income)) + 
  geom_density() +
  geom_rug()

Exercise 7

Facet the previous graph by Region.

Solution

county_clean %>% 
  ggplot(aes(x = per_capita_income)) + 
  geom_density() +
  geom_rug() +
  facet_wrap(~ Region, ncol = 1)