Creating a State Map with ggplot2

# Hide code chunks in the knitted output and center all figures
knitr::opts_chunk$set(echo = FALSE,
                      fig.align = "center")


## Load the libraries we will be using
## (mapproj is needed by coord_map() for the map projections below)
pacman::p_load(gapminder, socviz, tidyverse, grid, ggthemes,
               usmap, maps, mapproj, statebins, viridis, leaflet)

# Creating a vector for dem/rep colors
party_colors <- c("Democratic" = "#2E74C0", 
                  "Republican" = "#CB454A") 

# Copy socviz's election data into the global environment so it
# shows up in the Environment pane
election <- election

Getting the state outlines data set

The state-level outlines come from the maps package; ggplot2's map_data() function turns them into a data frame that ggplot() can draw. The map = ... argument chooses which outlines to return:

map = "state" returns the outline of each state in the contiguous US (the lower 48 plus DC)
map = "county" returns the outline of each county in the contiguous US
map = "world" or map = "world2" returns the outline of each country ("world2" is centered on the Pacific)
map = "usa" returns just the outline of the contiguous US (no state borders)

For other countries, you can give map = “italy”, “france”, “nz”, etc…
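A minimal sketch of these calls, assuming the maps package is installed (map_data() reads its outlines from there). The object names are only illustrative.

```r
library(ggplot2)   # provides map_data(); the outlines themselves come from maps

# Each call returns one row per boundary point of the chosen outlines
# (object names here are illustrative)
us_states   <- map_data("state")    # lower 48 states plus DC
us_counties <- map_data("county")   # counties; the county name is in `subregion`
world       <- map_data("world")    # countries
usa_outline <- map_data("usa")      # contiguous US border only
italy       <- map_data("italy")    # one of the country databases in maps

# All of them share the same six columns
names(us_states)
names(world)
```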

We’ll keep it simple and just look at the US states data for now.

Let’s save the state outlines as us_states and look at the first six rows and the dimensions (rows, columns) of the data set…

##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

## [1] 15537     6
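The output above can be reproduced with a few lines like these; saving the result as us_states matches the name used in the rest of this walkthrough.

```r
library(ggplot2)

us_states <- map_data("state")

head(us_states)   # first six boundary points (all in Alabama)
dim(us_states)    # number of rows and columns
```

In these columns, group identifies each separate polygon (a state drawn in several pieces gets several groups, with the piece named in subregion), and order gives the sequence in which the points are connected.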

Creating the presidential election winner map

Let’s create a map that shows which candidate each state voted for in the 2016 election.

To do that, we need to add who won each state to the us_states data set. But how do we do that?

We need a column in both data sets we can use to ID which row of the election data goes with which rows in the us_states data.

Fortunately, both data sets have a column holding the state name: state in election and region in us_states!

Unfortunately, the names in election are capitalized (“Alabama”), while in us_states they are all lowercase (“alabama”). So we need to fix that first with the tolower() (to lower case) function!

Since the state column in us_states is named region, let’s name the new column in election region as well!

To make joining the data sets easier, save the result as election2, keeping only the region, st, winner, party, and pct_trump columns:

## # A tibble: 51 × 5
##    region               st    winner  party      pct_trump
##    <chr>                <chr> <chr>   <chr>          <dbl>
##  1 alabama              AL    Trump   Republican     62.1 
##  2 alaska               AK    Trump   Republican     51.3 
##  3 arizona              AZ    Trump   Republican     48.1 
##  4 arkansas             AR    Trump   Republican     60.6 
##  5 california           CA    Clinton Democratic     31.5 
##  6 colorado             CO    Clinton Democratic     43.2 
##  7 connecticut          CT    Clinton Democratic     40.9 
##  8 delaware             DE    Clinton Democratic     41.7 
##  9 district of columbia DC    Clinton Democratic      4.09
## 10 florida              FL    Trump   Republican     48.6 
## # ℹ 41 more rows
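One way to build election2, sketched with dplyr; the final setdiff() line is an extra sanity check, not part of the original steps. If it comes back empty, every region in us_states has a matching election row.

```r
library(tidyverse)
library(socviz)   # provides the election data

us_states <- map_data("state")

# Lowercase the state names so they match `region` in us_states,
# then keep only the columns needed for the map
election2 <- election |>
  mutate(region = tolower(state)) |>
  select(region, st, winner, party, pct_trump)

election2

# Sanity check before joining: map regions with no election row
setdiff(unique(us_states$region), election2$region)
```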

Now that both data sets have a region column with matching lowercase names, join election2 onto us_states by region and name the result us_states2!

## # A tibble: 15,537 × 10
##     long   lat group order region  subregion st    winner party      pct_trump
##    <dbl> <dbl> <dbl> <int> <chr>   <chr>     <chr> <chr>  <chr>          <dbl>
##  1 -87.5  30.4     1     1 alabama <NA>      AL    Trump  Republican      62.1
##  2 -87.5  30.4     1     2 alabama <NA>      AL    Trump  Republican      62.1
##  3 -87.5  30.4     1     3 alabama <NA>      AL    Trump  Republican      62.1
##  4 -87.5  30.3     1     4 alabama <NA>      AL    Trump  Republican      62.1
##  5 -87.6  30.3     1     5 alabama <NA>      AL    Trump  Republican      62.1
##  6 -87.6  30.3     1     6 alabama <NA>      AL    Trump  Republican      62.1
##  7 -87.6  30.3     1     7 alabama <NA>      AL    Trump  Republican      62.1
##  8 -87.6  30.3     1     8 alabama <NA>      AL    Trump  Republican      62.1
##  9 -87.7  30.3     1     9 alabama <NA>      AL    Trump  Republican      62.1
## 10 -87.8  30.3     1    10 alabama <NA>      AL    Trump  Republican      62.1
## # ℹ 15,527 more rows
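A sketch of the join, assuming a left join that keeps every row of us_states (the output above has the same 15,537 rows as us_states, which is consistent with that). The as_tibble() call is only there so the result prints as a tibble.

```r
library(tidyverse)
library(socviz)

us_states <- map_data("state")
election2 <- election |>
  mutate(region = tolower(state)) |>
  select(region, st, winner, party, pct_trump)

# Keep every boundary point in us_states and attach that state's result;
# each election2 row is repeated for every point of its state
us_states2 <- left_join(us_states, election2, by = "region") |>
  as_tibble()   # only so it prints like the tibble shown above

us_states2
```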

Now graph the winning party in 2016, by state

Let’s create a map of the lower 48 states using ggplot().

To create a map with ggplot(), we use 4 main aesthetics:

x = the longitude of each boundary point (long)
y = the latitude of each boundary point (lat)
group = the column identifying which polygon each point belongs to (group)
fill = the column used to shade each state (here, party)

The geom to use is geom_polygon(), which connects the x, y coordinates within each group in the order the rows appear and fills the resulting shape. Include color = "black" to draw a black outline around each state.
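To see why group matters, here is a small sketch with made-up coordinates (the toy data frame and the p_grouped/p_ungrouped names are hypothetical, chosen for this illustration): a square and a triangle stored in one data frame, drawn with and without group = group.

```r
library(ggplot2)

# Hypothetical data: a square (group 1) followed by a triangle (group 2)
toy <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 2.5),
  lat   = c(0, 0, 1, 1,   0, 0, 1),
  group = c(1, 1, 1, 1,   2, 2, 2)
)

# With group = group, each shape is closed separately
p_grouped <- ggplot(toy, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", color = "black")

# Without it, all seven points are joined into one polygon in row order
p_ungrouped <- ggplot(toy, aes(x = long, y = lat)) +
  geom_polygon(fill = "grey80", color = "black")

p_grouped
p_ungrouped
```

The same thing would happen to the state map: without group = group, ggplot2 would group rows only by the discrete fill, connecting every point of one party into a single jagged shape.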

Make sure to apply the colors in the party_colors vector with the matching manual fill scale, scale_fill_manual().

To improve the look of the map, add the following to gg_elect2016, the saved ggplot object:

Add theme_map() from the ggthemes package to use a theme better suited to maps
Add a projection using coord_map(projection = "albers", lat0 = 39, lat1 = 45) so the states take their familiar shape (coord_map() needs the mapproj package installed)
Add scale_x_continuous(expand = c(0, 0)) and scale_y_continuous(expand = c(0, 0)) to remove the padding ggplot() normally adds around the data
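Putting the pieces together, a sketch of the full winner map might look like this. It assumes the mapproj package is installed (coord_map() needs it) and rebuilds us_states2 so the block runs on its own; scale_fill_manual() is one way to apply party_colors.

```r
library(tidyverse)
library(socviz)     # election data
library(ggthemes)   # theme_map()
# Assumption: mapproj is installed, since coord_map() relies on it

party_colors <- c("Democratic" = "#2E74C0",
                  "Republican" = "#CB454A")

us_states <- map_data("state")
election2 <- election |>
  mutate(region = tolower(state)) |>
  select(region, st, winner, party, pct_trump)
us_states2 <- left_join(us_states, election2, by = "region")

gg_elect2016 <- ggplot(data = us_states2,
                       mapping = aes(x = long, y = lat, group = group,
                                     fill = party)) +
  geom_polygon(color = "black") +                  # black state outlines
  scale_fill_manual(values = party_colors) +       # blue/red by party name
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_x_continuous(expand = c(0, 0)) +           # no padding on x
  scale_y_continuous(expand = c(0, 0)) +           # no padding on y
  theme_map()

gg_elect2016
```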

Map a Numeric Variable by State: Percent for Trump in 2016

Instead of the binary Republican/Democratic outcome, change the map to display the percentage of the vote that went to Trump (pct_trump).

To shade the states from Democratic blue to Republican red, use scale_fill_gradient2() with:

low = "#2E74C0"
mid = scales::muted("purple")
high = "#CB454A"
midpoint = 50
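A sketch of the continuous version, assuming the same setup as the winner map (mapproj installed for coord_map()). Only the fill aesthetic and the fill scale change; gg_trump2016 is a name chosen for this sketch.

```r
library(tidyverse)
library(socviz)
library(ggthemes)   # theme_map()
# Assumption: mapproj is installed, since coord_map() relies on it

us_states <- map_data("state")
election2 <- election |>
  mutate(region = tolower(state)) |>
  select(region, st, winner, party, pct_trump)
us_states2 <- left_join(us_states, election2, by = "region")

# gg_trump2016 is an illustrative name
gg_trump2016 <- ggplot(data = us_states2,
                       mapping = aes(x = long, y = lat, group = group,
                                     fill = pct_trump)) +
  geom_polygon(color = "black") +
  # Blue below 50%, purple at 50%, red above
  scale_fill_gradient2(low = "#2E74C0",
                       mid = scales::muted("purple"),
                       high = "#CB454A",
                       midpoint = 50) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_map()

gg_trump2016
```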

The colors are more red and purple than we’d expect. Why?

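DC is the culprit: with only about 4% for Trump, it sits far below the midpoint of 50, so the scale stretches to reach it. Full blue is used only for DC, and every other state is squeezed toward the dark purple middle. Try the map without DC…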
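The arithmetic can be checked with scales::rescale_mid(), which scale_fill_gradient2() uses to place values around its midpoint: the side farther from the midpoint sets the stretch for both sides. In this sketch, 4.09 (DC) and 31.5 (California) come from the table above, while 68 is a hypothetical value standing in for the most Republican state.

```r
library(scales)

# 4.09 = DC and 31.5 = California (from the table above);
# 68 is a hypothetical value for the most Republican state
with_dc    <- c(dc = 4.09, midpoint = 50, reddest = 68)
without_dc <- c(california = 31.5, midpoint = 50, reddest = 68)

# Position of each value on the 0-1 color scale, with 50 mapped to 0.5
rescale_mid(with_dc, mid = 50)     # dc = 0, midpoint = 0.5, reddest about 0.70
rescale_mid(without_dc, mid = 50)  # california = 0, midpoint = 0.5, reddest about 0.99
```

With DC included, even the reddest state only gets about 70% of the way along the color scale; without it, the red end is used almost fully.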

Remove the rows corresponding to Washington DC (only 4% for Trump)

In region, DC is “district of columbia”, while in st it is just “DC”, so let’s use st to remove all of the DC rows.

Then pipe the filtered data set into the same code as the previous code chunk, removing the data = argument from ggplot() (the pipe now supplies the data).
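A sketch of the filtered version, again assuming mapproj is installed. The pipe passes the filtered data as ggplot()'s first argument, which is why data = can be dropped; gg_trump_no_dc is a name chosen for this sketch.

```r
library(tidyverse)
library(socviz)
library(ggthemes)   # theme_map()
# Assumption: mapproj is installed, since coord_map() relies on it

us_states <- map_data("state")
election2 <- election |>
  mutate(region = tolower(state)) |>
  select(region, st, winner, party, pct_trump)
us_states2 <- left_join(us_states, election2, by = "region")

# Drop every DC boundary point, then pipe straight into ggplot()
# (gg_trump_no_dc is an illustrative name)
gg_trump_no_dc <- us_states2 |>
  filter(st != "DC") |>
  ggplot(mapping = aes(x = long, y = lat, group = group, fill = pct_trump)) +
  geom_polygon(color = "black") +
  scale_fill_gradient2(low = "#2E74C0",
                       mid = scales::muted("purple"),
                       high = "#CB454A",
                       midpoint = 50) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_map()

gg_trump_no_dc
```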

Opiates map

Now we’ll switch back to the opiates data set (also from socviz) for our next example.

Use opiates data to graph small multiples

## Warning in right_join(mutate(opiates, state = tolower(state)), y = us_states, : Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
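The warning is expected here: each state appears once per year in opiates and once per boundary point in us_states, so rows match many-to-many by design, and relationship = "many-to-many" silences it. The call in the warning suggests the lowercased opiates data was right-joined onto us_states. The sketch below assumes the join matches opiates' state to us_states' region and maps the age-adjusted death rate (adjusted); the original code's exact choices aren't shown, and opiates_map and gg_opiates are names chosen for this sketch. It also assumes mapproj is installed for coord_map().

```r
library(tidyverse)
library(socviz)     # provides the opiates data
library(ggthemes)   # theme_map()
# Assumption: mapproj is installed, since coord_map() relies on it

us_states <- map_data("state")

# opiates_map and gg_opiates are illustrative names
opiates_map <- opiates |>
  # Keep only what the map needs (opiates also has its own census `region`)
  select(year, state, adjusted) |>
  mutate(state = tolower(state)) |>
  # One row per state-year vs. many outline rows per state: many-to-many
  right_join(us_states, by = join_by(state == region),
             relationship = "many-to-many") |>
  # Drop outline rows with no opiates match (if any) to avoid an NA panel
  filter(!is.na(year))

gg_opiates <- ggplot(data = opiates_map,
                     mapping = aes(x = long, y = lat, group = group,
                                   fill = adjusted)) +
  geom_polygon(color = "gray90", linewidth = 0.1) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_viridis_c(option = "plasma") +
  facet_wrap(~ year, ncol = 4) +   # one small map per year
  theme_map()

gg_opiates
```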