Creating Basic Maps in R

Hi Everyone! This is a quick tutorial to help you get started making maps in R. Below is a video where I walk through the code which follows.

I’m going to be using the “urbnmapr” package, which is pretty user friendly and generates maps for US states and counties. There are, by now, similar packages for most regions of the world. Hopefully this tutorial will give you a basic understanding of how things work, which you can apply to other countries/regions that you’re interested in. For this tutorial I’m assuming you know some basics about Rstudio. If this is your first time opening Rstudio, I recommend you complete the following tutorial first.

To get started let’s download the “urbnmapr” package. This also requires devtools:

install.packages('devtools')
devtools::install_github("UrbanInstitute/urbnmapr")

(When you first install devtools and urbnmapr sometimes you need to restart R to make everything run without error.)

Let’s look at the data in urbnmapr to get a feel for the structure. There are data for “states” and “counties.” Let’s look at “states” first by typing:View(states) in your console.

You see quite a large data set. What this data set contains is latitudinal and longitudinal coordinates that trace out the geographic boundaries of each state. Thankfully, someone has done all the heavy lifting here for you, they’ve coded all of this geographic information. When R plots the states it’s just going to “connect the dots” of those geographic coordinates for each state. Let’s take a look:

library(urbnmapr)
library(tidyverse)
StateRaw<-ggplot() + 
  geom_polygon(data = urbnmapr::states, mapping = aes(x = long, y = lat, group = group),
               fill = "grey", color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)
StateRaw

The principles are the same for the county maps. Type View(counties) in your console to check it out. Let’s make a county map:

CountyRaw<-ggplot() + 
  geom_polygon(data = urbnmapr::counties, mapping = aes(x = long, y = lat, group = group),
               fill = "grey", color = "white") +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45)
CountyRaw

State Maps

OK, so those are the basic maps. What we want to do is to shade those maps based on the data we are interested in visualizing. Let’s start with a shaded state map. What you’ll do here is first compile the state level data you are interested in. As an example, I got state level data on voter turnout in the 2020 U.S. Presidential election. I’ve formatted it for you in a table, which you can download here. (The reference for this data is here.)

Open up the data in Excel and take a look. What we’re going to do is to join this voter turnout data with the “states” data from urbnmapr. We’re going to do this join based on the variable in the “states” data called “state_fips.” Let’s View(states) again to see what this variable looks like. You’ll notice that the first state (Alabama) has a state_fips of “01.” The state_fips data is stored as a character, if it was stored as numeric, the state_fips would be “1.” This is important because when you join these two data sets you really want to make sure the state identifiers are consistent. You’ll notice in the Excel file that for Alabama, the identifier is simply “1.” If we tried to join these two data sets, we wouldn’t get a match because of the different data types. There are a couple of ways we can fix this. The easier way is to change the “states” data so that “state_fips” is numeric:

states$state_fips=as.numeric(states$state_fips)

(Disclaimer: you really want to make sure chopping off the leading “0” doesn’t create duplicate indicators for different states. I think we’re OK, but always be very careful when you change data types.)

(If you’re interested, the other way you can do this is to enter the data in Excel as a character by typing “01", and with the "” before the 0, Excel will keep it there. When you load this into Rstudio, you’ll need to specify that “state_fips” is to be stored as a character.)

Alright, let’s import the voter turnout data to Rstudio (changing the path to where the file is stored on your computer):

VoterDat<-read.table("/Users/dmunro/Dropbox/Teaching/Econ Portal/VoterTurnout.csv",sep=',',header=T)

View(VoterDat) to make sure everything looks OK. If you are building your own state data table, the other thing you want to be careful about is that the state indicator has the same name (“state_fips”) in the two data sets, since we’ll be joining on that. Let’s join our voter turnout data with the “states” data:

VoteMap_data <- left_join(VoterDat, states, by = "state_fips") 

(For more on “joins” see here.)

We’re now ready to make our state map with voter turnout rates. For the “fill” specification in the code below, you want to use the variable name in the data set that you want to shade with, in this case “Voter.Turnout”.

State1<-VoteMap_data %>%
     ggplot(aes(long, lat, group = group, fill = Voter.Turnout)) +
     geom_polygon(color = NA) +
     coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
     labs(fill = "Voter Turnout 2020")
State1

So, there we go. I don’t love a few things about this default color scheme. The first is the colors and the second is the fact that the states with higher voter turnout are shaded more lightly. Below is some slightly modified code where I change these features:

State2<-VoteMap_data %>% 
  ggplot(mapping = aes(long, lat, group = group, fill = Voter.Turnout)) +
  geom_polygon(color = "#ffffff", size = 0.25) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = 'cyan', high = 'cyan4',labels = scales::percent) +
  theme(legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(face = "bold", size = 11),
        legend.key.height = unit(.2, "in")) +
  labs(fill = "Voter Turnout 2020")
State2

Here I’m choosing “cyan” as my color for low voter turnout states and "cyan4 for my high voter turnout states. (See here for a color guide in ggplot2.) I’m also slightly modifying the legend so that it’s plotting in percent instead of decimals (labels=scales::percent). This is much more aesthetically pleasing to me than the first plot. We see Minnesota with very high voter turnout, and Oklahoma and Arkansas with quite low voter turnout. (If you are interested in other formatting tweaks such as eliminating the axis labels, or eliminating the lat/long lines all together, see the very end of the tutorial.)

County Maps

Let’s move on to the county-level maps. Again, what you’d do here is compile county level data (specifically following the census definition of “FIPS Counties”) that you want to shade the map with. As an example, Professor Caitlin Myers sent me data she has compiled that calculates the average driving distance to an abortion provider in each county. This is a nice measure of access to abortion services. You can download this data here. This data is in Stata format (.dta), so we’ll have to use “read_dta” instead of “read.table” (in the video I use the “readstata13” package which is another way of doing this, but I realized read_dta is part of “tidyverse” via “haven,” which is easier.)

Let’s load this data into Rstudio (again, changing the path as appropriate):

library(haven)
TravelDat <- read_dta("/Users/dmunro/Dropbox/Teaching/Econ Portal/td_2021.dta")

View(TravelDat) to take a look. Again, we want to be careful that the joining variables match. In the “counties” data, this variable is called “county_fips.” However, in the the abortion data it’s called “fips_code,” so let’s rename them so they match:

TravelDat<-rename(TravelDat, county_fips=fips_code)

The other thing we need to be worried about is the issue with “01” vs. “1”. In the counties data, the first county has a fips code of “01001”, this is stored as “01001” in the abortion data as well, so we’re good to go.

Let’s join the abortion data with the mapping data:

AbortionMap_data <- left_join(TravelDat, counties, by = "county_fips")

And let’s plot!!

AbortionMap1<-AbortionMap_data %>% 
  ggplot(mapping = aes(long, lat, group = group, fill = td)) +
  geom_polygon(color = "#ffffff", size = 0.1) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = 'cyan', high = 'cyan4') +
  theme(legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(face = "bold", size = 11),
        legend.key.height = unit(.2, "in")) +
  labs(fill = "Travel Distance")
AbortionMap1

You might also be interested in looking at counties in a specific region, say the Northeast. Let’s choose Maine, Vermont, New Hampshire, and Massachusetts.

AbortionMapNE<-AbortionMap_data %>%
filter(state_name %in% c("Maine", "Vermont", "New Hampshire", "Massachusetts")) %>%
  ggplot(mapping = aes(long, lat, group = group, fill = td)) +
  geom_polygon(color = "#ffffff", size = 0.25) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = 'cyan', high = 'cyan4') +
  theme(legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(face = "bold", size = 11),
        legend.key.height = unit(.2, "in")) +
  labs(fill = "Travel Distance")
AbortionMapNE

It looks like Northern VT and NH have the worst access.

And finally, if you’re interested in Vermont counties only:

AbortionMapVT<-AbortionMap_data %>%
filter(state_name %in% c("Vermont")) %>%
  ggplot(mapping = aes(long, lat, group = group, fill = td)) +
  geom_polygon(color = "#ffffff", size = 0.25) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = 'cyan', high = 'cyan4') +
  theme(legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(face = "bold", size = 11),
        legend.key.height = unit(.2, "in")) +
  labs(fill = "Travel Distance")
AbortionMapVT

Here we can see that the Northeast Kingdom has the worst access to abortion services in Vermont.

That’s it! Hope it was helpful. If you’re interested in other regions of the world, you might check out the following packages:

-tmap

-rnaturalearth

Additional Formatting Options

If you want to get rid of the lat/long labels on these maps you can use commands in the plotting “theme.” As an example, the state map without lat/long labels on the axes is given by:

State3<-VoteMap_data %>% 
  ggplot(mapping = aes(long, lat, group = group, fill = Voter.Turnout)) +
  geom_polygon(color = "#ffffff", size = 0.25) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = 'cyan', high = 'cyan4',labels = scales::percent) +
  theme(legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(face = "bold", size = 11),
        legend.key.height = unit(.2, "in"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank()) +
  labs(fill = "Voter Turnout 2020")
State3

And if you’re interested in eliminating the Lat/Long lines all together (you’ll need to install: install.packages("ggthemes").

library(ggthemes)
State4<-VoteMap_data %>% 
  ggplot(mapping = aes(long, lat, group = group, fill = Voter.Turnout)) +
  geom_polygon(color = "#ffffff", size = 0.25) +
  coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = 'cyan', high = 'cyan4',labels = scales::percent) +
   theme_map() +
  theme(legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(face = "bold", size = 11),
        legend.key.height = unit(.2, "in"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank()) +
  labs(fill = "Voter Turnout 2020")
State4