Introduction

In this code-through, we are going to walk through the process of creating a choropleth map to represent the number of households with children under the age of 18 in the United States. In order to accomplish this, we will be relying on data collected by the U.S. Census Bureau through the 2017 American Community Survey.

Set-Up

Install R and RStudio

This code-through uses two open-source tools: R and RStudio. You must have both installed on your device to follow along.

If you do not have R and RStudio on your device, you can download the appropriate software for your device at the following links: R and RStudio.

However, if you already have R and RStudio installed, then you can skip ahead to the next section.

Install R Markdown

One of the most useful features of RStudio is its ability to create documents known as R Markdown files, which end in .Rmd. These files can be rendered ("knit") to .html files that you can send to others to explore your data. You can also publish them online from within RStudio using RPubs.

To begin, you will need to have the rmarkdown package installed on your computer. To check whether you have the package installed, you can enter the following code into your console in RStudio:

# Returns TRUE if rmarkdown is installed, FALSE otherwise
"rmarkdown" %in% rownames(installed.packages())

If this returns FALSE, install the package by entering the following code:

install.packages("rmarkdown")

Set Up Interactive HTML Document

Once R Markdown is successfully installed, we can begin the process of creating our HTML document.

Following the instructions in RPubs' Introduction to R Markdown, open the File menu, choose New File, and then select R Markdown.

This will then present you with a dialog box like the following:

In this dialog box, select the Document option on the left-hand side of the screen. If you would like, you can also enter a title in the title box and your name in the author box. If not, you can enter these later. The most important thing here is to select HTML as the default output format to create an interactive webpage.

Once these options are selected, click OK and you will be taken to the document in RStudio where you will see the following screen:

There is quite a lot of information here, including a link to RStudio’s documentation on R Markdown. I would suggest clicking the Knit button at the top of the screen to get an idea of how your basic HTML document will look and to preview the features that can be tailored to your needs. In addition, you can preview your options for publishing your file online using RPubs by clicking the blue circular icon next to ‘Run’.

After you have completed previewing the HTML file and publishing options, return to the main RStudio window where we will examine the document.

First, you will notice the YAML header at the top of the page:

If you did not enter your name or a title in the dialog box when creating the file, the YAML header will contain just two elements by default:

  1. title
  2. output

The output element is the most critical because it tells R Markdown to render the .Rmd file as an HTML document.

However, we can also customize this header a bit to look something like this:

These changes provide:

  1. A detailed title so that viewers of your HTML web page document can understand the subject immediately.

  2. An author for the document so that you can be credited for the document and/or contacted about any issues with the document.

  3. A date to give an estimate of how recently the information presented has been curated.

  4. A fancier format for your HTML document. R Markdown is compatible with quite a few different templates. Andrew Zieffler created an awesome gallery of many of the available templates. Specifically, the theme used for this code-through is presented in the image and can be accessed by entering the following code in your RStudio Console and then setting your YAML header with the appropriate settings:

install.packages("rmdformats")
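As a rough sketch of what a customized header might contain, the snippet below writes a small .Rmd file to a temporary location and reads its header back with rmarkdown::yaml_front_matter(). The title, author, and the rmdformats::readthedown output format are placeholders chosen for illustration (the exact theme from the screenshot is not recoverable here), so substitute your own values.

```r
library(rmarkdown)

# Hypothetical header values, for illustration only
rmd_lines <- c(
  "---",
  'title: "Households with Children in the United States"',
  'author: "Your Name"',
  'date: "`r Sys.Date()`"',          # inline R, evaluated when the file is knit
  "output: rmdformats::readthedown", # assumed theme; any installed format works
  "---",
  "",
  "Body text goes here."
)

# Write the document to a temporary file and parse only its YAML header
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)
header <- yaml_front_matter(rmd_path)

names(header)   # the four header elements described above
header$output   # the output format that knitting would use
```

Reading the header back this way does not knit the document, so it is a cheap way to confirm the YAML is well formed before rendering.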

Customization

Now that we have the basic settings in place, we will add a few details to our document. There will be three primary categories throughout the file:

  1. Code Chunks
  2. Sections
  3. Plain Text

You can also view the R Markdown Cheatsheet for assistance with all of these customizations.

Code Chunks

Following the YAML header at the top of your document, the next section of the file that you see is what’s known as a code chunk:

The code samples in this code-through that check for and install packages are all examples of code chunks. These chunks are where you will run your R code in R Markdown, and they are essential to all data analysis.

For this tutorial, we will have several code chunks to do everything from running packages to defining our data set to graphing our choropleth map.
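To make the idea of a chunk concrete, here is a minimal sketch that builds a tiny R Markdown fragment as text and knits it with knitr::knit(). The chunk label "addition" and the sample text are made up for this example; in practice you would type the chunk directly into your .Rmd file rather than assembling it in code.

```r
library(knitr)

# Three backticks open and close a chunk in an .Rmd file
fence <- strrep("`", 3)

rmd_text <- c(
  "Two plus two:",
  "",
  paste0(fence, "{r addition, echo=TRUE}"),  # chunk header: language, label, options
  "2 + 2",                                   # the R code that will be run
  fence
)

# Knit the text; the result is Markdown containing both the code and its output
md_out <- knit(text = rmd_text, quiet = TRUE)
cat(md_out)
```

The echo option controls whether the code itself appears in the knitted document; setting echo=FALSE would show only the result.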

Sections

The sections element of the document is relatively self-explanatory and serves as the way that you will “break down” your findings in your HTML document to make your data easy for you to navigate and for your viewers to understand. As shown in the image below, these headers are defined in R Markdown by using the ‘#’ sign, known as a ‘pound sign’, ‘number sign’, or more recently, a ‘hashtag’:

Plain Text

As implied by the name, plain text is the basic text that is written in an R Markdown file to provide details, answers, or any other form of text needed to accompany the code chunks. This text can be stylized with asterisks, HTML code, and other features listed on the R Markdown cheat sheet to enhance its appearance.

Data Preparation

Packages

Before we can begin our data analysis, we need a few tools. Most importantly, we need to install the packages we will be using to create our choropleth map:

install.packages("tidycensus")
install.packages("tidyverse")
install.packages("viridis")
install.packages("DT")
install.packages("dplyr")
install.packages("leaflet")
install.packages("pander")
install.packages("stringr")
install.packages("sf")
install.packages("htmltools")
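If some of these packages are already on your machine, you may prefer to install only the ones that are missing. The sketch below is one way to do that; the install call is left commented out so that running the snippet only reports what is missing.

```r
# Packages used in this code-through
pkgs <- c("tidycensus", "tidyverse", "viridis", "DT", "dplyr",
          "leaflet", "pander", "stringr", "sf", "htmltools")

# Compare against the packages already installed
already_installed <- rownames(installed.packages())
missing_pkgs <- setdiff(pkgs, already_installed)

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All packages are already installed.")
}
```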

Library

Once these packages have been installed, we will load them into our R session with the following code chunk:

# Library
library(tidycensus)
library(tidyverse)
library(viridis)
library(DT)
library(dplyr)
library(leaflet)
library(pander)
library(stringr)
library(sf)
library(htmltools)

Data

Now that we have installed R, RStudio, R Markdown, and all of our packages, we’re ready to begin our data analysis.

First, we’re going to load the data from the Census Bureau.

In order to do this, you will need to request a Census API key from the Census Bureau website:

After submitting the form, you will receive an email with your key, which you will enter into the following code:

census_api_key("YOUR_CENSUS_API_KEY_HERE")
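If you would rather not paste your key into every document, tidycensus can store it for you: calling census_api_key() with install = TRUE writes the key to your .Renviron file as CENSUS_API_KEY. The sketch below only checks whether such a key is visible to the current session; the install call is commented out and uses the same placeholder as above.

```r
# tidycensus looks for the key in the CENSUS_API_KEY environment variable
key <- Sys.getenv("CENSUS_API_KEY")  # empty string if the variable is unset

if (nzchar(key)) {
  message("A Census API key is available to this session.")
} else {
  message("No stored key found; set one before querying the API.")
  # One-time setup (then restart R so .Renviron is re-read):
  # tidycensus::census_api_key("YOUR_CENSUS_API_KEY_HERE", install = TRUE)
}
```

Keeping the key out of the .Rmd file also means you will not publish it by accident when sharing the document on RPubs.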

Preview Data

Once you have set your API key, you can query data from any of the Census Bureau’s available American Community Surveys. For the purposes of this tutorial, we will be accessing the 2017 ACS 5-year estimates. To preview this data, enter the following code:

# Preview Data
VarPreview <- load_variables( 2017, "acs5", cache=TRUE )

# Convert all letters to upper case to make searching easier
VarPreview$concept <- toupper(VarPreview$concept)

# Search for households that have one or more people under 18
# mutate() flags rows whose concept contains "PEOPLE UNDER 18" (via grepl),
# and filter() keeps only the flagged rows
ChildHouseholds <- VarPreview %>%
  mutate( contains.ChildHouseholds = grepl( "PEOPLE UNDER 18", concept ) ) %>%
  filter( contains.ChildHouseholds )

# Preview data
head(ChildHouseholds) %>% pander()
name        label
----------  ------------------------------------------------------------------
B11005_001  Estimate!!Total
B11005_002  Estimate!!Total!!Households with one or more people under 18 years
B11005_003  Estimate!!Total!!Households with one or more people under 18 years!!Family households
B11005_004  Estimate!!Total!!Households with one or more people under 18 years!!Family households!!Married-couple family
B11005_005  Estimate!!Total!!Households with one or more people under 18 years!!Family households!!Other family
B11005_006  Estimate!!Total!!Households with one or more people under 18 years!!Family households!!Other family!!Male householder no wife present

concept (same for all six rows): HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE
contains.ChildHouseholds (same for all six rows): TRUE

A review of the data indicates that B11005_001 lists all households and B11005_002 lists all households with one or more people under 18 (children). Therefore, B11005_002 is the variable we will use to map households with children on our choropleth map, with B11005_001 serving as the denominator when we calculate percentages.
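The same search can be written a little more compactly with stringr. The sketch below runs on a small hand-made table instead of the real load_variables() result so that it works without an API key; the first two rows are copied from the preview, and the B01001_001 row is added here only for contrast.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the load_variables() output
concept_hh <- "HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE"
vars <- tibble(
  name    = c("B11005_001", "B11005_002", "B01001_001"),
  label   = c("Estimate!!Total",
              "Estimate!!Total!!Households with one or more people under 18 years",
              "Estimate!!Total"),
  concept = c(concept_hh, concept_hh, "SEX BY AGE")
)

# Keep only the rows whose concept mentions people under 18
child_vars <- vars %>%
  filter(str_detect(concept, "PEOPLE UNDER 18"))

child_vars$name
```

Filtering directly on str_detect() avoids creating the helper column, which is handy when you do not need to keep the TRUE/FALSE flag afterwards.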

Next, we will prepare our data set to contain information on households with children under 18 and total households for every state, the District of Columbia, and Puerto Rico:

# Data Set

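# Named vector: the names become the values of the `variable` column
dat <- c(ChildHouseholds = "B11005_002", TotalHouseholds = "B11005_001")
# geometry = TRUE attaches state boundaries so the result can be mapped
CenDF <- get_acs(geography = "state", year = 2017, survey = "acs5",
                 variables = dat, geometry = TRUE)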
head(CenDF) %>% pander()
GEOID  NAME     variable         estimate  moe   geometry
-----  -------  ---------------  --------  ----  ----------------------------
01     Alabama  TotalHouseholds  1856695   5529  MULTIPOLYGON (((-88.05338 3…
01     Alabama  ChildHouseholds  570218    5330  MULTIPOLYGON (((-88.05338 3…
02     Alaska   TotalHouseholds  252536    1271  MULTIPOLYGON (((-166.5772 5…
02     Alaska   ChildHouseholds  88703     1135  MULTIPOLYGON (((-166.5772 5…
04     Arizona  TotalHouseholds  2482311   6429  MULTIPOLYGON (((-114.8163 3…
04     Arizona  ChildHouseholds  774208    5238  MULTIPOLYGON (((-114.8163 3…

Now that we have the data we need, we will make a few changes to prepare for mapping. This will include removing the margin of error (moe) from our data set, shifting child households and total households to their own columns, and finding the percentage of households that have children for each state, D.C., and Puerto Rico:

library(tidyr)
CenDF <- CenDF %>%
  select(-moe) %>%                # Drop the margin of error
  spread(variable, estimate) %>%  # Spread moves rows into columns
  mutate(PercentageHHWithKids = round((ChildHouseholds / TotalHouseholds) * 100, 2)) %>% # Percentage of households with children
  arrange(PercentageHHWithKids)   # Orders by percentage
CenDF %>% pander()
GEOID  NAME                  ChildHouseholds  TotalHouseholds  PercentageHHWithKids  geometry
-----  --------------------  ---------------  ---------------  --------------------  ----------------------------
11     District of Columbia  59794            277985           21.51                 MULTIPOLYGON (((-77.11976 3…
23     Maine                 143583           554061           25.91                 MULTIPOLYGON (((-68.83144 4…
50     Vermont               67446            258535           26.09                 MULTIPOLYGON (((-73.43774 4…
30     Montana               114479           419975           27.26                 MULTIPOLYGON (((-116.0497 4…
12     Florida               2070279          7510882          27.56                 MULTIPOLYGON (((-80.17628 2…
54     West Virginia         205402           737671           27.84                 MULTIPOLYGON (((-82.6432 38…
38     North Dakota          87617            311525           28.13                 MULTIPOLYGON (((-104.0487 4…
42     Pennsylvania          1420567          5007442          28.37                 MULTIPOLYGON (((-80.51989 4…
33     New Hampshire         150964           526710           28.66                 MULTIPOLYGON (((-70.61702 4…
44     Rhode Island          119319           412028           28.96                 MULTIPOLYGON (((-71.28802 4…
55     Wisconsin             677337           2328754          29.09                 MULTIPOLYGON (((-86.95617 4…
41     Oregon                457331           1571631          29.10                 MULTIPOLYGON (((-123.5989 4…
10     Delaware              103355           352357           29.33                 MULTIPOLYGON (((-75.56555 3…
26     Michigan              1142299          3888646          29.38                 MULTIPOLYGON (((-83.19159 4…
25     Massachusetts         769371           2585715          29.75                 MULTIPOLYGON (((-70.23405 4…
39     Ohio                  1381829          4633145          29.82                 MULTIPOLYGON (((-82.73571 4…
19     Iowa                  376231           1251587          30.06                 MULTIPOLYGON (((-96.6397 42…
36     New York              2194841          7302710          30.06                 MULTIPOLYGON (((-72.03683 4…
56     Wyoming               69356            230237           30.12                 MULTIPOLYGON (((-111.0569 4…
45     South Carolina        565221           1871307          30.20                 MULTIPOLYGON (((-79.50795 3…
09     Connecticut           411982           1361755          30.25                 MULTIPOLYGON (((-72.76143 4…
29     Missouri              724127           2386203          30.35                 MULTIPOLYGON (((-95.77355 4…
46     South Dakota          103097           339458           30.37                 MULTIPOLYGON (((-104.0577 4…
72     Puerto Rico           375214           1222606          30.69                 MULTIPOLYGON (((-65.23805 1…
01     Alabama               570218           1856695          30.71                 MULTIPOLYGON (((-88.05338 3…
27     Minnesota             662900           2153202          30.79                 MULTIPOLYGON (((-89.59206 4…
53     Washington            852879           2755697          30.95                 MULTIPOLYGON (((-122.3316 4…
35     New Mexico            238687           770435           30.98                 MULTIPOLYGON (((-109.0502 3…
47     Tennessee             789370           2547194          30.99                 MULTIPOLYGON (((-90.3103 35…
04     Arizona               774208           2482311          31.19                 MULTIPOLYGON (((-114.8163 3…
32     Nevada                329681           1052249          31.33                 MULTIPOLYGON (((-120.0057 3…
21     Kentucky              540452           1724514          31.34                 MULTIPOLYGON (((-89.40565 3…
17     Illinois              1511085          4818452          31.36                 MULTIPOLYGON (((-91.51297 4…
37     North Carolina        1216304          3874346          31.39                 MULTIPOLYGON (((-75.72681 3…
08     Colorado              657781           2082531          31.59                 MULTIPOLYGON (((-109.0603 3…
05     Arkansas              362985           1147291          31.64                 MULTIPOLYGON (((-94.61783 3…
31     Nebraska              237121           748405           31.68                 MULTIPOLYGON (((-104.0534 4…
18     Indiana               803968           2537189          31.69                 MULTIPOLYGON (((-88.09776 3…
22     Louisiana             554512           1737645          31.91                 MULTIPOLYGON (((-88.8677 29…
20     Kansas                360415           1121943          32.12                 MULTIPOLYGON (((-102.0517 4…
51     Virginia              1002203          3105636          32.27                 MULTIPOLYGON (((-75.74241 3…
40     Oklahoma              479559           1468971          32.65                 MULTIPOLYGON (((-103.0026 3…
15     Hawaii                148852           455502           32.68                 MULTIPOLYGON (((-156.0608 1…
24     Maryland              713897           2181093          32.73                 MULTIPOLYGON (((-76.05015 3…
16     Idaho                 200867           609124           32.98                 MULTIPOLYGON (((-117.2427 4…
28     Mississippi           368719           1103514          33.41                 MULTIPOLYGON (((-88.50297 3…
34     New Jersey            1069635          3199111          33.44                 MULTIPOLYGON (((-75.5591 39…
13     Georgia               1263112          3663104          34.48                 MULTIPOLYGON (((-81.27939 3…
02     Alaska                88703            252536           35.12                 MULTIPOLYGON (((-166.5772 5…
06     California            4540623          12888128         35.23                 MULTIPOLYGON (((-118.6044 3…
48     Texas                 3530159          9430419          37.43                 MULTIPOLYGON (((-94.7183 29…
49     Utah                  391666           938365           41.74                 MULTIPOLYGON (((-114.053 37…
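In current versions of tidyr, spread() is superseded by pivot_wider(), so you may want to write the reshaping step with the newer function. The sketch below reproduces the reshape and percentage calculation on a three-state toy table built from the estimates shown earlier. It leaves out the geometry column, so treat it as an illustration of the reshape logic, not a drop-in replacement for the chunk above.

```r
library(dplyr)
library(tidyr)

# Long-format estimates for three states, copied from the preview table
long <- tibble(
  NAME     = rep(c("Alabama", "Alaska", "Arizona"), each = 2),
  variable = rep(c("TotalHouseholds", "ChildHouseholds"), times = 3),
  estimate = c(1856695, 570218, 252536, 88703, 2482311, 774208)
)

wide <- long %>%
  # One row per state, one column per variable
  pivot_wider(names_from = variable, values_from = estimate) %>%
  # Share of households with children, as a percentage rounded to 2 decimals
  mutate(PercentageHHWithKids = round(ChildHouseholds / TotalHouseholds * 100, 2)) %>%
  arrange(PercentageHHWithKids)

wide
```

The percentages for these three states (30.71, 31.19, and 35.12) match the corresponding rows of the full table, which is a quick sanity check that the two approaches agree.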

Interactive Choropleth Map Creation

Now that we have the tools we need, we can begin creating our map.

Details

Before we plot our data set above, we need to decide on a few critical details: the color scheme we would like for our map, the interactive information we want to display, and our information dialogue box.

As detailed in this guide on Leaflet for R, there are a variety of color schemes and options that can be customized for our mapping needs. In this tutorial, we will use the colorNumeric function and a palette of blue hues. Many leading data visualization specialists such as Cole Nussbaumer Knaflic in her book ‘Storytelling with Data’ recommend creating charts in blue whenever possible because it is an easy-to-view color for individuals with color blindness and the range of hues allows for quick comparison across categories.

Therefore, we will set our palette to Blues from ColorBrewer and set our domain to the variable we created above, PercentageHHWithKids.

pal <- colorNumeric(palette = "Blues", domain = CenDF$PercentageHHWithKids)
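It can help to see what a palette function actually returns. The sketch below builds a stand-alone palette over the lowest and highest percentages in our table and asks it for a few colors; pal_demo and pal_bins are names made up for this example. It also shows colorBin(), an alternative from the same leaflet family that groups values into discrete classes instead of a continuous ramp.

```r
library(leaflet)

# Lowest and highest percentages from the table above
pct_range <- c(21.51, 41.74)

# Continuous ramp: every value gets its own shade of blue
pal_demo <- colorNumeric(palette = "Blues", domain = pct_range)
pal_demo(c(21.51, 30, 41.74))   # hex codes running from light to dark blue

# Binned alternative: values are grouped into a handful of classes
pal_bins <- colorBin(palette = "Blues", domain = pct_range, bins = 4)
pal_bins(c(21.51, 30, 41.74))
```

A continuous ramp shows fine differences between states, while bins can make it easier to read a state's class off the legend. Either function can be passed to addPolygons() in the same way.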

Now that we have our color scheme set, we need to decide what information we want to display when people click on our interactive map. For our purposes, just two fields will suffice: the name of the state and the percentage of households with children.

We will define these details below and preview the output of our popup:

popup <- paste0("<strong>", CenDF$NAME, 
                   "</strong><br />Percentage of Households with Children: ", CenDF$PercentageHHWithKids, "%")
head(popup)
## [1] "<strong>District of Columbia</strong><br />Percentage of Households with Children: 21.51%"
## [2] "<strong>Maine</strong><br />Percentage of Households with Children: 25.91%"               
## [3] "<strong>Vermont</strong><br />Percentage of Households with Children: 26.09%"             
## [4] "<strong>Montana</strong><br />Percentage of Households with Children: 27.26%"             
## [5] "<strong>Florida</strong><br />Percentage of Households with Children: 27.56%"             
## [6] "<strong>West Virginia</strong><br />Percentage of Households with Children: 27.84%"
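One small detail: paste0() converts numbers to text using R's default formatting, which drops trailing zeros, so a value such as Oregon's 29.10 would appear as 29.1% in the popup. If you would like every percentage to show two decimal places, sprintf() is one option. The sketch below compares the two approaches on a couple of values from our table.

```r
state <- c("Oregon", "Utah")
pct   <- c(29.10, 41.74)

# paste0() drops the trailing zero in 29.10
popup_paste <- paste0("<strong>", state,
                      "</strong><br />Percentage of Households with Children: ",
                      pct, "%")

# sprintf() with %.2f always prints two decimals; %% is a literal percent sign
popup_fmt <- sprintf(
  "<strong>%s</strong><br />Percentage of Households with Children: %.2f%%",
  state, pct
)

popup_paste[1]
popup_fmt[1]
```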

For our information dialogue box we will use the htmltools package to customize a message:

library(htmltools)
# Set the font size with an inline style attribute on the paragraph
MapInfo <- tags$p(style = "font-size: 12px;",
            tags$b("Percentage of households with children by state, click on states to view details"))

Mapping Data

Finally, we can put all these steps together to create our map with the following code:

CenDF %>% # Inputs data set
    leaflet(width = "100%") %>% # Interactive mapping package
    addProviderTiles(provider = "CartoDB.Positron") %>% # Changes base map
    setView(-98.483330, 38.712046, zoom = 4) %>% # Zooms into the United States
    addPolygons(popup = ~ popup,
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(PercentageHHWithKids)) %>% # Customizes map
    addLegend("bottomright", 
              pal = pal, 
              values = ~ PercentageHHWithKids,
              title = "Percentages",
              opacity = 1) %>% # Creates legend
    addControl(MapInfo, position = "bottomleft") #Creates info dialogue box

The CenDF line of code pipes our data set into the leaflet() function, which maps our data in an interactive format.

Once this occurs, we then tinker with leaflet’s settings to create our choropleth map. The first thing that we alter is leaflet’s base map. Called with no arguments, addTiles() uses the basic OpenStreetMap tiles displayed below:

leaflet(width = "100%") %>%
    setView(-98.483330, 38.712046, zoom = 4) %>%
    addTiles()

While this map is acceptable for use, altering it to the more customized CartoDB.Positron map gives us a cleaner, less-busy base map:

leaflet(width = "100%") %>%
    addProviderTiles(provider = "CartoDB.Positron") %>%
    setView(-98.483330, 38.712046, zoom = 4)

The next line of code centers our map on the contiguous United States (longitude first, then latitude) and sets the zoom level to 4:

setView(-98.483330, 38.712046, zoom = 4)
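If you would rather describe the area to show than pick a center point and zoom level, leaflet's fitBounds() takes two opposite corners of a bounding box and lets the map choose the zoom. The sketch below uses rough corner coordinates for the contiguous United States; they are approximations chosen for this example, not official boundaries.

```r
library(leaflet)

m <- leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  # Approximate south-west and north-east corners of the contiguous U.S.
  fitBounds(lng1 = -125, lat1 = 24, lng2 = -66, lat2 = 50)

m  # prints the map in the RStudio Viewer or a knitted document
```

Because the zoom is computed from the box, this approach tends to adapt better when the map is displayed at different widths.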

Now, we add the details we created above by entering our popup and palette preferences into the addPolygons() function:

    addPolygons(popup = ~ popup,
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(PercentageHHWithKids))

Last, but not least, we create a legend for our map:

    addLegend("bottomright", 
              pal = pal, 
              values = ~ PercentageHHWithKids,
              title = "Percentages",
              opacity = 1)
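As an optional refinement, the legend entries can carry a percent sign so readers do not have to infer the unit from the title. The sketch below is a stand-alone example that uses three percentages from our table and leaflet's labelFormat() helper; pal_demo and the legend title are names chosen for this illustration.

```r
library(leaflet)

# A few percentages from the table, enough to define the legend's range
pct <- c(21.51, 30.06, 41.74)
pal_demo <- colorNumeric(palette = "Blues", domain = pct)

m <- leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  addLegend("bottomright",
            pal = pal_demo,
            values = pct,
            title = "Households with children",
            labFormat = labelFormat(suffix = "%"),  # append % to each label
            opacity = 1)

m
```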

And voila!

CenDF %>% 
    leaflet(width = "100%") %>% 
    addProviderTiles(provider = "CartoDB.Positron") %>%
    setView(-98.483330, 38.712046, zoom = 4) %>%
    addPolygons(popup = ~ popup,
                stroke = FALSE,
                smoothFactor = 0,
                fillOpacity = 0.7,
                color = ~ pal(PercentageHHWithKids)) %>%
    addLegend("bottomright", 
              pal = pal, 
              values = ~ PercentageHHWithKids,
              title = "Percentages",
              opacity = 1) %>%
    addControl(MapInfo, position = "bottomleft")


We now have an interactive map that displays our census data for each state. Specifically, we can clearly see that, among the 50 states, Maine has the lowest percentage of households with children (25.91%) while Utah has the highest (41.74%)! The District of Columbia, which is not a state, is lower still at 21.51%.
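These extremes can also be pulled out with code instead of by reading the table. The sketch below works on a four-row toy table copied from the results above and drops the District of Columbia (GEOID 11) and Puerto Rico (GEOID 72) before ranking, so that only states are compared. The same dplyr verbs should apply to the full CenDF object, though that is not tested here.

```r
library(dplyr)

# Four rows copied from the results table
toy <- tibble(
  GEOID = c("11", "23", "72", "49"),
  NAME  = c("District of Columbia", "Maine", "Puerto Rico", "Utah"),
  PercentageHHWithKids = c(21.51, 25.91, 30.69, 41.74)
)

# Keep states only: drop D.C. (11) and Puerto Rico (72)
states_only <- toy %>% filter(!GEOID %in% c("11", "72"))

lowest  <- slice_min(states_only, PercentageHHWithKids, n = 1)
highest <- slice_max(states_only, PercentageHHWithKids, n = 1)

lowest$NAME
highest$NAME
```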

Conclusion

Congratulations! You have completed this entire code-through of step-by-step instructions and together we have created an interactive choropleth map!

If you would like to explore the Census data further and all of the neat features available with the Leaflet package, feel free to view the resources below:

Videos

This video created by the United States Census Bureau provides a lot of insights on the American Community Survey and its many uses for data analysis and policymaking:

This DataGem video from the Census Bureau provides additional details on ways to access the survey data:

Readings

American Community Survey: Website from the U.S. Census Bureau that provides details on the survey and access to data.

TidyCensus: Details and tips on how to use the tidycensus R package.

Leaflet for R: Details and tips on how to use the leaflet R package.

TidyCensus and Leaflet Example One: This blog post by Julia Silge was the inspiration for this code-through and provides other great examples of combining the tidycensus and leaflet packages to create interactive choropleths.

TidyCensus and Leaflet Example Two: Similar to this code-through, this tutorial by Kier O’Neil was also based off of Julia Silge’s blog post and gives other examples of ways to work with tidycensus and leaflet together.

Interactive Choropleth Maps: This tutorial video and website provide great tips and tricks on ways to create choropleth maps in R. However, this specific tutorial uses tigris to get shapefiles. This is helpful if you’re using a data set that does not already have the shapefiles built in, but it requires a few extra steps beyond using the tidycensus package, which the site details beautifully.

About The Author

This code-through tutorial was created by Courtney Stowers for CPP 529 Data Analytics Practicum in the Master of Science in Program Evaluation and Data Analytics program at Arizona State University. She can be contacted at: .