In this code-through, we are going to walk through the process of creating a choropleth map that shows the percentage of households with children under the age of 18 in each U.S. state. To accomplish this, we will rely on data collected by the U.S. Census Bureau through the 2017 American Community Survey.
The data analysis software that we will be using in this code-through is R and RStudio, both of which are open source. You must have both installed on your device to follow along.
If you do not have R and RStudio on your device, you can download the appropriate software for your device at the following links: R and RStudio.
However, if you already have R and RStudio installed, then you can skip ahead to the next section.
One of the most useful features of RStudio is its ability to create R Markdown documents, whose file names end in .Rmd. These files can be knit to .html files that you can send to others so they can explore your data. You can also publish them online through RPubs from within RStudio.
To begin, you will need to have the R Markdown package installed on your computer. To check whether you have the package installed, you can enter the following code into your console in RStudio:
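The original chunk is not shown here, so the following is only a minimal sketch of one way to run that check, using base R's installed.packages(). The variable name has_rmarkdown is made up for illustration:

```r
# Check whether the rmarkdown package is present in any library on this machine.
# installed.packages() returns a matrix whose row names are package names.
has_rmarkdown <- "rmarkdown" %in% rownames(installed.packages())

# TRUE means the package is installed; FALSE means it still needs installing
has_rmarkdown
```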
If you do not have R Markdown installed, then you will enter the following code:
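A hedged sketch of the install step follows. The guard around install.packages() is an addition for convenience (it skips the download when the package is already present) and is not necessarily how the original chunk was written:

```r
# Install rmarkdown from CRAN only if it is not already available.
# requireNamespace() returns FALSE (without an error) when the package is missing.
if (!requireNamespace("rmarkdown", quietly = TRUE)) {
  install.packages("rmarkdown")
}
```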
Once R Markdown is successfully installed, we can begin the process of creating our HTML document.
Following the instructions on RPub’s Introduction to R Markdown you can open up the dialog box to create a New File and then a New R Markdown.
This will then present you with a dialog box like the following:
In this dialog box, select the Document option on the left-hand side of the screen. If you would like, you can also enter a title in the title box and your name in the author box. If not, you can enter these later. The most important thing here is to select HTML as the default output format to create an interactive webpage.
Once these options are selected, click OK and you will be taken to the document in RStudio, where you will see the following screen:
There is quite a lot of information here, including a link to RStudio’s extensive R Markdown documentation. I would suggest clicking the Knit button at the top of the screen to get an idea of how your basic HTML document will look and to preview the features that can be tailored to your individual needs. In addition, you can preview your options to publish your file online using RPubs by clicking the blue circular figure next to ‘Run’.
After you have completed previewing the HTML file and publishing options, return to the main RStudio window where we will examine the document.
First, you will notice the YAML header at the top of the page:
If you did not enter your name or the title in the dialog box used to create the file, then by default the YAML header contains just two elements:
The output element is the most critical because it is needed to tell RStudio to transform the default .rmd file into an HTML document.
However, we can also customize this header a bit to look something like this:
These changes provide:
A detailed title so that viewers of your HTML web page document can understand the subject immediately.
An author for the document so that you can be credited for the document and/or contacted about any issues with the document.
A date to give an estimate of how recently the information presented has been curated.
A fancier format for your HTML document. RMarkdown is compatible with quite a few different templates. Andrew Zieffler created an awesome gallery of many of the different templates available that you can view by clicking here. Specifically, the theme used for this code-through is presented in the image and can be accessed by entering the following code in your RStudio Console and then setting your YAML header with the appropriate settings:
Now that we have the basic settings in place, we will add a few details to our document. There will be three primary categories throughout the file: code chunks, sections, and plain text.
You can also view the R Markdown Cheatsheet for assistance with all of these customizations.
Following the YAML header at the top of your document, the next section of the file that you see is what’s known as a code chunk:
The code samples in this code-through that check for installed packages and install packages are all examples of code chunks. These chunks are where you will run your R code in R Markdown, and they are essential to all data analysis.
For this tutorial, we will have several code chunks to do everything from running packages to defining our data set to graphing our choropleth map.
The sections element of the document is relatively self-explanatory and serves as the way that you will “break down” your findings in your HTML document to make your data easy for you to navigate and for your viewers to understand. As shown in the image below, these headers are defined in RMarkdown by using the ‘#’ sign known as a ‘pound sign’, ‘number sign’, or more recently, a ‘hashtag’:
As implied by the name, plain text is the basic text that is written in an R Markdown file to provide details, answers, or any other form of text needed to accompany the code chunks. This text can be stylized with asterisks, HTML code, and other features as mentioned on the R Markdown cheat sheet to enhance its appearance.
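To see how the YAML header, a section header, plain text, and a code chunk fit together, here is a small sketch that writes a skeleton .Rmd file from R. The file name example.Rmd, the title "Untitled", and the chunk contents are placeholders chosen for illustration, not taken from this code-through:

```r
# Three backticks, built with strrep() so the chunk delimiters are easy to spot
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                    # YAML header starts
  'title: "Untitled"',                      # placeholder title (assumption)
  "output: html_document",                  # tells R Markdown to produce HTML
  "---",                                    # YAML header ends
  "",
  "# A section header",                     # one '#' makes a top-level section
  "",
  "Plain text with *italic* and **bold** words.",
  "",
  paste0(fence, "{r}"),                     # code chunk opens
  "summary(cars)",                          # R code that runs when knitting
  fence                                     # code chunk closes
)

# Write the skeleton to a temporary folder so nothing in your project is touched
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)

# To knit it to HTML (requires the rmarkdown package and pandoc), uncomment:
# rmarkdown::render(rmd_path)
```

Opening the resulting file in RStudio and clicking Knit should have the same effect as the commented-out render() call.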
Before we can begin our data analysis, we need a few tools. Most importantly, we need to install the packages we will be using to create our choropleth map:
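The exact package list from the original chunk is not shown, so the list below is inferred from the functions used later in this code-through (load_variables() and get_acs() from tidycensus, the dplyr verbs, spread() from tidyr, pander(), leaflet, and htmltools). Treat it as a sketch:

```r
# Packages inferred from the functions used in this tutorial (an assumption)
pkgs <- c("tidycensus", "dplyr", "tidyr", "pander", "leaflet", "htmltools")

# Work out which of them are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Install only the missing ones from CRAN
if (length(to_install) > 0) {
  install.packages(to_install)
}
```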
Once these packages have been installed, we will add them to our library with the following code chunk:
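A sketch of that chunk, again assuming the package list inferred from the functions used in this tutorial:

```r
library(tidycensus) # load_variables(), get_acs(), census_api_key()
library(dplyr)      # %>%, mutate(), filter(), select(), arrange()
library(tidyr)      # spread()
library(pander)     # pander() for formatted tables
library(leaflet)    # interactive maps
library(htmltools)  # HTML content for the map's info box
```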
Now that we have installed R, RStudio, RMarkdown, and all of our packages, we’re ready to begin our data analysis.
First, we’re going to load the data from the Census Bureau.
In order to do this, you will need to create a Census API Key from the census website:
After submitting the form, you will receive an email with your key, which you will enter into the following code:
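A minimal sketch using tidycensus's census_api_key() function; the string "YOUR_API_KEY_HERE" is a placeholder that you replace with the key from your email:

```r
library(tidycensus)

# Register the key for the current R session.
# Replace the placeholder with your own key.
census_api_key("YOUR_API_KEY_HERE")

# Adding install = TRUE instead stores the key in your .Renviron file
# so that future sessions pick it up automatically.
```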
Once you have set your API key, you can query data from any of the Census Bureau’s available American Community Surveys. For the purposes of this tutorial, we will be accessing data from the ACS’s 5-year estimates for 2017. To preview this data, enter the following code:
# Preview Data
VarPreview <- load_variables(2017, "acs5", cache = TRUE)
# Convert all letters to upper case to make searching easier
VarPreview$concept <- toupper(VarPreview$concept)
# Search for households that have one or more people under 18
ChildHouseholds <- VarPreview %>%
  # Use mutate() and grepl() to flag rows whose concept contains PEOPLE UNDER 18
  mutate(contains.ChildHouseholds = grepl("PEOPLE UNDER 18", concept)) %>%
  filter(contains.ChildHouseholds)
# Preview data
head(ChildHouseholds) %>% pander()
name | label | concept | contains.ChildHouseholds |
---|---|---|---|
B11005_001 | Estimate!!Total | HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE | TRUE |
B11005_002 | Estimate!!Total!!Households with one or more people under 18 years | HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE | TRUE |
B11005_003 | Estimate!!Total!!Households with one or more people under 18 years!!Family households | HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE | TRUE |
B11005_004 | Estimate!!Total!!Households with one or more people under 18 years!!Family households!!Married-couple family | HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE | TRUE |
B11005_005 | Estimate!!Total!!Households with one or more people under 18 years!!Family households!!Other family | HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE | TRUE |
B11005_006 | Estimate!!Total!!Households with one or more people under 18 years!!Family households!!Other family!!Male householder no wife present | HOUSEHOLDS BY PRESENCE OF PEOPLE UNDER 18 YEARS BY HOUSEHOLD TYPE | TRUE |
A review of the data indicates that the variable that lists all households is B11005_001 and the variable that lists all households with one or more people under 18 (children) is B11005_002. Therefore, B11005_002 is the variable that we will map, with B11005_001 serving as the denominator when we calculate percentages.
Next, we will prepare our data set to contain information on households with children under 18 and total households for every state and the District of Columbia (the state-level query also returns Puerto Rico):
# Data Set
dat <- c(ChildHouseholds = "B11005_002", TotalHouseholds = "B11005_001")
CenDF <- get_acs(geography = "state", year = 2017, survey = "acs5",
                 variables = dat, geometry = TRUE)
GEOID | NAME | variable | estimate | moe | geometry |
---|---|---|---|---|---|
01 | Alabama | TotalHouseholds | 1856695 | 5529 | MULTIPOLYGON (((-88.05338 3… |
01 | Alabama | ChildHouseholds | 570218 | 5330 | MULTIPOLYGON (((-88.05338 3… |
02 | Alaska | TotalHouseholds | 252536 | 1271 | MULTIPOLYGON (((-166.5772 5… |
02 | Alaska | ChildHouseholds | 88703 | 1135 | MULTIPOLYGON (((-166.5772 5… |
04 | Arizona | TotalHouseholds | 2482311 | 6429 | MULTIPOLYGON (((-114.8163 3… |
04 | Arizona | ChildHouseholds | 774208 | 5238 | MULTIPOLYGON (((-114.8163 3… |
Now that we have the data we need, we will make a few changes to prepare for mapping. This includes removing the margin of error (moe) from our data set, moving child households and total households into their own columns, and calculating the percentage of households that have children for each state and D.C.:
library(tidyr)
CenDF <- CenDF %>%
  select(-moe) %>%
  spread(variable, estimate) %>% # spread() moves rows into columns
  mutate(PercentageHHWithKids = round((ChildHouseholds / TotalHouseholds) * 100, 2)) %>% # mutate() finds percentage of households with children
  arrange(PercentageHHWithKids) # Orders by percentage
GEOID | NAME | ChildHouseholds | TotalHouseholds | PercentageHHWithKids | geometry |
---|---|---|---|---|---|
11 | District of Columbia | 59794 | 277985 | 21.51 | MULTIPOLYGON (((-77.11976 3… |
23 | Maine | 143583 | 554061 | 25.91 | MULTIPOLYGON (((-68.83144 4… |
50 | Vermont | 67446 | 258535 | 26.09 | MULTIPOLYGON (((-73.43774 4… |
30 | Montana | 114479 | 419975 | 27.26 | MULTIPOLYGON (((-116.0497 4… |
12 | Florida | 2070279 | 7510882 | 27.56 | MULTIPOLYGON (((-80.17628 2… |
54 | West Virginia | 205402 | 737671 | 27.84 | MULTIPOLYGON (((-82.6432 38… |
38 | North Dakota | 87617 | 311525 | 28.13 | MULTIPOLYGON (((-104.0487 4… |
42 | Pennsylvania | 1420567 | 5007442 | 28.37 | MULTIPOLYGON (((-80.51989 4… |
33 | New Hampshire | 150964 | 526710 | 28.66 | MULTIPOLYGON (((-70.61702 4… |
44 | Rhode Island | 119319 | 412028 | 28.96 | MULTIPOLYGON (((-71.28802 4… |
55 | Wisconsin | 677337 | 2328754 | 29.09 | MULTIPOLYGON (((-86.95617 4… |
41 | Oregon | 457331 | 1571631 | 29.10 | MULTIPOLYGON (((-123.5989 4… |
10 | Delaware | 103355 | 352357 | 29.33 | MULTIPOLYGON (((-75.56555 3… |
26 | Michigan | 1142299 | 3888646 | 29.38 | MULTIPOLYGON (((-83.19159 4… |
25 | Massachusetts | 769371 | 2585715 | 29.75 | MULTIPOLYGON (((-70.23405 4… |
39 | Ohio | 1381829 | 4633145 | 29.82 | MULTIPOLYGON (((-82.73571 4… |
19 | Iowa | 376231 | 1251587 | 30.06 | MULTIPOLYGON (((-96.6397 42… |
36 | New York | 2194841 | 7302710 | 30.06 | MULTIPOLYGON (((-72.03683 4… |
56 | Wyoming | 69356 | 230237 | 30.12 | MULTIPOLYGON (((-111.0569 4… |
45 | South Carolina | 565221 | 1871307 | 30.20 | MULTIPOLYGON (((-79.50795 3… |
09 | Connecticut | 411982 | 1361755 | 30.25 | MULTIPOLYGON (((-72.76143 4… |
29 | Missouri | 724127 | 2386203 | 30.35 | MULTIPOLYGON (((-95.77355 4… |
46 | South Dakota | 103097 | 339458 | 30.37 | MULTIPOLYGON (((-104.0577 4… |
72 | Puerto Rico | 375214 | 1222606 | 30.69 | MULTIPOLYGON (((-65.23805 1… |
01 | Alabama | 570218 | 1856695 | 30.71 | MULTIPOLYGON (((-88.05338 3… |
27 | Minnesota | 662900 | 2153202 | 30.79 | MULTIPOLYGON (((-89.59206 4… |
53 | Washington | 852879 | 2755697 | 30.95 | MULTIPOLYGON (((-122.3316 4… |
35 | New Mexico | 238687 | 770435 | 30.98 | MULTIPOLYGON (((-109.0502 3… |
47 | Tennessee | 789370 | 2547194 | 30.99 | MULTIPOLYGON (((-90.3103 35… |
04 | Arizona | 774208 | 2482311 | 31.19 | MULTIPOLYGON (((-114.8163 3… |
32 | Nevada | 329681 | 1052249 | 31.33 | MULTIPOLYGON (((-120.0057 3… |
21 | Kentucky | 540452 | 1724514 | 31.34 | MULTIPOLYGON (((-89.40565 3… |
17 | Illinois | 1511085 | 4818452 | 31.36 | MULTIPOLYGON (((-91.51297 4… |
37 | North Carolina | 1216304 | 3874346 | 31.39 | MULTIPOLYGON (((-75.72681 3… |
08 | Colorado | 657781 | 2082531 | 31.59 | MULTIPOLYGON (((-109.0603 3… |
05 | Arkansas | 362985 | 1147291 | 31.64 | MULTIPOLYGON (((-94.61783 3… |
31 | Nebraska | 237121 | 748405 | 31.68 | MULTIPOLYGON (((-104.0534 4… |
18 | Indiana | 803968 | 2537189 | 31.69 | MULTIPOLYGON (((-88.09776 3… |
22 | Louisiana | 554512 | 1737645 | 31.91 | MULTIPOLYGON (((-88.8677 29… |
20 | Kansas | 360415 | 1121943 | 32.12 | MULTIPOLYGON (((-102.0517 4… |
51 | Virginia | 1002203 | 3105636 | 32.27 | MULTIPOLYGON (((-75.74241 3… |
40 | Oklahoma | 479559 | 1468971 | 32.65 | MULTIPOLYGON (((-103.0026 3… |
15 | Hawaii | 148852 | 455502 | 32.68 | MULTIPOLYGON (((-156.0608 1… |
24 | Maryland | 713897 | 2181093 | 32.73 | MULTIPOLYGON (((-76.05015 3… |
16 | Idaho | 200867 | 609124 | 32.98 | MULTIPOLYGON (((-117.2427 4… |
28 | Mississippi | 368719 | 1103514 | 33.41 | MULTIPOLYGON (((-88.50297 3… |
34 | New Jersey | 1069635 | 3199111 | 33.44 | MULTIPOLYGON (((-75.5591 39… |
13 | Georgia | 1263112 | 3663104 | 34.48 | MULTIPOLYGON (((-81.27939 3… |
02 | Alaska | 88703 | 252536 | 35.12 | MULTIPOLYGON (((-166.5772 5… |
06 | California | 4540623 | 12888128 | 35.23 | MULTIPOLYGON (((-118.6044 3… |
48 | Texas | 3530159 | 9430419 | 37.43 | MULTIPOLYGON (((-94.7183 29… |
49 | Utah | 391666 | 938365 | 41.74 | MULTIPOLYGON (((-114.053 37… |
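To make the reshaping and percentage steps easier to follow, here is a small self-contained sketch that needs no API key. It uses the Alabama and Alaska estimates from the tables above in a plain data frame (no geometry column), and it swaps in tidyr's pivot_wider(), the newer replacement for spread(); the object names toy and toy_wide are made up for this example:

```r
library(dplyr)
library(tidyr)

# Long format: one row per state and variable, as returned by get_acs()
toy <- data.frame(
  NAME     = c("Alabama", "Alabama", "Alaska", "Alaska"),
  variable = c("TotalHouseholds", "ChildHouseholds",
               "TotalHouseholds", "ChildHouseholds"),
  estimate = c(1856695, 570218, 252536, 88703)
)

toy_wide <- toy %>%
  # Move each variable into its own column (same idea as spread())
  pivot_wider(names_from = variable, values_from = estimate) %>%
  # Share of households with children, as a percentage rounded to 2 decimals
  mutate(PercentageHHWithKids = round(ChildHouseholds / TotalHouseholds * 100, 2)) %>%
  # Lowest percentage first
  arrange(PercentageHHWithKids)

toy_wide
```

The two percentages this produces, 30.71 for Alabama and 35.12 for Alaska, match the corresponding rows in the full table.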
Now that we have the tools we need, we can begin creating our map.
Before we plot our data set above, we need to decide on a few critical details: the color scheme we would like for our map, the interactive information we want to display, and our information dialogue box.
As detailed in this guide on Leaflet for R, there are a variety of color schemes and options that can be customized for our mapping needs. In this tutorial, we will use the colorNumeric function and a palette of blue hues. Many leading data visualization specialists such as Cole Nussbaumer Knaflic in her book ‘Storytelling with Data’ recommend creating charts in blue whenever possible because it is an easy-to-view color for individuals with color blindness and the range of hues allows for quick comparison across categories.
Therefore, we will set our palette as Blues from Color Brewer 2 and set our domain to the variable we created above, PercentageHHWithKids.
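The chunk that defines the palette is not shown, so this is a sketch of how leaflet's colorNumeric() is typically used for it. To keep the example self-contained, the domain here is a short vector of percentages taken from the table (named pct for illustration); in the tutorial itself the domain would be CenDF$PercentageHHWithKids:

```r
library(leaflet)

# Stand-in for CenDF$PercentageHHWithKids (three values from the table above)
pct <- c(21.51, 30.71, 41.74)

# colorNumeric() returns a function that maps numbers in the domain
# onto a continuous ramp built from the "Blues" ColorBrewer palette
pal <- colorNumeric(palette = "Blues", domain = pct)

# Calling the palette function yields one hex color per value,
# with larger percentages mapped to darker blues
pal(pct)
```

The name pal matters because the mapping code later refers to it in both addPolygons() and addLegend().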
Now that we have our color scheme set, we need to decide what information we want to display when people click on our interactive map. For our purposes, just two fields will suffice: the name of the state and the percentage of households with children.
We will define these details below and preview the output of our popup:
popup <- paste0("<strong>", CenDF$NAME,
"</strong><br />Percentage of Households with Children: ", CenDF$PercentageHHWithKids, "%")
head(popup)
## [1] "<strong>District of Columbia</strong><br />Percentage of Households with Children: 21.51%"
## [2] "<strong>Maine</strong><br />Percentage of Households with Children: 25.91%"
## [3] "<strong>Vermont</strong><br />Percentage of Households with Children: 26.09%"
## [4] "<strong>Montana</strong><br />Percentage of Households with Children: 27.26%"
## [5] "<strong>Florida</strong><br />Percentage of Households with Children: 27.56%"
## [6] "<strong>West Virginia</strong><br />Percentage of Households with Children: 27.84%"
For our information dialogue box we will use the htmltools package to customize a message:
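The original message is not shown, so the wording below is purely a placeholder. This sketch only illustrates the general pattern of building a small piece of HTML with htmltools and storing it as MapInfo, the name the mapping code passes to addControl():

```r
library(htmltools)

# Build the contents of the info box from HTML tags.
# The text is hypothetical; substitute your own message.
MapInfo <- tags$div(
  tags$strong("Households with Children"),
  tags$br(),
  "Source: 2017 American Community Survey 5-year estimates"
)

# Printing the object shows the HTML that will be placed on the map
MapInfo
```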
Finally, we can put all these steps together to create our map with the following code:
CenDF %>% # Inputs data set
  leaflet(width = "100%") %>% # Interactive mapping package
  addProviderTiles(provider = "CartoDB.Positron") %>% # Changes base map
  setView(-98.483330, 38.712046, zoom = 4) %>% # Zooms into the United States
  addPolygons(popup = ~ popup,
              stroke = FALSE,
              smoothFactor = 0,
              fillOpacity = 0.7,
              color = ~ pal(PercentageHHWithKids)) %>% # Customizes map
  addLegend("bottomright",
            pal = pal,
            values = ~ PercentageHHWithKids,
            title = "Percentages",
            opacity = 1) %>% # Creates legend
  addControl(MapInfo, position = "bottomleft") # Creates info dialogue box
The CenDF line of code pipes our data set into the leaflet() function, which maps our data in an interactive format.
Once this occurs, we then tinker with leaflet’s settings to create our choropleth map. The first thing that we alter is leaflet’s base map. By default, leaflet uses the basic OpenStreetMap displayed below:
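For comparison, a minimal sketch of a map with leaflet's default tiles. Calling addTiles() with no arguments uses the standard OpenStreetMap layer; the object name base_map is made up for this example:

```r
library(leaflet)

# addTiles() with no arguments adds the default OpenStreetMap base layer
base_map <- leaflet(width = "100%") %>%
  addTiles()

# Printing the object renders the map in the RStudio Viewer or a knitted page
base_map
```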
While this map is acceptable for use, altering it to the more customized CartoDB.Positron map gives us a cleaner, less-busy base map:
leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4)
The next line of code zooms our map into the contiguous United States:
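The same setView() call can be written with named arguments, which makes the order explicit: longitude comes first, then latitude. This is only a sketch, and the object name us_view is made up for illustration:

```r
library(leaflet)

us_view <- leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  # Center the map near the middle of the contiguous United States.
  # Larger zoom values zoom in further; 4 shows roughly the whole country.
  setView(lng = -98.483330, lat = 38.712046, zoom = 4)

us_view
```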
Now, we add the details we created above by entering our popup and palette preferences into the addPolygons() function:
addPolygons(popup = ~ popup,
            stroke = FALSE,
            smoothFactor = 0,
            fillOpacity = 0.7,
            color = ~ pal(PercentageHHWithKids))
Last, but not least, we create a legend for our map:
addLegend("bottomright",
          pal = pal,
          values = ~ PercentageHHWithKids,
          title = "Percentages",
          opacity = 1)
And voila!
CenDF %>%
  leaflet(width = "100%") %>%
  addProviderTiles(provider = "CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>%
  addPolygons(popup = ~ popup,
              stroke = FALSE,
              smoothFactor = 0,
              fillOpacity = 0.7,
              color = ~ pal(PercentageHHWithKids)) %>%
  addLegend("bottomright",
            pal = pal,
            values = ~ PercentageHHWithKids,
            title = "Percentages",
            opacity = 1) %>%
  addControl(MapInfo, position = "bottomleft")
We now have an interactive map that displays our census data for each state. Specifically, we can clearly see that Maine is the state with the lowest percentage of households with children (25.91%), while Utah is the state with the highest (41.74%)! The District of Columbia, which is not a state, is lower still at 21.51%.
Congratulations! You have completed this entire code-through of step-by-step instructions and together we have created an interactive choropleth map!
If you would like to explore the Census data further and all of the neat features available with the Leaflet package, feel free to view the resources below:
This video created by the United States Census Bureau provides a lot of insights on the American Community Survey and its many uses for data analysis and policymaking:
This DataGem video from the Census Bureau provides additional details on ways to access the survey data:
American Community Survey: Website from the U.S. Census Bureau that provides details on the survey and access to data.
TidyCensus: Details and tips on how to use the tidycensus R package.
Leaflet for R: Details and tips on how to use the leaflet R package.
TidyCensus and Leaflet Example One: This blog post by Julia Silge was the inspiration for this code-through and provides other great examples of combining the tidycensus and leaflet packages to create interactive choropleths.
TidyCensus and Leaflet Example Two: Similar to this code-through, this tutorial by Kier O’Neil was also based on Julia Silge’s blog post and gives other examples of ways to work with tidycensus and leaflet together.
Interactive Choropleth Maps: This tutorial video and website provide great tips and tricks on ways to create choropleth maps in R. However, this specific tutorial uses tigris to get shapefiles. This is helpful if you’re using a data set that does not already have the shapefiles built in, but it requires a few extra steps beyond using the tidycensus package, which the site details beautifully.