https://rpubs.com/drkblake/EasyMappingInR
Making shareable data-driven maps can be surprisingly easy with the R programming language and RStudio programming environment. You can install both for free by visiting:
https://posit.co/download/rstudio-desktop/
… and the first part of this tutorial from Andrew Gard’s excellent Equitable Equations YouTube channel will show you how to install, set up, and begin using R:
https://youtu.be/yZ0bV2Afkjc?si=4Z83Hd-BsTNDgoJP
This tutorial begins with five ready-to-copy R scripts, then includes annotated output from each one:
Script 1: County-level broadband access in Tennessee. Mapping broadband access in each Tennessee county. See Script 1 output and the Selected variables list for some often-used variables.
Script 2: County-level broadband access in the Nashville MSA. Filtering the above map for counties in the Nashville Metropolitan Statistical Area. Script 2 output.
Script 3: County-subdivision-level broadband access in the Nashville MSA. The above map, but redrawn with county subivision borders and data. Script 3 output.
Script 4: County-subdivision-level broadband access in the New York City area. The above map, but redrawn for the New York City area, to show the ease of switching to other U.S. localities. Script 4 output.
Script 5: Mapping external “news deserts” data for counties in Middle Tennessee. A county-level map of “news deserts” in Middle Tennessee, to show the ease of mapping data from sources other than the American Community Survey. Script 5 output.
All scripts require a Census API key, which you can obtain immediately and for free here:
https://api.census.gov/data/key_signup.html
To use your Census API key in a script, find the line that reads:
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
… delete the #
at the start of the line, and replace
PasteYourAPIKeyBetweenTheseQuoteMarks
with your API key. Be
sure to keep the quote marks. Without both modifications, the script
will not work.
All scripts use the tidyverse package to pull data from the 2022
five-year American Community Survey, presently the latest-available
five-year ACS dataset. In any script, a different year can be selected
by changing 2022
in year = 2022
to the
preferred year. Data are available for 2005 and later.
A map of all counties in Tennessee and each one’s percentage of households with broadband internet.
##################################################################
# County level, U.S.
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("County", "State"))
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
Same as script 1, but filtered for just the counties in the Nashville Metropolitan Statistical area.
##################################################################
# County level, Nashville MSA
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Cannon County"|
County == "Cheatham County"|
County == "Davidson County"|
County == "Dickson County"|
County == "Hickman County"|
County == "Macon County"|
County == "Maury County"|
County == "Robertson County"|
County == "Rutherford County"|
County == "Smith County"|
County == "Sumner County"|
County == "Trousdale County"|
County == "Williamson County"|
County == "Wilson County")
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
Same as Script 2, but redrawn with county-subdivision-level boundaries and data.
##################################################################
#County subdivision level, Nashville MSA
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county subdivision",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("District", "County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Cannon County"|
County == "Cheatham County"|
County == "Davidson County"|
County == "Dickson County"|
County == "Hickman County"|
County == "Macon County"|
County == "Maury County"|
County == "Robertson County"|
County == "Rutherford County"|
County == "Smith County"|
County == "Sumner County"|
County == "Trousdale County"|
County == "Williamson County"|
County == "Wilson County")
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
Like Script 3, but for the New York City area. See how easy it is to change geographic areas?
##################################################################
# County Subdivision level, New York City area
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county subdivision",
state = "NY",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("District", "County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Suffolk County"|
County == "Nassau County"|
County == "Queens County"|
County == "Kings County"|
County == "Richmond County"|
County == "New York County"|
County == "Bronx County"|
County == "Westchester County"|
County == "Rockland County")
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
What if you want to map data from a source other than the ACS? It’s easier than you might think.
##################################################################
# County level, Middle Tennessee, merged with external data
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# News outlet count by Middle Tennessee County
# Compiled from: https://localnewsinitiative.northwestern.edu/
NewsOutlets <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/NewsOutlets.csv")
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP05_0089") # DP05_0089 = Voting-age population
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Bedford County"|
County == "Cannon County"|
County == "Cheatham County"|
County == "Clay County"|
County == "Coffee County"|
County == "Davidson County"|
County == "DeKalb County"|
County == "Dickson County"|
County == "Fentress County"|
County == "Franklin County"|
County == "Giles County"|
County == "Grundy County"|
County == "Hickman County"|
County == "Houston County"|
County == "Humphreys County"|
County == "Jackson County"|
County == "Lawrence County"|
County == "Lewis County"|
County == "Lincoln County"|
County == "Macon County"|
County == "Marshall County"|
County == "Maury County"|
County == "Montgomery County"|
County == "Moore County"|
County == "Overton County"|
County == "Perry County"|
County == "Pickett County"|
County == "Putnam County"|
County == "Robertson County"|
County == "Rutherford County"|
County == "Sequatchie County"|
County == "Smith County"|
County == "Stewart County"|
County == "Sumner County"|
County == "Trousdale County"|
County == "Van Buren County"|
County == "Warren County"|
County == "Wayne County"|
County == "White County"|
County == "Williamson County"|
County == "Wilson County")
# Merging data
mydata <- left_join(mydata,
NewsOutlets,
by = join_by(County == County))
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Type",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
Like all scripts in this tutorial, this one begins by installing, if needed, and loading several R packages that are required for the script to work. Packages are free add-ons that expand the functionality of R’s basic programming language.
##################################################################
# County level, U.S.
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
Next, the script allows you to specify your Census API key and transmit it to the Census API for verification. To get the code to work, you must find the line that reads:
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
… and edit it by deleting the #
and replacing
PasteYourAPIKeyBetweenTheseQuoteMarks
with your Census API
key. You can get an API key immediately and for free by visiting https://api.census.gov/data/key_signup.html.
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
This section of the code will produce a data frame called
All_ACS_Variables
that contains the name and label of every
ACS variable available for the year (here, 2022
) and
dataset (here, acs5
) specified. Open and examine
All_ACS_Variables
to select a variable you want to map.
The example will map the 2022 five-year ACS’s DP02_0154P
variable, which estimates the percentage of households that have a
broadband internet connection of any kind. If you want to work with a
different variable, replace DP02_0154P
with the variable’s
name.
General-interest variables in the 2022 five-year dataset include:
B01001_001: Total population.
B19013_001: Median household income in the past 12 months.
DP03_0074P: Percentage of households receiving SNAP benefits in last 12 months.
DP02_0068P: Percent of residents age 25 and older with a bachelor’s degree or higher.
B23013_001: Median age, in years. Search SEX AND AGE!!Total population!! for a variety of age category counts.
DP05_0037: Percent white. Search RACE!!Total population!! for a range of race counts and percentages.
DP05_0002P: Percent male. Search SEX AND AGE!!Total population!! for counts and percentages for a range of sex/age combinations.
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
A couple of things can be modified in this next part of the script:
You can get data for a different state by finding
state = "TN"
and changing TN
to the two-letter
postal abbreviation for the state you want.
As suggested above, you also can get data for a different year by
finding year = 2022
and changing 2022
to the
year you want.
In most cases, the mydata
data frame will contain two
values associated with the variable you requested.
Estimate_E
will be the estimate you asked for, like the
estimated percentage of households with a broadband connection.
Estimate_M
will be the 90-percent margin of error
associated with the estimate. Eventually, the script renames these to
Estimate
and Estimate_MOE
, respectively.
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
## Getting data from the 2018-2022 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
## | | | 0% | | | 1% | |= | 1% | |= | 2% | |== | 2% | |== | 3% | |=== | 4% | |=== | 5% | |==== | 6% | |===== | 6% | |===== | 7% | |===== | 8% | |====== | 8% | |====== | 9% | |======= | 9% | |======= | 10% | |======= | 11% | |======== | 11% | |======== | 12% | |========= | 12% | |========= | 13% | |========== | 14% | |========== | 15% | |=========== | 15% | |=========== | 16% | |============ | 17% | |============ | 18% | |============= | 18% | |============= | 19% | |============== | 20% | |============== | 21% | |=============== | 21% | |=============== | 22% | |================ | 22% | |================ | 23% | |================ | 24% | |================= | 24% | |================= | 25% | |================== | 25% | |================== | 26% | |=================== | 27% | |=================== | 28% | |==================== | 28% | |==================== | 29% | |===================== | 30% | |===================== | 31% | |====================== | 31% | |====================== | 32% | |======================= | 32% | |======================= | 33% | |======================== | 34% | |======================== | 35% | |========================= | 35% | |========================= | 36% | |========================== | 37% | |========================== | 38% | |=========================== | 38% | |=========================== | 39% | |============================ | 39% | |============================ | 40% | |============================= | 41% | |============================= | 42% | |============================== | 42% | |============================== | 43% | |=================================== | 49% | |=================================== | 50% | |==================================== | 51% | |==================================== | 52% | |===================================== | 52% | |===================================== | 53% | |====================================== | 54% | |====================================== | 55% | |======================================= | 56% | |======================================== | 57% | |======================================== | 58% | |========================================= | 58% | |========================================= | 59% | |========================================== | 59% | |========================================== | 60% | |========================================== | 61% | |=========================================== | 61% | |=========================================== | 62% | |============================================ | 62% | |============================================ | 63% | |============================================ | 64% | |============================================= | 64% | |============================================= | 65% | |============================================== | 65% | |============================================== | 66% | |=============================================== | 67% | |=============================================== | 68% | |================================================ | 68% | |================================================ | 69% | |================================================= | 70% | |================================================== | 71% | |================================================== | 72% | |=================================================== | 72% | |=================================================== | 73% | |==================================================== | 74% | |==================================================== | 75% | |===================================================== | 75% | |===================================================== | 76% | |====================================================== | 77% | |====================================================== | 78% | |======================================================= | 78% | |======================================================= | 79% | |======================================================== | 79% | |======================================================== | 80% | |======================================================== | 81% | |========================================================= | 81% | |========================================================= | 82% | |========================================================== | 82% | |========================================================== | 83% | |========================================================== | 84% | |=========================================================== | 84% | |=========================================================== | 85% | |============================================================ | 85% | |============================================================ | 86% | |============================================================= | 87% | |============================================================= | 88% | |============================================================== | 88% | |============================================================== | 89% | |=============================================================== | 89% | |=============================================================== | 90% | |=============================================================== | 91% | |================================================================ | 91% | |================================================================ | 92% | |================================================================= | 92% | |================================================================= | 93% | |================================================================== | 94% | |================================================================== | 95% | |=================================================================== | 95% | |=================================================================== | 96% | |==================================================================== | 97% | |===================================================================== | 98% | |===================================================================== | 99% | |======================================================================| 100%
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("County", "State"))
The part of the script is responsible for producing the map. The line
reading layer.name = "Estimate"
specifies “Estimate” as the
label that will appear above the may’s legend. I figured “Estimate”
would be generic enough to suit most mapping purposes. But you can
change “Estimate” to something else if you want a different legend
title.
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
You might want a copy of your data that is suitable for working with
in a spreadsheet program, like Microsoft Excel. This code strips the
mapping data out of the file and saves the rest in a comma-separated
value file called mydata.csv
. You will find the data file
on your computer, in the same directory as your R script file.
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
This adaptation of the script adds the following code, which filters
the mydata data frame for just the counties that make up the Nashville
Metropolitan Statistical Area. The code between
# Filtering data
and # Mapping data
handles
the filtering. You may edit the list to include any Tennessee counties
you want.
##################################################################
# County level, Nashville MSA
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Cannon County"|
County == "Cheatham County"|
County == "Davidson County"|
County == "Dickson County"|
County == "Hickman County"|
County == "Macon County"|
County == "Maury County"|
County == "Robertson County"|
County == "Rutherford County"|
County == "Smith County"|
County == "Sumner County"|
County == "Trousdale County"|
County == "Williamson County"|
County == "Wilson County")
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
This script is identical to Script 2, except that it retrieves data for county subdivisions, rather than whole counties. Many geographic levels are available. For a list, see “Geography in tidycensus” at https://walker-data.com/tidycensus/articles/basic-usage.html.
Note that this line of code, from Script 2:
names = c("County", "State"))
… has been altered to read:
names = c("District", "County", "State"))
… because, in county-subdivision-level data, the NAME
variable combines three components - the district name, the county name,
and the state name - rather than just the county name and state name.
Accordingly, the code has to split NAME
into three columns,
rather than two.
##################################################################
#County subdivision level, Nashville MSA
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county subdivision",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
## | | | 0% | |== | 2% | |==== | 5% | |======= | 9% | |======== | 11% | |============ | 18% | |============== | 19% | |=============== | 21% | |===================== | 29% | |======================== | 34% | |========================= | 36% | |=========================== | 39% | |============================= | 42% | |================================ | 46% | |================================== | 49% | |===================================== | 53% | |======================================= | 56% | |======================================== | 57% | |======================================================================| 100%
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("District", "County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Cannon County"|
County == "Cheatham County"|
County == "Davidson County"|
County == "Dickson County"|
County == "Hickman County"|
County == "Macon County"|
County == "Maury County"|
County == "Robertson County"|
County == "Rutherford County"|
County == "Smith County"|
County == "Sumner County"|
County == "Trousdale County"|
County == "Williamson County"|
County == "Wilson County")
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
The point, here, is to show that Script 3 can easily be modified to map data for some other part of the U.S., like the New York City area. All that’s needed is to change:
state = "TN"
… to …
state = "NY"
… and edit the county list to reflect the desired New York counties.
##################################################################
# County Subdivision level, New York City area
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP02_0154P")
# Fetching data
mydata <- get_acs(
geography = "county subdivision",
state = "NY",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
## | | | 0% | |=== | 4% | |====== | 9% | |=========== | 16% | |================ | 23% | |====================== | 31% | |========================== | 37% | |============================== | 43% | |=================================== | 50% | |======================================== | 57% | |============================================= | 64% | |================================================= | 71% | |========================================================= | 82% | |============================================================== | 88% | |=================================================================== | 96% | |======================================================================| 100%
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("District", "County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Suffolk County"|
County == "Nassau County"|
County == "Queens County"|
County == "Kings County"|
County == "Richmond County"|
County == "New York County"|
County == "Bronx County"|
County == "Westchester County"|
County == "Rockland County")
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
If you want to map data that is not included in the American Community Survey, you can use the ACS to get the geographic boundaries for your map, merge the map data with your external data, then make the map.
This version of the script accesses my GitHub page to retrieve
NewsOutlets.csv, a comma-separated value file containing the number of
news outlets in each Middle Tennessee county, according to the 2023
database at Northwestern University, https://localnewsinitiative.northwestern.edu/projects/state-of-local-news/.
It then downloads voting-age-population estimates for each county from
the ACS and merges the two data files by matching them on the “County”
variable that they have in common. See the code between
# Merging data
and # Mapping data
for a look
at how it’s done.
The merged file contains a variable called Type
that
identifies each county as having between “0 outlets” and “3 or more
outlets.” The mapping portion of the code specifies this
Type
variable as the basis for the mapping key by plugging
it into the zcol =
argument in place of the
Estimate
variable:
mapview(mapdata, zcol = "Type"
##################################################################
# County level, Middle Tennessee, merged with external data
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidycensus")) install.packages("tidycensus")
if (!require("sf")) install.packages("sf")
if (!require("mapview")) install.packages("mapview")
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# News outlet count by Middle Tennessee County
# Compiled from: https://localnewsinitiative.northwestern.edu/
NewsOutlets <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/NewsOutlets.csv")
# Transmitting API key
# census_api_key("PasteYourAPIKeyBetweenTheseQuoteMarks")
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm (DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
VariableList =
c(Estimate_ = "DP05_0089") # DP05_0089 = Voting-age population
# Fetching data
mydata <- get_acs(
geography = "county",
state = "TN",
variables = VariableList,
year = 2022,
survey = "acs5",
output = "wide",
geometry = TRUE)
# Reformatting data
mydata <-
separate_wider_delim(mydata,
NAME,
delim = ", ",
names = c("County", "State"))
# Filtering data
mydata <- mydata %>%
filter(County == "Bedford County"|
County == "Cannon County"|
County == "Cheatham County"|
County == "Clay County"|
County == "Coffee County"|
County == "Davidson County"|
County == "DeKalb County"|
County == "Dickson County"|
County == "Fentress County"|
County == "Franklin County"|
County == "Giles County"|
County == "Grundy County"|
County == "Hickman County"|
County == "Houston County"|
County == "Humphreys County"|
County == "Jackson County"|
County == "Lawrence County"|
County == "Lewis County"|
County == "Lincoln County"|
County == "Macon County"|
County == "Marshall County"|
County == "Maury County"|
County == "Montgomery County"|
County == "Moore County"|
County == "Overton County"|
County == "Perry County"|
County == "Pickett County"|
County == "Putnam County"|
County == "Robertson County"|
County == "Rutherford County"|
County == "Sequatchie County"|
County == "Smith County"|
County == "Stewart County"|
County == "Sumner County"|
County == "Trousdale County"|
County == "Van Buren County"|
County == "Warren County"|
County == "Wayne County"|
County == "White County"|
County == "Williamson County"|
County == "Wilson County")
# Merging data
mydata <- left_join(mydata,
NewsOutlets,
by = join_by(County == County))
# Mapping data
mapdata <- mydata %>%
rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Type",
layer.name = "Estimate",
popup = TRUE)
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)