Introduction

As someone who uses Census data for research, I know how challenging it can be to find, download, and clean that data. Although the U.S. Census Bureau recently updated its data portal at http://data.census.gov, it is still not the easiest way to get the data. In addition, files from the site do not come in shapefile format (.shp), a key file type for mapping and spatial analysis.

In this code through, I will demonstrate how to easily access and manipulate Census data in R.


Content Overview

Specifically, we will look at a few R packages (tidycensus, tigris, and sf) that together make R a powerful tool for spatial analysis and map creation. With these packages, we will create a map and a shapefile containing demographic data for the city of Atlanta broken out by Census tract.

library(tidycensus)
library(tigris)
library(sf)
library(tidyverse)
library(ggplot2)


Why You Should Care

While other programs such as ArcGIS, Tableau, or ArcMap allow for mapping and spatial analysis, they require a purchase or a paid subscription. Although R is not typically thought of as a “go to” mapping tool, it has powerful mapping features that are often easier to use than those in other mapping products.


Learning Objectives

In looking at R’s mapping and spatial analysis capabilities, you will learn:

  1. How to pull Census data directly via the Census API,

  2. How to obtain that data and join it with other geographic shapefiles,

  3. How to perform an advanced mapping function that results in a map of Atlanta with Census data, a feature that requires manipulating the map geometry.



Getting Census Data

Created by Kyle Walker, the tidycensus package uses the Census API to access the U.S. Census Bureau’s backend data files. Obtaining a Census API key is simple and can be done at https://api.census.gov/data/key_signup.html. Once you have your key, insert it in place of “YOUR API KEY” in the code below. NOTE: YOU WILL ONLY DO THIS ONCE.

Obtaining your Census API Key

# Replace YOUR API KEY with the key string you received from Census API 
census_api_key("YOUR API KEY", install = TRUE)
readRenviron("~/.Renviron")
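
A quick way to confirm the key was stored is to check the environment variable that tidycensus reads it from. This is a minimal sketch, assuming the key was installed with install = TRUE as above, which saves it to .Renviron under the name CENSUS_API_KEY. The object names and messages are only illustrative.

```r
# Look up the environment variable that census_api_key(install = TRUE) saves to .Renviron
key <- Sys.getenv("CENSUS_API_KEY")

# nzchar() is TRUE only when the variable is set to a non-empty string
key_is_set <- nzchar(key)

if (key_is_set) {
  message("Census API key found.")
} else {
  message("No Census API key found. Run census_api_key() and reload .Renviron.")
}
```

If the key is not found right after installing it, restarting R or rerunning readRenviron("~/.Renviron") is usually enough.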

Downloading Census Data

Once your API key is in place, you can start querying the Census and pulling data directly into R. There are two main functions for this: get_acs() for American Community Survey data and get_decennial() for decennial census data. Variable codes can be found on the Census’s API site, https://www.census.gov/data/developers/data-sets.html, or with the load_variables() function as shown below. To use load_variables(), you need the year of the survey and the survey name.

#This loads the variables for ACS 1 Year estimates for 2009 data. 
acs1_2009_variables <- load_variables(2009, "acs1", cache = TRUE)

head(acs1_2009_variables)
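
The variable table has thousands of rows, so it usually helps to filter it rather than scroll. The sketch below is one way to do that, assuming the table has the name, label, and concept columns that load_variables() returns. It looks up the 2019 5-year ACS and keeps only the rows from table B02001 (race). The object names are made up for illustration.

```r
library(tidycensus)
library(dplyr)
library(stringr)

# Variables for the 2019 5-year ACS (downloaded on first use, then cached)
acs5_2019_variables <- load_variables(2019, "acs5", cache = TRUE)

# Keep only the rows whose code belongs to table B02001
race_variables <- acs5_2019_variables %>%
  filter(str_detect(name, "^B02001_"))

# Show each code next to its description
race_variables %>%
  select(name, label) %>%
  print(n = 10)
```

The same pattern works on the label or concept columns if you know a keyword but not the table number.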

Remember that every survey and year has its own variable codes, so be sure to look them up before you run a query, or you may not get the results you are looking for.

Now let’s look more closely at the syntax of a query. This code chunk gets tract-level data for Fulton County on four key race and ethnicity variables for 2019. Because get_acs() defaults to the 5-year ACS, year = 2019 here refers to the 2015-2019 estimates. Unfortunately, tidycensus doesn’t allow for obtaining census tract data for places and cities, so we will need to get county data and then join that with a city of Atlanta shapefile.

fulton_tracts <- get_acs(geography = "tract",
                         variables = c(white = "B02001_002", #A named vector renames the variables in the output
                                       black = "B02001_003",
                                       asian = "B02001_005",
                                       hispanic = "B03001_003"),
                         year = 2019,
                         state = "Georgia",
                         county = "Fulton",
                         output = "wide", #Gives each variable its own columns. The default is "tidy", which collapses all variables into one column.
                         geometry = TRUE, #If you want a shapefile keep this as TRUE. If you only want data, then make it FALSE.
                         cache_table = TRUE) #Caches table names for faster future access

#Here is the output on a map.
ggplot(fulton_tracts) +
  geom_sf() +
  theme_void()

The map above shows all census tracts within Fulton County. A simple ggplot command lets us see what that looks like for a particular demographic variable, for example the white population of the county.

ggplot(fulton_tracts) +
  geom_sf(aes(fill = whiteE)) +
  theme_void()
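
The same fill mapping can be dressed up with a color scale, a title, and a legend label. The sketch below shows one possible way to do that. It uses the North Carolina sample shapefile that ships with sf as a stand-in for fulton_tracts so that it runs without an API key. The BIR74 column, the title, and the styling choices are assumptions for illustration, not part of the Atlanta workflow.

```r
library(sf)
library(ggplot2)

# Sample shapefile that ships with sf, used here as a stand-in for fulton_tracts
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# BIR74 (a birth count column in the sample data) plays the role of whiteE
map_plot <- ggplot(nc) +
  geom_sf(aes(fill = BIR74), color = "white") +
  scale_fill_viridis_c(labels = scales::comma) +
  labs(title = "Births by county, North Carolina",
       fill = "Births") +
  theme_void()

map_plot
```

Swapping nc for fulton_tracts and BIR74 for whiteE should give the equivalent map for Fulton County.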

Getting Atlanta Shapefile

To get data for the city of Atlanta, we first need a shapefile for it. The tigris package provides base shapefiles for cities, counties, and other Census geographies.

atlanta <- places("GA") %>% 
  filter(NAME == "Atlanta") #be sure to use the filter function as the first operation gets places for all of Georgia


plot(st_geometry(atlanta)) #st_geometry() extracts the geometry column regardless of what it is named

Now we have the city of Atlanta shapefile, but we still need census data for the city. Since the Fulton County tracts carry that data, we can use the sf function st_intersection() to clip the Fulton County shapefile to the city of Atlanta border.
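
To see what the clip does in isolation, here is a toy sketch with made-up geometry: two square "tracts" and a square "city" that overlaps half of each. The make_square() helper and all object names are hypothetical and only serve the illustration.

```r
library(sf)

# Hypothetical helper: build a square polygon from its lower-left corner and side length
make_square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two adjacent toy "tracts" and one toy "city" that straddles their shared edge
tracts <- st_sf(
  tract_id = c("A", "B"),
  geometry = st_sfc(make_square(0, 0, 2), make_square(2, 0, 2))
)
city <- st_sf(
  name = "Toy City",
  geometry = st_sfc(make_square(1, 0, 2))
)

# Declare attributes as constant over each polygon to avoid the usual st_intersection() warning
st_agr(tracts) <- "constant"
st_agr(city) <- "constant"

# Keep only the part of each tract that falls inside the city
clipped <- st_intersection(tracts, city)
clipped$area <- as.numeric(st_area(clipped))
print(clipped)
```

Each output row carries the attributes of both inputs, copied unchanged. That means a tract that is only partly inside the city keeps its full estimate after clipping, which is worth keeping in mind when reading the clipped map.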

atlanta_tracts <- st_intersection(fulton_tracts, atlanta)

plot(atlanta_tracts)

As we can see in the map above, we now have a map of Atlanta (minus the DeKalb County portion) along with several demographic variables for the city. If desired, you can export the newly created Atlanta shapefile from R for use in other programs, as shown below.

st_write(atlanta_tracts, "atlanta_race.shp")
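
It can be worth reading an exported file back in to confirm it was written as expected. The sketch below does such a round trip, again using the North Carolina sample shapefile from sf as a stand-in for atlanta_tracts. The file name and the temporary folder are assumptions made so the example leaves nothing behind in your working directory.

```r
library(sf)

# Sample shapefile that ships with sf, standing in for atlanta_tracts
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Write to a temporary folder; delete_dsn = TRUE replaces any earlier copy
out_path <- file.path(tempdir(), "nc_copy.shp")
st_write(nc, out_path, delete_dsn = TRUE, quiet = TRUE)

# Read the file back and check that every feature survived the trip
nc_copy <- st_read(out_path, quiet = TRUE)
same_rows <- nrow(nc_copy) == nrow(nc)
print(same_rows)
```

Shapefile field names are limited to 10 characters, so longer column names get abbreviated on export. If that becomes a problem, writing to a GeoPackage instead (a file name ending in .gpkg) is one alternative.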

Conclusion

We have now created a map of Atlanta with a range of demographic variables. If you want to look at other variables or use other geographies such as census blocks, you can use the same code as above (with the appropriate changes). If you want to learn how to make better map visuals, check out the further resources below.


Further Resources

Learn more about the tidycensus, tigris, and sf packages from the following:




Works Cited

This code through references and cites the following sources:


  • Walker, Kyle (2022). Analyzing US Census Data: Methods, Maps, and Models in R.

  • Walker, Kyle (2016). “tigris: An R Package to Access and Work with Geographic Data from the US Census Bureau.”