This tutorial assumes that you already know how to set up and run
RStudio and that you have a very basic understanding of the R language.
Specifically, it assumes you know how to assign an object using the
assignment operator <- and how to install packages
and load them into your working environment.
One quick tip as we move forward: instead of separately typing the
< and - each time, you can use the hotkey
Alt+- to generate the assignment operator
<-. Similarly, you can use Ctrl+Shift+M to
generate the %>% “pipe” operator (assuming you are on a PC; on a Mac, the shortcuts are Option+- and Cmd+Shift+M).
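As a quick refresher, here is a minimal sketch of both operators in use. The object names (scores, rounded_mean) are made up for illustration, and the example assumes the dplyr package is installed, since it is one of the packages that supplies %>%.

```r
# dplyr (part of the tidyverse) makes the %>% pipe available
library(dplyr)

# Assign a small numeric vector to an object with <-
scores <- c(82, 91, 77)

# Pipe the vector through mean() and then round() to one decimal place
rounded_mean <- scores %>%
  mean() %>%
  round(1)

print(rounded_mean)
```

The pipe passes the result on its left as the first argument of the function on its right, so the last statement is equivalent to round(mean(scores), 1).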
Now that we have that out of the way, let’s have some fun working with ACS Census data. This is a very basic overview of the process using the Tidyverse and Tidycensus packages. For detailed descriptions and tutorials for those packages, you can use the following resources:
R for Data Science – https://r4ds.had.co.nz/
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse – https://moderndive.com/
Analyzing US Census Data: Methods, Maps, and Models in R – https://walker-data.com/census-r/index.html
To start, you will need to load four packages. The tidyverse is a collection of packages that is very useful for sorting and cleaning data. tidycensus provides a number of functions that we will use to pull information from the ACS Census files. readxl allows us to read Excel files and includes functions for pulling data from specific sheets. ggplot2 is used for plotting; it is already loaded as part of the tidyverse, so loading it separately is optional.
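# Load the packages used in this tutorial
library(tidyverse)
library(tidycensus)
library(readxl)
library(ggplot2)  # already attached by tidyverse; kept here to be explicit
# Cache downloaded Census shapefiles so they are not re-downloaded on every run
options(tigris_use_cache = TRUE)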
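To illustrate what pulling from different sheets looks like, here is a short sketch using the example workbook that ships with readxl. The workbook (datasets.xlsx) and its sheet names come from the readxl package, not from this tutorial's data, so treat it as a stand-in for your own Excel file.

```r
library(readxl)

# Path to an example workbook bundled with readxl (a stand-in for your own file)
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheet_names <- excel_sheets(path)
print(sheet_names)

# Read one sheet by name; a sheet's position number also works
mtcars_sheet <- read_excel(path, sheet = "mtcars")
head(mtcars_sheet)
```

With your own workbook, you would replace path with the file's location and "mtcars" with the sheet you want.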
In order to access Census data using R, you will need to acquire an API key from the census website. API keys can be obtained at https://api.census.gov/data/key_signup.html. After you’ve signed up for an API key, be sure to activate the key from the email you receive from the Census Bureau so it works correctly.
Once you have done that, you will need to enter the
census_api_key() command near the top of your R script.
You can include install = TRUE as a part of the call (note the capitalization; R is case-sensitive)
so that the key is saved to your .Renviron file and you don’t need that line in future scripts, but I prefer to
have the full call at the beginning of each project just in case I
update R or the system crashes and the stored key is
lost.
# Load Census API Key
census_api_key("YOUR KEY GOES HERE")
## If you want to install the key for convenience, use the code below.
# census_api_key("YOUR KEY GOES HERE", install = TRUE)
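If you would rather not type the key into a script that you might share, one option is to read it from an environment variable. The sketch below assumes the key is stored under the name CENSUS_API_KEY, which is the variable tidycensus writes to .Renviron when install = TRUE is used. The fallback message is only an illustration.

```r
# Read the key from the environment instead of hard-coding it in the script.
# CENSUS_API_KEY is the variable name tidycensus uses when install = TRUE.
key <- Sys.getenv("CENSUS_API_KEY")

if (nzchar(key)) {
  # Register the key for the current session
  tidycensus::census_api_key(key)
} else {
  message("No CENSUS_API_KEY found. Add one to ~/.Renviron or call census_api_key() directly.")
}
```

This keeps the key out of the script itself, which matters if the script ends up in a shared folder or a public repository.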
I find that creating objects for the geographic areas you want data for early in the process makes existing scripts much easier to reuse. Instead of writing out the state, county, etc. for each call, you can store them in objects at the top of the script and update them in one place. Of course, if you are pulling data for multiple geographic areas in a single script, this isn’t as useful.
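A minimal sketch of that pattern follows. The object names (my_state, my_county, my_year), the example geography, and the choice of variable B19013_001 (median household income) are illustrative assumptions, not values from this tutorial. The API call is wrapped in a check so that it runs only when tidycensus is installed and a key is available.

```r
# Geography objects defined once at the top of the script (example values)
my_state  <- "MI"
my_county <- "Washtenaw"
my_year   <- 2021

# Query the API only if tidycensus is installed and a key has been set
if (requireNamespace("tidycensus", quietly = TRUE) &&
    nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  # Tract-level median household income from the 5-year ACS
  median_income <- tidycensus::get_acs(
    geography = "tract",
    variables = "B19013_001",
    state     = my_state,
    county    = my_county,
    year      = my_year,
    survey    = "acs5"
  )
  print(head(median_income))
}
```

To rerun the script for a different area, you would change only the three objects at the top; every later call that refers to them picks up the new values.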