If you want more in depth info, please go to [https://walker-data.com/tidycensus/]
If you are at the office or on your VPN, you will need to load the ‘curl’ package and run these three lines of code to get around the proxy server and be able to use the census API
#function to get around proxy server for tidycensus
library(curl)
## Using libcurl 7.64.1 with Schannel
companyproxy <- curl::ie_proxy_info()$Proxy
Sys.setenv(http_proxy=companyproxy)
Sys.setenv(https_proxy=companyproxy)
Tidyverse and Tidycensus are the two packages required to pull census data in R. The ‘srvyr’ package is useful to make estimations with weighted PUMS data, but not required if you’re using Census or ACS data– because there are no weights for those.
#load packages
library(tidycensus)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.5 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x readr::parse_date() masks curl::parse_date()
library(srvyr)
##
## Attaching package: 'srvyr'
## The following object is masked from 'package:stats':
##
## filter
The following line of code creates a table of variable information for the specified survey. The load_variables() function takes the year and dataset as its arguments. In the following code block I am looking at variables for the 2015-2019 5 year ACS.
vars <- load_variables(2019, "acs5")
vars
## # A tibble: 27,040 x 3
## name label concept
## <chr> <chr> <chr>
## 1 B01001_001 Estimate!!Total: SEX BY AGE
## 2 B01001_002 Estimate!!Total:!!Male: SEX BY AGE
## 3 B01001_003 Estimate!!Total:!!Male:!!Under 5 years SEX BY AGE
## 4 B01001_004 Estimate!!Total:!!Male:!!5 to 9 years SEX BY AGE
## 5 B01001_005 Estimate!!Total:!!Male:!!10 to 14 years SEX BY AGE
## 6 B01001_006 Estimate!!Total:!!Male:!!15 to 17 years SEX BY AGE
## 7 B01001_007 Estimate!!Total:!!Male:!!18 and 19 years SEX BY AGE
## 8 B01001_008 Estimate!!Total:!!Male:!!20 years SEX BY AGE
## 9 B01001_009 Estimate!!Total:!!Male:!!21 years SEX BY AGE
## 10 B01001_010 Estimate!!Total:!!Male:!!22 to 24 years SEX BY AGE
## # ... with 27,030 more rows
The get_acs() function is where you are able to load a dataset into R. Most of the time, the required arguments are geography, state, variables, year, survey, and key (which is your Census API key). Go to this website to get a free API key from the Census Bureau. The following API call gets county level information for Pennsylvania from the 2019 5 year ACS. The variables (whos codes I found from the above table.) show the total population, population that lives in owner-occupied units, and population that lives in owner occupied units and drives to work.
census_api_key("82bbcc6711ff4a4923edffb104d47daaa8d191cd", overwrite = TRUE, install = TRUE)
## Your original .Renviron will be backed up and stored in your R HOME directory if needed.
## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY").
## To use now, restart R or run `readRenviron("~/.Renviron")`
## [1] "82bbcc6711ff4a4923edffb104d47daaa8d191cd"
drive2work <- get_acs(geography = "county",
state = "PA",
variables = c("B01001_001", "B08537_002", "B08537_005"), #population, owner occupied, owner occupied that drove to work
year = 2019,
survey = "acs5",
key = "82bbcc6711ff4a4923edffb104d47daaa8d191cd")
## Getting data from the 2015-2019 5-year ACS
drive2work
## # A tibble: 201 x 5
## GEOID NAME variable estimate moe
## <chr> <chr> <chr> <dbl> <dbl>
## 1 42001 Adams County, Pennsylvania B01001_001 102470 NA
## 2 42001 Adams County, Pennsylvania B08537_002 26104 879
## 3 42001 Adams County, Pennsylvania B08537_005 21672 816
## 4 42003 Allegheny County, Pennsylvania B01001_001 1221744 NA
## 5 42003 Allegheny County, Pennsylvania B08537_002 502386 4412
## 6 42003 Allegheny County, Pennsylvania B08537_005 393627 3703
## 7 42005 Armstrong County, Pennsylvania B01001_001 65867 NA
## 8 42005 Armstrong County, Pennsylvania B08537_002 14523 568
## 9 42005 Armstrong County, Pennsylvania B08537_005 12093 532
## 10 42007 Beaver County, Pennsylvania B01001_001 165833 NA
## # ... with 191 more rows
The following block of code includes some data cleaning/manipulation, If you can see from the above table, there is a variable column with all of our variables. In order to make it easier to work with this data, we will use the pivot_wider() function. It takes the names from the variable column and makes them their own columns, and then uses the values from the estimate column and puts them under the correct variable. Then I filtered for some specific counties to make the upcoming graph easier to read.
Note: the first step in this section is unnecessary if you include output = "wider" in the get_acs() call
#pivoting data to make it clean and filtering for Philadelphia and aurrounding counties
padriving <- drive2work %>% pivot_wider(names_from = variable, values_from = c(estimate, moe)) %>% filter(NAME %in% c("Philadelphia County, Pennsylvania", "Bucks County, Pennsylvania", "Montgomery County, Pennsylvania", "Chester County, Pennsylvania", "Delaware County, Pennsylvania"))
You can also create new variables from the existing variables. In these lines of code I am dividing the number of people who both live in an owner occupied unit and drive to work by the amount of people who live in an owner occupied unit and then the population as a whole.
#percent of people who live in an owner occupied home, who drive to work
padriving$pctowndrive <- padriving$estimate_B08537_005 / padriving$estimate_B08537_002
#percent of the total population that both lives in an owner occupied home and drives to work
padriving$pctdrive <- padriving$estimate_B08537_005 / padriving$estimate_B01001_001
Next we can use ggplot to visualize the difference between the counties that we selected
#bar graph
ggplot(padriving, aes(y = NAME, x = pctowndrive)) +
geom_col(fill = "#9400c6") +
ylab("") +
theme(plot.title = element_text(hjust= 0, face="bold", size = 15),
plot.background = element_rect(fill="#f7f7f7"),
legend.title = element_blank(),
legend.background = element_rect(fill = "#f7f7f7", color = NA),
panel.background=element_rect(fill = "#f7f7f7", color = NA),
panel.grid.minor=element_blank(),
panel.grid.major.y=element_blank(),
panel.grid.major.x=element_line(color = "#cfcfcf"),
axis.ticks=element_blank(),
plot.caption = element_text(hjust = -0.7, size = 10),
legend.position="top",
panel.border=element_blank())
If you want to make a quick map, its relatively easy with Tidycensus. in the get_acs() function just specify geometry = TRUE and a geometry column will be added to make your data mappable. In the following code, I loaded the number of renters per census tract in Philadelphia from the 2015-2019 ACS. Then you can use ggplot to create a map.
renters <- get_acs(geography = "tract",
state = "PA",
county = "Philadelphia",
variables = "B25008_003",
year = 2019,
survey = "acs5",
key = "82bbcc6711ff4a4923edffb104d47daaa8d191cd",
geometry = TRUE)
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
renters
## Simple feature collection with 384 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -75.28027 ymin: 39.86705 xmax: -74.95578 ymax: 40.13799
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME
## 1 42101014500 Census Tract 145, Philadelphia County, Pennsylvania
## 2 42101004202 Census Tract 42.02, Philadelphia County, Pennsylvania
## 3 42101004102 Census Tract 41.02, Philadelphia County, Pennsylvania
## 4 42101000804 Census Tract 8.04, Philadelphia County, Pennsylvania
## 5 42101025300 Census Tract 253, Philadelphia County, Pennsylvania
## 6 42101028100 Census Tract 281, Philadelphia County, Pennsylvania
## 7 42101008602 Census Tract 86.02, Philadelphia County, Pennsylvania
## 8 42101030700 Census Tract 307, Philadelphia County, Pennsylvania
## 9 42101031101 Census Tract 311.01, Philadelphia County, Pennsylvania
## 10 42101025900 Census Tract 259, Philadelphia County, Pennsylvania
## variable estimate moe geometry
## 1 B25008_003 1042 205 MULTIPOLYGON (((-75.15131 3...
## 2 B25008_003 1529 670 MULTIPOLYGON (((-75.15614 3...
## 3 B25008_003 4358 1083 MULTIPOLYGON (((-75.16396 3...
## 4 B25008_003 3342 445 MULTIPOLYGON (((-75.17067 3...
## 5 B25008_003 2015 558 MULTIPOLYGON (((-75.18653 4...
## 6 B25008_003 1842 638 MULTIPOLYGON (((-75.15237 4...
## 7 B25008_003 2372 401 MULTIPOLYGON (((-75.22156 3...
## 8 B25008_003 2458 553 MULTIPOLYGON (((-75.0935 40...
## 9 B25008_003 2404 678 MULTIPOLYGON (((-75.08267 4...
## 10 B25008_003 1071 375 MULTIPOLYGON (((-75.1809 40...
renters %>%
ggplot(aes(fill = estimate)) +
geom_sf(color = NA) +
scale_fill_viridis_c()
You can even write these datasets to a shapefile and use it in ArcGIS
library(sf)
st_write(renters, "tidycensus demo renters.shp")