census_api_key("put your census api key here",
install = TRUE,
overwrite = TRUE)`Advanced GIS, Week 9
Outline
Assignment 3
Lab 3 overview & Part 2 deliverables
Intro to tidycensus
Intro to spatial data and sf
Create a map with decennial Census data
Create a map with ACS data
Lab 3 deliverables detail
Assignment 3
Research Proposal:
One PDF (3-5 pages) describing the research question, methods and expected results of the research project.
- Due next week: Nov 1
Lab 3, Part 2 Deliverables
due November 8 (pushing this by one week so you have time for in-class help next week)
- Deliverable 1 R script to process housing occupancy data for every state from the 2020 decennial Census and create a leaflet map in your R Studio viewer.
- Deliverable 2 R script to:
- Process median rent data for every census tract in Brooklyn from the 2016-20 5-year American Community Survey
- Calculate which tract is affordable for 2 people with minimum wage jobs
- Create two leaflet maps to display median rent and rent affordability for 2 people with minimum wage jobs
- Resubmit any part of Lab, part 1 if you have corrections
tidycensus
Today we talk about how to use the tidycensus package to access Census data in R and prepare it for mapping and spatial analysis.
tidycensus:- delivers Census data to R users in a tidyverse-friendly way
- Decennial Census data
- American Community Survey data, 5-year and 1-year
- geometry
- it requires a Census API key: get it here.
- delivers Census data to R users in a tidyverse-friendly way
lesson prep
- install packages:
- tidycensus: to import Census data
- sf: to handle spatial data
- scales: to format text
- leaflet: to create interactive map viewer in R Studio
install.packages("package_name")- or click
Installin the Packages window for a user-interface
- create a script to install your Census API key
- create new R script, save it in your scripts folder as
tidycensus_API_key.R
- create new R script, save it in your scripts folder as
- run the script
- On your computer you only need to install the API key once
- Pratt computers may require you run this every time you log on
tidycensus documentation
- Overview
- Basic usage
- Available geography
- You can also look at the ChangeLog to see what is available for the 2020 census.
Create a new script
Create a new script to process housing occupancy data from the 2020 decennial Census (this will be Deliverable 1:
- New R Script
- Save it as
housing_occupancy_2020.Rin the main_data/scripts/data_processing folder - Add the packages you will use in the script at the top of the script:
library(tidyverse)
library(tidycensus)
library(sf)
library(leaflet)
library(scales)list of decennial census variables
- Create a data frame of all available variables in the 2020 decennial Census
- Only data for redistricting is available so far (boo, covid delays)
- The 2020 Census Redistricting Data (P.L. 94-171) Summary Files
- called “pl” in tidycensus
# create table of all variables in the 2020 redistricting file
decennial_vars_2020 <- load_variables(2020, "pl", cache = TRUE)cache = TRUE: saves information to your computer so that this table will load faster next time. (recommended)
load_variables documentation
- in your console, type
??load_variablesto see the help section

decennial census variables
| name | label | concept |
|---|---|---|
| H1_001N | !!Total: | OCCUPANCY STATUS |
| H1_002N | !!Total:!!Occupied | OCCUPANCY STATUS |
| H1_003N | !!Total:!!Vacant | OCCUPANCY STATUS |
| P1_001N | !!Total: | RACE |
| P1_002N | !!Total:!!Population of one race: | RACE |
| P1_003N | !!Total:!!Population of one race:!!White alone | RACE |
….(displaying the first 6 of 301 rows)
- The returned data frame always has three columns:
- name:refers to the Census variable ID
- format =
tableID_variableID- label: a descriptive data label for the variable
- concept: refers to the topic of the data and often corresponds to a table of Census data.
Redistricting Race/Ethnicity data
There are 6 tables in the PL-2020 Census Redistricting Data:
- P1. Race
- P2. Hispanic or Latino, and Not Hispanic or Latino by Race
- P3. Race for the Population 18 Years and Over
- P4. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
- P5. Group Quarters Population by Major Group Quarters Type
- H1. Occupancy Status
Import Housing Units data
Use the get_decennial() function to create a data frame of housing units for every county in New York:
- Look at the
decennial_vars_2020data frame to find the variable IDs for:- housing units, occupied housing units, vacant housing units
raw_housing_units <- get_decennial(geography = "state",
variables = c(housing_units = "H1_001N",
occupied_units = "H1_002N",
vacant_units = "H1_003N"),
year = 2020,
output = "wide",
geometry = TRUE)| GEOID | NAME | housing_units | occupied_units | vacant_units | geometry |
|---|---|---|---|---|---|
| 35 | New Mexico | 940859 | 829514 | 111345 | MULTIPOLYGON (((-109.0502 3… |
| 72 | Puerto Rico | 1598159 | 1340534 | 257625 | MULTIPOLYGON (((-65.23805 1… |
| 06 | California | 14392140 | 13475623 | 916517 | MULTIPOLYGON (((-118.6044 3… |
| 01 | Alabama | 2288330 | 2011947 | 276383 | MULTIPOLYGON (((-88.05338 3… |
| 13 | Georgia | 4410956 | 4020808 | 390148 | MULTIPOLYGON (((-81.27939 3… |
….(displaying the first 5 of 52 rows)
get_decennial explainer
Let’s break down the get_decennial() function
- look it up in the help for more details:
??get_decennial
raw_housing_units <- get_decennial(geography = "state",
variables = c(housing_units = "H1_001N",
occupied_units = "H1_002N",
vacant_units = "H1_003N"),
year = 2020,
output = "wide",
geometry = TRUE)
- geography: defines the geographic unit (find available geographies)
- variables: list of variable IDs
- we are renaming as we import with
housing_units = "H1_001N"- year: defines the year (2000, 2010, 2020 are available for decennial)
- output: wide returns a data frame where each variable is a column
- the default option is
tidyreturns a data frame with unique unit-variable combinations- geometry: imports as a spatial data frame with a geometry column
- *Note: the order of the parameters above doesn’t matter
Answer a question
The ACS questionnaire is sent to 3.5 million addresses each year.
- What proportion of US households get the questionnaire?
- How would we answer this question with our data frame?
acs_percent <- 3500000/sum(raw_housing_units$housing_units)
Calculate new variables
Create a new data frame and calculate two new variables in your dataframe:
- use the
filter()function to remove Alaska, Hawaii, and Puerto Rico - use the
mutate()function to add the two new variables- percent vacant = occupied_units/housing_units
- percent occupied = vacant_units/housing_units
- percent_vacant_label =
percent_vacantformatted withpercent()function from scales package
housing_units <- raw_housing_units %>%
mutate(pct_occupied = occupied_units/housing_units,
pct_vacant = vacant_units/housing_units,
percent_vacant_label = percent(pct_vacant, accuracy = 1L)) %>%
filter(NAME != "Alaska",
NAME != "Hawaii",
NAME != "Puerto Rico")- so your data frame looks like this:
| GEOID | NAME | housing_units | occupied_units | vacant_units | geometry | pct_occupied | pct_vacant | percent_vacant_label |
|---|---|---|---|---|---|---|---|---|
| 35 | New Mexico | 940859 | 829514 | 111345 | MULTIPOLYGON (((-109.0502 3… | 0.8816560 | 0.1183440 | 12% |
| 06 | California | 14392140 | 13475623 | 916517 | MULTIPOLYGON (((-118.6044 3… | 0.9363182 | 0.0636818 | 6% |
| 01 | Alabama | 2288330 | 2011947 | 276383 | MULTIPOLYGON (((-88.05338 3… | 0.8792207 | 0.1207793 | 12% |
Make a map with leaflet
use st_tranform() to project to WGS84 (WGS84 is required for leaflet)
- use epsg.io to find the EPSG code for any projection
# transform spatial data to WGS84 for leaflet
housing_units_4326 <- st_transform(housing_units, 4326)leaflet() |>
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(data = housing_units_4326,
color = "#ababab",
fillColor = "purple")Leaflet explained
leaflet() |>
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(data = housing_units_4326,
color = "#ababab",
fillColor = "purple")leaflet(): creates the map objectaddProviderTiles(): adds the basemap layeraddPolygons(): adds the polygon layer with the following parameters:- data: the data frame to use
- color: the color of the lines
- fillColor: the color of the fill
Make a choropleth map
We’re going to make a choropleth map of the proportion of housing units that are vacant in each state.
Steps:
- Create a histogram to see the range of the statistic
- Define the color palette we’ll use
- Add the color palette to the leaflet map
- Add a popup to the map
Choropleth part 1: histogram
min(housing_units_4326$pct_vacant)[1] 0.06368177
max(housing_units_4326$pct_vacant)[1] 0.2119347
hist(housing_units_4326$pct_vacant)
Choropleth part 2: palette
- We’ll use the max, min and histogram to define the breaks in the colors in our map
- Then use the
colorBin()function to define the palette
vacant_green_palette <- colorBin(palette = "Greens",
domain = housing_units_4326$pct_vacant,
bins = c(0, .05, .1, .15, .2, .25))colorBin(): defines the palette used in a leaflet map:palette(): defines a predefined gradient (see optional below for more information)domain(): defines the variable to apply the palette tobins(): defines the breaks in the palette - when the color changes
- see colorBrewer color palettes
- optional: see the Colors chapter of Leaflet for R to learn about:
- the accepted names of color palettes
- the 3 other color functions you can use
Choropleth part 3: map
leaflet() |>
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(data = housing_units_4326,
color = "#ababab",
fillColor = ~vacant_green_palette(pct_vacant),
fillOpacity = .85,
popup = paste0(housing_units_4326$NAME,
'<br/>', 'Percent Vacant: ',
housing_units_4326$pct_vacant))leaflet, learn more
This is as far as we’ll go with leaflet. To learn more, check out this simple tutorial and the other chapters in that book.
new functions in this lesson
census_api_key: (tidycensus) install api key for tidycensus (do this once)load_variables(): (tidycensus) import dataframe of all variables in the Censusc(): combine values into a listget_decennial(): (tidycensus) import data from decennial Censusget_acs(): (tidycensus) import data from ACSst_transform(): (sf) project spatial dataframe to new projectionleaflet(): (leaflet) create interactive map, displayed in viewerhist(): create a histogram of a variableaddProviderTiles(): (leaflet) add baselayeraddPolygons(): (leaflet) add polygon layercolorBin(): (leaflet) define color palette for leaflet map based on a variablepaste0(): concatenate strings
Deliverable 1
Clean up the script you created above, housing_occupancy_2020.R.
Upload to CANVAS, Lab 3 Part 2.
Deliverable 2
- Create a new script to create 2 maps using data from the American Community Survey:
- median rent for every census tract in Brooklyn (or another county if you prefer)
- affordability for two people working 40 hours/week making minimum wage
- Save the script as
Brooklyn_rent_affordability.Rin the main_data/scripts/data_exploration folder- change the name if you’ll do the analysis in a different county
You will use the same steps as in the Percent Vacant example, using data from the 2016-20, 5-yr American Community Survey.
See the instructions in the following pages. When complete, submit your script on CANVAS.
D2: Load packages
library(tidyverse)
library(tidycensus)
library(sf)
library(leaflet)
library(scales)D2: Download contract rent
- Use the
load_variables()function to see a list of all variables available in the 5-year ACSacs201620 <- load_variables(2020, "acs5", cache = T)
- View the
acs201620dataframe and search for “MEDIAN CONTRACT RENT” in the concept column - Note the variable ID for MEDIAN CONTRACT RENT
- Use the
get_acs()function to download MEDIAN CONTRACT RENT for everytractin Brooklyn- this will look similar the get_decennial function in the previous example with the following changes:
geography = "tract"- change the variable to contract rent
- add two new parameters:
state = "New York", "county = "Brooklyn,
- this will look similar the get_decennial function in the previous example with the following changes:
D2: get_acs example
Your get_acs() should look like this, with the correct variable inserted
raw_contract_rent <- get_acs(geography = "tract",
variables = c(contract_rent = "insert variable id here"),
state = "New York",
county = "Kings County",
year = 2020,
geometry = TRUE,
output = "wide")- Notice that ACS data has 2 data columns
- contract_rentE: this is the estimate
- contract_rentM: this is the margin of error
- For now, you can ignore the margin of error, but it is best practice to exclude areas with high margins of error from your analysis
D2: Calculate variables
Create 4 new columns using the mutate() function:
afforable_rent_2_people_min_wage = 2 * 40 * 15 * 4 * .3
- this is a back-of-the-envelope calculation of how much 2 people with minimum wage jobs can afford to pay for rent
- 2 people, working 40 hours at $15/hour for 4 weeks, allocating 30% of their income to rent
rent_affordability = afforable_rent_2_people_min_wage - contract_rentE
- negative values indicate that the rent is NOT affordable
median_rent_label = dollar(contract_rentE)
rent_affordability_label = dollar(rent_affordability)
use
st_transform()to project this spatial dataframe to WGS84
D2: Define palette
Define a palette for 2 maps:
- Choropleth of median rent for every census tract
- Choropleth of affordability for 2 people with minimum wage jobs
- Look at the minimum, maximum, and histogram for each variable to determine the breaks to use in your palette for each map
- Note: there are NA values in this data that we must explicitly ignore in the max and min calculations with
na.rm()- ex. using the percent vacant data
min(housing_units$pct_vacant, na.rm=TRUE)
- ex. using the percent vacant data
- Note: there are NA values in this data that we must explicitly ignore in the max and min calculations with
- Define the palette for your map with
colorBin() - For the affordablity map, I recommend a diverging color palette like “RdBu”
D2: Create leaflet maps
- Create two leaflet maps:
- Choropleth of median rent for every census tract
- color by contract rent
- create popup that displays the rent for each census tract
- Choropleth of affordability for 2 people with minimum wage jobs
- color by rent affordability for 2 people with minimum wage jobs
- create popup that displays the median rent for each census tract, and the difference between median rent and what 2 people with minimum wage jobs can afford
D2: Submit your script
When you are happy with your maps, clean up your script and upload it to CANVAS, Lab 3 Part 2.