Advanced GIS, Week 9

Outline

Assignment 3
Lab 3 overview & Part 2 deliverables
Intro to tidycensus
Intro to spatial data and sf
Create a map with decennial Census data
Create a map with ACS data
Lab 3 deliverables detail

Assignment 3

Research Proposal:

One PDF (3-5 pages) describing the research question, methods and expected results of the research project.

Due next week: Nov 1

Lab 3, Part 2 Deliverables

due November 8 (pushing this by one week so you have time for in-class help next week)

Deliverable 1 R script to process housing occupancy data for every state from the 2020 decennial Census and create a leaflet map in your R Studio viewer.
Deliverable 2 R script to:
- Process median rent data for every census tract in Brooklyn from the 2016-20 5-year American Community Survey
- Calculate which tract is affordable for 2 people with minimum wage jobs
- Create two leaflet maps to display median rent and rent affordability for 2 people with minimum wage jobs
Resubmit any part of Lab, part 1 if you have corrections

tidycensus

Today we talk about how to use the tidycensus package to access Census data in R and prepare it for mapping and spatial analysis.

tidycensus:
- delivers Census data to R users in a tidyverse-friendly way
  - Decennial Census data
  - American Community Survey data, 5-year and 1-year
  - geometry
- it requires a Census API key: get it here.

lesson prep

install packages:
- tidycensus: to import Census data
- sf: to handle spatial data
- scales: to format text
- leaflet: to create interactive map viewer in R Studio
  - install.packages("package_name")
  - or click Install in the Packages window for a user-interface
create a script to install your Census API key
- create new R script, save it in your scripts folder as tidycensus_API_key.R

census_api_key("put your census api key here", 
               install = TRUE,
               overwrite = TRUE)`

run the script
- On your computer you only need to install the API key once
- Pratt computers may require you run this every time you log on

tidycensus documentation

Overview
Basic usage
Available geography
You can also look at the ChangeLog to see what is available for the 2020 census.

Create a new script

Create a new script to process housing occupancy data from the 2020 decennial Census (this will be Deliverable 1:

New R Script
Save it as housing_occupancy_2020.R in the main_data/scripts/data_processing folder
Add the packages you will use in the script at the top of the script:

library(tidyverse)
library(tidycensus)
library(sf)
library(leaflet)
library(scales)

list of decennial census variables

Create a data frame of all available variables in the 2020 decennial Census
Only data for redistricting is available so far (boo, covid delays)
- The 2020 Census Redistricting Data (P.L. 94-171) Summary Files
- called “pl” in tidycensus

# create table of all variables in the 2020 redistricting file
decennial_vars_2020 <- load_variables(2020, "pl", cache = TRUE)

cache = TRUE: saves information to your computer so that this table will load faster next time. (recommended)

load_variables documentation

in your console, type ??load_variables to see the help section

decennial census variables

name	label	concept
H1_001N	!!Total:	OCCUPANCY STATUS
H1_002N	!!Total:!!Occupied	OCCUPANCY STATUS
H1_003N	!!Total:!!Vacant	OCCUPANCY STATUS
P1_001N	!!Total:	RACE
P1_002N	!!Total:!!Population of one race:	RACE
P1_003N	!!Total:!!Population of one race:!!White alone	RACE

….(displaying the first 6 of 301 rows)

The returned data frame always has three columns:

name:refers to the Census variable ID

format = tableID_variableID

label: a descriptive data label for the variable

concept: refers to the topic of the data and often corresponds to a table of Census data.

Redistricting Race/Ethnicity data

There are 6 tables in the PL-2020 Census Redistricting Data:

P1. Race
P2. Hispanic or Latino, and Not Hispanic or Latino by Race
P3. Race for the Population 18 Years and Over
P4. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
P5. Group Quarters Population by Major Group Quarters Type
H1. Occupancy Status

Import Housing Units data

Use the get_decennial() function to create a data frame of housing units for every county in New York:

Look at the decennial_vars_2020 data frame to find the variable IDs for:
- housing units, occupied housing units, vacant housing units

raw_housing_units <- get_decennial(geography = "state",
                             variables = c(housing_units = "H1_001N",
                                           occupied_units = "H1_002N", 
                                           vacant_units = "H1_003N"), 
                             year = 2020,
                             output = "wide",
                             geometry = TRUE)

GEOID	NAME	housing_units	occupied_units	vacant_units	geometry
35	New Mexico	940859	829514	111345	MULTIPOLYGON (((-109.0502 3…
72	Puerto Rico	1598159	1340534	257625	MULTIPOLYGON (((-65.23805 1…
06	California	14392140	13475623	916517	MULTIPOLYGON (((-118.6044 3…
01	Alabama	2288330	2011947	276383	MULTIPOLYGON (((-88.05338 3…
13	Georgia	4410956	4020808	390148	MULTIPOLYGON (((-81.27939 3…

….(displaying the first 5 of 52 rows)

get_decennial explainer

Let’s break down the get_decennial() function

look it up in the help for more details: ??get_decennial

raw_housing_units <- get_decennial(geography = "state",
                             variables = c(housing_units = "H1_001N",
                                           occupied_units = "H1_002N", 
                                           vacant_units = "H1_003N"), 
                             year = 2020,
                             output = "wide",
                             geometry = TRUE)

geography: defines the geographic unit (find available geographies)

variables: list of variable IDs

we are renaming as we import with housing_units = "H1_001N"

year: defines the year (2000, 2010, 2020 are available for decennial)

output: wide returns a data frame where each variable is a column

the default option is tidy returns a data frame with unique unit-variable combinations

geometry: imports as a spatial data frame with a geometry column

*Note: the order of the parameters above doesn’t matter

Answer a question

The ACS questionnaire is sent to 3.5 million addresses each year.

What proportion of US households get the questionnaire?
How would we answer this question with our data frame?

acs_percent <- 3500000/sum(raw_housing_units$housing_units)

Calculate new variables

Create a new data frame and calculate two new variables in your dataframe:

use the filter() function to remove Alaska, Hawaii, and Puerto Rico
use the mutate() function to add the two new variables
- percent vacant = occupied_units/housing_units
- percent occupied = vacant_units/housing_units
- percent_vacant_label = percent_vacant formatted with percent() function from scales package

housing_units <- raw_housing_units %>% 
 mutate(pct_occupied = occupied_units/housing_units,
        pct_vacant = vacant_units/housing_units,
        percent_vacant_label = percent(pct_vacant, accuracy = 1L)) %>% 
  filter(NAME != "Alaska",
         NAME != "Hawaii",
         NAME != "Puerto Rico")

so your data frame looks like this:

GEOID	NAME	housing_units	occupied_units	vacant_units	geometry	pct_occupied	pct_vacant	percent_vacant_label
35	New Mexico	940859	829514	111345	MULTIPOLYGON (((-109.0502 3…	0.8816560	0.1183440	12%
06	California	14392140	13475623	916517	MULTIPOLYGON (((-118.6044 3…	0.9363182	0.0636818	6%
01	Alabama	2288330	2011947	276383	MULTIPOLYGON (((-88.05338 3…	0.8792207	0.1207793	12%

Make a map with leaflet

use st_tranform() to project to WGS84 (WGS84 is required for leaflet)

use epsg.io to find the EPSG code for any projection

# transform spatial data to WGS84 for leaflet
housing_units_4326 <- st_transform(housing_units, 4326)

leaflet() %>%
  addProviderTiles(provider = "CartoDB.Positron") %>% 
  addPolygons(data = housing_units_4326,
              color = "#ababab",
              fillColor = "purple")

Leaflet explained

leaflet() %>%
  addProviderTiles(provider = "CartoDB.Positron") %>% 
  addPolygons(data = housing_units_4326,
              color = "#ababab",
              fillColor = "purple")

leaflet(): creates the map object
addProviderTiles(): adds the basemap layer
addPolygons(): adds the polygon layer with the following parameters:
- data: the data frame to use
- color: the color of the lines
- fillColor: the color of the fill

Make a choropleth map

We’re going to make a choropleth map of the proportion of housing units that are vacant in each state.

Steps:

Create a histogram to see the range of the statistic
Define the color palette we’ll use
Add the color palette to the leaflet map
Add a popup to the map

Choropleth part 1: histogram

min(housing_units_4326$pct_vacant)

[1] 0.06368177

max(housing_units_4326$pct_vacant)

[1] 0.2119347

hist(housing_units_4326$pct_vacant)

Choropleth part 2: palette

We’ll use the max, min and histogram to define the breaks in the colors in our map
Then use the colorBin() function to define the palette

vacant_green_palette <- colorBin(palette = "Greens",
                          domain = housing_units_4326$pct_vacant,
                          bins = c(0, .05, .1, .15, .2, .25))

colorBin(): defines the palette used in a leaflet map:
- palette(): defines a predefined gradient (see optional below for more information)
- domain(): defines the variable to apply the palette to
- bins(): defines the breaks in the palette - when the color changes
see colorBrewer color palettes
optional: see the Colors chapter of Leaflet for R to learn about:
- the accepted names of color palettes
- the 3 other color functions you can use

Choropleth part 3: map

leaflet() %>%
  addProviderTiles(provider = "CartoDB.Positron") %>% 
  addPolygons(data = housing_units_4326,
              color = "#ababab",
              fillColor = ~vacant_green_palette(pct_vacant),
              fillOpacity = .85,
              popup = paste0(housing_units_4326$NAME,
                             '<br/>', 'Percent Vacant: ', 
                             housing_units_4326$pct_vacant))

leaflet, learn more

This is as far as we’ll go with leaflet. To learn more, check out this simple tutorial and the other chapters in that book.

new functions in this lesson

census_api_key: (tidycensus) install api key for tidycensus (do this once)
load_variables(): (tidycensus) import dataframe of all variables in the Census
c(): combine values into a list
get_decennial(): (tidycensus) import data from decennial Census
get_acs(): (tidycensus) import data from ACS
st_transform(): (sf) project spatial dataframe to new projection
leaflet(): (leaflet) create interactive map, displayed in viewer
hist(): create a histogram of a variable
addProviderTiles(): (leaflet) add baselayer
addPolygons(): (leaflet) add polygon layer
colorBin(): (leaflet) define color palette for leaflet map based on a variable
paste0(): concatenate strings

Deliverable 1

Clean up the script you created above, housing_occupancy_2020.R.

Upload to CANVAS, Lab 3 Part 2.

Deliverable 2

Create a new script to create 2 maps using data from the American Community Survey:
- median rent for every census tract in Brooklyn (or another county if you prefer)
- affordability for two people working 40 hours/week making minimum wage
Save the script as Brooklyn_rent_affordability.R in the main_data/scripts/data_exploration folder
- change the name if you’ll do the analysis in a different county

You will use the same steps as in the Percent Vacant example, using data from the 2016-20, 5-yr American Community Survey.

See the instructions in the following pages. When complete, submit your script on CANVAS.

D2: Load packages

library(tidyverse)
library(tidycensus)
library(sf)
library(leaflet)
library(scales)

D2: Download contract rent

Use the load_variables() function to see a list of all variables available in the 5-year ACS
- acs201620 <- load_variables(2020, "acs5", cache = T)
View the acs201620 dataframe and search for “MEDIAN CONTRACT RENT” in the concept column
Note the variable ID for MEDIAN CONTRACT RENT
Use the get_acs() function to download MEDIAN CONTRACT RENT for every tract in Brooklyn
- this will look similar the get_decennial function in the previous example with the following changes:
  - geography = "tract"
  - change the variable to contract rent
  - add two new parameters: state = "New York", "county = "Brooklyn,

D2: get_acs example

Your get_acs() should look like this, with the correct variable inserted

raw_contract_rent <- get_acs(geography = "tract",
  variables = c(contract_rent = "insert variable id here"),
  state = "New York",
  county = "Kings County",
  year = 2020,
  geometry = TRUE,
  output = "wide")

Notice that ACS data has 2 data columns
- contract_rentE: this is the estimate
- contract_rentM: this is the margin of error
For now, you can ignore the margin of error, but it is best practice to exclude areas with high margins of error from your analysis

D2: Calculate variables

Create 4 new columns using the mutate() function:

afforable_rent_2_people_min_wage = 2 * 40 * 15 * 4 * .3
- this is a back-of-the-envelope calculation of how much 2 people with minimum wage jobs can afford to pay for rent
- 2 people, working 40 hours at $15/hour for 4 weeks, allocating 30% of their income to rent
rent_affordability = afforable_rent_2_people_min_wage - contract_rentE
- negative values indicate that the rent is NOT affordable
median_rent_label = dollar(contract_rentE)
rent_affordability_label = dollar(rent_affordability)
use st_transform() to project this spatial dataframe to WGS84

D2: Define palette

Define a palette for 2 maps:

Choropleth of median rent for every census tract
Choropleth of affordability for 2 people with minimum wage jobs

Look at the minimum, maximum, and histogram for each variable to determine the breaks to use in your palette for each map
- Note: there are NA values in this data that we must explicitly ignore in the max and min calculations with na.rm()
  - ex. using the percent vacant data min(housing_units$pct_vacant, na.rm=TRUE)
Define the palette for your map with colorBin()
For the affordablity map, I recommend a diverging color palette like “RdBu”

D2: Create leaflet maps

Create two leaflet maps:

Choropleth of median rent for every census tract
- color by contract rent
- create popup that displays the rent for each census tract
Choropleth of affordability for 2 people with minimum wage jobs
- color by rent affordability for 2 people with minimum wage jobs
- create popup that displays the median rent for each census tract, and the difference between median rent and what 2 people with minimum wage jobs can afford

D2: Submit your script

When you are happy with your maps, clean up your script and upload it to CANVAS, Lab 3 Part 2.

Outline

Assignment 3

Lab 3 overview & Part 2 deliverables

Intro to tidycensus

Intro to spatial data and sf

Create a map with decennial Census data

Create a map with ACS data

Lab 3 deliverables detail

Assignment 3

Lab 3, Part 2 Deliverables

tidycensus

lesson prep

tidycensus documentation

Create a new script

list of decennial census variables

load_variables documentation

decennial census variables

Redistricting Race/Ethnicity data

Import Housing Units data

get_decennial explainer

Answer a question

Calculate new variables

Make a map with leaflet

Leaflet explained

Make a choropleth map

Choropleth part 1: histogram

Choropleth part 2: palette

Choropleth part 3: map

leaflet, learn more

new functions in this lesson

Deliverable 1

Deliverable 2

D2: Load packages

D2: Download contract rent

D2: get_acs example

D2: Calculate variables

D2: Define palette

D2: Create leaflet maps

D2: Submit your script