Research Proposal:
One PDF (3-5 pages) describing the research question, methods and expected results of the research project.
due November 8 (pushing this by one week so you have time for in-class help next week)
Today we talk about how to use the tidycensus package to access Census data in R and prepare it for mapping and spatial analysis.
tidycensus:
install.packages("package_name")
Or install from the Packages window if you prefer a user interface
tidycensus_API_key.R
Create a new script to process housing occupancy data from the 2020 decennial Census (this will be Deliverable 1):
housing_occupancy_2020.R in the main_data/scripts/data_processing folder
cache = TRUE: saves information to your computer so that this table will load faster next time (recommended).
??load_variables to see the help section
| name | label | concept |
|---|---|---|
| H1_001N | !!Total: | OCCUPANCY STATUS |
| H1_002N | !!Total:!!Occupied | OCCUPANCY STATUS |
| H1_003N | !!Total:!!Vacant | OCCUPANCY STATUS |
| P1_001N | !!Total: | RACE |
| P1_002N | !!Total:!!Population of one race: | RACE |
| P1_003N | !!Total:!!Population of one race:!!White alone | RACE |
….(displaying the first 6 of 301 rows)
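The names in the table above combine a table ID and a variable ID, separated by an underscore. As a minimal sketch (base R only, using just the six names shown; `var_parts` is an illustrative name, not part of the lab), they can be split apart like this:

```r
# The six variable names displayed in the table above
vars <- c("H1_001N", "H1_002N", "H1_003N", "P1_001N", "P1_002N", "P1_003N")

# Split each name at the underscore: the first piece is the table, the second the variable
parts <- strsplit(vars, "_", fixed = TRUE)

var_parts <- data.frame(
  name = vars,
  table_id = sapply(parts, `[`, 1),
  variable_id = sapply(parts, `[`, 2)
)

# Which tables do these six variables come from?
unique(var_parts$table_id)
```

Here the H1 rows belong to the OCCUPANCY STATUS table and the P1 rows to the RACE table, matching the concept column.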
Variable names take the form tableID_variableID
There are 6 tables in the PL-2020 Census Redistricting Data:
Use the get_decennial() function to create a data frame of housing units for every county in New York:
Use the decennial_vars_2020 data frame to find the variable IDs for:
| GEOID | NAME | housing_units | occupied_units | vacant_units | geometry |
|---|---|---|---|---|---|
| 35 | New Mexico | 940859 | 829514 | 111345 | MULTIPOLYGON (((-109.0502 3… |
| 72 | Puerto Rico | 1598159 | 1340534 | 257625 | MULTIPOLYGON (((-65.23805 1… |
| 06 | California | 14392140 | 13475623 | 916517 | MULTIPOLYGON (((-118.6044 3… |
| 01 | Alabama | 2288330 | 2011947 | 276383 | MULTIPOLYGON (((-88.05338 3… |
| 13 | Georgia | 4410956 | 4020808 | 390148 | MULTIPOLYGON (((-81.27939 3… |
….(displaying the first 5 of 52 rows)
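A call that could produce a table shaped like the one above might look roughly like the sketch below. Several things here are assumptions rather than the lab's actual code: `geography = "state"` and `output = "wide"` are inferred from the rows and columns shown, `sumfile = "pl"` is assumed because the variables come from the PL redistricting file, and the wrapper name `fetch_state_housing` is hypothetical. Running it needs an internet connection and an installed Census API key, so the call itself is left commented out.

```r
# Hypothetical wrapper around tidycensus::get_decennial(); not the lab's official code
fetch_state_housing <- function(year = 2020) {
  tidycensus::get_decennial(
    geography = "state",                 # assumption: one row per state, as in the table
    variables = c(
      housing_units  = "H1_001N",        # total housing units
      occupied_units = "H1_002N",        # occupied units
      vacant_units   = "H1_003N"         # vacant units
    ),
    year = year,
    sumfile = "pl",                      # assumption: PL 94-171 redistricting file
    output = "wide",                     # assumption: one column per variable
    geometry = TRUE                      # attach polygons for mapping
  )
}

# Requires internet access and a Census API key:
# raw_housing_units <- fetch_state_housing()
```

Naming the variables inside `c()` is what gives the columns readable names such as housing_units instead of the raw IDs.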
Let’s break down the get_decennial() function
??get_decennial
housing_units = "H1_001N"
tidy returns a data frame with unique unit-variable combinations
The ACS questionnaire is sent to 3.5 million addresses each year.
acs_percent <- 3500000/sum(raw_housing_units$housing_units)
Create a new data frame and calculate two new variables in your dataframe:
filter() function to remove Alaska, Hawaii, and Puerto Rico
mutate() function to add the two new variables
percent_vacant formatted with percent() function from scales package
| GEOID | NAME | housing_units | occupied_units | vacant_units | geometry | pct_occupied | pct_vacant | percent_vacant_label |
|---|---|---|---|---|---|---|---|---|
| 35 | New Mexico | 940859 | 829514 | 111345 | MULTIPOLYGON (((-109.0502 3… | 0.8816560 | 0.1183440 | 12% |
| 06 | California | 14392140 | 13475623 | 916517 | MULTIPOLYGON (((-118.6044 3… | 0.9363182 | 0.0636818 | 6% |
| 01 | Alabama | 2288330 | 2011947 | 276383 | MULTIPOLYGON (((-88.05338 3… | 0.8792207 | 0.1207793 | 12% |
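The filter and mutate steps can be sketched on a small plain data frame. The counts below are copied from the five rows displayed earlier; the geometry column is left out, and the names `raw_demo` and `housing_demo` are illustrative only. This assumes the dplyr and scales packages are installed.

```r
library(dplyr)
library(scales)

# Five rows copied from the table displayed earlier (geometry omitted)
raw_demo <- data.frame(
  NAME = c("New Mexico", "Puerto Rico", "California", "Alabama", "Georgia"),
  housing_units = c(940859, 1598159, 14392140, 2288330, 4410956),
  occupied_units = c(829514, 1340534, 13475623, 2011947, 4020808),
  vacant_units = c(111345, 257625, 916517, 276383, 390148)
)

housing_demo <- raw_demo %>%
  # Drop the places excluded from the map
  filter(!NAME %in% c("Alaska", "Hawaii", "Puerto Rico")) %>%
  mutate(
    pct_occupied = occupied_units / housing_units,
    pct_vacant = vacant_units / housing_units,
    # Whole-number percent label for display on the map
    percent_vacant_label = percent(pct_vacant, accuracy = 1)
  )

housing_demo
```

The proportions stay numeric for the palette, while the label column is text meant only for display.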
use st_transform() to project to WGS84 (WGS84 is required for leaflet)
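A minimal sketch of the projection step, using one made-up point instead of the Census polygons. It assumes the downloaded geometry arrives in NAD83 (EPSG:4269), which is typical for Census boundary files but worth checking with st_crs() on your own data; `demo_nad83` and `demo_wgs84` are illustrative names.

```r
library(sf)

# One made-up point stored in NAD83 (EPSG:4269), standing in for Census geometry
demo_nad83 <- st_sf(
  NAME = "Demo point",
  geometry = st_sfc(st_point(c(-73.95, 40.65)), crs = 4269)
)

# Reproject to WGS84 (EPSG:4326) so leaflet can draw it
demo_wgs84 <- st_transform(demo_nad83, crs = 4326)

# Confirm the new coordinate reference system
st_crs(demo_wgs84)$epsg
```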
leaflet(): creates the map object
addProviderTiles(): adds the basemap layer
addPolygons(): adds the polygon layer with the following parameters:
We’re going to make a choropleth map of the proportion of housing units that are vacant in each state.
Steps:
[1] 0.06368177
[1] 0.2119347
colorBin() function to define the palette
colorBin(): defines the palette used in a leaflet map:
palette: names a predefined gradient (see optional below for more information)
domain: defines the variable to apply the palette to
bins: defines the breaks in the palette, where the color changes
This is as far as we’ll go with leaflet. To learn more, check out this simple tutorial and the other chapters in that book.
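The palette and map pieces described above can be sketched together on toy data. Everything specific here is an assumption for illustration: the two square polygons are made up, the "YlOrRd" gradient, five bins, and the CartoDB.Positron basemap are arbitrary choices, and the styling values are not the lab's. Only the two pct_vacant values come from the minimum and maximum printed above.

```r
library(sf)
library(leaflet)

# Helper that builds a 1-degree square polygon (toy geometry, not real boundaries)
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two made-up areas carrying the minimum and maximum pct_vacant values
demo_units <- st_sf(
  NAME = c("Demo A", "Demo B"),
  pct_vacant = c(0.06368177, 0.2119347),
  geometry = st_sfc(square(-100, 40), square(-98, 40), crs = 4326)
)

# palette = gradient name, domain = variable being mapped, bins = number of breaks
pal <- colorBin(palette = "YlOrRd", domain = demo_units$pct_vacant, bins = 5)

vacancy_map <- leaflet(demo_units) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(
    fillColor = ~pal(pct_vacant),   # color each polygon by its bin
    fillOpacity = 0.7,
    color = "white",                # outline color
    weight = 1,                     # outline width
    label = ~paste0(NAME, ": ", round(100 * pct_vacant), "%")
  )

# Print vacancy_map in RStudio to display it in the viewer
```

colorBin() returns a function, which is why it can be called as `pal(pct_vacant)` inside addPolygons().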
census_api_key(): (tidycensus) install API key for tidycensus (do this once)
load_variables(): (tidycensus) import dataframe of all variables in the Census
c(): combine values into a vector
get_decennial(): (tidycensus) import data from decennial Census
get_acs(): (tidycensus) import data from ACS
st_transform(): (sf) project spatial dataframe to new projection
leaflet(): (leaflet) create interactive map, displayed in viewer
hist(): create a histogram of a variable
addProviderTiles(): (leaflet) add baselayer
addPolygons(): (leaflet) add polygon layer
colorBin(): (leaflet) define color palette for leaflet map based on a variable
paste0(): concatenate strings
Clean up the script you created above, housing_occupancy_2020.R.
Upload to CANVAS, Lab 3 Part 2.
Brooklyn_rent_affordability.R in the main_data/scripts/data_exploration folder
You will use the same steps as in the Percent Vacant example, using data from the 2016-20, 5-yr American Community Survey.
See the instructions in the following pages. When complete, submit your script on CANVAS.
load_variables() function to see a list of all variables available in the 5-year ACS
acs201620 <- load_variables(2020, "acs5", cache = TRUE)
Open the acs201620 dataframe and search for “MEDIAN CONTRACT RENT” in the concept column
Use the get_acs() function to download MEDIAN CONTRACT RENT for every tract in Brooklyn
geography = "tract"
state = "New York", county = "Kings" (Brooklyn is Kings County)
Your get_acs() should look like this, with the correct variable inserted
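As a hedged sketch only, a get_acs() call along these lines could be wrapped as below. The wrapper name `fetch_brooklyn_rent` and its `variable_id` argument are hypothetical, the variable ID is deliberately left for you to look up, and `output = "wide"` with the name contract_rent is an assumption inferred from the contract_rentE column used later in this lab. The call needs internet access and a Census API key, so it is left commented out.

```r
# Hypothetical wrapper around tidycensus::get_acs(); not the lab's official code
fetch_brooklyn_rent <- function(variable_id) {
  tidycensus::get_acs(
    geography = "tract",
    state = "New York",
    county = "Kings",                        # Brooklyn is Kings County
    variables = c(contract_rent = variable_id),
    year = 2020,                             # end year of the 2016-20 5-year ACS
    survey = "acs5",
    output = "wide",                         # assumption: yields contract_rentE / contract_rentM
    geometry = TRUE
  )
}

# Requires internet access and a Census API key; insert the ID you found in acs201620:
# brooklyn_rent <- fetch_brooklyn_rent("your_variable_id_here")
```

In wide output, tidycensus appends E to the estimate column and M to the margin of error column.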
Create 4 new columns using the mutate() function:
afforable_rent_2_people_min_wage = 2 * 40 * 15 * 4 * .3
rent_affordability = afforable_rent_2_people_min_wage - contract_rentE
median_rent_label = dollar(contract_rentE)
rent_affordability_label = dollar(rent_affordability)
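The four columns can be tried out on a toy data frame before running them on the tract data. The two contract_rentE values below are made up, and the comment on the affordability formula is one reading of it (two earners, 40 hours a week, $15 an hour, 4 weeks, 30% of income), not something the lab states. Assumes dplyr and scales are installed.

```r
library(dplyr)
library(scales)

# Hypothetical tracts: these contract_rentE values are made up for illustration
rent_demo <- data.frame(
  NAME = c("Tract A", "Tract B"),
  contract_rentE = c(1200, 1800)
)

rent_demo <- rent_demo %>%
  mutate(
    # Assumed reading: 2 earners * 40 hrs/week * $15/hr * 4 weeks * 30% of income
    afforable_rent_2_people_min_wage = 2 * 40 * 15 * 4 * .3,
    # Positive = rent is below the affordable threshold; negative = above it
    rent_affordability = afforable_rent_2_people_min_wage - contract_rentE,
    median_rent_label = dollar(contract_rentE),
    rent_affordability_label = dollar(rent_affordability)
  )

rent_demo
```

With these numbers the threshold works out to 1440, so Tract A falls below it and Tract B above it.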
use st_transform() to project this spatial dataframe to WGS84
Define a palette for 2 maps:
na.rm = TRUE: ignores missing (NA) values when calculating the minimum and maximum, for example:
min(housing_units$pct_vacant, na.rm = TRUE)
colorBin()
When you are happy with your maps, clean up your script and upload it to CANVAS, Lab 3 Part 2.