In this lab you will practice working with the US census data searching for variables, loading them into R, performing basic calculations and data wrangling tasks, visualizing the data with figures and maps, and offer substantive feedback on the data analysis.
The topic for our analysis will be to compare the house-price-to-income ratio across US counties and over time. The ratio tells the number of years it would take for the median income household to buy the median household price. Under healthy economic conditions, the rule of thumb is that a buyer can afford a house if its price is equivalent to a house-price-to-income ratio of 2.6. Read the Citylab report report by Richard Florida for more background on the ratio and its importance and US county rankings.
For loading, manipulating, and plotting Census data in R, refer to Lecture 2 video and notes, and in particular, the part of lecture that introduces tidycensus. In that portion of lecture, the exact codes are already given for you. All you need to do typically is to change variable name or year value, and re-run the code.
Remember to set census_api_key("your_key_here") to
access data
library(censusapi)
census_api_key("404739a4c1d1baf6067b937fe42425b977ef145d")
#apis <- listCensusApis()
#View(apis)
# Add key to .Renviron
Sys.setenv(CENSUS_KEY="404739a4c1d1baf6067b937fe42425b977ef145d")
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")## [1] "404739a4c1d1baf6067b937fe42425b977ef145d"
You can get a Census API Key at: https://api.census.gov/data/key_signup.html
Use tidycensus::get_acs() to download information on median household income and median household value for ACS 5-year estimate, 2013-2017 period. The notation tidycensus::get_acs() means the get_acs() function is located inside the tidycensus package, i.e. you need that package loaded to use the function.
Call the data set CenDF
Notes:
Use load_variables to view and search for variables. In filter, search for Median value and use first row to get median household price variable name.
Remember that year=2017 and survey=“acs5” reports 5-year estimates for the last five years from the specified date (i.e. year=2017 uses data from 2013-2017).
library(tidycensus)
#view(load_variables("2010"))
vars <- load_variables(year=2017,
dataset = "acs5",
cache = TRUE)
dim(vars)## [1] 25071 4
head(vars, 5)## # A tibble: 5 × 4
## name label concept geography
## <chr> <chr> <chr> <chr>
## 1 B00001_001 Estimate!!Total UNWEIGHTED SAMPLE C… block gr…
## 2 B00002_001 Estimate!!Total UNWEIGHTED SAMPLE H… block gr…
## 3 B01001_001 Estimate!!Total SEX BY AGE block gr…
## 4 B01001_002 Estimate!!Total!!Male SEX BY AGE block gr…
## 5 B01001_003 Estimate!!Total!!Male!!Under 5 years SEX BY AGE block gr…
#view(vars)
#Median income = B19013_001
#Median home value = B25097_001#varsselect <- c(Median_Income = "B19013_001",
# Median_Value = "B25097_001")
CenDF <- get_acs(geography = "county",
year=2017,
survey = "acs5",
variable = c("B19013_001", "B25097_001"),
geometry = T,
shift_geo = T)
head(CenDF, 10)## Simple feature collection with 10 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1085367 ymin: -1309052 xmax: 1390214 ymax: -1045061
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## GEOID NAME variable estimate moe
## 1 01001 Autauga County, Alabama B19013_001 55317 2838
## 2 01001 Autauga County, Alabama B25097_001 143000 6496
## 3 01009 Blount County, Alabama B19013_001 47412 2630
## 4 01009 Blount County, Alabama B25097_001 122200 4169
## 5 01017 Chambers County, Alabama B19013_001 37342 2011
## 6 01017 Chambers County, Alabama B25097_001 86400 4374
## 7 01021 Chilton County, Alabama B19013_001 43501 2877
## 8 01021 Chilton County, Alabama B25097_001 98600 6349
## 9 01033 Colbert County, Alabama B19013_001 45477 1959
## 10 01033 Colbert County, Alabama B25097_001 108500 5071
## geometry
## 1 MULTIPOLYGON (((1269841 -13...
## 2 MULTIPOLYGON (((1269841 -13...
## 3 MULTIPOLYGON (((1240383 -11...
## 4 MULTIPOLYGON (((1240383 -11...
## 5 MULTIPOLYGON (((1382944 -12...
## 6 MULTIPOLYGON (((1382944 -12...
## 7 MULTIPOLYGON (((1257515 -12...
## 8 MULTIPOLYGON (((1257515 -12...
## 9 MULTIPOLYGON (((1085910 -10...
## 10 MULTIPOLYGON (((1085910 -10...
Use spread() to transform the dataset from long to wide and save the dataset back into CenDF
Use mutate() to create the ratio of median household value divided by household median income, and call it HHInc_HousePrice_Ratio
Notes:
It will be helpful to relabel variable names from census name to a more easy to understand name, such as HHIncome and HouseValue. You can do this using mutate() and case_when(). It will also be helpful to remove MOE variable using select()
CenDF <- CenDF %>%
mutate (variable=case_when(
variable=="B19013_001" ~ "HHIncome",
variable=="B25097_001" ~ "HouseValue")) %>%
select(-moe) %>%
spread(variable, estimate) %>%
mutate(HHInc_HousePrice_Ratio = round(HouseValue/HHIncome,1))
head(CenDF)## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1085367 ymin: -1419116 xmax: 1390214 ymax: -1045061
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## GEOID NAME HHIncome HouseValue
## 1 01001 Autauga County, Alabama 55317 143000
## 2 01009 Blount County, Alabama 47412 122200
## 3 01017 Chambers County, Alabama 37342 86400
## 4 01021 Chilton County, Alabama 43501 98600
## 5 01033 Colbert County, Alabama 45477 108500
## 6 01045 Dale County, Alabama 44711 108000
## geometry HHInc_HousePrice_Ratio
## 1 MULTIPOLYGON (((1269841 -13... 2.6
## 2 MULTIPOLYGON (((1240383 -11... 2.6
## 3 MULTIPOLYGON (((1382944 -12... 2.3
## 4 MULTIPOLYGON (((1257515 -12... 2.3
## 5 MULTIPOLYGON (((1085910 -10... 2.4
## 6 MULTIPOLYGON (((1382203 -13... 2.4
Use base R function order() and rev(order) to explore which counties have the lowest and highest house-price-to-income ratio.
There is always more than one way to accomplish the same task in R. For example, to get the lowest and highest house-price-to-income ratios
Bonus: Use datatable() function in the DT library to get interactive table that allows you to order columns
datatable(CenDF)
Low to High:
head(CenDF[ order(CenDF$HHInc_HousePrice_Ratio) , ],3) #sorts the data by HHInc_HousePrice_Ratio low to high and previews top three## Simple feature collection with 3 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -285178 ymin: -1287946 xmax: 841.6695 ymax: -141973.3
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## GEOID NAME HHIncome HouseValue
## 1543 46102 Oglala Lakota County, South Dakota 27804 18700
## 2232 48269 King County, Texas 56964 38800
## 545 48079 Cochran County, Texas 37500 33300
## geometry HHInc_HousePrice_Ratio
## 1543 MULTIPOLYGON (((-242083.1 -... 0.7
## 2232 MULTIPOLYGON (((-48135.61 -... 0.7
## 545 MULTIPOLYGON (((-282767.2 -... 0.9
High to Low:
head(CenDF[ rev(order(CenDF$HHInc_HousePrice_Ratio)) , ],3) #sorts the data by HHInc_HousePrice_Ratio high to low and previews top three## Simple feature collection with 3 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -377587.6 ymin: -2055704 xmax: 2163853 ymax: -128535.4
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## GEOID NAME HHIncome HouseValue
## 3141 15005 Kalawao County, Hawaii 61750 NA
## 1690 48301 Loving County, Texas 80938 NA
## 2662 36047 Kings County, New York 52782 623900
## geometry HHInc_HousePrice_Ratio
## 3141 MULTIPOLYGON (((-374079.7 -... NA
## 1690 MULTIPOLYGON (((-315615.9 -... NA
## 2662 MULTIPOLYGON (((2158769 -13... 11.8
#interactive table
DT::datatable(CenDF)library(tidyr)
ggplot(CenDF)+
geom_sf(aes(fill=HHInc_HousePrice_Ratio), color=NA)+
coord_sf(datum=NA)+
labs(title="House-Price-to-Income Ratio", caption="Source: US Census/ACS5 2017")+
scale_fill_viridis(direction=-1)
# Questions
1a. In step 3 above, what is the county with the highest (and lowest) house-price-to-income ratio? What is the average house-price-to-income ratio across all US counties over the time period?
- Answer: Highest - Kings County, NY
Lowest - Oglala Lakota County, South Dakota
The average house-price-to-income ratio across all US is 2.8
average2017 = mean(CenDF$HHInc_HousePrice_Ratio, na.rm = TRUE)1b. Where is Maricopa County on the list? How many years of median income will it take to buy a home in Maricopa County?
- Answer: Maricopa County is 409th highest (two missing values) out of 3220 counties in the US with a ratio of 3.8.
DT::datatable(CenDF)1c. Where is Los Angeles on the list and how does its ranking compare to the list reported in the Citylab report report by Richard Florida?
- Answer: Los Angeles is 12th highest out of 3220 counties in the US with a ratio of 8.1. The article by Richard Florida claims Los Angeles is 1st with a ratio of 9.6. Fun Fact, San Francisco is 6th with a ratio of 9.6.
### Question 2: Temporal analysis.
1a. Go back to Step 1 above, and enter in 2012 as the year value, which refers to the 2008-2012 5-year ACS estimate. The time period corresponds to the aftermath of the 2007/08 Great Financial Crisis. Repeat Steps 2-4, and re-answer Question 1 above for the 2008-2012 time period.
1a - Answer: Highest - Kings County, NY
Lowest - Shannon County, South Dakota
The average ratio is 2.8
1b - Answer: Maricopa - 3.6, 527 out of 3143
1c - Answer: Los Angeles - 7.9, 17 out of 3143
Code and Output:
CenDF2012 <- get_acs(geography = "county",
year=2012,
survey = "acs5",
variable = c("B19013_001", "B25097_001"),
geometry = T,
shift_geo = T)
CenDF2012 <- CenDF2012 %>%
mutate (variable=case_when(
variable=="B19013_001" ~ "HHIncome",
variable=="B25097_001" ~ "HouseValue")) %>%
select(-moe) %>%
spread(variable, estimate) %>%
mutate(HHInc_HousePrice_Ratio = round(HouseValue/HHIncome,1))
ggplot(CenDF)+
geom_sf(aes(fill=HHInc_HousePrice_Ratio), color=NA)+
coord_sf(datum=NA)+
labs(title="House-Price-to-Income Ratio", caption="Source: US Census/ACS5 2017")+
scale_fill_viridis(direction=-1)DT::datatable(CenDF2012)average2012 = mean(CenDF2012$HHInc_HousePrice_Ratio, na.rm = T)2b. Compare and contrast the findings over the different time periods. Calculate the change in the housing value-to-income ratio from 2008-2012 to 2013-2017. What is the change? Did the average house-price-to-income ratio increase or decrease? Did the house-price-to-income ratio increase (or decrease) for Maricopa County and Los Angeles county?
- Answer: % change = +1.53%
Increase for all US (+1.53%), Maricopa (+5.55%), and Los Angeles (+2.53%)
change = (average2012-average2017)/(average2017)*100
change## [1] 1.53434
3a. Go back to Step 1 above and keep 2017 as the year value. This
time change geography value from county to
tract. Also add state = "AZ" and
county = "Maricopa County" within the get_acs
function. Repeat Steps 2-4. What are the summary statistics (min, max,
median, mean, sd) for the house-price-to-income ratio in Maricopa
county?
- Answer: see table output
CenDF_az <- get_acs(geography = "tract",
year=2017,
survey = "acs5",
variable = c("B19013_001", "B25097_001"),
county="Maricopa County",
state = "AZ",
geometry = T)##
|
| | 0%
|
|= | 2%
|
|== | 3%
|
|=== | 5%
|
|==== | 6%
|
|===== | 8%
|
|======= | 9%
|
|======= | 10%
|
|======== | 12%
|
|========= | 12%
|
|========== | 14%
|
|=========== | 16%
|
|============= | 19%
|
|============== | 20%
|
|================ | 23%
|
|================= | 24%
|
|================= | 25%
|
|=================== | 26%
|
|==================== | 28%
|
|==================== | 29%
|
|===================== | 30%
|
|====================== | 31%
|
|======================= | 33%
|
|========================= | 35%
|
|========================== | 37%
|
|=========================== | 39%
|
|============================= | 41%
|
|============================== | 43%
|
|=============================== | 44%
|
|================================= | 48%
|
|=================================== | 50%
|
|==================================== | 52%
|
|====================================== | 54%
|
|======================================= | 56%
|
|========================================== | 60%
|
|============================================ | 62%
|
|============================================== | 66%
|
|================================================ | 69%
|
|================================================== | 72%
|
|=================================================== | 73%
|
|==================================================== | 75%
|
|====================================================== | 77%
|
|======================================================= | 79%
|
|========================================================= | 81%
|
|=========================================================== | 84%
|
|============================================================ | 85%
|
|============================================================= | 87%
|
|================================================================ | 91%
|
|================================================================== | 94%
|
|=================================================================== | 95%
|
|==================================================================== | 98%
|
|======================================================================| 100%
CenDF_az <- CenDF_az %>%
mutate (variable=case_when(
variable=="B19013_001" ~ "HHIncome",
variable=="B25097_001" ~ "HouseValue")) %>%
select(-moe) %>%
spread(variable, estimate) %>%
mutate(HHInc_HousePrice_Ratio = round(HouseValue/HHIncome,1))
ggplot(CenDF_az, aes(fill=HHInc_HousePrice_Ratio))+
geom_sf(color="white")+
theme_void() + theme(panel.grid.major = element_line(colour='transparent'))+
scale_fill_distiller(palette="Reds", direction=1, name="Estimate")+
labs(title="House-Price-to-Income Ratio: Cache County, UT", caption="Source: US Census/ACS5 2017")+
NULL
library(pander)
summary(CenDF_az$HHInc_HousePrice_Ratio) %>% pander()| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
|---|---|---|---|---|---|---|
| 0.3 | 2.9 | 3.3 | 3.623 | 4.1 | 14.2 | 24 |
sd(CenDF_az$HHInc_HousePrice_Ratio, na.rm = T)## [1] 1.329234
3b. Compare and contrast the findings for the different levels of geography. How does the minimum, maxim, and mean value for census tracts within Maricopa county compare to results found in Question 1 looking at county-level data?
- Answer: The census tracts data for Maricopa county show a much lower median value that the county-level US data. The percent change is very close, but, the numeric measure of income to home value ratio is very different (much smaller) when looking at smaller tracts.