Overview

In this lab you will practice working with the US census data searching for variables, loading them into R, performing basic calculations and data wrangling tasks, visualizing the data with figures and maps, and offer substantive feedback on the data analysis.

The topic for our analysis will be to compare the house-price-to-income ratio across US counties and over time. The ratio tells the number of years it would take for the median income household to buy the median household price. Under healthy economic conditions, the rule of thumb is that a buyer can afford a house if its price is equivalent to a house-price-to-income ratio of 2.6. Read the Citylab report report by Richard Florida for more background on the ratio and its importance and US county rankings.

For loading, manipulating, and plotting Census data in R, refer to Lecture 2 video and notes, and in particular, the part of lecture that introduces tidycensus. In that portion of lecture, the exact codes are already given for you. All you need to do typically is to change variable name or year value, and re-run the code.

Remember to set census_api_key("your_key_here") to access data

library(censusapi)

census_api_key("404739a4c1d1baf6067b937fe42425b977ef145d")
#apis <- listCensusApis()
#View(apis)
# Add key to .Renviron
Sys.setenv(CENSUS_KEY="404739a4c1d1baf6067b937fe42425b977ef145d") 
# Check to see that the expected key is output in your R console
Sys.getenv("CENSUS_KEY")
## [1] "404739a4c1d1baf6067b937fe42425b977ef145d"

You can get a Census API Key at: https://api.census.gov/data/key_signup.html

Step 1: Getting the Data

Use tidycensus::get_acs() to download information on median household income and median household value for ACS 5-year estimate, 2013-2017 period. The notation tidycensus::get_acs() means the get_acs() function is located inside the tidycensus package, i.e. you need that package loaded to use the function.

Call the data set CenDF

Notes:

Use load_variables to view and search for variables. In filter, search for Median value and use first row to get median household price variable name.

Remember that year=2017 and survey=“acs5” reports 5-year estimates for the last five years from the specified date (i.e. year=2017 uses data from 2013-2017).

library(tidycensus)

#view(load_variables("2010"))
vars <- load_variables(year=2017,
                       dataset = "acs5",
                       cache = TRUE)
dim(vars)
## [1] 25071     4
head(vars, 5)
## # A tibble: 5 × 4
##   name       label                                concept              geography
##   <chr>      <chr>                                <chr>                <chr>    
## 1 B00001_001 Estimate!!Total                      UNWEIGHTED SAMPLE C… block gr…
## 2 B00002_001 Estimate!!Total                      UNWEIGHTED SAMPLE H… block gr…
## 3 B01001_001 Estimate!!Total                      SEX BY AGE           block gr…
## 4 B01001_002 Estimate!!Total!!Male                SEX BY AGE           block gr…
## 5 B01001_003 Estimate!!Total!!Male!!Under 5 years SEX BY AGE           block gr…
#view(vars)

#Median income = B19013_001
#Median home value = B25097_001


#varsselect <- c(Median_Income = "B19013_001",
#               Median_Value = "B25097_001")

CenDF <- get_acs(geography = "county", 
                 year=2017,
                 survey = "acs5",
                 variable = c("B19013_001", "B25097_001"),
                 geometry = T,
                 shift_geo = T)

head(CenDF, 10)
## Simple feature collection with 10 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1085367 ymin: -1309052 xmax: 1390214 ymax: -1045061
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
##    GEOID                     NAME   variable estimate  moe
## 1  01001  Autauga County, Alabama B19013_001    55317 2838
## 2  01001  Autauga County, Alabama B25097_001   143000 6496
## 3  01009   Blount County, Alabama B19013_001    47412 2630
## 4  01009   Blount County, Alabama B25097_001   122200 4169
## 5  01017 Chambers County, Alabama B19013_001    37342 2011
## 6  01017 Chambers County, Alabama B25097_001    86400 4374
## 7  01021  Chilton County, Alabama B19013_001    43501 2877
## 8  01021  Chilton County, Alabama B25097_001    98600 6349
## 9  01033  Colbert County, Alabama B19013_001    45477 1959
## 10 01033  Colbert County, Alabama B25097_001   108500 5071
##                          geometry
## 1  MULTIPOLYGON (((1269841 -13...
## 2  MULTIPOLYGON (((1269841 -13...
## 3  MULTIPOLYGON (((1240383 -11...
## 4  MULTIPOLYGON (((1240383 -11...
## 5  MULTIPOLYGON (((1382944 -12...
## 6  MULTIPOLYGON (((1382944 -12...
## 7  MULTIPOLYGON (((1257515 -12...
## 8  MULTIPOLYGON (((1257515 -12...
## 9  MULTIPOLYGON (((1085910 -10...
## 10 MULTIPOLYGON (((1085910 -10...


Step 2: Transforming the Data

Use spread() to transform the dataset from long to wide and save the dataset back into CenDF

Use mutate() to create the ratio of median household value divided by household median income, and call it HHInc_HousePrice_Ratio

Notes:

It will be helpful to relabel variable names from census name to a more easy to understand name, such as HHIncome and HouseValue. You can do this using mutate() and case_when(). It will also be helpful to remove MOE variable using select()

CenDF <- CenDF %>%
  mutate (variable=case_when(
    variable=="B19013_001" ~ "HHIncome",
    variable=="B25097_001" ~ "HouseValue")) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  mutate(HHInc_HousePrice_Ratio = round(HouseValue/HHIncome,1))

head(CenDF)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1085367 ymin: -1419116 xmax: 1390214 ymax: -1045061
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
##   GEOID                     NAME HHIncome HouseValue
## 1 01001  Autauga County, Alabama    55317     143000
## 2 01009   Blount County, Alabama    47412     122200
## 3 01017 Chambers County, Alabama    37342      86400
## 4 01021  Chilton County, Alabama    43501      98600
## 5 01033  Colbert County, Alabama    45477     108500
## 6 01045     Dale County, Alabama    44711     108000
##                         geometry HHInc_HousePrice_Ratio
## 1 MULTIPOLYGON (((1269841 -13...                    2.6
## 2 MULTIPOLYGON (((1240383 -11...                    2.6
## 3 MULTIPOLYGON (((1382944 -12...                    2.3
## 4 MULTIPOLYGON (((1257515 -12...                    2.3
## 5 MULTIPOLYGON (((1085910 -10...                    2.4
## 6 MULTIPOLYGON (((1382203 -13...                    2.4


Step 3: Data Exploration

Use base R function order() and rev(order) to explore which counties have the lowest and highest house-price-to-income ratio.

  • Example: CenDF[ order(CenDF$HHInc_HousePrice_Ratio) , ] sorts the data by HHInc_HousePrice_Ratio low to high
  • Example: CenDF[ rev(order(CenDF$HHInc_HousePrice_Ratio)) , ] sorts the data by HHInc_HousePrice_Ratio high to low

There is always more than one way to accomplish the same task in R. For example, to get the lowest and highest house-price-to-income ratios

  • dplyr: arrange( CenDF, HHInc_HousePrice_Ratio ) sorts the df low to high
  • dplyr: arrange( CenDF, desc(HHInc_HousePrice_Ratio) ) sorts the df high to low

Bonus: Use datatable() function in the DT library to get interactive table that allows you to order columns

datatable(CenDF)

Low to High:

head(CenDF[ order(CenDF$HHInc_HousePrice_Ratio) , ],3) #sorts the data by HHInc_HousePrice_Ratio low to high and previews top three
## Simple feature collection with 3 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -285178 ymin: -1287946 xmax: 841.6695 ymax: -141973.3
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
##      GEOID                               NAME HHIncome HouseValue
## 1543 46102 Oglala Lakota County, South Dakota    27804      18700
## 2232 48269                 King County, Texas    56964      38800
## 545  48079              Cochran County, Texas    37500      33300
##                            geometry HHInc_HousePrice_Ratio
## 1543 MULTIPOLYGON (((-242083.1 -...                    0.7
## 2232 MULTIPOLYGON (((-48135.61 -...                    0.7
## 545  MULTIPOLYGON (((-282767.2 -...                    0.9


High to Low:

head(CenDF[ rev(order(CenDF$HHInc_HousePrice_Ratio)) , ],3) #sorts the data by HHInc_HousePrice_Ratio high to low and previews top three
## Simple feature collection with 3 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -377587.6 ymin: -2055704 xmax: 2163853 ymax: -128535.4
## Projected CRS: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
##      GEOID                   NAME HHIncome HouseValue
## 3141 15005 Kalawao County, Hawaii    61750         NA
## 1690 48301   Loving County, Texas    80938         NA
## 2662 36047 Kings County, New York    52782     623900
##                            geometry HHInc_HousePrice_Ratio
## 3141 MULTIPOLYGON (((-374079.7 -...                     NA
## 1690 MULTIPOLYGON (((-315615.9 -...                     NA
## 2662 MULTIPOLYGON (((2158769 -13...                   11.8


#interactive table
DT::datatable(CenDF)


Step 4: Map the Data

library(tidyr)

ggplot(CenDF)+
  geom_sf(aes(fill=HHInc_HousePrice_Ratio), color=NA)+
  coord_sf(datum=NA)+
  labs(title="House-Price-to-Income Ratio", caption="Source: US Census/ACS5 2017")+
  scale_fill_viridis(direction=-1)



# Questions

Question 1: Exploration.

1a. In step 3 above, what is the county with the highest (and lowest) house-price-to-income ratio? What is the average house-price-to-income ratio across all US counties over the time period?

- Answer: Highest - Kings County, NY
          Lowest - Oglala Lakota County, South Dakota
          
  The average house-price-to-income ratio across all US is 2.8
average2017 = mean(CenDF$HHInc_HousePrice_Ratio, na.rm = TRUE)


1b. Where is Maricopa County on the list? How many years of median income will it take to buy a home in Maricopa County?

- Answer: Maricopa County is 409th highest (two missing values) out of 3220 counties in the US with a ratio of 3.8. 
DT::datatable(CenDF)


1c. Where is Los Angeles on the list and how does its ranking compare to the list reported in the Citylab report report by Richard Florida?

- Answer: Los Angeles is 12th highest out of 3220 counties in the US with a ratio of 8.1. The article by Richard Florida claims Los Angeles is 1st with a ratio of 9.6. Fun Fact, San Francisco is 6th with a ratio of 9.6.   



### Question 2: Temporal analysis.

1a. Go back to Step 1 above, and enter in 2012 as the year value, which refers to the 2008-2012 5-year ACS estimate. The time period corresponds to the aftermath of the 2007/08 Great Financial Crisis. Repeat Steps 2-4, and re-answer Question 1 above for the 2008-2012 time period.

1a - Answer: Highest - Kings County, NY
             Lowest - Shannon County, South Dakota
     The average ratio is 2.8
1b - Answer: Maricopa - 3.6, 527 out of 3143
1c - Answer: Los Angeles - 7.9, 17 out of 3143 

Code and Output:

CenDF2012 <- get_acs(geography = "county", 
                 year=2012,
                 survey = "acs5",
                 variable = c("B19013_001", "B25097_001"),
                 geometry = T,
                 shift_geo = T)

CenDF2012 <- CenDF2012 %>%
  mutate (variable=case_when(
    variable=="B19013_001" ~ "HHIncome",
    variable=="B25097_001" ~ "HouseValue")) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  mutate(HHInc_HousePrice_Ratio = round(HouseValue/HHIncome,1))

ggplot(CenDF)+
  geom_sf(aes(fill=HHInc_HousePrice_Ratio), color=NA)+
  coord_sf(datum=NA)+
  labs(title="House-Price-to-Income Ratio", caption="Source: US Census/ACS5 2017")+
  scale_fill_viridis(direction=-1)

DT::datatable(CenDF2012)
average2012 = mean(CenDF2012$HHInc_HousePrice_Ratio, na.rm = T)


2b. Compare and contrast the findings over the different time periods. Calculate the change in the housing value-to-income ratio from 2008-2012 to 2013-2017. What is the change? Did the average house-price-to-income ratio increase or decrease? Did the house-price-to-income ratio increase (or decrease) for Maricopa County and Los Angeles county?

- Answer: % change = +1.53%
  Increase for all US (+1.53%), Maricopa (+5.55%), and Los Angeles (+2.53%)
  
change = (average2012-average2017)/(average2017)*100
change
## [1] 1.53434


Question 3: High-Resolution Analysis.

3a. Go back to Step 1 above and keep 2017 as the year value. This time change geography value from county to tract. Also add state = "AZ" and county = "Maricopa County" within the get_acs function. Repeat Steps 2-4. What are the summary statistics (min, max, median, mean, sd) for the house-price-to-income ratio in Maricopa county?

- Answer: see table output
CenDF_az <- get_acs(geography = "tract", 
                 year=2017,
                 survey = "acs5",
                 variable = c("B19013_001", "B25097_001"),
                 county="Maricopa County",
                 state = "AZ",
                 geometry = T)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |=======                                                               |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |===================                                                   |  26%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |=========================                                             |  35%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |============================================                          |  62%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |====================================================================  |  98%
  |                                                                            
  |======================================================================| 100%
CenDF_az <- CenDF_az %>%
  mutate (variable=case_when(
    variable=="B19013_001" ~ "HHIncome",
    variable=="B25097_001" ~ "HouseValue")) %>%
  select(-moe) %>%
  spread(variable, estimate) %>%
  mutate(HHInc_HousePrice_Ratio = round(HouseValue/HHIncome,1))

ggplot(CenDF_az, aes(fill=HHInc_HousePrice_Ratio))+
  geom_sf(color="white")+
  theme_void() + theme(panel.grid.major = element_line(colour='transparent'))+
  scale_fill_distiller(palette="Reds", direction=1, name="Estimate")+
  labs(title="House-Price-to-Income Ratio: Cache County, UT", caption="Source: US Census/ACS5 2017")+
  NULL


library(pander)
summary(CenDF_az$HHInc_HousePrice_Ratio) %>% pander()
Min. 1st Qu. Median Mean 3rd Qu. Max. NA’s
0.3 2.9 3.3 3.623 4.1 14.2 24
sd(CenDF_az$HHInc_HousePrice_Ratio, na.rm = T)
## [1] 1.329234


3b. Compare and contrast the findings for the different levels of geography. How does the minimum, maxim, and mean value for census tracts within Maricopa county compare to results found in Question 1 looking at county-level data?

- Answer: The census tracts data for Maricopa county show a much lower median value that the county-level US data. The percent change is very close, but, the numeric measure of income to home value ratio is very different (much smaller) when looking at smaller tracts.