GIS Research Project Blog

Author

Brian Surratt

Published

May 3, 2023

Blog Post 1

Research project idea (500 words). Here you will describe the idea for your research project, including some background on the topic, a description of your data source(s) and a description of your variable(s). You will also state your research questions and goals related to your project.

In December 2014, Minneapolis, Minnesota passed a city ordinance allowing for the construction of accessory dwelling units (ADUs, also known as “granny flats”) on existing single-family lots. [1] The ordinance was the first in a series of steps to address the cost of housing in the city by increasing housing supply. In 2018, Minneapolis continued this approach by banning single-family zoning, thus allowing up to three dwelling units on each residential lot.

The 2014 ordinance allows single-family homeowners to construct ADUs on their property. Theoretically, this should add housing supply to the city and apply downward pressure on housing costs. Critics argued ADUs wouldn’t make a significant impact on housing supply since they are relatively expensive to build (around $100,000 each at the time) and the property owner had to continue to reside in either the existing home or the ADU. In other words, they would primarily be used by family members and would not enter the general rental market.

The 2018 ordinance banning single-family zoning was one of the first of its kind. Single-family zoning is highly valued by homeowners, but advocates of high-density housing argue it should be phased out. It is highly unusual for existing single-family zoning to be modified, so Minneapolis is entering uncharted territory in housing policy.

This research project will analyze changes in rental affordability in Minneapolis between 2014, when the ADU ordinance was passed, and 2020. Comparing housing costs before and after the ADU ordinance and the single-family zoning ban will provide evidence of the effects of these changes in housing policy. The research question is: How did rental affordability change in Minneapolis, Minnesota, five years after the city passed an ordinance allowing ADUs on single-family lots in December 2014 and two years after the city banned single-family zoning in 2018?

I will map three variables from the U.S. Census Bureau American Community Survey Public Use Microdata (2014 and 2020) related to housing costs in Minneapolis, Minnesota, by census tract. The variables are the percent of renters in each tract (derived from the TEN variable, housing tenure), the percent of renters who are cost burdened (derived from GRPIP, gross rent as a percentage of household income), and the change in the percent of renters who are cost burdened from 2014 to 2020. “Cost burdened” is defined as paying greater than 30% of income towards housing costs.

I will produce the following maps and tables:

  • Map 1: Percent of all residents who are renters in each census tract in 2014.
  • Map 2: Percent of all residents who are renters in each census tract in 2020.
  • Map 3: Percent of renters in each census tract who are cost burdened in 2014.
  • Map 4: Percent of renters in each census tract who are cost burdened in 2020.
  • Map 5: Change in percent of renters who are cost burdened from 2014 to 2020.
  • Table 1: List of census tracts with percent cost burdened in 2014, in 2020, and change in percentage over that period.

References

  1. ACCESSORY DWELLING UNIT ZONING CODE TEXT AMENDMENT PASSES IN MINNEAPOLIS. (2014, Dec 09). US Fed News Service, Including US State News Retrieved from https://login.libweb.lib.utsa.edu/login?url=https://www.proquest.com/wire-feeds/accessory-dwelling-unit-zoning-code-text/docview/1634265244/se-2

Blog Post 2

Description of data and GIS processes (250-500 words, 1-2 tables, 2-3 figures). Here you will describe your data, including the source and origin, as well as a plan for what GIS operations you will be conducting during the course of your project.

Description of data

The source of data is the U.S. Census Bureau’s American Community Survey Public Use Microdata from 2014 and 2019. I will access the data via the get_pums() function from the tidycensus package in R. The variables I will use are the following:

  • PUMA: Public use microdata area code. The PUMAs for Minneapolis are 1405, 1406, and 1407. In the 2014 data, this is a 4-character variable; in the 2019 data, it is a 5-character variable.
  • TYPE: Type of unit, filtered for “1” which is a housing unit. This removes group quarters.
  • TEN: This variable is housing tenure and can have four values, 1 = “owned with mortgage,” 2 = “owned free and clear,” 3 = “rented,” and 4 = “occupied without payment of rent.”
  • GRPIP: Gross rent as a percentage of household income past 12 months.
  • RELP: This is the relationship of the respondent to the household reference person. For the 2014 data this must be filtered to “00” to limit to one response per household. For the 2019 data, the variable name is RELSHIPP and the filter must be set to “20”.
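Because the PUMA code has four characters in the 2014 data and five in the 2019 data, the two years need a common format before they can be compared or joined to boundary files. Here is a minimal sketch in base R, assuming the codes arrive as character strings. The helper name pad_puma() is hypothetical and is not part of tidycensus.

```r
# Hypothetical helper: left-pad PUMA codes with zeros to five characters.
pad_puma <- function(puma) {
  puma <- as.character(puma)
  # Prepend as many zeros as are needed to reach five characters.
  paste0(strrep("0", pmax(0, 5 - nchar(puma))), puma)
}

pad_puma(c("1405", "01406", "1407"))
```

Codes that already have five characters pass through unchanged, so the same helper could be applied to both years.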

Plan for GIS operations

For the GIS operations, I will download the PUMA boundary shapefiles with tigris::pumas(), called through purrr::map(), and join them to the Minneapolis housing data. I will produce the following maps:

  • Map 1: Percent of all households who are renters in each PUMA in 2014
  • Map 2: Percent of all households who are renters in each PUMA in 2019
  • Map 3: Percent of renters in each PUMA who are cost burdened in 2014
  • Map 4: Percent of renters in each PUMA who are cost burdened in 2019
  • Map 5: Change in percent of renters who are cost burdened from 2014 to 2019

Cleaning the data and initial summary statistics

First, let’s download and clean Minnesota PUMS data for 2014.

Code
mnpums2014 <- get_pums(
  variables = c("PUMA", "TYPE", "TEN", "GRPIP", "HHT", "RELP"),
  state = "MN",
  variables_filter = list(SPORDER = 1, TYPE = 1), # SPORDER = 1 gets households, TYPE = 1 gets housing units and eliminates group quarters.
  #puma = c(1405, 1406, 1407), # I can't figure out how to select by puma.  Maybe capitalize "PUMA"?
  survey = "acs1",
  year = 2014
  )
Getting data from the 2014 1-year ACS Public Use Microdata Sample

This is a sample of all respondents in Minnesota for 2014. Let’s check the sample size.

Code
nrow(mnpums2014)
[1] 21524

Let’s create a new dataframe with only the PUMAs that cover Minneapolis (1405, 1406, and 1407) in 2014.

Code
dat2014 <- mnpums2014 %>%
  filter(PUMA %in% c("1405", "1406", "1407"))

Let’s modify the PUMA variable in the 2014 dataframe so the PUMA is 5 digits.

Code
dat2014 <- dat2014 %>%
  mutate(PUMA = paste0("0", PUMA)) # Prepend a zero so "1405" becomes "01405".

Let’s check the sample size.

Code
nrow(dat2014)
[1] 1023

Let’s check the distribution by PUMA.

Code
tabyl(dat2014$PUMA)
 dat2014$PUMA   n   percent
        01405 344 0.3362659
        01406 331 0.3235582
        01407 348 0.3401760

In order to get just one observation per household, we need to ensure RELP == 0 for every record.

Code
tabyl(dat2014$RELP)
 dat2014$RELP    n percent
            0 1023       1
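A complementary check is to confirm that no household serial number (SERIALNO) appears more than once. The sketch below uses made-up records rather than the real extract; the data frame and its values are illustrative only.

```r
# Toy records (illustrative values, not real PUMS data).
toy <- data.frame(
  SERIALNO = c("A1", "A2", "A3", "A3"),
  SPORDER  = c(1, 1, 1, 2)
)

# Keep only the first person record in each household.
householders <- toy[toy$SPORDER == 1, ]

# anyDuplicated() returns 0 when every SERIALNO is unique.
anyDuplicated(householders$SERIALNO)
```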

So now we know there is only one observation per household. Let’s see the distribution of renters vs. non-renters. 1 is owned with a mortgage, 2 is owned free and clear, 3 is rented, 4 is occupied without payment of rent.

Code
tabyl(dat2014$TEN)
 dat2014$TEN   n     percent
           1 401 0.391984360
           2 158 0.154447703
           3 461 0.450635386
           4   3 0.002932551

Let’s recode the TEN variable categories to reflect renters and non-renters.

Code
dat2014 <- dat2014 %>%
  mutate(tenure = (ifelse(.$TEN == 3, 'Renter', 'Non-renter')))

tabyl(dat2014$tenure)
 dat2014$tenure   n   percent
     Non-renter 562 0.5493646
         Renter 461 0.4506354

Making a data frame with percent of renters in 2014

Code
renters2014 <- dat2014%>%
  group_by(PUMA, tenure) %>%
  summarise(N = sum(WGTP)) %>%
  mutate(Proportion = N/sum(N)) %>% 
  filter(tenure == 'Renter')
`summarise()` has grouped output by 'PUMA'. You can override using the
`.groups` argument.
Code
renters2014$Proportion <- renters2014$Proportion*100

renters2014
# A tibble: 3 × 4
# Groups:   PUMA [3]
  PUMA  tenure     N Proportion
  <chr> <chr>  <dbl>      <dbl>
1 01405 Renter 25280       47.7
2 01406 Renter 21451       40.9
3 01407 Renter 40292       60.8
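Each sampled record stands in for WGTP households, so the renter share is the sum of renter weights divided by the sum of all weights in the PUMA. The sketch below reproduces that calculation in base R on toy data. The weights are invented for illustration and are not PUMS values.

```r
# Toy household records (illustrative weights, not real PUMS data).
toy <- data.frame(
  PUMA   = c("01405", "01405", "01405", "01406", "01406"),
  tenure = c("Renter", "Non-renter", "Renter", "Renter", "Non-renter"),
  WGTP   = c(100, 150, 50, 80, 120)
)

# Weighted households by PUMA, in total and for renters only.
is_renter      <- toy$tenure == "Renter"
total_by_puma  <- tapply(toy$WGTP, toy$PUMA, sum)
renter_by_puma <- tapply(toy$WGTP[is_renter], toy$PUMA[is_renter], sum)

# Weighted percent of households that rent in each PUMA.
pct_renters <- 100 * renter_by_puma / total_by_puma
pct_renters
```

An unweighted count would treat every record equally, which is why the weighted shares can differ from the raw sample percentages shown earlier.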

Let’s filter only renters and check the distribution of rent as a percentage of income.

Code
dat2014 <- dat2014 %>%
  filter(tenure == "Renter")

dat2014$GRPIP <- (as.numeric(dat2014$GRPIP))/100

hist(dat2014$GRPIP)

Let’s create a new variable called “cost_burden” with three levels:

  • Paying less than 30% of income on rent is “Not cost burdened”.
  • Paying at least 30% but less than 50% of income on rent is “Cost burdened”.
  • Paying 50% or more of income on rent is “Extremely cost burdened”.
Code
dat2014 <- dat2014 %>%
  mutate(cost_burden = case_when(.$GRPIP <.30 ~ "Not cost burdened",
                                 .$GRPIP >=.30 & .$GRPIP <.50 ~ "Cost burdened",
                                 .$GRPIP >=.50 ~ "Extremely cost burdened",
                                 )
         )
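The same three-way split can also be sketched with base R's cut(), which makes the handling of the 30% and 50% boundaries explicit. This is an alternative illustration, not the code used for the results; the helper name classify_burden() is hypothetical.

```r
# Hypothetical helper: classify rent-to-income proportions into three levels.
classify_burden <- function(grpip) {
  cut(grpip,
      breaks = c(-Inf, 0.30, 0.50, Inf),
      labels = c("Not cost burdened", "Cost burdened", "Extremely cost burdened"),
      right = FALSE) # Intervals are closed on the left: [lower, upper).
}

# Values at exactly 0.30 and 0.50 fall into the higher category.
as.character(classify_burden(c(0.29, 0.30, 0.49, 0.50, 1.01)))
```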

Let’s check the distribution of rent cost burden in Minneapolis in 2014.

Code
tabyl(dat2014$cost_burden)
     dat2014$cost_burden   n   percent
           Cost burdened  95 0.2060738
 Extremely cost burdened 136 0.2950108
       Not cost burdened 230 0.4989154

Now, let’s download and clean Minnesota PUMS data for 2019. (Standard 1-year PUMS data were not released for 2020, and PUMA geographies are not available for 2021, so 2019 is the most recent year I can use.)

Code
mnpums2019 <- get_pums(
  variables = c("PUMA", "TYPE", "TEN", "GRPIP", "HHT", "RELSHIPP"),
  state = "MN",
  variables_filter = list(SPORDER = 1, TYPE = 1), # SPORDER = 1 gets households, TYPE = 1 gets housing units and eliminates group quarters.
  #puma = c(1405, 1406, 1407), # I can't figure out how to select by puma.  Maybe capitalize "PUMA".
  survey = "acs1",
  year = 2019
  )
Getting data from the 2019 1-year ACS Public Use Microdata Sample

This is a sample of PUMS respondents in Minnesota for 2019. Let’s check the sample size.

Code
nrow(mnpums2019)
[1] 22576

First, let’s create a new dataframe with only the PUMAs that cover Minneapolis (1405, 1406, and 1407) in 2019.

Code
dat2019 <- mnpums2019 %>%
  filter(PUMA %in% c("01405", "01406", "01407"))

Let’s check the sample size.

Code
nrow(dat2019)
[1] 1054

Let’s check the distribution by PUMA.

Code
tabyl(dat2019$PUMA)
 dat2019$PUMA   n   percent
        01405 350 0.3320683
        01406 336 0.3187856
        01407 368 0.3491461

In order to get just one observation per household, we need to ensure RELSHIPP == 20 for every record.

Code
tabyl(dat2019$RELSHIPP)
 dat2019$RELSHIPP    n percent
               20 1054       1

So now we know there is only one observation per household. Let’s see the distribution of renters vs. non-renters. 1 is owned with a mortgage, 2 is owned free and clear, 3 is rented, 4 is occupied without payment of rent.

Code
tabyl(dat2019$TEN)
 dat2019$TEN   n   percent
           1 413 0.3918406
           2 174 0.1650854
           3 461 0.4373814
           4   6 0.0056926

Let’s recode the TEN variable categories to reflect renters and non-renters.

Code
dat2019 <- dat2019 %>%
  mutate(tenure = (ifelse(.$TEN == 3, 'Renter', 'Non-renter')))

tabyl(dat2019$tenure)
 dat2019$tenure   n   percent
     Non-renter 593 0.5626186
         Renter 461 0.4373814

Making a data frame with percent of renters in 2019.

Code
renters2019 <- dat2019%>%
  group_by(PUMA, tenure) %>%
  summarise(N = sum(WGTP)) %>%
  mutate(Proportion = N/sum(N)) %>% 
  filter(tenure == 'Renter')
`summarise()` has grouped output by 'PUMA'. You can override using the
`.groups` argument.
Code
renters2019$Proportion <- renters2019$Proportion*100

renters2019
# A tibble: 3 × 4
# Groups:   PUMA [3]
  PUMA  tenure     N Proportion
  <chr> <chr>  <dbl>      <dbl>
1 01405 Renter 27542       49.5
2 01406 Renter 23169       42.8
3 01407 Renter 44253       59.6

Let’s filter only renters and check the distribution of rent as a percentage of income.

Code
dat2019 <- dat2019 %>%
  filter(tenure == "Renter")

dat2019$GRPIP <- (as.numeric(dat2019$GRPIP))/100

hist(dat2019$GRPIP)

Let’s create a new variable called “cost_burden” with three levels:

  • Paying less than 30% of income on rent is “Not cost burdened”.
  • Paying at least 30% but less than 50% of income on rent is “Cost burdened”.
  • Paying 50% or more of income on rent is “Extremely cost burdened”.
Code
dat2019 <- dat2019 %>%
  mutate(cost_burden = case_when(.$GRPIP <.30 ~ "Not cost burdened",
                                 .$GRPIP >=.30 & .$GRPIP <.50 ~ "Cost burdened",
                                 .$GRPIP >=.50 ~ "Extremely cost burdened",
                                 )
         )

Let’s check the distribution of rent cost burden in Minneapolis in 2019.

Code
tabyl(dat2019$cost_burden)
     dat2019$cost_burden   n   percent
           Cost burdened  86 0.1865510
 Extremely cost burdened  94 0.2039046
       Not cost burdened 281 0.6095445

Now I need to merge the 2014 and 2019 dataframes.

Let’s rename RELSHIPP as RELP in the 2019 dataframe.

Code
dat2019 <- dat2019 %>%
  rename(RELP = RELSHIPP) # rename() takes new_name = old_name.

Let’s add a year column to both dataframes.

Code
dat2014 <- dat2014 %>%
  mutate(year = "2014")

dat2019 <- dat2019 %>%
  mutate(year = "2019")

Let’s merge these dataframes into one.

Code
dat <- rbind(dat2014, dat2019)

head(dat)
# A tibble: 6 × 14
  SERIALNO  WGTP PWGTP PUMA  TEN   GRPIP HHT   RELP  SPORDER TYPE  ST    tenure
  <chr>    <dbl> <dbl> <chr> <chr> <dbl> <chr> <chr> <chr>   <chr> <chr> <chr> 
1 2339        96    96 01405 3      0.17 6     0     1       1     27    Renter
2 10042      106   106 01407 3      0.24 7     0     1       1     27    Renter
3 22566      132   132 01405 3      0.31 3     0     1       1     27    Renter
4 25036      350   350 01405 3      0.22 3     0     1       1     27    Renter
5 27547      319   319 01406 3      0.21 3     0     1       1     27    Renter
6 30186      178   178 01407 3      0.97 6     0     1       1     27    Renter
# ℹ 2 more variables: cost_burden <chr>, year <chr>

Let’s make a table showing the change in rent cost burden in the city from 2014 to 2019. This code weights the data with WGTP, the household weight variable.

Code
tab1 <- dat %>% 
  group_by(year, cost_burden) %>% 
  summarise(N = sum(WGTP)) %>%
  mutate(Proportion = N/sum(N))
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
Code
gt(tab1)
year  cost_burden                  N  Proportion
2014  Cost burdened            17160   0.1971892
2014  Extremely cost burdened  28138   0.3233398
2014  Not cost burdened        41725   0.4794709
2019  Cost burdened            16108   0.1696222
2019  Extremely cost burdened  20488   0.2157449
2019  Not cost burdened        58368   0.6146329

This table shows the share of “Cost burdened” renters declined from 19.7% in 2014 to 17.0% in 2019. The share of “Extremely cost burdened” renters declined from 32.3% in 2014 to 21.6% in 2019. The share of “Not cost burdened” renters rose from 47.9% in 2014 to 61.5% in 2019.
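The shares in the table can be recomputed from the weighted counts as a quick arithmetic check. The sketch below copies the N column by hand and reports each share and its change in percentage points. The vector names are only for illustration.

```r
# Weighted renter households by cost-burden level, copied from the table above.
n_2014 <- c(cost_burdened = 17160, extremely = 28138, not_burdened = 41725)
n_2019 <- c(cost_burdened = 16108, extremely = 20488, not_burdened = 58368)

# Share of weighted renter households in each level, in percent.
share_2014 <- 100 * n_2014 / sum(n_2014)
share_2019 <- 100 * n_2019 / sum(n_2019)

# Change from 2014 to 2019, in percentage points.
round(rbind(share_2014, share_2019, change = share_2019 - share_2014), 1)
```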

Let’s create a new variable called “cost_burden2” with two levels:

  • Paying less than 30% of income on rent is “Not cost burdened”.
  • Paying 30% or more of income on rent is “Cost burdened”.
Code
dat <- dat %>%
  mutate(cost_burden2 = case_when(.$GRPIP <.30 ~ "Not cost burdened",
                                  .$GRPIP >=.30 ~ "Cost burdened",
                                 )
         )

Let’s make a table showing the change in rent cost burden (cost_burden2) in each PUMA from 2014 to 2019. This code weights the data with WGTP, the household weight variable.

Code
tab2 <- dat %>% 
  group_by(PUMA, year, cost_burden2) %>% 
  summarise(N = sum(WGTP)) %>%
  mutate(Proportion = N/sum(N))
`summarise()` has grouped output by 'PUMA', 'year'. You can override using the
`.groups` argument.
Code
gt(tab2)
PUMA   year  cost_burden2           N  Proportion
01405  2014  Cost burdened      14819   0.5861946
01405  2014  Not cost burdened  10461   0.4138054
01405  2019  Cost burdened      12676   0.4602425
01405  2019  Not cost burdened  14866   0.5397575
01406  2014  Cost burdened      12346   0.5755443
01406  2014  Not cost burdened   9105   0.4244557
01406  2019  Cost burdened       9089   0.3922914
01406  2019  Not cost burdened  14080   0.6077086
01407  2014  Cost burdened      18133   0.4500397
01407  2014  Not cost burdened  22159   0.5499603
01407  2019  Cost burdened      14831   0.3351411
01407  2019  Not cost burdened  29422   0.6648589

Downloading the shape files for 2014 and 2019.

Downloading the shapefile for Minnesota for 2014. I tried to use the FIPS code for Hennepin County, but the shape file is for all of Minnesota.

Code
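# map() comes from purrr; it calls tigris::pumas() once for each element of its first argument and returns a list.
# cb = TRUE requests the generalized cartographic boundary file instead of the detailed TIGER/Line file.
# The result is an sf object, which ggplot2, mapview, and tmap can all plot.

hmap2014 <- map("27053", # Hennepin County FIPS code; the returned file still covers every PUMA in Minnesota.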
                tigris::pumas,
                class = "sf",
                cb = TRUE,
                year = 2014) %>%
  reduce(rbind) # This changes the list to a dataframe.  left_join won't work if you don't do this.

hcpumas2014 <- as.vector(hmap2014$PUMACE10)

Downloading the shapefile for Minnesota for 2019. I tried to use the FIPS code for Hennepin County, but the shape file is for all of Minnesota.

Code
hmap2019 <- map("27053",
                tigris::pumas,
                class = "sf",
                cb = TRUE,
                year = 2019) %>%
  reduce(rbind)

hcpumas2019 <- as.vector(hmap2019$PUMACE10)

# Cartographic boundary PUMAs are not yet available for years after 2019. Use the argument `year = 2019` instead to request your data.

Use mapview to view the shapefile. It shows all the PUMAs in the state of Minnesota.

Code
mapview(hmap2014)

Filter for the 3 PUMAs I am interested in for 2014 and create a quick map.

Code
mlps2014_sf <- hmap2014 %>%
  filter(PUMACE10 %in% c('01405', '01406', '01407'))

mapview(mlps2014_sf)

Should I expand the mapping/analysis to all of Hennepin County?

Using ggplot to make a map showing percent of renters in 2014.

Code
# Proportion is already on a 0-100 scale, so scale = 1 keeps label_percent() from multiplying by 100 again.
map1 <- mlps2014_sf %>%
  left_join(renters2014, by = c("PUMACE10" = "PUMA")) %>%
  ggplot(aes(fill = Proportion)) +
  geom_sf() +
  scale_fill_viridis_b(
    name = NULL,
    option = "magma",
    labels = scales::label_percent(accuracy = 1, scale = 1)
    ) +
  ggtitle("Percent of households who rent, 2014",
          subtitle = "Source: U.S. Census Bureau, 2014 ACS 1-year PUMS") +
  theme_void()

map1

Using ggplot to make a map showing percent of renters in 2019.

Code
# The 2014 boundary object is reused here; Proportion is already on a 0-100 scale, hence scale = 1.
map2 <- mlps2014_sf %>%
  left_join(renters2019, by = c("PUMACE10" = "PUMA")) %>%
  ggplot(aes(fill = Proportion)) +
  geom_sf() +
  scale_fill_viridis_b(
    name = NULL,
    option = "magma",
    labels = scales::label_percent(accuracy = 1, scale = 1)
    ) +
  ggtitle("Percent of households who rent, 2019",
          subtitle = "Source: U.S. Census Bureau, 2019 ACS 1-year PUMS") +
  theme_void()

map2

Next, use ggplot to make maps of the percent who are cost burdened in 2014 and 2019

Make a dataframe grouped by PUMA showing percent of renters who are cost burdened in 2014.

Code
cb2014 <- dat %>%
  filter(year == 2014) %>%
  group_by(PUMA, cost_burden2) %>%
  summarise(N = sum(WGTP)) %>%
  mutate(Proportion = N/sum(N)) %>% 
  filter(cost_burden2 == 'Cost burdened')
`summarise()` has grouped output by 'PUMA'. You can override using the
`.groups` argument.
Code
cb2014$Proportion <- cb2014$Proportion*100

cb2014
# A tibble: 3 × 4
# Groups:   PUMA [3]
  PUMA  cost_burden2      N Proportion
  <chr> <chr>         <dbl>      <dbl>
1 01405 Cost burdened 14819       58.6
2 01406 Cost burdened 12346       57.6
3 01407 Cost burdened 18133       45.0

Join my 2014 cost burden data with the shapefile and make a map.

Make a dataframe grouped by PUMA showing percent of renters who are cost burdened in 2019.

Code
cb2019 <- dat %>%
  filter(year == 2019) %>%
  group_by(PUMA, cost_burden2) %>%
  summarise(N = sum(WGTP)) %>%
  mutate(Proportion = N/sum(N)) %>% 
  filter(cost_burden2 == 'Cost burdened')
`summarise()` has grouped output by 'PUMA'. You can override using the
`.groups` argument.
Code
cb2019$Proportion <- cb2019$Proportion*100

cb2019
# A tibble: 3 × 4
# Groups:   PUMA [3]
  PUMA  cost_burden2      N Proportion
  <chr> <chr>         <dbl>      <dbl>
1 01405 Cost burdened 12676       46.0
2 01406 Cost burdened  9089       39.2
3 01407 Cost burdened 14831       33.5

Join my 2019 cost burden data with the shapefile and make a map.

Blog Post 3

Preliminary results (500-750 words, 1-2 tables, 1-2 figures). This is where you will describe the preliminary results of your project. Here you will describe the justification of which techniques you used, and describe the preliminary results of your analysis based on the procedures you conducted.

Percent of renters in each PUMA in 2014

Code
gt(renters2014)
PUMA   tenure      N  Proportion
01405  Renter  25280    47.66394
01406  Renter  21451    40.89254
01407  Renter  40292    60.82179

In 2014, 47.7% of households in PUMA 1405 were renters. 40.9% of households in PUMA 1406 were renters. 60.8% of households in PUMA 1407 were renters.

Code
tmap1 <- mlps2014_sf %>%
  left_join(renters2014, by = c("PUMACE10" = "PUMA"))

tm_shape(tmap1)+
  tm_polygons("Proportion",
              title="Percent of renters in each PUMA",
              palette="Reds"
              )+
  tm_format("World",
            title="Percent of renters in 2014",
            legend.format = list(digits = 2),
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Percent of renters in each PUMA in 2019

Code
gt(renters2019)
PUMA   tenure      N  Proportion
01405  Renter  27542    49.52706
01406  Renter  23169    42.80805
01407  Renter  44253    59.64981

In 2019, 49.5% of households in PUMA 1405 were renters. 42.8% of households in PUMA 1406 were renters. 59.6% of households in PUMA 1407 were renters.

Code
tmap2 <- mlps2014_sf %>%
  left_join(renters2019, by = c("PUMACE10" = "PUMA"))

tm_shape(tmap2)+
  tm_polygons("Proportion",
              title="Percent of renters in each PUMA",
              palette="Reds"
              )+
  tm_format("World",
            title="Percent of renters in 2019",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Percent of renters who are cost burdened in each PUMA in 2014.

Code
gt(cb2014)
PUMA   cost_burden2       N  Proportion
01405  Cost burdened  14819    58.61946
01406  Cost burdened  12346    57.55443
01407  Cost burdened  18133    45.00397

In 2014, 58.6% of renters were cost burdened in PUMA 1405. 57.6% of renters were cost burdened in PUMA 1406. 45.0% of renters were cost burdened in PUMA 1407.

Code
tmap3 <- mlps2014_sf %>%
  left_join(cb2014, by = c("PUMACE10" = "PUMA"))

tm_shape(tmap3)+
  tm_polygons("Proportion",
              title="Percent of renters who are cost burdened",
              palette="Reds"
              )+
  tm_format("World",
            title="Percent who are cost burdened in 2014",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Percent of renters who are cost burdened in each PUMA in 2019.

Code
gt(cb2019)
PUMA   cost_burden2       N  Proportion
01405  Cost burdened  12676    46.02425
01406  Cost burdened   9089    39.22914
01407  Cost burdened  14831    33.51411

In 2019, 46.0% of renters were cost burdened in PUMA 1405. 39.2% of renters were cost burdened in PUMA 1406. 33.5% of renters were cost burdened in PUMA 1407.

Code
tmap4 <- mlps2014_sf %>%
  left_join(cb2019, by = c("PUMACE10" = "PUMA"))

tm_shape(tmap4)+
  tm_polygons("Proportion",
              title="Percent of renters who are cost burdened",
              palette="Reds"
              )+
  tm_format("World",
            title="Percent who are cost burdened in 2019",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Create a data frame to find the difference in cost burden between 2014 and 2019.

Code
fiveyr <- cb2019 %>% 
  select(PUMA, Proportion)

fiveyr <- rename(fiveyr, Prop2019 = Proportion)

fiveyr <- merge(fiveyr, cb2014)

fiveyr <- fiveyr %>% 
  select(PUMA, Prop2019, Proportion)

fiveyr <- rename(fiveyr, Prop2014 = Proportion)

fiveyr <- fiveyr %>% 
  mutate(diff = round(((Prop2019 - Prop2014)), digits = 1))

gt(fiveyr)
PUMA Prop2019 Prop2014 diff
01405 46.02425 58.61946 -12.6
01406 39.22914 57.55443 -18.3
01407 33.51411 45.00397 -11.5

Between 2014 and 2019, the percent of renters who were cost burdened declined by 12.6 percentage points in PUMA 1405.

The proportion of renters who were cost burdened declined by 18.3 percentage points in PUMA 1406.

The proportion of renters who were cost burdened declined by 11.5 percentage points in PUMA 1407.
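The diff column subtracts one percentage from another, so it is measured in percentage points. The relative change, expressed as a percent of the 2014 value, is a different quantity. The sketch below computes both from the Prop2014 and Prop2019 values in the table above. The vector names are only for illustration.

```r
# Percent of renters who were cost burdened, copied from the table above.
prop_2014 <- c("01405" = 58.61946, "01406" = 57.55443, "01407" = 45.00397)
prop_2019 <- c("01405" = 46.02425, "01406" = 39.22914, "01407" = 33.51411)

# Absolute change, in percentage points (matches the diff column).
point_change <- round(prop_2019 - prop_2014, 1)

# Relative change, as a percent of the 2014 value.
relative_change <- round(100 * (prop_2019 - prop_2014) / prop_2014, 1)

data.frame(point_change, relative_change)
```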

Change in renters who are cost burdened from 2014 to 2019

Code
tmap5 <- mlps2014_sf %>%
  left_join(fiveyr, by = c("PUMACE10" = "PUMA"))

tm_shape(tmap5)+
  tm_polygons("diff",
              title="Change in renters who are cost burdened from 2014 to 2019",
              palette="Reds"
              )+
  tm_format("World",
            title="Change in renters who are cost burdened from 2014 to 2019",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

In this project, I analyzed the change in rental affordability in Minneapolis, Minnesota, at the PUMA level. I downloaded microdata from the U.S. Census Bureau and analyzed gross rent as a percentage of household income. “Cost burdened” was defined as paying 30% or more of income on rent. I calculated summary statistics for the three PUMAs that make up the city: 1405, 1406, and 1407.

The preliminary results are:

  • In 2014, 47.7% of households in PUMA 1405 were renters. 40.9% of households in PUMA 1406 were renters. 60.8% of households in PUMA 1407 were renters.
  • In 2019, 49.5% of households in PUMA 1405 were renters. 42.8% of households in PUMA 1406 were renters. 59.6% of households in PUMA 1407 were renters.
  • In 2014, 58.6% of renters were cost burdened in PUMA 1405. 57.6% of renters were cost burdened in PUMA 1406. 45.0% of renters were cost burdened in PUMA 1407.
  • In 2019, 46.0% of renters were cost burdened in PUMA 1405. 39.2% of renters were cost burdened in PUMA 1406. 33.5% of renters were cost burdened in PUMA 1407.
  • Between 2014 and 2019, the percent of renters who were cost burdened declined by 12.6 percentage points in PUMA 1405.
  • Between 2014 and 2019, the percent of renters who were cost burdened declined by 18.3 percentage points in PUMA 1406.
  • Between 2014 and 2019, the percent of renters who were cost burdened declined by 11.5 percentage points in PUMA 1407.

Blog Post 4

Final project discussion (500-750 words, tables and figures as necessary). Here you will describe the overall results of your project, including how your results aligned with your research questions from post 1, the overall takeaway from your project, the limitations of your project and how you could build upon this project in the future.

Introduction

On December 5, 2014, the Minneapolis City Council passed a zoning code amendment which allows accessory dwelling units (or ADUs) to be constructed on lots with single or two-family homes (“Accessory Dwelling Unit Zoning Code Text Amendment Passes in Minneapolis,” 2014). An ADU must provide a separate living space with a separate entrance. They are also known as granny flats, mother-in-law apartments, carriage houses, or casitas (Speck, 2018).

ADUs emerged as a policy option to increase housing supply and provide affordable options in areas impacted by housing shortages (Kim et al., 2023). ADUs can be created by converting portions of existing homes, constructing additions to existing homes, or constructing new structures on the same lots as existing structures (Speck, 2018). One benefit of ADUs is that they can add housing units to built-up areas, adding both new units to the supply of housing as well as providing diversity in the types of housing available.

However, ADUs are controversial in some areas due to homeowner resistance, regulatory burden, or the cost of construction (Speck, 2018). For example, many ADU ordinances require that the unit be built with off-street parking. This increases the expense for construction and takes up additional space on existing lots, which can disincentivize ADU construction (Brinig & Garnett, 2013). Due to their controversial nature, only a few municipalities nationwide have passed ADU ordinances (Kim et al., 2023; Maaoui, 2018).

One of the main goals of ADU ordinances is to improve housing affordability for renters (Speck, 2018). However, few studies have analyzed the effect of ADU ordinances on affordability. Kim et al. (2023) utilized multilevel logistic regression to examine the effects of an ADU ordinance in Los Angeles, California. They found the types of properties that constructed ADUs after the ordinance was passed were more varied in terms of lot characteristics. For example, the ordinance probably led to more ADU development on lots near bus transit. However, the study did not explore the relationship between the ordinance and changes in rent costs.

Gabbe (2019) analyzed zoning changes to address high housing prices in Los Angeles, between 2016 and 2020. The zoning changes included easing regulations on ADUs. The study concluded Los Angeles made progress in five categories of policy recommendations, including ADUs, but the study explicitly did not evaluate the effect on rent costs.

Maaoui (2018) analyzed the correlation between minority household concentration and the permitting of ADUs in unincorporated areas of King County, Washington. The study found permits were positively correlated with Black and Latino households relative to White households, and negatively correlated with Asian households. Furthermore, permits were negatively correlated with low- and high-income households, and positively correlated with middle income households. Again, the study did not investigate the effect of ADUs on rent costs.

This study contributes to the literature by calculating whether the percentage of renters who are cost burdened changed in Minneapolis between 2014, when the ADU ordinance was passed, and 2019. The year 2019 was chosen because it is the last year for which standard 1-year PUMS data are available before the COVID-19 pandemic. Because of the pandemic, the U.S. Census Bureau did not release standard 1-year PUMS data for 2020 (U.S. Census Bureau, 2022a).

The data source for this study is the American Community Survey Public Use Microdata Sample Files, also known as ACS microdata or PUMS (U.S. Census Bureau, 2021). ACS microdata consists of records with information about characteristics of individual households, with personalized information removed (U.S. Census Bureau, 2021). One-year PUMS include records for about 1 percent of the total population. The samples are divided by Public Use Microdata Areas (PUMAs). Each PUMA has a population of at least 100,000 and no more than 200,000.

The benefit of microdata is that it provides individual responses to the full range of topics in the ACS. PUMS contains approximately 200 housing-level variables, which permits data users to analyze specific population groups and create custom variables that are not available through ACS summary tables. This allows more flexibility than ACS summary data, allowing users to focus on demographic characteristics and create unique household variables.

The geographic area of analysis for this project is the city of Minneapolis, Minnesota. In order to extract a sample for the city of Minneapolis from PUMS, the PUMAs covering Minneapolis were identified as Minnesota PUMAs 1405, 1406, and 1407 (U.S. Census Bureau, 2022b). Samples of 1-year PUMS data were extracted for the years of 2014 and 2019 using the get_pums() function in R.

Research Objective

The objective of this study is to measure the change in cost burden in Minneapolis, Minnesota, between the years of 2014 and 2019. “Rent cost” is defined as rent as a percentage of household income. “Cost burden” is defined as the percent of renters whose rent exceeds 30% of their gross income for the past 12 months.

The PUMS variable GRPIP is defined as “Gross rent as a percentage of household income past 12 months” (U.S. Census Bureau, 2015). It is a 3-digit numerical variable with 001 to 100 representing 1% to 100% and the number 101 representing 101% or more. The proportion of cost burdened renters was derived from GRPIP for the three PUMAs.

Additional variables were extracted in order to conduct the analysis. The PUMS variable TEN is defined as “Tenure.” It is a one-digit numerical variable with five possible values. The value “3” indicates that the household is “rented.” This variable was used to isolate renters from homeowners.
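The derivation of the cost-burden proportion from GRPIP and TEN can be sketched in base R as follows. This is only an illustration under stated assumptions: the household records are invented, the object names (households, renters, burdened) are hypothetical, and the weighted version assumes the PUMS household weight WGTP is used, which the analysis above does not confirm.

```r
# Toy household records; every value here is invented for illustration.
households <- data.frame(
  PUMA  = c("01405", "01405", "01405", "01406", "01406", "01407", "01407", "01407"),
  TEN   = c(3, 3, 1, 3, 3, 3, 2, 3),            # 3 = rented
  GRPIP = c(45, 22, NA, 31, 30, 101, NA, 12),   # gross rent as % of income
  WGTP  = c(80, 120, 100, 90, 110, 70, 95, 130) # household weight (assumed use)
)

# Keep renters with a valid GRPIP value.
renters <- subset(households, TEN == 3 & !is.na(GRPIP))

# A renter is cost burdened when rent exceeds 30% of income.
renters$burdened <- renters$GRPIP > 30

# Unweighted percent of cost-burdened renters in each PUMA.
unweighted <- tapply(renters$burdened, renters$PUMA, mean) * 100

# Weighted percent, summing household weights instead of counting rows.
weighted <- sapply(split(renters, renters$PUMA), function(d) {
  100 * sum(d$WGTP[d$burdened]) / sum(d$WGTP)
})

print(unweighted)
print(weighted)
```

Note that a household with GRPIP equal to exactly 30 is not counted as burdened, because the definition requires rent to exceed 30% of income.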

In 2014, the median rent cost for renters in Minneapolis was 30.0% of income. In 2019, the median was 25.0%. Between 2014 and 2019, the median therefore decreased by 5.0 percentage points, meaning that rent as a proportion of income fell during that period.

The following five maps were produced to show changes between 2014 and 2019:

  • Map 1: Percent of all households who are renters in each census tract in 2014
  • Map 2: Percent of all households who are renters in each census tract in 2019
  • Map 3: Percent of renters in each census tract who are cost burdened in 2014
  • Map 4: Percent of renters in each census tract who are cost burdened in 2019
  • Map 5: Change in percent of renters who are cost burdened from 2014 to 2019

A series of statistical tests was performed to determine whether the difference in rent costs between the two years is statistically significant. First, the assumption of normality was tested for the two samples. A Shapiro-Wilk test for normality was conducted using the shapiro.test() function in R. The p-value was less than 0.05 for both samples, so the assumption of normality was rejected. Next, boxplots were produced to check visually for outliers. The boxplots showed outliers in the 2019 sample.

The lack of normality and the presence of outliers violate the assumptions of a t-test. As an alternative, a Wilcoxon rank sum test was performed using the wilcox.test() function in R. This nonparametric test compares the two samples by rank rather than by mean, so it is robust to non-normality and outliers.

Results

Code
gt(fiveyr)
PUMA    Prop2019   Prop2014    diff
01405   46.02425   58.61946   -12.6
01406   39.22914   57.55443   -18.3
01407   33.51411   45.00397   -11.5

The percent of renters who are cost burdened declined in every PUMA between 2014 and 2019. In PUMA 1405, it declined by 12.6 percentage points. In PUMA 1406, it declined by 18.3 percentage points. In PUMA 1407, it declined by 11.5 percentage points.
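The diff column can be reproduced from the two proportion columns as the 2019 value minus the 2014 value, rounded to one decimal place. The sketch below rebuilds the table from the printed values; how the original fiveyr object was actually constructed is not shown in this post, so the construction here is an assumption.

```r
# Rebuild the table from the printed values (construction is assumed).
fiveyr <- data.frame(
  PUMA     = c("01405", "01406", "01407"),
  Prop2019 = c(46.02425, 39.22914, 33.51411),
  Prop2014 = c(58.61946, 57.55443, 45.00397)
)

# Change in percentage points: 2019 share minus 2014 share.
fiveyr$diff <- round(fiveyr$Prop2019 - fiveyr$Prop2014, 1)

print(fiveyr)
```

Because both columns are already percentages, the result is a change in percentage points rather than a relative percent change.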

The Wilcoxon rank sum test of rent cost by year resulted in a two-sided p-value of 7.892e-05. This indicates that we should reject the null hypothesis that the distributions are equal and conclude that there is a statistically significant difference in rent costs between the two years. Unweighted descriptive statistics using the summarise() function indicated that the median rent cost was 30% in 2014 and 26% in 2019, a difference of about 4 percentage points. The estimated location shift between the two years is also about 4 percentage points, with a 95% confidence interval of 2 to 7 percentage points. Thus, the decline of about 4 percentage points in unweighted median rent cost from 2014 to 2019 is statistically significant.

Limitations

The study has a number of limitations. Minneapolis has a relatively small population and is covered by only three PUMAs. As a result, the sample sizes for some demographic groups are relatively small, which could have affected the calculations. Studying a population with a larger sample size, such as Hennepin County, may produce more accurate descriptive statistics. The challenge with this approach is that changes in policy often occur at smaller geographic levels, such as the city level. Therefore, small sample sizes create a challenge for researchers studying the impacts of policies on smaller geographies.

A test for statistical significance was conducted only for the sample of all renters, not for individual PUMAs. Furthermore, the study was not designed to control for other variables or to find evidence of causation. Other statistical methods, such as multivariate regression or difference-in-differences, could be used to analyze rent costs in relation to other demographic variables.

Future Directions

The scope of the research could be increased to larger geographical areas or multiple geographical areas. Increasing the geographic scope would increase the sample sizes and improve the accuracy of the statistical tests. Future research could use regression models or methods such as difference-in-differences to explore the impact of the ADU ordinance in the context of multiple variables. Regression models could include demographic characteristics such as sex, race/ethnicity, age groups, income quantiles, presence of children, and citizenship status. Other variables could include local, state, or national housing policies, as well as indexes for inequality. Similar research could be conducted on other policies intended to decrease rent costs, such as incentives to construct multifamily housing and “upzoning” neighborhoods with single-family homes. Furthermore, physical features related to rent affordability could be mapped and analyzed in GIS, including sites of ADU construction.

Projecting maps and editing legends for consistent colors and scales.

The maps are projected using EPSG 2812, a projected coordinate system for the Minnesota South zone. The scales are corrected so that the colors and legends are consistent between maps.

Code
new_tmap1 <- st_transform(tmap1, crs = 2812)
new_tmap2 <- st_transform(tmap2, crs = 2812)
new_tmap3 <- st_transform(tmap3, crs = 2812)
new_tmap4 <- st_transform(tmap4, crs = 2812)
new_tmap5 <- st_transform(tmap5, crs = 2812)
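To see why a shared set of breaks keeps the legends comparable, the values for both years can be binned with the same break vector. The sketch below uses base R's cut() rather than tmap, with the cost-burden percentages from the Results table and the breaks used for Maps 3 and 4. The object names are hypothetical, and cut() may close its intervals on a different side than tmap does, so treat this as an approximation of the classes and not as the tmap output.

```r
# Breaks shared by the 2014 and 2019 cost-burden maps.
breaks <- c(30, 35, 40, 45, 50, 55, 60)

# Percent of cost-burdened renters by PUMA, taken from the Results table.
prop2014 <- c("01405" = 58.61946, "01406" = 57.55443, "01407" = 45.00397)
prop2019 <- c("01405" = 46.02425, "01406" = 39.22914, "01407" = 33.51411)

# Bin both years with the same breaks (cut() uses right-closed intervals).
class2014 <- cut(prop2014, breaks = breaks)
class2019 <- cut(prop2019, breaks = breaks)

# Identical factor levels mean a color stands for the same range in each map.
classes <- data.frame(PUMA = names(prop2014), class2014, class2019)
print(classes)
```

If each map chose its own breaks instead, the same shade could stand for different ranges in 2014 and 2019, and the maps could not be compared by eye.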

Map 1: Percent of all households that are renters in each census tract in 2014

Code
tm_shape(new_tmap1)+
  tm_polygons(title="Percent of renters in each PUMA",
              palette="Reds",
              col = "Proportion",
              breaks = c(40, 45, 50, 55, 60, 65)
              )+
  tm_format("World",
            title="Percent of renters in 2014",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Map 2: Percent of all households that are renters in each census tract in 2019

Code
tm_shape(new_tmap2)+
  tm_polygons(title="Percent of renters in each PUMA",
              palette="Reds",
              col = "Proportion",
              breaks = c(40, 45, 50, 55, 60, 65)
              )+
  tm_format("World",
            title="Percent of renters in 2019",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Map 3: Percent of renters in each census tract who are cost burdened in 2014

Code
tm_shape(new_tmap3)+
  tm_polygons(title="Percent of renters who are cost burdened",
              palette="Blues",
              col = "Proportion",
              breaks = c(30, 35, 40, 45, 50, 55, 60)
              )+
  tm_format("World",
            title="Percent who are cost burdened in 2014",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Map 4: Percent of renters in each census tract who are cost burdened in 2019

Code
tm_shape(new_tmap4)+
  tm_polygons(title="Percent of renters who are cost burdened",
              palette="Blues",
              col = "Proportion",
              breaks = c(30, 35, 40, 45, 50, 55, 60)
              )+
  tm_format("World",
            title="Percent who are cost burdened in 2019",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Map 5: Change in percent of renters who are cost burdened from 2014 to 2019

Code
tm_shape(new_tmap5)+
  tm_polygons(title="Change in renters who are cost burdened from 2014 to 2019",
              palette="Greens",
              col = "diff",
              breaks = c(-20, -18, -16, -14, -12, -10)
              )+
  tm_format("World",
            title="Change in renters who are cost burdened from 2014 to 2019",
            legend.outside=T)+
  tm_scale_bar()+
  tm_compass()

Testing data for normality

Performing the Shapiro-Wilk Test for Normality on the variable GRPIP in the samples from 2014 and 2019.

Code
swtab <- dat %>%
  group_by(year) %>%
  summarise(`W Statistic` = shapiro.test(GRPIP)$statistic,
            `p-value` = shapiro.test(GRPIP)$p.value)

swtab
# A tibble: 2 × 3
  year  `W Statistic` `p-value`
  <chr>         <dbl>     <dbl>
1 2014          0.844  6.08e-21
2 2019          0.825  4.56e-22

Because the p-value < 0.05 for both samples, we reject the assumption of normality. The distributions are not normal.

Producing boxplots and visually checking for outliers

Code
ggplot(dat, aes(x = year, y = GRPIP, fill = year)) +
  stat_boxplot(geom ="errorbar", width = 0.5) +
  geom_boxplot(fill = "light blue") + 
  stat_summary(fun=mean, geom="point", shape=10, size=3.5, color="black") + 
  ggtitle("Boxplots of rent cost burden in 2014 and 2019") + 
  theme_bw() + theme(legend.position="none")

The boxplots show many outliers in 2019.

A lack of normality or a severe influence of outliers violates the assumptions of the independent-samples t-test and can distort its results. When that happens, other options are available.

Performing a nonparametric Mann-Whitney U test is the most popular alternative. This is also known as the Mann-Whitney-Wilcoxon or the Wilcoxon Rank Sum test. This test is considered robust to violations of normality and outliers (among others) and tests for differences in mean ranks.

Produce descriptive statistics by year.

Code
dat %>%
  select(GRPIP, year) %>%
  group_by(year) %>% 
  summarise(n = n(),
            median=median(GRPIP, na.rm = TRUE), 
            mean = mean(GRPIP, na.rm = TRUE), 
            sd = sd(GRPIP, na.rm = TRUE),
            stderr = sd/sqrt(n), 
            LCL = mean - qt(1 - (0.05 / 2), n - 1) * stderr,
            UCL = mean + qt(1 - (0.05 / 2), n - 1) * stderr,
            min=min(GRPIP, na.rm = TRUE), 
            max=max(GRPIP, na.rm = TRUE),
            IQR=IQR(GRPIP, na.rm = TRUE))
# A tibble: 2 × 11
  year      n median  mean    sd stderr   LCL   UCL   min   max   IQR
  <chr> <int>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2014    461   0.3  0.410 0.297 0.0138 0.383 0.437     0  1.01  0.35
2 2019    461   0.26 0.341 0.261 0.0122 0.317 0.365     0  1.01  0.25

Perform the Mann-Whitney U test

The following assumptions must be met in order to run a Mann-Whitney U test:

  • Treatment groups are independent of one another. Experimental units only receive one treatment and they do not overlap.
  • The response variable of interest is ordinal or continuous.
  • Both samples are random.

The dependent response variable is rent cost as a percentage of income (GRPIP).

The independent categorical variable is year.

Code
# wilcox.test() has no na.rm argument; missing values are dropped by default.
test1 <- wilcox.test(GRPIP ~ year, data = dat, paired = FALSE, exact = FALSE, conf.int = TRUE)
print(test1)

    Wilcoxon rank sum test with continuity correction

data:  GRPIP by year
W = 122212, p-value = 7.892e-05
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.02007798 0.06992372
sample estimates:
difference in location 
            0.04005175 

Interpretation of Mann-Whitney U test

The rent cost in each year is not normally distributed. Outliers exist in each group. A Mann-Whitney U test is more appropriate than a traditional independent samples t-test to compare the rent cost between the two years.

The Mann-Whitney U test results in a two-sided p-value of 7.892e-05. This indicates that we should reject the null hypothesis that the distributions are equal and conclude that there is a significant difference in rent cost between the two years. Descriptive statistics indicate that the median rent cost was 30% in 2014 and 26% in 2019, a difference of about 4 percentage points. The estimated location shift between 2014 and 2019 is also about 4 percentage points, with a 95% confidence interval of 2 to 7 percentage points. Thus, the decline of about 4 percentage points in rent cost from 2014 to 2019 is statistically significant.
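One detail worth spelling out is what the "difference in location" in the wilcox.test() output measures. According to the R documentation, it estimates the median of the differences between a value from one sample and a value from the other, not the difference between the two sample medians. The sketch below illustrates the distinction with two small invented samples. The values and object names are hypothetical, and the samples are kept small and free of ties so that R uses its exact method.

```r
# Invented rent-cost shares (fractions of income) for two small samples.
rent_2014 <- c(0.30, 0.45, 0.52, 0.61, 0.28)
rent_2019 <- c(0.22, 0.35, 0.31, 0.40, 0.19)

# Every pairwise difference between a 2014 value and a 2019 value.
pairwise_diffs <- as.vector(outer(rent_2014, rent_2019, "-"))

# Median of the pairwise differences (the Hodges-Lehmann estimate).
hl_estimate <- median(pairwise_diffs)

# Difference between the two sample medians, for comparison.
diff_of_medians <- median(rent_2014) - median(rent_2019)

# The estimate reported by wilcox.test() when conf.int = TRUE.
fit <- wilcox.test(rent_2014, rent_2019, conf.int = TRUE)

print(hl_estimate)
print(unname(fit$estimate))
print(diff_of_medians)
```

In this toy example the two quantities differ, which is why the confidence interval from the test is best read as an interval for the location shift and not for the difference in medians.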

References

Accessory Dwelling Unit zoning Code Text Amendment Passes in Minneapolis. (2014, December 5). Targeted News Service. ProQuest One Academic. https://login.libweb.lib.utsa.edu/login?url=https://www.proquest.com/wire-feeds/accessory-dwelling-unit-zoning-code-text/docview/1632227679/se-2?accountid=7122

Brinig, M. F., & Garnett, N. S. (2013). A Room of One’s Own? Accessory Dwelling Unit Reforms and Local Parochialism. The Urban Lawyer, 45(3), 519–569. JSTOR.

Gabbe, C. J. (2019). Changing Residential Land Use Regulations to Address High Housing Prices: Evidence From Los Angeles. Journal of the American Planning Association, 85(2), 152–168. https://doi.org/10.1080/01944363.2018.1559078

Kim, D., Baek, S.-R., Garcia, B., Vo, T., & Wen, F. (2023). The influence of accessory dwelling unit (ADU) policy on the contributing factors to ADU development: An assessment of the city of Los Angeles. Journal of Housing and the Built Environment. https://doi.org/10.1007/s10901-022-10000-2

Maaoui, M. (2018). A granny flat of one’s own? The households that build accessory-dwelling units in Seattle’s King County. Berkeley Planning Journal, 30(1).

Speck, J. (2018). Encourage Granny Flats: Allow and Incentivize Accessory Dwelling Units. In J. Speck, Walkable City Rules (pp. 28–29). Island Press/Center for Resource Economics. https://doi.org/10.5822/978-1-61091-899-2_12

U.S. Census Bureau. (2015). 2014 ACS PUMS Data Dictionary.

U.S. Census Bureau. (2021). Understanding and Using the American Community Survey Public Use Microdata Sample Files: What Data Users Need to Know. US Government Printing Office. https://www.census.gov/content/dam/Census/library/publications/2021/acs/acs_pums_handbook_2021.pdf

U.S. Census Bureau. (2022a, March 28). 2020 PUMS Data. https://www.census.gov/programs-surveys/acs/microdata/access/2020.html

U.S. Census Bureau. (2022b, August 22). Public Use Microdata Areas (PUMAs). https://www.census.gov/programs-surveys/geography/guidance/geo-areas/pumas.html