Group Purrr Assignment

Introduction

In their Combatting Redlining Initiative, the Department of Justice defines redlining as an “illegal practice in which lenders avoid providing credit services to individuals living in or seeking to live in, communities of color because of the race, color, or national origin of the residents in those communities.” Modern day redlining litigation today consists of defining ‘peer lending institutions’ and comparing the location of loan originations to a target institution, within an urban whole. To that end, I have prepared a dataframe that contains all the home loan originations made in Ohio in 2022, limiting the sample to single-family homes. We will see whether Huntington Bank made loans to homes in majority-minority neighborhoods at a significantly lower rate than peer institutions. The main content of this kind of litigation involves the definition of a peer lender.

Let’s explore whether Huntington seemed to engage in redlining in Columbus without limiting the sample, i.e. treating every other institution as a peer.

oh_hmda <- read_csv('https://www.dropbox.com/scl/fi/oxtr5j32cbckidz9ur6g6/state_OH_actions_taken_1-2-3_dwelling_categories_Single-Family-1-4-Units-_Site-Built-Single-Family.csv?rlkey=fq1qebp6ig5nnky3a1ak5bn6j&st=6g4vqx18&dl=1') %>% 
  # selecting needed variables
  select(lei, 
         county_code,
         `derived_msa-md`,
         tract_minority_population_percent, 
         census_tract, 
         conforming_loan_limit,
         action_taken,
         preapproval,
         loan_purpose, 
         loan_type,
         business_or_commercial_purpose,
         construction_method,
         occupancy_type,
         multifamily_affordable_units,
         tract_to_msa_income_percentage) %>% 
  # annoying name fixed
  rename(msa_md = `derived_msa-md`) %>% 
  # only looking at originations
  filter(action_taken == 1) %>% 
  # tag majority-minority tracts & Huntington bank
  mutate(maj_minority_tract = ifelse(tract_minority_population_percent > 50,
                                     1, 
                                     0),
         huntington = ifelse(lei == "2WHM8VNJH63UN14OL754",
                             1, 
                             0)) %>% 
  as_tibble()

## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)

## Rows: 417417 Columns: 99
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): lei, state_code, conforming_loan_limit, derived_loan_product_type,...
## dbl (62): activity_year, derived_msa-md, county_code, census_tract, action_t...
## lgl (11): applicant_ethnicity-3, applicant_ethnicity-4, applicant_ethnicity-...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

table <- oh_hmda %>% 
  filter(msa_md == 18140) %>% 
  group_by(huntington, maj_minority_tract) %>% 
  summarise(n = n())

## `summarise()` has grouped output by 'huntington'. You can override using the
## `.groups` argument.

summary(lm(maj_minority_tract ~ huntington, data = oh_hmda))

## 
## Call:
## lm(formula = maj_minority_tract ~ huntington, data = oh_hmda)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.0894 -0.0894 -0.0894 -0.0894  0.9331 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.0893954  0.0005297  168.77   <2e-16 ***
## huntington  -0.0224892  0.0016969  -13.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2821 on 314146 degrees of freedom
## Multiple R-squared:  0.0005588,  Adjusted R-squared:  0.0005556 
## F-statistic: 175.6 on 1 and 314146 DF,  p-value: < 2.2e-16

oh_hmda %>% 
  group_by(msa_md) %>% 
  summarise(n = n()) %>% 
  arrange(desc(n))

## # A tibble: 15 × 2
##    msa_md     n
##     <dbl> <int>
##  1  18140 64327
##  2  17460 57345
##  3  17140 51083
##  4  99999 48809
##  5  19430 22265
##  6  10420 20055
##  7  45780 15779
##  8  15940 11132
##  9  49660 10887
## 10  44220  3380
## 11  31900  3012
## 12  30620  2574
## 13  48260  1222
## 14  48540  1177
## 15  26580  1101

Uh oh… not looking wonderful. However, Huntington is a full service, huge lender. Some lenders may specialize in loans geared towards lower income people, such as FHA, and small lenders may have better coverage in majority-minority communities. Let’s do the same analyeses, but with the following peer groups.

Lenders with…

All lenders (no peer limitation)
The proportion of loans Conforming to Fannie’s loan limit (conforming_loan_limit == C) within 5 percentage points of Huntington
Peer group 2 adding condition of the proportion of Conventional loans (loan_type == 1) within 10 percentage points of Huntington
Peer group 3 adding condition total number of originated loans (Remember, I have already limited the sample to originated loans) within 50 percentage points.

Here, I’m saying ‘within x percentage points’ as +/- the target value. For example, if a lender made 10 originations, the peers making loans within 10 percentage points would have made 10 - (10 * .1) = 9 to 10 + (10 * .1) = 11 originations.

Let’s do this analysis for Cincinnati (msa_md == 17140), Cleveland (msa_md == 17460), and Columbus (msa_md == 18140).

Remember these peer groups should be defined after filtering to the MSA of interest. Also, lenders are defined by their LEI, so you will need to group by LEI when defining peers. Think like so…

# identifying the lenders who made more than 1500 originations in Columbus in 2022 

oh_hmda %>% 
  filter(msa_md == 18140) %>% 
  group_by(lei) %>%
  summarise(n = n()) %>% 
  filter(n > 1500) %>% 
  arrange(desc(n))

## # A tibble: 9 × 2
##   lei                      n
##   <chr>                <int>
## 1 2WHM8VNJH63UN14OL754  6467
## 2 549300FGXN1K3HLB1R50  2780
## 3 QFROUN1UWUYU0DVIWD51  2713
## 4 5493008CPTDVOS570626  2559
## 5 549300HFBEONQN2CK447  2132
## 6 AD6GFRVSDT01YPT1CS68  2037
## 7 7H6GLXDRUGQFU57RNE97  1696
## 8 549300Y6FMTOZSY0VW06  1669
## 9 549300RPOGWJRH63HS39  1581

# 2WHM8VNJH63UN14OL754 is Huntington, 549300FGXN1K3HLB1R50 is Rocket Mortgage, QFROUN1UWUYU0DVIWD51 if fifth third... face valid

I’m imagining making a nested dataframe for each city we’re interested in, and passing through a list of peer conditions before doing a uncontrolled regression (or a 2-sample difference in means with majority minority tract as the response and Huntington as the group, they’re equivalent here).

My attempt

# look at columbus first 

# conditions:
# (1) conforming_loan_limit == "C" within 5 pp
# (2) 1 and loan_type == 1 within 10 pp
# (3) 2 and total number of loans within 50 pp

# nice

z <- oh_hmda %>% 
  filter(msa_md == 18140) %>% 
  group_by(lei) %>% 
  mutate(total = n(),
         condition = sum(conforming_loan_limit == "C"),
         rate_conf = condition/total) %>% 
  ungroup %>%
  filter(rate_conf >= rate_conf[huntington == 1] - .1 & 
           rate_conf <= rate_conf[huntington == 1] + .1)

## Warning: There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - 0.1`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.

# first condition doesn't remove any rows
conditions = c("TRUE",
               "conforming_loan_limit == 'C'",
               "conforming_loan_limit == 'C' & loan_type == 1")

thresholds = c(1,
               .05,
               .1)

cities = c("17140", "17460", "18140")

oh_hmda_nested <- oh_hmda %>% 
  nest(!msa_md)

## Warning: Supplying `...` without names was deprecated in tidyr 1.0.0.
## ℹ Please specify a name for each selection.
## ℹ Did you want `data = !msa_md`?
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# saving models and iterating through 
models <- map(set_names(cities), \(city) {  
  map2(set_names(conditions), thresholds, \(condition, threshold) {
    temp <- oh_hmda %>%
      filter(msa_md == city) %>%
      group_by(lei) %>%
      mutate(
        total = n(),
        condition = sum(eval(parse_expr(condition))),
        rate_conf = condition / total
      ) %>%
      ungroup() %>%
      filter(
        rate_conf >= rate_conf[huntington == 1] - threshold & 
        rate_conf <= rate_conf[huntington == 1] + threshold
      )

    # Run regression model
    lm(maj_minority_tract ~ huntington, data = temp)
  })
})

## Warning: There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.

## Warning: There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.

map(names(models), \(city) {
  modelplot(models[[city]], 
            coef_omit = 'Interc', 
            background = list(geom_vline(xintercept = 0, color = 'orange'))) +
    ggtitle(city) +
  scale_color_manual(values = wes_palette('Darjeeling1'))
})

## [[1]]

## 
## [[2]]

## 
## [[3]]

Group Purrr Assignment

Connor Tragesser

2025-02-20

Introduction