In their Combatting Redlining Initiative, the Department of Justice defines redlining as an “illegal practice in which lenders avoid providing credit services to individuals living in or seeking to live in, communities of color because of the race, color, or national origin of the residents in those communities.” Modern day redlining litigation today consists of defining ‘peer lending institutions’ and comparing the location of loan originations to a target institution, within an urban whole. To that end, I have prepared a dataframe that contains all the home loan originations made in Ohio in 2022, limiting the sample to single-family homes. We will see whether Huntington Bank made loans to homes in majority-minority neighborhoods at a significantly lower rate than peer institutions. The main content of this kind of litigation involves the definition of a peer lender.
Let’s explore whether Huntington seemed to engage in redlining in Columbus without limiting the sample, i.e. treating every other institution as a peer.
oh_hmda <- read_csv('https://www.dropbox.com/scl/fi/oxtr5j32cbckidz9ur6g6/state_OH_actions_taken_1-2-3_dwelling_categories_Single-Family-1-4-Units-_Site-Built-Single-Family.csv?rlkey=fq1qebp6ig5nnky3a1ak5bn6j&st=6g4vqx18&dl=1') %>%
# selecting needed variables
select(lei,
county_code,
`derived_msa-md`,
tract_minority_population_percent,
census_tract,
conforming_loan_limit,
action_taken,
preapproval,
loan_purpose,
loan_type,
business_or_commercial_purpose,
construction_method,
occupancy_type,
multifamily_affordable_units,
tract_to_msa_income_percentage) %>%
# annoying name fixed
rename(msa_md = `derived_msa-md`) %>%
# only looking at originations
filter(action_taken == 1) %>%
# tag majority-minority tracts & Huntington bank
mutate(maj_minority_tract = ifelse(tract_minority_population_percent > 50,
1,
0),
huntington = ifelse(lei == "2WHM8VNJH63UN14OL754",
1,
0)) %>%
as_tibble()
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 417417 Columns: 99
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): lei, state_code, conforming_loan_limit, derived_loan_product_type,...
## dbl (62): activity_year, derived_msa-md, county_code, census_tract, action_t...
## lgl (11): applicant_ethnicity-3, applicant_ethnicity-4, applicant_ethnicity-...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
table <- oh_hmda %>%
filter(msa_md == 18140) %>%
group_by(huntington, maj_minority_tract) %>%
summarise(n = n())
## `summarise()` has grouped output by 'huntington'. You can override using the
## `.groups` argument.
summary(lm(maj_minority_tract ~ huntington, data = oh_hmda))
##
## Call:
## lm(formula = maj_minority_tract ~ huntington, data = oh_hmda)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0894 -0.0894 -0.0894 -0.0894 0.9331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0893954 0.0005297 168.77 <2e-16 ***
## huntington -0.0224892 0.0016969 -13.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2821 on 314146 degrees of freedom
## Multiple R-squared: 0.0005588, Adjusted R-squared: 0.0005556
## F-statistic: 175.6 on 1 and 314146 DF, p-value: < 2.2e-16
oh_hmda %>%
group_by(msa_md) %>%
summarise(n = n()) %>%
arrange(desc(n))
## # A tibble: 15 × 2
## msa_md n
## <dbl> <int>
## 1 18140 64327
## 2 17460 57345
## 3 17140 51083
## 4 99999 48809
## 5 19430 22265
## 6 10420 20055
## 7 45780 15779
## 8 15940 11132
## 9 49660 10887
## 10 44220 3380
## 11 31900 3012
## 12 30620 2574
## 13 48260 1222
## 14 48540 1177
## 15 26580 1101
Uh oh… not looking wonderful. However, Huntington is a full service, huge lender. Some lenders may specialize in loans geared towards lower income people, such as FHA, and small lenders may have better coverage in majority-minority communities. Let’s do the same analyeses, but with the following peer groups.
Lenders with…
All lenders (no peer limitation)
The proportion of loans Conforming to Fannie’s loan limit (conforming_loan_limit == C) within 5 percentage points of Huntington
Peer group 2 adding condition of the proportion of Conventional loans (loan_type == 1) within 10 percentage points of Huntington
Peer group 3 adding condition total number of originated loans (Remember, I have already limited the sample to originated loans) within 50 percentage points.
Here, I’m saying ‘within x percentage points’ as +/- the target value. For example, if a lender made 10 originations, the peers making loans within 10 percentage points would have made 10 - (10 * .1) = 9 to 10 + (10 * .1) = 11 originations.
Let’s do this analysis for Cincinnati (msa_md == 17140), Cleveland (msa_md == 17460), and Columbus (msa_md == 18140).
Remember these peer groups should be defined after filtering to the MSA of interest. Also, lenders are defined by their LEI, so you will need to group by LEI when defining peers. Think like so…
# identifying the lenders who made more than 1500 originations in Columbus in 2022
oh_hmda %>%
filter(msa_md == 18140) %>%
group_by(lei) %>%
summarise(n = n()) %>%
filter(n > 1500) %>%
arrange(desc(n))
## # A tibble: 9 × 2
## lei n
## <chr> <int>
## 1 2WHM8VNJH63UN14OL754 6467
## 2 549300FGXN1K3HLB1R50 2780
## 3 QFROUN1UWUYU0DVIWD51 2713
## 4 5493008CPTDVOS570626 2559
## 5 549300HFBEONQN2CK447 2132
## 6 AD6GFRVSDT01YPT1CS68 2037
## 7 7H6GLXDRUGQFU57RNE97 1696
## 8 549300Y6FMTOZSY0VW06 1669
## 9 549300RPOGWJRH63HS39 1581
# 2WHM8VNJH63UN14OL754 is Huntington, 549300FGXN1K3HLB1R50 is Rocket Mortgage, QFROUN1UWUYU0DVIWD51 if fifth third... face valid
I’m imagining making a nested dataframe for each city we’re interested in, and passing through a list of peer conditions before doing a uncontrolled regression (or a 2-sample difference in means with majority minority tract as the response and Huntington as the group, they’re equivalent here).
My attempt
# look at columbus first
# conditions:
# (1) conforming_loan_limit == "C" within 5 pp
# (2) 1 and loan_type == 1 within 10 pp
# (3) 2 and total number of loans within 50 pp
# nice
z <- oh_hmda %>%
filter(msa_md == 18140) %>%
group_by(lei) %>%
mutate(total = n(),
condition = sum(conforming_loan_limit == "C"),
rate_conf = condition/total) %>%
ungroup %>%
filter(rate_conf >= rate_conf[huntington == 1] - .1 &
rate_conf <= rate_conf[huntington == 1] + .1)
## Warning: There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - 0.1`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
# first condition doesn't remove any rows
conditions = c("TRUE",
"conforming_loan_limit == 'C'",
"conforming_loan_limit == 'C' & loan_type == 1")
thresholds = c(1,
.05,
.1)
cities = c("17140", "17460", "18140")
oh_hmda_nested <- oh_hmda %>%
nest(!msa_md)
## Warning: Supplying `...` without names was deprecated in tidyr 1.0.0.
## ℹ Please specify a name for each selection.
## ℹ Did you want `data = !msa_md`?
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# saving models and iterating through
models <- map(set_names(cities), \(city) {
map2(set_names(conditions), thresholds, \(condition, threshold) {
temp <- oh_hmda %>%
filter(msa_md == city) %>%
group_by(lei) %>%
mutate(
total = n(),
condition = sum(eval(parse_expr(condition))),
rate_conf = condition / total
) %>%
ungroup() %>%
filter(
rate_conf >= rate_conf[huntington == 1] - threshold &
rate_conf <= rate_conf[huntington == 1] + threshold
)
# Run regression model
lm(maj_minority_tract ~ huntington, data = temp)
})
})
## Warning: There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## Warning: There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
## There were 2 warnings in `filter()`.
## The first warning was:
## ℹ In argument: `&...`.
## Caused by warning in `rate_conf >= rate_conf[huntington == 1] - threshold`:
## ! longer object length is not a multiple of shorter object length
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
map(names(models), \(city) {
modelplot(models[[city]],
coef_omit = 'Interc',
background = list(geom_vline(xintercept = 0, color = 'orange'))) +
ggtitle(city) +
scale_color_manual(values = wes_palette('Darjeeling1'))
})
## [[1]]
##
## [[2]]
##
## [[3]]