“The Lasting Legacy of Redlining” examines the long-term effects of redlining on racial disparities in metropolitan areas across the United States. My goal with this project is to analyze data on redlining as it was practiced in the United States during the 20th century. Redlining, which took hold decades after the abolition of slavery, prevented people of color from obtaining mortgages and buying property in certain neighborhoods. This unethical form of segregation continues to produce problems that have persisted since its inception. My research investigates how the legacy of redlining influences current demographic and socioeconomic conditions, including but not limited to housing, wealth inequality, and access to resources.

Using data from the Mapping Inequality project, the study analyzes 2020 population estimates by race and ethnicity within zones that the Home Owners' Loan Corporation (HOLC) assigned different grades between 1935 and 1940. I'm particularly interested in redlining's impact on wealth inequality, which I'll approach through data on the appreciation of residential real estate, comparing areas made up primarily of minority populations with other areas. I hope this project gives better insight into the factors that have contributed to systemic inequality and its persistence in modern society.

Throughout my life I've heard a saying that resonated with me when I decided to embark on this project: “A person is a product of their environment.” I believe this statement is true, and I contend that redlining concentrates people from similarly disadvantaged backgrounds in the same areas, often resulting in concentrated poverty and perpetuating cycles of socioeconomic disadvantage.
We begin by loading the necessary libraries:
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ purrr 1.0.2
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
library(ggplot2)
metro_grades <- read_csv("https://raw.githubusercontent.com/Zcash95/data/master/redlining/metro-grades.csv")
## Rows: 551 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): metro_area, holc_grade
## dbl (26): white_pop, black_pop, hisp_pop, asian_pop, other_pop, total_pop, p...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
zone_block_matches <- read_csv("https://raw.githubusercontent.com/Zcash95/data/master/redlining/zone-block-matches.csv")
## Rows: 752351 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): holc_city, holc_state, holc_grade, holc_id, block_geoid20
## dbl (2): holc_neighborhood_id, pct_match
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
homes_sales_data <- fread("https://raw.githubusercontent.com/Zcash95/DATA607-FINALPROJECT/main/FRED%20Price%20of%20Houses%20Sold%20by%20Census%20Region%20963-01-01%20to%202024-01-01%20(Dollars).csv")
glimpse(metro_grades)
## Rows: 551
## Columns: 28
## $ metro_area <chr> "Akron, OH", "Akron, OH", "Akron, OH", "Akron, OH"…
## $ holc_grade <chr> "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", …
## $ white_pop <dbl> 24702, 41531, 73105, 6179, 16989, 26644, 56878, 16…
## $ black_pop <dbl> 8624, 16499, 22847, 6921, 1818, 7094, 16795, 19581…
## $ hisp_pop <dbl> 956, 2208, 3149, 567, 1317, 4334, 10357, 6688, 367…
## $ asian_pop <dbl> 688, 3367, 6291, 455, 1998, 2509, 6355, 2191, 21, …
## $ other_pop <dbl> 1993, 4211, 7302, 1022, 1182, 4650, 11153, 4364, 8…
## $ total_pop <dbl> 36963, 67816, 112694, 15144, 23303, 45230, 101538,…
## $ pct_white <dbl> 66.83, 61.24, 64.87, 40.80, 72.91, 58.91, 56.02, 3…
## $ pct_black <dbl> 23.33, 24.33, 20.27, 45.70, 7.80, 15.68, 16.54, 39…
## $ pct_hisp <dbl> 2.59, 3.26, 2.79, 3.75, 5.65, 9.58, 10.20, 13.48, …
## $ pct_asian <dbl> 1.86, 4.96, 5.58, 3.00, 8.57, 5.55, 6.26, 4.42, 1.…
## $ pct_other <dbl> 5.39, 6.21, 6.48, 6.75, 5.07, 10.28, 10.98, 8.79, …
## $ lq_white <dbl> 0.94, 0.86, 0.91, 0.57, 1.09, 0.88, 0.84, 0.51, 1.…
## $ lq_black <dbl> 1.41, 1.47, 1.23, 2.76, 0.66, 1.33, 1.40, 3.35, 0.…
## $ lq_hisp <dbl> 1.00, 1.26, 1.08, 1.45, 0.77, 1.30, 1.39, 1.83, 0.…
## $ lq_asian <dbl> 0.46, 1.23, 1.38, 0.74, 1.21, 0.78, 0.88, 0.62, 0.…
## $ lq_other <dbl> 0.97, 1.11, 1.16, 1.21, 0.72, 1.47, 1.57, 1.26, 1.…
## $ surr_area_white_pop <dbl> 304399, 304399, 304399, 304399, 387016, 387016, 38…
## $ surr_area_black_pop <dbl> 70692, 70692, 70692, 70692, 68371, 68371, 68371, 6…
## $ surr_area_hisp_pop <dbl> 11037, 11037, 11037, 11037, 42699, 42699, 42699, 4…
## $ surr_area_asian_pop <dbl> 17295, 17295, 17295, 17295, 41112, 41112, 41112, 4…
## $ surr_area_other_pop <dbl> 23839, 23839, 23839, 23839, 40596, 40596, 40596, 4…
## $ surr_area_pct_white <dbl> 71.24, 71.24, 71.24, 71.24, 66.75, 66.75, 66.75, 6…
## $ surr_area_pct_black <dbl> 16.55, 16.55, 16.55, 16.55, 11.79, 11.79, 11.79, 1…
## $ surr_area_pct_hisp <dbl> 2.58, 2.58, 2.58, 2.58, 7.36, 7.36, 7.36, 7.36, 26…
## $ surr_area_pct_asian <dbl> 4.05, 4.05, 4.05, 4.05, 7.09, 7.09, 7.09, 7.09, 3.…
## $ surr_area_pct_other <dbl> 5.58, 5.58, 5.58, 5.58, 7.00, 7.00, 7.00, 7.00, 4.…
glimpse(zone_block_matches)
## Rows: 752,351
## Columns: 7
## $ holc_city <chr> "Akron", "Akron", "Akron", "Akron", "Akron", "Akr…
## $ holc_state <chr> "OH", "OH", "OH", "OH", "OH", "OH", "OH", "OH", "…
## $ holc_grade <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
## $ holc_id <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "…
## $ holc_neighborhood_id <dbl> 3351, 3351, 3351, 3351, 3351, 3351, 3351, 3351, 3…
## $ block_geoid20 <chr> "391535307001006", "391535307003009", "3915353070…
## $ pct_match <dbl> 0.64629202, 0.55468982, 0.86498715, 0.96328946, 1…
glimpse(homes_sales_data)
## Rows: 4
## Columns: 248
## $ `Series ID` <chr> "MSPNE", "MSPMW", "MSPS", "MSPW"
## $ `Region Name` <chr> "Northeast", "Midwest", "South", "West"
## $ `Region Code` <int> 1, 2, 3, 4
## $ `1963-01-01` <int> 20800, 17500, 16800, 18000
## $ `1963-04-01` <int> 20600, 17700, 15800, 18900
## $ `1963-07-01` <int> 19600, 17800, 15900, 19000
## $ `1963-10-01` <int> 20600, 19100, 15800, 19500
## $ `1964-01-01` <int> 20300, 18700, 16500, 19600
## $ `1964-04-01` <int> 19800, 19800, 16800, 20100
## $ `1964-07-01` <int> 20200, 18900, 16800, 20600
## $ `1964-10-01` <int> 21400, 20800, 16700, 21500
## $ `1965-01-01` <int> 21000, 21900, 17400, 21600
## $ `1965-04-01` <int> 21900, 20800, 16400, 22100
## $ `1965-07-01` <int> 21200, 22100, 17700, 21500
## $ `1965-10-01` <int> 23100, 21100, 18100, 22400
## $ `1966-01-01` <int> 23700, 22500, 17800, 23300
## $ `1966-04-01` <int> 23700, 23300, 18800, 23800
## $ `1966-07-01` <int> 23700, 24100, 18200, 22500
## $ `1966-10-01` <int> 23400, 23700, 18900, 22800
## $ `1967-01-01` <int> 23800, 23000, 18900, 24700
## $ `1967-04-01` <int> 25400, 25900, 19400, 24700
## $ `1967-07-01` <int> 25600, 24100, 19400, 23600
## $ `1967-10-01` <int> 27500, 26900, 19600, 23300
## $ `1968-01-01` <int> 26600, 26900, 19800, 24700
## $ `1968-04-01` <int> 26900, 28200, 22000, 25500
## $ `1968-07-01` <int> 27800, 26800, 21700, 25500
## $ `1968-10-01` <int> 29900, 27700, 22400, 24800
## $ `1969-01-01` <int> 31500, 27300, 23100, 24800
## $ `1969-04-01` <int> 32100, 28700, 23300, 25600
## $ `1969-07-01` <int> 31300, 28700, 23000, 25400
## $ `1969-10-01` <int> 30500, 26400, 21100, 25500
## $ `1970-01-01` <int> 32000, 28500, 20000, 23800
## $ `1970-04-01` <int> 33100, 25000, 21100, 24700
## $ `1970-07-01` <int> 26800, 23800, 20400, 24200
## $ `1970-10-01` <int> 29200, 22700, 20000, 23600
## $ `1971-01-01` <int> 31000, 27100, 21300, 24700
## $ `1971-04-01` <int> 30500, 27800, 22700, 26200
## $ `1971-07-01` <int> 30900, 27600, 22900, 25300
## $ `1971-10-01` <int> 31300, 26200, 23200, 25600
## $ `1972-01-01` <int> 31900, 28300, 23900, 26200
## $ `1972-04-01` <int> 31100, 28400, 25200, 26300
## $ `1972-07-01` <int> 29400, 30200, 26600, 28300
## $ `1972-10-01` <int> 33800, 30800, 27500, 28800
## $ `1973-01-01` <int> 35100, 31500, 29100, 29600
## $ `1973-04-01` <int> 38400, 32200, 30900, 32800
## $ `1973-07-01` <int> 36900, 33500, 32200, 33600
## $ `1973-10-01` <int> 38600, 35000, 32100, 33900
## $ `1974-01-01` <int> 38600, 35200, 34800, 34400
## $ `1974-04-01` <int> 39500, 36000, 33600, 36100
## $ `1974-07-01` <int> 40700, 36200, 35100, 36000
## $ `1974-10-01` <int> 44000, 37500, 34900, 37200
## $ `1975-01-01` <int> 44000, 37400, 36700, 38500
## $ `1975-04-01` <int> 43400, 39700, 36800, 40400
## $ `1975-07-01` <int> 44400, 39300, 36500, 39800
## $ `1975-10-01` <int> 44400, 41900, 39000, 43400
## $ `1976-01-01` <int> 45700, 42500, 39800, 45100
## $ `1976-04-01` <int> 46300, 45800, 40400, 47400
## $ `1976-07-01` <int> 48400, 44100, 41400, 47300
## $ `1976-10-01` <int> 50200, 48400, 40000, 50000
## $ `1977-01-01` <int> 50900, 47600, 41200, 51700
## $ `1977-04-01` <int> 53200, 50800, 44300, 53200
## $ `1977-07-01` <int> 49300, 51100, 44600, 53700
## $ `1977-10-01` <int> 52000, 56400, 46100, 56800
## $ `1978-01-01` <int> 56800, 54700, 47700, 58800
## $ `1978-04-01` <int> 55400, 58700, 49900, 61800
## $ `1978-07-01` <int> 57800, 61400, 51300, 61000
## $ `1978-10-01` <int> 60700, 63200, 53000, 65500
## $ `1979-01-01` <int> 65500, 61300, 54300, 66500
## $ `1979-04-01` <int> 65800, 65300, 57400, 69600
## $ `1979-07-01` <int> 66500, 64300, 59700, 73400
## $ `1979-10-01` <int> 67100, 63100, 57300, 71700
## $ `1980-01-01` <int> 65400, 64700, 59000, 72100
## $ `1980-04-01` <int> 67900, 60100, 58700, 74400
## $ `1980-07-01` <int> 71900, 64000, 60400, 70800
## $ `1980-10-01` <int> 71400, 64200, 61200, 74100
## $ `1981-01-01` <int> 71300, 68300, 62300, 74700
## $ `1981-04-01` <int> 84000, 66900, 63500, 78500
## $ `1981-07-01` <int> 73700, 63600, 66000, 77500
## $ `1981-10-01` <int> 71100, 65600, 66100, 79600
## $ `1982-01-01` <int> 69900, 66400, 64400, 73300
## $ `1982-04-01` <int> 81600, 68600, 65700, 77900
## $ `1982-07-01` <int> 81300, 64800, 67300, 72000
## $ `1982-10-01` <int> 76800, 75800, 68100, 77500
## $ `1983-01-01` <int> 77800, 77000, 69200, 78800
## $ `1983-04-01` <int> 81000, 80200, 69300, 78900
## $ `1983-07-01` <int> 84100, 81500, 73200, 82600
## $ `1983-10-01` <int> 84900, 78100, 71900, 81900
## $ `1984-01-01` <int> 82700, 81400, 71900, 85600
## $ `1984-04-01` <int> 94200, 86400, 71900, 87400
## $ `1984-07-01` <int> 91200, 87500, 71100, 87400
## $ `1984-10-01` <int> 86600, 86500, 73000, 89800
## $ `1985-01-01` <int> 89800, 80100, 74800, 92100
## $ `1985-04-01` <int> 99800, 79600, 73600, 93700
## $ `1985-07-01` <int> 108800, 78200, 74300, 90600
## $ `1985-10-01` <int> 105700, 81800, 78300, 95300
## $ `1986-01-01` <int> 121500, 84100, 78400, 92800
## $ `1986-04-01` <int> 121000, 89900, 79800, 95200
## $ `1986-07-01` <int> 125000, 87500, 81900, 99400
## $ `1986-10-01` <int> 131500, 89900, 84400, 98500
## $ `1987-01-01` <int> 139000, 90100, 85800, 104200
## $ `1987-04-01` <int> 145000, 95000, 88000, 113700
## $ `1987-07-01` <int> 139900, 92000, 88900, 118000
## $ `1987-10-01` <int> 142000, 103000, 89800, 116400
## $ `1988-01-01` <int> 149000, 104100, 91000, 124500
## $ `1988-04-01` <int> 145000, 104000, 89300, 123000
## $ `1988-07-01` <int> 153000, 95000, 95800, 131000
## $ `1988-10-01` <int> 155000, 97500, 93000, 129000
## $ `1989-01-01` <int> 168500, 109900, 93000, 135900
## $ `1989-04-01` <int> 144900, 113000, 97500, 136000
## $ `1989-07-01` <int> 165700, 99500, 94900, 139000
## $ `1989-10-01` <int> 161400, 110000, 96500, 144900
## $ `1990-01-01` <int> 150000, 114000, 98900, 145000
## $ `1990-04-01` <int> 159900, 116500, 103000, 150000
## $ `1990-07-01` <int> 158000, 99500, 95900, 150000
## $ `1990-10-01` <int> 167000, 97000, 98000, 145000
## $ `1991-01-01` <int> 153900, 115000, 101300, 145000
## $ `1991-04-01` <int> 150000, 110000, 100900, 143500
## $ `1991-07-01` <int> 155200, 107000, 99700, 144000
## $ `1991-10-01` <int> 169000, 112900, 100000, 136000
## $ `1992-01-01` <int> 166900, 112400, 106500, 129900
## $ `1992-04-01` <int> 175000, 120000, 101000, 129000
## $ `1992-07-01` <int> 170000, 110000, 102000, 134500
## $ `1992-10-01` <int> 165000, 125000, 110000, 132300
## $ `1993-01-01` <int> 150000, 123800, 109000, 134000
## $ `1993-04-01` <int> 175000, 125000, 115500, 135000
## $ `1993-07-01` <int> 155000, 127500, 114000, 136600
## $ `1993-10-01` <int> 162600, 124400, 115000, 135200
## $ `1994-01-01` <int> 159900, 133000, 116200, 140000
## $ `1994-04-01` <int> 172000, 131800, 118500, 137000
## $ `1994-07-01` <int> 165000, 133300, 113700, 140000
## $ `1994-10-01` <int> 169000, 130000, 117900, 148000
## $ `1995-01-01` <int> 179900, 130000, 118000, 139400
## $ `1995-04-01` <int> 179900, 136000, 124500, 140000
## $ `1995-07-01` <int> 179900, 131000, 121000, 143000
## $ `1995-10-01` <int> 183500, 135000, 127000, 143000
## $ `1996-01-01` <int> 179000, 135200, 125500, 148200
## $ `1996-04-01` <int> 199700, 138200, 125000, 155900
## $ `1996-07-01` <int> 181000, 134900, 123900, 154800
## $ `1996-10-01` <int> 200000, 145000, 127900, 160000
## $ `1997-01-01` <int> 204400, 144900, 127100, 159900
## $ `1997-04-01` <int> 189000, 148500, 129900, 160000
## $ `1997-07-01` <int> 180000, 150000, 127000, 159000
## $ `1997-10-01` <int> 195000, 144500, 129000, 159000
## $ `1998-01-01` <int> 196000, 160000, 131000, 163400
## $ `1998-04-01` <int> 200000, 152000, 132300, 159300
## $ `1998-07-01` <int> 212000, 159000, 137300, 166400
## $ `1998-10-01` <int> 200000, 156000, 138500, 165000
## $ `1999-01-01` <int> 195500, 165700, 142500, 168700
## $ `1999-04-01` <int> 214700, 155200, 143000, 170400
## $ `1999-07-01` <int> 206400, 163100, 141100, 173400
## $ `1999-10-01` <int> 212600, 165100, 149600, 188000
## $ `2000-01-01` <int> 229300, 165300, 148000, 190300
## $ `2000-04-01` <int> 239500, 165900, 142500, 190800
## $ `2000-07-01` <int> 212800, 162200, 152300, 192700
## $ `2000-10-01` <int> 241400, 180000, 148400, 210300
## $ `2001-01-01` <int> 242800, 170400, 154700, 206400
## $ `2001-04-01` <int> 255200, 177200, 156100, 204600
## $ `2001-07-01` <int> 244200, 168600, 152800, 218100
## $ `2001-10-01` <int> 247800, 169500, 152100, 227200
## $ `2002-01-01` <int> 254200, 181800, 160900, 238500
## $ `2002-04-01` <int> 261100, 173000, 164000, 239600
## $ `2002-07-01` <int> 255400, 170900, 158200, 228100
## $ `2002-10-01` <int> 287100, 179800, 165400, 244400
## $ `2003-01-01` <int> 208100, 178200, 165800, 253700
## $ `2003-04-01` <int> 279900, 176500, 164600, 245600
## $ `2003-07-01` <int> 259400, 184000, 163400, 272200
## $ `2003-10-01` <int> 290000, 189600, 169400, 272800
## $ `2004-01-01` <int> 292000, 208900, 173800, 273300
## $ `2004-04-01` <int> 290300, 203500, 171400, 278700
## $ `2004-07-01` <int> 347700, 198100, 176700, 277100
## $ `2004-10-01` <int> 357400, 214300, 190900, 297000
## $ `2005-01-01` <int> 366800, 219000, 188600, 309800
## $ `2005-04-01` <int> 325700, 208900, 192000, 329900
## $ `2005-07-01` <int> 318700, 202700, 190000, 344300
## $ `2005-10-01` <int> 370300, 224200, 200000, 332000
## $ `2006-01-01` <int> 334600, 210700, 205900, 330000
## $ `2006-04-01` <int> 344600, 203100, 206700, 329800
## $ `2006-07-01` <int> 380500, 216800, 195100, 342200
## $ `2006-10-01` <int> 351400, 216200, 207400, 356500
## $ `2007-01-01` <int> 370300, 212800, 222900, 341500
## $ `2007-04-01` <int> 304900, 203200, 208300, 344600
## $ `2007-07-01` <int> 301300, 209600, 214900, 310200
## $ `2007-10-01` <int> 336900, 197400, 214900, 321300
## $ `2008-01-01` <int> 325900, 219200, 202200, 293700
## $ `2008-04-01` <int> 352500, 198500, 208100, 302500
## $ `2008-07-01` <int> 385200, 184700, 203300, 290700
## $ `2008-10-01` <int> 300700, 202500, 188700, 296800
## $ `2009-01-01` <int> 314800, 187100, 189300, 274300
## $ `2009-04-01` <int> 272500, 193200, 201000, 272400
## $ `2009-07-01` <int> 322200, 184900, 189700, 253700
## $ `2009-10-01` <int> 324600, 196000, 191800, 251900
## $ `2010-01-01` <int> 337400, 203800, 187900, 263600
## $ `2010-04-01` <int> 348700, 192400, 195200, 264100
## $ `2010-07-01` <int> 291000, 191800, 203900, 259500
## $ `2010-10-01` <int> 358000, 205800, 198500, 248900
## $ `2011-01-01` <int> 336200, 196800, 209800, 251400
## $ `2011-04-01` <int> 289100, 211600, 209900, 259200
## $ `2011-07-01` <int> 324100, 195400, 210300, 251400
## $ `2011-10-01` <int> 322800, 209800, 201200, 252000
## $ `2012-01-01` <int> 305400, 223100, 217300, 272300
## $ `2012-04-01` <int> 360900, 230600, 211700, 258600
## $ `2012-07-01` <int> 385700, 239500, 226200, 265500
## $ `2012-10-01` <int> 374300, 219200, 237500, 291200
## $ `2013-01-01` <int> 370300, 221700, 232400, 290500
## $ `2013-04-01` <int> 373200, 261900, 250200, 302400
## $ `2013-07-01` <int> 320100, 257300, 237800, 322900
## $ `2013-10-01` <int> 421400, 258400, 245100, 331000
## $ `2014-01-01` <int> 339800, 256000, 252000, 324700
## $ `2014-04-01` <int> 361900, 269100, 265400, 344400
## $ `2014-07-01` <int> 453900, 269700, 253200, 327500
## $ `2014-10-01` <int> 398600, 282300, 276400, 355400
## $ `2015-01-01` <int> 456800, 277200, 263900, 347200
## $ `2015-04-01` <int> 397800, 270100, 261800, 341000
## $ `2015-07-01` <int> 383700, 276300, 276100, 330800
## $ `2015-10-01` <int> 501900, 277300, 278700, 370300
## $ `2016-01-01` <int> 417100, 267700, 277400, 350800
## $ `2016-04-01` <int> 454100, 270100, 277100, 380000
## $ `2016-07-01` <int> 434000, 278700, 280200, 368600
## $ `2016-10-01` <int> 403700, 295400, 285900, 371100
## $ `2017-01-01` <int> 566500, 276000, 281400, 372500
## $ `2017-04-01` <int> 472200, 288300, 285400, 386300
## $ `2017-07-01` <int> 445800, 278500, 295300, 385500
## $ `2017-10-01` <int> 496500, 285600, 298500, 409700
## $ `2018-01-01` <int> 437500, 291200, 295800, 408000
## $ `2018-04-01` <int> 453300, 277600, 285300, 423400
## $ `2018-07-01` <int> 503700, 292100, 296100, 408600
## $ `2018-10-01` <int> 519700, 300300, 291600, 404300
## $ `2019-01-01` <int> 480300, 288700, 280000, 402000
## $ `2019-04-01` <int> 453500, 273800, 292400, 411400
## $ `2019-07-01` <int> 543400, 294700, 289900, 399600
## $ `2019-10-01` <int> 469500, 284200, 290900, 417500
## $ `2020-01-01` <int> 512100, 288300, 279900, 411900
## $ `2020-04-01` <int> 441000, 287200, 290600, 404300
## $ `2020-07-01` <int> 449500, 311700, 294600, 403000
## $ `2020-10-01` <int> 508100, 294600, 320000, 427300
## $ `2021-01-01` <int> 511700, 320600, 327300, 473500
## $ `2021-04-01` <int> 543800, 324100, 342200, 490200
## $ `2021-07-01` <int> 523800, 358800, 372500, 516000
## $ `2021-10-01` <int> 615900, 372700, 378000, 548300
## $ `2022-01-01` <int> 580600, 393500, 385900, 574400
## $ `2022-04-01` <int> 577100, 412500, 408800, 582600
## $ `2022-07-01` <int> 699000, 409900, 437200, 567400
## $ `2022-10-01` <int> 686600, 355000, 448600, 564600
## $ `2023-01-01` <int> 727800, 384900, 385600, 558200
## $ `2023-04-01` <int> 744500, 393800, 374300, 543500
## $ `2023-07-01` <int> 893300, 423100, 394100, 513200
## $ `2023-10-01` <int> 707900, 370800, 388400, 520700
## $ `2024-01-01` <int> 785300, 375800, 376500, 548400
Next, we count the HOLC grade ratings. Each row of metro_grades summarizes all zones of one grade within one metro area, so this table counts how many metro areas have zones of each grade. There are 551 rows in total, and the grades are almost evenly represented: 138 metro areas have A, B, and D zones, and 137 have C zones.
# Count how many HOLC neighborhoods are in each grade
holc_grade_counts <- table(metro_grades$holc_grade)
print("HOLC Grade Counts:")
## [1] "HOLC Grade Counts:"
print(holc_grade_counts)
##
## A B C D
## 138 138 137 138
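Because each row of metro_grades aggregates all zones of one grade in one metro area, the single missing C row means one metro area has no C-graded zones in the data. The sketch below shows one way to list such gaps in base R. It runs on a small made-up table (the names Metro 1 and Metro 2 are placeholders, not dataset values); on the real data the same steps would apply to metro_grades$metro_area and metro_grades$holc_grade.

```r
# Toy stand-in for metro_grades: one row per metro area x HOLC grade (invented)
toy_grades <- data.frame(
  metro_area = c("Metro 1", "Metro 1", "Metro 1", "Metro 1",
                 "Metro 2", "Metro 2", "Metro 2"),
  holc_grade = c("A", "B", "C", "D", "A", "B", "D")
)

# Rows per grade = number of metro areas with at least one zone of that grade
grade_counts <- table(toy_grades$holc_grade)
print(grade_counts)

# For each metro area, list the grades it is missing
all_grades <- c("A", "B", "C", "D")
grades_by_metro <- split(toy_grades$holc_grade, toy_grades$metro_area)
missing_grades <- lapply(grades_by_metro, function(g) setdiff(all_grades, g))

# Keep only metro areas that are actually missing something
missing_grades <- missing_grades[lengths(missing_grades) > 0]
print(missing_grades)
```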
Next, we count how many rows (metro-area/grade combinations) have a majority-white or a majority-Black population, meaning more than 50% of residents.
# Count how many places have majority white populations and majority black populations
majority_white <- subset(metro_grades, pct_white > 50)
majority_black <- subset(metro_grades, pct_black > 50)
cat("Number of places with majority white populations:", nrow(majority_white), "\n")
## Number of places with majority white populations: 332
cat("Number of places with majority black populations:", nrow(majority_black), "\n")
## Number of places with majority black populations: 46
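A natural follow-up is whether the majority-white rows cluster in particular grades. A minimal sketch, assuming the same pct_white > 50 threshold, cross-tabulates the majority flag against holc_grade and converts the counts to a share per grade. The numbers in the toy table are invented for illustration and do not come from the dataset.

```r
# Invented rows: two per HOLC grade
toy <- data.frame(
  holc_grade = c("A", "A", "B", "B", "C", "C", "D", "D"),
  pct_white  = c(80.1, 66.8, 61.2, 45.0, 64.9, 38.5, 40.8, 22.3)
)

# Flag rows where the white share exceeds 50 percent
toy$majority_white <- toy$pct_white > 50

# Counts of majority-white (TRUE) vs. not (FALSE), by HOLC grade
majority_by_grade <- table(toy$holc_grade, toy$majority_white)
print(majority_by_grade)

# Share of rows in each grade that are majority white (row proportions)
share_majority_white <- prop.table(majority_by_grade, margin = 1)[, "TRUE"]
print(round(share_majority_white, 2))
```

On the real data, replacing toy with metro_grades would give the share of metro areas whose zones in each grade are majority white.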
Exploratory Analysis
# Keep the grade-A row for each metro area
grade_A_cities <- subset(metro_grades, holc_grade == 'A')
# Among those, keep metro areas whose grade-A zones are majority white
majority_white_A <- subset(grade_A_cities, pct_white > 50)
# Count those metro areas
num_majority_white_A <- nrow(majority_white_A)
# Print the count
cat("Number of HOLC Grade A cities with majority white population:", num_majority_white_A, "\n")
## Number of HOLC Grade A cities with majority white population: 125
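# Plotting: las = 1 keeps the metro-area labels horizontal, and a wide left
# margin with small label text keeps the 125 names from being clipped
par(mar = c(4, 14, 3, 1))
barplot(majority_white_A$total_pop, names.arg = majority_white_A$metro_area, horiz = TRUE,
        las = 1, cex.names = 0.4,
        col = "skyblue", xlab = "Total Population",
        main = "Metro Areas with HOLC Grade A and Majority White Population")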
Next, we look at the racial composition of the grade-A zones and compute their average white share. The white_percentage column below recomputes pct_white from white_pop and total_pop. Averaged across the 138 metro areas, grade-A zones are about 74% white (73.78%).
# Calculate percentage of white population
grade_A_cities$white_percentage <- (grade_A_cities$white_pop / grade_A_cities$total_pop) * 100
# View the data
print(grade_A_cities$white_percentage)
## [1] 66.82899 72.90478 66.58416 91.52174 68.41044 90.14837 80.16027 82.91721
## [9] 89.49233 70.26740 40.43190 78.18533 90.52478 11.29823 75.92593 94.13026
## [17] 76.08255 68.91367 77.18317 77.27697 84.98302 87.26924 80.33232 67.76311
## [25] 89.20199 68.69090 90.16473 73.77049 79.94616 76.58618 84.32112 72.27017
## [33] 78.93951 82.46501 79.18903 43.82354 87.31321 88.40705 75.02298 21.49694
## [41] 78.91738 90.70796 84.95025 69.43794 82.84869 54.86173 84.62319 87.16418
## [49] 60.77670 71.42623 47.65448 83.05728 82.96893 79.01542 63.01491 86.93223
## [57] 82.95004 88.68513 86.19730 72.01237 74.17461 91.26095 77.27825 77.84658
## [65] 73.87164 56.57082 89.49828 77.09394 78.39196 84.96802 81.76166 78.77429
## [73] 40.33067 70.57526 82.35027 74.82993 74.55228 88.91688 88.78227 82.24661
## [81] 65.33392 78.29766 65.53795 75.70884 63.30440 72.36943 76.31154 40.79299
## [89] 67.27966 79.33784 81.14082 82.45614 57.24616 63.32500 57.94352 81.01473
## [97] 87.32345 92.83054 80.16635 82.89230 79.15781 70.61773 61.93294 85.97364
## [105] 29.45393 73.47658 63.20230 49.83660 47.39718 61.52839 79.22137 76.65291
## [113] 60.92652 80.56384 84.79015 85.53496 82.80605 85.67543 77.28575 77.80488
## [121] 68.46704 48.37209 68.84717 81.58709 86.14674 70.81563 79.85417 16.56903
## [129] 74.44771 86.77091 69.32779 41.07305 76.79213 88.69427 64.18385 88.47615
## [137] 80.89115 61.84478
# Calculate average percentage of white population
average_white_percentage <- mean(grade_A_cities$white_percentage, na.rm = TRUE)
# Print the result
print(average_white_percentage)
## [1] 73.77562
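The 73.78% figure is an unweighted mean: every metro area's grade-A percentage counts equally, whether its grade-A zones hold a thousand residents or a hundred thousand. The share of all grade-A residents who are white is a population-weighted figure and can differ noticeably. The sketch below contrasts the two on two invented rows; weighted.mean() with total_pop as the weight gives the same result as dividing the summed populations.

```r
# Two invented grade-A rows: one small zone, one large zone
grade_a <- data.frame(
  white_pop = c(900, 20000),
  total_pop = c(1000, 80000)
)

# Unweighted: average of the per-row percentages (what mean(white_percentage) does)
unweighted <- mean(grade_a$white_pop / grade_a$total_pop * 100)

# Population-weighted: white residents as a share of all residents
weighted <- sum(grade_a$white_pop) / sum(grade_a$total_pop) * 100

# Equivalent form: weight each row's percentage by its total population
weighted_alt <- weighted.mean(grade_a$white_pop / grade_a$total_pop * 100,
                              w = grade_a$total_pop)

cat("Unweighted mean:", round(unweighted, 1), "%\n")
cat("Population-weighted share:", round(weighted, 1), "%\n")
```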
# Sort rows (metro area x HOLC grade) by white population, largest first
metro_grades_sorted <- metro_grades[order(-metro_grades$white_pop), ]
# Keep the 20 rows with the largest white populations; a metro area can appear more than once
top_20_white_population <- head(metro_grades_sorted, 20)
print(top_20_white_population[, c("metro_area", "white_pop")])
## # A tibble: 20 × 2
## metro_area white_pop
## <chr> <dbl>
## 1 New York-Newark-Jersey City, NY-NJ-PA 1164087
## 2 New York-Newark-Jersey City, NY-NJ-PA 914030
## 3 New York-Newark-Jersey City, NY-NJ-PA 752223
## 4 Chicago-Naperville-Elgin, IL-IN-WI 579196
## 5 Los Angeles-Long Beach-Anaheim, CA 533108
## 6 Boston-Cambridge-Newton, MA-NH 452120
## 7 Detroit-Warren-Dearborn, MI 278844
## 8 Chicago-Naperville-Elgin, IL-IN-WI 250471
## 9 Los Angeles-Long Beach-Anaheim, CA 247872
## 10 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 210709
## 11 New York-Newark-Jersey City, NY-NJ-PA 205702
## 12 Chicago-Naperville-Elgin, IL-IN-WI 202942
## 13 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 192442
## 14 Boston-Cambridge-Newton, MA-NH 179722
## 15 San Francisco-Oakland-Berkeley, CA 173369
## 16 Los Angeles-Long Beach-Anaheim, CA 172344
## 17 Boston-Cambridge-Newton, MA-NH 170439
## 18 Seattle-Tacoma-Bellevue, WA 160312
## 19 Cleveland-Elyria, OH 155360
## 20 Minneapolis-St. Paul-Bloomington, MN-WI 142550
# Get the count of HOLC grades for the top 20 highest populated white neighborhoods
top_20_holc_grade_counts <- table(top_20_white_population$holc_grade)
print("HOLC Grade Counts for Top 20 Highest Populated White Neighborhoods:")
## [1] "HOLC Grade Counts for Top 20 Highest Populated White Neighborhoods:"
print(top_20_holc_grade_counts)
##
## A B C D
## 1 7 7 5
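# Plotting: include the grade in each label, since a metro area can appear more than once
par(mar = c(4, 18, 3, 1))
barplot(top_20_white_population$white_pop,
        names.arg = paste0(top_20_white_population$metro_area, " (", top_20_white_population$holc_grade, ")"),
        horiz = TRUE, las = 1, cex.names = 0.6,
        col = "skyblue", xlab = "Total White Population",
        main = "Top 20 Metro-Area/Grade Rows by White Population")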
# Sort rows (metro area x HOLC grade) by Black population, largest first
metro_grades_sorted <- metro_grades[order(-metro_grades$black_pop), ]
# Keep the 20 rows with the largest Black populations; a metro area can appear more than once
top_20_black_population <- head(metro_grades_sorted, 20)
print(top_20_black_population[, c("metro_area", "black_pop")])
## # A tibble: 20 × 2
## metro_area black_pop
## <chr> <dbl>
## 1 New York-Newark-Jersey City, NY-NJ-PA 894704
## 2 New York-Newark-Jersey City, NY-NJ-PA 781692
## 3 Chicago-Naperville-Elgin, IL-IN-WI 447000
## 4 New York-Newark-Jersey City, NY-NJ-PA 346677
## 5 Chicago-Naperville-Elgin, IL-IN-WI 297196
## 6 Detroit-Warren-Dearborn, MI 235388
## 7 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 211220
## 8 Los Angeles-Long Beach-Anaheim, CA 205869
## 9 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 202592
## 10 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD 179683
## 11 Boston-Cambridge-Newton, MA-NH 136685
## 12 Cleveland-Elyria, OH 136527
## 13 Detroit-Warren-Dearborn, MI 116045
## 14 Chicago-Naperville-Elgin, IL-IN-WI 113905
## 15 Baltimore-Columbia-Towson, MD 109091
## 16 Detroit-Warren-Dearborn, MI 105556
## 17 Milwaukee-Waukesha, WI 91603
## 18 Baltimore-Columbia-Towson, MD 90595
## 19 Los Angeles-Long Beach-Anaheim, CA 88680
## 20 Indianapolis-Carmel-Anderson, IN 83952
# Get the count of HOLC grades for the top 20 highest populated black neighborhoods
top_20_holc_grade_counts <- table(top_20_black_population$holc_grade)
print("HOLC Grade Counts for Top 20 Highest Populated Black Neighborhoods:")
## [1] "HOLC Grade Counts for Top 20 Highest Populated Black Neighborhoods:"
print(top_20_holc_grade_counts)
##
## B C D
## 5 10 5
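# Plotting: include the grade in each label, since a metro area can appear more than once
par(mar = c(4, 18, 3, 1))
barplot(top_20_black_population$black_pop,
        names.arg = paste0(top_20_black_population$metro_area, " (", top_20_black_population$holc_grade, ")"),
        horiz = TRUE, las = 1, cex.names = 0.6,
        col = "skyblue", xlab = "Total Black Population",
        main = "Top 20 Metro-Area/Grade Rows by Black Population")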
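Because metro_grades has one row per metro area and grade, the same metro name can appear several times in a top-20 list (New York appears four times among the largest white populations). The hypothetical helper top_n_by() below, a sketch rather than part of the original analysis, ranks rows by any population column and builds labels that include the grade so repeated metro names stay distinguishable. The toy data are invented.

```r
# Hypothetical helper: top n rows by a population column, labeled with grade
top_n_by <- function(df, col, n = 20) {
  ranked <- df[order(-df[[col]]), ]
  top <- head(ranked, n)
  top$label <- paste0(top$metro_area, " (", top$holc_grade, ")")
  top
}

# Invented rows: Metro 1 and Metro 2 each appear twice with different grades
toy <- data.frame(
  metro_area = c("Metro 1", "Metro 1", "Metro 2", "Metro 2"),
  holc_grade = c("A", "D", "B", "C"),
  black_pop  = c(500, 9000, 1200, 7000)
)

top_black <- top_n_by(toy, "black_pop", n = 3)
print(top_black[, c("label", "black_pop")])

# Horizontal bar chart; las = 1 keeps labels horizontal and
# rev() places the largest value at the top of the chart
par(mar = c(4, 10, 2, 1))
barplot(rev(top_black$black_pop), names.arg = rev(top_black$label),
        horiz = TRUE, las = 1, cex.names = 0.7,
        xlab = "Black population")
```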
Next, we build a stacked bar chart of the average share of each racial group at each HOLC grade level. It shows white residents concentrated in the higher-graded zones and minority residents in the lower-graded zones, a pattern that affects housing values and wealth accumulation. These disparities perpetuate systemic racism and socioeconomic disadvantage, affecting access to quality housing, education, and employment opportunities.
# Grouping the data by holc_grade and calculating the mean percentage of each race
race_pct_means <- metro_grades %>%
group_by(holc_grade) %>%
summarize(
pct_white = mean(pct_white),
pct_black = mean(pct_black),
pct_hisp = mean(pct_hisp),
pct_asian = mean(pct_asian),
pct_other = mean(pct_other)
)
# Combine data for plotting
race_pct_means_long <- race_pct_means %>%
pivot_longer(cols = starts_with("pct_"), names_to = "Race", values_to = "Percentage")
# Plotting the percentages for each race by holc_grade
ggplot(race_pct_means_long, aes(x = holc_grade, y = Percentage, fill = Race)) +
  geom_col(position = "stack") +
  labs(title = "Percentage of Race by HOLC Grade",
       x = "HOLC Grade",
       y = "Percentage") +
  # Values are already on a 0-100 scale, so scale = 1 stops percent_format()
  # from multiplying them by 100 a second time
  scale_y_continuous(labels = scales::percent_format(accuracy = 1, scale = 1)) +
  theme_minimal()
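The lq_ columns in metro_grades appear to be location quotients: a group's share of the population inside the graded zones divided by its share in the surrounding area. For Akron's grade-A row, 66.83 / 71.24 ≈ 0.94, which matches lq_white. Values above 1 mean a group is over-represented in that grade relative to its surroundings, which speaks to concentration more directly than raw percentages. The sketch below recomputes lq_white for the four Akron rows printed by glimpse(). Recomputing from the rounded published percentages can occasionally differ in the second decimal (Akron's grade-C lq_black, for example, comes out as 1.22 rather than 1.23).

```r
# Akron, OH rows as printed by glimpse(metro_grades)
akron <- data.frame(
  holc_grade          = c("A", "B", "C", "D"),
  pct_white           = c(66.83, 61.24, 64.87, 40.80),
  surr_area_pct_white = c(71.24, 71.24, 71.24, 71.24),
  lq_white            = c(0.94, 0.86, 0.91, 0.57)
)

# Location quotient: share inside the graded zones / share in the surrounding area
akron$lq_white_calc <- round(akron$pct_white / akron$surr_area_pct_white, 2)

print(akron[, c("holc_grade", "lq_white", "lq_white_calc")])
```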
These figures are the total percentage change in house sale prices in each U.S. Census region from the first quarter of 1963 to the first quarter of 2024, in dollars not adjusted for inflation. The Northeast saw the greatest appreciation (about 3,675%), followed by the West (about 2,947%), the South (about 2,141%), and the Midwest (about 2,047%). The Midwest and the South had the lowest appreciation.
# Calculate appreciation rates
# Define a function to calculate percentage change
percentage_change <- function(x) {
return((x[length(x)] - x[1]) / x[1] * 100)
}
# Apply the function to each row (region) of the dataset
appreciation_rates <- apply(homes_sales_data[, -c(1:3)], 1, percentage_change)
# Print appreciation rates to inspect
print(appreciation_rates)
## [1] 3675.481 2047.429 2141.071 2946.667
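A total change of 3,675% over 61 years is hard to compare with familiar growth rates. Converting it to a compound annual growth rate, r = (P_end / P_start)^(1/years) - 1, puts the regions on a per-year basis. The sketch below applies both formulas to the Northeast's first and last quarterly prices from the table above; annualized_growth() is a hypothetical helper added for illustration. Both measures depend on only two single quarters, and the quarterly series is noisy (the Northeast moves from 707,900 to 785,300 between the last two quarters), so a different choice of endpoints could shift the results noticeably.

```r
# Total percentage change between the first and last observation
percentage_change <- function(x) {
  (x[length(x)] - x[1]) / x[1] * 100
}

# Hypothetical helper: compound annual growth rate, in percent, over `years` years
annualized_growth <- function(x, years) {
  ((x[length(x)] / x[1])^(1 / years) - 1) * 100
}

# Northeast endpoints from the table: 1963-01-01 and 2024-01-01
northeast <- c(20800, 785300)
years <- 2024 - 1963  # 61 years between the two first-quarter observations

total_pct <- percentage_change(northeast)
annual_pct <- annualized_growth(northeast, years)

cat("Total change:", round(total_pct, 1), "%\n")
cat("Annualized:", round(annual_pct, 2), "% per year\n")
```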
To test whether appreciation differs by region, we run a one-way ANOVA (null hypothesis: mean appreciation is the same in all four regions) followed by Tukey's Honestly Significant Difference (HSD) test, a post-hoc procedure that compares every pair of regions while controlling the family-wise error rate. In the Tukey output, diff is the difference in mean appreciation between two regions, lwr and upr are the lower and upper bounds of its 95% family-wise confidence interval, and p adj is the adjusted p-value for that pairwise comparison. The results do not support a regional difference: the ANOVA gives F = 0.005 with p = 0.999, every Tukey interval spans zero (roughly ±154 around differences of under 7 points), and every adjusted p-value is above 0.999. We therefore fail to reject the null hypothesis. The test setup also has a structural problem. appreciation_rates holds only one value per region, and data.frame() recycles those four values across 980 rows, so each region is assigned nearly the same mix of all four regional rates. The groups are therefore almost identical by construction, and the ANOVA cannot detect the real gaps between regions, such as 3,675% in the Northeast versus 2,047% in the Midwest.
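# Create a data frame pairing region names with appreciation rates.
# Caution: appreciation_rates is a vector with one value per region (length 4),
# not a matrix. data.frame() recycles it across all 980 rows, so each region
# receives nearly the same mix of all four regional rates.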
appreciation_data <- data.frame(
Region = rep(homes_sales_data$`Region Name`, each = ncol(homes_sales_data) - 3),
Appreciation_Rate = as.vector(appreciation_rates)
)
# Statistical analysis
# One-way ANOVA to compare appreciation rates between regions
anova_result <- aov(Appreciation_Rate ~ Region, data = appreciation_data)
# Summary of ANOVA
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## Region 3 7145 2382 0.005 0.999
## Residuals 976 428897070 439444
# Post-hoc tests
# Tukey's HSD test for multiple comparisons
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Appreciation_Rate ~ Region, data = appreciation_data)
##
## $Region
## diff lwr upr p adj
## Northeast-Midwest 6.6451110 -147.4900 160.7802 0.9995114
## South-Midwest 0.3822157 -153.7529 154.5173 0.9999999
## West-Midwest 3.6703596 -150.4647 157.8054 0.9999174
## South-Northeast -6.2628953 -160.3980 147.8722 0.9995907
## West-Northeast -2.9747514 -157.1098 151.1603 0.9999560
## West-South 3.2881438 -150.8469 157.4232 0.9999406
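A regional comparison that ANOVA can actually test needs several observations per region. One option, sketched below, treats each quarter-over-quarter percentage change as an observation, which on the full data would give 244 values per region. The toy table mimics the wide layout of homes_sales_data (three ID columns followed by one column per quarter) using only the four 1963 quarters shown in the glimpse() output. This is an assumed approach, not the original analysis, and quarterly changes are autocorrelated and seasonal, so the resulting p-values should be read as rough rather than exact.

```r
# Wide table in the same layout as homes_sales_data (values: 1963 quarters)
wide <- data.frame(
  `Series ID`   = c("MSPNE", "MSPMW", "MSPS", "MSPW"),
  `Region Name` = c("Northeast", "Midwest", "South", "West"),
  `Region Code` = 1:4,
  `1963-01-01`  = c(20800, 17500, 16800, 18000),
  `1963-04-01`  = c(20600, 17700, 15800, 18900),
  `1963-07-01`  = c(19600, 17800, 15900, 19000),
  `1963-10-01`  = c(20600, 19100, 15800, 19500),
  check.names = FALSE
)

# One row per region, one column per quarter
prices <- as.matrix(wide[, -(1:3)])

# Quarter-over-quarter percentage change; t() keeps one row per region
qoq <- t(apply(prices, 1, function(p) diff(p) / head(p, -1) * 100))

# Long format: as.vector() reads column by column, so region labels repeat
# in row order once per quarter to stay aligned with the values
growth <- data.frame(
  Region = factor(rep(wide$`Region Name`, times = ncol(qoq))),
  Growth = as.vector(qoq)
)

# One-way ANOVA and Tukey HSD on the per-quarter observations
fit <- aov(Growth ~ Region, data = growth)
print(summary(fit))
print(TukeyHSD(fit))
```

With the real data, prices could be built as as.matrix(as.data.frame(homes_sales_data)[, -(1:3)]) and the remaining steps left unchanged.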
Redlining has had a profound and lasting impact on communities, contributing to systemic inequalities in housing and wealth. Understanding its legacy is essential for addressing systemic racism and promoting social justice in our society. In the housing data, the raw figures differ widely by region: prices rose about 3,675% in the Northeast but about 2,047% in the Midwest. However, the ANOVA as constructed could not test these differences, and we did not reject the null hypothesis. The housing sales data provide only one figure per Census region per quarter, so they cannot be matched to individual HOLC zones, which was a major limitation of this analysis. We suspect that policies such as Jim Crow held back economic growth in the southern United States and contributed to the migration of people out of the South, but these data alone cannot confirm that. Collaboration across sectors, community engagement, and advocacy are essential for creating inclusive and sustainable communities where everyone can thrive. Together, we can work towards a future where every individual has access to safe, affordable, and dignified housing, regardless of race or socioeconomic status.
Data Sources & References:
https://github.com/fivethirtyeight/data/tree/master/redlining