607_Assignment1

Data607 Assignment 1: Examine FiveThirtyEight Dataset

Introduction

The FiveThirtyEight article (https://projects.fivethirtyeight.com/redlining/) and the dataset chosen for further review in R cover the current state of housing segregation in many US cities. The data show how many neighborhoods that were redlined in the 1930s, in metropolitan and surrounding areas, remain highly segregated, mirroring the original discriminatory policies meant to prevent Black, Latino, and other minority residents from living in certain communities in the US. The location quotient compares a racial group's share of the population in a given area with its share in the larger metropolitan area: values above 1 mean the group is more concentrated in the smaller area than in the metro area as a whole, and values below 1 mean it is less concentrated.
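As a rough illustration of the location quotient described above, the sketch below divides a group's share in a small area by its share in the larger area. The function name `location_quotient` and the input shares are hypothetical, chosen only to show the arithmetic. They are not taken from the dataset.

```r
# Hypothetical helper: ratio of a group's share in a small area
# to its share in the larger surrounding area.
location_quotient <- function(local_share, area_share) {
  local_share / area_share
}

# Made-up shares (in percent), for illustration only
lq_over  <- location_quotient(local_share = 40, area_share = 20)  # over-represented locally
lq_under <- location_quotient(local_share = 10, area_share = 20)  # under-represented locally

print(c(over = lq_over, under = lq_under))
```

A value of exactly 1 would mean the group makes up the same share of the small area as of the larger one.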

Load Raw CSV File from Github Repo

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

file_path = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/redlining/metro-grades.csv'

df <- read.table(file_path,header = TRUE,sep = ',')

colnames(df)

##  [1] "metro_area"          "holc_grade"          "white_pop"          
##  [4] "black_pop"           "hisp_pop"            "asian_pop"          
##  [7] "other_pop"           "total_pop"           "pct_white"          
## [10] "pct_black"           "pct_hisp"            "pct_asian"          
## [13] "pct_other"           "lq_white"            "lq_black"           
## [16] "lq_hisp"             "lq_asian"            "lq_other"           
## [19] "surr_area_white_pop" "surr_area_black_pop" "surr_area_hisp_pop" 
## [22] "surr_area_asian_pop" "surr_area_other_pop" "surr_area_pct_white"
## [25] "surr_area_pct_black" "surr_area_pct_hisp"  "surr_area_pct_asian"
## [28] "surr_area_pct_other"
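Values such as "Akron, OH" contain a comma, so the file can only be split on commas correctly if those fields are quoted. The sketch below assumes that is the case (the parsed output is consistent with it) and uses a small inline sample, not the real file, to show how `read.csv` treats a quoted comma. The name `sample_df` is hypothetical.

```r
# Inline sample mimicking the assumed layout of the file:
# the metro name is quoted because it contains a comma.
csv_text <- 'metro_area,holc_grade,pct_white
"Akron, OH",A,66.83
"Akron, OH",B,61.24'

# read.csv() defaults to header = TRUE and sep = ","
sample_df <- read.csv(text = csv_text)
str(sample_df)
```

The quoted field stays in a single column, so the sample parses into three columns rather than four.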

Subset Relevant columns of dataset

The raw population counts are excluded because they are only used to compute the percentages.

sub_df <- subset(df, select = c("metro_area", "holc_grade", "pct_white", "pct_black", "pct_hisp", "pct_asian", "pct_other", "lq_white", "lq_black", "lq_hisp", "lq_asian", "lq_other"))
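The same selection could also be expressed as a pattern on the column names instead of a hand-typed list. The following is only a sketch of that idea in base R, run on a shortened, illustrative vector of names (`toy_cols` is hypothetical). The regular expression is anchored so that the `surr_area_pct_*` columns are not picked up.

```r
# Shortened stand-in for colnames(df), for illustration only
toy_cols <- c("metro_area", "holc_grade", "white_pop", "total_pop",
              "pct_white", "lq_white",
              "surr_area_white_pop", "surr_area_pct_white")

# Keep the two identifier columns, plus names that START with pct_ or lq_
keep <- grepl("^(metro_area|holc_grade)$|^(pct|lq)_", toy_cols)
selected <- toy_cols[keep]
print(selected)
```

On the real data frame, the equivalent subset would be `df[, keep]` with `keep` computed from `colnames(df)`.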

Rename Columns for clarity and Replace rankings with descriptive values

#based on rename_with documentation and this link (https://cmdlinetips.com/2022/03/how-to-replace-multiple-column-names-of-a-dataframe-with-tidyverse/)
sub_df <- sub_df %>% rename_with(function(rename){gsub('lq','location_quotient',rename)})
sub_df <- sub_df %>% rename_with(function(rename){gsub('pct','percent',rename)})
sub_df <- sub_df %>% rename(homeowners_loan_corp = holc_grade)

sub_df$homeowners_loan_corp[which(sub_df$homeowners_loan_corp=='A')] <- "Best"
sub_df$homeowners_loan_corp[which(sub_df$homeowners_loan_corp=='B')] <- "Desirable"
sub_df$homeowners_loan_corp[which(sub_df$homeowners_loan_corp=='C')] <- "Declining"
sub_df$homeowners_loan_corp[which(sub_df$homeowners_loan_corp=='D')] <- "Hazardous"
head(sub_df)

##                    metro_area homeowners_loan_corp percent_white percent_black
## 1                   Akron, OH                 Best         66.83         23.33
## 2                   Akron, OH            Desirable         61.24         24.33
## 3                   Akron, OH            Declining         64.87         20.27
## 4                   Akron, OH            Hazardous         40.80         45.70
## 5 Albany-Schenectady-Troy, NY                 Best         72.91          7.80
## 6 Albany-Schenectady-Troy, NY            Desirable         58.91         15.68
##   percent_hisp percent_asian percent_other location_quotient_white
## 1         2.59          1.86          5.39                    0.94
## 2         3.26          4.96          6.21                    0.86
## 3         2.79          5.58          6.48                    0.91
## 4         3.75          3.00          6.75                    0.57
## 5         5.65          8.57          5.07                    1.09
## 6         9.58          5.55         10.28                    0.88
##   location_quotient_black location_quotient_hisp location_quotient_asian
## 1                    1.41                   1.00                    0.46
## 2                    1.47                   1.26                    1.23
## 3                    1.23                   1.08                    1.38
## 4                    2.76                   1.45                    0.74
## 5                    0.66                   0.77                    1.21
## 6                    1.33                   1.30                    0.78
##   location_quotient_other
## 1                    0.97
## 2                    1.11
## 3                    1.16
## 4                    1.21
## 5                    0.72
## 6                    1.47
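The four grade replacements above can also be written as a single lookup. The sketch below is an alternative, not the approach used in this analysis: it maps the letters through a named vector and stores the result as a factor whose levels follow the A to D order. The names `grades`, `grade_labels`, and `labelled` are hypothetical, and any grade outside A to D would become `NA` under this approach.

```r
# Illustrative grade letters, standing in for the original holc_grade column
grades <- c("A", "B", "C", "D", "A")

# Named lookup vector: letter -> descriptive label
grade_labels <- c(A = "Best", B = "Desirable", C = "Declining", D = "Hazardous")

# Look up each letter, then fix the level order so it follows A, B, C, D
labelled <- factor(unname(grade_labels[grades]), levels = unname(grade_labels))

print(levels(labelled))
print(table(labelled))
```

Keeping the labels as an ordered set of levels means summaries and plots list the categories from Best to Hazardous rather than alphabetically.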

Exploratory Bar Chart

The bar graph below is a preliminary exploration of whether white residents are still more concentrated in the most favorably rated neighborhoods, many years after redlining was banned. It plots the average white location quotient for each original rating. The initial takeaway is that white populations are more concentrated in the two highest-rated categories of neighborhood, which suggests that segregation may persist. The article does a good job of mixing graphics and analysis to further this point.

# Bar height is the mean white location quotient within each original HOLC ranking
avg_bar <- ggplot(sub_df, aes(x = homeowners_loan_corp, y = location_quotient_white)) + stat_summary(fun = 'mean', geom = 'bar')
avg_bar + ggtitle('Average 2020 white location quotient by original ranking of neighborhood') + xlab('Original Area Ranking') + ylab('Average White Location Quotient')
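To make the bar heights concrete, the group means that `stat_summary(fun = 'mean')` draws can be computed directly. The sketch below does this in base R on only the six rows printed by `head(sub_df)` earlier, so the resulting numbers describe that small sample and not the full dataset. The name `toy` is hypothetical.

```r
# The six rows shown in the head() output above (two metro areas only)
toy <- data.frame(
  homeowners_loan_corp    = c("Best", "Desirable", "Declining", "Hazardous",
                              "Best", "Desirable"),
  location_quotient_white = c(0.94, 0.86, 0.91, 0.57, 1.09, 0.88)
)

# Mean white location quotient per original ranking
means <- aggregate(location_quotient_white ~ homeowners_loan_corp,
                   data = toy, FUN = mean)
print(means)
```

Because the ranking is stored as a character column, both this summary and the bar chart list the categories alphabetically (Best, Declining, Desirable, Hazardous). Converting the column to a factor with explicit levels would be one way to show them in rating order.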

Findings and Recommendations

Although the dataset contains valuable data, it provides only the names of the regions, without the coordinates that would be most useful for mapping the data in R. It would be interesting to extend this analysis by incorporating coordinates, plotting the results on a map, and potentially drilling into regions not specifically referenced in the article. Property values or mortgage approval rates would also enhance the analysis by helping to home in on the disparities across neighborhoods.