Neighborhood Financial Health Digital Mapping and Data Tool

Discussion’s author: Dirk Hartog

Context:

The Neighborhood Financial Health (NFH) Tool provides financial indicators for all NYC neighborhoods. It was created by DCWP’s Office of Financial Empowerment (OFE). Its aim is to highlight how financial outcomes pattern with economic and demographic factors such as income and race, to aid understanding of the racial wealth gap.

Content:

Sourced from NYC Open Data, the dataset has 52 columns detailing financial health indicators, neighborhood comparisons, and demographic factors. The data offers insights into the correlation between poverty, income, race, and financial outcomes in different neighborhoods.

Questions:

  1. How do the NFH indicators relate to economic and demographic factors, especially in terms of race and income?

  2. How do the poverty rates, incomes, and racial demographics of each area relate to the median income and poverty score?

The discussion also suggests some data transformation tasks, such as separating certain columns into another table for ease of analysis and refining column names for clarity. These concern data preparation rather than specific analytical questions.
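The first two preparation tasks could look roughly like the sketch below. It assumes, as the column list suggests, that the neighborhood-level fields (PUMA, neighborhood name, demographic shares) repeat on every goal row. The toy values and the snake_case names are made up for illustration; they are not the dataset's.

```r
library(dplyr)

# Made-up rows shaped like the NFH data: one row per neighborhood (PUMA) x goal,
# with the neighborhood-level columns repeated on every goal row
nfh_toy <- tibble(
  `Year Published` = 2022,
  PUMA = c(101, 101, 102, 102),
  Neighborhoods = c("Area A", "Area A", "Area B", "Area B"),
  Perc_White = c(0.50, 0.50, 0.20, 0.20),
  Goal = c("G1", "G2", "G1", "G2"),
  IndexScore = c(0.30, 0.60, 0.40, 0.10)
)

# Neighborhood table: one row per PUMA, with clearer snake_case names
neighborhoods_tbl <- nfh_toy %>%
  distinct(PUMA, Neighborhoods, Perc_White) %>%
  rename(puma = PUMA, neighborhood = Neighborhoods, perc_white = Perc_White)

# Goal table: the key (puma) plus the goal-specific columns, renamed the same way
goals_tbl <- nfh_toy %>%
  select(year_published = `Year Published`, puma = PUMA, goal = Goal,
         index_score = IndexScore)

# The two tables can be rejoined on puma whenever both are needed
rejoined <- left_join(goals_tbl, neighborhoods_tbl, by = "puma")
```

Keeping demographics in their own table means each neighborhood's shares are stored, and averaged, exactly once.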

Analysis:

library(tidyverse)  # attaches readr, dplyr, tidyr, stringr and ggplot2, all used below
url <- "https://raw.githubusercontent.com/hbedros/data607_prj2/main/df2/Neighborhood_Financial_Health.csv"

neighborhood_fin_health <- read_csv(url)
## Rows: 385 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): Borough, Neighborhoods, CD, Goal, GoalName, GoalFullName, GoalRank...
## dbl (26): Year Published, PUMA, Join, NYC_Poverty_Rate, Median_Income, Perc_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
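As the message suggests, you can state the column types up front. This silences the message. It also makes readr report unexpected values as parsing problems instead of guessing a different type. The sketch below uses a tiny made-up CSV written to a temporary file, so it runs offline.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Borough,PUMA,Median_Income",
             "Area A,101,50000",
             "Area B,102,30000"), tmp)

# Declaring the types explicitly; no column-specification message is printed
toy <- read_csv(
  tmp,
  col_types = cols(
    Borough = col_character(),
    PUMA = col_double(),
    Median_Income = col_double()
  )
)
```

For the real 52-column file, `read_csv(url, show_col_types = FALSE)` only hides the message. A full `cols()` specification would need every column, or a `.default` type plus the columns you want to pin down.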

Data Organization:

# Convert every Ind<k>Outcome column to numeric so the outcome columns can be stacked
# into one; entries that are not valid numbers become NA, which triggers the warnings below
outcome_cols <- grep("Ind\\dOutcome$", names(neighborhood_fin_health), value = TRUE)
neighborhood_fin_health[outcome_cols] <- lapply(neighborhood_fin_health[outcome_cols], as.numeric)
## Warning in lapply(neighborhood_fin_health[outcome_cols], as.numeric): NAs
## introduced by coercion

## Warning in lapply(neighborhood_fin_health[outcome_cols], as.numeric): NAs
## introduced by coercion
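The warnings mean that some outcome cells held text that is not a number. A small helper can list those values before they are lost. The helper's name, `non_numeric_values`, is hypothetical. It would need to run on the outcome columns before the conversion above, for example as `lapply(neighborhood_fin_health[outcome_cols], non_numeric_values)`.

```r
# Hypothetical helper: distinct raw values that as.numeric() cannot parse
non_numeric_values <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  # keep entries that were present but failed to parse (original NAs are skipped)
  unique(x[!is.na(x) & is.na(parsed)])
}

# Usage on a made-up vector (not the real data)
non_numeric_values(c("0.42", "n/a", "0.17", NA, "*"))
# returns c("n/a", "*")
```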
# Using pivot_longer() from `tidyr` to transform the data from wide to long format
neighborhood_fin_health_cln <- neighborhood_fin_health %>%
  # Stack Ind1Outcome, Ind2Outcome, etc. into a single Outcomes column,
  # keeping the source column name in Indicators
  pivot_longer(cols = starts_with("Ind") & ends_with("Outcome"),
               names_to = "Indicators",
               values_to = "Outcomes") %>%
  # Shorten Indicators from "Ind1Outcome" to "Ind1", "Ind2", etc.
  mutate(Indicators = str_extract(Indicators, "Ind\\d")) %>%
  # Stack Ind1Definition, Ind2Definition, etc. the same way; this pairs every
  # outcome row with every definition
  pivot_longer(cols = starts_with("Ind") & ends_with("Definition"),
               names_to = "Ind_Def",
               values_to = "Definitions") %>%
  # Keep only the pairs where the definition belongs to the row's indicator
  filter(Indicators == str_extract(Ind_Def, "Ind\\d")) %>%
  # Drop the helper column
  select(-Ind_Def) %>%
  # Ordering columns for better readability
  select(`Year Published`, PUMA, Borough, Neighborhoods, CD, Join, NYC_Poverty_Rate, Median_Income, 
         Perc_White, Perc_Black, Perc_Asian, Perc_Hispanic, Perc_Other, Goal, GoalName, GoalFullName, 
         TotalOutcome, GoalRank, IndexScore, ScoreRank, Indicators, Outcomes, Definitions)

# Check the transformed data
head(neighborhood_fin_health_cln)
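The two pivot_longer() calls plus the filter() could also be done in a single step using tidyr's special ".value" name. It splits each Ind<k>Outcome / Ind<k>Definition column name into an indicator part and an output-column part. This is a sketch on made-up columns. It assumes every outcome column has a matching definition column. It produces singular column names, which are renamed here to match the code above.

```r
library(dplyr)
library(tidyr)

# Made-up wide data in the Ind<k>Outcome / Ind<k>Definition layout
wide_toy <- tibble(
  PUMA = c(101, 102),
  Ind1Outcome = c(0.12, 0.30),
  Ind1Definition = "definition of indicator 1",
  Ind2Outcome = c(0.45, 0.51),
  Ind2Definition = "definition of indicator 2"
)

long_toy <- wide_toy %>%
  pivot_longer(
    cols = matches("^Ind\\d+(Outcome|Definition)$"),
    # first capture group -> Indicators; second capture names the output column
    names_to = c("Indicators", ".value"),
    names_pattern = "^(Ind\\d+)(Outcome|Definition)$"
  ) %>%
  rename(Outcomes = Outcome, Definitions = Definition)
```

Each row pairs an indicator with its own outcome and definition directly, so no intermediate cross product has to be filtered away.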

1. How do the NFH indicators relate to economic and demographic factors, especially in terms of race and income?

# Descriptive statistics for Median_Income
summary(neighborhood_fin_health_cln$Median_Income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14213   22264   26140   31238   32210   79181
# Average share of each racial group across neighborhoods (unweighted)
mean(neighborhood_fin_health_cln$Perc_White, na.rm = TRUE) # on average, 32.34% of a neighborhood's residents identify as White
## [1] 0.3233636
mean(neighborhood_fin_health_cln$Perc_Black, na.rm = TRUE) # on average, 22.51% of a neighborhood's residents identify as Black
## [1] 0.2250545
mean(neighborhood_fin_health_cln$Perc_Hispanic, na.rm = TRUE) # on average, 29.17% of a neighborhood's residents identify as Hispanic
## [1] 0.2916909
mean(neighborhood_fin_health_cln$Perc_Asian, na.rm = TRUE) # on average, 13.13% of a neighborhood's residents identify as Asian
## [1] 0.1313091
mean(neighborhood_fin_health_cln$Perc_Other, na.rm = TRUE) # on average, 2.86% of a neighborhood's residents identify as Other
## [1] 0.02863636
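The five mean() calls can be collapsed into one summarise() with across(). These are unweighted means of neighborhood percentages. Each neighborhood counts equally regardless of its population, so the figures describe the average neighborhood rather than the city's overall population mix. The values below are made up.

```r
library(dplyr)

# Made-up neighborhood rows (one per PUMA)
demo_toy <- tibble(
  PUMA = 101:103,
  Perc_White = c(0.60, 0.30, 0.10),
  Perc_Black = c(0.10, 0.40, 0.20),
  Perc_Hispanic = c(0.20, 0.20, 0.60),
  Perc_Asian = c(0.08, 0.07, 0.05),
  Perc_Other = c(0.02, 0.03, 0.05)
)

# One unweighted mean per Perc_ column, in a single call
race_means <- demo_toy %>%
  summarise(across(starts_with("Perc_"), ~ mean(.x, na.rm = TRUE)))
```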
# Using dplyr, keep one row per neighborhood by taking the first row of each
# group of identical neighborhood-level values
neighborhood_fin_health_unique <- neighborhood_fin_health_cln %>%
  group_by(`Year Published`, PUMA, Borough, Neighborhoods, CD, NYC_Poverty_Rate, 
           Median_Income, Perc_White, Perc_Black, Perc_Asian, Perc_Hispanic, Perc_Other) %>%
  slice(1) %>%
  ungroup()
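dplyr's distinct() with `.keep_all = TRUE` is a more direct way to keep the first row for each combination of values than group_by() followed by slice(1). distinct() keeps rows in their original order, while the grouped version may return them in group order. The sketch below uses made-up rows that are already in group order, so both approaches return the same result.

```r
library(dplyr)

# Made-up long rows: neighborhood-level columns repeat once per indicator
rows_toy <- tibble(
  PUMA = c(101, 101, 102, 102),
  Median_Income = c(50000, 50000, 30000, 30000),
  Indicators = c("Ind1", "Ind2", "Ind1", "Ind2"),
  Outcomes = c(0.1, 0.2, 0.3, 0.4)
)

# Approach used above: first row of each group
via_slice <- rows_toy %>%
  group_by(PUMA, Median_Income) %>%
  slice(1) %>%
  ungroup()

# Same rows with distinct(); .keep_all = TRUE keeps the remaining columns
via_distinct <- rows_toy %>%
  distinct(PUMA, Median_Income, .keep_all = TRUE)
```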

# let's check top 5 areas with the highest median income
top_5_median_income <- neighborhood_fin_health_unique %>%
  arrange(desc(Median_Income)) %>%
  head(5) %>%
  select(Borough, Neighborhoods, Median_Income, Perc_White, Perc_Black, Perc_Hispanic, Perc_Asian, Perc_Other)

print(top_5_median_income)
## # A tibble: 5 × 8
##   Borough   Neighborhoods      Median_Income Perc_White Perc_Black Perc_Hispanic
##   <chr>     <chr>                      <dbl>      <dbl>      <dbl>         <dbl>
## 1 Manhattan Battery Park City…         79181      0.716      0.022         0.07 
## 2 Manhattan Murray Hill, Gram…         73067      0.694      0.032         0.087
## 3 Manhattan Upper East Side            71933      0.761      0.028         0.09 
## 4 Manhattan Chelsea, Clinton …         65905      0.609      0.054         0.146
## 5 Manhattan Upper West Side &…         65844      0.677      0.06          0.144
## # ℹ 2 more variables: Perc_Asian <dbl>, Perc_Other <dbl>
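slice_max() does the arrange() + head() step in one call. It returns the rows with the largest values first. With the default `with_ties = TRUE`, it can return more than n rows if several neighborhoods share the cutoff income. The incomes below are made up.

```r
library(dplyr)

# Made-up median incomes for six neighborhoods
incomes_toy <- tibble(
  Neighborhoods = c("A", "B", "C", "D", "E", "F"),
  Median_Income = c(30000, 79000, 52000, 61000, 45000, 70000)
)

# Three highest incomes, largest first
top_3 <- incomes_toy %>%
  slice_max(Median_Income, n = 3)
```

On the real data, the equivalent call would be `neighborhood_fin_health_unique %>% slice_max(Median_Income, n = 5)`.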

Interpretation: The data shows the five neighborhoods with the highest median incomes, all of them in Manhattan. Each entry gives a breakdown of the racial demographics in these areas:

Battery Park City, Greenwich Village & Soho: The highest-income neighborhood in the dataset, with a median income of $79,181. The majority (71.6%) of its population is White, followed by Asian (15.9%). Only a small percentage (2.2%) is Black, and 7% are Hispanic.

Murray Hill, Gramercy & Stuyvesant Town: With a median income of $73,067, this neighborhood’s majority is also White at 69.4%, and Asian residents account for 15.8%. Black residents make up 3.2%, slightly higher than the previous area.

Upper East Side: With a median income of $71,933, this iconic Manhattan neighborhood is primarily White (76.1%), the highest White share of the five. Its Asian population (9.3%) is smaller than in the other top-income areas except the Upper West Side, and its Black population remains low at 2.8%.

Chelsea, Clinton & Midtown Business District: With a median income of $65,905, this area has a more diverse racial mix, though it is still predominantly White (60.9%). It has the highest Asian (16.2%) and Hispanic (14.6%) shares of the five, and its Black share (5.4%) is higher than in the three neighborhoods above it.

Upper West Side & West Side: Just behind Chelsea with a median income of $65,844, this area is 67.7% White. It has the highest share of Black residents among the five (6%) and an Asian share (9.2%) similar to the Upper East Side’s.

In summary, the five highest-income neighborhoods in the dataset are all in Manhattan and are predominantly White. Black residents make up no more than 6% of any of them, and Hispanic residents no more than 14.6%. This pattern is consistent with the link between race and income that the NFH Tool is meant to highlight.

# Group by indicator and correlate Outcomes with median income and with each
# racial share; results are stored in cor_results
cor_results <- neighborhood_fin_health_cln %>% 
  group_by(Indicators) %>% 
  summarise(
    cor_income = cor(Median_Income, Outcomes, use = "complete.obs"),
    cor_perc_white = cor(Perc_White, Outcomes, use = "complete.obs"),
    cor_perc_black = cor(Perc_Black, Outcomes, use = "complete.obs"),
    cor_perc_asian = cor(Perc_Asian, Outcomes, use = "complete.obs"),
    cor_perc_hispanic = cor(Perc_Hispanic, Outcomes, use = "complete.obs"),
    cor_perc_other = cor(Perc_Other, Outcomes, use = "complete.obs")
  )
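The Goal and GoalName columns suggest that Ind1, Ind2, etc. may stand for different measures under each goal. The Definitions column would confirm this. If so, grouping by Indicators alone pools unrelated measures into one correlation, and grouping by goal and indicator keeps them apart.

The toy data below is made up to show how pooling can hide opposite effects. Under Goal A, Ind1 rises with income; under Goal B, Ind1 falls. Pooled, the Ind1 correlation is zero.

```r
library(dplyr)

# Made-up long data: two goals, each with its own Ind1 and Ind2
toy <- tibble(
  GoalName = rep(c("Goal A", "Goal B"), each = 8),
  Indicators = rep(rep(c("Ind1", "Ind2"), each = 4), times = 2),
  Median_Income = rep(c(20000, 40000, 60000, 80000), times = 4),
  Outcomes = c(0.10, 0.20, 0.30, 0.40,  # Goal A, Ind1: rises with income
               0.90, 0.70, 0.50, 0.30,  # Goal A, Ind2: falls with income
               0.80, 0.70, 0.60, 0.50,  # Goal B, Ind1: falls with income
               0.40, 0.30, 0.20, 0.10)  # Goal B, Ind2: falls with income
)

# Pooled by indicator label only, as in cor_results above
pooled <- toy %>%
  group_by(Indicators) %>%
  summarise(cor_income = cor(Median_Income, Outcomes, use = "complete.obs"))

# Separate correlation for each goal-specific indicator, with its sample size
by_goal <- toy %>%
  group_by(GoalName, Indicators) %>%
  summarise(
    cor_income = cor(Median_Income, Outcomes, use = "complete.obs"),
    n = sum(complete.cases(Median_Income, Outcomes)),
    .groups = "drop"
  )
```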

# Transforming cor_results into a long format, separating demographic columns into distinct rows with their associated correlation values in cor_plot
cor_plot <- cor_results %>%
  pivot_longer(cols = -Indicators, names_to = "Demographics", values_to = "Correlation")

# Plot
ggplot(cor_plot, aes(x = Indicators, y = Correlation, fill = Demographics)) +
  geom_bar(stat="identity", position="dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title="Correlation of NFH Indicators with Economic and Demographic Factors",
       y="Correlation Coefficient")

Interpretation:
- A positive cor_income means that neighborhoods with higher median incomes tend to have higher values of that indicator; a negative value means the indicator tends to be lower where incomes are higher. Whether a higher value is good or bad depends on the indicator’s definition. cor() measures only the direction and strength of an association, not its statistical significance.
- For the racial shares (cor_perc_white, cor_perc_black, etc.), a positive correlation means the indicator tends to be higher in PUMAs where that group makes up a larger share of residents, and a negative correlation means the reverse.
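Base R's cor.test() adds the significance test that cor() lacks. The sketch below uses made-up values for eight neighborhoods. On the real data it would be applied within each indicator, ideally to one row per neighborhood so that the sample size is the number of neighborhoods. Because cor_results holds many correlations, some will look significant by chance; p.adjust() can correct the p-values for multiple testing.

```r
# Made-up values for eight neighborhoods (one row each)
income <- c(20000, 25000, 31000, 38000, 45000, 52000, 60000, 71000)
outcome <- c(0.41, 0.38, 0.35, 0.30, 0.28, 0.22, 0.20, 0.15)

# Pearson correlation with a two-sided test of the null hypothesis r = 0
ct <- cor.test(income, outcome, method = "pearson")
ct$estimate  # sample correlation r
ct$p.value   # p-value under the null hypothesis of zero correlation
```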


2. How do the poverty rates, incomes, and racial demographics of each area relate to the median income and poverty score?

# Median income vs. poverty rate, one point per neighborhood
ggplot(neighborhood_fin_health_unique, aes(x = Median_Income, y = NYC_Poverty_Rate)) +
  geom_point(aes(color = "Overall"), size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(
    title = "Median Income vs. Poverty Rate",
    x = "Median Income",
    y = "Poverty Rate (%)",
    color = "Demographic"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
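geom_smooth(method = "lm") draws the fitted line but does not report its slope. Fitting the same model with lm() gives the slope as a number. The sketch below uses made-up neighborhoods lying exactly on a line. On the real data, the natural input is neighborhood_fin_health_unique (one row per neighborhood).

```r
# Made-up neighborhoods lying exactly on a line, for illustration
poverty_toy <- data.frame(
  Median_Income = c(20000, 30000, 40000, 50000, 60000),
  NYC_Poverty_Rate = c(0.30, 0.25, 0.20, 0.15, 0.10)
)

# Same model geom_smooth(method = "lm") draws: poverty rate ~ median income
fit <- lm(NYC_Poverty_Rate ~ Median_Income, data = poverty_toy)

# Expected change in poverty rate for each additional $10,000 of median income
slope_per_10k <- coef(fit)[["Median_Income"]] * 10000
slope_per_10k
```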

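In the scatter plot below:

  • Each racial group is represented by a different color.
  • The size of each point represents that group’s percentage of the neighborhood’s population.
  • Each group also gets its own regression line type. However, every neighborhood contributes the same (Median_Income, NYC_Poverty_Rate) point to every group, so the five fitted lines coincide. Differences between groups show up only in the point colors and sizes.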
# Median Income vs. Poverty Rate colored by racial demographic percentages:

# Reshape the data: one row per neighborhood and racial group
long_data <- neighborhood_fin_health_unique %>%
  select(Median_Income, NYC_Poverty_Rate, Perc_White, Perc_Black, Perc_Asian, Perc_Hispanic, Perc_Other) %>%
  pivot_longer(cols = starts_with("Perc_"), names_to = "Race", values_to = "Percentage")

# Plot
ggplot(long_data, aes(x = Median_Income, y = NYC_Poverty_Rate, color = Race)) +
  geom_point(aes(size = Percentage), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, aes(group = Race, linetype = Race)) +
  labs(
    title = "Median Income vs. Poverty Rate Colored by Racial Demographics",
    x = "Median Income",
    y = "Poverty Rate (%)"
  ) +
  scale_color_manual(values = c("Perc_White" = "red", "Perc_Black" = "blue", "Perc_Asian" = "green", "Perc_Hispanic" = "purple", "Perc_Other" = "orange")) +
  theme_minimal() +
  theme(legend.position = "bottom")
## `geom_smooth()` using formula = 'y ~ x'

Based on the visualization:

  • Low Median Income Areas (Left side):
    Dominated by Hispanic (Purple) and Other (Orange) markers, suggesting these groups often live in areas with lower incomes and higher poverty rates. The Other group averages only about 2.9% of a neighborhood’s residents, though. Its prominence here likely owes something to drawing order: its markers are plotted last, on top of the other groups at the same position.

  • High Median Income Areas (Right side):
    Mostly White (Red), with some Asian (Green) and Other (Orange) markers. These groups tend to live in higher-income areas, yet some of these areas still have above-average poverty rates. Notably, there is very little Hispanic (Purple) presence, indicating fewer Hispanic communities in these wealthier areas.

Interpretation:

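The plot points to a racial income gap that mirrors the racial wealth gap the NFH Tool is designed to highlight. Hispanic communities are more common in lower-income, higher-poverty areas, while White and Asian communities are more prevalent in wealthier areas. This suggests economic disparities tied to racial background, although a scatter plot of this kind shows association, not cause.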
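These visual impressions can be checked numerically. Correlate each group's share with median income and with the poverty rate, using one row per neighborhood. The sketch below uses made-up values for two groups. On the real data, the input would be neighborhood_fin_health_unique with all five Perc_ columns.

```r
library(dplyr)
library(tidyr)

# Made-up neighborhood rows with two of the five racial-share columns
shares_toy <- tibble(
  Median_Income = c(25000, 35000, 50000, 70000),
  NYC_Poverty_Rate = c(0.30, 0.24, 0.18, 0.12),
  Perc_White = c(0.10, 0.25, 0.45, 0.70),
  Perc_Hispanic = c(0.60, 0.40, 0.25, 0.08)
)

# One pair of correlations per racial group
race_cor <- shares_toy %>%
  pivot_longer(starts_with("Perc_"), names_to = "Race", values_to = "Share") %>%
  group_by(Race) %>%
  summarise(
    cor_with_income = cor(Share, Median_Income, use = "complete.obs"),
    cor_with_poverty = cor(Share, NYC_Poverty_Rate, use = "complete.obs")
  )
```

A positive cor_with_income paired with a negative cor_with_poverty for one group, and the reverse for another, is the numeric counterpart of the pattern described above.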