Analysis:
library(dplyr)
library(tidyr)
library(tidyverse)
url <- "https://raw.githubusercontent.com/hbedros/data607_prj2/main/df2/Neighborhood_Financial_Health.csv"
neighborhood_fin_health <- read_csv(url)
## Rows: 385 Columns: 52
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): Borough, Neighborhoods, CD, Goal, GoalName, GoalFullName, GoalRank...
## dbl (26): Year Published, PUMA, Join, NYC_Poverty_Rate, Median_Income, Perc_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Data Organization:
# First converting all outcome columns to numeric so we can consolidate columns
outcome_cols <- grep("Ind\\dOutcome$", names(neighborhood_fin_health), value = TRUE)
neighborhood_fin_health[outcome_cols] <- lapply(neighborhood_fin_health[outcome_cols], as.numeric)
## Warning in lapply(neighborhood_fin_health[outcome_cols], as.numeric): NAs
## introduced by coercion
## Warning in lapply(neighborhood_fin_health[outcome_cols], as.numeric): NAs
## introduced by coercion
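The coercion warnings above mean that some outcome values could not be parsed as numbers. A minimal sketch of how the offending strings could be inspected before they are silently turned into NA is shown here. The vector `raw` is invented toy data standing in for one outcome column; the actual non-numeric values in the dataset are not shown in this report.

```r
# Toy stand-in for one outcome column (hypothetical values, not from the dataset).
raw <- c("12.5", "7", "N/A", "3.1", "*")

# Convert quietly, then flag entries that were present but failed to parse.
converted <- suppressWarnings(as.numeric(raw))
failed <- !is.na(raw) & is.na(converted)

# Tabulate the offending strings to decide how they should be handled.
table(raw[failed])
```

Applied to each column in `outcome_cols`, a check like this would show whether the NAs come from placeholders such as suppressed values or from formatting such as percent signs.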
# Using pivot_longer() from `tidyr` to transform data from wide to long format
neighborhood_fin_health_cln <- neighborhood_fin_health %>%
  # Reshaping indicator names
  # Here I'm joining columns Ind1, Ind2, etc...
  # And columns Ind1Outcome, Ind2Outcome, etc...
  pivot_longer(cols = starts_with("Ind") & ends_with("Outcome"),
               names_to = "Indicators",
               values_to = "Outcomes") %>%
  # Adjusting 'Indicators' column to only have 'Ind1', 'Ind2', etc.
  mutate(Indicators = str_extract(Indicators, "Ind\\d")) %>%
  # Reshaping definitions
  pivot_longer(cols = starts_with("Ind") & ends_with("Definition"),
               names_to = "Ind_Def",
               values_to = "Definitions") %>%
  # Making sure definitions align with indicators
  filter(Indicators == str_extract(Ind_Def, "Ind\\d")) %>%
  # Dropping redundant columns
  select(-Ind_Def) %>%
  # Ordering columns for better readability
  select(`Year Published`, PUMA, Borough, Neighborhoods, CD, Join, NYC_Poverty_Rate, Median_Income,
         Perc_White, Perc_Black, Perc_Asian, Perc_Hispanic, Perc_Other, Goal, GoalName, GoalFullName,
         TotalOutcome, GoalRank, IndexScore, ScoreRank, Indicators, Outcomes, Definitions)
# Check the transformed data
head(neighborhood_fin_health_cln)
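The reshaping above pivots twice and then filters the cross product back down to matching indicator/definition pairs. As an alternative sketch, `pivot_longer()` can pull both pieces out in one pass using the special `".value"` name. The toy data frame below is invented and assumes the columns follow the pattern `Ind<k>Outcome` / `Ind<k>Definition`; the resulting columns are named `Outcome` and `Definition` rather than `Outcomes` and `Definitions`.

```r
library(tidyr)

# Toy data mimicking the assumed naming scheme (values are made up).
toy <- data.frame(
  PUMA = c(3701, 3702),
  Ind1Outcome = c(0.10, 0.20),
  Ind1Definition = c("def A", "def A"),
  Ind2Outcome = c(5, 6),
  Ind2Definition = c("def B", "def B")
)

# The first capture group becomes the Indicators column; the second one
# (".value") decides which output column each value is written to.
toy_long <- pivot_longer(
  toy,
  cols = matches("^Ind\\d(Outcome|Definition)$"),
  names_to = c("Indicators", ".value"),
  names_pattern = "^(Ind\\d)(Outcome|Definition)$"
)

toy_long
```

This avoids building the intermediate cross product, which may matter little at 385 rows but keeps each outcome paired with its definition by construction.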
1. How do the NFH indicators relate to economic and demographic
factors, especially in terms of race and income?
# Descriptive stats for Median_Income of our df
summary(neighborhood_fin_health_cln$Median_Income)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14213 22264 26140 31238 32210 79181
# Percentage distribution for racial groups
mean(neighborhood_fin_health_cln$Perc_White, na.rm = TRUE) # on average 32.34% of the population in the neighborhoods covered by the dataset identify as White.
## [1] 0.3233636
mean(neighborhood_fin_health_cln$Perc_Black, na.rm = TRUE) # on average, 22.51% of the population in the neighborhoods identify as Black.
## [1] 0.2250545
mean(neighborhood_fin_health_cln$Perc_Hispanic, na.rm = TRUE) # on average, 29.17% of the population in the neighborhoods identify as Hispanic.
## [1] 0.2916909
mean(neighborhood_fin_health_cln$Perc_Asian, na.rm = TRUE) # on average, 13.13% of the population in the neighborhoods identify as Asian.
## [1] 0.1313091
mean(neighborhood_fin_health_cln$Perc_Other, na.rm = TRUE) # on average, 2.86% of the population in the neighborhoods identify as Other.
## [1] 0.02863636
# using the dplyr package we're summarizing the clean df a little further
# by selecting unique values
neighborhood_fin_health_unique <- neighborhood_fin_health_cln %>%
  group_by(`Year Published`, PUMA, Borough, Neighborhoods, CD, NYC_Poverty_Rate,
           Median_Income, Perc_White, Perc_Black, Perc_Asian, Perc_Hispanic, Perc_Other) %>%
  slice(1) %>%
  ungroup()
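The `group_by()` + `slice(1)` step keeps the first row of each combination of key columns. A possible shorthand is `distinct()` with `.keep_all = TRUE`, sketched here on invented toy data with only two key columns. One difference to keep in mind: the grouped version returns rows sorted by the grouping keys, while `distinct()` keeps them in order of first appearance.

```r
library(dplyr)

# Toy long-format data: each PUMA repeats once per indicator (made-up values).
toy_long <- data.frame(
  PUMA = c(3701, 3701, 3702, 3702),
  Median_Income = c(20000, 20000, 50000, 50000),
  Indicators = c("Ind1", "Ind2", "Ind1", "Ind2"),
  Outcomes = c(0.1, 5, 0.2, 6)
)

# Approach used in this analysis: first row per group.
by_slice <- toy_long %>%
  group_by(PUMA, Median_Income) %>%
  slice(1) %>%
  ungroup()

# Shorthand: first row per distinct key combination, other columns kept.
by_distinct <- toy_long %>%
  distinct(PUMA, Median_Income, .keep_all = TRUE)
```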
# let's check top 5 areas with the highest median income
top_5_median_income <- neighborhood_fin_health_unique %>%
  arrange(desc(Median_Income)) %>%
  head(5) %>%
  select(Borough, Neighborhoods, Median_Income, Perc_White, Perc_Black, Perc_Hispanic, Perc_Asian, Perc_Other)
print(top_5_median_income)
## # A tibble: 5 × 8
## Borough Neighborhoods Median_Income Perc_White Perc_Black Perc_Hispanic
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Manhattan Battery Park City… 79181 0.716 0.022 0.07
## 2 Manhattan Murray Hill, Gram… 73067 0.694 0.032 0.087
## 3 Manhattan Upper East Side 71933 0.761 0.028 0.09
## 4 Manhattan Chelsea, Clinton … 65905 0.609 0.054 0.146
## 5 Manhattan Upper West Side &… 65844 0.677 0.06 0.144
## # ℹ 2 more variables: Perc_Asian <dbl>, Perc_Other <dbl>
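The printed tibble above cuts off `Perc_Asian` and `Perc_Other` because of the console width. A small sketch of how all columns could be shown is given here; the tibble `toy` is a made-up single row, not taken from the dataset.

```r
library(tibble)

# Hypothetical one-row tibble with the same column names as the output above.
toy <- tibble(
  Borough = "Manhattan",
  Neighborhoods = "Example neighborhood with a fairly long name",
  Median_Income = 70000,
  Perc_White = 0.70, Perc_Black = 0.03, Perc_Hispanic = 0.08,
  Perc_Asian = 0.15, Perc_Other = 0.04
)

# width = Inf asks the tibble print method to show every column,
# wrapping onto further lines instead of listing the hidden ones in a footer.
print(toy, width = Inf)
```

Calling `print(top_5_median_income, width = Inf)` in the same way would make the Asian and Other shares visible in the output.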
Interpretation: The output lists the five neighborhoods with the
highest median incomes, all of which are in Manhattan. The racial
makeup of each area is as follows:
Battery Park City, Greenwich Village & Soho: The
wealthiest neighborhood, with a median income of
$79,181. The majority (71.6%) of its
population is White, followed by Asian
(15.9%). Only a small percentage (2.2%) is
Black, and 7% are Hispanic.
Murray Hill, Gramercy & Stuyvesant Town: With a
median income of $73,067, this neighborhood’s
majority is also White at 69.4%, and
Asian residents account for 15.8%.
Black residents make up 3.2%, slightly
higher than the previous area.
Upper East Side: Boasting a median income of
$71,933, this iconic Manhattan neighborhood is
primarily White (76.1%). Interestingly, its
Asian population (9.3%) is lower
compared to the other top-income areas, while the Black
population remains relatively low at 2.8%.
Chelsea, Clinton & Midtown Business District: With a
median income of $65,905, this area has a somewhat more diverse
racial mix, though it is still predominantly White (60.9%).
Its Asian and Black shares are the highest among the first four
areas, at 16.2% and 5.4%, respectively.
Upper West Side & West Side: Trailing the previous
neighborhood by only $61, with a median income of
$65,844, this area is 67.7% White. It has a
slightly higher percentage of Black residents
(6%) and an Asian population (9.2%) comparable to
that of the Upper East Side.
In summary, these affluent Manhattan neighborhoods are predominantly
White, with varying levels of Asian, Black, and Hispanic populations.
This distribution may offer insights into socioeconomic patterns in
Manhattan’s high-income areas.
# Correlation with median income and racial demographics
# Grouping neighborhood_fin_health_cln by Indicators and calculating the correlation of Outcomes with median income and each demographic share, storing the results in cor_results.
cor_results <- neighborhood_fin_health_cln %>%
  group_by(Indicators) %>%
  summarise(
    cor_income = cor(Median_Income, Outcomes, use = "complete.obs"),
    cor_perc_white = cor(Perc_White, Outcomes, use = "complete.obs"),
    cor_perc_black = cor(Perc_Black, Outcomes, use = "complete.obs"),
    cor_perc_asian = cor(Perc_Asian, Outcomes, use = "complete.obs"),
    cor_perc_hispanic = cor(Perc_Hispanic, Outcomes, use = "complete.obs"),
    cor_perc_other = cor(Perc_Other, Outcomes, use = "complete.obs")
  )
# Transforming cor_results into a long format, separating demographic columns into distinct rows with their associated correlation values in cor_plot
cor_plot <- cor_results %>%
  pivot_longer(cols = -Indicators, names_to = "Demographics", values_to = "Correlation")
# Plot
ggplot(cor_plot, aes(x = Indicators, y = Correlation, fill = Demographics)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Correlation of NFH Indicators with Economic and Demographic Factors",
       y = "Correlation Coefficient")

Interpretation:
- A positive cor_income value means that the financial health indicator
tends to increase as median income increases; a negative value means it
tends to decrease. The closer the coefficient is to 1 or -1, the
stronger the linear relationship.
- For racial demographics (cor_perc_white, cor_perc_black, etc.), a
positive correlation means that as the percentage of a particular race
increases in a PUMA, the outcome of that financial health indicator also
tends to increase, and vice versa.
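The coefficients computed with `cor()` describe direction and strength only. Whether a given coefficient is distinguishable from zero could be checked with base R's `cor.test()`, as in this sketch. The vectors `income` and `outcome` are simulated, and the sample size of 50 is arbitrary; none of it is taken from the dataset.

```r
set.seed(607)

# Simulated stand-ins for Median_Income and one indicator's Outcomes.
income <- runif(50, min = 14000, max = 80000)
outcome <- 0.5 * as.numeric(scale(income)) + rnorm(50)

# Pearson correlation with a test of H0: true correlation is 0.
ct <- cor.test(income, outcome)

ct$estimate   # the same coefficient cor() would return
ct$p.value    # p-value of the test
ct$conf.int   # 95% confidence interval for the correlation
```

In the real analysis this would be run once per indicator, for example inside `summarise()` after `group_by(Indicators)`.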
2. How do the poverty rates, incomes, and racial demographics of
each area relate to the median income and poverty score?
# Plot for median income vs poverty rate for the df
ggplot(neighborhood_fin_health_cln, aes(x = Median_Income, y = NYC_Poverty_Rate)) +
  geom_point(aes(color = "Overall"), size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(
    title = "Median Income vs. Poverty Rate",
    x = "Median Income",
    y = "Poverty Rate (%)",
    color = "Demographic"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
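The straight line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit of poverty rate on median income. To read off its slope and fit quality as numbers, the same model could be fitted directly with `lm()`. The sketch below uses simulated data with an assumed negative relationship; the coefficients it prints say nothing about the real dataset.

```r
set.seed(1)

# Simulated neighborhoods: poverty rate falls as income rises (assumption).
toy <- data.frame(Median_Income = seq(15000, 80000, length.out = 30))
toy$NYC_Poverty_Rate <- 30 - 0.0002 * toy$Median_Income + rnorm(30, sd = 1)

# Same model that geom_smooth(method = "lm") fits with formula y ~ x.
fit <- lm(NYC_Poverty_Rate ~ Median_Income, data = toy)

coef(fit)               # intercept and slope per dollar of median income
summary(fit)$r.squared  # share of variance explained by the line
```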

In the scatter plot below:
- Each racial group is represented by a different color.
- The size of the points represents the percentage of that racial
group in the neighborhood.
- Different line types in the regression lines help to differentiate
the racial groups.
# Median Income vs. Poverty Rate colored by racial demographic percentages:
# Reshape the data
long_data <- neighborhood_fin_health_cln %>%
  select(Median_Income, NYC_Poverty_Rate, Perc_White, Perc_Black, Perc_Asian, Perc_Hispanic, Perc_Other) %>%
  gather(key = "Race", value = "Percentage", -Median_Income, -NYC_Poverty_Rate)
# Plot
ggplot(long_data, aes(x = Median_Income, y = NYC_Poverty_Rate, color = Race)) +
  geom_point(aes(size = Percentage), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, aes(group = Race, linetype = Race)) +
  labs(
    title = "Median Income vs. Poverty Rate Colored by Racial Demographics",
    x = "Median Income",
    y = "Poverty Rate (%)"
  ) +
  scale_color_manual(values = c("Perc_White" = "red", "Perc_Black" = "blue",
                                "Perc_Asian" = "green", "Perc_Hispanic" = "purple",
                                "Perc_Other" = "orange")) +
  theme_minimal() +
  theme(legend.position = "bottom")
## `geom_smooth()` using formula = 'y ~ x'
Based on the visualization:
Low Median Income Areas (Left side):
Dominated by Hispanic (Purple) and Other (Orange) points, suggesting
that these groups more often live in areas with lower incomes and
higher poverty rates.
High Median Income Areas (Right side):
Mostly White (Red), with some Asian (Green) and Other (Orange) points.
These groups tend to live in higher-income areas, yet some of those
areas still show above-average poverty rates. Notably, there is very
little Hispanic (Purple) presence, indicating fewer Hispanic
communities in these wealthier areas.
Interpretation:
The plot suggests a racial wealth gap. Hispanic communities are more
common in lower-income, higher-poverty areas, while White and Asian
communities are more prevalent in wealthier areas. This points to
economic disparities associated with racial background.
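The reading of the plot rests on point colors and sizes, which can be hard to judge when points overlap. One way to back it with numbers would be to correlate each racial share with median income, one row per neighborhood. The sketch below uses a small invented data frame; on the real data, `neighborhood_fin_health_unique` would take its place, and the signs could differ.

```r
library(dplyr)

# Invented neighborhoods, ordered from low to high median income.
toy <- data.frame(
  Median_Income = c(18000, 22000, 30000, 65000, 79000),
  Perc_White    = c(0.05, 0.10, 0.30, 0.61, 0.72),
  Perc_Hispanic = c(0.65, 0.55, 0.30, 0.15, 0.07)
)

# One correlation with Median_Income per Perc_* column.
race_income_cor <- toy %>%
  summarise(across(starts_with("Perc_"), ~ cor(.x, Median_Income)))

race_income_cor
```

A positive value for a group would indicate that its share tends to be larger in higher-income neighborhoods, and a negative value the opposite.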