The Community Development Block Grant (CDBG) program has been an essential tool for community development since its inception in 1974. Administered by the US Department of Housing and Urban Development (HUD), CDBG funds are available to urban communities with high concentrations of low- and moderate-income (LMI) residents. HUD uses a formula-based approach that compares census tract incomes with the Area Median Income (AMI) of the Metropolitan Statistical Area (MSA) to which the tract belongs. Income is therefore the deciding factor in determining which census tracts receive CDBG eligibility status.
Although CDBG funds provide a foundation for investment in housing and other community development projects, the prolonged CDBG status of many eligible census tracts is a cause for concern. Because relative income is HUD's criterion for CDBG eligibility, programs focused on reducing poverty and improving job opportunities address some of the central issues facing LMI communities. However, several other factors may affect a community's ability to build wealth, and these factors are left out when the focus is on income or employment alone. This paper investigates some of these other variables and their relationship to CDBG status among census tracts in Pittsburgh. Attention to determinants other than income can strengthen the economic resiliency of a CDBG tract, whether through more creative approaches to income generation or by acknowledging rising costs in LMI communities. This matters because a rise in incomes without a corresponding reduction in costs, for example, can still end a tract's CDBG eligibility. Rising tract incomes may also reflect an influx of wealthier households rather than real wage growth among the existing low-income population. Moreover, a rise in income that ends a tract's eligibility does not mean that its LMI residents no longer need the support the CDBG program provided; it means only that fewer than half of the tract's residents are LMI. Other factors, such as the cost of living, should therefore also be considered, and they are discussed further in this paper.
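The last point can be made concrete with a toy calculation. The sketch below uses entirely hypothetical household counts and treats the fifty-percent cutoff described above as a simplified stand-in for HUD's actual eligibility rules. No LMI household's income changes, yet the tract's LMI share falls below the cutoff once higher-income households move in.

```r
# Hypothetical tract: household counts are illustrative, not ACS data.
lmi_households <- 55
other_households <- 45

# Share of households that are LMI.
lmi_share <- function(lmi, other) lmi / (lmi + other)

# Simplified rule assumed for illustration: eligible while at least half of households are LMI.
is_eligible <- function(share, cutoff = 0.50) share >= cutoff

share_before <- lmi_share(lmi_households, other_households)
# 15 higher-income households move in; the 55 LMI households are unchanged.
share_after <- lmi_share(lmi_households, other_households + 15)

cat(sprintf("Before: %.1f%% LMI, eligible = %s\n", 100 * share_before, is_eligible(share_before)))
cat(sprintf("After:  %.1f%% LMI, eligible = %s\n", 100 * share_after, is_eligible(share_after)))
```

The tract moves from 55.0 percent to about 47.8 percent LMI and loses eligibility, even though the same 55 households still need support.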
The goal of this project was to analyze which factors within City of Pittsburgh census tracts are associated with a higher probability of CDBG status. By identifying statistically significant variables, policymakers and community development professionals may be better able to address the needs of the LMI populations they serve. The analysis is also meant to provide a better understanding of the determinants of CDBG status beyond a simple examination of income within tracts.
Tract-specific data was obtained for census tracts within the City of Pittsburgh using American Community Survey (ACS) five-year estimates for the year 2020. Census tracts with incomplete data were excluded from the analysis. Overall, 94 census tracts were observed, 58 of which were CDBG eligible according to the City of Pittsburgh GIS map of eligible tracts.
A logistic regression model was estimated with CDBG eligibility as the dependent variable, a dichotomous variable coded 1 for CDBG eligible and 0 for not CDBG eligible. The explanatory variables are the percentage of the tract's population that is Black or African American, the percentage of owner-occupied housing, the percentage of the population with a disability, and the percentage of renters paying more than 35 percent of household income toward rent (gross rent as a percentage of household income, or GRAPI).
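Written out, the specification above is a standard binary logit. The symbols below mirror the variable names used in the R code later in the document (race_b, own, grapi, disab), and $p_i$ denotes the probability that tract $i$ is CDBG eligible:

$$
\log\!\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1\,\text{race\_b}_i + \beta_2\,\text{own}_i + \beta_3\,\text{grapi}_i + \beta_4\,\text{disab}_i,
\qquad p_i = \Pr(\text{CDBG}_i = 1)
$$

Exponentiating a coefficient, $e^{\beta_k}$, gives the odds ratio for a one-percentage-point increase in that variable, holding the other variables constant.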
Logistic regression was chosen because the dependent variable is binary. Similar approaches have been used to study Black-White inequality in homeownership entries and exits (Ren 2020) and the likelihood of public housing demolition based on Black occupancy rates (Goetz 2011). All independent variables are expressed as percentages to account for differences in population size across tracts.
The variables fall into two types: demographic indicators and housing indicators. This grouping lets us examine each variable's association with CDBG eligibility individually, collectively, and by type.
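One way to set up the "individually, collectively, and by type" comparison is to build a separate model formula for each grouping. This is a minimal sketch, assuming the variable names used in the R code later in this document; `tract_data` in the commented line is a hypothetical data frame containing those columns.

```r
# Grouping mirrors the demographic/housing split described above.
demographic <- c("race_b", "disab")
housing <- c("own", "grapi")

# Build one model formula per specification with stats::reformulate().
formulas <- list(
  demographic = reformulate(demographic, response = "cdbg_status"),
  housing     = reformulate(housing, response = "cdbg_status"),
  combined    = reformulate(c(demographic, housing), response = "cdbg_status")
)

for (name in names(formulas)) {
  cat(name, ": ", deparse(formulas[[name]]), "\n", sep = "")
}

# With tract data loaded, each specification could then be fit in one call, e.g.:
# fits <- lapply(formulas, glm, family = binomial(link = "logit"), data = tract_data)
```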
Among the demographic indicators, the percentage of Black or African American residents was important to include, not only because of the literature on obstacles to Black wealth accumulation and homeownership sustainability (Ren 2020), but also because of the dynamic currently unfolding in Pittsburgh specifically. The City of Pittsburgh is losing Black residents at a significantly higher rate than its total population, and Black residents still face significant obstacles to homeownership within the City that are likely to be exacerbated by rising home values (Boyle 2022). This variable may also help to illustrate the present-day effects of historical redlining and neighborhood segregation. For example, Crowder and South (2008) used logistic analysis to conclude that a larger minority presence in and around a neighborhood increases the probability of white flight. Including the Black population share lets us assess the role of race in CDBG eligibility, that is, whether tracts with larger Black populations are more likely to be CDBG eligible.
The percentage of the population with a disability is the other demographic indicator. There is a growing literature on the intersection of race, disability, and poverty (Ben-Moshe and Magaña 2014), and disability status can help clarify the demographic makeup of LMI tracts. Disability is also a common barrier to earning income, and therefore a potential influence on prolonged CDBG status in tracts with high proportions of disabled residents.
Housing is an example of the residual costs that limit wealth accumulation and may not be fully captured by income alone. GRAPI, for example, essentially measures the share of cost-burdened to highly cost-burdened renters within a tract. High rent burdens may be offset to some degree by higher incomes, but they could also be reduced by lower housing costs or an increase in affordable rental units. Goetz (2011) uses logistic regression to show that public housing developments with higher Black populations have been more likely to be demolished, limiting the supply of affordable housing in cities. Another way to escape high rents is homeownership, which, as already mentioned, has been out of reach for certain groups, especially LMI and BIPOC households. Homeownership, a possible path toward wealth creation and stability (Boehm and Schlottmann 2022) not just for individuals but for communities, is the final variable observed.
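All four explanatory variables in the model are statistically significant, with 95 percent confidence intervals for their odds ratios that exclude 1, and the direction of each coefficient is consistent with the literature.
The output below is from a linear model that regresses the natural log of median household income on the same variables plus the unemployment rate. It shows how these factors relate to income alone, not just to CDBG status. The variance inflation factors printed beneath the table are all below 2, suggesting that multicollinearity is not a serious concern.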
##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                             log(income)
## -----------------------------------------------
## race_b                       -0.004***
##                               (0.001)
##
## own                           0.008***
##                               (0.001)
##
## unemp                        -0.017***
##                               (0.006)
##
## grapi                        -0.008***
##                               (0.002)
##
## disab                        -0.021***
##                               (0.005)
##
## Constant                     11.221***
##                               (0.105)
##
## -----------------------------------------------
## Observations                     94
## R2                             0.683
## Adjusted R2                    0.665
## Residual Std. Error       0.257 (df = 88)
## F Statistic            37.887*** (df = 5; 88)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
##
##   race_b      own    unemp    grapi    disab
## 1.820554 1.105396 1.242126 1.037586 1.936588
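Because the dependent variable is log income, each coefficient is easiest to read as an approximate percentage change. The sketch below copies the rounded coefficients and VIFs from the output above and converts them; it is a hand calculation on the printed values, not a re-estimation of the model.

```r
# Coefficients copied from the log(income) regression table above (rounded to three decimals).
b <- c(race_b = -0.004, own = 0.008, unemp = -0.017, grapi = -0.008, disab = -0.021)

# In a log-linear model, a one-point increase in x multiplies median income by exp(b),
# i.e. changes it by 100 * (exp(b) - 1) percent, holding the other regressors constant.
pct_change <- 100 * (exp(b) - 1)
print(round(pct_change, 2))

# VIFs copied from the vif(lm_model) output above; common rules of thumb flag values above 5 or 10.
vifs <- c(race_b = 1.820554, own = 1.105396, unemp = 1.242126, grapi = 1.037586, disab = 1.936588)
print(all(vifs < 5))
```

For example, each additional percentage point of residents with a disability is associated with roughly 2.1 percent lower median household income.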
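The confusion matrix below is for the original four-variable model. In the printed table, rows give actual CDBG status and columns give predicted status, and the labels are inverted relative to the coding: "Positive" marks non-eligible tracts (coded 0) and "Negative" marks eligible tracts (coded 1). Read this way, the model correctly classifies 52 of the 58 eligible tracts and 29 of the 36 non-eligible tracts. Because caret treats the table's columns as the reference, overall accuracy and kappa are unaffected, but the class-specific statistics (sensitivity, specificity, predictive values, prevalence) and the no-information rate are computed with predicted and actual status swapped.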
## Confusion Matrix and Statistics
##
##
## cdbg_status Positive Negative
##    Positive       29        7
##    Negative        6       52
##
## Accuracy : 0.8617
## 95% CI : (0.7751, 0.9243)
## No Information Rate : 0.6277
## P-Value [Acc > NIR] : 4.438e-07
##
## Kappa : 0.7058
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.8286
## Specificity : 0.8814
## Pos Pred Value : 0.8056
## Neg Pred Value : 0.8966
## Prevalence : 0.3723
## Detection Rate : 0.3085
## Detection Prevalence : 0.3830
## Balanced Accuracy : 0.8550
##
## 'Positive' Class : Positive
##
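The next matrix is for the second logit model, which adds median household income and the unemployment rate. Adding these two measures raises overall accuracy from 86.2 to 92.6 percent: the model correctly classifies 56 of the 58 CDBG-eligible tracts and 31 of the 36 non-eligible tracts (as in the first table, rows are actual status, columns are predicted status, and "Negative" denotes eligible tracts). Some improvement is expected, since income is the basis of HUD's eligibility formula.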
## Confusion Matrix and Statistics
##
##
## cdbg_status Positive Negative
##    Positive       31        5
##    Negative        2       56
##
## Accuracy : 0.9255
## 95% CI : (0.8526, 0.9695)
## No Information Rate : 0.6489
## P-Value [Acc > NIR] : 3.603e-10
##
## Kappa : 0.8399
##
## Mcnemar's Test P-Value : 0.4497
##
## Sensitivity : 0.9394
## Specificity : 0.9180
## Pos Pred Value : 0.8611
## Neg Pred Value : 0.9655
## Prevalence : 0.3511
## Detection Rate : 0.3298
## Detection Prevalence : 0.3830
## Balanced Accuracy : 0.9287
##
## 'Positive' Class : Positive
##
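In both printed tables, rows come from the actual status (`cdbg_status`, levels 0 then 1) and columns from the predictions, and the code relabels both levels as "Positive" (0, not eligible) and "Negative" (1, eligible). Because caret's `confusionMatrix()` treats a table's columns as the reference classes, its class-specific statistics here are computed with predicted and actual status swapped. The sketch below recomputes them by hand from the printed counts, treating "eligible" as the positive class; `class_stats()` is a helper written for this illustration.

```r
# tp: eligible predicted eligible; fn: eligible predicted not eligible;
# fp: not eligible predicted eligible; tn: not eligible predicted not eligible.
class_stats <- function(tp, fn, fp, tn) {
  n <- tp + fn + fp + tn
  c(accuracy    = (tp + tn) / n,
    sensitivity = tp / (tp + fn),   # share of eligible tracts predicted eligible
    specificity = tn / (tn + fp),   # share of non-eligible tracts predicted not eligible
    ppv         = tp / (tp + fp),   # share of predicted-eligible tracts that are eligible
    nir         = max(tp + fn, fp + tn) / n)  # accuracy of always guessing the majority class
}

# Counts read from the two printed tables above.
model1 <- class_stats(tp = 52, fn = 6, fp = 7, tn = 29)
model2 <- class_stats(tp = 56, fn = 2, fp = 5, tn = 31)
print(round(rbind(model1, model2), 4))
```

The accuracies reproduce caret's 0.8617 and 0.9255, which is a useful check on the reading of the tables; the no-information rate is 58/94 (about 0.617) for both models, since eligible tracts are the majority class.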
The odds ratio for the percentage of Black or African American residents indicates that each one-percentage-point increase in the Black population share is associated with an increase of about 16 percent in the odds that a tract is CDBG eligible. That is, the higher the percentage of the tract that is Black or African American, the higher the odds that the tract will be CDBG eligible. This positive association between the Black population and CDBG eligibility is a troubling finding for the City of Pittsburgh, and is likely a result of historical discrimination, redlining, and lack of investment in communities of color.
The odds ratio for owner-occupancy is below 1, indicating a negative association, which follows logically: the higher the percentage of homeowners in a tract, the less likely the tract is to be CDBG eligible. In this analysis, each one-percentage-point increase in owner-occupancy is associated with a decrease of about 9 percent in the odds of CDBG eligibility (an odds ratio of 0.906). Homeownership is a potential path toward higher household wealth (Boehm and Schlottmann 2022), and therefore a potential point of focus for policymakers looking to improve the wealth of lower-income urban communities.
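An odds ratio acts on the odds, not the probability, so the same 0.906 implies different probability changes depending on where a tract starts. The sketch below applies the owner-occupancy odds ratio from this document to a few hypothetical baseline probabilities chosen purely for illustration.

```r
# Odds ratio for owner-occupancy, taken from the odds-ratio table in this document.
or_own <- 0.906

# Convert between probability and odds.
to_odds <- function(p) p / (1 - p)
to_prob <- function(odds) odds / (1 + odds)

# Hypothetical baseline probabilities of CDBG eligibility (illustrative only).
baseline <- c(0.10, 0.50, 0.90)
after_one_point <- to_prob(to_odds(baseline) * or_own)

print(data.frame(baseline,
                 after_one_point = round(after_one_point, 4),
                 change_pp = round(100 * (after_one_point - baseline), 2)))
```

A tract at a 50 percent baseline drops to about 47.5 percent, roughly 2.5 percentage points, while tracts near 10 or 90 percent move by less than one point.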
The other housing variable, the percentage of renters paying more than 35 percent of their income toward rent, shows that each one-point increase in this share is associated with an 8 percent increase in the odds that the tract is CDBG eligible. On average, 41 percent of renters in the CDBG-eligible tracts observed paid over 35 percent of their income toward rent, compared with just 31 percent of renters in non-CDBG tracts. This is a meaningful statistic given the lower owner-occupancy rates in CDBG tracts: on average, these tracts have more renters, and a larger share of those renters pay a high percentage of their income toward rent, even with lower property values.
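Odds ratios compound multiplicatively, so the 10-point gap between the group averages (41 versus 31 percent) corresponds to more than ten times the per-point effect. The sketch below is a model-implied illustration that holds the other variables constant; it is not an estimate of the actual difference between eligible and non-eligible tracts, which also differ on the other variables.

```r
# Odds ratio for GRAPI over 35 percent, from the odds-ratio table in this document.
or_grapi <- 1.081

# Difference in average cost-burdened renter shares reported in the text, in percentage points.
gap <- 41 - 31

# Correct: a k-point increase multiplies the odds by OR^k.
odds_multiplier <- or_grapi^gap
# Incorrect shortcut for comparison: adding the 8.1 percent per-point effect k times.
naive_multiplier <- 1 + gap * (or_grapi - 1)

cat(sprintf("Compounded: x%.2f  Naive additive: x%.2f\n", odds_multiplier, naive_multiplier))
```

The compounded effect roughly doubles the odds (about 2.18 times), while the additive shortcut understates it at 1.81 times.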
The largest per-point odds ratio in this model belongs to the share of the population with a disability: each one-point increase in that share is associated with an increase of about 44 percent in the odds of CDBG eligibility. This result is in line with the literature on the correlation between disability and poverty (Brucker et al. 2015) and suggests that LMI communities face obstacles beyond employment and income. The intersection of disability, poverty, and race can exacerbate or reproduce the inequalities these communities face (Ben-Moshe and Magaña 2014).
##
## ==================================
##               OR   2.5 %   97.5 %
## ----------------------------------
## (Intercept) 0.020  0.001    0.299
## race_b      1.161  1.064    1.314
## own         0.906  0.851    0.950
## grapi       1.081  1.027    1.151
## disab       1.435  1.195    1.849
## ----------------------------------
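The table above can be turned into the percentage statements used in the text, and it also shows why each variable counts as significant: its interval excludes 1. The sketch below copies the rounded values from the table. Note that `confint()` on a `glm` object computes profile-likelihood intervals, which usually agree with, but are not identical to, the Wald tests reported by `summary()`.

```r
# Odds ratios and 95% confidence limits copied from the table above (rounded to three decimals).
or_table <- data.frame(
  term  = c("race_b", "own", "grapi", "disab"),
  or    = c(1.161, 0.906, 1.081, 1.435),
  lower = c(1.064, 0.851, 1.027, 1.195),
  upper = c(1.314, 0.950, 1.151, 1.849)
)

# Percent change in the odds of CDBG eligibility per one-point increase.
or_table$pct_change <- round(100 * (or_table$or - 1), 1)

# An interval entirely above or entirely below 1 indicates significance at the 5% level.
or_table$ci_excludes_1 <- or_table$lower > 1 | or_table$upper < 1

print(or_table)
```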
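Figure: Correlation plot matrix of all variables in the dataset.
Figure: Correlation plot matrix of CDBG status and selected model variables (percent Black or African American, percent owner-occupied, median household income, percent with a disability, and unemployment rate).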
The logistic regression model found statistically significant relationships between CDBG eligibility for census tracts within the City of Pittsburgh and four independent variables: the Black population share, the owner-occupancy rate, the share of renters whose gross rent exceeds 35 percent of income, and the share of residents with a disability. The Black population share, the disabled population share, and GRAPI over 35 percent were positively associated with a tract being CDBG eligible. Owner-occupancy was the only variable with an odds ratio below 1, meaning that higher owner-occupancy was associated with lower odds of CDBG eligibility. These results are consistent with the literature on low-income communities and wealth generation. In that sense, this paper supports targeted policies within communities that focus on improving paths to homeownership, improving accessibility, and addressing racial discrimination.
This study was limited in sample size; a more robust analysis would include a larger number of census tracts. Future research may also benefit from including variables that measure neighborhood assets or historical redlining practices. Hopefully, this model can serve as a template for analysis in other cities.
Boehm, Thomas P., and Alan Schlottmann. 2022. "Wealth Accumulation and Homeownership: Evidence for Low-Income Households."
Boyle, John. 2022. "Taking Stock: A Decade in Decline for Black Homeownership in Pittsburgh." Pittsburgh: Pittsburgh Community Reinvestment Group (PCRG).
Brucker, Debra L., Sophie Mitra, Navena Chaitoo, and Joseph Mauro. 2015. "More Likely to Be Poor Whatever the Measure: Working-Age Persons with Disabilities in the United States."
Crowder, Kyle, and Scott J. South. 2008. "Spatial Dynamics of White Flight: The Effects of Local and Extralocal Racial Conditions on Neighborhood Out-Migration."
Goetz, Edward. 2011. "Gentrification in Black and White: The Racial Impact of Public Housing Demolition in American Cities."
Ren, Chunhui. 2020. "A Framework for Explaining Black-White Inequality in Homeownership Sustainability."
rm(list = ls())  # start from an empty workspace
gc()             # release freed memory
cat("\f")        # clear the RStudio console
packages <- c("readr",      # open csv
              "psych",      # quick summary stats for data exploration
              "stargazer",  # summary stats for sharing
              "tidyverse",  # data manipulation like selecting variables
              "corrplot",   # correlation plots
              "ggplot2",    # graphing
              "ggcorrplot", # correlation plot
              "gridExtra",  # overlay plots
              "data.table", # reshape for graphing
              "car",        # vif
              "prettydoc",  # html output
              "visdat",     # visualize missing variables
              "glmnet",     # lasso/ridge
              "caret",      # confusion matrix
              "MASS",       # stepAIC; loaded after tidyverse, so MASS::select masks dplyr::select
              "plm",        # fixed effects demeaned regression
              "lmtest",     # test regression coefficients
              "cvms"        # cross-validation for model selection
)
# Install any missing packages, then load each one
for (pkg in packages) {
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg,
                     repos = "https://cran.rstudio.com/",
                     dependencies = TRUE)
  }
  library(pkg, character.only = TRUE)
}
rm(packages, pkg)
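# Project directory on the author's machine; change to your local path
setwd("/Users/matthewcolantonio/Documents/Research/cdbg logit project/")
pgh_cdbg_v05 <- read_csv("cdbg_logit_pgh_v05.csv")
if (interactive()) View(pgh_cdbg_v05)  # open the data viewer only in interactive sessions
data.clean <- pgh_cdbg_v05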
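# Model variables as standalone vectors. They are not columns of data.clean, so glm(),
# lm(), and predict() find them in the formula's environment (the global environment).
cdbg_status <- as.factor(data.clean$`CDBG Status`)  # 1 = CDBG eligible, 0 = not eligible
race_b <- data.clean$`Black or African-American (%)`
vac <- data.clean$`Vacant Housing Units (%)`         # exploratory, not used in the models
own <- data.clean$`% Owner-Occupied`
grapi <- data.clean$`Gross Rent as % of HH Income (GRAPI) over .35 (%)`
disab <- data.clean$`With Disability (%)`
race_w <- data.clean$`White (%)`                     # exploratory, not used in the models
unemp <- data.clean$`Unemployment Rate`
edu <- data.clean$`% w/ Bachelor's Degree (pop 25+)` # exploratory, not used in the models
income <- data.clean$`Median Household Income`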
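# Model 1: CDBG eligibility on the four demographic and housing indicators
logit <- glm(cdbg_status ~ race_b
             + own
             + grapi
             + disab,
             data = data.clean,
             family = binomial(link = "logit"))
summary(logit)
stargazer(logit, type = 'text')
vif(logit)  # variance inflation factors (car)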
# Model 2: adds the unemployment rate and median household income
logit2 <- glm(cdbg_status ~ race_b
              + own
              + unemp
              + income
              + grapi
              + disab,
              data = data.clean,
              family = binomial(link = "logit"))
summary(logit2)
vif(logit2)
# Linear model of log median household income on the same indicators plus unemployment
lm_model <- lm(log(income) ~ race_b
               + own
               + unemp
               + grapi
               + disab,
               data = data.clean)
stargazer(lm_model, type = 'text')
vif(lm_model)
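# Classify a tract as eligible when the linear predictor (log-odds) is > 0,
# i.e. when its predicted probability exceeds 0.5
data.clean$PREDICTED <- ifelse(test = predict(object = logit, newdata = data.clean) > 0,
                               yes = 1,
                               no = 0)
data.clean$PREDICTED
describe(predict(object = logit, newdata = data.clean))                     # log-odds scale
describe(predict(object = logit, newdata = data.clean, type = "response"))  # probability scale
# caret::confusionMatrix() reads a table as rows = predictions, columns = reference,
# so build it in that orientation with eligible (1) as the first (positive) class
confusion_matrix1 <- table(Predicted = factor(data.clean$PREDICTED, levels = c(1, 0),
                                              labels = c("Eligible", "Not eligible")),
                           Actual = factor(cdbg_status, levels = c("1", "0"),
                                           labels = c("Eligible", "Not eligible")))
confusion_matrix1
stargazer(confusion_matrix1, type = 'text')
confusionMatrix(confusion_matrix1, positive = "Eligible")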
# Same classification and confusion matrix for Model 2
data.clean$PREDICTED <- ifelse(test = predict(object = logit2, newdata = data.clean) > 0,
                               yes = 1,
                               no = 0)
data.clean$PREDICTED
describe(predict(object = logit2, newdata = data.clean))                     # log-odds scale
describe(predict(object = logit2, newdata = data.clean, type = "response"))  # probability scale
confusion_matrix2 <- table(Predicted = factor(data.clean$PREDICTED, levels = c(1, 0),
                                              labels = c("Eligible", "Not eligible")),
                           Actual = factor(cdbg_status, levels = c("1", "0"),
                                           labels = c("Eligible", "Not eligible")))
confusion_matrix2
stargazer(confusion_matrix2, type = 'text')
confusionMatrix(confusion_matrix2, positive = "Eligible")
# Odds ratios: exponentiate the coefficients and their profile-likelihood 95% CIs
odds <- exp(cbind(OR = coef(logit), confint(logit)))
stargazer(odds, type = 'text')
# Same for logit2
odds2 <- exp(cbind(OR = coef(logit2), confint(logit2)))
stargazer(odds2, type = 'text')
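# Correlations among all dataset columns (assumes every column is numeric),
# excluding the PREDICTED column added above
mycorr <- cor(x = data.clean[, setdiff(names(data.clean), "PREDICTED")])
ggcorrplot(mycorr, hc.order = TRUE,
           type = "lower", lab = TRUE,
           lab_size = 1.5, method = "square",
           tl.cex = 7.5,
           pch = 4,
           colors = c("#6D9EC1", "white", "#E46726"),
           title = "Correlation Plot Matrix")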
# Correlations among CDBG status and selected model variables
mycorr2 <- cor(data.clean[, c("CDBG Status", "Black or African-American (%)", "% Owner-Occupied",
                              "Median Household Income", "With Disability (%)", "Unemployment Rate")])
ggcorrplot(mycorr2, hc.order = TRUE,
           type = "lower", lab = TRUE,
           lab_size = 1.5, method = "square",
           tl.cex = 7.5,
           pch = 4,
           colors = c("#6D9EC1", "white", "#E46726"),
           title = "Correlation Plot Matrix: Selected Variables")