DATA606 - Final Data Project

December 8, 2020

HUD and Homeless Services Data

Overview

Core Concept: Are per capita CoC award amounts predictive of reduced populations of people experiencing homelessness over a five-year period?

Dependent Variable: Population of people experiencing homelessness in a community

Independent Variable: Per capita amount of funding through the HUD Continuum of Care (CoC) program, by Continuums of Care in U.S. states and territories

Setting the Stage:

Where’s data coming from?
- Point-in-Time counts
- Smoothing edges by taking and analyzing averages
What’s a CoC?
- Somewhat arbitrary HUD jurisdiction for some federal awards
- Competitive HUD program/funding stream
Why it’s important, or might be
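
The averaging step mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual preparation code: the long-format data frame `pitLong`, its columns `coc`, `year`, `pop`, and `fund`, and all values are hypothetical. Only the output names `avgPop` and `avgFund` come from the analysis.

```r
# Hypothetical long-format data: one row per CoC per year (values are invented).
pitLong <- data.frame(
  coc  = rep(c("AA-500", "BB-501"), each = 3),
  year = rep(2015:2017, times = 2),
  pop  = c(100, 120, 110, 900, 1000, 1100),  # point-in-time count
  fund = c(50, 60, 70, 10, 20, 30)           # per capita CoC award
)

# Average each CoC's yearly values to dampen year-to-year swings.
pitAvg <- aggregate(cbind(pop, fund) ~ coc, data = pitLong, FUN = mean)
names(pitAvg) <- c("coc", "avgPop", "avgFund")
print(pitAvg)
```

Each CoC then contributes a single row, which is the shape the plots and models in this deck expect.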

Initial Impressions - All Communities

library(ggplot2)
# Scatter plot of average population against funding, with a least-squares line
ggplot(pit, aes(avgFund, avgPop)) +
  geom_point() +
  xlab("Average per Capita Funding (CoC)") +
  ylab("Average Population") +
  geom_smooth(method = "lm", formula = y ~ x) +
  ggtitle("Homeless Population vs per capita CoC Funding")

Initial Impressions - Outliers Dropped

ggplot(pitSub, aes(avgFund, avgPop)) +
  geom_point() +
  xlab("Average per Capita Funding (CoC)") +
  ylab("Average Population") +
  geom_smooth(method = "lm", formula = y ~ x) +
  ggtitle("Homeless Population vs per capita CoC Funding")
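
The deck does not show how `pitSub` was built. The degrees of freedom in the correlation tests fall from 381 to 379, which suggests two communities were removed. One possible rule, shown here purely as an assumption, is to drop the communities with the largest average populations. The data frame `pitDemo`, its values, and `nDrop` are hypothetical.

```r
# Invented data standing in for `pit`; two rows have very large avgPop.
pitDemo <- data.frame(
  avgFund = c(5, 12, 8, 20, 15, 9),
  avgPop  = c(300, 450, 70000, 520, 48000, 610)
)

# Assumed rule (not confirmed by the source): drop the nDrop communities
# with the largest average population.
nDrop <- 2
keep <- rank(-pitDemo$avgPop, ties.method = "first") > nDrop
pitSubDemo <- pitDemo[keep, ]
print(pitSubDemo)
```

Whatever rule is used, stating it next to the plot makes the "Outliers Dropped" results reproducible.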

Correlation Tables - All Communities

cor.test(pit$avgPop, pit$avgFund)

## 
##  Pearson's product-moment correlation
## 
## data:  pit$avgPop and pit$avgFund
## t = -1.4848, df = 381, p-value = 0.1384
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.17472641  0.02454533
## sample estimates:
##         cor 
## -0.07584781

Correlation Tables - Outliers Dropped

cor.test(pitSub$avgPop, pitSub$avgFund)

## 
##  Pearson's product-moment correlation
## 
## data:  pitSub$avgPop and pitSub$avgFund
## t = -1.7494, df = 379, p-value = 0.08103
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1882776  0.0110680
## sample estimates:
##         cor 
## -0.08950108

Cor: -0.090 (vs. -0.076 for All Communities)
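
The t statistics and p-values in both tables follow directly from the correlation and the degrees of freedom, via t = r * sqrt(df) / sqrt(1 - r^2). The sketch below recomputes them from the reported values; the helper name `corT` is introduced here for illustration only.

```r
# t statistic for a Pearson correlation r with df = n - 2 degrees of freedom.
corT <- function(r, df) r * sqrt(df) / sqrt(1 - r^2)

# Correlations and degrees of freedom copied from the two cor.test outputs.
tAll <- corT(-0.07584781, 381)
tSub <- corT(-0.08950108, 379)

# Two-sided p-values from the t distribution.
pAll <- 2 * pt(-abs(tAll), df = 381)
pSub <- 2 * pt(-abs(tSub), df = 379)

print(round(c(tAll = tAll, tSub = tSub, pAll = pAll, pSub = pSub), 4))
```

The results should agree with the reported t and p-values up to rounding, which confirms that neither correlation is significant at the 0.05 level.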

Linear Regression Model - All Communities

pitModel <- lm(avgPop ~ avgFund, data = pit)
summary(pitModel)

## 
## Call:
## lm(formula = avgPop ~ avgFund, data = pit)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -1728  -1163   -794    -25  74794 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1864.24905  374.77714   4.974 9.93e-07 ***
## avgFund       -0.08808    0.05932  -1.485    0.138    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4775 on 381 degrees of freedom
## Multiple R-squared:  0.005753,   Adjusted R-squared:  0.003143 
## F-statistic: 2.205 on 1 and 381 DF,  p-value: 0.1384

R-squared: 0.006
P-value: 0.138
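
With a single predictor, the model's R-squared is simply the squared Pearson correlation, which is why the regression adds little beyond the correlation table. The sketch below checks this against the reported correlation and then on simulated data; the variables `x`, `y`, and `fit` are invented for the illustration.

```r
# Reported correlation for all communities (from the cor.test output).
r <- -0.07584781
print(r^2)  # compare with Multiple R-squared in the model summary

# The same identity on simulated data (invented values, fixed seed).
set.seed(606)
x <- rnorm(50)
y <- 2 - 0.3 * x + rnorm(50)
fit <- lm(y ~ x)
print(c(rSquared = summary(fit)$r.squared, corSquared = cor(x, y)^2))
```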

Residual Analysis
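
The residual quantiles in the model summary (median -794, max 74794) point to a strongly right-skewed residual distribution, which motivates the log transformation that follows. A minimal sketch of the usual checks is given below on simulated data; `fundDemo`, `popDemo`, and `fitDemo` are hypothetical stand-ins, not the project's data.

```r
# Simulated, strongly right-skewed response (invented values, fixed seed).
set.seed(606)
n <- 200
fundDemo <- runif(n, min = 100, max = 10000)
popDemo  <- exp(rnorm(n, mean = 6, sd = 1.5))
fitDemo  <- lm(popDemo ~ fundDemo)
res <- resid(fitDemo)

# A quantile summary far from symmetric around zero signals skew.
print(summary(res))

# Standard visual checks: residuals vs. fitted values and a normal Q-Q plot.
plot(fitted(fitDemo), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)
```

The same calls can be applied to `pitModel` by replacing `fitDemo`.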

Log Transformation - All Communities

ggplot(pit, aes(avgFund, avgPop)) +
  geom_point() +
  scale_x_log10() + scale_y_log10() +
  xlab("Per Capita CoC Funding (log scale)") +
  ylab("Avg. Homeless Population (log scale)") +
  geom_smooth(method = "glm", formula = y ~ x) +
  ggtitle("Avg. Per Capita Funding by Avg. Total Homeless Population")

Log Transformation - Outliers Dropped

ggplot(pitSub, aes(avgFund, avgPop)) +
  geom_point() +
  scale_x_log10() + scale_y_log10() +
  xlab("Per Capita CoC Funding (log scale)") +
  ylab("Avg. Homeless Population (log scale)") +
  geom_smooth(method = "glm", formula = y ~ x) +
  ggtitle("Avg. Per Capita Funding by Avg. Total Homeless Population")

Linear Regression Model (log transform) - All Communities

logmodel <- lm(log(avgFund) ~ log(avgPop), data = pit)
summary(logmodel)

## 
## Call:
## lm(formula = log(avgFund) ~ log(avgPop), data = pit)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7937 -0.5134  0.1627  0.6306  2.0274 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.59419    0.31459  27.319   <2e-16 ***
## log(avgPop) -0.07919    0.04788  -1.654    0.099 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.033 on 381 degrees of freedom
## Multiple R-squared:  0.007127,   Adjusted R-squared:  0.004521 
## F-statistic: 2.735 on 1 and 381 DF,  p-value: 0.099

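R-squared: 0.007 (vs. linear 0.006)
P-value: 0.099 (vs. linear 0.138)
Note: this model has log(avgFund) as the response, the reverse of the linear model (avgPop ~ avgFund). With one predictor, R-squared and the slope p-value are the same in either orientation, so the comparison holds, but the slope (-0.079) is not comparable to the linear model's slope.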
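
The log model is fit as log(avgFund) ~ log(avgPop), while the linear model is avgPop ~ avgFund. A quick check on simulated data suggests why the R-squared and p-value comparison is unaffected by which variable is treated as the response. The variables `pop`, `fund`, `fitA`, and `fitB` are invented for this sketch.

```r
# Simulated positive data (invented values, fixed seed) for a log-log fit.
set.seed(2020)
pop  <- exp(rnorm(100, mean = 6, sd = 1))
fund <- exp(8 - 0.1 * log(pop) + rnorm(100))

fitA <- lm(log(fund) ~ log(pop))  # orientation used in the log model above
fitB <- lm(log(pop) ~ log(fund))  # response and predictor swapped

# R-squared and the slope p-value agree across orientations; the slopes do not.
r2     <- c(summary(fitA)$r.squared, summary(fitB)$r.squared)
pv     <- c(coef(summary(fitA))[2, 4], coef(summary(fitB))[2, 4])
slopes <- unname(c(coef(fitA)[2], coef(fitB)[2]))
print(rbind(r2, pv, slopes))
```

Only the slope changes meaning between the two fits, so it should be read in the orientation the model was actually fit.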

Conclusions and Limitations

Many limitations on these data…

Point-in-Time counts are subject to wide swings
Funding landscape is nuanced and broad, and varies by geography
CoC funding is limited in scope and scale
Competition makes things very political

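Conclusion: It would have been surprising to see a significant and substantial connection between funding (at least this stream) and total homeless population, and these data do not show one. The correlation is weak (about -0.08 to -0.09) and not significant at the 0.05 level in either the linear model (p = 0.138) or the log model (p = 0.099). Per capita CoC funding could still be useful as part of a larger model that includes other facets, more influential factors, and annualized actual data.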