Homelessness in Seattle

One Night Count

Data for this project is shared on my github:

oncDF<-read.csv("https://raw.githubusercontent.com/kitadasmalley/fallChallenge2019/main/data/oneNightCount.csv", 
                header=TRUE)

Making Barcharts for ONC Data

These data show the trends over time for One Night Count data from the 8 cities/areas of the greater Seattle Metropolitan area. It includes total counts for each location as well subcounts for locations where homeless were counted.

library(tidyverse)

# GATHER DATA TO TIDY
oncG<-oncDF%>%
  gather(City, Count, -c(Location,YEAR))

#View(oncG)

# TOTAL COUNTS 
oncTot<-oncG%>%
  filter(Location=="TOTAL", 
         !City %in% c("TOTAL", "EAST.SIDE", 
                     "NIGHT.OWL.BUSES", "NORTH.END"))

# STACKED BAR CHART (SHOWS OVERALL TOTAL FOR KING COUNTY)
ggplot(oncTot, aes(YEAR, Count, fill=City))+
  geom_bar(stat="identity")

# SIDE-BY-SIDE BAR CHART
ggplot(oncTot, aes(YEAR, Count, fill=City))+
  geom_bar(stat="identity", position="dodge")

# LOOK AT THE LOCATIONS OF HOMELESS
oncLoc<-oncG%>%
  filter(Location!="TOTAL", 
         !City %in% c("TOTAL", "EAST.SIDE", 
                      "NIGHT.OWL.BUSES", "NORTH.END"))

# SIDE-BY-SIDE BAR CHART
# WITH FACET FOR LOCATION
ggplot(oncLoc, aes(YEAR, Count, fill=Location))+
  geom_bar(stat="identity", position="dodge")+
  facet_wrap(~City, scales="free")+
  theme_bw()

# SEATTLE ONLY
oncSEA<-oncG%>%
  filter(Location!="TOTAL", 
         City == "SEATTLE")

# SIDE-BY-SIDE BAR CHART
# WITH FACET FOR LOCATION
ggplot(oncSEA, aes(YEAR, Count, fill=Location))+
  geom_bar(stat="identity", position="dodge")+
  theme_bw()

Making Map for ONC Data

The team also mapped the ONC cities to census tracts, which was a challenging task. The Census uses GEO_IDs for their shapefiles and there uniquely define areas based on three things: State, County, and Census tract. However, there is not a level for city. We approximated to the best of our abilities by overlaying maps for the cities with Census tracts.

# IMPORT OUR DATA
oncT<-read.csv("https://raw.githubusercontent.com/kitadasmalley/fallChallenge2019/main/data/oncTract2.csv",
               header=TRUE,
               stringsAsFactors = FALSE)

oncT$City<-as.factor(oncT$City)

These tract codes were then joined with shapefiles from the Census’ Tigris package in R.

# LOAD THE TIGRIS PACKAGE
#install.packages("sp")
#install.packages("sf")
#install.packages("tigris")
#install.packages("tidycensus")

library(sp)
library(sf)
library(tigris)
library(tidycensus)

# SET OPTIONS
options(tigris_class="sf")
options(tigris_use_cache = TRUE)

# SET API KEY
census_api_key("fbf65ba295ce81cab0a2eccd52d62ca564bf896b",
              overwrite =TRUE)

# set year for boundaries 
this.year = 2010 #the last census

wa_tracts <- tracts(state = 'WA', county = 'King', 
                    cb = T, year = this.year)

# GEO JOIN DATA 
joinONC<-geo_join(wa_tracts, oncT, 
                 by_sp="GEO_ID", by_df="GEO_ID")

# CREAT PLOT IN GGPLOT
ggplot(data = joinONC, aes(fill=City))+
  geom_sf()+
  #coord_sf(xlim=c(-122.6, -122))+ #zoom in on onc only
  theme_bw()+
  ggtitle("King County Cities Included in One Night Count Data")

Mapping Meal Programs

We also wanted to explore the placement of meal programs in the Seattle area. This data came from the City of Seattle and is also publically hosted on Kaggle. Addresses for each location are provided, which we geo-coded into their latitudes and longitudes.

Note that there is a high concentration of meal programs in downtown Seattle but not many in other locations. This might suggest that this is where homeless people are located.

# IMPORT THE GEOCODED DATA
food<-read.csv("https://raw.githubusercontent.com/kitadasmalley/fallChallenge2019/main/data/GEOCODE_FOOD%20-%20Sheet1.csv",
               header=TRUE)

ggplot(data = joinONC)+
  geom_sf(aes(fill=City))+
  geom_point(data=food, aes(Long, Lat))+
  theme_bw()

Maps for Data from the American Community Survey (ACS)

We explored several possible explantory variables from the American Community Survey (ACS) including:

B02001_001: Total
B03002_003: White alone (Not Hispanic or Latino)
B03002_004 Black or African American alone (Not Hispanic or Latino)
B03002_012: Hispanic or Latino
B03002_005: Native American alone (Not Hispanic or Latino)
B03002_006: Asian alone (Not Hispanic or Latino)
B03002_007: Native Hawaiian or Pacific Islander alone (Not Hispanic or Latino)
B03002_009: Multiple Races (Not Hispanic or Latino)
B03002_008: Other (Not Hispanic or Latino)
B25064_001 MEDIAN GROSS RENT
B25071_001: Rent Burden (MEDIAN GROSS RENT AS A PERCENTAGE OF HOUSEHOLD INCOME)
B19013_001: MEDIAN HOUSEHOLD INCOME IN PAST 12 MONTHS
B01002_001: Median age
B25115_016: Renter Occupied - family
B25115_027: Renter Occupied - nonfamily

Here are plots for median home value, income, and population

Median Home Value by Tract

#install.packages("leaflet")
library(leaflet)

# MEDIAN HOME VALUE
wa2 <- get_acs(geography = "tract", year=this.year,
               state = "WA", county = "King",
               variables = "B25077_001E",
               geometry = TRUE)

pal<-colorNumeric("Greens", domain=0:ceiling(max(wa2$estimate, na.rm=TRUE)))

popup<-paste("Tract: ", as.character(substring(wa2$GEOID, 6, 11)), "<br>",
             "Median Home Value: ", as.character(wa2$estimate))
leaflet()%>%
  addProviderTiles("CartoDB.Positron")%>%
  addPolygons(data=wa2,
              fillColor= ~pal(wa2$estimate),
              fillOpacity = .7,
              weight =.5,
              smoothFactor = 0.2,
              popup = popup)

Median Household Income by Tract

#install.packages("leaflet")
library(leaflet)

# MEDIAN HOUSEHOLD INCOME
wa2 <- get_acs(geography = "tract", year=this.year,
               state = "WA", county = "King",
               variables = "B19013_001",
               geometry = TRUE)

pal<-colorNumeric("Blues", domain=0:ceiling(max(wa2$estimate, na.rm=TRUE)))

popup<-paste("Tract: ", as.character(substring(wa2$GEOID, 6, 11)), "<br>",
             "Median Household Income: ", as.character(wa2$estimate))
leaflet()%>%
  addProviderTiles("CartoDB.Positron")%>%
  addPolygons(data=wa2,
              fillColor= ~pal(wa2$estimate),
              fillOpacity = .7,
              weight =.5,
              smoothFactor = 0.2,
              popup = popup)

Population by Tract

#install.packages("leaflet")
library(leaflet)

# Population
wa2 <- get_acs(geography = "tract", year=this.year,
               state = "WA", county = "King",
               variables = "B02001_001",
               geometry = TRUE)

pal<-colorNumeric("Reds", domain=0:ceiling(max(wa2$estimate, na.rm=TRUE)))

popup<-paste("Tract: ", as.character(substring(wa2$GEOID, 6, 11)), "<br>",
             "Population: ", as.character(wa2$estimate))
leaflet()%>%
  addProviderTiles("CartoDB.Positron")%>%
  addPolygons(data=wa2,
              fillColor= ~pal(wa2$estimate),
              fillOpacity = .7,
              weight =.5,
              smoothFactor = 0.2,
              popup = popup)

ACS Data Wangling

Now that we have the mapping for the ONC cities to tracts we can join it with ACS data to model counts of homeless (response) as a function of demographic explanatory variables.

When we download the ACS data it is not in the proper tidy format. When we join it to the ONC data we will further aggregate (group_by) and summarise tract level data so that we have information at the ONC city level.

# REFORMAT COLUMNS FOR GEOID
onc_GEO<-oncT%>%
  select(City, Nickname, GEO_ID)%>%
  separate(GEO_ID, c("US", "GEOID"), sep="US", remove=FALSE)

#years 2007 to 2016 for ONC
# ACS starts at 2010
# only does 5 year estimates

this.year=2013

# This looks at the 5 year estimates
# You can also do "acs1"
vars <- load_variables(year = this.year,
                       dataset = "acs5",
                       cache = TRUE)

# There are 22711 possible variables 
#dim(vars)
#View(vars)

waDem <- get_acs(geography = "tract", year=this.year,
                 state = "WA", county = "King", geometry = TRUE,
                 variables = c(popululation = "B02001_001",
                               median.gross.rent = "B25064_001",
                               median.household.income = "B19013_001",
                               med.home.value ="B25077_001E",
                               med.age ="B01002_001",
                               rent.burden = "B25071_001",
                               white = "B03002_003", 
                               af.am = "B03002_004",
                               hispanic = "B03002_012",
                               am.ind = "B03002_005",
                               asian = "B03002_006",
                               nh.pi = "B03002_007",
                               multiple = "B03002_009",
                               other = "B03002_008"))

## Getting data from the 2009-2013 5-year ACS

waPct<-as.data.frame(waDem)[,c(1,3:4)]%>%
  spread(variable, estimate)%>%
  mutate(checkTot = white+af.am+hispanic+am.ind+ # looks good!
           asian+nh.pi+multiple+other)%>%
  mutate(pct.white = white/checkTot,
         pct.af.am = af.am/checkTot,
         pct.hispanic = hispanic/checkTot,
         pct.am.ind = am.ind/checkTot,
         pct.asian = asian/checkTot,
         pct.nh.pi = nh.pi/checkTot,
         pct.multiple = multiple/checkTot, 
         pct.other = other/checkTot, 
         year = this.year)

# assumes the median is approximately equal to the mean 
# this may not be true, but its the only way to combine the medians 
waACS_J<-waPct%>%
  left_join(onc_GEO)%>%
  filter(is.na(City)==FALSE)%>%
  group_by(year, Nickname)%>%
  summarise(nTracts=n(), 
            pop=sum(popululation, na.rm=TRUE),
            income=sum(median.household.income*popululation)/sum(popululation),
            rent=sum(median.gross.rent*popululation)/sum(popululation),
            home=sum(B25077_001*popululation, na.rm=TRUE)/sum(popululation),
            age=sum(med.age*popululation, na.rm=TRUE)/sum(popululation),
            pctW=sum(pct.white*popululation)/sum(popululation), 
            pctH=sum(pct.hispanic*popululation)/sum(popululation))

## Joining, by = "GEOID"

# WRANGLE ONC DATA
oncYearT<-oncDF%>%
  gather(City, Count, -c(Location,YEAR))%>%
  filter(YEAR==this.year, 
         Location=="TOTAL")%>%
  mutate(Nickname=City)

#View(oncYearT)

#head(waACS_J)

# JOIN ONC AND ACS DATA
yearJoin<-oncYearT%>%
  inner_join(waACS_J, by="Nickname")

Variable Selection and Model Fitting

After fitting multiple linear models each year separately using the ONC counts as the response. We tried several models with the candidate variables and interactions; however, the only model that was significant had both population and income as explanatory variables. Alternatively, rent could have been used instead of income; however, their appear to be correlated so they can be exchangable but including both would lead to issues with multicollinearity.

Observe the direction of the coefficients. For instance, as average income increases the count of homeless decreases.

Multivariate Linear Regression

# tried other models with rent and percent minority groups 
# but they were not significant 
# also income and rent appear to be highly correlated 
# (so only one can stay in the model because of multicollinearity) 
# also fit models with interactions, but they were not significant
mod1<-lm(Count~pop+income, data=yearJoin)
summary(mod1)

## 
## Call:
## lm(formula = Count ~ pop + income, data = yearJoin)
## 
## Residuals:
##       1       2       4       5       6       7 
##   2.962 -59.055  -2.193  -5.350  15.933  47.704 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.055e+02  1.769e+02   4.554  0.01984 *  
## pop          4.006e-03  1.265e-04  31.665 6.92e-05 ***
## income      -1.906e-02  3.221e-03  -5.918  0.00964 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.94 on 3 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.998,  Adjusted R-squared:  0.9967 
## F-statistic: 757.1 on 2 and 3 DF,  p-value: 8.792e-05

Mixed Effects Model

Since we have data from multiple years we have think of it as repeated measures for each city. Now we use a mixed effects model to fit a random effect for each city.

Note: That we used the packages lme4 for mixed effects models and car to test significance.

Using a mixed effects models that takes homeless counts from all the ONC cities, as well as population and income from the ACS from 2010-2016, we found that population and income were significant. However, when allowing for a random effect of city it reverses the direction of the effect of income.

# Loop to pull many years of ACS 
for(i in 2010:2016){
  
  this.year=i
  
  
  waDem <- get_acs(geography = "tract", year=this.year,
                   state = "WA", county = "King", geometry = TRUE,
                   variables = c(popululation = "B02001_001",
                                 median.household.income = "B19013_001"))
  
  waPct<-as.data.frame(waDem)[,c(1,3:4)]%>%
    spread(variable, estimate)%>%
    mutate(year=this.year)
  
  waACS_J<-waPct%>%
    left_join(onc_GEO)%>%
    filter(is.na(City)==FALSE)%>%
    group_by(year, Nickname)%>%
    summarise(nTracts=n(), 
              pop=sum(popululation, na.rm=TRUE),
              income=sum(median.household.income*popululation)/sum(popululation))
  
  oncYearT<-oncDF%>%
    gather(City, Count, -c(Location,YEAR))%>%
    filter(YEAR==this.year, 
           Location=="TOTAL")%>%
    mutate(Nickname=City)
  
  yearJoin<-oncYearT%>%
    inner_join(waACS_J, by="Nickname")
  
  if(i==2010){
  allYears<-yearJoin
  }
  
  if(i>2010){
    allYears<-rbind(allYears,yearJoin)
  }
  
  
}

#View(allYears)

# PACKAGE FOR MIXED EFFECTS MODELING
#install.packages("lme4")
library(lme4)

mixedMod<-lmer(Count~pop+income+(1|City), data=allYears)
summary(mixedMod)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Count ~ pop + income + (1 | City)
##    Data: allYears
## 
## REML criterion at convergence: 557.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2512 -0.3272 -0.0645  0.2799  3.6669 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  City     (Intercept) 106907   327.0   
##  Residual              10230   101.1   
## Number of obs: 43, groups:  City, 8
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept) -1.945e+03  3.972e+02  -4.898
## pop          3.720e-03  6.580e-04   5.653
## income       2.901e-02  6.873e-03   4.221
## 
## Correlation of Fixed Effects:
##        (Intr) pop   
## pop     0.219       
## income -0.929 -0.451
## fit warnings:
## Some predictor variables are on very different scales: consider rescaling

anova(mixedMod)

## Analysis of Variance Table
##        Df Sum Sq Mean Sq F value
## pop     1 732494  732494  71.604
## income  1 182232  182232  17.814

# PACKAGE FOR TESTING SIGNIFICANCE OF COEFFICIENTS 
# IN MIXED MODEL
#install.packages("car")
library(car)
Anova(mixedMod)

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: Count
##         Chisq Df Pr(>Chisq)    
## pop    31.955  1  1.578e-08 ***
## income 17.814  1  2.436e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1