In the following analysis, I would like to see the distribution of rent (DP04_0134E) among the census tracts in the Alamo Area Council of Governments (AACOG), Texas, and how is rent affected by race/ethnicity and median household income? The data are gathered using the get_acs function of the tidycensus package, which essentially provides access to the American Community Survey data. For the sake of this analysis, the data have been restricted to the State of Texas, and only to a few census tracts which are combinedly known as AACOG.

It says that we expect the proportion of the population that is White, the proportion of the population that is Hispanic, the proportion of the population that is Black, and the proportion of the population with low Median Household Income to affect the rental distribution across the census tracts.

lm(rent ~ pwhite + phisp + pblack + medhhinc)

Mapping the Distribution of Rent in Census Tracts -AACOG 2015

sa_acs2%>%
  mutate(rentquant=cut(prent, breaks = quantile(sa_acs2$prent, p=seq(0,1,length.out = 8)), include.lowest = T))%>%
  ggplot(aes(color=rentquant, fill=rentquant))+geom_sf()+
  scale_fill_viridis_d()+
  scale_color_viridis_d()+
  ggtitle(label = "Distribution of Rent in Census Tracts -AACOG 2015")+geom_sf(data=metro, fill=NA, color="black")

3) Linear Regression Model

fit <- lm(prent ~pwhite + phisp + pblack + medhhinc, data=sa_acs2)
summary(fit)
## 
## Call:
## lm(formula = prent ~ pwhite + phisp + pblack + medhhinc, data = sa_acs2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -736.04 -102.72   -4.44   96.24 1236.27 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.513e+03  2.518e+02   6.010 3.76e-09 ***
## pwhite      -1.151e+01  2.544e+00  -4.525 7.68e-06 ***
## phisp       -1.110e+01  2.491e+00  -4.457 1.04e-05 ***
## pblack      -7.275e+00  3.123e+00  -2.330   0.0202 *  
## medhhinc     9.029e-03  4.783e-04  18.877  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 194.5 on 462 degrees of freedom
## Multiple R-squared:  0.6192, Adjusted R-squared:  0.6159 
## F-statistic: 187.8 on 4 and 462 DF,  p-value: < 2.2e-16

All the predictors, percetage white, percentage hispanic, percentage black, and median household income show siginificant association with rent.

3)b. Studentized Residual

Mapping the studentized residual

library(ggplot2)
library(sf)
library(dplyr)
sa_acs2<-st_as_sf(sa_acs2)
sa_acs2%>%
  mutate(rquant=cut(olsresid, breaks = quantile(sa_acs2$olsresid, p=seq(0,1,length.out = 8)), include.lowest = T))%>%
  ggplot(aes(color=rquant, fill=rquant))+geom_sf()+
  scale_fill_viridis_d()+
  scale_color_viridis_d()+geom_sf(data=metro, fill=NA, color="black")+ggtitle("OLS Model Residuals")

If we visually interpret the map we can see significant clustering and correlation in the residuals.

3)a. Moran’s I on residuals from model

lm.morantest(fit, listw = wts1) #Queen Adjacency Matrix
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = prent ~ pwhite + phisp + pblack + medhhinc,
## data = sa_acs2)
## weights: wts1
## 
## Moran I statistic standard deviate = 3.0034, p-value = 0.001335
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     0.0722049972    -0.0071326636     0.0006977835
lm.morantest(fit, listw = wts2) #Rook Adjacency Matrix
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = prent ~ pwhite + phisp + pblack + medhhinc,
## data = sa_acs2)
## weights: wts2
## 
## Moran I statistic standard deviate = 2.8102, p-value = 0.002475
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     0.0733171402    -0.0072399790     0.0008217097

We can see that, Moran’s I is very close for both neighborhood specification, with the Rook Adjacency Matrix providing a slightly larger degree of dependence. The observed value for Moran’s I for the residual is .07, with a z-test of 2.81 and a small p-value allow us to conclude that there is significant autocorrelation in the model residuals. So it violates the assumptions of the model.

Moran Scatterplot

moran.plot(sa_acs2$prent, listw=wts1, main="Moran Scatterplot (Queen)", xlab = "Distribution of Rent", ylab="Spatially lagged Rent")

moran.plot(sa_acs2$prent, listw=wts2, main="Moran Scatterplot (Rook)",  xlab = "Distribution of Rent", ylab="Spatially lagged Rent")

Local Moran’s I (Rook)

Map of Local Moran Statistic for the Residuals

spplot(sa_acs2, "localir", main="Local Moran's I (Rook)", 
       at=quantile(sa_acs2$localir), 
       col.regions=brewer.pal(n=4, "RdBu"))

Local Moran Clusters

spplot(sa_acs2, "clr", main="Local Moran's Clusters (Rook)", 
       col.regions=c(2,0))

4) Alternative SAR model specifications

Using Lagrange Multiplier Test (LMT)

#lm.LMtests(model = fit, listw=wts1, test = c("LMerr", "LMlag", "RLMerr", "RLMlag"))
lm.LMtests(model = fit, listw=wts2, test = c("LMerr", "LMlag", "RLMerr", "RLMlag"))
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = prent ~ pwhite + phisp + pblack + medhhinc,
## data = sa_acs2)
## weights: wts2
## 
## LMerr = 6.3352, df = 1, p-value = 0.01184
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = prent ~ pwhite + phisp + pblack + medhhinc,
## data = sa_acs2)
## weights: wts2
## 
## LMlag = 10.504, df = 1, p-value = 0.001191
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = prent ~ pwhite + phisp + pblack + medhhinc,
## data = sa_acs2)
## weights: wts2
## 
## RLMerr = 0.026686, df = 1, p-value = 0.8702
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = prent ~ pwhite + phisp + pblack + medhhinc,
## data = sa_acs2)
## weights: wts2
## 
## RLMlag = 4.1955, df = 1, p-value = 0.04053

We can see that the lag model has more support, with both the regular and the robust LMT showing larger values for that model, compared to the error model specification.

5&6) Fitting the spatial regression models

Spatial Error model

fit.err<-errorsarlm(prent ~pwhite + phisp + pblack + medhhinc, data=sa_acs2, listw=wts2)
## Warning in errorsarlm(prent ~ pwhite + phisp + pblack + medhhinc, data = sa_acs2, : inversion of asymptotic covariance matrix failed for tol.solve = 1e-10 
##   reciprocal condition number = 5.23578e-15 - using numerical Hessian.
summary(fit.err)
## 
## Call:errorsarlm(formula = prent ~ pwhite + phisp + pblack + medhhinc, 
##     data = sa_acs2, listw = wts2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -740.496 -100.336  -10.743   92.721 1226.234 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)  1.4471e+03  2.6042e+02  5.5566 2.752e-08
## pwhite      -1.0521e+01  2.6578e+00 -3.9586 7.539e-05
## phisp       -1.0406e+01  2.5840e+00 -4.0271 5.647e-05
## pblack      -6.7035e+00  3.2673e+00 -2.0517    0.0402
## medhhinc     8.8480e-03  5.0889e-04 17.3870 < 2.2e-16
## 
## Lambda: 0.17531, LR test value: 6.0764, p-value: 0.0137
## Approximate (numerical Hessian) standard error: 0.07029
##     z-value: 2.494, p-value: 0.01263
## Wald statistic: 6.2203, p-value: 0.01263
## 
## Log likelihood: -3118.373 for error model
## ML residual variance (sigma squared): 36717, (sigma: 191.62)
## Number of observations: 467 
## Number of parameters estimated: 7 
## AIC: 6250.7, (AIC for lm: 6254.8)

We can see that all the effects are significant in the model, similar to what we saw in the OLS model above. The direction of the coefficients are similar to the ones we can see in the OLS models, but the magnitudes of the regression effects are different. We can also see the estimate of the spatial auto-regression coefficient for the lag model, \(\lambda\) is 0.17, with a standard error of 0.07, and a z-test that shows the coefficient is different from 0. The output also contains the estimates of the residual mean square (7.03), which is much smaller than it was in the OLS model (191.62), indicating the error model is explaining more of the variation in the data than the OLS model. Finally, we see the Akaike Information Criteria, or AIC, which is a measure of model fit. For the error model it is 6250.7 and for the OLS model it was 6254.8, a slight difference. In this case, the error model is preferred based on the AIC value.

Spatial Lag Model

fit.lag<-lagsarlm(prent ~pwhite + phisp + pblack + medhhinc, data=sa_acs2, listw=wts2, type="lag")
## Warning in lagsarlm(prent ~ pwhite + phisp + pblack + medhhinc, data = sa_acs2, : inversion of asymptotic covariance matrix failed for tol.solve = 1e-10 
##   reciprocal condition number = 3.63626e-15 - using numerical Hessian.
summary(fit.lag)
## 
## Call:lagsarlm(formula = prent ~ pwhite + phisp + pblack + medhhinc, 
##     data = sa_acs2, listw = wts2, type = "lag")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -721.102  -98.297   -5.039   87.832 1224.698 
## 
## Type: lag 
## Coefficients: (numerical Hessian approximate standard errors) 
##                Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)  1.2125e+03  2.6570e+02  4.5634 5.032e-06
## pwhite      -9.4873e+00  2.5843e+00 -3.6712 0.0002414
## phisp       -8.9828e+00  2.5416e+00 -3.5343 0.0004088
## pblack      -5.7148e+00  3.1079e+00 -1.8388 0.0659458
## medhhinc     8.2069e-03  5.3875e-04 15.2334 < 2.2e-16
## 
## Rho: 0.15694, LR test value: 9.4481, p-value: 0.0021137
## Approximate (numerical Hessian) standard error: 0.050387
##     z-value: 3.1148, p-value: 0.0018409
## Wald statistic: 9.7018, p-value: 0.0018409
## 
## Log likelihood: -3116.687 for lag model
## ML residual variance (sigma squared): 36497, (sigma: 191.04)
## Number of observations: 467 
## Number of parameters estimated: 7 
## AIC: 6247.4, (AIC for lm: 6254.8)

The regression effects show that the coefficients changed compared to the error or OLS models. Specifically, the lag model shows a notable degree of change in the effect of the %white and the %Hispanic variables. The auto-regressive coefficient, \(\rho\) for the lag model is also significant and positive, indicating that rent is significantly associated with rents in neighboring areas. The lag model fit (AIC value of 6247.4) is better than the OLS model (6254.8).

Comparing the AIC value among these three models, the lag model is a better measure of model fit.