Plot I

Exploratory Spatial Data Analysis (ESDA)

ESDA techniques help detect spatial patterns in data, formulate hypotheses based on the geography of the data, and assess spatial models. ESDA also helps determine whether the OLS model needs to incorporate spatial dependency. In this section, the goal is to check visually for spatial dependency, or autocorrelation, in the outcome variable, mental health prevalence. The outcome appears to cluster, so further analysis is necessary.

Standard linear regression

As a first step, let’s examine the relationship between poor mental health and the unemployment rate (unempr), the percent of residents who moved in the past year (pmob), the percent of 25 year olds with a college degree (pcol), percent poverty (ppov), percent non-Hispanic black (pnhblk), percent Hispanic (phisp), and the log of population size (log(tpop)).


Call:
lm(formula = MHLTH_CrudePrev ~ unempr + pmob + pcol + ppov + 
    pnhblk + phisp + log(tpop), data = sea.tracts)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.47690 -0.40419 -0.01758  0.42427  2.33766 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.088319   1.393019   6.524 1.53e-09 ***
unempr       0.090346   0.025929   3.484 0.000681 ***
pmob         0.004222   0.007091   0.595 0.552608    
pcol        -0.034845   0.006390  -5.453 2.54e-07 ***
ppov         0.149402   0.009736  15.346  < 2e-16 ***
pnhblk      -0.015599   0.009847  -1.584 0.115693    
phisp        0.028165   0.015996   1.761 0.080719 .  
log(tpop)    0.222953   0.164192   1.358 0.176949    
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6642 on 125 degrees of freedom
Multiple R-squared:  0.919, Adjusted R-squared:  0.9144 
F-statistic: 202.5 on 7 and 125 DF,  p-value: < 2.2e-16

Higher unemployment and poverty rates are associated with higher levels of poor mental health, whereas a higher percent college educated is associated with lower levels; the remaining predictors are not statistically significant at the 0.05 level. Tools and approaches for testing OLS assumptions are not addressed here because they are beyond the scope of this analysis. Let’s focus on spatial exploratory analysis next.

Plot II

Second, we map the residuals from the OLS regression model to look for visual evidence of spatial autocorrelation in the error term.

The residuals appear to cluster, so further analysis is necessary.

Plot III

Spatial Autocorrelation: The exploratory maps suggest clustering, so let’s first examine the Moran scatterplot, which plots each tract’s value against its spatially lagged value (a weighted average of its neighbors’ values). It looks like there is a spatial association.
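To make the link between a Moran scatterplot and Moran's I concrete, here is a minimal sketch in plain Python (the dashboard's own results come from R). It assumes row-standardised weights, under which the slope of the line fitted through the scatterplot equals the global Moran's I. The five values and the neighbour structure are made up for illustration and are not the Seattle tract data.

```python
# Toy example: five areas on a line; each area's neighbours are the adjacent areas.
# Values and neighbour lists are hypothetical, chosen only for illustration.
values = [2.0, 3.0, 5.0, 8.0, 9.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

n = len(values)
mean = sum(values) / n

# Row-standardised weights: the spatial lag is the mean of the neighbours' values.
lag = [sum(values[j] for j in neighbours[i]) / len(neighbours[i]) for i in range(n)]

# Centre both axes, as in a Moran scatterplot.
z = [v - mean for v in values]
mean_lag = sum(lag) / n
z_lag = [l - mean_lag for l in lag]

# OLS slope of the spatial lag (y-axis) on the variable (x-axis).
slope = sum(a * b for a, b in zip(z, z_lag)) / sum(a * a for a in z)

# Moran's I computed directly; with row-standardised weights n / S0 equals 1.
morans_i = sum(z[i] * (lag[i] - mean) for i in range(n)) / sum(a * a for a in z)

print(f"scatterplot slope = {slope:.4f}, Moran's I = {morans_i:.4f}")
```

A positive slope means that areas with high values tend to have neighbours with high values, which is the pattern referred to above as a spatial association.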

Tests

Let’s use a popular test of spatial autocorrelation, the Global Moran’s I test, with and without Monte Carlo simulation to get the p-value. A rule of thumb is that a spatial autocorrelation higher than 0.3 or lower than -0.3 is meaningful. First, for the dependent variable:

    Monte-Carlo simulation of Moran I

data:  sea.tracts$MHLTH_CrudePrev 
weights: seaw  
number of simulations + 1: 1000 

statistic = 0.39728, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Second, for the OLS regression residuals:

    Global Moran I for regression residuals

data:  
model: lm(formula = MHLTH_CrudePrev ~ unempr + pmob + pcol + ppov + pnhblk + phisp + log(tpop), data = sea.tracts)
weights: seaw

Moran I statistic standard deviate = 3.9687, p-value = 3.614e-05
alternative hypothesis: greater
sample estimates:
Observed Moran I      Expectation         Variance 
     0.175864778     -0.025326838      0.002570005 

Both the dependent variable (Moran’s I = 0.40) and the OLS residuals (Moran’s I = 0.18) show statistically significant positive spatial autocorrelation, although the value for the residuals falls below the 0.3 rule of thumb and is therefore not strong.
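The two p-values above come from different procedures, and a short sketch may help show how each is obtained. The plain-Python code below is illustrative only: the toy values and neighbour lists are made up, the helper names are hypothetical, and the pseudo p-value uses a common convention, (number of permuted statistics at least as large as the observed one + 1) / (number of simulations + 1). The last lines plug the reported numbers into that convention and into the standard deviate (I - E[I]) / sqrt(Var[I]).

```python
import math
import random


def morans_i(values, neighbours):
    """Global Moran's I with row-standardised weights (so n / S0 = 1)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = sum(
        z[i] * sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        for i in range(n)
    )
    return num / sum(a * a for a in z)


def permutation_p_value(values, neighbours, nsim=999, seed=1):
    """One-sided (greater) pseudo p-value from randomly relabelling the areas."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(nsim):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (nsim + 1)


# Toy data (hypothetical): ten areas on a line with a smooth gradient.
toy_values = [1.0, 1.5, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0, 8.5, 9.0]
toy_neighbours = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
toy_i, toy_p = permutation_p_value(toy_values, toy_neighbours)
print(f"toy Moran's I = {toy_i:.3f}, pseudo p-value = {toy_p:.3f}")

# Reported Monte Carlo test: an observed rank of 1000 out of 1000 values means
# none of the 999 permuted statistics exceeded the observed one.
reported_p = (0 + 1) / 1000

# Reported residual test: standard deviate from I, its expectation and variance.
z_resid = (0.175864778 - (-0.025326838)) / math.sqrt(0.002570005)
print(f"reported p = {reported_p}, residual standard deviate = {z_resid:.4f}")
```

With 999 permutations the smallest attainable pseudo p-value is 0.001, so the Monte Carlo result for the dependent variable is as small as that design allows.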

SLM

Based on the exploratory mapping, the Moran scatterplot, and the global Moran’s I, there appears to be spatial autocorrelation in the dependent variable. If a spatial lag process is at work and we fit an OLS model, the regression coefficients will be biased and inefficient. That is, the coefficient sizes and signs may not be close to their true values, and their standard errors are underestimated.

There are two standard types of spatial regression models: a spatial lag model (SLM), which models dependency in the outcome, and a spatial error model (SEM), which models dependency in the residuals.
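In matrix notation, the two models are commonly written as below. The symbols are assumptions chosen to match the output labels: y is the outcome vector, X the matrix of covariates, W the spatial weights matrix (the role played by seaw), rho the lag parameter reported as Rho, lambda the error parameter reported as Lambda, and epsilon an independent error term.

```latex
% Spatial lag model (SLM): the outcome depends on its own spatial lag
y = \rho W y + X\beta + \varepsilon

% Spatial error model (SEM): the spatial dependence sits in the error term
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

Setting rho = 0 in the first equation, or lambda = 0 in the second, gives back the standard linear regression model, which is why tests on these two parameters indicate whether the spatial term is needed.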

Let’s start with the SLM:


Call:
lagsarlm(formula = MHLTH_CrudePrev ~ unempr + pmob + pcol + ppov + 
    pnhblk + phisp + log(tpop), data = sea.tracts, listw = seaw)

Residuals:
      Min        1Q    Median        3Q       Max 
-1.494101 -0.412011 -0.012753  0.446070  2.279226 

Type: lag 
Coefficients: (asymptotic standard errors) 
              Estimate Std. Error z value  Pr(>|z|)
(Intercept)  9.5003409  1.5662067  6.0658 1.313e-09
unempr       0.0904386  0.0251047  3.6025 0.0003152
pmob         0.0039999  0.0068667  0.5825 0.5602229
pcol        -0.0360700  0.0065844 -5.4781 4.299e-08
ppov         0.1516731  0.0100004 15.1666 < 2.2e-16
pnhblk      -0.0161411  0.0095358 -1.6927 0.0905151
phisp        0.0277670  0.0155579  1.7848 0.0743011
log(tpop)    0.2280948  0.1590558  1.4341 0.1515565

Rho: -0.034898, LR test value: 0.38247, p-value: 0.53628
Asymptotic standard error: 0.055763
    z-value: -0.62583, p-value: 0.53142
Wald statistic: 0.39167, p-value: 0.53142

Log likelihood: -129.9789 for lag model
ML residual variance (sigma squared): 0.41332, (sigma: 0.6429)
Number of observations: 133 
Number of parameters estimated: 10 
AIC: 279.96, (AIC for lm: 278.34)
LM test for residual autocorrelation
test value: 14.719, p-value: 0.00012481

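The unemployment rate, percent college educated, and percent poverty continue to be statistically significant. The lag parameter is Rho, whose value is quite small at -0.035 and is not statistically significant in any of the three tests reported (likelihood ratio, asymptotic z, and Wald). This indicates that the spatial lag in the dependent variable is accounted for by the demographic and socioeconomic variables already included in the model, so a spatial lag on the dependent variable is likely not needed. Note, however, that the LM test for residual autocorrelation (test value 14.719, p-value 0.00012) is significant, which suggests that spatial dependence remains in the residuals of the lag model.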

SEM

Spatial error model (SEM)

The spatial error model incorporates spatial dependence in the errors. If a spatial error process is at work and we fit an OLS model, our coefficients will be unbiased but inefficient. That is, the coefficient sizes and signs are asymptotically correct, but their standard errors are underestimated.

Call:
errorsarlm(formula = MHLTH_CrudePrev ~ unempr + pmob + pcol + 
    ppov + pnhblk + phisp + log(tpop), data = sea.tracts, listw = seaw)

Residuals:
      Min        1Q    Median        3Q       Max 
-1.365103 -0.419244 -0.015275  0.413074  1.964454 

Type: error 
Coefficients: (asymptotic standard errors) 
              Estimate Std. Error z value  Pr(>|z|)
(Intercept) 11.6061151  1.3058441  8.8878 < 2.2e-16
unempr       0.0780587  0.0229153  3.4064 0.0006583
pmob         0.0147918  0.0077040  1.9200 0.0548564
pcol        -0.0429317  0.0068995 -6.2224 4.896e-10
ppov         0.1442649  0.0096688 14.9206 < 2.2e-16
pnhblk      -0.0041949  0.0103829 -0.4040 0.6861972
phisp        0.0175817  0.0146865  1.1971 0.2312555
log(tpop)   -0.0286693  0.1502948 -0.1908 0.8487186

Lambda: 0.51295, LR test value: 14.233, p-value: 0.00016152
Asymptotic standard error: 0.098797
    z-value: 5.1919, p-value: 2.0814e-07
Wald statistic: 26.956, p-value: 2.0814e-07

Log likelihood: -123.0536 for error model
ML residual variance (sigma squared): 0.35148, (sigma: 0.59286)
Number of observations: 133 
Number of parameters estimated: 10 
AIC: 266.11, (AIC for lm: 278.34)

The unemployment rate, percent college educated, and percent poverty continue to be statistically significant. The spatial error parameter Lambda (0.513) is positive and significant, indicating the need to control for spatial autocorrelation in the error.
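The z-values and Wald statistics printed for Rho and Lambda are closely related, and the following plain-Python sketch shows the arithmetic using the estimates and asymptotic standard errors copied from the two summaries. It assumes the usual definitions: z is the estimate divided by its standard error, the Wald statistic is z squared, and the p-value is the two-sided normal tail probability. The helper name wald_summary is hypothetical.

```python
import math


def wald_summary(estimate, std_error):
    """Return the z-value, the Wald statistic (z squared), and the two-sided normal p-value."""
    z = estimate / std_error
    wald = z * z
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, wald, p_two_sided


# Rho from the spatial lag model summary.
rho_z, rho_wald, rho_p = wald_summary(-0.034898, 0.055763)

# Lambda from the spatial error model summary.
lambda_z, lambda_wald, lambda_p = wald_summary(0.51295, 0.098797)

print(f"Rho:    z = {rho_z:.4f}, Wald = {rho_wald:.4f}, p = {rho_p:.4f}")
print(f"Lambda: z = {lambda_z:.4f}, Wald = {lambda_wald:.3f}, p = {lambda_p:.3e}")
```

Because the Wald statistic is just the squared z-value, the two tests carry the same p-value in each summary; the likelihood ratio test is a separate comparison against the OLS fit.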

Comparing Models

One way of deciding which model is appropriate is to examine the Akaike Information Criterion (AIC), a fit statistic that rewards goodness of fit while penalizing the number of estimated parameters. A lower value indicates a better fitting model.

Model   AIC
OLS     278.34
SLM     279.96
SEM     266.11
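The AIC values in the table can be reproduced from the log likelihoods and parameter counts printed in the model summaries, using AIC = 2k - 2 log L. The plain-Python sketch below does this for the SLM and SEM. For OLS the log likelihood is not printed, so it is backed out from the reported AIC under the assumption that OLS has 9 estimated parameters (8 coefficients plus the error variance); that count is an assumption, not something stated in the output.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


# Log likelihoods and parameter counts as reported in the model summaries.
slm_loglik, sem_loglik = -129.9789, -123.0536
n_params_spatial = 10

aic_slm = aic(slm_loglik, n_params_spatial)
aic_sem = aic(sem_loglik, n_params_spatial)

# Assumption: OLS has 9 parameters, so its log likelihood follows from its AIC.
ols_aic = 278.34
ols_params = 9
ols_loglik = (2 * ols_params - ols_aic) / 2

# Likelihood ratio statistics of each spatial model against OLS.
lr_slm = 2 * (slm_loglik - ols_loglik)
lr_sem = 2 * (sem_loglik - ols_loglik)

print(f"AIC  SLM = {aic_slm:.2f}, SEM = {aic_sem:.2f}")
print(f"LR   SLM = {lr_slm:.3f}, SEM = {lr_sem:.3f}")
```

On these numbers the SEM has the lowest AIC of the three models, while the SLM's AIC is slightly higher than that of OLS because its extra parameter adds almost nothing to the log likelihood.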