library(nortest)
library(car)
library(lmtest)
library(classInt)

Introduction to Spatial Regression Models

Up until now, we have been concerned with describing the structure of spatial data through correlational methods and the tools of exploratory spatial data analysis (ESDA).

Through ESDA, we examined data for patterns, and using the Moran's I and Local Moran's I statistics, we examined clustering of variables. Now we consider regression models for continuous outcomes. We begin with a review of the Ordinary Least Squares (OLS) model for a continuous outcome.

OLS Model

The basic OLS model estimates the effect of one or more independent variables on the value of a dependent variable. With a single predictor, this is written as: \(y_i = \beta_0 + \beta_1 * x_i + e_i\)

where y is the dependent variable that we want to model, x is the independent variable we think has an association with y, \(\beta_0\) is the model intercept, or the expected value of y when x = 0, and \(\beta_1\) is the slope parameter that defines the strength of the linear relationship between x and y. e is the error in y that is unaccounted for by the intercept and the values of x. The average, or expected, value of y is: \(E[y|x] = \beta_0 + \beta_1 * x_i\), which is the linear mean function for y, conditional on x, and this gives us the customary linear regression plot:

set.seed(1234)
#simulate a predictor and an outcome with a known linear relationship
x<- rnorm(100, 10, 5)
beta0<-1
beta1<-1.5
y<-beta0+beta1*x+rnorm(100, 0, 5)

plot(x, y)
abline(coef = coef(lm(y~x)), lwd=1.5, col=2)

summary(lm(y~x))$coef
##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 1.446620  1.0879494  1.329676 1.867119e-01
## x           1.473915  0.1037759 14.202863 1.585002e-25

Here, the line shows \(E[y|x] = \beta_0 + \beta_1 * x_i\).

We assume that the errors, \(e_i \sim N(0, \sigma^2)\), are independent, Normally distributed, and homoskedastic, with common variance \(\sigma^2\).

This is the simple model with one predictor. We can easily add more predictors to the equation and rewrite it: \(y_i = \beta_0 + \sum_{j=1}^{k} \beta_j * x_{ij} + e_i\)

So, now the mean of y is modeled with multiple x variables. We can write this relationship more compactly using matrix notation:

\(Y = X \beta + e\)

Where Y is now a \(n \times 1\) vector of observations of our dependent variable, X is a \(n \times (k+1)\) matrix of independent variables, with the first column being all 1's for the intercept, \(\beta\) is the \((k+1) \times 1\) vector of regression coefficients, and e is the \(n \times 1\) vector of errors for each observation.

In matrices this looks like: \[\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}=\begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,k}\\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n,1} & x_{n,2} & \dots & x_{n,k} \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}+\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}\]

The errors are assumed to be uncorrelated and homoskedastic, with covariance matrix:

\[ \Sigma = \sigma^2 I = \sigma^2 * \begin{bmatrix} 1 & 0 & 0 & \dots & 0\\ 0 & 1 & 0 & \dots & 0 \\ 0 & \vdots & \vdots & \dots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} = \begin{bmatrix} \sigma^2 & 0 & 0 & \dots & 0\\ 0 & \sigma^2 & 0 & \dots & 0 \\ 0 & \vdots & \vdots & \dots & \vdots \\ 0 & 0 & 0 & \dots & \sigma^2 \end{bmatrix}\]

To estimate the \(\beta\) coefficients, we use the customary OLS estimator: \(\hat{\beta} = (X'X)^{-1} (X' Y)\)

This is the estimator that minimizes the residual sum of squares: \((Y - X \beta)' (Y - X \beta)\)

or

\((Y - \hat Y)' (Y - \hat Y)\)
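As a quick illustration (a minimal sketch using the simulated x and y from above), we can compute \(\hat{\beta}\) directly with matrix algebra and compare it to the estimates from lm():

#design matrix: a column of 1's for the intercept, plus x
X<-cbind(1, x)
#the OLS estimator (X'X)^{-1} X'Y
solve(t(X)%*%X)%*%t(X)%*%y
#compare to the coefficients from lm()
coef(lm(y~x))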

We can inspect the properties of the estimates by examining the residuals, \(e_i\), of the model. Since we assume the errors are Normally distributed, a quantile-quantile (Q-Q) plot of the residuals against the expected quantiles of the standard normal distribution should form a straight line. Formal tests of normality can also be used.

fit<-lm(y~x)
#Q-Q plot of the studentized residuals
qqnorm(rstudent(fit))
qqline(rstudent(fit))

shapiro.test(resid(fit))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(fit)
## W = 0.98878, p-value = 0.5677
ad.test(resid(fit))
## 
##  Anderson-Darling normality test
## 
## data:  resid(fit)
## A = 0.39859, p-value = 0.3593
lillie.test(resid(fit))
## 
##  Lilliefors (Kolmogorov-Smirnov) normality test
## 
## data:  resid(fit)
## D = 0.052017, p-value = 0.7268

We may also inspect the association between \(e_i\) (or, more appropriately, the studentized/standardized residuals) and the predictors and the dependent variable. If we see evidence of association, then homoskedasticity is a poor assumption.

par(mfrow=c(2,2))
plot(fit)

par(mfrow=c(1,1))
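
A formal test can supplement these diagnostic plots. As a minimal sketch, the Breusch-Pagan test from the lmtest package (loaded above) tests the null hypothesis of constant error variance:

#Breusch-Pagan test of the null hypothesis of homoskedastic errors
lmtest::bptest(fit)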

Model-data agreement

Do we (meaning our data) meet the statistical assumptions of our analytical models?
Always ask this of any analysis you do: if your model is wrong, your inference will also be wrong.

Since spatial data often display correlations amongst closely located observations (autocorrelation), we should probably test for autocorrelation in the model residuals, as that would violate the assumptions of the OLS model.

One method for doing this is to calculate the value of Moran’s I for the OLS residuals.

library(spdep)
library(rgdal)
library(RColorBrewer)
dat<-readOGR(dsn = "/Users/ozd504/Google Drive/classes/dem7263/class17/data", layer="SA_classdata")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/ozd504/Google Drive/classes/dem7263/class17/data", layer: "SA_classdata"
## with 235 features
## It has 71 fields
## Integer64 fields read as strings:  POLY_ID Sum_school
#Make a queen style weight matrix
sanb<-poly2nb(dat, queen=T)
summary(sanb)
## Neighbour list object:
## Number of regions: 235 
## Number of nonzero links: 1314 
## Percentage nonzero weights: 2.379357 
## Average number of links: 5.591489 
## Link number distribution:
## 
##  1  2  3  4  5  6  7  8  9 10 
##  2  7 16 30 52 61 41 18  7  1 
## 2 least connected regions:
## 82 205 with 1 link
## 1 most connected region:
## 30 with 10 links
salw<-nb2listw(sanb, style="W")
#3-year death count and mortality rate per 1,000 population
dat$mort3<-apply(dat@data[, c("deaths09", "deaths10", "deaths11")],1,sum)
dat$mortrate<-1000*dat$mort3/dat$acs_poptot

fit2<-lm(mortrate ~ pfemhh + PBLACK+ log(MEDHHINC), data=dat )

dat$resid<-rstudent(fit2)
spplot(dat, "resid",at=quantile(dat$resid), col.regions=brewer.pal(n=5, "RdBu"), main="Residuals from OLS Fit of Mortality Rate")

lm.morantest(fit2, listw = salw, resfun = rstudent)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = mortrate ~ pfemhh + PBLACK + log(MEDHHINC),
## data = dat)
## weights: salw
## 
## Moran I statistic standard deviate = 0.22711, p-value = 0.4102
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     -0.002991893     -0.011928881      0.001548559

In this case, there appears to be no clustering in the residuals, since the observed value of Moran's I is -.003, with a z-score of .227, p = .41.

Extending the OLS model to accommodate spatial structure

Now assume we measure our Y and X's at specific spatial locations (s), so we have Y(s) and X(s).

In most analyses, the spatial location (i.e. the county or census tract) only serves to link X and Y so we can collect our data on them, and the subsequent analysis ignores this spatial information: nothing explicitly considers the spatial relationships between the variables or the locations.

In fact, even though we measure Y(s) and X(s), what we end up analyzing are X and Y, applying the ordinary regression methods to these data to understand the effects of X on Y.

Moreover, we could move the observations around in space (as long as we keep each \(y_i\) with its \(x_i\)) and still get the same results. Such analyses have been called a-spatial. This is the kind of regression model you are used to fitting, where we ignore any information on the locations of the observations themselves.

However, we can extend the simple regression case to include the information on (s) and incorporate it into our models explicitly, so they are no longer a-spatial.

There are several alternative methods by which to incorporate the (s) locations into our models, and we will spend this semester talking about each of these:

  • The structured linear mixed (multi-level) model, or GLMM (generalized linear mixed model)
  • Spatial filtering of observations
  • Spatially autoregressive models (SAR)
  • Geographically weighted regression (GWR)

We will first deal with the case of the spatially autoregressive model, or SAR model, as its structure is just a modification of the OLS model from above.

Spatially autoregressive models

These models come to demography largely from economics, where they were well described by LeSage and Pace, 2009 and Anselin, 1988, among other sources. These models are used for continuous outcomes, generally measured at some level of aggregation, but they can also be used if you have individually geocoded data.

We saw in the normal OLS model that the basic assumptions are that: 1) the model residuals are distributed as iid Normal random variates with mean 0, and 2) they have common (and constant, meaning homoskedastic) variance.

Spatial data, however, present a series of problems for the standard OLS regression model. These problems are typically seen as various representations of spatial structure or dependence within the data. The spatial structure of the data can introduce spatial dependence into the outcome, the predictors, and the model residuals.

This can be observed as neighboring observations both having high (or low) values (positive autocorrelation) for the dependent variable, the model predictors, or the model residuals. We can also observe situations where areas with high values are surrounded by areas with low values (negative autocorrelation).

Since the standard OLS model assumes the residuals (and the outcomes themselves) are uncorrelated, as previously stated, the autocorrelation inherent to most spatial data violates the iid distributional assumptions for the residuals, and could violate the assumption of common variance for the OLS residuals. To account for the expected spatial association in the data, we would like a model that accounts for the spatial structure of the data. One way of doing this is by allowing there to be correlation among the residuals in our model, or in the dependent variable itself.

We are familiar with the concept of autoregression amongst neighboring observations. This concept is that a particular observation is a linear combination of its neighboring values. This autoregression introduces dependence into the data. Instead of specifying the autoregression structure directly, we introduce spatial autocorrelation through a global autocorrelation coefficient and a spatial proximity measure.

There are 2 basic forms of the spatial autoregressive model: the spatial lag and the spatial error models, with expansions of each, referred to as the Durbin lag and the Durbin error model.

Both of these models build on the basic OLS regression model: \(Y = X \beta + e\)

Where Y is the dependent variable, X is the matrix of independent variables, \(\beta\) is the vector of regression parameters to be estimated from the data, and e is the vector of model residuals, which are assumed to be distributed as Gaussian random variables with mean 0 and constant variance-covariance matrix \(\Sigma\).

The spatial lag model

The spatial lag model introduces autocorrelation into the regression model by lagging the dependent variable itself, much like in a time-series approach.
The model is specified as: \(Y = \rho W Y + X \beta + e\)

where \(\rho\) is the autoregressive coefficient, which tells us how strong the resemblance is, on average, between \(Y_i\) and its neighbors. The matrix **W** is the spatial weight matrix, describing the spatial network structure of the observations, as we described in the ESDA lecture.

In the lag model, we specify the spatial component on the dependent variable. This leads to a spatial filtering of the variable, where its values are averaged over the surrounding neighborhood defined in **W**, producing the spatially lagged variable.
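
To see what the spatially lagged variable looks like, spdep's lag.listw() function computes \(WY\) directly; here is a brief illustration using the weight list salw and the mortality rate constructed above:

#the spatial lag of the mortality rate: each tract's value is replaced by
#the average of its neighbors' values under the row-standardized W
head(lag.listw(salw, dat$mortrate))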

The specification used most often is a spatially filtered Y variable that can then be regressed on X, which can be seen directly in the re-expression of the OLS model as:

\(Y = \rho WY + X \beta + e\)

where the direct effect of the spatial lagging of the dependent variable is seen.
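
Rearranging terms makes the dependence structure explicit: moving the lag term to the left-hand side gives

\[(I - \rho W)Y = X\beta + e \quad \Rightarrow \quad Y = (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} e\]

so each observation's value of Y depends on the X values and errors of its neighbors (and their neighbors, and so on) through the inverse term.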

To estimate these models, we can use either GeoDa or R. In R, we use the spdep package and the lagsarlm() function.

The lag model is:

fit.lag<-lagsarlm(mortrate ~ pfemhh + PBLACK+ log(MEDHHINC), data=dat, listw = salw) 
summary(fit.lag, Nagelkerke=T)
## 
## Call:lagsarlm(formula = mortrate ~ pfemhh + PBLACK + log(MEDHHINC), 
##     data = dat, listw = salw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -95.7578 -16.5280  -3.6110   9.3572 876.7036 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value  Pr(>|z|)
## (Intercept)   -294.022    143.182 -2.0535   0.04003
## pfemhh         330.334     56.693  5.8267 5.655e-09
## PBLACK         -64.002     37.986 -1.6849   0.09201
## log(MEDHHINC)   21.065     12.536  1.6804   0.09288
## 
## Rho: -0.021935, LR test value: 0.054577, p-value: 0.81528
## Asymptotic standard error: 0.099682
##     z-value: -0.22005, p-value: 0.82583
## Wald statistic: 0.048422, p-value: 0.82583
## 
## Log likelihood: -1300.676 for lag model
## ML residual variance (sigma squared): 3757.9, (sigma: 61.302)
## Nagelkerke pseudo-R-squared: 0.15201 
## Number of observations: 235 
## Number of parameters estimated: 6 
## AIC: 2613.4, (AIC for lm: 2611.4)
## LM test for residual autocorrelation
## test value: 1.2808, p-value: 0.25775

We see that \(\rho\) is estimated to be -.022, and the likelihood ratio test shows that this is not significantly different from 0.

The spatial error model

The spatial error model says that the autocorrelation is not in the outcome itself; instead, any autocorrelation is attributable to missing spatial covariates in the data. If these spatially patterned covariates could be measured, then the autocorrelation would be 0. This model is written:

\(Y = X \beta + e\)

\(e=\lambda W e + v\)
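
Here \(\lambda\) is the autoregressive coefficient for the errors, and v is a vector of iid Gaussian disturbances. Solving the error equation gives the reduced form:

\[e = (I - \lambda W)^{-1} v \quad \Rightarrow \quad Y = X\beta + (I - \lambda W)^{-1} v\]

so the spatial structure enters only through the covariance of the errors, not through the mean of Y.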

This model, in effect, controls for the nuisance of correlated errors in the data that are attributable to an inherently spatial process, or to spatial autocorrelation in the measurement errors of the measured and possibly unmeasured variables in the model.

This model is estimated in R using the errorsarlm() function in the spdep library:

fit.err<-errorsarlm(mortrate ~ pfemhh + PBLACK+ log(MEDHHINC), data=dat, listw = salw) 
summary(fit.err, Nagelkerke=T)
## 
## Call:errorsarlm(formula = mortrate ~ pfemhh + PBLACK + log(MEDHHINC), 
##     data = dat, listw = salw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -95.1147 -16.2790  -3.4917   9.3874 877.0479 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value  Pr(>|z|)
## (Intercept)   -298.466    144.259 -2.0690   0.03855
## pfemhh         330.075     56.280  5.8649 4.495e-09
## PBLACK         -63.927     38.280 -1.6700   0.09492
## log(MEDHHINC)   21.439     12.635  1.6968   0.08974
## 
## Lambda: 0.011755, LR test value: 0.014515, p-value: 0.90411
## Asymptotic standard error: 0.10476
##     z-value: 0.11221, p-value: 0.91066
## Wald statistic: 0.012591, p-value: 0.91066
## 
## Log likelihood: -1300.696 for error model
## ML residual variance (sigma squared): 3758.8, (sigma: 61.309)
## Nagelkerke pseudo-R-squared: 0.15186 
## Number of observations: 235 
## Number of parameters estimated: 6 
## AIC: 2613.4, (AIC for lm: 2611.4)

We see \(\lambda\) = .012, with a p-value of .90, suggesting again that, in this case, there is no autocorrelation in the model residuals.

We can examine the relative fits of each model by extracting the AIC values from each:

AIC(fit.lag)
## [1] 2613.352
AIC(fit.err)
## [1] 2613.392
AIC(fit2)
## [1] 2611.406

The AIC values of the spatial models are slightly higher than that of the OLS model, showing little evidence in favor of the spatial regression models in this case.

Examination of Model Specification

To some degree, both of the SAR specifications allow us to model spatial dependence in the data. The primary difference between them is where we model said dependence.

The lag model says that the dependence affects the dependent variable only; we can liken this to a diffusion scenario, where your neighbors have a diffusive effect on you.

The error model says that dependence affects the residuals only. We can liken this to the missing spatially dependent covariate situation, where, if only we could measure another really important spatially associated predictor, we could account for the spatial dependence. But alas, we cannot, and we instead model dependence in our errors.

These are inherently two completely different ways to think about specifying a model, and we should really make our decision based upon how we think our process of interest operates.

That being said, this way of thinking isn’t necessarily popular among practitioners. Most practitioners want the best fitting model, ‘nuff said. So methods have been developed that test for alternate model specifications, to see which kind of model best summarizes the observed variation in the dependent variable and the spatial dependence.

These are a set of so-called Lagrange Multiplier tests (econometrician's jargon for score tests). These tests compare the model fits of the OLS, spatial error, and spatial lag models using the method of the score test.

For those who don’t remember, the score test is a test based on the relative change in the first derivative of the likelihood function around the maximum likelihood. The particular thing here that is affecting the value of this derivative is the autoregressive parameter, \(\rho\) or \(\lambda\). In the OLS model \(\rho\) or \(\lambda\) = 0 (so both the lag and error models simplify to OLS), but as this parameter changes, so does the likelihood for the model, hence why the derivative of the likelihood function is used. This is all related to how the estimation routines estimate the value of \(\rho\) or \(\lambda\).

Using the Lagrange Multiplier Test (LMT)

In general, you fit the OLS model to your dependent variable, then submit the OLS model fit to the LMT testing procedure.

Then you look to see which model (spatial error, or spatial lag) has the highest value for the test.
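
For example, using the San Antonio OLS fit and queen weight list from the first example above, the call would look like this (test = "all" requests the full battery, including the robust versions discussed below):

#submit the OLS fit and the weight list to the LM testing procedure
lm.LMtests(fit2, listw = salw, test = "all")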

Enter the uncertainty…

So how much bigger, you might say?

Well, drastically bigger. If the LMT for the error model is 2500 and the LMT for the lag model is 2480, this is NOT A BIG DIFFERENCE, only about 1%. If you see a LMT for the error model of 2500 and a LMT for the lag model of 250, THIS IS A BIG DIFFERENCE.

So what if you don’t see a BIG DIFFERENCE, HOW DO YOU DECIDE WHICH MODEL TO USE???

Well, you could think more, but who has time for that.

The econometricians have thought up a better LMT, the so-called robust LMT (robust to what, I'm not sure), but it is said that it can settle such problems of a not-so-big difference between the lag and error model specifications.

So what do you do? In general, think about your problem before you run your analysis. Should this fail you, proceed with the LMT; if this is inconclusive, look at the robust LMT and choose the model which has the larger value for this test.

More Data Examples:

San Antonio, TX mortality data

#Spatial Regression Models 1

#drop tracts with very small populations
dat<-dat[which(dat$acs_poptot>100),]

#Create a good representative set of neighbor types
sa.nb6<-knearneigh(coordinates(dat), k=6)
sa.nb6<-knn2nb(sa.nb6)
sa.wt6<-nb2listw(sa.nb6, style="W")

sa.nb5<-knearneigh(coordinates(dat), k=5)
sa.nb5<-knn2nb(sa.nb5)
sa.wt5<-nb2listw(sa.nb5, style="W")

sa.nb4<-knearneigh(coordinates(dat), k=4)
sa.nb4<-knn2nb(sa.nb4)
sa.wt4<-nb2listw(sa.nb4, style="W")

sa.nb3<-knearneigh(coordinates(dat), k=3)
sa.nb3<-knn2nb(sa.nb3)
sa.wt3<-nb2listw(sa.nb3,style="W")

sa.nb2<-knearneigh(coordinates(dat), k=2)
sa.nb2<-knn2nb(sa.nb2)
sa.wt2<-nb2listw(sa.nb2,style="W")

sa.nbr<-poly2nb(dat, queen=F)
sa.wtr<-nb2listw(sa.nbr, zero.policy=T)

sa.nbq<-poly2nb(dat, queen=T)
sa.wtq<-nb2listw(sa.nbr, style="W", zero.policy=T)

sa.nbd<-dnearneigh(coordinates(dat), d1=0, d2=10000)
sa.wtd<-nb2listw(sa.nbd, zero.policy=T)

#create a mortality rate, 3 year average
dat$mort3<-apply(dat@data[, c("deaths09", "deaths10", "deaths11")],1,mean)
dat$mortrate<-1000*dat$mort3/dat$acs_poptot

#take a quick look at the distribution of the outcome
hist(dat$mortrate)

#do some basic regression models, without spatial structure

fit.1<-lm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat)
summary(fit.1)
## 
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) + 
##     scale(dissim) + scale(ppop65plus), data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96044 -0.27804 -0.00673  0.26359  2.18006 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.533e-17  3.120e-02   0.000   1.0000    
## scale(ppersonspo)             1.215e-01  5.047e-02   2.407   0.0169 *  
## scale(I(viol3yr/acs_poptot))  2.287e-01  3.841e-02   5.953  9.8e-09 ***
## scale(dissim)                 8.467e-02  4.817e-02   1.758   0.0801 .  
## scale(ppop65plus)             7.240e-01  3.594e-02  20.146  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4773 on 229 degrees of freedom
## Multiple R-squared:  0.7761, Adjusted R-squared:  0.7722 
## F-statistic: 198.4 on 4 and 229 DF,  p-value: < 2.2e-16
car::vif(fit.1)
##            scale(ppersonspo) scale(I(viol3yr/acs_poptot)) 
##                     2.605367                     1.508595 
##                scale(dissim)            scale(ppop65plus) 
##                     2.372665                     1.320878
par(mfrow=c(2,2))
plot(fit.1)

par(mfrow=c(1,1))
#this is a test for constant variance
lmtest::bptest(fit.1)  #looks like we have heteroskedasticity
## 
##  studentized Breusch-Pagan test
## 
## data:  fit.1
## BP = 51.088, df = 4, p-value = 2.14e-10
#extract studentized residuals from the fit, and examine them
dat$residfit1<-rstudent(fit.1)

cols<-brewer.pal(5,"RdBu")
spplot(dat,"residfit1", at=quantile(dat$residfit1), col.regions=cols, main="Residuals from Model fit of Mortality Rate")

Chi and Zhu suggest using a wide array of neighbor specifications, then picking the one that maximizes the autocorrelation coefficient. So, here I emulate their approach:

#test for residual autocorrelation 
resi<-c(lm.morantest(fit.1, listw=sa.wt2)$estimate[1],
lm.morantest(fit.1, listw=sa.wt3)$estimate[1],
lm.morantest(fit.1, listw=sa.wt4)$estimate[1],
lm.morantest(fit.1, listw=sa.wt5)$estimate[1],
lm.morantest(fit.1, listw=sa.wt6)$estimate[1],
lm.morantest(fit.1, listw=sa.wtd, zero.policy=T)$estimate[1],
lm.morantest(fit.1, listw=sa.wtq)$estimate[1],
lm.morantest(fit.1, listw=sa.wtr)$estimate[1])
plot(resi, type="l")

lm.morantest(fit.1, listw=sa.wt2)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt2
## 
## Moran I statistic standard deviate = -1.3282, p-value = 0.908
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     -0.089262137     -0.010642737      0.003503515
lm.morantest(fit.1, listw=sa.wt3)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt3
## 
## Moran I statistic standard deviate = 0.10133, p-value = 0.4596
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     -0.005775111     -0.010724844      0.002386190
lm.morantest(fit.1, listw=sa.wt4)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt4
## 
## Moran I statistic standard deviate = 0.90538, p-value = 0.1826
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##       0.02812192      -0.01050301       0.00182003
lm.morantest(fit.1, listw=sa.wt5)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt5
## 
## Moran I statistic standard deviate = 1.3996, p-value = 0.08082
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##       0.04315932      -0.01029146       0.00145856
lm.morantest(fit.1, listw=sa.wt6)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
## 
## Moran I statistic standard deviate = 1.7095, p-value = 0.04368
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.049203016     -0.010078442      0.001202519
lm.morantest(fit.1, listw=sa.wtd, zero.policy=T)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wtd
## 
## Moran I statistic standard deviate = 2.6709, p-value = 0.003782
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     1.901642e-02    -6.929932e-03     9.436939e-05
lm.morantest(fit.1, listw=sa.wtq)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wtq
## 
## Moran I statistic standard deviate = 0.65308, p-value = 0.2569
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.017905792     -0.010601820      0.001905428
lm.morantest(fit.1, listw=sa.wtr)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wtr
## 
## Moran I statistic standard deviate = 0.65308, p-value = 0.2569
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.017905792     -0.010601820      0.001905428
#looks like we have minimal autocorrelation in our residuals, but the distance-based
#weight (and the k=6 weight) do show significant autocorrelation

#Let's look at the local autocorrelation in the mortality rate itself
#get the values of the local Moran's I
dat$lmfit1<-localmoran(dat$mortrate, sa.wt5, zero.policy=T)[,1]
brks<-classInt::classIntervals(dat$lmfit1, n=5, style="quantile")
spplot(dat, "lmfit1", at=brks$brks, col.regions=brewer.pal(5, "RdBu"), main="Local Moran Plot of Mortality Rate")

We examine the Lagrange multiplier test for model specification:

lm.LMtests(fit.1, listw = sa.wt6, test = "all")
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
## 
## LMerr = 1.8773, df = 1, p-value = 0.1706
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
## 
## LMlag = 10.038, df = 1, p-value = 0.001533
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
## 
## RLMerr = 0.09383, df = 1, p-value = 0.7594
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
## 
## RLMlag = 8.2549, df = 1, p-value = 0.004064
## 
## 
##  Lagrange multiplier diagnostics for spatial dependence
## 
## data:  
## model: lm(formula = scale(mortrate) ~ scale(ppersonspo) +
## scale(I(viol3yr/acs_poptot)) + scale(dissim) + scale(ppop65plus),
## data = dat)
## weights: sa.wt6
## 
## SARMA = 10.132, df = 2, p-value = 0.006307

This suggests that a spatial lag model is best in this case, since the LMlag and RLMlag tests have the largest values, and the LMerr tests are not significant.

Now we fit the spatial lag model. The lag model is fit with lagsarlm() in the spdep library; we specify the same model as in the lm() fit above, but we also need to supply the spatial weight matrix and the type of model to fit:

fit.lag<-lagsarlm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat, listw=sa.wt2, type="lag")
summary(fit.1)
## 
## Call:
## lm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) + 
##     scale(dissim) + scale(ppop65plus), data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96044 -0.27804 -0.00673  0.26359  2.18006 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -1.533e-17  3.120e-02   0.000   1.0000    
## scale(ppersonspo)             1.215e-01  5.047e-02   2.407   0.0169 *  
## scale(I(viol3yr/acs_poptot))  2.287e-01  3.841e-02   5.953  9.8e-09 ***
## scale(dissim)                 8.467e-02  4.817e-02   1.758   0.0801 .  
## scale(ppop65plus)             7.240e-01  3.594e-02  20.146  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4773 on 229 degrees of freedom
## Multiple R-squared:  0.7761, Adjusted R-squared:  0.7722 
## F-statistic: 198.4 on 4 and 229 DF,  p-value: < 2.2e-16
summary(fit.lag, Nagelkerke=T)
## 
## Call:
## lagsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) + 
##     scale(dissim) + scale(ppop65plus), data = dat, listw = sa.wt2, 
##     type = "lag")
## 
## Residuals:
##         Min          1Q      Median          3Q         Max 
## -1.91953751 -0.28320690  0.00039592  0.25188870  2.28912610 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                                Estimate Std. Error z value  Pr(>|z|)
## (Intercept)                  -0.0035113  0.0307209 -0.1143    0.9090
## scale(ppersonspo)             0.1047111  0.0509073  2.0569    0.0397
## scale(I(viol3yr/acs_poptot))  0.2235749  0.0379480  5.8916 3.825e-09
## scale(dissim)                 0.0758260  0.0476155  1.5925    0.1113
## scale(ppop65plus)             0.7019724  0.0376957 18.6221 < 2.2e-16
## 
## Rho: 0.066138, LR test value: 2.1119, p-value: 0.14616
## Asymptotic standard error: 0.043789
##     z-value: 1.5104, p-value: 0.13095
## Wald statistic: 2.2813, p-value: 0.13095
## 
## Log likelihood: -155.394 for lag model
## ML residual variance (sigma squared): 0.22064, (sigma: 0.46973)
## Nagelkerke pseudo-R-squared: 0.77808 
## Number of observations: 234 
## Number of parameters estimated: 7 
## AIC: 324.79, (AIC for lm: 324.9)
## LM test for residual autocorrelation
## test value: 9.0458, p-value: 0.002633
#Test for homoskedasticity
bptest.sarlm(fit.lag)
## 
##  studentized Breusch-Pagan test
## 
## data:  
## BP = 53.263, df = 4, p-value = 7.506e-11

We see evidence for heteroskedastic variances, which suggests we need an empirically derived set of standard errors for the model. These are referred to as White, or Huber–White standard errors, following White, 1980. For more on this, see the recent paper by Cribari-Neto and Bernardino da Silva, 2010. I get this method of calculating the robust standard errors from the spdep manual. If all else fails, READ THE INSTRUCTIONS!!

#robust SE's for the spatial model
library(sandwich)
lm.target <- lm(fit.lag$tary ~ fit.lag$tarX - 1)
lmtest::coeftest(lm.target, vcov.=vcovHC(lm.target, type="HC0", df=Inf))
## 
## t test of coefficients:
## 
##                                             Estimate Std. Error t value
## fit.lag$tarXx(Intercept)                  -0.0035113  0.0307070 -0.1143
## fit.lag$tarXxscale(ppersonspo)             0.1047111  0.0790642  1.3244
## fit.lag$tarXxscale(I(viol3yr/acs_poptot))  0.2235749  0.1140534  1.9603
## fit.lag$tarXxscale(dissim)                 0.0758260  0.0498978  1.5196
## fit.lag$tarXxscale(ppop65plus)             0.7019724  0.0617274 11.3721
##                                           Pr(>|t|)    
## fit.lag$tarXx(Intercept)                   0.90906    
## fit.lag$tarXxscale(ppersonspo)             0.18670    
## fit.lag$tarXxscale(I(viol3yr/acs_poptot))  0.05118 .  
## fit.lag$tarXxscale(dissim)                 0.12998    
## fit.lag$tarXxscale(ppop65plus)             < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This shows that, compared to the asymptotic standard errors, the empirical ones are larger for several parameters, and thus we have only one significant effect (% over age 65) in the model once we apply them.

Next we fit the spatial error model:

fit.err<-errorsarlm(scale(mortrate)~scale(ppersonspo)+scale(I(viol3yr/acs_poptot))+scale(dissim)+scale(ppop65plus), data=dat, listw=sa.wt2)
summary(fit.err, Nagelkerke=T)
## 
## Call:
## errorsarlm(formula = scale(mortrate) ~ scale(ppersonspo) + scale(I(viol3yr/acs_poptot)) + 
##     scale(dissim) + scale(ppop65plus), data = dat, listw = sa.wt2)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.9126219 -0.2695111 -0.0030349  0.2680765  2.1121586 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                               Estimate Std. Error z value  Pr(>|z|)
## (Intercept)                  0.0029538  0.0277141  0.1066   0.91512
## scale(ppersonspo)            0.1202146  0.0484444  2.4815   0.01308
## scale(I(viol3yr/acs_poptot)) 0.2303560  0.0371271  6.2045 5.486e-10
## scale(dissim)                0.0870377  0.0458697  1.8975   0.05776
## scale(ppop65plus)            0.7350826  0.0340110 21.6131 < 2.2e-16
## 
## Lambda: -0.10644, LR test value: 2.2391, p-value: 0.13456
## Asymptotic standard error: 0.071547
##     z-value: -1.4877, p-value: 0.13684
## Wald statistic: 2.2132, p-value: 0.13684
## 
## Log likelihood: -155.3304 for error model
## ML residual variance (sigma squared): 0.22001, (sigma: 0.46905)
## Nagelkerke pseudo-R-squared: 0.7782 
## Number of observations: 234 
## Number of parameters estimated: 7 
## AIC: 324.66, (AIC for lm: 324.9)

As a pretty good indicator of which model is best, look at the AIC of each:

AIC(fit.1)
## [1] 324.8999
AIC(fit.lag)
## [1] 324.788
AIC(fit.err)
## [1] 324.6608

Larger data example: US counties

This example shows a lot more in terms of spatial effects. These are the data from Sparks and Sparks, 2010, and we replicate their analysis below:

spdat<-readOGR(dsn="/Users/ozd504/Google Drive/classes/dem7263/class17/data/Sparks_Sparks_PSP_data_code", layer = "PSP_Data_CLnad83")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/ozd504/Google Drive/classes/dem7263/class17/data/Sparks_Sparks_PSP_data_code", layer: "PSP_Data_CLnad83"
## with 3071 features
## It has 81 fields
#Create a good representative set of neighbor types
us.nb6<-knearneigh(coordinates(spdat), k=6)
us.nb6<-knn2nb(us.nb6)
us.wt6<-nb2listw(us.nb6, style="W")

us.nb5<-knearneigh(coordinates(spdat), k=5)
us.nb5<-knn2nb(us.nb5)
us.wt5<-nb2listw(us.nb5, style="W")

us.nb4<-knearneigh(coordinates(spdat), k=4)
us.nb4<-knn2nb(us.nb4)
us.wt4<-nb2listw(us.nb4, style="W")

us.nb3<-knearneigh(coordinates(spdat), k=3)
us.nb3<-knn2nb(us.nb3)
us.wt3<-nb2listw(us.nb3,style="W")

us.nb2<-knearneigh(coordinates(spdat), k=2)
us.nb2<-knn2nb(us.nb2)
us.wt2<-nb2listw(us.nb2,style="W")

us.nbr<-poly2nb(spdat, queen=F)
us.wtr<-nb2listw(us.nbr, zero.policy=T)

us.nbq<-poly2nb(spdat, queen=T)
us.wtq<-nb2listw(us.nbr, style="W", zero.policy=T)

In that paper, all variables were z-scored prior to analysis, so here we do that:

spdat$mortz<-scale(spdat$ARSMORT, center=T, scale=T)
spdat$densz<-scale(spdat$PDENSITY, center=T, scale=T)
spdat$rurz<-scale(spdat$PRURAL, center=T, scale=T)
spdat$blacz<-scale(spdat$PBLACK, center=T, scale=T)
spdat$hisz<-scale(spdat$PHISP, center=T, scale=T)
spdat$femz<-scale(spdat$FEMHH, center=T, scale=T)
spdat$unemz<-scale(spdat$UNEMP, center=T, scale=T)
spdat$povz<-scale(spdat$PPERSONPO, center=T, scale=T)
spdat$incz<-scale(spdat$MEDHHINC, center=T, scale=T)
spdat$hvalz<-scale(spdat$MEDHVAL, center=T, scale=T)
spdat$giniz<-scale(spdat$GINI001, center=T, scale=T)

Basic descriptives of the dependent variable

library(ggplot2)
qplot(spdat$mortz,geom="histogram" )
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

spplot(spdat,"mortz", at=quantile(spdat$mortz), col.regions=brewer.pal(n=5, "Reds"), main="Spatial Distribution of US Mortality Rate")

Now we fit the OLS and weighted OLS models from the paper:

#construct the weight for the model, based on the population size of the counties
spdat$popsize<-(spdat$F1198401+spdat$F0453000+spdat$F1198499)/3
summary(spdat$popsize)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      83   11310   24900   89780   61990 9496000
#These are the OLS and weighted OLS regression models
fit.ols<-lm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat)
fit.ols.wt<-lm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat, weights=popsize)

resi<-c(lm.morantest(fit.ols, listw=us.wt2)$estimate[1],
        lm.morantest(fit.ols, listw=us.wt3)$estimate[1],
        lm.morantest(fit.ols, listw=us.wt4)$estimate[1],
        lm.morantest(fit.ols, listw=us.wt5)$estimate[1],
        lm.morantest(fit.ols, listw=us.wt6)$estimate[1],
        lm.morantest(fit.ols, listw=us.wtq,zero.policy=T)$estimate[1],
        lm.morantest(fit.ols, listw=us.wtr,zero.policy=T)$estimate[1])
plot(resi, type="l")

In the original paper, the authors used a Queen weight for the polygons, but upon further inspection, the k=2 nearest-neighbor rule has the highest residual Moran's I value, so we will use that instead.

#Spatial lag and error models
fit.err<-errorsarlm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat, listw=us.wt2)
summary(fit.err,Nagelkerke=T)
## 
## Call:errorsarlm(formula = mortz ~ rurz + blacz + hisz + femz + unemz + 
##     hvalz + densz + giniz, data = spdat, listw = us.wt2)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -5.7386852 -0.4193383 -0.0093521  0.4105584  4.7892036 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error  z value  Pr(>|z|)
## (Intercept)  0.00017225  0.02138118   0.0081  0.993572
## rurz        -0.02334989  0.01777164  -1.3139  0.188885
## blacz       -0.23365396  0.03061917  -7.6310 2.331e-14
## hisz        -0.15401508  0.02027011  -7.5981 2.998e-14
## femz         0.53778220  0.03146982  17.0888 < 2.2e-16
## unemz        0.04702766  0.01804642   2.6059  0.009163
## hvalz       -0.22989533  0.02001925 -11.4837 < 2.2e-16
## densz        0.02855309  0.01900162   1.5027  0.132925
## giniz        0.10239896  0.01620513   6.3189 2.634e-10
## 
## Lambda: 0.39048, LR test value: 491.85, p-value: < 2.22e-16
## Asymptotic standard error: 0.016478
##     z-value: 23.698, p-value: < 2.22e-16
## Wald statistic: 561.57, p-value: < 2.22e-16
## 
## Log likelihood: -3454.334 for error model
## ML residual variance (sigma squared): 0.52143, (sigma: 0.7221)
## Nagelkerke pseudo-R-squared: 0.44451 
## Number of observations: 3071 
## Number of parameters estimated: 11 
## AIC: 6930.7, (AIC for lm: 7420.5)
#Robust SE's
lm.target.us1 <- lm(fit.err$tary ~ fit.err$tarX - 1)
lmtest::coeftest(lm.target.us1, vcov.=vcovHC(lm.target.us1, type="HC0"))
## 
## t test of coefficients:
## 
##                                              Estimate  Std. Error t value
## fit.err$tarXI(x - lambda * WX)(Intercept)  0.00017225  0.02147193  0.0080
## fit.err$tarXI(x - lambda * WX)rurz        -0.02334989  0.02717644 -0.8592
## fit.err$tarXI(x - lambda * WX)blacz       -0.23365396  0.06017942 -3.8826
## fit.err$tarXI(x - lambda * WX)hisz        -0.15401508  0.02667697 -5.7733
## fit.err$tarXI(x - lambda * WX)femz         0.53778220  0.07539508  7.1329
## fit.err$tarXI(x - lambda * WX)unemz        0.04702766  0.02561214  1.8361
## fit.err$tarXI(x - lambda * WX)hvalz       -0.22989533  0.02689350 -8.5484
## fit.err$tarXI(x - lambda * WX)densz        0.02855309  0.03591108  0.7951
## fit.err$tarXI(x - lambda * WX)giniz        0.10239896  0.05015631  2.0416
##                                            Pr(>|t|)    
## fit.err$tarXI(x - lambda * WX)(Intercept) 0.9935998    
## fit.err$tarXI(x - lambda * WX)rurz        0.3902996    
## fit.err$tarXI(x - lambda * WX)blacz       0.0001055 ***
## fit.err$tarXI(x - lambda * WX)hisz        8.550e-09 ***
## fit.err$tarXI(x - lambda * WX)femz        1.222e-12 ***
## fit.err$tarXI(x - lambda * WX)unemz       0.0664327 .  
## fit.err$tarXI(x - lambda * WX)hvalz       < 2.2e-16 ***
## fit.err$tarXI(x - lambda * WX)densz       0.4266138    
## fit.err$tarXI(x - lambda * WX)giniz       0.0412771 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit.lag<-lagsarlm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat, listw=us.wt2, type="lag")
summary(fit.lag, Nagelkerke=T)
## 
## Call:lagsarlm(formula = mortz ~ rurz + blacz + hisz + femz + unemz + 
##     hvalz + densz + giniz, data = spdat, listw = us.wt2, type = "lag")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -5.557820 -0.417118 -0.011395  0.414508  5.057599 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)  0.0021367  0.0129920   0.1645   0.86936
## rurz        -0.0336341  0.0168510  -1.9960   0.04594
## blacz       -0.2475166  0.0247293 -10.0091 < 2.2e-16
## hisz        -0.1391756  0.0144548  -9.6283 < 2.2e-16
## femz         0.5127824  0.0290808  17.6330 < 2.2e-16
## unemz        0.0169236  0.0154308   1.0967   0.27275
## hvalz       -0.1941811  0.0160979 -12.0625 < 2.2e-16
## densz        0.0103019  0.0148174   0.6953   0.48689
## giniz        0.0996776  0.0152778   6.5244 6.829e-11
## 
## Rho: 0.36025, LR test value: 540.49, p-value: < 2.22e-16
## Asymptotic standard error: 0.015195
##     z-value: 23.709, p-value: < 2.22e-16
## Wald statistic: 562.11, p-value: < 2.22e-16
## 
## Log likelihood: -3430.017 for lag model
## ML residual variance (sigma squared): 0.51833, (sigma: 0.71995)
## Nagelkerke pseudo-R-squared: 0.45324 
## Number of observations: 3071 
## Number of parameters estimated: 11 
## AIC: 6882, (AIC for lm: 7420.5)
## LM test for residual autocorrelation
## test value: 25.778, p-value: 3.8301e-07
#Robust SE's
lm.target.us2 <- lm(fit.lag$tary ~ fit.lag$tarX - 1)
lmtest::coeftest(lm.target.us2, vcov.=vcovHC(lm.target.us2, type="HC0"))
## 
## t test of coefficients:
## 
##                            Estimate Std. Error  t value  Pr(>|t|)    
## fit.lag$tarXx(Intercept)  0.0021367  0.0129916   0.1645   0.86937    
## fit.lag$tarXxrurz        -0.0336341  0.0261127  -1.2880   0.19783    
## fit.lag$tarXxblacz       -0.2475166  0.0508244  -4.8700 1.172e-06 ***
## fit.lag$tarXxhisz        -0.1391756  0.0189604  -7.3403 2.717e-13 ***
## fit.lag$tarXxfemz         0.5127824  0.0691708   7.4133 1.586e-13 ***
## fit.lag$tarXxunemz        0.0169236  0.0204405   0.8279   0.40777    
## fit.lag$tarXxhvalz       -0.1941811  0.0191067 -10.1630 < 2.2e-16 ***
## fit.lag$tarXxdensz        0.0103019  0.0205378   0.5016   0.61598    
## fit.lag$tarXxginiz        0.0996776  0.0462517   2.1551   0.03123 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit.spauto.w<-spautolm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat, listw=us.wt2, family="SAR", weights=1/spdat$popsize)
summary(fit.spauto.w, Nagelkerke=T)
## 
## Call: spautolm(formula = mortz ~ rurz + blacz + hisz + femz + unemz + 
##     hvalz + densz + giniz, data = spdat, listw = us.wt2, weights = 1/spdat$popsize, 
##     family = "SAR")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -4.281229 -0.479232  0.068133  0.618946  5.483900 
## 
## Coefficients: 
##               Estimate Std. Error  z value  Pr(>|z|)
## (Intercept) -0.0895453  0.0441495  -2.0282   0.04254
## rurz        -0.2880542  0.0276863 -10.4042 < 2.2e-16
## blacz       -0.2056207  0.0386415  -5.3212 1.031e-07
## hisz        -0.0263833  0.0216050  -1.2212   0.22202
## femz         0.6243591  0.0347707  17.9565 < 2.2e-16
## unemz       -0.1418222  0.0224553  -6.3158 2.688e-10
## hvalz       -0.1357841  0.0300650  -4.5163 6.292e-06
## densz        0.0078593  0.2361797   0.0333   0.97345
## giniz        0.4562581  0.0100004  45.6242 < 2.2e-16
## 
## Lambda: 0.22872 LR test value: 131.47 p-value: < 2.22e-16 
## Numerical Hessian standard error of lambda: 0.019703 
## 
## Log likelihood: -5955.165 
## ML residual variance (sigma squared): 0.00010013, (sigma: 0.010006)
## Number of observations: 3071 
## Number of parameters estimated: 11 
## AIC: 11932
## Nagelkerke pseudo-R-squared: 0.62774

I sent the paper to a very famous spatial econometrician who immediately poo-pooed it for not considering the spatial Durbin model, so I'll present it here:
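
The spatial Durbin (lag) model adds the spatially lagged predictors to the lag model: \(Y = \rho W Y + X\beta + WX\theta + e\). In lagsarlm(), this is requested with type = "mixed", and errorsarlm() fits the Durbin error variant with etype = "emixed", as both are used below.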

fit.lag.d<-lagsarlm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat, listw=us.wt2, type="mixed")
summary(fit.lag.d, Nagelkerke=T)
## 
## Call:lagsarlm(formula = mortz ~ rurz + blacz + hisz + femz + unemz + 
##     hvalz + densz + giniz, data = spdat, listw = us.wt2, type = "mixed")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -4.921677 -0.403729 -0.017918  0.392254  4.955699 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)  0.0014794  0.0127846   0.1157 0.9078798
## rurz        -0.0435601  0.0177681  -2.4516 0.0142224
## blacz       -0.3792902  0.0374625 -10.1245 < 2.2e-16
## hisz        -0.0234159  0.0301664  -0.7762 0.4376157
## femz         0.5253596  0.0314711  16.6934 < 2.2e-16
## unemz        0.0317220  0.0188650   1.6815 0.0926612
## hvalz       -0.2458832  0.0231279 -10.6314 < 2.2e-16
## densz        0.0539333  0.0199499   2.7034 0.0068626
## giniz        0.1053401  0.0163545   6.4410 1.187e-10
## lag.rurz     0.0152096  0.0226504   0.6715 0.5019052
## lag.blacz    0.2072645  0.0428155   4.8409 1.293e-06
## lag.hisz    -0.1298747  0.0323443  -4.0154 5.935e-05
## lag.femz    -0.1099243  0.0409856  -2.6820 0.0073178
## lag.unemz   -0.0412451  0.0228562  -1.8045 0.0711453
## lag.hvalz    0.1017119  0.0262243   3.8785 0.0001051
## lag.densz   -0.0578695  0.0188644  -3.0677 0.0021575
## lag.giniz    0.0456150  0.0212404   2.1476 0.0317490
## 
## Rho: 0.37982, LR test value: 502.56, p-value: < 2.22e-16
## Asymptotic standard error: 0.016518
##     z-value: 22.994, p-value: < 2.22e-16
## Wald statistic: 528.72, p-value: < 2.22e-16
## 
## Log likelihood: -3386.543 for mixed model
## ML residual variance (sigma squared): 0.5007, (sigma: 0.7076)
## Nagelkerke pseudo-R-squared: 0.4685 
## Number of observations: 3071 
## Number of parameters estimated: 19 
## AIC: 6811.1, (AIC for lm: 7311.6)
## LM test for residual autocorrelation
## test value: 226.64, p-value: < 2.22e-16
fit.err.d<-errorsarlm(mortz~rurz+blacz+hisz+femz+unemz+hvalz+densz+giniz,data=spdat, listw=us.wt2,etype = "emixed")
summary(fit.err.d,Nagelkerke=T)
## 
## Call:errorsarlm(formula = mortz ~ rurz + blacz + hisz + femz + unemz + 
##     hvalz + densz + giniz, data = spdat, listw = us.wt2, etype = "emixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4.90464 -0.40899 -0.01381  0.38919  4.89100 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error  z value  Pr(>|z|)
## (Intercept)  0.0010341  0.0205964   0.0502 0.9599557
## rurz        -0.0389040  0.0182540  -2.1313 0.0330679
## blacz       -0.3495931  0.0342603 -10.2040 < 2.2e-16
## hisz        -0.0643982  0.0265170  -2.4286 0.0151586
## femz         0.5448644  0.0314945  17.3003 < 2.2e-16
## unemz        0.0291888  0.0180619   1.6160 0.1060856
## hvalz       -0.2401698  0.0213028 -11.2741 < 2.2e-16
## densz        0.0456698  0.0190769   2.3940 0.0166668
## giniz        0.1197507  0.0166056   7.2115 5.536e-13
## lag.rurz     0.0075656  0.0243701   0.3104 0.7562211
## lag.blacz    0.1143086  0.0416746   2.7429 0.0060901
## lag.hisz    -0.1592545  0.0298767  -5.3304 9.800e-08
## lag.femz     0.0650709  0.0420798   1.5464 0.1220154
## lag.unemz   -0.0244629  0.0237965  -1.0280 0.3039477
## lag.hvalz    0.0077994  0.0257802   0.3025 0.7622443
## lag.densz   -0.0488094  0.0197789  -2.4678 0.0135965
## lag.giniz    0.0859747  0.0224018   3.8378 0.0001241
## 
## Lambda: 0.37691, LR test value: 477.98, p-value: < 2.22e-16
## Asymptotic standard error: 0.016652
##     z-value: 22.635, p-value: < 2.22e-16
## Wald statistic: 512.34, p-value: < 2.22e-16
## 
## Log likelihood: -3398.833 for error model
## ML residual variance (sigma squared): 0.50521, (sigma: 0.71078)
## Nagelkerke pseudo-R-squared: 0.46423 
## Number of observations: 3071 
## Number of parameters estimated: 19 
## AIC: 6835.7, (AIC for lm: 7311.6)
#As a pretty good indicator of which model is best, look at the AIC of each
AIC(fit.ols)
## [1] 7420.523
AIC(fit.ols.wt)
## [1] 8155.177
AIC(fit.lag)
## [1] 6882.035
AIC(fit.err)
## [1] 6930.668
AIC(fit.spauto.w)
## [1] 11932.33
AIC(fit.lag.d)
## [1] 6811.086
AIC(fit.err.d)
## [1] 6835.666

Sure enough, the Durbin model is better. Guess I should listen to the famous guy.