Generally, if we have a continuous outcome, we consider using the OLS model and when we have data collected over space, we have other assumptions too.
library(spdep)
library(maptools)
library(car)
library(lmtest)
library(spgwr)
library(classInt)
library(RColorBrewer)
setwd("~/Google Drive//dem7263/data")
Stationarity is a general term in statistics/math, generally it means something doesn’t change over time or space e.g. stationary population e.g. stationary time series In spatial statistics, we can stay stationarity = homogeneity of an effect, or, that a process works the same regardless of where you observe the process. In spatial statistics, the latter can be a weak assumption, and we can ask, does X affect Y differently at different geographic locations, or in terms of parameters: If we estimate in OLS \(\beta\) = .5, are there locations in the data where \(\beta\) != .5?
There are several approaches to dealing with this problem. Among them are: * Create zones of homogeneity and stratify * Allow parameters to vary constantly
The first technique is stratification, or spatial regimes. This is where we stratify the data by either a known fixed quality, such as the state in which a county is located, or perhaps some other socio-demographic or socio-economic variable.
We then fit separate regression models for each region. These could be OLS or some other specification we have discussed.
First, we read in our data and make a spatial representation of it
spdat<-readShapePoly("~/Google Drive/dem7263/data/usdata_mort.shp")
spdat$population<-as.numeric(as.character(spdat$Population))
spdat$deaths<-as.numeric(as.character(spdat$Deaths))
spdat<-spdat[spdat$STATE%in%c("05","22","40","35", "48"),]
spdat$mig_group<-cut(spdat$p5yrinmig, breaks = quantile(spdat$p5yrinmig, p=c(0, .3, .6, 1)), include.lowest = T)
lvs<-levels(spdat$mig_group)
nbs<-knearneigh(coordinates(spdat), k=2)
nbsmg1<-knearneigh(coordinates(spdat[spdat$mig_group==lvs[1],]), k=2)
nbsmg2<-knearneigh(coordinates(spdat[spdat$mig_group==lvs[2],]), k=2)
nbsmg3<-knearneigh(coordinates(spdat[spdat$mig_group==lvs[3],]), k=2)
nbs<-knn2nb(nbs,sym=T)
nbs1<-knn2nb(nbsmg1, sym=T)
nbs2<-knn2nb(nbsmg2, sym=T)
nbs3<-knn2nb(nbsmg3, sym=T)
Here’s the basic OLS model:
fit.1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat)
summary(fit.1)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0066877 -0.0005928 0.0000377 0.0006448 0.0031275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.046e-02 7.256e-05 144.199 < 2e-16 ***
## scale(ppersonspo) 5.545e-04 7.066e-05 7.847 2.64e-14 ***
## scale(p65plus) 2.092e-03 6.286e-05 33.281 < 2e-16 ***
## scale(pblack_1) 9.032e-05 7.291e-05 1.239 0.216
## scale(phisp) -8.643e-04 7.401e-05 -11.677 < 2e-16 ***
## I(RUCC >= 7)TRUE -1.620e-04 1.195e-04 -1.356 0.176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001159 on 495 degrees of freedom
## Multiple R-squared: 0.8073, Adjusted R-squared: 0.8054
## F-statistic: 414.8 on 5 and 495 DF, p-value: < 2.2e-16
We could stratify by state:
#Examine basic regimes, using states as a nesting mechanism
spplot(spdat, "STATE")
fit.strat<-lm(I(deaths/population)~STATE/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat)
summary(fit.strat)
##
## Call:
## lm(formula = I(deaths/population) ~ STATE/(scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7)), data = spdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0044801 -0.0005148 0.0000290 0.0006616 0.0043913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.682e-03 6.809e-04 14.219 < 2e-16 ***
## STATE22 -9.979e-04 2.059e-03 -0.485 0.628201
## STATE35 -1.388e-04 1.560e-03 -0.089 0.929152
## STATE40 1.940e-03 8.023e-04 2.418 0.015976 *
## STATE48 6.572e-04 6.893e-04 0.953 0.340858
## STATE05:scale(ppersonspo) 6.526e-04 2.551e-04 2.558 0.010828 *
## STATE22:scale(ppersonspo) 3.219e-04 2.586e-04 1.245 0.213715
## STATE35:scale(ppersonspo) 5.528e-04 1.943e-04 2.845 0.004634 **
## STATE40:scale(ppersonspo) 8.200e-04 1.930e-04 4.249 2.59e-05 ***
## STATE48:scale(ppersonspo) 2.606e-04 1.112e-04 2.344 0.019477 *
## STATE05:scale(p65plus) 1.953e-03 1.991e-04 9.808 < 2e-16 ***
## STATE22:scale(p65plus) 2.355e-03 2.776e-04 8.485 2.83e-16 ***
## STATE35:scale(p65plus) 1.575e-03 1.694e-04 9.299 < 2e-16 ***
## STATE40:scale(p65plus) 2.530e-03 1.855e-04 13.639 < 2e-16 ***
## STATE48:scale(p65plus) 2.228e-03 8.561e-05 26.025 < 2e-16 ***
## STATE05:scale(pblack_1) 2.350e-04 1.605e-04 1.464 0.143734
## STATE22:scale(pblack_1) 9.331e-05 1.901e-04 0.491 0.623848
## STATE35:scale(pblack_1) 6.666e-04 2.100e-03 0.317 0.751093
## STATE40:scale(pblack_1) 1.879e-04 4.565e-04 0.412 0.680838
## STATE48:scale(pblack_1) 4.049e-04 1.633e-04 2.480 0.013505 *
## STATE05:scale(phisp) -1.783e-03 9.267e-04 -1.925 0.054887 .
## STATE22:scale(phisp) -3.092e-03 2.549e-03 -1.213 0.225843
## STATE35:scale(phisp) 1.852e-04 2.336e-04 0.793 0.428117
## STATE40:scale(phisp) 1.186e-04 5.806e-04 0.204 0.838172
## STATE48:scale(phisp) -4.991e-04 1.334e-04 -3.741 0.000206 ***
## STATE05:I(RUCC >= 7)TRUE 2.247e-04 3.017e-04 0.745 0.456661
## STATE22:I(RUCC >= 7)TRUE 3.250e-04 3.560e-04 0.913 0.361859
## STATE35:I(RUCC >= 7)TRUE -5.091e-04 4.390e-04 -1.160 0.246774
## STATE40:I(RUCC >= 7)TRUE -2.993e-04 2.947e-04 -1.016 0.310346
## STATE48:I(RUCC >= 7)TRUE -3.663e-04 1.629e-04 -2.249 0.024975 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001079 on 471 degrees of freedom
## Multiple R-squared: 0.8412, Adjusted R-squared: 0.8314
## F-statistic: 86.01 on 29 and 471 DF, p-value: < 2.2e-16
anova(fit.1, fit.strat)
## Analysis of Variance Table
##
## Model 1: I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +
## scale(phisp) + I(RUCC >= 7)
## Model 2: I(deaths/population) ~ STATE/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 495 0.00066533
## 2 471 0.00054848 24 0.00011685 4.181 4.489e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or, we could examine an alternative regime, based on a demographic characteristic, say migration:
spplot(spdat, "mig_group", col.regions=brewer.pal(n=3, "Greys"))
fit.strat2<-lm(I(deaths/population)~mig_group/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat)
summary(fit.strat2)
##
## Call:
## lm(formula = I(deaths/population) ~ mig_group/(scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7)), data = spdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0060718 -0.0005637 0.0000301 0.0006485 0.0033088
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.052e-02 1.417e-04 74.268
## mig_group(0.18,0.227] 2.396e-04 1.891e-04 1.267
## mig_group(0.227,0.504] -3.476e-04 1.900e-04 -1.830
## mig_group[0.0622,0.18]:scale(ppersonspo) 2.522e-04 1.135e-04 2.222
## mig_group(0.18,0.227]:scale(ppersonspo) 5.935e-04 1.557e-04 3.812
## mig_group(0.227,0.504]:scale(ppersonspo) 5.085e-04 1.198e-04 4.243
## mig_group[0.0622,0.18]:scale(p65plus) 1.738e-03 1.272e-04 13.661
## mig_group(0.18,0.227]:scale(p65plus) 2.175e-03 1.264e-04 17.204
## mig_group(0.227,0.504]:scale(p65plus) 2.118e-03 8.822e-05 24.007
## mig_group[0.0622,0.18]:scale(pblack_1) 1.319e-04 1.149e-04 1.147
## mig_group(0.18,0.227]:scale(pblack_1) -1.363e-04 1.404e-04 -0.971
## mig_group(0.227,0.504]:scale(pblack_1) 9.340e-05 1.453e-04 0.643
## mig_group[0.0622,0.18]:scale(phisp) -8.396e-04 1.184e-04 -7.092
## mig_group(0.18,0.227]:scale(phisp) -9.079e-04 1.304e-04 -6.962
## mig_group(0.227,0.504]:scale(phisp) -8.952e-04 1.386e-04 -6.461
## mig_group[0.0622,0.18]:I(RUCC >= 7)TRUE 3.227e-04 2.125e-04 1.519
## mig_group(0.18,0.227]:I(RUCC >= 7)TRUE -2.944e-04 2.044e-04 -1.441
## mig_group(0.227,0.504]:I(RUCC >= 7)TRUE -3.302e-04 1.901e-04 -1.737
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## mig_group(0.18,0.227] 0.205722
## mig_group(0.227,0.504] 0.067872 .
## mig_group[0.0622,0.18]:scale(ppersonspo) 0.026740 *
## mig_group(0.18,0.227]:scale(ppersonspo) 0.000156 ***
## mig_group(0.227,0.504]:scale(ppersonspo) 2.65e-05 ***
## mig_group[0.0622,0.18]:scale(p65plus) < 2e-16 ***
## mig_group(0.18,0.227]:scale(p65plus) < 2e-16 ***
## mig_group(0.227,0.504]:scale(p65plus) < 2e-16 ***
## mig_group[0.0622,0.18]:scale(pblack_1) 0.251779
## mig_group(0.18,0.227]:scale(pblack_1) 0.332149
## mig_group(0.227,0.504]:scale(pblack_1) 0.520587
## mig_group[0.0622,0.18]:scale(phisp) 4.74e-12 ***
## mig_group(0.18,0.227]:scale(phisp) 1.10e-11 ***
## mig_group(0.227,0.504]:scale(phisp) 2.54e-10 ***
## mig_group[0.0622,0.18]:I(RUCC >= 7)TRUE 0.129412
## mig_group(0.18,0.227]:I(RUCC >= 7)TRUE 0.150336
## mig_group(0.227,0.504]:I(RUCC >= 7)TRUE 0.082976 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001116 on 483 degrees of freedom
## Multiple R-squared: 0.8257, Adjusted R-squared: 0.8196
## F-statistic: 134.6 on 17 and 483 DF, p-value: < 2.2e-16
anova(fit.1, fit.strat2)
## Analysis of Variance Table
##
## Model 1: I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +
## scale(phisp) + I(RUCC >= 7)
## Model 2: I(deaths/population) ~ mig_group/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 495 0.00066533
## 2 483 0.00060171 12 6.362e-05 4.2557 2.056e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.strat2, fit.strat)
## Analysis of Variance Table
##
## Model 1: I(deaths/population) ~ mig_group/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Model 2: I(deaths/population) ~ STATE/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 483 0.00060171
## 2 471 0.00054848 12 5.3229e-05 3.8092 1.477e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(spdat, main="Neighbors for low migration counties")
plot(nbs1,coords=coordinates(spdat[spdat$mig_group==lvs[1],]), col=2, add=T)
plot(spdat, main="Neighbors for medium migration counties")
plot(nbs2,coords=coordinates(spdat[spdat$mig_group==lvs[2],]), col=2, add=T)
plot(spdat, main="Neighbors for high migration counties")
plot(nbs3,coords=coordinates(spdat[spdat$mig_group==lvs[3],]), col=2, add=T)
wts<-nb2listw(nbs)
wts1<-nb2listw(nbs1)
wts2<-nb2listw(nbs2)
wts3<-nb2listw(nbs3)
So we see non-stationarity in several of the covariates, with respect to state. The effects of poverty, age structure and race/ethnicty all vary significantly by state. Similar results are also seen for the migration rate grouping variable, althought the final model comparison of the two stratified models shows more variation between states than between migration groups.
If we wanted, we could stratify our analysis by state. Below, I simply stratify the analysis by state and use the OLS model.
fit.ar<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="05")
summary(fit.ar)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "05")
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.356e-03 -4.771e-04 3.499e-05 5.518e-04 2.209e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0096818 0.0005626 17.210 < 2e-16 ***
## scale(ppersonspo) 0.0006526 0.0002108 3.097 0.00283 **
## scale(p65plus) 0.0019531 0.0001645 11.871 < 2e-16 ***
## scale(pblack_1) 0.0002350 0.0001326 1.773 0.08072 .
## scale(phisp) -0.0017834 0.0007656 -2.329 0.02277 *
## I(RUCC >= 7)TRUE 0.0002247 0.0002492 0.902 0.37037
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0008916 on 69 degrees of freedom
## Multiple R-squared: 0.8106, Adjusted R-squared: 0.7969
## F-statistic: 59.08 on 5 and 69 DF, p-value: < 2.2e-16
fit.la<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="22")
summary(fit.la)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "22")
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0035436 -0.0003792 -0.0000223 0.0005726 0.0018121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.684e-03 1.663e-03 5.222 2.50e-06 ***
## scale(ppersonspo) 3.219e-04 2.212e-04 1.455 0.151
## scale(p65plus) 2.355e-03 2.375e-04 9.916 4.25e-14 ***
## scale(pblack_1) 9.331e-05 1.627e-04 0.573 0.569
## scale(phisp) -3.092e-03 2.181e-03 -1.417 0.162
## I(RUCC >= 7)TRUE 3.250e-04 3.047e-04 1.067 0.291
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0009234 on 58 degrees of freedom
## Multiple R-squared: 0.7974, Adjusted R-squared: 0.7799
## F-statistic: 45.65 on 5 and 58 DF, p-value: < 2.2e-16
fit.ok<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="40")
summary(fit.ok)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "40")
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.029e-03 -5.192e-04 -5.854e-05 6.775e-04 2.030e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0116220 0.0003860 30.106 < 2e-16 ***
## scale(ppersonspo) 0.0008200 0.0001755 4.671 1.38e-05 ***
## scale(p65plus) 0.0025302 0.0001688 14.993 < 2e-16 ***
## scale(pblack_1) 0.0001879 0.0004152 0.452 0.652
## scale(phisp) 0.0001186 0.0005281 0.225 0.823
## I(RUCC >= 7)TRUE -0.0002993 0.0002681 -1.116 0.268
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0009816 on 71 degrees of freedom
## Multiple R-squared: 0.8346, Adjusted R-squared: 0.8229
## F-statistic: 71.65 on 5 and 71 DF, p-value: < 2.2e-16
fit.nm<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="35")
summary(fit.nm)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "35")
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0044801 -0.0007503 -0.0000290 0.0008573 0.0043913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0095430 0.0021307 4.479 0.000123 ***
## scale(ppersonspo) 0.0005528 0.0002949 1.874 0.071753 .
## scale(p65plus) 0.0015753 0.0002571 6.126 1.52e-06 ***
## scale(pblack_1) 0.0006666 0.0031883 0.209 0.835951
## scale(phisp) 0.0001852 0.0003545 0.522 0.605598
## I(RUCC >= 7)TRUE -0.0005091 0.0006664 -0.764 0.451516
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001638 on 27 degrees of freedom
## Multiple R-squared: 0.6221, Adjusted R-squared: 0.5521
## F-statistic: 8.89 on 5 and 27 DF, p-value: 4.382e-05
fit.tx<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="48")
summary(fit.tx)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "48")
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0042947 -0.0005506 0.0000912 0.0006799 0.0029490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.034e-02 1.101e-04 93.921 < 2e-16 ***
## scale(ppersonspo) 2.606e-04 1.144e-04 2.278 0.023568 *
## scale(p65plus) 2.228e-03 8.809e-05 25.292 < 2e-16 ***
## scale(pblack_1) 4.049e-04 1.680e-04 2.410 0.016702 *
## scale(phisp) -4.991e-04 1.373e-04 -3.636 0.000337 ***
## I(RUCC >= 7)TRUE -3.663e-04 1.676e-04 -2.186 0.029786 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00111 on 246 degrees of freedom
## Multiple R-squared: 0.8453, Adjusted R-squared: 0.8422
## F-statistic: 268.9 on 5 and 246 DF, p-value: < 2.2e-16
Or, we could run a SAR model for each subset using the weights we created above:
#first the "global" model
efit<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, listw=wts)
Then, a regime specification as used by Brazil 2015
efit0<-lagsarlm(scale(I(deaths/population))~mig_group/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat, listw=wts)
summary(efit0)
##
## Call:
## lagsarlm(formula = scale(I(deaths/population)) ~ mig_group/(scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7)), data = spdat, listw = wts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2535080 -0.2175824 0.0092613 0.2420036 1.2534095
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value
## (Intercept) 0.070702 0.052020 1.3591
## mig_group(0.18,0.227] 0.062960 0.069376 0.9075
## mig_group(0.227,0.504] -0.144586 0.069608 -2.0771
## mig_group[0.0622,0.18]:scale(ppersonspo) 0.089667 0.041622 2.1543
## mig_group(0.18,0.227]:scale(ppersonspo) 0.212126 0.057090 3.7156
## mig_group(0.227,0.504]:scale(ppersonspo) 0.161934 0.044400 3.6472
## mig_group[0.0622,0.18]:scale(p65plus) 0.613779 0.047683 12.8721
## mig_group(0.18,0.227]:scale(p65plus) 0.791811 0.047329 16.7298
## mig_group(0.227,0.504]:scale(p65plus) 0.784719 0.033044 23.7478
## mig_group[0.0622,0.18]:scale(pblack_1) 0.052671 0.042112 1.2507
## mig_group(0.18,0.227]:scale(pblack_1) -0.042097 0.051436 -0.8184
## mig_group(0.227,0.504]:scale(pblack_1) 0.038860 0.053224 0.7301
## mig_group[0.0622,0.18]:scale(phisp) -0.284882 0.044298 -6.4311
## mig_group(0.18,0.227]:scale(phisp) -0.301421 0.048620 -6.1995
## mig_group(0.227,0.504]:scale(phisp) -0.282886 0.052171 -5.4223
## mig_group[0.0622,0.18]:I(RUCC >= 7)TRUE 0.086956 0.078148 1.1127
## mig_group(0.18,0.227]:I(RUCC >= 7)TRUE -0.121716 0.075011 -1.6226
## mig_group(0.227,0.504]:I(RUCC >= 7)TRUE -0.154449 0.069837 -2.2116
## Pr(>|z|)
## (Intercept) 0.1741000
## mig_group(0.18,0.227] 0.3641342
## mig_group(0.227,0.504] 0.0377878
## mig_group[0.0622,0.18]:scale(ppersonspo) 0.0312148
## mig_group(0.18,0.227]:scale(ppersonspo) 0.0002027
## mig_group(0.227,0.504]:scale(ppersonspo) 0.0002651
## mig_group[0.0622,0.18]:scale(p65plus) < 2.2e-16
## mig_group(0.18,0.227]:scale(p65plus) < 2.2e-16
## mig_group(0.227,0.504]:scale(p65plus) < 2.2e-16
## mig_group[0.0622,0.18]:scale(pblack_1) 0.2110316
## mig_group(0.18,0.227]:scale(pblack_1) 0.4131111
## mig_group(0.227,0.504]:scale(pblack_1) 0.4653192
## mig_group[0.0622,0.18]:scale(phisp) 1.267e-10
## mig_group(0.18,0.227]:scale(phisp) 5.665e-10
## mig_group(0.227,0.504]:scale(phisp) 5.883e-08
## mig_group[0.0622,0.18]:I(RUCC >= 7)TRUE 0.2658359
## mig_group(0.18,0.227]:I(RUCC >= 7)TRUE 0.1046657
## mig_group(0.227,0.504]:I(RUCC >= 7)TRUE 0.0269975
##
## Rho: 0.12088, LR test value: 16.799, p-value: 4.156e-05
## Asymptotic standard error: 0.028979
## z-value: 4.1712, p-value: 3.0305e-05
## Wald statistic: 17.399, p-value: 3.0305e-05
##
## Log likelihood: -264.3111 for lag model
## ML residual variance (sigma squared): 0.16721, (sigma: 0.40891)
## Number of observations: 501
## Number of parameters estimated: 20
## AIC: 568.62, (AIC for lm: 583.42)
## LM test for residual autocorrelation
## test value: 3.5091, p-value: 0.061031
anova(efit, efit0)
## Model df AIC logLik Test L.Ratio p-value
## efit 1 8 595.77 -289.89 1
## efit0 2 20 568.62 -264.31 2 51.15 8.7649e-07
Or, I could fit the fully stratified models, one regime at a time by subsetting and using the group-specific spatial weights:
efit1<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$mig_group==lvs[1],], listw=wts1)
efit2<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$mig_group==lvs[2],], listw=wts2)
efit3<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$mig_group==lvs[3],], listw=wts3)
summary(efit1)
##
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7), data = spdat[spdat$mig_group == lvs[1], ], listw = wts1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4544851 -0.2654621 -0.0015307 0.2899709 1.0866101
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.044804 0.054807 -0.8175 0.41365
## scale(ppersonspo) 0.122484 0.054066 2.2655 0.02348
## scale(p65plus) 0.573820 0.048549 11.8195 < 2.2e-16
## scale(pblack_1) 0.054060 0.064939 0.8325 0.40515
## scale(phisp) -0.401658 0.072147 -5.5672 2.588e-08
## I(RUCC >= 7)TRUE 0.102547 0.091158 1.1249 0.26062
##
## Rho: 0.15548, LR test value: 6.4493, p-value: 0.0111
## Asymptotic standard error: 0.059236
## z-value: 2.6247, p-value: 0.0086724
## Wald statistic: 6.8891, p-value: 0.0086724
##
## Log likelihood: -103.2209 for lag model
## ML residual variance (sigma squared): 0.22757, (sigma: 0.47704)
## Number of observations: 151
## Number of parameters estimated: 8
## AIC: 222.44, (AIC for lm: 226.89)
## LM test for residual autocorrelation
## test value: 2.8762, p-value: 0.089897
summary(efit2)
##
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7), data = spdat[spdat$mig_group == lvs[2], ], listw = wts2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.244411 -0.258925 -0.022572 0.238146 1.121171
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.061767 0.047187 1.3090 0.1905
## scale(ppersonspo) 0.170906 0.038992 4.3831 1.170e-05
## scale(p65plus) 0.778337 0.043870 17.7418 < 2.2e-16
## scale(pblack_1) -0.031947 0.041736 -0.7655 0.4440
## scale(phisp) -0.315580 0.044718 -7.0571 1.701e-12
## I(RUCC >= 7)TRUE -0.133604 0.074314 -1.7978 0.0722
##
## Rho: 0.10064, LR test value: 3.8633, p-value: 0.049354
## Asymptotic standard error: 0.050143
## z-value: 2.007, p-value: 0.044746
## Wald statistic: 4.0282, p-value: 0.044746
##
## Log likelihood: -77.42665 for lag model
## ML residual variance (sigma squared): 0.16374, (sigma: 0.40464)
## Number of observations: 150
## Number of parameters estimated: 8
## AIC: 170.85, (AIC for lm: 172.72)
## LM test for residual autocorrelation
## test value: 0.16968, p-value: 0.6804
summary(efit3)
##
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7), data = spdat[spdat$mig_group == lvs[3], ], listw = wts3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.384600 -0.188351 0.037029 0.229312 1.174578
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.048072 0.038961 1.2338 0.21726
## scale(ppersonspo) 0.143595 0.035900 3.9998 6.339e-05
## scale(p65plus) 0.856224 0.036718 23.3190 < 2.2e-16
## scale(pblack_1) 0.023191 0.034032 0.6815 0.49558
## scale(phisp) -0.207571 0.034207 -6.0680 1.295e-09
## I(RUCC >= 7)TRUE -0.119329 0.067403 -1.7704 0.07666
##
## Rho: 0.015616, LR test value: 0.12585, p-value: 0.72278
## Asymptotic standard error: 0.043516
## z-value: 0.35886, p-value: 0.7197
## Wald statistic: 0.12878, p-value: 0.7197
##
## Log likelihood: -97.39382 for lag model
## ML residual variance (sigma squared): 0.15505, (sigma: 0.39376)
## Number of observations: 200
## Number of parameters estimated: 8
## AIC: 210.79, (AIC for lm: 208.91)
## LM test for residual autocorrelation
## test value: 0.80749, p-value: 0.36886
First proposed by Brunsdon et al 1996
Their model was specified, where now each \(\beta_k\) is estimated at the location \(u_i,v_j\) where i and j are the coordinates or geographic location of the observation i. \(\beta_k(ui,vj)\) is the local realization of \(\beta\) at location i,j. This constructs a trend surface of parameter values for each independent variable and the model intercept.
Note that the basic OLS regression model above is just a special case of the GWR model where the coefficients are constant over space. The parameters in the GWR are estimated by weighted least squares. The weighting matrix is a diagonal matrix, with each diagonal element being a function of the location of the observation. if Wi is the weighting matrix at location i, then the parameter estimate at that location would be
\(\hat \beta_i = (X'WX)^{-1} X'WY\)
The role of the weight matrix is to give more value to observations that are close to i. as it is assumed that observations that are close will influence each other more than those that are far away. This is NOT the same W matrix as was used in Moran’s I!
W is generally constructed based on the distances between locations, and if \(d_ij\) < d , then \(w_{ij}\)=1, and 0 otherwise. Brunsdson, Charlton, and Fotheringham used a kernel-density approach to measure W
\(w_{ij} = exp(- \frac{1}{2}(d_{ij}/b)^2)\)
where b is the bandwidth parameter to be estimated. under this weighting scheme, the weight for points that exactly co-occur will be 1 and for those that are the distance \(d_{ij}\) apart, the weight will decrease as a Gaussian curve as \(d_{ij}\) increases.
There are several ways to do this, but most programs use a cross-validation method to choose the optimal kernel bandwidth. The cross validation method takes the form: \(CV = \Sigma_i \left[y_i - \hat y_{\neq i} (\beta) \right ]^2\)
Where \(\hat y_{\neq i} (\beta)\) is the fitted value of \(y_i\) with the observations for point i omitted from the calibration process. This in effect minimizes the sum of squared errors at all locations i, and arrives at an optimal bandwidth.
When the model is fit, we can also calculate local z/t tests to test whether the effect of a specifix X variable is significant at a particular location. Likewise, we can calculate a local \(R^2\) measure at each location.
Leung et al 2000 proposed a series of F tests for model improvement over OLS, and they also derive an F test for variation in the beta coefficients (so called F3 test.
The advantage of GWR is that we can visualize the difference in the covariate effects as a continuous surface. There are a few steps to fitting the GWR model. First we need to select a bandwidth for the weighting algorithm
gwr.b.f<-gwr.sel(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat)
## Bandwidth: 837305.2 CV score: 0.0006802827
## Bandwidth: 1353436 CV score: 0.0006843992
## Bandwidth: 518319 CV score: 0.000671332
## Bandwidth: 321174.7 CV score: 0.0006543891
## Bandwidth: 199332.9 CV score: 0.0006472186
## Bandwidth: 124030.4 CV score: 0.0006733644
## Bandwidth: 245872.3 CV score: 0.0006478826
## Bandwidth: 210553.7 CV score: 0.0006470031
## Bandwidth: 215074.7 CV score: 0.0006469975
## Bandwidth: 213356.6 CV score: 0.0006469947
## Bandwidth: 213425.2 CV score: 0.0006469947
## Bandwidth: 213408.1 CV score: 0.0006469947
## Bandwidth: 213407.9 CV score: 0.0006469947
## Bandwidth: 213407.9 CV score: 0.0006469947
## Bandwidth: 213407.9 CV score: 0.0006469947
## Bandwidth: 213388.3 CV score: 0.0006469947
## Bandwidth: 213400.4 CV score: 0.0006469947
## Bandwidth: 213405 CV score: 0.0006469947
## Bandwidth: 213406.8 CV score: 0.0006469947
## Bandwidth: 213407.4 CV score: 0.0006469947
## Bandwidth: 213407.7 CV score: 0.0006469947
## Bandwidth: 213407.8 CV score: 0.0006469947
## Bandwidth: 213407.8 CV score: 0.0006469947
## Bandwidth: 213407.9 CV score: 0.0006469947
## Bandwidth: 213407.9 CV score: 0.0006469947
## Bandwidth: 213407.9 CV score: 0.0006469947
gwr.b.f
## [1] 213407.9
#this is the distance (in meters, because our data are projected in a system measured in meters), which the weighting fuction will search, and include all observations within this radius.
gwr.f<-gwr(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, bandwidth=gwr.b.f, se.fit=T, hatmatrix=T)
gwr.f
## Call:
## gwr(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## bandwidth = gwr.b.f, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss
## Fixed bandwidth: 213407.9
## Summary of GWR coefficient estimates at data points:
## Min. 1st Qu. Median 3rd Qu. Max.
## X.Intercept. 9.360e-03 1.027e-02 1.045e-02 1.080e-02 1.125e-02
## scale.ppersonspo. -2.531e-04 4.700e-04 6.296e-04 7.949e-04 9.485e-04
## scale.p65plus. 1.410e-03 2.060e-03 2.136e-03 2.220e-03 2.367e-03
## scale.pblack_1. -1.134e-04 4.689e-05 1.157e-04 2.391e-04 2.934e-03
## scale.phisp. -2.182e-03 -1.048e-03 -9.715e-04 -7.932e-04 3.420e-04
## I.RUCC....7.TRUE -1.096e-03 -3.638e-04 -2.153e-04 5.367e-05 4.777e-04
## Global
## X.Intercept. 0.0105
## scale.ppersonspo. 0.0006
## scale.p65plus. 0.0021
## scale.pblack_1. 0.0001
## scale.phisp. -0.0009
## I.RUCC....7.TRUE -0.0002
## Number of data points: 501
## Effective number of parameters (residual: 2traceS - traceS'S): 46.7347
## Effective degrees of freedom (residual: 2traceS - traceS'S): 454.2653
## Sigma (residual: 2traceS - traceS'S): 0.001059183
## Effective number of parameters (model: traceS): 33.82929
## Effective degrees of freedom (model: traceS): 467.1707
## Sigma (model: traceS): 0.001044451
## Sigma (ML): 0.001008572
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -5416.218
## AIC (GWR p. 96, eq. 4.22): -5457.412
## Residual sum of squares: 0.0005096261
## Quasi-global R2: 0.8524083
BFC02.gwr.test(gwr.f)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwr.f
## F = 1.3055, df1 = 495.00, df2 = 454.27, p-value = 0.001934
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.0006653254 0.0005096261
BFC99.gwr.test(gwr.f)
##
## Brunsdon, Fotheringham & Charlton (1999) ANOVA
##
## data: gwr.f
## F = 3.4071, df1 = 256.14, df2 = 470.49, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement SS GWR residuals
## 0.0001556992 0.0005096261
LMZ.F1GWR.test(gwr.f)
##
## Leung et al. (2000) F(1) test
##
## data: gwr.f
## F = 0.83467, df1 = 470.49, df2 = 495.00, p-value = 0.02385
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.0006653254 0.0005096261
LMZ.F2GWR.test(gwr.f)
##
## Leung et al. (2000) F(2) test
##
## data: gwr.f
## F = 2.8438, df1 = 66.187, df2 = 495.000, p-value = 5.23e-11
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR improvement
## 0.0006653254 0.0001556992
LMZ.F3GWR.test(gwr.f)
##
## Leung et al. (2000) F(3) test
##
## F statistic Numerator d.f. Denominator d.f. Pr(>)
## (Intercept) 2.9132 54.3122 470.49 5.168e-10
## scale(ppersonspo) 3.7056 143.4838 470.49 < 2.2e-16
## scale(p65plus) 2.1047 106.2542 470.49 5.926e-08
## scale(pblack_1) 2.8647 42.1765 470.49 3.026e-08
## scale(phisp) 2.4810 46.2870 470.49 9.023e-07
## I(RUCC >= 7)TRUE 2.3018 205.7501 470.49 8.846e-14
##
## (Intercept) ***
## scale(ppersonspo) ***
## scale(p65plus) ***
## scale(pblack_1) ***
## scale(phisp) ***
## I(RUCC >= 7)TRUE ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gwr.dat<-gwr.f$SDF
cols<-brewer.pal(n=6, name="RdBu")
names(gwr.dat)
## [1] "sum.w" "X.Intercept."
## [3] "scale.ppersonspo." "scale.p65plus."
## [5] "scale.pblack_1." "scale.phisp."
## [7] "I.RUCC....7.TRUE" "X.Intercept._se"
## [9] "scale.ppersonspo._se" "scale.p65plus._se"
## [11] "scale.pblack_1._se" "scale.phisp._se"
## [13] "I.RUCC....7.TRUE_se" "gwr.e"
## [15] "pred" "pred.se"
## [17] "localR2" "X.Intercept._se_EDF"
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"
## [21] "scale.pblack_1._se_EDF" "scale.phisp._se_EDF"
## [23] "I.RUCC....7.TRUE_se_EDF" "pred.se_EDF"
#pdf("gwrfixed.pdf")
spplot(gwr.dat, "scale.ppersonspo.", at=quantile(gwr.f$SDF$scale.ppersonspo.), col.regions=cols, main="% Poverty effect")
spplot(gwr.dat, "scale.pblack_1.", at=quantile(gwr.f$SDF$scale.pblack_1.), col.regions=cols, main="% Black effect")
spplot(gwr.dat, "scale.phisp.", at=quantile(gwr.f$SDF$scale.phisp.), col.regions=cols, main="% Hispanic effect")
#another "lighter" plot method
#brks<-c(-3.038e-04, 3.441e-04, 5.882e-04, 7.912e-04 , 1.355e-03); cols<-brewer.pal(n=5,"Reds")
#plot(gwr.a$SDF, col=cols[findInterval(gwr.a$SDF$scale.ppersonspo.,brks, all.inside=TRUE)])
#Local R^2
brks<-seq(0, 1, length.out = 9); cols=brewer.pal(n=9, "Reds")
spplot(gwr.f$SDF, "localR2", at=brks, col.regions=cols, main="local R^2")
#plots of the correlations of the coefficients
pairs(gwr.f$SDF[,3:7])
cor(as(gwr.f$SDF[,3:7], "data.frame"))
## scale.ppersonspo. scale.p65plus. scale.pblack_1.
## scale.ppersonspo. 1.000000000 -0.001474733 -0.3823172
## scale.p65plus. -0.001474733 1.000000000 -0.6196133
## scale.pblack_1. -0.382317219 -0.619613301 1.0000000
## scale.phisp. -0.650938360 -0.193689549 0.6597657
## I.RUCC....7.TRUE -0.286463792 0.134305334 -0.2545913
## scale.phisp. I.RUCC....7.TRUE
## scale.ppersonspo. -0.6509384 -0.2864638
## scale.p65plus. -0.1936895 0.1343053
## scale.pblack_1. 0.6597657 -0.2545913
## scale.phisp. 1.0000000 -0.2432299
## I.RUCC....7.TRUE -0.2432299 1.0000000
This plot shows a problem in GWR, that the coefficients can often be highly correlated See Wheeler and Tiefelsdorf (2005), one of the major critiques of GWR. In general, the critiques of GWR have pushed most users to use it for “exploratory” purposes only. So, one way we may explore our model is to examine where the regression effects are “significant” versus not. This can be done by plotting the local t-statistics, or z-statistics if you prefer.
cols<-brewer.pal(8,"RdBu")
dfree<-gwr.f$results$edf
names(gwr.dat)
## [1] "sum.w" "X.Intercept."
## [3] "scale.ppersonspo." "scale.p65plus."
## [5] "scale.pblack_1." "scale.phisp."
## [7] "I.RUCC....7.TRUE" "X.Intercept._se"
## [9] "scale.ppersonspo._se" "scale.p65plus._se"
## [11] "scale.pblack_1._se" "scale.phisp._se"
## [13] "I.RUCC....7.TRUE_se" "gwr.e"
## [15] "pred" "pred.se"
## [17] "localR2" "X.Intercept._se_EDF"
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"
## [21] "scale.pblack_1._se_EDF" "scale.phisp._se_EDF"
## [23] "I.RUCC....7.TRUE_se_EDF" "pred.se_EDF"
gwr.dat$pov.t<-gwr.dat$scale.ppersonspo./gwr.dat$scale.ppersonspo._se
gwr.dat$pov.t.p<-2*pt(-abs(gwr.dat$pov.t) , dfree)
tbrks<- c(min(gwr.dat$pov.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$pov.t))
spplot(gwr.dat, "pov.t", col.regions=cols, at=tbrks , main="t-stat %Poverty")
gwr.dat$black.t<-gwr.dat$scale.pblack_1./gwr.dat$scale.pblack_1._se
gwr.dat$black.t.p<-2*pt(-abs(gwr.dat$black.t) , dfree)
tbrks<- c(min(gwr.dat$black.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$black.t))
spplot(gwr.dat, "black.t", col.regions=cols,at=tbrks , main="t-stat %Black")
gwr.dat$hisp.t<-gwr.dat$scale.phisp./gwr.dat$scale.phisp._se
gwr.dat$hisp.t.p<-2*pt(-abs(gwr.dat$hisp.t) , dfree)
tbrks<- c(min(gwr.dat$hisp.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$hisp.t))
spplot(gwr.dat, "hisp.t", col.regions=cols, at=tbrks, main="t-stat %Hispanic")
So, those are the results from the fixed bandwith analysis. But there are obviously places in our data where counties are more densely occurring, so an adaptive kernel approach is also warranted. If all our data were equally spaced, then this may not be necessary.
Now we try everthing with the adaptive bandwidth:
gwr.b.a<-gwr.sel(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, adapt=T)
## Adaptive q: 0.381966 CV score: 0.0006708917
## Adaptive q: 0.618034 CV score: 0.0006759116
## Adaptive q: 0.236068 CV score: 0.0006662507
## Adaptive q: 0.145898 CV score: 0.0006610212
## Adaptive q: 0.09016994 CV score: 0.0006542459
## Adaptive q: 0.05572809 CV score: 0.0006442366
## Adaptive q: 0.03444185 CV score: 0.0006379112
## Adaptive q: 0.02128624 CV score: 0.0006543154
## Adaptive q: 0.04257247 CV score: 0.000642091
## Adaptive q: 0.02941686 CV score: 0.0006457471
## Adaptive q: 0.03754747 CV score: 0.0006396459
## Adaptive q: 0.03252248 CV score: 0.0006416178
## Adaptive q: 0.03543098 CV score: 0.0006381852
## Adaptive q: 0.03389516 CV score: 0.0006379173
## Adaptive q: 0.03419852 CV score: 0.0006378657
## Adaptive q: 0.03415783 CV score: 0.0006378591
## Adaptive q: 0.0340575 CV score: 0.0006378438
## Adaptive q: 0.03399549 CV score: 0.0006378352
## Adaptive q: 0.0339548 CV score: 0.00063783
## Adaptive q: 0.0339548 CV score: 0.00063783
gwr.b.a
## [1] 0.0339548
This is the proportion of all cases which the weighting fuction will search, and include this fraction of observations in a model for each county. In this case it will include 17.0113545 counties in the model for a particular county.
gwr.a<-gwr(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, adapt=gwr.b.a, se.fit=T, hatmatrix=T )
gwr.a
## Call:
## gwr(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## adapt = gwr.b.a, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss
## Adaptive quantile: 0.0339548 (about 17 of 501 data points)
## Summary of GWR coefficient estimates at data points:
## Min. 1st Qu. Median 3rd Qu. Max.
## X.Intercept. 7.577e-03 9.934e-03 1.049e-02 1.089e-02 1.191e-02
## scale.ppersonspo. -3.038e-04 3.441e-04 5.882e-04 7.912e-04 1.355e-03
## scale.p65plus. 1.403e-03 1.998e-03 2.190e-03 2.333e-03 2.790e-03
## scale.pblack_1. -3.075e-04 6.612e-05 2.370e-04 4.524e-04 2.271e-03
## scale.phisp. -4.785e-03 -1.187e-03 -9.467e-04 -5.321e-04 7.882e-04
## I.RUCC....7.TRUE -1.135e-03 -4.824e-04 -2.081e-04 1.074e-04 5.971e-04
## Global
## X.Intercept. 0.0105
## scale.ppersonspo. 0.0006
## scale.p65plus. 0.0021
## scale.pblack_1. 0.0001
## scale.phisp. -0.0009
## I.RUCC....7.TRUE -0.0002
## Number of data points: 501
## Effective number of parameters (residual: 2traceS - traceS'S): 101.2227
## Effective degrees of freedom (residual: 2traceS - traceS'S): 399.7773
## Sigma (residual: 2traceS - traceS'S): 0.00104379
## Effective number of parameters (model: traceS): 72.72109
## Effective degrees of freedom (model: traceS): 428.2789
## Sigma (model: traceS): 0.001008461
## Sigma (ML): 0.0009324024
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -5396.638
## AIC (GWR p. 96, eq. 4.22): -5497.204
## Residual sum of squares: 0.0004355565
## Quasi-global R2: 0.8738594
BFC02.gwr.test(gwr.a)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwr.a
## F = 1.5275, df1 = 495.00, df2 = 399.78, p-value = 5.41e-06
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.0006653254 0.0004355565
BFC99.gwr.test(gwr.a)
##
## Brunsdon, Fotheringham & Charlton (1999) ANOVA
##
## data: gwr.a
## F = 2.2147, df1 = 394.70, df2 = 437.65, p-value = 4.244e-16
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement SS GWR residuals
## 0.0002297688 0.0004355565
LMZ.F1GWR.test(gwr.a)
##
## Leung et al. (2000) F(1) test
##
## data: gwr.a
## F = 0.81058, df1 = 437.65, df2 = 495.00, p-value = 0.01213
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.0006653254 0.0004355565
LMZ.F2GWR.test(gwr.a)
##
## Leung et al. (2000) F(2) test
##
## data: gwr.a
## F = 1.7952, df1 = 149.55, df2 = 495.00, p-value = 1.501e-06
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR improvement
## 0.0006653254 0.0002297688
LMZ.F3GWR.test(gwr.a)
##
## Leung et al. (2000) F(3) test
##
## F statistic Numerator d.f. Denominator d.f. Pr(>)
## (Intercept) 1.4040 16.7158 437.65 0.131199
## scale(ppersonspo) 1.9580 141.0856 437.65 1.094e-07
## scale(p65plus) 1.7019 124.0449 437.65 4.939e-05
## scale(pblack_1) 1.5254 60.3445 437.65 0.009694
## scale(phisp) 1.0542 18.9957 437.65 0.396847
## I(RUCC >= 7)TRUE 1.4327 177.0345 437.65 0.001656
##
## (Intercept)
## scale(ppersonspo) ***
## scale(p65plus) ***
## scale(pblack_1) **
## scale(phisp)
## I(RUCC >= 7)TRUE **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gwr.data<-gwr.a$SDF
spplot(gwr.data, "scale.ppersonspo.", at=quantile(gwr.a$SDF$scale.ppersonspo.), col.regions=cols, main="% Poverty effect")
spplot(gwr.data, "scale.pblack_1.", at=quantile(gwr.a$SDF$scale.pblack_1.), col.regions=cols, main="% Black effect")
spplot(gwr.data, "scale.phisp.", at=quantile(gwr.a$SDF$scale.phisp.), col.regions=cols, main="% Hispanic effect")
cols<-brewer.pal(8,"RdBu")
dfree<-gwr.a$results$edf
names(gwr.data)
## [1] "sum.w" "X.Intercept."
## [3] "scale.ppersonspo." "scale.p65plus."
## [5] "scale.pblack_1." "scale.phisp."
## [7] "I.RUCC....7.TRUE" "X.Intercept._se"
## [9] "scale.ppersonspo._se" "scale.p65plus._se"
## [11] "scale.pblack_1._se" "scale.phisp._se"
## [13] "I.RUCC....7.TRUE_se" "gwr.e"
## [15] "pred" "pred.se"
## [17] "localR2" "X.Intercept._se_EDF"
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"
## [21] "scale.pblack_1._se_EDF" "scale.phisp._se_EDF"
## [23] "I.RUCC....7.TRUE_se_EDF" "pred.se_EDF"
gwr.data$pov.t<-gwr.data$scale.ppersonspo./gwr.data$scale.ppersonspo._se
gwr.data$pov.t.p<-2*pt(-abs(gwr.data$pov.t) , dfree)
tbrks<- c(min(gwr.data$pov.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$pov.t))
spplot(gwr.data, "pov.t", col.regions=cols, at=tbrks , main="t-stat %Poverty")
gwr.data$black.t<-gwr.data$scale.pblack_1./gwr.data$scale.pblack_1._se
gwr.data$black.t.p<-2*pt(-abs(gwr.data$black.t) , dfree)
tbrks<- c(min(gwr.data$black.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$black.t))
spplot(gwr.data, "black.t", col.regions=cols,at=tbrks , main="t-stat %Black")
gwr.data$hisp.t<-gwr.data$scale.phisp./gwr.data$scale.phisp._se
gwr.data$hisp.t.p<-2*pt(-abs(gwr.data$hisp.t) , dfree)
tbrks<- c(min(gwr.data$hisp.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$hisp.t))
spplot(gwr.data, "hisp.t", col.regions=cols, at=tbrks, main="t-stat %Hispanic")
#Local R^2
brks<-seq(0, 1, length.out = 9); cols=brewer.pal(n=9, "Reds")
spplot(gwr.a$SDF, "localR2", at=brks, col.regions=cols, main="local R^2")
The first step in the construction of spatial regimes is to do a hierarchical clustering method. We use a distance matrix between all observations for this In this case, we’re using the GWR coefficients as our data.
dat.dist<-dist(gwr.data@data[, 3:7])
#The hclust() function does basic hierarchical clustering
#according to several different methods, we'll use Ward's method
clust.dat<-hclust(dat.dist, method="ward.D")
#And we'll plot the dendrogram, or tree plot
plot(clust.dat)
#I only want a few groups, so I'll cut the tree so I get 5 clusters
gwr.data$clus<-cutree(clust.dat, k=4)
#i'll use table() to get the frequencies of each cluster
table(gwr.data$clus)
##
## 1 2 3 4
## 85 269 90 57
#to get it to plot right, we have to convert the cluster number
#to a factor variable
gwr.data$b.cf<-as.factor(gwr.data$clus)
spplot(gwr.data,"b.cf", col.regions=brewer.pal(4, "Accent"), main="GWR Spatial Regimes")
#We can get the means of each coefficient for each of the regime areas
b.means<-aggregate(gwr.data@data[, 3:5], by=list(gwr.data$clus), mean)
b.means
## Group.1 scale.ppersonspo. scale.p65plus. scale.pblack_1.
## 1 1 0.0005680811 0.002101334 0.0002283936
## 2 2 0.0006852921 0.002200900 0.0001550988
## 3 3 0.0002219154 0.002300691 0.0004603948
## 4 4 0.0005660539 0.001853176 0.0013099481
#The main idea behind examining spatial regimes is that we are looking for areas where the model may be working differently, we've done this, now we will
#fit separate ols models to each regime
spdat$cluster<-gwr.data$clus
lm1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==1)
lm2<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==2)
lm3<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==3)
lm4<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==4)
#examine each of the spatial regimes
summary(lm1)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.878e-03 -6.063e-04 -4.957e-05 3.881e-04 2.643e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.251e-03 9.556e-04 8.634 5.04e-13 ***
## scale(ppersonspo) 5.763e-04 1.537e-04 3.749 0.000336 ***
## scale(p65plus) 2.069e-03 1.489e-04 13.896 < 2e-16 ***
## scale(pblack_1) 1.421e-04 1.188e-04 1.196 0.235173
## scale(phisp) -3.630e-03 1.288e-03 -2.819 0.006084 **
## I(RUCC >= 7)TRUE 7.228e-05 2.263e-04 0.319 0.750227
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0008858 on 79 degrees of freedom
## Multiple R-squared: 0.8255, Adjusted R-squared: 0.8145
## F-statistic: 74.75 on 5 and 79 DF, p-value: < 2.2e-16
summary(lm2)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0043029 -0.0005601 0.0000598 0.0006509 0.0029322
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.041e-02 1.040e-04 100.131 < 2e-16 ***
## scale(ppersonspo) 7.790e-04 1.087e-04 7.166 7.79e-12 ***
## scale(p65plus) 2.075e-03 8.251e-05 25.144 < 2e-16 ***
## scale(pblack_1) -1.279e-04 1.052e-04 -1.216 0.225
## scale(phisp) -1.287e-03 1.277e-04 -10.082 < 2e-16 ***
## I(RUCC >= 7)TRUE -2.334e-04 1.620e-04 -1.441 0.151
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001093 on 263 degrees of freedom
## Multiple R-squared: 0.8347, Adjusted R-squared: 0.8315
## F-statistic: 265.5 on 5 and 263 DF, p-value: < 2.2e-16
summary(lm3)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0041216 -0.0005345 0.0000171 0.0006476 0.0022513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.025e-02 1.868e-04 54.847 <2e-16 ***
## scale(ppersonspo) 2.915e-06 1.564e-04 0.019 0.985
## scale(p65plus) 2.285e-03 1.562e-04 14.629 <2e-16 ***
## scale(pblack_1) 2.999e-04 2.097e-04 1.430 0.156
## scale(phisp) -2.728e-04 1.970e-04 -1.385 0.170
## I(RUCC >= 7)TRUE -3.323e-05 2.976e-04 -0.112 0.911
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001087 on 84 degrees of freedom
## Multiple R-squared: 0.7977, Adjusted R-squared: 0.7857
## F-statistic: 66.24 on 5 and 84 DF, p-value: < 2.2e-16
summary(lm4)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0051635 -0.0008085 -0.0000394 0.0007192 0.0035645
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0108274 0.0010132 10.687 1.28e-14 ***
## scale(ppersonspo) 0.0002726 0.0002269 1.201 0.235
## scale(p65plus) 0.0017292 0.0002076 8.330 4.45e-11 ***
## scale(pblack_1) 0.0014645 0.0014865 0.985 0.329
## scale(phisp) -0.0004185 0.0002522 -1.659 0.103
## I(RUCC >= 7)TRUE -0.0003595 0.0004359 -0.825 0.413
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001537 on 51 degrees of freedom
## Multiple R-squared: 0.6137, Adjusted R-squared: 0.5758
## F-statistic: 16.2 on 5 and 51 DF, p-value: 1.519e-09
Or, we could cluster on the original attributes in the data to find similar places based on their demographic variables:
dem.dist<-dist(spdat@data[, c("ppersonspo", "p65plus", "pblack_1", "phisp", "RUCC")])
#The hclust() function does basic hierarchical clustering
#according to several different methods, we'll use Ward's method
clust.dem<-hclust(dem.dist, method="ward.D")
#And we'll plot the dendrogram, or tree plot
plot(clust.dem)
#I only want a few groups, so I'll cut the tree so I get 4 clusters
spdat$clus<-cutree(clust.dem, k=4)
#i'll use table() to get the frequencies of each cluster
table(spdat$clus)
##
## 1 2 3 4
## 116 113 97 175
#to get it to plot right, we have to convert the cluster number
#to a factor variable
spdat$d.cf<-as.factor(spdat$clus)
spplot(spdat,"d.cf", col.regions=brewer.pal(4, "Accent"), main="Demographic Spatial Regimes")
#We can get the means of each coefficient for each of the regime areas
b.means<-aggregate(spdat@data[, c("ppersonspo", "p65plus", "pblack_1", "phisp", "RUCC")], by=list(spdat$d.cf), mean)
b.means
## Group.1 ppersonspo p65plus pblack_1 phisp RUCC
## 1 1 0.2035794 0.1623899 0.10176707 0.2068053 7.000000
## 2 2 0.1469202 0.1106256 0.13197354 0.1586634 1.902655
## 3 3 0.1785309 0.1793988 0.06183557 0.1690796 8.577320
## 4 4 0.1896993 0.1480544 0.10835914 0.1856226 5.685714
#fit separate ols models to each regime
lm1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==1)
lm2<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==2)
lm3<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==3)
lm4<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==4)
#examine each of the spatial regimes
summary(lm1)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0038075 -0.0005097 0.0000522 0.0005946 0.0030558
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.060e-02 1.159e-04 91.449 < 2e-16 ***
## scale(ppersonspo) 6.775e-04 1.349e-04 5.022 1.97e-06 ***
## scale(p65plus) 1.771e-03 1.254e-04 14.128 < 2e-16 ***
## scale(pblack_1) 8.215e-05 1.357e-04 0.605 0.546
## scale(phisp) -1.040e-03 1.373e-04 -7.577 1.14e-11 ***
## I(RUCC >= 7)TRUE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001051 on 111 degrees of freedom
## Multiple R-squared: 0.8114, Adjusted R-squared: 0.8046
## F-statistic: 119.4 on 4 and 111 DF, p-value: < 2.2e-16
summary(lm2)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.947e-03 -3.949e-04 4.587e-05 4.450e-04 1.939e-03
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0104293 0.0001434 72.752 < 2e-16 ***
## scale(ppersonspo) 0.0004382 0.0001168 3.752 0.000284 ***
## scale(p65plus) 0.0023816 0.0001358 17.537 < 2e-16 ***
## scale(pblack_1) 0.0001386 0.0001076 1.288 0.200504
## scale(phisp) -0.0007527 0.0001222 -6.161 1.27e-08 ***
## I(RUCC >= 7)TRUE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0007792 on 108 degrees of freedom
## Multiple R-squared: 0.8427, Adjusted R-squared: 0.8368
## F-statistic: 144.6 on 4 and 108 DF, p-value: < 2.2e-16
summary(lm3)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0056808 -0.0009007 0.0001499 0.0009163 0.0033947
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0101926 0.0002077 49.078 < 2e-16 ***
## scale(ppersonspo) 0.0004024 0.0002402 1.675 0.0973 .
## scale(p65plus) 0.0019056 0.0001730 11.012 < 2e-16 ***
## scale(pblack_1) 0.0002083 0.0002587 0.805 0.4228
## scale(phisp) -0.0010208 0.0002335 -4.371 3.24e-05 ***
## I(RUCC >= 7)TRUE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00163 on 92 degrees of freedom
## Multiple R-squared: 0.6535, Adjusted R-squared: 0.6384
## F-statistic: 43.38 on 4 and 92 DF, p-value: < 2.2e-16
summary(lm4)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0040428 -0.0004612 -0.0000613 0.0006240 0.0023597
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.066e-02 7.596e-05 140.344 < 2e-16 ***
## scale(ppersonspo) 2.544e-04 1.106e-04 2.301 0.0226 *
## scale(p65plus) 2.290e-03 1.081e-04 21.190 < 2e-16 ***
## scale(pblack_1) 6.792e-05 1.041e-04 0.652 0.5150
## scale(phisp) -6.464e-04 1.032e-04 -6.261 3.02e-09 ***
## I(RUCC >= 7)TRUE NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.000991 on 170 degrees of freedom
## Multiple R-squared: 0.7996, Adjusted R-squared: 0.7949
## F-statistic: 169.6 on 4 and 170 DF, p-value: < 2.2e-16