Generally, if we have a continuous outcome, we consider using the OLS model and when we have data collected over space, we have other assumptions too.
library(spdep)
library(maptools)
library(car)
library(lmtest)
library(spgwr)
library(classInt)
library(RColorBrewer)
setwd("/Users/ozd504/Google Drive/classes/dem7263/class17//data")
Stationarity is a general term in statistics/math, generally it means something doesn’t change over time or space e.g. stationary population e.g. stationary time series In spatial statistics, we can stay stationarity = homogeneity of an effect, or, that a process works the same regardless of where you observe the process. In spatial statistics, the latter can be a weak assumption, and we can ask, does X affect Y differently at different geographic locations, or in terms of parameters: If we estimate in OLS \(\beta\) = .5, are there locations in the data where \(\beta\) != .5?
There are several approaches to dealing with this problem. Among them are: * Create zones of homogeneity and stratify * Allow parameters to vary constantly
The first technique is stratification, or spatial regimes. This is where we stratify the data by either a known fixed quality, such as the state in which a county is located, or perhaps some other socio-demographic or socio-economic variable.
We then fit separate regression models for each region. These could be OLS or some other specification we have discussed.
First, we read in our data and make a spatial representation of it. I only ues a few state for brevity.
spdat<-readShapePoly("/Users/ozd504/Google Drive/classes/dem7263/class17//data/usdata_mort.shp")
## Warning: use rgdal::readOGR or sf::st_read
spdat$population<-as.numeric(as.character(spdat$Population))
spdat$deaths<-as.numeric(as.character(spdat$Deaths))
spdat<-spdat[spdat$STATE%in%c("05","22", "20","08","29","28", "40","35", "48"),]
spdat$pov_group<-cut(spdat$ppersonspo, breaks = quantile(spdat$ppersonspo, p=c(0, .3, .6, 1)), include.lowest = T)
lvs<-levels(spdat$pov_group)
lvs
## [1] "[0.0212,0.129]" "(0.129,0.174]" "(0.174,0.509]"
nbs<-knearneigh(coordinates(spdat), k=2)
nbsmg1<-knearneigh(coordinates(spdat[spdat$pov_group==lvs[1],]), k=2)
nbsmg2<-knearneigh(coordinates(spdat[spdat$pov_group==lvs[2],]), k=2)
nbsmg3<-knearneigh(coordinates(spdat[spdat$pov_group==lvs[3],]), k=2)
nbs<-knn2nb(nbs,sym=T)
nbs1<-knn2nb(nbsmg1, sym=T)
nbs2<-knn2nb(nbsmg2, sym=T)
nbs3<-knn2nb(nbsmg3, sym=T)
Here’s the basic OLS model:
fit.1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat)
summary(fit.1)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0071774 -0.0005667 0.0000313 0.0006072 0.0033003
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.064e-02 5.965e-05 178.450 < 2e-16 ***
## scale(ppersonspo) 5.956e-04 5.948e-05 10.015 < 2e-16 ***
## scale(p65plus) 2.370e-03 4.646e-05 51.008 < 2e-16 ***
## scale(pblack_1) 1.625e-04 6.103e-05 2.662 0.007903 **
## scale(phisp) -6.559e-04 5.309e-05 -12.356 < 2e-16 ***
## I(RUCC >= 7)TRUE -3.351e-04 8.593e-05 -3.900 0.000104 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001151 on 860 degrees of freedom
## Multiple R-squared: 0.8262, Adjusted R-squared: 0.8252
## F-statistic: 817.7 on 5 and 860 DF, p-value: < 2.2e-16
We could stratify by state:
#Examine basic regimes, using states as a nesting mechanism
spplot(spdat, "STATE")
fit.strat<-lm(I(deaths/population)~STATE/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat)
#summary(fit.strat)
anova(fit.1, fit.strat)
## Analysis of Variance Table
##
## Model 1: I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +
## scale(phisp) + I(RUCC >= 7)
## Model 2: I(deaths/population) ~ STATE/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 860 0.00113860
## 2 812 0.00095373 48 0.00018487 3.2791 4.18e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
which gives us 50 regression models, one for each state.
Or, we could examine an alternative regime, based on a demographic characteristic, say poverty:
spplot(spdat, "pov_group", col.regions=brewer.pal(n=3, "Greys"), col="transparent")
fit.strat2<-lm(I(deaths/population)~pov_group/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat)
summary(fit.strat2)
##
## Call:
## lm(formula = I(deaths/population) ~ pov_group/(scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7)), data = spdat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0062054 -0.0005616 0.0000257 0.0006351 0.0032782
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.076e-02 2.498e-04 43.101
## pov_group(0.129,0.174] 4.674e-05 2.867e-04 0.163
## pov_group(0.174,0.509] 7.295e-05 2.743e-04 0.266
## pov_group[0.0212,0.129]:scale(ppersonspo) 7.522e-04 2.437e-04 3.086
## pov_group(0.129,0.174]:scale(ppersonspo) 7.759e-04 3.764e-04 2.061
## pov_group(0.174,0.509]:scale(ppersonspo) 2.604e-04 1.016e-04 2.563
## pov_group[0.0212,0.129]:scale(p65plus) 2.447e-03 7.858e-05 31.144
## pov_group(0.129,0.174]:scale(p65plus) 2.128e-03 9.714e-05 21.905
## pov_group(0.174,0.509]:scale(p65plus) 2.371e-03 8.528e-05 27.804
## pov_group[0.0212,0.129]:scale(pblack_1) 2.535e-04 2.842e-04 0.892
## pov_group(0.129,0.174]:scale(pblack_1) 1.214e-04 1.542e-04 0.787
## pov_group(0.174,0.509]:scale(pblack_1) 2.069e-04 7.316e-05 2.828
## pov_group[0.0212,0.129]:scale(phisp) -5.357e-04 1.670e-04 -3.208
## pov_group(0.129,0.174]:scale(phisp) -7.779e-04 1.029e-04 -7.558
## pov_group(0.174,0.509]:scale(phisp) -5.976e-04 6.734e-05 -8.874
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE -3.447e-04 1.671e-04 -2.063
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE -4.785e-04 1.646e-04 -2.906
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE -7.200e-05 1.310e-04 -0.550
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## pov_group(0.129,0.174] 0.87054
## pov_group(0.174,0.509] 0.79035
## pov_group[0.0212,0.129]:scale(ppersonspo) 0.00209 **
## pov_group(0.129,0.174]:scale(ppersonspo) 0.03958 *
## pov_group(0.174,0.509]:scale(ppersonspo) 0.01054 *
## pov_group[0.0212,0.129]:scale(p65plus) < 2e-16 ***
## pov_group(0.129,0.174]:scale(p65plus) < 2e-16 ***
## pov_group(0.174,0.509]:scale(p65plus) < 2e-16 ***
## pov_group[0.0212,0.129]:scale(pblack_1) 0.37254
## pov_group(0.129,0.174]:scale(pblack_1) 0.43144
## pov_group(0.174,0.509]:scale(pblack_1) 0.00479 **
## pov_group[0.0212,0.129]:scale(phisp) 0.00139 **
## pov_group(0.129,0.174]:scale(phisp) 1.06e-13 ***
## pov_group(0.174,0.509]:scale(phisp) < 2e-16 ***
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE 0.03942 *
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE 0.00375 **
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE 0.58267
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001134 on 848 degrees of freedom
## Multiple R-squared: 0.8335, Adjusted R-squared: 0.8302
## F-statistic: 249.7 on 17 and 848 DF, p-value: < 2.2e-16
anova(fit.1, fit.strat2)
## Analysis of Variance Table
##
## Model 1: I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + scale(pblack_1) +
## scale(phisp) + I(RUCC >= 7)
## Model 2: I(deaths/population) ~ pov_group/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 860 0.0011386
## 2 848 0.0010909 12 4.7738e-05 3.0925 0.0002658 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.strat2, fit.strat)
## Analysis of Variance Table
##
## Model 1: I(deaths/population) ~ pov_group/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Model 2: I(deaths/population) ~ STATE/(scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 848 0.00109086
## 2 812 0.00095373 36 0.00013713 3.2432 1.263e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(spdat, main="Neighbors for low poverty counties")
plot(nbs1,coords=coordinates(spdat[spdat$pov_group==lvs[1],]), col=2, add=T)
plot(spdat, main="Neighbors for medium poverty counties")
plot(nbs2,coords=coordinates(spdat[spdat$pov_group==lvs[2],]), col=2, add=T)
plot(spdat, main="Neighbors for high poverty counties")
plot(nbs3,coords=coordinates(spdat[spdat$pov_group==lvs[3],]), col=2, add=T)
wts<-nb2listw(nbs)
wts1<-nb2listw(nbs1)
wts2<-nb2listw(nbs2)
wts3<-nb2listw(nbs3)
So we see non-stationarity in several of the covariates, with respect to state. The effects of poverty, age structure and race/ethnicty all vary significantly by state. Similar results are also seen for the poverty rate grouping variable, althought the final model comparison of the two stratified models shows more variation between states than between poverty groups.
If we wanted, we could stratify our analysis by state. Below, I simply stratify the analysis by state and use the OLS model.
fit.ar<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="05")
summary(fit.ar)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "05")
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.356e-03 -4.771e-04 3.499e-05 5.518e-04 2.209e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0100603 0.0003889 25.870 < 2e-16 ***
## scale(ppersonspo) 0.0007007 0.0002263 3.097 0.00283 **
## scale(p65plus) 0.0020609 0.0001736 11.871 < 2e-16 ***
## scale(pblack_1) 0.0002793 0.0001576 1.773 0.08072 .
## scale(phisp) -0.0015382 0.0006603 -2.329 0.02277 *
## I(RUCC >= 7)TRUE 0.0002247 0.0002492 0.902 0.37037
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0008916 on 69 degrees of freedom
## Multiple R-squared: 0.8106, Adjusted R-squared: 0.7969
## F-statistic: 59.08 on 5 and 69 DF, p-value: < 2.2e-16
fit.la<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="22")
summary(fit.la)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "22")
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0035436 -0.0003792 -0.0000223 0.0005726 0.0018121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0094746 0.0011208 8.454 1.07e-11 ***
## scale(ppersonspo) 0.0003456 0.0002375 1.455 0.151
## scale(p65plus) 0.0024852 0.0002506 9.916 4.25e-14 ***
## scale(pblack_1) 0.0001109 0.0001934 0.573 0.569
## scale(phisp) -0.0026665 0.0018815 -1.417 0.162
## I(RUCC >= 7)TRUE 0.0003250 0.0003047 1.067 0.291
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0009234 on 58 degrees of freedom
## Multiple R-squared: 0.7974, Adjusted R-squared: 0.7799
## F-statistic: 45.65 on 5 and 58 DF, p-value: < 2.2e-16
fit.ok<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="40")
summary(fit.ok)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "40")
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.029e-03 -5.192e-04 -5.854e-05 6.775e-04 2.030e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0115100 0.0002970 38.755 < 2e-16 ***
## scale(ppersonspo) 0.0008804 0.0001885 4.671 1.38e-05 ***
## scale(p65plus) 0.0026697 0.0001781 14.993 < 2e-16 ***
## scale(pblack_1) 0.0002233 0.0004935 0.452 0.652
## scale(phisp) 0.0001023 0.0004555 0.225 0.823
## I(RUCC >= 7)TRUE -0.0002993 0.0002681 -1.116 0.268
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0009816 on 71 degrees of freedom
## Multiple R-squared: 0.8346, Adjusted R-squared: 0.8229
## F-statistic: 71.65 on 5 and 71 DF, p-value: < 2.2e-16
fit.nm<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="35")
summary(fit.nm)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "35")
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0044801 -0.0007503 -0.0000290 0.0008573 0.0043913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0094413 0.0021653 4.360 0.00017 ***
## scale(ppersonspo) 0.0005935 0.0003167 1.874 0.07175 .
## scale(p65plus) 0.0016622 0.0002713 6.126 1.52e-06 ***
## scale(pblack_1) 0.0007923 0.0037894 0.209 0.83595
## scale(phisp) 0.0001598 0.0003058 0.522 0.60560
## I(RUCC >= 7)TRUE -0.0005091 0.0006664 -0.764 0.45152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001638 on 27 degrees of freedom
## Multiple R-squared: 0.6221, Adjusted R-squared: 0.5521
## F-statistic: 8.89 on 5 and 27 DF, p-value: 4.382e-05
fit.tx<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="48")
summary(fit.tx)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = STATE == "48")
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0042947 -0.0005506 0.0000912 0.0006799 0.0029490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.049e-02 1.146e-04 91.530 < 2e-16 ***
## scale(ppersonspo) 2.798e-04 1.228e-04 2.278 0.023568 *
## scale(p65plus) 2.351e-03 9.295e-05 25.292 < 2e-16 ***
## scale(pblack_1) 4.812e-04 1.997e-04 2.410 0.016702 *
## scale(phisp) -4.304e-04 1.184e-04 -3.636 0.000337 ***
## I(RUCC >= 7)TRUE -3.663e-04 1.676e-04 -2.186 0.029786 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00111 on 246 degrees of freedom
## Multiple R-squared: 0.8453, Adjusted R-squared: 0.8422
## F-statistic: 268.9 on 5 and 246 DF, p-value: < 2.2e-16
Or, we could run a SAR model for each subset using the weights we created above:
#first the "global" model
efit<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, listw=wts, method="MC")
Then, a regime specification as used by Brazil 2015
efit0<-lagsarlm(scale(I(deaths/population))~pov_group/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat, listw=wts, method="MC")
summary(efit0)
##
## Call:
## lagsarlm(formula = scale(I(deaths/population)) ~ pov_group/(scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7)), data = spdat, listw = wts, method = "MC")
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2296199 -0.2000084 0.0083389 0.2228830 1.1937891
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value
## (Intercept) 0.089600 0.088193 1.0160
## pov_group(0.129,0.174] 0.024349 0.101172 0.2407
## pov_group(0.174,0.509] 0.045556 0.096811 0.4706
## pov_group[0.0212,0.129]:scale(ppersonspo) 0.230477 0.086402 2.6675
## pov_group(0.129,0.174]:scale(ppersonspo) 0.229471 0.132925 1.7263
## pov_group(0.174,0.509]:scale(ppersonspo) 0.091278 0.035816 2.5485
## pov_group[0.0212,0.129]:scale(p65plus) 0.833853 0.029727 28.0500
## pov_group(0.129,0.174]:scale(p65plus) 0.732307 0.035260 20.7685
## pov_group(0.174,0.509]:scale(p65plus) 0.829172 0.030658 27.0462
## pov_group[0.0212,0.129]:scale(pblack_1) 0.093496 0.100192 0.9332
## pov_group(0.129,0.174]:scale(pblack_1) 0.058091 0.054372 1.0684
## pov_group(0.174,0.509]:scale(pblack_1) 0.079272 0.025799 3.0727
## pov_group[0.0212,0.129]:scale(phisp) -0.150249 0.059375 -2.5305
## pov_group(0.129,0.174]:scale(phisp) -0.247997 0.036793 -6.7404
## pov_group(0.174,0.509]:scale(phisp) -0.186614 0.024301 -7.6791
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE -0.116687 0.058925 -1.9803
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE -0.179497 0.058067 -3.0912
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE -0.052215 0.046358 -1.1264
## Pr(>|z|)
## (Intercept) 0.309652
## pov_group(0.129,0.174] 0.809811
## pov_group(0.174,0.509] 0.637951
## pov_group[0.0212,0.129]:scale(ppersonspo) 0.007642
## pov_group(0.129,0.174]:scale(ppersonspo) 0.084290
## pov_group(0.174,0.509]:scale(ppersonspo) 0.010818
## pov_group[0.0212,0.129]:scale(p65plus) < 2.2e-16
## pov_group(0.129,0.174]:scale(p65plus) < 2.2e-16
## pov_group(0.174,0.509]:scale(p65plus) < 2.2e-16
## pov_group[0.0212,0.129]:scale(pblack_1) 0.350732
## pov_group(0.129,0.174]:scale(pblack_1) 0.285332
## pov_group(0.174,0.509]:scale(pblack_1) 0.002122
## pov_group[0.0212,0.129]:scale(phisp) 0.011390
## pov_group(0.129,0.174]:scale(phisp) 1.579e-11
## pov_group(0.174,0.509]:scale(phisp) 1.599e-14
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE 0.047673
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE 0.001993
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE 0.260013
##
## Rho: 0.11712, LR test value: 29.179, p-value: 6.6006e-08
## Asymptotic standard error: 0.021705
## z-value: 5.3963, p-value: 6.8029e-08
## Wald statistic: 29.12, p-value: 6.8029e-08
##
## Log likelihood: -437.462 for lag model
## ML residual variance (sigma squared): 0.15993, (sigma: 0.39991)
## Number of observations: 866
## Number of parameters estimated: 20
## AIC: 914.92, (AIC for lm: 942.1)
## LM test for residual autocorrelation
## test value: 4.5618, p-value: 0.032693
anova(efit, efit0)
## Model df AIC logLik Test L.Ratio p-value
## efit 1 8 922.52 -453.26 1
## efit0 2 20 914.92 -437.46 2 31.598 0.0015959
Or, I could fit the fully stratified models, one regime at a time by subsetting and using the group-specific spatial weights:
efit1<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$pov_group==lvs[1],], listw=wts1, method="MC")
efit2<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$pov_group==lvs[2],], listw=wts2, method="MC")
efit3<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$pov_group==lvs[3],], listw=wts3, method="MC")
summary(efit1)
##
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7), data = spdat[spdat$pov_group == lvs[1], ], listw = wts1,
## method = "MC")
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2400498 -0.1599200 -0.0049624 0.1863141 0.9686230
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.055588 0.030362 1.8308 0.067126
## scale(ppersonspo) 0.073129 0.023164 3.1570 0.001594
## scale(p65plus) 0.868721 0.031857 27.2695 < 2.2e-16
## scale(pblack_1) 0.021391 0.022688 0.9429 0.345756
## scale(phisp) -0.069007 0.022166 -3.1132 0.001851
## I(RUCC >= 7)TRUE -0.103967 0.045610 -2.2795 0.022639
##
## Rho: 0.093594, LR test value: 9.0027, p-value: 0.0026958
## Asymptotic standard error: 0.031567
## z-value: 2.9649, p-value: 0.003028
## Wald statistic: 8.7905, p-value: 0.003028
##
## Log likelihood: -64.50503 for lag model
## ML residual variance (sigma squared): 0.095837, (sigma: 0.30958)
## Number of observations: 260
## Number of parameters estimated: 8
## AIC: 145.01, (AIC for lm: 152.01)
## LM test for residual autocorrelation
## test value: 10.272, p-value: 0.0013508
summary(efit2)
##
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7), data = spdat[spdat$pov_group == lvs[2], ], listw = wts2,
## method = "MC")
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.617670 -0.222604 -0.011701 0.263513 1.339840
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.092452 0.045575 2.0285 0.042505
## scale(ppersonspo) 0.061688 0.031415 1.9636 0.049573
## scale(p65plus) 0.818666 0.038946 21.0203 < 2.2e-16
## scale(pblack_1) 0.038639 0.035990 1.0736 0.283012
## scale(phisp) -0.225453 0.033643 -6.7013 2.066e-11
## I(RUCC >= 7)TRUE -0.193816 0.070547 -2.7473 0.006008
##
## Rho: 0.068926, LR test value: 2.5758, p-value: 0.10851
## Asymptotic standard error: 0.04315
## z-value: 1.5973, p-value: 0.11019
## Wald statistic: 2.5515, p-value: 0.11019
##
## Log likelihood: -181.0869 for lag model
## ML residual variance (sigma squared): 0.23532, (sigma: 0.4851)
## Number of observations: 260
## Number of parameters estimated: 8
## AIC: 378.17, (AIC for lm: 378.75)
## LM test for residual autocorrelation
## test value: 1.8726, p-value: 0.17118
summary(efit3)
##
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) +
## scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >=
## 7), data = spdat[spdat$pov_group == lvs[3], ], listw = wts3,
## method = "MC")
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.858575 -0.238619 0.027251 0.264560 1.337662
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.022702 0.039200 0.5791 0.562502
## scale(ppersonspo) 0.084316 0.033428 2.5223 0.011658
## scale(p65plus) 0.773182 0.030059 25.7218 < 2.2e-16
## scale(pblack_1) 0.116802 0.039375 2.9664 0.003013
## scale(phisp) -0.275942 0.039832 -6.9277 4.277e-12
## I(RUCC >= 7)TRUE -0.042717 0.053836 -0.7935 0.427509
##
## Rho: 0.10926, LR test value: 8.3474, p-value: 0.0038624
## Asymptotic standard error: 0.036687
## z-value: 2.9783, p-value: 0.0028984
## Wald statistic: 8.8703, p-value: 0.0028984
##
## Log likelihood: -226.843 for lag model
## ML residual variance (sigma squared): 0.21626, (sigma: 0.46504)
## Number of observations: 346
## Number of parameters estimated: 8
## AIC: 469.69, (AIC for lm: 476.03)
## LM test for residual autocorrelation
## test value: 0.64047, p-value: 0.42354
First proposed by Brunsdon et al 1996
Their model was specified, where now each \(\beta_k\) is estimated at the location \(u_i,v_j\) where i and j are the coordinates or geographic location of the observation i. \(\beta_k(ui,vj)\) is the local realization of \(\beta\) at location i,j. This constructs a trend surface of parameter values for each independent variable and the model intercept.
Note that the basic OLS regression model above is just a special case of the GWR model where the coefficients are constant over space. The parameters in the GWR are estimated by weighted least squares. The weighting matrix is a diagonal matrix, with each diagonal element being a function of the location of the observation. if Wi is the weighting matrix at location i, then the parameter estimate at that location would be
\(\hat \beta_i = (X'WX)^{-1} X'WY\)
The role of the weight matrix is to give more value to observations that are close to i. as it is assumed that observations that are close will influence each other more than those that are far away. This is NOT the same W matrix as was used in Moran’s I!
W is generally constructed based on the distances between locations, and if \(d_ij\) < d , then \(w_{ij}\)=1, and 0 otherwise. Brunsdson, Charlton, and Fotheringham used a kernel-density approach to measure W
\(w_{ij} = exp(- \frac{1}{2}(d_{ij}/b)^2)\)
where b is the bandwidth parameter to be estimated. under this weighting scheme, the weight for points that exactly co-occur will be 1 and for those that are the distance \(d_{ij}\) apart, the weight will decrease as a Gaussian curve as \(d_{ij}\) increases.
GWR Kernels
There are several ways to do this, but most programs use a cross-validation method to choose the optimal kernel bandwidth. The cross validation method takes the form: \(CV = \Sigma_i \left[y_i - \hat y_{\neq i} (\beta) \right ]^2\)
Where \(\hat y_{\neq i} (\beta)\) is the fitted value of \(y_i\) with the observations for point i omitted from the calibration process. This in effect minimizes the sum of squared errors at all locations i, and arrives at an optimal bandwidth.
When the model is fit, we can also calculate local z/t tests to test whether the effect of a specifix X variable is significant at a particular location. Likewise, we can calculate a local \(R^2\) measure at each location.
Leung et al 2000 proposed a series of F tests for model improvement over OLS, and they also derive an F test for variation in the beta coefficients (so called F3 test.
The advantage of GWR is that we can visualize the difference in the covariate effects as a continuous surface. There are a few steps to fitting the GWR model. First we need to select a bandwidth for the weighting algorithm
gwr.b.f<-gwr.sel(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat)
## Bandwidth: 969429.9 CV score: 0.00113975
## Bandwidth: 1567004 CV score: 0.001150311
## Bandwidth: 600108.5 CV score: 0.001123044
## Bandwidth: 371855.3 CV score: 0.001093567
## Bandwidth: 230787.1 CV score: 0.001058121
## Bandwidth: 143602.1 CV score: 0.001041474
## Bandwidth: 89718.85 CV score: 0.001110606
## Bandwidth: 176903.8 CV score: 0.001044459
## Bandwidth: 136418.3 CV score: 0.001043043
## Bandwidth: 154363.4 CV score: 0.001041077
## Bandwidth: 150804.8 CV score: 0.001041
## Bandwidth: 151244.2 CV score: 0.001040999
## Bandwidth: 151072.9 CV score: 0.001040999
## Bandwidth: 151070.5 CV score: 0.001040999
## Bandwidth: 151070.2 CV score: 0.001040999
## Bandwidth: 151070.2 CV score: 0.001040999
## Bandwidth: 151070.2 CV score: 0.001040999
## Bandwidth: 151070.2 CV score: 0.001040999
## Bandwidth: 151070.2 CV score: 0.001040999
gwr.b.f
## [1] 151070.2
#this is the distance (in meters, because our data are projected in a system measured in meters), which the weighting fuction will search, and include all observations within this radius.
gwr.f<-gwr(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, bandwidth=gwr.b.f, se.fit=T, hatmatrix=T)
gwr.f
## Call:
## gwr(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## bandwidth = gwr.b.f, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss
## Fixed bandwidth: 151070.2
## Summary of GWR coefficient estimates at data points:
## Min. 1st Qu. Median 3rd Qu. Max.
## X.Intercept. 5.536e-03 1.021e-02 1.050e-02 1.084e-02 1.217e-02
## scale.ppersonspo. -4.253e-04 4.215e-04 5.645e-04 7.620e-04 1.274e-03
## scale.p65plus. 1.458e-03 2.218e-03 2.351e-03 2.456e-03 2.841e-03
## scale.pblack_1. -6.664e-03 -2.847e-06 1.390e-04 3.199e-04 4.986e-03
## scale.phisp. -3.435e-03 -1.015e-03 -8.177e-04 -3.383e-04 9.259e-04
## I.RUCC....7.TRUE -1.389e-03 -4.151e-04 -2.689e-04 -5.552e-05 9.886e-04
## Global
## X.Intercept. 0.0106
## scale.ppersonspo. 0.0006
## scale.p65plus. 0.0024
## scale.pblack_1. 0.0002
## scale.phisp. -0.0007
## I.RUCC....7.TRUE -0.0003
## Number of data points: 866
## Effective number of parameters (residual: 2traceS - traceS'S): 118.3647
## Effective degrees of freedom (residual: 2traceS - traceS'S): 747.6353
## Sigma (residual: 2traceS - traceS'S): 0.001017698
## Effective number of parameters (model: traceS): 84.46943
## Effective degrees of freedom (model: traceS): 781.5306
## Sigma (model: traceS): 0.000995384
## Sigma (ML): 0.000945594
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -9413.622
## AIC (GWR p. 96, eq. 4.22): -9519.053
## Residual sum of squares: 0.0007743321
## Quasi-global R2: 0.8818083
BFC02.gwr.test(gwr.f)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwr.f
## F = 1.4704, df1 = 860.00, df2 = 747.64, p-value = 3.185e-08
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.0011386013 0.0007743321
BFC99.gwr.test(gwr.f)
##
## Brunsdon, Fotheringham & Charlton (1999) ANOVA
##
## data: gwr.f
## F = 3.1301, df1 = 501.02, df2 = 790.40, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement SS GWR residuals
## 0.0003642692 0.0007743321
LMZ.F1GWR.test(gwr.f)
##
## Leung et al. (2000) F(1) test
##
## data: gwr.f
## F = 0.78228, df1 = 790.4, df2 = 860.0, p-value = 0.0002232
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.0011386013 0.0007743321
LMZ.F2GWR.test(gwr.f)
##
## Leung et al. (2000) F(2) test
##
## data: gwr.f
## F = 2.4486, df1 = 175.56, df2 = 860.00, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR improvement
## 0.0011386013 0.0003642692
LMZ.F3GWR.test(gwr.f)
##
## Leung et al. (2000) F(3) test
##
## F statistic Numerator d.f. Denominator d.f. Pr(>)
## (Intercept) 1.1840 46.7226 790.4 0.1906
## scale(ppersonspo) 2.6035 210.0455 790.4 < 2.2e-16
## scale(p65plus) 1.8851 227.5250 790.4 1.475e-10
## scale(pblack_1) 1.3051 29.9402 790.4 0.1285
## scale(phisp) 1.5632 33.9424 790.4 0.0225
## I(RUCC >= 7)TRUE 1.6855 297.4143 790.4 8.846e-09
##
## (Intercept)
## scale(ppersonspo) ***
## scale(p65plus) ***
## scale(pblack_1)
## scale(phisp) *
## I(RUCC >= 7)TRUE ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gwr.dat<-gwr.f$SDF
cols<-brewer.pal(n=6, name="RdBu")
names(gwr.dat)
## [1] "sum.w" "X.Intercept."
## [3] "scale.ppersonspo." "scale.p65plus."
## [5] "scale.pblack_1." "scale.phisp."
## [7] "I.RUCC....7.TRUE" "X.Intercept._se"
## [9] "scale.ppersonspo._se" "scale.p65plus._se"
## [11] "scale.pblack_1._se" "scale.phisp._se"
## [13] "I.RUCC....7.TRUE_se" "gwr.e"
## [15] "pred" "pred.se"
## [17] "localR2" "X.Intercept._se_EDF"
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"
## [21] "scale.pblack_1._se_EDF" "scale.phisp._se_EDF"
## [23] "I.RUCC....7.TRUE_se_EDF" "pred.se_EDF"
#pdf("gwrfixed.pdf")
spplot(gwr.dat, "scale.ppersonspo.", at=quantile(gwr.f$SDF$scale.ppersonspo.), col.regions=cols, main="% Poverty effect", col="transparent")
spplot(gwr.dat, "scale.pblack_1.", at=quantile(gwr.f$SDF$scale.pblack_1.), col.regions=cols, main="% Black effect", col="transparent")
spplot(gwr.dat, "scale.phisp.", at=quantile(gwr.f$SDF$scale.phisp.), col.regions=cols, main="% Hispanic effect", col="transparent")
#another "lighter" plot method
#brks<-c(-3.038e-04, 3.441e-04, 5.882e-04, 7.912e-04 , 1.355e-03); cols<-brewer.pal(n=5,"Reds")
#plot(gwr.a$SDF, col=cols[findInterval(gwr.a$SDF$scale.ppersonspo.,brks, all.inside=TRUE)])
#Local R^2
brks<-seq(0, 1, length.out = 9); cols=brewer.pal(n=9, "Reds")
spplot(gwr.f$SDF, "localR2", at=brks, col.regions=cols, main="local R^2", col="transparent")
#plots of the correlations of the coefficients
pairs(gwr.f$SDF[,3:7])
cor(as(gwr.f$SDF[,3:7], "data.frame"))
## scale.ppersonspo. scale.p65plus. scale.pblack_1.
## scale.ppersonspo. 1.00000000 -0.28695573 -0.05725905
## scale.p65plus. -0.28695573 1.00000000 -0.32080137
## scale.pblack_1. -0.05725905 -0.32080137 1.00000000
## scale.phisp. -0.45583825 -0.07608532 0.28078030
## I.RUCC....7.TRUE -0.11568366 -0.03801324 -0.32057620
## scale.phisp. I.RUCC....7.TRUE
## scale.ppersonspo. -0.45583825 -0.11568366
## scale.p65plus. -0.07608532 -0.03801324
## scale.pblack_1. 0.28078030 -0.32057620
## scale.phisp. 1.00000000 -0.21497469
## I.RUCC....7.TRUE -0.21497469 1.00000000
This plot shows a problem in GWR, that the coefficients can often be highly correlated See Wheeler and Tiefelsdorf (2005), one of the major critiques of GWR. In general, the critiques of GWR have pushed most users to use it for “exploratory” purposes only. So, one way we may explore our model is to examine where the regression effects are “significant” versus not. This can be done by plotting the local t-statistics, or z-statistics if you prefer.
cols<-brewer.pal(8,"RdBu")
dfree<-gwr.f$results$edf
names(gwr.dat)
## [1] "sum.w" "X.Intercept."
## [3] "scale.ppersonspo." "scale.p65plus."
## [5] "scale.pblack_1." "scale.phisp."
## [7] "I.RUCC....7.TRUE" "X.Intercept._se"
## [9] "scale.ppersonspo._se" "scale.p65plus._se"
## [11] "scale.pblack_1._se" "scale.phisp._se"
## [13] "I.RUCC....7.TRUE_se" "gwr.e"
## [15] "pred" "pred.se"
## [17] "localR2" "X.Intercept._se_EDF"
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"
## [21] "scale.pblack_1._se_EDF" "scale.phisp._se_EDF"
## [23] "I.RUCC....7.TRUE_se_EDF" "pred.se_EDF"
gwr.dat$pov.t<-gwr.dat$scale.ppersonspo./gwr.dat$scale.ppersonspo._se
gwr.dat$pov.t.p<-2*pt(-abs(gwr.dat$pov.t) , dfree)
tbrks<- c(min(gwr.dat$pov.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$pov.t))
spplot(gwr.dat, "pov.t", col.regions=cols, at=tbrks , main="t-stat %Poverty", col="transparent")
gwr.dat$black.t<-gwr.dat$scale.pblack_1./gwr.dat$scale.pblack_1._se
gwr.dat$black.t.p<-2*pt(-abs(gwr.dat$black.t) , dfree)
tbrks<- c(min(gwr.dat$black.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$black.t))
spplot(gwr.dat, "black.t", col.regions=cols,at=tbrks , main="t-stat %Black", col="transparent")
gwr.dat$hisp.t<-gwr.dat$scale.phisp./gwr.dat$scale.phisp._se
gwr.dat$hisp.t.p<-2*pt(-abs(gwr.dat$hisp.t) , dfree)
tbrks<- c(min(gwr.dat$hisp.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$hisp.t))
spplot(gwr.dat, "hisp.t", col.regions=cols, at=tbrks, main="t-stat %Hispanic", col="transparent")
So, those are the results from the fixed bandwith analysis. But there are obviously places in our data where counties are more densely occurring, so an adaptive kernel approach is also warranted. If all our data were equally spaced, then this may not be necessary.
Now we try everthing with the adaptive bandwidth:
gwr.b.a<-gwr.sel(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, adapt=T)
## Adaptive q: 0.381966 CV score: 0.001136242
## Adaptive q: 0.618034 CV score: 0.001142128
## Adaptive q: 0.236068 CV score: 0.001126485
## Adaptive q: 0.145898 CV score: 0.00111129
## Adaptive q: 0.09016994 CV score: 0.001092296
## Adaptive q: 0.05572809 CV score: 0.001062696
## Adaptive q: 0.03444185 CV score: 0.001034263
## Adaptive q: 0.02128624 CV score: 0.001018798
## Adaptive q: 0.01315562 CV score: 0.00103849
## Adaptive q: 0.0243862 CV score: 0.001023387
## Adaptive q: 0.01818062 CV score: 0.001027968
## Adaptive q: 0.02180007 CV score: 0.001018992
## Adaptive q: 0.02110114 CV score: 0.001018775
## Adaptive q: 0.0210191 CV score: 0.001018773
## Adaptive q: 0.02097841 CV score: 0.001018774
## Adaptive q: 0.0210598 CV score: 0.001018773
## Adaptive q: 0.0210191 CV score: 0.001018773
gwr.b.a
## [1] 0.0210191
This is the proportion of all cases which the weighting fuction will search, and include this fraction of observations in a model for each county. In this case it will include 18.2025446 counties in the model for a particular county.
gwr.a<-gwr(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, adapt=gwr.b.a, se.fit=T, hatmatrix=T )
gwr.a
## Call:
## gwr(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## adapt = gwr.b.a, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss
## Adaptive quantile: 0.0210191 (about 18 of 866 data points)
## Summary of GWR coefficient estimates at data points:
## Min. 1st Qu. Median 3rd Qu. Max.
## X.Intercept. 7.715e-03 9.985e-03 1.046e-02 1.085e-02 1.233e-02
## scale.ppersonspo. -6.379e-04 3.358e-04 5.552e-04 7.810e-04 1.369e-03
## scale.p65plus. 1.664e-03 2.201e-03 2.348e-03 2.498e-03 3.195e-03
## scale.pblack_1. -1.737e-03 -2.639e-05 1.697e-04 4.175e-04 3.495e-03
## scale.phisp. -5.445e-03 -1.196e-03 -7.769e-04 -1.955e-04 1.793e-03
## I.RUCC....7.TRUE -1.383e-03 -4.946e-04 -2.727e-04 -1.530e-05 4.533e-04
## Global
## X.Intercept. 0.0106
## scale.ppersonspo. 0.0006
## scale.p65plus. 0.0024
## scale.pblack_1. 0.0002
## scale.phisp. -0.0007
## I.RUCC....7.TRUE -0.0003
## Number of data points: 866
## Effective number of parameters (residual: 2traceS - traceS'S): 159.6773
## Effective degrees of freedom (residual: 2traceS - traceS'S): 706.3227
## Sigma (residual: 2traceS - traceS'S): 0.001027636
## Effective number of parameters (model: traceS): 113.9685
## Effective degrees of freedom (model: traceS): 752.0315
## Sigma (model: traceS): 0.0009959163
## Sigma (ML): 0.0009280725
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -9370.427
## AIC (GWR p. 96, eq. 4.22): -9521.948
## Residual sum of squares: 0.000745902
## Quasi-global R2: 0.8861478
BFC02.gwr.test(gwr.a)
##
## Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
##
## data: gwr.a
## F = 1.5265, df1 = 860.00, df2 = 706.32, p-value = 2.871e-09
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.001138601 0.000745902
BFC99.gwr.test(gwr.a)
##
## Brunsdon, Fotheringham & Charlton (1999) ANOVA
##
## data: gwr.a
## F = 2.4198, df1 = 641.80, df2 = 766.23, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement SS GWR residuals
## 0.0003926993 0.0007459020
LMZ.F1GWR.test(gwr.a)
##
## Leung et al. (2000) F(1) test
##
## data: gwr.a
## F = 0.79764, df1 = 766.23, df2 = 860.00, p-value = 0.0006739
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals
## 0.001138601 0.000745902
LMZ.F2GWR.test(gwr.a)
##
## Leung et al. (2000) F(2) test
##
## data: gwr.a
## F = 1.9301, df1 = 239.87, df2 = 860.00, p-value = 7.202e-12
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR improvement
## 0.0011386013 0.0003926993
LMZ.F3GWR.test(gwr.a)
##
## Leung et al. (2000) F(3) test
##
## F statistic Numerator d.f. Denominator d.f. Pr(>)
## (Intercept) 1.1130 39.1932 766.23 0.295026
## scale(ppersonspo) 2.2536 221.9249 766.23 3.657e-16
## scale(p65plus) 1.6122 210.3144 766.23 2.736e-06
## scale(pblack_1) 1.3562 51.0037 766.23 0.053086
## scale(phisp) 1.3343 34.5128 766.23 0.097159
## I(RUCC >= 7)TRUE 1.3079 320.3032 766.23 0.001787
##
## (Intercept)
## scale(ppersonspo) ***
## scale(p65plus) ***
## scale(pblack_1) .
## scale(phisp) .
## I(RUCC >= 7)TRUE **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gwr.data<-gwr.a$SDF
spplot(gwr.data, "scale.ppersonspo.", at=quantile(gwr.a$SDF$scale.ppersonspo.), col.regions=cols, main="% Poverty effect", col="transparent")
spplot(gwr.data, "scale.pblack_1.", at=quantile(gwr.a$SDF$scale.pblack_1.), col.regions=cols, main="% Black effect", col="transparent")
spplot(gwr.data, "scale.phisp.", at=quantile(gwr.a$SDF$scale.phisp.), col.regions=cols, main="% Hispanic effect", col="transparent")
cols<-brewer.pal(8,"RdBu")
dfree<-gwr.a$results$edf
names(gwr.data)
## [1] "sum.w" "X.Intercept."
## [3] "scale.ppersonspo." "scale.p65plus."
## [5] "scale.pblack_1." "scale.phisp."
## [7] "I.RUCC....7.TRUE" "X.Intercept._se"
## [9] "scale.ppersonspo._se" "scale.p65plus._se"
## [11] "scale.pblack_1._se" "scale.phisp._se"
## [13] "I.RUCC....7.TRUE_se" "gwr.e"
## [15] "pred" "pred.se"
## [17] "localR2" "X.Intercept._se_EDF"
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"
## [21] "scale.pblack_1._se_EDF" "scale.phisp._se_EDF"
## [23] "I.RUCC....7.TRUE_se_EDF" "pred.se_EDF"
gwr.data$pov.t<-gwr.data$scale.ppersonspo./gwr.data$scale.ppersonspo._se
gwr.data$pov.t.p<-2*pt(-abs(gwr.data$pov.t) , dfree)
tbrks<- c(min(gwr.data$pov.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$pov.t))
spplot(gwr.data, "pov.t", col.regions=cols, at=tbrks , main="t-stat %Poverty", col="transparent")
gwr.data$black.t<-gwr.data$scale.pblack_1./gwr.data$scale.pblack_1._se
gwr.data$black.t.p<-2*pt(-abs(gwr.data$black.t) , dfree)
tbrks<- c(min(gwr.data$black.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$black.t))
spplot(gwr.data, "black.t", col.regions=cols,at=tbrks , main="t-stat %Black", col="transparent")
gwr.data$hisp.t<-gwr.data$scale.phisp./gwr.data$scale.phisp._se
gwr.data$hisp.t.p<-2*pt(-abs(gwr.data$hisp.t) , dfree)
tbrks<- c(min(gwr.data$hisp.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$hisp.t))
spplot(gwr.data, "hisp.t", col.regions=cols, at=tbrks, main="t-stat %Hispanic", col="transparent")
#Local R^2
brks<-seq(0, 1, length.out = 9); cols=brewer.pal(n=9, "Reds")
spplot(gwr.a$SDF, "localR2", at=brks, col.regions=cols, main="local R^2", col="transparent")
The first step in the construction of spatial regimes is to do a hierarchical clustering method. We use a distance matrix between all observations for this In this case, we’re using the GWR coefficients as our data.
dat.dist<-dist(gwr.data@data[, 3:7])
#The hclust() function does basic hierarchical clustering
#according to several different methods, we'll use Ward's method
clust.dat<-hclust(dat.dist, method="ward.D")
#And we'll plot the dendrogram, or tree plot
plot(clust.dat)
#I only want a few groups, so I'll cut the tree so I get 5 clusters
gwr.data$clus<-cutree(clust.dat, k=4)
#i'll use table() to get the frequencies of each cluster
table(gwr.data$clus)
##
## 1 2 3 4
## 134 348 93 291
#to get it to plot right, we have to convert the cluster number
#to a factor variable
gwr.data$b.cf<-as.factor(gwr.data$clus)
spplot(gwr.data,"b.cf", col.regions=brewer.pal(4, "Accent"), main="GWR Spatial Regimes", col="transparent")
#We can get the means of each coefficient for each of the regime areas
b.means<-aggregate(gwr.data@data[, 3:5], by=list(gwr.data$clus), mean)
b.means
## Group.1 scale.ppersonspo. scale.p65plus. scale.pblack_1.
## 1 1 0.0006567404 0.002269058 3.272880e-04
## 2 2 0.0006995221 0.002316225 1.266564e-04
## 3 3 0.0006656177 0.002479362 1.761626e-05
## 4 4 0.0002322601 0.002385656 4.692997e-04
#The main idea behind examining spatial regimes is that we are looking for areas where the model may be working differently, we've done this, now we will
#fit separate ols models to each regime
spdat$cluster<-gwr.data$clus
lm1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==1)
lm2<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==2)
lm3<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==3)
lm4<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==4)
#examine each of the spatial regimes
summary(lm1)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0037214 -0.0006349 -0.0000029 0.0004542 0.0030069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0097931 0.0001895 51.673 < 2e-16 ***
## scale(ppersonspo) 0.0005286 0.0001448 3.652 0.000379 ***
## scale(p65plus) 0.0020632 0.0001198 17.218 < 2e-16 ***
## scale(pblack_1) 0.0001018 0.0001181 0.861 0.390597
## scale(phisp) -0.0018129 0.0002552 -7.105 7.47e-11 ***
## I(RUCC >= 7)TRUE 0.0000471 0.0001993 0.236 0.813563
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001006 on 128 degrees of freedom
## Multiple R-squared: 0.7932, Adjusted R-squared: 0.7851
## F-statistic: 98.17 on 5 and 128 DF, p-value: < 2.2e-16
summary(lm2)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0042561 -0.0005575 0.0000333 0.0005845 0.0028865
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.069e-02 8.774e-05 121.890 < 2e-16 ***
## scale(ppersonspo) 7.185e-04 1.058e-04 6.792 4.9e-11 ***
## scale(p65plus) 2.242e-03 7.741e-05 28.963 < 2e-16 ***
## scale(pblack_1) -9.560e-05 1.219e-04 -0.784 0.434
## scale(phisp) -9.431e-04 9.797e-05 -9.627 < 2e-16 ***
## I(RUCC >= 7)TRUE -1.640e-04 1.396e-04 -1.175 0.241
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001114 on 342 degrees of freedom
## Multiple R-squared: 0.8315, Adjusted R-squared: 0.829
## F-statistic: 337.5 on 5 and 342 DF, p-value: < 2.2e-16
summary(lm3)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0026446 -0.0006281 -0.0000377 0.0006559 0.0032255
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.092e-03 1.469e-03 6.190 1.92e-08 ***
## scale(ppersonspo) 6.966e-04 1.779e-04 3.917 0.000178 ***
## scale(p65plus) 2.585e-03 2.695e-04 9.590 2.78e-15 ***
## scale(pblack_1) -5.846e-05 1.446e-04 -0.404 0.686912
## scale(phisp) -4.293e-03 2.443e-03 -1.757 0.082377 .
## I(RUCC >= 7)TRUE -5.338e-04 2.752e-04 -1.940 0.055658 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001077 on 87 degrees of freedom
## Multiple R-squared: 0.6692, Adjusted R-squared: 0.6502
## F-statistic: 35.2 on 5 and 87 DF, p-value: < 2.2e-16
summary(lm4)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$cluster == 4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0042116 -0.0006219 0.0000312 0.0007016 0.0031745
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.040e-02 1.116e-04 93.204 < 2e-16 ***
## scale(ppersonspo) 1.589e-04 9.909e-05 1.603 0.10996
## scale(p65plus) 2.516e-03 6.619e-05 38.019 < 2e-16 ***
## scale(pblack_1) 6.119e-04 1.450e-04 4.221 3.28e-05 ***
## scale(phisp) -1.912e-04 8.480e-05 -2.254 0.02494 *
## I(RUCC >= 7)TRUE -4.607e-04 1.508e-04 -3.056 0.00246 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001123 on 285 degrees of freedom
## Multiple R-squared: 0.8664, Adjusted R-squared: 0.864
## F-statistic: 369.6 on 5 and 285 DF, p-value: < 2.2e-16
Or, we could cluster on the original attributes in the data to find similar places based on their demographic variables:
dem.dist<-dist(spdat@data[, c("ppersonspo", "p65plus", "pblack_1", "phisp")])
#The hclust() function does basic hierarchical clustering
#according to several different methods, we'll use Ward's method
clust.dem<-hclust(dem.dist, method="ward.D")
#And we'll plot the dendrogram, or tree plot
plot(clust.dem)
#I only want a few groups, so I'll cut the tree so I get 4 clusters
spdat$clus<-cutree(clust.dem, k=5)
#i'll use table() to get the frequencies of each cluster
table(spdat$clus)
##
## 1 2 3 4 5
## 121 437 81 145 82
#to get it to plot right, we have to convert the cluster number
#to a factor variable
spdat$d.cf<-as.factor(spdat$clus)
spplot(spdat,"d.cf", col.regions=brewer.pal(5, "Accent"), main="Demographic Spatial Regimes", col="transparent")
#We can get the means of each coefficient for each of the regime areas
b.means<-aggregate(spdat@data[, c("ppersonspo", "p65plus", "pblack_1", "phisp")], by=list(spdat$d.cf), mean)
b.means
## Group.1 ppersonspo p65plus pblack_1 phisp
## 1 1 0.1870019 0.1347202 0.24460231 0.05091380
## 2 2 0.1333486 0.1601399 0.02868554 0.03607801
## 3 3 0.2691394 0.1334080 0.50796272 0.01291506
## 4 4 0.1592734 0.1560563 0.03786441 0.25512090
## 5 5 0.2359616 0.1297745 0.01963549 0.60438085
#fit separate ols models to each regime
lm1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==1)
lm2<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==2)
lm3<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==3)
lm4<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==4)
lm5<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==5)
#examine each of the spatial regimes
summary(lm1)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0040618 -0.0004403 -0.0000458 0.0004309 0.0032204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.058e-02 2.234e-04 47.347 < 2e-16 ***
## scale(ppersonspo) 4.268e-04 1.592e-04 2.681 0.00843 **
## scale(p65plus) 2.593e-03 1.437e-04 18.040 < 2e-16 ***
## scale(pblack_1) 9.025e-05 2.503e-04 0.361 0.71908
## scale(phisp) -1.407e-03 3.184e-04 -4.419 2.25e-05 ***
## I(RUCC >= 7)TRUE 5.660e-05 1.979e-04 0.286 0.77538
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0008934 on 115 degrees of freedom
## Multiple R-squared: 0.8305, Adjusted R-squared: 0.8231
## F-statistic: 112.7 on 5 and 115 DF, p-value: < 2.2e-16
summary(lm2)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0042864 -0.0005923 0.0000077 0.0006263 0.0033010
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.084e-02 1.871e-04 57.968 < 2e-16 ***
## scale(ppersonspo) 7.084e-04 7.846e-05 9.029 < 2e-16 ***
## scale(p65plus) 2.509e-03 5.758e-05 43.566 < 2e-16 ***
## scale(pblack_1) 5.618e-04 2.379e-04 2.362 0.018637 *
## scale(phisp) -7.625e-04 2.874e-04 -2.653 0.008270 **
## I(RUCC >= 7)TRUE -4.274e-04 1.197e-04 -3.570 0.000397 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00109 on 431 degrees of freedom
## Multiple R-squared: 0.8675, Adjusted R-squared: 0.866
## F-statistic: 564.4 on 5 and 431 DF, p-value: < 2.2e-16
summary(lm3)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.723e-03 -4.447e-04 2.352e-05 6.502e-04 1.918e-03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.039e-02 1.362e-03 7.630 6.02e-11 ***
## scale(ppersonspo) 4.289e-04 2.122e-04 2.021 0.0468 *
## scale(p65plus) 2.652e-03 2.594e-04 10.225 7.22e-16 ***
## scale(pblack_1) 2.052e-04 2.441e-04 0.841 0.4032
## scale(phisp) -8.199e-04 2.261e-03 -0.363 0.7179
## I(RUCC >= 7)TRUE 8.991e-05 2.584e-04 0.348 0.7288
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001042 on 75 degrees of freedom
## Multiple R-squared: 0.6484, Adjusted R-squared: 0.6249
## F-statistic: 27.66 on 5 and 75 DF, p-value: 9.194e-16
summary(lm4)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0043452 -0.0006337 0.0000419 0.0007731 0.0031712
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0108390 0.0002709 40.015 < 2e-16 ***
## scale(ppersonspo) 0.0009881 0.0002048 4.824 3.65e-06 ***
## scale(p65plus) 0.0021361 0.0001222 17.480 < 2e-16 ***
## scale(pblack_1) 0.0001032 0.0004490 0.230 0.81851
## scale(phisp) -0.0007386 0.0002557 -2.889 0.00449 **
## I(RUCC >= 7)TRUE -0.0005763 0.0002387 -2.414 0.01708 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001215 on 139 degrees of freedom
## Multiple R-squared: 0.8231, Adjusted R-squared: 0.8167
## F-statistic: 129.4 on 5 and 139 DF, p-value: < 2.2e-16
summary(lm5)
##
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) +
## scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat,
## subset = spdat$d.cf == 5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0040018 -0.0007696 -0.0000850 0.0008709 0.0025106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.044e-02 6.271e-04 16.643 < 2e-16 ***
## scale(ppersonspo) -7.492e-05 1.819e-04 -0.412 0.682
## scale(p65plus) 1.367e-03 2.009e-04 6.805 2.04e-09 ***
## scale(pblack_1) 1.380e-03 1.092e-03 1.264 0.210
## scale(phisp) -2.976e-04 2.380e-04 -1.251 0.215
## I(RUCC >= 7)TRUE 1.261e-04 3.070e-04 0.411 0.682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001247 on 76 degrees of freedom
## Multiple R-squared: 0.4972, Adjusted R-squared: 0.4642
## F-statistic: 15.03 on 5 and 76 DF, p-value: 3.025e-10