Geographically Weighted Regression

Generally, if we have a continuous outcome, we consider using the OLS model and when we have data collected over space, we have other assumptions too.

library(spdep)
library(maptools)
library(car)
library(lmtest)
library(spgwr)
library(classInt)
library(RColorBrewer)
setwd("/Users/ozd504/Google Drive/classes/dem7263/class17//data")

Stationarity is a general term in statistics/math, generally it means something doesn’t change over time or space e.g. stationary population e.g. stationary time series In spatial statistics, we can stay stationarity = homogeneity of an effect, or, that a process works the same regardless of where you observe the process. In spatial statistics, the latter can be a weak assumption, and we can ask, does X affect Y differently at different geographic locations, or in terms of parameters: If we estimate in OLS \(\beta\) = .5, are there locations in the data where \(\beta\) != .5?

Approaches to non-stationarity

There are several approaches to dealing with this problem. Among them are: * Create zones of homogeneity and stratify * Allow parameters to vary constantly

The first technique is stratification, or spatial regimes. This is where we stratify the data by either a known fixed quality, such as the state in which a county is located, or perhaps some other socio-demographic or socio-economic variable. Technique 1 - Stratification

We then fit separate regression models for each region. These could be OLS or some other specification we have discussed.

First, we read in our data and make a spatial representation of it. I only ues a few state for brevity.

spdat<-readShapePoly("/Users/ozd504/Google Drive/classes/dem7263/class17//data/usdata_mort.shp")
## Warning: use rgdal::readOGR or sf::st_read
spdat$population<-as.numeric(as.character(spdat$Population))
spdat$deaths<-as.numeric(as.character(spdat$Deaths))
spdat<-spdat[spdat$STATE%in%c("05","22", "20","08","29","28", "40","35", "48"),]
spdat$pov_group<-cut(spdat$ppersonspo, breaks = quantile(spdat$ppersonspo, p=c(0, .3, .6, 1)), include.lowest = T)
lvs<-levels(spdat$pov_group)
lvs
## [1] "[0.0212,0.129]" "(0.129,0.174]"  "(0.174,0.509]"
nbs<-knearneigh(coordinates(spdat), k=2)

nbsmg1<-knearneigh(coordinates(spdat[spdat$pov_group==lvs[1],]), k=2)
nbsmg2<-knearneigh(coordinates(spdat[spdat$pov_group==lvs[2],]), k=2)
nbsmg3<-knearneigh(coordinates(spdat[spdat$pov_group==lvs[3],]), k=2)

nbs<-knn2nb(nbs,sym=T)
nbs1<-knn2nb(nbsmg1, sym=T)
nbs2<-knn2nb(nbsmg2, sym=T)
nbs3<-knn2nb(nbsmg3, sym=T)

Here’s the basic OLS model:

fit.1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat)
summary(fit.1)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0071774 -0.0005667  0.0000313  0.0006072  0.0033003 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.064e-02  5.965e-05 178.450  < 2e-16 ***
## scale(ppersonspo)  5.956e-04  5.948e-05  10.015  < 2e-16 ***
## scale(p65plus)     2.370e-03  4.646e-05  51.008  < 2e-16 ***
## scale(pblack_1)    1.625e-04  6.103e-05   2.662 0.007903 ** 
## scale(phisp)      -6.559e-04  5.309e-05 -12.356  < 2e-16 ***
## I(RUCC >= 7)TRUE  -3.351e-04  8.593e-05  -3.900 0.000104 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001151 on 860 degrees of freedom
## Multiple R-squared:  0.8262, Adjusted R-squared:  0.8252 
## F-statistic: 817.7 on 5 and 860 DF,  p-value: < 2.2e-16

We could stratify by state:

#Examine basic regimes, using states as a nesting mechanism
spplot(spdat, "STATE")

fit.strat<-lm(I(deaths/population)~STATE/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat)
#summary(fit.strat)
anova(fit.1, fit.strat)
## Analysis of Variance Table
## 
## Model 1: I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + scale(pblack_1) + 
##     scale(phisp) + I(RUCC >= 7)
## Model 2: I(deaths/population) ~ STATE/(scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
##   Res.Df        RSS Df  Sum of Sq      F   Pr(>F)    
## 1    860 0.00113860                                  
## 2    812 0.00095373 48 0.00018487 3.2791 4.18e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

which gives us 50 regression models, one for each state.

Or, we could examine an alternative regime, based on a demographic characteristic, say poverty:

spplot(spdat, "pov_group", col.regions=brewer.pal(n=3, "Greys"), col="transparent")

fit.strat2<-lm(I(deaths/population)~pov_group/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat)
summary(fit.strat2)
## 
## Call:
## lm(formula = I(deaths/population) ~ pov_group/(scale(ppersonspo) + 
##     scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 
##     7)), data = spdat)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0062054 -0.0005616  0.0000257  0.0006351  0.0032782 
## 
## Coefficients:
##                                             Estimate Std. Error t value
## (Intercept)                                1.076e-02  2.498e-04  43.101
## pov_group(0.129,0.174]                     4.674e-05  2.867e-04   0.163
## pov_group(0.174,0.509]                     7.295e-05  2.743e-04   0.266
## pov_group[0.0212,0.129]:scale(ppersonspo)  7.522e-04  2.437e-04   3.086
## pov_group(0.129,0.174]:scale(ppersonspo)   7.759e-04  3.764e-04   2.061
## pov_group(0.174,0.509]:scale(ppersonspo)   2.604e-04  1.016e-04   2.563
## pov_group[0.0212,0.129]:scale(p65plus)     2.447e-03  7.858e-05  31.144
## pov_group(0.129,0.174]:scale(p65plus)      2.128e-03  9.714e-05  21.905
## pov_group(0.174,0.509]:scale(p65plus)      2.371e-03  8.528e-05  27.804
## pov_group[0.0212,0.129]:scale(pblack_1)    2.535e-04  2.842e-04   0.892
## pov_group(0.129,0.174]:scale(pblack_1)     1.214e-04  1.542e-04   0.787
## pov_group(0.174,0.509]:scale(pblack_1)     2.069e-04  7.316e-05   2.828
## pov_group[0.0212,0.129]:scale(phisp)      -5.357e-04  1.670e-04  -3.208
## pov_group(0.129,0.174]:scale(phisp)       -7.779e-04  1.029e-04  -7.558
## pov_group(0.174,0.509]:scale(phisp)       -5.976e-04  6.734e-05  -8.874
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE  -3.447e-04  1.671e-04  -2.063
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE   -4.785e-04  1.646e-04  -2.906
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE   -7.200e-05  1.310e-04  -0.550
##                                           Pr(>|t|)    
## (Intercept)                                < 2e-16 ***
## pov_group(0.129,0.174]                     0.87054    
## pov_group(0.174,0.509]                     0.79035    
## pov_group[0.0212,0.129]:scale(ppersonspo)  0.00209 ** 
## pov_group(0.129,0.174]:scale(ppersonspo)   0.03958 *  
## pov_group(0.174,0.509]:scale(ppersonspo)   0.01054 *  
## pov_group[0.0212,0.129]:scale(p65plus)     < 2e-16 ***
## pov_group(0.129,0.174]:scale(p65plus)      < 2e-16 ***
## pov_group(0.174,0.509]:scale(p65plus)      < 2e-16 ***
## pov_group[0.0212,0.129]:scale(pblack_1)    0.37254    
## pov_group(0.129,0.174]:scale(pblack_1)     0.43144    
## pov_group(0.174,0.509]:scale(pblack_1)     0.00479 ** 
## pov_group[0.0212,0.129]:scale(phisp)       0.00139 ** 
## pov_group(0.129,0.174]:scale(phisp)       1.06e-13 ***
## pov_group(0.174,0.509]:scale(phisp)        < 2e-16 ***
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE   0.03942 *  
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE    0.00375 ** 
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE    0.58267    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001134 on 848 degrees of freedom
## Multiple R-squared:  0.8335, Adjusted R-squared:  0.8302 
## F-statistic: 249.7 on 17 and 848 DF,  p-value: < 2.2e-16
anova(fit.1, fit.strat2)
## Analysis of Variance Table
## 
## Model 1: I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + scale(pblack_1) + 
##     scale(phisp) + I(RUCC >= 7)
## Model 2: I(deaths/population) ~ pov_group/(scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
##   Res.Df       RSS Df  Sum of Sq      F    Pr(>F)    
## 1    860 0.0011386                                   
## 2    848 0.0010909 12 4.7738e-05 3.0925 0.0002658 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(fit.strat2, fit.strat)
## Analysis of Variance Table
## 
## Model 1: I(deaths/population) ~ pov_group/(scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
## Model 2: I(deaths/population) ~ STATE/(scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7))
##   Res.Df        RSS Df  Sum of Sq      F    Pr(>F)    
## 1    848 0.00109086                                   
## 2    812 0.00095373 36 0.00013713 3.2432 1.263e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(spdat, main="Neighbors for low poverty counties")
plot(nbs1,coords=coordinates(spdat[spdat$pov_group==lvs[1],]), col=2, add=T)

plot(spdat, main="Neighbors for medium poverty counties")
plot(nbs2,coords=coordinates(spdat[spdat$pov_group==lvs[2],]), col=2, add=T)

plot(spdat, main="Neighbors for high poverty counties")
plot(nbs3,coords=coordinates(spdat[spdat$pov_group==lvs[3],]), col=2, add=T)

wts<-nb2listw(nbs)
wts1<-nb2listw(nbs1)
wts2<-nb2listw(nbs2)
wts3<-nb2listw(nbs3)

So we see non-stationarity in several of the covariates, with respect to state. The effects of poverty, age structure and race/ethnicty all vary significantly by state. Similar results are also seen for the poverty rate grouping variable, althought the final model comparison of the two stratified models shows more variation between states than between poverty groups.

If we wanted, we could stratify our analysis by state. Below, I simply stratify the analysis by state and use the OLS model.

fit.ar<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="05")
summary(fit.ar)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = STATE == "05")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.356e-03 -4.771e-04  3.499e-05  5.518e-04  2.209e-03 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.0100603  0.0003889  25.870  < 2e-16 ***
## scale(ppersonspo)  0.0007007  0.0002263   3.097  0.00283 ** 
## scale(p65plus)     0.0020609  0.0001736  11.871  < 2e-16 ***
## scale(pblack_1)    0.0002793  0.0001576   1.773  0.08072 .  
## scale(phisp)      -0.0015382  0.0006603  -2.329  0.02277 *  
## I(RUCC >= 7)TRUE   0.0002247  0.0002492   0.902  0.37037    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0008916 on 69 degrees of freedom
## Multiple R-squared:  0.8106, Adjusted R-squared:  0.7969 
## F-statistic: 59.08 on 5 and 69 DF,  p-value: < 2.2e-16
fit.la<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="22")
summary(fit.la)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = STATE == "22")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0035436 -0.0003792 -0.0000223  0.0005726  0.0018121 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.0094746  0.0011208   8.454 1.07e-11 ***
## scale(ppersonspo)  0.0003456  0.0002375   1.455    0.151    
## scale(p65plus)     0.0024852  0.0002506   9.916 4.25e-14 ***
## scale(pblack_1)    0.0001109  0.0001934   0.573    0.569    
## scale(phisp)      -0.0026665  0.0018815  -1.417    0.162    
## I(RUCC >= 7)TRUE   0.0003250  0.0003047   1.067    0.291    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0009234 on 58 degrees of freedom
## Multiple R-squared:  0.7974, Adjusted R-squared:  0.7799 
## F-statistic: 45.65 on 5 and 58 DF,  p-value: < 2.2e-16
fit.ok<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="40")
summary(fit.ok)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = STATE == "40")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -3.029e-03 -5.192e-04 -5.854e-05  6.775e-04  2.030e-03 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.0115100  0.0002970  38.755  < 2e-16 ***
## scale(ppersonspo)  0.0008804  0.0001885   4.671 1.38e-05 ***
## scale(p65plus)     0.0026697  0.0001781  14.993  < 2e-16 ***
## scale(pblack_1)    0.0002233  0.0004935   0.452    0.652    
## scale(phisp)       0.0001023  0.0004555   0.225    0.823    
## I(RUCC >= 7)TRUE  -0.0002993  0.0002681  -1.116    0.268    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0009816 on 71 degrees of freedom
## Multiple R-squared:  0.8346, Adjusted R-squared:  0.8229 
## F-statistic: 71.65 on 5 and 71 DF,  p-value: < 2.2e-16
fit.nm<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="35")
summary(fit.nm)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = STATE == "35")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0044801 -0.0007503 -0.0000290  0.0008573  0.0043913 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.0094413  0.0021653   4.360  0.00017 ***
## scale(ppersonspo)  0.0005935  0.0003167   1.874  0.07175 .  
## scale(p65plus)     0.0016622  0.0002713   6.126 1.52e-06 ***
## scale(pblack_1)    0.0007923  0.0037894   0.209  0.83595    
## scale(phisp)       0.0001598  0.0003058   0.522  0.60560    
## I(RUCC >= 7)TRUE  -0.0005091  0.0006664  -0.764  0.45152    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001638 on 27 degrees of freedom
## Multiple R-squared:  0.6221, Adjusted R-squared:  0.5521 
## F-statistic:  8.89 on 5 and 27 DF,  p-value: 4.382e-05
fit.tx<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, subset=STATE=="48")
summary(fit.tx)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = STATE == "48")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0042947 -0.0005506  0.0000912  0.0006799  0.0029490 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.049e-02  1.146e-04  91.530  < 2e-16 ***
## scale(ppersonspo)  2.798e-04  1.228e-04   2.278 0.023568 *  
## scale(p65plus)     2.351e-03  9.295e-05  25.292  < 2e-16 ***
## scale(pblack_1)    4.812e-04  1.997e-04   2.410 0.016702 *  
## scale(phisp)      -4.304e-04  1.184e-04  -3.636 0.000337 ***
## I(RUCC >= 7)TRUE  -3.663e-04  1.676e-04  -2.186 0.029786 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00111 on 246 degrees of freedom
## Multiple R-squared:  0.8453, Adjusted R-squared:  0.8422 
## F-statistic: 268.9 on 5 and 246 DF,  p-value: < 2.2e-16

Or, we could run a SAR model for each subset using the weights we created above:

#first the "global" model
efit<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat, listw=wts, method="MC")

Then, a regime specification as used by Brazil 2015

efit0<-lagsarlm(scale(I(deaths/population))~pov_group/(scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7)), data=spdat, listw=wts, method="MC")
summary(efit0)
## 
## Call:
## lagsarlm(formula = scale(I(deaths/population)) ~ pov_group/(scale(ppersonspo) + 
##     scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 
##     7)), data = spdat, listw = wts, method = "MC")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.2296199 -0.2000084  0.0083389  0.2228830  1.1937891 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                                            Estimate Std. Error z value
## (Intercept)                                0.089600   0.088193  1.0160
## pov_group(0.129,0.174]                     0.024349   0.101172  0.2407
## pov_group(0.174,0.509]                     0.045556   0.096811  0.4706
## pov_group[0.0212,0.129]:scale(ppersonspo)  0.230477   0.086402  2.6675
## pov_group(0.129,0.174]:scale(ppersonspo)   0.229471   0.132925  1.7263
## pov_group(0.174,0.509]:scale(ppersonspo)   0.091278   0.035816  2.5485
## pov_group[0.0212,0.129]:scale(p65plus)     0.833853   0.029727 28.0500
## pov_group(0.129,0.174]:scale(p65plus)      0.732307   0.035260 20.7685
## pov_group(0.174,0.509]:scale(p65plus)      0.829172   0.030658 27.0462
## pov_group[0.0212,0.129]:scale(pblack_1)    0.093496   0.100192  0.9332
## pov_group(0.129,0.174]:scale(pblack_1)     0.058091   0.054372  1.0684
## pov_group(0.174,0.509]:scale(pblack_1)     0.079272   0.025799  3.0727
## pov_group[0.0212,0.129]:scale(phisp)      -0.150249   0.059375 -2.5305
## pov_group(0.129,0.174]:scale(phisp)       -0.247997   0.036793 -6.7404
## pov_group(0.174,0.509]:scale(phisp)       -0.186614   0.024301 -7.6791
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE  -0.116687   0.058925 -1.9803
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE   -0.179497   0.058067 -3.0912
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE   -0.052215   0.046358 -1.1264
##                                            Pr(>|z|)
## (Intercept)                                0.309652
## pov_group(0.129,0.174]                     0.809811
## pov_group(0.174,0.509]                     0.637951
## pov_group[0.0212,0.129]:scale(ppersonspo)  0.007642
## pov_group(0.129,0.174]:scale(ppersonspo)   0.084290
## pov_group(0.174,0.509]:scale(ppersonspo)   0.010818
## pov_group[0.0212,0.129]:scale(p65plus)    < 2.2e-16
## pov_group(0.129,0.174]:scale(p65plus)     < 2.2e-16
## pov_group(0.174,0.509]:scale(p65plus)     < 2.2e-16
## pov_group[0.0212,0.129]:scale(pblack_1)    0.350732
## pov_group(0.129,0.174]:scale(pblack_1)     0.285332
## pov_group(0.174,0.509]:scale(pblack_1)     0.002122
## pov_group[0.0212,0.129]:scale(phisp)       0.011390
## pov_group(0.129,0.174]:scale(phisp)       1.579e-11
## pov_group(0.174,0.509]:scale(phisp)       1.599e-14
## pov_group[0.0212,0.129]:I(RUCC >= 7)TRUE   0.047673
## pov_group(0.129,0.174]:I(RUCC >= 7)TRUE    0.001993
## pov_group(0.174,0.509]:I(RUCC >= 7)TRUE    0.260013
## 
## Rho: 0.11712, LR test value: 29.179, p-value: 6.6006e-08
## Asymptotic standard error: 0.021705
##     z-value: 5.3963, p-value: 6.8029e-08
## Wald statistic: 29.12, p-value: 6.8029e-08
## 
## Log likelihood: -437.462 for lag model
## ML residual variance (sigma squared): 0.15993, (sigma: 0.39991)
## Number of observations: 866 
## Number of parameters estimated: 20 
## AIC: 914.92, (AIC for lm: 942.1)
## LM test for residual autocorrelation
## test value: 4.5618, p-value: 0.032693
anova(efit, efit0)
##       Model df    AIC  logLik Test L.Ratio   p-value
## efit      1  8 922.52 -453.26    1                  
## efit0     2 20 914.92 -437.46    2  31.598 0.0015959

Or, I could fit the fully stratified models, one regime at a time by subsetting and using the group-specific spatial weights:

efit1<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$pov_group==lvs[1],], listw=wts1, method="MC")

efit2<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$pov_group==lvs[2],], listw=wts2, method="MC")

efit3<-lagsarlm(scale(I(deaths/population))~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), data=spdat[spdat$pov_group==lvs[3],], listw=wts3, method="MC")

summary(efit1)
## 
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) + 
##     scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 
##     7), data = spdat[spdat$pov_group == lvs[1], ], listw = wts1, 
##     method = "MC")
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -1.2400498 -0.1599200 -0.0049624  0.1863141  0.9686230 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                    Estimate Std. Error z value  Pr(>|z|)
## (Intercept)        0.055588   0.030362  1.8308  0.067126
## scale(ppersonspo)  0.073129   0.023164  3.1570  0.001594
## scale(p65plus)     0.868721   0.031857 27.2695 < 2.2e-16
## scale(pblack_1)    0.021391   0.022688  0.9429  0.345756
## scale(phisp)      -0.069007   0.022166 -3.1132  0.001851
## I(RUCC >= 7)TRUE  -0.103967   0.045610 -2.2795  0.022639
## 
## Rho: 0.093594, LR test value: 9.0027, p-value: 0.0026958
## Asymptotic standard error: 0.031567
##     z-value: 2.9649, p-value: 0.003028
## Wald statistic: 8.7905, p-value: 0.003028
## 
## Log likelihood: -64.50503 for lag model
## ML residual variance (sigma squared): 0.095837, (sigma: 0.30958)
## Number of observations: 260 
## Number of parameters estimated: 8 
## AIC: 145.01, (AIC for lm: 152.01)
## LM test for residual autocorrelation
## test value: 10.272, p-value: 0.0013508
summary(efit2)
## 
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) + 
##     scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 
##     7), data = spdat[spdat$pov_group == lvs[2], ], listw = wts2, 
##     method = "MC")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.617670 -0.222604 -0.011701  0.263513  1.339840 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                    Estimate Std. Error z value  Pr(>|z|)
## (Intercept)        0.092452   0.045575  2.0285  0.042505
## scale(ppersonspo)  0.061688   0.031415  1.9636  0.049573
## scale(p65plus)     0.818666   0.038946 21.0203 < 2.2e-16
## scale(pblack_1)    0.038639   0.035990  1.0736  0.283012
## scale(phisp)      -0.225453   0.033643 -6.7013 2.066e-11
## I(RUCC >= 7)TRUE  -0.193816   0.070547 -2.7473  0.006008
## 
## Rho: 0.068926, LR test value: 2.5758, p-value: 0.10851
## Asymptotic standard error: 0.04315
##     z-value: 1.5973, p-value: 0.11019
## Wald statistic: 2.5515, p-value: 0.11019
## 
## Log likelihood: -181.0869 for lag model
## ML residual variance (sigma squared): 0.23532, (sigma: 0.4851)
## Number of observations: 260 
## Number of parameters estimated: 8 
## AIC: 378.17, (AIC for lm: 378.75)
## LM test for residual autocorrelation
## test value: 1.8726, p-value: 0.17118
summary(efit3)
## 
## Call:lagsarlm(formula = scale(I(deaths/population)) ~ scale(ppersonspo) + 
##     scale(p65plus) + scale(pblack_1) + scale(phisp) + I(RUCC >= 
##     7), data = spdat[spdat$pov_group == lvs[3], ], listw = wts3, 
##     method = "MC")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -1.858575 -0.238619  0.027251  0.264560  1.337662 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                    Estimate Std. Error z value  Pr(>|z|)
## (Intercept)        0.022702   0.039200  0.5791  0.562502
## scale(ppersonspo)  0.084316   0.033428  2.5223  0.011658
## scale(p65plus)     0.773182   0.030059 25.7218 < 2.2e-16
## scale(pblack_1)    0.116802   0.039375  2.9664  0.003013
## scale(phisp)      -0.275942   0.039832 -6.9277 4.277e-12
## I(RUCC >= 7)TRUE  -0.042717   0.053836 -0.7935  0.427509
## 
## Rho: 0.10926, LR test value: 8.3474, p-value: 0.0038624
## Asymptotic standard error: 0.036687
##     z-value: 2.9783, p-value: 0.0028984
## Wald statistic: 8.8703, p-value: 0.0028984
## 
## Log likelihood: -226.843 for lag model
## ML residual variance (sigma squared): 0.21626, (sigma: 0.46504)
## Number of observations: 346 
## Number of parameters estimated: 8 
## AIC: 469.69, (AIC for lm: 476.03)
## LM test for residual autocorrelation
## test value: 0.64047, p-value: 0.42354

GWR Model

First proposed by Brunsdon et al 1996

Their model was specified, where now each \(\beta_k\) is estimated at the location \(u_i,v_j\) where i and j are the coordinates or geographic location of the observation i. \(\beta_k(ui,vj)\) is the local realization of \(\beta\) at location i,j. This constructs a trend surface of parameter values for each independent variable and the model intercept.
Note that the basic OLS regression model above is just a special case of the GWR model where the coefficients are constant over space. The parameters in the GWR are estimated by weighted least squares. The weighting matrix is a diagonal matrix, with each diagonal element being a function of the location of the observation. if Wi is the weighting matrix at location i, then the parameter estimate at that location would be

\(\hat \beta_i = (X'WX)^{-1} X'WY\)

The role of the weight matrix is to give more value to observations that are close to i. as it is assumed that observations that are close will influence each other more than those that are far away. This is NOT the same W matrix as was used in Moran’s I!

W is generally constructed based on the distances between locations, and if \(d_ij\) < d , then \(w_{ij}\)=1, and 0 otherwise. Brunsdson, Charlton, and Fotheringham used a kernel-density approach to measure W

\(w_{ij} = exp(- \frac{1}{2}(d_{ij}/b)^2)\)

where b is the bandwidth parameter to be estimated. under this weighting scheme, the weight for points that exactly co-occur will be 1 and for those that are the distance \(d_{ij}\) apart, the weight will decrease as a Gaussian curve as \(d_{ij}\) increases.

GWR Kernels

GWR Kernels

There are several ways to do this, but most programs use a cross-validation method to choose the optimal kernel bandwidth. The cross validation method takes the form: \(CV = \Sigma_i \left[y_i - \hat y_{\neq i} (\beta) \right ]^2\)

Where \(\hat y_{\neq i} (\beta)\) is the fitted value of \(y_i\) with the observations for point i omitted from the calibration process. This in effect minimizes the sum of squared errors at all locations i, and arrives at an optimal bandwidth.

When the model is fit, we can also calculate local z/t tests to test whether the effect of a specifix X variable is significant at a particular location. Likewise, we can calculate a local \(R^2\) measure at each location.

Leung et al 2000 proposed a series of F tests for model improvement over OLS, and they also derive an F test for variation in the beta coefficients (so called F3 test.

GWR Modeling

The advantage of GWR is that we can visualize the difference in the covariate effects as a continuous surface. There are a few steps to fitting the GWR model. First we need to select a bandwidth for the weighting algorithm

gwr.b.f<-gwr.sel(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat)
## Bandwidth: 969429.9 CV score: 0.00113975 
## Bandwidth: 1567004 CV score: 0.001150311 
## Bandwidth: 600108.5 CV score: 0.001123044 
## Bandwidth: 371855.3 CV score: 0.001093567 
## Bandwidth: 230787.1 CV score: 0.001058121 
## Bandwidth: 143602.1 CV score: 0.001041474 
## Bandwidth: 89718.85 CV score: 0.001110606 
## Bandwidth: 176903.8 CV score: 0.001044459 
## Bandwidth: 136418.3 CV score: 0.001043043 
## Bandwidth: 154363.4 CV score: 0.001041077 
## Bandwidth: 150804.8 CV score: 0.001041 
## Bandwidth: 151244.2 CV score: 0.001040999 
## Bandwidth: 151072.9 CV score: 0.001040999 
## Bandwidth: 151070.5 CV score: 0.001040999 
## Bandwidth: 151070.2 CV score: 0.001040999 
## Bandwidth: 151070.2 CV score: 0.001040999 
## Bandwidth: 151070.2 CV score: 0.001040999 
## Bandwidth: 151070.2 CV score: 0.001040999 
## Bandwidth: 151070.2 CV score: 0.001040999
gwr.b.f
## [1] 151070.2
#this is the distance (in meters, because our data are projected in a system measured in meters), which the weighting fuction will search, and include all observations within this radius.

gwr.f<-gwr(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, bandwidth=gwr.b.f, se.fit=T, hatmatrix=T)
gwr.f
## Call:
## gwr(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     bandwidth = gwr.b.f, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss 
## Fixed bandwidth: 151070.2 
## Summary of GWR coefficient estimates at data points:
##                         Min.    1st Qu.     Median    3rd Qu.       Max.
## X.Intercept.       5.536e-03  1.021e-02  1.050e-02  1.084e-02  1.217e-02
## scale.ppersonspo. -4.253e-04  4.215e-04  5.645e-04  7.620e-04  1.274e-03
## scale.p65plus.     1.458e-03  2.218e-03  2.351e-03  2.456e-03  2.841e-03
## scale.pblack_1.   -6.664e-03 -2.847e-06  1.390e-04  3.199e-04  4.986e-03
## scale.phisp.      -3.435e-03 -1.015e-03 -8.177e-04 -3.383e-04  9.259e-04
## I.RUCC....7.TRUE  -1.389e-03 -4.151e-04 -2.689e-04 -5.552e-05  9.886e-04
##                    Global
## X.Intercept.       0.0106
## scale.ppersonspo.  0.0006
## scale.p65plus.     0.0024
## scale.pblack_1.    0.0002
## scale.phisp.      -0.0007
## I.RUCC....7.TRUE  -0.0003
## Number of data points: 866 
## Effective number of parameters (residual: 2traceS - traceS'S): 118.3647 
## Effective degrees of freedom (residual: 2traceS - traceS'S): 747.6353 
## Sigma (residual: 2traceS - traceS'S): 0.001017698 
## Effective number of parameters (model: traceS): 84.46943 
## Effective degrees of freedom (model: traceS): 781.5306 
## Sigma (model: traceS): 0.000995384 
## Sigma (ML): 0.000945594 
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -9413.622 
## AIC (GWR p. 96, eq. 4.22): -9519.053 
## Residual sum of squares: 0.0007743321 
## Quasi-global R2: 0.8818083
BFC02.gwr.test(gwr.f)
## 
##  Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
## 
## data:  gwr.f
## F = 1.4704, df1 = 860.00, df2 = 747.64, p-value = 3.185e-08
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals 
##     0.0011386013     0.0007743321
BFC99.gwr.test(gwr.f)
## 
##  Brunsdon, Fotheringham & Charlton (1999) ANOVA
## 
## data:  gwr.f
## F = 3.1301, df1 = 501.02, df2 = 790.40, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement   SS GWR residuals 
##       0.0003642692       0.0007743321
LMZ.F1GWR.test(gwr.f)
## 
##  Leung et al. (2000) F(1) test
## 
## data:  gwr.f
## F = 0.78228, df1 = 790.4, df2 = 860.0, p-value = 0.0002232
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals 
##     0.0011386013     0.0007743321
LMZ.F2GWR.test(gwr.f)
## 
##  Leung et al. (2000) F(2) test
## 
## data:  gwr.f
## F = 2.4486, df1 = 175.56, df2 = 860.00, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
##   SS OLS residuals SS GWR improvement 
##       0.0011386013       0.0003642692
LMZ.F3GWR.test(gwr.f)
## 
## Leung et al. (2000) F(3) test
## 
##                   F statistic Numerator d.f. Denominator d.f.     Pr(>)
## (Intercept)            1.1840        46.7226            790.4    0.1906
## scale(ppersonspo)      2.6035       210.0455            790.4 < 2.2e-16
## scale(p65plus)         1.8851       227.5250            790.4 1.475e-10
## scale(pblack_1)        1.3051        29.9402            790.4    0.1285
## scale(phisp)           1.5632        33.9424            790.4    0.0225
## I(RUCC >= 7)TRUE       1.6855       297.4143            790.4 8.846e-09
##                      
## (Intercept)          
## scale(ppersonspo) ***
## scale(p65plus)    ***
## scale(pblack_1)      
## scale(phisp)      *  
## I(RUCC >= 7)TRUE  ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gwr.dat<-gwr.f$SDF
cols<-brewer.pal(n=6, name="RdBu")
names(gwr.dat)
##  [1] "sum.w"                    "X.Intercept."            
##  [3] "scale.ppersonspo."        "scale.p65plus."          
##  [5] "scale.pblack_1."          "scale.phisp."            
##  [7] "I.RUCC....7.TRUE"         "X.Intercept._se"         
##  [9] "scale.ppersonspo._se"     "scale.p65plus._se"       
## [11] "scale.pblack_1._se"       "scale.phisp._se"         
## [13] "I.RUCC....7.TRUE_se"      "gwr.e"                   
## [15] "pred"                     "pred.se"                 
## [17] "localR2"                  "X.Intercept._se_EDF"     
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"   
## [21] "scale.pblack_1._se_EDF"   "scale.phisp._se_EDF"     
## [23] "I.RUCC....7.TRUE_se_EDF"  "pred.se_EDF"
#pdf("gwrfixed.pdf")
spplot(gwr.dat, "scale.ppersonspo.", at=quantile(gwr.f$SDF$scale.ppersonspo.), col.regions=cols, main="% Poverty effect", col="transparent")

spplot(gwr.dat, "scale.pblack_1.", at=quantile(gwr.f$SDF$scale.pblack_1.), col.regions=cols, main="% Black effect", col="transparent")

spplot(gwr.dat, "scale.phisp.", at=quantile(gwr.f$SDF$scale.phisp.), col.regions=cols, main="% Hispanic effect", col="transparent")

#another "lighter" plot method
#brks<-c(-3.038e-04,  3.441e-04,  5.882e-04,  7.912e-04 , 1.355e-03); cols<-brewer.pal(n=5,"Reds")
#plot(gwr.a$SDF, col=cols[findInterval(gwr.a$SDF$scale.ppersonspo.,brks, all.inside=TRUE)])

#Local R^2
brks<-seq(0, 1, length.out = 9); cols=brewer.pal(n=9, "Reds")
spplot(gwr.f$SDF, "localR2", at=brks, col.regions=cols, main="local R^2", col="transparent")

#plots of the correlations of the coefficients
pairs(gwr.f$SDF[,3:7])

cor(as(gwr.f$SDF[,3:7], "data.frame"))
##                   scale.ppersonspo. scale.p65plus. scale.pblack_1.
## scale.ppersonspo.        1.00000000    -0.28695573     -0.05725905
## scale.p65plus.          -0.28695573     1.00000000     -0.32080137
## scale.pblack_1.         -0.05725905    -0.32080137      1.00000000
## scale.phisp.            -0.45583825    -0.07608532      0.28078030
## I.RUCC....7.TRUE        -0.11568366    -0.03801324     -0.32057620
##                   scale.phisp. I.RUCC....7.TRUE
## scale.ppersonspo.  -0.45583825      -0.11568366
## scale.p65plus.     -0.07608532      -0.03801324
## scale.pblack_1.     0.28078030      -0.32057620
## scale.phisp.        1.00000000      -0.21497469
## I.RUCC....7.TRUE   -0.21497469       1.00000000

This plot shows a problem in GWR, that the coefficients can often be highly correlated See Wheeler and Tiefelsdorf (2005), one of the major critiques of GWR. In general, the critiques of GWR have pushed most users to use it for “exploratory” purposes only. So, one way we may explore our model is to examine where the regression effects are “significant” versus not. This can be done by plotting the local t-statistics, or z-statistics if you prefer.

cols<-brewer.pal(8,"RdBu")
dfree<-gwr.f$results$edf

names(gwr.dat)
##  [1] "sum.w"                    "X.Intercept."            
##  [3] "scale.ppersonspo."        "scale.p65plus."          
##  [5] "scale.pblack_1."          "scale.phisp."            
##  [7] "I.RUCC....7.TRUE"         "X.Intercept._se"         
##  [9] "scale.ppersonspo._se"     "scale.p65plus._se"       
## [11] "scale.pblack_1._se"       "scale.phisp._se"         
## [13] "I.RUCC....7.TRUE_se"      "gwr.e"                   
## [15] "pred"                     "pred.se"                 
## [17] "localR2"                  "X.Intercept._se_EDF"     
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"   
## [21] "scale.pblack_1._se_EDF"   "scale.phisp._se_EDF"     
## [23] "I.RUCC....7.TRUE_se_EDF"  "pred.se_EDF"
gwr.dat$pov.t<-gwr.dat$scale.ppersonspo./gwr.dat$scale.ppersonspo._se
gwr.dat$pov.t.p<-2*pt(-abs(gwr.dat$pov.t)  , dfree)
tbrks<- c(min(gwr.dat$pov.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$pov.t))
spplot(gwr.dat, "pov.t", col.regions=cols, at=tbrks , main="t-stat %Poverty", col="transparent")

gwr.dat$black.t<-gwr.dat$scale.pblack_1./gwr.dat$scale.pblack_1._se
gwr.dat$black.t.p<-2*pt(-abs(gwr.dat$black.t)  , dfree)
tbrks<- c(min(gwr.dat$black.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$black.t))
spplot(gwr.dat, "black.t", col.regions=cols,at=tbrks , main="t-stat %Black", col="transparent")

gwr.dat$hisp.t<-gwr.dat$scale.phisp./gwr.dat$scale.phisp._se
gwr.dat$hisp.t.p<-2*pt(-abs(gwr.dat$hisp.t)  , dfree)
tbrks<- c(min(gwr.dat$hisp.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.dat$hisp.t))
spplot(gwr.dat, "hisp.t", col.regions=cols, at=tbrks, main="t-stat %Hispanic", col="transparent")

So, those are the results from the fixed bandwith analysis. But there are obviously places in our data where counties are more densely occurring, so an adaptive kernel approach is also warranted. If all our data were equally spaced, then this may not be necessary.

Now we try everthing with the adaptive bandwidth:

gwr.b.a<-gwr.sel(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, adapt=T)
## Adaptive q: 0.381966 CV score: 0.001136242 
## Adaptive q: 0.618034 CV score: 0.001142128 
## Adaptive q: 0.236068 CV score: 0.001126485 
## Adaptive q: 0.145898 CV score: 0.00111129 
## Adaptive q: 0.09016994 CV score: 0.001092296 
## Adaptive q: 0.05572809 CV score: 0.001062696 
## Adaptive q: 0.03444185 CV score: 0.001034263 
## Adaptive q: 0.02128624 CV score: 0.001018798 
## Adaptive q: 0.01315562 CV score: 0.00103849 
## Adaptive q: 0.0243862 CV score: 0.001023387 
## Adaptive q: 0.01818062 CV score: 0.001027968 
## Adaptive q: 0.02180007 CV score: 0.001018992 
## Adaptive q: 0.02110114 CV score: 0.001018775 
## Adaptive q: 0.0210191 CV score: 0.001018773 
## Adaptive q: 0.02097841 CV score: 0.001018774 
## Adaptive q: 0.0210598 CV score: 0.001018773 
## Adaptive q: 0.0210191 CV score: 0.001018773
gwr.b.a
## [1] 0.0210191

This is the proportion of all cases which the weighting fuction will search, and include this fraction of observations in a model for each county. In this case it will include 18.2025446 counties in the model for a particular county.

gwr.a<-gwr(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7), spdat, adapt=gwr.b.a, se.fit=T, hatmatrix=T )
gwr.a
## Call:
## gwr(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     adapt = gwr.b.a, hatmatrix = T, se.fit = T)
## Kernel function: gwr.Gauss 
## Adaptive quantile: 0.0210191 (about 18 of 866 data points)
## Summary of GWR coefficient estimates at data points:
##                         Min.    1st Qu.     Median    3rd Qu.       Max.
## X.Intercept.       7.715e-03  9.985e-03  1.046e-02  1.085e-02  1.233e-02
## scale.ppersonspo. -6.379e-04  3.358e-04  5.552e-04  7.810e-04  1.369e-03
## scale.p65plus.     1.664e-03  2.201e-03  2.348e-03  2.498e-03  3.195e-03
## scale.pblack_1.   -1.737e-03 -2.639e-05  1.697e-04  4.175e-04  3.495e-03
## scale.phisp.      -5.445e-03 -1.196e-03 -7.769e-04 -1.955e-04  1.793e-03
## I.RUCC....7.TRUE  -1.383e-03 -4.946e-04 -2.727e-04 -1.530e-05  4.533e-04
##                    Global
## X.Intercept.       0.0106
## scale.ppersonspo.  0.0006
## scale.p65plus.     0.0024
## scale.pblack_1.    0.0002
## scale.phisp.      -0.0007
## I.RUCC....7.TRUE  -0.0003
## Number of data points: 866 
## Effective number of parameters (residual: 2traceS - traceS'S): 159.6773 
## Effective degrees of freedom (residual: 2traceS - traceS'S): 706.3227 
## Sigma (residual: 2traceS - traceS'S): 0.001027636 
## Effective number of parameters (model: traceS): 113.9685 
## Effective degrees of freedom (model: traceS): 752.0315 
## Sigma (model: traceS): 0.0009959163 
## Sigma (ML): 0.0009280725 
## AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -9370.427 
## AIC (GWR p. 96, eq. 4.22): -9521.948 
## Residual sum of squares: 0.000745902 
## Quasi-global R2: 0.8861478
BFC02.gwr.test(gwr.a)
## 
##  Brunsdon, Fotheringham & Charlton (2002, pp. 91-2) ANOVA
## 
## data:  gwr.a
## F = 1.5265, df1 = 860.00, df2 = 706.32, p-value = 2.871e-09
## alternative hypothesis: greater
## sample estimates:
## SS OLS residuals SS GWR residuals 
##      0.001138601      0.000745902
BFC99.gwr.test(gwr.a)
## 
##  Brunsdon, Fotheringham & Charlton (1999) ANOVA
## 
## data:  gwr.a
## F = 2.4198, df1 = 641.80, df2 = 766.23, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## SS GWR improvement   SS GWR residuals 
##       0.0003926993       0.0007459020
LMZ.F1GWR.test(gwr.a)
## 
##  Leung et al. (2000) F(1) test
## 
## data:  gwr.a
## F = 0.79764, df1 = 766.23, df2 = 860.00, p-value = 0.0006739
## alternative hypothesis: less
## sample estimates:
## SS OLS residuals SS GWR residuals 
##      0.001138601      0.000745902
LMZ.F2GWR.test(gwr.a)
## 
##  Leung et al. (2000) F(2) test
## 
## data:  gwr.a
## F = 1.9301, df1 = 239.87, df2 = 860.00, p-value = 7.202e-12
## alternative hypothesis: greater
## sample estimates:
##   SS OLS residuals SS GWR improvement 
##       0.0011386013       0.0003926993
LMZ.F3GWR.test(gwr.a)
## 
## Leung et al. (2000) F(3) test
## 
##                   F statistic Numerator d.f. Denominator d.f.     Pr(>)
## (Intercept)            1.1130        39.1932           766.23  0.295026
## scale(ppersonspo)      2.2536       221.9249           766.23 3.657e-16
## scale(p65plus)         1.6122       210.3144           766.23 2.736e-06
## scale(pblack_1)        1.3562        51.0037           766.23  0.053086
## scale(phisp)           1.3343        34.5128           766.23  0.097159
## I(RUCC >= 7)TRUE       1.3079       320.3032           766.23  0.001787
##                      
## (Intercept)          
## scale(ppersonspo) ***
## scale(p65plus)    ***
## scale(pblack_1)   .  
## scale(phisp)      .  
## I(RUCC >= 7)TRUE  ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gwr.data<-gwr.a$SDF
spplot(gwr.data, "scale.ppersonspo.", at=quantile(gwr.a$SDF$scale.ppersonspo.), col.regions=cols, main="% Poverty effect", col="transparent")

spplot(gwr.data, "scale.pblack_1.", at=quantile(gwr.a$SDF$scale.pblack_1.), col.regions=cols, main="% Black effect", col="transparent")

spplot(gwr.data, "scale.phisp.", at=quantile(gwr.a$SDF$scale.phisp.), col.regions=cols, main="% Hispanic effect", col="transparent")

cols<-brewer.pal(8,"RdBu")
dfree<-gwr.a$results$edf

names(gwr.data)
##  [1] "sum.w"                    "X.Intercept."            
##  [3] "scale.ppersonspo."        "scale.p65plus."          
##  [5] "scale.pblack_1."          "scale.phisp."            
##  [7] "I.RUCC....7.TRUE"         "X.Intercept._se"         
##  [9] "scale.ppersonspo._se"     "scale.p65plus._se"       
## [11] "scale.pblack_1._se"       "scale.phisp._se"         
## [13] "I.RUCC....7.TRUE_se"      "gwr.e"                   
## [15] "pred"                     "pred.se"                 
## [17] "localR2"                  "X.Intercept._se_EDF"     
## [19] "scale.ppersonspo._se_EDF" "scale.p65plus._se_EDF"   
## [21] "scale.pblack_1._se_EDF"   "scale.phisp._se_EDF"     
## [23] "I.RUCC....7.TRUE_se_EDF"  "pred.se_EDF"
gwr.data$pov.t<-gwr.data$scale.ppersonspo./gwr.data$scale.ppersonspo._se
gwr.data$pov.t.p<-2*pt(-abs(gwr.data$pov.t)  , dfree)
tbrks<- c(min(gwr.data$pov.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$pov.t))
spplot(gwr.data, "pov.t", col.regions=cols, at=tbrks , main="t-stat %Poverty", col="transparent")

gwr.data$black.t<-gwr.data$scale.pblack_1./gwr.data$scale.pblack_1._se
gwr.data$black.t.p<-2*pt(-abs(gwr.data$black.t)  , dfree)
tbrks<- c(min(gwr.data$black.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$black.t))
spplot(gwr.data, "black.t", col.regions=cols,at=tbrks , main="t-stat %Black", col="transparent")

gwr.data$hisp.t<-gwr.data$scale.phisp./gwr.data$scale.phisp._se
gwr.data$hisp.t.p<-2*pt(-abs(gwr.data$hisp.t)  , dfree)
tbrks<- c(min(gwr.data$hisp.t),qt(c( .975, .9, .1, .025),df=dfree, lower.tail = F),max(gwr.data$hisp.t))
spplot(gwr.data, "hisp.t", col.regions=cols, at=tbrks, main="t-stat %Hispanic", col="transparent")

#Local R^2
brks<-seq(0, 1, length.out = 9); cols=brewer.pal(n=9, "Reds")
spplot(gwr.a$SDF, "localR2", at=brks, col.regions=cols, main="local R^2", col="transparent")

Alternative Construction of spatial regimes - Clustering on GWR coefficients

The first step in the construction of spatial regimes is to do a hierarchical clustering method. We use a distance matrix between all observations for this In this case, we’re using the GWR coefficients as our data.

dat.dist<-dist(gwr.data@data[, 3:7])

#The hclust() function does basic hierarchical clustering
#according to several different methods, we'll use Ward's method
clust.dat<-hclust(dat.dist, method="ward.D")

#And we'll plot the dendrogram, or tree plot
plot(clust.dat)

#I only want a few groups, so I'll cut the tree so I get 5 clusters
gwr.data$clus<-cutree(clust.dat, k=4)


#i'll use table() to get the frequencies of each cluster
table(gwr.data$clus)
## 
##   1   2   3   4 
## 134 348  93 291
#to get it to plot right, we have to convert the cluster number
#to a factor variable
gwr.data$b.cf<-as.factor(gwr.data$clus)
spplot(gwr.data,"b.cf", col.regions=brewer.pal(4, "Accent"), main="GWR Spatial Regimes", col="transparent")

#We can get the means of each coefficient for each of the regime areas
b.means<-aggregate(gwr.data@data[, 3:5], by=list(gwr.data$clus), mean)
b.means
##   Group.1 scale.ppersonspo. scale.p65plus. scale.pblack_1.
## 1       1      0.0006567404    0.002269058    3.272880e-04
## 2       2      0.0006995221    0.002316225    1.266564e-04
## 3       3      0.0006656177    0.002479362    1.761626e-05
## 4       4      0.0002322601    0.002385656    4.692997e-04
#The main idea behind examining spatial regimes is that we are looking for areas where the model may be working differently, we've done this, now we will
#fit separate ols models to each regime
spdat$cluster<-gwr.data$clus
lm1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==1)
lm2<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==2)
lm3<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==3)
lm4<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$cluster==4)

#examine each of the spatial regimes
summary(lm1)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$cluster == 1)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0037214 -0.0006349 -0.0000029  0.0004542  0.0030069 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.0097931  0.0001895  51.673  < 2e-16 ***
## scale(ppersonspo)  0.0005286  0.0001448   3.652 0.000379 ***
## scale(p65plus)     0.0020632  0.0001198  17.218  < 2e-16 ***
## scale(pblack_1)    0.0001018  0.0001181   0.861 0.390597    
## scale(phisp)      -0.0018129  0.0002552  -7.105 7.47e-11 ***
## I(RUCC >= 7)TRUE   0.0000471  0.0001993   0.236 0.813563    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001006 on 128 degrees of freedom
## Multiple R-squared:  0.7932, Adjusted R-squared:  0.7851 
## F-statistic: 98.17 on 5 and 128 DF,  p-value: < 2.2e-16
summary(lm2)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$cluster == 2)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0042561 -0.0005575  0.0000333  0.0005845  0.0028865 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.069e-02  8.774e-05 121.890  < 2e-16 ***
## scale(ppersonspo)  7.185e-04  1.058e-04   6.792  4.9e-11 ***
## scale(p65plus)     2.242e-03  7.741e-05  28.963  < 2e-16 ***
## scale(pblack_1)   -9.560e-05  1.219e-04  -0.784    0.434    
## scale(phisp)      -9.431e-04  9.797e-05  -9.627  < 2e-16 ***
## I(RUCC >= 7)TRUE  -1.640e-04  1.396e-04  -1.175    0.241    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001114 on 342 degrees of freedom
## Multiple R-squared:  0.8315, Adjusted R-squared:  0.829 
## F-statistic: 337.5 on 5 and 342 DF,  p-value: < 2.2e-16
summary(lm3)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$cluster == 3)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0026446 -0.0006281 -0.0000377  0.0006559  0.0032255 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        9.092e-03  1.469e-03   6.190 1.92e-08 ***
## scale(ppersonspo)  6.966e-04  1.779e-04   3.917 0.000178 ***
## scale(p65plus)     2.585e-03  2.695e-04   9.590 2.78e-15 ***
## scale(pblack_1)   -5.846e-05  1.446e-04  -0.404 0.686912    
## scale(phisp)      -4.293e-03  2.443e-03  -1.757 0.082377 .  
## I(RUCC >= 7)TRUE  -5.338e-04  2.752e-04  -1.940 0.055658 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001077 on 87 degrees of freedom
## Multiple R-squared:  0.6692, Adjusted R-squared:  0.6502 
## F-statistic:  35.2 on 5 and 87 DF,  p-value: < 2.2e-16
summary(lm4)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$cluster == 4)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0042116 -0.0006219  0.0000312  0.0007016  0.0031745 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.040e-02  1.116e-04  93.204  < 2e-16 ***
## scale(ppersonspo)  1.589e-04  9.909e-05   1.603  0.10996    
## scale(p65plus)     2.516e-03  6.619e-05  38.019  < 2e-16 ***
## scale(pblack_1)    6.119e-04  1.450e-04   4.221 3.28e-05 ***
## scale(phisp)      -1.912e-04  8.480e-05  -2.254  0.02494 *  
## I(RUCC >= 7)TRUE  -4.607e-04  1.508e-04  -3.056  0.00246 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001123 on 285 degrees of freedom
## Multiple R-squared:  0.8664, Adjusted R-squared:  0.864 
## F-statistic: 369.6 on 5 and 285 DF,  p-value: < 2.2e-16

Or, we could cluster on the original attributes in the data to find similar places based on their demographic variables:

dem.dist<-dist(spdat@data[, c("ppersonspo", "p65plus", "pblack_1", "phisp")])

#The hclust() function does basic hierarchical clustering
#according to several different methods, we'll use Ward's method
clust.dem<-hclust(dem.dist, method="ward.D")

#And we'll plot the dendrogram, or tree plot
plot(clust.dem)

#I only want a few groups, so I'll cut the tree so I get 4 clusters
spdat$clus<-cutree(clust.dem, k=5)


#i'll use table() to get the frequencies of each cluster
table(spdat$clus)
## 
##   1   2   3   4   5 
## 121 437  81 145  82
#to get it to plot right, we have to convert the cluster number
#to a factor variable
spdat$d.cf<-as.factor(spdat$clus)
spplot(spdat,"d.cf", col.regions=brewer.pal(5, "Accent"), main="Demographic Spatial Regimes", col="transparent")

#We can get the means of each coefficient for each of the regime areas
b.means<-aggregate(spdat@data[, c("ppersonspo", "p65plus", "pblack_1", "phisp")], by=list(spdat$d.cf), mean)
b.means
##   Group.1 ppersonspo   p65plus   pblack_1      phisp
## 1       1  0.1870019 0.1347202 0.24460231 0.05091380
## 2       2  0.1333486 0.1601399 0.02868554 0.03607801
## 3       3  0.2691394 0.1334080 0.50796272 0.01291506
## 4       4  0.1592734 0.1560563 0.03786441 0.25512090
## 5       5  0.2359616 0.1297745 0.01963549 0.60438085
#fit separate ols models to each regime

lm1<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==1)
lm2<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==2)
lm3<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==3)
lm4<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==4)
lm5<-lm(I(deaths/population)~scale(ppersonspo)+scale(p65plus)+scale(pblack_1)+scale(phisp)+I(RUCC>=7),spdat, subset=spdat$d.cf==5)
#examine each of the spatial regimes
summary(lm1)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$d.cf == 1)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0040618 -0.0004403 -0.0000458  0.0004309  0.0032204 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.058e-02  2.234e-04  47.347  < 2e-16 ***
## scale(ppersonspo)  4.268e-04  1.592e-04   2.681  0.00843 ** 
## scale(p65plus)     2.593e-03  1.437e-04  18.040  < 2e-16 ***
## scale(pblack_1)    9.025e-05  2.503e-04   0.361  0.71908    
## scale(phisp)      -1.407e-03  3.184e-04  -4.419 2.25e-05 ***
## I(RUCC >= 7)TRUE   5.660e-05  1.979e-04   0.286  0.77538    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0008934 on 115 degrees of freedom
## Multiple R-squared:  0.8305, Adjusted R-squared:  0.8231 
## F-statistic: 112.7 on 5 and 115 DF,  p-value: < 2.2e-16
summary(lm2)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$d.cf == 2)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0042864 -0.0005923  0.0000077  0.0006263  0.0033010 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.084e-02  1.871e-04  57.968  < 2e-16 ***
## scale(ppersonspo)  7.084e-04  7.846e-05   9.029  < 2e-16 ***
## scale(p65plus)     2.509e-03  5.758e-05  43.566  < 2e-16 ***
## scale(pblack_1)    5.618e-04  2.379e-04   2.362 0.018637 *  
## scale(phisp)      -7.625e-04  2.874e-04  -2.653 0.008270 ** 
## I(RUCC >= 7)TRUE  -4.274e-04  1.197e-04  -3.570 0.000397 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00109 on 431 degrees of freedom
## Multiple R-squared:  0.8675, Adjusted R-squared:  0.866 
## F-statistic: 564.4 on 5 and 431 DF,  p-value: < 2.2e-16
summary(lm3)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$d.cf == 3)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -2.723e-03 -4.447e-04  2.352e-05  6.502e-04  1.918e-03 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.039e-02  1.362e-03   7.630 6.02e-11 ***
## scale(ppersonspo)  4.289e-04  2.122e-04   2.021   0.0468 *  
## scale(p65plus)     2.652e-03  2.594e-04  10.225 7.22e-16 ***
## scale(pblack_1)    2.052e-04  2.441e-04   0.841   0.4032    
## scale(phisp)      -8.199e-04  2.261e-03  -0.363   0.7179    
## I(RUCC >= 7)TRUE   8.991e-05  2.584e-04   0.348   0.7288    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001042 on 75 degrees of freedom
## Multiple R-squared:  0.6484, Adjusted R-squared:  0.6249 
## F-statistic: 27.66 on 5 and 75 DF,  p-value: 9.194e-16
summary(lm4)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$d.cf == 4)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0043452 -0.0006337  0.0000419  0.0007731  0.0031712 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.0108390  0.0002709  40.015  < 2e-16 ***
## scale(ppersonspo)  0.0009881  0.0002048   4.824 3.65e-06 ***
## scale(p65plus)     0.0021361  0.0001222  17.480  < 2e-16 ***
## scale(pblack_1)    0.0001032  0.0004490   0.230  0.81851    
## scale(phisp)      -0.0007386  0.0002557  -2.889  0.00449 ** 
## I(RUCC >= 7)TRUE  -0.0005763  0.0002387  -2.414  0.01708 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001215 on 139 degrees of freedom
## Multiple R-squared:  0.8231, Adjusted R-squared:  0.8167 
## F-statistic: 129.4 on 5 and 139 DF,  p-value: < 2.2e-16
summary(lm5)
## 
## Call:
## lm(formula = I(deaths/population) ~ scale(ppersonspo) + scale(p65plus) + 
##     scale(pblack_1) + scale(phisp) + I(RUCC >= 7), data = spdat, 
##     subset = spdat$d.cf == 5)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0040018 -0.0007696 -0.0000850  0.0008709  0.0025106 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.044e-02  6.271e-04  16.643  < 2e-16 ***
## scale(ppersonspo) -7.492e-05  1.819e-04  -0.412    0.682    
## scale(p65plus)     1.367e-03  2.009e-04   6.805 2.04e-09 ***
## scale(pblack_1)    1.380e-03  1.092e-03   1.264    0.210    
## scale(phisp)      -2.976e-04  2.380e-04  -1.251    0.215    
## I(RUCC >= 7)TRUE   1.261e-04  3.070e-04   0.411    0.682    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001247 on 76 degrees of freedom
## Multiple R-squared:  0.4972, Adjusted R-squared:  0.4642 
## F-statistic: 15.03 on 5 and 76 DF,  p-value: 3.025e-10