This data set was presented first in Symons et al. (1983), analysed with reference to the spatial nature of the data in Cressie and Read (1985), expanded in Cressie and Chan (1989), and used in detail in Cressie (1991). It is for the 100 counties of North Carolina, and includes counts of numbers of live births (also non-white live births) and numbers of sudden infant deaths, for the July 1, 1974 to June 30, 1978 and July 1, 1979 to June 30, 1984 periods. The county seat location coordinates are given in miles in a local (unknown) coordinate reference system.
For this labwork the spdep package will be employed which has the sp package and the maptools package. The data from the sources refered to above is documented in the help page for the nc.sids data set in spdep.The actual data, included in a shapefile of the county boundaries for North Carolina is also made available in the maptools package.
This data frame contains the following columns:
SP_ID SpatialPolygons ID
CNTY_ID county ID
east eastings, county seat, miles, local projection
north northings, county seat, miles, local projection
L_id Cressie and Read (1985) L index
M_id Cressie and Read (1985) M index
names County names
AREA County polygon areas in degree units
PERIMETER County polygon perimeters in degree units
CNTY_ Internal county ID
NAME County names
FIPS County ID
FIPSNO County ID
CRESS_ID Cressie papers ID
BIR74 births, 1974-78
SID74 SID deaths, 1974-78
NWBIR74 non-white births, 1974-78
BIR79 births, 1979-84
SID79 SID deaths, 1979-84
NWBIR79 non-white births, 1979-84
setwd("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences")
#install.packages("spdep")
#install.packages("maptools")
#install.packages("spatialreg")
library(spdep)
## Warning: package 'spdep' was built under R version 4.3.3
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library(maptools)
## Loading required package: sp
## Please note that 'maptools' will be retired during October 2023,
## plan transition at your earliest convenience (see
## https://r-spatial.org/r/2023/05/15/evolution4.html and earlier blogs
## for guidance);some functionality will be moved to 'sp'.
## Checking rgeos availability: FALSE
##
## Attaching package: 'maptools'
## The following object is masked from 'package:sp':
##
## sp2Mondrian
library(spatialreg)
## Warning: package 'spatialreg' was built under R version 4.3.3
## Loading required package: Matrix
##
## Attaching package: 'spatialreg'
## The following objects are masked from 'package:spdep':
##
## get.ClusterOption, get.coresOption, get.mcOption,
## get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
## set.coresOption, set.mcOption, set.VerboseOption,
## set.ZeroPolicyOption
data(nc.sids)
help(nc.sids) # check the data and variables
## starting httpd help server ... done
head(nc.sids,5)
## CNTY.ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 east north x
## Ashe 1825 1091 1 10 1364 0 19 164 176 -81.67
## Alleghany 1827 487 0 10 542 3 12 183 182 -50.06
## Surry 1828 3188 5 208 3616 6 260 204 174 -16.14
## Currituck 1831 508 1 123 830 2 145 461 182 406.01
## Northampton 1832 1421 9 1066 1606 3 1197 385 176 281.10
## y lon lat L.id M.id
## Ashe 4052.29 -81.48594 36.43940 1 2
## Alleghany 4059.70 -81.14061 36.52443 1 2
## Surry 4043.76 -80.75312 36.40033 1 2
## Currituck 4035.10 -76.04892 36.45655 1 4
## Northampton 4029.75 -77.44057 36.38799 1 4
names(nc.sids)
## [1] "CNTY.ID" "BIR74" "SID74" "NWBIR74" "BIR79" "SID79" "NWBIR79"
## [8] "east" "north" "x" "y" "lon" "lat" "L.id"
## [15] "M.id"
summary(nc.sids)
## CNTY.ID BIR74 SID74 NWBIR74
## Min. :1825 Min. : 248 Min. : 0.00 Min. : 1.0
## 1st Qu.:1902 1st Qu.: 1077 1st Qu.: 2.00 1st Qu.: 190.0
## Median :1982 Median : 2180 Median : 4.00 Median : 697.5
## Mean :1986 Mean : 3300 Mean : 6.67 Mean :1051.0
## 3rd Qu.:2067 3rd Qu.: 3936 3rd Qu.: 8.25 3rd Qu.:1168.5
## Max. :2241 Max. :21588 Max. :44.00 Max. :8027.0
## BIR79 SID79 NWBIR79 east
## Min. : 319 Min. : 0.00 Min. : 3.0 Min. : 19.0
## 1st Qu.: 1336 1st Qu.: 2.00 1st Qu.: 250.5 1st Qu.:178.8
## Median : 2636 Median : 5.00 Median : 874.5 Median :285.0
## Mean : 4224 Mean : 8.36 Mean : 1352.8 Mean :271.3
## 3rd Qu.: 4889 3rd Qu.:10.25 3rd Qu.: 1406.8 3rd Qu.:361.2
## Max. :30757 Max. :57.00 Max. :11631.0 Max. :482.0
## north x y lon
## Min. : 6.0 Min. :-328.04 Min. :3757 Min. :-84.08
## 1st Qu.: 97.0 1st Qu.: -60.55 1st Qu.:3920 1st Qu.:-81.20
## Median :125.5 Median : 114.38 Median :3963 Median :-79.26
## Mean :122.1 Mean : 91.46 Mean :3953 Mean :-79.51
## 3rd Qu.:151.5 3rd Qu.: 240.03 3rd Qu.:4000 3rd Qu.:-77.87
## Max. :182.0 Max. : 439.65 Max. :4060 Max. :-75.67
## lat L.id M.id
## Min. :33.92 Min. :1.00 Min. :1.00
## 1st Qu.:35.26 1st Qu.:1.00 1st Qu.:2.00
## Median :35.68 Median :2.00 Median :3.00
## Mean :35.62 Mean :2.12 Mean :2.67
## 3rd Qu.:36.05 3rd Qu.:3.00 3rd Qu.:3.25
## Max. :36.52 Max. :4.00 Max. :4.00
sidspolys <- readShapePoly(system.file("shapes/sids.shp", package="maptools"))
## Warning: shapelib support is provided by GDAL through the sf and terra packages
## among others
plot(sidspolys,axes=T)
#text(coordinates(sidspolys), labels=row.names(sidspolys), cex=0.6)
We can plot a map of counties coded according to the value of a variable, say SID74. First, we examine the values of SID74 cut the values of the variable in intervals
nc.sids$SID74 #SID deaths, 1974-78
## [1] 1 0 5 1 9 7 0 0 4 1 2 16 4 4 4 18 3 4 1 1 1 0 1 2 10
## [26] 23 13 6 4 16 8 0 10 6 0 2 16 2 4 1 0 8 5 5 0 5 7 2 11 3
## [51] 14 5 9 6 2 0 7 3 4 5 12 18 6 10 8 2 5 44 3 3 5 5 0 10 3
## [76] 11 1 0 4 1 2 38 1 4 15 7 0 4 4 0 13 8 29 31 5 8 4 15 12 5
Sid74 <- as.ordered(cut(nc.sids$SID74, breaks=c(0, 5, 10, 15, 20, 30, 40, 50, 60),
include.lowest=TRUE))
unclass(Sid74)
## [1] 1 1 1 1 2 2 1 1 1 1 1 4 1 1 1 4 1 1 1 1 1 1 1 1 2 5 3 2 1 4 2 1 2 2 1 1 4
## [38] 1 1 1 1 2 1 1 1 1 2 1 3 1 3 1 2 2 1 1 2 1 1 1 3 4 2 2 2 1 1 7 1 1 1 1 1 2
## [75] 1 3 1 1 1 1 1 6 1 1 3 2 1 1 1 1 3 2 5 6 1 2 1 3 3 1
## attr(,"levels")
## [1] "[0,5]" "(5,10]" "(10,15]" "(15,20]" "(20,30]" "(30,40]" "(40,50]"
Plot the map of polygons according to the intervals and color codes selected
cols <- grey(seq(0.3,1,0.1))
plot(sidspolys, col=cols[unclass(Sid74)], border = par("fg"),axes=T)
legend(c(-84,-82 ), c(32, 34), legend=paste("Sids 74", levels(Sid74)), fill=cols, bty="n")
We examine the values of NWBIR74 cut the values of the variable in intervals
nc.sids$NWBIR74 #NWBIR74, 1974-78
## [1] 10 10 208 123 1066 954 115 254 748 160 550 1243 930 613 1179
## [16] 2365 622 200 17 230 386 4 65 736 3919 5483 1243 921 776 3732
## [31] 1851 1 2186 309 12 883 4397 5 1144 148 128 736 326 521 116
## [46] 134 384 591 1827 1057 2620 790 930 1165 57 43 1131 281 534 736
## [61] 495 2593 1051 1491 302 215 844 8027 856 472 370 158 40 1826 92
## [76] 1523 95 9 1396 222 32 7043 297 1034 952 987 134 1061 1043 1
## [91] 1744 1206 2217 5904 341 818 580 1431 1633 659
nwbir74 <- as.ordered(cut(nc.sids$NWBIR74, breaks=c(0, 5, 10, 15, 20, 30, 40, 50, 60),
include.lowest=TRUE))
unclass(nwbir74)
## [1] 2 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4 NA NA 1 NA NA NA
## [26] NA NA NA NA NA NA 1 NA NA 3 NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA
## [51] NA NA NA NA 7 6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 5 NA NA
## [76] NA NA 2 NA NA 5 NA NA NA NA NA NA NA NA 1 NA NA NA NA NA NA NA NA NA NA
## attr(,"levels")
## [1] "[0,5]" "(5,10]" "(10,15]" "(15,20]" "(30,40]" "(40,50]" "(50,60]"
cols <- grey(seq(0.3,1,0.1))
plot(sidspolys, col=cols[unclass(nwbir74)], border = par("fg"),axes=T)
legend(c(-84,-82 ), c(32, 34), legend=paste("NWBIR74", levels(Sid74)), fill=cols, bty="n")
We can also view the map as probabilities of observing these values if
rates were to follow a Poisson distribution. First, compute the
parameter for the Poisson
sids.phat <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)
# apply the cumulative of the Poisson (ppois)
pm <- ppois(nc.sids$SID74, sids.phat*nc.sids$BIR74)
pm.f <- as.ordered(cut(pm, breaks=c(0.0, 0.01, 0.05, 0.1, 0.9, 0.95, 0.99, 1),
include.lowest=TRUE))
We assign colors and map
cols <- grey(seq(0.3,1,0.1))
plot(sidspolys, col=cols[unclass(pm.f)],border = par("fg"),axes=T)
legend(c(-84,-82 ), c(32, 34), legend=paste("prob.", levels(pm.f)), fill=cols, bty="n")
For auto-correlation and auto-regression, we need to build a neighbor
structure to be stored as object of class nb.
sids.coords <- cbind(nc.sids$east, nc.sids$north)
sids.utm <- cbind(nc.sids$x, nc.sids$y)
sids.lonlat <- cbind(nc.sids$lon, nc.sids$lat)
sids.nb <- dnearneigh(sids.coords, 0, 30, row.names = rownames(nc.sids))
sids.nb
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 398
## Percentage nonzero weights: 3.98
## Average number of links: 3.98
## 2 regions with no links:
## Dare Hyde
## 3 disjoint connected subgraphs
# you can find the link for each county by addressing the id number, for example for counties 1
# and 2
sids.nb[1:2]
## [[1]]
## [1] 2 18 19
##
## [[2]]
## [1] 1 3 18
# where you see some isolated county seats. Increase the neighbor cutoff to 35 miles to see if we include these counties
sids.nb <- dnearneigh(sids.coords, 0, 35, row.names = rownames(nc.sids))
plot.nb(sids.nb, sids.coords)
plot(sidspolys, border = "blue")
plot(sids.nb, sids.lonlat, add=T)
# Next, we need weights for neighborhood structure. In this case, a simple calculation of weights is
# the inverse of the distance; i.e. the closest the regions the larger the weight (the closest two regions are, the more intense the neighbor effect).
# We can get distances between each pair
sids.dists <- nbdists(sids.nb, sids.coords)
head(sids.dists)
## [[1]]
## [1] 19.92486 25.49510 19.10497 34.05877 34.52535
##
## [[2]]
## [1] 19.92486 22.47221 25.07987
##
## [[3]]
## [1] 22.47221 29.01724 28.60070 18.43909
##
## [[4]]
## [1] 12.04159 14.14214 30.61046
##
## [[5]]
## [1] 26.00000 10.29563 34.71311
##
## [[6]]
## [1] 26.00000 10.04988 29.06888 27.80288 28.00000
# and also we can get an intensity of neighborhood using the inverse of the distance applied to all
# pairs
inten <- lapply(sids.dists, function(x) 1/x)
head(inten)
## [[1]]
## [1] 0.05018856 0.03922323 0.05234239 0.02936101 0.02896422
##
## [[2]]
## [1] 0.05018856 0.04449942 0.03987261
##
## [[3]]
## [1] 0.04449942 0.03446228 0.03496418 0.05423261
##
## [[4]]
## [1] 0.08304548 0.07071068 0.03266858
##
## [[5]]
## [1] 0.03846154 0.09712859 0.02880756
##
## [[6]]
## [1] 0.03846154 0.09950372 0.03440105 0.03596750 0.03571429
# Next we produce a neighbor matrix with these weights. This is a 100x100 matrix. Style “W” is
# row standardized, i.e., divide by number of neighbors in row. For brevity in this guide, we only
# show the first 4 rows and 10 columns
sids.nbmat <- nb2mat(sids.nb, glist=inten, style="W")
sids.nbmat[1:4,1:10]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
## Ashe 0.0000000 0.2508432 0.0000000 0 0 0 0.0000000 0 0
## Alleghany 0.3729811 0.0000000 0.3307017 0 0 0 0.0000000 0 0
## Surry 0.0000000 0.2646278 0.0000000 0 0 0 0.0000000 0 0
## Currituck 0.0000000 0.0000000 0.0000000 0 0 0 0.4454639 0 0
## [,10]
## Ashe 0.0000000
## Alleghany 0.0000000
## Surry 0.2049393
## Currituck 0.0000000
# Same result can be put in a list because the function to compute Moran statistics will require a
# list. Shown here are only the first four rows
sids.nblsw <- nb2listw(sids.nb, glist=inten, style="W")
sids.nblsw
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 570
## Percentage nonzero weights: 5.7
## Average number of links: 5.7
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 100 10000 100 44.05019 408.0159
#
# We can build a neighbor object using the polygons instead of centroids. Neighboring relations
# are established based on shared borders instead ot distance between county seats.
sids.nb.pol <- poly2nb(sidspolys, row.names = rownames(nc.sids))
sids.nb.pol
## Neighbour list object:
## Number of regions: 100
## Number of nonzero links: 490
## Percentage nonzero weights: 4.9
## Average number of links: 4.9
# having similar structure as the sids.nb object we built above using distance between county seats.
# We can examine the differences
#diffnb(sids.nb, sids.nb.pol, verbose=TRUE)
# Next, we need weights for neighborhood structure. The weights would represent intensity of
# neighbor relationship (in this case extent of common boundary). Results are put in a list because
# later the function to compute Moran’s I statistic will require a list. Shown here only the first four
# rows
sids.nblsw.pol <- nb2listw(sids.nb.pol, glist=NULL, style="W")
sids.nblsw.pol$weights[1:4]
## [[1]]
## [1] 0.3333333 0.3333333 0.3333333
##
## [[2]]
## [1] 0.3333333 0.3333333 0.3333333
##
## [[3]]
## [1] 0.2 0.2 0.2 0.2 0.2
##
## [[4]]
## [1] 0.5 0.5
Apply the Moran’s I spatial auto-correlation test to the SID74 variable
moran.test(nc.sids$SID74, sids.nblsw, randomisation=F)
##
## Moran I test under normality
##
## data: nc.sids$SID74
## weights: sids.nblsw
##
## Moran I statistic standard deviate = 2.8689, p-value = 0.00206
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.175722094 -0.010101010 0.004195403
Apply the Moran’s I spatial auto-correlation test to the NWBIR74 variable
moran.test(nc.sids$NWBIR74, sids.nblsw, randomisation=F)
##
## Moran I test under normality
##
## data: nc.sids$NWBIR74
## weights: sids.nblsw
##
## Moran I statistic standard deviate = 3.0086, p-value = 0.001312
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.184773571 -0.010101010 0.004195403
# Here although the value for the Moran statistic is low and would seem to indicate no spatial
# pattern, the low variance makes the Z value high and consequently the p-value is low enough to
# conclude that there is spatial pattern.
# We have assumed that the mean and variance are the same for all regions. As discussed in
# Kaluzny et al. (1996) the variance of sids rate increases for counties with low birth rates and
# therefore the data needs to be transformed using the Freeman-Tukey (FT) transform of the sids
# rate and multiplied by the square root of births, to achieve constant variance. The FT transform is
ft.SID74 <- sqrt(1000)*(sqrt(nc.sids$SID74/nc.sids$BIR74) +
sqrt((nc.sids$SID74+1)/nc.sids$BIR74))
# and then multiply by square root of births
tr.SID74 <- ft.SID74*sqrt(nc.sids$BIR74)
names(tr.SID74) <- rownames(nc.sids)
# So, modify the Moran test as follows
moran.test(tr.SID74, sids.nblsw, randomisation=F)
##
## Moran I test under normality
##
## data: tr.SID74
## weights: sids.nblsw
##
## Moran I statistic standard deviate = 5.0921, p-value = 1.771e-07
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.319721725 -0.010101010 0.004195403
#
# This produces an increase in Moran’s I, and consequently an improvement in the p-value.
# Let us try randomization
moran.test(nc.sids$SID74, sids.nblsw, randomisation=T)
##
## Moran I test under randomisation
##
## data: nc.sids$SID74
## weights: sids.nblsw
##
## Moran I statistic standard deviate = 2.9862, p-value = 0.001412
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.175722094 -0.010101010 0.003872264
# This also represents an improvement in p-value compared to normality. So far, results indicate
# that there is spatial auto-correlation of the Sids rate 1979-1984 in North Carolina counties. We
# can confirm with Monte Carlo simulations (calculating Moran’s I many times.
moran.mc(nc.sids$SID74, sids.nblsw, nsim=1000)
##
## Monte-Carlo simulation of Moran I
##
## data: nc.sids$SID74
## weights: sids.nblsw
## number of simulations + 1: 1001
##
## statistic = 0.17572, observed rank = 991, p-value = 0.00999
## alternative hypothesis: greater
# We also have a low p-value.
# You can repeat with the Geary statistic, by just changing “moran” for "geary".
Geary Test for SID74
geary.test(nc.sids$SID74, sids.nblsw, randomisation=T)
##
## Geary C test under randomisation
##
## data: nc.sids$SID74
## weights: sids.nblsw
##
## Geary C statistic standard deviate = 2.399, p-value = 0.00822
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic Expectation Variance
## 0.818706573 1.000000000 0.005710909
Geary Test for BIR74
geary.test(nc.sids$BIR74, sids.nblsw, randomisation=T)
##
## Geary C test under randomisation
##
## data: nc.sids$BIR74
## weights: sids.nblsw
##
## Geary C statistic standard deviate = 2.1366, p-value = 0.01632
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic Expectation Variance
## 0.835092790 1.000000000 0.005957267
# You can repeat with the Geary statistic, by just changing “moran” for "geary".
geary.mc(nc.sids$SID74, sids.nblsw, nsim=1000)
##
## Monte-Carlo simulation of Geary C
##
## data: nc.sids$SID74
## weights: sids.nblsw
## number of simulations + 1: 1001
##
## statistic = 0.81871, observed rank = 10, p-value = 0.00999
## alternative hypothesis: greater
geary.mc(nc.sids$BIR74, sids.nblsw, nsim=1000)
##
## Monte-Carlo simulation of Geary C
##
## data: nc.sids$BIR74
## weights: sids.nblsw
## number of simulations + 1: 1001
##
## statistic = 0.83509, observed rank = 27, p-value = 0.02697
## alternative hypothesis: greater
# We can plot the spatial auto-correlation as a function of lag distance or spatial correlogram
plot(sp.correlogram(sids.nb, nc.sids$SID74, order = 10, method = "corr", style = "W"))
# We can plot the spatial auto-correlation as a function of lag distance or spatial correlogram
plot(sp.correlogram(sids.nb, nc.sids$BIR74, order = 10, method = "corr", style = "W"))
# In the top panel we can appreciate that values are positively correlated at short lags indicating
# spatial pattern. Another option for method is “I” and this will do Moran’s I at different lags
# (bottom panel).
plot(sp.correlogram(sids.nb, nc.sids$SID74, order = 10, method = "I", style = "W"))
plot(sp.correlogram(sids.nb, nc.sids$BIR74, order = 10, method = "I", style = "W"))
Use SAR to build a predictor of SID74 rates from NWBIR74 birth rates.
Evaluate and discuss the results.
# Spatial auto-correlation depends on the weights, therefore it is good practice to perform several
# runs of Moran and Geary with different weights.
# The spatial auto-regression (SAR) linear model (SLM) object is sarslm. Package spdep provides
# functions lagsarslm for the spatial lag model and errorsarslm for the spatial error model, to
# 180
# perform SAR auto-regression. We will use lagsarslm to estimate the coefficients of a predictor
# and the autocorrelation parameter ρ.
# First, assume that race is related to SIDS rates, and therefore SID is modeled as a function of
# nonwhite birth rates NWBIR.
sids.lag <- lagsarlm(SID74 ~ NWBIR74, data=nc.sids, sids.nblsw,tol.solve=1e-9)
sids.lag
##
## Call:
## lagsarlm(formula = SID74 ~ NWBIR74, data = nc.sids, listw = sids.nblsw,
## tol.solve = 1e-09)
## Type: lag
##
## Coefficients:
## rho (Intercept) NWBIR74
## 0.053879949 1.236523821 0.004824141
##
## Log likelihood: -264.33
# note that the results include estimates of the coefficients (intercept and slope) and the parameter
# rho for ρ. This model is then used to predict the sids rate by region (county). It was necessary to
# decrease the tolerance to 10-9. By default, it is 10-7. Using summary, we get more information
summary(sids.lag)
##
## Call:lagsarlm(formula = SID74 ~ NWBIR74, data = nc.sids, listw = sids.nblsw,
## tol.solve = 1e-09)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.53769 -1.88414 -0.57088 1.63595 16.76087
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.23652382 0.60407741 2.047 0.04066
## NWBIR74 0.00482414 0.00024948 19.337 < 2e-16
##
## Rho: 0.05388, LR test value: 0.53784, p-value: 0.46333
## Asymptotic standard error: 0.074818
## z-value: 0.72015, p-value: 0.47144
## Wald statistic: 0.51861, p-value: 0.47144
##
## Log likelihood: -264.33 for lag model
## ML residual variance (sigma squared): 11.567, (sigma: 3.401)
## Number of observations: 100
## Number of parameters estimated: 4
## AIC: 536.66, (AIC for lm: 535.2)
## LM test for residual autocorrelation
## test value: 0.012854, p-value: 0.90973
# The coefficient estimates have low p-values. A special function is the Likelihood Ratio Test
# (LR) to check whether the parameter rho is nonzero. In this case, the estimate is still relatively
# good with a p-value of 0.08. This tells us that ρ is significantly different from zero. However, the
# Lagrange Multiplier (LM) test for lack of serial correlation of the residuals yields high p-value
# and therefore the null hypothesis of serially correlated residuals cannot be rejected.
# Predicted values and residuals are available as part of the slm object type as fitted and resid. This
# model can be diagnosed much in the same manner that we evaluated linear regression models:
# scatter plots, qq plots and residual-predicted plots. For example,
lim<- max(nc.sids$SID74, fitted(sids.lag))
## This method assumes the response is known - see manual page
split.screen(c(2,1))
## [1] 1 2
screen(1)
#par(mar=c(4,4,1,.5),xaxs=r, yaxs=r)
plot(fitted(sids.lag), resid(sids.lag),xlab="Estimated",ylab="Residuals")
## This method assumes the response is known - see manual page
abline(h=0)
split.screen(c(1,2), screen = 2)
## [1] 3 4
screen(3)
plot(nc.sids$SID74,
fitted(sids.lag),xlab="Observed",ylab="Estimated",xlim=c(0,lim),ylim=c(0,lim))
## This method assumes the response is known - see manual page
abline(a=0,b=1)
screen(4)
qqnorm(resid(sids.lag))
qqline(resid(sids.lag))
# where we see that there are some outliers but it is generally fine. The residuals do not behave like
# a normal distribution.
# We know that the Freeman-Tukey (FT) transform helps to stabilize the variance of sids. Let us
# perform SAR on the FT transform of SID79 and NWBIR79. We already have tr.SID74. Let us
# calculate the one for NWBIR74
ft.NWBIR74 <- sqrt(1000)*(sqrt(nc.sids$NWBIR74/nc.sids$BIR74) +
sqrt((nc.sids$NWBIR74+1)/nc.sids$BIR74))
# and then multiply by square root of births
tr.NWBIR74 <- ft.NWBIR74*sqrt(nc.sids$BIR74)
names(tr.NWBIR74) <- rownames(nc.sids)
tr.sids <- data.frame(tr.SID74, tr.NWBIR74)
# Run SAR on the transformed data
sids.tr.lag <- lagsarlm(tr.SID74 ~ tr.NWBIR74, data=tr.sids, sids.nblsw, tol.solve=1e-12)
summary(sids.tr.lag)
##
## Call:
## lagsarlm(formula = tr.SID74 ~ tr.NWBIR74, data = tr.sids, listw = sids.nblsw,
## tol.solve = 1e-12)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.7541 -27.5356 -4.7796 26.8923 116.8520
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 36.871134 10.704642 3.4444 0.0005723
## tr.NWBIR74 0.061849 0.003956 15.6341 < 2.2e-16
##
## Rho: 0.038154, LR test value: 0.24137, p-value: 0.62322
## Asymptotic standard error: 0.078521
## z-value: 0.48591, p-value: 0.62703
## Wald statistic: 0.23611, p-value: 0.62703
##
## Log likelihood: -506.2653 for lag model
## ML residual variance (sigma squared): 1461.4, (sigma: 38.228)
## Number of observations: 100
## Number of parameters estimated: 4
## AIC: 1020.5, (AIC for lm: 1018.8)
## LM test for residual autocorrelation
## test value: 3.929, p-value: 0.04746
# Note that we have improved the estimation of rho using this transform and the LM test of
# residuals yields low p-value, allowing us to reject the null of serially correlated residuals.
par(mfrow=c(2,2))
plot(tr.sids$tr.SID74, fitted(sids.tr.lag))
## This method assumes the response is known - see manual page
plot(fitted(sids.tr.lag), resid(sids.tr.lag))
## This method assumes the response is known - see manual page
qqnorm(resid(sids.tr.lag))
# As we can see we now have achieved a better behavior of the residuals.