Introduction

This data set was presented first in Symons et al. (1983), analysed with reference to the spatial nature of the data in Cressie and Read (1985), expanded in Cressie and Chan (1989), and used in detail in Cressie (1991). It is for the 100 counties of North Carolina, and includes counts of numbers of live births (also non-white live births) and numbers of sudden infant deaths, for the July 1, 1974 to June 30, 1978 and July 1, 1979 to June 30, 1984 periods. The county seat location coordinates are given in miles in a local (unknown) coordinate reference system.

Data Description

For this labwork the spdep package will be employed which has the sp package and the maptools package. The data from the sources refered to above is documented in the help page for the nc.sids data set in spdep.The actual data, included in a shapefile of the county boundaries for North Carolina is also made available in the maptools package.

This data frame contains the following columns:

SP_ID SpatialPolygons ID

CNTY_ID county ID

east eastings, county seat, miles, local projection

north northings, county seat, miles, local projection

L_id Cressie and Read (1985) L index

M_id Cressie and Read (1985) M index

names County names

AREA County polygon areas in degree units

PERIMETER County polygon perimeters in degree units

CNTY_ Internal county ID

NAME County names

FIPS County ID

FIPSNO County ID

CRESS_ID Cressie papers ID

BIR74 births, 1974-78

SID74 SID deaths, 1974-78

NWBIR74 non-white births, 1974-78

BIR79 births, 1979-84

SID79 SID deaths, 1979-84

NWBIR79 non-white births, 1979-84

Setting work directory

setwd("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences")

Loading Required LIbraries

#install.packages("spdep")
#install.packages("maptools")
#install.packages("spatialreg")

library(spdep)

## Warning: package 'spdep' was built under R version 4.3.3

## Loading required package: spData

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

## Loading required package: sf

## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

library(maptools)

## Loading required package: sp

## Please note that 'maptools' will be retired during October 2023,
## plan transition at your earliest convenience (see
## https://r-spatial.org/r/2023/05/15/evolution4.html and earlier blogs
## for guidance);some functionality will be moved to 'sp'.
##  Checking rgeos availability: FALSE

## 
## Attaching package: 'maptools'

## The following object is masked from 'package:sp':
## 
##     sp2Mondrian

library(spatialreg)

## Warning: package 'spatialreg' was built under R version 4.3.3

## Loading required package: Matrix

## 
## Attaching package: 'spatialreg'

## The following objects are masked from 'package:spdep':
## 
##     get.ClusterOption, get.coresOption, get.mcOption,
##     get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
##     set.coresOption, set.mcOption, set.VerboseOption,
##     set.ZeroPolicyOption

Loading Data Set

data(nc.sids)
help(nc.sids) # check the data and variables

## starting httpd help server ... done

head(nc.sids,5)

##             CNTY.ID BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 east north      x
## Ashe           1825  1091     1      10  1364     0      19  164   176 -81.67
## Alleghany      1827   487     0      10   542     3      12  183   182 -50.06
## Surry          1828  3188     5     208  3616     6     260  204   174 -16.14
## Currituck      1831   508     1     123   830     2     145  461   182 406.01
## Northampton    1832  1421     9    1066  1606     3    1197  385   176 281.10
##                   y       lon      lat L.id M.id
## Ashe        4052.29 -81.48594 36.43940    1    2
## Alleghany   4059.70 -81.14061 36.52443    1    2
## Surry       4043.76 -80.75312 36.40033    1    2
## Currituck   4035.10 -76.04892 36.45655    1    4
## Northampton 4029.75 -77.44057 36.38799    1    4

names(nc.sids)

##  [1] "CNTY.ID" "BIR74"   "SID74"   "NWBIR74" "BIR79"   "SID79"   "NWBIR79"
##  [8] "east"    "north"   "x"       "y"       "lon"     "lat"     "L.id"   
## [15] "M.id"

Data Summary

summary(nc.sids)

##     CNTY.ID         BIR74           SID74          NWBIR74      
##  Min.   :1825   Min.   :  248   Min.   : 0.00   Min.   :   1.0  
##  1st Qu.:1902   1st Qu.: 1077   1st Qu.: 2.00   1st Qu.: 190.0  
##  Median :1982   Median : 2180   Median : 4.00   Median : 697.5  
##  Mean   :1986   Mean   : 3300   Mean   : 6.67   Mean   :1051.0  
##  3rd Qu.:2067   3rd Qu.: 3936   3rd Qu.: 8.25   3rd Qu.:1168.5  
##  Max.   :2241   Max.   :21588   Max.   :44.00   Max.   :8027.0  
##      BIR79           SID79          NWBIR79             east      
##  Min.   :  319   Min.   : 0.00   Min.   :    3.0   Min.   : 19.0  
##  1st Qu.: 1336   1st Qu.: 2.00   1st Qu.:  250.5   1st Qu.:178.8  
##  Median : 2636   Median : 5.00   Median :  874.5   Median :285.0  
##  Mean   : 4224   Mean   : 8.36   Mean   : 1352.8   Mean   :271.3  
##  3rd Qu.: 4889   3rd Qu.:10.25   3rd Qu.: 1406.8   3rd Qu.:361.2  
##  Max.   :30757   Max.   :57.00   Max.   :11631.0   Max.   :482.0  
##      north             x                 y             lon        
##  Min.   :  6.0   Min.   :-328.04   Min.   :3757   Min.   :-84.08  
##  1st Qu.: 97.0   1st Qu.: -60.55   1st Qu.:3920   1st Qu.:-81.20  
##  Median :125.5   Median : 114.38   Median :3963   Median :-79.26  
##  Mean   :122.1   Mean   :  91.46   Mean   :3953   Mean   :-79.51  
##  3rd Qu.:151.5   3rd Qu.: 240.03   3rd Qu.:4000   3rd Qu.:-77.87  
##  Max.   :182.0   Max.   : 439.65   Max.   :4060   Max.   :-75.67  
##       lat             L.id           M.id     
##  Min.   :33.92   Min.   :1.00   Min.   :1.00  
##  1st Qu.:35.26   1st Qu.:1.00   1st Qu.:2.00  
##  Median :35.68   Median :2.00   Median :3.00  
##  Mean   :35.62   Mean   :2.12   Mean   :2.67  
##  3rd Qu.:36.05   3rd Qu.:3.00   3rd Qu.:3.25  
##  Max.   :36.52   Max.   :4.00   Max.   :4.00

A first step is to look at a map of the regions we are dealing with. In this case, the regions are NC counties.

sidspolys <- readShapePoly(system.file("shapes/sids.shp", package="maptools"))

## Warning: shapelib support is provided by GDAL through the sf and terra packages
## among others

plot(sidspolys,axes=T)

Coordinates(), set or retrieve the spatial coordinates. For spatial polygons it returns the centroids.

#text(coordinates(sidspolys), labels=row.names(sidspolys), cex=0.6)

We can plot a map of counties coded according to the value of a variable, say SID74. First, we examine the values of SID74 cut the values of the variable in intervals

nc.sids$SID74  #SID deaths, 1974-78

##   [1]  1  0  5  1  9  7  0  0  4  1  2 16  4  4  4 18  3  4  1  1  1  0  1  2 10
##  [26] 23 13  6  4 16  8  0 10  6  0  2 16  2  4  1  0  8  5  5  0  5  7  2 11  3
##  [51] 14  5  9  6  2  0  7  3  4  5 12 18  6 10  8  2  5 44  3  3  5  5  0 10  3
##  [76] 11  1  0  4  1  2 38  1  4 15  7  0  4  4  0 13  8 29 31  5  8  4 15 12  5

Sid74 <- as.ordered(cut(nc.sids$SID74, breaks=c(0, 5, 10, 15, 20, 30, 40, 50, 60),
                        include.lowest=TRUE))
unclass(Sid74)

##   [1] 1 1 1 1 2 2 1 1 1 1 1 4 1 1 1 4 1 1 1 1 1 1 1 1 2 5 3 2 1 4 2 1 2 2 1 1 4
##  [38] 1 1 1 1 2 1 1 1 1 2 1 3 1 3 1 2 2 1 1 2 1 1 1 3 4 2 2 2 1 1 7 1 1 1 1 1 2
##  [75] 1 3 1 1 1 1 1 6 1 1 3 2 1 1 1 1 3 2 5 6 1 2 1 3 3 1
## attr(,"levels")
## [1] "[0,5]"   "(5,10]"  "(10,15]" "(15,20]" "(20,30]" "(30,40]" "(40,50]"

Plot the map of polygons according to the intervals and color codes selected

cols <- grey(seq(0.3,1,0.1))
plot(sidspolys, col=cols[unclass(Sid74)], border = par("fg"),axes=T)
legend(c(-84,-82 ), c(32, 34), legend=paste("Sids 74", levels(Sid74)), fill=cols, bty="n")

We examine the values of NWBIR74 cut the values of the variable in intervals

nc.sids$NWBIR74  #NWBIR74, 1974-78

##   [1]   10   10  208  123 1066  954  115  254  748  160  550 1243  930  613 1179
##  [16] 2365  622  200   17  230  386    4   65  736 3919 5483 1243  921  776 3732
##  [31] 1851    1 2186  309   12  883 4397    5 1144  148  128  736  326  521  116
##  [46]  134  384  591 1827 1057 2620  790  930 1165   57   43 1131  281  534  736
##  [61]  495 2593 1051 1491  302  215  844 8027  856  472  370  158   40 1826   92
##  [76] 1523   95    9 1396  222   32 7043  297 1034  952  987  134 1061 1043    1
##  [91] 1744 1206 2217 5904  341  818  580 1431 1633  659

nwbir74 <- as.ordered(cut(nc.sids$NWBIR74, breaks=c(0, 5, 10, 15, 20, 30, 40, 50, 60),
                        include.lowest=TRUE))
unclass(nwbir74)

##   [1]  2  2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  4 NA NA  1 NA NA NA
##  [26] NA NA NA NA NA NA  1 NA NA  3 NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA
##  [51] NA NA NA NA  7  6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  5 NA NA
##  [76] NA NA  2 NA NA  5 NA NA NA NA NA NA NA NA  1 NA NA NA NA NA NA NA NA NA NA
## attr(,"levels")
## [1] "[0,5]"   "(5,10]"  "(10,15]" "(15,20]" "(30,40]" "(40,50]" "(50,60]"

cols <- grey(seq(0.3,1,0.1))
plot(sidspolys, col=cols[unclass(nwbir74)], border = par("fg"),axes=T)
legend(c(-84,-82 ), c(32, 34), legend=paste("NWBIR74", levels(Sid74)), fill=cols, bty="n")

We can also view the map as probabilities of observing these values if rates were to follow a Poisson distribution. First, compute the parameter for the Poisson

sids.phat <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)


# apply the cumulative of the Poisson (ppois)
pm <- ppois(nc.sids$SID74, sids.phat*nc.sids$BIR74)
pm.f <- as.ordered(cut(pm, breaks=c(0.0, 0.01, 0.05, 0.1, 0.9, 0.95, 0.99, 1),
                       include.lowest=TRUE))

We assign colors and map

cols <- grey(seq(0.3,1,0.1))
plot(sidspolys, col=cols[unclass(pm.f)],border = par("fg"),axes=T)
legend(c(-84,-82 ), c(32, 34), legend=paste("prob.", levels(pm.f)), fill=cols, bty="n")

For auto-correlation and auto-regression, we need to build a neighbor structure to be stored as object of class nb.

sids.coords <- cbind(nc.sids$east, nc.sids$north)
sids.utm <- cbind(nc.sids$x, nc.sids$y)
sids.lonlat <- cbind(nc.sids$lon, nc.sids$lat)

sids.nb <- dnearneigh(sids.coords, 0, 30, row.names = rownames(nc.sids))
sids.nb

## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 398 
## Percentage nonzero weights: 3.98 
## Average number of links: 3.98 
## 2 regions with no links:
## Dare Hyde
## 3 disjoint connected subgraphs

# you can find the link for each county by addressing the id number, for example for counties 1
# and 2
sids.nb[1:2]

## [[1]]
## [1]  2 18 19
## 
## [[2]]
## [1]  1  3 18

# where you see some isolated county seats. Increase the neighbor cutoff to 35 miles to see if we include these counties

sids.nb <- dnearneigh(sids.coords, 0, 35, row.names = rownames(nc.sids))
plot.nb(sids.nb, sids.coords)

plot(sidspolys, border = "blue")
plot(sids.nb, sids.lonlat, add=T)

# Next, we need weights for neighborhood structure. In this case, a simple calculation of weights is
# the inverse of the distance; i.e. the closest the regions the larger the weight (the closest two                                                                                 regions are, the more intense the neighbor effect).
# We can get distances between each pair

sids.dists <- nbdists(sids.nb, sids.coords)
head(sids.dists)

## [[1]]
## [1] 19.92486 25.49510 19.10497 34.05877 34.52535
## 
## [[2]]
## [1] 19.92486 22.47221 25.07987
## 
## [[3]]
## [1] 22.47221 29.01724 28.60070 18.43909
## 
## [[4]]
## [1] 12.04159 14.14214 30.61046
## 
## [[5]]
## [1] 26.00000 10.29563 34.71311
## 
## [[6]]
## [1] 26.00000 10.04988 29.06888 27.80288 28.00000

# and also we can get an intensity of neighborhood using the inverse of the distance applied to all
# pairs

inten <- lapply(sids.dists, function(x) 1/x)
head(inten)

## [[1]]
## [1] 0.05018856 0.03922323 0.05234239 0.02936101 0.02896422
## 
## [[2]]
## [1] 0.05018856 0.04449942 0.03987261
## 
## [[3]]
## [1] 0.04449942 0.03446228 0.03496418 0.05423261
## 
## [[4]]
## [1] 0.08304548 0.07071068 0.03266858
## 
## [[5]]
## [1] 0.03846154 0.09712859 0.02880756
## 
## [[6]]
## [1] 0.03846154 0.09950372 0.03440105 0.03596750 0.03571429

# Next we produce a neighbor matrix with these weights. This is a 100x100 matrix. Style “W” is
# row standardized, i.e., divide by number of neighbors in row. For brevity in this guide, we only
# show the first 4 rows and 10 columns
sids.nbmat <- nb2mat(sids.nb, glist=inten, style="W")
sids.nbmat[1:4,1:10]

##                [,1]      [,2]      [,3] [,4] [,5] [,6]      [,7] [,8] [,9]
## Ashe      0.0000000 0.2508432 0.0000000    0    0    0 0.0000000    0    0
## Alleghany 0.3729811 0.0000000 0.3307017    0    0    0 0.0000000    0    0
## Surry     0.0000000 0.2646278 0.0000000    0    0    0 0.0000000    0    0
## Currituck 0.0000000 0.0000000 0.0000000    0    0    0 0.4454639    0    0
##               [,10]
## Ashe      0.0000000
## Alleghany 0.0000000
## Surry     0.2049393
## Currituck 0.0000000

# Same result can be put in a list because the function to compute Moran statistics will require a
# list. Shown here are only the first four rows
sids.nblsw <- nb2listw(sids.nb, glist=inten, style="W")
 sids.nblsw

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 570 
## Percentage nonzero weights: 5.7 
## Average number of links: 5.7 
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0       S1       S2
## W 100 10000 100 44.05019 408.0159

# 
 # We can build a neighbor object using the polygons instead of centroids. Neighboring relations
 # are established based on shared borders instead ot distance between county seats.
 sids.nb.pol <- poly2nb(sidspolys, row.names = rownames(nc.sids))
sids.nb.pol

## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 490 
## Percentage nonzero weights: 4.9 
## Average number of links: 4.9

 # having similar structure as the sids.nb object we built above using distance between county seats.
 # We can examine the differences
 #diffnb(sids.nb, sids.nb.pol, verbose=TRUE)

 # Next, we need weights for neighborhood structure. The weights would represent intensity of
 # neighbor relationship (in this case extent of common boundary). Results are put in a list because
 # later the function to compute Moran’s I statistic will require a list. Shown here only the first four
 # rows
 sids.nblsw.pol <- nb2listw(sids.nb.pol, glist=NULL, style="W")
sids.nblsw.pol$weights[1:4]

## [[1]]
## [1] 0.3333333 0.3333333 0.3333333
## 
## [[2]]
## [1] 0.3333333 0.3333333 0.3333333
## 
## [[3]]
## [1] 0.2 0.2 0.2 0.2 0.2
## 
## [[4]]
## [1] 0.5 0.5

Apply the Moran’s I spatial auto-correlation test to the SID74 variable

moran.test(nc.sids$SID74, sids.nblsw, randomisation=F)

## 
##  Moran I test under normality
## 
## data:  nc.sids$SID74  
## weights: sids.nblsw    
## 
## Moran I statistic standard deviate = 2.8689, p-value = 0.00206
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.175722094      -0.010101010       0.004195403

Apply the Moran’s I spatial auto-correlation test to the NWBIR74 variable

moran.test(nc.sids$NWBIR74, sids.nblsw, randomisation=F)

## 
##  Moran I test under normality
## 
## data:  nc.sids$NWBIR74  
## weights: sids.nblsw    
## 
## Moran I statistic standard deviate = 3.0086, p-value = 0.001312
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.184773571      -0.010101010       0.004195403

# Here although the value for the Moran statistic is low and would seem to indicate no spatial
# pattern, the low variance makes the Z value high and consequently the p-value is low enough to
# conclude that there is spatial pattern.
# We have assumed that the mean and variance are the same for all regions. As discussed in
# Kaluzny et al. (1996) the variance of sids rate increases for counties with low birth rates and
# therefore the data needs to be transformed using the Freeman-Tukey (FT) transform of the sids
# rate and multiplied by the square root of births, to achieve constant variance. The FT transform is

ft.SID74 <- sqrt(1000)*(sqrt(nc.sids$SID74/nc.sids$BIR74) +
                          sqrt((nc.sids$SID74+1)/nc.sids$BIR74))
# and then multiply by square root of births
tr.SID74 <- ft.SID74*sqrt(nc.sids$BIR74)
names(tr.SID74) <- rownames(nc.sids)
# So, modify the Moran test as follows
moran.test(tr.SID74, sids.nblsw, randomisation=F)

## 
##  Moran I test under normality
## 
## data:  tr.SID74  
## weights: sids.nblsw    
## 
## Moran I statistic standard deviate = 5.0921, p-value = 1.771e-07
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.319721725      -0.010101010       0.004195403

# 
# This produces an increase in Moran’s I, and consequently an improvement in the p-value.
# Let us try randomization
moran.test(nc.sids$SID74, sids.nblsw, randomisation=T)

## 
##  Moran I test under randomisation
## 
## data:  nc.sids$SID74  
## weights: sids.nblsw    
## 
## Moran I statistic standard deviate = 2.9862, p-value = 0.001412
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.175722094      -0.010101010       0.003872264

# This also represents an improvement in p-value compared to normality. So far, results indicate
# that there is spatial auto-correlation of the Sids rate 1979-1984 in North Carolina counties. We
# can confirm with Monte Carlo simulations (calculating Moran’s I many times.
moran.mc(nc.sids$SID74, sids.nblsw, nsim=1000)

## 
##  Monte-Carlo simulation of Moran I
## 
## data:  nc.sids$SID74 
## weights: sids.nblsw  
## number of simulations + 1: 1001 
## 
## statistic = 0.17572, observed rank = 991, p-value = 0.00999
## alternative hypothesis: greater

# We also have a low p-value.
# You can repeat with the Geary statistic, by just changing “moran” for "geary".

Geary Test for SID74

geary.test(nc.sids$SID74, sids.nblsw, randomisation=T)

## 
##  Geary C test under randomisation
## 
## data:  nc.sids$SID74 
## weights: sids.nblsw   
## 
## Geary C statistic standard deviate = 2.399, p-value = 0.00822
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##       0.818706573       1.000000000       0.005710909

Geary Test for BIR74

geary.test(nc.sids$BIR74, sids.nblsw, randomisation=T)

## 
##  Geary C test under randomisation
## 
## data:  nc.sids$BIR74 
## weights: sids.nblsw   
## 
## Geary C statistic standard deviate = 2.1366, p-value = 0.01632
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##       0.835092790       1.000000000       0.005957267

# You can repeat with the Geary statistic, by just changing “moran” for "geary".
geary.mc(nc.sids$SID74, sids.nblsw, nsim=1000)

## 
##  Monte-Carlo simulation of Geary C
## 
## data:  nc.sids$SID74 
## weights: sids.nblsw  
## number of simulations + 1: 1001 
## 
## statistic = 0.81871, observed rank = 10, p-value = 0.00999
## alternative hypothesis: greater

geary.mc(nc.sids$BIR74, sids.nblsw, nsim=1000)

## 
##  Monte-Carlo simulation of Geary C
## 
## data:  nc.sids$BIR74 
## weights: sids.nblsw  
## number of simulations + 1: 1001 
## 
## statistic = 0.83509, observed rank = 27, p-value = 0.02697
## alternative hypothesis: greater

# We can plot the spatial auto-correlation as a function of lag distance or spatial correlogram
plot(sp.correlogram(sids.nb, nc.sids$SID74, order = 10, method = "corr", style = "W"))

# We can plot the spatial auto-correlation as a function of lag distance or spatial correlogram
plot(sp.correlogram(sids.nb, nc.sids$BIR74, order = 10, method = "corr", style = "W"))

# In the top panel we can appreciate that values are positively correlated at short lags indicating
# spatial pattern. Another option for method is “I” and this will do Moran’s I at different lags
# (bottom panel).
plot(sp.correlogram(sids.nb, nc.sids$SID74, order = 10, method = "I", style = "W"))

plot(sp.correlogram(sids.nb, nc.sids$BIR74, order = 10, method = "I", style = "W"))

Use SAR to build a predictor of SID74 rates from NWBIR74 birth rates. Evaluate and discuss the results.

# Spatial auto-correlation depends on the weights, therefore it is good practice to perform several
# runs of Moran and Geary with different weights.
# The spatial auto-regression (SAR) linear model (SLM) object is sarslm. Package spdep provides
# functions lagsarslm for the spatial lag model and errorsarslm for the spatial error model, to
# 180
# perform SAR auto-regression. We will use lagsarslm to estimate the coefficients of a predictor
# and the autocorrelation parameter ρ.
# First, assume that race is related to SIDS rates, and therefore SID is modeled as a function of
# nonwhite birth rates NWBIR.

sids.lag <- lagsarlm(SID74 ~ NWBIR74, data=nc.sids, sids.nblsw,tol.solve=1e-9)
sids.lag

## 
## Call:
## lagsarlm(formula = SID74 ~ NWBIR74, data = nc.sids, listw = sids.nblsw, 
##     tol.solve = 1e-09)
## Type: lag 
## 
## Coefficients:
##         rho (Intercept)     NWBIR74 
## 0.053879949 1.236523821 0.004824141 
## 
## Log likelihood: -264.33

# note that the results include estimates of the coefficients (intercept and slope) and the parameter
# rho for ρ. This model is then used to predict the sids rate by region (county). It was necessary to
# decrease the tolerance to 10-9. By default, it is 10-7. Using summary, we get more information
summary(sids.lag)

## 
## Call:lagsarlm(formula = SID74 ~ NWBIR74, data = nc.sids, listw = sids.nblsw, 
##     tol.solve = 1e-09)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.53769  -1.88414  -0.57088   1.63595  16.76087 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.23652382 0.60407741   2.047  0.04066
## NWBIR74     0.00482414 0.00024948  19.337  < 2e-16
## 
## Rho: 0.05388, LR test value: 0.53784, p-value: 0.46333
## Asymptotic standard error: 0.074818
##     z-value: 0.72015, p-value: 0.47144
## Wald statistic: 0.51861, p-value: 0.47144
## 
## Log likelihood: -264.33 for lag model
## ML residual variance (sigma squared): 11.567, (sigma: 3.401)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 536.66, (AIC for lm: 535.2)
## LM test for residual autocorrelation
## test value: 0.012854, p-value: 0.90973

# The coefficient estimates have low p-values. A special function is the Likelihood Ratio Test
# (LR) to check whether the parameter rho is nonzero. In this case, the estimate is still relatively
# good with a p-value of 0.08. This tells us that ρ is significantly different from zero. However, the
# Lagrange Multiplier (LM) test for lack of serial correlation of the residuals yields high p-value
# and therefore the null hypothesis of serially correlated residuals cannot be rejected.
# Predicted values and residuals are available as part of the slm object type as fitted and resid. This
# model can be diagnosed much in the same manner that we evaluated linear regression models:
#   scatter plots, qq plots and residual-predicted plots. For example,

lim<- max(nc.sids$SID74, fitted(sids.lag))

## This method assumes the response is known - see manual page

split.screen(c(2,1))

## [1] 1 2

screen(1)
#par(mar=c(4,4,1,.5),xaxs=r, yaxs=r)
plot(fitted(sids.lag), resid(sids.lag),xlab="Estimated",ylab="Residuals")

## This method assumes the response is known - see manual page

abline(h=0)
split.screen(c(1,2), screen = 2)

## [1] 3 4

screen(3)
plot(nc.sids$SID74,
     fitted(sids.lag),xlab="Observed",ylab="Estimated",xlim=c(0,lim),ylim=c(0,lim))

## This method assumes the response is known - see manual page

abline(a=0,b=1)
screen(4)
qqnorm(resid(sids.lag))
qqline(resid(sids.lag))

# where we see that there are some outliers but it is generally fine. The residuals do not behave like
# a normal distribution.
# We know that the Freeman-Tukey (FT) transform helps to stabilize the variance of sids. Let us
# perform SAR on the FT transform of SID79 and NWBIR79. We already have tr.SID74. Let us
# calculate the one for NWBIR74

ft.NWBIR74 <- sqrt(1000)*(sqrt(nc.sids$NWBIR74/nc.sids$BIR74) +
                                   sqrt((nc.sids$NWBIR74+1)/nc.sids$BIR74))
# and then multiply by square root of births   
tr.NWBIR74 <- ft.NWBIR74*sqrt(nc.sids$BIR74)
names(tr.NWBIR74) <- rownames(nc.sids)


tr.sids <- data.frame(tr.SID74, tr.NWBIR74)

# Run SAR on the transformed data
sids.tr.lag <- lagsarlm(tr.SID74 ~ tr.NWBIR74, data=tr.sids, sids.nblsw, tol.solve=1e-12)
summary(sids.tr.lag)

## 
## Call:
## lagsarlm(formula = tr.SID74 ~ tr.NWBIR74, data = tr.sids, listw = sids.nblsw, 
##     tol.solve = 1e-12)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -82.7541 -27.5356  -4.7796  26.8923 116.8520 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 36.871134  10.704642  3.4444 0.0005723
## tr.NWBIR74   0.061849   0.003956 15.6341 < 2.2e-16
## 
## Rho: 0.038154, LR test value: 0.24137, p-value: 0.62322
## Asymptotic standard error: 0.078521
##     z-value: 0.48591, p-value: 0.62703
## Wald statistic: 0.23611, p-value: 0.62703
## 
## Log likelihood: -506.2653 for lag model
## ML residual variance (sigma squared): 1461.4, (sigma: 38.228)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 1020.5, (AIC for lm: 1018.8)
## LM test for residual autocorrelation
## test value: 3.929, p-value: 0.04746

# Note that we have improved the estimation of rho using this transform and the LM test of
# residuals yields low p-value, allowing us to reject the null of serially correlated residuals.
par(mfrow=c(2,2))
plot(tr.sids$tr.SID74, fitted(sids.tr.lag))

## This method assumes the response is known - see manual page

plot(fitted(sids.tr.lag), resid(sids.tr.lag))

## This method assumes the response is known - see manual page

qqnorm(resid(sids.tr.lag))
# As we can see we now have achieved a better behavior of the residuals.

M13_Lab: Task2- North Carolina SIDS data

Charles Wiredu

2024-04-08