Introduction

This document explains the application of Spatial Autoregressive (SAR) and Conditional Autoregressive (CAR) models to analyze North Carolina birth rate data. We examine the relationship between non-white birth rates and overall birth rates using spatial econometric techniques.

Load Required Libraries

library(spdep)
library(sf)
library(sp)
library(ggplot2)
library(gridExtra)
library(tmap)
library(spatialreg)

Load and Prepare Data

# Load North Carolina SIDS data
nc.sids <- st_read(system.file("shapes/sids.gpkg", package = "spData"), quiet = TRUE)

# Check the class of the object
class(nc.sids)

## [1] "sf"         "data.frame"

# View the data
head(nc.sids, 3)

## Simple feature collection with 3 features and 22 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -81.74107 ymin: 36.23388 xmax: -80.43531 ymax: 36.58965
## Geodetic CRS:  NAD27
##   CNTY_ID  AREA PERIMETER CNTY_      NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1    1825 0.114     1.442  1825      Ashe 37009  37009        5  1091     1
## 2    1827 0.061     1.231  1827 Alleghany 37005  37005        3   487     0
## 3    1828 0.143     1.630  1828     Surry 37171  37171       86  3188     5
##   NWBIR74 BIR79 SID79 NWBIR79 east north      x       y       lon      lat L_id
## 1      10  1364     0      19  164   176 -81.67 4052.29 -81.48594 36.43940    1
## 2      10   542     3      12  183   182 -50.06 4059.70 -81.14061 36.52443    1
## 3     208  3616     6     260  204   174 -16.14 4043.76 -80.75312 36.40033    1
##   M_id                           geom
## 1    2 MULTIPOLYGON (((-81.47276 3...
## 2    2 MULTIPOLYGON (((-81.23989 3...
## 3    2 MULTIPOLYGON (((-80.45634 3...

# Check column names
names(nc.sids)

##  [1] "CNTY_ID"   "AREA"      "PERIMETER" "CNTY_"     "NAME"      "FIPS"     
##  [7] "FIPSNO"    "CRESS_ID"  "BIR74"     "SID74"     "NWBIR74"   "BIR79"    
## [13] "SID79"     "NWBIR79"   "east"      "north"     "x"         "y"        
## [19] "lon"       "lat"       "L_id"      "M_id"      "geom"

Create Transformed Variables

Freeman-Tukey Transformation for Non-White Birth Rates

The Freeman-Tukey transformation is applied to stabilize variance and normalize the distribution of non-white birth rates:

\[ \text{nwbir.FT} = \sqrt{1000} \left( \sqrt{\frac{\text{NWBIR79}}{\text{BIR79}}} + \sqrt{\frac{\text{NWBIR79} + 1}{\text{BIR79}}} \right) \]

# Apply Freeman-Tukey transformation for non-white birth rates
nc.sids$nwbir.FT <- sqrt(1000) * (sqrt(nc.sids$NWBIR79/nc.sids$BIR79) + 
                                    sqrt((nc.sids$NWBIR79 + 1)/nc.sids$BIR79))

Freeman-Tukey Transformation for Overall Birth Rates

Similarly, we need to transform the overall birth rates (rates.FT). Based on the original code, rates.FT is the transformed overall birth rates:

\[ \text{rates.FT} = \sqrt{1000} \left( \sqrt{\frac{\text{SID79}}{\text{BIR79}}} + \sqrt{\frac{\text{SID79} + 1}{\text{BIR79}}} \right) \]

# Create the overall birth rates transformation
# Using SID79 as the count variable for overall rates
nc.sids$rates.FT <- sqrt(1000) * (sqrt(nc.sids$SID79/nc.sids$BIR79) + 
                                    sqrt((nc.sids$SID79 + 1)/nc.sids$BIR79))

# Check the created variables
head(nc.sids[, c("BIR79", "SID79", "NWBIR79", "rates.FT", "nwbir.FT")])

## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## Geodetic CRS:  NAD27
##   BIR79 SID79 NWBIR79  rates.FT  nwbir.FT                           geom
## 1  1364     0      19 0.8562347  7.561438 MULTIPOLYGON (((-81.47276 3...
## 2   542     3      12 5.0692990  9.602812 MULTIPOLYGON (((-81.23989 3...
## 3  3616     6     260 2.6794800 16.975378 MULTIPOLYGON (((-80.45634 3...
## 4   830     2     145 3.4534738 26.480233 MULTIPOLYGON (((-76.00897 3...
## 5  1606     3    1197 2.9449286 54.612867 MULTIPOLYGON (((-77.21767 3...
## 6  1838     5    1237 3.4561178 51.895516 MULTIPOLYGON (((-76.74506 3...

Convert to sf Object

# Convert to sf object for spatial operations
nc.sids.sf <- st_as_sf(nc.sids)

# Check the result
class(nc.sids.sf)

## [1] "sf"         "data.frame"

head(nc.sids.sf, 3)

## Simple feature collection with 3 features and 24 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -81.74107 ymin: 36.23388 xmax: -80.43531 ymax: 36.58965
## Geodetic CRS:  NAD27
##   CNTY_ID  AREA PERIMETER CNTY_      NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1    1825 0.114     1.442  1825      Ashe 37009  37009        5  1091     1
## 2    1827 0.061     1.231  1827 Alleghany 37005  37005        3   487     0
## 3    1828 0.143     1.630  1828     Surry 37171  37171       86  3188     5
##   NWBIR74 BIR79 SID79 NWBIR79 east north      x       y       lon      lat L_id
## 1      10  1364     0      19  164   176 -81.67 4052.29 -81.48594 36.43940    1
## 2      10   542     3      12  183   182 -50.06 4059.70 -81.14061 36.52443    1
## 3     208  3616     6     260  204   174 -16.14 4043.76 -80.75312 36.40033    1
##   M_id                           geom  nwbir.FT  rates.FT
## 1    2 MULTIPOLYGON (((-81.47276 3...  7.561438 0.8562347
## 2    2 MULTIPOLYGON (((-81.23989 3...  9.602812 5.0692990
## 3    2 MULTIPOLYGON (((-80.45634 3... 16.975378 2.6794800

# Verify rates.FT is in the sf object
names(nc.sids.sf)

##  [1] "CNTY_ID"   "AREA"      "PERIMETER" "CNTY_"     "NAME"      "FIPS"     
##  [7] "FIPSNO"    "CRESS_ID"  "BIR74"     "SID74"     "NWBIR74"   "BIR79"    
## [13] "SID79"     "NWBIR79"   "east"      "north"     "x"         "y"        
## [19] "lon"       "lat"       "L_id"      "M_id"      "geom"      "nwbir.FT" 
## [25] "rates.FT"

Create Spatial Weights Matrix

# Create queen contiguity weights from the sf object
ncCC89.nb <- poly2nb(nc.sids.sf, queen = TRUE)

# Convert to listw object with row standardization (style = "W")
ncCC89.listw <- nb2listw(ncCC89.nb, style = "W")

# View the spatial weights object
ncCC89.listw

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 490 
## Percentage nonzero weights: 4.9 
## Average number of links: 4.9 
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0       S1       S2
## W 100 10000 100 44.65023 410.4746

# Summary information
summary(ncCC89.listw)

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 100 
## Number of nonzero links: 490 
## Percentage nonzero weights: 4.9 
## Average number of links: 4.9 
## Link number distribution:
## 
##  2  3  4  5  6  7  8  9 
##  8 15 17 23 19 14  2  2 
## 8 least connected regions:
## 4 21 45 56 77 80 90 99 with 2 links
## 2 most connected regions:
## 39 67 with 9 links
## 
## Weights style: W 
## Weights constants summary:
##     n    nn  S0       S1       S2
## W 100 10000 100 44.65023 410.4746

SAR (Spatial Autoregressive Error) Model

Model Specification

The SAR error model (estimated using spautolm() with family = "SAR") is specified as:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} \]

\[ \mathbf{u} = \lambda \mathbf{W} \mathbf{u} + \boldsymbol{\varepsilon} \]

\[ \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \]

where: - \(\mathbf{y}\) = vector of transformed overall birth rates (\(\text{rates.FT}\)) - \(\mathbf{X}\) = design matrix containing the intercept and \(\text{nwbir.FT}\) - \(\boldsymbol{\beta}\) = vector of regression coefficients - \(\mathbf{u}\) = spatial error term - \(\lambda\) = spatial autoregressive parameter (measures strength of spatial dependence) - \(\mathbf{W}\) = spatial weights matrix (\(\text{ncCC89.listw}\)) - \(\boldsymbol{\varepsilon}\) = i.i.d. normally distributed error term

Reduced Form

The model can be expressed in reduced form as:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + (\mathbf{I} - \lambda \mathbf{W})^{-1} \boldsymbol{\varepsilon} \]

Log-Likelihood

The log-likelihood function for the SAR error model is:

\[ \ell(\boldsymbol{\beta}, \lambda, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) + \ln|\mathbf{I} - \lambda\mathbf{W}| - \frac{1}{2\sigma^2} \left[ (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top (\mathbf{I} - \lambda\mathbf{W})^\top (\mathbf{I} - \lambda\mathbf{W}) (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right] \]

Estimation

# Estimate SAR error model
nc.sids.sar.out <- spautolm(rates.FT ~ nwbir.FT, 
                           data = nc.sids, 
                           family = "SAR", 
                           listw = ncCC89.listw,
                           zero.policy = TRUE)

# View results
summary(nc.sids.sar.out)

## 
## Call: 
## spautolm(formula = rates.FT ~ nwbir.FT, data = nc.sids, listw = ncCC89.listw, 
##     family = "SAR", zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.310193 -0.456265  0.042259  0.512063  2.555381 
## 
## Coefficients: 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.5481233  0.2615757  9.7414   <2e-16
## nwbir.FT    0.0110715  0.0073343  1.5095   0.1312
## 
## Lambda: 0.24579 LR test value: 3.0723 p-value: 0.079635 
## Numerical Hessian standard error of lambda: 0.1356 
## 
## Log likelihood: -123.2626 
## ML residual variance (sigma squared): 0.67958, (sigma: 0.82437)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 254.53

# Extract fitted values and add to both data frames
nc.sids$fitted.sar <- fitted(nc.sids.sar.out)
nc.sids.sf$fitted.sar <- fitted(nc.sids.sar.out)

SAR Model Interpretation

The SAR error model accounts for spatial autocorrelation in the error term:

\(\beta_1\): Effect of non-white birth rates on overall birth rates, holding spatial effects constant
\(\lambda\): Strength of spatial autocorrelation in the errors; \(|\lambda| < 1\) for stationarity
\(\mathbf{W}\mathbf{u}\): Spatially weighted average of neighboring error terms

The fitted values are:

\[ \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\lambda} \mathbf{W} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \]

CAR (Conditional Autoregressive) Model

Model Specification

The CAR model (estimated using spautolm() with family = "CAR") is specified as:

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u} \]

\[ \mathbf{u} \sim \mathcal{N}(\mathbf{0}, \sigma^2 (\mathbf{I} - \rho \mathbf{W})^{-1}) \]

or equivalently:

\[ (\mathbf{I} - \rho \mathbf{W})\mathbf{u} = \boldsymbol{\varepsilon} \]

\[ \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \]

where: - \(\rho\) = spatial autocorrelation parameter (CAR parameter) - All other parameters are as defined in the SAR model

Conditional Interpretation

For each county \(i\), the conditional distribution of its error term given all other counties’ errors is:

\[ u_i | \mathbf{u}_{-i} \sim \mathcal{N}\left(\rho \sum_{j=1}^n w_{ij} u_j,\ \sigma^2\right) \]

This means: - The expected value of \(u_i\) given neighboring errors is \(\rho\) times the weighted average of neighboring errors - The conditional variance is constant (\(\sigma^2\)) for all counties - The spatial dependence is local: each county’s error depends directly on its neighbors

Log-Likelihood

The log-likelihood for the CAR model is:

\[ \ell(\boldsymbol{\beta}, \rho, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) + \frac{1}{2}\ln|\mathbf{I} - \rho\mathbf{W}| - \frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top (\mathbf{I} - \rho\mathbf{W}) (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \]

Note the Jacobian term is \(\frac{1}{2}\ln|\mathbf{I} - \rho\mathbf{W}|\) (half the log-determinant), unlike the SAR model which has \(\ln|\mathbf{I} - \lambda\mathbf{W}|\).

Estimation

# Estimate CAR model
nc.sids.car.out <- spautolm(rates.FT ~ nwbir.FT, 
                           data = nc.sids, 
                           family = "CAR", 
                           listw = ncCC89.listw,
                           zero.policy = TRUE)

# View results
summary(nc.sids.car.out)

## 
## Call: 
## spautolm(formula = rates.FT ~ nwbir.FT, data = nc.sids, listw = ncCC89.listw, 
##     family = "CAR", zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.334312 -0.423111  0.031383  0.502847  2.718650 
## 
## Coefficients: 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.4234930  0.2605823  9.3003  < 2e-16
## nwbir.FT    0.0148474  0.0073079  2.0317  0.04218
## 
## Lambda: 0.43308 LR test value: 2.9066 p-value: 0.088217 
## Numerical Hessian standard error of lambda: 0.24786 
## 
## Log likelihood: -123.3455 
## ML residual variance (sigma squared): 0.67449, (sigma: 0.82127)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 254.69

# Extract fitted values and add to both data frames
nc.sids$fitted.car <- fitted(nc.sids.car.out)
nc.sids.sf$fitted.car <- fitted(nc.sids.car.out)

Comparison: SAR vs CAR

Key Differences

Aspect	SAR Error	CAR
Error Structure	\(\mathbf{u} = \lambda \mathbf{W}\mathbf{u} + \boldsymbol{\varepsilon}\)	\((\mathbf{I} - \rho \mathbf{W})\mathbf{u} = \boldsymbol{\varepsilon}\)
Variance-Covariance	\(\sigma^2 (\mathbf{I} - \lambda \mathbf{W})^{-1} (\mathbf{I} - \lambda \mathbf{W}^\top)^{-1}\)	\(\sigma^2 (\mathbf{I} - \rho \mathbf{W})^{-1}\)
Symmetric Covariance?	No (unless \(\mathbf{W}\) is symmetric)	Yes (if \(\mathbf{W}\) is symmetric)
Interpretation	Global spatial spillover	Local conditional dependence
Jacobian Term	\(\ln\\|\mathbf{I} - \lambda\mathbf{W}\\|\)	\(\frac{1}{2}\ln\\|\mathbf{I} - \rho\mathbf{W}\\|\)

When to Use Each Model

Model	Best Used When
SAR	• Global spatial spillovers are expected • Shocks propagate through the entire system • Asymmetric or row-standardized weights are used • Spatial dependence in omitted variables
CAR	• Local spatial dependence is expected • Neighboring areas directly influence each other • Symmetric weights are appropriate • Conditional interpretation is meaningful

Model Comparison

# Compare coefficients
sar_coef <- summary(nc.sids.sar.out)$Coef
car_coef <- summary(nc.sids.car.out)$Coef

print("SAR Coefficients:")

## [1] "SAR Coefficients:"

print(sar_coef)

##               Estimate  Std. Error  z value  Pr(>|z|)
## (Intercept) 2.54812325 0.261575735 9.741436 0.0000000
## nwbir.FT    0.01107147 0.007334329 1.509541 0.1311605

print("CAR Coefficients:")

## [1] "CAR Coefficients:"

print(car_coef)

##               Estimate  Std. Error  z value   Pr(>|z|)
## (Intercept) 2.42349295 0.260582294 9.300298 0.00000000
## nwbir.FT    0.01484736 0.007307885 2.031691 0.04218499

# Compare spatial parameters
cat("SAR lambda:", nc.sids.sar.out$lambda, "\n")

## SAR lambda: 0.2457948

cat("CAR rho:", nc.sids.car.out$lambda, "\n")

## CAR rho: 0.4330803

# Compare log-likelihoods
cat("SAR log-likelihood:", logLik(nc.sids.sar.out), "\n")

## SAR log-likelihood: -123.2626

cat("CAR log-likelihood:", logLik(nc.sids.car.out), "\n")

## CAR log-likelihood: -123.3455

Visualization

Simple Base R Plots

# Define color schemes
color.code.raw <- colorRampPalette(c("lightyellow", "orange", "red"))(20)
color.code.sar <- colorRampPalette(c("lightblue", "steelblue", "darkblue"))(20)
color.code.car <- colorRampPalette(c("lightgreen", "forestgreen", "darkgreen"))(20)

# Set up layout
par(mfrow = c(3, 1), oma = c(0, 0, 4, 0) + 0.1, mar = c(0, 0, 1, 0) + 0.1)

# Plot 1: Raw rates
plot(nc.sids.sf[, "rates.FT"], 
     pal = colorRampPalette(c("lightyellow", "orange", "red")),
     main = "Raw Birth Rates (FT Transformation)",
     border = "gray40",
     lwd = 0.5,
     key.pos = 4)

# Plot 2: Fitted SAR values
plot(nc.sids.sf[, "fitted.sar"], 
     pal = colorRampPalette(c("lightblue", "steelblue", "darkblue")),
     main = "Fitted SAR Values",
     border = "gray40",
     lwd = 0.5,
     key.pos = 4)

# Plot 3: Fitted CAR values
plot(nc.sids.sf[, "fitted.car"], 
     pal = colorRampPalette(c("lightgreen", "forestgreen", "darkgreen")),
     main = "Fitted CAR Values",
     border = "gray40",
     lwd = 0.5,
     key.pos = 4)

# Add overall title
mtext("North Carolina SIDS Analysis: SAR vs CAR Models", 
      outer = TRUE, cex = 1.5, line = 1)

Alternative: ggplot2 Visualization

# Create individual plots
p1 <- ggplot(nc.sids.sf) +
  geom_sf(aes(fill = rates.FT), color = "black", size = 0.1) +
  scale_fill_gradientn(colors = c("lightyellow", "orange", "red"),
                       name = "Rates.FT") +
  theme_minimal() +
  ggtitle("Raw Birth Rates (FT Transformation)")

p2 <- ggplot(nc.sids.sf) +
  geom_sf(aes(fill = fitted.sar), color = "black", size = 0.1) +
  scale_fill_gradientn(colors = c("lightblue", "steelblue", "darkblue"),
                       name = "Fitted SAR") +
  theme_minimal() +
  ggtitle("Fitted SAR Values")

p3 <- ggplot(nc.sids.sf) +
  geom_sf(aes(fill = fitted.car), color = "black", size = 0.1) +
  scale_fill_gradientn(colors = c("lightgreen", "forestgreen", "darkgreen"),
                       name = "Fitted CAR") +
  theme_minimal() +
  ggtitle("Fitted CAR Values")

# Arrange plots
grid.arrange(p1, p2, p3, ncol = 1,
             top = "North Carolina SIDS Analysis: SAR vs CAR Models")

Additional Models for Comparison

Spatial Lag Model

# Estimate Spatial Lag Model
nc.sids.lagsar.out <- lagsarlm(rates.FT ~ nwbir.FT, 
                              data = nc.sids, 
                              listw = ncCC89.listw,
                              zero.policy = TRUE)

summary(nc.sids.lagsar.out)

## 
## Call:
## lagsarlm(formula = rates.FT ~ nwbir.FT, data = nc.sids, listw = ncCC89.listw, 
##     zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.310957 -0.469092  0.051219  0.499487  2.553309 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.9179079  0.4110306  4.6661 3.07e-06
## nwbir.FT    0.0088023  0.0061509  1.4311   0.1524
## 
## Rho: 0.24179, LR test value: 2.9108, p-value: 0.087987
## Asymptotic standard error: 0.13349
##     z-value: 1.8113, p-value: 0.070096
## Wald statistic: 3.2808, p-value: 0.070096
## 
## Log likelihood: -123.3433 for lag model
## ML residual variance (sigma squared): 0.68099, (sigma: 0.82522)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 254.69, (AIC for lm: 255.6)
## LM test for residual autocorrelation
## test value: 0.82615, p-value: 0.36339

Spatial Error Model (Alternative Implementation)

# Estimate Spatial Error Model using errorsarlm
nc.sids.errorsar.out <- errorsarlm(rates.FT ~ nwbir.FT, 
                                  data = nc.sids, 
                                  listw = ncCC89.listw, 
                                  zero.policy = TRUE)

summary(nc.sids.errorsar.out)

## 
## Call:
## errorsarlm(formula = rates.FT ~ nwbir.FT, data = nc.sids, listw = ncCC89.listw, 
##     zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -2.310193 -0.456265  0.042259  0.512063  2.555381 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.5481233  0.2615757  9.7414   <2e-16
## nwbir.FT    0.0110715  0.0073343  1.5095   0.1312
## 
## Lambda: 0.24579, LR test value: 3.0723, p-value: 0.079635
## Asymptotic standard error: 0.13382
##     z-value: 1.8367, p-value: 0.066256
## Wald statistic: 3.3734, p-value: 0.066256
## 
## Log likelihood: -123.2626 for error model
## ML residual variance (sigma squared): 0.67958, (sigma: 0.82437)
## Number of observations: 100 
## Number of parameters estimated: 4 
## AIC: 254.53, (AIC for lm: 255.6)

Model Selection

# Compare AIC values
models <- list(
  SAR = nc.sids.sar.out,
  CAR = nc.sids.car.out,
  Lag = nc.sids.lagsar.out,
  Error = nc.sids.errorsar.out
)

# Extract AIC for each model
aic_values <- sapply(models, AIC)
aic_values

##      SAR      CAR      Lag    Error 
## 254.5252 254.6909 254.6867 254.5252

# Best model (lowest AIC)
best_model <- names(which.min(aic_values))
cat("Best model by AIC:", best_model)

## Best model by AIC: Error

Summary of Model Parameters

# Create summary table
parameter_summary <- data.frame(
  Model = c("SAR", "CAR", "Lag", "Error"),
  Spatial_Parameter = c(nc.sids.sar.out$lambda,
                        nc.sids.car.out$lambda,
                        nc.sids.lagsar.out$rho,
                        nc.sids.errorsar.out$lambda),
  AIC = aic_values,
  LogLik = sapply(models, logLik)
)

parameter_summary

##       Model Spatial_Parameter      AIC    LogLik
## SAR     SAR         0.2457948 254.5252 -123.2626
## CAR     CAR         0.4330803 254.6909 -123.3455
## Lag     Lag         0.2417853 254.6867 -123.3433
## Error Error         0.2457948 254.5252 -123.2626

Conclusion

Both SAR and CAR models provide spatial alternatives to ordinary least squares regression for analyzing North Carolina birth rate data. The key distinction is:

SAR (Spatial Autoregressive Error): Models global spatial spillovers in the error term
CAR (Conditional Autoregressive): Models local conditional dependence

The choice between them depends on the nature of the spatial process being modeled and the structure of the spatial weights matrix. Model selection criteria (AIC) can help guide this choice.

Spatial Autoregressive (SAR) and Conditional Autoregressive (CAR) Models for North Carolina Birth Rates

2026-07-04

Introduction

Load Required Libraries

Load and Prepare Data

Create Transformed Variables

Freeman-Tukey Transformation for Non-White Birth Rates

Freeman-Tukey Transformation for Overall Birth Rates

Convert to sf Object

Create Spatial Weights Matrix

SAR (Spatial Autoregressive Error) Model

Model Specification

Reduced Form

Log-Likelihood

Estimation

SAR Model Interpretation

CAR (Conditional Autoregressive) Model

Model Specification

Conditional Interpretation

Log-Likelihood

Estimation

Comparison: SAR vs CAR

Key Differences

When to Use Each Model

Model Comparison

Visualization

Simple Base R Plots

Alternative: ggplot2 Visualization

Additional Models for Comparison

Spatial Lag Model

Spatial Error Model (Alternative Implementation)

Model Selection

Summary of Model Parameters

Conclusion