Ejercicio Regresión Lineal Espacial

El propósito de esta actividad es poner en práctica lo estudiado hasta ahora en los temas de Análisis Exploratorio de Datos Espaciales, así como Modelos de Regresión Lineal Espaciales.

Primeramente aquí se proporcionan las librerías principales que deberás considerar. De todas maneras si deseas agregar cualquier otra librería lo puedes hacer y recomendaría hacerlo de una vez dentro del siguiente chunk de código:

#install.packages("quantmod")
#install.packages("forecast")
#install.packages("tseries")
#install.packages("corrplot")
#install.packages("leaflet.extras2")
#install.packages("tinytex")
#tinytex::install_tinytex()
library("leaflet.extras2")
library(corrplot)
library(tseries)
library(forecast)
library(quantmod)
library(foreign)
library(dplyr)
library(spdep)
library(tigris)
library(rgeoda)
library(RColorBrewer)
library(viridis)
library(ggplot2)
library(tmap)
library(sf) 
library(sp)
library(spatialreg)
library(stargazer)
library(plm)
library(splm)
library(pspatreg)
library(regclass)
library(mctest)
library(lmtest)
library(spData)
library(mapview)
library(naniar)
library(dlookr)
library(caret)
library(e1071)
library(SparseM)
library(Metrics)
library(randomForest)
library(rpart.plot)
library(insight)
library(jtools)
library(xgboost)
library(DiagrammeR)
library(effects)
library(leaflet)

A continuación se carga la base de datos a ser analizada: ‘boston’ de la librería ‘spData’

data(boston, package="spData")
head(boston.c)

##         TOWN TOWNNO TRACT      LON     LAT MEDV CMEDV    CRIM ZN INDUS CHAS
## 1     Nahant      0  2011 -70.9550 42.2550 24.0  24.0 0.00632 18  2.31    0
## 2 Swampscott      1  2021 -70.9500 42.2875 21.6  21.6 0.02731  0  7.07    0
## 3 Swampscott      1  2022 -70.9360 42.2830 34.7  34.7 0.02729  0  7.07    0
## 4 Marblehead      2  2031 -70.9280 42.2930 33.4  33.4 0.03237  0  2.18    0
## 5 Marblehead      2  2032 -70.9220 42.2980 36.2  36.2 0.06905  0  2.18    0
## 6 Marblehead      2  2033 -70.9165 42.3040 28.7  28.7 0.02985  0  2.18    0
##     NOX    RM  AGE    DIS RAD TAX PTRATIO      B LSTAT
## 1 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98
## 2 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14
## 3 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03
## 4 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94
## 5 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33
## 6 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21

glimpse(boston.c)

## Rows: 506
## Columns: 20
## $ TOWN    <fct> Nahant, Swampscott, Swampscott, Marblehead, Marblehead, Marble…
## $ TOWNNO  <int> 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ TRACT   <int> 2011, 2021, 2022, 2031, 2032, 2033, 2041, 2042, 2043, 2044, 20…
## $ LON     <dbl> -70.9550, -70.9500, -70.9360, -70.9280, -70.9220, -70.9165, -7…
## $ LAT     <dbl> 42.2550, 42.2875, 42.2830, 42.2930, 42.2980, 42.3040, 42.2970,…
## $ MEDV    <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…
## $ CMEDV   <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 22.1, 16.5, 18.9, 15…
## $ CRIM    <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
## $ ZN      <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
## $ INDUS   <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
## $ CHAS    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ NOX     <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
## $ RM      <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
## $ AGE     <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
## $ DIS     <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
## $ RAD     <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ TAX     <int> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
## $ PTRATIO <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
## $ B       <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90…
## $ LSTAT   <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…

?boston

## starting httpd help server ... done

Seguidamente, se cargan las geocercas asociadas a la data transversal previamente cargada

boston.tr<-st_read(system.file("shapes/boston_tracts.gpkg", package="spData")[1])

## Reading layer `boston_tracts' from data source 
##   `C:\Users\jmpo2\AppData\Local\R\win-library\4.5\spData\shapes\boston_tracts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 506 features and 36 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -71.52311 ymin: 42.00305 xmax: -70.63823 ymax: 42.67307
## Geodetic CRS:  NAD27

Ejemplo de primer visualización en Mapas utilizando los datos ya cargados y visualizando los valores utilizando una variedad de paletas de colores.

boston.trSP<-as(boston.tr, "Spatial")
boston_nb<-poly2nb(boston.trSP)
mapview(boston.trSP, zcol="MEDV", col.regions = colorRampPalette(c('lightgrey', 'darkblue')))

mapview(boston.trSP, zcol="MEDV", col.regions = brewer.pal(7,'Blues'))

## Warning: Found less unique colors (7) than unique zcol values (229)! 
## Interpolating color vector to match number of zcol values.

mapview(boston.trSP, zcol="MEDV", col.regions = viridisLite::magma(20))

## Warning: Found less unique colors (20) than unique zcol values (229)! 
## Interpolating color vector to match number of zcol values.

?mapview
?colorRampPalette
?brewer.pal
?viridis

# Carga de geocercas
boston.tr <- st_read(system.file("shapes/boston_tracts.gpkg", package = "spData")[1])

## Reading layer `boston_tracts' from data source 
##   `C:\Users\jmpo2\AppData\Local\R\win-library\4.5\spData\shapes\boston_tracts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 506 features and 36 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -71.52311 ymin: 42.00305 xmax: -70.63823 ymax: 42.67307
## Geodetic CRS:  NAD27

# Convertir a Spatial
boston.trSP <- as(boston.tr, "Spatial")

# Unir datos boston.c al shapefile
boston.tr <- cbind(boston.tr, boston.c)

# ANÁLISIS EXPLORATORIO DE DATOS TRADICIONAL (EDA)

# --- Histograma de CRIM ---
ggplot(boston.c, aes(x = CRIM)) +
  geom_histogram(fill = "steelblue", color = "white", bins = 40) +
  labs(title = "Distribución de CRIM", x = "Tasa de Criminalidad", y = "Frecuencia") +
  theme_minimal()

# Histograma con transformación log (CRIM es muy sesgada)
ggplot(boston.c, aes(x = log(CRIM + 1))) +
  geom_histogram(fill = "tomato", color = "white", bins = 40) +
  labs(title = "Distribución de log(CRIM+1)", x = "log(CRIM+1)", y = "Frecuencia") +
  theme_minimal()

# Histogramas de variables independientes ---
vars_indep <- c("ZN","INDUS","NOX","RM","AGE","DIS","RAD","TAX","PTRATIO","B","LSTAT")

boston.c %>%
  select(all_of(vars_indep)) %>%
  tidyr::pivot_longer(everything()) %>%
  ggplot(aes(x = value)) +
  geom_histogram(fill = "steelblue", color = "white", bins = 30) +
  facet_wrap(~name, scales = "free") +
  labs(title = "Histogramas de Variables Independientes") +
  theme_minimal()

# --- Boxplots ---
boston.c %>%
  select(CRIM, all_of(vars_indep)) %>%
  tidyr::pivot_longer(-CRIM) %>%
  ggplot(aes(x = name, y = value)) +
  geom_boxplot(fill = "lightblue") +
  facet_wrap(~name, scales = "free") +
  labs(title = "Boxplots de Variables") +
  theme_minimal()

# Boxplot de CRIM
ggplot(boston.c, aes(y = CRIM)) +
  geom_boxplot(fill = "tomato") +
  labs(title = "Boxplot de CRIM") +
  theme_minimal()

# --- Matriz de Correlaciones ---
boston_num <- boston.c %>%
  select(CRIM, ZN, INDUS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT) %>%
  mutate(across(everything(), as.numeric))

cor_matrix <- cor(boston_num, use = "complete.obs")
corrplot(cor_matrix, method = "color", type = "upper",
         tl.cex = 0.8, addCoef.col = "black", number.cex = 0.6,
         title = "Matriz de Correlaciones", mar = c(0,0,1,0))

# ANÁLISIS EXPLORATORIO DE DATOS ESPACIALES (ESDA)

# --- Matriz de conectividad espacial (estructura Reina) ---
boston_nb <- poly2nb(boston.trSP, queen = TRUE)   # vecinos Reina
boston_lw <- nb2listw(boston_nb, style = "W", zero.policy = TRUE)

# Graficar la matriz de conectividad
plot(boston.trSP, border = "grey60", main = "Matriz de Conectividad Espacial (Reina)")
plot(boston_nb, coordinates(boston.trSP), add = TRUE, col = "red", lwd = 0.5)

# --- Mapas coloreados ---
# CRIM
mapview(boston.trSP, zcol = "CRIM",
        col.regions = viridisLite::magma(20),
        main = "Mapa de CRIM")

## Warning: Found less unique colors (20) than unique zcol values (504)! 
## Interpolating color vector to match number of zcol values.

# NOX (variable independiente 1)
mapview(boston.trSP, zcol = "NOX",
        col.regions = colorRampPalette(c("lightgreen", "darkred"))(20),
        main = "Mapa de NOX")

## Warning: Found less unique colors (20) than unique zcol values (81)! 
## Interpolating color vector to match number of zcol values.

# LSTAT (variable independiente 2)
mapview(boston.trSP, zcol = "LSTAT",
        col.regions = colorRampPalette(c("lightyellow", "darkblue"))(20),
        main = "Mapa de LSTAT")

## Warning: Found less unique colors (20) than unique zcol values (455)! 
## Interpolating color vector to match number of zcol values.

# --- Lag espacial de CRIM ---
boston.trSP$CRIM_lag <- lag.listw(boston_lw, boston.trSP$CRIM, zero.policy = TRUE)

# Mapa original vs lag espacial
p1 <- mapview(boston.trSP, zcol = "CRIM",
              col.regions = viridisLite::magma(20),
              layer.name = "CRIM Original")

## Warning: Found less unique colors (20) than unique zcol values (504)! 
## Interpolating color vector to match number of zcol values.

p2 <- mapview(boston.trSP, zcol = "CRIM_lag",
              col.regions = viridisLite::magma(20),
              layer.name = "CRIM Lag Espacial")

## Warning: Found less unique colors (20) than unique zcol values (506)! 
## Interpolating color vector to match number of zcol values.

p1 | p2   # visualizar lado a lado

# --- Clusters Espaciales LISA (HotSpots, ColdSpots, HL, LH) ---
# Usando rgeoda
boston_sf <- st_as_sf(boston.trSP)
queen_w   <- queen_weights(boston_sf)

lisa <- local_moran(queen_w, boston_sf["CRIM"])

# Extraer clasificación de clusters
boston_sf$lisa_cluster <- as.factor(lisa_clusters(lisa))
boston_sf$lisa_pval    <- lisa_pvalues(lisa)

# Etiquetas de clusters
levels(boston_sf$lisa_cluster) <- c("Not Significant","High-High (HotSpot)",
                                     "Low-Low (ColdSpot)","Low-High (Atípico)",
                                     "High-Low (Atípico)","Isolated")

# Mapa de clusters LISA
ggplot(boston_sf) +
  geom_sf(aes(fill = lisa_cluster), color = NA) +
  scale_fill_manual(values = c("Not Significant" = "grey80",
                               "High-High (HotSpot)"  = "red",
                               "Low-Low (ColdSpot)"   = "blue",
                               "Low-High (Atípico)"   = "lightblue",
                               "High-Low (Atípico)"   = "pink",
                               "Isolated"             = "white")) +
  labs(title = "Clusters LISA - CRIM", fill = "Tipo de Cluster") +
  theme_minimal()

# --- Índice Global de Moran ---
# CRIM
moran_CRIM <- moran.test(boston.trSP$CRIM, boston_lw, zero.policy = TRUE)
print(moran_CRIM)

## 
##  Moran I test under randomisation
## 
## data:  boston.trSP$CRIM  
## weights: boston_lw    
## 
## Moran I statistic standard deviate = 20.398, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.5274633075     -0.0019801980      0.0006736983

moran.plot(boston.trSP$CRIM, boston_lw, zero.policy = TRUE,
           main = "Diagrama de Dispersión de Moran - CRIM",
           xlab = "CRIM", ylab = "Lag espacial de CRIM")

# NOX
moran_NOX <- moran.test(boston.trSP$NOX, boston_lw, zero.policy = TRUE)
print(moran_NOX)

## 
##  Moran I test under randomisation
## 
## data:  boston.trSP$NOX  
## weights: boston_lw    
## 
## Moran I statistic standard deviate = 33.692, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.9065225426     -0.0019801980      0.0007271085

moran.plot(boston.trSP$NOX, boston_lw, zero.policy = TRUE,
           main = "Diagrama de Dispersión de Moran - NOX",
           xlab = "NOX", ylab = "Lag espacial de NOX")

# LSTAT
moran_LSTAT <- moran.test(boston.trSP$LSTAT, boston_lw, zero.policy = TRUE)
print(moran_LSTAT)

## 
##  Moran I test under randomisation
## 
## data:  boston.trSP$LSTAT  
## weights: boston_lw    
## 
## Moran I statistic standard deviate = 25.841, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.6944265724     -0.0019801980      0.0007263073

moran.plot(boston.trSP$LSTAT, boston_lw, zero.policy = TRUE,
           main = "Diagrama de Dispersión de Moran - LSTAT",
           xlab = "LSTAT", ylab = "Lag espacial de LSTAT")

# MODELOS DE REGRESIÓN
# ============================================================

# Fórmula base (variables con mayor correlación con CRIM)
formula_crim <- CRIM ~ ZN + INDUS + NOX + RM + AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT

# --- 1. Modelo de Regresión Lineal Tradicional (OLS) ---
modelo_ols <- lm(formula_crim, data = boston.c)
summary(modelo_ols)

## 
## Call:
## lm(formula = formula_crim, data = boston.c)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.673  -1.854  -0.329   0.911  77.645 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.8100896  6.9788620   1.406 0.160446    
## ZN           0.0361869  0.0187438   1.931 0.054104 .  
## INDUS       -0.0792771  0.0837688  -0.946 0.344418    
## NOX         -7.1418051  5.2229588  -1.367 0.172126    
## RM          -0.3555174  0.5723546  -0.621 0.534788    
## AGE          0.0003225  0.0180835   0.018 0.985777    
## DIS         -0.7059136  0.2715103  -2.600 0.009603 ** 
## RAD          0.5293735  0.0872206   6.069 2.56e-09 ***
## TAX         -0.0006638  0.0051165  -0.130 0.896833    
## PTRATIO     -0.0648908  0.1785141  -0.364 0.716383    
## B           -0.0098300  0.0036501  -2.693 0.007321 ** 
## LSTAT        0.2408326  0.0685421   3.514 0.000483 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.504 on 494 degrees of freedom
## Multiple R-squared:  0.4406, Adjusted R-squared:  0.4282 
## F-statistic: 35.38 on 11 and 494 DF,  p-value: < 2.2e-16

# Diagnóstico de multicolinealidad
VIF(modelo_ols)

##       ZN    INDUS      NOX       RM      AGE      DIS      RAD      TAX 
## 2.281047 3.942099 4.372226 1.930358 3.092830 3.901573 6.884474 8.875931 
##  PTRATIO        B    LSTAT 
## 1.782825 1.325511 2.859635

# Test de Lagrange (para decidir si se necesita modelo espacial)
lm.LMtests(modelo_ols, boston_lw, test = c("LMlag","LMerr","RLMlag","RLMerr"), zero.policy = TRUE)

## Please update scripts to use lm.RStests in place of lm.LMtests

## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = formula_crim, data = boston.c)
## test weights: listw
## 
## RSlag = 20.205, df = 1, p-value = 6.957e-06
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = formula_crim, data = boston.c)
## test weights: listw
## 
## RSerr = 16.498, df = 1, p-value = 4.869e-05
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = formula_crim, data = boston.c)
## test weights: listw
## 
## adjRSlag = 3.7273, df = 1, p-value = 0.05353
## 
## 
##  Rao's score (a.k.a Lagrange multiplier) diagnostics for spatial
##  dependence
## 
## data:  
## model: lm(formula = formula_crim, data = boston.c)
## test weights: listw
## 
## adjRSerr = 0.02055, df = 1, p-value = 0.886

# --- 2. Modelo SAR (Spatial AutoRegressive / Spatial Lag) ---
modelo_sar <- lagsarlm(formula_crim, data = boston.c,
                        listw = boston_lw, zero.policy = TRUE)
summary(modelo_sar)

## 
## Call:lagsarlm(formula = formula_crim, data = boston.c, listw = boston_lw, 
##     zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -11.55301  -1.71272  -0.22048   0.82210  77.35354 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 11.0867205  6.7318544  1.6469  0.099578
## ZN           0.0287898  0.0180996  1.5906  0.111692
## INDUS       -0.0743519  0.0806996 -0.9213  0.356872
## NOX         -6.0621729  5.0319625 -1.2047  0.228306
## RM          -0.5142855  0.5515005 -0.9325  0.351068
## AGE         -0.0010008  0.0174250 -0.0574  0.954198
## DIS         -0.6466768  0.2618382 -2.4698  0.013520
## RAD          0.4261737  0.0867539  4.9124 8.995e-07
## TAX         -0.0012364  0.0049362 -0.2505  0.802225
## PTRATIO     -0.0880795  0.1722070 -0.5115  0.609019
## B           -0.0085584  0.0035183 -2.4325  0.014993
## LSTAT        0.1835554  0.0663340  2.7671  0.005655
## 
## Rho: 0.26439, LR test value: 18.937, p-value: 1.3512e-05
## Asymptotic standard error: 0.058308
##     z-value: 4.5343, p-value: 5.7793e-06
## Wald statistic: 20.56, p-value: 5.7793e-06
## 
## Log likelihood: -1649.921 for lag model
## ML residual variance (sigma squared): 39.264, (sigma: 6.2661)
## Number of observations: 506 
## Number of parameters estimated: 14 
## AIC: 3327.8, (AIC for lm: 3344.8)
## LM test for residual autocorrelation
## test value: 0.041176, p-value: 0.8392

modelo_sem <- errorsarlm(formula_crim, data = boston.c,
                          listw = boston_lw, zero.policy = TRUE)
summary(modelo_sem)

## 
## Call:errorsarlm(formula = formula_crim, data = boston.c, listw = boston_lw, 
##     zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.89696  -1.66243  -0.22936   0.78021  78.15911 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)  9.27120290  7.46602456  1.2418   0.21432
## ZN           0.03206513  0.01946820  1.6471   0.09955
## INDUS       -0.06346004  0.09089151 -0.6982   0.48505
## NOX         -5.33948275  5.91679698 -0.9024   0.36683
## RM          -0.59927801  0.56791286 -1.0552   0.29132
## AGE          0.00209713  0.01866801  0.1123   0.91056
## DIS         -0.65643626  0.29643072 -2.2145   0.02680
## RAD          0.50521289  0.09622242  5.2505 1.517e-07
## TAX          0.00070805  0.00540410  0.1310   0.89576
## PTRATIO     -0.02353069  0.20398776 -0.1154   0.90816
## B           -0.00854575  0.00388247 -2.2011   0.02773
## LSTAT        0.16976563  0.07077858  2.3985   0.01646
## 
## Lambda: 0.29025, LR test value: 16.846, p-value: 4.0537e-05
## Asymptotic standard error: 0.063387
##     z-value: 4.579, p-value: 4.6724e-06
## Wald statistic: 20.967, p-value: 4.6724e-06
## 
## Log likelihood: -1650.966 for error model
## ML residual variance (sigma squared): 39.314, (sigma: 6.2701)
## Number of observations: 506 
## Number of parameters estimated: 14 
## AIC: 3329.9, (AIC for lm: 3344.8)

# --- 4. Modelo Durbin Espacial (SDM) ---
modelo_sdm <- lagsarlm(formula_crim, data = boston.c,
                        listw = boston_lw, type = "mixed", zero.policy = TRUE)
summary(modelo_sdm)

## 
## Call:lagsarlm(formula = formula_crim, data = boston.c, listw = boston_lw, 
##     type = "mixed", zero.policy = TRUE)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -11.75671  -1.78495  -0.35528   0.87761  76.42245 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##                Estimate  Std. Error z value Pr(>|z|)
## (Intercept)  6.53744429 11.62917678  0.5622 0.574008
## ZN           0.02649404  0.02098596  1.2625 0.206781
## INDUS       -0.03660323  0.11243330 -0.3256 0.744761
## NOX         -1.69178022  8.04736221 -0.2102 0.833490
## RM          -0.86424230  0.57531659 -1.5022 0.133045
## AGE          0.00234894  0.01966913  0.1194 0.904941
## DIS         -0.58262333  0.37295224 -1.5622 0.118243
## RAD          0.41962145  0.13281391  3.1595 0.001581
## TAX          0.00183903  0.00603622  0.3047 0.760621
## PTRATIO      0.06059794  0.29149241  0.2079 0.835316
## B           -0.00830169  0.00440396 -1.8850 0.059423
## LSTAT        0.10125488  0.07453376  1.3585 0.174302
## lag.ZN       0.00733911  0.03610984  0.2032 0.838944
## lag.INDUS   -0.12237649  0.18358249 -0.6666 0.505026
## lag.NOX     -6.60851573 10.98116087 -0.6018 0.547304
## lag.RM       1.32930050  1.07446121  1.2372 0.216021
## lag.AGE     -0.01564653  0.03448381 -0.4537 0.650019
## lag.DIS     -0.12461843  0.56287763 -0.2214 0.824785
## lag.RAD      0.01064504  0.18875071  0.0564 0.955025
## lag.TAX     -0.00386754  0.01011986 -0.3822 0.702333
## lag.PTRATIO -0.19623108  0.39148759 -0.5012 0.616199
## lag.B       -0.00068754  0.00721380 -0.0953 0.924069
## lag.LSTAT    0.31590878  0.12622062  2.5028 0.012320
## 
## Rho: 0.25618, LR test value: 13.725, p-value: 0.00021164
## Asymptotic standard error: 0.064502
##     z-value: 3.9718, p-value: 7.1344e-05
## Wald statistic: 15.775, p-value: 7.1344e-05
## 
## Log likelihood: -1644.795 for mixed model
## ML residual variance (sigma squared): 38.509, (sigma: 6.2056)
## Number of observations: 506 
## Number of parameters estimated: 25 
## AIC: 3339.6, (AIC for lm: 3351.3)
## LM test for residual autocorrelation
## test value: 1.7502, p-value: 0.18586

# COMPARATIVA DE MODELOS

# AIC de cada modelo
aic_tabla <- data.frame(
  Modelo = c("OLS", "SAR", "SEM", "SDM"),
  AIC    = c(AIC(modelo_ols), AIC(modelo_sar), AIC(modelo_sem), AIC(modelo_sdm)),
  LogLik = c(logLik(modelo_ols), logLik(modelo_sar), logLik(modelo_sem), logLik(modelo_sdm))
)
print(aic_tabla)

##   Modelo      AIC    LogLik
## 1    OLS 3344.778 -1659.389
## 2    SAR 3327.841 -1649.921
## 3    SEM 3329.932 -1650.966
## 4    SDM 3339.591 -1644.795

# Tabla comparativa con stargazer (OLS como base)
stargazer(modelo_ols, type = "text",
          title = "Modelo OLS - Variable Dependiente: CRIM")

## 
## Modelo OLS - Variable Dependiente: CRIM
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                CRIM            
## -----------------------------------------------
## ZN                            0.036*           
##                               (0.019)          
##                                                
## INDUS                         -0.079           
##                               (0.084)          
##                                                
## NOX                           -7.142           
##                               (5.223)          
##                                                
## RM                            -0.356           
##                               (0.572)          
##                                                
## AGE                           0.0003           
##                               (0.018)          
##                                                
## DIS                          -0.706***         
##                               (0.272)          
##                                                
## RAD                          0.529***          
##                               (0.087)          
##                                                
## TAX                           -0.001           
##                               (0.005)          
##                                                
## PTRATIO                       -0.065           
##                               (0.179)          
##                                                
## B                            -0.010***         
##                               (0.004)          
##                                                
## LSTAT                        0.241***          
##                               (0.069)          
##                                                
## Constant                       9.810           
##                               (6.979)          
##                                                
## -----------------------------------------------
## Observations                    506            
## R2                             0.441           
## Adjusted R2                    0.428           
## Residual Std. Error      6.504 (df = 494)      
## F Statistic          35.375*** (df = 11; 494)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

# Moran test sobre residuales OLS (confirmar autocorrelación espacial)
moran.test(residuals(modelo_ols), boston_lw, zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  residuals(modelo_ols)  
## weights: boston_lw    
## 
## Moran I statistic standard deviate = 4.4652, p-value = 4.001e-06
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.1101394239     -0.0019801980      0.0006305078

# Moran test sobre residuales SAR y SEM (deben ser no significativos)
moran.test(residuals(modelo_sar), boston_lw, zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  residuals(modelo_sar)  
## weights: boston_lw    
## 
## Moran I statistic standard deviate = -0.01728, p-value = 0.5069
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -0.0024117281     -0.0019801980      0.0006236381

moran.test(residuals(modelo_sem), boston_lw, zero.policy = TRUE)

## 
##  Moran I test under randomisation
## 
## data:  residuals(modelo_sem)  
## weights: boston_lw    
## 
## Moran I statistic standard deviate = -0.059786, p-value = 0.5238
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##     -0.0034712310     -0.0019801980      0.0006219878

# Impacts del modelo SAR (efectos directos e indirectos)
impacts_sar <- impacts(modelo_sar, listw = boston_lw, R = 500, zero.policy = TRUE)
summary(impacts_sar, zstats = TRUE, short = TRUE)

## Impact measures (lag, exact):
##                     Direct      Indirect        Total
## ZN dy/dx       0.029188105  0.0099489387  0.039137044
## INDUS dy/dx   -0.075380607 -0.0256939271 -0.101074534
## NOX dy/dx     -6.146047006 -2.0949165912 -8.240963598
## RM dy/dx      -0.521400950 -0.1777226076 -0.699123557
## AGE dy/dx     -0.001014662 -0.0003458536 -0.001360516
## DIS dy/dx     -0.655623987 -0.2234733261 -0.879097313
## RAD dy/dx      0.432070059  0.1472736433  0.579343702
## TAX dy/dx     -0.001253468 -0.0004272521 -0.001680720
## PTRATIO dy/dx -0.089298102 -0.0304377878 -0.119735889
## B dy/dx       -0.008676860 -0.0029575592 -0.011634419
## LSTAT dy/dx    0.186095044  0.0634316000  0.249526644
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                    Direct    Indirect       Total
## ZN dy/dx      0.018257285 0.007558002 0.025145490
## INDUS dy/dx   0.081104292 0.031210318 0.110482998
## NOX dy/dx     5.167093102 2.019100184 7.047160387
## RM dy/dx      0.546516417 0.213010713 0.747009828
## AGE dy/dx     0.016781921 0.006115416 0.022700346
## DIS dy/dx     0.261355301 0.117075355 0.358921168
## RAD dy/dx     0.083747484 0.050499993 0.114706907
## TAX dy/dx     0.004960899 0.001873915 0.006778098
## PTRATIO dy/dx 0.172656658 0.066595333 0.236477354
## B dy/dx       0.003737074 0.001692151 0.005181033
## LSTAT dy/dx   0.068658142 0.030562938 0.093271632
## 
## Simulated z-values:
##                    Direct    Indirect      Total
## ZN dy/dx       1.51652200  1.28751642  1.4880850
## INDUS dy/dx   -0.98144166 -0.90718436 -0.9767353
## NOX dy/dx     -1.19349027 -1.08208861 -1.1851186
## RM dy/dx      -0.93491751 -0.86859526 -0.9316716
## AGE dy/dx     -0.08135033 -0.05177071 -0.0740876
## DIS dy/dx     -2.52123670 -1.96569388 -2.4770701
## RAD dy/dx      5.21197381  3.00434768  5.1279321
## TAX dy/dx     -0.28108575 -0.29363080 -0.2869061
## PTRATIO dy/dx -0.58836603 -0.55565367 -0.5860572
## B dy/dx       -2.38692661 -1.85537525 -2.3276624
## LSTAT dy/dx    2.71397423  2.11423406  2.6905676
## 
## Simulated p-values:
##               Direct     Indirect  Total     
## ZN dy/dx      0.1293874  0.1979144 0.1367285 
## INDUS dy/dx   0.3263750  0.3643093 0.3287002 
## NOX dy/dx     0.2326774  0.2792132 0.2359706 
## RM dy/dx      0.3498308  0.3850686 0.3515063 
## AGE dy/dx     0.9351634  0.9587114 0.9409407 
## DIS dy/dx     0.0116943  0.0493340 0.0132466 
## RAD dy/dx     1.8684e-07 0.0026615 2.9294e-07
## TAX dy/dx     0.7786446  0.7690400 0.7741843 
## PTRATIO dy/dx 0.5562866  0.5784476 0.5578371 
## B dy/dx       0.0169899  0.0635427 0.0199300 
## LSTAT dy/dx   0.0066481  0.0344953 0.0071331

# Impacts del modelo SDM
impacts_sdm <- impacts(modelo_sdm, listw = boston_lw, R = 500, zero.policy = TRUE)
summary(impacts_sdm, zstats = TRUE, short = TRUE)

## Impact measures (mixed, exact):
##                     Direct     Indirect         Total
## ZN dy/dx       0.027207013  0.018278938   0.045485951
## INDUS dy/dx   -0.043252902 -0.170482559  -0.213735461
## NOX dy/dx     -2.047189443 -9.111891505 -11.159080948
## RM dy/dx      -0.808326319  1.433559709   0.625233390
## AGE dy/dx      0.001589623 -0.019467158  -0.017877535
## DIS dy/dx     -0.596446057 -0.354383656  -0.950829713
## RAD dy/dx      0.425584319  0.152874404   0.578458723
## TAX dy/dx      0.001667608 -0.004394786  -0.002727178
## PTRATIO dy/dx  0.051477579 -0.233825438  -0.182347860
## B dy/dx       -0.008443725 -0.003641573  -0.012085298
## LSTAT dy/dx    0.118508149  0.442334868   0.560843017
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                    Direct    Indirect       Total
## ZN dy/dx      0.020347464  0.04758671 0.045980766
## INDUS dy/dx   0.102635809  0.21175451 0.182289685
## NOX dy/dx     7.991160256 12.39778953 9.842466219
## RM dy/dx      0.581827292  1.38611763 1.504578175
## AGE dy/dx     0.019892333  0.04376280 0.043839298
## DIS dy/dx     0.371170637  0.68491169 0.574923883
## RAD dy/dx     0.134119919  0.20864481 0.170998141
## TAX dy/dx     0.005683419  0.01197105 0.011161167
## PTRATIO dy/dx 0.278349322  0.43707172 0.342325677
## B dy/dx       0.004402845  0.00856256 0.007658235
## LSTAT dy/dx   0.074188184  0.15210142 0.151342794
## 
## Simulated z-values:
##                    Direct   Indirect      Total
## ZN dy/dx       1.37381509  0.3631384  0.9837639
## INDUS dy/dx   -0.53011894 -0.7618535 -1.1834739
## NOX dy/dx     -0.14944953 -0.8041200 -1.1342265
## RM dy/dx      -1.42368961  1.0864133  0.4503290
## AGE dy/dx      0.05579434 -0.4961726 -0.4699898
## DIS dy/dx     -1.70043264 -0.4784848 -1.6678217
## RAD dy/dx      3.12115466  0.7725775  3.3906994
## TAX dy/dx      0.24503007 -0.3409027 -0.2408668
## PTRATIO dy/dx  0.17087888 -0.5148393 -0.5183884
## B dy/dx       -1.94636634 -0.4281371 -1.5976917
## LSTAT dy/dx    1.54923432  2.9870422  3.7614493
## 
## Simulated p-values:
##               Direct    Indirect  Total     
## ZN dy/dx      0.1694991 0.7165015 0.32523163
## INDUS dy/dx   0.5960295 0.4461475 0.23662137
## NOX dy/dx     0.8811989 0.4213277 0.25669956
## RM dy/dx      0.1545363 0.2772962 0.65247323
## AGE dy/dx     0.9555056 0.6197726 0.63836233
## DIS dy/dx     0.0890496 0.6323052 0.09535113
## RAD dy/dx     0.0018014 0.4397725 0.00069715
## TAX dy/dx     0.8064332 0.7331768 0.80965834
## PTRATIO dy/dx 0.8643190 0.6066653 0.60418728
## B dy/dx       0.0516107 0.6685513 0.11011160
## LSTAT dy/dx   0.1213254 0.0028169 0.00016893

Similar a las exposiciones de las Actividades 01 y 02, esta se llevará a cabo al iniciar la clase del Martes 21 de Abril y para esto pueden utilizar únicamente el archivo HTML kniteado a partir de su código RMD.

Sin embargo, la entrega final deberá incluir tanto tu código completamente ejecutado en archivo HTML, así como una breve presentación en la que se mencionen claramente los principales patrones o insights detectados en su análisis, así como sus conclusiones. Los archivos se deberán subir en Canvas por equipo a más tardar el día Viernes 24 de Abril en la sección habilitada por el profesor para esto, junto con los archivos correspondientes a las actividades 01 y 02.