Spatial Linear Regression Exercise

The purpose of this activity is to put into practice what has been covered so far in Exploratory Spatial Data Analysis and in Spatial Linear Regression Models.

The main libraries you should consider are provided below. You may add any other library you need; it is recommended to load it right away in the following code chunk:

library(foreign)
library(dplyr)
library(spdep)
library(tigris)
library(rgeoda)
library(RColorBrewer)
library(viridis)
library(ggplot2)
library(tmap)
library(sf) 
library(sp)
library(spatialreg)
library(stargazer)
library(plm)
library(splm)
library(pspatreg)
library(regclass)
library(mctest)
library(lmtest)
library(spData)
library(mapview)
library(naniar)
library(dlookr)
library(caret)
library(e1071)
library(SparseM)
library(Metrics)
library(randomForest)
library(rpart.plot)
library(insight)
library(jtools)
library(xgboost)
library(DiagrammeR)
library(effects)
library(leaflet)
library(kableExtra)

Next, the dataset to be analyzed is loaded: ‘boston’ from the ‘spData’ library.

data(boston, package="spData")
bd <- boston.c
kable(head(boston.c))
## Warning in attr(x, "align"): 'xfun::attr()' is deprecated.
## Use 'xfun::attr2()' instead.
## See help("Deprecated")
## Warning in attr(x, "format"): 'xfun::attr()' is deprecated.
## Use 'xfun::attr2()' instead.
## See help("Deprecated")
TOWN TOWNNO TRACT LON LAT MEDV CMEDV CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
Nahant 0 2011 -70.9550 42.2550 24.0 24.0 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
Swampscott 1 2021 -70.9500 42.2875 21.6 21.6 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
Swampscott 1 2022 -70.9360 42.2830 34.7 34.7 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
Marblehead 2 2031 -70.9280 42.2930 33.4 33.4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
Marblehead 2 2032 -70.9220 42.2980 36.2 36.2 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
Marblehead 2 2033 -70.9165 42.3040 28.7 28.7 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21

Next, the polygon boundaries (geofences) associated with the cross-sectional data loaded above are read in.

# Spatial data
bd.tr<-st_read(system.file("shapes/boston_tracts.gpkg", package="spData")[1])
## Reading layer `boston_tracts' from data source 
##   `/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/spData/shapes/boston_tracts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 506 features and 36 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -71.52311 ymin: 42.00305 xmax: -70.63823 ymax: 42.67307
## Geodetic CRS:  NAD27

Example of a first map visualization using the data already loaded, displaying the values with a variety of color palettes.

bd.trSP<-as(bd.tr, "Spatial")
boston_nb<-poly2nb(bd.trSP, queen=T) 
mapview(bd.trSP, zcol="CMEDV", col.regions = viridisLite::magma(20))
## Warning: Found less unique colors (20) than unique zcol values (228)! 
## Interpolating color vector to match number of zcol values.

The main goal of your analysis is to explain the variable ‘CRIM’ or ‘CMEDV’, depending on the parity of your team number:
- Odd teams (1, 3, 5 and 7) will explain ‘CRIM’
- Even teams (2, 4, 6) will explain ‘CMEDV’ (in this case the variable ‘MEDV’ must be dropped from the dataset)

Carry out a brief traditional exploratory data analysis. This section should include:
- Histograms
- Correlation matrix
- Boxplots

bd.new <- bd.tr[, !(names(bd.tr) == "MEDV")]

Exploratory Analysis

summary(bd.new)
##    poltract             TOWN               TOWNNO          TRACT     
##  Length:506         Length:506         Min.   : 0.00   Min.   :   1  
##  Class :character   Class :character   1st Qu.:26.25   1st Qu.:1303  
##  Mode  :character   Mode  :character   Median :42.00   Median :3394  
##                                        Mean   :47.53   Mean   :2700  
##                                        3rd Qu.:78.00   3rd Qu.:3740  
##                                        Max.   :91.00   Max.   :5082  
##                                                                      
##       LON              LAT            CMEDV            CRIM         
##  Min.   :-71.29   Min.   :42.03   Min.   : 5.00   Min.   : 0.00632  
##  1st Qu.:-71.09   1st Qu.:42.18   1st Qu.:17.02   1st Qu.: 0.08205  
##  Median :-71.05   Median :42.22   Median :21.20   Median : 0.25651  
##  Mean   :-71.06   Mean   :42.22   Mean   :22.53   Mean   : 3.61352  
##  3rd Qu.:-71.02   3rd Qu.:42.25   3rd Qu.:25.00   3rd Qu.: 3.67708  
##  Max.   :-70.81   Max.   :42.38   Max.   :50.00   Max.   :88.97620  
##                                                                     
##        ZN             INDUS           CHAS                NOX        
##  Min.   :  0.00   Min.   : 0.46   Length:506         Min.   :0.3850  
##  1st Qu.:  0.00   1st Qu.: 5.19   Class :character   1st Qu.:0.4490  
##  Median :  0.00   Median : 9.69   Mode  :character   Median :0.5380  
##  Mean   : 11.36   Mean   :11.14                      Mean   :0.5547  
##  3rd Qu.: 12.50   3rd Qu.:18.10                      3rd Qu.:0.6240  
##  Max.   :100.00   Max.   :27.74                      Max.   :0.8710  
##                                                                      
##        RM             AGE              DIS              RAD        
##  Min.   :3.561   Min.   :  2.90   Min.   : 1.130   Min.   : 1.000  
##  1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100   1st Qu.: 4.000  
##  Median :6.208   Median : 77.50   Median : 3.207   Median : 5.000  
##  Mean   :6.285   Mean   : 68.57   Mean   : 3.795   Mean   : 9.549  
##  3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188   3rd Qu.:24.000  
##  Max.   :8.780   Max.   :100.00   Max.   :12.127   Max.   :24.000  
##                                                                    
##       TAX           PTRATIO            B              LSTAT      
##  Min.   :187.0   Min.   :12.60   Min.   :  0.32   Min.   : 1.73  
##  1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38   1st Qu.: 6.95  
##  Median :330.0   Median :19.05   Median :391.44   Median :11.36  
##  Mean   :408.2   Mean   :18.46   Mean   :356.67   Mean   :12.65  
##  3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23   3rd Qu.:16.95  
##  Max.   :711.0   Max.   :22.00   Max.   :396.90   Max.   :37.97  
##                                                                  
##      units             cu5k            c5_7_5          C7_5_10       
##  Min.   :   5.0   Min.   : 0.000   Min.   : 0.000   Min.   :  0.000  
##  1st Qu.: 115.0   1st Qu.: 0.000   1st Qu.: 1.000   1st Qu.:  2.000  
##  Median : 511.5   Median : 1.000   Median : 3.000   Median :  6.000  
##  Mean   : 680.8   Mean   : 2.921   Mean   : 5.534   Mean   :  9.984  
##  3rd Qu.:1152.0   3rd Qu.: 4.000   3rd Qu.: 7.000   3rd Qu.: 13.000  
##  Max.   :3031.0   Max.   :35.000   Max.   :70.000   Max.   :121.000  
##                                                                      
##      C10_15           C15_20          C20_25          C25_35      
##  Min.   :  0.00   Min.   :  0.0   Min.   :  0.0   Min.   :   0.0  
##  1st Qu.: 14.00   1st Qu.: 19.0   1st Qu.: 13.0   1st Qu.:   7.0  
##  Median : 33.00   Median : 85.0   Median :101.5   Median :  95.0  
##  Mean   : 55.41   Mean   :141.8   Mean   :166.2   Mean   : 170.8  
##  3rd Qu.: 75.50   3rd Qu.:210.0   3rd Qu.:281.8   3rd Qu.: 292.0  
##  Max.   :520.00   Max.   :937.0   Max.   :723.0   Max.   :1189.0  
##                                                                   
##      C35_50           co50k            median            BB        
##  Min.   :  0.00   Min.   :  0.00   Min.   : 5600   Min.   : 0.000  
##  1st Qu.:  1.00   1st Qu.:  0.00   1st Qu.:16800   1st Qu.: 0.200  
##  Median : 17.00   Median :  3.00   Median :21000   Median : 0.500  
##  Mean   : 82.74   Mean   : 45.44   Mean   :21749   Mean   : 6.082  
##  3rd Qu.: 97.00   3rd Qu.: 20.00   3rd Qu.:24700   3rd Qu.: 1.600  
##  Max.   :769.00   Max.   :980.00   Max.   :50000   Max.   :96.400  
##                                    NA's   :17                      
##    censored             NOX_ID           POP                   geom    
##  Length:506         Min.   : 1.00   Min.   :  434   POLYGON      :506  
##  Class :character   1st Qu.:18.00   1st Qu.: 3697   epsg:4267    :  0  
##  Mode  :character   Median :44.00   Median : 5105   +proj=long...:  0  
##                     Mean   :41.77   Mean   : 5340                      
##                     3rd Qu.:62.00   3rd Qu.: 6825                      
##                     Max.   :96.00   Max.   :15976                      
## 
colnames(bd.new)
##  [1] "poltract" "TOWN"     "TOWNNO"   "TRACT"    "LON"      "LAT"     
##  [7] "CMEDV"    "CRIM"     "ZN"       "INDUS"    "CHAS"     "NOX"     
## [13] "RM"       "AGE"      "DIS"      "RAD"      "TAX"      "PTRATIO" 
## [19] "B"        "LSTAT"    "units"    "cu5k"     "c5_7_5"   "C7_5_10" 
## [25] "C10_15"   "C15_20"   "C20_25"   "C25_35"   "C35_50"   "co50k"   
## [31] "median"   "BB"       "censored" "NOX_ID"   "POP"      "geom"
head(bd.new)
## Simple feature collection with 6 features and 35 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -71.1753 ymin: 42.33 xmax: -71.1238 ymax: 42.3737
## Geodetic CRS:  NAD27
##   poltract                    TOWN TOWNNO TRACT      LON     LAT CMEDV    CRIM
## 1     0001 Boston Allston-Brighton     74     1 -71.0830 42.2172  17.8 8.98296
## 2     0002 Boston Allston-Brighton     74     2 -71.0950 42.2120  21.7 3.84970
## 3     0003 Boston Allston-Brighton     74     3 -71.1007 42.2100  22.7 5.20177
## 4     0004 Boston Allston-Brighton     74     4 -71.0930 42.2070  22.6 4.26131
## 5     0005 Boston Allston-Brighton     74     5 -71.0905 42.2033  25.0 4.54192
## 6     0006 Boston Allston-Brighton     74     6 -71.0865 42.2100  19.9 3.83684
##   ZN INDUS CHAS  NOX    RM  AGE    DIS RAD TAX PTRATIO      B LSTAT units cu5k
## 1  0  18.1    1 0.77 6.212 97.4 2.1222  24 666    20.2 377.73 17.60   126    3
## 2  0  18.1    1 0.77 6.395 91.0 2.5052  24 666    20.2 391.34 13.27   399    4
## 3  0  18.1    1 0.77 6.127 83.4 2.7227  24 666    20.2 395.43 11.48   368    3
## 4  0  18.1    0 0.77 6.112 81.3 2.5091  24 666    20.2 390.74 12.67   220    3
## 5  0  18.1    0 0.77 6.398 88.0 2.5182  24 666    20.2 374.56  7.79    44    0
## 6  0  18.1    0 0.77 6.251 91.1 2.2955  24 666    20.2 350.65 14.19   221    2
##   c5_7_5 C7_5_10 C10_15 C15_20 C20_25 C25_35 C35_50 co50k median  BB censored
## 1      3       4     26     43     29     16      1     1  17800 0.8       no
## 2     10       7     37     95    139     93      9     5  21700 1.4       no
## 3      1       2     25     84    127    102     24     0  22700 0.3       no
## 4      2       2     23     45     67     63     12     3  22600 0.8       no
## 5      0       1      1     11      9     12      9     1  25000 1.8       no
## 6      3       7     31     69     72     30      6     1  19900 3.7       no
##   NOX_ID  POP                           geom
## 1      1 3962 POLYGON ((-71.1238 42.3689,...
## 2      1 9245 POLYGON ((-71.1546 42.3573,...
## 3      1 6842 POLYGON ((-71.1685 42.3601,...
## 4      1 8342 POLYGON ((-71.15391 42.3461...
## 5      1 7836 POLYGON ((-71.1479 42.337, ...
## 6      1 9276 POLYGON ((-71.1382 42.3535,...

Histograms

# Histograms
library(dplyr)
library(ggplot2)
library(sf)

# Remove geometry and select only numeric variables
bd.new.numeric <- bd.new %>%
  st_drop_geometry() %>%
  select(where(is.numeric))

# Loop over numeric column names and plot each histogram separately
for (var_name in names(bd.new.numeric)) {
  # .data[[var_name]] replaces the deprecated aes_string()
  p <- ggplot(bd.new.numeric, aes(x = .data[[var_name]])) +
    geom_histogram(fill = "steelblue", color = "white", bins = 30) +
    theme_minimal() +
    labs(
      title = paste("Distribución de", var_name),
      x = var_name,
      y = "Frecuencia"
    )
  
  print(p)
}

## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_bin()`).

Correlation matrix

# Correlation matrix
library(dplyr)
library(sf)
# install.packages('corrplot')
library(corrplot)   # To visualize the correlation matrix
## corrplot 0.95 loaded
# Drop geometry and keep only numeric variables
bd.new.numeric <- bd.new %>%
  st_drop_geometry() %>%
  select(where(is.numeric))

# Compute the correlation matrix (Pearson by default);
# pairwise.complete.obs handles the 17 NA values in `median`
corr_matrix <- cor(bd.new.numeric, use = "pairwise.complete.obs")

# Print the correlation matrix
print(corr_matrix)
##                TOWNNO       TRACT           LON          LAT        CMEDV
## TOWNNO   1.0000000000 -0.45575701 -0.0006267488 -0.631647709 -0.265134492
## TRACT   -0.4557570088  1.00000000 -0.2208919274 -0.225540594  0.428252165
## LON     -0.0006267488 -0.22089193  1.0000000000  0.143053589 -0.322946685
## LAT     -0.6316477088 -0.22554059  0.1430535891  1.000000000  0.006825792
## CMEDV   -0.2651344917  0.42825216 -0.3229466851  0.006825792  1.000000000
## CRIM     0.4479196971 -0.54716534  0.0651006133 -0.084292955 -0.389582441
## ZN      -0.1239695438  0.36729434 -0.2180810717 -0.129667394  0.360386177
## INDUS    0.4344260977 -0.57570636  0.0627024525 -0.041093480 -0.484754379
## NOX      0.4293991967 -0.56980825  0.1608712500 -0.068600401 -0.429300219
## RM      -0.1173342359  0.30520745 -0.2571100279 -0.069316987  0.696303794
## AGE      0.2351410001 -0.48746725  0.2047389487  0.079035217 -0.377998896
## DIS     -0.3272804748  0.49684242 -0.0112431346 -0.082980855  0.249314834
## RAD      0.7240604222 -0.82882988  0.0340669366 -0.207012846 -0.384765552
## TAX      0.7079815660 -0.79360272  0.0506625750 -0.167718045 -0.471978807
## PTRATIO  0.3411109630 -0.53267942  0.3126021862 -0.004527081 -0.505654619
## B       -0.3065165696  0.36504663 -0.0182998564  0.105253702  0.334860832
## LSTAT    0.2966002020 -0.52248622  0.1956295900  0.045659550 -0.740835993
## units   -0.2526989971  0.51626534 -0.0332390591 -0.074744561  0.381451773
## cu5k     0.1834035726 -0.22440458  0.2310692197  0.024042027 -0.382425461
## c5_7_5   0.1731596843 -0.13693040  0.2716821886 -0.061513365 -0.383636757
## C7_5_10  0.0446478778  0.06015136  0.3532617984 -0.091505013 -0.297857170
## C10_15  -0.0648601496  0.24200182  0.3780005965 -0.138962860 -0.194921226
## C15_20  -0.1414856459  0.34891452  0.2405496977 -0.133850391 -0.069710894
## C20_25  -0.2292152442  0.40158715  0.0245576155 -0.019323451  0.119916916
## C25_35  -0.2481479936  0.45040472 -0.1710165157 -0.003737788  0.397447349
## C35_50  -0.1827326164  0.39195390 -0.2486660258 -0.050913223  0.616235061
## co50k   -0.1243641282  0.25099493 -0.2274768242 -0.020830045  0.672025419
## median  -0.3008165105  0.50161668 -0.3289352050  0.010910170  0.999813854
## BB       0.3116185230 -0.34870667  0.0179552734 -0.106773438 -0.302852201
## NOX_ID  -0.2597550389  0.95280101 -0.1760462672 -0.433196359  0.401938203
## POP     -0.0616537228  0.23318986  0.0168762267 -0.073256252  0.198081996
##                CRIM          ZN       INDUS         NOX          RM         AGE
## TOWNNO   0.44791970 -0.12396954  0.43442610  0.42939920 -0.11733424  0.23514100
## TRACT   -0.54716534  0.36729434 -0.57570636 -0.56980825  0.30520745 -0.48746725
## LON      0.06510061 -0.21808107  0.06270245  0.16087125 -0.25711003  0.20473895
## LAT     -0.08429296 -0.12966739 -0.04109348 -0.06860040 -0.06931699  0.07903522
## CMEDV   -0.38958244  0.36038618 -0.48475438 -0.42930022  0.69630379 -0.37799890
## CRIM     1.00000000 -0.20046922  0.40658341  0.42097171 -0.21924670  0.35273425
## ZN      -0.20046922  1.00000000 -0.53382819 -0.51660371  0.31199059 -0.56953734
## INDUS    0.40658341 -0.53382819  1.00000000  0.76365145 -0.39167585  0.64477851
## NOX      0.42097171 -0.51660371  0.76365145  1.00000000 -0.30218819  0.73147010
## RM      -0.21924670  0.31199059 -0.39167585 -0.30218819  1.00000000 -0.24026493
## AGE      0.35273425 -0.56953734  0.64477851  0.73147010 -0.24026493  1.00000000
## DIS     -0.37967009  0.66440822 -0.70802699 -0.76923011  0.20524621 -0.74788054
## RAD      0.62550515 -0.31194783  0.59512927  0.61144056 -0.20984667  0.45602245
## TAX      0.58276431 -0.31456332  0.72076018  0.66802320 -0.29204783  0.50645559
## PTRATIO  0.28994558 -0.39167855  0.38324756  0.18893268 -0.35550149  0.26151501
## B       -0.38506394  0.17552032 -0.35697654 -0.38005064  0.12806864 -0.27353398
## LSTAT    0.45562148 -0.41299457  0.60379972  0.59087892 -0.61380827  0.60233853
## units   -0.38522409  0.41134807 -0.64044629 -0.68054021  0.31728294 -0.75009020
## cu5k     0.22668294 -0.01609136  0.13275127  0.10384372 -0.19575128  0.07475397
## c5_7_5   0.08791673  0.03936148  0.06058100  0.05785874 -0.18622636  0.03580045
## C7_5_10 -0.07213610  0.07378400 -0.10585169 -0.11827589 -0.17427822 -0.09829070
## C10_15  -0.22213803  0.10882006 -0.27675893 -0.29983357 -0.15825798 -0.26261526
## C15_20  -0.29603425  0.11720771 -0.37764924 -0.45092623 -0.10299896 -0.46828503
## C20_25  -0.33564314  0.19860012 -0.45749593 -0.57112147  0.04351397 -0.66898381
## C25_35  -0.32222304  0.37894651 -0.56767008 -0.60725716  0.33576891 -0.70289352
## C35_50  -0.23708128  0.47803810 -0.53813891 -0.47813679  0.57081625 -0.51826759
## co50k   -0.14869913  0.39999645 -0.36856309 -0.29443186  0.60427950 -0.29570331
## median  -0.42759586  0.39545043 -0.58306473 -0.51080448  0.67741461 -0.47500199
## BB       0.34529211 -0.14643613  0.30220962  0.30821035 -0.07375350  0.22276741
## NOX_ID  -0.47849870  0.38399314 -0.58250702 -0.54701886  0.29830507 -0.50591387
## POP     -0.43093129  0.07328424 -0.19142672 -0.26730727  0.14001358 -0.29806952
##                 DIS         RAD         TAX      PTRATIO           B
## TOWNNO  -0.32728047  0.72406042  0.70798157  0.341110963 -0.30651657
## TRACT    0.49684242 -0.82882988 -0.79360272 -0.532679418  0.36504663
## LON     -0.01124313  0.03406694  0.05066257  0.312602186 -0.01829986
## LAT     -0.08298085 -0.20701285 -0.16771805 -0.004527081  0.10525370
## CMEDV    0.24931483 -0.38476555 -0.47197881 -0.505654619  0.33486083
## CRIM    -0.37967009  0.62550515  0.58276431  0.289945579 -0.38506394
## ZN       0.66440822 -0.31194783 -0.31456332 -0.391678548  0.17552032
## INDUS   -0.70802699  0.59512927  0.72076018  0.383247556 -0.35697654
## NOX     -0.76923011  0.61144056  0.66802320  0.188932677 -0.38005064
## RM       0.20524621 -0.20984667 -0.29204783 -0.355501495  0.12806864
## AGE     -0.74788054  0.45602245  0.50645559  0.261515012 -0.27353398
## DIS      1.00000000 -0.49458793 -0.53443158 -0.232470542  0.29151167
## RAD     -0.49458793  1.00000000  0.91022819  0.464741179 -0.44441282
## TAX     -0.53443158  0.91022819  1.00000000  0.460853035 -0.44180801
## PTRATIO -0.23247054  0.46474118  0.46085304  1.000000000 -0.17738330
## B        0.29151167 -0.44441282 -0.44180801 -0.177383302  1.00000000
## LSTAT   -0.49699583  0.48867633  0.54399341  0.374044317 -0.36608690
## units    0.64488152 -0.46568972 -0.52160600 -0.211586226  0.36878467
## cu5k    -0.03225965  0.17543958  0.23202063  0.184547066 -0.02496275
## c5_7_5   0.05462511  0.12569602  0.18596137  0.169727251 -0.01904251
## C7_5_10  0.26965588 -0.06395600  0.01323602  0.140993787  0.06430475
## C10_15   0.43622264 -0.23795400 -0.18396298  0.101732874  0.18288475
## C15_20   0.50524351 -0.32502636 -0.32161689  0.053126281  0.28125384
## C20_25   0.53253646 -0.37773128 -0.41896182 -0.063750390  0.33236882
## C25_35   0.52567049 -0.40501847 -0.47427394 -0.230935720  0.31149354
## C35_50   0.40372280 -0.32740727 -0.40396396 -0.345896991  0.21934642
## co50k    0.21764647 -0.20904817 -0.27854052 -0.364027956  0.13841450
## median   0.34794775 -0.45267270 -0.55098797 -0.502141628  0.36917168
## BB      -0.24949608  0.42889641  0.41048918  0.186989959 -0.91596093
## NOX_ID   0.52841522 -0.70835373 -0.71531073 -0.484437928  0.33711578
## POP      0.27080416 -0.18255626 -0.18024914 -0.022366256  0.26640524
##               LSTAT        units         cu5k      c5_7_5      C7_5_10
## TOWNNO   0.29660020 -0.252698997  0.183403573  0.17315968  0.044647878
## TRACT   -0.52248622  0.516265336 -0.224404576 -0.13693040  0.060151359
## LON      0.19562959 -0.033239059  0.231069220  0.27168219  0.353261798
## LAT      0.04565955 -0.074744561  0.024042027 -0.06151337 -0.091505013
## CMEDV   -0.74083599  0.381451773 -0.382425461 -0.38363676 -0.297857170
## CRIM     0.45562148 -0.385224086  0.226682938  0.08791673 -0.072136097
## ZN      -0.41299457  0.411348073 -0.016091362  0.03936148  0.073784001
## INDUS    0.60379972 -0.640446294  0.132751266  0.06058100 -0.105851691
## NOX      0.59087892 -0.680540206  0.103843722  0.05785874 -0.118275891
## RM      -0.61380827  0.317282941 -0.195751280 -0.18622636 -0.174278221
## AGE      0.60233853 -0.750090200  0.074753974  0.03580045 -0.098290696
## DIS     -0.49699583  0.644881522 -0.032259654  0.05462511  0.269655884
## RAD      0.48867633 -0.465689716  0.175439583  0.12569602 -0.063956000
## TAX      0.54399341 -0.521606001  0.232020629  0.18596137  0.013236025
## PTRATIO  0.37404432 -0.211586226  0.184547066  0.16972725  0.140993787
## B       -0.36608690  0.368784668 -0.024962746 -0.01904251  0.064304750
## LSTAT    1.00000000 -0.625267559  0.286280452  0.23554554  0.076608802
## units   -0.62526756  1.000000000  0.005210945  0.08013362  0.270192056
## cu5k     0.28628045  0.005210945  1.000000000  0.74724385  0.544189323
## c5_7_5   0.23554554  0.080133620  0.747243851  1.00000000  0.734675905
## C7_5_10  0.07660880  0.270192056  0.544189323  0.73467591  1.000000000
## C10_15  -0.11435196  0.493105057  0.311680993  0.51096136  0.840994061
## C15_20  -0.30173109  0.702616352  0.139927804  0.27318310  0.536750369
## C20_25  -0.47471478  0.860235130  0.023179250  0.07750317  0.247280122
## C25_35  -0.59871772  0.876469758 -0.105927840 -0.09448994 -0.007001159
## C35_50  -0.55774640  0.651932530 -0.164579038 -0.17004549 -0.141152021
## co50k   -0.41814401  0.377522080 -0.140857111 -0.16503897 -0.163693463
## median  -0.75345672  0.487885788 -0.370750812 -0.37446247 -0.278985661
## BB       0.30856809 -0.312431110  0.029140706  0.02572618 -0.061680607
## NOX_ID  -0.51518796  0.538103840 -0.194148417 -0.09164878  0.097098137
## POP     -0.38017116  0.628191994 -0.006544089  0.06584493  0.252604858
##              C10_15        C15_20      C20_25       C25_35        C35_50
## TOWNNO  -0.06486015 -0.1414856459 -0.22921524 -0.248147994 -0.1827326164
## TRACT    0.24200182  0.3489145162  0.40158715  0.450404724  0.3919539034
## LON      0.37800060  0.2405496977  0.02455762 -0.171016516 -0.2486660258
## LAT     -0.13896286 -0.1338503911 -0.01932345 -0.003737788 -0.0509132230
## CMEDV   -0.19492123 -0.0697108935  0.11991692  0.397447349  0.6162350612
## CRIM    -0.22213803 -0.2960342453 -0.33564314 -0.322223041 -0.2370812809
## ZN       0.10882006  0.1172077078  0.19860012  0.378946507  0.4780380961
## INDUS   -0.27675893 -0.3776492400 -0.45749593 -0.567670076 -0.5381389146
## NOX     -0.29983357 -0.4509262307 -0.57112147 -0.607257164 -0.4781367942
## RM      -0.15825798 -0.1029989639  0.04351397  0.335768911  0.5708162455
## AGE     -0.26261526 -0.4682850293 -0.66898381 -0.702893517 -0.5182675937
## DIS      0.43622264  0.5052435148  0.53253646  0.525670491  0.4037227974
## RAD     -0.23795400 -0.3250263563 -0.37773128 -0.405018473 -0.3274072691
## TAX     -0.18396298 -0.3216168928 -0.41896182 -0.474273941 -0.4039639565
## PTRATIO  0.10173287  0.0531262806 -0.06375039 -0.230935720 -0.3458969914
## B        0.18288475  0.2812538388  0.33236882  0.311493539  0.2193464221
## LSTAT   -0.11435196 -0.3017310924 -0.47471478 -0.598717717 -0.5577464019
## units    0.49310506  0.7026163523  0.86023513  0.876469758  0.6519325302
## cu5k     0.31168099  0.1399278039  0.02317925 -0.105927840 -0.1645790376
## c5_7_5   0.51096136  0.2731831044  0.07750317 -0.094489944 -0.1700454866
## C7_5_10  0.84099406  0.5367503688  0.24728012 -0.007001159 -0.1411520206
## C10_15   1.00000000  0.8411718098  0.50940882  0.135185122 -0.0984089392
## C15_20   0.84117181  1.0000000000  0.81322789  0.387604639 -0.0006683666
## C20_25   0.50940882  0.8132278852  1.00000000  0.755812126  0.2441204105
## C25_35   0.13518512  0.3876046390  0.75581213  1.000000000  0.7163527411
## C35_50  -0.09840894 -0.0006683666  0.24412041  0.716352741  1.0000000000
## co50k   -0.15465208 -0.1398527168 -0.02974482  0.280029621  0.7105041300
## median  -0.15476384  0.0010385156  0.22912414  0.536566915  0.7228904468
## BB      -0.16148202 -0.2427051830 -0.28100220 -0.261019590 -0.1834834386
## NOX_ID   0.28741593  0.3954854869  0.41634813  0.453747752  0.3958719628
## POP      0.44973713  0.5744318737  0.61775048  0.501550678  0.2515035930
##               co50k       median          BB      NOX_ID          POP
## TOWNNO  -0.12436413 -0.300816511  0.31161852 -0.25975504 -0.061653723
## TRACT    0.25099493  0.501616676 -0.34870667  0.95280101  0.233189863
## LON     -0.22747682 -0.328935205  0.01795527 -0.17604627  0.016876227
## LAT     -0.02083004  0.010910170 -0.10677344 -0.43319636 -0.073256252
## CMEDV    0.67202542  0.999813854 -0.30285220  0.40193820  0.198081996
## CRIM    -0.14869913 -0.427595861  0.34529211 -0.47849870 -0.430931289
## ZN       0.39999645  0.395450431 -0.14643613  0.38399314  0.073284238
## INDUS   -0.36856309 -0.583064735  0.30220962 -0.58250702 -0.191426716
## NOX     -0.29443186 -0.510804476  0.30821035 -0.54701886 -0.267307271
## RM       0.60427950  0.677414605 -0.07375350  0.29830507  0.140013577
## AGE     -0.29570331 -0.475001993  0.22276741 -0.50591387 -0.298069522
## DIS      0.21764647  0.347947750 -0.24949608  0.52841522  0.270804162
## RAD     -0.20904817 -0.452672700  0.42889641 -0.70835373 -0.182556259
## TAX     -0.27854052 -0.550987968  0.41048918 -0.71531073 -0.180249136
## PTRATIO -0.36402796 -0.502141628  0.18698996 -0.48443793 -0.022366256
## B        0.13841450  0.369171683 -0.91596093  0.33711578  0.266405236
## LSTAT   -0.41814401 -0.753456716  0.30856809 -0.51518796 -0.380171162
## units    0.37752208  0.487885788 -0.31243111  0.53810384  0.628191994
## cu5k    -0.14085711 -0.370750812  0.02914071 -0.19414842 -0.006544089
## c5_7_5  -0.16503897 -0.374462474  0.02572618 -0.09164878  0.065844930
## C7_5_10 -0.16369346 -0.278985661 -0.06168061  0.09709814  0.252604858
## C10_15  -0.15465208 -0.154763837 -0.16148202  0.28741593  0.449737128
## C15_20  -0.13985272  0.001038516 -0.24270518  0.39548549  0.574431874
## C20_25  -0.02974482  0.229124138 -0.28100220  0.41634813  0.617750484
## C25_35   0.28002962  0.536566915 -0.26101959  0.45374775  0.501550678
## C35_50   0.71050413  0.722890447 -0.18348344  0.39587196  0.251503593
## co50k    1.00000000  0.671766179 -0.11621203  0.23549363  0.121778582
## median   0.67176618  1.000000000 -0.33286055  0.47424490  0.243445007
## BB      -0.11621203 -0.332860551  1.00000000 -0.31083129 -0.217438191
## NOX_ID   0.23549363  0.474244896 -0.31083129  1.00000000  0.252331228
## POP      0.12177858  0.243445007 -0.21743819  0.25233123  1.000000000
# Visualize with corrplot
corrplot(corr_matrix, method = "color", 
         type = "upper", 
         tl.col = "black", 
         tl.srt = 45,
         addCoef.col = "black",  # print the coefficients on the matrix
         number.cex = 0.7,
         diag = FALSE)

Boxplots

# Boxplots
library(dplyr)
library(ggplot2)
library(sf)

# Drop geometry and keep only numeric variables
bd.new.numeric <- bd.new %>%
  st_drop_geometry() %>%
  select(where(is.numeric))

# Create one boxplot per variable
for (var_name in names(bd.new.numeric)) {
  # .data[[var_name]] replaces the deprecated aes_string()
  p <- ggplot(bd.new.numeric, aes(x = "", y = .data[[var_name]])) +
    geom_boxplot(fill = "tomato", outlier.color = "black", outlier.size = 1) +
    theme_minimal() +
    labs(
      title = paste("Boxplot de", var_name),
      x = "",
      y = var_name
    )
  print(p)
}

## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Carry out an exploratory spatial data analysis. The essential items to produce in this section are:
- Clearly define the spatial connectivity matrix
- Plot this matrix, using the ‘Queen’ structure to determine neighbors
- Produce a map colored by the value of the variable to be explained and by at least 2 of the independent variables that will be used
- Perform a cluster analysis:
  + Include a map of the spatial lag of the variable to be explained alongside the original map produced earlier
  + Spatial visualization of clusters (HotSpots, ColdSpots, NonSignificant and outliers: HighLows, LowHighs)
- Global Moran's Index
  + Compute the Global Moran's I statistic for the variables mapped earlier (at least 3: 1 dependent, 2 independent)
  + Produce their respective scatter plots
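
To make the ingredients of this section concrete, here is a minimal base-R sketch of a Queen connectivity matrix, its row-standardised version, a spatial lag and the Global Moran's I. It runs on a hypothetical 3 x 3 grid rather than the Boston tracts, and the names `B`, `W`, `x`, `lag_x` and `moran_I` are illustrative assumptions, not objects from this exercise. It is meant only to show what functions such as `poly2nb()`, `nb2listw()`, `lag.listw()` and `moran.test()` compute conceptually.

```r
# Toy example: 3 x 3 grid of cells, numbered row by row (hypothetical data).
n_side <- 3
n <- n_side^2
rows <- rep(seq_len(n_side), each = n_side)
cols <- rep(seq_len(n_side), times = n_side)

# Queen contiguity: two distinct cells are neighbours when they touch
# along an edge or at a corner (Chebyshev distance of exactly 1).
cheb <- pmax(abs(outer(rows, rows, "-")), abs(outer(cols, cols, "-")))
B <- (cheb == 1) * 1

# Row-standardise so that every row sums to 1 (the "W" style).
W <- B / rowSums(B)

# Hypothetical variable with a smooth left-to-right gradient.
x <- as.numeric(cols)

# Spatial lag: the weighted average of each cell's neighbours.
lag_x <- as.vector(W %*% x)

# Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the centred variable.
z <- x - mean(x)
S0 <- sum(W)
moran_I <- (n / S0) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
print(moran_I)
```

Because the toy variable increases smoothly from left to right, neighbouring cells have similar values and the statistic comes out positive, which is the same qualitative pattern a clustered variable shows on a real map.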

# Define the spatial connectivity matrix (Queen contiguity)
swm  <- poly2nb(bd.trSP, queen=T)
summary(swm) 
## Neighbour list object:
## Number of regions: 506 
## Number of nonzero links: 2910 
## Percentage nonzero weights: 1.136559 
## Average number of links: 5.750988 
## Link number distribution:
## 
##   1   2   3   4   5   6   7   8   9  10  11  12  15 
##   3   9  28  81 107 120  87  40  22   5   2   1   1 
## 3 least connected regions:
## 18 51 345 with 1 link
## 1 most connected region:
## 112 with 15 links
# Plot the spatial connectivity matrix (Queen) to determine neighbors
sswm <- nb2listw(swm, style="W", zero.policy = TRUE)
bd.trSP_centroid <- coordinates(bd.trSP) 
plot(bd.trSP,border="blue",axes=FALSE,las=1, main="Matriz de conectividad espacial")
plot(bd.trSP,col="grey",border=grey(0.9),axes=T,add=T) 
plot(sswm,coords=bd.trSP_centroid,pch=19,cex=0.1,col="red",add=T) 

# Add CMEDV
bd.trSP$CMEDV <- bd.new$CMEDV
# Spatial lag
bd.trSP$lag_CMEDV <- lag.listw(sswm, bd.trSP$CMEDV)
par(mfrow=c(1,2))  # note: has no effect on spplot(), which uses lattice graphics
spplot(bd.trSP, "CMEDV", main = "CMEDV original", col.regions = viridis(100))

spplot(bd.trSP, "lag_CMEDV", main = "Lag Espacial de CMEDV", col.regions = viridis(100))

# Colored maps of CMEDV and of the variables INDUS and TAX

library(sp)
library(spdep)
library(viridis)

# Assumes bd.trSP is a SpatialPolygonsDataFrame that already contains these variables
# If INDUS and TAX are missing, add them the same way as CMEDV above

# Colored map of CMEDV
spplot(bd.trSP, "CMEDV", main = "Mapa coloreado de CMEDV", col.regions = viridis(100))

# Colored map of INDUS
spplot(bd.trSP, "INDUS", main = "Mapa coloreado de INDUS", col.regions = viridis(100))

# Colored map of TAX
spplot(bd.trSP, "TAX", main = "Mapa coloreado de TAX", col.regions = viridis(100))

bd_sf <- st_as_sf(bd.trSP)
# Queen weights matrix
queen_w <- queen_weights(bd_sf)
# Clusters
lisa_CMEDV <- local_moran(queen_w, bd_sf["CMEDV"])
bd_sf$cluster_CMEDV <- as.factor(lisa_CMEDV$GetClusterIndicators())
levels(bd_sf$cluster_CMEDV) <- lisa_CMEDV$GetLabels()
ggplot(data = bd_sf) +
  geom_sf(aes(fill = cluster_CMEDV), color = "white", size = 0.2) +
  scale_fill_brewer(palette = "RdBu", name = "Cluster") +
  theme_minimal() +
  labs(title = "Clústeres Espaciales de CMEDV",
       subtitle = "HotSpots, ColdSpots y Atípicos")

# Global Moran's I
# CMEDV
moran.test(bd.trSP$CMEDV, listw = sswm, zero.policy = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  bd.trSP$CMEDV  
## weights: sswm    
## 
## Moran I statistic standard deviate = 23.558, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.6322686784     -0.0019801980      0.0007248376
moran.plot(bd.trSP$CMEDV, sswm,
           main = "Moran’s I - CMEDV")

# TAX
moran.test(bd.trSP$TAX, listw = sswm, zero.policy = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  bd.trSP$TAX  
## weights: sswm    
## 
## Moran I statistic standard deviate = 30.389, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.818327988      -0.001980198       0.000728656
moran.plot(bd.trSP$TAX, sswm,
           main = "Moran’s I - TAX")

# INDUS
moran.test(bd.tr$INDUS, listw = sswm, zero.policy = TRUE)
## 
##  Moran I test under randomisation
## 
## data:  bd.tr$INDUS  
## weights: sswm    
## 
## Moran I statistic standard deviate = 30.12, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.8111468479     -0.0019801980      0.0007287869
moran.plot(bd.tr$INDUS, sswm, 
           main = "Moran's I - INDUS")
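
As a reading aid for the three test outputs above, the global Moran's I statistic, its expectation under the null of no spatial autocorrelation, and the reported standard deviate can be written as follows. This is the standard textbook form, with w_ij the weights in sswm and n = 506 observations; the numeric check uses the CMEDV output and is rounded.

```latex
I = \frac{n}{S_0}\,
    \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i-\bar{x})(x_j-\bar{x})}{\sum_{i}(x_i-\bar{x})^2},
\qquad S_0 = \sum_{i}\sum_{j} w_{ij}

\mathbb{E}[I] = -\frac{1}{n-1} = -\frac{1}{505} \approx -0.00198

z = \frac{I-\mathbb{E}[I]}{\sqrt{\mathrm{Var}(I)}}
  = \frac{0.63227 + 0.00198}{\sqrt{0.00072484}} \approx 23.56
```

The expectation matches the "Expectation" column in all three outputs, and the z value reproduces the standard deviate reported for CMEDV.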

Finally, fit the regression models studied in this module:
- Traditional linear regression model (OLS)
- Spatial autoregressive model (SAR)
- Spatial error model (SEM)
- Spatial Durbin model (SDM)
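
For reference, the four specifications can be written in matrix form. This is a sketch in standard notation, assuming y is CMEDV, X holds the intercept, TAX, and INDUS, and W is the row-standardized queen weights matrix (sswm); ρ and λ correspond to the "Rho" and "Lambda" values in the summaries, and θ (a symbol introduced here) collects the coefficients on the lagged regressors (lag.TAX, lag.INDUS).

```latex
\text{OLS:}\quad y = X\beta + \varepsilon

\text{SAR:}\quad y = \rho W y + X\beta + \varepsilon

\text{SEM:}\quad y = X\beta + u, \qquad u = \lambda W u + \varepsilon

\text{SDM:}\quad y = \rho W y + X\beta + W X\theta + \varepsilon
```

The SEM is the special case of the SDM in which θ = -ρβ (the common factor restriction), which is why the two can be compared with a likelihood ratio test.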

Regressions

Traditional Linear Regression

# Traditional linear regression model (OLS)

model_ols <- lm(CMEDV ~ TAX + INDUS, data = bd.trSP)
summary(model_ols)
## 
## Call:
## lm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.443  -4.640  -1.954   3.279  33.858 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 32.687908   0.920972  35.493  < 2e-16 ***
## TAX         -0.013899   0.003002  -4.630 4.65e-06 ***
## INDUS       -0.402701   0.073746  -5.461 7.47e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.881 on 503 degrees of freedom
## Multiple R-squared:  0.2663, Adjusted R-squared:  0.2633 
## F-statistic: 91.27 on 2 and 503 DF,  p-value: < 2.2e-16
AIC(model_ols)
## [1] 3530.177

Spatial Autoregressive Regression (SAR)

# Spatial autoregressive model (SAR)
model_sar <- lagsarlm(CMEDV ~ TAX + INDUS, data=bd.trSP, listw = sswm)
summary(model_sar)
## 
## Call:lagsarlm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP, listw = sswm)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -13.04892  -3.13774  -0.97517   1.62383  28.71094 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  9.5856028  1.3111979  7.3106 2.66e-13
## TAX         -0.0053830  0.0021912 -2.4567 0.014023
## INDUS       -0.1497115  0.0539188 -2.7766 0.005493
## 
## Rho: 0.7385, LR test value: 282.57, p-value: < 2.22e-16
## Asymptotic standard error: 0.034368
##     z-value: 21.488, p-value: < 2.22e-16
## Wald statistic: 461.74, p-value: < 2.22e-16
## 
## Log likelihood: -1619.802 for lag model
## ML residual variance (sigma squared): 30.9, (sigma: 5.5588)
## Number of observations: 506 
## Number of parameters estimated: 5 
## AIC: 3249.6, (AIC for lm: 3530.2)
## LM test for residual autocorrelation
## test value: 5.4407, p-value: 0.019673

Spatial Error Regression (SEM)

# Spatial error model (SEM)
model_sem <- errorsarlm(CMEDV ~ TAX + INDUS, data=bd.trSP, listw = sswm) 
summary(model_sem)
## 
## Call:errorsarlm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP, listw = sswm)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -14.39670  -3.04655  -0.91684   1.90550  28.42394 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value  Pr(>|z|)
## (Intercept) 32.2908228  1.6678662 19.3606 < 2.2e-16
## TAX         -0.0147438  0.0037163 -3.9674 7.267e-05
## INDUS       -0.4207164  0.0913266 -4.6067 4.091e-06
## 
## Lambda: 0.78271, LR test value: 315.65, p-value: < 2.22e-16
## Asymptotic standard error: 0.032246
##     z-value: 24.273, p-value: < 2.22e-16
## Wald statistic: 589.19, p-value: < 2.22e-16
## 
## Log likelihood: -1603.264 for error model
## ML residual variance (sigma squared): 28.292, (sigma: 5.319)
## Number of observations: 506 
## Number of parameters estimated: 5 
## AIC: 3216.5, (AIC for lm: 3530.2)

Spatial Durbin Regression (SDM)

# Spatial Durbin model (SDM)
model_durbin <- lagsarlm(CMEDV ~ TAX + INDUS, data=bd.trSP, listw = sswm, type="mixed")
summary(model_durbin)
## 
## Call:lagsarlm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP, listw = sswm, 
##     type = "mixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.5376  -3.0276  -0.9167   1.8568  28.2509 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##               Estimate Std. Error z value  Pr(>|z|)
## (Intercept)  6.7895170  1.2259329  5.5382 3.055e-08
## TAX         -0.0149424  0.0041149 -3.6313  0.000282
## INDUS       -0.4509479  0.1028011 -4.3866 1.151e-05
## lag.TAX      0.0115367  0.0050864  2.2681  0.023321
## lag.INDUS    0.3888530  0.1284000  3.0284  0.002458
## 
## Rho: 0.78182, LR test value: 314.95, p-value: < 2.22e-16
## Asymptotic standard error: 0.032316
##     z-value: 24.193, p-value: < 2.22e-16
## Wald statistic: 585.3, p-value: < 2.22e-16
## 
## Log likelihood: -1602.993 for mixed model
## ML residual variance (sigma squared): 28.275, (sigma: 5.3175)
## Number of observations: 506 
## Number of parameters estimated: 7 
## AIC: 3220, (AIC for lm: 3532.9)
## LM test for residual autocorrelation
## test value: 0.13164, p-value: 0.71674
AIC(model_durbin)
## [1] 3219.987

Briefly compare these models and choose one: which do you consider the best model, and why?

stargazer(model_ols, model_sar, model_sem, model_durbin, type = "text", title="Estimated Regression Results CMEDV")
## 
## Estimated Regression Results CMEDV
## ====================================================================================
##                                           Dependent variable:                       
##                     ----------------------------------------------------------------
##                                                  CMEDV                              
##                               OLS              spatial      spatial      spatial    
##                                             autoregressive   error    autoregressive
##                               (1)                (2)          (3)          (4)      
## ------------------------------------------------------------------------------------
## TAX                        -0.014***           -0.005**    -0.015***    -0.015***   
##                             (0.003)            (0.002)      (0.004)      (0.004)    
##                                                                                     
## INDUS                      -0.403***          -0.150***    -0.421***    -0.451***   
##                             (0.074)            (0.054)      (0.091)      (0.103)    
##                                                                                     
## lag.TAX                                                                  0.012**    
##                                                                          (0.005)    
##                                                                                     
## lag.INDUS                                                                0.389***   
##                                                                          (0.128)    
##                                                                                     
## Constant                   32.688***           9.586***    32.291***     6.790***   
##                             (0.921)            (1.311)      (1.668)      (1.226)    
##                                                                                     
## ------------------------------------------------------------------------------------
## Observations                  506                506          506          506      
## R2                           0.266                                                  
## Adjusted R2                  0.263                                                  
## Log Likelihood                                -1,619.802   -1,603.264   -1,602.993  
## sigma2                                          30.900       28.292       28.275    
## Akaike Inf. Crit.                             3,249.605    3,216.528    3,219.987   
## Residual Std. Error    7.881 (df = 503)                                             
## F Statistic         91.265*** (df = 2; 503)                                         
## Wald Test (df = 1)                            461.740***   589.187***   585.299***  
## LR Test (df = 1)                              282.572***   315.649***   314.953***  
## ====================================================================================
## Note:                                                    *p<0.1; **p<0.05; ***p<0.01
# AIC Comparison
aic_values <- data.frame(
  Model = c("OLS", "SAR", "SEM", "Durbin"),
  AIC = c(AIC(model_ols), AIC(model_sar), AIC(model_sem), AIC(model_durbin))
)

print(aic_values)
##    Model      AIC
## 1    OLS 3530.177
## 2    SAR 3249.605
## 3    SEM 3216.528
## 4 Durbin 3219.987
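
One way to support the comparison is to recompute the AIC values by hand and to test the SEM against the Durbin model. The sketch below uses only base R and copies the log-likelihoods and parameter counts from the summaries above; the variable names are illustrative. The likelihood ratio test treats the SEM as the Durbin model under the common factor restriction, which is an assumption of this check rather than something the fitted objects report.

```r
# Log-likelihoods and numbers of estimated parameters, copied from the summaries above
loglik <- c(SAR = -1619.802, SEM = -1603.264, Durbin = -1602.993)
k <- c(SAR = 5, SEM = 5, Durbin = 7)

# AIC = 2k - 2 * logLik; small last-digit differences from the printed values
# come from the rounded log-likelihoods
aic <- 2 * k - 2 * loglik
print(round(aic, 1))

# Likelihood ratio test: SEM (restricted) against Durbin (unrestricted)
lr_stat <- 2 * (loglik[["Durbin"]] - loglik[["SEM"]])
lr_df <- k[["Durbin"]] - k[["SEM"]]
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p_value = lr_p))
```

On these numbers the SEM has the lowest AIC, and the likelihood ratio statistic of about 0.54 on 2 degrees of freedom (p of roughly 0.76) gives no evidence that the two extra Durbin terms improve the fit. The SAR summary also reports residual autocorrelation (LM test p-value 0.0197). A reasonable reading is therefore that the SEM is the preferred specification here, with the Durbin model a close alternative.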