The purpose of this activity is to put into practice what has been covered so far in the topics of Exploratory Spatial Data Analysis and Spatial Linear Regression Models.
First, the main libraries you should consider are loaded below. You may add any other library you need, and I recommend doing so right away in the following code chunk:
library(foreign)
library(dplyr)
library(spdep)
library(tigris)
library(rgeoda)
library(RColorBrewer)
library(viridis)
library(ggplot2)
library(tmap)
library(sf)
library(sp)
library(spatialreg)
library(stargazer)
library(plm)
library(splm)
library(pspatreg)
library(regclass)
library(mctest)
library(lmtest)
library(spData)
library(mapview)
library(naniar)
library(dlookr)
library(caret)
library(e1071)
library(SparseM)
library(Metrics)
library(randomForest)
library(rpart.plot)
library(insight)
library(jtools)
library(xgboost)
library(DiagrammeR)
library(effects)
library(leaflet)
library(kableExtra)
Next, the dataset to be analyzed is loaded: ‘boston’ from the ‘spData’ package.
data(boston, package="spData")
bd <- boston.c
kable(head(boston.c))
TOWN | TOWNNO | TRACT | LON | LAT | MEDV | CMEDV | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nahant | 0 | 2011 | -70.9550 | 42.2550 | 24.0 | 24.0 | 0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 |
Swampscott | 1 | 2021 | -70.9500 | 42.2875 | 21.6 | 21.6 | 0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 |
Swampscott | 1 | 2022 | -70.9360 | 42.2830 | 34.7 | 34.7 | 0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 |
Marblehead | 2 | 2031 | -70.9280 | 42.2930 | 33.4 | 33.4 | 0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 |
Marblehead | 2 | 2032 | -70.9220 | 42.2980 | 36.2 | 36.2 | 0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 |
Marblehead | 2 | 2033 | -70.9165 | 42.3040 | 28.7 | 28.7 | 0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 |
Next, the census-tract boundaries (polygons) associated with the cross-sectional data loaded above are read in:
# Spatial data
bd.tr<-st_read(system.file("shapes/boston_tracts.gpkg", package="spData")[1])
## Reading layer `boston_tracts' from data source
## `/Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library/spData/shapes/boston_tracts.gpkg'
## using driver `GPKG'
## Simple feature collection with 506 features and 36 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -71.52311 ymin: 42.00305 xmax: -70.63823 ymax: 42.67307
## Geodetic CRS: NAD27
Example of a first map visualization of the loaded data, coloring each tract by its CMEDV value with a color palette (here, magma):
bd.trSP<-as(bd.tr, "Spatial")
boston_nb<-poly2nb(bd.trSP, queen=T)
mapview(bd.trSP, zcol="CMEDV", col.regions = viridisLite::magma(20))
## Warning: Found less unique colors (20) than unique zcol values (228)!
## Interpolating color vector to match number of zcol values.
The core goal of your analysis is to explain either ‘CRIM’ or ‘CMEDV’, depending on the parity of your team number:
- Odd-numbered teams (1, 3, 5 and 7) explain ‘CRIM’
- Even-numbered teams (2, 4 and 6) explain ‘CMEDV’ (in this case, drop the ‘MEDV’ variable from the dataset)
Carry out a brief traditional exploratory data analysis. This section must include:
- Histograms
- Correlation matrix
- Boxplots
bd.new <- bd.tr[, !(names(bd.tr) == "MEDV")]
summary(bd.new)
## poltract TOWN TOWNNO TRACT
## Length:506 Length:506 Min. : 0.00 Min. : 1
## Class :character Class :character 1st Qu.:26.25 1st Qu.:1303
## Mode :character Mode :character Median :42.00 Median :3394
## Mean :47.53 Mean :2700
## 3rd Qu.:78.00 3rd Qu.:3740
## Max. :91.00 Max. :5082
##
## LON LAT CMEDV CRIM
## Min. :-71.29 Min. :42.03 Min. : 5.00 Min. : 0.00632
## 1st Qu.:-71.09 1st Qu.:42.18 1st Qu.:17.02 1st Qu.: 0.08205
## Median :-71.05 Median :42.22 Median :21.20 Median : 0.25651
## Mean :-71.06 Mean :42.22 Mean :22.53 Mean : 3.61352
## 3rd Qu.:-71.02 3rd Qu.:42.25 3rd Qu.:25.00 3rd Qu.: 3.67708
## Max. :-70.81 Max. :42.38 Max. :50.00 Max. :88.97620
##
## ZN INDUS CHAS NOX
## Min. : 0.00 Min. : 0.46 Length:506 Min. :0.3850
## 1st Qu.: 0.00 1st Qu.: 5.19 Class :character 1st Qu.:0.4490
## Median : 0.00 Median : 9.69 Mode :character Median :0.5380
## Mean : 11.36 Mean :11.14 Mean :0.5547
## 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.6240
## Max. :100.00 Max. :27.74 Max. :0.8710
##
## RM AGE DIS RAD
## Min. :3.561 Min. : 2.90 Min. : 1.130 Min. : 1.000
## 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100 1st Qu.: 4.000
## Median :6.208 Median : 77.50 Median : 3.207 Median : 5.000
## Mean :6.285 Mean : 68.57 Mean : 3.795 Mean : 9.549
## 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188 3rd Qu.:24.000
## Max. :8.780 Max. :100.00 Max. :12.127 Max. :24.000
##
## TAX PTRATIO B LSTAT
## Min. :187.0 Min. :12.60 Min. : 0.32 Min. : 1.73
## 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38 1st Qu.: 6.95
## Median :330.0 Median :19.05 Median :391.44 Median :11.36
## Mean :408.2 Mean :18.46 Mean :356.67 Mean :12.65
## 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23 3rd Qu.:16.95
## Max. :711.0 Max. :22.00 Max. :396.90 Max. :37.97
##
## units cu5k c5_7_5 C7_5_10
## Min. : 5.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 115.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 511.5 Median : 1.000 Median : 3.000 Median : 6.000
## Mean : 680.8 Mean : 2.921 Mean : 5.534 Mean : 9.984
## 3rd Qu.:1152.0 3rd Qu.: 4.000 3rd Qu.: 7.000 3rd Qu.: 13.000
## Max. :3031.0 Max. :35.000 Max. :70.000 Max. :121.000
##
## C10_15 C15_20 C20_25 C25_35
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 14.00 1st Qu.: 19.0 1st Qu.: 13.0 1st Qu.: 7.0
## Median : 33.00 Median : 85.0 Median :101.5 Median : 95.0
## Mean : 55.41 Mean :141.8 Mean :166.2 Mean : 170.8
## 3rd Qu.: 75.50 3rd Qu.:210.0 3rd Qu.:281.8 3rd Qu.: 292.0
## Max. :520.00 Max. :937.0 Max. :723.0 Max. :1189.0
##
## C35_50 co50k median BB
## Min. : 0.00 Min. : 0.00 Min. : 5600 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 0.00 1st Qu.:16800 1st Qu.: 0.200
## Median : 17.00 Median : 3.00 Median :21000 Median : 0.500
## Mean : 82.74 Mean : 45.44 Mean :21749 Mean : 6.082
## 3rd Qu.: 97.00 3rd Qu.: 20.00 3rd Qu.:24700 3rd Qu.: 1.600
## Max. :769.00 Max. :980.00 Max. :50000 Max. :96.400
## NA's :17
## censored NOX_ID POP geom
## Length:506 Min. : 1.00 Min. : 434 POLYGON :506
## Class :character 1st Qu.:18.00 1st Qu.: 3697 epsg:4267 : 0
## Mode :character Median :44.00 Median : 5105 +proj=long...: 0
## Mean :41.77 Mean : 5340
## 3rd Qu.:62.00 3rd Qu.: 6825
## Max. :96.00 Max. :15976
##
colnames(bd.new)
## [1] "poltract" "TOWN" "TOWNNO" "TRACT" "LON" "LAT"
## [7] "CMEDV" "CRIM" "ZN" "INDUS" "CHAS" "NOX"
## [13] "RM" "AGE" "DIS" "RAD" "TAX" "PTRATIO"
## [19] "B" "LSTAT" "units" "cu5k" "c5_7_5" "C7_5_10"
## [25] "C10_15" "C15_20" "C20_25" "C25_35" "C35_50" "co50k"
## [31] "median" "BB" "censored" "NOX_ID" "POP" "geom"
head(bd.new)
## Simple feature collection with 6 features and 35 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -71.1753 ymin: 42.33 xmax: -71.1238 ymax: 42.3737
## Geodetic CRS: NAD27
## poltract TOWN TOWNNO TRACT LON LAT CMEDV CRIM
## 1 0001 Boston Allston-Brighton 74 1 -71.0830 42.2172 17.8 8.98296
## 2 0002 Boston Allston-Brighton 74 2 -71.0950 42.2120 21.7 3.84970
## 3 0003 Boston Allston-Brighton 74 3 -71.1007 42.2100 22.7 5.20177
## 4 0004 Boston Allston-Brighton 74 4 -71.0930 42.2070 22.6 4.26131
## 5 0005 Boston Allston-Brighton 74 5 -71.0905 42.2033 25.0 4.54192
## 6 0006 Boston Allston-Brighton 74 6 -71.0865 42.2100 19.9 3.83684
## ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT units cu5k
## 1 0 18.1 1 0.77 6.212 97.4 2.1222 24 666 20.2 377.73 17.60 126 3
## 2 0 18.1 1 0.77 6.395 91.0 2.5052 24 666 20.2 391.34 13.27 399 4
## 3 0 18.1 1 0.77 6.127 83.4 2.7227 24 666 20.2 395.43 11.48 368 3
## 4 0 18.1 0 0.77 6.112 81.3 2.5091 24 666 20.2 390.74 12.67 220 3
## 5 0 18.1 0 0.77 6.398 88.0 2.5182 24 666 20.2 374.56 7.79 44 0
## 6 0 18.1 0 0.77 6.251 91.1 2.2955 24 666 20.2 350.65 14.19 221 2
## c5_7_5 C7_5_10 C10_15 C15_20 C20_25 C25_35 C35_50 co50k median BB censored
## 1 3 4 26 43 29 16 1 1 17800 0.8 no
## 2 10 7 37 95 139 93 9 5 21700 1.4 no
## 3 1 2 25 84 127 102 24 0 22700 0.3 no
## 4 2 2 23 45 67 63 12 3 22600 0.8 no
## 5 0 1 1 11 9 12 9 1 25000 1.8 no
## 6 3 7 31 69 72 30 6 1 19900 3.7 no
## NOX_ID POP geom
## 1 1 3962 POLYGON ((-71.1238 42.3689,...
## 2 1 9245 POLYGON ((-71.1546 42.3573,...
## 3 1 6842 POLYGON ((-71.1685 42.3601,...
## 4 1 8342 POLYGON ((-71.15391 42.3461...
## 5 1 7836 POLYGON ((-71.1479 42.337, ...
## 6 1 9276 POLYGON ((-71.1382 42.3535,...
# Histograms
library(dplyr)
library(ggplot2)
library(sf)
# Remove geometry and select only numeric variables
bd.new.numeric <- bd.new %>%
  st_drop_geometry() %>%
  select(where(is.numeric))
# Loop over numeric column names and plot each histogram separately
# (.data[[var_name]] replaces the deprecated aes_string())
for (var_name in names(bd.new.numeric)) {
  p <- ggplot(bd.new.numeric, aes(x = .data[[var_name]])) +
    geom_histogram(fill = "steelblue", color = "white", bins = 30) +
    theme_minimal() +
    labs(
      title = paste("Distribution of", var_name),
      x = var_name,
      y = "Frequency"
    )
  print(p)
}
## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_bin()`).
# Correlation matrix
library(dplyr)
library(sf)
# install.packages('corrplot')
library(corrplot) # To visualize the correlation matrix
## corrplot 0.95 loaded
# Drop geometry and select only numeric variables
bd.new.numeric <- bd.new %>%
  st_drop_geometry() %>%
  select(where(is.numeric))
# Compute the correlation matrix (Pearson by default; pairwise to skip the 17 NAs in BB)
corr_matrix <- cor(bd.new.numeric, use = "pairwise.complete.obs")
# Print the correlation matrix
print(corr_matrix)
## TOWNNO TRACT LON LAT CMEDV
## TOWNNO 1.0000000000 -0.45575701 -0.0006267488 -0.631647709 -0.265134492
## TRACT -0.4557570088 1.00000000 -0.2208919274 -0.225540594 0.428252165
## LON -0.0006267488 -0.22089193 1.0000000000 0.143053589 -0.322946685
## LAT -0.6316477088 -0.22554059 0.1430535891 1.000000000 0.006825792
## CMEDV -0.2651344917 0.42825216 -0.3229466851 0.006825792 1.000000000
## CRIM 0.4479196971 -0.54716534 0.0651006133 -0.084292955 -0.389582441
## ZN -0.1239695438 0.36729434 -0.2180810717 -0.129667394 0.360386177
## INDUS 0.4344260977 -0.57570636 0.0627024525 -0.041093480 -0.484754379
## NOX 0.4293991967 -0.56980825 0.1608712500 -0.068600401 -0.429300219
## RM -0.1173342359 0.30520745 -0.2571100279 -0.069316987 0.696303794
## AGE 0.2351410001 -0.48746725 0.2047389487 0.079035217 -0.377998896
## DIS -0.3272804748 0.49684242 -0.0112431346 -0.082980855 0.249314834
## RAD 0.7240604222 -0.82882988 0.0340669366 -0.207012846 -0.384765552
## TAX 0.7079815660 -0.79360272 0.0506625750 -0.167718045 -0.471978807
## PTRATIO 0.3411109630 -0.53267942 0.3126021862 -0.004527081 -0.505654619
## B -0.3065165696 0.36504663 -0.0182998564 0.105253702 0.334860832
## LSTAT 0.2966002020 -0.52248622 0.1956295900 0.045659550 -0.740835993
## units -0.2526989971 0.51626534 -0.0332390591 -0.074744561 0.381451773
## cu5k 0.1834035726 -0.22440458 0.2310692197 0.024042027 -0.382425461
## c5_7_5 0.1731596843 -0.13693040 0.2716821886 -0.061513365 -0.383636757
## C7_5_10 0.0446478778 0.06015136 0.3532617984 -0.091505013 -0.297857170
## C10_15 -0.0648601496 0.24200182 0.3780005965 -0.138962860 -0.194921226
## C15_20 -0.1414856459 0.34891452 0.2405496977 -0.133850391 -0.069710894
## C20_25 -0.2292152442 0.40158715 0.0245576155 -0.019323451 0.119916916
## C25_35 -0.2481479936 0.45040472 -0.1710165157 -0.003737788 0.397447349
## C35_50 -0.1827326164 0.39195390 -0.2486660258 -0.050913223 0.616235061
## co50k -0.1243641282 0.25099493 -0.2274768242 -0.020830045 0.672025419
## median -0.3008165105 0.50161668 -0.3289352050 0.010910170 0.999813854
## BB 0.3116185230 -0.34870667 0.0179552734 -0.106773438 -0.302852201
## NOX_ID -0.2597550389 0.95280101 -0.1760462672 -0.433196359 0.401938203
## POP -0.0616537228 0.23318986 0.0168762267 -0.073256252 0.198081996
## CRIM ZN INDUS NOX RM AGE
## TOWNNO 0.44791970 -0.12396954 0.43442610 0.42939920 -0.11733424 0.23514100
## TRACT -0.54716534 0.36729434 -0.57570636 -0.56980825 0.30520745 -0.48746725
## LON 0.06510061 -0.21808107 0.06270245 0.16087125 -0.25711003 0.20473895
## LAT -0.08429296 -0.12966739 -0.04109348 -0.06860040 -0.06931699 0.07903522
## CMEDV -0.38958244 0.36038618 -0.48475438 -0.42930022 0.69630379 -0.37799890
## CRIM 1.00000000 -0.20046922 0.40658341 0.42097171 -0.21924670 0.35273425
## ZN -0.20046922 1.00000000 -0.53382819 -0.51660371 0.31199059 -0.56953734
## INDUS 0.40658341 -0.53382819 1.00000000 0.76365145 -0.39167585 0.64477851
## NOX 0.42097171 -0.51660371 0.76365145 1.00000000 -0.30218819 0.73147010
## RM -0.21924670 0.31199059 -0.39167585 -0.30218819 1.00000000 -0.24026493
## AGE 0.35273425 -0.56953734 0.64477851 0.73147010 -0.24026493 1.00000000
## DIS -0.37967009 0.66440822 -0.70802699 -0.76923011 0.20524621 -0.74788054
## RAD 0.62550515 -0.31194783 0.59512927 0.61144056 -0.20984667 0.45602245
## TAX 0.58276431 -0.31456332 0.72076018 0.66802320 -0.29204783 0.50645559
## PTRATIO 0.28994558 -0.39167855 0.38324756 0.18893268 -0.35550149 0.26151501
## B -0.38506394 0.17552032 -0.35697654 -0.38005064 0.12806864 -0.27353398
## LSTAT 0.45562148 -0.41299457 0.60379972 0.59087892 -0.61380827 0.60233853
## units -0.38522409 0.41134807 -0.64044629 -0.68054021 0.31728294 -0.75009020
## cu5k 0.22668294 -0.01609136 0.13275127 0.10384372 -0.19575128 0.07475397
## c5_7_5 0.08791673 0.03936148 0.06058100 0.05785874 -0.18622636 0.03580045
## C7_5_10 -0.07213610 0.07378400 -0.10585169 -0.11827589 -0.17427822 -0.09829070
## C10_15 -0.22213803 0.10882006 -0.27675893 -0.29983357 -0.15825798 -0.26261526
## C15_20 -0.29603425 0.11720771 -0.37764924 -0.45092623 -0.10299896 -0.46828503
## C20_25 -0.33564314 0.19860012 -0.45749593 -0.57112147 0.04351397 -0.66898381
## C25_35 -0.32222304 0.37894651 -0.56767008 -0.60725716 0.33576891 -0.70289352
## C35_50 -0.23708128 0.47803810 -0.53813891 -0.47813679 0.57081625 -0.51826759
## co50k -0.14869913 0.39999645 -0.36856309 -0.29443186 0.60427950 -0.29570331
## median -0.42759586 0.39545043 -0.58306473 -0.51080448 0.67741461 -0.47500199
## BB 0.34529211 -0.14643613 0.30220962 0.30821035 -0.07375350 0.22276741
## NOX_ID -0.47849870 0.38399314 -0.58250702 -0.54701886 0.29830507 -0.50591387
## POP -0.43093129 0.07328424 -0.19142672 -0.26730727 0.14001358 -0.29806952
## DIS RAD TAX PTRATIO B
## TOWNNO -0.32728047 0.72406042 0.70798157 0.341110963 -0.30651657
## TRACT 0.49684242 -0.82882988 -0.79360272 -0.532679418 0.36504663
## LON -0.01124313 0.03406694 0.05066257 0.312602186 -0.01829986
## LAT -0.08298085 -0.20701285 -0.16771805 -0.004527081 0.10525370
## CMEDV 0.24931483 -0.38476555 -0.47197881 -0.505654619 0.33486083
## CRIM -0.37967009 0.62550515 0.58276431 0.289945579 -0.38506394
## ZN 0.66440822 -0.31194783 -0.31456332 -0.391678548 0.17552032
## INDUS -0.70802699 0.59512927 0.72076018 0.383247556 -0.35697654
## NOX -0.76923011 0.61144056 0.66802320 0.188932677 -0.38005064
## RM 0.20524621 -0.20984667 -0.29204783 -0.355501495 0.12806864
## AGE -0.74788054 0.45602245 0.50645559 0.261515012 -0.27353398
## DIS 1.00000000 -0.49458793 -0.53443158 -0.232470542 0.29151167
## RAD -0.49458793 1.00000000 0.91022819 0.464741179 -0.44441282
## TAX -0.53443158 0.91022819 1.00000000 0.460853035 -0.44180801
## PTRATIO -0.23247054 0.46474118 0.46085304 1.000000000 -0.17738330
## B 0.29151167 -0.44441282 -0.44180801 -0.177383302 1.00000000
## LSTAT -0.49699583 0.48867633 0.54399341 0.374044317 -0.36608690
## units 0.64488152 -0.46568972 -0.52160600 -0.211586226 0.36878467
## cu5k -0.03225965 0.17543958 0.23202063 0.184547066 -0.02496275
## c5_7_5 0.05462511 0.12569602 0.18596137 0.169727251 -0.01904251
## C7_5_10 0.26965588 -0.06395600 0.01323602 0.140993787 0.06430475
## C10_15 0.43622264 -0.23795400 -0.18396298 0.101732874 0.18288475
## C15_20 0.50524351 -0.32502636 -0.32161689 0.053126281 0.28125384
## C20_25 0.53253646 -0.37773128 -0.41896182 -0.063750390 0.33236882
## C25_35 0.52567049 -0.40501847 -0.47427394 -0.230935720 0.31149354
## C35_50 0.40372280 -0.32740727 -0.40396396 -0.345896991 0.21934642
## co50k 0.21764647 -0.20904817 -0.27854052 -0.364027956 0.13841450
## median 0.34794775 -0.45267270 -0.55098797 -0.502141628 0.36917168
## BB -0.24949608 0.42889641 0.41048918 0.186989959 -0.91596093
## NOX_ID 0.52841522 -0.70835373 -0.71531073 -0.484437928 0.33711578
## POP 0.27080416 -0.18255626 -0.18024914 -0.022366256 0.26640524
## LSTAT units cu5k c5_7_5 C7_5_10
## TOWNNO 0.29660020 -0.252698997 0.183403573 0.17315968 0.044647878
## TRACT -0.52248622 0.516265336 -0.224404576 -0.13693040 0.060151359
## LON 0.19562959 -0.033239059 0.231069220 0.27168219 0.353261798
## LAT 0.04565955 -0.074744561 0.024042027 -0.06151337 -0.091505013
## CMEDV -0.74083599 0.381451773 -0.382425461 -0.38363676 -0.297857170
## CRIM 0.45562148 -0.385224086 0.226682938 0.08791673 -0.072136097
## ZN -0.41299457 0.411348073 -0.016091362 0.03936148 0.073784001
## INDUS 0.60379972 -0.640446294 0.132751266 0.06058100 -0.105851691
## NOX 0.59087892 -0.680540206 0.103843722 0.05785874 -0.118275891
## RM -0.61380827 0.317282941 -0.195751280 -0.18622636 -0.174278221
## AGE 0.60233853 -0.750090200 0.074753974 0.03580045 -0.098290696
## DIS -0.49699583 0.644881522 -0.032259654 0.05462511 0.269655884
## RAD 0.48867633 -0.465689716 0.175439583 0.12569602 -0.063956000
## TAX 0.54399341 -0.521606001 0.232020629 0.18596137 0.013236025
## PTRATIO 0.37404432 -0.211586226 0.184547066 0.16972725 0.140993787
## B -0.36608690 0.368784668 -0.024962746 -0.01904251 0.064304750
## LSTAT 1.00000000 -0.625267559 0.286280452 0.23554554 0.076608802
## units -0.62526756 1.000000000 0.005210945 0.08013362 0.270192056
## cu5k 0.28628045 0.005210945 1.000000000 0.74724385 0.544189323
## c5_7_5 0.23554554 0.080133620 0.747243851 1.00000000 0.734675905
## C7_5_10 0.07660880 0.270192056 0.544189323 0.73467591 1.000000000
## C10_15 -0.11435196 0.493105057 0.311680993 0.51096136 0.840994061
## C15_20 -0.30173109 0.702616352 0.139927804 0.27318310 0.536750369
## C20_25 -0.47471478 0.860235130 0.023179250 0.07750317 0.247280122
## C25_35 -0.59871772 0.876469758 -0.105927840 -0.09448994 -0.007001159
## C35_50 -0.55774640 0.651932530 -0.164579038 -0.17004549 -0.141152021
## co50k -0.41814401 0.377522080 -0.140857111 -0.16503897 -0.163693463
## median -0.75345672 0.487885788 -0.370750812 -0.37446247 -0.278985661
## BB 0.30856809 -0.312431110 0.029140706 0.02572618 -0.061680607
## NOX_ID -0.51518796 0.538103840 -0.194148417 -0.09164878 0.097098137
## POP -0.38017116 0.628191994 -0.006544089 0.06584493 0.252604858
## C10_15 C15_20 C20_25 C25_35 C35_50
## TOWNNO -0.06486015 -0.1414856459 -0.22921524 -0.248147994 -0.1827326164
## TRACT 0.24200182 0.3489145162 0.40158715 0.450404724 0.3919539034
## LON 0.37800060 0.2405496977 0.02455762 -0.171016516 -0.2486660258
## LAT -0.13896286 -0.1338503911 -0.01932345 -0.003737788 -0.0509132230
## CMEDV -0.19492123 -0.0697108935 0.11991692 0.397447349 0.6162350612
## CRIM -0.22213803 -0.2960342453 -0.33564314 -0.322223041 -0.2370812809
## ZN 0.10882006 0.1172077078 0.19860012 0.378946507 0.4780380961
## INDUS -0.27675893 -0.3776492400 -0.45749593 -0.567670076 -0.5381389146
## NOX -0.29983357 -0.4509262307 -0.57112147 -0.607257164 -0.4781367942
## RM -0.15825798 -0.1029989639 0.04351397 0.335768911 0.5708162455
## AGE -0.26261526 -0.4682850293 -0.66898381 -0.702893517 -0.5182675937
## DIS 0.43622264 0.5052435148 0.53253646 0.525670491 0.4037227974
## RAD -0.23795400 -0.3250263563 -0.37773128 -0.405018473 -0.3274072691
## TAX -0.18396298 -0.3216168928 -0.41896182 -0.474273941 -0.4039639565
## PTRATIO 0.10173287 0.0531262806 -0.06375039 -0.230935720 -0.3458969914
## B 0.18288475 0.2812538388 0.33236882 0.311493539 0.2193464221
## LSTAT -0.11435196 -0.3017310924 -0.47471478 -0.598717717 -0.5577464019
## units 0.49310506 0.7026163523 0.86023513 0.876469758 0.6519325302
## cu5k 0.31168099 0.1399278039 0.02317925 -0.105927840 -0.1645790376
## c5_7_5 0.51096136 0.2731831044 0.07750317 -0.094489944 -0.1700454866
## C7_5_10 0.84099406 0.5367503688 0.24728012 -0.007001159 -0.1411520206
## C10_15 1.00000000 0.8411718098 0.50940882 0.135185122 -0.0984089392
## C15_20 0.84117181 1.0000000000 0.81322789 0.387604639 -0.0006683666
## C20_25 0.50940882 0.8132278852 1.00000000 0.755812126 0.2441204105
## C25_35 0.13518512 0.3876046390 0.75581213 1.000000000 0.7163527411
## C35_50 -0.09840894 -0.0006683666 0.24412041 0.716352741 1.0000000000
## co50k -0.15465208 -0.1398527168 -0.02974482 0.280029621 0.7105041300
## median -0.15476384 0.0010385156 0.22912414 0.536566915 0.7228904468
## BB -0.16148202 -0.2427051830 -0.28100220 -0.261019590 -0.1834834386
## NOX_ID 0.28741593 0.3954854869 0.41634813 0.453747752 0.3958719628
## POP 0.44973713 0.5744318737 0.61775048 0.501550678 0.2515035930
## co50k median BB NOX_ID POP
## TOWNNO -0.12436413 -0.300816511 0.31161852 -0.25975504 -0.061653723
## TRACT 0.25099493 0.501616676 -0.34870667 0.95280101 0.233189863
## LON -0.22747682 -0.328935205 0.01795527 -0.17604627 0.016876227
## LAT -0.02083004 0.010910170 -0.10677344 -0.43319636 -0.073256252
## CMEDV 0.67202542 0.999813854 -0.30285220 0.40193820 0.198081996
## CRIM -0.14869913 -0.427595861 0.34529211 -0.47849870 -0.430931289
## ZN 0.39999645 0.395450431 -0.14643613 0.38399314 0.073284238
## INDUS -0.36856309 -0.583064735 0.30220962 -0.58250702 -0.191426716
## NOX -0.29443186 -0.510804476 0.30821035 -0.54701886 -0.267307271
## RM 0.60427950 0.677414605 -0.07375350 0.29830507 0.140013577
## AGE -0.29570331 -0.475001993 0.22276741 -0.50591387 -0.298069522
## DIS 0.21764647 0.347947750 -0.24949608 0.52841522 0.270804162
## RAD -0.20904817 -0.452672700 0.42889641 -0.70835373 -0.182556259
## TAX -0.27854052 -0.550987968 0.41048918 -0.71531073 -0.180249136
## PTRATIO -0.36402796 -0.502141628 0.18698996 -0.48443793 -0.022366256
## B 0.13841450 0.369171683 -0.91596093 0.33711578 0.266405236
## LSTAT -0.41814401 -0.753456716 0.30856809 -0.51518796 -0.380171162
## units 0.37752208 0.487885788 -0.31243111 0.53810384 0.628191994
## cu5k -0.14085711 -0.370750812 0.02914071 -0.19414842 -0.006544089
## c5_7_5 -0.16503897 -0.374462474 0.02572618 -0.09164878 0.065844930
## C7_5_10 -0.16369346 -0.278985661 -0.06168061 0.09709814 0.252604858
## C10_15 -0.15465208 -0.154763837 -0.16148202 0.28741593 0.449737128
## C15_20 -0.13985272 0.001038516 -0.24270518 0.39548549 0.574431874
## C20_25 -0.02974482 0.229124138 -0.28100220 0.41634813 0.617750484
## C25_35 0.28002962 0.536566915 -0.26101959 0.45374775 0.501550678
## C35_50 0.71050413 0.722890447 -0.18348344 0.39587196 0.251503593
## co50k 1.00000000 0.671766179 -0.11621203 0.23549363 0.121778582
## median 0.67176618 1.000000000 -0.33286055 0.47424490 0.243445007
## BB -0.11621203 -0.332860551 1.00000000 -0.31083129 -0.217438191
## NOX_ID 0.23549363 0.474244896 -0.31083129 1.00000000 0.252331228
## POP 0.12177858 0.243445007 -0.21743819 0.25233123 1.000000000
# Visualize with corrplot
corrplot(corr_matrix, method = "color",
type = "upper",
tl.col = "black",
tl.srt = 45,
addCoef.col = "black", # print the coefficients inside the matrix
number.cex = 0.7,
diag = FALSE)
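To shortlist candidate predictors from a matrix this size, it can help to rank every numeric column by its absolute correlation with the dependent variable. The helper below, `rank_correlations()`, is a hypothetical base-R sketch (not part of the original analysis), demonstrated on a small made-up data frame. Read off the CMEDV column printed above, such a ranking would put `median` first (0.9998), followed by LSTAT (-0.74) and RM (0.70). A near-perfect correlation like the one for `median` suggests that column is essentially a rescaled copy of the target, so, like `MEDV`, it would likely leak the answer into a model.

```r
# Hypothetical helper (not part of the original analysis): rank the numeric
# columns of a data frame by absolute Pearson correlation with a target column.
rank_correlations <- function(df, target) {
  num <- df[vapply(df, is.numeric, logical(1))]           # keep numeric columns only
  stopifnot(target %in% names(num))
  r <- cor(num, use = "pairwise.complete.obs")[, target]  # correlations with the target
  r <- r[names(r) != target]                              # drop the self-correlation
  r[order(abs(r), decreasing = TRUE)]
}

# Toy data: x_leak is an exact rescaling of y (like `median` versus CMEDV)
toy <- data.frame(
  y       = c(1, 2, 3, 4, 5),
  x_leak  = c(1000, 2000, 3000, 4000, 5000),
  x_neg   = c(5, 3, 4, 1, 2),
  x_noise = c(2, 1, 2, 1, 2),
  label   = letters[1:5],                                 # non-numeric, ignored
  stringsAsFactors = FALSE
)
ranked <- rank_correlations(toy, "y")
print(round(ranked, 3))
```

On the tract data, a call such as `rank_correlations(sf::st_drop_geometry(bd.new), "CMEDV")` should reproduce the ordering of the CMEDV column of the matrix above.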
# Boxplots
library(dplyr)
library(ggplot2)
library(sf)
# Drop geometry and select numeric variables
bd.new.numeric <- bd.new %>%
  st_drop_geometry() %>%
  select(where(is.numeric))
# One boxplot per variable (.data[[var_name]] replaces the deprecated aes_string())
for (var_name in names(bd.new.numeric)) {
  p <- ggplot(bd.new.numeric, aes(x = "", y = .data[[var_name]])) +
    geom_boxplot(fill = "tomato", outlier.color = "black", outlier.size = 1) +
    theme_minimal() +
    labs(
      title = paste("Boxplot of", var_name),
      x = "",
      y = var_name
    )
  print(p)
}
## Warning: Removed 17 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Carry out an exploratory spatial data analysis. The essential elements of this section are:
- Clearly define the spatial connectivity matrix
- Plot that matrix, using the ‘queen’ contiguity structure to determine neighbors
- Produce a choropleth map of the variable to be explained and of at least 2 of the independent variables that will be used
- Perform a cluster analysis:
  + Include a map of the spatial lag of the variable to be explained next to the original map produced earlier
  + Spatial visualization of clusters (HotSpots, ColdSpots, NonSignificant and outliers: HighLows, LowHighs)
- Global Moran's I
  + Compute the Global Moran's I statistic for the previously mapped variables (at least 3: 1 dependent, 2 independent)
  + Generate their respective Moran scatterplots
# Define the spatial connectivity (queen contiguity) neighbor list
swm <- poly2nb(bd.trSP, queen = TRUE)
summary(swm)
## Neighbour list object:
## Number of regions: 506
## Number of nonzero links: 2910
## Percentage nonzero weights: 1.136559
## Average number of links: 5.750988
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 15
## 3 9 28 81 107 120 87 40 22 5 2 1 1
## 3 least connected regions:
## 18 51 345 with 1 link
## 1 most connected region:
## 112 with 15 links
# Row-standardized weights (style = "W") and a plot of the queen neighbors
sswm <- nb2listw(swm, style = "W", zero.policy = TRUE)
bd.trSP_centroid <- coordinates(bd.trSP)
plot(bd.trSP, border = "blue", axes = FALSE, las = 1, main = "Spatial connectivity matrix (queen)")
plot(bd.trSP, col = "grey", border = grey(0.9), axes = TRUE, add = TRUE)
plot(sswm, coords = bd.trSP_centroid, pch = 19, cex = 0.1, col = "red", add = TRUE)
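With `style = "W"`, `nb2listw()` row-standardizes the weights so that each tract's row sums to 1, which makes the spatial lag W·y the plain average of the neighbors' values. The summary above also gives the density of the matrix: 2910 nonzero links among 506 tracts means 2910/506 ≈ 5.75 neighbors per tract and 2910/506² ≈ 1.14% nonzero entries. The base-R sketch below reproduces the queen rule and row-standardization on a hypothetical 3 × 3 grid of square cells instead of the real tract polygons. `queen_grid()` and `row_standardize()` are illustrative helpers, not spdep functions.

```r
# Hypothetical helper: queen contiguity on a regular n_rows x n_cols grid.
# Two cells are neighbours if they share an edge or a corner; cells are
# numbered row by row, so cell (i, j) has index (i - 1) * n_cols + j.
queen_grid <- function(n_rows, n_cols) {
  W <- matrix(0, n_rows * n_cols, n_rows * n_cols)
  for (i in seq_len(n_rows)) {
    for (j in seq_len(n_cols)) {
      for (di in -1:1) {
        for (dj in -1:1) {
          ii <- i + di
          jj <- j + dj
          inside <- ii >= 1 && ii <= n_rows && jj >= 1 && jj <= n_cols
          if ((di != 0 || dj != 0) && inside) {
            W[(i - 1) * n_cols + j, (ii - 1) * n_cols + jj] <- 1
          }
        }
      }
    }
  }
  W
}

# Hypothetical helper mirroring style = "W": divide each row by its sum.
row_standardize <- function(W) {
  rs <- rowSums(W)
  rs[rs == 0] <- 1  # a region without neighbours keeps an all-zero row
  W / rs            # rs is recycled down the columns, i.e. division is row-wise
}

W <- queen_grid(3, 3)
Ws <- row_standardize(W)
y <- 1:9                        # toy values, one per cell
lag_y <- as.vector(Ws %*% y)    # spatial lag: mean of each cell's neighbours

print(rowSums(W))               # number of neighbours per cell
print(round(lag_y, 3))
```

In this grid, corner cells get 3 neighbors, edge cells 5 and the center cell 8. The lag of the center cell is (45 − 5)/8 = 5, the mean of the other eight values.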
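# Add CMEDV (bd.new and bd.trSP share the same row order)
bd.trSP$CMEDV <- bd.new$CMEDV
# Spatial lag: average CMEDV of each tract's queen neighbors
bd.trSP$lag_CMEDV <- lag.listw(sswm, bd.trSP$CMEDV)
# spplot() is lattice-based, so par(mfrow) has no effect on it;
# passing both variables draws them side by side on a common color scale
spplot(bd.trSP, c("CMEDV", "lag_CMEDV"),
       names.attr = c("CMEDV (original)", "Spatial lag of CMEDV"),
       main = "CMEDV and its spatial lag",
       col.regions = viridis(100))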
# Choropleth maps of CMEDV and of the independent variables INDUS and TAX
library(sp)
library(spdep)
library(viridis)
# Assumes bd.trSP is a SpatialPolygonsDataFrame that contains these variables
# (INDUS and TAX come from bd.tr; if missing, add them the same way as CMEDV)
# Choropleth map of CMEDV
spplot(bd.trSP, "CMEDV", main = "Choropleth map of CMEDV", col.regions = viridis(100))
# Choropleth map of INDUS
spplot(bd.trSP, "INDUS", main = "Choropleth map of INDUS", col.regions = viridis(100))
# Choropleth map of TAX
spplot(bd.trSP, "TAX", main = "Choropleth map of TAX", col.regions = viridis(100))
bd_sf <- st_as_sf(bd.trSP)
# Queen contiguity weights (rgeoda)
queen_w <- queen_weights(bd_sf)
# Local Moran (LISA) clusters
lisa_CMEDV <- local_moran(queen_w, bd_sf["CMEDV"])
# Cluster codes are 0-based; map every code to its label explicitly so that
# labels stay aligned even when some categories do not occur in the data
lisa_lbls <- lisa_labels(lisa_CMEDV)
bd_sf$cluster_CMEDV <- factor(lisa_clusters(lisa_CMEDV),
                              levels = seq_along(lisa_lbls) - 1,
                              labels = lisa_lbls)
ggplot(data = bd_sf) +
  geom_sf(aes(fill = cluster_CMEDV), color = "white", linewidth = 0.2) +
  scale_fill_brewer(palette = "RdBu", name = "Cluster") +
  theme_minimal() +
  labs(title = "Spatial clusters of CMEDV",
       subtitle = "Hot spots, cold spots and spatial outliers")
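The cluster map classifies each tract by comparing its standardized value z_i with the spatial lag of z. Tracts where both are above the mean are High-High (hot spots), tracts where both are below are Low-Low (cold spots), and mixed cases are the High-Low and Low-High spatial outliers. In current rgeoda versions, `local_moran()` also tests each local statistic with conditional permutations (by default 999, with a 0.05 cutoff) and labels the remaining tracts as not significant. The sketch below leaves out that significance step. It only computes the quadrant and a quantity proportional to the local Moran statistic, on a hypothetical chain of six regions, and `local_moran_quadrants()` is an illustrative helper, not an rgeoda function.

```r
# Hypothetical chain of six regions: region i neighbours i - 1 and i + 1
n <- 6
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
Ws <- W / rowSums(W)  # row-standardized (every region has at least one neighbour)

# Hypothetical helper: Moran-scatterplot quadrant of each region, without
# the permutation-based significance test that rgeoda performs.
local_moran_quadrants <- function(y, Ws) {
  z <- (y - mean(y)) / sd(y)      # standardized variable
  lag_z <- as.vector(Ws %*% z)    # average z of the neighbours
  quadrant <- ifelse(z > 0 & lag_z > 0, "High-High",
              ifelse(z < 0 & lag_z < 0, "Low-Low",
              ifelse(z < 0 & lag_z > 0, "Low-High",
              ifelse(z > 0 & lag_z < 0, "High-Low", "Undefined"))))
  # z * lag_z is proportional to the local Moran statistic I_i
  data.frame(z = z, lag_z = lag_z, local_I = z * lag_z,
             quadrant = quadrant, stringsAsFactors = FALSE)
}

res <- local_moran_quadrants(c(10, 9, 2, 8, 1, 4), Ws)
print(res)
```

Regions 3 and 5 are low values surrounded by higher ones (Low-High), and region 4 is a high value between low ones (High-Low). Both outlier types have a negative local statistic.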
# Global Moran's I
# CMEDV
moran.test(bd.trSP$CMEDV, listw = sswm, zero.policy = TRUE)
##
## Moran I test under randomisation
##
## data: bd.trSP$CMEDV
## weights: sswm
##
## Moran I statistic standard deviate = 23.558, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.6322686784 -0.0019801980 0.0007248376
moran.plot(bd.trSP$CMEDV, sswm,
main = "Moran’s I - CMEDV")
# TAX
moran.test(bd.trSP$TAX, listw = sswm, zero.policy = TRUE)
##
## Moran I test under randomisation
##
## data: bd.trSP$TAX
## weights: sswm
##
## Moran I statistic standard deviate = 30.389, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.818327988 -0.001980198 0.000728656
moran.plot(bd.trSP$TAX, sswm,
main = "Moran’s I - TAX")
# INDUS
moran.test(bd.tr$INDUS, listw = sswm, zero.policy = TRUE)
##
## Moran I test under randomisation
##
## data: bd.tr$INDUS
## weights: sswm
##
## Moran I statistic standard deviate = 30.12, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.8111468479 -0.0019801980 0.0007287869
moran.plot(bd.tr$INDUS, sswm,
main = "Moran's I - INDUS")
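Global Moran's I is I = (n / S0) · (zᵀ W z) / (zᵀ z), where z holds the deviations from the mean and S0 is the sum of all weights. With row-standardized weights S0 = n, and I equals the slope of the Moran scatterplot (the regression of W z on z). Under no spatial autocorrelation its expectation is −1/(n − 1), which for the 506 tracts gives −1/505 ≈ −0.00198, the "Expectation" value reported in the three tests above. Below is a base-R sketch on a hypothetical six-region chain. `morans_I()` is an illustrative helper that computes only the statistic, not the variance or p-value that `moran.test()` reports.

```r
# Hypothetical helper: global Moran's I for a numeric vector y and weights W.
morans_I <- function(y, W) {
  n  <- length(y)
  z  <- y - mean(y)                # deviations from the mean
  S0 <- sum(W)                     # total weight; equals n for row-standardized W
  (n / S0) * sum(z * (W %*% z)) / sum(z^2)
}

# Hypothetical chain of six regions: region i neighbours i - 1 and i + 1
n <- 6
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
Ws <- W / rowSums(W)               # row-standardized, as with style = "W"

I_trend <- morans_I(1:6, Ws)                  # smooth gradient: positive I
I_alt   <- morans_I(c(1, 2, 1, 2, 1, 2), Ws)  # alternating values: negative I
E_I     <- -1 / (n - 1)                       # expectation under no autocorrelation

# Moran scatterplot: slope of the spatial lag of z regressed on z
z     <- 1:6 - mean(1:6)
slope <- unname(coef(lm(as.vector(Ws %*% z) ~ z))[2])

print(c(I_trend = I_trend, I_alt = I_alt, E_I = E_I, slope = slope))
```

The smooth gradient gives I = 5/7 ≈ 0.714, the same value as the scatterplot slope, while the perfectly alternating pattern gives I = −1.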
Finally, fit the regression models studied in this module:
- Traditional linear regression model
- Spatial autoregressive (SAR) model
- Spatial error model (SEM)
- Spatial Durbin model
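For reference, the four specifications in their standard textbook form are given below. W is the row-standardized queen weights matrix, and the lagged regressors WX exclude the constant. In the output that follows, ρ is reported as "Rho" by `lagsarlm()` and λ as "Lambda" by `errorsarlm()`, and the Durbin model's θ coefficients appear as `lag.TAX` and `lag.INDUS`.

```latex
\begin{aligned}
\text{OLS:}    &\quad y = X\beta + \varepsilon \\
\text{SAR:}    &\quad y = \rho W y + X\beta + \varepsilon \\
\text{SEM:}    &\quad y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{Durbin:} &\quad y = \rho W y + X\beta + W X \theta + \varepsilon \\
               &\quad \varepsilon \sim N(0, \sigma^2 I_n)
\end{aligned}
```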
# Traditional linear regression model (OLS)
model_ols <- lm(CMEDV ~ TAX + INDUS, data = bd.trSP)
summary(model_ols)
##
## Call:
## lm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.443 -4.640 -1.954 3.279 33.858
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.687908 0.920972 35.493 < 2e-16 ***
## TAX -0.013899 0.003002 -4.630 4.65e-06 ***
## INDUS -0.402701 0.073746 -5.461 7.47e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.881 on 503 degrees of freedom
## Multiple R-squared: 0.2663, Adjusted R-squared: 0.2633
## F-statistic: 91.27 on 2 and 503 DF, p-value: < 2.2e-16
AIC(model_ols)
## [1] 3530.177
# Spatial autoregressive (spatial lag) model (SAR)
model_sar <- lagsarlm(CMEDV ~ TAX + INDUS, data=bd.trSP, listw = sswm)
summary(model_sar)
##
## Call:lagsarlm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP, listw = sswm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.04892 -3.13774 -0.97517 1.62383 28.71094
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.5856028 1.3111979 7.3106 2.66e-13
## TAX -0.0053830 0.0021912 -2.4567 0.014023
## INDUS -0.1497115 0.0539188 -2.7766 0.005493
##
## Rho: 0.7385, LR test value: 282.57, p-value: < 2.22e-16
## Asymptotic standard error: 0.034368
## z-value: 21.488, p-value: < 2.22e-16
## Wald statistic: 461.74, p-value: < 2.22e-16
##
## Log likelihood: -1619.802 for lag model
## ML residual variance (sigma squared): 30.9, (sigma: 5.5588)
## Number of observations: 506
## Number of parameters estimated: 5
## AIC: 3249.6, (AIC for lm: 3530.2)
## LM test for residual autocorrelation
## test value: 5.4407, p-value: 0.019673
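Because of the ρWy term, SAR coefficients are not marginal effects. A change in one tract's regressor feeds back through its neighbors, so the effect of regressor k is the matrix S_k = (I − ρW)⁻¹ β_k. The average of its diagonal is the direct impact and the average row sum is the total impact. With row-standardized W, the total reduces to β_k / (1 − ρ), which for INDUS here is −0.1497 / (1 − 0.7385) ≈ −0.5725. spatialreg's `impacts()` computes these summaries for a fitted `lagsarlm` model (e.g. `impacts(model_sar, listw = sswm)`). The base-R sketch below uses a hypothetical helper, `sar_impacts()`, with the document's ρ and β for INDUS but a toy six-region chain. Only the total carries over to the tract data, not the split between direct and indirect impacts.

```r
# Hypothetical helper: average direct, indirect and total impacts of one
# regressor in a SAR model, from S = (I - rho * W)^(-1) * beta.
sar_impacts <- function(beta, rho, Ws) {
  n <- nrow(Ws)
  S <- solve(diag(n) - rho * Ws) * beta  # S[i, j] = dy_i / dx_j
  direct <- mean(diag(S))                # average own-region effect
  total  <- mean(rowSums(S))             # average total effect
  c(direct = direct, indirect = total - direct, total = total)
}

# Toy six-region chain (hypothetical) with the SAR estimates for INDUS
n <- 6
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
Ws <- W / rowSums(W)

imp <- sar_impacts(beta = -0.1497115, rho = 0.7385, Ws = Ws)
print(round(imp, 4))
```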
# Spatial error model (SEM)
model_sem <- errorsarlm(CMEDV ~ TAX + INDUS, data=bd.trSP, listw = sswm)
summary(model_sem)
##
## Call:errorsarlm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP, listw = sswm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.39670 -3.04655 -0.91684 1.90550 28.42394
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 32.2908228 1.6678662 19.3606 < 2.2e-16
## TAX -0.0147438 0.0037163 -3.9674 7.267e-05
## INDUS -0.4207164 0.0913266 -4.6067 4.091e-06
##
## Lambda: 0.78271, LR test value: 315.65, p-value: < 2.22e-16
## Asymptotic standard error: 0.032246
## z-value: 24.273, p-value: < 2.22e-16
## Wald statistic: 589.19, p-value: < 2.22e-16
##
## Log likelihood: -1603.264 for error model
## ML residual variance (sigma squared): 28.292, (sigma: 5.319)
## Number of observations: 506
## Number of parameters estimated: 5
## AIC: 3216.5, (AIC for lm: 3530.2)
# Spatial Durbin model (SDM)
model_durbin <- lagsarlm(CMEDV ~ TAX + INDUS, data=bd.trSP, listw = sswm, type="mixed")
summary(model_durbin)
##
## Call:lagsarlm(formula = CMEDV ~ TAX + INDUS, data = bd.trSP, listw = sswm,
## type = "mixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.5376 -3.0276 -0.9167 1.8568 28.2509
##
## Type: mixed
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.7895170 1.2259329 5.5382 3.055e-08
## TAX -0.0149424 0.0041149 -3.6313 0.000282
## INDUS -0.4509479 0.1028011 -4.3866 1.151e-05
## lag.TAX 0.0115367 0.0050864 2.2681 0.023321
## lag.INDUS 0.3888530 0.1284000 3.0284 0.002458
##
## Rho: 0.78182, LR test value: 314.95, p-value: < 2.22e-16
## Asymptotic standard error: 0.032316
## z-value: 24.193, p-value: < 2.22e-16
## Wald statistic: 585.3, p-value: < 2.22e-16
##
## Log likelihood: -1602.993 for mixed model
## ML residual variance (sigma squared): 28.275, (sigma: 5.3175)
## Number of observations: 506
## Number of parameters estimated: 7
## AIC: 3220, (AIC for lm: 3532.9)
## LM test for residual autocorrelation
## test value: 0.13164, p-value: 0.71674
AIC(model_durbin)
## [1] 3219.987
Briefly compare these models, choose the one you consider best, and explain why.
stargazer(model_ols, model_sar, model_sem, model_durbin, type = "text", title="Estimated Regression Results CMEDV")
##
## Estimated Regression Results CMEDV
## ====================================================================================
## Dependent variable:
## ----------------------------------------------------------------
## CMEDV
## OLS spatial spatial spatial
## autoregressive error autoregressive
## (1) (2) (3) (4)
## ------------------------------------------------------------------------------------
## TAX -0.014*** -0.005** -0.015*** -0.015***
## (0.003) (0.002) (0.004) (0.004)
##
## INDUS -0.403*** -0.150*** -0.421*** -0.451***
## (0.074) (0.054) (0.091) (0.103)
##
## lag.TAX 0.012**
## (0.005)
##
## lag.INDUS 0.389***
## (0.128)
##
## Constant 32.688*** 9.586*** 32.291*** 6.790***
## (0.921) (1.311) (1.668) (1.226)
##
## ------------------------------------------------------------------------------------
## Observations 506 506 506 506
## R2 0.266
## Adjusted R2 0.263
## Log Likelihood -1,619.802 -1,603.264 -1,602.993
## sigma2 30.900 28.292 28.275
## Akaike Inf. Crit. 3,249.605 3,216.528 3,219.987
## Residual Std. Error 7.881 (df = 503)
## F Statistic 91.265*** (df = 2; 503)
## Wald Test (df = 1) 461.740*** 589.187*** 585.299***
## LR Test (df = 1) 282.572*** 315.649*** 314.953***
## ====================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# AIC Comparison
aic_values <- data.frame(
Model = c("OLS", "SAR", "SEM", "Durbin"),
AIC = c(AIC(model_ols), AIC(model_sar), AIC(model_sem), AIC(model_durbin))
)
print(aic_values)
## Model AIC
## 1 OLS 3530.177
## 2 SAR 3249.605
## 3 SEM 3216.528
## 4 Durbin 3219.987
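One way to inform the comparison is to run likelihood-ratio tests between nested models, using the log-likelihoods printed above. SAR is the Durbin model with θ = 0, and SEM is the Durbin model under the common-factor restriction θ = −ρβ. Each imposes 2 restrictions here, one per regressor. The same log-likelihoods also reproduce the AIC column, since AIC = −2·logLik + 2k (for SEM: 2 × 1603.264 + 2 × 5 = 3216.528). The sketch below computes both tests with `lr_test()`, a hypothetical helper. Recent spatialreg versions also provide `LR.Sarlm()` for testing fitted models directly.

```r
# Hypothetical helper: likelihood-ratio test of a restricted model against a
# more general model that nests it, given their log-likelihoods.
lr_test <- function(ll_restricted, ll_full, df) {
  stat <- 2 * (ll_full - ll_restricted)
  c(LR = stat, df = df, p_value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Log-likelihoods copied from the model summaries above
ll <- c(SAR = -1619.802, SEM = -1603.264, SDM = -1602.993)

sar_vs_sdm <- lr_test(ll[["SAR"]], ll[["SDM"]], df = 2)  # H0: theta = 0
sem_vs_sdm <- lr_test(ll[["SEM"]], ll[["SDM"]], df = 2)  # H0: theta = -rho * beta

print(signif(sar_vs_sdm, 4))
print(signif(sem_vs_sdm, 4))
```

With these numbers, the SAR restriction is strongly rejected (LR ≈ 33.6, p ≈ 5 × 10⁻⁸). The common-factor restriction is not rejected at conventional levels (LR ≈ 0.54, p ≈ 0.76), which is consistent with SEM having the lowest AIC.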