Librerias
library(foreign)
library(dplyr)
##
## Adjuntando el paquete: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(spdep)
## Cargando paquete requerido: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Cargando paquete requerido: sf
## Linking to GEOS 3.13.0, GDAL 3.10.1, PROJ 9.5.1; sf_use_s2() is TRUE
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(rgeoda)
## Cargando paquete requerido: digest
##
## Adjuntando el paquete: 'rgeoda'
## The following object is masked from 'package:spdep':
##
## skater
library(RColorBrewer)
library(viridis)
## Cargando paquete requerido: viridisLite
library(ggplot2)
library(tmap)
library(sf)
library(sp)
library(readxl)
library(tidyr)
library(plm)
##
## Adjuntando el paquete: 'plm'
## The following objects are masked from 'package:dplyr':
##
## between, lag, lead
library(spdep)
library(pspatreg)
library(sqldf)
## Cargando paquete requerido: gsubfn
## Cargando paquete requerido: proto
## Cargando paquete requerido: RSQLite
library(spatialreg)
## Cargando paquete requerido: Matrix
##
## Adjuntando el paquete: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Adjuntando el paquete: 'spatialreg'
## The following objects are masked from 'package:spdep':
##
## get.ClusterOption, get.coresOption, get.mcOption,
## get.VerboseOption, get.ZeroPolicyOption, set.ClusterOption,
## set.coresOption, set.mcOption, set.VerboseOption,
## set.ZeroPolicyOption
library(tmap)
Situación Problema B
De acuerdo a la Cámara Nacional de la Industria Farmacéutica, (CANIFARMA) en México las personas en situación de pobreza se caracterizan por tener una probabilidad 5 veces mayor de fallecer por COVID-19 que las personas con relativamente mayor nivel de ingresos (Arceo-Gómez, et al., 2021)2. Además de la falta de acceso a servicios de salud y posibles cormobilidades, otro factor relevante en incrementar dicha probabililidad es el perfil socioeconómico (Arceo-Gómez, et al., 2021). A partir de la pandemia por COVI19, la firma de consultoría XYZ (México) establece que “Las organizaciones que su principal actividad de negocios es brindar servicios de salud requiren soluciones específicas e innovadoras, para aprovechar oportunidades, afrontar retos, así como favorecer su consolidación y crecimiento”. Algunos de los servicios enfocados por parte de la firma es detectar las necesidades y potencial del crecimiento del sector salud a partir de la Analítica de Datos. ¿Cuáles son las regiones en México que representan una oportunidad de crecimiento y/o expansión de organizaciones relacionadas con el acceso a servicios de salud? ¿Cuáles son las características socioeconómicas de la población de dichas regiones? ¿Porqué sí / no existen condiciones en dichas regiones para el crecimiento y/o expansión de organizaciones relacionadas con el acceso a servicios de salud?
1) Brevemente, describir con sus propias palabras qué es un ESDA y cuál es su principal propósito en el proceso de analítica de datos.
El ESDA (Exploratory Spatial Data Analysis) es un tipo de análisis exploratorio, primero definamos el concepto base : EDA. El Análisis Exploratorio de Datos (EDA) es un proceso fundamental en la analítica de datos. Su principal propósito es comprender la naturaleza de los datos, identificar patrones, valores atípicos y relaciones antes de realizar un análisis más profundo.Este proceso implica inspeccionar, limpiar, transformar y modelar bases de datos. Su principal objetivo es descubrir información útil, llegar a conclusiones y respaldar la toma de decisiones.
Ahora bien, un ESDA es una técnica basada en la ciencia de la información geográfica (GIS) que permite describir y visualizar distribuciones espaciales. Su objetivo principal es dentificar ubicaciones atípicas o valores extremos en el espacio así como descubrir patrones de asociación espacial, agrupamientos o hot spots que puedan ser representados de manera numérica y visual a través de varios métodos.
2) Brevemente, describir con sus propias palabras el concepto de autocorrelación espacial así como 1-2 ejemplos relacionados con dicho concepto.
La autocorrelación es una característica de los datos que muestra el grado de similitud entre los valores de las mismas variables en intervalos de tiempo sucesivos. En contraste, la autocorrelación espacial se refiere a la similitud espacial entre los valores de una variable en un conjunto de datos geográfico. En otras palabras, busca patrones de dependencia espacial. Si los valores similares tienden a agruparse o dispersarse en el espacio, estamos hablando de autocorrelación espacial.
Esta característica se mide a través del Índice de Moran (I de Moran).
Si el índice es cercano a +1, hay autocorrelación espacial positiva:
Los valores similares se agrupan en el espacio.
Ejemplo: Temperaturas altas cerca de otras temperaturas altas.
Si el índice es cercano a -1, hay autocorrelación espacial negativa:
Los valores disímiles se agrupan.
Ejemplo: Áreas densamente pobladas rodeadas de áreas poco pobladas.
Este fenómeno se puede apreciar en la siguiente imágen:
3) Con base en el archivo de datos asignado realizar selección de variables y elaborar ESDA que incluye:
Importing dataset
covid <- read_excel("D:/Backup octavo sem/Planación estratégica/Modulo 1/Actividades/cross_sectional_dataset.xlsx")
covid[sapply(covid, is.numeric)] <- lapply(covid[sapply(covid, is.numeric)], function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
Importing shapefile
# Mexico's states (32)
mx_state_map <- st_read("D:/Backup octavo sem/Planación estratégica/Modulo 1/Actividades/mx_maps/mx_states/mexlatlong.shp")
## Reading layer `mexlatlong' from data source
## `D:\Backup octavo sem\Planación estratégica\Modulo 1\Actividades\mx_maps\mx_states\mexlatlong.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 32 features and 19 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -118.4042 ymin: 14.55055 xmax: -86.73862 ymax: 32.71846
## Geodetic CRS: WGS 84
# Mexico's Municipios (2,457)
mx_mpio_map <- st_read("D:/Backup octavo sem/Planación estratégica/Modulo 1/Actividades/mx_maps/mx_mpios/Mexican Municipalities.shp")
## Reading layer `Mexican Municipalities' from data source
## `D:\Backup octavo sem\Planación estratégica\Modulo 1\Actividades\mx_maps\mx_mpios\Mexican Municipalities.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 2456 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -118.4076 ymin: 14.5321 xmax: -86.71041 ymax: 32.71865
## Geodetic CRS: WGS 84
# merging dataset
#mpio_geodata <- geo_join(mx_mpio_map,covid,'IDUNICO','clave_municipio',how='inner')
mpio_geodata <- inner_join(mx_mpio_map, covid, by = c("IDUNICO" = "clave_municipio"))
a. Estadísticos Descripitvos (Global y Regional). Incluir elementos gráficos (histogramas, gráfico de barras, etc.).
str(covid[,4:21])
## tibble [2,457 × 18] (S3: tbl_df/tbl/data.frame)
## $ numero_hospitales : num [1:2457] 1289 22 50 8 60 ...
## $ poblacion_2022 : num [1:2457] 961977 50864 60760 16918 130184 ...
## $ hogrem2015 : num [1:2457] 4.82 12.87 25.85 14.29 5.72 ...
## $ hogremjefmuj2015 : num [1:2457] 27.8 22.1 25.6 21.2 19.9 ...
## $ popnoafmed2015 : num [1:2457] 14.34 5.77 9.97 5.28 14.02 ...
## $ gini2015 : num [1:2457] 0.392 0.37 0.375 0.37 0.402 0.389 0.396 0.418 0.371 0.37 ...
## $ popden2020 : num [1:2457] 812 94.1 62.5 131.9 260.3 ...
## $ crimen_2018 : num [1:2457] 7.36 7.81 8.4 17.86 5.14 ...
## $ crimen_2019 : num [1:2457] 8.15 5.78 8.31 0 10.98 ...
## $ inclusion_fin_2019 : num [1:2457] 1.78 0 1.16 0 0.36 1.2 1.03 0 0 0 ...
## $ porcentaje_pob_pobreza : num [1:2457] 23.7 40.1 45.8 37 26.3 ...
## $ porcentaje_pob_pobreza_ext : num [1:2457] 1.97 4.14 4.5 3.38 3.29 2.88 5.56 2.32 3.15 4.81 ...
## $ porcentaje_pob_servicios_salud: num [1:2457] 20 16.5 21 17.6 21.1 ...
## $ porcentaje_pob_acceso_ss : num [1:2457] 37.8 62.1 76.2 47.5 41.9 ...
## $ pob_6-14_no_edu : num [1:2457] 4.6 6.29 6.95 5.7 5.88 4.44 4.38 5.74 5.37 4.4 ...
## $ rezago_social : num [1:2457] -1.32 -0.86 -0.92 -1 -1.17 -1.17 -1.07 -0.96 -0.9 -0.83 ...
## $ grado_rs : chr [1:2457] "Muy bajo" "Muy bajo" "Muy bajo" "Muy bajo" ...
## $ feb_2020 : num [1:2457] 0 0 0 0 0 0 0 0 0 0 ...
summary(covid[,4:21])
## numero_hospitales poblacion_2022 hogrem2015 hogremjefmuj2015
## Min. : 1.00 Min. : 95 Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.: 4546 1st Qu.: 2.296 1st Qu.:22.15
## Median : 8.00 Median : 14240 Median : 5.357 Median :26.02
## Mean : 47.34 Mean : 52158 Mean : 8.562 Mean :25.86
## 3rd Qu.: 24.00 3rd Qu.: 37628 3rd Qu.:11.934 3rd Qu.:29.51
## Max. :2744.00 Max. :1815551 Max. :52.027 Max. :48.24
## popnoafmed2015 gini2015 popden2020 crimen_2018
## Min. : 1.124 Min. :0.3030 Min. : 0.16 Min. : 0.00
## 1st Qu.: 9.746 1st Qu.:0.3690 1st Qu.: 17.79 1st Qu.: 0.00
## Median :13.702 Median :0.3870 Median : 52.85 Median : 9.77
## Mean :14.938 Mean :0.3916 Mean : 313.09 Mean : 19.03
## 3rd Qu.:19.224 3rd Qu.:0.4100 3rd Qu.: 145.07 3rd Qu.: 25.61
## Max. :62.101 Max. :0.6400 Max. :56489.74 Max. :719.42
## crimen_2019 inclusion_fin_2019 porcentaje_pob_pobreza
## Min. : 0.00 Min. : 0.0000 Min. : 5.45
## 1st Qu.: 0.00 1st Qu.: 0.0000 1st Qu.:45.61
## Median : 11.54 Median : 0.0000 Median :62.43
## Mean : 20.34 Mean : 0.4892 Mean :61.91
## 3rd Qu.: 26.89 3rd Qu.: 0.8500 3rd Qu.:80.13
## Max. :551.82 Max. :10.6800 Max. :99.65
## porcentaje_pob_pobreza_ext porcentaje_pob_servicios_salud
## Min. : 0.00 Min. : 1.05
## 1st Qu.: 5.36 1st Qu.:16.10
## Median :12.52 Median :23.30
## Mean :17.14 Mean :25.10
## 3rd Qu.:24.06 3rd Qu.:32.81
## Max. :84.45 Max. :83.86
## porcentaje_pob_acceso_ss pob_6-14_no_edu rezago_social
## Min. :22.03 Min. : 0.000 Min. :-1.550000
## 1st Qu.:64.45 1st Qu.: 4.170 1st Qu.:-0.760000
## Median :76.39 Median : 5.730 Median :-0.220000
## Mean :72.37 Mean : 6.301 Mean :-0.004418
## 3rd Qu.:83.43 3rd Qu.: 7.770 3rd Qu.: 0.460000
## Max. :96.99 Max. :38.560 Max. : 6.830000
## grado_rs feb_2020
## Length:2457 Min. :0.000000
## Class :character 1st Qu.:0.000000
## Mode :character Median :0.000000
## Mean :0.003267
## 3rd Qu.:0.000000
## Max. :1.000000
hist(covid$numero_hospitales,main = 'Número de hospitales')
hist(covid$poblacion_2022,main = 'Población 2022')
hist(covid$gini2015,main = 'Coeficiente de desigualdad del ingreso')
hist(covid$crimen_2019 ,main = 'Crimen 2019')
# Gráfico de barras de la tasa de COVID19
ggplot(data = covid, aes(x = entidad, y = tasa_covid)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Tasa de COVID19 por Entidad",
x = "Entidad",
y = "Tasa de COVID19") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# Gráfico de barras de población 2022
ggplot(data = covid, aes(x = reorder(entidad, -poblacion_2022), y = poblacion_2022)) +
geom_bar(stat = "identity", fill = "lightgreen") +
labs(title = "Población por Entidad en 2022",
x = "Entidad",
y = "Población") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# Calcular estadísticos descriptivos por municipio
municipio_stats <- covid %>%
group_by(mpio) %>%
summarise(
promedio_hospitales = mean(numero_hospitales, na.rm = TRUE),
promedio_poblacion = mean(poblacion_2022, na.rm = TRUE),
total_casos = sum(total_casos, na.rm = TRUE),
tasa_covid = mean(tasa_covid, na.rm = TRUE)
)
# Mostrar los primeros registros de estadísticos por municipio
head(municipio_stats)
## # A tibble: 6 × 5
## mpio promedio_hospitales promedio_poblacion total_casos tasa_covid
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Abala 2 7035 75 107.
## 2 Abasolo 16.2 28099. 1193 141.
## 3 Abejones 2 1033 21 203.
## 4 Acacoyagua 10 19856 26 13.1
## 5 Acajete 30 39104. 335 43.7
## 6 Acala 20 34494 47 13.6
# Mapa básico de municipios
tm_shape(mx_mpio_map) +
tm_polygons(col = "black") +
tm_compass(position=c("left","bottom")) +
tm_layout(main.title = "Mexico's Municipios")
b. Estadísticos de Dispersión (Global y Regional). Incluir elementos gráficos (box plots, qq plots, etc.).
# Gráfico de dispersión de población y tasa de COVID19
ggplot(data = municipio_stats, aes(x = promedio_poblacion, y = tasa_covid)) +
geom_point(color = "red") +
labs(title = "Relación entre Población y Tasa de COVID19 por Municipio",
x = "Población",
y = "Tasa de COVID19")
qqplot(covid$numero_hospitales,covid$total_casos,main = "Número de hospitales")
qqplot(covid$poblacion_2022,covid$total_casos, main = "Población 2022")
qqplot(covid$gini2015,covid$total_casos, main = "Coeficiente de Desigualdad de Género")
qqplot(covid$crimen_2019,covid$total_casos, main = "Tasa de crimen por cada 100 mil habitantes")
c. Visualizar la distribución espacial de las variables seleccionadas usando mapas.
tm_shape(mpio_geodata) +
tm_polygons(col = "tasa_covid", palette="-cividis", style="quantile", n=8, title="Tasa Covid") +
tm_layout(main.title= 'Tasa Covid por Municipios', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
tm_shape(mpio_geodata) +
tm_polygons(col = "total_casos", palette = "OrRd", style="quantile", n=8, title="Total de casos") +
tm_layout(main.title= 'Total de casos por municipio', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
tm_shape(mpio_geodata) +
tm_polygons(col = "numero_hospitales", palette="Purples", style="quantile", n=8, title="Tasa Covid") +
tm_layout(main.title= 'Número de Hospitales por municipio', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
tm_shape(mpio_geodata) +
tm_polygons(col = "poblacion_2022", palette="Greens", style="quantile", n=8, title="Tasa Covid") +
tm_layout(main.title= 'Población por municipio', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
tm_shape(mpio_geodata) +
tm_polygons(col = "gini2015", palette="inferno", style="quantile", n=8, title="Tasa Covid") +
tm_layout(main.title= 'Coeficiente de desigualdad del ingreso', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
tm_shape(mpio_geodata) +
tm_polygons(col = "crimen_2019", palette="viridis", style="quantile", n=8, title="Tasa Covid") +
tm_layout(main.title= 'Tasa de Crimen por Municipio', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
# Municipios de México - Graficación de variables
hospitales <- tm_shape(mpio_geodata) +
tm_polygons(col = "numero_hospitales", palette="Blues", style="quantile", n=8, title="Numero de Hospitales") +
tm_layout(main.title= 'Número de hospitales', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
tasa_covid <- tm_shape(mpio_geodata) +
tm_polygons(col = "tasa_covid", palette="BuGn", style="quantile", n=8, title="Tasa Covid") +
tm_layout(main.title= 'Tasa Covid', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
gini <- tm_shape(mpio_geodata) +
tm_polygons(col = "gini2015", palette="OrRd", style="quantile", n=8, title="gini2015") +
tm_layout(main.title= 'Gini 2015', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
pop <- tm_shape(mpio_geodata) +
tm_polygons(col = "poblacion_2022", palette="-viridis", style="quantile", n=8, title="poblacion_2022") +
tm_layout(main.title= 'poblacion_2022', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
# Panel of Maps
tmap_arrange(hospitales,tasa_covid,gini,pop, ncol = 2)
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
##
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "BuGn" is named
## "brewer.bu_gn"
## Multiple palettes called "bu_gn" found: "brewer.bu_gn", "matplotlib.bu_gn". The first one, "brewer.bu_gn", is returned.
##
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "OrRd" is named
## "brewer.or_rd"
## Multiple palettes called "or_rd" found: "brewer.or_rd", "matplotlib.or_rd". The first one, "brewer.or_rd", is returned.
d. Elaborar y visualizar 1 – 2 matrices de connectividad para estados / municipios de México.
Spatial Connectivity Matrix
swm <- poly2nb(mx_mpio_map, queen=T)
summary(swm) # The average number of neighbors is 4.31
## Neighbour list object:
## Number of regions: 2456
## Number of nonzero links: 14392
## Percentage nonzero weights: 0.2385967
## Average number of links: 5.859935
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20 22
## 8 63 202 393 515 467 317 233 143 52 27 15 5 4 3 3 1 1 1 3
## 8 least connected regions:
## 300 355 668 859 931 1244 1462 2235 with 1 link
## 3 most connected regions:
## 967 1059 1174 with 22 links
Queen Matrix
sswm <- nb2listw(swm, style="W", zero.policy = TRUE)
mx_mpio_map_a <- as(mx_mpio_map, "Spatial")
mx_mpio_map_centroid <- coordinates(mx_mpio_map_a)
plot(mx_mpio_map_a,border="blue",axes=FALSE,las=1, main="Mexico's Municipality Queen SWM")
plot(mx_mpio_map_a,col="grey",border=grey(0.9),axes=T,add=T)
plot(sswm,coords=mx_mpio_map_centroid,pch=19,cex=0.1,col="red",add=T)
e. Detectar la presencia de autocorrelación espacial global para cada una de las variables seleccionadas.
f. Detectar la presencia de autocorrelación espacial local para cada una de las variables seleccionadas.
### Spatial Autocorrelation
# Global Moran's I
moran.test(mpio_geodata$numero_hospitales, sswm)
##
## Moran I test under randomisation
##
## data: mpio_geodata$numero_hospitales
## weights: sswm
##
## Moran I statistic standard deviate = 15.167, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.1810738904 -0.0004073320 0.0001431794
moran.test(mpio_geodata$total_casos, sswm)
##
## Moran I test under randomisation
##
## data: mpio_geodata$total_casos
## weights: sswm
##
## Moran I statistic standard deviate = 35.138, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.4143086184 -0.0004073320 0.0001392969
moran.test(mpio_geodata$crimen_2019, sswm)
##
## Moran I test under randomisation
##
## data: mpio_geodata$crimen_2019
## weights: sswm
##
## Moran I statistic standard deviate = 20.352, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.2447673979 -0.0004073320 0.0001451252
moran.test(mpio_geodata$gini2015, sswm)
##
## Moran I test under randomisation
##
## data: mpio_geodata$gini2015
## weights: sswm
##
## Moran I statistic standard deviate = 24.283, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.2954305305 -0.0004073320 0.0001484178
moran.test(mpio_geodata$poblacion_2022, sswm)
##
## Moran I test under randomisation
##
## data: mpio_geodata$poblacion_2022
## weights: sswm
##
## Moran I statistic standard deviate = 26.255, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.3159585568 -0.0004073320 0.0001451961
table <- data.frame(Variable = c("numero_hospitales", "total_casos", "crimen_2019","gini_2015","poblacion_2022"), GM = c(0.18, 0.41, 0.24,0.29,0.31), Significance = c("*","***","**","**","***"))
table
## Variable GM Significance
## 1 numero_hospitales 0.18 *
## 2 total_casos 0.41 ***
## 3 crimen_2019 0.24 **
## 4 gini_2015 0.29 **
## 5 poblacion_2022 0.31 ***
mpio_geodata$sp_lag_numero_hospitales <- lag.listw(sswm, mpio_geodata$numero_hospitales, zero.policy=TRUE)
mpio_geodata$sp_lag_tasa_covid <- lag.listw(sswm, mpio_geodata$total_casos, zero.policy=TRUE)
mpio_geodata$sp_lag_crimen_2019 <- lag.listw(sswm, mpio_geodata$crimen_2019, zero.policy=TRUE)
mpio_geodata$sp_lag_gini2015 <- lag.listw(sswm, mpio_geodata$gini2015, zero.policy=TRUE)
mpio_geodata$sp_lag_poblacion_2022 <- lag.listw(sswm, mpio_geodata$poblacion_2022, zero.policy=TRUE)
g. Identificar la posible presencia de clústers locales / regionales para cada una de las variables de seleccionadas. ¿Cuáles son algunas de las características socioeconómicas y/o económicas de los estados / municipios que componen los clústers identificados?
newhosp_lag <- tm_shape(mpio_geodata) +
tm_polygons(col = "sp_lag_tasa_covid", palette="Purples", style="quantile", n=5, title="Número de Casos de Covid") +
tm_layout(main.title= 'Clusters de Covid', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
newhosp_lag
newhosp_lag <- tm_shape(mpio_geodata) +
tm_polygons(col = "sp_lag_numero_hospitales", palette="Purples", style="quantile", n=5, title="Número de Hospitales") +
tm_layout(main.title= 'Clusters de Hospitales', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
newhosp_lag
newcrime_lag <- tm_shape(mpio_geodata) +
tm_polygons(col = "sp_lag_crimen_2019", palette="Reds", style="quantile", n=10, title="Número de Crimen") +
tm_layout(main.title= 'Clusters de Crimen', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
newcrime_lag
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Reds" is named
## "brewer.reds"
## Multiple palettes called "reds" found: "brewer.reds", "matplotlib.reds". The first one, "brewer.reds", is returned.
newgini_lag <- tm_shape(mpio_geodata) +
tm_polygons(col = "sp_lag_gini2015", palette="Blues", style="quantile", n=5, title="Desigualdad de ingreso") +
tm_layout(main.title= 'Clusters de Desigualdad de ingreso', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
newgini_lag
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
newpob_lag <- tm_shape(mpio_geodata) +
tm_polygons(col = "sp_lag_poblacion_2022", palette="PuOr", style="quantile", n=5, title="Densidad Poblacional") +
tm_layout(main.title= 'Clusters de Población', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
newpob_lag
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "PuOr" is named
## "brewer.pu_or"
## Multiple palettes called "pu_or" found: "brewer.pu_or", "matplotlib.pu_or". The first one, "brewer.pu_or", is returned.
4) Considerando el contexto de la situación problema, describir los principales 6 - 8 hallazgos encontrados a partir del ESDA.
- El esparcimiento del covid se dió en mayores niveles en el norte del país y esto pudo ocasionar una gran cantidad de muertes debido a que no existía una cantidad de hospitales considerable distribuidos en el país.
- Baja California y Baja California sur fueron los estados que presentaron una mayor cantidad de casos en base a los mapas realizados. De la misma manera cuentan con los mayores números de casos de covid, porcentualmente hablando, puede que estos estados no tengan las mayores cifras sin embargo si tuvieron valores altos en contagios.
- En general se tuvo una alta tasa de casos en el oeste del país en contagios de covid, al compararlo con las tasas de crimen podemos ver que hay una autocorrelación positiva y de manera inicial podemos ver que hay posibilidad de encontrar una relación entre el crimen en México y las muertes de covid, se desconoce como sea la relación en práctica de este evento.
- El indicador GINI también es un indicador que de manera visual parece tener relación con la cantidad de casos de covid ya que se ve una fuerte concentración de casos y de desigualdad de ingreso en los mapas.
- En cuanto a la población de México podemos ver que hay una autocorrelación espacial aleatoria, esa variable podría ser explorada con más detalle en un análisis posterior para modelado.
Actividad Extra
Spatial Regression Model
Model specification
panel <- read.csv('panel_dataset.csv')
summary(panel)
## state year state_id new_fdi
## Length:512 Min. :2006 Min. : 888 Min. :-958.11
## Class :character 1st Qu.:2010 1st Qu.:1047 1st Qu.: 59.33
## Mode :character Median :2014 Median :1081 Median : 196.00
## Mean :2014 Mean :1219 Mean : 367.61
## 3rd Qu.:2017 3rd Qu.:1118 3rd Qu.: 429.81
## Max. :2021 Max. :2357 Max. :4724.62
##
## reinv_profits intercom_acc total_fdi crime_rate
## Min. : -55.23 Min. :-796.120 Min. :-405.5 Min. : 1.710
## 1st Qu.: 69.75 1st Qu.: 6.902 1st Qu.: 195.2 1st Qu.: 7.973
## Median : 156.84 Median : 81.315 Median : 493.6 Median : 13.745
## Mean : 370.56 Mean : 207.514 Mean : 945.7 Mean : 21.758
## 3rd Qu.: 400.09 3rd Qu.: 328.062 3rd Qu.:1184.5 3rd Qu.: 25.828
## Max. :4436.22 Max. :1960.580 Max. :8777.9 Max. :181.510
##
## unemployment employment business_activity real_wage
## Min. :0.01000 Min. :0.8900 Min. :-2.980 Min. :239.3
## 1st Qu.:0.03000 1st Qu.:0.9500 1st Qu.:-2.270 1st Qu.:281.1
## Median :0.04000 Median :0.9700 Median :-2.090 Median :304.8
## Mean :0.04326 Mean :0.9632 Mean :-1.868 Mean :312.8
## 3rd Qu.:0.05000 3rd Qu.:0.9700 3rd Qu.:-1.887 3rd Qu.:333.9
## Max. :0.10000 Max. :0.9900 Max. : 2.470 Max. :466.8
##
## real_ave_month_income pop_density good_governance
## Min. :3281 Min. : 7.74 Min. : 0.000
## 1st Qu.:4553 1st Qu.: 39.56 1st Qu.: 0.170
## Median :5247 Median : 61.49 Median : 0.480
## Mean :5384 Mean : 298.49 Mean : 2.219
## 3rd Qu.:6083 3rd Qu.: 149.11 3rd Qu.: 1.258
## Max. :9206 Max. :6195.59 Max. :200.020
##
## ratio_public_investment lq_primary lq_secondary lq_tertiary
## Min. :0.000000 Min. :0.010 Min. :0.3200 Min. :0.7800
## 1st Qu.:0.000000 1st Qu.:0.400 1st Qu.:0.6700 1st Qu.:0.9400
## Median :0.000000 Median :0.740 Median :0.9900 Median :1.0100
## Mean :0.005625 Mean :1.057 Mean :0.9869 Mean :0.9996
## 3rd Qu.:0.010000 3rd Qu.:1.350 3rd Qu.:1.2500 3rd Qu.:1.0700
## Max. :0.030000 Max. :4.700 Max. :1.9900 Max. :1.1800
##
## exchange_rate patents_rate inpc border_distance
## Min. :10.85 Min. : 0.000 Min. : 62.69 Min. : 8.83
## 1st Qu.:12.87 1st Qu.: 0.330 1st Qu.: 74.14 1st Qu.: 613.26
## Median :14.14 Median : 0.800 Median : 85.48 Median : 751.64
## Mean :15.69 Mean : 1.557 Mean : 86.75 Mean : 704.92
## 3rd Qu.:19.38 3rd Qu.: 1.923 3rd Qu.: 99.46 3rd Qu.: 875.76
## Max. :20.52 Max. :13.770 Max. :117.31 Max. :1252.66
## NA's :32
## college_education new_fdi_real_mxn log_new_fdi_real_mxn region_a
## Min. :0.1860 Min. :-15305 Min. :-4.185 Length:512
## 1st Qu.:0.3050 1st Qu.: 1124 1st Qu.: 3.050 Class :character
## Median :0.3530 Median : 3496 Median : 3.544 Mode :character
## Mean :0.3599 Mean : 6587 Mean : 3.192
## 3rd Qu.:0.4140 3rd Qu.: 7558 3rd Qu.: 3.878
## Max. :0.5740 Max. : 91516 Max. : 4.961
##
## region_b trump_election exports
## Min. :1.000 Min. :0.00 Min. : 14707
## 1st Qu.:2.000 1st Qu.:0.00 1st Qu.: 1090699
## Median :3.000 Median :0.00 Median : 4125480
## Mean :3.188 Mean :0.25 Mean :10202997
## 3rd Qu.:4.250 3rd Qu.:0.25 3rd Qu.:16852403
## Max. :5.000 Max. :1.00 Max. :58614915
## NA's :32
# Mexico's states (32)
mx_state_map <- st_read("D:/Backup octavo sem/Planación estratégica/Modulo 1/Actividades/mx_maps/mx_states/mexlatlong.shp")
## Reading layer `mexlatlong' from data source
## `D:\Backup octavo sem\Planación estratégica\Modulo 1\Actividades\mx_maps\mx_states\mexlatlong.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 32 features and 19 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -118.4042 ymin: 14.55055 xmax: -86.73862 ymax: 32.71846
## Geodetic CRS: WGS 84
pd_frame <- pdata.frame(panel, index = c("state", "year"), drop.index = TRUE)
# panel model specification
panel_model <- new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density
summary(panel_model)
## Length Class Mode
## 3 formula call
# estimate panel regression model
non_spatial_panel_model <- plm(panel_model, data = pd_frame, model = "within", effect = "twoways")
summary(non_spatial_panel_model)
## Twoways effects Within Model
##
## Call:
## plm(formula = panel_model, data = pd_frame, effect = "twoways",
## model = "within")
##
## Balanced Panel: n = 32, T = 16, N = 512
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -52312.79 -2559.00 -584.15 1768.59 49856.94
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## business_activity 482.09814 4350.44483 0.1108 0.9118
## real_ave_month_income 0.73602 0.98133 0.7500 0.4536
## crime_rate -5.62203 21.44470 -0.2622 0.7933
## pop_density 28.62129 20.55342 1.3925 0.1644
##
## Total Sum of Squares: 2.1112e+10
## Residual Sum of Squares: 2.0984e+10
## R-Squared: 0.0060417
## Adj. R-Squared: -0.10176
## F-statistic: 0.700541 on 4 and 461 DF, p-value: 0.59188
state_geodata <- inner_join(mx_state_map, panel, by = c("OBJECTID" = "state_id"))
data_2018 <- state_geodata %>% filter(year==2018)
data_2019 <- state_geodata %>% filter(year==2019)
data_2020 <- state_geodata %>% filter(year==2020)
data_2021 <- state_geodata %>% filter(year==2021)
## Matriz de conectividad por año
swm_2018 <- poly2nb(data_2018, queen=T)
sswm_2018 <- nb2listw(swm_2018, style="W", zero.policy = TRUE)
swm_2019 <- poly2nb(data_2019, queen=T)
sswm_2019 <- nb2listw(swm_2019, style="W", zero.policy = TRUE)
swm_2020 <- poly2nb(data_2020, queen=T)
sswm_2020 <- nb2listw(swm_2020, style="W", zero.policy = TRUE)
swm_2021 <- poly2nb(data_2021, queen=T)
sswm_2021 <- nb2listw(swm_2021, style="W", zero.policy = TRUE)
## Datos panel por año
panel_2018 <- panel %>% filter(year==2018)
panel_2019 <- panel %>% filter(year==2019)
panel_2020 <- panel %>% filter(year==2020)
panel_2021 <- panel %>% filter(year==2021)
## Matriz de conectividad general
swm2 <- poly2nb(state_geodata, queen=T)
summary(swm2) # The average number of neighbors is 4.31
## Neighbour list object:
## Number of regions: 512
## Number of nonzero links: 43008
## Percentage nonzero weights: 16.40625
## Average number of links: 84
## Link number distribution:
##
## 31 47 63 79 95 111 127 143 159
## 16 96 96 96 80 32 48 32 16
## 16 least connected regions:
## 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 with 31 links
## 16 most connected regions:
## 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 with 159 links
sswm2 <- nb2listw(swm2, style="W", zero.policy = TRUE)
# estimate spatial panel regression model
spatial_panel_model <- pspatfit(formula = panel_model, data = panel, listw = sswm2, demean = FALSE, eff_demean = "twoways", type = "sar", index = c("state", "year"))
##
## Fitting Model...
##
## Time to fit the model: 0.24 seconds
summary(spatial_panel_model)
##
## Call
## pspatfit(formula = panel_model, data = panel, listw = sswm2,
## type = "sar", demean = FALSE, eff_demean = "twoways", index = c("state",
## "year"))
##
## Parametric Terms
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 461.53697 2372.26328 0.1946 0.8458190
## business_activity 1149.78068 430.94656 2.6680 0.0078743 **
## real_ave_month_income 1.20228 0.34888 3.4461 0.0006161 ***
## crime_rate -16.10602 16.07289 -1.0021 0.3167930
## pop_density 6.75810 0.34299 19.7037 < 2.2e-16 ***
## rho 0.01800 0.10626 0.1694 0.8655448
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Goodness-of-Fit
##
## EDF Total: 6
## Sigma: 7830.16
## AIC: 9698.5
## BIC: 9723.93
# estimate spatial regression model
sar_2018 <- lagsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2018, listw = sswm_2018)
summary(sar_2018)
##
## Call:
## lagsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2018, listw = sswm_2018)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7578.5 -3178.6 -1415.6 1754.3 12902.9
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1313.62165 6166.26184 -0.2130 0.83130
## business_activity 1373.11073 1079.43620 1.2721 0.20335
## real_ave_month_income 1.93105 0.97996 1.9705 0.04878
## crime_rate -38.64395 40.18974 -0.9615 0.33628
## pop_density 6.72925 0.82234 8.1830 2.22e-16
##
## Rho: -0.011292, LR test value: 0.0028123, p-value: 0.95771
## Asymptotic standard error: 0.20428
## z-value: -0.055278, p-value: 0.95592
## Wald statistic: 0.0030557, p-value: 0.95592
##
## Log likelihood: -316.5829 for lag model
## ML residual variance (sigma squared): 22943000, (sigma: 4789.9)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 647.17, (AIC for lm: 645.17)
## LM test for residual autocorrelation
## test value: 0.027301, p-value: 0.86876
AIC(sar_2018)
## [1] 647.1658
# estimate spatial panel regression model
sem_2018 <- errorsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2018, listw = sswm_2018)
summary(sem_2018)
##
## Call:
## errorsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2018, listw = sswm_2018)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7525.0 -3147.5 -1456.8 1703.7 12912.4
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1249.22689 6028.82952 -0.2072 0.83585
## business_activity 1377.02552 1072.88876 1.2835 0.19933
## real_ave_month_income 1.91252 0.97741 1.9567 0.05038
## crime_rate -39.73962 40.11483 -0.9906 0.32186
## pop_density 6.73283 0.81898 8.2210 2.22e-16
##
## Lambda: -0.037414, LR test value: 0.019142, p-value: 0.88996
## Asymptotic standard error: 0.23978
## z-value: -0.15603, p-value: 0.87601
## Wald statistic: 0.024346, p-value: 0.87601
##
## Log likelihood: -316.5747 for error model
## ML residual variance (sigma squared): 22924000, (sigma: 4787.9)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 647.15, (AIC for lm: 645.17)
AIC(sem_2018)
## [1] 647.1495
res <- residuals(sem_2018)
data_2018$residuals <- res
tm_shape(data_2018) +
tm_polygons(col = "residuals", palette="Blues", style="quantile", n=8, title="Residuales") +
tm_layout(main.title= 'Residuales del modelo espacial', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
### Spatial Autocorrelation of Residuals
# Global Moran's I
moran.test(data_2018$residuals, sswm_2018)
##
## Moran I test under randomisation
##
## data: data_2018$residuals
## weights: sswm_2018
##
## Moran I statistic standard deviate = 0.25167, p-value = 0.4006
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.001752745 -0.032258065 0.014692356
# estimate spatial regression model
sar_2019 <- lagsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2019, listw = sswm_2019)
summary(sar_2019)
##
## Call:
## lagsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2019, listw = sswm_2019)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7286.81 -2785.05 -735.42 1621.94 10646.11
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 349.99705 5046.28428 0.0694 0.94471
## business_activity 1192.13546 889.15180 1.3408 0.18000
## real_ave_month_income 1.34088 0.71946 1.8637 0.06236
## crime_rate -48.81095 31.65134 -1.5421 0.12304
## pop_density 11.23820 0.67959 16.5368 < 2e-16
##
## Rho: 0.065628, LR test value: 0.1776, p-value: 0.67344
## Asymptotic standard error: 0.15263
## z-value: 0.42998, p-value: 0.66721
## Wald statistic: 0.18488, p-value: 0.66721
##
## Log likelihood: -309.4945 for lag model
## ML residual variance (sigma squared): 14716000, (sigma: 3836.1)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 632.99, (AIC for lm: 631.17)
## LM test for residual autocorrelation
## test value: 0.23321, p-value: 0.62915
AIC(sar_2019)
## [1] 632.9891
# estimate spatial panel regression model
sem_2019 <- errorsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2019, listw = sswm_2019)
summary(sem_2019)
##
## Call:
## errorsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2019, listw = sswm_2019)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7254.61 -3010.08 -549.51 1726.30 10777.33
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1440.64327 4889.22021 0.2947 0.76826
## business_activity 1206.27245 893.45235 1.3501 0.17698
## real_ave_month_income 1.22564 0.71572 1.7125 0.08681
## crime_rate -48.75257 31.47390 -1.5490 0.12139
## pop_density 11.31048 0.67091 16.8584 < 2e-16
##
## Lambda: -0.034474, LR test value: 0.016554, p-value: 0.89762
## Asymptotic standard error: 0.23963
## z-value: -0.14387, p-value: 0.88561
## Wald statistic: 0.020697, p-value: 0.88561
##
## Log likelihood: -309.5751 for error model
## ML residual variance (sigma squared): 14802000, (sigma: 3847.3)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 633.15, (AIC for lm: 631.17)
res <- residuals(sem_2019)
data_2019$residuals <- res
tm_shape(data_2019) +
tm_polygons(col = "residuals", palette="Blues", style="quantile", n=8, title="Residuales") +
tm_layout(main.title= 'Residuales del modelo espacial', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
### Spatial Autocorrelation of Residuals
# Global Moran's I
moran.test(data_2019$residuals, sswm_2019)
##
## Moran I test under randomisation
##
## data: data_2019$residuals
## weights: sswm_2019
##
## Moran I statistic standard deviate = 0.26652, p-value = 0.3949
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 1.673614e-05 -3.225806e-02 1.466416e-02
# estimate spatial regression model
sar_2020 <- lagsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2020, listw = sswm_2020)
summary(sar_2020)
##
## Call:
## lagsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2020, listw = sswm_2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4974.51 -1523.74 -470.26 1109.87 5242.51
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2314.51351 3206.09602 -0.7219 0.470350
## business_activity 647.21987 530.32661 1.2204 0.222307
## real_ave_month_income 1.33974 0.44714 2.9963 0.002733
## crime_rate -45.90118 17.13856 -2.6782 0.007401
## pop_density 4.15058 0.40436 10.2647 < 2.2e-16
##
## Rho: 0.029182, LR test value: 0.01828, p-value: 0.89245
## Asymptotic standard error: 0.19496
## z-value: 0.14968, p-value: 0.88102
## Wald statistic: 0.022404, p-value: 0.88102
##
## Log likelihood: -293.3914 for lag model
## ML residual variance (sigma squared): 5383600, (sigma: 2320.3)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 600.78, (AIC for lm: 598.8)
## LM test for residual autocorrelation
## test value: 0.14553, p-value: 0.70285
AIC(sar_2020)
## [1] 600.7827
# estimate spatial panel regression model
sem_2020 <- errorsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2020, listw = sswm_2020)
summary(sem_2020)
##
## Call:
## errorsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2020, listw = sswm_2020)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4882.19 -1554.36 -400.52 1107.62 5282.11
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2274.52481 3049.66065 -0.7458 0.455771
## business_activity 673.86865 527.19119 1.2782 0.201170
## real_ave_month_income 1.35874 0.44052 3.0844 0.002040
## crime_rate -45.20006 17.01023 -2.6572 0.007879
## pop_density 4.15116 0.40360 10.2853 < 2.2e-16
##
## Lambda: -0.046286, LR test value: 0.026586, p-value: 0.87048
## Asymptotic standard error: 0.24023
## z-value: -0.19267, p-value: 0.84722
## Wald statistic: 0.037122, p-value: 0.84722
##
## Log likelihood: -293.3872 for error model
## ML residual variance (sigma squared): 5380500, (sigma: 2319.6)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 600.77, (AIC for lm: 598.8)
res <- residuals(sem_2020)
data_2020$residuals <- res
tm_shape(data_2020) +
tm_polygons(col = "residuals", palette="Blues", style="quantile", n=8, title="Residuales") +
tm_layout(main.title= 'Residuales del modelo espacial', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
### Spatial Autocorrelation of Residuals
# Global Moran's I
moran.test(data_2020$residuals, sswm_2020)
##
## Moran I test under randomisation
##
## data: data_2020$residuals
## weights: sswm_2020
##
## Moran I statistic standard deviate = 0.23962, p-value = 0.4053
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.002796386 -0.032258065 0.015116680
# estimate spatial regression model
sar_2021 <- lagsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2021, listw = sswm_2021)
## Warning in lagsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + : inversion of asymptotic covariance matrix failed for tol.solve = 2.22044604925031e-16
## número de condición recíproco = 1.52226e-16 - using numerical Hessian.
summary(sar_2021)
##
## Call:
## lagsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2021, listw = sswm_2021)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7826.6 -2763.1 -1507.6 2228.7 15342.7
##
## Type: lag
## Coefficients: (numerical Hessian approximate standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3623.07805 6979.44494 -0.5191 0.603686
## business_activity 3369.83616 1278.74502 2.6353 0.008407
## real_ave_month_income 2.97432 0.94625 3.1433 0.001671
## crime_rate -21.16383 37.16506 -0.5695 0.569047
## pop_density 5.04674 0.88560 5.6986 1.208e-08
##
## Rho: 0.0012113, LR test value: 3.1465e-05, p-value: 0.99552
## Approximate (numerical Hessian) standard error: 0.022117
## z-value: 0.054767, p-value: 0.95632
## Wald statistic: 0.0029994, p-value: 0.95632
##
## Log likelihood: -319.213 for lag model
## ML residual variance (sigma squared): 27043000, (sigma: 5200.3)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 652.43, (AIC for lm: 650.43)
AIC(sar_2021)
## [1] 652.426
# estimate spatial panel regression model
sem_2021 <- errorsarlm(new_fdi_real_mxn ~ business_activity + real_ave_month_income + crime_rate + pop_density, data = data_2021, listw = sswm_2021)
summary(sem_2021)
##
## Call:
## errorsarlm(formula = new_fdi_real_mxn ~ business_activity + real_ave_month_income +
## crime_rate + pop_density, data = data_2021, listw = sswm_2021)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6726.2 -2928.1 -1522.1 2268.7 15020.8
##
## Type: error
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1863.70485 6983.95250 -0.2669 0.789581
## business_activity 3711.79736 1294.07090 2.8683 0.004127
## real_ave_month_income 2.78731 0.92486 3.0138 0.002580
## crime_rate -22.30403 36.42968 -0.6122 0.540373
## pop_density 4.92393 0.86188 5.7130 1.11e-08
##
## Lambda: -0.1835, LR test value: 0.48067, p-value: 0.48812
## Asymptotic standard error: 0.24432
## z-value: -0.75108, p-value: 0.45261
## Wald statistic: 0.56412, p-value: 0.45261
##
## Log likelihood: -318.9727 for error model
## ML residual variance (sigma squared): 26419000, (sigma: 5140)
## Number of observations: 32
## Number of parameters estimated: 7
## AIC: 651.95, (AIC for lm: 650.43)
res <- residuals(sem_2021)
data_2021$residuals <- res
tm_shape(data_2021) +
tm_polygons(col = "residuals", palette="Blues", style="quantile", n=8, title="Residuales") +
tm_layout(main.title= 'Residuales del modelo espacial', title.position = c('right', 'top'), legend.position= c("left", "bottom"), title.size = 1)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "quantile"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style', 'n', 'palette' (rename to 'values') to
## 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(main.title = )`
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"
## Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
### Spatial Autocorrelation of Residuals
# Global Moran's I
moran.test(data_2021$residuals, sswm_2019)
##
## Moran I test under randomisation
##
## data: data_2021$residuals
## weights: sswm_2019
##
## Moran I statistic standard deviate = 0.21568, p-value = 0.4146
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.006398193 -0.032258065 0.014375528
Conclusiones
Para esta actividad se llevaron a cabo 4 modelos de autocorrelación espacial para los años 2021, 2020, 2019 y 2018, considerando los valores p de cada uno de los modelos y el indicador de Moran podemos determinar que los modelos no son significativos, además, de manera visual no se puede determinar una progresión que permita el uso del modelo para un ejercicio de predicción en base a las variables consideradas. Si se quisiese profundizar en estos modelos serían preciso ubicar variables de mayor impacto dentro de la base para la inversión extrajera directa a través de una posible matriz de correlación, con esta realizar de nuevo el ejercicio ahora con nuevas variables y determinar, en base al valor p e indicador de moran si el nuevo análisis es significativo o no, en conjunto con determinar si existe un fenómeno de autocorrelación espacial y una progresión de esto a través de los periodos de tiempo analizados.
Creditos:
Un especial agradecimiento para :
José Santiago González Padilla | Linkedin
Ximena Solís Islas | Linkedin
Vivian Pérez Mosqueda | Linkedin
Quienes me ayudaron a hacer este proyecto posible y completar de manera exitosa los análisis presentados en el archivo presente.