Exercise

library(rio)
library(dplyr)

dataVotos = import("DataV.csv")
dataE = import("dataEDU.csv")
dataFC = import("Fc.csv")

Finish cleaning the data

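# Merge the data frames one at a time: left join of education and deaths by UBIGEO
temp1 = merge(dataE, dataFC, by='UBIGEO', all.x = TRUE)

# Drop every row that contains an NA, including districts with no match in dataFC
temp1 <- temp1 %>% 
  na.omit()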
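A left join (`all.x = TRUE`) keeps every row of `dataE` and fills in `NA` wherever `dataFC` has no matching `UBIGEO`. Running `na.omit()` right afterwards drops those rows, so the pair behaves like an inner join. It also drops any rows whose `NA`s were already there before the merge. The toy sketch below uses made-up codes and values, and its column names only imitate the real files:

```r
# Toy tables keyed by UBIGEO (codes and values are made up for illustration)
edu <- data.frame(UBIGEO = c("010101", "010102", "010103"),
                  total_lee = c(120, 80, 95))
fal <- data.frame(UBIGEO = c("010101", "010103"),
                  total_fallecidos = c(4, 7))

# Left join: every row of `edu` survives; 010102 gets NA for total_fallecidos
left <- merge(edu, fal, by = "UBIGEO", all.x = TRUE)

# na.omit() removes the unmatched row...
left_clean <- na.omit(left)

# ...leaving the same districts an inner join would keep
inner <- merge(edu, fal, by = "UBIGEO")

print(nrow(left))        # 3 rows before na.omit()
print(nrow(left_clean))  # 2 rows after
print(setequal(left_clean$UBIGEO, inner$UBIGEO))
```

To keep unmatched districts instead, filter with `complete.cases()` on only the columns that must be non-missing.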

# Sum every numeric column by department (anonymous function instead of the deprecated `...` argument of across())
data_dep <- temp1 %>%
  group_by(DEPARTAMENTO) %>%
  summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE)), .groups = "drop")

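# Drop row 19 by position; check which department it is first, because positional indexing breaks if the row order changes
data_dep <- data_dep[-19, ]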

Clean the vote data

# Preliminary merge; it is overwritten by the final merge once the vote data are cleaned
dataFinal = merge(data_dep, dataVotos, by='DEPARTAMENTO', all.x = TRUE)

str(dataVotos)

## 'data.frame':    25 obs. of  5 variables:
##  $ Castillo     : chr  "34 464" "110 620" "88 812" "256 224" ...
##  $ Fujimori     : chr  "17 815" "67 394" "10 879" "40 216" ...
##  $ Participación: chr  "184 057" "613 850" "219 260" "902 243" ...
##  $ Electores    : chr  "306 186" "886 265" "316 000" "1 145 268" ...
##  $ DEPARTAMENTO : chr  "AMAZONAS" "ANCASH" "APURIMAC" "AREQUIPA" ...

dataVotos <- dataVotos %>%
  mutate(across(
    .cols = -DEPARTAMENTO,  # apply to every column except DEPARTAMENTO
    .fns = ~ as.numeric(gsub(" ", "", .x))  # remove the spaces used as thousands separators
  ))

## Warning: There were 4 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(...)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.
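The four "NAs introduced by coercion" warnings mean that some values still held non-digit characters after the ordinary spaces were removed. One plausible cause is a non-breaking space (U+00A0) used as the thousands separator, because `gsub(" ", "", x)` removes only the ordinary space. This has not been checked against `DataV.csv`. A sketch with made-up values shows that stripping every non-digit avoids the problem:

```r
# Made-up counts; the second value uses non-breaking spaces (U+00A0)
# as thousands separators, which is an assumption for illustration
raw <- c("34 464", "8\u00a0322\u00a0644", "1 145 268")

# Removing only ordinary spaces leaves U+00A0 in place, so that value becomes NA
naive <- suppressWarnings(as.numeric(gsub(" ", "", raw)))

# Removing every character that is not a digit works for any separator
robust <- as.numeric(gsub("[^0-9]", "", raw))

print(naive)   # second element is NA
print(robust)
```

In the pipeline above this would be `.fns = ~ as.numeric(gsub("[^0-9]", "", .x))`. It assumes the columns hold only non-negative whole numbers, since it would also strip a minus sign or a decimal point.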

# Fill in row 15 by hand (presumably the row that became NA in the step above)
dataVotos$Electores[15] <- 8322644
dataVotos$Participación[15] <- 6206220
dataVotos$Fujimori[15] <- 754216
dataVotos$Castillo[15] <- 416743

Merge the two Limas (LIMA and LIMA METROPOLITANA)

data_dep$DEPARTAMENTO[data_dep$DEPARTAMENTO == "LIMA METROPOLITANA"] <- "LIMA"

# Re-aggregate so the two Lima rows are summed into one
data_unida <- data_dep %>%
  group_by(DEPARTAMENTO) %>%
  summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  ungroup()
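Recoding "LIMA METROPOLITANA" to "LIMA" before grouping puts both rows in the same group, so the summation adds them together. The sketch below is a base-R equivalent that uses `aggregate()` instead of dplyr, with made-up numbers:

```r
# Made-up department-level counts
dep <- data.frame(
  DEPARTAMENTO = c("ICA", "LIMA", "LIMA METROPOLITANA"),
  total_lee    = c(50, 100, 400),
  total_no_lee = c(5, 10, 20)
)

# Give both Lima rows the same label...
dep$DEPARTAMENTO[dep$DEPARTAMENTO == "LIMA METROPOLITANA"] <- "LIMA"

# ...then sum the count columns within each label
unida <- aggregate(cbind(total_lee, total_no_lee) ~ DEPARTAMENTO,
                   data = dep, FUN = sum)
print(unida)
```

Summing is correct here because every column is a count. A column of rates or averages would need a weighted average instead.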

Final data

# Final merge: cleaned votes joined onto the department-level data
dataFinal = merge(data_unida, dataVotos, by='DEPARTAMENTO', all.x = TRUE)
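This is a left join on department names, so a name spelled differently in the two tables (an accent or an extra space) silently gets `NA` votes instead of raising an error. A quick check is to list the keys that have no partner. It is sketched here with made-up tables in which "ANCASH" and "ÁNCASH" differ only by an accent:

```r
# Made-up tables; the accent in "ÁNCASH" is a deliberate mismatch
educ  <- data.frame(DEPARTAMENTO = c("AMAZONAS", "ANCASH", "LIMA"),
                    total_lee = c(1, 2, 3))
votos <- data.frame(DEPARTAMENTO = c("AMAZONAS", "\u00c1NCASH", "LIMA"),
                    Castillo = c(10, 20, 30))

# Departments in the left table with no match in the right table
unmatched <- setdiff(educ$DEPARTAMENTO, votos$DEPARTAMENTO)
print(unmatched)

# The left join keeps that department but gives it NA votes
joined <- merge(educ, votos, by = "DEPARTAMENTO", all.x = TRUE)
print(sum(is.na(joined$Castillo)))
```

On the real data, `setdiff(data_unida$DEPARTAMENTO, dataVotos$DEPARTAMENTO)` should return `character(0)` if every department matched.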

Poisson regression

library(modelsummary)

h1 = formula(Castillo ~ total_no_lee + total_lee + total_superior + total_no_superior + total_fallecidos)

rp1 = glm(h1, data = dataFinal, 
        offset = log(Electores), # exposure: votes are modelled per registered voter
        family = poisson(link = "log"))
summary(rp1)

## 
## Call:
## glm(formula = h1, family = poisson(link = "log"), data = dataFinal, 
##     offset = log(Electores))
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -6.208e-01  3.612e-03  -171.9   <2e-16 ***
## total_no_lee       4.601e-03  1.003e-05   458.8   <2e-16 ***
## total_lee         -1.582e-03  9.033e-06  -175.1   <2e-16 ***
## total_superior     1.638e-03  7.189e-06   227.9   <2e-16 ***
## total_no_superior -1.101e-03  3.241e-06  -339.9   <2e-16 ***
## total_fallecidos   4.900e-05  2.221e-07   220.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1375302  on 24  degrees of freedom
## Residual deviance:  470039  on 19  degrees of freedom
## AIC: 470377
## 
## Number of Fisher Scoring iterations: 5
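The offset `log(Electores)` enters the linear predictor with its coefficient fixed at 1. The model is therefore log(E[votes]) = Xβ + log(Electores), which is the same as modelling log(votes / Electores), so the coefficients describe the vote rate per registered voter. The small simulation below uses made-up roll sizes and a single hypothetical predictor `x`, and shows that `glm()` with an offset recovers the rate parameters:

```r
set.seed(42)
n <- 200
electores <- round(runif(n, 1e4, 1e5))  # made-up electoral roll sizes
x <- rnorm(n)                            # hypothetical predictor

# True model for the rate: log(votes / electores) = -1 + 0.3 * x
votes <- rpois(n, lambda = electores * exp(-1 + 0.3 * x))

fit <- glm(votes ~ x, offset = log(electores),
           family = poisson(link = "log"))

print(round(coef(fit), 3))    # close to -1 and 0.3
print(exp(coef(fit)[["x"]]))  # rate ratio per unit of x, close to exp(0.3)
```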

Exponentiate the coefficients to interpret them as rate ratios

cbind(exp(coef(rp1)),exp(confint(rp1)))

## Waiting for profiling to be done...

##                                 2.5 %    97.5 %
## (Intercept)       0.5374921 0.5337008 0.5413104
## total_no_lee      1.0046118 1.0045921 1.0046316
## total_lee         0.9984197 0.9984020 0.9984374
## total_superior    1.0016398 1.0016257 1.0016539
## total_no_superior 0.9988991 0.9988928 0.9989055
## total_fallecidos  1.0000490 1.0000486 1.0000494

First conclusions (Poisson model):

total_no_lee -> each additional person who cannot read → Castillo's vote rate (votes per registered voter) rises by 0.46%
total_lee -> each additional person who can read → the vote rate falls by 0.16%
total_superior -> each additional person with higher education → the vote rate rises by 0.16%
total_no_superior -> each additional person without higher education → the vote rate falls by 0.11%
total_fallecidos -> each additional death → the vote rate rises slightly (0.005%)
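Each percentage above is exp(β) - 1. Because the model is multiplicative, the effect of k extra people is exp(kβ), not k times the one-person percentage. The short arithmetic check below uses the `total_no_lee` estimate copied from `summary(rp1)`:

```r
b_no_lee <- 4.601e-03   # Poisson estimate for total_no_lee, from summary(rp1)

# Percent change in the vote rate for one additional person
pct_1 <- (exp(b_no_lee) - 1) * 100
# Percent change for 100 additional people: effects multiply, they do not add
pct_100 <- (exp(100 * b_no_lee) - 1) * 100

print(round(pct_1, 2))    # about 0.46
print(round(pct_100, 1))  # about 58.4, not 100 * 0.46 = 46
```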

Illiteracy (being unable to read) is positively associated with support for Castillo.

Being able to read or having more schooling is not directly associated with more votes for Castillo.

Dispersion

# Over- and underdispersion: under → quasipoisson; over → quasipoisson and negative binomial

library(magrittr)
library(kableExtra)

overdispersion = AER::dispersiontest(rp1, alternative = 'greater')$p.value < 0.05
underdispersion = AER::dispersiontest(rp1, alternative = 'less')$p.value < 0.05
# table
testResult = as.data.frame(rbind(overdispersion, underdispersion))
names(testResult) = 'Likely?'
testResult %>% kable(caption = "Equidispersion test") %>% kableExtra::kable_styling()

Equidispersion test
                 Likely?
overdispersion   TRUE
underdispersion  FALSE
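The Pearson ratio is a quick complement to `AER::dispersiontest()`. It is a simpler diagnostic, not the same test: the sum of squared Pearson residuals divided by the residual degrees of freedom. It should be near 1 under a Poisson model. Up to numerical convergence error, it is also the dispersion that a quasipoisson fit reports. The sketch below runs on simulated, deliberately overdispersed counts, with all values made up:

```r
set.seed(1)
n <- 300
electores <- round(runif(n, 1e4, 1e5))   # made-up roll sizes
x <- rnorm(n)                             # hypothetical predictor
mu <- electores * exp(-1 + 0.3 * x)

# Negative binomial draws (theta = 4) are overdispersed relative to Poisson
votes <- rnbinom(n, size = 4, mu = mu)

pois <- glm(votes ~ x, offset = log(electores), family = poisson)
phi <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)
print(phi)  # far above 1: overdispersion

# The quasipoisson fit estimates essentially the same dispersion
qp <- glm(votes ~ x, offset = log(electores), family = quasipoisson)
print(summary(qp)$dispersion)
```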

Quasipoisson regression

# Quasipoisson regression

rqp = glm(h1, data = dataFinal,
          offset=log(Electores),
          family = quasipoisson(link = "log"))
summary(rqp)

## 
## Call:
## glm(formula = h1, family = quasipoisson(link = "log"), data = dataFinal, 
##     offset = log(Electores))
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)   
## (Intercept)       -6.208e-01  5.613e-01  -1.106  0.28249   
## total_no_lee       4.601e-03  1.559e-03   2.952  0.00818 **
## total_lee         -1.582e-03  1.404e-03  -1.127  0.27392   
## total_superior     1.638e-03  1.117e-03   1.467  0.15885   
## total_no_superior -1.101e-03  5.037e-04  -2.187  0.04147 * 
## total_fallecidos   4.900e-05  3.452e-05   1.419  0.17197   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 24152.18)
## 
##     Null deviance: 1375302  on 24  degrees of freedom
## Residual deviance:  470039  on 19  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

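So:

total_no_lee (people who cannot read) is the only variable significant at the 1% level (p = 0.008). total_no_superior is significant only at the 5% level (p = 0.041), and the other predictors are not significant.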
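The quasipoisson coefficients are identical to the Poisson ones. Only the standard errors change: they are multiplied by the square root of the estimated dispersion. With a dispersion of 24152.18, every Poisson standard error grows by a factor of about 155, which is why most predictors lose significance. The arithmetic below reproduces a few values from the two summaries:

```r
phi <- 24152.18   # dispersion reported by summary(rqp)

# Poisson standard errors copied from summary(rp1)
se_pois <- c(intercept = 3.612e-03,
             total_no_lee = 1.003e-05,
             total_no_superior = 3.241e-06)

# Quasipoisson SE = Poisson SE * sqrt(dispersion)
se_quasi <- se_pois * sqrt(phi)
print(signif(se_quasi, 4))  # compare with 5.613e-01, 1.559e-03, 5.037e-04 in summary(rqp)

# The test statistic shrinks by the same factor: z = 458.8 becomes t of about 2.952
t_no_lee <- 458.8 / sqrt(phi)
print(round(t_no_lee, 3))
```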

Negative binomial regression:

# Negative binomial regression, with the offset written inside the formula
h2off=formula(Castillo ~ total_no_lee + total_lee + total_superior + total_no_superior + total_fallecidos + offset(log(Electores)))

rbn=MASS::glm.nb(h2off,data=dataFinal)
summary(rbn)

## 
## Call:
## MASS::glm.nb(formula = h2off, data = dataFinal, init.theta = 3.878645528, 
##     link = log)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       -6.071e-01  5.022e-01  -1.209   0.2267  
## total_no_lee       2.977e-03  1.470e-03   2.024   0.0429 *
## total_lee         -1.021e-03  1.436e-03  -0.711   0.4769  
## total_superior     1.250e-03  1.110e-03   1.127   0.2599  
## total_no_superior -8.778e-04  5.028e-04  -1.746   0.0808 .
## total_fallecidos   3.877e-05  3.234e-05   1.199   0.2306  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(3.8786) family taken to be 1)
## 
##     Null deviance: 42.866  on 24  degrees of freedom
## Residual deviance: 26.068  on 19  degrees of freedom
## AIC: 612.56
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  3.88 
##           Std. Err.:  1.05 
## 
##  2 x log-likelihood:  -598.562
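The negative binomial model adds a parameter θ (printed as `Theta`) so that Var(Y) = μ + μ²/θ. The variance therefore grows with the square of the mean instead of equalling it. The sketch below computes that variance ratio for the estimated θ = 3.88. It then checks on simulated data (made-up sizes, one hypothetical predictor `x`, true θ = 4) that `MASS::glm.nb()` with the offset inside the formula recovers θ and the slope:

```r
# Variance of a negative binomial count with mean mu and dispersion theta
nb_var <- function(mu, theta) mu + mu^2 / theta

# Ratio to the Poisson variance (which equals mu) at mu = 100,000, theta = 3.88
print(nb_var(1e5, 3.88) / 1e5)

# Simulated check: made-up sizes, one hypothetical predictor x, true theta = 4
set.seed(7)
n <- 2000
electores <- round(runif(n, 1e4, 1e5))
x <- rnorm(n)
votes <- rnbinom(n, size = 4, mu = electores * exp(-1 + 0.3 * x))
d <- data.frame(votes, x, electores)

nb <- MASS::glm.nb(votes ~ x + offset(log(electores)), data = d)
print(nb$theta)         # close to 4
print(coef(nb)[["x"]])  # close to 0.3
```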

Conclusion:

To analyze the factors associated with the number of votes received by Pedro Castillo in the 2021 general election, three count-data regression models were estimated: Poisson, quasipoisson and negative binomial. The dependent variable was the number of votes for Castillo in each department. The log of the total number of registered voters was included as an offset to control for the size of the electoral roll.

The independent variables were:

Total number of people who cannot read

Total number of people who can read

Total number of people with higher education

Total number of people without higher education

Total number of deaths during the pandemic

In all three models, the total number of people who cannot read was the most significant and most consistent predictor, with a positive association with votes for Castillo. In the negative binomial model it is significant only at the 5% level (p = 0.043).
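A likelihood ratio test is a common way to compare the Poisson and negative binomial fits, although it is not part of the original analysis. The p-value is halved because the Poisson model lies on the boundary of the parameter space (θ → ∞). The reported AICs already point the same way: 470377 for Poisson against 612.56 for the negative binomial. The sketch below runs on simulated data with made-up values:

```r
set.seed(11)
n <- 500
electores <- round(runif(n, 1e4, 1e5))   # made-up roll sizes
x <- rnorm(n)                             # hypothetical predictor
votes <- rnbinom(n, size = 4, mu = electores * exp(-1 + 0.3 * x))
d <- data.frame(votes, x, electores)

pois <- glm(votes ~ x + offset(log(electores)), family = poisson, data = d)
nb   <- MASS::glm.nb(votes ~ x + offset(log(electores)), data = d)

print(AIC(pois, nb))

# Likelihood ratio statistic: the NB model has one extra parameter (theta)
lr <- 2 * (as.numeric(logLik(nb)) - as.numeric(logLik(pois)))
# Halve the chi-squared(1) p-value because theta is tested on a boundary
p_value <- pchisq(lr, df = 1, lower.tail = FALSE) / 2
print(c(lr = lr, p = p_value))
```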

Sources:

Deaths: https://www.datosabiertos.gob.pe/dataset/fallecidos-por-covid-19-ministerio-de-salud-minsa/resource/4b7636f3-5f0c-4404-8526

Education (ENAHO): https://proyectos.inei.gob.pe/microdatos/

Votes (Wikipedia):

Exercise 11-04, part 2

Yhara

2025-04-11