library(haven)Warning: package 'haven' was built under R version 4.4.3
data <- read_dta("Data1_R.dta")
View(data)##REGRESIÓN MÚLTIPLE CON VARIABLE DISCRETA ORDENADA
Instalar paquetes si es necesario
install.packages(“MASS”) Modelos Probit y Logit Ordenado
install.packages(“VGAM”) Modelos Logit y Probit Ordenado
install.packages(“margins”) Efectos marginales
install.packages(“pROC”) Curva ROC
install.packages(“ggplot2”) Gráficos
install.packages(“nnet”) Matriz de confusión
install.packages(“modelsummary”) Exportar a word
install.packages(“pandoc”) Exportar a word
install.packages(“officer”)
install.packages(“flextable”)
install.packages(“broom”)
install.packages(c(“officer”, “flextable”, “broom”))
Cargar librerías
library(MASS)
library(VGAM)
library(margins)
library(pROC)
library(ggplot2)
library(nnet)
library(haven)
library(modelsummary)
library(VGAM)
library(officer)
library(flextable)
library(broom)
library(Rchoice)
library(“foreign”)
library(“Rchoice”)
library(“msm”)
library(“car”)
setwd(“C:/Users/LENOVO/Desktop/1 JUNIO ANDRE”)
#Leer los datos (ajustar la ruta del archivo)
library(haven)Warning: package 'haven' was built under R version 4.4.3
data <- read_dta("Data1_R.dta")
View(data)#Ver las primeras filas
head(data)# A tibble: 6 × 50
area empleo region edad t_hijos nac_vivo_murieron mortinato_2
<dbl+lbl> <dbl+lbl> <dbl+l> <dbl> <dbl> <dbl+lbl> <dbl+lbl>
1 1 [Urbano] 1 [Trabajó al … 1 [Sie… 19 1 0 [No] 0 [No]
2 1 [Urbano] 0 [No trabajó] 1 [Sie… 23 1 0 [No] 0 [No]
3 1 [Urbano] 1 [Trabajó al … 1 [Sie… 38 5 0 [No] 0 [No]
4 1 [Urbano] 0 [No trabajó] 1 [Sie… 18 1 0 [No] 0 [No]
5 1 [Urbano] 0 [No trabajó] 1 [Sie… 21 1 0 [No] 0 [No]
6 1 [Urbano] 1 [Trabajó al … 1 [Sie… 22 1 0 [No] 0 [No]
# ℹ 43 more variables: depresion_pp <dbl+lbl>, intensidad_dpp <dbl+lbl>,
# etnia <dbl+lbl>, f2_s2_216_1 <dbl+lbl>, f2_s2_216_2 <dbl>,
# f2_s2_218_1_a <dbl+lbl>, tiempo_dpp <dbl+lbl>, f2_s5_504a_1 <dbl+lbl>,
# f2_s5_504b_1 <dbl+lbl>, f2_s5_504c_1 <dbl+lbl>, f2_s5_504d_1 <dbl+lbl>,
# f2_s5_504e_1 <dbl+lbl>, f2_s5_504f_1 <dbl+lbl>, f2_s5_504g_1 <dbl+lbl>,
# f2_s5_504h_1 <dbl+lbl>, f2_s5_504i_1 <dbl+lbl>, f2_s5_504j_1 <dbl+lbl>,
# f2_s5_504k_1 <dbl+lbl>, est_civil <dbl+lbl>, q_usted <dbl+lbl>, …
#Revisar estructura de los datos
str(data)#Ver distribución de la variable dependiente
table(data$intensidad_dpp)
1 2 3
12828 1790 1833
table(data$depresion_dpp)Warning: Unknown or uninitialised column: `depresion_dpp`.
< table of extent 0 >
##Ejemplo 1: Modelo con variable dependiente odenada####
library("Rchoice") Warning: package 'Rchoice' was built under R version 4.4.3
Cargando paquete requerido: Formula
Cargando paquete requerido: maxLik
Warning: package 'maxLik' was built under R version 4.4.3
Cargando paquete requerido: miscTools
Warning: package 'miscTools' was built under R version 4.4.3
Please cite the 'maxLik' package as:
Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
https://r-forge.r-project.org/projects/maxlik/
logitord <- Rchoice(intensidad_dpp ~ lingrl + anios_esc + edad + t_hijos + etnia + area, data = data, na.action = na.omit, family = ordinal('logit'))summary(logitord)
Model: ordinal
Model estimated on: sáb jun 07 15:35:30 2025
Call:
Rchoice(formula = intensidad_dpp ~ lingrl + anios_esc + edad +
t_hijos + etnia + area, data = data, na.action = na.omit,
family = ordinal("logit"), method = "bfgs")
Frequencies of categories:
y
1 2 3
0.7798 0.1088 0.1114
The estimation took: 0h:0m:2s
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
kappa.1 0.822194 0.018816 43.697 < 2e-16 ***
constant -2.352638 0.100572 -23.393 < 2e-16 ***
lingrl 0.001703 0.007114 0.239 0.81077
anios_esc -0.009924 0.004865 -2.040 0.04136 *
edad 0.034483 0.003190 10.810 < 2e-16 ***
t_hijos 0.039105 0.018717 2.089 0.03669 *
etnia 0.344926 0.059842 5.764 8.22e-09 ***
area 0.118337 0.042259 2.800 0.00511 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Optimization of log-likelihood by BFGS maximization
Log Likelihood: -11050
Number of observations: 16451
Number of iterations: 129
Exit of MLE: successful convergence
Análisis:
El ingreso no tiene ninguna influencia sobre la depresión post parto, es decir, no hay evidencia estadística.
Los años de escolaridad cuando aumenta, reduce la probabilidad de sufrir de depresión, no sabemos en que magnitud.
La edad aumenta la probabilidad de sufrir depresión post parto
El número de hijos aumenta la Etnia, las mujeres indígenas sufren mas área rural sufren más
Las mujeres que habitan en zonas rurales tienen mas probabilidades de sufrir depresion post parto.
#Exportar informacion a word
library(modelsummary)Warning: package 'modelsummary' was built under R version 4.4.3
modelsummary(logitord, output = "modelo_logit_ord.docx")library(modelsummary)
modelsummary(logitord,
output = "modelo_logit_ord2.docx",
stars = c('*' = .05, '**' = .01, '***' = .001))library(modelsummary)
modelsummary(logitord,
output = "modelo_logit_ord3.docx",
statistic = c("p.value" = "p = {p.value}{stars}"),
stars = TRUE)library(modelsummary)
modelsummary(logitord,
output = "modelo_logit_ord4.docx",
statistic = "{statistic}",
stars = TRUE)#Criterio de información de Akaike (AIC) y criterio de Información Bayesiano (BIC)
AIC(logitord)[1] 22123.39
BIC(logitord)[1] 22185.05
probitord <- Rchoice(intensidad_dpp ~ lingrl + anios_esc + edad + t_hijos + etnia + area,
data = data,
na.action = na.omit,
family = ordinal('probit'))#Mostrar resumen de los modelos
summary(probitord)
Model: ordinal
Model estimated on: sáb jun 07 15:35:36 2025
Call:
Rchoice(formula = intensidad_dpp ~ lingrl + anios_esc + edad +
t_hijos + etnia + area, data = data, na.action = na.omit,
family = ordinal("probit"), method = "bfgs")
Frequencies of categories:
y
1 2 3
0.7798 0.1088 0.1114
The estimation took: 0h:0m:1s
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
kappa.1 0.454873 0.010062 45.205 < 2e-16 ***
constant -1.404867 0.057158 -24.579 < 2e-16 ***
lingrl 0.001106 0.004054 0.273 0.7850
anios_esc -0.006361 0.002774 -2.293 0.0218 *
edad 0.020295 0.001839 11.037 < 2e-16 ***
t_hijos 0.023193 0.010864 2.135 0.0328 *
etnia 0.196005 0.034697 5.649 1.61e-08 ***
area 0.071133 0.023881 2.979 0.0029 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Optimization of log-likelihood by BFGS maximization
Log Likelihood: -11050
Number of observations: 16451
Number of iterations: 82
Exit of MLE: successful convergence
library(modelsummary)
modelsummary(probitord,
output = "modelo_probitord_ord4.docx",
statistic = "{statistic}",
stars = TRUE)#Comando para exportar los resultados en una tabla
library(modelsummary)
modelsummary(list("Logit" = logitord, "Probit" = probitord),
output = "modelos_logit_probit.docx",
stars = TRUE)AIC(probitord)[1] 22109.98
BIC(probitord)[1] 22171.64
#Comparar AIC y BIC de ambos modelos
aic_logit_ord <- AIC(logitord)
aic_probit_ord <- AIC(probitord)bic_logit_ord <- BIC(logitord)
bic_probit_ord <- BIC(probitord)#Mostrar resultados
cat("AIC Logit Ordenado:", aic_logit_ord, " | AIC Probit Ordenado:", aic_probit_ord, "\n")AIC Logit Ordenado: 22123.39 | AIC Probit Ordenado: 22109.98
cat("BIC Logit Ordenado:", bic_logit_ord, " | BIC Probit Ordenado:", bic_probit_ord, "\n")BIC Logit Ordenado: 22185.05 | BIC Probit Ordenado: 22171.64
#Conclusión:
El modelo con menor AIC/BIC es el preferido, puesto que, mantiene el mejor ajuste. Por lo tanto, el modelo probit ordenado se considera como el mejor en este caso.
###Efectos marginales######
X <- cbind(1, logitord$mf[, -1])coeficientes<- logitord$coefficientscoeficientes_1<- coeficientes[2:8]ai <- crossprod(t(X), coeficientes_1) deriv1<-mean(dnorm(ai))INGRESO
library(msm)Warning: package 'msm' was built under R version 4.4.3
gamma_hatq1_1 <- coeficientes_1["lingrl"]
AME_hat_q1lc_1 <- round(mean(dnorm(ai)) * (gamma_hatq1_1),3);AME_hat_q1lc_1lingrl
0
AME_hat_q1lc_se_1 <- round(deltamethod(~ deriv1 * x1, coef(logitord), vcov(logitord), ses = TRUE),3);AME_hat_q1lc_se_1 [1] 0.003
AME_hat_q1lc_cil_1 <- round(c(AME_hat_q1lc_1 - 1.96 * AME_hat_q1lc_se_1),3);AME_hat_q1lc_cil_1lingrl
-0.006
AME_hat_q1lc_ciu_1 <- round(c(AME_hat_q1lc_1 + 1.96 * AME_hat_q1lc_se_1),3);AME_hat_q1lc_ciu_1lingrl
0.006
AME_hat_q1lc_pr_1 <- round(2 * pnorm(-abs(AME_hat_q1lc_1/AME_hat_q1lc_se_1)),3);AME_hat_q1lc_pr_1lingrl
1
AME_hat_q1lc_star_1<- " "
lingrl_q1<-cbind(AME_hat_q1lc_1,AME_hat_q1lc_star_1, AME_hat_q1lc_se_1,AME_hat_q1lc_pr_1,AME_hat_q1lc_cil_1,AME_hat_q1lc_ciu_1)ESCOLARIDAD
library(msm)
gamma_hatq2_2 <- coeficientes_1["anios_esc"]
AME_hat_q2lc_2 <- round(mean(dnorm(ai)) * (gamma_hatq2_2),3);AME_hat_q2lc_2anios_esc
-0.002
AME_hat_q2lc_se_2 <- round(deltamethod(~ deriv1 * x2, coef(logitord), vcov(logitord), ses = TRUE),3);AME_hat_q2lc_se_2 [1] 0.018
AME_hat_q2lc_cil_2 <- round(c(AME_hat_q2lc_2 - 1.96 * AME_hat_q2lc_se_2),3);AME_hat_q2lc_cil_2anios_esc
-0.037
AME_hat_q2lc_ciu_2 <- round(c(AME_hat_q2lc_2 + 1.96 * AME_hat_q2lc_se_2),3);AME_hat_q2lc_ciu_2anios_esc
0.033
AME_hat_q2lc_pr_2 <- round(2 * pnorm(-abs(AME_hat_q2lc_2/AME_hat_q2lc_se_2)),3);AME_hat_q2lc_pr_2anios_esc
0.912
AME_hat_q2lc_star_2<- " "
anios_esc_q2<-cbind(AME_hat_q2lc_2,AME_hat_q2lc_star_2, AME_hat_q2lc_se_2,AME_hat_q2lc_pr_2,AME_hat_q2lc_cil_2,AME_hat_q2lc_ciu_2)EDAD
library(msm)
gamma_hatq3_3 <- coeficientes_1["edad"]
AME_hat_q3lc_3 <- round(mean(dnorm(ai)) * (gamma_hatq3_3),3);AME_hat_q3lc_3 edad
0.006
AME_hat_q3lc_se_3 <- round(deltamethod(~ deriv1 * x3, coef(logitord), vcov(logitord), ses = TRUE),3);AME_hat_q3lc_se_3 [1] 0.001
AME_hat_q3lc_cil_3 <- round(c(AME_hat_q3lc_3 - 1.96 * AME_hat_q3lc_se_3),3);AME_hat_q3lc_cil_3 edad
0.004
AME_hat_q3lc_ciu_3 <- round(c(AME_hat_q3lc_3 + 1.96 * AME_hat_q3lc_se_3),3);AME_hat_q3lc_ciu_3 edad
0.008
AME_hat_q3lc_pr_3 <- round(2 * pnorm(-abs(AME_hat_q3lc_3/AME_hat_q3lc_se_3)),3);AME_hat_q3lc_pr_3edad
0
AME_hat_q3lc_star_3<- "*** "
edad_q3<-cbind(AME_hat_q3lc_3,AME_hat_q3lc_star_3, AME_hat_q3lc_se_3,AME_hat_q3lc_pr_3,AME_hat_q3lc_cil_3,AME_hat_q3lc_ciu_3)NUMERO DE HIJOS
library(msm)
gamma_hatq4_4 <- coeficientes_1["t_hijos"]
AME_hat_q4lc_4 <- round(mean(dnorm(ai)) * (gamma_hatq4_4),3);AME_hat_q4lc_4t_hijos
0.007
AME_hat_q4lc_se_4 <- round(deltamethod(~ deriv1 * x4, coef(logitord), vcov(logitord), ses = TRUE),3);AME_hat_q4lc_se_4 [1] 0.001
AME_hat_q4lc_cil_4 <- round(c(AME_hat_q4lc_4 - 1.96 * AME_hat_q4lc_se_4),3);AME_hat_q4lc_cil_4t_hijos
0.005
AME_hat_q4lc_ciu_4 <- round(c(AME_hat_q4lc_4 + 1.96 * AME_hat_q4lc_se_4),3);AME_hat_q4lc_ciu_4t_hijos
0.009
AME_hat_q4lc_pr_4 <- round(2 * pnorm(-abs(AME_hat_q4lc_4/AME_hat_q4lc_se_4)),3);AME_hat_q4lc_pr_4t_hijos
0
AME_hat_q4lc_star_4<- "*** "
t_hijos_q4<-cbind(AME_hat_q4lc_4,AME_hat_q4lc_star_4, AME_hat_q4lc_se_4,AME_hat_q4lc_pr_4,AME_hat_q4lc_cil_4,AME_hat_q4lc_ciu_4)ETNIA
library(msm)
gamma_hatq5_5 <- coeficientes_1["etnia"]
AME_hat_q5lc_5 <- round(mean(dnorm(ai)) * (gamma_hatq5_5),3);AME_hat_q5lc_5etnia
0.061
AME_hat_q5lc_se_5 <- round(deltamethod(~ deriv1 * x5, coef(logitord), vcov(logitord), ses = TRUE),3);AME_hat_q5lc_se_5 [1] 0.001
AME_hat_q5lc_cil_5 <- round(c(AME_hat_q5lc_5 - 1.96 * AME_hat_q5lc_se_5),3);AME_hat_q5lc_cil_5etnia
0.059
AME_hat_q5lc_ciu_5 <- round(c(AME_hat_q5lc_5 + 1.96 * AME_hat_q5lc_se_5),3);AME_hat_q5lc_ciu_5etnia
0.063
AME_hat_q5lc_pr_5 <- round(2 * pnorm(-abs(AME_hat_q5lc_5/AME_hat_q5lc_se_5)),3);AME_hat_q5lc_pr_5etnia
0
AME_hat_q5lc_star_5<- "*** "
etnia_q5<-cbind(AME_hat_q5lc_5,AME_hat_q5lc_star_5, AME_hat_q5lc_se_5,AME_hat_q5lc_pr_5,AME_hat_q5lc_cil_5,AME_hat_q5lc_ciu_5)AREA
library(msm)
gamma_hatq6_6 <- coeficientes_1["area"]
AME_hat_q6lc_6 <- round(mean(dnorm(ai)) * (gamma_hatq6_6),3);AME_hat_q6lc_6 area
0.021
AME_hat_q6lc_se_6 <- round(deltamethod(~ deriv1 * x6, coef(logitord), vcov(logitord), ses = TRUE),3);AME_hat_q6lc_se_6 [1] 0.003
AME_hat_q6lc_cil_6 <- round(c(AME_hat_q6lc_6 - 1.96 * AME_hat_q6lc_se_6),3);AME_hat_q6lc_cil_6 area
0.015
AME_hat_q6lc_ciu_6 <- round(c(AME_hat_q6lc_6 + 1.96 * AME_hat_q6lc_se_6),3);AME_hat_q6lc_ciu_6 area
0.027
AME_hat_q6lc_pr_6 <- round(2 * pnorm(-abs(AME_hat_q6lc_6/AME_hat_q6lc_se_6)),3);AME_hat_q6lc_pr_6area
0
AME_hat_q6lc_star_6<- "*** "
area_q6<-cbind(AME_hat_q6lc_6,AME_hat_q6lc_star_6, AME_hat_q6lc_se_6,AME_hat_q6lc_pr_6,AME_hat_q6lc_cil_6,AME_hat_q6lc_ciu_6)resultados_ame<-rbind(lingrl_q1,anios_esc_q2, edad_q3, t_hijos_q4,etnia_q5,area_q6);resultados_ame AME_hat_q1lc_1 AME_hat_q1lc_star_1 AME_hat_q1lc_se_1
lingrl "0" " " "0.003"
anios_esc "-0.002" " " "0.018"
edad "0.006" "*** " "0.001"
t_hijos "0.007" "*** " "0.001"
etnia "0.061" "*** " "0.001"
area "0.021" "*** " "0.003"
AME_hat_q1lc_pr_1 AME_hat_q1lc_cil_1 AME_hat_q1lc_ciu_1
lingrl "1" "-0.006" "0.006"
anios_esc "0.912" "-0.037" "0.033"
edad "0" "0.004" "0.008"
t_hijos "0" "0.005" "0.009"
etnia "0" "0.059" "0.063"
area "0" "0.015" "0.027"
#Exportar efectos marginales a excel
write.csv2(resultados_ame,file="ame_logitord.csv")####Matriz de Confusión y Exactitud del Modelo
# Obtener predicciones del modelo logit ordenado
pred_logit_ord <- crossprod(t(X), coeficientes_1)pred_clases <- apply(pred_logit_ord, 1, which.max)conf_matrix <- table(Predicho = pred_clases, Real = as.numeric(data$intensidad_dpp))print(conf_matrix) Real
Predicho 1 2 3
1 12828 1790 1833
exactitud <- sum(diag(conf_matrix)) / sum(conf_matrix)
cat("Exactitud del modelo Logit Ordenado:", exactitud, "\n")Exactitud del modelo Logit Ordenado: 0.7797702
#Análisis
Por tanto, considerando que el modelo logit ordenado tiene una exactitud del 0,77, para este caso de análisis es un modelo robusto y viable para analizarse.
#ACTIVIDAD DE TAREA
SE DEBE HACER LO MISMO, INCLUYENDO LA EXACTITUD DEL MODELO PROBIT.
X_probit <- cbind(1, probitord$mf[, -1])
coef_probit <- probitord$coefficients
coef_probit_1 <- coef_probit[2:8]
pred_probit_ord <- crossprod(t(X_probit), coef_probit_1)#Convertir las probabilidades en categorias predichas
pred_clases_probit <- apply(pred_probit_ord, 1, which.max)#Crear la matriz de confusion para el modelo Probit
conf_matrix_probit <- table(Predicho = pred_clases_probit, Real = as.numeric(data$intensidad_dpp))#Mostrar la matriz de confusion
print(conf_matrix_probit) Real
Predicho 1 2 3
1 12828 1790 1833
#Calcular la exactitud del modelo Probit
exactitud_probit <- sum(diag(conf_matrix_probit)) / sum(conf_matrix_probit)
cat("Exactitud del modelo Probit Ordenado:", exactitud_probit, "\n")Exactitud del modelo Probit Ordenado: 0.7797702
#Análisis
Por tanto, considerando el modelo Probit ordenado, el cual, tiene una exactitud de 0,77, se lo toma como un modelo robusto, por lo que, existe un 77% de probabilidad de predicción de la intensidad de la depresión post parto.