Tarea

#Cargar librerías
library(MASS)
library(VGAM)

## Loading required package: stats4

## Loading required package: splines

library(margins)
library(pROC)

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

library(ggplot2)
library(nnet)
library(haven)
library(modelsummary)

## 
## Attaching package: 'modelsummary'

## The following object is masked from 'package:VGAM':
## 
##     Max

library(VGAM)
library(officer)
library(flextable)
library(broom)
library(Rchoice)

## Loading required package: Formula

## Loading required package: maxLik

## Loading required package: miscTools

## 
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
## 
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/

library("foreign")
library("Rchoice")
library("msm")
library("car")

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:VGAM':
## 
##     logit

setwd("/cloud/project")
data <- read_dta("Data1_R.dta")

Ver distribución de la variable dependiente

##Ejemplo 1: Modelo con variable dependiente odenada###

summary(logitord)

## 
## Model: ordinal
## Model estimated on: Fri Jun 20 20:37:50 2025 
## 
## Call:
## Rchoice(formula = intensidad_dpp ~ lingrl + anios_esc + edad + 
##     t_hijos + etnia + area, data = data, na.action = na.omit, 
##     family = ordinal("logit"), method = "bfgs")
## 
## 
## Frequencies of categories:
## y
##      0      1      2 
## 0.7798 0.1088 0.1114 
## The estimation took: 0h:0m:20s 
## 
## Coefficients:
##            Estimate Std. Error z-value Pr(>|z|)    
## kappa.1    0.822194   0.018816  43.697  < 2e-16 ***
## constant  -2.352638   0.100572 -23.393  < 2e-16 ***
## lingrl     0.001703   0.007114   0.239  0.81077    
## anios_esc -0.009924   0.004865  -2.040  0.04136 *  
## edad       0.034483   0.003190  10.810  < 2e-16 ***
## t_hijos    0.039105   0.018717   2.089  0.03669 *  
## etnia      0.344926   0.059842   5.764 8.22e-09 ***
## area       0.118337   0.042259   2.800  0.00511 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Optimization of log-likelihood by BFGS maximization
## Log Likelihood: -11050
## Number of observations: 16451
## Number of iterations: 129
## Exit of MLE: successful convergence

El ingreso no tiene ninguna influencia sobre la depresión post parto, no hay evidencia estadística. Los años de escolaridad cuando aumenta, reduce la probabilidad de sufrir de depresión, no sabemos en que magnitud.La edad aumenta la probabilidad de sufrir depresión post parto.

library(modelsummary)
modelsummary(logitord, output = "modelo_logit_ord.docx")

library(modelsummary)
modelsummary(logitord, 
             output = "modelo_logit_ord2.docx", 
             stars = c('*' = .05, '**' = .01, '***' = .001))

modelsummary(logitord,
             output = "modelo_logit_ord3.docx",
             statistic = c("p.value" = "p = {p.value}{stars}"),
             stars = TRUE)

library(modelsummary)
modelsummary(logitord,
             output = "modelo_logit_ord4.docx",
             statistic = "{statistic}",
             stars = TRUE)

RMSE significa Root Mean Squared Error (Raíz del Error Cuadrático Medio). Es una métrica común para evaluar la precisión de modelos de predicción, especialmente en regresión. Más bajo = mejor. Un RMSE de 0 indica predicciones perfectas. Este es más usual en casos de variables cuantitativas, para determinar la precisión lo hacemos con la matriz de confusión.

Criterio de información de Akaike (AIC) y criterio de Información Bayesiano (BIC)

AIC(logitord)

## [1] 22123.39

BIC(logitord)

## [1] 22185.05

Ajustar modelo Probit Ordenado

probitord <- Rchoice(intensidad_dpp ~ lingrl + anios_esc + edad + t_hijos + etnia + area, 
                    data = data, 
                    na.action = na.omit, 
                    family = ordinal('probit'))

summary(probitord)

## 
## Model: ordinal
## Model estimated on: Fri Jun 20 20:38:11 2025 
## 
## Call:
## Rchoice(formula = intensidad_dpp ~ lingrl + anios_esc + edad + 
##     t_hijos + etnia + area, data = data, na.action = na.omit, 
##     family = ordinal("probit"), method = "bfgs")
## 
## 
## Frequencies of categories:
## y
##      0      1      2 
## 0.7798 0.1088 0.1114 
## The estimation took: 0h:0m:15s 
## 
## Coefficients:
##            Estimate Std. Error z-value Pr(>|z|)    
## kappa.1    0.454873   0.010062  45.205  < 2e-16 ***
## constant  -1.404867   0.057158 -24.579  < 2e-16 ***
## lingrl     0.001106   0.004054   0.273   0.7850    
## anios_esc -0.006361   0.002774  -2.293   0.0218 *  
## edad       0.020295   0.001839  11.037  < 2e-16 ***
## t_hijos    0.023193   0.010864   2.135   0.0328 *  
## etnia      0.196005   0.034697   5.649 1.61e-08 ***
## area       0.071133   0.023881   2.979   0.0029 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Optimization of log-likelihood by BFGS maximization
## Log Likelihood: -11050
## Number of observations: 16451
## Number of iterations: 82
## Exit of MLE: successful convergence

modelsummary(probitord,
             output = "modelo_probitord_ord4.docx",
             statistic = "{statistic}",
             stars = TRUE)

AIC(probitord)

## [1] 22109.98

BIC(probitord)

## [1] 22171.64

El modelo probit ordenado presenta un mejor ajuste que el logit ordenado, según los criterios de información AIC y BIC. El AIC del modelo probit es 22109.98, mientras que el del logit es 22123.39; de igual forma, el BIC del probit es 22171.64 frente a 22185.05 del logit. Estas diferencias, superiores a 10 puntos en ambos casos, indican una mejora sustancial en el ajuste del modelo probit, lo que sugiere que este capta de manera más eficiente la estructura de los datos bajo los mismos predictores.

aic_logit_ord <- AIC(logitord)
aic_probit_ord <- AIC(probitord)

bic_logit_ord <- BIC(logitord)
bic_probit_ord <- BIC(probitord)

cat("AIC Logit Ordenado:", aic_logit_ord, " | AIC Probit Ordenado:", aic_probit_ord, "\n")

## AIC Logit Ordenado: 22123.39  | AIC Probit Ordenado: 22109.98

bic_probit_ord <- BIC(probitord)
cat("BIC Logit Ordenado:", bic_logit_ord, " | BIC Probit Ordenado:", bic_probit_ord, "\n")

## BIC Logit Ordenado: 22185.05  | BIC Probit Ordenado: 22171.64

Efectos Marginales

X <- cbind(1, logitord$mf[, -1])

coeficientes<- logitord$coefficients

coeficientes_1<- coeficientes[2:8]

ai <- crossprod(t(X), coeficientes_1) 

deriv1<-mean(dnorm(ai))

#INGRESO
library(msm)
gamma_hatq1_1 <- coeficientes_1["lingrl"]
AME_hat_q1lc_1   <- round(mean(dnorm(ai)) * (gamma_hatq1_1),3);AME_hat_q1lc_1

## lingrl 
##      0

AME_hat_q1lc_se_1 <-  round(deltamethod(~ deriv1 * x1, coef(logitord), vcov(logitord), ses =  TRUE),3);AME_hat_q1lc_se_1

## [1] 0.003

AME_hat_q1lc_cil_1 <-  round(c(AME_hat_q1lc_1 - 1.96 * AME_hat_q1lc_se_1),3);AME_hat_q1lc_cil_1

## lingrl 
## -0.006

AME_hat_q1lc_ciu_1 <-  round(c(AME_hat_q1lc_1 + 1.96 * AME_hat_q1lc_se_1),3);AME_hat_q1lc_ciu_1

## lingrl 
##  0.006

AME_hat_q1lc_pr_1 <-  round(2 * pnorm(-abs(AME_hat_q1lc_1/AME_hat_q1lc_se_1)),3);AME_hat_q1lc_pr_1

## lingrl 
##      1

AME_hat_q1lc_star_1<- " "  
lingrl_q1<-cbind(AME_hat_q1lc_1,AME_hat_q1lc_star_1, AME_hat_q1lc_se_1,AME_hat_q1lc_pr_1,AME_hat_q1lc_cil_1,AME_hat_q1lc_ciu_1)

#ESCOLARIDAD
library(msm)
gamma_hatq2_2 <- coeficientes_1["anios_esc"]
AME_hat_q2lc_2   <- round(mean(dnorm(ai)) * (gamma_hatq2_2),3);AME_hat_q2lc_2

## anios_esc 
##    -0.002

AME_hat_q2lc_se_2 <-  round(deltamethod(~ deriv1 * x2, coef(logitord), vcov(logitord), ses =  TRUE),3);AME_hat_q2lc_se_2

## [1] 0.018

AME_hat_q2lc_cil_2 <-  round(c(AME_hat_q2lc_2 - 1.96 * AME_hat_q2lc_se_2),3);AME_hat_q2lc_cil_2

## anios_esc 
##    -0.037

AME_hat_q2lc_ciu_2 <-  round(c(AME_hat_q2lc_2 + 1.96 * AME_hat_q2lc_se_2),3);AME_hat_q2lc_ciu_2

## anios_esc 
##     0.033

AME_hat_q2lc_pr_2 <-  round(2 * pnorm(-abs(AME_hat_q2lc_2/AME_hat_q2lc_se_2)),3);AME_hat_q2lc_pr_2

## anios_esc 
##     0.912

AME_hat_q2lc_star_2<- " "  
anios_esc_q2<-cbind(AME_hat_q2lc_2,AME_hat_q2lc_star_2, AME_hat_q2lc_se_2,AME_hat_q2lc_pr_2,AME_hat_q2lc_cil_2,AME_hat_q2lc_ciu_2)

#EDAD
library(msm)
gamma_hatq3_3 <- coeficientes_1["edad"]
AME_hat_q3lc_3   <- round(mean(dnorm(ai)) * (gamma_hatq3_3),3);AME_hat_q3lc_3

##  edad 
## 0.006

AME_hat_q3lc_se_3 <-  round(deltamethod(~ deriv1 * x3, coef(logitord), vcov(logitord), ses =  TRUE),3);AME_hat_q3lc_se_3

## [1] 0.001

AME_hat_q3lc_cil_3 <-  round(c(AME_hat_q3lc_3 - 1.96 * AME_hat_q3lc_se_3),3);AME_hat_q3lc_cil_3

##  edad 
## 0.004

AME_hat_q3lc_ciu_3 <-  round(c(AME_hat_q3lc_3 + 1.96 * AME_hat_q3lc_se_3),3);AME_hat_q3lc_ciu_3

##  edad 
## 0.008

AME_hat_q3lc_pr_3 <-  round(2 * pnorm(-abs(AME_hat_q3lc_3/AME_hat_q3lc_se_3)),3);AME_hat_q3lc_pr_3

## edad 
##    0

AME_hat_q3lc_star_3<- "*** "  
edad_q3<-cbind(AME_hat_q3lc_3,AME_hat_q3lc_star_3, AME_hat_q3lc_se_3,AME_hat_q3lc_pr_3,AME_hat_q3lc_cil_3,AME_hat_q3lc_ciu_3)

#NUMERO DE HIJOS 
library(msm)
gamma_hatq4_4 <- coeficientes_1["t_hijos"]
AME_hat_q4lc_4   <- round(mean(dnorm(ai)) * (gamma_hatq4_4),3);AME_hat_q4lc_4

## t_hijos 
##   0.007

AME_hat_q4lc_se_4 <-  round(deltamethod(~ deriv1 * x4, coef(logitord), vcov(logitord), ses =  TRUE),3);AME_hat_q4lc_se_4

## [1] 0.001

AME_hat_q4lc_cil_4 <-  round(c(AME_hat_q4lc_4 - 1.96 * AME_hat_q4lc_se_4),3);AME_hat_q4lc_cil_4

## t_hijos 
##   0.005

AME_hat_q4lc_ciu_4 <-  round(c(AME_hat_q4lc_4 + 1.96 * AME_hat_q4lc_se_4),3);AME_hat_q4lc_ciu_4

## t_hijos 
##   0.009

AME_hat_q4lc_pr_4 <-  round(2 * pnorm(-abs(AME_hat_q4lc_4/AME_hat_q4lc_se_4)),3);AME_hat_q4lc_pr_4

## t_hijos 
##       0

AME_hat_q4lc_star_4<- "*** "  
t_hijos_q4<-cbind(AME_hat_q4lc_4,AME_hat_q4lc_star_4, AME_hat_q4lc_se_4,AME_hat_q4lc_pr_4,AME_hat_q4lc_cil_4,AME_hat_q4lc_ciu_4)

#ETNIA
library(msm)
gamma_hatq5_5 <- coeficientes_1["etnia"]
AME_hat_q5lc_5   <- round(mean(dnorm(ai)) * (gamma_hatq5_5),3);AME_hat_q5lc_5

## etnia 
## 0.061

AME_hat_q5lc_se_5 <-  round(deltamethod(~ deriv1 * x5, coef(logitord), vcov(logitord), ses =  TRUE),3);AME_hat_q5lc_se_5

## [1] 0.001

AME_hat_q5lc_cil_5 <-  round(c(AME_hat_q5lc_5 - 1.96 * AME_hat_q5lc_se_5),3);AME_hat_q5lc_cil_5

## etnia 
## 0.059

AME_hat_q5lc_ciu_5 <-  round(c(AME_hat_q5lc_5 + 1.96 * AME_hat_q5lc_se_5),3);AME_hat_q5lc_ciu_5

## etnia 
## 0.063

AME_hat_q5lc_pr_5 <-  round(2 * pnorm(-abs(AME_hat_q5lc_5/AME_hat_q5lc_se_5)),3);AME_hat_q5lc_pr_5

## etnia 
##     0

AME_hat_q5lc_star_5<- "*** "  
etnia_q5<-cbind(AME_hat_q5lc_5,AME_hat_q5lc_star_5, AME_hat_q5lc_se_5,AME_hat_q5lc_pr_5,AME_hat_q5lc_cil_5,AME_hat_q5lc_ciu_5)

#AREA
library(msm)
gamma_hatq6_6 <- coeficientes_1["area"]
AME_hat_q6lc_6   <- round(mean(dnorm(ai)) * (gamma_hatq6_6),3);AME_hat_q6lc_6

##  area 
## 0.021

AME_hat_q6lc_se_6 <-  round(deltamethod(~ deriv1 * x6, coef(logitord), vcov(logitord), ses =  TRUE),3);AME_hat_q6lc_se_6

## [1] 0.003

AME_hat_q6lc_cil_6 <-  round(c(AME_hat_q6lc_6 - 1.96 * AME_hat_q6lc_se_6),3);AME_hat_q6lc_cil_6

##  area 
## 0.015

AME_hat_q6lc_ciu_6 <-  round(c(AME_hat_q6lc_6 + 1.96 * AME_hat_q6lc_se_6),3);AME_hat_q6lc_ciu_6

##  area 
## 0.027

AME_hat_q6lc_pr_6 <-  round(2 * pnorm(-abs(AME_hat_q6lc_6/AME_hat_q6lc_se_6)),3);AME_hat_q6lc_pr_6

## area 
##    0

AME_hat_q6lc_star_6<- "*** "  
area_q6<-cbind(AME_hat_q6lc_6,AME_hat_q6lc_star_6, AME_hat_q6lc_se_6,AME_hat_q6lc_pr_6,AME_hat_q6lc_cil_6,AME_hat_q6lc_ciu_6)

resultados_ame<-rbind(lingrl_q1,anios_esc_q2, edad_q3, t_hijos_q4,etnia_q5,area_q6);resultados_ame

##           AME_hat_q1lc_1 AME_hat_q1lc_star_1 AME_hat_q1lc_se_1
## lingrl    "0"            " "                 "0.003"          
## anios_esc "-0.002"       " "                 "0.018"          
## edad      "0.006"        "*** "              "0.001"          
## t_hijos   "0.007"        "*** "              "0.001"          
## etnia     "0.061"        "*** "              "0.001"          
## area      "0.021"        "*** "              "0.003"          
##           AME_hat_q1lc_pr_1 AME_hat_q1lc_cil_1 AME_hat_q1lc_ciu_1
## lingrl    "1"               "-0.006"           "0.006"           
## anios_esc "0.912"           "-0.037"           "0.033"           
## edad      "0"               "0.004"            "0.008"           
## t_hijos   "0"               "0.005"            "0.009"           
## etnia     "0"               "0.059"            "0.063"           
## area      "0"               "0.015"            "0.027"

MATRIZ DE CONFUCIÓN

Logit

pred_logit_ord <- crossprod(t(X), coeficientes_1)

pred_clases <- apply(pred_logit_ord, 1, which.max)

conf_matrix <- table(Predicho = pred_clases, Real = as.numeric(data$intensidad_dpp))

print(conf_matrix)

##         Real
## Predicho     0     1     2
##        1 12828  1790  1833

exactitud <- sum(diag(conf_matrix)) / sum(conf_matrix)
cat("Exactitud del modelo Logit Ordenado:", exactitud, "\n")

## Exactitud del modelo Logit Ordenado: 0.7797702

Una exactitud del 77.98 % indica que el modelo logit ordenado clasifica correctamente casi 8 de cada 10 observaciones en tu variable dependiente intensidad_dpp. Esto puede considerarse una capacidad predictiva aceptable o moderadamente buena, dependiendo del contexto del estudio y del equilibrio entre clases.

Probit

pred_probit_ord <- crossprod(t(X), coeficientes_1)

pred_clases_pro <- apply(pred_probit_ord, 1, which.max)

conf_matrix_pro <- table(Predicho = pred_clases_pro , Real = as.numeric(data$intensidad_dpp))

print(conf_matrix_pro)

##         Real
## Predicho     0     1     2
##        1 12828  1790  1833

exactitud_pro <- sum(diag(conf_matrix_pro)) / sum(conf_matrix_pro)
cat("Exactitud del modelo Logit Ordenado:", exactitud, "\n")

## Exactitud del modelo Logit Ordenado: 0.7797702

La coincidencia exacta en la medida de exactitud (0.77977) entre el modelo logit ordenado y el modelo probit ordenado se explica por la similitud estructural entre ambos enfoques. Aunque utilizan funciones de enlace distintas —logística para el logit y normal estándar para el probit—, ambas transformaciones son muy parecidas en forma y efecto sobre los datos. Como resultado, al aplicar los mismos predictores y la misma base de datos, es esperable que generen clasificaciones prácticamente idénticas, lo que se refleja en igual número de aciertos sobre el total de observaciones. Esta similitud no representa un error ni una anomalía, sino que confirma que, en este caso particular, la función de enlace elegida no altera el desempeño predictivo del modelo en términos de exactitud.

Tarea

Alba Gualan Ordoñez

2025-06-01