A researcher is interested in the effect that variables such as PAU (score on the Selectividad university entrance exam), bach (high-school grade average), and the prestige of the high school attended have on students' admission to a given university. The response variable, admit/do not admit, is binary (8p).
Y: Admitted (1 = admitted, 0 = not admitted)
X1: Pau (university entrance exam score)
X2: bach (high-school grade average)
X3: Prestigio (1 = highest prestige, down to 4 = lowest prestige)
datos<-read.table("trabajo.txt",header = T,sep = "\t")
attach(datos)
head(datos,5)
## Y Pau bach Prestigio
## 1 0 380 3.61 3
## 2 1 660 3.67 3
## 3 1 800 4.00 1
## 4 1 640 3.19 4
## 5 0 520 2.93 4
# Fit the binary logistic regression model
modelo_logistico1 <- glm(Y ~ ., data = datos, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
# Model summary
summary(modelo_logistico1)
##
## Call:
## glm(formula = Y ~ ., family = binomial, data = datos)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.203e+02 1.287e+06 0.000 1
## Pau 3.678e-01 7.229e+02 0.001 1
## bach 5.520e+01 2.644e+05 0.000 1
## Prestigio -1.420e+01 1.117e+05 0.000 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.4421e+01 on 10 degrees of freedom
## Residual deviance: 3.9958e-10 on 7 degrees of freedom
## AIC: 8
##
## Number of Fisher Scoring iterations: 25
Note. None of the variables is significant (p-value > 5%), which would suggest that they do not influence admission. In fact, the warning above, the enormous standard errors, and the near-zero residual deviance indicate complete separation: the model fits all 11 observations perfectly, so the Wald tests are uninformative and these p-values should not be taken at face value. For practice purposes, the analysis continues.
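Because separation makes maximum likelihood unusable here, a common remedy is Firth's bias-reduced logistic regression. A minimal sketch, assuming the logistf package is installed (it is not used elsewhere in this report):
# Firth's penalized-likelihood logistic regression: gives finite coefficient
# estimates and usable p-values even when the data are completely separated
library(logistf)
modelo_firth <- logistf(Y ~ Pau + bach + Prestigio, data = datos)
summary(modelo_firth)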
# Hosmer-Lemeshow test
library(ResourceSelection)
## ResourceSelection 0.3-6 2023-06-27
hoslem.test(Y, fitted(modelo_logistico1), g=10)
## Warning in hoslem.test(Y, fitted(modelo_logistico1), g = 10): The data did not
## allow for the requested number of bins.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: Y, fitted(modelo_logistico1)
## X-squared = 1.9979e-10, df = 3, p-value = 1
Note. According to the Hosmer-Lemeshow test (p-value = 1 > 0.05), there is no evidence of lack of fit at the 5% significance level. With a perfectly fitting model on 11 observations, however, the test is essentially uninformative, as the warning about the number of bins suggests.
# Pseudo-R2
library(DescTools)
PseudoR2(modelo_logistico1,which = c("McFadden","CoxSnell","Nagelkerke"))
## McFadden CoxSnell Nagelkerke
## 1.0000000 0.7304398 1.0000000
Note. The Cox & Snell R-squared is 0.73 and the Nagelkerke R-squared is 1, indicating that 100% of the variance of the dependent variable (admission) is explained by the predictors Pau, bach, and Prestigio. At face value this suggests an excellent model, but it is another symptom of the perfectly separated fit noted above.
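For reference, McFadden's pseudo-R2 can be reproduced by hand as one minus the ratio of the log-likelihoods of the full and intercept-only models; a minimal sketch (modelo_nulo is introduced here for illustration):
# McFadden's pseudo-R2 by hand: 1 - logLik(full) / logLik(null)
modelo_nulo <- glm(Y ~ 1, data = datos, family = binomial)  # intercept-only model
1 - as.numeric(logLik(modelo_logistico1)) / as.numeric(logLik(modelo_nulo))
Since the residual deviance is essentially zero, logLik(modelo_logistico1) is approximately 0, which is why McFadden's R2 equals 1 above.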
exp(cbind(OR = coef(modelo_logistico1), confint(modelo_logistico1, level=0.95)))
## Waiting for profiling to be done...
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## (the warning above is repeated at each step of the profile-likelihood computation)
## OR 2.5 % 97.5 %
## (Intercept) 8.144029e-140 0.00000e+00 Inf
## Pau 1.444521e+00 1.99955e-17 3.125409e+17
## bach 9.424154e+23 NA Inf
## Prestigio 6.833188e-07 0.00000e+00 NA
Note. The table shows the odds ratios and their 95% confidence intervals. The degenerate limits (0, Inf, NA) are another consequence of the separation: the profile-likelihood intervals cannot be computed reliably.
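The repeated warnings come from confint(), which profiles the likelihood and refits the model many times. Wald-type intervals computed directly from the standard errors avoid the profiling; a sketch (equally unreliable under separation, but cheap to compute):
# Wald-type 95% intervals from the asymptotic standard errors
exp(cbind(OR = coef(modelo_logistico1), confint.default(modelo_logistico1)))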
# Model accuracy
y.pred<-ifelse(modelo_logistico1$fitted.values > 0.43, yes = 1, no = 0)
tab_clasif <-table(Y, y.pred,dnn = c("observ", "predic"))
tab_clasif
## predic
## observ 0 1
## 0 4 0
## 1 0 7
Note. The accuracy is (4 + 7)/11 * 100 = 100%, so the model classifies every observation correctly. With only 11 observations and a separated fit, this perfect in-sample accuracy is expected and says little about out-of-sample performance.
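The same metrics can be computed directly from the confusion table; a minimal sketch (the English object names are introduced here for illustration):
# Accuracy, sensitivity and specificity from the confusion table
accuracy <- sum(diag(tab_clasif)) / sum(tab_clasif)            # (4 + 7) / 11
sensitivity <- tab_clasif["1", "1"] / sum(tab_clasif["1", ])   # true positive rate
specificity <- tab_clasif["0", "0"] / sum(tab_clasif["0", ])   # true negative rate
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)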
# Prediction
nuevo<-data.frame(Pau= 1000, bach=4.5, Prestigio=1)
pred<-predict(modelo_logistico1,newdata=nuevo,type="response")
pred
## 1
## 1
Note. According to these results, the probability of admission is essentially 100% when Pau is 1000, bach is 4.5, and Prestigio is 1. Because of the separated fit, predicted probabilities saturate at 0 or 1.
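The same prediction can be reconstructed from the linear predictor by applying the inverse logit; a minimal sketch:
# Prediction by hand: inverse-logit of eta = alpha + beta' x
eta <- predict(modelo_logistico1, newdata = nuevo, type = "link")
plogis(eta)  # identical to type = "response"; saturates at 1 here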
*E. Find the posterior parameters for a Bayesian logistic regression, assuming they follow a normal distribution: intercept -> N(0, 9); B1 -> N(0, 10).
We use the following code:
# BAYESIAN LOGISTIC REGRESSION WITH NORMAL PRIORS
# Install and load the rstan package if you have not done so already
library(rstan)
## Loading required package: StanHeaders
##
## rstan version 2.26.23 (Stan version 2.26.1)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
## change `threads_per_chain` option:
## rstan_options(threads_per_chain = 1)
## Do not specify '-march=native' in 'LOCAL_CPPFLAGS' or a Makevars file
library(StanHeaders)
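The startup message above suggests two options worth setting before compiling: parallel execution of the chains and caching of compiled models. A minimal sketch following that advice:
# Recommended by rstan's startup message: run chains in parallel and
# avoid recompiling unchanged Stan programs
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)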
# Define the model in Stan format
modelo_stan <- "
data {
  int<lower=0> N;               // number of observations
  int<lower=0, upper=1> y[N];   // binary response variable
  vector[N] x1;                 // predictor x1 (Pau)
  vector[N] x2;                 // predictor x2 (bach)
  vector[N] x3;                 // predictor x3 (Prestigio)
}
parameters {
  real alpha;                   // intercept
  real beta1;                   // coefficient for x1
  real beta2;                   // coefficient for x2
  real beta3;                   // coefficient for x3
}
model {
  // Priors for the parameters. Note that Stan's normal() takes the
  // standard deviation: normal(0, 9) means sd = 9. If N(0, 9) in the
  // problem statement denotes a variance of 9, this should be normal(0, 3).
  alpha ~ normal(0, 9);
  beta1 ~ normal(0, 10);
  beta2 ~ normal(0, 10);
  beta3 ~ normal(0, 10);
  // Logistic regression likelihood
  for (i in 1:N) {
    y[i] ~ bernoulli_logit(alpha + beta1 * x1[i] + beta2 * x2[i] + beta3 * x3[i]);
  }
}
"
# Put the data in the list format Stan expects
datos_stan <- list(
N = nrow(datos),
y = Y,
x1 = Pau,
x2 = bach,
x3 = Prestigio
)
# Compile the model
modelo_compilado <- stan_model(model_code = modelo_stan)
# Fit the model via MCMC
ajuste <- sampling(modelo_compilado, data = datos_stan, chains = 4, iter = 2000)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000252 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.52 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 37.163 seconds (Warm-up)
## Chain 1: 16.811 seconds (Sampling)
## Chain 1: 53.974 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.00015 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.5 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 30.174 seconds (Warm-up)
## Chain 2: 15.402 seconds (Sampling)
## Chain 2: 45.576 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000169 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.69 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 30.005 seconds (Warm-up)
## Chain 3: 18.587 seconds (Sampling)
## Chain 3: 48.592 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.000167 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.67 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 30.117 seconds (Warm-up)
## Chain 4: 18.26 seconds (Sampling)
## Chain 4: 48.377 seconds (Total)
## Chain 4:
# Summary of the results
summary(ajuste)
## $summary
## mean se_mean sd 2.5% 25% 50%
## alpha -7.5850825 0.238645231 8.25963304 -23.79392742 -12.87167314 -7.3227675
## beta1 0.1191196 0.002029352 0.05530996 0.03283342 0.07616924 0.1139842
## beta2 -11.3009207 0.234998681 6.63873371 -25.46294887 -15.66372867 -10.6314233
## beta3 -5.8660000 0.117553059 3.36303271 -13.15321002 -7.94050932 -5.5947049
## lp__ -3.5618976 0.052208909 1.56583457 -7.47509353 -4.32642058 -3.2141120
## 75% 97.5% n_eff Rhat
## alpha -1.989730 7.98430898 1197.8874 1.002824
## beta1 0.155305 0.23605404 742.8340 1.005345
## beta2 -6.489268 0.06378046 798.0675 1.005878
## beta3 -3.344317 -0.38311873 818.4541 1.005582
## lp__ -2.426563 -1.61383403 899.5028 1.003230
##
## $c_summary
## , , chains = chain:1
##
## stats
## parameter mean sd 2.5% 25% 50%
## alpha -7.1374827 8.58192258 -23.16435569 -12.41705274 -6.8554579
## beta1 0.1223059 0.05360008 0.03348664 0.08040231 0.1187725
## beta2 -11.7560875 6.49422125 -25.29817263 -15.91017703 -11.1969360
## beta3 -6.0736655 3.22884746 -13.20091917 -8.07963454 -5.8723206
## lp__ -3.6492690 1.63843614 -7.40346386 -4.39417786 -3.3129407
## stats
## parameter 75% 97.5%
## alpha -1.6349249 8.8100678
## beta1 0.1569497 0.2339462
## beta2 -7.3129103 0.1168572
## beta3 -3.6292457 -0.7860101
## lp__ -2.5105662 -1.6427724
##
## , , chains = chain:2
##
## stats
## parameter mean sd 2.5% 25% 50%
## alpha -8.5760149 8.38065678 -24.61192617 -14.73782328 -8.4128073
## beta1 0.1165139 0.05121547 0.03703342 0.07729044 0.1111404
## beta2 -10.7509531 6.35714055 -23.47335522 -14.75605091 -10.3987625
## beta3 -5.6872043 3.17535711 -12.49332460 -7.63828962 -5.6244599
## lp__ -3.6290916 1.57712954 -7.94781147 -4.34133395 -3.2545052
## stats
## parameter 75% 97.5%
## alpha -2.5874394 7.0135145
## beta1 0.1505257 0.2214264
## beta2 -6.1829809 0.5193729
## beta3 -3.2582870 -0.3351308
## lp__ -2.4968617 -1.6489169
##
## , , chains = chain:3
##
## stats
## parameter mean sd 2.5% 25% 50%
## alpha -7.2217676 7.94872246 -23.64136645 -12.13882902 -6.8448166
## beta1 0.1248628 0.05819657 0.03693653 0.07801188 0.1167892
## beta2 -11.9919909 6.88636722 -26.96220345 -17.00010939 -10.6608585
## beta3 -6.2372425 3.58467712 -13.79777285 -8.27190284 -5.8446802
## lp__ -3.5516426 1.55427291 -7.27129086 -4.31626390 -3.1798308
## stats
## parameter 75% 97.5%
## alpha -1.5794022 7.2231405
## beta1 0.1650802 0.2537860
## beta2 -6.8025944 -0.9492990
## beta3 -3.6602651 -0.3712388
## lp__ -2.4203952 -1.6036944
##
## , , chains = chain:4
##
## stats
## parameter mean sd 2.5% 25% 50%
## alpha -7.4050649 8.04213806 -23.89486520 -12.36092109 -7.1686868
## beta1 0.1127955 0.05721108 0.02764516 0.06721111 0.1068595
## beta2 -10.7046514 6.71335923 -24.63197951 -15.03573990 -10.3055611
## beta3 -5.4658877 3.39739373 -12.70546025 -7.75257169 -5.0175899
## lp__ -3.4175870 1.48122483 -6.88644305 -4.20428495 -3.1023001
## stats
## parameter 75% 97.5%
## alpha -2.2770356 7.8380828
## beta1 0.1497252 0.2358256
## beta2 -5.7478293 0.1374114
## beta3 -2.8913688 -0.2214687
## lp__ -2.2923531 -1.5808450
Note. Using the posterior means from the summary above, the fitted Bayesian model is:
ln(p / (1 - p)) = -7.585 + 0.119(x1) - 11.301(x2) - 5.866(x3)
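The posterior draws can also be used directly, for example to obtain a posterior predictive admission probability for the same student evaluated in the frequentist prediction above (Pau = 1000, bach = 4.5, Prestigio = 1); a minimal sketch:
# Posterior predictive probability of admission for the new student,
# averaging the inverse-logit over the posterior draws
post <- rstan::extract(ajuste)
p_draws <- plogis(post$alpha + post$beta1 * 1000 + post$beta2 * 4.5 + post$beta3 * 1)
mean(p_draws)                        # posterior mean probability
quantile(p_draws, c(0.025, 0.975))   # 95% credible interval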
# Visualize the results (e.g., plot the posterior distributions)
plot(ajuste)
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)
Note. The interval plot agrees with the summary: the 95% posterior intervals for beta1 and beta3 exclude zero, while those for alpha and beta2 do not, and Rhat values close to 1 suggest the chains converged.
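Beyond the interval plot, standard convergence checks are worth running; a minimal sketch:
# Trace plots and a compact posterior summary; Rhat near 1 and adequate
# n_eff (as in the summary above) indicate the chains mixed well
rstan::traceplot(ajuste, pars = c("alpha", "beta1", "beta2", "beta3"))
print(ajuste, pars = c("alpha", "beta1", "beta2", "beta3"), probs = c(0.025, 0.5, 0.975))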
FOR PRACTICE PURPOSES