PREGUNTA 12            Un estudio del conjunto de datos “birthwt” desea estudiar la relación entre el peso al nacer y la edad gestacional, teniendo en cuenta otros factores como la edad y el estado de tabaquismo de la madre. Podríamos plantear la siguiente pregunta: ¿Cuál es la probabilidad de que un recién nacido tenga un peso inferior a 2,5 gramos, dado que la edad gestacional es de 30 semanas y la madre es fumadora?

Podemos ajustar un modelo de regresión logística utilizando la función glm() en R de la siguiente manera:

# library(tidyverse)
# library(magrittr)
# library(dplyr)
library(Zelig)
## Loading required package: survival
library(pander)
## Warning: package 'pander' was built under R version 4.2.2
library(texreg)
## Warning: package 'texreg' was built under R version 4.2.2
## Version:  1.38.6
## Date:     2022-04-06
## Author:   Philip Leifeld (University of Essex)
## 
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
library(visreg)
## Warning: package 'visreg' was built under R version 4.2.2
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.2.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.2.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(visreg)
library(sjmisc)
## Warning: package 'sjmisc' was built under R version 4.2.2
library(MASS)
library(AER)
## Warning: package 'AER' was built under R version 4.2.2
## Loading required package: car
## Warning: package 'car' was built under R version 4.2.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.2
## Loading required package: sandwich
## Warning: package 'sandwich' was built under R version 4.2.2
library( Amelia)
## Warning: package 'Amelia' was built under R version 4.2.2
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 4.2.2
## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
library(geepack)
## Warning: package 'geepack' was built under R version 4.2.2
library( MatchIt)
## Warning: package 'MatchIt' was built under R version 4.2.2
library(Zelig)
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:texreg':
## 
##     extract
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## Warning: package 'ggplot2' was built under R version 4.2.2
## Warning: package 'purrr' was built under R version 4.2.2
## Warning: package 'dplyr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tibble::add_case()  masks sjmisc::add_case()
## ✖ tidyr::extract()    masks magrittr::extract(), texreg::extract()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ purrr::is_empty()   masks sjmisc::is_empty()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ dplyr::recode()     masks car::recode()
## ✖ purrr::reduce()     masks Zelig::reduce()
## ✖ tidyr::replace_na() masks sjmisc::replace_na()
## ✖ dplyr::select()     masks MASS::select()
## ✖ purrr::set_names()  masks magrittr::set_names()
## ✖ purrr::some()       masks car::some()
## ✖ ggplot2::stat()     masks Zelig::stat()
data("birthwt")
birthwt$race <- as.factor(birthwt$race)
birthwt$smoke <- as.factor(birthwt$smoke)
birthwt$ui <- as.factor(birthwt$ui)
birthwt$low <- as.factor(birthwt$low)
head(birthwt)
##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
## 88   0  21 108    1     1   0  0  1   2 2594
## 89   0  18 107    1     1   0  0  1   0 2600
## 91   0  21 124    3     0   0  0  0   0 2622
dim(birthwt)
## [1] 189  10
birthwt2 <- birthwt%>%rename(hypertension = ht)%>%mutate(race = ifelse(race == 1, "white", "non-white"))
head(birthwt2)
##    low age lwt      race smoke ptl hypertension ui ftv  bwt
## 85   0  19 182 non-white     0   0            0  1   0 2523
## 86   0  33 155 non-white     0   0            0  0   3 2551
## 87   0  20 105     white     1   0            0  0   1 2557
## 88   0  21 108     white     1   0            0  1   2 2594
## 89   0  18 107     white     1   0            0  1   0 2600
## 91   0  21 124 non-white     0   0            0  0   0 2622
#view(birthwt2)

Modelos de regresión logística

modelo 1

modelo analicé si un niño que nace con un peso al nacer inferior al normal se ve afectado por la raza. Al observar los resultados de este primer modelo, podemos ver que ser blanco tiene un efecto negativo sobre si un niño nace por debajo del peso normal al nacer (-0.6954) en comparación con otras razas. Dependiendo de la raza, uno puede recibir mejor atención que otras razas debido a los ingresos, la salud, etc. Según el valor de p, podemos ver que los resultados son estadísticamente significativos.

m0 <- glm(low ~ race, family = binomial, data = birthwt2)
summary(m0)
## 
## Call:
## glm(formula = low ~ race, family = binomial, data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9895  -0.9895  -0.7401   1.3777   1.6905  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.4595     0.2129  -2.159   0.0309 *
## racewhite    -0.6954     0.3202  -2.172   0.0298 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 229.86  on 187  degrees of freedom
## AIC: 233.86
## 
## Number of Fisher Scoring iterations: 4

modelo 2

m1 <- glm(low ~ race + smoke, family = binomial, data = birthwt2)
summary(m1)
## 
## Call:
## glm(formula = low ~ race + smoke, family = binomial, data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3402  -0.8840  -0.5433   1.4968   1.9930  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -0.7382     0.2379  -3.103  0.00191 **
## racewhite    -1.1003     0.3645  -3.019  0.00254 **
## smoke1        1.1130     0.3643   3.056  0.00225 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 219.98  on 186  degrees of freedom
## AIC: 225.98
## 
## Number of Fisher Scoring iterations: 4

modelo 3

m2 <- glm(low ~ race + smoke + hypertension, family = binomial, data = birthwt2)
summary(m2)
## 
## Call:
## glm(formula = low ~ race + smoke + hypertension, family = binomial, 
##     data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3371  -0.8496  -0.5222   1.0587   2.0297  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.8331     0.2460  -3.386 0.000708 ***
## racewhite     -1.0904     0.3686  -2.958 0.003095 ** 
## smoke1         1.1190     0.3681   3.040 0.002367 ** 
## hypertension   1.1725     0.6225   1.883 0.059633 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 216.38  on 185  degrees of freedom
## AIC: 224.38
## 
## Number of Fisher Scoring iterations: 4

modelo 4

m3 <- glm(low ~ race + smoke + hypertension + ui, family = binomial, data = birthwt2)
summary(m3)
## 
## Call:
## glm(formula = low ~ race + smoke + hypertension + ui, family = binomial, 
##     data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6588  -0.7920  -0.4829   1.1445   2.1009  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.0136     0.2637  -3.844 0.000121 ***
## racewhite     -1.0766     0.3749  -2.872 0.004082 ** 
## smoke1         1.0915     0.3742   2.917 0.003532 ** 
## hypertension   1.3582     0.6290   2.159 0.030822 *  
## ui1            1.0067     0.4378   2.299 0.021485 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 211.17  on 184  degrees of freedom
## AIC: 221.17
## 
## Number of Fisher Scoring iterations: 4

modelo 5

m4 <- glm(low ~ race * hypertension + smoke + ui, family = binomial, data = birthwt2)
summary(m4)
## 
## Call:
## glm(formula = low ~ race * hypertension + smoke + ui, family = binomial, 
##     data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6521  -0.8115  -0.4885   1.1502   2.0905  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -1.0597     0.2711  -3.909 9.26e-05 ***
## racewhite               -1.0061     0.3829  -2.628  0.00860 ** 
## hypertension             1.8508     0.8846   2.092  0.03642 *  
## smoke1                   1.1241     0.3765   2.986  0.00283 ** 
## ui1                      1.0052     0.4375   2.298  0.02157 *  
## racewhite:hypertension  -1.1185     1.3127  -0.852  0.39419    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 210.42  on 183  degrees of freedom
## AIC: 222.42
## 
## Number of Fisher Scoring iterations: 4

Prueba de razón de verosimilitud

anova(m0, m1, m2, m3, m4, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: low ~ race
## Model 2: low ~ race + smoke
## Model 3: low ~ race + smoke + hypertension
## Model 4: low ~ race + smoke + hypertension + ui
## Model 5: low ~ race * hypertension + smoke + ui
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1       187     229.86                        
## 2       186     219.98  1   9.8802 0.001671 **
## 3       185     216.38  1   3.5949 0.057956 . 
## 4       184     211.17  1   5.2148 0.022396 * 
## 5       183     210.42  1   0.7515 0.385993   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparando modelos

htmlreg(list(m0, m1, m2, m3, m4),  file = "texreg.doc",
        inline.css = FALSE,
        doctype = TRUE,
        html.tag = TRUE,
        head.tag = TRUE,
        body.tag = TRUE)
## The table was written to the file 'texreg.doc'.
unlink("texreg.doc")


fullmod <- glm(low ~ age+lwt+race + smoke + hypertension + ui, family = binomial, data = birthwt2)
summary(fullmod)
## 
## Call:
## glm(formula = low ~ age + lwt + race + smoke + hypertension + 
##     ui, family = binomial, data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7484  -0.8455  -0.5295   1.0059   2.1739  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   1.294610   1.085101   1.193  0.23284   
## age          -0.019884   0.035214  -0.565  0.57231   
## lwt          -0.014846   0.006495  -2.286  0.02226 * 
## racewhite    -1.032600   0.393011  -2.627  0.00860 **
## smoke1        1.079879   0.388200   2.782  0.00541 **
## hypertension  1.837254   0.685882   2.679  0.00739 **
## ui1           0.882103   0.450421   1.958  0.05018 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 204.44  on 182  degrees of freedom
## AIC: 218.44
## 
## Number of Fisher Scoring iterations: 4

Interpretación:

Para predecir low que indica si un recién nacido tiene un peso inferior a 2.5 gramos o no, a partir de las variables explicativas “age” edad de la madre, peso de la madre(lwt), race(raza de la madre), smoke(estado de tabaquismo de la madre), hipertesión y ui (presencia de infecció urinaria en la madre)

En resumen, el modelo sugieere que el peso de la madre,la raza, el estado de tabaquismo y la hipertesión de la madre son factores importantes para predecir la probabilidad de tener un hijo bajo peso al nacer. Sin embargo, la edad de la madre y la presencia de la infección urinaria no son significatubas en el modelo.

########################################

 La regresión logistica que predice la probabilidad de que un recién nacido tenga un peso inferir a 2.5 gramos sin incluir ninguna variable explicativa, es decir, simplemente asume que todas las observaciones tienen la misma probabilidad de tener un bajo peso al nacer.

nothing <- glm(low ~ 1, family=binomial,data = birthwt2)
summary(nothing)
## 
## Call:
## glm(formula = low ~ 1, family = binomial, data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8651  -0.8651  -0.8651   1.5259   1.5259  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.790      0.157  -5.033 4.84e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 234.67  on 188  degrees of freedom
## AIC: 236.67
## 
## Number of Fisher Scoring iterations: 4

Interpretación:

El pvalor=0.00<0.05. S rechaza Ho. Lo que nos dice que el modelo es altamente significativo.

La deviance residual es igual a la deviance nula lo que nos indica que el modelo no explica nada de la variabilidad de los datos

redmod1 = glm(low ~ lwt+race+smoke+ ptl + hypertension,family=binomial,data = birthwt2)
summary(redmod1)
## 
## Call:
## glm(formula = low ~ lwt + race + smoke + ptl + hypertension, 
##     family = binomial, data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7335  -0.8358  -0.5436   0.9474   2.1421  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.957060   0.840186   1.139  0.25466   
## lwt          -0.015245   0.006532  -2.334  0.01961 * 
## racewhite    -1.041289   0.386167  -2.696  0.00701 **
## smoke1        0.998133   0.388996   2.566  0.01029 * 
## ptl           0.595026   0.335353   1.774  0.07601 . 
## hypertension  1.736837   0.693986   2.503  0.01233 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 205.39  on 183  degrees of freedom
## AIC: 217.39
## 
## Number of Fisher Scoring iterations: 4
backwards=step(fullmod)
## Start:  AIC=218.44
## low ~ age + lwt + race + smoke + hypertension + ui
## 
##                Df Deviance    AIC
## - age           1   204.77 216.77
## <none>              204.44 218.44
## - ui            1   208.23 220.23
## - lwt           1   210.36 222.36
## - race          1   211.78 223.78
## - hypertension  1   211.97 223.97
## - smoke         1   212.57 224.57
## 
## Step:  AIC=216.77
## low ~ lwt + race + smoke + hypertension + ui
## 
##                Df Deviance    AIC
## <none>              204.77 216.77
## - ui            1   208.66 218.66
## - lwt           1   211.17 221.17
## - hypertension  1   212.37 222.37
## - race          1   212.83 222.83
## - smoke         1   213.15 223.15
 formula(backwards)
## low ~ lwt + race + smoke + hypertension + ui
 summary(backwards)
## 
## Call:
## glm(formula = low ~ lwt + race + smoke + hypertension + ui, family = binomial, 
##     data = birthwt2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7582  -0.8357  -0.5349   1.0000   2.1762  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)   0.904876   0.837725   1.080  0.28007   
## lwt          -0.015287   0.006455  -2.368  0.01788 * 
## racewhite    -1.064769   0.388109  -2.743  0.00608 **
## smoke1        1.091155   0.386610   2.822  0.00477 **
## hypertension  1.853287   0.688244   2.693  0.00709 **
## ui1           0.892556   0.449515   1.986  0.04708 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 234.67  on 188  degrees of freedom
## Residual deviance: 204.77  on 183  degrees of freedom
## AIC: 216.77
## 
## Number of Fisher Scoring iterations: 4

Interpretación:

El modelo ajustado es mejor que el nulo, todas las variables son significativas.

“lwt” un aumento de la unidad en el peso del bebe al nacer se asocia con un decrecimiento de 0.015 en el log odds de lo manteniendo constate las otras variables.

forwards=step(nothing,scope=list(lower=formula(nothing), upper=formula(fullmod)), direction="forward")
## Start:  AIC=236.67
## low ~ 1
## 
##                Df Deviance    AIC
## + lwt           1   228.69 232.69
## + ui            1   229.60 233.60
## + smoke         1   229.81 233.81
## + race          1   229.86 233.86
## + hypertension  1   230.65 234.65
## + age           1   231.91 235.91
## <none>              234.67 236.67
## 
## Step:  AIC=232.69
## low ~ lwt
## 
##                Df Deviance    AIC
## + hypertension  1   221.14 227.14
## + smoke         1   224.34 230.34
## + race          1   224.65 230.65
## + ui            1   225.10 231.10
## <none>              228.69 232.69
## + age           1   227.12 233.12
## 
## Step:  AIC=227.14
## low ~ lwt + hypertension
## 
##         Df Deviance    AIC
## + ui     1   216.61 224.61
## + smoke  1   216.86 224.86
## + race   1   217.74 225.74
## <none>       221.14 227.14
## + age    1   219.88 227.88
## 
## Step:  AIC=224.61
## low ~ lwt + hypertension + ui
## 
##         Df Deviance    AIC
## + smoke  1   212.83 222.83
## + race   1   213.15 223.15
## <none>       216.61 224.61
## + age    1   215.48 225.48
## 
## Step:  AIC=222.83
## low ~ lwt + hypertension + ui + smoke
## 
##        Df Deviance    AIC
## + race  1   204.77 216.77
## <none>      212.83 222.83
## + age   1   211.78 223.78
## 
## Step:  AIC=216.77
## low ~ lwt + hypertension + ui + smoke + race
## 
##        Df Deviance    AIC
## <none>      204.77 216.77
## + age   1   204.44 218.44