PREGUNTA 12 Un estudio del conjunto de datos “birthwt” desea estudiar la relación entre el peso al nacer y la edad gestacional, teniendo en cuenta otros factores como la edad y el estado de tabaquismo de la madre. Podríamos plantear la siguiente pregunta: ¿Cuál es la probabilidad de que un recién nacido tenga un peso inferior a 2,5 gramos, dado que la edad gestacional es de 30 semanas y la madre es fumadora?
Podemos ajustar un modelo de regresión logística utilizando la
función glm() en R de la siguiente
manera:
# library(tidyverse)
# library(magrittr)
# library(dplyr)
library(Zelig)
## Loading required package: survival
library(pander)
## Warning: package 'pander' was built under R version 4.2.2
library(texreg)
## Warning: package 'texreg' was built under R version 4.2.2
## Version: 1.38.6
## Date: 2022-04-06
## Author: Philip Leifeld (University of Essex)
##
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
library(visreg)
## Warning: package 'visreg' was built under R version 4.2.2
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.2.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.2.2
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(visreg)
library(sjmisc)
## Warning: package 'sjmisc' was built under R version 4.2.2
library(MASS)
library(AER)
## Warning: package 'AER' was built under R version 4.2.2
## Loading required package: car
## Warning: package 'car' was built under R version 4.2.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.2
## Loading required package: sandwich
## Warning: package 'sandwich' was built under R version 4.2.2
library( Amelia)
## Warning: package 'Amelia' was built under R version 4.2.2
## Loading required package: Rcpp
## Warning: package 'Rcpp' was built under R version 4.2.2
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
library(geepack)
## Warning: package 'geepack' was built under R version 4.2.2
library( MatchIt)
## Warning: package 'MatchIt' was built under R version 4.2.2
library(Zelig)
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:texreg':
##
## extract
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## Warning: package 'ggplot2' was built under R version 4.2.2
## Warning: package 'purrr' was built under R version 4.2.2
## Warning: package 'dplyr' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tibble::add_case() masks sjmisc::add_case()
## ✖ tidyr::extract() masks magrittr::extract(), texreg::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::is_empty() masks sjmisc::is_empty()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::recode() masks car::recode()
## ✖ purrr::reduce() masks Zelig::reduce()
## ✖ tidyr::replace_na() masks sjmisc::replace_na()
## ✖ dplyr::select() masks MASS::select()
## ✖ purrr::set_names() masks magrittr::set_names()
## ✖ purrr::some() masks car::some()
## ✖ ggplot2::stat() masks Zelig::stat()
data("birthwt")
birthwt$race <- as.factor(birthwt$race)
birthwt$smoke <- as.factor(birthwt$smoke)
birthwt$ui <- as.factor(birthwt$ui)
birthwt$low <- as.factor(birthwt$low)
head(birthwt)
## low age lwt race smoke ptl ht ui ftv bwt
## 85 0 19 182 2 0 0 0 1 0 2523
## 86 0 33 155 3 0 0 0 0 3 2551
## 87 0 20 105 1 1 0 0 0 1 2557
## 88 0 21 108 1 1 0 0 1 2 2594
## 89 0 18 107 1 1 0 0 1 0 2600
## 91 0 21 124 3 0 0 0 0 0 2622
dim(birthwt)
## [1] 189 10
birthwt2 <- birthwt%>%rename(hypertension = ht)%>%mutate(race = ifelse(race == 1, "white", "non-white"))
head(birthwt2)
## low age lwt race smoke ptl hypertension ui ftv bwt
## 85 0 19 182 non-white 0 0 0 1 0 2523
## 86 0 33 155 non-white 0 0 0 0 3 2551
## 87 0 20 105 white 1 0 0 0 1 2557
## 88 0 21 108 white 1 0 0 1 2 2594
## 89 0 18 107 white 1 0 0 1 0 2600
## 91 0 21 124 non-white 0 0 0 0 0 2622
#view(birthwt2)
modelo 1
modelo analicé si un niño que nace con un peso al nacer inferior al normal se ve afectado por la raza. Al observar los resultados de este primer modelo, podemos ver que ser blanco tiene un efecto negativo sobre si un niño nace por debajo del peso normal al nacer (-0.6954) en comparación con otras razas. Dependiendo de la raza, uno puede recibir mejor atención que otras razas debido a los ingresos, la salud, etc. Según el valor de p, podemos ver que los resultados son estadísticamente significativos.
m0 <- glm(low ~ race, family = binomial, data = birthwt2)
summary(m0)
##
## Call:
## glm(formula = low ~ race, family = binomial, data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.9895 -0.9895 -0.7401 1.3777 1.6905
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4595 0.2129 -2.159 0.0309 *
## racewhite -0.6954 0.3202 -2.172 0.0298 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 229.86 on 187 degrees of freedom
## AIC: 233.86
##
## Number of Fisher Scoring iterations: 4
m1 <- glm(low ~ race + smoke, family = binomial, data = birthwt2)
summary(m1)
##
## Call:
## glm(formula = low ~ race + smoke, family = binomial, data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3402 -0.8840 -0.5433 1.4968 1.9930
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7382 0.2379 -3.103 0.00191 **
## racewhite -1.1003 0.3645 -3.019 0.00254 **
## smoke1 1.1130 0.3643 3.056 0.00225 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 219.98 on 186 degrees of freedom
## AIC: 225.98
##
## Number of Fisher Scoring iterations: 4
m2 <- glm(low ~ race + smoke + hypertension, family = binomial, data = birthwt2)
summary(m2)
##
## Call:
## glm(formula = low ~ race + smoke + hypertension, family = binomial,
## data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3371 -0.8496 -0.5222 1.0587 2.0297
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8331 0.2460 -3.386 0.000708 ***
## racewhite -1.0904 0.3686 -2.958 0.003095 **
## smoke1 1.1190 0.3681 3.040 0.002367 **
## hypertension 1.1725 0.6225 1.883 0.059633 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 216.38 on 185 degrees of freedom
## AIC: 224.38
##
## Number of Fisher Scoring iterations: 4
m3 <- glm(low ~ race + smoke + hypertension + ui, family = binomial, data = birthwt2)
summary(m3)
##
## Call:
## glm(formula = low ~ race + smoke + hypertension + ui, family = binomial,
## data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6588 -0.7920 -0.4829 1.1445 2.1009
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0136 0.2637 -3.844 0.000121 ***
## racewhite -1.0766 0.3749 -2.872 0.004082 **
## smoke1 1.0915 0.3742 2.917 0.003532 **
## hypertension 1.3582 0.6290 2.159 0.030822 *
## ui1 1.0067 0.4378 2.299 0.021485 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 211.17 on 184 degrees of freedom
## AIC: 221.17
##
## Number of Fisher Scoring iterations: 4
m4 <- glm(low ~ race * hypertension + smoke + ui, family = binomial, data = birthwt2)
summary(m4)
##
## Call:
## glm(formula = low ~ race * hypertension + smoke + ui, family = binomial,
## data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6521 -0.8115 -0.4885 1.1502 2.0905
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0597 0.2711 -3.909 9.26e-05 ***
## racewhite -1.0061 0.3829 -2.628 0.00860 **
## hypertension 1.8508 0.8846 2.092 0.03642 *
## smoke1 1.1241 0.3765 2.986 0.00283 **
## ui1 1.0052 0.4375 2.298 0.02157 *
## racewhite:hypertension -1.1185 1.3127 -0.852 0.39419
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 210.42 on 183 degrees of freedom
## AIC: 222.42
##
## Number of Fisher Scoring iterations: 4
anova(m0, m1, m2, m3, m4, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: low ~ race
## Model 2: low ~ race + smoke
## Model 3: low ~ race + smoke + hypertension
## Model 4: low ~ race + smoke + hypertension + ui
## Model 5: low ~ race * hypertension + smoke + ui
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 187 229.86
## 2 186 219.98 1 9.8802 0.001671 **
## 3 185 216.38 1 3.5949 0.057956 .
## 4 184 211.17 1 5.2148 0.022396 *
## 5 183 210.42 1 0.7515 0.385993
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparando modelos
htmlreg(list(m0, m1, m2, m3, m4), file = "texreg.doc",
inline.css = FALSE,
doctype = TRUE,
html.tag = TRUE,
head.tag = TRUE,
body.tag = TRUE)
## The table was written to the file 'texreg.doc'.
unlink("texreg.doc")
fullmod <- glm(low ~ age+lwt+race + smoke + hypertension + ui, family = binomial, data = birthwt2)
summary(fullmod)
##
## Call:
## glm(formula = low ~ age + lwt + race + smoke + hypertension +
## ui, family = binomial, data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7484 -0.8455 -0.5295 1.0059 2.1739
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.294610 1.085101 1.193 0.23284
## age -0.019884 0.035214 -0.565 0.57231
## lwt -0.014846 0.006495 -2.286 0.02226 *
## racewhite -1.032600 0.393011 -2.627 0.00860 **
## smoke1 1.079879 0.388200 2.782 0.00541 **
## hypertension 1.837254 0.685882 2.679 0.00739 **
## ui1 0.882103 0.450421 1.958 0.05018 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 204.44 on 182 degrees of freedom
## AIC: 218.44
##
## Number of Fisher Scoring iterations: 4
Interpretación:
Para predecir low que indica si un recién nacido tiene un peso inferior a 2.5 gramos o no, a partir de las variables explicativas “age” edad de la madre, peso de la madre(lwt), race(raza de la madre), smoke(estado de tabaquismo de la madre), hipertesión y ui (presencia de infecció urinaria en la madre)
En resumen, el modelo sugieere que el peso de la madre,la raza, el estado de tabaquismo y la hipertesión de la madre son factores importantes para predecir la probabilidad de tener un hijo bajo peso al nacer. Sin embargo, la edad de la madre y la presencia de la infección urinaria no son significatubas en el modelo.
########################################
La regresión logistica que predice la probabilidad de que un recién nacido tenga un peso inferir a 2.5 gramos sin incluir ninguna variable explicativa, es decir, simplemente asume que todas las observaciones tienen la misma probabilidad de tener un bajo peso al nacer.
nothing <- glm(low ~ 1, family=binomial,data = birthwt2)
summary(nothing)
##
## Call:
## glm(formula = low ~ 1, family = binomial, data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8651 -0.8651 -0.8651 1.5259 1.5259
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.790 0.157 -5.033 4.84e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 234.67 on 188 degrees of freedom
## AIC: 236.67
##
## Number of Fisher Scoring iterations: 4
Interpretación:
El pvalor=0.00<0.05. S rechaza Ho. Lo que nos dice que el modelo es altamente significativo.
La deviance residual es igual a la deviance nula lo que nos indica que el modelo no explica nada de la variabilidad de los datos
redmod1 = glm(low ~ lwt+race+smoke+ ptl + hypertension,family=binomial,data = birthwt2)
summary(redmod1)
##
## Call:
## glm(formula = low ~ lwt + race + smoke + ptl + hypertension,
## family = binomial, data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7335 -0.8358 -0.5436 0.9474 2.1421
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.957060 0.840186 1.139 0.25466
## lwt -0.015245 0.006532 -2.334 0.01961 *
## racewhite -1.041289 0.386167 -2.696 0.00701 **
## smoke1 0.998133 0.388996 2.566 0.01029 *
## ptl 0.595026 0.335353 1.774 0.07601 .
## hypertension 1.736837 0.693986 2.503 0.01233 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 205.39 on 183 degrees of freedom
## AIC: 217.39
##
## Number of Fisher Scoring iterations: 4
backwards=step(fullmod)
## Start: AIC=218.44
## low ~ age + lwt + race + smoke + hypertension + ui
##
## Df Deviance AIC
## - age 1 204.77 216.77
## <none> 204.44 218.44
## - ui 1 208.23 220.23
## - lwt 1 210.36 222.36
## - race 1 211.78 223.78
## - hypertension 1 211.97 223.97
## - smoke 1 212.57 224.57
##
## Step: AIC=216.77
## low ~ lwt + race + smoke + hypertension + ui
##
## Df Deviance AIC
## <none> 204.77 216.77
## - ui 1 208.66 218.66
## - lwt 1 211.17 221.17
## - hypertension 1 212.37 222.37
## - race 1 212.83 222.83
## - smoke 1 213.15 223.15
formula(backwards)
## low ~ lwt + race + smoke + hypertension + ui
summary(backwards)
##
## Call:
## glm(formula = low ~ lwt + race + smoke + hypertension + ui, family = binomial,
## data = birthwt2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7582 -0.8357 -0.5349 1.0000 2.1762
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.904876 0.837725 1.080 0.28007
## lwt -0.015287 0.006455 -2.368 0.01788 *
## racewhite -1.064769 0.388109 -2.743 0.00608 **
## smoke1 1.091155 0.386610 2.822 0.00477 **
## hypertension 1.853287 0.688244 2.693 0.00709 **
## ui1 0.892556 0.449515 1.986 0.04708 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 234.67 on 188 degrees of freedom
## Residual deviance: 204.77 on 183 degrees of freedom
## AIC: 216.77
##
## Number of Fisher Scoring iterations: 4
Interpretación:
El modelo ajustado es mejor que el nulo, todas las variables son significativas.
“lwt” un aumento de la unidad en el peso del bebe al nacer se asocia con un decrecimiento de 0.015 en el log odds de lo manteniendo constate las otras variables.
forwards=step(nothing,scope=list(lower=formula(nothing), upper=formula(fullmod)), direction="forward")
## Start: AIC=236.67
## low ~ 1
##
## Df Deviance AIC
## + lwt 1 228.69 232.69
## + ui 1 229.60 233.60
## + smoke 1 229.81 233.81
## + race 1 229.86 233.86
## + hypertension 1 230.65 234.65
## + age 1 231.91 235.91
## <none> 234.67 236.67
##
## Step: AIC=232.69
## low ~ lwt
##
## Df Deviance AIC
## + hypertension 1 221.14 227.14
## + smoke 1 224.34 230.34
## + race 1 224.65 230.65
## + ui 1 225.10 231.10
## <none> 228.69 232.69
## + age 1 227.12 233.12
##
## Step: AIC=227.14
## low ~ lwt + hypertension
##
## Df Deviance AIC
## + ui 1 216.61 224.61
## + smoke 1 216.86 224.86
## + race 1 217.74 225.74
## <none> 221.14 227.14
## + age 1 219.88 227.88
##
## Step: AIC=224.61
## low ~ lwt + hypertension + ui
##
## Df Deviance AIC
## + smoke 1 212.83 222.83
## + race 1 213.15 223.15
## <none> 216.61 224.61
## + age 1 215.48 225.48
##
## Step: AIC=222.83
## low ~ lwt + hypertension + ui + smoke
##
## Df Deviance AIC
## + race 1 204.77 216.77
## <none> 212.83 222.83
## + age 1 211.78 223.78
##
## Step: AIC=216.77
## low ~ lwt + hypertension + ui + smoke + race
##
## Df Deviance AIC
## <none> 204.77 216.77
## + age 1 204.44 218.44