Trabajo Final de Estadística

Factores asociados a la mortalidad en pacientes con insuficiencia cardíaca

Introducción

Figura adaptada de Savarese et al. Global burden of heart failure: a comprehensive and updated review of epidemiology. Cardiovasc Res. 2023; 119 (6): 1453. DOI: 10.1093/cvr/cvad026

Pregunta de investigación

¿Qué factores sociodemográficos, clínicos y de laboratorio se asocian con la mortalidad en pacientes con insuficiencia cardíaca?

Para responder a esta preguntan se emplearán los datos publicados por Ahmad et al.1 correspondientes a un estudio observacional que incluyó a 299 pacientes con insuficiencia cardíaca atendidos en el Instituto de Cardiología de Faisalabad (Pakistán).

Conjunto de datos

Variable Explicación Medida Rango
Age Edad del paciente Años 40 - 95
BP Hipertensión arterial Binaria 0, 1
CPK Creatinfosfoquinasa mcg/L 23, 7961
Anaemia Anemia Binaria 0, 1
Diabetes Diabetes Mellitus Binaria 0, 1
Ejection.Fraction Fracción de eyección Porcentaje 14 - 80
Gender Sexo Binaria 0, 1
Pletelets Plaquetas kiloplatelets/mL 25.01 - 850
Creatinine Creatinina sérica mg/dL 0.5 - 9.4
Sodium Sodio sérico mEq/L 114 - 148
Smoking Fumador Binaria 0, 1
TIME Tiempo de seguimiento Days 4 - 285
Event Fallecimiento del paciente Binaria 0, 1

Estimación del FGR

\[ \text{FGR} = 142 \times \left(\min\left(\frac{\text{scr}}{k}, 1\right)\right)^a \times \left(\max\left(\frac{\text{scr}}{k}, 1\right)\right)^{-1.2} \times 0.9938^{\text{age}} \times e \] Donde1:

  • scr: es la creatinina sérica expresada en mg/dL
  • k: es 0.7 para el sexo femenino y 0.9 para el masculino
  • a: es -0.241 para el sexo femenino y -0.302 para el masculino
  • e: es 1.012 para el sexo femenino y 1 para el masculino

Código en R

ckd_epi <- function(scr,age,sex){
  sex <- ifelse(sex %in% "Femenino",0,1)
  k <- ifelse(sex == 0,0.7,0.9)
  a <- ifelse(sex == 0,-0.241,-0.302)
  e <- ifelse(sex == 0,1.012,1)
  round(142 * min(scr/k,1)^a * max(scr/k,1)^(-1.2) * 0.9938^age * e,2)
}

Paquetes

if(!require(pacman)){install.packages("pacman")}
p_load(dplyr,ggplot2, survminer, survival, ggsurvfit, DT,
       kableExtra, skimr, forestmodel, splines,performance,gridExtra)

Datos

db <- read.csv("S1Data.csv")

db <- db %>% 
  transform(Gender = factor(Gender,levels=0:1,labels = c("Femenino","Masculino")),
            Smoking = factor(Smoking, levels = 0:1,
                             labels = c("No fumador","Fumador")),
            Diabetes = factor(Diabetes, levels = 0:1,
                              labels = c("No DM","DM")),
            BP = factor(BP, levels = 0:1, labels = c("No HTA","HTA")),
            Anaemia = factor(Anaemia, levels = 0:1,
                             labels = c("No Anemia","Anemia")),
            Pletelets = round(Pletelets/1000,1)
            ) %>% 
  mutate(event_cat = factor(Event,levels = 0:1, labels = c("Vivo","Fallecido"))
         ) %>% 
  rowwise() %>% 
  mutate(FGR = ckd_epi(Creatinine,Age,Gender)) %>% 
  as.data.frame()

Previsualización de datos

Análisis univariado

Categóricas

categoricas <- db %>% 
  select(event_cat, Gender, Smoking, Diabetes, BP, Anaemia)

desc_categoricas <- function(x, xlab = "X"){
  datos <- as.data.frame(table(x)) %>% 
    mutate(p = Freq/sum(Freq) * 100) %>% 
    mutate(label = sprintf("%.1f %%\n(n=%.0f)",p,Freq))
  
  datos %>% 
    ggplot(aes(x=x, y = p)) +
    geom_col(fill = "#562457") +
    scale_y_continuous(name = "Proporción (%)",
                       limits = c(0,100),
                       breaks = seq(0,100,20),
                       expand = c(0,0)) +
    geom_text(aes(y = p + 0.3*p, label = label), size = 3) +
    labs(x = xlab) +
    theme_classic() +
    theme(
      axis.text.x = element_text(size = 12, color = "black"),
      axis.title.x = element_text(face="bold",size=14,
                                  margin = margin(10,0,10,0)),
      axis.title.y = element_text(face = "bold", size = 14,
                                 margin = margin(0,10,0,10)),
      axis.text.y = element_text(size = 12, color = "black")
    )
}

ggpubr::ggarrange(
  desc_categoricas(categoricas$event_cat, "Mortalidad"),
  desc_categoricas(categoricas$Gender, "Sexo"),
  desc_categoricas(categoricas$Smoking, "Tabaquismo"),
  desc_categoricas(categoricas$Diabetes, "Diabetes"),
  desc_categoricas(categoricas$BP, "HTA"),
  desc_categoricas(categoricas$Anaemia, "Anemia"),
  labels = "AUTO",
  align = "hv"
)

Cuantitativas

cuantiativas <- db %>% 
  select(TIME, Age, Ejection.Fraction, Sodium, Creatinine,
         Pletelets, CPK, FGR)

desc_db <- skimr::skim(cuantiativas) %>% .[c(-1,-3,-4,-7,-11,-12)] %>% as.data.frame()
colnames(desc_db) <- c("Variable", "Media", "DE", "p25", "p50","p75")
desc_db[,2:6] <- lapply(desc_db[,2:6], function(x){round(x,1)})
p <- as.numeric(apply(cuantiativas,2,
                                     function(x){
                                       shapiro.test(x)$p.value}
))
normalidad <- if_else(p < 0.001, "<0.001", sprintf("%.3f",p))
desc_db$`Test normalidad` <- normalidad

desc_db$Histograma <- ""

kable(desc_db,  align = "c") %>%
  column_spec(8, image = spec_hist(cuantiativas, same_lim = F)) %>%
  kable_styling(font_size = 22) %>% 
  kable_paper("hover", full_width = F)
Variable Media DE p25 p50 p75 Test normalidad Histograma
TIME 130.3 77.6 73.0 115.0 203.0 <0.001
Age 60.8 11.9 51.0 60.0 70.0 <0.001
Ejection.Fraction 38.1 11.8 30.0 38.0 45.0 <0.001
Sodium 136.6 4.4 134.0 137.0 140.0 <0.001
Creatinine 1.4 1.0 0.9 1.1 1.4 <0.001
Pletelets 263.4 97.8 212.5 262.0 303.5 <0.001
CPK 581.8 970.3 116.5 250.0 582.0 <0.001
FGR 67.2 27.0 48.7 68.0 87.1 <0.001

Análisis bivariado

Cualitativas

paleta <- c("#562457", "#af78b0", "#659696","#abaa65", "#85846f")

s_sexo <- survfit2(Surv(TIME, Event) ~ Gender, data = db)

g_sexo <- s_sexo %>% 
  ggsurvfit(linewidth = 1) +
  add_confidence_interval() +
  add_pvalue(location = "annotation",
             caption = glue::glue("Log-rank: {survfit2_p(s_sexo)}"),
             x = 50,
             y = 0.10,
             size = 5) +
  scale_ggsurvfit() +
  labs(y = "Probabilidad de supervivencia",
       x = "Tiempo (días)") +
  scale_fill_manual(values = paleta[c(1,3)]) +
  scale_color_manual(values = paleta[c(1,3)]) +
  ggplot2::theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16)
  )

s_tabaquismo <- survfit2(Surv(TIME, Event) ~ Smoking, data = db)

g_tabaquismo <- s_tabaquismo %>% 
  ggsurvfit(linewidth = 1) +
  add_confidence_interval() +
  add_pvalue(location = "annotation",
             caption = glue::glue("Log-rank: {survfit2_p(s_tabaquismo)}"),
             x = 50,
             y = 0.10,
             size = 5) +
  scale_ggsurvfit() +
  labs(y = "Probabilidad de supervivencia",
       x = "Tiempo (días)") +
  scale_fill_manual(values = paleta[c(1,3)]) +
  scale_color_manual(values = paleta[c(1,3)]) +
  ggplot2::theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16)
  )

s_diabetes <- survfit2(Surv(TIME, Event) ~ Diabetes,
                        data = db)

g_diabetes <- s_diabetes %>% 
  ggsurvfit(linewidth = 1) +
  add_confidence_interval() +
  add_pvalue(location = "annotation",
             caption = glue::glue("Log-rank: {survfit2_p(s_diabetes)}"),
             x = 50,
             y = 0.10,
             size = 5) +
  scale_ggsurvfit() +
  labs(y = "Probabilidad de supervivencia",
       x = "Tiempo (días)") +
  scale_fill_manual(values = paleta[c(1,3)]) +
  scale_color_manual(values = paleta[c(1,3)]) +
  ggplot2::theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16)
  )

s_bp <- survfit2(Surv(TIME, Event) ~ BP, data = db)

g_bp <- s_bp %>% 
  ggsurvfit(linewidth = 1) +
  add_confidence_interval() +
  add_pvalue(location = "annotation",
             caption = glue::glue("Log-rank: {survfit2_p(s_bp)}"),
             x = 50,
             y = 0.10,
             size = 5) +
  scale_ggsurvfit() +
  labs(y = "Probabilidad de supervivencia",
       x = "Tiempo (días)") +
  scale_fill_manual(values = paleta[c(1,3)]) +
  scale_color_manual(values = paleta[c(1,3)]) +
  ggplot2::theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16)
  )

s_Anaemia <- survfit2(Surv(TIME, Event) ~ Anaemia, data = db)

g_Anaemia <- s_Anaemia %>% 
  ggsurvfit(linewidth = 1) +
  add_confidence_interval() +
  add_pvalue(location = "annotation",
             caption = glue::glue("Log-rank: {survfit2_p(s_Anaemia)}"),
             x = 50,
             y = 0.10,
             size = 5) +
  scale_ggsurvfit() +
  labs(y = "Probabilidad de supervivencia",
       x = "Tiempo (días)") +
  scale_fill_manual(values = paleta[c(1,3)]) +
  scale_color_manual(values = paleta[c(1,3)]) +
  ggplot2::theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16)
  )

db <- db %>% 
mutate(FGR_cat = factor(
case_when(FGR >= 60 ~ 0,
          FGR >= 30 & FGR < 60 ~ 1,
          FGR < 30 ~ 2
          ),
levels = 0:2,
labels = c("≥ 60","30-60","< 30")
))

s_fgr <- survfit2(Surv(TIME, Event) ~ FGR_cat, data = db)

g_FGR <- s_fgr %>% 
  ggsurvfit(linewidth = 1) +
  add_confidence_interval() +
  add_pvalue(location = "annotation",
             caption = glue::glue("Log-rank: {survfit2_p(s_fgr)}"),
             x = 50,
             y = 0.10,
             size = 5) +
  scale_ggsurvfit() +
  labs(y = "Probabilidad de supervivencia",
       x = "Tiempo (días)") +
  scale_fill_manual(values = paleta) +
  scale_color_manual(values = paleta) +
  ggplot2::theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16)
  )

Cuantitativas

s_edad <- survreg(Surv(TIME, Event) ~ splines::bs(Age) ,
data = db, dist = "gaussian")

p <- anova(s_edad)$`Pr(>Chi)`[2]
p <- ifelse(p < 0.001, "p < 0.001", sprintf("p = %.3f",p))

g_edad <- db %>%
ggplot(aes(x=Age,y=TIME)) +
geom_point(aes(alpha = event_cat),color=paleta[1], size =3) +
scale_alpha_manual(values = c(0.1,1)) +
geom_line(data = tibble(x=db$Age,y=predict(s_edad)),
aes(x=x,y=y), color = paleta[3],linewidth = 1,inherit.aes = FALSE) +
scale_color_manual(values = paleta) +
labs(y = "Tiempo medio hasta el evento", x= "Edad",
alpha = "Status al egreso", subtitle = p) +
theme_classic()+
theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16),
        legend.position = "top",
        plot.subtitle = element_text(size = 14)
)

s_EF <- survreg(Surv(TIME, Event) ~ splines::bs(Ejection.Fraction) ,
data = db, dist = "gaussian")

p <- anova(s_EF)$`Pr(>Chi)`[2]
p <- ifelse(p < 0.001, "p < 0.001", sprintf("p = %.3f",p))

g_EF <- db %>%
ggplot(aes(x=Ejection.Fraction,y=TIME)) +
geom_point(aes(alpha = event_cat),color=paleta[1], size =3) +
scale_alpha_manual(values = c(0.1,1)) +
geom_line(data = tibble(x=db$Ejection.Fraction,y=predict(s_EF)),
aes(x=x,y=y), color = paleta[3],linewidth = 1,inherit.aes = FALSE) +
scale_color_manual(values = paleta) +
labs(y = "Tiempo medio hasta el evento", x= "Fracción de eyección",
alpha = "Status al egreso", subtitle = p) +
theme_classic()+
theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16),
        legend.position = "top",
        plot.subtitle = element_text(size = 14)
)

s_Na <- survreg(Surv(TIME, Event) ~ splines::bs(Sodium) ,
data = db %>% filter(Sodium > 120), dist = "gaussian")

p <- anova(s_Na)$`Pr(>Chi)`[2]
p <- ifelse(p < 0.001, "p < 0.001", sprintf("p = %.3f",p))

g_Na <- db %>% filter(Sodium > 120) %>% 
ggplot(aes(x=Sodium,y=TIME)) +
geom_point(aes(alpha = event_cat),color=paleta[1], size =3) +
scale_alpha_manual(values = c(0.1,1)) +
geom_line(data = tibble(x=db$Sodium[db$Sodium>120],y=predict(s_Na)),
aes(x=x,y=y), color = paleta[3],linewidth = 1,inherit.aes = FALSE) +
scale_color_manual(values = paleta) +
labs(y = "Tiempo medio hasta el evento", x= "Sodio sérico",
alpha = "Status al egreso", subtitle = p) +
theme_classic()+
theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16),
        legend.position = "top",
        plot.subtitle = element_text(size = 14)
)

db <- db %>% 
mutate(Na_cat = factor(
case_when(Sodium >= 135 & Sodium <= 145 ~ 0,
Sodium < 135 ~ 1,
Sodium > 145 ~ 2),
levels = 0:2,
labels = c("135-145","< 135", "> 145")))

s_plaquetas <- survreg(Surv(TIME, Event) ~ splines::bs(Pletelets) ,
data = db %>% filter(Pletelets < 600), dist = "gaussian")

p <- anova(s_plaquetas)$`Pr(>Chi)`[2]
p <- ifelse(p < 0.001, "p < 0.001", sprintf("p = %.3f",p))

g_plaquetas <- db %>% filter(Pletelets < 600) %>%
ggplot(aes(x=Pletelets,y=TIME)) +
geom_point(aes(alpha = event_cat),color=paleta[1], size =3) +
scale_alpha_manual(values = c(0.1,1)) +
geom_line(data = tibble(x=db$Pletelets[db$Pletelets <600],y=predict(s_plaquetas)),
aes(x=x,y=y), color = paleta[3],linewidth = 1,inherit.aes = FALSE) +
scale_color_manual(values = paleta) +
labs(y = "Tiempo medio hasta el evento", x= "Plaquetas",
alpha = "Status al egreso", subtitle = p) +
theme_classic()+
theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16),
        legend.position = "top",
        plot.subtitle = element_text(size = 14)
)

s_CPK <- survreg(Surv(TIME, Event) ~ splines::bs(log(CPK)) ,
data = db, dist = "gaussian")

p <- anova(s_CPK)$`Pr(>Chi)`[2]
p <- ifelse(p < 0.001, "p < 0.001", sprintf("p = %.3f",p))

g_CPK <- db %>%
ggplot(aes(x=log(CPK),y=TIME)) +
geom_point(aes(alpha = event_cat),color=paleta[1], size =3) +
scale_alpha_manual(values = c(0.1,1)) +
geom_line(data = tibble(x=log(db$CPK),y=predict(s_CPK)),
aes(x=x,y=y), color = paleta[3],linewidth = 1,inherit.aes = FALSE) +
scale_color_manual(values = paleta) +
labs(y = "Tiempo medio hasta el evento", x= "Log(CPK)",
alpha = "Status al egreso", subtitle = p) +
theme_classic()+
theme(
        axis.title = element_text(face = "bold", size = 16),
        axis.text = element_text(size = 14),
        legend.text = element_text(size = 14),
        legend.title = element_text(face = "bold",
                                    size = 16),
        legend.position = "top",
        plot.subtitle = element_text(size = 14)
)

Modelo de regresión de Cox

modelo1 <- coxph(Surv(TIME, Event) ~ Age + BP + Anaemia + FGR_cat +
Ejection.Fraction+ Na_cat , data = db)
summary(modelo1)
Call:
coxph(formula = Surv(TIME, Event) ~ Age + BP + Anaemia + FGR_cat + 
    Ejection.Fraction + Na_cat, data = db)

  n= 299, number of events= 96 

                       coef exp(coef)  se(coef)      z Pr(>|z|)    
Age                0.036937  1.037628  0.009296  3.974 7.08e-05 ***
BPHTA              0.476140  1.609848  0.212005  2.246  0.02471 *  
AnaemiaAnemia      0.536851  1.710611  0.212977  2.521  0.01171 *  
FGR_cat30-60       0.650604  1.916699  0.245998  2.645  0.00818 ** 
FGR_cat< 30        1.294983  3.650934  0.292114  4.433 9.29e-06 ***
Ejection.Fraction -0.043055  0.957859  0.010789 -3.991 6.59e-05 ***
Na_cat< 135        0.266808  1.305789  0.224130  1.190  0.23388    
Na_cat> 145        1.074386  2.928194  1.041535  1.032  0.30229    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  exp(coef) exp(-coef) lower .95 upper .95
Age                  1.0376     0.9637    1.0189    1.0567
BPHTA                1.6098     0.6212    1.0625    2.4392
AnaemiaAnemia        1.7106     0.5846    1.1268    2.5968
FGR_cat30-60         1.9167     0.5217    1.1835    3.1042
FGR_cat< 30          3.6509     0.2739    2.0595    6.4722
Ejection.Fraction    0.9579     1.0440    0.9378    0.9783
Na_cat< 135          1.3058     0.7658    0.8416    2.0261
Na_cat> 145          2.9282     0.3415    0.3802   22.5505

Concordance= 0.734  (se = 0.027 )
Likelihood ratio test= 80.52  on 8 df,   p=4e-14
Wald test            = 80.47  on 8 df,   p=4e-14
Score (logrank) test = 86.04  on 8 df,   p=3e-15
modelo2 <- coxph(Surv(TIME, Event) ~ Age + BP + Anaemia + FGR_cat + 
Ejection.Fraction, data = db)
summary(modelo2)
Call:
coxph(formula = Surv(TIME, Event) ~ Age + BP + Anaemia + FGR_cat + 
    Ejection.Fraction, data = db)

  n= 299, number of events= 96 

                       coef exp(coef)  se(coef)      z Pr(>|z|)    
Age                0.039159  1.039936  0.009201  4.256 2.08e-05 ***
BPHTA              0.483081  1.621062  0.211219  2.287  0.02219 *  
AnaemiaAnemia      0.508608  1.662975  0.211678  2.403  0.01627 *  
FGR_cat30-60       0.683916  1.981623  0.240861  2.839  0.00452 ** 
FGR_cat< 30        1.368169  3.928152  0.282782  4.838 1.31e-06 ***
Ejection.Fraction -0.044232  0.956732  0.010484 -4.219 2.45e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  exp(coef) exp(-coef) lower .95 upper .95
Age                  1.0399     0.9616    1.0214    1.0589
BPHTA                1.6211     0.6169    1.0715    2.4524
AnaemiaAnemia        1.6630     0.6013    1.0983    2.5181
FGR_cat30-60         1.9816     0.5046    1.2359    3.1772
FGR_cat< 30          3.9282     0.2546    2.2568    6.8374
Ejection.Fraction    0.9567     1.0452    0.9373    0.9766

Concordance= 0.729  (se = 0.028 )
Likelihood ratio test= 78.4  on 6 df,   p=8e-15
Wald test            = 77.9  on 6 df,   p=1e-14
Score (logrank) test = 82.95  on 6 df,   p=9e-16
compare_performance(modelo1,modelo2)
# Comparison of Model Performance Indices

Name    | Model | AIC (weights) | AICc (weights) | BIC (weights)
----------------------------------------------------------------
modelo1 | coxph | 953.9 (0.281) |  954.4 (0.260) | 983.5 (0.010)
modelo2 | coxph | 952.0 (0.719) |  952.3 (0.740) | 974.2 (0.990)

Name    | Nagelkerke's R2 |  RMSE | Sigma
-----------------------------------------
modelo1 |           0.244 | 0.545 | 0.000
modelo2 |           0.239 | 0.545 | 0.000

g1 <- ggcoxzph(cox.zph(modelo2), var = c("Age","BP","Anaemia"),
point.col = paleta[1],font.main = 10)
gridExtra::marrangeGrob(g1, nrow = 1, ncol = 3, top = NA)

g2 <- ggcoxzph(cox.zph(modelo2), var = c("FGR_cat","Ejection.Fraction"),
point.col = paleta[1],font.main = 10)
gridExtra::marrangeGrob(g2, nrow = 1, ncol = 2, top = NA)

Conclusiones

La edad estuvo asociada con un aumento moderado pero significativo del riesgo (HR=1,04; IC95%: 1,02-1,06 por cada año adicional). La presencia de hipertensión (HR=1,62; IC95%: 1,07-2,45) y anemia (HR=1,66; IC95%: 1,10-2,52) también incrementaron significativamente el riesgo. Además, el riesgo aumentó marcadamente con la disminución del filtrado glomerular (FGR): HR=1,98 (IC95%: 1,24-3,18) para un FGR entre 30 y 60, y HR=3,93 (IC95%: 2,26-6,84) para un FGR inferior a 30. Por otro lado, una mayor fracción de eyección mostró un efecto protector significativo, reduciendo el riesgo en un 4 % por cada unidad adicional (HR=0,96; IC95%: 0,94-0,98).

¡MUCHAS GRACIAS!