Problem statement
The stroke.xlsx dataset describes patients with stroke (cerebrovascular accident), including variables such as patient ID, age, gender, hypertension, diabetes, smoking habit, etc.
Variables (among others):
- ID: patient identifier
- gender: patient gender (“F” or “M”)
- age: age
- htn: hypertension (“Yes” / “No”)
- dm: diabetes mellitus (“Yes” / “No”)
- Smoking: smoking habit (“Yes” / “No”)
- type: type of stroke (“ischemic” / “hemorrhage”)
- outcome: status at discharge (“alive” / “dead”)
- cholest: cholesterol (mg/dl)
- glyc: glycemia (mg/dl)
To work through this guide you will need the {tidyverse} package (in particular {dplyr}) to manipulate and process the data, plus {readxl} to read the Excel files.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.3
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tibble' was built under R version 4.4.3
## Warning: package 'readr' was built under R version 4.4.3
## Warning: package 'stringr' was built under R version 4.4.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(readxl)
Exercise 1
stroke <- read_xlsx("stroke.xlsx")
str(stroke)
## tibble [1,000 × 10] (S3: tbl_df/tbl/data.frame)
## $ ID : num [1:1000] 1001 1002 1003 1004 1005 ...
## $ gender : chr [1:1000] "F" "M" "F" "M" ...
## $ age : num [1:1000] 64 59 53 57 74 68 83 65 58 64 ...
## $ htn : chr [1:1000] "Yes" "No" "Yes" "No" ...
## $ dm : chr [1:1000] "Yes" "No" "No" "No" ...
## $ Smoking: chr [1:1000] "Yes" "No" "No" "No" ...
## $ type : chr [1:1000] "ischemic" "ischemic" "ischemic" "hemorrhage" ...
## $ outcome: chr [1:1000] "alive" "alive" "alive" "alive" ...
## $ cholest: num [1:1000] 185 218 206 190 192 210 219 252 183 253 ...
## $ glyc : num [1:1000] 74 84 116 119 90 118 92 116 113 103 ...
nrow(stroke)
## [1] 1000
ncol(stroke)
## [1] 10
sapply(stroke, class)
## ID gender age htn dm Smoking
## "numeric" "character" "numeric" "character" "character" "character"
## type outcome cholest glyc
## "character" "character" "numeric" "numeric"
stroke %>% mutate(gender = as.factor(gender), htn = as.factor(htn), type = as.factor(type), outcome = as.factor(outcome), Smoking = as.factor(Smoking), dm = as.factor(dm))
## # A tibble: 1,000 × 10
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <fct> <dbl> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl>
## 1 1001 F 64 Yes Yes Yes ischemic alive 185 74
## 2 1002 M 59 No No No ischemic alive 218 84
## 3 1003 F 53 Yes No No ischemic alive 206 116
## 4 1004 M 57 No No No hemorrhage alive 190 119
## 5 1005 M 74 No No Yes ischemic alive 192 90
## 6 1006 F 68 No Yes No ischemic alive 210 118
## 7 1007 F 83 Yes Yes No ischemic dead 219 92
## 8 1008 M 65 No No Yes ischemic alive 252 116
## 9 1009 M 58 No Yes No ischemic alive 183 113
## 10 1010 F 64 No No No ischemic alive 253 103
## # ℹ 990 more rows
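The call above prints the converted tibble but does not assign it, which is why the later outputs still show <chr> columns. As a minimal sketch, assuming a small made-up tibble called toy (hypothetical, not the real data), across() can convert every character column in one step and the result can be stored:

```r
library(dplyr)

# Hypothetical toy data standing in for a few columns of stroke
toy <- tibble(
  ID = c(1, 2, 3),
  gender = c("F", "M", "F"),
  htn = c("Yes", "No", "Yes")
)

# Convert every character column to a factor and keep the result
toy_fct <- toy %>%
  mutate(across(where(is.character), as.factor))

# Inspect the resulting column classes
sapply(toy_fct, class)
```

Numeric columns such as ID are left untouched because where(is.character) only selects character columns.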
stroke_clean <- stroke %>% drop_na()
stroke_clean <- stroke %>%
drop_na() %>%
distinct()
stroke_clean <- stroke %>%
drop_na() %>%
distinct() %>%
arrange(desc(glyc))
# Resulting dataset:
stroke_clean
## # A tibble: 968 × 10
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1647 M 68 No No No ischemic alive 202 181
## 2 1857 M 95 No No No ischemic alive 171 179
## 3 1580 F 84 No No No ischemic dead 222 177
## 4 1768 F 93 No No No ischemic alive 153 170
## 5 1673 F 52 Yes No No ischemic alive 200 169
## 6 1803 F 58 No No Yes hemorrhage dead 249 169
## 7 1625 F 62 No No No ischemic alive 135 168
## 8 1743 F 82 Yes No Yes hemorrhage dead 260 168
## 9 1969 M 61 No No No hemorrhage alive 211 168
## 10 1543 M 78 No Yes No ischemic alive 170 167
## # ℹ 958 more rows
stroke_clean %>% filter(htn == "Yes" & Smoking =="Yes")
## # A tibble: 76 × 10
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1743 F 82 Yes No Yes hemorrhage dead 260 168
## 2 1715 F 78 Yes No Yes ischemic alive 220 166
## 3 1280 M 74 Yes Yes Yes ischemic alive 186 161
## 4 1212 F 57 Yes No Yes hemorrhage dead 240 153
## 5 1573 M 68 Yes No Yes hemorrhage alive 291 153
## 6 1594 F 55 Yes Yes Yes ischemic alive 224 152
## 7 1771 M 80 Yes No Yes ischemic alive 148 150
## 8 1216 M 77 Yes No Yes ischemic alive 243 146
## 9 1645 F 72 Yes No Yes ischemic alive 207 141
## 10 1722 M 73 Yes No Yes ischemic alive 217 139
## # ℹ 66 more rows
stroke_clean %>%
filter(htn == "Yes" & Smoking =="Yes") %>%
count()
## # A tibble: 1 × 1
## n
## <int>
## 1 76
stroke_clean %>%
filter(htn == "Yes" & Smoking =="Yes") %>%
reframe(rango = range(age))
## # A tibble: 2 × 1
## rango
## <dbl>
## 1 52
## 2 95
Add a column to the stroke_clean dataset called colesterol_categoria that is “alto” if cholest >= 200 and “normal” otherwise.
stroke_clean %>%
mutate(colesterol_categoria = case_when(cholest < 200 ~ "normal", cholest >= 200 ~ "alto"))
## # A tibble: 968 × 11
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1647 M 68 No No No ischemic alive 202 181
## 2 1857 M 95 No No No ischemic alive 171 179
## 3 1580 F 84 No No No ischemic dead 222 177
## 4 1768 F 93 No No No ischemic alive 153 170
## 5 1673 F 52 Yes No No ischemic alive 200 169
## 6 1803 F 58 No No Yes hemorrhage dead 249 169
## 7 1625 F 62 No No No ischemic alive 135 168
## 8 1743 F 82 Yes No Yes hemorrhage dead 260 168
## 9 1969 M 61 No No No hemorrhage alive 211 168
## 10 1543 M 78 No Yes No ischemic alive 170 167
## # ℹ 958 more rows
## # ℹ 1 more variable: colesterol_categoria <chr>
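For a two-level recode like this one, if_else() is a compact alternative to case_when(). The sketch below uses a made-up tibble (toy is hypothetical, not the real data) and includes a missing value to show that it stays missing rather than being labelled:

```r
library(dplyr)

# Hypothetical cholesterol values, including the 200 cutoff and an NA
toy <- tibble(cholest = c(150, 200, 260, NA))

# "alto" when cholest >= 200, "normal" otherwise; NA stays NA
toy_cat <- toy %>%
  mutate(colesterol_categoria = if_else(cholest >= 200, "alto", "normal"))

toy_cat$colesterol_categoria
```

With three or more categories, as in glucemia_categoria, case_when() remains the more readable choice.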
stroke_clean %>%
mutate(colesterol_categoria = case_when(cholest < 200 ~ "normal", cholest >= 200 ~ "alto"),
glucemia_categoria = case_when(glyc < 110 ~ "normal", glyc >= 110 & glyc <= 125 ~ "prediabetes", glyc >= 126 ~ "diabetes"))
## # A tibble: 968 × 12
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1647 M 68 No No No ischemic alive 202 181
## 2 1857 M 95 No No No ischemic alive 171 179
## 3 1580 F 84 No No No ischemic dead 222 177
## 4 1768 F 93 No No No ischemic alive 153 170
## 5 1673 F 52 Yes No No ischemic alive 200 169
## 6 1803 F 58 No No Yes hemorrhage dead 249 169
## 7 1625 F 62 No No No ischemic alive 135 168
## 8 1743 F 82 Yes No Yes hemorrhage dead 260 168
## 9 1969 M 61 No No No hemorrhage alive 211 168
## 10 1543 M 78 No Yes No ischemic alive 170 167
## # ℹ 958 more rows
## # ℹ 2 more variables: colesterol_categoria <chr>, glucemia_categoria <chr>
stroke_clean %>%
group_by(type,outcome) %>%
summarise (media_edad_subgrupo = mean(age, na.rm = TRUE), mediana_colesterol_subgrupo = median(cholest, na.rm = TRUE), ntotal = n()) %>%
ungroup()
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 5
## type outcome media_edad_subgrupo mediana_colesterol_subgrupo ntotal
## <chr> <chr> <dbl> <dbl> <int>
## 1 hemorrhage alive 74.7 201 187
## 2 hemorrhage dead 71.7 206. 48
## 3 ischemic alive 73.6 201 586
## 4 ischemic dead 72.8 201 147
stroke_clean %>%
group_by(type, outcome) %>%
summarise (media_edad_subgrupo = mean(age, na.rm = TRUE),
mediana_colesterol_subgrupo = median(cholest, na.rm = TRUE), ntotal = n()) %>%
arrange(desc(ntotal)) %>%
ungroup()
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 5
## type outcome media_edad_subgrupo mediana_colesterol_subgrupo ntotal
## <chr> <chr> <dbl> <dbl> <int>
## 1 ischemic alive 73.6 201 586
## 2 hemorrhage alive 74.7 201 187
## 3 ischemic dead 72.8 201 147
## 4 hemorrhage dead 71.7 206. 48
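The message printed above points to the .groups argument of summarise(). As a minimal sketch on made-up data (toy and resumen are hypothetical names), passing .groups = "drop" returns an ungrouped tibble directly, so the trailing ungroup() is not needed and the message is not shown:

```r
library(dplyr)

# Hypothetical toy data with the same grouping columns
toy <- tibble(
  type = c("ischemic", "ischemic", "hemorrhage", "hemorrhage"),
  outcome = c("alive", "dead", "alive", "alive"),
  age = c(70, 80, 60, 64)
)

# Summarise per subgroup and drop all grouping in the same step
resumen <- toy %>%
  group_by(type, outcome) %>%
  summarise(media_edad = mean(age), ntotal = n(), .groups = "drop") %>%
  arrange(desc(ntotal))

resumen
```

Dropping the groups before arrange() also makes explicit that the sort applies to the whole table.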
stroke_clean %>%
drop_na() %>%
group_by(gender) %>%
summarise(promedio_glyc = mean(glyc, na.rm = TRUE)) %>%
ungroup()
## # A tibble: 2 × 2
## gender promedio_glyc
## <chr> <dbl>
## 1 F 110.
## 2 M 111.
-The percentage of cases with hypertension.
stroke_clean %>%
drop_na() %>%
group_by(gender) %>%
summarise(porcentaje_casos_hta = sum(htn == "Yes")/n() * 100) %>%
ungroup()
## # A tibble: 2 × 2
## gender porcentaje_casos_hta
## <chr> <dbl>
## 1 F 28.7
## 2 M 34.7
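Because a logical vector averages to the share of TRUE values, the same percentage can be written with mean(). A small sketch on made-up data (toy and pct are hypothetical names) comparing both forms:

```r
library(dplyr)

# Hypothetical toy data: 1 of 4 women and 2 of 2 men are hypertensive
toy <- tibble(
  gender = c("F", "F", "F", "F", "M", "M"),
  htn = c("Yes", "No", "No", "No", "Yes", "Yes")
)

pct <- toy %>%
  group_by(gender) %>%
  summarise(
    pct_sum = sum(htn == "Yes") / n() * 100,  # form used in this guide
    pct_mean = mean(htn == "Yes") * 100       # equivalent shortcut
  )

pct
```

Both columns agree as long as htn has no missing values; with NAs, sum() and mean() would both return NA unless na.rm = TRUE is set.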
-The joint frequency of diabetes and smoking.
stroke_clean %>%
drop_na() %>%
group_by(gender) %>%
summarise(frecuencia_DBT_TBQ = sum(dm == "Yes" & Smoking == "Yes")/n() * 100) %>%
ungroup()
stroke_clean %>%
drop_na() %>% mutate(colesterol_categoria = case_when(cholest < 200 ~ "normal", cholest >= 200 ~ "alto"),
glucemia_categoria = case_when(glyc < 110 ~ "normal", glyc >= 110 & glyc <= 125 ~ "prediabetes", glyc >= 126 ~ "diabetes")) %>%
group_by(gender, type) %>%
summarise(proporción_fumadores = sum(glucemia_categoria == "diabetes" & Smoking == "Yes")/n() * 100) %>%
ungroup()
## `summarise()` has grouped output by 'gender'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 3
## gender type proporción_fumadores
## <chr> <chr> <dbl>
## 1 F hemorrhage 8.76
## 2 F ischemic 4.83
## 3 M hemorrhage 8.16
## 4 M ischemic 6.47
Exercise 2
In many medical settings, the patient table (such as stroke) may be linked to another table that includes, for example, complementary laboratory data or the course of the hospital stay. In this exercise we will work with a second dataset, stroke_lab, which holds the patient id (some of these patients match those from the previous exercise, but not all) and a few extra variables obtained from a later laboratory analysis.
Load the data from the stroke_lab.xlsx file into the stroke_lab object.
stroke_lab <- read_xlsx("stroke_lab.xlsx")
class(stroke_lab)
## [1] "tbl_df" "tbl" "data.frame"
str(stroke_lab)
## tibble [800 × 3] (S3: tbl_df/tbl/data.frame)
## $ id : num [1:800] 1414 1462 1178 1525 1194 ...
## $ BUN : num [1:800] 37 18.5 38.4 40.2 44.6 ...
## $ Creatinina: num [1:800] 1.22 0.56 1.31 1.37 1.21 1.09 0.65 1.73 0.96 1.1 ...
dim(stroke_lab)
## [1] 800 3
nrow(stroke_lab)
## [1] 800
ncol(stroke_lab)
## [1] 3
sapply(stroke_lab,class)
## id BUN Creatinina
## "numeric" "numeric" "numeric"
stroke_left <- stroke %>%
left_join(stroke_lab, by = c("ID" = "id"))
How many rows does stroke_left have?
nrow(stroke_left)
## [1] 1000
How many NAs appear in the new columns for the IDs that do not match?
colSums(is.na(stroke_left))
## ID gender age htn dm Smoking type
## 0 10 0 4 14 4 0
## outcome cholest glyc BUN Creatinina
## 0 0 0 439 439
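To restrict the count to the columns added by the join, across() can be used inside summarise(). This is a minimal sketch on made-up tables (pacientes, lab, unidos and na_nuevas are hypothetical names, not the real data), and it assumes the lab table has no missing values of its own, so every NA in the new columns comes from an unmatched ID:

```r
library(dplyr)

# Hypothetical patient and lab tables; IDs 1 and 3 have no lab record
pacientes <- tibble(ID = c(1, 2, 3, 4), age = c(60, 71, 55, 80))
lab <- tibble(
  id = c(2, 4, 9),
  BUN = c(30.1, 22.4, 41.0),
  Creatinina = c(1.1, 0.8, 1.5)
)

unidos <- left_join(pacientes, lab, by = c("ID" = "id"))

# Count NAs only in the columns that came from the lab table
na_nuevas <- unidos %>%
  summarise(across(c(BUN, Creatinina), ~ sum(is.na(.x))))

na_nuevas
```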
stroke_left %>%
mutate(filtrado = case_when (is.na(BUN) | is.na(Creatinina) ~ "SinLab", !is.na(BUN) & !is.na(Creatinina) ~ "TieneLab"))
## # A tibble: 1,000 × 13
## ID gender age htn dm Smoking type outcome cholest glyc BUN
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1001 F 64 Yes Yes Yes ischemic alive 185 74 17.8
## 2 1002 M 59 No No No ischemic alive 218 84 NA
## 3 1003 F 53 Yes No No ischemic alive 206 116 NA
## 4 1004 M 57 No No No hemorrhage alive 190 119 47.8
## 5 1005 M 74 No No Yes ischemic alive 192 90 25.3
## 6 1006 F 68 No Yes No ischemic alive 210 118 NA
## 7 1007 F 83 Yes Yes No ischemic dead 219 92 40.0
## 8 1008 M 65 No No Yes ischemic alive 252 116 43.7
## 9 1009 M 58 No Yes No ischemic alive 183 113 30.8
## 10 1010 F 64 No No No ischemic alive 253 103 47.9
## # ℹ 990 more rows
## # ℹ 2 more variables: Creatinina <dbl>, filtrado <chr>
stroke_left %>%
mutate(filtrado = case_when(is.na(BUN) | is.na(Creatinina) ~ "SinLab", !is.na(BUN) & !is.na(Creatinina) ~ "TieneLab")) %>%
group_by(type) %>%
summarise(proporción_faltantes = sum(filtrado == "SinLab")/n() * 100) %>%
ungroup()
## # A tibble: 2 × 2
## type proporción_faltantes
## <chr> <dbl>
## 1 hemorrhage 45.2
## 2 ischemic 43.5
stroke %>%
full_join(stroke_lab, by = c("ID" = "id"))
## # A tibble: 1,239 × 12
## ID gender age htn dm Smoking type outcome cholest glyc BUN
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1001 F 64 Yes Yes Yes ischemic alive 185 74 17.8
## 2 1002 M 59 No No No ischemic alive 218 84 NA
## 3 1003 F 53 Yes No No ischemic alive 206 116 NA
## 4 1004 M 57 No No No hemorrhage alive 190 119 47.8
## 5 1005 M 74 No No Yes ischemic alive 192 90 25.3
## 6 1006 F 68 No Yes No ischemic alive 210 118 NA
## 7 1007 F 83 Yes Yes No ischemic dead 219 92 40.0
## 8 1008 M 65 No No Yes ischemic alive 252 116 43.7
## 9 1009 M 58 No Yes No ischemic alive 183 113 30.8
## 10 1010 F 64 No No No ischemic alive 253 103 47.9
## # ℹ 1,229 more rows
## # ℹ 1 more variable: Creatinina <dbl>
stroke %>%
inner_join(stroke_lab, by = c("ID" = "id"))
## # A tibble: 561 × 12
## ID gender age htn dm Smoking type outcome cholest glyc BUN
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1001 F 64 Yes Yes Yes ischemic alive 185 74 17.8
## 2 1004 M 57 No No No hemorrhage alive 190 119 47.8
## 3 1005 M 74 No No Yes ischemic alive 192 90 25.3
## 4 1007 F 83 Yes Yes No ischemic dead 219 92 40.0
## 5 1008 M 65 No No Yes ischemic alive 252 116 43.7
## 6 1009 M 58 No Yes No ischemic alive 183 113 30.8
## 7 1010 F 64 No No No ischemic alive 253 103 47.9
## 8 1012 F 89 No No No hemorrhage alive 174 98 21.3
## 9 1015 F 68 No <NA> No ischemic alive 217 102 37.7
## 10 1016 M 58 No No Yes ischemic alive 145 83 31.2
## # ℹ 551 more rows
## # ℹ 1 more variable: Creatinina <dbl>
stroke %>%
right_join(stroke_lab, by = c("ID" = "id"))
## # A tibble: 800 × 12
## ID gender age htn dm Smoking type outcome cholest glyc BUN
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1001 F 64 Yes Yes Yes ischemic alive 185 74 17.8
## 2 1004 M 57 No No No hemorrhage alive 190 119 47.8
## 3 1005 M 74 No No Yes ischemic alive 192 90 25.3
## 4 1007 F 83 Yes Yes No ischemic dead 219 92 40.0
## 5 1008 M 65 No No Yes ischemic alive 252 116 43.7
## 6 1009 M 58 No Yes No ischemic alive 183 113 30.8
## 7 1010 F 64 No No No ischemic alive 253 103 47.9
## 8 1012 F 89 No No No hemorrhage alive 174 98 21.3
## 9 1015 F 68 No <NA> No ischemic alive 217 102 37.7
## 10 1016 M 58 No No Yes ischemic alive 145 83 31.2
## # ℹ 790 more rows
## # ℹ 1 more variable: Creatinina <dbl>
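The row counts of the four joins are linked by a simple identity when each ID appears at most once in each table (an assumption here, not something checked above): full = left + right - inner. A minimal sketch on made-up tables (x, y and the n_* names are hypothetical):

```r
library(dplyr)

# Hypothetical tables with unique keys; IDs 2 and 4 appear in both
x <- tibble(ID = c(1, 2, 3, 4), age = c(60, 71, 55, 80))
y <- tibble(id = c(2, 4, 9), BUN = c(30.1, 22.4, 41.0))
by_id <- c("ID" = "id")

n_left  <- nrow(left_join(x, y, by = by_id))   # every row of x
n_right <- nrow(right_join(x, y, by = by_id))  # every row of y
n_inner <- nrow(inner_join(x, y, by = by_id))  # only matching IDs
n_full  <- nrow(full_join(x, y, by = by_id))   # all IDs from both tables

# Matching rows are counted in both left and right, so subtract them once
n_full == n_left + n_right - n_inner
```

The outputs above follow the same pattern: 1,000 + 800 - 561 = 1,239.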
Verify.
semi_join(stroke, stroke_lab, by = c("ID" = "id"))
## # A tibble: 561 × 10
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1001 F 64 Yes Yes Yes ischemic alive 185 74
## 2 1004 M 57 No No No hemorrhage alive 190 119
## 3 1005 M 74 No No Yes ischemic alive 192 90
## 4 1007 F 83 Yes Yes No ischemic dead 219 92
## 5 1008 M 65 No No Yes ischemic alive 252 116
## 6 1009 M 58 No Yes No ischemic alive 183 113
## 7 1010 F 64 No No No ischemic alive 253 103
## 8 1012 F 89 No No No hemorrhage alive 174 98
## 9 1015 F 68 No <NA> No ischemic alive 217 102
## 10 1016 M 58 No No Yes ischemic alive 145 83
## # ℹ 551 more rows
What does the code anti_join(stroke, stroke_lab, by = c("ID" = "id")) do?
Answer: it returns the rows of stroke whose ID has no match in stroke_lab, keeping only the columns of stroke (nothing from stroke_lab is added). Verify.
anti_join(stroke, stroke_lab, by =c("ID" = "id"))
## # A tibble: 439 × 10
## ID gender age htn dm Smoking type outcome cholest glyc
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 1002 M 59 No No No ischemic alive 218 84
## 2 1003 F 53 Yes No No ischemic alive 206 116
## 3 1006 F 68 No Yes No ischemic alive 210 118
## 4 1011 M 59 No Yes No ischemic alive 141 86
## 5 1013 M 62 No No No ischemic alive 146 122
## 6 1014 M 93 Yes No No ischemic alive 212 98
## 7 1020 M 66 Yes No No ischemic dead 138 100
## 8 1021 M 65 No No Yes ischemic dead 235 82
## 9 1024 <NA> 68 No No No hemorrhage alive 197 99
## 10 1026 M 69 Yes No No ischemic dead 185 150
## # ℹ 429 more rows
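semi_join() and anti_join() are filtering joins: both return only columns of the first table, and together they split its rows into those with and without a match. A minimal sketch on made-up tables (pacientes, lab, con_lab and sin_lab are hypothetical names, not the real data):

```r
library(dplyr)

# Hypothetical tables; IDs 2 and 4 have a lab record, IDs 1 and 3 do not
pacientes <- tibble(ID = c(1, 2, 3, 4), age = c(60, 71, 55, 80))
lab <- tibble(id = c(2, 4, 9), BUN = c(30.1, 22.4, 41.0))

con_lab <- semi_join(pacientes, lab, by = c("ID" = "id"))  # rows with a match
sin_lab <- anti_join(pacientes, lab, by = c("ID" = "id"))  # rows without a match

# The two pieces add up to the original table, and no lab column is added
nrow(con_lab) + nrow(sin_lab) == nrow(pacientes)
names(sin_lab)
```

In the outputs above, the same split gives 561 + 439 = 1,000 rows.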
stroke_con_lab <- inner_join(stroke_clean,stroke_lab, by = c("ID" = "id"))
stroke_con_lab %>%
filter(Creatinina > 1.2)
## # A tibble: 261 × 12
## ID gender age htn dm Smoking type outcome cholest glyc BUN
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1768 F 93 No No No ischemic alive 153 170 38.5
## 2 1969 M 61 No No No hemorrhage alive 211 168 36.4
## 3 1543 M 78 No Yes No ischemic alive 170 167 18.8
## 4 1737 M 62 No No No ischemic alive 252 167 25.0
## 5 1235 F 52 No Yes No ischemic alive 128 166 17.5
## 6 1280 M 74 Yes Yes Yes ischemic alive 186 161 26.1
## 7 1315 F 75 No No Yes ischemic alive 209 161 26.6
## 8 1740 F 62 Yes No No ischemic dead 234 161 23.7
## 9 1524 F 52 No Yes No ischemic dead 201 160 31.7
## 10 1959 F 90 Yes Yes No ischemic alive 217 159 22.2
## # ℹ 251 more rows
## # ℹ 1 more variable: Creatinina <dbl>
stroke_con_lab %>%
count(Creatinina > 1.2)
## # A tibble: 2 × 2
## `Creatinina > 1.2` n
## <lgl> <int>
## 1 FALSE 282
## 2 TRUE 261
stroke_con_lab %>%
drop_na() %>%
summarise(proporción_hipertensos = sum(htn == "Yes")/n() * 100)
## # A tibble: 1 × 1
## proporción_hipertensos
## <dbl>
## 1 31.3
stroke_con_lab %>%
drop_na() %>%
summarise(proporción_DBT = sum(glyc >= 126)/n() * 100)
## # A tibble: 1 × 1
## proporción_DBT
## <dbl>
## 1 26.9
-Group by outcome and compute the mean BUN in each group.
stroke_con_lab %>%
group_by(outcome) %>%
summarise (promedio_BUN_por_outcome = mean(BUN)) %>%
ungroup()
## # A tibble: 2 × 2
## outcome promedio_BUN_por_outcome
## <chr> <dbl>
## 1 alive 32.8
## 2 dead 31.1
EXTRA: As practice with pipes, build a single pipeline (that is, one or more chained %>%) that, on the stroke_con_lab dataset:
- Filters patients with no missing values in cholest or glyc,
- Recodes a column colesterol_alto according to whether cholest >= 200,
- Groups by high cholesterol and gender and computes the mean glycemia and the proportion of each outcome in every group.
stroke_con_lab %>%
filter(!is.na(cholest), !is.na(glyc)) %>%
mutate(colesterol_alto = case_when(cholest >= 200 ~ "Si", cholest < 200 ~ "No")) %>%
group_by(colesterol_alto, gender) %>%
summarise(media_glucemia_agrupada = mean(glyc, na.rm = TRUE), proporción_vivos_por_grupo = sum(outcome == "alive")/n()*100, proporción_muertos_por_grupo = sum(outcome == "dead")/n()*100) %>%
ungroup()
## `summarise()` has grouped output by 'colesterol_alto'. You can override using
## the `.groups` argument.
## # A tibble: 4 × 5
## colesterol_alto gender media_glucemia_agrupada proporción_vivos_por_grupo
## <chr> <chr> <dbl> <dbl>
## 1 No F 112. 80.1
## 2 No M 108. 76.7
## 3 Si F 112. 77.6
## 4 Si M 112. 85.3
## # ℹ 1 more variable: proporción_muertos_por_grupo <dbl>