Yohannes Deboch’s TIDYVERSE RECIPE

I used Yohannes’s Tidyverse example to explore how different ages are distributed with respect to heart disease risk. I used cut() to split the ages into ten buckets and the dplyr::recode function to relabel them.

Load Libraries

# load package(s)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0       v purrr   0.3.0  
## v tibble  2.0.1       v dplyr   0.8.0.1
## v tidyr   0.8.3       v stringr 1.3.1  
## v readr   1.3.1       v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Load Data

Read the data using readr

disease <- read_csv("heart.csv")
## Parsed with column specification:
## cols(
##   age = col_double(),
##   sex = col_double(),
##   cp = col_double(),
##   trestbps = col_double(),
##   chol = col_double(),
##   fbs = col_double(),
##   restecg = col_double(),
##   thalach = col_double(),
##   exang = col_double(),
##   oldpeak = col_double(),
##   slope = col_double(),
##   ca = col_double(),
##   thal = col_double(),
##   target = col_double()
## )
head(disease) %>% kable() %>% kable_styling()
age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
 63    1   3       145   233    1        0      150      0      2.3      0   0     1       1
 37    1   2       130   250    0        1      187      0      3.5      0   0     2       1
 41    0   1       130   204    0        0      172      0      1.4      2   0     2       1
 56    1   1       120   236    0        1      178      0      0.8      2   0     2       1
 57    0   0       120   354    0        1      163      1      0.6      2   0     2       1
 57    1   0       140   192    0        1      148      0      0.4      1   0     1       1
# Split age into 10 equal-width intervals
disease$age_bucket <- cut(disease$age, breaks = 10)

# Relabel each interval; the names on the left must match the factor levels produced by cut()
df <- disease %>%
  mutate(age_bucket = dplyr::recode(age_bucket,
                                    "(29,33.8]"   = "Risk level 0",
                                    "(33.8,38.6]" = "Risk level 1",
                                    "(38.6,43.4]" = "Risk level 2",
                                    "(43.4,48.2]" = "Risk level 3",
                                    "(48.2,53]"   = "Risk level 4",
                                    "(53,57.8]"   = "Risk level 5",
                                    "(57.8,62.6]" = "Risk level 6",
                                    "(62.6,67.4]" = "Risk level 7",
                                    "(67.4,72.2]" = "Risk level 8",
                                    "(72.2,77]"   = "Risk level 9"))
head(df) %>% kable() %>% kable_styling()
age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target  age_bucket
 63    1   3       145   233    1        0      150      0      2.3      0   0     1       1  Risk level 7
 37    1   2       130   250    0        1      187      0      3.5      0   0     2       1  Risk level 1
 41    0   1       130   204    0        0      172      0      1.4      2   0     2       1  Risk level 2
 56    1   1       120   236    0        1      178      0      0.8      2   0     2       1  Risk level 5
 57    0   0       120   354    0        1      163      1      0.6      2   0     2       1  Risk level 5
 57    1   0       140   192    0        1      148      0      0.4      1   0     1       1  Risk level 5
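
The recode step depends on typing each interval string exactly as cut() prints it. As a possible alternative, cut() also accepts a labels argument, which names the buckets in order without hardcoding the intervals. The sketch below uses a handful of illustrative ages rather than heart.csv, and the names ages and bucket_labels are hypothetical.

```r
# Illustrative ages only; these are not read from heart.csv
ages <- c(29, 37, 41, 56, 57, 63, 77)

# Hypothetical label vector: one label per bucket, in increasing age order
bucket_labels <- paste("Risk level", 0:9)

# cut() assigns the labels to the 10 equal-width intervals in order,
# so no interval strings such as "(29,33.8]" need to be typed out
age_bucket <- cut(ages, breaks = 10, labels = bucket_labels)

data.frame(age = ages, age_bucket = age_bucket)
```

Because the intervals are derived from the range of the input, the labels only line up with the buckets above when the ages span the same range (29 to 77) as in the original data.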

Graphs

table(df$age_bucket) %>% barplot(., ylab = "Frequency", main = "Heart Disease Risk Distribution")
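
The bar plot counts how many records fall into each age bucket. To compare buckets by outcome as well, one option is to summarise the target column per bucket. This is a minimal sketch on made-up rows, not the heart.csv values, and it assumes that target == 1 marks a record with heart disease, which the source does not state. The names toy and bucket_summary are hypothetical.

```r
library(dplyr)

# Made-up rows for illustration only; not values from heart.csv
toy <- data.frame(
  age_bucket = c("Risk level 1", "Risk level 1", "Risk level 5",
                 "Risk level 5", "Risk level 5", "Risk level 7"),
  target     = c(1, 0, 1, 1, 0, 0)
)

# Per bucket: number of rows and the share of rows with target == 1
# (assumption: target == 1 means heart disease is present)
bucket_summary <- toy %>%
  group_by(age_bucket) %>%
  summarise(n = n(), disease_share = mean(target == 1))

bucket_summary
```

Applied to df, the same group_by and summarise calls would give one row per age bucket.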

Conclusion

According to our graph, ages between 57 and 62 are high risk. Between ages 33 and 62, heart disease increases with every age bucket, and it decreases after age 62.