Yohannes Deboch’s TIDYVERSE RECIPE

I used Yohannes’s Tidyverse example to understand how different ages are distributed in heart disease risk. I used the dplyr::recode function to put the ages into buckets.

Load Libraries

# load package(s)
library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0       v purrr   0.3.0  
## v tibble  2.0.1       v dplyr   0.8.0.1
## v tidyr   0.8.3       v stringr 1.3.1  
## v readr   1.3.1       v forcats 0.4.0

## -- Conflicts ---------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

Load Data

Data source

https://www.kaggle.com/ronitf/heart-disease-uci

Read the data using readr

disease <- read_csv("heart.csv")

## Parsed with column specification:
## cols(
##   age = col_double(),
##   sex = col_double(),
##   cp = col_double(),
##   trestbps = col_double(),
##   chol = col_double(),
##   fbs = col_double(),
##   restecg = col_double(),
##   thalach = col_double(),
##   exang = col_double(),
##   oldpeak = col_double(),
##   slope = col_double(),
##   ca = col_double(),
##   thal = col_double(),
##   target = col_double()
## )

head(disease) %>% kable() %>% kable_styling()

age	sex	cp	trestbps	chol	fbs	restecg	thalach	exang	oldpeak	slope	thal	target
63	1	3	145	233	1	0	150	0	2.3	0	1	1
37	1	2	130	250	0	1	187	0	3.5	0	2	1
41	0	1	130	204	0	0	172	0	1.4	2	2	1
56	1	1	120	236	0	1	178	0	0.8	2	2	1
57	0	0	120	354	0	1	163	1	0.6	2	2	1
57	1	0	140	192	0	1	148	0	0.4	1	1	1

# Split age into 10 equal-width intervals
disease$age_bucket <- cut(disease$age, breaks = 10)

# Relabel each interval as a risk level
df <- disease %>%
  mutate(age_bucket = dplyr::recode(age_bucket,
    "(29,33.8]"   = "Risk level 0",
    "(33.8,38.6]" = "Risk level 1",
    "(38.6,43.4]" = "Risk level 2",
    "(43.4,48.2]" = "Risk level 3",
    "(48.2,53]"   = "Risk level 4",
    "(53,57.8]"   = "Risk level 5",
    "(57.8,62.6]" = "Risk level 6",
    "(62.6,67.4]" = "Risk level 7",
    "(67.4,72.2]" = "Risk level 8",
    "(72.2,77]"   = "Risk level 9"))
head(df) %>% kable() %>% kable_styling()

age	sex	cp	trestbps	chol	fbs	restecg	thalach	exang	oldpeak	slope	thal	target	age_bucket
63	1	3	145	233	1	0	150	0	2.3	0	1	1	Risk level 7
37	1	2	130	250	0	1	187	0	3.5	0	2	1	Risk level 1
41	0	1	130	204	0	0	172	0	1.4	2	2	1	Risk level 2
56	1	1	120	236	0	1	178	0	0.8	2	2	1	Risk level 5
57	0	0	120	354	0	1	163	1	0.6	2	2	1	Risk level 5
57	1	0	140	192	0	1	148	0	0.4	1	1	1	Risk level 5
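
The recode step above depends on the exact interval labels that cut() generates, so it would break if the age range changed. A minimal sketch of one alternative is to pass the labels straight to cut(). The toy data frame and the risk_labels vector are made up for illustration and are not part of the original recipe; the ages are chosen to span 29 to 77 so the bins match the ones above.

```r
library(dplyr)

# Hypothetical toy data standing in for disease$age (not read from heart.csv)
toy <- data.frame(age = c(29, 37, 41, 56, 57, 63, 77))

# Hypothetical label vector: one label per bin, in bin order
risk_labels <- paste("Risk level", 0:9)

# cut() builds 10 equal-width bins over the observed age range
# and names them with risk_labels instead of interval strings
toy <- toy %>%
  mutate(age_bucket = cut(age, breaks = 10, labels = risk_labels))

toy
```

With this approach the labels follow bin order automatically, at the cost of no longer showing the interval bounds in the code.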

Graphs

table(df$age_bucket) %>% barplot(., ylab = "Frequency", main = "Heart Disease Risk Distribution")
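
The same bar chart could also be drawn with ggplot2, which is already attached by library(tidyverse). This is only a sketch on a made-up toy data frame, not the plot from the original recipe; the axis labels and title mirror the barplot() call above.

```r
library(ggplot2)

# Hypothetical toy data standing in for df (not read from heart.csv)
toy <- data.frame(
  age_bucket = factor(c("Risk level 1", "Risk level 5",
                        "Risk level 5", "Risk level 7"))
)

# geom_bar() counts the rows in each bucket, like table() followed by barplot()
p <- ggplot(toy, aes(x = age_bucket)) +
  geom_bar() +
  labs(x = "Age bucket", y = "Frequency",
       title = "Heart Disease Risk Distribution")

print(p)
```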

Conclusion

According to our graph, ages between 57 and 62 are at the highest risk. Heart disease decreases after age 62. Between ages 33 and 62, heart disease increases with every age bucket.
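
The bar plot counts patients per age bucket. One way to check the conclusion against the target column is to compute the share of patients with target equal to 1 in each bucket. The sketch below uses made-up toy data and assumes that target == 1 marks the presence of heart disease, which this recipe does not state; the names rates and disease_rate are hypothetical.

```r
library(dplyr)

# Hypothetical toy data (not read from heart.csv): bucket plus the 0/1 target
toy <- data.frame(
  age_bucket = c("Risk level 5", "Risk level 5", "Risk level 6",
                 "Risk level 6", "Risk level 6", "Risk level 7"),
  target     = c(1, 0, 1, 1, 0, 0)
)

# For each bucket: number of patients and share with target == 1
# (assumption: target == 1 means heart disease is present)
rates <- toy %>%
  group_by(age_bucket) %>%
  summarise(
    n = n(),
    disease_rate = mean(target)
  )

rates
```

Applied to df instead of toy, the same pipeline would show whether the buckets with the most patients also have the highest share of heart disease.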

DATA 607 - TIDYVERSE-PART 2

OMER OZEREN