HAZAL GUNDUZ

Tidyverse EXTEND Assignment

In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.

GitHub repository: https://github.com/Gunduzhazal/heart

Your task here is to Extend an Existing Example: take one of your classmates’ examples (as created above) and extend it with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.

After you’ve extended your classmate’s vignette, please submit your GitHub handle at the submission link provided below. This will let your instructor know that your work is ready to be peer-graded.

You should complete your submission on the schedule stated in the course syllabus.

Load the libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
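# ggplot2 and dplyr are already attached by library(tidyverse) (see the
# "Attaching packages" message above), so separate library() calls are not needed.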

Data source

https://www.kaggle.com/ronitf/heart-disease-uci

heart <- read_csv("heart.csv")
## Rows: 303 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpea...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
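# Set the column names explicitly. They already match the CSV header listed
# above, so this step is a safeguard rather than a change.
names(heart) <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                  "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target")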
head(heart)
## # A tibble: 6 × 14
##     age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
## 1    63     1     3      145   233     1       0     150     0     2.3     0
## 2    37     1     2      130   250     0       1     187     0     3.5     0
## 3    41     0     1      130   204     0       0     172     0     1.4     2
## 4    56     1     1      120   236     0       1     178     0     0.8     2
## 5    57     0     0      120   354     0       1     163     1     0.6     2
## 6    57     1     0      140   192     0       1     148     0     0.4     1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
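The message printed by read_csv() above suggests specifying the column types or setting `show_col_types = FALSE`. Here is a minimal sketch of the first option. It assumes a small hypothetical CSV, built from three of the rows printed by head(heart), written to a temporary file so the example runs without heart.csv. The choice of integers for the 0/1/2 codes is an illustrative assumption, not something the original analysis does.

```r
library(readr)

# Hypothetical mini-CSV: three rows and five columns copied from the head(heart)
# output, written to a temporary file so the sketch runs without heart.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,sex,chol,restecg,target",
             "63,1,233,0,1",
             "37,1,250,1,1",
             "41,0,204,0,1"), tmp)

# Supplying col_types states the schema explicitly: codes become integers,
# measurements stay doubles, and readr does not print the column specification.
heart_small <- read_csv(
  tmp,
  col_types = cols(
    age     = col_double(),
    sex     = col_integer(),
    chol    = col_double(),
    restecg = col_integer(),
    target  = col_integer()
  )
)

# Alternative that keeps the guessed types but silences the message:
# heart_small <- read_csv(tmp, show_col_types = FALSE)
```

With the real file, the same `col_types` argument would go directly into `read_csv("heart.csv", ...)`, listing all 14 columns.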

Tidyverse EXTEND

tail(heart)
## # A tibble: 6 × 14
##     age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
## 1    59     1     0      164   176     1       0      90     0     1       1
## 2    57     0     0      140   241     0       1     123     1     0.2     1
## 3    45     1     3      110   264     0       1     132     0     1.2     1
## 4    68     1     0      144   193     1       1     141     0     3.4     1
## 5    57     1     0      130   131     0       1     115     1     1.2     1
## 6    57     0     1      130   236     0       0     174     0     0       1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
str(heart)
## spec_tbl_df [303 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ age     : num [1:303] 63 37 41 56 57 57 56 44 52 57 ...
##  $ sex     : num [1:303] 1 1 0 1 0 1 0 1 1 1 ...
##  $ cp      : num [1:303] 3 2 1 1 0 0 1 1 2 2 ...
##  $ trestbps: num [1:303] 145 130 130 120 120 140 140 120 172 150 ...
##  $ chol    : num [1:303] 233 250 204 236 354 192 294 263 199 168 ...
##  $ fbs     : num [1:303] 1 0 0 0 0 0 0 0 1 0 ...
##  $ restecg : num [1:303] 0 1 0 1 1 1 0 1 1 1 ...
##  $ thalach : num [1:303] 150 187 172 178 163 148 153 173 162 174 ...
##  $ exang   : num [1:303] 0 0 0 0 1 0 0 0 0 0 ...
##  $ oldpeak : num [1:303] 2.3 3.5 1.4 0.8 0.6 0.4 1.3 0 0.5 1.6 ...
##  $ slope   : num [1:303] 0 0 2 2 2 1 1 2 2 2 ...
##  $ ca      : num [1:303] 0 0 0 0 0 0 0 0 0 0 ...
##  $ thal    : num [1:303] 1 2 2 2 2 1 2 3 3 2 ...
##  $ target  : num [1:303] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   age = col_double(),
##   ..   sex = col_double(),
##   ..   cp = col_double(),
##   ..   trestbps = col_double(),
##   ..   chol = col_double(),
##   ..   fbs = col_double(),
##   ..   restecg = col_double(),
##   ..   thalach = col_double(),
##   ..   exang = col_double(),
##   ..   oldpeak = col_double(),
##   ..   slope = col_double(),
##   ..   ca = col_double(),
##   ..   thal = col_double(),
##   ..   target = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(heart)
##       age             sex               cp           trestbps    
##  Min.   :29.00   Min.   :0.0000   Min.   :0.000   Min.   : 94.0  
##  1st Qu.:47.50   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:120.0  
##  Median :55.00   Median :1.0000   Median :1.000   Median :130.0  
##  Mean   :54.37   Mean   :0.6832   Mean   :0.967   Mean   :131.6  
##  3rd Qu.:61.00   3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:140.0  
##  Max.   :77.00   Max.   :1.0000   Max.   :3.000   Max.   :200.0  
##       chol            fbs            restecg          thalach     
##  Min.   :126.0   Min.   :0.0000   Min.   :0.0000   Min.   : 71.0  
##  1st Qu.:211.0   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:133.5  
##  Median :240.0   Median :0.0000   Median :1.0000   Median :153.0  
##  Mean   :246.3   Mean   :0.1485   Mean   :0.5281   Mean   :149.6  
##  3rd Qu.:274.5   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:166.0  
##  Max.   :564.0   Max.   :1.0000   Max.   :2.0000   Max.   :202.0  
##      exang           oldpeak         slope             ca        
##  Min.   :0.0000   Min.   :0.00   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.80   Median :1.000   Median :0.0000  
##  Mean   :0.3267   Mean   :1.04   Mean   :1.399   Mean   :0.7294  
##  3rd Qu.:1.0000   3rd Qu.:1.60   3rd Qu.:2.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :6.20   Max.   :2.000   Max.   :4.0000  
##       thal           target      
##  Min.   :0.000   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:0.0000  
##  Median :2.000   Median :1.0000  
##  Mean   :2.314   Mean   :0.5446  
##  3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :3.000   Max.   :1.0000
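summary() prints the statistics but returns them in a layout that is hard to reuse. A tidy alternative is summarise() with across(), which returns a one-row tibble you can filter, join, or plot. This is a minimal sketch on a hypothetical excerpt (`heart_demo`) built from the six rows printed by head(heart). The chosen columns and statistics are only an illustration.

```r
library(dplyr)

# Hypothetical excerpt: age, chol and thalach from the six rows of head(heart)
heart_demo <- tibble(
  age     = c(63, 37, 41, 56, 57, 57),
  chol    = c(233, 250, 204, 236, 354, 192),
  thalach = c(150, 187, 172, 178, 163, 148)
)

# Apply mean and median to each selected column; .names controls the output
# column names, e.g. "age_mean", "age_median", "chol_mean", ...
heart_stats <- heart_demo %>%
  summarise(across(c(age, chol, thalach),
                   list(mean = mean, median = median),
                   .names = "{.col}_{.fn}"))
```

On the full `heart` tibble, `across(everything(), ...)` would summarise all 14 columns the same way.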
heart 
## # A tibble: 303 × 14
##      age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##    <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
##  1    63     1     3      145   233     1       0     150     0     2.3     0
##  2    37     1     2      130   250     0       1     187     0     3.5     0
##  3    41     0     1      130   204     0       0     172     0     1.4     2
##  4    56     1     1      120   236     0       1     178     0     0.8     2
##  5    57     0     0      120   354     0       1     163     1     0.6     2
##  6    57     1     0      140   192     0       1     148     0     0.4     1
##  7    56     0     1      140   294     0       0     153     0     1.3     1
##  8    44     1     1      120   263     0       1     173     0     0       2
##  9    52     1     2      172   199     1       1     162     0     0.5     2
## 10    57     1     2      150   168     0       1     174     0     1.6     2
## # … with 293 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## #   target <dbl>
urlGet <- "https://github.com/Gunduzhazal/heart"  # repository URL, kept for reference (not used below)

dplyr package

glimpse(heart)
## Rows: 303
## Columns: 14
## $ age      <dbl> 63, 37, 41, 56, 57, 57, 56, 44, 52, 57, 54, 48, 49, 64, 58, 5…
## $ sex      <dbl> 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1…
## $ cp       <dbl> 3, 2, 1, 1, 0, 0, 1, 1, 2, 2, 0, 2, 1, 3, 3, 2, 2, 3, 0, 3, 0…
## $ trestbps <dbl> 145, 130, 130, 120, 120, 140, 140, 120, 172, 150, 140, 130, 1…
## $ chol     <dbl> 233, 250, 204, 236, 354, 192, 294, 263, 199, 168, 239, 275, 2…
## $ fbs      <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ restecg  <dbl> 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1…
## $ thalach  <dbl> 150, 187, 172, 178, 163, 148, 153, 173, 162, 174, 160, 139, 1…
## $ exang    <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ oldpeak  <dbl> 2.3, 3.5, 1.4, 0.8, 0.6, 0.4, 1.3, 0.0, 0.5, 1.6, 1.2, 0.2, 0…
## $ slope    <dbl> 0, 0, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 0, 2, 2, 1…
## $ ca       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0…
## $ thal     <dbl> 1, 2, 2, 2, 2, 1, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3…
## $ target   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

Selecting

heart_sub <- heart %>%
  select(age, sex, cp, oldpeak)
heart_sub1 <- heart %>% 
  select(age:exang)
head(heart_sub)
## # A tibble: 6 × 4
##     age   sex    cp oldpeak
##   <dbl> <dbl> <dbl>   <dbl>
## 1    63     1     3     2.3
## 2    37     1     2     3.5
## 3    41     0     1     1.4
## 4    56     1     1     0.8
## 5    57     0     0     0.6
## 6    57     1     0     0.4
head(heart_sub1)
## # A tibble: 6 × 9
##     age   sex    cp trestbps  chol   fbs restecg thalach exang
##   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>
## 1    63     1     3      145   233     1       0     150     0
## 2    37     1     2      130   250     0       1     187     0
## 3    41     0     1      130   204     0       0     172     0
## 4    56     1     1      120   236     0       1     178     0
## 5    57     0     0      120   354     0       1     163     1
## 6    57     1     0      140   192     0       1     148     0
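Besides listing names (`age, sex, cp, oldpeak`) or ranges (`age:exang`), select() accepts a minus sign to drop columns and helpers such as starts_with() and everything(). This minimal sketch uses a hypothetical two-row excerpt of the columns shown above.

```r
library(dplyr)

# Hypothetical two-row excerpt of some heart columns
heart_demo <- tibble(age = c(63, 37), sex = c(1, 1), cp = c(3, 2),
                     trestbps = c(145, 130), chol = c(233, 250),
                     thal = c(1, 2), target = c(1, 1))

# Drop a column with a minus sign
no_target <- heart_demo %>% select(-target)

# Pick columns whose names start with "t" (trestbps, thal, target)
t_cols <- heart_demo %>% select(starts_with("t"))

# Move target to the front and keep all remaining columns in order
target_first <- heart_demo %>% select(target, everything())
```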

Filtering

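# Caution: restecg is stored as a number (0, 1 or 2), not as text, so comparing
# it to the string "Normal" matches no rows; the result below is an empty tibble.
heart_disease <- heart %>% 
  filter(restecg == "Normal")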
head(heart_disease)
## # A tibble: 0 × 14
## # … with 14 variables: age <dbl>, sex <dbl>, cp <dbl>, trestbps <dbl>,
## #   chol <dbl>, fbs <dbl>, restecg <dbl>, thalach <dbl>, exang <dbl>,
## #   oldpeak <dbl>, slope <dbl>, ca <dbl>, thal <dbl>, target <dbl>
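To keep the "normal ECG" rows, filter on the numeric code instead. The sketch below assumes the UCI coding for restecg (0 = normal, 1 = ST-T wave abnormality, 2 = left ventricular hypertrophy) applies to this Kaggle copy. The label text and the `heart_demo` excerpt (age and restecg from the six rows of head(heart)) are illustrative assumptions.

```r
library(dplyr)

# Hypothetical excerpt: age and restecg from the six rows of head(heart)
heart_demo <- tibble(
  age     = c(63, 37, 41, 56, 57, 57),
  restecg = c(0, 1, 0, 1, 1, 1)
)

# Compare against the numeric code, not the string "Normal"
normal_ecg <- heart_demo %>% filter(restecg == 0)

# Alternatively, attach readable labels first (assumed UCI coding), then filter
heart_labeled <- heart_demo %>%
  mutate(restecg_label = factor(restecg, levels = c(0, 1, 2),
                                labels = c("Normal", "ST-T abnormality",
                                           "LV hypertrophy")))
normal_ecg2 <- heart_labeled %>% filter(restecg_label == "Normal")
```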

Tidyverse EXTEND

heart_disease <- heart %>% 
  filter(chol > 150)
head(heart_disease)
## # A tibble: 6 × 14
##     age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
## 1    63     1     3      145   233     1       0     150     0     2.3     0
## 2    37     1     2      130   250     0       1     187     0     3.5     0
## 3    41     0     1      130   204     0       0     172     0     1.4     2
## 4    56     1     1      120   236     0       1     178     0     0.8     2
## 5    57     0     0      120   354     0       1     163     1     0.6     2
## 6    57     1     0      140   192     0       1     148     0     0.4     1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
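filter() can combine several conditions. Comma-separated conditions must all hold (AND), `|` keeps rows that satisfy either side (OR), and between() tests an inclusive range. This minimal sketch uses a hypothetical excerpt (age, sex, chol) from the six rows of head(heart). The cut-offs are arbitrary and chosen only for illustration.

```r
library(dplyr)

# Hypothetical excerpt: age, sex and chol from the six rows of head(heart)
heart_demo <- tibble(
  age  = c(63, 37, 41, 56, 57, 57),
  sex  = c(1, 1, 0, 1, 0, 1),
  chol = c(233, 250, 204, 236, 354, 192)
)

# AND: cholesterol above 200 and age 57 or older
older_high_chol <- heart_demo %>% filter(chol > 200, age >= 57)

# OR: cholesterol between 200 and 240 (inclusive), or sex coded 0
mid_chol_or_sex0 <- heart_demo %>% filter(between(chol, 200, 240) | sex == 0)
```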
heart_disease %>%
  arrange(age)
## # A tibble: 298 × 14
##      age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##    <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
##  1    29     1     1      130   204     0       0     202     0     0       2
##  2    34     1     3      118   182     0       0     174     0     0       2
##  3    34     0     1      118   210     0       1     192     0     0.7     2
##  4    35     0     0      138   183     0       1     182     0     1.4     2
##  5    35     1     1      122   192     0       1     174     0     0       2
##  6    35     1     0      120   198     0       1     130     1     1.6     1
##  7    35     1     0      126   282     0       0     156     1     0       2
##  8    37     1     2      130   250     0       1     187     0     3.5     0
##  9    37     0     2      120   215     0       1     170     0     0       2
## 10    38     1     2      138   175     0       1     173     0     0       2
## # … with 288 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## #   target <dbl>
heart %>%
  arrange(desc(chol))
## # A tibble: 303 × 14
##      age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##    <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
##  1    67     0     2      115   564     0       0     160     0     1.6     1
##  2    65     0     2      140   417     1       0     157     0     0.8     2
##  3    56     0     0      134   409     0       0     150     1     1.9     1
##  4    63     0     0      150   407     0       0     154     0     4       1
##  5    62     0     0      140   394     0       0     157     0     1.2     1
##  6    65     0     2      160   360     0       0     151     0     0.8     2
##  7    57     0     0      120   354     0       1     163     1     0.6     2
##  8    55     1     0      132   353     0       1     132     1     1.2     1
##  9    55     0     1      132   342     0       1     166     0     1.2     2
## 10    43     0     0      132   341     1       0     136     1     3       1
## # … with 293 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## #   target <dbl>
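arrange() also accepts several sort keys, where later keys break ties in earlier ones. slice_max() keeps only the top rows for a variable instead of sorting the whole table. This minimal sketch uses a hypothetical excerpt (age, chol) from the six rows of head(heart).

```r
library(dplyr)

# Hypothetical excerpt: age and chol from the six rows of head(heart)
heart_demo <- tibble(
  age  = c(63, 37, 41, 56, 57, 57),
  chol = c(233, 250, 204, 236, 354, 192)
)

# Oldest first; the two 57-year-olds are then ordered by ascending chol
by_age_then_chol <- heart_demo %>% arrange(desc(age), chol)

# The three rows with the highest cholesterol, highest first
top3_chol <- heart_demo %>% slice_max(chol, n = 3)
```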

Mutating

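heart_disease %>%
  # restecg is 0 in many rows, so chol / restecg is Inf there (see the output
  # below); this ratio only demonstrates mutate() and is not a clinical measure.
  mutate(ratio = chol / restecg) %>%
  select(restecg, chol, ratio)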
## # A tibble: 298 × 3
##    restecg  chol ratio
##      <dbl> <dbl> <dbl>
##  1       0   233   Inf
##  2       1   250   250
##  3       0   204   Inf
##  4       1   236   236
##  5       1   354   354
##  6       1   192   192
##  7       0   294   Inf
##  8       1   263   263
##  9       1   199   199
## 10       1   168   168
## # … with 288 more rows
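The Inf values above come from dividing by zero. A minimal sketch of one way to guard the division is to use if_else() so that rows with restecg == 0 get NA instead. The excerpt reuses the first four restecg/chol pairs from the output above, and treating those rows as missing is an assumption about what the analysis wants.

```r
library(dplyr)

# Hypothetical excerpt: the first four restecg/chol pairs shown above
heart_demo <- tibble(
  restecg = c(0, 1, 0, 1),
  chol    = c(233, 250, 204, 236)
)

heart_ratio <- heart_demo %>%
  mutate(
    # Where restecg == 0 the ratio is undefined, so return NA rather than Inf;
    # elsewhere keep chol / restecg
    ratio = if_else(restecg == 0, NA_real_, chol / restecg)
  )
```

if_else() is strict about types, which is why the missing value is written as `NA_real_` to match the double result of the division.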

Counting

heart_disease %>%
  count(restecg)
## # A tibble: 3 × 2
##   restecg     n
##     <dbl> <int>
## 1       0   146
## 2       1   148
## 3       2     4
heart_disease %>%
  count(restecg, sort = TRUE)
## # A tibble: 3 × 2
##   restecg     n
##     <dbl> <int>
## 1       1   148
## 2       0   146
## 3       2     4
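count() can also take several columns, giving one row per combination. When you need more than a row count per group, group_by() followed by summarise() computes any statistic per group. This minimal sketch uses a hypothetical excerpt (sex, restecg, chol) from the six rows of head(heart).

```r
library(dplyr)

# Hypothetical excerpt: sex, restecg and chol from the six rows of head(heart)
heart_demo <- tibble(
  sex     = c(1, 1, 0, 1, 0, 1),
  restecg = c(0, 1, 0, 1, 1, 1),
  chol    = c(233, 250, 204, 236, 354, 192)
)

# One row per (sex, restecg) combination that occurs, with its count n
sex_ecg_counts <- heart_demo %>% count(sex, restecg)

# Row count plus mean cholesterol for each sex code; .groups = "drop"
# returns an ungrouped tibble
chol_by_sex <- heart_demo %>%
  group_by(sex) %>%
  summarise(n = n(), mean_chol = mean(chol), .groups = "drop")
```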

Renaming

heart <- heart %>% 
  rename(gender = sex)
head(heart)
## # A tibble: 6 × 14
##     age gender    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##   <dbl>  <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
## 1    63      1     3      145   233     1       0     150     0     2.3     0
## 2    37      1     2      130   250     0       1     187     0     3.5     0
## 3    41      0     1      130   204     0       0     172     0     1.4     2
## 4    56      1     1      120   236     0       1     178     0     0.8     2
## 5    57      0     0      120   354     0       1     163     1     0.6     2
## 6    57      1     0      140   192     0       1     148     0     0.4     1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
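rename() takes any number of `new_name = old_name` pairs in one call, and rename_with() applies a function to many names at once. This minimal sketch uses a hypothetical excerpt of four columns. The name `chest_pain` is an illustrative choice based on the UCI description of cp as chest pain type.

```r
library(dplyr)

# Hypothetical excerpt of four heart columns
heart_demo <- tibble(age = c(63, 37), sex = c(1, 1), cp = c(3, 2),
                     chol = c(233, 250))

# Rename several columns at once (new_name = old_name)
heart_renamed <- heart_demo %>% rename(gender = sex, chest_pain = cp)

# Transform every column name with a function
heart_upper <- heart_demo %>% rename_with(toupper)
```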
Scatterplot

ggplot(data = heart, aes(x = chol, y = restecg)) + geom_point(alpha = 0.5) + 
  labs(title = "restecg vs. chol") + theme_bw()

ggplot(data = heart, aes(x = chol, y = restecg, color = restecg)) + geom_point()

ggplot(data = heart, aes(x=chol, y=restecg)) + geom_point() +
  facet_wrap(~restecg)
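Because restecg is numeric, `color = restecg` produces a continuous colour scale even though the variable only takes the codes 0, 1 and 2. Wrapping it in factor() tells ggplot2 to treat it as categories. The sketch below draws one box of cholesterol per code on a hypothetical excerpt (chol, restecg) from the six rows of head(heart). The boxplot is a swapped-in chart type, not one used in the original.

```r
library(ggplot2)

# Hypothetical excerpt: chol and restecg from the six rows of head(heart)
heart_demo <- data.frame(
  chol    = c(233, 250, 204, 236, 354, 192),
  restecg = c(0, 1, 0, 1, 1, 1)
)

# factor(restecg) gives a discrete x axis, one box per code, and a
# discrete fill legend instead of a continuous colour gradient
p <- ggplot(heart_demo, aes(x = factor(restecg), y = chol, fill = factor(restecg))) +
  geom_boxplot() +
  labs(x = "restecg", y = "chol", fill = "restecg",
       title = "Cholesterol by resting ECG code") +
  theme_bw()

print(p)
```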

Barplot

ggplot(data = heart, aes(x = gender)) + geom_bar(fill = "green") + 
  labs(title = "Bar chart for count of sex") + theme_bw()

ggplot(data = heart, aes(x = gender)) + geom_bar(fill = "yellow") + 
  labs(title = "Bar chart for count of sex") + theme_bw() + coord_flip()
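A bar chart can also split each bar by a second variable. Mapping a factor to `fill` with `position = "dodge"` places the sub-bars side by side. This minimal sketch uses a hypothetical excerpt (gender after renaming, restecg) from the six rows of head(heart). The axis labels keep the 0/1 codes rather than guessing their meaning.

```r
library(ggplot2)

# Hypothetical excerpt: gender (renamed from sex) and restecg from head(heart)
heart_demo <- data.frame(
  gender  = c(1, 1, 0, 1, 0, 1),
  restecg = c(0, 1, 0, 1, 1, 1)
)

# One group of bars per gender code, one bar per restecg code within it
p <- ggplot(heart_demo, aes(x = factor(gender), fill = factor(restecg))) +
  geom_bar(position = "dodge") +
  labs(x = "gender (coded 0/1)", fill = "restecg",
       title = "Count by gender and resting ECG code") +
  theme_bw()

print(p)
```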

Conclusion

Heart disease is a major health concern. This dataset records several of the clinical factors people should be aware of, such as age, cholesterol (chol), and resting ECG results (restecg).

GitHub => https://github.com/Gunduzhazal/heart

Rpubs => https://rpubs.com/gunduzhazal/832254