HAZAL GUNDUZ

Tidyverse Assignment

The task is to create an example: using one or more tidyverse packages and any dataset from fivethirtyeight.com or Kaggle, build a programming sample “vignette” that demonstrates one or more capabilities of the selected package on the selected dataset.

Column meanings:

The meanings of some of the column headers are not obvious. Here’s what they mean:
• age: The person’s age in years
• sex: The person’s sex (1 = male, 0 = female)
• cp: The chest pain experienced (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)
• trestbps: The person’s resting blood pressure (mm Hg on admission to the hospital)
• chol: The person’s cholesterol measurement in mg/dl
• fbs: Whether the person’s fasting blood sugar is > 120 mg/dl (1 = true; 0 = false)
• restecg: Resting electrocardiographic measurement (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy by Estes’ criteria)
• thalach: The person’s maximum heart rate achieved
• exang: Exercise-induced angina (1 = yes; 0 = no)
• oldpeak: ST depression induced by exercise relative to rest (‘ST’ relates to positions on the ECG plot)
• slope: The slope of the peak exercise ST segment (Value 1: upsloping, Value 2: flat, Value 3: downsloping)
• ca: The number of major vessels (0-3)
• thal: A blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
• target: Heart disease (0 = no, 1 = yes)

Note that the file used here does not follow every coding listed above: in the summary(heart) output, cp ranges from 0 to 3, slope from 0 to 2, thal from 0 to 3, and ca from 0 to 4.
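Because every column is read in as a number, the coded columns can be easier to read once they are turned into labelled factors. The block below is a minimal sketch, assuming only the two codings that the list above and the data agree on (sex and target). The tibble sample_rows and the result name labelled are made up for illustration; the values are copied from the first six rows printed by head(heart).

```r
library(dplyr)
library(tibble)

# Illustrative rows: values copied from the first six records of head(heart)
sample_rows <- tibble(
  age    = c(63, 37, 41, 56, 57, 57),
  sex    = c(1, 1, 0, 1, 0, 1),
  target = c(1, 1, 1, 1, 1, 1)
)

# mutate() overwrites the numeric codes with labelled factors
labelled <- sample_rows %>%
  mutate(
    sex    = factor(sex, levels = c(0, 1), labels = c("female", "male")),
    target = factor(target, levels = c(0, 1), labels = c("no", "yes"))
  )

labelled
```

The same pattern could be applied to cp, slope, or thal, but only after confirming which coding the file actually uses.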

Load the library

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(ggplot2)
library(dplyr)

Data source

https://www.kaggle.com/ronitf/heart-disease-uci

heart <- read_csv("heart.csv")
## Rows: 303 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpea...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(heart) <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target")
head(heart)
## # A tibble: 6 × 14
##     age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
## 1    63     1     3      145   233     1       0     150     0     2.3     0
## 2    37     1     2      130   250     0       1     187     0     3.5     0
## 3    41     0     1      130   204     0       0     172     0     1.4     2
## 4    56     1     1      120   236     0       1     178     0     0.8     2
## 5    57     0     0      120   354     0       1     163     1     0.6     2
## 6    57     1     0      140   192     0       1     148     0     0.4     1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
dim(heart)
## [1] 303  14
str(heart)
## spec_tbl_df [303 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ age     : num [1:303] 63 37 41 56 57 57 56 44 52 57 ...
##  $ sex     : num [1:303] 1 1 0 1 0 1 0 1 1 1 ...
##  $ cp      : num [1:303] 3 2 1 1 0 0 1 1 2 2 ...
##  $ trestbps: num [1:303] 145 130 130 120 120 140 140 120 172 150 ...
##  $ chol    : num [1:303] 233 250 204 236 354 192 294 263 199 168 ...
##  $ fbs     : num [1:303] 1 0 0 0 0 0 0 0 1 0 ...
##  $ restecg : num [1:303] 0 1 0 1 1 1 0 1 1 1 ...
##  $ thalach : num [1:303] 150 187 172 178 163 148 153 173 162 174 ...
##  $ exang   : num [1:303] 0 0 0 0 1 0 0 0 0 0 ...
##  $ oldpeak : num [1:303] 2.3 3.5 1.4 0.8 0.6 0.4 1.3 0 0.5 1.6 ...
##  $ slope   : num [1:303] 0 0 2 2 2 1 1 2 2 2 ...
##  $ ca      : num [1:303] 0 0 0 0 0 0 0 0 0 0 ...
##  $ thal    : num [1:303] 1 2 2 2 2 1 2 3 3 2 ...
##  $ target  : num [1:303] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   age = col_double(),
##   ..   sex = col_double(),
##   ..   cp = col_double(),
##   ..   trestbps = col_double(),
##   ..   chol = col_double(),
##   ..   fbs = col_double(),
##   ..   restecg = col_double(),
##   ..   thalach = col_double(),
##   ..   exang = col_double(),
##   ..   oldpeak = col_double(),
##   ..   slope = col_double(),
##   ..   ca = col_double(),
##   ..   thal = col_double(),
##   ..   target = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
sum(is.na(heart))
## [1] 0
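sum(is.na(heart)) gives one total for the whole table. When that total is not zero, a per-column count shows where the gaps are. The block below is a rough sketch of that idea using dplyr's summarise() with across(). The tibble toy and its missing values are hypothetical, added only so there is something to count; the heart data itself has no missing values.

```r
library(dplyr)
library(tibble)

# Hypothetical data: the NA values are invented for this illustration
toy <- tibble(
  age  = c(63, 37, NA, 56),
  chol = c(233, NA, NA, 236)
)

# One row, one column per variable, each holding that variable's NA count
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Running the same summarise() call on heart should return a single row of zeros, consistent with the total above.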
summary(heart)
##       age             sex               cp           trestbps    
##  Min.   :29.00   Min.   :0.0000   Min.   :0.000   Min.   : 94.0  
##  1st Qu.:47.50   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:120.0  
##  Median :55.00   Median :1.0000   Median :1.000   Median :130.0  
##  Mean   :54.37   Mean   :0.6832   Mean   :0.967   Mean   :131.6  
##  3rd Qu.:61.00   3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:140.0  
##  Max.   :77.00   Max.   :1.0000   Max.   :3.000   Max.   :200.0  
##       chol            fbs            restecg          thalach     
##  Min.   :126.0   Min.   :0.0000   Min.   :0.0000   Min.   : 71.0  
##  1st Qu.:211.0   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:133.5  
##  Median :240.0   Median :0.0000   Median :1.0000   Median :153.0  
##  Mean   :246.3   Mean   :0.1485   Mean   :0.5281   Mean   :149.6  
##  3rd Qu.:274.5   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:166.0  
##  Max.   :564.0   Max.   :1.0000   Max.   :2.0000   Max.   :202.0  
##      exang           oldpeak         slope             ca        
##  Min.   :0.0000   Min.   :0.00   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.80   Median :1.000   Median :0.0000  
##  Mean   :0.3267   Mean   :1.04   Mean   :1.399   Mean   :0.7294  
##  3rd Qu.:1.0000   3rd Qu.:1.60   3rd Qu.:2.000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :6.20   Max.   :2.000   Max.   :4.0000  
##       thal           target      
##  Min.   :0.000   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:0.0000  
##  Median :2.000   Median :1.0000  
##  Mean   :2.314   Mean   :0.5446  
##  3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :3.000   Max.   :1.0000
heart 
## # A tibble: 303 × 14
##      age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak slope
##    <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl> <dbl>
##  1    63     1     3      145   233     1       0     150     0     2.3     0
##  2    37     1     2      130   250     0       1     187     0     3.5     0
##  3    41     0     1      130   204     0       0     172     0     1.4     2
##  4    56     1     1      120   236     0       1     178     0     0.8     2
##  5    57     0     0      120   354     0       1     163     1     0.6     2
##  6    57     1     0      140   192     0       1     148     0     0.4     1
##  7    56     0     1      140   294     0       0     153     0     1.3     1
##  8    44     1     1      120   263     0       1     173     0     0       2
##  9    52     1     2      172   199     1       1     162     0     0.5     2
## 10    57     1     2      150   168     0       1     174     0     1.6     2
## # … with 293 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## #   target <dbl>
urlGet <- "https://github.com/Gunduzhazal/heart"
df <- heart %>% select(age, sex, cp)
head(df)
## # A tibble: 6 × 3
##     age   sex    cp
##   <dbl> <dbl> <dbl>
## 1    63     1     3
## 2    37     1     2
## 3    41     0     1
## 4    56     1     1
## 5    57     0     0
## 6    57     1     0
df <- df %>% arrange(age, sex, cp)
head(df)
## # A tibble: 6 × 3
##     age   sex    cp
##   <dbl> <dbl> <dbl>
## 1    29     1     1
## 2    34     0     1
## 3    34     1     3
## 4    35     0     0
## 5    35     1     0
## 6    35     1     0
tail(df)
## # A tibble: 6 × 3
##     age   sex    cp
##   <dbl> <dbl> <dbl>
## 1    71     0     0
## 2    71     0     1
## 3    71     0     2
## 4    74     0     1
## 5    76     0     2
## 6    77     1     0
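select() and arrange() reshape the table without summarising it. A natural next step is to combine them with group_by() and summarise(). The block below is a small sketch of that pipeline, not part of the original analysis. The names sample_rows, by_sex, mean_age, and max_chol are chosen for illustration, and the values are the first six rows shown by head(heart).

```r
library(dplyr)
library(tibble)

# Illustrative rows: values copied from the first six records of head(heart)
sample_rows <- tibble(
  age  = c(63, 37, 41, 56, 57, 57),
  sex  = c(1, 1, 0, 1, 0, 1),
  chol = c(233, 250, 204, 236, 354, 192)
)

# Group by sex, compute one summary row per group, then sort by group size
by_sex <- sample_rows %>%
  group_by(sex) %>%
  summarise(
    n        = n(),
    mean_age = mean(age),
    max_chol = max(chol)
  ) %>%
  arrange(desc(n))

by_sex
```

On the full data, replacing sample_rows with heart would give the same two-row summary computed over all 303 records.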

Conclusion

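This vignette used readr’s read_csv() to load the heart disease data, base R functions to inspect it, and dplyr’s select() and arrange() to subset it and sort it by age, sex, and chest pain type. Heart disease is a major health concern, and the factors recorded in this dataset are ones that people should be aware of.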

GitHub: https://github.com/Gunduzhazal/heart

RPubs: https://rpubs.com/gunduzhazal/826400