HAZAL GUNDUZ
Tidyverse EXTEND Assignment
In this assignment, you’ll practice collaborating on a code project with GitHub. You can think of our collective work as building a book of examples on how to use tidyverse functions.
GitHub repository: https://github.com/Gunduzhazal/heart
Your task here is to extend an existing example. Using one of your classmates’ examples (as created above), extend their example with additional annotated code. (15 points)
You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.
After you’ve extended your classmate’s vignette, please submit your GitHub handle name in the submission link provided below. This will let your instructor know that your work is ready to be peer-graded.
You should complete your submission on the schedule stated in the course syllabus.
Load the library
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
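# ggplot2 and dplyr are already attached by library(tidyverse); these calls are redundant but harmless
library(ggplot2)
library(dplyr)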
Data source
https://www.kaggle.com/ronitf/heart-disease-uci
heart <- read_csv("heart.csv")
## Rows: 303 Columns: 14
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (14): age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpea...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(heart) <- c("age", "sex", "cp", "trestbps", "chol", "fbs", "restecg", "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target")
head(heart)
## # A tibble: 6 × 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
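The column-specification message printed by read_csv() above can be quieted in the two ways the message itself suggests. The sketch below is only an illustration: it assumes readr 2.0 or later (for literal data via I()) and embeds the first three rows of the file as text so that it runs without heart.csv; `heart_small` and `csv_text` are names made up for this example.

```r
library(readr)

# First three rows of heart.csv as printed above, embedded as literal text
# (an illustrative stand-in so the example runs without the file)
csv_text <- I("age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
41,0,1,130,204,0,0,172,0,1.4,2,0,2,1")

# Option 1: state the column types explicitly (every column is a double here)
heart_small <- read_csv(csv_text, col_types = cols(.default = col_double()))

# Option 2: let readr guess the types but suppress the message
heart_quiet <- read_csv(csv_text, show_col_types = FALSE)

dim(heart_small)
```

With the real file, the same arguments would be passed to read_csv("heart.csv", ...).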
Tidyverse EXTEND
tail(heart)
## # A tibble: 6 × 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 59 1 0 164 176 1 0 90 0 1 1
## 2 57 0 0 140 241 0 1 123 1 0.2 1
## 3 45 1 3 110 264 0 1 132 0 1.2 1
## 4 68 1 0 144 193 1 1 141 0 3.4 1
## 5 57 1 0 130 131 0 1 115 1 1.2 1
## 6 57 0 1 130 236 0 0 174 0 0 1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
str(heart)
## spec_tbl_df [303 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ age : num [1:303] 63 37 41 56 57 57 56 44 52 57 ...
## $ sex : num [1:303] 1 1 0 1 0 1 0 1 1 1 ...
## $ cp : num [1:303] 3 2 1 1 0 0 1 1 2 2 ...
## $ trestbps: num [1:303] 145 130 130 120 120 140 140 120 172 150 ...
## $ chol : num [1:303] 233 250 204 236 354 192 294 263 199 168 ...
## $ fbs : num [1:303] 1 0 0 0 0 0 0 0 1 0 ...
## $ restecg : num [1:303] 0 1 0 1 1 1 0 1 1 1 ...
## $ thalach : num [1:303] 150 187 172 178 163 148 153 173 162 174 ...
## $ exang : num [1:303] 0 0 0 0 1 0 0 0 0 0 ...
## $ oldpeak : num [1:303] 2.3 3.5 1.4 0.8 0.6 0.4 1.3 0 0.5 1.6 ...
## $ slope : num [1:303] 0 0 2 2 2 1 1 2 2 2 ...
## $ ca : num [1:303] 0 0 0 0 0 0 0 0 0 0 ...
## $ thal : num [1:303] 1 2 2 2 2 1 2 3 3 2 ...
## $ target : num [1:303] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. age = col_double(),
## .. sex = col_double(),
## .. cp = col_double(),
## .. trestbps = col_double(),
## .. chol = col_double(),
## .. fbs = col_double(),
## .. restecg = col_double(),
## .. thalach = col_double(),
## .. exang = col_double(),
## .. oldpeak = col_double(),
## .. slope = col_double(),
## .. ca = col_double(),
## .. thal = col_double(),
## .. target = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(heart)
## age sex cp trestbps
## Min. :29.00 Min. :0.0000 Min. :0.000 Min. : 94.0
## 1st Qu.:47.50 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:120.0
## Median :55.00 Median :1.0000 Median :1.000 Median :130.0
## Mean :54.37 Mean :0.6832 Mean :0.967 Mean :131.6
## 3rd Qu.:61.00 3rd Qu.:1.0000 3rd Qu.:2.000 3rd Qu.:140.0
## Max. :77.00 Max. :1.0000 Max. :3.000 Max. :200.0
## chol fbs restecg thalach
## Min. :126.0 Min. :0.0000 Min. :0.0000 Min. : 71.0
## 1st Qu.:211.0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:133.5
## Median :240.0 Median :0.0000 Median :1.0000 Median :153.0
## Mean :246.3 Mean :0.1485 Mean :0.5281 Mean :149.6
## 3rd Qu.:274.5 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:166.0
## Max. :564.0 Max. :1.0000 Max. :2.0000 Max. :202.0
## exang oldpeak slope ca
## Min. :0.0000 Min. :0.00 Min. :0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.00 1st Qu.:1.000 1st Qu.:0.0000
## Median :0.0000 Median :0.80 Median :1.000 Median :0.0000
## Mean :0.3267 Mean :1.04 Mean :1.399 Mean :0.7294
## 3rd Qu.:1.0000 3rd Qu.:1.60 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :1.0000 Max. :6.20 Max. :2.000 Max. :4.0000
## thal target
## Min. :0.000 Min. :0.0000
## 1st Qu.:2.000 1st Qu.:0.0000
## Median :2.000 Median :1.0000
## Mean :2.314 Mean :0.5446
## 3rd Qu.:3.000 3rd Qu.:1.0000
## Max. :3.000 Max. :1.0000
heart
## # A tibble: 303 × 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## 7 56 0 1 140 294 0 0 153 0 1.3 1
## 8 44 1 1 120 263 0 1 173 0 0 2
## 9 52 1 2 172 199 1 1 162 0 0.5 2
## 10 57 1 2 150 168 0 1 174 0 1.6 2
## # … with 293 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## # target <dbl>
# URL of the project repository (stored for reference; not used in the code below)
urlGet <- "https://github.com/Gunduzhazal/heart"
dplyr package
glimpse(heart)
## Rows: 303
## Columns: 14
## $ age <dbl> 63, 37, 41, 56, 57, 57, 56, 44, 52, 57, 54, 48, 49, 64, 58, 5…
## $ sex <dbl> 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1…
## $ cp <dbl> 3, 2, 1, 1, 0, 0, 1, 1, 2, 2, 0, 2, 1, 3, 3, 2, 2, 3, 0, 3, 0…
## $ trestbps <dbl> 145, 130, 130, 120, 120, 140, 140, 120, 172, 150, 140, 130, 1…
## $ chol <dbl> 233, 250, 204, 236, 354, 192, 294, 263, 199, 168, 239, 275, 2…
## $ fbs <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ restecg <dbl> 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1…
## $ thalach <dbl> 150, 187, 172, 178, 163, 148, 153, 173, 162, 174, 160, 139, 1…
## $ exang <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0…
## $ oldpeak <dbl> 2.3, 3.5, 1.4, 0.8, 0.6, 0.4, 1.3, 0.0, 0.5, 1.6, 1.2, 0.2, 0…
## $ slope <dbl> 0, 0, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 0, 2, 2, 1…
## $ ca <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0…
## $ thal <dbl> 1, 2, 2, 2, 2, 1, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3…
## $ target <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
Selecting
heart_sub <- heart %>%
  select(age, sex, cp, oldpeak)
heart_sub1 <- heart %>%
  select(age:exang)
head(heart_sub)
## # A tibble: 6 × 4
## age sex cp oldpeak
## <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 2.3
## 2 37 1 2 3.5
## 3 41 0 1 1.4
## 4 56 1 1 0.8
## 5 57 0 0 0.6
## 6 57 1 0 0.4
head(heart_sub1)
## # A tibble: 6 × 9
## age sex cp trestbps chol fbs restecg thalach exang
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0
## 2 37 1 2 130 250 0 1 187 0
## 3 41 0 1 130 204 0 0 172 0
## 4 56 1 1 120 236 0 1 178 0
## 5 57 0 0 120 354 0 1 163 1
## 6 57 1 0 140 192 0 1 148 0
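Besides listing names or a range, select() accepts helper functions. The following is a minimal sketch, not part of the original vignette; `heart_small` is a hypothetical stand-in built from the first six rows shown by head(heart), restricted to a few columns.

```r
library(dplyr)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~age, ~sex, ~cp, ~trestbps, ~chol, ~oldpeak,
    63,    1,   3,       145,   233,      2.3,
    37,    1,   2,       130,   250,      3.5,
    41,    0,   1,       130,   204,      1.4,
    56,    1,   1,       120,   236,      0.8,
    57,    0,   0,       120,   354,      0.6,
    57,    1,   0,       140,   192,      0.4
)

# Drop one column with a minus sign
no_oldpeak <- heart_small %>%
  select(-oldpeak)

# Pick columns by a name pattern (here: names starting with "c")
c_cols <- heart_small %>%
  select(age, starts_with("c"))

# Move a column to the front while keeping all the others
chol_first <- heart_small %>%
  select(chol, everything())

names(c_cols)
```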
Filtering
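heart_disease <- heart %>%
  # restecg is stored as numeric codes (0, 1, 2), so comparing it to the string "Normal" matches no rows
  filter(restecg == "Normal")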
head(heart_disease)
## # A tibble: 0 × 14
## # … with 14 variables: age <dbl>, sex <dbl>, cp <dbl>, trestbps <dbl>,
## # chol <dbl>, fbs <dbl>, restecg <dbl>, thalach <dbl>, exang <dbl>,
## # oldpeak <dbl>, slope <dbl>, ca <dbl>, thal <dbl>, target <dbl>
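The empty result above comes from comparing a numeric column to a string. A sketch of filtering on the numeric codes instead is given below. Which code corresponds to a "normal" resting ECG is an assumption that should be checked against the dataset documentation; the code here only shows the mechanics, using a hypothetical `heart_small` built from the first six rows of head(heart).

```r
library(dplyr)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~age, ~chol, ~restecg,
    63,   233,        0,
    37,   250,        1,
    41,   204,        0,
    56,   236,        1,
    57,   354,        1,
    57,   192,        1
)

# Filter on a numeric code rather than a label.
# ASSUMPTION: code 0 is the category of interest; verify its meaning first.
restecg_zero <- heart_small %>%
  filter(restecg == 0)

# Match several codes at once with %in%
restecg_nonzero <- heart_small %>%
  filter(restecg %in% c(1, 2))

# Conditions separated by commas are combined with AND
combined <- heart_small %>%
  filter(restecg == 1, chol > 200)

nrow(restecg_zero)
```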
Tidyverse EXTEND
heart_disease <- heart %>%
  filter(chol > 150)
head(heart_disease)
## # A tibble: 6 × 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
heart_disease %>%
  arrange(age)
## # A tibble: 298 × 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 29 1 1 130 204 0 0 202 0 0 2
## 2 34 1 3 118 182 0 0 174 0 0 2
## 3 34 0 1 118 210 0 1 192 0 0.7 2
## 4 35 0 0 138 183 0 1 182 0 1.4 2
## 5 35 1 1 122 192 0 1 174 0 0 2
## 6 35 1 0 120 198 0 1 130 1 1.6 1
## 7 35 1 0 126 282 0 0 156 1 0 2
## 8 37 1 2 130 250 0 1 187 0 3.5 0
## 9 37 0 2 120 215 0 1 170 0 0 2
## 10 38 1 2 138 175 0 1 173 0 0 2
## # … with 288 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## # target <dbl>
heart %>%
  arrange(desc(chol))
## # A tibble: 303 × 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 67 0 2 115 564 0 0 160 0 1.6 1
## 2 65 0 2 140 417 1 0 157 0 0.8 2
## 3 56 0 0 134 409 0 0 150 1 1.9 1
## 4 63 0 0 150 407 0 0 154 0 4 1
## 5 62 0 0 140 394 0 0 157 0 1.2 1
## 6 65 0 2 160 360 0 0 151 0 0.8 2
## 7 57 0 0 120 354 0 1 163 1 0.6 2
## 8 55 1 0 132 353 0 1 132 1 1.2 1
## 9 55 0 1 132 342 0 1 166 0 1.2 2
## 10 43 0 0 132 341 1 0 136 1 3 1
## # … with 293 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## # target <dbl>
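arrange() also accepts several sort keys, and slice_max() (available in dplyr 1.0.0 and later) returns only the top rows. This is an illustrative sketch on a hypothetical `heart_small` built from the first six rows of head(heart), not output from the full dataset.

```r
library(dplyr)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~age, ~sex, ~chol,
    63,    1,   233,
    37,    1,   250,
    41,    0,   204,
    56,    1,   236,
    57,    0,   354,
    57,    1,   192
)

# Sort by sex first, then by age in descending order within each sex code
by_sex_age <- heart_small %>%
  arrange(sex, desc(age))

# Keep only the three rows with the highest cholesterol
top3_chol <- heart_small %>%
  slice_max(chol, n = 3)

top3_chol
```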
Mutating
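heart_disease %>%
  # restecg is a category code that is often 0, so dividing by it yields Inf; this ratio only demonstrates mutate() syntax
  mutate(ratio = chol / restecg) %>%
  select(restecg, chol, ratio)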
## # A tibble: 298 × 3
## restecg chol ratio
## <dbl> <dbl> <dbl>
## 1 0 233 Inf
## 2 1 250 250
## 3 0 204 Inf
## 4 1 236 236
## 5 1 354 354
## 6 1 192 192
## 7 0 294 Inf
## 8 1 263 263
## 9 1 199 199
## 10 1 168 168
## # … with 288 more rows
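The Inf values above come from dividing by a restecg code of 0. One way to avoid them, sketched here under the assumption that a missing value is preferable to Inf, is to turn zeros into NA with na_if() before dividing. The centered cholesterol column is a second, purely illustrative derived variable; `heart_small` is a hypothetical stand-in built from the first six rows of head(heart).

```r
library(dplyr)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~chol, ~restecg,
    233,        0,
    250,        1,
    204,        0,
    236,        1,
    354,        1,
    192,        1
)

with_ratio <- heart_small %>%
  mutate(
    # na_if() replaces 0 with NA, so the division returns NA instead of Inf
    ratio = chol / na_if(restecg, 0),
    # Distance of each cholesterol value from the sample mean
    chol_centered = chol - mean(chol)
  )

# Keep only the rows where the ratio could be computed
finite_only <- with_ratio %>%
  filter(!is.na(ratio))

with_ratio
```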
Counting
heart_disease %>%
  count(restecg)
## # A tibble: 3 × 2
## restecg n
## <dbl> <int>
## 1 0 146
## 2 1 148
## 3 2 4
heart_disease %>%
  count(restecg, sort = TRUE)
## # A tibble: 3 × 2
## restecg n
## <dbl> <int>
## 1 1 148
## 2 0 146
## 3 2 4
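Counts are often easier to read as proportions, and count() is shorthand for group_by() followed by summarise(). The sketch below shows both on a hypothetical `heart_small` built from the first six rows of head(heart), so the numbers differ from the full-data counts above.

```r
library(dplyr)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~chol, ~restecg,
    233,        0,
    250,        1,
    204,        0,
    236,        1,
    354,        1,
    192,        1
)

# Add the share of rows that falls into each restecg code
restecg_share <- heart_small %>%
  count(restecg) %>%
  mutate(prop = n / sum(n))

# The same grouping, extended with a per-group summary statistic
chol_by_restecg <- heart_small %>%
  group_by(restecg) %>%
  summarise(n = n(), mean_chol = mean(chol), .groups = "drop")

chol_by_restecg
```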
Renaming
heart <- heart %>%
  rename(gender = sex)
head(heart)
## # A tibble: 6 × 14
## age gender cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## # … with 3 more variables: ca <dbl>, thal <dbl>, target <dbl>
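rename() can change several columns in one call, and rename_with() applies a function to the selected names. In this sketch the new names `chest_pain` and `cholesterol` are illustrative choices, not names used elsewhere in this vignette, and `heart_small` is a hypothetical stand-in built from the first six rows of head(heart).

```r
library(dplyr)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~age, ~sex, ~cp, ~chol,
    63,    1,   3,   233,
    37,    1,   2,   250,
    41,    0,   1,   204,
    56,    1,   1,   236,
    57,    0,   0,   354,
    57,    1,   0,   192
)

# Rename several columns at once using new_name = old_name pairs
renamed <- heart_small %>%
  rename(gender = sex, chest_pain = cp, cholesterol = chol)

# Apply a function to a selection of column names
upper_c <- heart_small %>%
  rename_with(toupper, starts_with("c"))

names(renamed)
```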
Scatterplot
ggplot(data = heart, aes(x = chol, y = restecg)) +
  geom_point(alpha = 0.5) +
  labs(title = "restecg vs. chol") +
  theme_bw()
ggplot(data = heart, aes(x = chol, y = restecg, color = restecg)) +
  geom_point()
ggplot(data = heart, aes(x = chol, y = restecg)) +
  geom_point() +
  facet_wrap(~restecg)
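Because restecg takes only a few distinct codes, the scatterplots above place all points on a handful of horizontal lines. One possible alternative, sketched here as a suggestion rather than as part of the original analysis, is to convert restecg to a factor and compare the distribution of chol across its levels. `heart_small` is a hypothetical stand-in built from the first six rows of head(heart).

```r
library(dplyr)
library(ggplot2)

# Stand-in for `heart`: six rows copied from the head(heart) output above
heart_small <- tibble::tribble(
  ~chol, ~restecg,
    233,        0,
    250,        1,
    204,        0,
    236,        1,
    354,        1,
    192,        1
)

# Treat the numeric code as a category so ggplot2 uses a discrete axis
plot_data <- heart_small %>%
  mutate(restecg = factor(restecg))

# Boxplot per category, with the individual points jittered on top
p <- ggplot(plot_data, aes(x = restecg, y = chol)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.5) +
  labs(title = "chol by restecg code", x = "restecg (code)", y = "chol") +
  theme_bw()

p
```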
Barplot
ggplot(data = heart, aes(x = gender)) +
  geom_bar(fill = "green") +
  labs(title = "Bar chart of counts by gender") +
  theme_bw()
ggplot(data = heart, aes(x = gender)) +
  geom_bar(fill = "yellow") +
  labs(title = "Bar chart of counts by gender") +
  theme_bw() +
  coord_flip()
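Since gender is stored as the numbers 0 and 1, the bar charts above use a continuous x axis. A minimal sketch of a discrete version wraps the column in factor(); the axis label deliberately says "code" because the meaning of 0 and 1 should be confirmed from the dataset documentation. `heart_small` is a hypothetical stand-in built from the first six rows of head(heart) after the rename.

```r
library(dplyr)
library(ggplot2)

# Stand-in for the renamed `heart`: six rows copied from the output above
heart_small <- tibble::tribble(
  ~gender, ~target,
        1,       1,
        1,       1,
        0,       1,
        1,       1,
        0,       1,
        1,       1
)

# The counts that geom_bar() will draw, computed explicitly for reference
gender_counts <- heart_small %>%
  count(gender)

# factor() makes the x axis discrete, giving one bar per gender code
p <- ggplot(heart_small, aes(x = factor(gender))) +
  geom_bar(fill = "steelblue") +
  labs(title = "Count of patients by gender code", x = "gender (code)", y = "count") +
  theme_bw()

p
```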
Conclusion
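This vignette used the heart disease dataset to demonstrate core tidyverse tools: select(), filter(), arrange(), mutate(), count(), and rename() from dplyr, plus scatterplots and bar charts from ggplot2. Heart disease is a major health concern, and exploring variables such as cholesterol (chol) and resting ECG results (restecg) is a first step toward understanding the factors associated with it.
GitHub => https://github.com/Gunduzhazal/heart
RPubs => https://rpubs.com/gunduzhazal/832254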