TidyVerse Assignment Part 01

The problem statement is as follows:

In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.

GitHub repository: https://github.com/acatlin/FALL2019TIDYVERSE

FiveThirtyEight.com datasets.

Kaggle datasets.

You have two tasks:

Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)

Extend an Existing Example. Using one of your classmates’ examples (as created above), extend his or her example with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. At a minimum, you should submit .Rmd files; ideally, you should also submit an .md file and update the README.md file with your example.

After you’ve completed both parts of the assignment, please submit your GitHub handle name in the submission link provided in the week 1 folder! This will let your instructor know that your work is ready to be graded.

You should complete both parts of the assignment and make your submission no later than the end of day on Sunday, December 1st.

library(tidyverse)
library(knitr)

Data Source

https://www.kaggle.com/ronitf/heart-disease-uci

Load Data

heart <- read_csv("https://raw.githubusercontent.com/forhadakbar/data607fall2019/master/Week%2014/heart.csv")

## Parsed with column specification:
## cols(
##   age = col_double(),
##   sex = col_double(),
##   cp = col_double(),
##   trestbps = col_double(),
##   chol = col_double(),
##   fbs = col_double(),
##   restecg = col_double(),
##   thalach = col_double(),
##   exang = col_double(),
##   oldpeak = col_double(),
##   slope = col_double(),
##   ca = col_double(),
##   thal = col_double(),
##   target = col_double()
## )

kable(head(heart))

age	sex	cp	trestbps	chol	fbs	restecg	thalach	exang	oldpeak	slope	thal	target
63	1	3	145	233	1	0	150	0	2.3	0	1	1
37	1	2	130	250	0	1	187	0	3.5	0	2	1
41	0	1	130	204	0	0	172	0	1.4	2	2	1
56	1	1	120	236	0	1	178	0	0.8	2	2	1
57	0	0	120	354	0	1	163	1	0.6	2	2	1
57	1	0	140	192	0	1	148	0	0.4	1	1	1

Capability 1.

slice capability tutorial

Description: slice() selects rows by their position (row number).
Usage: slice(.data, …)
Example: Select rows 6 to 12.

slice(heart, 6:12)

## # A tibble: 7 x 14
##     age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak
##   <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>
## 1    57     1     0      140   192     0       1     148     0     0.4
## 2    56     0     1      140   294     0       0     153     0     1.3
## 3    44     1     1      120   263     0       1     173     0     0  
## 4    52     1     2      172   199     1       1     162     0     0.5
## 5    57     1     2      150   168     0       1     174     0     1.6
## 6    54     1     0      140   239     0       1     160     0     1.2
## 7    48     0     2      130   275     0       1     139     0     0.2
## # ... with 4 more variables: slope <dbl>, ca <dbl>, thal <dbl>,
## #   target <dbl>

To select rows 10 to 15, row 18, and rows 299 to 302, combine the positions with c():

heart %>% slice(c(10:15, 18, 299:302))

## # A tibble: 11 x 14
##      age   sex    cp trestbps  chol   fbs restecg thalach exang oldpeak
##    <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>
##  1    57     1     2      150   168     0       1     174     0     1.6
##  2    54     1     0      140   239     0       1     160     0     1.2
##  3    48     0     2      130   275     0       1     139     0     0.2
##  4    49     1     1      130   266     0       1     171     0     0.6
##  5    64     1     3      110   211     0       0     144     1     1.8
##  6    58     0     3      150   283     1       0     162     0     1  
##  7    66     0     3      150   226     0       1     114     0     2.6
##  8    57     0     0      140   241     0       1     123     1     0.2
##  9    45     1     3      110   264     0       1     132     0     1.2
## 10    68     1     0      144   193     1       1     141     0     3.4
## 11    57     1     0      130   131     0       1     115     1     1.2
## # ... with 4 more variables: slope <dbl>, ca <dbl>, thal <dbl>,
## #   target <dbl>
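
slice() also accepts negative positions (to drop rows) and the helper n() (the number of rows). The sketch below is a minimal illustration, not part of the original vignette: the tibble `sample_rows` is a hypothetical stand-in built from the age and chol values of the first six rows shown by head(heart), so that it runs without downloading the data.

```r
library(dplyr)
library(tibble)

# Illustrative stand-in: age and chol copied from the first six rows of heart
sample_rows <- tibble(
  age  = c(63, 37, 41, 56, 57, 57),
  chol = c(233, 250, 204, 236, 354, 192)
)

# Negative positions drop rows: keep everything except rows 1 and 2
without_first_two <- slice(sample_rows, -(1:2))

# n() is the number of rows in the data, so this keeps only the last row
last_row <- slice(sample_rows, n())

without_first_two
last_row
```

The same calls should work on the full heart data, for example heart %>% slice(n()) for its last row.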

Capability 2.

mutate capability tutorial

Description: mutate() adds new columns computed from existing ones and keeps the original columns.
Usage: mutate(.data, …)
Example: Compute the ratio of resting blood pressure (trestbps) to cholesterol (chol) in a new column named “Ratio”.

# Keep the two input columns, then add Ratio = trestbps / chol
heart %>% select(trestbps, chol) %>% mutate(Ratio = trestbps / chol)

## # A tibble: 303 x 3
##    trestbps  chol Ratio
##       <dbl> <dbl> <dbl>
##  1      145   233 0.622
##  2      130   250 0.52 
##  3      130   204 0.637
##  4      120   236 0.508
##  5      120   354 0.339
##  6      140   192 0.729
##  7      140   294 0.476
##  8      120   263 0.456
##  9      172   199 0.864
## 10      150   168 0.893
## # ... with 293 more rows
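
A single mutate() call can create several columns, and a later column can use one defined earlier in the same call. The following is a small sketch under stated assumptions: `bp_chol` is a hypothetical tibble holding the first three trestbps and chol values from the output above, and the 0.6 cutoff behind `High_ratio` is an arbitrary illustration with no clinical meaning.

```r
library(dplyr)
library(tibble)

# Illustrative stand-in: first three trestbps/chol pairs from heart
bp_chol <- tibble(
  trestbps = c(145, 130, 130),
  chol     = c(233, 250, 204)
)

with_flags <- bp_chol %>%
  mutate(
    Ratio      = trestbps / chol,  # new column from existing columns
    High_ratio = Ratio > 0.6       # uses Ratio, created on the line above
  )

with_flags
```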

Capability 3.

summarise capability tutorial

Description: summarise() collapses the data to a single row by applying functions that return one value each (such as mean or median). Several summaries can be computed in the same call.
Usage: summarise(.data, …)
Example: Let’s compute the mean and median cholesterol (chol) along with the mean and median maximum heart rate achieved (thalach).

heart %>% summarise(Avg_cholesterol = mean(chol), Median_cholesterol = median(chol), Avg_heartrate = mean(thalach), Median_heartrate = median(thalach))

## # A tibble: 1 x 4
##   Avg_cholesterol Median_cholesterol Avg_heartrate Median_heartrate
##             <dbl>              <dbl>         <dbl>            <dbl>
## 1            246.                240          150.              153
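
Any function that returns a single value can be used inside summarise(), including n() for the row count. As a hedged illustration, the sketch below runs on `first_six`, a hypothetical tibble holding the chol and thalach values of the first six rows shown by head(heart), so its results describe only those six rows and not the full dataset.

```r
library(dplyr)
library(tibble)

# Illustrative stand-in: chol and thalach from the first six rows of heart
first_six <- tibble(
  chol    = c(233, 250, 204, 236, 354, 192),
  thalach = c(150, 187, 172, 178, 163, 148)
)

chol_summary <- first_six %>%
  summarise(
    Rows          = n(),           # number of rows summarised
    Avg_chol      = mean(chol),
    Median_chol   = median(chol),
    Max_heartrate = max(thalach)
  )

chol_summary
```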

Capability 4.

group_by capability tutorial with tally

Description: group_by() splits the data into groups defined by one or more variables; tally() then counts the rows in each group.
Usage: group_by(.data, …)
Example: Count the rows for each value of sex.

heart %>% group_by(sex) %>% tally()

## # A tibble: 2 x 2
##     sex     n
##   <dbl> <int>
## 1     0    96
## 2     1   207
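
group_by() is not limited to tally(): following it with summarise() gives one summary row per group, and count() is a shorthand for the group_by() plus tally() pattern. The sketch below is illustrative only; `first_six` is a hypothetical tibble built from the sex and chol values of the first six rows shown by head(heart), so the numbers describe just those rows.

```r
library(dplyr)
library(tibble)

# Illustrative stand-in: sex and chol from the first six rows of heart
first_six <- tibble(
  sex  = c(1, 1, 0, 1, 0, 1),
  chol = c(233, 250, 204, 236, 354, 192)
)

# One summary row per value of sex
by_sex <- first_six %>%
  group_by(sex) %>%
  summarise(n = n(), Avg_chol = mean(chol))

# count() gives the same counts as group_by() followed by tally()
counted <- first_six %>% count(sex)
tallied <- first_six %>% group_by(sex) %>% tally()

by_sex
counted
```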