The problem statement is as follows:
In this assignment, you’ll practice collaborating around a code project with GitHub. You can think of our collective work as building a book of examples showing how to use TidyVerse functions.
GitHub repository: https://github.com/acatlin/FALL2019TIDYVERSE
FiveThirtyEight.com datasets.
Kaggle datasets.
You have two tasks:
1. Create an Example. Using one or more TidyVerse packages and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more capabilities of the selected TidyVerse package with your selected dataset. (25 points)
2. Extend an Existing Example. Using one of your classmate’s examples (as created above), extend his or her example with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, make a pull request on the shared repository. At a minimum, you should submit .Rmd files; ideally, you should also submit an .md file and update the README.md file with your example.
After you’ve completed both parts of the assignment, please submit your GitHub handle through the submission link provided in the week 1 folder! This will let your instructor know that your work is ready to be graded.
You should complete both parts of the assignment and make your submission no later than the end of day on Sunday, December 1st.
Data Source
Load Data
## Parsed with column specification:
## cols(
## age = col_double(),
## sex = col_double(),
## cp = col_double(),
## trestbps = col_double(),
## chol = col_double(),
## fbs = col_double(),
## restecg = col_double(),
## thalach = col_double(),
## exang = col_double(),
## oldpeak = col_double(),
## slope = col_double(),
## ca = col_double(),
## thal = col_double(),
## target = col_double()
## )
| age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |
| 57 | 1 | 0 | 140 | 192 | 0 | 1 | 148 | 0 | 0.4 | 1 | 0 | 1 | 1 |
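The loading chunk itself is not shown above. A minimal sketch of that step, assuming the Kaggle file is read with readr's `read_csv()`: because the actual file name is not given here, the sketch writes the six rows from the table to a temporary file (`csv_path` is a hypothetical name) so that it runs on its own.

```r
library(readr)

# Hypothetical stand-in for the downloaded Kaggle file: the six rows shown above
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target",
  "63,1,3,145,233,1,0,150,0,2.3,0,0,1,1",
  "37,1,2,130,250,0,1,187,0,3.5,0,0,2,1",
  "41,0,1,130,204,0,0,172,0,1.4,2,0,2,1",
  "56,1,1,120,236,0,1,178,0,0.8,2,0,2,1",
  "57,0,0,120,354,0,1,163,1,0.6,2,0,2,1",
  "57,1,0,140,192,0,1,148,0,0.4,1,0,1,1"
), csv_path)

# Every column is read as double, matching the column specification printed above;
# stating col_types explicitly also suppresses that "Parsed with column specification" message.
heart <- read_csv(csv_path, col_types = cols(.default = col_double()))
head(heart)
```

With the real download you would pass its path to `read_csv()` instead of `csv_path`.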
Capability 1.
slice capability tutorial
Description: Using `slice()`, we can select rows by their position (row number).
Usage: slice(.data, …)
Example: To select rows 6 to 12:
## # A tibble: 7 x 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 57 1 0 140 192 0 1 148 0 0.4
## 2 56 0 1 140 294 0 0 153 0 1.3
## 3 44 1 1 120 263 0 1 173 0 0
## 4 52 1 2 172 199 1 1 162 0 0.5
## 5 57 1 2 150 168 0 1 174 0 1.6
## 6 54 1 0 140 239 0 1 160 0 1.2
## 7 48 0 2 130 275 0 1 139 0 0.2
## # ... with 4 more variables: slope <dbl>, ca <dbl>, thal <dbl>,
## # target <dbl>
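A self-contained sketch of the same kind of call, assuming dplyr and tibble are installed. Only the six rows reproduced in the table above are typed in (a subset of the columns, stored in a hypothetical `heart_head` tibble), so it selects rows 2 to 4 rather than 6 to 12; on the full data the equivalent call would be `slice(heart, 6:12)`, assuming the loaded data is named `heart`.

```r
library(dplyr)
library(tibble)

# The six rows shown in the table above, with a subset of the columns
heart_head <- tribble(
  ~age, ~sex, ~trestbps, ~chol, ~thalach,
    63,    1,       145,   233,      150,
    37,    1,       130,   250,      187,
    41,    0,       130,   204,      172,
    56,    1,       120,   236,      178,
    57,    0,       120,   354,      163,
    57,    1,       140,   192,      148
)

# Keep rows 2, 3 and 4 by position
rows_2_to_4 <- slice(heart_head, 2:4)
rows_2_to_4
```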
Alternatively, ranges and single rows can be combined in one call. To select rows 10 to 15, row 18, and rows 299 to 302:
## # A tibble: 11 x 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 57 1 2 150 168 0 1 174 0 1.6
## 2 54 1 0 140 239 0 1 160 0 1.2
## 3 48 0 2 130 275 0 1 139 0 0.2
## 4 49 1 1 130 266 0 1 171 0 0.6
## 5 64 1 3 110 211 0 0 144 1 1.8
## 6 58 0 3 150 283 1 0 162 0 1
## 7 66 0 3 150 226 0 1 114 0 2.6
## 8 57 0 0 140 241 0 1 123 1 0.2
## 9 45 1 3 110 264 0 1 132 0 1.2
## 10 68 1 0 144 193 1 1 141 0 3.4
## 11 57 1 0 130 131 0 1 115 1 1.2
## # ... with 4 more variables: slope <dbl>, ca <dbl>, thal <dbl>,
## # target <dbl>
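A sketch of combining positions, again on a hypothetical six-row `heart_head` sample typed in from the table above (assuming dplyr and tibble). Wrapping the positions in `c()` builds one index vector; here it keeps rows 1 to 2, row 4 and row 6.

```r
library(dplyr)
library(tibble)

# The six rows shown in the table above, with a subset of the columns
heart_head <- tribble(
  ~age, ~sex, ~trestbps, ~chol, ~thalach,
    63,    1,       145,   233,      150,
    37,    1,       130,   250,      187,
    41,    0,       130,   204,      172,
    56,    1,       120,   236,      178,
    57,    0,       120,   354,      163,
    57,    1,       140,   192,      148
)

# A range plus two single positions, combined into one index vector
picked <- slice(heart_head, c(1:2, 4, 6))
picked
```

On the full data, the example above would correspond to `slice(heart, c(10:15, 18, 299:302))`, assuming the data is named `heart`.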
Capability 2.
mutate capability tutorial
Description: Creates new columns computed from existing ones.
Usage: mutate(.data, …)
Example: Compute the ratio of resting blood pressure (trestbps) to cholesterol (chol) in a new column named “Ratio” (only the relevant columns are shown below).
## # A tibble: 303 x 3
## trestbps chol Ratio
## <dbl> <dbl> <dbl>
## 1 145 233 0.622
## 2 130 250 0.52
## 3 130 204 0.637
## 4 120 236 0.508
## 5 120 354 0.339
## 6 140 192 0.729
## 7 140 294 0.476
## 8 120 263 0.456
## 9 172 199 0.864
## 10 150 168 0.893
## # ... with 293 more rows
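A minimal sketch of this step on the hypothetical six-row `heart_head` sample (assuming dplyr and tibble). The original chunk is not shown; the `select()` call is an assumption, added so the result has the same three columns as the printed output.

```r
library(dplyr)
library(tibble)

# The six rows shown in the table above, with a subset of the columns
heart_head <- tribble(
  ~age, ~sex, ~trestbps, ~chol, ~thalach,
    63,    1,       145,   233,      150,
    37,    1,       130,   250,      187,
    41,    0,       130,   204,      172,
    56,    1,       120,   236,      178,
    57,    0,       120,   354,      163,
    57,    1,       140,   192,      148
)

# Add Ratio = resting blood pressure / cholesterol, then keep only the relevant columns
heart_ratio <- heart_head %>%
  mutate(Ratio = trestbps / chol) %>%
  select(trestbps, chol, Ratio)
heart_ratio
```

The first two Ratio values (about 0.622 and 0.52) agree with the first two rows of the printed output.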
Capability 3.
summarise capability tutorial
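Description: Reduces the data (or each group, if the data is grouped) to a single row by applying functions that return a single value, such as mean() or median(). Several summaries can be computed in the same call.
Usage: summarise(.data, …)
Example: Let’s compute the mean and median cholesterol (chol), along with the mean and median maximum heart rate achieved (thalach).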
## # A tibble: 1 x 4
## Avg_cholesteral Median_cholesteral Avg_heartrate Median_heartrate
## <dbl> <dbl> <dbl> <dbl>
## 1 246. 240 150. 153
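A minimal sketch of the same summary on the hypothetical six-row `heart_head` sample (assuming dplyr and tibble). The output column names here are spelled “cholesterol” and are illustrative; they are not necessarily the names used in the original chunk.

```r
library(dplyr)
library(tibble)

# The six rows shown in the table above, with a subset of the columns
heart_head <- tribble(
  ~age, ~sex, ~trestbps, ~chol, ~thalach,
    63,    1,       145,   233,      150,
    37,    1,       130,   250,      187,
    41,    0,       130,   204,      172,
    56,    1,       120,   236,      178,
    57,    0,       120,   354,      163,
    57,    1,       140,   192,      148
)

# Four single-value summaries computed in one summarise() call
heart_summary <- heart_head %>%
  summarise(
    Avg_cholesterol    = mean(chol),
    Median_cholesterol = median(chol),
    Avg_heartrate      = mean(thalach),
    Median_heartrate   = median(thalach)
  )
heart_summary
```

On these six rows the medians are 234.5 (chol) and 167.5 (thalach); the values printed above come from all 303 rows.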
Capability 4.
group_by capability tutorial (with tally)
Description: Using group_by() together with tally(), we can count how many rows fall into each category.
Usage: group_by(.data, …) %>% tally()
Example: Count the patients by sex.
## # A tibble: 2 x 2
## sex n
## <dbl> <int>
## 1 0 96
## 2 1 207
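A minimal sketch of the grouped count on the hypothetical six-row `heart_head` sample (assuming dplyr and tibble). It also shows `count()`, the dplyr shortcut that combines `group_by()` and `tally()` in one call.

```r
library(dplyr)
library(tibble)

# The six rows shown in the table above, with a subset of the columns
heart_head <- tribble(
  ~age, ~sex, ~trestbps, ~chol, ~thalach,
    63,    1,       145,   233,      150,
    37,    1,       130,   250,      187,
    41,    0,       130,   204,      172,
    56,    1,       120,   236,      178,
    57,    0,       120,   354,      163,
    57,    1,       140,   192,      148
)

# One row per sex value, with the number of patients in column n
sex_counts <- heart_head %>%
  group_by(sex) %>%
  tally()
sex_counts

# Equivalent shortcut: group and count in a single call
sex_counts_2 <- count(heart_head, sex)
sex_counts_2
```

On the six sample rows both calls report 2 rows with sex = 0 and 4 with sex = 1.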