TidyVerse 01
The problem statement is as follows:
In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions.
GitHub repository: https://github.com/acatlin/FALL2019TIDYVERSE
FiveThirtyEight.com datasets.
Kaggle datasets.
You have two tasks:
1. Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)
2. Extend an Existing Example. Using one of your classmates’ examples (as created above), extend his or her example with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. Minimally, you should submit .Rmd files; ideally, you should also submit an .md file and update the README.md file with your example.
After you’ve completed both parts of the assignment, please submit your GitHub handle name in the submission link provided in the week 1 folder! This will let your instructor know that your work is ready to be graded.
You should complete both parts of the assignment and make your submission no later than the end of day on Sunday, December 1st.
Data Source
Load Data
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_double(),
## sex = col_double(),
## cp = col_double(),
## trestbps = col_double(),
## chol = col_double(),
## fbs = col_double(),
## restecg = col_double(),
## thalach = col_double(),
## exang = col_double(),
## oldpeak = col_double(),
## slope = col_double(),
## ca = col_double(),
## thal = col_double(),
## target = col_double()
## )
| age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |
| 57 | 1 | 0 | 140 | 192 | 0 | 1 | 148 | 0 | 0.4 | 1 | 0 | 1 | 1 |
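The code that loads the data is not shown above. To try the examples without downloading the Kaggle file, you can rebuild the six rows printed above as a tibble with `tibble::tribble()`. This is a minimal sketch. The name `heart_sample` is introduced here for illustration and is not part of the original vignette. The full dataset would instead be read with `readr::read_csv()` from wherever you saved the CSV.

```r
library(tibble)

# Rebuild the six rows printed above; heart_sample is a hypothetical name used for illustration
heart_sample <- tribble(
  ~age, ~sex, ~cp, ~trestbps, ~chol, ~fbs, ~restecg, ~thalach, ~exang, ~oldpeak, ~slope, ~ca, ~thal, ~target,
  63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1, 1,
  37, 1, 2, 130, 250, 0, 1, 187, 0, 3.5, 0, 0, 2, 1,
  41, 0, 1, 130, 204, 0, 0, 172, 0, 1.4, 2, 0, 2, 1,
  56, 1, 1, 120, 236, 0, 1, 178, 0, 0.8, 2, 0, 2, 1,
  57, 0, 0, 120, 354, 0, 1, 163, 1, 0.6, 2, 0, 2, 1,
  57, 1, 0, 140, 192, 0, 1, 148, 0, 0.4, 1, 0, 1, 1
)

# As in the read_csv() column specification above, every column is a double
heart_sample
```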
Capability 1.
do capability tutorial (do anything)
Description: Performs arbitrary computations on the data, one group at a time when the data are grouped.
Usage: do(data, …)
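Example: We can write a helper function that sorts a data frame by age (descending) and returns its first x rows, then apply it to each age group with do(). Because the data are grouped by age, sorting by age inside a group has no effect, so the result is simply up to the first 5 rows for each age.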
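x <- 5
# Return the first x rows of t after sorting by age (a no-op inside a single-age group)
top <- function(t, x){
  t %>% arrange(desc(age)) %>% head(x)
}
# Apply top() to each age group; "." stands for the current group's rows
heart %>% group_by(age) %>% do(top(., x))
## # A tibble: 171 x 14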
## # Groups: age [41]
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 29 1 1 130 204 0 0 202 0 0 2
## 2 34 1 3 118 182 0 0 174 0 0 2
## 3 34 0 1 118 210 0 1 192 0 0.7 2
## 4 35 0 0 138 183 0 1 182 0 1.4 2
## 5 35 1 1 122 192 0 1 174 0 0 2
## 6 35 1 0 120 198 0 1 130 1 1.6 1
## 7 35 1 0 126 282 0 0 156 1 0 2
## 8 37 1 2 130 250 0 1 187 0 3.5 0
## 9 37 0 2 120 215 0 1 170 0 0 2
## 10 38 1 2 138 175 0 1 173 0 0 2
## # ... with 161 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## # target <dbl>
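In dplyr 1.0.0 and later, do() is marked as superseded. The same "first n rows per group" result can be written with `slice_head()`. This is a sketch and assumes dplyr ≥ 1.0.0. It uses a small hypothetical `heart_sample` tibble built from the age, sex and cp values of the six rows printed in the Load Data section, instead of the full `heart` data.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: age, sex, cp from the six rows printed in Load Data
heart_sample <- tibble(
  age = c(63, 37, 41, 56, 57, 57),
  sex = c(1, 1, 0, 1, 0, 1),
  cp  = c(3, 2, 1, 1, 0, 0)
)

# Same idea as top() with do(): keep at most 5 rows from each age group
first_five <- heart_sample %>%
  group_by(age) %>%
  slice_head(n = 5)

first_five
```

The sample has five distinct ages and no age appears more than five times, so every row is kept. The result is still grouped by age.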
Capability 2.
filter capability tutorial
Description: Using filter() we can keep the rows of a data frame that match one or more conditions.
Usage: filter(data, …)
Example: To select people older than 20 and younger than 65, we pass the heart data and the condition age > 20 & age < 65 to the function. It returns the matching rows of the heart disease data.
## # A tibble: 262 x 14
## age sex cp trestbps chol fbs restecg thalach exang oldpeak slope
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 63 1 3 145 233 1 0 150 0 2.3 0
## 2 37 1 2 130 250 0 1 187 0 3.5 0
## 3 41 0 1 130 204 0 0 172 0 1.4 2
## 4 56 1 1 120 236 0 1 178 0 0.8 2
## 5 57 0 0 120 354 0 1 163 1 0.6 2
## 6 57 1 0 140 192 0 1 148 0 0.4 1
## 7 56 0 1 140 294 0 0 153 0 1.3 1
## 8 44 1 1 120 263 0 1 173 0 0 2
## 9 52 1 2 172 199 1 1 162 0 0.5 2
## 10 57 1 2 150 168 0 1 174 0 1.6 2
## # ... with 252 more rows, and 3 more variables: ca <dbl>, thal <dbl>,
## # target <dbl>
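Here is a self-contained sketch of the same filter. It runs on a small hypothetical `heart_sample` tibble whose age, sex and chol values are copied from the six rows printed in the Load Data section. The second condition (chol above 240) is an arbitrary illustration and is not part of the original example.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: age, sex, chol from the six rows printed in Load Data
heart_sample <- tibble(
  age  = c(63, 37, 41, 56, 57, 57),
  sex  = c(1, 1, 0, 1, 0, 1),
  chol = c(233, 250, 204, 236, 354, 192)
)

# Conditions separated by commas are combined with AND, so this equals age > 20 & age < 65
working_age <- heart_sample %>% filter(age > 20, age < 65)

# Conditions can involve different columns: rows with sex == 0 and chol above 240
high_chol_sex0 <- heart_sample %>% filter(sex == 0, chol > 240)
```

All six sample ages fall between 20 and 65, so `working_age` keeps every row. Only one sample row has sex == 0 and chol above 240.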
Capability 3.
select capability tutorial
Description: Using select() we can keep only the named variables (columns).
Usage: select(data, …)
Example: To keep only the age, sex, and cp variables, we pass the heart data and age, sex, cp to the function. The output below shows the first six rows.
## # A tibble: 6 x 3
## age sex cp
## <dbl> <dbl> <dbl>
## 1 63 1 3
## 2 37 1 2
## 3 41 0 1
## 4 56 1 1
## 5 57 0 0
## 6 57 1 0
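select() also accepts ranges, negation and name-matching helpers. The sketch below shows these on a small hypothetical `heart_sample` tibble with five columns. The values are copied from the first three rows printed in the Load Data section.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: first three rows, first five columns of the data
heart_sample <- tibble(
  age = c(63, 37, 41), sex = c(1, 1, 0), cp = c(3, 2, 1),
  trestbps = c(145, 130, 130), chol = c(233, 250, 204)
)

# Keep columns by name, in the order given
by_name <- heart_sample %>% select(age, sex, cp)

# A colon selects a contiguous range of columns: age through cp
by_range <- heart_sample %>% select(age:cp)

# A minus sign drops a column instead of keeping it
without_chol <- heart_sample %>% select(-chol)

# starts_with() keeps columns whose names begin with "c" (here cp and chol)
c_columns <- heart_sample %>% select(starts_with("c"))
```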
Capability 4.
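arrange capability tutorial
Description: Using arrange() we can order the rows by the values of one or more variables or expressions.
Usage: arrange(data, …)
Example: To arrange the rows by sex, and then by age within each sex (both ascending), we pass the heart data and sex, age to the function. The output below shows the first six rows of the age, sex, and cp columns.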
## # A tibble: 6 x 3
## age sex cp
## <dbl> <dbl> <dbl>
## 1 34 0 1
## 2 35 0 0
## 3 37 0 2
## 4 39 0 2
## 5 39 0 2
## 6 41 0 1
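The sketch below sorts the rows by sex and age on a small hypothetical `heart_sample` tibble. Its age, sex and cp values are copied from the six rows printed in the Load Data section. It also shows `desc()`, which reverses the sort order for a single variable.

```r
library(dplyr)
library(tibble)

# Hypothetical sample: age, sex, cp from the six rows printed in Load Data
heart_sample <- tibble(
  age = c(63, 37, 41, 56, 57, 57),
  sex = c(1, 1, 0, 1, 0, 1),
  cp  = c(3, 2, 1, 1, 0, 0)
)

# Sort by sex first, then by age within each sex (both ascending)
by_sex_age <- heart_sample %>% arrange(sex, age)

# desc() reverses the order for age only: oldest first within each sex
by_sex_age_desc <- heart_sample %>% arrange(sex, desc(age))
```

In both results the rows with sex 0 come first. Only the order of ages within each sex changes.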
Marker: 607-13