## Load the library
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.0 ✔ dplyr 0.8.0.1
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
https://www.kaggle.com/ronitf/heart-disease-uci
## Read the data using readr
r disease <- read_csv("heart.csv")
## Parsed with column specification: ## cols( ## age = col_double(), ## sex = col_double(), ## cp = col_double(), ## trestbps = col_double(), ## chol = col_double(), ## fbs = col_double(), ## restecg = col_double(), ## thalach = col_double(), ## exang = col_double(), ## oldpeak = col_double(), ## slope = col_double(), ## ca = col_double(), ## thal = col_double(), ## target = col_double() ## )
r head(disease)
## # A tibble: 6 x 14 ## age sex cp trestbps chol fbs restecg thalach exang oldpeak ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 63 1 3 145 233 1 0 150 0 2.3 ## 2 37 1 2 130 250 0 1 187 0 3.5 ## 3 41 0 1 130 204 0 0 172 0 1.4 ## 4 56 1 1 120 236 0 1 178 0 0.8 ## 5 57 0 0 120 354 0 1 163 1 0.6 ## 6 57 1 0 140 192 0 1 148 0 0.4 ## # … with 4 more variables: slope <dbl>, ca <dbl>, thal <dbl>, target <dbl>
The selected package I want to use id dplyr.
## Capability 1.
### filter capability tutorial
### Description
Using filter we can select rows of the data frame matching conditions.
### Usage filter(data) ### Example
To select the people of over 20 and less than 65 we can pass the data disease and condtion age>20 and age < 65 to the function . It’ll return matching rows of heart disease.
r filter(disease, age>20 & age < 65)
## # A tibble: 262 x 14 ## age sex cp trestbps chol fbs restecg thalach exang oldpeak ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 63 1 3 145 233 1 0 150 0 2.3 ## 2 37 1 2 130 250 0 1 187 0 3.5 ## 3 41 0 1 130 204 0 0 172 0 1.4 ## 4 56 1 1 120 236 0 1 178 0 0.8 ## 5 57 0 0 120 354 0 1 163 1 0.6 ## 6 57 1 0 140 192 0 1 148 0 0.4 ## 7 56 0 1 140 294 0 0 153 0 1.3 ## 8 44 1 1 120 263 0 1 173 0 0 ## 9 52 1 2 172 199 1 1 162 0 0.5 ## 10 57 1 2 150 168 0 1 174 0 1.6 ## # … with 252 more rows, and 4 more variables: slope <dbl>, ca <dbl>, ## # thal <dbl>, target <dbl>
## Capability 2.
### select capability tutorial
###Description
Using select we can keep the selected variables
###sage select(data, …)
###Example
To keep only age, sex,cp variable we can pass the data disease and age, sex,cp to the function .
r df<- select(disease, c("age","sex","cp")) head(df)
## # A tibble: 6 x 3 ## age sex cp ## <dbl> <dbl> <dbl> ## 1 63 1 3 ## 2 37 1 2 ## 3 41 0 1 ## 4 56 1 1 ## 5 57 0 0 ## 6 57 1 0
## Extended by Santosh Cheruku ### Using dplyr::Arrange - To change the ordering of rows
### Description
Using arrange we can order the rows in an expression involving variables
### Example To arrange the rows by sex and age
df <- df %>% arrange(sex, age)
head(df)
## # A tibble: 6 x 3
## age sex cp
## <dbl> <dbl> <dbl>
## 1 34 0 1
## 2 35 0 0
## 3 37 0 2
## 4 39 0 2
## 5 39 0 2
## 6 41 0 1
tail(df)
## # A tibble: 6 x 3
## age sex cp
## <dbl> <dbl> <dbl>
## 1 69 1 2
## 2 70 1 1
## 3 70 1 0
## 4 70 1 0
## 5 70 1 2
## 6 77 1 0
We can use ggplot() to visualize the data, by supplying the ggplot() with the data, aesthetics, and mapping accordingly to the variables. * We can use ggplot2() to plot a bubble chart of cholesterol vs. age. The size of the points get larger as cholesterol increases. We do this by setting the size property of aesthetic to the cholesterol variable.
library(ggplot2)
ggplot(disease, aes(age, chol)) + geom_point(aes(size = chol), colour = "blue")
ggplot(data = df) + geom_bar(mapping = aes(x = age, y = cp), stat = "identity")
ggplot(df, aes(age, cp)) + geom_jitter(aes(colour = cp))