This tutorial is part of the dplyr training series, and a YouTube video accompanies it.
The complete series on dplyr is available here:
https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR
dplyr is a great tool to use in R.
Its commands may look long and overwhelming to someone who has not used dplyr, but that is not the case.
Once you learn the basics, dplyr is very intuitive to use, just like forming sentences once you have learnt the basic words of a language.
This tutorial is for beginners and experienced R users who want to learn the various dplyr commands.
This tutorial covers the practical aspects of the dplyr::case_when command.
library(ggplot2)
library(dplyr)
library(wakefield)
Let us create some data for patient ages.
wakefield::age(10) # draw 10 ages with default values
## [1] 38 40 39 29 34 26 65 87 85 58
In this use case, we assume that we received some data from a hospital which only had each patient's date of birth. We will:
- Convert the date of birth to AgeYears
- Create age groups
- Create a frequency table
- Create some charts showing the age distribution
Using the wakefield package, we will create some data containing the patients' dates of birth.
Patients <- wakefield::dob(5000
, random = TRUE
, start = Sys.Date() - 365 * 120
, k = 365 * 120
, by = "1 days" )
Patients <- Patients%>%
data.frame()
names(Patients) <- c("DOB")
Patients <- Patients%>%
dplyr::mutate(AgeYears = floor((Sys.Date() - DOB)/365.25)) %>%
dplyr::mutate(AgeYears = as.integer(AgeYears))
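To see what the age calculation does, here is a minimal sketch on a few hand-picked dates. The fixed reference date ref_date, the toy data frame, and the DaysLived column are assumptions made for illustration (the tutorial itself uses Sys.Date()), chosen so that the result is reproducible.

```r
library(dplyr)

# Hypothetical fixed reference date, used instead of Sys.Date() so the result never changes
ref_date <- as.Date("2024-01-01")

# Hypothetical toy data: four dates of birth
toy <- data.frame(
  DOB = as.Date(c("2000-01-01", "1990-06-15", "2023-12-31", "2001-01-01"))
)

toy <- toy %>%
  dplyr::mutate(DaysLived = as.numeric(ref_date - DOB)) %>%        # age in days
  dplyr::mutate(AgeYears = as.integer(floor(DaysLived / 365.25)))  # same 365.25 rule as above

toy
# AgeYears comes out as 24, 33, 0, 22
```

The last row is worth a look: that patient turns 23 on the reference date, but dividing by 365.25 gives 22 because only five leap days fall inside that span. The approximation can be off by one on or near a birthday, which is usually acceptable when the ages are only used for ten-year groups.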
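We will use the case_when command from dplyr to convert AgeYears into age groups. The conditions are checked in order, and the final TRUE condition catches any age that falls outside every range.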
d1 <- Patients %>%
dplyr::arrange(AgeYears)%>%
dplyr::mutate(AgeGroup = case_when(between(AgeYears, 0,10) ~ '0-10 Years'
,between(AgeYears, 11,20) ~ '11-20 Years'
,between(AgeYears, 21,30) ~ '21-30 Years'
,between(AgeYears, 31,40) ~ '31-40 Years'
,between(AgeYears, 41,50) ~ '41-50 Years'
,between(AgeYears, 51,60) ~ '51-60 Years'
,between(AgeYears, 61,70) ~ '61-70 Years'
,between(AgeYears, 71,80) ~ '71-80 Years'
,between(AgeYears, 81,120) ~ '81+ Years'
,TRUE ~ 'Unknown'
))
d1
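A small sketch may help to show how case_when behaves at the edges of each range. The ages vector below is made up for illustration and uses only three of the ranges; it includes a missing value and an age outside every range to show where the TRUE fallback applies.

```r
library(dplyr)

# Hypothetical ages: two boundary values (10 and 11), a missing value, and an out-of-range value
ages <- c(5, 10, 11, 25, NA, 130)

age_group <- dplyr::case_when(
  dplyr::between(ages, 0, 10)  ~ "0-10 Years",   # between() includes both end points
  dplyr::between(ages, 11, 20) ~ "11-20 Years",
  dplyr::between(ages, 21, 30) ~ "21-30 Years",
  TRUE                         ~ "Unknown"       # used when no condition above is TRUE
)

data.frame(ages, age_group)
```

Both the missing age and the age of 130 end up as "Unknown": between() returns NA for a missing value, case_when treats that as no match, and the TRUE line picks it up.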
Group the data by AgeGroup, count the patients in each group (n), and calculate each group's share of the total (pct).
d2 <- d1 %>%
dplyr::group_by(AgeGroup)%>%
dplyr::tally()%>%
dplyr::mutate(pct = n/sum(n))
d2
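The frequency table can be checked by hand on a tiny example. The four-row toy data frame below is made up for illustration. The sketch also shows dplyr::count(), which we believe gives the same counts as group_by() followed by tally() in one step.

```r
library(dplyr)

# Hypothetical toy data: four patients in three age groups
toy <- data.frame(
  AgeGroup = c("0-10 Years", "0-10 Years", "11-20 Years", "21-30 Years")
)

# Same pattern as above: group, count, then divide by the total
freq <- toy %>%
  dplyr::group_by(AgeGroup) %>%
  dplyr::tally() %>%
  dplyr::mutate(pct = n / sum(n))

# Shorter alternative using count()
freq_short <- toy %>%
  dplyr::count(AgeGroup) %>%
  dplyr::mutate(pct = n / sum(n))

freq
# n is 2, 1, 1 and pct is 0.50, 0.25, 0.25
```

Note that pct is a proportion between 0 and 1, so the values add up to 1. Multiply by 100, or format it with scales::percent, when a percentage is needed.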
pl <- ggplot(data = d1,aes(x= AgeYears))
pl <- pl + geom_histogram(fill = "red", alpha = 0.2)
pl <- pl + theme_bw()
pl
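When no bin settings are given, geom_histogram falls back to 30 bins and prints a message suggesting a better choice. As an optional variation, the sketch below sets the bin edges every 10 years so that the bars roughly line up with the age groups. The simulated toy data and the seed are assumptions made for illustration, not the patient data created above.

```r
library(ggplot2)

set.seed(42)  # make the simulated ages reproducible

# Hypothetical toy data: 500 ages drawn uniformly from 0 to 119
toy <- data.frame(AgeYears = sample(0:119, size = 500, replace = TRUE))

pl <- ggplot(data = toy, aes(x = AgeYears))
# Bin edges at 0, 10, ..., 120. Bins are right-closed by default,
# so ages 11 to 20 fall into the same bar.
pl <- pl + geom_histogram(breaks = seq(0, 120, by = 10), fill = "red", alpha = 0.2)
pl <- pl + theme_bw()
pl
```

A fixed set of breaks keeps the chart comparable from run to run, which helps here because the patient data is regenerated randomly each time.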
pl <- ggplot(data = d1,aes(x= AgeYears))
pl <- pl + geom_density()
pl <- pl + theme_bw()
pl
pl <- ggplot(data = d2,aes(x= AgeGroup, y = n))
pl <- pl + geom_bar(stat ="identity")
pl <- pl + theme_bw()
pl
library(scales)
pl <- ggplot(data = d2,aes(x= AgeGroup, y = pct))
pl <- pl + geom_bar(stat ="identity")
pl <- pl + theme_bw()
pl <- pl + scale_y_continuous(labels = scales::percent)
pl
pl <- ggplot(data = d2,aes(x= AgeGroup, y = n))
pl <- pl + geom_line(group=1)
pl <- pl + theme_bw()
pl
pl <- ggplot(data = d2,aes(x= AgeGroup, y = n))
pl <- pl + geom_point()
pl <- pl + theme_bw()
pl