This tutorial is part of the dplyr training series. Here is the YouTube video link for this tutorial:

https://youtu.be/3Pq_sEh9IRE

Here is the link to the complete dplyr series:

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR

Why dplyr

dplyr is an R package for data manipulation.

Its commands may look long and overwhelming to someone who has not used dplyr, but they are simpler than they appear.

Once you have learned the basics, it is very intuitive to use, much like forming sentences once you have learned the basic words of a language.

Audience

This tutorial is for beginners and experienced R users who want to learn the various dplyr commands.

dplyr: case_when

In this tutorial we will cover all the practical aspects of the dplyr::case_when command. It is part of a series of tutorials on the practical aspects of dplyr, and all of the videos are available in a single YouTube playlist:

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR

Create sample dataset

library(ggplot2)
library(dplyr)
library(wakefield)

Sample dataset.

Let us create some data for patient ages.

wakefield::age(10) # draw 10 ages with default values
##  [1] 38 40 39 29 34 26 65 87 85 58
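These ages are random, so rerunning the line prints different values. Calling set.seed() before a draw makes it repeatable. Assuming wakefield's generators use R's random number generator (worth checking for any given function), the same idea should apply to wakefield::age(). This sketch uses base R sample() so it runs without wakefield; the seed value 2023 is arbitrary.

```r
# Draw 10 ages between 0 and 100 twice, resetting the same seed each time
set.seed(2023)
first_draw <- sample(0:100, size = 10, replace = TRUE)

set.seed(2023)
second_draw <- sample(0:100, size = 10, replace = TRUE)

identical(first_draw, second_draw)  # TRUE: same seed, same draws
```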

In this use case, we assume that we received data from a hospital that contains only each patient's date of birth. We will convert the date of birth to age in years (AgeYears), group the ages into age groups, create a frequency table, and create some charts showing the age distribution.

Create dates of birth

Using the wakefield package, we will create data containing the dates of birth of 5000 patients.

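# Draw 5000 random dates of birth from a window of roughly 120 years ending around today
Patients <- wakefield::dob(5000
                           , random = TRUE
                           , start = Sys.Date() - 365 * 120
                           , k = 365 * 120
                           , by = "1 days")

# Convert the vector of dates to a data frame with a single column named DOB
Patients <- Patients %>%
            data.frame()

names(Patients) <- c("DOB")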
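If wakefield is not available, a similar dataset can be sketched in base R. This is a stand-in under a stated assumption, not a reproduction of what wakefield::dob() does internally: it draws birth dates uniformly at random from the last 365 * 120 days (the same window length as above) and stores them in a hypothetical Patients_base data frame, so the tutorial's Patients object is left untouched.

```r
# Hypothetical base-R stand-in for wakefield::dob(): uniform birth dates
set.seed(42)                          # make the random draw reproducible

n_patients  <- 5000
window_days <- 365 * 120              # same window length as the wakefield call
start_date  <- Sys.Date() - window_days

# Adding a whole number of days to a Date gives another Date
Patients_base <- data.frame(
  DOB = start_date + sample(0:window_days, n_patients, replace = TRUE)
)

str(Patients_base)
```

The rest of the tutorial would work the same way on Patients_base, since it has the same single DOB column of class Date.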

Calculate age in years from the date of birth

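# Age in completed years: days since birth divided by the average year
# length (365.25 days), rounded down, then stored as an integer
Patients <- Patients %>%
            dplyr::mutate(AgeYears = floor((Sys.Date() - DOB) / 365.25)) %>%
            dplyr::mutate(AgeYears = as.integer(AgeYears))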
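Dividing the day count by 365.25 is an approximation, and it can come out a year short on a birthday: someone born on 2001-01-01 is 365 days old on 2002-01-01, and 365 / 365.25 rounds down to 0. A sketch of an exact calculation compares calendar fields instead. exact_age_years() is a hypothetical helper, not part of dplyr, and with this logic a 29 February birthday counts as reached on 1 March in non-leap years.

```r
# Hypothetical helper: age in completed years from calendar fields
exact_age_years <- function(dob, ref = Sys.Date()) {
  dob_lt <- as.POSIXlt(dob)
  ref_lt <- as.POSIXlt(ref)
  # Difference in calendar years ($year counts years since 1900)
  age <- ref_lt$year - dob_lt$year
  # Subtract one if the birthday has not yet occurred in the reference year
  # ($mon is the 0-based month, $mday the day of the month)
  before_birthday <- ref_lt$mon < dob_lt$mon |
    (ref_lt$mon == dob_lt$mon & ref_lt$mday < dob_lt$mday)
  as.integer(age - before_birthday)
}

dob <- as.Date("2001-01-01")
ref <- as.Date("2002-01-01")   # exactly one calendar year later

exact_age_years(dob, ref)                           # returns 1
as.integer(floor(as.numeric(ref - dob) / 365.25))   # returns 0
```

In the pipeline above it could replace the floor() step, for example dplyr::mutate(AgeYears = exact_age_years(DOB)).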

Using the dplyr case_when statement

We will use the case_when command from dplyr to convert AgeYears into age groups.

d1 <- Patients %>%
      dplyr::arrange(AgeYears) %>%
      # case_when() tests the conditions in order and uses the label of the first
      # one that is TRUE; between() includes both end points
      dplyr::mutate(AgeGroup = case_when(between(AgeYears, 0, 10)   ~ '0-10 Years'
                                        ,between(AgeYears, 11, 20)  ~ '11-20 Years'
                                        ,between(AgeYears, 21, 30)  ~ '21-30 Years'
                                        ,between(AgeYears, 31, 40)  ~ '31-40 Years'
                                        ,between(AgeYears, 41, 50)  ~ '41-50 Years'
                                        ,between(AgeYears, 51, 60)  ~ '51-60 Years'
                                        ,between(AgeYears, 61, 70)  ~ '61-70 Years'
                                        ,between(AgeYears, 71, 80)  ~ '71-80 Years'
                                        ,between(AgeYears, 81, 120) ~ '81+ Years'
                                        # anything not matched above (for example NA) becomes 'Unknown'
                                        ,TRUE ~ 'Unknown'
                                        ))


d1
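Because case_when() stops at the first condition that is TRUE, the same bands can be written with upper bounds only: by the time a row reaches `ages <= 20`, every age of 10 or less has already been labelled. This is an alternative formulation rather than the tutorial's method, and the test ages are made up to hit the group boundaries. A leading guard keeps impossible negative ages out of the first band.

```r
library(dplyr)

# Hypothetical ages at the group boundaries, plus a missing value
ages <- c(0L, 10L, 11L, 80L, 81L, 119L, NA)

groups <- case_when(
  ages < 0    ~ "Unknown",       # guard against impossible negative ages
  ages <= 10  ~ "0-10 Years",    # earlier conditions already removed smaller
  ages <= 20  ~ "11-20 Years",   # ages, so each line only needs an upper bound
  ages <= 30  ~ "21-30 Years",
  ages <= 40  ~ "31-40 Years",
  ages <= 50  ~ "41-50 Years",
  ages <= 60  ~ "51-60 Years",
  ages <= 70  ~ "61-70 Years",
  ages <= 80  ~ "71-80 Years",
  ages <= 120 ~ "81+ Years",
  TRUE        ~ "Unknown"        # NA comparisons never match, so NA ends up here
)

groups
```

One practical difference: with between(AgeYears, 10, 20) followed by between(AgeYears, 11, 20), a non-integer age such as 10.5 falls into neither band, while the upper-bound version covers it. Here AgeYears is an integer, so both versions give the same groups.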

Using dplyr to create a frequency table

Group the data by AgeGroup and calculate the frequency and the percentage.

d2 <- d1 %>%
      dplyr::group_by(AgeGroup) %>%
      dplyr::tally() %>%                 # n = number of patients in each group
      dplyr::mutate(pct = n / sum(n))    # proportion of all patients in each group

d2
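dplyr::count() is roughly equivalent to group_by() followed by tally(), so the frequency table can be written in one step. A sketch with a small hypothetical d1_small in place of d1, also adding a readable percentage column with scales::percent():

```r
library(dplyr)
library(scales)

# Hypothetical stand-in for d1: four patients already assigned to groups
d1_small <- data.frame(
  AgeGroup = c("0-10 Years", "11-20 Years", "11-20 Years", "81+ Years")
)

d2_small <- d1_small %>%
  count(AgeGroup) %>%                                    # one row per group, column n
  mutate(pct = n / sum(n),                               # proportion of all patients
         pct_label = scales::percent(pct, accuracy = 0.1))  # text such as "25.0%"

d2_small
```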

Plot a histogram of the ages

pl <- ggplot(data = d1,aes(x= AgeYears))
pl <- pl + geom_histogram(binwidth = 5, fill = "red", alpha = 0.2)   # one bar per 5 years of age instead of the default 30 bins
pl <- pl + theme_bw()

pl

You can also plot the density of the ages.

pl <- ggplot(data = d1,aes(x= AgeYears))
pl <-  pl + geom_density()
pl <- pl + theme_bw()

pl

Bar chart to show the frequency count of the age groups

pl <- ggplot(data = d2,aes(x= AgeGroup, y = n))
pl <- pl + geom_bar(stat = "identity")   # bar height is n as given, rather than a count of rows
pl <- pl + theme_bw()

pl
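The bars follow the alphabetical order of the AgeGroup strings. That matches age order here only because every label starts with its lower bound; a label such as '100+ Years' would sort between '0-10 Years' and '11-20 Years'. A sketch of explicit ordering, using hypothetical ages: base R cut() returns a factor whose levels are in band order, ggplot2 orders a discrete axis by factor levels, and as.data.frame(table()) keeps empty groups as zero counts.

```r
library(ggplot2)

# Hypothetical ages; in the tutorial these would be d1$AgeYears
ages <- c(5L, 15L, 15L, 42L, 85L, 100L)

# Intervals are closed on the right: (-1, 10] holds ages 0-10,
# (10, 20] holds 11-20, ..., (80, 120] holds 81-120
AgeGroup <- cut(
  ages,
  breaks = c(-1, 10, 20, 30, 40, 50, 60, 70, 80, 120),
  labels = c("0-10 Years", "11-20 Years", "21-30 Years", "31-40 Years",
             "41-50 Years", "51-60 Years", "61-70 Years", "71-80 Years",
             "81+ Years"),
  right = TRUE
)

# One row per level, including levels with no patients (n = 0)
d2_factor <- as.data.frame(table(AgeGroup), responseName = "n")

pl <- ggplot(data = d2_factor, aes(x = AgeGroup, y = n))
pl <- pl + geom_col()
pl <- pl + theme_bw()

pl
```

Unlike the case_when() version, cut() returns NA for missing or out-of-range ages rather than an 'Unknown' label.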

Bar chart showing percentage

library(scales)
pl <- ggplot(data = d2,aes(x= AgeGroup, y = pct))
pl <-  pl + geom_bar(stat ="identity")
pl <- pl + theme_bw()
pl <-  pl + scale_y_continuous(labels = scales::percent)

pl
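geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"): it uses the y values as bar heights instead of counting rows. A sketch of the percentage chart written that way, with a small hypothetical d2_small in place of d2:

```r
library(ggplot2)
library(scales)

# Hypothetical frequency table with the same columns as d2
d2_small <- data.frame(
  AgeGroup = c("0-10 Years", "11-20 Years", "81+ Years"),
  pct = c(0.25, 0.50, 0.25)
)

pl <- ggplot(data = d2_small, aes(x = AgeGroup, y = pct))
pl <- pl + geom_col()                                     # bar height = pct as given
pl <- pl + theme_bw()
pl <- pl + scale_y_continuous(labels = scales::percent)   # axis shows 0.25 as 25%

pl
```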

Line chart

pl <- ggplot(data = d2,aes(x= AgeGroup, y = n))
pl <- pl + geom_line(group = 1)   # group = 1 joins all points into one line across the discrete x axis
pl <- pl + theme_bw()

pl
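With a discrete x axis, ggplot2 puts each category in its own group, so geom_line() has only one point per group and draws no line. Setting the group to a constant places every point in one group. A sketch that maps group = 1 inside aes(), the more common way to write this, and adds points on top; d2_small is a hypothetical stand-in for d2.

```r
library(ggplot2)

# Hypothetical frequency table with the same columns as d2
d2_small <- data.frame(
  AgeGroup = c("0-10 Years", "11-20 Years", "81+ Years"),
  n = c(12L, 30L, 8L)
)

# Without group = 1, each AgeGroup value would form a separate group
pl <- ggplot(data = d2_small, aes(x = AgeGroup, y = n, group = 1))
pl <- pl + geom_line()
pl <- pl + geom_point()
pl <- pl + theme_bw()

pl
```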

Point chart

pl <- ggplot(data = d2,aes(x= AgeGroup, y = n))
pl <-  pl + geom_point()
pl <- pl + theme_bw()

pl

Watch our complete tutorial series covering all aspects of dplyr:

https://www.youtube.com/playlist?list=PLkHcMTpvAaXVJzyRSytUn3nSK92TJphxR