R_dplyr_case

#Cleaning the environment

rm(list=ls())

#Importing Dataset

setwd("/Users/nishantaneja/Desktop/Files_for_R")
sales_order = read.csv("sales_order.csv", stringsAsFactors = FALSE)

#Loading dplyr

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#case_when()

case_when() lets us replace a chain of nested if_else() statements with a single call, which makes the code much cleaner to read and understand for any user going forward :)

case_when() is the R equivalent of the SQL CASE WHEN statement. Conditions are evaluated in order, and if no case matches, NA is returned.

Syntax is as follows:

case_when(
  condition1 ~ value1,
  condition2 ~ value2,
  condition3 ~ value3,
  TRUE ~ value_for_everything_else_not_covered_by_conditions
)
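To make the matching rules concrete, here is a minimal sketch on a made-up vector (x and its values are assumptions for illustration, not data from sales_order.csv). It shows that the first matching condition wins and that values matching no condition come back as NA unless a final TRUE ~ ... catches them.

```r
library(dplyr)

# Hypothetical toy vector, not taken from sales_order.csv
x <- c(5, 15, 25, NA)

# Conditions are checked top to bottom and the first TRUE one wins,
# so 5 is labelled "low" even though it also satisfies x < 20.
# 25 and NA match no condition, so they come back as NA.
no_fallback <- case_when(
  x < 10 ~ "low",
  x < 20 ~ "medium"
)

# A final TRUE ~ ... works like ELSE in SQL and catches everything left over,
# including the missing value.
with_fallback <- case_when(
  x < 10 ~ "low",
  x < 20 ~ "medium",
  TRUE   ~ "other"
)

print(no_fallback)
print(with_fallback)
```

Because of the top-to-bottom evaluation, it usually makes sense to list the most specific conditions first and the catch-all last.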

Using case_when() inside mutate() helps when you want to create a new variable that relies on a complex combination of existing variables…

mean(sales_order$Total)

## [1] 456.4623

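# Store the mean so the comparison uses the exact value, not a rounded 456
avg_total = mean(sales_order$Total)

sales_order_1 = sales_order %>%
  mutate(sale_type = case_when(Total > avg_total ~ "Sale above Avg.",
                               Total < avg_total ~ "Sale below Avg.",
                               TRUE ~ "other"))  # exactly average, or Total is missing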
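The same pattern extends to conditions that combine more than one column. The sketch below is only an illustration: the orders data frame and its values are made up, the column names simply mirror Units and Total from sales_order, and the thresholds (50 units, 456 in sales) are assumptions chosen for the example.

```r
library(dplyr)

# Hypothetical toy data; the column names mirror sales_order,
# but the values are invented for this example
orders <- data.frame(
  Units = c(10, 60, 80, 20),
  Total = c(100, 900, 300, 700)
)

# Each condition can mix several columns; the first match wins,
# so the most specific combination is listed first
orders_flagged <- orders %>%
  mutate(order_profile = case_when(
    Units > 50 & Total > 456 ~ "High volume, high value",
    Units > 50               ~ "High volume, low value",
    Total > 456              ~ "Low volume, high value",
    TRUE                     ~ "Low volume, low value"
  ))

print(orders_flagged)
```

Since the second and third conditions are only reached when the first one fails, they do not need to repeat the other half of the test.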

Let’s play with units now…

sales_order_2 = sales_order %>%
  mutate(unit_bucket = case_when(Units >= 0 & Units <= 30 ~ "0 - 30",
                                 Units > 30 & Units <= 50 ~ "31 - 50",
                                 Units > 50 & Units <= 100 ~ "51 - 100",
                                 TRUE ~ "other"))

As we can see, case_when() lets us categorize a column into buckets without writing multiple nested if_else() statements.
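For comparison, here is a rough sketch of the same bucketing written both ways. The units vector is a made-up stand-in for the Units column, so treat this as an illustration rather than output from the actual dataset.

```r
library(dplyr)

# Hypothetical values standing in for the Units column
units <- c(12, 45, 75, 150)

# Nested if_else(): each extra bucket adds another level of nesting
nested <- if_else(units >= 0 & units <= 30, "0 - 30",
          if_else(units > 30 & units <= 50, "31 - 50",
          if_else(units > 50 & units <= 100, "51 - 100", "other")))

# case_when(): one flat list of condition ~ value pairs
flat <- case_when(
  units >= 0 & units <= 30  ~ "0 - 30",
  units > 30 & units <= 50  ~ "31 - 50",
  units > 50 & units <= 100 ~ "51 - 100",
  TRUE                      ~ "other"
)

# Both approaches give the same labels for these values
print(identical(nested, flat))
```

One difference to keep in mind: for a missing value, if_else() returns NA, while the TRUE ~ "other" line in case_when() labels it "other".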

Thanks for Watching :)

R_dplyr_case_when

nth education

2022-12-11