#Cleaning the environment
rm(list=ls())
#Importing Dataset
setwd("/Users/nishantaneja/Desktop/Files_for_R")
sales_order = read.csv("sales_order.csv", stringsAsFactors = FALSE)
#Loading dplyr
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#case_when()
case_when() lets us replace a chain of nested if_else() calls with a single expression, so the code is much cleaner to read and understand for any user going forward :)
case_when() is the R equivalent of the SQL CASE WHEN statement. Conditions are checked in order and the first one that matches wins; if no case matches, NA is returned.
Syntax is as follows:
case_when(
condition1 ~ value1,
condition2 ~ value2,
condition3 ~ value3,
TRUE ~ value_for_everything_else_not_covered_by_conditions
)
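To make the syntax concrete, here is a minimal sketch on a small made-up vector (the vector `x` and the labels are assumptions for illustration, not part of the sales data). It shows that the first matching condition wins, and what happens with and without the TRUE fallback.

```r
library(dplyr)

# Hypothetical input: three numbers and one missing value
x <- c(5, 15, 25, NA)

# With a TRUE fallback, every element gets a label.
# 5 satisfies both conditions, but the first match ("low") wins.
with_default <- case_when(
  x < 10 ~ "low",
  x < 20 ~ "medium",
  TRUE   ~ "other"
)

# Without a fallback, elements that match no condition become NA.
no_default <- case_when(
  x < 10 ~ "low",
  x < 20 ~ "medium"
)
```

Note that the missing value lands in "other" in the first call: a condition that evaluates to NA does not count as a match, so the element falls through to the TRUE line.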
Using case_when() inside mutate() helps when you want to create a new variable that relies on a complex combination of existing variables…
mean(sales_order$Total)
## [1] 456.4623
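# 456 is the rounded mean of Total computed above (456.4623).
# Rows where Total is exactly 456 (or missing) fall through to "other".
sales_order_1 = sales_order %>%
  mutate(sale_type = case_when(Total > 456 ~ "Sale above Avg.",
                               Total < 456 ~ "Sale below Avg.",
                               TRUE ~ "other"))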
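If you would rather not hard-code the rounded average, one option is to compute the mean inside mutate(). This is only a sketch on a hypothetical toy data frame (`toy_sales` is made up here and stands in for sales_order), not the approach used above.

```r
library(dplyr)

# Hypothetical toy data standing in for sales_order
toy_sales <- data.frame(Total = c(100, 200, 300))

# mean(Total) is evaluated on the whole column (here 200),
# so the threshold follows the data instead of a typed-in number.
toy_sales_typed <- toy_sales %>%
  mutate(sale_type = case_when(Total > mean(Total) ~ "Sale above Avg.",
                               Total < mean(Total) ~ "Sale below Avg.",
                               TRUE ~ "other"))
```

The middle row equals the mean exactly, so it is the one that ends up in "other". If the real column contained missing values, mean(Total) would itself be NA unless na.rm = TRUE is added.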
Let’s play with units now…
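# Bucket Units into ranges; negative values, values above 100,
# and missing values fall through to "other".
sales_order_2 = sales_order %>%
  mutate(unit_bucket = case_when(Units >= 0 & Units <= 30 ~ "0 - 30",
                                 Units > 30 & Units <= 50 ~ "31 - 50",
                                 Units > 50 & Units <= 100 ~ "51 - 100",
                                 TRUE ~ "other"))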
As we can see, case_when() lets us categorize a column into buckets without writing multiple nested if_else() statements.
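For comparison, here is roughly what the same bucketing looks like with nested if_else() calls. The vector `units` is a made-up example, not the Units column from the dataset.

```r
library(dplyr)

# Hypothetical unit counts, one per bucket
units <- c(10, 45, 80, 150)

# Nested if_else(): each extra bucket adds another level of nesting
nested <- if_else(units >= 0 & units <= 30, "0 - 30",
          if_else(units > 30 & units <= 50, "31 - 50",
          if_else(units > 50 & units <= 100, "51 - 100", "other")))

# case_when(): one flat list of condition ~ value pairs
flat <- case_when(units >= 0 & units <= 30 ~ "0 - 30",
                  units > 30 & units <= 50 ~ "31 - 50",
                  units > 50 & units <= 100 ~ "51 - 100",
                  TRUE ~ "other")
```

Both give the same labels for these values. They do differ on missing values: if_else() returns NA when its condition is NA, while the TRUE line in case_when() labels those rows "other".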
Thanks for Watching :)