SYNOPSIS

In this report, we examine the impact of discounts on sales during three holiday seasons: Easter, Christmas, and Valentine’s Day. Our objective is to explore growth opportunities by segmenting customers and assessing their purchasing power, in order to inform a potential early-discount strategy that starts before the official holiday period. We also study customer behavior patterns to gauge how much growth well-targeted deals could generate.

Our approach involves identifying key products, determining peak purchase times, exploring associated items, and quantifying total sales revenue and monthly margin. The analysis aims to produce a plan for managing seasonal sales effectively, translating these insights into a marketing strategy that boosts sales.

PACKAGES REQUIRED

The following R packages are required in order to run the code in this R project:

library(plotly)  # interactive plots (used here for the treemap)
library(lubridate)  # functions for working with dates and times
library(tidyverse)  # tidying data and working with other R packages
library(completejourney)  # grocery store shopping transactions data from a group of 2,469 households
library(ggplot2)  # data visualization plotting system
library(gganimate)  # extends ggplot2 to create animated plots
library(dplyr)  # manipulating and transforming data (i.e., filtering, joining, etc.)
library(hrbrthemes)  # additional themes for ggplot2
library(viridis)  # perceptually uniform color maps for data visualization
library(ggthemes)  # additional plotting themes, scales, and geoms for "ggplot2"

DATA PREPARATION & EXPLORATORY DATA ANALYSIS

# complete journey data
df_transactions <- completejourney::get_transactions()
df_products <- completejourney::products
df_demographics <- completejourney::demographics

SUMMARY

EASTER

##code
Easter <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_category
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = (sale/monthly_sale)-1
  ) %>%
  filter(
    product_category == "EASTER"
  ) %>%
  arrange(month)

## Plot 1
Easter %>% 
  ggplot(aes(x=month, group = 1)) +
  geom_col(aes(y =sale), fill ='brown', size = 0.1,show.legend = TRUE) +
  geom_point(aes(y=(monthly_margin*100)), color = 'orange', size = 2, alpha = 0.6) +
  geom_path(aes(y=monthly_margin*100, color = 'Monthly Margin compared to average (%)')) +
  geom_path(aes(y=monthly_sale, color = 'Average Monthly Sale ($)')) +
  scale_color_manual(name = 'legend', values = c('Monthly Margin compared to average (%)'='gold','Average Monthly Sale ($)'='black'))+
  scale_y_continuous(
    name =  'Total Sales Value ($)',
    sec.axis = sec_axis(~ ., name = 'Monthly Margin Rate (%)')  # identity transform: the margin is plotted as monthly_margin * 100 on the primary scale
  ) +
  theme_classic()+
  labs(
    title = 'Monthly Sales and Monthly Margin generated by Easter 2017',
    x = 'Month',
    subtitle =
      'The data below shows the total sales value in dollars and the marginal rate of the month
compared to average level in the year 2017.'
  )

Taking a top-down view of peak seasonal sales across all product categories, the Easter category shows a noticeable jump in sales value and volume during the Easter period, from February to April.
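
The monthly margin used in this report is each month's sales divided by the category's average monthly sales, minus one. The following is a minimal sketch of that arithmetic on made-up numbers; toy_sales and its values are hypothetical and are not taken from the completejourney data.

```r
library(dplyr)

# Hypothetical monthly sales for a single category (toy numbers)
toy_sales <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr"),
  sale  = c(50, 150, 250, 350)
)

toy_margin <- toy_sales %>%
  mutate(
    monthly_sale   = mean(sale),                # average monthly sale across the year
    monthly_margin = (sale / monthly_sale) - 1  # relative deviation from that average
  )

toy_margin
```

A margin of 0.75 means the month sold 75% above the category's monthly average. Because the margins are deviations from the mean, they sum to zero across the months.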

##Code
Easter_1 <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category, product_type) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_type
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = ((sale/monthly_sale)-1)*100
  ) %>%
  filter(
    product_category == "EASTER"
  ) %>%
  arrange(desc(monthly_margin))
view(Easter_1)

Easter_top_3 <- Easter_1 %>%
  filter(
    month %in% c("Feb","Mar","Apr")
  ) %>%
  group_by(month, product_type) %>%
  summarise(
    sale = mean(sale),
    volume = mean(volume), 
    monthly_sale = mean(monthly_sale),
    monthly_margin = mean(monthly_margin)
  ) %>%
  arrange(desc(monthly_margin)) %>%
  group_by(month) %>%
  arrange(desc(monthly_margin))%>%
  top_n(3,monthly_margin)
view(Easter_top_3)

##Plot 2: 
ggplot(Easter_top_3, aes(x=product_type, fill = month))+
  geom_col(aes(y=monthly_margin), width = 0.5)+
  facet_wrap(~month, nrow =3, scales='free') +
  theme_tufte() + theme(axis.line=element_line(), axis.text.x = element_text(size = 5)) +
  ylab('Monthly margin (%)')+
  labs(
    title = 'Top 3 performers of each month during Easter 2017',
    x='',
    subtitle =
      'The data below shows the sales boost in percentage of the product'
  )

The second insight concerns the specific product types whose sales accelerate the most during the peak season; the top performers vary by month.
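
Picking the top performers per month amounts to ranking product types within each month and keeping the first few. As a hedged sketch on hypothetical data (toy_types and its values are invented for illustration), dplyr's slice_max() does this in one grouped step; the dplyr documentation lists it as the successor to top_n().

```r
library(dplyr)

# Hypothetical margin (%) per product type and month
toy_types <- data.frame(
  month          = c("Feb", "Feb", "Feb", "Mar", "Mar", "Mar"),
  product_type   = c("A", "B", "C", "A", "B", "C"),
  monthly_margin = c(120, 80, 40, 30, 200, 90)
)

top_2_per_month <- toy_types %>%
  group_by(month) %>%
  slice_max(monthly_margin, n = 2) %>%  # keep the 2 highest margins in each month
  ungroup()

top_2_per_month
```

In this toy data the leaders differ by month (A and B in Feb, B and C in Mar), which is the kind of month-to-month variation the insight describes.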

##Code:
Easter_discount <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value), 
    store_discount = sum(retail_disc),
    coupon_disc = sum(coupon_disc)
  ) %>%
  filter(
    product_category == "EASTER"
  ) %>%
  arrange(month)
view(Easter_discount)

##Plot 3:

ggplot(Easter_discount, aes(x=month))+
  geom_col(mapping = aes(y=sale, fill = 'Sale Value')) +
  geom_col(aes(y=store_discount, fill = 'Discount Value'))+
  scale_fill_manual(
    name = 'amount',
    values = c('Sale Value'= '#69b3a2','Discount Value' = 'beige')
  ) +
  theme_classic()+
  labs(
    title = 'Discount Effect on Sales of Easter Seasonal Products in 2017',
    x='Month',
    subtitle =
      'The data below shows the sales and discount value in dollars'
  )

The third insight concerns the effect of discounts on sales of products within the Easter category.

According to the data, although sales start to spike in February and March, discounts are applied only in April. Furthermore, all of these discounts are retailer markdowns; none come from suppliers or coupon redemptions.
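
One way to quantify the timing of discounts is to compute, per month, the share of the pre-discount price given up as retailer discount. The sketch below uses invented numbers (toy_discount is hypothetical) and assumes that the sale column is the amount received after discounts, so that the pre-discount price is sale plus store_discount; that definition is an assumption made for illustration, not something stated in this report.

```r
library(dplyr)

# Hypothetical monthly totals for one seasonal category
toy_discount <- data.frame(
  month          = c("Feb", "Mar", "Apr"),
  sale           = c(100, 200, 300),
  store_discount = c(0, 0, 100),
  coupon_disc    = c(0, 0, 0)
)

discount_share <- toy_discount %>%
  mutate(
    # assumed definition: discount as a share of the pre-discount price
    store_disc_share = store_discount / (sale + store_discount),
    discounted       = store_discount > 0 | coupon_disc > 0
  )

discount_share
```

In this toy example only April is flagged as discounted, with a quarter of the pre-discount price given up, while February and March sell at full price.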

##Code: 
Easter_customer <- df_transactions %>%
  inner_join(df_demographics, by = 'household_id') %>%
  inner_join(df_products, by = 'product_id') %>%
  filter(
    product_category == 'EASTER'
  ) %>%
  mutate(
    kids_count = factor(kids_count)
  ) %>%
  group_by(age, income, home_ownership, marital_status, household_size, kids_count) %>%
  summarise(
    purchase_value = sum(sales_value), 
    purchase_volume = sum(quantity),
    discount_used = sum(retail_disc)
  ) %>%
  arrange(desc(purchase_value))
view(Easter_customer)

##Plot4: 
ggplot(Easter_customer,aes(x=age,y=income,fill=marital_status,size=kids_count))+ 
  geom_point(alpha=0.5, shape=21, color="black")+
  theme_ipsum() +
  theme(legend.position="right")+
  ylab('income')+
  xlab('age')

The fourth insight addresses the question of whether discounts should be applied from February through April to stimulate sales of seasonal products.

To answer this question, we explore the behaviour of the target customers for Easter products. Their demographic data show that the group's common traits are being married and having kids. Their ages mainly fall between 35 and 54, and their income sits at the lower end of the range.
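
A simple tabulation can back up a demographic reading like this one. The sketch below is an illustration on invented households (toy_households and all of its values are hypothetical): it counts households per segment and computes each segment's share of purchase value.

```r
library(dplyr)

# Hypothetical households that bought seasonal products
toy_households <- data.frame(
  household_id   = c("h1", "h2", "h3", "h4", "h5"),
  age            = c("35-44", "45-54", "35-44", "25-34", "45-54"),
  marital_status = c("Married", "Married", "Married", "Unmarried", "Married"),
  purchase_value = c(12, 8, 5, 3, 7)
)

segment_summary <- toy_households %>%
  group_by(age, marital_status) %>%
  summarise(
    households     = n(),
    purchase_value = sum(purchase_value),
    .groups        = "drop"
  ) %>%
  mutate(value_share = purchase_value / sum(purchase_value)) %>%  # share of total spend
  arrange(desc(value_share))

segment_summary
```

Sorting by value share puts the dominant segments at the top, which makes it easier to check a visual impression from the bubble chart against actual counts.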

#Code: 
Easter_customer_track <- df_transactions %>%
  inner_join(df_demographics, by = 'household_id') %>%
  inner_join(df_products, by = 'product_id') %>%
  filter(
    product_category == 'EASTER'
  ) %>%
  group_by(household_id) %>%
  summarise(
    purchase_value = sum(sales_value), 
    purchase_volume = sum(quantity),
    discount_used = sum(retail_disc)
  ) %>%
  arrange(desc(purchase_value))
view(Easter_customer_track)

#Plot5: 
ggplot(Easter_customer,aes(x=age,y=income,fill=marital_status,size=kids_count))+ 
  geom_point(alpha=0.5, shape=21, color="black")+
  theme_ipsum() +
  theme(legend.position="bottom")+
  ylab('income')+
  labs(
    title = 'Target customer demographic distribution map',
    x='age',
    subtitle =
      'The data below shows where most of our target customers are demographically identified '
  )

The fifth insight rests on the rationale that lower-income households are more responsive to discounts than others. We track their buying patterns across other categories to see how discounts affect their incentive to buy.
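
The tracking step can be sketched as a filtering join: first collect the households that bought the seasonal category, then keep only their transactions in other categories. The example below runs on invented data (toy_transactions and its values are hypothetical) and uses dplyr's semi_join(), which keeps matching rows without adding columns; it is an illustrative alternative, not necessarily the join used elsewhere in this report.

```r
library(dplyr)

# Hypothetical transactions
toy_transactions <- data.frame(
  household_id     = c("h1", "h1", "h2", "h3", "h3"),
  product_category = c("EASTER", "SOFT DRINKS", "BEEF", "EASTER", "CHEESE"),
  sales_value      = c(5, 10, 20, 4, 8),
  retail_disc      = c(0, 2, 5, 1, 2)
)

# Households with at least one Easter purchase
easter_households <- toy_transactions %>%
  filter(product_category == "EASTER") %>%
  distinct(household_id)

# Their purchases in other categories, with discount as a percentage of purchase value
other_purchases <- toy_transactions %>%
  semi_join(easter_households, by = "household_id") %>%
  filter(product_category != "EASTER") %>%
  group_by(product_category) %>%
  summarise(
    purchase_value = sum(sales_value),
    discount_used  = sum(retail_disc),
    .groups        = "drop"
  ) %>%
  mutate(discount_percentage = discount_used / purchase_value * 100) %>%
  arrange(desc(purchase_value))

other_purchases
```

Household h2 never bought an Easter product, so its beef purchase is excluded from the summary.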

Customer_other_purchase <- df_transactions %>%
  inner_join(Easter_customer_track, by = 'household_id') %>%
  inner_join(df_products, by = 'product_id') %>%
  group_by(product_category, product_type) %>%
  summarise(
    purchase_value = sum(sales_value), 
    purchase_volume = sum(quantity),
    discount_used = sum(retail_disc)
  ) %>%
  filter(
    product_category != 'COUPON/MISC ITEMS'
  ) %>%
  mutate(
    discount_percentage = (discount_used/purchase_value)*100,
    product_group = case_when(
      product_category %in% c('FLUID MILK PRODUCTS','SOFT DRINKS','BEERS/ALES') ~ 'Beverage',
      product_category %in% c('BEEF','CHEESE') ~ 'Food',
      product_category %in% c('BATH TISSUES','DIAPERS & DISPOSABLES','CIGARETTES') ~ 'Household and Others'
    )
  ) %>%
  arrange(desc(purchase_value)) %>%
  head(9)
view(Customer_other_purchase)

##Plot6: 
ggplot(Customer_other_purchase, aes(x=product_category))+
  geom_col(aes(y=purchase_value,fill = 'Purchase Value'))+
  geom_col(aes(y=discount_used, fill = 'Discount value'), alpha = 0.7)+
  ylim(0,30000)+
  facet_wrap(~product_group, nrow =3, scales='free') +
  theme_tufte() + theme(axis.line=element_line(), axis.text.x = element_text(size = 5))+
  labs(
    title = 'Discount Effect on Purchasing Power of Target Customers based on their Top Choices in 2017',
    x='Product category',
    subtitle =
      'The data below shows the sales and discount value in dollars'
  )

The group also has substantial purchasing power across the food and beverage categories and receives discounts that make up a large portion of its total shopping cart value.

CHRISTMAS

Christmas <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_category
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = (sale/monthly_sale)-1
  ) %>%
  filter(
    product_category == "CHRISTMAS  SEASONAL"
  ) %>%
  arrange(month)

##Plot1: 
Christmas  %>% 
  ggplot(aes(x=month, group = 1)) +
  geom_col(aes(y =sale), fill ='darkgreen', size = 0.1,show.legend = TRUE) +
  geom_point(aes(y=(monthly_margin*100)), color = 'orange', size = 2, alpha = 0.6) +
  geom_path(aes(y=monthly_margin*100, color = 'Monthly Margin compared to average (%)')) +
  geom_path(aes(y=monthly_sale, color = 'Average Monthly Sale ($)')) +
  scale_color_manual(name = 'legend', values = c('Monthly Margin compared to average (%)'='gold','Average Monthly Sale ($)'='black'))+
  scale_y_continuous(
    name =  'Total Sales Value ($)',
    sec.axis = sec_axis(~ ., name = 'Monthly Margin Rate (%)')  # identity transform: the margin is plotted as monthly_margin * 100 on the primary scale
  ) +
  theme_classic()+
  labs(
    title = 'Monthly Sales and Monthly Margin generated by Christmas in 2017',
    x = 'Month',
    subtitle =
      'The data below shows the total sales value in dollars and the marginal rate of the month compared to average level in the year 2017.'
  )

The initial observation concerns the sales performance of the Christmas seasonal category across the 12 months of the year, measured as each month's relative increase or decrease against the monthly average.

Christmas_discount <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value), 
    store_discount = sum(retail_disc),
    coupon_disc = sum(coupon_disc)
  ) %>%
  filter(
    product_category == "CHRISTMAS  SEASONAL"
  ) %>%
  arrange(month)

##Plot 3:
ggplot(Christmas_discount, aes(x=month))+
  geom_col(mapping = aes(y=sale, fill = 'Sale Value')) +
  geom_col(aes(y=store_discount, fill = 'Discount Value'))+
  scale_fill_manual(
    name = 'amount',
    values = c('Sale Value'= '#235E6F','Discount Value' = 'beige')
  ) +
  theme_classic()+
  labs(
    title = 'Discount Effect on Christmas Seasonal in 2017',
    x='Month',
    subtitle =
      'The data below shows the sales and discount value in dollars'
  )

A closer look at the data reveals an anomaly unique to January: the discount effect on sales reaches 100%, that is, the discount value appears to match the sales value for the month. The analysis below investigates what drives this pattern.

Christmas_1 <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category, product_type, basket_id) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_type
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = ((sale/monthly_sale)-1)*100
  ) %>%
  filter(
    product_category == "CHRISTMAS  SEASONAL"
  ) %>%
  arrange(desc(monthly_margin))

Christmas_top_2 <- Christmas_1 %>%
  filter(
    month %in% c("Jan")
  ) %>%
  group_by(month, product_type, basket_id) %>%
  summarise(
    sale = sum(sale),
    volume = mean(volume), 
    monthly_sale = mean(monthly_sale),
    monthly_margin = mean(monthly_margin)
  ) %>%
  arrange(desc(monthly_margin)) %>%
  group_by(month) %>%
  arrange(desc(monthly_sale))%>%
  top_n(2,monthly_margin)

##Plot 2: 
ggplot(Christmas_top_2, aes(x=product_type, fill = month))+
  geom_col(aes(y=sale), width = 0.5, fill ='#CC231E')+
  facet_wrap(~month, nrow =5, scales='free') +
  theme_tufte() + theme(axis.line=element_line(), axis.text.x = element_text(size = 5)) +
  ylab('Sales ($)')+
  labs(
    title = 'Top 2 performers of January of Christmas Seasonal Category in 2017',
    x='',
    subtitle =
      'The data below shows the 2 best-selling products in January'
  )

We aim to identify the top-performing products in this month and to assess whether a product-bundling approach could strengthen our sales strategy.

## Products bought together with them
product_bought_tog <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category, product_type, basket_id) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_category
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = (sale/monthly_sale)-1
  ) %>%
  filter(
    basket_id %in% c('31343846854', '31356470477', '31343890347')  # hard-coded basket IDs, quoted because basket_id is stored as character
  ) %>%
  arrange(desc(volume)) %>%
  head(5)
#plot 3
plot_ly(
  data = product_bought_tog,
  labels = ~product_type,
  parents = ~month,
  values = ~sale,
  type = 'treemap',
  marker = list(
    colors = c('#F8B229', '#4CAF50', '#3498DB', '#E74C3C', '#9B59B6'))
) %>%
  layout(
    title = "Top 5 products bought with Decor and Candles",
    xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
    yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE)
  )

To capitalize on this insight and boost overall sales value, we are considering a bundling strategy built around these popular items.

Taking this a step further, we examined the products most frequently bought together with candles and decor. Surprisingly, pizza, frozen dinners, burritos, and lemons were among the top 5 products purchased alongside them, which suggests an interesting customer behavior pattern.

One possible explanation is that customers who purchase candles and decor are looking for convenient, quick meal solutions, as suggested by items like pizza and frozen dinners. The presence of lemons may also hint at a preference for fresh, flavorful additions to their meals.
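
A co-purchase analysis of this kind can be sketched in two steps: find the baskets that contain a target item, then rank the other items in those baskets. The example below is a minimal illustration on invented baskets; toy_baskets, the product labels, and target_types are all hypothetical, and ranking by the number of baskets is one reasonable choice among several.

```r
library(dplyr)

# Hypothetical basket contents
toy_baskets <- data.frame(
  basket_id    = c("b1", "b1", "b1", "b2", "b2", "b3", "b3"),
  product_type = c("CANDLES", "PIZZA", "LEMONS", "DECOR", "PIZZA", "MILK", "BREAD"),
  sales_value  = c(3, 6, 1, 4, 5, 2, 2)
)

target_types <- c("CANDLES", "DECOR")  # hypothetical labels for the target items

# Baskets that contain at least one target item
target_baskets <- toy_baskets %>%
  filter(product_type %in% target_types) %>%
  distinct(basket_id)

# Other items in those baskets, ranked by how many baskets they appear in
co_purchased <- toy_baskets %>%
  semi_join(target_baskets, by = "basket_id") %>%
  filter(!product_type %in% target_types) %>%
  group_by(product_type) %>%
  summarise(
    n_baskets = n_distinct(basket_id),
    sale      = sum(sales_value),
    .groups   = "drop"
  ) %>%
  arrange(desc(n_baskets), desc(sale))

co_purchased
```

Counting baskets rather than summing sales keeps a single large purchase from dominating the ranking; sales value is used here only to break ties.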

VALENTINE

##Code1: 
Valentine <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_category
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = (sale/monthly_sale)-1
  ) %>%
  filter(
    product_category == "VALENTINE"
  ) %>%
  arrange(month)

##Plot1: 
Valentine  %>% 
  ggplot(aes(x=month, group = 1)) +
  geom_col(aes(y =sale), fill ='lightpink', size = 0.1,show.legend = TRUE) +
  geom_point(aes(y=(monthly_margin*100)), color = 'red', size = 2, alpha = 0.6) +
  geom_path(aes(y=monthly_margin*100, color = 'Monthly Margin compared to average (%)')) +
  geom_path(aes(y=monthly_sale, color = 'Average Monthly Sale ($)')) +
  scale_color_manual(name = 'legend', values = c('Monthly Margin compared to average (%)'='darkred','Average Monthly Sale ($)'='red'))+
  scale_y_continuous(
    name =  'Total Sales Value ($)',
    sec.axis = sec_axis(~ ., name = 'Monthly Margin Rate (%)')  # identity transform: the margin is plotted as monthly_margin * 100 on the primary scale
  ) +
  theme_classic()+
  labs(
    title = 'Monthly Sales and Monthly Margin generated by Valentine 2017',
    x = 'Month',
    subtitle =
      'The data below shows the total sales value in dollars and the marginal rate of the month
compared to average level in the year 2017.'
  )

The initial insight evaluates the sales dynamics of the Valentine category across all 12 months, examining each month's fluctuation relative to the average monthly level.

##Code
Valentine_1 <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category, product_type) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value)
  ) %>%
  group_by(
    product_type
  ) %>%
  mutate(
    monthly_sale = mean(sale),
    monthly_margin = ((sale/monthly_sale)-1)*100
  ) %>%
  filter(
    product_category == "VALENTINE"
  ) %>%
  arrange(desc(monthly_margin))
Valentine_top_3 <- Valentine_1 %>%
  filter(
    month %in% c("Jan","Feb")
  ) %>%
  group_by(month, product_type) %>%
  summarise(
    sale = mean(sale),
    volume = mean(volume), 
    monthly_sale = mean(monthly_sale),
    monthly_margin = mean(monthly_margin)
  ) %>%
  arrange(desc(monthly_margin)) %>%
  group_by(month) %>%
  arrange(desc(monthly_margin))%>%
  top_n(3,monthly_margin)

ggplot(Valentine_top_3, aes(x=product_type, fill = month))+
  geom_col(aes(y=monthly_margin), width = 0.5)+
  facet_wrap(~month, nrow =3, scales='free') +
  theme_tufte() + theme(axis.line=element_line(), axis.text.x = element_text(size = 5)) +
  ylab('Monthly margin (%)')+
  labs(
    title = 'Top 3 performers of each month during Valentine 2017',
    x='',
    subtitle =
      'The data below shows the sales boost in percentage of the product'
  )

The second observation focuses on identifying the particular products that experience the highest acceleration during the peak season, with variations observed from month to month.

Valentine_discount <- df_transactions %>%
  mutate(
    month = month(transaction_timestamp, label = TRUE)) %>%
  inner_join(
    df_products, 
    by = 'product_id'
  ) %>%
  group_by(month, product_category) %>%
  summarise(
    volume = sum(quantity),
    sale = sum(sales_value), 
    store_discount = sum(retail_disc),
    coupon_disc = sum(coupon_disc)
  ) %>%
  filter(
    product_category == "VALENTINE"
  ) %>%
  arrange(month)

##Plot 3:
ggplot(Valentine_discount, aes(x=month))+
  geom_col(mapping = aes(y=sale, fill = 'Sale Value')) +
  geom_col(aes(y=store_discount, fill = 'Discount Value'))+
  scale_fill_manual(
    name = 'amount',
    values = c('Sale Value'= '#5E081E','Discount Value' = 'beige')
  ) +
  theme_classic()+
  labs(
    title = 'Discount Effect on Valentine in 2017',
    x='Month',
    subtitle =
      'The data below shows the sales and discount value in dollars'
  )

The third observation revolves around the impact of discounts on sales within the Valentine category.

The data show sales spiking from January onward, yet no discounts accompany that spike.

Valentine_customer <- df_transactions %>%
  inner_join(df_demographics, by = 'household_id') %>%
  inner_join(df_products, by = 'product_id') %>%
  filter(
    product_category == 'VALENTINE'
  ) %>%
  mutate(
    kids_count = factor(kids_count)
  ) %>%
  group_by(age, income, home_ownership, marital_status, household_size, kids_count) %>%
  summarise(
    purchase_value = sum(sales_value), 
    purchase_volume = sum(quantity),
    discount_used = sum(retail_disc)
  ) %>%
  arrange(desc(purchase_value))

ggplot(Valentine_customer,aes(x=age,y=income,fill=marital_status,size=kids_count))+ 
  geom_point(alpha=0.5, shape=21, color="black")+
  theme_ipsum() +
  theme(legend.position="bottom")+
  ylab('income')+
  labs(
    title = 'Target customer demographic distribution map',
    x='age',
    subtitle =
      'The data below shows where most of our target customers are demographically identified '
  )

The fourth insight addresses the question of whether discounts should be applied from January through April to stimulate sales of seasonal products.

To answer this question, we explore the behaviour of the target customers for Valentine products. Their demographic data show that the group's common traits are being married and having kids. Their ages mainly fall between 35 and 54, and their income levels vary.

Valentine_customer_track <- df_transactions %>%
  inner_join(df_demographics, by = 'household_id') %>%
  inner_join(df_products, by = 'product_id') %>%
  filter(
    product_category == 'VALENTINE'
  ) %>%
  group_by(household_id) %>%
  summarise(
    purchase_value = sum(sales_value), 
    purchase_volume = sum(quantity),
    discount_used = sum(retail_disc)
  ) %>%
  arrange(desc(purchase_value))
Customer_other_purchase <- df_transactions %>%
  inner_join(Valentine_customer_track, by = 'household_id') %>%
  inner_join(df_products, by = 'product_id') %>%
  group_by(product_category, product_type) %>%
  summarise(
    purchase_value = sum(sales_value), 
    purchase_volume = sum(quantity),
    discount_used = sum(retail_disc)
  ) %>%
  filter(
    product_category != 'COUPON/MISC ITEMS'
  ) %>%
  mutate(
    discount_percentage = (discount_used/purchase_value)*100,
    product_group = case_when(
      product_category %in% c('FLUID MILK PRODUCTS','SOFT DRINKS','BEERS/ALES') ~ 'Beverage',
      product_category %in% c('BEEF','CHEESE','PORK','BAG SNACKS') ~ 'Food',
      product_category %in% c('BATH TISSUES','DIAPERS & DISPOSABLES','COLD CEREAL','CIGARETTES') ~ 'Household and Others'
    )
  ) %>%
  arrange(desc(purchase_value)) %>%
  head(13)

ggplot(Customer_other_purchase, aes(x=product_category))+
  geom_col(aes(y=purchase_value,fill = 'Purchase Value'))+
  geom_col(aes(y=discount_used, fill = 'Discount value'), alpha = 0.7)+
  ylim(0,30000)+
  facet_wrap(~product_group, nrow =3, scales='free') +
  theme_tufte() + theme(axis.line=element_line(), axis.text.x = element_text(size = 5))+
  labs(
    title = 'Discount Effect on Purchasing Power of Target Customers based on their Top Choices in 2017',
    x='Product category',
    subtitle =
      'The data below shows the sales and discount value in dollars'
  )

The group shows strong purchasing power across the food and beverage categories and receives discounts that make up a large portion of its total shopping cart value.