Regork

Introduction

Regork is one of the largest retailers in United States and is a company that uses data to optimize their daily operations. As part of the Regork Data Analytics team we are always looking at different ways in which help Regork’s business to grow and get maximum value from the investments in the company.

Problem Statements

Business Question 1:

The project aims to explore new avenues of revenue growth for Regork. To study the existing product portfolio of Regork and identify potential opportunities for business expansion, cross-selling and maximizing basket value for each customer.

Once we identify the category, it is also important to run a campaign for the new product. Hence, our second business question is to analyse the existing campaign data and provide recommendations on improving campaign strategy.

Business Question 2:

As part of analysis, we decided to analyze campaigns which yields in highest sales during which month of the year and time of the day and also provide suggestions to optimize the product display locations.Campaigns are an important part of the marketing and sales strategies for any retail company, and effectively using them will not only improve bottom line numbers but also increase customer engagement.

Analytical Approach

For this analysis, we have used the records of Regork’s household level transactions of 2469 households, who are frequent shoppers in the year 2017. The data includes customer demographics, transaction data, product meta data and campaign information.

The approach that we have taken to address the business objective is ‘Descriptive Analysis’. Through descriptive analysis, we perform investigations on data so as to discover patterns, to spot anomalies, to test hypothesis and to check assumptions with the help of summary statistics and graphical representations.

For Business question 1, we analysed the least selling categories to identify potential opportunities, checked for their potential customer base, further examined the existing data about the category and then explored best bundling categories for cross-selling the selected category.

Similarly, for campaign analysis, we studied multiple aspects of the campaigns to spot the factors leading to campaign success.

Business Proposal

The proposed solutions will help Regork expand their business, and thus increase revenue. They include insights to assure that these ideas have the potential to increase Regork profits. In multiple ways, these proposed solutions aim to maximize the resource utilization and improve campaign design.

Packages

completejourney - Gives us the data-set we are analyzing.
ggplot2 - Provides us with the different graphs that we can use to plot our results.
plotly - Used to make our graphs more interactive.
dplyr - For flexibility in data manipulation.
tidyr - To clean up messy data.
lubridate - To easily format time stamps to date for transaction data.
scales - To map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends.
treemapify - Allows creating treemaps, which are used for displaying hierarchical data using nested rectangles

library(completejourney)
library(ggplot2)
library(plotly)
library(dplyr)
library(tidyr)
library(lubridate)
library(tidyverse)
library(scales)
library(ggtext)
library(here)
library(treemapify)

Data Preparation

The original data set is from the ‘completejourney’ package. This data set has the records of 2017 household level transactions of 2469 households who are frequent shoppers at Regork. The data includes customer demographics, transaction data, product meta data and campaign information.

For our project we have used below data sets to answer the chosen business questions.

Transactions
Promotions
Campaigns
Campaign_descriptions
Products
Demographics

transactions <- get_transactions()
promotions <- get_promotions()
products <- products
demographics <- demographics
coupons <- coupons
coupon_redemptions <- coupon_redemptions

#Checking for Missing Value Check
sum(is.na(campaigns))

## [1] 0

sum(is.na(campaign_descriptions))

## [1] 0

#Join Campaign tables
campaign <- full_join(campaigns, campaign_descriptions, by = "campaign_id")

#Sanity check : end_day > start_day
sum(campaign$end_day < campaign$start_day)

## [1] 0

## No further data cleaning needed for our analysis at this stage. We will assess the data at each stage and tidy if needed, as our analysis progresses.

Exploratory Data Analysis

BUSINESS QUESTION 1

EXPLORING NEW AVENUES FOR REVENUE GROWTH

What untapped market can Regork expand its business to? How to use existing resources to market such new products?

We start by narrowing down the under-performing product categories in terms of sales volume and total value.

### Finding Under-performing Product Categories #####################################

least <- products %>%
  inner_join(transactions, by = "product_id") %>%
  group_by(product_category)%>%
  summarize(total_qty= sum(quantity), total_sales =sum(sales_value))%>%
  filter(total_sales < 100 & total_qty <100)%>%
  mutate(puvalue = total_sales/total_qty)%>%
  arrange(total_sales)

ggplot(least, aes(x = product_category, y = total_sales)) +
  geom_point(aes(color = puvalue,size = total_qty), shape = 17) +
  scale_x_discrete(guide = guide_axis(angle = 90)) +
  scale_y_continuous(labels = scales::dollar)  +
  labs(title = "Under Performing Product Categories",
       subtitle = "Data represents the total sales volume and value of least selling products. \nRegork 2017",
       x = "Product Category",
       y = "Total Sales Value",
       size = "Quantity Sold",
       color = "Per Unit Cost",
       caption = "Data sources: https://bradleyboehmke.github.io/completejourney/articles/completejourney.html",
       tag = "Fig.1")

It can be observed from the plot Fig.1 that the category “Toys” is the least selling category. Before deciding to move ahead with this category, we would like to examine the potential market for Toys. Hence we analysed the customer demographics to look for potential household with kids.

### Market Potential for Toys #####################################

market <- demographics %>%
  filter(kids_count >= 1)%>%
  inner_join(transactions, by = "household_id")%>%
  group_by(household_id,income)%>%
  summarize(household_sales =sum(sales_value),visits = n_distinct(basket_id))%>%
  arrange(desc(household_sales))

## `summarise()` has grouped output by 'household_id'. You can override using the
## `.groups` argument.

ggplot(market, aes(x=visits,y=household_sales,color=income))+
  geom_point(shape=19,size = 2)+
  labs(title = "Potential of Untapped Market",
       subtitle = "Data represents the income demographics, sales volume of CUSTOMERS WITH KIDS",
       x = "Store Visits",
       y = "Household Sales Value",
       color = "Income Group",
       caption = "Data sources: https://bradleyboehmke.github.io/completejourney/articles/completejourney.html",
       tag = "Fig.2")

As shown in the plot Fig.2 , there is a diverse pool of customers who have kids, and are frequent shoppers at Regork. Some of these households even make hundreds of purchases that value as high as $15000 and above. As the color of the points indicate, there are customers with income above 200k as well. Hence, there is a sizeable market for premium products also which can help gain higher margins. Although, this is a tangent that needs further deep-dive, we would like to continue our analysis on the product category “Toys”.

### Studying multiple aspects of the selected category "TOYS" #####################################

## Toys Product Portfolio 

Options <- products %>%
  filter(product_category == "TOYS")%>%
  select(product_category,product_id,product_type,brand)

Options

## # A tibble: 3 × 4
##   product_category product_id product_type brand   
##   <chr>            <chr>      <chr>        <fct>   
## 1 TOYS             831764     CARD GAMES   National
## 2 TOYS             9193084    DIECAST MINI National
## 3 TOYS             9674186    CARD GAMES   National

## Toys Campaigns

campaigns <- coupons %>%
  filter(product_id == "831764" | product_id=="9193084"| product_id=="9674186")

n_distinct(campaigns$campaign_id)

## [1] 0

## Toys Display Locations

location <- promotions%>%
  filter(product_id == "831764" | product_id=="9193084"| product_id=="9674186")%>%
  group_by(display_location)%>%
   mutate(display_location = fct_recode(display_location,
      "In Shelf" = "A",
      "Store Front" = "1",
      "Store Rear" = "2",
      "Front End" = "3",
      "Mid Aisle End" = "4",
      "Rear End" = "5",
      "Side Aisle" = "6",
      "In Aisle" = "7",
      "Secondary Location Display" = "9",
    ))

ggplot(location,aes(fct_infreq(display_location)))+ 
  geom_bar(fill = "gold")+
  scale_x_discrete(guide = guide_axis(angle = 90)) +
  labs(title = "Promotions on Toys through Display locations",
       subtitle = "Data represents the display locations used for Toys in various stores",
       x = "Display Location",
       y = "Frequency",
       caption = "Data sources: https://bradleyboehmke.github.io/completejourney/articles/completejourney.html",
       tag = "Fig.3")

Quick observations on the category “Toys”, from the tibble Campaigns, tibble Options and the plot Fig.3, are as follows:

There are only 3 product_id s’ under this category
No private brands being sold
There are zero campaigns held for this category
The display location for this category is predominantly in-shelf or in-aisle for majority of the stores

Thus, our final recommendations are based on the insights found so far. Additionally, to recommend how to push sales for Toys, if introduced, we analysed the product categories by sales value.

### Choosing product categories to bundle Toys for promotional offers #####################################

bundle <- demographics %>%
  inner_join(transactions, by = "household_id")%>%
  inner_join(products, by = "product_id")%>%
  group_by(product_category,kids_count)%>%
  filter(product_category != "COUPON/MISC ITEMS")%>%
  mutate(kids_count = fct_collapse(kids_count,Families_with_kids = c("1","2","3+")))%>%
  mutate(kids_count = fct_collapse(kids_count,No_kids = "0"))%>%
  summarize(category_sales =sum(sales_value))%>%
  filter(category_sales > 10000)%>%
  arrange(desc(category_sales))

## `summarise()` has grouped output by 'product_category'. You can override using
## the `.groups` argument.

ggplot(bundle,aes(area = category_sales, fill = product_category, label = paste(product_category))) +
  geom_treemap() +
  geom_treemap_text(grow = TRUE,reflow = TRUE) +
  theme(legend.position = "none") +
  facet_grid(~kids_count)+
  labs(title = "Indentifying Product Categories to Bundle promotional offers for Toys",
       subtitle = "Data represents the top product categories by sales value, among families with and without kids.",
       x = "Product Category",
       y = "Category Sales Value",
       fill = "Household Composition",
       caption = "Data sources: https://bradleyboehmke.github.io/completejourney/articles/completejourney.html",
       tag = "Fig.4")

The above plot Fig.4 helps identify the top selling category in each grid. Although the top categories are Soft drinks, beef, Cheese etc, they are common for all households. We can notice that the families with kids purchase “Diapers & Disposables” category with a total sales value of about $11000.This is the category we would want to bundle toys to, for our promotional offers.

This leads to our next question on how to improve our campaign strategy. This strategy will help our new venture of expanding Toys business to ultimate success.

BUSINESS QUESTION 2

CAMPAIGN STRATEGY

For the purpose of our analysis, we create a base table ‘datasets’ which contains a details of campaign resulting in highest sales and product category analysis purchased during that campaign period.

We start with analyzing campaign data, and tried to find out campaign with highest sales made. ‘Campaign id - 18’ had the highest sales. To deep drive further, we analyzed month of the campaign and time of the day which had the highest sales.

To explore more on the sales analysis, we tried to find if any particular store was making highest sales for top selling product categories based on any external factor.

#Join Transactions and Campaign tables
df1 <- campaign %>%
        group_by(campaign_id) %>%
        inner_join(transactions, by = 'household_id') %>%
        mutate(daterange = as.Date(transaction_timestamp)) %>%
    filter(between(daterange, min(start_date), max(end_date))) %>%
        summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
        arrange(desc(total_sales))
  
x=as.POSIXct(strptime(c("050000","105959","110000","155959",
                        "160000",
                        "185959"),"%H%M%S"),"UTC")
      
df2 <-  campaign %>%
        filter(campaign_id == '18') %>%
        inner_join(transactions, by = 'household_id') %>%
        mutate(time = format(as.POSIXct(transaction_timestamp), format = "%H:%M:%S")) %>%
        mutate(daterange = as.Date(transaction_timestamp)) %>%
        mutate(Month = month(daterange)) %>%
        filter(between(daterange, min(start_date), max(end_date)))

df2$hour <- hour(hms(df2$time))
hr_breaks = seq(0, 24, by = 4)
hr_labels <- c("00-04", "04-08", "08-12", "12-16",
               "16-20", "20-00")
df2$tod = cut(df2$hour, breaks = hr_breaks, 
             labels = hr_labels,
             include.lowest = T, right = F)

df3 <- df2 %>%
      group_by(month(daterange), tod) %>%
      summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
      arrange(desc(total_sales))

df3 <- df2 %>%
     mutate(Monthname = case_when(Month == 10 ~ 'October',
                                  Month == 11 ~ 'November',
                                  Month == 12 ~ 'December')) %>%
      group_by(Monthname, tod) %>%
      summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
      arrange(desc(total_sales))

plot1 <- ggplot(df3, aes(x = tod, y = total_sales, fill = Monthname)) +
  geom_bar(stat = "identity") +
  xlab("Time of the Day(24Hrs)") +
  ylab("Total Sales") +
  #facet_wrap(. ~ age, ncol = 2) +
  theme(plot.title = element_text(hjust = 0.5), 
        axis.title.x = element_text(face = "bold", colour = "blue", size = 12),
        axis.title.y = element_text(face = "bold", colour = "blue", size = 12),
        legend.title = element_text(face = "bold", size = 12),  
        strip.background = element_rect(fill = "lightgreen", colour = "black", size = 1),
        strip.text = element_text(face = "bold",size = rel(1.2)))+
  theme(plot.title = element_text(size = 5, vjust = -.5))

ggplotly(plot1)

After this we tried to find pattern for top 3 store ids and product category.

df4 <- df2 %>%
       inner_join(products, by = "product_id") %>%
       group_by(product_category) %>%
       summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
       arrange(desc(total_sales))

top_prod <- df4 %>%
            slice(1:10)

dis_product_category = n_distinct(df4$product_category, na.rm = TRUE) 

uni_prod <- unique(df2$product_id)

top10prod <- df2 %>%
            inner_join(products, by = 'product_id') %>%
            filter(product_category == top_prod$product_category) 

df5 <- promotions %>%
       inner_join(top10prod)

store <- df5 %>% 
        group_by(product_category, store_id) %>%
        summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
        arrange(desc(total_sales))  %>%
        top_n(n = 3)

# Plot 2 ----------------------------------------------------------

  store %>%
ggplot(aes(total_sales, store_id, label = round(total_sales,0),angle = 45)) +
  geom_segment(aes(x = 0, y = store_id, xend = total_sales, yend = store_id)) +
  geom_point(aes(color = product_category),size = 3) +
  theme(panel.background = element_rect(fill = "white", colour = "grey50")) +
  theme(plot.title.position = "plot",
        plot.caption.position = "plot" ,
        plot.title = element_markdown(),
        plot.subtitle = element_text(size = 9, margin = margin(b = 25, l= -25)),
        plot.caption = element_markdown()) +
  labs(title = 'Sales of Top 3 Stores based on Product Category',tag = "Fig.6")

We also tried to find pattern for display location based on top 10 product category.

display <- df5 %>%
          group_by(product_category, display_location) %>%
          summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
          arrange(desc(total_sales))  %>%
          top_n(n = 3)

# Plot 3 ------------------------------------------------------------------

display %>%
    ggplot(aes(total_sales, display_location, label = round(total_sales,0),angle = 45)) +
    geom_segment(aes(x = 0, y = display_location, xend = total_sales, yend = display_location)) +
    geom_point(aes(color = product_category),size = 3) +
    #scale_color_manual(values=c("#FFFF33", "#33FF99")) +
    theme(panel.background = element_rect(fill = "white", colour = "grey50")) +
    theme(plot.title.position = "plot",
          plot.caption.position = "plot" ,
          plot.title = element_markdown(),
          plot.subtitle = element_text(size = 9, margin = margin(b = 25, l= -25)),
          plot.caption = element_markdown()) +
  labs(title = "Sales of Top 3 Display Location based on Product Category",
       tag = "Fig.7")+
    facet_wrap(~ product_category)

#### {- .leave-tabset}

Summary

Regork is a national grocery chain. We are trying to identify potential areas of growth where the company could invest future resources to increase revenue and profits. For this, we chose to identify under performing product categories and thus building revenue streams from new avenues. This has been the category Toys, after analysis. Similarly, we chose to examine of the effectiveness of existing campaign data and made recommendations that will help in revenue growth.

Recommendations

Based on the findings from the analyses done above, we have the following recommendations:

Business Question 1:

Expand the Product Portfolio for Toys. Get more brands onboarded and include Private Brands. Include product types that suit boys and girls, with a mix of premium and economy options, for the income demographics of our target households vary
Seasonally move selected brands or themed products to better display locations to increase visibility for this category
Include bundled promotions to push sales for these new products among the target customers. As shown in Fig.4, bundling with products like Diapers & Disposables will help reach our target customers

Business Question 2:

From a sales perspective, we can see that October to December is the peak time for sales and Soft drinks and Beer are one of the most popular purchase, while bag snacks are still a little low on the bar. Thus, the team can run more campaigns around the festive seasons will help to increase sales.
From a display-level, we need to target those stores where we see a high sales for display location 0. So, we can keep probably move the closer soft drinks, milk products and beer providing more visibility and consequently more sales

Implications

This analysis suggests Regork to invest funds in building the product category of Toys and build a new revenue channel from the existing customer base itself, by increasing their basket value. Further, this analysis should help the marketing teams of Regork to improvise their campaign design, thus driving our new venture of expanding Toys business to ultimate success.

Limitations

The primary limitation for this data set is the limited time period for which this data was collected. Year on year comparison of campaigns during the same holiday seasons would yield better insights than comparing within the year. Seasonality of the sales also needs yearly data
Geographical data on store locations or any other information on store_id s’ will help validating the analysis done and avoiding any misinterpretations
Display location data is helpful in analyzing the promotions but planogram of the store - which brands are placed at eye-level of target customers is also equally important
Campaign types are identified as A,B and C. However, additional information on what each of them define would help getting additional insights. The product types and categories in each campaign type are widely diverse to make any assumptions.

Division of Work

Business Question 1 - Amulya Satyavarapu
Business Question 2 - Priyanka Nainani, Priya Anandani

Data Wrangling Final Project

GROUP 1: Amulya Satyavarapu, Priyanka Nainani, Priya Anandani

October 8th,2022

Business Insights and Growth Strategy

Regork

Introduction

Problem Statements

Analytical Approach

Business Proposal

Packages

Data Preparation

Exploratory Data Analysis

BUSINESS QUESTION 1

BUSINESS QUESTION 2

Summary

Summary

Recommendations

Implications

Limitations

Division of Work