Introduction

I have been tasked with identifying and analyzing potential growth opportunities where Regork can invest in order to increase revenue and profits. Companies like Regork are always looking to make an impact and to win the “hearts and minds” of the consumer. As a grocer, sometimes this can be difficult when you deal with commodities. How can you gain the attention of the consumer and get them into the store?

Packages Required

The following packages are required:

library(completejourney)  #Import completejourney datasets
library(tidyverse)        #visualize, transform and tidy the data
library(dplyr)            #data manipulation
library(ggplot2)          #plot charts
library(lubridate)        #perform date-time operations

Data Preparation

completejourney provides access to data sets characterizing household level transactions over the year 2017 from a group of 2,469 households who are frequent shoppers at a grocery store. It contains all of each household’s purchases, not just those from a limited number of categories. For certain households, demographic information as well as direct marketing contact history are captured.

For our purposes we use 3 tables from the CompleteJourney package:

Transactions

## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>

products

## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>

demographics

## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>

Exploratory Data Analysis

Top Selling Departments

I began my analysis by firstly finding what are the top 5 categories with the highest sales value.The results are shown below:

## [1] "drug gm" "fuel"    "grocery" "meat"    "produce"

For better analysis, I decided to plot this data.

I decided to focus on the top selling department. Which was the Grocery department.

Afterwards, I wanted to better understand and see what the Grocery department sales and data looked like.

Soft Drinks

The Soft Drinks category is the top selling so I decided to move on with only this category from no on.

I combined all soft drink product types to see if there were any trends.

## # A tibble: 30 × 2
##    product_type                   total_sales_value
##    <chr>                                      <dbl>
##  1 SOFT DRINKS 12/18&15PK CAN CAR            89597.
##  2 SFT DRNK 2 LITER BTL CARB INCL            29288.
##  3 SOFT DRINKS 20PK&24PK CAN CARB            18212.
##  4 SFT DRNK MLT-PK BTL CARB (EXCP            14691.
##  5 SFT DRNK SNGL SRV BTL CARB (EX             9111.
##  6 SOFT DRINKS 6PK/4PK CAN CARB (             5578.
##  7 TEA SWEETENED                              3399.
##  8 ENERGY DRINK                               2927.
##  9 SOFT DRINKS CAN NON-CARB (EXCE             2635.
## 10 SOFT DRINK BOTTLE NON-CARB (EX             1454.
## # ℹ 20 more rows

Afterwards, I wanted to see what this would look like compared to income.

As you can see I tried to plot each product type and its total sales value by income. However, I needed to widen the range and combine together all the soft drinks product types which I did below.

From the plot above, I concluded that there were 3 income ranges that were selling the most. so I decided to focus on those 3 and create a piechart to see what product types they were buying the most.

The 3 top selling income levels which are “35-59K”, “50-74K”, and “75-99K” are also buying the same 3 product type items. Which are “SOFT DRINKS 12/18&15PK CAN CARB”, “SFT DRNK 2 LITER BTL CARB INCL” and “SOFT DRINKS 20PK&24PK CAN CARB”.

35-49k

50-74K

75-99K

Alcohol

I analysed the Soft Drinks category, but I wanted to see also the Alcohol category. I wanted to understand if it was because the other income ranges were buying a different product, and since we are talking about soft drinks, I deduced that another competitor for that specific product would be alcohol

I combined all alcohol product types to see if there were any trends.

## # A tibble: 27 × 2
##    product_type            total_sales_value
##    <chr>                               <dbl>
##  1 BEERALEMALT LIQUORS                82643.
##  2 PREMIUM 750ML WINES                 9410.
##  3 POPULAR 750ML WINES                 7911.
##  4 AUSTRALIAN/NZ WINES                 5195.
##  5 FIGHTING VARIETAL WINES             4294.
##  6 BOX WINES                           3592.
##  7 ITALIAN WINES                       2721.
##  8 ULTRA PREMIUM WINES                 2637.
##  9 VALUE GLASS WINE                    2449.
## 10 SPARKLING WINES                     2157.
## # ℹ 17 more rows

Afterwards, I wanted to see what this would look like compared to income.

As you can see I tried to plot each product type and its total sales value by income. However, I needed to widen the range and combine together all the soft drinks product types which I did below.

From the plot above, I concluded that there were 3 income ranges that were selling the most. so I decided to focus on those 3 and create a piechart to see what product types they were buying the most.

The 3 top selling income levels which are “35-59K”, “50-74K”, and “75-99K” also are buying 1 item which results in 70%, 63% and 62% sales of the total sales value. Which is “BEER/ALE/MALT LIQUORS”.

35-49k

50-74K

75-99K

Summary

I have been tasked with identifying and analyzing potential growth opportunities where Regork can invest in order to increase revenue and profits. In the beginning of this analysis, I wanted figure out what departments were the most fruitful for the company which was the Grocery department. From then I decided to focus on which categories and product types were the top sellers. I found that Soft Drinks were the top seller out of all the other Product Categories. I decided I needed a better understanding of that category therefore I decided to compare it with all the income levels from the Demographics to better understand what type of people spent so much on soft drinks. Those were the income ranges: “35-49K”, “50-74K” and “75-99K”. Afterwards I delved deeper into what products out of the soft drinks category were the most sold for those income ranges and I found that close to 50% and sometimes even more were from the “SOFT DRINKS 12/18&15PK CAN CARB”, while that was the top seller, I also decide to include the second and third top seller. Those are the “SFT DRNK 2 LITER BTL CARB INCL” and “SOFT DRINKS 20PK&24PK CAN CARB” which even though they were about 15% and 12% of the total sales for the Soft Drinks category they still amount to a big chunk of the total soft drinks category overall sales. From my analysis I recommend increasing the price as well as running deals and campaigns to better entice the customers into buying the products mentioned previously, that we know are already top sellers.

Furthermore, when analyzing the soft drinks category sales over income levels I noticed that only 3 income ranges were the most fruitful. So, I wanted to understand if it was because the other income ranges were buying a different product, and since we are talking about soft drinks, I deduced that another competitor for that specific product would be alcohol. Similarly, I noticed that the same income range, “35-49K”, “50-74K” and “75-99K” were the top buyers for the alcohol product category as well. Diving deeper into the data I can also conclude that the top selling item within the alcohol category was without a doubt the “BEER/ALE/MALT LIQUORS” where they sold as 70%, 63%, and 62%. If we wanted to maximized sales revenue, I would suggest similarly to the soft drinks to increase the price as well as running deals and campaigns to better entice the customers into buying the products mentioned previously, that we know are already top sellers.

There are plenty of limitations of this analysis. Such as the data is old, from 2017 therefore, this analysis could be completely irrelevant. Furthermore, I would add that selling trends do change over the years, meaning that even though the recommended products sold well this year does not mean that they are going to sell just as well or more in the future years. There are also a lot more factors contributing to this analysis such as economy, consumer confidence, income levels and more.