I have been tasked with identifying and analyzing potential growth opportunities where Regork can invest in order to increase revenue and profits. Companies like Regork are always looking to make an impact and to win the “hearts and minds” of the consumer. As a grocer, sometimes this can be difficult when you deal with commodities. How can you gain the attention of the consumer and get them into the store?
The following packages are required:
library(completejourney) #Import completejourney datasets
library(tidyverse) #visualize, transform and tidy the data
library(dplyr) #data manipulation
library(ggplot2) #plot charts
library(lubridate) #perform date-time operations
completejourney provides access to data sets characterizing household level transactions over the year 2017 from a group of 2,469 households who are frequent shoppers at a grocery store. It contains all of each household’s purchases, not just those from a limited number of categories. For certain households, demographic information as well as direct marketing contact history are captured.
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
I began my analysis by firstly finding what are the top 5 categories with the highest sales value.The results are shown below:
## [1] "drug gm" "fuel" "grocery" "meat" "produce"
For better analysis, I decided to plot this data.
I decided to focus on the top selling department. Which was the Grocery department.
Afterwards, I wanted to better understand and see what the Grocery department sales and data looked like.
The Soft Drinks category is the top selling so I decided to move on with only this category from no on.
I combined all soft drink product types to see if there were any trends.
## # A tibble: 30 × 2
## product_type total_sales_value
## <chr> <dbl>
## 1 SOFT DRINKS 12/18&15PK CAN CAR 89597.
## 2 SFT DRNK 2 LITER BTL CARB INCL 29288.
## 3 SOFT DRINKS 20PK&24PK CAN CARB 18212.
## 4 SFT DRNK MLT-PK BTL CARB (EXCP 14691.
## 5 SFT DRNK SNGL SRV BTL CARB (EX 9111.
## 6 SOFT DRINKS 6PK/4PK CAN CARB ( 5578.
## 7 TEA SWEETENED 3399.
## 8 ENERGY DRINK 2927.
## 9 SOFT DRINKS CAN NON-CARB (EXCE 2635.
## 10 SOFT DRINK BOTTLE NON-CARB (EX 1454.
## # ℹ 20 more rows
Afterwards, I wanted to see what this would look like compared to income.
As you can see I tried to plot each product type and its total sales value by income. However, I needed to widen the range and combine together all the soft drinks product types which I did below.
From the plot above, I concluded that there were 3 income ranges that were selling the most. so I decided to focus on those 3 and create a piechart to see what product types they were buying the most.
The 3 top selling income levels which are “35-59K”, “50-74K”, and “75-99K” are also buying the same 3 product type items. Which are “SOFT DRINKS 12/18&15PK CAN CARB”, “SFT DRNK 2 LITER BTL CARB INCL” and “SOFT DRINKS 20PK&24PK CAN CARB”.
I analysed the Soft Drinks category, but I wanted to see also the Alcohol category. I wanted to understand if it was because the other income ranges were buying a different product, and since we are talking about soft drinks, I deduced that another competitor for that specific product would be alcohol
I combined all alcohol product types to see if there were any trends.
## # A tibble: 27 × 2
## product_type total_sales_value
## <chr> <dbl>
## 1 BEERALEMALT LIQUORS 82643.
## 2 PREMIUM 750ML WINES 9410.
## 3 POPULAR 750ML WINES 7911.
## 4 AUSTRALIAN/NZ WINES 5195.
## 5 FIGHTING VARIETAL WINES 4294.
## 6 BOX WINES 3592.
## 7 ITALIAN WINES 2721.
## 8 ULTRA PREMIUM WINES 2637.
## 9 VALUE GLASS WINE 2449.
## 10 SPARKLING WINES 2157.
## # ℹ 17 more rows
Afterwards, I wanted to see what this would look like compared to income.
As you can see I tried to plot each product type and its total sales value by income. However, I needed to widen the range and combine together all the soft drinks product types which I did below.
From the plot above, I concluded that there were 3 income ranges that were selling the most. so I decided to focus on those 3 and create a piechart to see what product types they were buying the most.
The 3 top selling income levels which are “35-59K”, “50-74K”, and “75-99K” also are buying 1 item which results in 70%, 63% and 62% sales of the total sales value. Which is “BEER/ALE/MALT LIQUORS”.
I have been tasked with identifying and analyzing potential growth opportunities where Regork can invest in order to increase revenue and profits. In the beginning of this analysis, I wanted figure out what departments were the most fruitful for the company which was the Grocery department. From then I decided to focus on which categories and product types were the top sellers. I found that Soft Drinks were the top seller out of all the other Product Categories. I decided I needed a better understanding of that category therefore I decided to compare it with all the income levels from the Demographics to better understand what type of people spent so much on soft drinks. Those were the income ranges: “35-49K”, “50-74K” and “75-99K”. Afterwards I delved deeper into what products out of the soft drinks category were the most sold for those income ranges and I found that close to 50% and sometimes even more were from the “SOFT DRINKS 12/18&15PK CAN CARB”, while that was the top seller, I also decide to include the second and third top seller. Those are the “SFT DRNK 2 LITER BTL CARB INCL” and “SOFT DRINKS 20PK&24PK CAN CARB” which even though they were about 15% and 12% of the total sales for the Soft Drinks category they still amount to a big chunk of the total soft drinks category overall sales. From my analysis I recommend increasing the price as well as running deals and campaigns to better entice the customers into buying the products mentioned previously, that we know are already top sellers.
Furthermore, when analyzing the soft drinks category sales over income levels I noticed that only 3 income ranges were the most fruitful. So, I wanted to understand if it was because the other income ranges were buying a different product, and since we are talking about soft drinks, I deduced that another competitor for that specific product would be alcohol. Similarly, I noticed that the same income range, “35-49K”, “50-74K” and “75-99K” were the top buyers for the alcohol product category as well. Diving deeper into the data I can also conclude that the top selling item within the alcohol category was without a doubt the “BEER/ALE/MALT LIQUORS” where they sold as 70%, 63%, and 62%. If we wanted to maximized sales revenue, I would suggest similarly to the soft drinks to increase the price as well as running deals and campaigns to better entice the customers into buying the products mentioned previously, that we know are already top sellers.
There are plenty of limitations of this analysis. Such as the data is old, from 2017 therefore, this analysis could be completely irrelevant. Furthermore, I would add that selling trends do change over the years, meaning that even though the recommended products sold well this year does not mean that they are going to sell just as well or more in the future years. There are also a lot more factors contributing to this analysis such as economy, consumer confidence, income levels and more.