Introduction

Our objective is to explore and evaluate various growth opportunities where Regork can invest to boost revenue and profit. We are always looking for ways to to make a meaningful impact and capture consumer loyalty. It can be difficult to win the “hearts and minds” of the consumers considering it’s hard to stand out but we have an idea to reach some markets to bring them into the store and purchase our products.

What do customers tend to buy along with Frozen Meat/Meat dinners and are there any correlations with these products to potentially create promotions to increase revenue?

To analyze this, we will be using the complete journey package in R which is where the data is for Regork. The data is already there, we will be transforming the data in unique ways to show the findings of the above question.

Using this data will help the CEO of Regork by finding target markets and making promotions for specific products to gain more consumer interest. Using this data will bring in more customers and more money for Regork.

Our proposed solution is to create a deal or promotion when customers buy Frozen Meat/Meat Dinners since it is a popular item in Regork. In our analysis, we will demonstrate and recommended some products that tend to be bought with Frozen Meat/Meat Dinners to make a package deal.

Packages Required

library(tidyverse) # Transforming and presenting data
library(ggplot2) # Creating plots and visualizations
library(completejourney) # Main data for transactions, products, promotions, etc.
library(dplyr) # Transforming and manipulating data
library(readr) # Reads CSV's and other various files
library(knitr) # Creates the html

Data Prep

These are the data frames in the complete journey package we will be using in our analysis

transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows
products
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
demographics
## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
coupons
## # A tibble: 116,204 × 3
##    coupon_upc  product_id campaign_id
##    <chr>       <chr>      <chr>      
##  1 10000085207 9676830    26         
##  2 10000085207 9676943    26         
##  3 10000085207 9676944    26         
##  4 10000085207 9676947    26         
##  5 10000085207 9677008    26         
##  6 10000085207 9677052    26         
##  7 10000085207 9677385    26         
##  8 10000085207 9677479    26         
##  9 10000085207 9677791    26         
## 10 10000085207 9677878    26         
## # ℹ 116,194 more rows

These 4 data sets is how we will be coming up with suggestions for our business problem. We will be joining and filtering multiple data sets to find our answers.

Exploratory Data Analysis

We first want to find out, “What age group is buying Frozen Meat/Meat Dinners the most?”. We can determine what market to push promotions out to the most with this.

transactions %>%
  inner_join(demographics, by = ("household_id")) %>%
  inner_join(products, by = ("product_id")) %>%
  filter(product_category == 'FRZN MEAT/MEAT DINNERS') %>%
  ggplot(aes(x = age, fill = product_category)) +
  geom_bar(color="red") +
  labs(
    fill = "Product Type",
    x = "Age",
    y = "Count of Products Sold",
    title = "What Age Group Buys The Most Frozen Dinners?",
    subtitle = "The bars indicate how many products are sold per age group"
  )

As we can see, ages 45-54 purchase the most frozen meals. More specifically, a majority of the purchases consist of ages 25-54. In this range includes college students, middle aged adults and potentially with families, and slightly older adults. With this data, Regork can push out some deals to these markets. For example e could create a “college deal” or a “family bundle”. This will get the attention of these groups that buy the products the most.

Next we wanted to examine which age group uses the most coupons. If we want to make a promotion, we want to see where coupons are being used the most. In fact, we will filter out Frozen Meat/Meat Dinners to break the data down even further. The goal is to see if those age groups determined in the graph above and close to the same in this graph. If so, then we can start to make an idea on how we want to promote this.

transactions %>%
  inner_join(demographics, by = ("household_id")) %>%
  inner_join(products, by = ("product_id")) %>%
  inner_join(coupons, by = 'product_id') %>%
  filter(product_category == 'FRZN MEAT/MEAT DINNERS') %>%
  group_by(age) %>%
  summarize(total_coupon_discount = sum(coupon_disc, na.rm = TRUE)) %>%
  ggplot(aes(x = age, y = total_coupon_discount, fill = age)) +
  geom_bar(stat = "identity") +  
  labs(title = "Total Coupons Used of Frozen Meals by Age Group", 
       x = "Age Group", 
       y = "Total Coupons Used") 

As seen in the graph, this is a near identical correlation between the first and second graph. This also proves to us that the 35-54 age group is willing to purchase deals related to Frozen Meat/Meat Dinners. With these answers, we can now look into ways to actually make deals to grow revenue for Regork. We want to now break the data down to look at the top 10 product types that are inside the Frozen Meat/Meat Dinners category. This way we can see which Frozen Meats/Meat Dinners are the most purchased to base the deals around those particular products.

transactions %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(products, by = "product_id") %>%
  filter(product_category == 'FRZN MEAT/MEAT DINNERS') %>%
  count(product_type) %>%  
  arrange(desc(n)) %>%  
  head(10) %>%  
  ggplot(aes(x = reorder(product_type, n), y = n, fill = product_type)) +  
  geom_bar(stat = "identity") +  
  labs(
    title = "Top 10 Product Types in Frozen Meat/Dinners",
    x = "Product Type",
    y = "Number of Purchases"
  ) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))  

We have now exacted the top 3 product types within the Frozen Meat/Meat Dinners category. The top 3 product types shown in the graph are actually related to meal prepping which we can see soon in the graphs below. Think of something like hamburger helper. That is the type of product we are dealing with.

Going into our final part of the analysis, we want to determine what the top 3 products purchased are when those top 3 product types are purchased in the graph above. This will finally show us which products to bundle with Frozen Meats/Meat Dinners to grow revenue.

To Start, we will look at the most purchased products bought along with the Frozen Premium Entree Dinners/N product type.

frzn_transactions <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  filter(product_type == "FRZN SS PREMIUM ENTREES/DNRS/N")


co_occuring_products <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  filter(basket_id %in% frzn_transactions$basket_id) %>%
  filter(product_type != 'FRZN SS PREMIUM ENTREES/DNRS/N') %>%
  group_by(product_id) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  slice_head(n = 3)  


ggplot(co_occuring_products, aes(x = reorder(product_id, count), y = count)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +  
  labs(title = "Top 3 Products Bought Alongside 'FRZN SS PREMIUM ENTREES/DNRS/N'",
       x = "Product ID",
       y = "Number of Occurrences") +
  theme_minimal()

As seen in the graph, there are 3 product id’s shown and we will break down what those products are. To Start, Product 1082185 is actually Bananas. Bananas are the most purchased product alongside Frozen Premium Entrees. Product 1029743 and 995242 are two types of white milk. This is an interesting find. Bananas are a popular fruit to eat with any meal and the data shows here that there is a relationship of purchases between bananas and frozen meals. With the milk, we believe there is a correlation there because majority of premium meals require milk. This would make sense why milk is purchased a lot with our targeted product. Now lets look at Economy Entrees and see if there are any similarities.

frzn_transactions <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  filter(product_type == "SS ECONOMY ENTREES/DINNERS ALL")


co_occuring_products <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  filter(basket_id %in% frzn_transactions$basket_id) %>%
  filter(product_type != 'SS ECONOMY ENTREES/DINNERS ALL') %>%
  group_by(product_id) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  slice_head(n = 3)  


ggplot(co_occuring_products, aes(x = reorder(product_id, count), y = count)) +
  geom_bar(stat = "identity", fill = "pink") +
  coord_flip() +  
  labs(title = "Top 3 Products Bought Alongside 'SS ECONOMY ENTREES/DINNERS ALL'",
       x = "Product ID",
       y = "Number of Occurrences") +
  theme_minimal()

Surprisingly, it is the same exact products in the same exact order with the Bananas and Milk. So there is an exact relationship of purchase behavior here. We can already get a sense of what products to bundle with Frozen Dinners but just to be sure, we will do the same thing with the last Frozen meal Product type.

frzn_transactions <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  filter(product_type == "FRZN SS PREMIUM ENTREES/DNRS/T")


co_occuring_products <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  filter(basket_id %in% frzn_transactions$basket_id) %>%
  filter(product_type != 'FRZN SS PREMIUM ENTREES/DNRS/T') %>%
  group_by(product_id) %>%
  summarise(count = n()) %>%
  arrange(desc(count)) %>%
  slice_head(n = 3)  


ggplot(co_occuring_products, aes(x = reorder(product_id, count), y = count)) +
  geom_bar(stat = "identity", fill = "violet") +
  coord_flip() +  
  labs(title = "Top 3 Products Bought Alongside 'FRZN SS PREMIUM ENTREES/DNRS/T'",
       x = "Product ID",
       y = "Number of Occurrences") +
  theme_minimal()

Once again, it is the same three products. There is only one slight difference and that is the two milks are flipped in the number of occurrences. Still, its the same products shown across all three graphs. With all this data we have discovered, it is time to provide an answer for our question.

Summary

Summarize the problem statement you addressed

Throughout our data analysis we have been trying to determine if there are products that are frequently bought alongside Frozen Meat/Meat Dinners. If there was, then we would make a suggestion to Regork for making a package deal with these products to boost revenue.

Summarize how you addressed this problem statement (the data used and the methodology employed)

To be able to get to the answers we wanted. We had to go step by step. We basically made a graph of what we wanted and broke it down further and further. We started with which age group spends the most on Frozen Meals. We wanted to determine which target markets to push out these deals. Then we wanted to see if this age group actually uses coupons/deals. We determined that there is a direct link in those two graphs. So we continued our analysis by figuring out the top 5 products within Frozen Meals. We then extracted the top 3 of those to figure out the top products purchased along with Frozen Meals which was determined to be Bananas and Milk.

Summarize the interesting insights that your analysis provided. What should you propose to the Regork CEO?

Our analysis allows Regork to understand consumer behavior for a specific product. It shows what a certain age group buys wuth certain products. Regork as a business is always finding ways to create more revenue and we believe our analysis shows ways to do so. If we use this analysis, then we should make a promotion that gives customers some sort of discount when they buy Frozen Meat/Dinners, Bananas, and Milk. We also need to make sure these ads are being exposed to the 45-54 age group as they use the most coupons on Frozen Meals as we seen from the start.

Discuss the limitations of your analysis and how you, or someone else, could improve or build on it

The limitations consist of only showing the top 3 products. There could be so many other routes of doing this analysis. We just focused on a specific part of a product when we could go super broad. Maybe we could build a data model out of this to show multiple products and expand our research and findings even further.