Our objective is to explore and evaluate various growth opportunities where Regork can invest to boost revenue and profit. We are always looking for ways to to make a meaningful impact and capture consumer loyalty. It can be difficult to win the “hearts and minds” of the consumers considering it’s hard to stand out but we have an idea to reach some markets to bring them into the store and purchase our products.
What do customers tend to buy along with Frozen Meat/Meat dinners and are there any correlations with these products to potentially create promotions to increase revenue?
To analyze this, we will be using the complete journey package in R which is where the data is for Regork. The data is already there, we will be transforming the data in unique ways to show the findings of the above question.
Using this data will help the CEO of Regork by finding target markets and making promotions for specific products to gain more consumer interest. Using this data will bring in more customers and more money for Regork.
Our proposed solution is to create a deal or promotion when customers buy Frozen Meat/Meat Dinners since it is a popular item in Regork. In our analysis, we will demonstrate and recommended some products that tend to be bought with Frozen Meat/Meat Dinners to make a package deal.
library(tidyverse) # Transforming and presenting data
library(ggplot2) # Creating plots and visualizations
library(completejourney) # Main data for transactions, products, promotions, etc.
library(dplyr) # Transforming and manipulating data
library(readr) # Reads CSV's and other various files
library(knitr) # Creates the html
These are the data frames in the complete journey package we will be using in our analysis
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # ℹ 20,940,519 more rows
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
coupons
## # A tibble: 116,204 × 3
## coupon_upc product_id campaign_id
## <chr> <chr> <chr>
## 1 10000085207 9676830 26
## 2 10000085207 9676943 26
## 3 10000085207 9676944 26
## 4 10000085207 9676947 26
## 5 10000085207 9677008 26
## 6 10000085207 9677052 26
## 7 10000085207 9677385 26
## 8 10000085207 9677479 26
## 9 10000085207 9677791 26
## 10 10000085207 9677878 26
## # ℹ 116,194 more rows
These 4 data sets is how we will be coming up with suggestions for our business problem. We will be joining and filtering multiple data sets to find our answers.
We first want to find out, “What age group is buying Frozen Meat/Meat Dinners the most?”. We can determine what market to push promotions out to the most with this.
transactions %>%
inner_join(demographics, by = ("household_id")) %>%
inner_join(products, by = ("product_id")) %>%
filter(product_category == 'FRZN MEAT/MEAT DINNERS') %>%
ggplot(aes(x = age, fill = product_category)) +
geom_bar(color="red") +
labs(
fill = "Product Type",
x = "Age",
y = "Count of Products Sold",
title = "What Age Group Buys The Most Frozen Dinners?",
subtitle = "The bars indicate how many products are sold per age group"
)
As we can see, ages 45-54 purchase the most frozen meals. More specifically, a majority of the purchases consist of ages 25-54. In this range includes college students, middle aged adults and potentially with families, and slightly older adults. With this data, Regork can push out some deals to these markets. For example e could create a “college deal” or a “family bundle”. This will get the attention of these groups that buy the products the most.
Next we wanted to examine which age group uses the most coupons. If we want to make a promotion, we want to see where coupons are being used the most. In fact, we will filter out Frozen Meat/Meat Dinners to break the data down even further. The goal is to see if those age groups determined in the graph above and close to the same in this graph. If so, then we can start to make an idea on how we want to promote this.
transactions %>%
inner_join(demographics, by = ("household_id")) %>%
inner_join(products, by = ("product_id")) %>%
inner_join(coupons, by = 'product_id') %>%
filter(product_category == 'FRZN MEAT/MEAT DINNERS') %>%
group_by(age) %>%
summarize(total_coupon_discount = sum(coupon_disc, na.rm = TRUE)) %>%
ggplot(aes(x = age, y = total_coupon_discount, fill = age)) +
geom_bar(stat = "identity") +
labs(title = "Total Coupons Used of Frozen Meals by Age Group",
x = "Age Group",
y = "Total Coupons Used")
As seen in the graph, this is a near identical correlation between the first and second graph. This also proves to us that the 35-54 age group is willing to purchase deals related to Frozen Meat/Meat Dinners. With these answers, we can now look into ways to actually make deals to grow revenue for Regork. We want to now break the data down to look at the top 10 product types that are inside the Frozen Meat/Meat Dinners category. This way we can see which Frozen Meats/Meat Dinners are the most purchased to base the deals around those particular products.
transactions %>%
inner_join(demographics, by = "household_id") %>%
inner_join(products, by = "product_id") %>%
filter(product_category == 'FRZN MEAT/MEAT DINNERS') %>%
count(product_type) %>%
arrange(desc(n)) %>%
head(10) %>%
ggplot(aes(x = reorder(product_type, n), y = n, fill = product_type)) +
geom_bar(stat = "identity") +
labs(
title = "Top 10 Product Types in Frozen Meat/Dinners",
x = "Product Type",
y = "Number of Purchases"
) +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
We have now exacted the top 3 product types within the Frozen Meat/Meat Dinners category. The top 3 product types shown in the graph are actually related to meal prepping which we can see soon in the graphs below. Think of something like hamburger helper. That is the type of product we are dealing with.
Going into our final part of the analysis, we want to determine what the top 3 products purchased are when those top 3 product types are purchased in the graph above. This will finally show us which products to bundle with Frozen Meats/Meat Dinners to grow revenue.
To Start, we will look at the most purchased products bought along with the Frozen Premium Entree Dinners/N product type.
frzn_transactions <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_type == "FRZN SS PREMIUM ENTREES/DNRS/N")
co_occuring_products <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(basket_id %in% frzn_transactions$basket_id) %>%
filter(product_type != 'FRZN SS PREMIUM ENTREES/DNRS/N') %>%
group_by(product_id) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
slice_head(n = 3)
ggplot(co_occuring_products, aes(x = reorder(product_id, count), y = count)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Top 3 Products Bought Alongside 'FRZN SS PREMIUM ENTREES/DNRS/N'",
x = "Product ID",
y = "Number of Occurrences") +
theme_minimal()
As seen in the graph, there are 3 product id’s shown and we will break
down what those products are. To Start, Product 1082185 is actually
Bananas. Bananas are the most purchased product alongside Frozen Premium
Entrees. Product 1029743 and 995242 are two types of white milk. This is
an interesting find. Bananas are a popular fruit to eat with any meal
and the data shows here that there is a relationship of purchases
between bananas and frozen meals. With the milk, we believe there is a
correlation there because majority of premium meals require milk. This
would make sense why milk is purchased a lot with our targeted product.
Now lets look at Economy Entrees and see if there are any
similarities.
frzn_transactions <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_type == "SS ECONOMY ENTREES/DINNERS ALL")
co_occuring_products <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(basket_id %in% frzn_transactions$basket_id) %>%
filter(product_type != 'SS ECONOMY ENTREES/DINNERS ALL') %>%
group_by(product_id) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
slice_head(n = 3)
ggplot(co_occuring_products, aes(x = reorder(product_id, count), y = count)) +
geom_bar(stat = "identity", fill = "pink") +
coord_flip() +
labs(title = "Top 3 Products Bought Alongside 'SS ECONOMY ENTREES/DINNERS ALL'",
x = "Product ID",
y = "Number of Occurrences") +
theme_minimal()
Surprisingly, it is the same exact products in the same exact order with the Bananas and Milk. So there is an exact relationship of purchase behavior here. We can already get a sense of what products to bundle with Frozen Dinners but just to be sure, we will do the same thing with the last Frozen meal Product type.
frzn_transactions <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_type == "FRZN SS PREMIUM ENTREES/DNRS/T")
co_occuring_products <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(basket_id %in% frzn_transactions$basket_id) %>%
filter(product_type != 'FRZN SS PREMIUM ENTREES/DNRS/T') %>%
group_by(product_id) %>%
summarise(count = n()) %>%
arrange(desc(count)) %>%
slice_head(n = 3)
ggplot(co_occuring_products, aes(x = reorder(product_id, count), y = count)) +
geom_bar(stat = "identity", fill = "violet") +
coord_flip() +
labs(title = "Top 3 Products Bought Alongside 'FRZN SS PREMIUM ENTREES/DNRS/T'",
x = "Product ID",
y = "Number of Occurrences") +
theme_minimal()
Once again, it is the same three products. There is only one slight
difference and that is the two milks are flipped in the number of
occurrences. Still, its the same products shown across all three graphs.
With all this data we have discovered, it is time to provide an answer
for our question.
Summarize the problem statement you addressed
Throughout our data analysis we have been trying to determine if there are products that are frequently bought alongside Frozen Meat/Meat Dinners. If there was, then we would make a suggestion to Regork for making a package deal with these products to boost revenue.
Summarize how you addressed this problem statement (the data used and the methodology employed)
To be able to get to the answers we wanted. We had to go step by step. We basically made a graph of what we wanted and broke it down further and further. We started with which age group spends the most on Frozen Meals. We wanted to determine which target markets to push out these deals. Then we wanted to see if this age group actually uses coupons/deals. We determined that there is a direct link in those two graphs. So we continued our analysis by figuring out the top 5 products within Frozen Meals. We then extracted the top 3 of those to figure out the top products purchased along with Frozen Meals which was determined to be Bananas and Milk.
Summarize the interesting insights that your analysis provided. What should you propose to the Regork CEO?
Our analysis allows Regork to understand consumer behavior for a specific product. It shows what a certain age group buys wuth certain products. Regork as a business is always finding ways to create more revenue and we believe our analysis shows ways to do so. If we use this analysis, then we should make a promotion that gives customers some sort of discount when they buy Frozen Meat/Dinners, Bananas, and Milk. We also need to make sure these ads are being exposed to the 45-54 age group as they use the most coupons on Frozen Meals as we seen from the start.
Discuss the limitations of your analysis and how you, or someone else, could improve or build on it
The limitations consist of only showing the top 3 products. There could be so many other routes of doing this analysis. We just focused on a specific part of a product when we could go super broad. Maybe we could build a data model out of this to show multiple products and expand our research and findings even further.