Introduction

Our Problem

We believe there are many opportunities for Regork to sell more cheese products. Doing so should also increase sales of other products, since cheese is often an ingredient in a larger dish or used as a garnish.

Our Findings

Using the demographics and transactions data sets, we determined which income and age groups purchase cheese products and which products are most often purchased alongside them.

Why Does It Matter?

This analysis is useful to Regork because it highlights who is most likely to buy cheese and which other products those customers are most likely to buy. This information would allow Regork to lay out its stores more effectively. The marketing team can also use it to reach untapped markets that are not currently buying cheese products.

Packages

completejourney

This package gives us access to the data needed to analyze Regork's transaction and demographic data.

dplyr

This package makes the data easier to manipulate.

ggplot2

We used this package for data visualization, including both graphs. It streamlines designing and modifying plots in R.

suppressPackageStartupMessages(library(completejourney))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))

Create Variables

Assign the parts of the data set that we plan to use to short variable names.

transactions <- get_transactions()

promotions <- get_promotions()

products <- products

demographics <- demographics

Filter

Filter the data to keep only cheese products and the transactions that include them.

cheese_products <- products %>%
  filter(grepl("CHEESE", product_category, ignore.case = TRUE))

cheese_transactions <- transactions %>%
  filter(product_id %in% cheese_products$product_id)
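As a minimal sketch of what this pattern match does, the toy table below (hypothetical rows, not Regork data) shows that grepl() keeps every category whose name contains "CHEESE" in any letter case. Listing the matched categories is one way to check that the filter captures only what was intended.

```r
library(dplyr)

# Hypothetical toy rows for illustration only; not taken from completejourney
toy_products <- data.frame(
  product_id = c("1", "2", "3", "4"),
  product_category = c("CHEESE", "CHEESES", "FLUID MILK PRODUCTS", "Cheese Snacks"),
  stringsAsFactors = FALSE
)

# Same pattern match as in the report: any category containing "cheese"
toy_cheese <- toy_products %>%
  filter(grepl("CHEESE", product_category, ignore.case = TRUE))

# List the distinct categories that the pattern matched
matched_categories <- toy_cheese %>%
  distinct(product_category) %>%
  pull(product_category)

print(matched_categories)
```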

Join The Data Sets

Join the demographics and cheese transactions data sets to analyze them further.

cheese_demographics <- cheese_transactions %>%
  left_join(demographics, by = "household_id")
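A left join keeps every cheese transaction, including those from households with no demographic record, which is why missing values need handling later. The sketch below uses small hypothetical tables (the household IDs and values are made up) to show how such rows come through with NA in the demographic columns.

```r
library(dplyr)

# Hypothetical toy data for illustration only
toy_transactions <- data.frame(
  household_id = c("A", "A", "B", "C"),
  product_id = c("10", "11", "10", "12"),
  stringsAsFactors = FALSE
)
toy_demographics <- data.frame(
  household_id = c("A", "B"),
  age = c("45-54", "25-34"),
  stringsAsFactors = FALSE
)

# Every transaction row is kept; household "C" has no demographic match
toy_joined <- toy_transactions %>%
  left_join(toy_demographics, by = "household_id")

# Count rows that came through without demographic information
unmatched_rows <- sum(is.na(toy_joined$age))
print(unmatched_rows)
```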

Create Summary

Create a summary of the demographic data for households that purchased cheese products. Filter out any missing values.

demographic_summary <- cheese_demographics %>%
  group_by(age, income, marital_status) %>%
  summarize(TotalPurchases = n(), .groups = "drop") %>%
  arrange(desc(TotalPurchases))

demographic_summary <- demographic_summary %>%
  filter(!is.na(age), !is.na(income), !is.na(marital_status))

print(demographic_summary)
## # A tibble: 106 × 4
##    age   income marital_status TotalPurchases
##    <ord> <ord>  <ord>                   <int>
##  1 45-54 50-74K Unmarried                1725
##  2 45-54 50-74K Married                   986
##  3 45-54 75-99K Married                   949
##  4 35-44 35-49K Married                   853
##  5 35-44 50-74K Married                   832
##  6 25-34 50-74K Unmarried                 767
##  7 25-34 50-74K Married                   742
##  8 35-44 75-99K Married                   682
##  9 45-54 35-49K Unmarried                 620
## 10 35-44 35-49K Unmarried                 614
## # ℹ 96 more rows
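TotalPurchases counts transaction rows, so a group's total depends on both how many households it contains and how often each one buys. As a hedged sketch on hypothetical data, the summary could also count distinct households; the column names Households and PurchasesPerHousehold are our own and are not part of the analysis above.

```r
library(dplyr)

# Hypothetical toy data: household "A" buys cheese three times
toy_cheese_demographics <- data.frame(
  household_id = c("A", "A", "A", "B", "C"),
  age = c("45-54", "45-54", "45-54", "45-54", "25-34"),
  stringsAsFactors = FALSE
)

toy_summary <- toy_cheese_demographics %>%
  group_by(age) %>%
  summarize(
    TotalPurchases = n(),                   # transaction rows in the group
    Households = n_distinct(household_id),  # distinct households in the group
    .groups = "drop"
  ) %>%
  mutate(PurchasesPerHousehold = TotalPurchases / Households)

print(toy_summary)
```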

Demographics of Cheese Product Buyers

Create a graph to represent the demographics of cheese product buyers.

ggplot(demographic_summary, aes(x = age, y = TotalPurchases, fill = income)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Demographics of Cheese Product Buyers",
       x = "Age Group",
       y = "Total Purchases") +
  theme_minimal()
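The summary has one row per age, income, and marital status, while the plot maps only age and income. Rows that differ only in marital status can therefore be drawn at the same bar position instead of being added together. One possible way to handle this, shown as a sketch, is to total the purchases by age and income before plotting. The four rows below are copied from the printed summary above; the name age_income_totals is our own.

```r
library(dplyr)
library(ggplot2)

# Four rows taken from the printed summary above
summary_rows <- data.frame(
  age = c("25-34", "25-34", "45-54", "45-54"),
  income = c("50-74K", "50-74K", "50-74K", "50-74K"),
  marital_status = c("Married", "Unmarried", "Married", "Unmarried"),
  TotalPurchases = c(742, 767, 986, 1725),
  stringsAsFactors = FALSE
)

# Collapse marital status so each age/income pair has exactly one row
age_income_totals <- summary_rows %>%
  group_by(age, income) %>%
  summarize(TotalPurchases = sum(TotalPurchases), .groups = "drop")

# geom_col() is equivalent to geom_bar(stat = "identity")
p <- ggplot(age_income_totals, aes(x = age, y = TotalPurchases, fill = income)) +
  geom_col(position = "dodge") +
  labs(title = "Demographics of Cheese Product Buyers",
       x = "Age Group",
       y = "Total Purchases") +
  theme_minimal()
```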

Filter and Group by Basket

Keep only the baskets that contain cheese items, then determine what else was purchased in those baskets.

related_purchases <- transactions %>%
  # Keep every item from baskets that contain at least one cheese product
  filter(basket_id %in% cheese_transactions$basket_id) %>%
  # Drop the cheese items themselves; %in% tests set membership,
  # whereas != compares element-wise and recycles the shorter vector
  filter(!(product_id %in% cheese_products$product_id))

top_items_with_cheese <- related_purchases %>%
  group_by(product_id) %>%
  summarize(TotalPurchases = n(), .groups = "drop") %>%
  left_join(products, by = "product_id") %>%
  arrange(desc(TotalPurchases)) %>%
  head(5)

print(top_items_with_cheese)
## # A tibble: 5 × 8
##   product_id TotalPurchases manufacturer_id department brand    product_category
##   <chr>               <int> <chr>           <chr>      <fct>    <chr>           
## 1 1082185              7510 2               PRODUCE    National TROPICAL FRUIT  
## 2 1029743              3421 69              GROCERY    Private  FLUID MILK PROD…
## 3 995242               3032 69              GROCERY    Private  FLUID MILK PROD…
## 4 981760               2789 69              GROCERY    Private  EGGS            
## 5 1106523              2093 69              GROCERY    Private  FLUID MILK PROD…
## # ℹ 2 more variables: product_type <chr>, package_size <chr>
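When excluding a set of product IDs, a comparison with `!=` and a membership test with `%in%` behave differently in R. The sketch below uses made-up IDs to show the difference: `!=` compares the two vectors position by position, recycling the shorter one, while `%in%` checks each ID against the whole set.

```r
# Hypothetical IDs for illustration only
basket_product_ids <- c("1", "2", "3", "4")
cheese_ids <- c("2", "1")

# Element-wise comparison: cheese_ids is recycled to c("2", "1", "2", "1"),
# so "1" is compared with "2" and "2" with "1", and nothing is excluded
elementwise_keep <- basket_product_ids != cheese_ids

# Membership test: each ID is checked against the whole set of cheese IDs
membership_keep <- !(basket_product_ids %in% cheese_ids)

print(basket_product_ids[elementwise_keep])
print(basket_product_ids[membership_keep])
```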

Top 3 Products Purchased with Cheese Products

ggplot(top_items_with_cheese, aes(x = reorder(product_type, -TotalPurchases), y = TotalPurchases)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Top 3 Items Purchased with Cheese Products",
       x = "Product",
       y = "Total Purchases") +
  theme_minimal() +
  coord_flip()
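The plot labels bars by product_type, while the table ranks individual product_id values, so several products can share one label. As a minimal sketch with hypothetical product types and counts (none of these values come from the Regork data), purchases could be totaled by product type before picking the top three.

```r
library(dplyr)

# Hypothetical rows for illustration; product types and counts are made up
toy_top_items <- data.frame(
  product_id = c("p1", "p2", "p3", "p4"),
  product_type = c("TYPE A", "TYPE B", "TYPE B", "TYPE C"),
  TotalPurchases = c(50, 30, 25, 20),
  stringsAsFactors = FALSE
)

# Sum purchases across products that share a type, then rank the types
top_types <- toy_top_items %>%
  group_by(product_type) %>%
  summarize(TotalPurchases = sum(TotalPurchases), .groups = "drop") %>%
  arrange(desc(TotalPurchases)) %>%
  head(3)

print(top_types)
```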

Summary

We believed that Regork was missing out on profit from cheese products. By joining the demographics and transactions data sets, we analyzed which demographic groups are already buying cheese and which are not. We found that younger and older customers buy substantially fewer cheese products than middle-aged customers, and that middle-income households purchase more cheese products than lower- and upper-income households. We also analyzed what else customers bought with cheese products and found that milk, eggs, and bananas were the top three products purchased along with cheese. These insights are most useful for advertising and store layout. Advertising to groups that currently buy little cheese could open up an untapped market. Rearranging stores could maximize the number of products customers walk past on the way to the items they typically buy with cheese.

Limitations

Our biggest limitation concerns the store layout suggestion. Before such a change could be implemented, it would have to be reconciled with thousands of other data sets similar to ours. In addition, the demographic findings are limited by outliers in the graph, which make it difficult to pinpoint potential marketing targets.