We believe there are many opportunities for Regork to sell more cheese products. Doing so would also increase sales of other products, since cheese is often an ingredient in a larger dish or used as a garnish.
Using the demographics and transactions data sets, we determined which income and age groups purchase cheese products and which products are most often purchased alongside them.
This analysis is useful to Regork because it highlights who is most likely to buy cheese and which products they are most likely to buy with it. This information would allow Regork to lay out its stores more effectively. The marketing team can also use it to reach untapped markets that are not currently buying cheese products.
completejourney: This package gives us access to the data needed to analyze Regork's transaction and demographic data.
dplyr: This package makes the data easier to manipulate.
ggplot2: We used this package for data visualization, including both graphs. It streamlines the design and alteration of graphs in R.
suppressPackageStartupMessages(library(completejourney))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
Assign the parts of the data set that we plan to use to shorter names
transactions <- get_transactions()
promotions <- get_promotions()
products <- products
demographics <- demographics
Filter the data to depict only the values we want
cheese_products <- products %>% filter(grepl("CHEESE", product_category, ignore.case = TRUE))
cheese_transactions <- transactions %>% filter(product_id %in% cheese_products$product_id)
Join the demographics and cheese transactions data sets to further analyze them
cheese_demographics <- cheese_transactions %>% left_join(demographics, by = "household_id")
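Because `left_join()` keeps every cheese transaction, households without a demographic record come through with `NA` in the demographic columns. As a minimal sketch, assuming invented toy tables (`toy_cheese_transactions` and `toy_demographics` are hypothetical stand-ins, not the real data), the share of transactions that actually matched can be checked like this:

```r
library(dplyr)

# Toy stand-ins with invented values; only the column names mirror the report
toy_cheese_transactions <- tibble(
  household_id = c("1", "2", "3"),
  product_id   = c("cheddar", "brie", "cheddar")
)
toy_demographics <- tibble(
  household_id = c("1", "3"),
  age          = c("45-54", "25-34")
)

# left_join keeps all three transactions; household "2" has no demographics
joined <- toy_cheese_transactions %>%
  left_join(toy_demographics, by = "household_id")

# Share of cheese transactions with a matching demographic record
match_rate <- mean(!is.na(joined$age))
```

Running the same check on the real join would show how much of the cheese activity the demographic summary can actually describe.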
Create a summary of the demographic data for households that purchased cheese products. Filter out any missing values.
demographic_summary <- cheese_demographics %>%
group_by(age, income, marital_status) %>%
summarize(TotalPurchases = n(), .groups = "drop") %>%
arrange(desc(TotalPurchases))
demographic_summary <- demographic_summary %>%
filter(!is.na(age), !is.na(income), !is.na(marital_status))
print(demographic_summary)
## # A tibble: 106 × 4
## age income marital_status TotalPurchases
## <ord> <ord> <ord> <int>
## 1 45-54 50-74K Unmarried 1725
## 2 45-54 50-74K Married 986
## 3 45-54 75-99K Married 949
## 4 35-44 35-49K Married 853
## 5 35-44 50-74K Married 832
## 6 25-34 50-74K Unmarried 767
## 7 25-34 50-74K Married 742
## 8 35-44 75-99K Married 682
## 9 45-54 35-49K Unmarried 620
## 10 35-44 35-49K Unmarried 614
## # ℹ 96 more rows
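Raw purchase counts depend partly on how many households fall into each group, so a large group can rank highly even if its members buy cheese only occasionally. One possible refinement, sketched here on invented toy data (`toy_cheese_demographics` and the `Households` and `PurchasesPerHousehold` columns are hypothetical names), divides each group's purchases by the number of distinct cheese-buying households in it:

```r
library(dplyr)

# Toy stand-in for cheese_demographics; values are invented for illustration
toy_cheese_demographics <- tibble(
  household_id = c("1", "1", "1", "2", "3", "3", "4"),
  age          = c("45-54", "45-54", "45-54", "45-54", "25-34", "25-34", "25-34"),
  income       = c("50-74K", "50-74K", "50-74K", "50-74K", "35-49K", "35-49K", "35-49K")
)

per_household <- toy_cheese_demographics %>%
  group_by(age, income) %>%
  summarize(
    TotalPurchases        = n(),                       # cheese line items in the group
    Households            = n_distinct(household_id),  # households that bought cheese
    PurchasesPerHousehold = TotalPurchases / Households,
    .groups = "drop"
  ) %>%
  arrange(desc(PurchasesPerHousehold))
```

This rate only covers households that bought cheese at least once. Comparing against all households in a group would require counting them from the demographics table instead.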
Create Graph to represent the demographics of cheese product buyers.
ggplot(demographic_summary, aes(x = age, y = TotalPurchases, fill = income)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Demographics of Cheese Product Buyers",
x = "Age Group",
y = "Total Purchases") +
theme_minimal()
Filter the baskets that contain cheese items and determine what else is in them
related_purchases <- transactions %>%
filter(basket_id %in% cheese_transactions$basket_id) %>%
# Drop every cheese item with %in%; != would recycle the ID vector element-wise
filter(!(product_id %in% cheese_products$product_id))
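Removing the cheese items themselves from those baskets is a set-membership test, not an element-wise comparison. As a minimal sketch with invented product IDs (`basket_items` and `cheese_ids` are hypothetical), the following contrasts the two approaches:

```r
library(dplyr)

# Toy data: hypothetical product IDs, not taken from completejourney
basket_items <- tibble(
  basket_id  = c("b1", "b1", "b1", "b2", "b2"),
  product_id = c("milk", "cheddar", "eggs", "brie", "bananas")
)
cheese_ids <- c("cheddar", "brie")

# Element-wise comparison recycles cheese_ids along the column, so each row
# is checked against only one ID (R warns because 5 is not a multiple of 2)
recycled <- suppressWarnings(basket_items$product_id != cheese_ids)

# Set membership checks every row against every cheese ID
non_cheese <- basket_items %>%
  filter(!(product_id %in% cheese_ids))
```

In this toy case the element-wise version keeps the "cheddar" row because it happens to be compared with "brie", while the `%in%` version removes both cheese rows.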
top_items_with_cheese <- related_purchases %>%
group_by(product_id) %>%
summarize(TotalPurchases = n(), .groups = "drop") %>%
left_join(products, by = "product_id") %>%
arrange(desc(TotalPurchases)) %>%
head(5)
print(top_items_with_cheese)
## # A tibble: 5 × 8
## product_id TotalPurchases manufacturer_id department brand product_category
## <chr> <int> <chr> <chr> <fct> <chr>
## 1 1082185 7510 2 PRODUCE National TROPICAL FRUIT
## 2 1029743 3421 69 GROCERY Private FLUID MILK PROD…
## 3 995242 3032 69 GROCERY Private FLUID MILK PROD…
## 4 981760 2789 69 GROCERY Private EGGS
## 5 1106523 2093 69 GROCERY Private FLUID MILK PROD…
## # ℹ 2 more variables: product_type <chr>, package_size <chr>
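The counts above are transaction rows, so a basket with the same item on several rows is counted more than once. A complementary view, sketched here on invented toy data (`toy_transactions`, `cheese_ids`, and the `Baskets` and `ShareOfCheeseBaskets` columns are hypothetical names), counts each item once per basket and reports the share of cheese baskets that contain it:

```r
library(dplyr)

# Toy data with invented IDs; basket "b4" contains no cheese
toy_transactions <- tibble(
  basket_id  = c("b1", "b1", "b1", "b2", "b2", "b3", "b3", "b4"),
  product_id = c("cheddar", "milk", "milk", "brie", "milk", "cheddar", "eggs", "milk")
)
cheese_ids <- c("cheddar", "brie")

# Baskets that contain at least one cheese item
cheese_baskets <- toy_transactions %>%
  filter(product_id %in% cheese_ids) %>%
  distinct(basket_id)

basket_share <- toy_transactions %>%
  semi_join(cheese_baskets, by = "basket_id") %>%    # keep cheese baskets only
  filter(!(product_id %in% cheese_ids)) %>%          # drop the cheese items
  distinct(basket_id, product_id) %>%                # one row per item per basket
  count(product_id, name = "Baskets") %>%
  mutate(ShareOfCheeseBaskets = Baskets / nrow(cheese_baskets)) %>%
  arrange(desc(Baskets))
```

A share is easier to compare across items than a raw count, though it still does not show whether an item is bought with cheese more often than it is bought in general.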
ggplot(top_items_with_cheese, aes(x = reorder(product_type, -TotalPurchases), y = TotalPurchases)) +
geom_bar(stat = "identity", fill = "steelblue") +
labs(title = "Top Items Purchased with Cheese Products",
x = "Product",
y = "Total Purchases") +
theme_minimal() +
coord_flip()
We believed that Regork was missing out on profit from its cheese products. By joining the demographics and transactions data sets, we were able to analyze which demographic groups are already buying cheese and which are not. We found that young and old people buy substantially less cheese than middle-aged people, and that middle-income households tend to purchase more cheese than lower- and upper-income households. We also analyzed what else people buy with cheese products and concluded that milk, eggs, and bananas were the top three products purchased along with cheese.
These insights are most useful for advertising and store layout. Advertising to groups that currently purchase little cheese could open up an untapped market. A revised store layout could maximize the number of products customers walk past on the way to the items they typically buy with cheese.
Our biggest limitation concerns the store layout suggestion. Before such a change could be implemented, it would have to be fit together with thousands of other data sets similar to ours. In addition, the demographic findings are limited by outliers in the graph, which make it difficult to pinpoint potential marketing targets.