Introduction

The Growth Possibility

In this report,the business problem we addressed was how we can leverage the cyclical nature of bread sales with items commonly purchased with them.

Process

We discussed demographics and spending habits of our customers and listed household to identify the leading characteristics of bread buyers. This gave us insight into which demographics to target, as well as items commonly purchased with bread throughout the year.

Analysis Importance

This report will allow you to analyze the Regork bread purchasers to best prepare for the bread buying cycles, and best position the commonly purchased with bread. This can be used to drive spending and increased transactions while supporting the customers and there needs based on the buying habits we have identified.

Packages and Libraries Required

library(completejourney): It is a package that contains data from The Complete Journey, the consumer packaged goods dataset that we used for this project. This dataset includes information on transactions, products, promotions, and demographics, making it useful for this project.

library(tidyverse): tidyverse is a collection of R packages designed for data science. It includes packages like ggplot2, dplyr, tidyr, and more, that share a common philosophy and are intended to work together seamlessly.

library(ggplot2): It provides a flexible system for creating a wide variety of plots, such as bar charts, line graphs, histograms, scatterplots, etc.

library(dplyr): It is a data manipulation package, designed to simplify data wrangling tasks. It provides easy-to-use functions for filtering, selecting, grouping, summarizing, and mutating data frame.

library(lubridate): It is designed to make working with dates and times easier. It provides functions for parsing, manipulating, and converting dates and times in a more intuitive and readable way

# Import necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
# get the full promotions data set
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows
# get the full transactions data set
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>
# a convenience function to get both
c(promotions, transactions) %<-% get_data(which = 'both', verbose = FALSE)
dim(promotions)
## [1] 20940529        5
## [1] 20940529        5

dim(transactions)
## [1] 1469307      11

Data Analysis

# Bread purchased Over The Year
transactions_with_category <- transactions %>%
  left_join(products, by = 'product_id')
  
transactions_with_category <- transactions_with_category %>%
  mutate(date = as.Date("2018-01-01") + weeks(week - 1))
  #Adjust year if needed

bread_purchases <- transactions_with_category %>%
  filter(product_category == "BREAD") %>%
  group_by(date) %>%
  summarize(total_spending = sum(sales_value, na.rm = TRUE)) %>%
  arrange(date)

ggplot(bread_purchases,aes(x = date, y = total_spending)) +
  geom_line(color = "blue") +
  geom_point(color = "blue") +
  labs(title = "Bread Purchases Over The Year",
       subtitle = "Tracking spending patterns over time",
       x = "Date",
       y = "Total Spending ($)") + 
  scale_x_date(date_breaks = "1 month", date_labels = "%b") + 
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Significance: There is a clear consistent nature in bread transactions. This graph shows us that bread sales are usually very high throughout the summer, especially during the beginning and end of the year. This supports our initial theory as we see a higher demand because bread is a staple food in many households, making it a frequent purchase throughout the year. Its versatility for meals and snacks, along with its relatively low cost, ensures consistent demand.

Bread Bought by Age Group

joined_data <- transactions %>%
  left_join(demographics, by = "household_id") %>%
  left_join(products, by = "product_id")

joined_data <- joined_data %>%
  filter(!is.na(age))

colnames(joined_data)
##  [1] "household_id"          "store_id"              "basket_id"            
##  [4] "product_id"            "quantity"              "sales_value"          
##  [7] "retail_disc"           "coupon_disc"           "coupon_match_disc"    
## [10] "week"                  "transaction_timestamp" "age"                  
## [13] "income"                "home_ownership"        "marital_status"       
## [16] "household_size"        "household_comp"        "kids_count"           
## [19] "manufacturer_id"       "department"            "brand"                
## [22] "product_category"      "product_type"          "package_size"
joined_data %>% filter(product_category == "BREAD")
## # A tibble: 3,196 × 24
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 1765         346      31198970244 826144            1        1           0.19
##  2 1333         317      31198475743 7166523           1        2.99        0.5 
##  3 2312         442      31198591439 9265288           1        1.49        0   
##  4 1057         427      31198492256 841138            1        0.89        0   
##  5 1907         338      31198881518 10200444          1        1           0.99
##  6 324          343      31198999154 9245312           1        1           0.19
##  7 404          396      31198636871 9677672           1        2.99        0.3 
##  8 1475         361      31225556127 879031            1        2.29        0   
##  9 85           362      31225396600 1139830           2        0.78        0   
## 10 2322         369      31225806846 1009449           2        2           0.38
## # ℹ 3,186 more rows
## # ℹ 17 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>, manufacturer_id <chr>,
## #   department <chr>, brand <fct>, product_category <chr>, product_type <chr>,
## #   package_size <chr>
bread_by_age <- joined_data %>%
  filter(product_category == "BREAD")

age_bread_quantity <- bread_by_age %>%
  group_by(age) %>%
  summarise(total_quantity = sum(quantity))

ggplot(age_bread_quantity, aes(x = age, y = total_quantity, group + 1)) +
  geom_line(color = "blue") +
  geom_point(color = "blue", size = 2) +
  labs(title = "Bread Bought by Age Group",
       x = "Age Group",
       y = "Quantity of Bread") +
  theme_minimal()
## `geom_line()`: Each group consists of only one observation.
## ℹ Do you need to adjust the group aesthetic?

## Significance: We see a material difference in the age of most bread buyers. This graph suggests that a majority of bread purchases throughout the year are from buyers between the ages of 35-54 years old. Outside this range is a hard drop off to the age range on either side.

Bread Bought by Marital Status

joined_data2 <- transactions %>%
  left_join(demographics, by = "household_id") %>%
  left_join(products, by = "product_id") %>%
  filter(!is.na(marital_status))

bread_by_status <- joined_data2 %>%
  filter(product_category == "BREAD")

ggplot(bread_by_status, aes(x = marital_status, fill = package_size)) +
  geom_bar() +
  coord_flip() +
  labs(title = "Bread Bought by Marital Status",
       x = "Marital Status",
       y = "Quantity of Bread",
       fill = "Package Size") +
  theme_minimal()

## Significance: With this visual displaying Marital Status and Quantity of Bread, you can see the difference bought between the two groups. From the data, it can be concluded that Married group had bought more bread than the Unmarried group. The numbers being about 1700 Married and about 1100 Unmarried.

Top Ten Items Purchased with Ice

merged_data <- merge(transactions, products, by = "product_id")

bread_transactions <- merged_data %>%
  filter(product_category == "BREAD")

top_10_items <- merged_data %>%
  filter(basket_id %in% bread_transactions$basket_id) %>%
  filter(!product_category %in% c("COUPON", "MISC", "BREAD")) %>%
  count(product_category, sort = TRUE) %>%
  top_n(10)
## Selecting by n
##Selecting by n

ggplot(top_10_items, aes(x = reorder(product_category, n), y = n)) +
  geom_bar(stat = "identity", fill = "steel blue") +
  coord_flip() +
  labs(title = "Top 10 Items Purchased with BREAD (Excluding BREAD, Coupon/Misc)",
       x = "Product Category",
       y = "Count of Purchases") +
  theme_minimal()

## Significance: By displaying the top 10 items purchased with bread, we are able to determine which items we should focus on. It is clear to see that soft drinks, bag snacks, and bread are often purchased with bread. The other items make sense to be bought with bread as they either need bread to stay fresh or they are paired with items that can go along with meals made with bread.

Top Three Items Purchased

merged_dataTOP3 <- merge(transactions, products, by = "product_id")

selected_productsTOP3 <- merged_dataTOP3 %>%
  filter(product_category %in% c("SOFT DRINKS", "BAG SNACKS", "COFFEE"))

sales_summaryTOP3 <- selected_productsTOP3 %>%
  group_by(product_category) %>%
  summarise(total_sales = sum(sales_value))

ggplot(sales_summaryTOP3, aes(x = product_category, y = total_sales, fill = product_category)) +
  geom_bar(stat = "Identity") +
  labs(title = "Sales for SOFT DRINKS, BAG SNACKS, COFFEE",
       x = "Product Category",
       y = "Total Sales ($)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Significance: This plot shows the age demographic that purchases the most of the top 3 items bought with bread. As a result, we can refine on different age ranges to choose whether or not it is worth marketing to them in some way.

Top Three Items Bought with Ice

merged_dataTOP3wBREAD <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id")

selected_products <- merged_dataTOP3wBREAD %>%
  filter(product_category %in% c("SOFT DRINK", "BAG SNACKS", "COFFEE"))

sales_summary <- selected_products %>%
  group_by(product_category, age) %>%
  summarise(total_sales = sum(sales_value)) %>%
  ungroup()
## `summarise()` has grouped output by 'product_category'. You can override using
## the `.groups` argument.
## `summarise()` has grouped output by "product_category'. You can override using ## the `.groups` argument.

ggplot(sales_summary, aes(x = product_category, y = total_sales, fill = age)) + 
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Sales by Product Category and Age Group",
       x = "Product Category",
       y = "Total Sales ($)",
       fill = "Age Group") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Significance: This graph displays the total sales of the top three items that are purchased with bread. With this, we can easily see which bread-related items we should focus on as they already get a lot of attention from bread buyers.

Conclusion

Insights and Recommendations

Based on our findings, we have identified a few ways Regork can leverage the bread sales to increase total basket size.

One thing we noticed was the top 10 items purchased with bread were common for basic meals in a household. So, we should market coupons that offer a combo deal for these items Cross-selling: Buy a bag of bread and get 50% off of a bagged snack. This will total boost basket value. Targeted marketing: Advertising campaign for recipes. Focus on in-store, online, and paper ads on events like those listed above

We found that ice is primarily bought by younger age groups (35-54) and families. Segmented Marketing: Promote the multi-use of bread on social media that these age groups typically use (YouTube, Facebook, Instagram). Family-oriented promotions: Special discounts for amount of bread bought and family-size snacks and soft drinks

Our final recommendation is around accessibility and product placement.We advise you to put the bread near the top 3 items bought with bread close to checkout. Limitations:The broadness of product categories Ex. Soft drinks were the most common items purchased with bread, but which soft drinks? Coke or Pepsi products?

Out intial thesis also predicted similar results. However, we thought bag snacks purchases would be greater and the high purchase volume seen in June/July would be greater due to sandwhiches being brought to events, beaches, and pools.

To improve: Group store_id by location to determine which regions are purchasing the most bread. Decipher which promotions were put in place during the time of the sales figures we used. Perform a profit analysis to determine the worth of running the promotions we mentioned.