2024 BANA4080 Group24 Midterm Project

Chicken Noodule Soup Analysis

Introduction

Business Question

Is the growing preference for Chicken Noodle Soup solely influenced by seasonal weather patterns, or are there additional underlying factors driving this trend?

(i) Why the CEO Should Be Interested: This is an opportunity to increase revenue by targeting a known seasonal demand spike. The successful launch of a new product line could enhance brand positioning in a competitive food market, increase customer loyalty, and drive sales during the fall season. The CEO’s interest stems from the potential for higher profitability, an expanded customer base, and a stronger presence in the soup category.

(ii) Approach to the Problem: We employed a combination of:

Quantitative analysis: Examining financial data to calculate price sensitivity.
Market research: Collecting and analyzing customer preferences for chicken noodle soup broth.
Promotional strategies: Evaluating which types of promotions (e.g., discounts, bundles, or advertising) would resonate with the target market.

(iii) How our analysis will help the Regork:

Impact on Decision-Making: The analysis equips the Regork CEO with data-driven insights into the potential financial success of launching the chicken noodle soup broth. It highlights key opportunities for market entry, projected sales growth, and effective promotional strategies. By understanding which promotions work best for this market segment, the CEO can make informed decisions about marketing budgets and product positioning.
Proposed Solution: Based on the analysis, the optimal strategy would be to launch the product with targeted promotions during peak soup season (late September to early winter). This could include digital marketing campaigns and in-store promotions to generate interest and sales. A phased, region-based launch could minimize risk while testing different promotional strategies.

Packages Required

library(completejourney) #Provides datasets for consumer and retail analysis
library(tidyverse) #A collection of R packages for data science
library(lubridate) #Helps with date and time manipulation
library(ggplot2) # Used for creating complex and customizable plots
library(tidyr) # helps tidy up messy data
library(dplyr) # provides functions for data manipulation
library(knitr) # Enables dynamic report generation in R Markdown
library(rmarkdown) # Converts R Markdown documents to various output formats

Data Preparation

transactions <- get_transactions()
transactions

## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>

promotions <- get_promotions()
promotions

## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows

household_size <- demographics$household_size
household_size

##   [1] 2  1  1  1  4  5+ 4  1  5+ 2  5+ 4  2  1  5+ 1  1  2  2  3  2  1  2  2  2 
##  [26] 2  2  4  2  2  1  3  2  2  5+ 1  3  3  4  2  3  4  3  3  2  1  2  2  2  4 
##  [51] 2  1  2  2  2  1  1  1  3  4  2  2  1  5+ 2  1  2  1  5+ 1  1  2  2  3  4 
##  [76] 3  1  3  1  2  2  2  3  2  2  2  2  3  1  1  2  1  1  1  4  1  1  2  3  2 
## [101] 2  1  4  4  4  2  1  2  2  2  2  1  2  4  4  2  2  2  2  1  2  1  3  1  2 
## [126] 1  1  2  3  2  1  2  1  2  2  1  1  3  3  1  2  1  3  2  1  3  3  1  2  2 
## [151] 5+ 3  3  2  2  2  4  2  1  1  1  1  1  1  2  1  2  2  2  2  1  2  1  2  4 
## [176] 2  2  1  1  3  1  3  4  1  2  1  3  2  4  3  2  1  1  2  2  2  1  2  1  1 
## [201] 2  1  1  1  2  5+ 3  2  1  4  2  5+ 2  1  1  2  3  1  3  2  2  2  1  3  2 
## [226] 1  1  3  1  1  3  2  4  1  1  3  2  1  2  1  1  2  1  1  4  1  5+ 5+ 4  1 
## [251] 5+ 2  2  5+ 5+ 2  2  2  1  1  2  2  2  3  2  1  2  2  5+ 2  2  2  2  1  5+
## [276] 5+ 1  2  2  3  1  3  2  1  4  5+ 1  2  1  2  3  2  2  3  1  3  1  2  1  1 
## [301] 5+ 2  2  2  1  2  1  3  1  2  1  2  2  2  4  1  1  1  1  1  2  1  3  1  5+
## [326] 3  3  2  1  5+ 1  5+ 3  1  2  1  1  1  3  5+ 1  2  1  2  1  5+ 3  2  1  1 
## [351] 2  5+ 2  2  2  2  2  1  2  1  1  3  2  3  2  2  2  1  1  3  3  2  5+ 2  2 
## [376] 5+ 2  2  2  5+ 5+ 5+ 1  1  2  2  4  2  1  2  1  2  2  3  4  2  2  1  1  1 
## [401] 1  2  2  3  2  3  2  2  1  4  1  2  2  4  2  1  1  3  2  2  1  2  3  5+ 1 
## [426] 1  2  1  1  2  1  2  3  1  2  1  4  1  1  2  1  2  2  2  3  1  2  5+ 4  1 
## [451] 2  2  1  2  3  1  3  2  2  5+ 1  2  1  2  2  1  5+ 1  3  1  1  1  2  5+ 2 
## [476] 2  1  2  2  2  2  2  1  5+ 1  1  2  1  2  3  1  3  2  2  5+ 3  5+ 1  3  2 
## [501] 1  5+ 5+ 1  2  2  2  5+ 1  2  2  3  2  4  1  1  1  2  3  2  3  1  2  3  1 
## [526] 4  1  2  3  2  2  1  1  1  1  2  1  2  4  2  4  3  2  4  2  2  3  2  4  2 
## [551] 1  4  1  5+ 2  2  4  1  2  5+ 2  1  1  2  2  3  2  2  2  1  5+ 1  1  1  2 
## [576] 1  1  3  1  2  3  2  2  5+ 2  2  1  2  3  1  2  2  5+ 1  3  5+ 2  1  2  2 
## [601] 2  1  2  2  5+ 1  1  2  1  1  2  1  2  1  1  2  5+ 2  2  1  3  2  2  2  1 
## [626] 1  1  1  1  1  3  2  2  2  2  4  2  1  3  2  3  1  2  1  4  3  1  3  5+ 1 
## [651] 3  3  2  1  4  4  1  1  5+ 1  4  1  1  3  1  3  3  2  4  2  1  5+ 3  1  2 
## [676] 2  1  2  2  3  1  1  2  3  2  2  2  2  5+ 4  2  2  3  2  5+ 5+ 2  1  2  4 
## [701] 2  2  2  3  4  3  2  4  1  5+ 2  5+ 2  2  5+ 5+ 1  3  1  2  2  2  1  2  1 
## [726] 2  3  2  2  2  2  2  3  1  3  3  2  3  5+ 2  1  1  4  1  1  3  1  1  1  1 
## [751] 5+ 2  1  2  3  3  3  2  2  2  2  3  4  1  2  2  1  2  3  2  2  1  5+ 1  2 
## [776] 2  2  1  2  2  1  1  2  1  2  5+ 2  1  3  4  5+ 2  2  2  4  3  1  3  5+ 2 
## [801] 1 
## Levels: 1 < 2 < 3 < 4 < 5+

incomedesc <- demographics
incomedesc

## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>

demographics

## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>

products

## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>

campaign_descriptions

## # A tibble: 27 × 4
##    campaign_id campaign_type start_date end_date  
##    <chr>       <ord>         <date>     <date>    
##  1 1           Type B        2017-03-03 2017-04-09
##  2 2           Type B        2017-03-08 2017-04-09
##  3 3           Type C        2017-03-13 2017-05-08
##  4 4           Type B        2017-03-29 2017-04-30
##  5 5           Type B        2017-04-03 2017-05-07
##  6 6           Type C        2017-04-19 2017-05-21
##  7 7           Type B        2017-04-24 2017-05-28
##  8 8           Type A        2017-05-08 2017-06-25
##  9 9           Type B        2017-05-31 2017-07-02
## 10 10          Type B        2017-06-28 2017-07-30
## # ℹ 17 more rows

campaigns

## # A tibble: 6,589 × 2
##    campaign_id household_id
##    <chr>       <chr>       
##  1 1           105         
##  2 1           1238        
##  3 1           1258        
##  4 1           1483        
##  5 1           2200        
##  6 1           293         
##  7 1           529         
##  8 1           536         
##  9 1           568         
## 10 1           630         
## # ℹ 6,579 more rows

Exploratory Data Analysis

The bar chart shows the sales of the 10 most popular soups and chilies in the U.S. Chicken noodle soup stands out as the best-selling product by a significant margin, with approximately 303,116 units sold. The next best-selling product, butternut squash soup, recorded 27,467 units in sales, followed by tomato soup at 22,760. The remaining products, including chili varieties and clam chowder, each had sales ranging between 13,643 and 20,477 units, indicating a much smaller share of the market compared to chicken noodle soup.

product_summary <- transactions %>%
  group_by(product_id) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), 
            total_quantity = sum(quantity, na.rm = TRUE)) %>%
  arrange(desc(total_sales))
product_summary
head(product_summary)
transactions_subset <- transactions %>%
  select(household_id, store_id, product_id, quantity, sales_value, transaction_timestamp) %>%
  mutate(week = week(transaction_timestamp))
promotions_subset <- promotions %>%
  select(product_id, store_id, week)
transactions_with_promotions <- transactions_subset %>%
  left_join(promotions_subset, by = c("product_id", "store_id", "week"), relationship = "many-to-many")
product_summary <- transactions_with_promotions %>%
  group_by(product_id) %>%
  summarise(total_sales = sum(sales_value), total_quantity = sum(quantity)) %>%
  arrange(desc(total_sales))
head(product_summary)
top_products <- product_summary %>%
  head(10)
product_labels <- c("6534178" = "Chicken Noodle Soup",
                    "6533889" = "Butternut Squash Soup",
                    "1029743" = "Tomato Soup",
                    "6534166" = "Clam Chowder",
                    "6533765" = "Lentil Soup",
                    "1082185" = "Beef Chili",
                    "916122"  = "Turkey Chili",
                    "1106523" = "Vegetarian Chilli",
                    "995242"  = "White Chicken Chilli",
                    "5569230" = "Chili Con Carne")

top_products_named <- product_summary %>%
  head(10) %>%
  mutate(product_label = product_labels[as.character(product_id)])

ggplot(top_products_named, aes(x = reorder(product_label, total_sales), y = total_sales)) +
  geom_bar(stat = "identity", fill = "orange")+
  geom_text(aes(label = round(total_sales, 2)), vjust = -0.3, size = 3.5) +
  labs(title = "10 Most Popular Soups and Chilis Sales in the US",
       x = "Product",
       y = "Total Sales ($)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Based on the data from CompleteJourney, the majority of households (30%) fall into the income bracket of under $15K per year. Three additional income groups—15-24K, 25-34K, and 35-49K—each represent 20% of the population. Together, these four income groups make up 90% of the population, with 60% concentrated in households earning between 15K and 49K annually. Only 10% of households have an income of 50-74K

str(demographics)
incomedesc <- demographics %>%
  arrange(income)
summary(incomedesc)
income_numeric <- as.numeric(as.factor(demographics$income))
levels(demographics$income)
income <- factor(c("Under 15K", "15-24K", "Under 15K", "25-34K", "25-34K", 
                   "15-24K", "Under 15K", "35-49K", "35-49K", "50-74K"), 
                 ordered = TRUE)
income_freq <- table(income)
income_prop <- prop.table(income_freq)
pie(income_prop, 
    labels = paste0(names(income_prop), " (", round(100 * income_prop, 1), "%)"), 
    main = "Income Distribution")

The data shows the distribution of household sizes, with the majority of households consisting of two members (nearly 40%), followed by single-person households, which make up around 32%. Households of three members represent a smaller proportion, while households with four members and five or more members are the least common, each accounting for a small fraction of the total.

str(demographics)
household_size <- demographics$household_size
household_size <- factor(household_size, levels = c("1", "2", "3", "4", "5+"))
df <- data.frame(household_size)
ggplot(df, aes(x = household_size)) +
  geom_bar(aes(y = (..count..) / sum(..count..)), fill = "brown") +
  scale_y_continuous(labels = scales::percent_format()) +  
  labs(title = "Proportion of Household Sizes",
       x = "Household Size",
       y = "Proportion") +
  theme_minimal()

According to our analysis, while Campaign Type C was highly successful in attracting customers from October 2017 to January 2018, Campaign Type B proved to be the most effective overall. Although its duration was shorter, the frequency and reach of Campaign Type B were remarkable, leading to better results in a shorter time frame.

ggplot(campaign_descriptions, aes(x = start_date, xend = end_date, y = reorder(campaign_id, start_date))) +
  geom_segment(aes(yend = campaign_id, color = campaign_type), linewidth = 1.5) +  
  labs(title = "Campaign Timelines Evaluation", 
       x = "Durations", 
       y = "Campaign ID") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),  
    axis.title.x = element_text(size = 14, face = "bold"),             
    axis.title.y = element_text(size = 14, face = "bold"),             
    axis.text.x = element_text(size = 12),                             
    axis.text.y = element_text(size = 12),                             
    legend.title = element_text(size = 14),                            
    legend.text = element_text(size = 12)                              
  )

Summary

How Our Team Solved the Business Problem

We used the CompleteJourney package to analyze key factors such as consumer preferences, household size, income distribution, and promotion effectiveness. This data-driven approach helped us develop a targeted launch strategy for the new chicken noodle soup broth.

Interesting Insight

Households with children are more likely to purchase the soup when bundled with other convenience foods, suggesting that family meal bundles could drive higher sales.

Implications

Target middle-income households with promotions.
Launch promotions in mid-fall to align with colder weather.

Some Limitations

Limited data on specific consumer preferences.
Market unpredictability could affect results.