Business Question
Is the growing preference for Chicken Noodle Soup solely influenced by seasonal weather patterns, or are there additional underlying factors driving this trend?
(i) Why the CEO Should Be Interested: This is an opportunity to increase revenue by targeting a known seasonal demand spike. The successful launch of a new product line could enhance brand positioning in a competitive food market, increase customer loyalty, and drive sales during the fall season. The CEO’s interest stems from the potential for higher profitability, an expanded customer base, and a stronger presence in the soup category.
(ii) Approach to the Problem: We employed a combination of:
Quantitative analysis: Examining financial data to calculate price sensitivity.
Market research: Collecting and analyzing customer preferences for chicken noodle soup broth.
Promotional strategies: Evaluating which types of promotions (e.g., discounts, bundles, or advertising) would resonate with the target market.
(iii) How our analysis will help the Regork:
Impact on Decision-Making: The analysis equips the Regork CEO with data-driven insights into the potential financial success of launching the chicken noodle soup broth. It highlights key opportunities for market entry, projected sales growth, and effective promotional strategies. By understanding which promotions work best for this market segment, the CEO can make informed decisions about marketing budgets and product positioning.
Proposed Solution: Based on the analysis, the optimal strategy would be to launch the product with targeted promotions during peak soup season (late September to early winter). This could include digital marketing campaigns and in-store promotions to generate interest and sales. A phased, region-based launch could minimize risk while testing different promotional strategies.
library(completejourney) #Provides datasets for consumer and retail analysis
library(tidyverse) #A collection of R packages for data science
library(lubridate) #Helps with date and time manipulation
library(ggplot2) # Used for creating complex and customizable plots
library(tidyr) # helps tidy up messy data
library(dplyr) # provides functions for data manipulation
library(knitr) # Enables dynamic report generation in R Markdown
library(rmarkdown) # Converts R Markdown documents to various output formats
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # ℹ 20,940,519 more rows
household_size <- demographics$household_size
household_size
## [1] 2 1 1 1 4 5+ 4 1 5+ 2 5+ 4 2 1 5+ 1 1 2 2 3 2 1 2 2 2
## [26] 2 2 4 2 2 1 3 2 2 5+ 1 3 3 4 2 3 4 3 3 2 1 2 2 2 4
## [51] 2 1 2 2 2 1 1 1 3 4 2 2 1 5+ 2 1 2 1 5+ 1 1 2 2 3 4
## [76] 3 1 3 1 2 2 2 3 2 2 2 2 3 1 1 2 1 1 1 4 1 1 2 3 2
## [101] 2 1 4 4 4 2 1 2 2 2 2 1 2 4 4 2 2 2 2 1 2 1 3 1 2
## [126] 1 1 2 3 2 1 2 1 2 2 1 1 3 3 1 2 1 3 2 1 3 3 1 2 2
## [151] 5+ 3 3 2 2 2 4 2 1 1 1 1 1 1 2 1 2 2 2 2 1 2 1 2 4
## [176] 2 2 1 1 3 1 3 4 1 2 1 3 2 4 3 2 1 1 2 2 2 1 2 1 1
## [201] 2 1 1 1 2 5+ 3 2 1 4 2 5+ 2 1 1 2 3 1 3 2 2 2 1 3 2
## [226] 1 1 3 1 1 3 2 4 1 1 3 2 1 2 1 1 2 1 1 4 1 5+ 5+ 4 1
## [251] 5+ 2 2 5+ 5+ 2 2 2 1 1 2 2 2 3 2 1 2 2 5+ 2 2 2 2 1 5+
## [276] 5+ 1 2 2 3 1 3 2 1 4 5+ 1 2 1 2 3 2 2 3 1 3 1 2 1 1
## [301] 5+ 2 2 2 1 2 1 3 1 2 1 2 2 2 4 1 1 1 1 1 2 1 3 1 5+
## [326] 3 3 2 1 5+ 1 5+ 3 1 2 1 1 1 3 5+ 1 2 1 2 1 5+ 3 2 1 1
## [351] 2 5+ 2 2 2 2 2 1 2 1 1 3 2 3 2 2 2 1 1 3 3 2 5+ 2 2
## [376] 5+ 2 2 2 5+ 5+ 5+ 1 1 2 2 4 2 1 2 1 2 2 3 4 2 2 1 1 1
## [401] 1 2 2 3 2 3 2 2 1 4 1 2 2 4 2 1 1 3 2 2 1 2 3 5+ 1
## [426] 1 2 1 1 2 1 2 3 1 2 1 4 1 1 2 1 2 2 2 3 1 2 5+ 4 1
## [451] 2 2 1 2 3 1 3 2 2 5+ 1 2 1 2 2 1 5+ 1 3 1 1 1 2 5+ 2
## [476] 2 1 2 2 2 2 2 1 5+ 1 1 2 1 2 3 1 3 2 2 5+ 3 5+ 1 3 2
## [501] 1 5+ 5+ 1 2 2 2 5+ 1 2 2 3 2 4 1 1 1 2 3 2 3 1 2 3 1
## [526] 4 1 2 3 2 2 1 1 1 1 2 1 2 4 2 4 3 2 4 2 2 3 2 4 2
## [551] 1 4 1 5+ 2 2 4 1 2 5+ 2 1 1 2 2 3 2 2 2 1 5+ 1 1 1 2
## [576] 1 1 3 1 2 3 2 2 5+ 2 2 1 2 3 1 2 2 5+ 1 3 5+ 2 1 2 2
## [601] 2 1 2 2 5+ 1 1 2 1 1 2 1 2 1 1 2 5+ 2 2 1 3 2 2 2 1
## [626] 1 1 1 1 1 3 2 2 2 2 4 2 1 3 2 3 1 2 1 4 3 1 3 5+ 1
## [651] 3 3 2 1 4 4 1 1 5+ 1 4 1 1 3 1 3 3 2 4 2 1 5+ 3 1 2
## [676] 2 1 2 2 3 1 1 2 3 2 2 2 2 5+ 4 2 2 3 2 5+ 5+ 2 1 2 4
## [701] 2 2 2 3 4 3 2 4 1 5+ 2 5+ 2 2 5+ 5+ 1 3 1 2 2 2 1 2 1
## [726] 2 3 2 2 2 2 2 3 1 3 3 2 3 5+ 2 1 1 4 1 1 3 1 1 1 1
## [751] 5+ 2 1 2 3 3 3 2 2 2 2 3 4 1 2 2 1 2 3 2 2 1 5+ 1 2
## [776] 2 2 1 2 2 1 1 2 1 2 5+ 2 1 3 4 5+ 2 2 2 4 3 1 3 5+ 2
## [801] 1
## Levels: 1 < 2 < 3 < 4 < 5+
incomedesc <- demographics
incomedesc
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
campaign_descriptions
## # A tibble: 27 × 4
## campaign_id campaign_type start_date end_date
## <chr> <ord> <date> <date>
## 1 1 Type B 2017-03-03 2017-04-09
## 2 2 Type B 2017-03-08 2017-04-09
## 3 3 Type C 2017-03-13 2017-05-08
## 4 4 Type B 2017-03-29 2017-04-30
## 5 5 Type B 2017-04-03 2017-05-07
## 6 6 Type C 2017-04-19 2017-05-21
## 7 7 Type B 2017-04-24 2017-05-28
## 8 8 Type A 2017-05-08 2017-06-25
## 9 9 Type B 2017-05-31 2017-07-02
## 10 10 Type B 2017-06-28 2017-07-30
## # ℹ 17 more rows
campaigns
## # A tibble: 6,589 × 2
## campaign_id household_id
## <chr> <chr>
## 1 1 105
## 2 1 1238
## 3 1 1258
## 4 1 1483
## 5 1 2200
## 6 1 293
## 7 1 529
## 8 1 536
## 9 1 568
## 10 1 630
## # ℹ 6,579 more rows
The bar chart shows the sales of the 10 most popular soups and chilies in the U.S. Chicken noodle soup stands out as the best-selling product by a significant margin, with approximately 303,116 units sold. The next best-selling product, butternut squash soup, recorded 27,467 units in sales, followed by tomato soup at 22,760. The remaining products, including chili varieties and clam chowder, each had sales ranging between 13,643 and 20,477 units, indicating a much smaller share of the market compared to chicken noodle soup.
product_summary <- transactions %>%
group_by(product_id) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE),
total_quantity = sum(quantity, na.rm = TRUE)) %>%
arrange(desc(total_sales))
product_summary
head(product_summary)
transactions_subset <- transactions %>%
select(household_id, store_id, product_id, quantity, sales_value, transaction_timestamp) %>%
mutate(week = week(transaction_timestamp))
promotions_subset <- promotions %>%
select(product_id, store_id, week)
transactions_with_promotions <- transactions_subset %>%
left_join(promotions_subset, by = c("product_id", "store_id", "week"), relationship = "many-to-many")
product_summary <- transactions_with_promotions %>%
group_by(product_id) %>%
summarise(total_sales = sum(sales_value), total_quantity = sum(quantity)) %>%
arrange(desc(total_sales))
head(product_summary)
top_products <- product_summary %>%
head(10)
product_labels <- c("6534178" = "Chicken Noodle Soup",
"6533889" = "Butternut Squash Soup",
"1029743" = "Tomato Soup",
"6534166" = "Clam Chowder",
"6533765" = "Lentil Soup",
"1082185" = "Beef Chili",
"916122" = "Turkey Chili",
"1106523" = "Vegetarian Chilli",
"995242" = "White Chicken Chilli",
"5569230" = "Chili Con Carne")
top_products_named <- product_summary %>%
head(10) %>%
mutate(product_label = product_labels[as.character(product_id)])
ggplot(top_products_named, aes(x = reorder(product_label, total_sales), y = total_sales)) +
geom_bar(stat = "identity", fill = "orange")+
geom_text(aes(label = round(total_sales, 2)), vjust = -0.3, size = 3.5) +
labs(title = "10 Most Popular Soups and Chilis Sales in the US",
x = "Product",
y = "Total Sales ($)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the data from CompleteJourney, the majority of households (30%) fall into the income bracket of under $15K per year. Three additional income groups—15-24K, 25-34K, and 35-49K—each represent 20% of the population. Together, these four income groups make up 90% of the population, with 60% concentrated in households earning between 15K and 49K annually. Only 10% of households have an income of 50-74K
str(demographics)
incomedesc <- demographics %>%
arrange(income)
summary(incomedesc)
income_numeric <- as.numeric(as.factor(demographics$income))
levels(demographics$income)
income <- factor(c("Under 15K", "15-24K", "Under 15K", "25-34K", "25-34K",
"15-24K", "Under 15K", "35-49K", "35-49K", "50-74K"),
ordered = TRUE)
income_freq <- table(income)
income_prop <- prop.table(income_freq)
pie(income_prop,
labels = paste0(names(income_prop), " (", round(100 * income_prop, 1), "%)"),
main = "Income Distribution")
The data shows the distribution of household sizes, with the majority of households consisting of two members (nearly 40%), followed by single-person households, which make up around 32%. Households of three members represent a smaller proportion, while households with four members and five or more members are the least common, each accounting for a small fraction of the total.
str(demographics)
household_size <- demographics$household_size
household_size <- factor(household_size, levels = c("1", "2", "3", "4", "5+"))
df <- data.frame(household_size)
ggplot(df, aes(x = household_size)) +
geom_bar(aes(y = (..count..) / sum(..count..)), fill = "brown") +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "Proportion of Household Sizes",
x = "Household Size",
y = "Proportion") +
theme_minimal()
According to our analysis, while Campaign Type C was highly successful in attracting customers from October 2017 to January 2018, Campaign Type B proved to be the most effective overall. Although its duration was shorter, the frequency and reach of Campaign Type B were remarkable, leading to better results in a shorter time frame.
ggplot(campaign_descriptions, aes(x = start_date, xend = end_date, y = reorder(campaign_id, start_date))) +
geom_segment(aes(yend = campaign_id, color = campaign_type), linewidth = 1.5) +
labs(title = "Campaign Timelines Evaluation",
x = "Durations",
y = "Campaign ID") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
legend.title = element_text(size = 14),
legend.text = element_text(size = 12)
)
How Our Team Solved the Business Problem
Interesting Insight
Implications
Target middle-income households with promotions.
Launch promotions in mid-fall to align with colder weather.
Some Limitations
Limited data on specific consumer preferences.
Market unpredictability could affect results.