Do people with lower incomes really eat more deli meat than those with higher incomes? And if so, which deli meat type do they used the most across different states? This research will uncover the truth behind which type of meat usage and spending habits.
(i) Why should the Regork CEO be interested in this? By leveraging these insights, Regork can maximize profits while optimizing promotional spending, ensuring both increased revenue and cost savings. This research provides valuable insights into customer spending behavior, helping CEOs determine which time of the year should they operate marketing campaigns to attract new customers. Additionally, it allows companies to reassess long-standing coupon strategies that may have already shaped customer shopping habits.
(ii) How we addressed this problem statement? - Descriptive Analysis: Utilizing data spanning in last year to analyze trends of shopping meat
Market Research: Analyzing the relationship between customer income and shopping behavior
Operation Strategy: Evaluating the most profitable states and target customer segments (e.g., income level, marital status) to optimize marketing strategies and implement targeted campaigns.
(iii) How our analysis will help the Regork CEO? Optimizing Strategic Decisions for Regork: This analysis helps the Regork valuable data-driven insights into the financial viability of the campaign. It identifies which products should receive coupons, the ideal coupon values, and the most effective distribution methods. Furthermore, it highlights key opportunities for market expansion, anticipated sales growth, and impactful promotional strategies. By understanding which promotions resonate best with the target audience, the CEO can make well-informed decisions regarding marketing budgets and product positioning to drive profitability.
Proposed solution: Regork should strategically focus on offering turkeys in a range of sizes throughout the festival season, from October to January. This period represents a key opportunity to align with consumer demand during holidays and celebrations. By providing a variety of turkey sizes, Regork can cater to different customer needs, from small families to larger gatherings, thereby maximizing sales potential. Leveraging this seasonal demand will not only enhance customer satisfaction but also help establish Regork as a go-to brand for holiday meals, driving both short-term revenue and long-term brand loyalty.
library(ggplot2) #Used for creating complex and customizable plot
library(completejourney) #Provides datasets for consumer analysis
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(lubridate)#Simplifies the manipulation and parsing of date-time
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(RColorBrewer)#Offers a collection of color palettes for better visual representation in plots
library(viridisLite)#Provide color maps that are ideal for data visualization
library(dplyr)#Provide functions for data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(scales)#Formatting and scaling of visual elements like axis
library(knitr)#Enables dynamic report in R Markdown documents
library(rmarkdown)#Converts R Markdown documents to various output formats
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
promotions <- get_promotions()
coupons <- coupon_redemptions
coupons
## # A tibble: 2,102 × 4
## household_id coupon_upc campaign_id redemption_date
## <chr> <chr> <chr> <date>
## 1 1029 51380041013 26 2017-01-01
## 2 1029 51380041313 26 2017-01-01
## 3 165 53377610033 26 2017-01-03
## 4 712 51380041013 26 2017-01-07
## 5 712 54300016033 26 2017-01-07
## 6 2488 51200092776 26 2017-01-10
## 7 2488 51410010050 26 2017-01-10
## 8 1923 53000012033 26 2017-01-14
## 9 1923 54300021057 26 2017-01-14
## 10 1923 57047091041 26 2017-01-14
## # ℹ 2,092 more rows
household_size <- demographics$household_size
household_size
## [1] 2 1 1 1 4 5+ 4 1 5+ 2 5+ 4 2 1 5+ 1 1 2 2 3 2 1 2 2 2
## [26] 2 2 4 2 2 1 3 2 2 5+ 1 3 3 4 2 3 4 3 3 2 1 2 2 2 4
## [51] 2 1 2 2 2 1 1 1 3 4 2 2 1 5+ 2 1 2 1 5+ 1 1 2 2 3 4
## [76] 3 1 3 1 2 2 2 3 2 2 2 2 3 1 1 2 1 1 1 4 1 1 2 3 2
## [101] 2 1 4 4 4 2 1 2 2 2 2 1 2 4 4 2 2 2 2 1 2 1 3 1 2
## [126] 1 1 2 3 2 1 2 1 2 2 1 1 3 3 1 2 1 3 2 1 3 3 1 2 2
## [151] 5+ 3 3 2 2 2 4 2 1 1 1 1 1 1 2 1 2 2 2 2 1 2 1 2 4
## [176] 2 2 1 1 3 1 3 4 1 2 1 3 2 4 3 2 1 1 2 2 2 1 2 1 1
## [201] 2 1 1 1 2 5+ 3 2 1 4 2 5+ 2 1 1 2 3 1 3 2 2 2 1 3 2
## [226] 1 1 3 1 1 3 2 4 1 1 3 2 1 2 1 1 2 1 1 4 1 5+ 5+ 4 1
## [251] 5+ 2 2 5+ 5+ 2 2 2 1 1 2 2 2 3 2 1 2 2 5+ 2 2 2 2 1 5+
## [276] 5+ 1 2 2 3 1 3 2 1 4 5+ 1 2 1 2 3 2 2 3 1 3 1 2 1 1
## [301] 5+ 2 2 2 1 2 1 3 1 2 1 2 2 2 4 1 1 1 1 1 2 1 3 1 5+
## [326] 3 3 2 1 5+ 1 5+ 3 1 2 1 1 1 3 5+ 1 2 1 2 1 5+ 3 2 1 1
## [351] 2 5+ 2 2 2 2 2 1 2 1 1 3 2 3 2 2 2 1 1 3 3 2 5+ 2 2
## [376] 5+ 2 2 2 5+ 5+ 5+ 1 1 2 2 4 2 1 2 1 2 2 3 4 2 2 1 1 1
## [401] 1 2 2 3 2 3 2 2 1 4 1 2 2 4 2 1 1 3 2 2 1 2 3 5+ 1
## [426] 1 2 1 1 2 1 2 3 1 2 1 4 1 1 2 1 2 2 2 3 1 2 5+ 4 1
## [451] 2 2 1 2 3 1 3 2 2 5+ 1 2 1 2 2 1 5+ 1 3 1 1 1 2 5+ 2
## [476] 2 1 2 2 2 2 2 1 5+ 1 1 2 1 2 3 1 3 2 2 5+ 3 5+ 1 3 2
## [501] 1 5+ 5+ 1 2 2 2 5+ 1 2 2 3 2 4 1 1 1 2 3 2 3 1 2 3 1
## [526] 4 1 2 3 2 2 1 1 1 1 2 1 2 4 2 4 3 2 4 2 2 3 2 4 2
## [551] 1 4 1 5+ 2 2 4 1 2 5+ 2 1 1 2 2 3 2 2 2 1 5+ 1 1 1 2
## [576] 1 1 3 1 2 3 2 2 5+ 2 2 1 2 3 1 2 2 5+ 1 3 5+ 2 1 2 2
## [601] 2 1 2 2 5+ 1 1 2 1 1 2 1 2 1 1 2 5+ 2 2 1 3 2 2 2 1
## [626] 1 1 1 1 1 3 2 2 2 2 4 2 1 3 2 3 1 2 1 4 3 1 3 5+ 1
## [651] 3 3 2 1 4 4 1 1 5+ 1 4 1 1 3 1 3 3 2 4 2 1 5+ 3 1 2
## [676] 2 1 2 2 3 1 1 2 3 2 2 2 2 5+ 4 2 2 3 2 5+ 5+ 2 1 2 4
## [701] 2 2 2 3 4 3 2 4 1 5+ 2 5+ 2 2 5+ 5+ 1 3 1 2 2 2 1 2 1
## [726] 2 3 2 2 2 2 2 3 1 3 3 2 3 5+ 2 1 1 4 1 1 3 1 1 1 1
## [751] 5+ 2 1 2 3 3 3 2 2 2 2 3 4 1 2 2 1 2 3 2 2 1 5+ 1 2
## [776] 2 2 1 2 2 1 1 2 1 2 5+ 2 1 3 4 5+ 2 2 2 4 3 1 3 5+ 2
## [801] 1
## Levels: 1 < 2 < 3 < 4 < 5+
incomedesc <- demographics
incomedesc
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
campaign_descriptions
## # A tibble: 27 × 4
## campaign_id campaign_type start_date end_date
## <chr> <ord> <date> <date>
## 1 1 Type B 2017-03-03 2017-04-09
## 2 2 Type B 2017-03-08 2017-04-09
## 3 3 Type C 2017-03-13 2017-05-08
## 4 4 Type B 2017-03-29 2017-04-30
## 5 5 Type B 2017-04-03 2017-05-07
## 6 6 Type C 2017-04-19 2017-05-21
## 7 7 Type B 2017-04-24 2017-05-28
## 8 8 Type A 2017-05-08 2017-06-25
## 9 9 Type B 2017-05-31 2017-07-02
## 10 10 Type B 2017-06-28 2017-07-30
## # ℹ 17 more rows
campaigns
## # A tibble: 6,589 × 2
## campaign_id household_id
## <chr> <chr>
## 1 1 105
## 2 1 1238
## 3 1 1258
## 4 1 1483
## 5 1 2200
## 6 1 293
## 7 1 529
## 8 1 536
## 9 1 568
## 10 1 630
## # ℹ 6,579 more rows
str(demographics)
## spc_tbl_ [801 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ household_id : chr [1:801] "1" "1001" "1003" "1004" ...
## $ age : Ord.factor w/ 6 levels "19-24"<"25-34"<..: 6 4 3 2 4 3 4 4 4 4 ...
## $ income : Ord.factor w/ 12 levels "Under 15K"<"15-24K"<..: 4 5 3 2 1 4 2 5 4 3 ...
## $ home_ownership: Ord.factor w/ 5 levels "Renter"<"Probable Renter"<..: 3 3 NA NA 3 NA NA 3 3 3 ...
## $ marital_status: Ord.factor w/ 3 levels "Married"<"Unmarried"<..: 1 2 2 2 1 1 1 2 1 1 ...
## $ household_size: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 1 1 4 5 4 1 5 2 ...
## $ household_comp: Ord.factor w/ 5 levels "1 Adult Kids"<..: 4 2 2 2 3 3 3 2 3 4 ...
## $ kids_count : Ord.factor w/ 5 levels "0"<"1"<"2"<"3+"<..: 1 1 1 1 3 4 3 1 4 1 ...
incomedesc <- demographics%>%
arrange(income)
summary(incomedesc)
## household_id age income home_ownership
## Length:801 19-24: 46 50-74K :192 Renter : 42
## Class :character 25-34:142 35-49K :172 Probable Renter : 11
## Mode :character 35-44:194 75-99K : 96 Homeowner :504
## 45-54:288 25-34K : 77 Probable Homeowner: 11
## 55-64: 59 15-24K : 74 Unknown : 0
## 65+ : 72 Under 15K: 61 NA's :233
## (Other) :129
## marital_status household_size household_comp kids_count
## Married :340 1 :255 1 Adult Kids : 93 0 :513
## Unmarried:324 2 :318 1 Adult No Kids :255 1 :159
## Unknown : 0 3 :109 2 Adults Kids :195 2 : 60
## NA's :137 4 : 53 2 Adults No Kids:258 3+ : 69
## 5+: 66 Unknown : 0 Unknown: 0
##
##
income_numeric <- as.numeric(as.factor(demographics$income))
levels(demographics$income)
## [1] "Under 15K" "15-24K" "25-34K" "35-49K" "50-74K" "75-99K"
## [7] "100-124K" "125-149K" "150-174K" "175-199K" "200-249K" "250K+"
income <- factor(c("Under 15K", "15K-24K", "Under 15K", "25K-34K","Under 15K", "35K-49K", "50K-74K"),
ordered = TRUE)
income_freq <- table(income)
income_prop <- prop.table(income_freq)
pie(income_prop,
lables = paste0(names(income_prop), " (", round(100 * income_prop, 1), "%)"),
main = "Income Distribution")
## Warning in text.default(1.1 * P$x, 1.1 * P$y, labels[i], xpd = TRUE, adj =
## ifelse(P$x < : "lables" is not a graphical parameter
## Warning in text.default(1.1 * P$x, 1.1 * P$y, labels[i], xpd = TRUE, adj =
## ifelse(P$x < : "lables" is not a graphical parameter
## Warning in text.default(1.1 * P$x, 1.1 * P$y, labels[i], xpd = TRUE, adj =
## ifelse(P$x < : "lables" is not a graphical parameter
## Warning in text.default(1.1 * P$x, 1.1 * P$y, labels[i], xpd = TRUE, adj =
## ifelse(P$x < : "lables" is not a graphical parameter
## Warning in text.default(1.1 * P$x, 1.1 * P$y, labels[i], xpd = TRUE, adj =
## ifelse(P$x < : "lables" is not a graphical parameter
## Warning in title(main = main, ...): "lables" is not a graphical parameter
product_summary <- transactions %>%
group_by(product_id) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE),
total_quantity = sum(quantity, na.rm = TRUE)) %>%
arrange(desc(total_sales))
product_summary
## # A tibble: 68,509 × 3
## product_id total_sales total_quantity
## <chr> <dbl> <dbl>
## 1 6534178 303116. 126868510
## 2 6533889 27468. 10711596
## 3 1029743 22730. 9264
## 4 6534166 20478. 8330403
## 5 6533765 19452. 1150
## 6 1082185 17220. 17639
## 7 916122 16120. 3935
## 8 1106523 15630. 6367
## 9 995242 15603. 13043
## 10 5569230 13410. 4413
## # ℹ 68,499 more rows
head(product_summary)
## # A tibble: 6 × 3
## product_id total_sales total_quantity
## <chr> <dbl> <dbl>
## 1 6534178 303116. 126868510
## 2 6533889 27468. 10711596
## 3 1029743 22730. 9264
## 4 6534166 20478. 8330403
## 5 6533765 19452. 1150
## 6 1082185 17220. 17639
transactions_subset <- transactions %>%
select(household_id, store_id, product_id, quantity, sales_value, transaction_timestamp) %>%
mutate(week = week(transaction_timestamp))
promotions_subset <- promotions %>%
select(product_id, store_id, week)
transactions_with_promotions <- transactions_subset %>%
left_join(promotions_subset, by = c("product_id", "store_id", "week"), relationship = "many-to-many")
product_summary <- transactions_with_promotions %>%
group_by(product_id) %>%
summarise(total_sales = sum(sales_value), total_quantity = sum(quantity)) %>%
arrange(desc(total_sales))
head(product_summary)
## # A tibble: 6 × 3
## product_id total_sales total_quantity
## <chr> <dbl> <dbl>
## 1 6534178 303116. 126868510
## 2 6533889 27468. 10711596
## 3 1029743 22760. 9277
## 4 6534166 20478. 8330403
## 5 6533765 19452. 1150
## 6 1082185 17220. 17639
top_products <- product_summary %>%
head(10)
product_labels <- c("6534178" = "Turkey",
"6533889" = "Cured Ham",
"1029743" = "Spam",
"6534166" = "Bologna",
"6533765" = "Pepperoni",
"1082185" = "Sausage",
"916122" = "Meatloaf",
"1106523" = "Vegetarian Spam",
"995242" = "Corned Beef",
"5569230" = "Chicken Breast")
top_products_named <- product_summary %>%
head(10) %>%
mutate(product_label = product_labels[as.character(product_id)])
ggplot(top_products_named, aes(x = reorder(product_label, total_sales), y = total_sales)) +
geom_bar(stat = "identity", fill = "red")+
geom_text(aes(label = round(total_sales, 2)), vjust = -0.3, size = 3.5) +
labs(title = "10 Most Popular Frozen Meat/Meat Dinner Sales in the US",
x = "Product",
y = "Total Sales ($)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
str(demographics)
## spc_tbl_ [801 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ household_id : chr [1:801] "1" "1001" "1003" "1004" ...
## $ age : Ord.factor w/ 6 levels "19-24"<"25-34"<..: 6 4 3 2 4 3 4 4 4 4 ...
## $ income : Ord.factor w/ 12 levels "Under 15K"<"15-24K"<..: 4 5 3 2 1 4 2 5 4 3 ...
## $ home_ownership: Ord.factor w/ 5 levels "Renter"<"Probable Renter"<..: 3 3 NA NA 3 NA NA 3 3 3 ...
## $ marital_status: Ord.factor w/ 3 levels "Married"<"Unmarried"<..: 1 2 2 2 1 1 1 2 1 1 ...
## $ household_size: Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 2 1 1 1 4 5 4 1 5 2 ...
## $ household_comp: Ord.factor w/ 5 levels "1 Adult Kids"<..: 4 2 2 2 3 3 3 2 3 4 ...
## $ kids_count : Ord.factor w/ 5 levels "0"<"1"<"2"<"3+"<..: 1 1 1 1 3 4 3 1 4 1 ...
household_size <- demographics$household_size
household_size <- factor(household_size, levels = c("1", "2", "3", "4", "5+"))
df <- data.frame(household_size)
ggplot(df, aes(x = household_size)) +
geom_bar(aes(y = (..count..) / sum(..count..)), fill = "darkgreen") +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "Proportion of Household Sizes",
x = "Household Size",
y = "Proportion") +
theme_minimal()
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
ggplot(campaign_descriptions, aes(x = start_date, xend = end_date, y = reorder(campaign_id, start_date))) +
geom_segment(aes(yend = campaign_id, color = campaign_type), linewidth = 1.5) +
labs(title = "Campaign Timelines Evaluation",
x = "Durations",
y = "Campaign ID") +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.title.x = element_text(size = 14, face = "bold"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
legend.title = element_text(size = 14),
legend.text = element_text(size = 12)
)
How Our Team Addressed the Business Challenge
Our team utilized the CompleteJourney package to conduct a thorough analysis of key factors such as consumer preferences, household size, income distribution, and the effectiveness of various promotional strategies. By leveraging this data-driven approach, we were able to devise a comprehensive and targeted strategy for launching our new deli meat product, with a specific focus on Turkey. This approach ensured that our efforts were aligned with consumer trends and purchasing behaviors, maximizing the potential for a successful product introduction.
Key Insights - An interesting finding emerged during our analysis: households with children are more likely to purchase deli meats when bundled with other convenience foods. This insight suggests that offering family meal bundles could significantly increase sales, particularly during the festival season when demand for easy and quick meal solutions tends to rise. By leveraging this trend, we can enhance our product offerings and cater to the specific needs of our target market.
Strategic Implications Based on our analysis, we recommend the following strategies to optimize the launch of our deli meat product:
Target Middle-Income Households: Focus promotions on middle-income households, as they exhibit a strong propensity to purchase deli meats when bundled with other value-driven offers.
Timing of Promotions: Initiate promotional campaigns in mid-fall to coincide with the peak of consumer buying behavior during the festival season. This will increase visibility and drive demand during a critical sales period.
Considerations and Limitations While our analysis provides valuable insights, there are certain limitations that need to be taken into account:
Limited Consumer Preference Data: The data available on specific consumer preferences was somewhat limited, which may impact the precision of our targeting strategies.
Market Uncertainty: The market remains unpredictable, and external factors such as potential economic shifts, changes in consumer behavior, or unforeseen events (e.g., health crises) could affect the outcomes of our promotional efforts. Additionally, fluctuations in turkey prices or supply chain issues may also pose risks to the anticipated success of the launch.
Despite these challenges, our data-driven approach provides a solid foundation for the successful introduction of the new deli meat product, with a clear strategy to drive sales and engage key consumer segments effectively.