Business Question:
How does household income influence purchasing behavior, coupon
usage, and brand preference at Regork, and what strategies can be
implemented to maximize revenue across different income brackets?
Packages Required
completejourney: Provides consumer transaction
data, including demographics, promotions, and purchase history, enabling
in-depth retail analytics.
ggplot2: Used for creating visualizations such as
bar charts and faceted plots, aiding in the graphical representation of
purchasing trends.
dplyr: Facilitates efficient data manipulation,
including filtering, grouping, summarizing, and merging datasets.
lubridate: Helps with handling and manipulating
date-time data, particularly for analyzing transaction timestamps and
campaign periods.
scales: Enhances ggplot2 plots by formatting
numerical values, such as currency labels, to improve readability and
interpretation.
Objective
Regork aims to identify growth opportunities by analyzing customer
spending habits, discount utilization, and brand preferences based on
income levels. Understanding these trends will enable data-driven
marketing strategies and investment decisions.
Data Source
I used the Complete Journey data, integrating the following
tables:
Transactions: Individual purchase records,
including quantity, sales value, and discounts.
Demographics: Household attributes such as income
level, household size, and marital status.
Products: Product category, brand, and department
information.
Campaign Descriptions: The type and the start and
end dates of each marketing campaign.
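The tables above are combined with inner joins on shared keys. household_id links transactions to demographics, and product_id links them to products. An inner join keeps only rows whose key appears in both tables. Transactions from households without a demographic record therefore drop out of every income-based summary (the demographics table has 801 households). The minimal base R sketch below shows this on hypothetical toy tables. The demo and trans objects are illustrative stand-ins, not the package's data.

```r
# Hypothetical toy tables standing in for demographics and transactions
demo <- data.frame(
  household_id = c("1", "2"),
  income       = c("Under 15K", "50-74K")
)
trans <- data.frame(
  household_id = c("1", "1", "2", "3"),
  quantity     = c(2, 1, 5, 4)
)

# merge() with its default all = FALSE is an inner join, comparable to
# dplyr::inner_join(demo, trans, by = "household_id")
joined <- merge(demo, trans, by = "household_id")

# Household "3" has no demographic record, so its row is dropped
dropped <- nrow(trans) - nrow(joined)
print(joined)
cat("Transaction rows dropped:", dropped, "\n")
```

Comparing nrow() before and after each join is a quick way to see how much data an income-based analysis leaves out.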
Data Analysis
library(completejourney)
library(ggplot2)
library(dplyr)
# List the datasets shipped with the package
data(package = 'completejourney')
# transactions_sample is a sample of the full transaction table;
# the complete table can be downloaded with get_transactions()
transactions <- transactions_sample
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
# Keep only transactions from households that have demographic records
Dtransaction <- inner_join(demographics, transactions, by = "household_id")
# Mean quantity per transaction line (one product in one basket) by income
Avgquantity <- Dtransaction %>%
  group_by(income) %>%
  summarise(avgtotal = mean(quantity))
Avgquantity
## # A tibble: 12 × 2
## income avgtotal
## <ord> <dbl>
## 1 Under 15K 128.
## 2 15-24K 63.2
## 3 25-34K 82.4
## 4 35-49K 116.
## 5 50-74K 108.
## 6 75-99K 161.
## 7 100-124K 232.
## 8 125-149K 105.
## 9 150-174K 169.
## 10 175-199K 60.9
## 11 200-249K 56.0
## 12 250K+ 40
ggplot(Avgquantity, aes(x = income, y = avgtotal)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average Quantity by Income Level",
       x = "Income",
       y = "Average Quantity") +
  theme_minimal()

Purchasing Behavior by Income Level
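We analyzed the average quantity purchased per transaction line (one
product within a basket) across income brackets. The data revealed:
- Households earning $15K-$34K have the lowest averages below $175K
(63.2 and 82.4 units), possibly reflecting tighter budgets. The
Under $15K bracket is an exception at 128 units.
- The highest averages occur at $100K-$124K (232 units), followed by
$150K-$174K (169) and $75K-$99K (161). This suggests more disposable
income and a tendency toward bulk purchases in these brackets.
- Households earning $175K+ have the lowest averages overall (40 to 61
units), possibly reflecting a preference for quality over quantity.
- Averages in the hundreds of units per line suggest that a few very
large quantity values pull the mean upward. These figures therefore
show relative differences rather than typical basket sizes.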
This insight suggests that offering bulk discounts or incentives for
larger purchases could be effective for middle-income segments.
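The averages above are computed per transaction line (one product in one basket), not per shopping trip. Below is a minimal base R sketch of a basket-level alternative, on hypothetical toy rows. It sums quantity within each basket_id, then compares the mean and median basket totals per bracket. The column names mirror transactions_sample. The values, including the extreme 500-unit line, are invented to show why a median can be more robust than a mean.

```r
# Hypothetical toy rows: one row per product line, as in transactions_sample
tx_lines <- data.frame(
  income    = c("15-24K", "15-24K", "15-24K", "100-124K", "100-124K", "100-124K"),
  basket_id = c("b1", "b1", "b2", "b3", "b4", "b5"),
  quantity  = c(1, 2, 3, 2, 4, 500)  # 500 mimics one extreme quantity value
)

# Step 1: total units per basket (one basket = one shopping trip)
basket_totals <- aggregate(quantity ~ income + basket_id, data = tx_lines, FUN = sum)

# Step 2: mean and median basket totals per income bracket
mean_items   <- tapply(basket_totals$quantity, basket_totals$income, mean)
median_items <- tapply(basket_totals$quantity, basket_totals$income, median)
print(cbind(mean = mean_items, median = median_items))
```

On these toy rows, the 100-124K bracket averages about 168.7 units per basket but has a median of 4, because a single basket dominates the mean.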
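# retail_disc is the loyalty-card discount; coupon discounts are recorded
# separately in coupon_disc and coupon_match_disc
Sumcoupons <- Dtransaction %>%
  group_by(income) %>%
  summarise(total = sum(retail_disc))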
Sumcoupons
## # A tibble: 12 × 2
## income total
## <ord> <dbl>
## 1 Under 15K 1848.
## 2 15-24K 1710.
## 3 25-34K 2158.
## 4 35-49K 4492.
## 5 50-74K 5725.
## 6 75-99K 2787.
## 7 100-124K 911.
## 8 125-149K 1178.
## 9 150-174K 1029.
## 10 175-199K 316.
## 11 200-249K 97.5
## 12 250K+ 391.
ggplot(Sumcoupons, aes(x = income, y = total)) +
  geom_col(fill = "purple") +
  labs(title = "Total Retail Discount by Income Level",
       x = "Income",
       y = "Total Discount") +
  theme_minimal()

Coupon Usage by Income
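Retail discounts (retail_disc, the loyalty-card discount in this
dataset) were totaled by income bracket as a proxy for discount usage:
- The $50K-$74K (about $5,725) and $35K-$49K (about $4,492) brackets
receive the largest total discounts, indicating a strong response to
promotions.
- Lower-income brackets (under $35K) each total between about $1,710 and
$2,158, less than the $35K-$74K brackets.
- Brackets above $100K total far less, with $200K-$249K lowest at about
$98. This suggests lower price sensitivity and less influence from
discount-based marketing.
- Because these are totals, they also reflect how many households fall
in each bracket. Per-household figures are needed before concluding
which group is most price-sensitive.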
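In completejourney, retail_disc is the loyalty-card discount. Coupon redemptions are recorded in coupon_disc (manufacturer coupons) and coupon_match_disc (the retailer's match of a manufacturer coupon). Bracket totals also grow with the number of households in a bracket. The minimal base R sketch below uses hypothetical toy rows with those column names. It totals each discount type per bracket and divides by the number of distinct households. The data and the derived columns (retail_per_hh, coupon_per_hh) are illustrative assumptions.

```r
# Hypothetical toy rows; column names follow completejourney's transactions
disc_tx <- data.frame(
  household_id      = c("1", "1", "2", "3", "3", "3"),
  income            = c("Under 15K", "Under 15K", "50-74K", "50-74K", "50-74K", "50-74K"),
  retail_disc       = c(1.0, 0.5, 2.0, 1.0, 1.0, 2.0),
  coupon_disc       = c(0.0, 1.0, 0.0, 0.5, 0.0, 0.0),
  coupon_match_disc = c(0.0, 0.0, 0.0, 0.5, 0.0, 0.0)
)

# Total each discount type per income bracket
totals <- aggregate(cbind(retail_disc, coupon_disc, coupon_match_disc) ~ income,
                    data = disc_tx, FUN = sum)

# Count distinct households per bracket and align them with the totals
n_hh <- tapply(disc_tx$household_id, disc_tx$income, function(h) length(unique(h)))
totals$households <- as.numeric(n_hh[as.character(totals$income)])

# Per-household averages remove the effect of bracket size
totals$retail_per_hh <- totals$retail_disc / totals$households
totals$coupon_per_hh <- (totals$coupon_disc + totals$coupon_match_disc) / totals$households
print(totals)
```

In the toy data, the 50-74K bracket leads on retail discount per household (3.0 vs 1.5) but trails on coupon discount per household (0.5 vs 1.0). The two discount types can therefore rank brackets differently.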
Marketing Campaign Effectiveness
Sales performance was examined across different promotional
campaigns:
- Certain campaigns drive significantly higher sales than others,
with loyalty-focused campaigns generating the most engagement.
- Higher-income segments exhibit lower response rates to promotions,
indicating that price-based incentives may not be the most effective
strategy for them.
- Short-term sales promotions boost revenue, but long-term customer
retention strategies should also be considered.
This suggests that while promotions are effective for driving sales,
a more strategic segmentation approach is needed to ensure higher-income
customers remain engaged through exclusive benefits rather than
discounts.
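One way to approach the campaign comparison above is sketched below in base R, under stated assumptions. For each campaign, it takes the start_date and end_date fields from completejourney's campaign_descriptions. It then sums sales_value for transactions dated inside that window and divides by the window length, so campaigns of different durations are comparable. The toy campaigns and transactions are hypothetical, and the date column stands in for as.Date(transaction_timestamp). Window totals alone do not isolate a campaign's effect, because overlapping campaigns and seasonality also move sales.

```r
# Hypothetical campaign windows, shaped like campaign_descriptions
campaign_windows <- data.frame(
  campaign_id = c("1", "2"),
  start_date  = as.Date(c("2017-03-01", "2017-06-01")),
  end_date    = as.Date(c("2017-03-14", "2017-06-30"))
)

# Hypothetical transactions with a purchase date and sales value
tx_dates <- data.frame(
  date        = as.Date(c("2017-03-02", "2017-03-10", "2017-04-01", "2017-06-15")),
  sales_value = c(10, 20, 99, 30)
)

# Total sales on days inside each campaign's window (inclusive)
campaign_windows$sales_in_window <- vapply(seq_len(nrow(campaign_windows)), function(i) {
  in_window <- tx_dates$date >= campaign_windows$start_date[i] &
    tx_dates$date <= campaign_windows$end_date[i]
  sum(tx_dates$sales_value[in_window])
}, numeric(1))

# Normalize by window length in days so durations are comparable
campaign_windows$days <- as.numeric(campaign_windows$end_date - campaign_windows$start_date) + 1
campaign_windows$sales_per_day <- campaign_windows$sales_in_window / campaign_windows$days
print(campaign_windows)
```

Both toy campaigns total 30 in sales, but the shorter first campaign earns about 2.14 per day versus 1 per day for the second. The normalization changes the ranking that raw totals would suggest.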
library(ggplot2)
library(scales)
# Attach demographics and product attributes to each transaction line
trans_demo <- transactions %>%
  inner_join(demographics, by = "household_id")
full_demo_prod <- trans_demo %>%
  inner_join(products, by = "product_id")
# Total spend per income bracket and brand
income_spending <- full_demo_prod %>%
  group_by(income, brand) %>%
  summarise(total_spent = sum(sales_value, na.rm = TRUE), .groups = "drop")
ggplot(data = income_spending, aes(x = brand, y = total_spent, fill = brand)) +
  geom_col() +
  ggtitle("Total Spend for Each Brand in Each Income Bracket") +
  scale_y_continuous(name = "Total Spend", labels = scales::dollar) +
  xlab("Brand") +
  facet_wrap(~ income, nrow = 4) + # Facet by income
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Brand Preference by Income
An analysis of brand spending revealed:
- Private-label brands are predominantly favored by lower-income
groups, who prioritize affordability.
- National brands dominate spending in middle-to-higher income
brackets, possibly reflecting stronger brand loyalty.
- High-income households may favor specialty and premium brands, which
would point to opportunities for exclusive product offerings. However,
the brand field distinguishes only National and Private labels, so
confirming this requires product-level detail such as product_category.
This suggests that Regork should continue expanding its
private-label offerings for lower-income customers while strengthening
partnerships with premium brands to attract high-income shoppers.
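Total spend per bracket also depends on how many households are in each bracket. The within-bracket share of Private-label spend is therefore a more direct comparison. The minimal base R sketch below assumes a table shaped like income_spending (columns income, brand, total_spent) with the two brand labels, National and Private. The values are hypothetical.

```r
# Hypothetical toy spend rows shaped like income_spending
spend <- data.frame(
  income      = c("Under 15K", "Under 15K", "250K+", "250K+"),
  brand       = c("National", "Private", "National", "Private"),
  total_spent = c(60, 40, 90, 10)
)

# Total spend per bracket, then the Private-label share of that total
bracket_total <- tapply(spend$total_spent, spend$income, sum)
is_private    <- spend$brand == "Private"
private_spend <- tapply(spend$total_spent[is_private], spend$income[is_private], sum)
private_share <- private_spend[names(bracket_total)] / bracket_total
print(round(private_share, 2))
```

In this toy example, the Under 15K bracket puts 40% of its spend on Private labels versus 10% for 250K+. Applying the same calculation to income_spending would show whether the pattern described above holds in the real data.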
Conclusion
The analysis identified clear distinctions in purchasing behavior,
coupon utilization, and brand preferences across income brackets:
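Middle-income groups are the most engaged: the $35K-$74K brackets
receive the largest total discounts, and the three highest average
quantities per transaction line fall between $75K and $174K.
Lower-income customers lean toward private-label products and
affordable options.
Higher-income households ($175K+) buy the fewest units per transaction
line and receive the smallest discount totals, suggesting they are less
influenced by promotions.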
Recommendations
1. Expand Targeted Promotions: Increase coupon and
discount offerings for price-sensitive lower-income customers. Design
personalized promotions for middle-income consumers, who engage the most
with discounts. Develop loyalty-based incentives for high-income shoppers
rather than relying on discounts.
2. Optimize Product Positioning: Strengthen
private-label branding for lower-income segments. Promote national
brands for middle-income shoppers, who exhibit strong brand
loyalty. Introduce premium and exclusive products to attract high-income
consumers.
3. Refine Marketing Campaigns: Use data-driven
segmentation to tailor campaign strategies. Focus on long-term
engagement strategies, such as membership programs, rather than
short-term discounts. Leverage insights from campaign performance data
to improve future marketing efforts.
4. Enhance Retail Experience: Provide bulk
purchasing incentives for middle-income shoppers. Improve premium
product placements and exclusive shopping experiences for high-income
customers. Ensure affordability remains a key focus for lower-income
shoppers.
By implementing these strategies, Regork can maximize revenue while
enhancing customer satisfaction across all income levels.