Business Question:

How does household income influence purchasing behavior, coupon usage, and brand preference at Regork, and what strategies can be implemented to maximize revenue across different income brackets?

Packages Required

completejourney: Provides consumer transaction data, including demographics, promotions, and purchase history, enabling in-depth retail analytics.

ggplot2: Used for creating visualizations such as bar charts and faceted plots, aiding in the graphical representation of purchasing trends.

dplyr: Facilitates efficient data manipulation, including filtering, grouping, summarizing, and merging datasets.

lubridate: Helps with handling and manipulating date-time data, particularly for analyzing transaction timestamps and campaign periods.

scales: Enhances ggplot2 plots by formatting numerical values, such as currency labels, to improve readability and interpretation.

Objective

Data Source

I used the Complete Journey dataset, integrating the following datasets:

Transactions: Individual purchase records, including quantity, sales value, and discounts.

Demographics: Household attributes such as income level, household size, and marital status.

Products: Product category, brand, and department information.

Campaign Descriptions: The type, start date, and end date of each marketing campaign, used here to match sales to campaign periods.
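Because the analysis relies on inner joins, only households that appear in both the transaction and demographic tables are kept. A minimal sketch for checking that coverage before joining is shown here; the object names `tx_households` and `coverage` are illustrative and not part of the original analysis.

```r
library(completejourney)
library(dplyr)

# Households with at least one sampled transaction
tx_households <- transactions_sample %>%
  distinct(household_id)

# Flag which of those households also have a demographics record;
# an inner join on household_id keeps only the TRUE group
coverage <- tx_households %>%
  mutate(has_demographics = household_id %in% demographics$household_id) %>%
  count(has_demographics)

coverage
```

If a large share of households falls in the `FALSE` group, the income comparisons describe only the surveyed subset of shoppers.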

Data Analysis

library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
data(package = 'completejourney')
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
transactions <- transactions_sample
demographics
## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
Dtransaction <- inner_join(demographics,transactions)
## Joining with `by = join_by(household_id)`
Avgquantity <- Dtransaction %>%
  group_by(income) %>%
  summarise(avgtotal = mean(quantity))

Avgquantity
## # A tibble: 12 × 2
##    income    avgtotal
##    <ord>        <dbl>
##  1 Under 15K    128. 
##  2 15-24K        63.2
##  3 25-34K        82.4
##  4 35-49K       116. 
##  5 50-74K       108. 
##  6 75-99K       161. 
##  7 100-124K     232. 
##  8 125-149K     105. 
##  9 150-174K     169. 
## 10 175-199K      60.9
## 11 200-249K      56.0
## 12 250K+         40
ggplot(Avgquantity, aes(x = income, y = avgtotal)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average Quantity by Income Level",
       x = "Income",
       y = "Average Quantity") +
  theme_minimal()

Purchasing Behavior by Income Level

We analyzed the average quantity recorded per transaction line across income brackets. The data revealed:
- Households in the $15K-$34K brackets purchase fewer units per line (about 63 to 82), possibly due to budget constraints; the Under 15K bracket is an exception at about 128.
- Middle-income groups ($75K-$149K) exhibit the highest purchase volume, peaking at about 232 units in the $100K-$124K bracket, suggesting they have more disposable income but still seek value in bulk purchases.
- Higher-income households ($175K+) purchase the fewest units on average (about 40 to 61), possibly reflecting a preference for quality over quantity.
This insight suggests that offering bulk discounts or incentives for larger purchases could be effective for middle-income segments.
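The averages above are taken over individual transaction lines, so a few lines with very large `quantity` values can dominate a bracket's mean. As a rough cross-check, the sketch below first totals units per basket and then reports both the mean and the median basket size by income. The names `basket_items` and `basket_by_income` are illustrative, and treating one `basket_id` as one shopping trip is an assumption.

```r
library(completejourney)
library(dplyr)

# Total units per basket, so each row is one shopping trip rather than one line item
basket_items <- transactions_sample %>%
  group_by(household_id, basket_id) %>%
  summarise(items = sum(quantity), .groups = "drop")

# Attach income and compare mean and median basket size per bracket
basket_by_income <- basket_items %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(income) %>%
  summarise(
    baskets      = n(),            # number of baskets in the bracket
    mean_items   = mean(items),    # sensitive to extreme quantities
    median_items = median(items),  # robust to extreme quantities
    .groups = "drop"
  )

basket_by_income
```

A large gap between `mean_items` and `median_items` in a bracket would indicate that outliers, rather than typical baskets, are driving the bar chart.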
Sumcoupons <- Dtransaction %>%
  group_by(income) %>%
  summarise(total = sum(retail_disc))

Sumcoupons
## # A tibble: 12 × 2
##    income     total
##    <ord>      <dbl>
##  1 Under 15K 1848. 
##  2 15-24K    1710. 
##  3 25-34K    2158. 
##  4 35-49K    4492. 
##  5 50-74K    5725. 
##  6 75-99K    2787. 
##  7 100-124K   911. 
##  8 125-149K  1178. 
##  9 150-174K  1029. 
## 10 175-199K   316. 
## 11 200-249K    97.5
## 12 250K+      391.
ggplot(Sumcoupons, aes(x = income, y = total)) +
  geom_col(fill = "purple") +
  labs(title = "Total Retail Discount by Income Level",
       x = "Income",
       y = "Total Discount") +
  theme_minimal()

Coupon Usage by Income

Retail discount totals (`retail_disc`) were aggregated by income bracket as a proxy for coupon usage:
- Lower-income brackets (under $35K) each total roughly $1,700 to $2,200 in discounts, less than the middle brackets. Because the totals are not adjusted for the number of households in each bracket, they do not confirm price sensitivity on their own.
- The $50K-$74K bracket records the largest discount total (about $5,725), followed by $35K-$49K (about $4,492), indicating a strong response to promotions.
- Higher-income brackets ($200K+) record significantly smaller totals (under $400 per bracket), implying that they are less price-sensitive and may not be as influenced by discount-based marketing.
This suggests that targeted promotions for lower- and middle-income groups could maximize sales, while premium loyalty rewards might better appeal to higher-income customers.
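Bracket totals depend heavily on how many households fall in each bracket, and `retail_disc` is a separate column from the coupon discount (`coupon_disc`). The sketch below, offered as one possible refinement rather than the method used above, reports both discount types per household and per dollar of sales. The name `discounts_by_income` and the derived column names are illustrative.

```r
library(completejourney)
library(dplyr)

discounts_by_income <- transactions_sample %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(income) %>%
  summarise(
    households        = n_distinct(household_id),
    total_retail_disc = sum(retail_disc, na.rm = TRUE),
    total_coupon_disc = sum(coupon_disc, na.rm = TRUE),
    total_sales       = sum(sales_value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    # Normalise by bracket size so large brackets do not dominate
    retail_disc_per_hh = total_retail_disc / households,
    coupon_disc_per_hh = total_coupon_disc / households,
    # Discount dollars per dollar of sales, a simple intensity measure
    retail_disc_rate   = total_retail_disc / total_sales
  )

discounts_by_income
```

Comparing `retail_disc_per_hh` across brackets gives a fairer view of price sensitivity than the raw totals.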
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
transactiontime <- transactions_sample %>%
  mutate(transaction_date = as.Date(transaction_timestamp))

campaign_sales <- transactiontime %>%
  cross_join(campaign_descriptions) %>%
  filter(transaction_date >= start_date & transaction_date <= end_date) %>%
  group_by(campaign_id, campaign_type) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

ggplot(campaign_sales, aes(x = factor(campaign_id), y = total_sales, fill = campaign_type)) +
  geom_col() +
  labs(title = "Total Sales During Campaigns",
       x = "Campaign ID",
       y = "Total Sales ($)",
       fill = "Campaign Type") +
  theme_minimal()

Marketing Campaign Effectiveness

Sales performance was examined across different promotional campaigns:
- Certain campaigns drive significantly higher sales than others, with loyalty-focused campaigns generating the most engagement.
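- Higher-income segments appear to respond less to promotions, consistent with the discount totals by income reported earlier, indicating that price-based incentives may not be the most effective strategy for them. Note that this chart does not break campaign sales out by income.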
- Short-term sales promotions boost revenue, but long-term customer retention strategies should also be considered.
This suggests that while promotions are effective for driving sales, a more strategic segmentation approach is needed to ensure higher-income customers remain engaged through exclusive benefits rather than discounts.
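The chart above counts every sale that falls inside a campaign window, whether or not the household was part of that campaign. To look at response by income directly, one option is to restrict sales to households listed for each campaign in the package's `campaigns` table. This is a sketch under that assumption; the name `targeted_sales` is illustrative, and the `relationship` argument requires dplyr 1.1.0 or later.

```r
library(completejourney)
library(dplyr)

targeted_sales <- transactions_sample %>%
  mutate(transaction_date = as.Date(transaction_timestamp)) %>%
  # Keep only households that were assigned to a campaign;
  # a household can be in several campaigns, hence many-to-many
  inner_join(campaigns, by = "household_id", relationship = "many-to-many") %>%
  inner_join(campaign_descriptions, by = "campaign_id") %>%
  # Keep purchases made while that campaign was running
  filter(transaction_date >= start_date, transaction_date <= end_date) %>%
  inner_join(demographics, by = "household_id") %>%
  group_by(campaign_type, income) %>%
  summarise(
    households  = n_distinct(household_id),  # households with a purchase in the window
    total_sales = sum(sales_value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(sales_per_household = total_sales / households)

targeted_sales
```

Sales inside a campaign window are not necessarily caused by the campaign, so these figures are descriptive rather than a measure of lift.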
library(ggplot2)
library(scales)

trans_demo <- transactions %>%
  inner_join(demographics, by = "household_id")

full_demo_prod <- trans_demo %>%
  inner_join(products, by = "product_id")

income_spending <- full_demo_prod %>%
  group_by(income, brand) %>%
  # na.rm (not narm) drops missing values; a misspelled argument is summed as an extra 1
  summarise(total_spent = sum(sales_value, na.rm = TRUE), .groups = "drop")

ggplot(data = income_spending, aes(x = brand, y = total_spent, fill = brand)) +
  geom_bar(stat = "identity") +
  ggtitle("Total Spend for Each Brand in Each Income Bracket") +
  scale_y_continuous(name = "Total Spend", labels = scales::dollar) +
  xlab("Brand") +
  facet_wrap(~ income, nrow = 4) +  # Facet by income
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Brand Preference by Income

An analysis of brand spending revealed:
- Private-label brands are predominantly favored by lower-income groups, who prioritize affordability.
- National brands dominate spending in middle-to-higher income brackets, where brand loyalty is stronger.
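- High-income households may also favor specialty and premium products, indicating potential opportunities for exclusive product offerings. The `brand` field distinguishes only national from private-label products, so this point would need a product-level check.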
This suggests that Regork should continue expanding its private-label offerings for lower-income customers while strengthening partnerships with premium brands to attract high-income shoppers.
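Total spend per brand reflects how much each bracket spends overall as well as which brands it favors. A minimal sketch that separates the two computes each brand's share of spend within a bracket. The name `brand_share` is illustrative.

```r
library(completejourney)
library(dplyr)
library(ggplot2)
library(scales)

brand_share <- transactions_sample %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(products, by = "product_id") %>%
  group_by(income, brand) %>%
  summarise(total_spent = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  # Convert totals to shares so brackets of different size are comparable
  group_by(income) %>%
  mutate(share = total_spent / sum(total_spent)) %>%
  ungroup()

# Stacked bars: each income bracket sums to 100%
ggplot(brand_share, aes(x = income, y = share, fill = brand)) +
  geom_col() +
  scale_y_continuous(name = "Share of Spend", labels = scales::percent) +
  labs(title = "Brand Share of Spend by Income Bracket",
       x = "Income",
       fill = "Brand") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

If the private-label share falls steadily as income rises, that would support the first two bullets above more directly than the raw totals do.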

Conclusion

The analysis identified clear distinctions in purchasing behavior, coupon utilization, and brand preferences across income brackets:

Middle-income groups are the most engaged in promotions and bulk purchasing.

Lower-income customers prioritize discounts and private-label products.

Higher-income households purchase fewer items per transaction and are less influenced by promotions.

Recommendations

1. Expand Targeted Promotions: Increase coupon and discount offerings for price-sensitive lower-income customers. Design personalized promotions for middle-income consumers, who engage the most with coupons. Develop loyalty-based incentives for high-income shoppers rather than relying on discounts.

2. Optimize Product Positioning: Strengthen private-label branding for lower-income segments. Promote national brands for middle-income shoppers, who exhibit strong brand loyalty. Introduce premium and exclusive products to attract high-income consumers.

3. Refine Marketing Campaigns: Use data-driven segmentation to tailor campaign strategies. Focus on long-term engagement strategies, such as membership programs, rather than short-term discounts. Leverage insights from campaign performance data to improve future marketing efforts.

4. Enhance Retail Experience: Provide bulk purchasing incentives for middle-income shoppers. Improve premium product placements and exclusive shopping experiences for high-income customers. Ensure affordability remains a key focus for lower-income shoppers.

By implementing these strategies, Regork can maximize revenue while enhancing customer satisfaction across all income levels.