Problem Statement:
Regork is looking to enhance customer loyalty and increase sales, particularly among high-income households. Analyzing how these customers use coupons can reveal important trends. By understanding the overall coupon redemption patterns among different age groups, we can tailor our promotions more effectively. This insight is crucial for maximizing marketing efforts and boosting sales.
Approach to Addressing the Problem
To tackle this issue, we analyzed data from our coupon redemptions, customer demographics, and transaction records. We focused on high-income households and looked at how different age groups engage with our coupons. Using techniques like grouping and summarizing data, we were able to see clear patterns in coupon usage.
Benefits of the Analysis
This analysis will provide valuable insights into customer behavior, helping to refine marketing strategies. By targeting the age groups that redeem coupons most frequently, Regork can create more effective promotions and ultimately drive higher sales. Our proposed solution is to adjust marketing efforts based on these findings to enhance customer engagement and loyalty.
To conduct this analysis, we used several R packages that are essential for data handling and visualization. When we load these packages, R sometimes shows messages or warnings that can clutter the output. To keep things clear and focused, we’ve suppressed these messages so that the results are easy to read and understand.
suppressWarnings(suppressMessages(library(tidyverse))) # A collection of R packages for data manipulation and visualization.
suppressWarnings(suppressMessages(library(lubridate))) # A package for working with date and time data easily.
suppressWarnings(suppressMessages(library(completejourney))) # Offers tools to understand how customers interact with a brand from start to finish in their buying journey.
suppressWarnings(suppressMessages(library(dplyr))) # A package for data manipulation and transformation.
suppressWarnings(suppressMessages(library(ggplot2))) # A package for creating elegant data visualizations.
suppressWarnings(suppressMessages(library(scales))) # A package for formatting axes and labels in plots.
In this part, we gathered all the necessary data for our analysis. We pulled in information about promotions and transactions to see how customers interact with our marketing efforts.
We also included important details like campaign descriptions, coupons, customer demographics, coupon redemptions, and product information. This wide range of data helps us understand how effective our promotions are and how different customers behave when using coupons.
c(promotions, transactions) %<-% get_data(which = 'both', verbose = FALSE)
campaigns
## # A tibble: 6,589 Ă— 2
## campaign_id household_id
## <chr> <chr>
## 1 1 105
## 2 1 1238
## 3 1 1258
## 4 1 1483
## 5 1 2200
## 6 1 293
## 7 1 529
## 8 1 536
## 9 1 568
## 10 1 630
## # ℹ 6,579 more rows
campaign_descriptions
## # A tibble: 27 Ă— 4
## campaign_id campaign_type start_date end_date
## <chr> <ord> <date> <date>
## 1 1 Type B 2017-03-03 2017-04-09
## 2 2 Type B 2017-03-08 2017-04-09
## 3 3 Type C 2017-03-13 2017-05-08
## 4 4 Type B 2017-03-29 2017-04-30
## 5 5 Type B 2017-04-03 2017-05-07
## 6 6 Type C 2017-04-19 2017-05-21
## 7 7 Type B 2017-04-24 2017-05-28
## 8 8 Type A 2017-05-08 2017-06-25
## 9 9 Type B 2017-05-31 2017-07-02
## 10 10 Type B 2017-06-28 2017-07-30
## # ℹ 17 more rows
coupons
## # A tibble: 116,204 Ă— 3
## coupon_upc product_id campaign_id
## <chr> <chr> <chr>
## 1 10000085207 9676830 26
## 2 10000085207 9676943 26
## 3 10000085207 9676944 26
## 4 10000085207 9676947 26
## 5 10000085207 9677008 26
## 6 10000085207 9677052 26
## 7 10000085207 9677385 26
## 8 10000085207 9677479 26
## 9 10000085207 9677791 26
## 10 10000085207 9677878 26
## # ℹ 116,194 more rows
demographics
## # A tibble: 801 Ă— 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
coupon_redemptions
## # A tibble: 2,102 Ă— 4
## household_id coupon_upc campaign_id redemption_date
## <chr> <chr> <chr> <date>
## 1 1029 51380041013 26 2017-01-01
## 2 1029 51380041313 26 2017-01-01
## 3 165 53377610033 26 2017-01-03
## 4 712 51380041013 26 2017-01-07
## 5 712 54300016033 26 2017-01-07
## 6 2488 51200092776 26 2017-01-10
## 7 2488 51410010050 26 2017-01-10
## 8 1923 53000012033 26 2017-01-14
## 9 1923 54300021057 26 2017-01-14
## 10 1923 57047091041 26 2017-01-14
## # ℹ 2,092 more rows
products
## # A tibble: 92,331 Ă— 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
In our analysis, we examined how high-income households engage with coupons, focusing on different age groups. By merging data from demographics, transactions, and coupon usage, we identified trends in coupon redemption for various age categories.
We created visual aids, including a bar chart showing total coupon redemptions by age group and a heat map illustrating engagement levels. These visuals were designed to be clear and easy to understand, helping us quickly grasp which age groups are most active in using coupons.
Our findings revealed specific age demographics that are more likely to redeem coupons. This insight will guide our marketing strategies, allowing us to tailor promotions to better meet the needs of these groups.
4.1 Total Sales Value by Income Groups
In this section, we examine how much money each income group generates from sales. By combining demographic and sales data, we see that Group 2 (earning between $35K and $99K) is our biggest contributor. The bar chart makes this clear. Interestingly, the High-Income Group (Group 3) is not spending as much as some of the lower-income groups. Based on this finding, we’re exploring strategies to boost spending from Group 3.
data <- demographics %>%
inner_join(transactions, by = "household_id") %>%
mutate(income_group = case_when(
income %in% c("25-34K", "15-24K", "Under 15K") ~ "GROUP 1",
income %in% c("35-49K", "50-74K", "75-99K") ~ "GROUP 2",
income %in% c("100-124K", "125-149K", "150-174K", "250K+", "175-199K", "200-249K") ~ "GROUP 3",
TRUE ~ "Other"
)) %>%
#Grouping by income group
group_by(income_group) %>%
#Summarizing total sales value
summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = 'drop')
ggplot(data, aes(x = income_group, y = total_sales_value)) +
geom_bar(stat = "identity", fill = "skyblue", color = "black") +
labs(title = "Total Sales Value by Income Group",
x = "Income Group",
y = "Total Sales Value") +
scale_y_continuous(labels = comma) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
4.2 Top products for Income Group 3 (High-Income Group) for each day of week
In this section, we analyze the best-selling products for a specific income group throughout the week. The results show that certain items consistently top the sales chart each day, with gasoline (regular unleaded) leading, followed by fluid milk (white only), soft drinks, beer, malt liquors, cigarettes, primal cuts, shredded cheese, choice beef, and toilet tissue. This consistent pattern helps us understand customer preferences and timing, allowing us to tailor our marketing strategies more effectively to promote these high-demand products.
data <- demographics %>%
inner_join(transactions, by = "household_id") %>%
#Joining with products table
inner_join(products, by = "product_id") %>%
mutate(income_group = case_when(
income %in% c("25-34K", "15-24K", "Under 15K") ~ "GROUP 1",
income %in% c("35-49K", "50-74K", "75-99K") ~ "GROUP 2",
income %in% c("100-124K", "125-149K", "150-174K", "250K+", "175-199K", "200-249K") ~ "GROUP 3",
TRUE ~ "Other"
))
#Filtering for Income Group 3
group_3_data <- data %>%
filter(income_group == "GROUP 2")
#Extracting the day of the week from the transaction timestamp
group_3_data <- group_3_data %>%
mutate(day_of_week = wday(transaction_timestamp, label = TRUE)) # Get day of week as factor
#Summarizing total sales value for each product type by day of the week
product_sales <- group_3_data %>%
group_by(day_of_week, product_type) %>%
summarise(total_sales_value = sum(sales_value, na.rm = TRUE), .groups = 'drop') %>%
arrange(day_of_week, desc(total_sales_value)) # Arrange by day and sales value
#Getting Top 10 product types for each day of the week
top_product_types <- product_sales %>%
group_by(day_of_week) %>%
slice_head(n = 10)
#To view the top product types
print(top_product_types)
## # A tibble: 70 Ă— 3
## # Groups: day_of_week [7]
## day_of_week product_type total_sales_value
## <ord> <chr> <dbl>
## 1 Sun GASOLINE-REG UNLEADED 17342.
## 2 Sun FLUID MILK WHITE ONLY 5706.
## 3 Sun BEERALEMALT LIQUORS 5447.
## 4 Sun SOFT DRINKS 12/18&15PK CAN CAR 5084.
## 5 Sun FRZN SS PREMIUM ENTREES/DNRS/N 2733.
## 6 Sun PRIMAL 2670.
## 7 Sun CIGARETTES 2615.
## 8 Sun SHREDDED CHEESE 2580.
## 9 Sun CHOICE BEEF 2372.
## 10 Sun TOILET TISSUE 2332.
## # ℹ 60 more rows
#Creating a Plot to visualize the top product types for each day
ggplot(top_product_types, aes(x = reorder(product_type, -total_sales_value), y = total_sales_value, fill = product_type)) +
geom_bar(stat = "identity") +
facet_wrap(~ day_of_week, scales = "free_y") +
labs(title = "Top 10 Product Types for Income Group 3 by Day of the Week",
x = "Product Type",
y = "Total Sales Value") +
theme_minimal(base_size = 8) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "bottom")
4.3 Coupons Issued for Top Products for High-Income Groups
In this analysis, we look at the number of coupons issued for our top-selling products. Our findings show that fluid milk leads the list, followed by premium items, shredded cheese, frozen premium products, primal, soft drinks, choice beef, and toilet tissue. Understanding how coupons are used for these popular products helps us assess the effectiveness of our promotions.
#Filtering the top product types for Income Group 3
#Reusing the 'top_product_types' data from the previous analysis
top_product_types_list <- unique(top_product_types$product_type)
#Preparing coupon data with household information for Income Group 3
coupon_data <- coupons %>%
#Joining with products to get product information
inner_join(products, by = "product_id") %>%
#Joining with transactions to link to household IDs
inner_join(transactions, by = "product_id") %>%
#Adding household information to filter by income group
inner_join(demographics, by = "household_id") %>%
filter(product_type %in% top_product_types_list &
income %in% c("100-124K", "125-149K", "150-174K", "250K+", "175-199K", "200-249K")) %>%
select(coupon_upc, product_id, product_type, campaign_id) # Select relevant columns
#Counting the number of coupons issued for each top product type in Income Group 3
coupon_summary <- coupon_data %>%
group_by(product_type) %>%
summarise(number_of_coupons = n(), .groups = 'drop')
#To View the coupon summary
print(coupon_summary)
## # A tibble: 8 Ă— 2
## product_type number_of_coupons
## <chr> <int>
## 1 CHOICE BEEF 1271
## 2 FLUID MILK WHITE ONLY 12295
## 3 FRZN SS PREMIUM ENTREES/DNRS/N 4548
## 4 PREMIUM 5601
## 5 PRIMAL 1908
## 6 SHREDDED CHEESE 4975
## 7 SOFT DRINKS 12/18&15PK CAN CAR 1846
## 8 TOILET TISSUE 376
#Creating a plot to visualize the number of coupons for each top product type
ggplot(coupon_summary, aes(x = reorder(product_type, -number_of_coupons), y = number_of_coupons, fill = product_type)) +
geom_bar(stat = "identity") +
labs(title = "Number of Coupons Issued for Top Product Types in Income Group 3",
x = "Product Type",
y = "Number of Coupons") +
theme_minimal(base_size = 8) +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
legend.position = "bottom")
4.4 Monthly coupon redemption for High-Income Groups
In this section, we examine how many coupons high-income customers are redeeming each month. Our findings indicate that redemptions are highest in May, August, and November, while January through April, as well as June, July, October, and December of 2017, see lower redemption rates. The line graph effectively highlights these trends over time, allowing us to pinpoint the peak periods for coupon usage.
#summarizing coupon redemption by month
monthly_redemption_summary <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
filter(income %in% c("100-124K", "125-149K", "150-174K", "250K+", "175-199K", "200-249K")) %>%
mutate(month = floor_date(redemption_date, "month")) %>%
group_by(month) %>%
summarise(total_redemptions = n(), .groups = 'drop')
#Creating a plot to visualize coupon redemption trends over time
ggplot(monthly_redemption_summary, aes(x = month, y = total_redemptions)) +
geom_line(color = "steelblue", size = 1.2) +
geom_point(size = 3, color = "steelblue") +
geom_area(fill = "lightblue", alpha = 0.4) +
geom_text(aes(label = total_redemptions), vjust = -0.5, color = "steelblue") +
labs(title = "Monthly Coupon Redemption for High-Income Groups", x = "Month", y = "Total Redemptions") +
scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
panel.grid.major.y = element_line(color = "gray80", linetype = "dashed"),
panel.grid.minor.y = element_blank())
4.5 Monthly Sales over the year for High-Income Groups
In this section, we analyze the total sales from high-income groups throughout the year. Our findings reveal that February has the lowest sales, while June hits an all-time low of 41,196. The remaining months consistently show sales figures around 47,000 to 50,000. This analysis helps us identify seasonal trends and understand how sales vary from month to month.
#Defining high-income groups
high_income_groups <- c("100-124K", "125-149K", "150-174K", "250K+", "175-199K", "200-249K")
#Calculating monthly sales for high-income groups
monthly_sales_summary <- transactions %>%
inner_join(demographics, by = "household_id") %>%
filter(income %in% high_income_groups) %>%
mutate(month = floor_date(as.Date(transaction_timestamp), "month")) %>%
group_by(month) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE))
#Creating a full sequence of months
all_months <- seq.Date(min(as.Date(monthly_sales_summary$month)), max(as.Date(monthly_sales_summary$month)), by = "month")
#Merging and filling the missing months with zero sales
monthly_sales_complete <- data.frame(month = all_months) %>%
left_join(monthly_sales_summary, by = "month") %>%
replace_na(list(total_sales = 0))
#Creating the line plot
ggplot(monthly_sales_complete, aes(x = month, y = total_sales)) +
geom_line(color = "orange", size = 1.2) +
geom_point(color = "orange", size = 3) +
geom_text(aes(label = round(total_sales, 0)), vjust = -0.5, color = "orange", size = 3) +
labs(title = "Monthly Sales for High-Income Groups", x = "Month", y = "Total Sales") +
scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
theme_minimal(base_size = 14) +
theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5))
4.6 Monthly Coupon Redemption for High-Income Groups
In this section, we focus on how often high-income customers redeem coupons each month. Notably, the income group earning between $150,000 and $174,000 has the highest redemption rates among all groups. The bar chart clearly displays these redemption trends over time, offering valuable insights to enhance our marketing strategies.
#Defining high-income groups
high_income_groups <- c("100-124K", "125-149K", "150-174K", "250K+", "175-199K", "200-249K")
#Coupon Redemption for High-Income Groups
monthly_coupon_data <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
filter(income %in% high_income_groups) %>%
mutate(month = floor_date(redemption_date, "month")) %>%
group_by(month, income) %>%
summarise(total_redemptions = n_distinct(coupon_upc), .groups = 'drop')
#Creating a bar plot for coupon redemption
ggplot(monthly_coupon_data, aes(x = month, y = total_redemptions, fill = income)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Monthly Coupon Redemptions for High-Income Groups",
x = "Month",
y = "Total Coupon Redemptions") +
theme_minimal(base_size = 14)
4.7 Coupon Redemption by Age Group of High-Income Groups
In this section, we explore how various age groups within high-income households use coupons. The analysis shows that coupon redemption rates are very uneven across all income levels. By identifying which age groups engage more with coupons, we can enhance our marketing strategies to better reach these customers.
age_group_coupon_data <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
filter(income %in% high_income_groups) %>%
group_by(age, income) %>%
summarise(total_redemptions = n_distinct(coupon_upc), .groups = 'drop')
#Creating a bar plot for coupon redemption by age group
ggplot(age_group_coupon_data, aes(x = age, y = total_redemptions, fill = income)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Coupon Redemptions by Age Group for High-Income Groups",
x = "Age Group",
y = "Total Coupon Redemptions") +
theme_minimal(base_size = 14)
4.8 Heatmap of Coupon Redemption by Age Group of High Income Group
n this section, we use a heat map to illustrate coupon usage among different age groups in high-income households. The analysis shows that while the 35-44 age group earning between 150,000 and 174,000, and the 45-54 age group earning 125,000 to 149,000, demonstrate high coupon redemption rates, several other groups, such as those aged 25-44 and 55+ earning 100,000 to 124,000, as well as 55+ earning 125,000 to 149,000 and 55-64 earning 175,000 to 199,000, show significantly lower redemption rates. This visual insight helps us identify both the most engaged demographics for targeted marketing and those that may need more focused promotional efforts to boost their coupon usage.
age_group_coupon_data <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
filter(income %in% high_income_groups) %>%
group_by(age, income) %>%
summarise(total_redemptions = n_distinct(coupon_upc), .groups = 'drop')
#Creating a heat map for coupon redemption by age group
ggplot(age_group_coupon_data, aes(x = age, y = income, fill = total_redemptions)) +
geom_tile(color = "white") + # Add a white border around tiles
scale_fill_gradient(low = "lightblue", high = "darkblue") + # Gradient fill for total redemptions
labs(title = "Heatmap of Coupon Redemptions by Age Group for High-Income Groups",
x = "Age Group",
y = "Income Group") +
theme_minimal(base_size = 10) +
theme(legend.title = element_blank())
Summarize the problem statement addressed
We aimed to understand how high-income households use coupons and how different age groups engage with these offers. This insight is important for tailoring our marketing strategies to better reach these customers.
Summarize how you addressed this problem statement
To tackle this problem, we collected various data sets, including promotions, transactions, customer demographics, coupon usage, and product information. We analyzed this data to see patterns in coupon redemptions across different age groups, helping us understand who is using coupons and when.
Summarize the interesting insights that your analysis provided
Our analysis revealed specific age groups that are more likely to redeem coupons. We discovered trends in which demographics engage more with our promotional offers, providing valuable information about customer behavior.
Summarize the implications to the consumer of your analysis. What would you propose to the Regork CEO?
For consumers, our findings suggest that targeted promotions could lead to better savings opportunities, especially for certain age groups. I would propose to the Regork CEO that we create tailored marketing campaigns that focus on the demographics most engaged with coupons to enhance customer satisfaction and loyalty.
Discuss the limitations of your analysis and how you, or someone else, could improve or build on it.
One limitation of our analysis is that we only looked at high-income households, which may not represent the broader customer base. Additionally, we didn’t consider factors like geographical differences or seasonal trends. To improve this analysis, we could include data from other income groups and expand our focus to capture seasonal patterns, providing a more comprehensive view of customer behavior.