Using the top tier stores we want to propose a training campaign to observe what makes these stores thrive during Type A campaigns. These stores boast the best sales performance across our campaigns and cultivate a family friendly environment. Regork can leverage their hands on experience to train sub-standard stores to best practice for increased revenue.
# Load necessary libraries with explanations and warning suppression
suppressPackageStartupMessages({
library(completejourney) # For accessing transaction and demographic data
library(dplyr) # For data manipulation and joining datasets
library(ggplot2) # For data visualization
library(RColorBrewer) # For enhanced color palettes in visualizations
library(lubridate) # For date manipulation and trend analysis
library(broom) # For cleaning model outputs
library(spatial) # For time-series analysis
library(tidyr) # For pivot_wider (replacing spread)
library(scales)
})
Our methodology to answer the business question focused on strategically segmenting the customer base and conducting a comprehensive campaign performance analysis. We began by segmenting customers into three primary demographics: Married, Unmarried (Singles), and Undefined. The undefined category included households with no registered marital status (NA) but with 2+ members, indicating a potentially profitable segment. This segmentation was crucial to understanding purchasing behavior across different life stages and household compositions
# Preparing Data Frame with Objects we will use often in our exploration
transactions <- get_transactions()
special_demographics <- demographics
value_products <- products
promotions <- get_promotions()
married_transactions <- transactions %>%
full_join(special_demographics, by = "household_id") %>%
filter(marital_status == "Married")
print(married_transactions)
## # A tibble: 373,193 × 18
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 906 319 31198705046 1020156 1 1.5 0.29
## 4 906 319 31198705046 1053875 2 2.78 0.8
## 5 906 319 31198705046 1060312 1 5.49 0.5
## 6 906 319 31198705046 1075313 1 1.5 0.29
## 7 993 32004 31198515122 840361 1 1 0.09
## 8 993 32004 31198515122 859075 1 1.67 0.22
## 9 993 32004 31198515122 866227 2 1 0
## 10 993 32004 31198515122 1039156 1 1.5 0
## # ℹ 373,183 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, age <ord>, income <ord>,
## # home_ownership <ord>, marital_status <ord>, household_size <ord>,
## # household_comp <ord>, kids_count <ord>
singles_transactions <- transactions %>%
full_join(special_demographics, by = "household_id") %>%
filter(marital_status == "Unmarried")
print(singles_transactions)
## # A tibble: 314,225 × 18
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1228 406 31198655051 1041453 1 1.43 0.15
## 2 1419 32004 31198515072 1037894 1 1 0.29
## 3 1419 32004 31198515072 1069175 1 1.29 0
## 4 1873 361 31198640134 834484 1 0.66 0
## 5 1873 361 31198640134 852437 1 1.5 0.29
## 6 1873 361 31198640134 862854 1 2 0.69
## 7 1873 361 31198640134 926905 1 3 1.59
## 8 1873 361 31198640134 995211 1 1.88 0.61
## 9 1873 361 31198640134 5568378 1 1.88 0.87
## 10 1873 361 31198640137 868473 1 1.05 0
## # ℹ 314,215 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, age <ord>, income <ord>,
## # home_ownership <ord>, marital_status <ord>, household_size <ord>,
## # household_comp <ord>, kids_count <ord>
undefined_transactions <- transactions %>%
full_join(special_demographics, by = "household_id") %>%
filter(is.na(marital_status))
print(undefined_transactions)
## # A tibble: 781,889 × 18
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1058 381 31198676055 985893 1 1.88 0.21
## 2 1058 381 31198676055 988791 1 1.5 1.29
## 3 1058 381 31198676055 9297106 1 2.69 0
## 4 749 346 31198970033 1082310 1 1.61 0
## 5 749 346 31198970033 5564931 1 1.59 0
## 6 850 292 31198700070 995242 1 1.99 0
## 7 850 292 31198700070 1022254 1 1.27 0
## 8 850 292 31198700070 1082185 1 0.97 0
## 9 70 380 31198975059 1039156 1 1.5 0
## 10 83 381 31198676089 1046485 1 11.0 0
## # ℹ 781,879 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, age <ord>, income <ord>,
## # home_ownership <ord>, marital_status <ord>, household_size <ord>,
## # household_comp <ord>, kids_count <ord>
target_market <- transactions %>%
full_join(special_demographics, by = "household_id") %>%
filter(marital_status == "Married" | is.na(marital_status))
print(target_market)
## # A tibble: 1,155,082 × 18
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 906 319 31198705046 1020156 1 1.5 0.29
## 4 906 319 31198705046 1053875 2 2.78 0.8
## 5 906 319 31198705046 1060312 1 5.49 0.5
## 6 906 319 31198705046 1075313 1 1.5 0.29
## 7 1058 381 31198676055 985893 1 1.88 0.21
## 8 1058 381 31198676055 988791 1 1.5 1.29
## 9 1058 381 31198676055 9297106 1 2.69 0
## 10 749 346 31198970033 1082310 1 1.61 0
## # ℹ 1,155,072 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, age <ord>, income <ord>,
## # home_ownership <ord>, marital_status <ord>, household_size <ord>,
## # household_comp <ord>, kids_count <ord>
target_age <- transactions %>%
full_join(special_demographics, by = "household_id") %>%
filter(age %in% c("35-44", "45-54"))
print(target_age)
## # A tibble: 526,607 × 18
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 1419 32004 31198515072 1037894 1 1 0.29
## 5 1419 32004 31198515072 1069175 1 1.29 0
## 6 1873 361 31198640134 834484 1 0.66 0
## 7 1873 361 31198640134 852437 1 1.5 0.29
## 8 1873 361 31198640134 862854 1 2 0.69
## 9 1873 361 31198640134 926905 1 3 1.59
## 10 1873 361 31198640134 995211 1 1.88 0.61
## # ℹ 526,597 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, age <ord>, income <ord>,
## # home_ownership <ord>, marital_status <ord>, household_size <ord>,
## # household_comp <ord>, kids_count <ord>
non_target_age <- transactions %>%
full_join(special_demographics, by = "household_id") %>%
filter(!(age == "35-44" | age == "45-54"))
print(non_target_age)
## # A tibble: 302,243 × 18
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 906 319 31198705046 1020156 1 1.5 0.29
## 2 906 319 31198705046 1053875 2 2.78 0.8
## 3 906 319 31198705046 1060312 1 5.49 0.5
## 4 906 319 31198705046 1075313 1 1.5 0.29
## 5 993 32004 31198515122 840361 1 1 0.09
## 6 993 32004 31198515122 859075 1 1.67 0.22
## 7 993 32004 31198515122 866227 2 1 0
## 8 993 32004 31198515122 1039156 1 1.5 0
## 9 993 32004 31198515122 1060269 1 0.67 0.22
## 10 993 32004 31198515122 8118533 1 3.49 0.5
## # ℹ 302,233 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>, age <ord>, income <ord>,
## # home_ownership <ord>, marital_status <ord>, household_size <ord>,
## # household_comp <ord>, kids_count <ord>
# Demographic Objects
sales_per_household_size <- special_demographics %>%
inner_join(transactions) %>%
group_by(household_size) %>%
summarize(total_sales = sum(sales_value))
## Joining with `by = join_by(household_id)`
sales_per_age_group <- special_demographics %>%
inner_join(transactions) %>%
group_by(age) %>%
summarize(total_sales = sum(sales_value))
## Joining with `by = join_by(household_id)`
print(sales_per_age_group)
## # A tibble: 6 × 2
## age total_sales
## <ord> <dbl>
## 1 19-24 125673.
## 2 25-34 453372.
## 3 35-44 724357.
## 4 45-54 971822.
## 5 55-64 173154.
## 6 65+ 176601.
print(sales_per_household_size)
## # A tibble: 5 × 2
## household_size total_sales
## <ord> <dbl>
## 1 1 754649.
## 2 2 1024643.
## 3 3 383256.
## 4 4 207407.
## 5 5+ 255024.
#Verification that objects were properly joined
sum(married_transactions$sales_value) / sum(transactions$sales_value)
## [1] 0.2653885
sum(singles_transactions$sales_value) / sum(transactions$sales_value)
## [1] 0.2087272
sum(undefined_transactions$sales_value) / sum(transactions$sales_value)
## [1] 0.5258843
# Comparing Summary Statistics for the created objects; for a high level analysis
summary(singles_transactions)
## household_id store_id basket_id product_id
## Length:314225 Length:314225 Length:314225 Length:314225
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## quantity sales_value retail_disc coupon_disc
## Min. : 0.0 Min. : 0.000 Min. : 0.0000 Min. : 0.00000
## 1st Qu.: 1.0 1st Qu.: 1.250 1st Qu.: 0.0000 1st Qu.: 0.00000
## Median : 1.0 Median : 2.000 Median : 0.0000 Median : 0.00000
## Mean : 105.8 Mean : 3.053 Mean : 0.5216 Mean : 0.01835
## 3rd Qu.: 1.0 3rd Qu.: 3.440 3rd Qu.: 0.6500 3rd Qu.: 0.00000
## Max. :35077.0 Max. :224.990 Max. :130.0200 Max. :55.93000
##
## coupon_match_disc week transaction_timestamp
## Min. :0.000000 Min. : 1.00 Min. :2017-01-01 07:26:30.00
## 1st Qu.:0.000000 1st Qu.:15.00 1st Qu.:2017-04-03 17:27:12.00
## Median :0.000000 Median :28.00 Median :2017-07-04 16:40:18.00
## Mean :0.003412 Mean :27.59 Mean :2017-07-03 23:26:37.69
## 3rd Qu.:0.000000 3rd Qu.:41.00 3rd Qu.:2017-10-03 15:45:39.00
## Max. :2.500000 Max. :53.00 Max. :2017-12-31 23:01:20.00
##
## age income home_ownership
## 19-24: 27741 50-74K :87267 Renter : 24461
## 25-34: 55890 35-49K :67806 Probable Renter : 10204
## 35-44: 72667 Under 15K:38449 Homeowner :124497
## 45-54:119997 25-34K :36570 Probable Homeowner: 6414
## 55-64: 15924 15-24K :29356 Unknown : 0
## 65+ : 22006 75-99K :18942 NA's :148649
## (Other) :35835
## marital_status household_size household_comp kids_count
## Married : 0 1 :245464 1 Adult Kids : 43533 0 :265397
## Unmarried:314225 2 : 36732 1 Adult No Kids :245464 1 : 19785
## Unknown : 0 3 : 19784 2 Adults Kids : 5295 2 : 18184
## 4 : 6722 2 Adults No Kids: 19933 3+ : 10859
## 5+: 5523 Unknown : 0 Unknown: 0
##
##
summary(married_transactions)
## household_id store_id basket_id product_id
## Length:373193 Length:373193 Length:373193 Length:373193
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## quantity sales_value retail_disc coupon_disc
## Min. : 0.0 Min. : 0.000 Min. : 0.0000 Min. : 0.00000
## 1st Qu.: 1.0 1st Qu.: 1.380 1st Qu.: 0.0000 1st Qu.: 0.00000
## Median : 1.0 Median : 2.290 Median : 0.0200 Median : 0.00000
## Mean : 130.6 Mean : 3.268 Mean : 0.5484 Mean : 0.02428
## 3rd Qu.: 1.0 3rd Qu.: 3.560 3rd Qu.: 0.6900 3rd Qu.: 0.00000
## Max. :89638.0 Max. :840.000 Max. :90.0500 Max. :31.46000
##
## coupon_match_disc week transaction_timestamp
## Min. :0.000000 Min. : 1.00 Min. :2017-01-01 06:53:26.00
## 1st Qu.:0.000000 1st Qu.:15.00 1st Qu.:2017-04-03 18:54:34.00
## Median :0.000000 Median :28.00 Median :2017-07-04 15:42:07.00
## Mean :0.004538 Mean :27.58 Mean :2017-07-03 22:32:12.91
## 3rd Qu.:0.000000 3rd Qu.:41.00 3rd Qu.:2017-10-02 12:00:37.00
## Max. :4.050000 Max. :53.00 Max. :2017-12-31 22:44:38.00
##
## age income home_ownership marital_status
## 19-24: 7318 50-74K :83188 Renter : 21479 Married :373193
## 25-34: 59334 35-49K :64627 Probable Renter : 558 Unmarried: 0
## 35-44:116760 75-99K :63560 Homeowner :326495 Unknown : 0
## 45-54:127304 125-149K:33780 Probable Homeowner: 4865
## 55-64: 32297 25-34K :23993 Unknown : 0
## 65+ : 30180 150-174K:23827 NA's : 19796
## (Other) :80218
## household_size household_comp kids_count
## 1 : 0 1 Adult Kids : 50470 0 :149502
## 2 :186162 1 Adult No Kids : 0 1 :113677
## 3 : 77017 2 Adults Kids :173221 2 : 43502
## 4 : 43502 2 Adults No Kids:149502 3+ : 66512
## 5+: 66512 Unknown : 0 Unknown: 0
##
##
summary(undefined_transactions)
## household_id store_id basket_id product_id
## Length:781889 Length:781889 Length:781889 Length:781889
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## quantity sales_value retail_disc coupon_disc
## Min. : 0.00 Min. : 0.000 Min. : 0.0000 Min. : 0.00000
## 1st Qu.: 1.00 1st Qu.: 1.250 1st Qu.: 0.0000 1st Qu.: 0.00000
## Median : 1.00 Median : 2.000 Median : 0.0100 Median : 0.00000
## Mean : 90.72 Mean : 3.091 Mean : 0.5411 Mean : 0.01475
## 3rd Qu.: 1.00 3rd Qu.: 3.490 3rd Qu.: 0.6700 3rd Qu.: 0.00000
## Max. :41833.00 Max. :499.990 Max. :100.0000 Max. :37.93000
##
## coupon_match_disc week transaction_timestamp
## Min. :0.000000 Min. : 1.00 Min. :2017-01-01 07:56:33.0
## 1st Qu.:0.000000 1st Qu.:14.00 1st Qu.:2017-03-30 18:42:18.0
## Median :0.000000 Median :27.00 Median :2017-06-29 23:42:49.0
## Mean :0.002272 Mean :27.18 Mean :2017-07-01 03:20:01.0
## 3rd Qu.:0.000000 3rd Qu.:40.00 3rd Qu.:2017-10-01 18:45:03.0
## Max. :7.700000 Max. :53.00 Max. :2017-12-31 22:50:03.0
##
## age income home_ownership marital_status
## 19-24: 8854 50-74K : 32464 Renter : 2244 Married : 0
## 25-34: 30087 35-49K : 30408 Probable Renter : 0 Unmarried: 0
## 35-44: 33797 75-99K : 18430 Homeowner : 85545 Unknown : 0
## 45-54: 56082 15-24K : 14531 Probable Homeowner: 1181 NA's :781889
## 55-64: 4526 25-34K : 14248 Unknown : 0
## 65+ : 8086 (Other): 31351 NA's :692919
## NA's :640457 NA's :640457
## household_size household_comp kids_count
## 1 : 0 1 Adult Kids : 0 0 : 93933
## 2 : 93933 1 Adult No Kids : 0 1 : 26181
## 3 : 26181 2 Adults Kids : 47499 2 : 17016
## 4 : 17016 2 Adults No Kids: 93933 3+ : 4302
## 5+ : 4302 Unknown : 0 Unknown: 0
## NA's:640457 NA's :640457 NA's :640457
##
# Comparison between transaction objects to determine most lucrative customer type
married_average_transactions <- married_transactions %>%
select(income, household_id,transaction_timestamp) %>%
mutate(N = 1) %>%
group_by(income) %>%
summarise(sum(N)/length(unique(household_id))) %>%
rename("average_transactions" = 'sum(N)/length(unique(household_id))' )
married_average_transactions
## # A tibble: 12 × 2
## income average_transactions
## <ord> <dbl>
## 1 Under 15K 937.
## 2 15-24K 745.
## 3 25-34K 1043.
## 4 35-49K 965.
## 5 50-74K 1124.
## 6 75-99K 1156.
## 7 100-124K 977.
## 8 125-149K 1408.
## 9 150-174K 1702.
## 10 175-199K 1156.
## 11 200-249K 830
## 12 250K+ 1634.
singles_average_transactions <- singles_transactions %>%
select(income, household_id,transaction_timestamp) %>%
mutate(N = 1) %>%
group_by(income) %>%
summarise(sum(N)/length(unique(household_id))) %>%
rename("singles_average_transactions" = 'sum(N)/length(unique(household_id))' )
singles_average_transactions
## # A tibble: 11 × 2
## income singles_average_transactions
## <ord> <dbl>
## 1 Under 15K 1131.
## 2 15-24K 863.
## 3 25-34K 914.
## 4 35-49K 929.
## 5 50-74K 992.
## 6 75-99K 824.
## 7 100-124K 991.
## 8 125-149K 957.
## 9 150-174K 1175.
## 10 175-199K 1307.
## 11 250K+ 1802.
undefined_average_transactions <- undefined_transactions %>%
select(income, household_id,transaction_timestamp) %>%
mutate(N = 1) %>%
group_by(income) %>%
summarise(sum(N)/length(unique(household_id))) %>%
rename("undefined_average_transactions" = 'sum(N)/length(unique(household_id))' )
undefined_average_transactions
## # A tibble: 11 × 2
## income undefined_average_transactions
## <ord> <dbl>
## 1 Under 15K 1000.
## 2 15-24K 908.
## 3 25-34K 1018.
## 4 35-49K 950.
## 5 50-74K 1082.
## 6 75-99K 1024.
## 7 125-149K 1324.
## 8 150-174K 1149
## 9 175-199K 1208.
## 10 250K+ 1470
## 11 <NA> 384.
To gain a deeper understanding of customer spending patterns, we examined household size and age group across these segments. We categorized households by size, focusing on those with 2 or more members, as our initial exploration suggested they contributed the highest sales. Additionally, we analyzed customer age groups, paying particular attention to the 35-54 age group, which showed the highest engagement and spending across all marital statuses. This segmentation allowed us to refine our target audience and tailor marketing strategies accordingly
Our analysis revealed several key insights that significantly impact strategic marketing decisions. First, Married households consistently contributed the highest percentage of total sales. The driving force behind this was larger households with 2 or more members, which exhibited the highest average transaction values and total sales. This segment’s purchasing power underscores the importance of targeting married customers with larger household sizes, as they are more likely to yield higher returns. Consequently, marketing efforts should focus on tailored campaigns for this demographic to maximize revenue growth.
# Plots per household size
married_transactions %>% #Married Household
group_by(household_size) %>%
summarize(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(x = factor(household_size), y = total_sales, fill = factor(household_size))) +
geom_col() +
labs(title = "Total Sales by Household Size",
x = "Household Size",
y = "Total Sales",
fill = "Household Size") +
theme_minimal() +
scale_fill_brewer(palette = "Set3") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))
In contrast, Singles (Unmarried) showed moderate sales but with lower average transaction values. The spending patterns in this segment did not demonstrate significant revenue growth, leading us to deprioritize this demographic for targeted marketing campaigns. However, maintaining a baseline level of engagement remains important to retain brand loyalty within this group.
singles_transactions %>% #Single Household
group_by(household_size) %>%
summarize(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(x = factor(household_size), y = total_sales, fill = factor(household_size))) +
geom_col() +
labs(title = "Total Sales by Household Size",
x = "Household Size",
y = "Total Sales",
fill = "Household Size") +
theme_minimal() +
scale_fill_brewer(palette = "Set3") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))
The Undefined demographic presented an interesting opportunity. This group includes customers with no registered marital status (NA) but with household sizes of 2 or more members, suggesting they might be married but unregistered in the system. This demographic displayed substantial spending patterns, contributing to a significant increase in overall sales. The data indicates that large household sizes within the undefined group behave similarly to married households in terms of spending behavior. Therefore, it is strategic to include them in campaigns targeting larger households, ensuring no potential high-value customers are missed.
undefined_transactions %>% #Undefined Household
group_by(household_size) %>%
summarize(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(x = factor(household_size), y = total_sales, fill = factor(household_size))) +
geom_col() +
labs(title = "Total Sales by Household Size",
x = "Household Size",
y = "Total Sales",
fill = "Household Size") +
theme_minimal() +
scale_fill_brewer(palette = "Set3") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))
## Results
Our Household Size Analysis confirmed that larger households consistently drive higher total sales across all segments. Married households with 2+ members were the most profitable, while smaller households (less than 2 members) exhibited lower spending patterns. This reinforces the strategic decision to prioritize marketing campaigns for larger households to maximize revenue.
The Age Group Analysis provided further clarity on customer engagement and spending patterns. The 35-54 age group emerged as the most lucrative, demonstrating the highest engagement and total sales across all marital statuses. This demographic is highly responsive to targeted campaigns, making them a critical audience for future marketing efforts
# Plots per age group
married_transactions %>% #Married in age group
group_by(age) %>%
summarize(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(x = factor(age), y = total_sales, fill = factor(age))) +
geom_col() +
labs(title = "Total Sales by Age Group",
x = "Age",
y = "Total Sales",
fill = "Age Group") +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))
singles_transactions %>% #Single Age Group
group_by(age) %>%
summarize(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(x = factor(age), y = total_sales, fill = factor(age))) +
geom_col() +
labs(title = "Total Sales by Age Group",
x = "Age",
y = "Total Sales",
fill = "Age Group") +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))
undefined_transactions %>% #Undefined Age Group
group_by(age) %>%
summarize(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(x = factor(age), y = total_sales, fill = factor(age))) +
geom_col() +
labs(title = "Total Sales by Age Group",
x = "Age",
y = "Total Sales",
fill = "Age Group") +
theme_minimal() +
scale_fill_brewer(palette = "Pastel1") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))
# Campaign Performance Analysis
In parallel, we performed a campaign performance analysis to evaluate the effectiveness of various campaign types across different customer segments. This analysis aimed to understand how campaign type, household size, and age group influence sales performance. Using boxplots and bar charts, we visualized sales distribution and campaign effectiveness. Boxplots provided a detailed view of sales distribution by campaign type and demographic, while bar charts illustrated total sales comparisons across segments. This dual approach enabled us to identify which campaign types resonated best with high-value customer groups.
Our Campaign Performance Analysis highlighted the significant impact of household size and age group on campaign success. Boxplots visualized clear evidence that households with 2+ members consistently yielded the highest sales, and the 35-54 age group performed best across all campaign types. Interestingly, Campaign Type A resonated most with married customers, especially within the 35-54 age group, confirming its effectiveness for high-value segments
# For Created Marital Category Transactions
married_transactions %>%
left_join(campaigns, by = "household_id") %>%
group_by(campaign_id) %>%
summarize(top_campaign = sum(sales_value, na.rm = TRUE)) %>%
mutate(campaign_id = as.numeric(campaign_id)) %>%
arrange(campaign_id) %>%
ggplot(aes(x = factor(campaign_id), y = top_campaign)) +
geom_col(fill = "steelblue") +
labs(title = "Married Sales per Campaign", x = "Campaign ID", y = "Total Revenue") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
theme_minimal()
By narrowing our target to married households with 2+ members within the 35-54 age group, we can maximize marketing effectiveness and return on investment
singles_transactions %>%
left_join(campaigns, by = "household_id") %>%
group_by(campaign_id) %>%
summarize(top_campaign = sum(sales_value, na.rm = TRUE)) %>%
mutate(campaign_id = as.numeric(campaign_id)) %>%
arrange(campaign_id) %>%
ggplot(aes(x = factor(campaign_id), y = top_campaign)) +
geom_col(fill = "steelblue") +
labs(title = "Singles Sales per Campaign", x = "Campaign ID", y = "Total Revenue") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
theme_minimal()
The analysis also showed that smaller households were less engaged and had lower average sales per campaign, reinforcing our focus on larger households for future campaigns
undefined_transactions %>%
left_join(campaigns, by = "household_id") %>%
group_by(campaign_id) %>%
summarize(top_campaign = sum(sales_value, na.rm = TRUE)) %>%
mutate(campaign_id = as.numeric(campaign_id)) %>%
arrange(campaign_id) %>%
ggplot(aes(x = factor(campaign_id), y = top_campaign)) +
geom_col(fill = "steelblue") +
labs(title = "Undefined Sales per Campaign", x = "Campaign ID", y = "Total Revenue") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
theme_minimal()
plot_tm <- target_market %>%
left_join(campaigns, by = "household_id") %>%
left_join(campaign_descriptions, by = "campaign_id") %>%
mutate(log_sales = log1p(sales_value)) %>%
sample_frac(0.1) %>%
filter(!is.na(campaign_type)) %>%
filter(!is.na(household_size))
plot_tm %>%
ggplot(aes(x = campaign_type, y = sales_value, fill = factor(household_size))) +
geom_boxplot(outlier.shape = NA, alpha = 0.3) +
geom_jitter(color = "black", size = 0.25, alpha = 0.5,
position = position_dodge(width = 0.8)) +
scale_fill_brewer(palette = "Set1") +
labs(title = "Target Market Campaign by Household Size",
x = "Campaign Type",
y = "Total Sales",
fill = "Household Size") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "top") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","),
limits = c(0, 50)) +
coord_flip()
plot_ntm <- singles_transactions %>%
left_join(campaigns, by = "household_id") %>%
left_join(campaign_descriptions, by = "campaign_id") %>%
mutate(log_sales = log1p(sales_value)) %>%
sample_frac(0.1) %>%
filter(!is.na(campaign_type)) %>%
filter(!is.na(household_size))
plot_ntm %>%
ggplot(aes(x = campaign_type, y = sales_value, fill = factor(household_size))) +
geom_boxplot(outlier.shape = NA, alpha = 0.3) +
geom_jitter(color = "black", size = 0.25, alpha = 0.5,
position = position_dodge(width = 0.8)) +
scale_fill_brewer(palette = "Set1") +
labs(title = "Campaign Peformance for Singles",
x = "Campaign Type",
y = "Total Sales",
fill = "Household Size") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "top") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","),
limits = c(0, 50)) +
coord_flip()
plot_ta <- target_age %>%
left_join(campaigns, by = "household_id") %>%
left_join(campaign_descriptions, by = "campaign_id") %>%
mutate(log_sales = log1p(sales_value)) %>%
sample_frac(0.1) %>%
filter(!is.na(campaign_type))
plot_ta %>%
ggplot(aes(x = campaign_type, y = sales_value, fill = factor(age))) +
geom_boxplot(outlier.shape = NA, alpha = 0.3) +
geom_jitter(color = "black", size = 0.25, alpha = 0.5,
position = position_dodge(width = 0.8)) +
scale_fill_brewer(palette = "Set1") +
labs(title = "Campaign Peformance by Target Age Groups",
x = "Campaign Type",
y = "Total Sales",
fill = "Age Group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "top") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","),
limits = c(0, 50)) +
coord_flip()
plot_nta <- non_target_age %>%
left_join(campaigns, by = "household_id") %>%
left_join(campaign_descriptions, by = "campaign_id") %>%
mutate(log_sales = log1p(sales_value)) %>%
sample_frac(0.1) %>%
filter(!is.na(campaign_type))
plot_nta %>%
ggplot(aes(x = campaign_type, y = sales_value, fill = factor(age))) +
geom_boxplot(outlier.shape = NA, alpha = 0.3) +
geom_jitter(color = "black", size = 0.25, alpha = 0.5,
position = position_dodge(width = 0.8)) +
scale_fill_brewer(palette = "Set1") +
labs(title = "Campaign Peformance by Nontarget Age Group",
x = "Campaign Type",
y = "Total Sales",
fill = "Age Group") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "top") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","),
limits = c(0, 50)) +
coord_flip()
Furthermore, we conducted a Top Store Performance Analysis to identify the top 5 stores contributing the highest revenue during campaigns. This analysis helped us pinpoint high-performing stores that consistently showed high customer engagement and spending. Understanding store-level performance enabled us to optimize promotional efforts, ensuring maximum return on investment
#Finding the top five stores to promote our campaign
top_5_stores <- transactions %>%
left_join(campaigns, by = "household_id") %>%
group_by(store_id) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_sales)) %>%
slice_head(n = 5) %>%
pull(store_id)
transactions %>%
left_join(campaigns, by = "household_id") %>%
filter(store_id %in% top_5_stores) %>%
group_by(store_id, campaign_id) %>%
summarise(top_store = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
mutate(campaign_id = as.numeric(campaign_id)) %>%
arrange(desc(top_store)) %>%
ggplot(aes(x = factor(campaign_id), y = top_store, group = store_id, color = store_id)) +
geom_line(size = 1.2) +
geom_point(size = 3) +
facet_wrap(~ store_id) +
scale_color_brewer(palette = "Dark2") +
scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 2)]) +
labs(title = "Top 5 Stores by Campaign Performance",
x = "Campaign ID", y = "Total Revenue",
color = "Store ID") +
scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
theme_minimal() +
theme(legend.position = c(0.9, 0.1))
The Top Store Performance Analysis revealed that the Top 5 stores consistently contributed the highest revenue during campaigns. These stores demonstrated high customer engagement and spending, making them ideal targets for promotional activities. By focusing on these high-performing stores, we can enhance campaign reach and maximize revenue growth
We strongly encourage Regork must focus marketing efforts on married households, especially those with 2+ members within the age group 35-54.. These customers consistently show the highest total sales and engagement leading to yield the best ROI. While the successful campaigns continue to use high-performing campaign types, this will reevaluate underperforming stores. These unperforming stores will have a likely chance to maximize engagement if they engage more with the profitable target market. In summary, Regork must consider (1) allocate marketing budget more efficiently, (2) enhance customer engagement by targeting high-value segments, (3) optimize campaign types to maximize ROI, and (4) leverage top-performing stores to boost sales during promotional periods. By focusing on the most lucrative customer segments and optimizing campaign strategies, Regork can maximize sales and build long-term customer loyalty