Introduction and Methodology

Using the top tier stores we want to propose a training campaign to observe what makes these stores thrive during Type A campaigns. These stores boast the best sales performance across our campaigns and cultivate a family friendly environment. Regork can leverage their hands on experience to train sub-standard stores to best practice for increased revenue.

# Load necessary libraries with explanations and warning suppression
suppressPackageStartupMessages({
  library(completejourney)  # For accessing transaction and demographic data
  library(dplyr)            # For data manipulation and joining datasets
  library(ggplot2)          # For data visualization
  library(RColorBrewer)     # For enhanced color palettes in visualizations
  library(lubridate)        # For date manipulation and trend analysis
  library(broom)            # For cleaning model outputs
  library(spatial)          # For time-series analysis
  library(tidyr)            # For pivot_wider (replacing spread)
  library(scales)
})

Methodology

Our methodology to answer the business question focused on strategically segmenting the customer base and conducting a comprehensive campaign performance analysis. We began by segmenting customers into three primary demographics: Married, Unmarried (Singles), and Undefined. The undefined category included households with no registered marital status (NA) but with 2+ members, indicating a potentially profitable segment. This segmentation was crucial to understanding purchasing behavior across different life stages and household compositions

# Preparing Data Frame with Objects we will use often in our exploration
transactions <- get_transactions()
special_demographics <- demographics
value_products <- products
promotions <- get_promotions()

married_transactions <- transactions %>%
  full_join(special_demographics, by = "household_id") %>%
  filter(marital_status == "Married")
print(married_transactions)
## # A tibble: 373,193 × 18
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 906          319      31198705046 1020156           1        1.5         0.29
##  4 906          319      31198705046 1053875           2        2.78        0.8 
##  5 906          319      31198705046 1060312           1        5.49        0.5 
##  6 906          319      31198705046 1075313           1        1.5         0.29
##  7 993          32004    31198515122 840361            1        1           0.09
##  8 993          32004    31198515122 859075            1        1.67        0.22
##  9 993          32004    31198515122 866227            2        1           0   
## 10 993          32004    31198515122 1039156           1        1.5         0   
## # ℹ 373,183 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>
singles_transactions <- transactions %>%
  full_join(special_demographics, by = "household_id") %>%
  filter(marital_status == "Unmarried")
print(singles_transactions)
## # A tibble: 314,225 × 18
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 1228         406      31198655051 1041453           1        1.43        0.15
##  2 1419         32004    31198515072 1037894           1        1           0.29
##  3 1419         32004    31198515072 1069175           1        1.29        0   
##  4 1873         361      31198640134 834484            1        0.66        0   
##  5 1873         361      31198640134 852437            1        1.5         0.29
##  6 1873         361      31198640134 862854            1        2           0.69
##  7 1873         361      31198640134 926905            1        3           1.59
##  8 1873         361      31198640134 995211            1        1.88        0.61
##  9 1873         361      31198640134 5568378           1        1.88        0.87
## 10 1873         361      31198640137 868473            1        1.05        0   
## # ℹ 314,215 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>
undefined_transactions <- transactions %>%
  full_join(special_demographics, by = "household_id") %>%
  filter(is.na(marital_status))
print(undefined_transactions)
## # A tibble: 781,889 × 18
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 1058         381      31198676055 985893            1        1.88        0.21
##  2 1058         381      31198676055 988791            1        1.5         1.29
##  3 1058         381      31198676055 9297106           1        2.69        0   
##  4 749          346      31198970033 1082310           1        1.61        0   
##  5 749          346      31198970033 5564931           1        1.59        0   
##  6 850          292      31198700070 995242            1        1.99        0   
##  7 850          292      31198700070 1022254           1        1.27        0   
##  8 850          292      31198700070 1082185           1        0.97        0   
##  9 70           380      31198975059 1039156           1        1.5         0   
## 10 83           381      31198676089 1046485           1       11.0         0   
## # ℹ 781,879 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>
target_market <- transactions %>%
  full_join(special_demographics, by = "household_id") %>%
  filter(marital_status == "Married" | is.na(marital_status))
print(target_market)
## # A tibble: 1,155,082 × 18
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 906          319      31198705046 1020156           1        1.5         0.29
##  4 906          319      31198705046 1053875           2        2.78        0.8 
##  5 906          319      31198705046 1060312           1        5.49        0.5 
##  6 906          319      31198705046 1075313           1        1.5         0.29
##  7 1058         381      31198676055 985893            1        1.88        0.21
##  8 1058         381      31198676055 988791            1        1.5         1.29
##  9 1058         381      31198676055 9297106           1        2.69        0   
## 10 749          346      31198970033 1082310           1        1.61        0   
## # ℹ 1,155,072 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>
target_age <- transactions %>%
  full_join(special_demographics, by = "household_id") %>%
  filter(age %in% c("35-44", "45-54"))
print(target_age)
## # A tibble: 526,607 × 18
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 1419         32004    31198515072 1037894           1        1           0.29
##  5 1419         32004    31198515072 1069175           1        1.29        0   
##  6 1873         361      31198640134 834484            1        0.66        0   
##  7 1873         361      31198640134 852437            1        1.5         0.29
##  8 1873         361      31198640134 862854            1        2           0.69
##  9 1873         361      31198640134 926905            1        3           1.59
## 10 1873         361      31198640134 995211            1        1.88        0.61
## # ℹ 526,597 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>
non_target_age <- transactions %>%
  full_join(special_demographics, by = "household_id") %>%
  filter(!(age == "35-44" | age == "45-54"))
print(non_target_age)
## # A tibble: 302,243 × 18
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 906          319      31198705046 1020156           1        1.5         0.29
##  2 906          319      31198705046 1053875           2        2.78        0.8 
##  3 906          319      31198705046 1060312           1        5.49        0.5 
##  4 906          319      31198705046 1075313           1        1.5         0.29
##  5 993          32004    31198515122 840361            1        1           0.09
##  6 993          32004    31198515122 859075            1        1.67        0.22
##  7 993          32004    31198515122 866227            2        1           0   
##  8 993          32004    31198515122 1039156           1        1.5         0   
##  9 993          32004    31198515122 1060269           1        0.67        0.22
## 10 993          32004    31198515122 8118533           1        3.49        0.5 
## # ℹ 302,233 more rows
## # ℹ 11 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>, age <ord>, income <ord>,
## #   home_ownership <ord>, marital_status <ord>, household_size <ord>,
## #   household_comp <ord>, kids_count <ord>
# Demographic Objects
sales_per_household_size <- special_demographics %>%
  inner_join(transactions) %>%
  group_by(household_size) %>%
  summarize(total_sales = sum(sales_value)) 
## Joining with `by = join_by(household_id)`
sales_per_age_group <- special_demographics %>%
  inner_join(transactions) %>%
  group_by(age) %>%
  summarize(total_sales = sum(sales_value))
## Joining with `by = join_by(household_id)`
print(sales_per_age_group)
## # A tibble: 6 × 2
##   age   total_sales
##   <ord>       <dbl>
## 1 19-24     125673.
## 2 25-34     453372.
## 3 35-44     724357.
## 4 45-54     971822.
## 5 55-64     173154.
## 6 65+       176601.
print(sales_per_household_size)
## # A tibble: 5 × 2
##   household_size total_sales
##   <ord>                <dbl>
## 1 1                  754649.
## 2 2                 1024643.
## 3 3                  383256.
## 4 4                  207407.
## 5 5+                 255024.
#Verification that objects were properly joined
sum(married_transactions$sales_value) / sum(transactions$sales_value)
## [1] 0.2653885
sum(singles_transactions$sales_value) / sum(transactions$sales_value)
## [1] 0.2087272
sum(undefined_transactions$sales_value) / sum(transactions$sales_value) 
## [1] 0.5258843
# Comparing Summary Statistics for the created objects; for a high level analysis
summary(singles_transactions)
##  household_id         store_id          basket_id          product_id       
##  Length:314225      Length:314225      Length:314225      Length:314225     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     quantity        sales_value       retail_disc        coupon_disc      
##  Min.   :    0.0   Min.   :  0.000   Min.   :  0.0000   Min.   : 0.00000  
##  1st Qu.:    1.0   1st Qu.:  1.250   1st Qu.:  0.0000   1st Qu.: 0.00000  
##  Median :    1.0   Median :  2.000   Median :  0.0000   Median : 0.00000  
##  Mean   :  105.8   Mean   :  3.053   Mean   :  0.5216   Mean   : 0.01835  
##  3rd Qu.:    1.0   3rd Qu.:  3.440   3rd Qu.:  0.6500   3rd Qu.: 0.00000  
##  Max.   :35077.0   Max.   :224.990   Max.   :130.0200   Max.   :55.93000  
##                                                                           
##  coupon_match_disc       week       transaction_timestamp           
##  Min.   :0.000000   Min.   : 1.00   Min.   :2017-01-01 07:26:30.00  
##  1st Qu.:0.000000   1st Qu.:15.00   1st Qu.:2017-04-03 17:27:12.00  
##  Median :0.000000   Median :28.00   Median :2017-07-04 16:40:18.00  
##  Mean   :0.003412   Mean   :27.59   Mean   :2017-07-03 23:26:37.69  
##  3rd Qu.:0.000000   3rd Qu.:41.00   3rd Qu.:2017-10-03 15:45:39.00  
##  Max.   :2.500000   Max.   :53.00   Max.   :2017-12-31 23:01:20.00  
##                                                                     
##     age               income                 home_ownership  
##  19-24: 27741   50-74K   :87267   Renter            : 24461  
##  25-34: 55890   35-49K   :67806   Probable Renter   : 10204  
##  35-44: 72667   Under 15K:38449   Homeowner         :124497  
##  45-54:119997   25-34K   :36570   Probable Homeowner:  6414  
##  55-64: 15924   15-24K   :29356   Unknown           :     0  
##  65+  : 22006   75-99K   :18942   NA's              :148649  
##                 (Other)  :35835                              
##    marital_status   household_size          household_comp     kids_count    
##  Married  :     0   1 :245464      1 Adult Kids    : 43533   0      :265397  
##  Unmarried:314225   2 : 36732      1 Adult No Kids :245464   1      : 19785  
##  Unknown  :     0   3 : 19784      2 Adults Kids   :  5295   2      : 18184  
##                     4 :  6722      2 Adults No Kids: 19933   3+     : 10859  
##                     5+:  5523      Unknown         :     0   Unknown:     0  
##                                                                              
## 
summary(married_transactions)
##  household_id         store_id          basket_id          product_id       
##  Length:373193      Length:373193      Length:373193      Length:373193     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     quantity        sales_value       retail_disc       coupon_disc      
##  Min.   :    0.0   Min.   :  0.000   Min.   : 0.0000   Min.   : 0.00000  
##  1st Qu.:    1.0   1st Qu.:  1.380   1st Qu.: 0.0000   1st Qu.: 0.00000  
##  Median :    1.0   Median :  2.290   Median : 0.0200   Median : 0.00000  
##  Mean   :  130.6   Mean   :  3.268   Mean   : 0.5484   Mean   : 0.02428  
##  3rd Qu.:    1.0   3rd Qu.:  3.560   3rd Qu.: 0.6900   3rd Qu.: 0.00000  
##  Max.   :89638.0   Max.   :840.000   Max.   :90.0500   Max.   :31.46000  
##                                                                          
##  coupon_match_disc       week       transaction_timestamp           
##  Min.   :0.000000   Min.   : 1.00   Min.   :2017-01-01 06:53:26.00  
##  1st Qu.:0.000000   1st Qu.:15.00   1st Qu.:2017-04-03 18:54:34.00  
##  Median :0.000000   Median :28.00   Median :2017-07-04 15:42:07.00  
##  Mean   :0.004538   Mean   :27.58   Mean   :2017-07-03 22:32:12.91  
##  3rd Qu.:0.000000   3rd Qu.:41.00   3rd Qu.:2017-10-02 12:00:37.00  
##  Max.   :4.050000   Max.   :53.00   Max.   :2017-12-31 22:44:38.00  
##                                                                     
##     age              income                 home_ownership     marital_status  
##  19-24:  7318   50-74K  :83188   Renter            : 21479   Married  :373193  
##  25-34: 59334   35-49K  :64627   Probable Renter   :   558   Unmarried:     0  
##  35-44:116760   75-99K  :63560   Homeowner         :326495   Unknown  :     0  
##  45-54:127304   125-149K:33780   Probable Homeowner:  4865                     
##  55-64: 32297   25-34K  :23993   Unknown           :     0                     
##  65+  : 30180   150-174K:23827   NA's              : 19796                     
##                 (Other) :80218                                                 
##  household_size          household_comp     kids_count    
##  1 :     0      1 Adult Kids    : 50470   0      :149502  
##  2 :186162      1 Adult No Kids :     0   1      :113677  
##  3 : 77017      2 Adults Kids   :173221   2      : 43502  
##  4 : 43502      2 Adults No Kids:149502   3+     : 66512  
##  5+: 66512      Unknown         :     0   Unknown:     0  
##                                                           
## 
summary(undefined_transactions)
##  household_id         store_id          basket_id          product_id       
##  Length:781889      Length:781889      Length:781889      Length:781889     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     quantity         sales_value       retail_disc        coupon_disc      
##  Min.   :    0.00   Min.   :  0.000   Min.   :  0.0000   Min.   : 0.00000  
##  1st Qu.:    1.00   1st Qu.:  1.250   1st Qu.:  0.0000   1st Qu.: 0.00000  
##  Median :    1.00   Median :  2.000   Median :  0.0100   Median : 0.00000  
##  Mean   :   90.72   Mean   :  3.091   Mean   :  0.5411   Mean   : 0.01475  
##  3rd Qu.:    1.00   3rd Qu.:  3.490   3rd Qu.:  0.6700   3rd Qu.: 0.00000  
##  Max.   :41833.00   Max.   :499.990   Max.   :100.0000   Max.   :37.93000  
##                                                                            
##  coupon_match_disc       week       transaction_timestamp          
##  Min.   :0.000000   Min.   : 1.00   Min.   :2017-01-01 07:56:33.0  
##  1st Qu.:0.000000   1st Qu.:14.00   1st Qu.:2017-03-30 18:42:18.0  
##  Median :0.000000   Median :27.00   Median :2017-06-29 23:42:49.0  
##  Mean   :0.002272   Mean   :27.18   Mean   :2017-07-01 03:20:01.0  
##  3rd Qu.:0.000000   3rd Qu.:40.00   3rd Qu.:2017-10-01 18:45:03.0  
##  Max.   :7.700000   Max.   :53.00   Max.   :2017-12-31 22:50:03.0  
##                                                                    
##     age             income                  home_ownership     marital_status  
##  19-24:  8854   50-74K : 32464   Renter            :  2244   Married  :     0  
##  25-34: 30087   35-49K : 30408   Probable Renter   :     0   Unmarried:     0  
##  35-44: 33797   75-99K : 18430   Homeowner         : 85545   Unknown  :     0  
##  45-54: 56082   15-24K : 14531   Probable Homeowner:  1181   NA's     :781889  
##  55-64:  4526   25-34K : 14248   Unknown           :     0                     
##  65+  :  8086   (Other): 31351   NA's              :692919                     
##  NA's :640457   NA's   :640457                                                 
##  household_size          household_comp     kids_count    
##  1   :     0    1 Adult Kids    :     0   0      : 93933  
##  2   : 93933    1 Adult No Kids :     0   1      : 26181  
##  3   : 26181    2 Adults Kids   : 47499   2      : 17016  
##  4   : 17016    2 Adults No Kids: 93933   3+     :  4302  
##  5+  :  4302    Unknown         :     0   Unknown:     0  
##  NA's:640457    NA's            :640457   NA's   :640457  
## 
# Comparison between transaction objects to determine most lucrative customer type
married_average_transactions <- married_transactions %>%
  select(income, household_id,transaction_timestamp) %>%
  mutate(N = 1) %>%
  group_by(income) %>%
  summarise(sum(N)/length(unique(household_id))) %>%
  rename("average_transactions" = 'sum(N)/length(unique(household_id))' )

married_average_transactions
## # A tibble: 12 × 2
##    income    average_transactions
##    <ord>                    <dbl>
##  1 Under 15K                 937.
##  2 15-24K                    745.
##  3 25-34K                   1043.
##  4 35-49K                    965.
##  5 50-74K                   1124.
##  6 75-99K                   1156.
##  7 100-124K                  977.
##  8 125-149K                 1408.
##  9 150-174K                 1702.
## 10 175-199K                 1156.
## 11 200-249K                  830 
## 12 250K+                    1634.
singles_average_transactions <- singles_transactions %>%
  select(income, household_id,transaction_timestamp) %>%
  mutate(N = 1) %>%
  group_by(income) %>%
  summarise(sum(N)/length(unique(household_id))) %>%
  rename("singles_average_transactions" = 'sum(N)/length(unique(household_id))' )

singles_average_transactions
## # A tibble: 11 × 2
##    income    singles_average_transactions
##    <ord>                            <dbl>
##  1 Under 15K                        1131.
##  2 15-24K                            863.
##  3 25-34K                            914.
##  4 35-49K                            929.
##  5 50-74K                            992.
##  6 75-99K                            824.
##  7 100-124K                          991.
##  8 125-149K                          957.
##  9 150-174K                         1175.
## 10 175-199K                         1307.
## 11 250K+                            1802.
undefined_average_transactions <- undefined_transactions %>%
  select(income, household_id,transaction_timestamp) %>%
  mutate(N = 1) %>%
  group_by(income) %>%
  summarise(sum(N)/length(unique(household_id))) %>%
  rename("undefined_average_transactions" = 'sum(N)/length(unique(household_id))' )

undefined_average_transactions
## # A tibble: 11 × 2
##    income    undefined_average_transactions
##    <ord>                              <dbl>
##  1 Under 15K                          1000.
##  2 15-24K                              908.
##  3 25-34K                             1018.
##  4 35-49K                              950.
##  5 50-74K                             1082.
##  6 75-99K                             1024.
##  7 125-149K                           1324.
##  8 150-174K                           1149 
##  9 175-199K                           1208.
## 10 250K+                              1470 
## 11 <NA>                                384.

To gain a deeper understanding of customer spending patterns, we examined household size and age group across these segments. We categorized households by size, focusing on those with 2 or more members, as our initial exploration suggested they contributed the highest sales. Additionally, we analyzed customer age groups, paying particular attention to the 35-54 age group, which showed the highest engagement and spending across all marital statuses. This segmentation allowed us to refine our target audience and tailor marketing strategies accordingly

Findings for Demographic Category: Household Size

Our analysis revealed several key insights that significantly impact strategic marketing decisions. First, Married households consistently contributed the highest percentage of total sales. The driving force behind this was larger households with 2 or more members, which exhibited the highest average transaction values and total sales. This segment’s purchasing power underscores the importance of targeting married customers with larger household sizes, as they are more likely to yield higher returns. Consequently, marketing efforts should focus on tailored campaigns for this demographic to maximize revenue growth.

# Plots per household size
married_transactions %>% #Married Household
  group_by(household_size) %>%
  summarize(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  ggplot(aes(x = factor(household_size), y = total_sales, fill = factor(household_size))) +
  geom_col() +
  labs(title = "Total Sales by Household Size",
       x = "Household Size",
       y = "Total Sales",
       fill = "Household Size") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") + 
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))

In contrast, Singles (Unmarried) showed moderate sales but with lower average transaction values. The spending patterns in this segment did not demonstrate significant revenue growth, leading us to deprioritize this demographic for targeted marketing campaigns. However, maintaining a baseline level of engagement remains important to retain brand loyalty within this group.

singles_transactions %>% #Single Household
  group_by(household_size) %>%
  summarize(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  ggplot(aes(x = factor(household_size), y = total_sales, fill = factor(household_size))) +
  geom_col() +
  labs(title = "Total Sales by Household Size",
       x = "Household Size",
       y = "Total Sales",
       fill = "Household Size") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") + 
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))

The Undefined demographic presented an interesting opportunity. This group includes customers with no registered marital status (NA) but with household sizes of 2 or more members, suggesting they might be married but unregistered in the system. This demographic displayed substantial spending patterns, contributing to a significant increase in overall sales. The data indicates that large household sizes within the undefined group behave similarly to married households in terms of spending behavior. Therefore, it is strategic to include them in campaigns targeting larger households, ensuring no potential high-value customers are missed.

undefined_transactions %>% #Undefined Household
  group_by(household_size) %>%
  summarize(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  ggplot(aes(x = factor(household_size), y = total_sales, fill = factor(household_size))) +
  geom_col() +
  labs(title = "Total Sales by Household Size",
       x = "Household Size",
       y = "Total Sales",
       fill = "Household Size") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") + 
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))

## Results

Our Household Size Analysis confirmed that larger households consistently drive higher total sales across all segments. Married households with 2+ members were the most profitable, while smaller households (less than 2 members) exhibited lower spending patterns. This reinforces the strategic decision to prioritize marketing campaigns for larger households to maximize revenue.

Findings for Age Group

The Age Group Analysis provided further clarity on customer engagement and spending patterns. The 35-54 age group emerged as the most lucrative, demonstrating the highest engagement and total sales across all marital statuses. This demographic is highly responsive to targeted campaigns, making them a critical audience for future marketing efforts

# Plots per age group
married_transactions %>% #Married in age group
  group_by(age) %>%
  summarize(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  ggplot(aes(x = factor(age), y = total_sales, fill = factor(age))) +
  geom_col() +
  labs(title = "Total Sales by Age Group",
       x = "Age",
       y = "Total Sales",
       fill = "Age Group") +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))

singles_transactions %>% #Single Age Group
  group_by(age) %>%
  summarize(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  ggplot(aes(x = factor(age), y = total_sales, fill = factor(age))) +
  geom_col() +
  labs(title = "Total Sales by Age Group",
       x = "Age",
       y = "Total Sales",
       fill = "Age Group") +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","))

undefined_transactions %>% #Undefined Age Group
  group_by(age) %>%
  summarize(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  ggplot(aes(x = factor(age), y = total_sales, fill = factor(age))) +
  geom_col() +
  labs(title = "Total Sales by Age Group",
       x = "Age",
       y = "Total Sales",
       fill = "Age Group") +
  theme_minimal() +
  scale_fill_brewer(palette = "Pastel1") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) 

# Campaign Performance Analysis

In parallel, we performed a campaign performance analysis to evaluate the effectiveness of various campaign types across different customer segments. This analysis aimed to understand how campaign type, household size, and age group influence sales performance. Using boxplots and bar charts, we visualized sales distribution and campaign effectiveness. Boxplots provided a detailed view of sales distribution by campaign type and demographic, while bar charts illustrated total sales comparisons across segments. This dual approach enabled us to identify which campaign types resonated best with high-value customer groups.

Findings for Campaign Performance

Our Campaign Performance Analysis highlighted the significant impact of household size and age group on campaign success. Boxplots visualized clear evidence that households with 2+ members consistently yielded the highest sales, and the 35-54 age group performed best across all campaign types. Interestingly, Campaign Type A resonated most with married customers, especially within the 35-54 age group, confirming its effectiveness for high-value segments

# For Created Marital Category Transactions
married_transactions %>%
  left_join(campaigns, by = "household_id") %>% 
  group_by(campaign_id) %>%
  summarize(top_campaign = sum(sales_value, na.rm = TRUE)) %>%
  mutate(campaign_id = as.numeric(campaign_id)) %>%
  arrange(campaign_id) %>%
  ggplot(aes(x = factor(campaign_id), y = top_campaign)) +
  geom_col(fill = "steelblue") +
  labs(title = "Married Sales per Campaign", x = "Campaign ID", y = "Total Revenue") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
  theme_minimal()

Results

By narrowing our target to married households with 2+ members within the 35-54 age group, we can maximize marketing effectiveness and return on investment

singles_transactions %>%
  left_join(campaigns, by = "household_id") %>% 
  group_by(campaign_id) %>%
  summarize(top_campaign = sum(sales_value, na.rm = TRUE)) %>%
  mutate(campaign_id = as.numeric(campaign_id)) %>%
  arrange(campaign_id) %>%
  ggplot(aes(x = factor(campaign_id), y = top_campaign)) +
  geom_col(fill = "steelblue") +
  labs(title = "Singles Sales per Campaign", x = "Campaign ID", y = "Total Revenue") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
  theme_minimal()

Results

The analysis also showed that smaller households were less engaged and had lower average sales per campaign, reinforcing our focus on larger households for future campaigns

undefined_transactions %>%
  left_join(campaigns, by = "household_id") %>% 
  group_by(campaign_id) %>%
  summarize(top_campaign = sum(sales_value, na.rm = TRUE)) %>%
  mutate(campaign_id = as.numeric(campaign_id)) %>%
  arrange(campaign_id) %>%
  ggplot(aes(x = factor(campaign_id), y = top_campaign)) +
  geom_col(fill = "steelblue") +
  labs(title = "Undefined Sales per Campaign", x = "Campaign ID", y = "Total Revenue") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
  theme_minimal()

plot_tm <- target_market %>%
  left_join(campaigns, by = "household_id") %>%
  left_join(campaign_descriptions, by = "campaign_id") %>%
  mutate(log_sales = log1p(sales_value)) %>%
  sample_frac(0.1) %>%
  filter(!is.na(campaign_type)) %>%
  filter(!is.na(household_size))

plot_tm %>%
  ggplot(aes(x = campaign_type, y = sales_value, fill = factor(household_size))) +  
  geom_boxplot(outlier.shape = NA, alpha = 0.3) +
  geom_jitter(color = "black", size = 0.25, alpha = 0.5, 
              position = position_dodge(width = 0.8)) +  
  scale_fill_brewer(palette = "Set1") +  
  labs(title = "Target Market Campaign by Household Size",
       x = "Campaign Type",
       y = "Total Sales",
       fill = "Household Size") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  
        plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "top") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","), 
                   limits = c(0, 50)) +
  coord_flip()

plot_ntm <- singles_transactions %>%
  left_join(campaigns, by = "household_id") %>%
  left_join(campaign_descriptions, by = "campaign_id") %>%
  mutate(log_sales = log1p(sales_value)) %>%
  sample_frac(0.1) %>%
  filter(!is.na(campaign_type)) %>%
  filter(!is.na(household_size))

plot_ntm %>%
  ggplot(aes(x = campaign_type, y = sales_value, fill = factor(household_size))) +  
  geom_boxplot(outlier.shape = NA, alpha = 0.3) +
  geom_jitter(color = "black", size = 0.25, alpha = 0.5, 
              position = position_dodge(width = 0.8)) +  
  scale_fill_brewer(palette = "Set1") +  
  labs(title = "Campaign Peformance for Singles",
       x = "Campaign Type",
       y = "Total Sales",
       fill = "Household Size") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  
        plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "top") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","), 
                     limits = c(0, 50)) +
  coord_flip()

plot_ta <- target_age %>%
  left_join(campaigns, by = "household_id") %>%
  left_join(campaign_descriptions, by = "campaign_id") %>%
  mutate(log_sales = log1p(sales_value)) %>%
  sample_frac(0.1) %>%
  filter(!is.na(campaign_type))

plot_ta %>%
  ggplot(aes(x = campaign_type, y = sales_value, fill = factor(age))) +  
  geom_boxplot(outlier.shape = NA, alpha = 0.3) +
  geom_jitter(color = "black", size = 0.25, alpha = 0.5, 
              position = position_dodge(width = 0.8)) +  
  scale_fill_brewer(palette = "Set1") +  
  labs(title = "Campaign Peformance by Target Age Groups",
       x = "Campaign Type",
       y = "Total Sales",
       fill = "Age Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  
        plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "top") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","), 
                     limits = c(0, 50)) +
  coord_flip()

plot_nta <- non_target_age %>%
  left_join(campaigns, by = "household_id") %>%
  left_join(campaign_descriptions, by = "campaign_id") %>%
  mutate(log_sales = log1p(sales_value)) %>%
  sample_frac(0.1) %>%
  filter(!is.na(campaign_type))

plot_nta %>%
  ggplot(aes(x = campaign_type, y = sales_value, fill = factor(age))) +  
  geom_boxplot(outlier.shape = NA, alpha = 0.3) +
  geom_jitter(color = "black", size = 0.25, alpha = 0.5, 
              position = position_dodge(width = 0.8)) +  
  scale_fill_brewer(palette = "Set1") +  
  labs(title = "Campaign Peformance by Nontarget Age Group",
       x = "Campaign Type",
       y = "Total Sales",
       fill = "Age Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  
        plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "top") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ","), 
                     limits = c(0, 50)) +
  coord_flip()

Top 5 Performance Analysis

Furthermore, we conducted a Top Store Performance Analysis to identify the top 5 stores contributing the highest revenue during campaigns. This analysis helped us pinpoint high-performing stores that consistently showed high customer engagement and spending. Understanding store-level performance enabled us to optimize promotional efforts, ensuring maximum return on investment

#Finding the top five stores to promote our campaign 

top_5_stores <- transactions %>%
  left_join(campaigns, by = "household_id") %>%
  group_by(store_id) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_sales)) %>%
  slice_head(n = 5) %>%
  pull(store_id)

transactions %>%
  left_join(campaigns, by = "household_id") %>%
  filter(store_id %in% top_5_stores) %>%  
  group_by(store_id, campaign_id) %>%
  summarise(top_store = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  mutate(campaign_id = as.numeric(campaign_id)) %>%
  arrange(desc(top_store)) %>%
  ggplot(aes(x = factor(campaign_id), y = top_store, group = store_id, color = store_id)) +
  geom_line(size = 1.2) +  
  geom_point(size = 3) +  
  facet_wrap(~ store_id) +  
  scale_color_brewer(palette = "Dark2") + 
  scale_x_discrete(breaks = function(x) x[seq(1, length(x), by = 2)]) +
  labs(title = "Top 5 Stores by Campaign Performance",
       x = "Campaign ID", y = "Total Revenue",
       color = "Store ID") +
  scale_y_continuous(labels = dollar_format(prefix = "$", big.mark = ",")) +
  theme_minimal() +
  theme(legend.position = c(0.9, 0.1))

Results

The Top Store Performance Analysis revealed that the Top 5 stores consistently contributed the highest revenue during campaigns. These stores demonstrated high customer engagement and spending, making them ideal targets for promotional activities. By focusing on these high-performing stores, we can enhance campaign reach and maximize revenue growth

Summary

We strongly encourage Regork must focus marketing efforts on married households, especially those with 2+ members within the age group 35-54.. These customers consistently show the highest total sales and engagement leading to yield the best ROI. While the successful campaigns continue to use high-performing campaign types, this will reevaluate underperforming stores. These unperforming stores will have a likely chance to maximize engagement if they engage more with the profitable target market. In summary, Regork must consider (1) allocate marketing budget more efficiently, (2) enhance customer engagement by targeting high-value segments, (3) optimize campaign types to maximize ROI, and (4) leverage top-performing stores to boost sales during promotional periods. By focusing on the most lucrative customer segments and optimizing campaign strategies, Regork can maximize sales and build long-term customer loyalty