Introduction

I have been tasked with identifying and analyzing potential growth opportunities where Regork can invest to increase revenue and profits. Regork has run three types of campaigns thus far: A, B, and C. Through analysis, I aim to uncover how effective each of these campaigns truly is in order to select the optimal one.

Selecting the Optimal Campaign

Regork’s data records the behavior of consumers, but it is through analysis that this behavior can be seen as patterns. Using data analysis techniques in R, I have dissected Regork’s data and identified the most effective campaign, which is Type A. Campaign A shows high sales and effectiveness despite being run less often than the other campaign types, which suggests more effective consumer targeting.

Required Packages

The following packages were used in the R project:

knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(lubridate)
library(completejourney)
library(ggplot2)
library(RColorBrewer)
library(knitr)
  • dplyr: Manipulating and transforming data sets
  • lubridate: Working with dates and times
  • completejourney: Grocery store transaction data
  • ggplot2: Plotting and data visualization
  • RColorBrewer: ColorBrewer color palettes for charts
  • knitr: Knitting the R Markdown document into a report

Exploratory Analysis

The Exploratory Analysis section presents the visualizations and data findings produced in the R project to address the business problem at hand.

Campaign Distribution

Campaign distribution is plotted below. Knowing what share of all campaigns each type represents is important when measuring the overall impact and effectiveness of a given campaign type.

# Campaign Distribution: number of distinct campaigns of each type
campaign_distribution <- campaign_descriptions %>%
  group_by(campaign_type) %>%
  summarize(total_campaigns = n_distinct(campaign_id), .groups = 'drop')
# Share of all campaigns (%) for the pie chart labels
campaign_distribution <- campaign_distribution %>%
  mutate(percentage = total_campaigns / sum(total_campaigns) * 100)

# Campaign Distribution Pie (title total computed from the data, not hard-coded)
ggplot(campaign_distribution, aes(x = "", y = percentage, fill = campaign_type)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(title = paste0("Campaign Distribution (Total: ",
                      sum(campaign_distribution$total_campaigns), ")")) +
  geom_text(aes(label = paste0(round(percentage, 1), "%")),
            position = position_stack(vjust = 0.5)) +
  scale_fill_brewer(palette = "Set1")

The campaign distribution shows that of the 27 campaigns run throughout the year, 4 are Type A, 17 are Type B, and 6 are Type C. This matters because the overall impact of each campaign type has to be weighed against how often it was run.
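The percentages in the pie chart, and the 14.8%, 63% and 22.2% shares cited in the implications section, follow directly from these counts. A minimal base-R sketch that reproduces them; the `campaign_counts` data frame is a hypothetical name, typed in by hand from the counts above rather than derived from `campaign_descriptions`:

```r
# Campaign counts by type, as reported above (27 campaigns in total).
# Entered by hand for illustration; the analysis itself derives them
# from completejourney's campaign_descriptions table.
campaign_counts <- data.frame(
  campaign_type   = c("Type A", "Type B", "Type C"),
  total_campaigns = c(4, 17, 6)
)

# Share of all campaigns in percent, rounded to one decimal place
# (the same rounding used for the pie chart labels)
campaign_counts$percentage <- round(
  campaign_counts$total_campaigns / sum(campaign_counts$total_campaigns) * 100,
  1
)

print(campaign_counts)
```

Type A's 14.8% share of campaigns is the baseline against which its much larger shares of transactions and sales value are compared.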

Campaign Transactions

Campaign transaction data is plotted below. This data is important to visualize, as it shows the volume of consumer purchasing within the year for each campaign type, both overall and by household income.

total_transactions_summary 
## # A tibble: 3 × 2
##   campaign_type total_transactions
##   <ord>                      <int>
## 1 Type A                  15304858
## 2 Type B                   1327335
## 3 Type C                    610915
# Join campaign/demographic data to household transaction totals by household_id
trans_data <- campaign_demographics %>%
  inner_join(household_transactions, by = "household_id")

ggplot(trans_data, aes(x = income, y = total_transactions, fill = campaign_type)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(title = "Number of Transactions Based on Income for Each Campaign Type",
       x = "Income",
       y = "Total Transactions") +
  scale_fill_brewer(palette = "Set1")

# Total transactions by campaign type
ggplot(total_transactions_summary, aes(x = campaign_type, y = total_transactions, fill = campaign_type)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Total Transactions Summary for Each Campaign Type",
       x = "Campaign Type",
       y = "Total Transactions") +
  scale_fill_brewer(palette = "Set1") +
  theme(axis.text.x = element_text(size = 8)) +
  scale_y_continuous(labels = scales::comma)

Looking at the number of transactions for each campaign type, it is clear that the majority of purchases take place during Campaign A. The transaction totals for each campaign type are:

  • Type A: 15,304,858
  • Type B: 1,327,335
  • Type C: 610,915

Campaign A shows a significantly higher number of transactions than the other campaigns, which suggests that it was very effective in driving customer engagement and sales (both volume and frequency). Interestingly, all three campaigns follow a similar pattern of purchase frequency across income ranges, which indicates that income may not significantly influence campaign effectiveness. What is clear is that Campaign A consistently shows more purchases at every income level.
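The size of Campaign A's lead, which the recommendation summarizes as 13,366,608 transactions more than Campaigns B and C combined, can be checked directly from the three totals. A small base-R sketch using only the figures from `total_transactions_summary` (the variable names here are illustrative):

```r
# Total transactions per campaign type, copied from total_transactions_summary
total_transactions <- c("Type A" = 15304858, "Type B" = 1327335, "Type C" = 610915)

# How many more transactions Campaign A has than Campaigns B and C combined
a_minus_bc <- total_transactions[["Type A"]] -
  (total_transactions[["Type B"]] + total_transactions[["Type C"]])

# Each campaign type's share of all campaign transactions, in percent
transaction_share <- round(total_transactions / sum(total_transactions) * 100, 1)

print(a_minus_bc)
print(transaction_share)
```

Measured this way, Campaign A holds roughly 89% of campaign transactions while accounting for under 15% of the campaigns run.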

Campaign Spending

The following graphs show weekly spending by household size and campaign type, as well as the total average yearly sales value for each campaign type. These visualizations show how effective each campaign is in terms of sales dollars.

# Weekly spending by campaign type, one panel per household size
ggplot(weekly_spending, aes(x = week, y = total_spend, color = campaign_type)) +
  geom_line() +
  facet_wrap(~ household_size) +
  guides(color = guide_legend(title = "Campaign Type")) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    x = "Week",
    y = "Total Spending",
    title = "Weekly Spending by Household Size and Campaign Type",
    subtitle = "This data visualizes household spending on a weekly basis"
  ) +
  theme(
    legend.justification = "centre",
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.key.height = unit(0.5, "cm"),
    legend.key.width = unit(0.5, "cm"),
    plot.subtitle = element_text(face = "italic"),
    plot.caption = element_text(hjust = 1, size = 7),
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 7)
  )

# Total sales value by campaign type (pie chart)
campaign_type_summary %>%
  ggplot(aes(x = "", y = total_sales_value, fill = campaign_type)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Total Avg Sales Value by Campaign Type") +
  theme_void() +
  theme(legend.title = element_blank()) +
  geom_text(aes(label = paste0("$", format(total_sales_value, big.mark = ","))), 
            position = position_stack(vjust = 0.5), color = "white")

As the graphs above show, weekly household spending is consistently higher for Campaign A than for the other campaigns. Households of 1-2 people also show higher weekly spending than households of 3 or more. The pie chart shows that Campaign A has a higher yearly sales value than Campaigns B and C: Campaign A accounts for 56% of all average yearly sales value, Campaign B for 31%, and Campaign C for 13%.
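The `weekly_spending` table behind the line chart is not constructed in this section. One plausible way to build it, sketched in base R under the assumption that each transaction row carries a date, a campaign type, a household size and a sales value, is to derive a week number and sum spending per week, campaign type and household size. The `toy_transactions` rows are hypothetical placeholders, not Regork data, and the Monday-based `%W` week number may differ from the lubridate week numbering the original project could have used:

```r
# Hypothetical transaction rows (illustrative only, not Regork data)
toy_transactions <- data.frame(
  transaction_date = as.Date(c("2017-01-02", "2017-01-03", "2017-01-10",
                               "2017-01-02", "2017-01-11")),
  campaign_type    = c("Type A", "Type A", "Type A", "Type B", "Type B"),
  household_size   = c("1", "1", "2", "1", "2"),
  sales_value      = c(10.50, 4.25, 8.00, 3.00, 6.75)
)

# Monday-based week-of-year number (base R "%W" format)
toy_transactions$week <- as.integer(format(toy_transactions$transaction_date, "%W"))

# Sum spending for each week / campaign type / household size combination
weekly_spending_sketch <- aggregate(
  sales_value ~ week + campaign_type + household_size,
  data = toy_transactions,
  FUN = sum
)

# Rename to match the column used in the weekly spending plot
names(weekly_spending_sketch)[names(weekly_spending_sketch) == "sales_value"] <-
  "total_spend"

print(weekly_spending_sketch)
```

The resulting columns (`week`, `campaign_type`, `household_size`, `total_spend`) line up with the aesthetics and facet used in the weekly spending plot.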

Top Campaign Departments

This graph shows total sales value in Regork’s top five departments for each campaign type. It helps show whether customer preferences and behavior are consistent across campaigns.

# Top departments per campaign type
ggplot(top_departments_per_campaign, aes(x = campaign_type, y = total_sales_value, fill = department)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set3") +
  labs(title = "Top Departments per Campaign Type",
       x = "Campaign Type",
       y = "Total Sales Value") +
  theme_minimal()

Campaign A has a higher total sales value in each of Regork’s top five departments. Grocery is the highest-performing department, and this holds for every campaign type, indicating that customer preferences are consistent across campaigns even though Campaign A outperforms the others.
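How `top_departments_per_campaign` was built is not shown here. A minimal base-R sketch under one possible reading, in which departments are first ranked by overall sales and the leaders are then broken down by campaign type. The `toy_sales` rows and the `top_n_departments` parameter are hypothetical, and N is set to 2 only to keep the toy example small (the analysis uses five departments):

```r
# Hypothetical sales rows (illustrative only, not Regork data)
toy_sales <- data.frame(
  campaign_type = c("Type A", "Type A", "Type A", "Type B", "Type B", "Type C"),
  department    = c("GROCERY", "PRODUCE", "MEAT", "GROCERY", "PRODUCE", "GROCERY"),
  sales_value   = c(120, 40, 30, 60, 25, 20)
)

# Number of departments to keep (the analysis above keeps five)
top_n_departments <- 2

# Step 1: rank departments by total sales across all campaign types
dept_totals <- aggregate(sales_value ~ department, data = toy_sales, FUN = sum)
dept_totals <- dept_totals[order(-dept_totals$sales_value), ]
top_depts <- head(dept_totals$department, top_n_departments)

# Step 2: total sales per campaign type, restricted to those departments
top_departments_sketch <- aggregate(
  sales_value ~ campaign_type + department,
  data = toy_sales[toy_sales$department %in% top_depts, ],
  FUN = sum
)
names(top_departments_sketch)[names(top_departments_sketch) == "sales_value"] <-
  "total_sales_value"

print(top_departments_sketch)
```

Ranking once over all campaign types keeps the same department set in every bar group, which matches the chart's shared five departments; ranking within each campaign type instead could give each campaign a different set.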

Summary

In the early stages of my analysis, I wanted to understand how Regork can increase sales through campaigns and to identify the most effective campaign type for additional funding. Using R, I evaluated cross-sections of the data sets, specifically targeting and measuring campaign effectiveness. The resulting analysis and visualizations give a clear outline of the overall effectiveness and implications of the three campaign types.

Understanding the Implications of the Analysis

The analysis of Campaign Distribution, Campaign Transactions, Campaign Spending, and Top Campaign Departments provides critical, visual insight into consumer behavior and campaign effectiveness. First, although Campaign A accounts for only 14.8% of the campaigns run during the year (compared with 63% for Campaign B and 22.2% for Campaign C), its impact is disproportionately large. This suggests that fewer, more specialized and targeted campaigns can be more effective than a larger number of less targeted ones. Additionally, Campaign A’s far higher total transaction count (15,304,858, compared with 1,327,335 for Campaign B and 610,915 for Campaign C) indicates strong consumer engagement and effective promotional strategies within Campaign A.

Next, Campaign A accounts for 56% of average yearly sales value, demonstrating its dominance in sales and its effectiveness in driving customer purchases. Campaign A also leads in each of the top five departments (drug, fuel, grocery, meat, and produce). Grocery is the highest-selling department across all three campaigns, indicating consistent consumer preferences; within that shared pattern, Campaign A’s higher sales in every department mark it as the most effective campaign.

My Recommendation

Based on the implications of my Regork campaign analysis, Campaign A, despite being run less often, significantly outperforms Campaigns B and C. It leads on three major facets: transactions, sales value, and sales across the top five key departments. To capitalize on these findings, I propose increasing funding for Campaign A with three goals: enhancing its effectiveness, boosting sales volume, and optimizing its strategy. First, additional funding would build on the campaign’s demonstrated effectiveness. Second, given its higher yearly sales value (56% of average yearly sales) and its higher yearly transaction count (13,366,608 more than Campaigns B and C combined), increased funding should boost sales output. Finally, additional funding can reinforce the customer engagement (measured through transactions) and department-level sales that make Campaign A the strongest performer. To optimize further, efforts should focus on smaller households, since weekly spending is highest among households of 1-2 people who engage with Campaign A. By implementing these recommendations, Regork is well positioned to increase the impact and effectiveness of Campaign A, leading to higher engagement and sales.

Limitations

There are a few notable limitations to this analysis. First, the data sets cover only one year; with transaction and campaign data from multiple years, clearer trends across the three campaign types could be identified. Second, while certain campaigns are correlated with stronger results, the absence of economic measures makes it difficult to distinguish correlation from causation for the given year. Finally, the analysis is descriptive and does not use machine learning; a trained, algorithmic approach could strengthen the findings.