Optimizing High-Income Consumer Strategies

The goal of this project analysis for Regork is to identify the most profitable strategy for growth in the high-income market segment. My hypothesis is that this economic class contributes higher purchasing volumes than every other segment. Therefore, targeting growth in this segment will lead to higher profits overall.

Firstly, I will Identify the packages and data sets used throughout my project analysis.

Packages and Data Sets

library(ggplot2)
library(tidyverse)
library(dplyr)
library(completejourney)
library(ggthemes)
library(gridExtra)

transactions <- get_transactions()
promotions <- get_promotions()

I have joined gridExtra as it will allow me to display multiple plots in a way that works best with my visualizations.

Now I will join the necessary variables to the transactions data set. For this project I am going to be looking at how sales transactions are correlated with purchased products, brands, and certain promotions.

Joining Transactions

transactions <- transactions %>%
  left_join(products %>% select(product_id, brand, product_type), by = "product_id")

transactions <- transactions %>%
  left_join(demographics %>% select(household_id, income), by = "household_id")



head(transactions)
## # A tibble: 6 × 14
##   househ…¹ store…² baske…³ produ…⁴ quant…⁵ sales…⁶ retai…⁷ coupo…⁸ coupo…⁹  week
##   <chr>    <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <int>
## 1 900      330     311985… 1095275       1    0.5     0          0       0     1
## 2 900      330     311985… 9878513       1    0.99    0.1        0       0     1
## 3 1228     406     311986… 1041453       1    1.43    0.15       0       0     1
## 4 906      319     311987… 1020156       1    1.5     0.29       0       0     1
## 5 906      319     311987… 1053875       2    2.78    0.8        0       0     1
## 6 906      319     311987… 1060312       1    5.49    0.5        0       0     1
## # … with 4 more variables: transaction_timestamp <dttm>, brand <fct>,
## #   product_type <chr>, income <ord>, and abbreviated variable names
## #   ¹​household_id, ²​store_id, ³​basket_id, ⁴​product_id, ⁵​quantity, ⁶​sales_value,
## #   ⁷​retail_disc, ⁸​coupon_disc, ⁹​coupon_match_disc

Now that I have completed adding the necessary variables to my data frame, I am going to confirm that my hypothesis is true.

To display the correlation that high economic spending is correlated with a greater income, I derived average spending per household and visualized it within each income segment.

Sales Value by Household Income

income_sales <- transactions %>%
  filter(!is.na(income)) %>%
  group_by(income) %>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE),
    household_count = n_distinct(household_id)
  ) %>%
  mutate(avg_sales_per_household = total_sales / household_count) %>%
  arrange(factor(income, levels = c("Under 15K", "15-24K", "25-34K", "35-49K", 
                                    "50-74K", "75-99K", "100-124K", "125-149K", 
                                    "150-174K", "175-199K", "200-249K", "250K+")))

ggplot(income_sales, aes(x = reorder(income, avg_sales_per_household), y = avg_sales_per_household, fill = income)) +
  geom_bar(stat = "identity", color = "black", show.legend = FALSE) +
  coord_flip() +
  labs(title = "Average Sales Value per Household by Income Category",
       x = "Income Category",
       y = "Avg Sales per Household ($)") +
  theme_minimal()

Scatter Plot

income_sales <- transactions %>%
  filter(!is.na(income)) %>%
  group_by(income) %>%
  summarize(
    total_sales = sum(sales_value, na.rm = TRUE),
    household_count = n_distinct(household_id)
  ) %>%
  mutate(avg_sales_per_household = total_sales / household_count) %>%
  arrange(factor(income, levels = c("Under 15K", "15-24K", "25-34K", "35-49K", 
                                    "50-74K", "75-99K", "100-124K", "125-149K", 
                                    "150-174K", "175-199K", "200-249K", "250K+")))


ggplot(income_sales, aes(x = income, y = avg_sales_per_household)) + 
  geom_point(size = 4, color = "blue", alpha = 0.7) + 
  geom_line(group = 1, color = "darkblue", linetype = "dashed") +  
  labs(title = "Average Sales per Household by Income Category",
       x = "Income Category",
       y = "Avg Sales per Household ($)") +
  theme_minimal(base_size = 14) +  
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.text.x = element_text(angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold")
  )

My hypothesis that higher income is correlated with higher spending is confirmed with $250K+ being a much higher spending than each other of the income segment.

Furthermore, I notice a dip in spending within the segment of 200-249K income. It is interesting that those with an average household spending of under 15k spend more than that segment. This is an important takeaway from my data that will help me with my analysis going forward.

Since there is a significant dip in spending among the category of 200-249k, it does not make since to target this segment in my analysis. Since we are looking at the highest spending segment, I am going to continue my analysis by filtering out data to focus on the 250K+ income sector.

Filtering Transactions

transactions_250K <- transactions %>%
  filter(income == "250K+")

transactions_X_250K <- transactions %>%
  filter(income != "250K+" | is.na(income))

Now that I have 2 data frames filtered for and without 250K income, I can begin looking for growth opportunities in my data. The first question I ask is how Regork is performing in each quarter for this economic segment.

To complete this segment I first filter the weeks of my data into 4 separate variables (Q1, Q2, Q3, Q4) as weeks (1-13,14-26,27-39,40-52).I assign a value of 1 in these variables so it can be easily mutated with sales value and other statistics to better represent purchasing power for each segment.

How is the $250K+ income segment performing in each quarter

Mutating transactions to Include Quarters

transactions_X_250K <- transactions_X_250K %>%
  mutate(
    Q1 = ifelse(week >= 1  & week <= 13, 1, 0),
    Q2 = ifelse(week >= 14 & week <= 26, 1, 0),
    Q3 = ifelse(week >= 27 & week <= 39, 1, 0),
    Q4 = ifelse(week >= 40 & week <= 52, 1, 0)
  )

transactions_250K <- transactions_250K %>%
  mutate(
    Q1 = ifelse(week >= 1  & week <= 13, 1, 0),
    Q2 = ifelse(week >= 14 & week <= 26, 1, 0),
    Q3 = ifelse(week >= 27 & week <= 39, 1, 0),
    Q4 = ifelse(week >= 40 & week <= 52, 1, 0)
  )

Quarter Performance for salary below 250K

ggplot(transactions_X_250K, aes(x = case_when(
  Q1 == 1 ~ "Q1",
  Q2 == 1 ~ "Q2",
  Q3 == 1 ~ "Q3",
  Q4 == 1 ~ "Q4"
), weight = sales_value)) +
  geom_bar(fill = "red", color = "black", width = 0.7) +  # Blue with black outline
  labs(title = "Quarterly Sales Value Distribution (Below 250K)",
       x = "Quarter",
       y = "Total Sales Value ($)") +
  theme_minimal(base_size = 16) +  # Increase base text size
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 18),  # Center title
    axis.text.x = element_text(size = 14, face = "bold"),  # Bigger, bolder x-axis labels
    axis.text.y = element_text(size = 12),  
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    panel.grid.major = element_blank(),  # Remove major grid lines
    panel.grid.minor = element_blank()  # Remove minor grid lines
  )

## Quarter Performance for salary above 250K

ggplot(transactions_250K, aes(x = case_when(
  Q1 == 1 ~ "Q1",
  Q2 == 1 ~ "Q2",
  Q3 == 1 ~ "Q3",
  Q4 == 1 ~ "Q4"
), weight = sales_value)) +
  geom_bar(fill = "blue", color = "black", width = 0.7) +  # Blue with black outline
  labs(title = "Quarterly Sales Value Distribution (250K+)",
       x = "Quarter",
       y = "Total Sales Value ($)") +
  theme_minimal(base_size = 16) +  # Increase base text size
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 18),  # Center title
    axis.text.x = element_text(size = 14, face = "bold"),  # Bigger, bolder x-axis labels
    axis.text.y = element_text(size = 12),  
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    panel.grid.major = element_blank(),  # Remove major grid lines
    panel.grid.minor = element_blank()  # Remove minor grid lines
  )

After displaying the performance for each group one quarter that stood out to me was Q3. For the segment below 250K, it seems that Q1 is the lowest performing with quarters 2-4 on par with each other.

However, for the segment below 250K, there is a significant dip in performance in Q3. Because of this dip in quarter 3, I am going to focus on certain products or promotions that may be beneficial in tracking performance for this sector To begin looking at this section, I will compare data specifically for quarter 3. The factor I am going to consider is the products during quarter 3. My idea is to identify a top performing product from quarter 3 in the high income segment that we can increase marketing to give higher sales and revenue.

Quarter 3 top 5 Products Metrics

transactions_250K_Q3 <- transactions_250K %>%
  filter(Q3 == 1)



transactions_X_250K_Q3 <- transactions_X_250K %>%
  filter(Q3 == 1)

get_top_5_products <- function(df) {
  df %>%
    group_by(product_type) %>%
    summarise(count = n()) %>%
    arrange(desc(count)) %>%
    slice_head(n = 5)  # Keep only top 5 products
}

# Filter top 5 products for each dataset
top_products_X_250K_Q3 <- get_top_5_products(transactions_X_250K_Q3)
top_products_250K_Q3 <- get_top_5_products(transactions_250K_Q3)


plot1 <- ggplot(top_products_X_250K_Q3, aes(x = reorder(product_type, count), y = count, fill = product_type)) +
  geom_bar(stat = "identity", width = 0.6) +  
  labs(title = "Top 5 Products (Below 250K) - Q3",
       x = "Product Type",
       y = NULL) +  
  coord_flip() +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 12, margin = margin(t = 5, b = 5)),  
    axis.text.x = element_blank(),  
    axis.ticks.x = element_blank(), 
    axis.text.y = element_text(size = 8, face = "bold"),  
    axis.title.y = element_text(size = 10, face = "bold"),
    plot.margin = margin(t = 10, r = 10, b = 10, l = 10),
    legend.position = "none"
  )

plot2 <- ggplot(top_products_250K_Q3, aes(x = reorder(product_type, count), y = count, fill = product_type)) +
  geom_bar(stat = "identity", width = 0.6) +  
  labs(title = "Top 5 Products (250K+) - Q3",
       x = "Product Type",
       y = NULL) +  
  coord_flip() +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 12, margin = margin(t = 5, b = 5)),  
    axis.text.x = element_blank(),  
    axis.ticks.x = element_blank(), 
    axis.text.y = element_text(size = 8, face = "bold"),  
    axis.title.y = element_text(size = 10, face = "bold"),
    plot.margin = margin(t = 10, r = 10, b = 10, l = 10),
    legend.position = "none"
  )

grid.arrange(plot1, plot2, ncol = 2)

With this Correlation one area that interests me is the RTS SOUP: CHUNKY/HOMESTYLE. From the visualization we can see that this is a product that is number 2 in demand during quarter 3 for the 250K economic sector.I strongly suggest that Regork pushes advertising and marketing for this product in Q2/Q3 as it could lead to higher returns for Regork. A price increase would be something Regork could heavily consider additionally since those with higher salaries are more likely to spend on these products. Another factor I looked into this segment was if high income earners are more likely to purchase based upon the brand of which there product comes from. This data is shown in a pie chart below.

brand_spending_250K <- transactions_250K %>%
  group_by(brand) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))


brand_spending_X_250K <- transactions_X_250K %>%
  group_by(brand) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))


pie_250K <- ggplot(brand_spending_250K, aes(x = "", y = total_sales, fill = brand)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Brand Spending Distribution (250K+)", fill = "Brand") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())


pie_X_250K <- ggplot(brand_spending_X_250K, aes(x = "", y = total_sales, fill = brand)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(title = "Brand Spending Distribution (Below 250K)", fill = "Brand") +
  theme_minimal() +
  theme(axis.text.x = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())


grid.arrange(pie_250K, pie_X_250K, ncol = 2)

Since we can see that the brand distribution among those earning 250K+ is significantly lower for private spending, an area for opportunity for Q3 can be an increase in marketing for nationally branded products. This could drive steady revenue for this economic sector throughout the quarter.

Overall summary and reccomendation

Overall my goal for Regork was to find the best way to increase revenue. I believe my data shows for findings that will be crucial to meeting this goa.

1. Marketing push for 250K+ economic sector during quarter 3

From my multiple charts I was able to deduct that the higher economic sector are the highest spenders per household. Additionally there is an opportunity for growth during the third quarter as this sector is greatly under performing against the rest of market.

2. Focus on campaigns for soup products

Since soup greatly over performs in this market, this can be an area the Regork continues to grow. I would recommend a pricing increase in these products since the data illustrates that this sector are the highest spenders and likely to pay more for these goods.

3. Increase supply of nationally branded products

Lastly, It is shown that consumers in the higher economic class are more inclined to pay for nationally branded products. Increasing the supply and advertisement of these products will lead to an increase in revenue.