Introduction

Our team has been tasked with defining the logical approach to help identify a new way to increase sales.

Our team chose to answer the following business question:

How does understanding product performance help make strategic decisions on sales growth?

This project is crucial to Regork’s strategy. With Regork being a top grocery chain in the United States, it is important to understand how customers are spending their money at the store to steer our promotions, price points, and marketing strategies.

We explored this question using the completejourney dataset, drawing on its transaction and product data. We started by finding the 10 product categories with the highest total sales at Regork. We then drilled into these categories to find the average amount spent, the number of unique buyers, and their share of total sales. These insights will help direct future opportunities and could ultimately lead to an increase in sales.

The sales performance of these product categories provides valuable insight for Regork to optimize promotional strategies, refine pricing approaches, and adjust inventory to meet high demand for convenience items. Ultimately, this data supports strategic growth and enhances overall company performance.

Based on our analysis, we suggest that Regork invest in the marketing, pricing strategy, and promotion of convenience foods.

# Load necessary libraries

# The completejourney package provides transaction, product, and demographic data
library(completejourney)

# The dplyr package is used for data manipulation (filtering, summarizing, joining tables)
library(dplyr)

# The ggplot2 package is used for creating plots and visualizations
library(ggplot2)

# The scales package is used to format labels and scales (commas in large numbers)
library(scales)

# Load the transactions and products data
transactions <- get_transactions()
products <- completejourney::products

# Join transactions with products to get product categories and filter out "COUPON/MISC ITEMS"
transactions_with_products <- transactions %>%
  inner_join(products, by = "product_id") %>%
  filter(product_category != "COUPON/MISC ITEMS")  # Exclude "COUPON/MISC ITEMS"

# Define the color for the plots globally
blue_color <- "#084999"

Total Sales

Top 10 Product Categories Total Sales

As mentioned prior, our team wanted to focus on the top 10 product categories to provide a solution to the initial question: How does understanding product performance help make strategic decisions on sales growth? This bar graph displays the top 10 performing product categories by total sales. This graph reveals the small number of categories that are driving a substantial portion of overall sales.

# Total Sales by Top 10 Product Categories
total_sales_by_category <- transactions_with_products %>%
  group_by(product_category) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales)) %>%
  head(10)

# Plot 1: Total Sales by Top 10 Product Categories
ggplot(total_sales_by_category, aes(x = reorder(product_category, -total_sales), y = total_sales)) +
  geom_bar(stat = "identity", fill = blue_color) +
  labs(title = "Total Sales by Top 10 Product Categories", x = "Product Category", y = "Total Sales") +
  scale_y_continuous(labels = comma) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank())  

Average Sales

Average Sales of Top 10 Product Categories

This bar graph displays the average amount (USD) spent per transaction in each of these 10 product categories and highlights which categories generate higher value for the company. Beer/ales led in transaction value: even though fewer units are sold, each sale carries more monetary value.

# First, calculate average sales value for all product categories
avg_sales_value <- transactions_with_products %>%
  group_by(product_category) %>%
  summarize(average_sales = mean(sales_value, na.rm = TRUE), .groups = 'drop')

# Filter avg_sales_value to include only the top 10 categories from total_sales_by_category
avg_sales_value_filtered <- avg_sales_value %>%
  filter(product_category %in% total_sales_by_category$product_category)

# Create the second plot (Average Spent on Top 10 Product Categories per Transaction)
ggplot(avg_sales_value_filtered, aes(x = reorder(product_category, -average_sales), y = average_sales)) +
  geom_bar(stat = "identity", fill = blue_color) +
  scale_y_continuous(labels = comma) +  # Format y-axis labels
  labs(title = "Average Spent per Transaction for Top 10 Product Categories", x = "Product Category", y = "Average Amount Spent (USD)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank())  # Remove grid lines
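
The average above is taken over individual rows of the transactions table. As we understand the completejourney data, each row is one product line within a basket rather than a whole shopping trip. If the intent is the average spent per trip, one possible variant is to total each basket's spend within a category first and then average those totals. The sketch below uses a small made-up table; `toy_transactions`, `avg_per_basket`, and the category labels `CAT_A` and `CAT_B` are illustrative assumptions, and only the column names follow the dataset.

```r
library(dplyr)

# Toy data (made up for illustration): one row per product line in a basket
toy_transactions <- data.frame(
  basket_id        = c("b1", "b1", "b2", "b3", "b3", "b3"),
  product_category = c("CAT_A", "CAT_A", "CAT_A", "CAT_B", "CAT_B", "CAT_B"),
  sales_value      = c(8, 12, 10, 2, 3, 1)
)

avg_per_basket <- toy_transactions %>%
  # Step 1: total spend per basket within each category
  group_by(product_category, basket_id) %>%
  summarize(basket_spend = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  # Step 2: average those basket totals per category
  group_by(product_category) %>%
  summarize(avg_basket_spend = mean(basket_spend), .groups = "drop")

# Row-level average, as computed in the report, for comparison
avg_per_row <- toy_transactions %>%
  group_by(product_category) %>%
  summarize(avg_row_spend = mean(sales_value, na.rm = TRUE), .groups = "drop")

print(avg_per_basket)
print(avg_per_row)
```

In the toy data the two measures differ because some baskets contain several lines from the same category, which is the situation where the choice of denominator matters.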

Unique

Unique Customers Who Purchased Top 10 Product Categories

This bar graph displays how many unique customers (households) purchased products from each of these categories, which points to opportunities for targeted marketing.

# Unique Customers Purchasing Top 10 Product Categories
unique_customers_per_category <- transactions_with_products %>%
  filter(product_category %in% total_sales_by_category$product_category) %>%
  group_by(product_category) %>%
  summarize(unique_customers = n_distinct(household_id), .groups = 'drop')

# Create the third plot (Unique Customers Purchasing Each Top 10 Product Category)
ggplot(unique_customers_per_category, aes(x = reorder(product_category, -unique_customers), y = unique_customers)) +
  geom_bar(stat = "identity", fill = blue_color) +
  labs(title = "Unique Customers Purchasing Each of the Top 10 Product Categories",
       x = "Product Category",
       y = "Number of Unique Customers") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank())  # Remove grid lines
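
The count above measures reach only. To separate one-time buyers from repeat buyers, a possible extension is to count how many distinct baskets each household has in a category and flag households with more than one. The sketch below runs on made-up data; `toy_transactions`, `reach_by_category`, and the labels `CAT_A` and `CAT_B` are hypothetical, and defining a repeat customer as "more than one basket" is our assumption rather than a rule taken from the dataset.

```r
library(dplyr)

# Toy data (made up for illustration)
toy_transactions <- data.frame(
  household_id     = c("h1", "h1", "h2", "h3", "h3", "h1"),
  basket_id        = c("b1", "b2", "b3", "b4", "b4", "b5"),
  product_category = c("CAT_A", "CAT_A", "CAT_A", "CAT_B", "CAT_B", "CAT_B")
)

reach_by_category <- toy_transactions %>%
  # Keep one row per household, basket, and category so that several
  # product lines in the same basket count as a single purchase occasion
  distinct(product_category, household_id, basket_id) %>%
  # Number of baskets per household within each category
  count(product_category, household_id, name = "baskets") %>%
  group_by(product_category) %>%
  summarize(
    unique_customers = n(),               # households that bought at least once
    repeat_customers = sum(baskets > 1),  # assumption: repeat = more than one basket
    .groups = "drop"
  )

print(reach_by_category)
```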

Percentage

Percentage of Total Sales Made by Top 10 Product Categories

This pie chart displays the percentage of total sales contributed by these product categories. These 10 categories account for about a quarter of Regork’s total sales. This also underscores the emphasis needed on convenience foods, which make up nearly a quarter of total sales.

# Percentage of Total Sales Represented by Top 10 Product Categories
total_sales_all <- transactions_with_products %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE))

top_10_sales <- transactions_with_products %>%
  group_by(product_category) %>%
  summarize(total_sales = sum(sales_value, na.rm = TRUE), .groups = 'drop') %>%
  slice_max(total_sales, n = 10) %>%  # slice_max() replaces the superseded top_n()
  summarize(total_sales = sum(total_sales))

percentage_top_10 <- (top_10_sales$total_sales / total_sales_all$total_sales) * 100

# Data for Pie Chart
sales_data <- data.frame(
  Category = c("Top 10 Product Categories", "Other Product Categories"),
  Sales = c(top_10_sales$total_sales, total_sales_all$total_sales - top_10_sales$total_sales)
)

# Create the fourth plot (Percentage of Total Sales by Top 10 Product Categories)
plot4 <- ggplot(sales_data, aes(x = "", y = Sales, fill = Category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  # Add the percentage only to the "Top 10 Product Categories" slice
  geom_text(aes(label = ifelse(Category == "Top 10 Product Categories", paste0(round(percentage_top_10, 1), "%"), "")),
            position = position_stack(vjust = 0.5), size = 6, color = "white") +
  # Keep the title
  labs(title = "Percentage of Total Sales by Top 10 Product Categories") +
  theme_void() +  # Clean up the background
  # Remove the legend; this must come after theme_void(), which would otherwise reset it
  theme(legend.position = "none") +
  # Set colors for the chart
  scale_fill_manual(values = c("Top 10 Product Categories" = blue_color, "Other Product Categories" = "#0056b3"))

# Print the plot
print(plot4)
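
The pie chart reports a single combined share. A per-category breakdown can show whether that share is spread evenly or concentrated in a few categories. The sketch below shows one way to compute each category's share and a running cumulative share. It uses made-up totals, and `toy_sales`, `category_share`, `top_2_share`, and the `CAT_*` labels are illustrative names, not objects from the analysis above.

```r
library(dplyr)

# Toy category totals (made up for illustration)
toy_sales <- data.frame(
  product_category = c("CAT_A", "CAT_B", "CAT_C", "CAT_D"),
  total_sales      = c(50, 30, 15, 5)
)

category_share <- toy_sales %>%
  # Each category's share of all sales, in percent
  mutate(share_pct = total_sales / sum(total_sales) * 100) %>%
  # Rank from largest to smallest share
  arrange(desc(share_pct)) %>%
  # Running total: share covered by the top k categories
  mutate(cumulative_pct = cumsum(share_pct))

# Combined share of the two largest categories in the toy data
top_2_share <- category_share %>%
  slice_head(n = 2) %>%
  summarize(share_pct = sum(share_pct)) %>%
  pull(share_pct)

print(category_share)
print(top_2_share)
```

With the real data, the same pattern could be applied to the per-category totals, taking the first 10 rows instead of the first two.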

Summary

Summary of Findings

How does understanding product performance help make strategic decisions on sales growth?

This project is essential to Regork’s strategic approach. As one of the leading grocery chains in the United States, it is vital to understand customer spending patterns in order to guide promotions, pricing strategies, and marketing efforts.

To dive deeper into this question, we first identified the top 10 product categories by total sales to help Regork understand which products drive the most revenue. We found that nine of the 10 categories were convenience food categories. This insight supports data-driven decisions on product placement, promotional campaigns, and inventory levels, and helps maximize sales in high-performing categories.

Next, we examined the Average Sales of Top 10 Product Categories to provide context on the sales value per transaction within these categories. We determined that beer/ales led in transaction value. This can help with pricing strategy and with determining the value of bundling or upselling, strategies that all aim to raise the average transaction value.

We then turned to consumer behavior and examined how many Unique Customers Purchased Top 10 Product Categories. Understanding the number of unique customers versus repeat customers will help Regork understand the reach of these products. This can inform loyalty programs and targeted marketing aimed at retaining these customers and expanding the customer base.

Finally, we returned to sales values to understand how much of Regork’s revenue stems from these products by examining the Percentage of Total Sales Made by Top 10 Product Categories. These 10 categories account for about a quarter of Regork’s total sales, which underscores the emphasis needed on convenience foods. It can also help the company diversify and focus more effort on lower-performing categories.

Overall, by analyzing these findings, we were able to highlight which product categories are driving revenue and attracting customers to Regork. However, a limitation we encountered was that these insights did not take demographic and location data into account, which could further refine our understanding.

For next steps, Regork could explore deeper analysis by incorporating demographic and location data to draw more specific conclusions about customer behavior. Additionally, the company could emphasize promotions and bundling deals on convenience foods, focus marketing efforts on ready-made options, and explore growth opportunities in lower-performing categories.

The sales performance of these product categories provides valuable insight for Regork to optimize promotional strategies, refine pricing approaches, and adjust inventory to meet high demand for convenience items. Ultimately, this data supports strategic growth and enhances overall company performance.