Regork Inc. Grocery Chain: Help Increase Revenue

Introduction

In today’s ever evolving business landscape, companies and organizations like Regork Inc. must continue to seek new opportunities for growth and expansion. For Regork Inc this is a strategic goal to thrive in a competitive marketplace. As a data engineer entrusted with this task, our prime objective is clear: we first need to identify untapped growth possibilities that can push Regork to newer and better heights.

Business Question

“How can Regork strategically manage its inventory and position its products in stores to boost both revenue and profitability”

Was the the outcome of the reports and analysis useful?

While there are endless possibilities to focus on we chose to focus on the strategic growth initiatives that can deliver tangible results. We will explore various data to answer our business question above, such as which products drive the most revenue, which customer segments should we utilize for targeting, and how marketing campaigns impact the overall sales.

Analytical Approach

Our analysis involved data processing, data joins, and advanced visualization techniques to derive meaningful insights. The R code provided in the report forms the basis of our analysis.

For our analysis, we began by identifying total sales per product_id in the transactions datasets, then we filtered out rows in the promotions data set for each unique product_id, but same product_id can be placed in different stores in the different display locations. Therefore, we grouped the merged data set by display location, this gave us the data set for each product_id and display locations

How we get our solution?

As we code and dive more into the Data we can translate our insights into actionable recommendations that Regrok Inc. can leverage to boost revenue and enhance customer satisfaction, as well as operational efficiency and market share.

  1. Our examination of product sales across different age groups unveils valuable insights for targeting specific demographics effectively.

  2. By pinpointing product categories with lower sales and untapped growth potential, we can develop strategies to revitalize these areas.

  3. The creation of a visual representation showcasing the top 10 product categories with the highest sales in transactions involving coupon discounts enables us to identify critical areas for growth. This chart guides marketing efforts towards the most successful product categories, optimizing resource allocation.

  4. Our analysis identifies the most popular product categories in the highest-performing display locations. This information helps us identify opportunities for product improvement and diversification within these categories. To arrive at these insights, we calculated total sales by product and store, merged this data with promotions, removed outliers in total sales at display locations, and filtered for the top three locations with the highest sales. This data informs us about popular products, which we can visualize based on total sales by product category.

  5. We offer data-driven recommendations aimed at enhancing revenue and profitability, ensuring that our insights translate into tangible business improvements.

PACKAGES REQUIRED

CompleteJourney found here: https://bradleyboehmke.github.io/completejourney/index.html

The R data package CompleteJourney provides us information about the company Regork Inc. that we can use to perform analysis and give our recommendation.

  • tidyverse : to transform and better represent data .
  • dplyr : for transformation and manipulation of the data.
  • lubridate : it provides tools that make it easier to parse and manipulate dates
  • ggplot2 : for data visualization to improve quality and aesthetics of graphics.
  • tidyr : to create tidy data

Graphical Representation of Data

Product Sales by Age Group

The chart below illustrates product sales by age group. Each bar represents a product category, and the colors indicate different age groups. The height of the bars represents the total sales in dollars.

## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>
## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
# Products Sales by age group 

product_sales_age_group <- transactions %>%
  inner_join(prods, by = 'product_id') %>%
  inner_join(demo, by = 'household_id') %>%
  group_by(product_category, age) %>%
  summarize(sales = sum(sales_value)) %>%
  arrange(desc(sales)) %>%
  head(75)
product_sales_age_group
## # A tibble: 75 × 3
## # Groups:   product_category [37]
##    product_category       age    sales
##    <chr>                  <ord>  <dbl>
##  1 COUPON/MISC ITEMS      45-54 89171.
##  2 COUPON/MISC ITEMS      35-44 72982.
##  3 COUPON/MISC ITEMS      25-34 43606.
##  4 SOFT DRINKS            45-54 38801.
##  5 BEEF                   45-54 36485.
##  6 SOFT DRINKS            35-44 27337.
##  7 BEEF                   35-44 27124.
##  8 FLUID MILK PRODUCTS    45-54 25563.
##  9 CHEESE                 45-54 21791.
## 10 FRZN MEAT/MEAT DINNERS 45-54 18892.
## # ℹ 65 more rows
product_sales_age_group %>%
  ggplot(aes(x = reorder(product_category, sales, FUN = sum), 
             y = sales, 
             fill = age
  )
  ) +
  geom_col() +
  coord_flip() +
  labs(
    title = 'Top Most Product Categories Sold Items by age Groups',
    y = 'Total Net Sales $$$',
    x = 'Product Categories',
    subtitle = 
      'The data here shows the age groups for the most popular product 
categories sold.'
  ) +
  guides(fill = guide_legend(title = "Age Group")) +
  theme(plot.title = element_text(hjust = 0.5,face = "bold", size = 14), 
        plot.subtitle = element_text(size = 9),
        axis.text = element_text(size = 8)   
  )

Top Product Categories Using Coupons

The bubble chart below visualizes the total sales for the 10 most sold product categories when campaign is used. The size of each bubble represents the total sales, and the color indicates the product categories.

# Create a bubble chart
transactions %>%
  inner_join(prods, by = "product_id") %>%
  filter(coupon_disc != 0) %>%        # Filter transactions with coupon discounts
  group_by(product_category) %>%
  summarize(total_sales = sum(sales_value)) %>%   #calculate total sales by product category
  arrange(desc(total_sales)) %>%
  top_n(10) %>%                       #Select the top 10 product categories
  ggplot(aes(x = product_category, y = total_sales, size = total_sales, color = product_category)) +
  geom_point() +
  theme_minimal() +  # Use a minimal theme for cleaner look
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    legend.position = "none"
  ) +
  labs(
    title = "Top 10 Product Categories Using Coupons",
    subtitle = "Total Sales Grouped by Product Category for Coupon Transactions",
    x = "Product Category",
    y = "Total Sales"
  ) +
  scale_size_continuous(range = c(5, 20)) + # Adjust bubble size
  scale_y_continuous(limits = c(0, 8000))   # Set y-axis limits

Average Total Sales by Display Location

The bubble chart below visualizes the average total sales on each display location. The size of each bubble represents the average total sales, and the color indicates the display location.

# Calculate total sales by product and store
total_sales_by_product <- transactions %>%
  group_by(product_id, store_id) %>%
  summarise(total_sales = sum(sales_value)) %>%
  na.omit()

# Merge transactions and promotions datasets
promotion_sales <- total_sales_by_product %>%
  left_join(promotions, by = c("product_id", "store_id")) %>%
  group_by(display_location) %>%
  summarise(total_display_sales = sum(total_sales, na.rm = TRUE)) %>%
  na.omit()

# Remove Outliers for display location 

# Calculate the IQR for total_display_sales
Q1 <- quantile(promotion_sales$total_display_sales, 0.25)
Q3 <- quantile(promotion_sales$total_display_sales, 0.75)
IQR <- Q3 - Q1

# Define a threshold to identify outliers
threshold <- 1.5 * IQR

# Filter promotion_sales to remove outliers
promotion_sales <- promotion_sales %>%
  filter(!(total_display_sales > (Q3 + threshold) | total_display_sales < (Q1 - threshold)))
# Create a bubble chart to visualize average total sales by display location
ggplot(promotion_sales, aes(x = display_location, y = total_display_sales, size = total_display_sales, fill = display_location)) +
  geom_point(shape = 21, color = "black", alpha = 0.7) +
  scale_size_continuous(range = c(5, 15)) + # Adjust size range for better visibility
  labs(title = "Total Sales by Display Location",
       x = "Display Location",
       y = "Total Sales",
       size = "Total Sales") + # Updated y-axis label
  theme_minimal() +
  scale_y_continuous(limits = c(40000, 3000000), labels = comma_format(scale = 1e-3, suffix = "K")) +
  theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) + # Center the title
  guides(size = "none")

# Filter for the top-performing display locations
top_display_locations <- promotion_sales %>%
  top_n(3, wt = total_display_sales)  # Select the top 3 display locations

# filter promotion data for the top display location
promotions_display_data <- promotions %>%
          filter(display_location %in% top_display_locations$display_location)
# Merge with promotions and transactions data to get details of the top display locations
  
top_display_data <- transactions %>% select(household_id, product_id, store_id, sales_value, week) %>%
  inner_join(promotions_display_data, by = c("product_id", "store_id", "week"))

Total Sales by Product Category in the Top Display Locations

The bar chart above shows the total sales by product category in the top 3 display locations. Each bar represents a product category, and the fill color indicates the display location.

#  most popular products in the top display locations

top_products <- prods %>% inner_join(top_display_data, by = "product_id") %>%
  group_by(product_id, product_category, display_location)%>%
  summarise(total_sales = sum(sales_value))%>%  arrange(desc(total_sales))%>%
  na.omit()

top_100_products <- head(top_products, 100)# Select the top 100 products
# Bar chart of the most popular products in the top display locations

ggplot(top_100_products, aes(x = product_category, y = total_sales, fill = display_location)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Sales by Product Category in Top 3 Display Locations",
       x = "Product Category",
       y = "Total Sales",
       fill = "Display Location") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
  scale_fill_manual(values = c("blue", "green", "red"))  # Customize fill colors as needed

INSIGHTS & RECOMMENDATIONS

Key Findings or Insights:

Our analysis uncovered several key findings:

  • By analyzing product sales by age group, we uncovered distinct preferences among different age groups.Some age groups exhibited a strong affinity for specific product categories. From this visualization we identified opportunities to target specific demographics effectively. This personalized approach can lead to increased sales and customer satisfaction.

  • Our data reveals distinct sales patterns among different age groups. The “45-54” age group emerges as the most significant contributor to our sales, particularly in categories like “COUPON/MISC ITEMS” and “SOFT DRINKS.”

  • Among our product categories, “SOFT DRINKS” stands out as a top performer, with substantial total sales. This category is closely followed by “FLUID MILK PRODUCTS” and “BAKED BREAD/BUNS/ROLLS.”

  • We identified underperforming product categories with significant growth potential. Strategies to enhance these categories can unlock latent profitability.

  • We identified the top display locations with the highest average total sales. Notably, “2-store rear,” “7-in-aisle,” and “9-secondary location display” emerged as the frontrunners in this regard. Additionally, “0-display” and “3-front end cap” also demonstrated significant sales potential. These information should be considered for optimizing product placement strategies and to make informed inventory planning decision to elevate customer satisfaction.

  • Within these top-performing display locations, we also identified the most popular product categories. This finding opens the door for recommendations to expand and diversify the product variety within these categories.

Recommendations:

Based on our findings, we propose the following recommendations to enhance Regork Inc.’s revenue and profitability:

  • Utilize customer insights to improve product offerings, refine store layouts, and optimize product placement strategies. Tailoring these aspects to match customer preferences can result in increased revenue and elevated customer satisfaction.

  • Leverage the insights gained from our data analysis to make informed inventory decisions. This includes selecting the right products to stock and when to introduce them, aligning inventory more closely with customer demand.

  • Focus marketing efforts on age groups 19-24 and 25-34, especially for soft drinks, as these segments exhibit the highest propensity to purchase such products. Consider introducing introductory items in strategic locations to attract more customers.

  • Concentrate efforts on display locations like “2-store rear,” “7-in-aisle,” “9-secondary” and “3 - frontend.” These locations have demonstrated strong sales potential and should be prioritized for key product placements.

  • Emphasize product categories such as produce, soft drinks, meat, baked bread, and cheese. These categories have shown promise for revenue growth and warrant special attention in marketing and inventory planning.

  • Launch targeted marketing campaigns tailored to age groups that exhibit growth potential. Craft messages and promotions that resonate with these demographics to maximize their engagement.

  • Utilize customer demographics to personalize marketing messages and promotions further. Tailoring your marketing approach can lead to a more personalized and engaging customer experience, which can drive increased sales and loyalty.

In Conclusion

In conclusion, this report provides valuable insights and recommendations to guide Regork’s strategic decisions. By optimizing inventory management, implementing targeted marketing, analyzing profitability, and prioritizing customer satisfaction, Regork can position itself for continued growth, increased revenue, and enhanced profitability.

Summary of the problem statement

At Regork Inc., one of our major challenges we wanted to tackle: how can we boost revenue and profitability in a highly competitive market? We needed to identify the key drivers of revenue, optimize product placements, and enhance customer targeting to achieve sustainable growth.

Summary of how we addressed the problem

In our quest to tackle this challenge, we embarked on a data-driven journey that encompassed a comprehensive analysis of product sales by age group, enabling us to uncover hidden demographics with immense growth potential. We delved deep into underperforming product categories, devising strategies to unlock their latent profitability. Our recommendations were solidly grounded in data, providing Regork Inc. with a clear path towards informed decision-making and highly personalized marketing efforts.

Moreover, we harnessed advanced visualization techniques, including the creation of a captivating bubble chart and a revealing bar chart showcasing the top-performing products in the top three display locations. These visualizations transformed raw data into actionable insights, equipping Regork Inc. with a robust strategy for revenue enhancement, improved profitability, and an elevated customer experience.

Interesting Findings along our journey

  • As we talked it became pretty clear that there were certain age groups that favored a certain product over the other, this became abundantly clear once we made graphical representations of our data.
  • Certain age groups did not favor at all or very little of some products, there was no interest from certain age groups.