Introduction

# Tasked with finding areas of opportunity for growth within Regork, I have decided that a great area to look in depth is customer retention where the business problem I am advancing to achieve is how can Regork increase customer retention and drive repeat purchases for larger valued products.

Data wrangling Synopsis

# I used R coding to dive into Regork's customer purchasing behaviors to analyze relationships between certain areas such as purchasing frequency, purchasing product types, types of customers, transaction sizes and purchasing times. Using all this information I developed an analysis highlighting key details that can be used to create possible solutions for customer retention as well as repeat purchases for valued products.

Proposed Solution

# Through the analysis of Regork's data we identify who frequent customers are, what products they buy the most and how often they shop. Understanding these relationships between one another will help us create a strategy to boost store revenues. 

Packages Required

library(completejourney)
# This package holds a large data set from which is composed of many variables regarding Regork's raw details.
library(tidyverse)
# Collection of packages used for data manipulation along with visualization.
library(dplyr)
# This package is used as a part of tidyverse used for data manipulation.
library(lubridate)
# Helps Simplify extracting dates and times.
library(ggplot2)
# Data visualization package used for graphs. 

Data Overview

transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
##    household_id store_id basket_id   product_id quantity sales_value retail_disc
##    <chr>        <chr>    <chr>       <chr>         <dbl>       <dbl>       <dbl>
##  1 900          330      31198570044 1095275           1        0.5         0   
##  2 900          330      31198570047 9878513           1        0.99        0.1 
##  3 1228         406      31198655051 1041453           1        1.43        0.15
##  4 906          319      31198705046 1020156           1        1.5         0.29
##  5 906          319      31198705046 1053875           2        2.78        0.8 
##  6 906          319      31198705046 1060312           1        5.49        0.5 
##  7 906          319      31198705046 1075313           1        1.5         0.29
##  8 1058         381      31198676055 985893            1        1.88        0.21
##  9 1058         381      31198676055 988791            1        1.5         1.29
## 10 1058         381      31198676055 9297106           1        2.69        0   
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## #   transaction_timestamp <dttm>
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
##    product_id store_id display_location mailer_location  week
##    <chr>      <chr>    <fct>            <fct>           <int>
##  1 1000050    316      9                0                   1
##  2 1000050    337      3                0                   1
##  3 1000050    441      5                0                   1
##  4 1000092    292      0                A                   1
##  5 1000092    293      0                A                   1
##  6 1000092    295      0                A                   1
##  7 1000092    298      0                A                   1
##  8 1000092    299      0                A                   1
##  9 1000092    304      0                A                   1
## 10 1000092    306      0                A                   1
## # ℹ 20,940,519 more rows
campaigns
## # A tibble: 6,589 × 2
##    campaign_id household_id
##    <chr>       <chr>       
##  1 1           105         
##  2 1           1238        
##  3 1           1258        
##  4 1           1483        
##  5 1           2200        
##  6 1           293         
##  7 1           529         
##  8 1           536         
##  9 1           568         
## 10 1           630         
## # ℹ 6,579 more rows
campaign_descriptions
## # A tibble: 27 × 4
##    campaign_id campaign_type start_date end_date  
##    <chr>       <ord>         <date>     <date>    
##  1 1           Type B        2017-03-03 2017-04-09
##  2 2           Type B        2017-03-08 2017-04-09
##  3 3           Type C        2017-03-13 2017-05-08
##  4 4           Type B        2017-03-29 2017-04-30
##  5 5           Type B        2017-04-03 2017-05-07
##  6 6           Type C        2017-04-19 2017-05-21
##  7 7           Type B        2017-04-24 2017-05-28
##  8 8           Type A        2017-05-08 2017-06-25
##  9 9           Type B        2017-05-31 2017-07-02
## 10 10          Type B        2017-06-28 2017-07-30
## # ℹ 17 more rows
coupons
## # A tibble: 116,204 × 3
##    coupon_upc  product_id campaign_id
##    <chr>       <chr>      <chr>      
##  1 10000085207 9676830    26         
##  2 10000085207 9676943    26         
##  3 10000085207 9676944    26         
##  4 10000085207 9676947    26         
##  5 10000085207 9677008    26         
##  6 10000085207 9677052    26         
##  7 10000085207 9677385    26         
##  8 10000085207 9677479    26         
##  9 10000085207 9677791    26         
## 10 10000085207 9677878    26         
## # ℹ 116,194 more rows
demographics
## # A tibble: 801 × 8
##    household_id age   income    home_ownership marital_status household_size
##    <chr>        <ord> <ord>     <ord>          <ord>          <ord>         
##  1 1            65+   35-49K    Homeowner      Married        2             
##  2 1001         45-54 50-74K    Homeowner      Unmarried      1             
##  3 1003         35-44 25-34K    <NA>           Unmarried      1             
##  4 1004         25-34 15-24K    <NA>           Unmarried      1             
##  5 101          45-54 Under 15K Homeowner      Married        4             
##  6 1012         35-44 35-49K    <NA>           Married        5+            
##  7 1014         45-54 15-24K    <NA>           Married        4             
##  8 1015         45-54 50-74K    Homeowner      Unmarried      1             
##  9 1018         45-54 35-49K    Homeowner      Married        5+            
## 10 1020         45-54 25-34K    Homeowner      Married        2             
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
# Customer Segmentation
transactions_demo <- transactions %>%
  inner_join(demographics, by = "household_id")

# Product level analysis 
transactions_products <- transactions %>%
  inner_join(products, by = "product_id")

Data Analysis

Distribution of shopping frequency which shows how often households shop at Regork. This graph shows that a majority of customers are occasional shoppers as they infrequently visit Regork to make irregular purchases which also shows they may shop at other places.

customer_freq <- transactions %>%
  count(household_id) 

ggplot(customer_freq, aes(x = n)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Shopping Frequency",
       x = "Number of Transactions per Household",
       y = "Number of Households") +
  theme_minimal()

This graph shows the average sales per transaction which in this case tell us that most transactions are small purchases and one-time purchasers dominate the market. Frequent shoppers drive more revenue by volume however, and not by spending a lot per visit to Regork. Encouraging customers to buy more items when they visit it vital, bundles and promotions will help motivate customers.

shopper_segments <- transactions %>%
  group_by(household_id) %>%
  summarize(total_transactions = n()) %>%
  mutate(shopper_type = case_when(
    total_transactions >= 30 ~ "Frequent Shopper",
    total_transactions >= 10 ~ "Occasional Shopper",
    TRUE ~ "One-Time Shopper"
  ))

transactions <- transactions %>%
  left_join(shopper_segments, by = "household_id")

transactions %>%
  group_by(shopper_type) %>%
  summarize(avg_spending = mean(sales_value)) %>%
  ggplot(aes(x = shopper_type, y = avg_spending, fill = shopper_type)) +
  geom_col() +
  labs(title = "Average Spending per Shopper Type",
       x = "Shopper Type",
       y = "Average Spending ($)") +
  theme_minimal() +
  theme(legend.position = "none")

Top products driving repeat purchases which looks at most frequently purchases products. This graph shows that the most frequently purchased items are milk products, eggs, and bread along with fruit which tells us that repeat customers rely on Regork for their everyday essentials which is good.

top_products <- transactions %>%
  count(product_id, sort = TRUE) %>%
  top_n(15, n) %>%
  inner_join(products, by = "product_id")  # Join with product details

ggplot(top_products, aes(x = reorder(product_category, n), y = n)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 15 Most Frequently Purchased Product Categories",
       x = "Product Category",
       y = "Number of Purchases") +
  theme_minimal()

This graph shows the top items purchased by one-time shoppers and here we see that coupon items are the most frequently purchased which shows that those shoppers are drawn in by promotional and discounted items.

one_time_shoppers <- transactions %>%
  group_by(household_id) %>%
  filter(n() == 1) %>%
  ungroup()

one_time_shoppers_products <- one_time_shoppers %>%
  left_join(products, by = "product_id") %>%
  count(product_category, sort = TRUE) %>%
  top_n(15)

ggplot(one_time_shoppers_products, aes(x = reorder(product_category, n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 15 Products Purchased by One-Time Shoppers", 
       x = "Product Category", 
       y = "Number of Purchases") +
  theme_minimal()

Time between purchases shows how long customers wait before shopping again. This graph shows us that frequent customers show loyalty by continuing to come back to shop around every three weeks. To shorten time between trips Regork could apply the use of more promotions and incentives.

transactions_time <- transactions %>%
  mutate(transaction_date = as.Date(transaction_timestamp)) %>%  
  group_by(household_id) %>%
  arrange(household_id, transaction_date) %>%  
  mutate(days_since_last = as.numeric(difftime(transaction_date, lag(transaction_date), units = "days"))) %>% 
  ungroup()

transactions_time <- transactions_time %>%
  filter(!is.na(days_since_last))

ggplot(transactions_time, aes(x = days_since_last)) +
  geom_histogram(binwidth = 5, fill = "purple", color = "black") +
  scale_x_continuous(limits = c(0, 50)) +  # Limit x-axis to 50 days
  scale_y_continuous(labels = scales::comma, limits = c(0, 5000)) + 
  labs(title = "Time Between Purchases (Limited to 50 Days)",
       x = "Days Since Last Purchase",
       y = "Number of Transactions") +
  theme_minimal()

# Summary

# The goal of this analysis was to create solutions to promote and improve customer retention as well as repeat purchases for larger valued items.

# Through the analysis of shopping frequency, spending patterns, time between purchases, and product preferences sheds light on a few opportunities for Regork to grow and increase customer retention and increase revenue. Breaking down the process I started looking at the distribution of shopping frequency which showed that a majority of customers are occasional shoppers who visit Regork infrequently to make irregular purchases. This suggests that those customers also shop at other places which in turn created reason to improve on customer retention. Next I looked at average sales per transaction which showed me that most purchases are small and one-time purchasers dominate the market despite frequent shoppers driving more revenue through volume, they don't spend a lot per visit. After finding that I dived into top products driving repeat purchases which told me that most frequently purchased items are essentials such as eggs, milk, bread and fruit which creates opportunity for Regork to be positioned as a staple spot for shopping. Next I found top items purchased by one time shoppers which highlights that coupon items are most frequently bought. Finally I found the time between purchased which told me that frequent cusotmers usually return every three weeks on average which shows loyalty, but there are ways to decrease that time to further improve revenue. 

# Overall the key to customer retention and repeat purchases for larger valued items starts with increasing customer loyalty by focusing on personalized promotions through offering coupons and discounts specific to the shopper's personal needs. Next we can work on bundling solutions as the most purchased item, milk, bread, eggs and fruit we can promote discounts when customers purchases two or three of the products together. Also by increasing the price of only one or two of those products along with the bundle discount will lead to higher revenues. Using limited time offers will diminish the time between purchases causing higher sales.  

# While the analysis provides valuable insight it is important to note that there are some limitations such as being a one man team and having limited scope on the situation, if other people were to continue working on this analysis then more data and a different perspective would be induced and created more in depth solutions highlighting on new substance. Another limitation is that data is ever changing and even though those changes may be subtle there is still opportunity for extremities such as pandemics which would drastically alter shopping altogether.