# Tasked with finding areas of opportunity for growth within Regork, I have decided that a great area to look in depth is customer retention where the business problem I am advancing to achieve is how can Regork increase customer retention and drive repeat purchases for larger valued products.
# I used R coding to dive into Regork's customer purchasing behaviors to analyze relationships between certain areas such as purchasing frequency, purchasing product types, types of customers, transaction sizes and purchasing times. Using all this information I developed an analysis highlighting key details that can be used to create possible solutions for customer retention as well as repeat purchases for valued products.
# Through the analysis of Regork's data we identify who frequent customers are, what products they buy the most and how often they shop. Understanding these relationships between one another will help us create a strategy to boost store revenues.
library(completejourney)
# This package holds a large data set from which is composed of many variables regarding Regork's raw details.
library(tidyverse)
# Collection of packages used for data manipulation along with visualization.
library(dplyr)
# This package is used as a part of tidyverse used for data manipulation.
library(lubridate)
# Helps Simplify extracting dates and times.
library(ggplot2)
# Data visualization package used for graphs.
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # ℹ 20,940,519 more rows
campaigns
## # A tibble: 6,589 × 2
## campaign_id household_id
## <chr> <chr>
## 1 1 105
## 2 1 1238
## 3 1 1258
## 4 1 1483
## 5 1 2200
## 6 1 293
## 7 1 529
## 8 1 536
## 9 1 568
## 10 1 630
## # ℹ 6,579 more rows
campaign_descriptions
## # A tibble: 27 × 4
## campaign_id campaign_type start_date end_date
## <chr> <ord> <date> <date>
## 1 1 Type B 2017-03-03 2017-04-09
## 2 2 Type B 2017-03-08 2017-04-09
## 3 3 Type C 2017-03-13 2017-05-08
## 4 4 Type B 2017-03-29 2017-04-30
## 5 5 Type B 2017-04-03 2017-05-07
## 6 6 Type C 2017-04-19 2017-05-21
## 7 7 Type B 2017-04-24 2017-05-28
## 8 8 Type A 2017-05-08 2017-06-25
## 9 9 Type B 2017-05-31 2017-07-02
## 10 10 Type B 2017-06-28 2017-07-30
## # ℹ 17 more rows
coupons
## # A tibble: 116,204 × 3
## coupon_upc product_id campaign_id
## <chr> <chr> <chr>
## 1 10000085207 9676830 26
## 2 10000085207 9676943 26
## 3 10000085207 9676944 26
## 4 10000085207 9676947 26
## 5 10000085207 9677008 26
## 6 10000085207 9677052 26
## 7 10000085207 9677385 26
## 8 10000085207 9677479 26
## 9 10000085207 9677791 26
## 10 10000085207 9677878 26
## # ℹ 116,194 more rows
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
# Customer Segmentation
transactions_demo <- transactions %>%
inner_join(demographics, by = "household_id")
# Product level analysis
transactions_products <- transactions %>%
inner_join(products, by = "product_id")
customer_freq <- transactions %>%
count(household_id)
ggplot(customer_freq, aes(x = n)) +
geom_histogram(binwidth = 5, fill = "steelblue", color = "black") +
labs(title = "Distribution of Shopping Frequency",
x = "Number of Transactions per Household",
y = "Number of Households") +
theme_minimal()
shopper_segments <- transactions %>%
group_by(household_id) %>%
summarize(total_transactions = n()) %>%
mutate(shopper_type = case_when(
total_transactions >= 30 ~ "Frequent Shopper",
total_transactions >= 10 ~ "Occasional Shopper",
TRUE ~ "One-Time Shopper"
))
transactions <- transactions %>%
left_join(shopper_segments, by = "household_id")
transactions %>%
group_by(shopper_type) %>%
summarize(avg_spending = mean(sales_value)) %>%
ggplot(aes(x = shopper_type, y = avg_spending, fill = shopper_type)) +
geom_col() +
labs(title = "Average Spending per Shopper Type",
x = "Shopper Type",
y = "Average Spending ($)") +
theme_minimal() +
theme(legend.position = "none")
top_products <- transactions %>%
count(product_id, sort = TRUE) %>%
top_n(15, n) %>%
inner_join(products, by = "product_id") # Join with product details
ggplot(top_products, aes(x = reorder(product_category, n), y = n)) +
geom_col(fill = "darkgreen") +
coord_flip() +
labs(title = "Top 15 Most Frequently Purchased Product Categories",
x = "Product Category",
y = "Number of Purchases") +
theme_minimal()
one_time_shoppers <- transactions %>%
group_by(household_id) %>%
filter(n() == 1) %>%
ungroup()
one_time_shoppers_products <- one_time_shoppers %>%
left_join(products, by = "product_id") %>%
count(product_category, sort = TRUE) %>%
top_n(15)
ggplot(one_time_shoppers_products, aes(x = reorder(product_category, n), y = n)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Top 15 Products Purchased by One-Time Shoppers",
x = "Product Category",
y = "Number of Purchases") +
theme_minimal()
transactions_time <- transactions %>%
mutate(transaction_date = as.Date(transaction_timestamp)) %>%
group_by(household_id) %>%
arrange(household_id, transaction_date) %>%
mutate(days_since_last = as.numeric(difftime(transaction_date, lag(transaction_date), units = "days"))) %>%
ungroup()
transactions_time <- transactions_time %>%
filter(!is.na(days_since_last))
ggplot(transactions_time, aes(x = days_since_last)) +
geom_histogram(binwidth = 5, fill = "purple", color = "black") +
scale_x_continuous(limits = c(0, 50)) + # Limit x-axis to 50 days
scale_y_continuous(labels = scales::comma, limits = c(0, 5000)) +
labs(title = "Time Between Purchases (Limited to 50 Days)",
x = "Days Since Last Purchase",
y = "Number of Transactions") +
theme_minimal()
# Summary
# The goal of this analysis was to create solutions to promote and improve customer retention as well as repeat purchases for larger valued items.
# Through the analysis of shopping frequency, spending patterns, time between purchases, and product preferences sheds light on a few opportunities for Regork to grow and increase customer retention and increase revenue. Breaking down the process I started looking at the distribution of shopping frequency which showed that a majority of customers are occasional shoppers who visit Regork infrequently to make irregular purchases. This suggests that those customers also shop at other places which in turn created reason to improve on customer retention. Next I looked at average sales per transaction which showed me that most purchases are small and one-time purchasers dominate the market despite frequent shoppers driving more revenue through volume, they don't spend a lot per visit. After finding that I dived into top products driving repeat purchases which told me that most frequently purchased items are essentials such as eggs, milk, bread and fruit which creates opportunity for Regork to be positioned as a staple spot for shopping. Next I found top items purchased by one time shoppers which highlights that coupon items are most frequently bought. Finally I found the time between purchased which told me that frequent cusotmers usually return every three weeks on average which shows loyalty, but there are ways to decrease that time to further improve revenue.
# Overall the key to customer retention and repeat purchases for larger valued items starts with increasing customer loyalty by focusing on personalized promotions through offering coupons and discounts specific to the shopper's personal needs. Next we can work on bundling solutions as the most purchased item, milk, bread, eggs and fruit we can promote discounts when customers purchases two or three of the products together. Also by increasing the price of only one or two of those products along with the bundle discount will lead to higher revenues. Using limited time offers will diminish the time between purchases causing higher sales.
# While the analysis provides valuable insight it is important to note that there are some limitations such as being a one man team and having limited scope on the situation, if other people were to continue working on this analysis then more data and a different perspective would be induced and created more in depth solutions highlighting on new substance. Another limitation is that data is ever changing and even though those changes may be subtle there is still opportunity for extremities such as pandemics which would drastically alter shopping altogether.