The task at hand is to create a data evaluation of Kroger coupon use in 2017. What follows is the findings that I had while analyzing the data and creating understandable metrics.
When looking at a series of data there are many ways to evaluate and form this data into a successful analytical discovery. The question is: Does the income range of customers affect coupon usage by customers in different aspects of data categories? I wanted to find where we should manage our target market in certain times of year and usage on types of coupon in varying household incomes.
breakdown of midterm
library(cowplot)
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ lubridate::stamp() masks cowplot::stamp()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
cowplot: Enchanced ggplot2 by providing tools for combining multiple plots into a single figure
completejourney: Provides consumer transaction datasets for analyzing grocery shopping behavior
ggplot2: A powerful data visualization package based on the grammar of graphics
tidyverse: A collection of R packages for data manipulation, visualization, and analysis.
lubridate: Simplifies working with date-time objects in R
dyplr: Provides efficient functions for data wrangling using a readable syntax
scales: Controls axis scaling, formatting, and transformations in ggplot2
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
promotions <- get_promotions()
promotions
## # A tibble: 20,940,529 × 5
## product_id store_id display_location mailer_location week
## <chr> <chr> <fct> <fct> <int>
## 1 1000050 316 9 0 1
## 2 1000050 337 3 0 1
## 3 1000050 441 5 0 1
## 4 1000092 292 0 A 1
## 5 1000092 293 0 A 1
## 6 1000092 295 0 A 1
## 7 1000092 298 0 A 1
## 8 1000092 299 0 A 1
## 9 1000092 304 0 A 1
## 10 1000092 306 0 A 1
## # ℹ 20,940,519 more rows
campaigns
## # A tibble: 6,589 × 2
## campaign_id household_id
## <chr> <chr>
## 1 1 105
## 2 1 1238
## 3 1 1258
## 4 1 1483
## 5 1 2200
## 6 1 293
## 7 1 529
## 8 1 536
## 9 1 568
## 10 1 630
## # ℹ 6,579 more rows
campaign_descriptions
## # A tibble: 27 × 4
## campaign_id campaign_type start_date end_date
## <chr> <ord> <date> <date>
## 1 1 Type B 2017-03-03 2017-04-09
## 2 2 Type B 2017-03-08 2017-04-09
## 3 3 Type C 2017-03-13 2017-05-08
## 4 4 Type B 2017-03-29 2017-04-30
## 5 5 Type B 2017-04-03 2017-05-07
## 6 6 Type C 2017-04-19 2017-05-21
## 7 7 Type B 2017-04-24 2017-05-28
## 8 8 Type A 2017-05-08 2017-06-25
## 9 9 Type B 2017-05-31 2017-07-02
## 10 10 Type B 2017-06-28 2017-07-30
## # ℹ 17 more rows
coupons
## # A tibble: 116,204 × 3
## coupon_upc product_id campaign_id
## <chr> <chr> <chr>
## 1 10000085207 9676830 26
## 2 10000085207 9676943 26
## 3 10000085207 9676944 26
## 4 10000085207 9676947 26
## 5 10000085207 9677008 26
## 6 10000085207 9677052 26
## 7 10000085207 9677385 26
## 8 10000085207 9677479 26
## 9 10000085207 9677791 26
## 10 10000085207 9677878 26
## # ℹ 116,194 more rows
coupon_redemptions
## # A tibble: 2,102 × 4
## household_id coupon_upc campaign_id redemption_date
## <chr> <chr> <chr> <date>
## 1 1029 51380041013 26 2017-01-01
## 2 1029 51380041313 26 2017-01-01
## 3 165 53377610033 26 2017-01-03
## 4 712 51380041013 26 2017-01-07
## 5 712 54300016033 26 2017-01-07
## 6 2488 51200092776 26 2017-01-10
## 7 2488 51410010050 26 2017-01-10
## 8 1923 53000012033 26 2017-01-14
## 9 1923 54300021057 26 2017-01-14
## 10 1923 57047091041 26 2017-01-14
## # ℹ 2,092 more rows
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
In the first graph I was able to pull the overall usage of coupons by Kroger customers in the year 2017 by the overall household income of customers. The goal of this graph is to pinpoint where we are seeing lots of coupon usage and where we see very little or moderate coupon usage. Knowing this data allows us to see where our coupon marketing strategies lie. Once we can see the main three groups we should target, which based on the data we see is the ranges of 50-74K, 75-99k, and 35-49K in that order are the household income ranges that use the most coupons. After reading this data we know that we should be marketing more to the lower income households that are not using coupons as much as they could. As well as keep pumping out marketing material and coupons to the middle class income range because this has clearly been successful.
season_redempt1 <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
group_by(income) %>%
summarise(coupon_upc = n()) %>%
arrange(income) %>%
ggplot(aes(x = income, y = coupon_upc)) +
geom_line(aes(group = 1), color = "blue") +
geom_point(data = . %>% top_n(20, coupon_upc), aes(x = income, y = coupon_upc), color = "red", size = 3) +
labs(x = "Income", y = "Coupon Usage") +
ggtitle("Coupon Usage by Household Income") +
geom_text(data = . %>% top_n(20, coupon_upc), aes(x = income, y = coupon_upc, label = coupon_upc),
color = "black", vjust = -0.5, size = 3)
season_redempt1
In the second graph we wanted to look at the total coupon redemptions by month with those top 3 income ranges we are focusing on, 50-74K, 75-99k, and 35-49K. The graph shows us that the months of August and November are extremely popular for coupon usage. The goal of this graph is to see where we have lots of activity in certain months. The data can help us show ratios of when to put out more or less coupons in certain months. The holiday season, by no surprise, is the most popular time for coupon usage so we should continue producing lots of coupons during this time, as well as promote more coupon use. Something we found interesting was the month of August, we can see that back to school shopping promotes a huge coupon usage in these income ranges of customers. We can conclude that in the month of August we should produce more back to school item coupons for customers.
income_time_series <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
mutate(income_range = case_when(
income %in% c("50-74K") ~ "50-74K",
income %in% c("75-99K") ~ "75-99K",
income %in% c("35-49K") ~ "35-49K",
income %in% c("25-34K") ~ "25-34K"
)) %>%
filter(income_range %in% c("50-74K", "75-99K", "35-49K", "25-34K")) %>%
mutate(month = floor_date(redemption_date, "month")) %>%
group_by(month, income_range) %>%
summarize(total_redemptions = n()) %>%
ggplot(aes(x = month, y = total_redemptions, fill = income_range)) +
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Month",
y = "Total Redemptions",
fill = "Income Range",
subtitle = "This graph looks at the total coupon redemptions for different income ranges, grouped by month.",
caption = "Source: Coupon Redemptions and Demographics data sets from the completejourney R package.") +
ggtitle("Total Coupon Redemptions by Income Range and Month") +
theme(axis.text.x = element_text(angle = 45, size = rel(1), margin = margin(.5, unit = "cm"), vjust = 1),
legend.position = "top",
axis.title.x = element_text(vjust = 4),
plot.title = element_text(hjust = 0.3),
legend.justification = -.03) +
scale_x_date(date_breaks = "1 month", date_labels = "%b %Y")
## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.
income_time_series
In this third graph I am evaluating the categories of coupons used in the two highest coupon usage months, August and November. The goal of this graph is to see our most used types of coupons in the busy season. When we are able to pinpoint the certain categories we can ask ourselves the questions of why are these coupons successful, then making an educated guess we can push out more coupons like the ones that are doing well. For example if there is a coupon for lunch meat, then we should also push out more coupons for bread, because they go together. We can also evaluate the numbers and see if coupon allocation is in the proper categories in these months. This we can do for all months and better organize what type of coupons we are pushing out to customers in certain times of the year because of popularity. The graph shows us how popular beef and lunch meat coupons are in the month of August, and November shows us beauty products are more popular in November, therefore we should continue marketing these coupons to our customers.
library(tidyverse)
library(scales)
top_by_cat <- coupon_redemptions %>%
inner_join(coupons, by = "coupon_upc", relationship = "many-to-many") %>%
inner_join(products, by = "product_id") %>%
mutate(period = case_when(
month(redemption_date) %in% c(8) ~ "August",
month(redemption_date) %in% c(11) ~ "November",
month(redemption_date) %in% c(12) ~ "December",
month(redemption_date) %in% c(9) ~ "September"
)) %>%
filter(period %in% c("August", "November", "December", "September")) %>%
group_by(period, product_category) %>%
summarise(coupon_redemptions = n(), .groups = "drop") %>%
top_n(5) %>%
arrange(desc(period), desc(coupon_redemptions)) %>%
mutate(product_category = reorder(product_category, -coupon_redemptions)) %>%
ggplot(aes(x = coupon_redemptions, y = product_category)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(x = "Coupon Redemptions",
y = "Product Category",
title = "Highest Coupon Use by Months",
caption = "Source: Coupon Redemptions, Coupons, and Products data sets from the completejourney R package.") +
coord_flip() +
geom_text(
aes(label = paste(round(coupon_redemptions / 1e3, 2), "K")),
position = position_dodge(width = 1),
vjust = -.2, size = 2.5
) +
facet_wrap(~ period, scales = 'free_x') +
scale_x_continuous(labels = label_number(suffix = "K", scale = 1e-3)) +
theme(axis.text.x = element_text(size = 5, angle = 90, hjust = 0.95, vjust = 0.2),
axis.text.y = element_text(size = 7))
## Selecting by coupon_redemptions
top_by_cat
The fourth and final graph is about being as specific as possible. In this graph we are looking at the three most popular coupon campaign types in the most popular month, November, in the income by households of 50-74k, which is the most popular as well. I wanted to see how well each of the campaign types did when coupons were the most used by the most used category. The data shows us that the type A campaign clearly worked the best in the most popular coupon month. While type C was clearly lacking and type B was far behind as well. This can tell us that we need to work more on type C and B campaigns because we are struggling with that. Then for campaign A continue what is working there because it is clearly successful. We can ask ourselves what we can learn for type C and B campaigns that we got from the type A campaign.
joined_data <- coupon_redemptions %>% inner_join(campaigns, by
= c(“campaign_id”, “household_id”)) joined_data <- coupon_redemptions
%>% inner_join(coupons, by = “coupon_upc”) %>%
left_join(campaigns, by = “campaign_id”) %>%
inner_join(demographics, by = “household_id”) nov_coup_inc <-
joined_data %>% mutate(month = month(redemption_date, label = TRUE,
abbr = FALSE)) %>%
filter(month == “November” & income == “50-74K”) %>%
group_by(campaign_id, campaign_type) %>%
summarise(total_redemptions = n(), .groups = “drop”) %>%
slice_max(order_by = total_redemptions, n = 10) %>%
arrange(desc(total_redemptions)) %>%
ggplot(aes(x = total_redemptions, y = reorder(campaign_type,
total_redemptions), fill = campaign_type)) +
geom_bar(stat = “identity”, show.legend = FALSE) +
geom_text(aes(label = total_redemptions), hjust = -0.2, size = 4)
+
labs( x = “Total Coupon Redemptions”, y = “Campaign Type”, title = “Most
Popular Campaign Types in November (Income: 50-74K)”, caption = “Source:
Complete Journey Dataset” ) + scale_x_continuous(labels = scales::comma)
+
theme_minimal() + theme(axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 12), plot.title = element_text(face =
“bold”, size = 14))
nov_coup_inc
The goal of this project was to be able to create a data evaluation that determined how different household incomes used coupons. We asked this question for the purpose of finding out where the allocation of different types of coupons should be distributed to our customers. We were able to pinpoint certain times of year where coupons and customers are more active, then furthermore evaluating which campaigns were the most successful in coupon usage based on incomes. Overall we made lots of findings in our data evaluation for Kroger. We are able to notice lots of things in the different elements of coupons and their users. With the findings we can help allocate coupons correctly based on income, time of the year, product type, and campaign success. The evaluation showed that the the very most successful coupon category is a customer who’s household income ranges from 50-75k, in the month of November under the marketing campaign type A. We even evaluated the most used coupons by products in certain months. We found that beauty product coupons are extremely popular in the month of November. With all of this information we found we can help the finance, marketing, operations, and overall sales departments because of our findings. Overall the project was a success in pinpointing certain coupons under certain incomes and time of year.