Coffee is one of the most popular beverages in the world, with millions of people seeking that freshly brewed cup to kickstart their days. Stores like Regork capitalize on this demand by offering a wide range of coffee and mix-ins to help customers craft the perfect cup.
Although coffee is an incredibly popular product, Regork can boost its sales by attracting different demographics to purchase more of its offerings. This raises the question: How can Regork increase sales among the 25-34 age group?
To explore this question, this report is divided into three key data analysis topics: purchase timing, purchase patterns, and product preferences. Each topic is accompanied by various data visualizations to enhance understanding of the information.
This analysis will help Regork gain insight into the preferences of 25-34-year-olds who purchase coffee and understand the choices they make. By examining these three key data analysis topics, Regork can use transaction and product data to determine when coffee is bought, what it is purchased with, and how consumers prefer their coffee, allowing for targeted strategies to boost sales and customer engagement.
# shopping transactions data from 2,469 households
library(completejourney)
# tidying data and creating data visualizations
library(tidyverse)
# working with dates & times within data
library(lubridate)
Load these CompleteJourney datasets to use during data analysis:
transactions <- get_transactions()
products <- products
demographics <- demographics
coupons <- coupons
redemptions <- coupon_redemptions
To begin our analysis, we’ll look at the total coffee sales value each age group generated to decide which age group to analyze.
# filter transactions for coffee-related purchases
coffee_sales <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
group_by(household_id) %>%
summarize(total_spent = sum(sales_value, na.rm = TRUE))
# merge with customer demographics
coffee_by_age <- coffee_sales %>%
inner_join(demographics, by = 'household_id') %>%
group_by(age) %>%
summarize(total_sales = sum(total_spent, na.rm = TRUE)) %>%
arrange(age)
ggplot(data = coffee_by_age, aes(x = age, y = total_sales)) +
geom_col(fill = 'chocolate4') +
geom_text(aes(label = scales::dollar(total_sales)), size = 3.5, nudge_y = 300, color = 'black') +
scale_y_continuous(breaks = seq(0, 10000, by = 1000), labels = scales::dollar) +
labs(title = 'Coffee Sales per Age Group', x = 'Age Group', y = 'Total Sales')
Out of the six age groups of customers, 45-54-year-olds purchased the most coffee that year, spending a total of $9,200.69. In contrast, 19-24-year-olds bought the least coffee, with total sales adding up to only $743.98.
Looking at the 25-34 age group, who are no longer dependent on their parents’ income, it’s interesting that their coffee spending at grocery stores isn’t higher. As a group of young professionals who enjoy caffeine, a visit to Regork could help them stock up for the week to ensure they always have that perfect cup available.
This analysis will explore the purchase timing, purchasing patterns, and product preferences of 25-34-year-olds who buy coffee at Regork. It will offer valuable insights to help Regork attract more customers from this age group to buy coffee in its stores.
This section examines when 25-34-year-olds purchase coffee throughout the week and year.
# insert a weekday column in transactions data
transactions$weekday <- weekdays(as.Date(transactions$transaction_timestamp))
# coffee spending from 25-34 age group by day of week
coffee_by_weekday <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
group_by(weekday) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(factor(weekday, levels = c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')))
ggplot(data = coffee_by_weekday, aes(x = factor(weekday, levels = c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')), y = total_sales, group = 1)) +
geom_line(color = 'darkgreen', linewidth = 1) +
geom_text(aes(label = scales::dollar(total_sales)), size = 3.5, nudge_y = 20, color = 'black') +
scale_y_continuous(breaks = seq(200,800, by = 100), labels = scales::dollar) +
labs(title = 'Coffee Sales per Day', x = 'Day of Week', y = 'Total Sales')
This chart illustrates the total coffee sales for this age group on each day of the week. Weekends emerge as the most popular days, with Saturday and Sunday sales totaling $1,215, or 45% of total sales. Weekday sales are much lower on a per-day basis: the five weekdays together bring in $1,461.68 (55% of total sales), which averages about $292 per weekday compared with roughly $608 per weekend day.
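The weekend and weekday shares can be checked directly from the two totals reported for this chart. The snippet below is a minimal sketch in base R; the variable names are illustrative, and it assumes the weekend total is exactly $1,215.00.

```r
# Totals reported for the 25-34 age group (values taken from the text)
weekend_sales <- 1215.00   # Saturday + Sunday
weekday_sales <- 1461.68   # Monday through Friday
total_sales <- weekend_sales + weekday_sales

# Share of total coffee sales, in percent
share <- c(weekend = weekend_sales, weekday = weekday_sales) / total_sales * 100

# Average sales per day within each part of the week
per_day <- c(weekend = weekend_sales / 2, weekday = weekday_sales / 5)

round(share, 1)    # weekend 45.4, weekday 54.6
round(per_day, 2)  # weekend 607.50, weekday 292.34
```

Comparing per-day averages rather than raw totals matters here because the weekday total covers five days while the weekend total covers only two.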
# coffee spending from 25-34 age group by month
coffee_by_month <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
mutate(month = month(transaction_timestamp, label = TRUE)) %>%
group_by(month) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
ggplot(data = coffee_by_month, aes(x = month, y = total_sales, group = 1)) +
geom_line(color = 'darkgreen', linewidth = 1) +
geom_text(aes(label = scales::dollar(total_sales)), size = 3, nudge_y = 8, color = 'black') +
scale_y_continuous(labels = scales::dollar) +
labs(title = 'Coffee Sales per Month', x = 'Month', y = 'Total Sales')
Shifting to a different timeframe, this chart evaluates coffee sales by month for this age group. As the year progresses and temperatures drop, coffee sales naturally rise as people seek warmth. December marks the peak month for sales, with 25-34-year-olds purchasing $336.43 worth of coffee. While summer months see a decline in sales, it is also interesting to note that the early months of the year experience sluggish sales. This challenges the initial observation that colder months lead to increased coffee sales, as January reports only $165.70 in sales. Let’s examine spending during these two months more closely.
# daily coffee spending Dec and Jan
winter_coffee_by_day <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
mutate(date = as.Date(transaction_timestamp), month = factor(month(date, label = TRUE), levels = c('Dec', 'Jan'))) %>%
filter(month %in% c('Dec', 'Jan')) %>%
group_by(date, month) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(date)
ggplot(data = winter_coffee_by_day, aes(x = day(date), y = total_sales)) +
geom_col(fill = '#006d70') +
geom_text(aes(label = scales::dollar(total_sales)), size = 2, nudge_y = 0.75, color = 'black') +
scale_x_continuous(breaks = seq(0,31, by = 5)) +
scale_y_continuous(labels = scales::dollar) +
facet_wrap(~ month, scales = "free_x") +
labs(title = 'Daily Coffee Sales in Winter Months', subtitle = 'December and January', x = 'Day', y = 'Total Sales')
A detailed analysis of sales data for both months reveals that December’s surge in sales can be attributed to an increased frequency of purchases and higher peak sale days. December had 26 days of sales recorded, peaking at $28.01 on December 3, while January only had 16 sales days, peaking at $23.55 on January 22. Notably, two of December’s peak sales days (December 22 and 23) fall right before Christmas, indicating that coffee is likely part of this age group’s holiday shopping list. The lower sales figures in January may be linked to New Year’s resolutions to reduce caffeine intake and a desire to recover financially from holiday expenses.
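The number of sales days and the peak day per month can be read off a daily table without counting bars by hand. The following is a sketch in base R on a small made-up data frame: only the two peak values ($28.01 and $23.55) come from the text, while the remaining rows, the year, and the column names are hypothetical stand-ins for `winter_coffee_by_day`.

```r
# Hypothetical daily totals; only the two peak values come from the report
daily <- data.frame(
  date = as.Date(c("2017-12-03", "2017-12-22", "2017-12-23",
                   "2017-01-05", "2017-01-22")),
  total_sales = c(28.01, 21.50, 19.75, 8.99, 23.55)
)

# Numeric month avoids locale-dependent month names
daily$month_num <- as.integer(format(daily$date, "%m"))

# Number of days with at least one coffee sale, per month
sales_days <- tapply(daily$total_sales, daily$month_num, length)

# Rows where the daily total equals that month's maximum
month_max <- ave(daily$total_sales, daily$month_num, FUN = max)
peak_rows <- daily[daily$total_sales == month_max, ]

sales_days
peak_rows
```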
This section examines what other products 25-34-year-olds buy with their coffee, whether they use coupons, and how the total value of discounts relates to a transaction’s sales value.
# top product categories purchased with coffee
coffee_partners <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
select(basket_id) %>%
distinct() %>%
inner_join(transactions, by = 'basket_id') %>%
inner_join(products, by = 'product_id') %>%
filter(product_category != 'COFFEE') %>%
group_by(product_category) %>%
summarize(total_sales = sum(sales_value)) %>%
ungroup() %>%
arrange(desc(total_sales)) %>%
top_n(10, total_sales) %>%
mutate(product_category = factor(product_category, levels = product_category))
ggplot(data = coffee_partners, aes(x = product_category, y = total_sales)) +
geom_col(fill = '#952a88') +
geom_text(aes(label = scales::dollar(total_sales)), size = 3.5, nudge_y = 60, color = 'black') +
scale_y_continuous(labels = scales::dollar) +
labs(title = 'Top Products Purchased with Coffee', x = 'Product Category', y = 'Total Sales') +
theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1))
This chart displays the top 10 product categories, ranked by sales value, that this age group purchases in the same basket as coffee. Nearly half of this list comprises popular breakfast foods, such as cheese, milk, bread, and cold cereal, which all pair well with a good cup of coffee. These common breakfast items contribute an additional $4,142.14 in sales value alongside coffee transactions. Beverages are another favored category to accompany coffee, with milk, soft drinks, and beer adding an extra $2,936.91. The top category by sales value alongside coffee is beef, which adds $1,647.
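The basket logic behind this chart (find baskets that contain coffee, then total everything else in those baskets) can be sketched in base R on toy data. The baskets, category labels, and prices below are invented for illustration and are not Regork figures. The sketch also counts baskets per category, because ranking by sales value and ranking by purchase frequency can give different answers.

```r
# Toy baskets (hypothetical values, not Regork data)
baskets <- data.frame(
  basket_id = c(1, 1, 1, 2, 2, 3, 3, 4, 4),
  product_category = c("COFFEE", "BEEF", "FLUID MILK PRODUCTS",
                       "BEEF", "SOFT DRINKS",
                       "COFFEE", "FLUID MILK PRODUCTS",
                       "COFFEE", "CHEESE"),
  sales_value = c(5.99, 8.50, 2.79, 7.00, 1.99, 4.99, 2.79, 6.49, 3.25),
  stringsAsFactors = FALSE
)

# Baskets that include at least one coffee item
coffee_baskets <- unique(baskets$basket_id[baskets$product_category == "COFFEE"])

# Non-coffee items bought in those baskets
partners <- baskets[baskets$basket_id %in% coffee_baskets &
                      baskets$product_category != "COFFEE", ]

# Rank partner categories by sales value ...
partner_sales <- sort(tapply(partners$sales_value,
                             partners$product_category, sum),
                      decreasing = TRUE)

# ... and, separately, by how many coffee baskets they appear in
partner_baskets <- tapply(partners$basket_id, partners$product_category,
                          function(b) length(unique(b)))

partner_sales
partner_baskets
```

In this toy example beef leads on sales value while milk appears in the most coffee baskets, so the two rankings answer different questions.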
# coffee sales by coupon usage
coffee_with_coupon <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
left_join(coupons %>% inner_join(coupon_redemptions, by = "coupon_upc"), by = c("household_id", "product_id")) %>%
mutate(day = weekdays(as.Date(transaction_timestamp)), day = factor(day, levels = c('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')), coupon_used = !is.na(redemption_date))
ggplot(data = coffee_with_coupon, aes(x = day, y = sales_value, color = coupon_used)) +
geom_point(aes(alpha = coupon_used, size = coupon_used)) +
scale_color_manual(values = c('FALSE' = 'red2', 'TRUE' = '#008c00')) +
scale_alpha_manual(values = c('FALSE' = 0.2, 'TRUE' = 1)) +
scale_size_manual(values = c('FALSE' = 1.75, 'TRUE' = 2.5)) +
scale_y_continuous(labels = scales::dollar) +
labs(title = 'Coffee Coupon Transactions per Day', x = 'Day of Week', y = 'Sales Value', color = 'Coupon Used', alpha = 'Coupon Used', size = 'Coupon Used')
This graph plots each coffee transaction by this age group by day of the week and sales value, and highlights the transactions linked to a redeemed coupon. Every transaction linked to a redeemed coupon had a coffee sales value of at least $4.99, indicating that these consumers spent more than the bare minimum on coffee. Additionally, coffee transactions that occurred on weekdays were linked to more redeemed coupons, with 24 redeemed across the 5 weekdays compared to 4 across the 2 weekend days.
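The weekday and weekend redemption counts can be turned into shares and per-day rates with a few lines of base R. This is only an arithmetic check on the two counts reported for this graph; the variable names are illustrative.

```r
# Redeemed-coupon counts reported for this graph
redeemed <- c(weekday = 24, weekend = 4)
days     <- c(weekday = 5,  weekend = 2)

# Share of all redeemed coupons, in percent
redeemed_share <- redeemed / sum(redeemed) * 100

# Redeemed coupons per day of that type
redeemed_per_day <- redeemed / days

round(redeemed_share, 1)  # weekday 85.7, weekend 14.3
redeemed_per_day          # weekday 4.8, weekend 2.0
```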
# sum total value of discounts per qualifying transactions
coffee_discounts <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(product_category == 'COFFEE') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
mutate(total_discount = retail_disc + coupon_disc + coupon_match_disc) %>%
filter(total_discount > 0)
ggplot(data = coffee_discounts, aes(x = total_discount, y = sales_value)) +
geom_point(color = 'chocolate4') +
scale_x_continuous(labels = scales::dollar) +
scale_y_continuous(labels = scales::dollar) +
labs(title = 'Coffee Sales per Total of Discounts', x = 'Total of Discounts', y = 'Sales Value')
This graph summarizes all discounts applied to coffee transactions for this age group and assesses the sales value of the products to which these discounts were applied. The discounts included in this assessment come from a Regork loyalty card program, manufacturer’s coupons, and Regork’s matching of a manufacturer coupon. As the total value of discounts increases, the sales value also shows a slight increase.
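A "slight increase" read from a scatter plot can be quantified with a correlation coefficient and a fitted slope. The sketch below uses base R on invented numbers; the data frame is a hypothetical stand-in for `coffee_discounts`, so its results say nothing about the actual Regork data.

```r
# Hypothetical discount and sales values standing in for coffee_discounts
discounts <- data.frame(
  total_discount = c(0.50, 1.00, 1.50, 2.00, 3.00, 4.00),
  sales_value    = c(3.99, 4.49, 3.79, 5.99, 5.49, 6.99)
)

# Pearson correlation: strength and direction of the linear relationship
r <- cor(discounts$total_discount, discounts$sales_value)

# Linear fit: the slope estimates the change in sales value
# per additional dollar of discount
fit <- lm(sales_value ~ total_discount, data = discounts)
slope <- coef(fit)[["total_discount"]]

r
slope
```

Applied to the real transactions, a small positive slope with a weak correlation would support the "slight increase" wording, while a value near zero would suggest no meaningful relationship.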
This section explores the most popular types of coffee that 25-34-year-olds are buying, their top coffee alternatives, and what types of creamers they’re using.
# most popular types of coffee
popular_coffee <- transactions %>%
inner_join(products, by = 'product_id') %>%
inner_join(demographics, by = 'household_id') %>%
filter(product_category == 'COFFEE', age == '25-34') %>%
group_by(product_type) %>%
summarize(total_households = n_distinct(household_id)) %>%
arrange(desc(total_households)) %>%
top_n(10, total_households) %>%
mutate(product_type = factor(product_type, levels = product_type))
ggplot(data = popular_coffee, aes(x = product_type, y = total_households)) +
geom_col(fill = 'chocolate4') +
geom_text(aes(label = total_households), size = 3.5, nudge_y = 3, color = 'black') +
labs(title = 'Most Popular Coffee Types', x = 'Coffee Type', y = 'Number of Households') +
theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1))
This chart illustrates which types of coffee are purchased by the most households in this age range. Ground coffee is by far the most popular type, with over twice as many households buying it as the second most popular option, coffee creamer. The remaining entries in the top 10 include various types of ground coffee, instant coffee, and coffee beans. With all these different coffee variations being popular, customers in this age range have diverse preferences but ultimately favor ground coffee variations the most.
# top 10 tea, energy drink, and coffee sales
coffee_alternatives <- transactions %>%
inner_join(products, by = 'product_id') %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
mutate(category = case_when(grepl('TEA ', product_type, ignore.case = TRUE) ~ 'TEA', product_type == 'ENERGY DRINK' ~ 'ENERGY DRINK', product_category == 'COFFEE' ~ 'COFFEE', TRUE ~ NA_character_)) %>%
filter(!is.na(category)) %>%
group_by(product_type, category) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
ungroup() %>%
arrange(desc(total_sales)) %>%
top_n(10, total_sales)
ggplot(data = coffee_alternatives, aes(x = reorder(product_type, -total_sales), y = total_sales, fill = category)) +
geom_col() +
geom_text(aes(label = scales::dollar(total_sales)), size = 2.6, nudge_y = 50, color = 'black') +
scale_fill_manual(values = c('TEA' = '#006d70', 'COFFEE' = 'chocolate4', 'ENERGY DRINK' = '#952a88')) +
scale_y_continuous(labels = scales::dollar) +
labs(title = 'Top 10 Coffee, Tea, and Energy Drink Sales', x = 'Product Type', y = 'Total Sales', fill = 'Category') +
theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1))
This chart compares the most popular types of coffee with popular caffeinated drink alternatives for this age group. Ground coffee continues to lead sales, with a total value of $1,243.31. Energy drinks are the most popular alternative to coffee, ranking second in total sales value at $756.92. There is another significant drop-off in value to reach the next type of coffee, before stabilizing around less popular coffee types and the most popular tea options. While tea is less popular than energy drinks, it ranks higher than all but the top three types of coffee.
# creamer sales by product type for 25-34 age group
coffee_creamer <- transactions %>%
inner_join(products, by = 'product_id') %>%
filter(grepl('CREAMER', product_type, ignore.case = TRUE)) %>%
inner_join(demographics, by = 'household_id') %>%
filter(age == '25-34') %>%
group_by(product_type) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
mutate(percentage = total_sales / sum(total_sales) * 100)
ggplot(data = coffee_creamer, aes(x = "", y = total_sales, fill = product_type)) +
geom_bar(stat = 'identity', width = 1, show.legend = FALSE) +
scale_fill_manual(values = c('NON DAIRY CREAMER: DRY' = '#952a88', 'REFRIGERATED COFFEE CREAMERS' = '#006d70')) +
geom_text(aes(label = paste0(product_type, "\n", scales::dollar(total_sales))), position = position_stack(vjust=0.5), size = 3) +
coord_polar('y', start = 0) +
theme_void() +
ggtitle('Total Sales of Coffee Creamers')
This chart compares the total sales of the two types of creamers purchased by this age group: refrigerated coffee creamers and dry non-dairy creamers. Coffee creamer sales are dominated by refrigerated creamers, which account for $914.54 in total sales, or 83% of all coffee creamer sales. Dry non-dairy creamers are less popular, making up the remaining 17% of sales with $193.27. Combined creamer sales of $1,107.81 are equivalent to 41% of this age group’s $2,676.68 in total coffee sales.
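The creamer percentages follow from the three dollar figures reported for this chart, as this short base R check shows. The variable names are illustrative; the values are the ones reported in the text.

```r
# Creamer sales by type and total coffee sales for the 25-34 age group
creamer_sales <- c(refrigerated = 914.54, dry_non_dairy = 193.27)
coffee_sales_total <- 2676.68

creamer_total <- sum(creamer_sales)

# Each creamer type as a share of all creamer sales, in percent
creamer_share <- creamer_sales / creamer_total * 100

# Creamer sales relative to coffee sales, in percent
creamer_vs_coffee <- creamer_total / coffee_sales_total * 100

round(creamer_share)      # refrigerated 83, dry_non_dairy 17
round(creamer_vs_coffee)  # 41
```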
The goal of this analysis was to determine how Regork can better attract 25-34-year-olds to purchase more of their coffee products.
By examining transactions, products, demographics, coupons, and coupon redemption data, Regork can gain insight into the minds of their existing customers and understand their decision-making processes. Analyzing this data for purchase timing, patterns, and product preferences provides valuable insights into how 25-34-year-olds buy their coffee. This information can be useful for making product stocking or coupon distribution decisions.
The first interesting insight found was regarding purchase timing, as data revealed that 45% of weekly coffee sales for 25-34-year-olds occurred on weekends. Another insight was noted in the purchase patterns section, where data showed that 86% of coupons distributed in initial coffee transactions were redeemed on weekdays. A third insight was identified in the product preferences section, where ground coffee was found to be the most popular type, alongside four other variations.
Based on the insights gained from this analysis, I recommend that Regork optimize its coupon strategy. Given the higher rates of coupon redemption for weekday transactions, I would prioritize distributing coupons on those days. Individual weekday sales currently lag behind weekend sales, so focusing on high-value coupons that encourage customers to spend more could help improve those numbers. Applying these coupons to coffee types that are less frequently purchased by households could also entice customers to try different varieties, such as instant coffee. An optimized coupon strategy saves consumers money by providing more relevant and valuable discounts. However, it can also increase the pressure to spend more, as distributing high-value coupons may strain their budgets.
One limitation of this analysis is that the product data isn’t as detailed as it could be. It can be inferred that two products with different product IDs but the same category name are simply sold by different brands, but it’s not entirely clear if that assumption holds. Having brand information would provide more opportunities for analysis, as leading coffee brands could be compared within the age group. Another limitation is the scope of the timeframe, as a single year of data isn’t sufficient for comparing purchase timing. Additional data from another year could show whether the day-to-day or month-to-month coffee sales patterns for this age range are consistent across years or are anomalies.