I was tasked with identifying potential areas of growth which Regork could invest in to improve future revenues. In order to find potential opportunities for Regork, I analyzed data regarding products, demographics, and transactions for which coupons were used. After discovering the three income groups which use coupons the most, I compared the top product categories for transactions using coupons to the top product categories for all transactions. I also analyzed how coupon usage changes by month to identify potential trends and patterns.
In order to complete all necessary exploratory data analysis, the following packages were required:
library(completejourney)
library(tidyverse)
library(lubridate)
library(ggplot2)
library(dplyr)
completejourney- Provided all necessary information about Regork transactions.
tidyverse- A collection of R packages that share an underlying design and similar methods to clean and analyze tidy data.
lubridate- Functions that improve ease of working with date-times and time-spans
ggplot2- A system for declarative creating graphics, based on The Grammar of Graphics
dplyr- A set of functions designed to enable dataframe manipulation
transactions <- get_transactions()
tmonths <- transactions %>%
mutate(month = month(transaction_timestamp))
After naming the transactions data, I mutated the dataframe so that I could filter by month for future data analysis.
transactions %>%
inner_join(coupon_redemptions) %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
group_by(household_id) %>%
summarise(Sales = sum(sales_value)) %>%
inner_join(demographics) %>%
ggplot(aes(x = income, y = Sales)) +
geom_bar(stat = 'identity', fill = "purple") +
ggtitle("Total Coupon Sales by Income Level") +
scale_x_discrete("Income", guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Toal Sales", x = "Income")
This graph shows the total sales for transactions which used a coupon by income level. As you can see, the 50-74K income range was easily the highest in coupon sales, followed by 35-49K and 75-99K ranges. I chose to focus primarily on these three income ranges in further analysis as a much higher sales total than other income ranges, which would allow for more thorough and in-depth analysis.
transactions %>%
inner_join(products) %>%
inner_join(demographics) %>%
filter(str_detect(income, "35-49K")) %>%
group_by(product_category) %>%
count(product_category) %>%
filter(n > 4000) %>%
ggplot(aes(x = reorder(product_category, -n), y = n)) +
geom_bar(stat = 'identity', fill = "lightgreen") +
ggtitle("Count of Total Transactions by Category For Income 35-49K") +
scale_x_discrete("Product Category" , guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Number of Transactions", x = "Product Category")
transactions %>%
inner_join(products) %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
filter(str_detect(income, "35-49K")) %>%
group_by(product_category) %>%
count(product_category) %>%
filter(n > 265) %>%
ggplot(aes(x = reorder(product_category, -n), y = n)) +
geom_bar(stat = 'identity', fill = "lightgreen") +
ggtitle("Count of Coupon Transactions by Category For Income 35-49K") +
scale_x_discrete("Product Category" , guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Number of Transactions", x = "Product Category")
These graphs show the number of total transactions and and number of transactions using coupons for the 35-49K income range. Soft drinks, baked bread/buns/rolls, and bag snacks are all present in the total transactions graph, but are not present in the coupon transactions graph. On the other hand, beef, tropical fruit, and vegetables are all present in the coupon transactions graph, but not in the total transactions graph.
transactions %>%
inner_join(products) %>%
inner_join(demographics) %>%
filter(str_detect(income, "50-74K")) %>%
group_by(product_category) %>%
count(product_category) %>%
filter(n > 4000) %>%
ggplot(aes(x = reorder(product_category, -n), y = n)) +
geom_bar(stat = 'identity', fill = "orchid") +
ggtitle("Count of Total Transactions by Category For Income 50-74K") +
scale_x_discrete("Product Category" , guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Number of Transactions", x = "Product Category")
transactions %>%
inner_join(products) %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
filter(str_detect(income, "50-74K")) %>%
group_by(product_category) %>%
count(product_category) %>%
filter(n > 600) %>%
ggplot(aes(x = reorder(product_category, -n), y = n)) +
geom_bar(stat = 'identity', fill = "orchid") +
ggtitle("Count of Coupon Transactions by Category For Income 50-74K") +
scale_x_discrete("Product Category" , guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Number of Transactions", x = "Product Category")
These graphs show the number of total transactions and and number of transactions using coupons for the 50-74K income range. Again, the product categories of soft drinks, baked bread/buns/rolls, and bag snacks are all only present in the total transactions graph. The same is true for cheese. Beef, yogurt, tropical fruit, and frozen pizza are all only present in the coupon transactions graph.
transactions %>%
inner_join(products) %>%
inner_join(demographics) %>%
filter(str_detect(income, "75-99K")) %>%
group_by(product_category) %>%
count(product_category) %>%
filter(n > 2000) %>%
ggplot(aes(x = reorder(product_category, -n), y = n)) +
geom_bar(stat = 'identity', fill = "lightgoldenrod") +
ggtitle("Count of Total Transactions by Category For Income 75-99K") +
scale_x_discrete("Product Category" , guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Number of Transactions", x = "Product Category")
transactions %>%
inner_join(products) %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
filter(str_detect(income, "75-99K")) %>%
group_by(product_category) %>%
count(product_category) %>%
filter(n > 325) %>%
ggplot(aes(x = reorder(product_category, -n), y = n)) +
geom_bar(stat = 'identity', fill = "lightgoldenrod") +
ggtitle("Count of Coupon Transactions by Category For Income 75-99K") +
scale_x_discrete("Product Category" , guide = guide_axis(n.dodge = 2)) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Number of Transactions", x = "Product Category")
For the 75-99K income range, soft drinks, baked bread/buns/rolls, and bag snacks are all only present in the total transactions graph. Frozen meat dinners, tropical fruit, and frozen pizza are all only present in the coupon transactions graph.
tmonths %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
group_by(household_id, sales_value, month) %>%
summarise(Sales = sum(sales_value)) %>%
ggplot(aes(x = month, y = Sales)) +
geom_bar(stat = 'identity', fill = "lightblue") +
ggtitle("Total Coupon Sales by Month") +
scale_x_continuous("Month",
breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Toal Sales", x = "Month")
This graph shows the total coupon sales for all income ranges by month. There is a sizable drop in coupon sales in February, as well as a more minor drop in sales from September-November.
tmonths %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
group_by(household_id, sales_value, month) %>%
filter(str_detect(income, "35-49K")) %>%
summarise(Sales = sum(sales_value)) %>%
ggplot(aes(x = month, y = Sales)) +
geom_bar(stat = 'identity', fill = "lightgreen") +
ggtitle("Total Coupon Sales by Month For Income 35-49K") +
scale_x_continuous("Month",
breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Toal Sales", x = "Month")
This graph shows the total coupon sales by month for income range 35-49K. There are significantly less coupon sales in the beginning of the year, particularly in January and February. May and November have notable jumps in coupon sales compared to surrounding months.
tmonths %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
group_by(household_id, sales_value, month) %>%
filter(str_detect(income, "50-74K")) %>%
summarise(Sales = sum(sales_value)) %>%
ggplot(aes(x = month, y = Sales)) +
geom_bar(stat = 'identity', fill = "orchid") +
ggtitle("Total Coupon Sales by Month For Income 50-74K") +
scale_x_continuous("Month",
breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Toal Sales", x = "Month")
For shoppers in the income range 50-74K, there is a major drop in coupon sales from September-November, as well as a minor drop in sales during the month of February.
tmonths %>%
inner_join(demographics) %>%
inner_join(coupons) %>%
inner_join(coupon_redemptions) %>%
group_by(household_id, sales_value, month) %>%
filter(str_detect(income, "75-99K")) %>%
summarise(Sales = sum(sales_value)) %>%
ggplot(aes(x = month, y = Sales)) +
geom_bar(stat = 'identity', fill = "lightgoldenrod") +
labs(title = "Total Coupon Sales by Month For Income 75-99K") +
scale_x_continuous("Month",
breaks = c(1,2,3,4,5,6,7,8,9,10,11,12),
label = c("1","2","3","4","5","6","7","8","9","10","11","12")) +
scale_y_continuous(labels = scales::comma) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Toal Sales", x = "Month")
This graph shows the total coupon sales by month For shoppers in the income range 75-99K, the months of January-March have noticeably lower sales than other months. In addition, October-December shows a significant increase in coupon sales when compared to the rest of the year.
For this report, I was tasked with finding areas of growth for Regork in order to potentially improve revenue in the future. In order to do so, I focused on the role that coupons play in transactions for shoppers of different income ranges. First, I identified the three income ranges which contained the highest amount of total sales while using coupons: 35-49K, 50-74K, and 75-99K. After honing in on these three income ranges, I compared the top product categories for each income ranges’ total transactions and transactions with coupons. Through this analysis, I found many consistent products at the top of total transactions, such as soft drinks, fluid milk products, and baked bread/buns/rolls. Meanwhile, the coupon transactions charts included different product categories, such as vegetables, tropical fruits, and frozen pizza. Finally, I analyzed total coupon sales by month for all income categories as well as for the three income ranges mentioned above. I found notable drops in coupon sales often occurred at the beginning of the year (most often in January and February) as well as in the later months, with each chart showing a drop somewhere between August and November.
Given the findings that I have detailed above, I would recommend that Regork reduces the number of coupons offered for product categories that seem to be consistent performers across numerous income ranges, such as soft drinks, fluid milk products, and baked bread/buns/rolls. This is because these product categories were near the top of total transactions across all income ranges, suggesting that a decrease in offered coupons would not hurt sales for these products as much as it might for others. In turn, Regork could offer more coupons for less purchased items. Perhaps other produce items could see more regular coupon offers, which could provide similar benefits that are displayed from the high coupon sales of categories such as tropical fruit and vegetables. Providing a similar boost in sales as lower-selling produce items receive from coupons would increase profits throughout the store. I would also recommend that Regork runs two strategic marketing campaigns for their coupons. One of these should come at the beginning of the year, such as during January and February. Then, a second coupon campaign could be ran from September through October. The goal of these campaigns would be to boost coupon transactions during these months due to the drops in coupon sales that are seen at those times. This will, in turn, improve overall sales, and boost profits.
One of the limitations of this analysis was the limited data set. Although there are ~1.5 million rows of data in the transactions data set, when this data was broken down into coupon sales by income group, the total sales for numerous income groups were under 10,000, which makes it almost impossible to break this information down further and gain meaningful analysis. This limited me to fewer income groups than I would have liked to analyze. In addition, if I had more time to analyze the data set, I would likely have been able to dig deeper into the data and draw even more meaningful insights and conclusions from the data.