Group 5: Grace Bare, Emily Theis, Leah Searcy
In this analysis, we will be identifying a potential area of growth where the company Regork could invest future resources to increase revenue and profits.
# Load Packages
library(completejourney)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(gridExtra)
Using the CompleteJourney pacakge, the data for transactions and promotions needed to be loaded into dataframes. This data is already is a usable format, so cleaning was not required.
# Load Data
transactions <- get_transactions()
promotions <- get_promotions()
The primary purpose of this analysis is to understand how coupons are being used based on department to provide ideas for improving growth. To do this, we look at which departments have the greatest quantity sold and highest sales overall, and then look deeper into the coupon distribution and coupon redemptions. By examining the data, we found an opportunity and developed a strategy to increase sales in the seafood department.
products %>%
inner_join(transactions, by = "product_id") %>%
mutate(department = ifelse(str_detect(department, "SEAFOOD"), "SEAFOOD", department)) %>%
mutate(department = ifelse(str_detect(department, "MEAT"), "MEAT", department)) %>%
group_by(department) %>%
count(quantity) %>%
summarize(total=sum(n)) %>%
arrange(desc(total)) %>%
ggplot(aes( x = total, fct_reorder(department, total), fill = factor(ifelse(department == "SEAFOOD","SEAFOOD", "Normal"))))+
geom_bar(stat="identity") +
theme_minimal() +
theme(legend.position="none") +
annotate("text", x=18000, y="SEAFOOD", label="8,767",size = 4, color = "#FF9999") +
scale_fill_manual(name="Department", values=c("#CC3366", "#FF9999")) +
scale_x_log10(labels = scales::comma) +
labs (
title = "Quantity of Items Bought per Department",
x = "Quantity of items",
y = "Department",
)
As shown in this graph, the total combined seafood items sold (fresh, frozen, packaged, etc.) is significantly lower than similar departments such as meat, deli, and produce.
#number of coupons used for each department
products %>%
inner_join(coupons, by = "product_id") %>%
mutate(department = ifelse(str_detect(department, "SEAFOOD"), "SEAFOOD", department)) %>%
mutate(department = ifelse(str_detect(department, "MEAT"), "MEAT", department)) %>%
mutate(highlight_flag = ifelse(department == 'SEAFOOD', T, F)) %>%
select(department, coupon_upc, highlight_flag) %>%
group_by(department, highlight_flag) %>%
count(coupon_upc) %>%
summarize(total=sum(n)) %>%
arrange(desc(total)) %>%
ggplot(aes(x = total, fct_reorder(department, total), color = highlight_flag)) +
geom_point(size = 3) +
theme_minimal() +
theme(legend.position = "none") +
theme(plot.subtitle=element_text(size=9, color="black")) +
scale_color_manual(values = c("TRUE" = "#FF9999", "FALSE" = "#CC3366")) +
annotate("text", x=5500, y="SEAFOOD", label="2,934",size = 4.25, color = "#FF9999") +
labs(
title = "Which Department Gives Out the Most Amount of Coupons",
subtitle = "Data about the number of coupons given out from two different stores and the top 6 departments",
x = "Number of Coupons Given Out",
y = "Department"
)
Next, we wanted to see if there could be any correlation between the department sales and the number of coupons given out. In the graph above, we can see that the amount of coupons given out follows the same trend as the total sales. Seafood has less coupons given out compared to meat and produce, which could be a reason why the number of items bought for meat and produce were greater. This indicates that if more coupons are distributed for a particular department, the quantity of sales will also increase.
# sales based on item
products %>%
inner_join(transactions, by = "product_id") %>%
mutate(department = ifelse(str_detect(department, "SEAFOOD"), "SEAFOOD", department)) %>%
mutate(department = ifelse(str_detect(department, "MEAT"), "MEAT", department)) %>%
group_by(department) %>%
summarize(total=round(mean(sales_value), 2)) %>%
arrange(desc(total)) %>%
ggplot(aes( x = total, fct_reorder(department, total), fill = factor(ifelse(department == "SEAFOOD","SEAFOOD", "Normal"))))+
geom_bar(stat="identity") +
scale_fill_manual(name="Department", values=c("#CC3366", "#FF9999")) +
theme_minimal() +
theme(legend.position="none") +
annotate("text", x=7.5, y="SEAFOOD", label="$6.05",size = 4.25, color = "#FF9999") +
labs (
title = "Avg Sales Value by Department",
x = "Avg Sales Value",
y = "Department",
)
In the above graph, you can see that the value of seafood is one of the highest departments. This means our opportunity to make more money if we sold more items is very high. Each item sold on average is about $6, which is greater than if we sold one item of deli or meat.
Next, we looked at the coupons redeemed by department.
coupon_redemptions %>%
inner_join(coupons, by = "coupon_upc") %>%
inner_join (products, by = "product_id") %>%
mutate(department = ifelse(str_detect(department, "SEAFOOD"), "SEAFOOD", department)) %>%
mutate(department = ifelse(str_detect(department, "MEAT"), "MEAT", department)) %>%
group_by(department) %>%
count(coupon_upc) %>%
summarize (total = sum(n)) %>%
arrange(desc(total)) %>%
ggplot(aes( x = total, fct_reorder(department, total), fill = factor(ifelse(department == "SEAFOOD","SEAFOOD", "Normal"))))+
geom_bar(stat="identity") +
theme_minimal() +
theme(legend.position="none") +
scale_fill_manual(name="Department", values=c("#CC3366", "#FF9999")) +
annotate("text", x=170000, y="SEAFOOD", label="79,388",size = 4.25, color = "#FF9999") +
scale_x_log10(labels = scales::comma) +
labs (
title = "Coupon Redemption Based on Department",
x = "Number of Coupons",
y = "Department",
)
NA
NA
Finally, we can evaluate which age groups already buy the most seafood and which groups redeem the most coupons. From the graphs below, we can see that the seafood sales and coupon redemptions according to age group follow the same trend. The same age groups who already buy seafood also redeem the most coupons.
plot4 <- coupon_redemptions %>%
inner_join(demographics, by = "household_id") %>%
group_by(age) %>%
count(redemption_date) %>%
summarize(total = sum (n)) %>%
arrange(desc(total)) %>%
ggplot(aes( x = age, y=total, color = age, base.size = 11)) +
geom_point(color = "#CC3366", size = 3) +
theme(legend.position="none") +
labs (
title = "Coupons Redeemed by Age Group",
x="Age",
y="Coupons"
) +
guides(color="none") +
theme_minimal() +
geom_text(aes(x="45-54", y=max(total)+50, label=round(max(total))), show.legend = FALSE, color = "#CC3366")
plot5 <- demographics %>%
inner_join(transactions, by = "household_id")%>%
inner_join(products, by = "product_id") %>%
mutate(department = ifelse(str_detect(department, "SEAFOOD"), "SEAFOOD", department)) %>%
mutate(department = ifelse(str_detect(department, "MEAT"), "MEAT", department)) %>%
group_by(department, age) %>%
filter(department == "SEAFOOD") %>%
count(quantity) %>%
summarize(total = sum (n)) %>%
arrange(desc(total)) %>%
ggplot(aes(x=age,y=total), color = "Rb", base.size = 11) +
geom_point(size = 3, color="#660099") +
labs(
title = "Seafood Sales by Age Group",
x = "Age",
y = "Sales"
) +
theme_minimal()+
geom_text(aes(x="45-54", y=max(total)+100, label=round(max(total))), show.legend = FALSE, color = "#660099")
grid.arrange(plot4, plot5, ncol = 2)
Overall, the Seafood department is an area that could be targeted to increase sales. This can be done by giving out more Seafood coupons and advertising to the 45-54 age group, the 35-44 age group, and the 25-34 age group. These age groups are most likely to buy Seafood and redeem the most coupons. By targeting the age groups that use coupons, this can also increase the number of Seafood bought since it might have been too expensive for them to buy before the coupon.