The goal of this report is to examine potential controllable factors that change how much of certain types of products sells at a given location. The focus will be on promotions and coupons uses.
This report uses the transactions, promotions, and products tables to examine factors such as promotions and coupon use. The report looks at the top product categories in order to compare sales in specific categories between each location.
This information will allow Regork to determine what potential avenues to focus on in order to improve sales, and whether any of these avenues work better for specific types of products.
completejourney - This library stores all of the
data that will be used. The layout of the data is explained on this
site.
Complete Journey Explanation
tidyverse - A
collection of packages that assist in data analysis via things such as
cleaning up the data.
dplyr - Helps with
manipulating data through use of functions such as joining tables and
filtering on certain criteria.
ggplot2 - Used for
creating plots using the “Grammar of Graphics”.
suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(ggplot2)))
transactions <- get_transactions()
promotions <- get_promotions()
suppressWarnings(suppressMessages(transactions %>%
inner_join(products) %>%
group_by(product_category) %>%
summarize(total_quantity = sum(quantity)) %>%
arrange(desc(total_quantity))))
suppressWarnings(suppressMessages(transactions %>%
inner_join(products) %>%
group_by(store_id) %>%
summarize(total_quantity = sum(quantity)) %>%
arrange(desc(total_quantity))))
The category for miscellaneous items is not likely to be beneficial to include due to how unclear it is, so it will be excluded. We will be looking at the top five for each category and store, which is as follows:FUEL, SOFT DRINKS, FLUID MILK PRODUCTS, BAKED BREAD/BUNS/ROLLS, and CHEESE for stores 429, 406, 375, 367, and 323.
suppressWarnings(suppressMessages(transactions %>%
inner_join(products) %>%
filter(str_detect(product_category, regex("FUEL|SOFT DRINKS|FLUID MILK PRODUCTS|CHEESE$|BAKED BREAD/BUNS/ROLLS"))) %>%
filter(str_detect(store_id, regex("429$|406$|375$|367$|323$"))) %>%
group_by(store_id, product_category) %>%
summarize(total_quantity = sum(quantity)) %>%
arrange(desc(total_quantity)) %>%
ggplot(aes(x = store_id, fill = product_category, y = total_quantity)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Number of Sales for Product Categories by Store",
subtitle = "Looking at the Top 5 Product Categories and Stores",
x = "Store ID",
y = "Quantity Sold",
fill = "Product Category")))
Fuel is clearly only a major source of sales at specific locations,
which are not generally the overall high sales locations. From here out
fuel will not be considered for this reason.
Despite large total sales, store 375 has relatively low sales of the top 5 product categories.
Store 367 sold proportionally high amounts of cheese, and store 323 sold proportionally high amounts of fluid milk products.
suppressWarnings(suppressMessages(products %>%
inner_join(promotions) %>%
group_by(product_category) %>%
tally() %>%
arrange(desc(n))))
The order is notably different, though the categories with the top 5 number of promotions are all ranked 12 or high in number of items sold and aside from the previously excluded miscellaneous and fuel, the reverse is true as well.
suppressWarnings(suppressMessages(products %>%
inner_join(promotions) %>%
inner_join(transactions) %>%
filter(str_detect(product_category, regex("SOFT DRINKS|FLUID MILK PRODUCTS|CHEESE$|BAKED BREAD/BUNS/ROLLS"))) %>%
filter(str_detect(store_id, regex("429$|406$|375$|367$|323$"))) %>%
group_by(store_id, product_category) %>%
tally() %>%
ggplot(aes(x = store_id, fill = product_category, y = n)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Number of Promotions for Product Categories by Store",
subtitle = "Looking at the Top 5 Product Categories and Stores",
x = "Store ID",
y = "Quantity Sold Per Promotion",
fill = "Product Category")))
Store 367 has a much larger number of promotions for soft drinks than the other stores, even though the promotions for the other categories generally make up relatively similar proportions of the promotions for each of the other stores. Store 367 did have the highest number of soft drink sales, but it was not that much out of proportion of total sales compared to the other stores.
Store 406 has a large number of fluid milk products promotions. Despite this, its number of fluid milk products sales are in line with the other stores, not even having the highest number.
This suggests an overall limited effect of promotions on sales, at least in regards to the categories with the highest quantity of items sold.
The coupons and coupon_redemptions tables in the database are not effective for comparing one store to another, so we will look at transactions wherein the coupon_disc field is greater than zero, which means a coupon was applied.
suppressWarnings(suppressMessages(products %>%
inner_join(transactions) %>%
filter(coupon_disc > 0) %>%
group_by(product_category) %>%
tally() %>%
arrange(desc(n))))
Soft drinks, cheese, and baked bread/buns/rolls are still in the top 10 in regards to number of coupons used, but fluid milk products is tied for 32nd place. As such, the data for fluid milk products may not be as accurate due to the smaller sample size.
suppressWarnings(suppressMessages(products %>%
inner_join(transactions) %>%
filter(coupon_disc > 0) %>%
filter(str_detect(product_category, regex("SOFT DRINKS|FLUID MILK PRODUCTS|CHEESE$|BAKED BREAD/BUNS/ROLLS"))) %>%
filter(str_detect(store_id, regex("429$|406$|375$|367$|323$"))) %>%
group_by(store_id, product_category) %>%
tally() %>%
arrange(desc(n)) %>%
ggplot(aes(x = store_id, fill = product_category, y = n)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Number of Coupon Uses for Product Categories by Store",
subtitle = "Looking at the Top 5 Product Categories and Stores",
x = "Store ID",
y = "Number of Promotions",
fill = "Product Category")))
Different stores have significantly different numbers of coupon uses and proportions used for each category. Store 323 has almost no coupon uses except in the soft drinks category, but it actually had proportionally fewer soft drink sales than most of the other locations.
Store 367 had a higher proportion of their coupon uses going towards cheese, which does line up with this store having had the highest cheese sales.
This suggests an overall limited effect of coupon usage on sales, at least in regards to the categories with the highest quantity of items sold.
This report looked into the effects of promotions and coupon usage on sales. This was done by comparing sales, promotions, and coupon usage of the top five locations and some of the top product categories. In regards to the product categories with the highest numbers of sales, both promotions and coupons have limited effects. While it does seem that there is some benefit, it does not seem to be consistent. It is possible that this is because some of these categories are basic foods that people are likely to buy anyways. Further analysis would be necessary to determine this, such as by selecting categories that still have overall significant sales but are not generally seen as basic foods. For now, the data suggests that while some resources used on promotions and coupons can bring a benefit, the company should not put an excessive amount towards these as it could potentially be a waste of resources.