We are working for a company named Regork, and we have to identify a potential area of growth where the company could invest future resources to increase revenue and profit.
We are providing the best place to allocate money to maximize future profitability and revenue.
We addressed this problem by looking for trends in data that show us marketing opportunities. We found these opportunities in the beverage product category, due to the fluctuation in monthly sales.
We started with total sales in product categories, and found that there are various trends in different beverage categories. We dug deeper into soft drinks, wine, and beer. We looked at sales for these products on a month to month basis to locate any dips or inconsistencies in sales.
We looked into demographics that purchase the most of a specific product.
Another solution we thought of was focusing our marketing efforts towards the specific stores that get the most foot traffic. This should help increase sales and revenue for all departments.
Promote or market these products in months where sales are very low, or very high.
Market products towards a specific age group that is responsible for the most sales pertaining to that product.
Market towards stores that have the most foot traffic to increase sales over multiple locations.
For this project, most packages we used were included in the tidyverse package, like ggplot, purrr, tidyre, and magrittr. We used the lubridate package to filter various product sales by month, and we used the complete journey data set to mine for various patterns.
The DT package was useful for making an interactive table so the user can search for data themselves.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## Warning: package 'ggplot2' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'purrr' was built under R version 4.1.3
## Warning: package 'dplyr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.1.3
library(completejourney)
## Warning: package 'completejourney' was built under R version 4.1.3
library(ggplot2)
library(DT)
## Warning: package 'DT' was built under R version 4.1.3
We used the transactions, promotions, products, and coupons to get the most important data sets into one vector. We also renamed data sets for smoother looking code.
transactions <- get_transactions()
promotions <- get_promotions()
products <- products
coupons <- coupons
tpp <- transactions %>%
inner_join(products) %>%
inner_join(promotions)
We were able to filter and group various columns in this vector to find a pattern. In this instance, we found the top 10 product categories that grosses the most sales over 1 year. We then took those 10 product categories and looked at their sales on a month to month basis.
top10 <- tpp %>%
group_by(product_category) %>%
summarize(total_sales = sum(sales_value)) %>%
slice_max(order_by = total_sales, n = 10) %>%
pull(product_category)
Graph_1 <- tpp %>%
filter(product_category %in% top10) %>%
mutate(month = month(transaction_timestamp)) %>%
group_by(month, product_category) %>%
summarize(total_sales = sum(sales_value)) %>%
ggplot(aes(y = fct_reorder(product_category, total_sales), x = total_sales, fill = total_sales)) +
geom_col(width = .5) +
facet_wrap( ~ month) +
theme(axis.text.x = element_text(angle = 90)) +
theme(axis.text.y = element_text(size = 6)) +
scale_x_continuous(labels = scales::dollar_format()) +
ylab("Product Category") +
xlab("Total Sales") +
ggtitle("Top 10 Products that Gross Most Sales per Month") +
labs(fill = "Total Sales")
We created this graph to see the greatest variation between specific products. This gives a better visual than the previous graph, as we saw that soft drinks has the most variance and the largest inner quartile range. The other products here stay pretty consistent throughout the year.
Graph_2 <- tpp %>%
filter(product_category %in% top10) %>%
mutate(month = month(transaction_timestamp)) %>%
group_by(month, product_category) %>%
summarize(total_sales = sum(sales_value)) %>%
ggplot(aes(y = fct_reorder(product_category, total_sales), x = total_sales, fill = product_category)) +
geom_boxplot() +
scale_x_continuous(labels = scales::dollar_format()) +
labs(x = "Total Sales",
y = "Products",
fill = "Product Category") +
theme_classic()
We decided to look into which demographics purchase the most Soft drinks, and alcoholic beverages. As we can see here, the ages between 45 and 54, and that age range having a household size of 1 or 2 purchase the most soft drinks.
Table_1 <- products %>%
filter(str_detect(product_category, regex("(SOFT DRINKS)", ignore_case = TRUE))) %>%
inner_join(transactions) %>%
inner_join(demographics) %>%
group_by(household_size, age) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales))
After looking at this table, we noticed trends pertaining to certain demographics, and who spent the most specifically on beer. The same demographics purchase the most Beer as Soft Drinks.
Table_2 <- products %>%
filter(str_detect(product_category, regex("(BEERS/ALES)", ignore_case = TRUE))) %>%
inner_join(transactions) %>%
inner_join(demographics) %>%
group_by(household_size, age) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales))
The demographic for this table is slightly different. Ages 25-34 and a household size of 1 purchase the most wine, and the previous demographics we mentioned purchase the 2nd most wine.
Table_3 <- products %>%
filter(str_detect(product_category, regex("(Wine)", ignore_case = TRUE))) %>%
inner_join(transactions) %>%
inner_join(demographics) %>%
group_by(household_size, age) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales))
For this last graph, we thought about which stores sell the most product, and generate the most revenue. Below, is a lollipop chart of the top 10 stores that generate the most revenue.
L_Graph <- transactions %>%
inner_join(products) %>%
group_by(store_id, department) %>%
summarize(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales)) %>%
slice_max(order_by = total_sales) %>%
head(10) %>%
arrange(desc(total_sales)) %>%
ggplot(aes(total_sales, fct_reorder(store_id, total_sales))) +
geom_col(width = 0.07, size = 0.6, color = "black") +
geom_point(size = 3.5, color = "black") +
geom_text(aes(label = total_sales, size = NULL), nudge_x = 7) +
scale_x_continuous(labels=scales::dollar_format()) +
xlab("Total Sales in Millions") +
ylab("Store ID") +
ggtitle("Top 10 Selling Stores by ID") +
theme_classic()
“Soft Drinks” accumulate the most sales per month. However, it begins to taper down around the end of the year. We think it would be smart to better focus marketing efforts towards soft drinks through coupons, special deals, and advertisements at the end of the year. They could also reduce spending towards some of the lower revenue earning products, and allocate their spending more towards “Soft Drinks”, where the biggest difference will be made.
We chose to look at 3 products in specific: Beer, Wine, and Soft Drinks. We then were able to join the products data with the transactions data, and filter the product category column. We then selected the households column, age column, and total sales column to see which demographic purchases the most of Beer, Wine, and Soft Drinks. We found that the same demographic purchases the most Beer and Soft Drinks, which are ages 45-54, with a household size of 1-2. The demographic that purchases the most wine were ages 25-34, with a household size of 1. We think the best way to generate more revenue for these products is to market towards these specific demographics. This can be sending them coupons via email, or creating advertisements.
We looked into foot traffic, amount of transactions, and total sales, for the top 10 stores across the data set. We think focusing on these stores and marketing more heavily towards these locations could turn a higher profit for the organization. By focusing, we mean allocate more spending towards inventory of products for people to purchase in these stores.