Is there an opportunity to take the best selling products per season, and sell them in other seasons to increase the overall profit from products sold per season?
We started by looking at which products have spikes in sales for each season. We classify a spike in sales as a product having more than 50% of its sales happen in a single season. We then plotted the top 25 selling products in each season. After analyzing our findings from those plots, we then plotted how those top 25 products from each season performed in their off-seasons. From there, we were able to make conclusions on which products would be profitable to sell year-round.
If we understand the purchasing patterns for seasonal products, we can determine if there is any incentive to market them year round. From this deeper understanding, we can develop better marketing campaigns for these products; which could ultimately lead to higher overall revenue from product sales per season, and more product competition per season.
#Required libraries
library(ggplot2)
library(tidyverse)
library(tibble)
library(lubridate)
library(completejourney)
# Required data sets from complete journey
transactions <- get_transactions()
data("products")
# A function to create a vector containing the months that correlate to each season
get_season <- function(x){
case_when(
month(x) %in% c(12, 1, 2) ~ "Winter",
month(x) %in% c(3, 4, 5) ~ "Spring",
month(x) %in% c(6, 7, 8) ~ "Summer",
month(x) %in% c(9, 10, 11) ~ "Fall",
TRUE ~ NA_character_
)
}
seasons <- c("Winter", "Spring", "Summer", "Fall")
# A function to create a season variable from the transaction_timestamp
df <- transactions %>%
inner_join(products, by = "product_id") %>%
mutate(season = get_season(transaction_timestamp)) %>%
filter(!is.na(season)) %>%
group_by(product_type, season) %>%
summarize(sales_season = sum(sales_value, na.rm = TRUE), .groups = "drop")
# Creating sales_ratio field, which tracks what proportion of total sales of that product occurred in each season
df_sales_ratio <- df %>%
group_by(product_type) %>%
mutate(
total_product_sales = sum(sales_season),
sales_ratio = sales_season / total_product_sales) %>%
ungroup()
# Selects the top 25 products for each season by sales- where that season had more than half of all sales
seasonal_products <- df_sales_ratio %>%
filter(sales_ratio > 0.5) %>%
group_by(season) %>%
arrange(desc(total_product_sales)) %>%
slice_head(n = 25) %>%
ungroup()
# For loop to create a top 25 products plot for each season
for (s in seasons) {
p <- ggplot(filter(seasonal_products, season == s), aes(x = total_product_sales, y = reorder(product_type, total_product_sales), fill = season)) +
geom_col() +
ggtitle(sprintf("Top %s Seasonal Products", s)) +
labs(x = "Total Sales", y = "Products") +
theme(legend.position = "none") +
scale_x_continuous(labels = scales::dollar)
print(p)
}
In this coding chunk, we first began by joining the transactions data set to the products data set by the “product_id” variable key. We then created a new variable, using mutate, called season using the function get_season(), which categorizes transactions into a season based on the date they occurred. After grouping by product type and season, we calculated the total sales for each product, per season. After that, we compared the product sales per season to the total sales for each product; the “sales_ratio” field measures the proportion of total sales that happened in a specific season. We filtered only for records that had a sales ratio greater than 50%, meaning the majority of sales for the plotted products happened in the one season- this was to ensure our plots accurately assessed . Lastly, we selected the top 25 products per season to plot, ordered by their total sales amount. These top 25 selling products, per season, are shown in the four plots above.
# Filter seasonal products by Winter to identify which of the top 25 products from the winter are purchased in other months
seasonal_products %>%
filter(season == "Winter") %>%
inner_join(products, by = "product_type") %>%
select(season, product_type, product_id) %>%
inner_join(transactions, by = "product_id") %>%
mutate(season_comparison = get_season(transaction_timestamp)) %>%
filter(season != season_comparison) %>%
group_by(product_type, season_comparison) %>%
summarize(total_sales = sum(sales_value), .groups = "drop") %>%
arrange(product_type, season_comparison)
## # A tibble: 53 × 3
## product_type season_comparison total_sales
## <chr> <chr> <dbl>
## 1 AS SEEN ON TV Fall 85.0
## 2 AS SEEN ON TV Spring 9.99
## 3 AS SEEN ON TV Summer 39.0
## 4 BAGS/TAGS/TISSUE/BOXES Fall 67.8
## 5 BAGS/TAGS/TISSUE/BOXES Spring 1.79
## 6 BANDANA/SCARVES Fall 188.
## 7 BANDANA/SCARVES Spring 31.0
## 8 BOXED CHRISTMAS CARDS Fall 214
## 9 BULK PLASTICS Fall 26.7
## 10 BULK PLASTICS Spring 32.1
## # ℹ 43 more rows
# Filter seasonal products by Spring to identify which of the top 25 products from the spring are purchased in other months
seasonal_products %>%
filter(season == "Spring") %>%
inner_join(products, by = "product_type") %>%
select(season, product_type, product_id) %>%
inner_join(transactions, by = "product_id") %>%
mutate(season_comparison = get_season(transaction_timestamp)) %>%
filter(season != season_comparison) %>%
group_by(product_type, season_comparison) %>%
summarize(total_sales = sum(sales_value), .groups = "drop") %>%
arrange(product_type, season_comparison)
## # A tibble: 42 × 3
## product_type season_comparison total_sales
## <chr> <chr> <dbl>
## 1 CB (NON ROSE)LILY Fall 51.9
## 2 CB (NON ROSE)LILY Summer 33.0
## 3 CB (NON ROSE)LILY Winter 22.0
## 4 CONTINUITY Fall 23.0
## 5 CONTINUITY Summer 110.
## 6 CONTINUITY Winter 125.
## 7 CORN FRESH PACKAGED Summer 13.5
## 8 CORN FRESH PACKAGED Winter 29.9
## 9 DISH GARDENS Fall 95.0
## 10 DISH GARDENS Winter 73.0
## # ℹ 32 more rows
# Filter seasonal products by Summer to identify which of the top 25 products from the summer are purchased in other months
seasonal_products %>%
filter(season == "Summer") %>%
inner_join(products, by = "product_type") %>%
select(season, product_type, product_id) %>%
inner_join(transactions, by = "product_id") %>%
mutate(season_comparison = get_season(transaction_timestamp)) %>%
filter(season != season_comparison) %>%
group_by(product_type, season_comparison) %>%
summarize(total_sales = sum(sales_value), .groups = "drop") %>%
arrange(product_type, season_comparison)
## # A tibble: 62 × 3
## product_type season_comparison total_sales
## <chr> <chr> <dbl>
## 1 AGE RESTRICTED FIREWORKS Spring 92.0
## 2 AGE RESTRICTED FIREWORKS Winter 10.5
## 3 AREA REPELLANTS Spring 46.4
## 4 AREA RUGS Spring 54.9
## 5 AREA RUGS Winter 60.0
## 6 ATHLETIC GIFT ITEMS Fall 159.
## 7 ATHLETIC GIFT ITEMS Winter 36.0
## 8 BLUEBERRIES Fall 536.
## 9 BLUEBERRIES Spring 595.
## 10 BLUEBERRIES Winter 530.
## # ℹ 52 more rows
# Filter seasonal products by Fall to identify which of the top 25 products from the fall are purchased in other months
seasonal_products %>%
filter(season == "Fall") %>%
inner_join(products, by = "product_type") %>%
select(season, product_type, product_id) %>%
inner_join(transactions, by = "product_id") %>%
mutate(season_comparison = get_season(transaction_timestamp)) %>%
filter(season != season_comparison) %>%
group_by(product_type, season_comparison) %>%
summarize(total_sales = sum(sales_value), .groups = "drop") %>%
arrange(product_type, season_comparison)
## # A tibble: 59 × 3
## product_type season_comparison total_sales
## <chr> <chr> <dbl>
## 1 ALL UNDERWEAR Spring 71.9
## 2 ALL UNDERWEAR Summer 37.9
## 3 ALL UNDERWEAR Winter 11.0
## 4 APPLES OTHER (BULK&BAG) Spring 442.
## 5 APPLES OTHER (BULK&BAG) Summer 209.
## 6 APPLES OTHER (BULK&BAG) Winter 587.
## 7 ARTIFICIAL TREES Winter 74.0
## 8 CANDY Spring 29.4
## 9 CANDY Summer 89.6
## 10 CANDY Winter 8.87
## # ℹ 49 more rows
# Plotting the results from filtering the seasonal products by season to identify the top 25 products from every season that are purchased in out of season months
for (s in seasons) {
data <- seasonal_products %>%
filter(season == s) %>%
inner_join(products, by = "product_type") %>%
select(season, product_type, product_id) %>%
inner_join(transactions, by = "product_id") %>%
mutate(season_comparison = get_season(transaction_timestamp)) %>%
filter(season != season_comparison) %>%
group_by(product_type, season_comparison) %>%
summarize(total_sales = sum(sales_value), .groups = "drop") %>%
arrange(product_type, season_comparison)
plot <- ggplot(data, aes(total_sales, reorder(product_type, total_sales), fill = season_comparison)) +
geom_col() +
ggtitle(sprintf("Sales of %s Products Year-Round", s)) +
labs(x = "Total Sales", y = "Product") +
theme(legend.position = "none") +
facet_wrap(~ season_comparison) +
theme_minimal()
print(plot)
}
In this coding chunk, we rejoined the seasonal_products data frame back to products and transactions data frames and filtered for each season one at a time. We then used the get_season() function again in order to look at transactions for specific seasonal products that occurred in an off-season. Again, we calculated total sales for the product across the three other seasons and visualized the results. This allows us to determine if any trends exist for purchasing seasonal products at other times of the year, and how we can adjust marketing strategies accordingly to maximize profit from products every season if trends do exist.
Is there an opportunity to take the best selling products per season, and sell them in other seasons to increase the overall profit from products sold per season?
We started by looking at which products have spikes in sales for each season. We classified a spike in sales as a product having more than 50% of its sales happen in a single season. We then visualized the top 25 selling products in each season. After analyzing our findings from those plots, we then visualized how those top 25 products from each season performed in their off-seasons. From there, we were able to make conclusions on which products would be profitable to sell year-round.
Winter: More wireless phones are sold in the winter due to Christmas time sales, but we see none sold in the fall. We suggest having a sale on wireless phones in the fall to boost sales before Christmas.
Spring: The top products are specific to this season – I.E. patio sets and garden supplies only sell well in the spring – we suggest not wasting resources on marketing spring time top sellers in other seasons.
Summer: Athletic gift items have low sales in the spring; we suggest a sale for these items in the spring as they are popular in the summer and we could boost sales right before their peak selling season.
Fall: We suggest marketing electronic gift cards in the spring because they have significant sales in the fall and winter, and there could be sales opportunities that could drive up the sales of this product in the spring.
Limitations: Some products will only be sold during a specific season because they will not be purchased at other times. For example, Christmas cards are only sold in the winter, they are not in store during spring, summer and fall; therefore, this season-specific product can’t be purchased year-round. There are examples like Christmas cards in the winter, in every season. In conclusion, certain products are not feasible to sell year-round because the demand for those products is tied to a specific season.
Future steps: We want to find a way to solidly differentiate between products that simply will not be purchased year-round, no matter how many resources we dedicate to trying to market them, and the products that have real potential to break into the top 25 selling items in another season. We believe the analysis that we have done is a great starting place to begin selecting items to market in their off-seasons, but our next step would be to continue visualizing the data frames to keep honing in on the best products that are worth dedicating more marketing resources to the rest of the year. Future analysis could be calculating off-season demand in years past or adding in another data set from complete journey – like coupons – to evaluate how off-season marketing is working.