Introduction

Business problem we are addressing:

Is there an opportunity to take the best selling products per season, and sell them in other seasons to increase the overall profit from products sold per season?

How our team addressed this problem:

We started by looking at which products have spikes in sales for each season. We classify a spike in sales as a product having more than 50% of its sales happen in a single season. We then plotted the top 25 selling products in each season. After analyzing our findings from those plots, we then plotted how those top 25 products from each season performed in their off-seasons. From there, we were able to make conclusions on which products would be profitable to sell year-round.

How this analysis will help:

If we understand the purchasing patterns for seasonal products, we can determine if there is any incentive to market them year round. From this deeper understanding, we can develop better marketing campaigns for these products; which could ultimately lead to higher overall revenue from product sales per season, and more product competition per season.

Required packages and libraries

#Required libraries
library(ggplot2)
library(tidyverse)
library(tibble)
library(lubridate)
library(completejourney)

# Required data sets from complete journey
transactions <- get_transactions() 
data("products")

Explaination regarding purpose of packages:

  1. Ggplot helps us visualize our data, by creating readable and informative plots, making analysis of the data clear and efficient.
  2. Tidyverse has several packages that fall under its umbrella, but in total it allows us to manipulate our data sets by granting access to filtering, joining, mutating, grouping, etc.
  3. Tibble allows us to view data tables with a clean and consistent format.
  4. Lubridate allows us to mutate our transaction date and assign them to a season, this makes it so we can analyze the top sold products in each season.
  5. Completejounrey has all of the data sets we are using for our analysis. We analyze the transactions and products data sets from the completejourney package

Coding chunk for plotting top 25 products per season by sales:

# A function to create a vector containing the months that correlate to each season 
get_season <- function(x){ 
  case_when( 
    month(x) %in% c(12, 1, 2) ~ "Winter",  
    month(x) %in% c(3, 4, 5) ~ "Spring",  
    month(x) %in% c(6, 7, 8) ~ "Summer",  
    month(x) %in% c(9, 10, 11) ~ "Fall", 
    TRUE ~ NA_character_
  ) 
} 
seasons <- c("Winter", "Spring", "Summer", "Fall")

# A function to create a season variable from the transaction_timestamp 
df <- transactions %>%  
  inner_join(products, by = "product_id") %>% 
  mutate(season = get_season(transaction_timestamp)) %>% 
  filter(!is.na(season)) %>%
  group_by(product_type, season) %>%  
  summarize(sales_season = sum(sales_value, na.rm = TRUE), .groups = "drop") 

# Creating sales_ratio field, which tracks what proportion of total sales of that product occurred in each season  
df_sales_ratio <- df %>% 
  group_by(product_type) %>% 
  mutate(
    total_product_sales = sum(sales_season), 
    sales_ratio = sales_season / total_product_sales) %>% 
  ungroup()

# Selects the top 25 products for each season by sales- where that season had more than half of all sales  
seasonal_products <- df_sales_ratio %>% 
  filter(sales_ratio > 0.5) %>% 
  group_by(season) %>% 
  arrange(desc(total_product_sales)) %>% 
  slice_head(n = 25) %>% 
  ungroup()

# For loop to create a top 25 products plot for each season 
for (s in seasons) { 
    p <- ggplot(filter(seasonal_products, season == s), aes(x = total_product_sales, y = reorder(product_type, total_product_sales), fill = season)) + 
      geom_col() + 
      ggtitle(sprintf("Top %s Seasonal Products", s)) + 
      labs(x = "Total Sales", y = "Products") + 
      theme(legend.position = "none") +
      scale_x_continuous(labels = scales::dollar)
print(p) 
}

Explaination of the coding chunk:

In this coding chunk, we first began by joining the transactions data set to the products data set by the “product_id” variable key. We then created a new variable, using mutate, called season using the function get_season(), which categorizes transactions into a season based on the date they occurred. After grouping by product type and season, we calculated the total sales for each product, per season. After that, we compared the product sales per season to the total sales for each product; the “sales_ratio” field measures the proportion of total sales that happened in a specific season. We filtered only for records that had a sales ratio greater than 50%, meaning the majority of sales for the plotted products happened in the one season- this was to ensure our plots accurately assessed . Lastly, we selected the top 25 products per season to plot, ordered by their total sales amount. These top 25 selling products, per season, are shown in the four plots above.

Coding chunk for which of the top 25 products, per season, are purchased in other months:

# Filter seasonal products by Winter to identify which of the top 25 products from the winter are purchased in other months 
seasonal_products %>%  
  filter(season == "Winter") %>%  
  inner_join(products, by = "product_type") %>%  
  select(season, product_type, product_id) %>%  
  inner_join(transactions, by = "product_id") %>%  
  mutate(season_comparison = get_season(transaction_timestamp)) %>%  
  filter(season != season_comparison) %>%  
  group_by(product_type, season_comparison) %>%  
  summarize(total_sales = sum(sales_value), .groups = "drop") %>%  
  arrange(product_type, season_comparison) 
## # A tibble: 53 × 3
##    product_type           season_comparison total_sales
##    <chr>                  <chr>                   <dbl>
##  1 AS SEEN ON TV          Fall                    85.0 
##  2 AS SEEN ON TV          Spring                   9.99
##  3 AS SEEN ON TV          Summer                  39.0 
##  4 BAGS/TAGS/TISSUE/BOXES Fall                    67.8 
##  5 BAGS/TAGS/TISSUE/BOXES Spring                   1.79
##  6 BANDANA/SCARVES        Fall                   188.  
##  7 BANDANA/SCARVES        Spring                  31.0 
##  8 BOXED CHRISTMAS CARDS  Fall                   214   
##  9 BULK PLASTICS          Fall                    26.7 
## 10 BULK PLASTICS          Spring                  32.1 
## # ℹ 43 more rows
# Filter seasonal products by Spring to identify which of the top 25 products from the spring are purchased in other months 
seasonal_products %>%  
  filter(season == "Spring") %>%  
  inner_join(products, by = "product_type") %>%  
  select(season, product_type, product_id) %>%  
  inner_join(transactions, by = "product_id") %>%  
  mutate(season_comparison = get_season(transaction_timestamp)) %>%  
  filter(season != season_comparison) %>%  
  group_by(product_type, season_comparison) %>%  
  summarize(total_sales = sum(sales_value), .groups = "drop") %>%  
  arrange(product_type, season_comparison)
## # A tibble: 42 × 3
##    product_type        season_comparison total_sales
##    <chr>               <chr>                   <dbl>
##  1 CB (NON ROSE)LILY   Fall                     51.9
##  2 CB (NON ROSE)LILY   Summer                   33.0
##  3 CB (NON ROSE)LILY   Winter                   22.0
##  4 CONTINUITY          Fall                     23.0
##  5 CONTINUITY          Summer                  110. 
##  6 CONTINUITY          Winter                  125. 
##  7 CORN FRESH PACKAGED Summer                   13.5
##  8 CORN FRESH PACKAGED Winter                   29.9
##  9 DISH GARDENS        Fall                     95.0
## 10 DISH GARDENS        Winter                   73.0
## # ℹ 32 more rows
# Filter seasonal products by Summer to identify which of the top 25 products from the summer are purchased in other months
seasonal_products %>%  
  filter(season == "Summer") %>%  
  inner_join(products, by = "product_type") %>%  
  select(season, product_type, product_id) %>%  
  inner_join(transactions, by = "product_id") %>%  
  mutate(season_comparison = get_season(transaction_timestamp)) %>%  
  filter(season != season_comparison) %>%  
  group_by(product_type, season_comparison) %>%  
  summarize(total_sales = sum(sales_value), .groups = "drop") %>%  
  arrange(product_type, season_comparison)
## # A tibble: 62 × 3
##    product_type             season_comparison total_sales
##    <chr>                    <chr>                   <dbl>
##  1 AGE RESTRICTED FIREWORKS Spring                   92.0
##  2 AGE RESTRICTED FIREWORKS Winter                   10.5
##  3 AREA REPELLANTS          Spring                   46.4
##  4 AREA RUGS                Spring                   54.9
##  5 AREA RUGS                Winter                   60.0
##  6 ATHLETIC GIFT ITEMS      Fall                    159. 
##  7 ATHLETIC GIFT ITEMS      Winter                   36.0
##  8 BLUEBERRIES              Fall                    536. 
##  9 BLUEBERRIES              Spring                  595. 
## 10 BLUEBERRIES              Winter                  530. 
## # ℹ 52 more rows
# Filter seasonal products by Fall to identify which of the top 25 products from the fall are purchased in other months 
seasonal_products %>%  
filter(season == "Fall") %>%  
inner_join(products, by = "product_type") %>%  
select(season, product_type, product_id) %>%  
inner_join(transactions, by = "product_id") %>%  
mutate(season_comparison = get_season(transaction_timestamp)) %>%  
filter(season != season_comparison) %>%  
group_by(product_type, season_comparison) %>%  
summarize(total_sales = sum(sales_value), .groups = "drop") %>%  
arrange(product_type, season_comparison) 
## # A tibble: 59 × 3
##    product_type            season_comparison total_sales
##    <chr>                   <chr>                   <dbl>
##  1 ALL UNDERWEAR           Spring                  71.9 
##  2 ALL UNDERWEAR           Summer                  37.9 
##  3 ALL UNDERWEAR           Winter                  11.0 
##  4 APPLES OTHER (BULK&BAG) Spring                 442.  
##  5 APPLES OTHER (BULK&BAG) Summer                 209.  
##  6 APPLES OTHER (BULK&BAG) Winter                 587.  
##  7 ARTIFICIAL TREES        Winter                  74.0 
##  8 CANDY                   Spring                  29.4 
##  9 CANDY                   Summer                  89.6 
## 10 CANDY                   Winter                   8.87
## # ℹ 49 more rows
# Plotting the results from filtering the seasonal products by season to identify the top 25 products from every season that are purchased in out of season months
for (s in seasons) {  
  data <- seasonal_products %>%  
  filter(season == s) %>%  
  inner_join(products, by = "product_type") %>%  
  select(season, product_type, product_id) %>%  
  inner_join(transactions, by = "product_id") %>%  
  mutate(season_comparison = get_season(transaction_timestamp)) %>%  
  filter(season != season_comparison) %>%  
  group_by(product_type, season_comparison) %>%  
  summarize(total_sales = sum(sales_value), .groups = "drop") %>%  
  arrange(product_type, season_comparison)
  
plot <- ggplot(data, aes(total_sales, reorder(product_type, total_sales), fill = season_comparison)) + 
  geom_col() + 
  ggtitle(sprintf("Sales of %s Products Year-Round", s)) + 
  labs(x = "Total Sales", y = "Product") + 
  theme(legend.position = "none") + 
  facet_wrap(~ season_comparison) +
  theme_minimal()

  print(plot) 
} 

Explaination of the coding chunk:

In this coding chunk, we rejoined the seasonal_products data frame back to products and transactions data frames and filtered for each season one at a time. We then used the get_season() function again in order to look at transactions for specific seasonal products that occurred in an off-season. Again, we calculated total sales for the product across the three other seasons and visualized the results. This allows us to determine if any trends exist for purchasing seasonal products at other times of the year, and how we can adjust marketing strategies accordingly to maximize profit from products every season if trends do exist.

Report Summary

Business problem that was addressed:

Is there an opportunity to take the best selling products per season, and sell them in other seasons to increase the overall profit from products sold per season?

How our group addressed the problem:

We started by looking at which products have spikes in sales for each season. We classified a spike in sales as a product having more than 50% of its sales happen in a single season. We then visualized the top 25 selling products in each season. After analyzing our findings from those plots, we then visualized how those top 25 products from each season performed in their off-seasons. From there, we were able to make conclusions on which products would be profitable to sell year-round.

Insights provided by the analysis:

Winter: More wireless phones are sold in the winter due to Christmas time sales, but we see none sold in the fall. We suggest having a sale on wireless phones in the fall to boost sales before Christmas.

Spring: The top products are specific to this season – I.E. patio sets and garden supplies only sell well in the spring – we suggest not wasting resources on marketing spring time top sellers in other seasons.

Summer: Athletic gift items have low sales in the spring; we suggest a sale for these items in the spring as they are popular in the summer and we could boost sales right before their peak selling season.

Fall: We suggest marketing electronic gift cards in the spring because they have significant sales in the fall and winter, and there could be sales opportunities that could drive up the sales of this product in the spring.

Limitations, and future steps:

Limitations: Some products will only be sold during a specific season because they will not be purchased at other times. For example, Christmas cards are only sold in the winter, they are not in store during spring, summer and fall; therefore, this season-specific product can’t be purchased year-round. There are examples like Christmas cards in the winter, in every season. In conclusion, certain products are not feasible to sell year-round because the demand for those products is tied to a specific season.

Future steps: We want to find a way to solidly differentiate between products that simply will not be purchased year-round, no matter how many resources we dedicate to trying to market them, and the products that have real potential to break into the top 25 selling items in another season. We believe the analysis that we have done is a great starting place to begin selecting items to market in their off-seasons, but our next step would be to continue visualizing the data frames to keep honing in on the best products that are worth dedicating more marketing resources to the rest of the year. Future analysis could be calculating off-season demand in years past or adding in another data set from complete journey – like coupons – to evaluate how off-season marketing is working.