The problem I dissected, are there certain products that fluctuate, seasonally, more than others and what income groups influence these trends the most.
To close in on trends in the data, I found the seasonal trends based off past sales numbers of certain products, calculated percent change from season to season and considered the different demographics and how they effected the potential seasonal trends.The data was collected from CompleteJourney which supplied millions of past transactions that were then filtered out based on the studied trends.
The report provided will provide Regork representatives in the Marketing and Store Management with supplemental information for future strategy. Specifically, with the demographics mentioned, Marketing representatives will be able to understand when promotions might be useful for certain products. As for Store Management, this group will benefit from when stores might benefit the most from stocking certain products.
CompleteJourney: This package was the baseline of data for the report. All data concerning demographics, specific transactions, and products were pulled from CompleteJourney
RColorBrewer: This package allows graphs and charts to be manipulated to appear more visually appealing.
Scales: This package is used to adjust the size of different charts and graphs
DT: This package was used to add tables of data for better visualization
The phrases Volatility and Seasonal Impact will be used interchangeably throughout this report. Seasonal Impact, in a sense, is the measure of Volatility from season to season.
#Load packages to be used
suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(lubridate)))
suppressWarnings(suppressMessages(library(RColorBrewer)))
suppressWarnings(suppressMessages(library(scales)))
suppressWarnings(suppressMessages(library(DT)))
suppressWarnings(suppressMessages(library(knitr)))
transactions<- get_transactions()
promotions<- get_promotions
custom_colors<- c("Under 15K" = "turquoise3","15-24K"="springgreen2","25-34K"="lightsteelblue","35-49K"="sienna2","50-74K"="lightblue","75-99K"="rosybrown2","100-124K"="red3","125-149K"="orchid","150-174K"="maroon2","175-199K"="steelblue2","200-249K"="lightskyblue","250K+"="steelblue4")
custom_colors_2<- c("Winter"="lightblue","Spring"="lightgreen","Summer"="yellow","Fall"="orange")
suppressMessages(prod_tran<- inner_join(products,transactions))
suppressMessages(dem_tran<- inner_join(prod_tran,demographics))
suppressMessages(dem_tran_szn<- dem_tran%>%
mutate(month = month(transaction_timestamp),
season = case_when(
month %in% c(12,1,2) ~"Winter",
month %in% c(3,4,5) ~"Spring",
month %in% c(6,7,8) ~"Summer",
month %in% c(9,10,11) ~"Fall",
TRUE~ "Unknown"
)))
This data structure is a visual representation of sales per season, as clearly visible, each season has similar sales.
sales_by_season<- dem_tran_szn%>%
group_by(season)%>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
kable(head(sales_by_season, 4), caption = "Seasonal Sales Values", format = "markdown")
| season | total_sales |
|---|---|
| Fall | 664729.5 |
| Spring | 648750.2 |
| Summer | 661338.2 |
| Winter | 650066.9 |
Lets dive deeper into the next observation to gain a greater scope of understanding. ### Products by season This simple chart references which top products perform the best by season.This data is useful, however, I will dive deeper into different trends next.
# Top Sales per Season
suppressMessages({seasonal_sales_value<- dem_tran_szn%>%
group_by(product_category,season)%>%
summarize(total_sales = sum(sales_value))
top_products<- seasonal_sales_value%>%
group_by(season)%>%
top_n(3,total_sales)%>%
arrange(season,desc(total_sales))})
ggplot(top_products, aes(season,total_sales))+
geom_bar(stat = 'identity', aes(fill = season))+
facet_wrap(~ product_category) +
labs(title = "Top Product Sales per Season",
x = "Products",
y = "Total Sales") +
theme_minimal() +
scale_fill_brewer(palette = "Set1")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Observation 1
Seasonal impact was determined using the average change in product sales by season. Products with a higher seasonal impact tend to be more volatile as seasons change compared to those with low seasonal impact that are not as volatile.
# To find the seasonal impact of all products
suppressMessages({seasonal_sales<- dem_tran_szn%>%
group_by(product_category,season)%>%
summarize(average_sales = mean(sales_value, na.rm = TRUE))
seasonal_changes<- seasonal_sales%>%
group_by(product_category)%>%
summarize(seasonal_impact = max(average_sales)-min(average_sales))})
# To rearrange the seasonal impact
seasonal_changes_df<- seasonal_changes%>%
arrange(desc(seasonal_impact))
DT::datatable(head(seasonal_changes_df, 10))
# To order the data structure to show the top 5 products in terms of seasonal impact
top_5_products<-seasonal_changes%>%
arrange(desc(seasonal_impact))%>%
slice_head(n = 5)
This data frame narrows down the data. As shown, it is evident that Home Furnishings have the highest seasonal impact. Regork management should pay close attention to these products, as taking advantage of the different seasonal product trends may lead to higher sales. Although this chart is slimmed down, it still does not provide proficient evidence to conclude a trend.
ggplot(top_5_products, aes(product_category,seasonal_impact))+
geom_bar(stat = "identity", aes(fill = product_category), width = .7)+
labs(title = "Top 5 Products with Highest Seasonal Impact",
x = "Product",
y= "Seasonal Impact")+
theme_minimal()+
theme(axis.text.x = element_text(angle = 50, hjust = .9)) +
scale_fill_brewer(palette = "Dark2") +
scale_y_continuous(labels = comma)
# Sales per product
suppressMessages(total_sales_per_product <- dem_tran_szn %>%
group_by(product_category, season) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales)))
As depicted in the chart, most of these products have small amounts of sales, thus creating more of a likelihood that they are outliers. However,as visible with Spring/Summer Seasonal, there is a clear relationship between the different seasons. Most of the sales for this product category happen during Spring and Summer. Home Furnishings along with Smoking Cessations also show a significantly greater sales volume in Fall, compared to other months.
# Sales per top 5 products with highest seasonal impact
suppressMessages({top_product_sales<- dem_tran_szn%>%
filter(product_category %in% c('GIFT & FRUIT BASKETS','HOME FURNISHINGS','NATURAL VITAMINS','SMOKING CESSATIONS','SPRING/SUMMER SEASONAL'))%>%
group_by(season, product_category)%>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
ggplot(top_product_sales, aes(season,total_sales, fill = season))+
geom_bar(stat = 'identity')+
facet_wrap(~ product_category) +
labs(title = "Sales by Season for Top Product Categories",
x = "Season",
y = "Total Sales") +
scale_fill_manual(values = custom_colors_2)+
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
coord_cartesian(ylim = c(0, max(top_product_sales$total_sales) * 1.1))})
Although using all sales values is useful, there was clearly a possibility that several of the products in the Top 5 Products with highest seasonal impact, could be outliers.This new data structure is going to filter out all sales values under 50 and test if that makes a difference.
#observation 2
suppressMessages({cleaned_data <- dem_tran_szn[!is.na(dem_tran_szn$product_category), ]
over_50_sales<- cleaned_data%>%
filter(sales_value>50)%>%
group_by(product_category,season)
seasonal_sales_average_2<- over_50_sales%>%
summarize(average_sales = mean(sales_value, na.rm = TRUE))
seasonal_impact_2<- seasonal_sales_average_2%>%
group_by(product_category)%>%
summarize(seasonal_impact = max(average_sales)-min(average_sales))%>%
arrange(desc(seasonal_impact))})
top_5_over_50<- seasonal_impact_2%>%
slice_head(.,n = 5)
# New Top 5 Products with highest seasonal impact
DT::datatable(top_5_over_50)
This chart shows greater numbers for representation. As shown, the Spring/Summer Seasonal category still makes an appearance, but with smaller numbers comparable to the rest. Pork has high volume numbers, which could be reason for higher seasonal impact based on sales value. Seafood - Frozen shows higher numbers in Spring and winter, thus leading to more volatility from season to season. Tickets show greater sales in Fall compared to the rest of the seasons. Finally, Prepaid Wireless&Accessories shows a slightly higher increase in Winter, potentially due to the holiday seasons.
suppressMessages({specific_products_2<- c("TICKETS",'PORK','PREPAID WIRELESS&ACCESSORIES','SEAFOOD - FROZEN','SPRING/SUMMER SEASONAL')
top_product_sales_2<- dem_tran_szn%>%
filter(product_category %in% specific_products_2)%>%
group_by(season, product_category)%>%
summarize(total_sales = sum(sales_value, na.rm = TRUE))
ggplot(top_product_sales_2, aes(season,total_sales, fill = season))+
geom_bar(stat = 'identity', width = 1)+
facet_wrap(~ product_category) +
labs(title = "Sales by Season for Top Product Categories",
x = "Season",
y = "Total Sales") +
theme_minimal() +
scale_fill_manual(values = custom_colors_2)+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_y_continuous(limits = c(0,8000))
})
specific_products<- c('GIFT & FRUIT BASKETS','HOME FURNISHINGS','NATURAL VITAMINS','SMOKING CESSATIONS','SPRING/SUMMER SEASONAL')
filtered_data<- dem_tran_szn%>%
filter(product_category %in% specific_products)
suppressMessages(income_demographic_sales <- filtered_data %>%
group_by(product_category, income) %>%
summarize(Total_Sales = sum(sales_value, na.rm = TRUE)) %>%
ungroup())
top_income_demographics <- income_demographic_sales %>%
group_by(product_category) %>%
top_n(1, Total_Sales) %>%
arrange(product_category, desc(Total_Sales))
To gain a better understanding of who is buying these products at Regork, we now dive into sales by income level. This chart shows that the 50-74K range has the highest total sales, while 200-249K has the lowest. This information will prove useful when comparing the relationship between seasonal impact and income level.
suppressMessages({income_season_sales<- dem_tran_szn%>%
group_by(income,season)%>%
summarize(total_sales = sum(sales_value))%>%
arrange(desc(season))
ggplot(income_season_sales, aes(income, total_sales, fill = income))+
geom_bar(stat = 'identity')+
facet_wrap(~ season)+
labs(title = "Sales by Income Level per Season",
x = "Income",
y = "Total Sales")+
theme_minimal() +
scale_fill_manual(values = custom_colors)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))})
This chart explains which income group purchases the the most amount of products that have the highest seasonal impact. The largest demographic that purchases Spring/Summer Seasonal products is 50-74K. It is important to note that this income range is the group with the highest total sales for each season. Smoking Cessations category, it is a lower income, 35-49K that purchases the most products. One surprise that may arise is noticing that the majority of Home Furnishings are sold to individuals within the 15-24K income range
# Observation 1 with all products
suppressMessages(ggplot(top_income_demographics, aes(x = reorder(product_category, -Total_Sales), y = Total_Sales, fill = income)) +
geom_bar(stat = "identity") +
theme_minimal() +
scale_fill_manual(values = custom_colors)+
labs(title = "Top Income per Top Products Seasonally Impacted", x = "Product", y = "Total Sales") +
coord_flip())
### Income per Top 5 Products; Sales Value >50
This observation shows a clear example that shoppers in the 50-74K range of income tend to be the people who influence the trend of volatility from season to season. They are the overhwelming majority of sales per products sold with the highest seasonal impact.
# Observation 2 with all products
filtered_data_2<- dem_tran_szn%>%
filter(product_category %in% specific_products_2)
suppressMessages(income_demographic_sales_2 <- filtered_data_2 %>%
group_by(product_category, income) %>%
summarize(Total_Sales = sum(sales_value, na.rm = TRUE)) %>%
ungroup())
top_income_demographics_2 <- income_demographic_sales_2 %>%
group_by(product_category) %>%
top_n(1, Total_Sales) %>%
arrange(product_category, desc(Total_Sales))
suppressMessages(ggplot(top_income_demographics_2, aes(x = reorder(product_category, -Total_Sales), y = Total_Sales, fill = income)) +
geom_bar(stat = "identity") +
theme_minimal() +
scale_fill_manual(values = custom_colors)+
labs(title = "Top Income; Sales Value >50", x = "Product", y = "Total Sales") +
coord_flip())
This analysis was conducted to uncover seasonal product trends and which demographics (income) were most influential to these trends. The proposal is for the purpose of increasing awareness to the Marketing and Management levels of Regork.
To first find trends in this data, it was important to filter based on seasonal transactions. From there, I was able to find the top products and their seasonal sales. After that, I wanted to break down which products were impacted the most by changing seasons. These products were measured based on a subtraction of the smallest average sale from the largest average sale by season. I then filtered out the Top 5 Product Categories by seasonal impact to view any potential trends. The products were <GIFT & FRUIT BASKETS, NATURAL VITAMINS, HOME FURNISHINGS, SMOKING CESSATIONS, SPRING/SUMMER SEASONAL>. I found it important to then include all sales values per season to help visualize the trends per season. It was then observed that some of these products had relatively small sales and could have potentially been outliers. With this new information, I then decided to add a filter to the original data set, with a new observation of all sales over 50. This observation proved to conclude a larger volume of evidence. The new product categories included <PREPAID WIRELESS&ACCESSORIES, SPRING/SUMMER SEASONAL, TICKETS, SEAFOOD - FROZEN, PORK>. There were different products in this data, which helped gain a better scope of what products could be looked into for better results. My next step in this process was to filter the total sales per income level. After this, I looked at the Top 5 products from each of the observations and found which income levels influence these seasonal trends the most. It was evident that the income range 50-74K showed overwhelming evidence that they tend to be the shoppers who are impacted most by seasonal changes.
The evidence provided concludes the overall trend of product categories that appear to fluctuate in sales most by season and who influences these trends. The recommendation I have found is that products within the Spring/Summer Seasonal should be promoted to customers who are in the income range of 50-74K during the Spring and Summer months. The Marketing and Management departments should also take a close look into Pork, Prepaid Wireless&Accessories, Seafood - Frozen, and Tickets to accurately stock these items each season, as these products tend to fluctuate the most per season in total sales value. With these recommendations, proper promotions can be conducted and sitting inventory shall be reduced based off of the trends mentioned.
The major limitation to this analysis would be the fact that this data was based off of sales value. The higher priced products might be more volatile and see larger numbers than others. It would be wise to analyze a larger scale of products. This research was done to depict the possibility of seasonal trends which was proved to be accurate. Also, the income range, 50-74K, purchases the largest amount of items in total.This might have skewed the data and given inaccurate results based off of population percentage.