In 2017, the North American Sports market size was 69.14 U.S. billion dollars, and is steadily increasing today. According to a publishing by Statista [1], it is one of the largest markets in the world.
Business Question How can the Regork grocery store chain increase profit during the most popular sporting events in North America?
This question will be answered through the analysis of transactions, products, and promotions data sets of the completejourney Regork data [2]. The methodologies utilized are descriptive and inferential statistics as well as regression and clustering analysis.
This report will prove that Regork has untapped potential for profit in the sales of private brands during sporting events.
library(completejourney) # one year grocery store data sets from a group of 2,469 households
library(tidyverse) # tidying data and working with other R packages
library(tidyr) # functions used for tidying or cleaning up messy data
library(ggplot2) # data visualization plotting system using "Grammar of Graphics"
library(knitr) # dynamic report generation in R
library(dplyr) # manipulating and transforming data ( joining, etc.)
library(naniar) # find data quality issues; summarize, visualize, and manipulate missing data
library(stringr) # manipulation text
library(lubridate) # functions used for working with dates and times
library(DT) # For formatting tables
# Loading completejourney data sets
transactions <- get_transactions() %>% na.omit
promotions <- get_promotions() %>% na.omit
products <- products %>% na.omit
combined_data <- transactions %>%
inner_join(products, by = "product_id")
# Filter data for February 5, 2017
combined_data_feb5 <- combined_data %>%
filter(product_type!="GASOLINE-REG UNLEADED") %>%
filter(transaction_timestamp >= as.POSIXct("2017-02-05 00:00:00") &
transaction_timestamp < as.POSIXct("2017-02-06 00:00:00"))
datatable(
combined_data_feb5 %>%
group_by(product_category) %>%
summarize(quantity = sum(quantity, na.rm = TRUE)) %>%
arrange(desc(quantity)),
caption = "Top products Super Bowl")
The above displays the most profitable products sold on Super Bowl Sunday. I will now focus on the products that make the most sense to promote during a sports night. Product categories like “SOFT DRINKS, BEERS/ALES, BAG SNACKS, MEAT, BEEF, LIQUOR” and these include multiple product types. I filtered out the category “gasoline” because it’s not a product of focus for this analysis.
Let’s focus on just the product categories below. Product type not needed to differentiate between National and Private brands.
datatable(
combined_data_feb5 %>%
group_by(product_category) %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Trsh|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
summarize (total_sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales_value)),
caption = "Top products Super Bowl")
Changing product category names to use for the rest of the analysis; “Meat, Bread, Cheese, Liquor, Juice, Snacks, Water” Navigate the sample table below with the arrows on the top, next to the variables.
relevant_prod_combined_data_feb5 <- combined_data_feb5 %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
mutate(
product_category = case_when(
str_detect(product_category, regex("PORK|Meat|beef|turkey", ignore_case = TRUE)) ~ "Meat",
str_detect(product_category, regex("Bread", ignore_case = TRUE)) ~ "Bread",
str_detect(product_category, regex("Cheese", ignore_case = TRUE)) ~ "Cheese",
str_detect(product_category, regex("Liquor|BEERS/ALES|wine", ignore_case = TRUE)) ~ "Liquor",
str_detect(product_category, regex("JCE|Juice|SOFT DRINKS", ignore_case = TRUE)) ~ "Juice & Soda",
str_detect(product_category, regex("Snacks|BAG SNACKS", ignore_case = TRUE)) ~ "Snacks",
str_detect(product_category, regex("Water", ignore_case = TRUE)) ~ "Water",
TRUE ~ product_category) )
datatable( head(relevant_prod_combined_data_feb5) %>%
arrange(desc(sales_value)) %>%
select(c("product_category", "brand", "product_type", "quantity", "sales_value"))
)
The plot helps to visualize products that are National vs Private brands and we can now see the focused product categories after renaming them.
ggplot(relevant_prod_combined_data_feb5,
aes(x = product_category, fill = brand)) +
geom_bar(position = "dodge") +
labs(title = "Super Bowl: National vs Private Brand Sales" , x = "Products Categories")
The results are not shocking. As National brands tend to sell more most days, and there is a relationship when more overall sales increase,the each brand type sales count increases as well.
But, as we can observe, there is an opportunity to market private brands such as liquors as 0 were sold to the 2,469 households on Super Bowl day.
We then conduct the same investigation during March Madness, although it is a longer period.
combined_data <- transactions %>%
inner_join(products, by = "product_id")
# Filter data for March- April 3, 2017
combined_data_march <- combined_data %>%
na.omit %>%
filter(product_type!="GASOLINE-REG UNLEADED") %>%
filter(transaction_timestamp >= as.POSIXct("2017-03-14 00:00:00") &
transaction_timestamp < as.POSIXct("2017-04-03 00:00:00"))
datatable(
combined_data_march %>%
group_by(product_category) %>%
summarize(total_sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales_value)),
caption = "Top products march madness")
datatable(
combined_data_march %>%
group_by(product_category) %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
summarize (total_sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales_value)),
caption = "Top products march madness")
relevant_prod_combined_data_march <- combined_data_march %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
mutate(
product_category = case_when(
str_detect(product_category, regex("pork|Meat|beef|Turkey", ignore_case = TRUE)) ~ "Meat",
str_detect(product_category, regex("Bread", ignore_case = TRUE)) ~ "Bread",
str_detect(product_category, regex("Cheese", ignore_case = TRUE)) ~ "Cheese",
str_detect(product_category, regex("Liquor|LIQUOR|BEERS/ALES|wine", ignore_case = TRUE)) ~ "Liquor",
str_detect(product_category, regex("Juice|SOFT DRINKS|jce", ignore_case = TRUE)) ~ "Juice & Soda",
str_detect(product_category, regex("Snacks|BAG SNACKS", ignore_case = TRUE)) ~ "Snacks",
str_detect(product_category, regex("Water", ignore_case = TRUE)) ~ "Water",
TRUE ~ product_category) )
datatable( head(relevant_prod_combined_data_march) %>%
arrange(desc(sales_value)) %>%
select(c("product_category", "brand", "product_type", "quantity", "sales_value"))
)
ggplot(relevant_prod_combined_data_march,
aes(x = product_category, fill = brand)) +
geom_bar(position = "dodge") + scale_y_log10() +
labs(title = "March Madness: National vs Private Brand Sales", x = "Produts Category")
Similar to the Super Bowl sales, private liquor brands have more potential for growth during March Madness.
Finally, we also look at another long period during the World Series.
combined_data <- transactions %>%
inner_join(products, by = "product_id")
# Filter data for March- April 3, 2017
combined_data_wseries <- combined_data %>%
na.omit %>%
filter(product_type!="GASOLINE-REG UNLEADED") %>%
filter(transaction_timestamp >= as.POSIXct("2017-10-24 00:00:00") &
transaction_timestamp < as.POSIXct("2017-11-01 00:00:00"))
datatable(
combined_data_wseries %>%
group_by(product_category) %>%
summarize(total_sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales_value)),
caption = "Top products world series")
datatable(
combined_data_wseries %>%
group_by(product_category) %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
summarize (total_sales_value = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales_value)),
caption = "Top products world series")
relevant_prod_combined_data_wseries <- combined_data_wseries %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
mutate(
product_category = case_when(
str_detect(product_category, regex("pork|Meat|beef|Turkey", ignore_case = TRUE)) ~ "Meat",
str_detect(product_category, regex("Bread", ignore_case = TRUE)) ~ "Bread",
str_detect(product_category, regex("Cheese", ignore_case = TRUE)) ~ "Cheese",
str_detect(product_category, regex("Liquor|LIQUOR|BEERS/ALES|wine", ignore_case = TRUE)) ~ "Liquor",
str_detect(product_category, regex("Juice|SOFT DRINKS|jce", ignore_case = TRUE)) ~ "Juice & Soda",
str_detect(product_category, regex("Snacks|BAG SNACKS", ignore_case = TRUE)) ~ "Snacks",
str_detect(product_category, regex("Water", ignore_case = TRUE)) ~ "Water",
TRUE ~ product_category) )
datatable( head(relevant_prod_combined_data_wseries) %>%
arrange(desc(sales_value)) %>%
select(c("product_category", "brand", "product_type", "quantity", "sales_value"))
)
ggplot(relevant_prod_combined_data_wseries,
aes(x = product_category, fill = brand)) +
geom_bar(position = "dodge") + scale_y_log10() +
labs(title = "World Series: National vs Private Brand Sales", x = "Produts Category")
Similar to the Super Bowl and March Madness sales, private liquor brands have more potential for growth during the World Series.
Joining Promotions Data:
Defining Variable - Display location (0-display, 1-store front, 2-store rear, 3-front end cap, 4-mid-aisle end cap, 5-rear end cap, 6-side aisle end cap, 7-in-aisle, 9-secondary location display, A-in-shelf)
promotions_feb5 <- combined_data_feb5 %>%
inner_join(promotions, by = c("product_id","store_id","week")) %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
mutate(
product_category = case_when(
str_detect(product_category, regex("pork|Meat|beef|Turkey", ignore_case = TRUE)) ~ "Meat",
str_detect(product_category, regex("Bread", ignore_case = TRUE)) ~ "Bread",
str_detect(product_category, regex("Cheese", ignore_case = TRUE)) ~ "Cheese",
str_detect(product_category, regex("Liquor|LIQUOR|BEERS/ALES|wine", ignore_case = TRUE)) ~ "Liquor",
str_detect(product_category, regex("Juice|SOFT DRINKS|jce", ignore_case = TRUE)) ~ "Juice & Soda",
str_detect(product_category, regex("Snacks|BAG SNACKS", ignore_case = TRUE)) ~ "Snacks",
str_detect(product_category, regex("Water", ignore_case = TRUE)) ~ "Water",
TRUE ~ product_category) )
promotions_march <- combined_data_march %>%
inner_join(promotions, by = c("product_id","store_id","week")) %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
mutate(
product_category = case_when(
str_detect(product_category, regex("pork|Meat|beef|Turkey", ignore_case = TRUE)) ~ "Meat",
str_detect(product_category, regex("Bread", ignore_case = TRUE)) ~ "Bread",
str_detect(product_category, regex("Cheese", ignore_case = TRUE)) ~ "Cheese",
str_detect(product_category, regex("Liquor|LIQUOR|BEERS/ALES|wine", ignore_case = TRUE)) ~ "Liquor",
str_detect(product_category, regex("Juice|SOFT DRINKS|jce", ignore_case = TRUE)) ~ "Juice & Soda",
str_detect(product_category, regex("Snacks|BAG SNACKS", ignore_case = TRUE)) ~ "Snacks",
str_detect(product_category, regex("Water", ignore_case = TRUE)) ~ "Water",
TRUE ~ product_category) )
promotions_wseries <- combined_data_wseries %>%
inner_join(promotions, by = c("product_id","store_id","week")) %>%
filter(str_detect(product_category, regex('SOFT DRINKS|BAG SNACKS|BEEF|BEERS/ALES|Cheese|Bread|Juices|Water|meat|LIQUOR|WINE|TURKEY|PORK|JCE', ignore_case = TRUE))) %>%
mutate(
product_category = case_when(
str_detect(product_category, regex("pork|Meat|beef|Turkey", ignore_case = TRUE)) ~ "Meat",
str_detect(product_category, regex("Bread", ignore_case = TRUE)) ~ "Bread",
str_detect(product_category, regex("Cheese", ignore_case = TRUE)) ~ "Cheese",
str_detect(product_category, regex("Liquor|LIQUOR|BEERS/ALES|wine", ignore_case = TRUE)) ~ "Liquor",
str_detect(product_category, regex("Juice|SOFT DRINKS|jce", ignore_case = TRUE)) ~ "Juice & Soda",
str_detect(product_category, regex("Snacks|BAG SNACKS", ignore_case = TRUE)) ~ "Snacks",
str_detect(product_category, regex("Water", ignore_case = TRUE)) ~ "Water",
TRUE ~ product_category) )
datatable(promotions_feb5 %>%
filter(display_location == "1") %>%
group_by(brand) %>%
summarize(total_sold = sum(quantity)), caption = "Super Bowl FRONT of Store Sales")
datatable(promotions_feb5 %>%
filter(display_location == "2") %>%
group_by(brand) %>%
summarize(total_sold = sum(quantity)), caption = "Super Bowl BACK of Store Sales")
plot_promotions_feb5 <- promotions_feb5 %>%
mutate(
display_location = case_when(
str_detect(display_location, regex("0", ignore_case = TRUE)) ~ "Display",
str_detect(display_location, regex("1", ignore_case = TRUE)) ~ "Front",
str_detect(display_location, regex("2", ignore_case = TRUE)) ~ "Rear",
str_detect(display_location, regex("3", ignore_case = TRUE)) ~ "FrontEnd",
str_detect(display_location, regex("4", ignore_case = TRUE)) ~ "MidAisle",
str_detect(display_location, regex("5", ignore_case = TRUE)) ~ "RearEnd",
str_detect(display_location, regex("6", ignore_case = TRUE)) ~ "SideAisle",
str_detect(display_location, regex("7", ignore_case = TRUE)) ~ "InAisle",
str_detect(display_location, regex("9", ignore_case = TRUE)) ~ "SecDisplay",
str_detect(display_location, regex("A", ignore_case = TRUE)) ~ "Shelf",
TRUE ~ display_location) )
plot_promotions_feb5 %>%
ggplot(aes(x = brand, y = quantity, fill = display_location)) +
geom_bar(stat = "identity") + facet_grid(~display_location) +
labs( title = "SuperBowl Sales Quantity based on Display location",
y = "Quantity Sold", x = "Brand Type") + theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(axis.text.y = element_text(angle = 90)) +
scale_fill_discrete(name = "Display Locations")
datatable(promotions_march %>%
filter(display_location == "1") %>%
group_by(brand) %>%
summarize(total_sold = sum(quantity)), caption = "March Madness FRONT of Store Sales")
datatable(promotions_march %>%
filter(display_location == "2") %>%
group_by(brand) %>%
summarize(total_sold = sum(quantity)), caption = "March Madness Brand BACK of Store Sales")
plot_promotions_march <- promotions_march %>%
mutate(
display_location = case_when(
str_detect(display_location, regex("0", ignore_case = TRUE)) ~ "Display",
str_detect(display_location, regex("1", ignore_case = TRUE)) ~ "Front",
str_detect(display_location, regex("2", ignore_case = TRUE)) ~ "Rear",
str_detect(display_location, regex("3", ignore_case = TRUE)) ~ "FrontEnd",
str_detect(display_location, regex("4", ignore_case = TRUE)) ~ "MidAisle",
str_detect(display_location, regex("5", ignore_case = TRUE)) ~ "RearEnd",
str_detect(display_location, regex("6", ignore_case = TRUE)) ~ "SideAisle",
str_detect(display_location, regex("7", ignore_case = TRUE)) ~ "InAisle",
str_detect(display_location, regex("9", ignore_case = TRUE)) ~ "SecDisplay",
str_detect(display_location, regex("A", ignore_case = TRUE)) ~ "Shelf",
TRUE ~ display_location) )
plot_promotions_march %>%
ggplot(aes(x = brand, y = quantity, fill = display_location)) +
geom_bar(stat = "identity") + facet_grid(~display_location) +
labs( title = "March Madness Sales Quantity based on Display location",
y = "Quantity Sold", x = "Brand Type") + theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(axis.text.y = element_text(angle = 90)) +
scale_fill_discrete(name = "Display Locations")
datatable(promotions_wseries %>%
filter(display_location == "1") %>%
group_by(brand) %>%
summarize(total_sold = sum(quantity)), caption = "World Series FRONT of Store Sales")
datatable(promotions_wseries %>%
filter(display_location == "2") %>%
group_by(brand) %>%
summarize(total_sold = sum(quantity)), caption = "World Series BACK of Store Sales")
plot_promotions_wseries <- promotions_wseries %>%
mutate(
display_location = case_when(
str_detect(display_location, regex("0", ignore_case = TRUE)) ~ "Display",
str_detect(display_location, regex("1", ignore_case = TRUE)) ~ "Front",
str_detect(display_location, regex("2", ignore_case = TRUE)) ~ "Rear",
str_detect(display_location, regex("3", ignore_case = TRUE)) ~ "FrontEnd",
str_detect(display_location, regex("4", ignore_case = TRUE)) ~ "MidAisle",
str_detect(display_location, regex("5", ignore_case = TRUE)) ~ "RearEnd",
str_detect(display_location, regex("6", ignore_case = TRUE)) ~ "SideAisle",
str_detect(display_location, regex("7", ignore_case = TRUE)) ~ "InAisle",
str_detect(display_location, regex("9", ignore_case = TRUE)) ~ "SecDisplay",
str_detect(display_location, regex("A", ignore_case = TRUE)) ~ "Shelf",
TRUE ~ display_location) )
plot_promotions_wseries %>%
ggplot(aes(x = brand, y = quantity, fill = display_location)) +
geom_bar(stat = "identity") + facet_grid(~display_location) +
labs( title = "WorldSeries Sales Quantity based on Display location",
y = "Quantity Sold", x = "Brand Type") + theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(axis.text.y = element_text(angle = 90)) +
scale_fill_discrete(name = "Display Locations")
From the previous plots and tables, we can see that the Front of the store sells better as expected. But, there is an opportunity to increase private brand sales if they are located in front of the store. We can conclude that National brands will sell more regardless of display location. In this example, we compare location 1 and location 2. The front of the store and back.
National brands perform best from both display location. We can make a calculated change by displaying more private brands to the front of the store when customers come to shop for a specific brand.
The prime product during major sporting events is liquor and from the analysis above, we can see that private brands sold 0 during super bowl Sunday and private brand store location was always the back of the store, not the front.
Business Question How can the Regork grocery store chain increase profit during the most popular sporting events in North America?
This analysis explores opportunities for the Regork grocery store chain to increase profits during popular North American sporting events. Utilizing data sets including transactions, products, and promotions from the completejourney database, the analysis employs descriptive and inferential statistics, as well as regression and clustering techniques, to address this question. The report demonstrates that Regork has significant untapped potential for profit in the sales of private brands during sporting events, particularly through strategic promotional campaigns targeting high-demand product categories such as liquor. By comparing sales data from events like the Super Bowl, March Madness, and the World Series, the report highlights opportunities for optimizing product placement and promotion strategies, particularly favoring the promotion of private brands in high-traffic areas within stores. The analysis suggests a strategic shift towards promoting private brands at the forefront of store displays during major sporting events, recognizing the dominance of national brands while emphasizing the potential for growth in private brand sales, especially in lucrative categories like liquor. Additionally, the report acknowledges the limitations of the analysis and suggests avenues for further improvement, including more granular data collection and analysis techniques.
To the Regork CEO, I would propose implementing a comprehensive strategy that leverages the insights from the analysis to maximize consumer engagement and drive sales during major sporting events. This strategy could include:
Allocating prime display locations within stores for private brands, particularly during peak shopping periods like sporting events.
Developing targeted promotional campaigns that highlight the value and quality of private brands, emphasizing their competitive advantages over national brands.
Investing in consumer education initiatives to raise awareness about the benefits of choosing private brands, such as cost savings and comparable quality.
Monitoring consumer feedback and sales data to continuously refine and optimize promotional strategies, ensuring alignment with consumer preferences and market trends.
By implementing these recommendations, Regork can enhance the shopping experience for consumers, drive sales growth, and strengthen its position in the competitive grocery market.
The plan for Regork: Use what we learned from the data to boost sales and get people excited during big sports games. Here’s how:
Put our own brands (Regork brands) in the best spots in stores, especially when everyone's shopping for game day.
Run targeted ads showing how great our brands are and why they're better than national brands.
Teach people why they should choose our brands, like saving money and getting the same quality.
Keep track of what people say and how sales are going, so we can adjust our plans based on what works.
By doing this, Regork can make shopping more fun, sell more stuff, and be a bigger player in the grocery game. ##### (v) Discuss the limitations of your analysis and how you, or someone else, could improve or build on it
[1] https://www.statista.com/statistics/214960/revenue-of-the-north-american-sports-market/
[2] https://cran.r-project.org/web/packages/completejourney/vignettes/completejourney.html