Introduction:
According to ReFED, retailers generated 5.12 million tons of surplus food in 2021. Of those 5.12 million tons, 1.792 million tons (35%) went to landfills or incinerators. Produce (fruits and vegetables) accounted for the largest share of that waste: 32%, or 573,440 tons. This waste is unacceptable, and much of it is preventable with the right actions.
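The three figures above are nested. The arithmetic implies that the 32% is a share of the 1.792 million tons sent to landfills or incinerators, not of the full 5.12 million tons of surplus. A quick check in R; the variable names are ours and the inputs are simply the quoted figures:

```r
# Figures as quoted in the introduction (million tons of food, 2021)
surplus_total  <- 5.12   # retail surplus food
landfill_share <- 0.35   # share of the surplus sent to landfills or incinerators
produce_share  <- 0.32   # produce's share of that landfilled/incinerated amount

landfilled    <- surplus_total * landfill_share     # about 1.792 million tons
produce_waste <- landfilled * produce_share * 1e6   # converted from million tons to tons

print(landfilled)
print(round(produce_waste))
```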
As one of the most prestigious grocery store chains in the Midwest, Regork is always looking for ways to increase its sales margins. One such method is promoting certain items during their corresponding seasons with marketing campaigns. How well this works varies with the specific product, the season, and the campaign type.
Alarmed by the ReFED findings, Regork reviewed its food waste data and found that fruit is heavily wasted in its stores. This is a big opportunity for both cost savings and a sales increase. To reduce fruit waste while increasing fruit sales, Regork has contracted our company to determine which fruits are most popular in each season and which campaigns market them best. With that knowledge, Regork can kill two birds with one stone: reduce fruit waste (by knowing what to stock throughout the year) and increase its sales.
Proposed Question: How can Regork best prioritize specific fruits and marketing campaigns in each season of the year to generate the most sales possible and decrease food waste?
By analyzing Regork's transaction, demographic, campaign, and campaign description data, we can identify relationships between the seasonality of fruit and the marketing campaigns that work best for it. We want to find out not only which fruits are the most popular, but also who, demographically, is buying them.
library(tidyverse)
library(lubridate)
library(ggplot2)
library(knitr)
library(RColorBrewer)
library(dplyr)
library(completejourney)
library(kableExtra)
# Table describing the packages used in this analysis
library_data <- data.frame(
Library_Name = c("tidyverse", "lubridate", "completejourney", "dplyr","ggplot2","knitr","RColorBrewer","kableExtra"),
Description = c(
"A collection of R packages for data manipulation and visualization, designed to work seamlessly together.",
"Provides functions to easily work with date-times in R, simplifying date and time data manipulation.",
"Provides the Complete Journey retail data: one year of household-level grocery transactions plus product, demographic, and marketing campaign tables.",
"A powerful data manipulation package for filtering, arranging, summarizing, and transforming data.",
"A versatile and popular package for creating high-quality data visualizations using a layered grammar of graphics.",
"An essential tool for dynamic report generation in R, allowing the integration of R code and text in documents.",
"Provides color palettes for creating attractive and distinctive plots in R, especially useful with ggplot2.",
"Extends knitr's kable() with pipe-friendly functions for styling HTML and LaTeX tables."
)
)
library_data %>%
kable(format = "html", col.names = c("Library Name", "Description")) %>%
kable_styling(
full_width = FALSE,
bootstrap_options = "striped",
font_size = 14
) %>%
add_header_above(header = c("Library Information" = 2)) %>%
row_spec(0, bold = TRUE, color = "black") %>%
column_spec(1, bold = TRUE, color = "black") %>%
column_spec(2, italic = TRUE, color = "black")
| Library Name | Description |
|---|---|
| tidyverse | A collection of R packages for data manipulation and visualization, designed to work seamlessly together. |
| lubridate | Provides functions to easily work with date-times in R, simplifying date and time data manipulation. |
| completejourney | Provides the Complete Journey retail data: one year of household-level grocery transactions plus product, demographic, and marketing campaign tables. |
| dplyr | A powerful data manipulation package for filtering, arranging, summarizing, and transforming data. |
| ggplot2 | A versatile and popular package for creating high-quality data visualizations using a layered grammar of graphics. |
| knitr | An essential tool for dynamic report generation in R, allowing the integration of R code and text in documents. |
| RColorBrewer | Provides color palettes for creating attractive and distinctive plots in R, especially useful with ggplot2. |
| kableExtra | Extends knitr's kable() with pipe-friendly functions for styling HTML and LaTeX tables. |
Data Frames
The data used in this project came from the completejourney data set, specifically its transactions, products, demographics, campaigns, and campaign_descriptions tables.
These tables were joined into a single master data frame to make data wrangling easier: each later analysis takes a subset of this master data frame to target a specific insight we wanted to extract.
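# Retrieve the full transactions table
transactions_df <- get_transactions()
# Join all tables into one master data frame with explicit keys. A household
# can receive several campaigns, so the campaigns join is many-to-many: each
# transaction is repeated once per campaign its household received.
main_data <- transactions_df %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  inner_join(campaigns, by = "household_id", relationship = "many-to-many") %>%
  inner_join(campaign_descriptions, by = "campaign_id")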
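In the join above, campaigns shares only household_id with the transaction data, and a household can receive several campaigns, so each transaction appears once per campaign its household received. Sums of sales_value or quantity taken from main_data therefore count such transactions more than once. A minimal base R sketch on hypothetical toy data (not completejourney) shows the effect; merge() stands in for dplyr's inner_join():

```r
# Hypothetical toy data: household 1 made two purchases, household 2 one
toy_transactions <- data.frame(
  txn_id       = c(1, 2, 3),
  household_id = c(1, 1, 2),
  sales_value  = c(2, 3, 4)
)
# Household 1 received two campaigns, household 2 received one
toy_campaigns <- data.frame(
  household_id = c(1, 1, 2),
  campaign_id  = c(10, 11, 10)
)

# Joining on household_id alone pairs every purchase with every campaign
joined <- merge(toy_transactions, toy_campaigns, by = "household_id")

nrow(joined)                                          # 5 rows instead of 3
sum(toy_transactions$sales_value)                     # true total: 9
sum(joined$sales_value)                               # inflated total: 14
sum(joined$sales_value[!duplicated(joined$txn_id)])   # one row per purchase: 9
```

Summing over distinct transactions, or matching each transaction only to campaigns active on its date, are two ways to avoid the inflation; which is appropriate depends on how campaign effects are meant to be measured.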
Extracting the Month from transaction_timestamp
A month variable was added to the table so that each transaction could be sorted into its season.
main_data <- main_data %>%
mutate(month = month(transaction_timestamp)) # will tell us which month it was bought
Creating Season Data
A season variable with four possible values was created to record the point in the year at which each fruit was purchased, and stored in a new data frame, season_data. Each month extracted from transaction_timestamp is mapped to a season: March to May is Spring, June to August is Summer, September to November is Autumn, and the remaining months (December to February) are Winter.
season_data <- main_data %>%
mutate(
season = case_when(
month %in% 3:5 ~ "Spring",
month %in% 6:8 ~ "Summer",
month %in% 9:11 ~ "Autumn",
TRUE ~ "Winter" ))
Filtering the Data Set by Products with Select Fruit Names
To build the fruit list, we inspected the distinct product types with unique(season_data$product_type) and collected every fruit name that appeared. The resulting vector, fruits, was then used to keep only the rows whose product_type contains one of the fruit names listed below.
fruits <- c("ORANGES", "GRAPES", "BLACKBERRIES", "STRAWBERRIES", "RASPBERRIES","PLUMS", "CLEMENTINES", "APPLES", "PEARS", "PEACHES", "NECTARINES", "PINEAPPLE", "KIWI FRUIT", "TANGERINES", "BLUEBERRIES", "TOMATOES","AVOCADO", "CANTALOUPE", "LEMON", "LIMES", "GRAPEFRUIT", "MELON", "BANANAS", "MANDARINS", "WATERMELON")
fruit_data <- season_data %>%
filter(str_detect(product_type, regex(paste(fruits, collapse = "|"), ignore_case = TRUE))) %>%
select(sales_value, age, department, product_type, household_size, kids_count, household_comp, campaign_type, month, season, quantity)
Filtering Data by Produce / Grocery Departments & Renaming Detected Strings with Fruit Names
Because the product_type descriptions of the same fruit were inconsistent, each row was mapped to a fruit_category containing only the fruit's identifying name; this was a necessary data-cleaning step. Fruit products were also not limited to the Produce department: some, such as kiwi fruit, were listed under Grocery, so both departments were kept. This made the strings harder to sift through. Ideally everything would be listed under Produce, but real-world data is rarely ready for analysis from the start.
fruit_data <- fruit_data %>%
filter(department %in% c("PRODUCE", "GROCERY")) %>%
mutate(fruit_category = case_when(
str_detect(product_type, regex("ORANGES", ignore_case = TRUE)) ~ "ORANGES",
str_detect(product_type, regex("GRAPES", ignore_case = TRUE)) ~ "GRAPES",
str_detect(product_type, regex("BLACKBERRIES", ignore_case = TRUE)) ~ "BLACKBERRIES",
str_detect(product_type, regex("STRAWBERRIES", ignore_case = TRUE)) ~ "STRAWBERRIES",
str_detect(product_type, regex("RASPBERRIES", ignore_case = TRUE)) ~ "RASPBERRIES",
str_detect(product_type, regex("PLUMS", ignore_case = TRUE)) ~ "PLUMS",
str_detect(product_type, regex("CLEMENTINES", ignore_case = TRUE)) ~ "CLEMENTINES",
str_detect(product_type, regex("APPLES", ignore_case = TRUE)) ~ "APPLES",
str_detect(product_type, regex("PEARS", ignore_case = TRUE)) ~ "PEARS",
str_detect(product_type, regex("PEACHES", ignore_case = TRUE)) ~ "PEACHES",
str_detect(product_type, regex("NECTARINES", ignore_case = TRUE)) ~ "NECTARINES",
str_detect(product_type, regex("PINEAPPLE", ignore_case = TRUE)) ~ "PINEAPPLE",
str_detect(product_type, regex("KIWI FRUIT", ignore_case = TRUE)) ~ "KIWI FRUIT",
str_detect(product_type, regex("TANGERINES", ignore_case = TRUE)) ~ "TANGERINES",
str_detect(product_type, regex("BLUEBERRIES", ignore_case = TRUE)) ~ "BLUEBERRIES",
str_detect(product_type, regex("AVOCADO", ignore_case = TRUE)) ~ "AVOCADO",
str_detect(product_type, regex("CANTALOUPE", ignore_case = TRUE)) ~ "CANTALOUPE",
str_detect(product_type, regex("LEMON", ignore_case = TRUE)) ~ "LEMON",
str_detect(product_type, regex("LIMES", ignore_case = TRUE)) ~ "LIMES",
str_detect(product_type, regex("GRAPEFRUIT", ignore_case = TRUE)) ~ "GRAPEFRUIT",
str_detect(product_type, regex("MELON", ignore_case = TRUE)) ~ "MELON",
str_detect(product_type, regex("BANANAS", ignore_case = TRUE)) ~ "BANANAS",
str_detect(product_type, regex("MANDARINS", ignore_case = TRUE)) ~ "MANDARINS",
TRUE ~ "OTHER"
))
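case_when() evaluates its conditions in order and returns the first match, so the order of the str_detect() branches matters whenever one fruit name is contained in another. In the code above, "APPLES" is tested before "PINEAPPLE", so a product_type containing "PINEAPPLES", if one existed, would be labelled APPLES. The base R sketch below mimics that first-match rule with a hypothetical helper, categorize(), on made-up strings; it illustrates the ordering issue and does not check the actual product_type values:

```r
# Hypothetical helper mimicking case_when(): return the first pattern that
# matches each string (case-insensitive), or "OTHER" if none match
categorize <- function(x, patterns) {
  vapply(x, function(s) {
    matched <- vapply(patterns, function(p) grepl(p, s, ignore.case = TRUE), logical(1))
    hit <- patterns[matched]
    if (length(hit) > 0) hit[1] else "OTHER"
  }, character(1), USE.NAMES = FALSE)
}

# Made-up product_type strings, not taken from the data set
items <- c("PINEAPPLES", "WATERMELON", "GRAPEFRUIT")

# "APPLES" listed first: PINEAPPLES is mislabelled as APPLES
categorize(items, c("APPLES", "PINEAPPLE", "MELON", "GRAPES", "GRAPEFRUIT"))

# Longer, more specific name listed first: labelled correctly
categorize(items, c("PINEAPPLE", "APPLES", "MELON", "GRAPES", "GRAPEFRUIT"))
```

Listing longer, more specific names before the shorter names they contain is a simple way to keep first-match rules predictable.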
Top 5 Fruits Per Season
Summing the quantity purchased by season and fruit category shows the five most-purchased fruits in each season, which narrows down buying behavior and lets us prioritize specific produce during each season of the year.
# Total quantity purchased for each fruit category in each season
fruits_by_season <- fruit_data %>%
  group_by(season, fruit_category) %>%
  summarise(total_quantity = sum(quantity), .groups = "drop") %>%
  arrange(desc(total_quantity))
# Drop the catch-all "OTHER" category first, then keep the five
# highest-quantity fruits within each season
top_5_fruits_per_season <- fruits_by_season %>%
  filter(fruit_category != "OTHER") %>%
  group_by(season) %>%
  slice_max(total_quantity, n = 5, with_ties = FALSE) %>%
  ungroup()
custom_colors <- c(
"ORANGES" = "#FFA442",
"GRAPES" = "#0F6DFD",
"BANANAS" = "#F6F95A",
"STRAWBERRIES" = "#FD0F53",
"APPLES" = "#58CD63")
ggplot(top_5_fruits_per_season, aes(x = season, y = total_quantity, fill = fruit_category)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
x = "Season",
y = "Total Purchases",
fill = "Fruit",
title = "Top 5 Fruits Purchased in Each Season"
) +
scale_fill_manual(values = custom_colors) +
theme_minimal()
The same five fruits were the top sellers by quantity in every season, though in different quantities from season to season: Bananas, Grapes, Oranges, Strawberries, and Apples. Bananas were the most purchased fruit in every season. Grapes and Apples stayed relatively constant across seasons, while Oranges and Strawberries varied sharply. Oranges sold mainly in Winter and Spring (December to March is commonly described as their in-season period), and Strawberries sold mainly in Spring and Summer.
Age Demographics
top_age <- fruit_data %>%
  group_by(age) %>%
  summarise(Total_Sales_by_Age = sum(sales_value))
ggplot(top_age, aes(x = age, y = Total_Sales_by_Age)) +
geom_bar(stat = "identity", fill = "blue", alpha = 0.7) +
labs(
title = "Who is Buying the Most Fruit?",
x = "Age Group",
y = "Total Sales"
) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.title = element_text(size = 14),
axis.text.x = element_text(angle = 45, hjust = 1, size = 12),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
#axis.line = element_line(color = "black"),
panel.background = element_rect(fill = "white"),
plot.caption = element_text(hjust = 0.5, size = 12))
The age groups that buy the most fruit were 45-54, 35-44, and 25-34, with the 45-54 group leading in sales. These ranges typically coincide with child-rearing years, but household size and composition would need to be analyzed before drawing that conclusion.
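Total sales by age group also reflect how many households fall into each group, so a group can lead in total sales simply by being larger. A minimal base R sketch on hypothetical toy data divides each group's total by its number of distinct households. Applying this to the report would require keeping household_id, which the select() step that builds fruit_data currently drops.

```r
# Hypothetical toy data: one 25-34 household and four 45-54 households
toy <- data.frame(
  age          = c("25-34", "25-34", "45-54", "45-54", "45-54", "45-54"),
  household_id = c(1, 1, 2, 3, 4, 5),
  sales_value  = c(6, 6, 4, 4, 4, 4)
)

# Total sales per age group (what the bar chart above shows)
totals <- tapply(toy$sales_value, toy$age, sum)

# Number of distinct households in each age group
households <- tapply(toy$household_id, toy$age, function(h) length(unique(h)))

# Average sales per household: here the ranking flips
per_household <- totals / households
totals          # 25-34: 12, 45-54: 16
per_household   # 25-34: 12, 45-54: 4
```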
Campaign Sales by Type
campaign_sales <- fruit_data %>%
group_by(season, campaign_type) %>%
summarise(total_sales = sum(sales_value)) %>%
ungroup()
ggplot(campaign_sales, aes(x = season, y = total_sales, fill = campaign_type)) +
geom_bar(stat = "identity", position = "stack") +
labs(
x = "Season",
y = "Total Sales",
fill = "Campaign Type",
title = "Sales by Campaign Type and Season"
) +
theme_minimal() +
scale_fill_manual(values =c("Type C" = "yellow", "Type B" = "orange", "Type A" = "red"))
Type A campaigns were associated with the most fruit sales, followed by Type B, with Type C performing worst of the three. From this data, we suggest avoiding Type C campaigns and placing more emphasis on Type A or Type B. Because these are total sales, they also reflect how many campaigns and households each type covers.
Winter Overall Fruit Sales
The fruit data was filtered to Winter to see how well each type of fruit sells during the season.
Note: this chart shows total sales value, not quantity. Because fruit prices vary, the fruits purchased most often do not necessarily generate the most sales. For example, Bananas are the most purchased fruit by quantity but do not generate the most sales.
winter_data <- fruit_data %>%
  filter(season == "Winter", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = sum(sales_value))
winter_graph <- ggplot(winter_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
geom_bar(stat = "identity", fill = "#00abff", alpha = 0.8) + # Adjust fill color and transparency
geom_text(aes(label = round(Total_Sale_of_Category, 2)), hjust = -0.1, size = 2.5, color = "black") + # Label the end of each bar (hjust, because coord_flip() makes the bars horizontal)
labs(
title = "Total Fruit Sales in the Winter Season",
x = "Fruit Category",
y = "Total Sales Value"
) +
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.15))) + # Leave room for the bar labels
theme(plot.title = element_text(hjust = 0.5), # Center the title
axis.text.x =element_blank(),
axis.ticks.x=element_blank(),
panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank(), # Remove minor gridlines
axis.line = element_line(color = "black")) # Add a black axis line
winter_graph
Although Apples were not the most purchased fruit by quantity (see Top 5 Fruits Per Season), they generated the most sales value in Winter by a sizable margin, followed by Grapes, Bananas, Strawberries, and Oranges. We attribute this partly to the higher price per unit of Apples compared with the other categories. These five fruits also generate the most sales in most other seasons; in Summer, Melons replace Oranges.
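Sales value grows with both quantity and price, so a cheap fruit bought in large numbers can rank below a pricier one. A toy illustration with made-up quantities and unit prices (not taken from the data set) shows how the two rankings can disagree:

```r
# Hypothetical quantities and unit prices (not from the data set)
fruit      <- c("BANANAS", "APPLES", "GRAPES")
quantity   <- c(500, 200, 150)
unit_price <- c(0.60, 2.50, 3.00)

toy_sales <- quantity * unit_price   # 300, 500, 450

# Ranking by quantity vs. ranking by sales value
fruit[order(-quantity)]    # BANANAS, APPLES, GRAPES
fruit[order(-toy_sales)]   # APPLES, GRAPES, BANANAS
```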
Winter Fruit Household Size Graph
The fruit data was filtered for Winter and grouped by
household_size in order to visualize fruit sales with
respect to household_size.
fruit_data %>%
filter(season == "Winter") %>%
group_by(household_size) %>%
summarize(
Total_Sales = round(sum(sales_value), 2)
) %>%
ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
geom_col() +
coord_polar() +
theme_minimal() +
geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) +
labs(
title = "The Total Fruit Sales for the Winter Season",
subtitle = "Based Upon Household Size",
x = NULL,
y = NULL) +
scale_fill_brewer(palette = "Set2") +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())
Two-person households (household_size of 2) generated the most fruit sales in Winter.
Spring Overall Fruit Sales
The fruit data was filtered for only Spring in order to
see how the types of fruit sell during the season.
spring_data <- fruit_data %>%
  filter(season == "Spring", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = sum(sales_value))
spring_graph <- ggplot(spring_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
geom_bar(stat = "identity", fill = "#5BB05B", alpha = 0.8) + # Adjust fill color and transparency
geom_text(aes(label = round(Total_Sale_of_Category, 2)), hjust = -0.1, size = 2.5, color = "black") + # Label the end of each bar (hjust, because coord_flip() makes the bars horizontal)
labs(
title = "Total Fruit Sales in the Spring Season",
x = "Fruit Category",
y = "Total Sales Value"
) +
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.15))) + # Leave room for the bar labels
theme(plot.title = element_text(hjust = 0.5), # Center the title
axis.text.x =element_blank(),
axis.ticks.x=element_blank(),
panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank(), # Remove minor gridlines
axis.line = element_line(color = "black")) # Add a black axis line
spring_graph
Strawberries had the highest sales value in Spring, followed by Apples, Grapes, Bananas, and Oranges. This is consistent with the Top 5 Fruits Per Season chart, which showed high quantities of Strawberries sold in Spring.
Spring Fruit Household Size Graph
The fruit data was filtered to Spring and grouped by household_size to compare fruit sales across household sizes.
fruit_data %>%
filter(season == "Spring") %>%
group_by(household_size) %>%
summarize(
Total_Sales = round(sum(sales_value), 2)
) %>%
ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
geom_col() +
coord_polar() +
theme_minimal() +
geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) +
labs(
title = "The Total Fruit Sales for the Spring Season",
subtitle = "Based Upon Household Size",
x = NULL,
y = NULL) +
scale_fill_brewer(palette = "Paired") +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())
Again, two-person households generated the most fruit sales.
Summer Overall Fruit Sales
The fruit data was filtered for only Summer in order to
see how types of fruit sell during the season.
summer_data <- fruit_data %>%
  filter(season == "Summer", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = round(sum(sales_value), 2))
ggplot(summer_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
geom_bar(stat = "identity", fill = "#FFD700", alpha = 0.8) + # Adjust fill color and transparency
geom_text(aes(label = round(Total_Sale_of_Category, 2)), hjust = -0.1, size = 2.5, color = "black") + # Label the end of each bar (hjust, because coord_flip() makes the bars horizontal)
labs(
title = "Total Fruit Sales in the Summer Season",
x = "Fruit Category",
y = "Total Sales Value"
) +
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.15))) + # Comma-formatted labels, with room for the bar labels
theme(plot.title = element_text(hjust = 0.5), # Center the title
axis.text.x =element_blank(),
axis.ticks.x=element_blank(),
panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank(), # Remove minor gridlines
axis.line = element_line(color = "black")) # Add a black axis line
Grapes had the highest sales value in Summer, followed by Apples, Bananas, Strawberries, and, unsurprisingly, Melons. Melons are a summer fruit, so it makes sense that they outsold Oranges in this season. More surprising was that Grapes were the top-selling fruit in Summer even though their quantity sold was not especially high. This is explained by the sales of every other fruit dipping in Summer relative to Grapes. Grapes are not typically in season at the start of summer; they are considered in season from late August through October (Food Network, n.d.).
Summer Fruit Household Size Graph
The fruit data was filtered to Summer and grouped by household_size to compare fruit sales across household sizes.
fruit_data %>%
filter(season == "Summer") %>%
group_by(household_size) %>%
summarize(
Total_Sales = round(sum(sales_value),2)
) %>%
ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
geom_col() +
coord_polar() +
theme_minimal() +
geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) +
labs(
title = "The Total Fruit Sales for the Summer Season",
subtitle = "Based Upon Household Size",
x = NULL,
y = NULL) +
scale_fill_brewer(palette = "Reds") +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())
Again, two-person households generated the most fruit sales.
Autumn Overall Fruit Sales
The fruit data was filtered to Autumn to see how each type of fruit sells during the season.
fall_data <- fruit_data %>%
  filter(season == "Autumn", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = sum(sales_value))
ggplot(fall_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
geom_bar(stat = "identity", fill = "#FFA500", alpha = 0.8) + # Adjust fill color and transparency
geom_text(aes(label = round(Total_Sale_of_Category, 2)), hjust = -0.1, size = 2.5, color = "black") + # Label the end of each bar (hjust, because coord_flip() makes the bars horizontal)
labs(
title = "Total Fruit Sales in the Autumn Season",
x = "Fruit Category",
y = "Total Sales Value"
) +
theme_minimal() +
coord_flip() + # Flip the coordinates for better readability
scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.15))) + # Comma-formatted labels, with room for the bar labels
theme(plot.title = element_text(hjust = 0.5), # Center the title
axis.text.x =element_blank(),
axis.ticks.x=element_blank(),
panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank(), # Remove minor gridlines
axis.line = element_line(color = "black")) # Add a black axis line
Apples had the highest sales value in Autumn, followed by Grapes, Bananas, Strawberries, and Oranges. This is the same ranking as in Winter, and it is consistent with the earlier quantity and sales charts.
Autumn Fruit Household Size Graph
The fruit data was filtered to Autumn and grouped by household_size to compare fruit sales across household sizes.
fruit_data %>%
filter(season == "Autumn") %>%
group_by(household_size) %>%
summarize(
Total_Sales = round(sum(sales_value), 2)
) %>%
ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
geom_col() +
coord_polar() +
theme_minimal() +
geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) +
labs(
title = "The Total Fruit Sales for the Autumn Season",
subtitle = "Based Upon Household Size",
x = NULL,
y = NULL) +
scale_fill_brewer(palette = "OrRd") +
theme(axis.text.y = element_blank(),
axis.title.y = element_blank())
Again, two-person households generated the most fruit sales.
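The four seasonal sales blocks above repeat the same filter-and-summarise step with only the season name changed. One option is to factor that step into a single helper. Below is a minimal base R sketch with a hypothetical season_totals() function, run on made-up rows that share fruit_data's column names; in the report itself the same idea could be written with dplyr and fed to the existing ggplot code.

```r
# Hypothetical helper: total sales value per fruit category for one season,
# excluding the catch-all "OTHER" category, sorted from highest to lowest
season_totals <- function(data, season_name) {
  rows <- data[data$season == season_name & data$fruit_category != "OTHER", ]
  if (nrow(rows) == 0) {
    # aggregate() fails on zero rows, so return an empty result instead
    return(data.frame(fruit_category = character(0), sales_value = numeric(0)))
  }
  totals <- aggregate(sales_value ~ fruit_category, data = rows, FUN = sum)
  totals[order(-totals$sales_value), , drop = FALSE]
}

# Made-up rows with the same column names as fruit_data
toy_fruit <- data.frame(
  season         = c("Winter", "Winter", "Winter", "Summer", "Winter"),
  fruit_category = c("APPLES", "BANANAS", "APPLES", "MELON", "OTHER"),
  sales_value    = c(3, 2, 4, 5, 9)
)

# One call per season instead of four copied blocks
seasons <- c("Winter", "Spring", "Summer", "Autumn")
all_totals <- setNames(lapply(seasons, function(s) season_totals(toy_fruit, s)), seasons)
all_totals$Winter
```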
Summarize the problem statement addressed
To advertise different fruits effectively in each season (increasing sales) while reducing fruit waste, it was important to understand which fruits sold the most during each season of the year and in what quantities. With that established, we could examine demographic and campaign data to see who buys fruit and which advertising campaigns generate the most sales. As a result, we can recommend advertising specific fruits during specific seasons, with the aim of maximizing profit while reducing food waste.
Summarize how you addressed this problem statement
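We addressed the problem by joining the transaction, product, demographic, campaign, and campaign description tables; assigning each transaction a month and season; filtering to fruit products; and summarizing quantity and sales value by season, fruit, age group, household size, and campaign type.
We then created data visualizations from these summaries to see how best to increase the sales of different fruits and reduce fruit waste across the company.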
Summarize the interesting insights that your analysis provided
Interesting insights we found:
Quantity sold and sales value did not always move together, because fruit prices differ. For instance, although Bananas were the most purchased fruit by a large margin in every season, they ranked only third overall in sales value. Apples were almost the complete opposite, owing to their higher price per unit.
Two-person households bought the most fruit in every season, followed by one-person households. This is surprising, since one might expect larger households to buy more fruit; these totals, however, also reflect how many households of each size are in the data.
Winter and Autumn carry the greatest risk of fruit waste, since fruit sales visibly decrease in both quantity and sales value during these seasons. Stocking in these seasons should closely follow what sells the most and what earns the most, because overstocking could lead to both lost profit and food waste.
The top age group for fruit purchases was 45-54, followed by 35-44 and then 25-34. This is plausible, since these are typical ages for raising a family.
Type A campaigns were associated with the highest fruit sales, which suggests that targeted coupon campaigns work best.
Across the seasonal sales charts, Apples were one of the strongest fruits overall, ranking first or second in sales value in every season. By quantity, Bananas were consistently the top seller.
Summarize the implications to the consumer of your analysis. What would you propose to the Regork CEO?
We suggest the following to Regork CEO:
Focus on Type A marketing campaigns, since they are associated with the most fruit sales.
Market fruit discounts to households of two or more people. The top three age groups (45-54, 35-44, and 25-34) span the ages at which people are most likely to be raising families, so targeted marketing of in-season fruit discounts to families would encourage them to buy more fruit from Regork.
Create coupons for the top-selling fruits (by sales value and by quantity) during their strongest seasons.
Incorporate in-season data into produce quantity and shipment orders. Customers are more likely to buy fruit that is in season, since its price is typically lower.
Focus on in-season fruits and expand bundled deals such as “Buy Five Save Five”.
Discuss the limitations of your analysis and how you, or someone else, could improve or build on it.
Some potential limitations of our analysis:
Our data set covers a single year, so the results could be skewed by conditions specific to that year.
The Complete Journey data set we used was reduced: the full data set contains millions of rows, compared with the roughly 1.5 million rows of transaction data we were given.
Some items that are botanically fruits, such as tomatoes, were initially included but later excluded because they dominated both sales and quantity purchased. We chose to include only fruits that would go well together in a fruit salad (sweet rather than savory).
As data analysts in training, we may have miscoded or misinterpreted parts of our results; this project was intended to demonstrate the skills we acquired over the past seven weeks.
How our analysis could be potentially improved:
Utilizing alternative visualization tools such as Tableau or Power BI to generate more compelling visual representations.
Expanding our dataset to encompass a broader time span, including data from various years and decades.
Conducting a comparative analysis by contrasting the Complete Journey dataset with industry data and aligning our exploratory data analysis with external sources.
Employing machine learning algorithms to better understand the relationship between fruit sales and our input variables, such as age group and other demographics.
Exploring insights within the grocery store’s supply chain to identify opportunities for waste reduction and improved efficiency.
Works Cited:
Food Network. (n.d.). Everything to know about grapes. https://www.foodnetwork.com/how-to/packages/food-network-essentials/when-are-grapes-in-season
Retail shopping data: completejourney. (n.d.). GitHub Pages. https://bradleyboehmke.github.io/completejourney/index.html
ReFED. (n.d.). Solutions for food waste in grocery stores for retailers. Retrieved October 5, 2023, from https://refed.org/stakeholders/retailers/