Regork’s Fruit Frenzy!

Introduction

According to ReFED, retailers generated over 5.12 million tons of surplus food in 2021. Of those 5.12 million tons, 1.792 million tons (35%) went to landfills or incinerators. Produce (fruit and vegetables) accounted for the largest share of that waste: 32%, or 573,440 tons. That waste is unacceptable, and it is preventable if the right actions are taken.
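
As a quick sanity check on the figures quoted above, the arithmetic can be reproduced in a few lines. This is only a sketch: the variable names are ours, and the shares are the rounded percentages quoted in the text.

```r
# Figures quoted from the ReFED summary above (tons and rounded shares).
surplus_tons   <- 5.12e6  # total retail surplus food in 2021
landfill_share <- 0.35    # share sent to landfills or incinerators
produce_share  <- 0.32    # share of that waste that was produce

landfill_tons <- surplus_tons * landfill_share
produce_tons  <- landfill_tons * produce_share

# Print both totals with thousands separators
format(c(landfill = landfill_tons, produce = produce_tons), big.mark = ",")
```

The two totals agree with the 1.792 million and 573,440 tons cited above, which confirms that the 32% is a share of the landfilled waste rather than of all surplus food.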

As one of the most prestigious grocery store chains in the Midwest, Regork is always looking for ways to increase its sales margins. One method is to promote certain items during their corresponding seasons with marketing campaigns. Success varies with the specific product, season, and campaign type.

Alarmed by the ReFED figures, Regork immediately reviewed its food waste data and found that fruit is heavily wasted in its stores. This is a big opportunity for both savings and a sales increase. To combat fruit waste while increasing fruit sales, Regork has contracted our company to determine which fruits are most popular during each season and which campaigns market them best. With that knowledge, Regork can kill two birds with one stone: reduce fruit waste (by knowing what to stock in its stores throughout the year) and increase its sales numbers.

Proposed Question: Which fruits should Regork prioritize in each season of the year, and with which marketing campaigns, to generate the most sales possible and decrease food waste?

By analyzing the Regork data on transactions, demographics, campaigns, and campaign descriptions, we can find relationships between the seasonality of each fruit and the marketing campaigns that suit it best. We want to find out not only which fruits are most popular, but also which demographic groups are buying them.

Packages Required

library(tidyverse)
library(lubridate)
library(ggplot2)
library(knitr)
library(RColorBrewer)
library(dplyr)
library(completejourney)
library(kableExtra)

# Package names and descriptions for the summary table
library_data <- data.frame(
  Library_Name = c("tidyverse", "lubridate", "completejourney", "dplyr","ggplot2","knitr","RColorBrewer"),
  Description = c(
    "A collection of R packages for data manipulation and visualization, designed to work seamlessly together.",
    "Provides functions to easily work with date-times in R, simplifying date and time data manipulation.",
    "Provides retail shopping data sets (transactions, products, demographics, and campaigns) collected from a group of households over one year.",
    "A powerful data manipulation package for filtering, arranging, summarizing, and transforming data.",
    "A versatile and popular package for creating high-quality data visualizations using a layered grammar of graphics.",
    "An essential tool for dynamic report generation in R, allowing the integration of R code and text in documents.",
    "Provides color palettes for creating attractive and distinctive plots in R, especially useful with ggplot2."
  )
)

library_data %>%
  kable(format = "html", col.names = c("Library Name", "Description")) %>%
  kable_styling(
    full_width = FALSE,
    bootstrap_options = "striped",
    font_size = 14
  ) %>%
  add_header_above(header = c("Library Information" = 2)) %>%
  row_spec(0, bold = TRUE, color = "black") %>%
  column_spec(1, bold = TRUE, color = "black") %>%
  column_spec(2, italic = TRUE, color = "black")

Library Information
Library Name	Description
tidyverse	A collection of R packages for data manipulation and visualization, designed to work seamlessly together.
lubridate	Provides functions to easily work with date-times in R, simplifying date and time data manipulation.
completejourney	Provides retail shopping data sets (transactions, products, demographics, and campaigns) collected from a group of households over one year.
dplyr	A powerful data manipulation package for filtering, arranging, summarizing, and transforming data.
ggplot2	A versatile and popular package for creating high-quality data visualizations using a layered grammar of graphics.
knitr	An essential tool for dynamic report generation in R, allowing the integration of R code and text in documents.
RColorBrewer	Provides color palettes for creating attractive and distinctive plots in R, especially useful with ggplot2.

Data Preparation

Data Frames

The data used in this project came from the completejourney package. The specific data sets used were the following:

Transaction data
Product data
Demographic data
Campaign data
Campaign descriptions

An initial data frame was created that contains the Product, Demographic, Campaign, Campaign Description and Transaction information. This was done to make data wrangling easier. In other words, we could create subsets from this master data frame to target specific insights we were looking to extract from the entire data set.

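# Download the full transactions table, then attach product, household,
# and campaign attributes to every purchase.
transactions_df <- get_transactions()
main_data <- transactions_df %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  # a household in several campaigns yields one row per purchase per campaign
  inner_join(campaigns, by = "household_id") %>%
  inner_join(campaign_descriptions, by = "campaign_id")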
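
One property of this join is worth keeping in mind when reading the totals later in the report: the campaigns table holds one row per household per campaign, so a household that received several campaigns has each of its purchases repeated. The sketch below illustrates the effect on small, made-up tables (the toy_ names and all values are hypothetical, not taken from completejourney).

```r
library(dplyr)

# Hypothetical toy tables that mimic only the key columns.
toy_transactions <- tibble(
  household_id = c("1", "1", "2"),
  sales_value  = c(2.50, 1.00, 4.00)
)
toy_campaigns <- tibble(
  campaign_id  = c("8", "13", "8"),
  household_id = c("1", "1", "2")
)

# Household 1 is in two campaigns, so both of its purchases appear twice.
# Recent dplyr versions may warn about the many-to-many relationship.
joined <- inner_join(toy_transactions, toy_campaigns, by = "household_id")

nrow(toy_transactions)            # 3 purchases
nrow(joined)                      # 5 rows after the join
sum(toy_transactions$sales_value) # 7.5
sum(joined$sales_value)           # 11
```

Comparing nrow() and sum(sales_value) before and after the join is a cheap way to see how much of a total comes from repeated rows.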

Extracting the Month from transaction_timestamp

A month variable was added to the table so that each transaction could be sorted into its respective season.

main_data <- main_data %>%
  mutate(month = month(transaction_timestamp)) # will tell us which month it was bought

Creating Season Data

A season variable with four possible values was created to tell us at which point in the year a fruit was purchased. It was stored in a new data frame, season_data. Each transaction is assigned to a season according to the month extracted from transaction_timestamp.

season_data <- main_data %>%
  mutate(
    season = case_when(
      month %in% 3:5 ~ "Spring",
      month %in% 6:8 ~ "Summer",
      month %in% 9:11 ~ "Autumn",
      TRUE ~ "Winter" ))
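
The same rule can be wrapped in a small helper and checked on a few month numbers. This is a sketch only: month_to_season is our own name, and unlike the code above it returns NA for anything outside 1 to 12 instead of defaulting to Winter.

```r
library(dplyr)

# Month-to-season rule from the report, as a reusable helper.
# Assumption: inputs are month numbers 1-12; anything else becomes NA.
month_to_season <- function(month) {
  case_when(
    month %in% 3:5         ~ "Spring",
    month %in% 6:8         ~ "Summer",
    month %in% 9:11        ~ "Autumn",
    month %in% c(12, 1, 2) ~ "Winter",
    TRUE                   ~ NA_character_
  )
}

month_to_season(c(1, 4, 7, 10, 12))
# "Winter" "Spring" "Summer" "Autumn" "Winter"

# Each season should cover exactly three months
table(month_to_season(1:12))
```

Making the Winter months explicit means an unexpected value (for example a missing timestamp) shows up as NA rather than being silently counted as a Winter purchase.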

Filtering the Data Set by Products with Select Fruit Names

All of the fruit types found in season_data were compiled into the fruits vector and used to create a data frame that contains only the fruits listed below. The candidate names were found by inspecting unique(season_data$product_type); we then collected every fruit from the generated list.

fruits <- c("ORANGES", "GRAPES", "BLACKBERRIES", "STRAWBERRIES", "RASPBERRIES","PLUMS", "CLEMENTINES", "APPLES", "PEARS", "PEACHES", "NECTARINES", "PINEAPPLE", "KIWI FRUIT", "TANGERINES", "BLUEBERRIES", "TOMATOES","AVOCADO", "CANTALOUPE", "LEMON", "LIMES", "GRAPEFRUIT", "MELON", "BANANAS", "MANDARINS", "WATERMELON")

fruit_data <- season_data %>%
  filter(str_detect(product_type, regex(paste(fruits, collapse = "|"), ignore_case = TRUE))) %>%
  select(sales_value, age, department, product_type, household_size, kids_count, household_comp, campaign_type, month, season, quantity)

Filtering Data by Produce / Grocery Departments & Renaming Detected Strings with Fruit Names

Due to the inconsistent descriptions of fruit types, the fruits were mutated into categories with just their identifying name. This was necessary data cleaning. Additionally, it is worth noting that the fruit types were not only from the Produce department. Some of the fruits such as kiwi fruit were listed in the Grocery department. This made it somewhat difficult to sift through the strings. Ideally, we would want everything listed under Produce, but it is unusual for data to be ready for analysis from the start.

fruit_data <- fruit_data %>%
  filter(department %in% c("PRODUCE", "GROCERY")) %>%
  mutate(fruit_category = case_when(
    str_detect(product_type, regex("ORANGES", ignore_case = TRUE)) ~ "ORANGES",
    str_detect(product_type, regex("GRAPES", ignore_case = TRUE)) ~ "GRAPES",
    str_detect(product_type, regex("BLACKBERRIES", ignore_case = TRUE)) ~ "BLACKBERRIES",
    str_detect(product_type, regex("STRAWBERRIES", ignore_case = TRUE)) ~ "STRAWBERRIES",
    str_detect(product_type, regex("RASPBERRIES", ignore_case = TRUE)) ~ "RASPBERRIES",
    str_detect(product_type, regex("PLUMS", ignore_case = TRUE)) ~ "PLUMS",
    str_detect(product_type, regex("CLEMENTINES", ignore_case = TRUE)) ~ "CLEMENTINES",
    str_detect(product_type, regex("APPLES", ignore_case = TRUE)) ~ "APPLES",
    str_detect(product_type, regex("PEARS", ignore_case = TRUE)) ~ "PEARS",
    str_detect(product_type, regex("PEACHES", ignore_case = TRUE)) ~ "PEACHES",
    str_detect(product_type, regex("NECTARINES", ignore_case = TRUE)) ~ "NECTARINES",
    str_detect(product_type, regex("PINEAPPLE", ignore_case = TRUE)) ~ "PINEAPPLE",
    str_detect(product_type, regex("KIWI FRUIT", ignore_case = TRUE)) ~ "KIWI FRUIT",
    str_detect(product_type, regex("TANGERINES", ignore_case = TRUE)) ~ "TANGERINES",
    str_detect(product_type, regex("BLUEBERRIES", ignore_case = TRUE)) ~ "BLUEBERRIES",
    str_detect(product_type, regex("AVOCADO", ignore_case = TRUE)) ~ "AVOCADO",
    str_detect(product_type, regex("CANTALOUPE", ignore_case = TRUE)) ~ "CANTALOUPE",
    str_detect(product_type, regex("LEMON", ignore_case = TRUE)) ~ "LEMON",
    str_detect(product_type, regex("LIMES", ignore_case = TRUE)) ~ "LIMES",
    str_detect(product_type, regex("GRAPEFRUIT", ignore_case = TRUE)) ~ "GRAPEFRUIT",
    str_detect(product_type, regex("MELON", ignore_case = TRUE)) ~ "MELON",
    str_detect(product_type, regex("BANANAS", ignore_case = TRUE)) ~ "BANANAS",
    str_detect(product_type, regex("MANDARINS", ignore_case = TRUE)) ~ "MANDARINS",
    TRUE ~ "OTHER"
  ))
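
Because case_when() returns the first branch that matches, the order and coverage of the patterns decide where each product lands. The sketch below runs a shortened version of the rules on made-up product_type strings (the strings are hypothetical and chosen only to exercise the rules): a watermelon product is absorbed into MELON, and a tomato product, which has no branch of its own, falls through to OTHER.

```r
library(dplyr)
library(stringr)

# Hypothetical product_type values, not taken from the real data
toy_products <- tibble(
  product_type = c("GRAPES RED", "GRAPEFRUIT",
                   "WATERMELON SEEDLESS WHOLE", "TOMATOES HOTHOUSE")
)

toy_labeled <- toy_products %>%
  mutate(fruit_category = case_when(
    str_detect(product_type, "GRAPES")     ~ "GRAPES",
    str_detect(product_type, "GRAPEFRUIT") ~ "GRAPEFRUIT",
    str_detect(product_type, "MELON")      ~ "MELON",
    TRUE                                   ~ "OTHER"   # no branch matched
  ))

toy_labeled$fruit_category
# "GRAPES" "GRAPEFRUIT" "MELON" "OTHER"
```

Running count(fruit_data, fruit_category, product_type) on the real data is a simple way to confirm that each product_type ended up in the intended category.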

Exploratory Data Analysis

Overall

Top 5 Fruits Per Season

This data allowed us to visualize the top five fruits bought in each season and helped to narrow down buying behavior. Consequently, this allowed us to prioritize specific produce to focus on during each season of the year.

# Total units purchased for each fruit within each season
fruits_by_season <- fruit_data %>%
  group_by(season, fruit_category) %>%
  summarise(total_quantity = sum(quantity), .groups = "drop") %>%
  arrange(desc(total_quantity))

# Drop the catch-all "OTHER" bucket first, then keep the five best sellers per season
top_5_fruits_per_season <- fruits_by_season %>%
  filter(fruit_category != "OTHER") %>%
  group_by(season) %>%
  slice_max(total_quantity, n = 5, with_ties = FALSE) %>%
  ungroup()

custom_colors <- c(
  "ORANGES" = "#FFA442",
  "GRAPES" = "#0F6DFD",
  "BANANAS" = "#F6F95A",
  "STRAWBERRIES" = "#FD0F53",
  "APPLES" = "#58CD63")

ggplot(top_5_fruits_per_season, aes(x = season, y = total_quantity, fill = fruit_category)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    x = "Season",
    y = "Total Purchases",
    fill = "Fruit",
    title = "Top 5 Fruits Purchased in Each Season"
  ) +
  scale_fill_manual(values = custom_colors) +
  theme_minimal()

The top five selling fruits were the same in every season, though in different quantities: Bananas, Grapes, Oranges, Strawberries, and Apples. Across all seasons, Bananas remained the most popular fruit by total quantity purchased. Grapes and Apples stayed relatively constant across seasons, while Oranges and Strawberries varied sharply. Oranges were primarily sold in Winter and Spring (Dec-March is described as “in-season”), and Strawberries were mainly purchased during the Spring and Summer months.

Age Demographics

top_age <- fruit_data %>%
  group_by(age) %>%
  summarise(Total_Sales_by_Age = sum(sales_value))

ggplot(top_age, aes(x = age, y = Total_Sales_by_Age)) +
  geom_bar(stat = "identity", fill = "blue", alpha = 0.7) +  
  labs(
    title = "Who is Buying the Most Fruit?",
    x = "Age Group",
    y = "Total Sales"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),  
        axis.title = element_text(size = 14),  
        axis.text.x = element_text(angle = 45, hjust = 1, size = 12),  
        panel.grid.major = element_blank(),  
        panel.grid.minor = element_blank(),  
        #axis.line = element_line(color = "black"), 
        panel.background = element_rect(fill = "white"),  
        plot.caption = element_text(hjust = 0.5, size = 12))

The top age groups buying fruit were the 45-54, 35-44, and 25-34 age ranges, with the 45-54 group leading in sales. These are ages at which people are typically raising children, but household size would need to be analyzed alongside age before drawing that conclusion.
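
The follow-up mentioned above could start from a simple cross-tabulation of sales by age group and household size. The sketch below uses a small made-up table (toy_fruit and its values are hypothetical); with the real data, fruit_data would take its place.

```r
library(dplyr)

# Hypothetical stand-in for fruit_data with only the columns needed here
toy_fruit <- tibble(
  age            = c("25-34", "35-44", "45-54", "45-54", "35-44"),
  household_size = c("1", "2", "2", "4", "3"),
  sales_value    = c(3.0, 5.5, 4.0, 6.5, 2.0)
)

# Total sales for every age group / household size combination
sales_by_age_household <- toy_fruit %>%
  group_by(age, household_size) %>%
  summarise(total_sales = sum(sales_value), .groups = "drop") %>%
  arrange(desc(total_sales))

sales_by_age_household
```

If the leading age groups were dominated by larger households, that would support the family interpretation; if one- and two-person households led within those age groups, it would not.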

Campaign Sales by Type

campaign_sales <- fruit_data %>%
  group_by(season, campaign_type) %>%
  summarise(total_sales = sum(sales_value)) %>% 
  ungroup()

ggplot(campaign_sales, aes(x = season, y = total_sales, fill = campaign_type)) +
  geom_bar(stat = "identity", position = "stack") + 
  labs(
    x = "Season",
    y = "Total Sales",
    fill = "Campaign Type",
    title = "Sales by Campaign Type and Season"
  ) +
  theme_minimal() +
  scale_fill_manual(values =c("Type C" = "yellow", "Type B" = "orange", "Type A" = "red"))

The marketing campaign that generated the most sales was found to be Type A. Type A was followed by Type B. The worst performing of the three types of marketing campaigns was Type C. From this data, we can suggest that Type C marketing campaigns should be avoided. More emphasis on Type A or Type B is recommended.
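
Stacked totals make it hard to compare campaign types between seasons with different overall sales. One option, sketched below on made-up numbers (toy_sales and its values are hypothetical), is to express each campaign type as a share of its season's total.

```r
library(dplyr)

# Hypothetical stand-in for fruit_data with only the columns needed here
toy_sales <- tibble(
  season        = c("Winter", "Winter", "Winter", "Summer", "Summer"),
  campaign_type = c("Type A", "Type B", "Type C", "Type A", "Type B"),
  sales_value   = c(60, 30, 10, 75, 25)
)

campaign_share <- toy_sales %>%
  group_by(season, campaign_type) %>%
  # keep the season grouping so the share is computed within each season
  summarise(total_sales = sum(sales_value), .groups = "drop_last") %>%
  mutate(share = total_sales / sum(total_sales)) %>%
  ungroup()

campaign_share
```

Shares still do not account for how many households each campaign type reached, so they describe where sales occurred rather than how effective a campaign was per household.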

Winter

Winter Overall Fruit Sales

The fruit data was filtered for only Winter in order to see how well all types of fruit sell during the season.

Note: this data displays the total sales value for each fruit. Fruit prices vary, so the fruit purchased in the greatest quantity does not always generate the highest sales. Bananas, for example, are the most purchased fruit by quantity but do not generate the most sales.

winter_data <- fruit_data %>%
  filter(season == "Winter", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = sum(sales_value))

winter_graph <- ggplot(winter_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
  geom_bar(stat = "identity", fill = "#00abff", alpha = 0.8) +  # Adjust fill color and transparency
  geom_text(aes(label = round(Total_Sale_of_Category, 2)), vjust = -0.5,   size = 2.5, color = "black") +  # Add data labels
  labs(
    title = "Total Fruit Sales in the Winter Season",
    x = "Fruit Category",
    y = "Total Sales Value" 
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  coord_flip() +  # Flip the coordinates for better readability
  scale_y_continuous(labels = scales::comma) +
    theme(plot.title = element_text(hjust = 0.5),  # Center the title
        axis.text.x =element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),  # Remove major gridlines
        panel.grid.minor = element_blank(),  # Remove minor gridlines
        axis.line = element_line(color = "black"))  # Add a black axis line

winter_graph

Despite apples not selling the most quantity-wise (shown in Top Five Fruits per Season), they made the most money sales-wise by a sizable margin. Apples were followed by Grapes, Bananas, Strawberries and then Oranges. We came to the conclusion that this trend was partially because of the higher cost per unit of Apples when compared to other categories. Additionally, you will see these 5 fruits generate the most sales across all seasons.
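
The price-per-unit explanation can be checked directly by dividing total sales by total quantity for each fruit. The sketch below does this on a tiny made-up table (toy_winter and its numbers are invented for illustration); with the real data, the Winter rows of fruit_data would take its place.

```r
library(dplyr)

# Invented numbers: few, expensive apples versus many, cheap bananas
toy_winter <- tibble(
  fruit_category = c("APPLES", "APPLES", "BANANAS", "BANANAS", "BANANAS"),
  quantity       = c(1, 2, 3, 4, 3),
  sales_value    = c(3.00, 6.00, 1.50, 2.00, 1.50)
)

price_per_unit <- toy_winter %>%
  group_by(fruit_category) %>%
  summarise(
    total_quantity     = sum(quantity),
    total_sales        = sum(sales_value),
    avg_price_per_unit = total_sales / total_quantity,
    .groups = "drop"
  )

price_per_unit
```

In this toy example bananas lead on quantity (10 units against 3) while apples lead on sales (9 against 5), the same pattern described above.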

Winter Fruit Household Size Graph

The fruit data was filtered for Winter and grouped by household_size in order to visualize fruit sales with respect to household_size.

fruit_data %>%
  filter(season == "Winter") %>%
  group_by(household_size) %>%
  summarize(
    Total_Sales = round(sum(sales_value), 2)
  ) %>%
  ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
  geom_col() +
  coord_polar() +
  theme_minimal() +
  geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) + 
  labs(
    title = "The Total Fruit Sales for the Winter Season",
    subtitle = "Based Upon Household Size",
    x = NULL,
    y = NULL) +
  scale_fill_brewer(palette = "Set2") +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank())

A household_size of 2 people purchased the most fruit.

Spring

Spring Overall Fruit Sales

The fruit data was filtered for only Spring in order to see how the types of fruit sell during the season.

spring_data <- fruit_data %>%
  filter(season == "Spring", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = sum(sales_value))

spring_graph <- ggplot(spring_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
  geom_bar(stat = "identity", fill = "#5BB05B", alpha = 0.8) +  # Adjust fill color and transparency
  geom_text(aes(label = round(Total_Sale_of_Category, 2)), vjust = -0.5,   size = 2.5, color = "black") +  # Add data labels
  labs(
    title = "Total Fruit Sales in the Spring Season",
    x = "Fruit Category",
    y = "Total Sales Value" 
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  coord_flip() +  # Flip the coordinates for better readability
  scale_y_continuous(labels = scales::comma) +
    theme(plot.title = element_text(hjust = 0.5),  # Center the title
        axis.text.x =element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),  # Remove major gridlines
        panel.grid.minor = element_blank(),  # Remove minor gridlines
        axis.line = element_line(color = "black"))  # Add a black axis line

spring_graph

Strawberries had the highest monetary sales, followed by Apples, Grapes, Bananas and then Oranges. This makes sense, since the Top Five Fruits per Season graph already showed that high quantities of Strawberries were sold in Spring.

Spring Fruit Household Size Graph

The fruit data was filtered for only Spring and grouped by household_size in order to see how fruit sold for each household_size.

fruit_data %>%
  filter(season == "Spring") %>%
  group_by(household_size) %>%
  summarize(
    Total_Sales = round(sum(sales_value), 2)
  ) %>%
  ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
  geom_col() +
  coord_polar() +
  theme_minimal() +
  geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) + 
  labs(
    title = "The Total Fruit Sales for the Spring Season",
    subtitle = "Based Upon Household Size",
    x = NULL,
    y = NULL) +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank())

Again, a household_size of 2 people purchases the most fruit.

Summer

Summer Overall Fruit Sales

The fruit data was filtered for only Summer in order to see how types of fruit sell during the season.

summer_data <- fruit_data %>%
  filter(season == "Summer", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = round(sum(sales_value), 2))


ggplot(summer_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
  geom_bar(stat = "identity", fill = "#FFD700", alpha = 0.8) +  # Adjust fill color and transparency
  geom_text(aes(label = round(Total_Sale_of_Category, 2)), vjust = -0.5, size = 2.5, color = "black") +  # Add data labels
  labs(
    title = "Total Fruit Sales in the Summer Season",
    x = "Fruit Category",
    y = "Total Sales Value" 
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  coord_flip() +  # Flip the coordinates for better readability
  scale_y_continuous(labels = scales::comma) +  # Format y-axis labels with commas
  theme(plot.title = element_text(hjust = 0.5),  # Center the title
        axis.text.x =element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),  # Remove major gridlines
        panel.grid.minor = element_blank(),  # Remove minor gridlines
        axis.line = element_line(color = "black"))  # Add a black axis line

Grapes had the highest monetary sales value, followed by Apples, Bananas, Strawberries and, unsurprisingly, Melons. Melons are Summer-friendly fruits, so it makes sense that they surpass Oranges in sales. What was surprising was that Grapes were the highest-selling fruit of the season even though their quantity sold was not especially high. This can be explained by the fact that every other fruit’s sales dipped during the Summer relative to Grapes. Grapes are not typically in season at the start of Summer; they are considered in season from August through October (Food Network, n.d.; see Works Cited).

Summer Fruit Household Size Graph

The fruit data was filtered for only Summer and grouped by household_size in order to see how fruit sold for each household_size.

fruit_data %>%
  filter(season == "Summer") %>%
  group_by(household_size) %>%
  summarize(
    Total_Sales = round(sum(sales_value),2)
  ) %>%
  ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
  geom_col() +
  coord_polar() +
  theme_minimal() +
  geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) + 
  labs(
    title = "The Total Fruit Sales for the Summer Season",
    subtitle = "Based Upon Household Size",
    x = NULL,
    y = NULL) +
  scale_fill_brewer(palette = "Reds") +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank())

Again, a household_size of 2 people purchases the most fruit.

Autumn

Autumn Overall Fruit Sales

The fruit data was filtered for only Autumn in order to see how all types of fruit sell during the season.

fall_data <- fruit_data %>%
  filter(season == "Autumn", fruit_category != "OTHER") %>%
  group_by(fruit_category) %>%
  summarize(Total_Sale_of_Category = sum(sales_value))


ggplot(fall_data, aes(x = reorder(fruit_category, -Total_Sale_of_Category), y = Total_Sale_of_Category)) +
  geom_bar(stat = "identity", fill = "#FFA500", alpha = 0.8) +  # Adjust fill color and transparency
  geom_text(aes(label = round(Total_Sale_of_Category, 2)), vjust = -0.5, size = 2.5, color = "black") +  # Add data labels
  labs(
    title = "Total Fruit Sales in the Autumn Season",
    x = "Fruit Category",
    y = "Total Sales Value" 
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  coord_flip() +  # Flip the coordinates for better readability
  scale_y_continuous(labels = scales::comma) +  # Format y-axis labels with commas
  theme(plot.title = element_text(hjust = 0.5),  # Center the title
        axis.text.x =element_blank(),
        axis.ticks.x=element_blank(),
        panel.grid.major = element_blank(),  # Remove major gridlines
        panel.grid.minor = element_blank(),  # Remove minor gridlines
        axis.line = element_line(color = "black"))  # Add a black axis line

Apples were the best selling fruit during the Autumn season. Apples were followed by Grapes, Bananas, Strawberries and then Oranges. This makes sense when referring back to the other quantity and sales graphs.
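
The four seasonal bar charts share one pipeline and differ only in the season and fill color, so they could be produced by a single helper. The sketch below is one possible shape for it: plot_season_sales, its arguments, and the toy data are our own illustrations, and the styling is simplified relative to the charts above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: total sales per fruit for one season, as a flipped bar chart
plot_season_sales <- function(data, season_name, fill_color) {
  season_totals <- data %>%
    filter(season == season_name, fruit_category != "OTHER") %>%
    group_by(fruit_category) %>%
    summarise(total_sales = sum(sales_value), .groups = "drop")

  # Ascending reorder puts the best seller at the top once the axes are flipped
  ggplot(season_totals,
         aes(x = reorder(fruit_category, total_sales), y = total_sales)) +
    geom_col(fill = fill_color, alpha = 0.8) +
    geom_text(aes(label = round(total_sales, 2)), hjust = -0.1, size = 2.5) +
    coord_flip() +
    labs(
      title = paste("Total Fruit Sales in the", season_name, "Season"),
      x = "Fruit Category",
      y = "Total Sales Value"
    ) +
    theme_minimal()
}

# Toy stand-in for fruit_data (values are invented)
toy_fruit <- tibble(
  season         = c("Autumn", "Autumn", "Autumn", "Winter"),
  fruit_category = c("APPLES", "GRAPES", "OTHER", "APPLES"),
  sales_value    = c(12.5, 8.0, 3.0, 9.0)
)

autumn_plot <- plot_season_sales(toy_fruit, "Autumn", "#FFA500")
autumn_plot
```

With the real data, calls such as plot_season_sales(fruit_data, "Winter", "#00abff") would replace the four near-identical blocks and keep the styling consistent between seasons.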

Autumn Fruit Household Size Graph

The fruit data was filtered for only Autumn and grouped by household_size in order to see how fruit sells for each different household_size.

fruit_data %>%
  filter(season == "Autumn") %>%
  group_by(household_size) %>%
  summarize(
    Total_Sales = round(sum(sales_value), 2)
  ) %>%
  ggplot(aes(x = household_size, y = Total_Sales, fill = household_size)) +
  geom_col() +
  coord_polar() +
  theme_minimal() +
  geom_text(aes(label = Total_Sales), position = position_stack(vjust = 0.5), size = 2.5) + 
  labs(
    title = "The Total Fruit Sales for the Autumn Season",
    subtitle = "Based Upon Household Size",
    x = NULL,
    y = NULL) +
  scale_fill_brewer(palette = "OrRd") +
  theme(axis.text.y = element_blank(),
        axis.title.y = element_blank())

Again, a household_size of 2 people purchases the most fruit.
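
Rather than reading the leading household size off four separate charts, the same conclusion can be checked in one table. The sketch below shows the idea on made-up data (toy_fruit and its values are hypothetical); with the real data, fruit_data would take its place.

```r
library(dplyr)

# Hypothetical stand-in for fruit_data with only the columns needed here
toy_fruit <- tibble(
  season         = c("Winter", "Winter", "Winter", "Summer", "Summer"),
  household_size = c("1", "2", "2", "2", "3"),
  sales_value    = c(5, 8, 2, 7, 9)
)

# Household size with the highest total sales in each season
top_household_by_season <- toy_fruit %>%
  group_by(season, household_size) %>%
  summarise(total_sales = sum(sales_value), .groups = "drop") %>%
  group_by(season) %>%
  slice_max(total_sales, n = 1, with_ties = FALSE) %>%
  ungroup()

top_household_by_season
```

A single summary like this also makes it easy to report the runner-up (n = 2) or to compare sales per household instead of raw totals.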

Summary & Conclusions

Summarize the problem statement addressed

To advertise different fruits effectively in each season (increasing sales) while reducing fruit waste, it was important to understand which fruits sold the most during each season of the year and in what quantity. Once this data was collected, we could look at demographic and campaign data to see which advertising campaigns generate the most sales. As a result, we are able to recommend which fruits to advertise during specific seasons of the year. We make these recommendations in the hope of maximizing profit while reducing food waste.

Summarize how you addressed this problem statement

We addressed the problem by doing the following:

Finding the top five selling fruit for each season by quantity
Filtering the monetary sales of fruit by age group demographic
Grouping the monetary sales of fruit by season and marketing campaign type
Showing the top selling fruit for each season by monetary value
Sectioning the monetary sales of fruit for each season by household size

Data visualizations were created based off of these different methods and used in order to see how best to increase the sales of different fruits and reduce fruit waste across the company.

Summarize the interesting insights that your analysis provided

Interesting insights we found:

Quantity sold and monetary sales did not always directly correlate due to fruit being different in price. For instance, despite Bananas being sold the most quantity wise across all seasons by a large margin, they only ranked third overall in the fruit category for monetary sales. Meanwhile, Apples were almost the complete opposite due to their higher price per unit.
Households of two people bought the most fruit with households of one person following across all seasons. This is interesting since most people would think that larger households would buy more fruit.
Seasons at risk for massive potential fruit waste are Winter and Autumn. Fruit sales visibly decrease in both quantity and monetary amounts during these times. Please be wary and follow the trends of what sells the most and makes the most money.
Overstocking could lead to profit loss and food waste.
The top age demographic that bought fruit was the 45-54 age group followed by the 35-44 age group and then the 25-34 age group. This makes sense since these are the typical age groups that make up families.
The best marketing campaign to use in order to increase fruit sales was Type A. That means using targeted coupon campaigns worked the best.
Across the seasonal graphs of monetary fruit sales, one of the best fruits overall was Apples: they ranked first or second in total monetary sales in every season. By quantity sold, Bananas were consistently the best.

Summarize the implications to the consumer of your analysis. What would you propose to the Regork CEO?

We suggest the following to the Regork CEO:

Focus on Type A marketing campaigns, since they generate the most sales for fruit.
Market fruit discounts to households of 2+ people. The top three age groups (45-54, 35-44, and 25-34) span the years in which people typically raise families. Targeted marketing that advertises in-season fruit discounts to families would encourage them to buy more fruit from Regork.
Create coupons for the top fruits sold (monetarily and quantity-wise) during their best prospective seasons.
Incorporate in-season data into quantity / shipment orders of produce. Customers are more likely to buy fruits that are in season, as their price is typically lower.
Focus on fruits that are in season and increase the bundled deals such as “Buy Five Save Five”.

Discuss the limitations of your analysis and how you, or someone else, could improve or build on it.

Some potential limitations of our analysis:

Our dataset was limited to a single year, so the data could potentially be skewed.
The completejourney data set we used is a pared-down sample: the full data set contains millions of rows, compared with the 1.5 million rows of transaction data we were given.
While some items that are technically fruits (tomatoes) were originally included in analysis, they were later excluded as they tended to dominate most sales and quantity purchased. We decided to only include fruits that would go well together in a fruit salad (sweet vs. savory tasting).
As data analysts/scientists in training, we could have potentially miscoded or misinterpreted a part of our results since this project was for learning. This was supposed to be a demonstration of our skills acquired in the last 7 weeks.

How our analysis could be potentially improved:

Utilizing alternative visualization tools such as Tableau or Power BI to generate more compelling visual representations.
Expanding our dataset to encompass a broader time span, including data from various years and decades.
Conducting a comparative analysis by contrasting the Complete Journey dataset with industry data and aligning our exploratory data analysis with external sources.
Employing machine learning algorithms to gain a deeper understanding of the intricate relationship between fruit sales and our input variables, which encompass demographics and age groups.
Exploring insights within the grocery store’s supply chain to identify opportunities for waste reduction and improved efficiency.

Works Cited:

Everything to know about grapes. Food Network. (n.d.). https://www.foodnetwork.com/how-to/packages/food-network-essentials/when-are-grapes-in-season

Retail shopping data • completejourney - github pages. (n.d.). https://bradleyboehmke.github.io/completejourney/index.html

“Solutions for Food Waste in Grocery Stores for Retailers.” ReFED, refed.org/stakeholders/retailers/#:~:text=In%202021%2C%20Retailers%20generated%205.12,confusion%20over%20freshness%20date%20labels. Accessed 5 Oct. 2023.

Brett Karsten

2023-10-05
