Introduction

The Problem

Produce is sold disproportionately to certain demographic groups, which suggests that sales could be expanded to a broader market.

Significance

Fresh produce is known for its health benefits, so showing initiative in fresh produce would improve public perception of Regork. Beyond the sales lift from a better reputation, selling a greater quantity of produce would allow for more advantageous partnerships with suppliers: if Regork bought more product more consistently, it could likely negotiate lower prices. A closer connection to farm suppliers would also emphasize the healthier, higher-quality products that Regork offers compared with convenience stores.

Overview

I set out to find which age groups and income groups would be good targets for increasing produce sales. To do this, I had to remove the bias that comes from most sales in general being made by young to middle-aged adults. I created two ratios: the number of produce products per basket, and then that figure divided by average household size to approximate produce products per household member. Once the groups were comparable, I identified which ones had the potential to buy more so that they could be targeted by a marketing campaign.
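
As a minimal sketch of how the two ratios behave, here is the calculation on a few made-up rows. The toy table and its values are hypothetical and are not Regork data; only the column names mirror the ones used later in this report.

```r
library(dplyr)

# Hypothetical toy data: one row per produce line item
toy <- tibble(
  age            = c("19-24", "19-24", "35-44", "35-44", "35-44"),
  basket_id      = c("b1", "b1", "b2", "b3", "b3"),
  quantity       = c(1, 2, 4, 3, 5),
  household_size = c(1, 1, 4, 4, 4)
)

toy_ratio <- toy %>%
  group_by(age) %>%
  summarise(
    bskt_count     = n_distinct(basket_id), # distinct baskets containing produce
    total_quantity = sum(quantity),         # produce units across those baskets
    avg_size       = mean(household_size)   # average household size in the group
  ) %>%
  mutate(
    prod_bskt_ratio = total_quantity / bskt_count, # produce per basket
    adj_ratio       = prod_bskt_ratio / avg_size   # produce per basket per household member
  )

toy_ratio
```

In this toy case the 35-44 group buys more produce per basket (6 versus 3), but once household size is taken into account its adjusted ratio (1.5) falls below that of the 19-24 group (3). That reversal is the kind of effect the adjustment is meant to expose.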

Packages and Setup

I used the completejourney package for the grocery data and the tidyverse collection of packages for all of the data manipulation and plotting.

# Load relevant packages
suppressWarnings(suppressMessages(library(completejourney))) # Grocery data package
suppressWarnings(suppressMessages(library(tidyverse)))       # Data manipulation package collection

# Create raw data variables
dem <- demographics
prod <- products
transacts <- get_transactions()

# Joins: inner joins keep only transactions from households that have demographic data
dem_transacts <- inner_join(dem, transacts, by = "household_id")
dem_transacts_prod <- inner_join(dem_transacts, prod, by = "product_id")
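
Because the joins are inner joins, any transaction from a household without a demographic record drops out of the analysis. The following is a small sketch, on hypothetical toy tables rather than the real data, of how that loss could be measured with `anti_join()` before trusting the group comparisons.

```r
library(dplyr)

# Hypothetical toy tables standing in for the demographics and transactions data
toy_dem <- tibble(
  household_id = c("h1", "h2"),
  age          = c("25-34", "65+")
)
toy_transacts <- tibble(
  household_id = c("h1", "h1", "h2", "h3", "h4"),
  basket_id    = c("b1", "b2", "b3", "b4", "b5")
)

# Rows the inner join keeps
kept <- inner_join(toy_dem, toy_transacts, by = "household_id")

# Transactions with no matching demographic record (dropped by the inner join)
dropped <- anti_join(toy_transacts, toy_dem, by = "household_id")

# Share of transaction rows that survive the join
coverage <- nrow(kept) / nrow(toy_transacts)
coverage
```

Running the same check on `transacts` and `dem` would show how much of the purchase history the demographic comparison actually covers.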

Exploratory Data Analysis

First, I had to identify which demographics bought less produce. To make the comparison reasonable, I had to put all demographics on an equal footing, so I created a variable for the produce quantity per basket.

produce <- dem_transacts_prod %>%
  filter(department == "PRODUCE") %>%
  select(household_id, basket_id, age, income, quantity, sales_value, household_size)

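produce <- produce %>%
  # Convert via character so the size labels (not factor codes) become numbers,
  # whether household_size is stored as a factor or as text.
  # "5+" is treated as 5.5, an assumed stand-in for households of five or more.
  mutate(household_size = as.character(household_size),
         household_size = if_else(household_size == "5+", 5.5,
                                  suppressWarnings(as.numeric(household_size))))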

# Which age range buys the most produce

produce_age <- produce %>%
  group_by(age) %>%
  summarise(bskt_count = n_distinct(basket_id),
            total_quantity = sum(quantity),
            total_sales = sum(sales_value)) %>%
  mutate(prod_bskt_ratio = total_quantity / bskt_count)

produce_age %>%
  select(age, prod_bskt_ratio) %>%
  ggplot(aes(x = age, y = prod_bskt_ratio)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Produce Products per Basket by Age",
       x = "Age",
       y = "Product Basket Ratio")

# Which income range buys the most produce

produce_income <- produce %>%
  group_by(income) %>%
  summarise(bskt_count = n_distinct(basket_id),
            total_quantity = sum(quantity),
            total_sales = sum(sales_value)) %>%
  mutate(prod_bskt_ratio = total_quantity / bskt_count)

produce_income %>%
  select(income, prod_bskt_ratio) %>%
  ggplot(aes(x = income, y = prod_bskt_ratio)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Produce Products per Basket by Income",
       x = "Income",
       y = "Product Basket Ratio")

After evaluating produce bought by income and age, it became clear that shoppers with higher incomes bought more produce products per basket and that shoppers aged 25-54 bought the most.

This prompted the question of whether these groups were simply buying for more people, so I found the average household size in each income range and age group.

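# Confirm household_size is numeric before averaging.
# (Assigning the result of is.numeric() to the column would overwrite every value with TRUE.)
stopifnot(is.numeric(produce$household_size))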

produce_hshld_age<-produce %>%
  group_by(age) %>%
  summarize(avg_size= mean(household_size))

produce_hshld_income<-produce %>%
  group_by(income) %>%
  summarize(avg_size= mean(household_size))
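
One caveat about these averages: they are taken over produce line items, so a household that buys produce often counts many times. The sketch below uses hypothetical toy rows to show how a per-household average could be computed instead by keeping one row per household first. This is an alternative I did not use in the analysis, shown only to illustrate the difference.

```r
library(dplyr)

# Hypothetical toy data: household h1 appears three times, h2 once
toy <- tibble(
  household_id   = c("h1", "h1", "h1", "h2"),
  age            = c("35-44", "35-44", "35-44", "35-44"),
  household_size = c(5.5, 5.5, 5.5, 1)
)

# Average over line items: frequent shoppers are weighted more heavily
row_weighted <- toy %>%
  group_by(age) %>%
  summarise(avg_size = mean(household_size))

# Average over households: one row per household before averaging
per_household <- toy %>%
  distinct(household_id, age, household_size) %>%
  group_by(age) %>%
  summarise(avg_size = mean(household_size))

row_weighted$avg_size   # 4.375
per_household$avg_size  # 3.25
```

Which version is appropriate depends on the question: the line-item average describes the typical purchase, while the per-household average describes the typical household in the group.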

I then adjusted the original ratio to account for this average household size.

produce_age %>%
  inner_join(produce_hshld_age, by = "age") %>%
  mutate(adj_ratio = prod_bskt_ratio / avg_size) %>%
  filter(!is.na(adj_ratio)) %>%
  ggplot(aes(x = age, y = adj_ratio)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Age vs. Adjusted Product Basket Ratio",
       x = "Age",
       y = "Adjusted Ratio")

produce_income %>%
  inner_join(produce_hshld_income, by = "income") %>%
  mutate(adj_ratio = prod_bskt_ratio / avg_size) %>%
  filter(!is.na(adj_ratio)) %>% # Remove rows where adj_ratio is NA
  ggplot(aes(x = income, y = adj_ratio)) +
  geom_col() +
  theme_minimal() +
  labs(title = "Income vs. Adjusted Product Basket Ratio",
       x = "Income",
       y = "Adjusted Ratio")

The 25-54 age range still has the highest ratios, while the oldest and youngest ranges have the lowest.

On the income side, wealthier households still tend to buy more produce products, but the lowest income range buys more produce products than the next few ranges above it. This suggests that factors other than price are leading lower-income groups to buy less produce.

Summary

I approached this produce opportunity by exploring demographics to find who might be underserved. I did this because I think it would create a strong positive image for Regork as a promoter of health while also growing sales. My analysis indicated that shoppers in the youngest and oldest age ranges could buy more produce. What stood out was that the oldest age range did not have as high a ratio as I would have expected. This matters for the customer because a greater focus on selling fresh produce to the youngest and oldest customers would likely increase sales and tie Regork to the positive attributes of health.

Recommendation

I recommend a marketing push for produce that highlights Regork's farm-sourced fresh produce and its health benefits. The campaign should be targeted at young adults and seniors to encourage those age groups to buy more produce.

Limitations

I could improve this project by accounting for more factors that could explain what I found, such as whether higher-income groups buy more because of household size and whether coupons for produce products are equally prevalent within each income range and age range. This analysis also uses products per household member instead of servings, which could be misleading if shoppers were buying packs of produce instead of individual fruits or vegetables. It also does not analyze in-store competition, where a customer might choose a produce product they normally wouldn’t buy instead of another product in the store. I also admit that my business problem did not turn out to be as impactful as I expected when I started the analysis, but I was too far in to turn back.