In today's fast-paced world, American adults and families are often rushed in the morning and rarely have time for a traditional breakfast. As a result, many people now turn to alternatives such as yogurt. Knowing the target demographic and which food items are commonly purchased with yogurt (both multi-pack and non-multi-pack) could give Regork a competitive advantage in optimizing its yogurt sales. The CEO should be interested in this because the data can inform targeted marketing strategies, product promotions, and more effective sales tactics.
We chose to address this problem with exploratory data analysis (EDA), examining trends in demographic variables such as income, number of kids, household size, marital status, and age. We also identified the products most often purchased alongside yogurt.
Our analysis will help the Regork CEO by providing findings that can be turned into a clear marketing plan to increase yogurt sales. It identifies the target demographic and the food items that are purchased alongside yogurt. With this information, Regork can develop targeted marketing strategies, including specific promotions and coupon strategies, to increase yogurt sales and drive customer loyalty.
library(completejourney) # Source data: grocery transactions, products, and household demographics
library(tidyverse) # Collection of data-manipulation packages (includes dplyr and ggplot2)
library(ggplot2) # Plotting package (already attached by tidyverse; loaded explicitly for clarity)
library(scales) # Functions for formatting and scaling values in visualizations
transactions <- get_transactions() # Full transactions table
products <- products # Copy the packaged products table into the workspace
demographics <- demographics # Copy the packaged demographics table into the workspace
non_multi_pack_yogurt <- products %>%
  filter(tolower(product_type) == "yogurt not multi-packs")
multi_pack_yogurt <- products %>%
  filter(tolower(product_type) == "yogurt multi-packs")
all_yogurt <- products %>%
  filter(grepl("yogurt", product_type, ignore.case = TRUE))
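The broad `grepl()` pattern and the two exact-match filters can select different rows, so it may be worth listing which `product_type` labels contain "yogurt" before choosing between them. The following is a minimal sketch on a toy table. The rows, including the "FROZEN YOGURT BARS" label, are made up for illustration and are not taken from the real `products` data.

```r
library(dplyr)

# Toy stand-in for the `products` table; these rows are illustrative only.
toy_products <- tibble(
  product_id   = c("1", "2", "3", "4"),
  product_type = c("YOGURT NOT MULTI-PACKS", "YOGURT MULTI-PACKS",
                   "FROZEN YOGURT BARS", "BANANAS")
)

# Every label containing "yogurt" (case-insensitive), with a row count.
yogurt_labels <- toy_products %>%
  filter(grepl("yogurt", product_type, ignore.case = TRUE)) %>%
  count(product_type, name = "n_products")

# Labels the broad pattern catches but the two exact-match filters skip.
extra_labels <- setdiff(
  tolower(yogurt_labels$product_type),
  c("yogurt not multi-packs", "yogurt multi-packs")
)
print(extra_labels)
```

Running the same `filter()` and `count()` steps on the real `products` table would show whether the exact-match filters leave out any yogurt-related categories.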
The graph below shows how yogurt purchases are distributed across customers' age groups and income ranges. Each bar counts yogurt purchase records for one age group, and each panel covers one income range, so the chart gives a quick picture of who buys yogurt.
First, the largest number of yogurt purchases falls in the income ranges "35-49K", "50-74K", and "75-99K". Along with this, the age groups that purchase yogurt most often appear to be "25-34", "35-44", and "45-54".
This lets us narrow our demographic scope to specific income ranges and age groups, which will help create a better marketing and coupon plan to generate more sales.
age_income <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  mutate(product_type = tolower(trimws(product_type))) %>%
  filter(product_type %in% c("yogurt not multi-packs", "yogurt multi-packs"))
ggplot(age_income, aes(x = age, fill = product_type)) +
  geom_bar() +
  labs(title = "Age to Income Relationship for Yogurt Buyers",
       x = "Age",
       y = "Count",
       fill = "Product Type") +
  facet_wrap(~ income) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
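Because `geom_bar()` counts rows, a few very frequent buyers can inflate a segment's bar. One way to check this is to tabulate purchase rows next to the number of distinct households in each age and income segment. The following is a rough sketch on toy data. The `toy_age_income` values are invented, and only the column names mirror the joined table above.

```r
library(dplyr)

# Toy stand-in for the joined `age_income` table; values are illustrative only.
toy_age_income <- tibble(
  household_id = c("A", "A", "A", "B", "C"),
  age          = c("25-34", "25-34", "25-34", "25-34", "45-54"),
  income       = c("50-74K", "50-74K", "50-74K", "50-74K", "35-49K")
)

segment_summary <- toy_age_income %>%
  group_by(age, income) %>%
  summarise(
    purchase_rows = n(),                      # what geom_bar() counts
    households    = n_distinct(household_id), # distinct buyers in the segment
    .groups = "drop"
  ) %>%
  arrange(desc(purchase_rows))
print(segment_summary)
```

If the two columns rank the segments differently on the real data, the household count is probably the safer basis for choosing a target demographic.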
This jittered scatter plot, overlaid with box plots, shows that married households tend to spend more on yogurt than unmarried households. Both groups show a wide range of spending, but the median is higher for married households.
The difference is most evident in the upper range, suggesting that married households may be purchasing larger quantities. This is likely for family use, which is consistent with the higher prevalence of multi-packs in this group.
Although both groups contain a few outliers, married households stand out as the stronger yogurt buyers.
data_marital_status <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  mutate(product_type = tolower(trimws(product_type))) %>%
  filter(product_type %in% c("yogurt not multi-packs", "yogurt multi-packs")) %>%
  group_by(household_id, income, marital_status, product_type, age) %>%
  summarize(total_spent = sum(sales_value), .groups = "drop") %>%
  drop_na()
ggplot(data_marital_status, aes(x = marital_status, y = total_spent, color = product_type)) +
  geom_jitter(alpha = 0.6, size = 3, width = 0.2) +
  geom_boxplot(alpha = 0.2, outlier.shape = NA, color = "black") +
  scale_color_manual(values = c("yogurt not multi-packs" = "#E69F00",
                                "yogurt multi-packs" = "#56B4E9")) +
  labs(title = "Marital Status vs. Total Yogurt Spending",
       x = "Marital Status",
       y = "Total Spent on Yogurt ($)",
       color = "Product Type") +
  theme_minimal(base_size = 14)
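The comparison of medians read off the box plots can also be reported as numbers. The following is a small sketch on toy data. The spending values are invented, and on the real data each household's totals would first need to be summed across product types so that every household contributes one row.

```r
library(dplyr)

# Toy household-level spending; values are illustrative only.
toy_spend <- tibble(
  household_id   = c("A", "B", "C", "D", "E", "F"),
  marital_status = c("Married", "Married", "Married",
                     "Unmarried", "Unmarried", "Unmarried"),
  total_spent    = c(10, 30, 20, 5, 15, 10)
)

# Median and quartiles per group: the same quantities the box plot draws.
spend_by_status <- toy_spend %>%
  group_by(marital_status) %>%
  summarise(
    households   = n(),
    median_spent = median(total_spent),
    q1           = quantile(total_spent, 0.25, names = FALSE),
    q3           = quantile(total_spent, 0.75, names = FALSE),
    .groups = "drop"
  )
print(spend_by_status)
```

A table like this makes the size of the gap between groups explicit, which is hard to judge from jittered points alone.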
This chart illustrates yogurt spending by household size, comparing non-multi-pack (single-serve) and multi-pack yogurt. It was created by merging transactions (purchases), products (yogurt type), and demographics (household size).
After filtering specifically for yogurt, total spending was summed by household size and product type to analyze purchasing behavior.
The bars compare total spending, while the linear trend lines (one per product type) highlight the overall pattern across household sizes.
data_household_size <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  mutate(product_type = tolower(trimws(product_type))) %>%
  filter(product_type %in% c("yogurt not multi-packs", "yogurt multi-packs")) %>%
  group_by(household_size, product_type) %>%
  summarise(total_spent = sum(sales_value), .groups = "drop") # Sum, not mean
ggplot(data_household_size, aes(x = factor(household_size), y = total_spent, fill = product_type)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_smooth(aes(group = product_type, color = product_type), method = "lm", se = FALSE) +
  labs(title = "Total Yogurt Spending by Household Size",
       x = "Household Size",
       y = "Total Spent on Yogurt ($)",
       fill = "Product Type",
       color = "Trend Line") +
  theme_minimal()
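Total spending per household size depends on how many households of each size are in the data, so a size group with many households can look like a heavy spender even if each household spends little. A per-household average removes that effect. The following is a minimal sketch on toy data. The values are invented, and only the column names follow the tables used above.

```r
library(dplyr)

# Toy yogurt transactions; values are illustrative only.
toy_sales <- tibble(
  household_id   = c("A", "A", "B", "C", "C", "D"),
  household_size = c("1", "1", "1", "3", "3", "3"),
  sales_value    = c(2, 4, 6, 5, 5, 20)
)

avg_by_size <- toy_sales %>%
  # Step 1: total yogurt spend for each household.
  group_by(household_size, household_id) %>%
  summarise(household_total = sum(sales_value), .groups = "drop") %>%
  # Step 2: average those household totals within each household size.
  group_by(household_size) %>%
  summarise(
    households    = n(),
    total_spent   = sum(household_total),
    avg_per_house = mean(household_total),
    .groups = "drop"
  )
print(avg_by_size)
```

Plotting `avg_per_house` instead of `total_spent` would show whether larger households really spend more each, or whether one size group simply contains more households.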
# Households that bought yogurt at least once
yogurt_households <- transactions %>%
  inner_join(products, by = "product_id") %>%
  mutate(product_type = tolower(trimws(product_type))) %>%
  filter(product_type %in% c("yogurt not multi-packs", "yogurt multi-packs")) %>%
  pull(household_id) %>%
  unique()
# All non-yogurt purchases made by those households, with readable labels
products_sold_with_yogurt <- transactions %>%
  inner_join(products, by = "product_id") %>%
  filter(household_id %in% yogurt_households) %>%
  mutate(product_type = tolower(trimws(product_type))) %>%
  filter(!(product_type %in% c("yogurt not multi-packs", "yogurt multi-packs"))) %>%
  mutate(product_type = recode(product_type,
    "fluid milk white only" = "White Milk",
    "soft drinks 12/18&15pk can car" = "Soft Drinks (Cans)",
    "sft drnk 2 liter btl carb incl" = "2 Liter Soft Drink",
    "bananas" = "Bananas",
    "shredded cheese" = "Shredded Cheese",
    "gasoline-reg unleaded" = "Gasoline",
    "mainstream white bread" = "White Bread",
    "candy bars (singles)(including" = "Candy Bars",
    "potato chips" = "Potato Chips",
    "frzn ss premium entrees/dnrs/n" = "Frozen Dinners"
  ))
# Ten most frequent product types; with_ties = FALSE keeps exactly ten rows
top_products <- products_sold_with_yogurt %>%
  count(product_type, sort = TRUE) %>%
  slice_max(order_by = n, n = 10, with_ties = FALSE)
ggplot(top_products, aes(x = reorder(product_type, n), y = n, fill = product_type)) +
  geom_bar(stat = "identity", width = 0.6, show.legend = FALSE) +
  geom_text(aes(y = n + max(n) * 0.08, label = scales::comma(n)), size = 5, fontface = "bold") +
  coord_flip() +
  scale_fill_manual(values = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e",
                               "#e6ab02", "#a6761d", "#666666", "#b3b3b3", "#ff7f00")) +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.15))) +
  labs(
    title = "Top 10 Products Commonly Purchased with Yogurt",
    x = "Product Type",
    y = "Number of Purchases"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 14),
    axis.text.x = element_text(size = 14),
    axis.text.y = element_text(size = 14),
    axis.title.x = element_text(face = "bold"),
    axis.title.y = element_text(face = "bold"),
    plot.margin = margin(10, 10, 10, 10)
  )
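The counts above cover everything bought by households that purchased yogurt at some point, not only items in the same shopping trip. A stricter reading of "purchased with yogurt" would restrict the count to baskets that actually contain yogurt. The following sketch shows that alternative on toy data. It is not the method used for the chart, the rows are invented, and it assumes `basket_id` in `transactions` identifies a single shopping trip.

```r
library(dplyr)

# Toy transaction lines already joined to product types; illustrative only.
toy_tx <- tibble(
  basket_id    = c("b1", "b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4"),
  product_type = c("yogurt multi-packs", "bananas", "bananas",
                   "yogurt not multi-packs", "potato chips",
                   "bananas", "potato chips",
                   "yogurt multi-packs", "bananas")
)
yogurt_types <- c("yogurt not multi-packs", "yogurt multi-packs")

# Baskets that contain at least one yogurt item.
yogurt_baskets <- toy_tx %>%
  filter(product_type %in% yogurt_types) %>%
  distinct(basket_id)

# Non-yogurt product types in those baskets, counted once per basket.
co_purchases <- toy_tx %>%
  semi_join(yogurt_baskets, by = "basket_id") %>%
  filter(!product_type %in% yogurt_types) %>%
  distinct(basket_id, product_type) %>%
  count(product_type, sort = TRUE, name = "baskets")
print(co_purchases)
```

Basket `b3` is excluded because it holds no yogurt, and the duplicate bananas line in `b1` counts once. Comparing this ranking with the household-level one would show how much of the top ten reflects general shopping habits rather than true co-purchases.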
OVERVIEW OF PROJECT
This analysis aimed to help Regork increase yogurt sales by identifying key consumer demographics and purchasing behaviors. As yogurt consumption has grown significantly due to its convenience and health benefits, understanding who buys yogurt and what they purchase alongside it would allow Regork to develop targeted marketing strategies, optimize product placement, and improve coupon promotions. By examining household purchasing trends, the study sought to reveal insights into income levels, household size, marital status, and complementary product purchases, helping Regork capitalize on the yogurt market’s growth.
METHOD AND DATA
To explore these trends, we conducted exploratory data analysis (EDA) using 2017 grocery transaction data from the `completejourney` R package. We combined data on transactions, product details, and household demographics to analyze yogurt purchasing patterns. Using R for data manipulation and visualization, we examined age, income levels, household sizes, and marital status in relation to yogurt purchases, and identified the most common complementary items bought with yogurt. The methodology included data wrangling, filtering, and visualizations such as bar charts, box plots, and trend lines to uncover consumer insights.
DATA INSIGHTS
Our findings showed that middle-income earners ($35K–$99K) were the most frequent yogurt buyers, and married households spent more on yogurt than unmarried households. Household size was associated with purchasing behavior: larger families preferred multi-pack yogurts, while smaller households leaned toward single-serve options. Additionally, we identified top complementary products purchased with yogurt, including bananas, milk, bread, and soft drinks, suggesting opportunities for strategic bundling and in-store placement to increase sales.
PROPOSAL TO THE CEO
Based on these insights, we recommend personalized promotions targeting different household sizes, such as multi-pack discounts for larger families and single-serve yogurt promotions in urban areas with smaller households. Additionally, bundling yogurt with frequently purchased complementary items could drive cross-category sales. Coupon strategies should focus on family-oriented promotions, encouraging bulk purchases while maintaining appeal for smaller households. Optimizing product placement in stores by considering household demographics would also enhance visibility and sales.
LIMITATIONS
This analysis had several limitations, including the age of the dataset (2017), lack of online grocery purchase data, and limited brand-specific insights. Additionally, it did not consider seasonal trends or holiday-driven purchasing behaviors, which could significantly influence yogurt sales. Future research should incorporate more recent transaction data, online grocery sales, and seasonal factors to refine insights. Expanding the study to include customer loyalty programs and brand preferences would also provide a more comprehensive strategy for Regork to enhance its yogurt sales initiatives.