Consumers today are showing a growing preference for healthier, natural, and ethically produced foods — a shift highlighted by Alam et al. (2025) in Emerging Trends in Food Process Engineering: Integrating Sensing Technologies for Health, Sustainability, and Consumer Preferences. With rising interest in trends like organic, low-carb, gluten-free, and protein-rich products, understanding this segment is critical to staying competitive.
While broader market data and consumer research outline these emerging trends, this analysis seeks to reveal which health products are currently driving revenue at Regork and to identify the demographic segments leading this growth.
By combining market insights with transaction-level data, Regork can develop targeted strategies that meet evolving consumer demands and capitalize on these promising product and customer segments.
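This study assesses how much health products contribute to weekly sales, which health product categories and types generate the most revenue, and how spending on them varies across income and age groups.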
To execute the code in this R project, the following R packages are required:
library(completejourney) # dataset access
library(dplyr) # data manipulation & transformation
library(ggplot2) # data visualization & plotting
library(stringr) # string detection & manipulation
library(lubridate) # date handling
library(scales) # formatting axis labels
This report draws on the completejourney dataset, which captures one year of transaction data at the household level from 2,469 frequent grocery shoppers. It offers comprehensive records of each household’s purchases across all product categories, along with demographic information and direct marketing history for select households. You can access the full user guide here.
Loading the data
This section imports three completejourney datasets (transactions, products, and demographics) that form the foundation for the subsequent analysis.
# Load the completejourney transactions dataset
transactions <- get_transactions()
transactions
## # A tibble: 1,469,307 × 11
## household_id store_id basket_id product_id quantity sales_value retail_disc
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 900 330 31198570044 1095275 1 0.5 0
## 2 900 330 31198570047 9878513 1 0.99 0.1
## 3 1228 406 31198655051 1041453 1 1.43 0.15
## 4 906 319 31198705046 1020156 1 1.5 0.29
## 5 906 319 31198705046 1053875 2 2.78 0.8
## 6 906 319 31198705046 1060312 1 5.49 0.5
## 7 906 319 31198705046 1075313 1 1.5 0.29
## 8 1058 381 31198676055 985893 1 1.88 0.21
## 9 1058 381 31198676055 988791 1 1.5 1.29
## 10 1058 381 31198676055 9297106 1 2.69 0
## # ℹ 1,469,297 more rows
## # ℹ 4 more variables: coupon_disc <dbl>, coupon_match_disc <dbl>, week <int>,
## # transaction_timestamp <dttm>
# Load the completejourney products dataset
products <- products
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
# Load the completejourney demographics dataset
demographics <- demographics
demographics
## # A tibble: 801 × 8
## household_id age income home_ownership marital_status household_size
## <chr> <ord> <ord> <ord> <ord> <ord>
## 1 1 65+ 35-49K Homeowner Married 2
## 2 1001 45-54 50-74K Homeowner Unmarried 1
## 3 1003 35-44 25-34K <NA> Unmarried 1
## 4 1004 25-34 15-24K <NA> Unmarried 1
## 5 101 45-54 Under 15K Homeowner Married 4
## 6 1012 35-44 35-49K <NA> Married 5+
## 7 1014 45-54 15-24K <NA> Married 4
## 8 1015 45-54 50-74K Homeowner Unmarried 1
## 9 1018 45-54 35-49K Homeowner Married 5+
## 10 1020 45-54 25-34K Homeowner Married 2
## # ℹ 791 more rows
## # ℹ 2 more variables: household_comp <ord>, kids_count <ord>
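Because the demographics table covers only a subset of households (801 rows here, against the 2,469 households in the study), it can help to check how much of the transaction data carries demographic attributes before segmenting. The sketch below is illustrative only: it uses small made-up tibbles standing in for `transactions` and `demographics`, and the helper name `demo_coverage` is hypothetical.

```r
library(dplyr)

# Toy stand-ins for the completejourney tables (all values are made up)
toy_transactions <- tibble(
  household_id = c("1", "1", "2", "3", "3", "4"),
  sales_value  = c(2.50, 1.00, 4.00, 3.00, 0.50, 9.00)
)
toy_demographics <- tibble(
  household_id = c("1", "3"),
  income       = c("35-49K", "50-74K")
)

# Hypothetical helper: how many households, and what share of sales,
# have a matching demographics record
demo_coverage <- function(tx, demo) {
  tx %>%
    mutate(has_demo = household_id %in% demo$household_id) %>%
    summarise(
      households_total     = n_distinct(household_id),
      households_with_demo = n_distinct(household_id[has_demo]),
      sales_share_covered  = sum(sales_value[has_demo]) / sum(sales_value)
    )
}

coverage <- demo_coverage(toy_transactions, toy_demographics)
coverage
```

Calling `demo_coverage(transactions, demographics)` on the real tables should work the same way, assuming both keep `household_id` as a character column as shown in the printouts above.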
Dataframe Creation & Transformation
This section performs data filtering, joins, variable creation, and aggregation to prepare the key dataframes for the subsequent analysis of health product sales and demographics.
# Identify health-related products based on keywords and selected categories
health_keywords <- c("organic", "gluten free", "no sugar", "protein", "fitness&diet")
selected_categories <- c("ORGANICS FRUIT & VEGETABLES", "FITNESS&DIET", "NATURAL VITAMINS")
products <- products %>%
mutate(
is_health_product = str_detect(str_to_lower(product_type), str_c(health_keywords, collapse = "|")) |
product_category %in% selected_categories
)
health_products <- products %>% filter(is_health_product)
# Filter transactions to include only health products and create weekly timestamps
transactions_health <- transactions %>%
inner_join(products, by = "product_id") %>%
filter(is_health_product) %>%
filter(!is.na(product_category)) %>%
mutate(week = floor_date(transaction_timestamp, "week"))
# Aggregate total weekly sales for health products
weekly_health_sales <- transactions_health %>%
group_by(week) %>%
summarise(health_sales = sum(sales_value, na.rm = TRUE))
# Aggregate total weekly sales for all products
weekly_total_sales <- transactions %>%
mutate(week = floor_date(transaction_timestamp, "week")) %>%
group_by(week) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE))
# Calculate weekly percentage share of health product sales relative to total sales
health_share <- weekly_health_sales %>%
left_join(weekly_total_sales, by = "week") %>%
mutate(health_sales_pct = health_sales / total_sales)
# Summarize total sales by product category across all flagged health products
# (the three selected categories are filtered out of this table at plot time)
health_categories <- transactions_health %>%
group_by(product_category) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales))
# Identify top 10 health product types by total sales
top_products <- transactions_health %>%
group_by(product_type) %>%
summarise(total_sales = sum(sales_value, na.rm = TRUE)) %>%
arrange(desc(total_sales)) %>%
slice_head(n = 10)
# Join demographic info with health product transactions for segmentation analysis
transactions_health_demo <- transactions_health %>%
left_join(demographics, by = "household_id")
# Summarize health product spend by income group
income_spend <- transactions_health_demo %>%
filter(!is.na(income), !is.na(sales_value)) %>%
group_by(income) %>%
summarise(total_spend = sum(sales_value, na.rm = TRUE)) %>%
arrange(income)
# Summarize health product spend by age group
age_spend <- transactions_health_demo %>%
filter(!is.na(age), !is.na(sales_value)) %>%
group_by(age) %>%
summarise(total_spend = sum(sales_value, na.rm = TRUE)) %>%
arrange(age)
# Summarize health product spend by age and income groups
demo_spend <- transactions_health_demo %>%
filter(!is.na(age), !is.na(income)) %>%
group_by(age, income) %>%
summarise(total_spend = sum(sales_value, na.rm = TRUE)) %>%
ungroup()
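Because the health flag above relies on substring matches, it can be worth auditing which keyword triggered each match and in which department. The following is a minimal sketch on invented product rows: "PROTEIN SHAMPOO" and "GLUTEN-FREE PASTA" are hypothetical entries chosen to show a gray-area match and a missed match, and only `health_keywords` is copied from the code above.

```r
library(dplyr)
library(stringr)

# Same keywords as in the health product flag
health_keywords <- c("organic", "gluten free", "no sugar", "protein", "fitness&diet")
keyword_pattern <- str_c(health_keywords, collapse = "|")

# Invented product rows for illustration only
toy_products <- tibble(
  department   = c("PRODUCE", "GROCERY", "DRUG GM", "GROCERY"),
  product_type = c("ORGANIC CARROTS", "GLUTEN-FREE PASTA",
                   "PROTEIN SHAMPOO", "POTATO CHIPS")
)

# Record the first keyword that matches each product type (NA if none does)
audited <- toy_products %>%
  mutate(matched_keyword = str_extract(str_to_lower(product_type), keyword_pattern))

# Count matches by department and keyword to spot unexpected combinations
keyword_audit <- audited %>%
  count(department, matched_keyword, name = "n_product_types")

audited
keyword_audit
```

In this toy example the hyphenated "GLUTEN-FREE PASTA" goes unmatched because the keyword is written with a space, while the non-food "PROTEIN SHAMPOO" is flagged.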
This section visualizes key trends in health product sales over time, highlights top health products by revenue, and examines spending patterns on health products across income and age demographic groups to uncover insights.
This section explores how health product sales are evolving week-over-week as a share of total store sales.
# Line plot: Health product sales as a share of total weekly sales
ggplot(health_share, aes(x = week, y = health_sales_pct)) +
geom_line(color = "forestgreen", linewidth = 1.3) + # linewidth replaces size for lines in ggplot2 >= 3.4.0
scale_y_continuous(labels = percent_format(accuracy = 0.1)) +
labs(
title = "Health Product Sales as % of Total Weekly Sales",
subtitle = "How is the health category trending as a share of total weekly sales?",
x = "Week",
y = "Health Sales Share (%)",
caption = "Source: completejourney dataset"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
)
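The line plot can be complemented with a short numeric summary of the weekly share. This is a minimal sketch, assuming a data frame shaped like `health_share` (columns `week` and `health_sales_pct`); the toy values are invented for illustration and do not reproduce the report's figures.

```r
library(dplyr)

# Toy weekly share table with the same columns as health_share (made-up values)
toy_share <- tibble(
  week = as.Date(c("2017-09-03", "2017-09-10", "2017-09-17", "2017-09-24")),
  health_sales_pct = c(0.0030, 0.0042, 0.0056, 0.0024)
)

# Minimum, maximum, and average weekly share
share_summary <- toy_share %>%
  summarise(
    min_share  = min(health_sales_pct),
    max_share  = max(health_sales_pct),
    mean_share = mean(health_sales_pct)
  )

# Week with the highest share
peak_week <- toy_share %>%
  slice_max(health_sales_pct, n = 1)

share_summary
peak_week
```

Replacing `toy_share` with `health_share` would give the corresponding figures for the real data.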
This section examines sales patterns across health product categories and types to reveal how revenue is distributed.
# Bar chart: Total sales by health product category (horizontal)
health_categories %>%
filter(product_category %in% selected_categories) %>%
ggplot(aes(x = reorder(product_category, total_sales), y = total_sales, fill = total_sales)) +
geom_col(show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = dollar_format()) +
labs(
title = "Total Sales Performance of Health Product Categories",
subtitle = "How much revenue do dedicated health product categories generate overall?",
x = "Product Category",
y = "Total Sales",
caption = "Source: completejourney dataset"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
) +
scale_fill_gradient(low = "lightgreen", high = "darkgreen")
# Dot plot: Top 10 health product types ranked by total sales
ggplot(top_products, aes(x = reorder(product_type, total_sales), y = total_sales)) +
geom_point(color = "forestgreen", size = 4) +
coord_flip() +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Top 10 Health Product Types by Total Sales",
subtitle = "Which health product types contribute most to overall sales?",
x = "Product Type",
y = "Total Sales",
caption = "Source: completejourney dataset"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
)
Here, we explore how health product spending varies across income and age to surface patterns in consumer behavior.
# Dot plot: Total health product spend by income group
ggplot(income_spend, aes(x = reorder(income, total_spend), y = total_spend)) +
geom_point(color = "darkblue", size = 5) +
scale_y_continuous(labels = dollar_format()) +
labs(
title = "Total Health Product Spend by Income Group",
subtitle = "How does spending on health products vary across income groups?",
x = "Income Group",
y = "Total Spend",
caption = "Source: completejourney dataset"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
)
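Group totals like these depend partly on how many households fall into each income bracket, so a per-household average can be a useful companion view. The sketch below is an optional extension rather than part of the original analysis; it runs on invented rows and assumes column names matching `transactions_health_demo`.

```r
library(dplyr)

# Invented rows with the same column names as transactions_health_demo
toy_health_demo <- tibble(
  household_id = c("1", "1", "2", "3", "4", "4", "5"),
  income       = c("50-74K", "50-74K", "50-74K", "50-74K", "75-99K", "75-99K", NA),
  sales_value  = c(3, 2, 4, 1, 6, 2, 5)
)

# Total spend, household count, and average spend per household by income group
income_per_household <- toy_health_demo %>%
  filter(!is.na(income)) %>%
  group_by(income) %>%
  summarise(
    total_spend         = sum(sales_value),
    households          = n_distinct(household_id),
    spend_per_household = total_spend / households
  )

income_per_household
```

In the toy data the 50-74K group has the larger total but the smaller per-household average, which is the kind of difference this view is meant to surface.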
# Box plot: Distribution of per-transaction spend by income group
ggplot(transactions_health_demo %>% filter(!is.na(income)), aes(x = income, y = sales_value)) +
geom_boxplot(fill = "lightblue") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Health Product Spend Distribution by Income Group",
subtitle = "What is the range of spend per transaction across different income groups?",
x = "Income Group",
y = "Spend per Transaction",
caption = "Source: completejourney dataset"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
)
# Violin plot: Distribution of per-transaction spend by age group
ggplot(transactions_health_demo %>% filter(!is.na(age)), aes(x = age, y = sales_value)) +
geom_violin(fill = "lightgreen") +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Health Product Spend Distribution by Age Group",
subtitle = "How does spend per transaction on health products vary across age groups?",
x = "Age Group",
y = "Spend per Transaction",
caption = "Source: completejourney dataset"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
)
# Heatmap: Total health product spend by income and age group
ggplot(demo_spend, aes(x = income, y = age, fill = total_spend)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "darkgreen", labels = scales::dollar_format()) +
labs(
title = "Health Product Total Spend By Income and Age Group",
subtitle = "How is overall health product spend distributed across income and age segments?",
x = "Income Group",
y = "Age Group",
caption = "Source: completejourney dataset",
fill = "Total Spend") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(face = "bold", size = 15, margin = margin(b = 10)),
plot.subtitle = element_text(face = "italic", color = "gray40", size = 13, margin = margin(b = 15)),
plot.caption = element_text(hjust = 1, size = 9, color = "gray60", margin = margin(t = 10)),
plot.title.position = "plot",
axis.title.x = element_text(margin = margin(t = 10)),
axis.title.y = element_text(margin = margin(r = 10))
)
(i) Problem Statement
The analysis aims to understand the impact of health and wellness products within Regork’s overall sales landscape and uncover opportunities for growth. As consumer interest in healthier options continues to grow, this analysis explores how significantly these products contribute to weekly sales, which offerings are gaining traction, and which customer groups are driving sales.
(ii) Approach (Data & Methodology)
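To address the problem statement, the analysis leverages the completejourney dataset, which contains transaction-level grocery purchase data and household demographics.
The methodology involved flagging health products with keyword and category filters, joining transactions to product and demographic data, aggregating weekly sales to compute the health share of total sales, ranking health categories and product types by revenue, and summarizing spend by income and age group.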
(iii) Key Insights
Health Product Sales Share: Health and wellness products make up a small portion of total weekly sales. The share ranges from a minimum of 0.24% to a maximum of 0.56%, with an average of around 0.38%. The highest weekly share occurred in the week of September 17, 2017, when health products accounted for 0.56% of total sales.
Top Health Product Categories: The leading category is Organics Fruit & Vegetables, with total sales of approximately $10,987 over the year. This is followed by Fitness & Diet products at about $5,938, and Natural Vitamins at around $194.
Top Health Product Types: Among product types, Fitness & Diet Bars lead with total sales around $4,581. Other top products include Organic Salad Mix ($3,010), Organic Herbs ($1,536), and Organic Carrots ($982).
Spending by Income Group: Households earning $50K-$74K collectively spend the most on health products, totaling around $2,576 across the year, followed by those earning $75K-$99K ($1,531) and $35K-$49K ($1,225). Lower income groups under $15K and $15K-$24K spend comparatively less in total, at $542 and $455, respectively.
Spending by Age Group: The 45-54 age group accounts for the highest total spending with approximately $4,135 in annual health product sales, followed by the 35-44 group ($3,152) and 25-34 group ($2,191). Both younger (19-24) and older (55+) age groups spend less on health products.
Top Spending Segments by Income and Age: The highest spending segments are households aged 35-44 earning $50K-$74K, with $919 in total spend, and those aged 45-54 within the same income bracket ($804). High earners aged 45-54 with incomes over $250K also showed notable total spending ($697).
Attribute Coverage in Product Assortment: Current product offerings lack popular health-related attributes such as low fat, gluten-free, vegan, keto, high protein, and paleo.
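One way to check attribute coverage is to count how many product descriptions mention each attribute. The sketch below is an assumption about how such a check could be run, not the method used for this report; the attribute patterns in `attribute_patterns` and the toy `product_type` values are invented for illustration.

```r
library(dplyr)
library(stringr)

# Invented product rows for illustration only
toy_products <- tibble(
  product_id   = as.character(1:6),
  product_type = c("ORGANIC SALAD MIX", "FITNESS&DIET - BARS", "ORGANIC CARROTS",
                   "LOW FAT YOGURT", "POTATO CHIPS", NA)
)

# Hypothetical attribute patterns (regular expressions, matched case-insensitively)
attribute_patterns <- c(
  low_fat      = "low[ -]?fat",
  gluten_free  = "gluten[ -]?free",
  vegan        = "vegan",
  keto         = "keto",
  high_protein = "protein",
  paleo        = "paleo",
  organic      = "organic"
)

# Count product types matching each attribute; NA descriptions are ignored
attribute_coverage <- tibble(
  attribute  = names(attribute_patterns),
  n_products = vapply(
    attribute_patterns,
    function(p) {
      sum(str_detect(toy_products$product_type, regex(p, ignore_case = TRUE)),
          na.rm = TRUE)
    },
    integer(1),
    USE.NAMES = FALSE
  )
)

attribute_coverage
```

Attributes with a count of zero are candidates for assortment gaps, although descriptions that abbreviate or omit an attribute would also show up as zero.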
(iv) Consumer Implications & CEO Recommendations
To grow the health and wellness category, Regork should prioritize expanding and promoting organic produce and fitness/diet offerings, which currently lead sales.
Marketing strategies should focus on the mid-income, middle-aged demographic who demonstrate strong purchasing behavior in this category.
To further expand market share, Regork should explore messaging to attract younger shoppers and high-income consumers, potentially tapping into new growth opportunities.
Additionally, Regork should address gaps in its product assortment to strengthen its natural, ethical brand positioning and unlock new segment potential.
(v) Limitations & Future Improvements
The analysis is based on one year of historical data from a limited set of households, which may not represent broader or future market trends.
Health products were identified using keyword and category filters, which may miss some relevant products or include gray-area items.
Income and age groupings are broad, and more granular segmentation could reveal deeper insights.
Using promotional campaign data could help show how marketing impacts health product sales.
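As a starting point for that last item, promotion records could be joined to health product transactions to compare promoted and unpromoted sales. The sketch below uses toy tables; the column names (`product_id`, `store_id`, `week`, `display_location`) are assumptions about what a promotions table might contain and should be checked against the actual data before use.

```r
library(dplyr)

# Toy health product transactions (made-up values)
toy_health_tx <- tibble(
  product_id  = c("A", "A", "B", "B", "C"),
  store_id    = c("1", "1", "1", "2", "2"),
  week        = c(1L, 2L, 1L, 1L, 2L),
  sales_value = c(3, 4, 2, 5, 1)
)

# Toy promotions table: one row per promoted product, store, and week (assumed layout)
toy_promotions <- tibble(
  product_id       = c("A", "B"),
  store_id         = c("1", "2"),
  week             = c(2L, 1L),
  display_location = c("3", "1")
)

# Flag line items sold under a promotion and compare the two groups
promo_comparison <- toy_health_tx %>%
  left_join(toy_promotions, by = c("product_id", "store_id", "week")) %>%
  mutate(promoted = !is.na(display_location)) %>%
  group_by(promoted) %>%
  summarise(
    line_items  = n(),
    total_sales = sum(sales_value),
    avg_sales   = mean(sales_value)
  )

promo_comparison
```

Note that `transactions_health` in this report overwrites the integer `week` column with a date, so a join like this would need the original week index kept under another name.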