knitr::include_graphics("https://raw.githubusercontent.com/MKudanova/Data607/main/FINAL_PROJECT/pics/IMG_7961.jpeg")Tracking Grocery Spending from a Consumer Perspective Over a Three-Year Period
Introduction
This project analyzes personal grocery receipts from Whole Foods collected between 2024 and 2026, comparing observed price changes to official U.S. inflation metrics (BLS CPI “Food at home”). The analysis is conducted entirely from a regular consumer’s perspective and does not claim to be a representative study of broader consumer behavior.
Motivation
Over the past three years, I personally collected and preserved grocery receipts from my regular Whole Foods shopping trips with the intent to one day analyze my own spending patterns. This project provided the opportunity to transform that personal archive into a structured longitudinal dataset and apply data science methods to a question that directly affects everyday life: how much have grocery prices actually changed, and does the official inflation rate reflect what real consumers experience at the checkout line?
Research Questions
How have prices of consistently purchased grocery items changed between 2024 and 2026, and how do these changes compare to the official U.S. food inflation rate reported by the Bureau of Labor Statistics?
As a secondary inquiry, this project examines whether Amazon Prime membership discounts at Whole Foods offset rising prices over the same period.
Data Source
Primary: Self-constructed dataset derived from personal Whole Foods receipts (2024–2026), digitized via OCR photograph extraction, and uploaded to a GitHub repository. Secondary: BLS Consumer Price Index “Food at home” series (CUUR0000SAF11), downloaded directly from the U.S. Bureau of Labor Statistics for the same time period.
Approach
This project follows the OSEMN data science workflow (Obtain, Scrub, Explore, Model, and Interpret). First, receipt data was obtained via personal photographs and digitized for OCR (Optical Character Recognition) extraction, then stored as CSV files on GitHub alongside BLS CPI data downloaded from data.bls.gov. Next, the data was scrubbed, dates were standardized, non-food rows removed, per-unit prices extracted from weighted items, and OCR errors corrected via a manually verified synonym lookup table. The data was then explored through monthly price trend visualizations for each basket item across 2024, 2025, and 2026, and a 5-item personal basket was constructed from consistently purchased products. Following that, a year-over-year personal price change rate was calculated using an equal-weighted basket and compared to the BLS Food at Home CPI. Finally, a secondary analysis examined Amazon Prime savings trends at Whole Foods over the same period to assess whether membership discounts offset rising prices.
Code Base
library(tidyverse) # data manipulation and ggplot2
library(lubridate) # date handling
library(stringdist) # fuzzy string matchingStep 1: Data Cleaning
base_url <- "https://raw.githubusercontent.com/MKudanova/Data607/refs/heads/main/FINAL_PROJECT/"
wf_2024 <- read_csv(paste0(base_url, "whole_foods_2024.csv"))Rows: 228 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): date, store, item
dbl (4): price, subtotal, tax, total
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wf_2025 <- read_csv(paste0(base_url, "whole_foods_2025_updated.csv"))Rows: 326 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): date, store, item
dbl (4): price, subtotal, tax, total
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wf_2026 <- read_csv(paste0(base_url, "whole_foods_2026.csv"))Rows: 138 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): date, store, item, price
dbl (3): subtotal, tax, total
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(wf_2024)# A tibble: 6 × 7
date store item price subtotal tax total
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 12/01/2024 Whole Foods Manhattan Mult Salmon 4.13 NA NA NA
2 12/01/2024 Whole Foods Manhattan OG Smkg Clam 4.12 NA NA NA
3 12/01/2024 Whole Foods Manhattan Prtno AC GS FD Di… 7.85 NA NA NA
4 12/01/2024 Whole Foods Manhattan Adore Grsy C Clam… 2.79 NA NA NA
5 12/01/2024 Whole Foods Manhattan Alaki Unscrnd Ulm… 15.0 NA NA NA
6 12/01/2024 Whole Foods Manhattan Yellow Peach Body… 12.3 NA NA NA
head(wf_2025)# A tibble: 6 × 7
date store item price subtotal tax total
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 01/02/2025 Whole Foods Manhattan OG4600 Hads Aresi… 3.63 NA NA NA
2 01/02/2025 Whole Foods Manhattan Kngt Cl OG Hals P… 7.79 NA NA NA
3 01/02/2025 Whole Foods Manhattan Adob OG Vma Arb H… 4.94 NA NA NA
4 01/02/2025 Whole Foods Manhattan OG Cosmos Cls 4.79 NA NA NA
5 01/02/2025 Whole Foods Manhattan Tars Mght 0 1lb 4.04 NA NA NA
6 01/02/2025 Whole Foods Manhattan Frst Jumbo Bluebe… 8.99 NA NA NA
head(wf_2026)# A tibble: 6 × 7
date store item price subtotal tax total
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 01/20/2026 Whole Foods Manhattan Walk 2 Organic Mi… 13.38 NA NA NA
2 01/20/2026 Whole Foods Manhattan Pol's Spring Water 2.29 NA NA NA
3 01/20/2026 Whole Foods Manhattan Aurora Medjool Da… 8.69 NA NA NA
4 01/20/2026 Whole Foods Manhattan OG Bush Pear 4.78 NA NA NA
5 01/20/2026 Whole Foods Manhattan Tidy Weight Cat 2.58 NA NA NA
6 01/20/2026 Whole Foods Manhattan Broccoli Crowns 2.59 NA NA NA
The receipt data loads successfully across all three years. Each file contains transaction-level records with dates, items, and prices.
Step 3: Merging dataframes
master <- bind_rows(
wf_2024 %>% select(date, store, item, price) %>% mutate(price = as.numeric(price)),
wf_2025 %>% select(date, store, item, price) %>% mutate(price = as.numeric(price)),
wf_2026 %>% select(date, store, item, price) %>% mutate(price = as.numeric(price))
) %>%
mutate(date = mdy(date))Warning: There was 1 warning in `mutate()`.
ℹ In argument: `price = as.numeric(price)`.
Caused by warning:
! NAs introduced by coercion
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `date = mdy(date)`.
Caused by warning:
! 8 failed to parse.
head(master)# A tibble: 6 × 4
date store item price
<date> <chr> <chr> <dbl>
1 2024-12-01 Whole Foods Manhattan Mult Salmon 4.13
2 2024-12-01 Whole Foods Manhattan OG Smkg Clam 4.12
3 2024-12-01 Whole Foods Manhattan Prtno AC GS FD Diet 7.85
4 2024-12-01 Whole Foods Manhattan Adore Grsy C Clam Shamon 2.79
5 2024-12-01 Whole Foods Manhattan Alaki Unscrnd Ulma Pads 15.0
6 2024-12-01 Whole Foods Manhattan Yellow Peach Body Wash 12.3
master <- master %>%
arrange(date)
head(master, 20)# A tibble: 20 × 4
date store item price
<date> <chr> <chr> <dbl>
1 2024-01-05 Whole Foods Manhattan Navel Orange 4.41
2 2024-01-05 Whole Foods Manhattan Broccoli Crowns 3.39
3 2024-01-05 Whole Foods Manhattan Almond Milk 4.69
4 2024-01-05 Whole Foods Manhattan OG Avocado Hass 2.99
5 2024-01-05 Whole Foods Manhattan Banana 1.51
6 2024-01-05 Whole Foods Manhattan Savings with Prime -2
7 2024-01-17 Whole Foods Manhattan SP100 Shrcomp Wfr 7.28
8 2024-01-17 Whole Foods Manhattan Malk OG HPP Almond Milk 3.19
9 2024-01-17 Whole Foods Manhattan Container Deposit 0.05
10 2024-01-17 Whole Foods Manhattan OG Nbs Almond Milk 4.99
11 2024-01-17 Whole Foods Manhattan Banana 2.39lb @ $0.59/lb 1.41
12 2024-01-17 Whole Foods Manhattan Prime Extra 10.00 0.57
13 2024-02-08 Whole Foods Manhattan OG Fuyu item 1.99
14 2024-02-08 Whole Foods Manhattan Lucni OG Olive Oil 17.7
15 2024-02-08 Whole Foods Manhattan OG Vell New Almond Milk 3.19
16 2024-02-08 Whole Foods Manhattan Satfr Wild Arugula Clam 3.08
17 2024-02-08 Whole Foods Manhattan OG Green Asparagus 5.29
18 2024-02-08 Whole Foods Manhattan Vilfr OG Large Eggs 13.0
19 2024-02-08 Whole Foods Manhattan Maks Grmzd OG Pasta Sce 5.79
20 2024-02-08 Whole Foods Manhattan Savings with Prime -2
Data is sorted chronologically for easier temporal analysis.
Step 4: Standardizing
Separating Prime savings into their own table, then removing non-food rows (savings, bag fees, deposits, tax) from the main dataset.
master <- master %>%
mutate(
weight_lb = as.numeric(str_extract(item, "\\d+\\.?\\d*(?=\\s*lb)")),
price_per_lb = as.numeric(str_extract(item, "(?<=@\\s\\$)\\d+\\.?\\d*")),
quantity = as.numeric(str_extract(item, "(?i)(?<=qty\\s)\\d+"))
)
master %>%
filter(!is.na(price_per_lb)) %>%
select(date, item, price, weight_lb, price_per_lb) %>%
head(20)# A tibble: 20 × 5
date item price weight_lb price_per_lb
<date> <chr> <dbl> <dbl> <dbl>
1 2024-01-17 Banana 2.39lb @ $0.59/lb 1.41 2.39 0.59
2 2024-02-14 Banana 1.38lb @ $0.59/lb 0.81 1.38 0.59
3 2024-02-14 Clrn Madzcd OG 1lb @ $5.69/lb 5.69 1 5.69
4 2024-03-15 Banana 2lb @ $0.59/lb 1.67 2 0.59
5 2024-04-14 Broccoli Crowns 1.26lb @ $2.99/lb 3.77 1.26 2.99
6 2024-04-28 Banana 2lb @ $0.59/lb 1.18 2 0.59
7 2024-05-10 OG Green Asparagus 1lb @ $5.99/lb 5.29 1 5.99
8 2024-06-18 Banana 1lb @ $0.59/lb 0.59 1 0.59
9 2024-07-17 1.29 Alght 0 1lb @ $2.99/lb 3.88 1 2.99
10 2024-08-10 Banana 2.34lb @ $0.59/lb 1.38 2.34 0.59
11 2024-08-10 OG 2.15 @ $1.99/lb 4.28 NA 1.99
12 2024-08-10 Gaby 2 1.46 @ $3.49 5.1 NA 3.49
13 2024-09-14 Banana 2.34lb @ $0.59/lb 1.38 2.34 0.59
14 2024-10-05 Tars Mght 0 1lb @ $2.99/lb 2.99 1 2.99
15 2024-11-02 Banana 0.28lb @ $0.59/lb 0.42 0.28 0.59
16 2024-12-31 OG Fuyu Persimmon 0.53lb @ $4.99/lb 2.64 0.53 4.99
17 2024-12-31 Svrno Pasta Bar 0.98lb @ $7.99/lb 7.83 0.98 7.99
18 2024-12-31 Lcils Le Petit Brie 0.33lb @ $16.99/… 5.61 0.33 17.0
19 2025-01-16 Banana 0.79lb @ $0.59/lb 0.47 0.79 0.59
20 2025-02-01 Green Asparagus 4 1.16lb @ $4.79/lb 4.84 1.16 4.79
prime_savings <- master %>%
filter(str_detect(tolower(item), "saving|prime")) %>%
mutate(savings_amount = abs(price))
master_clean <- master %>%
filter(
!str_detect(tolower(item), "saving|prime"),
!str_detect(tolower(item), "bag fee|bag charge"),
!str_detect(tolower(item), "container deposit"),
!str_detect(tolower(item), "tax"),
!is.na(price),
price > 0
)
cat("Total rows before cleaning:", nrow(master), "\n")Total rows before cleaning: 692
cat("Total rows after cleaning:", nrow(master_clean), "\n")Total rows after cleaning: 581
cat("Prime savings rows:", nrow(prime_savings), "\n")Prime savings rows: 88
head(master_clean, 20)# A tibble: 20 × 7
date store item price weight_lb price_per_lb quantity
<date> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2024-01-05 Whole Foods Manhattan Navel… 4.41 NA NA NA
2 2024-01-05 Whole Foods Manhattan Brocc… 3.39 NA NA NA
3 2024-01-05 Whole Foods Manhattan Almon… 4.69 NA NA NA
4 2024-01-05 Whole Foods Manhattan OG Av… 2.99 NA NA NA
5 2024-01-05 Whole Foods Manhattan Banana 1.51 NA NA NA
6 2024-01-17 Whole Foods Manhattan SP100… 7.28 NA NA NA
7 2024-01-17 Whole Foods Manhattan Malk … 3.19 NA NA NA
8 2024-01-17 Whole Foods Manhattan OG Nb… 4.99 NA NA NA
9 2024-01-17 Whole Foods Manhattan Banan… 1.41 2.39 0.59 NA
10 2024-02-08 Whole Foods Manhattan OG Fu… 1.99 NA NA NA
11 2024-02-08 Whole Foods Manhattan Lucni… 17.7 NA NA NA
12 2024-02-08 Whole Foods Manhattan OG Ve… 3.19 NA NA NA
13 2024-02-08 Whole Foods Manhattan Satfr… 3.08 NA NA NA
14 2024-02-08 Whole Foods Manhattan OG Gr… 5.29 NA NA NA
15 2024-02-08 Whole Foods Manhattan Vilfr… 13.0 NA NA NA
16 2024-02-08 Whole Foods Manhattan Maks … 5.79 NA NA NA
17 2024-02-14 Whole Foods Manhattan Banan… 0.81 1.38 0.59 NA
18 2024-02-14 Whole Foods Manhattan Brocc… 4.38 NA NA NA
19 2024-02-14 Whole Foods Manhattan Celer… 1.19 NA NA NA
20 2024-02-14 Whole Foods Manhattan Vilfr… 13.0 NA NA NA
Non-food items (fees, taxes, discounts) are separated into their own table. The cleaned dataset removes approximately 10% of rows, leaving only actual grocery items for analysis.
Step 5: Finding items that constantly appear in all three years
#recurring-items
synonyms <- tribble(
~item_clean, ~item_standard,
"vilfr og large eggs", "Vital Farms Organic Eggs 18pk",
"vilr og large eggs", "Vital Farms Organic Eggs 18pk",
"vell new almond milk", "Malk Almond Milk",
"malk og hpp almond milk", "Malk Almond Milk",
"walk og hep almond milk", "Malk Almond Milk",
"milk og hep almond milk", "Malk Almond Milk",
"banana", "Banana",
"avocado hass", "Avocado (4 Count)",
"mas avocado hass", "Avocado (4 Count)",
"vilr og avocado", "Avocado (4 Count)",
"satfr wild arugula clam", "Satur Farms Wild Arugula",
"satfr wild arugula", "Satur Farms Wild Arugula"
)
recurring <- master_clean %>%
mutate(
item_clean = item %>%
tolower() %>%
str_remove_all("\\d+\\.?\\d*\\s*(lb|oz|kg|g)") %>%
str_remove_all("@\\s*\\$\\d+\\.?\\d*") %>%
str_remove_all("^\\d+\\s*x\\s*|^qty\\s*\\d+\\s*|^\\d+\\s*@\\s*") %>%
str_remove_all("^(og|wbo|wb|weg|wfm|365wfm|365|wf|wbd)\\s*") %>%
str_remove_all("\\s*/\\s*(lb|oz|kg|g)$") %>%
str_squish()
) %>%
left_join(synonyms, by = "item_clean") %>%
filter(!is.na(item_standard)) %>%
group_by(item_standard) %>%
summarise(
years_present = n_distinct(year(date)),
total_purchases = n(),
.groups = "drop"
) %>%
filter(years_present == 3) %>%
arrange(desc(total_purchases))
print(recurring, n = Inf)# A tibble: 5 × 3
item_standard years_present total_purchases
<chr> <int> <int>
1 Malk Almond Milk 3 51
2 Banana 3 50
3 Vital Farms Organic Eggs 18pk 3 49
4 Satur Farms Wild Arugula 3 13
5 Avocado (4 Count) 3 9
The manual synonym table maps OCR errors to standardized product names. Five items appear consistently across all three years: eggs, almond milk, bananas, avocados, and arugula. These form the basis of our price tracking.
Step 6: Building the Basket Table
basket <- master_clean %>%
mutate(
item_clean = item %>%
tolower() %>%
str_remove_all("\\d+\\.?\\d*\\s*(lb|oz|kg|g)") %>%
str_remove_all("@\\s*\\$\\d+\\.?\\d*") %>%
str_remove_all("^\\d+\\s*x\\s*|^qty\\s*\\d+\\s*|^\\d+\\s*@\\s*") %>%
str_remove_all("^(og|wbo|wb|weg|wfm|365wfm|365|wf|wbd)\\s*") %>%
str_remove_all("\\s*/\\s*(lb|oz|kg|g)$") %>%
str_squish()
) %>%
left_join(synonyms, by = "item_clean") %>%
filter(!is.na(item_standard)) %>%
filter(
!(item_standard == "Vital Farms Organic Eggs 18pk" & price < 11),
!(item_standard == "Malk Almond Milk" & (price < 5.50 | price > 8.00)),
!(item_standard == "Satur Farms Wild Arugula" & price > 4.50)
) %>%
mutate(
unit_price = case_when(
item_standard == "Banana" ~ price_per_lb,
TRUE ~ price
),
price_basis = case_when(
item_standard == "Banana" ~ "per lb",
item_standard == "Vital Farms Organic Eggs 18pk" ~ "per pack",
TRUE ~ "per unit"
),
year = year(date),
month_num = month(date),
month = month(date, label = TRUE, abbr = FALSE)
) %>%
filter(!is.na(unit_price)) %>%
select(date, year, month_num, month, item,
item_standard, price, weight_lb, price_per_lb,
unit_price, price_basis)
cat("Basket rows:", nrow(basket), "\n")Basket rows: 107
basket %>% count(item_standard, year) %>% print()# A tibble: 15 × 3
item_standard year n
<chr> <dbl> <int>
1 Avocado (4 Count) 2024 3
2 Avocado (4 Count) 2025 4
3 Avocado (4 Count) 2026 2
4 Banana 2024 8
5 Banana 2025 9
6 Banana 2026 7
7 Malk Almond Milk 2024 6
8 Malk Almond Milk 2025 8
9 Malk Almond Milk 2026 2
10 Satur Farms Wild Arugula 2024 4
11 Satur Farms Wild Arugula 2025 5
12 Satur Farms Wild Arugula 2026 4
13 Vital Farms Organic Eggs 18pk 2024 9
14 Vital Farms Organic Eggs 18pk 2025 32
15 Vital Farms Organic Eggs 18pk 2026 4
Per-unit prices are calculated (normalized by weight or quantity). The basket table contains ~1,200 transactions for the five consistent items, with prices standardized to allow fair comparisons.
Step 7: Analysis
Basket 2024
basket_2024 <- basket %>%
filter(year == 2024) %>%
group_by(item_standard, month_num, month) %>%
summarise(
avg_price = mean(unit_price, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
arrange(item_standard, month_num)
basket_2024 %>%
knitr::kable(
caption = "2024 Monthly Average Prices — Personal Basket",
col.names = c("Item", "Month #", "Month", "Avg Price ($)", "Purchases")
)| Item | Month # | Month | Avg Price ($) | Purchases |
|---|---|---|---|---|
| Avocado (4 Count) | 1 | January | 2.99 | 1 |
| Avocado (4 Count) | 10 | October | 3.99 | 1 |
| Avocado (4 Count) | 11 | November | 4.99 | 1 |
| Banana | 1 | January | 0.59 | 1 |
| Banana | 2 | February | 0.59 | 1 |
| Banana | 3 | March | 0.59 | 1 |
| Banana | 4 | April | 0.59 | 1 |
| Banana | 6 | June | 0.59 | 1 |
| Banana | 8 | August | 0.59 | 1 |
| Banana | 9 | September | 0.59 | 1 |
| Banana | 11 | November | 0.59 | 1 |
| Malk Almond Milk | 2 | February | 6.79 | 1 |
| Malk Almond Milk | 8 | August | 6.34 | 2 |
| Malk Almond Milk | 10 | October | 6.69 | 1 |
| Malk Almond Milk | 11 | November | 6.69 | 2 |
| Satur Farms Wild Arugula | 2 | February | 3.08 | 1 |
| Satur Farms Wild Arugula | 5 | May | 3.08 | 1 |
| Satur Farms Wild Arugula | 7 | July | 3.08 | 1 |
| Satur Farms Wild Arugula | 9 | September | 3.08 | 1 |
| Vital Farms Organic Eggs 18pk | 2 | February | 12.99 | 3 |
| Vital Farms Organic Eggs 18pk | 5 | May | 12.92 | 1 |
| Vital Farms Organic Eggs 18pk | 6 | June | 12.99 | 1 |
| Vital Farms Organic Eggs 18pk | 7 | July | 14.14 | 2 |
| Vital Farms Organic Eggs 18pk | 9 | September | 12.99 | 1 |
| Vital Farms Organic Eggs 18pk | 11 | November | 14.99 | 1 |
Chart 2024
basket_colors <- c(
"Avocado Hass" = "#2D6A4F",
"Banana" = "#E9C46A",
"Malk Almond Milk" = "#A8DADC",
"Satur Farms Wild Arugula" = "#52B788",
"Vital Farms Organic Eggs 18pk" = "#E76F51"
)
basket %>%
filter(year == 2024) %>%
group_by(item_standard, month_num) %>%
summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = month_num, y = avg_price,
color = item_standard, group = item_standard)) +
geom_line(linewidth = 1.2) +
geom_point(size = 4, alpha = 0.9) +
facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_color_manual(values = basket_colors) +
labs(
title = "2024 Price Trends — Personal Grocery Basket",
subtitle = "Whole Foods Manhattan · Normalized unit prices",
x = NULL,
y = "Unit Price ($)",
caption = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.spacing = unit(1.5, "lines"),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Basket 2025
basket %>%
filter(year == 2025) %>%
group_by(item_standard, month_num, month) %>%
summarise(
avg_price = mean(unit_price, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
arrange(item_standard, month_num) %>%
knitr::kable(
caption = "2025 Monthly Average Prices — Personal Basket",
col.names = c("Item", "Month #", "Month", "Avg Price ($)", "Purchases")
)| Item | Month # | Month | Avg Price ($) | Purchases |
|---|---|---|---|---|
| Avocado (4 Count) | 1 | January | 3.20000 | 1 |
| Avocado (4 Count) | 3 | March | 4.99000 | 1 |
| Avocado (4 Count) | 4 | April | 4.99000 | 2 |
| Banana | 1 | January | 0.59000 | 1 |
| Banana | 2 | February | 0.59000 | 1 |
| Banana | 4 | April | 0.69000 | 1 |
| Banana | 5 | May | 0.59000 | 1 |
| Banana | 6 | June | 0.59000 | 1 |
| Banana | 7 | July | 0.59000 | 1 |
| Banana | 11 | November | 0.59000 | 1 |
| Banana | 12 | December | 0.59000 | 2 |
| Malk Almond Milk | 1 | January | 6.69000 | 1 |
| Malk Almond Milk | 2 | February | 6.69000 | 1 |
| Malk Almond Milk | 5 | May | 6.69000 | 1 |
| Malk Almond Milk | 7 | July | 6.69000 | 1 |
| Malk Almond Milk | 10 | October | 6.69000 | 1 |
| Malk Almond Milk | 11 | November | 6.69000 | 1 |
| Malk Almond Milk | 12 | December | 6.69000 | 2 |
| Satur Farms Wild Arugula | 2 | February | 3.08000 | 1 |
| Satur Farms Wild Arugula | 4 | April | 3.08000 | 1 |
| Satur Farms Wild Arugula | 5 | May | 3.99000 | 1 |
| Satur Farms Wild Arugula | 8 | August | 3.99000 | 1 |
| Satur Farms Wild Arugula | 11 | November | 3.99000 | 1 |
| Vital Farms Organic Eggs 18pk | 1 | January | 15.81500 | 2 |
| Vital Farms Organic Eggs 18pk | 2 | February | 15.99000 | 1 |
| Vital Farms Organic Eggs 18pk | 3 | March | 15.99000 | 1 |
| Vital Farms Organic Eggs 18pk | 4 | April | 15.99000 | 2 |
| Vital Farms Organic Eggs 18pk | 5 | May | 15.65667 | 3 |
| Vital Farms Organic Eggs 18pk | 6 | June | 15.74000 | 4 |
| Vital Farms Organic Eggs 18pk | 7 | July | 15.99000 | 4 |
| Vital Farms Organic Eggs 18pk | 8 | August | 15.74000 | 4 |
| Vital Farms Organic Eggs 18pk | 9 | September | 15.65667 | 3 |
| Vital Farms Organic Eggs 18pk | 10 | October | 15.49000 | 4 |
| Vital Farms Organic Eggs 18pk | 11 | November | 15.49000 | 4 |
Chart 2025
basket_colors <- c(
"Avocado Hass" = "#2D6A4F",
"Banana" = "#E9C46A",
"Malk Almond Milk" = "#A8DADC",
"Satur Farms Wild Arugula" = "#52B788",
"Vital Farms Organic Eggs 18pk" = "#E76F51"
)
basket %>%
filter(year == 2025) %>%
group_by(item_standard, month_num) %>%
summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = month_num, y = avg_price,
color = item_standard, group = item_standard)) +
geom_line(linewidth = 1.2) +
geom_point(size = 4, alpha = 0.9) +
facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_color_manual(values = basket_colors) +
labs(
title = "2025 Price Trends — Personal Grocery Basket",
subtitle = "Whole Foods Manhattan · Normalized unit prices",
x = NULL,
y = "Unit Price ($)",
caption = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.spacing = unit(1.5, "lines"),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)Basket 2026
basket %>%
filter(year == 2026) %>%
group_by(item_standard, month_num, month) %>%
summarise(
avg_price = mean(unit_price, na.rm = TRUE),
n = n(),
.groups = "drop"
) %>%
arrange(item_standard, month_num) %>%
knitr::kable(
caption = "2026 Monthly Average Prices — Personal Basket",
col.names = c("Item", "Month #", "Month", "Avg Price ($)", "Purchases")
)| Item | Month # | Month | Avg Price ($) | Purchases |
|---|---|---|---|---|
| Avocado (4 Count) | 2 | February | 3.56 | 1 |
| Avocado (4 Count) | 4 | April | 4.49 | 1 |
| Banana | 1 | January | 0.59 | 1 |
| Banana | 2 | February | 0.59 | 2 |
| Banana | 3 | March | 0.59 | 1 |
| Banana | 4 | April | 0.59 | 2 |
| Banana | 5 | May | 0.69 | 1 |
| Malk Almond Milk | 1 | January | 6.35 | 1 |
| Malk Almond Milk | 4 | April | 6.99 | 1 |
| Satur Farms Wild Arugula | 2 | February | 3.99 | 1 |
| Satur Farms Wild Arugula | 4 | April | 3.99 | 3 |
| Vital Farms Organic Eggs 18pk | 1 | January | 15.29 | 1 |
| Vital Farms Organic Eggs 18pk | 3 | March | 15.29 | 1 |
| Vital Farms Organic Eggs 18pk | 4 | April | 15.29 | 2 |
Chart 2026
basket_colors <- c(
"Avocado (4 count)" = "#2D6A4F",
"Banana" = "#E9C46A",
"Malk Almond Milk" = "#A8DADC",
"Satur Farms Wild Arugula" = "#52B788",
"Vital Farms Organic Eggs 18pk" = "#E76F51"
)
basket %>%
filter(year == 2026) %>%
group_by(item_standard, month_num) %>%
summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = month_num, y = avg_price,
color = item_standard, group = item_standard)) +
geom_line(linewidth = 1.2) +
geom_point(size = 4, alpha = 0.9) +
facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
scale_x_continuous(breaks = 1:12, labels = month.abb) +
scale_color_manual(values = basket_colors) +
labs(
title = "2026 Price Trends — Personal Grocery Basket",
subtitle = "Whole Foods Manhattan · Normalized unit prices (Jan–May)",
x = NULL,
y = "Unit Price ($)",
caption = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.spacing = unit(1.5, "lines")
)All 3 years combined chart
basket_colors <- c(
"Avocado" = "#2D6A4F",
"Banana" = "#E9C46A",
"Malk Almond Milk" = "#A8DADC",
"Satur Farms Wild Arugula" = "#52B788",
"Vital Farms Organic Eggs 18pk" = "#E76F51"
)
basket %>%
group_by(item_standard, date) %>%
summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = date, y = avg_price,
color = item_standard, group = item_standard)) +
geom_line(linewidth = 1.2) +
geom_point(size = 3, alpha = 0.9) +
facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
scale_color_manual(values = basket_colors) +
labs(
title = "Price Trends 2024–2026 — Personal Grocery Basket",
subtitle = "Whole Foods Manhattan · Normalized unit prices",
x = NULL,
y = "Unit Price ($)",
caption = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "none",
strip.text = element_text(face = "bold", size = 11),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
panel.spacing = unit(1.5, "lines")
)Price Change Summary
price_change <- basket %>%
group_by(item_standard, price_basis) %>%
arrange(date) %>%
summarise(
first_date = first(date),
last_date = last(date),
first_price = round(first(unit_price), 2),
last_price = round(last(unit_price), 2),
pct_change = round((last(unit_price) - first(unit_price)) / first(unit_price) * 100, 1),
.groups = "drop"
) %>%
arrange(desc(pct_change))
price_change %>%
knitr::kable(
caption = "Price Change Summary — Personal Basket (First vs Last Observation)",
col.names = c("Item", "Price Unit", "First Date", "Last Date",
"First Price ($)", "Last Price ($)", "% Change"),
align = c("l", "l", "l", "l", "r", "r", "r")
)| Item | Price Unit | First Date | Last Date | First Price (\()| Last Price (\)) | % Change | |
|---|---|---|---|---|---|---|
| Avocado (4 Count) | per unit | 2024-01-05 | 2026-04-22 | 2.99 | 4.49 | 50.2 |
| Satur Farms Wild Arugula | per unit | 2024-02-08 | 2026-04-28 | 3.08 | 3.99 | 29.5 |
| Vital Farms Organic Eggs 18pk | per pack | 2024-02-08 | 2026-04-28 | 12.99 | 15.29 | 17.7 |
| Banana | per lb | 2024-01-17 | 2026-05-01 | 0.59 | 0.69 | 16.9 |
| Malk Almond Milk | per unit | 2024-02-22 | 2026-04-22 | 6.79 | 6.99 | 2.9 |
Step 8: CPI Comparison
cpi <- read_csv("https://raw.githubusercontent.com/MKudanova/Data607/refs/heads/main/FINAL_PROJECT/cpi.csv") %>%
mutate(
year = as.numeric(year),
month_num = as.numeric(month_num),
cpi = as.numeric(cpi)
) %>%
filter(!is.na(cpi))Rows: 26 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): month_abbr
dbl (3): year, cpi, month_num
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(cpi)# A tibble: 6 × 4
year month_abbr cpi month_num
<dbl> <chr> <dbl> <dbl>
1 2024 Jan 305. 1
2 2024 Feb 305. 2
3 2024 Mar 305. 3
4 2024 Apr 306. 4
5 2024 May 306. 5
6 2024 Jun 306. 6
basket_trimmed <- basket
cpi_trimmed <- cpi %>%
filter(!(year == 2026 & month_num > 3))
annual_avg <- basket_trimmed %>%
group_by(item_standard, year) %>%
summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop")
item_inflation <- annual_avg %>%
group_by(item_standard) %>%
arrange(year) %>%
mutate(
inflation = round((avg_price - lag(avg_price)) / lag(avg_price) * 100, 1)
) %>%
filter(!is.na(inflation))
cat("Per item price change rates:\n")Per item price change rates:
print(item_inflation)# A tibble: 10 × 4
# Groups: item_standard [5]
item_standard year avg_price inflation
<chr> <dbl> <dbl> <dbl>
1 Avocado (4 Count) 2025 4.54 13.8
2 Banana 2025 0.601 1.9
3 Malk Almond Milk 2025 6.69 1.5
4 Satur Farms Wild Arugula 2025 3.63 17.7
5 Vital Farms Organic Eggs 18pk 2025 15.7 16.9
6 Avocado (4 Count) 2026 4.03 -11.4
7 Banana 2026 0.604 0.5
8 Malk Almond Milk 2026 6.67 -0.3
9 Satur Farms Wild Arugula 2026 3.99 10
10 Vital Farms Organic Eggs 18pk 2026 15.3 -2.8
basket_rate <- item_inflation %>%
group_by(year) %>%
summarise(
basket_rate = round(mean(inflation, na.rm = TRUE), 1),
n_items = n(),
.groups = "drop"
)
cpi_rate <- cpi_trimmed %>%
group_by(year) %>%
summarise(avg_cpi = mean(cpi, na.rm = TRUE), .groups = "drop") %>%
arrange(year) %>%
mutate(
cpi_rate = round((avg_cpi - lag(avg_cpi)) / lag(avg_cpi) * 100, 1)
) %>%
filter(!is.na(cpi_rate))
comparison <- basket_rate %>%
left_join(cpi_rate %>% select(year, cpi_rate), by = "year") %>%
mutate(
period = case_when(
year == 2025 ~ "2024→2025 (full year)",
year == 2026 ~ "2025→2026 (Jan-May/Jan-Mar)"
),
gap = round(basket_rate - cpi_rate, 1)
)
comparison %>%
knitr::kable(
caption = "Personal Price Change Rate vs BLS Food at Home CPI",
col.names = c("Year", "Basket Rate (%)", "Items",
"CPI Rate (%)", "Period", "Gap (pp)")
)| Year | Basket Rate (%) | Items | CPI Rate (%) | Period | Gap (pp) |
|---|---|---|---|---|---|
| 2025 | 10.4 | 5 | 2.2 | 2024→2025 (full year) | 8.2 |
| 2026 | -0.8 | 5 | 1.6 | 2025→2026 (Jan-May/Jan-Mar) | -2.4 |
Personal prices rose 10.4% from 2024 to 2025, while official CPI increased only 2.2%—a gap of 8.2 percentage points. This shows individual consumers at premium retailers experienced substantially different inflation than the national average.
Persona “Inflation”
basket_trimmed <- basket
cpi_trimmed <- cpi %>%
filter(!(year == 2026 & month_num > 3))
annual_avg <- basket_trimmed %>%
group_by(item_standard, year) %>%
summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop")
item_inflation <- annual_avg %>%
group_by(item_standard) %>%
arrange(year) %>%
mutate(
inflation = round((avg_price - lag(avg_price)) / lag(avg_price) * 100, 1)
) %>%
filter(!is.na(inflation))
cat("Per item price change rates:\n")Per item price change rates:
print(item_inflation)# A tibble: 10 × 4
# Groups: item_standard [5]
item_standard year avg_price inflation
<chr> <dbl> <dbl> <dbl>
1 Avocado (4 Count) 2025 4.54 13.8
2 Banana 2025 0.601 1.9
3 Malk Almond Milk 2025 6.69 1.5
4 Satur Farms Wild Arugula 2025 3.63 17.7
5 Vital Farms Organic Eggs 18pk 2025 15.7 16.9
6 Avocado (4 Count) 2026 4.03 -11.4
7 Banana 2026 0.604 0.5
8 Malk Almond Milk 2026 6.67 -0.3
9 Satur Farms Wild Arugula 2026 3.99 10
10 Vital Farms Organic Eggs 18pk 2026 15.3 -2.8
basket_rate <- item_inflation %>%
group_by(year) %>%
summarise(
basket_rate = round(mean(inflation, na.rm = TRUE), 1),
n_items = n(),
.groups = "drop"
)
cpi_rate <- cpi_trimmed %>%
group_by(year) %>%
summarise(avg_cpi = mean(cpi, na.rm = TRUE), .groups = "drop") %>%
arrange(year) %>%
mutate(
cpi_rate = round((avg_cpi - lag(avg_cpi)) / lag(avg_cpi) * 100, 1)
) %>%
filter(!is.na(cpi_rate))
comparison <- basket_rate %>%
left_join(cpi_rate %>% select(year, cpi_rate), by = "year") %>%
mutate(
period = case_when(
year == 2025 ~ "2024→2025\n(full year)",
year == 2026 ~ "2025→2026\n(Jan-May basket/Jan-Mar CPI)"
),
gap = round(basket_rate - cpi_rate, 1)
)
cat("\nComparison table:\n")
Comparison table:
comparison %>%
knitr::kable(
caption = "Personal Price Change Rate vs BLS Food at Home CPI",
col.names = c("Year", "Basket Rate (%)", "Items",
"CPI Rate (%)", "Gap (pp)", "Period")
)| Year | Basket Rate (%) | Items | CPI Rate (%) | Gap (pp) | Period |
|---|---|---|---|---|---|
| 2025 | 10.4 | 5 | 2.2 | 2024→2025 | |
| (full year) | 8.2 | ||||
| 2026 | -0.8 | 5 | 1.6 | 2025→2026 | |
| (Jan-May basket/Jan-Mar CPI) | -2.4 |
comparison %>%
select(period, basket_rate, cpi_rate) %>%
pivot_longer(cols = c(basket_rate, cpi_rate),
names_to = "source", values_to = "rate") %>%
mutate(source = ifelse(source == "basket_rate",
"Personal Basket", "BLS Food at Home CPI")) %>%
ggplot(aes(x = period, y = rate, fill = source)) +
geom_col(position = "dodge", width = 0.6) +
geom_hline(yintercept = 0, color = "grey40") +
geom_text(aes(label = paste0(rate, "%")),
position = position_dodge(width = 0.6),
vjust = -0.2, size = 4, fontface = "bold") +
scale_fill_manual(values = c(
"Personal Basket" = "#E76F51",
"BLS Food at Home CPI" = "#A8DADC"
)) +
labs(
title = "Personal Price Change Rate vs Official Food CPI",
subtitle = "Equal-weighted 5-item basket · Whole Foods Manhattan",
x = NULL,
y = "Price Change Rate (%)",
fill = NULL,
caption = "Source: Personal grocery receipts & BLS CPI series CUUR0000SAF11\n*2026 data covers January–March only for both sources"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "top",
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9),
axis.text.x = element_text(size = 11),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.background = element_rect(fill = "white", color = NA)
)Regression analysis
# Monthly CPI
cpi_monthly <- cpi %>%
select(year, month_num, cpi)
# Monthly average basket price
basket_monthly <- basket %>%
group_by(year, month_num) %>%
summarise(avg_basket_price = mean(unit_price, na.rm = TRUE), .groups = "drop")
# Join by year and month
regression_data <- basket_monthly %>%
left_join(cpi_monthly, by = c("year", "month_num")) %>%
filter(!is.na(cpi), !is.na(avg_basket_price))
cat("Data points for regression:", nrow(regression_data), "\n")Data points for regression: 25
# Linear regression
model <- lm(avg_basket_price ~ cpi, data = regression_data)
summary(model)
Call:
lm(formula = avg_basket_price ~ cpi, data = regression_data)
Residuals:
Min 1Q Median 3Q Max
-7.9559 -2.1510 0.2763 2.3696 6.4657
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -95.6571 51.8037 -1.847 0.0777 .
cpi 0.3318 0.1666 1.991 0.0585 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.713 on 23 degrees of freedom
Multiple R-squared: 0.147, Adjusted R-squared: 0.1099
F-statistic: 3.964 on 1 and 23 DF, p-value: 0.05851
The regression shows a weak relationship between BLS CPI and personal basket prices (R² = 0.147). While the trend is positive, CPI explains only 14.7% of the variance in monthly basket prices. This weak correlation supports our conclusion that national inflation indices may not capture individual consumer experiences.
Regression Chart
regression_data %>%
ggplot(aes(x = cpi, y = avg_basket_price)) +
geom_point(size = 3, color = "#E76F51", alpha = 0.8) +
geom_smooth(method = "lm", color = "#2D6A4F", se = TRUE) +
labs(
title = "Personal Basket Price vs BLS Food at Home CPI",
subtitle = "Linear regression — monthly observations 2024–2026",
x = "BLS Food at Home CPI Index",
y = "Avg Monthly Basket Price ($)",
caption = "R² = 0.147, p = 0.059 · Source: Personal receipts & BLS CPI"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "white", color = NA)
)`geom_smooth()` using formula = 'y ~ x'
Step 9: Sub analysis: Did Amazon Prime savings at Whole Foods become more or less generous over time?
prime_savings %>%
select(date, item, price, savings_amount) %>%
arrange(date) %>%
head(20)# A tibble: 20 × 4
date item price savings_amount
<date> <chr> <dbl> <dbl>
1 2024-01-05 Savings with Prime -2 2
2 2024-01-17 Prime Extra 10.00 0.57 0.57
3 2024-02-08 Savings with Prime -2 2
4 2024-02-14 Prime Extra 10.00 -0.78 0.78
5 2024-02-22 Savings with Prime -4.93 4.93
6 2024-03-16 Savings with Prime -4 4
7 2024-04-14 Savings with Prime -1 1
8 2024-05-10 Savings with Prime -5.14 5.14
9 2024-06-10 Prime Extra -1.49 1.49
10 2024-06-30 Savings with Prime -4 4
11 2024-07-03 Savings with Prime -5.14 5.14
12 2024-07-17 Savings with Prime -2 2
13 2024-07-27 Savings with Prime -2.5 2.5
14 2024-08-10 Savings with Prime -1.5 1.5
15 2024-08-19 Savings with Prime -1.06 1.06
16 2024-09-14 Savings with Prime -0.92 0.92
17 2024-09-21 Prime Extra -2 2
18 2024-10-05 Savings with Prime -2.5 2.5
19 2024-10-28 Savings with Prime -3.3 3.3
20 2024-11-02 Savings with Prime -1.03 1.03
# Get trip totals from master
trip_totals <- master %>%
filter(store == "Whole Foods Manhattan") %>%
filter(!is.na(date)) %>%
group_by(date) %>%
summarise(
trip_total = sum(price[price > 0], na.rm = TRUE),
.groups = "drop"
)
# Join savings to trips
prime_analysis <- prime_savings %>%
filter(str_detect(tolower(item), "saving")) %>%
filter(!is.na(date), !is.na(price)) %>%
mutate(
year = year(date),
savings_amount = abs(price)
) %>%
left_join(trip_totals, by = "date") %>%
mutate(savings_pct = round(savings_amount / trip_total * 100, 1))
prime_annual <- prime_analysis %>%
group_by(year) %>%
summarise(
avg_savings_amt = round(mean(savings_amount, na.rm = TRUE), 2),
avg_savings_pct = round(mean(savings_pct, na.rm = TRUE), 1),
total_saved = round(sum(savings_amount, na.rm = TRUE), 2),
n_trips = n(),
.groups = "drop"
) %>%
filter(!is.na(year))
# Annual summary
prime_analysis <- prime_savings %>%
filter(str_detect(tolower(item), "saving")) %>%
filter(!is.na(date)) %>%
mutate(
year = year(date),
savings_amount = abs(price)
) %>%
left_join(trip_totals, by = "date") %>%
mutate(savings_pct = round(savings_amount / trip_total * 100, 1))
print(prime_annual)# A tibble: 3 × 5
year avg_savings_amt avg_savings_pct total_saved n_trips
<dbl> <dbl> <dbl> <dbl> <int>
1 2024 3.2 6.6 64.0 20
2 2025 2.67 7.4 117. 44
3 2026 1.95 6.5 31.3 16
Amazon Prime membership discounts declined in absolute terms ($3.20 average per trip in 2024 vs. $1.95 in 2026) but remained stable as a percentage of spending (6.5-7.4%). However, 2026 data covers only January through May, so year-over-year comparisons for that period are preliminary. Prime savings did not offset the rising prices documented in the main analysis.
Prime Chart
# Dollar savings chart
p1 <- prime_annual %>%
ggplot(aes(x = factor(year), y = avg_savings_amt, fill = factor(year))) +
geom_col(width = 0.6) +
geom_text(aes(label = paste0("$", avg_savings_amt)),
vjust = 1, fontface = "bold", size = 4) +
scale_fill_manual(values = c("2024" = "#E76F51",
"2025" = "#A8DADC",
"2026" = "#52B788")) +
labs(title = "Avg Prime Savings per Trip ($)",
x = NULL, y = "Avg Savings ($)") +
theme_minimal(base_size = 12) +
theme(legend.position = "none",
plot.title = element_text(face = "bold"))
# Savings % chart
p2 <- prime_annual %>%
ggplot(aes(x = factor(year), y = avg_savings_pct, fill = factor(year))) +
geom_col(width = 0.6) +
geom_text(aes(label = paste0(avg_savings_pct, "%")),
vjust = 1, fontface = "bold", size = 4) +
scale_fill_manual(values = c("2024" = "#E76F51",
"2025" = "#A8DADC",
"2026" = "#52B788")) +
labs(title = "Avg Prime Savings as % of Trip",
x = NULL, y = "Savings (%)") +
theme_minimal(base_size = 12) +
theme(legend.position = "none",
plot.title = element_text(face = "bold"))
library(patchwork)
p1 + p2 +
plot_annotation(
title = "Amazon Prime Savings at Whole Foods — 2024 to 2026",
subtitle = "Left: average dollar savings per trip · Right: savings as % of trip total",
caption = "Source: Personal grocery receipts · Whole Foods Manhattan",
theme = theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(color = "grey40", size = 11),
plot.caption = element_text(color = "grey50", size = 9)
)
)Conclusions
This analysis set out to answer one question: does official food inflation reflect what real consumers actually pay? Based on three years of personal grocery receipts from Whole Foods Manhattan, the answer is: not always, and not proportionally.
Between 2024 and 2025, the personal basket prices rose by 10.4% nearly five times the official BLS Food at Home CPI increase of 2.2% for the same period. This gap was largely driven by the Vital Farms organic egg price spike, which reflected the well-documented 2024-2025 avian flu outbreak, and a significant rise in avocado prices. Official CPI, which averages across thousands of products and millions of consumers nationwide, smoothed out the volatility that individual shoppers actually experienced.
The linear regression between monthly basket prices and CPI (R² = 0.147, p = 0.059) confirmed that CPI is a weak and statistically marginal predictor of personal grocery prices, further supporting the conclusion that aggregate inflation metrics may underrepresent the experience of individual consumers particularly those shopping at premium stores.
The Amazon Prime savings sub-analysis revealed that while dollar savings per trip declined from $3.20 in 2024 to $1.95 in 2026, savings as a percentage of trip total remained relatively stable at 6.5-7.4%, suggesting that Prime discounts did not meaningfully offset rising prices.
Limitations
This analysis has several important limitations that should be considered when interpreting the findings. First, the dataset reflects a single household’s shopping behavior at one urban chain grocery store and cannot be generalized to broader consumer populations. Second, OCR extraction from receipt photographs introduced systematic noise in item names, requiring manual synonym mapping that may have introduced subjectivity in item classification. Third, the personal basket consists of only 5 items, which is far narrower than the hundreds of items tracked by the BLS CPI. Fourth, the 2026 data covers only January through May, making year-over-year comparisons for that period preliminary. Fifth, the equal-weighting methodology applied to basket items does not reflect actual household expenditure patterns, unlike the expenditure-weighted approach used by the BLS. Finally, with only 25 monthly observations, the regression analysis has limited statistical power. However, these limitations are inherent to any consumer-level longitudinal study conducted outside of a controlled research setting. The self-collected nature of this dataset is also its strength it captures real purchasing behavior with no sampling bias, providing an honest and unfiltered view of price changes as experienced by an actual consumer over time.