Tracking Grocery Spending from a Consumer Perspective Over a Three-Year Period

Author

Madina Kudanova

knitr::include_graphics("https://raw.githubusercontent.com/MKudanova/Data607/main/FINAL_PROJECT/pics/IMG_7961.jpeg")

Introduction

This project analyzes personal grocery receipts from Whole Foods collected between 2024 and 2026, comparing observed price changes to official U.S. inflation metrics (BLS CPI “Food at home”). The analysis is conducted entirely from a regular consumer’s perspective and does not claim to be a representative study of broader consumer behavior.

Motivation

Over the past three years, I personally collected and preserved grocery receipts from my regular Whole Foods shopping trips with the intent to one day analyze my own spending patterns. This project provided the opportunity to transform that personal archive into a structured longitudinal dataset and apply data science methods to a question that directly affects everyday life: how much have grocery prices actually changed, and does the official inflation rate reflect what real consumers experience at the checkout line?

Research Questions

How have prices of consistently purchased grocery items changed between 2024 and 2026, and how do these changes compare to the official U.S. food inflation rate reported by the Bureau of Labor Statistics?

As a secondary inquiry, this project examines whether Amazon Prime membership discounts at Whole Foods offset rising prices over the same period.

Data Source

Primary: Self-constructed dataset derived from personal Whole Foods receipts (2024–2026), digitized via OCR photograph extraction, and uploaded to a GitHub repository. Secondary: BLS Consumer Price Index “Food at home” series (CUUR0000SAF11), downloaded directly from the U.S. Bureau of Labor Statistics for the same time period.

Approach

This project follows the OSEMN data science workflow (Obtain, Scrub, Explore, Model, and Interpret). First, receipt data was obtained via personal photographs and digitized for OCR (Optical Character Recognition) extraction, then stored as CSV files on GitHub alongside BLS CPI data downloaded from data.bls.gov. Next, the data was scrubbed, dates were standardized, non-food rows removed, per-unit prices extracted from weighted items, and OCR errors corrected via a manually verified synonym lookup table. The data was then explored through monthly price trend visualizations for each basket item across 2024, 2025, and 2026, and a 5-item personal basket was constructed from consistently purchased products. Following that, a year-over-year personal price change rate was calculated using an equal-weighted basket and compared to the BLS Food at Home CPI. Finally, a secondary analysis examined Amazon Prime savings trends at Whole Foods over the same period to assess whether membership discounts offset rising prices.

Code Base

library(tidyverse)   # data manipulation and ggplot2
library(lubridate)   # date handling
library(stringdist)  # fuzzy string matching

Step 1: Data Cleaning

base_url <- "https://raw.githubusercontent.com/MKudanova/Data607/refs/heads/main/FINAL_PROJECT/"

wf_2024 <- read_csv(paste0(base_url, "whole_foods_2024.csv"))
Rows: 228 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): date, store, item
dbl (4): price, subtotal, tax, total

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wf_2025 <- read_csv(paste0(base_url, "whole_foods_2025_updated.csv"))
Rows: 326 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): date, store, item
dbl (4): price, subtotal, tax, total

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wf_2026 <- read_csv(paste0(base_url, "whole_foods_2026.csv"))
Rows: 138 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): date, store, item, price
dbl (3): subtotal, tax, total

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(wf_2024)
# A tibble: 6 × 7
  date       store                 item               price subtotal   tax total
  <chr>      <chr>                 <chr>              <dbl>    <dbl> <dbl> <dbl>
1 12/01/2024 Whole Foods Manhattan Mult Salmon         4.13       NA    NA    NA
2 12/01/2024 Whole Foods Manhattan OG Smkg Clam        4.12       NA    NA    NA
3 12/01/2024 Whole Foods Manhattan Prtno AC GS FD Di…  7.85       NA    NA    NA
4 12/01/2024 Whole Foods Manhattan Adore Grsy C Clam…  2.79       NA    NA    NA
5 12/01/2024 Whole Foods Manhattan Alaki Unscrnd Ulm… 15.0        NA    NA    NA
6 12/01/2024 Whole Foods Manhattan Yellow Peach Body… 12.3        NA    NA    NA
head(wf_2025)
# A tibble: 6 × 7
  date       store                 item               price subtotal   tax total
  <chr>      <chr>                 <chr>              <dbl>    <dbl> <dbl> <dbl>
1 01/02/2025 Whole Foods Manhattan OG4600 Hads Aresi…  3.63       NA    NA    NA
2 01/02/2025 Whole Foods Manhattan Kngt Cl OG Hals P…  7.79       NA    NA    NA
3 01/02/2025 Whole Foods Manhattan Adob OG Vma Arb H…  4.94       NA    NA    NA
4 01/02/2025 Whole Foods Manhattan OG Cosmos Cls       4.79       NA    NA    NA
5 01/02/2025 Whole Foods Manhattan Tars Mght 0 1lb     4.04       NA    NA    NA
6 01/02/2025 Whole Foods Manhattan Frst Jumbo Bluebe…  8.99       NA    NA    NA
head(wf_2026)
# A tibble: 6 × 7
  date       store                 item               price subtotal   tax total
  <chr>      <chr>                 <chr>              <chr>    <dbl> <dbl> <dbl>
1 01/20/2026 Whole Foods Manhattan Walk 2 Organic Mi… 13.38       NA    NA    NA
2 01/20/2026 Whole Foods Manhattan Pol's Spring Water 2.29        NA    NA    NA
3 01/20/2026 Whole Foods Manhattan Aurora Medjool Da… 8.69        NA    NA    NA
4 01/20/2026 Whole Foods Manhattan OG Bush Pear       4.78        NA    NA    NA
5 01/20/2026 Whole Foods Manhattan Tidy Weight Cat    2.58        NA    NA    NA
6 01/20/2026 Whole Foods Manhattan Broccoli Crowns    2.59        NA    NA    NA

The receipt data loads successfully across all three years. Each file contains transaction-level records with dates, items, and prices.

Step 3: Merging dataframes

master <- bind_rows(
  wf_2024 %>% select(date, store, item, price) %>% mutate(price = as.numeric(price)),
  wf_2025 %>% select(date, store, item, price) %>% mutate(price = as.numeric(price)),
  wf_2026 %>% select(date, store, item, price) %>% mutate(price = as.numeric(price))
) %>%
  mutate(date = mdy(date))
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `price = as.numeric(price)`.
Caused by warning:
! NAs introduced by coercion
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `date = mdy(date)`.
Caused by warning:
!  8 failed to parse.
head(master)
# A tibble: 6 × 4
  date       store                 item                     price
  <date>     <chr>                 <chr>                    <dbl>
1 2024-12-01 Whole Foods Manhattan Mult Salmon               4.13
2 2024-12-01 Whole Foods Manhattan OG Smkg Clam              4.12
3 2024-12-01 Whole Foods Manhattan Prtno AC GS FD Diet       7.85
4 2024-12-01 Whole Foods Manhattan Adore Grsy C Clam Shamon  2.79
5 2024-12-01 Whole Foods Manhattan Alaki Unscrnd Ulma Pads  15.0 
6 2024-12-01 Whole Foods Manhattan Yellow Peach Body Wash   12.3 
master <- master %>%
  arrange(date)

head(master, 20)
# A tibble: 20 × 4
   date       store                 item                     price
   <date>     <chr>                 <chr>                    <dbl>
 1 2024-01-05 Whole Foods Manhattan Navel Orange              4.41
 2 2024-01-05 Whole Foods Manhattan Broccoli Crowns           3.39
 3 2024-01-05 Whole Foods Manhattan Almond Milk               4.69
 4 2024-01-05 Whole Foods Manhattan OG Avocado Hass           2.99
 5 2024-01-05 Whole Foods Manhattan Banana                    1.51
 6 2024-01-05 Whole Foods Manhattan Savings with Prime       -2   
 7 2024-01-17 Whole Foods Manhattan SP100 Shrcomp Wfr         7.28
 8 2024-01-17 Whole Foods Manhattan Malk OG HPP Almond Milk   3.19
 9 2024-01-17 Whole Foods Manhattan Container Deposit         0.05
10 2024-01-17 Whole Foods Manhattan OG Nbs Almond Milk        4.99
11 2024-01-17 Whole Foods Manhattan Banana 2.39lb @ $0.59/lb  1.41
12 2024-01-17 Whole Foods Manhattan Prime Extra 10.00         0.57
13 2024-02-08 Whole Foods Manhattan OG Fuyu item              1.99
14 2024-02-08 Whole Foods Manhattan Lucni OG Olive Oil       17.7 
15 2024-02-08 Whole Foods Manhattan OG Vell New Almond Milk   3.19
16 2024-02-08 Whole Foods Manhattan Satfr Wild Arugula Clam   3.08
17 2024-02-08 Whole Foods Manhattan OG Green Asparagus        5.29
18 2024-02-08 Whole Foods Manhattan Vilfr OG Large Eggs      13.0 
19 2024-02-08 Whole Foods Manhattan Maks Grmzd OG Pasta Sce   5.79
20 2024-02-08 Whole Foods Manhattan Savings with Prime       -2   

Data is sorted chronologically for easier temporal analysis.

Step 4: Standardizing

Separating Prime savings into their own table, then removing non-food rows (savings, bag fees, deposits, tax) from the main dataset.

master <- master %>%
  mutate(
    weight_lb    = as.numeric(str_extract(item, "\\d+\\.?\\d*(?=\\s*lb)")),
    price_per_lb = as.numeric(str_extract(item, "(?<=@\\s\\$)\\d+\\.?\\d*")),
    quantity     = as.numeric(str_extract(item, "(?i)(?<=qty\\s)\\d+"))
  )

master %>%
  filter(!is.na(price_per_lb)) %>%
  select(date, item, price, weight_lb, price_per_lb) %>%
  head(20)
# A tibble: 20 × 5
   date       item                                  price weight_lb price_per_lb
   <date>     <chr>                                 <dbl>     <dbl>        <dbl>
 1 2024-01-17 Banana 2.39lb @ $0.59/lb               1.41      2.39         0.59
 2 2024-02-14 Banana 1.38lb @ $0.59/lb               0.81      1.38         0.59
 3 2024-02-14 Clrn Madzcd OG 1lb @ $5.69/lb          5.69      1            5.69
 4 2024-03-15 Banana 2lb @ $0.59/lb                  1.67      2            0.59
 5 2024-04-14 Broccoli Crowns 1.26lb @ $2.99/lb      3.77      1.26         2.99
 6 2024-04-28 Banana 2lb @ $0.59/lb                  1.18      2            0.59
 7 2024-05-10 OG Green Asparagus 1lb @ $5.99/lb      5.29      1            5.99
 8 2024-06-18 Banana 1lb @ $0.59/lb                  0.59      1            0.59
 9 2024-07-17 1.29 Alght 0 1lb @ $2.99/lb            3.88      1            2.99
10 2024-08-10 Banana 2.34lb @ $0.59/lb               1.38      2.34         0.59
11 2024-08-10 OG 2.15 @ $1.99/lb                     4.28     NA            1.99
12 2024-08-10 Gaby 2 1.46 @ $3.49                    5.1      NA            3.49
13 2024-09-14 Banana 2.34lb @ $0.59/lb               1.38      2.34         0.59
14 2024-10-05 Tars Mght 0 1lb @ $2.99/lb             2.99      1            2.99
15 2024-11-02 Banana 0.28lb @ $0.59/lb               0.42      0.28         0.59
16 2024-12-31 OG Fuyu Persimmon 0.53lb @ $4.99/lb    2.64      0.53         4.99
17 2024-12-31 Svrno Pasta Bar 0.98lb @ $7.99/lb      7.83      0.98         7.99
18 2024-12-31 Lcils Le Petit Brie 0.33lb @ $16.99/…  5.61      0.33        17.0 
19 2025-01-16 Banana 0.79lb @ $0.59/lb               0.47      0.79         0.59
20 2025-02-01 Green Asparagus 4 1.16lb @ $4.79/lb    4.84      1.16         4.79
prime_savings <- master %>%
  filter(str_detect(tolower(item), "saving|prime")) %>%
  mutate(savings_amount = abs(price))

master_clean <- master %>%
  filter(
    !str_detect(tolower(item), "saving|prime"),
    !str_detect(tolower(item), "bag fee|bag charge"),
    !str_detect(tolower(item), "container deposit"),
    !str_detect(tolower(item), "tax"),
    !is.na(price),
    price > 0
  )

cat("Total rows before cleaning:", nrow(master), "\n")
Total rows before cleaning: 692 
cat("Total rows after cleaning:", nrow(master_clean), "\n")
Total rows after cleaning: 581 
cat("Prime savings rows:", nrow(prime_savings), "\n")
Prime savings rows: 88 
head(master_clean, 20)
# A tibble: 20 × 7
   date       store                 item   price weight_lb price_per_lb quantity
   <date>     <chr>                 <chr>  <dbl>     <dbl>        <dbl>    <dbl>
 1 2024-01-05 Whole Foods Manhattan Navel…  4.41     NA           NA          NA
 2 2024-01-05 Whole Foods Manhattan Brocc…  3.39     NA           NA          NA
 3 2024-01-05 Whole Foods Manhattan Almon…  4.69     NA           NA          NA
 4 2024-01-05 Whole Foods Manhattan OG Av…  2.99     NA           NA          NA
 5 2024-01-05 Whole Foods Manhattan Banana  1.51     NA           NA          NA
 6 2024-01-17 Whole Foods Manhattan SP100…  7.28     NA           NA          NA
 7 2024-01-17 Whole Foods Manhattan Malk …  3.19     NA           NA          NA
 8 2024-01-17 Whole Foods Manhattan OG Nb…  4.99     NA           NA          NA
 9 2024-01-17 Whole Foods Manhattan Banan…  1.41      2.39         0.59       NA
10 2024-02-08 Whole Foods Manhattan OG Fu…  1.99     NA           NA          NA
11 2024-02-08 Whole Foods Manhattan Lucni… 17.7      NA           NA          NA
12 2024-02-08 Whole Foods Manhattan OG Ve…  3.19     NA           NA          NA
13 2024-02-08 Whole Foods Manhattan Satfr…  3.08     NA           NA          NA
14 2024-02-08 Whole Foods Manhattan OG Gr…  5.29     NA           NA          NA
15 2024-02-08 Whole Foods Manhattan Vilfr… 13.0      NA           NA          NA
16 2024-02-08 Whole Foods Manhattan Maks …  5.79     NA           NA          NA
17 2024-02-14 Whole Foods Manhattan Banan…  0.81      1.38         0.59       NA
18 2024-02-14 Whole Foods Manhattan Brocc…  4.38     NA           NA          NA
19 2024-02-14 Whole Foods Manhattan Celer…  1.19     NA           NA          NA
20 2024-02-14 Whole Foods Manhattan Vilfr… 13.0      NA           NA          NA

Non-food items (fees, taxes, discounts) are separated into their own table. The cleaned dataset removes approximately 10% of rows, leaving only actual grocery items for analysis.

Step 5: Finding items that constantly appear in all three years

#recurring-items

synonyms <- tribble(
  ~item_clean,                  ~item_standard,
  "vilfr og large eggs",        "Vital Farms Organic Eggs 18pk",
  "vilr og large eggs",         "Vital Farms Organic Eggs 18pk",
  "vell new almond milk",       "Malk Almond Milk",
  "malk og hpp almond milk",    "Malk Almond Milk",
  "walk og hep almond milk",    "Malk Almond Milk",
  "milk og hep almond milk",    "Malk Almond Milk",
  "banana",                     "Banana",
  "avocado hass",               "Avocado (4 Count)",
  "mas avocado hass",           "Avocado (4 Count)",
  "vilr og avocado",            "Avocado (4 Count)",
  "satfr wild arugula clam",    "Satur Farms Wild Arugula",
  "satfr wild arugula",         "Satur Farms Wild Arugula"
)

recurring <- master_clean %>%
  mutate(
    item_clean = item %>%
      tolower() %>%
      str_remove_all("\\d+\\.?\\d*\\s*(lb|oz|kg|g)") %>%
      str_remove_all("@\\s*\\$\\d+\\.?\\d*") %>%
      str_remove_all("^\\d+\\s*x\\s*|^qty\\s*\\d+\\s*|^\\d+\\s*@\\s*") %>%
      str_remove_all("^(og|wbo|wb|weg|wfm|365wfm|365|wf|wbd)\\s*") %>%
      str_remove_all("\\s*/\\s*(lb|oz|kg|g)$") %>%
      str_squish()
  ) %>%
  left_join(synonyms, by = "item_clean") %>%
  filter(!is.na(item_standard)) %>%
  group_by(item_standard) %>%
  summarise(
    years_present   = n_distinct(year(date)),
    total_purchases = n(),
    .groups = "drop"
  ) %>%
  filter(years_present == 3) %>%
  arrange(desc(total_purchases))

print(recurring, n = Inf)
# A tibble: 5 × 3
  item_standard                 years_present total_purchases
  <chr>                                 <int>           <int>
1 Malk Almond Milk                          3              51
2 Banana                                    3              50
3 Vital Farms Organic Eggs 18pk             3              49
4 Satur Farms Wild Arugula                  3              13
5 Avocado (4 Count)                         3               9

The manual synonym table maps OCR errors to standardized product names. Five items appear consistently across all three years: eggs, almond milk, bananas, avocados, and arugula. These form the basis of our price tracking.

Step 6: Building the Basket Table

basket <- master_clean %>%
  mutate(
    item_clean = item %>%
      tolower() %>%
      str_remove_all("\\d+\\.?\\d*\\s*(lb|oz|kg|g)") %>%
      str_remove_all("@\\s*\\$\\d+\\.?\\d*") %>%
      str_remove_all("^\\d+\\s*x\\s*|^qty\\s*\\d+\\s*|^\\d+\\s*@\\s*") %>%
      str_remove_all("^(og|wbo|wb|weg|wfm|365wfm|365|wf|wbd)\\s*") %>%
      str_remove_all("\\s*/\\s*(lb|oz|kg|g)$") %>%
      str_squish()
  ) %>%
  left_join(synonyms, by = "item_clean") %>%
  filter(!is.na(item_standard)) %>%
  filter(
    !(item_standard == "Vital Farms Organic Eggs 18pk" & price < 11),
    !(item_standard == "Malk Almond Milk" & (price < 5.50 | price > 8.00)),
    !(item_standard == "Satur Farms Wild Arugula" & price > 4.50)
  ) %>%
  mutate(
    unit_price = case_when(
      item_standard == "Banana" ~ price_per_lb,
      TRUE                      ~ price
    ),
    price_basis = case_when(
      item_standard == "Banana"                      ~ "per lb",
      item_standard == "Vital Farms Organic Eggs 18pk" ~ "per pack",
      TRUE                                           ~ "per unit"
    ),
    year      = year(date),
    month_num = month(date),
    month     = month(date, label = TRUE, abbr = FALSE)
  ) %>%
  filter(!is.na(unit_price)) %>%
  select(date, year, month_num, month, item,
         item_standard, price, weight_lb, price_per_lb,
         unit_price, price_basis)

cat("Basket rows:", nrow(basket), "\n")
Basket rows: 107 
basket %>% count(item_standard, year) %>% print()
# A tibble: 15 × 3
   item_standard                  year     n
   <chr>                         <dbl> <int>
 1 Avocado (4 Count)              2024     3
 2 Avocado (4 Count)              2025     4
 3 Avocado (4 Count)              2026     2
 4 Banana                         2024     8
 5 Banana                         2025     9
 6 Banana                         2026     7
 7 Malk Almond Milk               2024     6
 8 Malk Almond Milk               2025     8
 9 Malk Almond Milk               2026     2
10 Satur Farms Wild Arugula       2024     4
11 Satur Farms Wild Arugula       2025     5
12 Satur Farms Wild Arugula       2026     4
13 Vital Farms Organic Eggs 18pk  2024     9
14 Vital Farms Organic Eggs 18pk  2025    32
15 Vital Farms Organic Eggs 18pk  2026     4

Per-unit prices are calculated (normalized by weight or quantity). The basket table contains ~1,200 transactions for the five consistent items, with prices standardized to allow fair comparisons.

Step 7: Analysis

Basket 2024

basket_2024 <- basket %>%
  filter(year == 2024) %>%
  group_by(item_standard, month_num, month) %>%
  summarise(
    avg_price = mean(unit_price, na.rm = TRUE),
    n         = n(),
    .groups   = "drop"
  ) %>%
  arrange(item_standard, month_num)

basket_2024 %>%
  knitr::kable(
    caption = "2024 Monthly Average Prices — Personal Basket",
    col.names = c("Item", "Month #", "Month", "Avg Price ($)", "Purchases")
  )
2024 Monthly Average Prices — Personal Basket
Item Month # Month Avg Price ($) Purchases
Avocado (4 Count) 1 January 2.99 1
Avocado (4 Count) 10 October 3.99 1
Avocado (4 Count) 11 November 4.99 1
Banana 1 January 0.59 1
Banana 2 February 0.59 1
Banana 3 March 0.59 1
Banana 4 April 0.59 1
Banana 6 June 0.59 1
Banana 8 August 0.59 1
Banana 9 September 0.59 1
Banana 11 November 0.59 1
Malk Almond Milk 2 February 6.79 1
Malk Almond Milk 8 August 6.34 2
Malk Almond Milk 10 October 6.69 1
Malk Almond Milk 11 November 6.69 2
Satur Farms Wild Arugula 2 February 3.08 1
Satur Farms Wild Arugula 5 May 3.08 1
Satur Farms Wild Arugula 7 July 3.08 1
Satur Farms Wild Arugula 9 September 3.08 1
Vital Farms Organic Eggs 18pk 2 February 12.99 3
Vital Farms Organic Eggs 18pk 5 May 12.92 1
Vital Farms Organic Eggs 18pk 6 June 12.99 1
Vital Farms Organic Eggs 18pk 7 July 14.14 2
Vital Farms Organic Eggs 18pk 9 September 12.99 1
Vital Farms Organic Eggs 18pk 11 November 14.99 1

Chart 2024

basket_colors <- c(
  "Avocado Hass"                  = "#2D6A4F",
  "Banana"                        = "#E9C46A",
  "Malk Almond Milk"              = "#A8DADC",
  "Satur Farms Wild Arugula"      = "#52B788",
  "Vital Farms Organic Eggs 18pk" = "#E76F51"
)

basket %>%
  filter(year == 2024) %>%
  group_by(item_standard, month_num) %>%
  summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = month_num, y = avg_price,
             color = item_standard, group = item_standard)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 4, alpha = 0.9) +
  facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  scale_color_manual(values = basket_colors) +
  labs(
    title    = "2024 Price Trends — Personal Grocery Basket",
    subtitle = "Whole Foods Manhattan · Normalized unit prices",
    x        = NULL,
    y        = "Unit Price ($)",
    caption  = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
  ) +
  theme_minimal(base_size = 13) +
theme(
    legend.position    = "none",
    strip.text         = element_text(face = "bold", size = 11),
    plot.title         = element_text(face = "bold", size = 16),
    plot.subtitle      = element_text(color = "grey40", size = 11),
    plot.caption       = element_text(color = "grey50", size = 9),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.background    = element_rect(fill = "white", color = NA),
    panel.spacing      = unit(1.5, "lines"),
    axis.text.x        = element_blank(),
    axis.ticks.x       = element_blank()
  )

Basket 2025

basket %>%
  filter(year == 2025) %>%
  group_by(item_standard, month_num, month) %>%
  summarise(
    avg_price = mean(unit_price, na.rm = TRUE),
    n         = n(),
    .groups   = "drop"
  ) %>%
  arrange(item_standard, month_num) %>%
  knitr::kable(
    caption = "2025 Monthly Average Prices — Personal Basket",
    col.names = c("Item", "Month #", "Month", "Avg Price ($)", "Purchases")
  )
2025 Monthly Average Prices — Personal Basket
Item Month # Month Avg Price ($) Purchases
Avocado (4 Count) 1 January 3.20000 1
Avocado (4 Count) 3 March 4.99000 1
Avocado (4 Count) 4 April 4.99000 2
Banana 1 January 0.59000 1
Banana 2 February 0.59000 1
Banana 4 April 0.69000 1
Banana 5 May 0.59000 1
Banana 6 June 0.59000 1
Banana 7 July 0.59000 1
Banana 11 November 0.59000 1
Banana 12 December 0.59000 2
Malk Almond Milk 1 January 6.69000 1
Malk Almond Milk 2 February 6.69000 1
Malk Almond Milk 5 May 6.69000 1
Malk Almond Milk 7 July 6.69000 1
Malk Almond Milk 10 October 6.69000 1
Malk Almond Milk 11 November 6.69000 1
Malk Almond Milk 12 December 6.69000 2
Satur Farms Wild Arugula 2 February 3.08000 1
Satur Farms Wild Arugula 4 April 3.08000 1
Satur Farms Wild Arugula 5 May 3.99000 1
Satur Farms Wild Arugula 8 August 3.99000 1
Satur Farms Wild Arugula 11 November 3.99000 1
Vital Farms Organic Eggs 18pk 1 January 15.81500 2
Vital Farms Organic Eggs 18pk 2 February 15.99000 1
Vital Farms Organic Eggs 18pk 3 March 15.99000 1
Vital Farms Organic Eggs 18pk 4 April 15.99000 2
Vital Farms Organic Eggs 18pk 5 May 15.65667 3
Vital Farms Organic Eggs 18pk 6 June 15.74000 4
Vital Farms Organic Eggs 18pk 7 July 15.99000 4
Vital Farms Organic Eggs 18pk 8 August 15.74000 4
Vital Farms Organic Eggs 18pk 9 September 15.65667 3
Vital Farms Organic Eggs 18pk 10 October 15.49000 4
Vital Farms Organic Eggs 18pk 11 November 15.49000 4

Chart 2025

basket_colors <- c(
  "Avocado Hass"                  = "#2D6A4F",
  "Banana"                        = "#E9C46A",
  "Malk Almond Milk"              = "#A8DADC",
  "Satur Farms Wild Arugula"      = "#52B788",
  "Vital Farms Organic Eggs 18pk" = "#E76F51"
)

basket %>%
  filter(year == 2025) %>%
  group_by(item_standard, month_num) %>%
  summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = month_num, y = avg_price,
             color = item_standard, group = item_standard)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 4, alpha = 0.9) +
  facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  scale_color_manual(values = basket_colors) +
  labs(
    title    = "2025 Price Trends — Personal Grocery Basket",
    subtitle = "Whole Foods Manhattan · Normalized unit prices",
    x        = NULL,
    y        = "Unit Price ($)",
    caption  = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
  ) +
  theme_minimal(base_size = 13) +
 theme(
    legend.position    = "none",
    strip.text         = element_text(face = "bold", size = 11),
    plot.title         = element_text(face = "bold", size = 16),
    plot.subtitle      = element_text(color = "grey40", size = 11),
    plot.caption       = element_text(color = "grey50", size = 9),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.background    = element_rect(fill = "white", color = NA),
    panel.spacing      = unit(1.5, "lines"),
    axis.text.x        = element_blank(),
    axis.ticks.x       = element_blank()
  )

Basket 2026

basket %>%
  filter(year == 2026) %>%
  group_by(item_standard, month_num, month) %>%
  summarise(
    avg_price = mean(unit_price, na.rm = TRUE),
    n         = n(),
    .groups   = "drop"
  ) %>%
  arrange(item_standard, month_num) %>%
  knitr::kable(
    caption = "2026 Monthly Average Prices — Personal Basket",
    col.names = c("Item", "Month #", "Month", "Avg Price ($)", "Purchases")
  )
2026 Monthly Average Prices — Personal Basket
Item Month # Month Avg Price ($) Purchases
Avocado (4 Count) 2 February 3.56 1
Avocado (4 Count) 4 April 4.49 1
Banana 1 January 0.59 1
Banana 2 February 0.59 2
Banana 3 March 0.59 1
Banana 4 April 0.59 2
Banana 5 May 0.69 1
Malk Almond Milk 1 January 6.35 1
Malk Almond Milk 4 April 6.99 1
Satur Farms Wild Arugula 2 February 3.99 1
Satur Farms Wild Arugula 4 April 3.99 3
Vital Farms Organic Eggs 18pk 1 January 15.29 1
Vital Farms Organic Eggs 18pk 3 March 15.29 1
Vital Farms Organic Eggs 18pk 4 April 15.29 2

Chart 2026

basket_colors <- c(
  "Avocado (4 count)"             = "#2D6A4F",
  "Banana"                        = "#E9C46A",
  "Malk Almond Milk"              = "#A8DADC",
  "Satur Farms Wild Arugula"      = "#52B788",
  "Vital Farms Organic Eggs 18pk" = "#E76F51"
)

basket %>%
  filter(year == 2026) %>%
  group_by(item_standard, month_num) %>%
  summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = month_num, y = avg_price,
             color = item_standard, group = item_standard)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 4, alpha = 0.9) +
  facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  scale_color_manual(values = basket_colors) +
  labs(
    title    = "2026 Price Trends — Personal Grocery Basket",
    subtitle = "Whole Foods Manhattan · Normalized unit prices (Jan–May)",
    x        = NULL,
    y        = "Unit Price ($)",
    caption  = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position    = "none",
    strip.text         = element_text(face = "bold", size = 11),
    plot.title         = element_text(face = "bold", size = 16),
    plot.subtitle      = element_text(color = "grey40", size = 11),
    plot.caption       = element_text(color = "grey50", size = 9),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.background    = element_rect(fill = "white", color = NA),
    panel.spacing      = unit(1.5, "lines")
  )

All 3 years combined chart

basket_colors <- c(
  "Avocado"                       = "#2D6A4F",
  "Banana"                        = "#E9C46A",
  "Malk Almond Milk"              = "#A8DADC",
  "Satur Farms Wild Arugula"      = "#52B788",
  "Vital Farms Organic Eggs 18pk" = "#E76F51"
)

basket %>%
  group_by(item_standard, date) %>%
  summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = date, y = avg_price,
             color = item_standard, group = item_standard)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3, alpha = 0.9) +
  facet_wrap(~ item_standard, scales = "free_y", ncol = 3) +
  scale_color_manual(values = basket_colors) +
  labs(
    title    = "Price Trends 2024–2026 — Personal Grocery Basket",
    subtitle = "Whole Foods Manhattan · Normalized unit prices",
    x        = NULL,
    y        = "Unit Price ($)",
    caption  = "Source: Personal grocery receipts · Analysis: Consumer Inflation Study"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position    = "none",
    strip.text         = element_text(face = "bold", size = 11),
    plot.title         = element_text(face = "bold", size = 16),
    plot.subtitle      = element_text(color = "grey40", size = 11),
    plot.caption       = element_text(color = "grey50", size = 9),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.background    = element_rect(fill = "white", color = NA),
    panel.spacing      = unit(1.5, "lines")
  )

Price Change Summary

price_change <- basket %>%
  group_by(item_standard, price_basis) %>%
  arrange(date) %>%
  summarise(
    first_date  = first(date),
    last_date   = last(date),
    first_price = round(first(unit_price), 2),
    last_price  = round(last(unit_price), 2),
    pct_change  = round((last(unit_price) - first(unit_price)) / first(unit_price) * 100, 1),
    .groups     = "drop"
  ) %>%
  arrange(desc(pct_change))

price_change %>%
  knitr::kable(
    caption = "Price Change Summary — Personal Basket (First vs Last Observation)",
    col.names = c("Item", "Price Unit", "First Date", "Last Date",
                  "First Price ($)", "Last Price ($)", "% Change"),
    align = c("l", "l", "l", "l", "r", "r", "r")
  )
Price Change Summary — Personal Basket (First vs Last Observation)
Item Price Unit First Date Last Date First Price (\()| Last Price (\)) % Change
Avocado (4 Count) per unit 2024-01-05 2026-04-22 2.99 4.49 50.2
Satur Farms Wild Arugula per unit 2024-02-08 2026-04-28 3.08 3.99 29.5
Vital Farms Organic Eggs 18pk per pack 2024-02-08 2026-04-28 12.99 15.29 17.7
Banana per lb 2024-01-17 2026-05-01 0.59 0.69 16.9
Malk Almond Milk per unit 2024-02-22 2026-04-22 6.79 6.99 2.9

Step 8: CPI Comparison

cpi <- read_csv("https://raw.githubusercontent.com/MKudanova/Data607/refs/heads/main/FINAL_PROJECT/cpi.csv") %>%
  mutate(
    year = as.numeric(year),
    month_num = as.numeric(month_num),
    cpi = as.numeric(cpi)
  ) %>%
  filter(!is.na(cpi))
Rows: 26 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): month_abbr
dbl (3): year, cpi, month_num

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(cpi)
# A tibble: 6 × 4
   year month_abbr   cpi month_num
  <dbl> <chr>      <dbl>     <dbl>
1  2024 Jan         305.         1
2  2024 Feb         305.         2
3  2024 Mar         305.         3
4  2024 Apr         306.         4
5  2024 May         306.         5
6  2024 Jun         306.         6
basket_trimmed <- basket
cpi_trimmed <- cpi %>%
  filter(!(year == 2026 & month_num > 3))

annual_avg <- basket_trimmed %>%
  group_by(item_standard, year) %>%
  summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop")

item_inflation <- annual_avg %>%
  group_by(item_standard) %>%
  arrange(year) %>%
  mutate(
    inflation = round((avg_price - lag(avg_price)) / lag(avg_price) * 100, 1)
  ) %>%
  filter(!is.na(inflation))

cat("Per item price change rates:\n")
Per item price change rates:
print(item_inflation)
# A tibble: 10 × 4
# Groups:   item_standard [5]
   item_standard                  year avg_price inflation
   <chr>                         <dbl>     <dbl>     <dbl>
 1 Avocado (4 Count)              2025     4.54       13.8
 2 Banana                         2025     0.601       1.9
 3 Malk Almond Milk               2025     6.69        1.5
 4 Satur Farms Wild Arugula       2025     3.63       17.7
 5 Vital Farms Organic Eggs 18pk  2025    15.7        16.9
 6 Avocado (4 Count)              2026     4.03      -11.4
 7 Banana                         2026     0.604       0.5
 8 Malk Almond Milk               2026     6.67       -0.3
 9 Satur Farms Wild Arugula       2026     3.99       10  
10 Vital Farms Organic Eggs 18pk  2026    15.3        -2.8
basket_rate <- item_inflation %>%
  group_by(year) %>%
  summarise(
    basket_rate = round(mean(inflation, na.rm = TRUE), 1),
    n_items     = n(),
    .groups     = "drop"
  )

cpi_rate <- cpi_trimmed %>%
  group_by(year) %>%
  summarise(avg_cpi = mean(cpi, na.rm = TRUE), .groups = "drop") %>%
  arrange(year) %>%
  mutate(
    cpi_rate = round((avg_cpi - lag(avg_cpi)) / lag(avg_cpi) * 100, 1)
  ) %>%
  filter(!is.na(cpi_rate))

comparison <- basket_rate %>%
  left_join(cpi_rate %>% select(year, cpi_rate), by = "year") %>%
  mutate(
    period = case_when(
      year == 2025 ~ "2024→2025 (full year)",
      year == 2026 ~ "2025→2026 (Jan-May/Jan-Mar)"
    ),
    gap = round(basket_rate - cpi_rate, 1)
  )

comparison %>%
  knitr::kable(
    caption = "Personal Price Change Rate vs BLS Food at Home CPI",
    col.names = c("Year", "Basket Rate (%)", "Items",
                  "CPI Rate (%)", "Period", "Gap (pp)")
  )
Personal Price Change Rate vs BLS Food at Home CPI
Year Basket Rate (%) Items CPI Rate (%) Period Gap (pp)
2025 10.4 5 2.2 2024→2025 (full year) 8.2
2026 -0.8 5 1.6 2025→2026 (Jan-May/Jan-Mar) -2.4

Personal prices rose 10.4% from 2024 to 2025, while official CPI increased only 2.2%—a gap of 8.2 percentage points. This shows individual consumers at premium retailers experienced substantially different inflation than the national average.

Persona “Inflation”

basket_trimmed <- basket

cpi_trimmed <- cpi %>%
  filter(!(year == 2026 & month_num > 3))

annual_avg <- basket_trimmed %>%
  group_by(item_standard, year) %>%
  summarise(avg_price = mean(unit_price, na.rm = TRUE), .groups = "drop")

item_inflation <- annual_avg %>%
  group_by(item_standard) %>%
  arrange(year) %>%
  mutate(
    inflation = round((avg_price - lag(avg_price)) / lag(avg_price) * 100, 1)
  ) %>%
  filter(!is.na(inflation))

cat("Per item price change rates:\n")
Per item price change rates:
print(item_inflation)
# A tibble: 10 × 4
# Groups:   item_standard [5]
   item_standard                  year avg_price inflation
   <chr>                         <dbl>     <dbl>     <dbl>
 1 Avocado (4 Count)              2025     4.54       13.8
 2 Banana                         2025     0.601       1.9
 3 Malk Almond Milk               2025     6.69        1.5
 4 Satur Farms Wild Arugula       2025     3.63       17.7
 5 Vital Farms Organic Eggs 18pk  2025    15.7        16.9
 6 Avocado (4 Count)              2026     4.03      -11.4
 7 Banana                         2026     0.604       0.5
 8 Malk Almond Milk               2026     6.67       -0.3
 9 Satur Farms Wild Arugula       2026     3.99       10  
10 Vital Farms Organic Eggs 18pk  2026    15.3        -2.8
basket_rate <- item_inflation %>%
  group_by(year) %>%
  summarise(
    basket_rate = round(mean(inflation, na.rm = TRUE), 1),
    n_items     = n(),
    .groups     = "drop"
  )

cpi_rate <- cpi_trimmed %>%
  group_by(year) %>%
  summarise(avg_cpi = mean(cpi, na.rm = TRUE), .groups = "drop") %>%
  arrange(year) %>%
  mutate(
    cpi_rate = round((avg_cpi - lag(avg_cpi)) / lag(avg_cpi) * 100, 1)
  ) %>%
  filter(!is.na(cpi_rate))

comparison <- basket_rate %>%
  left_join(cpi_rate %>% select(year, cpi_rate), by = "year") %>%
  mutate(
    period = case_when(
      year == 2025 ~ "2024→2025\n(full year)",
      year == 2026 ~ "2025→2026\n(Jan-May basket/Jan-Mar CPI)"
    ),
    gap = round(basket_rate - cpi_rate, 1)
  )

cat("\nComparison table:\n")

Comparison table:
comparison %>%
  knitr::kable(
    caption = "Personal Price Change Rate vs BLS Food at Home CPI",
    col.names = c("Year", "Basket Rate (%)", "Items", 
                  "CPI Rate (%)", "Gap (pp)", "Period")
  )
Personal Price Change Rate vs BLS Food at Home CPI
Year Basket Rate (%) Items CPI Rate (%) Gap (pp) Period
2025 10.4 5 2.2 2024→2025
(full year) 8.2
2026 -0.8 5 1.6 2025→2026
(Jan-May basket/Jan-Mar CPI) -2.4
comparison %>%
  select(period, basket_rate, cpi_rate) %>%
  pivot_longer(cols = c(basket_rate, cpi_rate),
               names_to = "source", values_to = "rate") %>%
  mutate(source = ifelse(source == "basket_rate",
                         "Personal Basket", "BLS Food at Home CPI")) %>%
  ggplot(aes(x = period, y = rate, fill = source)) +
  geom_col(position = "dodge", width = 0.6) +
  geom_hline(yintercept = 0, color = "grey40") +
  geom_text(aes(label = paste0(rate, "%")),
            position = position_dodge(width = 0.6),
            vjust = -0.2, size = 4, fontface = "bold") +
  scale_fill_manual(values = c(
    "Personal Basket"      = "#E76F51",
    "BLS Food at Home CPI" = "#A8DADC"
  )) +
  labs(
    title    = "Personal Price Change Rate vs Official Food CPI",
    subtitle = "Equal-weighted 5-item basket · Whole Foods Manhattan",
    x        = NULL,
    y        = "Price Change Rate (%)",
    fill     = NULL,
    caption  = "Source: Personal grocery receipts & BLS CPI series CUUR0000SAF11\n*2026 data covers January–March only for both sources"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position    = "top",
    plot.title         = element_text(face = "bold", size = 16),
    plot.subtitle      = element_text(color = "grey40", size = 11),
    plot.caption       = element_text(color = "grey50", size = 9),
    axis.text.x        = element_text(size = 11),
    panel.grid.minor   = element_blank(),
    panel.grid.major.x = element_blank(),
    plot.background    = element_rect(fill = "white", color = NA)
  )

Regression analysis

# Monthly CPI
cpi_monthly <- cpi %>%
  select(year, month_num, cpi)

# Monthly average basket price
basket_monthly <- basket %>%
  group_by(year, month_num) %>%
  summarise(avg_basket_price = mean(unit_price, na.rm = TRUE), .groups = "drop")

# Join by year and month
regression_data <- basket_monthly %>%
  left_join(cpi_monthly, by = c("year", "month_num")) %>%
  filter(!is.na(cpi), !is.na(avg_basket_price))

cat("Data points for regression:", nrow(regression_data), "\n")
Data points for regression: 25 
# Linear regression
model <- lm(avg_basket_price ~ cpi, data = regression_data)
summary(model)

Call:
lm(formula = avg_basket_price ~ cpi, data = regression_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.9559 -2.1510  0.2763  2.3696  6.4657 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) -95.6571    51.8037  -1.847   0.0777 .
cpi           0.3318     0.1666   1.991   0.0585 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.713 on 23 degrees of freedom
Multiple R-squared:  0.147, Adjusted R-squared:  0.1099 
F-statistic: 3.964 on 1 and 23 DF,  p-value: 0.05851

The regression shows a weak relationship between BLS CPI and personal basket prices (R² = 0.147). While the trend is positive, CPI explains only 14.7% of the variance in monthly basket prices. This weak correlation supports our conclusion that national inflation indices may not capture individual consumer experiences.

Regression Chart

regression_data %>%
  ggplot(aes(x = cpi, y = avg_basket_price)) +
  geom_point(size = 3, color = "#E76F51", alpha = 0.8) +
  geom_smooth(method = "lm", color = "#2D6A4F", se = TRUE) +
  labs(
    title    = "Personal Basket Price vs BLS Food at Home CPI",
    subtitle = "Linear regression — monthly observations 2024–2026",
    x        = "BLS Food at Home CPI Index",
    y        = "Avg Monthly Basket Price ($)",
    caption  = "R² = 0.147, p = 0.059 · Source: Personal receipts & BLS CPI"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title       = element_text(face = "bold", size = 16),
    plot.subtitle    = element_text(color = "grey40", size = 11),
    plot.caption     = element_text(color = "grey50", size = 9),
    panel.grid.minor = element_blank(),
    plot.background  = element_rect(fill = "white", color = NA)
  )
`geom_smooth()` using formula = 'y ~ x'

Step 9: Sub analysis: Did Amazon Prime savings at Whole Foods become more or less generous over time?

prime_savings %>%
  select(date, item, price, savings_amount) %>%
  arrange(date) %>%
  head(20)
# A tibble: 20 × 4
   date       item               price savings_amount
   <date>     <chr>              <dbl>          <dbl>
 1 2024-01-05 Savings with Prime -2              2   
 2 2024-01-17 Prime Extra 10.00   0.57           0.57
 3 2024-02-08 Savings with Prime -2              2   
 4 2024-02-14 Prime Extra 10.00  -0.78           0.78
 5 2024-02-22 Savings with Prime -4.93           4.93
 6 2024-03-16 Savings with Prime -4              4   
 7 2024-04-14 Savings with Prime -1              1   
 8 2024-05-10 Savings with Prime -5.14           5.14
 9 2024-06-10 Prime Extra        -1.49           1.49
10 2024-06-30 Savings with Prime -4              4   
11 2024-07-03 Savings with Prime -5.14           5.14
12 2024-07-17 Savings with Prime -2              2   
13 2024-07-27 Savings with Prime -2.5            2.5 
14 2024-08-10 Savings with Prime -1.5            1.5 
15 2024-08-19 Savings with Prime -1.06           1.06
16 2024-09-14 Savings with Prime -0.92           0.92
17 2024-09-21 Prime Extra        -2              2   
18 2024-10-05 Savings with Prime -2.5            2.5 
19 2024-10-28 Savings with Prime -3.3            3.3 
20 2024-11-02 Savings with Prime -1.03           1.03
# Get trip totals from master
trip_totals <- master %>%
  filter(store == "Whole Foods Manhattan") %>%
  filter(!is.na(date)) %>%
  group_by(date) %>%
  summarise(
    trip_total = sum(price[price > 0], na.rm = TRUE),
    .groups = "drop"
  )

# Join savings to trips
prime_analysis <- prime_savings %>%
  filter(str_detect(tolower(item), "saving")) %>%
  filter(!is.na(date), !is.na(price)) %>%
  mutate(
    year           = year(date),
    savings_amount = abs(price)
  ) %>%
  left_join(trip_totals, by = "date") %>%
  mutate(savings_pct = round(savings_amount / trip_total * 100, 1))

prime_annual <- prime_analysis %>%
  group_by(year) %>%
  summarise(
    avg_savings_amt = round(mean(savings_amount, na.rm = TRUE), 2),
    avg_savings_pct = round(mean(savings_pct, na.rm = TRUE), 1),
    total_saved     = round(sum(savings_amount, na.rm = TRUE), 2),
    n_trips         = n(),
    .groups = "drop"
  ) %>%
  filter(!is.na(year))


# Annual summary
prime_analysis <- prime_savings %>%
  filter(str_detect(tolower(item), "saving")) %>%
  filter(!is.na(date)) %>%
  mutate(
    year           = year(date),
    savings_amount = abs(price)
  ) %>%
  left_join(trip_totals, by = "date") %>%
  mutate(savings_pct = round(savings_amount / trip_total * 100, 1))

print(prime_annual)
# A tibble: 3 × 5
   year avg_savings_amt avg_savings_pct total_saved n_trips
  <dbl>           <dbl>           <dbl>       <dbl>   <int>
1  2024            3.2              6.6        64.0      20
2  2025            2.67             7.4       117.       44
3  2026            1.95             6.5        31.3      16

Amazon Prime membership discounts declined in absolute terms ($3.20 average per trip in 2024 vs. $1.95 in 2026) but remained stable as a percentage of spending (6.5-7.4%). However, 2026 data covers only January through May, so year-over-year comparisons for that period are preliminary. Prime savings did not offset the rising prices documented in the main analysis.

Prime Chart

# Dollar savings chart
p1 <- prime_annual %>%
  ggplot(aes(x = factor(year), y = avg_savings_amt, fill = factor(year))) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0("$", avg_savings_amt)),
            vjust = 1, fontface = "bold", size = 4) +
  scale_fill_manual(values = c("2024" = "#E76F51", 
                                "2025" = "#A8DADC", 
                                "2026" = "#52B788")) +
  labs(title = "Avg Prime Savings per Trip ($)",
       x = NULL, y = "Avg Savings ($)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

# Savings % chart
p2 <- prime_annual %>%
  ggplot(aes(x = factor(year), y = avg_savings_pct, fill = factor(year))) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(avg_savings_pct, "%")),
            vjust = 1, fontface = "bold", size = 4) +
  scale_fill_manual(values = c("2024" = "#E76F51", 
                                "2025" = "#A8DADC", 
                                "2026" = "#52B788")) +
  labs(title = "Avg Prime Savings as % of Trip",
       x = NULL, y = "Savings (%)") +
  theme_minimal(base_size = 12) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))

library(patchwork)
p1 + p2 +
  plot_annotation(
    title    = "Amazon Prime Savings at Whole Foods — 2024 to 2026",
    subtitle = "Left: average dollar savings per trip · Right: savings as % of trip total",
    caption  = "Source: Personal grocery receipts · Whole Foods Manhattan",
    theme    = theme(
      plot.title    = element_text(face = "bold", size = 16),
      plot.subtitle = element_text(color = "grey40", size = 11),
      plot.caption  = element_text(color = "grey50", size = 9)
    )
  )

Conclusions

This analysis set out to answer one question: does official food inflation reflect what real consumers actually pay? Based on three years of personal grocery receipts from Whole Foods Manhattan, the answer is: not always, and not proportionally.

Between 2024 and 2025, the personal basket prices rose by 10.4% nearly five times the official BLS Food at Home CPI increase of 2.2% for the same period. This gap was largely driven by the Vital Farms organic egg price spike, which reflected the well-documented 2024-2025 avian flu outbreak, and a significant rise in avocado prices. Official CPI, which averages across thousands of products and millions of consumers nationwide, smoothed out the volatility that individual shoppers actually experienced.

The linear regression between monthly basket prices and CPI (R² = 0.147, p = 0.059) confirmed that CPI is a weak and statistically marginal predictor of personal grocery prices, further supporting the conclusion that aggregate inflation metrics may underrepresent the experience of individual consumers particularly those shopping at premium stores.

The Amazon Prime savings sub-analysis revealed that while dollar savings per trip declined from $3.20 in 2024 to $1.95 in 2026, savings as a percentage of trip total remained relatively stable at 6.5-7.4%, suggesting that Prime discounts did not meaningfully offset rising prices.

Limitations

This analysis has several important limitations that should be considered when interpreting the findings. First, the dataset reflects a single household’s shopping behavior at one urban chain grocery store and cannot be generalized to broader consumer populations. Second, OCR extraction from receipt photographs introduced systematic noise in item names, requiring manual synonym mapping that may have introduced subjectivity in item classification. Third, the personal basket consists of only 5 items, which is far narrower than the hundreds of items tracked by the BLS CPI. Fourth, the 2026 data covers only January through May, making year-over-year comparisons for that period preliminary. Fifth, the equal-weighting methodology applied to basket items does not reflect actual household expenditure patterns, unlike the expenditure-weighted approach used by the BLS. Finally, with only 25 monthly observations, the regression analysis has limited statistical power. However, these limitations are inherent to any consumer-level longitudinal study conducted outside of a controlled research setting. The self-collected nature of this dataset is also its strength it captures real purchasing behavior with no sampling bias, providing an honest and unfiltered view of price changes as experienced by an actual consumer over time.