This project looks at grocery prices in my own neighborhood, Brownsville, Brooklyn. I focused on two local supermarkets, C-Town Supermarket (Brownsville) and Fine Fare Supermarket (Brownsville), and recorded shelf prices for 17 core items that represent a typical household grocery basket. My original plan was to compare Brownsville prices to a store in Lower Manhattan, but the amount of manual data collection and cleaning required made that too much to complete this semester. This analysis should therefore be viewed as Phase 1: it covers only Brownsville, but it builds a framework that can be extended to Lower Manhattan in the future.
To collect the data, I visited both supermarkets in person and photographed the shelf labels for each of the 17 core grocery items. From these photos, I manually recorded the item name, package size, and shelf price into a single dataset. When an item was on a multi-buy promotion (for example, “2 for $6”), I converted the deal into a single-item price (so “2 for $6” becomes $3.00 each). For loose produce sold by weight (bananas, onions, and apples), I used the posted price per pound. I also recorded when an item was unavailable in a store (for example, tofu at C-Town, or tater tots at either store) and left the price as missing for that store. All of this information was entered into a tibble in R, which I then used for the summary tables and visualizations.
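The multi-buy conversion is a simple division, but scripting it avoids arithmetic slips when many tags are transcribed from photos. A minimal sketch in R, assuming each promotion tag is typed in the “N for $X” form; `promo_unit_price()` is a hypothetical helper, and rounding to the nearest cent is an assumption rather than part of the original workflow.

```r
# Hypothetical helper: turn a multi-buy shelf tag such as "2 for $6" into a
# single-item price. Assumes tags are written as "<count> for $<total>";
# anything else (e.g. a plain "$4.99") returns NA so it can be checked by hand.
promo_unit_price <- function(tag) {
  pattern <- "^ *([0-9]+) *for *\\$([0-9]+(\\.[0-9]+)?) *$"
  parts <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))
  vapply(parts, function(p) {
    if (length(p) == 0) return(NA_real_)  # tag did not match the pattern
    # p[2] is the item count, p[3] the total price; round to cents (assumption)
    round(as.numeric(p[3]) / as.numeric(p[2]), 2)
  }, numeric(1))
}

promo_unit_price(c("2 for $6", "3 for $5", "$4.99"))  # returns 3, 1.67, and NA
```

Tags that do not fit the pattern come back as NA, so they can be checked against the photo instead of being silently misread.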
library(tibble)   # tribble()
library(dplyr)    # %>%, group_by(), summarise()
library(ggplot2)  # ggplot(), geom_col()
# One row per store and item; NA_real_ marks an item not available at that store
core_items <- tribble(
  ~store, ~item, ~category, ~size, ~unit, ~price,
  "C-Town Supermarket (Brownsville)", "100% Milk", "Dairy", "1 gallon", "gallon", 5.49,
  "Fine Fare Supermarket (Brownsville)", "Whole Milk", "Dairy", "1 gallon", "gallon", 4.99,
  "C-Town Supermarket (Brownsville)", "Cream cheese", "Dairy", "8 oz", "ounce", 6.69,
  "Fine Fare Supermarket (Brownsville)", "Cream cheese", "Dairy", "8 oz", "ounce", 6.59,
  "C-Town Supermarket (Brownsville)", "Sharp Cheddar cheese", "Dairy", "8 oz", "ounce", 3.49,
  "Fine Fare Supermarket (Brownsville)", "Sharp Cheddar cheese", "Dairy", "8 oz", "ounce", 3.49,
  "C-Town Supermarket (Brownsville)", "Half & half", "Dairy", "1 quart", "quart", NA_real_,
  "Fine Fare Supermarket (Brownsville)", "Half & half", "Dairy", "1 quart", "quart", 4.49,
  "C-Town Supermarket (Brownsville)", "Large Eggs", "Protein", "12 eggs", "dozen", 3.49,
  "Fine Fare Supermarket (Brownsville)", "Large Eggs", "Protein", "12 eggs", "dozen", 3.99,
  "C-Town Supermarket (Brownsville)", "Chicken breast (skinless, bone-in)", "Protein", "1 lb", "pound", 2.99,
  "Fine Fare Supermarket (Brownsville)", "Chicken breast cutlets", "Protein", "1 lb", "pound", 4.99,
  "C-Town Supermarket (Brownsville)", "Ground Chuck", "Protein", "1 lb", "pound", 7.59,
  "Fine Fare Supermarket (Brownsville)", "Ground beef", "Protein", "1 lb", "pound", 7.19,
  "C-Town Supermarket (Brownsville)", "Tofu", "Protein", "14-16 oz", "ounce", NA_real_,
  "Fine Fare Supermarket (Brownsville)", "Extra firm Tofu", "Protein", "14-16 oz", "ounce", 4.99,
  "C-Town Supermarket (Brownsville)", "Barilla Fettuccine Pasta", "Dry goods", "16 oz", "ounce", 2.49,
  "Fine Fare Supermarket (Brownsville)", "Ronzoni Fettuccine Pasta", "Dry goods", "16 oz", "ounce", 2.69,
  "C-Town Supermarket (Brownsville)", "Golden Canilla Parboiled Rice", "Dry goods", "5 lb bag", "pound", 8.59,
  "Fine Fare Supermarket (Brownsville)", "Carolina gold Parboiled Rice", "Dry goods", "5 lb bag", "pound", 7.49,
  "C-Town Supermarket (Brownsville)", "Cheerios Cereal", "Dry goods", "box", "unit", 8.69,
  "Fine Fare Supermarket (Brownsville)", "Cheerios Cereal", "Dry goods", "box", "unit", 9.39,
  "C-Town Supermarket (Brownsville)", "Split Top Wheat Bread", "Dry goods", "loaf", "loaf", 2.99,
  "Fine Fare Supermarket (Brownsville)", "Split Top Wheat Bread", "Dry goods", "loaf", "loaf", 2.99,
  "C-Town Supermarket (Brownsville)", "Tater tots", "Frozen", "bag", "bag", NA_real_,
  "Fine Fare Supermarket (Brownsville)", "Tater tots", "Frozen", "bag", "bag", NA_real_,
  "C-Town Supermarket (Brownsville)", "Yellow Bananas", "Produce", "per lb", "pound", 0.89,
  "Fine Fare Supermarket (Brownsville)", "Yellow Bananas", "Produce", "per lb", "pound", 0.99,
  "C-Town Supermarket (Brownsville)", "Red Onions", "Produce", "per lb", "pound", 1.29,
  "Fine Fare Supermarket (Brownsville)", "Red Onions", "Produce", "per lb", "pound", 1.99,
  "C-Town Supermarket (Brownsville)", "Golden apples", "Produce", "per lb", "pound", 1.99,
  "Fine Fare Supermarket (Brownsville)", "Gala apples", "Produce", "per lb", "pound", 2.49,
  "C-Town Supermarket (Brownsville)", "Potatoes (Idaho/Russet)", "Produce", "5 lb bag", "pound", 3.00,
  "Fine Fare Supermarket (Brownsville)", "Potatoes (Idaho/Russet)", "Produce", "5 lb bag", "pound", 2.99
)
core_items
## # A tibble: 34 × 6
## store item category size unit price
## <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 C-Town Supermarket (Brownsville) 100% Milk Dairy 1 ga… gall… 5.49
## 2 Fine Fare Supermarket (Brownsville) Whole Milk Dairy 1 ga… gall… 4.99
## 3 C-Town Supermarket (Brownsville) Cream cheese Dairy 8 oz ounce 6.69
## 4 Fine Fare Supermarket (Brownsville) Cream cheese Dairy 8 oz ounce 6.59
## 5 C-Town Supermarket (Brownsville) Sharp Cheddar… Dairy 8 oz ounce 3.49
## 6 Fine Fare Supermarket (Brownsville) Sharp Cheddar… Dairy 8 oz ounce 3.49
## 7 C-Town Supermarket (Brownsville) Half & half Dairy 1 qu… quart NA
## 8 Fine Fare Supermarket (Brownsville) Half & half Dairy 1 qu… quart 4.49
## 9 C-Town Supermarket (Brownsville) Large Eggs Protein 12 e… dozen 3.49
## 10 Fine Fare Supermarket (Brownsville) Large Eggs Protein 12 e… dozen 3.99
## # ℹ 24 more rows
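The table and bar chart below summarize the shelf prices recorded at C-Town Supermarket and Fine Fare Supermarket in Brownsville. The summary table reports the price at each store and the difference between them (Fine Fare minus C-Town, so a positive value means Fine Fare is more expensive), while the bar chart places the two stores' prices side by side. Both match products by their exact `item` label, so only the 10 items named identically at both stores are paired. The seven items with different labels (for example, “100% Milk” at C-Town and “Whole Milk” at Fine Fare) each form a single-store group that `summarise()` drops, which is what triggers the deprecation warning, and they appear in the chart as separate single-bar labels. The chart also omits the four missing prices, as the `Removed 4 rows` warning notes.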
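# Summarize prices by item and show both stores + difference.
# Items are matched by their exact `item` label, so a product named differently
# at the two stores forms a one-store group that returns zero rows.
summary_table <- core_items %>%
  dplyr::group_by(item) %>%
  dplyr::summarise(
    ctown_price = price[store == "C-Town Supermarket (Brownsville)"],
    finefare_price = price[store == "Fine Fare Supermarket (Brownsville)"],
    difference = finefare_price - ctown_price  # positive = Fine Fare costs more
  )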
## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
## always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `summarise()` has grouped output by 'item'. You can override using the
## `.groups` argument.
summary_table
## # A tibble: 10 × 4
## # Groups: item [10]
## item ctown_price finefare_price difference
## <chr> <dbl> <dbl> <dbl>
## 1 Cheerios Cereal 8.69 9.39 0.700
## 2 Cream cheese 6.69 6.59 -0.100
## 3 Half & half NA 4.49 NA
## 4 Large Eggs 3.49 3.99 0.5
## 5 Potatoes (Idaho/Russet) 3 2.99 -0.01000
## 6 Red Onions 1.29 1.99 0.7
## 7 Sharp Cheddar cheese 3.49 3.49 0
## 8 Split Top Wheat Bread 2.99 2.99 0
## 9 Tater tots NA NA NA
## 10 Yellow Bananas 0.89 0.99 0.1
ggplot(core_items, aes(x = item, y = price, fill = store)) +
  geom_col(position = "dodge") +
  theme_minimal() +
  labs(
    title = "Shelf Price Comparison: C-Town vs Fine Fare (Brownsville)",
    x = "Item",
    y = "Shelf Price ($)",
    fill = "Store"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_col()`).
My original intent for this project was to compare grocery prices in Brownsville to prices at a store in Lower Manhattan, such as Whole Foods or Trader Joe’s, using the same 17-item basket. However, manually collecting and cleaning the Brownsville data took significantly more time than I expected. Rather than rush the Manhattan data or rely on online price estimates that might not match in-store prices, I decided to limit this analysis to the Brownsville phase only.
This means that the current project does not answer the question of how Brownsville compares to Lower Manhattan. Instead, it focuses on the price differences between two neighborhood supermarkets that local residents actually use. At the same time, the data structure and R code are now set up so that a Phase 2 project could add a Lower Manhattan store and directly extend this analysis in the future.
One of the biggest limitations was brand and product variability. Grocery stores don’t stock identical versions of every item, and many categories had multiple brands, sizes, and price points. Without a standardized rule (such as always choosing the lowest-priced option or always choosing a specific brand), the comparison can reflect differences in product assortment rather than actual price levels. For example, the C-Town chicken price is for skinless bone-in breast while the Fine Fare price is for cutlets, so part of the $2.00 gap between them likely reflects the cut rather than the store. I did not fully anticipate this factor, and it shows how complex real-world grocery price comparisons can be.
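A common way to separate package size from price is to compare unit prices. A minimal sketch in R, assuming sizes are written as “<number> oz” or “<number> lb”; `price_per_oz()` is a hypothetical helper, and ranges such as the tofu entry’s “14-16 oz” deliberately return NA because they would first need to be pinned to a single size.

```r
# Hypothetical helper: shelf price per ounce, so packages of different sizes
# can be compared. Handles sizes that start with "<number> oz" or "<number> lb"
# (trailing words such as "bag" are ignored); other formats, including ranges
# like "14-16 oz" or "per lb", return NA.
price_per_oz <- function(price, size) {
  pattern <- "^ *([0-9]+(\\.[0-9]+)?) *(oz|lb)"
  parts <- regmatches(size, regexec(pattern, size))
  ounces <- vapply(parts, function(p) {
    if (length(p) == 0) return(NA_real_)
    amount <- as.numeric(p[2])                  # numeric part of the size
    if (p[4] == "lb") amount * 16 else amount   # 1 lb = 16 oz
  }, numeric(1))
  price / ounces
}

# C-Town Barilla pasta, C-Town Golden Canilla rice, Fine Fare tofu
price_per_oz(c(2.49, 8.59, 4.99), c("16 oz", "5 lb bag", "14-16 oz"))
```

For the two C-Town prices this works out to roughly $0.16 and $0.11 per ounce. Liquid sizes such as gallons and quarts would need their own conversion.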
This project ended up being a good lesson in planning and scope. On paper, comparing Brownsville to Lower Manhattan sounded simple, but once I started taking photos, reading shelf labels, entering prices, and cleaning the data, I realized how much work was involved in just one neighborhood. I was disappointed at first that I could not fully deliver the Lower Manhattan comparison this semester, but I also recognize that this kind of adjustment happens in real data projects all the time.
I am still proud that I built a clean Brownsville dataset, defined a clear 17-item basket, and created working summary tables and visualizations in R. This Phase 1 version gives me a solid foundation, and if I ever want to extend the project, I can reuse the same structure to add a Manhattan store. Overall, the experience helped me practice not only technical skills, but also being honest and transparent about what I could realistically finish with the time and resources I had.
As I collected the prices, I realized the process was not as simple as I expected. Many items came in multiple brands, sizes, or variations, which made it difficult to decide which version represented the “true” price for comparison. For example, a store might carry five different types of pasta or three different brands of cereal, each at a different price. I quickly learned that without a strict rule for selecting brands or sizes, the dataset can become inconsistent. In future phases I would redesign this step, either by choosing a standardized brand list or by recording a price range for each item, from the lowest-priced option to the name-brand option. This experience showed me how much planning a project like this really requires.
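If every option on the shelf is recorded (one row per brand or size), both redesigns can be applied after collection. A minimal sketch in R, assuming a hypothetical long table with `store`, `basket_item`, and `price` columns; `price_range()` is a hypothetical helper. The `lowest` column implements an “always choose the cheapest option” rule, and `spread` shows how much the brand choice alone could shift a comparison.

```r
# Hypothetical helper: given every shelf option recorded for each store and
# basket item (one row per brand/size), return the lowest and highest price.
# The lowest price implements an "always pick the cheapest option" rule.
price_range <- function(shelf) {
  lo <- aggregate(price ~ store + basket_item, data = shelf, FUN = min)
  hi <- aggregate(price ~ store + basket_item, data = shelf, FUN = max)
  names(lo)[names(lo) == "price"] <- "lowest"
  names(hi)[names(hi) == "price"] <- "highest"
  out <- merge(lo, hi, by = c("store", "basket_item"))
  out$spread <- out$highest - out$lowest  # gap driven by brand/size choice
  out
}
```

The formula method of `aggregate()` drops missing prices by default, so an unavailable option does not hide the prices that were recorded; a store and item with no recorded price at all simply has no row.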