Final Project

2025-12-01

Starbucks Nutrition: A TidyTuesday Analysis

by: Mharkel Borja

Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Why This Dataset?

Research Questions

Dataset

Loading Data

starbucks <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-12-21/starbucks.csv"
)
## Rows: 1147 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): product_name, size, trans_fat_g, fiber_g
## dbl (11): milk, whip, serv_size_m_l, calories, total_fat_g, saturated_fat_g,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
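The column specification above shows that trans_fat_g and fiber_g were read as character rather than numeric. If those columns are needed for arithmetic or plotting, one possible conversion is sketched below. The `demo` tibble and its values are hypothetical stand-ins invented for illustration, not rows from the dataset; the same `mutate()` call could be applied to `starbucks`, assuming its values parse cleanly.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for a few rows of `starbucks`; values are invented.
demo <- tibble::tibble(
  product_name = c("drink a", "drink b", "drink c"),
  trans_fat_g  = c("0", "0.5", "1"),
  fiber_g      = c("0", "2", "1")
)

# Convert both character columns to numeric in one step.
# parse_number() extracts the first number in each string and returns NA
# (with a warning) when no number can be found.
demo_num <- demo %>%
  mutate(across(c(trans_fat_g, fiber_g), parse_number))

demo_num
```

Checking `sum(is.na(...))` on the converted columns afterwards is a quick way to see whether any values failed to parse.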

Highest Sugar Drinks

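starbucks %>%
  # product_name repeats across sizes and milk options, so keep each drink's
  # highest-sugar variant; otherwise geom_col() stacks bars that share a name
  group_by(product_name) %>%
  summarise(sugar_g = max(sugar_g), .groups = "drop") %>%
  slice_max(sugar_g, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(product_name, sugar_g),
             y = sugar_g)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Highest-Sugar Drinks",
       x = "Drink",
       y = "Sugar (g)")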

Calories by Drink Size

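starbucks %>%
  # size is a character column, so order the x-axis by median calories
  # instead of the default alphabetical order
  ggplot(aes(x = reorder(size, calories, FUN = median), y = calories)) +
  geom_jitter(width = 0.2, height = 0, color = "darkgreen", alpha = 0.6) +
  labs(title = "Calories by Drink Size",
       x = "Drink Size",
       y = "Calories (kcal)")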
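A jitter plot shows the spread, but a per-size table can make the comparison easier to read. The sketch below is one way to compute it. The `demo` tibble is a hypothetical stand-in with invented values; the same pipeline could be run on `starbucks`, assuming the `size` and `calories` columns used in the plot.

```r
library(dplyr)

# Hypothetical stand-in rows; the sizes and calorie values are invented.
demo <- tibble::tibble(
  size     = c("tall", "tall", "grande", "grande", "venti", "venti"),
  calories = c(100, 140, 180, 220, 250, 310)
)

# Count drinks and compute the median calories within each size,
# then sort from the highest median to the lowest.
size_summary <- demo %>%
  group_by(size) %>%
  summarise(
    n = n(),
    median_calories = median(calories),
    .groups = "drop"
  ) %>%
  arrange(desc(median_calories))

size_summary
```

The median is used here rather than the mean so that a few very high-calorie drinks do not dominate the summary for a size.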

Key Findings

Conclusion