Overview

Summary

This dataset contains the nutrition info for Starbucks menu items.

The goal of this assessment is to analyze the beverages offered at Starbucks and make health-conscious decisions based on caffeine, sugar, carbohydrate, and calorie content.

Details of Starbucks data:

  • The ‘Beverage_category’ column classifies the type of beverage, such as coffee, tea, or smoothie. The ‘Beverage’ column provides the specific name of the drink, for instance, Caramel Macchiato or Green Tea Latte.

  • The ‘Beverage_prep’ column details the preparation method of the beverage, including whether it’s served hot or cold, and any additional ingredients or toppings like whipped cream or syrup. The ‘Calories’ column lists the total caloric content of each beverage, providing insight into the energy provided by each drink.

  • The next three columns, ‘Total Fat (g)’, ‘Trans Fat (g)’, and ‘Saturated Fat (g)’, provide a breakdown of the fat content in each beverage. These columns are crucial for those monitoring their fat intake for health or dietary reasons. The ‘Sodium (mg)’ column indicates the amount of sodium in each beverage, which is essential information for individuals on low-sodium diets.

  • The ‘Total Carbohydrates (g)’ column provides the total carbohydrate content, including sugars, which is particularly useful for people managing diabetes or following a low-carb diet. Lastly, the ‘Cholesterol (mg)’ column lists the amount of cholesterol in each beverage, a critical factor for those monitoring their cholesterol levels.

Data/Packages

  • ‘Tidyverse’ for preparing, wrangling and visualizing data
  • ‘Plotly’ for interactive charts
  • ‘DT’ for data table used in ‘overview’ section

# Packages
library(tidyverse, warn.conflicts = FALSE, quietly = TRUE)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'stringr' was built under R version 4.3.2
## Warning: package 'forcats' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly, warn.conflicts = FALSE, quietly = TRUE)
## Warning: package 'plotly' was built under R version 4.3.2
library(DT, warn.conflicts = FALSE, quietly = TRUE)
## Warning: package 'DT' was built under R version 4.3.2
# Read data
# Keep the path in its own variable so the data frame name is not reused for a string
starbucks_path <- "starbucks.csv"
if (file.exists(starbucks_path)) {
  print("File exists.")
} else {
  print("File not found.")
}
## [1] "File exists."
starbucks <- read.csv(starbucks_path)

Data cleaning

I noticed that the variable names in the Starbucks file I brought into R might be problematic, because the original headers contain parentheses, spaces, and percent signs that R turns into runs of dots (for example, ‘Total.Fat..g.’). I decided to rename all of the columns for ease of use. I then used filters and subsets to create data frames containing only the variables I wanted to analyze, without the extra columns provided in the file.
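
The dotted names used in the rename step below come from `read.csv()`, which by default (`check.names = TRUE`) passes column headers through `make.names()`. A small sketch of that conversion, assuming the CSV headers are spelled as in the overview (for example ‘Total Fat (g)’):

```r
# Headers as described in the overview (assumed to match the CSV exactly)
raw_headers <- c("Total Fat (g)", "Sodium (mg)", "Caffeine (mg)")

# make.names() replaces every character that is invalid in an R name
# (spaces, parentheses, percent signs) with a dot
syntactic <- make.names(raw_headers)
print(syntactic)
```

Passing `check.names = FALSE` to `read.csv()` would keep the original headers, at the cost of needing backticks around every column name.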

# Data cleansing
starbucks <- starbucks %>%
  rename(
    beverage_type = `Beverage`,
    beverage_prep = `Beverage_prep`,
    calories = `Calories`,
    total_fat = `Total.Fat..g.`,
    trans_fat = `Trans.Fat..g.`,
    saturated_fat = `Saturated.Fat..g.`,
    sodium = `Sodium..mg.`,
    total_carbs = `Total.Carbohydrates..g.`,
    cholesterol = `Cholesterol..mg.`,
    dietary_fiber = `Dietary.Fibre..g.`,
    sugars = `Sugars..g.`,
    protein = `Protein..g.`,
    vitamin_a = `Vitamin.A....DV.`,
    vitamin_c = `Vitamin.C....DV.`,
    calcium = `Calcium....DV.`,
    iron = `Iron....DV.`,
    caffeine = `Caffeine..mg.`
  )

# Remove rows where caffeine is recorded as 'varies' instead of a number
starbucks <- starbucks %>%
  filter(!caffeine %in% c('varies', 'Varies'))

# Further cleaning to only read in numeric values
starbucks$caffeine <- as.numeric(as.character(starbucks$caffeine))

# Count the unique drinks in each category, ignoring drink size and preparation
unique_drinks_df <- starbucks %>%
  group_by(Beverage_category) %>%
  summarise(unique_beverage_types = n_distinct(beverage_type))

coffee_data <- starbucks %>%
  filter(Beverage_category == 'Coffee') %>%
  select(calories, caffeine, sugars, total_carbs)

classic_espresso_data <- starbucks %>%
  filter(Beverage_category == 'Classic Espresso Drinks') %>%
  select(calories, caffeine, sugars, total_carbs)

signature_espresso_data <- starbucks %>%
  filter(Beverage_category == 'Signature Espresso Drinks') %>%
  select(calories, caffeine, sugars, total_carbs)

tazo_data <- starbucks %>%
  filter(Beverage_category == 'Tazo® Tea Drinks') %>%
  select(calories, caffeine, sugars, total_carbs)

shaken_iced_data <- starbucks %>%
  filter(Beverage_category == 'Shaken Iced Beverages') %>%
  select(calories, caffeine, sugars, total_carbs)

smoothies_data <- starbucks %>%
  filter(Beverage_category == 'Smoothies') %>%
  select(calories, caffeine, sugars, total_carbs)

frap_blended_data <- starbucks %>%
  filter(Beverage_category == 'Frappuccino® Blended Coffee') %>%
  select(calories, caffeine, sugars, total_carbs)

frap_light_data <- starbucks %>%
  filter(Beverage_category == 'Frappuccino® Light Blended Coffee') %>%
  select(calories, caffeine, sugars, total_carbs)

frap_creme_data <- starbucks %>%
  filter(Beverage_category == 'Frappuccino® Blended Crème') %>%
  select(calories, caffeine, sugars, total_carbs)

# Subset needed columns
starbucks_subset <- starbucks[ , c("Beverage_category", "beverage_type","calories", "caffeine", "sugars", "total_carbs")]
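
The nine per-category data frames above repeat the same filter-and-select step. As an alternative sketch (not the approach used in this report), base R's `split()` can build them all at once as a named list. The toy data frame and its values below are invented for illustration only.

```r
# Toy stand-in for the cleaned `starbucks` data frame (values are invented)
toy <- data.frame(
  Beverage_category = c("Coffee", "Coffee", "Smoothies"),
  calories    = c(5, 10, 280),
  caffeine    = c(175, 260, 0),
  sugars      = c(0, 0, 34),
  total_carbs = c(0, 0, 53)
)

# Keep only the four columns of interest, then split the rows by category
keep <- c("calories", "caffeine", "sugars", "total_carbs")
by_category <- split(toy[, keep], toy$Beverage_category)

# Each list element is a data frame holding one category
print(names(by_category))
print(by_category[["Coffee"]])
```

A single category can then be pulled out by name, for example `by_category[["Smoothies"]]`, so no separate variable per category is needed.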

Table

I reused a version of the data table we worked on last class and added a few modifications/arguments. Additions include hover highlighting, cell borders, and removal of row names.

coffee_table <- datatable(starbucks_subset, rownames = FALSE, filter = 'top', class = 'hover cell-border stripe', options = list(
  pageLength = 10, autoWidth = TRUE, columnDefs = list(list(className = 'dt-center', targets = 0:3))
))

coffee_table

Visualizations

The error bars in the graphs I created show the 95% confidence interval around each category's mean, as computed by mean_cl_normal. They give a general idea of how precisely each mean is estimated, or conversely, how far from the reported mean the true mean might be.
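
For reference, `mean_cl_normal` (used in the plots in this section) returns the mean with t-based confidence limits, 95% by default. A minimal sketch of the same calculation in base R, using invented numbers in place of the real data:

```r
# Invented sample of caffeine values (mg) for one hypothetical category
x <- c(75, 150, 150, 225, 300)

n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)          # standard error of the mean
tq <- qt(0.975, df = n - 1)    # two-sided 95% t quantile

# Lower limit, mean, and upper limit
ci <- c(ymin = m - tq * se, y = m, ymax = m + tq * se)
print(ci)

# Cross-check against t.test(), which computes the same interval
print(t.test(x)$conf.int)
```

Categories with few drinks or widely spread values get a wider interval, which is why some error bars in the plots are much longer than others.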

Unique drinks

ggplot(unique_drinks_df, aes(x = unique_beverage_types, y = Beverage_category)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of Unique Drinks in Each Category",
       x = "Count of Unique Beverages",
       y = "Beverage Category") +
  theme_minimal()

Caffeine Plot

caffeine_CI_plot <- starbucks %>%
  ggplot(aes(x = caffeine, y = Beverage_category)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  geom_errorbar(
    stat = "summary",
    fun.data = mean_cl_normal,
    position = position_dodge(width = 0.9),
    width = 0.2,
    linewidth = 1,
    color = "red"
  ) +
  labs(
    title = "Average Caffeine Content in Each Category",
    subtitle = "(Red error bars show the 95% confidence interval of the mean)",
    x = "Average Caffeine (mg)",
    y = "Beverage Category"
  ) +
  theme_minimal()
caffeine_CI_plot
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_summary()`).
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_summary()`).
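
The ‘Removed 1 row’ warnings above indicate that one caffeine value is `NA` after conversion to numeric. One possible cause, which is an assumption and has not been checked against the CSV, is an entry that is blank or spelled differently from ‘varies’/‘Varies’. The sketch below uses invented values to show how such an entry passes the filter and how to locate it:

```r
# Invented caffeine column as it might look when read as character
caffeine_raw <- c("75", "varies", "Varies", "", "150")

# Same filter as in the cleaning step: drops only the two exact spellings
kept <- caffeine_raw[!caffeine_raw %in% c("varies", "Varies")]

# Blank or otherwise non-numeric strings become NA on conversion
caffeine_num <- suppressWarnings(as.numeric(kept))

print(kept[is.na(caffeine_num)])   # entries that failed to convert
print(sum(is.na(caffeine_num)))    # number of rows ggplot2 would drop
```

Running the same two `print()` checks on the real column would show which entry is responsible.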

Sugar Plot

sugar_CI_plot <- starbucks %>%
  ggplot(aes(x = sugars, y = Beverage_category)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  geom_errorbar(
    stat = "summary",
    fun.data = mean_cl_normal,
    position = position_dodge(width = 0.9),
    width = 0.2,
    linewidth = 1,
    color = "red"
  ) +
  labs(
    title = "Average Sugar Content in Each Category",
    subtitle = "(Red error bars show the 95% confidence interval of the mean)",
    x = "Average Sugar (g)",
    y = "Beverage Category"
  ) +
  theme_minimal()

sugar_CI_plot

Carb Plot

carb_CI_plot <- starbucks %>%
  ggplot(aes(x = total_carbs, y = Beverage_category)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  geom_errorbar(
    stat = "summary",
    fun.data = mean_cl_normal,
    position = position_dodge(width = 0.9),
    width = 0.2,
    linewidth = 1,
    color = "red"
  ) +
  labs(
    title = "Average Carb Content in Each Category",
    subtitle = "(Red error bars show the 95% confidence interval of the mean)",
    x = "Average Total Carbohydrates (g)",
    y = "Beverage Category"
  ) +
  theme_minimal()

carb_CI_plot

Calorie Plot

calorie_CI_plot <- starbucks %>%
  ggplot(aes(x = calories, y = Beverage_category)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  geom_errorbar(
    stat = "summary",
    fun.data = mean_cl_normal,
    position = position_dodge(width = 0.9),
    width = 0.2,
    linewidth = 1,
    color = "red"
  ) +
  labs(
    title = "Average Calorie Content in Each Category with Confidence Intervals",
    subtitle = "(Red error bars show the 95% confidence interval of the mean)",
    x = "Average Calories",
    y = "Beverage Category"
  ) +
  theme_minimal()

calorie_CI_plot

Static plot for Caffeine by Beverage Category

caffeine_plotly <- starbucks %>%
  ggplot(aes(x=caffeine, y=Beverage_category, color=Beverage_category)) +
  geom_point() +
  labs(title = "Caffeine content in each drink", 
       subtitle = "(Some offerings do not have caffeine)",
       x = "Caffeine (mg)", y = "Beverage Category",
       color = "Beverage Category",
       caption = "Static version") +
  scale_color_discrete(name="Beverage Category") +
  theme_bw()

caffeine_plotly
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
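
The packages list mentions Plotly for interactive charts, while the plot above is static. As a sketch of how a ggplot object can be made interactive, `plotly::ggplotly()` converts it into an HTML widget. The toy data below is invented, and whether the original report used this exact call is an assumption.

```r
library(ggplot2)
library(plotly)

# Invented stand-in for the cleaned data
toy <- data.frame(
  Beverage_category = c("Coffee", "Coffee", "Smoothies"),
  caffeine = c(175, 260, 0)
)

static_plot <- ggplot(toy, aes(x = caffeine, y = Beverage_category,
                               color = Beverage_category)) +
  geom_point() +
  labs(x = "Caffeine (mg)", y = "Beverage Category") +
  theme_bw()

# ggplotly() wraps the ggplot in an interactive widget with hover tooltips
interactive_plot <- ggplotly(static_plot)
interactive_plot
```

In a knitted HTML report the widget renders in place of the static image, and hovering over a point shows its caffeine value and category.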

Conclusion

My intention was to find the most caffeinated drink without the extra sugar, carbs, and calories. Analyzing and visualizing this data made it apparent that, of all the options offered at Starbucks, I am better off sticking with the regular coffee on their menu rather than any of their other offerings.
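
The comparison behind this conclusion can be summarised in a single table of per-category means. A minimal sketch with dplyr, using invented values and hypothetical category labels ("A", "B") in place of the real data:

```r
library(dplyr)

# Invented rows; the real analysis would use the cleaned `starbucks` data
toy <- data.frame(
  Beverage_category = c("A", "A", "B", "B"),
  calories    = c(5, 5, 200, 300),
  caffeine    = c(200, 300, 100, 100),
  sugars      = c(0, 0, 30, 50),
  total_carbs = c(0, 0, 35, 55)
)

category_means <- toy %>%
  group_by(Beverage_category) %>%
  summarise(
    across(c(calories, caffeine, sugars, total_carbs),
           ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  ) %>%
  arrange(desc(caffeine), sugars)   # most caffeine first, ties broken by least sugar

print(category_means)
```

Sorting by caffeine and then by sugar puts the categories that best match the stated goal at the top of the table.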

The End