Data 110 Final Project

Author

Shadeja Fuentes

Introduction

Starbucks was founded in 1971 in Seattle, Washington, by Jerry Baldwin, Zev Siegl, and Gordon Bowker. It began as a small store selling premium whole-bean coffee. Under the leadership of Howard Schultz, who joined the company in 1982, Starbucks was transformed into a global brand. Starbucks went public in 1992 and expanded rapidly, opening stores worldwide. Starbucks became an iconic destination for coffee lovers because of its inviting store design and customized drinks. Today, it continues to innovate and maintain its position as a leading global coffeehouse chain, offering an extensive range of beverages and products while shaping the coffee culture around the world.

First, create a setup chunk, then load the required libraries, and finally read in the CSV file.

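library(tidyverse)     # dplyr, ggplot2, readr (read_csv) and friends
library(RColorBrewer)  # ColorBrewer palettes such as "BrBG"
library(plotly)        # ggplotly() for interactive charts
library(DT)            # datatable() for the interactive preview
library(DataExplorer)  # plot_correlation()
starbucks_drink_menu <- read_csv("starbucks_drinkMenu_expanded.csv")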
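Because the report also calls ggthemes later through `ggthemes::`, it can help to confirm every package is installed before knitting. This is a minimal sketch; `missing_packages()` is a hypothetical helper, and it relies only on base R's `requireNamespace()`, which loads a namespace without attaching it and returns `FALSE` when loading fails.

```r
# Hypothetical helper: return the packages that cannot be loaded.
# requireNamespace() loads (but does not attach) each package and
# returns FALSE instead of stopping when a package is missing.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages used in this report (ggthemes is called via ggthemes::)
needed <- c("tidyverse", "RColorBrewer", "plotly", "DT",
            "DataExplorer", "ggthemes")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
}
```

Any names it reports can then be installed once with `install.packages()` outside the R Markdown file.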

As one of Starbucks' most loyal customers, I visit frequently and am curious about the nutrition facts of its drink menu. Below is a preview of the dataset.

# Interactive preview with a filter box above each column
starbucks <- datatable(starbucks_drink_menu, rownames = FALSE,
                       filter = "top", options = list(scrollX = TRUE))
starbucks

Summarize each column to check its type and range before cleaning.

summary(starbucks_drink_menu)
 Beverage_category    Beverage         Beverage_prep         Calories    
 Length:242         Length:242         Length:242         Min.   :  0.0  
 Class :character   Class :character   Class :character   1st Qu.:120.0  
 Mode  :character   Mode  :character   Mode  :character   Median :185.0  
                                                          Mean   :193.9  
                                                          3rd Qu.:260.0  
                                                          Max.   :510.0  
 Total Fat (g)      Trans Fat (g)   Saturated Fat (g)  Sodium (mg)    
 Length:242         Min.   :0.000   Min.   :0.0000    Min.   : 0.000  
 Class :character   1st Qu.:0.100   1st Qu.:0.0000    1st Qu.: 0.000  
 Mode  :character   Median :0.500   Median :0.0000    Median : 5.000  
                    Mean   :1.307   Mean   :0.0376    Mean   : 6.364  
                    3rd Qu.:2.000   3rd Qu.:0.1000    3rd Qu.:10.000  
                    Max.   :9.000   Max.   :0.3000    Max.   :40.000  
 Total Carbohydrates (g) Cholesterol (mg) Dietary Fibre (g)   Sugars (g)   
 Min.   :  0.0           Min.   : 0.00    Min.   :0.0000    Min.   : 0.00  
 1st Qu.: 70.0           1st Qu.:21.00    1st Qu.:0.0000    1st Qu.:18.00  
 Median :125.0           Median :34.00    Median :0.0000    Median :32.00  
 Mean   :128.9           Mean   :35.99    Mean   :0.8058    Mean   :32.96  
 3rd Qu.:170.0           3rd Qu.:50.75    3rd Qu.:1.0000    3rd Qu.:43.75  
 Max.   :340.0           Max.   :90.00    Max.   :8.0000    Max.   :84.00  
  Protein (g)     Vitamin A (% DV)   Vitamin C (% DV)   Calcium (% DV)    
 Min.   : 0.000   Length:242         Length:242         Length:242        
 1st Qu.: 3.000   Class :character   Class :character   Class :character  
 Median : 6.000   Mode  :character   Mode  :character   Mode  :character  
 Mean   : 6.979                                                           
 3rd Qu.:10.000                                                           
 Max.   :20.000                                                           
 Iron (% DV)        Caffeine (mg)     
 Length:242         Length:242        
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
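The summary shows several columns that hold numbers but were read as character: Total Fat (g), the four % DV columns, and Caffeine (mg). A small sketch like the one below (the helper `parse_share()` and the toy data are hypothetical) reports, for each character column, the share of values that would parse as numbers once a trailing `%` is removed. A share of 1 marks a number stored as text; a share below 1 points to entries that `as.numeric()` would turn into `NA`.

```r
# Hypothetical helper: for each character column, the fraction of
# non-missing values that parse as numbers after dropping a trailing "%".
parse_share <- function(df) {
  char_cols <- names(df)[vapply(df, is.character, logical(1))]
  vapply(df[char_cols], function(col) {
    vals <- trimws(sub("%$", "", col[!is.na(col)]))
    mean(!is.na(suppressWarnings(as.numeric(vals))))
  }, numeric(1))
}

# Made-up example values, not taken from the Starbucks data
toy <- data.frame(
  name = c("Latte", "Mocha", "Tea"),
  vitamin = c("10%", "0%", "6%"),
  caffeine = c("75", "varies", "150"),
  calories = c(190, 290, 0),
  stringsAsFactors = FALSE
)
parse_share(toy)
```

Running `parse_share(starbucks_drink_menu)` would show the same check on the real columns.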

Simplify the column names: drop the (g) units, turn (mg) and (% DV) into _mg and _dv suffixes, shorten a few long names, and make everything lowercase.

starbies <- starbucks_drink_menu %>%
  rename("bev_category" = "Beverage_category", "beverage" = "Beverage",
         "bev_prep" = "Beverage_prep", "calories" = "Calories",
         "total_fat" = "Total Fat (g)", "trans_fat" = "Trans Fat (g)",
         "saturated_fat" = "Saturated Fat (g)", "sodium_mg" = "Sodium (mg)",
         "total_carbs" = "Total Carbohydrates (g)", "chol_mg" = "Cholesterol (mg)",
         "fiber" = "Dietary Fibre (g)", "sugar" = "Sugars (g)",
         "protein" = "Protein (g)", "vitamin_a_dv" = "Vitamin A (% DV)",
         "vitamin_c_dv" = "Vitamin C (% DV)", "calcium_dv" = "Calcium (% DV)",
         "iron_dv" = "Iron (% DV)", "caffeine_mg" = "Caffeine (mg)")
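As an alternative to typing every pair by hand, the same kind of cleanup can be derived from the raw headers with regular expressions. This is only a sketch with a hypothetical helper, `simplify_names()`: it keeps full words (for example "sugars" and "total_carbohydrates"), so its output does not match the shorter names used in the rest of this report.

```r
# Hypothetical helper: derive lowercase snake_case names from raw headers.
simplify_names <- function(x) {
  x <- sub("\\s*\\(g\\)$", "", x)        # drop a trailing "(g)" unit
  x <- sub("\\s*\\(mg\\)$", "_mg", x)    # "(mg)"   -> "_mg"
  x <- sub("\\s*\\(% DV\\)$", "_dv", x)  # "(% DV)" -> "_dv"
  x <- gsub("[^A-Za-z0-9]+", "_", x)     # spaces/punctuation -> "_"
  tolower(x)
}

simplify_names(c("Total Fat (g)", "Sodium (mg)", "Vitamin A (% DV)"))
# "total_fat" "sodium_mg" "vitamin_a_dv"
```

It would be applied with `names(df) <- simplify_names(names(df))` on a copy of the data.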

Remove the % from the values in the vitamin_a_dv, vitamin_c_dv, calcium_dv, and iron_dv columns.

starbies$vitamin_a_dv <- gsub("%", "", starbies$vitamin_a_dv)
starbies$vitamin_c_dv <- gsub("%", "", starbies$vitamin_c_dv)
starbies$calcium_dv <- gsub("%", "", starbies$calcium_dv)
starbies$iron_dv <- gsub("%", "", starbies$iron_dv)
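The four `gsub()` calls above can be collapsed into one pass over a vector of column names. This sketch uses a hypothetical helper, `strip_percent()`, and made-up data; `fixed = TRUE` treats `%` as a literal string rather than a regular expression.

```r
# Hypothetical helper: remove "%" from the listed columns in one pass.
strip_percent <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) gsub("%", "", x, fixed = TRUE))
  df
}

# Made-up example values, not taken from the Starbucks data
toy <- data.frame(
  vitamin_a_dv = c("10%", "0%"),
  iron_dv = c("20%", "15%"),
  stringsAsFactors = FALSE
)
toy <- strip_percent(toy, c("vitamin_a_dv", "iron_dv"))
```

On the real data this would be `starbies <- strip_percent(starbies, c("vitamin_a_dv", "vitamin_c_dv", "calcium_dv", "iron_dv"))`.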

Convert vitamin_a_dv, vitamin_c_dv, calcium_dv, iron_dv, caffeine_mg, and total_fat to numeric.

# as.numeric() turns any string that is not a number into NA (with a warning)
starbies$vitamin_a_dv <- as.numeric(starbies$vitamin_a_dv)
starbies$vitamin_c_dv <- as.numeric(starbies$vitamin_c_dv)
starbies$calcium_dv <- as.numeric(starbies$calcium_dv)
starbies$iron_dv <- as.numeric(starbies$iron_dv)
starbies$caffeine_mg <- as.numeric(starbies$caffeine_mg)
starbies$total_fat <- as.numeric(starbies$total_fat)
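Because `as.numeric()` replaces every non-numeric string with `NA` and only warns "NAs introduced by coercion", it can be worth listing the strings that will be lost before converting. A minimal sketch, assuming a hypothetical helper `failed_conversions()` and made-up input; it must be run on the raw column (for example ``starbucks_drink_menu$`Caffeine (mg)` ``), since the original text is gone after conversion.

```r
# Hypothetical helper: the distinct raw strings that as.numeric()
# would turn into NA (values that were already NA are ignored).
failed_conversions <- function(raw) {
  converted <- suppressWarnings(as.numeric(raw))
  unique(raw[is.na(converted) & !is.na(raw)])
}

# Made-up example values, not taken from the Starbucks data
failed_conversions(c("75", "varies", "150", NA, "Varies"))
# "varies" "Varies"
```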

Carbohydrates in Starbucks Drinks

The starbies dataset includes the total carbohydrates, in grams, for every beverage. According to the FDA, the daily value (DV) for total carbohydrates is 275 grams per day, based on a 2,000-calorie diet. Which beverages offer the fewest carbs and calories?
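Expressing a drink's carbohydrates as a percentage of the 275 g daily value is a single division, %DV = 100 × carbs / 275. The sketch below wraps it in a hypothetical helper, `carb_pct_dv()`.

```r
# Percent of the FDA daily value (275 g total carbohydrate on a
# 2,000-calorie diet) supplied by a given amount of carbohydrate.
carb_pct_dv <- function(carbs_g, daily_value_g = 275) {
  100 * carbs_g / daily_value_g
}

carb_pct_dv(c(55, 275))  # 20 100
```

A hypothetical new column could then be added with `starbies$carbs_pct_dv <- carb_pct_dv(starbies$total_carbs)`.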

p1 <- starbies %>%
  ggplot(aes(x = total_carbs, fill = bev_category)) +
  geom_histogram(position = "stack", alpha = 0.6, color = "#2c220A", binwidth = 20) +
  ggtitle("Total Carbs in Drinks by Category") +
  labs(x = "Total Carbs (g)", y = "Count") +
  scale_fill_brewer(palette = "BrBG", name = "Beverage Category") +
  theme(panel.background = element_rect(fill = "#fcebd2"))
p1 <- ggplotly(p1)  # convert to an interactive plotly chart
p1
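A quick plausibility check is worth running here. Carbohydrate supplies about 4 kcal per gram, so 4 × carbs can never exceed a drink's total calories. The summary above reports a maximum of 340 for Total Carbohydrates (g) but a maximum of only 510 calories, and 340 × 4 = 1,360 kcal, so at least some values in that column cannot be grams of carbohydrate. The sketch below (hypothetical helper, illustrative inputs) flags such rows so the column's values or units can be double-checked before drawing conclusions from the histogram.

```r
# Flag rows where carbohydrate calories alone (about 4 kcal per gram)
# would exceed the drink's listed calories.
carb_calorie_mismatch <- function(calories, carbs_g, kcal_per_g = 4) {
  carbs_g * kcal_per_g > calories
}

# Illustrative inputs: the summary maxima, then a consistent drink
carb_calorie_mismatch(calories = c(510, 250), carbs_g = c(340, 50))
# TRUE FALSE
```

On the cleaned data: `sum(carb_calorie_mismatch(starbies$calories, starbies$total_carbs))`.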

Calories in Starbucks Drinks

By comparing the boxplots across beverage categories, you can see which categories tend to have higher or lower calorie values. A higher median line indicates a higher typical calorie count, while a taller box indicates more variation in calories within that category. One outlier stands out: although Tazo Tea Drinks are typically low in calories, one drink in that category has 450 calories!
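ggplot2's `geom_boxplot()` draws a point as an outlier when it lies more than 1.5 × IQR below the 25th or above the 75th percentile. The sketch below applies that same rule within each category so an outlier such as the 450-calorie Tazo drink can be located in the data rather than only on the plot. The helper names and toy values are hypothetical.

```r
# Flag values more than `coef` IQRs outside the quartiles, the rule
# geom_boxplot() uses by default (coef = 1.5).
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Apply the rule separately within each group, keeping row order.
outliers_by_group <- function(x, group) {
  unsplit(lapply(split(x, group), flag_outliers), group)
}

# Made-up example values, not taken from the Starbucks data
toy <- data.frame(
  category = c(rep("Tea", 5), rep("Coffee", 3)),
  calories = c(100, 110, 120, 130, 450, 5, 10, 15),
  stringsAsFactors = FALSE
)
toy[outliers_by_group(toy$calories, toy$category), ]
```

On the real data: `starbies[outliers_by_group(starbies$calories, starbies$bev_category), ]`.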

mean(starbies$calories)
[1] 193.8719

Exploring correlations between sugar (g) and calories

Findings published by the American Heart Association state that the average American adult consumes 17 teaspoons of added sugar a day, which adds up to a whopping 60 pounds per year. Does the sugar content of Starbucks drinks affect their calorie count?

p3 <- starbies %>%
  ggplot(aes(x = calories, y = sugar)) +
  geom_point(color = "#127131") +
  geom_smooth(method = "loess", color = "tan", fill = "#cfe4db") +  # local, nonlinear smoother
  ggtitle("Sugar vs. Calories with a LOESS Trend") +
  labs(x = "Calories", y = "Sugar (g)") +
  theme_minimal()
p3

The plot shows the relationship between calories and sugar content in the starbies dataset. The smoothed line generated by the “loess” method traces the overall trend. The line slopes upward, indicating a positive correlation: drinks with more calories tend to contain more sugar. The points cluster fairly tightly around the line, which suggests a strong association between the two variables.
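A LOESS curve shows the shape of the trend but gives no single number for its strength. As a sketch, a hypothetical helper `sugar_calorie_fit()` can report Pearson's r and the slope of an ordinary least-squares line (grams of sugar per additional calorie); the toy values are made up.

```r
# Pearson's r and the least-squares slope of sugar on calories.
sugar_calorie_fit <- function(calories, sugar) {
  fit <- lm(sugar ~ calories)
  list(
    r = cor(calories, sugar, use = "complete.obs"),
    slope = unname(coef(fit)["calories"])
  )
}

# Made-up example values, not taken from the Starbucks data
res <- sugar_calorie_fit(calories = c(100, 200, 300, 400),
                         sugar = c(20, 42, 58, 80))
```

On the real data: `sugar_calorie_fit(starbies$calories, starbies$sugar)`. Since sugar itself provides about 4 kcal per gram, a strong positive r is expected rather than surprising.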

Average calories and caffeine

The treemap illustrates the different beverage categories. The size of each box corresponds to the average caffeine content, in milligrams, of the beverages in that category, and the color indicates their average calories.
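The treemap's two inputs are per-category averages, which can be computed directly in base R. In this sketch (hypothetical helper and toy data), `na.action = na.pass` together with `na.rm = TRUE` keeps a drink whose caffeine is missing in the calorie average; with `aggregate()`'s default `na.omit`, that row would be dropped from both averages.

```r
# Per-category mean caffeine and calories, ignoring NAs column by column.
category_means <- function(df) {
  aggregate(
    cbind(caffeine_mg, calories) ~ bev_category,
    data = df,
    FUN = mean, na.rm = TRUE,
    na.action = na.pass
  )
}

# Made-up example values, not taken from the Starbucks data
toy <- data.frame(
  bev_category = c("A", "A", "B"),
  caffeine_mg = c(100, NA, 50),
  calories = c(200, 300, 100),
  stringsAsFactors = FALSE
)
category_means(toy)
```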

Beverage Category Heatmap

Correlations in nutrition content

# na.omit() drops any row with an NA; maxcat = 5L skips discrete columns with >5 levels
plot_correlation(na.omit(starbies), maxcat = 5L) +
  ggthemes::scale_colour_fivethirtyeight() +
  labs(x = "Nutrition", y = "Nutrition")

The correlation plot shows positive correlations between sugar and cholesterol and between cholesterol and calories.
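Because `na.omit()` removes an entire row whenever any column is missing, a drink without a numeric caffeine value also drops out of the sugar and calorie correlations. A base-R sketch (hypothetical helper, made-up data) that instead computes each pair from the rows where both values are present:

```r
# Correlation matrix of the numeric columns, pairwise-complete.
numeric_cor <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  cor(num, use = "pairwise.complete.obs")
}

# Made-up example values, not taken from the Starbucks data
toy <- data.frame(
  sugar = c(10, 20, 30, 40),
  calories = c(100, 210, 290, 400),
  caffeine_mg = c(NA, 75, 150, 75),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
round(numeric_cor(toy), 2)
```

Pairwise estimates use different subsets of rows for different pairs, so the two approaches can give slightly different coefficients.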

Conclusion

Based on the analysis of the Starbucks drink menu dataset, several interesting insights can be drawn regarding the nutrition facts of their beverages.

  • The Frappuccino Blended Coffee category has the highest total carbohydrate values, with some exceeding the FDA daily value of 275 grams. The Tazo Tea Drinks category offers the most low-carbohydrate options, with many beverages containing less than 100 grams, and Shaken Iced Beverages provide the most zero-carb options.
  • Calorie content varies considerably across beverage categories. Categories with higher median lines have higher typical calorie counts, while taller boxes indicate a wider spread of calorie values within a category. Notably, the Tazo Tea Drinks category contains an outlier: a beverage with 450 calories.
  • Sugar and Calories: There is a positive correlation between the sugar content and calorie count of Starbucks beverages. As the calorie content increases, so does the sugar content. This finding aligns with the general trend of sugary drinks being higher in calories.
  • Average Calories and Caffeine: The treemap shows each beverage category's average caffeine content (box size) and average calorie content (color), giving a side-by-side view of how caffeine and calorie levels vary across categories.
  • Correlations in Nutrition Content: The correlation plot reveals a positive correlation between sugar and cholesterol, as well as cholesterol and calories. This suggests that beverages with higher sugar content also tend to have higher cholesterol and calorie levels.

In conclusion, analyzing the Starbucks drink menu dataset provides valuable insights into the nutrition facts of its beverages. It highlights the varying levels of carbohydrates, calories, sugar, and caffeine across different categories, which can help customers make informed choices when visiting Starbucks stores.