Data 110 Final Project
Introduction
First create a setup chunk, then load the appropriate libraries, and finally read in the CSV file.
library(tidyverse)
library(RColorBrewer)
library(plotly)
library(DT)
library(DataExplorer)
starbucks_drink_menu <- read_csv("starbucks_drinkMenu_expanded.csv")
View the structure of the data to begin cleaning.
summary(starbucks_drink_menu)
Beverage_category     Beverage              Beverage_prep         Calories
Length:242            Length:242            Length:242            Min.   :  0.0
Class :character      Class :character      Class :character      1st Qu.:120.0
Mode  :character      Mode  :character      Mode  :character      Median :185.0
                                                                  Mean   :193.9
                                                                  3rd Qu.:260.0
                                                                  Max.   :510.0
Total Fat (g)         Trans Fat (g)         Saturated Fat (g)     Sodium (mg)
Length:242            Min.   :0.000         Min.   :0.0000        Min.   : 0.000
Class :character      1st Qu.:0.100         1st Qu.:0.0000        1st Qu.: 0.000
Mode  :character      Median :0.500         Median :0.0000        Median : 5.000
                      Mean   :1.307         Mean   :0.0376        Mean   : 6.364
                      3rd Qu.:2.000         3rd Qu.:0.1000        3rd Qu.:10.000
                      Max.   :9.000         Max.   :0.3000        Max.   :40.000
Total Carbohydrates (g)   Cholesterol (mg)          Dietary Fibre (g)         Sugars (g)
Min.   :  0.0             Min.   : 0.00             Min.   :0.0000            Min.   : 0.00
1st Qu.: 70.0             1st Qu.:21.00             1st Qu.:0.0000            1st Qu.:18.00
Median :125.0             Median :34.00             Median :0.0000            Median :32.00
Mean   :128.9             Mean   :35.99             Mean   :0.8058            Mean   :32.96
3rd Qu.:170.0             3rd Qu.:50.75             3rd Qu.:1.0000            3rd Qu.:43.75
Max.   :340.0             Max.   :90.00             Max.   :8.0000            Max.   :84.00
Protein (g)           Vitamin A (% DV)      Vitamin C (% DV)      Calcium (% DV)
Min.   : 0.000        Length:242            Length:242            Length:242
1st Qu.: 3.000        Class :character      Class :character      Class :character
Median : 6.000        Mode  :character      Mode  :character      Mode  :character
Mean   : 6.979
3rd Qu.:10.000
Max.   :20.000
Iron (% DV)           Caffeine (mg)
Length:242            Length:242
Class :character      Class :character
Mode  :character      Mode  :character
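The summary shows several columns that look numeric but were read as character. A rough way to see why is to try the conversion on a small sample and count what fails. The sketch below uses made-up values (the "varies" entry and the column contents are assumptions for illustration, not rows from the csv file).

```r
# Hypothetical raw values, as they might arrive from a CSV with mixed content.
raw <- data.frame(
  calories = c(3, 70, 110),
  iron_dv  = c("0%", "10%", "25%"),
  caffeine = c("175", "varies", "75"),
  stringsAsFactors = FALSE
)

# Columns that were read as text.
char_cols <- names(raw)[sapply(raw, is.character)]

# Strip "%" and attempt a numeric conversion; anything that is not a number becomes NA.
converted <- lapply(raw[char_cols], function(x) {
  suppressWarnings(as.numeric(gsub("%", "", x, fixed = TRUE)))
})

# Number of entries per column that could not be converted.
na_counts <- sapply(converted, function(x) sum(is.na(x)))
na_counts
```

A column with a count above zero holds at least one entry that is not a plain number, which is enough for the whole column to be read as character.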
Rename the columns to short, lowercase names without the parenthesized units (g), (mg), and (% DV). Where the unit matters, keep it as a suffix such as _mg or _dv.
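As an alternative to typing every name, the renaming rule could be applied programmatically. This is only a sketch: simplify_name() is a hypothetical helper, and the small data frame stands in for the real one.

```r
library(dplyr)

# Hypothetical helper: drop a trailing "(unit)", replace spaces with "_", lowercase.
simplify_name <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)$", "", x)  # remove "(g)", "(mg)", "(% DV)"
  x <- gsub("\\s+", "_", x)             # spaces to underscores
  tolower(x)
}

# Stand-in data frame with a few of the original column names.
menu <- data.frame(
  "Beverage_category" = "Coffee",
  "Total Fat (g)"     = "0.1",
  "Sodium (mg)"       = 5,
  "Vitamin A (% DV)"  = "10%",
  check.names = FALSE
)

renamed <- menu %>% rename_with(simplify_name)
names(renamed)
```

Note that this rule drops the units entirely, whereas the explicit rename() used in this project keeps suffixes such as _mg and _dv.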
starbies <- starbucks_drink_menu %>%
  rename(
    "bev_category" = "Beverage_category", "beverage" = "Beverage", "bev_prep" = "Beverage_prep",
    "calories" = "Calories", "total_fat" = "Total Fat (g)", "trans_fat" = "Trans Fat (g)",
    "saturated_fat" = "Saturated Fat (g)", "sodium_mg" = "Sodium (mg)", "total_carbs" = "Total Carbohydrates (g)",
    "chol_mg" = "Cholesterol (mg)", "fiber" = "Dietary Fibre (g)", "sugar" = "Sugars (g)",
    "protein" = "Protein (g)", "vitamin_a_dv" = "Vitamin A (% DV)", "vitamin_c_dv" = "Vitamin C (% DV)",
    "calcium_dv" = "Calcium (% DV)", "iron_dv" = "Iron (% DV)", "caffeine_mg" = "Caffeine (mg)"
  )
Remove the % from the values in the vitamin_a_dv, vitamin_c_dv, calcium_dv, and iron_dv columns.
starbies$vitamin_a_dv <- gsub("%", "", starbies$vitamin_a_dv)
starbies$vitamin_c_dv <- gsub("%", "", starbies$vitamin_c_dv)
starbies$calcium_dv <- gsub("%", "", starbies$calcium_dv)
starbies$iron_dv <- gsub("%", "", starbies$iron_dv)
Convert vitamin_a_dv, vitamin_c_dv, calcium_dv, iron_dv, caffeine_mg, and total_fat to numeric. Any entry that is not a valid number becomes NA, with a coercion warning.
starbies$vitamin_a_dv <- as.numeric(starbies$vitamin_a_dv)
starbies$vitamin_c_dv <- as.numeric(starbies$vitamin_c_dv)
starbies$calcium_dv <- as.numeric(starbies$calcium_dv)
starbies$iron_dv <- as.numeric(starbies$iron_dv)
starbies$caffeine_mg <- as.numeric(starbies$caffeine_mg)
starbies$total_fat <- as.numeric(starbies$total_fat)
Carbohydrates in Starbucks Drinks
The starbies dataset records total carbohydrates in grams for every beverage. According to the FDA, the Daily Value (DV) for total carbohydrates is 275 grams per day, based on a 2,000-calorie diet. Which beverages have the fewest carbs and calories?
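To put the 275 gram daily value in context, a carbohydrate amount can be turned into a percentage of the daily value. The amounts below are made up for illustration; with the real data the same arithmetic would apply to starbies$total_carbs.

```r
# Daily value for total carbohydrates in grams (the figure quoted above).
carb_dv_g <- 275

# Hypothetical carbohydrate amounts in grams, not taken from the dataset.
carbs_g <- c(0, 55, 110, 275)

# Share of the daily value, as a whole-number percentage.
pct_dv <- round(100 * carbs_g / carb_dv_g)
pct_dv
```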
p1 <- starbies %>%
  ggplot(aes(x = total_carbs, fill = bev_category)) +
  geom_histogram(position = "stack", alpha = 0.6, color = "#2c220A", binwidth = 20) +
  ggtitle("Total Carbs in Drinks by Category") +
  labs(x = "Total Carbs (g)", y = "Count") +
  scale_fill_brewer(palette = "BrBG", name = "Beverage Category") +
  theme(panel.background = element_rect(fill = "#fcebd2"))
p1 <- ggplotly(p1)
p1
Based on the graph above, the Frappuccino Blended Coffee beverages have the highest total carbohydrates in the data at 340 g, which is more than the 275 g daily value. The Tazo Tea Drinks category contains the most low-carbohydrate drinks, meaning those with less than 100 g. Shaken Iced Beverages offers the most 0-carb options.
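Readings taken from a stacked histogram can be checked with a small summary table. The sketch below counts zero-carb and under-100 g drinks per category. The rows and category labels are hypothetical stand-ins; with the real data, starbies would take the place of drinks.

```r
library(dplyr)

# Hypothetical rows standing in for the cleaned data.
drinks <- data.frame(
  bev_category = c("Tea", "Tea", "Tea", "Tea", "Iced", "Iced", "Blended"),
  total_carbs  = c(0, 40, 60, 150, 0, 0, 340)
)

# Per category: zero-carb drinks, drinks under 100 g, and the largest value.
carb_counts <- drinks %>%
  group_by(bev_category) %>%
  summarise(
    zero_carb = sum(total_carbs == 0),
    under_100 = sum(total_carbs < 100),
    max_carbs = max(total_carbs),
    .groups = "drop"
  ) %>%
  arrange(desc(under_100))
carb_counts
```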
Calories in Starbucks Drinks
According to the FDA, the recommended daily intake for a moderately active 33-year-old woman is about 2,000 calories. This is an ambitious goal considering my love of Starbucks’ beautifully crafted drinks.
p2 <- starbies %>%
  ggplot(aes(bev_category, calories, fill = bev_category)) +
  ggtitle("Calories by Beverage Category") +
  labs(x = "Beverage Category", y = "Calories") +
  geom_boxplot() +
  scale_fill_brewer(palette = "BrBG", name = "Beverage Category") +
  theme(panel.background = element_rect(fill = "#fcebd2"), axis.text.x = element_text(angle = 40, hjust = 1))
p2 <- ggplotly(p2)
p2
Comparing the boxplots across beverage categories shows which categories tend to have higher or lower calorie values. A higher median line indicates higher typical calorie content, while a taller box indicates a wider spread of calories within that category. The outlier is very interesting: while Tazo Tea Drinks are typically low in calories, one drink in the category has 450 calories!
mean(starbies$calories)
[1] 193.8719
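The outlier points in a boxplot follow a simple rule: by default geom_boxplot() extends its whiskers at most 1.5 times the interquartile range past the box and draws anything beyond that as a separate point. A rough sketch of that rule, using made-up calorie values for a single category (450 is deliberately extreme):

```r
# Hypothetical calorie values for one category.
calories <- c(50, 60, 80, 90, 100, 110, 120, 140, 450)

# Quartiles and interquartile range.
q <- unname(quantile(calories, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers <- calories[calories < lower | calories > upper]
outliers
```

With the real data, the same steps could be run per category to list which drinks sit outside the whiskers.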
Exploring correlations between sugar (g) and calories
Findings published by the American Heart Association state that the average American adult consumes about 17 teaspoons of added sugar a day, which adds up to a whopping 60 pounds annually. Does the sugar content in Starbucks drinks affect the calorie count?
p3 <- starbies %>%
  ggplot(aes(x = calories, y = sugar)) +
  geom_point(color = "#127131") +
  geom_smooth(method = "loess", color = "tan", fill = "#cfe4db") +
  ggtitle("Sugar vs. Calories with LOESS Smooth") +
  labs(x = "Calories", y = "Sugar (g)") +
  theme_minimal()
p3
The plot shows the relationship between calories and sugar content in the starbies dataset. The smoothed line generated by the “loess” method shows the overall trend. The line slopes upward, indicating a positive association: as calorie content increases, so does sugar content. The data points cluster tightly around the line, which suggests a strong association between the two variables.
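A smoothed line shows the direction of the relationship, but not its strength as a number. One way to quantify it is a Pearson correlation together with a straight-line fit. The values below are hypothetical stand-ins for starbies$calories and starbies$sugar, so the results say nothing about the real data.

```r
# Hypothetical values standing in for the calories and sugar columns.
calories <- c(80, 120, 190, 260, 350, 450)
sugar    <- c(10, 18, 30, 41, 57, 70)

# Pearson correlation: close to 1 means a strong positive linear association.
r <- cor(calories, sugar)

# Straight-line fit: the slope is grams of sugar per additional calorie.
fit <- lm(sugar ~ calories)
slope <- coef(fit)[["calories"]]

c(r = r, slope = slope)
```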
Average calories and caffeine
The treemap shows the beverage categories. The size of each box corresponds to the average caffeine, in milligrams, of the beverages in that category. The color of each box indicates the average calories of the beverages in the labeled category.
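The table behind a treemap like this needs one row per category with the two averages. The sketch below builds such a table from hypothetical rows. The drawing step assumes the treemap package, which is not among the libraries loaded in this project, so it only runs if that package is installed.

```r
library(dplyr)

# Hypothetical rows standing in for the cleaned data.
drinks <- data.frame(
  bev_category = c("Coffee", "Coffee", "Tea", "Tea", "Smoothies"),
  calories     = c(5, 15, 100, 200, 280),
  caffeine_mg  = c(260, 330, 25, NA, 15)
)

# One row per category: average caffeine (box size) and average calories (box color).
category_means <- drinks %>%
  group_by(bev_category) %>%
  summarise(
    avg_caffeine = mean(caffeine_mg, na.rm = TRUE),
    avg_calories = mean(calories),
    .groups = "drop"
  ) %>%
  as.data.frame()
category_means

# Optional drawing step, assuming the treemap package is available.
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(
    category_means,
    index  = "bev_category",
    vSize  = "avg_caffeine",
    vColor = "avg_calories",
    type   = "value",
    title  = "Average Caffeine (size) and Calories (color) by Category"
  )
}
```

The na.rm = TRUE argument matters here because entries that fail the numeric conversion are stored as NA and would otherwise turn a category's average into NA.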
Correlations in nutrition content
plot_correlation(na.omit(starbies), maxcat = 5L) +
  ggthemes::scale_colour_fivethirtyeight() +
  labs(x = "Nutrition", y = "Nutrition")
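The numbers behind a correlation heat map can also be inspected directly. As a rough sketch, base R's cor() gives the pairwise correlations for numeric columns. The data frame below is hypothetical, and na.omit() is used the same way as in the call above, dropping every row that has a missing value.

```r
# Hypothetical numeric columns standing in for the cleaned data.
nutrition <- data.frame(
  calories    = c(80, 120, 190, 260, 350),
  sugar       = c(10, 18, 30, 41, 57),
  protein     = c(6, 2, 10, 4, 12),
  caffeine_mg = c(75, NA, 150, 95, 110)
)

# Keep only rows without missing values.
complete_rows <- na.omit(nutrition)

# Pairwise Pearson correlations, rounded for reading.
cor_matrix <- round(cor(complete_rows), 2)
cor_matrix
```

Because na.omit() removes whole rows, a single column with many NA values can shrink the data used for every pair of variables.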