McDonald’s is the world’s largest restaurant chain by revenue, serving over 69 million customers daily in over 100 countries across 37,855 outlets as of 2018. McDonald’s food is widely regarded as unhealthy, but is that really true?
The data I am using is the Nutrition Facts for McDonald’s Menu dataset from Kaggle, created in 2017. This dataset provides a nutrition analysis of every menu item on the US McDonald’s menu, including breakfast, beef burgers, chicken and fish sandwiches, fries, salads, soda, coffee and tea, milkshakes, and desserts.
In this markdown document, I explore the distribution of calories across different menu items, food categories and serving sizes. I also want to test whether there is a significant difference between eating fried chicken and grilled chicken at McDonald’s.
# setting up libraries
library(tidyverse) #includes ggplot2, dplyr, readr
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(gridExtra) #arrange multiple graph on a page
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(gsubfn) #utilities for strings and function arguments
## Loading required package: proto
#importing the data
menu<-read.csv("C:/Users/TM37075/Documents/UM Master of Data Science/WQD7004 Programming in Data Science/menu.csv",header=TRUE,sep=',')
#Dimension of the Data
dim(menu)
## [1] 260 24
#Structure of the Data
str(menu)
## 'data.frame': 260 obs. of 24 variables:
## $ Category : chr "Breakfast" "Breakfast" "Breakfast" "Breakfast" ...
## $ Item : chr "Egg McMuffin" "Egg White Delight" "Sausage McMuffin" "Sausage McMuffin with Egg" ...
## $ Serving.Size : chr "4.8 oz (136 g)" "4.8 oz (135 g)" "3.9 oz (111 g)" "5.7 oz (161 g)" ...
## $ Calories : int 300 250 370 450 400 430 460 520 410 470 ...
## $ Calories.from.Fat : int 120 70 200 250 210 210 230 270 180 220 ...
## $ Total.Fat : num 13 8 23 28 23 23 26 30 20 25 ...
## $ Total.Fat....Daily.Value. : int 20 12 35 43 35 36 40 47 32 38 ...
## $ Saturated.Fat : num 5 3 8 10 8 9 13 14 11 12 ...
## $ Saturated.Fat....Daily.Value.: int 25 15 42 52 42 46 65 68 56 59 ...
## $ Trans.Fat : num 0 0 0 0 0 1 0 0 0 0 ...
## $ Cholesterol : int 260 25 45 285 50 300 250 250 35 35 ...
## $ Cholesterol....Daily.Value. : int 87 8 15 95 16 100 83 83 11 11 ...
## $ Sodium : int 750 770 780 860 880 960 1300 1410 1300 1420 ...
## $ Sodium....Daily.Value. : int 31 32 33 36 37 40 54 59 54 59 ...
## $ Carbohydrates : int 31 30 29 30 30 31 38 43 36 42 ...
## $ Carbohydrates....Daily.Value.: int 10 10 10 10 10 10 13 14 12 14 ...
## $ Dietary.Fiber : int 4 4 4 4 4 4 2 3 2 3 ...
## $ Dietary.Fiber....Daily.Value.: int 17 17 17 17 17 18 7 12 7 12 ...
## $ Sugars : int 3 3 2 2 2 3 3 4 3 4 ...
## $ Protein : int 17 18 14 21 21 26 19 19 20 20 ...
## $ Vitamin.A....Daily.Value. : int 10 6 8 15 6 15 10 15 2 6 ...
## $ Vitamin.C....Daily.Value. : int 0 0 0 0 0 2 8 8 8 8 ...
## $ Calcium....Daily.Value. : int 25 25 25 30 25 30 15 20 15 15 ...
## $ Iron....Daily.Value. : int 15 8 10 15 10 20 15 20 10 15 ...
#Summary of the data
summary(menu)
## Category Item Serving.Size Calories
## Length:260 Length:260 Length:260 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 210.0
## Mode :character Mode :character Mode :character Median : 340.0
## Mean : 368.3
## 3rd Qu.: 500.0
## Max. :1880.0
## Calories.from.Fat Total.Fat Total.Fat....Daily.Value. Saturated.Fat
## Min. : 0.0 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 20.0 1st Qu.: 2.375 1st Qu.: 3.75 1st Qu.: 1.000
## Median : 100.0 Median : 11.000 Median : 17.00 Median : 5.000
## Mean : 127.1 Mean : 14.165 Mean : 21.82 Mean : 6.008
## 3rd Qu.: 200.0 3rd Qu.: 22.250 3rd Qu.: 35.00 3rd Qu.:10.000
## Max. :1060.0 Max. :118.000 Max. :182.00 Max. :20.000
## Saturated.Fat....Daily.Value. Trans.Fat Cholesterol
## Min. : 0.00 Min. :0.0000 Min. : 0.00
## 1st Qu.: 4.75 1st Qu.:0.0000 1st Qu.: 5.00
## Median : 24.00 Median :0.0000 Median : 35.00
## Mean : 29.97 Mean :0.2038 Mean : 54.94
## 3rd Qu.: 48.00 3rd Qu.:0.0000 3rd Qu.: 65.00
## Max. :102.00 Max. :2.5000 Max. :575.00
## Cholesterol....Daily.Value. Sodium Sodium....Daily.Value.
## Min. : 0.00 Min. : 0.0 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 107.5 1st Qu.: 4.75
## Median : 11.00 Median : 190.0 Median : 8.00
## Mean : 18.39 Mean : 495.8 Mean : 20.68
## 3rd Qu.: 21.25 3rd Qu.: 865.0 3rd Qu.: 36.25
## Max. :192.00 Max. :3600.0 Max. :150.00
## Carbohydrates Carbohydrates....Daily.Value. Dietary.Fiber
## Min. : 0.00 Min. : 0.00 Min. :0.000
## 1st Qu.: 30.00 1st Qu.:10.00 1st Qu.:0.000
## Median : 44.00 Median :15.00 Median :1.000
## Mean : 47.35 Mean :15.78 Mean :1.631
## 3rd Qu.: 60.00 3rd Qu.:20.00 3rd Qu.:3.000
## Max. :141.00 Max. :47.00 Max. :7.000
## Dietary.Fiber....Daily.Value. Sugars Protein
## Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 5.75 1st Qu.: 4.00
## Median : 5.000 Median : 17.50 Median :12.00
## Mean : 6.531 Mean : 29.42 Mean :13.34
## 3rd Qu.:10.000 3rd Qu.: 48.00 3rd Qu.:19.00
## Max. :28.000 Max. :128.00 Max. :87.00
## Vitamin.A....Daily.Value. Vitamin.C....Daily.Value. Calcium....Daily.Value.
## Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 0.000 1st Qu.: 6.00
## Median : 8.00 Median : 0.000 Median :20.00
## Mean : 13.43 Mean : 8.535 Mean :20.97
## 3rd Qu.: 15.00 3rd Qu.: 4.000 3rd Qu.:30.00
## Max. :170.00 Max. :240.000 Max. :70.00
## Iron....Daily.Value.
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 4.000
## Mean : 7.735
## 3rd Qu.:15.000
## Max. :40.000
#First 3 rows of the Data
head(menu,3)
## Category Item Serving.Size Calories Calories.from.Fat
## 1 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120
## 2 Breakfast Egg White Delight 4.8 oz (135 g) 250 70
## 3 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200
## Total.Fat Total.Fat....Daily.Value. Saturated.Fat
## 1 13 20 5
## 2 8 12 3
## 3 23 35 8
## Saturated.Fat....Daily.Value. Trans.Fat Cholesterol
## 1 25 0 260
## 2 15 0 25
## 3 42 0 45
## Cholesterol....Daily.Value. Sodium Sodium....Daily.Value. Carbohydrates
## 1 87 750 31 31
## 2 8 770 32 30
## 3 15 780 33 29
## Carbohydrates....Daily.Value. Dietary.Fiber Dietary.Fiber....Daily.Value.
## 1 10 4 17
## 2 10 4 17
## 3 10 4 17
## Sugars Protein Vitamin.A....Daily.Value. Vitamin.C....Daily.Value.
## 1 3 17 10 0
## 2 3 18 6 0
## 3 2 14 8 0
## Calcium....Daily.Value. Iron....Daily.Value.
## 1 25 15
## 2 25 8
## 3 25 10
#Category of the McDonald's menu
unique(menu[,1])
## [1] "Breakfast" "Beef & Pork" "Chicken & Fish"
## [4] "Salads" "Snacks & Sides" "Desserts"
## [7] "Beverages" "Coffee & Tea" "Smoothies & Shakes"
#Check whether any of the columns has missing values
colSums(is.na(menu))
## Category Item
## 0 0
## Serving.Size Calories
## 0 0
## Calories.from.Fat Total.Fat
## 0 0
## Total.Fat....Daily.Value. Saturated.Fat
## 0 0
## Saturated.Fat....Daily.Value. Trans.Fat
## 0 0
## Cholesterol Cholesterol....Daily.Value.
## 0 0
## Sodium Sodium....Daily.Value.
## 0 0
## Carbohydrates Carbohydrates....Daily.Value.
## 0 0
## Dietary.Fiber Dietary.Fiber....Daily.Value.
## 0 0
## Sugars Protein
## 0 0
## Vitamin.A....Daily.Value. Vitamin.C....Daily.Value.
## 0 0
## Calcium....Daily.Value. Iron....Daily.Value.
## 0 0
This data contains 260 McDonald’s menu items and 24 columns. The menu is divided into 9 categories: Beef & Pork, Beverages, Breakfast, Chicken & Fish, Coffee & Tea, Desserts, Salads, Smoothies & Shakes, Snacks & Sides.
At first glance the data looks tidy, as it does not contain any missing values. However, one of the objectives is to look at the relationship between serving size and calories, and the serving size variable is stored as a character string in the data set. It will be converted into a single numeric variable expressed in grams for food and milliliters for drinks.
#select drinks data that contain the "fl oz" or "carton" string
#convert fl oz to milliliters (1 fl oz = 29.5735 ml)
#in the raw data a carton is recorded as "1 carton (236 ml)", so we separate the number from the string
drink.oz<-menu[str_detect(menu$Serving.Size," fl oz.*"),]
drink.carton<-menu[str_detect(menu$Serving.Size,"carton"),]
drink.oz$Serving.Size <- round(as.numeric(gsub(" fl oz.*","",drink.oz$Serving.Size))*29.5735,0)
drink.carton$Serving.Size<-round(as.numeric(gsub(".*\\((.*)ml\\).*","\\1",drink.carton$Serving.Size)),0)
#Note: ( ) captures a group, \\( and \\) match literal parentheses, \\1 : captured group 1
#select food data that contain the "g" string and separate the numbers
food.g<-menu[str_detect(menu$Serving.Size,"g"),]
food.g$Serving.Size<-round(as.numeric(gsub(".*\\((.*)\\ g.*","\\1",food.g$Serving.Size)),0)
#combine 3 above data frames by rbind() into new data frame
#create a new column with type and unit
drink.oz$Serving.Unit<-rep("drinks/ml",nrow(drink.oz))
drink.carton$Serving.Unit<-rep("drinks/ml",nrow(drink.carton))
food.g$Serving.Unit<-rep("food/grams",nrow(food.g))
newMenu<-rbind(drink.oz,drink.carton,food.g)
#Verify data on new data frame
dim(newMenu)
## [1] 260 25
For data cleaning, since the serving size variable in the original data is a string in one of several formats (fl oz cup, oz (..g), carton(..ml) and cookie(..g)), I am using a few R functions to extract the numeric value from it:
str_detect(string,pattern) to select data with specific string
as.numeric() to convert column into numeric value
gsub(pattern,replacement,stringvector) to replace all matches of a string
After that, we combine those data frames into a new one and verify its dimensions. The data is now ready for analysis.
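The extraction steps described above can be tried on a few strings in isolation. The following is a minimal sketch in base R, not the code used in this analysis: `parse_serving` is a hypothetical helper, and the three sample strings are only illustrative examples of the formats listed earlier.

```r
# Illustrative serving-size strings in the three formats described above
sizes <- c("4.8 oz (136 g)", "16 fl oz cup", "1 carton (236 ml)")

# Hypothetical helper: returns grams for food and milliliters for drinks
parse_serving <- function(x) {
  if (grepl(" fl oz", x, fixed = TRUE)) {
    # drinks sold by the cup: fluid ounces -> milliliters
    round(as.numeric(sub(" fl oz.*", "", x)) * 29.5735)
  } else {
    # keep only the number inside the parentheses, e.g. "(136 g)" or "(236 ml)"
    as.numeric(sub(".*\\(([0-9.]+) (g|ml)\\).*", "\\1", x))
  }
}

parsed <- vapply(sizes, parse_serving, numeric(1), USE.NAMES = FALSE)
print(parsed)
```

Writing the rule as a single function avoids splitting the data frame into three pieces, at the cost of assuming every string follows one of these three patterns.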
There are various categories that offer a choice between crispy and grilled chicken. My assumption is that the crispy option is slightly unhealthier because it has to be deep fried, which makes it oily, but is that really true?
#Extract all items containing the Crispy or Grilled string
crispy<-newMenu[str_detect(newMenu$Item,"Crispy"),]
grilled<-newMenu[str_detect(newMenu$Item,"Grilled"),]
#Add a type column and combine the two data frames into one
crispy$type<-rep("crispy",nrow(crispy))
grilled$type<-rep("grilled",nrow(grilled))
chicken<-rbind(crispy,grilled)
# barplot of the calories for each type of chicken
p1<-ggplot(chicken)+geom_bar(aes(y=Calories,x=type),fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Calories")
# barplot of the total fat percent daily value for each type of chicken
total.fat.percentage<-chicken[,7]
p2<-ggplot(chicken)+geom_bar(aes(y=total.fat.percentage,x=type),fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Total Fat (% Daily Value)")
# barplot of the cholesterol percent daily value for each type of chicken
cholesterol.percentage<-chicken[,12]
p3<-ggplot(chicken)+geom_bar(aes(y=cholesterol.percentage,x=type),fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Cholesterol (% Daily Value)")
# barplot of the sodium percent daily value for each type of chicken
sodium.percentage<-chicken[,14]
p4<-ggplot(chicken)+geom_bar(aes(y=sodium.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Sodium (% Daily Value)")
# barplot of the carbohydrates percent daily value for each type of chicken
carbohydrates.percentage<-chicken[,16]
p5<-ggplot(chicken)+geom_bar(aes(y=carbohydrates.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Carbohydrates (% Daily Value)")
# barplot of the dietary fiber percent daily value for each type of chicken
dietary.fiber.percentage<-chicken[,18]
p6<-ggplot(chicken)+geom_bar(aes(y=dietary.fiber.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Dietary Fiber (% Daily Value)")
# barplot of the vitamin A percent daily value for each type of chicken
vitamin.A.percentage<-chicken[,21]
p7<-ggplot(chicken)+geom_bar(aes(y=vitamin.A.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Vitamin A (% Daily Value)")
# barplot of the vitamin C percent daily value for each type of chicken
vitamin.C.percentage<-chicken[,22]
p8<-ggplot(chicken)+geom_bar(aes(y=vitamin.C.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Vitamin C (% Daily Value)")
# barplot of the calcium percent daily value for each type of chicken
calcium.percentage<-chicken[,23]
p9<-ggplot(chicken)+geom_bar(aes(y=calcium.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Calcium (% Daily Value)")
# barplot of the iron percent daily value for each type of chicken
iron.percentage<-chicken[,24]
p10<-ggplot(chicken)+geom_bar(aes(y=iron.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Iron (% Daily Value)")
grid.arrange(p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,nrow=5)
From the bar plots, it seems that grilled chicken has lower calories, total fat, sodium and carbohydrates, and higher dietary fiber, vitamin A and vitamin C, than the same menu item with crispy chicken in it.
The only aspect in which crispy chicken may be potentially “healthier” is that it has lower mean cholesterol percent daily value than grilled chicken.
Lastly, there appears to be no noticeable difference in calcium and iron between grilled chicken and crispy chicken.
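The bar plots only compare group means, so they cannot say whether a difference is statistically significant. One common way to check is a Welch two-sample t-test, sketched below. This is an assumption about how such a check could be done, not a result from this analysis: the `toy` data frame holds placeholder numbers, not values from the McDonald’s data.

```r
# Placeholder values for illustration only; NOT taken from the McDonald's data
toy <- data.frame(
  type     = rep(c("crispy", "grilled"), each = 5),
  Calories = c(510, 540, 610, 670, 750, 350, 380, 420, 450, 520)
)

# Welch two-sample t-test: does mean Calories differ between the two types?
welch <- t.test(Calories ~ type, data = toy)
print(welch$p.value)
```

With the real data, the same call would be `t.test(Calories ~ type, data = chicken)`. Because most crispy items have a grilled counterpart, a paired test on matched items might be the more appropriate choice, but that requires matching the item names first.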
#Filter Calories less than 150
zeroCaloriesItem<- newMenu %>%
group_by(Item,Calories) %>%
filter(Calories < 150) %>%
tally(sort = TRUE) %>%
arrange(desc(Calories))
zeroCaloriesItem %>% print(n=Inf)
## # A tibble: 41 x 3
## # Groups: Item [41]
## Item Calories n
## <chr> <int> <int>
## 1 Coca-Cola Classic (Small) 140 1
## 2 Dr Pepper (Small) 140 1
## 3 Nonfat Latte with Sugar Free French Vanilla Syrup (Small) 140 1
## 4 Premium Bacon Ranch Salad (without Chicken) 140 1
## 5 Premium Southwest Salad (without Chicken) 140 1
## 6 Regular Iced Coffee (Small) 140 1
## 7 Sprite (Small) 140 1
## 8 Caramel Iced Coffee (Small) 130 1
## 9 Fat Free Chocolate Milk Jug 130 1
## 10 Hazelnut Iced Coffee (Small) 130 1
## 11 Nonfat Latte (Medium) 130 1
## 12 French Vanilla Iced Coffee (Small) 120 1
## 13 Iced Coffee with Sugar Free French Vanilla Syrup (Medium) 120 1
## 14 Kids French Fries 110 1
## 15 Sweet Tea (Child) 110 1
## 16 1% Low Fat Milk Jug 100 1
## 17 Coca-Cola Classic (Child) 100 1
## 18 Dr Pepper (Child) 100 1
## 19 Nonfat Latte (Small) 100 1
## 20 Sprite (Child) 100 1
## 21 Iced Coffee with Sugar Free French Vanilla Syrup (Small) 80 1
## 22 Minute Maid 100% Apple Juice Box 80 1
## 23 Kids Ice Cream Cone 45 1
## 24 Side Salad 20 1
## 25 Apple Slices 15 1
## 26 Coffee (Large) 0 1
## 27 Coffee (Medium) 0 1
## 28 Coffee (Small) 0 1
## 29 Dasani Water Bottle 0 1
## 30 Diet Coke (Child) 0 1
## 31 Diet Coke (Large) 0 1
## 32 Diet Coke (Medium) 0 1
## 33 Diet Coke (Small) 0 1
## 34 Diet Dr Pepper (Child) 0 1
## 35 Diet Dr Pepper (Large) 0 1
## 36 Diet Dr Pepper (Medium) 0 1
## 37 Diet Dr Pepper (Small) 0 1
## 38 Iced Tea (Child) 0 1
## 39 Iced Tea (Large) 0 1
## 40 Iced Tea (Medium) 0 1
## 41 Iced Tea (Small) 0 1
For the regression problem, we asked whether there is a relationship between menu items, food categories and serving sizes. From the distribution graph, we found a linear increase in calories with serving size for food but not for drinks, where the spread of caloric values is simply too large.
For the classification problem, we asked whether grilled chicken or crispy chicken is the healthier choice. We can conclude that grilled chicken has lower calories, total fat and carbohydrates overall, and is richer in vitamins. However, we found that crispy chicken has a lower mean cholesterol percent daily value than grilled chicken.
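The linear trend between serving size and calories noted above could be quantified with a simple linear regression. The sketch below assumes an ordinary least squares fit with `lm()`. The `toy` data frame contains placeholder numbers rather than values from the menu data, so the printed slope and R-squared are not findings of this analysis.

```r
# Placeholder values for illustration only; NOT taken from the McDonald's data
toy <- data.frame(
  Serving.Size = c(100, 150, 200, 250, 300, 350),
  Calories     = c(260, 390, 480, 640, 730, 900)
)

# Simple linear regression of calories on serving size
fit <- lm(Calories ~ Serving.Size, data = toy)
slope <- unname(coef(fit)["Serving.Size"])
r_squared <- summary(fit)$r.squared
print(c(slope = slope, r_squared = r_squared))
```

On the cleaned data, the food-only fit would be `lm(Calories ~ Serving.Size, data = subset(newMenu, Serving.Unit == "food/grams"))`, and repeating it with `"drinks/ml"` would show how much weaker the relationship is for drinks.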