1 Introduction

McDonald’s is the world’s largest restaurant chain by revenue, serving over 69 million customers daily in over 100 countries across 37,855 outlets as of 2018. McDonald’s food is widely regarded as unhealthy, but is that really true?

1.1 Dataset & Objective

The data I am using is the Nutrition Facts for McDonald’s Menu dataset from Kaggle, created in 2017. This dataset provides a nutrition analysis of every menu item on the US McDonald’s menu, including breakfast, beef burgers, chicken and fish sandwiches, fries, salads, soda, coffee and tea, milkshakes, and desserts.

In this R Markdown report, I want to explore the distribution of calories across menu items, food categories and serving sizes. I also want to check whether there is a meaningful nutritional difference between eating fried (crispy) chicken and grilled chicken at McDonald’s.

2 Data Preparation

# setting up libraries
library(tidyverse)#include ggplot2,dplyr, readr
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gridExtra) #arrange multiple graph on a page
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(gsubfn) #utilities for strings and function arguments
## Loading required package: proto
#importing the data
menu<-read.csv("C:/Users/TM37075/Documents/UM Master of Data Science/WQD7004 Programming in Data Science/menu.csv",header=TRUE,sep=',')

#Dimension of the Data
dim(menu)
## [1] 260  24
#Structure of the Data
str(menu)
## 'data.frame':    260 obs. of  24 variables:
##  $ Category                     : chr  "Breakfast" "Breakfast" "Breakfast" "Breakfast" ...
##  $ Item                         : chr  "Egg McMuffin" "Egg White Delight" "Sausage McMuffin" "Sausage McMuffin with Egg" ...
##  $ Serving.Size                 : chr  "4.8 oz (136 g)" "4.8 oz (135 g)" "3.9 oz (111 g)" "5.7 oz (161 g)" ...
##  $ Calories                     : int  300 250 370 450 400 430 460 520 410 470 ...
##  $ Calories.from.Fat            : int  120 70 200 250 210 210 230 270 180 220 ...
##  $ Total.Fat                    : num  13 8 23 28 23 23 26 30 20 25 ...
##  $ Total.Fat....Daily.Value.    : int  20 12 35 43 35 36 40 47 32 38 ...
##  $ Saturated.Fat                : num  5 3 8 10 8 9 13 14 11 12 ...
##  $ Saturated.Fat....Daily.Value.: int  25 15 42 52 42 46 65 68 56 59 ...
##  $ Trans.Fat                    : num  0 0 0 0 0 1 0 0 0 0 ...
##  $ Cholesterol                  : int  260 25 45 285 50 300 250 250 35 35 ...
##  $ Cholesterol....Daily.Value.  : int  87 8 15 95 16 100 83 83 11 11 ...
##  $ Sodium                       : int  750 770 780 860 880 960 1300 1410 1300 1420 ...
##  $ Sodium....Daily.Value.       : int  31 32 33 36 37 40 54 59 54 59 ...
##  $ Carbohydrates                : int  31 30 29 30 30 31 38 43 36 42 ...
##  $ Carbohydrates....Daily.Value.: int  10 10 10 10 10 10 13 14 12 14 ...
##  $ Dietary.Fiber                : int  4 4 4 4 4 4 2 3 2 3 ...
##  $ Dietary.Fiber....Daily.Value.: int  17 17 17 17 17 18 7 12 7 12 ...
##  $ Sugars                       : int  3 3 2 2 2 3 3 4 3 4 ...
##  $ Protein                      : int  17 18 14 21 21 26 19 19 20 20 ...
##  $ Vitamin.A....Daily.Value.    : int  10 6 8 15 6 15 10 15 2 6 ...
##  $ Vitamin.C....Daily.Value.    : int  0 0 0 0 0 2 8 8 8 8 ...
##  $ Calcium....Daily.Value.      : int  25 25 25 30 25 30 15 20 15 15 ...
##  $ Iron....Daily.Value.         : int  15 8 10 15 10 20 15 20 10 15 ...
#Summary of the data
summary(menu)
##    Category             Item           Serving.Size          Calories     
##  Length:260         Length:260         Length:260         Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.: 210.0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 340.0  
##                                                           Mean   : 368.3  
##                                                           3rd Qu.: 500.0  
##                                                           Max.   :1880.0  
##  Calories.from.Fat   Total.Fat       Total.Fat....Daily.Value. Saturated.Fat   
##  Min.   :   0.0    Min.   :  0.000   Min.   :  0.00            Min.   : 0.000  
##  1st Qu.:  20.0    1st Qu.:  2.375   1st Qu.:  3.75            1st Qu.: 1.000  
##  Median : 100.0    Median : 11.000   Median : 17.00            Median : 5.000  
##  Mean   : 127.1    Mean   : 14.165   Mean   : 21.82            Mean   : 6.008  
##  3rd Qu.: 200.0    3rd Qu.: 22.250   3rd Qu.: 35.00            3rd Qu.:10.000  
##  Max.   :1060.0    Max.   :118.000   Max.   :182.00            Max.   :20.000  
##  Saturated.Fat....Daily.Value.   Trans.Fat       Cholesterol    
##  Min.   :  0.00                Min.   :0.0000   Min.   :  0.00  
##  1st Qu.:  4.75                1st Qu.:0.0000   1st Qu.:  5.00  
##  Median : 24.00                Median :0.0000   Median : 35.00  
##  Mean   : 29.97                Mean   :0.2038   Mean   : 54.94  
##  3rd Qu.: 48.00                3rd Qu.:0.0000   3rd Qu.: 65.00  
##  Max.   :102.00                Max.   :2.5000   Max.   :575.00  
##  Cholesterol....Daily.Value.     Sodium       Sodium....Daily.Value.
##  Min.   :  0.00              Min.   :   0.0   Min.   :  0.00        
##  1st Qu.:  2.00              1st Qu.: 107.5   1st Qu.:  4.75        
##  Median : 11.00              Median : 190.0   Median :  8.00        
##  Mean   : 18.39              Mean   : 495.8   Mean   : 20.68        
##  3rd Qu.: 21.25              3rd Qu.: 865.0   3rd Qu.: 36.25        
##  Max.   :192.00              Max.   :3600.0   Max.   :150.00        
##  Carbohydrates    Carbohydrates....Daily.Value. Dietary.Fiber  
##  Min.   :  0.00   Min.   : 0.00                 Min.   :0.000  
##  1st Qu.: 30.00   1st Qu.:10.00                 1st Qu.:0.000  
##  Median : 44.00   Median :15.00                 Median :1.000  
##  Mean   : 47.35   Mean   :15.78                 Mean   :1.631  
##  3rd Qu.: 60.00   3rd Qu.:20.00                 3rd Qu.:3.000  
##  Max.   :141.00   Max.   :47.00                 Max.   :7.000  
##  Dietary.Fiber....Daily.Value.     Sugars          Protein     
##  Min.   : 0.000                Min.   :  0.00   Min.   : 0.00  
##  1st Qu.: 0.000                1st Qu.:  5.75   1st Qu.: 4.00  
##  Median : 5.000                Median : 17.50   Median :12.00  
##  Mean   : 6.531                Mean   : 29.42   Mean   :13.34  
##  3rd Qu.:10.000                3rd Qu.: 48.00   3rd Qu.:19.00  
##  Max.   :28.000                Max.   :128.00   Max.   :87.00  
##  Vitamin.A....Daily.Value. Vitamin.C....Daily.Value. Calcium....Daily.Value.
##  Min.   :  0.00            Min.   :  0.000           Min.   : 0.00          
##  1st Qu.:  2.00            1st Qu.:  0.000           1st Qu.: 6.00          
##  Median :  8.00            Median :  0.000           Median :20.00          
##  Mean   : 13.43            Mean   :  8.535           Mean   :20.97          
##  3rd Qu.: 15.00            3rd Qu.:  4.000           3rd Qu.:30.00          
##  Max.   :170.00            Max.   :240.000           Max.   :70.00          
##  Iron....Daily.Value.
##  Min.   : 0.000      
##  1st Qu.: 0.000      
##  Median : 4.000      
##  Mean   : 7.735      
##  3rd Qu.:15.000      
##  Max.   :40.000
#First 3 rows of the Data
head(menu,3)
##    Category              Item   Serving.Size Calories Calories.from.Fat
## 1 Breakfast      Egg McMuffin 4.8 oz (136 g)      300               120
## 2 Breakfast Egg White Delight 4.8 oz (135 g)      250                70
## 3 Breakfast  Sausage McMuffin 3.9 oz (111 g)      370               200
##   Total.Fat Total.Fat....Daily.Value. Saturated.Fat
## 1        13                        20             5
## 2         8                        12             3
## 3        23                        35             8
##   Saturated.Fat....Daily.Value. Trans.Fat Cholesterol
## 1                            25         0         260
## 2                            15         0          25
## 3                            42         0          45
##   Cholesterol....Daily.Value. Sodium Sodium....Daily.Value. Carbohydrates
## 1                          87    750                     31            31
## 2                           8    770                     32            30
## 3                          15    780                     33            29
##   Carbohydrates....Daily.Value. Dietary.Fiber Dietary.Fiber....Daily.Value.
## 1                            10             4                            17
## 2                            10             4                            17
## 3                            10             4                            17
##   Sugars Protein Vitamin.A....Daily.Value. Vitamin.C....Daily.Value.
## 1      3      17                        10                         0
## 2      3      18                         6                         0
## 3      2      14                         8                         0
##   Calcium....Daily.Value. Iron....Daily.Value.
## 1                      25                   15
## 2                      25                    8
## 3                      25                   10
#Category of the McDonald's menu
unique(menu[,1])
## [1] "Breakfast"          "Beef & Pork"        "Chicken & Fish"    
## [4] "Salads"             "Snacks & Sides"     "Desserts"          
## [7] "Beverages"          "Coffee & Tea"       "Smoothies & Shakes"
#Check whether any column has missing values
colSums(is.na(menu))
##                      Category                          Item 
##                             0                             0 
##                  Serving.Size                      Calories 
##                             0                             0 
##             Calories.from.Fat                     Total.Fat 
##                             0                             0 
##     Total.Fat....Daily.Value.                 Saturated.Fat 
##                             0                             0 
## Saturated.Fat....Daily.Value.                     Trans.Fat 
##                             0                             0 
##                   Cholesterol   Cholesterol....Daily.Value. 
##                             0                             0 
##                        Sodium        Sodium....Daily.Value. 
##                             0                             0 
##                 Carbohydrates Carbohydrates....Daily.Value. 
##                             0                             0 
##                 Dietary.Fiber Dietary.Fiber....Daily.Value. 
##                             0                             0 
##                        Sugars                       Protein 
##                             0                             0 
##     Vitamin.A....Daily.Value.     Vitamin.C....Daily.Value. 
##                             0                             0 
##       Calcium....Daily.Value.          Iron....Daily.Value. 
##                             0                             0

The data contains 260 McDonald’s menu items and 24 columns. The menu is divided into 9 categories: Beef & Pork, Beverages, Breakfast, Chicken & Fish, Coffee & Tea, Desserts, Salads, Smoothies & Shakes, Snacks & Sides.

At first glance the data looks tidy, as it doesn’t contain any missing values. However, one of the objectives is to look at the relationship between serving size and calories, and the serving size variable is encoded as a character string (chr) in the data set. It will be converted into a single numeric variable, expressed in grams for food and milliliters for drinks.

3 Data Cleaning

#select drinks data that contain the "fl oz" or "carton" string
#Convert fl oz to milliliters (1 fl oz = 29.5735 ml)
#In the raw data, cartons appear as "1 carton (236 ml)"; we want to separate the number from the string

drink.oz<-menu[str_detect(menu$Serving.Size," fl oz.*"),]

drink.carton<-menu[str_detect(menu$Serving.Size,"carton"),]

drink.oz$Serving.Size <- round(as.numeric(gsub(" fl oz.*","",drink.oz$Serving.Size))*29.5735,0)

drink.carton$Serving.Size<-round(as.numeric(gsub(".*\\(([0-9.]+) ml\\).*","\\1",drink.carton$Serving.Size)),0)
#Note: \\( and \\) match literal parentheses; the unescaped ( ) enclose the capture group, and \\1 refers to the first captured group

#select food data that contain the " g)" string and separate the numbers

food.g<-menu[str_detect(menu$Serving.Size," g\\)"),]

food.g$Serving.Size<-round(as.numeric(gsub(".*\\(([0-9.]+) g\\).*","\\1",food.g$Serving.Size)),0)

#combine 3 above data frames by rbind() into new data frame
#create a new column with type and unit

drink.oz$Serving.Unit<-rep("drinks/ml",nrow(drink.oz))

drink.carton$Serving.Unit<-rep("drinks/ml",nrow(drink.carton))

food.g$Serving.Unit<-rep("food/grams",nrow(food.g))

newMenu<-rbind(drink.oz,drink.carton,food.g)

#Verify data on new data frame
dim(newMenu)
## [1] 260  25

For data cleaning, the serving size variable in the original data is a string in one of several formats: "fl oz cup", "oz (.. g)", "carton (.. ml)" and "cookie (.. g)". I use a few R functions to extract the numeric value from these strings:

  1. str_detect(string,pattern) to select data with specific string

  2. as.numeric() to convert column into numeric value

  3. gsub(pattern,replacement,stringvector) to replace all matches of a string

After that, I combine those data frames into a new one and verify its dimensions. The data is now ready for analysis.
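The extraction steps above can be sketched on a handful of strings. This is a minimal illustration rather than the exact code used in this report: `sizes` is a made-up sample in the same formats as Serving.Size, and `parse_serving()` is a hypothetical helper that uses only base R. Unlike the cleaning code above, it reads the milliliter value directly whenever one is given in parentheses.

```r
# Hypothetical sample strings in the same formats as Serving.Size
sizes <- c("4.8 oz (136 g)", "1 cookie (33 g)", "16 fl oz cup", "1 carton (236 ml)")

# Hypothetical helper: returns grams for food and milliliters for drinks
parse_serving <- function(x) {
  out <- rep(NA_real_, length(x))
  is_g <- grepl(" g\\)", x)                # food: "... (136 g)"
  is_ml <- grepl(" ml\\)", x)              # drinks with ml in parentheses
  is_floz <- grepl(" fl oz", x) & !is_ml   # drinks given only in fl oz

  # \\( and \\) match literal parentheses; ( ) captures the number
  out[is_g] <- as.numeric(sub(".*\\(([0-9.]+) g\\).*", "\\1", x[is_g]))
  out[is_ml] <- as.numeric(sub(".*\\(([0-9.]+) ml\\).*", "\\1", x[is_ml]))

  # 1 fl oz = 29.5735 ml
  out[is_floz] <- round(as.numeric(sub(" fl oz.*", "", x[is_floz])) * 29.5735)
  out
}

parsed <- parse_serving(sizes)
print(parsed)
```

Any string that matches none of the three formats stays NA, which makes unexpected formats easy to spot with `sum(is.na(parsed))`.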

4 Data Analysis

4.1 Distribution of Calories across menu items, food categories and serving sizes

a) Box plot of Calories versus Category. There are a few outliers in the Breakfast, Chicken & Fish, and Coffee & Tea categories. One item seems to have a very high calorie count, so I am going to check whether it is an error in the data.

The highest-calorie item in the Chicken & Fish category is the 40-piece Chicken McNuggets (1880 calories!). Hence, in this case, the outlier is not an error. My assumption is that these nuggets are meant to be shared among several people.

boxplot(newMenu$Calories~newMenu$Category,las=2,col="aquamarine",xlab="Category",ylab="Calories",cex.axis=.5)

calories<-newMenu %>%
  group_by(Item,Calories) %>%
  filter(Category=="Chicken & Fish") %>%
  tally(sort = TRUE) %>%
  arrange(desc(Calories))

calories
## # A tibble: 27 x 3
## # Groups:   Item [27]
##    Item                                                Calories     n
##    <chr>                                                  <int> <int>
##  1 Chicken McNuggets (40 piece)                            1880     1
##  2 Chicken McNuggets (20 piece)                             940     1
##  3 Bacon Clubhouse Crispy Chicken Sandwich                  750     1
##  4 Premium Crispy Chicken Club Sandwich                     670     1
##  5 Premium McWrap Southwest Chicken (Crispy Chicken)        670     1
##  6 Premium McWrap Chicken & Bacon (Crispy Chicken)          630     1
##  7 Premium Crispy Chicken Ranch BLT Sandwich                610     1
##  8 Premium McWrap Chicken & Ranch (Crispy Chicken)          610     1
##  9 Bacon Clubhouse Grilled Chicken Sandwich                 590     1
## 10 Premium McWrap Chicken Sweet Chili (Crispy Chicken)      540     1
## # ... with 17 more rows
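The outlier check above can also be done programmatically instead of reading it off the box plot. The sketch below is an illustration on made-up data: `toy` and `flag_outliers()` are hypothetical names, and the rule is the same 1.5 × IQR rule that `boxplot()` uses, obtained from `boxplot.stats()`.

```r
# Made-up data: category A contains one extreme value, category B has none
toy <- data.frame(
  Category = c(rep("A", 6), rep("B", 5)),
  Item = paste0("item", 1:11),
  Calories = c(300, 320, 310, 330, 305, 1880, 100, 110, 105, 120, 115)
)

# Hypothetical helper: rows whose Calories lie outside the box plot whiskers
# of their own category
flag_outliers <- function(df) {
  per_category <- lapply(split(df, df$Category), function(g) {
    g[g$Calories %in% boxplot.stats(g$Calories)$out, ]
  })
  do.call(rbind, per_category)
}

outliers <- flag_outliers(toy)
print(outliers)
```

With the real data, the same helper could be applied to `newMenu`, since it only relies on the Category and Calories columns.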

b) Scatter plots of Calories against Serving Size by type of menu item. The regression line shows a clear linear increase in Calories with Serving Size for food, but no such clear effect is visible for drinks.

plot(food.g$Serving.Size,food.g$Calories,pch=16,cex=1.3,col="blue",main="Calories against Food Serving.Size",xlab="Serving.Size",ylab="Calories")

reg<-lm(food.g$Calories~food.g$Serving.Size)
abline(reg)

drinkCategory<-rbind(drink.oz,drink.carton)

plot(drinkCategory$Serving.Size,drinkCategory$Calories,pch=16,cex=1.3,col="blue",main="Calories against Drinks Serving.Size",xlab="Serving.Size",ylab="Calories")

reg1<-lm(drinkCategory$Calories~drinkCategory$Serving.Size)
abline(reg1)
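The visual impression from the two regression lines can be backed by numbers such as the slope and R². The following sketch uses made-up values, not the menu data: `toy_food`, `toy_drinks` and `fit_summary()` are hypothetical names chosen for illustration. Passing `data =` to `lm()` also keeps the coefficient names short.

```r
# Made-up data: food calories rise steadily with size, drink calories do not
toy_food <- data.frame(
  Serving.Size = c(100, 150, 200, 250, 300),
  Calories = c(250, 370, 500, 610, 760)
)
toy_drinks <- data.frame(
  Serving.Size = c(350, 350, 500, 500, 650, 650),
  Calories = c(0, 150, 0, 210, 0, 280)
)

# Hypothetical helper: slope and R-squared of Calories ~ Serving.Size
fit_summary <- function(df) {
  fit <- lm(Calories ~ Serving.Size, data = df)
  c(slope = unname(coef(fit)["Serving.Size"]),
    r.squared = summary(fit)$r.squared)
}

food_fit <- fit_summary(toy_food)
drink_fit <- fit_summary(toy_drinks)
print(rbind(food = food_fit, drinks = drink_fit))
```

A high R² for food together with a low R² for drinks would match what the scatter plots suggest; the numbers printed here only describe the toy data.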

c) Scatter plots of Calories against Serving Size across food categories, with colour indicating drinks or food. Drinks show a huge spread of caloric values for similar serving sizes.

ggplot(newMenu,aes(y=Calories,x=Serving.Size,color=Serving.Unit))+facet_wrap(~Category)+geom_jitter(size=0.5)

4.2 Crispy Chicken vs Grilled Chicken

Several categories offer a choice between crispy and grilled chicken. My assumption is that the crispy option is slightly unhealthier, since crispy chicken has to be deep fried in oil, but is that really true?

#Extract all items that contain the "Crispy" or "Grilled" string

crispy<-newMenu[str_detect(newMenu$Item,"Crispy"),]

grilled<-newMenu[str_detect(newMenu$Item,"Grilled"),]

#Add a new column "type" and combine the two data frames into one
crispy$type<-rep("crispy",nrow(crispy))

grilled$type<-rep("grilled",nrow(grilled))

chicken<-rbind(crispy,grilled)

# barplot of the calories for each type of chicken
p1<-ggplot(chicken)+geom_bar(aes(y=Calories,x=type),fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Calories")

# barplot of the total fat percent daily value for each type of chicken
total.fat.percentage<-chicken[,7]

p2<-ggplot(chicken)+geom_bar(aes(y=total.fat.percentage,x=type),fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Total Fat (% Daily Value)")

# barplot of the cholesterol percent daily value for each type of chicken
cholesterol.percentage<-chicken[,12]

p3<-ggplot(chicken)+geom_bar(aes(y=cholesterol.percentage,x=type),fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Cholesterol (% Daily Value)")

# barplot of the sodium percent daily value for each type of chicken
sodium.percentage<-chicken[,14]

p4<-ggplot(chicken)+geom_bar(aes(y=sodium.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Sodium (% Daily Value)")

# barplot of the carbohydrates percent daily value for each type of chicken
carbohydrates.percentage<-chicken[,16]

p5<-ggplot(chicken)+geom_bar(aes(y=carbohydrates.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Carbohydrates (% Daily Value)")

# barplot of the dietary fiber percent daily value for each type of chicken
dietary.fiber.percentage<-chicken[,18]

p6<-ggplot(chicken)+geom_bar(aes(y=dietary.fiber.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Dietary Fiber (% Daily Value)")

# barplot of the vitamin A percent daily value for each type of chicken
vitamin.A.percentage<-chicken[,21]

p7<-ggplot(chicken)+geom_bar(aes(y=vitamin.A.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Vitamin A (% Daily Value)")

# barplot of the vitamin C percent daily value for each type of chicken
vitamin.C.percentage<-chicken[,22]

p8<-ggplot(chicken)+geom_bar(aes(y=vitamin.C.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Vitamin C (% Daily Value)")

# barplot of the calcium percent daily value for each type of chicken
calcium.percentage<-chicken[,23]

p9<-ggplot(chicken)+geom_bar(aes(y=calcium.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Calcium (% Daily Value)")

# barplot of the iron percent daily value for each type of chicken
iron.percentage<-chicken[,24]

p10<-ggplot(chicken)+geom_bar(aes(y=iron.percentage,x=type),size=5,fill = "lightblue",stat = "summary",fun="mean",width=0.2)+coord_flip()+ggtitle("Iron (% Daily Value)")

grid.arrange(p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,nrow=5)
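The ten near-identical plots above could also be produced in one step by reshaping the data to long format and faceting by nutrient. The sketch below is an alternative under stated assumptions, not the code used for the figure: `toy_chicken` holds made-up values, only three columns are included, and `pivot_longer()` (tidyr 1.0.0 or later) and the `.groups` argument of `summarise()` (dplyr 1.0.0 or later) are assumed to be available.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up values for illustration; the real data frame is `chicken`
toy_chicken <- data.frame(
  type = c("crispy", "crispy", "grilled", "grilled"),
  Calories = c(600, 500, 450, 350),
  Sodium = c(1300, 1100, 1200, 1000),
  Protein = c(30, 26, 34, 30)
)

# One row per (type, nutrient) pair with the mean value
means <- toy_chicken %>%
  pivot_longer(-type, names_to = "nutrient", values_to = "mean_input") %>%
  group_by(type, nutrient) %>%
  summarise(mean_value = mean(mean_input), .groups = "drop")

# One horizontal bar chart per nutrient, each with its own value axis
p <- ggplot(means, aes(x = mean_value, y = type)) +
  geom_col(fill = "lightblue", width = 0.4) +
  facet_wrap(~nutrient, scales = "free_x")

print(means)
```

Selecting columns by name in this way also avoids relying on positions such as `chicken[,7]`, which silently break if the column order changes.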

From the bar plots, it seems that grilled chicken has lower calories, total fat, sodium and carbohydrates, and higher dietary fiber, vitamin A and vitamin C, than the same menu item made with crispy chicken.

The only aspect in which crispy chicken may be potentially “healthier” is that it has lower mean cholesterol percent daily value than grilled chicken.

Lastly, there appears to be no noticeable difference in calcium and iron between grilled chicken and crispy chicken in the bar plots.
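The comparison above rests on mean values in bar plots only; no formal test was run in this report. A minimal sketch of how such a test could look is given below, using Welch's two-sample t-test from base R. The calorie values in `toy` are made up for illustration and are not taken from the menu data.

```r
# Made-up calorie values; with the real data, `chicken` would be used instead
toy <- data.frame(
  type = rep(c("crispy", "grilled"), each = 6),
  Calories = c(700, 650, 620, 600, 560, 520,
               560, 500, 480, 450, 420, 380)
)

# Welch two-sample t-test (the default of t.test, no equal-variance assumption)
res <- t.test(Calories ~ type, data = toy)

print(res$estimate)  # group means
print(res$p.value)   # p-value for the difference in means
```

Because most crispy items have a grilled counterpart with the same name, a paired test on matched items might be the more appropriate design; that would require first pairing the items by name, which is not shown here.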

5 Conclusion

  1. In response to changing consumer tastes and a backlash over the unhealthiness of its food, McDonald’s has added salads, zero-calorie drinks such as Diet Coke and iced tea, smoothies, and fruit to its menu. The table below lists all items with fewer than 150 calories.
#Filter items with fewer than 150 calories
lowCaloriesItem<- newMenu %>%
  group_by(Item,Calories) %>%
  filter(Calories < 150) %>%
  tally(sort = TRUE) %>%
  arrange(desc(Calories))

lowCaloriesItem %>% print(n=Inf)
## # A tibble: 41 x 3
## # Groups:   Item [41]
##    Item                                                      Calories     n
##    <chr>                                                        <int> <int>
##  1 Coca-Cola Classic (Small)                                      140     1
##  2 Dr Pepper (Small)                                              140     1
##  3 Nonfat Latte with Sugar Free French Vanilla Syrup (Small)      140     1
##  4 Premium Bacon Ranch Salad (without Chicken)                    140     1
##  5 Premium Southwest Salad (without Chicken)                      140     1
##  6 Regular Iced Coffee (Small)                                    140     1
##  7 Sprite (Small)                                                 140     1
##  8 Caramel Iced Coffee (Small)                                    130     1
##  9 Fat Free Chocolate Milk Jug                                    130     1
## 10 Hazelnut Iced Coffee (Small)                                   130     1
## 11 Nonfat Latte (Medium)                                          130     1
## 12 French Vanilla Iced Coffee (Small)                             120     1
## 13 Iced Coffee with Sugar Free French Vanilla Syrup (Medium)      120     1
## 14 Kids French Fries                                              110     1
## 15 Sweet Tea (Child)                                              110     1
## 16 1% Low Fat Milk Jug                                            100     1
## 17 Coca-Cola Classic (Child)                                      100     1
## 18 Dr Pepper (Child)                                              100     1
## 19 Nonfat Latte (Small)                                           100     1
## 20 Sprite (Child)                                                 100     1
## 21 Iced Coffee with Sugar Free French Vanilla Syrup (Small)        80     1
## 22 Minute Maid 100% Apple Juice Box                                80     1
## 23 Kids Ice Cream Cone                                             45     1
## 24 Side Salad                                                      20     1
## 25 Apple Slices                                                    15     1
## 26 Coffee (Large)                                                   0     1
## 27 Coffee (Medium)                                                  0     1
## 28 Coffee (Small)                                                   0     1
## 29 Dasani Water Bottle                                              0     1
## 30 Diet Coke (Child)                                                0     1
## 31 Diet Coke (Large)                                                0     1
## 32 Diet Coke (Medium)                                               0     1
## 33 Diet Coke (Small)                                                0     1
## 34 Diet Dr Pepper (Child)                                           0     1
## 35 Diet Dr Pepper (Large)                                           0     1
## 36 Diet Dr Pepper (Medium)                                          0     1
## 37 Diet Dr Pepper (Small)                                           0     1
## 38 Iced Tea (Child)                                                 0     1
## 39 Iced Tea (Large)                                                 0     1
## 40 Iced Tea (Medium)                                                0     1
## 41 Iced Tea (Small)                                                 0     1
  2. On the relationship between calories, food categories and serving sizes: the scatter plots show a linear increase in Calories with Serving Size for food, but not for drinks, where the spread of caloric values at similar serving sizes is too large.

  3. On whether grilled or crispy chicken is the healthier choice: grilled chicken has lower calories, total fat, sodium and carbohydrates on average, and is richer in dietary fiber and vitamins A and C. However, crispy chicken has a lower mean cholesterol percent daily value than grilled chicken.

5.1 References

  1. Dataset: https://www.kaggle.com/mcdonalds/nutrition-facts
  2. Information about McDonald’s: https://en.wikipedia.org/wiki/McDonald