1. Introduction

This dataset provides a nutrition analysis of every item on the Indian McDonald’s menu, including breakfast, burgers, fries, salads, soda, coffee and tea, milkshakes, and desserts. The dataset comes from Kaggle, where it is available for download.

2. Reading and Extracting Data

2.1. Read Data

Let’s start by reading in the data. There is only one CSV file, containing the menu items and several measurements for each item. {janitor::clean_names} helps us get clean column names quickly.

# load dplyr for the %>% pipe
library(dplyr)
# import dataset and standardise the column names
menu <- read.csv("India_Menu.csv") %>%
  janitor::clean_names()

2.2. Data Inspection

# show the first 5 rows
head(menu, 5)
# show the last 5 rows
tail(menu, 5)
# dimensions of the data
dim(menu)
#> [1] 141  13
# column names
names(menu)
#>  [1] "menu_category"        "menu_items"           "per_serve_size"      
#>  [4] "energy_k_cal"         "protein_g"            "total_fat_g"         
#>  [7] "sat_fat_g"            "trans_fat_g"          "cholesterols_mg"     
#> [10] "total_carbohydrate_g" "total_sugars_g"       "added_sugars_g"      
#> [13] "sodium_mg"

From our inspection we can conclude that:

  • The dataset consists of 141 rows and 13 columns.
  • The column names are “menu_category”, “menu_items”, “per_serve_size”, “energy_k_cal”, “protein_g”, “total_fat_g”, “sat_fat_g”, “trans_fat_g”, “cholesterols_mg”, “total_carbohydrate_g”, “total_sugars_g”, “added_sugars_g”, and “sodium_mg”.

2.3. Data Cleansing & Coercion

Check the data type of each column

str(menu)
#> 'data.frame':    141 obs. of  13 variables:
#>  $ menu_category       : chr  "Regular Menu" "Regular Menu" "Regular Menu" "Regular Menu" ...
#>  $ menu_items          : chr  "McVeggie™ Burger" "McAloo Tikki Burger®" "McSpicy™ Paneer Burger" "Spicy Paneer Wrap" ...
#>  $ per_serve_size      : chr  "168 g" "146 g" "199 g" "250 g" ...
#>  $ energy_k_cal        : num  402 340 653 675 512 ...
#>  $ protein_g           : num  10.2 8.5 20.3 21 15.3 ...
#>  $ total_fat_g         : num  13.8 11.3 39.5 39.1 23.4 ...
#>  $ sat_fat_g           : num  5.34 4.27 17.12 19.73 10.51 ...
#>  $ trans_fat_g         : num  0.16 0.2 0.18 0.26 0.17 0.28 0.24 0.09 0.16 0.21 ...
#>  $ cholesterols_mg     : num  2.49 1.47 21.85 40.93 25.24 ...
#>  $ total_carbohydrate_g: num  56.5 50.3 52.3 59.3 57 ...
#>  $ total_sugars_g      : num  7.9 7.05 8.35 3.5 7.85 ...
#>  $ added_sugars_g      : num  4.49 4.07 5.27 1.08 4.76 6.92 1.15 0.35 4.49 3.54 ...
#>  $ sodium_mg           : num  706 545 1075 1087 1051 ...

The result shows that some columns do not have the correct data type, so we need to convert them (data coercion).

  • Column menu_category should be converted to a factor.
  • Column per_serve_size should be converted to a number. However, its values are strings with a unit suffix (for example "168 g"), so the numeric part has to be extracted first.
# change data type
menu$menu_category <- as.factor(menu$menu_category)

# change column name
colnames(menu)[colnames(menu) == "per_serve_size"] <- "per_serve_size_g"

# extract the leading digits from "per_serve_size_g" (e.g. "168 g" -> 168) and convert to numeric
# "\\d+" requires at least one digit, whereas "\\d*" could also match an empty string
menu$per_serve_size_g <- as.numeric(stringr::str_extract(menu$per_serve_size_g, "\\d+"))

# check data types
str(menu)
#> 'data.frame':    141 obs. of  13 variables:
#>  $ menu_category       : Factor w/ 7 levels "Beverages Menu",..: 7 7 7 7 7 7 7 7 7 7 ...
#>  $ menu_items          : chr  "McVeggie™ Burger" "McAloo Tikki Burger®" "McSpicy™ Paneer Burger" "Spicy Paneer Wrap" ...
#>  $ per_serve_size_g    : num  168 146 199 250 177 306 132 87 173 136 ...
#>  $ energy_k_cal        : num  402 340 653 675 512 ...
#>  $ protein_g           : num  10.2 8.5 20.3 21 15.3 ...
#>  $ total_fat_g         : num  13.8 11.3 39.5 39.1 23.4 ...
#>  $ sat_fat_g           : num  5.34 4.27 17.12 19.73 10.51 ...
#>  $ trans_fat_g         : num  0.16 0.2 0.18 0.26 0.17 0.28 0.24 0.09 0.16 0.21 ...
#>  $ cholesterols_mg     : num  2.49 1.47 21.85 40.93 25.24 ...
#>  $ total_carbohydrate_g: num  56.5 50.3 52.3 59.3 57 ...
#>  $ total_sugars_g      : num  7.9 7.05 8.35 3.5 7.85 ...
#>  $ added_sugars_g      : num  4.49 4.07 5.27 1.08 4.76 6.92 1.15 0.35 4.49 3.54 ...
#>  $ sodium_mg           : num  706 545 1075 1087 1051 ...

Each column now has the desired data type.

Check for missing values
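To make the string handling more concrete, here is a minimal base R sketch of the same idea on a few toy strings. The helper name extract_number is hypothetical (it is not part of this analysis or of any package), and unlike the stringr call above it also keeps a decimal part, on the assumption that a serving size could be written like "5.5 g".

```r
# Hypothetical helper: pull the first number out of strings such as "168 g".
# Strings without any digits yield NA instead of raising an error.
extract_number <- function(x) {
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)   # position of the first number, -1 if none
  hit <- !is.na(m) & m != -1              # which elements actually matched
  out <- rep(NA_real_, length(x))
  out[hit] <- as.numeric(regmatches(x, m))
  out
}

# Toy input; the first two values appear in the str() output, the rest are made up.
sizes <- c("168 g", "146 g", "5.5 g", "n/a")
parsed_sizes <- extract_number(sizes)
print(parsed_sizes)
```

Returning NA for unparseable strings means that a later colSums(is.na(menu)) check would flag any serving size that failed to convert.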

colSums(is.na(menu))
#>        menu_category           menu_items     per_serve_size_g 
#>                    0                    0                    0 
#>         energy_k_cal            protein_g          total_fat_g 
#>                    0                    0                    0 
#>            sat_fat_g          trans_fat_g      cholesterols_mg 
#>                    0                    0                    0 
#> total_carbohydrate_g       total_sugars_g       added_sugars_g 
#>                    0                    0                    0 
#>            sodium_mg 
#>                    1

There is one missing value in the column sodium_mg.

Let’s treat this column by inspecting the dataset.

# find the row index of the missing value
which(is.na(menu$sodium_mg))
#> [1] 112
# subset the neighbouring rows to inspect them
menu[111:113, ]

Since row 112 holds almost the same menu item as row 113, we will fill the missing sodium_mg value in row 112 with the value from row 113.

We will handle this missing value with fill() from the tidyr package.

menu <- menu %>%
  tidyr::fill(sodium_mg, .direction = "up")

# check the dataset
menu[111:113, c("menu_category", "sodium_mg")]
colSums(is.na(menu))
#>        menu_category           menu_items     per_serve_size_g 
#>                    0                    0                    0 
#>         energy_k_cal            protein_g          total_fat_g 
#>                    0                    0                    0 
#>            sat_fat_g          trans_fat_g      cholesterols_mg 
#>                    0                    0                    0 
#> total_carbohydrate_g       total_sugars_g       added_sugars_g 
#>                    0                    0                    0 
#>            sodium_mg 
#>                    0

Checking the missing values again shows that the gap has been filled. Great, there are no missing values left.

The menu dataset is now ready to be processed and analyzed.
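For readers who want to see what filling "up" does, the following is a small base R sketch of the same mechanism. The function fill_up is a hypothetical stand-in written for illustration, not the tidyr implementation, and the sodium values are invented.

```r
# Hypothetical stand-in for tidyr::fill(.direction = "up"):
# each NA takes the value of the next non-missing entry below it.
fill_up <- function(x) {
  # walk from the second-to-last element towards the first
  for (i in rev(seq_along(x)[-length(x)])) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Invented sodium values with one gap, mimicking rows 111 to 113.
sodium <- c(120.5, NA, 98.2)
filled <- fill_up(sodium)
print(filled)
```

Note that this approach copies the neighbouring value no matter what the neighbouring row contains, so it is only sensible when, as here, the two rows describe nearly the same item.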

3. Data Explanation

Brief explanation

summary(menu)
#>          menu_category  menu_items        per_serve_size_g  energy_k_cal  
#>  Beverages Menu :17    Length:141         Min.   :  5      Min.   :  0.0  
#>  Breakfast Menu :15    Class :character   1st Qu.:125      1st Qu.:116.4  
#>  Condiments Menu: 9    Mode  :character   Median :212      Median :219.4  
#>  Desserts Menu  : 2                       Mean   :222      Mean   :244.6  
#>  Gourmet Menu   :11                       3rd Qu.:301      3rd Qu.:339.5  
#>  McCafe Menu    :51                       Max.   :544      Max.   :834.4  
#>  Regular Menu   :36                                                       
#>    protein_g       total_fat_g       sat_fat_g       trans_fat_g     
#>  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.0000  
#>  1st Qu.: 0.650   1st Qu.: 0.460   1st Qu.: 0.280   1st Qu.: 0.0600  
#>  Median : 4.790   Median : 7.770   Median : 4.270   Median : 0.1500  
#>  Mean   : 7.494   Mean   : 9.992   Mean   : 4.998   Mean   : 0.6872  
#>  3rd Qu.:10.880   3rd Qu.:14.160   3rd Qu.: 7.280   3rd Qu.: 0.2200  
#>  Max.   :39.470   Max.   :45.180   Max.   :20.460   Max.   :75.2600  
#>                                                                      
#>  cholesterols_mg  total_carbohydrate_g total_sugars_g  added_sugars_g 
#>  Min.   :  0.00   Min.   : 0.00        Min.   : 0.00   Min.   : 0.00  
#>  1st Qu.:  1.51   1st Qu.:15.74        1st Qu.: 2.33   1st Qu.: 0.00  
#>  Median :  8.39   Median :30.82        Median : 9.16   Median : 3.64  
#>  Mean   : 26.35   Mean   :31.19        Mean   :15.46   Mean   :10.34  
#>  3rd Qu.: 31.11   3rd Qu.:46.00        3rd Qu.:26.95   3rd Qu.:19.23  
#>  Max.   :302.61   Max.   :93.84        Max.   :64.22   Max.   :64.22  
#>                                                                       
#>    sodium_mg      
#>  Min.   :   0.00  
#>  1st Qu.:  44.53  
#>  Median : 153.15  
#>  Mean   : 367.80  
#>  3rd Qu.: 545.34  
#>  Max.   :2399.49  
#> 
  • Based on menu_category, we have 7 categories.
  • The McDonald’s menu in India has 141 items in total.
  • Energy ranges from 0 to 834.4 kcal per item.
  • Total fat ranges from 0 to 45.18 g.

The summary() function gives a concise statistical description of each column.

4. Data Manipulation, Transformation and Exploration

1. Which menu items have the lowest total fat, and which menu_category do they belong to?

menu[menu$total_fat_g == 0, ]

Answer: There are 16 menu items with 0 total fat, mostly from the Beverages Menu category, followed by Regular Menu and Condiments Menu.

2. What is the mean total fat for each category? Order the result from highest to lowest.
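The breakdown by category can also be counted directly rather than read off the printed rows. The sketch below shows one way to do that with table(); the data frame toy_menu and all of its values are invented for illustration and stand in for the real menu data.

```r
# Toy stand-in for `menu`; rows and values are invented for illustration only.
toy_menu <- data.frame(
  menu_category = c("Beverages Menu", "Beverages Menu", "Condiments Menu",
                    "Regular Menu", "Regular Menu"),
  total_fat_g   = c(0, 0, 0, 13.8, 0)
)

# keep only the zero-fat rows, then count them per category
zero_fat <- toy_menu[toy_menu$total_fat_g == 0, ]
zero_fat_counts <- sort(table(zero_fat$menu_category), decreasing = TRUE)
print(zero_fat_counts)
```

On the real data, where menu_category is a factor, table() would also list categories with a count of zero.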

# mean total fat per category (formula passed positionally so it works across R versions)
Total_Fat_mean <- aggregate(total_fat_g ~ menu_category, data = menu, FUN = mean)

Total_Fat_mean[order(Total_Fat_mean$total_fat_g, decreasing = TRUE), ]

Answer: Gourmet Menu has the highest mean total fat, while Beverages Menu has the lowest. Note, however, that Beverages Menu items carry little or no fat, so for that category we should look at total sugar instead.

3. What is the mean total sugar for each category? Order the result from highest to lowest.

# mean total sugar per category (formula passed positionally so it works across R versions)
Total_Sugar_mean <- aggregate(total_sugars_g ~ menu_category, data = menu, FUN = mean)

Total_Sugar_mean[order(Total_Sugar_mean$total_sugars_g, decreasing = TRUE), ]

Answer: In terms of mean total sugars, Beverages Menu ranks highest, followed by McCafe Menu and Desserts Menu in the top three, while Breakfast Menu ranks lowest.
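The group-then-sort pattern used in questions 2 and 3 can be illustrated on a toy example. The sketch below repeats the aggregate() and order() steps and shows a compact alternative based on tapply() and sort(); the data frame toy_menu and its sugar values are invented and do not come from the dataset.

```r
# Toy stand-in for `menu`; the sugar values are invented for illustration only.
toy_menu <- data.frame(
  menu_category  = c("Beverages Menu", "Beverages Menu", "McCafe Menu",
                     "McCafe Menu", "Breakfast Menu", "Breakfast Menu"),
  total_sugars_g = c(40, 50, 30, 20, 2, 4)
)

# same two steps as in the analysis: group means, then sort descending
sugar_mean <- aggregate(total_sugars_g ~ menu_category, data = toy_menu, FUN = mean)
sugar_mean <- sugar_mean[order(sugar_mean$total_sugars_g, decreasing = TRUE), ]
print(sugar_mean)

# alternative: tapply() returns a named vector that sort() can order directly
sugar_mean_alt <- sort(
  tapply(toy_menu$total_sugars_g, toy_menu$menu_category, mean),
  decreasing = TRUE
)
print(sugar_mean_alt)
```

Both variants give the same ranking; aggregate() returns a data frame, which is convenient for further joins, while tapply() is shorter when only the ranking is needed.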

4. Which are the most and least energy dense options?

I’ll add a variable called energy density, which is the number of kilocalories per gram of serving. I suspect things like cookies and cake (perhaps even the fizzy drinks?) would rank high on this feature.

menu$energy_density <- menu$energy_k_cal / menu$per_serve_size_g

menu_order <- menu[order(menu$energy_density, decreasing = TRUE), c("menu_category", "menu_items", "energy_density")]

head(menu_order, 3)
tail(menu_order, 3)

Answer: As expected, the Double Chocochips Muffin from Regular Menu has the highest energy density, and Vedica Natural Mineral Water from Beverages Menu has the lowest.

5. Which food items have the highest and lowest cholesterol?
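As a worked example of the energy density calculation, the sketch below applies the same division to the first four items shown in the str() output. The energy and serving-size numbers are the rounded values printed there, so the resulting densities are approximate; the vector names are chosen for this illustration.

```r
# First four items from the str() output (trademark symbols omitted from the names).
items    <- c("McVeggie Burger", "McAloo Tikki Burger",
              "McSpicy Paneer Burger", "Spicy Paneer Wrap")
energy   <- c(402, 340, 653, 675)   # kcal, rounded as printed by str()
serve_g  <- c(168, 146, 199, 250)   # serving size in grams

# energy density = kilocalories per gram of serving
density <- energy / serve_g
names(density) <- items
print(round(density, 2))

# the densest of these four items
densest <- names(density)[which.max(density)]
print(densest)
```

For instance, the McVeggie Burger works out to 402 / 168, or roughly 2.39 kcal per gram.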

menu_order_cholesterol <- menu[order(menu$cholesterols_mg, decreasing = TRUE), c("menu_category", "menu_items", "cholesterols_mg")]

head(menu_order_cholesterol, 3)
tail(menu_order_cholesterol, 3)

Answer: The McSpicy Premium Chicken Burger from Gourmet Menu has the highest cholesterol at 302.61 mg, while drinks from Beverages Menu have the lowest at 0 mg.

5. Recommendations

It is recommended to extend this exploratory analysis with plots and other visualizations so that the results are easier to understand.
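As a possible starting point, here is a minimal base R sketch of such a plot. It draws the number of items per category, using the counts printed by summary(menu) in section 3; the choice of a horizontal bar chart and the variable names are assumptions made for this illustration, not part of the original analysis.

```r
# Item counts per category, copied from the summary(menu) output in section 3.
category_counts <- c(
  "Beverages Menu" = 17, "Breakfast Menu" = 15, "Condiments Menu" = 9,
  "Desserts Menu" = 2, "Gourmet Menu" = 11, "McCafe Menu" = 51,
  "Regular Menu" = 36
)

# sort so that the largest category ends up at the top of the chart
sorted_counts <- sort(category_counts, decreasing = TRUE)

# widen the left margin so the category labels fit, then restore the old settings
op <- par(mar = c(5, 9, 4, 2))
barplot(rev(sorted_counts), horiz = TRUE, las = 1,
        xlab = "Number of menu items",
        main = "Menu items per category")
par(op)
```

The same pattern could be applied to the per-category means from questions 2 and 3 by passing the aggregated values to barplot() instead of the counts.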