This dataset provides a nutrition analysis of every menu item on the Indian McDonald’s menu, including breakfast, burgers, fries, salads, soda, coffee and tea, milkshakes, and desserts. The dataset comes from Kaggle, where it is available for download.
Let’s start by reading in the data. There is only one CSV file, with the menu items and some measurements on each item. janitor::clean_names() helps us get clean column names quickly.
# import dataset
library(magrittr) # provides the %>% pipe
menu <- read.csv("India_Menu.csv") %>%
  janitor::clean_names()
# show the first 5 rows
head(menu, 5)
# show the last 5 rows
tail(menu, 5)
# dimension of the data
dim(menu)
#> [1] 141 13
# column names
names(menu)
#> [1] "menu_category" "menu_items" "per_serve_size"
#> [4] "energy_k_cal" "protein_g" "total_fat_g"
#> [7] "sat_fat_g" "trans_fat_g" "cholesterols_mg"
#> [10] "total_carbohydrate_g" "total_sugars_g" "added_sugars_g"
#> [13] "sodium_mg"
From our inspection we can conclude that:
- The dataset consists of 141 rows and 13 columns.
- The column names are “menu_category”, “menu_items”, “per_serve_size”, “energy_k_cal”, “protein_g”, “total_fat_g”, “sat_fat_g”, “trans_fat_g”, “cholesterols_mg”, “total_carbohydrate_g”, “total_sugars_g”, “added_sugars_g”, “sodium_mg”.
Check the data type of each column
str(menu)
#> 'data.frame': 141 obs. of 13 variables:
#> $ menu_category : chr "Regular Menu" "Regular Menu" "Regular Menu" "Regular Menu" ...
#> $ menu_items : chr "McVeggie™ Burger" "McAloo Tikki Burger®" "McSpicy™ Paneer Burger" "Spicy Paneer Wrap" ...
#> $ per_serve_size : chr "168 g" "146 g" "199 g" "250 g" ...
#> $ energy_k_cal : num 402 340 653 675 512 ...
#> $ protein_g : num 10.2 8.5 20.3 21 15.3 ...
#> $ total_fat_g : num 13.8 11.3 39.5 39.1 23.4 ...
#> $ sat_fat_g : num 5.34 4.27 17.12 19.73 10.51 ...
#> $ trans_fat_g : num 0.16 0.2 0.18 0.26 0.17 0.28 0.24 0.09 0.16 0.21 ...
#> $ cholesterols_mg : num 2.49 1.47 21.85 40.93 25.24 ...
#> $ total_carbohydrate_g: num 56.5 50.3 52.3 59.3 57 ...
#> $ total_sugars_g : num 7.9 7.05 8.35 3.5 7.85 ...
#> $ added_sugars_g : num 4.49 4.07 5.27 1.08 4.76 6.92 1.15 0.35 4.49 3.54 ...
#> $ sodium_mg : num 706 545 1075 1087 1051 ...
From this result, we find that some columns are not stored in the correct type. We need to convert them to the correct type (data coercion):
- menu_category should be converted to a factor.
- per_serve_size should be converted to a number. However, it contains text (for example "168 g") that has to be handled first.
# change data type
menu$menu_category <- as.factor(menu$menu_category)
# change column name
colnames(menu)[colnames(menu) == "per_serve_size"] <- "per_serve_size_g"
# extract the first run of digits from "per_serve_size_g" and convert it to numeric
menu$per_serve_size_g <- as.numeric(stringr::str_extract(menu$per_serve_size_g, "\\d+"))
# check data types
str(menu)
#> 'data.frame': 141 obs. of 13 variables:
#> $ menu_category : Factor w/ 7 levels "Beverages Menu",..: 7 7 7 7 7 7 7 7 7 7 ...
#> $ menu_items : chr "McVeggie™ Burger" "McAloo Tikki Burger®" "McSpicy™ Paneer Burger" "Spicy Paneer Wrap" ...
#> $ per_serve_size_g : num 168 146 199 250 177 306 132 87 173 136 ...
#> $ energy_k_cal : num 402 340 653 675 512 ...
#> $ protein_g : num 10.2 8.5 20.3 21 15.3 ...
#> $ total_fat_g : num 13.8 11.3 39.5 39.1 23.4 ...
#> $ sat_fat_g : num 5.34 4.27 17.12 19.73 10.51 ...
#> $ trans_fat_g : num 0.16 0.2 0.18 0.26 0.17 0.28 0.24 0.09 0.16 0.21 ...
#> $ cholesterols_mg : num 2.49 1.47 21.85 40.93 25.24 ...
#> $ total_carbohydrate_g: num 56.5 50.3 52.3 59.3 57 ...
#> $ total_sugars_g : num 7.9 7.05 8.35 3.5 7.85 ...
#> $ added_sugars_g : num 4.49 4.07 5.27 1.08 4.76 6.92 1.15 0.35 4.49 3.54 ...
#> $ sodium_mg : num 706 545 1075 1087 1051 ...
Each column now has the desired data type.
Check for missing values
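The coercion step above can be tried on a small vector first. The sketch below uses base R only. The sample strings are the first four serving sizes shown in the str() output, the helper name parse_serve_size is hypothetical, and it assumes every value starts with a whole number followed by a unit.

```r
# Hypothetical helper: pull the leading whole number out of strings like "168 g".
parse_serve_size <- function(x) {
  # Keep only the leading run of digits and drop the rest of the string
  digits <- sub("^[[:space:]]*([0-9]+).*$", "\\1", x)
  # sub() leaves strings without a leading number unchanged, so mark them NA
  digits[!grepl("^[[:space:]]*[0-9]+", x)] <- NA_character_
  as.numeric(digits)
}

sizes <- c("168 g", "146 g", "199 g", "250 g")
parsed <- parse_serve_size(sizes)
parsed
class(parsed)
```

Returning NA for values that do not start with a number makes unexpected entries visible in the missing-value check that follows, instead of silently turning them into something else.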
colSums(is.na(menu))
#> menu_category menu_items per_serve_size_g
#> 0 0 0
#> energy_k_cal protein_g total_fat_g
#> 0 0 0
#> sat_fat_g trans_fat_g cholesterols_mg
#> 0 0 0
#> total_carbohydrate_g total_sugars_g added_sugars_g
#> 0 0 0
#> sodium_mg
#> 1
There is one missing value in the sodium_mg column. Let’s inspect the dataset to decide how to treat it.
# find the row index of the missing value
which(is.na(menu$sodium_mg))
#> [1] 112
# subset the dataset to look at the neighbouring rows
menu[111:113, ]
Since row 112 holds almost the same menu item as row 113, we will fill the missing sodium_mg value in row 112 with the value from row 113. We will use fill() from the tidyr package to handle this missing value.
menu <- menu %>%
  tidyr::fill(sodium_mg, .direction = "up")
# check the dataset
menu[111:113, c("menu_category", "sodium_mg")]
colSums(is.na(menu))
#> menu_category menu_items per_serve_size_g
#> 0 0 0
#> energy_k_cal protein_g total_fat_g
#> 0 0 0
#> sat_fat_g trans_fat_g cholesterols_mg
#> 0 0 0
#> total_carbohydrate_g total_sugars_g added_sugars_g
#> 0 0 0
#> sodium_mg
#> 0
If we check for missing values again, none remain.
The menu dataset is now ready to be processed and analyzed.
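To make the fill step above more concrete, here is a minimal base R sketch of what filling "up" does for a single missing entry: the NA takes the value of the row directly below it. The toy data frame and its sodium values are invented for illustration and are not taken from the dataset.

```r
# Toy data frame; the item names and sodium values are made up for illustration.
toy <- data.frame(
  menu_items = c("Item A", "Item B (small)", "Item B (large)"),
  sodium_mg  = c(120, NA, 300)
)

# Locate the missing entry, as done with which(is.na(...)) in the text
na_row <- which(is.na(toy$sodium_mg))

# Copy the value from the row directly below. For a single isolated NA this
# gives the same result as tidyr::fill(sodium_mg, .direction = "up").
toy$sodium_mg[na_row] <- toy$sodium_mg[na_row + 1]

toy
```

Writing the replacement explicitly documents exactly which row was changed. fill() is more convenient when several gaps have to be filled at once.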
Brief explanation
summary(menu)
#> menu_category menu_items per_serve_size_g energy_k_cal
#> Beverages Menu :17 Length:141 Min. : 5 Min. : 0.0
#> Breakfast Menu :15 Class :character 1st Qu.:125 1st Qu.:116.4
#> Condiments Menu: 9 Mode :character Median :212 Median :219.4
#> Desserts Menu : 2 Mean :222 Mean :244.6
#> Gourmet Menu :11 3rd Qu.:301 3rd Qu.:339.5
#> McCafe Menu :51 Max. :544 Max. :834.4
#> Regular Menu :36
#> protein_g total_fat_g sat_fat_g trans_fat_g
#> Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.0000
#> 1st Qu.: 0.650 1st Qu.: 0.460 1st Qu.: 0.280 1st Qu.: 0.0600
#> Median : 4.790 Median : 7.770 Median : 4.270 Median : 0.1500
#> Mean : 7.494 Mean : 9.992 Mean : 4.998 Mean : 0.6872
#> 3rd Qu.:10.880 3rd Qu.:14.160 3rd Qu.: 7.280 3rd Qu.: 0.2200
#> Max. :39.470 Max. :45.180 Max. :20.460 Max. :75.2600
#>
#> cholesterols_mg total_carbohydrate_g total_sugars_g added_sugars_g
#> Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
#> 1st Qu.: 1.51 1st Qu.:15.74 1st Qu.: 2.33 1st Qu.: 0.00
#> Median : 8.39 Median :30.82 Median : 9.16 Median : 3.64
#> Mean : 26.35 Mean :31.19 Mean :15.46 Mean :10.34
#> 3rd Qu.: 31.11 3rd Qu.:46.00 3rd Qu.:26.95 3rd Qu.:19.23
#> Max. :302.61 Max. :93.84 Max. :64.22 Max. :64.22
#>
#> sodium_mg
#> Min. : 0.00
#> 1st Qu.: 44.53
#> Median : 153.15
#> Mean : 367.80
#> 3rd Qu.: 545.34
#> Max. :2399.49
#>
With the summary() function, each column is described by its basic statistics. For menu_category, we have 7 categories.
1. Which menu items have the lowest total fat, and which menu_category do they belong to?
menu[menu$total_fat_g == 0, ]
Answer: There are 16 menu items with 0 g of total fat, mostly from the Beverages Menu category, followed by Regular Menu and Condiments Menu.
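Rather than counting the categories by eye, the zero-fat rows can be tabulated per category. The sketch below shows the idea on a toy data frame. The rows are invented for illustration, so the counts do not reproduce the 16 items found above.

```r
# Toy data; categories and fat values are invented for illustration.
toy <- data.frame(
  menu_category = c("Beverages Menu", "Beverages Menu", "Regular Menu",
                    "Condiments Menu", "Regular Menu"),
  total_fat_g   = c(0, 0, 0, 0, 13.8)
)

# Keep only the zero-fat rows, then count them per category
zero_fat <- toy[toy$total_fat_g == 0, ]
counts <- sort(table(zero_fat$menu_category), decreasing = TRUE)
counts
```

Applied to the real data, the same two lines would run on menu instead of toy.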
2. What is the mean total fat for each category? Order from highest to lowest.
Total_Fat_mean <- aggregate(total_fat_g ~ menu_category, data = menu, FUN = mean)
Total_Fat_mean[order(Total_Fat_mean$total_fat_g, decreasing = TRUE), ]
Answer: Gourmet Menu has the highest mean total fat, while Beverages Menu has the lowest. Note, however, that the Beverages Menu has no meaningful total fat values, so we should look at total sugar instead.
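The summary() output shows that the categories differ a lot in size (Desserts Menu has 2 items, McCafe Menu has 51), so it can help to report the number of items next to each mean. The following is a sketch on invented toy values. The column names mean_fat_g and n_items are introduced here for illustration only.

```r
# Toy data; values are invented for illustration.
toy <- data.frame(
  menu_category = c("Gourmet Menu", "Gourmet Menu", "Desserts Menu",
                    "Beverages Menu", "Beverages Menu", "Beverages Menu"),
  total_fat_g   = c(30, 40, 12, 0, 0, 3)
)

# Mean total fat per category, as in the text
fat_mean <- aggregate(total_fat_g ~ menu_category, data = toy, FUN = mean)
names(fat_mean)[2] <- "mean_fat_g"

# Number of items per category
fat_n <- aggregate(total_fat_g ~ menu_category, data = toy, FUN = length)
names(fat_n)[2] <- "n_items"

# Combine both tables and order from highest to lowest mean
fat_summary <- merge(fat_mean, fat_n, by = "menu_category")
fat_summary <- fat_summary[order(fat_summary$mean_fat_g, decreasing = TRUE), ]
fat_summary
```

A mean based on only a couple of items is much more sensitive to a single unusual item than one based on dozens, which is worth keeping in mind when comparing categories.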
3. What is the mean total sugar for each category? Order from highest to lowest.
Total_Sugar_mean <- aggregate(total_sugars_g ~ menu_category, data = menu, FUN = mean)
Total_Sugar_mean[order(Total_Sugar_mean$total_sugars_g, decreasing = TRUE), ]
Answer: In terms of mean total sugars, Beverages Menu ranks highest, followed by McCafe Menu and Desserts Menu to complete the top three, while Breakfast Menu ranks lowest.
4. Which are the most and least energy dense options?
I’ll add a variable called energy density, which is the number of calories per weight of food. I suspect things like cookies and cake (perhaps even the fizzy drinks?) would rank high on this feature.
menu$energy_density <- menu$energy_k_cal / menu$per_serve_size_g
menu_order <- menu[order(menu$energy_density, decreasing = TRUE), c("menu_category", "menu_items", "energy_density")]
head(menu_order, 3)
tail(menu_order, 3)
Answer: As expected, Double Chocochips Muffin from the Regular Menu has the highest energy density, and Vedica Natural Mineral Water from the Beverages Menu has the lowest.
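As a worked illustration of the energy density calculation, the sketch below uses the first four items from the str() output. The energy values are the rounded figures displayed there, so the resulting densities are approximate, and the item names are written without their trademark symbols.

```r
# Serving sizes (g) and energy (kcal) as displayed, rounded, in the str() output.
toy <- data.frame(
  menu_items       = c("McVeggie Burger", "McAloo Tikki Burger",
                       "McSpicy Paneer Burger", "Spicy Paneer Wrap"),
  per_serve_size_g = c(168, 146, 199, 250),
  energy_k_cal     = c(402, 340, 653, 675)
)

# Energy density in kcal per unit of serving size
toy$energy_density <- toy$energy_k_cal / toy$per_serve_size_g

# Order from most to least energy dense
toy_order <- toy[order(toy$energy_density, decreasing = TRUE),
                 c("menu_items", "energy_density")]
toy_order
```

Because the serving size is the denominator, a serving size of 0 would produce Inf or NaN. The summary() output shows a minimum serving size of 5, so this does not occur in this dataset.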
5. Which food items have the highest and lowest cholesterol?
menu_order_cholesterol <- menu[order(menu$cholesterols_mg, decreasing = TRUE), c("menu_category", "menu_items", "cholesterols_mg")]
head(menu_order_cholesterol, 3)
tail(menu_order_cholesterol, 3)
Answer: McSpicy Premium Chicken Burger from the Gourmet Menu has the highest cholesterol at 302.61 mg, and drinks from the Beverages Menu have the lowest at 0 mg.
As a next step, it is recommended to explore the data with plots, since visualizing the analysis makes it easier to understand.