McDonald’s Food Nutrition Data Analysis

1.Background

I found a dataset of McDonald’s menu nutrition facts on Kaggle. As a fan of McDonald’s, I wanted to analyze the data and see what it reveals. The menu items and nutrition facts were scraped from the McDonald’s website, and the dataset covers every item on the Indian McDonald’s menu.
The dataset is from https://www.kaggle.com/datasets/deepcontractor/mcdonalds-india-menu-nutrition-facts

Description of the dataset features:
The data contains 13 columns.
(1) Menu Category: one of 7 menu categories.
(2) Menu Items: the name of each food item.
(3) Nutrient content (11 columns): Per Serve Size, Energy (kCal), Protein (g), Total fat (g), Sat Fat (g), Trans fat (g), Cholesterols (mg), Total carbohydrate (g), Total Sugars (g), Added Sugars (g), Sodium (mg)

2.Dataset

In this section I work on processing the dataset to make the data easier to analyze.

Import the required libraries.

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:gridExtra':
## 
##     combine

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

## corrplot 0.92 loaded

## Loading required package: lattice

## Loading required package: Matrix

## Loaded glmnet 4.1-7

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select
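The chunk that loads the libraries is hidden in the rendered report, so only its startup messages appear. As a rough sketch, the functions called later suggest a package list like the one below. The exact list is an assumption: `vis_miss()` is assumed to come from naniar (visdat also provides it), `multinom()` from nnet, and `svm()` from e1071.

```r
# Assumed package list, inferred from the startup messages and the calls used later.
# MASS is attached last so that it masks dplyr::select, as in the messages above.
pkgs <- c("ggplot2", "gridExtra", "dplyr", "tidyr", "naniar", "factoextra",
          "corrplot", "caret", "nnet", "e1071", "glmnet", "MASS")

# Check which packages are installed before trying to attach them
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach the installed packages in order, without the startup chatter
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Because MASS masks `select`, the later chunks call `dplyr::select()` explicitly.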

Read a dataset in csv format and view the column names.

dataInit <- read.csv('India_Menu.csv')
names(dataInit)

##  [1] "Menu.Category"          "Menu.Items"             "Per.Serve.Size"        
##  [4] "Energy..kCal."          "Protein..g."            "Total.fat..g."         
##  [7] "Sat.Fat..g."            "Trans.fat..g."          "Cholesterols..mg."     
## [10] "Total.carbohydrate..g." "Total.Sugars..g."       "Added.Sugars..g."      
## [13] "Sodium..mg."

To make the later analysis easier, I rename the columns and then check the dataset for missing values.

colnames(dataInit)<- c("Category","Items","Size_g","Energy_kCal","Protein_g","Totalfat_g","Satfat_g","Transfat_g","Cholesterol_mg","Totalcarbohydrate_g","Totalsugar_g","Addedsugar_g","Sodium_mg")
vis_miss(dataInit)

nrow(dataInit)

## [1] 141

The check shows that very little data is missing, so I drop the rows that contain missing values.
After deletion, 140 of the 141 rows remain, so only one row contained missing data.

dataNew <- na.omit(dataInit)
nrow(dataNew)

## [1] 140
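The missing-value check can also be done numerically in base R, which makes it easy to see which column and which row are affected. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the menu data):

```r
# Toy data with one missing value, standing in for the menu data
toy <- data.frame(
  Items       = c("A", "B", "C"),
  Energy_kCal = c(100, NA, 250),
  Sodium_mg   = c(10, 20, 30)
)

# Number of missing values per column
na_per_column <- colSums(is.na(toy))

# Row numbers that contain at least one missing value
rows_with_na <- which(!complete.cases(toy))

# Drop the incomplete rows, as na.omit() does in the report
toy_clean <- na.omit(toy)
```

Here `na_per_column` and `rows_with_na` identify the affected column and row before anything is dropped.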

3.Research Questions

1.Analysis of unhealthy nutrients in different menus.
2.Is there any difference between the Gourmet Menu, the Regular Menu and the Breakfast Menu?
3.Use PCA to reduce dimensionality and see if there are any new findings.
4.I want to predict whether a particular item belongs to the Regular Menu or not. This is a binary classification problem.

4.Result:

4.1.Analysis of unhealthy nutrients in different menus.

In daily life, the nutrients that most affect people’s health are:
Totalfat(g): Consuming too much fat can lead to weight gain and an increased risk of heart disease.
SatFat(g): Saturated fat is particularly harmful to heart health and can increase cholesterol levels.
Transfat(g): Trans fats are also harmful to heart health and have been linked to an increased risk of heart disease and other health problems.
Cholesterol(mg): High levels of cholesterol in the blood can increase the risk of heart disease.
Totalsugar(g): Consuming too much sugar can lead to weight gain and an increased risk of diabetes and other health problems.
Sodium(mg): Consuming too much sodium can increase blood pressure and the risk of heart disease.

I now focus on these six nutrients.
First, I split the data into six subsets, one per nutrient.

data_Cate_Totalfat <- dataNew %>% dplyr::select(Category, Items, Totalfat_g)
data_Cate_Satfat <- dataNew %>% dplyr::select(Category, Items, Satfat_g)
data_Cate_Transfat <- dataNew %>% dplyr::select(Category, Items, Transfat_g)
data_Cate_Cholesterol <- dataNew %>% dplyr::select(Category, Items, Cholesterol_mg)
data_Cate_Totalsugar <- dataNew %>% dplyr::select(Category, Items, Totalsugar_g)
data_Cate_Sodium <- dataNew %>% dplyr::select(Category, Items, Sodium_mg)

4.1.1.Totalfat(g)

The plots below show the fat content of the items in each menu category.

p1 <- ggplot(data_Cate_Totalfat, aes(x = Category, y = Totalfat_g)) +
  geom_point(aes(size = Totalfat_g, color = Totalfat_g), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  #geom_boxplot()+
  xlab("Menu") +
  ylab("Total Fat (g)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Totalfat, aes(x = Totalfat_g, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

menu_totalfat <- dataNew %>%
  group_by(Category) %>%
  summarize(Total=sum(Totalfat_g),Mean=mean(Totalfat_g),Median=median(Totalfat_g),Min=min(Totalfat_g),Max=max(Totalfat_g),Sd=sd(Totalfat_g))
menu_totalfat
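The same six statistics are computed for every nutrient in the sections that follow, so the repeated `summarize()` call could be wrapped in a helper. The sketch below is one possible way to do this in base R; `summarise_nutrient()` and the toy data are hypothetical and not part of the original analysis.

```r
# Per-category summary statistics for one nutrient column (hypothetical helper)
summarise_nutrient <- function(data, nutrient, group = "Category") {
  groups <- split(data[[nutrient]], data[[group]])
  stats <- lapply(groups, function(x) {
    c(Total = sum(x), Mean = mean(x), Median = median(x),
      Min = min(x), Max = max(x), Sd = sd(x))
  })
  out <- as.data.frame(do.call(rbind, stats))
  data.frame(Category = names(groups), out, row.names = NULL)
}

# Toy data standing in for dataNew
toy <- data.frame(
  Category   = c("Breakfast Menu", "Breakfast Menu", "Gourmet Menu", "Gourmet Menu"),
  Totalfat_g = c(10, 14, 20, 30)
)
fat_summary <- summarise_nutrient(toy, "Totalfat_g")
```

With the real data, a call such as `summarise_nutrient(dataNew, "Sodium_mg")` would produce the same columns as the `summarize()` chunks.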

The Gourmet Menu has the highest average fat content and also contains the highest-fat items. Fat content varies widely within both the Gourmet Menu and the Regular Menu. The Beverages Menu has the lowest average fat content.
Recommendation:
For people who need to limit their fat intake, try to order from the Breakfast Menu. Outside breakfast hours, prefer the Regular Menu. For drinks, choose from the Beverages Menu instead of coffee.

4.1.2.SatFat(g)

p1 <- ggplot(data_Cate_Satfat, aes(x = Category, y = Satfat_g)) +
  geom_point(aes(size = Satfat_g, color = Satfat_g), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  xlab("Menu") +
  ylab("Sat Fat (g)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Satfat, aes(x = Satfat_g, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

menu_Satfat <- dataNew %>%
  group_by(Category) %>%
  summarize(Total=sum(Satfat_g),Mean=mean(Satfat_g),Median=median(Satfat_g),Min=min(Satfat_g),Max=max(Satfat_g),Sd=sd(Satfat_g))
menu_Satfat

The Gourmet Menu has the highest average saturated fat content, followed by the Regular Menu. The Breakfast Menu and the McCafe Menu have similar averages, but saturated fat content is more spread out on the McCafe Menu.
Recommendation:
If you want to limit saturated fat, order from the Breakfast Menu. Outside breakfast hours, prefer the Regular Menu. For drinks, choose from the Beverages Menu instead of coffee.

4.1.3.Transfat(g)

p1 <- ggplot(data_Cate_Transfat, aes(x = Category, y = Transfat_g)) +
  geom_point(aes(size = Transfat_g, color = Transfat_g), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  xlab("Menu") +
  ylab("Transfat (g)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Transfat, aes(x = Transfat_g, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

An outlier appears in the plot.
Extracting that row shows that it is the 5 Piece Chicken Strips, which is extremely high in trans fat. To analyze the rest of the data more clearly, I remove this row and plot again.

max_row <- which(data_Cate_Transfat$Transfat_g == max(data_Cate_Transfat$Transfat_g))
data_Cate_Transfat[max_row, ]

data_Cate_Transfat_New <- data_Cate_Transfat[-max_row, ]
p1 <- ggplot(data_Cate_Transfat_New, aes(x = Category, y = Transfat_g)) +
  geom_point(aes(size = Transfat_g, color = Transfat_g), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  xlab("Menu") +
  ylab("Transfat (g)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Transfat_New, aes(x = Transfat_g, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

menu_Transfat <- data_Cate_Transfat_New %>%
  group_by(Category) %>%
  summarize(Total=sum(Transfat_g),Mean=mean(Transfat_g),Median=median(Transfat_g),Min=min(Transfat_g),Max=max(Transfat_g),Sd=sd(Transfat_g))
menu_Transfat
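The outlier above was found by taking the maximum trans fat value. A more general option is the 1.5 × IQR rule, which is essentially the rule a boxplot uses to draw individual points. The sketch below is illustrative only: `flag_outliers()` is a hypothetical helper, and the toy vector is made up except for the 75.3 g value reported for the chicken strips.

```r
# Flag values more than k * IQR outside the quartiles (hypothetical helper)
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - k * iqr | x > q[[2]] + k * iqr
}

# Toy trans fat values in grams; the last one mimics the extreme item
trans_fat <- c(0.1, 0.2, 0.0, 0.3, 0.2, 75.3)
is_outlier <- flag_outliers(trans_fat)
outlier_rows <- which(is_outlier)
```

Unlike `which(x == max(x))`, this flags every extreme value rather than only the largest one.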

Foods on the McCafe Menu have the highest average trans fat. In addition, one item on the Condiments Menu has a high level of trans fat.
Recommendation:
Those who want to avoid trans fats should avoid the McCafe Menu and the 5 Piece Chicken Strips from the Regular Menu.

4.1.4.Cholesterol(mg)

p1 <- ggplot(data_Cate_Cholesterol, aes(x = Category, y = Cholesterol_mg)) +
  geom_point(aes(size = Cholesterol_mg, color = Cholesterol_mg), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  xlab("Menu") +
  ylab("Cholesterol (mg)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Cholesterol, aes(x = Cholesterol_mg, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

menu_Cholesterol <- dataNew %>%
  group_by(Category) %>%
  summarize(Total=sum(Cholesterol_mg),Mean=mean(Cholesterol_mg),Median=median(Cholesterol_mg),Min=min(Cholesterol_mg),Max=max(Cholesterol_mg),Sd=sd(Cholesterol_mg))
menu_Cholesterol

The Gourmet Menu has the highest average cholesterol content, followed by the Breakfast Menu. The six items with the highest cholesterol are listed below.

head(data_Cate_Cholesterol[order(-data_Cate_Cholesterol$Cholesterol_mg),], 6)

Recommendation:
Those who need to limit cholesterol should order the following items as little as possible: McSpicy Premium Chicken Burger in the Gourmet Menu; Sausage Mc Muffin with egg, Egg McMuffin and Spicy Egg McMuffin in the Breakfast Menu; Mc Egg Masala Burger and Mc Egg Burger for Happy Meal in the Regular Menu.

4.1.5.Totalsugar(g)

p1 <- ggplot(data_Cate_Totalsugar, aes(x = Category, y = Totalsugar_g)) +
  geom_point(aes(size = Totalsugar_g, color = Totalsugar_g), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  xlab("Menu") +
  ylab("Totalsugar (g)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Totalsugar, aes(x = Totalsugar_g, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

menu_Totalsugar <- dataNew %>%
  group_by(Category) %>%
  summarize(Total=sum(Totalsugar_g),Mean=mean(Totalsugar_g),Median=median(Totalsugar_g),Min=min(Totalsugar_g),Max=max(Totalsugar_g),Sd=sd(Totalsugar_g))
menu_Totalsugar

McDonald’s drinks contain a lot of sugar: the Beverages Menu has the highest average sugar content, followed by the McCafe Menu, while the Breakfast Menu and the Regular Menu have the lowest.
Recommendation:
People who need to limit sugar should choose sugar-free or low-sugar drinks when ordering. Items on the other menus contain relatively little sugar.

4.1.6.Sodium(mg)

p1 <- ggplot(data_Cate_Sodium, aes(x = Category, y = Sodium_mg)) +
  geom_point(aes(size = Sodium_mg, color = Sodium_mg), alpha = 0.6) + scale_color_gradient(low = "blue", high = "red") +
  xlab("Menu") +
  ylab("Sodium (mg)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 0.5))
p2 <- ggplot(data_Cate_Sodium, aes(x = Sodium_mg, y = Category)) +geom_boxplot()
grid.arrange(p1, p2, ncol=2)

menu_Sodium <- dataNew %>%
  group_by(Category) %>%
  summarize(Total=sum(Sodium_mg),Mean=mean(Sodium_mg),Median=median(Sodium_mg),Min=min(Sodium_mg),Max=max(Sodium_mg),Sd=sd(Sodium_mg))
menu_Sodium

The Gourmet Menu has a very high average sodium content, about 500 mg higher than the second-place Regular Menu. Not surprisingly, the Beverages Menu has the lowest average sodium content.
Recommendation:
People who need to limit sodium should choose carefully from the Gourmet Menu. Note that some items on the Regular Menu are also high in sodium.

4.2.Gourmet Menu vs Regular Menu vs Breakfast Menu

dataNew %>%
  filter(Category %in% c("Regular Menu", "Gourmet Menu","Breakfast Menu")) %>%
  filter(Transfat_g < 20) %>% 
  dplyr::select(Category, where(is.numeric)) %>%
  tidyr::pivot_longer(-Category) %>%
  ggplot(aes(x = value,fill = Category)) +
  geom_density(alpha = 0.8) +
  facet_wrap(~ name,scales = "free") +
  ggtitle(label = " Gourmet vs Regular vs Breakfast") +
  labs(x = NULL,y = "Probability density",fill = NULL,caption = "Outlier '5 piece Chicken Strips' with `trans_fat` of 75.3g removed") +
  theme(legend.position = "top",plot.subtitle = element_text(face = "italic"))

In general, the density curves for the Gourmet Menu are shifted to the right, those for the Breakfast Menu are shifted to the left, and those for the Regular Menu sit in between.
This matches the expectation that breakfast items are simpler than regular and gourmet items. Gourmet Menu items are high in most nutrients, which is consistent with their tending to be more expensive: you get what you pay for.

4.3.Use PCA to reduce dimensionality and see if there are any new findings.

dataNew %>% 
  dplyr::select(where(is.numeric)) %>%
  as.data.frame() -> dataNumeric
corr_matrix <- cor(dataNumeric)
corrplot(corr_matrix, method = "color", type = "lower",tl.col = "black", tl.srt = 45)

The heat map shows several strong correlations between the features:
(1)Energy_kCal ~ Protein_g,Totalfat_g,Satfat_g,Totalcarbohydrate_g,Sodium_mg
(2)Protein_g ~ Energy_kCal,Totalfat_g,Sodium_mg
(3)Totalfat_g ~ Energy_kCal,Protein_g,Satfat_g,Sodium_mg
(4)Satfat_g ~ Energy_kCal,Totalfat_g
(5)Totalcarbohydrate_g ~ Energy_kCal
(6)Totalsugar_g ~ Addedsugar_g
(7)Sodium_mg ~ Energy_kCal,Protein_g,Totalfat_g
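The pairs listed above were read off the heat map by eye. They could also be extracted from the correlation matrix programmatically, as in the sketch below. `high_cor_pairs()` and its default threshold of 0.7 are assumptions made for illustration; the report does not state a cutoff.

```r
# List variable pairs whose absolute correlation reaches a threshold (hypothetical helper)
high_cor_pairs <- function(data, threshold = 0.7) {
  cm <- cor(data)
  # Keep the lower triangle only, so each pair is reported once
  idx <- which(abs(cm) >= threshold & lower.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    row.names = NULL
  )
}

# Toy data: y is an exact linear function of x, z is only weakly related to x
toy <- data.frame(
  x = 1:10,
  y = 2 * (1:10) + 1,
  z = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)
pairs_found <- high_cor_pairs(toy, threshold = 0.99)
```

On the real data this would be called as `high_cor_pairs(dataNumeric)`.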

dataNew %>% 
  dplyr::select(where(is.numeric)) %>%
  scale() %>% 
  as.data.frame() -> dataScaled
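# dataScaled holds only numeric columns, so take the item names from dataNew
rownames(dataScaled) <- make.unique(as.character(dataNew$Items))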
#pca <- PCA(dataScaled, graph = TRUE)
pca <- prcomp(dataScaled)
fviz_eig(pca, addlabels = TRUE, ylim = c(0, 60))

The scree plot shows the percentage of variance explained by each principal component.

Principal components	Top1	Top2	Top3	Top4	Top5	Top6	Top7
Cumulative contribution rate	50.5%	74.9%	85.5%	92.9%	96.6%	98.3%	99.1%
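Cumulative contribution rates like those in the table above come directly from the standard deviations stored in a `prcomp` object. The sketch below shows the calculation on R's built-in `USArrests` data, used here only as a stand-in for the menu data; the 99% cutoff is an illustrative assumption.

```r
# PCA on a built-in dataset (stand-in for the menu data)
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Share of total variance explained by each component
var_ratio <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)

# Cumulative share explained by the first k components
cum_ratio <- cumsum(var_ratio)

# Smallest number of components that explains at least 99% of the variance
n_keep <- which(cum_ratio >= 0.99)[1]
```

Applying the same lines to `pca` reproduces the percentages that `fviz_eig()` draws.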

fviz_pca_ind(pca,col.ind = dataNew$Category,select.ind = list(cos2 = 0.5))

## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 7. Consider
## specifying shapes manually if you must have them.

## Warning: Removed 23 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

The plot shows the projection of each observation onto the first two principal components.
(1)The Beverages Menu is concentrated in the first quadrant.
(2)The Regular Menu is concentrated in the third quadrant.
(3)The Condiments Menu is concentrated in the fourth quadrant.

4.4.Predict whether a particular item belongs to the Regular Menu or not.

According to the plot in section 4.3, the Regular Menu is not easily distinguishable from the Gourmet Menu by eye. I therefore test whether machine learning methods can identify Regular Menu items from their nutritional content.

Label the Regular Menu items as 1 and all other items as 0.

X <- dataNew
features <- c("Category","Items","Size_g")
X <- X[, !(names(X) %in% features)]
y <- ifelse(dataNew$Category == "Regular Menu", "1", "0")

Split the data into a train set and a test set.

set.seed(0)
trainIndex <- createDataPartition(y, p = 0.8, list = FALSE)
X_train <- X[trainIndex, ]
X_test <- X[-trainIndex, ]
y_train <- y[trainIndex]
y_test <- y[-trainIndex]

preprocessParams <- preProcess(X_train)
X_train <- predict(preprocessParams, X_train)
X_test <- predict(preprocessParams, X_test)
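The split above relies on caret's `createDataPartition()`, which samples within each class so that the train set keeps roughly the original class proportions. The base R sketch below illustrates the idea only. It is not caret's exact algorithm, and `stratified_split()` and the toy labels are hypothetical.

```r
# Sample a fraction p of the indices within each class (hypothetical helper)
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(0)
# Toy labels: 60 items of class "0" and 40 items of class "1"
y_demo <- rep(c("0", "1"), times = c(60, 40))
train_idx <- stratified_split(y_demo, p = 0.8)

# Class counts in the train and test parts
train_counts <- table(y_demo[train_idx])
test_counts  <- table(y_demo[-train_idx])
```

Both parts keep the 60:40 class ratio, which matters when one class is much smaller than the other.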

Standardize the data and run PCA.

X_train %>% 
  scale() %>% 
  as.data.frame() -> trainScaled
rownames(trainScaled) <- trainScaled$Items
pcatrain <- prcomp(trainScaled)

X_test %>% 
  scale() %>% 
  as.data.frame() -> testScaled
rownames(testScaled) <- testScaled$Items
pcatest <- prcomp(testScaled)

Check the proportion of variance explained by each principal component.

pca_var_ratio <- pcatrain$sdev^2/sum(pcatrain$sdev^2)
pca_var_ratio

##  [1] 0.5159519760 0.2439825838 0.1062308341 0.0694347194 0.0315490665
##  [6] 0.0165927184 0.0071056273 0.0057838472 0.0029820574 0.0003865699

The cumulative contribution of the first seven components reaches 0.9908475, so I use the first seven principal components as features for training.

datasetML <- predict(pcatrain, trainScaled)[,1:7]
datasetMLTest <- predict(pcatest, testScaled)[,1:7]

Menu_Label <- as.numeric(as.factor(y_train))
Menu_LabelTest <- as.numeric(as.factor(y_test))

datasetML <- as.data.frame(datasetML)
datasetML$Class <- Menu_Label

datasetMLTest <- as.data.frame(datasetMLTest)
datasetMLTest$Class <- Menu_LabelTest

datasetML$Class <- as.factor(datasetML$Class)
datasetMLTest$Class <- as.factor(datasetMLTest$Class)
# head(datasetML)
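The code above scales the test set with its own statistics and fits a separate PCA on it (`pcatest`), so the test components are not guaranteed to line up with the components the models were trained on. A common alternative is to fit the scaling and the PCA on the train set only and then project the test set with `predict()`. The sketch below shows that pattern on the built-in `iris` data. It is an illustration under that assumption, not the pipeline that produced the accuracies in this report.

```r
set.seed(0)
# Stand-in data: the four numeric columns of iris
X_all <- iris[, 1:4]
train_rows <- sample(nrow(X_all), size = 120)
X_tr <- X_all[train_rows, ]
X_te <- X_all[-train_rows, ]

# Fit centering, scaling and rotation on the train set only
pca_fit <- prcomp(X_tr, center = TRUE, scale. = TRUE)

# Project both sets with the train-set parameters; keep the first two components
pcs_train <- predict(pca_fit, newdata = X_tr)[, 1:2]
pcs_test  <- predict(pca_fit, newdata = X_te)[, 1:2]
```

In the report's terms, this would mean replacing `predict(pcatest, testScaled)` with a projection through `pcatrain`, using test data scaled with the train-set means and standard deviations.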

4.4.1.Linear Model

##  Accuracy 
## 0.8518519

## Accuracy 
## 0.840708

4.4.2.Multinomial-Logistic Model

modelMUL <- multinom(Class ~ ., data = datasetML)

## # weights:  9 (8 variable)
## initial  value 78.325631 
## iter  10 value 37.171875
## iter  20 value 33.110493
## iter  30 value 33.020291
## final  value 33.019904 
## converged

predictedMUL <- predict(modelMUL, newdata = datasetMLTest)
AccuracyMUL <- confusionMatrix(predictedMUL, datasetMLTest$Class)$overall["Accuracy"]
AccuracyMUL

##  Accuracy 
## 0.7777778

predictedMULTrain <- predict(modelMUL, newdata = datasetML)
AccuracyMULTrain <- confusionMatrix(predictedMULTrain, datasetML$Class)$overall["Accuracy"]
AccuracyMULTrain

##  Accuracy 
## 0.8938053
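The accuracy returned by `confusionMatrix()` is the share of items on the diagonal of the confusion table. The base R sketch below shows the calculation with made-up labels, not the report's predictions:

```r
# Made-up class labels for nine items
actual    <- factor(c(1, 1, 1, 2, 2, 2, 2, 1, 2), levels = c(1, 2))
predicted <- factor(c(1, 1, 2, 2, 2, 1, 2, 1, 2), levels = c(1, 2))

# Rows are predicted classes, columns are actual classes
conf_table <- table(Predicted = predicted, Actual = actual)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf_table)) / sum(conf_table)
```

The values printed in this section are consistent with 113 train items and 27 test items: 101/113 = 0.8938053 and 21/27 = 0.7777778.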

4.4.3.Support Vector Machine

modelSVM <- svm(Class ~ ., data = datasetML, kernel = "linear", cost = 10)
predictedSVM <- predict(modelSVM, newdata = datasetMLTest)
AccuracySVM <- confusionMatrix(predictedSVM, datasetMLTest$Class)$overall["Accuracy"]
AccuracySVM

##  Accuracy 
## 0.7037037

predictedSVMTrain <- predict(modelSVM, newdata = datasetML)
AccuracySVMTrain <- confusionMatrix(predictedSVMTrain, datasetML$Class)$overall["Accuracy"]
AccuracySVMTrain

##  Accuracy 
## 0.8761062

4.4.4.Lasso

modelLasso <- glmnet(datasetML, y_train, alpha = 1, family = "binomial")
temp <- datasetMLTest
temp$Class <- as.numeric(temp$Class)
X_test <- as.matrix(temp)
predictedLasso <- predict(modelLasso, newx = X_test, type = "response")
predictedLasso <- ifelse(predictedLasso > 0.5, 1, 0)
AccuracyLasso <- sum(predictedLasso == y_test) / length(y_test)
AccuracyLasso

## [1] 69.7037

temp <- datasetML
temp$Class <- as.numeric(temp$Class)
X_train <- as.matrix(temp)
predictedLassoTrain <- predict(modelLasso, newx = X_train, type = "response")
predictedLassoTrain <- ifelse(predictedLassoTrain > 0.5, 1, 0)
AccuracyLassoTrain <- sum(predictedLassoTrain == y_train) / length(y_train)
AccuracyLassoTrain

## [1] 69.71681
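The values printed for Lasso are greater than 1, so they cannot be read as accuracies directly. A likely reason is that `predict()` on a `glmnet` fit returns one column of predictions per lambda value when `s` is not supplied, so `sum(predicted == y) / length(y)` adds up the matches over the whole lambda path. The same pattern appears in the Ridge section. Note also that `datasetML` still contains the `Class` column when it is passed as the predictor matrix. The base R sketch below reproduces the effect with a hypothetical probability matrix standing in for the `glmnet` output:

```r
# True labels for six items
y_true <- c(1, 0, 1, 1, 0, 0)

# Hypothetical predicted probabilities: one row per item, one column per lambda
prob <- cbind(
  s0 = c(0.4, 0.4, 0.4, 0.4, 0.4, 0.4),
  s1 = c(0.7, 0.3, 0.6, 0.4, 0.2, 0.6),
  s2 = c(0.9, 0.1, 0.8, 0.7, 0.2, 0.3)
)
pred <- ifelse(prob > 0.5, 1, 0)

# Summing over the whole matrix mixes all lambda values and can exceed 1
naive_ratio <- sum(pred == y_true) / length(y_true)

# Accuracy for each lambda value separately
accuracy_per_lambda <- colMeans(pred == y_true)
```

To report a single accuracy, one lambda has to be chosen, for example with `cv.glmnet()`, and passed to `predict()` through `s`.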

4.4.5.Ridge

modelRidge <- glmnet(datasetML, y_train, alpha = 0, family = "binomial")
temp <- datasetMLTest
temp$Class <- as.numeric(temp$Class)
X_test <- as.matrix(temp)
predictedRidge <- predict(modelRidge, newx = X_test, type = "response")
predictedRidge <- ifelse(predictedRidge > 0.5, 1, 0)
AccuracyRidge <- sum(predictedRidge == y_test) / length(y_test)
AccuracyRidge

## [1] 80.96296

temp <- datasetML
temp$Class <- as.numeric(temp$Class)
X_train <- as.matrix(temp)
predictedRidgeTrain <- predict(modelRidge, newx = X_train, type = "response")
predictedRidgeTrain <- ifelse(predictedRidgeTrain > 0.5, 1, 0)
AccuracyRidgeTrain <- sum(predictedRidgeTrain == y_train) / length(y_train)
AccuracyRidgeTrain

## [1] 81.36283

4.4.6.Backward Step

## [1] 0.7037037

## [1] 0.8672566

4.4.7.Forward Step

## [1] 0.7777778

## [1] 0.8938053

Model	Accuracy (train set)	Accuracy (test set)
Linear Model	0.840708	0.8518519
Multinomial-Logistic Model	0.8938053	0.7777778
SVM	0.8761062	0.7037037
Lasso	0.6971681	0.697037
Ridge	0.8136283	0.8096296
Backward Step	0.8672566	0.7037037
Forward Step	0.8938053	0.7777778

The Multinomial-Logistic Model and the Forward Step Model have the highest accuracy on the train set (89.4%), and the Linear Model has the highest accuracy on the test set (85.2%). When McDonald’s introduces new products in the future, such a model could help decide whether a new product belongs to the “Regular Menu” or the “Gourmet Menu”.

5.Conclusion

5.1.Analysis of unhealthy nutrients in different menus.

(1)Foods on the Gourmet Menu contain the highest average of: fat, saturated fat, cholesterol, sodium.
(2)Foods on the McCafe Menu contain the highest average of: trans fat.
(3)Foods on the Beverage Menu contain the highest average of: sugar.

Please note that we are giving average values and that foods in other menus may still contain higher levels of nutrients.
The results are for informational purposes only, and the meal suggestions given during the analysis are general. If you have a medical condition, please follow your doctor’s instructions for meals.

5.3.Use PCA to reduce dimensionality and see if there are any new findings.

We analyzed the correlation between nutrients and gave the contribution of each principal component after PCA.

5.4.Predict whether a particular item belongs to the Regular Menu or not.

The Multinomial-Logistic Model and Forward Step Model achieved the highest prediction accuracy of 89.4% on the training set, while the Linear Model had the highest prediction accuracy of 85.2% on the test set.
These models can be utilized by McDonald’s to determine whether new products belong to their “Regular Menu” or “Gourmet Menu” categories in the future. This will help them make informed decisions when introducing new items to their menu.

6.Reference

This project was inspired by the following resources. After reading them, I wrote my own program and documentation in conjunction with the course content.

[1][McDonald’s India : Menu Nutrition Dataset](https://www.kaggle.com/datasets/deepcontractor/mcdonalds-india-menu-nutrition-facts)

[2][McD India - Exploratory Work in R - Graphs & PCA](https://www.kaggle.com/code/rsangole/mcd-india-exploratory-work-in-r-graphs-pca/report#graphical-eda)

[3][McDonalds Menu EDA](https://www.kaggle.com/code/prathameshgadekar/mcdonalds-menu-eda)

CHEN Guang 22067084g
