MVA Homework 1

mydata <- read.csv("C:/Users/Tajda/Downloads/burger-king-menu (2).csv") [,c(1,2,3,5,12,13)]

head(mydata)

##                                   Item Category Calories Fat..g. Sugars..g.
## 1                    Whopper® Sandwich  Burgers      660      40         11
## 2        Whopper® Sandwich with Cheese  Burgers      740      46         11
## 3     Bacon & Cheese Whopper® Sandwich  Burgers      790      51         11
## 4             Double Whopper® Sandwich  Burgers      900      58         11
## 5 Double Whopper® Sandwich with Cheese  Burgers      980      64         11
## 6             Triple Whopper® Sandwich  Burgers     1130      75         11
##   Protein..g.
## 1          28
## 2          32
## 3          35
## 4          48
## 5          52
## 6          67

colnames(mydata) <- c ("Item", "Category", "Calories", "Fat(g)", "Sugars(g)", "Proteins(g)")


head(mydata)

##                                   Item Category Calories Fat(g) Sugars(g)
## 1                    Whopper® Sandwich  Burgers      660     40        11
## 2        Whopper® Sandwich with Cheese  Burgers      740     46        11
## 3     Bacon & Cheese Whopper® Sandwich  Burgers      790     51        11
## 4             Double Whopper® Sandwich  Burgers      900     58        11
## 5 Double Whopper® Sandwich with Cheese  Burgers      980     64        11
## 6             Triple Whopper® Sandwich  Burgers     1130     75        11
##   Proteins(g)
## 1          28
## 2          32
## 3          35
## 4          48
## 5          52
## 6          67

Description: This data set contains nutritional information for all major menu items offered by Burger King. It records the calories, total fat, sugars and protein in each menu item.

Unit of observation: a menu item offered by Burger King.

Sample size: 77 (units of observation).

Calories: energy content of each menu item, in kilocalories (kcal).

Fat: grams (g) of fat in each menu item.

Sugars: grams (g) of sugars in each menu item.

Proteins: grams (g) of protein in each menu item.

Source: https://www.kaggle.com/datasets/mattop/burger-king-menu-nutrition-data?select=burger-king-menu.csv

Main goal of this data analysis (research question): How do the categories of menu items (burgers, breakfast and chicken) differ in calories, fat, sugars and protein?

mydata$CategoryFactor <- factor(mydata$Category,
                                labels = c (0, 1, 2),
                                levels = c ("Burgers", "Chicken", "Breakfast"))
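One quick way to check that the recoding behaved as intended is to cross-tabulate the original column against the new factor. The sketch below uses a small made-up data frame (`toy`, not the Burger King data) and assumes that some rows may belong to a category outside the three listed levels (the hypothetical "Sides" row); `factor()` turns such rows into `NA`.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(Category = c("Burgers", "Chicken", "Breakfast", "Burgers", "Sides"))

toy$CategoryFactor <- factor(toy$Category,
                             levels = c("Burgers", "Chicken", "Breakfast"),
                             labels = c(0, 1, 2))

# Cross-tabulate original vs. recoded values; useNA shows unmatched rows
print(table(toy$Category, toy$CategoryFactor, useNA = "ifany"))

# Number of rows whose category was not among the listed levels
n_unmatched <- sum(is.na(toy$CategoryFactor))
print(n_unmatched)
```

Running the same two checks on `mydata` would show whether every menu item falls into one of the three categories being compared.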

library(pastecs)
round(stat.desc(mydata[,c(-1,-2,-7)]), 1)

##              Calories Fat(g) Sugars(g) Proteins(g)
## nbr.val          77.0   77.0      77.0        77.0
## nbr.null          0.0    4.0      13.0         8.0
## nbr.na            0.0    0.0       0.0         0.0
## min              10.0    0.0       0.0         0.0
## max            1220.0   84.0      40.0        71.0
## range          1210.0   84.0      40.0        71.0
## sum           38610.0 2384.5     511.0      1610.0
## median          430.0   28.0       6.0        17.0
## mean            501.4   31.0       6.6        20.9
## SE.mean          35.1    2.3       0.8         2.0
## CI.mean.0.95     69.8    4.7       1.6         3.9
## var           94625.6  421.7      48.6       294.0
## std.dev         307.6   20.5       7.0        17.1
## coef.var          0.6    0.7       1.1         0.8

CALORIES:

Max: The highest number of calories in a menu item (unit of observation) is 1220 kcal.

Mean: The average number of calories of menu items is 501.4 kcal.

FAT:

Median: Half of the menu items contain at most 28 g of fat and the other half contain at least 28 g.

Mean: The average fat content of a menu item is 31.0 g.

SUGAR:

Min: The lowest sugar content of a menu item is 0.0 g. Max: The highest sugar content is 40.0 g. Range: The difference between the maximum and the minimum is therefore 40.0 g.

PROTEINS:

nbr.val: number of units of observation is 77.

CI.mean.0.95: With 95% confidence, the mean protein content of menu items lies between 20.9 - 3.9 = 17.0 g and 20.9 + 3.9 = 24.8 g.
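The half-width of 3.9 g can be reproduced approximately from the other rows of the table. This is only a sketch: it assumes the interval is the usual t-based one (standard error times the 0.975 quantile of the t distribution with n - 1 degrees of freedom), and it starts from the rounded table values, so the last digit of intermediate results may differ slightly from the table.

```r
# Values read from the Proteins(g) column of the descriptive statistics
n <- 77     # nbr.val
m <- 20.9   # mean
s <- 17.1   # std.dev

se         <- s / sqrt(n)                  # standard error of the mean
half_width <- qt(0.975, df = n - 1) * se   # t-based 95% margin (assumed method)

ci <- c(lower = m - half_width, upper = m + half_width)
print(round(half_width, 1))
print(round(ci, 1))
```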

Histograms

hist(mydata$`Fat(g)`, 
     main = "Distribution of fat (grams) in menu items", 
     xlab = "Fat (g) in menu item", 
     ylab = "Frequency",
     breaks = seq(from = 0, to = 100, by = 5))

The distribution does not look normal; it is positively skewed, with a long right tail. The modal class (the bin that contains the most menu items) is 15-20 g of fat.
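The modal class can also be read off numerically instead of by eye. The sketch below uses made-up fat values (`fat` is hypothetical, not the menu data) with the same 5 g bins as the histogram, and adds a rough skewness check: a mean above the median is consistent with positive skew, although it does not prove it.

```r
# Hypothetical fat values (g), for illustration only
fat <- c(0, 3, 8, 12, 16, 17, 18, 19, 22, 28, 35, 40, 58, 75)

# Same bins as the histogram, but without drawing it
h <- hist(fat, breaks = seq(from = 0, to = 100, by = 5), plot = FALSE)

i <- which.max(h$counts)                       # index of the tallest bar
modal_bin <- c(h$breaks[i], h$breaks[i + 1])   # lower and upper edge of that bin
print(modal_bin)

# Rough check for positive skew: mean larger than median
right_skew_hint <- mean(fat) > median(fat)
print(right_skew_hint)
```

In the real data the same pattern appears in the descriptive statistics, where the mean fat content (31.0 g) is above the median (28.0 g).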

hist(mydata$Calories, 
     main = "Number of calories (kcal) in menu items", 
     xlab = "Calories (kcal)", 
     ylab = "Frequency",
     breaks = seq(from = 0, to = 1500, by = 50))

According to the histogram, the distribution of calories looks slightly positively skewed.

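library(ggplot2)  # ggplot(), geom_boxplot()
library(ggpubr)   # ggarrange()

# Boxplot of calories, one box per category
CAL <- ggplot(mydata, aes(y=Calories , fill=Category)) +
  geom_boxplot(position=position_dodge(1)) +
  ggtitle("Calories") +
  ylab("Calories (kcal)") + 
  ylim(0,1500) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())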

# Refer to the column by name (backticks are needed because of the parentheses)
FAT <- ggplot(mydata, aes(y=`Fat(g)` , fill=Category)) +
  geom_boxplot(position=position_dodge(1)) +
  ggtitle("Fat") +
  ylab("Fat (g)") + 
  ylim(0,100) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

SUG <- ggplot(mydata, aes(y=`Sugars(g)` , fill=Category)) +
  geom_boxplot(position=position_dodge(1)) +
  ggtitle("Sugars") +
  ylab("Sugars (g)") + 
  ylim(0,40) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

PROT <- ggplot(mydata, aes(y=`Proteins(g)` , fill=Category)) +
  geom_boxplot(position=position_dodge(1)) +
  ggtitle("Proteins") +
  ylab("Proteins (g)") + 
  ylim(0,100) +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

ggarrange(CAL, FAT, SUG, PROT,
         ncol=2, nrow=2)

Each boxplot shows the median, the first and third quartiles and the whiskers (the corresponding numbers for the whole sample can be seen in the descriptive statistics). The tip of the lower whisker marks the smallest value that is not an outlier, and the tip of the upper whisker marks the largest value that is not an outlier; when a category has no outliers, these are its minimum and maximum. The points beyond the whiskers are outliers. For example, in the Sugars boxplot there are 5 outliers in the Breakfast category and 1 in the Burgers category.

The median is the horizontal line inside the box. The ends of the box are the first and third quartiles, so the height of the box is the interquartile range, which measures the spread of the middle half of the data.

Looking at the maximum values, the Burgers category has the highest calories, fat, sugars and protein. On the other hand, the Breakfast category has the lowest calories, fat and protein, but not the lowest sugars.
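The numbers behind a single box can be inspected with base R's `boxplot.stats()`. The sketch below uses made-up sugar values for one category (`sugar` is hypothetical). It relies on the common 1.5 x IQR rule for flagging outliers; `geom_boxplot()` uses the same rule by default but computes the quartiles slightly differently, so the two can disagree a little on small samples.

```r
# Hypothetical sugar values (g) for one category, for illustration only
sugar <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 30)

bs <- boxplot.stats(sugar)   # default coef = 1.5

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# Values drawn as separate points (outliers)
print(bs$out)
```

Here the upper whisker stops at 8 rather than at the maximum of 30, because 30 lies more than 1.5 box heights above the upper hinge and is therefore plotted as an outlier.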


Tajda Korošec

2023-01-05
