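library(ggplot2)  # ggplot(), geom_boxplot()
library(ggpubr)   # ggarrange(); assumed to be the package this function comes from
mydata <- read.csv("C:/Users/Tajda/Downloads/burger-king-menu (2).csv")[, c(1, 2, 3, 5, 12, 13)]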
head(mydata)
## Item Category Calories Fat..g. Sugars..g.
## 1 Whopper® Sandwich Burgers 660 40 11
## 2 Whopper® Sandwich with Cheese Burgers 740 46 11
## 3 Bacon & Cheese Whopper® Sandwich Burgers 790 51 11
## 4 Double Whopper® Sandwich Burgers 900 58 11
## 5 Double Whopper® Sandwich with Cheese Burgers 980 64 11
## 6 Triple Whopper® Sandwich Burgers 1130 75 11
## Protein..g.
## 1 28
## 2 32
## 3 35
## 4 48
## 5 52
## 6 67
colnames(mydata) <- c("Item", "Category", "Calories", "Fat(g)", "Sugars(g)", "Proteins(g)")
head(mydata)
## Item Category Calories Fat(g) Sugars(g)
## 1 Whopper® Sandwich Burgers 660 40 11
## 2 Whopper® Sandwich with Cheese Burgers 740 46 11
## 3 Bacon & Cheese Whopper® Sandwich Burgers 790 51 11
## 4 Double Whopper® Sandwich Burgers 900 58 11
## 5 Double Whopper® Sandwich with Cheese Burgers 980 64 11
## 6 Triple Whopper® Sandwich Burgers 1130 75 11
## Proteins(g)
## 1 28
## 2 32
## 3 35
## 4 48
## 5 52
## 6 67
hist(mydata$`Fat(g)`,
main = "Distribution of fat (grams) in menu items",
xlab = "Fat (g) in menu item",
ylab = "Frequency",
breaks = seq(from = 0, to = 100, by = 5))
The distribution of fat does not look normal; it is positively (right) skewed. The mode, which on a histogram is the bin with the highest frequency, is 15-20 grams of fat.
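The visual impression of skewness and the modal bin can also be checked numerically. The following is a minimal sketch, not part of the original analysis: `sample_skewness()` is a hypothetical helper (moment-based skewness, where a positive value indicates a longer right tail), and the `fat` vector holds made-up illustrative values rather than the menu data.

```r
# Hypothetical helper: moment-based sample skewness.
# Positive result => longer right tail (positive skew).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Illustrative values only (assumption): NOT taken from the menu data.
fat <- c(3, 12, 16, 17, 18, 19, 22, 28, 41, 75)

skew <- sample_skewness(fat)
skew > 0  # TRUE would agree with a positively skewed histogram

# Modal bin, using the same 5 g breaks as the histogram.
# right = TRUE and include.lowest = TRUE mirror the defaults of hist().
bins <- cut(fat, breaks = seq(from = 0, to = 100, by = 5),
            right = TRUE, include.lowest = TRUE)
counts <- table(bins)
modal_bin <- names(counts)[which.max(counts)]
modal_bin
```

On the real data, the same calls could be applied to ``mydata$`Fat(g)` `` in place of the illustrative `fat` vector.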
hist(mydata$Calories,
main = "Number of calories (kcal) in menu items",
xlab = "Calories (kcal)",
ylab = "Frequency",
breaks = seq(from = 0, to = 1500, by = 50))
According to the histogram, the distribution of calories looks slightly positively skewed.
# One boxplot per category for each nutrient; x-axis labels are hidden
# because the fill legend already identifies the categories.
CAL <- ggplot(mydata, aes(y = Calories, fill = Category)) +
  geom_boxplot(position = position_dodge(1)) +
  ggtitle("Calories") +
  ylab("Calories (kcal)") +
  ylim(0, 1500) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
# Columns are referenced by name inside aes() instead of mydata[, 4] etc.
FAT <- ggplot(mydata, aes(y = `Fat(g)`, fill = Category)) +
  geom_boxplot(position = position_dodge(1)) +
  ggtitle("Fat") +
  ylab("Fat (g)") +
  ylim(0, 100) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
SUG <- ggplot(mydata, aes(y = `Sugars(g)`, fill = Category)) +
  geom_boxplot(position = position_dodge(1)) +
  ggtitle("Sugars") +
  ylab("Sugars (g)") +
  ylim(0, 40) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
PROT <- ggplot(mydata, aes(y = `Proteins(g)`, fill = Category)) +
  geom_boxplot(position = position_dodge(1)) +
  ggtitle("Proteins") +
  ylab("Proteins (g)") +
  ylim(0, 100) +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
# Arrange the four plots in a 2 x 2 grid
ggarrange(CAL, FAT, SUG, PROT,
          ncol = 2, nrow = 2)
Each boxplot shows the median, the first and third quartiles, and the range of the non-outlying values (the exact numbers are given in the descriptive statistics). The ends of the box mark the first and the third quartile, so the height of the box is the interquartile range (IQR), which measures the spread of the middle half of the data. The horizontal line inside the box is the median. The lower whisker ends at the smallest value and the upper whisker at the largest value that still lies within 1.5 times the IQR of the box. Points drawn beyond the whiskers are outliers; when a group has no outliers, the whisker tips coincide with its minimum and maximum. For example, in the Sugars boxplot there are 5 outliers in the Breakfast category and 1 in the Burgers category.

Looking at the maximum values, the Burgers category has the highest calories, fat, sugars and protein. The Breakfast category, on the other hand, has the lowest calories, fat and protein, but not the lowest sugars.
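The quantities read off a boxplot can be reproduced by hand. The sketch below assumes the common 1.5 x IQR rule with quartiles from `quantile()`, which is to my understanding what `geom_boxplot()` uses by default. `box_summary()` is a hypothetical helper and the `sugars` values are made up for illustration; they are not from the menu data.

```r
# Hypothetical helper: the numbers a boxplot displays for one group.
box_summary <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[3] + coef * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]
  list(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
       whiskers = range(inside),  # whisker tips: extremes inside the fences
       outliers = x[x < lower_fence | x > upper_fence])
}

# Illustrative values only (assumption): hypothetical sugar amounts in grams.
sugars <- c(1, 2, 2, 3, 3, 4, 5, 20)

res <- box_summary(sugars)
res$whiskers  # whisker tips, not necessarily the overall min and max
res$outliers  # points that would be drawn individually beyond the whiskers

# Applied per category to the real data, this could look like:
# lapply(split(mydata$`Sugars(g)`, mydata$Category), box_summary)
```

In this toy vector the largest value lies above the upper fence, so the upper whisker stops at the next largest value and the extreme point is reported as an outlier, which is why a whisker tip is not always the maximum.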