mydata4 <- read.csv("C:/Users/MihaPristov/Desktop/faks/R-multiverse/Homework 1/burger-king-menu.csv", header=TRUE, sep=",", dec=".")

head(mydata4)
##                                   Item Category Calories Fat.Calories Fat..g.
## 1                    Whopper® Sandwich  Burgers      660          360      40
## 2        Whopper® Sandwich with Cheese  Burgers      740          420      46
## 3     Bacon & Cheese Whopper® Sandwich  Burgers      790          460      51
## 4             Double Whopper® Sandwich  Burgers      900          520      58
## 5 Double Whopper® Sandwich with Cheese  Burgers      980          580      64
## 6             Triple Whopper® Sandwich  Burgers     1130          680      75
##   Saturated.Fat..g. Trans.Fat..g. Cholesterol..mg. Sodium..mg. Total.Carb..g.
## 1                12           1.5               90         980             49
## 2                16           2.0              115        1340             50
## 3                17           2.0              125        1560             50
## 4                20           3.0              175        1050             49
## 5                24           3.0              195        1410             50
## 6                28           4.0              255        1120             49
##   Dietary.Fiber..g. Sugars..g. Protein..g. Weight.Watchers
## 1                 2         11          28             655
## 2                 2         11          32             735
## 3                 2         11          35             783
## 4                 2         11          48             883
## 5                 2         11          52             963
## 6                 2         11          67            1102

The data consist of Burger King menu items. The unit of observation is a menu item that is either a burger, a chicken product or a breakfast item. As we are only interested in these three categories and the data set was last updated one month ago, this is technically a census.

Description of variables:

- Item: Name of the item
- Category: Category to which the item belongs
- Calories: Number of calories in the item
- Fat Calories: Number of fat calories in the item
- Fat in grams: Amount of fat in the item, in grams
- Saturated Fat in grams: Amount of saturated fat in the item, in grams
- Trans Fat in grams: Amount of trans fat in the item, in grams
- Other Calories: Calories in the item that are not fat calories (Calories - Fat Calories)

Units of measurement: kcal, grams
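The Other Calories variable described above does not appear among the columns printed by `head(mydata4)`. A minimal sketch of how it could be derived, using the first three printed rows as a toy data frame (the names `toy` and `Other.Calories` are illustrative assumptions, not part of the original analysis):

```r
# Toy data: first three rows shown by head(mydata4)
toy <- data.frame(
  Item         = c("Whopper Sandwich",
                   "Whopper Sandwich with Cheese",
                   "Bacon & Cheese Whopper Sandwich"),
  Calories     = c(660, 740, 790),
  Fat.Calories = c(360, 420, 460)
)

# Other calories = total calories minus calories from fat
toy$Other.Calories <- toy$Calories - toy$Fat.Calories

toy$Other.Calories
```

The same subtraction applied to the full data frame would add the described variable as a new column.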

# Drop columns 8 to 14 (cholesterol through Weight Watchers), keeping the fat-related variables
mydata5 <- mydata4[-(8:14)]

mydata5$CategoryF <- factor(mydata5$Category,
                            levels = c("Burgers","Chicken","Breakfast"),
                            labels = c(1,2,3) )

colnames(mydata5) <- c("Item","Category","Calories","Fat Calories","Fatingrams","Saturated fat in grams","Trans Fat in grams","category")
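To make the recoding above concrete, here is a small sketch on a made-up category vector (the vector `category` and the object names are hypothetical). It also shows that a level stays stored on the factor after its rows are filtered out, and that `droplevels()` removes it, which matters when a later test expects exactly two groups:

```r
# Toy category vector (illustrative, not the real data)
category <- c("Burgers", "Chicken", "Breakfast", "Burgers")

# Same recoding as above: Burgers -> 1, Chicken -> 2, Breakfast -> 3
category_f <- factor(category,
                     levels = c("Burgers", "Chicken", "Breakfast"),
                     labels = c(1, 2, 3))

as.character(category_f)

# After filtering out Breakfast, level "3" is still stored on the factor;
# droplevels() removes it so only the two compared groups remain
two_groups <- droplevels(category_f[category != "Breakfast"])
levels(two_groups)
```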

Research question: Do Burgers and Chicken differ in Fat calories?

H0: The means of Fat calories for categories Burgers and Chicken are the same (μ1 = μ2).

H1: The means of Fat calories for categories Burgers and Chicken are not the same (μ1 ≠ μ2).

##install.packages("tidyverse")
library(magrittr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
mydata5 %>%
   group_by(Category) %>%
  summarise(mean_fat = round(mean(`Fat Calories`), 2)) %>%
   rename("Means of Fat Calories" = "mean_fat")
## # A tibble: 3 × 2
##   Category  `Means of Fat Calories`
##   <chr>                       <dbl>
## 1 Breakfast                    214.
## 2 Burgers                      351.
## 3 Chicken                      292.
mydata6 <- mydata5 %>% filter(Category != "Breakfast")
library(ggplot2)


ggplot(mydata6, aes(x= `Fat Calories`)) + 
  geom_histogram(binwidth = 50) +
  facet_wrap (~Category, ncol = 1 )

From the graph we can see that neither Burgers nor Chicken have a normal distribution of Fat calories. This is why we move to a non-parametric test.

H0: The medians of Fat calories for categories Burgers and Chicken are the same (Me1 = Me2).

H1: The medians of Fat calories for categories Burgers and Chicken are not the same (Me1 ≠ Me2).

wilcox.test(mydata6$`Fat Calories`~ mydata6$category,
            paired = FALSE,
            correct = FALSE,
            exact = FALSE,
            alternative = "two.sided")
## 
##  Wilcoxon rank sum test
## 
## data:  mydata6$`Fat Calories` by mydata6$category
## W = 260.5, p-value = 0.5268
## alternative hypothesis: true location shift is not equal to 0
library(effectsize)

effectsize(wilcox.test(mydata6$`Fat Calories`~ mydata6$category,
            paired = FALSE,
            correct = FALSE,
            exact = FALSE,
            alternative = "two.sided"))
## r (rank biserial) |        95% CI
## ---------------------------------
## 0.11              | [-0.23, 0.43]
interpret_rank_biserial(0.11)
## [1] "small"
## (Rules: funder2019)

Using the data, we are unable to demonstrate that Fat calories differ between Burgers and Chicken menu items (p = 0.53; the effect size is small, r = 0.11).
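As a rough cross-check of the reported effect size, the rank-biserial correlation for two independent groups can be written as r = 2W / (n1 · n2) - 1, where W is the rank-sum statistic for the first group. The sketch below plugs in W from the test output and assumes group sizes of 26 Burgers and 18 Chicken items, taken from the pairwise comparison table later in this document. It illustrates the formula and is not the internal code of `effectsize`.

```r
W  <- 260.5  # statistic reported by wilcox.test()
n1 <- 26     # Burgers (assumed from the pairwise table)
n2 <- 18     # Chicken (assumed from the pairwise table)

# Rank-biserial correlation from the rank-sum statistic
r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 2)
```

Under these assumptions the result rounds to 0.11, in line with the value reported above.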

Research question 2: Do the arithmetic means of fat in grams differ between Burgers, Chicken and Breakfast items?

H0: μ(Burgers) = μ(Chicken) = μ(Breakfast). The means of fat in grams for categories Burgers, Chicken and Breakfast are the same.

H1: At least one μ is different.

library(magrittr)
library(dplyr)

mydata5 %>%
   group_by(Category) %>%
  summarise(mean_fat = round(mean(Fatingrams), 2)) %>%
   rename("Means of Fat in grams" = "mean_fat")
## # A tibble: 3 × 2
##   Category  `Means of Fat in grams`
##   <chr>                       <dbl>
## 1 Breakfast                    23.8
## 2 Burgers                      39.0
## 3 Chicken                      32.5
##install.packages("stats")
library(rstatix)
## 
## Attaching package: 'rstatix'
## The following objects are masked from 'package:effectsize':
## 
##     cohens_d, eta_squared
## The following object is masked from 'package:stats':
## 
##     filter
library(dplyr)

mydata5 %>%
  group_by(Category) %>%
  shapiro_test(Fatingrams)
## # A tibble: 3 × 4
##   Category  variable   statistic      p
##   <chr>     <chr>          <dbl>  <dbl>
## 1 Breakfast Fatingrams     0.946 0.0985
## 2 Burgers   Fatingrams     0.902 0.0171
## 3 Chicken   Fatingrams     0.948 0.397

From the p-values we see that only Burgers (p = 0.017) fails the Shapiro-Wilk test of normality at the 5% level. Breakfast is borderline (p = 0.099, significant only at the 10% level), and Chicken's p-value (p = 0.397) is too high to reject H0 that fat in grams is normally distributed. Since normality does not hold in every group, we still have to move to a non-parametric test.

New hypotheses:

H0: The distribution location of fat in grams is the same in all categories.

H1: The distribution location of fat in grams differs for at least one category.
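Reading the Shapiro-Wilk table against a fixed significance level can be sketched as follows. The p-values are copied from the output above; the 5% level `alpha` is an assumption, since the text does not state which level is used.

```r
# p-values copied from the shapiro_test() output above
p_values <- c(Breakfast = 0.0985, Burgers = 0.0171, Chicken = 0.397)
alpha <- 0.05  # assumed significance level

# TRUE means normality is rejected for that category
reject_normality <- p_values < alpha
reject_normality

# A parametric ANOVA is questionable if any group rejects normality
any(reject_normality)
```

With a 10% level instead, Breakfast would also be flagged, so the choice of `alpha` changes which groups count as non-normal but not the decision to use a non-parametric test.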

kruskal.test(Fatingrams ~ Category,
             data=mydata5)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Fatingrams by Category
## Kruskal-Wallis chi-squared = 5.3735, df = 2, p-value = 0.0681
library(rstatix)

kruskal_effsize(Fatingrams ~ Category,
             data=mydata5)
## # A tibble: 1 × 5
##   .y.            n effsize method  magnitude
## * <chr>      <int>   <dbl> <chr>   <ord>    
## 1 Fatingrams    77  0.0456 eta2[H] small
groups_nonpar <- wilcox_test(Fatingrams ~ Category,
            paired=FALSE,
            p.adjust.method = "bonferroni",
            data = mydata5)

groups_nonpar
## # A tibble: 3 × 9
##   .y.        group1    group2     n1    n2 statistic     p p.adj p.adj.signif
## * <chr>      <chr>     <chr>   <int> <int>     <dbl> <dbl> <dbl> <chr>       
## 1 Fatingrams Breakfast Burgers    33    26      286. 0.03  0.09  ns          
## 2 Fatingrams Breakfast Chicken    33    18      220. 0.134 0.402 ns          
## 3 Fatingrams Burgers   Chicken    26    18      260  0.543 1     ns

From the data we could not show that the distribution of fat in grams differs for at least one of the menu item categories (χ² = 5.37, p = 0.07); the effect size was small (η² = 0.046). Post-hoc tests revealed no significant differences for any pair of groups (Bonferroni-adjusted p > 0.05).
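The two derived quantities in this section can be reproduced by hand from the printed output. The sketch below assumes the usual definition of the Kruskal-Wallis eta squared, (H - k + 1) / (n - k), and uses base R's `p.adjust()` for the Bonferroni correction. All inputs are copied from the output above, so the raw p-values are already rounded.

```r
# Kruskal-Wallis eta squared: (H - k + 1) / (n - k)
H <- 5.3735  # chi-squared statistic from kruskal.test()
k <- 3       # number of categories
n <- 77      # number of menu items (from the kruskal_effsize output)
eta2_h <- (H - k + 1) / (n - k)
round(eta2_h, 4)

# Bonferroni: multiply each raw p-value by the number of comparisons, cap at 1
p_raw <- c(0.03, 0.134, 0.543)  # pairwise p-values from wilcox_test()
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj
```

Both results agree with the table: the effect size rounds to 0.0456, and the adjusted p-values are 0.09, 0.402 and 1.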

Research question 3: Does a Burger King menu item have more calories than a McDonald's menu item in the US?

The average McDonald's menu item in the US has 368.5 calories according to a data set from Kaggle: https://www.kaggle.com/datasets/mcdonalds/nutrition-facts

The research question is whether Burger King menu items have more calories on average than the average McDonald's menu item in the US.

H0: The means of calories for Burger King menu items and McDonald's menu items are the same (μ = μ0).

H1: The means of calories for Burger King menu items and McDonald's menu items are not the same (μ ≠ μ0).

library(magrittr)
library(dplyr)


mydata5 %>%
   summarise(mean_calories = round(mean(`Calories`), 2)) %>%
   rename("Means of Item Calories" = "mean_calories")
##   Means of Item Calories
## 1                 501.43
library(ggplot2)


ggplot(mydata5, aes(x= `Calories`)) + 
  geom_histogram(binwidth = 30)

shapiro_test(mydata5$Calories)
## # A tibble: 1 × 3
##   variable         statistic p.value
##   <chr>                <dbl>   <dbl>
## 1 mydata5$Calories     0.963  0.0252

We see that the calories of Burger King menu items are not normally distributed: we reject the null hypothesis of normality (p = 0.025). We therefore move to a non-parametric test. New hypotheses:

H0: The medians of calories for Burger King menu items and McDonald's menu items are the same (Me = Me0), where Me0 = 340 is the reference value passed to the test as mu.

H1: The medians of calories for Burger King menu items and McDonald's menu items are not the same (Me ≠ Me0).

median(mydata5$Calories)
## [1] 430
wilcox.test(mydata5$Calories,
            mu = 340,
            correct=FALSE,
            exact= FALSE,
            alternative = "two.sided")
## 
##  Wilcoxon signed rank test
## 
## data:  mydata5$Calories
## V = 2197, p-value = 0.0001443
## alternative hypothesis: true location is not equal to 340

We can reject the null hypothesis (p < 0.001).

effectsize(wilcox.test(mydata5$Calories,
            mu = 340,
            correct=FALSE,
            exact= FALSE,
            alternative = "two.sided"))
## r (rank biserial) |       95% CI
## --------------------------------
## 0.50              | [0.28, 0.67]
## 
## - Deviation from a difference of 340.
interpret_rank_biserial(0.5, rules = "funder2019")
## [1] "very large"
## (Rules: funder2019)

Based on the data, we found that the median calories of Burger King menu items (430) is higher than the reference median of 340 for McDonald's menu items (p < 0.001, r = 0.50, very large effect).