mydata4 <- read.csv("C:/Users/MihaPristov/Desktop/faks/R-multiverse/Homework 1/burger-king-menu.csv", header=TRUE, sep=",", dec=".")
head(mydata4)
## Item Category Calories Fat.Calories Fat..g.
## 1 Whopper® Sandwich Burgers 660 360 40
## 2 Whopper® Sandwich with Cheese Burgers 740 420 46
## 3 Bacon & Cheese Whopper® Sandwich Burgers 790 460 51
## 4 Double Whopper® Sandwich Burgers 900 520 58
## 5 Double Whopper® Sandwich with Cheese Burgers 980 580 64
## 6 Triple Whopper® Sandwich Burgers 1130 680 75
## Saturated.Fat..g. Trans.Fat..g. Cholesterol..mg. Sodium..mg. Total.Carb..g.
## 1 12 1.5 90 980 49
## 2 16 2.0 115 1340 50
## 3 17 2.0 125 1560 50
## 4 20 3.0 175 1050 49
## 5 24 3.0 195 1410 50
## 6 28 4.0 255 1120 49
## Dietary.Fiber..g. Sugars..g. Protein..g. Weight.Watchers
## 1 2 11 28 655
## 2 2 11 32 735
## 3 2 11 35 783
## 4 2 11 48 883
## 5 2 11 52 963
## 6 2 11 67 1102
The data consist of Burger King menu items. The unit of observation is a single menu item that is a burger, a chicken product, or a breakfast item. Because we are interested only in these three categories and the data set was last updated one month ago, this is technically a census rather than a sample.
Description of variables:
- Item: name of the item
- Category: category to which the item belongs
- Calories: number of calories in the item
- Fat Calories: number of fat calories in the item
- Fat in grams: amount of fat in grams in the item
- Saturated Fat in grams: amount of saturated fat in grams in the item
- Trans Fat in grams: amount of trans fat in grams in the item
- Other Calories: calories in the item that are not fat calories (Calories - Fat Calories)

Units of measurement: kcal, grams
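The variable list defines Other Calories as Calories minus Fat Calories, but none of the code shown computes it. A minimal sketch of that derivation, using only the six rows printed by head(mydata4); the data frame name `menu` and the column name `Other.Calories` are hypothetical and chosen for illustration:

```r
# First six rows as printed by head(mydata4); `menu` is an illustrative name
menu <- data.frame(
  Calories     = c(660, 740, 790, 900, 980, 1130),
  Fat.Calories = c(360, 420, 460, 520, 580, 680)
)

# Other calories = total calories minus fat calories (definition from the variable list)
menu$Other.Calories <- menu$Calories - menu$Fat.Calories

print(menu)
```

The same subtraction could be applied to the full data frame once it has been read in.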
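# Drop columns 8 to 14 (cholesterol through Weight Watchers), which are not analysed
mydata5 <- mydata4[c(-8,-9,-10,-11,-12,-13,-14)]
# Encode Category as a factor: 1 = Burgers, 2 = Chicken, 3 = Breakfast
mydata5$CategoryF <- factor(mydata5$Category,
                            levels = c("Burgers","Chicken","Breakfast"),
                            labels = c(1,2,3))
# Rename the columns; the new factor column becomes "category"
colnames(mydata5) <- c("Item","Category","Calories","Fat Calories","Fatingrams","Saturated fat in grams","Trans Fat in grams","category")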
H0: The means of fat calories for the categories Burgers and Chicken are the same (μ1 = μ2).
H1: The means of fat calories for the categories Burgers and Chicken are not the same (μ1 ≠ μ2).
##install.packages("tidyverse")
library(magrittr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mydata5 %>%
group_by(Category) %>%
summarise(mean_fat = round(mean(`Fat Calories`), 2)) %>%
rename("Means of Fat Calories" = "mean_fat")
## # A tibble: 3 × 2
## Category `Means of Fat Calories`
## <chr> <dbl>
## 1 Breakfast 214.
## 2 Burgers 351.
## 3 Chicken 292.
mydata6 <- mydata5 %>% filter(Category != "Breakfast")
library(ggplot2)
ggplot(mydata6, aes(x= `Fat Calories`)) +
geom_histogram(binwidth = 50) +
facet_wrap (~Category, ncol = 1 )
From the histograms we can see that fat calories are not normally distributed for either Burgers or Chicken. We therefore move to a nonparametric test.
H0: The medians of fat calories for the categories Burgers and Chicken are the same (Me1 = Me2).
H1: The medians of fat calories for the categories Burgers and Chicken are not the same (Me1 ≠ Me2).
wilcox.test(mydata6$`Fat Calories`~ mydata6$category,
paired = FALSE,
correct = FALSE,
exact = FALSE,
alternative = "two.sided")
##
## Wilcoxon rank sum test
##
## data: mydata6$`Fat Calories` by mydata6$category
## W = 260.5, p-value = 0.5268
## alternative hypothesis: true location shift is not equal to 0
library(effectsize)
effectsize(wilcox.test(mydata6$`Fat Calories`~ mydata6$category,
paired = FALSE,
correct = FALSE,
exact = FALSE,
alternative = "two.sided"))
## r (rank biserial) | 95% CI
## ---------------------------------
## 0.11 | [-0.23, 0.43]
interpret_rank_biserial(0.11)
## [1] "small"
## (Rules: funder2019)
Based on the data, we cannot demonstrate that fat calories differ between burger and chicken menu items (W = 260.5, p = 0.53); the effect size is small (r = 0.11).
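The reported effect size can be cross-checked by hand from the test statistic. The sketch below assumes the common formula r = 2W / (n1 * n2) - 1 for the rank-biserial correlation and takes the group sizes (26 burgers, 18 chicken items) from the post-hoc table further down; it is a plausibility check, not necessarily the exact computation effectsize performs internally.

```r
W  <- 260.5  # Wilcoxon rank sum statistic reported above
n1 <- 26     # number of Burgers items (from the post-hoc table)
n2 <- 18     # number of Chicken items (from the post-hoc table)

# Rank-biserial correlation: rescales W from [0, n1 * n2] to [-1, 1]
r_rb <- 2 * W / (n1 * n2) - 1
round(r_rb, 2)
```

The rounded value agrees with the 0.11 printed by effectsize().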
H0: μ(Burgers) = μ(Chicken) = μ(Breakfast). The means of fat in grams for Burgers, Chicken, and Breakfast items are the same.
H1: At least one μ differs from the others.
library(magrittr)
library(dplyr)
mydata5 %>%
group_by(Category) %>%
summarise(mean_fat = round(mean(Fatingrams), 2)) %>%
rename("Means of Fat in grams" = "mean_fat")
## # A tibble: 3 × 2
## Category `Means of Fat in grams`
## <chr> <dbl>
## 1 Breakfast 23.8
## 2 Burgers 39.0
## 3 Chicken 32.5
##install.packages("rstatix")
library(rstatix)
##
## Attaching package: 'rstatix'
## The following objects are masked from 'package:effectsize':
##
## cohens_d, eta_squared
## The following object is masked from 'package:stats':
##
## filter
library(dplyr)
mydata5 %>%
group_by(Category) %>%
shapiro_test(Fatingrams)
## # A tibble: 3 × 4
## Category variable statistic p
## <chr> <chr> <dbl> <dbl>
## 1 Breakfast Fatingrams 0.946 0.0985
## 2 Burgers Fatingrams 0.902 0.0171
## 3 Chicken Fatingrams 0.948 0.397
From the p-values we see that Burgers fails the Shapiro-Wilk test of normality at the 5% level (p = 0.017) and Breakfast fails it only at the 10% level (p = 0.099), while the p-value for Chicken (p = 0.397) is too high to reject H0 that fat in grams is normally distributed. Because normality does not hold in every group, we still have to move to a nonparametric test.
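The decision rule can be applied directly to the reported p-values. A minimal sketch; the two significance levels (5% and 10%) are assumptions, since the text does not state which level was used:

```r
# Shapiro-Wilk p-values copied from the output above
p_values <- c(Breakfast = 0.0985, Burgers = 0.0171, Chicken = 0.397)

# TRUE means normality is rejected at that significance level
reject_05 <- p_values < 0.05
reject_10 <- p_values < 0.10

print(rbind(alpha_0.05 = reject_05, alpha_0.10 = reject_10))
```

Only Burgers is rejected at 5%, Burgers and Breakfast at 10%, and Chicken at neither level.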
New hypotheses:
H0: The distribution location of fat in grams is the same in all three categories.
H1: The distribution location of fat in grams differs in at least one category.
kruskal.test(Fatingrams ~ Category,
data=mydata5)
##
## Kruskal-Wallis rank sum test
##
## data: Fatingrams by Category
## Kruskal-Wallis chi-squared = 5.3735, df = 2, p-value = 0.0681
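The p-value in this output is the upper tail of a chi-squared distribution with k - 1 degrees of freedom, evaluated at the test statistic. A short check using only the numbers printed above (the variable names are illustrative):

```r
H  <- 5.3735  # Kruskal-Wallis chi-squared statistic from the output above
df <- 2       # number of categories minus one

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(H, df = df, lower.tail = FALSE)
round(p_value, 4)
```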
library(rstatix)
kruskal_effsize(Fatingrams ~ Category,
data=mydata5)
## # A tibble: 1 × 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 Fatingrams 77 0.0456 eta2[H] small
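The eta2[H] value can be reproduced from the Kruskal-Wallis statistic. The sketch assumes the formula eta2[H] = (H - k + 1) / (n - k), where k is the number of groups and n the total sample size; the variable names are illustrative.

```r
H <- 5.3735  # Kruskal-Wallis statistic reported earlier
k <- 3       # number of categories
n <- 77      # total number of menu items (column n in the output above)

# Eta squared based on the H statistic
eta2_H <- (H - k + 1) / (n - k)
round(eta2_H, 4)
```

Rounded to four decimals, this matches the effsize column printed by kruskal_effsize().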
groups_nonpar <- wilcox_test(Fatingrams ~ Category,
paired=FALSE,
p.adjust.method = "bonferroni",
data = mydata5)
groups_nonpar
## # A tibble: 3 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 Fatingrams Breakfast Burgers 33 26 286. 0.03 0.09 ns
## 2 Fatingrams Breakfast Chicken 33 18 220. 0.134 0.402 ns
## 3 Fatingrams Burgers Chicken 26 18 260 0.543 1 ns
From the data we could not show that the distribution of fat in grams differs for at least one menu item category (χ² = 5.37, df = 2, p = 0.07); the effect size was small (η² = 0.046). Post-hoc tests revealed no significant difference for any pair of groups (all Bonferroni-adjusted p > 0.05).
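The adjusted p-values in the post-hoc table follow from the Bonferroni correction, which multiplies each raw p-value by the number of comparisons (three here) and caps the result at 1. A small sketch using the raw p-values as printed in the table; the vector names are illustrative:

```r
# Unadjusted pairwise p-values from the post-hoc table
p_raw <- c(Breakfast_Burgers = 0.03,
           Breakfast_Chicken = 0.134,
           Burgers_Chicken   = 0.543)

# Bonferroni: multiply by the number of comparisons and cap at 1
p_adj <- p.adjust(p_raw, method = "bonferroni")
p_adj
```

None of the adjusted values falls below 0.05, consistent with the "ns" labels in the table.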