Connect with me on Open Science Framework | Contact me via LinkedIn
Depending on your machine it might be necessary to use right-click -> open in new browser window.
R analysis script presenting the solutions for the exercise in Sensory Marketing regarding Lichters et al.(2021) .
If this exercise grabs your attention, please check-out our study programs at the Chemnitz University of Technology by clicking on the logo (Germany):
Beware: R is a context-sensitive language. Thus, ‘data’ will be interpreted not in the same way as ‘Data’ will.
In R most functionality is provided by additional packages.
Most of the packages are well-documented, See: https://cran.r-project.org/
The code chunk below first evaluates if the package pacman is already installed to your machine. If yes, the corresponding package will be loaded. If not, R will install the package.
Alternatively, you can do this manually first by executing install.packages(“pacman”) and then library(pacman).
The second line then loads the package pacman
The third line uses the function p_load() from the pacman package to install (if necessary) and load all packages that we provide as arguments (e.g., osfr, which provides functions for interfacing with the open science framework from R).
if(!"pacman" %in% rownames(installed.packages())) install.packages("pacman")
library(pacman)
pacman::p_load(tidyverse, dplyr, tidyselect, osfr, rstatix)
Here is the R session info which gives you information on my machine, all loaded packages and their version:
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19043)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
## [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Germany.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rstatix_0.7.0 pwr_1.3-0 osfr_0.2.8 tidyselect_1.1.1
## [5] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
## [9] readr_2.0.2 tidyr_1.1.4 tibble_3.1.4 ggplot2_3.3.5
## [13] tidyverse_1.3.1 pacman_0.5.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.7 lubridate_1.7.10 assertthat_0.2.1 digest_0.6.28
## [5] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0 backports_1.2.1
## [9] reprex_2.0.1 evaluate_0.14 httr_1.4.2 pillar_1.6.3
## [13] rlang_0.4.11 curl_4.3.2 readxl_1.3.1 data.table_1.14.2
## [17] rstudioapi_0.13 car_3.0-11 jquerylib_0.1.4 rmarkdown_2.11
## [21] foreign_0.8-81 munsell_0.5.0 broom_0.7.9 compiler_4.1.1
## [25] modelr_0.1.8 xfun_0.26 pkgconfig_2.0.3 htmltools_0.5.2
## [29] httpcode_0.3.0 codetools_0.2-18 rio_0.5.27 fansi_0.5.0
## [33] crayon_1.4.1 tzdb_0.1.2 dbplyr_2.1.1 withr_2.4.2
## [37] crul_1.1.0 grid_4.1.1 jsonlite_1.7.2 gtable_0.3.0
## [41] lifecycle_1.0.1 DBI_1.1.1 magrittr_2.0.1 scales_1.1.1
## [45] zip_2.2.0 carData_3.0-4 cachem_1.0.6 cli_3.0.1
## [49] stringi_1.7.4 fs_1.5.0 xml2_1.3.2 bslib_0.3.0
## [53] ellipsis_0.3.2 generics_0.1.0 vctrs_0.3.8 openxlsx_4.2.4
## [57] tools_4.1.1 glue_1.4.2 hms_1.1.1 abind_1.4-5
## [61] fastmap_1.1.0 yaml_2.2.1 colorspace_2.0-2 rvest_1.0.1
## [65] memoise_2.0.0 knitr_1.36 haven_2.4.3 sass_0.4.0
The article provides us with the link to a corresponding OSF project: https://doi.org/10.17605/OSF.IO/VKZ4Y. Here, we can find all data sets and R-scrips accompanying the manuscript in Food Quality and Preference. OSF is a free web application that provides a space for researchers to collaboratively store, manage, and share their research materials (e.g. data, code, protocols). Most work on OSF is organized around projects, which include a cloud-based storage bucket where files can be stored and organized into directories. Please navigate to the corresponding project, using your web browser. We can see that the project contains a data component named Data Sets and Analysis Scripts. If we click on the data component, we are provided with its own url identifyer which is: https://osf.io/kst7p This data component is housing the product acceptance scores for the cappuccino category in the data set named Working dataset Cappuccino.csv within the directory path Data and analysis scripts/Main studies.
To load the data set we use functions provided by the osfr package.
We first assess the corresponding OSF data component from R. For this purpose we apply the function osf_retrieve_node(), which expects us to hand-in an url to an existing resource. We assign the results of the function to an object named OSF_project_segmenting_consumers.
OSF_project_segmenting_consumers <- osf_retrieve_node("https://osf.io/kst7p/")
The code chunk above opens a, so to say, table of content for the corresponding OSF data component. Let’s list all of the files that have been uploaded to the path Data and analysis scripts/Main studies. This is done by the osf_ls_files() function:
osf_ls_files(OSF_project_segmenting_consumers, path = "Data and analysis scripts/Main studies")
## # A tibble: 8 x 3
## name id meta
## <chr> <chr> <list>
## 1 Working dataset Cappuccino.csv 5ec278ee221cb100a5132747 <named ~
## 2 Working dataset Beer.csv 5ec278f6b9daae00bb50aa6b <named ~
## 3 Sensory Acceptance Tests - Workspace.RData 5ec278ff33a74b00b83b8dcd <named ~
## 4 Sensory Acceptance Tests - Analysis Script.R 5ec2790a33a74b00b73b7c65 <named ~
## 5 RStudio Project.Rproj 5ec27913b9daae00b150b0e2 <named ~
## 6 Beer_Order_Environments.xlsx 5f76e5d91cfe690149cf1623 <named ~
## 7 Cappuccino_Order_Environments.xlsx 5f76e5d9e64e7e014caab4bf <named ~
## 8 ARIs.Manuscript.Rds 5f76e5ee39d7e3014ceb5fe1 <named ~
This returns an osf_tbl object, which is the data.frame-like class the osfr package uses to represent items retrieved from OSF. This one contains 8 rows; one for each of the files stored on the OSF data component. To download a local copy of the cappuccino data set (which is row 1 by the way), we use the osf_download() function. More specifically, we build a pipe of commands: First, we hand over our list of files stored in the OSF file path to a function filter(). This function helps us in selecting only the entry with the file named Working dataset Cappuccino.csv. We better use the notation dplyr::filter() to make sure that the filter() function from the dplyr-package is used, since multiple other packages provide functions aith the same name. The result of this step then goes as argument to the last command in the pipe, which is osf_download().
osf_ls_files(OSF_project_segmenting_consumers, path = "Data and analysis scripts/Main studies") %>%
dplyr::filter(name == "Working dataset Cappuccino.csv") %>%
osf_download(conflicts = "overwrite")
## # A tibble: 1 x 4
## name id local_path meta
## <chr> <chr> <chr> <list>
## 1 Working dataset Cappuccino.csv 5ec278ee221cb100a5132747 ./Working dat~ <named~
As the result, the data set Working dataset Cappuccino.csv is stored to the current working directory of your R-project. All we need to do, is to import it to our current R analysis session. For this purpose, we use read.csv() function. We assign the data set to a new object d. In a last step, we convert the object to the [tibble] (https://tibble.tidyverse.org/) format, which is prominent among marketing researchers.
d <- read.csv("Working dataset Cappuccino.csv")
d <- as_tibble(d)
Finally, we can give a brief overview over the imported data:
print(d)
## # A tibble: 104 x 13
## ID OL.Bar.1shot...Lab OL.Siz.1shot...Lab OL.Bar.2shot...~ OL.Siz.2shot...~
## <int> <int> <int> <int> <int>
## 1 1 6 5 7 8
## 2 3 8 7 4 7
## 3 4 5 9 9 8
## 4 5 7 9 7 6
## 5 6 7 5 6 7
## 6 7 7 7 8 7
## 7 8 6 8 6 8
## 8 9 8 9 7 6
## 9 11 7 6 5 4
## 10 12 8 6 8 7
## # ... with 94 more rows, and 8 more variables: OL.Bar.1shot...VR <int>,
## # OL.Siz.1shot...VR <int>, OL.Bar.2shot...VR <int>, OL.Siz.2shot...VR <int>,
## # OL.Bar.1shot...Fil <int>, OL.Siz.1shot...Fil <int>,
## # OL.Bar.2shot...Fil <int>, OL.Siz.2shot...Fil <int>
As we can see, the spreadsheet-like data set contains one row for each of the n=104 interviewed consumers. Furthermore, there are 13 variables as columns. The first variable houses the consumer ID, whereas the remaining columns comprise of the consumers’ overall liking judgments (“OL”) about the different product samples across the varying testing environments (scaled from 1 to 9). The abbreviations “Bar,” “Lab,” and “VR” thereby stand for the environments **Real Coffeehouse*", Senory Lab, and Immersive Coffeehouse**. The abbreviations “Bar.1shot,” “Bar.2shot,” “Siz.1shot,” and “Siz.2shot” mark the different product samples, namely cappuccinos made of different coffee brands (“Barista” and “Sizilianer Art,” according to the authors), and cappuccinos prepared with one or two shots of that coffee.
Apply a repeated measures ANOVA to the overall liking scores to evaluate if you are able to also extract a significant difference in product acceptance across test environments (a so-called level effect).
Here, we are asked to simply replicate the original analysis proposed by the authors. Thus, the easiest way to solve the task is to have a look at the analyses scripts provided on OSF. In the code below, however, we applied our own code.
We first, tell R to handle the first column “ID” as a categorical factor variable instead of a numerical one.
Next, we use the function pivot_longer to restructure our data set from the wide format (each consumer is one row), to the long format (each product judgement is one row). Thus, we are stacking the data. We are doing so, because the subsequent repeated measures ANOVA (rANOVA) is expecting the long instead of the wide format. Within the function call, we hand-in d as for the data argument. Further, we tell the function to use the columns named “OL.Bar.1shot…Lab” to “OL.Siz.2shot…Fil” for creating the longer format. The names_to argument tells the function which name to use for the new stacked variable collecting all product variant names. Finally, the values_to argument sets the name for the stacked product evaluations. The results are stored to a new object called d_stacked.
d$ID <- as_factor(d$ID)
d_stacked <- pivot_longer(data = d,
cols = OL.Bar.1shot...Lab:OL.Siz.2shot...Fil,
names_to = "product_sample",
values_to = "liking")
print(d_stacked)
## # A tibble: 1,248 x 3
## ID product_sample liking
## <fct> <chr> <int>
## 1 1 OL.Bar.1shot...Lab 6
## 2 1 OL.Siz.1shot...Lab 5
## 3 1 OL.Bar.2shot...Lab 7
## 4 1 OL.Siz.2shot...Lab 8
## 5 1 OL.Bar.1shot...VR 7
## 6 1 OL.Siz.1shot...VR 6
## 7 1 OL.Bar.2shot...VR 8
## 8 1 OL.Siz.2shot...VR 7
## 9 1 OL.Bar.1shot...Fil 4
## 10 1 OL.Siz.1shot...Fil 8
## # ... with 1,238 more rows
The results (see above) present the stacked data set. Each consumer, now, has 12 rows, since each consumer has tasted 4 different product formulations within 3 different consumption environments.
For the subsequent rANOVA we need to create additional columns indicating the different levels of the experimental factors of the study. This is done, by the code chunk below. Here we use a combination of the functions mutate(), case_when(), and str_detect() to create two new variables in d_stacked. For example, in the first code block we feed-in d_stacked to a mutate This function is a generic one, which can add new variables and preserves existing ones. Next, we have to tell mutate() a certain build logic for the new variable sample. Here, we use case_when(). Within case_when(), we can apply additional logic. The function then applies this logic to each row in d_stacked. Let’s focus on one example str_detect(product_sample, “Bar.1shot”) ~ ‘Bar_1shot’. We use str_detect() from the stringr package to evaluate for each line of the variable product_sample whether it contains the string “Bar.1shot.” If this is the case, the function is setting the value ‘Bar_1shot’ to the newly created variable sample. If no match is detected, no value will be set. In the end, we create two new variables sample and environment, which indicate for each row in d_stacked, where a consumer has evaluated which formula of cappuccino.
The last two lines tell R to handle these variables as categorical factors.
d_stacked <- d_stacked %>% mutate(
sample = case_when(
str_detect(product_sample, "Bar.1shot") ~ 'Bar_1shot',
str_detect(product_sample, "Siz.1shot") ~ 'Siz_1shot',
str_detect(product_sample, "Siz.2shot") ~ 'Siz_2shot',
str_detect(product_sample, "Bar.2shot") ~ 'Bar_2shot'
))
d_stacked <- d_stacked %>% mutate(
environment = case_when(
str_detect(product_sample, "Lab") ~ 'Sensory_Lab',
str_detect(product_sample, "VR") ~ 'Immersive_Coffeehouse',
str_detect(product_sample, "Fil") ~ 'Real_Coffeehouse'
))
d_stacked$sample <- as_factor(d_stacked$sample)
d_stacked$environment <- as_factor(d_stacked$environment)
print(d_stacked)
## # A tibble: 1,248 x 5
## ID product_sample liking sample environment
## <fct> <chr> <int> <fct> <fct>
## 1 1 OL.Bar.1shot...Lab 6 Bar_1shot Sensory_Lab
## 2 1 OL.Siz.1shot...Lab 5 Siz_1shot Sensory_Lab
## 3 1 OL.Bar.2shot...Lab 7 Bar_2shot Sensory_Lab
## 4 1 OL.Siz.2shot...Lab 8 Siz_2shot Sensory_Lab
## 5 1 OL.Bar.1shot...VR 7 Bar_1shot Immersive_Coffeehouse
## 6 1 OL.Siz.1shot...VR 6 Siz_1shot Immersive_Coffeehouse
## 7 1 OL.Bar.2shot...VR 8 Bar_2shot Immersive_Coffeehouse
## 8 1 OL.Siz.2shot...VR 7 Siz_2shot Immersive_Coffeehouse
## 9 1 OL.Bar.1shot...Fil 4 Bar_1shot Real_Coffeehouse
## 10 1 OL.Siz.1shot...Fil 8 Siz_1shot Real_Coffeehouse
## # ... with 1,238 more rows
Now, we are ready for conducting the rANOVA. For this purpose, we take advantage of the anova_test() function, which the rstatix package provides. We need to provide four arguments in our case:
We assign the results of our rANOVA to a new object named cappuccino_rANOVA.
Lastly, we print the results, while limiting the display to 3 digits.
cappuccino_rANOVA <- anova_test(data = d_stacked,
dv = liking,
wid = ID,
within = c(sample, environment))
print(cappuccino_rANOVA,
digits = 3)
## ANOVA Table (type III tests)
##
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 1 sample 3 309 59.639 1.91e-30 * 0.146
## 2 environment 2 206 11.238 2.33e-05 * 0.010
## 3 sample:environment 6 618 1.815 9.40e-02 0.005
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 1 sample 0.554 1.20e-11 *
## 2 environment 0.989 5.69e-01
## 3 sample:environment 0.858 7.50e-01
##
## $`Sphericity Corrections`
## Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF]
## 1 sample 0.709 2.13, 218.98 2.48e-22 * 0.724 2.17, 223.72
## 2 environment 0.989 1.98, 203.76 2.54e-05 * 1.008 2.02, 207.72
## 3 sample:environment 0.952 5.71, 588.14 9.80e-02 1.014 6.08, 626.58
## p[HF] p[HF]<.05
## 1 9.26e-23 *
## 2 2.33e-05 *
## 3 9.40e-02
In an essence, the results are presented within 3 tables.
The first one contains the main results of the rANOVA. We can see that 3 effects where tested: (1) The main effect of sample (i.e., the different cappuccino recipes), (2) the main effect of environment (i.e., the difference in mean acceptability scores across consumption environments), and (3) the interaction of sample and environment (i.e., are different cappuccino recipes evaluated differently across environments). For example, one sample might be evaluated best in the lab, but worst in the real coffeehouse).
We see that the main effects of sample and environment are both significant as the empirical p-values fall below 5%. Thus, we are able to replicate the authors’ findings regarding the level effect across consumption environments. More precisely, the effect of environment \(F_{2, 206}\) = 11.24, p = 0 is significant with an effect size of generalized \(\eta^{2}_{p}\) = 0.01, a small effect (Bakeman, 2005; Olejnik & Algina, 2003). This perfectly mirrors the results reported in Web Appendix, despite the fact, that the Web Appendix obviously reported the wrong value for the generalized \(\eta^{2}_{p}\) (0.10 instead of 0.01).
The two other tables reported above simply test one of the assumptions for rANOVA, namely whether the data exerts Sphericity across measurement times (Field et al., 2012, p. 551). To make it short, the second table shows that this assumption is significantly violated for the main effect of sample, but not for the other effects under consideration. If the assumption is violated, the one has to pay attention to the third table. Here we can find two proposed correction methods (Hyunh-Feldt [HF] and Greenhouse-Geisser [GG]), of which the latter is most commonly applied (Field et al., 2012, p. 554).
Use a different post-hoc test than the authors. Drawing on your chosen test, is the immersive environment still significantly different from the other two environments?
At page 4 of the Web Appendix the authors state: “[…] running a series of repeated measures analysis of variance analyses (rANOVAs) combined with post hoc Bonferroni-corrected within-subjects t-tests.” Thus, in the article the authors applied the Bonferroni correction (Bonferroni, 1936) to account for multiple group comparisons. Researchers in statistics have proposed countless alternatives for post-hoc tests to account for family-wise Alpha error, when comparing multiple groups (in our case the 3 environments). For illustration, we choose Holm’s method for the code chunk below, which is less conservative (i.e., differences get significant faster) as compared to Bonferroni (Holm, 1979).
We implement a combination of the two functions with() and pairwise.t.test() in the code below and assign the results of our post-hoc procedure to a new object named post_hoc. We first submit d_stacked to the with() function, which simply tells R to conduct several analysis steps with it. The specific analysis is specified next. Here, we call for a form of t-test within the function pairwise.t.test(). This function expects us to provide 4 arguments:
post_hoc <- d_stacked %>%
with(., pairwise.t.test(x=liking,
g=environment,
p.adjust.method="holm",
paired=TRUE))
print(post_hoc, digits = 3)
##
## Pairwise comparisons using paired t tests
##
## data: liking and environment
##
## Sensory_Lab Immersive_Coffeehouse
## Immersive_Coffeehouse 1.09e-05 -
## Real_Coffeehouse 0.19242 0.00126
##
## P value adjustment method: holm
The resultant matrix presents the p-values of the 3 pair-wise comparisons. We see that at Alpha=5%, the mean product likings of the immersive coffeehouse condition are significantly different from both, the sensory lab and the real coffeehouse. Therefore, that reported results of the article are robust, even when using Holm’s correction instead of Bonferroni.
Only focus on the real coffeehouse environment! If you choose a different post-hoc test as compared to the authors, do you find the same significant differences between products?
This task is very similar to the previous one. The difference is that we have to focus only at the real coffeehouse environment. Furthermore, we are investigating post-hoc tests for the sample factor instead of the environment factor. As an alternative to the authors’ Bonferroni procedure, we, again, draw on Holm’s post-hoc test.
In the code chunk below we, again, use dplyr’s filter() function to only select rows in d_stacked that correspond to observations made in the real coffeehouse environment.
post_hoc_real_coffeehouse <- d_stacked %>%
dplyr::filter(environment == "Real_Coffeehouse") %>%
with(., pairwise.t.test(
x=liking,
g=sample,
p.adjust.method="holm",
paired=TRUE))
print(post_hoc_real_coffeehouse, digits = 3)
##
## Pairwise comparisons using paired t tests
##
## data: liking and sample
##
## Bar_1shot Siz_1shot Bar_2shot
## Siz_1shot 1 - -
## Bar_2shot 4.49e-07 2.33e-09 -
## Siz_2shot 3.95e-06 4.49e-07 1
##
## P value adjustment method: holm
The resultant matrix presents the p-values of the 4 pair-wise comparisons. We see that at Alpha=5%, the acceptability scores of all product variants are significantly different from each other with the exception of the tests between Bar_1shot and Siz_1shot, as well as between Bar_2shot and Siz_2shot. In conclusion, applying Holm’s instead of Bonferroni’s post-hoc correction does not change the interpretation of the data within the real coffeehouse condition.