if (!"pacman" %in% rownames(installed.packages())) install.packages("pacman")
library(pacman)
pacman::p_load(tidyverse, osfr, rstatix)Connect with me on Open Science Framework | Contact me via LinkedIn
It might be necessary to right-click -> open in a new browser window, depending on your machine.
R analysis script presenting the solutions for the exercise in Sensory Marketing regarding Lichters et al. (2021).
The purpose of this script does not solely lay in answering the exercise question. Moreover, studying these scripts should help students become familiar with some aspects of working in R.
If this exercise grabs your attention, please check out our master study programs at the Otto-von-Guericke-University in Magdeburg (Germany) by clicking on the logo!
1 Loading packages
R is a context-sensitive language. Thus, ‘data’ will not be interpreted the same way as ‘Data’ will.
In R, most functionality is provided by additional packages.
Most of the packages are well-documented; see: https://cran.r-project.org/
The code chunk below first evaluates if the package pacman (Rinker & Kurkiewicz, 2018) is already installed on your machine. If yes, the corresponding package will be loaded. If not, R will install the package.
Alternatively, you can do this manually first by executing install.packages(“pacman”) and then library(pacman).
The second line then loads the package pacman.
The third line uses the function p_load() from the pacman package to install (if necessary) and loads all packages that we provide as arguments (e.g., rstatix (Kassambara, 2023), which provides functions for performing basic statistical tests}
In all code chunks throughout this script, you can receive additional help on each used function by clicking on its name (or via right-click and then opening in a new browser tab). Alternatively, when coding, we can see which arguments a function understands by pressing ‘F1’ while setting the cursor to the function’s name.
Here is the R session info, which gives you information on my machine, all loaded packages, and their version:
sessionInfo()R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
[3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
[5] LC_TIME=German_Germany.utf8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] osfr_0.2.9 rstatix_0.7.2 compute.es_0.2-5 pwr_1.3-0
[5] htmltools_0.5.6.1 quarto_1.3 ggpubr_0.6.0 lubridate_1.9.3
[9] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.3 purrr_1.0.2
[13] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.4
[17] tidyverse_2.0.0 labelled_2.12.0 knitr_1.44 kableExtra_1.3.4
[21] haven_2.5.3 pacman_0.5.1
loaded via a namespace (and not attached):
[1] gtable_0.3.4 xfun_0.40 processx_3.8.2 tzdb_0.4.0
[5] vctrs_0.6.4 tools_4.3.1 ps_1.7.5 generics_0.1.3
[9] curl_5.1.0 fansi_1.0.5 pkgconfig_2.0.3 webshot_0.5.5
[13] lifecycle_1.0.3 compiler_4.3.1 munsell_0.5.0 carData_3.0-5
[17] yaml_2.3.7 pillar_1.9.0 later_1.3.1 car_3.1-2
[21] cachem_1.0.8 abind_1.4-5 tidyselect_1.2.0 rvest_1.0.3
[25] digest_0.6.33 stringi_1.7.12 fastmap_1.1.1 grid_4.3.1
[29] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 crul_1.4.0
[33] utf8_1.2.3 broom_1.0.5 withr_2.5.1 scales_1.2.1
[37] backports_1.4.1 timechange_0.2.0 rmarkdown_2.25 httr_1.4.7
[41] ggsignif_0.6.4 hms_1.1.3 memoise_2.0.1 evaluate_0.22
[45] viridisLite_0.4.2 rlang_1.1.1 Rcpp_1.0.11 httpcode_0.3.0
[49] glue_1.6.2 xml2_1.3.5 svglite_2.1.2 rstudioapi_0.15.0
[53] jsonlite_1.8.7 R6_2.5.1 fs_1.6.3 systemfonts_1.0.5
2 Task 5.1. Loading data from the Open Science Framework (OSF)
The article provides the link to a corresponding OSF project: https://doi.org/10.17605/OSF.IO/VKZ4Y. We can find all data sets and R-scripts accompanying the Food Quality and Preference manuscript here. OSF is a free web application that allows researchers to collaboratively store, manage, and share their research materials (e.g., data, code, protocols). Most work on OSF is organized around projects, which include a cloud-based storage bucket where files can be stored and organized into directories. Please navigate to the corresponding project using your web browser. We can see that the project contains a data component named Data Sets and Analysis Scripts. If we click on the data component, we are provided with its own URL identifier, which is https://osf.io/kst7p. This data component houses the product acceptance ratings for the cappuccino category in the data set named Working dataset Cappuccino.csv within the directory path Data and analysis scripts/Main studies.
To load the data set, we use functions provided by the osfr package (Wolen et al., 2020).
We first assess the corresponding OSF data component from R. For this purpose, we apply the function osf_retrieve_node(), which expects us to hand in a url to an existing resource. We assign the results of the function to an object named OSF_project_segmenting_consumers.
OSF_project_segmenting_consumers <- osf_retrieve_node("https://osf.io/kst7p/")The code chunk above opens a, so to say, table of contents for the corresponding OSF data component. Let’s list all the files uploaded to the path Data and analysis scripts/Main studies. This is done by the osf_ls_files() function:
osf_ls_files(OSF_project_segmenting_consumers, path = "Data and analysis scripts/Main studies")# A tibble: 8 × 3
name id meta
<chr> <chr> <list>
1 Cappuccino_Order_Environments.xlsx 5f76e5d9e64e7e014ca… <named list>
2 Beer_Order_Environments.xlsx 5f76e5d91cfe690149c… <named list>
3 ARIs.Manuscript.Rds 5f76e5ee39d7e3014ce… <named list>
4 Working dataset Beer.csv 5ec278f6b9daae00bb5… <named list>
5 Sensory Acceptance Tests - Analysis Script.R 5ec2790a33a74b00b73… <named list>
6 Working dataset Cappuccino.csv 5ec278ee221cb100a51… <named list>
7 RStudio Project.Rproj 5ec27913b9daae00b15… <named list>
8 Sensory Acceptance Tests - Workspace.RData 5ec278ff33a74b00b83… <named list>
This returns an osf_tbl object, which is the data.frame-like class the osfr package uses to represent items retrieved from OSF. This one contains eight rows, one for each file stored on the OSF data component. To download a local copy of the cappuccino data set (row 1, by the way), we use the osf_download() function. More specifically, we build a pipeline of commands: First, we hand over our list of files stored in the OSF file path to a function filter(). This function helps us select only the entry with the file named Working dataset Cappuccino.csv. We better use the notation dplyr::filter() to ensure that the filter() function from the dplyr-package (Wickham, François, Henry, Müller, & Vaughan, 2023) is used since multiple other packages provide functions with the same name. The result of this step then goes as an argument to the last command in the pipe, which is osf_download().
osf_ls_files(OSF_project_segmenting_consumers, path = "Data and analysis scripts/Main studies") %>%
dplyr::filter(name == "Working dataset Cappuccino.csv") %>%
osf_download(conflicts = "overwrite")# A tibble: 1 × 4
name id local_path meta
<chr> <chr> <chr> <list>
1 Working dataset Cappuccino.csv 5ec278ee221cb100a51327… ./Working… <named list>
As a result, the data set Working dataset Cappuccino.csv is stored in the current working directory of your R-project. All we need to do is to import it to our current R analysis session. For this purpose, we use read.csv() function. We assign the data set to a new object, d. In the last step, we convert the object to the [tibble] (https://tibble.tidyverse.org/) format, which is prominent among marketing researchers.
d <- read.csv("Working dataset Cappuccino.csv")
d <- as_tibble(d)Finally, we can give a brief overview of the imported data:
print(d)# A tibble: 104 × 13
ID OL.Bar.1shot...Lab OL.Siz.1shot...Lab OL.Bar.2shot...Lab
<int> <int> <int> <int>
1 1 6 5 7
2 3 8 7 4
3 4 5 9 9
4 5 7 9 7
5 6 7 5 6
6 7 7 7 8
7 8 6 8 6
8 9 8 9 7
9 11 7 6 5
10 12 8 6 8
# ℹ 94 more rows
# ℹ 9 more variables: OL.Siz.2shot...Lab <int>, OL.Bar.1shot...VR <int>,
# OL.Siz.1shot...VR <int>, OL.Bar.2shot...VR <int>, OL.Siz.2shot...VR <int>,
# OL.Bar.1shot...Fil <int>, OL.Siz.1shot...Fil <int>,
# OL.Bar.2shot...Fil <int>, OL.Siz.2shot...Fil <int>
As we can see, the spreadsheet-like data set contains one row for each of the n=104 interviewed consumers. Furthermore, there are 13 variables as columns. The first variable houses the consumer ID, whereas the remaining columns comprise the consumers’ overall liking judgments (“OL”) about the different product samples across the varying testing environments (scaled from 1 to 9). The abbreviations “Bar”, “Lab”, and “VR” thereby stand for the environments **Real Coffeehouse*“, Sensory Lab, and Immersive Coffeehouse**. The abbreviations”Bar.1shot”, “Bar.2shot”, “Siz.1shot”, and “Siz.2shot” mark the different product samples, namely cappuccinos made of varying coffee brands (“Barista” and “Sizilianer Art”, according to the authors), and cappuccinos prepared with one or two shots of that coffee.
3 Task 5.2.
Apply a repeated measures ANOVA to the overall liking scores to evaluate if you can also find a significant difference in mean product acceptance across test environments (a so-called level effect).
Here, we are asked to replicate the original analysis proposed by the authors. Thus, the easiest way to solve the task is to look at the analysis scripts provided on OSF. In the code below, however, we applied our own code.
We first tell R to handle the first column, “ID” as a categorical factor variable instead of a numerical one.
Next, we use the function pivot_longer to restructure our data set from the wide format (each consumer is one row) to the long format (each product judgment is one row). Thus, we are stacking the data. We are doing so because the subsequent repeated measures ANOVA (rANOVA) expects the long instead of the wide format. Within the function call, we hand in d for the data argument. Further, we tell the function to use the columns named “OL.Bar.1shot…Lab” to “OL.Siz.2shot…Fil” for creating the longer format. The names_to argument tells the function which name to use for the new stacked variable collecting all product variant names. Finally, the values_to argument sets the name for the stacked product evaluations. The results are stored in a new object called d_stacked.
d$ID <- as_factor(d$ID)
d_stacked <- pivot_longer(data = d,
cols = OL.Bar.1shot...Lab:OL.Siz.2shot...Fil,
names_to = "product_sample",
values_to = "liking")
print(d_stacked)# A tibble: 1,248 × 3
ID product_sample liking
<fct> <chr> <int>
1 1 OL.Bar.1shot...Lab 6
2 1 OL.Siz.1shot...Lab 5
3 1 OL.Bar.2shot...Lab 7
4 1 OL.Siz.2shot...Lab 8
5 1 OL.Bar.1shot...VR 7
6 1 OL.Siz.1shot...VR 6
7 1 OL.Bar.2shot...VR 8
8 1 OL.Siz.2shot...VR 7
9 1 OL.Bar.1shot...Fil 4
10 1 OL.Siz.1shot...Fil 8
# ℹ 1,238 more rows
The results (see above) present the stacked data set. Each consumer now has 12 rows since each consumer has tasted four different product formulations within three different consumption environments.
For the subsequent rANOVA, we need to create additional columns indicating the different levels of the experimental factors of the study. This is done by the code chunk below. Here, we use a combination of the functions mutate(), case_when(), and str_detect() to create two new variables in d_stacked. For example, in the first code block, we feed in d_stacked to mutate(). This function is a generic one, which can add new variables and preserve existing ones. Next, we must tell mutate() a certain build logic for the new variable sample. Here, we use case_when(). Within case_when(), we can apply additional logic. The function then applies this logic to each row in d_stacked. Let’s focus on one example str_detect(product_sample, “Bar.1shot”) ~ ‘Bar_1shot’. We use str_detect() from the stringr package (Wickham, 2022) to evaluate for each line of the variable product_sample whether it contains the string “Bar.1shot”. If so, the function sets the value ‘Bar_1shot’ to the newly created variable sample. If no match is detected, no value will be set. In the end, we create two new variables, sample, and environment, which indicate for each row in d_stacked, where a consumer has evaluated which cappuccino formula.
The last two lines tell R to handle these variables as categorical factors.
d_stacked <- d_stacked %>% mutate(
sample = case_when(
str_detect(product_sample, "Bar.1shot") ~ 'Bar_1shot',
str_detect(product_sample, "Siz.1shot") ~ 'Siz_1shot',
str_detect(product_sample, "Siz.2shot") ~ 'Siz_2shot',
str_detect(product_sample, "Bar.2shot") ~ 'Bar_2shot'
))
d_stacked <- d_stacked %>% mutate(
environment = case_when(
str_detect(product_sample, "Lab") ~ 'Sensory_Lab',
str_detect(product_sample, "VR") ~ 'Immersive_Coffeehouse',
str_detect(product_sample, "Fil") ~ 'Real_Coffeehouse'
))
d_stacked$sample <- as_factor(d_stacked$sample)
d_stacked$environment <- as_factor(d_stacked$environment)
print(d_stacked)# A tibble: 1,248 × 5
ID product_sample liking sample environment
<fct> <chr> <int> <fct> <fct>
1 1 OL.Bar.1shot...Lab 6 Bar_1shot Sensory_Lab
2 1 OL.Siz.1shot...Lab 5 Siz_1shot Sensory_Lab
3 1 OL.Bar.2shot...Lab 7 Bar_2shot Sensory_Lab
4 1 OL.Siz.2shot...Lab 8 Siz_2shot Sensory_Lab
5 1 OL.Bar.1shot...VR 7 Bar_1shot Immersive_Coffeehouse
6 1 OL.Siz.1shot...VR 6 Siz_1shot Immersive_Coffeehouse
7 1 OL.Bar.2shot...VR 8 Bar_2shot Immersive_Coffeehouse
8 1 OL.Siz.2shot...VR 7 Siz_2shot Immersive_Coffeehouse
9 1 OL.Bar.1shot...Fil 4 Bar_1shot Real_Coffeehouse
10 1 OL.Siz.1shot...Fil 8 Siz_1shot Real_Coffeehouse
# ℹ 1,238 more rows
Now, we are ready to conduct the rANOVA. For this purpose, we take advantage of the anova_test() function, which the rstatix package provides. We need to provide four arguments in our case:
- data = the data set to use
- dv = the dependent variable in the rANOVA (the variable liking in the present case)
- wid = the variable indicating the case IDs of consumers (ID in our case)
- within = a vector of variable names, for which the analysis assumes within-subjects observations
We assign the results of our rANOVA to a new object named cappuccino_rANOVA.
Lastly, we print the results while limiting the display to 3 digits.
cappuccino_rANOVA <- anova_test(data = d_stacked,
dv = liking,
wid = ID,
within = c(sample, environment))
print(cappuccino_rANOVA, digits = 3)ANOVA Table (type III tests)
$ANOVA
Effect DFn DFd F p p<.05 ges
1 sample 3 309 59.639 1.91e-30 * 0.146
2 environment 2 206 11.238 2.33e-05 * 0.010
3 sample:environment 6 618 1.815 9.40e-02 0.005
$`Mauchly's Test for Sphericity`
Effect W p p<.05
1 sample 0.554 1.20e-11 *
2 environment 0.989 5.69e-01
3 sample:environment 0.858 7.50e-01
$`Sphericity Corrections`
Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF]
1 sample 0.709 2.13, 218.98 2.48e-22 * 0.724 2.17, 223.72
2 environment 0.989 1.98, 203.76 2.54e-05 * 1.008 2.02, 207.72
3 sample:environment 0.952 5.71, 588.14 9.80e-02 1.014 6.08, 626.58
p[HF] p[HF]<.05
1 9.26e-23 *
2 2.33e-05 *
3 9.40e-02
In essence, the results are presented in 3 tables.
The first one contains the main results of the rANOVA. We can see that three effects were tested: (1) The main effect of the sample (i.e., the different cappuccino recipes), (2) the main effect of the environment (i.e., the difference in mean acceptability scores across consumption environments), and (3) the interaction of sample and environment (i.e., are different cappuccino recipes evaluated differently across environments). For example, one sample might be evaluated best in the sensory lab but worst in the real coffeehouse).
We see that the main effects of sample and environment are both significant as the empirical p-values fall below 5%. Thus, we can replicate the authors’ findings regarding the “level effect” across consumption environments.
More precisely, the main effect of the environment \(F_{2, 206}\) = 11.24, p = 0 is significant with an effect size of generalized \(\eta^{2}_{p}\) = 0.01, a small effect (Bakeman, 2005; Olejnik & Algina, 2003). This perfectly mirrors the results reported in the Web Appendix, even though the Web Appendix obviously reported the wrong value for the generalized \(\eta^{2}_{p}\) (0.10 instead of 0.01).
The two other tables reported above simply test one of the assumptions for rANOVA, namely whether the data exerts Sphericity across measurement times (Field, Miles, & Field, 2012, p. 551). To make it short, the second table shows that this assumption is significantly violated for the main effect of the sample but not for the other effects under consideration. If the assumption is violated, one must consider the third table. Here, we can find two proposed correction methods (Hyunh-Feldt [HF] and Greenhouse-Geisser [GG]), of which the latter is most commonly applied (Field et al., 2012, p. 551). However, we see that both main effects’ p-values also fall below 5% with the Greenhouse-Geisser correction (column p[GG]).
4 Task 5.3.
Use a different post-hoc test than the authors. Based on your chosen test, is the immersive environment still significantly different from the other two?
On page 4 of the Web Appendix, the authors state: “[…] running a series of repeated measures analysis of variance analyses (rANOVAs) combined with post hoc Bonferroni-corrected within-subjects t-tests”.
Thus, in the article, the authors applied the Bonferroni correction (Bonferroni, 1936) to account for multiple group comparisons. Researchers in statistics have proposed countless alternatives for post-hoc tests to account for family-wise Alpha error when comparing multiple groups (in our case, the three environments). For illustration, we choose Holm’s method (Holm, 1979) for the code chunk below, which is less conservative (i.e., differences get significant faster) than Bonferroni.
We combine the two functions with() and pairwise.t.test() in the code below and assign the results of our post-hoc procedure to a new object named post_hoc. We first submit d_stacked to the with() function, which simply tells R to conduct several analysis steps with it. The specific analysis is specified next. Here, we call for a t-test within the function pairwise.t.test(). This function expects us to provide four arguments:
- x = the dependent variable in the test (the product liking in our case)
- g = the variable forming the groups in the pair-wise comparisons (environment in the present task)
- p.adjust.method = the post-hoc procedure to apply to control for multiple comparisons (try ?p.adjust() to see which options are available)
- paired = a logical indicator flagged as TRUE to highlight that the observations are not independent (the same consumers evaluated the same products within different environments. Thus, a paired t-test is applied)
post_hoc <- d_stacked %>%
with(., pairwise.t.test(x=liking,
g=environment,
p.adjust.method="holm",
paired=TRUE))
print(post_hoc, digits = 3)
Pairwise comparisons using paired t tests
data: liking and environment
Sensory_Lab Immersive_Coffeehouse
Immersive_Coffeehouse 1.09e-05 -
Real_Coffeehouse 0.19242 0.00126
P value adjustment method: holm
The resultant matrix presents the p-values of the 3 three pair-wise comparisons. We see that at Alpha=5%, the mean product likings of the immersive coffeehouse condition significantly differ from the sensory lab and the real coffeehouse. Therefore, the reported results of the article are robust, even when using Holm’s correction instead of Bonferroni.
5 Task 5.4.
Only focus on the real coffeehouse environment! If you choose a different post-hoc test as compared to the authors, do you find the same significant differences between products?
This task is very similar to the previous one. The difference is that we have to focus only at the real coffeehouse environment. Furthermore, we are investigating post-hoc tests for the sample factor instead of the environment factor. As an alternative to the authors’ Bonferroni procedure, we, again, draw on Holm’s post-hoc test.
In the code chunk below, we, again, use dplyr’s filter() function to select only the rows in d_stacked that correspond to observations made in the real coffeehouse environment. These cases then move on to the same code we have applied above. This time, however, the argument g is formed by the sample variable, which contains information on the different cappuccino products.
post_hoc_real_coffeehouse <- d_stacked %>%
dplyr::filter(environment == "Real_Coffeehouse") %>%
with(., pairwise.t.test(
x=liking,
g=sample,
p.adjust.method="holm",
paired=TRUE))
print(post_hoc_real_coffeehouse, digits = 3)
Pairwise comparisons using paired t tests
data: liking and sample
Bar_1shot Siz_1shot Bar_2shot
Siz_1shot 1 - -
Bar_2shot 4.49e-07 2.33e-09 -
Siz_2shot 3.95e-06 4.49e-07 1
P value adjustment method: holm
The resultant matrix presents the p-values of the four pair-wise comparisons. We see that at Alpha=5%, the acceptability scores of all product variants are significantly different from each other, except for the tests between Bar_1shot and Siz_1shot, as well as between Bar_2shot and Siz_2shot. In conclusion, applying Holm’s instead of Bonferroni’s post-hoc correction does not change the interpretation of the data within the real coffeehouse condition.