Part 2: Verification

The things we needed to reproduce were the participant demographics, Table 1, Figure 1, and Figure 2.

Demographics

For the demographics, we needed to calculate the mean and SD of participants' ages at each site, along with a count of each sex. We first loaded our packages and data.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gt)

dat <- read.csv("Nichols_et_al_data.csv")

We then selected only the variables we needed.

dat <- dat %>% 
  select(sex, site, include, age)

We used count() to tally how many participants had each value of the sex variable, and stored the result in a dataset called datSex.

datSex <- dat %>% 
  count(sex)

After this, we created a dataset named datSite that was the same as the original dataset, except filtered to participants who were not excluded. We had to do this after counting the sex of participants, because the authors had, for whatever reason, kept the excluded participants in their sex counts.

datSite <- dat %>% 
  filter(include == 0) 

Next, we grouped datSite by site and calculated the mean and SD of age for each site. We used na.rm = TRUE so that only non-NA values entered the calculations. Note that we did not actually drop the NA rows: some participants were missing age and sex data, and while their missing values should not affect the mean and SD, their rows still needed to count toward n.
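
A toy illustration of ours (not from our pipeline): na.rm = TRUE skips NAs inside the calculation, while the row itself still counts toward n.

ages <- c(20, 22, NA)
mean(ages, na.rm = TRUE)  # 21 - the NA is skipped in the mean
length(ages)              # 3  - the NA row still counts toward n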

datSite <- datSite %>% 
  group_by(site) %>% 
  summarise(mean = mean(age, na.rm = TRUE),
            sd = sd(age, na.rm = TRUE),
            n = n())

We then relabelled the values to make our tables readable.

datSex$sex[datSex$sex == 0] <- "Female"
datSex$sex[datSex$sex == 1] <- "Male"

datSite$site[datSite$site == 1] <- "USA"
datSite$site[datSite$site == 2] <- "Czech Republic"
datSite$site[datSite$site == 3] <- "Japan"

Lastly, we made a simple gt table to lay out our data. We used fmt_number to round our data to two decimal places.

gt1 <- gt(datSite) 
gt2 <- gt(datSex)


gt1 %>% fmt_number(
    columns = c("mean", "sd"),
    decimals = 2,
    use_seps = FALSE
  )
site            mean    sd     n
USA             25.54   9.84   123
Czech Republic  24.40   3.37   128
Japan           19.79   0.90   157
gt2
sex      n
Female   228
Male     227
NA       5

Table 1

The first thing to do was to load our packages. We used the gt package for the bulk of the visualization of this table.

library(tidyverse)
library(dplyr)
library(gt)
data1 <- read.csv("Nichols_et_al_data.csv")

Below is the standard header of code we included in nearly every R script we used. It applied the paper's exclusion criteria with filter(), renamed variables to make them easier to read, and selected only the variables we would use for our plots.

data1 <- data1 %>% 
  filter(include == 0) %>% 
  rename(cond = con,
         claimpercent = claim,
         claimmoney = moneyclaim,
         CT_practice = completion.time..practice.included.,
         CT_payments = completion.time..payments.only.,
         religiosity = relig,
         religion = Religion) %>% 
  select(-CT_practice, -CT_payments, -CT, -CT_cheat, -Religion.Text, -affil_cong)

Some variables had to be created out of existing ones, since the paper combined several ratings of the music into composite categories, for instance tempo and negativity/positivity. After reading the OSF repo, we found that these were simply averages of other variables and coded them according to the repo.

There was still some confusion, however. We are still not sure why tempo is reverse-coded by taking the absolute value of slow minus seven, given that slow is a 5-point scale from 1 to 5. This is unlikely to be resolved without contacting the authors themselves.
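
To make the puzzle concrete, here is what each transformation does to a 1-to-5 scale (our own illustration, not from the repo):

slow <- 1:5
6 - slow       # 5 4 3 2 1 - conventional reverse coding for a 1-5 scale
abs(slow - 7)  # 6 5 4 3 2 - the repo's version, which shifts the range to 2-6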

data1 <- data1 %>% 
  mutate(
    negativity = (distressing + irritating + boring + sad)/4,
    positivity = (interesting + pleasant + exciting + relaxing + happy)/5,
    impact = (deep + powerful)/2,
    tempo = (fast + abs(slow-7))/2
  ) 

We then went about finding the descriptive statistics for these variables.

The next piece of code shows our first attempt. We grouped the data by condition, then used summarise() to take the mean and SD of all our variables. However, this simply returned a single table containing the values. Since we needed to place these values inside a larger table alongside many other values, we needed code that returned each value individually.

data1 %>% 
  group_by(cond) %>%
  summarise(mean_claim = mean(claimpercent, na.rm = TRUE),
            mean_sac = mean(sacred, na.rm = TRUE),
            mean_neg = mean(negativity, na.rm = TRUE),
            mean_pos = mean(positivity, na.rm = TRUE),
            mean_imp = mean(impact, na.rm = TRUE),
            mean_temp = mean(tempo, na.rm = TRUE),
            SD_claim = sd(claimpercent, na.rm = TRUE),
            SD_sac = sd(sacred, na.rm = TRUE),
            SD_neg = sd(negativity, na.rm = TRUE),
            SD_pos = sd(positivity, na.rm = TRUE),
            SD_imp = sd(impact, na.rm = TRUE),
            SD_temp = sd(tempo, na.rm = TRUE)
  )
## # A tibble: 4 x 13
##    cond mean_claim mean_sac mean_neg mean_pos mean_imp mean_temp SD_claim SD_sac
##   <int>      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>    <dbl>  <dbl>
## 1     1      0.224   NaN      NaN      NaN      NaN       NaN       0.239  NA   
## 2     2      0.294     3.26     2.58     1.38     1.76      3.99    0.311   1.19
## 3     3      0.303     4.32     1.64     2.72     2.73      2.78    0.310   1.35
## 4     4      0.273     5.17     1.82     2.54     2.91      2.60    0.299   1.39
## # ... with 4 more variables: SD_neg <dbl>, SD_pos <dbl>, SD_imp <dbl>,
## #   SD_temp <dbl>

After consulting the authors’ code, we found a solution: creating a new object for each value we obtained. We used the $ operator to select the variable we were calculating, and square brackets to subset by condition group. na.rm = TRUE removed any NA values from the calculation, and for the percent-claimed variable we multiplied the result by 100 to give a percentage, which we stored in the object. When counting participants we used !is.na(). The ! operator stands for “not”, and is.na() returns TRUE when a value is NA, so the subset keeps only the values that are NOT NA; unlike na.rm, which silently drops NAs inside a single calculation, this gave us explicit control over which rows were counted. R also comes with handy, aptly named functions to calculate the mean, SD, and other descriptives. For the CI limits we used a simple formula based on the mean and standard error we had just calculated.
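
A tiny illustration of ours (not from the scripts) of how the two NA approaches differ:

v <- c(1, NA, 3)
mean(v, na.rm = TRUE)  # 2 - na.rm drops the NA inside the calculation
v[!is.na(v)]           # 1 3 - an explicit subset of the non-NA values
length(v[!is.na(v)])   # 2 - what we count when tallying responses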

The naming scheme is as follows: the descriptive comes first (mean for mean, sd for SD, n for number of participants, se for standard error, lCI and uCI for the lower and upper 95% CI limits). The number that follows gives the condition, with 1 being control, 2 white noise, 3 secular, and 4 the religious condition. Lastly, a letter identifies the variable: for instance, c is claimpercent and s is sacred. Under this scheme, mean1c is the mean claimpercent for participants in the control (1st) condition.

Now that we have finished this table, it would probably have been easier to write a function for this code. Although an R function returns a single object, that object can be a named list bundling several values, so a function taking the data, condition, and variable and returning all the descriptives together should work and would save a lot of lines. It is definitely something worth looking into for future replications.
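
For example, a minimal sketch of such a helper (hypothetical names, not part of our scripts), returning all the descriptives for one cell as a named list:

# Descriptives for one variable within one condition.
# `scale` lets claimpercent be expressed as a percentage (scale = 100).
describe_cell <- function(data, var, condition, scale = 1) {
  x  <- data[[var]][data$cond == condition] * scale
  m  <- mean(x, na.rm = TRUE)
  s  <- sd(x, na.rm = TRUE)
  n  <- sum(data$cond == condition & !is.na(data$cond))
  se <- s / sqrt(n)
  list(mean = m, sd = s, n = n, se = se,
       lCI = m - 1.96 * se, uCI = m + 1.96 * se)
}
# e.g. describe_cell(data1, "claimpercent", 1, scale = 100)$mean
# should reproduce mean1c from the code below.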

# Percent claimed variable
# Control Group (1)
mean1c <- mean(data1$claimpercent[data1$cond=="1"], na.rm = TRUE)*100 # mean
sd1c <- sd(data1$claimpercent[data1$cond=="1"], na.rm = TRUE)*100 # SD
n1c <- length(data1$cond[data1$cond=="1" & !is.na(data1$cond)]) # number of responses
se1c <- sd1c/sqrt(n1c) # standard error
lCI1c <- mean1c - (1.96*se1c) # lower 95% CI
uCI1c <- mean1c + (1.96*se1c) # upper 95% CI
# White Noise Group (2)
mean2c <- mean(data1$claimpercent[data1$cond=="2"], na.rm = TRUE)*100 # mean
sd2c <- sd(data1$claimpercent[data1$cond=="2"], na.rm = TRUE)*100 # SD
n2c <- length(data1$cond[data1$cond=="2" & !is.na(data1$cond)]) # number of responses
se2c <- sd2c/sqrt(n2c) # standard error
lCI2c <- mean2c - (1.96*se2c) # lower 95% CI
uCI2c <- mean2c + (1.96*se2c) #upper 95% CI
# Secular Group (3)
mean3c <- mean(data1$claimpercent[data1$cond=="3"], na.rm = TRUE)*100 # mean
sd3c <- sd(data1$claimpercent[data1$cond=="3"], na.rm = TRUE)*100 # SD
n3c <- length(data1$cond[data1$cond=="3" & !is.na(data1$cond)]) # number of responses
se3c <- sd3c/sqrt(n3c) # standard error
lCI3c <- mean3c - (1.96*se3c) # lower 95% CI
uCI3c <- mean3c + (1.96*se3c) #upper 95% CI
# Religious Group (4)
mean4c <- mean(data1$claimpercent[data1$cond=="4"], na.rm = TRUE)*100 # mean
sd4c <- sd(data1$claimpercent[data1$cond=="4"], na.rm = TRUE)*100 # SD
n4c <- length(data1$cond[data1$cond=="4" & !is.na(data1$cond)]) # number of responses
se4c <- sd4c/sqrt(n4c) # standard error
lCI4c <- mean4c - (1.96*se4c) # lower 95% CI
uCI4c <- mean4c + (1.96*se4c) #upper 95% CI
# Sacredness variable
# White Noise Group (2)
mean2s <- mean(data1$sacred[data1$cond=="2"], na.rm = TRUE) # mean
sd2s <- sd(data1$sacred[data1$cond=="2"], na.rm = TRUE) # SD
n2s <- length(data1$cond[data1$cond=="2" & !is.na(data1$cond)]) # number of responses
se2s <- sd2s/sqrt(n2s) # standard error
lCI2s <- mean2s - (1.96*se2s) # lower 95% CI
uCI2s <- mean2s + (1.96*se2s) #upper 95% CI
# Secular Group (3)
mean3s <- mean(data1$sacred[data1$cond=="3"], na.rm = TRUE) # mean
sd3s <- sd(data1$sacred[data1$cond=="3"], na.rm = TRUE) # SD
n3s <- length(data1$cond[data1$cond=="3" & !is.na(data1$cond)]) # number of responses
se3s <- sd3s/sqrt(n3s) # standard error
lCI3s <- mean3s - (1.96*se3s) # lower 95% CI
uCI3s <- mean3s + (1.96*se3s) #upper 95% CI
# Religious Group (4)
mean4s <- mean(data1$sacred[data1$cond=="4"], na.rm = TRUE) # mean
sd4s <- sd(data1$sacred[data1$cond=="4"], na.rm = TRUE) # SD
n4s <- length(data1$cond[data1$cond=="4" & !is.na(data1$cond)]) # number of responses
se4s <- sd4s/sqrt(n4s) # standard error
lCI4s <- mean4s - (1.96*se4s) # lower 95% CI
uCI4s <- mean4s + (1.96*se4s) #upper 95% CI
# Negativity variable
# White Noise Group (2)
mean2n <- mean(data1$negativity[data1$cond=="2"], na.rm = TRUE) # mean
sd2n <- sd(data1$negativity[data1$cond=="2"], na.rm = TRUE) # SD
n2n <- length(data1$cond[data1$cond=="2" & !is.na(data1$cond)]) # number of responses
se2n <- sd2n/sqrt(n2n) # standard error
lCI2n <- mean2n - (1.96*se2n) # lower 95% CI
uCI2n <- mean2n + (1.96*se2n) #upper 95% CI
# Secular Group (3)
mean3n <- mean(data1$negativity[data1$cond=="3"], na.rm = TRUE) # mean
sd3n <- sd(data1$negativity[data1$cond=="3"], na.rm = TRUE) # SD
n3n <- length(data1$cond[data1$cond=="3" & !is.na(data1$cond)]) # number of responses
se3n <- sd3n/sqrt(n3n) # standard error
lCI3n <- mean3n - (1.96*se3n) # lower 95% CI
uCI3n <- mean3n + (1.96*se3n) #upper 95% CI
# Religious Group (4)
mean4n <- mean(data1$negativity[data1$cond=="4"], na.rm = TRUE) # mean
sd4n <- sd(data1$negativity[data1$cond=="4"], na.rm = TRUE) # SD
n4n <- length(data1$cond[data1$cond=="4" & !is.na(data1$cond)]) # number of responses
se4n <- sd4n/sqrt(n4n) # standard error
lCI4n <- mean4n - (1.96*se4n) # lower 95% CI
uCI4n <- mean4n + (1.96*se4n) #upper 95% CI
# Positivity variable
# White Noise Group (2)
mean2p <- mean(data1$positivity[data1$cond=="2"], na.rm = TRUE) # mean
sd2p <- sd(data1$positivity[data1$cond=="2"], na.rm = TRUE) # SD
n2p <- length(data1$cond[data1$cond=="2" & !is.na(data1$cond)]) # number of responses
se2p <- sd2p/sqrt(n2p) # standard error
lCI2p <- mean2p - (1.96*se2p) # lower 95% CI
uCI2p <- mean2p + (1.96*se2p) #upper 95% CI
# Secular Group (3)
mean3p <- mean(data1$positivity[data1$cond=="3"], na.rm = TRUE) # mean
sd3p <- sd(data1$positivity[data1$cond=="3"], na.rm = TRUE) # SD
n3p <- length(data1$cond[data1$cond=="3" & !is.na(data1$cond)]) # number of responses
se3p <- sd3p/sqrt(n3p) # standard error
lCI3p <- mean3p - (1.96*se3p) # lower 95% CI
uCI3p <- mean3p + (1.96*se3p) #upper 95% CI
# Religious Group (4)
mean4p <- mean(data1$positivity[data1$cond=="4"], na.rm = TRUE) # mean
sd4p <- sd(data1$positivity[data1$cond=="4"], na.rm = TRUE) # SD
n4p <- length(data1$cond[data1$cond=="4" & !is.na(data1$cond)]) # number of responses
se4p <- sd4p/sqrt(n4p) # standard error
lCI4p <- mean4p - (1.96*se4p) # lower 95% CI
uCI4p <- mean4p + (1.96*se4p) #upper 95% CI
# Tempo variable
# White Noise Group (2)
mean2t <- mean(data1$tempo[data1$cond=="2"], na.rm = TRUE) # mean
sd2t <- sd(data1$tempo[data1$cond=="2"], na.rm = TRUE) # SD
n2t <- length(data1$cond[data1$cond=="2" & !is.na(data1$cond)]) # number of responses
se2t <- sd2t/sqrt(n2t) # standard error
lCI2t <- mean2t - (1.96*se2t) # lower 95% CI
uCI2t <- mean2t + (1.96*se2t) #upper 95% CI
# Secular Group (3)
mean3t <- mean(data1$tempo[data1$cond=="3"], na.rm = TRUE) # mean
sd3t <- sd(data1$tempo[data1$cond=="3"], na.rm = TRUE) # SD
n3t <- length(data1$cond[data1$cond=="3" & !is.na(data1$cond)]) # number of responses
se3t <- sd3t/sqrt(n3t) # standard error
lCI3t <- mean3t - (1.96*se3t) # lower 95% CI
uCI3t <- mean3t + (1.96*se3t) #upper 95% CI
# Religious Group (4)
mean4t <- mean(data1$tempo[data1$cond=="4"], na.rm = TRUE) # mean
sd4t <- sd(data1$tempo[data1$cond=="4"], na.rm = TRUE) # SD
n4t <- length(data1$cond[data1$cond=="4" & !is.na(data1$cond)]) # number of responses
se4t <- sd4t/sqrt(n4t) # standard error
lCI4t <- mean4t - (1.96*se4t) # lower 95% CI
uCI4t <- mean4t + (1.96*se4t) #upper 95% CI
# Impact variable
# White Noise Group (2)
mean2i <- mean(data1$impact[data1$cond=="2"], na.rm = TRUE) # mean
sd2i <- sd(data1$impact[data1$cond=="2"], na.rm = TRUE) # SD
n2i <- length(data1$cond[data1$cond=="2" & !is.na(data1$cond)]) # number of responses
se2i <- sd2i/sqrt(n2i) # standard error
lCI2i <- mean2i - (1.96*se2i) # lower 95% CI
uCI2i <- mean2i + (1.96*se2i) #upper 95% CI
# Secular Group (3)
mean3i <- mean(data1$impact[data1$cond=="3"], na.rm = TRUE) # mean
sd3i <- sd(data1$impact[data1$cond=="3"], na.rm = TRUE) # SD
n3i <- length(data1$cond[data1$cond=="3" & !is.na(data1$cond)]) # number of responses
se3i <- sd3i/sqrt(n3i) # standard error
lCI3i <- mean3i - (1.96*se3i) # lower 95% CI
uCI3i <- mean3i + (1.96*se3i) #upper 95% CI
# Religious Group (4)
mean4i <- mean(data1$impact[data1$cond=="4"], na.rm = TRUE) # mean
sd4i <- sd(data1$impact[data1$cond=="4"], na.rm = TRUE) # SD
n4i <- length(data1$cond[data1$cond=="4" & !is.na(data1$cond)]) # number of responses
se4i <- sd4i/sqrt(n4i) # standard error
lCI4i <- mean4i - (1.96*se4i) # lower 95% CI
uCI4i <- mean4i + (1.96*se4i) #upper 95% CI

We then calculated Cohen’s d, which gives the effect size for each of the six variables relative to the religious condition.

The formula for Cohen’s d, as implemented in the code below, is:

$$d = \frac{\lvert M_1 - M_2 \rvert}{\sqrt{(SD_1^2 + SD_2^2)/2}}$$

We used the authors’ code to help us turn this formula into R. abs() returns the absolute value of the expression inside the brackets. Each condition is always compared against condition 4, the religious condition, to give the appropriate effect size. For these objects the naming scheme is similar to the code above, except that 1 now refers to the religious vs. secular comparison, 2 to religious vs. noise, and 3 to religious vs. control.
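
Again, a small helper (our sketch, not the authors’ code) would have condensed the repetition below:

# Cohen's d from two means and SDs, pooling the SDs as their root mean square
cohens_d <- function(m1, m2, s1, s2) abs((m1 - m2) / sqrt((s1^2 + s2^2) / 2))
# e.g. d1c is equivalent to cohens_d(mean4c, mean3c, sd4c, sd3c)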

# computing Cohen's d
# % claimed
#relig vs sec
d1c <- abs((mean4c-mean3c)/sqrt((sd4c^2+sd3c^2)/2))
#relig vs noise
d2c <- abs((mean4c-mean2c)/sqrt((sd4c^2+sd2c^2)/2))
#relig vs control
d3c <- abs((mean4c-mean1c)/sqrt((sd4c^2+sd1c^2)/2))
# sacredness
#relig vs sec
d1s <- abs((mean4s-mean3s)/sqrt((sd4s^2+sd3s^2)/2))
#relig vs noise
d2s <- abs((mean4s-mean2s)/sqrt((sd4s^2+sd2s^2)/2))
# negativity
#relig vs sec
d1n <- abs((mean4n-mean3n)/sqrt((sd4n^2+sd3n^2)/2))
#relig vs noise
d2n <- abs((mean4n-mean2n)/sqrt((sd4n^2+sd2n^2)/2))
# positivity
#relig vs sec
d1p <- abs((mean4p-mean3p)/sqrt((sd4p^2+sd3p^2)/2))
#relig vs noise
d2p <- abs((mean4p-mean2p)/sqrt((sd4p^2+sd2p^2)/2))
# tempo
#relig vs sec
d1t <- abs((mean4t-mean3t)/sqrt((sd4t^2+sd3t^2)/2))
#relig vs noise
d2t <- abs((mean4t-mean2t)/sqrt((sd4t^2+sd2t^2)/2))
# impact
#relig vs sec
d1i <- abs((mean4i-mean3i)/sqrt((sd4i^2+sd3i^2)/2))
#relig vs noise
d2i <- abs((mean4i-mean2i)/sqrt((sd4i^2+sd2i^2)/2))

With all our values calculated, all that was left was to visualize them in a table. We first constructed a tibble, placing our objects in the order used in the paper. The c() function let us pass multiple objects into each column. This first table held the values for the religious condition only.

We then used mutate_if() to round the values to 2 decimal places, and gt() to visualize the result as a gt table. The gt package makes creating attractive tables extremely simple, which made it perfect for us. We used the cols_label() function to label this portion of the table as the Religious condition.

#Making the tables
#Make religious table
table1 <- tibble(
  characteristics = c("% claimed", "Sacredness", "Negativity", "Positivity", "Tempo", "Impact"),
  M = c(mean4c, mean4s, mean4n, mean4p, mean4t, mean4i),
  SD = c(sd4c, sd4s, sd4n, sd4p, sd4t, sd4i),
  lCI = c(lCI4c, lCI4s, lCI4n, lCI4p, lCI4t, lCI4i),
  uCI = c(uCI4c, uCI4s, uCI4n, uCI4p, uCI4t, uCI4i),
  d = c("-", "-", "-", "-", "-", "-")
) 
table1 %>% mutate_if(is.numeric, ~round(., 2)) %>%
  gt() %>%
  cols_label(characteristics = "Religious") 
Religious     M      SD     lCI    uCI    d
% claimed     27.33  29.92  21.52  33.14  -
Sacredness    5.17   1.39   4.90   5.44   -
Negativity    1.82   0.65   1.70   1.95   -
Positivity    2.54   0.85   2.37   2.70   -
Tempo         2.60   0.81   2.45   2.76   -
Impact        2.91   1.09   2.70   3.12   -

We repeated this code for all of the other conditions. Note that in the religious table we used hyphens for the d values: we did not calculate Cohen’s d for the religious condition itself, since every other condition was being compared against it.

#Make secular table
table2 <- tibble(
  characteristics = c("% claimed", "Sacredness", "Negativity", "Positivity", "Tempo", "Impact"),
  M = c(mean3c, mean3s, mean3n, mean3p, mean3t, mean3i),
  SD = c(sd3c, sd3s, sd3n, sd3p, sd3t, sd3i),
  lCI = c(lCI3c, lCI3s, lCI3n, lCI3p, lCI3t, lCI3i),
  uCI = c(uCI3c, uCI3s, uCI3n, uCI3p, uCI3t, uCI3i),
  d = c(d1c, d1s, d1n, d1p, d1t, d1i)
)
table2 %>% mutate_if(is.numeric, ~round(., 2)) %>%
  gt() %>%
  cols_label(characteristics = "Secular")
Secular       M      SD     lCI    uCI    d
% claimed     30.32  30.98  24.33  36.30  0.10
Sacredness    4.32   1.35   4.06   4.58   0.62
Negativity    1.64   0.51   1.55   1.74   0.31
Positivity    2.72   0.78   2.56   2.87   0.22
Tempo         2.78   0.82   2.62   2.94   0.22
Impact        2.73   1.13   2.51   2.95   0.16
#Make white noise table
table3 <- tibble(
  characteristics = c("% claimed", "Sacredness", "Negativity", "Positivity", "Tempo", "Impact"),
  M = c(mean2c, mean2s, mean2n, mean2p, mean2t, mean2i),
  SD = c(sd2c, sd2s, sd2n, sd2p, sd2t, sd2i),
  lCI = c(lCI2c, lCI2s, lCI2n, lCI2p, lCI2t, lCI2i),
  uCI = c(uCI2c, uCI2s, uCI2n, uCI2p, uCI2t, uCI2i),
  d = c(d2c, d2s, d2n, d2p, d2t, d2i)
)
table3 %>% mutate_if(is.numeric, ~round(., 2)) %>%
  gt() %>%
  cols_label(characteristics = "White Noise")
White Noise   M      SD     lCI    uCI    d
% claimed     29.40  31.09  23.39  35.40  0.07
Sacredness    3.26   1.19   3.04   3.49   1.48
Negativity    2.58   0.96   2.39   2.76   0.93
Positivity    1.38   0.62   1.26   1.50   1.56
Tempo         3.99   0.72   3.85   4.13   1.80
Impact        1.76   0.93   1.58   1.94   1.13
#Make control group table
table4 <- tibble(
  characteristics = c("% claimed", "Sacredness", "Negativity", "Positivity", "Tempo", "Impact"),
  M = c(mean1c, NA, NA, NA, NA, NA),
  SD = c(sd1c, NA, NA, NA, NA, NA),
  lCI = c(lCI1c, NA, NA, NA, NA, NA),
  uCI = c(uCI1c, NA, NA, NA, NA, NA),
  d = c(d3c, NA, NA, NA, NA, NA)
)
table4 %>% mutate_if(is.numeric, ~round(., 2)) %>%
  gt() %>%
  cols_label(characteristics = "Control Group")
Control Group  M      SD     lCI    uCI    d
% claimed      22.38  23.89  17.69  27.06  0.18
Sacredness     NA     NA     NA     NA     NA
Negativity     NA     NA     NA     NA     NA
Positivity     NA     NA     NA     NA     NA
Tempo          NA     NA     NA     NA     NA
Impact         NA     NA     NA     NA     NA

Lastly, we took the tables and used data.frame to combine them into one.

data_tables <- data.frame(Religious = table1, 
                          Secular = table2,
                          WhiteNoise = table3,
                          Control = table4)
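
Combining data frames this way explains the column names used below: data.frame() prefixes each sub-table's columns with the argument name. A quick illustration of ours:

names(data.frame(a = data.frame(x = 1, y = 2), b = data.frame(x = 3)))
# [1] "a.x" "a.y" "b.x" - hence Religious.M, Secular.M, and so on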

Next, we used gt() once again, this time on our combined table. tab_source_note() let us add a note at the bottom of the table, identical to the paper’s. tab_spanner() let us create a heading spanning several columns, marking which columns belonged to which condition.

gt_tbl <- gt(data_tables) %>% 
  tab_source_note(
    source_note = "M = Mean; SD = Standard Deviation; CI = 95% Confidence Intervals. Cohen's d represents the effect size of comparisons between musical conditions.") %>%
  tab_spanner(
    label = "Religious (n = 102)",
    columns = c(Religious.M, Religious.SD, Religious.lCI, Religious.uCI, Religious.d)
  ) %>%
  tab_spanner(
    label = "Secular (n = 103)",
    columns = c(Secular.M, Secular.SD, Secular.lCI, Secular.uCI, Secular.d)
  ) %>%
  tab_spanner(
    label = "White Noise (n = 103)",
    columns = c(WhiteNoise.M, WhiteNoise.SD, WhiteNoise.lCI, WhiteNoise.uCI, WhiteNoise.d)
  ) %>%
  tab_spanner(
    label = "Control (n = 100)",
    columns = c(Control.M, Control.SD, Control.lCI, Control.uCI, Control.d)
  )

We used cols_label() to name each column according to the paper, then fmt_number() to round each value to 2 decimal places. Next, we used cols_hide() to hide certain columns: because we initially created four individual tables before combining them, there were several duplicated label columns to remove. Finally, we needed to turn the NA values into “-” to match the paper, which we did with fmt_missing(). This mirrors the literal hyphens we had entered earlier for the religious condition’s Cohen’s d values.

gt_tbl %>%  cols_label(Religious.characteristics = NULL, 
             Religious.M = "M",
             Religious.SD = "SD",
             Religious.lCI = "lCI",
             Religious.uCI = "uCI",
             Religious.d = "d",
             Secular.characteristics = "Secular",
             Secular.M = "M",
             Secular.SD = "SD",
             Secular.lCI = "lCI",
             Secular.uCI = "uCI",
             Secular.d = "d",
             WhiteNoise.characteristics = "White Noise",
             WhiteNoise.M = "M",
             WhiteNoise.SD = "SD",
             WhiteNoise.lCI = "lCI",
             WhiteNoise.uCI = "uCI",
             WhiteNoise.d = "d",
             Control.characteristics = "Control",
             Control.M = "M",
             Control.SD = "SD",
             Control.lCI = "lCI",
             Control.uCI = "uCI",
             Control.d = "d") %>% 
  fmt_number(
      columns = c(Religious.M, Religious.SD, Religious.lCI, Religious.uCI, Secular.M, Secular.SD, Secular.lCI,
                  Secular.uCI, Secular.d, WhiteNoise.M, WhiteNoise.SD, WhiteNoise.lCI, WhiteNoise.uCI,
                  WhiteNoise.d, Control.M, Control.SD, Control.lCI, Control.uCI, Control.d),
      decimals = 2,
      use_seps = FALSE
  ) %>% 
  cols_hide(
    columns = c(Secular.characteristics, WhiteNoise.characteristics, Control.characteristics)
  ) %>% 
  fmt_missing(
    columns = everything(),
    missing_text = "-"
  )
            Religious (n = 102)                Secular (n = 103)                  White Noise (n = 103)              Control (n = 100)
            M      SD     lCI    uCI    d      M      SD     lCI    uCI    d      M      SD     lCI    uCI    d      M      SD     lCI    uCI    d
% claimed   27.33  29.92  21.52  33.14  -      30.32  30.98  24.33  36.30  0.10   29.40  31.09  23.39  35.40  0.07   22.38  23.89  17.69  27.06  0.18
Sacredness  5.17   1.39   4.90   5.44   -      4.32   1.35   4.06   4.58   0.62   3.26   1.19   3.04   3.49   1.48   -      -      -      -      -
Negativity  1.82   0.65   1.70   1.95   -      1.64   0.51   1.55   1.74   0.31   2.58   0.96   2.39   2.76   0.93   -      -      -      -      -
Positivity  2.54   0.85   2.37   2.70   -      2.72   0.78   2.56   2.87   0.22   1.38   0.62   1.26   1.50   1.56   -      -      -      -      -
Tempo       2.60   0.81   2.45   2.76   -      2.78   0.82   2.62   2.94   0.22   3.99   0.72   3.85   4.13   1.80   -      -      -      -      -
Impact      2.91   1.09   2.70   3.12   -      2.73   1.13   2.51   2.95   0.16   1.76   0.93   1.58   1.94   1.13   -      -      -      -      -
M = Mean; SD = Standard Deviation; CI = 95% Confidence Intervals. Cohen's d represents the effect size of comparisons between musical conditions.

Figure 1

We first loaded in the packages and data required for the figure.

library(janitor) # data-cleaning helpers (read_csv() below comes from readr, attached earlier with tidyverse)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(ggplot2) #used to plot column
library(ggdist) #used for raincloud plot
library(gridExtra) #combines both plots
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
data2 <- read_csv("Nichols_et_al_data.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   moneyclaim = col_character(),
##   `completion time (practice included)` = col_time(format = ""),
##   `completion time (payments only)` = col_time(format = ""),
##   Religion = col_character(),
##   `Religion Text` = col_character()
## )
## i Use `spec()` for the full column specifications.

claimpercent had to be mutated into a percentage by multiplying by 100, since the values were stored as proportions. Then we used as_tibble() to make the data easier to look at.

data2 <- data2 %>%  
  filter(include == 0) %>% 
  rename(cond = con,
         claimpercent = claim,
         claimmoney = moneyclaim,
         CT_practice = `completion time (practice included)`,
         CT_payments = `completion time (payments only)`,
         religiosity = relig,
         religion = Religion) %>% 
  select(site, claimpercent, id, cond) %>% 
  mutate(claimpercent = claimpercent * 100) %>% 
  as_tibble()

We reordered the cond and site variables to code them in the same order as the original paper. This code was taken from the authors. At first we were confused: why did they recode cond == 4 twice, and why did they not recode cond == 2 at all? After playing around with the code, we realized that R executes the lines in order, so a naive recoding such as 1 → 3 followed by 3 → 1 would change the new 3s straight back to 1. The value 4, freed up by the first line, serves as a temporary slot. cond == 2 maps to itself, so it needed no recoding. An alternative that avoids the ordering trap entirely is shown after the chunk.

# data by condition
data2$cond[data2$cond==4] <- 0 # Make religious prime the reference category
data2$cond[data2$cond==1] <- 4 # This is in a weird order as R reads the code line by line, so if we go from top to bottom, 
data2$cond[data2$cond==3] <- 1 # we're changing the number twice which screws up our dataframe
data2$cond[data2$cond==4] <- 3
# data by site
data2$site[data2$site==1] <- 0 
data2$site[data2$site==3] <- 1 
# makes USA = 0, Japan = 1, Czech Republic = 2
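
As an aside, dplyr::recode() maps all old values to new ones simultaneously, so no temporary value is needed (our suggestion, not the authors' code):

cond_raw <- c(1, 2, 3, 4)
dplyr::recode(cond_raw, `1` = 3, `3` = 1, `4` = 0)
# [1] 3 2 1 0 - the same mapping as the four lines above, in one call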

We used mutate() from the dplyr package to create claimpercent2, which is the original claimpercent value divided by the number of participants in that participant’s condition. We did this because geom_col() sums the values it is given, whereas we wanted the columns to represent means. We used conditional statements to determine what to divide each claimpercent by: the code works out which condition the participant was in, then divides that row’s claimpercent by the number of participants in the same condition, so that when the values are all added up we are left with the mean. Using the | (or) and == (equal to) operators, we built expressions that contribute the right divisor only when the matching condition is met (see the toy example after the chunk).

# data by condition
data2 <- data2 %>% 
  mutate(numberOf = (cond == 0) * 100 + (cond == 1 | cond == 2) * 103 + (cond == 3) * 102) %>% 
  mutate(claimpercent2 = claimpercent / numberOf)
# data by site
data2 <- data2 %>% 
  mutate(numberOf2 = (site == 0) * 123 + (site == 2) * 127 + (site == 1) * 156) %>% 
  mutate(claimpercent3 = claimpercent / numberOf2)
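
As a quick check of the trick, a toy example of ours: logical comparisons coerce to 1/0, so each term contributes either the group's n or nothing.

cond <- c(0, 1, 2, 3)
(cond == 0) * 100 + (cond == 1 | cond == 2) * 103 + (cond == 3) * 102
# [1] 100 103 103 102 - each row picks up exactly one group size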

We created group_cond so that geom_errorbar() (in the next chunk) could plot the error bars. group_by() from dplyr creates a grouped tibble, so any wrangling we apply is done per group; here we grouped by condition. We adapted code we found on StackOverflow to calculate the descriptives and CI limits, once again relying on R’s handy descriptive functions, plus drop_na() to drop any rows with NA values.

We then repeated this code grouping by site, since both plots have an error bar.

We also converted the condition and site variables into factors. When they were kept numeric, the plots had trouble aligning the x-axis intervals with the geoms. We converted them at this point in particular because the calculations above only accept continuous values, not factors.
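
For reference, the limits computed below implement the usual t-based 95% confidence interval:

$$\bar{x} \;\pm\; t_{0.975,\,n-1} \times \frac{s}{\sqrt{n}}$$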

# data by condition
group_cond <- data2 %>% 
  group_by(cond) %>% 
  summarise(mean = mean(claimpercent, na.rm = TRUE),
            sd = sd(claimpercent, na.rm = TRUE),
            n = n()) %>% 
  drop_na() %>% 
  mutate(se = sd / sqrt(n),
         lowerCI = mean - qt(1 - (0.05/2), n - 1) * se,
         upperCI = mean + qt(1 - (0.05/2), n - 1) * se) %>% 
  ungroup()
group_cond$cond <- factor(group_cond$cond)
# data by site
group_site <- data2 %>% 
  group_by(site) %>% 
  summarise(mean = mean(claimpercent, na.rm = TRUE),
            sd = sd(claimpercent, na.rm = TRUE),
            n = n()) %>% 
  drop_na() %>% 
  mutate(se = sd / sqrt(n),
         lowerCI = mean - qt(1 - (0.05/2), n - 1) * se,
         upperCI = mean + qt(1 - (0.05/2), n - 1) * se) %>% 
  ungroup()
group_site$site <- factor(group_site$site)

The cloud was created using stat_halfeye() from the ggdist package, and the bar graph underneath it using geom_col(). We had to adjust the widths and nudge the cloud so that both geoms fit at each x interval. The graph is also flipped, so that the x axis is plotted on the y axis and vice versa; this was achieved with coord_flip(), to match the plot in the paper.

We then used geom_errorbar() to plot the error bars for all of the graphs. We used the lowerCI and upperCI values we had calculated previously, and then used the width argument to arrange it so that it was aligned with the bar graph. The expand_limits() function expanded the x and y axis limits to include (0,0), to match the graph in the paper.

scale_x_discrete() and scale_y_continuous() set the spacing of the axis intervals to match the paper; we also used them to label the axes.

theme_classic() removed the gridlines and set the background to white.

The argument legend.position = "none" removed the legend, and scale_fill_manual() let us input the exact hex codes for the colours in the paper’s graph. We obtained the hex codes with a website that extracts colour values from uploaded images.

We used ggtitle() to create the letter representing each plot at the top left of the graph, and then used facet_grid() in order to enclose the title in a grey box, similar to the author’s plot.

# data by condition
data2$cond <- factor(data2$cond)
fig1_cond <- ggplot(data2, aes(x = cond)) +
  geom_col(
    aes(y = claimpercent2, fill = cond),
    width = .3
  ) +
  ggdist::stat_halfeye(
    aes(y = claimpercent, fill = cond),
    adjust = .5,
    width = .4,
    .width = 0,
    point_colour = NA,
    position = position_nudge(x = 0.17, y = 0)
  ) +
  geom_errorbar(data = group_cond,
                aes(ymin = lowerCI, ymax = upperCI),
                show.legend = NA,
                width = .3
                ) +
  coord_flip() +
  expand_limits(x = 0, y = 0) +
  scale_x_discrete(name = NULL,
                     expand = c(0,0),
                     breaks = c(0, 1, 2, 3),
                     labels = c("Control", "Noise", "Secular", "Religious")
  ) +
  scale_y_continuous(name = "Percent Claimed",
                     expand = c(0,0),
                     breaks = c(0, 20, 40, 60, 80, 100)
                     ) +
  theme_classic() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#adadad","#336ca3","#77c2ec","#ecb234")) +
  ggtitle(label = "A.") +
  facet_grid(. ~ "Data by condition")
plot(fig1_cond)

# data by site
data2$site <- factor(data2$site)
fig1_site <- ggplot(data2, aes(x = site)) +
  geom_col(
    aes(y = claimpercent3, fill = site),
    width = .3
  ) +
  ggdist::stat_halfeye(
    aes(y = claimpercent, fill = site),
    adjust = .5,
    width = .3,
    .width = 0,
    point_colour = NA,
    position = position_nudge(x = 0.2, y = 0)
  ) +
  geom_errorbar(data = group_site,
                aes(ymin = lowerCI, ymax = upperCI),
                width = .3
  ) +
  coord_flip() +
  expand_limits(x = 0, y = 0) +
  scale_x_discrete(name = NULL,
                     expand = c(0,0), 
                     breaks = c(0, 1, 2),
                     labels = c("USA","Japan","Czech Rep.")
                     ) +
  scale_y_continuous(name = "Percent Claimed",
                     expand = c(0,0),
                     breaks = c(0, 20, 40, 60, 80, 100)
                     ) +
  theme_classic() +
  theme(legend.position = "none") +
  scale_fill_manual(values = c("#fec8bc","#f2667c","#d75b5c")) +
  ggtitle(label = "B.") +
  facet_grid(. ~ "Data by site") 
plot(fig1_site)

We then plotted the two graphs side by side using grid.arrange() from the gridExtra package.

grid.arrange(fig1_cond, fig1_site, ncol = 2)

Figure 2

We cleaned up the data first, just like for all the other figures and tables. After working on the code for a while, we realized we would need to reverse-code some variables, namely religiosity and ritual. abs() takes the absolute value of the expression inside, so combined with the subtraction it flips the scale back to front.

library(gmodels)
data3 <- read.csv("Nichols_et_al_data.csv")
#Cleaning up the data for use
dataA <- data3 %>%                                                          
  filter(include == 0) %>% 
  
  rename(cond = con,
         claimpercent = claim,
         claimmoney = moneyclaim,
         CT_practice = completion.time..practice.included.,
         CT_payments = completion.time..payments.only.,
         religiosity = relig,
         religion = Religion) %>% 
  
  mutate(religiosity = abs(religiosity - 5),
         ritual = abs(ritual - 7),              # reverse coding
         claimpercent = claimpercent * 100,     #turning this into a percentage value
         affil = as.factor(affil_cong)) %>%      
         
  
  select(cond:ritual, -claimmoney, -sex, -age, -Religion.Text, -religion, -starts_with("CT")) %>% # selecting only the columns required
  
  as_tibble()

We also reordered and factored the conditions as per the author’s code.

# Re-order conditions to: religious, secular, noise, and control
dataA$cond[dataA$cond==4] <- 0 # Make religious prime the reference category
dataA$cond[dataA$cond==1] <- 4 # This is in a weird order as R reads the code line by line, so if we go from top to bottom, 
dataA$cond[dataA$cond==3] <- 1 # we're changing the number twice which screws up our dataframe
dataA$cond[dataA$cond==4] <- 3
# labelling the levels of the treatment variable
dataA$cond <- factor(dataA$cond, levels= c(0,1,2,3),
                labels = c("Religious", "Secular", "Noise","Control"))

The next piece of code creates a dataset containing the data of all participants in dataA who are in the religious condition. We needed this to plot the SE band in the first two plots, as it appears only for the religious condition.

dataB <- dataA %>% 
  filter(cond == "Religious")

We reused the code from Figure 1 to compute the descriptives and CI limits for all conditions, this time grouping by condition and religious affiliation. Then we relabelled the affiliation levels as Non-affiliated or Affiliated. There was no indication of what each level meant; we had to use count() to get the number of participants at each level and match it against the paper.

#Make a new dataframe that contains CI limits for all conditions based on whether participants are religiously affiliated or not
groupA <- dataA %>% 
  group_by(cond, affil) %>% 
  summarise(mean = mean(claimpercent, na.rm = TRUE),
            sd = sd(claimpercent, na.rm = TRUE),
            n = n()) %>% 
  drop_na() %>% 
  mutate(se = sd / sqrt(n),
         lowerCI = mean - qt(1 - (0.05/2), n - 1) * se,
         upperCI = mean + qt(1 - (0.05/2), n - 1) * se) %>% 
  ungroup()
## `summarise()` has grouped output by 'cond'. You can override using the `.groups` argument.
# labelling the levels of religious affiliation 
groupA$affil <- factor(groupA$affil, levels = c(0, 1), 
                       labels = c("Non-affiliated", "Affiliated"))

Figuring out how to code the plots took a bit of time. The first hurdle was getting four lines to appear, as our early attempts with geom_smooth() drew only a single line. The fix was to turn the cond variable into a factor rather than a numeric variable, which made ggplot fit one line per condition.

figA <- ggplot(dataA, aes(religiosity, claimpercent, color = cond)) +
  geom_smooth(method = "lm")

figA
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).

After that, the next thing to figure out was how to get the y limits to reach 50.

figA <- ggplot(dataA, aes(religiosity, claimpercent, color = cond)) +
  geom_smooth(method = "lm") +
  ylim(c(0, 50))

figA
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 86 rows containing non-finite values (stat_smooth).

When we tried the ylim() function, the lines changed completely. Reviewing the documentation, we learned that ylim() and other scale-limit functions drop any observations outside the limits before the statistics are computed, so geom_smooth() refits on a subset of the data (hence the 86 removed rows above). We found coord_cartesian(), which zooms the view without discarding data points. We then used labs(), theme_light(), theme(), and scale_color_manual() to make the graph look like the paper’s. The last thing to work out was putting the title in a grey box. That box normally appears only in faceted graphs, so we applied facet_grid() to one plot at a time just to get the grey strip.
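
A self-contained comparison of the two approaches, using the built-in mtcars data (our own toy example):

library(ggplot2)
p <- ggplot(mtcars, aes(wt, hp)) + geom_smooth(method = "lm")
p + ylim(0, 200)                       # rows with hp > 200 are dropped, so the line is refit
p + coord_cartesian(ylim = c(0, 200))  # all rows kept; only the viewport changes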

figA <- ggplot(dataA, aes(religiosity, claimpercent, color = cond)) +
  geom_smooth(method = "lm", se = FALSE, size = 1.5) + #method = "lm" creates a straight line of best fit
  geom_smooth(data = dataB, size = 0, se = TRUE, method = "lm", fill = "#fbf1d8") + #this creates the SE for the religious condition
  theme_light() + #Gives white background to plot
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none") + #Removes gridlines from plot, removes legend
  coord_cartesian(ylim = c(0, 50)) + #Sets y limit to 50
  labs(x = "Religiosity", y = "Percentage claimed") + #axis labels 
  facet_grid(. ~ "Condition*Religiosity") + # title, we use facet_grid to put it in a grey box
  scale_color_manual(values = c("#dea520", "#5ab3e4", "#094689", "#a4a4a4")) #change the colours of the lines to fit source

figA
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (stat_smooth).

We used two instances of geom_smooth(): one plots the fitted lines for all conditions, while the other plots the SE band for the religious condition only, using the filtered dataset (dataB) we created earlier.

figB <- ggplot(dataA, aes(ritual, claimpercent, color = cond)) +
  geom_smooth(method = "lm", se = FALSE, size = 1.5) +
  geom_smooth(data = dataB, size = 0, se = TRUE, method = "lm", fill = "#fbf1d8") +
  theme_light() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.position = "none") +
  coord_cartesian(ylim = c(0, 50)) +
  labs(x = "Ritual frequency", y = "Percentage claimed") +
  facet_grid(. ~ "Condition*Ritual frequency") +
  scale_x_continuous(breaks = seq(0, 6, 1)) +  #adjusting the tick marks so 7 marks appear
  scale_color_manual(values = c("#dea520", "#5ab3e4", "#094689", "#a4a4a4"))

figB
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 23 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 7 rows containing non-finite values (stat_smooth).

The third plot took the most time to figure out. This was in part due to needing to plot the CI limits, which was explained above. The next biggest issue was figuring out how to use geom_line(). Our initial attempts to use geom_line() failed to produce any lines. After scouring the geom_line() documentation, we realized that we needed to use the group argument in order to tell geom_line() what it was plotting. This took a while to figure out as the documentation used the argument (group = group) in their example which did not make much sense at first, as they were grouping the variable “group”. The next thing to figure out was how to get the geoms to not overlap over each other. After some research, we found the function position_dodge, which moved each line by a set distance in order to stop them from overlapping.

figC <- ggplot(groupA, aes(affil, mean, color = cond)) +
  geom_line(aes(group = cond), position = position_dodge(width = 0.5), size = 1.5) + #group our lines by condition, we use position dodge so nothing overlaps
  geom_errorbar(data = groupA, aes(ymin = lowerCI, ymax = upperCI), width = 0.2, position = position_dodge(width = 0.5), size = 1.3) +
  theme_light() +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  ylim(0, 50) +
  scale_x_discrete() +
  labs(x = "Religious affiliation", y = "Percentage claimed") +
  facet_grid(. ~ "Condition*Religious affiliation") +
  scale_color_manual(values = c("#dea520", "#5ab3e4", "#094689", "#a4a4a4"))

figC

One part of the figure we knew would be a bit difficult was displaying three graphs at once: faceting would not work for us because each plot used different variables. After some research, we came across the gridExtra package, whose grid.arrange() function let us plot the three graphs side by side.

grid.arrange(figA, figB, figC, ncol = 3)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 23 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 7 rows containing non-finite values (stat_smooth).