1 Summary and Reaction

Koenig-Robert, R., El Omar, H., & Pearson, J. (2022). Unconscious Bias Training Removes Bias Effect from Non-Conscious Stimuli, Restoring Choice Divergence. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4308582

Subliminal stimuli refer to information presented below a person’s awareness. Subliminal information permeates our world, from advertising that shapes consumer choice to social media that propagates unrealistic ideals. The impact of subliminal information on the decisions we make is a growing field in cognitive science. Previous studies have found that when we are presented with non-conscious information, our decisions lean towards those subliminal primes without our knowledge. This occurs across a range of behaviours, including attitudes and biases, how we learn, and even what we see. The notion that subliminal information influences how we act raises the question of whether we can be trained to steer away from biased choices, meaning that decision making could potentially be recovered.

The present study by Koenig-Robert et al. (2022) investigates whether the influence of subliminal information is malleable and whether we can diverge from previous biases. The researchers propose that if behaviour can move away from subliminal biases, this would suggest that the path from subliminal perception to behaviour is subject to influence (such as learning) and not always rigidly biased in the same manner. They hypothesise, firstly, that presenting subliminal stimuli would bias a decision task, and, secondly, that training could cause choices to diverge from the subliminal stimuli.

To test hypothesis 1, 17 participants underwent an imagery-based subliminal prime task. Those in the main experiment were presented with a stationary stimulus (either a red or green grating) in their non-dominant eye and a random, moving pattern in their dominant eye. The point of this was that the grating becomes perceptually suppressed by the moving stimulus, hence becoming subliminal. Participants were then asked to imagine one of the gratings, and then to select the grating they had imagined from one of two options. They were then asked to rate how vivid their imagery was (i.e. vividness) and how much they felt in control of their grating choice (i.e. agency). Those in the no-imagery control condition underwent the same paradigm, except without an imagination period. Results showed that subliminal gratings did influence participant selection, as participants readily chose the grating that was subliminally presented to them.

To test hypothesis 2, 7 participants were selected from the main experiment who displayed above-chance priming (i.e. they were more likely to choose the same grating as the subliminal prime). These participants were trained on the same paradigm weeks after the main experiment, except they were given training blocks with feedback on whether the chosen grating was correct or incorrect, where a selection counted as correct if the chosen grating did not match the subliminal stimulus. Results showed that training away from subliminal stimuli does switch participant choices, with participants readily selecting more incongruent options. In post-experiment interviews, participants reported not knowing how to perform the task; nevertheless, they could still implicitly learn to diverge from the subliminal stimuli. The figure below shows the steps participants undertook for both the subliminal prime and training experiments.

These results demonstrate that choice bias towards subliminal stimuli can be reversed through training. This finding is of note because non-consciously learnt information has been regarded in the literature as unalterable. Being able to learn to make decisions against subliminal information may have practical implications for treating those who are more susceptible to processing negative subliminal information: for example, in the case of addiction, helping to diverge from subliminally processed stimuli related to addictive behaviours, or in the case of eating disorders, diverging from stimuli which may promote a harmful body image.

  1. I was surprised that there was a way to test the influence of non-conscious information. Since subliminal information happens below a person’s awareness, I thought it would be difficult to measure its effectiveness, given we’re not even conscious of its influence on us. I also thought it would be difficult to come up with a subliminal-information experiment that isn’t impacted by individual effects. Seeing this procedure, I understood how there could be an empirical way to test this phenomenon: participants respond to a completely novel stimulus presented to their non-dominant eye, which is considered subliminal because, although awareness of it is suppressed, the brain is still processing the information. This type of visual suppression has been used quite frequently to control the observer’s awareness of critical stimuli. However, I’m still a little sceptical about how well awareness is measured in this paradigm. Although the researchers did include an agency rating (defined by how much in control of their decision participants felt), I would have liked the researchers to have looked into this variable further in their results as a more thorough check of whether the information received was truly processed unconsciously.

  2. I’m not sure I understood why certain orientations and colours were chosen. In the paper itself, there did not appear to be any rationale behind the two choices of colours (red and green) or the orientations (horizontal and vertical) of the gratings. I do know that looking at the colour red for too long and then turning away can cause a green afterimage, since they are complementary colours. However, I don’t know for certain whether alternating between the two colours was done to prevent afterimages, or if there is another reason to do with the priming effectiveness of red and green stimuli. Moreover, I was unsure why specifically horizontal and vertical gratings were used compared to any other type of pattern. My best guess, again, has to do with preventing afterimage effects, but I still wasn’t sure how using different orientations of lines was better than using any other type of shape.

  3. I wonder whether these effects could also be translated into an audio medium. When I read the title and saw “subliminal”, the first thing I thought of was audio subliminal messaging. I’m most familiar with them since they’re big in pop culture for self-help purposes and are claimed to help unconsciously improve certain skills, like language learning. Seeing how decisions can be influenced by imagery-based subliminal information, I think it’d be interesting to investigate audio-based subliminal priming.

(As an aside, there is the RPubs link to this entire report: https://rpubs.com/daphnely03/1069878 in the case that code may be cut off!)

2 Verification

This section marks my very first experience with coding and with attempting to reproduce a study, carried out with the help of my team members.

The goal of this section was to reproduce the demographics and figures within the paper based on the data set provided. The only demographics listed in the paper were for the main experiment, which had 29 participants, 18 of them female, aged between 18 and 39 (pg. 2). However, this data was not given, and, after discussing with the paper’s author how these demographics could be obtained, it turned out there was actually no proper record of gender or age for any participants within the study. Unfortunately, reproducing the demographics of this study was not possible.

As for the figures, there were 4 main figures within the paper, which we attempt to reproduce below.

2.1 Loading Packages and Reading Data

I began by loading the appropriate packages in R so we could wrangle our data and create similar-looking plots. Some packages needed to be installed first using the install.packages() function (see the snippet below); pre-existing packages were loaded with the library() function, with the name of the package inside the brackets.
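For completeness, the one-time installation step (run in the console rather than in the knitted report; exactly which packages needed installing depended on the machine) would have looked something like this:

install.packages(c("tidyverse", "ggnewscale", "ggsignif", "conflicted"))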

library(tidyverse) # For general data wrangling using dplyr and ggplot for figures
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggnewscale) # Create multiple colour scales for different geoms in one graph
## Warning: package 'ggnewscale' was built under R version 4.3.1
library(ggsignif) # For significance lines in figures
## Warning: package 'ggsignif' was built under R version 4.3.1
library(knitr) # To help with report generation
library(tidyr) # For the drop_na function which allows me to drop NA values
library(conflicted) # Resolves conflicts by letting me choose which package I want a function from
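# (Optional) rather than prefixing dplyr::filter() at every call site, recent
# versions of conflicted also offer a session-wide preference -- something like:
# conflicts_prefer(dplyr::filter)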

I then read in the “PearsonLab_Data_Collated.csv” file from the data folder in the working directory. This was the main file used for all the participants included across experiments. I also read in the “included_excluded_data.csv” file, which contained data for the 13 participants excluded from the main experiment (this was actually omitted in the original data file, and we had to request it from our supervisor to reproduce the figures). I used the read_csv() function from the readr package, part of the tidyverse.

data <-read_csv("data/PearsonLab_Data_Collated.csv")
## New names:
## Rows: 11087 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): file, phase dbl (11): ...1, condition, primedGrating, chosenGrating,
## brokenSuppression, ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Printing included participant data
print(data)
## # A tibble: 11,087 × 13
##     ...1 file  condition primedGrating chosenGrating brokenSuppression vividness
##    <dbl> <chr>     <dbl>         <dbl>         <dbl>             <dbl>     <dbl>
##  1     1 01_1…         1             1             1                 0        NA
##  2     2 01_1…         1             2             2                 0        NA
##  3     3 01_1…         3             2            NA                 1        NA
##  4     4 01_1…         1             1             1                 0        NA
##  5     5 01_1…         1             1             1                 0        NA
##  6     6 01_1…         1             1             1                 0        NA
##  7     7 01_1…         3             1            NA                 1        NA
##  8     8 01_1…         1             2             1                 0        NA
##  9     9 01_1…         1             2             1                 0        NA
## 10    10 01_1…         1             2             2                 0        NA
## # ℹ 11,077 more rows
## # ℹ 6 more variables: preImageryTime <dbl>, agency <dbl>, right_wrong <dbl>,
## #   phase <chr>, PID <dbl>, block <dbl>
excluded_data <- read_csv("data/included_excluded_data.csv")
## New names:
## Rows: 33 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): included, ...2, ...3, ...4
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
# Printing excluded participant data
print(excluded_data)
## # A tibble: 33 × 4
##    included    ...2            ...3              ...4                 
##    <chr>       <chr>           <chr>             <chr>                
##  1 priming     catch detection suppression break difference  red/green
##  2 0.660714286 0.9             0.066666667       0.071428571          
##  3 0.867924528 1               0.116666667       0.094339623          
##  4 0.583333333 0.8             0.016393443       0.033333333          
##  5 0.559322034 1               0.016666667       0.016949153          
##  6 0.661016949 1               0.078125          0.084745763          
##  7 0.491525424 0.933333333     0.016666667       0.016949153          
##  8 0.568965517 1               0.033333333       0.034482759          
##  9 0.672413793 1               0.033333333       0                    
## 10 0.593220339 1               0.016666667       0.016949153          
## # ℹ 23 more rows

2.2 Plot 1: Decision Priming and Vividness Across Congruence

Figure 1 included the effects of subliminal gratings on subliminal priming (1a) and vividness (1b). This figure had two components:

  • Plotting individual participant results.
  • Plotting total averages.

Beginning with Figure 1A, we first had to create a new data frame that included the necessary conditions before we could calculate the average priming. The steps are as follows:

  • Began by creating a data frame called “plot1a_data” from the included participant data.
  • Piped this into a filter() function (specifying the function from the dplyr package) to keep only the control and main experimental phases, normal priming trials, and trials where suppression was not broken.
  • We came across a discrepancy in the raw data set where some blocks were listed as “0”, so we made sure to also filter() so that block did not equal 0 (as an aside, our supervisor mentioned it could have referred to a trial where some kind of bug occurred, but it was still included in the data set anyway).
  • Piped this filtered data into a mutate() function to create a new column determining whether trials had successful priming.
    • Called this new column “primed”, and used the function case_when() to return TRUE when the primed and chosen grating have the same value (where 1 means green, and 2 means red). So, if a participant was primed with a subliminal green grating and chose a green grating later on, hooray, successful priming!
  • Piped this into a group_by() to group the data by phase and participant ID, to retain those columns and aggregate the above values over blocks.
  • Then, with the summarise() function, we counted the total number of trials completed by each participant using n(), and summed the number of successfully primed trials, removing NA values so only TRUE values are counted. We also calculated the percentage of primed trials as 100 multiplied by the number primed over the total number of trials.
  • We then ungroup() the data.
plot1a_data <- data %>% 
  dplyr::filter(phase %in% c("ctrl", "main"), condition == 1, brokenSuppression == 0, block != 0) %>% 
  mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                            primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>% 
  group_by(phase, PID) %>% 
  summarise(trials = n(), 
            number_primed = sum(primed, na.rm = TRUE), 
            percentage = 100*number_primed/trials) %>% 
  ungroup()
## `summarise()` has grouped output by 'phase'. You can override using the
## `.groups` argument.
glimpse(plot1a_data)
## Rows: 39
## Columns: 5
## $ phase         <chr> "ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "ctrl", …
## $ PID           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ trials        <int> 117, 119, 116, 117, 119, 106, 119, 118, 81, 95, 113, 74,…
## $ number_primed <int> 81, 61, 62, 63, 63, 54, 61, 94, 38, 51, 55, 40, 71, 52, …
## $ percentage    <dbl> 69.23077, 51.26050, 53.44828, 53.84615, 52.94118, 50.943…

And we now have each individual primed percentage for those included in the main experiment! However, we realised that the figure also includes differently coloured points for the participants who were later included in the subliminal training, which meant we had to distinguish these individuals from the rest. Using plot1a_data, we piped this into another mutate() function and created a new column called “train” to flag the participants from this main experiment who were included in the training phase. Participant IDs for those included in and excluded from training were given to us by our supervisor. We used the case_when() function to flag the participant IDs included in training, which were 1, 2, 4, 7, 8, 16, 17; these return TRUE and anything else FALSE. We also manually reordered the factor levels in the phase column using the fct_relevel() function, so that the conditions are ordered as in the figure.

plot1a_data <- plot1a_data %>% 
  mutate(train = case_when(phase == "main" & PID %in% c(1, 2, 4, 7, 8, 16, 17) ~ TRUE,
                           TRUE ~ FALSE))

plot1a_data$phase <- fct_relevel(plot1a_data$phase, c("ctrl", "main"))

glimpse(plot1a_data)
## Rows: 39
## Columns: 6
## $ phase         <fct> ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ct…
## $ PID           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ trials        <int> 117, 119, 116, 117, 119, 106, 119, 118, 81, 95, 113, 74,…
## $ number_primed <int> 81, 61, 62, 63, 63, 54, 61, 94, 38, 51, 55, 40, 71, 52, …
## $ percentage    <dbl> 69.23077, 51.26050, 53.44828, 53.84615, 52.94118, 50.943…
## $ train         <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …

The last piece for Figure 1A was including the priming averages. The means documented in the paper were 62.8% with a SEM of ±3.04 for the main experiment condition, and 57.65% with a SEM of ±2.07 for the control condition.

plot1a_mean <- plot1a_data %>% # Starting from the plot1a_data data frame
  group_by(phase) %>%          # Grouping by the main/control phase
  summarise(percent = mean(percentage), # Calculating the average priming using mean() function
            SEM = sd(percentage)/sqrt(n())) # SEM = SD divided by the square root of the number of participants

glimpse(plot1a_mean)
## Rows: 2
## Columns: 3
## $ phase   <fct> ctrl, main
## $ percent <dbl> 57.27215, 62.82850
## $ SEM     <dbl> 2.088622, 3.033010
plot1a_mean$phase <- fct_relevel(plot1a_mean$phase, c("main", "ctrl"))

write_csv(plot1a_mean, file = "plot1adescriptives.csv")

And we managed to get it almost exactly the same! We obtained a mean of 62.8% and SEM of ±3.03 for the main experiment, and a mean of 57.2% and SEM of ±2.08 for the control experiment. There were slight discrepancies, but we were told it was most likely because the paper used a different programming tool (i.e. MATLAB).
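As a quick sanity check (our own addition, using the values reported in the paper), the two sets of numbers can be lined up side by side:

# Paper-reported values (main: 62.8 ± 3.04; ctrl: 57.65 ± 2.07)
paper_1a <- tibble(phase = c("main", "ctrl"),
                   percent_paper = c(62.8, 57.65),
                   SEM_paper = c(3.04, 2.07))

plot1a_mean %>% 
  mutate(phase = as.character(phase)) %>% # factor -> character for the join
  left_join(paper_1a, by = "phase")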

We could then move on to putting it all together. This was a tedious process since there were a few different coloured components alongside the null significance bar. The main steps to reproducing the figure are as follows:

  • Creating a new object called “plot1a” to store the figure in.
  • Using the ggplot2 package, I used the ggplot() function to create a column graph with geom_col(), passing a local data argument of plot1a_mean and an aesthetic mapping where x is phase and y is the mean percentage priming.
  • I learned that the order of geoms is important! So having geom_hline() first, then geom_col(), then geom_errorbar() allows the geoms to be layered on top of each other in the correct order. Otherwise, there was a line at 50% cutting on top of the columns or the error bars would be hidden beneath the columns.
  • Then used geom_jitter() with local plot1a_data to display the individual participant priming points.
  • Using scale_y_continuous I created breaks in the axis and added labels for 50, 70 and 90 whilst replacing every second tick in between with a blank using the rep() function.
  • Labelled the variables and axes appropriately using the labels argument of scale_x_discrete() and the labs() function.
  • Using the theme() function to alter specific elements of the graph, such as ticks and panel background.
plot1a <- ggplot() +
  geom_hline(yintercept = 50, linewidth = 0.3) +
  geom_col(plot1a_mean, 
           mapping = aes(x = phase, y = percent, fill = phase), 
           width = 0.7, colour = "black") +
  geom_errorbar(plot1a_mean, 
                mapping = aes(x = phase, ymin = percent - SEM, ymax = percent + SEM), 
                width = 0, 
                linewidth = 1) +
  geom_jitter(aes(x = phase, y = percentage, fill = train), 
              plot1a_data, 
              width = 0.1, 
              pch = 21) +
  coord_cartesian(ylim = c(40, 100)) + # Restricting the limits of the y-axis from 40 to 100
  scale_y_continuous(breaks = c(50, 60, 70, 80, 90, 100),
                     labels = c(50, rep("", 1), 
                                70, rep("", 1), 
                                90, rep("", 1))) +
  scale_x_discrete(labels = c("Main experiment", "No imagery ctrl")) +
  theme_minimal() +
  labs(
    title = "Decision Priming",
    subtitle = "Across Experiment",
    x = "Experiment",
    y = "Decision Priming (%)") +
  theme(axis.line.y = element_line(colour = "black", linewidth = 0.5), 
        axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
        axis.ticks.length = unit(-0.1, "cm"),
        axis.line.x = element_line(colour = "black", linewidth = 0.5),
        panel.grid = element_blank(), 
        legend.position = "none")  

#plot plot1a

plot(plot1a)

Now that the basic graph looked correct, there were still a few elements that weren’t the same. It was looking a bit too colourful, so I needed to change it to black and white. Also, I still didn’t have the significance bar between the two conditions. To add these details, I did the following:

  • Used scale_fill_manual to specify the colours for average priming and individual participants.
  • Used geom_signif() as part of the ggsignif package to create the null significance bar across the two conditions.
  • Used geom_text() to add the asterisks above the columns indicating the level of significance.
plot1a <- plot1a +
  scale_fill_manual(values = c("FALSE" = "#FFFFFF", 
                               "TRUE" = "#000000", 
                               "main" = "#808080", 
                               "ctrl" = "#FFFFFF")) +
  geom_signif(data = plot1a_mean, aes(factor(phase), percent),
              comparisons = list(c("main", "ctrl")),
              y_position = 98,
              map_signif_level =  TRUE,
              annotations = "n.s.") +
  geom_text(data = data.frame(phase = "main", y = 92.5),
    aes(phase, y = y, size = 2), label = "***", colour = "#808080") +
  geom_text(data = data.frame(phase = "ctrl", y = 92.5),
    aes(phase, y = y, size = 2), label = "**", colour = "#808080") 
## Warning in geom_signif(data = plot1a_mean, aes(factor(phase), percent), : You
## have set data and mapping, are you sure that manual = FALSE is correct?
plot(plot1a)

Figure 1a complete!

Now, onto Figure 1b. Similarly, we had to plot individual participant results and the total averages. This was more tedious since there were a greater number of conditions to adhere to. Since we now knew we had to distinguish trained participants from those who weren’t, we incorporated that into our code. The steps are as follows:

  • Start with the included participant data and call the new data frame “plot1b_data”.
  • filter() for the main experiment phase only, for the priming and empty conditions (1 and 2), and for when suppression isn’t broken (0). Also filter out blocks equal to 0.
  • mutate() to create a new column that tells us whether a trial had congruence between the primed and chosen grating. Call this “congruence” and let it return “congruent” when the primed and chosen grating are the same, “incongruent” when they differ, and “empty” when the condition is 2 for empty trials.
  • group_by() congruence and participant ID to retain those columns and aggregate values over blocks.
  • summarise() the mean vividness.
  • mutate() to create a new column called “train” flagging those included in the training experiment across the incongruent, empty and congruent conditions, and specifying the respective colour for these participants that will appear later in the plot. We had to ensure we covered each participant in each congruence condition.
  • Manually reorder the factor levels in the congruence column using the fct_relevel() function.
plot1b_data <- data %>% 
  dplyr::filter(phase == "main", condition != 3, brokenSuppression == 0, block != 0) %>% 
  mutate(congruence = case_when(condition == 1 & primedGrating == 1 & chosenGrating == 1 ~ "congruent",
                                condition == 1 & primedGrating == 2 & chosenGrating == 2 ~ "congruent",
                                condition == 1 & primedGrating == 1 & chosenGrating == 2 ~ "incongruent",
                                condition == 1 & primedGrating == 2 & chosenGrating == 1 ~ "incongruent",
                                condition == 2 ~ "empty")) %>% 
  group_by(congruence, PID) %>% 
  summarise(mean = mean(vividness)) %>% 
  ungroup() %>% 
  mutate(train = case_when(congruence %in% c("incongruent", "empty") & 
                           PID %in% c(1, 2, 4, 7, 8, 16, 17) ~ "BLACK",
                           congruence == "congruent" & 
                           PID %in% c(1, 2, 4, 7, 8, 16, 17) ~ "GREY", 
                           TRUE ~ "WHITE"))
## `summarise()` has grouped output by 'congruence'. You can override using the
## `.groups` argument.
plot1b_data$congruence <- fct_relevel(plot1b_data$congruence, c("empty", "incongruent", "congruent"))

#print plot1b_data

glimpse(plot1b_data)
## Rows: 51
## Columns: 4
## $ congruence <fct> congruent, congruent, congruent, congruent, congruent, cong…
## $ PID        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ mean       <dbl> 2.540541, 3.478261, 3.342857, 2.090909, 2.972973, 2.310345,…
## $ train      <chr> "GREY", "GREY", "WHITE", "GREY", "WHITE", "WHITE", "GREY", …

For Figure 1b, there did not appear to be any means documented in the paper for the vividness ratings between primed and empty trials, so we had to rely on comparing our averages to the plot in the paper by eye.

plot1b_mean <- plot1b_data %>% # New data frame
  group_by(congruence) %>% # Group by congruence to retain column
  summarise(average = mean(mean), # Summarise means and calculate SEM
            SEM = sd(mean)/sqrt(n())) 

plot1b_mean$congruence <- fct_relevel(plot1b_mean$congruence, # Manually reorder factor levels
                                      c("empty", "incongruent", "congruent")) 
# Print plot1b_mean
glimpse(plot1b_mean)
## Rows: 3
## Columns: 3
## $ congruence <fct> empty, incongruent, congruent
## $ average    <dbl> 2.241783, 2.416609, 2.510490
## $ SEM        <dbl> 0.1637733, 0.1691102, 0.1574406

The process of creating the plot for 1b was similar to 1a. The main steps include:

  • Created an object called “plot1b” and used ggplot() and geom_col() for a column graph using the local plot1b_mean data. The aesthetic mapping is where x is congruence and y is the mean vividness.
  • Used geom_errorbar using local plot1b_mean data to add the SEM for each congruence condition.
  • Used geom_jitter() for the individual points of participant mean vividness using local plot1b_data, and aesthetic mapping where x is congruence and y is the mean vividness.
  • Used geom_signif as part of the ggsignif package to create the significance bar across the empty and incongruent condition, and empty and congruent condition.
  • Used scale_fill_manual to specify the colours for average priming and individual participants (remembering that we coded BLACK, WHITE, and GREY for each participant which made our lives much easier to colour the points).
  • Used scale_y_continuous to create appropriate axis breaks and replacing every second tick using the rep() function.
  • Finally, labeling the axes and congruence conditions using scale_x_discrete and labs(), as well as theme() to modify specific elements of the plot.
plot1b <- ggplot() +
  geom_col(plot1b_mean, 
           mapping = aes(x = congruence, y = average, fill = congruence), 
           colour = "black", 
           width = 0.7) +
  geom_errorbar(plot1b_mean, 
                mapping = aes(x = congruence, ymin = average - SEM, ymax = average + SEM), 
                width = 0, 
                linewidth = 1) +
  geom_jitter(aes(x = congruence, y = mean, fill = train), 
              plot1b_data, 
              pch = 21, 
              width = 0.1) +
  geom_signif(plot1b_mean, 
              mapping = aes(factor(congruence), average),
              comparisons = list(c("empty", "incongruent")),
              y_position = 3.7,
              map_signif_level =  TRUE,
              annotations = "*") +
  geom_signif(plot1b_mean, 
              mapping = aes(factor(congruence), average),
              comparisons = list(c("empty", "congruent")),
              y_position = 3.85,
              map_signif_level =  TRUE,
              annotations = "*") +
  scale_fill_manual(values = c("WHITE" = "#FFFFFF", 
                               "BLACK" = "#000000", 
                               "GREY" = "#808080", 
                               "incongruent" = "#808080", 
                               "empty" = "#FFFFFF", 
                               "congruent" = "#000000")) +
  coord_cartesian(ylim = c(1, 4)) +
  scale_y_continuous(breaks = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5),
                     labels = c(1, rep("", 1), 
                                2, rep("", 1), 
                                3, rep("", 1),
                                4, rep("", 1))) +
  scale_x_discrete(labels = c("Empty", "Incongruent", "Congruent")) +
  theme_minimal() +
  labs(
    title = "Vividness Across Congruence",
    subtitle = "Within Main Experiment",
    x = "Trial",
    y = "Vividness") +
  theme(axis.line.y = element_line(colour = "black", linewidth = 0.5),
        axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
        axis.ticks.length = unit(-0.1, "cm"),
        axis.line.x = element_line(colour = "black", linewidth = 0.5),
        panel.grid = element_blank(),
        legend.position = "none")
## Warning in geom_signif(plot1b_mean, mapping = aes(factor(congruence), average),
## : You have set data and mapping, are you sure that manual = FALSE is correct?
## Warning in geom_signif(plot1b_mean, mapping = aes(factor(congruence), average),
## : You have set data and mapping, are you sure that manual = FALSE is correct?
#plot plot1b
plot(plot1b)

And that’s all for Figure 1b!

2.3 Plot 2: Included and Excluded Participants

This was a tough plot to reproduce because we needed to compute 4 different exclusion-criteria measures and differentiate the excluded and included participants. Although we were given the data for the included participants for Figure 2, we wanted to see if we could code it ourselves. We approached this carefully, making sure we were selecting the right number identifiers for each variable.

To begin, we started by coding the data for the included participants. The steps we took were as follows:

  • Created a data frame called “plot2_included” from the included participants data set.
  • filter() for the main experiment (phase) only.
  • mutate() several new columns based on different criteria:
    • Firstly, to determine whether trials had successful priming:
      • Called a value “primed” and let it return a primed value when the primed and chosen grating are the same and suppression isn’t broken (i.e. brokenSuppression == 0).
      • Similarly, called a value “unprimed” and let it return an unprimed value when primed and chosen grating differ and suppression isn’t broken.
      • Called a value “broken” when suppression was broken.
    • Also, called values for “green_priming” and “red_priming”, for trials where the primed grating is green (respectively red) and suppression isn’t broken.
    • Also, called a value for “catch_detection” which looks only at catch trials (i.e. condition == 3) and suppression is broken (i.e. brokenSuppression == 1).
  • Used the group_by() function to group by participant ID, to retain that column and aggregate the above values over blocks.
  • Used the summarise() function to count, using sum(), the number of trials for which each criterion was met:
    • Firstly, the total number of primed trials, where the “priming” column equals “primed” (removing NA values), over the total number of normal trials without broken suppression.
    • Secondly, the total number of caught trials, where “catch_detection” equals “caught” (removing NA values), over the total number of catch trials.
    • Thirdly, the total number of non-catch trials where suppression was broken, over the total number of normal trials.
    • Finally, the absolute value (using the abs() function) of green primed trials minus red primed trials, over the total number of normal trials without broken suppression.
plot2_included <- data %>% 
  dplyr::filter(phase == "main", block != 0) %>% 
  mutate(priming = case_when(condition == 1 & primedGrating == 1 
                             & chosenGrating == 1 & brokenSuppression == 0 ~ "primed",
                             condition == 1 & primedGrating == 2 
                             & chosenGrating == 2 & brokenSuppression == 0 ~ "primed",
                             condition == 1 & primedGrating == 1 
                             & chosenGrating == 2 & brokenSuppression == 0 ~ "unprimed",
                             condition == 1 & primedGrating == 2 
                             & chosenGrating == 1 & brokenSuppression == 0 ~ "unprimed",
                             condition == 1 & brokenSuppression == 1 ~ "broken"),
         green_priming = case_when(condition == 1 & primedGrating == 1 
                                   & brokenSuppression == 0 ~ "primed_green"),
         red_priming = case_when(condition == 1 & primedGrating == 2 
                                 & brokenSuppression == 0 ~ "primed_red"),
         catch_detection = case_when(condition == 3 & brokenSuppression == 1 ~ "caught")) %>% 
  group_by(PID) %>% 
  summarise(priming = sum(priming == "primed", 
                          na.rm = TRUE)/sum(condition == 1 & 
                          brokenSuppression == 0),
            catch_detection = sum(catch_detection == "caught", na.rm = TRUE)/sum(condition == 3),
            suppression_break = sum(condition != 3 & brokenSuppression == 1)/sum(condition == 1),
            gr_difference = abs(sum(green_priming == "primed_green", na.rm = TRUE) - 
                                  sum(red_priming == "primed_red", 
                                      na.rm = TRUE))/sum(condition == 1 & brokenSuppression == 0)) 

print(plot2_included)
## # A tibble: 17 × 5
##      PID priming catch_detection suppression_break gr_difference
##    <dbl>   <dbl>           <dbl>             <dbl>         <dbl>
##  1     1   0.661           0.9              0.0667        0.0714
##  2     2   0.868           1                0.117         0.0943
##  3     3   0.593           0.8              0.0167        0.0169
##  4     4   0.559           1                0.0333        0.0169
##  5     5   0.661           1                0.0667        0.0714
##  6     6   0.492           0.933            0.0333        0.0169
##  7     7   0.569           1                0.0333        0.0345
##  8     8   0.672           1                0.0833        0     
##  9     9   0.593           1                0.0167        0.0169
## 10    10   0.537           0.967            0.133         0.0741
## 11    11   0.542           1                0.05          0.0169
## 12    12   0.627           1                0.0333        0.0169
## 13    13   0.554           1                0.183         0.0357
## 14    14   0.508           0.967            0.0167        0.0169
## 15    15   0.525           1                0.0167        0.0169
## 16    16   0.915           0.9              0.0167        0.0169
## 17    17   0.804           0.833            0.1           0.0357

Wow, that was a lot! We did manage to get values for each of the 4 criteria for the 17 participants included in the main experiment, and they match up with the included participant data we were given in the “excluded_data” file. However, the format the data frame was in wasn’t appropriate for the plot we had to replicate. We had to find a way to reshape this data frame so that the 4 criteria appeared as levels of a single column, with their respective values alongside, so we could plot each of the 4 criteria on the x-axis. Luckily the pivot_longer() function does just that. We also needed to change a variable into a factor vector using the as.factor() function so that ggplot() can recognise and plot it later on.

plot2_included <- plot2_included %>% 
  pivot_longer(2:5, # Specifying the column numbers to manipulate
               names_to = "measure", # Moving names to a column named measure
               values_to = "portion") %>% # Putting values into a column named portion
  mutate(id = "Included") # Creating a column called id to list included

# Changing the measure variable into a factor vector 
plot2_included$measure <- as.factor(plot2_included$measure) 

# Manually reordering factor levels to the appropriate order on x-axis
plot2_included$measure <- fct_relevel(plot2_included$measure, 
                                      c("priming", "catch_detection", 
                                        "suppression_break", "gr_difference"))

print(plot2_included)
## # A tibble: 68 × 4
##      PID measure           portion id      
##    <dbl> <fct>               <dbl> <chr>   
##  1     1 priming            0.661  Included
##  2     1 catch_detection    0.9    Included
##  3     1 suppression_break  0.0667 Included
##  4     1 gr_difference      0.0714 Included
##  5     2 priming            0.868  Included
##  6     2 catch_detection    1      Included
##  7     2 suppression_break  0.117  Included
##  8     2 gr_difference      0.0943 Included
##  9     3 priming            0.593  Included
## 10     3 catch_detection    0.8    Included
## # ℹ 58 more rows

Hooray! Now that the data frame is in the shape we want, we can code the data for the excluded participants. This was a much easier process as we were given the data for the excluded participants (which was not in the original data file, so we couldn’t give it a go ourselves like we did for the included participants). However, we did have to rename the variables to make them easier to interpret, since the columns were labelled by ellipses and a number.

plot2_excluded <- excluded_data %>% 
  slice(21:33) %>% # Selecting the rows that contain the excluded participant data 
  rename(priming = included, # Renaming columns in an interpretable way
         catch_detection = ...2, 
         suppression_break = ...3, 
         gr_difference = ...4) %>% 
  mutate(PID = 1:n()) %>%  # Creating a new column to store participant number
  pivot_longer(1:4, # Shrinking the data frame so we can plot it 
               names_to = "measure", 
               values_to = "portion") %>% 
  mutate(id = "Excluded") # Creating an id column to list excluded

# Changing portion into a numeric vector 
plot2_excluded$portion <- as.numeric(plot2_excluded$portion)

# Changing PID into a numeric vector
plot2_excluded$PID <- as.numeric(plot2_excluded$PID)

# Changing measure into a factor vector
plot2_excluded$measure <- as.factor(plot2_excluded$measure)

# Manually reordering factor levels 
plot2_excluded$measure <- fct_relevel(plot2_excluded$measure, c("priming", "catch_detection", 
                                                                "suppression_break", "gr_difference"))

print(plot2_excluded)
## # A tibble: 52 × 4
##      PID measure           portion id      
##    <dbl> <fct>               <dbl> <chr>   
##  1     1 priming            0.575  Excluded
##  2     1 catch_detection    0.1    Excluded
##  3     1 suppression_break  0.333  Excluded
##  4     1 gr_difference      0.5    Excluded
##  5     2 priming            0.559  Excluded
##  6     2 catch_detection    0.767  Excluded
##  7     2 suppression_break  0.0167 Excluded
##  8     2 gr_difference      0.0169 Excluded
##  9     3 priming            0.464  Excluded
## 10     3 catch_detection    0.6    Excluded
## # ℹ 42 more rows

Now that we have our two data frames that match up, we can combine them to form one cohesive data frame that we can plot easily. To achieve this, we used the rbind() function for both plot2_included and plot2_excluded.

plot2_data <- rbind(plot2_included, plot2_excluded)

# Once again, changing id to a factor vector
plot2_data$id <- as.factor(plot2_data$id)

Having the full data for included and excluded participants, we could now reproduce the means and SEMs reported in the paper for excluded and included participants, respectively: catch detection was 0.673 (±0.079 SEM) and 0.959 (±0.016 SEM), suppression breaks were 0.074 (±0.034 SEM) and 0.042 (±0.008 SEM), the red/green priming difference was 0.047 (±0.038 SEM) and 0.035 (±0.007 SEM), and finally the priming ratios were 0.591 (±0.035 SEM) and 0.628 (±0.023 SEM).

Using plot2_data, we used the pivot_wider() function to return column labels for each of the four criteria for easier viewing. We then group_by() the included/excluded id and summarise() the means and SEMs for each criterion.

plot2_mean <- plot2_data %>% 
  pivot_wider(id_cols = c(PID, id),
              names_from = measure, 
              values_from = portion) %>% 
  group_by(id) %>% 
  summarise(mean_priming = mean(priming), SEM_priming = sd(priming)/sqrt(n()),
            mean_catch = mean(catch_detection), SEM_catch = sd(catch_detection)/sqrt(n()),
            mean_suppression = mean(suppression_break), SEM_suppression = sd(suppression_break)/sqrt(n()),
            mean_red_green_diff = mean(gr_difference), SEM_diff = sd(gr_difference)/sqrt(n()))
print(plot2_mean)
## # A tibble: 2 × 9
##   id       mean_priming SEM_priming mean_catch SEM_catch mean_suppression
##   <fct>           <dbl>       <dbl>      <dbl>     <dbl>            <dbl>
## 1 Excluded        0.591      0.0346      0.673    0.0797           0.0739
## 2 Included        0.628      0.0303      0.959    0.0155           0.0598
## # ℹ 3 more variables: SEM_suppression <dbl>, mean_red_green_diff <dbl>,
## #   SEM_diff <dbl>

Fortunately, our means seem to line up!
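As an aside, since the sd(x)/sqrt(n()) pattern keeps recurring, a small helper function would have tidied our code. This is a sketch of an alternative, not what we actually ran:

sem <- function(x) sd(x) / sqrt(length(x)) # standard error of the mean

plot2_data %>% 
  group_by(id, measure) %>% 
  summarise(mean = mean(portion), SEM = sem(portion), .groups = "drop")

This returns the same means and SEMs as plot2_mean, just in long rather than wide format.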

Now that all the data wrangling was out of the way, we could finally move on to actually plotting the data. For our first attempt we used the geom_point() function for the individual points, and the geom_path() function, which is intended to connect observations in the order they appear in the data. However, we found that the lines between points were not connecting properly.

ggplot(plot2_data, aes(measure, portion, colour = id)) +
  geom_point(position = position_jitterdodge(jitter.width = 0.2, seed = 1)) +
  geom_path(position = position_jitterdodge(jitter.width = 0.2, seed = 1), 
            aes(group = interaction(PID, id), alpha = 0.5)) +
  scale_colour_manual(values = c("#808080", "#FF0000")) +
  scale_x_discrete(labels = c("Priming", 
                              "Catch Detection", 
                              "Suppression Break", 
                              "Red/Green difference")) +
  theme_minimal() +
  labs(
    title = "Decision Priming",
    subtitle = "Across Training Period",
    y = "Proportion of trials") +
  theme(axis.line.y = element_line(color = "black", linewidth = 0.5),
        axis.ticks.y = element_line(color = "black", linewidth = 0.5),
        axis.ticks.length = unit(-0.1, "cm"),
        axis.line.x = element_line(color = "black", linewidth = 0.5),
        panel.grid = element_blank(),
        legend.position = "none")

Well, at least the points from geom_point() appeared to be correct… but the lines from geom_path() were not connecting as they should.
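Before finding the fix, a minimal toy example of our own (not part of the original analysis) helped clarify how the two geoms differ:

# geom_path() joins observations in the order the rows appear in the data;
# geom_line() sorts them by x first. Toy data with deliberately shuffled rows:
toy <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20))
ggplot(toy, aes(x, y)) + geom_path() # zig-zags: 3 -> 1 -> 2
ggplot(toy, aes(x, y)) + geom_line() # left to right: 1 -> 2 -> 3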

We were a bit stuck, but after some guidance and Googling we confirmed that we needed to use geom_line(). We had to manually adjust where the individual points lay on the graph and have the geom_point() and geom_line() x and y values match up so they could be connected. This required a few steps. Firstly, we took plot2_data and created a new column called “PID2” using mutate(), multiplying the participant ID by 100 only if the id was “Included”. This gave each participant a unique group identifier, so that the lines connecting each point could run across all 4 criteria for each participant without joining included and excluded participants who share a PID.

plot2_data <- plot2_data %>% 
  mutate(PID2 = ifelse(id == "Included", PID*100, PID))

We then had to create a new data frame to modify plot2_data. We called this “plot2_datan” and used the transform() function to create a new column called “dmeasure”. Within this column, the “measure” variable was converted to a numeric vector using as.numeric(), with .25 added for included participants and subtracted for excluded ones. This places the points for the included participants on the right side and the points for the excluded participants on the left side of each of the four criteria. Another column was created called “jportion”, which spreads out the individual points using the jitter() function, with the amount of dispersion set at 0.1.

plot2_datan = transform(plot2_data, dmeasure = ifelse(id == "Included", 
                              as.numeric(measure) + .25,
                              as.numeric(measure) - .25))
  
plot2_datan = transform(plot2_datan, dmeasure = ifelse(id == "Included", 
                        jitter(as.numeric(measure) + .25, .1),
                        jitter(as.numeric(measure) - .25, .1)),
                        jportion = jitter(portion, amount = 0.1))

# Converting jportion into a numeric vector
plot2_datan$jportion <- as.numeric(plot2_datan$jportion)

Now that the points had been manually manipulated, hopefully the plot would look more like the paper! In order to achieve the different colours for the points and lines, we had to download a new package called ggnewscale, which allowed us to use separate colour scales for different geoms. The main steps for this plot are as follows:

  • Using geom_line() and specifying the aesthetics with group = PID2 so the lines joining each point can be distinguished, along with the x and y variables and the id as either excluded or included.
  • Using scale_color_manual() to adjust the colour for excluded and included, then adding new_scale_colour() to reset the colours so that any further geoms do not adopt the same colour scales.
  • Using geom_point() and specifying the x and y variables and the id as either excluded or included, also adjusting the point size and the colours appropriately.
  • Using scale_x_continuous() to create four breaks, one for each criterion, labelling them appropriately.
  • Using scale_y_continuous to create breaks in the axis and replacing every second tick in between with a blank using the rep() function.
  • Labelling the axes and modifying graph elements to match the paper.
ggplot(plot2_datan) +
  geom_line(aes(group = PID2, x = dmeasure, y = jportion, colour = id)) +
  scale_color_manual(values = c("Excluded" = "grey", "Included" = "#FF9481")) +
  new_scale_colour() +
  geom_point(aes(x = dmeasure, y = jportion, colour = id),
             size = 2.5) +
  scale_colour_manual(values = c("Excluded" = "grey30", "Included" = "#EB4D34")) +
  scale_x_continuous(breaks = c(1, 2, 3, 4), 
                     labels = c("Priming", "Catch Detection", 
                                "Suppression Break", "Red/green difference")) +
  scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1, 1.25),
                     labels = c(0, rep("", 1),
                                0.5, rep("", 1),
                                1, rep("", 1))) +
  theme_minimal() + 
  labs(y = "Proportion of trials",
       title = "Decision Priming",
       subtitle = "Across Training Period") +
  theme(axis.line.y = element_line(colour = "grey50", linewidth = 0.5),
        axis.ticks.y = element_line(colour = "grey50", size = 0.5),
        axis.text.y = element_text(colour = "black"),
        axis.line.x = element_line(colour = "grey50", linewidth = 0.5),
        axis.title.x = element_blank(),
        axis.ticks.x = element_line(colour = "grey50", size = 0.5),
        axis.text.x = element_text(colour = "black"),
        axis.ticks.length=unit(-0.10, "cm"),
        panel.grid = element_blank(),
        legend.position = c(0.75, 0.8),
        legend.title = element_blank())
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Success! Now it’s looking very similar to the paper.

2.4 Plot 3: Priming Across Training

Figure 3 was a little more straightforward: it was just a matter of averaging the mean percentage of priming for each training phase and plotting it. However, things that appear simple can sometimes be deceiving, and that was the case here. Before I get into the challenges we had, here is how we coded the data for plot 3. The steps were as follows:

  • Created a new data frame called “plot3_data” from the included data set and used the dplyr filter() function to include only the phases for training sessions 1, 2 and 3.
  • Then piped this into another filter() when the condition is a normal trial and there is no broken suppression.
  • Used the mutate() function to create a new column called “primed” for when the primed grating and chosen grating are the same and return these values as TRUE.
  • group_by() participant ID, phase and block so we can then summarise() the number of trials and calculate the percentage of trials that were primed.
  • ungroup() this so we can group_by() block and phase to calculate the average percentage primed and the SEM for each individual block.
plot3_data <- data %>% 
  dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3")) %>% 
  dplyr::filter(condition == 1, brokenSuppression == 0) %>% 
  mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                            primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>% 
  group_by(PID, phase, block) %>% 
  summarise(trials = n(), 
            number_primed = sum(primed, na.rm = TRUE), 
            percentage = 100*number_primed/trials) %>% 
  ungroup() %>% 
  group_by(phase, block) %>% 
  summarise(mean_percentage = mean(percentage),
            SEM = sd(percentage)/sqrt(n())) 
## `summarise()` has grouped output by 'PID', 'phase'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'phase'. You can override using the
## `.groups` argument.
# Converting block to a factor vector
plot3_data$block <- as.factor(plot3_data$block)

# Printing the data to check everything went smoothly
print(plot3_data)
## # A tibble: 15 × 4
## # Groups:   phase [3]
##    phase          block mean_percentage   SEM
##    <chr>          <fct>           <dbl> <dbl>
##  1 results_train1 1                43.1  6.02
##  2 results_train1 2                48.1  8.87
##  3 results_train1 3                52.1  7.18
##  4 results_train1 4                44.6  4.79
##  5 results_train1 5                36.0  4.99
##  6 results_train2 1                41.9  5.58
##  7 results_train2 2                36.9  7.35
##  8 results_train2 3                33.9  4.31
##  9 results_train2 4                39.3  5.13
## 10 results_train2 5                34.5  4.14
## 11 results_train3 1                39.0  8.61
## 12 results_train3 2                38.1  6.77
## 13 results_train3 3                31.8  5.72
## 14 results_train3 4                40.5  6.54
## 15 results_train3 5                42.3  5.78

However, upon checking our mean_percentage against the individual points of Figure 3, something was off: our numbers were either too low or too high. For example, for training session 1 in block 3, our coded output was 52.07, whereas the original plot had that value below 50. We thought we had done something wrong with our code and meticulously checked it line by line. We had no idea where we had gone wrong, so we consulted our supervisor. It turns out that there was an error on their end: there had been a bug in their earlier code whereby they were “averaging past blocks in each session” rather than averaging each block independently, leading to incorrect reporting in the manuscript. The numbers we coded were actually right! Our supervisor provided an amended graph.
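To see how such a bug can skew the plotted values, here is a toy illustration of our own (we are assuming the bug amounted to taking a running mean over blocks; the numbers are made up):

block_means <- c(43, 48, 52, 45, 36) # hypothetical per-block priming (%)
running <- cumsum(block_means) / seq_along(block_means)
round(running, 1) # 43.0 45.5 47.7 47.0 44.8 -- block 3 now sits below 50

A block that is genuinely above 50% gets dragged down by the earlier blocks, which matches the pattern we saw.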

Moving on, the figure also had individual points on the sides of the graph denoting the baseline average and the post-training average. These were coded below in a similar fashion to the average per training session above:

plot3_baseline <- data %>% 
  dplyr::filter(phase == "training_baseline",  # Only using data from baseline training
                condition == 1,  # Again, specifying this is a normal trial
                brokenSuppression == 0) %>% 
  mutate(primed = case_when( # Again, a primed column for correct priming
                  primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                  primedGrating == 2 & chosenGrating == 2 ~ TRUE,
         TRUE ~ FALSE)) %>% 
  group_by(PID, block, phase) %>% 
  summarise(trials = n(), 
            number_primed = sum(primed, na.rm = TRUE), 
            percentage = 100*number_primed/trials) %>% 
  ungroup() %>% 
  summarise(mean_percentage = mean(percentage),
            SEM = sd(percentage)/sqrt(n())) 
## `summarise()` has grouped output by 'PID', 'block'. You can override using the
## `.groups` argument.
print(plot3_baseline)
## # A tibble: 1 × 2
##   mean_percentage   SEM
##             <dbl> <dbl>
## 1            69.7  2.97
write_csv(plot3_baseline, file = "plot3basedescriptives.csv")

The mean pre-training priming percentage was reported in the paper as 69.89%, and we got the similar value of 69.67%.

plot3_trainingaverage <- data %>% 
  dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3"), 
                condition == 1, 
                brokenSuppression == 0) %>% 
  mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                            primedGrating == 2 & chosenGrating == 2 ~ TRUE,
         TRUE ~ FALSE)) %>% 
  group_by(PID, block, phase) %>% 
  summarise(trials = n(), 
            number_primed = sum(primed, na.rm = TRUE), 
            percentage = 100*number_primed/trials) %>% 
  ungroup() %>% 
  summarise(mean_percentage = mean(percentage),
            SEM = sd(percentage)/sqrt(n())) 
## `summarise()` has grouped output by 'PID', 'block'. You can override using the
## `.groups` argument.
print(plot3_trainingaverage)
## # A tibble: 1 × 2
##   mean_percentage   SEM
##             <dbl> <dbl>
## 1            40.1  1.59

The mean post-training priming percentage was reported in the paper as 40.11%, and we obtained a similar value of 40.13%.

Finally, the average means and SEMs for each training period were calculated.

plot3_mean <- plot3_data %>% 
  group_by(phase) %>% # Making the per-phase grouping explicit, so we get one row per training session
  summarise(mean = mean(mean_percentage), 
            SEM = sd(mean_percentage)/sqrt(n()))

print(plot3_mean)
## # A tibble: 3 × 3
##   phase           mean   SEM
##   <chr>          <dbl> <dbl>
## 1 results_train1  44.8  2.68
## 2 results_train2  37.3  1.49
## 3 results_train3  38.3  1.77
write_csv(plot3_mean, file = "plot3meandescriptives.csv")

Now that we have all the data wrangled, we can plot it! Although we thought this would be simple, it actually took a few attempts. This first attempt was just to make sure the individual points and SEMs were correct. The main steps were as follows:

  • Called the object “plot3” and used ggplot() to specify the data as “plot3_data”.
  • Used geom_point() for the individual points, mapping aesthetics so that block is plotted on the x-axis and mean percentage on the y-axis.
  • Used geom_path() (which did work this time!) to connect the points within each phase of training.
  • Used geom_errorbar() to generate the error bars, calculated as the means ± the SEMs.
  • Used the facet_wrap() function to separate the graph into the different training phases, faceting by phase.
  • Created a dotted line at y = 50 using the geom_hline() function.
plot3 <- ggplot(plot3_data) +
  geom_point(aes(x = block, 
                 y = mean_percentage)) +
  geom_path(aes(x = block, 
                y = mean_percentage, 
                group = phase)) +
  geom_errorbar(aes(x = block, 
                    ymin = mean_percentage - SEM, 
                    ymax = mean_percentage + SEM), 
                    width = 0.4) +
  facet_wrap(~phase) +
  geom_hline(yintercept = 50, linewidth = 0.3, linetype = "dotted") +
  coord_cartesian(ylim = c(0, 90)) + # y-axis limits from 0 to 90
  scale_y_continuous(breaks = c(10, 30, 50, 70), # Creating breaks in y-axis
                     minor_breaks = c(0, 20, 40, 60, 80, 100),
                     expand = c(0, 0)) + # Forcing the y-axis to start at 0
  labs( # Appropriately labeling the graph
    title = "Decision Priming",
    subtitle = "Across Training Period",
    x = "Training Period",
    y = "Decision Priming (%)") +
  theme(axis.line.y = element_line(colour = "black", # Modifying graph elements
                                   linewidth = 0.5),
        axis.ticks.y = element_line(colour = "black", 
                                    linewidth = 0.5),
        axis.ticks.length = unit(-0.1, "cm"),
        axis.line.x = element_line(colour = "black", 
                                   linewidth = 0.5),
        panel.grid = element_blank())
plot(plot3)

The individual points match up with the amended values our supervisor gave! But there were a lot of adjustments to be made, most notably the shaded error bars and the pre- and post-training averages. I was having difficulty coding the pre- and post-training averages since the graph was faceted. This meant that the point would appear in all three graphs instead of just one. For example:

Which was totally wrong. I found my answer here: https://www3.nd.edu/~steve/computing_with_data/13_Facets/facets.html. It explained that you need to restrict the data set so your geom_point() only shows in a single facet. I knew I needed to specify which facet each point should appear in, so I created a new column called phase to match plot3_data, and specified the exact training facet. That way, the two data frames could work consistently with each other.

phase = c("results_train1")
plot3_baseline$phase <- phase

phase = c("results_train3")
plot3_trainingaverage$phase <- phase

print(plot3_baseline)
## # A tibble: 1 × 3
##   mean_percentage   SEM phase         
##             <dbl> <dbl> <chr>         
## 1            69.7  2.97 results_train1
print(plot3_trainingaverage)
## # A tibble: 1 × 3
##   mean_percentage   SEM phase         
##             <dbl> <dbl> <chr>         
## 1            40.1  1.59 results_train3

Let’s hope this works!
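While I was at it, I wrote a tiny self-contained example of the trick with made-up data, to convince myself that an annotation really only lands in the facet named in its own data frame:

library(ggplot2)

# Toy data across two facets, plus an annotation data frame naming only facet "A"
toy  <- data.frame(x = rep(1:3, 2), y = c(1, 2, 3, 3, 2, 1),
                   phase = rep(c("A", "B"), each = 3))
note <- data.frame(x = 2, y = 2.5, phase = "A") # only facet "A" gets this point

ggplot(toy, aes(x, y)) +
  geom_line() +
  geom_point(data = note, colour = "red", size = 3) +
  facet_wrap(~phase)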

Now, I moved onto attempt 2 of plotting this figure. Another challenge we encountered was with the shaded error bars. It was recommended that we use geom_ribbon(). However, when I plotted geom_ribbon() with the necessary x and ymin/ymax values, I still wasn’t getting the ribbon! This Stack Overflow post saved me: https://stackoverflow.com/questions/34553044/missing-ribbon-in-ggplot2. The post mentioned that you need a grouping variable for it to work, and “if there is only one observation per group, then you specify group = 1”. Adding that to the aesthetics of geom_ribbon() worked perfectly! Thank you Stack Overflow. Once again, the order of the geoms was important: geom_ribbon() goes first so that geom_point() is drawn on top.
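Here is a minimal sketch of the fix with made-up data. With a discrete x, every observation sits in its own group, so the ribbon has nothing to connect until you add group = 1:

library(ggplot2)
toy <- data.frame(x = factor(1:5), y = c(44, 48, 52, 45, 36), sem = 5)
ggplot(toy, aes(x = x, ymin = y - sem, ymax = y + sem)) +
  geom_ribbon(aes(group = 1), fill = "lightgrey") + # without group = 1, no ribbon appears
  geom_point(aes(y = y))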

For the pre- and post-training points, I used geom_point() and specified the x and y values for each, as previously calculated. I also adjusted the shape and colour to match the manuscript. Using geom_text(), I added labels for these points.

plot3 <- ggplot() +
  geom_ribbon(data = plot3_data, 
              mapping = aes(x = block,  
                            ymin = mean_percentage - SEM, 
                            ymax = mean_percentage + SEM, 
                            group = 1), 
              fill = "lightgrey") +
  geom_point(data = plot3_data, 
             mapping = aes(x = block, y = mean_percentage), size = 3) +
  geom_path(data = plot3_data, 
            mapping = aes(x = block, y = mean_percentage, group = phase)) +
  facet_wrap(~phase, # Adding appropriate labels to each training phase
             labeller = as_labeller(c(results_train1 = "Session 1",
                                      results_train2 = "Session 2",
                                      results_train3 = "Session 3")),
             strip.position = "bottom") +
  geom_hline(yintercept = 50, linewidth = 0.3, linetype = "dotted") +
  geom_errorbar(plot3_baseline, 
                mapping = aes(x = 0.5, 
                              ymin = mean_percentage - SEM, 
                              ymax = mean_percentage + SEM), 
                              width = 0) +
  geom_point(data = plot3_baseline, 
             x = 0.5, y = 69.66955,
             shape = 21, 
             colour = "black",
             fill = "white", 
             stroke = 1.5, 
             size = 2.75) +
  geom_text(data = plot3_baseline, 
            x = 0.5, y = 69.66955, 
            label = "Pre-training", 
            size = 3.2, angle = 90, vjust = -1) +
  geom_errorbar(plot3_trainingaverage, 
                mapping = aes(x = 5.5, 
                              ymin = mean_percentage - SEM, 
                              ymax = mean_percentage + SEM), 
                              width = 0) +
  geom_point(data = plot3_trainingaverage, 
             x = 5.5, y = 40.13709, 
             shape = 21, 
             colour = "black",
             fill = "white", 
             stroke = 1.5, 
             size = 2.75) +
  geom_text(data = plot3_trainingaverage, 
            x = 5.5, y = 40.13709, 
            label = "Training Average", 
            size = 3, angle = 270, vjust = -1) +
  coord_cartesian(ylim = c(25, 75)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = c(30, 35, 40, 45, 50, 55, 60, 65, 70, 75),
                     labels = c("30", "", "40", "", "50", "", # Blanking every 2nd tick label
                                "60", "", "70", "")) +
  scale_x_discrete(expand = c(0.3, 0.3))+
  theme_minimal() +
  labs( # Adding appropriate labels
    title = "Decision Priming",
    subtitle = "Across Training Period",
    x = "Training",
    y = "Decision Priming (%)") +
  theme(strip.background = element_blank(),# Modifying specific graph elements
        strip.placement = "outside",
        axis.line.y = element_line(colour = "black", linewidth = 0.5),
        axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
        axis.ticks.length = unit(-0.1, "cm"),
        axis.line.x = element_line(colour = "black", linewidth = 0.5),
        axis.ticks.x = element_line(colour = "black", linewidth =  0.5),
        panel.grid = element_blank(),
        panel.spacing = unit(0, "lines"))

plot(plot3)

And after all that, we did it! It looks almost identical to what was in the manuscript! I was so proud.

2.5 Plot 4: Training Effects After 1 Week

Last, but not least, we have our final plot.

Similar to the previous plot, we needed to filter() for the data from the training phases only. Then, we used mutate() to create new columns for priming and for the specific training phase: “training_baseline” and “post_train” were retained, while “results_train” 1, 2 and 3 were merged into a single “results_train”. This was piped into a group_by() function for the training phase and participant ID, and we then used summarise() to calculate the percentage of primed trials.

plot4_data <- data %>% 
  dplyr::filter(phase %in% c("training_baseline", 
                             "results_train1", 
                             "results_train2", 
                             "results_train3", 
                             "post_train"), condition == 1, brokenSuppression == 0) %>% 
  mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                            primedGrating == 2 & chosenGrating == 2 ~ TRUE),
         training_phase = case_when(phase == "training_baseline" ~ "training_baseline",
                                    phase == "results_train1" | 
                                    phase == "results_train2" | 
                                    phase == "results_train3" ~ "results_train",
                                    phase == "post_train" ~ "post_train")) %>% 
  group_by(training_phase, PID) %>% 
  summarise(trials = n(), 
            number_primed = sum(primed, na.rm = TRUE), 
            percentage = 100*number_primed/trials) %>% 
  ungroup()
## `summarise()` has grouped output by 'training_phase'. You can override using
## the `.groups` argument.
# Manually reordering so training_baseline appears first, results_train next, then post_train
plot4_data$training_phase <- fct_relevel(plot4_data$training_phase, 
                                         c("training_baseline", "results_train", "post_train"))

print(plot4_data)
## # A tibble: 21 × 5
##    training_phase   PID trials number_primed percentage
##    <fct>          <dbl>  <int>         <int>      <dbl>
##  1 post_train         1     60            28       46.7
##  2 post_train         2     60            29       48.3
##  3 post_train         3     60             7       11.7
##  4 post_train         4     58            27       46.6
##  5 post_train         5     60            58       96.7
##  6 post_train         6     60             7       11.7
##  7 post_train         7     60            22       36.7
##  8 results_train      1    360           149       41.4
##  9 results_train      2    359           177       49.3
## 10 results_train      3    360            99       27.5
## # ℹ 11 more rows
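As an aside, the phase-merging step could have been written more compactly with %in% and if_else(); this sketch should produce the same training_phase column as the case_when() above:

# Sketch: collapsing the three training sessions into one label with %in%
plot4_alt <- data %>%
  dplyr::filter(phase %in% c("training_baseline", "results_train1",
                             "results_train2", "results_train3", "post_train"),
                condition == 1, brokenSuppression == 0) %>%
  mutate(training_phase = if_else(phase %in% paste0("results_train", 1:3),
                                  "results_train", phase))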

To obtain the mean effect of training over time, we group_by() training phase to average over participant ID, then summarise() the mean(). Importantly, we had to subtract 50 from this value, since 50% corresponds to priming at chance.

plot4_mean <- plot4_data %>% 
  group_by(training_phase) %>% 
  summarise(mean = mean(percentage) - 50)
print(plot4_mean)
## # A tibble: 3 × 2
##   training_phase     mean
##   <fct>             <dbl>
## 1 training_baseline 19.9 
## 2 results_train     -9.95
## 3 post_train        -7.40

We then obtained the means in a similar way, but without subtracting 50, so that we could also calculate the SEM.

plot4_desc <- plot4_data %>% 
  group_by(training_phase) %>% 
  summarise(mean = mean(percentage), 
            SEM = sd(percentage)/sqrt(n()))

# Manually reordering the factors once again
plot4_mean$training_phase <- fct_relevel(
  plot4_mean$training_phase, c("training_baseline", "results_train", "post_train"))

print(plot4_desc)
## # A tibble: 3 × 3
##   training_phase     mean   SEM
##   <fct>             <dbl> <dbl>
## 1 training_baseline  69.9  4.66
## 2 results_train      40.1  3.36
## 3 post_train         42.6 10.8
write_csv(plot4_desc, file = "plot4descriptives.csv")

These means and SEMs correspond to the 1 week post-training values reported in the manuscript, which were 42.6% and 10.84 SEM, respectively. No means or SEMs were reported for pre-training and during training, so we confirmed our coded means by eye against the figure. They appeared to be correct!

We could then move on to plotting the above data. The main steps to reproducing this figure were as follows:

  • Called a ggplot() object and named it “plot4”.
  • Used the geom_col() function for the column graph, with local data plot4_mean and an aesthetic mapping of training phase on the x-axis and percentage priming on the y-axis.
  • Used the geom_point() function for the points corresponding to each of the 7 included participants in each training phase.
  • Used the geom_path() function to connect these points according to participant ID.
  • Changed the y scale to continuous between 0 and 100, labelled in increments of 20, using scale_y_continuous(). The most difficult part was replicating the arrow on the right hand side. We worked around this by adding a right hand side axis through the sec.axis argument of scale_y_continuous(), using sec_axis(), and labelling it “Congruent Incongruent”. To form the arrow heads, we used the arrow argument of element_line() inside theme().
  • Added additional graph element changes in theme() for ticks, text, titles and the panel grid.
plot4 <- ggplot() +
  geom_col(plot4_mean, 
           mapping = aes(training_phase, y = mean, fill = training_phase),
           colour = "black",# Column border colour
           linewidth = 0.1, # Making border line thickness smaller
           position = position_nudge(y = 50), width = 0.5) +
  scale_fill_manual(values = c("white", "lightgrey", "black")) + # Colouring the bars appropriately
  geom_errorbar(plot4_desc, mapping = aes(x = training_phase, 
                                          ymin = mean - SEM, 
                                          ymax = mean + SEM), width = 0) +
  geom_point(plot4_data, 
             mapping = aes(x = training_phase, y = percentage),
             shape = 21, # This specific shape contains a border
             colour = "white",# Border colour
             fill = "gray50", # Point colour
             stroke = 1, # Adjusting border thickness
             size = 2.75) + # Adjusting point size
  geom_path(plot4_data, 
            mapping = aes(x = training_phase, y = percentage, group = PID),
            colour = "gray50") +
  geom_hline(yintercept = 50, linewidth = 0.3) + # Line going through the graph
  coord_cartesian(ylim = c(0, 100)) +
  scale_y_continuous(
    breaks = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100), 
    minor_breaks = (c(0, 20, 40, 60, 80)),
    labels = c("10", "", "30", "", "50", "", "70", "", "90", ""), # Blanking every 2nd tick label
    expand = c(0,0),
    sec.axis = sec_axis(~.*1, name = "Congruent      Incongruent")) + # Adding right hand side axis
  scale_x_discrete(labels = c("Pre-training", "Training", "Post-training")) +
  theme_minimal() +
  labs(
    title = "Decision Priming",
    subtitle = "Across Training Period",
    x = "Training Period",
    y = "Decision Priming (%)") +
  theme(legend.position = "none", 
        axis.line.y.right = 
          element_line(arrow = grid::arrow(length = unit(0.3, "cm"), 
                       ends = "both", 
                       type = "closed"), 
                       colour = "grey50", 
                       linewidth = 0.75), 
        axis.ticks.y.right = element_blank(), 
        axis.text.y.right = element_blank(),
        axis.title.y.right = element_text(colour = "grey50", size = 11, vjust = 2),
        axis.line.y.left = element_line(colour = "grey50", linewidth = 0.5),
        axis.ticks.y = element_line(colour = "grey50", linewidth = 0.5),
        axis.ticks.length = unit(-0.1, "cm"),
        axis.ticks.x = element_line(colour = "grey50", linewidth = 0.5),
        panel.grid = element_blank())

plot(plot4)

And so concludes the very last plot for this manuscript!

3 Exploration

Moving onto the exploratory analysis, where I used the data to answer 3 questions of interest to me about the research.

3.1 Q1: Which colour grating leads to better subliminal priming?

The method states that both green and red gratings were used during the priming main experiment, and in my initial reading I was curious about the decision to use two colours. One colour could potentially lead to better overall priming in the main experiment, which has implications for what is optimal in this visual suppression paradigm.

To start off fresh, I read in all the data files given.

data <-read_csv("data/PearsonLab_Data_Collated.csv")
## New names:
## Rows: 11087 Columns: 13
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): file, phase
## dbl (11): ...1, condition, primedGrating, chosenGrating, brokenSuppression, ...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Printing included participant data
print(data)
## # A tibble: 11,087 × 13
##     ...1 file  condition primedGrating chosenGrating brokenSuppression vividness
##    <dbl> <chr>     <dbl>         <dbl>         <dbl>             <dbl>     <dbl>
##  1     1 01_1…         1             1             1                 0        NA
##  2     2 01_1…         1             2             2                 0        NA
##  3     3 01_1…         3             2            NA                 1        NA
##  4     4 01_1…         1             1             1                 0        NA
##  5     5 01_1…         1             1             1                 0        NA
##  6     6 01_1…         1             1             1                 0        NA
##  7     7 01_1…         3             1            NA                 1        NA
##  8     8 01_1…         1             2             1                 0        NA
##  9     9 01_1…         1             2             1                 0        NA
## 10    10 01_1…         1             2             2                 0        NA
## # ℹ 11,077 more rows
## # ℹ 6 more variables: preImageryTime <dbl>, agency <dbl>, right_wrong <dbl>,
## #   phase <chr>, PID <dbl>, block <dbl>
excluded_data <- read_csv("data/included_excluded_data.csv")
## New names:
## Rows: 33 Columns: 4
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr (4): included, ...2, ...3, ...4
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
# Printing excluded participant data
print(excluded_data)
## # A tibble: 33 × 4
##    included    ...2            ...3              ...4                 
##    <chr>       <chr>           <chr>             <chr>                
##  1 priming     catch detection suppression break difference  red/green
##  2 0.660714286 0.9             0.066666667       0.071428571          
##  3 0.867924528 1               0.116666667       0.094339623          
##  4 0.583333333 0.8             0.016393443       0.033333333          
##  5 0.559322034 1               0.016666667       0.016949153          
##  6 0.661016949 1               0.078125          0.084745763          
##  7 0.491525424 0.933333333     0.016666667       0.016949153          
##  8 0.568965517 1               0.033333333       0.034482759          
##  9 0.672413793 1               0.033333333       0                    
## 10 0.593220339 1               0.016666667       0.016949153          
## # ℹ 23 more rows

I needed to use the data from the main experiment and compare correct and incorrect priming for green and red. Both follow the same process; the only difference is in the mutate() function, where a new column called “primed” uses case_when() to check whether the primed grating and the chosen grating both equal 1 (green) or both equal 2 (red). Whichever colour it was, a match returned TRUE, indicating successful priming. The explanation for the functions used is embedded in the code below. Also, rather than using summarise(), I used the reframe() function from the dplyr package. The difference between the two is that reframe() is less stringent and is useful for returning more or fewer than 1 row per group (since I had 2 rows).
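To get my head around the difference, here is a toy sketch with made-up data (not the study data):

# reframe() may return any number of rows per group, and always returns an ungrouped tibble
library(dplyr)
toy <- tibble(g = c("a", "a", "b"), v = 1:3)
toy %>% group_by(g) %>% reframe(kept = v[v > 1]) # two rows out, one per surviving value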

green_priming <- data %>% 
  dplyr::filter(phase == "main", # Include only the main experiment
         condition != 3, # Cannot be a catch trial where suppression is meant to break
         brokenSuppression == 0, # No broken suppression
         block != 0) %>% # Excluding block 0
  mutate(primed = case_when(primedGrating == 1 & 
                            chosenGrating == 1 ~ TRUE)) %>%
  mutate_at(vars("primed"), # Under the primed label, replace NA values with FALSE
            ~replace_na(.,FALSE)) %>%
  count(trials = n(), primed) %>% # Count the number of trials and the amount of primed TRUE/FALSE
  reframe(primed, 
          percentage = 100*n/trials) # Calculating average primed trials
print(green_priming)
## # A tibble: 2 × 2
##   primed percentage
##   <lgl>       <dbl>
## 1 FALSE        65.2
## 2 TRUE         34.8
colour = c("green") # Adding a column to specify the colour as green
green_priming$colour <- colour
red_priming <- data %>% 
  dplyr::filter(phase == "main", 
         condition != 3, 
         brokenSuppression == 0, 
         block != 0) %>% 
  mutate(primed = case_when(primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>%
  mutate_at(vars("primed"),
            ~replace_na(.,FALSE)) %>%
  count(trials = n(), primed) %>% 
  reframe(primed,
          percentage = 100*n/trials)
print(red_priming)
## # A tibble: 2 × 2
##   primed percentage
##   <lgl>       <dbl>
## 1 FALSE        78.8
## 2 TRUE         21.2
colour = c("red") # Adding a column to specify the colour as red
red_priming$colour <- colour

From the raw numbers, it appears that the green stimulus leads to better priming. However, I wanted to plot it to make comparisons easier. To simplify plotting, I combined the two data frames for green and red priming into one called “colour_priming” using the bind_rows() function. I then changed the “primed” variable into a factor vector (AKA a categorical variable) using the as.factor() function. This allowed me to reorder the levels of the primed column so that “TRUE” appears first, using the fct_relevel() function. For viewing purposes, I thought it would be better for successful priming to appear on the left hand side of the graph.

colour_priming <- bind_rows(green_priming, red_priming)

# Writing a .csv file so that this data can be easily accessible
write_csv(colour_priming, file = "green&red_priming.csv")

colour_priming$primed <- as.factor(colour_priming$primed) 

colour_priming$primed <- fct_relevel(colour_priming$primed, c("TRUE", "FALSE"))

I created a new ggplot object called “plot_green_red_priming” using local data from colour_priming. I chose geom_col() since it is most effective for displaying rankings or comparisons.

plot_green_red_priming <- ggplot(colour_priming) +
  geom_col(aes(x = primed, y = percentage, fill = colour), 
           position = "dodge") + # Preserve vertical position while adjusting horizontal position
  scale_fill_manual(values = c("#85E378", "#F95858"), # Appropriate colour for green/red
                    name = "Colour", 
                    labels = c("Green", "Red")) + # Renaming legend title & labels
  scale_y_continuous(
    breaks = c(0, 20, 40, 60, 80)) +
  scale_x_discrete( # Renaming x-axis variables 
    labels = c("Successful Priming", "Unsucessful Priming")) +
  labs(title = "Priming Using Green Versus Red Stimulus", # Adding plot title
       y = "Average Primed (%)") + # Renaming y-axis label
  theme_light() + 
  theme(axis.line = element_line(colour = "black"), # Modifying specific graph elements
        axis.ticks = element_line(colour = "black"),
        axis.title.x = element_blank(),
        axis.text.x = element_text(colour = "black"),
        plot.title = element_text(hjust = 0.5),
        panel.grid.major.x = element_blank()) 
plot(plot_green_red_priming)

So, yes, there does seem to be a difference in colour affecting priming. The green grating appears better than the red grating at facilitating subliminal priming, with an average of 34.8% compared to 21.2%, respectively. There are also more instances of unsuccessful subliminal priming for the red grating than the green grating, with an average of 78.8% compared to 65.2%, respectively. In general, however, unsuccessful priming outweighs successful priming regardless of colour.

3.2 Q2: Does higher agency mean less pre-imagery time?

I also wondered if there was an association between agency and the time participants took before the imagery period. Since participants were able to choose when to begin the imagery period following the priming paradigm, I was curious whether believing that you know exactly what to imagine and have control over your choices means a faster reaction time. Moreover, the authors mentioned in the introduction that subliminal priming could lead to faster reaction times for decisions leaning towards the stimulus. So, there could be an association between agency and pre-imagery time for those who had successful subliminal priming. I expected that higher agency would go with a shorter pre-imagery time.

To achieve this, I used data from both the main and training phases of the experiment.

  • The filters for condition, broken suppression and block are the same as coded previously, and the mutate() function created a new column called “primed”, using case_when() to flag when priming was successful (i.e. the colour of the primed and chosen grating match).
  • Because I wanted to work only with trials that had successful priming, I needed to drop those that didn’t. I used the drop_na() function from the tidyr package to remove NA values from a specific column, in this case the “primed” column (see the toy example just after this list).
  • I then used reframe() to keep only the agency and pre-imagery time columns, and used the arrange() function to sort agency in ascending order.
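A toy illustration of drop_na() with made-up data (not the study data):

# drop_na() keeps only the rows where the named column is not NA
library(tidyr)
toy <- tibble::tibble(primed = c(TRUE, NA, TRUE), rt = 1:3)
drop_na(toy, primed) # the row with primed = NA is removed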
main_agency_time <- data %>% # Creating a new data frame for the main experiment
  dplyr::filter(phase %in% c("main"), # Filtering for the main experiment only
         condition != 3, 
         brokenSuppression == 0, 
         block != 0) %>% 
  mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                            primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>% 
  drop_na(primed) %>% 
  reframe(agency, preImageryTime) %>% 
  arrange(agency)

# Converting agency into a factor vector (AKA a categorical variable)
main_agency_time$agency <- factor(main_agency_time$agency)

print(main_agency_time)
## # A tibble: 1,109 × 2
##    agency preImageryTime
##    <fct>           <dbl>
##  1 1               0.603
##  2 1               0.254
##  3 1               0.169
##  4 1               0.288
##  5 1               0.214
##  6 1               0.153
##  7 1               0.257
##  8 1               0.389
##  9 1               0.352
## 10 1               0.273
## # ℹ 1,099 more rows
# Also calculating the mean pre-imagery time based on grouping the agency rating
main_agency_time_mean <- main_agency_time %>% 
  group_by(agency) %>%
  summarise(preImagery_time_mean = mean(preImageryTime))

print(main_agency_time_mean)
## # A tibble: 4 × 2
##   agency preImagery_time_mean
##   <fct>                 <dbl>
## 1 1                     1.34 
## 2 2                     1.60 
## 3 3                     1.76 
## 4 4                     0.954
# Writing a .csv file so that this data can be easily accessible
write_csv(main_agency_time, file = "agency&time_main_experiment.csv")

The same steps as above are applied here.

training_agency_time<- data %>% # Creating a new data frame for the training phases 
  dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3"), 
         condition != 3, 
         brokenSuppression == 0, 
         block != 0) %>% 
  mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
                            primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>% 
  drop_na(primed) %>% 
  reframe(agency, preImageryTime) %>% 
  arrange(agency)

training_agency_time$agency <- factor(training_agency_time$agency)

print(training_agency_time)
## # A tibble: 1,001 × 2
##    agency preImageryTime
##    <fct>           <dbl>
##  1 1               0.245
##  2 1               0.137
##  3 1               0.231
##  4 1               1.44 
##  5 1               1.42 
##  6 1               1.38 
##  7 1               3.03 
##  8 1               0.791
##  9 1               1.25 
## 10 1               0.909
## # ℹ 991 more rows
training_agency_time_mean <- training_agency_time %>% 
  group_by(agency) %>% 
  summarise(preImagery_time_mean = mean(preImageryTime))

print(training_agency_time_mean)
## # A tibble: 4 × 2
##   agency preImagery_time_mean
##   <fct>                 <dbl>
## 1 1                     1.29 
## 2 2                     0.988
## 3 3                     0.839
## 4 4                     0.823
write_csv(training_agency_time, file = "agency&time_training_phases.csv")

And, in both the main experiment and the training phases, the highest agency rating was associated with the lowest mean pre-imagery time for trials where subliminal priming was successful (although in the main experiment the pattern was not strictly decreasing across ratings 1 to 3).

However, I wanted to see the spread of this data through a plot. I wanted to see how skewed the pre-imagery time was for each agency rating, so I chose a box plot, as I could place each box side-by-side for easier comparisons. This website was helpful to begin with, since I hadn’t used the geom_boxplot() function before: http://www.sthda.com/english/wiki/ggplot2-box-plot-quick-start-guide-r-software-and-data-visualization. I created an object called “plot_main_agency_time” and used ggplot() to specify the data and what is plotted on the x and y axes. I then used geom_boxplot() to create the box plot, hid the outliers using the outlier.shape argument, and added colour with fill.

plot_main_agency_time <- ggplot(data = main_agency_time, aes(x = agency, y = preImageryTime)) +
  geom_boxplot(outlier.shape = NA, 
               fill = "cornflowerblue") +
  coord_cartesian(ylim = c(0, 4.5)) + # Restricting the axis from 0 to 4.5
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4)) + # Adding breaks in y-axis from 0 to 4
  labs(title = "Pre-Imagery Time & Agency Rating During Main Experiment", # Adding plot title
       y = "Pre-Imagery Time (sec)",
       x = "Agency") +
  theme_light() + # Dropping the earlier theme_classic(), which theme_light() replaced anyway
  theme(axis.line = element_line(colour = "black"), # Modifying specific graph elements
        axis.ticks = element_line(colour = "black"),
        axis.text.x = element_text(colour = "black"),
        plot.title = element_text(hjust = 0.5),
        panel.grid.major.x = element_blank()) 
plot(plot_main_agency_time)

plot_training_agency_time <- ggplot(data = training_agency_time, aes(x = agency, y = preImageryTime)) +
  geom_boxplot(outlier.shape = NA, fill = "cornflowerblue") +
  coord_cartesian(ylim = c(0, 4)) +
  scale_y_continuous(breaks = c(0, 1, 2, 3, 4)) +
  labs(title = "Pre-Imagery Time & Agency Rating During Training", # Adding plot title
       y = "Pre-Imagery Time (sec)",
       x = "Agency") +
  theme_light() + # As above, dropping the redundant theme_classic()
  theme(axis.line = element_line(colour = "black"), # Modifying specific graph elements
        axis.ticks = element_line(colour = "black"),
        axis.text.x = element_text(colour = "black"),
        plot.title = element_text(hjust = 0.5),
        panel.grid.major.x = element_blank()) 
plot(plot_training_agency_time)

In both phases, for trials with successful priming, an agency rating of 4 (indicating high agency) had the lowest pre-imagery times. The box plot for this rating is also more condensed compared to the other agency ratings, suggesting that most participants who feel in control of their decisions do spend less time figuring out what to imagine. For the main experiment, those with an agency rating of 3 are more variable in how long they spent during pre-imagery, despite feeling fairly in control of their decision. For the training phase, those with a low agency rating spend more time during pre-imagery.

3.3 Q3: Is there a correlation between agency and vividness?

Finally, I was curious as to whether there was a statistical correlation between ratings of agency and vividness. I would assume that a participant who can clearly imagine the grating would also believe they are in control of their choice selection.

To do this, I chose to pull data from both the main experiment and training phases, making sure to filter() out the control phase since it did not have any imagery period and therefore no vividness rating.

agency_vividness_main <- data %>% 
  dplyr::filter(phase == "main", 
         condition != 3, 
         brokenSuppression == 0, 
         block != 0) %>% 
  reframe(vividness, agency) # Keeping only the vividness and agency ratings 

glimpse(agency_vividness_main) # Having a look at the coded outcome
## Rows: 1,979
## Columns: 2
## $ vividness <dbl> 2, 2, 1, 3, 2, 2, 2, 3, 2, 2, 3, 1, 2, 3, 2, 3, 3, 2, 2, 2, …
## $ agency    <dbl> 3, 3, 3, 2, 4, 3, 2, 1, 3, 3, 2, 3, 3, 1, 4, 3, 2, 2, 3, 3, …
# Writing a .csv file so that this data can be easily accessible
write_csv(agency_vividness_main, file = "agency&vividness_main_experiment.csv")
agency_vividness_training <- data %>% 
  dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3"),
         condition != 3, 
         brokenSuppression == 0, 
         block != 0) %>% 
  reframe(vividness, agency)

glimpse(agency_vividness_training)
## Rows: 2,499
## Columns: 2
## $ vividness <dbl> 2, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 3, 2, 2, 3, 4, …
## $ agency    <dbl> 4, 3, 3, 2, 4, 4, 1, 3, 1, 2, 3, 3, 3, 2, 4, 3, 2, 2, 3, 3, …
write_csv(agency_vividness_training, file = "agency&vividness_training_phases.csv")

I used a new function called cor(), which calculates the correlation coefficients for the data frame specified. This function also let me select the correlation method. I chose “spearman” since it is designed for evaluating relationships involving ranked data; both vividness and agency are ratings with a lowest value of 1 and highest values of 3 and 4, respectively. I used this website to help with the function: https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/.

cor(agency_vividness_main, method = c("spearman"))
##           vividness    agency
## vividness 1.0000000 0.1939141
## agency    0.1939141 1.0000000
cor(agency_vividness_training, method = c("spearman"))
##            vividness     agency
## vividness 1.00000000 0.04108986
## agency    0.04108986 1.00000000

In the main experiment, there is a weak positive relationship between agency and vividness (0.19). In the training phase, there is virtually no monotonic relationship between vividness and agency (0.04).
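As a possible extension (not something I ran for this report), cor.test() would also attach a significance test to the Spearman coefficient:

# Hypothetical follow-up: testing the main-experiment correlation for significance
cor.test(agency_vividness_main$vividness,
         agency_vividness_main$agency,
         method = "spearman",
         exact = FALSE) # exact p-values are not available with tied ranks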

4 Recommendations

4.1 Improve dataset organisation

Before my group and I could begin our reproducibility project, we encountered difficulties in interpreting the .csv files. Originally, the data set was provided to us with each participant split into separate files per condition and phase (control, main, post training, results training 1, results training 2, results training 3), as well as per block of trials. Furthermore, although we were given an initial README file, the original .csv files did not contain any labels or headings for the columns and were missing information needed to interpret the data. There were a lot of trials, conditions, phases, and IVs in the experimental design, yet these remained unclear on initial reading. It was confusing to follow without clear column labels and explanations. The screenshots below are examples of the original data file.

We were very fortunate to receive help from our coding tutor, who put together a collated version so all the information was readable. However, in the real world, tracking down missing data or following up on the meaning behind data can slow down the process of reproducing it. Even if somebody were to upload their data to an open resource like OSF, if the formatting is uninterpretable, it begs the question of whether the data is truly open when compiled in a convoluted and tedious arrangement. The authors of the paper could have included clearer variable names, collated the individual .csv files into one data set, and provided all the necessary data (such as the excluded participants needed for Figure 2). Helpful resources for improving data management include the FAIR principles, which are supported by the Australian Research Data Commons (ARDC): making data findable, accessible, interoperable and reusable.
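For instance, a collation step along these lines (a sketch; the folder path and file pattern here are hypothetical) would stack the per-participant files into one labelled data frame, much like the collated file our tutor provided:

# Read every raw .csv and stack them, recording the source file in a `file` column
library(tidyverse)
collated <- list.files("data/raw", pattern = "\\.csv$", full.names = TRUE) %>%
  purrr::set_names() %>%
  purrr::map_dfr(~ readr::read_csv(.x, show_col_types = FALSE), .id = "file")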

4.2 Improve consistency between raw data and reported numbers

Data consistency refers to the completeness and correctness of data stored in a database. There were several instances where the raw data and the numbers reported in the manuscript did not match, which were barriers to reproducibility. For example, when evaluating the manuscript, my group and I noticed a discrepancy in the number of trials per participant. As stated repeatedly within the manuscript, there were expected to be 8 blocks of 30 trials per participant. However, the data set appeared to have only 5 blocks per participant (a total of 150 trials per participant, as opposed to the stated 240). Moreover, in the main experiment, participants 3 and 5 had an additional block labelled “0”. Although the report never mentions any miscellaneous or extra blocks (i.e. any explanation of their purpose), there was still data present within these surplus trials. The authors should ensure they correctly report the numbers as recorded in the data set, and give a reason in the paper for any extra data in the .csv files. This transparency would improve the legitimacy of the research and provide clarity for those needing to reproduce the data.
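A quick sanity check along these lines (assuming the collated data frame “data” from earlier) would have surfaced both discrepancies at a glance:

# Counting how many trials each participant actually has per block in the main phase
data %>%
  dplyr::filter(phase == "main") %>%
  dplyr::count(PID, block)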

4.3 Include detailed code explanations

My group members and I had no prior experience in coding, so we relied on the knowledge we gained from the early coding modules and lots, and lots of searching on sites like Stack Overflow. The authors should consider providing a code book, a document outlining the coding framework for the research. This could include clear definitions of the variables (for example, what is meant by a catch detection trial), so a viewer does not need to constantly search the manuscript for definitions, as well as any key programs or functions used in the authors’ coding process. During our reproducibility project, we were given an example of the code the authors used for Figure 2. However, it was uninterpretable for us: not only was it written in MATLAB, but there was little documentation of the process behind it.
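As a sketch of what I mean, a minimal code book could even live in R itself. The descriptions below are my own guesses from working with the data, not the authors’ definitions:

# A hypothetical code book; every description here is an assumption on my part
codebook <- tibble::tribble(
  ~variable,           ~description,
  "condition",         "trial type; 3 appears to mark catch trials",
  "brokenSuppression", "whether suppression broke on that trial (0 = intact)",
  "primedGrating",     "the subliminally presented grating (1 = green, 2 = red)",
  "chosenGrating",     "the grating the participant selected (1 = green, 2 = red)"
)
print(codebook)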

5 Reflection

  1. When I first heard that the internship involved learning how to code in R, I felt intimidated, but also excited. With zero prior coding experience and my presumptions about how difficult it would be, it all felt completely out of my depth. At the same time, I was keen to pick up a new skill: not only understanding and applying a whole new programming language, but also the potential for it to benefit me during my honours year (and beyond, if I were to pursue research).
  2. Now that I have been learning R for 10 weeks, I feel much more confident about my coding abilities and more appreciative of data reproducibility. I think Daphne from Week 1 would be shocked at how much she’s learned and achieved throughout this entire term. When I first started to code, I would honestly just copy lines of the basic code from Danielle’s module as I watched and call it there. I know, horrible practice looking back on it now, because I really wasn’t understanding what the purpose of a certain function was. I think I was still afraid of my abilities, which stopped me from being an active, independent learner. Luckily, through completing the self-tests and the coding assignment, I found myself becoming comfortable in my own skills. Applying the coding knowledge I learned to shape how I wanted the data to be formatted and how I wanted it to look was pretty empowering. Moreover, I have a great appreciation for the importance of good data management. To ensure that research is carried out properly, you should be able to reproduce the results from the raw data provided. Even throughout my R journey, my group and I encountered barriers with how accessible and open the data actually was. I can’t imagine how tough it must be to try and reproduce data in the real world, and not just for a course at university! But it is a necessary practice, and I’ve grown a much stronger passion for open science throughout this coding experience.
  3. The hardest thing I encountered about learning R this term was trying to adopt a growth mindset. There was a steep learning curve in having to learn everything from the ground up, and at the beginning, if I made any kind of mistake, I got frustrated and didn’t give myself as much grace as I should have. This led me to doubt whether I would ever be good at coding. There was even a point during the term where I began to compare my abilities to others. Hearing others readily solve problems or raise interesting ideas about how to approach them made me feel a little self-conscious, as though I could never be on the same level.
  4. I am most proud of myself for not giving up when I encountered difficulties in code. Perseverance and patience are the two qualities I developed the most throughout this term. One of my best strategies was that if I encountered an error and had been working on it for what felt like forever, I took a step back, closed RStudio, and took some time to refresh. Every single time I returned to it, whether later that same day or even the day after, I managed to solve the problem! The clearest example I can remember was hitting a roadblock at night trying to figure out how to plot those two individual points for the training baseline and average in Figure 3. I wasn’t getting anywhere at all, so rather than wallow in self-pity, I told myself I needed to get some rest, clear my head, and try again tomorrow. I gave it another shot in the morning before I had to meet my friends to watch a movie, and I figured it out in no time! My friends probably had so much fun listening to me talk about how to plot individual points within specific faceted graphs during that outing. I really built my confidence this term and overcame those frustrations and doubts I had early on. I also gave myself a lot more kindness; I’m quite dramatic, so whenever I coded error-free, I pumped my fists in the air and declared I felt “like a genius”. I also consistently sent my group updates on the code I was working on, and hearing encouragement from them really boosted my confidence! Even though this verification report isn’t perfect, and I still have much to learn, I’m really proud of all the progress I made.
  5. The next thing I want to learn in R is how to conduct statistical analyses like t-tests or ANOVA. It would be valuable to know how to do this, especially in the lead up to honours year. Fortunately, there are many resources online with example data sets that I can play around with! As a more personal project, however, I also want to make my own blog. A bit left field, I know, but it would just be for my own interests, or much further into the future I could create a website that is more career focused. There is just so much customisability when it comes to coding that appeals to me. There are a few ways to go about this, for example using the distill package for R Markdown or installing blogdown. Danielle Navarro actually has videos on her channel about how to use blogdown, so I’ll need to check those out in the term break.

Thanks for reading! I hope you enjoyed my very first coding journey!