Koenig-Robert, R., El Omar, H., & Pearson, J. (2022). Unconscious Bias Training Removes Bias Effect from Non-Conscious Stimuli, Restoring Choice Divergence. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4308582
Subliminal stimuli are pieces of information presented below a person’s awareness. Subliminal information permeates our world, from advertising that shapes consumer choice to social media that propagates unrealistic ideals. The impact of subliminal information on the decisions we make has remained a growing field in cognitive science. Previous studies have found that when we are presented with non-conscious information, the decisions we make lean towards those subliminal primes without our knowledge. This occurs across a range of behaviours, including attitudes and biases, how we learn, and even what we see. The notion that subliminal information influences how we act raises the question of whether we can be trained to steer away from biased choices, meaning that decision making could potentially be recovered.
The present study by Koenig-Robert et al. (2022) investigates whether the influence of subliminal information is malleable and whether we can diverge away from previous biases. The researchers propose that if behaviour can move away from subliminal biases, this would suggest that the pathway from subliminal perception to behaviour is subject to influence (such as learning) and not always rigidly biased in the same manner. They hypothesise, firstly, that presenting subliminal stimuli would bias a decision task, and, secondly, that training could cause choices to diverge from subliminal stimuli.
To test hypothesis 1, 17 participants underwent an imagery-based subliminal priming task. Those in the main experiment were presented with a static stimulus (either a red or green grating) in their non-dominant eye and a random, moving pattern in their dominant eye. The moving stimulus perceptually suppresses the grating from awareness, rendering it subliminal. Participants were then asked to imagine one of the gratings, select which grating they had imagined from one of two options, and then rate how vivid their imagery was (i.e. vividness) and how much they felt in control of their grating choice (i.e. agency). Those in the no-imagery control condition underwent the same paradigm but without an imagination period. Results showed that subliminal gratings did influence selection: participants readily chose the grating that had been subliminally presented to them. To test hypothesis 2, 7 participants who displayed above-chance priming in the main experiment (i.e. they were more likely to choose the same grating as the subliminal prime) were selected. These participants were trained on the same paradigm weeks after the main experiment, with added training blocks in which they received feedback on whether the chosen grating was correct or incorrect. Selections were marked correct when the chosen grating did not match the subliminal prime. Results showed that training away from subliminal stimuli does switch participants' choices: they came to select incongruent gratings more readily. In post-experiment interviews, participants reported not knowing how to perform the task, yet they could still implicitly learn to diverge from the subliminal stimuli. The figure below shows the steps participants undertook for both the subliminal priming and training experiments.
These results demonstrate that choice bias towards subliminal stimuli can be reversed through training. This finding is notable because non-consciously learnt information has been regarded in the literature as unalterable. Being able to learn to make decisions against subliminal information may have practical implications for treating those who are more susceptible to processing negative subliminal information: for example, helping people with addiction diverge away from subliminally processed stimuli related to addictive behaviours, or, in the case of eating disorders, diverging away from stimuli that promote a harmful body image.
I was surprised that there was a way to test the influence of non-conscious information. Since subliminal information operates below a person’s awareness, I thought it would be difficult to measure its effectiveness when we are not even conscious of its influence on us. I also thought it would be difficult to design a subliminal-information experiment that isn’t impacted by individual effects. Seeing this procedure, I understood how the phenomenon could be tested empirically: participants respond to a completely novel stimulus presented to their non-dominant eye, which is considered subliminal because, although it is blocked from awareness, the brain is still processing the information. This type of visual suppression has been used quite frequently to control an observer’s awareness of critical stimuli. However, I’m still a little sceptical about how well awareness is measured in this paradigm. Although the researchers did include an agency rating (defined by how much participants felt in control of their decision), I would have liked them to examine this variable further in their results as a more thorough check of whether the information was truly processed unconsciously.
I’m not sure I understood why these particular orientations and colours were chosen. The paper itself did not appear to give any rationale for the two colours (red and green) or the orientations (horizontal and vertical) of the gratings. I do know that looking at red for too long and then turning away can cause a green afterimage, since they are complementary colours. However, I don’t know for certain whether alternating between the two colours was done to prevent afterimages, or whether there is another reason to do with the priming effectiveness of red and green stimuli. Moreover, I was unsure why specifically horizontal and vertical gratings were used rather than any other type of pattern. My best guess, again, has to do with preventing afterimage effects, but I still wasn’t sure how using different orientations of lines was better than using any other type of shape.
I wonder whether these effects could also be translated into an audio medium. When I read the title and saw “subliminal”, the first thing I thought of was audio subliminal messaging. I’m most familiar with them since they’re big in pop culture for self-help purposes and are claimed to help unconsciously improve certain skills, like language learning. Seeing how decisions can be influenced by imagery-based subliminal information, I think it’d be interesting to investigate audio-based subliminal priming.
(As an aside, there is the RPubs link to this entire report: https://rpubs.com/daphnely03/1069878 in the case that code may be cut off!)
This begins my very first experience in coding and attempting to reproduce this study with the help of my team members.
The goal of this section was to reproduce the demographics and figures within the paper based on the data set provided. The only demographics listed in the paper were for the main experiment, which had 29 participants, 18 of them female, aged between 18 and 39 (p. 2). However, this data was not given, and, after discussing with the paper’s author how these demographics could be obtained, it turned out there was no proper record of gender or age for any participants within the study. Unfortunately, reproducing the demographics of this study was not possible.
As for the figures, there were 4 main figures within the paper to reproduce.
I began by loading the appropriate packages in R so we could wrangle our data and create similar-looking plots. Some packages needed to be installed first, which was done with the install.packages() function. Already-installed packages were then loaded with the library() function, with the name of the package inside the brackets.
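As a sketch of that one-off installation step (assuming, for example, that none of the packages below were present yet):
install.packages(c("tidyverse", "ggnewscale", "ggsignif", "knitr", "conflicted")) # Run once per machine; library() then loads them each session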
library(tidyverse) # For general data wrangling using dplyr and ggplot for figures
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggnewscale) # Create multiple colour scales for different geoms in one graph
## Warning: package 'ggnewscale' was built under R version 4.3.1
library(ggsignif) # For significance lines in figures
## Warning: package 'ggsignif' was built under R version 4.3.1
library(knitr) # To help with report generation
library(tidyr) # For the drop_na function which allows me to drop NA values
library(conflicted) # Resolves conflicts by letting me choose which package I want a function from
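As an aside, with conflicted loaded, one alternative to fully qualifying calls such as dplyr::filter() (as we do throughout below) is to declare the winner once; a sketch:
conflict_prefer("filter", "dplyr") # Unqualified filter() now always means dplyr::filter()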
I then read in the “PearsonLab_Data_Collated.csv” file from the data folder in the working directory. This was the main file used for all the participants included across experiments. I also read in the “included_excluded_data.csv” file, which contained data from the 13 participants excluded from the main experiment (this was actually omitted from the original data file, and we had to request it from our supervisor to reproduce the figures). I used the read_csv() function from the readr package, part of the tidyverse library.
data <- read_csv("data/PearsonLab_Data_Collated.csv")
## New names:
## Rows: 11087 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): file, phase dbl (11): ...1, condition, primedGrating, chosenGrating,
## brokenSuppression, ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Printing included participant data
print(data)
## # A tibble: 11,087 × 13
## ...1 file condition primedGrating chosenGrating brokenSuppression vividness
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 01_1… 1 1 1 0 NA
## 2 2 01_1… 1 2 2 0 NA
## 3 3 01_1… 3 2 NA 1 NA
## 4 4 01_1… 1 1 1 0 NA
## 5 5 01_1… 1 1 1 0 NA
## 6 6 01_1… 1 1 1 0 NA
## 7 7 01_1… 3 1 NA 1 NA
## 8 8 01_1… 1 2 1 0 NA
## 9 9 01_1… 1 2 1 0 NA
## 10 10 01_1… 1 2 2 0 NA
## # ℹ 11,077 more rows
## # ℹ 6 more variables: preImageryTime <dbl>, agency <dbl>, right_wrong <dbl>,
## # phase <chr>, PID <dbl>, block <dbl>
excluded_data <- read_csv("data/included_excluded_data.csv")
## New names:
## Rows: 33 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): included, ...2, ...3, ...4
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
# Printing excluded participant data
print(excluded_data)
## # A tibble: 33 × 4
## included ...2 ...3 ...4
## <chr> <chr> <chr> <chr>
## 1 priming catch detection suppression break difference red/green
## 2 0.660714286 0.9 0.066666667 0.071428571
## 3 0.867924528 1 0.116666667 0.094339623
## 4 0.583333333 0.8 0.016393443 0.033333333
## 5 0.559322034 1 0.016666667 0.016949153
## 6 0.661016949 1 0.078125 0.084745763
## 7 0.491525424 0.933333333 0.016666667 0.016949153
## 8 0.568965517 1 0.033333333 0.034482759
## 9 0.672413793 1 0.033333333 0
## 10 0.593220339 1 0.016666667 0.016949153
## # ℹ 23 more rows
Figure 1 included the effects of subliminal gratings on decision priming (1a) and vividness (1b), so this figure had two components, tackled in turn below.
Beginning with Figure 1A, we first had to create a new data frame that included the necessary conditions before we could calculate the average priming. The steps are as follows:
- Use the filter() function (specifying the function from the dplyr package) to keep only the control and main experimental phases, normal priming trials, and trials where suppression was not broken.
- Also filter() so that block does not equal 0 (as an aside, our supervisor mentioned that block 0 could have referred to a trial where some kind of bug occurred, but it was still included in the data set anyway).
- Use the mutate() function to create a new column that flags whether a trial had successful priming. Within it, case_when() returns TRUE if the primed and chosen gratings have the same value (where 1 means green and 2 means red). So, if a participant was primed with a subliminal green grating and later chose a green grating, hooray, successful priming!
- Use group_by() to group the data by phase and participant ID, retaining those columns and returning the above values across blocks.
- Within the summarise() function, count the total number of trials completed by each participant with n(), and sum the number of successfully primed trials by removing the NA values and retaining only TRUE values. Also calculate the percentage of primed trials as 100 multiplied by the number primed over the total number of trials.
- Finally, ungroup() the data.
plot1a_data <- data %>%
dplyr::filter(phase %in% c("ctrl", "main"), condition == 1, brokenSuppression == 0, block != 0) %>%
mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>%
group_by(phase, PID) %>%
summarise(trials = n(),
number_primed = sum(primed, na.rm = TRUE),
percentage = 100*number_primed/trials) %>%
ungroup()
## `summarise()` has grouped output by 'phase'. You can override using the
## `.groups` argument.
glimpse(plot1a_data)
## Rows: 39
## Columns: 5
## $ phase <chr> "ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "ctrl", …
## $ PID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ trials <int> 117, 119, 116, 117, 119, 106, 119, 118, 81, 95, 113, 74,…
## $ number_primed <int> 81, 61, 62, 63, 63, 54, 61, 94, 38, 51, 55, 40, 71, 52, …
## $ percentage <dbl> 69.23077, 51.26050, 53.44828, 53.84615, 52.94118, 50.943…
And we now have each individual’s primed percentage for those included in the main experiment! However, we also realised that the figure includes differently coloured points for those participants who were later included in the subliminal training, which meant we had to distinguish these individuals from the rest. Using plot1a_data, we piped this into another mutate() call and created a new column called “train” to mark the participants from this main experiment who were included in the training phase. Participant IDs for those included in and excluded from training were given to us by our supervisor. We used the case_when() function to flag only the participant IDs included in training, which were 1, 2, 4, 7, 8, 16 and 17; these values were returned as TRUE and anything else as FALSE. We also manually reordered the factor levels in the phase column using the fct_relevel() function so that the conditions appear in the order used in the figure.
plot1a_data <- plot1a_data %>%
mutate(train = case_when(phase == "main" & PID == 1 |
phase == "main" & PID == 2 |
phase == "main" & PID == 4 |
phase == "main" & PID == 7 |
phase == "main" & PID == 8 |
phase == "main" & PID == 16 |
phase == "main" & PID == 17 ~ TRUE,
TRUE ~ FALSE))
plot1a_data$phase <- fct_relevel(plot1a_data$phase, c("ctrl", "main"))
glimpse(plot1a_data)
## Rows: 39
## Columns: 6
## $ phase <fct> ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ctrl, ct…
## $ PID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ trials <int> 117, 119, 116, 117, 119, 106, 119, 118, 81, 95, 113, 74,…
## $ number_primed <int> 81, 61, 62, 63, 63, 54, 61, 94, 38, 51, 55, 40, 71, 52, …
## $ percentage <dbl> 69.23077, 51.26050, 53.44828, 53.84615, 52.94118, 50.943…
## $ train <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
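As an aside, the same train column could be produced more compactly with %in% instead of chained conditions (a sketch, not the code we actually ran; the same trick would also shorten the Figure 1b colour coding further below):
plot1a_data <- plot1a_data %>%
mutate(train = phase == "main" & PID %in% c(1, 2, 4, 7, 8, 16, 17)) # TRUE only for to-be-trained participants in the main phase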
The last piece for Figure 1a was adding the priming averages. The means documented in the paper were 62.8% with a SEM of ±3.04 for the main experiment condition, and 57.65% with a SEM of ±2.07 for the control condition.
plot1a_mean <- plot1a_data %>% # Starting from the plot1a_data data frame
group_by(phase) %>% # Grouping by the main/control phase
summarise(percent = mean(percentage), # Calculating the average priming using the mean() function
SEM = sd(percentage)/sqrt(n())) # SEM = SD divided by the square root of the number of participants
glimpse(plot1a_mean)
## Rows: 2
## Columns: 3
## $ phase <fct> ctrl, main
## $ percent <dbl> 57.27215, 62.82850
## $ SEM <dbl> 2.088622, 3.033010
plot1a_mean$phase <- fct_relevel(plot1a_mean$phase, c("main", "ctrl"))
write_csv(plot1a_mean, file = "plot1adescriptives.csv")
And we managed to get it almost exactly the same, with a mean of 62.8% and SEM of ±3.03 for the main experiment, and a mean of 57.2% and SEM of ±2.08 for the control experiment! There were slight rounding differences, but we were told this was most likely because the paper used a different programming tool (i.e. MATLAB).
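Since the SEM formula (the standard deviation divided by the square root of the sample size) recurs for every figure below, it could also be wrapped in a small helper function; a sketch (the report keeps the inline formula instead):
sem <- function(x) sd(x) / sqrt(length(x)) # Standard error of the mean
# e.g. summarise(percent = mean(percentage), SEM = sem(percentage))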
We could then move on to putting it all together. This was a tedious process since there were a few different coloured components alongside the null significance bar. The main steps to reproducing the figure are as follows:
- Using the ggplot2 package, I used the ggplot() function to create a column graph with geom_col(), passing plot1a_mean as the local data argument and an aesthetic mapping where x is phase and y is the mean percentage priming.
- Layering geom_hline() first, then geom_col(), then geom_errorbar() stacks the geoms in the correct order; otherwise the line at 50% cut across the top of the columns, or the error bars were hidden beneath the columns.
- Using geom_jitter with local plot1a_data to display the points for individual participant priming.
- Within scale_y_continuous I created breaks in the axis and added labels for 50, 70 and 90, replacing every second tick in between with a blank using the rep() function.
- Labelling the conditions and axes with the labels argument of scale_x_discrete and the labs() function.
- Using the theme() function to alter specific elements of the graph, such as ticks and panel background.
plot1a <- ggplot() +
geom_hline(yintercept = 50, linewidth = 0.3) +
geom_col(plot1a_mean,
mapping = aes(x = phase, y = percent, fill = phase),
width = 0.7, colour = "black") +
geom_errorbar(plot1a_mean,
mapping = aes(x = phase, ymin = percent - SEM, ymax = percent + SEM),
width = 0,
linewidth = 1) +
geom_jitter(aes(x = phase, y = percentage, fill = train),
plot1a_data,
width = 0.1,
pch = 21) +
coord_cartesian(ylim = c(40, 100)) + # Restricting the limits of the y-axis from 40 to 100
scale_y_continuous(breaks = c(50, 60, 70, 80, 90, 100),
labels = c(50, rep("", 1),
70, rep("", 1),
90, rep("", 1))) +
scale_x_discrete(labels = c("Main experiment", "No imagery ctrl")) +
theme_minimal() +
labs(
title = "Decision Priming",
subtitle = "Across Experiment",
x = "Experiment",
y = "Decision Priming (%)") +
theme(axis.line.y = element_line(colour = "black", linewidth = 0.5),
axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
axis.ticks.length = unit(-0.1, "cm"),
axis.line.x = element_line(colour = "black", linewidth = 0.5),
panel.grid = element_blank(),
legend.position = "none")
#plot plot1a
plot(plot1a)
Now that the basic graph looked correct, there were still a few elements
that weren’t the same. It was looking a bit too colourful so I needed to
change it to black and white. Also, I still didn’t have the significance
bar between the two conditions. To add these details, I did the
following:
- Using scale_fill_manual to specify the colours for the average priming columns and the individual participant points.
- Using geom_signif() from the ggsignif package to create the null significance bar across the two conditions.
- Using geom_text() to add the asterisks above the columns indicating the level of significance.
plot1a <- plot1a +
scale_fill_manual(values = c("FALSE" = "#FFFFFF",
"TRUE" = "#000000",
"main" = "#808080",
"ctrl" = "#FFFFFF")) +
geom_signif(data = plot1a_mean, aes(factor(phase), percent),
comparisons = list(c("main", "ctrl")),
y_position = 98,
map_signif_level = TRUE,
annotations = "n.s.") +
geom_text(data = data.frame(phase = "main", y = 92.5),
aes(phase, y = y, size = 2), label = "***", colour = "#808080") +
geom_text(data = data.frame(phase = "ctrl", y = 92.5),
aes(phase, y = y, size = 2), label = "**", colour = "#808080")
## Warning in geom_signif(data = plot1a_mean, aes(factor(phase), percent), : You
## have set data and mapping, are you sure that manual = FALSE is correct?
plot(plot1a)
Figure 1a complete!
Now, onto Figure 1b. Similarly, we had to plot individual participant results and the overall averages. This was more tedious since there were more conditions to adhere to. Since we now knew we had to distinguish participants who were trained from those who weren’t, we incorporated that into our code. The steps are as follows:
- filter() for the main experiment (phase) only, for the priming and empty conditions (1 and 2), and for when suppression isn’t broken (0). Also filter out blocks equal to 0.
- mutate() to create a new column that tells us whether a trial had congruence between the primed and chosen grating. Call this “congruence” and have it return “congruent” when the primed and chosen gratings are the same, “incongruent” when they differ, and “empty” when the condition is 2 for empty trials.
- group_by() congruence and participant ID to retain those columns and return values over blocks.
- summarise() the mean vividness.
- mutate() to create a new column called “train” to mark those included in the training experiment for the incongruent, empty and congruent conditions, and to specify the respective colour these participants will take later in the plot. This section is long! But we had to ensure we captured each participant for each congruence condition.
- Manually reorder the factor levels with the fct_relevel() function.
plot1b_data <- data %>%
dplyr::filter(phase == "main", condition != 3, brokenSuppression == 0, block != 0) %>%
mutate(congruence = case_when(condition == 1 & primedGrating == 1 & chosenGrating == 1 ~ "congruent",
condition == 1 & primedGrating == 2 & chosenGrating == 2 ~ "congruent",
condition == 1 & primedGrating == 1 & chosenGrating == 2 ~ "incongruent",
condition == 1 & primedGrating == 2 & chosenGrating == 1 ~ "incongruent",
condition == 2 ~ "empty")) %>%
group_by(congruence, PID) %>%
summarise(mean = mean(vividness)) %>%
ungroup() %>%
mutate(train = case_when(congruence == "incongruent" & PID == 1 |
congruence == "incongruent" & PID == 2 |
congruence == "incongruent" & PID == 4 |
congruence == "incongruent" & PID == 7 |
congruence == "incongruent" & PID == 8 |
congruence == "incongruent" & PID == 16 |
congruence == "incongruent" & PID == 17 |
congruence == "empty" & PID == 1 |
congruence == "empty" & PID == 2 |
congruence == "empty" & PID == 4 |
congruence == "empty" & PID == 7 |
congruence == "empty" & PID == 8 |
congruence == "empty" & PID == 16 |
congruence == "empty" & PID == 17 ~ "BLACK",
congruence == "congruent" & PID == 1 |
congruence == "congruent" & PID == 2 |
congruence == "congruent" & PID == 4 |
congruence == "congruent" & PID == 7 |
congruence == "congruent" & PID == 8 |
congruence == "congruent" & PID == 16 |
congruence == "congruent" & PID == 17 ~ "GREY",
TRUE ~ "WHITE"))
## `summarise()` has grouped output by 'congruence'. You can override using the
## `.groups` argument.
plot1b_data$congruence <- fct_relevel(plot1b_data$congruence, c("empty", "incongruent", "congruent"))
#print plot1b_data
glimpse(plot1b_data)
## Rows: 51
## Columns: 4
## $ congruence <fct> congruent, congruent, congruent, congruent, congruent, cong…
## $ PID <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ mean <dbl> 2.540541, 3.478261, 3.342857, 2.090909, 2.972973, 2.310345,…
## $ train <chr> "GREY", "GREY", "WHITE", "GREY", "WHITE", "WHITE", "GREY", …
For Figure 1b, there did not appear to be any means documented in the paper for the vividness rating between primed and empty trials. So, we had to rely on comparing our averages to the plot in the paper by eye.
plot1b_mean <- plot1b_data %>% # New data frame
group_by(congruence) %>% # Group by congruence to retain column
summarise(average = mean(mean), # Summarise means and calculate SEM
SEM = sd(mean)/sqrt(n()))
plot1b_mean$congruence <- fct_relevel(plot1b_mean$congruence, # Manually reorder factor levels
c("empty", "incongruent", "congruent"))
# Print plot1b_mean
glimpse(plot1b_mean)
## Rows: 3
## Columns: 3
## $ congruence <fct> empty, incongruent, congruent
## $ average <dbl> 2.241783, 2.416609, 2.510490
## $ SEM <dbl> 0.1637733, 0.1691102, 0.1574406
The process of creating the plot for 1b was similar to 1a. The main steps include:
- ggplot and geom_col for the column graph using local plot1b_mean data, with aesthetic mapping where x is congruence and y is the mean vividness.
- geom_errorbar using local plot1b_mean data to add the SEM for each congruence condition.
- geom_jitter() for the individual points of participant mean vividness using local plot1b_data, with aesthetic mapping where x is congruence and y is the mean vividness.
- geom_signif from the ggsignif package to create the significance bars across the empty and incongruent conditions, and the empty and congruent conditions.
- scale_fill_manual to specify the colours for the average columns and individual participants (remembering that we coded BLACK, WHITE and GREY for each participant, which made our lives much easier when colouring the points).
- scale_y_continuous to create appropriate axis breaks, replacing every second tick using the rep() function.
- scale_x_discrete and labs(), as well as theme(), to modify specific elements of the plot.
plot1b <- ggplot() +
geom_col(plot1b_mean,
mapping = aes(x = congruence, y = average, fill = congruence),
colour = "black",
width = 0.7) +
geom_errorbar(plot1b_mean,
mapping = aes(x = congruence, ymin = average - SEM, ymax = average + SEM),
width = 0,
linewidth = 1) +
geom_jitter(aes(x = congruence, y = mean, fill = train),
plot1b_data,
pch = 21,
width = 0.1) +
geom_signif(plot1b_mean,
mapping = aes(factor(congruence), average),
comparisons = list(c("empty", "incongruent")),
y_position = 3.7,
map_signif_level = TRUE,
annotations = "*") +
geom_signif(plot1b_mean,
mapping = aes(factor(congruence), average),
comparisons = list(c("empty", "congruent")),
y_position = 3.85,
map_signif_level = TRUE,
annotations = "*") +
scale_fill_manual(values = c("WHITE" = "#FFFFFF",
"BLACK" = "#000000",
"GREY" = "#808080",
"incongruent" = "#808080",
"empty" = "#FFFFFF",
"congruent" = "#000000")) +
coord_cartesian(ylim = c(1, 4)) +
scale_y_continuous(breaks = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5),
labels = c(1, rep("", 1),
2, rep("", 1),
3, rep("", 1),
4, rep("", 1))) +
scale_x_discrete(labels = c("Empty", "Incongruent", "Congruent")) +
theme_minimal() +
labs(
title = "Vividness Across Congruence",
subtitle = "Within Main Experiment",
x = "Trial",
y = "Vividness") +
theme(axis.line.y = element_line(colour = "black", linewidth = 0.5),
axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
axis.ticks.length = unit(-0.1, "cm"),
axis.line.x = element_line(colour = "black", linewidth = 0.5),
panel.grid = element_blank(),
legend.position = "none")
## Warning in geom_signif(plot1b_mean, mapping = aes(factor(congruence), average),
## : You have set data and mapping, are you sure that manual = FALSE is correct?
## Warning in geom_signif(plot1b_mean, mapping = aes(factor(congruence), average),
## : You have set data and mapping, are you sure that manual = FALSE is correct?
#plot plot1b
plot(plot1b)
And that’s all for Figure 1b!
This was a tough plot to reproduce because we needed to compute 4 different exclusion-criteria measures and differentiate the excluded and included participants. Although we were given the data for the included participants for Figure 2, we wanted to see if we could code it ourselves. We approached this carefully, making sure we were selecting the right number identifiers for each variable.
To begin, we started by coding the data for the included participants. The steps we took were as follows:
- filter() for the main experiment (phase) only and for blocks not equal to 0.
- mutate() several new columns based on different criteria: whether a normal, unbroken trial was primed or unprimed (or had broken suppression), whether it was primed green or primed red, and whether a catch trial was detected.
- Use the group_by() function to group by participant ID, retaining that column and returning the above values across blocks.
- Use the summarise() function to compute, via sum(), the proportion of trials on which each criterion was met: the priming ratio, the catch detection rate, the suppression break rate, and the red/green difference, i.e. the absolute value (using the abs() function) of green primed trials minus red primed trials over total normal trials.
plot2_included <- data %>%
dplyr::filter(phase == "main", block != 0) %>%
mutate(priming = case_when(condition == 1 & primedGrating == 1
& chosenGrating == 1 & brokenSuppression == 0 ~ "primed",
condition == 1 & primedGrating == 2
& chosenGrating == 2 & brokenSuppression == 0 ~ "primed",
condition == 1 & primedGrating == 1
& chosenGrating == 2 & brokenSuppression == 0 ~ "unprimed",
condition == 1 & primedGrating == 2
& chosenGrating == 1 & brokenSuppression == 0 ~ "unprimed",
condition == 1 & brokenSuppression == 1 ~ "broken"),
green_priming = case_when(condition == 1 & primedGrating == 1
& brokenSuppression == 0 ~ "primed_green"),
red_priming = case_when(condition == 1 & primedGrating == 2
& brokenSuppression == 0 ~ "primed_red"),
catch_detection = case_when(condition == 3 & brokenSuppression == 1 ~ "caught")) %>%
group_by(PID) %>%
summarise(priming = sum(priming == "primed",
na.rm = TRUE)/sum(condition == 1 &
brokenSuppression == 0),
catch_detection = sum(catch_detection == "caught", na.rm = TRUE)/sum(condition == 3),
suppression_break = sum(condition != 3 & brokenSuppression == 1)/sum(condition == 1),
gr_difference = abs(sum(green_priming == "primed_green", na.rm = TRUE) -
sum(red_priming == "primed_red",
na.rm = TRUE))/sum(condition == 1 & brokenSuppression == 0))
print(plot2_included)
## # A tibble: 17 × 5
## PID priming catch_detection suppression_break gr_difference
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.661 0.9 0.0667 0.0714
## 2 2 0.868 1 0.117 0.0943
## 3 3 0.593 0.8 0.0167 0.0169
## 4 4 0.559 1 0.0333 0.0169
## 5 5 0.661 1 0.0667 0.0714
## 6 6 0.492 0.933 0.0333 0.0169
## 7 7 0.569 1 0.0333 0.0345
## 8 8 0.672 1 0.0833 0
## 9 9 0.593 1 0.0167 0.0169
## 10 10 0.537 0.967 0.133 0.0741
## 11 11 0.542 1 0.05 0.0169
## 12 12 0.627 1 0.0333 0.0169
## 13 13 0.554 1 0.183 0.0357
## 14 14 0.508 0.967 0.0167 0.0169
## 15 15 0.525 1 0.0167 0.0169
## 16 16 0.915 0.9 0.0167 0.0169
## 17 17 0.804 0.833 0.1 0.0357
Wow, that was a lot! We did manage to get values for all 4 criteria for the 17 participants included in the main experiment, and they match up with the included-participant data we were given in the “excluded_data” file. However, the format the data frame was in wasn’t appropriate for the plot we had to replicate. We had to find a way to reshape this data frame so that the 4 criteria were stacked into a single “measure” column with their respective values in a “portion” column, letting us map the 4 criteria onto the x-axis. Luckily the pivot_longer() function does just that. We also needed to change a variable into a factor vector using the as.factor() function so that ggplot() could recognise and plot it later on.
plot2_included <- plot2_included %>%
pivot_longer(2:5, # Specifying the column numbers to manipulate
names_to = "measure", # Moving names to a column named measure
values_to = "portion") %>% # Putting values into a column named portion
mutate(id = "Included") # Creating a column called id to list included
# Changing the measure variable into a factor vector
plot2_included$measure <- as.factor(plot2_included$measure)
# Manually reordering factor levels to the appropriate order on x-axis
plot2_included$measure <- fct_relevel(plot2_included$measure,
c("priming", "catch_detection",
"suppression_break", "gr_difference"))
print(plot2_included)
## # A tibble: 68 × 4
## PID measure portion id
## <dbl> <fct> <dbl> <chr>
## 1 1 priming 0.661 Included
## 2 1 catch_detection 0.9 Included
## 3 1 suppression_break 0.0667 Included
## 4 1 gr_difference 0.0714 Included
## 5 2 priming 0.868 Included
## 6 2 catch_detection 1 Included
## 7 2 suppression_break 0.117 Included
## 8 2 gr_difference 0.0943 Included
## 9 3 priming 0.593 Included
## 10 3 catch_detection 0.8 Included
## # ℹ 58 more rows
Hooray! Now that the data frame is the way we want it, we can code the data for the excluded participants. This was a much easier process, as we were given the data for the excluded participants directly (it was not in the original data file, so we couldn’t give it a go ourselves like we did for the included participants). However, we did have to rename the variables to make them easier to interpret, since the columns were labelled with ellipses and a number.
plot2_excluded <- excluded_data %>%
slice(21:33) %>% # Selecting the rows that contain the excluded participant data
rename(priming = included, # Renaming columns in an interpretable way
catch_detection = ...2,
suppression_break = ...3,
gr_difference = ...4) %>%
mutate(PID = 1:n()) %>% # Creating a new column to store participant number
pivot_longer(1:4, # Shrinking the data frame so we can plot it
names_to = "measure",
values_to = "portion") %>%
mutate(id = "Excluded") # Creating an id column to list excluded
# Changing portion into a numeric vector
plot2_excluded$portion <- as.numeric(plot2_excluded$portion)
# Changing PID into a numeric vector
plot2_excluded$PID <- as.numeric(plot2_excluded$PID)
# Changing measure into a factor vector
plot2_excluded$measure <- as.factor(plot2_excluded$measure)
# Manually reordering factor levels
plot2_excluded$measure <- fct_relevel(plot2_excluded$measure, c("priming", "catch_detection",
"suppression_break", "gr_difference"))
print(plot2_excluded)
## # A tibble: 52 × 4
## PID measure portion id
## <dbl> <fct> <dbl> <chr>
## 1 1 priming 0.575 Excluded
## 2 1 catch_detection 0.1 Excluded
## 3 1 suppression_break 0.333 Excluded
## 4 1 gr_difference 0.5 Excluded
## 5 2 priming 0.559 Excluded
## 6 2 catch_detection 0.767 Excluded
## 7 2 suppression_break 0.0167 Excluded
## 8 2 gr_difference 0.0169 Excluded
## 9 3 priming 0.464 Excluded
## 10 3 catch_detection 0.6 Excluded
## # ℹ 42 more rows
Now that we have our two data frames that match up, we can combine them to form one cohesive data frame that we can plot easily. To achieve this, we used the rbind() function on both plot2_included and plot2_excluded.
plot2_data <- rbind(plot2_included, plot2_excluded)
# Once again, changing id to a factor vector
plot2_data$id <- as.factor(plot2_data$id)
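dplyr’s bind_rows() would be an equivalent, slightly more tidyverse-flavoured way to do the same thing (a sketch; rbind() worked fine for us):
plot2_data <- bind_rows(plot2_included, plot2_excluded) # Matches columns by name and returns a tibble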
Having the full data for included and excluded participants, we could now reproduce the means and SEMs reported in the paper for excluded and included participants. For catch detection these were 0.673 (±0.079 SEM) and 0.959 (±0.016 SEM), respectively; for suppression breaks, 0.074 (±0.034 SEM) and 0.042 (±0.008 SEM); for the red/green priming difference, 0.047 (±0.038 SEM) and 0.035 (±0.007 SEM); and finally for priming ratios, 0.591 (±0.035 SEM) and 0.628 (±0.023 SEM).
Using plot2_data, we used the pivot_wider() function to return the column labels for each of the four criteria for easier viewing. We then group_by() excluded and included and summarise() the means and SEMs for each criterion.
plot2_mean <- plot2_data %>%
pivot_wider(id_cols = c(PID, id),
names_from = measure,
values_from = portion) %>%
group_by(id) %>%
summarise(mean_priming = mean(priming), SEM_priming = sd(priming)/sqrt(n()),
mean_catch = mean(catch_detection), SEM_catch = sd(catch_detection)/sqrt(n()),
mean_suppression = mean(suppression_break), SEM_suppression = sd(suppression_break)/sqrt(n()),
mean_red_green_diff = mean(gr_difference), SEM_diff = sd(gr_difference)/sqrt(n()))
print(plot2_mean)
## # A tibble: 2 × 9
## id mean_priming SEM_priming mean_catch SEM_catch mean_suppression
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Excluded 0.591 0.0346 0.673 0.0797 0.0739
## 2 Included 0.628 0.0303 0.959 0.0155 0.0598
## # ℹ 3 more variables: SEM_suppression <dbl>, mean_red_green_diff <dbl>,
## # SEM_diff <dbl>
Fortunately, our means seem to line up!
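A quick programmatic check can back up the eyeball comparison; a sketch with the manuscript’s reported catch-detection means hard-coded from the paragraph above:
reported_catch <- c(0.673, 0.959) # Excluded, Included, as reported in the paper
all(abs(plot2_mean$mean_catch - reported_catch) < 0.005) # TRUE if they agree to ~2 decimal places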
Now that all the data wrangling is out of the way, we could finally
move onto actually plotting the data. For our first attempt we used the
geom_point() function for the individual plots, and the
geom_path() function which is intended to connect
observations in the order they appear in the data. However, we found
that the lines between points were not connecting properly.
ggplot(plot2_data, aes(measure, portion, colour = id)) +
geom_point(position = position_jitterdodge(jitter.width = 0.2, seed = 1)) +
geom_path(position = position_jitterdodge(jitter.width = 0.2, seed = 1),
aes(group = interaction(PID, id), alpha = 0.5)) +
scale_colour_manual(values = c("#808080", "#FF0000")) +
scale_x_discrete(labels = c("Priming",
"Catch Detection",
"Suppression Break",
"Red/Green difference")) +
theme_minimal() +
labs(
title = "Decision Priming",
subtitle = "Across Training Period",
x = "Proportion of trials") +
theme(axis.line.y = element_line(color = "black", linewidth = 0.5),
axis.ticks.y = element_line(color = "black", linewidth = 0.5),
axis.ticks.length = unit(-0.1, "cm"),
axis.line.x = element_line(color = "black", linewidth = 0.5),
panel.grid = element_blank(),
legend.position = "none")
Well, at least the points from geom_point() appeared to be correct… but the lines from geom_path() were not connecting as they should.
We were a bit stuck, but after some guidance and Googling we found out that we needed to use geom_line(). We had to manually adjust where the individual points lay on the graph and make the geom_point() and geom_line() x and y values match up so they could be connected. This required a few steps. Firstly, we took plot2_data and created a new column called “PID2” using mutate(), multiplying the participant ID by 100 only if the id was “Included”; this gives every participant a unique group ID (so an included and an excluded participant with the same PID are not joined), letting the lines connecting each point run across all 4 criteria for each participant.
plot2_data <- plot2_data %>%
mutate(PID2 = ifelse(id == "Included", PID*100, PID))
We then created a new data frame modifying plot2_data. We called this “plot2_datan” and used the transform() function to create a new column called “dmeasure”. In this column, the “measure” variable is converted to a numeric vector using as.numeric(), and .25 is added for included participants or subtracted for excluded participants. This places the points for the included participants on the right side, and the points for the excluded participants on the left side, of each of the four criteria. Another column, “jportion”, specifies how spread out the individual points are using the jitter() function, with the amount of dispersion set to 0.1.
# A single transform() both offsets the points by +/- .25 per group and jitters them
plot2_datan = transform(plot2_data,
dmeasure = ifelse(id == "Included",
jitter(as.numeric(measure) + .25, .1),
jitter(as.numeric(measure) - .25, .1)),
jportion = jitter(portion, amount = 0.1))
# Converting jportion into a numeric vector
plot2_datan$jportion <- as.numeric(plot2_datan$jportion)
Now that the points have been manually manipulated, hopefully the plot looks more like the paper! In order to achieve the different colours for the points and lines, we had to download a new package called ggnewscale, which allowed us to put separate colour scales on different geoms. The main steps for this plot are as follows:
- geom_line, specifying the aesthetics with group = PID2 so the lines joining each point can be distinguished, plus the x and y variables and the id as either excluded or included.
- scale_color_manual() to adjust the colours for excluded and included, then new_scale_colour() to reset the colours so that any further geoms do not adopt the same colour scales.
- geom_point, specifying the x and y variables and the id as either excluded or included, also adjusting the point size and colours appropriately.
- scale_x_continuous to create four breaks, one for each criterion, labelled appropriately.
- scale_y_continuous to create breaks in the axis, replacing every second tick in between with a blank using the rep() function.
ggplot(plot2_datan) +
geom_line(aes(group = PID2, x = dmeasure, y = jportion, colour = id)) +
scale_color_manual(values = c("Excluded" = "grey", "Included" = "#FF9481")) +
new_scale_colour() +
geom_point(aes(x = dmeasure, y = jportion, colour = id),
size = 2.5) +
scale_colour_manual(values = c("Excluded" = "grey30", "Included" = "#EB4D34")) +
scale_x_continuous(breaks = c(1, 2, 3, 4),
labels = c("Priming", "Catch Detection",
"Suppression Break", "Red/green difference")) +
scale_y_continuous(breaks = c(0, 0.25, 0.5, 0.75, 1, 1.25),
labels = c(0, rep("", 1),
0.5, rep("", 1),
1, rep("", 1))) +
theme_minimal() +
labs(y = "Proportion of trials",
title = "Decision Priming",
subtitle = "Across Training Period") +
theme(axis.line.y = element_line(colour = "grey50", linewidth = 0.5),
axis.ticks.y = element_line(colour = "grey50", size = 0.5),
axis.text.y = element_text(colour = "black"),
axis.line.x = element_line(colour = "grey50", linewidth = 0.5),
axis.title.x = element_blank(),
axis.ticks.x = element_line(colour = "grey50", size = 0.5),
axis.text.x = element_text(colour = "black"),
axis.ticks.length=unit(-0.10, "cm"),
panel.grid = element_blank(),
legend.position = c(0.75, 0.8),
legend.title = element_blank())
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Success! Now it’s looking very similar to the paper.
Figure 3 was a little more straightforward: it was just averaging the mean percentage of priming for each training phase and plotting it. However, things that appear simple can be deceiving, and that was the case here. Before I get into the challenges we had, here is how we coded the data for plot 3. The steps were as follows:
- Use the dplyr filter() function to include only the phases with results from training sessions 1, 2 and 3.
- Also filter() for when the condition is a normal trial and there is no broken suppression.
- Use the mutate() function to create a new column called “primed” that returns TRUE when the primed grating and chosen grating are the same.
- group_by() participant ID, phase and block so we can then summarise() the number of trials and the percentage of trials that were primed.
- ungroup() this so we can group_by() block and phase to calculate the average percentage primed and the SEM for each individual block.
plot3_data <- data %>%
dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3")) %>%
dplyr::filter(condition == 1, brokenSuppression == 0) %>%
mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>%
group_by(PID, phase, block) %>%
summarise(trials = n(),
number_primed = sum(primed, na.rm = TRUE),
percentage = 100*number_primed/trials) %>%
ungroup() %>%
group_by(phase, block) %>%
summarise(mean_percentage = mean(percentage),
SEM = sd(percentage)/sqrt(n()))
## `summarise()` has grouped output by 'PID', 'phase'. You can override using the
## `.groups` argument.
## `summarise()` has grouped output by 'phase'. You can override using the
## `.groups` argument.
# Converting block to a factor vector
plot3_data$block <- as.factor(plot3_data$block)
# Printing the data to check everything went smoothly
print(plot3_data)
## # A tibble: 15 × 4
## # Groups: phase [3]
## phase block mean_percentage SEM
## <chr> <fct> <dbl> <dbl>
## 1 results_train1 1 43.1 6.02
## 2 results_train1 2 48.1 8.87
## 3 results_train1 3 52.1 7.18
## 4 results_train1 4 44.6 4.79
## 5 results_train1 5 36.0 4.99
## 6 results_train2 1 41.9 5.58
## 7 results_train2 2 36.9 7.35
## 8 results_train2 3 33.9 4.31
## 9 results_train2 4 39.3 5.13
## 10 results_train2 5 34.5 4.14
## 11 results_train3 1 39.0 8.61
## 12 results_train3 2 38.1 6.77
## 13 results_train3 3 31.8 5.72
## 14 results_train3 4 40.5 6.54
## 15 results_train3 5 42.3 5.78
However, upon checking our mean_percentage against the individual points of figure 3, something was off. Our numbers seemed slightly wrong, either too low or too high. For example, for training session 1 in block 3, our coded output was 52.07, yet the original plot had that value below 50. We thought we had done something wrong with our code and meticulously checked it line by line. We had no idea where we had gone wrong, so we consulted our supervisor. It turns out there was an error on their end: a bug in their previous code meant they were “averaging past blocks in each session” rather than treating each block independently, leading to incorrect reporting in the manuscript. The numbers we coded were actually right! Our supervisor provided an amended graph.
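To illustrate one plausible reading of that bug (a sketch using our own block means from session 1, not the supervisor’s actual code):
block_means <- c(43.1, 48.1, 52.1, 44.6, 36.0) # Our per-block averages for training session 1
cumulative <- cumsum(block_means) / seq_along(block_means) # Averaging over all past blocks instead of each block alone
round(cumulative, 1) # Block 3 becomes (43.1 + 48.1 + 52.1) / 3 = 47.8, i.e. below 50 as in the original figure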
Moving on, the figure also had individual points on the sides of the graph denoting the baseline average and the post-training average. These were coded below in a similar fashion to the average per training session above:
plot3_baseline <- data %>%
dplyr::filter(phase == "training_baseline", # Only using data from baseline training
condition == 1, # Again, specifying this is a normal trial
brokenSuppression == 0) %>%
mutate(primed = case_when( # Again, a primed column for correct priming
primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE,
TRUE ~ FALSE)) %>% # Everything else counts as not primed
group_by(PID, block, phase) %>%
summarise(trials = n(),
number_primed = sum(primed, na.rm = TRUE),
percentage = 100*number_primed/trials) %>%
ungroup() %>%
summarise(mean_percentage = mean(percentage),
SEM = sd(percentage)/sqrt(n()))
## `summarise()` has grouped output by 'PID', 'block'. You can override using the
## `.groups` argument.
print(plot3_baseline)
## # A tibble: 1 × 2
## mean_percentage SEM
## <dbl> <dbl>
## 1 69.7 2.97
write_csv(plot3_baseline, file = "plot3basedescriptives.csv")
The mean pre-training priming percentage was reported in the paper as 69.89%, for which we got a similar value of 69.67%.
plot3_trainingaverage <- data %>%
dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3"),
condition == 1,
brokenSuppression == 0) %>%
mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE,
TRUE ~ FALSE)) %>%
group_by(PID, block, phase) %>%
summarise(trials = n(),
number_primed = sum(primed, na.rm = TRUE),
percentage = 100*number_primed/trials) %>%
ungroup() %>%
summarise(mean_percentage = mean(percentage),
SEM = sd(percentage)/sqrt(n()))
## `summarise()` has grouped output by 'PID', 'block'. You can override using the
## `.groups` argument.
print(plot3_trainingaverage)
## # A tibble: 1 × 2
## mean_percentage SEM
## <dbl> <dbl>
## 1 40.1 1.59
The mean post-training priming percentage was reported in the paper as 40.11%, for which we got a similar value of 40.13%.
Finally, the average means and SEMs for each training period were calculated.
plot3_mean <- plot3_data %>%
summarise(mean = mean(mean_percentage),
SEM = sd(mean_percentage)/sqrt(n()))
print(plot3_mean)
## # A tibble: 3 × 3
## phase mean SEM
## <chr> <dbl> <dbl>
## 1 results_train1 44.8 2.68
## 2 results_train2 37.3 1.49
## 3 results_train3 38.3 1.77
write_csv(plot3_mean, file = "plot3meandescriptives.csv")
Now we have all the data wrangled, we can plot it! Again, although we thought this would be simple, we actually had to attempt it a few times. This was our first attempt, just to make sure the individual points and SEMs were correct. The main steps were as follows:
- Use ggplot to specify the data as plot3_data.
- Use geom_point() for the individual points, mapping aesthetics so block is plotted on the x-axis and mean percentage on the y-axis.
- Use geom_path() (which did work this time!) to connect the points within each phase of training.
- Use geom_errorbar() to generate the error bars, calculated as the means ± the SEMs.
- Facet the plot with the facet_wrap() function, specifying separation by phase.
- Add a dotted line at 50% (chance priming) using the geom_hline() function.
plot3 <- ggplot(plot3_data) +
geom_point(aes(x = block,
y = mean_percentage)) +
geom_path(aes(x = block,
y = mean_percentage,
group = phase)) +
geom_errorbar(aes(x = block,
ymin = mean_percentage - SEM,
ymax = mean_percentage + SEM),
width = 0.4) +
facet_wrap(~phase) +
geom_hline(yintercept = 50, size = 0.3, linetype = "dotted") +
coord_cartesian(ylim = c(0, 90)) + # y-axis limits from 0 to 90
scale_y_continuous(breaks = c(10, 30, 50, 70), # Creating breaks in y-axis
minor_breaks = c(0, 20, 40, 60, 80, 100),
expand = c(0,0)) + # Forcing the y-axis to start at 0
labs( # Appropriately labeling the graph
title = "Decision Priming",
subtitle = "Across Training Period",
x = "Training Period",
y = "Decision Priming (%)") +
theme(axis.line.y = element_line(colour = "black", # Modifying graph elements
linewidth = 0.5),
axis.ticks.y = element_line(colour = "black",
linewidth = 0.5),
axis.ticks.length = unit(-0.1, "cm"),
axis.line.x = element_line(colour = "black",
linewidth = 0.5),
panel.grid = element_blank())
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
plot(plot3)
The individual points match up with the amended points our supervisor gave! But there were still a lot of adjustments to be made, most notably the shaded error bars and the pre- and post-training averages. I was having difficulty coding the pre- and post-training averages since the graph was faceted, which meant the point would appear in all three panels instead of just one. That was totally wrong. I found my answer here: https://www3.nd.edu/~steve/computing_with_data/13_Facets/facets.html, which said that you need to restrict the data set so your geom_point() shows in a single facet. I knew I needed to specify which facet each point should appear in, so I created a new column called phase to match plot3_data and specified the exact training facet. That way, the data frames could work together.
plot3_baseline$phase <- "results_train1" # Pin the pre-training point to the first facet
plot3_trainingaverage$phase <- "results_train3" # Pin the training average to the last facet
print(plot3_baseline)
## # A tibble: 1 × 3
## mean_percentage SEM phase
## <dbl> <dbl> <chr>
## 1 69.7 2.97 results_train1
print(plot3_trainingaverage)
## # A tibble: 1 × 3
## mean_percentage SEM phase
## <dbl> <dbl> <chr>
## 1 40.1 1.59 results_train3
Let’s hope this works!
Now, I moved onto attempt 2 of plotting this figure. Another challenge we encountered was with the shaded error bars. It was recommended that we use geom_ribbon(). However, when I plotted geom_ribbon() with the necessary x and ymin/ymax values, I still wasn’t getting the ribbon! This Stack Overflow post saved me: https://stackoverflow.com/questions/34553044/missing-ribbon-in-ggplot2. The post mentioned that you need a grouping variable for it to work, and “if there is only one observation per group, then you specify group = 1”. Adding that to the aesthetics of geom_ribbon() worked perfectly! Thank you Stack Overflow. Once again, the order of the geoms was important: geom_ribbon comes first so that geom_point can be drawn on top of it.
For the pre- and post-training points, I used
geom_point() and specified the x and y values for each as
was previously calculated. I also adjusted the shape and colour so it
matched what was in the manuscript. Using geom_text() I
added labels for these points.
plot3 <- ggplot() +
geom_ribbon(data = plot3_data,
mapping = aes(x = block,
ymin = mean_percentage - SEM,
ymax = mean_percentage + SEM,
group = 1),
fill = "lightgrey") +
geom_point(data = plot3_data,
mapping = aes(x = block, y = mean_percentage), size = 3) +
geom_path(data = plot3_data,
mapping = aes(x = block, y = mean_percentage, group = phase)) +
facet_wrap(~phase, # Adding appropriate labels to each training phase
labeller = as_labeller(c(results_train1 = "Session 1",
results_train2 = "Session 2",
results_train3 = "Session 3")),
strip.position = "bottom") +
geom_hline(yintercept = 50, size = 0.3, linetype = "dotted") +
geom_errorbar(plot3_baseline,
mapping = aes(x = 0.5,
ymin = mean_percentage - SEM,
ymax = mean_percentage + SEM),
width = 0) +
geom_point(data = plot3_baseline,
x = 0.5, y = 69.66955,
shape = 21,
colour = "black",
fill = "white",
stroke = 1.5,
size = 2.75) +
geom_text(data = plot3_baseline,
x = 0.5, y = 69.66955,
label = "Pre-training",
size = 3.2, angle = 90, vjust = -1) +
geom_errorbar(plot3_trainingaverage,
mapping = aes(x = 5.5,
ymin = mean_percentage - SEM,
ymax = mean_percentage + SEM),
width = 0) +
geom_point(data = plot3_trainingaverage,
x = 5.5, y = 40.13709,
shape = 21,
colour = "black",
fill = "white",
stroke = 1.5,
size = 2.75) +
geom_text(data = plot3_trainingaverage,
x = 5.5, y = 40.13709,
label = "Training Average",
size = 3, angle = 270, vjust = -1) +
coord_cartesian(ylim = c(25, 75)) +
scale_y_continuous(expand = c(0, 0),
breaks = c(30, 35, 40, 45, 50, 55, 60, 65, 70, 75),
labels = c(30, rep("", 1), # Replacing every 2nd tick with a blank
40, rep("", 1),
50, rep("", 1),
60, rep("", 1),
70, rep("", 1))) +
scale_x_discrete(expand = c(0.3, 0.3))+
theme_minimal() +
labs( # Adding appropriate labels
title = "Decision Priming",
subtitle = "Across Training Period",
x = "Training",
y = "Decision Priming (%)") +
theme(strip.background = element_blank(),# Modifying specific graph elements
strip.placement = "outside",
axis.line.y = element_line(colour = "black", linewidth = 0.5),
axis.ticks.y = element_line(colour = "black", linewidth = 0.5),
axis.ticks.length = unit(-0.1, "cm"),
axis.line.x = element_line(colour = "black", linewidth = 0.5),
axis.ticks.x = element_line(colour = "black", linewidth = 0.5),
panel.grid = element_blank(),
panel.spacing = unit(0, "lines"))
plot(plot3)
And after all that, we did it! It looks almost identical to what was in
the manuscript! I was so proud.
Last, but not least, we have our final plot.
Similar to the previous plot, we needed to filter() for the data from the training phases only. Then we used mutate() to create a new column for priming and for the specific training phase: “training_baseline” and “post_train” were retained as-is, while “results_train” 1, 2 and 3 were merged into a single “results_train” level. This was piped into a group_by() on training phase and participant ID, and we then summarise() the average number of trials that were primed.
plot4_data <- data %>%
dplyr::filter(phase %in% c("training_baseline",
"results_train1",
"results_train2",
"results_train3",
"post_train"), condition == 1, brokenSuppression == 0) %>%
mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE),
training_phase = case_when(phase == "training_baseline" ~ "training_baseline",
phase == "results_train1" |
phase == "results_train2" |
phase == "results_train3" ~ "results_train",
phase == "post_train" ~ "post_train")) %>%
group_by(training_phase, PID) %>%
summarise(trials = n(),
number_primed = sum(primed, na.rm = TRUE),
percentage = 100*number_primed/trials) %>%
ungroup()
## `summarise()` has grouped output by 'training_phase'. You can override using
## the `.groups` argument.
# Manually reordering so training_baseline appears first, results_train next, then post_train
plot4_data$training_phase <- fct_relevel(plot4_data$training_phase,
c("training_baseline", "results_train", "post_train"))
print(plot4_data)
## # A tibble: 21 × 5
## training_phase PID trials number_primed percentage
## <fct> <dbl> <int> <int> <dbl>
## 1 post_train 1 60 28 46.7
## 2 post_train 2 60 29 48.3
## 3 post_train 3 60 7 11.7
## 4 post_train 4 58 27 46.6
## 5 post_train 5 60 58 96.7
## 6 post_train 6 60 7 11.7
## 7 post_train 7 60 22 36.7
## 8 results_train 1 360 149 41.4
## 9 results_train 2 359 177 49.3
## 10 results_train 3 360 99 27.5
## # ℹ 11 more rows
To obtain the mean effect of training over time, we group_by() training phase to collapse over participant IDs and summarised with mean(). Importantly, we had to subtract 50 from this value, since 50% corresponds to chance-level priming.
plot4_mean <- plot4_data %>%
group_by(training_phase) %>%
summarise(mean = mean(percentage) - 50)
print(plot4_mean)
## # A tibble: 3 × 2
## training_phase mean
## <fct> <dbl>
## 1 training_baseline 19.9
## 2 results_train -9.95
## 3 post_train -7.40
We then obtained the means in a similar way, but without subtracting 50, so that we could also calculate the SEM.
plot4_desc <- plot4_data %>%
group_by(training_phase) %>%
summarise(mean = mean(percentage),
SEM = sd(percentage)/sqrt(n()))
# Manually reordering the factors once again
plot4_mean$training_phase <- fct_relevel(
plot4_mean$training_phase, c("training_baseline", "results_train", "post_train"))
print(plot4_desc)
## # A tibble: 3 × 3
## training_phase mean SEM
## <fct> <dbl> <dbl>
## 1 training_baseline 69.9 4.66
## 2 results_train 40.1 3.36
## 3 post_train 42.6 10.8
write_csv(plot4_desc, file = "plot4descriptives.csv")
These means and SEMs match the 1-week post-training values reported in the manuscript, which were 42.6% with an SEM of 10.84. No means or SEMs were reported for pre-training or during training, so we checked our coded means by eye against the figure. They appeared to be correct!
We could then move on to plotting the above data. The main steps to reproducing this figure were as follows:
1. We created a ggplot() object and named it "plot4".
2. We used the geom_col() function for the column graph, with local data from plot4_mean and an aesthetic mapping of x as training phase and y as percentage priming.
3. We used the geom_point() function for the points corresponding to each of the 7 included participants in each training phase.
4. We used the geom_path() function to connect these points according to participant ID.
5. We adjusted the y-axis with scale_y_continuous(). The most difficult part was attempting to replicate the arrow on the right hand side. We worked around this by adding a right hand side axis via the sec.axis argument of scale_y_continuous() (using sec_axis()) and labelling it "Congruent Incongruent". To form the arrow heads, in theme() we used the arrow argument as part of element_line().
6. We used theme() to modify ticks, text, titles and the panel grid.
plot4 <- ggplot() +
geom_col(plot4_mean,
mapping = aes(training_phase, y = mean, fill = training_phase),
colour = "black",# Column border colour
size = 0.1,# Making border line thickness smaller
position = position_nudge(y = 50), width = 0.5) +
scale_fill_manual(values = c("white", "lightgrey", "black")) + # Colouring the bars appropriately
geom_errorbar(plot4_desc, mapping = aes(x = training_phase,
ymin = mean - SEM,
ymax = mean + SEM), width = 0) +
geom_point(plot4_data,
mapping = aes(x = training_phase, y = percentage),
shape = 21, # This specific shape contains a border
colour = "white",# Border colour
fill = "gray50", # Point colour
stroke = 1, # Adjusting border thickness
size = 2.75) + # Adjusting point size
geom_path(plot4_data,
mapping = aes(x = training_phase, y = percentage, group = PID),
colour = "gray50") +
geom_hline(yintercept = 50, size = 0.3) + # Line going through the graph
coord_cartesian(ylim = c(0, 100)) +
scale_y_continuous(
breaks = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
minor_breaks = (c(0, 20, 40, 60, 80)),
labels = c(10, rep("", 1), # Replace each second value with a blank (i.e. "")
30, rep("", 1), # So every second tick will be blank
50, rep("", 1),
70, rep("", 1),
90, rep("", 1)),
expand = c(0,0),
sec.axis = sec_axis(~.*1, name = "Congruent Incongruent")) + # Adding right hand side axis
scale_x_discrete(labels = c("Pre-training", "Training", "Post-training")) +
theme_minimal() +
labs(
title = "Decision Priming",
subtitle = "Across Training Period",
x = "Training Period",
y = "Decision Priming (%)") +
theme(legend.position = "none",
axis.line.y.right =
element_line(arrow = grid::arrow(length = unit(0.3, "cm"),
ends = "both",
type = "closed"),
colour = "grey50",
linewidth = 0.75),
axis.ticks.y.right = element_blank(),
axis.text.y.right = element_blank(),
axis.title.y.right = element_text(colour = "grey50", size = 11, vjust = 2),
axis.line.y.left = element_line(colour = "grey50", linewidth = 0.5),
axis.ticks.y = element_line(colour = "grey50", size = 0.5),
axis.ticks.length = unit(-0.1, "cm"),
axis.ticks.x = element_line(colour = "grey50", size = 0.5),
panel.grid = element_blank())
plot(plot4)
And so concludes the very last plot for this manuscript!
Moving on to the exploratory analysis, where I used the data to try to answer three questions of interest to me about the research.
The method states that both green and red gratings were used during the main priming experiment, and on my initial reading I was curious about the decision to use these two colours. One colour could potentially lead to better overall priming, which has implications for what is optimal in the visual suppression paradigm.
To start off fresh, I read in all the data files given.
data <-read_csv("data/PearsonLab_Data_Collated.csv")
## New names:
## Rows: 11087 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): file, phase dbl (11): ...1, condition, primedGrating, chosenGrating,
## brokenSuppression, ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
# Printing included participant data
print(data)
## # A tibble: 11,087 × 13
## ...1 file condition primedGrating chosenGrating brokenSuppression vividness
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 01_1… 1 1 1 0 NA
## 2 2 01_1… 1 2 2 0 NA
## 3 3 01_1… 3 2 NA 1 NA
## 4 4 01_1… 1 1 1 0 NA
## 5 5 01_1… 1 1 1 0 NA
## 6 6 01_1… 1 1 1 0 NA
## 7 7 01_1… 3 1 NA 1 NA
## 8 8 01_1… 1 2 1 0 NA
## 9 9 01_1… 1 2 1 0 NA
## 10 10 01_1… 1 2 2 0 NA
## # ℹ 11,077 more rows
## # ℹ 6 more variables: preImageryTime <dbl>, agency <dbl>, right_wrong <dbl>,
## # phase <chr>, PID <dbl>, block <dbl>
excluded_data <- read_csv("data/included_excluded_data.csv")
## New names:
## Rows: 33 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): included, ...2, ...3, ...4
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
# Printing excluded participant data
print(excluded_data)
## # A tibble: 33 × 4
## included ...2 ...3 ...4
## <chr> <chr> <chr> <chr>
## 1 priming catch detection suppression break difference red/green
## 2 0.660714286 0.9 0.066666667 0.071428571
## 3 0.867924528 1 0.116666667 0.094339623
## 4 0.583333333 0.8 0.016393443 0.033333333
## 5 0.559322034 1 0.016666667 0.016949153
## 6 0.661016949 1 0.078125 0.084745763
## 7 0.491525424 0.933333333 0.016666667 0.016949153
## 8 0.568965517 1 0.033333333 0.034482759
## 9 0.672413793 1 0.033333333 0
## 10 0.593220339 1 0.016666667 0.016949153
## # ℹ 23 more rows
I needed to use the data from the main experiment and compare
successful and unsuccessful priming for green and red. Both colours
follow the same process; the only difference is in the mutate()
function, where the new "primed" column uses case_when() to check
whether the primed and chosen gratings both equal 1 (green) or both
equal 2 (red). In either case, a match returns TRUE, indicating
successful priming. The explanation for the functions used is
embedded in the code below. Also, rather than using summarise(), I
used the reframe() function from the dplyr package. The difference
between the two is that reframe() is less stringent and is useful
for returning more or fewer than 1 row per group (since I had 2
rows here), as the toy sketch below shows.
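To illustrate the difference, here is a minimal toy sketch (made-up data, not from the study):
library(dplyr)
toy <- tibble(x = c(1, 2, 3, 4))
# reframe() is happy to return more (or fewer) than one row per group...
toy %>% reframe(value = range(x)) # returns 2 rows: the min and the max
# ...whereas summarise() expects exactly one row per group, and in recent
# dplyr versions warns that multi-row results are deprecated:
toy %>% summarise(value = range(x))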
green_priming <- data %>%
dplyr::filter(phase == "main", # Include only the main experiment
condition != 3, # Cannot be a catch trial where suppression is meant to break
brokenSuppression == 0, # No broken suppression
block != 0) %>% # Excluding block 0
mutate(primed = case_when(primedGrating == 1 &
chosenGrating == 1 ~ TRUE)) %>%
mutate_at(vars("primed"), # Under the primed label, replace NA values with FALSE
~replace_na(.,FALSE)) %>%
count(trials = n(), primed) %>% # Count the number of trials and the amount of primed TRUE/FALSE
reframe(primed,
percentage = 100*n/trials) # Calculating average primed trials
print(green_priming)
## # A tibble: 2 × 2
## primed percentage
## <lgl> <dbl>
## 1 FALSE 65.2
## 2 TRUE 34.8
colour = c("green") # Adding a column to specify the colour as green
green_priming$colour <- colour
red_priming <- data %>%
dplyr::filter(phase == "main",
condition != 3,
brokenSuppression == 0,
block != 0) %>%
mutate(primed = case_when(primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>%
mutate_at(vars("primed"),
~replace_na(.,FALSE)) %>%
count(trials = n(), primed) %>%
reframe(primed,
percentage = 100*n/trials)
print(red_priming)
## # A tibble: 2 × 2
## primed percentage
## <lgl> <dbl>
## 1 FALSE 78.8
## 2 TRUE 21.2
colour = c("red") # Adding a column to specify the colour as red
red_priming$colour <- colour
From the raw numbers, it appears that a green stimulus leads to
better priming. However, I wanted to plot the data to make
comparisons easier. To do so, I combined the two data frames for
green and red priming into one called "colour_priming" using the
bind_rows() function. I then changed the "primed" variable into a
factor vector (i.e. a categorical variable) using the as.factor()
function. This allowed me to reorder the levels of the primed
column so that "TRUE" appears first, using the fct_relevel()
function. For viewing purposes, I thought it would be better for
successful priming to appear on the left hand side of the graph.
colour_priming <- bind_rows(green_priming, red_priming)
# Writing a .csv file so that this data can be easily accessible
write_csv(colour_priming, file = "green&red_priming.csv")
colour_priming$primed <- as.factor(colour_priming$primed)
colour_priming$primed <- fct_relevel(colour_priming$primed, c("TRUE", "FALSE"))
I created a new ggplot object called "plot_green_red_priming" using
local data from colour_priming. I chose geom_col() since it is most
effective for displaying rankings or comparisons.
plot_green_red_priming <- ggplot(colour_priming) +
geom_col(aes(x = primed, y = percentage, fill = colour),
position = "dodge") + # Preserve vertical position while adjusting horizontal position
scale_fill_manual(values = c("#85E378", "#F95858"), # Appropriate colour for green/red
name = "Colour",
labels = c("Green", "Red")) + # Renaming legend title & labels
scale_y_continuous(
breaks = c(0, 20, 40, 60, 80)) +
scale_x_discrete( # Renaming x-axis variables
labels = c("Successful Priming", "Unsucessful Priming")) +
labs(title = "Priming Using Green Versus Red Stimulus", # Adding plot title
y = "Average Primed (%)") + # Renaming y-axis label
theme_light() +
theme(axis.line = element_line(colour = "black"), # Modifying specific graph elements
axis.ticks = element_line(colour = "black"),
axis.title.x = element_blank(),
axis.text.x = element_text(colour = "black"),
plot.title = element_text(hjust = 0.5),
panel.grid.major.x = element_blank())
plot(plot_green_red_priming)
So, yes, there does seem to be a difference in how colour affects
priming. The green grating appears better than the red grating at
facilitating subliminal priming, with an average of 34.8% of trials
primed compared to 21.2%. Correspondingly, there are more instances
of unsuccessful subliminal priming for the red grating than the
green grating (78.8% versus 65.2%). In general, however,
unsuccessful priming outweighs successful priming regardless of
colour.
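Because reframe() dropped the raw counts, these are descriptive percentages only. If the n and trials columns from count() were carried through instead, one way to check whether the colour difference is statistically reliable would be a two-proportion test. A sketch, where n_primed_green, n_trials_green, n_primed_red and n_trials_red are hypothetical count variables (not computed above):
# Compare the proportion of primed trials between green and red stimuli
prop.test(x = c(n_primed_green, n_primed_red), # successes: primed trials
          n = c(n_trials_green, n_trials_red)) # totals: all included trials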
I also wondered if there was an association between agency and the time participants took before the imagery period. Since participants could choose when to begin the imagery period following the priming paradigm, I was curious whether believing you know exactly what to imagine (i.e. having control over your choices) means a faster reaction time. Moreover, the authors mentioned in the introduction that subliminal priming could lead to faster reaction times for decisions leaning towards the stimulus. So, there could be an association between agency and pre-imagery time for those with successful subliminal priming. I expected a shorter pre-imagery time where agency was higher.
To achieve this, I used data from both the main and training phases of the experiment.
The main steps were as follows:
1. I used mutate() to create a new column called "primed", using case_when() to flag trials where priming was successful (i.e. the colour of the primed and chosen gratings matched).
2. I used drop_na() from the tidyr package to remove NA values from a specific column, in this case the "primed" column.
3. I used reframe() to keep only agency and pre-imagery time, then arrange() so that agency was in ascending order.
main_agency_time <- data %>% # Creating a new data frame for the main experiment
dplyr::filter(phase %in% c("main"), # Filtering for the main experiment only
condition != 3,
brokenSuppression == 0,
block != 0) %>%
mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>%
drop_na(primed) %>%
reframe(agency, preImageryTime) %>%
arrange(agency)
# Converting agency into a factor vector (AKA a categorical variable)
main_agency_time$agency <- factor(main_agency_time$agency)
print(main_agency_time)
## # A tibble: 1,109 × 2
## agency preImageryTime
## <fct> <dbl>
## 1 1 0.603
## 2 1 0.254
## 3 1 0.169
## 4 1 0.288
## 5 1 0.214
## 6 1 0.153
## 7 1 0.257
## 8 1 0.389
## 9 1 0.352
## 10 1 0.273
## # ℹ 1,099 more rows
# Also calculating the mean pre-imagery time based on grouping the agency rating
main_agency_time_mean <- main_agency_time %>%
group_by(agency) %>%
summarise(preImagery_time_mean = mean(preImageryTime))
print(main_agency_time_mean)
## # A tibble: 4 × 2
## agency preImagery_time_mean
## <fct> <dbl>
## 1 1 1.34
## 2 2 1.60
## 3 3 1.76
## 4 4 0.954
# Writing a .csv file so that this data can be easily accessible
write_csv(main_agency_time, file = "agency&time_main_experiment.csv")
The same steps as above are applied here.
training_agency_time<- data %>% # Creating a new data frame for the training phases
dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3"),
condition != 3,
brokenSuppression == 0,
block != 0) %>%
mutate(primed = case_when(primedGrating == 1 & chosenGrating == 1 ~ TRUE,
primedGrating == 2 & chosenGrating == 2 ~ TRUE)) %>%
drop_na(primed) %>%
reframe(agency, preImageryTime) %>%
arrange(agency)
training_agency_time$agency <- factor(training_agency_time$agency)
print(training_agency_time)
## # A tibble: 1,001 × 2
## agency preImageryTime
## <fct> <dbl>
## 1 1 0.245
## 2 1 0.137
## 3 1 0.231
## 4 1 1.44
## 5 1 1.42
## 6 1 1.38
## 7 1 3.03
## 8 1 0.791
## 9 1 1.25
## 10 1 0.909
## # ℹ 991 more rows
training_agency_time_mean <- training_agency_time %>%
group_by(agency) %>%
summarise(preImagery_time_mean = mean(preImageryTime))
print(training_agency_time_mean)
## # A tibble: 4 × 2
## agency preImagery_time_mean
## <fct> <dbl>
## 1 1 1.29
## 2 2 0.988
## 3 3 0.839
## 4 4 0.823
write_csv(training_agency_time, file = "agency&time_training_phases.csv")
In the training phases, a higher agency rating for participants with successful subliminal priming was indeed associated with, on average, a lower pre-imagery time. In the main experiment the pattern was less clean: mean pre-imagery time actually rose from agency ratings 1 to 3, and only the highest rating (4) showed a clearly lower mean.
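These are averages only; to test whether pre-imagery time actually differs across agency ratings, one option would be a Kruskal-Wallis test, which suits an ordinal grouping variable. A sketch (not part of the original analysis):
# agency is already a factor in both data frames
kruskal.test(preImageryTime ~ agency, data = main_agency_time)
kruskal.test(preImageryTime ~ agency, data = training_agency_time)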
Beyond the averages, I wanted to see the spread of the data through
a plot. I wanted to see how skewed the pre-imagery time was for
each agency rating, so I chose a box plot, as I could place the box
plots side by side for easier comparison. This website was helpful
to begin with, since I hadn't used the geom_boxplot() function
before: http://www.sthda.com/english/wiki/ggplot2-box-plot-quick-start-guide-r-software-and-data-visualization.
I created an object called "plot_main_agency_time" and used
ggplot() to specify the data and what is plotted on the x and y
axes. I then used geom_boxplot() to create the box plot, excluding
outliers with the outlier.shape argument, and added colour with
fill.
plot_main_agency_time <- ggplot(data = main_agency_time, aes(x = agency, y = preImageryTime)) +
geom_boxplot(outlier.shape = NA,
fill = "cornflowerblue") +
coord_cartesian(ylim = c(0, 4.5)) + # Restricting the axis from 0 to 4.5
scale_y_continuous(breaks = c(0, 1, 2 ,3, 4)) + # Adding breaks in y-axis from 0 to 4
theme_classic() +
labs(title = "Pre-Imagery Time & Agency Rating During Main Experiment", # Adding plot title
y = "Pre-Imagery Time (sec)",
x = "Agency") +
theme_light() +
theme(axis.line = element_line(colour = "black"), # Modifying specific graph elements
axis.ticks = element_line(colour = "black"),
axis.text.x = element_text(colour = "black"),
plot.title = element_text(hjust = 0.5),
panel.grid.major.x = element_blank())
plot(plot_main_agency_time)
plot_training_agency_time <- ggplot(data = training_agency_time, aes(x = agency, y = preImageryTime)) +
geom_boxplot(outlier.shape = NA, fill = "cornflowerblue") +
coord_cartesian(ylim = c(0, 4)) +
scale_y_continuous(breaks = c(0, 1, 2, 3, 4)) +
theme_classic() +
labs(title = "Pre-Imagery Time & Agency Rating During Training", # Adding plot title
y = "Pre-Imagery Time (sec)",
x = "Agency") +
theme_light() +
theme(axis.line = element_line(colour = "black"), # Modifying specific graph elements
axis.ticks = element_line(colour = "black"),
axis.text.x = element_text(colour = "black"),
plot.title = element_text(hjust = 0.5),
panel.grid.major.x = element_blank())
plot(plot_training_agency_time)
In both phases, for those with successful priming, an agency rating of 4 (indicating high agency) had the lowest pre-imagery times. The box plot for this rating is also more condensed than for the other agency ratings, suggesting that most participants who feel in control of their decisions spend less time figuring out what to imagine. For the main experiment, those with an agency rating of 3 appear more variable in how long they spend before imagery, despite feeling fairly in control of their decision. For the training phases, those with a low agency rating spend more time before imagery.
Finally, I was curious whether there was a statistical correlation between ratings of agency and vividness. I would assume that a participant who can clearly imagine the grating would also believe they are most in control of their choice selection.
To do this, I chose to pull data from both the main experiment and
training phases, making sure to filter() out the control
phase since it did not have any imagery period and therefore no
vividness rating.
agency_vividness_main <- data %>%
dplyr::filter(phase == "main",
condition != 3,
brokenSuppression == 0,
block != 0) %>%
reframe(vividness, agency) # Keeping only the vividness and agency ratings
glimpse(agency_vividness_main) # Having a look at the coded outcome
## Rows: 1,979
## Columns: 2
## $ vividness <dbl> 2, 2, 1, 3, 2, 2, 2, 3, 2, 2, 3, 1, 2, 3, 2, 3, 3, 2, 2, 2, …
## $ agency <dbl> 3, 3, 3, 2, 4, 3, 2, 1, 3, 3, 2, 3, 3, 1, 4, 3, 2, 2, 3, 3, …
# Writing a .csv file so that this data can be easily accessible
write_csv(agency_vividness_main, file = "agency&vividness_main_experiment.csv")
agency_vividness_training <- data %>%
dplyr::filter(phase %in% c("results_train1", "results_train2", "results_train3"),
condition != 3,
brokenSuppression == 0,
block != 0) %>%
reframe(vividness, agency)
glimpse(agency_vividness_training)
## Rows: 2,499
## Columns: 2
## $ vividness <dbl> 2, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 3, 2, 2, 3, 4, …
## $ agency <dbl> 4, 3, 3, 2, 4, 4, 1, 3, 1, 2, 3, 3, 3, 2, 4, 3, 2, 2, 3, 3, …
write_csv(agency_vividness_training, file = "agency&vividness_training_phases.csv")
I used a new function called cor(), which calculates the
correlation coefficient for the data frame specified. This function
also allowed me to select the correlation method; I chose
"spearman" since it is designed for ranked (ordinal) data, and both
vividness and agency are ratings on small ordinal scales. This
website was helpful for this function:
https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/.
cor(agency_vividness_main, method = c("spearman"))
## vividness agency
## vividness 1.0000000 0.1939141
## agency 0.1939141 1.0000000
cor(agency_vividness_training, method = c("spearman"))
## vividness agency
## vividness 1.00000000 0.04108986
## agency 0.04108986 1.00000000
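cor() only returns the coefficient. If I had also wanted a p-value, cor.test() would be one way to get it; a sketch (not part of my analysis above), with exact = FALSE because exact p-values cannot be computed when the ranks contain ties:
# Spearman correlation test between vividness and agency, main experiment
cor.test(agency_vividness_main$vividness,
         agency_vividness_main$agency,
         method = "spearman", exact = FALSE)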
In the main experiment, there is a weak positive relationship between agency and vividness (0.19). In the training phases, there is essentially no relationship between vividness and agency (0.04).
Before my group and I could begin our reproducibility project, we encountered difficulties interpreting the .csv files. Originally, the data set was provided to us with each participant split into separate files per condition and phase (control, main, post training, results training 1, results training 2, results training 3), as well as per block of trials. Furthermore, although we were given an initial README file, the original .csv files did not contain any labels or headings for the columns and were missing information needed to interpret the data. There were a lot of trials, conditions, phases and IVs in the experimental design, yet these remained unclear on an initial reading. It was confusing to follow without clear column labels and explanations. The screenshots below are examples of the original data file.
We were very fortunate to receive help from our coding tutor, who put together a collated version so all the information was readable. However, in the real world, tracking down missing data or following up on the meaning behind data can slow down reproducibility. Even if somebody were to upload their data to an open resource like the OSF, if the formatting is uninterpretable, it raises the question of whether the data is truly open when it is compiled in a convoluted and tedious arrangement. The authors of the paper could have included clearer variable names, collated the individual .csv files into one data set, and provided all the necessary data (such as the excluded participants needed for Figure 2). One resource for improving data management is the FAIR principles, which are supported by the Australian Research Data Commons (ARDC). These involve making data findable, accessible, interoperable and reusable.
Data consistency refers to the completeness and correctness of data stored in a database. There were several instances where the raw data and the numbers reported in the manuscript did not match, which were barriers to reproducibility. For example, when evaluating the research manuscript, my group and I noticed a discrepancy in the number of trials per participant. As stated repeatedly in the manuscript, there were an expected 8 blocks of 30 trials per participant. However, the data set appeared to have only 5 blocks per participant (a total of 150 trials per participant, as opposed to the stated 240). Moreover, in the main experiment, participants 3 and 5 appeared to have an additional block labelled "0". Although the report does not mention any miscellaneous or extra blocks (or explain their purpose), there was still data present in these surplus trials. The authors should ensure they correctly report the numbers as recorded in the data set, and give a reason in the paper for any extra data in the .csv files. This transparency would improve the legitimacy of the research and provide clarity for those needing to reproduce the data.
My group members and I had no prior experience in coding, so we relied on the knowledge we gained from the early coding modules and lots, and lots, of searching on sites like Stack Overflow. The authors should consider providing a code book: a document outlining the coding framework for the research. This could include clear definitions of the variables (for example, what is meant by a catch detection trial), so a reader does not need to constantly search the manuscript for definitions, as well as any key programs or functions used in the authors' coding process. During our reproducibility project we were given an example of the code the authors used for Figure 2; however, it was uninterpretable for us, since not only was it written in MATLAB, but there was little documentation of the process behind their code. A sketch of what a simple code book could look like is below.
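As an illustration only, here is a minimal machine-readable code book built in R. The column names come from the collated data set, but the descriptions are our own inferences from the filtering code above, not the authors' definitions:
library(tibble)
library(readr)
# Each row documents one column of the collated data set
codebook <- tribble(
  ~variable, ~description,
  "PID", "Participant ID",
  "phase", "Experiment phase (e.g. main, training_baseline, results_train1-3, post_train)",
  "condition", "Trial type; condition 3 appears to be a catch trial",
  "primedGrating", "Grating presented subliminally (1 = green, 2 = red)",
  "chosenGrating", "Grating the participant selected (1 = green, 2 = red)",
  "brokenSuppression", "1 if interocular suppression broke on that trial, 0 otherwise",
  "vividness", "Rating of imagery vividness",
  "agency", "Rating of perceived control over the grating choice",
  "preImageryTime", "Time (sec) before the participant began the imagery period"
)
write_csv(codebook, "codebook.csv") # Shipped alongside the raw data files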
Thanks for reading! I hope you enjoyed my very first coding journey!