Humiston and Wamsley (2019) aims to replicate Hu, Antony, Creery, Vargas, Bodenhausen and Paller's 2015 study. Hu and colleauges found that implicit racial and gender biases could be reduced or even unlearned during sleep by using a procedure called targeted memory reactivation (TMR). This procedure aims to boost memory retention by increasing consolidation during sleep. When applied in this context, TMR is predicted to increase the effect of the counter-bias training and produce a greater reduction in implicit social biases, compared to the use of counter-bias training alone.
Given the potential real-world applications of TMR, Humiston and Wamsley sought to use the same procedure as Hu and colleagues in an attempt to replicate their findings.
In the 2015 study, participants first underwent a computerised counter-bias racial and gender training procedure, where they were instructed to press buttons to affirm certain pairings (for example, female faces with science-related words). This procedure uses uncued stimuli. Participants then began a 90 minute nap, with white noise playing in the background. Once a participant began slow-wave sleep, the experimenter would play either the race or gender cue during sleep to target and strengthen the counter-bias training memories. If participants awoke or were entering another sleep stage, these sound cues would be discontinued. Playing only one of the two sound cues during sleep allows for all participants to experience both the cued and uncued stimuli, however some participants (n = 16) will be played the race cue, and the other participants (n=15) will be played the gender cue.
Participants completed implicit bias assessments both before and after the counter-bias training procedure. They were again assessed immediately post-nap as well as one week after the nap.
Overall, Humiston and Wamsely failed to replicate the 2015 study's results. They found no evidence that TMR makes counter-bias training more effective. Instead, bias was found to be non-significantly greater for cued than for uncued stimuli.
Note, cued stimuli/condition refers to the TMR procedure, while uncued stimuli/condition refers to the counter-bias training.
It seems that the next step in this area of research would be to run another replication of the original study that follows the original study fully to discern whether the current replication is a result of a Type II error or if the original study’s conclusions are a result of a false positive. There were a few procedural differences that could have influenced the differing results: the post-nap verbal inquiry about noise was asked immediately after the nap in this study, and after the post-nap IATs in the original study. Further, the participants in the original study were compensated through course credits, whereas in the current replication, some received course credits and others received a cash payment.
The logic of the rationale that the current study is not a result of a Type II error is compelling as the current study was well-powered (power of 0.9) to detect an effect size (if there was one) as reported by the original study. Further, the cueing effect that was observed is significantly different and in the opposite direction than that reported by the original study. Finally, the size of the non-significant effect found in current study is too small to have been reliably detected by either the original or current study. All this taken together suggests that the current cueing effect observed is significantly different to that reported in the original study, and is unlikely to indicate a noisy estimate of the originally-reported effect.
When I was reading the article, I was excited to learn that there was a high possibility of an effective method to reduce implicit biases. This is such an important area of research, especially in this current society, so if this replication study is able to reproduce the original study's findings this would have great real-world implications to reduce biases among the public. However, once I finished reading the article I was disappointed by the lack of reproducibility and the lack of generalisability of the TMR procedure. I found the method described was convoluted and unclear.
As part of the verification report, our group's goal is to reproduce the 4 tables and 3 figures included in the Humiston and Wamsley's paper.
I obtained the open data from the OSF Repository. First, I have to load the packages:
tidyverse: for dplyr and ggplothaven: to read files from SPSSreadspss: to read files from SPSSggplot2: to create plotsjanitor: to clean up dataplotrix: to calculate standard errorsgt: to create tablesggpubr: compare means for plotsggeasy: for easy ggplot functionspapaja: for apa themejmv: to conduct statistical analysesTo load the packages onto RStudio, I have to use library() and have the package I want contained within the brackets.
library(tidyverse)
library(haven)
library(readspss)
library(ggplot2)
library(janitor)
library(plotrix)
library(gt)
library(ggpubr)
library(ggeasy)
library(papaja)
library(jmv)
The open data is in .sav format from SPSS. For this reason I need to use the read.sav() function from the readspss package. The study's data is read into RStudio by creating a new dataset called replicationdata using <-, which will contain all the data for only participants who have been included in the study. To put it simply, <- allows for the assignment of a value to a name, for example x <- table assigns the value table to the name x. read.sav() is a function from the readspss package that allows for a SPSS sav file to be read into a dataframe (i.e. replicationdata). The relevant SPSS sav file name is contained within quotation marks within the read.sav() function - in this case, our data file is called "Humiston & Wamsley 2019 data.sav".
I am using head() to see how the data is structured and what data is contained within the variables.
replicationdata <- read.sav("Humiston & Wamsley 2019 data.sav")
head(replicationdata)
## ParticipantID exclude cue_presented heard_cue_report heard_cue_exit
## 1 ub3 yes yes no no
## 2 ub6 no yes no no
## 3 ub7 no yes no unsure
## 4 ub8 no yes no no
## 5 ub9 no yes no no
## 6 ub10 yes yes no no
## predicted_cue Cue_condition Counterbias_order Sound_assignment
## 1 no gender cue played racial training first machR and descG
## 2 no race cue played racial training first machR and descG
## 3 no race cue played gender training first machG and descR
## 4 no gender cue played racial training first machR and descG
## 5 suspected gender cue played racial training first machR and descG
## 6 no race cue played gender training first machG and descR
## IAT1_order IAT234_order IAT_order compensation General_1_Age General_1_Sex
## 1 EATF-SATF SATS-EATS ES, SESESE cash 19 Female
## 2 EATF-SATF SATS-EATS ES, SESESE cash 21 Female
## 3 SATF-EATF EATS-SATS SE, ESESES cash 21 Female
## 4 EATF-SATF SATS-EATS ES, SESESE cash 20 Female
## 5 SATF-EATF EATS-SATS SE, ESESES cash 21 Male
## 6 EATF-SATF SATS-EATS ES, SESESE cash 19 Male
## General_1_Race General_1_English General_1_EnglishYrs General_1_Caffeine
## 1 Non-White Yes NA No
## 2 White Yes NA Yes
## 3 White Yes NA Yes
## 4 White Yes NA No
## 5 White Yes NA No
## 6 Non-White Yes NA No
## General_1_CaffCups General_1_CaffHrsAgo General_1_SleepDisor
## 1 <NA> NA No
## 2 1 2 No
## 3 1 3 No
## 4 <NA> NA No
## 5 <NA> NA Yes
## 6 <NA> NA No
## General_1_MentalDiso General_1_Meds General_1_MedList
## 1 No Yes DepoProvera, 200mg, once every 3 months
## 2 No No
## 3 No Yes 20 mg prozac every day
## 4 No No
## 5 No No
## 6 No No
## General_1_University General_1_UniYears Demo_1_Ethnic
## 1 Furman University 2 Not Hispanic or Latino
## 2 Furman University 3 Not Hispanic or Latino
## 3 Furman University 3 Not Hispanic or Latino
## 4 Furman 2 Not Hispanic or Latino
## 5 Furman University 3 Not Hispanic or Latino
## 6 Furman University 1 Not Hispanic or Latino
## Demo_1_Racial Demo_1_Gender Demo_1_NonParticipat
## 1 Black or African American Female <NA>
## 2 White Female <NA>
## 3 White Female <NA>
## 4 White Female <NA>
## 5 White Male <NA>
## 6 Black or African American Male <NA>
## Epworth_1_Read Epworth_1_TV Epworth_1_Public
## 1 moderate chance of dozing high chance of dozing slight chance of dozing
## 2 slight chance of dozing slight chance of dozing slight chance of dozing
## 3 slight chance of dozing slight chance of dozing no chance of dozing
## 4 slight chance of dozing slight chance of dozing no chance of dozing
## 5 slight chance of dozing slight chance of dozing no chance of dozing
## 6 moderate chance of dozing slight chance of dozing moderate chance of dozing
## Epworth_1_Passenger Epworth_1_LyingDown Epworth_1_Talking
## 1 slight chance of dozing high chance of dozing no chance of dozing
## 2 moderate chance of dozing high chance of dozing no chance of dozing
## 3 slight chance of dozing slight chance of dozing no chance of dozing
## 4 slight chance of dozing slight chance of dozing no chance of dozing
## 5 no chance of dozing no chance of dozing no chance of dozing
## 6 high chance of dozing high chance of dozing no chance of dozing
## Epworth_1_Lunch Epworth_1_Traffic Epworth_total
## 1 slight chance of dozing no chance of dozing 19
## 2 no chance of dozing no chance of dozing 16
## 3 no chance of dozing no chance of dozing 12
## 4 slight chance of dozing no chance of dozing 13
## 5 no chance of dozing no chance of dozing 10
## 6 moderate chance of dozing no chance of dozing 21
## AlertTest_1_Concentr_1 AlertTest_1_Refresh_1
## 1 80 90
## 2 80 60
## 3 60 70
## 4 60 60
## 5 70 30
## 6 100 100
## AlertTest_1_Feel
## 1 3 - Awake, but relaxed; responsive but not fully alert
## 2 2 - Functioning at high levels, but not at peak; able to concentrate
## 3 2 - Functioning at high levels, but not at peak; able to concentrate
## 4 3 - Awake, but relaxed; responsive but not fully alert
## 5 3 - Awake, but relaxed; responsive but not fully alert
## 6 1 - Feeling active, vital alert, or wide awake
## AlertTest_2_Concentr_1 AlertTest_2_Refresh_1
## 1 NA NA
## 2 70 70
## 3 60 60
## 4 40 30
## 5 60 30
## 6 80 80
## AlertTest_2_Feel
## 1 <NA>
## 2 3 - Awake, but relaxed; responsive but not fully alert
## 3 2 - Functioning at high levels, but not at peak; able to concentrate
## 4 4 - Somewhat foggy, let down
## 5 3 - Awake, but relaxed; responsive but not fully alert
## 6 2 - Functioning at high levels, but not at peak; able to concentrate
## AlertTest_3_Concentr_1 AlertTest_3_Refresh_1
## 1 90 80
## 2 NA NA
## 3 60 70
## 4 40 50
## 5 80 70
## 6 100 100
## AlertTest_3_Feel
## 1 2 - Functioning at high levels, but not at peak; able to concentrate
## 2 <NA>
## 3 2 - Functioning at high levels, but not at peak; able to concentrate
## 4 3 - Awake, but relaxed; responsive but not fully alert
## 5 2 - Functioning at high levels, but not at peak; able to concentrate
## 6 1 - Feeling active, vital alert, or wide awake
## AlertTest_4_Concentr_1 AlertTest_4_Refresh_1
## 1 80 80
## 2 80 90
## 3 60 50
## 4 40 30
## 5 NA NA
## 6 90 80
## AlertTest_4_Feel
## 1 2 - Functioning at high levels, but not at peak; able to concentrate
## 2 1 - Feeling active, vital alert, or wide awake
## 3 3 - Awake, but relaxed; responsive but not fully alert
## 4 3 - Awake, but relaxed; responsive but not fully alert
## 5 <NA>
## 6 0
## S1_ExitQ_1_sound S1_ExitQ_1_soundaffect S1_ExitQ_2_sound S1_ExitQ_3_sound
## 1 <NA> <NA> <NA> <NA>
## 2 No No No No
## 3 No No No No
## 4 No No No No
## 5 No No No No
## 6 No No No No
## S1_ExitQ_4_sound S1_ExitQ_4_soundaffect S1_ExitQ_5_sound
## 1 <NA> <NA> <NA>
## 2 No No No
## 3 No No No
## 4 No No No
## 5 No No No
## 6 No No No
## S1_ExitQ_5_soundaffect S2_ExitQ_1_sound S2_ExitQ_1_soundaffect
## 1 <NA> <NA> <NA>
## 2 No No No
## 3 No No No
## 4 No No No
## 5 No No No
## 6 No No No
## S2_ExitQ_2_sound S2_ExitQ_3_sound S2_ExitQ_4_sound S2_ExitQ_4_soundaffect
## 1 <NA> <NA> <NA> <NA>
## 2 No No No No
## 3 No No No No
## 4 No No No No
## 5 No No No No
## 6 No No No No
## S2_ExitQ_5_sound S2_ExitQ_5_soundaffect Total_sleep Wake_amount NREM1_amount
## 1 <NA> <NA> 64 26 19
## 2 No No 65 25 10
## 3 No No 66 24 9
## 4 No No 80 10 5
## 5 No No 62 28 5
## 6 No No 84 6 4
## NREM2_amount SWS_amount REM_amount SWSxREM cue_minutes baseIATcued
## 1 29.0 8 9 72 19.0 0.19620397
## 2 20.0 12 23 276 9.5 0.57544182
## 3 52.0 19 0 0 12.0 0.09911241
## 4 15.0 24 17 408 15.5 0.20577365
## 5 15.5 24 17 408 16.0 0.35314196
## 6 51.0 22 6 132 29.0 -0.21579533
## baseIATuncued preIATcued preIATuncued postIATcued postIATuncued weekIATcued
## 1 -0.2653553 -0.34989445 -0.4905672 -0.192035676 -1.04192332 -0.3613492
## 2 0.6095365 0.55905291 0.2146214 0.681910146 0.46728694 0.2037737
## 3 0.6439654 -0.13380639 0.3398503 0.044634805 -0.05686262 0.4587371
## 4 1.5243562 0.51077026 0.3799023 -0.002583615 0.68243589 0.3985947
## 5 0.1310848 -0.02933191 -0.9420955 -0.245989410 0.94970369 0.9234159
## 6 -0.4344782 0.06515661 -0.1185617 -0.958083677 0.28246751 -0.3426443
## weekIATuncued postnap_change_cued postnap_change_uncued week_change_cued
## 1 0.3829139 0.1578588 -0.5513561 0
## 2 0.6827742 0.1228572 0.2526655 0
## 3 -0.0107046 0.1784412 -0.3967129 0
## 4 0.7118729 -0.5133539 0.3025336 0
## 5 0.2021283 -0.2166575 1.8917992 0
## 6 -0.1973225 -1.0232403 0.4010292 0
## week_change_uncued diff_biaschange_cued diff_biaschange_uncued
## 1 0 0.5575531 -0.64826921
## 2 0 0.3716681 -0.07323769
## 3 0 -0.3596247 0.65466998
## 4 0 -0.1928210 0.81248335
## 5 0 -0.5702740 -0.07104354
## 6 0 0.1268490 -0.23715575
## diff_biaschange base_IAT_race base_IAT_gen pre_IAT_race pre_IAT_gen
## 1 1.2058224 -0.26535527 0.1962040 -0.49056718 -0.34989445
## 2 0.4449058 0.57544182 0.6095365 0.55905291 0.21462144
## 3 -1.0142947 0.09911241 0.6439654 -0.13380639 0.33985028
## 4 -1.0053044 1.52435622 0.2057736 0.37990232 0.51077026
## 5 -0.4992304 0.13108478 0.3531420 -0.94209553 -0.02933191
## 6 0.3640047 -0.21579533 -0.4344782 0.06515661 -0.11856168
## post_IAT_race week_IAT_race post_IAT_gen week_IAT_gen filter_$ cues_total
## 1 -1.0419233 -0.1920357 0.382913945 -0.3613492 Not Selected 285.0
## 2 0.6819101 0.2037737 0.467286940 0.6827742 Selected 142.5
## 3 0.0446348 0.4587371 -0.056862624 -0.0107046 Selected 180.0
## 4 0.6824359 0.7118729 -0.002583615 0.3985947 Selected 232.5
## 5 0.9497037 0.2021283 -0.245989410 0.9234159 Selected 240.0
## 6 -0.9580837 0.2824675 -0.342644325 -0.1973225 Not Selected 435.0
Since some participants have been excluded from the study, participants who were not included in the study have to be filtered out. To do this, a new dataframe called cleandata.
Note: %>% is the pipe operator used in RStudio. It allows for a sequence of calculations or code to be easily chained together. It takes the output of one statement and makes it the input of the next statement. In this case, we are taking the dataframe replicationdata and using it as the input for our next line of code - using filter() from the stats package to exclude anything that has "no" as part of its variable. This removes participants who are excluded in the study.
cleandata <- replicationdata %>%
filter(exclude == "no")
To save time and make coding more streamlined, a .csv file for cleandata dataframe is created, so that we can simply load the file that only includes the relevant participants, instead of having to load the raw data then exclude/filter for the relevant participants. To do this, write_csv() is used to create a .csv file. In the brackets, the first argument defines the dataframe to be used (i.e. cleandata) and the second argument define the name for the file, contained within "" (i.e. cleandata.csv`).
I need to first reproduce Table 1 which contains all the participants' demographic information. To create Table 1, I have to first calculate all the values, as none of them were included within the open data file given.
Age average
The variable ageaverage is created using <-, where the sequence of code following <- is the assigned value for the named variable ageaverage. First, the cleandata dataframe is taken then piped (%>%) to be the input for select(). To select my needed variable, I use select() from the dplyr package. It allows for the selection of variables within a dataframe - namely the General_1_Age variable from the cleandata dataframe. I need to calculate the summary statistics for mean and standard deviations (sd) so I am using summarise() from the dplyr package. It allows for the creation of a new dataframe, with a column for each variable and a column for each of the summary statistics that are specified. = assigns a value (right side of symbol) to a name (left side of symbol). In this case, the name ageaverage is assigned the value of the mean of the variable General_1_Age that is calculated using the mean() function in dplyr. sd() calculates the standard deviation of the selected variable within the brackets.
ageaverage <- cleandata %>%
select(General_1_Age) %>%
summarise(ageaverage = mean(General_1_Age),
agesd = sd(General_1_Age))
ESS (Epworth Sleeping Scale) score
Calculating the values for this variable is identical to calculating Participants' ages, as seen above. The variable ESS is created using <-, where the sequence of code following <- is the assigned value for the named variable ESS. First, the cleandata dataframe is taken then piped (%>%) to be the input for select().
select() is from the dplyr package. It allows for the selection of variables within a dataframe - namely the Epworth_total variable from the cleandata dataframesummarise() is from the dplyr package. It allows for the creation of a new dataframe, with a column for each variable and a column for each of the summary statistics that are specified.= assigns a value (right side of symbol) to a name (left side of symbol). In this case, the name ESSaverage is assigned the value of the mean of the variable General_1_Age that is calculated using the mean() function in dplyr.sd() calculates the standard deviation of the selected variable within the brackets.ESS <- cleandata %>%
select(Epworth_total) %>%
summarise(ESSaverage = mean(Epworth_total),
ESSsd = sd(Epworth_total))
SSS (Stanford Sleepiness Scale)
To calculate SSS, the original values and variables from cleandata cannot be used and a new transformed variable must be created. I didn't realise this at first, so when I tried calculating the means and standard deviations in the same way as before, RStudio kept giving me errors.
SSSerror <- cleandata %>%
select(AlertTest_1_Feel) %>%
summarise(SSSaverage = mean(AlertTest_1_Feel),
SSSsd = sd(AlertTest_1_Feel))
## Error: Problem with `summarise()` column `SSSsd`.
## ℹ `SSSsd = sd(AlertTest_1_Feel)`.
## x Calling var(x) on a factor x is defunct.
## Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.
I had a closer look at the data, and realised there were 4 variables with similar names. I tried binding the 4 variables into one and then calculating the means and standard deviations for this binded variable. I did this by first selecting the 4 variables using select() and using drop_na() to remove obversations containing a NA response. I then attempted to compile them into one variable using rbind() and used summarise() like before to calculate for the means and standard deviations for this binded variable. This method did output values, however they did not match to the ones published in the study.
SSStrial <- replicationdata %>%
select(AlertTest_1_Feel,
AlertTest_2_Feel,
AlertTest_3_Feel,
AlertTest_4_Feel) %>%
drop_na() %>%
summarise(SSStrialaverage = mean(rbind(AlertTest_1_Feel, AlertTest_2_Feel, AlertTest_3_Feel, AlertTest_4_Feel)),
SSStrialsd = sd(rbind(AlertTest_1_Feel, AlertTest_2_Feel, AlertTest_3_Feel, AlertTest_4_Feel)))
Thus, I had a closer look at the data again and realised that the data under AlertTest_1_Feel was a mixture of numeric and character vectors. I realised I had to create a new variable that contains only numeric values in order to calculate my summary statistics. I used mutate() to create a new column "SSSvalue" to the existing cleandata dataframe. as.numeric() changes the variable "AlertTest_1_Feel" to numeric values. To do this, I need to specify what numeric values I want it changed to (levels = 1:5) and then indicating in labels = which observations that are needed to be replaced. Thus, the labels are changed to numeric values 1-5.
cleandata <- cleandata %>%
mutate(
SSSvalue = as.numeric(
x = AlertTest_1_Feel,
levels = 1:5,
labels = c("1 - Feeling active, vital alert, or wide awake",
"2 - Functioning at high levels, but not at peak; able to concentrate",
"3 - Awake, but relaxed; responsive but not fully alert",
"4 - Somewhat foggy, let down",
"5 - Foggy; losing interest in remaining awake; slowed down"),
exclude = NA
)
)
Now that I've changed my SSS variable to numeric values, I can now use the same method I previously used to calculate the mean and standard deviation.
SSS <- cleandata %>%
select(SSSvalue) %>%
summarise(SSSaverage = mean(SSSvalue),
SSSsd = sd(SSSvalue))
Baseline Implicit Bias
Baseline Implicit Bias is the combination of the baseline implicit biases scores of both the racial and gender IAT. Thus, to calculate the mean and standard deviation for Baseline Implicit Bias, the method will be similar to what I've done before but I'll be selecting two variables (base_IAT_race and base_IAT_gen) using select() and using rbind() to bind together the race and gender baseline IAT values to calculate the mean and standard deviation for this binded value.
BIB <- cleandata %>%
select(
base_IAT_race,
base_IAT_gen) %>%
summarise(
BIBaverage = mean(rbind(base_IAT_race, base_IAT_gen)),
BIBsd = sd(rbind(base_IAT_race, base_IAT_gen))
)
Prenap Implicit Bias
Similar to Baseline Implicit Bias, Prenap Implicit Bias is the combination of the prenap implicit biases scores of both the racial and gender IAT. Thus, the method will be similar to before, except I'll be using the two variables pre_IAT_race and pre_IAT_gen.
PrenapIB <- cleandata %>%
select(
pre_IAT_race,
pre_IAT_gen) %>%
summarise(
PrenapIBaverage = mean(
rbind(pre_IAT_race, pre_IAT_gen)
),
PrenapIBsd = sd(
rbind(pre_IAT_race, pre_IAT_gen))
)
Postnap implicit bias
Similar to Baseline Implicit Bias, Postnap Implicit Bias is the combination of the postnap implicit biases scores of both the racial and gender IAT. The method will be similar to before, except I'll be using the two variables post_IAT_race and post_IAT_gen.
PostnapIB <- cleandata %>%
select(
post_IAT_race,
post_IAT_gen) %>%
summarise(
PostnapIBaverage = mean(
rbind(
post_IAT_race,
post_IAT_gen
)),
PostnapIBsd = sd(
rbind(
post_IAT_race,
post_IAT_gen
))
)
One-week delay implicit bias
Similar to Baseline Implicit Bias, One-week Delay Implicit Bias is the combination of the implicit biases scores of both the racial and gender IAT after a one-week delay. The method will be similar to before, except I'll be using the two variables week_IAT_race and week_IAT_gen.
OWDIB <- cleandata %>%
select(
week_IAT_race,
week_IAT_gen) %>%
summarise(
OWDIBaverage = mean(
rbind(
week_IAT_race,
week_IAT_gen
)
),
OWDIBsd = sd(
rbind(
week_IAT_race,
week_IAT_gen
)
)
)
Average sex
The table has average sex displayed as a percentage of males. Thus, I'm creating a new variable Male and using select() to choose the variable for sex. tally() counts how many "male" values are observed in the variable General_1_Sex. I then create a new variable Male_percentage that is the percentage of males in the sample by fractioning out of 31. It is fractioned out of 31 as the cleandata data has 31 participants.
Male <- cleandata %>%
select(General_1_Sex) %>%
tally(General_1_Sex == "Male")
Male_percentage <- Male/31
print(Male_percentage)
## n
## 1 0.483871
Average Cue played during nap (% racial cue)
I will use a similar method as above to calculate the percentage of racial cues during the nap. I create a new variable Napcue by first selecting my needed variable using select(). I then use tally() to count how many "race cue played" observations are within the variable "Cue_condition". I use the newly-made Napcue variable to create a percentage by fractioning out of 31. Again, it is fractioned out of 31 as the cleandata data set has 31 participants.
Napcue <- cleandata %>%
select(Cue_condition) %>%
tally(Cue_condition == "race cue played")
racialcue_perentage <- Napcue/31
print(racialcue_perentage)
## n
## 1 0.5483871
Creating Table 1
Now that I've calculated all the variables required, I can now create the table. After doing some research, there were three main packages for creating a table: kableExtra(), magick() and gt().
I decided to go with gt() because it seemed to produce the cleanest table. If it wasn't what I wanted, then I would then try kableExtra(), and then magick().
I wasn't sure how to input the information, but I decided to just create a new dataframe called table1 using the values that were reproduced from earlier. To create a data frame, I used tibble(). Within the brackets, I have to specify what each column will be named and then within c() I specify what data will be contained within each column.
table1 <- tibble(
Characteristics = c("Age (yrs)", "ESS", "SSS", "Baseline implicit bias", "Prenap implicit bias", "Postnap implicit bias", "One-week delay implicit bias", "Sex (% male)", "Cue played during nap (% racial cue)"),
Mean = c(19.5, 15.3, 2.81, 0.557, 0.257, 0.278, 0.399, 0.484, 0.548),
SD = c(1.23, 2.83, 0.749, 0.406, 0.478, 0.459, 0.425, NA, NA)
)
I now create my table using gt(). I first have to specify what dataframe I'm using then using the pipe operator %>% to use that as my input for the gt()package. tab_header is from the gt() package and adds a table header to the gt() table. This function allows for a title and a subtitle so I have to specify that I'm creating a title. The title text is placed in quotation marks.
table1 %>%
gt() %>%
tab_header(
title = "Participant characteristics")
| Participant characteristics | ||
|---|---|---|
| Characteristics | Mean | SD |
| Age (yrs) | 19.500 | 1.230 |
| ESS | 15.300 | 2.830 |
| SSS | 2.810 | 0.749 |
| Baseline implicit bias | 0.557 | 0.406 |
| Prenap implicit bias | 0.257 | 0.478 |
| Postnap implicit bias | 0.278 | 0.459 |
| One-week delay implicit bias | 0.399 | 0.425 |
| Sex (% male) | 0.484 | NA |
| Cue played during nap (% racial cue) | 0.548 | NA |
After some feedback from my learning logs and collaboration with my group members, I was able to create a new dataframe that did not require me to manually input values. The format to do this is dataframe$variable - for example, if I want to take the prenap implicit bias mean I first specify the dataframe it is coming from (i.e. PrenapIB) and then the variable itself (i.e. PrenapIBsd). In the designated format, the prenap implicit bias mean will look like this: PrenapIB$PrenapIBsd. This reduces the possibility of a mistyped numerical value.
table1.2 <- tibble( #3 columns
Characteristics = c("Age (yrs)", "ESS", "SSS", "Baseline implicit bias", "Prenap implicit bias", "Postnap implicit bias", "One-week delay implicit bias", "Sex (% male)", "Cue played during nap (% racial cue)"), #label
Mean = c(ageaverage$ageaverage, ESS$ESSaverage, SSS$SSSaverage, BIB$BIBaverage, PrenapIB$PrenapIBaverage, PostnapIB$PostnapIBaverage, OWDIB$OWDIBaverage, Male_percentage$n, racialcue_perentage$n),
SD = c(ageaverage$agesd, ESS$ESSsd, SSS$SSSsd, BIB$BIBsd, PrenapIB$PrenapIBsd, PostnapIB$PostnapIBsd, OWDIB$OWDIBsd, NA, NA)
)
I also improved the formatting of the Table by using md() from the gt() package. It allows for formatting of text so I can use bolded fonts. To format values to output to 2 decimal places as is seen in the original paper, I used fmt_number() and decimals = from gt() package which controls the formatting of numeric values.
columns = specifies how to format the columns. vars is similar to select() in that it selects the variables that are needed. Combined, these two arguments specify that I want mean and sd as my columns.
To add a source note for the table footer, I used tab_source_note(). To remove the first column label "characteristics", I used cols_label. Characteristics = " " specifies that I want a blank space to replace the "Characteristics" label.
table1.2 %>%
gt() %>%
tab_header(title = md("**Table 1. Participant characteristics.**")) %>%
fmt_number(columns = vars(Mean, SD),
decimals = 2) %>%
tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint") %>%
cols_label(Characteristics = " ")
| Table 1. Participant characteristics. | ||
|---|---|---|
| Mean | SD | |
| Age (yrs) | 19.55 | 1.23 |
| ESS | 15.29 | 2.83 |
| SSS | 2.81 | 0.75 |
| Baseline implicit bias | 0.56 | 0.41 |
| Prenap implicit bias | 0.26 | 0.48 |
| Postnap implicit bias | 0.28 | 0.46 |
| One-week delay implicit bias | 0.40 | 0.43 |
| Sex (% male) | 0.48 | NA |
| Cue played during nap (% racial cue) | 0.55 | NA |
| Implicit bias values are the average of D600 score for each timepoint | ||
Table 2 displays the means and standard deviations for the race and gender implicit bias scores at baseline and prenap. These values need to be calculated as the open data file did not contain this information.
I first attempted to do this by manually and separately calculating each variable. For example, to calculate the mean and SD of the baseline Race bias, I created new data new data BaselineRace from cleandata (the dataset without the excluded participants). I used the select() function to select only the variable "base_IAT_race" from cleandata. Then, using the summarise() function I claculated the mean and SD for the selected variable.
BaselineRace <- cleandata %>%
select(base_IAT_race) %>%
summarise(BaselineRaceMean = mean(base_IAT_race), BaselineRaceSD = sd(base_IAT_race))
I followed that method for all four data variables, but one of my group members, Julia, showed our group a way to clean up the data and get the same output in a more efficient way: she created a new data set called implicitbiaslevels using <- and selected the 4 variables base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen from the cleandata dataset using the select() function. Then, she calculated the means and standard deviations for all those variables using the summarise() function. She captured and calculatesd all the means and SDs of each variable that contains "IAT" within its variable name using contains("IAT") and listed the calculated means and sds for each variable under the label "mean" and "sd" using the list() function. Now the data "implicitbiaslevels" data has the means and SDs for each of the variables previously selected, instead of having to separately code for each mean and SD needed and then having to create a new dataframe.
implicitbiaslevels <- cleandata %>%
select(base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen)
implicitbiaslevels <- implicitbiaslevels %>%
summarise(across(contains("IAT"), list(mean = mean, sd = sd)))
I wanted to see if this new method of coding that Julia used was able to be outputted into a table without having to create a new data frame. I hoped that this simplified code could be used to quickly and simply recreate the table. I used the same gt() package as was used for Table 1.
As was seen in Table 1, tab_header() is used to specify the table title and tab_source_note() creates a source note in the table footer. fmt_number() is used to format number in gt(). The decimals = argument formats values to 2 decimal places and vars() select which variables are to be used for each column.
However, Table 2 is different to Table 1 in that this table has grouped columns and grouped rows. I attempted to format the columns by using tab_spanner(). The label = argument specifies the spanner column label while the columns = argument specifies what variables are to be placed underneath the spanner column label. In a similar way, I attempted to format the rows by using tab_row_group(). The label = argument specifies what heading I want for this grouped row and rows = specifies how many rows I want in this row group.
However, this method kept giving me errors so I had to try another method.
implicitbiaslevels %>%
gt() %>%
tab_header(
title = "Table 2: Race and Gender Implicit Bias Levels") %>%
tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>%
fmt_number(columns = vars(base_IAT_race_mean, pre_IAT_race_mean, base_IAT_race_sd, pre_IAT_race_sd),
decimals = 2) %>%
tab_spanner(
label = "Baseline",
columns = c(implicitbiaslevels$base_IAT_race_mean, implicitbiaslevels$base_IAT_race_sd)
) %>%
tab_spanner(
label = "Prenap",
columns = c(implicitbiaslevels$pre_IAT_race_mean, implicitbiaslevels$pre_IAT_race_sd)
) %>%
tab_row_group(
label = "Race",
rows = 1
) %>%
tab_row_group(
label = "Gender",
rows = 2
)
## Error: Must subset columns with a valid subscript vector.
## x Can't convert from <double> to <integer> due to loss of precision.
Since I was having so much trouble formatting my columns and values according to the original data frame implicitbiaslevels, a new data frame had to be created that would suit Table 2's formatting. I first tried creating a new data frame like below and using it as the input for the table code used above. The only difference between the table code below and the one above is the data frame input and the variables I chose for my columns and rows.
Changing the dataframe created a table. However, there were several issues with this table - the row values are not in line with row labels and the column labels need to be renamed.
table2 <- tibble(
mean1 = c(0.6186929, 0.4943818),
mean2 = c(0.2023364, 0.3109984),
SD1 = c(0.4423884, 0.36228),
SD2 = c(0.5633004, 0.3748071))
table2 %>%
gt() %>%
tab_header(
title = "Table 2: Race and Gender Implicit Bias Levels") %>%
tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>%
fmt_number(columns = vars(mean1, mean2, SD1, SD2),
decimals = 2) %>%
tab_spanner(
label = "Baseline",
columns = c(mean1, SD1)
) %>%
tab_spanner(
label = "Prenap",
columns = c(mean2, SD2)
) %>%
tab_row_group(
label = "Race",
rows = 1
) %>%
tab_row_group(
label = "Gender",
rows = 2
)
| Table 2: Race and Gender Implicit Bias Levels | |||
|---|---|---|---|
| Baseline | Prenap | ||
| mean1 | SD1 | mean2 | SD2 |
| Gender | |||
| 0.49 | 0.36 | 0.31 | 0.37 |
| Race | |||
| 0.62 | 0.44 | 0.20 | 0.56 |
| Implicit bias values are the average D600 score for each timepoint | |||
After collaboration with the group, we were able to resolve this issue. By adding a column in the dataframe that specifies the row labels, a nicely-formatted table can be created. We created a new dataframe including the column for the row labels and also cleaned up our data by inputting our values in the dataframe$variable format.
table2label <- tibble(
label = c("Race", "Gender"),
mean1 = c(implicitbiaslevels$base_IAT_race_mean, implicitbiaslevels$base_IAT_gen_mean),
mean2 = c(implicitbiaslevels$pre_IAT_race_mean, implicitbiaslevels$pre_IAT_gen_mean),
SD1 = c(implicitbiaslevels$base_IAT_race_sd, implicitbiaslevels$base_IAT_gen_sd),
SD2 = c(implicitbiaslevels$pre_IAT_race_sd, implicitbiaslevels$pre_IAT_gen_sd))
To improve the formatting of Table 2, the tab_row_group() function is removed so that the row labels and values can be in line with each other. Further, to change the column labels we used cols_label() and to use the bold font on our column labels and table title we used md() from the gt() package.
table2label %>%
gt() %>%
tab_header(
title = md("**Table 2. Race and gender implicit bias levels**")) %>%
tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>%
fmt_number(columns = vars(mean1, mean2, SD1, SD2),
decimals = 2) %>%
tab_spanner(
label = md("**Baseline**"),
columns = c(mean1, SD1)
) %>%
tab_spanner(
label = md("**Prenap**"),
columns = c(mean2, SD2)
) %>%
cols_label(mean1 = "Mean", mean2 = "Mean", SD1 = "SD", SD2 = "SD", label = " ")
| Table 2. Race and gender implicit bias levels | ||||
|---|---|---|---|---|
| Baseline | Prenap | |||
| Mean | SD | Mean | SD | |
| Race | 0.62 | 0.44 | 0.20 | 0.56 |
| Gender | 0.49 | 0.36 | 0.31 | 0.37 |
| Implicit bias values are the average D600 score for each timepoint | ||||
After finally understanding the basics of the gt() package, reproducing Table 3 and 4 was relatively simple.
Table 3 contains the means and standard deviations of implicit bias levels by cued and uncued conditions, over 4 timepoints. The calculations for the summary statistics and the formatting for the table is very similar to Table 2.
Calculating the means and standard deviations
I first attempted calculating the means and standard deviations for the cued IATs only. The process to calculate this is similar to how I calculated the implicit bias levels for Table 2. I created a new variable cued, using select() to select the 4 cued data variables. To calculate the means and standard deviations of all of these variables, I used a combination of summarise() and across(contains()). This captures and calculates the means and standard deviations of each variable that contains "IAT" within its variable name. list() calculates the means and standard deviation for each variable under the label "mean" and "sd".
Now the data "cued" has the means and standard deviations for each of the variables previously selected, instead of having to separately code for each mean and SD needed and then having to create a new dataframe.
cued <- cleandata %>%
select(baseIATcued, preIATcued, postIATcued, weekIATcued)
cued <- cued %>%
summarise(across(contains("IAT"), list(mean = mean, sd = sd)))
After checking that this simplified code worked, I decided to tried calculating all the cued and uncued conditions together. I adjusted the above code by creating a new variable cued_uncued and selecting all 8 variables - both the cued and uncued variables at all 4 timepoints.
cued_uncued <- cleandata %>%
select(baseIATcued, preIATcued, postIATcued, weekIATcued,
baseIATuncued, preIATuncued, postIATuncued, weekIATuncued)
cued_uncued <- cued_uncued %>%
summarise(across(contains("IAT"), list(mean = mean, sd = sd)))
Creating the data frame for the table
After calculating all the means and standard deviations for the cued and uncued conditions, I had to create a new dataframe with the calculated values from "cued_uncued":
table3 <- tibble(
label = c("Baseline", "Prenap", "Postnap", "1-week delay"),
mean1 = c(0.518, 0.211, 0.307, 0.4),
mean2 = c(0.595, 0.302, 0.249, 0.399),
SD1 = c(0.363, 0.514, 0.445, 0.387),
SD2 = c(0.447, 0.442, 0.478, 0.467)
)
Formatting the table
Using gt(), Table 3 is created.
table3 %>%
gt() %>%
tab_header(
title = md("**Table 3. Implicit bias levels by condition**")) %>%
tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>%
fmt_number(columns = vars(mean1, mean2, SD1, SD2), decimals = 2) %>%
tab_spanner(
label = "Cued",
columns = c(mean1, SD1)
) %>%
tab_spanner(
label = "Uncued",
columns = c(mean2, SD2)
) %>%
cols_label(mean1 = "Mean", mean2 = "Mean", SD1 = "SD", SD2 = "SD", label = " ")
| Table 3. Implicit bias levels by condition | ||||
|---|---|---|---|---|
| Cued | Uncued | |||
| Mean | SD | Mean | SD | |
| Baseline | 0.52 | 0.36 | 0.59 | 0.45 |
| Prenap | 0.21 | 0.51 | 0.30 | 0.44 |
| Postnap | 0.31 | 0.45 | 0.25 | 0.48 |
| 1-week delay | 0.40 | 0.39 | 0.40 | 0.47 |
| Implicit bias values are the average D600 score for each timepoint | ||||
Reproducing Table 4 required more trial-and-error and research than I had originally anticipated. Table 4 requires us to calculate the number of responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for 1 participant as they did not hear the sound cue on the final exit questionnaire.
Calculating the needed values
I first calculated the number of people who said "no"" and "maybe" in the verbal report using:
select() to select the variable from my "cleandata" dataset, andtally() to count the number of "no" / "maybe" responses in the selected variable.glimpse() to check whether the calculated value matches the value seen in the original paperreport_no <- cleandata %>%
select(heard_cue_report) %>%
tally(heard_cue_report == "no")
report_maybe <- cleandata %>%
select(heard_cue_report) %>%
tally(heard_cue_report == "maybe, unsure, unclear")
glimpse(report_maybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(report_no)
## Rows: 1
## Columns: 1
## $ n <int> 28
I then tried to add the two calculated variables together using sum() but it kept giving me the error: "Error: Must subset columns with a valid subscript vector. x Subscript has the wrong type data.frame<n:integer>. ℹ It must be numeric or character.".
report <- cleandata %>%
select(report_no, report_maybe) %>%
sum(c(report_no, report_maybe))
## Error: Must subset columns with a valid subscript vector.
## x Subscript has the wrong type `data.frame<n:integer>`.
## ℹ It must be numeric or character.
Because of this error, I thought the variables were integers so I tried to convert them to numeric using as.numeric().
report_no <- as.numeric(as.integer(report_no))
report_maybe <- as.numeric(as.integer(report_maybe))
I used typeof() to determine what category the above variables are. It outputted as "double" and after some researching on Google, I realised that a "double" output means it is a numeric value.
typeof(report_maybe)
## [1] "double"
typeof(report_no)
## [1] "double"
I tried using the unlist() function because I read online that the error might be because the code is only temporary, and that the unlist() function makes the code permanent. Again, I used typeof() to check if it's still returning as a numeric function.
report_maybe <- as.numeric(unlist(report_maybe))
report_no <- as.numeric(unlist(report_no))
typeof(report_maybe)
## [1] "double"
Now, I tried to use sum() again to calculate the total, hoping the error doesn't pop up, but now it's giving a different error:
report_total <- cleandata %>%
select(report_no, report_maybe) %>%
sum(report_no, report_maybe)
## Error in FUN(X[[i]], ...): only defined on a data frame with all numeric-alike variables
I then decided to just manually add the data using c() which combines the arguments within the brackets.
report_total <- c(2, 28)
sum(report_total)
## [1] 30
Now, I had to calculate the data for the exit questionnaire. I calculated the total number of responses for the exit questionnaire by manually adding the data using c().
exit_total <- c(0, 29)
sum(exit_total)
## [1] 29
When calculating the responses for the exit questionnaire, I finally realised that I had to exclude a participant because of their incomplete response because I kept getting an incorrect value of 29, instead of 28.
exit_no <- cleandata %>%
select(heard_cue_exit) %>%
tally(heard_cue_exit == "no")
exit_maybe <- cleandata %>%
select(heard_cue_exit) %>%
tally(heard_cue_exit == "unsure")
glimpse(exit_no)
## Rows: 1
## Columns: 1
## $ n <int> 29
glimpse(exit_maybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
Thus, I started over from the beginning and created a new dataframe without the participant who had incomplete response. I used na.omit() to remove the row with the incomplete response.
soundcuereporting <- cleandata %>%
select(heard_cue_exit, heard_cue_report) %>%
na.omit()
I then recalculated Table 4's data without the omitted participant. This gave me the correct number of responses.
report_no <- soundcuereporting %>%
select(heard_cue_report) %>%
tally(heard_cue_report == "no")
report_maybe <- soundcuereporting %>%
select(heard_cue_report) %>%
tally(heard_cue_report == "maybe, unsure, unclear")
glimpse(report_maybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(report_no)
## Rows: 1
## Columns: 1
## $ n <int> 28
exit_no <- soundcuereporting %>%
select(heard_cue_exit) %>%
tally(heard_cue_exit == "no")
exit_maybe <- soundcuereporting %>%
select(heard_cue_exit) %>%
tally(heard_cue_exit == "unsure")
glimpse(exit_no)
## Rows: 1
## Columns: 1
## $ n <int> 28
glimpse(exit_maybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
I then realised that the values I had calculated so far were actually for the "Total" columns and not for the cells in the middle of the table. Thus, this code below calculates for the cells in the middle of the table. I used filter() to consider participants with the specified responses in both exit questionnaire and verbal report. To calculate the number of variables left after filtering, I used summarise(n=n()). This provides the number of responses in the given variable. glimpse() allowed me to check if these values are correct.
exitno_reportno <- soundcuereporting %>%
filter(heard_cue_exit == "no", heard_cue_report == "no") %>%
summarise(n=n())
exitno_reportmaybe <- soundcuereporting %>%
filter(heard_cue_exit == "no", heard_cue_report == "maybe, unsure, unclear") %>%
summarise(n=n())
exitmaybe_reportno <- soundcuereporting %>%
filter(heard_cue_exit == "unsure", heard_cue_report == "no") %>%
summarise(n=n())
exitmaybe_reportmaybe <- soundcuereporting %>%
filter(heard_cue_exit == "unsure", heard_cue_report == "maybe, unsure, unclear") %>%
summarise(n=n())
glimpse(exitno_reportno)
## Rows: 1
## Columns: 1
## $ n <int> 26
glimpse(exitno_reportmaybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(exitmaybe_reportno)
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(exitmaybe_reportmaybe)
## Rows: 1
## Columns: 1
## $ n <int> 0
I also calculated the total participants overall using summarise(n=n()).
soundcuereporting_total <- soundcuereporting %>%
summarise(n=n())
Creating the dataframe for the table
Now that all the variables have been calculated, I put the variables into an appropriately formatted dataframe:
table4 <- tibble(
label = c("No", "Maybe", "Total"),
no = c(26, 2, 28),
maybe = c(2, 0, 2),
total = c(28, 2, 30)
)
** Creating the table **
I created a table using the gt() package, using the same functions as I used for Tables 1-3 but the formatting did not output right.The row order did not match, and there was a double-line located in the middle of the table.
table4 %>%
gt() %>%
tab_header(
title = "Table 4. Sound cue reporting. ") %>%
tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>%
fmt_number(columns = vars(no, maybe, total)) %>%
tab_spanner(
label = "Reported Hearing Cue on Verbal Report?",
columns = c(no, maybe, total)
) %>%
tab_row_group(
label = "Reported Hearing Cue on Exit Questionnaire?",
rows = 3
)
| Table 4. Sound cue reporting. | |||
|---|---|---|---|
| label | Reported Hearing Cue on Verbal Report? | ||
| no | maybe | total | |
| Reported Hearing Cue on Exit Questionnaire? | |||
| Total | 28.00 | 2.00 | 30.00 |
| No | 26.00 | 2.00 | 28.00 |
| Maybe | 2.00 | 0.00 | 2.00 |
| Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire. | |||
I tried to change up the formatting by adding column labels using cols_label() in order to change the column labels, but it didn't seem to change anything.
table4 %>%
gt() %>%
tab_header(
title = "Table 4. Sound cue reporting. ") %>%
tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>%
fmt_number(columns = vars(no, maybe, total)) %>%
tab_spanner(
label = "Reported Hearing Cue on Verbal Report?", #Spanner column label
columns = c(no, maybe, total)
) %>%
tab_row_group(
label = "Reported Hearing Cue on Exit Questionnaire?",
rows = 3) %>%
cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
| Table 4. Sound cue reporting. | |||
|---|---|---|---|
| Reported Hearing Cue on Verbal Report? | |||
| No | Maybe | Total | |
| Reported Hearing Cue on Exit Questionnaire? | |||
| Total | 28.00 | 2.00 | 30.00 |
| No | 26.00 | 2.00 | 28.00 |
| Maybe | 2.00 | 0.00 | 2.00 |
| Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire. | |||
I thought it might be an issue with the dataframe used, so I decided to make a new dataframe. I modified the dataframe by adding a column for the row labels.
table4_ <- tibble(
label = c("label", "No", "Maybe", "Total"),
no = c("No", 26, 2, 28),
maybe = c("Maybe", 2, 0, 2),
total = c("Total", 28, 2, 30)
)
I used the same code to create the table but it kept giving me an error: "The fmt_number() function can only be used on columns with numeric data". This is very likely due to the added characters I added in the new dataframe.
table4_ %>%
gt() %>%
tab_header(
title = "Table 4. Sound cue reporting. ") %>%
tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>%
fmt_number(columns = vars(no, maybe, total)) %>%
tab_spanner(
label = "Reported Hearing Cue on Verbal Report?",
columns = c(no, maybe, total)
) %>%
tab_row_group(
label = "Reported Hearing Cue on Exit Questionnaire?",
rows = 3
) %>%
cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
## Error: The `fmt_number()` function can only be used on `columns` with numeric data
I decided to go back to the original dataframe, but changed the tab_row_group() function to tab_stubhead(), hoping that it would change the weird ordering of the rows. It did change it and now the table looks similar to the original paper, but the tab stubhead label for "Reported Hearing Cue on Exit Questionnaire?" is not appearing.
table4 %>%
gt() %>%
tab_header(
title = "Table 4. Sound cue reporting. ") %>%
tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>%
fmt_number(columns = c(no, maybe, total)) %>%
tab_spanner(
label = "Reported Hearing Cue on Verbal Report?", #Spanner column label
columns = c(no, maybe, total)
) %>%
tab_stubhead(
label = "Reported Hearing Cue on Exit Questionnaire?"
) %>%
cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
| Table 4. Sound cue reporting. | |||
|---|---|---|---|
| Reported Hearing Cue on Verbal Report? | |||
| No | Maybe | Total | |
| No | 26.00 | 2.00 | 28.00 |
| Maybe | 2.00 | 0.00 | 2.00 |
| Total | 28.00 | 2.00 | 30.00 |
| Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire. | |||
By adding rowname_col = "label", I was able to add the tab stubhead label.
table4 %>%
gt(rowname_col = "label") %>%
tab_header(
title = "Table 4. Sound cue reporting. ") %>%
tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>%
fmt_number(columns = c(no, maybe, total)) %>%
tab_spanner(
label = "Reported Hearing Cue on Verbal Report?",
columns = c(no, maybe, total)
) %>%
tab_stubhead(label = "Reported Hearing Cue on Exit Questionnaire?"
) %>%
cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
| Table 4. Sound cue reporting. | |||
|---|---|---|---|
| Reported Hearing Cue on Exit Questionnaire? | Reported Hearing Cue on Verbal Report? | ||
| No | Maybe | Total | |
| No | 26.00 | 2.00 | 28.00 |
| Maybe | 2.00 | 0.00 | 2.00 |
| Total | 28.00 | 2.00 | 30.00 |
| Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire. | |||
I then used the md() function to change the aesthetics of the headings (e.g. bolded, italics). I also added decimals = 0 to the fmt_number column so that there were no decimals showing.
table4 %>%
gt(rowname_col = "label") %>%
tab_header(
title = md("**Table 4. Sound cue reporting.**")) %>%
tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>%
fmt_number(columns = c(no, maybe, total), decimals = 0) %>%
tab_spanner(
label = md("**Reported Hearing Cue on Verbal Report?**"),
columns = c(no, maybe, total)
) %>%
tab_stubhead(label = md("**Reported Hearing Cue on Exit Questionnaire?**")
) %>%
cols_label(no = md("**No**"), maybe = md("**Maybe**"), total = md("***Total***"), label = " ")
| Table 4. Sound cue reporting. | |||
|---|---|---|---|
| Reported Hearing Cue on Exit Questionnaire? | Reported Hearing Cue on Verbal Report? | ||
| No | Maybe | Total | |
| No | 26 | 2 | 28 |
| Maybe | 2 | 0 | 2 |
| Total | 28 | 2 | 30 |
| Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire. | |||
Our group decided to reproduce Figure 4 first as it seemed simpler than the other two figures. Figure 4 illustrates changes in implicit bias levels following the TMR procedure at the immediate and one-week delay tests.
Calculating the needed values
The first pair of column bars in the figure encapsulate the immediate change in implicit bias levels following the TMR procedure. This is calculated as the difference between prenap implicit bias and postnap implicit bias. The second pair of column bars encapsulates the change in implicit bias levels, the week following the TMR procedure. This is calculated as the difference between prenap and implicit bias a week later.
These values can be taken from Table 3, which contains the means and standard deviations of implicit bias levels by cued and uncued conditions, over 4 timepoints. Using values from Table 3, we first calculated the changesin implicit bias levels at the immediate and one-week delay tests.
pre_post_change_cued = 0.31 - 0.21
pre_post_change_uncued = 0.25 - 0.3
pre_week_change_cued = 0.40 - 0.21
pre_week_change_uncued = 0.40 - 0.30
We first tried manually creating a new dataframe fig4 using the tibble() function. 3 columns are created: "change_from_pre_to", "cued" and "uncued". "change_from_pre_to" has the labels for immediate and week, while the cued and uncued columns/values are taken from the chunk above. fig4 tibble organises the calculated values from above into the cued and uncued conditions, for the immediate (pre- to post-nap) changes and the one-week-delay changes.
fig4 <- tibble(
change_from_pre_to = c("immediate","week"),
cued = c(0.1, 0.19),
uncued = c(-0.05, 0.1))
However, when we thought about using this tibble to create a plot, we realised that the data tibble was not formatted correctly - when creating a data set, you indicate what “variables” go into each group. This also determines what will be the axes of the graph. Since our fig4 tibble was not formatted we created a new dataframe. The time1 dataframe uses the same values as above, but is formatted differently.
time1 includes the two time conditions “immediate” (pre- to post-nap change)and “week” (change from pre-nap to one-week later). rep() replicates/repeats the values within the bracket. The value "2" indicates how many columns is needed for each time point i.e. two (which will be for the cued and uncued conditions).
bias_change is where the relevant values that were calculated previously are entered.
time1 <- c(rep("immediate",2),rep("week",2))
condition <-rep(c("cued","uncued"),2)
bias_change <- c(0.10, -0.05, 0.19, 0.10)
data = data.frame(time1, condition, bias_change)
head(data)
## time1 condition bias_change
## 1 immediate cued 0.10
## 2 immediate uncued -0.05
## 3 week cued 0.19
## 4 week uncued 0.10
Plotting the graph
Now that we've formatted the data into a proper format, it's time to plot the graph. We used the ggplot() package to graph our data.
ggplot() is to indicate we want to graph our data
aes() function for the aesthetics/formatting of the graphfill = indicates that different colours are to be allocated for each conditiongeom_bar() adds a feature to the graph that makes the heights of the bar proportional to the number of cases in each group
position = "dodge" ensure that the separate conditions are not stacked but are instead side by sidestat = "identity" is a statement that needs to include when using geom_bar() as this function reads data in a way that is incompatible with the ‘y’ aesthetic. Normally geom_bar() formats the heights of the bars such that it formats the height to the number of observations in the group, not the value we assign to it. Therefore we need to add stat = "identity" to indicate to R that we want the bar heights to be the values we provide, rather than to the default setting (number of observations).alpha = determines the opacity of a geom, with lower values indicating more transparencyggplot(data = data, aes(
x = time1,
y = bias_change,
fill = condition
)) +
geom_bar(
position = "dodge",
stat = "identity",
alpha=0.7)
Calculating the error bar
Now that the graph has been created, we need to create the error bars. We first assumed that the error bars represent standard deviations, similar to what was calculated in Table 3. However, after attempting to produce the plot with the error labels, we discovered that the error bars weren't matching the error bars seen in the replication study. After some clarification, we realised that the error bars represent one standard error for each change-in-bias group. Thus, we need to calculate the standard error.
I first tried creating a new variable fig4_stderror using the std.error() function from the plotrix package. std.error() calculates the standard errors of the dataframe bias_change. fig4_stderror = data.frame translates this into a dataframe named fig4_stderror. However, it only came up with one standard error value.
fig4_stderror <- std.error(bias_change)
fig4_stderror = data.frame(fig4_stderror)
head(fig4_stderror)
## fig4_stderror
## 1 0.04974937
Thus, I tried doing it another way: I created a new variable biaschangeconditions using select() to select the variables I wanted from the cleandata dataset. Then, using mutate() on the variables I had just selected, I created 4 new variables/columns in the data set (calculated for both cued and uncued conditions, as the differences between prenap implicit bias and postnap implicit bias, as well as the differences between prenap and one-week-delay implicit bias).
biaschangeconditions <- cleandata %>%
select(postIATcued, preIATcued, postIATuncued, preIATuncued, weekIATcued, weekIATuncued)
biaschange <- biaschangeconditions %>%
mutate(immed_cued = postIATcued - preIATcued,
immed_uncued = postIATuncued - preIATuncued,
week_cued = weekIATcued - preIATcued,
week_uncued = weekIATuncued - preIATuncued)
Now, I have to find the means for each of these variables, and then the standard error of each mean. I used summarise_all() from the dplyr package which selects every variable in the dataframe.
biaschange_mean_se <- biaschange %>%
summarise_all(list(mean = mean, se = std.error))
Since I've calculated the standard errors, I need to update the dataframe with the calculated values.
time1 which organises our data into two groups of columns: one for the immediate change in bias (difference between pre-nap bias and post-nap bias), and one for the delayed change in bias (difference between pre-nap bias and bias one week after the experiment).condition allows us to create columns within the aforementioned groups: one for the participants who received the cue, and one for the average of the participants who did not receive the corresponding cue.bias_change1 is the part of the dataframe where the values go according to the aforementioned groups. So, the first value corresponds with the cued immediate change, the second value is the uncued immediate change, and so on.stderror does the same but instead of using the mean values, it uses the SE values.time1 <- c(rep("Immediate",2),rep("Week",2))
Condition <-rep(c("Cued","Uncued"),2)
bias_change1 <- c(biaschange_mean_se$immed_cued_mean,
biaschange_mean_se$immed_uncued_mean,
biaschange_mean_se$week_cued_mean,
biaschange_mean_se$week_uncued_mean)
stderror <- c(biaschange_mean_se$immed_cued_se,
biaschange_mean_se$immed_uncued_se,
biaschange_mean_se$week_cued_se,
biaschange_mean_se$week_uncued_se)
data = data.frame(time1, Condition, bias_change1, stderror)
head(data)
## time1 Condition bias_change1 stderror
## 1 Immediate Cued 0.09593775 0.09759788
## 2 Immediate Uncued -0.05390545 0.10297893
## 3 Week Cued 0.18906891 0.11593440
## 4 Week Uncued 0.09643346 0.09008655
Plotting the graph
Now, I plot the graph again with the updated standard errors and some new aesthetics/formatting. The extra functions I used in this are:
geom_errorbar() from ggplot2 adds the error bars, defined by x, ymin and ymax.
bias_change) minus and plus the standard error, respectively.width = defines how thick the error bars are.colour = specifies what colour the error bar will be.position = position_dodge() ensures that the error bars are sitting side by side, instead of on top of each other.ylim() defines the limits of the plot's y-axis.labs() from gglot2 allows for modification of aesthetics. x = "" ensures there is no label on x axis, while a label is specified for the y-axis and a caption.scale_fill_grey() and theme_classic() turn the default colourful plot into a greyscale.ggplot(data = data, aes(
x = time1,
y = bias_change,
fill = condition
)) +
geom_bar(position = "dodge",
stat = "identity",
alpha = 0.7) +
geom_errorbar(aes(
x= time1,
ymin=bias_change-stderror,
ymax=bias_change+stderror),
width=0.4,
colour="black",
alpha= 0.9,
position = position_dodge(0.9)) +
ylim(-0.2, 0.4) +
labs(x = "",
y = "Bias Change",
title = "Fig 4. Change in implicit bias levels at the immediate and one-week delay tests.") +
theme_classic() +
scale_fill_grey()
Figure 5 is a scatterplot that illustrates if there is an association between differential bias change and SWS x REM sleep duration.
To reproduce Figure 5, two sets of values are needed. For the x-axis, the values for minutes in SWS x minutes in REM is needed and for the y-axis, differential bias change is needed. The x-axis values has already been provided in the open data, labelled under the variable SWSxREM.
For the y-axis, the paper defines differential bias change as “baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias”.
So the equation would look similar to this:
differential bias change = (baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued)
This equation must be applied to each participant's score. Thus,mutate() would be the best function to do this. mutate() is taken from the dplyr package and allows for the creation, modification and deletion of columns. It allows for new variables to be added, while keeping existing ones.
The first part of the equation can be grouped into 2 new variables: cued_differential and uncued_differential. cued_differential will be defined as baseIATcued - weekIAT cued. Likewise, uncued_differential will be defined as baseIATuncued - weekIATuncued. Thus, the equation to create the variable diff_bias_change can alse be defined as:
diff_bias_change = cued_differential - uncued_differential
differential <- cleandata %>%
select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, SWSxREM) %>%
mutate(cued_differential = baseIATcued - weekIATcued,
uncued_differential = baseIATuncued - weekIATuncued,
diff_bias_change = cued_differential - uncued_differential)
head(differential)
## ParticipantID baseIATcued weekIATcued baseIATuncued weekIATuncued SWSxREM
## 1 ub6 0.57544182 0.20377367 0.60953653 0.6827742 276
## 2 ub7 0.09911241 0.45873715 0.64396538 -0.0107046 0
## 3 ub8 0.20577365 0.39859469 1.52435622 0.7118729 408
## 4 ub9 0.35314196 0.92341592 0.13108478 0.2021283 408
## 5 ub11 0.57200207 -0.01869151 0.04879409 0.1307118 32
## 6 ub13 0.31025514 0.56073473 0.90121486 1.1162984 648
## cued_differential uncued_differential diff_bias_change
## 1 0.3716681 -0.07323769 0.44490584
## 2 -0.3596247 0.65466998 -1.01429471
## 3 -0.1928210 0.81248335 -1.00530440
## 4 -0.5702740 -0.07104354 -0.49923043
## 5 0.5906936 -0.08191775 0.67261133
## 6 -0.2504796 -0.21508358 -0.03539601
Plotting the graph
Now that all the needed values has been gathered, it's time to plot the graph. ggplot() package is used to graph the data.
ggplot() initiates that I want to create a ggplot object
data = indicate what data is to be used, followed by the aes() function for the aesthetics/formatting of the graph.aes() function indicates what variables are to be used for the x- and y-axis.geom_point() adds a feature to the graph that allows for scattorplots.geom_smooth() adds a feature to the graph that allows for a regression lineThe overall output was quite strange - the graph contains confidence interval shading, the regression line is not straight, and the x-axis does not begin at the value 0.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth()
To fix this, I did this:
geom_smooth() showed that the confidence interval has been set as se = TRUE by default. Thus, to get rid of the confidence interval shading, the confidence interval must be set as se = FALSE.method = 'loess' by default. loess uses the smoothing method, based around local fitting. method = can alse be set as "lm" (linear model) or glm" (generalised linear model). To make the regression line straight, method must be set as "lm" (linear model).xlim() and ylim can be used, respectively. Within the brackets, two numeric values must be determined, where the first (left) value specifies the lower limit and the second (right) value specifies the upper limit of each axis.This looks almost correct, except the x- and y-axes are still not starting at the value 0.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth(se = FALSE,
method = lm) +
xlim(0, 850) +
ylim(-2, 1.5)
With further research on google, I realised that xlim() and ylim() are used for simple manipulations of limits. For more in depth manipulation of x- and y-axes aesthetics, scale_x_discrete(), scale_x_continuous(), or scale_x_date() can be used. Since the data is continuous, I'll try replacing the xlim() and ylim() with scale_x_continuous() and scale_y_continuous(). Within these functions' brackets, I need to specify some things:
limits defines the limits of the scale.expand has been set on default to allow for some padding/gap on each side for the data variables.expand is set at c(0,0) for both the x- and y-axes.c(...) in c(0,0) combines the arguments (i.e. the values within the brackets 0, 0) to form a vector.However, now the figure is completely weird. It is only showing a dot at the centre of the figure.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth(se = FALSE,
method = lm) +
scale_x_continuous(limits = 0, 850,
expand = c(0,0)) +
scale_y_continuous(limits = -2, 1.5,
expand = c(0,0))
Looking back at my code, I realised that when I inputted the values for the limits I wanted for the x- and y-axes, I didn't include them within c(...), and thus it didn't return them as a vector. Instead it set the x-axis limit as 0 with an axis label of "1000" and the y-axis limit as -2 with an axis label of "1.5". Adding all the values into c(...) fixed the issue of the axis limits.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth(se = FALSE,
method = lm) +
scale_x_continuous(limits = c(0, 850),
expand = c(0,0)) +
scale_y_continuous(limits = c(-2, 1.5),
expand = c(0,0))
Now, I wanted to add aesthetics and formatting for the figure, using labs(). Note: I used subtitle = instead of title = here because not all of the title could be seen since the default title = font was too big to include everything.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth(se = FALSE,
method = lm) +
scale_x_continuous(limits = c(0, 850),
expand = c(0,0)) +
scale_y_continuous(limits = c(-2, 1.5),
expand = c(0,0)) +
labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
x = "SWS x REM sleep duration (min)",
y = "Differential bias change")
I wanted to change it so that the x-axis values seen matches what is seen in the original paper i.e. only the values 0 and 500 are shown. To do this, I added breaks = in the scale_x_continuous() function and specified the values I wanted to see in c(...). This displays on the x-axis only the values that have been specified. I also did this for the y-axis, so that the value 1.5 is shown and the axis is in intervals of 0.5.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth(se = FALSE,
method = lm) +
scale_x_continuous(limits = c(0, 850),
breaks = c(0,500),
expand = c(0,0)) +
scale_y_continuous(limits = c(-2, 1.5),
breaks = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5),
expand = c(0,0)) +
labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
x = "SWS x REM sleep duration (min)",
y = "Differential bias change")
To make it more similar to the figure seen in the paper, I want to remove the grey background and the grid lines. To do this, I experimented with a few themes by adding theme_bw(), theme_light(), theme_minimal() and theme_classic(). In the end, the one that worked best was theme_classic().
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
)) +
geom_point() +
geom_smooth(se = FALSE,
method = lm) +
scale_x_continuous(limits = c(0, 850),
breaks = c(0,500),
expand = c(0,0)) +
scale_y_continuous(limits = c(-2, 1.5),
breaks = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5),
expand = c(0,0)) +
labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
x = "SWS x REM sleep duration (min)",
y = "Differential bias change") +
theme_classic()
We reproduced Figure 3 last because it seemed the most difficult. Figure 3 is line graph with error bars and two lines. It illustrates the mean bias scores for both the cued and uncued conditions over the 4 IAT timepoints.
Calculating the values
First, we need to calculate the standard errors for the error bars and means for the data points. The variables selected are the IAT scores for the 4 timepoints (baseline, prenap, postnap and one-week delay) for both the cued and uncued conditions. Calculating the means and standard errors is a similar process to what was done for Figure 4. We calculate the averages by using summarise_all() and use std.error to calculate the standard errors.
choosing_bias <- cleandata %>%
select(baseIATcued, baseIATuncued, preIATcued, preIATuncued, postIATcued, postIATuncued, weekIATcued, weekIATuncued)
bias_av_se <- choosing_bias %>%
summarise_all(list(mean = mean, se = std.error))
Creating a dataframe for the plot
Now, we need to create a datafram for the scatterplot. A column is needed to specify the cued/uncued conditions and another column is need for when the implicit bias was measured (baseline,prenap, postnap, or one week after). The other two columns specify the means and standard errors, respectively.
data4 <- data.frame(
Condition = factor(c("Cued", "Cued", "Cued", "Cued",
"Uncued", "Uncued", "Uncued", "Uncued")),
time = factor(c("Baseline", "Prenap", "Postnap", "1-week",
"Baseline", "Prenap", "Postnap", "1-week")),
bias_av = c(bias_av_se$baseIATcued_mean, bias_av_se$preIATcued_mean,
bias_av_se$postIATcued_mean, bias_av_se$weekIATcued_mean,
bias_av_se$baseIATuncued_mean, bias_av_se$preIATuncued_mean,
bias_av_se$postIATuncued_mean, bias_av_se$weekIATuncued_mean))
se = c(bias_av_se$baseIATcued_se, bias_av_se$preIATcued_se,
bias_av_se$postIATcued_se, bias_av_se$weekIATcued_se,
bias_av_se$baseIATuncued_se, bias_av_se$preIATuncued_se,
bias_av_se$postIATuncued_se, bias_av_se$weekIATuncued_se)
head(data4)
## Condition time bias_av
## 1 Cued Baseline 0.5175814
## 2 Cued Prenap 0.2108864
## 3 Cued Postnap 0.3068241
## 4 Cued 1-week 0.3999553
## 5 Uncued Baseline 0.5954932
## 6 Uncued Prenap 0.3024484
As can be seen, this dataframe is similar to the dataframe created for Figure 4.
Plotting the graph
Creating the plot uses similar functions to the ones used to reproduce Figure 4 and 5. geom_line() and geom_errorbar() adds the lines and error bars to the plot. The aesthetics of the plot are formatted by changing the colour, width and transparency of the error bars using colour =, width = and alpha = respectively. Finally, labs() was used to remove the x-axis label, relabel the y-axis, and adds a title. theme_bw() was used to remove the grey plot background but still include the grid lines.
ggplot(data = data4, aes(
x = factor(time,
level = c("Baseline", "Prenap", "Postnap", "1-week")),
y = bias_av,
colour = Condition,
group = Condition)) +
geom_line() +
geom_errorbar(aes(
x= time,
ymin=bias_av-se,
ymax=bias_av+se),
width=0.1,
colour="grey",
alpha= 0.9) +
ylim(0.0, 0.7) +
labs(x = "",
y = "D600 Bias Score",
title = "Fig 3. Average D600 scores at each IAT timepoint") +
theme_bw()
When I was examining the OSF data, I found it interesting that the study had a range of university students at different years in their university experience. Usually, as is the case with a lot of UNSW SONA studies, the majority of participants recruited from the university are only in their first year of university and are participating in the study as part of their course requirement.
Thus, I want to explore if TMR is particularly effective for a certain year group. If there is a significant difference, there can be further research done on this year group to determine why or how TMR works so effectively with them - for example, it may be a particular culture or demographic within this year group that allows for TMR to work so well with them. Further, it allows for research efforts to be redirected to refining or discovering more effective techniques for the other year groups. In all, this could have real-world implications for the use of TMR in education, the workforce and more.
This question will look at the differences between the prenap (post-counterbias training) and postnap (after TMR) IAT scores. I'll visualise this using a column graph to see if there are any differences between the 4 year groups.
Determining what data I need
First, I need to calculate the changes in implicit bias levels at the immediate tests (i.e. from pre-to post nap). Since I'm only looking for the effectiveness of TMR, I'm only looking at the cued condition. Thus, I need to calculate the difference between pre-nap bias and post nap bias scores for each of the 4 age groups. This difference is encapsulated in the variable postnap_change_cued.
** Descriptive statistics **
To create my plot, I need to first calculate the means and standard errors for their difference in bias scores for each of the 4 year groups. I did this by creating a new variable year_summary_postnapchange and the following functions:
group_by() is from the dplyr package. It allows for an existing data tibble (i.e. cleandata) to be converted into a grouped tibble according to the variable selected in the brackets. In this case, we are grouping by General_1_UniYears (number of years spent in University).summarise() is from the dplyr package. It allows for summary statistics to be created in a new dataframe. It will create a column for each grouping variable (in this case, we only have one: General_1_UniYears) and a column for each of the summary statistics I specified (mean, sd, n, se)Note that I'm calculating the standard errors differently to how I've calculated them previously. When I was reproducing the tables and plots, I used std.error from the plotrix() package to calculate the standard errors. After a Q&A session with Jenny, I decided to try using the equation where the standard error is calculated as the standard deviation divided by the square root of the sample size. I could not have used this formula if I had not already calculated standard deviation.
year_summary_postnapchange <- cleandata %>%
group_by(General_1_UniYears) %>%
summarise(mean = mean(postnap_change_cued),
sd = sd(postnap_change_cued),
n = n(),
se = sd/sqrt(n))
Now that I've calculated my needed values, I need to put it into a table. Again, I've used the gt() package to do this.
year_summary_postnapchange then use the pipe operator %>% to use it as the input for my gt() package.tab_header is from gt() package and adds a table header to the gt() table. This function allows for a title and a subtitle so I have to specify that I'm creating a title. The title text is placed in quotation marks.
md() is from the gt() package. It allows for formatting of text so I can use bolded, or italicised fonts etc.fmt_number() is from gt() package and controls the formatting of numeric values.
columns = specifies how to format the columns. vars is similar to select() in that it selects the variables that are needed. Combined, these two arguments specify that I want mean, sd and se as my columns.decimals = specifies how many decimal points I want displayed in my table.cols_label is from the gt() package. It lets me relabel my columns. I first have to specify the original variable name (e.g. General_1_UniYears) and then after =, place my new label in quotation marks.year_summary_postnapchange %>%
gt() %>%
tab_header(title = md("**Change in implicit bias levels at the immediate test for each year group**")) %>%
fmt_number(
columns = vars(mean, sd, se),
decimals = 2
) %>%
cols_label(General_1_UniYears = "Number of Years at University",
mean = "Mean",
sd = "SD",
n = "n",
se = "SE")
| Change in implicit bias levels at the immediate test for each year group | ||||
|---|---|---|---|---|
| Number of Years at University | Mean | SD | n | SE |
| 0 | −0.05 | 0.53 | 11 | 0.16 |
| 1 | 0.24 | 0.75 | 4 | 0.38 |
| 2 | 0.33 | 0.56 | 10 | 0.18 |
| 3 | −0.12 | 0.28 | 6 | 0.11 |
There does seem to be some variation in the group means. However, I did note that the sample sizes are quite small for each condition.
Visualisation
Now that the descriptive statistics have been calculated, it's time to create the plot. First, I have to create the dataframe for the figure.
condition = defines the x-axis, which will be the 4 different age groupsbias_change defines the y-axisstderror defines the data points for the standard errorsdata.frame translates this into a dataframe named data where the relevant groups in the brackets are included in the data.Condition <-c("0", "1", "2", "3")
bias_change <- c(-0.05, 0.24, 0.33, -0.12)
stderror <- c(0.16, 0.38, 0.18, 0.11)
data = data.frame(condition, bias_change, stderror)
head(data)
## condition bias_change stderror
## 1 cued -0.05 0.16
## 2 uncued 0.24 0.38
## 3 cued 0.33 0.18
## 4 uncued -0.12 0.11
The dataframe created looks similar to the dataframe created for Figure 4 so I decided to create my plot using similar methods and functions as Figure 4. I've plotted the graph using ggplot().
ggplot(data = data, aes(
x = condition,
y = bias_change,
fill = condition
)) +
geom_bar(
position = "dodge",
stat = "identity",
alpha=0.7) +
geom_errorbar(aes(
x = condition,
ymin=bias_change-stderror,
ymax=bias_change+stderror),
width=0.4,
colour="black",
alpha= 0.9,
position = position_dodge(0.9)) +
ylim(-0.4, 0.7) +
labs(x = "Number of years participant has spent in university",
y = "Bias Change",
title = "Change in implicit bias levels at the immediate test for each year group")
The graph produced shows that there is a difference in change in implicit bias levels between each year group. The 0- and 3-year groups experienced a slight decrease in implicit bias levels immediately following the TMR procedure, while the 1- and 2-year groups experienced the opposite. The latter groups experieneced a greater increase in implicit bias levels immediately following the TMR procedure.
This graph provides some insights to help answer my exploratory question. However, I realised that maybe including the uncued condition could provide more information. Recall that the uncued condition is the social bias (either race or gender) for each participant that only received counterbias training.
Attempt 2
My question is worded the same but now I define my question differently. My question will now look at the differences between the prenap (post-counterbias training) and postnap (after TMR) IAT scores. The effectiveness of TMR is measured by looking at the changes of the cued (counterbias training + TMR) versus uncued (counterbias training only) condition between prenap and postnap tests. Recall that prenap scores are obtained after counterbias training and postnap scores are acquired after the TMR procedure.
Descriptive statistics
The descriptive statistics for the cued condition are the same as was calculated previously:
year_summary_postnapcued <- cleandata %>%
group_by(General_1_UniYears) %>%
summarise(mean = mean(postnap_change_cued),
sd = sd(postnap_change_cued),
n = n(),
se = sd/sqrt(n))
year_summary_postnapcued %>%
gt() %>%
tab_header(title = md("**Change in implicit bias levels using Targeted Memory Reactivation for each year group**")) %>%
fmt_number(
columns = vars(mean, sd, se),
decimals = 2
) %>%
cols_label(General_1_UniYears = "Number of Years at University",
mean = "Mean",
sd = "SD",
n = "n",
se = "SE")
| Change in implicit bias levels using Targeted Memory Reactivation for each year group | ||||
|---|---|---|---|---|
| Number of Years at University | Mean | SD | n | SE |
| 0 | −0.05 | 0.53 | 11 | 0.16 |
| 1 | 0.24 | 0.75 | 4 | 0.38 |
| 2 | 0.33 | 0.56 | 10 | 0.18 |
| 3 | −0.12 | 0.28 | 6 | 0.11 |
Now I'm calculating the descriptive statistics for the uncued condition by using the variable postnap_change_uncued.
year_summary_postnapuncued <- cleandata %>%
group_by(General_1_UniYears) %>%
summarise(mean = mean(postnap_change_uncued),
sd = sd(postnap_change_uncued),
n = n(),
se = sd/sqrt(n))
year_summary_postnapuncued %>%
gt() %>%
tab_header(title = md("**Change in implicit bias levels using counterbias training for each year group**")) %>%
fmt_number(
columns = vars(mean, sd, se),
decimals = 2
) %>%
cols_label(General_1_UniYears = "Number of Years at University",
mean = "Mean",
sd = "SD",
n = "n",
se = "SE")
| Change in implicit bias levels using counterbias training for each year group | ||||
|---|---|---|---|---|
| Number of Years at University | Mean | SD | n | SE |
| 0 | −0.14 | 0.53 | 11 | 0.16 |
| 1 | −0.19 | 0.42 | 4 | 0.21 |
| 2 | −0.11 | 0.45 | 10 | 0.14 |
| 3 | 0.28 | 0.87 | 6 | 0.36 |
Visualisation
Now that the descriptive statistics have been calculated, it's time to create the figure. First, I have to create the dataframe for the figure.
time1 <- c(rep("0",2), rep("1",2), rep("2",2), rep("3",2))
Condition <-rep(c("Cued","Uncued"),4)
bias_change <- c(-0.05, -0.14, 0.24, -0.19, 0.33, -0.11, -0.12, 0.28)
stderror <- c(0.16, 0.16, 0.38, 0.21, 0.18, 0.14, 0.11, 0.36)
data1 = data.frame(condition, bias_change, stderror)
head(data1)
## condition bias_change stderror
## 1 cued -0.05 0.16
## 2 uncued -0.14 0.16
## 3 cued 0.24 0.38
## 4 uncued -0.19 0.21
## 5 cued 0.33 0.18
## 6 uncued -0.11 0.14
I now recreate the plot using the dataframe I just created:
ggplot(data = data1, aes(
x = time1,
y = bias_change,
fill = condition
)) +
geom_bar(
position = "dodge",
stat = "identity",
alpha=0.7) +
geom_errorbar(aes(
x = time1,
ymin=bias_change-stderror,
ymax=bias_change+stderror),
width=0.4,
colour="black",
alpha= 0.9,
position = position_dodge(0.9)) +
ylim(-0.4, 0.7) +
labs(x = "Number of years spent in university",
y = "Bias Change",
title = "Change in implicit bias levels at the immediate test for each year group") +
theme_bw()
As seen by this plot, only the 1-year and 2-year university groups experienced an increase in cued bias scores and a decrease in uncued bias scores. The opposite effect is observed in the 3-year university group. The only year-group that experienced both a reduction in bias levels using both the TMR and counter-bias training procedure are the participants that spent 0 years in university. However, note that these students experienced a greater reduction in bias levels using the counterbias procedure, compared to the TMR procedure.
Statistics
I want to compare several things:
To compare the four year-group means for the cued condition, I used a two-way ANOVA. I'm comparing the difference in means at the immediate test for the cued condition as a function of university year group. I use aov() from the stats package to calculate the test statistic for the ANOVA. I have to specify which data I'm using and my dependent and independent variables. summary() prints out the ANOVA test-statistic.
Pr(>F) is the p-value associated with each F-statistic. Since the p-value is greater than 0.05, there is no evidence to suggest that there is a difference of TMR effectiveness between university year groups.
cued_anova <- aov(postnap_change_cued ~ General_1_UniYears, data = cleandata)
summary(cued_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## General_1_UniYears 3 1.118 0.3728 1.3 0.295
## Residuals 27 7.740 0.2867
I used a two-way ANOVA to check for the uncued condition. Again, the p-value is greater than 0.05. There is no evidence to suggest that there is a difference of counterbias training effectiveness between university year groups.
uncued_anova <- aov(postnap_change_uncued ~ General_1_UniYears, data = cleandata)
summary(uncued_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## General_1_UniYears 3 0.845 0.2818 0.844 0.482
## Residuals 27 9.017 0.3340
Lastly, I wanted to compare between the cued and uncued conditions, for each year group using a t-test. First, I have to create variables that only contain the data for each particular year group using filter().
year0uni_participants <- cleandata %>%
filter(General_1_UniYears == "0")
year1uni_participants <- cleandata %>%
filter(General_1_UniYears == "1")
year2uni_participants <- cleandata %>%
filter(General_1_UniYears == "2")
year3uni_participants <- cleandata %>%
filter(General_1_UniYears == "3")
I used t.test() from the stats() package and specified which two data variables I'll be comparing the means of.
Year0:
t.test(year0uni_participants$postnap_change_cued, year0uni_participants$postnap_change_uncued)
##
## Welch Two Sample t-test
##
## data: year0uni_participants$postnap_change_cued and year0uni_participants$postnap_change_uncued
## t = 0.37971, df = 20, p-value = 0.7082
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3883295 0.5611662
## sample estimates:
## mean of x mean of y
## -0.05003512 -0.13645348
t.test(year1uni_participants$postnap_change_cued, year1uni_participants$postnap_change_uncued)
##
## Welch Two Sample t-test
##
## data: year1uni_participants$postnap_change_cued and year1uni_participants$postnap_change_uncued
## t = 0.99427, df = 4.6787, p-value = 0.3687
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.7011576 1.5563760
## sample estimates:
## mean of x mean of y
## 0.2382869 -0.1893223
t.test(year2uni_participants$postnap_change_cued, year2uni_participants$postnap_change_uncued)
##
## Welch Two Sample t-test
##
## data: year2uni_participants$postnap_change_cued and year2uni_participants$postnap_change_uncued
## t = 1.9184, df = 17.21, p-value = 0.0718
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04303475 0.91459719
## sample estimates:
## mean of x mean of y
## 0.3268173 -0.1089640
t.test(year3uni_participants$postnap_change_cued, year3uni_participants$postnap_change_uncued)
##
## Welch Two Sample t-test
##
## data: year3uni_participants$postnap_change_cued and year3uni_participants$postnap_change_uncued
## t = -1.0579, df = 5.9914, p-value = 0.3309
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3110152 0.5197779
## sample estimates:
## mean of x mean of y
## -0.1161439 0.2794747
As seen above, all p-values for each of the 4 year groups rise above 0.05. Thus, for each year group, there is no evidence to suggest that there is a significant difference between the use of TMR vs counterbias training procedures in reducing implicit biases.
Summary: Do the number of years spent in university influence the effectiveness of targeted memory reactivation (TMR) for reducing implicit biases?
Although there are slight differences in the effectiveness of targeted memory reactivation (TMR) in reducing implicit biases between different university year groups, cued and uncued bias did not change differentially from the prenap test to the postnap test.
In fact, only the 1-year and 2-year university groups experienced a numerical (but non-signifcant) increase in cued bias scores and a decrease in uncued bias scores. This can be interpreted as TMR showing an undesirable effect in increasing bias levels for these two university year groups, while counterbias training was able to somewhat reduce bias levels.
The opposite effect is observed in the 3-year university group - this group experienced non-significant increases in uncued bias scores and decreases in cued bias scores. This can be interpreted as TMR showing a desirable effect in decreasing bias levels, while counterbias training increased bias levels in this year group.
The only year-group that experienced both a reduction in bias levels using both the TMR and counter-bias training procedure are the participants that spent 0 years in university. However, note that these students experienced a greater (though non-significant) reduction in bias levels using the counterbias procedure, compared to the TMR procedure.
During the targeted memory reactivation procedure, participants were exposed to sound cues for the duration of their 90 minute nap. However, the sound cues automatically stopped if participants showed signs of awakening or entering another sleep stage. Thus, the number of cues that participants were exposed to varied from as low as 37.5 cues to as high as 660 cues.
In this exploratory analysis, I'll be looking at the correlation between this differential bias change between cued and uncued conditions and the number of cues participants are exposed to. Thus, I'll use a scatterplot to illustrate this.
Determining what data I need
There are two sets of data that I need:
The first data set is already provided in the open data, under the variable cues_total. The latter was calculated when reproducing Figure 5. Recall that differential bias change is calculated as "the baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias". As I was reviewing previous files to see how I coded for differential bias change, I realised that these values had already been included in the open data under the variable name diff_biaschange. This discovery was quite annoying because it highlighted how important it is to have a codebook of some sort. If only our group had something that could tell us which variables in the dataset was needed for reproducing each figure, we would have saved us so much time in working out how to calculate and code for this variable. For this exploratory analysis, I will be using the variable diff_biaschange that was included in the open data file.
Descriptive statistics
I had incorrectly assumed that I would need to calculate for descriptive statistics in order to answer this question. I calculated my descriptive statistics in the same way I did for my first exploratory question. However, my table output was strange - there was no standard deviation or standard errors for most of my values. After asking for some clarification from Jenny, she highlighted that because most participants were exposed to a varying number of cues, calculating for descriptive statistics would be unnecessary in order to answer my question.
cuesexposed <- cleandata %>%
group_by(cues_total) %>%
summarise(mean = mean(diff_biaschange),
sd = sd(diff_biaschange),
n = n(),
se = sd/sqrt(n))
cuesexposed %>%
gt() %>%
tab_header(title = md("**Is there an association between number of cues exposed to and differential bias change?**")) %>%
fmt_number(
columns = vars(mean, sd, se),
decimals = 2
) %>%
cols_label(cues_total = "Number of cues exposed",
mean = "Mean",
sd = "SD",
n = "n",
se = "SE")
| Is there an association between number of cues exposed to and differential bias change? | ||||
|---|---|---|---|---|
| Number of cues exposed | Mean | SD | n | SE |
| 37.5 | 1.27 | NA | 1 | NA |
| 45.0 | −0.16 | NA | 1 | NA |
| 52.5 | 0.34 | NA | 1 | NA |
| 142.5 | 0.44 | NA | 1 | NA |
| 165.0 | 0.21 | NA | 1 | NA |
| 180.0 | −1.01 | NA | 1 | NA |
| 187.5 | 0.81 | NA | 1 | NA |
| 210.0 | −0.83 | NA | 1 | NA |
| 225.0 | 0.67 | NA | 1 | NA |
| 232.5 | −1.01 | NA | 1 | NA |
| 240.0 | −0.50 | NA | 1 | NA |
| 262.5 | 0.29 | NA | 1 | NA |
| 270.0 | −0.23 | NA | 1 | NA |
| 277.5 | −0.53 | NA | 1 | NA |
| 285.0 | −0.04 | 0.36 | 2 | 0.26 |
| 345.0 | −1.48 | NA | 1 | NA |
| 352.5 | −0.02 | NA | 1 | NA |
| 375.0 | −0.04 | NA | 1 | NA |
| 390.0 | 0.25 | NA | 1 | NA |
| 405.0 | −0.31 | NA | 1 | NA |
| 420.0 | 0.54 | 0.25 | 2 | 0.17 |
| 427.5 | −0.53 | NA | 1 | NA |
| 442.5 | 0.18 | NA | 1 | NA |
| 487.5 | 0.69 | NA | 1 | NA |
| 502.5 | −0.79 | NA | 1 | NA |
| 540.0 | −1.11 | NA | 1 | NA |
| 562.5 | 0.03 | NA | 1 | NA |
| 600.0 | −0.13 | NA | 1 | NA |
| 660.0 | 0.04 | NA | 1 | NA |
Thus, I just decided to calculate for the overall mean and standard error and format the values into a gt() table. An average of 323+-29 individual cues were presented to each participant.
cuesexposed2 <- cleandata %>%
summarise(mean = mean(cues_total),
sd = sd(cues_total),
n = n(),
se = sd/sqrt(n))
cuesexposed2 %>%
gt() %>%
tab_header(title = md("**Is there an association between number of cues exposed to and differential bias change?**")) %>%
fmt_number(
columns = vars(mean, sd, n, se),
decimals = 2
) %>%
cols_label(mean = "Mean",
sd = "SD",
n = "n",
se = "SE")
| Is there an association between number of cues exposed to and differential bias change? | |||
|---|---|---|---|
| Mean | SD | n | SE |
| 323.47 | 162.33 | 31.00 | 29.15 |
Visualisation
I plotted the scatterplot using ggplot(). I used geom_point() to add the scatterplot and geom_smooth() to add the regression line. Note that I used the caption = argument to add a title because my question was too long and caption = uses a smaller font). I've plotted the graph using `ggplot.
ggplot(data = cleandata, aes(
x = cues_total,
y = diff_biaschange
))+
geom_point()+
geom_smooth(method = lm,
se = F)+
scale_x_continuous(expand = c(0,0),limits = c(0,700))+
scale_y_continuous(expand = c(0,0),limits = c(-2,2))+
labs(caption = "Do the number of cues that participants are exposed to influence the effectiveness of TMR?",
x = "Number of cues exposed",
y = "Differential bias change")+
theme_apa()
Looking at the plot, there seems to be a slight decreasing trend but it is almost horizontal. Thus, I need to run a statistical test to see if this trend is of signficance.
Statistics
To measure correlation, I'll be using cor_test().
cor.test(cleandata$cues_total, cleandata$diff_biaschange)
##
## Pearson's product-moment correlation
##
## data: cleandata$cues_total and cleandata$diff_biaschange
## t = -0.99925, df = 29, p-value = 0.3259
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5041878 0.1837791
## sample estimates:
## cor
## -0.1824417
There is a non-significant negative correlation of -0.1824417. Thus, although there does seem to be a slight decreasing trend in that the more cues participants are exposed to, the more decrease in bias is seen, overall there is no association between number of cues that participants are exposed to and the effectiveness of TMR.
In one of my reactions, I mentioned that there were a few procedural differences between Hu et al.'s 2015 study and the 2019 replication study that could have influenced the differing results. I wanted to explore if there were differences in bias change in the 2019 study between participants who were offered course credit and those who were offered cash compensation. I will be constructing a boxplot to compare the means.
Determining what data I need
There are two sets of data that I need:
For the first data set, the data to determine what compensation participants were offered is already provided in the open data, under variable compensation. As I did for the second exploratory analysis, I'll be looking at the differential bias change between cued (counterbias training + TMR) and uncued (counterbias training only) conditions. I'll be using the data variable diff_biaschange that was included in the open data file, rather than the one that my group had coded for when reproducing Figure 5 before we realised that the variables had already been calculated.
Descriptive statistics
I've calculated and my descriptive statistics in the same way, as I did for my first exploratory analysis.
In a similar manner to my first exploratory analysis, I've calculated the descriptive statistics and formatted this data into a gt() table.
coursecompensation <- cleandata %>%
group_by(compensation) %>%
summarise(mean = mean(diff_biaschange),
sd = sd(diff_biaschange),
n = n(),
se = sd/sqrt(n))
coursecompensation %>%
gt() %>%
tab_header(title = md("**Does getting offered course credit (vs. cash) significantly change implicit biases?**")) %>%
fmt_number(
columns = vars(mean, sd, n, se),
decimals = 2
) %>%
cols_label(mean = "Mean",
sd = "SD",
n = "n",
se = "SE")
| Does getting offered course credit (vs. cash) significantly change implicit biases? | ||||
|---|---|---|---|---|
| compensation | Mean | SD | n | SE |
| cash | −0.20 | 0.48 | 12.00 | 0.14 |
| course credit | 0.00 | 0.72 | 19.00 | 0.16 |
It seems like using course credit as compensation did not impact the effectiveness of TMR compared to using counterbias training only - as the mean is 0, this indicates no difference at all between cued (TMR) and uncued (counterbias training). There was a very very slight difference when using cash as course compensation - the negative mean indicates that there was a greater change in implicit biases in the uncued condition, compared to cued. However, both these means are very close to zero.
Visualisation
I've plotted the graph using ggplot() and used geom_boxplot() to add the boxplot. Without this, the plot would be empty, with only x- and y-axis showing.
ggplot(data = cleandata, aes(
x = compensation,
y = diff_biaschange,
fill = compensation))+
geom_boxplot()
I wanted to improve my boxplot so I addedgeom_jitter() to visualise each datapoint. To differentiate the different datapoints, colour = is used to differentiate which condition the datapoints belong to. I used easy_remove_legend() from the ggeasy() package to remove the legend. I decided to change the theme to theme_apa() from the papaja() package which allows for APA compatible ggplots.
ggplot(data = cleandata, aes(
x = compensation,
y = diff_biaschange,
fill = compensation)) +
geom_boxplot() +
geom_jitter(alpha = 0.8,
aes(colour=compensation))+
theme_apa() +
labs(x = "Compensation",
y = "Differential Bias Change",
title = "Does getting offered course credit (vs. cash) significantly change implicit biases?")
Statistics
To compare means between the two conditions, I decided to use a t-test to see if the differences were significant. To perform a t-test, I used the ttestIS() function from the jmv() package. I’ve specified the DV diff_biaschange and I want to know if that varies significantly between compensation. The data comes from the cleandata dataframe.
ttestIS(formula = diff_biaschange ~ compensation, data = cleandata)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## ─────────────────────────────────────────────────────────────────────────
## Statistic df p
## ─────────────────────────────────────────────────────────────────────────
## diff_biaschange Student's t -0.8417201 29.00000 0.4068325
## ─────────────────────────────────────────────────────────────────────────
As can be seen, the p-value (0.4068325) is greater than 0.05. Thus, there is no evidence to suggest that the difference in compensation procedure matters - getting offered course credit (vs. cash) does not significantly change implicit biases. It is unlikely that the procedural difference of the study's form of compensation greatly influenced the differing results between Hu et al's 2015 study and Humiston & Wamsley's 2019 study.
Our reproducibility journey would have been a lot simpler to navigate had the authors described each dependent variable more clearly in their papers. This was especially difficult at the beginning when we were reproducing our first table. The Stanford Sleepiness Scale (SSS) score was one of the first variables we had to calculate. However, the paper did not describe how this score (and other variables) were calculated.
In order to improve reproducibility, clear dependent variable descriptions can be included in the methods section when explaining what the authors are attempting to obtain. For example, with The Stanford Sleepiness Scale (SSS) score, this variable was mentioned briefly in the methods section when discussing the initial questionnaire that participants would complete. By explaining these dependent variables more thoroughly, reader understanding can also be improved.
Following on from the previous recommendation, having a codebook data dictionary would have streamlined our verification journey. No codebook data dictionary was included in the OSF repository, which made it difficult to know which data variables corresponded to which dependent variable mentioned in the paper. When first reproducing our plots, we weren't able to match our error bars to what was seen in the original paper because we were using standard deviations. Only after some clarification from our tutors were we able to realise that standard errors were used to create the error bars. If a codebook had been included in the OSF repository, reproducing the plots would have been faster and less complicated.
Further, we also experienced issues with mismatching data variable names. Using the same SSS variable example as earlier, when we looked at our OSF data file, not only was the variable we needed labelled completely different to what would be expected, there were 4 variables with similar labels - AlertTest_1_Feel, AlertTest_2_Feel, AlertTest_3_Feel, and AlertTest_4_Feel. It was only through trial-and-error did we determine that the only variable needed was AlertTest_1_Feel. This variable should have been labelled with something pertaining to the SSS variable being measured (e.g. SSS_1, SSS_2) and a codebook should have contained information about the ordering of this variable. Thus, in conjunction with including a codebook, authors should give variables appropriate names that are explanatory, consistent and make sense at a glance.
A useful resource for authors that covers the basics of creating a codebook is linked here. It provides examples of what to include in the codebook and tools on for creating a codebook. To actually create a codebook in R, this blog post is incredibly helpful. It outlines the different packages and functions to create a codebook out of an existing data file.
Most importantly, authors should share not only their raw data, but also their coding analysis script - having just the data is similar to being given the ingredients for a meal without the recipe instructions.
As mentioned earlier, our group struggled to reproduce many of our plots because the standard errors were not included in the data file. We had to calculate the standard error values for some of our plots and had to roughly guess if our error bars matched what was seen in the original paper. We also struggled to calculate many other values because no coding script was given to us. We had to use a mixture of trial-and-error, googling and a lot of patience in order to reproduce many of our values and figures.
If authors truly wish to be transparent and want their studies to be easily reproducible, including a coding analysis script is incredibly important. This can be consistently recording every step taken to process the data. This can be documented using RMarkdown from the beginning to the end of the study analysis and hiding the code that authors don't want included in their paper using include = FALSE. This allows for the code to still be available for perusal in the RMarkdown file.