This week mainly involved building on last week’s start on exploratory analyses.
Specifically, my goals were:
to try out statistical analyses, in particular applying them to the data I calculated last week
to start finalising my report and putting it all in an Rmarkdown file, in particular knitting the doc every so often to ensure it works!
Load packages
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readspss)
library(plotrix)
library(gt)
library(ggpubr)
library(jmv)
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(here)
## here() starts at C:/Users/miche/Documents/Coding-R/Learning logs
library(readxl)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:jmv':
##
## pca, reliability
## The following object is masked from 'package:plotrix':
##
## rescale
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(gtsummary)
## #BlackLivesMatter
Loading data
data <- read.sav("Humiston & Wamsley 2019 data.sav")
Cleaning data
cleandata <- data %>%
filter(exclude == "no")
To remind everyone of my progress from last week, I developed two different plots.
One plot compared the mean bias levels for race and gender bias across time.
Here we found that both types of bias showed similar trends across time; however, race bias showed a smaller increase in bias at the 1-week delay time point.
biasdata <- data.frame(
condition = factor(c("Gender", "Gender", "Gender", "Gender", "Race", "Race", "Race", "Race")),
time = factor(c("Baseline", "Prenap", "Postnap", "1-week", "Baseline", "Prenap", "Postnap", "1-week"),
levels = c("Baseline", "Prenap", "Postnap", "1-week")),
bias_av = c(0.49767, 0.3357758, 0.3389568, 0.4857734, 0.533979, 0.1080363, 0.2803619, 0.3292815),
se = c( 0.07186912, 0.11089910, 0.12809562, 0.08824424, 0.1051369, 0.1394592, 0.1035415, 0.1031192)
)
ggplot(data = biasdata, aes(
x = factor(time, level = c("Baseline", "Prenap", "Postnap", "1-week")),
y = bias_av,
colour = condition,
group = condition)) +
geom_line() +
geom_errorbar(aes(
x= time,
ymin=bias_av-se,
ymax=bias_av+se),
width=0.1, colour="grey", alpha= 0.9) +
labs(x = "time", title = "Bias change for bias type")
The other plot looked specifically at race bias, comparing the cued and uncued conditions.
racebiasdata2 <- data.frame(
condition = factor(c("Cued", "Cued", "Cued", "Cued", "Uncued", "Uncued", "Uncued", "Uncued")),
time = factor(c("Baseline", "Prenap", "Postnap", "1-week", "Baseline", "Prenap", "Postnap", "1-week"),
levels = c("Baseline", "Prenap", "Postnap", "1-week")),
bias_av = c(0.533979, 0.1080363, 0.2803619 , 0.3292815, 0.7215597, 0.3168438, 0.3243954, 0.5191932),
se = c( 0.1051369, 0.1394592, 0.1035415, 0.1031192, 0.1193954, 0.1462797, 0.1344576, 0.1267464)
)
ggplot(data = racebiasdata2, aes(
x = factor(time, level = c("Baseline", "Prenap", "Postnap", "1-week")),
y = bias_av,
colour = condition,
group = condition)) +
geom_line() +
geom_errorbar(aes(
x= time,
ymin=bias_av-se,
ymax=bias_av+se),
width=0.1, colour="grey", alpha= 0.9) +
labs(x = "time", title = "Race Bias Change")
Here we observed that there doesn’t appear to be much difference between the cued and uncued conditions for race bias.
So far, this suggests that TMR was unsuccessful, regardless of bias type.
Now, I’ll try to apply statistical analyses!
Firstly, I will attempt to use the stat_compare_means function, by applying it to the plot.
racevsgender <- list(c("race", "gender"))
ggplot(data = biasdata, aes(
x = factor(time, level = c("Baseline", "Prenap", "Postnap", "1-week")),
y = bias_av,
colour = condition,
group = condition)) +
geom_line() +
geom_errorbar(aes(
x= time,
ymin=bias_av-se,
ymax=bias_av+se),
width=0.1, colour="grey", alpha= 0.9) +
labs(x = "time", title = "Bias change for bias type") +
stat_compare_means(comparisons = racevsgender, method = "t.test")
## Warning: Computation failed in `stat_signif()`:
## missing value where TRUE/FALSE needed
Hm, it’s coming up with a warning: Computation failed in stat_signif(): missing value where TRUE/FALSE needed
Not quite sure what this means! I’ll rewatch Jenny’s Q and A, and maybe I’ll figure it out.
In the meantime, I’ll move on to doing t-tests.
T-tests are used to compare two means, so I’ll need to run a separate t-test for each time point (baseline, prenap, postnap and 1-week delay) to compare the types of bias.
Additionally, I could compare the means between each time point, to determine if bias significantly changes between time points for each type of bias.
I’ll start off with race vs gender bias for prenap.
prenap_race <- cleandata %>%
filter(Cue_condition == "race")
prenap_gender <- cleandata %>%
filter(Cue_condition == "gender")
t.test(prenap_race$preIATcued, prenap_gender$preIATcued)
Oh no! This doesn’t appear to work either.
I’m getting an error that there are not enough 'x' observations. When I check the environment, for some reason prenap_race and prenap_gender both have 0 observations!
In the data, I assumed it was coded as “race” and “gender”, but perhaps I have to use the numeric code instead, i.e. race = 1, gender = 2.
Let’s try that.
prenap_race <- cleandata %>%
filter(Cue_condition == 1)
prenap_gender <- cleandata %>%
filter(Cue_condition == 2)
t.test(prenap_race$preIATcued, prenap_gender$preIATcued)
Hm, they still have 0 observations.
After looking back at my other learning logs and Rmarkdowns, I think I found the problem.
Weirdly, the entries are coded differently depending on where you view the data. In the csv file, the data is coded in 1’s and 0’s.
However, in the .sav file I loaded, it is coded in words, specifically: “race cue played” and “gender cue played”.
This is why there were no observations when I used “race” and “gender”: those entries didn’t exist!
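A quick way to avoid guessing how a variable is coded is to ask R directly before filtering. Below is a minimal sketch on a made-up toy data frame. The values, and which numeric code maps to which label, are my assumptions for illustration and are not taken from the real data.

```r
# Toy data: a labelled factor, similar in spirit to a labelled SPSS variable.
# The 0/1 -> label mapping here is hypothetical.
toy <- data.frame(
  Cue_condition = factor(c(0, 1, 1, 0),
                         levels = c(0, 1),
                         labels = c("gender cue played", "race cue played")),
  preIATcued = c(0.4, 0.1, 0.2, 0.3)
)

# Inspect the actual labels and how many rows carry each one
levels(toy$Cue_condition)
table(toy$Cue_condition)

# Count the rows matched by each of the three filters I tried
n_short <- sum(toy$Cue_condition == "race")             # shortened label
n_code  <- sum(toy$Cue_condition == 1)                  # numeric code
n_full  <- sum(toy$Cue_condition == "race cue played")  # full label
c(short = n_short, code = n_code, full = n_full)
```

Only the full label matches any rows, which mirrors what happened with my three attempts.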
Let’s do it again.
prenap_race <- cleandata %>%
filter(Cue_condition == "race cue played")
prenap_gender <- cleandata %>%
filter(Cue_condition == "gender cue played")
t.test(prenap_race$preIATcued, prenap_gender$preIATcued)
##
## Welch Two Sample t-test
##
## data: prenap_race$preIATcued and prenap_gender$preIATcued
## t = -1.2782, df = 28.572, p-value = 0.2115
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.5923922 0.1369133
## sample estimates:
## mean of x mean of y
## 0.1080363 0.3357758
Tada! It worked!!
The p value provided is 0.2115, much bigger than 0.05. According to this analysis, we cannot say there is a significant difference between race and gender bias at the prenap time point.
However, this is not of much concern as this is before TMR occurs.
Now we will repeat for post-nap.
prenap_race <- cleandata %>%
filter(Cue_condition == "race cue played")
prenap_gender <- cleandata %>%
filter(Cue_condition == "gender cue played")
t.test(prenap_race$postIATcued, prenap_gender$postIATcued)
##
## Welch Two Sample t-test
##
## data: prenap_race$postIATcued and prenap_gender$postIATcued
## t = -0.35575, df = 26.385, p-value = 0.7249
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3969202 0.2797304
## sample estimates:
## mean of x mean of y
## 0.2803619 0.3389568
For this analysis, p = 0.7249 > 0.05. Again, there is no significant difference!
Compared with the plot, this is not that surprising. However, there may be an effect at the 1-week delay time point.
prenap_race <- cleandata %>%
filter(Cue_condition == "race cue played")
prenap_gender <- cleandata %>%
filter(Cue_condition == "gender cue played")
t.test(prenap_race$weekIATcued, prenap_gender$weekIATcued)
##
## Welch Two Sample t-test
##
## data: prenap_race$weekIATcued and prenap_gender$weekIATcued
## t = -1.153, df = 28.924, p-value = 0.2583
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4341075 0.1211237
## sample estimates:
## mean of x mean of y
## 0.3292815 0.4857734
Following this analysis, we again find no significant difference between bias types at the 1-week delay time point, as p = 0.2583 > 0.05.
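As a sanity check, the three t statistics above can be roughly reproduced from the means and standard errors I typed into biasdata for the first plot, because Welch’s t is the difference in means divided by sqrt(se1^2 + se2^2). This is only a sketch: it assumes the plotted standard errors came from the same participants as the t-tests, and it does not attempt the degrees of freedom, which need the group sizes.

```r
# Means and standard errors copied from the biasdata table used for the first plot
time_point  <- c("Prenap", "Postnap", "1-week")
race_mean   <- c(0.1080363, 0.2803619, 0.3292815)
race_se     <- c(0.1394592, 0.1035415, 0.1031192)
gender_mean <- c(0.3357758, 0.3389568, 0.4857734)
gender_se   <- c(0.11089910, 0.12809562, 0.08824424)

# Welch's t: difference in means over the combined standard error
welch_t <- (race_mean - gender_mean) / sqrt(race_se^2 + gender_se^2)
names(welch_t) <- time_point
round(welch_t, 4)
```

The values should land very close to the t statistics reported by t.test() (-1.2782, -0.35575 and -1.153), which is reassuring that the plot and the tests are describing the same data.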
Regardless, I will now try to do statistical analyses for the second plot which focuses on just race.
This time, I will try using the jmv package to run t-tests.
Like before, I will run a t-test for each time point.
Using the jmv package is similar to the previous method: you indicate which variable’s means you are comparing, and how you are dividing the data. Here I indicate that I want to compare the means of base_IAT_race across the two cue conditions.
ttestIS(formula = base_IAT_race ~ Cue_condition, data = cleandata)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## ----------------------------------------------------------------------
## Statistic df p
## ----------------------------------------------------------------------
## base_IAT_race Student's t -1.182656 29.00000 0.2465512
## ----------------------------------------------------------------------
The p value calculated is 0.247 (rounded), which is larger than 0.05. Therefore we can say there is no significant difference between the cued and uncued conditions for race bias at baseline. Again, this isn’t that concerning as this is prior to TMR.
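One thing worth noticing in the output: ttestIS reported Student’s t with df = 29, whereas t.test() earlier reported a Welch test with fractional df. The sketch below shows the difference using base R on simulated toy data. The group names and the 15/16 split are assumptions for illustration; only the total of 31 is implied by df = 29.

```r
# Toy data only: 31 simulated scores split into two hypothetical groups
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("cued", "uncued"), times = c(15, 16))),
  score = rnorm(31, mean = 0.5, sd = 0.6)
)

# Welch's test: the default in t.test(), does not assume equal variances
welch <- t.test(score ~ group, data = toy)

# Student's test: assumes equal variances, so df = n1 + n2 - 2
student <- t.test(score ~ group, data = toy, var.equal = TRUE)

c(welch_df = unname(welch$parameter), student_df = unname(student$parameter))
```

With reasonably similar group variances the two tests usually give similar p values, but it is worth reporting which one was used.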
Now we repeat for prenap.
ttestIS(formula = pre_IAT_race ~ Cue_condition, data = cleandata)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## ---------------------------------------------------------------------
## Statistic df p
## ---------------------------------------------------------------------
## pre_IAT_race Student's t -1.028076 29.00000 0.3124129
## ---------------------------------------------------------------------
Again, we find no significant result, and again this isn’t concerning as TMR has not been implemented at this stage.
Now, post-nap.
ttestIS(formula = post_IAT_race ~ Cue_condition, data = cleandata)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## -----------------------------------------------------------------------
## Statistic df p
## -----------------------------------------------------------------------
## post_IAT_race Student's t -0.2637360 29.00000 0.7938484
## -----------------------------------------------------------------------
Unfortunately, there’s no significant difference for this time point either! This indicates that at the post-nap time point, implicit bias levels do not differ significantly between the cued and uncued conditions, which suggests that TMR does not have an effect at the post-nap time point.
However, let’s see if there is an effect at the 1-week delay time point.
ttestIS(formula = week_IAT_race ~ Cue_condition, data = cleandata)
##
## INDEPENDENT SAMPLES T-TEST
##
## Independent Samples T-Test
## ----------------------------------------------------------------------
## Statistic df p
## ----------------------------------------------------------------------
## week_IAT_race Student's t -1.175013 29.00000 0.2495474
## ----------------------------------------------------------------------
Again, there is no significant difference! (p=0.25 > 0.05)
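Since the same test is repeated for four columns, it could also be run in one go. Here is a rough base R sketch on simulated toy data; the column names copy mine, but the values and group sizes are made up, and I use t.test() with var.equal = TRUE as a stand-in for the Student’s t that ttestIS reports.

```r
# Toy data: simulated scores, not the real study data
set.seed(42)
toy <- data.frame(
  Cue_condition = factor(rep(c("gender cue played", "race cue played"),
                             times = c(15, 16))),
  base_IAT_race = rnorm(31, mean = 0.6, sd = 0.5),
  pre_IAT_race  = rnorm(31, mean = 0.2, sd = 0.5),
  post_IAT_race = rnorm(31, mean = 0.3, sd = 0.5),
  week_IAT_race = rnorm(31, mean = 0.4, sd = 0.5)
)

outcomes <- c("base_IAT_race", "pre_IAT_race", "post_IAT_race", "week_IAT_race")

# Build the formula "outcome ~ Cue_condition" for each column and keep the p value
p_values <- sapply(outcomes, function(outcome) {
  test_formula <- reformulate("Cue_condition", response = outcome)
  t.test(test_formula, data = toy, var.equal = TRUE)$p.value
})
round(p_values, 3)
```

This keeps the four tests consistent with each other and avoids copy-paste slips when changing column names.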
Moving on from question 1, let’s make a start on question 2!
This question looks at whether the length of time the cue is played for during the nap influences the effectiveness of TMR.
Perhaps, the longer the cue is played for, the stronger the TMR effect will be?
cue_minutesdescribe <- cleandata %>%
select(cue_minutes)
describe(cue_minutesdescribe)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 31 21.56 10.82 19 21.52 11.86 2.5 44 41.5 0.11 -0.8 1.94
I thought of this question because the cue duration ranges from 2.5 minutes to 44 minutes! That’s such a large range (41.5 minutes), considering that the study is looking at the effect of reactivating memories during sleep using cues. How could a memory be reactivated as effectively when the cue is played for only 2.5 minutes as when it is played for 44?
Therefore, this question aims to address whether there is a correlation between cue duration and bias change.
Firstly, I select the relevant variables to calculate the descriptive statistics.
I then mutate these variables to calculate differential bias change. I got this equation from the original paper.
cueduration_data <- cleandata %>%
select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, cue_minutes) %>%
mutate(cued_differential = baseIATcued - weekIATcued,
uncued_differential = baseIATuncued - weekIATuncued,
diff_bias_change = cued_differential - uncued_differential)
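To check I understand the sign of this measure, here is a tiny worked sketch with two hypothetical participants. The scores are made up purely for illustration; the column names and arithmetic copy the mutate() call above.

```r
# Two hypothetical participants (made-up IAT scores, not from the study)
toy <- data.frame(
  ParticipantID = c("A", "B"),
  baseIATcued   = c(0.6, 0.5),
  weekIATcued   = c(0.2, 0.5),
  baseIATuncued = c(0.6, 0.5),
  weekIATuncued = c(0.5, 0.3)
)

# Same arithmetic as the mutate() call: baseline minus 1-week, then cued minus uncued
toy$cued_differential   <- toy$baseIATcued - toy$weekIATcued
toy$uncued_differential <- toy$baseIATuncued - toy$weekIATuncued
toy$diff_bias_change    <- toy$cued_differential - toy$uncued_differential

toy[, c("ParticipantID", "cued_differential", "uncued_differential", "diff_bias_change")]
```

Participant A’s bias drops by 0.4 when cued but only 0.1 when uncued, giving +0.3. Participant B shows no change when cued but a 0.2 drop when uncued, giving -0.2. So with this definition, the value is positive when bias dropped more in the cued condition.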
I then use ggplot to make a scatter plot to visualise the data.
I include a line of best fit using geom_smooth.
ggplot(data = cueduration_data, aes(
x = cue_minutes,
y = diff_bias_change
))+
geom_point()+
geom_smooth(method = "lm",
se = FALSE)+
scale_x_continuous(expand = c(0,0),limits = c(0,50))+
scale_y_continuous(expand = c(0,0),limits = c(-2,1.5))+
labs(title = "Fig 2",
x = "Cue duration (minutes)",
y = "Differential bias change")+
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
Looking at the plot, there seems to be a slight decreasing trend, but it is not very dramatic. It doesn’t look convincing that there is an effect.
Therefore, let’s apply some statistical analysis to determine whether there is!
To do this analysis, I need to use the cor.test function.
Let’s give that a go.
cor.test(cueduration_data$cue_minutes, cueduration_data$diff_bias_change)
##
## Pearson's product-moment correlation
##
## data: cueduration_data$cue_minutes and cueduration_data$diff_bias_change
## t = -0.99925, df = 29, p-value = 0.3259
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5041878 0.1837791
## sample estimates:
## cor
## -0.1824417
This test shows a correlation of r = -0.182; however, it is not significant (p = 0.3259 > 0.05).
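To see where the t and p values in that output come from, here is a small sketch that rebuilds them from the reported correlation and degrees of freedom, using the standard conversion t = r * sqrt(df) / sqrt(1 - r^2). The inputs are copied from the output above; the variable names are just mine.

```r
r  <- -0.1824417   # correlation reported by cor.test() above
df <- 29           # degrees of freedom reported above (n - 2)

# Convert the correlation to a t statistic, then to a two-sided p value
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, p = p_value), 4)
```

These should agree with the t = -0.99925 and p = 0.3259 in the output, up to rounding.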
This would be a good direction for future research: either control cue duration, or run a quasi-experiment with a larger sample so that the durations are better spread. The data I used for this analysis weren’t ideal, as only a few participants had a really short cue duration compared with the other duration lengths.
How does the direction of the correlation fit expectations? TMR is meant to reduce bias. As calculated above (baseline minus 1-week delay, then cued minus uncued), a positive differential bias change means bias fell more in the cued condition than in the uncued condition, so that is the direction we would hope for.
In the plot, differential bias change drifts slightly downward as cue duration increases, which is not the direction TMR would predict, although the trend is weak and non-significant. Perhaps with a larger sample and a better spread of durations, future researchers could determine whether there is an effect.
The next step is to rewatch the Q and A from this week to get an even better understanding of statistical analyses.
I also need to start my third question, which looks at whether changes in procedure may have affected the results of the presented study.