This week was probably our most productive week yet, and our group collectively blew our goals out of the water. The goals for this week were:
As it turns out, adding the error bars was the easy part since you just use geom_errorbar(). The difficult part was working out what the values of the error bar were.
To recap, this is what the original Figure 4 looks like:
So, first I’m going to recreate the original column bar figure without the error bars first. I’m also going to clean up my code from last time.
Last time when I was putting together my tibble for the Figure 4 data, I didn’t know how to use mutate(), so I just calculated the values like and directly put those values into the tibble. It’s very clunky and inelegant, and it looked like this:
#load packages
library(readspss) #package to read the original datafile from OFS
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.6
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#read data
data <- read.sav("Humiston & Wamsley 2019 data.sav")
#remove excluded
cleandata <- data %>% #remove excluded participants
filter(exclude=="no")
#calculate change
pre_post_change_cued = 0.31 - 0.21 #note, these values are those taken from table 3
pre_post_change_uncued = 0.25 - 0.3
pre_week_change_cued = 0.40 - 0.21
pre_week_change_uncued = 0.40 - 0.30
#create dataframe
fig4_df <- tibble(
change_from_pre_to = c("immediate","week"),
cued = c(0.1, 0.19),
uncued = c(-0.05, 0.1),
)
Let’s update this with a solution that is a bit more elegant and mutate the change values instead. First, we’ll select the variables from the data we want to use, then mutate the change variables (which are calculated as the differences between prenap implicit bias and postnap implicit bias, as well as the difference between prenap and implicit bias a week later - all of this for both the cued and uncued conditions).
#select the variables I want to use first
biaschangeconditions <- cleandata %>%
select(postIATcued, preIATcued, postIATuncued, preIATuncued, weekIATcued, weekIATuncued)
#mutate data which is easier than calculating
biaschange <- biaschangeconditions %>%
mutate(immed_cued = postIATcued - preIATcued,
immed_uncued = postIATuncued - preIATuncued,
week_cued = weekIATcued - preIATcued,
week_uncued = weekIATuncued - preIATuncued)
Now it’s looking much nicer. After discussing with the group for a bit, Michelle told us that Jenny said the error bars are usually calculated with standard error rather than SD in psychology papers. So we had to figure out how to calculate SE. Michelle and I had tried to figure out what the SE in a few different ways but we just couldn’t seem to get the answer.
Firstly, we just tried to use the describe function to see if that would give us the SE. While this has given us values, these don’t match the ones given on the original plot. The SEs we generated are significantly higher and go even beyond the entire scale of the original plot’s y-axis. We were very confused and weren’t sure how to calculate the SE.
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(biaschange)
## vars n mean sd median trimmed mad min max range skew
## postIATcued 1 31 0.31 0.44 0.29 0.32 0.43 -0.56 0.99 1.55 -0.13
## preIATcued 2 31 0.21 0.51 0.27 0.23 0.45 -1.03 1.13 2.16 -0.34
## postIATuncued 3 31 0.25 0.48 0.33 0.25 0.58 -0.69 1.07 1.76 -0.04
## preIATuncued 4 31 0.30 0.44 0.30 0.32 0.32 -0.94 1.27 2.21 -0.45
## weekIATcued 5 31 0.40 0.39 0.40 0.40 0.36 -0.38 1.23 1.61 0.06
## weekIATuncued 6 31 0.40 0.47 0.39 0.41 0.54 -0.56 1.20 1.75 -0.05
## immed_cued 7 31 0.10 0.54 0.14 0.08 0.60 -0.83 1.32 2.15 0.19
## immed_uncued 8 31 -0.05 0.57 -0.05 -0.08 0.50 -1.03 1.89 2.92 0.94
## week_cued 9 31 0.19 0.65 0.16 0.16 0.60 -0.94 1.93 2.88 0.50
## week_uncued 10 31 0.10 0.50 0.18 0.11 0.41 -1.01 1.14 2.15 -0.35
## kurtosis se
## postIATcued -0.97 0.08
## preIATcued -0.30 0.09
## postIATuncued -1.08 0.09
## preIATuncued 0.68 0.08
## weekIATcued -0.77 0.07
## weekIATuncued -1.11 0.08
## immed_cued -0.75 0.10
## immed_uncued 2.19 0.10
## week_cued 0.37 0.12
## week_uncued -0.40 0.09
Luckily for us, Julia had been playing around with some new packages and worked out how to do it.
First we had to find the mean which we did using summarise (which would select every variable which has ‘cued’ in it - all of which we want to use), and then finding the mean. We also found the standard error with the std.error command. And voila, it shows up in the biaschangeerror variable. While it is still very difficult to tell if this value is exactly correct (given that the scale on the graph is not precise enough to tell), it looks much closer to the original graph. However, I do not know why this is calculating SE in a different way to the describe function.
library(plotrix) #new library Julia found
##
## Attaching package: 'plotrix'
## The following object is masked from 'package:psych':
##
## rescale
biaschangemean <- biaschange %>%
summarise(across(contains("cued"), list(mean = mean))) #finding the mean of each variable
print(biaschangemean)
## postIATcued_mean preIATcued_mean postIATuncued_mean preIATuncued_mean
## 1 0.3068241 0.2108864 0.248543 0.3024484
## weekIATcued_mean weekIATuncued_mean immed_cued_mean immed_uncued_mean
## 1 0.3999553 0.3988819 0.09593775 -0.05390545
## week_cued_mean week_uncued_mean
## 1 0.1890689 0.09643346
biaschangeerror <- std.error(biaschange) #finding the standard error of each variable
print(biaschangeerror)
## postIATcued preIATcued postIATuncued preIATuncued weekIATcued
## 0.07984374 0.09232149 0.08578681 0.07937978 0.06954221
## weekIATuncued immed_cued immed_uncued week_cued week_uncued
## 0.08388760 0.09759788 0.10297893 0.11593440 0.09008655
Now that we have the standard error values, we can put it back into the graph. I have also added some aesthetic things. alpha represents the colour transparency of each element, ylim are the limits of the y axis. All in all, adding the error bars was a success!
#create dataset
time1 <- c(rep("immediate",2),rep("week",2)) #the two groups of columns
condition <-rep(c("cued","uncued"),2) #the subgroups within the grouped columns
bias_change1 <- c(0.10, -0.05,0.19, 0.10) #calculated differences, needs to be in order of the graph
stderror <- c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
data = data.frame(time1, condition, bias_change1, stderror)
head(data)
## time1 condition bias_change1 stderror
## 1 immediate cued 0.10 0.09759788
## 2 immediate uncued -0.05 0.10297893
## 3 week cued 0.19 0.11593440
## 4 week uncued 0.10 0.09008655
# plot
ggplot(data = data, aes(
x = time1,
y = bias_change1,
fill = condition
)) +
geom_bar(position = "dodge",stat = "identity")+
geom_errorbar(aes(x= time1, ymin=bias_change1-stderror, ymax=bias_change1+stderror), width=0.4, colour="grey", alpha= 0.9, position = position_dodge(0.9)) +
ylim(-0.2, 0.4) +
labs(x = "", y = "Bias Change", caption = "Fig 4. Change in implicit bias levels at the immediate and one-week delay tests.")
This plot was largely done by Michelle, but it was still a team effort with all of us adding together our knowledge from the first plot we had just completed.
This is the plot we were trying to recreate.
Additionally, this was quite easy to do because it was very similar to the first plot we did. The main difference was that it was in geom_line instead of geom_bar. Both graphs have error bars and different levels.
First, we need to work out how to get the SEs for the error bars and means. We calculate the averages by using summarise, then for SE, we are using the same method with the plotrix package that we used in figure 4 (above).
#selecting variables that show biases
choosing_bias <- cleandata %>%
select(baseIATcued, baseIATuncued, preIATcued, preIATuncued, postIATcued, postIATuncued, weekIATcued, weekIATuncued)
#calculate averages
bias_av <- choosing_bias %>%
summarise(cued_baseline_av = mean(baseIATcued),
cued_pre_av = mean(preIATcued),
cued_post_av = mean(postIATcued),
cued_week_av = mean(weekIATcued),
uncued_baseline_av = mean(baseIATuncued),
uncued_pre_av = mean(preIATuncued),
uncued_post_av = mean(postIATuncued),
uncued_week_av = mean(weekIATuncued))
print(bias_av)
## cued_baseline_av cued_pre_av cued_post_av cued_week_av uncued_baseline_av
## 1 0.5175814 0.2108864 0.3068241 0.3999553 0.5954932
## uncued_pre_av uncued_post_av uncued_week_av
## 1 0.3024484 0.248543 0.3988819
SE_bias <- std.error(choosing_bias)
print(SE_bias)
## baseIATcued baseIATuncued preIATcued preIATuncued postIATcued
## 0.06522755 0.08030357 0.09232149 0.07937978 0.07984374
## postIATuncued weekIATcued weekIATuncued
## 0.08578681 0.06954221 0.08388760
Then, we have to put our tibble/dataframe together. We need a column for whether it was cued/uncued mean, another column for when the implicit bias was measured (baseline,prenap, postnap, or one week after)
#construct data
data4 <- data.frame(
condition = factor(c("cued", "cued", "cued", "cued", "uncued", "uncued", "uncued", "uncued")),
time = factor(c("Baseline", "Prenap", "Postnap", "1-week", "Baseline", "Prenap", "Postnap", "1-week")),
levels = c("Baseline", "Prenap", "Postnap", "1-week"),
bias_av = c(0.52, 0.21, 0.31, 0.40, 0.60, 0.30, 0.25, 0.40),
se = c(0.06, 0.09, 0.08,0.07,0.08, 0.08, 0.09,0.08)
)
head(data4)
## condition time levels bias_av se
## 1 cued Baseline Baseline 0.52 0.06
## 2 cued Prenap Prenap 0.21 0.09
## 3 cued Postnap Postnap 0.31 0.08
## 4 cued 1-week 1-week 0.40 0.07
## 5 uncued Baseline Baseline 0.60 0.08
## 6 uncued Prenap Prenap 0.30 0.08
As you can see, this is quite similar to the dataframe for figure 4.
Now, we put it into the ggplot.
ggplot(data = data4, aes(
x = factor(time, level = c("Baseline", "Prenap", "Postnap", "1-week")), #googled how to reorder x variables !
y = bias_av,
colour = condition,
group = condition)) +
geom_line() +
geom_errorbar(aes(
x= time,
ymin=bias_av-se,
ymax=bias_av+se),
width=0.1, colour="grey", alpha= 0.9) +
ylim(0.0, 0.7) + #expands y axis
labs(x = "", y = "D600 Bias Score", caption = "Fig 3. Average D600 scores at each IAT timepoint")
It looks good! Really happy with how it has turned out, and this helped me consolidate methods for calculating SEs. A success!
This one was probably the biggest success of all this week. I managed to start and finish this plot all on my own! It was easier for a few reasons:
I’ve already posted the process for this elsewhere on my RPubs, but here it is again.
#load packages
library(readspss) #package to read the original datafile from OFS
library(tidyverse)
#read data
data <- read.sav("Humiston & Wamsley 2019 data.sav")
#remove excluded
cleandata <- data %>% #remove excluded participants
filter(exclude=="no")
First, we’ll have a look at the graph we want to replicate. This is a screenshot from the original report:
We need to figure out how to calculate the differential bias change (for the y axis). However, luckily for us we don’t need to calculate the x axis since that is already done.
According to the paper, differential bias change is calculated by “baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias”.
So the equation would look something like this:
(baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued)
We would then have to apply this to each participant’s scores.
Firstly, we have to mutate a bunch of extra columns to get what we want for the y-axis. So we mutate to get one set of brackets, again for the other set, and once again to get the final differential bias change.
differential <- cleandata %>%
select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, SWSxREM) %>%
mutate(cued_differential = baseIATcued - weekIATcued,
uncued_differential = baseIATuncued - weekIATuncued,
diff_bias_change = cued_differential - uncued_differential) #this is the y-axis
print(differential)
## ParticipantID baseIATcued weekIATcued baseIATuncued weekIATuncued SWSxREM
## 1 ub6 0.57544182 0.20377367 0.60953653 0.68277422 276
## 2 ub7 0.09911241 0.45873715 0.64396538 -0.01070460 0
## 3 ub8 0.20577365 0.39859469 1.52435622 0.71187286 408
## 4 ub9 0.35314196 0.92341592 0.13108478 0.20212832 408
## 5 ub11 0.57200207 -0.01869151 0.04879409 0.13071184 32
## 6 ub13 0.31025514 0.56073473 0.90121486 1.11629844 648
## 7 ub14 0.23241080 -0.06857532 1.50094682 -0.27687601 333
## 8 ub15 0.67870908 0.25359928 0.61393136 0.03100248 36
## 9 ub18 1.08814254 0.77816500 0.52245709 -0.31888702 552
## 10 ub24 0.55318776 0.08084087 0.28256540 0.03038869 275
## 11 ub25 0.91751740 0.28109140 0.41293602 0.58451878 70
## 12 ub27 0.68424700 0.51216459 0.44334149 0.98690632 496
## 13 ub28 1.08844176 0.55663600 0.77617187 0.93836953 0
## 14 ub29 0.94430311 0.21879167 0.76616645 0.40828505 476
## 15 ub31 -0.15051656 0.64307684 1.20427060 0.88612653 363
## 16 ub32 -0.01583277 0.90497541 0.89984149 0.99438842 0
## 17 ub34 0.53679315 0.57458515 0.26553538 0.00799229 198
## 18 ub35 0.89884164 -0.12045279 0.94869928 1.19622739 0
## 19 ub36 0.63866504 0.64107173 0.36672662 0.62092800 216
## 20 ub38 -0.23209723 0.13055599 -0.20514602 -0.07578923 240
## 21 ub40 0.35954000 0.30356801 0.51371306 0.50089370 450
## 22 ub41 0.26726994 -0.24029660 0.35867739 0.02753136 684
## 23 ub42 0.84482587 -0.37601343 0.63582486 -0.55788997 836
## 24 ub43 0.63319883 0.99021393 0.56643958 0.38961081 506
## 25 ub44 0.43954561 0.62261116 0.93824426 0.98645892 0
## 26 ub45 0.73144877 0.16658792 1.08670619 0.86611864 418
## 27 ub46 -0.07735156 0.05022121 0.08362815 -0.10227978 480
## 28 ub47 1.08893601 1.22970000 -0.18741515 0.16271922 50
## 29 ub48 0.79863715 0.89204287 1.32229673 0.62636312 0
## 30 ub49 0.44411896 0.16209957 0.33019024 0.33564519 336
## 31 ub50 0.53631389 0.68478850 0.15458906 0.28350526 224
## cued_differential uncued_differential diff_bias_change
## 1 0.371668148 -0.07323769 0.44490584
## 2 -0.359624732 0.65466998 -1.01429471
## 3 -0.192821041 0.81248335 -1.00530440
## 4 -0.570273967 -0.07104354 -0.49923043
## 5 0.590693579 -0.08191775 0.67261133
## 6 -0.250479588 -0.21508358 -0.03539601
## 7 0.300986111 1.77782283 -1.47683672
## 8 0.425109806 0.58292888 -0.15781907
## 9 0.309977544 0.84134411 -0.53136657
## 10 0.472346888 0.25217670 0.22017019
## 11 0.636426002 -0.17158276 0.80800876
## 12 0.172082405 -0.54356483 0.71564723
## 13 0.531805761 -0.16219766 0.69400342
## 14 0.725511445 0.35788139 0.36763005
## 15 -0.793593406 0.31814407 -1.11173748
## 16 -0.920808186 -0.09454693 -0.82626125
## 17 -0.037791997 0.25754309 -0.29533509
## 18 1.019294435 -0.24752811 1.26682255
## 19 -0.002406687 -0.25420138 0.25179469
## 20 -0.362653222 -0.12935678 -0.23329644
## 21 0.055971988 0.01281936 0.04315263
## 22 0.507566544 0.33114603 0.17642051
## 23 1.220839300 1.19371483 0.02712447
## 24 -0.357015106 0.17682877 -0.53384387
## 25 -0.183065550 -0.04821467 -0.13485088
## 26 0.564860845 0.22058754 0.34427330
## 27 -0.127572774 0.18590793 -0.31348070
## 28 -0.140763994 -0.35013437 0.20937037
## 29 -0.093405722 0.69593361 -0.78933933
## 30 0.282019388 -0.00545494 0.28747433
## 31 -0.148474613 -0.12891620 -0.01955842
Now that we have the values for the axes, I’m going to plot the graph.
We select the data to be what we used before, then we specify the y and x axes. We then use geom_point to create a scatter plot. After that, we add geom_smooth to add a regression line. Within geom_smooth we have to add method = lm to make it a straight line, and we get rid of the confidence interval with se = F.
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
))+
geom_point()+
geom_smooth(method = lm, #lm stands for linear model, so it makes the line straight
se = F)+ #removes confidence interval shading
xlim(0,1000) #for some reason the x axis isn't starting at 0 with this
## `geom_smooth()` using formula 'y ~ x'
An attempt to get the x-axis to start at 0. I found this solution on Google which adds the scale_x_continuous function to centre the graph at the origin. At the same time, the `limits = c(0,1000) allows us to expand the axis to 1000, the same length as the original graph.
Final graph:
ggplot(data = differential, aes(
x = SWSxREM,
y = diff_bias_change
))+
geom_point()+
geom_smooth(method = lm, #lm stands for linear model, so it makes the line straight
se = F)+ #removes confidence interval shading
scale_x_continuous(expand = c(0,0),limits = c(0,1000))+
scale_y_continuous(expand = c(0,0),limits = c(-2,1.5))+
labs(title = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
x = "SWS x REM sleep duration (min)",
y = "Differential bias change")+
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
Was extremely happy with this process and there were little to no bumps on the road. The only big difficulty with this was initially working out how to calculate the differential bias change. It even looks really similar to the original graph on an aesthetic level!
Also learnt how to use github this week for our group project since the lockdown will not be ending any time soon.
I had some challenges setting up and navigating the Mac OS terminal, but thanks to Jenny’s guidance and the HappyGithub website, this turned out really well. We have now set up a repository and are uploading all of our scripts and code for figures onto this now.
Since our group has now finished all our plots and tables (surprisingly early), my main goal is going to be backtracking and trying to fully understand aspects of code that I don’t understand. For me, this is mostly to do with tables and the gt() package.
gt() package by recreating Table 1 myself