Week 6 coding goals

This week was probably our most productive week yet, and our group collectively blew our goals out of the water. The goals for this week were:

Adding error bars to Figure 4
Finishing another plot
Starting on the final plot
Learn to use github with the group

Challenges and successes

1. Adding error bars to Figure 4

As it turns out, adding the error bars was the easy part since you just use geom_errorbar(). The difficult part was working out what the values of the error bar were.

To recap, this is what the original Figure 4 looks like:

So, first I’m going to recreate the original column bar figure without the error bars first. I’m also going to clean up my code from last time.

Last time when I was putting together my tibble for the Figure 4 data, I didn’t know how to use mutate(), so I just calculated the values like and directly put those values into the tibble. It’s very clunky and inelegant, and it looked like this:

#load packages
library(readspss) #package to read the original datafile from OFS
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

#read data
data <- read.sav("Humiston & Wamsley 2019 data.sav")

#remove excluded 
cleandata <- data %>%     #remove excluded participants 
  filter(exclude=="no")

#calculate change
pre_post_change_cued = 0.31 - 0.21 #note, these values are those taken from table 3

pre_post_change_uncued = 0.25 - 0.3

pre_week_change_cued = 0.40 - 0.21

pre_week_change_uncued = 0.40 - 0.30

#create dataframe
fig4_df <- tibble(
  change_from_pre_to = c("immediate","week"),
  cued = c(0.1, 0.19),
  uncued = c(-0.05, 0.1),
)

Let’s update this with a solution that is a bit more elegant and mutate the change values instead. First, we’ll select the variables from the data we want to use, then mutate the change variables (which are calculated as the differences between prenap implicit bias and postnap implicit bias, as well as the difference between prenap and implicit bias a week later - all of this for both the cued and uncued conditions).

#select the variables I want to use first
biaschangeconditions <- cleandata %>%
  select(postIATcued, preIATcued, postIATuncued, preIATuncued, weekIATcued, weekIATuncued)

#mutate data which is easier than calculating
biaschange <- biaschangeconditions  %>%
  mutate(immed_cued = postIATcued - preIATcued,
         immed_uncued = postIATuncued - preIATuncued,
         week_cued = weekIATcued - preIATcued,
         week_uncued = weekIATuncued - preIATuncued)

Now it’s looking much nicer. After discussing with the group for a bit, Michelle told us that Jenny said the error bars are usually calculated with standard error rather than SD in psychology papers. So we had to figure out how to calculate SE. Michelle and I had tried to figure out what the SE in a few different ways but we just couldn’t seem to get the answer.

Firstly, we just tried to use the describe function to see if that would give us the SE. While this has given us values, these don’t match the ones given on the original plot. The SEs we generated are significantly higher and go even beyond the entire scale of the original plot’s y-axis. We were very confused and weren’t sure how to calculate the SE.

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

describe(biaschange)

##               vars  n  mean   sd median trimmed  mad   min  max range  skew
## postIATcued      1 31  0.31 0.44   0.29    0.32 0.43 -0.56 0.99  1.55 -0.13
## preIATcued       2 31  0.21 0.51   0.27    0.23 0.45 -1.03 1.13  2.16 -0.34
## postIATuncued    3 31  0.25 0.48   0.33    0.25 0.58 -0.69 1.07  1.76 -0.04
## preIATuncued     4 31  0.30 0.44   0.30    0.32 0.32 -0.94 1.27  2.21 -0.45
## weekIATcued      5 31  0.40 0.39   0.40    0.40 0.36 -0.38 1.23  1.61  0.06
## weekIATuncued    6 31  0.40 0.47   0.39    0.41 0.54 -0.56 1.20  1.75 -0.05
## immed_cued       7 31  0.10 0.54   0.14    0.08 0.60 -0.83 1.32  2.15  0.19
## immed_uncued     8 31 -0.05 0.57  -0.05   -0.08 0.50 -1.03 1.89  2.92  0.94
## week_cued        9 31  0.19 0.65   0.16    0.16 0.60 -0.94 1.93  2.88  0.50
## week_uncued     10 31  0.10 0.50   0.18    0.11 0.41 -1.01 1.14  2.15 -0.35
##               kurtosis   se
## postIATcued      -0.97 0.08
## preIATcued       -0.30 0.09
## postIATuncued    -1.08 0.09
## preIATuncued      0.68 0.08
## weekIATcued      -0.77 0.07
## weekIATuncued    -1.11 0.08
## immed_cued       -0.75 0.10
## immed_uncued      2.19 0.10
## week_cued         0.37 0.12
## week_uncued      -0.40 0.09

Luckily for us, Julia had been playing around with some new packages and worked out how to do it.

First we had to find the mean which we did using summarise (which would select every variable which has ‘cued’ in it - all of which we want to use), and then finding the mean. We also found the standard error with the std.error command. And voila, it shows up in the biaschangeerror variable. While it is still very difficult to tell if this value is exactly correct (given that the scale on the graph is not precise enough to tell), it looks much closer to the original graph. However, I do not know why this is calculating SE in a different way to the describe function.

library(plotrix) #new library Julia found

## 
## Attaching package: 'plotrix'

## The following object is masked from 'package:psych':
## 
##     rescale

biaschangemean <- biaschange %>% 
  summarise(across(contains("cued"), list(mean = mean))) #finding the mean of each variable
print(biaschangemean)

##   postIATcued_mean preIATcued_mean postIATuncued_mean preIATuncued_mean
## 1        0.3068241       0.2108864           0.248543         0.3024484
##   weekIATcued_mean weekIATuncued_mean immed_cued_mean immed_uncued_mean
## 1        0.3999553          0.3988819      0.09593775       -0.05390545
##   week_cued_mean week_uncued_mean
## 1      0.1890689       0.09643346

biaschangeerror <- std.error(biaschange) #finding the standard error of each variable
print(biaschangeerror)

##   postIATcued    preIATcued postIATuncued  preIATuncued   weekIATcued 
##    0.07984374    0.09232149    0.08578681    0.07937978    0.06954221 
## weekIATuncued    immed_cued  immed_uncued     week_cued   week_uncued 
##    0.08388760    0.09759788    0.10297893    0.11593440    0.09008655

Now that we have the standard error values, we can put it back into the graph. I have also added some aesthetic things. alpha represents the colour transparency of each element, ylim are the limits of the y axis. All in all, adding the error bars was a success!

#create dataset
time1 <- c(rep("immediate",2),rep("week",2)) #the two groups of columns
condition <-rep(c("cued","uncued"),2) #the subgroups within the grouped columns
bias_change1 <- c(0.10, -0.05,0.19, 0.10) #calculated differences, needs to be in order of the graph
stderror <- c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
data = data.frame(time1, condition, bias_change1, stderror)

head(data)

##       time1 condition bias_change1   stderror
## 1 immediate      cued         0.10 0.09759788
## 2 immediate    uncued        -0.05 0.10297893
## 3      week      cued         0.19 0.11593440
## 4      week    uncued         0.10 0.09008655

# plot
ggplot(data = data, aes(
  x = time1,
  y = bias_change1,
  fill = condition
)) +
  geom_bar(position = "dodge",stat = "identity")+
  geom_errorbar(aes(x= time1, ymin=bias_change1-stderror, ymax=bias_change1+stderror), width=0.4, colour="grey", alpha= 0.9, position = position_dodge(0.9)) +
  ylim(-0.2, 0.4) +
  labs(x = "", y = "Bias Change", caption = "Fig 4. Change in implicit bias levels at the immediate and one-week delay tests.")

2. Finish another plot

This plot was largely done by Michelle, but it was still a team effort with all of us adding together our knowledge from the first plot we had just completed.

This is the plot we were trying to recreate.

Additionally, this was quite easy to do because it was very similar to the first plot we did. The main difference was that it was in geom_line instead of geom_bar. Both graphs have error bars and different levels.

First, we need to work out how to get the SEs for the error bars and means. We calculate the averages by using summarise, then for SE, we are using the same method with the plotrix package that we used in figure 4 (above).

#selecting variables that show biases
choosing_bias <- cleandata %>%
  select(baseIATcued, baseIATuncued, preIATcued, preIATuncued, postIATcued, postIATuncued, weekIATcued, weekIATuncued)

#calculate averages
bias_av <- choosing_bias %>%
  summarise(cued_baseline_av = mean(baseIATcued),
            cued_pre_av = mean(preIATcued),
            cued_post_av = mean(postIATcued),
            cued_week_av = mean(weekIATcued),
            uncued_baseline_av = mean(baseIATuncued),
            uncued_pre_av = mean(preIATuncued),
            uncued_post_av = mean(postIATuncued),
            uncued_week_av = mean(weekIATuncued))
  
print(bias_av)

##   cued_baseline_av cued_pre_av cued_post_av cued_week_av uncued_baseline_av
## 1        0.5175814   0.2108864    0.3068241    0.3999553          0.5954932
##   uncued_pre_av uncued_post_av uncued_week_av
## 1     0.3024484       0.248543      0.3988819

SE_bias <- std.error(choosing_bias)
print(SE_bias)

##   baseIATcued baseIATuncued    preIATcued  preIATuncued   postIATcued 
##    0.06522755    0.08030357    0.09232149    0.07937978    0.07984374 
## postIATuncued   weekIATcued weekIATuncued 
##    0.08578681    0.06954221    0.08388760

Then, we have to put our tibble/dataframe together. We need a column for whether it was cued/uncued mean, another column for when the implicit bias was measured (baseline,prenap, postnap, or one week after)

#construct data 
data4 <- data.frame(
  condition = factor(c("cued", "cued", "cued", "cued", "uncued", "uncued", "uncued", "uncued")),
  time = factor(c("Baseline", "Prenap", "Postnap", "1-week", "Baseline", "Prenap", "Postnap", "1-week")),
  levels = c("Baseline", "Prenap", "Postnap", "1-week"),
  bias_av = c(0.52, 0.21, 0.31, 0.40, 0.60, 0.30, 0.25, 0.40),
  se = c(0.06, 0.09, 0.08,0.07,0.08, 0.08, 0.09,0.08)
)

head(data4)

##   condition     time   levels bias_av   se
## 1      cued Baseline Baseline    0.52 0.06
## 2      cued   Prenap   Prenap    0.21 0.09
## 3      cued  Postnap  Postnap    0.31 0.08
## 4      cued   1-week   1-week    0.40 0.07
## 5    uncued Baseline Baseline    0.60 0.08
## 6    uncued   Prenap   Prenap    0.30 0.08

As you can see, this is quite similar to the dataframe for figure 4.

Now, we put it into the ggplot.

ggplot(data = data4, aes(
  x = factor(time, level = c("Baseline", "Prenap", "Postnap", "1-week")), #googled how to reorder x variables ! 
  y = bias_av,
  colour = condition,
  group = condition)) +
  geom_line() +
  geom_errorbar(aes(
    x= time,
    ymin=bias_av-se,
    ymax=bias_av+se),
width=0.1, colour="grey", alpha= 0.9) +
  ylim(0.0, 0.7) + #expands y axis 
  labs(x = "", y = "D600 Bias Score", caption = "Fig 3. Average D600 scores at each IAT timepoint")

It looks good! Really happy with how it has turned out, and this helped me consolidate methods for calculating SEs. A success!

3. Start on the final plot

This one was probably the biggest success of all this week. I managed to start and finish this plot all on my own! It was easier for a few reasons:

I already knew a lot about plotting and setting up dataframes thanks to the past tables and plots I had attempted
I didn’t have to calculate any of my own values, and the ones that I did have to calculate were easy.

I’ve already posted the process for this elsewhere on my RPubs, but here it is again.

Load packages and read data first.

#load packages
library(readspss) #package to read the original datafile from OFS
library(tidyverse)

#read data
data <- read.sav("Humiston & Wamsley 2019 data.sav")

#remove excluded 
cleandata <- data %>%     #remove excluded participants 
  filter(exclude=="no")

Figuring out what to put into the figure

First, we’ll have a look at the graph we want to replicate. This is a screenshot from the original report:

We need to figure out how to calculate the differential bias change (for the y axis). However, luckily for us we don’t need to calculate the x axis since that is already done.

Calculating y-axis: differential bias change

According to the paper, differential bias change is calculated by “baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias”.

So the equation would look something like this:

(baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued)

We would then have to apply this to each participant’s scores.

Firstly, we have to mutate a bunch of extra columns to get what we want for the y-axis. So we mutate to get one set of brackets, again for the other set, and once again to get the final differential bias change.

differential <- cleandata %>%
  select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, SWSxREM) %>%
  mutate(cued_differential = baseIATcued - weekIATcued,
         uncued_differential = baseIATuncued - weekIATuncued,
         diff_bias_change = cued_differential - uncued_differential) #this is the y-axis

print(differential)

##    ParticipantID baseIATcued weekIATcued baseIATuncued weekIATuncued SWSxREM
## 1            ub6  0.57544182  0.20377367    0.60953653    0.68277422     276
## 2            ub7  0.09911241  0.45873715    0.64396538   -0.01070460       0
## 3            ub8  0.20577365  0.39859469    1.52435622    0.71187286     408
## 4            ub9  0.35314196  0.92341592    0.13108478    0.20212832     408
## 5           ub11  0.57200207 -0.01869151    0.04879409    0.13071184      32
## 6           ub13  0.31025514  0.56073473    0.90121486    1.11629844     648
## 7           ub14  0.23241080 -0.06857532    1.50094682   -0.27687601     333
## 8           ub15  0.67870908  0.25359928    0.61393136    0.03100248      36
## 9           ub18  1.08814254  0.77816500    0.52245709   -0.31888702     552
## 10          ub24  0.55318776  0.08084087    0.28256540    0.03038869     275
## 11          ub25  0.91751740  0.28109140    0.41293602    0.58451878      70
## 12          ub27  0.68424700  0.51216459    0.44334149    0.98690632     496
## 13          ub28  1.08844176  0.55663600    0.77617187    0.93836953       0
## 14          ub29  0.94430311  0.21879167    0.76616645    0.40828505     476
## 15          ub31 -0.15051656  0.64307684    1.20427060    0.88612653     363
## 16          ub32 -0.01583277  0.90497541    0.89984149    0.99438842       0
## 17          ub34  0.53679315  0.57458515    0.26553538    0.00799229     198
## 18          ub35  0.89884164 -0.12045279    0.94869928    1.19622739       0
## 19          ub36  0.63866504  0.64107173    0.36672662    0.62092800     216
## 20          ub38 -0.23209723  0.13055599   -0.20514602   -0.07578923     240
## 21          ub40  0.35954000  0.30356801    0.51371306    0.50089370     450
## 22          ub41  0.26726994 -0.24029660    0.35867739    0.02753136     684
## 23          ub42  0.84482587 -0.37601343    0.63582486   -0.55788997     836
## 24          ub43  0.63319883  0.99021393    0.56643958    0.38961081     506
## 25          ub44  0.43954561  0.62261116    0.93824426    0.98645892       0
## 26          ub45  0.73144877  0.16658792    1.08670619    0.86611864     418
## 27          ub46 -0.07735156  0.05022121    0.08362815   -0.10227978     480
## 28          ub47  1.08893601  1.22970000   -0.18741515    0.16271922      50
## 29          ub48  0.79863715  0.89204287    1.32229673    0.62636312       0
## 30          ub49  0.44411896  0.16209957    0.33019024    0.33564519     336
## 31          ub50  0.53631389  0.68478850    0.15458906    0.28350526     224
##    cued_differential uncued_differential diff_bias_change
## 1        0.371668148         -0.07323769       0.44490584
## 2       -0.359624732          0.65466998      -1.01429471
## 3       -0.192821041          0.81248335      -1.00530440
## 4       -0.570273967         -0.07104354      -0.49923043
## 5        0.590693579         -0.08191775       0.67261133
## 6       -0.250479588         -0.21508358      -0.03539601
## 7        0.300986111          1.77782283      -1.47683672
## 8        0.425109806          0.58292888      -0.15781907
## 9        0.309977544          0.84134411      -0.53136657
## 10       0.472346888          0.25217670       0.22017019
## 11       0.636426002         -0.17158276       0.80800876
## 12       0.172082405         -0.54356483       0.71564723
## 13       0.531805761         -0.16219766       0.69400342
## 14       0.725511445          0.35788139       0.36763005
## 15      -0.793593406          0.31814407      -1.11173748
## 16      -0.920808186         -0.09454693      -0.82626125
## 17      -0.037791997          0.25754309      -0.29533509
## 18       1.019294435         -0.24752811       1.26682255
## 19      -0.002406687         -0.25420138       0.25179469
## 20      -0.362653222         -0.12935678      -0.23329644
## 21       0.055971988          0.01281936       0.04315263
## 22       0.507566544          0.33114603       0.17642051
## 23       1.220839300          1.19371483       0.02712447
## 24      -0.357015106          0.17682877      -0.53384387
## 25      -0.183065550         -0.04821467      -0.13485088
## 26       0.564860845          0.22058754       0.34427330
## 27      -0.127572774          0.18590793      -0.31348070
## 28      -0.140763994         -0.35013437       0.20937037
## 29      -0.093405722          0.69593361      -0.78933933
## 30       0.282019388         -0.00545494       0.28747433
## 31      -0.148474613         -0.12891620      -0.01955842

Now that we have the values for the axes, I’m going to plot the graph.

Plot the graph

We select the data to be what we used before, then we specify the y and x axes. We then use geom_point to create a scatter plot. After that, we add geom_smooth to add a regression line. Within geom_smooth we have to add method = lm to make it a straight line, and we get rid of the confidence interval with se = F.

ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
))+
  geom_point()+
  geom_smooth(method = lm, #lm stands for linear model, so it makes the line straight
              se = F)+ #removes confidence interval shading
  xlim(0,1000) #for some reason the x axis isn't starting at 0 with this

## `geom_smooth()` using formula 'y ~ x'

An attempt to get the x-axis to start at 0. I found this solution on Google which adds the scale_x_continuous function to centre the graph at the origin. At the same time, the `limits = c(0,1000) allows us to expand the axis to 1000, the same length as the original graph.

Final graph:

ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
))+
  geom_point()+
  geom_smooth(method = lm, #lm stands for linear model, so it makes the line straight
              se = F)+ #removes confidence interval shading
  scale_x_continuous(expand = c(0,0),limits = c(0,1000))+ 
  scale_y_continuous(expand = c(0,0),limits = c(-2,1.5))+
  labs(title = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change", 
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change")+
  theme_bw()

## `geom_smooth()` using formula 'y ~ x'

Was extremely happy with this process and there were little to no bumps on the road. The only big difficulty with this was initially working out how to calculate the differential bias change. It even looks really similar to the original graph on an aesthetic level!

4. Learn to use github

Also learnt how to use github this week for our group project since the lockdown will not be ending any time soon.

I had some challenges setting up and navigating the Mac OS terminal, but thanks to Jenny’s guidance and the HappyGithub website, this turned out really well. We have now set up a repository and are uploading all of our scripts and code for figures onto this now.

Next steps on my coding journey

Since our group has now finished all our plots and tables (surprisingly early), my main goal is going to be backtracking and trying to fully understand aspects of code that I don’t understand. For me, this is mostly to do with tables and the gt() package.

Learn to use the gt() package by recreating Table 1 myself
Learn to create tibbles with the value name instead of the actual value itself

Week 6 Learning Log

Katherine Wong

09/07/2021