Wk 7 Learning Log

This week’s goals

Finish the speech and slide formatting for my first part of the presentation in time for our weekly meeting
Discuss examples of code to use in our presentation during the meeting and allocate the rest of the work
Continue learning how to use GitHub and add my code for tables and figures to the group GitHub
Have a go at reproducing figures 3 and 5 independently and then using my teammates’ code from GitHub
Fix the code that wasn’t working from last week

Figure 3

Preliminaries

load packages

I will be using the same packages as last week to create figures 3 and 5.

library(tidyverse) #for data wrangling and visualisation

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(gt) #for creating a table
library(janitor) #for cleaning names and other possibly handy functions

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(plotrix) #for calculating standard error of the mean

read in the data

As usual, I am reading in the clean data file I saved when creating table 1. Just to recap, this file excludes all data from excluded participants. There are more variables than I need to reproduce figure 3, so I will need to do some data wrangling.

cleandata <- read_csv("cleandata.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_character(),
##   Age = col_double(),
##   General_1_EnglishYrs = col_double(),
##   General_1_CaffCups = col_double(),
##   General_1_CaffHrsAgo = col_double(),
##   General_1_UniYears = col_double(),
##   EES = col_double(),
##   AlertTest_1_Concentr_1 = col_double(),
##   AlertTest_1_Refresh_1 = col_double(),
##   AlertTest_2_Concentr_1 = col_double(),
##   AlertTest_2_Refresh_1 = col_double(),
##   AlertTest_3_Concentr_1 = col_double(),
##   AlertTest_3_Refresh_1 = col_double(),
##   AlertTest_4_Concentr_1 = col_double(),
##   AlertTest_4_Refresh_1 = col_double(),
##   Total_sleep = col_double(),
##   Wake_amount = col_double(),
##   NREM1_amount = col_double(),
##   NREM2_amount = col_double(),
##   SWS_amount = col_double(),
##   REM_amount = col_double()
##   # ... with 26 more columns
## )
## i Use `spec()` for the full column specifications.

Data wrangling

obtaining relevant variables

This figure plots the mean bias scores for the cued and uncued conditions across all four time points, so I will first use select to obtain the relevant variables from the ‘cleandata’ file and store them in ‘avgbias’.

Then I’m using summarise to obtain the mean of each of these variables and store them in a new tibble called ‘mean_bias’. This is done efficiently by using across in conjunction with contains, which picks out every variable whose name contains a common phrase, rather than typing all the variables out again.

Then I’m using the std.error function from the plotrix package to find the standard error of the mean for each of the variables in ‘avgbias’:

avgbias <- cleandata %>% 
  select(baseIATcued, baseIATuncued, preIATcued, preIATuncued, postIATcued, postIATuncued, weekIATcued, weekIATuncued)

mean_bias <- avgbias %>% 
  summarise(across(contains("IAT"), list(mean = mean))) #pass the function itself rather than its name as a string

print(mean_bias)

## # A tibble: 1 x 8
##   baseIATcued_mean baseIATuncued_mean preIATcued_mean preIATuncued_mean
##              <dbl>              <dbl>           <dbl>             <dbl>
## 1            0.518              0.595           0.211             0.302
## # ... with 4 more variables: postIATcued_mean <dbl>, postIATuncued_mean <dbl>,
## #   weekIATcued_mean <dbl>, weekIATuncued_mean <dbl>

se_bias <- std.error(avgbias)
  
print(se_bias)

##   baseIATcued baseIATuncued    preIATcued  preIATuncued   postIATcued 
##    0.06522755    0.08030357    0.09232149    0.07937978    0.07984374 
## postIATuncued   weekIATcued weekIATuncued 
##    0.08578681    0.06954221    0.08388760
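For reference, the standard error of the mean is the sample standard deviation divided by the square root of the number of observations. Below is a minimal base R sketch of that textbook formula, which I believe is equivalent to what std.error does for a single column with no missing values. The scores are hypothetical toy numbers, not values from the study.

```r
# Hypothetical toy scores, not values from the study
scores <- c(0.2, 0.5, 0.4, 0.9, 0.5)

# Standard error of the mean: sample SD divided by the square root of n
sem <- sd(scores) / sqrt(length(scores))
print(sem)
```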

Reproducing the figure

creating the dataframe

Now that I have all my values reproduced, I can organise them into a data frame which will be used to create the ggplot later. I’m trying something new that Jenny suggested in one of our group members’ learning logs: using $ to pull each mean directly out of the ‘mean_bias’ tibble created earlier, rather than copying and pasting the actual value and risking an error. However, I was not able to do this with the standard error values, so I had to copy and paste them.

time <- c(rep("baseline",2),rep("prenap",2), rep("postnap",2), rep("week", 2))
condition <- rep(c("cued", "uncued"), times = 4) #repeat the pair once per time point
bias <- c(mean_bias$baseIATcued_mean, mean_bias$baseIATuncued_mean, mean_bias$preIATcued_mean, mean_bias$preIATuncued_mean, mean_bias$postIATcued_mean, mean_bias$postIATuncued_mean, mean_bias$weekIATcued_mean, mean_bias$weekIATuncued_mean)
stderror <- c(0.06522755, 0.08030357, 0.09232149, 0.07937978, 0.07984374, 0.08578681, 0.06954221, 0.08388760)
data2 <- data.frame(time, condition, bias, stderror)

head(data2)

##       time condition      bias   stderror
## 1 baseline      cued 0.5175814 0.06522755
## 2 baseline    uncued 0.5954932 0.08030357
## 3   prenap      cued 0.2108864 0.09232149
## 4   prenap    uncued 0.3024484 0.07937978
## 5  postnap      cued 0.3068241 0.07984374
## 6  postnap    uncued 0.2485430 0.08578681
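Judging by the printed output, se_bias looks like a named numeric vector rather than a data frame, which is probably why $ did not work on it ($ is not valid on plain vectors). A minimal sketch of how the copy-and-paste step could be avoided, assuming that is the case. Here the vector is rebuilt by hand from the printed values so the example runs on its own; in the real script it would just be the existing se_bias object.

```r
# Named vector with the values printed by print(se_bias) above;
# in the real script this would simply be the se_bias object itself
se_bias <- c(baseIATcued = 0.06522755, baseIATuncued = 0.08030357,
             preIATcued = 0.09232149, preIATuncued = 0.07937978,
             postIATcued = 0.07984374, postIATuncued = 0.08578681,
             weekIATcued = 0.06954221, weekIATuncued = 0.08388760)

# Pick a single element by name (double brackets return the bare number) ...
se_bias[["baseIATcued"]]

# ... or drop the names to get all eight values in column order
stderror <- unname(se_bias)
print(stderror)
```

Because the columns of ‘avgbias’ are already in baseline, prenap, postnap, week order with cued before uncued, the unnamed vector lines up with the rows of ‘data2’.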

creating the ggplot

Now I’m up to formatting the info in the above dataframe into a plot.

fig_3 <- ggplot(data = data2, aes(x = time, y = bias, fill = condition))+
  geom_line()+
  geom_errorbar(aes(x = time, ymin = bias - stderror, ymax = bias + stderror), width=0.1, colour="grey", alpha= 0.9)+
   ylim(0.0, 0.7) +
  labs(x = "", y = "D600 Bias Score", caption = "Fig 3. Average D600 scores at each IAT timepoint") +
  theme_bw()

print(fig_3)

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?

Hmm, it seems like I do not have lines connecting each of the time points, plus the key for the cued and uncued conditions seems to be missing. I’m experiencing a similar issue to the one Michelle had, so I will try to reorganise the data frame in the way that she found to be successful for connecting the lines.

attempt 2: re-creating my dataframe

data2.1 <- data.frame(
  condition = factor(c("cued", "cued", "cued", "cued", "uncued", "uncued", "uncued", "uncued")),
  time = factor(c("Baseline", "Prenap", "Postnap", "1-week", "Baseline", "Prenap", "Postnap", "1-week")),
  levels = c("Baseline", "Prenap", "Postnap", "1-week"),
  bias = c(mean_bias$baseIATcued_mean, mean_bias$preIATcued_mean, mean_bias$postIATcued_mean, mean_bias$weekIATcued_mean, mean_bias$baseIATuncued_mean, mean_bias$preIATuncued_mean,mean_bias$postIATuncued_mean, mean_bias$weekIATuncued_mean),
  stderror = c(0.06522755, 0.09232149, 0.07984374, 0.06954221, 0.08030357, 0.07937978, 0.08578681, 0.08388760))
  
head(data2.1)

##   condition     time   levels      bias   stderror
## 1      cued Baseline Baseline 0.5175814 0.06522755
## 2      cued   Prenap   Prenap 0.2108864 0.09232149
## 3      cued  Postnap  Postnap 0.3068241 0.07984374
## 4      cued   1-week   1-week 0.3999553 0.06954221
## 5    uncued Baseline Baseline 0.5954932 0.08030357
## 6    uncued   Prenap   Prenap 0.3024484 0.07937978
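One thing the output above shows is that levels ended up as its own column: in the code, levels = ... sits outside the closing bracket of factor(), so data.frame() treats it as a new variable and recycles it. A minimal sketch of what passing it inside factor() might look like instead, using only the time labels from ‘data2.1’:

```r
# Same time labels as in data2.1, in the order they were typed
time_labels <- c("Baseline", "Prenap", "Postnap", "1-week",
                 "Baseline", "Prenap", "Postnap", "1-week")

# Without levels, factor() sorts the labels alphabetically
levels(factor(time_labels))

# With levels passed inside factor(), the chronological order is kept
time <- factor(time_labels,
               levels = c("Baseline", "Prenap", "Postnap", "1-week"))
levels(time)
```

With the order stored in the factor itself, the x-axis order would not need to be set again inside ggplot.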

attempt 2: creating my ggplot

I ended up taking a look at some of the code my team members put on GitHub to figure out this last part, especially where I had to define the x variable. It turns out that just mapping time to x isn’t enough, because ggplot does not put the time points in the order we wanted. Specifying the levels inside factor was necessary to organise the time points chronologically.

The rest of the ggplot features are similar to those used in figure 4. We used geom_line and geom_errorbar to stack the lines and error bars on top of each other, with final touch-ups to the colour, width and transparency of the error bars using colour, width and alpha respectively.

And finally using labs removes the x label, relabels the y label, and adds a caption.

fig_3.2 <- ggplot(data2.1, aes(x = factor(time, levels = c("Baseline", "Prenap", "Postnap", "1-week")), y = bias, colour = condition, group = condition)) +
  geom_line() +
  geom_errorbar(aes(ymin = bias - stderror, ymax = bias + stderror), width = 0.1, colour = "grey", alpha = 0.9) + #x is inherited from ggplot(), so the bars use the same ordered factor
  labs(x = "", y = "D600 Bias Score", caption = "Fig 3. Average D600 scores at each IAT timepoint") +
  theme_bw()
  
print(fig_3.2)

Figure 5

This figure shows the association between SWS x REM sleep and differential bias change.

Since I have already loaded the relevant packages and read in the data, I will jump straight to data wrangling.

Data wrangling

obtaining relevant variables

Looking at the axes of the graph, it seems that there is no differential bias change variable provided in the authors’ SPSS file, so we have to figure out a way to create this variable. However, SWS x REM has been provided by the authors.

From the paper, differential bias change is “baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias”. This is the equation Katherine provided on GitHub:

(baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued) = cued_differential - uncued_differential

It seems that differential bias change is a difference in differences: the change in cued bias from the first IAT measure (baseline) to the last IAT measure (one week later), minus the same change in uncued bias. Though I’m still a little confused as to what exactly it means, and also why the authors decided to look at this.

Either way, we need to use mutate to create some new variables to obtain the differential bias measure.

As usual, select picks the relevant variables from ‘cleandata’ for further analysis. Then mutate is used to create the ‘cued_differential’ and ‘uncued_differential’ variables, which are used to find the final differential bias change value:

differential <- cleandata %>%
  select(baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, SWSxREM) %>%
  mutate(cued_differential = baseIATcued - weekIATcued,
         uncued_differential = baseIATuncued - weekIATuncued,
         diff_bias_change = cued_differential - uncued_differential) #this is the y-axis

print(differential)

## # A tibble: 31 x 8
##    baseIATcued weekIATcued baseIATuncued weekIATuncued SWSxREM cued_differential
##          <dbl>       <dbl>         <dbl>         <dbl>   <dbl>             <dbl>
##  1      0.575       0.204         0.610         0.683      276             0.372
##  2      0.0991      0.459         0.644        -0.0107       0            -0.360
##  3      0.206       0.399         1.52          0.712      408            -0.193
##  4      0.353       0.923         0.131         0.202      408            -0.570
##  5      0.572      -0.0187        0.0488        0.131       32             0.591
##  6      0.310       0.561         0.901         1.12       648            -0.250
##  7      0.232      -0.0686        1.50         -0.277      333             0.301
##  8      0.679       0.254         0.614         0.0310      36             0.425
##  9      1.09        0.778         0.522        -0.319      552             0.310
## 10      0.553       0.0808        0.283         0.0304     275             0.472
## # ... with 21 more rows, and 2 more variables: uncued_differential <dbl>,
## #   diff_bias_change <dbl>
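To make the difference-in-differences a bit more concrete, here is the arithmetic for the first participant only, using the rounded values printed in the tibble above. This is just a sketch; the rounding means it can differ slightly from the stored value.

```r
# Values for the first participant, rounded as printed in the tibble above
base_cued   <- 0.575
week_cued   <- 0.204
base_uncued <- 0.610
week_uncued <- 0.683

cued_differential   <- base_cued - week_cued       # drop in cued bias
uncued_differential <- base_uncued - week_uncued   # negative, so uncued bias rose slightly
diff_bias_change    <- cued_differential - uncued_differential
print(diff_bias_change)
```

As I read it, a positive value means this participant’s bias dropped more for the cued condition than for the uncued condition.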

Recreating figure 5

creating my ggplot

This time I’m using geom_point since it is a scatter plot, and then adding geom_smooth to stack the line of best fit on top. Setting method to lm makes the line of best fit straight rather than curved.

Using scale_x_continuous and scale_y_continuous allows us to specify the limits of the x and y axes so the graph is formatted similarly to the original. And finally labs allows us to add axis labels and a title.

fig_5 <- ggplot(data = differential, aes(x = SWSxREM, y = diff_bias_change))+
  geom_point()+
  geom_smooth(method = "lm", #lm stands for linear model, so it makes the line straight
              se = FALSE)+ #removes confidence interval shading
  scale_x_continuous(expand = c(0,0),limits = c(0,1000))+ 
  scale_y_continuous(expand = c(0,0),limits = c(-2,1.5))+
  labs(title = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change", 
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change")+
  theme_bw()

print(fig_5)

## `geom_smooth()` using formula 'y ~ x'
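The message above says the line is fitted with the formula y ~ x, which is the same kind of model lm fits directly. A minimal sketch of that idea, using hypothetical toy data that lies exactly on a straight line (not data from the study):

```r
# Hypothetical toy data lying exactly on the line y = 1 + 2x
toy <- data.frame(x = c(0, 1, 2, 3), y = c(1, 3, 5, 7))

# Fit the same kind of model that the message above reports: y ~ x
fit <- lm(y ~ x, data = toy)
coef(fit)   # intercept and slope of the straight line
```

Running the same call on ‘differential’ with diff_bias_change ~ SWSxREM should give the intercept and slope of the line drawn in the figure, as long as no points fall outside the axis limits.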

And with that I’m pretty confident about all the code for every table and figure.

Presentation progress

I was able to achieve all the goals for the presentation this week, although I did have to ask my group to reschedule the meeting to a little later than usual as I had a lot of work for other subjects. It worked out well in the end.

We ended up doing all the things planned for today’s meeting, and all we have left is finishing our respective parts and speeches and formatting the slides, and then we’re ready to practise and record.

Fixing my code from last week

I ended up getting some help from a team member and I got it all sorted :)

Using GitHub

During the meeting today, I was able to figure out how to upload my own files to GitHub with the help of my team members.

Challenges and Successes

Overall things went very smoothly this week. Most of my challenges were documented above, along with a few others such as using GitHub, forgetting my password and resetting it, and struggling to reduce the length of my speech.

Nevertheless I sorted everything out and had time to go over the code for the other figures my team had put up, which was nice since I now fully understand all of the code. How exciting!

Next week

I plan to finish off my parts of the presentation and record it with my group
Get started on the exploratory analysis; it looks like there’s a lot to do, and there isn’t much time left