This week's coding goals

As mentioned in my previous learning log, my group has finished reproducing all tables and figures. This week, my group mainly focused on putting together our presentation. Personally, I focused on reproducing the values and graph for Figure 5.

How did I go? / Challenges and successes

Preliminaries

Load relevant packages

The library() function loads packages. The readspss package is used to read the original data file from OSF, and the tidyverse package is used for data wrangling.

library(readspss) 
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Load data

The read_csv() function reads the dataset, and <- assigns it to a new object, Plot5.

Plot5 <- read_csv("cleandata.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   General_1_Age = col_double(),
##   General_1_EnglishYrs = col_double(),
##   General_1_CaffCups = col_double(),
##   General_1_CaffHrsAgo = col_double(),
##   General_1_UniYears = col_double(),
##   Epworth_total = col_double(),
##   AlertTest_1_Concentr_1 = col_double(),
##   AlertTest_1_Refresh_1 = col_double(),
##   AlertTest_2_Concentr_1 = col_double(),
##   AlertTest_2_Refresh_1 = col_double(),
##   AlertTest_3_Concentr_1 = col_double(),
##   AlertTest_3_Refresh_1 = col_double(),
##   AlertTest_4_Concentr_1 = col_double(),
##   AlertTest_4_Refresh_1 = col_double(),
##   Total_sleep = col_double(),
##   Wake_amount = col_double(),
##   NREM1_amount = col_double(),
##   NREM2_amount = col_double(),
##   SWS_amount = col_double(),
##   REM_amount = col_double()
##   # ... with 26 more columns
## )
## ℹ Use `spec()` for the full column specifications.

Create Figure 5 data tibble

To reproduce Figure 5, two sets of values are needed. For the x-axis, the values for minutes in SWS x minutes in REM are needed, and for the y-axis, differential bias change is needed. The x-axis values have already been provided in the open data, under the variable SWSxREM.

For the y-axis, the paper defines differential bias change as “baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias”.

So the equation would look similar to this:

differential bias change = (baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued)

This equation must be applied to each participant's scores, so mutate() is the best function for the job. mutate() comes from the dplyr package and allows for the creation, modification and deletion of columns. It adds new variables while keeping existing ones. The two bracketed parts of the equation can be stored as two new variables: cued_differential and uncued_differential. cued_differential is defined as baseIATcued - weekIATcued. Likewise, uncued_differential is defined as baseIATuncued - weekIATuncued. Thus, the equation for the variable diff_bias_change can also be written as:

diff_bias_change = cued_differential - uncued_differential
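
As a quick check of the arithmetic, the equation can be applied by hand to one participant. This is only a sketch: it uses the rounded values that head() prints for participant ub6 later in this log, so the result is approximate.

```r
# Rounded values for participant ub6, copied from the head() output in this log
baseIATcued   <- 0.575
weekIATcued   <- 0.204
baseIATuncued <- 0.610
weekIATuncued <- 0.683

# Baseline minus delayed score, separately for cued and uncued bias
cued_differential   <- baseIATcued - weekIATcued
uncued_differential <- baseIATuncued - weekIATuncued

# Differential bias change: cued change minus uncued change
diff_bias_change <- cued_differential - uncued_differential
diff_bias_change  # approximately 0.444
```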

However, before that equation can be coded in, several other steps need to occur first:

  • First, a new object differential is created using <-. <- assigns a value (given on the right of the symbol) to a name (i.e. differential). The data Plot5 is the starting point on the right-hand side.
  • The pipe operator %>% is used to chain multiple steps into a single statement, without having to create and store intermediate variables. It does this by taking the output of one step and making it the input of the next step.
  • For example, the first line of code starts with all the data in Plot5. The pipe operator %>% takes that output and uses it as the input of the next line of code, and so forth. The result of the final step is what gets stored in differential.
  • select() is from the dplyr package. It selects variables within a dataframe. The variables to be selected from the Plot5 dataframe are listed within the brackets.
  • mutate() is used to create the three new variables detailed above. It adds three new columns to the selected data, so the result stored in differential has nine columns. Plot5 itself is not changed.
  • head() prints the first six rows so that the newly calculated variables can be checked.
differential <- Plot5 %>%
  select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, SWSxREM) %>%
  mutate(cued_differential = baseIATcued - weekIATcued,
         uncued_differential = baseIATuncued - weekIATuncued,
         diff_bias_change = cued_differential - uncued_differential) 

head(differential)
## # A tibble: 6 x 9
##   ParticipantID baseIATcued weekIATcued baseIATuncued weekIATuncued SWSxREM
##   <chr>               <dbl>       <dbl>         <dbl>         <dbl>   <dbl>
## 1 ub6                0.575       0.204         0.610         0.683      276
## 2 ub7                0.0991      0.459         0.644        -0.0107       0
## 3 ub8                0.206       0.399         1.52          0.712      408
## 4 ub9                0.353       0.923         0.131         0.202      408
## 5 ub11               0.572      -0.0187        0.0488        0.131       32
## 6 ub13               0.310       0.561         0.901         1.12       648
## # … with 3 more variables: cued_differential <dbl>, uncued_differential <dbl>,
## #   diff_bias_change <dbl>
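
To show what the pipe is doing, here is a minimal sketch comparing a piped version with the equivalent nested calls. The tibble toy and its values are made up for illustration and are not from the real dataset.

```r
library(dplyr)

# Hypothetical stand-in for Plot5 (made-up values, not the real data)
toy <- tibble(
  ParticipantID = c("p1", "p2"),
  baseIATcued   = c(0.5, 0.2),
  weekIATcued   = c(0.3, 0.4),
  other         = c("a", "b")
)

# Piped version: each step receives the output of the previous step
piped <- toy %>%
  select(ParticipantID, baseIATcued, weekIATcued) %>%
  mutate(cued_differential = baseIATcued - weekIATcued)

# Nested version: the same steps, written inside-out without the pipe
nested <- mutate(
  select(toy, ParticipantID, baseIATcued, weekIATcued),
  cued_differential = baseIATcued - weekIATcued
)

# Both approaches should give the same tibble
all.equal(piped, nested)
```

The piped version reads top to bottom in the order the steps happen, which is why it is easier to follow once there are more than two steps.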

Plotting the graph

Now that all the needed values have been gathered, it's time to plot the graph. The ggplot2 package is used to graph the data.

  • ggplot() initialises a ggplot object.
  • data = indicates which data is to be used, followed by the aes() function for the aesthetic mappings of the graph.
  • the aes() function indicates which variables are to be used for the x- and y-axes.
  • geom_point() adds a layer of points, which produces a scatterplot.
  • geom_smooth() adds a smoothed line, which can be set up as a regression line.

The overall output is quite weird - the graph contains confidence interval shading, the regression line is not straight, and the x-axis does not begin at the value 0.

ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

To fix this, I did this:

  • Researching geom_smooth() showed that the confidence interval is set to se = TRUE by default. Thus, to get rid of the confidence interval shading, it must be set to se = FALSE.
  • Regarding the non-straight regression line, geom_smooth() used method = 'loess' here, as its message reports. loess is a smoothing method based on local fitting. method = can also be set to "lm" (linear model) or "glm" (generalised linear model). To make the regression line straight, method must be set to "lm".
  • To set the x- and y-axis value limits, xlim() and ylim() can be used, respectively. Within the brackets, two numeric values must be given, where the first (left) value specifies the lower limit and the second (right) value specifies the upper limit of each axis.

This looks almost correct, except the x- and y-axes are still not starting at the value 0.

ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  xlim(0, 850) +
  ylim(-2, 1.5)
## `geom_smooth()` using formula 'y ~ x'
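
The message above shows that the line comes from a linear model with the formula y ~ x. The sketch below fits the same kind of model directly with lm() on made-up data (the toy values are hypothetical) to show why the result is a straight line.

```r
# Hypothetical data: values are made up for illustration only
toy <- data.frame(
  x = c(0, 100, 200, 300, 400),
  y = c(0.5, 0.3, 0.4, 0.1, 0.2)
)

# Ordinary least-squares fit of y on x, the kind of model method = "lm" requests
fit <- lm(y ~ x, data = toy)
coef(fit)  # one intercept and one slope

# Predictions at evenly spaced x values change by a constant amount,
# which is what makes the fitted line straight
preds <- predict(fit, newdata = data.frame(x = c(0, 250, 500)))
diff(preds)
```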

I realised that xlim() and ylim() are meant for simple manipulations of limits. For more in-depth control of the x- and y-axes, scale_x_discrete(), scale_x_continuous(), or scale_x_date() can be used. Since the data is continuous, I'll try replacing xlim() and ylim() with scale_x_continuous() and scale_y_continuous(). Within these functions' brackets, I need to specify some things:

  • limits defines the limits of the scale.
  • expand has been set on default to allow for some padding/gap on each side for the data variables.
  • For example, as seen in the figure above, there is a 5% gap/expansion on each side of the scale because the variables are continuous. For discrete variables, the padding is 0.6 units on each side of the scale.
  • To remove that default padding/expansion and set it so that the x-axis starts at the value 0, expand is set at c(0,0) for both the x- and y-axes.
  • The c(...) in c(0,0) combines the arguments (i.e. the values within the brackets 0, 0) to form a vector
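
To make the 5% padding concrete, here is a rough sketch of the arithmetic for the x-axis limits of 0 and 850. The object names are just for illustration. This is only the calculation and not ggplot2's own code.

```r
# x-axis limits used in the plot
limits <- c(0, 850)

# Default continuous padding: 5% of the range, added on each side
pad <- 0.05 * diff(limits)
padded <- c(limits[1] - pad, limits[2] + pad)
padded  # roughly -42.5 and 892.5, so the axis starts below 0

# With expand = c(0, 0) there is no padding, so the axis spans the limits exactly
unpadded <- c(limits[1] - 0, limits[2] + 0)
unpadded
```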

However, now the figure is completely weird. It is only showing a dot at the centre of the figure.

ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = 0, 850,
                     expand = c(0,0)) +
  scale_y_continuous(limits = -2, 1.5,
                     expand = c(0,0))
## `geom_smooth()` using formula 'y ~ x'

Looking back at my code, I realised that when I entered the limits I wanted for the x- and y-axes, I didn't wrap them in c(...), so they were not passed as a single vector. Instead, only the first value was taken as the limit, and the second, unnamed value was matched to the scale's first argument, name, which sets the axis title. So the x-axis had a limit of 0 and an axis title of "850", and the y-axis had a limit of -2 and an axis title of "1.5". Wrapping each pair of values in c(...) fixed the issue with the axis limits.

ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     expand = c(0,0))
## `geom_smooth()` using formula 'y ~ x'
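
The earlier mistake comes down to how R matches arguments. The sketch below uses a made-up function, toy_scale(), whose first argument is name (as it is in scale_x_continuous()). It is a stand-in for illustration and not the real ggplot2 function.

```r
# Hypothetical stand-in for a scale function: name comes first, as in ggplot2
toy_scale <- function(name = NULL, breaks = NULL, limits = NULL) {
  list(name = name, breaks = breaks, limits = limits)
}

# Without c(): limits gets only 0, and the unnamed 850 fills the first
# unmatched argument, which is name
wrong <- toy_scale(limits = 0, 850)
wrong$limits
wrong$name

# With c(): both values travel together as one vector
right <- toy_scale(limits = c(0, 850))
right$limits
right$name
```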

Now, to add aesthetics and formatting for the figure:

  • labs() adds titles, plot labels and legend labels to the figure.
  • title = specifies the text for the title with the text included in "" following the =. The same goes for the x- (x =) and y-axes (y =).
  • Note: I used subtitle = instead of title = here because not all of the title could be seen since the default title = font was too big to include everything.
ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     expand = c(0,0)) + 
   labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change", 
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change") 
## `geom_smooth()` using formula 'y ~ x'

I wanted the x-axis values to match what is seen in the original paper, i.e. only the values 0 and 500 are shown. To do this, I added breaks = in the scale_x_continuous() function and specified the values I wanted to see in c(...). This displays only the specified values on the x-axis. I also did this for the y-axis, so that the value 1.5 is shown and the axis is in intervals of 0.5.

ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     breaks = c(0,500),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     breaks = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5),
                     expand = c(0,0)) + 
   labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change", 
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change")
## `geom_smooth()` using formula 'y ~ x'
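
As a possible shortcut, the evenly spaced y-axis breaks could be generated with seq() instead of being typed out. This is just a sketch of an alternative; it gives the same values as the vector used above.

```r
# Breaks typed by hand, as in the plot code above
typed_breaks <- c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5)

# The same breaks generated from a start, an end and a step size
seq_breaks <- seq(-2, 1.5, by = 0.5)

# Check that the two vectors match
all.equal(typed_breaks, seq_breaks)
```

seq_breaks could then be passed to breaks = in scale_y_continuous() in place of the typed vector.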

To make it more similar to the figure seen in the paper, I want to remove the grey background and the grid lines. To do this, I experimented with a few themes by adding theme_bw(), theme_light(), theme_minimal() and theme_classic(). In the end, the one that worked best was theme_classic().

ggplot(data = differential, aes(
         x = SWSxREM,
         y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     breaks = c(0,500),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     breaks = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5),
                     expand = c(0,0)) + 
   labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change", 
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change") +
  theme_classic()
## `geom_smooth()` using formula 'y ~ x'

Next steps

Now, I've reproduced everything in the paper by myself except for Figure 3. If I have the time this coming week, I will attempt to reproduce Figure 3 on my own. However, my main focus for this coming week is to get started on my exploratory analyses.