As mentioned in my previous learning log, my group has finished reproducing all tables and figures. This week, my group mainly focused on putting together our presentation. Personally, I focused on reproducing the values and graph for Figure 5.
Load relevant packages
The library() function loads packages. The readspss package is used to read the original data file from OFS, and the tidyverse package is used for data wrangling.
library(readspss)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Load data
The read_csv() function reads the dataset, and <- assigns it to a new variable, Plot5.
Plot5 <- read_csv("cleandata.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## General_1_Age = col_double(),
## General_1_EnglishYrs = col_double(),
## General_1_CaffCups = col_double(),
## General_1_CaffHrsAgo = col_double(),
## General_1_UniYears = col_double(),
## Epworth_total = col_double(),
## AlertTest_1_Concentr_1 = col_double(),
## AlertTest_1_Refresh_1 = col_double(),
## AlertTest_2_Concentr_1 = col_double(),
## AlertTest_2_Refresh_1 = col_double(),
## AlertTest_3_Concentr_1 = col_double(),
## AlertTest_3_Refresh_1 = col_double(),
## AlertTest_4_Concentr_1 = col_double(),
## AlertTest_4_Refresh_1 = col_double(),
## Total_sleep = col_double(),
## Wake_amount = col_double(),
## NREM1_amount = col_double(),
## NREM2_amount = col_double(),
## SWS_amount = col_double(),
## REM_amount = col_double()
## # ... with 26 more columns
## )
## ℹ Use `spec()` for the full column specifications.
To reproduce Figure 5, two sets of values are needed. For the x-axis, the values for minutes in SWS x minutes in REM are needed, and for the y-axis, differential bias change is needed. The x-axis values have already been provided in the open data, under the variable SWSxREM.
For the y-axis, the paper defines differential bias change as “baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias”.
So the equation would look similar to this:
differential bias change = (baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued)
This equation must be applied to each participant's scores, so mutate() is the best function for the job. mutate() comes from the dplyr package and allows columns to be created, modified and deleted. It adds new variables while keeping the existing ones. The two bracketed parts of the equation can be turned into 2 new variables: cued_differential and uncued_differential. cued_differential is defined as baseIATcued - weekIATcued. Likewise, uncued_differential is defined as baseIATuncued - weekIATuncued. Thus, the equation to create the variable diff_bias_change can also be written as:
diff_bias_change = cued_differential - uncued_differential
However, before that equation can be coded in, several other steps need to occur first:
differential is created using <-. <- assigns a value (given on the right of the symbol) to a name (i.e. differential). Then, the data Plot5 is selected.
%>% is used to chain multiple steps into a single statement, without having to create and store intermediate variables. It does this by taking the output of one step and making it the input of the next. Here, the first line starts with all the data from Plot5; the pipe operator %>% takes that output and uses it as the input of the next line of code, and so forth.
select() is from the dplyr package. It selects variables within a data frame: the variables to be kept from the Plot5 data frame are listed within the brackets.
mutate() is used to create the three new variables detailed above. It adds them as three new columns to the selected data, which is stored in differential (Plot5 itself is not changed).
head() allows the newly calculated variables to be viewed.
differential <- Plot5 %>%
  select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, SWSxREM) %>%
  mutate(cued_differential = baseIATcued - weekIATcued,
         uncued_differential = baseIATuncued - weekIATuncued,
         diff_bias_change = cued_differential - uncued_differential)
head(differential)
## # A tibble: 6 x 9
## ParticipantID baseIATcued weekIATcued baseIATuncued weekIATuncued SWSxREM
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ub6 0.575 0.204 0.610 0.683 276
## 2 ub7 0.0991 0.459 0.644 -0.0107 0
## 3 ub8 0.206 0.399 1.52 0.712 408
## 4 ub9 0.353 0.923 0.131 0.202 408
## 5 ub11 0.572 -0.0187 0.0488 0.131 32
## 6 ub13 0.310 0.561 0.901 1.12 648
## # … with 3 more variables: cued_differential <dbl>, uncued_differential <dbl>,
## # diff_bias_change <dbl>
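As a quick sanity check, the formula can be applied by hand to the first participant shown above (ub6). This is only a sketch in base R: the inputs are the rounded values printed by head(), so the result is approximate, and the variable names are just for illustration.

```r
# Rounded values for participant ub6, copied from the head() output above
base_cued   <- 0.575
week_cued   <- 0.204
base_uncued <- 0.610
week_uncued <- 0.683

# The same three steps that mutate() applies to every row
cued_differential   <- base_cued - week_cued       # change in cued bias
uncued_differential <- base_uncued - week_uncued   # change in uncued bias
diff_bias_change    <- cued_differential - uncued_differential

round(diff_bias_change, 3)  # 0.444
```

The value in the diff_bias_change column for ub6 should be close to this, with small differences due to rounding of the printed inputs.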
Now that all the needed values have been gathered, it's time to plot the graph. The ggplot2 package (loaded as part of tidyverse) is used to graph the data.
ggplot() indicates that I want to create a ggplot object.
data = indicates what data is to be used, followed by the aes() function for the aesthetics/formatting of the graph.
aes() indicates which variables are to be used for the x- and y-axes.
geom_point() adds a layer of points, which makes the graph a scatterplot.
geom_smooth() adds a smoothed trend line (here intended as a regression line).
The overall output is quite weird: the graph contains confidence interval shading, the regression line is not straight, and the x-axis does not begin at the value 0.
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
To fix these issues, I made the following changes:
The help page for geom_smooth() shows that the confidence interval is set as se = TRUE by default. Thus, to get rid of the confidence interval shading, it must be set as se = FALSE.
The output message showed that geom_smooth() used method = 'loess' by default. loess is a smoothing method based on local fitting. method = can also be set to "lm" (linear model) or "glm" (generalised linear model). To make the regression line straight, method must be set to "lm" (linear model).
To set the limits of the x- and y-axes, xlim() and ylim() can be used, respectively. Within the brackets, two numeric values must be given, where the first (left) value specifies the lower limit and the second (right) value specifies the upper limit of each axis.
This looks almost correct, except the x- and y-axes are still not starting at the value 0.
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  xlim(0, 850) +
  ylim(-2, 1.5)
## `geom_smooth()` using formula 'y ~ x'
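To see what method = lm does behind the scenes: the straight line drawn by geom_smooth() is an ordinary least-squares fit of y on x (the formula 'y ~ x' in the message). The sketch below uses base R's lm() on a small made-up dataset, not the real data; the names toy, x and y are hypothetical.

```r
# Made-up data lying exactly on the line y = 1 - 0.002 * x (illustration only)
toy <- data.frame(x = c(0, 100, 200, 400, 800))
toy$y <- 1 - 0.002 * toy$x

# The same kind of model that geom_smooth(method = lm) fits: y ~ x
fit <- lm(y ~ x, data = toy)

# Intercept and slope of the fitted straight line
coef(fit)
```

With the real data, the equivalent call would presumably be lm(diff_bias_change ~ SWSxREM, data = differential), which gives the intercept and slope of the line in the figure.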
I realised that xlim() and ylim() are meant for simple manipulation of limits. For more in-depth control over the x- and y-axes, scale_x_discrete(), scale_x_continuous(), or scale_x_date() can be used. Since the data is continuous, I'll try replacing xlim() and ylim() with scale_x_continuous() and scale_y_continuous(). Within these functions' brackets, I need to specify some things:
limits defines the limits of the scale.
expand is set by default to allow for some padding/gap on each side of the data. To remove this padding, expand is set to c(0,0) for both the x- and y-axes.
c(...) in c(0,0) combines the arguments (i.e. the values within the brackets, 0 and 0) to form a vector.
However, now the figure is completely weird. It is only showing a dot at the centre of the figure.
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = 0, 850,
                     expand = c(0,0)) +
  scale_y_continuous(limits = -2, 1.5,
                     expand = c(0,0))
## `geom_smooth()` using formula 'y ~ x'
Looking back at my code, I realised that when I entered the limits for the x- and y-axes, I didn't wrap them in c(...), so they weren't passed as a vector. Instead, only the first value was used as the limit, and the second value was matched to the first unnamed argument, which is the axis name: the x-axis got a limit of 0 with an axis label of "850", and the y-axis got a limit of -2 with an axis label of "1.5". Wrapping the values in c(...) fixed the issue with the axis limits.
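The mix-up can be reproduced without ggplot2. The sketch below uses a hypothetical stand-in function, toy_scale(), that only mimics the argument order of scale_x_continuous() (name first, then breaks, with limits further along). R matches named arguments first and then fills the remaining ones by position, so a stray unnamed value ends up in name, which is the axis title.

```r
# Hypothetical stand-in that only mimics the argument order of scale_x_continuous()
toy_scale <- function(name = NULL, breaks = NULL, limits = NULL) {
  list(name = name, breaks = breaks, limits = limits)
}

# Without c(): limits receives only 0, and 850 is matched by position to `name`
wrong <- toy_scale(limits = 0, 850)

# With c(): both values are passed to limits as one vector
right <- toy_scale(limits = c(0, 850))

str(wrong)
str(right)
```

In wrong, name holds 850 and limits holds a single 0; in right, name stays empty and limits holds both values.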
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     expand = c(0,0))
## `geom_smooth()` using formula 'y ~ x'
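To give a sense of what expand = c(0,0) removes: by default, ggplot2 pads a continuous axis by 5% of its range on each side. The arithmetic below is a base R sketch of that default for the limits used here. It does not call ggplot2, and pad() is a hypothetical helper written only for this illustration.

```r
x_limits <- c(0, 850)
y_limits <- c(-2, 1.5)

# Default padding for continuous scales: 5% of the range on each side
pad <- function(limits, mult = 0.05) {
  limits + c(-1, 1) * mult * diff(limits)
}

pad(x_limits)  # -42.5 892.5
pad(y_limits)  # -2.175 1.675
```

This is why, without expand = c(0,0), the axes appear to start slightly below the lower limit rather than exactly at it.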
Now, to add aesthetics and formatting for the figure:
labs() adds titles, axis labels and legend labels to the figure.
title = specifies the text for the title, with the text included in "" following the =. The same goes for the x- (x =) and y-axes (y =).
I used subtitle = instead of title = here because the default title = font was too big for the whole title to be visible.
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     expand = c(0,0)) +
  labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change")
## `geom_smooth()` using formula 'y ~ x'
I wanted the x-axis values to match those in the original paper, i.e. only the values 0 and 500 are shown. To do this, I added breaks = in the scale_x_continuous() function and specified the values I wanted to see in c(...). This displays only the specified values on the x-axis. I also did this for the y-axis, so that the value 1.5 is shown and the axis is in intervals of 0.5.
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     breaks = c(0,500),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     breaks = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5),
                     expand = c(0,0)) +
  labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change")
## `geom_smooth()` using formula 'y ~ x'
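The y-axis breaks follow a regular 0.5 step, so they can also be generated rather than typed out. A small sketch using base R's seq(); either form can be passed to breaks =, and the variable names are only illustrative.

```r
# Breaks typed out by hand, as in the plot code above
manual_breaks <- c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5)

# The same values generated from the limits and the step size
generated_breaks <- seq(-2, 1.5, by = 0.5)

# all.equal() compares numerically, allowing for floating-point rounding
all.equal(manual_breaks, generated_breaks)  # TRUE
```

Generating the breaks this way makes it easier to change the step or the limits later without retyping every value.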
To make it more similar to the figure in the paper, I wanted to remove the grey background and the grid lines. To do this, I experimented with a few themes: theme_bw(), theme_light(), theme_minimal() and theme_classic(). In the end, the one that worked best was theme_classic().
ggplot(data = differential, aes(
  x = SWSxREM,
  y = diff_bias_change
)) +
  geom_point() +
  geom_smooth(se = FALSE,
              method = lm) +
  scale_x_continuous(limits = c(0, 850),
                     breaks = c(0,500),
                     expand = c(0,0)) +
  scale_y_continuous(limits = c(-2, 1.5),
                     breaks = c(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5),
                     expand = c(0,0)) +
  labs(subtitle = "Fig 5. No association between minutes in SWS x minutes in REM and differential bias change",
       x = "SWS x REM sleep duration (min)",
       y = "Differential bias change") +
  theme_classic()
## `geom_smooth()` using formula 'y ~ x'
Now, I've reproduced everything in the paper by myself except for Figure 3. If I have the time this coming week, I will attempt to reproduce Figure 3 on my own. However, my main focus for this coming week is to get started on my exploratory analyses.