This week's coding goals

Reproduce Figure 4 statistics and graph
Set up GitHub with RStudio

Our group managed to finish all three figures, which are linked here: Figure 4, Figure 5, and Figure 3 in Julia's learning log.

I wanted to try to reproduce Figure 4 on my own, to see how far I've come in my coding skills.

How did I go? / Challenges and successes

Goal 1: reproduce Figure 4

Load relevant packages

The library() function loads packages. We used the readspss package to read the original data file from OSF, and the tidyverse package for data wrangling. The psych package provides tools for personality, psychometric theory and experimental psychology.

library(readspss) 
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

Load data

The read_csv() function reads the cleaned dataset into a new object, Plot4, using the assignment operator <-.

Plot4 <- read_csv("cleandata.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   General_1_Age = col_double(),
##   General_1_EnglishYrs = col_double(),
##   General_1_CaffCups = col_double(),
##   General_1_CaffHrsAgo = col_double(),
##   General_1_UniYears = col_double(),
##   Epworth_total = col_double(),
##   AlertTest_1_Concentr_1 = col_double(),
##   AlertTest_1_Refresh_1 = col_double(),
##   AlertTest_2_Concentr_1 = col_double(),
##   AlertTest_2_Refresh_1 = col_double(),
##   AlertTest_3_Concentr_1 = col_double(),
##   AlertTest_3_Refresh_1 = col_double(),
##   AlertTest_4_Concentr_1 = col_double(),
##   AlertTest_4_Refresh_1 = col_double(),
##   Total_sleep = col_double(),
##   Wake_amount = col_double(),
##   NREM1_amount = col_double(),
##   NREM2_amount = col_double(),
##   SWS_amount = col_double(),
##   REM_amount = col_double()
##   # ... with 26 more columns
## )
## ℹ Use `spec()` for the full column specifications.

Create Figure 4 data tibble

We first need to calculate the change in implicit bias levels at the immediate and one-week delay tests. To do this, we use the means we calculated in Table 3. For example, pre_post_change_cued is calculated by subtracting the pre-nap cued mean from the post-nap cued mean. The first two variables (pre_post_change_cued, pre_post_change_uncued) are the pre- to post-nap changes for the cued and uncued conditions, respectively. The last two variables are the changes in implicit bias levels from pre-nap to one week later, for the cued and uncued conditions, respectively. In each line, the new variable is on the left side of the = and the subtraction of the two means is on the right side.

pre_post_change_cued = 0.31 - 0.21

pre_post_change_uncued = 0.25 - 0.3

pre_week_change_cued = 0.40 - 0.21

pre_week_change_uncued = 0.40 - 0.30
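As a minimal sketch of the same arithmetic, the six means quoted above could be stored once in a named vector and the four change scores computed from it, which avoids retyping numbers. The names iat_means and changes are made up for this illustration; only the six values come from the chunk above.

```r
# Means as used in the subtraction chunk above (pre-nap, post-nap, one week later).
# `iat_means` and `changes` are hypothetical names used only for this sketch.
iat_means <- c(
  pre_cued = 0.21, post_cued = 0.31, week_cued = 0.40,
  pre_uncued = 0.30, post_uncued = 0.25, week_uncued = 0.40
)

# Each change score is the later test minus the pre-nap baseline
changes <- c(
  pre_post_change_cued   = iat_means[["post_cued"]]   - iat_means[["pre_cued"]],
  pre_post_change_uncued = iat_means[["post_uncued"]] - iat_means[["pre_uncued"]],
  pre_week_change_cued   = iat_means[["week_cued"]]   - iat_means[["pre_cued"]],
  pre_week_change_uncued = iat_means[["week_uncued"]] - iat_means[["pre_uncued"]]
)

# Round to two decimals to hide floating-point noise
round(changes, 2)
```

The rounded values should match the four numbers entered by hand in the rest of this log (0.10, -0.05, 0.19, 0.10).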

We create a new tibble, fig4, using the tibble() function. It has three columns: "change_from_pre_to", "cued" and "uncued". "change_from_pre_to" holds the labels for the immediate and one-week tests, while the cued and uncued columns hold the change values calculated in the chunk above, organised by condition. print() is used to check that the tibble has been formatted correctly.

fig4 <- tibble(
  change_from_pre_to = c("immediate","week"),
  cued = c(0.1, 0.19),
  uncued = c(-0.05, 0.1) 
)

print(fig4)

## # A tibble: 2 x 3
##   change_from_pre_to  cued uncued
##   <chr>              <dbl>  <dbl>
## 1 immediate           0.1   -0.05
## 2 week                0.19   0.1

Creating time1 data

When creating a dataset for plotting, you decide which “variables” go into each column. These columns are what later get mapped to the axes of the graph.

We realised that the data might need to be formatted in a different way. Thus, the dataframe below (time1) uses the same values as above, but is formatted differently.

time1 holds the two time points, “immediate” (pre- to post-nap change) and “week” (change from pre-nap to one week later). rep() repeats the values inside the brackets. The 2 means each time point appears in two rows, one for the cued condition and one for the uncued condition.

bias_change is where the relevant values that were calculated previously are entered.

data = data.frame(...) combines the vectors listed in the brackets into a dataframe named data, with one column per vector.

head() displays the first rows of the data, which here is the whole dataframe.

time1 <- c(rep("immediate",2),rep("week",2))
condition <-rep(c("cued","uncued"),2)
bias_change <- c(0.10, -0.05, 0.19, 0.10)
data = data.frame(time1, condition, bias_change)

head(data)

##       time1 condition bias_change
## 1 immediate      cued        0.10
## 2 immediate    uncued       -0.05
## 3      week      cued        0.19
## 4      week    uncued        0.10
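The reshaping done by hand above (from one column per condition to one row per time point and condition) can also be sketched with pivot_longer() from tidyr, which is part of the tidyverse. This is only an alternative illustration, not what our group did; fig4_long is a made-up name, and the time column keeps its fig4 name (change_from_pre_to) rather than time1.

```r
library(tibble)
library(tidyr)

# Same wide tibble as fig4 above
fig4 <- tibble(
  change_from_pre_to = c("immediate", "week"),
  cued = c(0.10, 0.19),
  uncued = c(-0.05, 0.10)
)

# Stack the cued and uncued columns into one value column,
# with a second column recording which condition each value came from
fig4_long <- pivot_longer(
  fig4,
  cols = c(cued, uncued),
  names_to = "condition",
  values_to = "bias_change"
)

fig4_long
```

The result should have four rows in the same order as the data dataframe above: immediate/cued, immediate/uncued, week/cued, week/uncued.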

Plotting the graph

Now that we've formatted the data properly, it's time to plot the graph. We used the ggplot2 package to graph our data:

- ggplot() indicates that we want to graph our data. In the brackets we name the data we are using, followed by the aes() function, which maps variables to parts of the graph.
- Inside aes(), x = and y = set which variables go on the x and y axes, taken from the data we provided.
- fill = indicates that a different colour is allocated to each condition.
- geom_bar() adds the bars to the graph.
- position = "dodge" ensures that the separate conditions are not stacked but are instead side by side.
- stat = "identity" needs to be included because, by default, geom_bar() counts the observations in each group and uses that count as the bar height, which is incompatible with supplying a y aesthetic. Adding stat = "identity" tells R that we want the bar heights to be the values we provide rather than the default (number of observations).
- alpha determines the opacity of a geom, with lower values indicating more transparency.

ggplot(data = data, aes(
  x = time1,
  y = bias_change,
  fill = condition
)) +
  geom_bar(
    position = "dodge", 
    stat = "identity", 
    alpha=0.7)
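ggplot2 also provides geom_col(), which uses the y values as bar heights directly, so stat = "identity" is not needed. The sketch below rebuilds the same data and should draw the same bars as the chunk above; the object name p is just for illustration.

```r
library(ggplot2)

# Same values as the `data` dataframe above
data <- data.frame(
  time1 = rep(c("immediate", "week"), each = 2),
  condition = rep(c("cued", "uncued"), times = 2),
  bias_change = c(0.10, -0.05, 0.19, 0.10)
)

# geom_col() takes bar heights straight from the y aesthetic
p <- ggplot(data, aes(x = time1, y = bias_change, fill = condition)) +
  geom_col(position = "dodge", alpha = 0.7)

p
```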

Calculations for the error bar

Now that the graph has been created, we need to create the error bars. In this case, the error bars represent one standard error for each change-in-bias group. Thus, we need to calculate the standard error.

The plotrix package has a built-in function for calculating standard error, so the first step is to install and load the package. install.packages() installs the package named inside the quotation marks. It only needs to be run once, from the console, because calling it while knitting fails with an error when no CRAN mirror is set. library() loads the installed package into your R session.

# install.packages("plotrix")  # run once in the console, not when knitting

library(plotrix)

## 
## Attaching package: 'plotrix'

## The following object is masked from 'package:psych':
## 
##     rescale

I first tried creating a new variable, fig4_stderror, using the std.error() function from the plotrix package. Here std.error() calculates the standard error of the vector bias_change, data.frame() converts the result into a dataframe named fig4_stderror, and head() displays it. However, it only came up with one standard error value, because the four change scores are treated as a single sample rather than giving one standard error per condition.

fig4_stderror <- std.error(bias_change)

fig4_stderror = data.frame(fig4_stderror)

head(fig4_stderror)

##   fig4_stderror
## 1    0.04974937
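To see where that single number comes from, the standard error of a vector can be worked out by hand in base R as the standard deviation divided by the square root of the number of values. This is a minimal sketch; se_manual is a made-up name.

```r
# The four change scores used throughout this log
bias_change <- c(0.10, -0.05, 0.19, 0.10)

# Standard error = sd / sqrt(n), treating the four values as one sample
se_manual <- sd(bias_change) / sqrt(length(bias_change))

se_manual
```

This should give about 0.0497, the same value as above, which confirms that the four means were pooled into one sample instead of each condition getting its own standard error.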

Thus, I tried doing it Julia's way: I created a new variable, biaschangeconditions, using select() to pick the variables I wanted from the cleaned dataset (loaded earlier as Plot4). Then, using mutate() on the variables I had just selected, I created four new variables: the differences between pre-nap and post-nap implicit bias, and the differences between pre-nap and one-week-delay implicit bias, each calculated for both the cued and uncued conditions.

biaschangeconditions <- Plot4 %>%
  select(postIATcued, preIATcued, postIATuncued, preIATuncued, weekIATcued, weekIATuncued)

biaschange <- biaschangeconditions %>%
  mutate(immed_cued = postIATcued - preIATcued,
         immed_uncued = postIATuncued - preIATuncued,
         week_cued = weekIATcued - preIATcued,
         week_uncued = weekIATuncued - preIATuncued)

Now, I have to find the mean of each of these variables, and then the standard error of each. To find the means, I used summarise() from the dplyr package. contains() selects only the variables that have "cued" in their name (which also matches "uncued"), across() applies the same calculation to every selected column, and list(mean = mean) names the function to apply, so each result gets a _mean suffix. head() displays the means.

biaschangemean <- biaschange %>% 
  summarise(across(contains("cued"), list(mean = mean)))

head(biaschangemean)

Now, I need to find the standard error of each change score. To do this, I used the std.error() function from the plotrix package to create a new variable, biaschangeerror. Given a dataframe, std.error() calculates the standard error of each column of biaschange. print() displays all of the values, including those for the four change scores at the end.

biaschangeerror <- std.error(biaschange)

print(biaschangeerror)
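The idea of one mean and one standard error per column can be sketched in base R without the real dataset. Everything in this example is made up for illustration: the scores are simulated with rnorm(), the sample size of 30 is arbitrary, and the names sim, se and summary_table are hypothetical. The numbers it prints are not the study's results.

```r
set.seed(1)

# Simulated (NOT real) change scores, one column per condition and time point
sim <- data.frame(
  immed_cued   = rnorm(30, mean = 0.10, sd = 0.5),
  immed_uncued = rnorm(30, mean = -0.05, sd = 0.5),
  week_cued    = rnorm(30, mean = 0.19, sd = 0.5),
  week_uncued  = rnorm(30, mean = 0.10, sd = 0.5)
)

# Standard error of one column: sd / sqrt(number of non-missing values)
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Apply mean and se to every column, giving one row per condition
summary_table <- data.frame(
  mean = sapply(sim, mean, na.rm = TRUE),
  stderror = sapply(sim, se)
)

summary_table
```

Because each column is handled separately, this gives four standard errors, one per bar in the graph, rather than the single pooled value from the first attempt.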

Now, I need to update the dataframe with the calculated standard error values. As before, time1 holds the two time points, “immediate” (pre- to post-nap change) and “week” (change from pre-nap to one week later). rep() repeats the values inside the brackets. The 2 means each time point appears in two rows, one for the cued condition and one for the uncued condition.

bias_change is where the change values that were calculated previously are entered, and stderror holds the standard error of each change score, in the same order.

data = data.frame(...) combines the vectors listed in the brackets into a dataframe named data, with one column per vector.

head() displays the data.

time1 <- c(rep("immediate",2),rep("week",2)) #the two groups of columns
condition <-rep(c("cued","uncued"),2) 
bias_change <- c(0.10, -0.05,0.19, 0.10) #calculated differences, needs to be in order of the graph
stderror <- c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
data = data.frame(time1, condition, bias_change, stderror)

head(data)

##       time1 condition bias_change   stderror
## 1 immediate      cued        0.10 0.09759788
## 2 immediate    uncued       -0.05 0.10297893
## 3      week      cued        0.19 0.11593440
## 4      week    uncued        0.10 0.09008655

Now, I plot the graph again with the standard errors added and some new aesthetics/formatting. The extra functions I used are:

- geom_errorbar() from ggplot2 adds the error bars, defined by x, ymin and ymax. ymin and ymax are calculated as the mean (variable: bias_change) minus or plus the standard error.
- labs() from ggplot2 sets the labels of the graph. x = "" ensures there is no label on the x-axis, while y = specifies a label for the y-axis and caption = adds a caption.

ggplot(data = data, aes(
  x = time1,
  y = bias_change,
  fill = condition
)) +
  geom_bar(position = "dodge",stat = "identity") +
  geom_errorbar(aes(
    x= time1, 
    ymin=bias_change-stderror, 
    ymax=bias_change+stderror), 
    width=0.4, 
    colour="grey", 
    alpha= 0.9, 
    position = position_dodge(0.9)) +
  ylim(-0.2, 0.4) +
  labs(x = "", 
       y = "Bias Change", 
       caption = "Fig 4. Change in implicit bias levels at the immediate and one-week delay tests.")
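As a quick check on the error bars, the lower and upper ends can be computed directly from the same numbers before plotting. This is a small sketch in base R; the column names ymin and ymax and the object inside_limits are added only for this illustration.

```r
# Same values as the updated `data` dataframe above
data <- data.frame(
  time1 = rep(c("immediate", "week"), each = 2),
  condition = rep(c("cued", "uncued"), times = 2),
  bias_change = c(0.10, -0.05, 0.19, 0.10),
  stderror = c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
)

# Ends of each error bar: mean minus / plus one standard error
data$ymin <- data$bias_change - data$stderror
data$ymax <- data$bias_change + data$stderror

data

# ylim() drops anything outside its range, so check every bar end fits inside it
inside_limits <- all(data$ymin >= -0.2 & data$ymax <= 0.4)
inside_limits
```

With these values the check should return TRUE, so the ylim(-0.2, 0.4) used in the plot does not cut off any of the error bars.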

Goal 2: Set up GitHub with RStudio

We had a Zoom meeting with Jenny this week to get GitHub set up in RStudio. With her help, we were able to troubleshoot all our issues and all got it working (thank you~)! We have set up a repository that we are updating with our code scripts.

Next steps

Our group has finished reproducing all our tables and plots, so our next step is putting together our presentation for Week 8. My personal coding goal for next week is to try to reproduce one of the other figures (either Figure 3 or 5) by myself, to see if I fully understand the coding process.

Learning Log 5

Jade Gurtala

10/07/2021
