Our group managed to finish all three figures, which are linked here: Figure 4, Figure 5, and Figure 3 in Julia's learning log.
I wanted to try to reproduce Figure 4 on my own, to see how far I've come in my coding skills.
Load relevant packages
The library() function loads packages. We used the readspss package to read the original data file from OSF, and the tidyverse package for data wrangling. The psych package provides tools for personality, psychometric theory and experimental psychology.
library(readspss)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
Load data
The read_csv() function reads the dataset, which is assigned to a new object, Plot4, using <-.
Plot4 <- read_csv("cleandata.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## General_1_Age = col_double(),
## General_1_EnglishYrs = col_double(),
## General_1_CaffCups = col_double(),
## General_1_CaffHrsAgo = col_double(),
## General_1_UniYears = col_double(),
## Epworth_total = col_double(),
## AlertTest_1_Concentr_1 = col_double(),
## AlertTest_1_Refresh_1 = col_double(),
## AlertTest_2_Concentr_1 = col_double(),
## AlertTest_2_Refresh_1 = col_double(),
## AlertTest_3_Concentr_1 = col_double(),
## AlertTest_3_Refresh_1 = col_double(),
## AlertTest_4_Concentr_1 = col_double(),
## AlertTest_4_Refresh_1 = col_double(),
## Total_sleep = col_double(),
## Wake_amount = col_double(),
## NREM1_amount = col_double(),
## NREM2_amount = col_double(),
## SWS_amount = col_double(),
## REM_amount = col_double()
## # ... with 26 more columns
## )
## ℹ Use `spec()` for the full column specifications.
Create Figure 4 data tibble
We first need to calculate the change in implicit bias levels at the immediate and one-week delay tests. To do this, we use the means we calculated in Table 3. For example, pre_post_change_cued is the post-nap cued mean minus the pre-nap cued mean. The first two variables (pre_post_change_cued, pre_post_change_uncued) are the pre- to post-nap changes for the cued and uncued conditions, respectively. The last two variables (pre_week_change_cued, pre_week_change_uncued) are the changes in implicit bias levels from pre-nap to one week later, for the cued and uncued conditions, respectively. In each line, the new variable's name is on the left of the = and the subtraction of the two means is on the right.
pre_post_change_cued = 0.31 - 0.21
pre_post_change_uncued = 0.25 - 0.3
pre_week_change_cued = 0.40 - 0.21
pre_week_change_uncued = 0.40 - 0.30
We create a new data frame, fig4, using the tibble() function. Three columns are created: "change_from_pre_to", "cued" and "uncued". "change_from_pre_to" holds the labels for the two time points (immediate and week), while the values in the cued and uncued columns are the results of the chunk above. In other words, fig4 organises the calculated values into the cued and uncued conditions, for the immediate (pre- to post-nap) changes and the one-week-delay changes. print() is used to check that the tibble has been formatted correctly.
fig4 <- tibble(
change_from_pre_to = c("immediate","week"),
cued = c(0.1, 0.19),
uncued = c(-0.05, 0.1)
)
print(fig4)
## # A tibble: 2 x 3
## change_from_pre_to cued uncued
## <chr> <dbl> <dbl>
## 1 immediate 0.1 -0.05
## 2 week 0.19 0.1
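As a hedged alternative to typing the differences in by hand, the same tibble could be built directly from the Table 3 means. This is only a sketch: the names pre_mean, post_mean, week_mean and fig4_sketch are made up for illustration, and the values are the means used in the subtractions above.

```r
library(tibble)

# Means from Table 3, in the order cued, uncued
# (pre_mean, post_mean and week_mean are illustrative names)
pre_mean  <- c(cued = 0.21, uncued = 0.30)
post_mean <- c(cued = 0.31, uncued = 0.25)
week_mean <- c(cued = 0.40, uncued = 0.40)

# Change from pre-nap; round() tidies up floating-point noise
immediate_change <- round(post_mean - pre_mean, 2)
week_change      <- round(week_mean - pre_mean, 2)

fig4_sketch <- tibble(
  change_from_pre_to = c("immediate", "week"),
  cued   = unname(c(immediate_change["cued"], week_change["cued"])),
  uncued = unname(c(immediate_change["uncued"], week_change["uncued"]))
)
print(fig4_sketch)
```

Doing it this way means that if a mean changes, the differences update on their own instead of needing to be retyped.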
Creating time1 data
When creating a data set for plotting, each variable goes into its own column. These columns are what we later assign to the axes and colours of the graph.
We realised that the data might need to be formatted in a different way. Thus, the data frame below uses the same values as above, but with one row per bar rather than one column per condition.
time1 holds the two time points, “immediate” (pre- to post-nap change) and “week” (change from pre-nap to one week later). rep() repeats the value in the brackets. The value 2 indicates how many times each time point is repeated, i.e. twice (once for the cued and once for the uncued condition).
condition repeats the pair "cued", "uncued" twice, once for each time point.
bias_change is where the relevant values that were calculated previously are entered.
data = data.frame(...) combines the three vectors in the brackets into a data frame named data.
head() allows the first rows of the data to be viewed.
time1 <- c(rep("immediate",2),rep("week",2))
condition <-rep(c("cued","uncued"),2)
bias_change <- c(0.10, -0.05, 0.19, 0.10)
data = data.frame(time1, condition, bias_change)
head(data)
## time1 condition bias_change
## 1 immediate cued 0.10
## 2 immediate uncued -0.05
## 3 week cued 0.19
## 4 week uncued 0.10
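The reformatting above can also be done automatically. The sketch below is one possible way, not what we did in our group: it assumes the tidyr package (part of the tidyverse) and uses pivot_longer() to turn the wide fig4 tibble into the one-row-per-bar layout. The name fig4_long is made up for illustration, and the time column keeps its original name change_from_pre_to rather than time1.

```r
library(tibble)
library(tidyr)

fig4 <- tibble(
  change_from_pre_to = c("immediate", "week"),
  cued   = c(0.10, 0.19),
  uncued = c(-0.05, 0.10)
)

# Stack the cued and uncued columns into one column of values,
# with a second column recording which condition each value came from
fig4_long <- pivot_longer(
  fig4,
  cols      = c(cued, uncued),
  names_to  = "condition",
  values_to = "bias_change"
)
print(fig4_long)
```

The result has four rows, in the same order as the hand-built data frame (immediate cued, immediate uncued, week cued, week uncued).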
Plotting the graph
Now that we've formatted the data properly, it's time to plot the graph. We used the ggplot2 package to graph our data.
- ggplot() indicates that we want to graph our data.
- In the brackets we indicate which data we are using, followed by the aes() function, which maps variables to the aesthetics of the graph.
- The next lines indicate which variables we want on the x and y axes, taken from the data we provided.
- fill = indicates that a different colour is to be allocated to each condition.
- geom_bar() adds the bars to the graph.
- position = "dodge" ensures that the separate conditions are not stacked but are instead placed side by side.
- stat = "identity" is needed because, by default, geom_bar() counts the number of observations in each group and uses that count as the bar height, which is incompatible with supplying a y aesthetic. Adding stat = "identity" tells R that we want the bar heights to be the values we provide, rather than the default (number of observations).
- alpha determines the opacity of a geom, with lower values indicating more transparency.
ggplot(data = data, aes(
x = time1,
y = bias_change,
fill = condition
)) +
geom_bar(
position = "dodge",
stat = "identity",
alpha=0.7)
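For comparison, ggplot2 also provides geom_col(), which uses the y values as bar heights without needing stat = "identity". The sketch below is an alternative I could have used rather than what our group did; it rebuilds the same data frame so that it runs on its own, and the name p is just for illustration.

```r
library(ggplot2)

data <- data.frame(
  time1       = c("immediate", "immediate", "week", "week"),
  condition   = c("cued", "uncued", "cued", "uncued"),
  bias_change = c(0.10, -0.05, 0.19, 0.10)
)

# geom_col() takes bar heights straight from y, so stat = "identity" is not needed
p <- ggplot(data, aes(x = time1, y = bias_change, fill = condition)) +
  geom_col(position = "dodge", alpha = 0.7)
print(p)
```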
Calculations for the error bar
Now that the graph has been created, we need to create the error bars. In this case, the error bars represent one standard error for each change-in-bias group. Thus, we need to calculate the standard error.
The plotrix package has a built-in function for calculating standard error, so the first step is to install and load the package. install.packages() installs the package named within the quotation marks and brackets. library() loads the installed package into your R session.
install.packages("plotrix", repos = "https://cloud.r-project.org")
library(plotrix)
##
## Attaching package: 'plotrix'
## The following object is masked from 'package:psych':
##
## rescale
I first tried creating a new variable fig4_stderror using the std.error() function from the plotrix package. std.error() calculates the standard error of the vector bias_change. fig4_stderror = data.frame(...) turns the result into a data frame named fig4_stderror. head() allows the data frame to be viewed. However, it only came up with one standard error value, because bias_change holds just the four means, and std.error() treats them as a single sample.
fig4_stderror <- std.error(bias_change)
fig4_stderror = data.frame(fig4_stderror)
head(fig4_stderror)
## fig4_stderror
## 1 0.04974937
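To see where that single number comes from, here is a small sketch of the same calculation done by hand in base R, assuming the usual formula (standard deviation divided by the square root of the number of values). The name se_by_hand is made up for illustration.

```r
bias_change <- c(0.10, -0.05, 0.19, 0.10)

# Standard error = standard deviation / square root of the number of values
se_by_hand <- sd(bias_change) / sqrt(length(bias_change))
print(se_by_hand)
```

This gives the same value as above, which confirms that it is the standard error across the four means, not a separate standard error for each bar.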
Thus, I tried doing it Julia's way: I created a new variable biaschangeconditions, using select() to pick the variables I wanted from the cleandata dataset (loaded above as Plot4). Then, using mutate() on the variables I had just selected, I created four new variables: the differences between pre-nap and post-nap implicit bias, and between pre-nap and one-week-delay implicit bias, each calculated for the cued and uncued conditions.
biaschangeconditions <- Plot4 %>%
select(postIATcued, preIATcued, postIATuncued, preIATuncued, weekIATcued, weekIATuncued)
biaschange <- biaschangeconditions %>%
mutate(immed_cued = postIATcued - preIATcued,
immed_uncued = postIATuncued - preIATuncued,
week_cued = weekIATcued - preIATcued,
week_uncued = weekIATuncued - preIATuncued)
Now, I have to find the mean of each of these variables, and then the standard error of each mean. To find the means, I used summarise() from the dplyr package. contains() picks out only the variables that have "cued" in their name (which also matches "uncued"), across() applies the calculation to every variable picked out, and list(mean = mean) names the function to apply. head() allows the means to be viewed.
biaschangemean <- biaschange %>%
summarise(across(contains("cued"), list(mean = mean)))
head(biaschangemean)
Now, I need to find the standard error of each mean. To do this, I used the std.error() function from the plotrix package to create a new variable biaschangeerror. Given a data frame, std.error() calculates the standard error of each column of biaschange. head() allows the result to be viewed.
biaschangeerror <- std.error(biaschange)
head(biaschangeerror)
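The idea of getting one mean and one standard error per column can be sketched in base R on a tiny made-up data set. Everything here is hypothetical: toy_change and its numbers are invented purely to show the calculation, and se() is a helper written for this example, not a function from plotrix.

```r
# Hypothetical change scores for three participants (made-up numbers)
toy_change <- data.frame(
  immed_cued   = c(0.2, 0.0, 0.1),
  immed_uncued = c(-0.1, 0.0, -0.2)
)

# Standard error of one column: sd / sqrt(n), ignoring missing values
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

# Apply mean() and se() to every column
col_means <- sapply(toy_change, mean, na.rm = TRUE)
col_se    <- sapply(toy_change, se)
print(rbind(mean = col_means, se = col_se))
```

Each column gets its own standard error, which is what the error bars need: one value per bar.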
Now, I need to update the data frame with the calculated standard error values. As before, time1 holds the two time points, “immediate” (pre- to post-nap change) and “week” (change from pre-nap to one week later), and rep() repeats each label twice, once for the cued and once for the uncued condition.
bias_change is where the relevant values that were calculated previously are entered, and stderror holds the standard error for each of them, in the same order.
data = data.frame(...) combines the four vectors in the brackets into a data frame named data.
head() allows the data to be viewed.
time1 <- c(rep("immediate",2),rep("week",2)) #the two groups of columns
condition <-rep(c("cued","uncued"),2)
bias_change <- c(0.10, -0.05,0.19, 0.10) #calculated differences, needs to be in order of the graph
stderror <- c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
data = data.frame(time1, condition, bias_change, stderror)
head(data)
## time1 condition bias_change stderror
## 1 immediate cued 0.10 0.09759788
## 2 immediate uncued -0.05 0.10297893
## 3 week cued 0.19 0.11593440
## 4 week uncued 0.10 0.09008655
Now, I plot the graph again with the standard errors added and some new formatting. The extra functions I used in this are:
- geom_errorbar() from ggplot2 adds the error bars, defined by x, ymin and ymax. ymin and ymax are calculated as the mean (variable: bias_change) minus or plus the standard error.
- ylim() from ggplot2 sets the range of the y-axis.
- labs() from ggplot2 allows the labels to be modified. x = "" ensures there is no label on the x-axis, while a label is specified for the y-axis and a caption is added as well.
ggplot(data = data, aes(
x = time1,
y = bias_change,
fill = condition
)) +
geom_bar(position = "dodge",stat = "identity") +
geom_errorbar(aes(
x= time1,
ymin=bias_change-stderror,
ymax=bias_change+stderror),
width=0.4,
colour="grey",
alpha= 0.9,
position = position_dodge(0.9)) +
ylim(-0.2, 0.4) +
labs(x = "",
y = "Bias Change",
caption = "Fig 4. Change in implicit bias levels at the immediate and one-week delay tests.")
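One thing worth checking with ylim() is that every error bar fits inside the chosen range, since ggplot2 drops anything that falls outside the limits. The sketch below is a quick base R check under that assumption; the column names ymin and ymax and the variable fits are made up for illustration.

```r
data <- data.frame(
  time1       = c("immediate", "immediate", "week", "week"),
  condition   = c("cued", "uncued", "cued", "uncued"),
  bias_change = c(0.10, -0.05, 0.19, 0.10),
  stderror    = c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
)

# Lower and upper ends of each error bar
data$ymin <- data$bias_change - data$stderror
data$ymax <- data$bias_change + data$stderror

# Check that every bar end sits inside the limits used in ylim(-0.2, 0.4)
fits <- all(data$ymin >= -0.2 & data$ymax <= 0.4)
print(data[, c("time1", "condition", "ymin", "ymax")])
print(fits)
```

Here fits comes out TRUE, so none of the four error bars is cut off by the axis limits.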
We had a Zoom meeting with Jenny this week to get GitHub set up with RStudio. With her help, we were able to troubleshoot all our issues and were all able to get it working (thank you~)! We have set up a repository that we are updating with our code scripts.
Our group has finished reproducing all our tables and plots, so our next step is putting together our presentation for Week 8. My personal coding goal for next week is to try to reproduce one of the other figures (either Figure 3 or 5) by myself, to see if I fully understand the coding process.