This week’s goals
As some of my group members had begun working on figure 4 last week, I wanted to work on figure 4 independently to address the limitations of their version, and to produce more condensed code if possible, before our weekly group meeting
Attend weekly group meeting to discuss our progress on figures 3 to 5
Begin brainstorming points to say and include in our group presentation for week 8, as well as begin splitting up work between everyone
Learn how to use GitHub to upload all our tables and graphs once they are all reproduced, so everything can be shared within the group
Figure 4
The goal:
Before I start I want to explain why I wanted to work on this figure when it seems like my group members had everything figured out:
In Michelle’s last learning log, Jenny gave us some helpful hints as to why the error bars in her final reproduced graph didn’t align with the paper’s, i.e. the authors probably used the standard error of the mean rather than the standard error (which Michelle calculated). I want to see if I can find a way to calculate the standard error of the mean and hence fully reproduce the figure.
I looked at my teammates’ code, and it seems like they used the average values of immediate and week bias (for cued and uncued conditions) which were calculated in table 3. I thought that this probably isn’t the most efficient way to calculate the standard error, or the standard error of the mean, so I wanted to find a way to use every single value of immediate and week bias (for both cued and uncued conditions) to produce the average bias change values in the final graph.
Preliminaries
load packages
On top of using the same packages as usual, I’m also using the package ‘plotrix’ this time. I did a quick google search to figure out how to find the standard error of the mean. It turns out that I could first find the standard deviation with the sd function and then use the formula for the standard error of the mean (the standard deviation divided by the square root of the sample size) to calculate it. But this seemed a bit tedious, so I chose to use the std.error function from ‘plotrix’, as it calculates the standard error of the mean in one line of code.
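For comparison, here is a minimal sketch of the "by hand" formula in base R. The helper name sem and the example numbers are made up for illustration (they are not from the study), and as far as I understand this is the same quantity that std.error reports for a single column.

```r
# Standard error of the mean: sample standard deviation divided by sqrt(n).
# `sem` is a hypothetical helper name, not a function from any package.
sem <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]  # drop missing values before counting n
  sd(x) / sqrt(length(x))
}

# Made-up example values, not data from the study
scores <- c(2, 4, 4, 4, 5, 5, 7, 9)
sem(scores)
```

Writing it out like this mainly shows why the one-line std.error call is convenient: the formula is simple, but it has to be repeated (or wrapped in a helper) for every column.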
library(tidyverse) #for data wrangling and visualisation
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(gt) #for creating a table
library(janitor) #for cleaning names and other possibly handy functions
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(plotrix) #for calculating standard error of the mean
read in the data
As usual I am reading in the clean data file I saved from when creating table 1. Just to recap, this file excludes all data from excluded participants. There are more variables than I need to reproduce plot 2, so I will need to do some data wrangling.
cleandata <- read_csv("cleandata.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character(),
## Age = col_double(),
## General_1_EnglishYrs = col_double(),
## General_1_CaffCups = col_double(),
## General_1_CaffHrsAgo = col_double(),
## General_1_UniYears = col_double(),
## EES = col_double(),
## AlertTest_1_Concentr_1 = col_double(),
## AlertTest_1_Refresh_1 = col_double(),
## AlertTest_2_Concentr_1 = col_double(),
## AlertTest_2_Refresh_1 = col_double(),
## AlertTest_3_Concentr_1 = col_double(),
## AlertTest_3_Refresh_1 = col_double(),
## AlertTest_4_Concentr_1 = col_double(),
## AlertTest_4_Refresh_1 = col_double(),
## Total_sleep = col_double(),
## Wake_amount = col_double(),
## NREM1_amount = col_double(),
## NREM2_amount = col_double(),
## SWS_amount = col_double(),
## REM_amount = col_double()
## # ... with 26 more columns
## )
## i Use `spec()` for the full column specifications.
Data wrangling
obtaining relevant variables
Because this plot looks like it uses the same values as table 3, I am going to use the same code as for table 3 to select the relevant variables for this graph. Everything is the same, except that I leave out the base variables:
biaslevelsbycondition <- cleandata %>%
select(preIATcued, postIATcued, weekIATcued, preIATuncued, postIATuncued, weekIATuncued)
print(biaslevelsbycondition)
## # A tibble: 31 x 6
## preIATcued postIATcued weekIATcued preIATuncued postIATuncued weekIATuncued
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.559 0.682 0.204 0.215 0.467 0.683
## 2 -0.134 0.0446 0.459 0.340 -0.0569 -0.0107
## 3 0.511 -0.00258 0.399 0.380 0.682 0.712
## 4 -0.0293 -0.246 0.923 -0.942 0.950 0.202
## 5 0.305 0.442 -0.0187 -0.241 -0.292 0.131
## 6 0.148 0.989 0.561 0.247 0.258 1.12
## 7 0.0338 0.684 -0.0686 0.211 0.927 -0.277
## 8 1.10 0.737 0.254 0.127 0.334 0.0310
## 9 0.271 -0.563 0.778 0.338 -0.691 -0.319
## 10 0.728 0.463 0.0808 0.519 -0.265 0.0304
## # ... with 21 more rows
creating new variables
Now I’m creating a new data frame called biaschange that only includes the data that will be used later. I’m using mutate to create new variables from the authors’ original measurements. These are the bias change variables: each one is a later IAT score (post or week) minus the pre score, as seen in the code below.
After I’ve created my new variables, I’m using select to keep only these new variables, as they’re the ones that will be used in my graph.
biaschange <- biaslevelsbycondition %>%
mutate(immed_cued = postIATcued - preIATcued,
immed_uncued = postIATuncued - preIATuncued,
week_cued = weekIATcued - preIATcued,
week_uncued = weekIATuncued - preIATuncued) %>%
select(immed_cued, immed_uncued, week_cued, week_uncued)
print(biaschange)
## # A tibble: 31 x 4
## immed_cued immed_uncued week_cued week_uncued
## <dbl> <dbl> <dbl> <dbl>
## 1 0.123 0.253 -0.355 0.468
## 2 0.178 -0.397 0.593 -0.351
## 3 -0.513 0.303 -0.112 0.332
## 4 -0.217 1.89 0.953 1.14
## 5 0.137 -0.0513 -0.323 0.371
## 6 0.842 0.0111 0.413 0.869
## 7 0.650 0.716 -0.102 -0.488
## 8 -0.359 0.206 -0.842 -0.0963
## 9 -0.835 -1.03 0.507 -0.657
## 10 -0.265 -0.784 -0.647 -0.488
## # ... with 21 more rows
obtaining mean and standard error of the mean values
Great, now that I have these four columns, which correspond to the four bars in the original graph, I can find the mean and the standard error of the mean for each of those columns.
I’m creating a new variable called biaschangemean which holds the mean bias change for each of the variables in the data frame above. Using summarise in conjunction with across and contains neatly finds the mean of each of those columns without typing them all out, by matching a phrase common to all the column names (“cued”).
Then I’m creating a variable called biaschangeerror to hold the standard error of the mean for each of the above 4 columns, using the handy new function from the package plotrix. I simply call std.error and put the data frame whose columns I want the standard error of the mean for in the brackets.
biaschangemean <- biaschange %>%
summarise(across(contains("cued"), list(mean = mean))) #finding the mean of each variable
print(biaschangemean)
## # A tibble: 1 x 4
## immed_cued_mean immed_uncued_mean week_cued_mean week_uncued_mean
## <dbl> <dbl> <dbl> <dbl>
## 1 0.0959 -0.0539 0.189 0.0964
biaschangeerror <- std.error(biaschange) #finding the standard error of each variable
print(biaschangeerror)
## immed_cued immed_uncued week_cued week_uncued
## 0.09759788 0.10297893 0.11593440 0.09008655
Tada, now I have all the relevant values to create the data frame which the graph will be based upon.
The data.frame function essentially creates the structure of the table, where I list what each column will be called.
Each column name written before a <- is a vector in which I list the time, condition, bias change and standard error values found above, in matching order so each value lines up with the correct time and condition.
Finally, data.frame ties all the information together in a table, and head prints its first rows so I can check it.
time1 <- c(rep("immediate",2),rep("week",2))
condition <-rep(c("cued","uncued"),2)
bias_change <- c(0.09593775, -0.05390545, 0.1890689, 0.09643346)
stderror <- c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
data1 <- data.frame(time1, condition, bias_change, stderror)
head(data1)
## time1 condition bias_change stderror
## 1 immediate cued 0.09593775 0.09759788
## 2 immediate uncued -0.05390545 0.10297893
## 3 week cued 0.18906890 0.11593440
## 4 week uncued 0.09643346 0.09008655
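Typing the means and standard errors in by hand works, but it is easy to mistype a digit. The sketch below shows one way the same table could be built directly from the computed columns in base R. It is only an illustration: it uses the first 10 rows of biaschange as printed above (the full data has 31 rows), so its numbers will not match the values used for the figure, and the names biaschange_demo, data1_demo and sem are hypothetical.

```r
# First 10 rows of biaschange as printed above (the full data has 31 rows),
# so these results will NOT match the values used for the figure.
biaschange_demo <- data.frame(
  immed_cued   = c(0.123, 0.178, -0.513, -0.217, 0.137, 0.842, 0.650, -0.359, -0.835, -0.265),
  immed_uncued = c(0.253, -0.397, 0.303, 1.89, -0.0513, 0.0111, 0.716, 0.206, -1.03, -0.784),
  week_cued    = c(-0.355, 0.593, -0.112, 0.953, -0.323, 0.413, -0.102, -0.842, 0.507, -0.647),
  week_uncued  = c(0.468, -0.351, 0.332, 1.14, 0.371, 0.869, -0.488, -0.0963, -0.657, -0.488)
)

# Hypothetical helper: standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# Split column names such as "immed_cued" into a time part and a condition part
parts <- strsplit(names(biaschange_demo), "_")

data1_demo <- data.frame(
  time1       = ifelse(sapply(parts, `[`, 1) == "immed", "immediate", "week"),
  condition   = sapply(parts, `[`, 2),
  bias_change = unname(colMeans(biaschange_demo)),
  stderror    = unname(sapply(biaschange_demo, sem)),
  stringsAsFactors = FALSE
)
print(data1_demo)
```

Because the time and condition labels come from the column names, the rows stay in the same order as the means and standard errors, so nothing has to be matched up manually.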
Wow, so close. Before I move on to creating the ggplot I’m double checking that each of the reproduced values looks similar to the original figure. Note the authors didn’t explicitly state the standard error values, so the only way we could know whether they were correct was if the reproduced graph looked like the original graph from the paper.
Creating the figure
Using the code my teammates figured out last week, I was able to recreate the figure!
We’re using geom_bar to recreate the plot, but to make sure that the cued and uncued bars don’t sit on top of each other we set the position argument to ‘dodge’ so that they’re adjacent instead.
alpha makes the bar colours slightly transparent, which makes it easier to read the bias change values against the scale.
geom_errorbar draws the error bars for each of the x values (time1). The equations for ymin and ymax were slightly edited from my teammates’ code: I used + and - stderror rather than sd (which they used). Again I’m changing the aesthetics of the graph, where width determines the width of the top/bottom parts of the error bars, colour sets their colour, alpha works as above, and position (as above) puts them in the middle of each bar.
ylim sets the lower and upper limits of the y axis
labs sets the x axis label to an empty string to remove the ‘time1’ label, sets the y axis label to “Bias Change”, and finally adds a caption to the figure to describe what it’s about.
fig4 <- ggplot(data = data1, aes(x = time1, y = bias_change,fill = condition)) +
geom_bar(position = "dodge", stat = "identity", alpha=0.7) +
geom_errorbar(aes(x= time1, ymin=bias_change-stderror, ymax=bias_change+stderror), width=0.4, colour="grey", alpha= 0.9, position = position_dodge(0.9)) +
ylim(-0.2, 0.4) +
labs(x = "", y = "Bias Change", caption = "Fig 4. Change in implicit bias levels at the immediate and one-week delay tests.")
print(fig4)
And with all that work, I am finally done. And hooray, the error bars look more accurate this time round!!
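As a quick sanity check on the error bars, the arithmetic behind ymin and ymax can be done outside ggplot. The sketch below copies the values from data1 and checks that every error bar fits inside the limits given to ylim. This matters because, as far as I know, ylim drops anything that falls outside the limits (with a warning) instead of just cropping it.

```r
# Values copied from data1 above
data1 <- data.frame(
  time1       = c("immediate", "immediate", "week", "week"),
  condition   = c("cued", "uncued", "cued", "uncued"),
  bias_change = c(0.09593775, -0.05390545, 0.1890689, 0.09643346),
  stderror    = c(0.09759788, 0.10297893, 0.11593440, 0.09008655)
)

# Lower and upper ends of each error bar (mean minus/plus one standard error)
data1$ymin <- data1$bias_change - data1$stderror
data1$ymax <- data1$bias_change + data1$stderror

# TRUE if every error bar fits inside the limits passed to ylim()
all(data1$ymin >= -0.2 & data1$ymax <= 0.4)
```

Here the check comes out TRUE, so none of the bars or error bars should be removed by the ylim(-0.2, 0.4) call.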
Getting onto github
I couldn’t make the group zoom meeting this week, so instead I used a guide that my group members linked in our facebook chat to install git, create a github account and link it to RStudio
To be honest the whole process seemed quite difficult and the document was very lengthy. But luckily during our group meeting my team members were able to show me the most important things I needed to learn, as well as sending Jenny a message to add me as an administrator
I’m mostly set up now. I just need to practise adding documents to git, as well as changing ones that my group members have put up there and re-uploading them
Group presentation progress
- During our meeting we got started on our presentation
- We basically got ideas down for every part of the presentation, which is great
- We began splitting up some of the work we brainstormed, so each of us can further develop the script for our part and add it to the slides
Challenges
- Many of the challenges were documented above, which largely related to figuring out figure 4
- There is something funny that happened where I tried running the code I had in a previous learning log and I got an error.
This was the code:
biaslevelsbycondition <- cleandata %>%
select(baseIATcued, preIATcued, postIATcued, weekIATcued, baseIATuncued, preIATuncued, postIATuncued, weekIATuncued) %>%
summarise(across(contains("IAT"), list(mean = mean, sd = sd)))
And this was the error:
Problem with summarise() input ..1. i ..1 = across(contains("IAT"), list(mean = mean, sd = sd)). x Can’t convert a double vector to function
I don’t know what happened since this code worked in the past. I checked whether ‘cleandata’ had been changed since and it hasn’t, so I’m a bit stuck.
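One possible cause, and this is only a guess that I haven’t confirmed, is that a number got saved under the name mean or sd earlier in the session. In that case list(mean = mean, sd = sd) would pick up the number instead of the function, which would fit the "Can’t convert a double vector to function" message. The sketch below reproduces that situation in base R with a made-up value and shows two ways around it.

```r
# Assumption for illustration: a number was stored under the name `sd`
# earlier in the session, hiding the function of the same name.
sd <- 0.25

fns <- list(mean = mean, sd = sd)
sapply(fns, is.function)  # the `sd` entry is now a number, not a function

# Fix 1: refer to the functions with their package prefix
fns_fixed <- list(mean = base::mean, sd = stats::sd)
sapply(fns_fixed, is.function)

# Fix 2: remove the stray object so `sd` means the function again
rm(sd)
is.function(sd)
```

If this is what happened, restarting the R session and rerunning the log from the top should also clear it, since the stray object would no longer exist.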
Successes
- Figure 4 was created to a higher degree of accuracy, which I shared with my team
- We finished reproducing everything !!
- And we have made good progress on our presentation
Next steps from here
- revisit table 3 where I got the error described above
- complete my part of the presentation in time for our weekly meeting
- get everything we have on each of the tables/figures onto github