Week 5 coding goals

Really got stuck into the group project this week and led the charge on the plots. My goals this week were:

Figure out how to put data into tables
Finish at least one plot

Challenges and successes

1. Figure out how to put data into tables

My group members figured this out before me, so I had to reverse engineer it for myself, but the steps boiled down to putting everything in a tibble. The tibble() function organises everything in columns, like an Excel sheet. I also learnt a difference between a tibble and a data frame along the way: by default a large tibble prints only its first ten rows, whereas a data frame prints every row. The difference looked purely aesthetic to me at first, though tibbles also behave a little differently when you subset them.
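The printing difference is not the only one. Below is a minimal sketch of two other places where a tibble and a data frame behave differently. It uses a made-up 30-row column (value is a hypothetical name, not part of our project data).

```r
library(tibble)

# Hypothetical 30-row example data, not from the group project
df <- data.frame(value = 1:30)
tb <- tibble(value = 1:30)

# Printing: the tibble shows its first ten rows, the data frame shows all 30
print(tb)
print(df)

# Selecting a single column with [ , ]:
# the data frame drops down to a plain vector, the tibble stays a tibble
df_col <- df[, "value"]
tb_col <- tb[, "value"]
class(df_col)
class(tb_col)

# Partial matching with $: the data frame matches "val" to "value",
# while the tibble returns NULL (with a warning) because no column is called "val"
df_partial <- df$val
tb_partial <- suppressWarnings(tb$val)
```

For a four-row table like the ones in this log, none of this changes the result, which is probably why the two felt interchangeable.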

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.6
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(gt)

table3 <- tibble(
  label = c("Baseline", "Prenap", "Postnap", "1-week delay"),
  mean1 = c(0.518, 0.211, 0.307, 0.4), #these values are calculated from last week's descriptives
  mean2 = c(0.595, 0.302, 0.249, 0.399),
  SD1 = c(0.363, 0.514, 0.445, 0.387),
  SD2 = c(0.447, 0.442, 0.478, 0.467)
)

print(table3)

## # A tibble: 4 x 5
##   label        mean1 mean2   SD1   SD2
##   <chr>        <dbl> <dbl> <dbl> <dbl>
## 1 Baseline     0.518 0.595 0.363 0.447
## 2 Prenap       0.211 0.302 0.514 0.442
## 3 Postnap      0.307 0.249 0.445 0.478
## 4 1-week delay 0.4   0.399 0.387 0.467

From there, our group used the gt() function to turn it into a table. Here we take the table3 tibble created above and pipe it into gt() so it can work with the data. tab_header names the entire table, and tab_source_note adds the note underneath that explains the table in reference to the rest of the article. fmt_number formats the four numeric columns so that each value is shown to two decimal places. The two tab_spanner calls then group the columns under shared headings, one for cued and one for uncued, and we specify which columns go under each. Finally, cols_label renames the columns to improve readability.

table3 %>% 
  gt() %>% 
  tab_header(
    title = "Table 3. Implicit bias levels by condition") %>% 
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% 
  fmt_number(columns = c(mean1, mean2, SD1, SD2), decimals = 2) %>% #c() instead of vars(), which is deprecated since gt 0.3.0
  tab_spanner( 
    label = "Cued", #Spanner column label 
    columns = c(mean1, SD1)
    ) %>% 
  tab_spanner(
    label = "Uncued", #Spanner column label 
    columns = c(mean2, SD2)
  ) %>% 
  cols_label(mean1 = "Mean", mean2 = "Mean", SD1 = "SD", SD2 = "SD", label = " ")

Table 3. Implicit bias levels by condition
	Cued		Uncued
	Mean	SD	Mean	SD
Baseline	0.52	0.36	0.59	0.45
Prenap	0.21	0.51	0.30	0.44
Postnap	0.31	0.45	0.25	0.48
1-week delay	0.40	0.39	0.40	0.47
Implicit bias values are the average D600 score for each timepoint

print(table3)

## # A tibble: 4 x 5
##   label        mean1 mean2   SD1   SD2
##   <chr>        <dbl> <dbl> <dbl> <dbl>
## 1 Baseline     0.518 0.595 0.363 0.447
## 2 Prenap       0.211 0.302 0.514 0.442
## 3 Postnap      0.307 0.249 0.445 0.478
## 4 1-week delay 0.4   0.399 0.387 0.467

Reverse engineering this went well, largely because my group members were kind enough to go through their code with me and explain every step of the way. The only challenge I faced in this section was understanding what a tibble is in the first place, but that came together when I took on my next goal.

2. Finish at least one plot

Since all my group members were already far ahead of me in their understanding of code (since I had a lot of assignments last week), I decided to repay the favour and lead the charge on the plots. Looking through the plots, I searched for the one which I thought was the easiest to start with.

In our report, we have three plots to choose from:

I did not choose graph 1 because I’m not even sure what kind of graph it is. It’s like a mix between a line graph and a column graph, and I would rather figure out something a bit easier.

Graph 2 is the one I ended up trying, since it looked simple enough: a column graph with error bars. Oh, how wrong I was.

I don’t even understand how to read graph 3, so I’ve left it for now.

So in my quest to reproduce the second graph, my brain at 3am for some reason thought it was a boxplot, so I wasted a lot of time wondering how to fit the data in. Eventually I realised my mistake.

First, I had to calculate the data that goes into the plot. Since it’s a plot that illustrates change, I had to calculate that myself.

#calculate change
pre_post_change_cued <- 0.31 - 0.21 #note, these values are those taken from table 3

pre_post_change_uncued <- 0.25 - 0.30

pre_week_change_cued <- 0.40 - 0.21

pre_week_change_uncued <- 0.40 - 0.30

#Create tibble from the changes calculated above
fig4 <- tibble(
  change_from_pre_to = c("immediate", "week"),
  cued = c(pre_post_change_cued, pre_week_change_cued),
  uncued = c(pre_post_change_uncued, pre_week_change_uncued)
)

print(fig4)

## # A tibble: 2 x 3
##   change_from_pre_to  cued uncued
##   <chr>              <dbl>  <dbl>
## 1 immediate           0.1   -0.05
## 2 week                0.19   0.1

It took me an incredibly long time to figure out how I wanted to represent this in a table, and when I first tried to create the tibble, I had the rows and columns the wrong way around. I googled how to flip the labels on the table’s axes, but none of the solutions worked, so I ended up just renaming everything myself. Once I had fixed the tibble, I decided to create a basic column graph first to get the gist of it.
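One option that might have saved some retyping is reshaping the tibble with tidyr instead of rebuilding it by hand. This is only a sketch and not what I did at the time. It rebuilds the fig4 values under the hypothetical name fig4_wide, and the new column names condition and bias_change are my own choices for illustration.

```r
library(tibble)
library(tidyr)

# Same values as fig4 above: one row per comparison, one column per condition
fig4_wide <- tibble(
  change_from_pre_to = c("immediate", "week"),
  cued = c(0.10, 0.19),
  uncued = c(-0.05, 0.10)
)

# Gather the cued and uncued columns into condition/bias_change pairs,
# giving one row per bar ("condition" and "bias_change" are names chosen for this sketch)
fig4_long <- pivot_longer(
  fig4_wide,
  cols = c(cued, uncued),
  names_to = "condition",
  values_to = "bias_change"
)

print(fig4_long)
```

The long layout has one row per bar, which is the shape ggplot2 generally expects when bars are grouped by a second variable.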

I loaded ggplot2 so I could put the data into a graph (it was already attached with the tidyverse, so this step is optional). From there, I created a data frame rather than a tibble; for tables with fewer than ten rows the two print almost identically. I then created a plot. This took a lot of trial and error due to my initial misunderstanding of tibbles, but I eventually got there. I also added geom_col so that it would be drawn as a column graph.

#load packages
library(ggplot2)

#create data
cued_changes <- data.frame(
  time = c("immediate", "week"),
  bias_change = c(0.10,0.19) 
)

head(cued_changes)

##        time bias_change
## 1 immediate        0.10
## 2      week        0.19

# plot
cued_graph <- ggplot(data = cued_changes, aes(
  x = time,
  y = bias_change
)) +
  geom_col()

print(cued_graph)

This basic column graph was not too bad for me to figure out on my own. But from there, figuring out how to group it was a nightmare.

So I initially Googled it and used the information from this website. It took me a while to break it down, but I eventually figured out that I could ignore the ‘value’ part and rename the others for my needs. My findings are in the comments below.

#create dataset
time1 <- c(rep("immediate", 2), rep("week", 2)) #the two groups of columns
condition <- rep(c("cued", "uncued"), 2) #the subgroups within the grouped columns
bias_change1 <- c(0.10, -0.05, 0.19, 0.10) #calculated differences, needs to be in order of the graph
data <- data.frame(time1, condition, bias_change1) #include bias_change1 so the plot reads every variable from the data frame

head(data)

##       time1 condition bias_change1
## 1 immediate      cued         0.10
## 2 immediate    uncued        -0.05
## 3      week      cued         0.19
## 4      week    uncued         0.10

# plot
ggplot(data = data, aes(
  x = time1,
  y = bias_change1,
  fill = condition
)) +
  geom_bar(position = "dodge",stat = "identity")

The main difference between this and the graph above, which shows only the cued condition, is that fill is how we created the groups. We also changed from geom_col to geom_bar because we couldn’t get geom_col to place the columns next to each other, although geom_col is equivalent to geom_bar(stat = "identity"), so it should accept position = "dodge" as well. Changing to geom_bar gave us some issues because by default it counts the number of rows at each x value instead of plotting the values themselves. That is why we had to set stat = "identity". Additionally, position = "dodge" places the columns in each group next to each other instead of stacking them, which makes it look more like the picture in the original article.

All of this I figured out with other members of the group :)
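For comparison, here is a minimal sketch of the same grouped plot written with geom_col. As far as I can tell, geom_col takes the same position argument as geom_bar and already plots the values as given, so stat = "identity" is not needed. The names grouped_changes and grouped_plot are made up for this sketch.

```r
library(ggplot2)

# Same four change scores as above, stored together in one data frame
grouped_changes <- data.frame(
  time = c("immediate", "immediate", "week", "week"),
  condition = c("cued", "uncued", "cued", "uncued"),
  bias_change = c(0.10, -0.05, 0.19, 0.10)
)

# geom_col uses the y values directly; position = "dodge" places the
# cued and uncued columns side by side within each time point
grouped_plot <- ggplot(grouped_changes, aes(
  x = time,
  y = bias_change,
  fill = condition
)) +
  geom_col(position = "dodge")

print(grouped_plot)
```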

But then came an even bigger challenge: adding the error bars. I knew it had something to do with the geom_errorbar function, since I had seen it in another group’s project during the workshop this week. I spent an excruciatingly long time trying to work out what exactly those error bars were, since most normal error bars don’t exceed the length of the bar. Unfortunately, I was unable to complete that this week because another assignment is due this Sunday (which is why this learning log is so late). We have a group meeting tomorrow, so I’ll spend time on it then.
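As a starting point for that meeting, here is a rough sketch of how geom_errorbar could sit on top of the grouped columns. The error column holds placeholder values only (0.05 for every bar), because I have not yet worked out what the error bars in the article represent. The names changes and error_plot are also made up for this sketch.

```r
library(ggplot2)

changes <- data.frame(
  time = c("immediate", "immediate", "week", "week"),
  condition = c("cued", "uncued", "cued", "uncued"),
  bias_change = c(0.10, -0.05, 0.19, 0.10),
  error = rep(0.05, 4) # PLACEHOLDER values, not taken from the article
)

# Use the same dodge width for both layers so each error bar is centred on its column
dodge <- position_dodge(width = 0.9)

error_plot <- ggplot(changes, aes(x = time, y = bias_change, fill = condition)) +
  geom_bar(stat = "identity", position = dodge) +
  geom_errorbar(
    aes(ymin = bias_change - error, ymax = bias_change + error),
    position = dodge,
    width = 0.2 # width of the horizontal caps on each error bar
  )

print(error_plot)
```

Once the real error values are known, they would replace the placeholder column and the rest should stay the same.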

Next steps in my coding journey

We’re coming towards the end of the project now! Since it’s reading week, I’m hoping to get a bit more done. Goals include:

Adding error bars to Figure 4
Finishing another plot
Starting on the final plot

Week 5 Learning Log

Katherine Wong

04/07/2021
