Wk 5 Learning Log

This week’s goals

Finish reproducing all of the tables so we can move on to the plots
Attend my weekly group meeting to share our progress
Attend this week’s QnA, don’t forget to ask my question!

Table 2:

Preliminaries

setting up

making sure that when I knit this document, error messages aren’t printed
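The setup chunk itself is hidden in the knitted output. Below is a minimal sketch of what such a chunk might contain, assuming knitr chunk options are used. The specific options are an assumption, not the settings actually used for this log (the warnings printed further down suggest `warning` was left on).

```r
library(knitr)

# Assumed options (not taken from this log): hide warnings and
# package start-up messages in every chunk of the knitted document.
opts_chunk$set(warning = FALSE, message = FALSE)
```

Individual chunks can still override these, for example by setting `warning = TRUE` in a chunk header.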

load packages

library(tidyverse) #for data wrangling and visualisation

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(gt) #for creating a table

read in the data

This time I’m using the clean data file that I saved when reproducing the descriptives from the first table. To recap, this file has the relevant variables renamed and omits the data from excluded participants.

cleandata <- read_csv("cleandata.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_character(),
##   Age = col_double(),
##   General_1_EnglishYrs = col_double(),
##   General_1_CaffCups = col_double(),
##   General_1_CaffHrsAgo = col_double(),
##   General_1_UniYears = col_double(),
##   EES = col_double(),
##   AlertTest_1_Concentr_1 = col_double(),
##   AlertTest_1_Refresh_1 = col_double(),
##   AlertTest_2_Concentr_1 = col_double(),
##   AlertTest_2_Refresh_1 = col_double(),
##   AlertTest_3_Concentr_1 = col_double(),
##   AlertTest_3_Refresh_1 = col_double(),
##   AlertTest_4_Concentr_1 = col_double(),
##   AlertTest_4_Refresh_1 = col_double(),
##   Total_sleep = col_double(),
##   Wake_amount = col_double(),
##   NREM1_amount = col_double(),
##   NREM2_amount = col_double(),
##   SWS_amount = col_double(),
##   REM_amount = col_double()
##   # ... with 26 more columns
## )
## i Use `spec()` for the full column specifications.

Data wrangling

Now cleandata still has a lot of variables that won’t all be used to reproduce table 2. Thus, from cleandata I’m using select to obtain the variables required to reproduce table 2 and create a new data set called implicitbiaslevels.

Then I’m using summarise to obtain the descriptive statistics (mean and sd) of all the variables in select. Rather than listing all those variables again in summarise, I’m using the across function. It is handy in combination with contains, as it creates summaries of all the variables whose names contain a particular phrase; in this case, it’s “IAT”.

implicitbiaslevels <- cleandata %>% 
  select(base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen) %>% 
  summarise(across(contains("IAT"), list(mean = mean, sd = sd)))

print(implicitbiaslevels)

## # A tibble: 1 x 8
##   base_IAT_race_mean base_IAT_race_sd base_IAT_gen_mean base_IAT_gen_sd
##                <dbl>            <dbl>             <dbl>           <dbl>
## 1              0.619            0.442             0.494           0.362
## # ... with 4 more variables: pre_IAT_race_mean <dbl>, pre_IAT_race_sd <dbl>,
## #   pre_IAT_gen_mean <dbl>, pre_IAT_gen_sd <dbl>

Recreating table 2

creating the table manually

I’m seeing whether I can pop the data directly into a table by using gt(), rather than manually creating a tibble with the means and SDs.

table2 <- implicitbiaslevels %>% 
  gt()

However, this table is very wide and not in the format of the original table, so I will go back to what I already know and manually create table 2 using tibble. I cannot give two columns the same name (e.g. mean twice, or SD twice), so I must name them as below for now:

table_2 <- tibble(
  mean1 = c(0.6186929, 0.4943818),
  SD1 = c(0.4423884, 0.36228),
  mean2 = c(0.2023364, 0.3109984),
  SD2 = c(0.5633004, 0.3748071))
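A possible alternative to typing the values in by hand is to reshape the one-row summary so that each bias type gets its own row. The sketch below uses made-up numbers in a hypothetical `toy` tibble (not the real cleandata) and assumes the column names follow the `time_IAT_bias` pattern used above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for cleandata (values are made up)
toy <- tibble(
  base_IAT_race = c(0.2, 0.6, 1.0),
  base_IAT_gen  = c(0.1, 0.5, 0.9),
  pre_IAT_race  = c(0.0, 0.2, 0.4),
  pre_IAT_gen   = c(0.3, 0.3, 0.6)
)

# Same summarise step as in the log: one row, one column per variable/statistic
summary_wide <- toy %>%
  summarise(across(contains("IAT"), list(mean = mean, sd = sd)))

# Split each column name into time, bias type and statistic,
# then spread time and statistic back out into columns
table_long <- summary_wide %>%
  pivot_longer(
    everything(),
    names_to = c("time", "bias", "stat"),
    names_pattern = "(base|pre)_IAT_(race|gen)_(mean|sd)"
  ) %>%
  pivot_wider(names_from = c(time, stat), values_from = value)

print(table_long)
```

The result has one row per bias type and the columns base_mean, base_sd, pre_mean and pre_sd, which could then be passed to gt() without retyping any numbers.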

Table Formatting

Now that I have the table, I will use various functions to format the aesthetics as close to the original table as possible:

tab_header is used to add a title, and md denotes markdown formatting used to make titles bold, italic or italic bold

fmt_number is used to round the numbers in every column specified to 2dp.

tab_source_note adds a footnote at the end of the table

tab_spanner creates headings that groups two or more columns together including a label

tab_row_group creates a row group with a collection of rows within it

table_2 %>% 
  gt() %>% 
  tab_header(title = md("**Table 2. Race and Gender Implicit Bias Levels.**")) %>% 
  fmt_number(columns = c(mean1, SD1, mean2, SD2), decimals = 2) %>% 
  tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint.") %>% 
  tab_spanner(label = md("**Baseline**"),columns = c(mean1, SD1)) %>% 
  tab_spanner(label = md("**Prenap**"),columns = c(mean2, SD2)) %>% 
  tab_row_group(label = md("**Race**"),rows = 1  ) %>% 
  tab_row_group(label = md("**Gender**"),rows = 2)

Table 2. Race and Gender Implicit Bias Levels.
Baseline		Prenap
mean1	SD1	mean2	SD2
Gender
0.49	0.36	0.31	0.37
Race
0.62	0.44	0.20	0.56
Implicit bias values are the average of D600 score for each timepoint.

Okay this looks good, but it can be improved. The Gender and Race labels are on different rows to their descriptive values, and I want to eliminate this. Perhaps we can remove tab_row_group since we only have one row of gender and one row of race. Instead we can remake the tibble with another column, called label for now, and treat it as its own column so that Race and Gender sit on the same row as their values.

table_2 <- tibble(
  label = c("Race", "Gender"),
  mean1 = c(0.6186929, 0.4943818),
  mean2 = c(0.2023364, 0.3109984),
  SD1 = c(0.4423884, 0.36228),
  SD2 = c(0.5633004, 0.3748071)
)

Okay great, now we’re doing similar formatting as above, but we want to remove the label heading. By using cols_label we can rename each of the columns so that the table resembles the original. Note that label is renamed to " " (a blank) so that no heading appears above that column in the final table.

table_2 %>% 
  gt() %>% 
  tab_header(title = md("**Table 2. Race and Gender Implicit Bias Levels.**")) %>% 
  fmt_number(columns = c(mean1, SD1, mean2, SD2), decimals = 2) %>% 
  tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint.") %>% 
  tab_spanner(label = md("**Baseline**"),columns = c(mean1, SD1)) %>% 
  tab_spanner(label = md("**Prenap**"),columns = c(mean2, SD2)) %>% 
  cols_label(mean1 = md("***Mean***"), SD1 = md("***SD***"), mean2 = md("***Mean***"), SD2 = md("***SD***"), label = " ")

Table 2. Race and Gender Implicit Bias Levels.
	Baseline		Prenap
	Mean	SD	Mean	SD
Race	0.62	0.44	0.20	0.56
Gender	0.49	0.36	0.31	0.37
Implicit bias values are the average of D600 score for each timepoint.

Yay, we have now reproduced table 2. One thing to note is that I can no longer make Race and Gender bold like in the previous table, and I’m not sure how to do that.
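One possible way to bold the Race and Gender cells is gt’s tab_style, which applies a style to chosen cells. This is only a sketch: it rebuilds the table_2 tibble from above so that it runs on its own, and the object name `bold_labels` is made up for illustration.

```r
library(dplyr)
library(gt)

# Same values as the table_2 tibble above
table_2 <- tibble(
  label = c("Race", "Gender"),
  mean1 = c(0.6186929, 0.4943818),
  SD1   = c(0.4423884, 0.36228),
  mean2 = c(0.2023364, 0.3109984),
  SD2   = c(0.5633004, 0.3748071)
)

# Bold only the body cells of the label column
bold_labels <- table_2 %>%
  gt() %>%
  fmt_number(columns = c(mean1, SD1, mean2, SD2), decimals = 2) %>%
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_body(columns = label)
  )

bold_labels
```

If the label column is instead turned into row names with rowname_col, the location helper would be cells_stub() rather than cells_body().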

Table 3

Preliminaries

load packages

library(tidyverse) #for data visualisation and wrangling eg. dplyr
library(gt) #for creating a table once the statistics have been reproduced

read in the data

Reading in the data from the csv file that was saved when coding for table 1

cleandata <-read_csv("cleandata.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_character(),
##   Age = col_double(),
##   General_1_EnglishYrs = col_double(),
##   General_1_CaffCups = col_double(),
##   General_1_CaffHrsAgo = col_double(),
##   General_1_UniYears = col_double(),
##   EES = col_double(),
##   AlertTest_1_Concentr_1 = col_double(),
##   AlertTest_1_Refresh_1 = col_double(),
##   AlertTest_2_Concentr_1 = col_double(),
##   AlertTest_2_Refresh_1 = col_double(),
##   AlertTest_3_Concentr_1 = col_double(),
##   AlertTest_3_Refresh_1 = col_double(),
##   AlertTest_4_Concentr_1 = col_double(),
##   AlertTest_4_Refresh_1 = col_double(),
##   Total_sleep = col_double(),
##   Wake_amount = col_double(),
##   NREM1_amount = col_double(),
##   NREM2_amount = col_double(),
##   SWS_amount = col_double(),
##   REM_amount = col_double()
##   # ... with 26 more columns
## )
## i Use `spec()` for the full column specifications.

Data Wrangling

reproducing descriptives

First I’m using select, chained onto cleandata with %>%, to choose the relevant variables for reproducing table 3, as cleandata has more variables than required. I’m naming this new dataset biaslevelsbycondition.

Similar to table 2, I’m using summarise to obtain the mean and sd of all of the variables by using the across function. Using contains is handy as all the variables I want the mean and SD for contain the same IAT phrase, which allows me to condense the code. I’m listing the variables in the order in which I will need them to construct my table later.

list names the descriptives I want for these variables (here, mean and sd).

biaslevelsbycondition <- cleandata %>% 
  select(baseIATcued, preIATcued, postIATcued, weekIATcued, baseIATuncued, preIATuncued, postIATuncued, weekIATuncued) %>% 
  summarise(across(contains("IAT"), list(mean = mean, sd = sd)))

print(biaslevelsbycondition)

## # A tibble: 1 x 16
##   baseIATcued_mean baseIATcued_sd preIATcued_mean preIATcued_sd postIATcued_mean
##              <dbl>          <dbl>           <dbl>         <dbl>            <dbl>
## 1            0.518          0.363           0.211         0.514            0.307
## # ... with 11 more variables: postIATcued_sd <dbl>, weekIATcued_mean <dbl>,
## #   weekIATcued_sd <dbl>, baseIATuncued_mean <dbl>, baseIATuncued_sd <dbl>,
## #   preIATuncued_mean <dbl>, preIATuncued_sd <dbl>, postIATuncued_mean <dbl>,
## #   postIATuncued_sd <dbl>, weekIATuncued_mean <dbl>, weekIATuncued_sd <dbl>

visualising the reproduced statistics

Now that we have all the values, using the gt() function will help visualize the data (means and sd) to help me format it into a table

biaslevelsbycondition %>% 
  gt()

baseIATcued_mean	baseIATcued_sd	preIATcued_mean	preIATcued_sd	postIATcued_mean	postIATcued_sd	weekIATcued_mean	weekIATcued_sd	baseIATuncued_mean	baseIATuncued_sd	preIATuncued_mean	preIATuncued_sd	postIATuncued_mean	postIATuncued_sd	weekIATuncued_mean	weekIATuncued_sd
0.5175814	0.3631716	0.2108864	0.5140243	0.3068241	0.4445511	0.3999553	0.3871947	0.5954932	0.4471114	0.3024484	0.4419679	0.248543	0.4776407	0.3988819	0.4670664

Recreating table 3:

creating the table/tibble

I’m using tibble to put the above data into a table with its respective columns. Again I’m creating a column called label for now so I can have the baseline, prenap, postnap etc. labels on the same row as their means and SDs.

table_3 <- tibble(
  label = c("Baseline", "Prenap", "Postnap", "1-week Delay"),
  mean1 = c(0.5175814, 0.2108864, 0.3068241, 0.3999553),
  SD1 = c(0.3631716, 0.5140243, 0.4445511, 0.3871947),
  mean2 = c(0.5954932, 0.3024484, 0.248543, 0.3988819),
  SD2 = c(0.4471114, 0.4419679, 0.4776407, 0.4670664)
)
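Another route that avoids retyping the sixteen values is to reshape the participant-level data to long format first and summarise by group. The sketch below uses a hypothetical `toy` tibble with made-up values and only two of the four timepoints; it assumes the `timeIATcondition` naming seen in cleandata.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (made-up values, two timepoints only)
toy <- tibble(
  baseIATcued   = c(0.4, 0.6),
  preIATcued    = c(0.1, 0.3),
  baseIATuncued = c(0.5, 0.7),
  preIATuncued  = c(0.2, 0.4)
)

table_3_sketch <- toy %>%
  # one row per participant x timepoint x condition
  pivot_longer(
    everything(),
    names_to = c("time", "condition"),
    names_pattern = "(base|pre|post|week)IAT(uncued|cued)",
    values_to = "iat"
  ) %>%
  group_by(time, condition) %>%
  summarise(mean = mean(iat), sd = sd(iat), .groups = "drop") %>%
  # one row per timepoint, with mean/sd columns for each condition
  pivot_wider(names_from = condition, values_from = c(mean, sd))

print(table_3_sketch)
```

The output has one row per timepoint and the columns mean_cued, mean_uncued, sd_cued and sd_uncued. Rows come out in alphabetical order of time, so they would still need reordering to match the original table.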

formatting the table

I’m using the same functions as specified in table 2 with only one exception. In gt() I set rowname_col to the label column I specified, so I don’t have to rename label to " " at the end in cols_label.

table_3 %>% 
  gt(rowname_col = "label") %>% 
  tab_header(title = md("**Table 3. Implicit bias levels by condition.**")) %>% 
  fmt_number(columns = c(mean1, SD1, mean2, SD2), decimals = 2) %>% 
  tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint.") %>% 
  tab_spanner(label = md("**Cued**"),columns = c(mean1, SD1)) %>% 
  tab_spanner(label = md("**Uncued**"),columns = c(mean2, SD2)) %>% 
  cols_label(mean1 = md("***Mean***"), SD1 = md("***SD***"), mean2 = md("***Mean***"), SD2 = md("***SD***"))

Table 3. Implicit bias levels by condition.
	Cued		Uncued
	Mean	SD	Mean	SD
Baseline	0.52	0.36	0.60	0.45
Prenap	0.21	0.51	0.30	0.44
Postnap	0.31	0.44	0.25	0.48
1-week Delay	0.40	0.39	0.40	0.47
Implicit bias values are the average of D600 score for each timepoint.

yay, we did it again. Time for table 4

Table 4:

Preliminaries

Loading relevant packages

library(tidyverse) #for data visualisation and wrangling eg. dplyr
library(gt) #for creating a table once the statistics have been reproduced
library(janitor) #to use the tabyl function

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Reading in the data

Again, I’m using the same cleandata csv file that was saved when coding for table 1

cleandata <-read_csv("cleandata.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_character(),
##   Age = col_double(),
##   General_1_EnglishYrs = col_double(),
##   General_1_CaffCups = col_double(),
##   General_1_CaffHrsAgo = col_double(),
##   General_1_UniYears = col_double(),
##   EES = col_double(),
##   AlertTest_1_Concentr_1 = col_double(),
##   AlertTest_1_Refresh_1 = col_double(),
##   AlertTest_2_Concentr_1 = col_double(),
##   AlertTest_2_Refresh_1 = col_double(),
##   AlertTest_3_Concentr_1 = col_double(),
##   AlertTest_3_Refresh_1 = col_double(),
##   AlertTest_4_Concentr_1 = col_double(),
##   AlertTest_4_Refresh_1 = col_double(),
##   Total_sleep = col_double(),
##   Wake_amount = col_double(),
##   NREM1_amount = col_double(),
##   NREM2_amount = col_double(),
##   SWS_amount = col_double(),
##   REM_amount = col_double()
##   # ... with 26 more columns
## )
## i Use `spec()` for the full column specifications.

Data Wrangling

obtaining relevant participant data

I’m using filter to remove the participant whose verbal report was not recorded (a missing value in heard_cue_report), as stated in the footnote of the original table.

Then I’m using select, chained with the %>% pipe, to choose the relevant variables for reproducing table 4, as cleandata has more variables than required.

soundcuereporting <- cleandata %>% 
  filter(!is.na(heard_cue_report)) %>% 
  select(heard_cue_report, heard_cue_exit)

print(soundcuereporting)

## # A tibble: 30 x 2
##    heard_cue_report       heard_cue_exit
##    <chr>                  <chr>         
##  1 no                     no            
##  2 no                     unsure        
##  3 no                     no            
##  4 no                     no            
##  5 no                     no            
##  6 no                     no            
##  7 no                     no            
##  8 no                     no            
##  9 maybe, unsure, unclear no            
## 10 no                     no            
## # ... with 20 more rows
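A note on missing values in filter: comparing NA with anything gives NA rather than TRUE or FALSE, and filter only keeps rows where the condition is TRUE. The sketch below shows this on a made-up three-row tibble (`toy` is hypothetical, not the real data); is.na() is the explicit way to test for missing values.

```r
library(dplyr)

# Hypothetical toy data with one missing verbal report
toy <- tibble(
  heard_cue_report = c("no", NA, "maybe, unsure, unclear")
)

# Comparing a real NA with the string "NA" does not give FALSE; it gives NA
NA != "NA"

# filter() drops rows where the condition is NA, so this removes the missing row...
dropped_by_comparison <- toy %>% filter(heard_cue_report != "NA")

# ...but testing for missingness directly states the intent
dropped_by_is_na <- toy %>% filter(!is.na(heard_cue_report))
```

Both versions keep the same two rows here, but only the second would still behave as intended if a participant had literally answered with the text "NA".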

Now I am ready to count the number of people who gave each combination of answers to the heard cue report and heard cue exit questions.

no. participants who reported no on both the verbal report and the exit questionnaire

I’m creating a variable called reportnoexitno to categorise this group of participants. Using the soundcuereporting data created earlier, filter selects the participants who fulfil the requirements above (in the bold title), and tally counts the number of participants who meet all of the criteria stated in filter.

reportnoexitno <- soundcuereporting %>% 
  filter(heard_cue_report == "no", heard_cue_exit == "no") %>% 
  tally()

print(reportnoexitno)

## # A tibble: 1 x 1
##       n
##   <int>
## 1    26

Okay now I can use this same method to obtain values for:

no. participants who reported no on verbal report but maybe on exit questionnaire

reportnoexitmaybe <- soundcuereporting %>% 
  filter(heard_cue_report == "no", heard_cue_exit == "unsure") %>% 
  tally()

print(reportnoexitmaybe)

## # A tibble: 1 x 1
##       n
##   <int>
## 1     2

no. participants who reported maybe on verbal report and no on exit questionnaire

reportmaybeexitno <- soundcuereporting %>% 
  filter(heard_cue_report == "maybe, unsure, unclear", heard_cue_exit == "no") %>% 
  tally()

print(reportmaybeexitno)

## # A tibble: 1 x 1
##       n
##   <int>
## 1     2

no. participants who reported maybe on verbal report and maybe on exit questionnaire

reportmaybeexitmaybe <- soundcuereporting %>% 
  filter(heard_cue_report == "maybe, unsure, unclear", heard_cue_exit == "unsure") %>% 
  tally()

print(reportmaybeexitmaybe)

## # A tibble: 1 x 1
##       n
##   <int>
## 1     0

Okay this method works to reproduce the values I need, and I can continue using this method to find the total values that remain to be reproduced, but I feel that maybe there is a faster way of doing this which enables me to condense my code.

trying a new method

I will try using the group_by function along with tally in order to count the number of participants for each of the responses in heard_cue_report and heard_cue_exit

#first obtaining those who said maybe or no in the heard cue report questionnaire  

report <- soundcuereporting %>% 
  group_by(heard_cue_report) %>% 
  tally() %>% 
  ungroup()

print(report)

## # A tibble: 2 x 2
##   heard_cue_report           n
##   <chr>                  <int>
## 1 maybe, unsure, unclear     2
## 2 no                        28

yes! success, now repeat with heard_cue_exit

#counting those who said maybe or no in the heard cue exit questionnaire

exit <- soundcuereporting %>% 
  group_by(heard_cue_exit) %>% 
  tally() %>% 
  ungroup()

print(exit)

## # A tibble: 2 x 2
##   heard_cue_exit     n
##   <chr>          <int>
## 1 no                28
## 2 unsure             2

Now that I have the total values (refer to original table for more detail), I will need to find the middle values of the table. Maybe I can use group_by to do this:

reportandexit <- soundcuereporting %>% 
  group_by(heard_cue_report, heard_cue_exit) %>% 
  tally() %>% 
  ungroup() 

print(reportandexit)

## # A tibble: 3 x 3
##   heard_cue_report       heard_cue_exit     n
##   <chr>                  <chr>          <int>
## 1 maybe, unsure, unclear no                 2
## 2 no                     no                26
## 3 no                     unsure             2

Alright, this works for the most part, but one part of the table is missing: the combination where maybe is the answer on both the report and the exit questionnaire. This value is zero, so I wonder if a combination just doesn’t show up when n = 0?
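That does seem to be what happens: group_by with tally (or count) only returns combinations that actually occur in the data. One possible way to get the empty combination back is tidyr’s complete, sketched here on a hypothetical four-row `toy` tibble rather than the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: nobody answered maybe on both questions
toy <- tibble(
  heard_cue_report = c("no", "no", "no", "maybe, unsure, unclear"),
  heard_cue_exit   = c("no", "no", "unsure", "no")
)

counts <- toy %>%
  count(heard_cue_report, heard_cue_exit) %>%
  # add every missing report x exit combination, with n filled in as 0
  complete(heard_cue_report, heard_cue_exit, fill = list(n = 0L))

print(counts)
```

The three observed combinations become four rows, with n = 0 for the combination nobody chose.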

trying something new

I’m going to try using the tabyl function from the janitor package, which I learnt in the QnA, to see if I can reproduce the 0 value.

tabyl will count the number of times each combination of responses occurs across the variables listed in the brackets.

reportandexit <- soundcuereporting %>% 
  tabyl(heard_cue_report, heard_cue_exit) 

print(reportandexit)

##        heard_cue_report no unsure
##  maybe, unsure, unclear  2      0
##                      no 26      2

#yay success

Recreating the table

creating the table

Again I’m using tibble to insert the reproduced numbers into a table

table_4 <- tibble(
  label = c("No", "Maybe", "Total"),
  no = c(26, 2, 28),
  maybe = c(2, 0, 2),
  total = c(28, 2, 30)
)
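The Total row and column could possibly also be produced without retyping, using janitor’s adorn_totals on the tabyl output. This is a sketch on a hypothetical `toy` tibble with made-up responses, not the real soundcuereporting data.

```r
library(dplyr)
library(janitor)

# Hypothetical toy data (made-up responses)
toy <- tibble(
  heard_cue_report = c("no", "no", "no", "maybe, unsure, unclear"),
  heard_cue_exit   = c("no", "no", "unsure", "no")
)

with_totals <- toy %>%
  tabyl(heard_cue_report, heard_cue_exit) %>%
  # append a Total row and a Total column to the cross-tabulation
  adorn_totals(where = c("row", "col"))

print(with_totals)
```

Note that this layout has the verbal report in rows and the exit questionnaire in columns, so the variables may need swapping in tabyl to match the orientation of the original table.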

Table Formatting

Similar to previous tables I’m using gt and subsequent functions detailed above to format the table

But with the addition of the tab_stubhead function to create a title for the column holding the no, maybe and total row labels

table_4 %>% 
  gt() %>% 
  tab_header(title = "Table 4. Sound cue reporting. ") %>% 
  tab_source_note("Participants responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% 
  fmt_number(columns = c(no, maybe,  total), decimals = 0) %>% 
  tab_spanner(label = "Reported Hearing Cue on Verbal Report?", columns = c(no, maybe, total)) %>% 
  tab_stubhead(label = "Reported Hearing Cue on Exit Questionnaire?") %>% 
  cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")

Table 4. Sound cue reporting.
	Reported Hearing Cue on Verbal Report?
	No	Maybe	Total
No	26	2	28
Maybe	2	0	2
Total	28	2	30
Participants responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.

So this works, but the stubhead for Reported hearing cue on exit questionnaire is missing, I’m not too sure why so I’m going to try some reformatting:

Similar to table 3, I’m using rowname_col in gt(), which turns the label column (see the tibble above) into the table’s row names, also called the stub. Without a stub there is nowhere for tab_stubhead to put its label, which is likely why it was missing above.

However, now there is no title for the no, maybe and total columns, so using tab_spanner allows us to create a title that spans those columns.

Finally I’m using md(), which denotes markdown formatting, to do some final aesthetic formatting by creating bold and italicised row and column titles.

table_4 %>% 
  gt(rowname_col = "label") %>% 
  fmt_number(columns = c(no, maybe,  total), decimals = 0) %>% 
  tab_spanner( label = md("**Reported Hearing Cue on Verbal Report?**"), columns = c(no, maybe, total)) %>% 
  tab_stubhead(label = md("**Reported Hearing Cue on Exit Questionnaire?**")) %>% 
  cols_label(no = md("**No**"), maybe = md("**Maybe**"), total = md("***Total***")) %>% 
  tab_header(title = md("**Table 4. Sound cue reporting.**")) %>% 
  tab_source_note("Participants responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.")

Table 4. Sound cue reporting.
	Reported Hearing Cue on Verbal Report?
Reported Hearing Cue on Exit Questionnaire?	No	Maybe	Total
No	26	2	28
Maybe	2	0	2
Total	28	2	30
Participants responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.

And finally we are done with all the tables! Again, just to note: despite using md() to format the column labels aesthetically, the final format of the table remains the same.

Challenges and Successes

There were a lot of challenges this week, which I have documented throughout this learning log. I have omitted some of the challenges that were originally encountered, as including them would have made this learning log way too long. Nevertheless, efficient groupwork made achieving my weekly goal possible. The QnA covered topics that came in clutch, helping me resolve some problems. I’m keen to move on to reproducing the figures. Only three left!

Next steps from here

Try to work on some of the figures during flex week
Schedule a group meeting to tackle those figures
Use what I learnt along the way this week to improve the code for table 1 if possible

Julia Chen

03/07/2021