This week's coding goals

  1. Find easier and better ways to format our data into a table, so that it looks more similar to what is seen in our replication study
  2. Find out how to round our figures to 2 decimal places, as is seen in our replication study
  3. Continue reproducing the next tables and values
  • I didn't have a specific goal with this. I wanted to at least start and finish reproducing Table 2, but I didn't want to restrict myself if I had time to continue reproducing the other tables and figures

How did I go?

Goal 1 & 2: tidy up Table 1's code and reformat Table 1 to only show 2 decimal places

  • We had a group meeting on Monday and were able to recreate Table 1 with 2 decimal places and remove the characteristics label that was showing up previously:
  • NB: sometimes I have issues with my Learning Log output so the full attempt for Table 1 is linked here

Load relevant packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(janitor) #clean_names() function 
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(dplyr)
library(haven)
library(readspss)
library(gt) #different package for creating table 
library(glue) #package for pasting text and values together into strings 
## 
## Attaching package: 'glue'
## The following object is masked from 'package:dplyr':
## 
##     collapse

Load data

replicationdata <- read.sav("Humiston & Wamsley 2019 data.sav")

Create a dataframe for Table 1 (Participant characteristics)

#Manually inputting values from pre-calculated values 
table1 <- tibble( #3 columns
  Characteristics = c("Age (yrs)", "ESS", "SSS", "Baseline implicit bias", "Prenap implicit bias", "Postnap implicit bias", "One-week delay implicit bias", "Sex (% male)", "Cue played during nap (% racial cue)"), #label
  Mean = c(19.5, 15.3, 2.81, 0.557, 0.257, 0.278, 0.399, 0.484, 0.548),
  SD = c(1.23, 2.83, 0.749, 0.406, 0.478, 0.459, 0.425, NA, NA)
)

print(table1)
## # A tibble: 9 x 3
##   Characteristics                        Mean     SD
##   <chr>                                 <dbl>  <dbl>
## 1 Age (yrs)                            19.5    1.23 
## 2 ESS                                  15.3    2.83 
## 3 SSS                                   2.81   0.749
## 4 Baseline implicit bias                0.557  0.406
## 5 Prenap implicit bias                  0.257  0.478
## 6 Postnap implicit bias                 0.278  0.459
## 7 One-week delay implicit bias          0.399  0.425
## 8 Sex (% male)                          0.484 NA    
## 9 Cue played during nap (% racial cue)  0.548 NA
table1 %>% #use table1 dataframe
  gt() %>% #gt() package to create table 
  tab_header(title = md("**Table 1. Participant characteristics.**")) %>% #title, with bolded font. 
  fmt_number(columns = vars(Mean, SD), decimals = 2) %>% #format values to output for 2 decimal places
    #vars() function used to select variables "Mean" and "SD" for the table 
  tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint") %>%  #Source note for table footer 
  cols_label(Characteristics = "") #replace the original "characteristics" label with "" - this shows as blank
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
Table 1. Participant characteristics.

                                        Mean     SD
Age (yrs)                              19.50   1.23
ESS                                    15.30   2.83
SSS                                     2.81   0.75
Baseline implicit bias                  0.56   0.41
Prenap implicit bias                    0.26   0.48
Postnap implicit bias                   0.28   0.46
One-week delay implicit bias            0.40   0.42
Sex (% male)                            0.48     NA
Cue played during nap (% racial cue)    0.55     NA

Implicit bias values are the average of D600 score for each timepoint
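
The deprecation warning above suggests replacing `vars(...)` with `c(...)`. A minimal sketch of the same formatting step written that way, assuming gt 0.3.0 or later and using only the first three rows of table1 to keep it short (the object name table1_gt is just for illustration):

```r
library(dplyr) # provides %>% and tibble()
library(gt)

# First three rows of table1, copied from the dataframe above
table1 <- tibble(
  Characteristics = c("Age (yrs)", "ESS", "SSS"),
  Mean = c(19.5, 15.3, 2.81),
  SD = c(1.23, 2.83, 0.749)
)

table1_gt <- table1 %>%
  gt() %>%
  tab_header(title = md("**Table 1. Participant characteristics.**")) %>%
  fmt_number(columns = c(Mean, SD), decimals = 2) %>% # c() instead of vars()
  cols_label(Characteristics = "") # blank label for the first column

table1_gt
```

Only the `columns` argument changes; the rest of the table code can stay as it is.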

Goal 3: continue reproducing the next tables and values

  • NB: each of my attempts is linked here in case this Learning Log does not output correctly:
  • Table 2
  • Table 3
  • Table 4

Table 2 attempt

For Table 2, we attempted to recreate the table below for means and SDs (excluding t-values and p-values):

I first attempted to do this by manually and separately calculating each variable. For example, to calculate the mean and SD of the baseline Race bias, I created new data BaselineRace from cleandata (the dataset without the excluded participants). I used the select() function to select only the variable "base_IAT_race" from cleandata. Then, using the summarise() function, I calculated the mean and SD for the selected variable.

I used print() to check that the calculated mean and SD are identical to that shown in the original table.

BaselineRace <- cleandata %>% 
  select(base_IAT_race) %>% 
  summarise(BaselineRaceMean = mean(base_IAT_race), BaselineRaceSD = sd(base_IAT_race)) 
## Error in select(., base_IAT_race): object 'cleandata' not found
print(BaselineRace)
## Error in print(BaselineRace): object 'BaselineRace' not found

I followed that method for all four variables, but during that Monday meeting, Julia showed us a more efficient way to clean up the data and get the same output. She created a new dataset called implicitbiaslevels using <-, and selected the 4 variables base_IAT_race, base_IAT_gen, pre_IAT_race and pre_IAT_gen from the cleandata dataset using the select() function. Then, she calculated the mean and SD of all those variables using the summarise() function: contains("IAT") captures every variable that contains "IAT" within its variable name, and the list() function lists the calculated means and SDs for each variable under the labels "mean" and "sd". Now the data "implicitbiaslevels" has the means and SDs for each of the variables previously selected, instead of us having to separately code for each mean and SD needed and then create a new dataframe.

implicitbiaslevels <- cleandata %>% 
  select(base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen) 
## Error in select(., base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen): object 'cleandata' not found
implicitbiaslevels <- implicitbiaslevels %>% 
  summarise(across(contains("IAT"), list(mean = mean, sd = sd))) 
## Error in summarise(., across(contains("IAT"), list(mean = mean, sd = sd))): object 'implicitbiaslevels' not found
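
To see what this step produces without needing cleandata, here is a minimal sketch on a made-up dataframe. The name toy and its values are hypothetical, not the study data; the column names only imitate the real ones.

```r
library(dplyr)

# Hypothetical stand-in for cleandata: made-up scores, not the study data
toy <- tibble(
  base_IAT_race = c(0.2, 0.4, 0.6),
  pre_IAT_race  = c(0.1, 0.3, 0.5),
  age           = c(19, 20, 21) # no "IAT" in the name, so it is ignored
)

# One mean and one SD per column whose name contains "IAT"
toy_summary <- toy %>%
  summarise(across(contains("IAT"), list(mean = mean, sd = sd)))

# New columns are named <original column>_<label from list()>
names(toy_summary)
```

The result is a single row with one column per variable and statistic, which is why the values later have to be rearranged before they fit the layout of Table 2.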

After calculating the necessary data, it was time to put it into a table. We decided to use the same gt() package as was used for Table 1. We loaded the necessary package then tried to recreate Table 2 using simplified code and dataset "implicitbiaslevels". We were hoping that this simplified code could be used to quickly and simply recreate the table.

implicitbiaslevels %>% 
  gt() %>% 
  tab_header(
    title = "Table 2: Race and Gender Implicit Bias Levels") %>% #title
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% #creates source note in the table footer
  fmt_number(columns = vars(mean1, mean2,  SD1, SD2), decimals = 2) %>% #formats values to 2 decimal places
      #vars() function used to select variables to input into the columns
  tab_spanner(
    label = "Baseline", #Spanner Column label 
    columns = c(mean1, SD1) #selects variables "mean1" and "SD1" for the 2 columns underneath the spanner column label "Baseline"
  ) %>% 
  tab_spanner(
    label = "Prenap", #Spanner Column label 
    columns = c(mean2, SD2) #selects variables "mean2" and "SD2" for the 2 columns underneath the spanner column label "Prenap"
  ) %>% 
  tab_row_group( 
    label = "Race", #label for row group 
    rows = 1  #row number 1 goes in this row group 
  ) %>% 
  tab_row_group(
    label = "Gender", #label for row group
    rows = 2 #row number 2 goes in this row group (rows takes row positions, not a count) 
  ) 
## Error in dplyr::group_vars(data): object 'implicitbiaslevels' not found
print(implicitbiaslevels)
## Error in print(implicitbiaslevels): object 'implicitbiaslevels' not found

It does work, but it does not let the means be entered on their own. We thought the values might need to be identified as separate columns, as stub labels don't sit flush but look more like a subheader. Thus, we tried to manually input the data.

#Since using the simplified dataset "implicitbiaslevels" didn't work, I tried inserting/formatting the same values differently by creating a new dataframe with different labels. 
#values are taken from the "implicitbiaslevels" data. 
table2 <- tibble( #3 columns
  Variables = c("base_IAT_race", "base_IAT_gen", "pre_IAT_race", "pre_IAT_gen"), #new variable names 
  Mean = c(0.619, 0.494, 0.202, 0.311), 
  SD = c(0.442, 0.362, 0.563, 0.375)
)

print(table2)
## # A tibble: 4 x 3
##   Variables      Mean    SD
##   <chr>         <dbl> <dbl>
## 1 base_IAT_race 0.619 0.442
## 2 base_IAT_gen  0.494 0.362
## 3 pre_IAT_race  0.202 0.563
## 4 pre_IAT_gen   0.311 0.375

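We tried to recreate Table 2 using the manually entered data "table2", but it kept giving this error: "Error: Can't subset columns that don't exist. x Column Mean doesn't exist." (In the output recorded below, the missing column is reported as "base_IAT_race".) The likely reason is that tab_spanner() selects columns by name, and "base_IAT_race" is a value inside the Variables column of table2 rather than a column of its own.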

table2 %>% 
  gt( #use gt() package to create table
    rowname_col = "Race", "Gender" #gave the 2 row names "Race" and "Gender"
  ) %>% 
  tab_header(
    title = "Table 2: Race and Gender Implicit Bias Levels") %>% #title for the table
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% #creates source note in the table footer
  fmt_number(columns = vars(Mean, SD, Mean, SD), decimals = 2) %>% #formats values to 2 decimal places
      #vars() function used to select variables to input into the columns
  tab_spanner(
    label = "Baseline", #Spanner column label
    columns = c(base_IAT_race, base_IAT_gen) #selects variables "base_IAT_race" and "base_IAT_gen" for the 2 columns underneath the spanner column label "Baseline"
  ) %>% 
  tab_spanner(
    label = "Prenap", #Spanner column label 
    columns = c(pre_IAT_race, pre_IAT_gen) #selects variables "pre_IAT_race" and "pre_IAT_gen" for the 2 columns underneath the spanner column label "Prenap"
  ) 
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
## Error: Can't subset columns that don't exist.
## x Column `base_IAT_race` doesn't exist.
#Keeps giving this error: "Error: Can't subset columns that don't exist. x Column `Mean` doesn't exist."

I then tried creating a new dataframe:

#Tried formatting values into a new dataframe to more closely match the table that we are trying to reproduce. Values are taken from the same "implicitbiaslevels" data as before. 
table_2 <- tibble( #4 different columns 
  mean1 = c(0.6186929, 0.4943818),
  mean2 = c(0.2023364, 0.3109984),
  SD1 = c(0.4423884, 0.36228),
  SD2 = c(0.5633004, 0.3748071)
)
print(table_2)
## # A tibble: 2 x 4
##   mean1 mean2   SD1   SD2
##   <dbl> <dbl> <dbl> <dbl>
## 1 0.619 0.202 0.442 0.563
## 2 0.494 0.311 0.362 0.375

Using the "table_2" dataframe, I tried to create a table. The format now finally looks like the table in the original paper. However, the column names are output as "mean1", "SD1", "mean2", "SD2", which is not identical to what is seen in the paper. I now need to figure out how to change these column labels to "Mean" and "SD".

table_2 %>% 
  gt() %>% #use gt() package to create table
  tab_header( #format table header
    title = "Table 2: Race and Gender Implicit Bias Levels") %>% #title of table 
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% #creates source note in the table footer
  fmt_number(columns = vars(mean1, mean2,  SD1, SD2), decimals = 2) %>% #formats values to 2 decimal places
      #vars() function used to select variables to input into the columns
  tab_spanner( 
    label = "Baseline", #Spanner column label 
    columns = c(mean1, SD1) #selects variables "mean1" and "SD1" for the 2 columns underneath the spanner column label "Baseline"
  ) %>% 
  tab_spanner(
    label = "Prenap", #Spanner column label 
    columns = c(mean2, SD2) #selects variables "mean2" and "SD2" for the 2 columns underneath the spanner column label "Prenap"
  ) %>% 
  tab_row_group( 
    label = "Race", #Row group label
    rows = 1 #row number 1 goes in this row group  
  ) %>% 
  tab_row_group(
    label = "Gender", #Row group label
    rows = 2 #row number 2 goes in this row group (rows takes row positions, not a count) 
  ) 
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
Table 2: Race and Gender Implicit Bias Levels

            Baseline         Prenap
          mean1    SD1    mean2    SD2
Gender
           0.49   0.36     0.31   0.37
Race
           0.62   0.44     0.20   0.56

Implicit bias values are the average D600 score for each timepoint
print(table_2)
## # A tibble: 2 x 4
##   mean1 mean2   SD1   SD2
##   <dbl> <dbl> <dbl> <dbl>
## 1 0.619 0.202 0.442 0.563
## 2 0.494 0.311 0.362 0.375

Using the same code as above, I decided to get rid of the tab_row_group() function (that was used to label the row groups). Instead, I used the cols_label() function to relabel the columns. However, it now gives the error: "Error in cols_label(., mean1 = "mean", mean2 = "mean", SD1 = "SD", SD2 = "SD", : All column names provided must exist in the input data table." The likely cause is the label = " " argument, because table_2 has no column called "label".

#Used same code as before but getting rid of tab_row_group() function (that was used to label the group row names) and 
#Also, instead using cols_label() function to label the group columns
table_2 %>% 
  gt() %>% #use gt() package 
  tab_header(
    title = "Table 2: Race and Gender Implicit Bias Levels") %>% #table title 
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% #creates source note in the table footer
  fmt_number(columns = vars(mean1, mean2,  SD1, SD2), decimals = 2) %>% #formats values to 2 decimal places
      #vars() function used to select variables to input into the columns
  tab_spanner(
    label = "Baseline",
    columns = c(mean1, SD1)
  ) %>% 
  tab_spanner(
    label = "Prenap",
    columns = c(mean2, SD2)
  ) %>% 
  cols_label(mean1 = "mean", mean2 = "mean", SD1 = "SD", SD2 = "SD", label = " ") #changed column labels 
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
## Error in cols_label(., mean1 = "mean", mean2 = "mean", SD1 = "SD", SD2 = "SD", : All column names provided must exist in the input `data` table.

Thus, I tried creating a new dataframe with labels and then using the same code as before but with the new dataframe. Using this method, I was finally able to make it look like the table in the original paper.

table2_label <- tibble(
  label = c("Race", "Gender"),
  mean1 = c(0.6186929, 0.4943818),
  mean2 = c(0.2023364, 0.3109984),
  SD1 = c(0.4423884, 0.36228),
  SD2 = c(0.5633004, 0.3748071)
)
print(table2_label)
## # A tibble: 2 x 5
##   label  mean1 mean2   SD1   SD2
##   <chr>  <dbl> <dbl> <dbl> <dbl>
## 1 Race   0.619 0.202 0.442 0.563
## 2 Gender 0.494 0.311 0.362 0.375
table2_label %>% 
  gt() %>% 
  tab_header(
    title = "Table 2. Race and gender implicit bias levels") %>% #title for table 
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% #creates source note in the table footer
  fmt_number(columns = vars(mean1, mean2,  SD1, SD2), decimals = 2) %>% #formats values to 2 decimal places
      #vars() function used to select variables to input into the columns
  tab_spanner(
    label = "Baseline", #Spanner column label 
    columns = c(mean1, SD1) #selects variables "mean1" and "SD1" for the 2 columns underneath the spanner column label "Baseline"
  ) %>% 
  tab_spanner(
    label = "Prenap", #Spanner column label 
    columns = c(mean2, SD2) #selects variables "mean2" and "SD2" for the 2 columns underneath the spanner column label "Prenap"
  ) %>% 
  cols_label(mean1 = "Mean", mean2 = "Mean", SD1 = "SD", SD2 = "SD", label = " ") #changes the column labels for each one
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
Table 2. Race and gender implicit bias levels

            Baseline         Prenap
           Mean     SD     Mean     SD
Race       0.62   0.44     0.20   0.56
Gender     0.49   0.36     0.31   0.37

Implicit bias values are the average D600 score for each timepoint
#originalvariablename = "New column name"


print(table2_label)
## # A tibble: 2 x 5
##   label  mean1 mean2   SD1   SD2
##   <chr>  <dbl> <dbl> <dbl> <dbl>
## 1 Race   0.619 0.202 0.442 0.563
## 2 Gender 0.494 0.311 0.362 0.375
#Everything looks perfect!
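
The values in table2_label were typed in by hand. A possible alternative, sketched here as an assumption rather than what we did in the meeting, is to reshape the one-row summary with tidyr. Because cleandata is not available in this log, the sketch rebuilds implicitbiaslevels from the values recorded above; the names table2_wide, time, bias and stat are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Stand-in for implicitbiaslevels, using the values recorded in this log
implicitbiaslevels <- tibble(
  base_IAT_race_mean = 0.6186929, base_IAT_race_sd = 0.4423884,
  base_IAT_gen_mean  = 0.4943818, base_IAT_gen_sd  = 0.36228,
  pre_IAT_race_mean  = 0.2023364, pre_IAT_race_sd  = 0.5633004,
  pre_IAT_gen_mean   = 0.3109984, pre_IAT_gen_sd   = 0.3748071
)

table2_wide <- implicitbiaslevels %>%
  # Split names such as "base_IAT_race_mean" into timepoint, bias type and statistic
  pivot_longer(
    everything(),
    names_to = c("time", "bias", "stat"),
    names_pattern = "(.*)_IAT_(.*)_(.*)"
  ) %>%
  # One row per bias type, one column per timepoint and statistic
  pivot_wider(names_from = c(time, stat), values_from = value)

table2_wide
```

This gives one row for "race" and one for "gen", with columns base_mean, base_sd, pre_mean and pre_sd, which could then be passed to gt() in the same way as table2_label.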

Table 3 attempt

Full code of my attempt for Table 3 is linked here.

Table 3 was pretty similar to Table 2, in terms of calculations and table formatting, so there weren't major issues.

We first loaded the packages and loaded the data (excluding the excluded participants):

library(tidyverse)
library(dplyr)
library(gt)
library(glue)
library(haven)
replicationdata <- read.sav("Humiston & Wamsley 2019 data.sav")
cleandata <- replicationdata %>%    
  filter(exclude == "no")

Then, we calculated all the means and SDs for the cued and uncued IATs. I first tried calculating the means and SDs for the cued IATs only; after checking that it worked, I decided to try calculating the cued and uncued conditions together.

First, I tried the cued condition only:

cued <- cleandata %>% #for cued condition only 
  select(baseIATcued, preIATcued, postIATcued, weekIATcued)
 #Selecting these variables from the "cleandata" dataset to create new data "cued"

#now I'm going to calculate the mean and sd of all of these variables:

cued <- cued %>% 
  summarise(across(contains("IAT"), list(mean = mean, sd = sd))) #captures and calculates the means and SDs of each variable that contains "IAT" within its variable name. Lists the calculated means and sds for each variable under the label "mean" and "sd". 

#now the data "cued" has the means and SDs for each of the variables previously selected, instead of having to separately code for each mean and SD needed and then having to create a new dataframe. 

print(cued)
##   baseIATcued_mean baseIATcued_sd preIATcued_mean preIATcued_sd
## 1        0.5175814      0.3631716       0.2108864     0.5140243
##   postIATcued_mean postIATcued_sd weekIATcued_mean weekIATcued_sd
## 1        0.3068241      0.4445511        0.3999553      0.3871947

Then tried to see if I could calculate cued and uncued conditions together:

cued_uncued <- cleandata %>% #for cued and uncued conditions  
  select(baseIATcued, preIATcued, postIATcued, weekIATcued,
         baseIATuncued, preIATuncued, postIATuncued, weekIATuncued)
 #Selecting these variables from the "cleandata" dataset to create new data "cued_uncued"

#now I'm going to calculate the mean and sd of all of these variables:

cued_uncued <- cued_uncued %>% 
  summarise(across(contains("IAT"), list(mean = mean, sd = sd))) #captures and calculates the means and SDs of each variable that contains "IAT" within its variable name. Lists the calculated means and sds for each variable under the label "mean" and "sd". 

#now the data "cued_uncued" has the means and SDs for each of the variables previously selected, instead of having to separately code for each mean and SD needed and then having to create a new dataframe. 

print(cued_uncued)
##   baseIATcued_mean baseIATcued_sd preIATcued_mean preIATcued_sd
## 1        0.5175814      0.3631716       0.2108864     0.5140243
##   postIATcued_mean postIATcued_sd weekIATcued_mean weekIATcued_sd
## 1        0.3068241      0.4445511        0.3999553      0.3871947
##   baseIATuncued_mean baseIATuncued_sd preIATuncued_mean preIATuncued_sd
## 1          0.5954932        0.4471114         0.3024484       0.4419679
##   postIATuncued_mean postIATuncued_sd weekIATuncued_mean weekIATuncued_sd
## 1           0.248543        0.4776407          0.3988819        0.4670664

After calculating all the means and SDs for the cued and uncued conditions, I had to create a new dataframe with calculated values from "cued_uncued":

table3 <- tibble(
  label = c("Baseline", "Prenap", "Postnap", "1-week delay"),
  mean1 = c(0.518, 0.211, 0.307, 0.4),
  mean2 = c(0.595, 0.302, 0.249, 0.399),
  SD1 = c(0.363, 0.514, 0.445, 0.387),
  SD2 = c(0.447, 0.442, 0.478, 0.467)
)

print(table3)
## # A tibble: 4 x 5
##   label        mean1 mean2   SD1   SD2
##   <chr>        <dbl> <dbl> <dbl> <dbl>
## 1 Baseline     0.518 0.595 0.363 0.447
## 2 Prenap       0.211 0.302 0.514 0.442
## 3 Postnap      0.307 0.249 0.445 0.478
## 4 1-week delay 0.4   0.399 0.387 0.467

I created a table using the gt package and the table3 dataframe from above:

table3 %>% 
  gt() %>% 
  tab_header(
    title = "Table 3. Implicit bias levels by condition") %>% #title for table 
  tab_source_note("Implicit bias values are the average D600 score for each timepoint") %>% 
  #creates source note in the table footer
  fmt_number(columns = vars(mean1, mean2,  SD1, SD2), decimals = 2) %>% #formats values to 2 decimal places
      #vars() function used to select variables to input into the columns
  tab_spanner( 
    label = "Cued", #Spanner column label 
    columns = c(mean1, SD1) #selects variables "mean1" and "SD1" for the 2 columns underneath the spanner column label "Cued"
    ) %>% 
  tab_spanner(
    label = "Uncued", #Spanner column label 
    columns = c(mean2, SD2) #selects variables "mean2" and "SD2" for the 2 columns underneath the spanner column label "Uncued"
  ) %>% 
  cols_label(mean1 = "Mean", mean2 = "Mean", SD1 = "SD", SD2 = "SD", label = " ") #changes the column labels for each one
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
Table 3. Implicit bias levels by condition

                   Cued            Uncued
                Mean     SD     Mean     SD
Baseline        0.52   0.36     0.59   0.45
Prenap          0.21   0.51     0.30   0.44
Postnap         0.31   0.45     0.25   0.48
1-week delay    0.40   0.39     0.40   0.47

Implicit bias values are the average D600 score for each timepoint
#originalvariablename = "New column name"
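
One thing to watch in the output above: the uncued Baseline mean shows as 0.59, although the value calculated in cued_uncued is 0.5954932. This looks like a side effect of rounding twice, since I typed the value into table3 as 0.595 and fmt_number() then formatted it again to 2 decimal places. A small sketch of formatting the full-precision value once (the variable names are just for illustration):

```r
# Full-precision mean for baseIATuncued, copied from the cued_uncued output above
uncued_baseline_mean <- 0.5954932

# Formatting the full-precision value a single time
formatted_once <- sprintf("%.2f", uncued_baseline_mean)
formatted_once

# round() on the full-precision value agrees
rounded_once <- round(uncued_baseline_mean, 2)
rounded_once
```

To avoid this, the dataframe could be filled with the unrounded values (for example cued_uncued$baseIATuncued_mean) so that the only rounding happens inside fmt_number().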

Table 4 attempt

Full code for my attempt at Table 4 is linked here. The table we are attempting to recreate is:

We first loaded the packages and loaded the data (excluding the excluded participants):

library(tidyverse)
library(dplyr)
library(gt)
library(glue)
library(haven)
replicationdata <- read.sav("Humiston & Wamsley 2019 data.sav")
cleandata <- replicationdata %>%    
  filter(exclude == "no")

Then, we calculated the data needed. I first calculated the number of people who said no and maybe in the verbal report using:

  • select() to select the variable from my "cleandata" dataset, and
  • tally() to count the number of "no" / "maybe" responses in the selected variable.
  • glimpse() was used to check whether the calculated value matches the value seen in the original paper

report_no <- cleandata %>% 
  select(heard_cue_report) %>% 
  tally(heard_cue_report == "no")

report_maybe <- cleandata %>% 
  select(heard_cue_report) %>% 
  tally(heard_cue_report == "maybe, unsure, unclear")

glimpse(report_maybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(report_no)
## Rows: 1
## Columns: 1
## $ n <int> 28

I then tried to add the two calculated variables together using sum() but it kept giving me the error: "Error: Must subset columns with a valid subscript vector. x Subscript has the wrong type data.frame<n:integer>. ℹ It must be numeric or character.".

report <- cleandata %>% 
  select(report_no, report_maybe) %>% 
  sum(c(report_no, report_maybe))
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(report_no)` instead of `report_no` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## Error: Must subset columns with a valid subscript vector.
## x Subscript has the wrong type `data.frame<n:integer>`.
## ℹ It must be numeric or character.

Because of that error, I thought the variables were integers so I tried to convert them to numeric.

#I thought that the variables were integers so tried converting to numeric 
#tried converting integer into a numeric value 
report_no <- as.numeric(as.integer(report_no))
report_maybe <- as.numeric(as.integer(report_maybe))
#check that the above variables are the right type
typeof(report_maybe)
## [1] "double"
typeof(report_no)
## [1] "double"
#the above are outputting as "double"
#According to google, a "double" output means it's a numeric value 

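I tried using the unlist() function because I read online that the error might be because the code is only temporary, and that the unlist() function makes the code permanent (as far as I can tell, what unlist() actually does is flatten a list or data frame into a plain vector):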

report_maybe <- as.numeric(unlist(report_maybe))
report_no <- as.numeric(unlist(report_no))
typeof(report_maybe) #checked to see if correct
## [1] "double"

Now, I tried again, hoping the error doesn't pop up, but now it's giving a different error

report_total <- cleandata %>% 
  select(report_no, report_maybe) %>% 
      sum(report_no, report_maybe)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(report_maybe)` instead of `report_maybe` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## Error in FUN(X[[i]], ...): only defined on a data frame with all numeric-alike variables
#Gives error: Error in FUN(X[[i]], ...) : only defined on a data frame with all numeric-alike variables

I then decided to just manually add the data. It gave me the correct value, so I guess this method works?

report_total <- c(2, 28) #add values of report_maybe and report_no to create report_total 
sum(report_total)
## [1] 30
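
Looking back, the errors above probably come from the fact that report_no and report_maybe are separate one-cell data frames made by tally(), not columns of cleandata, so select() cannot find them there. A minimal sketch of adding them without typing the numbers by hand, using a hypothetical stand-in for cleandata (the name toy is made up; the counts match the ones in this log):

```r
library(dplyr)

# Hypothetical stand-in for cleandata with the counts reported in this log
toy <- tibble(
  heard_cue_report = c(rep("no", 28), rep("maybe, unsure, unclear", 2))
)

report_no <- toy %>%
  tally(heard_cue_report == "no")

report_maybe <- toy %>%
  tally(heard_cue_report == "maybe, unsure, unclear")

# Each result is a data frame with a single column called n
class(report_no)

# pull() extracts the n column as a plain number, which can then be added
report_total <- pull(report_no, n) + pull(report_maybe, n)
report_total
```

Writing report_no$n + report_maybe$n would do the same thing as the two pull() calls.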

Now, I had to calculate the data for the exit questionnaire. I kept getting an incorrect value of 29 instead of 28, and finally realised that I had to exclude a participant because of their incomplete response.

exit_no <- cleandata %>% 
  select(heard_cue_exit) %>% 
  tally(heard_cue_exit == "no", exclude = participantID$ub46)
## Error in tally(., heard_cue_exit == "no", exclude = participantID$ub46): unused argument (exclude = participantID$ub46)
exit_maybe <- cleandata %>% 
  select(heard_cue_exit) %>% 
  tally(heard_cue_exit == "unsure")

glimpse(exit_no)
## Error in glimpse(exit_no): object 'exit_no' not found
glimpse(exit_maybe) #exit_no was giving an incorrect value of 29, instead of 28. 
## Rows: 1
## Columns: 1
## $ n <int> 2
exit_total <- c(0, 29) #add values of exit_maybe and exit_no to create exit_total 
sum(exit_total)
## [1] 29

Thus, I had to start over from the beginning and create a new dataframe without the participant who had incomplete response:

soundcuereporting <- cleandata %>% 
  select(heard_cue_exit, heard_cue_report) %>% 
  na.omit() #omitted row with incomplete response

I recalculated table 4 data without omitted participant:

report_no <- soundcuereporting %>% 
  select(heard_cue_report) %>% 
  tally(heard_cue_report == "no")

report_maybe <- soundcuereporting %>% 
  select(heard_cue_report) %>% 
  tally(heard_cue_report == "maybe, unsure, unclear")

glimpse(report_maybe)
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(report_no)
## Rows: 1
## Columns: 1
## $ n <int> 28
exit_no <- soundcuereporting %>% 
  select(heard_cue_exit) %>% 
  tally(heard_cue_exit == "no")

exit_maybe <- soundcuereporting %>% 
  select(heard_cue_exit) %>% 
  tally(heard_cue_exit == "unsure")

glimpse(exit_no)
## Rows: 1
## Columns: 1
## $ n <int> 28
glimpse(exit_maybe) #exit_no above now gives the correct value of 28; exit_maybe gives 2. 
## Rows: 1
## Columns: 1
## $ n <int> 2

I then realised that the values I had calculated so far were actually for the "Total" columns and not for the cells in the middle of the table. Thus, this code below calculates for the cells in the middle of the table:

exitno_reportno <- soundcuereporting %>% 
  filter(heard_cue_exit == "no", heard_cue_report == "no") %>% #only considers participants with "no" response in both exit questionnaire and verbal report
  summarise(n=n()) #calculates the number of rows (participants) left after filtering

exitno_reportmaybe <- soundcuereporting %>% 
  filter(heard_cue_exit == "no", heard_cue_report == "maybe, unsure, unclear") %>% 
  summarise(n=n())

exitmaybe_reportno <- soundcuereporting %>% 
  filter(heard_cue_exit == "unsure", heard_cue_report == "no") %>% 
  summarise(n=n())

exitmaybe_reportmaybe <- soundcuereporting %>% 
  filter(heard_cue_exit == "unsure", heard_cue_report == "maybe, unsure, unclear") %>% 
  summarise(n=n())

glimpse(exitno_reportno)
## Rows: 1
## Columns: 1
## $ n <int> 26
glimpse(exitno_reportmaybe) 
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(exitmaybe_reportno) 
## Rows: 1
## Columns: 1
## $ n <int> 2
glimpse(exitmaybe_reportmaybe) #checked to see if values are correct, and they are ! yay!
## Rows: 1
## Columns: 1
## $ n <int> 0

I calculated the total participants overall:

soundcuereporting_total <- soundcuereporting %>% 
  summarise(n=n())
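
The four filter() blocks and the totals above could probably be produced in one step with count() or a cross-tabulation. This is a sketch on a hypothetical stand-in for soundcuereporting (the name toy is made up), built so that the cell counts match the ones calculated above:

```r
library(dplyr)

# Hypothetical stand-in for soundcuereporting with the same cell counts as above
toy <- tibble(
  heard_cue_exit   = c(rep("no", 28), rep("unsure", 2)),
  heard_cue_report = c(rep("no", 26), rep("maybe, unsure, unclear", 2), rep("no", 2))
)

# One row per combination that occurs in the data (combinations with 0 people are left out)
cell_counts <- toy %>%
  count(heard_cue_exit, heard_cue_report)
cell_counts

# Base R cross-tabulation; addmargins() adds the row and column totals as "Sum"
crosstab <- addmargins(table(toy$heard_cue_exit, toy$heard_cue_report))
crosstab
```

The cross-tabulation keeps the empty cell (unsure on both reports) as 0 and includes the overall total of 30, so the values for table4 can be read straight from it.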

Then I created a new dataframe "table4" using calculated data from above:

table4 <- tibble(
  label = c("No", "Maybe", "Total"),
  no = c(26, 2, 28),
  maybe = c(2, 0, 2),
  total = c(28, 2, 30)
)

print(table4)
## # A tibble: 3 x 4
##   label    no maybe total
##   <chr> <dbl> <dbl> <dbl>
## 1 No       26     2    28
## 2 Maybe     2     0     2
## 3 Total    28     2    30

I created a table using the gt() package, but the formatting came up weird so I need to change it.

table4 %>% 
  gt() %>% 
  tab_header(
    title = "Table 4. Sound cue reporting. ") %>% #title for table 
  tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% #source note for table 
  fmt_number(columns = vars(no, maybe,  total)) %>%  #vars() function used to select variables to input into the columns
  tab_spanner( 
    label = "Reported Hearing Cue on Verbal Report?", #Spanner column label 
    columns = c(no, maybe, total) #selects variables "no", "maybe" and "total" for the 3 columns underneath the spanner column label "Reported Hearing Cue on Verbal Report?"
    ) %>% 
  tab_row_group(
    label = "Reported Hearing Cue on Exit Questionnaire?", # Row group label
    rows = 3 # selects row number 3 only (rows = 1:3 would put all three rows in this group)
  )
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
Table 4. Sound cue reporting.

label      Reported Hearing Cue on Verbal Report?
              no    maybe    total
Reported Hearing Cue on Exit Questionnaire?
Total      28.00     2.00    30.00
No         26.00     2.00    28.00
Maybe       2.00     0.00     2.00

Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.

I tried to change up the formatting by adding column labels, but it didn't seem to change anything. I thought it might be an issue with the dataframe used, so I decided to make a new dataframe.

table4 %>% 
  gt() %>% 
  tab_header(
    title = "Table 4. Sound cue reporting. ") %>% 
  tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% 
  fmt_number(columns = vars(no, maybe,  total)) %>% 
  tab_spanner( 
    label = "Reported Hearing Cue on Verbal Report?", #Spanner column label 
    columns = c(no, maybe, total)
    ) %>% 
  tab_row_group(
    label = "Reported Hearing Cue on Exit Questionnaire?",
    rows = 3) %>% 
  cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead

## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
Table 4. Sound cue reporting.
         Reported Hearing Cue on Verbal Report?
         No       Maybe    Total
Reported Hearing Cue on Exit Questionnaire?
Total    28.00    2.00     30.00
No       26.00    2.00     28.00
Maybe    2.00     0.00     2.00
Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.
#formatting is very strange, need to change

I created a new dataframe using calculated data from above, but added labels to it.

table4_ <- tibble(
  label = c("label", "No", "Maybe", "Total"),
  no = c("No", 26, 2, 28),
  maybe = c("Maybe", 2, 0, 2),
  total = c("Total", 28, 2, 30)
)

print(table4_)
## # A tibble: 4 x 4
##   label no    maybe total
##   <chr> <chr> <chr> <chr>
## 1 label No    Maybe Total
## 2 No    26    2     28   
## 3 Maybe 2     0     2    
## 4 Total 28    2     30

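I used the same code to create the table, but it kept giving me an error: "The fmt_number() function can only be used on columns with numeric data". This is very likely because of the text labels I added to the new dataframe: putting a string such as "No" in the same c() call as the numbers turns the whole column into character data (the <chr> columns in the printout above).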

table4_ %>% 
  gt() %>% 
  tab_header(
    title = "Table 4. Sound cue reporting. ") %>% 
  tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% 
  fmt_number(columns = vars(no, maybe,  total)) %>% 
  tab_spanner( 
    label = "Reported Hearing Cue on Verbal Report?", #Spanner column label 
    columns = c(no, maybe, total)
    ) %>% 
  tab_row_group(
    label = "Reported Hearing Cue on Exit Questionnaire?",
    rows = 3
  ) %>% 
  cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
## Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
## * please use `columns = c(...)` instead
## Error: The `fmt_number()` function can only be used on `columns` with numeric data
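
A quick way to check the likely cause of this error, sketched in base R. The object names mixed and counts are made up for illustration and are not part of the replication data. Combining text and numbers in one c() call coerces everything to character, which would explain why fmt_number() refuses the column.

```r
# Hypothetical vectors for illustration only
mixed  <- c("No", 26, 2, 28)   # text and numbers in one vector
counts <- c(26, 2, 28)         # numbers only

class(mixed)    # "character": every element was coerced to text
class(counts)   # "numeric"

# fmt_number() needs numeric columns, so the header text is better kept
# out of the data and added at display time (e.g. with cols_label() in gt)
is.numeric(mixed)    # FALSE
is.numeric(counts)   # TRUE
```

This suggests keeping the original numeric dataframe and handling the labels inside the gt code instead.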

I decided to go back to the original dataframe, but changed the tab_row_group() function to tab_stubhead(), hoping that it would fix the odd ordering of the rows. I also swapped vars() for c() inside fmt_number(), which got rid of the deprecation warnings. The row order is now fixed and the table looks similar to the one in the original paper, but the stubhead label "Reported Hearing Cue on Exit Questionnaire?" is not appearing.

table4 %>% 
  gt() %>% 
  tab_header(
    title = "Table 4. Sound cue reporting. ") %>% 
  tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% 
  fmt_number(columns = c(no, maybe,  total)) %>% 
  tab_spanner( 
    label = "Reported Hearing Cue on Verbal Report?", #Spanner column label 
    columns = c(no, maybe, total)
    ) %>% 
  tab_stubhead(
    label = "Reported Hearing Cue on Exit Questionnaire?"
  ) %>% 
  cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
Table 4. Sound cue reporting.
         Reported Hearing Cue on Verbal Report?
         No       Maybe    Total
No       26.00    2.00     28.00
Maybe    2.00     0.00     2.00
Total    28.00    2.00     30.00
Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.
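
Looking back at the earlier attempts, a likely reason for the odd row order is that rows = 3 in tab_row_group() picks out row number 3 (the "Total" row) rather than setting how many rows the group holds, and gt displayed that group ahead of the ungrouped rows. Below is a minimal sketch of an alternative, assuming gt 0.3.0 or later. table4_demo and grouped_table are made-up names, and the values are copied from the printout above. It places all three rows in the group, so they should keep their original order.

```r
library(dplyr)   # loaded here for the %>% pipe
library(tibble)
library(gt)

# Stand-in for table4 (hypothetical name; values copied from the printout above)
table4_demo <- tibble(
  label = c("No", "Maybe", "Total"),
  no    = c(26, 2, 28),
  maybe = c(2, 0, 2),
  total = c(28, 2, 30)
)

grouped_table <- table4_demo %>%
  gt() %>%
  fmt_number(columns = c(no, maybe, total), decimals = 0) %>%
  tab_row_group(
    label = "Reported Hearing Cue on Exit Questionnaire?",
    rows = 1:3  # row indices 1 to 3, so every row goes into the group
  )

grouped_table
```

Whether a row group or a stubhead label looks closer to the original paper is a judgment call; the sketch only shows what the rows argument selects.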

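I added rowname_col = "label" to gt() and the stubhead label appeared! This works because rowname_col turns the label column into the table stub, and the stubhead label only shows when the table has a stub.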

table4 %>% 
  gt(rowname_col = "label") %>% 
  tab_header(
    title = "Table 4. Sound cue reporting. ") %>% 
  tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% 
  fmt_number(columns = c(no, maybe,  total)) %>% 
  tab_spanner( 
    label = "Reported Hearing Cue on Verbal Report?", #Spanner column label 
    columns = c(no, maybe, total)
    ) %>% 
  tab_stubhead(label = "Reported Hearing Cue on Exit Questionnaire?"
  ) %>% 
  cols_label(no = "No", maybe = "Maybe", total = "Total", label = " ")
Table 4. Sound cue reporting.
Reported Hearing Cue on Exit Questionnaire?    Reported Hearing Cue on Verbal Report?
                                               No       Maybe    Total
No                                             26.00    2.00     28.00
Maybe                                          2.00     0.00     2.00
Total                                          28.00    2.00     30.00
Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.

I then used the md() function to style the headings with Markdown (e.g. bold, italics); this changes how the text looks but not the layout of the table. I also added decimals = 0 to the fmt_number() call so that no decimal places are shown.

table4 %>% 
  gt(rowname_col = "label") %>% 
  tab_header(
    title = md("**Table 4. Sound cue reporting.**")) %>% 
  tab_source_note("Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.") %>% 
  fmt_number(columns = c(no, maybe,  total), decimals = 0) %>% 
  tab_spanner( 
    label = md("**Reported Hearing Cue on Verbal Report?**"), #Spanner column label 
    columns = c(no, maybe, total)
    ) %>% 
  tab_stubhead(label = md("**Reported Hearing Cue on Exit Questionnaire?**")
  ) %>% 
  cols_label(no = md("**No**"), maybe = md("**Maybe**"), total = md("***Total***"), label = " ")
Table 4. Sound cue reporting.
Reported Hearing Cue on Exit Questionnaire?    Reported Hearing Cue on Verbal Report?
                                               No       Maybe    Total
No                                             26       2        28
Maybe                                          2        0        2
Total                                          28       2        30
Participants’ responses to the postnap verbal inquiry and to the exit questionnaire. A response was not recorded for n = 1 participant; this participant reported that they did not hear the sound cue on the final exit questionnaire.

Challenges and successes

I faced many challenges this week, which I have included in this learning log. However, working as a group has definitely helped and cut down the time spent working through each problem. I'm glad that we've finished reproducing all the tables and now we only have to do the figures (and there are only 3)!

Next steps

  • Getting started on the first figure (since it will definitely be completely different to what the tables have been like)
  • Rework my code so that my explanations aren't within my chunks of code, but are above them (similar to the exercise we did in class this week).