Week 4: Learning Log

This week’s goals

Meet my group after the Q&A session to make a start on reproducing the descriptive statistics
Work together to reproduce the first table
At the end of the meeting, set individual goals to reproduce the first table independently, so we can help each other with the parts of the code we each struggled with

Table 1:

My progress this week

The data from the journal was provided online as an SPSS file, so we had to download it and then find a way to import it into RStudio. I tried to export the file as a csv so I could use read_csv("data"), but I couldn’t find where the csv file was saved on my computer and therefore had to use another method. With help from my groupmate, I ended up installing a package from GitHub that lets me read SPSS files directly into RStudio.

#Install package that enables spss data to be directly read into R

library(remotes)
install_github("JanMarvin/readspss")

## Skipping install of 'readspss' from a github remote, the SHA1 (bbc71e6b) has not changed since last install.
##   Use `force = TRUE` to force installation

Next I loaded relevant packages

library(readspss)  #for importing SPSS data into R
library(tidyverse) #for data wrangling and visualisation

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

#read in the data

originaldata <- read.sav("Humiston & Wamsley 2019 data.sav")
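A possible alternative to installing from GitHub is the haven package, which is normally installed together with the tidyverse and has a read_sav() function. The sketch below is self-contained: it writes a made-up data frame (toy, not the study data) to a temporary .sav file and reads it back, so the file name and values are assumptions for illustration only.

```r
library(haven)  # usually installed with the tidyverse, but loaded separately

# Hypothetical toy data, written to a temporary .sav file so the example runs anywhere
toy <- data.frame(id = c("p1", "p2", "p3"), age = c(19, 21, 20))
sav_path <- tempfile(fileext = ".sav")
write_sav(toy, sav_path)

# read_sav() imports an SPSS file; as_factor() converts any labelled columns to factors
toy_back <- as_factor(read_sav(sav_path))
print(toy_back)
```

For the real file, the same call would take the path to the downloaded .sav file instead of sav_path.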

Look at the variables from the data

glimpse(originaldata)

## Rows: 69
## Columns: 101
## $ ParticipantID          <chr> "ub3", "ub6", "ub7", "ub8", "ub9", "ub10", "ub1~
## $ exclude                <fct> yes, no, no, no, no, yes, no, no, no, no, yes, ~
## $ cue_presented          <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, ye~
## $ heard_cue_report       <fct> "no", "no", "no", "no", "no", "no", "no", "no",~
## $ heard_cue_exit         <fct> no, no, unsure, no, no, no, no, no, no, no, no,~
## $ predicted_cue          <fct> no, no, no, no, suspected, no, no, no, no, no, ~
## $ Cue_condition          <fct> gender cue played, race cue played, race cue pl~
## $ Counterbias_order      <fct> racial training first, racial training first, g~
## $ Sound_assignment       <fct> machR and descG, machR and descG, machG and des~
## $ IAT1_order             <fct> EATF-SATF, EATF-SATF, SATF-EATF, EATF-SATF, SAT~
## $ IAT234_order           <fct> SATS-EATS, SATS-EATS, EATS-SATS, SATS-EATS, EAT~
## $ IAT_order              <fct> "ES, SESESE", "ES, SESESE", "SE, ESESES", "ES, ~
## $ compensation           <fct> cash, cash, cash, cash, cash, cash, course cred~
## $ General_1_Age          <dbl> 19, 21, 21, 20, 21, 19, 19, 20, 18, 18, 18, 18,~
## $ General_1_Sex          <fct> Female, Female, Female, Female, Male, Male, Fem~
## $ General_1_Race         <fct> Non-White, White, White, White, White, Non-Whit~
## $ General_1_English      <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes~
## $ General_1_EnglishYrs   <dbl> NA, NA, NA, NA, NA, NA, 12, NA, NA, NA, NA, NA,~
## $ General_1_Caffeine     <fct> No, Yes, Yes, No, No, No, No, No, No, Yes, No, ~
## $ General_1_CaffCups     <fct> NA, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA~
## $ General_1_CaffHrsAgo   <dbl> NA, 2.0, 3.0, NA, NA, NA, NA, NA, NA, 5.5, NA, ~
## $ General_1_SleepDisor   <fct> No, No, No, No, Yes, No, No, No, No, No, No, No~
## $ General_1_MentalDiso   <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ General_1_Meds         <fct> Yes, No, Yes, No, No, No, Yes, Yes, No, No, No,~
## $ General_1_MedList      <chr> "DepoProvera, 200mg, once every 3 months", "", ~
## $ General_1_University   <chr> "Furman University", "Furman University", "Furm~
## $ General_1_UniYears     <fct> 2, 3, 3, 2, 3, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0, 1,~
## $ Demo_1_Ethnic          <fct> Not Hispanic or Latino, Not Hispanic or Latino,~
## $ Demo_1_Racial          <fct> Black or African American, White, White, White,~
## $ Demo_1_Gender          <fct> Female, Female, Female, Female, Male, Male, Fem~
## $ Demo_1_NonParticipat   <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, I choos~
## $ Epworth_1_Read         <fct> moderate chance of dozing, slight chance of doz~
## $ Epworth_1_TV           <fct> high chance of dozing, slight chance of dozing,~
## $ Epworth_1_Public       <fct> slight chance of dozing, slight chance of dozin~
## $ Epworth_1_Passenger    <fct> slight chance of dozing, moderate chance of doz~
## $ Epworth_1_LyingDown    <fct> high chance of dozing, high chance of dozing, s~
## $ Epworth_1_Talking      <fct> no chance of dozing, no chance of dozing, no ch~
## $ Epworth_1_Lunch        <fct> slight chance of dozing, no chance of dozing, n~
## $ Epworth_1_Traffic      <fct> no chance of dozing, no chance of dozing, no ch~
## $ Epworth_total          <dbl> 19, 16, 12, 13, 10, 21, 16, 12, 20, 15, 20, 16,~
## $ AlertTest_1_Concentr_1 <dbl> 80, 80, 60, 60, 70, 100, 70, 40, 80, 80, 60, 80~
## $ AlertTest_1_Refresh_1  <dbl> 90, 60, 70, 60, 30, 100, 60, 40, 80, 60, 30, 40~
## $ AlertTest_1_Feel       <fct> "3 - Awake, but relaxed; responsive but not ful~
## $ AlertTest_2_Concentr_1 <dbl> NA, 70, 60, 40, 60, 80, 80, 40, 80, 80, 80, 60,~
## $ AlertTest_2_Refresh_1  <dbl> NA, 70, 60, 30, 30, 80, 60, 40, 70, 60, 40, 40,~
## $ AlertTest_2_Feel       <fct> NA, "3 - Awake, but relaxed; responsive but not~
## $ AlertTest_3_Concentr_1 <dbl> 90, NA, 60, 40, 80, 100, NA, 60, 100, 80, 80, 7~
## $ AlertTest_3_Refresh_1  <dbl> 80, NA, 70, 50, 70, 100, NA, 80, 100, 90, 70, 9~
## $ AlertTest_3_Feel       <fct> "2 - Functioning at high levels, but not at pea~
## $ AlertTest_4_Concentr_1 <dbl> 80, 80, 60, 40, NA, 90, NA, 70, 90, 100, 90, 70~
## $ AlertTest_4_Refresh_1  <dbl> 80, 90, 50, 30, NA, 80, NA, 80, 90, 80, 90, 60,~
## $ AlertTest_4_Feel       <fct> "2 - Functioning at high levels, but not at pea~
## $ S1_ExitQ_1_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_1_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_2_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_3_sound       <fct> NA, No, No, No, No, No, No, No, No, No, Yes, No~
## $ S1_ExitQ_4_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_4_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_5_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_5_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_1_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_1_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_2_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_3_sound       <fct> NA, No, No, No, No, No, No, No, No, No, Yes, No~
## $ S2_ExitQ_4_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_4_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_5_sound       <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_5_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ Total_sleep            <dbl> 64, 65, 66, 80, 62, 84, 51, 81, 81, 67, 82, 79,~
## $ Wake_amount            <dbl> 26, 25, 24, 10, 28, 6, 39, 9, 9, 23, 8, 11, 22,~
## $ NREM1_amount           <dbl> 19, 10, 9, 5, 5, 4, 11, 4, 3, 4, 5, 4, 9, 2, 2,~
## $ NREM2_amount           <dbl> 29.0, 20.0, 52.0, 15.0, 15.5, 51.0, 22.0, 23.0,~
## $ SWS_amount             <dbl> 8, 12, 19, 24, 24, 22, 16, 36, 37, 4, 18, 23, 6~
## $ REM_amount             <dbl> 9, 23, 0, 17, 17, 6, 2, 18, 9, 9, 11, 24, 10, 2~
## $ SWSxREM                <dbl> 72, 276, 0, 408, 408, 132, 32, 648, 333, 36, 19~
## $ cue_minutes            <dbl> 19.0, 9.5, 12.0, 15.5, 16.0, 29.0, 15.0, 25.0, ~
## $ baseIATcued            <dbl> 0.19620397, 0.57544182, 0.09911241, 0.20577365,~
## $ baseIATuncued          <dbl> -0.26535527, 0.60953653, 0.64396538, 1.52435622~
## $ preIATcued             <dbl> -0.34989445, 0.55905291, -0.13380639, 0.5107702~
## $ preIATuncued           <dbl> -0.4905672, 0.2146214, 0.3398503, 0.3799023, -0~
## $ postIATcued            <dbl> -0.192035676, 0.681910146, 0.044634805, -0.0025~
## $ postIATuncued          <dbl> -1.04192332, 0.46728694, -0.05686262, 0.6824358~
## $ weekIATcued            <dbl> -0.36134918, 0.20377367, 0.45873715, 0.39859469~
## $ weekIATuncued          <dbl> 0.38291394, 0.68277422, -0.01070460, 0.71187286~
## $ postnap_change_cued    <dbl> 0.1578588, 0.1228572, 0.1784412, -0.5133539, -0~
## $ postnap_change_uncued  <dbl> -0.55135615, 0.25266550, -0.39671291, 0.3025335~
## $ week_change_cued       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ week_change_uncued     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ diff_biaschange_cued   <dbl> 0.557553150, 0.371668148, -0.359624732, -0.1928~
## $ diff_biaschange_uncued <dbl> -0.64826921, -0.07323769, 0.65466998, 0.8124833~
## $ diff_biaschange        <dbl> 1.20582236, 0.44490584, -1.01429471, -1.0053044~
## $ base_IAT_race          <dbl> -0.26535527, 0.57544182, 0.09911241, 1.52435622~
## $ base_IAT_gen           <dbl> 0.19620397, 0.60953653, 0.64396538, 0.20577365,~
## $ pre_IAT_race           <dbl> -0.49056718, 0.55905291, -0.13380639, 0.3799023~
## $ pre_IAT_gen            <dbl> -0.34989445, 0.21462144, 0.33985028, 0.51077026~
## $ post_IAT_race          <dbl> -1.04192332, 0.68191015, 0.04463480, 0.68243589~
## $ week_IAT_race          <dbl> -0.19203568, 0.20377367, 0.45873715, 0.71187286~
## $ post_IAT_gen           <dbl> 0.382913945, 0.467286940, -0.056862624, -0.0025~
## $ week_IAT_gen           <dbl> -0.36134918, 0.68277422, -0.01070460, 0.3985946~
## $ `filter_$`             <fct> Not Selected, Selected, Selected, Selected, Sel~
## $ cues_total             <dbl> 285.0, 142.5, 180.0, 232.5, 240.0, 435.0, 225.0~

Then we removed the excluded participants from further analysis

#making sure we only keep the people whose data was used in the analysis 

cleandata <- originaldata %>% 
  filter(exclude == "no")
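A quick way to check that a filter like this kept the expected number of rows is to count the categories before filtering and compare. The sketch below uses a made-up tibble (toy) in place of originaldata; only the exclude column mirrors the real data.

```r
library(dplyr)

# Made-up stand-in for originaldata; only the exclude column matters here
toy <- tibble(
  ParticipantID = c("a1", "a2", "a3", "a4", "a5"),
  exclude = factor(c("yes", "no", "no", "yes", "no"))
)

# How many participants fall in each category before filtering?
print(count(toy, exclude))

# After filtering, the row count should equal the "no" count above
toy_clean <- toy %>% filter(exclude == "no")
print(nrow(toy_clean))
```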

Having a go at reproducing one part of the descriptives

#obtaining age mean and sd

ageaverage <- cleandata %>% 
  select(General_1_Age) %>% 
  summarise(ageaverage = mean(General_1_Age), agesd = sd(General_1_Age))

print(ageaverage)

##   ageaverage    agesd
## 1   19.54839 1.233929

I’m realising that there is probably a faster way to reproduce the first table than finding the mean and SD of each variable individually and then combining them in the table. So I’m going to try something different.

First I’m going to rename the relevant variables

cleandata <- cleandata %>% 
  rename(EES = Epworth_total, SSS = AlertTest_1_Feel, Age = General_1_Age, Sex = General_1_Sex, Cue_Played = Cue_condition)

Then I’m selecting all the relevant variables needed to reproduce table 1

#I'm trying to see if I can find the mean and sd of all of these variables at once, but I couldn't figure out the summarise part. I also wasn't sure it would work, since some variables need their mean and sd calculated differently to others

participant_charac <- cleandata %>% 
  select(Age, EES, SSS, base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen, post_IAT_race, post_IAT_gen, week_IAT_race, week_IAT_gen, Sex, Cue_Played)
  # summarise(mean = mean(Age), mean = mean(EES))?? this part is definitely wrong
  #maybe instead of select I need to use group_by? though I'm still a bit confused by that function
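To see what group_by() actually does, here is a small sketch on made-up data (the toy tibble and its values are hypothetical). It suggests that group_by() splits a summary by the levels of a variable rather than summarising several columns at once, so it is probably not the missing piece for the overall means in Table 1.

```r
library(dplyr)

# Made-up data: four participants, two per sex
toy <- tibble(
  Sex = c("Female", "Female", "Male", "Male"),
  Age = c(19, 21, 20, 22)
)

# Without group_by(): one row summarising everyone
overall <- toy %>%
  summarise(mean_age = mean(Age), sd_age = sd(Age))

# With group_by(): one row per level of Sex
by_sex <- toy %>%
  group_by(Sex) %>%
  summarise(mean_age = mean(Age), sd_age = sd(Age))

print(overall)
print(by_sex)
```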

So now we’re back to the slow method…

Finding the mean and SD of:

#Age:

Age <- participant_charac %>% 
  select(Age) %>% 
  summarise(mean = mean(Age), SD = sd(Age)) 

print(Age)

##       mean       SD
## 1 19.54839 1.233929

#EES

EES <- participant_charac %>% 
  select(EES) %>% 
  summarise(mean = mean(EES), SD = sd(EES)) 

print(EES)

##       mean       SD
## 1 15.29032 2.830707

#SSS

cleandata <- cleandata %>% 
  mutate(
    #as.numeric() on a factor returns the position of each level (1, 2, 3, ...);
    #it ignores extra arguments such as levels/labels, so they are dropped here.
    #This relies on the SSS levels being stored in order of their numeric labels.
    SSSvalue = as.numeric(SSS)
  )
SSS <- cleandata %>% 
  select(SSSvalue) %>% 
  summarise(SSSaverage = mean(SSSvalue),
            SSSsd = sd(SSSvalue))

print(SSS)

##   SSSaverage     SSSsd
## 1   2.806452 0.7491931
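One thing to watch with this conversion: as.numeric() on a factor gives the internal level codes, which only equal the scale scores if the levels are stored in the right order. The sketch below uses a made-up factor (sss_toy, with levels deliberately out of order) to show the difference, and extracts the leading number from the label text as a more defensive alternative.

```r
# Made-up factor whose levels are deliberately out of numeric order
sss_toy <- factor(
  c("3 - Awake, but relaxed", "1 - Wide awake", "3 - Awake, but relaxed"),
  levels = c("3 - Awake, but relaxed", "1 - Wide awake")
)

# as.numeric() returns the position of each level, not the number in the label
codes <- as.numeric(sss_toy)

# Pull the leading digits out of the label text and convert those instead
scores <- as.numeric(sub("^([0-9]+).*$", "\\1", as.character(sss_toy)))

print(codes)
print(scores)
```

If both versions give the same mean on the real data, the level order is fine and the simpler as.numeric() call is safe.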

#Baseline implicit bias

Baseline_IAT <- participant_charac %>% 
  select(base_IAT_race, base_IAT_gen) %>% 
  summarise(mean = mean(rbind(base_IAT_race, base_IAT_gen)), SD = sd(rbind(base_IAT_race,base_IAT_gen)))

print(Baseline_IAT)

##        mean        SD
## 1 0.5565373 0.4058619

#Prenap implicit bias

Prenap_IAT <- participant_charac %>% 
  select(pre_IAT_race, pre_IAT_gen) %>% 
  summarise(mean = mean(rbind(pre_IAT_race, pre_IAT_gen)), SD = sd(rbind(pre_IAT_race,pre_IAT_gen)))

print(Prenap_IAT)

##        mean        SD
## 1 0.2566674 0.4776418

#Postnap implicit bias

Postnap_IAT <- participant_charac %>% 
  select(post_IAT_race, post_IAT_gen) %>% 
  summarise(mean = mean(rbind(post_IAT_race, post_IAT_gen)), SD = sd(rbind(post_IAT_race,post_IAT_gen)))

print(Postnap_IAT)

##        mean        SD
## 1 0.2776836 0.4585372

#One week delay implicit bias

One_wk_delay <- participant_charac %>% 
  select(week_IAT_race, week_IAT_gen) %>% 
  summarise(mean = mean(rbind(week_IAT_race, week_IAT_gen)), SD = sd(rbind(week_IAT_race,week_IAT_gen)))

print(One_wk_delay)

##        mean        SD
## 1 0.3994186 0.4254629
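The rbind() calls above pool the race and gender scores into one long set of values before taking the mean and SD. Another reading of "average D600 score" would be to average the two scores within each participant first. The sketch below compares both on made-up scores (toy is hypothetical): the means agree when there is no missing data, but the SDs generally differ, so it is worth checking which one matches the published table.

```r
library(dplyr)

# Made-up D scores for three participants
toy <- tibble(
  base_IAT_race = c(0.2, 0.6, 0.4),
  base_IAT_gen  = c(0.4, 0.2, 0.6)
)

# Option A: pool all six scores, as the rbind() code does
pooled <- toy %>%
  summarise(mean = mean(c(base_IAT_race, base_IAT_gen)),
            SD   = sd(c(base_IAT_race, base_IAT_gen)))

# Option B: average the two scores within each participant first
per_person <- toy %>%
  mutate(base_IAT = (base_IAT_race + base_IAT_gen) / 2) %>%
  summarise(mean = mean(base_IAT), SD = sd(base_IAT))

print(pooled)      # six values go into the SD
print(per_person)  # three values go into the SD
```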

#Sex (% male)

Male <- participant_charac %>% 
  select(Sex) %>% 
  tally(Sex == "Male") 

Male_percentage <- Male / nrow(participant_charac) #divide by the number of included participants (31) rather than typing it in

print(Male_percentage)

##          n
## 1 0.483871

#Cue played (% racial cue)

cue <- participant_charac %>% 
  select(Cue_Played) %>% 
  tally(Cue_Played == "race cue played") 
  
racial_cue_percentage <- cue / nrow(participant_charac)

print(racial_cue_percentage)

##           n
## 1 0.5483871
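A shorter route to the same kind of number: the mean of a TRUE/FALSE vector is the proportion of TRUEs, so no separate tally and division is needed. A minimal sketch on made-up data (toy is hypothetical), multiplied by 100 to give a percentage:

```r
library(dplyr)

# Made-up data: two of five participants are male
toy <- tibble(Sex = c("Male", "Female", "Female", "Male", "Female"))

# mean() of a logical vector is the proportion of TRUE values
pct <- toy %>%
  summarise(percent_male = mean(Sex == "Male") * 100)

print(pct)
```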

Okay now that we have all the values, we have to find a way to produce a table

#Installing and loading the packages that create tables


#install.packages("gt")
#install.packages("kableExtra")

library(gt)

Recreating table 1

#adding the values to the table

table_1 <- tibble(
  Characteristics = c("Age (yrs)", "ESS", "SSS", "Baseline implicit bias", "Prenap implicit bias", "Postnap implicit bias", "One-week delay implicit bias", "Sex (% male)", "Cue played during nap (% racial cue)"),
  Mean = c(19.54839, 15.29032, 2.806452, 0.5565373, 0.2566674, 0.2776836, 0.3994186, 0.483871, 0.5483871),
  SD = c(1.233929, 2.830707, 0.7491931, 0.4058619, 0.4776418, 0.4585372, 0.4254629, NA, NA)
)


#formatting the table as close to the original as possible

table_1 %>% 
  gt() %>% 
  tab_header(title = md("**Table 1. Participant characteristics.**")) %>% 
  fmt_number(columns = c(Mean, SD), decimals = 2) %>% 
  tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint.")

Table 1. Participant characteristics.
Characteristics	Mean	SD
Age (yrs)	19.55	1.23
ESS	15.29	2.83
SSS	2.81	0.75
Baseline implicit bias	0.56	0.41
Prenap implicit bias	0.26	0.48
Postnap implicit bias	0.28	0.46
One-week delay implicit bias	0.40	0.43
Sex (% male)	0.48	NA
Cue played during nap (% racial cue)	0.55	NA
Implicit bias values are the average of D600 score for each timepoint.
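The last two rows are labelled as percentages but hold proportions (0.48 and 0.55). One possible way to display them as percentages is gt's fmt_percent(), which multiplies by 100 and adds a % sign. This is a sketch on made-up values (toy_table is hypothetical), with the rows picked by position.

```r
library(dplyr)
library(gt)

# Made-up values, with the second row stored as a proportion
toy_table <- tibble(
  Characteristics = c("Age (yrs)", "Sex (% male)"),
  Mean = c(19.5, 0.48),
  SD = c(1.2, NA)
)

toy_gt <- toy_table %>%
  gt() %>%
  # two decimals for the ordinary mean/SD row
  fmt_number(columns = c(Mean, SD), rows = 1, decimals = 2) %>%
  # show the proportion in row 2 as a whole-number percentage
  fmt_percent(columns = Mean, rows = 2, decimals = 0)

toy_gt
```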

Challenges

This week has been very challenging compared to previous weeks. Some of the challenges I had individually are documented in the previous sections, but there were a few that we struggled with as a group.
Firstly, importing the SPSS file into R was a challenge in and of itself; it took multiple tries and help from teammates to get past this hurdle.
Secondly, it took a while for our team to figure out what all the different variables in the original data meant and which of them corresponded to the ones in the table.
Following on from the last point, it took our team multiple tries to get the correct mean and SD for some of the values, especially SSS, as it was a combination of a couple of variables that also had missing data in various columns.
Personally, I thought there would be a quicker way to compute all the variables at once and feed the results straight into a table, rather than calculating each one separately and typing the numbers in manually, but I struggled a lot with that.

Successes

Our first group meeting was a success, with all team members contributing and helping each other out.
We weren’t able to complete the table during the meeting as it ran for a long time, but afterwards each team member worked on the table some more and shared it with the rest of the group. I had a lot of help from different team members, using different parts of their code, and I was able to improve and build on it.
There was a lot of googling, but I got there in the end, particularly with creating the table.
We got the first table done!

Next steps from here

I have a feeling that all this code could be made more concise. I think I need some help with this, but once we make sufficient progress on the other tables and figures, I’d like to revisit this code and tidy it up.
Collaborate with my teammates to share our code for Table 1
Get started on reproducing the other descriptive tables.
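
As a possible starting point for condensing the code, dplyr's across() can compute the mean and SD of several columns in one summarise() call, and tidyr's pivot_longer() can reshape the result into one row per variable. This is only a sketch on made-up data: toy and its values are hypothetical, and the double underscore in .names is an arbitrary choice so that column names that already contain underscores (like base_IAT_race) still split cleanly.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for cleandata
toy <- tibble(
  Age = c(19, 21, 20, 18),
  base_IAT = c(0.2, 0.6, 0.4, 0.4)
)

# across() applies every function in the list to every selected column,
# giving columns named Age__Mean, Age__SD, base_IAT__Mean, base_IAT__SD
wide <- toy %>%
  summarise(across(c(Age, base_IAT),
                   list(Mean = mean, SD = sd),
                   .names = "{.col}__{.fn}"))

# Reshape into one row per variable with Mean and SD columns
long <- wide %>%
  pivot_longer(everything(),
               names_to = c("Characteristics", ".value"),
               names_sep = "__")

print(long)
```

The long tibble has the same shape as the hand-typed table_1, so it could be passed to gt() directly. Variables that need special handling, such as the pooled IAT scores or the percentages, would still have to be computed separately and added with bind_rows().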