This week’s goals
- Meet my group after the Q&A session to make a start on reproducing the descriptive statistics
- Work together to reproduce the first table
- After the meeting, set individual goals to reproduce the first table independently, so we can help each other with the parts of the code we each struggled with
Table 1:
My progress this week
- The data from the journal was provided online as an SPSS file, so we had to download it and then find a way to import it into RStudio. I tried exporting the file as a CSV so I could use read_csv("data"), but I couldn’t find where the CSV file was saved on my computer, so I had to use another method. With help from a groupmate, I ended up installing a package from GitHub that lets me read SPSS files directly into RStudio.
#Install package that enables spss data to be directly read into R
library(remotes)
install_github("JanMarvin/readspss")
## Skipping install of 'readspss' from a github remote, the SHA1 (bbc71e6b) has not changed since last install.
## Use `force = TRUE` to force installation
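On the CSV route that didn't work out: R resolves a bare file name against the working directory, so one way to track down a misplaced export might be to check getwd() and then search a folder for the file by pattern. The sketch below is only illustrative. It writes a small made-up file (toy_export.csv, a hypothetical name) into a temporary folder so that it runs anywhere, then finds it and reads it back by its full path.

```r
# Illustrative only: the folder and file name below are made up.
export_dir <- tempfile("exports_")
dir.create(export_dir)
write.csv(data.frame(id = 1:3, age = c(19, 21, 20)),
          file.path(export_dir, "toy_export.csv"), row.names = FALSE)

# Where does R look when it is given a bare file name?
getwd()

# Search a folder (and its subfolders) for CSV files by pattern.
found <- list.files(export_dir, pattern = "\\.csv$",
                    recursive = TRUE, full.names = TRUE)

# Read the file by its full path instead of relying on the working directory.
toy <- read.csv(found[1])
print(toy)
```

With a real export, export_dir would be replaced by the folder the file was most likely saved to, such as the Downloads folder.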
- Next I loaded relevant packages
library(readspss) #for importing SPSS data into R
library(tidyverse) #for data wrangling and visualisation
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.6
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#read in the data
originaldata <- read.sav("Humiston & Wamsley 2019 data.sav")
- Look at the variables from the data
glimpse(originaldata)
## Rows: 69
## Columns: 101
## $ ParticipantID <chr> "ub3", "ub6", "ub7", "ub8", "ub9", "ub10", "ub1~
## $ exclude <fct> yes, no, no, no, no, yes, no, no, no, no, yes, ~
## $ cue_presented <fct> yes, yes, yes, yes, yes, yes, yes, yes, yes, ye~
## $ heard_cue_report <fct> "no", "no", "no", "no", "no", "no", "no", "no",~
## $ heard_cue_exit <fct> no, no, unsure, no, no, no, no, no, no, no, no,~
## $ predicted_cue <fct> no, no, no, no, suspected, no, no, no, no, no, ~
## $ Cue_condition <fct> gender cue played, race cue played, race cue pl~
## $ Counterbias_order <fct> racial training first, racial training first, g~
## $ Sound_assignment <fct> machR and descG, machR and descG, machG and des~
## $ IAT1_order <fct> EATF-SATF, EATF-SATF, SATF-EATF, EATF-SATF, SAT~
## $ IAT234_order <fct> SATS-EATS, SATS-EATS, EATS-SATS, SATS-EATS, EAT~
## $ IAT_order <fct> "ES, SESESE", "ES, SESESE", "SE, ESESES", "ES, ~
## $ compensation <fct> cash, cash, cash, cash, cash, cash, course cred~
## $ General_1_Age <dbl> 19, 21, 21, 20, 21, 19, 19, 20, 18, 18, 18, 18,~
## $ General_1_Sex <fct> Female, Female, Female, Female, Male, Male, Fem~
## $ General_1_Race <fct> Non-White, White, White, White, White, Non-Whit~
## $ General_1_English <fct> Yes, Yes, Yes, Yes, Yes, Yes, No, Yes, Yes, Yes~
## $ General_1_EnglishYrs <dbl> NA, NA, NA, NA, NA, NA, 12, NA, NA, NA, NA, NA,~
## $ General_1_Caffeine <fct> No, Yes, Yes, No, No, No, No, No, No, Yes, No, ~
## $ General_1_CaffCups <fct> NA, 1, 1, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA~
## $ General_1_CaffHrsAgo <dbl> NA, 2.0, 3.0, NA, NA, NA, NA, NA, NA, 5.5, NA, ~
## $ General_1_SleepDisor <fct> No, No, No, No, Yes, No, No, No, No, No, No, No~
## $ General_1_MentalDiso <fct> No, No, No, No, No, No, No, No, No, No, No, No,~
## $ General_1_Meds <fct> Yes, No, Yes, No, No, No, Yes, Yes, No, No, No,~
## $ General_1_MedList <chr> "DepoProvera, 200mg, once every 3 months", "", ~
## $ General_1_University <chr> "Furman University", "Furman University", "Furm~
## $ General_1_UniYears <fct> 2, 3, 3, 2, 3, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0, 1,~
## $ Demo_1_Ethnic <fct> Not Hispanic or Latino, Not Hispanic or Latino,~
## $ Demo_1_Racial <fct> Black or African American, White, White, White,~
## $ Demo_1_Gender <fct> Female, Female, Female, Female, Male, Male, Fem~
## $ Demo_1_NonParticipat <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, I choos~
## $ Epworth_1_Read <fct> moderate chance of dozing, slight chance of doz~
## $ Epworth_1_TV <fct> high chance of dozing, slight chance of dozing,~
## $ Epworth_1_Public <fct> slight chance of dozing, slight chance of dozin~
## $ Epworth_1_Passenger <fct> slight chance of dozing, moderate chance of doz~
## $ Epworth_1_LyingDown <fct> high chance of dozing, high chance of dozing, s~
## $ Epworth_1_Talking <fct> no chance of dozing, no chance of dozing, no ch~
## $ Epworth_1_Lunch <fct> slight chance of dozing, no chance of dozing, n~
## $ Epworth_1_Traffic <fct> no chance of dozing, no chance of dozing, no ch~
## $ Epworth_total <dbl> 19, 16, 12, 13, 10, 21, 16, 12, 20, 15, 20, 16,~
## $ AlertTest_1_Concentr_1 <dbl> 80, 80, 60, 60, 70, 100, 70, 40, 80, 80, 60, 80~
## $ AlertTest_1_Refresh_1 <dbl> 90, 60, 70, 60, 30, 100, 60, 40, 80, 60, 30, 40~
## $ AlertTest_1_Feel <fct> "3 - Awake, but relaxed; responsive but not ful~
## $ AlertTest_2_Concentr_1 <dbl> NA, 70, 60, 40, 60, 80, 80, 40, 80, 80, 80, 60,~
## $ AlertTest_2_Refresh_1 <dbl> NA, 70, 60, 30, 30, 80, 60, 40, 70, 60, 40, 40,~
## $ AlertTest_2_Feel <fct> NA, "3 - Awake, but relaxed; responsive but not~
## $ AlertTest_3_Concentr_1 <dbl> 90, NA, 60, 40, 80, 100, NA, 60, 100, 80, 80, 7~
## $ AlertTest_3_Refresh_1 <dbl> 80, NA, 70, 50, 70, 100, NA, 80, 100, 90, 70, 9~
## $ AlertTest_3_Feel <fct> "2 - Functioning at high levels, but not at pea~
## $ AlertTest_4_Concentr_1 <dbl> 80, 80, 60, 40, NA, 90, NA, 70, 90, 100, 90, 70~
## $ AlertTest_4_Refresh_1 <dbl> 80, 90, 50, 30, NA, 80, NA, 80, 90, 80, 90, 60,~
## $ AlertTest_4_Feel <fct> "2 - Functioning at high levels, but not at pea~
## $ S1_ExitQ_1_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_1_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_2_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_3_sound <fct> NA, No, No, No, No, No, No, No, No, No, Yes, No~
## $ S1_ExitQ_4_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_4_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_5_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S1_ExitQ_5_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_1_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_1_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_2_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_3_sound <fct> NA, No, No, No, No, No, No, No, No, No, Yes, No~
## $ S2_ExitQ_4_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_4_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_5_sound <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ S2_ExitQ_5_soundaffect <fct> NA, No, No, No, No, No, No, No, No, No, No, No,~
## $ Total_sleep <dbl> 64, 65, 66, 80, 62, 84, 51, 81, 81, 67, 82, 79,~
## $ Wake_amount <dbl> 26, 25, 24, 10, 28, 6, 39, 9, 9, 23, 8, 11, 22,~
## $ NREM1_amount <dbl> 19, 10, 9, 5, 5, 4, 11, 4, 3, 4, 5, 4, 9, 2, 2,~
## $ NREM2_amount <dbl> 29.0, 20.0, 52.0, 15.0, 15.5, 51.0, 22.0, 23.0,~
## $ SWS_amount <dbl> 8, 12, 19, 24, 24, 22, 16, 36, 37, 4, 18, 23, 6~
## $ REM_amount <dbl> 9, 23, 0, 17, 17, 6, 2, 18, 9, 9, 11, 24, 10, 2~
## $ SWSxREM <dbl> 72, 276, 0, 408, 408, 132, 32, 648, 333, 36, 19~
## $ cue_minutes <dbl> 19.0, 9.5, 12.0, 15.5, 16.0, 29.0, 15.0, 25.0, ~
## $ baseIATcued <dbl> 0.19620397, 0.57544182, 0.09911241, 0.20577365,~
## $ baseIATuncued <dbl> -0.26535527, 0.60953653, 0.64396538, 1.52435622~
## $ preIATcued <dbl> -0.34989445, 0.55905291, -0.13380639, 0.5107702~
## $ preIATuncued <dbl> -0.4905672, 0.2146214, 0.3398503, 0.3799023, -0~
## $ postIATcued <dbl> -0.192035676, 0.681910146, 0.044634805, -0.0025~
## $ postIATuncued <dbl> -1.04192332, 0.46728694, -0.05686262, 0.6824358~
## $ weekIATcued <dbl> -0.36134918, 0.20377367, 0.45873715, 0.39859469~
## $ weekIATuncued <dbl> 0.38291394, 0.68277422, -0.01070460, 0.71187286~
## $ postnap_change_cued <dbl> 0.1578588, 0.1228572, 0.1784412, -0.5133539, -0~
## $ postnap_change_uncued <dbl> -0.55135615, 0.25266550, -0.39671291, 0.3025335~
## $ week_change_cued <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ week_change_uncued <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
## $ diff_biaschange_cued <dbl> 0.557553150, 0.371668148, -0.359624732, -0.1928~
## $ diff_biaschange_uncued <dbl> -0.64826921, -0.07323769, 0.65466998, 0.8124833~
## $ diff_biaschange <dbl> 1.20582236, 0.44490584, -1.01429471, -1.0053044~
## $ base_IAT_race <dbl> -0.26535527, 0.57544182, 0.09911241, 1.52435622~
## $ base_IAT_gen <dbl> 0.19620397, 0.60953653, 0.64396538, 0.20577365,~
## $ pre_IAT_race <dbl> -0.49056718, 0.55905291, -0.13380639, 0.3799023~
## $ pre_IAT_gen <dbl> -0.34989445, 0.21462144, 0.33985028, 0.51077026~
## $ post_IAT_race <dbl> -1.04192332, 0.68191015, 0.04463480, 0.68243589~
## $ week_IAT_race <dbl> -0.19203568, 0.20377367, 0.45873715, 0.71187286~
## $ post_IAT_gen <dbl> 0.382913945, 0.467286940, -0.056862624, -0.0025~
## $ week_IAT_gen <dbl> -0.36134918, 0.68277422, -0.01070460, 0.3985946~
## $ `filter_$` <fct> Not Selected, Selected, Selected, Selected, Sel~
## $ cues_total <dbl> 285.0, 142.5, 180.0, 232.5, 240.0, 435.0, 225.0~
- Then we removed the participants who were flagged for exclusion, so they are left out of all further analysis
#making sure we only keep the people whose data was used in the analysis
cleandata <- originaldata %>%
  filter(exclude == "no")
- Having a go at reproducing one part of the descriptives
#obtaining age mean and sd
ageaverage <- cleandata %>%
  select(General_1_Age) %>%
  summarise(ageaverage = mean(General_1_Age), agesd = sd(General_1_Age))
print(ageaverage)
##   ageaverage    agesd
## 1   19.54839 1.233929
I’m realising that there is probably a faster way to reproduce the first table than finding the mean and SD of each variable individually and then combining them in the table. So I’m going to try something different:
- First I’m going to rename the relevant variables
cleandata <- cleandata %>%
  rename(EES = Epworth_total, SSS = AlertTest_1_Feel, Age = General_1_Age, Sex = General_1_Sex, Cue_Played = Cue_condition)
- Then I’m selecting all the relevant variables needed to reproduce table 1
#I'm trying to see if I can find the mean and sd of all of these variables at once, but I couldn't figure it out once I got down to the summarise part. I also wasn't sure it would work, since some variables need their mean and sd calculated differently from others
participant_charac <- cleandata %>%
  select(Age, EES, SSS, base_IAT_race, base_IAT_gen, pre_IAT_race, pre_IAT_gen, post_IAT_race, post_IAT_gen, week_IAT_race, week_IAT_gen, Sex, Cue_Played)
# summarise(mean = mean(Age), mean = mean(EES))?? this part is definitely wrong
#maybe instead of select I need to use group_by? though I'm still a bit confused by that function
So now we’re back to the slow method…
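One way the all-at-once summary might be written is with across() inside summarise(), which applies the same set of functions to several columns in a single call. This is a sketch on made-up numbers, not the code used for the table below. The toy tibble and the "__" name separator are assumptions chosen for illustration, and across() needs dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Made-up values standing in for a few columns of the cleaned data.
toy <- tibble(
  Age = c(19, 21, 20, 18),
  ESS = c(19, 16, 12, 13),
  base_IAT_race = c(0.2, 0.6, 0.4, 0.8)
)

toy_summary <- toy %>%
  # Apply mean() and sd() to every column in one call.
  # "__" is used as the separator because some column names already contain "_".
  summarise(across(everything(),
                   list(mean = mean, sd = sd),
                   .names = "{.col}__{.fn}")) %>%
  # Reshape the single wide row into one row per variable.
  pivot_longer(everything(),
               names_to = c("variable", "stat"),
               names_sep = "__") %>%
  pivot_wider(names_from = stat, values_from = value)

print(toy_summary)
```

The result has one row per variable with mean and sd columns, which is already close to the layout of the table. Rows that need a different calculation, such as the percentages, would still have to be computed separately and added afterwards.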
- Finding the mean and SD of:
#Age:
Age <- participant_charac %>%
  select(Age) %>%
  summarise(mean = mean(Age), SD = sd(Age))
print(Age)
##       mean       SD
## 1 19.54839 1.233929
#EES
EES <- participant_charac %>%
  select(EES) %>%
  summarise(mean = mean(EES), SD = sd(EES))
print(EES)
##       mean       SD
## 1 15.29032 2.830707
#SSS
#the SSS ratings are stored as a factor with these labels:
# "1 - Feeling active, vital alert, or wide awake"
# "2 - Functioning at high levels, but not at peak; able to concentrate"
# "3 - Awake, but relaxed; responsive but not fully alert"
# "4 - Somewhat foggy, let down"
# "5 - Foggy; losing interest in remaining awake; slowed down"
#as.numeric() on a factor returns its integer level codes, which match the 1-5 ratings only if the levels are stored in that order
cleandata <- cleandata %>%
  mutate(SSSvalue = as.numeric(SSS))
SSS <- cleandata %>%
  select(SSSvalue) %>%
  summarise(SSSaverage = mean(SSSvalue),
            SSSsd = sd(SSSvalue))
print(SSS)
##   SSSaverage     SSSsd
## 1   2.806452 0.7491931
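Because as.numeric() on a factor returns level positions rather than the numbers printed in the labels, the two can disagree whenever a rating is missing from the levels. I can't tell from the output above whether that applies to the real data, so the sketch below only shows the difference on made-up ratings, along with one possible alternative that reads the leading digit of each label.

```r
# Made-up ratings; nobody in this toy sample chose rating 1.
ratings <- factor(c(
  "2 - Functioning at high levels, but not at peak; able to concentrate",
  "3 - Awake, but relaxed; responsive but not fully alert",
  "3 - Awake, but relaxed; responsive but not fully alert"
))

# Level codes: positions within levels(ratings), not the printed numbers.
codes <- as.numeric(ratings)
print(codes)   # 1 2 2

# Leading digit of each label: the rating itself.
scores <- as.numeric(substr(as.character(ratings), 1, 1))
print(scores)  # 2 3 3

# na.rm = TRUE skips missing ratings instead of returning NA.
mean(scores, na.rm = TRUE)
```

Running levels() on the real SSS column would show whether its codes already line up with the 1 to 5 ratings.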
#Baseline implicit bias
Baseline_IAT <- participant_charac %>%
  select(base_IAT_race, base_IAT_gen) %>%
  summarise(mean = mean(rbind(base_IAT_race, base_IAT_gen)), SD = sd(rbind(base_IAT_race, base_IAT_gen)))
print(Baseline_IAT)
##        mean        SD
## 1 0.5565373 0.4058619
#Prenap implicit bias
Prenap_IAT <- participant_charac %>%
  select(pre_IAT_race, pre_IAT_gen) %>%
  summarise(mean = mean(rbind(pre_IAT_race, pre_IAT_gen)), SD = sd(rbind(pre_IAT_race, pre_IAT_gen)))
print(Prenap_IAT)
##        mean        SD
## 1 0.2566674 0.4776418
#Postnap implicit bias
Postnap_IAT <- participant_charac %>%
  select(post_IAT_race, post_IAT_gen) %>%
  summarise(mean = mean(rbind(post_IAT_race, post_IAT_gen)), SD = sd(rbind(post_IAT_race, post_IAT_gen)))
print(Postnap_IAT)
##        mean        SD
## 1 0.2776836 0.4585372
#One week delay implicit bias
One_wk_delay <- participant_charac %>%
  select(week_IAT_race, week_IAT_gen) %>%
  summarise(mean = mean(rbind(week_IAT_race, week_IAT_gen)), SD = sd(rbind(week_IAT_race, week_IAT_gen)))
print(One_wk_delay)
##        mean        SD
## 1 0.3994186 0.4254629
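The four blocks above repeat the same step: pool the race and gender scores for one timepoint, then take the mean and SD. A possible way to do all timepoints at once is to reshape the columns into long format and use group_by(), which makes summarise() return one row per group. The sketch below uses made-up scores and only two timepoints. The column names follow the pattern in the data, and the names timepoint and bias_type are my own labels.

```r
library(dplyr)
library(tidyr)

# Made-up scores for two timepoints.
toy <- tibble(
  base_IAT_race = c(0.2, 0.6),
  base_IAT_gen  = c(0.4, 0.8),
  pre_IAT_race  = c(0.1, 0.5),
  pre_IAT_gen   = c(0.3, 0.7)
)

iat_summary <- toy %>%
  # Split names such as "base_IAT_race" into a timepoint and a bias type,
  # stacking all the scores into a single "value" column.
  pivot_longer(everything(),
               names_to = c("timepoint", "bias_type"),
               names_pattern = "(.*)_IAT_(.*)") %>%
  # One summary row per timepoint, pooling race and gender scores.
  group_by(timepoint) %>%
  summarise(mean = mean(value), SD = sd(value))

print(iat_summary)
```

This pools the two columns in the same way as mean(rbind(...)) does, so it should give the same numbers as the block-by-block version.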
#Sex (% male)
Male <- participant_charac %>%
  select(Sex) %>%
  tally(Sex == "Male")
Male_percentage <- Male/31 #31 as the clean participant data set has 31 participants
print(Male_percentage)
##          n
## 1 0.483871
#Cue played (% racial cue)
cue <- participant_charac %>%
  select(Cue_Played) %>%
  tally(Cue_Played == "race cue played")
racial_cue_percentage <- cue/31
print(racial_cue_percentage)
##           n
## 1 0.5483871
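Dividing by a typed-in 31 works, but it would quietly go wrong if the exclusion step ever changed. Since the mean of a TRUE/FALSE vector is the proportion of TRUE values, the percentage could be computed without knowing the sample size in advance. A small sketch on made-up data:

```r
library(dplyr)

# Made-up participants.
toy <- tibble(
  Sex = c("Male", "Female", "Female", "Male"),
  Cue_Played = c("race cue played", "gender cue played",
                 "race cue played", "race cue played")
)

toy_percent <- toy %>%
  # mean() of a logical vector gives the share of TRUE values;
  # multiplying by 100 turns the proportion into a percentage.
  summarise(pct_male = mean(Sex == "Male") * 100,
            pct_racial_cue = mean(Cue_Played == "race cue played") * 100)

print(toy_percent)
```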
- Okay now that we have all the values, we have to find a way to produce a table
#Installing and loading the packages that create tables
#install.packages("gt")
#install.packages("kableExtra")
library(gt)
- Recreating table 1
#adding the values to the table
table_1 <- tibble(
Characteristics = c("Age (yrs)", "ESS", "SSS", "Baseline implicit bias", "Prenap implicit bias", "Postnap implicit bias", "One-week delay implicit bias", "Sex (% male)", "Cue played during nap (% racial cue)"),
Mean = c(19.54839, 15.29032, 2.806452, 0.5565373, 0.2566674, 0.2776836, 0.3994186, 0.483871, 0.5483871),
SD = c(1.233929, 2.830707, 0.7491931, 0.4058619, 0.4776418, 0.4585372, 0.4254629, NA, NA)
)
#formatting the table as close to the original as possible
table_1 %>%
  gt() %>%
  tab_header(title = md("**Table 1. Participant characteristics.**")) %>%
  fmt_number(columns = c(Mean, SD), decimals = 2) %>%
  tab_source_note(source_note = "Implicit bias values are the average of D600 score for each timepoint.")
Table 1. Participant characteristics.
| Characteristics | Mean | SD |
|---|---|---|
| Age (yrs) | 19.55 | 1.23 |
| ESS | 15.29 | 2.83 |
| SSS | 2.81 | 0.75 |
| Baseline implicit bias | 0.56 | 0.41 |
| Prenap implicit bias | 0.26 | 0.48 |
| Postnap implicit bias | 0.28 | 0.46 |
| One-week delay implicit bias | 0.40 | 0.43 |
| Sex (% male) | 0.48 | NA |
| Cue played during nap (% racial cue) | 0.55 | NA |
Implicit bias values are the average of D600 score for each timepoint.
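Typing the numbers into the table by hand means they have to be retyped whenever a calculation changes. One possible alternative is to stack the summary objects with bind_rows() and pass the result to gt(). The sketch below uses made-up summaries (age_toy, ess_toy and pct_male_toy are hypothetical stand-ins for the objects computed earlier) and shows the male share as a percentage rather than a proportion.

```r
library(dplyr)
library(gt)

# Made-up one-row summaries standing in for the objects computed earlier.
age_toy <- tibble(mean = 19.5, SD = 1.2)
ess_toy <- tibble(mean = 15.3, SD = 2.8)
pct_male_toy <- tibble(mean = 48.4, SD = NA_real_)

# Stack the summaries; the argument names become the row labels.
table_toy <- bind_rows(
  "Age (yrs)" = age_toy,
  "ESS" = ess_toy,
  "Sex (% male)" = pct_male_toy,
  .id = "Characteristics"
) %>%
  rename(Mean = mean)

# Format as before; the numbers now come straight from the computed objects.
gt_toy <- table_toy %>%
  gt() %>%
  fmt_number(columns = c(Mean, SD), decimals = 2)

print(table_toy)
```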
Challenges
This week has been very challenging compared to previous weeks. Some of the challenges I had individually are documented in the sections above, but there were a few that we struggled with as a group.
Firstly, importing the SPSS file into R was a challenge in and of itself; it took multiple tries and help from teammates to get past this hurdle.
Secondly, it took our team a while to work out what all the variables in the original data meant and which of them corresponded to the ones in the table.
Following on from the last point, it took our team multiple tries to get the correct mean and SD for some of the values, especially SSS, as it was a combination of a couple of variables that also had missing data in various columns.
Personally, I thought there would be a quicker way to summarise all the variables at once, rather than one at a time, and to feed those results straight into the table instead of typing in all the numbers manually, but I struggled a lot with that.
Successes
Our first group meeting was a success, with all team members contributing and helping each other out.
We weren’t able to complete the table during the meeting as it ran for a long time, but afterwards each team member worked on the table some more and shared it with the rest of the group. I had a lot of help from the code that different team members shared, and I was able to improve it and build on it.
There was a lot of googling involved, but I got there in the end, particularly when it came to creating the table.
We got the first table done!
Next steps from here
I have a feeling that all this code could be made more concise. I think I need some help with this, but once we make sufficient progress on the other tables and figures, I’d like to revisit this code and tidy it up a bit.
Collaborate with my teammates to share our code for table 1
Get started on reproducing the other descriptive tables.