Learning Log 4

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(dplyr)

This is the first week I’ve been working on my group replication project! (Very exciting but also slightly scary.)

I met with my group to get started on replicating the figures from our paper, which include descriptive statistics and some plots. The goal for this week is to get started on the descriptive statistics!

This week’s goal: start replicating descriptive stats

First things first, we had to figure out what our data even meant. When we viewed the data in an internet browser, the labels were cut off, and it was very hard to tell what each variable was.

We opened the data in SPSS, which was helpful because in variable view each variable had a label describing what it was.

However, we couldn’t do our coding in SPSS, so we tried opening the data in R. This was much easier than we thought: all we had to do was locate the data file and import it. We also gave the imported data a name to make it easier to use. I used “replicationdata” (although I note now that I could probably have used a shorter and easier name).

replicationdata <- read_csv("~/Coding-R/replication project/replicationdata.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   ParticipantID = col_character(),
##   General_1_MedList = col_character(),
##   General_1_University = col_character()
## )
## i Use `spec()` for the full column specifications.

Following this, we started trying to replicate the descriptive statistics from the following table:

We started off by filtering our data to exclude the same participants the original study excluded from analysis.

cleandata <- replicationdata %>% 
  filter(exclude==0)
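A quick way to check that a filter like this did what we expected is to compare the number of rows before and after. Below is a minimal sketch using a small made-up data frame (toy_data and its values are invented for illustration; only the exclude column name comes from our data, and I am assuming 0 means "not excluded", as in the code above).

```r
library(dplyr)

# Hypothetical toy data standing in for replicationdata
toy_data <- tibble(
  ParticipantID = c("p1", "p2", "p3", "p4"),
  exclude = c(0, 1, 0, 0)
)

# Keep only the participants who were not excluded
toy_clean <- toy_data %>%
  filter(exclude == 0)

# Compare the number of rows before and after filtering
n_before <- nrow(toy_data)
n_after <- nrow(toy_clean)
print(c(before = n_before, after = n_after))
```

On the real data, the "after" count should match the number of participants the paper reports analysing.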

Age

Following this, we tackled age first, as it seemed to be the easiest statistic to reproduce since it had its own assigned variable. Therefore, we only needed to find the mean and standard deviation of this variable. The functions we used to do this were taught in the videos from prior weeks (note: it took us a little while to work this out from past modules, but we got there in the end).

ageaverage <- cleandata %>% #calculating average age including sd using cleaned data
  select(General_1_Age) %>%
  summarise(ageaverage = mean(General_1_Age), 
          agesd = sd(General_1_Age))


print(ageaverage)

## # A tibble: 1 x 2
##   ageaverage agesd
##        <dbl> <dbl>
## 1       19.5  1.23

I will note that I make this sound like we got to this solution straight away. We actually didn’t clean our data by removing the excluded participants originally. We started by calculating the average age, then found our numbers weren’t the same as the paper’s, so we decided to filter our data. Which worked!

ESS

We applied the same logic to find ESS, which, after some reading, we learnt stands for the Epworth Sleepiness Scale.

Lucky for us, there was a variable labelled Epworth_total.

ESS <- cleandata %>% 
  select(Epworth_total) %>% 
  summarise(ESS = mean(Epworth_total), ESSSD = sd(Epworth_total))

print(ESS)

## # A tibble: 1 x 2
##     ESS ESSSD
##   <dbl> <dbl>
## 1  15.3  2.83

From here we started to encounter some challenges…

Challenges: it’s starting to get complicated! Ish

Group member complications

The first challenge was that not all group members were able to reach the same output. Jade and Kath’s code was coming up with an error saying that the argument was not numeric or logical. This was strange because we all had the same code. We even copied and pasted each other’s code in an attempt to make it work. Unfortunately, we did not find a solution. Therefore, a goal for next week is to figure out how to make the code work for the other members!
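One thing we could check next week: this message appears when mean() is given something that is not numeric, for example a column that R has stored as text. I have not confirmed that this is what happened for Jade and Kath, so treat it as a guess. A minimal sketch with made-up data (toy_data and its values are invented for illustration):

```r
library(dplyr)

# Hypothetical toy data: the age column has been read in as text
toy_data <- tibble(General_1_Age = c("19", "20", "21"))

# Check how R has stored the column
age_class <- class(toy_data$General_1_Age)
print(age_class)

# mean() on a character column returns NA with the
# "argument is not numeric or logical" warning,
# so convert the column to numbers first
toy_fixed <- toy_data %>%
  mutate(General_1_Age = as.numeric(General_1_Age))

age_mean <- mean(toy_fixed$General_1_Age)
print(age_mean)
```

If class() shows "character" on one computer and "numeric" on another, the difference is probably in how the file was imported rather than in the summarise code itself.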

SSS - finding the average of combined variables

During the group meeting we tried to also squeeze in calculating the next variable: SSS (which took some sleuthing to figure out).

In the article, SSS stood for the Stanford Sleepiness Scale. Interestingly, no variable name included this, or even SSS.

After some googling (which I also learnt today: Google is literally your best friend), I found that the scale measures alertness (more so than sleepiness, even though that’s what it’s named after), and aha, there were variables that started with Alert.

It started getting more complicated here.

This variable is tricky, as we need to average the scores across several variables rather than just finding the average of one variable.

Or so I thought.

After doing some much-needed reading, I realised that all I had to do was find the average of the variable AlertTest_1_Feel, which is a section of the SSS done prior to the study (a baseline measure).

SSS <- cleandata %>% 
  select(AlertTest_1_Feel) %>% 
  summarise(SSSmean = mean(AlertTest_1_Feel), SSSsd = sd(AlertTest_1_Feel))

print(SSS)

## # A tibble: 1 x 2
##   SSSmean SSSsd
##     <dbl> <dbl>
## 1    2.81 0.749

I won’t include all the messy details in this learning log, but prior to discovering this, I tried so many different methods to try to merge variable data, and find the average of this data. Interestingly, I got somewhere, but obviously the numbers did not match up with the original paper.

These methods included:

Selecting only the AlertTest_1_Feel to AlertTest_4_Feel variables (as the numbers were small and seemed most likely to lead to the original result)
Calculating the mean of each variable, and making this a new data set
Creating a new variable to try to find the mean of the variable means

SSSgroup <- cleandata %>% 
  select(AlertTest_1_Feel, 
         AlertTest_2_Feel, 
         AlertTest_3_Feel, 
         AlertTest_4_Feel) %>% 
  drop_na() %>% 
    summarise(
    AT1F = mean(AlertTest_1_Feel),
    AT2F = mean(AlertTest_2_Feel),
    AT3F = mean(AlertTest_3_Feel),
    AT4F = mean(AlertTest_4_Feel)
    )

SSS <- SSSgroup %>% 
  summarise(SSSav = mean(SSSgroup))

## Warning in mean.default(SSSgroup): argument is not numeric or logical: returning
## NA

print(SSS)

## # A tibble: 1 x 1
##   SSSav
##   <dbl>
## 1    NA

Attempting to use the merge function (a method I found using Google)

SSStotal <- cleandata %>% 
   select(AlertTest_1_Feel, 
         AlertTest_2_Feel, 
         AlertTest_3_Feel, 
         AlertTest_4_Feel) %>% 
  merge(AlertTest_1_Feel, AlertTest_2_Feel, AlertTest_3_Feel, AlertTest_4_Feel, by=participantID)

print(SSStotal)

Attempting to use the bind function: successful, but not the right value
I also tried removing NAs, as I was coming up with an error saying the data was not logical or numerical

SSS <- cleandata %>% 
  select(AlertTest_1_Feel, 
         AlertTest_2_Feel, 
         AlertTest_3_Feel, 
         AlertTest_4_Feel) %>% 
  drop_na() %>% 
  summarise( mean( rbind(AlertTest_1_Feel, AlertTest_2_Feel, AlertTest_3_Feel, AlertTest_4_Feel)))

print(SSS)

## # A tibble: 1 x 1
##   `mean(...)`
##         <dbl>
## 1        2.69

Applying the above to the unfiltered data (ignoring exclusions): not the right value

SSStrial <- replicationdata %>% 
  select(AlertTest_1_Feel, 
         AlertTest_2_Feel, 
         AlertTest_3_Feel, 
         AlertTest_4_Feel) %>% 
  drop_na() %>% 
  summarise( mean( rbind(AlertTest_1_Feel, AlertTest_2_Feel, AlertTest_3_Feel, AlertTest_4_Feel)))

print(SSStrial)

## # A tibble: 1 x 1
##   `mean(...)`
##         <dbl>
## 1        2.62

I suppose this kind of counts as a success as I learnt how to find the mean of multiple variables combined.
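Looking back, my "mean of the means" attempt returned NA because mean() was handed a whole data frame rather than a vector of numbers. Below is a minimal sketch of two ways around that, using made-up scores (toy_data and its values are invented; only the column names come from our data). It uses c() to pool the scores, which I am swapping in as a simpler stand-in for the rbind() approach above.

```r
library(dplyr)

# Hypothetical toy data (values made up); column names follow our dataset
toy_data <- tibble(
  AlertTest_1_Feel = c(1, 2, 3),
  AlertTest_2_Feel = c(2, 3, 4)
)

# Step 1: one mean per column, giving a one-row data frame
toy_means <- toy_data %>%
  summarise(AT1F = mean(AlertTest_1_Feel), AT2F = mean(AlertTest_2_Feel))

# mean() needs a numeric vector, so flatten the one-row data frame with unlist()
mean_of_means <- mean(unlist(toy_means))

# Alternative: pool all the scores into one vector with c() and average that
pooled_mean <- toy_data %>%
  summarise(pooled = mean(c(AlertTest_1_Feel, AlertTest_2_Feel))) %>%
  pull(pooled)

print(c(mean_of_means = mean_of_means, pooled_mean = pooled_mean))
```

The two numbers agree here because every column has the same number of scores; with missing values dropped unevenly they could differ.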

Calculating Baseline Implicit Bias

Lucky for me, I got to use this newfound knowledge and apply it to the next variable! In the dataset, there are two variables for baseline bias:

- base_IAT_race
- base_IAT_gen

These measured baseline implicit bias for race and gender.

Therefore, I needed to combine these two variables and find the mean of the variables combined: which was what I tried to do before with SSS using the bind function!

I tried it with baseline bias and it worked!

BIB <- cleandata %>% 
  select(
    base_IAT_race,
    base_IAT_gen) %>% 
  summarise(BIBav = mean(rbind(base_IAT_race, base_IAT_gen)), BIBsd = sd(rbind(base_IAT_race, base_IAT_gen)))

print(BIB)

## # A tibble: 1 x 2
##   BIBav BIBsd
##   <dbl> <dbl>
## 1 0.557 0.406

I then attempted calculating the next few variables using a similar method:

Pre-nap implicit bias:

PrenapIB <- cleandata %>% 
  select(
    pre_IAT_race,
    pre_IAT_gen) %>% 
  summarise(PrenapIBav = mean(rbind(pre_IAT_race, pre_IAT_gen)), PrenapIBsd = sd(rbind(pre_IAT_race, pre_IAT_gen))) 

print(PrenapIB)

## # A tibble: 1 x 2
##   PrenapIBav PrenapIBsd
##        <dbl>      <dbl>
## 1      0.257      0.478

Post nap implicit bias:

PostnapIB <- cleandata %>% 
  select(
    post_IAT_race,
    post_IAT_gen) %>% 
  summarise(
    PostnapIBav = mean(rbind(post_IAT_race, post_IAT_gen)), 
    PostnapIBsd = sd(rbind(post_IAT_race, post_IAT_gen))) 

print(PostnapIB)

## # A tibble: 1 x 2
##   PostnapIBav PostnapIBsd
##         <dbl>       <dbl>
## 1       0.278       0.459

Week delay bias:

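OWDIB <- cleandata %>% 
  select(
    week_IAT_race,
    week_IAT_gen) %>% 
  summarise(
    # fixed: week_IAT_race was missing and week_IAT_gen was passed twice,
    # so the output printed below reflects the gender scores only
    OWDIBav = mean(rbind(week_IAT_race, week_IAT_gen)), 
    OWDIBsd = sd(rbind(week_IAT_race, week_IAT_gen))
  )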

print(OWDIB)

## # A tibble: 1 x 2
##   OWDIBav OWDIBsd
##     <dbl>   <dbl>
## 1   0.384   0.402
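Because the four implicit bias blocks repeat the same pattern, it is easy to mistype a column name when copying and pasting. One way to reduce that risk is a small helper function. This is only a sketch: pooled_summary is a name I made up, toy_data holds invented values, and it pools scores with c() rather than rbind().

```r
library(dplyr)

# Hypothetical helper: pooled mean and sd of two columns
pooled_summary <- function(data, col_a, col_b) {
  data %>%
    summarise(
      av = mean(c({{ col_a }}, {{ col_b }})),
      sd = sd(c({{ col_a }}, {{ col_b }}))
    )
}

# Hypothetical toy data (values made up); column names follow our dataset
toy_data <- tibble(
  week_IAT_race = c(0.2, 0.4, 0.6),
  week_IAT_gen = c(0.1, 0.3, 0.5)
)

toy_week <- pooled_summary(toy_data, week_IAT_race, week_IAT_gen)
print(toy_week)
```

With a helper like this, each time point needs only one line, and each column name is typed once.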

The last two variables were slightly different, being percentages. After some googling, I learnt how to tally data in a variable, and how to manually calculate a percentage:

Calculating sex (proportion male):

Male <- cleandata %>% 
  select(General_1_Sex) %>%
  tally(General_1_Sex == 1)

Male_percentage <- Male/31 #31 as the clean data set has 31 participants

print(Male_percentage)

##          n
## 1 0.483871

Calculating cue percentage:

NapCue <- cleandata %>% 
  select(Cue_condition) %>% 
  tally(Cue_condition == 1)

NapCue_percentage <- NapCue/31

print(NapCue_percentage)

##           n
## 1 0.5483871
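The two outputs above are proportions (0.48 and 0.55), so they still need multiplying by 100 to become percentages. A minimal sketch of doing the count, proportion and percentage in one step, without typing the number of participants by hand (toy_data and its values are invented; I am assuming 1 codes for male, as in the code above):

```r
library(dplyr)

# Hypothetical toy data (values made up)
toy_data <- tibble(General_1_Sex = c(1, 2, 1, 2, 2))

toy_male <- toy_data %>%
  summarise(
    n_male = sum(General_1_Sex == 1),     # count of rows coded 1
    prop_male = mean(General_1_Sex == 1), # TRUE counts as 1, so the mean is a proportion
    pct_male = 100 * prop_male            # proportion converted to a percentage
  )
print(toy_male)
```

Using mean() on the comparison means the denominator updates automatically if the exclusions change.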

Next steps

Although I made heaps of progress in calculating averages, the results were not rounded to 2 decimal places like the original table, and a few were slightly off. The next steps in my coding journey would be to figure out how to round my output to 2 d.p. and to figure out why certain variables are not showing the same results as the original table.

As for putting this data into a table, we are working as a group to figure out how to do so; Jade has had an attempt, which turned out great!
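For the rounding, one option I could try is round() applied to every column of a summary. This is a sketch only, with made-up numbers (toy_summary is invented for illustration):

```r
library(dplyr)

# Hypothetical summary output (values made up)
toy_summary <- tibble(ageaverage = 19.5161, agesd = 1.2345)

# Round every column to 2 decimal places
toy_rounded <- toy_summary %>%
  mutate(across(everything(), ~ round(.x, 2)))
print(toy_rounded)

# round() keeps numbers as numbers, so trailing zeros are not shown;
# sprintf() returns text that always has two decimals
age_text <- sprintf("%.2f", 19.5)
print(age_text)
```

round() is probably enough for checking values against the paper, while sprintf() might be handier when building the final table.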

Link to Jade’s RPubs table: https://rpubs.com/jgurtala/descriptivetableattempt