Week 8 Learning Log

This week’s goals

Finish the speech and slides for the presentation
Prerecord the presentation and upload it
Get started on exploratory analysis, begin brainstorming ideas
Finalise some of the tables from part 2 verification

Brainstorming ideas for exploratory analysis

I began by rereading our paper as well as the original paper that it tried to reproduce. As I read, I took note of what the authors mentioned in their discussion and exploratory analysis (though it was very short) to see if I could come up with any ideas.

Once I had a general idea of what I might look at, I revisited the original SPSS data file, which holds all the variables, to check that I had the data needed for the analysis I was interested in.

To be honest, thinking of 3 questions was particularly hard for me. I only have one question so far, so I decided to work on that for now.

Question 1

Are there differences in the long term effectiveness of TMR across racial and gender bias levels?

Something that I noticed when reproducing the tables and figures was that each of the implicit bias measures from table 3 onwards was calculated as the average of the gender and race bias values across time per cued or uncued condition. For example, referring back to table 3, the bias score at baseline for the cued condition is the average of the cued gender bias scores and the cued race bias scores at baseline.

This is interesting because there may be differences in the prevalence and, most importantly, the persistence of gender bias compared to racial bias. In fact, the authors recorded a significant reduction in racial bias (p < 0.001) but not in gender bias (p = 0.2) following counter-stereotype training. This suggests that gender bias may be more difficult to reduce than racial bias. So simply averaging the gender and race bias scores across all subsequent analyses to measure the cued condition will lose information about differences in the effectiveness of TMR for different kinds of implicit bias. Recall that TMR is performed in the cued condition and not the uncued condition. All measures in the uncued condition show the effectiveness of counter-stereotype training only.
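
To make the point about averaging concrete, here is a minimal sketch with made-up numbers (the data frame `toy` and its values are hypothetical, not taken from the study). It only illustrates how a pooled mean can hide opposite changes in the two bias types.

```r
# Hypothetical change scores, invented purely for illustration
toy <- data.frame(
  bias_type   = c("race", "race", "gender", "gender"),
  bias_change = c(-0.5, -0.25, 0.25, 0.5)
)

# Pooling both bias types gives a single mean
pooled_mean <- mean(toy$bias_change)

# Splitting by bias type shows the two groups moving in opposite directions
by_type <- tapply(toy$bias_change, toy$bias_type, mean)

print(pooled_mean)
print(by_type)
```

In this toy case the pooled mean suggests no change at all, while the per-type means differ, which is the kind of information I was worried about losing.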

Since racial bias may be more easily reduced with techniques such as counter-stereotype training, it may be the case that TMR is more effective in reducing racial bias than gender bias. This brings me to my first question: Are there differences in the delayed effectiveness of TMR across racial and gender bias levels?

In my reaction, I mentioned the potential real world implications of TMR: if it is shown to be effective, it could potentially reduce gender and race bias across a range of contexts such as employment and education. But to do so, TMR must be able to maintain a reduction in bias over time. If it cannot maintain reduced levels of implicit bias for even one week, then it may not be worth implementing in the real world at all.

Because of that, I chose to explore the delayed (or long term) effectiveness of TMR as opposed to the immediate effectiveness of TMR. Although the authors didn’t find TMR (cued condition) to be more effective than no TMR (counter-stereotype training only, the uncued condition), there may still be differences in the effect of TMR on gender and race bias at this time point.

load libraries

library(tidyverse) #for dplyr and ggplot2 to conduct data wrangling and visualisation

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readspss) #to read in sav data
library(gt) #for creating highly customisable tables 
library(janitor) #to clean names

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(rstatix) #for inferential statistics

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:janitor':
## 
##     make_clean_names

## The following object is masked from 'package:stats':
## 
##     filter

library(jmv) #for inferential statistics
library(ggpubr) #for t tests and other inferential statistics
library(ggeasy) #for easy ggplot functions

creating my dataframes

First, I used the dataframe ‘cleandata’ from the verification, as I have not changed any exclusion criteria. Then I used select to choose the variables relevant to this question. I am looking at the long term effect of TMR, which requires measures of bias levels from the prenap time frame and the one week delay time frame. Further details of the selected variables are here:

cue_played: either the gender or race cue that is played during sleep
pre_iat_race: bias measure for race bias prior to nap
pre_iat_gen: bias measure for gender bias prior to nap
week_iat_race: bias measure for race one week after nap
week_iat_gen: bias measure for gender one week after nap

I then used mutate to rename some of the labels in the cue_played column to make them shorter and hence easier to use later when coding. Finally I used rename to rename the cue_played column to ‘bias_type’ since that is what this question is focusing on: the effect of TMR on different bias types.

This new dataframe is called ‘q1’. I will be using it for further manipulation in this part of the exploratory analysis, and hence I refer to it as my master dataframe.

#had to add this part in because my doc wouldn't knit: the error said cleandata wasn't found, although the chunk works when I run it on its own without this part.
originaldata <- read.sav("Humiston & Wamsley 2019 data.sav") %>% 
  clean_names() %>% 
  rename(base_iat_cued = base_ia_tcued, base_iat_uncued = base_ia_tuncued, pre_iat_cued = pre_ia_tcued, pre_iat_uncued = pre_ia_tuncued, post_iat_cued = post_ia_tcued, post_iat_uncued = post_ia_tuncued, week_iat_cued = week_ia_tcued, week_iat_uncued = week_ia_tuncued)


cleandata <- originaldata %>%    
  filter(exclude == "no")

cleandata <- cleandata %>% 
  rename(age = general_1_age, ess = epworth_total, sex = general_1_sex, cue_played = cue_condition)

cleandata <- cleandata %>% 
  #as.numeric() only uses its first argument; levels, labels and exclude are factor() arguments and were ignored, so I removed them
  #scale: 1 - Feeling active, vital alert, or wide awake
  #       2 - Functioning at high levels, but not at peak; able to concentrate
  #       3 - Awake, but relaxed; responsive but not fully alert
  #       4 - Somewhat foggy, let down
  #       5 - Foggy; losing interest in remaining awake; slowed down
  mutate(alert_test_1_feel = as.numeric(alert_test_1_feel)) %>% 
  rename(sss = alert_test_1_feel)

#okay q1 actually starts here: master dataframe
q1 <- cleandata %>% 
  select(cue_played, pre_iat_race, pre_iat_gen, week_iat_race, week_iat_gen) %>% 
  mutate(cue_played = case_when(cue_played == "race cue played" ~ "race", cue_played == "gender cue played" ~ "gender")) %>% 
  rename(bias_type = cue_played)

print(q1)

##    bias_type pre_iat_race pre_iat_gen week_iat_race week_iat_gen
## 1       race   0.55905291  0.21462144    0.20377367   0.68277422
## 2       race  -0.13380639  0.33985028    0.45873715  -0.01070460
## 3     gender   0.37990232  0.51077026    0.71187286   0.39859469
## 4     gender  -0.94209553 -0.02933191    0.20212832   0.92341592
## 5       race   0.30457405 -0.24051733   -0.01869151   0.13071184
## 6     gender   0.24703493  0.14792592    1.11629844   0.56073473
## 7       race   0.03381898  0.21080619   -0.06857532  -0.27687601
## 8     gender   0.12732428  1.09547470    0.03100248   0.25359928
## 9       race   0.27143618  0.33836594    0.77816500  -0.31888702
## 10      race   0.72820197  0.51886030    0.08084087   0.03038869
## 11      race   1.12742309  0.68495341    0.28109140   0.58451878
## 12    gender   0.57176179  0.23441009    0.98690632   0.51216459
## 13      race   0.57176179  0.53588944    0.55663600   0.93836953
## 14      race   0.48911851  0.22564875    0.21879167   0.40828505
## 15    gender   0.51408849 -0.08136314    0.88612653   0.64307684
## 16      race  -1.02859538  0.27286945    0.90497541   0.99438842
## 17    gender  -0.44307448  0.39856097    0.00799229   0.57458515
## 18      race  -0.09889809  1.00470212   -0.12045279   1.19622739
## 19    gender   0.45773742  0.33828494    0.62092800   0.64107173
## 20      race  -0.75366413  0.20834125    0.13055599  -0.07578923
## 21    gender  -0.06327933 -0.28970106    0.50089370   0.30356801
## 22      race   0.02408617 -0.17993151   -0.24029660   0.02753136
## 23    gender   0.44885737  0.56794190   -0.55788997  -0.37601343
## 24      race  -0.55822308  0.30386831    0.99021393   0.38961081
## 25    gender   0.82039383  0.46273489    0.98645892   0.62261116
## 26    gender   0.80491967 -0.22351103    0.86611864   0.16658792
## 27      race  -0.46870538  0.75843118    0.05022121  -0.10227978
## 28      race   0.66699549 -0.32955309    1.22970000   0.16271922
## 29    gender   1.26790526  1.00491531    0.62636312   0.89204287
## 30      race   0.10204007  0.07288287    0.16209957   0.33564519
## 31    gender   0.24433661  0.56374874    0.28350526   0.68478850

I spent a long time figuring out how to add a column that identifies the cued and uncued conditions, as q1 didn’t have these values. I finally decided to make two separate dataframes, one containing only the data where the bias type is race, and the other containing only the data where the bias type is gender.

For each of these dataframes I used filter to select the relevant rows, then used mutate to create two new columns called cued and uncued, where the formula with respect to the time frame is the same: both cued and uncued use week - pre. The only difference is which bias measure goes where. In q1_race, the race cue was presented during the nap (cue_played is now called bias_type in the filter below), so the week and pre race bias scores give the cued score, and the week and pre gender bias scores give the uncued score. Cued essentially means the sound associated with that bias type was played during the nap. For q1_gender, where the gender cue was played during sleep, I did the same thing the other way around: the gender bias scores go into the cued equation, and the race bias scores into the uncued equation.

For both dataframes, I then selected only the bias_type, cued, and uncued columns because I wanted to change the format of the dataframe into something usable for ggplot. I did this using pivot_longer, because I wanted the labels cued and uncued in a single column, which I named ‘cue_condition’, and the bias values of the cued and uncued conditions in another column, which I named ‘bias_change’.

#race info
q1_race <- q1 %>% 
  filter(bias_type == "race") %>% 
  mutate(cued = week_iat_race - pre_iat_race, uncued = week_iat_gen - pre_iat_gen) %>% 
  select(bias_type, cued, uncued) 

#the data is already piped in, so it is not passed again inside pivot_longer
q1_race <- q1_race %>% 
  pivot_longer(cols = c(cued, uncued), names_to = "cue_condition", values_to = "bias_change")

print(q1_race)

## # A tibble: 34 x 3
##    bias_type cue_condition bias_change
##    <chr>     <chr>               <dbl>
##  1 race      cued               -0.355
##  2 race      uncued              0.468
##  3 race      cued                0.593
##  4 race      uncued             -0.351
##  5 race      cued               -0.323
##  6 race      uncued              0.371
##  7 race      cued               -0.102
##  8 race      uncued             -0.488
##  9 race      cued                0.507
## 10 race      uncued             -0.657
## # ... with 24 more rows

#gender info
q1_gender <- q1 %>% 
  filter(bias_type == "gender") %>% 
  mutate(cued = week_iat_gen - pre_iat_gen, uncued = week_iat_race - pre_iat_race) %>% 
  select(bias_type, cued, uncued) 

#the data is already piped in, so it is not passed again inside pivot_longer
q1_gender <- q1_gender %>% 
  pivot_longer(cols = c(cued, uncued), names_to = "cue_condition", values_to = "bias_change")

print(q1_gender)

## # A tibble: 28 x 3
##    bias_type cue_condition bias_change
##    <chr>     <chr>               <dbl>
##  1 gender    cued              -0.112 
##  2 gender    uncued             0.332 
##  3 gender    cued               0.953 
##  4 gender    uncued             1.14  
##  5 gender    cued               0.413 
##  6 gender    uncued             0.869 
##  7 gender    cued              -0.842 
##  8 gender    uncued            -0.0963
##  9 gender    cued               0.278 
## 10 gender    uncued             0.415 
## # ... with 18 more rows

Voila, now I can merge the information from these two dataframes into one, which I’m calling q1_full, using ‘bind_rows’ as I want to stack them on top of each other. Then, as I’m focusing on the effects of TMR, which applies to the cued condition only, I used filter to keep the data from the cued condition.

#binding race and gender info together for ggplot later

q1_full <- bind_rows(q1_race, q1_gender) %>% 
  filter(cue_condition == "cued")

print(q1_full)

## # A tibble: 31 x 3
##    bias_type cue_condition bias_change
##    <chr>     <chr>               <dbl>
##  1 race      cued              -0.355 
##  2 race      cued               0.593 
##  3 race      cued              -0.323 
##  4 race      cued              -0.102 
##  5 race      cued               0.507 
##  6 race      cued              -0.647 
##  7 race      cued              -0.846 
##  8 race      cued              -0.0151
##  9 race      cued              -0.270 
## 10 race      cued               1.93  
## # ... with 21 more rows
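
Looking back, the same cued and uncued change scores could probably be built in one step without splitting q1 into two dataframes. The sketch below is only an illustration under my own assumptions: `toy_q1` is a made-up dataframe that copies the column names of q1 (the values are invented), and `race_change` and `gender_change` are helper names I introduced here.

```r
library(dplyr)

# Hypothetical rows with the same column names as q1 (values are made up)
toy_q1 <- data.frame(
  bias_type     = c("race", "gender", "race"),
  pre_iat_race  = c(0.50, 0.40, 0.10),
  pre_iat_gen   = c(0.20, 0.50, 0.30),
  week_iat_race = c(0.25, 0.70, 0.40),
  week_iat_gen  = c(0.60, 0.25, 0.10)
)

toy_cued <- toy_q1 %>%
  mutate(
    # change scores for each bias measure (week minus pre)
    race_change   = week_iat_race - pre_iat_race,
    gender_change = week_iat_gen - pre_iat_gen,
    # the cued score is the change in whichever bias type had its cue played
    cued   = if_else(bias_type == "race", race_change, gender_change),
    # the uncued score is the change in the other bias type
    uncued = if_else(bias_type == "race", gender_change, race_change)
  ) %>%
  select(bias_type, cued, uncued)

print(toy_cued)
```

This keeps one row per participant, so pivot_longer would only be needed if both cue conditions have to go into the same plot.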

descriptive statistics

outliers

First I’m going to check for outliers in my data before moving on to the summary statistics. I’m using group_by along with identify_outliers from the rstatix package, which flags the outliers separately for the race and gender groups. Finally I ungroup (good habits).

q1_check <- q1_full %>% 
  group_by(bias_type) %>% 
  identify_outliers(bias_change) %>% 
  ungroup()

print(q1_check)

## # A tibble: 3 x 5
##   bias_type cue_condition bias_change is.outlier is.extreme
##   <chr>     <chr>               <dbl> <lgl>      <lgl>     
## 1 gender    cued               -0.842 TRUE       FALSE     
## 2 gender    cued               -0.944 TRUE       FALSE     
## 3 race      cued                1.93  TRUE       FALSE

So there are some, but none of them are extreme outliers, therefore I won’t need to exclude any more data for the summary statistics.
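
As far as I understand the rstatix documentation, identify_outliers flags values more than 1.5 × IQR beyond the quartiles as outliers, and values more than 3 × IQR beyond them as extreme. Here is a rough base R sketch of that rule on made-up numbers (the vector `x` is hypothetical and this is not the rstatix source code):

```r
# Hypothetical change scores, invented for illustration
x <- c(-0.9, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 2.5)

# First and third quartiles and the interquartile range
q   <- unname(quantile(x, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Outlier: more than 1.5 * IQR below Q1 or above Q3
is_outlier <- x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr

# Extreme: more than 3 * IQR below Q1 or above Q3
is_extreme <- x < q[1] - 3 * iqr | x > q[2] + 3 * iqr

print(data.frame(x, is_outlier, is_extreme))
```

In this toy vector the lowest and highest values are flagged as outliers, but only the highest one is extreme.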

summary statistics

I’m creating a new dataframe called q1_summ which will include summary data from ‘q1_full’. I’m using summarise to find the mean, sd, n, and se for both race and gender bias levels, which is specified using group_by. I used round to round the values to 2 decimal places. Lastly I piped it into a table using gt() and added a heading with tab_header.

q1_summ <- q1_full %>% 
  group_by(bias_type) %>% 
  summarise(mean = round(mean(bias_change),2),
            sd = round(sd(bias_change),2),
            n = n(),
            se = round(sd/sqrt(n),2)) %>% 
  gt() %>%
  tab_header(title = md("**Bias levels across bias type and cue condition**"))
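
One thing I noticed afterwards is that my se is calculated from the already rounded sd. A small sketch of an alternative is to do all the calculations first and round at the very end. The dataframe `toy` and the column names ending in `_change` are made up for this illustration.

```r
library(dplyr)

# Hypothetical change scores, invented for illustration
toy <- data.frame(
  bias_type   = rep(c("gender", "race"), each = 4),
  bias_change = c(0.1, 0.3, 0.5, 0.7, -0.2, 0.0, 0.2, 0.4)
)

toy_summ <- toy %>%
  group_by(bias_type) %>%
  summarise(
    mean_change = mean(bias_change),
    sd_change   = sd(bias_change),
    n           = n(),
    # standard error uses the unrounded sd
    se_change   = sd_change / sqrt(n),
    .groups     = "drop"
  ) %>%
  # round only once, after every statistic has been computed
  mutate(across(c(mean_change, sd_change, se_change), ~ round(.x, 2)))

print(toy_summ)
```

The result could then be piped into gt() in the same way as q1_summ.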

data visualisation

Now it’s time to put the data into a plot using ggplot. I used the following geoms:

geom_boxplot as I hadn’t used it before in part 2 of the report. It’s nice because it shows the median, quartiles, range and outliers
geom_jitter to visualise where the individual measurements of bias change were. To easily distinguish which dots belong to which bias type I used colour=bias_type. I also made the points more opaque than the boxplot using alpha, since I wanted to see the points on top of the boxplot.

To finalise the aesthetics of the graph:

scale_y_continuous with limits to change the limits of the y-axis, because I didn’t want the outliers to be too close to the edge of the graph
easy_remove_legend from the ggeasy package to remove the legend, as its information overlapped with the x axis
labs() to relabel the axes and add a title

q1_pic <- ggplot(data = q1_full, aes(x = bias_type, y = bias_change, fill = bias_type))+
  geom_boxplot(alpha = 0.5)+
  geom_jitter(alpha = 0.8, aes(colour=bias_type))+
  scale_y_continuous(limits = c(-1.25,2.15))+
  theme_bw()+
  easy_remove_legend()+
  labs(x = "Implicit Bias Type",
       y = "Bias Change (Week-pre)",
       title = "Long term effectiveness of TMR for different implicit bias types")

plot(q1_pic)

Okay, interesting graph. It seems that there is a slight difference in bias change between the bias types, but there is also a lot of variation, so I ran a t-test to check whether the bias change is significantly different between bias types.

inferential statistics

Here I used the ttestIS function from the jmv package. It performs an independent samples t-test once the data and variables have been specified. Below I’ve specified the DV as ‘bias_change’, and I want to know if it varies significantly between the levels of ‘bias_type’. The data comes from the ‘q1_full’ dataframe I created earlier.

ttestIS(formula = bias_change ~ bias_type, data = q1_full)

## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                            
##  --------------------------------------------------------------------- 
##                                  Statistic     df          p           
##  --------------------------------------------------------------------- 
##    bias_change    Student's t    -0.3011625    29.00000    0.7654394   
##  ---------------------------------------------------------------------

reporting results

Results from the independent samples t-test indicate that there is no significant difference in bias change between gender and race bias (t(29) = -0.30, p = 0.77).
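
As a cross-check, the Student's t row in the jmv output should correspond to an equal variance t-test, which in base R is t.test with var.equal = TRUE. The sketch below runs on made-up numbers (the dataframe `toy` is hypothetical), so it only shows the function calls and how the result could be formatted, not the actual result for q1_full.

```r
# Hypothetical change scores, invented for illustration
toy <- data.frame(
  bias_type   = rep(c("gender", "race"), times = c(4, 5)),
  bias_change = c(0.1, -0.2, 0.4, 0.3, 0.2, 0.5, -0.1, 0.6, 0.3)
)

# Student's t-test: assumes equal variances, df = n1 + n2 - 2
student <- t.test(bias_change ~ bias_type, data = toy, var.equal = TRUE)

# Welch's t-test (the t.test default): does not assume equal variances
welch <- t.test(bias_change ~ bias_type, data = toy)

# Format the Student's result for reporting
report <- sprintf("t(%.0f) = %.2f, p = %.2f",
                  student$parameter, student$statistic, student$p.value)
print(report)
```

With two groups of unequal size, like my 17 race and 14 gender participants, comparing the Student's and Welch results would be a quick way to see whether the equal variance assumption matters.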

Interpreting results

There is no evidence of differences in the long term (1 week) effectiveness of TMR between the two types of implicit bias (gender and race).

Challenges and Successes

I had so many troubles with this first question already, mostly relating to how I was going to obtain the data relevant for my ggplot. It took a lot of trial and error, and the biggest breakthrough was remembering the pivot_longer function, which allowed me to go back to the beginning and start again with a better idea of where I was headed.

It is definitely a good idea to work backwards next time: start from the graph I want, then decide what format my data needs to be in, and then figure out how to manipulate the original data into that format.

I also struggled when installing the jmv package, as it did something funny to tidyverse, similar to uninstalling it, meaning I couldn’t run any code (insert surprised face). I tried reinstalling tidyverse and it took a couple of tries but it eventually worked!!

Also, when knitting there were multiple errors stating that specific dataframes couldn’t be found, which was strange as each chunk worked when I ran it on its own. I ended up having to make some of the code longer, such as in the pivot sections, and adding the massive chunk of code that I used when creating cleandata.

Overall I’m proud of what I’ve achieved this week. I feel like I did a lot of big brain coding with creating and manipulating all my data, and going back to the beginning where I remembered the pivot_longer function or bind_rows. Also we finished our presentation! That’s a big win :)

Questions

Is it necessary to find literature for every piece of exploratory analysis to justify why it’s being conducted, or is something similar to what I have done, ie. taking something that the authors found and trying to build on it, good enough?
I found it really hard to think of 3 entirely different questions that would produce different types of figures from the verification, especially since our group had a scatterplot, a bar graph, and a line graph. Is it okay if, say, 2 of our exploratory analysis questions had figures that are similar to ones produced in part 2?
In terms of rubber ducking for part 2 of the VR report, if I have explained certain functions, eg. in my table 1 formatting from the gt package such as title, subtitle, etc., and the same functions are used again to do exactly the same things in all my other tables (total 4), do I need to explain them again for each table, or is saying something like refer to table 1 for explanation of functions sufficient?

Next week

Tidy up the code for q1 if possible
Keep brainstorming to find another question for exploratory analysis