Week 9 Learning Log

Julia Chen

01/08/2021

This week’s goals

  • Fully understand my code for my 1st exploratory question
  • Continue brainstorming ideas for my exploratory analysis, aim to at least think of 2 more ideas even if I can’t finish all the code by the end of this week

Brainstorming ideas for exploratory analysis

  • See if there is any other variable related to sleep that produces a significant correlation with differential bias change. The authors explored the correlations between each type of sleep stage recorded in their data with differential bias change and none of them were statistically significant

  • I wonder if there are differences between men and women in their implicit bias levels following TMR or counter stereotype training only (averaged across gender and race implicit bias). This idea is still a work in progress: there are lots of time frames and different types of implicit bias measures that I could use, so I’ll need to think of a narrower direction and a rationale to go along with it before I start coding.

  • Are there differences in the effectiveness of TMR and counter stereotype training only depending on experiment compensation? This is my backup question if I can’t figure out some of the other questions.

loading libraries

I’m going to first load my libraries for coding later

library(tidyverse) #for dplyr and ggplot2 to conduct data wrangling and visualization
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(gt) #for creating highly customisable tables 
library(janitor) #to clean names
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(rstatix) #for inferential statistics
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:janitor':
## 
##     make_clean_names
## The following object is masked from 'package:stats':
## 
##     filter
library(jmv) #for inferential statistics
library(ggpubr) #for t tests and other inferential statistics
library(ggeasy) #for easy ggplot functions
library(corrplot) #for correlation matrices
## corrplot 0.89 loaded

Question 2: Do other measures of sleep quality correlate with differential bias change?

In part 2 of the verification report, our group reproduced this graph:

This graph shows the non-significant correlation between SWS x REM sleep duration (min) and differential bias change (the difference between the cued and uncued conditions in their respective change in implicit bias levels across time). The authors then explored whether time spent in other stages of sleep was correlated with differential bias change. They didn’t find a significant correlation between the time spent in any of the other sleep stages (non-REM 1, non-REM 2, SWS (slow wave sleep) individually, REM sleep individually) and differential bias change. They also didn’t find a significant correlation between the length of time the cue was presented during sleep and differential bias change (though the p value for this correlation was the closest to being significant compared to the other correlations).

When revisiting this graph, I was curious as to why the authors measured time in slow wave sleep (SWS) multiplied by time in REM sleep, because it wasn’t clear to me what this product actually measured. I first thought perhaps they did that because SWS and REM are more strongly correlated with each other than the other sleep stages are. So to check, I used the cor function (from the stats package) to create a correlation matrix between time spent in all of the sleep stages, and corrplot from the corrplot package to plot it. select allows me to pick the variables to create the matrix from:

cleandata <- read.csv("cleandata.csv") #reading in the cleandata frame saved from earlier

q2_corr_matrix <- cleandata %>% #I decided to use cleandata because I didn't want to change the exclusion criteria from what the authors originally decided on
  select(total_sleep, wake_amount, nrem1_amount, nrem2_amount, sws_amount, rem_amount) %>% 
  cor()

corrplot(q2_corr_matrix)

So it turns out that this isn’t true: the time spent in each of the sleep stages didn’t really correlate with the time spent in any of the others. So I had to refer to the original paper (Hu et al., 2015) that ours replicated to figure out that SWS x REM was used as a measure of sleep quality, with a higher value indicating greater sleep quality.

This is intriguing because other measures of sleep quality may correlate with differential bias change. One other measure of sleep quality is the proportion of time spent in REM sleep. It seems that healthy sleep is determined by REM sleep approx. accounting for 20-25% of the total time asleep in adults (Altevogt & Colten, 2006). Another measure of sleep quality is the total amount of time asleep compared to time awake.

This brings me to my second question, whether other measures of sleep quality are correlated with differential bias change.

obtaining relevant variables

First I am selecting variables that will be useful for this question using select from the ‘cleandata’ dataframe, as I am still following the exclusion criteria of the authors. These variables are the differential bias change value (diff_biaschange), time spent sleeping, awake, in REM and time in SWS x REM.

I used rename to rename the sws x rem variable so that it’s easier to work with.

Then I used mutate to create three new columns/variables of data. The first one is ‘rem_pct’, which is the percentage of the participant’s total time asleep that was spent in REM. The second one is ‘sleep_and_wake’, which is the total time the participant spent awake and asleep. This value is used to create the third variable ‘asleep_pct’, which is the percentage of that total time that the participant spent asleep rather than awake.

q2 <- cleandata %>% 
  select(diff_biaschange, diff_biaschange_cued, diff_biaschange_uncued, total_sleep, wake_amount, rem_amount, sw_sx_rem, cue_minutes) %>% 
  rename(sws_x_rem = sw_sx_rem) %>% 
  mutate(rem_pct = rem_amount/total_sleep * 100,
         sleep_and_wake = total_sleep + wake_amount,
         asleep_pct = total_sleep/sleep_and_wake * 100) 
  
print(q2)
##    diff_biaschange diff_biaschange_cued diff_biaschange_uncued total_sleep
## 1       0.44490584          0.371668148            -0.07323769          65
## 2      -1.01429471         -0.359624732             0.65466998          66
## 3      -1.00530440         -0.192821041             0.81248335          80
## 4      -0.49923043         -0.570273967            -0.07104354          62
## 5       0.67261133          0.590693579            -0.08191775          51
## 6      -0.03539601         -0.250479588            -0.21508358          81
## 7      -1.47683672          0.300986111             1.77782283          81
## 8      -0.15781907          0.425109806             0.58292888          67
## 9      -0.53136657          0.309977544             0.84134411          79
## 10      0.22017019          0.472346888             0.25217670          71
## 11      0.80800876          0.636426002            -0.17158276          68
## 12      0.71564723          0.172082405            -0.54356483          85
## 13      0.69400342          0.531805761            -0.16219766          74
## 14      0.36763005          0.725511445             0.35788139          73
## 15     -1.11173748         -0.793593406             0.31814407          79
## 16     -0.82626125         -0.920808186            -0.09454693          37
## 17     -0.29533509         -0.037791997             0.25754309          52
## 18      1.26682255          1.019294435            -0.24752811          62
## 19      0.25179469         -0.002406687            -0.25420138          88
## 20     -0.23329644         -0.362653222            -0.12935678          70
## 21      0.04315263          0.055971988             0.01281936          77
## 22      0.17642051          0.507566544             0.33114603          74
## 23      0.02712447          1.220839300             1.19371483          77
## 24     -0.53384387         -0.357015106             0.17682877          77
## 25     -0.13485088         -0.183065550            -0.04821467          84
## 26      0.34427330          0.564860845             0.22058754          79
## 27     -0.31348070         -0.127572774             0.18590793          65
## 28      0.20937037         -0.140763994            -0.35013437          61
## 29     -0.78933933         -0.093405722             0.69593361          75
## 30      0.28747433          0.282019388            -0.00545494          80
## 31     -0.01955842         -0.148474613            -0.12891620          84
##    wake_amount rem_amount sws_x_rem cue_minutes   rem_pct sleep_and_wake
## 1           25         23       276         9.5 35.384615             90
## 2           24          0         0        12.0  0.000000             90
## 3           10         17       408        15.5 21.250000             90
## 4           28         17       408        16.0 27.419355             90
## 5           39          2        32        15.0  3.921569             90
## 6            9         18       648        25.0 22.222222             90
## 7            9          9       333        23.0 11.111111             90
## 8           23          9        36         3.0 13.432836             90
## 9           11         24       552        18.5 30.379747             90
## 10          19         11       275        19.0 15.492958             90
## 11          22         10        70        12.5 14.705882             90
## 12           5         16       496        28.0 18.823529             90
## 13          16          0         0        32.5  0.000000             90
## 14          17         17       476        28.0 23.287671             90
## 15          11         11       363        36.0 13.924051             90
## 16          53          0         0        14.0  0.000000             90
## 17          38          9       198        19.0 17.307692             90
## 18          28          0         0         2.5  0.000000             90
## 19           2          9       216        26.0 10.227273             90
## 20          20         16       240        18.0 22.857143             90
## 21          13         10       450        44.0 12.987013             90
## 22          16         18       684        29.5 24.324324             90
## 23          13         22       836        37.5 28.571429             90
## 24          13         23       506        28.5 29.870130             90
## 25           6          0         0        40.0  0.000000             90
## 26          11         19       418         3.5 24.050633             90
## 27          25         24       480        27.0 36.923077             90
## 28          29          5        50        11.0  8.196721             90
## 29          15          0         0        33.5  0.000000             90
## 30          10         12       336        17.5 15.000000             90
## 31           6         14       224        23.5 16.666667             90
##    asleep_pct
## 1    72.22222
## 2    73.33333
## 3    88.88889
## 4    68.88889
## 5    56.66667
## 6    90.00000
## 7    90.00000
## 8    74.44444
## 9    87.77778
## 10   78.88889
## 11   75.55556
## 12   94.44444
## 13   82.22222
## 14   81.11111
## 15   87.77778
## 16   41.11111
## 17   57.77778
## 18   68.88889
## 19   97.77778
## 20   77.77778
## 21   85.55556
## 22   82.22222
## 23   85.55556
## 24   85.55556
## 25   93.33333
## 26   87.77778
## 27   72.22222
## 28   67.77778
## 29   83.33333
## 30   88.88889
## 31   93.33333

identifying outliers

Then I’m using identify_outliers from the rstatix package to see if anyone’s ‘asleep_pct’ is an outlier.

q2_outliers <- q2 %>% 
  identify_outliers(asleep_pct)

print(q2_outliers)
##   diff_biaschange diff_biaschange_cued diff_biaschange_uncued total_sleep
## 1      -0.8262613           -0.9208082            -0.09454693          37
##   wake_amount rem_amount sws_x_rem cue_minutes rem_pct sleep_and_wake
## 1          53          0         0          14       0             90
##   asleep_pct is.outlier is.extreme
## 1   41.11111       TRUE      FALSE

One participant (row 16 of q2, asleep for 41.1% of the time) is flagged as an outlier, but there are no extreme outliers, so I will not be filtering this dataset any further.
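To make the outlier check less of a black box, here is a minimal base R sketch of the boxplot rule that, as far as I understand it, identify_outliers applies: a value is an outlier if it lies more than 1.5 x IQR outside the quartiles, and extreme if it lies more than 3 x IQR outside them. The total_sleep values are copied from the printed q2 table; the names is_outlier and is_extreme are my own and are not rstatix output.

```r
# total_sleep (minutes) copied from the printed q2 table
total_sleep <- c(65, 66, 80, 62, 51, 81, 81, 67, 79, 71, 68, 85, 74, 73, 79, 37,
                 52, 62, 88, 70, 77, 74, 77, 77, 84, 79, 65, 61, 75, 80, 84)

# Every participant has sleep_and_wake = 90 in the table
asleep_pct <- total_sleep / 90 * 100

# Quartiles and interquartile range (default quantile type, an assumption)
q <- unname(quantile(asleep_pct, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Boxplot rule: 1.5 x IQR fences for outliers, 3 x IQR fences for extreme points
is_outlier <- asleep_pct < q[1] - 1.5 * iqr | asleep_pct > q[2] + 1.5 * iqr
is_extreme <- asleep_pct < q[1] - 3 * iqr | asleep_pct > q[2] + 3 * iqr

which(is_outlier)  # row numbers flagged as outliers
any(is_extreme)    # TRUE only if some point is beyond the 3 x IQR fences
```

Under these assumptions, the only flagged row should be the participant who slept 37 of the 90 minutes, in line with the identify_outliers output.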

data visualisation

Now to create a ggplot to visualise the correlation between quality of sleep defined as percentage of time spent asleep, and differential bias change:

I’m using geom_point to create a scatterplot, and adding another layer, geom_smooth, to add the line of best fit, specifying the line to be linear with method = lm, and removing the confidence interval shading with se = F.

q2.1_pic <- ggplot(q2, aes(asleep_pct, diff_biaschange))+
  geom_point()+
  geom_smooth(method = lm, se = F)

plot(q2.1_pic)
## `geom_smooth()` using formula 'y ~ x'

another <- ggplot(q2, aes(asleep_pct, diff_biaschange_cued))+
  geom_point()+
  geom_smooth(method = lm, se = F)

plot(another)
## `geom_smooth()` using formula 'y ~ x'

There doesn’t seem to be a correlation between these two variables. Before moving on to inferential statistics, let’s check how many people fall into the 20-25% range of REM sleep using the same ggplot functions as the previous graph:

q2_pic <- ggplot(q2, aes(rem_pct, diff_biaschange))+
  geom_point()+
  geom_smooth(method = lm, se = F)

plot(q2_pic)
## `geom_smooth()` using formula 'y ~ x'

It doesn’t look like many people do.

Originally I was going to filter the q2 data frame to exclude all participants who did not have good quality sleep, i.e. those whose rem_pct fell outside the 20-25% range stated previously, and then use that sample to investigate the correlation between SWS x REM and differential bias change. However, as we can see from the graph above, only 6 participants meet this criterion, which wouldn’t be a large enough sample to conduct my intended analysis.
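The count of participants in the 20-25% REM range can also be checked directly rather than read off the graph. This is a small base R sketch using the rem_amount and total_sleep columns copied from the printed q2 table; in_healthy_range is a name I made up for illustration.

```r
# Columns copied from the printed q2 table (minutes)
rem_amount <- c(23, 0, 17, 17, 2, 18, 9, 9, 24, 11, 10, 16, 0, 17, 11, 0,
                9, 0, 9, 16, 10, 18, 22, 23, 0, 19, 24, 5, 0, 12, 14)
total_sleep <- c(65, 66, 80, 62, 51, 81, 81, 67, 79, 71, 68, 85, 74, 73, 79, 37,
                 52, 62, 88, 70, 77, 74, 77, 77, 84, 79, 65, 61, 75, 80, 84)

# Percentage of sleep time spent in REM, same formula as rem_pct in q2
rem_pct <- rem_amount / total_sleep * 100

# Flag participants whose REM percentage falls inside the 20-25% range
in_healthy_range <- rem_pct >= 20 & rem_pct <= 25

sum(in_healthy_range)    # how many participants meet the criterion
which(in_healthy_range)  # which rows they are
```

With dplyr, the same subset could presumably be taken with filter(q2, rem_pct >= 20, rem_pct <= 25).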

Therefore I will be only using statistical analysis to look at the correlation between asleep_pct and differential bias change.

inferential statistics

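I’m using the function lm from the stats package to fit a simple linear regression between asleep_pct and diff_biaschange, as seen below. Note that the formula puts asleep_pct on the left-hand side as the outcome, which is the reverse of the graph, but with a single predictor the p-value and R squared are the same either way round.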

q2_stats <- lm(asleep_pct ~ diff_biaschange, data = q2)
summary(q2_stats)
## 
## Call:
## lm(formula = asleep_pct ~ diff_biaschange, data = q2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.047  -7.058   2.969   8.470  18.261 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      79.6662     2.3039  34.578   <2e-16 ***
## diff_biaschange  -0.5955     3.6589  -0.163    0.872    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.73 on 29 degrees of freedom
## Multiple R-squared:  0.0009124,  Adjusted R-squared:  -0.03354 
## F-statistic: 0.02648 on 1 and 29 DF,  p-value: 0.8719

As expected after looking at my graph, the correlation between quality of sleep (proportion of time spent asleep compared to awake) and differential bias change is not significant (F(1, 29) = 0.02648, p-value = 0.8719, multiple R^2 = 0.0009124, adjusted R^2 = -0.03354).

conclusions

Another measure of sleep quality (the proportion of time spent asleep compared to awake) is not correlated with differential bias change, just like SWS x REM (mins).

thoughts

The p-value obtained is quite large, so instead of doing further exploratory analysis on this measure of quality of sleep, I wanted to follow up on the correlation between minutes of the cue presented and differential bias change, since it did have the lowest p-value out of all the correlations explored by the authors.
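Because the summary output prints two R squared values, here is a short sketch of how they relate, using the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1). The function name adjusted_r2 is my own, n = 31 is taken from the 29 residual degrees of freedom plus the two estimated coefficients, and p = 1 is the single predictor.

```r
# Hypothetical helper: adjusted R squared from multiple R squared,
# sample size n, and number of predictors p
adjusted_r2 <- function(r2, n, p = 1) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Multiple R squared from the asleep_pct model summary
adj <- adjusted_r2(0.0009124, n = 31)
adj
```

The penalty for sample size and number of predictors is why the adjusted value can dip below zero when the multiple R squared is close to zero, as it is here.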

follow up - replicating exploratory analysis from the authors: the correlation between cue_minutes and differential bias change

visualising the data

Because the q2 dataframe I made earlier has the relevant x and y variables (cue_minutes and diff_biaschange), I am creating a ggplot similar to all previous graphs from that data frame.

q2_cue <- ggplot(q2, aes(cue_minutes, diff_biaschange))+
  geom_point()+
  geom_smooth(method = lm, se = F)

plot(q2_cue)
## `geom_smooth()` using formula 'y ~ x'

It looks like there is a negative correlation here, so I’m going to use lm just like before to see if I can reproduce the values from the paper

q2_cue_stats <- lm(cue_minutes ~ diff_biaschange, data = q2)
summary(q2_cue_stats)
## 
## Call:
## lm(formula = cue_minutes ~ diff_biaschange, data = q2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.810  -7.908  -2.911   8.277  22.815 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       21.319      1.959  10.882 9.38e-12 ***
## diff_biaschange   -3.109      3.111  -0.999    0.326    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.82 on 29 degrees of freedom
## Multiple R-squared:  0.03328,    Adjusted R-squared:  -5.004e-05 
## F-statistic: 0.9985 on 1 and 29 DF,  p-value: 0.3259

conclusions

Total number of minutes the cue is played for during sleep is not correlated with differential bias change (F(1, 29) = 0.9985, p-value = 0.326, multiple R^2 = 0.03328). Checking against the results from the paper, these results match.
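Since the question is really about a correlation, a Pearson test with cor.test from the stats package is a more direct route than lm. The sketch below uses the cue_minutes and diff_biaschange columns copied from the printed q2 table (diff_biaschange rounded to 3 decimal places, so the numbers may differ very slightly from the summary output), and compares the result against the regression in both directions.

```r
# Columns copied from the printed q2 table (diff_biaschange rounded to 3 dp)
diff_biaschange <- c(0.445, -1.014, -1.005, -0.499, 0.673, -0.035, -1.477,
                     -0.158, -0.531, 0.220, 0.808, 0.716, 0.694, 0.368,
                     -1.112, -0.826, -0.295, 1.267, 0.252, -0.233, 0.043,
                     0.176, 0.027, -0.534, -0.135, 0.344, -0.313, 0.209,
                     -0.789, 0.287, -0.020)
cue_minutes <- c(9.5, 12, 15.5, 16, 15, 25, 23, 3, 18.5, 19, 12.5, 28, 32.5,
                 28, 36, 14, 19, 2.5, 26, 18, 44, 29.5, 37.5, 28.5, 40, 3.5,
                 27, 11, 33.5, 17.5, 23.5)

# Pearson correlation test (the default method of cor.test)
pearson <- cor.test(cue_minutes, diff_biaschange)

# Simple regressions in both directions, for comparison
fit <- summary(lm(cue_minutes ~ diff_biaschange))
fit_swapped <- summary(lm(diff_biaschange ~ cue_minutes))

unname(pearson$estimate)                          # Pearson r
pearson$p.value                                   # p-value of the correlation
fit$coefficients["diff_biaschange", "Pr(>|t|)"]   # p-value of the slope
c(fit$r.squared, fit_swapped$r.squared)           # both equal r squared
```

With one predictor, the slope's p-value equals the correlation's p-value and the multiple R squared equals r squared, whichever variable is on the left of the formula.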

following on

But I want to take things one step further and apply the same logic as the previous analysis, looking at the percentage of time that the cue was presented relative to the total amount of time asleep, rather than just the total time the cue was presented. The rationale behind TMR (targeted memory reactivation) is that sleep helps strengthen the memory of counter stereotype training when a cue associated with that training is played during sleep. Since the participants were asleep for a large range of times, and had the cue presented to them for a range of times, perhaps looking at the ratio between cue played (cue_minutes) and time asleep (total_sleep) would result in a stronger correlation with differential bias change (diff_biaschange).

obtaining my variables

First I’m creating a new dataframe from q2 with an extra variable, created using mutate, which calculates the cue time as a percentage of the time the participant was asleep.

q2_cueextra <- q2 %>% 
  mutate(cue_pct = cue_minutes/total_sleep*100)

print(q2_cueextra)
##    diff_biaschange diff_biaschange_cued diff_biaschange_uncued total_sleep
## 1       0.44490584          0.371668148            -0.07323769          65
## 2      -1.01429471         -0.359624732             0.65466998          66
## 3      -1.00530440         -0.192821041             0.81248335          80
## 4      -0.49923043         -0.570273967            -0.07104354          62
## 5       0.67261133          0.590693579            -0.08191775          51
## 6      -0.03539601         -0.250479588            -0.21508358          81
## 7      -1.47683672          0.300986111             1.77782283          81
## 8      -0.15781907          0.425109806             0.58292888          67
## 9      -0.53136657          0.309977544             0.84134411          79
## 10      0.22017019          0.472346888             0.25217670          71
## 11      0.80800876          0.636426002            -0.17158276          68
## 12      0.71564723          0.172082405            -0.54356483          85
## 13      0.69400342          0.531805761            -0.16219766          74
## 14      0.36763005          0.725511445             0.35788139          73
## 15     -1.11173748         -0.793593406             0.31814407          79
## 16     -0.82626125         -0.920808186            -0.09454693          37
## 17     -0.29533509         -0.037791997             0.25754309          52
## 18      1.26682255          1.019294435            -0.24752811          62
## 19      0.25179469         -0.002406687            -0.25420138          88
## 20     -0.23329644         -0.362653222            -0.12935678          70
## 21      0.04315263          0.055971988             0.01281936          77
## 22      0.17642051          0.507566544             0.33114603          74
## 23      0.02712447          1.220839300             1.19371483          77
## 24     -0.53384387         -0.357015106             0.17682877          77
## 25     -0.13485088         -0.183065550            -0.04821467          84
## 26      0.34427330          0.564860845             0.22058754          79
## 27     -0.31348070         -0.127572774             0.18590793          65
## 28      0.20937037         -0.140763994            -0.35013437          61
## 29     -0.78933933         -0.093405722             0.69593361          75
## 30      0.28747433          0.282019388            -0.00545494          80
## 31     -0.01955842         -0.148474613            -0.12891620          84
##    wake_amount rem_amount sws_x_rem cue_minutes   rem_pct sleep_and_wake
## 1           25         23       276         9.5 35.384615             90
## 2           24          0         0        12.0  0.000000             90
## 3           10         17       408        15.5 21.250000             90
## 4           28         17       408        16.0 27.419355             90
## 5           39          2        32        15.0  3.921569             90
## 6            9         18       648        25.0 22.222222             90
## 7            9          9       333        23.0 11.111111             90
## 8           23          9        36         3.0 13.432836             90
## 9           11         24       552        18.5 30.379747             90
## 10          19         11       275        19.0 15.492958             90
## 11          22         10        70        12.5 14.705882             90
## 12           5         16       496        28.0 18.823529             90
## 13          16          0         0        32.5  0.000000             90
## 14          17         17       476        28.0 23.287671             90
## 15          11         11       363        36.0 13.924051             90
## 16          53          0         0        14.0  0.000000             90
## 17          38          9       198        19.0 17.307692             90
## 18          28          0         0         2.5  0.000000             90
## 19           2          9       216        26.0 10.227273             90
## 20          20         16       240        18.0 22.857143             90
## 21          13         10       450        44.0 12.987013             90
## 22          16         18       684        29.5 24.324324             90
## 23          13         22       836        37.5 28.571429             90
## 24          13         23       506        28.5 29.870130             90
## 25           6          0         0        40.0  0.000000             90
## 26          11         19       418         3.5 24.050633             90
## 27          25         24       480        27.0 36.923077             90
## 28          29          5        50        11.0  8.196721             90
## 29          15          0         0        33.5  0.000000             90
## 30          10         12       336        17.5 15.000000             90
## 31           6         14       224        23.5 16.666667             90
##    asleep_pct   cue_pct
## 1    72.22222 14.615385
## 2    73.33333 18.181818
## 3    88.88889 19.375000
## 4    68.88889 25.806452
## 5    56.66667 29.411765
## 6    90.00000 30.864198
## 7    90.00000 28.395062
## 8    74.44444  4.477612
## 9    87.77778 23.417722
## 10   78.88889 26.760563
## 11   75.55556 18.382353
## 12   94.44444 32.941176
## 13   82.22222 43.918919
## 14   81.11111 38.356164
## 15   87.77778 45.569620
## 16   41.11111 37.837838
## 17   57.77778 36.538462
## 18   68.88889  4.032258
## 19   97.77778 29.545455
## 20   77.77778 25.714286
## 21   85.55556 57.142857
## 22   82.22222 39.864865
## 23   85.55556 48.701299
## 24   85.55556 37.012987
## 25   93.33333 47.619048
## 26   87.77778  4.430380
## 27   72.22222 41.538462
## 28   67.77778 18.032787
## 29   83.33333 44.666667
## 30   88.88889 21.875000
## 31   93.33333 27.976190

Then I’m using the same ggplot as before to visualise the correlation

q2_cueextra_pic <- ggplot(q2_cueextra, aes(cue_pct, diff_biaschange))+
  geom_point()+
  geom_smooth(method = lm, se = F)

plot(q2_cueextra_pic)
## `geom_smooth()` using formula 'y ~ x'

It seems that it is more sloped than the cue_minutes x diff_biaschange graph. To test if the correlation is significant I used the same lm function as previously:

q2_cueextra_stats <- lm(cue_pct ~ diff_biaschange, data = q2_cueextra) #saved under a new name so the q2_cueextra data frame isn't overwritten
summary(q2_cueextra_stats)
## 
## Call:
## lm(formula = cue_pct ~ diff_biaschange, data = q2_cueextra)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.672  -8.277   1.298  10.678  27.951 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       29.398      2.411  12.192 6.16e-13 ***
## diff_biaschange   -4.766      3.829  -1.245    0.223    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.32 on 29 degrees of freedom
## Multiple R-squared:  0.0507, Adjusted R-squared:  0.01797 
## F-statistic: 1.549 on 1 and 29 DF,  p-value: 0.2233

Ok the p-value is lower but it still isn’t significant.

conclusions

The proportion of time asleep that the cue is played for is not correlated with differential bias change (F(1, 29) = 1.549, p-value = 0.2233, multiple R^2 = 0.0507).

Challenges and Successes

Wow this week was tough. I had some troubles thinking of cool exploratory analysis questions that had a rationale to back them up. Hopefully what I’ve got is good enough.

I’m not so sure that my logic follows through for this question, especially since it’s a little different from my first question in the sense that it has a sort of follow up section with further analysis, and I have also omitted statistical analysis in part of the code (though I did provide a reason for it)

I’m proud of myself for working on a question even though I originally wanted to explore 2 more questions, it has been a busy and tough week for me.

Questions

  • I would like to know if the logic for this exploratory question works/makes sense, as I do tend to get lost in what I’m doing sometimes if I spend a lot of time on it, so a fresh pair of eyes would be great :)

  • When reporting the R squared value, which one is the appropriate one to use - the multiple R squared value or the adjusted R squared value?

Next week

  • Tidy up code by revising how to make my own function so I don’t have to type very similar code for very similar graphs 4 times, and adding aesthetics to my graphs

  • Make a start on my final exploratory analysis question and finish the code/my first attempt in time for the qna session
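As a starting point for the tidy-up, here is a rough sketch of what a reusable plotting function could look like. The name scatter_with_fit and the small example_q2 frame (first five rows of the printed q2 table, diff_biaschange rounded) are my own inventions for illustration, and the .data[[...]] pronoun is one of several ways to pass column names as strings into aes.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of two columns with a straight line of best fit
scatter_with_fit <- function(data, x, y) {
  ggplot(data, aes(.data[[x]], .data[[y]])) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(x = x, y = y)  # label the axes with the column names
}

# Tiny example frame: first five rows of the printed q2 table
example_q2 <- data.frame(
  cue_minutes = c(9.5, 12, 15.5, 16, 15),
  diff_biaschange = c(0.445, -1.014, -1.005, -0.499, 0.673)
)

p <- scatter_with_fit(example_q2, "cue_minutes", "diff_biaschange")
```

Each of the four graphs in this log would then be a single call such as scatter_with_fit(q2, "asleep_pct", "diff_biaschange"), with aesthetics added once inside the function.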