MLM stuff from FSHD617C assignment

About

Overview

This ADA uses data from the project “The effect of intranasal oxytocin on neural functioning in widow(er)s” (#1503744420, PI: Mary-Frances O’Connor, Co-PI: Brian Arizmendi). It was a double-blind randomized crossover study, designed to address conflicting accounts of motivated behavior in complicatedgrief using a novel approach-avoidance task variant (grief AAT; gAAT) that allows for comparison between participant-provided photos and nomothetic grief stimuli. Participants were 40 English-speaking adults ages 55-80, who experienced the death of a spouse or long-term romantic partner between six months and three years prior to their participation.

Task

Participants attended two identical experimental sessions, at which they received one of two intranasal sprays (treatment A/B) and then took part in an fMRI scan, which involved a structural sequence, the gAAT, and a resting state sequence.

For the gAAT, five stimulus categories were presented: (1) deceased spouse, (2) living loved one [attachment figure], (3) stranger, (4) generic grief-related scenes, and (5) neutral scenes. Photos provided by the participant were scanned and resized to a standard 750-pixel height and width, and framed in colored frames to match the stimuli used for non-idiographic categories (i.e., stranger, grief-related, neutral). Photos of the stranger were sex-matched to the participant’s partner.

During the task, participants were instructed to follow rules based on an irrelevant characteristic of the stimuli, namely, the color of the frame around the photo (e.g., “push the joystick when the frame is blue, pull when the frame is yellow”). The task was completed in two runs of approximately seven minutes each. In the second run, the instructions were reversed (e.g. “now pull the joystick for blue framed photos”). Order of instructions was counterbalanced across participants. Stimuli were presented via Inquisit 4 (2014), in a pseudorandomized order determined by genetic algorithm to optimize statistical power and psychological validity (Wager & Nichols, 2003). Each trial last 3000ms, with 2500 ms allowed for response. Intertrial interval was 500ms.

Participants completed the task using an MRI-compatible joystick. The stimuli were animated so that a joystick push would make the image smaller (as if they were pushing it away from them) and a joystick pull would make the image larger (as if they were bringing it nearer to them).

Variables

The primary dependent variable of interest for the present analysis is reaction time bias, computed as RTpush-RTpull. Negative bias scores (faster to push than pull) indicate relative avoidance bias, while positive scores (faster to pull than to push) indicate relative approach bias. Because the two runs were identical in every way except for the instruction provided, each trial had both a push RT and a pull RT (e.g., run1 trial1 was push, run2 trial1 was pull, so subtract RT on run2 trial 1 from run1 trial1 to get the difference - this is further elaborated in the data cleaning script below), unless they failed to reverse the instructions.

Main predictors of interest included trial number (in order to establish whether peoples’ response bias changed over the 7-minute run), stimulus (neutral, spouse, stranger, living loved one, death), and condition (treatment A/B - oxytocin or placebo [I’m still blinded]).

Participants also completed a number of self-report measures (state and trait) and provided biological samples for OXTR genotyping and cortisol assay, some of which were used as predictors.

Equations

Level 1:

We first performed MLM to estimate the effect of trial number (i.e., the random effect of time) on response bias (push RT-pull RT) on a given trial. The level-1 model estimated, for each participant j (j = 1–35), a regression line that predicted each participant’s response bias in milliseconds on each trial i from the trial number. This model was represented as follows:
[biasi]j = b0j + b1j [trialnumberi] + rij

where b0j, the intercept, is interpreted as participant j’s mean response bias for a given trial; b1j, the slope, represents the effect of trial number on response bias for each participant j (positive coefficients represent the predictive magnitude of trial number on response bias for a given participant); and rij is the residual error term.

Level 2:

The level-2 models estimated the average effects for the entire sample. The models were represented as follows:

b0j = γ00 + μ0j

b1j = γ10 + μ1j

where the intercept, γ00, is interpreted as the average response bias for the entire sample; γ10 is the average effect of represents the effect of trial number on response bias for each participant j (positive coefficients represent the predictive magnitude of trial number on response bias for the sample as a whole), and thus, the primary estimate of interest; and μ0j and μ1j are the residual error terms.

Adding the fixed effect of stimulus type resulted in the following model:

b0j = γ00 + γ01(stimulusi) + μ0j

b1j = γ10 + γ11(stimulusi) + μ1j

Data cleaning

Note: Went back to the raw data due to some inconsistencies in the OT_ADAA_Omnibus.xlsx file from BA that I found.

Pre-R steps:

  1. Removed all .iqdat files under 25KB or so in order to filter out pilot data and false starts.
  2. Removed non-study data (i.e., tests and OT undergrad files).
  3. Organized raw .iqdat files for each visit by run 1/run 2 (two separate folders) based on date/time stamp.

    In Terminal, within each of the folders in turn…

  4. Changed file extensions to .tsv (tab-separated) from .iqdat: find . -iname "*.iqdat" -exec bash -c 'mv "$0" "${0%\.iqdat}.tsv"' {} \;
  5. Merged the files: cat *.tsv > OT_gAAT_run-1.tsv (or OT_gAAT_run-2.tsv) 5. Merged the files: mlr --tsvlite cat *.tsv > OT_gAAT_run-1.tsv (or OT_gAAT_run-2.tsv)
      5a. mlr --tsvlite drops the header from all but the first file, so column names are not repeated throughout the dataset.
Notes:
  • D117: Had 3 visits due to scanner technical issues. Use “_b" and “_c" as visit 1 and visit 2 respectively.
  • D114: Has multiple files from visit 2. Use the last two.
  • D135: Use files from 6/06/16 and 6/17/18 (scanner issues on 06/13/16). Use “_b" and “_c" as visit 1 and visit 2 respectively.
  • D147: Visit 1 run 2 is partial data for some reason (I do have notes that she switched back to PUSH-YELLOW during run 2, but did not re-run because running way over time.)

Creating the MLM dataset

Import data

library(tidyverse)
filter <- dplyr::filter
select <- dplyr::select
# read in the data (only importing a subset of columns)
raw_r1 <- data.frame(read_tsv("~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv", col_types=cols_only(
  date = col_integer(),
  time = col_character(),
  subject = col_character(),
  blockcode = col_character(),
  blocknum = col_character(),
  trialcode = col_character(),
  values.trialcode = col_character(),
  values.stimulus = col_character(),
  values.initialresponse = col_character(),
  values.RT = col_number()
)))
number of columns of result is not a multiple of vector length (arg 1)154 parsing failures.
row # A tibble: 5 x 5 col     row col       expected   actual file                                                    expected   <int> <chr>     <chr>      <chr>  <chr>                                                   actual 1   327 date      an integer date   '~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv' file 2   327 values.RT a number   .      '~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv' row 3   653 date      an integer date   '~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv' col 4   653 values.RT a number   .      '~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv' expected 5   974 date      an integer date   '~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv'
... ................................. ... ........................................................................................... ........ ........................................................................................................................................................................................................................ ...... ........................................................................................................... .... ........................................................................................................... ... ........................................................................................................... ... ........................................................................................................... ........ ...........................................................................................................
See problems(...) for more details.
raw_r2 <- data.frame(read_tsv("~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv", col_types=cols_only(
  date = col_integer(),
  time = col_character(),
  subject = col_character(),
  blockcode = col_character(),
  blocknum = col_character(),
  trialcode = col_character(),
  values.trialcode = col_character(),
  values.stimulus = col_character(),
  values.initialresponse = col_character(),
  values.RT = col_number()
)))
number of columns of result is not a multiple of vector length (arg 1)154 parsing failures.
row # A tibble: 5 x 5 col     row col       expected   actual file                                                    expected   <int> <chr>     <chr>      <chr>  <chr>                                                   actual 1   324 date      an integer date   '~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv' file 2   324 values.RT a number   .      '~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv' row 3   643 date      an integer date   '~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv' col 4   643 values.RT a number   .      '~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv' expected 5   954 date      an integer date   '~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv'
... ................................. ... ........................................................................................... ........ ........................................................................................................................................................................................................................ ...... ........................................................................................................... .... ........................................................................................................... ... ........................................................................................................... ... ........................................................................................................... ........ ...........................................................................................................
See problems(...) for more details.
# header line repeats when I used `cat` to merge, so get those out of there
raw_r1 <- raw_r1 %>% filter(!is.na(date))
raw_r2 <- raw_r2 %>% filter(!is.na(date))
unique(raw_r1$subject)
 [1] "D101"   "D101_2" "D102"   "D102_B" "D103"   "D103_b" "D104"   "D104_b" "D105"   "D105_b" "D107"   "D107_b" "D110"   "D110_b" "D113"   "D113_b" "D114"   "D114_b" "D115"  
[20] "D115_b" "D117_b" "D117_c" "test"   "D118_b" "D119"   "D119_b" "D120"   "D120_b" "D121"   "D121_b" "D122"   "D122_b" "D123"   "D123_b" "D125"   "D125_b" "D126"   "D126_b"
[39] "D127"   "D127_b" "D128"   "D128_b" "D129"   "D129_b" "D130"   "D131"   "D131_b" "D132"   "D132_b" "D133"   "D133_b" "D134"   "D134_b" "D135"   "D135_c" "D137"   "D137_b"
[58] "D138"   "D138_b" "D139"   "D139_b" "D140"   "D140_b" "D141"   "D141_b" "D142"   "D144"   "D144_b" "D145"   "D145_b" "D146"   "D146_b" "D147"   "D147_b" "D148"   "D148_b"
# D118 is showing up as "test" for some reason (checked against the .iqdat file), and D130_b is missing
# both of D130's and D142's visits were entered as "D130"/"D142", so change the subject ID for the visit on 052416 to D130_b and on 
raw_r1$subject[raw_r1$date == 020916] <- "D118"
raw_r1$subject[raw_r1$date == 052416] <- "D130_b"
raw_r1$subject[raw_r1$date == 111516] <- "D142_b"
# fixing some other stuff in a similar vein
raw_r1$subject[raw_r1$subject == "D101_2"] <- "D101_b"
raw_r1$subject[raw_r1$subject == "D102_B"] <- "D102_b"
raw_r1$subject[raw_r1$subject == "D117_b"] <- "D117"
raw_r1$subject[raw_r1$subject == "D117_c"] <- "D117_b"
raw_r1$subject[raw_r1$subject == "D135_c"] <- "D135_b"
unique(raw_r1$subject) # 78 unique - that is correct
 [1] "D101"   "D101_b" "D102"   "D102_b" "D103"   "D103_b" "D104"   "D104_b" "D105"   "D105_b" "D107"   "D107_b" "D110"   "D110_b" "D113"   "D113_b" "D114"   "D114_b" "D115"  
[20] "D115_b" "D117"   "D117_b" "D118"   "D118_b" "D119"   "D119_b" "D120"   "D120_b" "D121"   "D121_b" "D122"   "D122_b" "D123"   "D123_b" "D125"   "D125_b" "D126"   "D126_b"
[39] "D127"   "D127_b" "D128"   "D128_b" "D129"   "D129_b" "D130"   "D130_b" "D131"   "D131_b" "D132"   "D132_b" "D133"   "D133_b" "D134"   "D134_b" "D135"   "D135_b" "D137"  
[58] "D137_b" "D138"   "D138_b" "D139"   "D139_b" "D140"   "D140_b" "D141"   "D141_b" "D142"   "D142_b" "D144"   "D144_b" "D145"   "D145_b" "D146"   "D146_b" "D147"   "D147_b"
[77] "D148"   "D148_b"
unique(raw_r2$subject)
 [1] "D101"     "D101_2Y"  "D102"     "D102_B"   "D103"     "D103_b"   "D104"     "D104_b"   "D105"     "D105_b"   "D107"     "D107_B_2" "D110"     "D110_b"   "D113"     "D113_b"  
[17] "D114"     "D114_b"   "D115"     "D115_b"   "D117_b"   "D117_c"   "D118"     "D118_b"   "D119"     "D119_b"   "D120"     "D120_b"   "D121"     "D121_b"   "D122"     "D122_b"  
[33] "D123"     "D123_b"   "D125"     "D125_b"   "D126"     "D126_B"   "D127"     "D127_b"   "D128"     "D128_b"   "D129"     "D129_b"   "D130"     "D130_b"   "D131"     "D131_b"  
[49] "D132"     "D132_b"   "D133"     "D133_b"   "D134"     "D134_b"   "D135"     "D135_c"   "D137"     "D137_b"   "D138"     "D138_b"   "D139"     "D139_b"   "D140"     "D140_b"  
[65] "D141"     "D141_b"   "D142"     "D144"     "D144_b"   "D145"     "D145_b"   "D146"     "D146_b"   "D147"     "D147_b"   "D148"     "D148_b"  
# some similar issues to fix
raw_r2$subject[raw_r2$subject == "D101_2Y"] <- "D101_b"
raw_r2$subject[raw_r2$subject == "D102_B"] <- "D102_b"
raw_r2$subject[raw_r2$subject == "D107_B_2"] <- "D107_b"
raw_r2$subject[raw_r2$subject == "D117_b"] <- "D117"
raw_r2$subject[raw_r2$subject == "D117_c"] <- "D117_b"
raw_r2$subject[raw_r2$subject == "D126_B"] <- "D126_b"
raw_r2$subject[raw_r2$subject == "D135_c"] <- "D135_b"
raw_r2$subject[raw_r2$date == 111516] <- "D142_b"
unique(raw_r2$subject) # 78 unique - that is correct
 [1] "D101"   "D101_b" "D102"   "D102_b" "D103"   "D103_b" "D104"   "D104_b" "D105"   "D105_b" "D107"   "D107_b" "D110"   "D110_b" "D113"   "D113_b" "D114"   "D114_b" "D115"  
[20] "D115_b" "D117"   "D117_b" "D118"   "D118_b" "D119"   "D119_b" "D120"   "D120_b" "D121"   "D121_b" "D122"   "D122_b" "D123"   "D123_b" "D125"   "D125_b" "D126"   "D126_b"
[39] "D127"   "D127_b" "D128"   "D128_b" "D129"   "D129_b" "D130"   "D130_b" "D131"   "D131_b" "D132"   "D132_b" "D133"   "D133_b" "D134"   "D134_b" "D135"   "D135_b" "D137"  
[58] "D137_b" "D138"   "D138_b" "D139"   "D139_b" "D140"   "D140_b" "D141"   "D141_b" "D142"   "D142_b" "D144"   "D144_b" "D145"   "D145_b" "D146"   "D146_b" "D147"   "D147_b"
[77] "D148"   "D148_b"
str(raw_r1)
'data.frame':   25746 obs. of  10 variables:
 $ date                  : int  101515 101515 101515 101515 101515 101515 101515 101515 101515 101515 ...
 $ time                  : chr  "15:38:09" "15:38:09" "15:38:09" "15:38:09" ...
 $ subject               : chr  "D101" "D101" "D101" "D101" ...
 $ blockcode             : chr  "AATpracticeB" "AATpracticeB" "AATpracticeB" "AATpracticeB" ...
 $ blocknum              : chr  "2" "2" "2" "2" ...
 $ trialcode             : chr  "practicetrial_B" "practiceincrease" "practiceincrease" "practiceincrease" ...
 $ values.trialcode      : chr  "practicetrial_B" "practicetrial_B" "practicetrial_B" "practicetrial_B" ...
 $ values.stimulus       : chr  "grayrectangle_Y.jpg" "grayrectangle_Y.jpg" "grayrectangle_Y.jpg" "grayrectangle_Y.jpg" ...
 $ values.initialresponse: chr  "PULL" "PULL" "PULL" "PULL" ...
 $ values.RT             : num  1411 1411 1411 1411 1411 ...
str(raw_r2)
'data.frame':   25457 obs. of  10 variables:
 $ date                  : int  101515 101515 101515 101515 101515 101515 101515 101515 101515 101515 ...
 $ time                  : chr  "15:48:12" "15:48:12" "15:48:12" "15:48:12" ...
 $ subject               : chr  "D101" "D101" "D101" "D101" ...
 $ blockcode             : chr  "AATpracticeA" "AATpracticeA" "AATpracticeA" "AATpracticeA" ...
 $ blocknum              : chr  "2" "2" "2" "2" ...
 $ trialcode             : chr  "practicetrial_A" "practicedecrease" "practicedecrease" "practicedecrease" ...
 $ values.trialcode      : chr  "practicetrial_A" "practicetrial_A" "practicetrial_A" "practicetrial_A" ...
 $ values.stimulus       : chr  "grayrectangle_B.jpg" "grayrectangle_B.jpg" "grayrectangle_B.jpg" "grayrectangle_B.jpg" ...
 $ values.initialresponse: chr  "PUSH" "PUSH" "PUSH" "PUSH" ...
 $ values.RT             : num  1472 1472 1472 1472 1472 ...
# take out ITIs
unique(raw_r1$trialcode)
 [1] "practicetrial_B"      "practiceincrease"     "endincrease_practice" "InterTrialInterval"   "practicetrial_A"      "practicedecrease"     "enddecrease_practice"
 [8] "AAT_1"                "AAT_5"                "AAT_6"                "AAT_3"                "AAT_7"                "AAT_4"                "AAT_8"               
[15] "AAT_9"                "AAT_2"                "AAT_10"               "AAT_11"               "AAT_12"              
mlm_r1 <- filter(raw_r1, grepl("^A",trialcode))
unique(mlm_r1$trialcode)
 [1] "AAT_1"  "AAT_5"  "AAT_6"  "AAT_3"  "AAT_7"  "AAT_4"  "AAT_8"  "AAT_9"  "AAT_2"  "AAT_10" "AAT_11" "AAT_12"
unique(raw_r2$trialcode)
 [1] "practicetrial_A"      "practicedecrease"     "enddecrease_practice" "InterTrialInterval"   "practicetrial_B"      "practiceincrease"     "endincrease_practice"
 [8] "AAT_1"                "AAT_5"                "AAT_6"                "AAT_3"                "AAT_7"                "AAT_4"                "AAT_8"               
[15] "AAT_9"                "AAT_2"                "AAT_10"               "AAT_11"               "AAT_12"              
mlm_r2 <- filter(raw_r2, grepl("^A",trialcode))
unique(mlm_r2$trialcode)
 [1] "AAT_1"  "AAT_5"  "AAT_6"  "AAT_3"  "AAT_7"  "AAT_4"  "AAT_8"  "AAT_9"  "AAT_2"  "AAT_10" "AAT_11" "AAT_12"
# add a "trialnum" variable, grouped by ID
mlm_r1 <- mlm_r1 %>% group_by(subject) %>% mutate(trialnum = row_number())
View(mlm_r2)
mlm_r2 <- mlm_r2 %>% group_by(subject) %>% mutate(trialnum = row_number())
View(mlm_r2)
mlm_r1 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, hooray!
mlm_r2 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, one person has 177
mlm_r2 %>% group_by(subject) %>% count() # that person is D122_b
# no clue why that happened, although I do vaguely remember we had one session where the task seemed to go on and on and we force quit it?
# remove trials 145-177 for D122_b
mlm_r2 <- filter(mlm_r2, trialnum <=144)
mlm_r2 %>% group_by(trialnum) %>% count(trialnum)
# now everyone has 144 trials
# add a column for "run"
mlm_r1$run <- "1"
mlm_r2$run <- "2"
# add a column for "visit"
mlm_r1$visit <- as.factor(ifelse(grepl("*_b", mlm_r1$subject), "2", "1"))
mlm_r2$visit <- as.factor(ifelse(grepl("*_b", mlm_r2$subject), "2", "1"))
mlm_r1 %>% group_by(subject) %>% count(visit) # all looks good
mlm_r2 %>% group_by(subject) %>% count(visit) # all looks good
# now remove the "_b" from visit 2 IDs
mlm_r1 <- mlm_r1 %>% ungroup() %>%
  mutate(subject = str_replace(subject, "_b", ""))
unique(mlm_r1$subject)
 [1] "D101" "D102" "D103" "D104" "D105" "D107" "D110" "D113" "D114" "D115" "D117" "D118" "D119" "D120" "D121" "D122" "D123" "D125" "D126" "D127" "D128" "D129" "D130" "D131" "D132"
[26] "D133" "D134" "D135" "D137" "D138" "D139" "D140" "D141" "D142" "D144" "D145" "D146" "D147" "D148"
mlm_r2 <- mlm_r2 %>% ungroup() %>%
  mutate(subject = str_replace(subject, "_b", ""))
unique(mlm_r2$subject)
 [1] "D101" "D102" "D103" "D104" "D105" "D107" "D110" "D113" "D114" "D115" "D117" "D118" "D119" "D120" "D121" "D122" "D123" "D125" "D126" "D127" "D128" "D129" "D130" "D131" "D132"
[26] "D133" "D134" "D135" "D137" "D138" "D139" "D140" "D141" "D142" "D144" "D145" "D146" "D147" "D148"
# merge data from the 2 runs 
mlm_bx <- bind_rows(mlm_r1, mlm_r2)
# split up by visit
mlm_v1 <- mlm_bx %>% filter(visit == "1")
mlm_v2 <- mlm_bx %>% filter(visit == "2")
mlm_v1 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, hooray!
mlm_v2 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, hooray!
# add columns for "cond" at visit 1 and visit 2 [condition: A or B]
# load in the randomization data
randomize <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/randomization_clean.rds")
head(randomize)
mlm_v1 <- mlm_v1 %>%
  rename("ID" = subject)
mlm_v2 <- mlm_v2 %>%
  rename("ID" = subject)
# get the IDs for everyone who got treatment A at visit 1 
IDs_v1_txA <- randomize %>% filter(tx_v1 == "A")
View(IDs_v1_txA)
# ifelse statement: if ID = any of those listed in txA_list, make cond_v1 = "A", else make it "B"
mlm_v1 <- mlm_v1 %>%
  mutate(cond_v1 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "A", "B")), cond_v2 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "B", "A")))
head(mlm_v1)
# ifelse statement: if ID = any of those listed in txA_list, make cond_v2 = "B", else make it "A" (people who were A at visit 1 should be B for visit 2, and vice versa)
mlm_v2 <- mlm_v2 %>%
  mutate(cond_v2 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "B", "A")), cond_v1 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "A", "B")))
levels(mlm_v1$cond_v1)
[1] "A" "B"
levels(mlm_v2$cond_v2)
[1] "A" "B"
head(mlm_v1)
  
IDs_v1_txA$ID
 [1] "D101" "D105" "D109" "D110" "D114" "D115" "D116" "D118" "D119" "D120" "D123" "D124" "D126" "D128" "D132" "D135" "D136" "D137" "D138" "D140" "D141" "D145" "D148" "D149" "D117"
mlm_v1 %>% group_by(cond_v1) %>% count(ID) # IDs and cond match txA_list
mlm_v2 %>% group_by(cond_v2) %>% count(ID) # IDs and cond match txA_list
mlm_bx <- bind_rows(mlm_v1,mlm_v2)
View(mlm_bx)
# save it
saveRDS(mlm_bx, "~/Dropbox/2018Fall/FSHD617C/mlm_bx.rds")
# rename some variables
colnames(mlm_bx)
 [1] "date"                   "time"                   "ID"                     "blockcode"              "blocknum"               "trialcode"              "values.trialcode"      
 [8] "values.stimulus"        "values.initialresponse" "values.RT"              "trialnum"               "run"                    "visit"                  "cond_v1"               
[15] "cond_v2"               
mlm_bx <- mlm_bx %>% rename("push_pull" = values.initialresponse,
                  "gAAT_RT" = values.RT)
head(mlm_bx)
# add a new variable for condition
mlm_bx1 <- mlm_bx %>% mutate(cond = as.factor(ifelse(cond_v1 == "A" & visit == "1", "A",
                              ifelse(cond_v2 == "A" & visit == "2", "A",
                                    ifelse(cond_v1 == "B" & visit == "1", "B",
                                           ifelse(cond_v2 == "B" & visit == "2", "B", NA))))))
View(mlm_bx1) # check that it coded the "cond" column correctly      
# add a new variable for stimulus category
# in values.stimulus:
# 1 = spouse, 2 = living loved one/WHOTO, 3 = stranger, 4 = nomothetic death-related, 5 = neutral images
sort(unique(mlm_bx1$values.stimulus))
 [1] "1B_1.jpg" "1B_2.jpg" "1B_3.jpg" "1Y_1.jpg" "1Y_2.jpg" "1Y_3.jpg" "2B_1.jpg" "2B_2.jpg" "2B_3.jpg" "2Y_1.jpg" "2Y_2.jpg" "2Y_3.jpg" "3B_1.jpg" "3B_2.jpg" "3B_3.jpg" "3Y_1.jpg"
[17] "3Y_2.jpg" "3Y_3.jpg" "4B_1.jpg" "4B_2.jpg" "4B_3.jpg" "4Y_1.jpg" "4Y_2.jpg" "4Y_3.jpg" "5B_1.jpg" "5B_2.jpg" "5B_3.jpg" "5Y_1.jpg" "5Y_2.jpg" "5Y_3.jpg"
mlm_bx2 <- mlm_bx1 %>%
  mutate(stim = as.factor(ifelse(values.stimulus == "1B_1.jpg"|values.stimulus == "1B_2.jpg"|values.stimulus == "1B_3.jpg"|values.stimulus == "1Y_1.jpg"|values.stimulus == "1Y_2.jpg"|values.stimulus == "1Y_3.jpg", "spouse", 
                       ifelse(values.stimulus == "2B_1.jpg"|values.stimulus == "2B_2.jpg"|values.stimulus == "2B_3.jpg"|values.stimulus == "2Y_1.jpg"|values.stimulus == "2Y_2.jpg"|values.stimulus == "2Y_3.jpg", "living",
                              ifelse(values.stimulus == "3B_1.jpg"|values.stimulus == "3B_2.jpg"|values.stimulus == "3B_3.jpg"|values.stimulus == "3Y_1.jpg"|values.stimulus == "3Y_2.jpg"|values.stimulus == "3Y_3.jpg", "stranger",
                                     ifelse(values.stimulus == "4B_1.jpg"|values.stimulus == "4B_2.jpg"|values.stimulus == "4B_3.jpg"|values.stimulus == "4Y_1.jpg"|values.stimulus == "4Y_2.jpg"|values.stimulus == "4Y_3.jpg", "death",
                                            ifelse(values.stimulus == "5B_1.jpg"|values.stimulus == "5B_2.jpg"|values.stimulus == "5B_3.jpg"|values.stimulus == "5Y_1.jpg"|values.stimulus == "5Y_2.jpg"|values.stimulus == "5Y_3.jpg", "neutral", NA)))))))
levels(mlm_bx2$stim)
[1] "death"    "living"   "neutral"  "spouse"   "stranger"
mlm_bx2$stim <- relevel(mlm_bx2$stim, ref = "neutral") # make neutral the reference level
                       
# save it
saveRDS(mlm_bx2, "~/Dropbox/2018Fall/FSHD617C/mlm_bx2.rds")

Bias scores

Separate out push minus pull, then subtract run1 from run2.

mlm_r1.1 <- mlm_bx2 %>% filter(run == "1")
tail(mlm_r1.1) # does it line up?
mlm_r2.1 <- mlm_bx2 %>% filter(run == "2")
tail(mlm_r2.1) 
p99r1 <- quantile(mlm_r1.1$gAAT_RT, 0.99) 
p01r1 <- quantile(mlm_r1.1$gAAT_RT, 0.01) 
p99r1 # 99th percentile RTs >= 1713.48ms 
    99% 
1713.38 
p01r1 # 1st percentile RTs <= 472ms
 1% 
472 
p99r2 <- quantile(mlm_r2.1$gAAT_RT, 0.99) 
p01r2 <- quantile(mlm_r2.1$gAAT_RT, 0.01) # 1st percentile RTs <= 463ms
p99r2 # 99th percentile RTs >= 1712.45ms 
    99% 
1712.45 
p01r2 # 1st percentile RTs <= 463ms
 1% 
463 
mlm_r1.2 <- mlm_r1.1 %>%
  mutate(outlier_RT = ifelse(gAAT_RT <= p01r1 | gAAT_RT >= p99r1, 1, 0))
table(mlm_r1.2$outlier_RT) 

    0     1 
11005   227 
227/11005 # 227 trials will be dropped, 11005 will be retained...about 2% of trials.
[1] 0.02062699
mlm_r2.2 <- mlm_r2.1 %>%
  mutate(outlier_RT = ifelse(gAAT_RT <= p01r2 | gAAT_RT >= p99r2, 1, 0))
table(mlm_r2.2$outlier_RT) 

    0     1 
11002   230 
230/11002 # 230 trials will be dropped, 11002 will be retained...about 2% of trials.
[1] 0.02090529
mlm_r1r2 <- bind_cols(mlm_r1.2, mlm_r2.2) # 11232 trials total.
head(mlm_r1r2) # make sure that trials in the two runs line up post-merge
tail(mlm_r1r2) # make sure that trials in the two runs line up post-merge
# get rid of any trials with outliers in either run1 or run2 
# (need run1/run2 to be the same length so that the trials match up)
mlm_r1r2.1 <- mlm_r1r2 %>% filter(outlier_RT == "0" & outlier_RT1 == "0") 
count(mlm_r1r2.1) # 10790 obs.
10790/11232 # 96% of trials retained = 4% of trials dropped. That feels acceptable.
[1] 0.9606481
push_pull <- mlm_r1r2.1 %>% filter(push_pull == "PUSH" & push_pull1 == "PULL") %>% mutate(bias = (gAAT_RT-gAAT_RT1))
pull_push <- mlm_r1r2.1 %>% filter(push_pull == "PULL" & push_pull1 == "PUSH") %>% mutate(bias = (gAAT_RT1-gAAT_RT))
mlm_pp <- bind_rows(push_pull,pull_push)
View(mlm_pp)

Transformations

library(psych)
describe(mlm_pp$bias) # skewness is 0.1, kurtosis is 1.69
   vars    n  mean     sd median trimmed    mad   min  max range skew kurtosis   se
X1    1 9880 11.83 246.66      8    9.97 192.74 -1157 1119  2276  0.1     1.69 2.48
hist(mlm_pp$bias)

qqnorm(mlm_pp$bias)

ggplot(mlm_pp, aes(sample=bias))+stat_qq()+stat_qq_line() # not a good look at all...

# install.packages("bestNormalize")
# what's the best transformation for my data?
library(bestNormalize)
orderNorm_bias <- bestNormalize(mlm_pp$bias) # suggested orderNorm transform as best
boxcox  did not work;  Error in estimate_boxcox_lambda(x, ...) : x must be positive
exp_x  did not work;  Error in exp_x(standardize = TRUE, warn = TRUE, x = c(26, -187, 45, 782,  : 
  Transformation finite for less than 3 x values
Ties in data, Normal distribution not guaranteed

|===                                                                                                                                                          |  2% ~3 s remaining     
|======                                                                                                                                                       |  4% ~3 s remaining     
|=========                                                                                                                                                    |  6% ~3 s remaining     
|============                                                                                                                                                 |  8% ~3 s remaining     
|===============                                                                                                                                              | 10% ~3 s remaining     
|==================                                                                                                                                           | 12% ~3 s remaining     
|=====================                                                                                                                                        | 14% ~3 s remaining     
|=========================                                                                                                                                    | 16% ~3 s remaining     
|============================                                                                                                                                 | 18% ~3 s remaining     
|===============================                                                                                                                              | 20% ~3 s remaining     
|==================================                                                                                                                           | 22% ~3 s remaining     
|=====================================                                                                                                                        | 24% ~3 s remaining     
|========================================                                                                                                                     | 26% ~3 s remaining     
|===========================================                                                                                                                  | 28% ~3 s remaining     
|===============================================                                                                                                              | 30% ~3 s remaining     
|==================================================                                                                                                           | 32% ~3 s remaining     
|=====================================================                                                                                                        | 34% ~2 s remaining     
|========================================================                                                                                                     | 36% ~2 s remaining     
|===========================================================                                                                                                  | 38% ~2 s remaining     
|==============================================================                                                                                               | 40% ~2 s remaining     
|=================================================================                                                                                            | 42% ~2 s remaining     
|=====================================================================                                                                                        | 44% ~2 s remaining     
|========================================================================                                                                                     | 46% ~2 s remaining     
|===========================================================================                                                                                  | 48% ~2 s remaining     
|==============================================================================                                                                               | 50% ~2 s remaining     
|=================================================================================                                                                            | 52% ~2 s remaining     
|====================================================================================                                                                         | 54% ~2 s remaining     
|=======================================================================================                                                                      | 56% ~2 s remaining     
|===========================================================================================                                                                  | 58% ~2 s remaining     
|==============================================================================================                                                               | 60% ~2 s remaining     
|=================================================================================================                                                            | 62% ~1 s remaining     
|====================================================================================================                                                         | 64% ~1 s remaining     
|=======================================================================================================                                                      | 66% ~1 s remaining     
|==========================================================================================================                                                   | 68% ~1 s remaining     
|=============================================================================================================                                                | 70% ~1 s remaining     
|=================================================================================================================                                            | 72% ~1 s remaining     
|====================================================================================================================                                         | 74% ~1 s remaining     
|=======================================================================================================================                                      | 76% ~1 s remaining     
|==========================================================================================================================                                   | 78% ~1 s remaining     
|=============================================================================================================================                                | 80% ~1 s remaining     
|================================================================================================================================                             | 82% ~1 s remaining     
|===================================================================================================================================                          | 84% ~1 s remaining     
|=======================================================================================================================================                      | 86% ~1 s remaining     
|==========================================================================================================================================                   | 88% ~0 s remaining     
|=============================================================================================================================================                | 90% ~0 s remaining     
|================================================================================================================================================             | 92% ~0 s remaining     
|===================================================================================================================================================          | 94% ~0 s remaining     
|======================================================================================================================================================       | 96% ~0 s remaining     
|=========================================================================================================================================================    | 98% ~0 s remaining     
|=============================================================================================================================================================|100% ~0 s remaining     
mlm_pp$bias_t <- orderNorm_bias$x.t # pull x.t from orderNorm_bias and assign it back into dataset
describe(mlm_pp$bias_t)
   vars    n mean sd median trimmed mad   min  max range skew kurtosis   se
X1    1 9880    0  1      0       0   1 -3.61 3.89   7.5    0    -0.02 0.01
hist(mlm_pp$bias_t)

qqnorm(mlm_pp$bias_t)

ggplot(mlm_pp, aes(sample=bias_t))+stat_qq()+stat_qq_line() # so much better!

# back-transformation, just to show how to do it:
p <- predict(orderNorm_bias)
    x2 <- predict(orderNorm_bias, newdata = p, inverse = TRUE) # x2 is the back-transformed data
    all.equal(x2, mlm_pp$bias_t) 
[1] "Mean relative difference: 0.9955923"
mlm_bx3 <- mlm_pp
saveRDS(mlm_bx3, "~/Dropbox/2018Fall/FSHD617C/mlm_bx3.rds")

Checking the data

table <- mlm_bx3 %>% group_by(ID) %>% group_by(cond) %>% count(ID) 
table
# missing data...
# cond A: D101 (3 trials), D141 (18 trials), D148 (19 trials)
# cond B:  D147 (1 trial)
describe(table$n)
   vars  n   mean    sd median trimmed  mad min max range  skew kurtosis   se
X1    1 78 126.67 28.35    134  133.11 5.93   1 144   143 -3.58    12.05 3.21
ggplot(table, aes(sample=n))+stat_qq()+stat_qq_line() # can see those 4 on the QQ plot

# where are the trials getting dropped?
mlm_r1r2 %>% group_by(ID) %>% group_by(cond) %>% count(ID) # everyone has 144 here
mlm_r1r2.1 %>% group_by(ID) %>% group_by(cond) %>% count(ID) 
# after getting rid of outliers, most people missing a few trials (D107 cond A/B and D113 cond A missing >20)
mlm_r1r2.1 %>% filter(ID=="D101" & cond=="A")
mlm_r1r2.1 %>% filter(ID=="D141" & cond=="A")
mlm_r1r2.1 %>% filter(ID=="D148" & cond=="A")
mlm_r1r2.1 %>% filter(ID=="D147" & cond=="B")
# so trials are getting dropped because they didn't effectively reverse the instructions or they reverted to the old instructions
# in essence, this removes error trials
# create a "drop" variable for these 4, just in case...
mlm_bx3 <- mlm_bx3 %>% mutate(low_n_trials = ifelse(ID == "D101" | ID == "D141" | ID == "D147" | ID == "D148", 1, 0))
# check whether the rows line up
notlinedup <- mlm_bx3 %>% filter(stim != stim1)
notlinedup
notlinedup <- mlm_bx3 %>% filter(trialcode != trialcode1)
notlinedup
notlinedup <- mlm_bx3 %>% filter(trialnum != trialnum1)
notlinedup
# 0 rows returned; all the rows match
head(mlm_bx3)
saveRDS(mlm_bx3, "~/Dropbox/2018Fall/FSHD617C/mlm_bx3.rds")

Add time-invariant variables

Next step is to add demographics and other variables that we might potentially want to include in the model but that do not change over time. Looking at the codebook for the master dataset, these include:

Demographics:

  • age
  • age_yrs
  • sex_m
  • ethnicity_hisp
  • race
  • timesincedeath

Genetic variables:

  • rs2254298
  • rs2268498
  • rs53576
  • rs2254298_A.carrier
  • rs2268498_C.carrier
  • rs53576_A.carrier

Self-report measures:

  • tot_icg [grief]
  • tot_bdi [depression]
  • tot_bgq [brief grief questionnaire with impairment item]
  • tot_ecrrs_global_anx [anxious attachment - global]
  • tot_ecrrs_global_avoid [avoidant attachment - global]
  • tot_ecrrs_spouse_anx [anxious attachment - spouse]
  • tot_ecrrs_spouse_avoid [avoidant attachment - spouse]
  • tot_ysl [yearning for the deceased]
# load in master dataset
master <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/selfreports_oxtr_cortisol_random_AB_bx_clean.rds") 
# subset the selected variables
mlm_add <- subset(master, select=c(ID,age, age_yrs, sex_m, ethnicity_hisp, race, timesincedeath, rs2254298, rs2268498, rs53576, rs2254298_A.carrier, rs2268498_C.carrier, rs53576_A.carrier, tot_bdi, tot_bgq, tot_ecrrs_global_anx,tot_ecrrs_global_avoid, tot_ecrrs_spouse_anx, tot_ecrrs_spouse_avoid, tot_ysl, tot_icg))
head(mlm_add)
# merge into the behavioral dataset
mlm_bx4 <- left_join(mlm_bx3, mlm_add, by="ID")
head(mlm_bx4)
# save it
saveRDS(mlm_bx4, "~/Dropbox/2018Fall/FSHD617C/mlm_bx4.rds")

Exploratory analysis and checking assumptions

First step is to overview the entire dataset using summarytools.

library(summarytools)
# overview the entire dataset
# this will look like TOTAL crap in the console (notebook output)
# use view(dfSummary(mlm_bx4)) from the console to see a much more readable and attractive output
dfSummary(mlm_bx4)
Data Frame Summary   
mlm_bx4     
N: 9880   
-------------------------------------------------------------------------------------------------------------------------------------------------------
No   Variable                  Stats / Values                     Freqs (% of Valid)      Text Graph                               Valid      Missing  
---- ------------------------- ---------------------------------- ----------------------- ---------------------------------------- ---------- ---------
1    date                      mean (sd) : 63183.18 (35369.28)    73 distinct val.                          :                      9880       0        
     [integer]                 min < med < max :                                          : .               :                      (100%)     (0%)     
                               11416 < 52316 < 121515                                     : : . :           :                                          
                               IQR (CV) : 69500 (0.56)                                    : : : :   : .   : :                                          
                                                                                          : : : : . : :   : :                                          

2    time                      1. 12:36:02                        144 ( 1.5%)                                                      9880       0        
     [character]               2. 14:24:39                        144 ( 1.5%)                                                      (100%)     (0%)     
                               3. 15:06:35                        144 ( 1.5%)                                                                          
                               4. 13:46:36                        142 ( 1.4%)                                                                          
                               5. 16:32:30                        142 ( 1.4%)                                                                          
                               6. 14:12:22                        141 ( 1.4%)                                                                          
                               7. 16:29:40                        141 ( 1.4%)                                                                          
                               8. 11:18:48                        140 ( 1.4%)                                                                          
                               9. 13:07:05                        140 ( 1.4%)                                                                          
                               10. 13:15:27                       140 ( 1.4%)                                                                          
                               [ 68 others ]                      8462 (85.8%)            IIIIIIIIIIIIIIII                                             

3    ID                        1. D110                            284 ( 2.9%)                                                      9880       0        
     [character]               2. D120                            283 ( 2.9%)                                                      (100%)     (0%)     
                               3. D123                            282 ( 2.9%)                                                                          
                               4. D132                            281 ( 2.8%)                                                                          
                               5. D127                            279 ( 2.8%)                                                                          
                               6. D146                            277 ( 2.8%)                                                                          
                               7. D102                            275 ( 2.8%)                                                                          
                               8. D130                            275 ( 2.8%)                                                                          
                               9. D138                            274 ( 2.8%)                                                                          
                               10. D131                           273 ( 2.8%)                                                                          
                               [ 29 others ]                      7097 (71.8%)            IIIIIIIIIIIIIIII                                             

4    blockcode                 1. AAT4                            894 ( 9.0%)             IIIIIIIIIII                              9880       0        
     [character]               2. AAT6                            894 ( 9.0%)             IIIIIIIIIII                              (100%)     (0%)     
                               3. AAT1                            892 ( 9.0%)             IIIIIIIIIII                                                  
                               4. AAT5                            889 ( 9.0%)             IIIIIIIIIII                                                  
                               5. AAT7                            886 ( 9.0%)             IIIIIIIIIII                                                  
                               6. AAT3                            873 ( 8.8%)             IIIIIIIIIII                                                  
                               7. AAT10                           843 ( 8.5%)             IIIIIIIIII                                                   
                               8. AAT9                            839 ( 8.5%)             IIIIIIIIII                                                   
                               9. AAT8                            834 ( 8.4%)             IIIIIIIIII                                                   
                               10. AAT2                           804 ( 8.1%)             IIIIIIIIII                                                   
                               [ 2 others ]                       1232 (12.5%)            IIIIIIIIIIIIIIII                                             

5    blocknum                  1. 154                             74 ( 0.8%)                                                       9880       0        
     [character]               2. 71                              74 ( 0.8%)                                                       (100%)     (0%)     
                               3. 15                              73 ( 0.7%)                                                                           
                               4. 22                              73 ( 0.7%)                                                                           
                               5. 102                             72 ( 0.7%)                                                                           
                               6. 107                             72 ( 0.7%)                                                                           
                               7. 117                             72 ( 0.7%)                                                                           
                               8. 124                             72 ( 0.7%)                                                                           
                               9. 125                             72 ( 0.7%)                                                                           
                               10. 130                            72 ( 0.7%)                                                                           
                               [ 134 others ]                     9154 (92.9%)            IIIIIIIIIIIIIIII                                             

6    trialcode                 1. AAT_4                           894 ( 9.0%)             IIIIIIIIIII                              9880       0        
     [character]               2. AAT_6                           894 ( 9.0%)             IIIIIIIIIII                              (100%)     (0%)     
                               3. AAT_1                           892 ( 9.0%)             IIIIIIIIIII                                                  
                               4. AAT_5                           889 ( 9.0%)             IIIIIIIIIII                                                  
                               5. AAT_7                           886 ( 9.0%)             IIIIIIIIIII                                                  
                               6. AAT_3                           873 ( 8.8%)             IIIIIIIIIII                                                  
                               7. AAT_10                          843 ( 8.5%)             IIIIIIIIII                                                   
                               8. AAT_9                           839 ( 8.5%)             IIIIIIIIII                                                   
                               9. AAT_8                           834 ( 8.4%)             IIIIIIIIII                                                   
                               10. AAT_2                          804 ( 8.1%)             IIIIIIIIII                                                   
                               [ 2 others ]                       1232 (12.5%)            IIIIIIIIIIIIIIII                                             

7    values.trialcode          1. AAT_4                           894 ( 9.0%)             IIIIIIIIIII                              9880       0        
     [character]               2. AAT_6                           894 ( 9.0%)             IIIIIIIIIII                              (100%)     (0%)     
                               3. AAT_1                           892 ( 9.0%)             IIIIIIIIIII                                                  
                               4. AAT_5                           889 ( 9.0%)             IIIIIIIIIII                                                  
                               5. AAT_7                           886 ( 9.0%)             IIIIIIIIIII                                                  
                               6. AAT_3                           873 ( 8.8%)             IIIIIIIIIII                                                  
                               7. AAT_10                          843 ( 8.5%)             IIIIIIIIII                                                   
                               8. AAT_9                           839 ( 8.5%)             IIIIIIIIII                                                   
                               9. AAT_8                           834 ( 8.4%)             IIIIIIIIII                                                   
                               10. AAT_2                          804 ( 8.1%)             IIIIIIIIII                                                   
                               [ 2 others ]                       1232 (12.5%)            IIIIIIIIIIIIIIII                                             

8    values.stimulus           1. 3Y_3.jpg                        418 ( 4.2%)             I                                        9880       0        
     [character]               2. 3Y_2.jpg                        413 ( 4.2%)             I                                        (100%)     (0%)     
                               3. 3Y_1.jpg                        411 ( 4.2%)             I                                                            
                               4. 2B_3.jpg                        386 ( 3.9%)             I                                                            
                               5. 2B_2.jpg                        382 ( 3.9%)                                                                          
                               6. 2B_1.jpg                        371 ( 3.8%)                                                                          
                               7. 1B_3.jpg                        362 ( 3.7%)                                                                          
                               8. 1B_2.jpg                        336 ( 3.4%)                                                                          
                               9. 3B_1.jpg                        336 ( 3.4%)                                                                          
                               10. 1B_1.jpg                       334 ( 3.4%)                                                                          
                               [ 20 others ]                      6131 (62.1%)            IIIIIIIIIIIIIIII                                             

9    push_pull                 1. PULL                            4953 (50.1%)            IIIIIIIIIIIIIIII                         9880       0        
     [character]               2. PUSH                            4927 (49.9%)            IIIIIIIIIIIIIII                          (100%)     (0%)     

10   gAAT_RT                   mean (sd) : 794.92 (203.04)        959 distinct val.         : .                                    9880       0        
     [numeric]                 min < med < max :                                            : :                                    (100%)     (0%)     
                               473 < 756 < 1712                                           . : : :                                                      
                               IQR (CV) : 251 (0.26)                                      : : : : .                                                    
                                                                                          : : : : : : .                                                

11   trialnum                  mean (sd) : 72.87 (41.54)          144 distinct val.       : . . : . . : . . :                      9880       0        
     [integer]                 min < med < max :                                          : : : : : : : : : :                      (100%)     (0%)     
                               1 < 73 < 144                                               : : : : : : : : : :                                          
                               IQR (CV) : 72 (0.57)                                       : : : : : : : : : :                                          
                                                                                          : : : : : : : : : :                                          

12   run                       1. 1                               9880 (100.0%)           IIIIIIIIIIIIIIII                         9880       0        
     [character]                                                                                                                   (100%)     (0%)     

13   visit                     1. 1                               4683 (47.4%)            IIIIIIIIIIIIII                           9880       0        
     [factor]                  2. 2                               5197 (52.6%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

14   cond_v1                   1. A                               5014 (50.7%)            IIIIIIIIIIIIIIII                         9880       0        
     [factor]                  2. B                               4866 (49.2%)            IIIIIIIIIIIIIII                          (100%)     (0%)     

15   cond_v2                   1. A                               4866 (49.2%)            IIIIIIIIIIIIIII                          9880       0        
     [factor]                  2. B                               5014 (50.7%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

16   cond                      1. A                               4787 (48.4%)            IIIIIIIIIIIIIII                          9880       0        
     [factor]                  2. B                               5093 (51.5%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

17   stim                      1. neutral                         1882 (19.1%)            IIIIIIIIIIIII                            9880       0        
     [factor]                  2. death                           1790 (18.1%)            IIIIIIIIIIIII                            (100%)     (0%)     
                               3. living                          2101 (21.3%)            IIIIIIIIIIIIIII                                              
                               4. spouse                          1906 (19.3%)            IIIIIIIIIIIII                                                
                               5. stranger                        2201 (22.3%)            IIIIIIIIIIIIIIII                                             

18   outlier_RT                mean (sd) : 0 (0)                  0.00 : 9880 (100.0%)    IIIIIIIIIIIIIIII                         9880       0        
     [numeric]                 min < med < max :                                                                                   (100%)     (0%)     
                               0 < 0 < 0                                                                                                               
                               IQR (CV) : 0 (NaN)                                                                                                      

19   date1                     mean (sd) : 63183.18 (35369.28)    73 distinct val.                          :                      9880       0        
     [integer]                 min < med < max :                                          : .               :                      (100%)     (0%)     
                               11416 < 52316 < 121515                                     : : . :           :                                          
                               IQR (CV) : 69500 (0.56)                                    : : : :   : .   : :                                          
                                                                                          : : : : . : :   : :                                          

20   time1                     1. 12:45:25                        144 ( 1.5%)                                                      9880       0        
     [character]               2. 14:34:34                        144 ( 1.5%)                                                      (100%)     (0%)     
                               3. 15:15:51                        144 ( 1.5%)                                                                          
                               4. 13:55:57                        142 ( 1.4%)                                                                          
                               5. 16:42:40                        142 ( 1.4%)                                                                          
                               6. 14:22:28                        141 ( 1.4%)                                                                          
                               7. 16:40:38                        141 ( 1.4%)                                                                          
                               8. 11:28:07                        140 ( 1.4%)                                                                          
                               9. 13:15:44                        140 ( 1.4%)                                                                          
                               10. 13:26:09                       140 ( 1.4%)                                                                          
                               [ 68 others ]                      8462 (85.8%)            IIIIIIIIIIIIIIII                                             

21   ID1                       1. D110                            284 ( 2.9%)                                                      9880       0        
     [character]               2. D120                            283 ( 2.9%)                                                      (100%)     (0%)     
                               3. D123                            282 ( 2.9%)                                                                          
                               4. D132                            281 ( 2.8%)                                                                          
                               5. D127                            279 ( 2.8%)                                                                          
                               6. D146                            277 ( 2.8%)                                                                          
                               7. D102                            275 ( 2.8%)                                                                          
                               8. D130                            275 ( 2.8%)                                                                          
                               9. D138                            274 ( 2.8%)                                                                          
                               10. D131                           273 ( 2.8%)                                                                          
                               [ 29 others ]                      7097 (71.8%)            IIIIIIIIIIIIIIII                                             

22   blockcode1                1. AAT4                            894 ( 9.0%)             IIIIIIIIIII                              9880       0        
     [character]               2. AAT6                            894 ( 9.0%)             IIIIIIIIIII                              (100%)     (0%)     
                               3. AAT1                            892 ( 9.0%)             IIIIIIIIIII                                                  
                               4. AAT5                            889 ( 9.0%)             IIIIIIIIIII                                                  
                               5. AAT7                            886 ( 9.0%)             IIIIIIIIIII                                                  
                               6. AAT3                            873 ( 8.8%)             IIIIIIIIIII                                                  
                               7. AAT10                           843 ( 8.5%)             IIIIIIIIII                                                   
                               8. AAT9                            839 ( 8.5%)             IIIIIIIIII                                                   
                               9. AAT8                            834 ( 8.4%)             IIIIIIIIII                                                   
                               10. AAT2                           804 ( 8.1%)             IIIIIIIIII                                                   
                               [ 2 others ]                       1232 (12.5%)            IIIIIIIIIIIIIIII                                             

23   blocknum1                 1. 154                             74 ( 0.8%)                                                       9880       0        
     [character]               2. 71                              74 ( 0.8%)                                                       (100%)     (0%)     
                               3. 15                              73 ( 0.7%)                                                                           
                               4. 22                              73 ( 0.7%)                                                                           
                               5. 102                             72 ( 0.7%)                                                                           
                               6. 107                             72 ( 0.7%)                                                                           
                               7. 117                             72 ( 0.7%)                                                                           
                               8. 124                             72 ( 0.7%)                                                                           
                               9. 125                             72 ( 0.7%)                                                                           
                               10. 130                            72 ( 0.7%)                                                                           
                               [ 134 others ]                     9154 (92.9%)            IIIIIIIIIIIIIIII                                             

24   trialcode1                1. AAT_4                           894 ( 9.0%)             IIIIIIIIIII                              9880       0        
     [character]               2. AAT_6                           894 ( 9.0%)             IIIIIIIIIII                              (100%)     (0%)     
                               3. AAT_1                           892 ( 9.0%)             IIIIIIIIIII                                                  
                               4. AAT_5                           889 ( 9.0%)             IIIIIIIIIII                                                  
                               5. AAT_7                           886 ( 9.0%)             IIIIIIIIIII                                                  
                               6. AAT_3                           873 ( 8.8%)             IIIIIIIIIII                                                  
                               7. AAT_10                          843 ( 8.5%)             IIIIIIIIII                                                   
                               8. AAT_9                           839 ( 8.5%)             IIIIIIIIII                                                   
                               9. AAT_8                           834 ( 8.4%)             IIIIIIIIII                                                   
                               10. AAT_2                          804 ( 8.1%)             IIIIIIIIII                                                   
                               [ 2 others ]                       1232 (12.5%)            IIIIIIIIIIIIIIII                                             

25   values.trialcode1         1. AAT_4                           894 ( 9.0%)             IIIIIIIIIII                              9880       0        
     [character]               2. AAT_6                           894 ( 9.0%)             IIIIIIIIIII                              (100%)     (0%)     
                               3. AAT_1                           892 ( 9.0%)             IIIIIIIIIII                                                  
                               4. AAT_5                           889 ( 9.0%)             IIIIIIIIIII                                                  
                               5. AAT_7                           886 ( 9.0%)             IIIIIIIIIII                                                  
                               6. AAT_3                           873 ( 8.8%)             IIIIIIIIIII                                                  
                               7. AAT_10                          843 ( 8.5%)             IIIIIIIIII                                                   
                               8. AAT_9                           839 ( 8.5%)             IIIIIIIIII                                                   
                               9. AAT_8                           834 ( 8.4%)             IIIIIIIIII                                                   
                               10. AAT_2                          804 ( 8.1%)             IIIIIIIIII                                                   
                               [ 2 others ]                       1232 (12.5%)            IIIIIIIIIIIIIIII                                             

26   values.stimulus1          1. 3Y_1.jpg                        427 ( 4.3%)             I                                        9880       0        
     [character]               2. 3Y_3.jpg                        417 ( 4.2%)             I                                        (100%)     (0%)     
                               3. 3Y_2.jpg                        398 ( 4.0%)             I                                                            
                               4. 2B_3.jpg                        391 ( 4.0%)             I                                                            
                               5. 2B_1.jpg                        384 ( 3.9%)             I                                                            
                               6. 2B_2.jpg                        364 ( 3.7%)                                                                          
                               7. 1B_1.jpg                        349 ( 3.5%)                                                                          
                               8. 1B_2.jpg                        344 ( 3.5%)                                                                          
                               9. 1B_3.jpg                        339 ( 3.4%)                                                                          
                               10. 5B_3.jpg                       332 ( 3.4%)                                                                          
                               [ 20 others ]                      6135 (62.1%)            IIIIIIIIIIIIIIII                                             

27   push_pull1                1. PULL                            4927 (49.9%)            IIIIIIIIIIIIIII                          9880       0        
     [character]               2. PUSH                            4953 (50.1%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

28   gAAT_RT1                  mean (sd) : 769.98 (197.98)        957 distinct val.         :                                      9880       0        
     [numeric]                 min < med < max :                                            : :                                    (100%)     (0%)     
                               464 < 727 < 1709                                           . : :                                                        
                               IQR (CV) : 236 (0.26)                                      : : : : .                                                    
                                                                                          : : : : : . .                                                

29   trialnum1                 mean (sd) : 72.87 (41.54)          144 distinct val.       : . . : . . : . . :                      9880       0        
     [integer]                 min < med < max :                                          : : : : : : : : : :                      (100%)     (0%)     
                               1 < 73 < 144                                               : : : : : : : : : :                                          
                               IQR (CV) : 72 (0.57)                                       : : : : : : : : : :                                          
                                                                                          : : : : : : : : : :                                          

30   run1                      1. 2                               9880 (100.0%)           IIIIIIIIIIIIIIII                         9880       0        
     [character]                                                                                                                   (100%)     (0%)     

31   visit1                    1. 1                               4683 (47.4%)            IIIIIIIIIIIIII                           9880       0        
     [factor]                  2. 2                               5197 (52.6%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

32   cond_v11                  1. A                               5014 (50.7%)            IIIIIIIIIIIIIIII                         9880       0        
     [factor]                  2. B                               4866 (49.2%)            IIIIIIIIIIIIIII                          (100%)     (0%)     

33   cond_v21                  1. A                               4866 (49.2%)            IIIIIIIIIIIIIII                          9880       0        
     [factor]                  2. B                               5014 (50.7%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

34   cond1                     1. A                               4787 (48.4%)            IIIIIIIIIIIIIII                          9880       0        
     [factor]                  2. B                               5093 (51.5%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

35   stim1                     1. neutral                         1882 (19.1%)            IIIIIIIIIIIII                            9880       0        
     [factor]                  2. death                           1790 (18.1%)            IIIIIIIIIIIII                            (100%)     (0%)     
                               3. living                          2101 (21.3%)            IIIIIIIIIIIIIII                                              
                               4. spouse                          1906 (19.3%)            IIIIIIIIIIIII                                                
                               5. stranger                        2201 (22.3%)            IIIIIIIIIIIIIIII                                             

36   outlier_RT1               mean (sd) : 0 (0)                  0.00 : 9880 (100.0%)    IIIIIIIIIIIIIIII                         9880       0        
     [numeric]                 min < med < max :                                                                                   (100%)     (0%)     
                               0 < 0 < 0                                                                                                               
                               IQR (CV) : 0 (NaN)                                                                                                      

37   bias                      mean (sd) : 11.83 (246.66)         1342 distinct val.                :                              9880       0        
     [numeric]                 min < med < max :                                                  : :                              (100%)     (0%)     
                               -1157 < 8 < 1119                                                   : :                                                  
                               IQR (CV) : 260 (20.85)                                             : : .                                                
                                                                                              . : : : : .                                              

38   bias_t                    mean (sd) : 0 (1)                  1342 distinct val.              : .                              9880       0        
     [numeric]                 min < med < max :                                                  : :                              (100%)     (0%)     
                               -3.61 < 0 < 3.89                                                 : : : .                                                
                               IQR (CV) : 1.35 (2539.72)                                        : : : :                                                
                                                                                            . : : : : : .                                              

39   low_n_trials              mean (sd) : 0.06 (0.24)            0.00 : 9289 (94.0%)     IIIIIIIIIIIIIIII                         9880       0        
     [numeric]                 min < med < max :                  1.00 :  591 ( 6.0%)     I                                        (100%)     (0%)     
                               0 < 0 < 1                                                                                                               
                               IQR (CV) : 0 (3.96)                                                                                                     

40   age                       mean (sd) : 25229.12 (2370.06)     39 distinct val.                    :                            9880       0        
     [numeric]                 min < med < max :                                                      :                            (100%)     (0%)     
                               20845 < 25597 < 29101                                        :     . . :     .                                          
                               IQR (CV) : 3573 (0.09)                                     : :     : : :   : :                                          
                                                                                          : : . : : : : : : :                                          

41   age_yrs                   mean (sd) : 69.12 (6.49)           39 distinct val.                    :                            9880       0        
     [numeric]                 min < med < max :                                                      :                            (100%)     (0%)     
                               57.11 < 70.13 < 79.73                                        :     . . :     .                                          
                               IQR (CV) : 9.79 (0.09)                                     : :     : : :   : :                                          
                                                                                          : : . : : : : : : :                                          

42   sex_m                     1. female                          7250 (73.4%)            IIIIIIIIIIIIIIII                         9880       0        
     [ordered, factor]         2. male                            2630 (26.6%)            IIIII                                    (100%)     (0%)     

43   ethnicity_hisp            1. not Hispanic or Latino          9449 (95.6%)            IIIIIIIIIIIIIIII                         9880       0        
     [ordered, factor]         2. Hispanic or Latino               431 ( 4.4%)                                                     (100%)     (0%)     

44   race                      1. White                           9724 (98.4%)            IIIIIIIIIIIIIIII                         9880       0        
     [factor]                  2. Other (specify)                  156 ( 1.6%)                                                     (100%)     (0%)     

45   timesincedeath            mean (sd) : 460.85 (245.61)        36 distinct val.          :                                      9880       0        
     [numeric]                 min < med < max :                                            :                                      (100%)     (0%)     
                               151 < 411 < 1095                                             :   :                                                      
                               IQR (CV) : 320 (0.53)                                      . :   :                                                      
                                                                                          : : . : : . .   : .                                          

46   rs2254298                 1. GG                              7005 (72.8%)            IIIIIIIIIIIIIIII                         9627       253      
     [factor]                  2. A/G                             2622 (27.2%)            IIIII                                    (97.44%)   (2.56%)  

47   rs2268498                 1. C/T                             4325 (43.8%)            IIIIIIIIIIIIIIII                         9880       0        
     [character]               2. CC                              2636 (26.7%)            IIIIIIIII                                (100%)     (0%)     
                               3. TT                              2919 (29.5%)            IIIIIIIIII                                                   

48   rs53576                   1. A/G                             5207 (52.7%)            IIIIIIIIIIIIIIII                         9880       0        
     [character]               2. GG                              4673 (47.3%)            IIIIIIIIIIIIII                           (100%)     (0%)     

49   rs2254298_A.carrier       1. 0                               7005 (72.8%)            IIIIIIIIIIIIIIII                         9627       253      
     [ordered, factor]         2. 1                               2622 (27.2%)            IIIII                                    (97.44%)   (2.56%)  

50   rs2268498_C.carrier       1. 0                               2919 (29.5%)            IIIIII                                   9880       0        
     [ordered, factor]         2. 1                               6961 (70.5%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

51   rs53576_A.carrier         1. 0                               4673 (47.3%)            IIIIIIIIIIIIII                           9880       0        
     [ordered, factor]         2. 1                               5207 (52.7%)            IIIIIIIIIIIIIIII                         (100%)     (0%)     

52   tot_bdi                   mean (sd) : 10.47 (7.69)           22 distinct val.          :                                      9880       0        
     [numeric]                 min < med < max :                                            :                                      (100%)     (0%)     
                               0 < 10 < 28                                                : : .   .                                                    
                               IQR (CV) : 12 (0.73)                                       : : : : : . :                                                
                                                                                          : : : : : : : . : :                                          

53   tot_bgq                   mean (sd) : 9.56 (2.26)            6.00 :  958 ( 9.7%)     IIIII                                    9880       0        
     [numeric]                 min < med < max :                  7.00 :  809 ( 8.2%)     IIII                                     (100%)     (0%)     
                               6 < 9 < 15                         8.00 : 1079 (10.9%)     IIIIII                                                       
                               IQR (CV) : 2 (0.24)                9.00 : 2640 (26.7%)     IIIIIIIIIIIIIIII                                             
                                                                  10.00 : 1997 (20.2%)    IIIIIIIIIIII                                                 
                                                                  11.00 :  495 ( 5.0%)    III                                                          
                                                                  12.00 :  591 ( 6.0%)    III                                                          
                                                                  13.00 :  508 ( 5.1%)    III                                                          
                                                                  14.00 :  528 ( 5.3%)    III                                                          
                                                                  15.00 :  275 ( 2.8%)    I                                                            

54   tot_ecrrs_global_anx      mean (sd) : 3.24 (1.45)            16 distinct val.          :     .                                9880       0        
     [numeric]                 min < med < max :                                            :     :                                (100%)     (0%)     
                               1 < 3.33 < 6.33                                            . :     :   :                                                
                               IQR (CV) : 2.33 (0.45)                                     : : : . :   :   :                                            
                                                                                          : : : : : : : . : :                                          

55   tot_ecrrs_global_avoid    mean (sd) : 3.42 (1.07)            22 distinct val.                :                                9880       0        
     [numeric]                 min < med < max :                                                  : :                              (100%)     (0%)     
                               1 < 3.5 < 6.33                                                   . : :                                                  
                               IQR (CV) : 1.17 (0.31)                                         . : : :                                                  
                                                                                          : : : : : : : : . .                                          

56   tot_ecrrs_spouse_anx      mean (sd) : 2.22 (1.53)            12 distinct val.        :                                        9880       0        
     [numeric]                 min < med < max :                                          :                                        (100%)     (0%)     
                               1 < 1.33 < 6                                               :                                                            
                               IQR (CV) : 2.33 (0.69)                                     :                                                            
                                                                                          : : . : : :     . .                                          

57   tot_ecrrs_spouse_avoid    mean (sd) : 2.13 (1.15)            16 distinct val.        :                                        9880       0        
     [numeric]                 min < med < max :                                          :                                        (100%)     (0%)     
                               1 < 1.67 < 5.33                                            :                                                            
                               IQR (CV) : 1.83 (0.54)                                     : .   .                                                      
                                                                                          : : . : : : .   .                                            

58   tot_ysl                   mean (sd) : 67.32 (18.45)          31 distinct val.            .       : :                          9880       0        
     [numeric]                 min < med < max :                                              : . .   : :                          (100%)     (0%)     
                               31 < 69 < 100                                                . : : : . : :   :                                          
                               IQR (CV) : 32 (0.27)                                       : : : : : : : : . :                                          
                                                                                          : : : : : : : : : :                                          

59   tot_icg                   mean (sd) : 23.07 (12.16)          28 distinct val.            :     .                              9880       0        
     [numeric]                 min < med < max :                                          :   :     :                              (100%)     (0%)     
                               4 < 22 < 51                                                : . : : : :                                                  
                               IQR (CV) : 13 (0.53)                                       : : : : : : : . .                                            
                                                                                          : : : : : : : : : :                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------
 # no missing data on any of the variables!

Descriptives

library(psych)
library(stargazer)

Please cite as: 

 Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.1. https://CRAN.R-project.org/package=stargazer 
library(summarytools)
# descriptives for the time-invariant variables are going to be wrong if we use mlm_bx4 (since observations are duplicated)
# so going back to the mlm_add object...
# note that n = 40, but n = 39 with RT data
# D149's behavioral data was dropped (ran outside of scanner), so drop his case
mlm_add <- filter(mlm_add, !grepl("D149",ID))
unique(mlm_add$ID) # it worked
 [1] "D101" "D103" "D102" "D104" "D105" "D107" "D110" "D114" "D115" "D113" "D118" "D119" "D121" "D117" "D120" "D122" "D123" "D125" "D127" "D130" "D132" "D128" "D134" "D126" "D133"
[26] "D135" "D131" "D129" "D137" "D138" "D140" "D139" "D141" "D144" "D145" "D142" "D147" "D146" "D148"
nums <- select_if(mlm_add, is.numeric) # just get the numeric variables
view(dfSummary(nums)) 
stargazer(nums, type="text", title="Descriptives", digits=2, out="table1.txt", notes = "Note: 'stargazer' makes nicer looking tables, but doesn't provide skewness or kurtosis statistics, so we'll use 'psych' for that.") 

Descriptives
======================================================================================================================================
Statistic                                       N                 Mean             St. Dev.             Min                Max        
--------------------------------------------------------------------------------------------------------------------------------------
age                                             39             25,309.79           2,382.79            20,845             29,101      
age_yrs                                         39               69.34               6.53              57.11              79.73       
timesincedeath                                  39               468.64             251.59              151               1,095       
tot_bdi                                         39               10.54               7.70                0                  28        
tot_bgq                                         39                9.59               2.29                6                  15        
tot_ecrrs_global_anx                            39                3.29               1.45               1.00               6.33       
tot_ecrrs_global_avoid                          39                3.40               1.10               1.00               6.33       
tot_ecrrs_spouse_anx                            39                2.17               1.54               1.00               6.00       
tot_ecrrs_spouse_avoid                          39                2.10               1.15               1.00               5.33       
tot_ysl                                         39               67.85              18.79                31                100        
tot_icg                                         39               23.38              12.63                4                  51        
--------------------------------------------------------------------------------------------------------------------------------------
Note: 'stargazer' makes nicer looking tables, but doesn't provide skewness or kurtosis statistics, so we'll use 'psych' for that.     
describe(nums)
                       vars  n     mean      sd   median  trimmed     mad      min      max   range  skew kurtosis     se
age                       1 39 25309.79 2382.79 25640.00 25366.85 2659.78 20845.00 29101.00 8256.00 -0.27    -1.01 381.55
age_yrs                   2 39    69.34    6.53    70.25    69.50    7.29    57.11    79.73   22.62 -0.27    -1.01   1.05
timesincedeath            3 39   468.64  251.59   411.00   442.36  198.67   151.00  1095.00  944.00  1.02    -0.02  40.29
tot_bdi                   4 39    10.54    7.70    10.00    10.06    8.90     0.00    28.00   28.00  0.52    -0.78   1.23
tot_bgq                   5 39     9.59    2.29     9.00     9.48    1.48     6.00    15.00    9.00  0.43    -0.38   0.37
tot_ecrrs_global_anx      6 39     3.29    1.45     3.33     3.23    1.98     1.00     6.33    5.33  0.34    -0.84   0.23
tot_ecrrs_global_avoid    7 39     3.40    1.10     3.50     3.39    0.99     1.00     6.33    5.33  0.07     0.14   0.18
tot_ecrrs_spouse_anx      8 39     2.17    1.54     1.33     1.96    0.49     1.00     6.00    5.00  1.08    -0.10   0.25
tot_ecrrs_spouse_avoid    9 39     2.10    1.15     1.67     1.98    0.99     1.00     5.33    4.33  0.88    -0.20   0.18
tot_ysl                  10 39    67.85   18.79    70.00    67.88   19.27    31.00   100.00   69.00 -0.08    -1.14   3.01
tot_icg                  11 39    23.38   12.63    22.00    22.79   10.38     4.00    51.00   47.00  0.36    -0.76   2.02

Frequencies

# which variables are factors?
mlm_add %>% select_if(is.factor) %>% colnames()
[1] "sex_m"               "ethnicity_hisp"      "race"                "rs2254298"           "rs2254298_A.carrier" "rs2268498_C.carrier" "rs53576_A.carrier"  
# hmm, rs2268498 and rs53576 should also be factors, so fix that now
mlm_add <- mlm_add %>% mutate(rs2268498 = recode_factor(rs2268498, "C/T" = "C/T", "CC" = "CC", "TT" = "TT"), rs53576 = recode_factor(rs53576, "A/G" = "A/G", "AA" = "AA", "GG" = "GG"))
mlm_add %>% select_if(is.factor) %>% colnames()
[1] "sex_m"               "ethnicity_hisp"      "race"                "rs2254298"           "rs2268498"           "rs53576"             "rs2254298_A.carrier" "rs2268498_C.carrier"
[9] "rs53576_A.carrier"  
# function to count frequencies for variables that are factors...this isn't working
# countAll <- function(basedata)
# {
#   facs <- sapply(basedata, is.factor)
#   facdata <- basedata[ ,facs]
#   for(i in 1:length(facdata))
#   {
#       count(facdata)
#    }
# }
# countAll(mlm_bx4)
mlm_add %>% count(sex_m) # 28 females, 11 males
mlm_add %>% count(race) # 38 white, 1 another race not listed
mlm_add %>% count(ethnicity_hisp) # 37 non-Hispanic/Latino, 2 Hispanic/Latino
mlm_add %>% count(rs2254298) # 22 GG, 10 A/G, 1 NA/missing (0 AA)
mlm_add %>% count(rs2268498) # 17 C/T, 10 CC, 12 TT
mlm_add %>% count(rs53576) # 21 A/G, 18 GG (0 AA)
mlm_add %>% count(rs2254298_A.carrier) # yes = 10, no = 28, NA = 1
mlm_add %>% count(rs2268498_C.carrier) # yes = 27, no = 12
mlm_add %>% count(rs53576_A.carrier) # yes = 21, no = 18

Checking distribution shapes and normality

Got a preview of this above when we used summarytools, but here’s a closer look at some histograms and QQ plots.

# a function to produce histograms for all numeric variables in a dataframe
histAll <- function(basedata)
{
    nums <- sapply((basedata), is.numeric)
    numdata <- basedata[ ,nums]
    par(mfrow=c(4,4))
    for(i in 1:length(numdata))
    {
        hist(numdata[,i], xlab=NULL, main=names(numdata[i]))
    }
}
histAll(as.data.frame(mlm_bx4))

Centering L2 predictors

# pull out only the total score variables, age, timesincedeath
gcvars <- as.data.frame(mlm_bx4[, grep('^tot|age|timesincedeath|ID', names(mlm_bx4))])
gcvars1 <- gcvars[!duplicated(gcvars$ID), ]
stargazer(gcvars1, type="text", median = T, digits=3)

===================================================================
Statistic              N     Mean    St. Dev.   Min   Median  Max  
-------------------------------------------------------------------
age                    39 25,309.790 2,382.786 20,845 25,640 29,101
age_yrs                39   69.342     6.528   57.110 70.247 79.729
timesincedeath         39  468.641    251.589   151    411   1,095 
tot_bdi                39   10.538     7.701     0      10     28  
tot_bgq                39   9.590      2.291     6      9      15  
tot_ecrrs_global_anx   39   3.291      1.449   1.000  3.333  6.333 
tot_ecrrs_global_avoid 39   3.402      1.101   1.000  3.500  6.333 
tot_ecrrs_spouse_anx   39   2.171      1.543   1.000  1.333  6.000 
tot_ecrrs_spouse_avoid 39   2.098      1.152   1.000  1.667  5.333 
tot_ysl                39   67.846    18.790     31     70    100  
tot_icg                39   23.385    12.630     4      22     51  
-------------------------------------------------------------------
gmean_age <- mean(gcvars1$age)
gmean_age_yrs <- mean(gcvars1$age_yrs)
gmean_time <- mean(gcvars1$timesincedeath)
gmean_bdi <- mean(gcvars1$tot_bdi)
gmean_bgq <- mean(gcvars1$tot_bgq)
gmean_ecrrs_g_anx <- mean(gcvars1$tot_ecrrs_global_anx)
gmean_ecrrs_g_av <- mean(gcvars1$tot_ecrrs_global_avoid)
gmean_ecrrs_s_anx <- mean(gcvars1$tot_ecrrs_spouse_anx)
gmean_ecrrs_s_av <- mean(gcvars1$tot_ecrrs_spouse_avoid)
gmean_icg <- mean(gcvars1$tot_icg)
gmean_ysl <- mean(gcvars1$tot_ysl)
mlm_bx5 <- mlm_bx4 %>% mutate(gmean_age = age-gmean_age, gmean_age_yrs = age_yrs-gmean_age_yrs, gmean_timesincedeath = timesincedeath-gmean_time, gmean_bdi = tot_bdi-gmean_bdi, gmean_bgq = tot_bgq-gmean_bgq, gmean_ecrrs_global_anx = tot_ecrrs_global_anx-gmean_ecrrs_g_anx, gmean_ecrrs_global_avoid = tot_ecrrs_global_avoid-gmean_ecrrs_g_av, gmean_ecrrs_spouse_anx = tot_ecrrs_spouse_anx-gmean_ecrrs_s_anx, gmean_ecrrs_spouse_avoid = tot_ecrrs_spouse_avoid-gmean_ecrrs_s_av, gmean_ysl = tot_ysl-gmean_ysl, gmean_icg = tot_icg-gmean_icg)
stargazer(as.data.frame(mlm_bx5), type="text") # note that means of the group-mean centered variables are not zero - I think this is because some people had more trials than others

=========================================================================
Statistic                  N      Mean     St. Dev.     Min        Max   
-------------------------------------------------------------------------
date                     9,880 63,183.180 35,369.280   11,416    121,515 
gAAT_RT                  9,880  794.920    203.041      473       1,712  
trialnum                 9,880   72.873     41.538       1         144   
outlier_RT               9,880   0.000      0.000        0          0    
date1                    9,880 63,183.180 35,369.280   11,416    121,515 
gAAT_RT1                 9,880  769.980    197.980      464       1,709  
trialnum1                9,880   72.873     41.538       1         144   
outlier_RT1              9,880   0.000      0.000        0          0    
bias                     9,880   11.828    246.662     -1,157     1,119  
bias_t                   9,880   0.0004     0.999      -3.612     3.888  
low_n_trials             9,880   0.060      0.237        0          1    
age                      9,880 25,229.120 2,370.058    20,845    29,101  
age_yrs                  9,880   69.121     6.493      57.110    79.729  
timesincedeath           9,880  460.855    245.608      151       1,095  
tot_bdi                  9,880   10.467     7.688        0         28    
tot_bgq                  9,880   9.558      2.258        6         15    
tot_ecrrs_global_anx     9,880   3.241      1.450      1.000      6.333  
tot_ecrrs_global_avoid   9,880   3.416      1.070      1.000      6.333  
tot_ecrrs_spouse_anx     9,880   2.218      1.532      1.000      6.000  
tot_ecrrs_spouse_avoid   9,880   2.127      1.150      1.000      5.333  
tot_ysl                  9,880   67.317     18.448       31        100   
tot_icg                  9,880   23.066     12.156       4         51    
gmean_age                9,880  -80.671   2,370.058  -4,464.795 3,791.205
gmean_age_yrs            9,880   -0.221     6.493     -12.232    10.387  
gmean_timesincedeath     9,880   -7.786    245.608    -317.641   626.359 
gmean_bdi                9,880   -0.071     7.688     -10.538    17.462  
gmean_bgq                9,880   -0.032     2.258      -3.590     5.410  
gmean_ecrrs_global_anx   9,880   -0.050     1.450      -2.291     3.043  
gmean_ecrrs_global_avoid 9,880   0.015      1.070      -2.402     2.932  
gmean_ecrrs_spouse_anx   9,880   0.047      1.532      -1.171     3.829  
gmean_ecrrs_spouse_avoid 9,880   0.029      1.150      -1.098     3.235  
gmean_ysl                9,880   -0.529     18.448    -36.846    32.154  
gmean_icg                9,880   -0.318     12.156    -19.385    27.615  
-------------------------------------------------------------------------
# test that hypothesis with the non-duplicated data
gcvars2 <- gcvars1 %>% mutate(gmean_age = age-gmean_age, gmean_age_yrs = age_yrs-gmean_age_yrs, gmean_timesincedeath = timesincedeath-gmean_time, gmean_bdi = tot_bdi-gmean_bdi, gmean_bgq = tot_bgq-gmean_bgq, gmean_ecrrs_global_anx = tot_ecrrs_global_anx-gmean_ecrrs_g_anx, gmean_ecrrs_global_avoid = tot_ecrrs_global_avoid-gmean_ecrrs_g_av, gmean_ecrrs_spouse_anx = tot_ecrrs_spouse_anx-gmean_ecrrs_s_anx, gmean_ecrrs_spouse_avoid = tot_ecrrs_spouse_avoid-gmean_ecrrs_s_av, gmean_ysl = tot_ysl-gmean_ysl, gmean_icg = tot_icg-gmean_icg)
stargazer(as.data.frame(gcvars2), type="text") # yes, the latter are zero

=====================================================================
Statistic                N     Mean    St. Dev.     Min        Max   
---------------------------------------------------------------------
age                      39 25,309.790 2,382.786   20,845    29,101  
age_yrs                  39   69.342     6.528     57.110    79.729  
timesincedeath           39  468.641    251.589     151       1,095  
tot_bdi                  39   10.538     7.701       0         28    
tot_bgq                  39   9.590      2.291       6         15    
tot_ecrrs_global_anx     39   3.291      1.449     1.000      6.333  
tot_ecrrs_global_avoid   39   3.402      1.101     1.000      6.333  
tot_ecrrs_spouse_anx     39   2.171      1.543     1.000      6.000  
tot_ecrrs_spouse_avoid   39   2.098      1.152     1.000      5.333  
tot_ysl                  39   67.846    18.790       31        100   
tot_icg                  39   23.385    12.630       4         51    
gmean_age                39   0.000    2,382.786 -4,464.795 3,791.205
gmean_age_yrs            39   0.000      6.528    -12.232    10.387  
gmean_timesincedeath     39   0.000     251.589   -317.641   626.359 
gmean_bdi                39   0.000      7.701    -10.538    17.462  
gmean_bgq                39   0.000      2.291     -3.590     5.410  
gmean_ecrrs_global_anx   39   0.000      1.449     -2.291     3.043  
gmean_ecrrs_global_avoid 39   0.000      1.101     -2.402     2.932  
gmean_ecrrs_spouse_anx   39   0.000      1.543     -1.171     3.829  
gmean_ecrrs_spouse_avoid 39   -0.000     1.152     -1.098     3.235  
gmean_ysl                39   0.000     18.790    -36.846    32.154  
gmean_icg                39   0.000     12.630    -19.385    27.615  
---------------------------------------------------------------------
saveRDS(mlm_bx5, "~/Dropbox/2018Fall/FSHD617C/mlm_bx5.rds")

Evaluate missing data

mlm_bx5 %>% 
  select_if(function(x) any(is.na(x))) %>% 
  summarise_all(funs(mean(is.na(.)))) -> mlm_bx5_NAs # or use sum() instead of mean() to find n missing values
mlm_bx5_NAs <- gather(mlm_bx5_NAs) # create long data that lists variables with any NAs, and % of data missing
mlm_bx5_NAs

One person is missing rs2254298 genotype (rs2254298 and rs2254298_A.carrier), which we already knew about.

Scale reliabilities

This dataset does not include the item-level data, but we can get that from the master dataset we loaded in earlier.

library(psych)
mlm_scales <- filter(master, !grepl("D149",ID))
## BDI
mlm_tot_bdi <- subset(mlm_scales, select=c(bdi_1c:bdi_21c))  
psych::alpha(mlm_tot_bdi)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_bdi)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
       0.9       0.9    0.97      0.31 9.4 0.023 0.51 0.37     0.32

 lower alpha upper     95% confidence boundaries
0.85 0.9 0.94 

 Reliability if an item is dropped:
        raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
bdi_1c       0.89      0.90    0.97      0.31 8.9    0.024 0.036  0.32
bdi_2c       0.89      0.90    0.97      0.31 9.0    0.024 0.035  0.32
bdi_3c       0.90      0.90    0.97      0.32 9.3    0.023 0.034  0.33
bdi_4c       0.89      0.89    0.97      0.30 8.5    0.025 0.034  0.30
bdi_5c       0.89      0.90    0.97      0.30 8.7    0.024 0.036  0.31
bdi_6c       0.90      0.91    0.97      0.32 9.6    0.023 0.031  0.34
bdi_7c       0.89      0.90    0.97      0.30 8.7    0.024 0.035  0.30
bdi_8c       0.89      0.90    0.97      0.31 8.8    0.024 0.036  0.32
bdi_9c       0.89      0.90    0.97      0.31 9.1    0.024 0.035  0.33
bdi_10c      0.89      0.90    0.97      0.30 8.6    0.025 0.035  0.30
bdi_11c      0.89      0.90    0.97      0.32 9.3    0.023 0.035  0.33
bdi_12c      0.88      0.89    0.97      0.29 8.2    0.027 0.033  0.30
bdi_13c      0.89      0.90    0.97      0.31 9.1    0.024 0.035  0.32
bdi_14c      0.89      0.90    0.97      0.30 8.7    0.024 0.035  0.31
bdi_15c      0.89      0.90    0.96      0.30 8.5    0.026 0.035  0.30
bdi_16c      0.89      0.90    0.97      0.31 8.9    0.024 0.037  0.31
bdi_17c      0.89      0.90    0.97      0.31 9.0    0.024 0.035  0.32
bdi_18c      0.90      0.91    0.97      0.33 9.8    0.022 0.033  0.35
bdi_19c      0.89      0.90    0.97      0.31 8.9    0.024 0.033  0.31
bdi_20c      0.89      0.90    0.97      0.30 8.6    0.025 0.035  0.30
bdi_21c      0.91      0.91    0.97      0.32 9.5    0.021 0.033  0.33

 Item statistics 
         n raw.r std.r r.cor r.drop mean   sd
bdi_1c  38  0.58  0.60  0.59   0.53 0.42 0.50
bdi_2c  39  0.55  0.56  0.55   0.47 0.36 0.63
bdi_3c  39  0.41  0.46  0.45   0.36 0.21 0.47
bdi_4c  39  0.75  0.75  0.75   0.71 0.79 0.70
bdi_5c  39  0.66  0.69  0.68   0.62 0.41 0.50
bdi_6c  39  0.29  0.35  0.33   0.25 0.10 0.38
bdi_7c  38  0.64  0.68  0.68   0.59 0.32 0.66
bdi_8c  38  0.63  0.63  0.62   0.56 0.32 0.53
bdi_9c  38  0.50  0.54  0.53   0.46 0.13 0.34
bdi_10c 39  0.70  0.70  0.70   0.66 0.67 0.62
bdi_11c 39  0.47  0.46  0.45   0.40 0.54 0.60
bdi_12c 39  0.87  0.85  0.85   0.84 0.64 0.84
bdi_13c 38  0.56  0.52  0.51   0.49 0.66 0.75
bdi_14c 39  0.63  0.65  0.65   0.58 0.38 0.63
bdi_15c 38  0.79  0.73  0.74   0.71 0.55 0.72
bdi_16c 39  0.60  0.60  0.58   0.53 0.77 0.67
bdi_17c 39  0.55  0.57  0.56   0.50 0.23 0.48
bdi_18c 39  0.27  0.27  0.24   0.20 0.64 0.67
bdi_19c 38  0.63  0.61  0.60   0.58 0.63 0.63
bdi_20c 39  0.74  0.70  0.70   0.69 0.67 0.66
bdi_21c 39  0.44  0.36  0.34   0.32 1.18 1.14

Non missing response frequency for each item
           0    1    2    3 miss
bdi_1c  0.58 0.42 0.00 0.00 0.03
bdi_2c  0.72 0.21 0.08 0.00 0.00
bdi_3c  0.82 0.15 0.03 0.00 0.00
bdi_4c  0.33 0.56 0.08 0.03 0.00
bdi_5c  0.59 0.41 0.00 0.00 0.00
bdi_6c  0.92 0.05 0.03 0.00 0.00
bdi_7c  0.79 0.11 0.11 0.00 0.03
bdi_8c  0.71 0.26 0.03 0.00 0.03
bdi_9c  0.87 0.13 0.00 0.00 0.03
bdi_10c 0.38 0.59 0.00 0.03 0.00
bdi_11c 0.51 0.44 0.05 0.00 0.00
bdi_12c 0.54 0.33 0.08 0.05 0.00
bdi_13c 0.50 0.34 0.16 0.00 0.03
bdi_14c 0.69 0.23 0.08 0.00 0.00
bdi_15c 0.58 0.29 0.13 0.00 0.03
bdi_16c 0.36 0.51 0.13 0.00 0.00
bdi_17c 0.79 0.18 0.03 0.00 0.00
bdi_18c 0.44 0.51 0.03 0.03 0.00
bdi_19c 0.45 0.47 0.08 0.00 0.03
bdi_20c 0.44 0.46 0.10 0.00 0.00
bdi_21c 0.41 0.15 0.28 0.15 0.00
## BGQ
mlm_tot_bgq <- subset(mlm_scales, select=c(bgq_1:bgq_5))
psych::alpha(mlm_tot_bgq)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_bgq)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
      0.71      0.72    0.73      0.34 2.5 0.071  1.9 0.46     0.31

 lower alpha upper     95% confidence boundaries
0.57 0.71 0.85 

 Reliability if an item is dropped:
      raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
bgq_1      0.62      0.63    0.60      0.29 1.7    0.097 0.030  0.32
bgq_2      0.56      0.55    0.51      0.23 1.2    0.112 0.014  0.29
bgq_3      0.72      0.73    0.73      0.41 2.8    0.074 0.028  0.32
bgq_4      0.67      0.67    0.68      0.34 2.1    0.085 0.052  0.32
bgq_5      0.73      0.73    0.73      0.40 2.7    0.070 0.033  0.35

 Item statistics 
       n raw.r std.r r.cor r.drop mean   sd
bgq_1 39  0.76  0.76  0.72   0.57  1.9 0.72
bgq_2 39  0.85  0.86  0.88   0.74  2.1 0.62
bgq_3 39  0.49  0.56  0.38   0.30  2.0 0.51
bgq_4 39  0.69  0.68  0.55   0.47  1.8 0.71
bgq_5 39  0.61  0.57  0.38   0.34  1.7 0.76

Non missing response frequency for each item
         1    2    3 miss
bgq_1 0.28 0.49 0.23    0
bgq_2 0.15 0.62 0.23    0
bgq_3 0.13 0.74 0.13    0
bgq_4 0.33 0.49 0.18    0
bgq_5 0.46 0.36 0.18    0
## ICG
mlm_tot_icg <- subset(mlm_scales, select=c(icg_1:icg_19))
psych::alpha(mlm_tot_icg)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_icg)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean   sd median_r
      0.92      0.92    0.96      0.37  11 0.017  1.2 0.66     0.36

 lower alpha upper     95% confidence boundaries
0.89 0.92 0.96 

 Reliability if an item is dropped:
       raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
icg_1       0.92      0.91    0.96      0.36  10    0.018 0.036  0.35
icg_2       0.92      0.92    0.96      0.37  11    0.018 0.039  0.37
icg_3       0.91      0.91    0.96      0.36  10    0.019 0.036  0.35
icg_4       0.92      0.91    0.96      0.37  10    0.018 0.041  0.35
icg_5       0.92      0.92    0.96      0.39  12    0.017 0.040  0.39
icg_6       0.92      0.91    0.96      0.37  11    0.018 0.038  0.35
icg_7       0.91      0.91    0.96      0.36  10    0.019 0.037  0.35
icg_8       0.92      0.91    0.96      0.36  10    0.018 0.039  0.35
icg_9       0.92      0.91    0.96      0.36  10    0.018 0.040  0.35
icg_10      0.92      0.91    0.96      0.37  11    0.018 0.041  0.34
icg_11      0.93      0.92    0.96      0.40  12    0.017 0.034  0.40
icg_12      0.92      0.92    0.96      0.40  12    0.016 0.039  0.41
icg_13      0.91      0.91    0.96      0.36  10    0.019 0.039  0.33
icg_14      0.93      0.92    0.96      0.39  12    0.016 0.037  0.40
icg_15      0.92      0.92    0.96      0.39  11    0.017 0.039  0.38
icg_16      0.92      0.91    0.96      0.37  11    0.018 0.041  0.35
icg_17      0.92      0.91    0.96      0.37  11    0.018 0.039  0.36
icg_18      0.92      0.92    0.96      0.38  11    0.017 0.042  0.38
icg_19      0.92      0.91    0.95      0.37  10    0.018 0.040  0.35

 Item statistics 
        n raw.r std.r r.cor r.drop mean   sd
icg_1  39  0.82  0.81  0.81   0.80 1.41 0.88
icg_2  39  0.65  0.64  0.63   0.60 1.79 0.95
icg_3  39  0.83  0.82  0.82   0.80 1.21 1.20
icg_4  39  0.72  0.72  0.71   0.67 2.54 1.14
icg_5  39  0.43  0.43  0.39   0.36 1.87 0.92
icg_6  39  0.73  0.71  0.71   0.68 1.46 1.19
icg_7  39  0.80  0.79  0.79   0.77 1.64 1.16
icg_8  39  0.76  0.76  0.76   0.72 1.36 1.09
icg_9  39  0.75  0.76  0.76   0.72 0.72 0.76
icg_10 39  0.69  0.71  0.70   0.65 0.90 0.94
icg_11 39  0.24  0.28  0.23   0.19 0.26 0.55
icg_12 39  0.37  0.38  0.33   0.30 0.72 0.92
icg_13 39  0.82  0.81  0.81   0.78 1.97 1.27
icg_14 39  0.37  0.39  0.38   0.31 0.67 0.96
icg_15 39  0.47  0.49  0.48   0.42 0.33 0.84
icg_16 39  0.69  0.68  0.66   0.64 0.77 1.22
icg_17 39  0.68  0.67  0.66   0.63 0.85 1.09
icg_18 39  0.56  0.56  0.55   0.50 0.92 1.09
icg_19 39  0.73  0.72  0.72   0.69 2.00 1.08

Non missing response frequency for each item
          0    1    2    3    4 miss
icg_1  0.15 0.38 0.36 0.10 0.00    0
icg_2  0.08 0.31 0.38 0.21 0.03    0
icg_3  0.38 0.21 0.28 0.08 0.05    0
icg_4  0.08 0.08 0.28 0.36 0.21    0
icg_5  0.05 0.28 0.46 0.15 0.05    0
icg_6  0.26 0.28 0.26 0.15 0.05    0
icg_7  0.18 0.31 0.26 0.21 0.05    0
icg_8  0.26 0.31 0.28 0.13 0.03    0
icg_9  0.46 0.36 0.18 0.00 0.00    0
icg_10 0.41 0.36 0.15 0.08 0.00    0
icg_11 0.79 0.15 0.05 0.00 0.00    0
icg_12 0.54 0.26 0.15 0.05 0.00    0
icg_13 0.13 0.26 0.28 0.18 0.15    0
icg_14 0.59 0.21 0.18 0.00 0.03    0
icg_15 0.82 0.08 0.08 0.00 0.03    0
icg_16 0.64 0.13 0.10 0.08 0.05    0
icg_17 0.54 0.18 0.21 0.05 0.03    0
icg_18 0.49 0.23 0.15 0.13 0.00    0
icg_19 0.08 0.26 0.33 0.26 0.08    0
## YSL
mlm_tot_ysl <- subset(mlm_scales, select=c(ysl_1:ysl_21))
psych::alpha(mlm_tot_ysl)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_ysl)

  raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
      0.96      0.96    0.98      0.52  22 0.0094  3.2 0.89     0.55

 lower alpha upper     95% confidence boundaries
0.94 0.96 0.98 

 Reliability if an item is dropped:
       raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
ysl_1       0.96      0.96    0.98      0.55  24   0.0093 0.018  0.57
ysl_2       0.95      0.95    0.98      0.51  21   0.0100 0.026  0.54
ysl_3       0.95      0.95    0.98      0.51  21   0.0101 0.026  0.54
ysl_4       0.96      0.96    0.98      0.52  21   0.0098 0.026  0.55
ysl_5       0.95      0.95    0.98      0.51  21   0.0100 0.026  0.54
ysl_6       0.95      0.95    0.98      0.51  21   0.0101 0.026  0.54
ysl_7       0.96      0.96    0.98      0.52  21   0.0097 0.025  0.55
ysl_8       0.95      0.95    0.98      0.51  21   0.0102 0.025  0.54
ysl_9       0.95      0.95    0.98      0.51  21   0.0102 0.025  0.54
ysl_10      0.95      0.95    0.98      0.51  21   0.0101 0.025  0.53
ysl_11      0.95      0.95    0.98      0.51  21   0.0101 0.025  0.54
ysl_12      0.96      0.96    0.98      0.52  22   0.0095 0.026  0.57
ysl_13      0.96      0.96    0.98      0.53  23   0.0092 0.024  0.57
ysl_14      0.95      0.95    0.98      0.51  21   0.0100 0.026  0.53
ysl_15      0.96      0.96    0.98      0.53  23   0.0092 0.024  0.57
ysl_16      0.96      0.96    0.98      0.53  22   0.0095 0.026  0.57
ysl_17      0.95      0.96    0.98      0.52  21   0.0099 0.025  0.54
ysl_18      0.96      0.96    0.98      0.52  22   0.0097 0.026  0.55
ysl_19      0.95      0.95    0.98      0.51  21   0.0099 0.026  0.54
ysl_20      0.95      0.96    0.98      0.52  21   0.0099 0.025  0.55
ysl_21      0.95      0.95    0.98      0.51  21   0.0102 0.024  0.54

 Item statistics 
        n raw.r std.r r.cor r.drop mean   sd
ysl_1  39  0.37  0.39  0.38   0.33  4.4 0.74
ysl_2  39  0.79  0.80  0.79   0.77  3.6 1.10
ysl_3  39  0.85  0.85  0.85   0.83  2.4 0.99
ysl_4  39  0.73  0.74  0.73   0.71  2.8 0.90
ysl_5  39  0.79  0.79  0.79   0.76  2.9 1.14
ysl_6  39  0.82  0.82  0.82   0.79  2.9 1.20
ysl_7  39  0.74  0.73  0.73   0.70  3.2 1.63
ysl_8  39  0.84  0.83  0.82   0.82  2.5 1.50
ysl_9  39  0.84  0.83  0.83   0.81  2.7 1.40
ysl_10 39  0.82  0.82  0.82   0.80  2.6 1.35
ysl_11 39  0.83  0.82  0.82   0.80  2.9 1.32
ysl_12 39  0.65  0.65  0.65   0.61  2.7 1.28
ysl_13 39  0.57  0.57  0.57   0.53  3.7 1.25
ysl_14 39  0.80  0.80  0.79   0.78  3.7 1.21
ysl_15 39  0.56  0.57  0.56   0.52  3.9 1.24
ysl_16 39  0.58  0.60  0.58   0.55  4.5 0.76
ysl_17 39  0.75  0.74  0.74   0.72  3.5 1.14
ysl_18 39  0.71  0.71  0.70   0.68  3.1 1.20
ysl_19 39  0.77  0.77  0.76   0.74  3.4 1.12
ysl_20 39  0.76  0.76  0.76   0.73  3.3 1.36
ysl_21 39  0.84  0.84  0.84   0.82  3.2 1.38

Non missing response frequency for each item
          1    2    3    4    5 miss
ysl_1  0.00 0.00 0.15 0.33 0.51    0
ysl_2  0.03 0.15 0.28 0.31 0.23    0
ysl_3  0.26 0.21 0.44 0.10 0.00    0
ysl_4  0.08 0.28 0.46 0.15 0.03    0
ysl_5  0.08 0.36 0.26 0.21 0.10    0
ysl_6  0.18 0.18 0.28 0.31 0.05    0
ysl_7  0.23 0.15 0.15 0.10 0.36    0
ysl_8  0.36 0.21 0.18 0.08 0.18    0
ysl_9  0.28 0.15 0.31 0.10 0.15    0
ysl_10 0.28 0.23 0.21 0.18 0.10    0
ysl_11 0.15 0.23 0.31 0.13 0.18    0
ysl_12 0.21 0.28 0.23 0.18 0.10    0
ysl_13 0.08 0.08 0.23 0.26 0.36    0
ysl_14 0.05 0.10 0.28 0.21 0.36    0
ysl_15 0.03 0.15 0.21 0.15 0.46    0
ysl_16 0.00 0.00 0.15 0.15 0.69    0
ysl_17 0.05 0.13 0.33 0.26 0.23    0
ysl_18 0.10 0.26 0.21 0.33 0.10    0
ysl_19 0.05 0.13 0.36 0.26 0.21    0
ysl_20 0.10 0.23 0.21 0.21 0.26    0
ysl_21 0.18 0.10 0.26 0.26 0.21    0
## ECRRS-global anxiety
mlm_tot_ecrrs_global_anx <- subset(mlm_scales,  select=c(ecrrs_global_7c:ecrrs_global_9c))
psych::alpha(mlm_tot_ecrrs_global_anx)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_ecrrs_global_anx)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
       0.9       0.9    0.86      0.74 8.6 0.029  3.3 1.4     0.75

 lower alpha upper     95% confidence boundaries
0.84 0.9 0.95 

 Reliability if an item is dropped:
                raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
ecrrs_global_7c      0.88      0.88    0.79      0.79 7.3    0.038    NA  0.79
ecrrs_global_8c      0.82      0.82    0.69      0.69 4.4    0.059    NA  0.69
ecrrs_global_9c      0.86      0.86    0.75      0.75 6.1    0.045    NA  0.75

 Item statistics 
                 n raw.r std.r r.cor r.drop mean  sd
ecrrs_global_7c 39  0.89  0.89  0.80   0.76  3.6 1.6
ecrrs_global_8c 39  0.93  0.93  0.89   0.84  3.1 1.6
ecrrs_global_9c 39  0.91  0.91  0.84   0.79  3.2 1.6

Non missing response frequency for each item
                   1    2    3    4    5    6    7 miss
ecrrs_global_7c 0.08 0.23 0.18 0.26 0.13 0.08 0.05    0
ecrrs_global_8c 0.13 0.31 0.23 0.18 0.03 0.10 0.03    0
ecrrs_global_9c 0.15 0.28 0.08 0.23 0.18 0.08 0.00    0
## ECRRS-global avoid
mlm_tot_ecrrs_global_avoid <- subset(mlm_scales,  select=c(ecrrs_global_1r:ecrrs_global_4r, ecrrs_global_5c, ecrrs_global_6c))
psych::alpha(mlm_tot_ecrrs_global_avoid)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_ecrrs_global_avoid)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
      0.84      0.85    0.88      0.48 5.6 0.038  3.4 1.1      0.5

 lower alpha upper     95% confidence boundaries
0.77 0.84 0.92 

 Reliability if an item is dropped:
                raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
ecrrs_global_1r      0.84      0.85    0.88      0.53 5.7    0.041 0.029  0.50
ecrrs_global_2r      0.78      0.78    0.81      0.42 3.7    0.052 0.030  0.40
ecrrs_global_3r      0.80      0.80    0.82      0.45 4.1    0.048 0.033  0.47
ecrrs_global_4r      0.84      0.84    0.88      0.51 5.3    0.040 0.043  0.52
ecrrs_global_5c      0.83      0.84    0.84      0.51 5.2    0.042 0.023  0.51
ecrrs_global_6c      0.79      0.81    0.82      0.46 4.3    0.051 0.035  0.50

 Item statistics 
                 n raw.r std.r r.cor r.drop mean   sd
ecrrs_global_1r 39  0.57  0.64  0.53   0.46  2.0 0.93
ecrrs_global_2r 39  0.87  0.89  0.89   0.80  3.1 1.32
ecrrs_global_3r 39  0.80  0.83  0.82   0.70  2.9 1.42
ecrrs_global_4r 39  0.69  0.68  0.56   0.53  4.3 1.59
ecrrs_global_5c 39  0.74  0.69  0.65   0.58  4.0 1.72
ecrrs_global_6c 39  0.83  0.80  0.78   0.72  4.2 1.71

Non missing response frequency for each item
                   1    2    3    4    5    6    7 miss
ecrrs_global_1r 0.33 0.44 0.18 0.03 0.03 0.00 0.00    0
ecrrs_global_2r 0.10 0.26 0.36 0.08 0.18 0.03 0.00    0
ecrrs_global_3r 0.15 0.31 0.28 0.08 0.13 0.05 0.00    0
ecrrs_global_4r 0.08 0.05 0.18 0.10 0.41 0.10 0.08    0
ecrrs_global_5c 0.05 0.15 0.26 0.15 0.13 0.18 0.08    0
ecrrs_global_6c 0.05 0.13 0.23 0.10 0.26 0.13 0.10    0
## ECRRS-spouse anxiety
mlm_tot_ecrrs_spouse_anx <- subset(mlm_scales,  select=c(ecrrs_spouse_7c:ecrrs_spouse_9c))
psych::alpha(mlm_tot_ecrrs_spouse_anx)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_ecrrs_spouse_anx)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
      0.86      0.86    0.84      0.67 6.1 0.042  2.2 1.5     0.63

 lower alpha upper     95% confidence boundaries
0.77 0.86 0.94 

 Reliability if an item is dropped:
                raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r med.r
ecrrs_spouse_7c      0.92      0.92    0.86      0.86 11.9    0.025    NA  0.86
ecrrs_spouse_8c      0.69      0.69    0.53      0.53  2.2    0.099    NA  0.53
ecrrs_spouse_9c      0.77      0.77    0.63      0.63  3.3    0.074    NA  0.63

 Item statistics 
                 n raw.r std.r r.cor r.drop mean  sd
ecrrs_spouse_7c 39  0.82  0.81  0.63   0.60  2.2 1.8
ecrrs_spouse_8c 39  0.93  0.94  0.93   0.84  2.3 1.8
ecrrs_spouse_9c 39  0.89  0.90  0.87   0.77  2.1 1.6

Non missing response frequency for each item
                   1    2    3    4    5    6    7 miss
ecrrs_spouse_7c 0.59 0.15 0.03 0.08 0.08 0.03 0.05    0
ecrrs_spouse_8c 0.54 0.18 0.00 0.13 0.08 0.05 0.03    0
ecrrs_spouse_9c 0.59 0.18 0.00 0.13 0.03 0.08 0.00    0
## ECRRS-spouse avoid
mlm_tot_ecrrs_spouse_avoid <- subset(mlm_scales,  select=c(ecrrs_spouse_1r:ecrrs_spouse_4r, ecrrs_spouse_5c, ecrrs_spouse_6c))
psych::alpha(mlm_tot_ecrrs_spouse_avoid)

Reliability analysis   
Call: psych::alpha(x = mlm_tot_ecrrs_spouse_avoid)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean  sd median_r
      0.91      0.93    0.95      0.68  13 0.024  2.1 1.2     0.67

 lower alpha upper     95% confidence boundaries
0.86 0.91 0.95 

 Reliability if an item is dropped:
                raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r med.r
ecrrs_spouse_1r      0.87      0.89    0.91      0.63  8.5    0.034 0.024  0.60
ecrrs_spouse_2r      0.88      0.91    0.92      0.66  9.6    0.030 0.021  0.65
ecrrs_spouse_3r      0.88      0.90    0.92      0.65  9.1    0.031 0.025  0.62
ecrrs_spouse_4r      0.89      0.92    0.94      0.69 11.0    0.029 0.029  0.65
ecrrs_spouse_5c      0.92      0.94    0.95      0.75 15.2    0.021 0.021  0.76
ecrrs_spouse_6c      0.89      0.92    0.94      0.71 12.2    0.030 0.036  0.76

 Item statistics 
                 n raw.r std.r r.cor r.drop mean  sd
ecrrs_spouse_1r 39  0.94  0.96  0.97   0.91  2.0 1.2
ecrrs_spouse_2r 39  0.87  0.90  0.91   0.81  1.9 1.1
ecrrs_spouse_3r 39  0.90  0.92  0.93   0.86  1.8 1.0
ecrrs_spouse_4r 39  0.82  0.84  0.81   0.74  2.0 1.4
ecrrs_spouse_5c 39  0.77  0.72  0.63   0.61  2.7 1.9
ecrrs_spouse_6c 39  0.83  0.80  0.76   0.73  2.3 1.6

Non missing response frequency for each item
                   1    2    3    4    5    6    7 miss
ecrrs_spouse_1r 0.44 0.31 0.13 0.10 0.00 0.03 0.00    0
ecrrs_spouse_2r 0.44 0.41 0.05 0.08 0.00 0.03 0.00    0
ecrrs_spouse_3r 0.51 0.33 0.05 0.08 0.03 0.00 0.00    0
ecrrs_spouse_4r 0.51 0.23 0.10 0.10 0.03 0.00 0.03    0
ecrrs_spouse_5c 0.38 0.23 0.03 0.18 0.05 0.10 0.03    0
ecrrs_spouse_6c 0.49 0.23 0.00 0.13 0.13 0.03 0.00    0

Drop the 4 people with low trial counts

data <- filter(mlm_bx5, !grepl("D101|D141|D147|D148",ID))
unique(data$ID)
 [1] "D102" "D103" "D104" "D105" "D107" "D110" "D113" "D114" "D115" "D117" "D118" "D119" "D120" "D121" "D122" "D123" "D125" "D126" "D127" "D128" "D129" "D130" "D131" "D132" "D133"
[26] "D134" "D135" "D137" "D138" "D139" "D140" "D142" "D144" "D145" "D146"

OLS estimates for exploratory purposes

# getting regular regression estimates one person at a time for exploratory purposes:
lmEst <- by(data, data$ID, function(x) summary(lm(bias_t ~ trialnum, data=x)))
lmEst
data$ID: D102

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.48996 -0.58245  0.06197  0.68084  2.64005 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.291e-01  1.136e-01  -1.137    0.257
trialnum    -2.241e-05  1.359e-03  -0.016    0.987

Residual standard error: 0.9253 on 273 degrees of freedom
Multiple R-squared:  9.965e-07, Adjusted R-squared:  -0.003662 
F-statistic: 0.000272 on 1 and 273 DF,  p-value: 0.9869

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D103

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2719 -0.5849 -0.0927  0.3886  2.9673 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.082364   0.102043   0.807    0.420
trialnum    -0.001906   0.001211  -1.573    0.117

Residual standard error: 0.8135 on 262 degrees of freedom
Multiple R-squared:  0.009358,  Adjusted R-squared:  0.005577 
F-statistic: 2.475 on 1 and 262 DF,  p-value: 0.1169

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D104

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34415 -0.69203 -0.01214  0.68738  2.23131 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1884022  0.1162064  -1.621    0.106
trialnum     0.0006281  0.0013690   0.459    0.647

Residual standard error: 0.9107 on 260 degrees of freedom
Multiple R-squared:  0.000809,  Adjusted R-squared:  -0.003034 
F-statistic: 0.2105 on 1 and 260 DF,  p-value: 0.6468

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D105

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3081 -0.7206 -0.0130  0.6973  3.2083 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0087356  0.1191993  -0.073    0.942
trialnum     0.0001354  0.0014298   0.095    0.925

Residual standard error: 0.9797 on 269 degrees of freedom
Multiple R-squared:  3.331e-05, Adjusted R-squared:  -0.003684 
F-statistic: 0.008962 on 1 and 269 DF,  p-value: 0.9247

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D107

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.85010 -0.44853 -0.01855  0.41297  3.11259 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.060853   0.101464  -0.600    0.549
trialnum    -0.001334   0.001215  -1.098    0.273

Residual standard error: 0.731 on 222 degrees of freedom
Multiple R-squared:  0.0054,    Adjusted R-squared:  0.0009193 
F-statistic: 1.205 on 1 and 222 DF,  p-value: 0.2735

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D110

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.67052 -0.56069 -0.05879  0.55996  2.49168 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.364749   0.096444  -3.782  0.00019 ***
trialnum     0.001562   0.001150   1.359  0.17533    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8085 on 282 degrees of freedom
Multiple R-squared:  0.006504,  Adjusted R-squared:  0.002981 
F-statistic: 1.846 on 1 and 282 DF,  p-value: 0.1753

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D113

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5435 -0.7334 -0.0605  0.6933  3.8484 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.156910   0.143946    1.09    0.277
trialnum    -0.001568   0.001723   -0.91    0.364

Residual standard error: 1.158 on 255 degrees of freedom
Multiple R-squared:  0.003239,  Adjusted R-squared:  -0.0006703 
F-statistic: 0.8285 on 1 and 255 DF,  p-value: 0.3636

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D114

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.50969 -0.65882 -0.00707  0.71733  2.52092 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.025504   0.120929  -0.211    0.833
trialnum     0.000752   0.001435   0.524    0.601

Residual standard error: 0.9673 on 264 degrees of freedom
Multiple R-squared:  0.001039,  Adjusted R-squared:  -0.002745 
F-statistic: 0.2745 on 1 and 264 DF,  p-value: 0.6007

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D115

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9324 -0.7991  0.0765  0.8424  3.6521 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.029711   0.148768  -0.200    0.842
trialnum    -0.000728   0.001758  -0.414    0.679

Residual standard error: 1.191 on 268 degrees of freedom
Multiple R-squared:  0.0006395, Adjusted R-squared:  -0.00309 
F-statistic: 0.1715 on 1 and 268 DF,  p-value: 0.6791

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D117

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.64965 -0.58759  0.01891  0.64528  2.42830 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1011772  0.1241156   0.815    0.416
trialnum    0.0005858  0.0014759   0.397    0.692

Residual standard error: 1.006 on 265 degrees of freedom
Multiple R-squared:  0.000594,  Adjusted R-squared:  -0.003177 
F-statistic: 0.1575 on 1 and 265 DF,  p-value: 0.6918

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D118

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5708 -0.6143  0.0243  0.5573  3.5952 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.147982   0.115324  -1.283    0.201
trialnum     0.001085   0.001367   0.794    0.428

Residual standard error: 0.9324 on 258 degrees of freedom
Multiple R-squared:  0.002435,  Adjusted R-squared:  -0.001432 
F-statistic: 0.6298 on 1 and 258 DF,  p-value: 0.4282

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D119

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.46893 -0.28725 -0.04541  0.27308  1.41940 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.1848698  0.0580857  -3.183  0.00163 **
trialnum     0.0009938  0.0006790   1.464  0.14452   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4621 on 263 degrees of freedom
Multiple R-squared:  0.008078,  Adjusted R-squared:  0.004307 
F-statistic: 2.142 on 1 and 263 DF,  p-value: 0.1445

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D120

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.23591 -0.59528  0.02358  0.66195  2.25539 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.013139   0.106695   0.123    0.902
trialnum    0.001517   0.001279   1.186    0.237

Residual standard error: 0.8919 on 281 degrees of freedom
Multiple R-squared:  0.004979,  Adjusted R-squared:  0.001438 
F-statistic: 1.406 on 1 and 281 DF,  p-value: 0.2367

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D121

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.43694 -0.91284  0.06214  0.88456  2.63851 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0733386  0.1441131   0.509    0.611
trialnum    0.0007977  0.0016891   0.472    0.637

Residual standard error: 1.109 on 250 degrees of freedom
Multiple R-squared:  0.0008913, Adjusted R-squared:  -0.003105 
F-statistic: 0.223 on 1 and 250 DF,  p-value: 0.6372

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D122

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.56956 -0.41297  0.06257  0.47489  2.13686 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.363954   0.087539   4.158 4.36e-05 ***
trialnum    -0.002274   0.001048  -2.170   0.0309 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7109 on 262 degrees of freedom
Multiple R-squared:  0.01766,   Adjusted R-squared:  0.01391 
F-statistic:  4.71 on 1 and 262 DF,  p-value: 0.03089

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D123

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.40012 -0.56537 -0.00043  0.66146  2.34616 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0491546  0.1061556   0.463    0.644
trialnum    0.0006319  0.0012798   0.494    0.622

Residual standard error: 0.8798 on 280 degrees of freedom
Multiple R-squared:  0.0008698, Adjusted R-squared:  -0.002698 
F-statistic: 0.2438 on 1 and 280 DF,  p-value: 0.6219

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D125

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8243 -0.7796  0.1010  0.8776  2.5089 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.104281   0.149377  -0.698   0.4858  
trialnum     0.003034   0.001758   1.726   0.0855 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.171 on 251 degrees of freedom
Multiple R-squared:  0.01174,   Adjusted R-squared:  0.007799 
F-statistic: 2.981 on 1 and 251 DF,  p-value: 0.08549

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D126

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.59811 -0.84432  0.03587  0.79812  2.77645 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.116054   0.146348   0.793    0.429
trialnum    -0.001164   0.001716  -0.678    0.498

Residual standard error: 1.132 on 251 degrees of freedom
Multiple R-squared:  0.001829,  Adjusted R-squared:  -0.002148 
F-statistic: 0.4599 on 1 and 251 DF,  p-value: 0.4983

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D127

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.34833 -0.46407  0.01749  0.49797  2.14017 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0083635  0.0926001   0.090    0.928
trialnum    0.0002109  0.0011159   0.189    0.850

Residual standard error: 0.7748 on 277 degrees of freedom
Multiple R-squared:  0.0001289, Adjusted R-squared:  -0.003481 
F-statistic: 0.03571 on 1 and 277 DF,  p-value: 0.8503

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D128

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.88746 -0.77939  0.01618  0.90018  2.92400 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.036492   0.144944   0.252    0.801
trialnum    0.001748   0.001781   0.982    0.327

Residual standard error: 1.122 on 246 degrees of freedom
Multiple R-squared:  0.003902,  Adjusted R-squared:  -0.0001475 
F-statistic: 0.9636 on 1 and 246 DF,  p-value: 0.3273

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D129

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.29761 -0.68312 -0.07219  0.64877  2.57404 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  0.271839   0.116233   2.339   0.0201 *
trialnum    -0.001223   0.001378  -0.887   0.3756  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9355 on 270 degrees of freedom
Multiple R-squared:  0.002909,  Adjusted R-squared:  -0.0007842 
F-statistic: 0.7876 on 1 and 270 DF,  p-value: 0.3756

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D130

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.96670 -0.73599  0.05166  0.64512  2.42892 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.286183   0.107700  -2.657  0.00834 **
trialnum     0.002860   0.001289   2.218  0.02737 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8939 on 273 degrees of freedom
Multiple R-squared:  0.0177,    Adjusted R-squared:  0.01411 
F-statistic:  4.92 on 1 and 273 DF,  p-value: 0.02737

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D131

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2194 -0.5982 -0.0545  0.6498  2.7764 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0791588  0.1059716   0.747    0.456
trialnum    -0.0003126  0.0012645  -0.247    0.805

Residual standard error: 0.8565 on 271 degrees of freedom
Multiple R-squared:  0.0002255, Adjusted R-squared:  -0.003464 
F-statistic: 0.06111 on 1 and 271 DF,  p-value: 0.8049

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D132

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.1408 -0.7653  0.1364  0.8599  2.3351 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.013177   0.136249   0.097    0.923
trialnum    0.001955   0.001622   1.206    0.229

Residual standard error: 1.121 on 279 degrees of freedom
Multiple R-squared:  0.005182,  Adjusted R-squared:  0.001616 
F-statistic: 1.453 on 1 and 279 DF,  p-value: 0.229

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D133

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.4840 -0.8749 -0.1888  0.8248  3.3248 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1745814  0.1439426  -1.213    0.226
trialnum    -0.0008207  0.0017355  -0.473    0.637

Residual standard error: 1.142 on 240 degrees of freedom
Multiple R-squared:  0.000931,  Adjusted R-squared:  -0.003232 
F-statistic: 0.2236 on 1 and 240 DF,  p-value: 0.6367

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D134

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.2731 -0.5992 -0.0479  0.7295  3.3435 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0536221  0.1424674   0.376    0.707
trialnum    -0.0004335  0.0016789  -0.258    0.796

Residual standard error: 1.138 on 262 degrees of freedom
Multiple R-squared:  0.0002544, Adjusted R-squared:  -0.003561 
F-statistic: 0.06666 on 1 and 262 DF,  p-value: 0.7965

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D135

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.78490 -0.53797  0.01393  0.67350  2.64560 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.053305   0.123706   0.431   0.6669  
trialnum    0.003670   0.001462   2.510   0.0127 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9873 on 261 degrees of freedom
Multiple R-squared:  0.02356,   Adjusted R-squared:  0.01982 
F-statistic: 6.298 on 1 and 261 DF,  p-value: 0.01269

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D137

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6075 -0.9216 -0.0313  0.7593  3.2136 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2405934  0.1503080  -1.601    0.111
trialnum     0.0008816  0.0017989   0.490    0.624

Residual standard error: 1.231 on 270 degrees of freedom
Multiple R-squared:  0.0008888, Adjusted R-squared:  -0.002812 
F-statistic: 0.2402 on 1 and 270 DF,  p-value: 0.6245

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D138

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.90652 -0.57527 -0.05418  0.52103  2.44687 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) -0.2472722  0.1014525  -2.437   0.0154 *
trialnum     0.0003618  0.0012118   0.299   0.7655  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8344 on 272 degrees of freedom
Multiple R-squared:  0.0003276, Adjusted R-squared:  -0.003348 
F-statistic: 0.08913 on 1 and 272 DF,  p-value: 0.7655

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D139

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.89378 -0.76546  0.05778  0.67061  2.69126 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1587970  0.1281068    1.24    0.216
trialnum    -0.0001376  0.0015243   -0.09    0.928

Residual standard error: 1.025 on 260 degrees of freedom
Multiple R-squared:  3.136e-05, Adjusted R-squared:  -0.003815 
F-statistic: 0.008153 on 1 and 260 DF,  p-value: 0.9281

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D140

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3323 -0.6421  0.0875  0.6323  3.0150 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.03402    0.12900   0.264    0.792
trialnum    -0.00110    0.00152  -0.724    0.470

Residual standard error: 1.031 on 251 degrees of freedom
Multiple R-squared:  0.002084,  Adjusted R-squared:  -0.001892 
F-statistic: 0.5241 on 1 and 251 DF,  p-value: 0.4698

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D142

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9667 -0.7474  0.1062  0.7347  2.7858 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.046481   0.128445   0.362    0.718
trialnum    -0.001103   0.001506  -0.732    0.465

Residual standard error: 1.014 on 271 degrees of freedom
Multiple R-squared:  0.001975,  Adjusted R-squared:  -0.001707 
F-statistic: 0.5364 on 1 and 271 DF,  p-value: 0.4646

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D144

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4566 -0.6576 -0.1585  0.7136  3.0419 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.154162   0.137451  -1.122    0.263
trialnum    -0.000492   0.001635  -0.301    0.764

Residual standard error: 1.077 on 258 degrees of freedom
Multiple R-squared:  0.0003507, Adjusted R-squared:  -0.003524 
F-statistic: 0.0905 on 1 and 258 DF,  p-value: 0.7638

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D145

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9882 -1.0491  0.1342  0.9403  3.2810 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0368251  0.1526175   0.241    0.810
trialnum    -0.0003689  0.0018315  -0.201    0.841

Residual standard error: 1.246 on 267 degrees of freedom
Multiple R-squared:  0.0001519, Adjusted R-squared:  -0.003593 
F-statistic: 0.04056 on 1 and 267 DF,  p-value: 0.8405

----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D146

Call:
lm(formula = bias_t ~ trialnum, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2402 -0.7404 -0.0087  0.6633  3.1953 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.075070   0.122868   0.611    0.542
trialnum    -0.002039   0.001469  -1.388    0.166

Residual standard error: 1.014 on 275 degrees of freedom
Multiple R-squared:  0.00696,   Adjusted R-squared:  0.003349 
F-statistic: 1.928 on 1 and 275 DF,  p-value: 0.1662

Stem plot for the intercepts based on the OLS estimates:

int <- by(data, data$ID, function(x) coefficients(lm(bias_t ~ trialnum, data=x))[[1]])
int
data$ID: D102
[1] -0.1290878
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D103
[1] 0.08236442
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D104
[1] -0.1884022
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D105
[1] -0.008735585
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D107
[1] -0.06085291
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D110
[1] -0.3647491
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D113
[1] 0.1569096
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D114
[1] -0.02550412
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D115
[1] -0.02971102
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D117
[1] 0.1011772
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D118
[1] -0.1479823
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D119
[1] -0.1848698
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D120
[1] 0.01313922
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D121
[1] 0.07333861
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D122
[1] 0.3639537
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D123
[1] 0.04915464
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D125
[1] -0.1042809
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D126
[1] 0.116054
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D127
[1] 0.008363529
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D128
[1] 0.0364918
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D129
[1] 0.2718394
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D130
[1] -0.2861826
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D131
[1] 0.07915881
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D132
[1] 0.01317668
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D133
[1] -0.1745814
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D134
[1] 0.05362205
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D135
[1] 0.05330519
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D137
[1] -0.2405934
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D138
[1] -0.2472722
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D139
[1] 0.158797
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D140
[1] 0.03402106
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D142
[1] 0.04648122
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D144
[1] -0.1541616
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D145
[1] 0.03682513
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D146
[1] 0.07506976
summary(int)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.36470 -0.13850  0.01318 -0.01496  0.07420  0.36400 
stem(int, scale=2)

  The decimal point is 1 digit(s) to the left of the |

  -3 | 6
  -3 | 
  -2 | 95
  -2 | 4
  -1 | 98755
  -1 | 30
  -0 | 6
  -0 | 331
   0 | 111344
   0 | 55557888
   1 | 02
   1 | 66
   2 | 
   2 | 7
   3 | 
   3 | 6
hist(int)

Stem plot for the slopes based on the OLS estimates

slope <- by(data, data$ID, function(x) coefficients(lm(bias_t ~ trialnum, data=x))[[2]])
slope
data$ID: D102
[1] -2.241218e-05
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D103
[1] -0.001905679
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D104
[1] 0.0006280884
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D105
[1] 0.0001353558
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D107
[1] -0.001334097
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D110
[1] 0.001561949
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D113
[1] -0.001568244
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D114
[1] 0.0007520295
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D115
[1] -0.0007279879
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D117
[1] 0.000585758
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D118
[1] 0.001084571
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D119
[1] 0.000993783
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D120
[1] 0.001516656
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D121
[1] 0.0007976749
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D122
[1] -0.002274025
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D123
[1] 0.0006318915
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D125
[1] 0.003034465
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D126
[1] -0.001163756
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D127
[1] 0.0002108626
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D128
[1] 0.001748003
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D129
[1] -0.001222927
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D130
[1] 0.002859666
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D131
[1] -0.0003125823
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D132
[1] 0.001955351
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D133
[1] -0.0008207093
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D134
[1] -0.0004334724
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D135
[1] 0.003669641
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D137
[1] 0.0008815924
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D138
[1] 0.000361766
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D139
[1] -0.0001376385
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D140
[1] -0.001100133
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D142
[1] -0.001103152
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D144
[1] -0.000491994
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D145
[1] -0.0003688567
----------------------------------------------------------------------------------------------------------------------------------------- 
data$ID: D146
[1] -0.002038851
summary(slope)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-0.0022740 -0.0009604  0.0001354  0.0001824  0.0009377  0.0036700 
stem(slope, scale=2)

  The decimal point is 3 digit(s) to the left of the |

  -2 | 30
  -1 | 96
  -1 | 32211
  -0 | 875
  -0 | 44310
   0 | 124
   0 | 666889
   1 | 01
   1 | 567
   2 | 0
   2 | 9
   3 | 0
   3 | 7
hist(slope)

Note that these are all extremely close to zero - probably because of the lack of time effect.

R-square for OLS models

rsq <- by(data, data$ID, function(x) summary(lm(bias_t ~ trialnum, data=x))$r.squared)
summary(rsq)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
9.960e-07 4.724e-04 1.829e-03 4.105e-03 5.291e-03 2.356e-02 
stem(rsq, scale=2)

  The decimal point is 3 digit(s) to the left of the |

   0 | 000122334668999908
   2 | 014929
   4 | 024
   6 | 50
   8 | 14
  10 | 7
  12 | 
  14 | 
  16 | 77
  18 | 
  20 | 
  22 | 6
cor(int, slope)
[1] -0.475133

The correlation between intercepts and slopes is -0.475, so people who show less avoidance bias overall change less over the course of the task.

Graphing multilevel data for exploratory purposes

library(lattice)
xyplot(bias_t ~ trialnum | ID, data = data, as.table=T) # too many people/hard to see, try random subset

ids <- sample(unique(data$ID), 12)
ids
 [1] "D125" "D128" "D145" "D114" "D115" "D129" "D117" "D131" "D123" "D134" "D132" "D118"
temp <- data[data$ID %in% ids, ]
xyplot(bias_t ~ trialnum | ID, data = temp, as.table=T) 

# add fit line(s)
xyplot(bias_t ~ trialnum | ID, data = temp, type=c("p","r")) # linear fit

xyplot(bias_t ~ trialnum | ID, data = temp, type=c("p","smooth")) # curvilinear fit

# graph of the raw data for each person
interaction.plot(data$trialnum, data$ID, data$bias_t)

# generate linear fitted values for each person and then plot all together in one plot
fit <- by(data, data$ID, function(x) fitted.values(lm(bias_t ~ trialnum, data=x)))
fit2 <- unlist(fit)
interaction.plot(data$trialnum, data$ID, fit2, xlab="trial number", ylab="RT bias (transformed)")

There does not seem to be either a linear or curvilinear effect of trial number (i.e., time) on RT bias in the task.

Analyses (NHST)

Remember, the effect of time refers to trialnumber (this is NOT the time variable; time refers to time of day when task was run.)

Unconditional means model

library(nlme)
# unconditional means model
m1 <- lme(bias_t ~ 1, random = ~ 1 | ID, data=data, method="ML")
summary(m1)
Linear mixed-effects model fit by maximum likelihood
 Data: data 
       AIC      BIC    logLik
  26249.82 26271.23 -13121.91

Random effects:
 Formula: ~1 | ID
        (Intercept)  Residual
StdDev:   0.1244563 0.9906337

Fixed effects: bias_t ~ 1 
                   Value  Std.Error   DF     t-value p-value
(Intercept) -0.001588482 0.02341911 9254 -0.06782845  0.9459

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.545388737 -0.657743071  0.002789368  0.676201102  3.889482930 

Number of Observations: 9289
Number of Groups: 35 
# ICC
iVar <- as.numeric(VarCorr(m1)[1,1])
residVar <- as.numeric(VarCorr(m1)[2,1])
icc1 <- iVar/(iVar+residVar) 
icc1
[1] 0.0155384

The overall mean bias score (transformed) is about -0.002, with a standard deviation of 0.023.

ICC is about 1.6% of the variance. This means that 1.6% of the variance in bias is explained by people being different from one another.

Add fixed and random intercept and slope of time

# model with fixed and random intercept and slope of time (intercepts implied)
m2 <- lme(bias_t ~ trialnum, data = data, random = ~ trialnum | ID, method="ML")
summary(m2) 
Linear mixed-effects model fit by maximum likelihood
 Data: data 
       AIC      BIC    logLik
  26254.44 26297.26 -13121.22

Random effects:
 Formula: ~trialnum | ID
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev      Corr  
(Intercept) 0.112903390 (Intr)
trialnum    0.000529973 0.112 
Residual    0.990386738       

Fixed effects: bias_t ~ trialnum 
                   Value   Std.Error   DF    t-value p-value
(Intercept) -0.015938869 0.028255586 9253 -0.5640962  0.5727
trialnum     0.000196016 0.000263655 9253  0.7434570  0.4572
 Correlation: 
         (Intr)
trialnum -0.577

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.540840497 -0.657596342  0.001668977  0.677196004  3.894566704 

Number of Observations: 9289
Number of Groups: 35 
anova(m2)
            numDF denDF   F-value p-value
(Intercept)     1  9253 0.0272704  0.8688
trialnum        1  9253 0.5527283  0.4572

The average bias at trial number = 0 is -0.016 and gets increasingly approach-biased by a nonsignificant fraction of a percent with each unit of time. There is no significant effect of trial number (time) on RT bias score, t = 0.74, p = 0.457.

# ICC
iVar <- as.numeric(VarCorr(m2)[1,1])
residVar <- as.numeric(VarCorr(m2)[2,1])
icc2 <- iVar/(iVar+residVar)
icc2
[1] 0.999978

99.99% of variance in bias is due to the between-person variance, after controlling for time. This is consistent with the lack of a significant effect of time. However… I have no clue how this makes sense with the ICC of 0.016 from the unconditional means model above, which suggests that <2% of the variance is due to between-person differences. ZERO clue. Atina tried to help me think through this for 20 minutes and I still don’t get it.

Adding stimulus as a fixed effect

There were 5 different types of stimuli, presented in a randomized order (consistent across participants/sessions) determined by genetic algorithm. Everyone saw the same stimuli in the same order, so it is a fixed effect. The different categories of images were: Death (generic), Deceased (personal photo), Living loved one (personal photo), Stranger (sex-matched to spouse), and Neutral (objects).

# add the fixed effect of stim and compare its fit to m2 by inspecting the t and f tests
m3 <- lme(bias_t ~ trialnum + stim, data = data, random = ~ trialnum | ID, method="ML")
summary(m3)
Linear mixed-effects model fit by maximum likelihood
 Data: data 
      AIC      BIC    logLik
  26222.9 26294.27 -13101.45

Random effects:
 Formula: ~trialnum | ID
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 0.1132056534 (Intr)
trialnum    0.0005257672 0.108 
Residual    0.9882755468       

Fixed effects: bias_t ~ trialnum + stim 
                   Value  Std.Error   DF   t-value p-value
(Intercept)  -0.01666628 0.03734155 9249 -0.446320  0.6554
trialnum      0.00006687 0.00026759 9249  0.249892  0.8027
stimdeath    -0.05312693 0.03423609 9249 -1.551782  0.1207
stimliving    0.04449279 0.03262355 9249  1.363824  0.1727
stimspouse    0.11774054 0.03318866 9249  3.547614  0.0004
stimstranger -0.05577654 0.03227777 9249 -1.728017  0.0840
 Correlation: 
             (Intr) trilnm stmdth stmlvn stmsps
trialnum     -0.533                            
stimdeath    -0.537  0.171                     
stimliving   -0.529  0.123  0.517              
stimspouse   -0.484  0.063  0.497  0.518       
stimstranger -0.534  0.123  0.523  0.541  0.523

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.539701896 -0.662289657  0.002436611  0.671269781  3.967555161 

Number of Observations: 9289
Number of Groups: 35 

NO effect of time. There is a significant fixed effect of spouse, t = 3.55, p = .0004. There is also a non-significant effect of stranger, t = -1.73, p = .084. This indicates that people tended to show more approach bias to photos of their spouse (across conditions). The parameter estimate is 0.12 with an SE of 0.03. They also tended to show greater more avoidance bias to photos of the sex-matched stranger.

intervals(m3, which="fixed")
Approximate 95% confidence intervals

 Fixed effects:
                     lower          est.       upper
(Intercept)  -0.0898403107 -1.666628e-02 0.056507748
trialnum     -0.0004574916  6.686771e-05 0.000591227
stimdeath    -0.1202155468 -5.312693e-02 0.013961677
stimliving   -0.0194359045  4.449279e-02 0.108421484
stimspouse    0.0527044697  1.177405e-01 0.182776618
stimstranger -0.1190276408 -5.577654e-02 0.007474562
attr(,"label")
[1] "Fixed effects:"

95% CIs don’t cross zero for spouse, but do for all other stimulus types.

iVar <- as.numeric(VarCorr(m3)[1,1])
residVar <- as.numeric(VarCorr(m3)[2,1])
icc3 <- iVar/(iVar+residVar)
icc3 # Same ICC as m2
[1] 0.9999784

The ICC is identical to that in model 2.

anova(m2, m3) # compare this model with the first one
   Model df      AIC      BIC    logLik   Test  L.Ratio p-value
m2     1  6 26254.44 26297.26 -13121.22                        
m3     2 10 26222.90 26294.27 -13101.45 1 vs 2 39.53877  <.0001

Comparing the model with the fixed effect of stimulus (m3) to the model without (m2), m3 is better (p < .0001) as evidenced by lower AIC, BIC, and lL values.

Take out fixed and random effect of time

Because there’s no significant effect of time, makes sense to drop it from the model. Now we’re basically doing regression but accounting for whatever interdependence exists. Note that these results will NOT match the results of the regressions, because the regressions use medians of the bias scores, rather than means.

# without fixed or random effect of time
m4 <- lme(bias_t ~ stim, data = data, random = ~ 1 | ID, method="ML")
summary(m4)
Linear mixed-effects model fit by maximum likelihood
 Data: data 
       AIC      BIC    logLik
  26217.69 26267.65 -13101.85

Random effects:
 Formula: ~1 | ID
        (Intercept)  Residual
StdDev:   0.1244692 0.9884897

Fixed effects: bias_t ~ stim 
                   Value  Std.Error   DF   t-value p-value
(Intercept)  -0.01079980 0.03153608 9250 -0.342458  0.7320
stimdeath    -0.05495775 0.03367448 9250 -1.632029  0.1027
stimliving    0.04329407 0.03234999 9250  1.338302  0.1808
stimspouse    0.11715733 0.03312053 9250  3.537302  0.0004
stimstranger -0.05692371 0.03200718 9250 -1.778467  0.0754
 Correlation: 
             (Intr) stmdth stmlvn stmsps
stimdeath    -0.519                     
stimliving   -0.541  0.506              
stimspouse   -0.528  0.495  0.515       
stimstranger -0.546  0.512  0.533  0.520

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.543324753 -0.662107099  0.003549832  0.673528310  3.963270484 

Number of Observations: 9289
Number of Groups: 35 

Fixed effects are still more or less the same - significant for spouse, marginal for stranger - although the parameter estimates and test statistics are slightly different.

intervals(m4, which="fixed")
Approximate 95% confidence intervals

 Fixed effects:
                   lower        est.       upper
(Intercept)  -0.07260082 -0.01079980 0.051001227
stimdeath    -0.12094940 -0.05495775 0.011033896
stimliving   -0.02010198  0.04329407 0.106690123
stimspouse    0.05225126  0.11715733 0.182063398
stimstranger -0.11964795 -0.05692371 0.005800537
attr(,"label")
[1] "Fixed effects:"

As above, spouse is the only stimulus category whose 95 CIs don’t cross zero, supporting a significant effect of spouse on bias.

Is m4 any better than m3?

anova(m3,m4)
   Model df      AIC      BIC    logLik   Test   L.Ratio p-value
m3     1 10 26222.90 26294.27 -13101.45                         
m4     2  7 26217.69 26267.65 -13101.85 1 vs 2 0.7920774  0.8514

Nope. Models do not significantly differ (p = 0.851), though the AIC and BIC values are a little smaller for m4 than m3. So let’s forget time and go with m4…

Adding condition as a fixed effect

Participants completed the task at two different sessions, where they were randomized to receive treatment A at one and treatment B at the other (order counterbalanced across participants). The two treatments are oxytocin vs. placebo.

# add the interaction of stim and check its fit by inspecting t and f tests. 
m5 <- lme(bias_t ~ stim + cond, data = data, random = ~ 1 | ID, method="ML")
summary(m5)
Linear mixed-effects model fit by maximum likelihood
 Data: data 
       AIC      BIC    logLik
  26219.67 26276.77 -13101.84

Random effects:
 Formula: ~1 | ID
        (Intercept)  Residual
StdDev:   0.1244747 0.9884885

Fixed effects: bias_t ~ stim + cond 
                   Value  Std.Error   DF   t-value p-value
(Intercept)  -0.01230112 0.03317411 9249 -0.370805  0.7108
stimdeath    -0.05494483 0.03367637 9249 -1.631554  0.1028
stimliving    0.04326388 0.03235236 9249  1.337271  0.1812
stimspouse    0.11715898 0.03312228 9249  3.537166  0.0004
stimstranger -0.05694662 0.03200925 9249 -1.779068  0.0753
condB         0.00299612 0.02053218 9249  0.145923  0.8840
 Correlation: 
             (Intr) stmdth stmlvn stmsps stmstr
stimdeath    -0.495                            
stimliving   -0.512  0.506                     
stimspouse   -0.502  0.494  0.515              
stimstranger -0.518  0.512  0.533  0.520       
condB        -0.310  0.003 -0.006  0.000 -0.005

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.544942380 -0.661684418  0.005083279  0.672392393  3.961649422 

Number of Observations: 9289
Number of Groups: 35 

The fixed effect of condition B is not significant, t = 0.15, p = 0.884.

anova(m4, m5)
   Model df      AIC      BIC    logLik   Test    L.Ratio p-value
m4     1  7 26217.69 26267.65 -13101.85                          
m5     2  8 26219.67 26276.76 -13101.84 1 vs 2 0.02130728  0.8839

The two models are not significantly different, but the one with the fixed effect of condition (m5) has slightly higher AIC/BIC.

Adding the stimulus*condition interaction

We might expect that peoples’ tendency to approach or avoid might differ depending on which treatment they got, but only for certain stimuli.

# add the interaction of stim and check its fit by inspecting t and f tests. 
m6 <- lme(bias_t ~ stim*cond, data = data, random = ~ 1 | ID, method="ML")
summary(m6)
Linear mixed-effects model fit by maximum likelihood
 Data: data 
       AIC   BIC    logLik
  26222.36 26308 -13099.18

Random effects:
 Formula: ~1 | ID
        (Intercept)  Residual
StdDev:    0.124494 0.9882046

Fixed effects: bias_t ~ stim * cond 
                         Value  Std.Error   DF    t-value p-value
(Intercept)         0.01640033 0.03934944 9245  0.4167868  0.6768
stimdeath          -0.11111299 0.04756084 9245 -2.3362283  0.0195
stimliving          0.04131019 0.04600610 9245  0.8979286  0.3692
stimspouse          0.06476556 0.04686377 9245  1.3819963  0.1670
stimstranger       -0.09233079 0.04545796 9245 -2.0311251  0.0423
condB              -0.05431038 0.04696981 9245 -1.1562826  0.2476
stimdeath:condB     0.11261285 0.06734696 9245  1.6721298  0.0945
stimliving:condB    0.00494200 0.06470579 9245  0.0763764  0.9391
stimspouse:condB    0.10467698 0.06624022 9245  1.5802632  0.1141
stimstranger:condB  0.07047422 0.06401636 9245  1.1008783  0.2710
 Correlation: 
                   (Intr) stmdth stmlvn stmsps stmstr condB  stmd:B stml:B stmsp:B
stimdeath          -0.590                                                         
stimliving         -0.610  0.505                                                  
stimspouse         -0.599  0.496  0.513                                           
stimstranger       -0.618  0.511  0.528  0.519                                    
condB              -0.598  0.495  0.511  0.502  0.517                             
stimdeath:condB     0.417 -0.706 -0.357 -0.350 -0.361 -0.697                      
stimliving:condB    0.434 -0.359 -0.711 -0.364 -0.376 -0.726  0.506               
stimspouse:condB    0.424 -0.351 -0.363 -0.707 -0.367 -0.709  0.495  0.515        
stimstranger:condB  0.439 -0.363 -0.375 -0.368 -0.710 -0.734  0.512  0.533  0.520 

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.517164674 -0.660296590  0.006315667  0.672822601  3.934464966 

Number of Observations: 9289
Number of Groups: 35 

Interesting. With the interaction in the model, there is a non-significant interaction between death and condition B.

anova(m6,m4,m5)
   Model df      AIC      BIC    logLik   Test  L.Ratio p-value
m6     1 12 26222.36 26308.00 -13099.18                        
m4     2  7 26217.69 26267.65 -13101.85 1 vs 2 5.332051  0.3767
m5     3  8 26219.67 26276.76 -13101.84 2 vs 3 0.021307  0.8839
intervals(m6)
Approximate 95% confidence intervals

 Fixed effects:
                         lower         est.       upper
(Intercept)        -0.06069172  0.016400326  0.09349237
stimdeath          -0.20429254 -0.111112992 -0.01793344
stimliving         -0.04882335  0.041310191  0.13144373
stimspouse         -0.02704831  0.064765560  0.15657943
stimstranger       -0.18139044 -0.092330795 -0.00327115
condB              -0.14633201 -0.054310379  0.03771125
stimdeath:condB    -0.01933096  0.112612852  0.24455667
stimliving:condB   -0.12182734  0.004941997  0.13171134
stimspouse:condB   -0.02509855  0.104676978  0.23445251
stimstranger:condB -0.05494441  0.070474223  0.19589286
attr(,"label")
[1] "Fixed effects:"

 Random Effects:
  Level: ID 
                    lower     est.     upper
sd((Intercept)) 0.0931069 0.124494 0.1664618

 Within-group standard error:
    lower      est.     upper 
0.9740020 0.9882046 1.0026143 

The model with the interaction in it is not significantly better than either the one with just the fixed effect of condition or with no effect of condition at all, so makes sense to go with the simplest model which only includes stimulus as a fixed effect.

Adding other plausible predictors

There are a few other variables that we might expect to impact approach/avoidance bias, namely grief severity, OXTR genotype, time since death, and attachment style (avoidant, anxious).

# used group mean centered scores for the additional variables for interpretability
m7 <- lme(bias_t ~ stim + rs53576_A.carrier + rs2268498_C.carrier + gmean_ecrrs_global_anx + gmean_ecrrs_global_avoid + gmean_age + gmean_icg, data = data, random = ~ 1 | ID, method="ML")
summary(m7)
Linear mixed-effects model fit by maximum likelihood
 Data: data 
       AIC      BIC    logLik
  26227.59 26320.36 -13100.79

Random effects:
 Formula: ~1 | ID
        (Intercept)  Residual
StdDev:   0.1198702 0.9884902

Fixed effects: bias_t ~ stim + rs53576_A.carrier + rs2268498_C.carrier + gmean_ecrrs_global_anx +      gmean_ecrrs_global_avoid + gmean_age + gmean_icg 
                               Value  Std.Error   DF   t-value p-value
(Intercept)              -0.01203754 0.03405637 9250 -0.353459  0.7238
stimdeath                -0.05487972 0.03368552 9250 -1.629178  0.1033
stimliving                0.04330158 0.03236055 9250  1.338098  0.1809
stimspouse                0.11722059 0.03313178 9250  3.538010  0.0004
stimstranger             -0.05691502 0.03201784 9250 -1.777603  0.0755
rs53576_A.carrier.L       0.00525467 0.04347813   28  0.120858  0.9047
rs2268498_C.carrier.L     0.00193315 0.04822525   28  0.040086  0.9683
gmean_ecrrs_global_anx   -0.01559904 0.01800716   28 -0.866269  0.3937
gmean_ecrrs_global_avoid -0.00361422 0.02238085   28 -0.161487  0.8729
gmean_age                 0.00000470 0.00001033   28  0.455159  0.6525
gmean_icg                -0.00073388 0.00229617   28 -0.319611  0.7516
 Correlation: 
                         (Intr) stmdth stmlvn stmsps stmstr r53576 r22684 gmn_crrs_glbl_n gmn_crrs_glbl_v gmen_g
stimdeath                -0.481                                                                                 
stimliving               -0.500  0.506                                                                          
stimspouse               -0.489  0.495  0.515                                                                   
stimstranger             -0.506  0.512  0.533  0.520                                                            
rs53576_A.carrier.L       0.248  0.001  0.000 -0.002 -0.001                                                     
rs2268498_C.carrier.L    -0.406  0.000  0.000  0.001  0.001 -0.646                                              
gmean_ecrrs_global_anx    0.017  0.000  0.000  0.000  0.000 -0.143  0.054                                       
gmean_ecrrs_global_avoid -0.078 -0.002 -0.002 -0.001 -0.001 -0.111  0.120 -0.049                                
gmean_age                -0.026  0.002  0.002  0.004  0.003 -0.187  0.171  0.304           0.142                
gmean_icg                -0.021  0.000  0.001  0.002  0.003  0.150  0.055 -0.468          -0.179          -0.263

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max 
-3.540273271 -0.660509033  0.004325344  0.672841199  3.966239659 

Number of Observations: 9289
Number of Groups: 35 
anova(m4,m7)
   Model df      AIC      BIC    logLik   Test  L.Ratio p-value
m4     1  7 26217.69 26267.65 -13101.85                        
m7     2 13 26227.59 26320.37 -13100.80 1 vs 2 2.104686  0.9098

None of the added fixed effects are significant and the AIC/BIC values are higher, so take ’em out.

Final model

# variance explained by the entire model 
fitted <- fitted(m4)
data$fitted <- as.vector(fitted)
rsq <- lm(bias ~ fitted, data=data) 
summary(rsq)

Call:
lm(formula = bias ~ fitted, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1170.66  -128.74    -2.77   127.93  1115.32 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   11.850      2.532    4.68 2.91e-06 ***
fitted       272.525     19.581   13.92  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 244 on 9287 degrees of freedom
Multiple R-squared:  0.02043,   Adjusted R-squared:  0.02033 
F-statistic: 193.7 on 1 and 9287 DF,  p-value: < 2.2e-16

A case of significant, but not meaningful: The R2 for the entire model is .02, indicating that 2% of the variance in bias is explained by the model. So this model is not very good at predicting approach/avoidance bias from the fixed effect of stimulus within person. Oh well.

Check final model assumptions

Checking normality of the residuals:

# checking normality of residuals
# level 1:
qqnorm(m4, ~resid(.), id=0.00075)

plot(m4, resid(., type="p") ~ trialnum, id=0.0025)

The QQ plot looks pretty okay, albeit a few outliers. D113 is the person who most falls off the line, and does so at both ends. The residuals are also pretty evenly distributed across time, though many fall outside of the +/- 2 range.

# the following bit wants IDs to be numeric, so take out the "D" from each ID
data <- data %>% mutate(numID = as.numeric(str_replace(ID, "D", "")))
# level 2:
qqnorm(m4, ~ranef(.), id=0.25)

ranInt <- ranef(m4)[1] # create object with the random intercept estimates
ranInt <- unlist(ranInt) 
names(ranInt) <- NULL 
ID <- unlist(subset(data, !duplicated(data$numID), select="numID")) # create an ID variable
plot(ranInt ~ ID)

ranSlope <- ranef(m4)[2] # create an object with the random slopes
Error in `[.data.frame`(ranef(m4), 2) : undefined columns selected

Getting an error here Error in [.data.frame(ranef(m4), 72) : undefined columns selected, which I got stuck for way too long on and am leaving alone for now.

Checking homoscedasticity:

# checking homoscedasticity
# level 1:
data$resid <- as.numeric(residuals(m4)) # create an object with the Level-1 residuals
plot(data$resid ~ data$stim)

# level 2:
library(lattice)
stim <- unlist(subset(data, !duplicated(data$numID), select="stim"))
Unknown or uninitialised column: 'numID'.Length of logical index must be 1 or 9289, not 0
xyplot(ranInt ~ stim)

xyplot(ranSlope ~ stim)
Error in eval(expr, envir, enclos) : object 'ranSlope' not found

Note that again there are a lot of residuals falling outside of the +/-2 range.

Also, there’s something weird about the L2 plot. Not sure why it shows only 3 of 5 stimulus types (edit: now not showing anything).

The error message is because the ranSlope object does not exist (see error message in output from previous chunk.)

OT RT plots

Long dataset for regressions

More data cleaning

Continuing from the data cleaning above…

library(psych)
library(tidyverse)
── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.4
✔ tibble  1.4.2     ✔ dplyr   0.7.4
✔ tidyr   0.8.1     ✔ stringr 1.2.0
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ ggplot2::%+%()       masks psych::%+%()
✖ ggplot2::alpha()     masks psych::alpha()
✖ .GlobalEnv::filter() masks dplyr::filter(), stats::filter()
✖ dplyr::lag()         masks stats::lag()
filter <- dplyr::filter
select <- dplyr::select
mlm_r1r2_long <- bind_rows(mlm_r1.2, mlm_r2.2)
head(mlm_r1r2_long) ## 22464 observations
# get rid of any trials with outliers
mlm_long <- mlm_r1r2_long %>% filter(outlier_RT == "0") 
count(mlm_r1r2.1) # 22007 obs.
22007/22464 # 98% of trials retained = 2% of trials dropped.
[1] 0.9796563
mlm_long1 <- left_join(mlm_long, mlm_add, by="ID") # add the self-report variables
head(mlm_long1)
gmeans <- select(mlm_bx5.1, starts_with("gmean")) # add grand mean centered variables from mlm_bx5.1 object
ID <- select(mlm_bx5.1, starts_with("ID")) 
ID$ID1 <- NULL
gmeans_ID <- bind_cols(ID, gmeans) %>% distinct(gmeans_ID, ID, .keep_all = TRUE)
mlm_long2 <- left_join(mlm_long1, gmeans_ID, by="ID")
library(bestNormalize)
describe(mlm_long1$gAAT_RT) # RT data has skewness of 1.17 and kurtosis of 1.7. Min is 464 and max is 1712, as we filtered out trials above and below the 99th and 1st percentiles already
   vars     n  mean    sd median trimmed    mad min  max range skew kurtosis   se
X1    1 22007 788.6 205.1    747  765.68 179.39 464 1712  1248 1.17      1.7 1.38
bestNormalize(mlm_long2$gAAT_RT)
exp_x  did not work;  Error in exp_x(standardize = TRUE, warn = TRUE, x = c(1139, 980, 1235,  : 
  Transformation finite for less than 3 x values
Ties in data, Normal distribution not guaranteed

|==                                                                                                                           |  2% ~9 s remaining     
|=====                                                                                                                        |  4% ~10 s remaining    
|=======                                                                                                                      |  6% ~9 s remaining     
|==========                                                                                                                   |  8% ~8 s remaining     
|============                                                                                                                 | 10% ~8 s remaining     
|===============                                                                                                              | 12% ~8 s remaining     
|=================                                                                                                            | 14% ~8 s remaining     
|====================                                                                                                         | 16% ~9 s remaining     
|======================                                                                                                       | 18% ~8 s remaining     
|=========================                                                                                                    | 20% ~8 s remaining     
|===========================                                                                                                  | 22% ~7 s remaining     
|==============================                                                                                               | 24% ~7 s remaining     
|================================                                                                                             | 26% ~7 s remaining     
|===================================                                                                                          | 28% ~7 s remaining     
|=====================================                                                                                        | 30% ~6 s remaining     
|========================================                                                                                     | 32% ~6 s remaining     
|==========================================                                                                                   | 34% ~6 s remaining     
|=============================================                                                                                | 36% ~6 s remaining     
|===============================================                                                                              | 38% ~5 s remaining     
|==================================================                                                                           | 40% ~5 s remaining     
|====================================================                                                                         | 42% ~5 s remaining     
|=======================================================                                                                      | 44% ~5 s remaining     
|=========================================================                                                                    | 46% ~5 s remaining     
|============================================================                                                                 | 48% ~4 s remaining     
|==============================================================                                                               | 50% ~4 s remaining     
|=================================================================                                                            | 52% ~4 s remaining     
|===================================================================                                                          | 54% ~4 s remaining     
|======================================================================                                                       | 56% ~4 s remaining     
|========================================================================                                                     | 58% ~4 s remaining     
|===========================================================================                                                  | 60% ~3 s remaining     
|=============================================================================                                                | 62% ~3 s remaining     
|================================================================================                                             | 64% ~3 s remaining     
|==================================================================================                                           | 66% ~3 s remaining     
|=====================================================================================                                        | 68% ~3 s remaining     
|=======================================================================================                                      | 70% ~2 s remaining     
|==========================================================================================                                   | 72% ~2 s remaining     
|============================================================================================                                 | 74% ~2 s remaining     
|===============================================================================================                              | 76% ~2 s remaining     
|=================================================================================================                            | 78% ~2 s remaining     
|====================================================================================================                         | 80% ~2 s remaining     
|======================================================================================================                       | 82% ~1 s remaining     
|=========================================================================================================                    | 84% ~1 s remaining     
|===========================================================================================================                  | 86% ~1 s remaining     
|==============================================================================================================               | 88% ~1 s remaining     
|================================================================================================================             | 90% ~1 s remaining     
|===================================================================================================================          | 92% ~1 s remaining     
|=====================================================================================================================        | 94% ~1 s remaining     
|========================================================================================================================     | 96% ~0 s remaining     
|==========================================================================================================================   | 98% ~0 s remaining     
|=============================================================================================================================|100% ~0 s remaining     
Best Normalizing transformation with 22007 Observations
 Estimated Normality Statistics (Pearson P / df, lower => more normal):
 - No transform: 10.717 
 - Box-Cox: 1.3513 
 - Log_b(x+a): 3.5498 
 - sqrt(x+a): 6.4216 
 - arcsinh(x): 3.5498 
 - Yeo-Johnson: 1.3516 
 - orderNorm: 1.1981 
Estimation method: Out-of-sample via CV with 10 folds and 5 repeats
 
Based off these, bestNormalize chose:
orderNorm Transformation with 22007 nonmissing obs and ties
 - 1111 unique values 
 - Original quantiles:
  0%  25%  50%  75% 100% 
 464  640  747  892 1712 
gAAT_RT_t <- bestNormalize(mlm_long2$gAAT_RT)  # transform the RT data
exp_x  did not work;  Error in exp_x(standardize = TRUE, warn = TRUE, x = c(1139, 980, 1235,  : 
  Transformation finite for less than 3 x values
Ties in data, Normal distribution not guaranteed

|==                                                                                                                           |  2% ~8 s remaining     
|=====                                                                                                                        |  4% ~7 s remaining     
|=======                                                                                                                      |  6% ~7 s remaining     
|==========                                                                                                                   |  8% ~7 s remaining     
|============                                                                                                                 | 10% ~7 s remaining     
|===============                                                                                                              | 12% ~7 s remaining     
|=================                                                                                                            | 14% ~7 s remaining     
|====================                                                                                                         | 16% ~7 s remaining     
|======================                                                                                                       | 18% ~7 s remaining     
|=========================                                                                                                    | 20% ~6 s remaining     
|===========================                                                                                                  | 22% ~6 s remaining     
|==============================                                                                                               | 24% ~6 s remaining     
|================================                                                                                             | 26% ~6 s remaining     
|===================================                                                                                          | 28% ~6 s remaining     
|=====================================                                                                                        | 30% ~6 s remaining     
|========================================                                                                                     | 32% ~5 s remaining     
|==========================================                                                                                   | 34% ~5 s remaining     
|=============================================                                                                                | 36% ~5 s remaining     
|===============================================                                                                              | 38% ~5 s remaining     
|==================================================                                                                           | 40% ~5 s remaining     
|====================================================                                                                         | 42% ~5 s remaining     
|=======================================================                                                                      | 44% ~5 s remaining     
|=========================================================                                                                    | 46% ~5 s remaining     
|============================================================                                                                 | 48% ~4 s remaining     
|==============================================================                                                               | 50% ~4 s remaining     
|=================================================================                                                            | 52% ~4 s remaining     
|===================================================================                                                          | 54% ~4 s remaining     
|======================================================================                                                       | 56% ~4 s remaining     
|========================================================================                                                     | 58% ~3 s remaining     
|===========================================================================                                                  | 60% ~3 s remaining     
|=============================================================================                                                | 62% ~3 s remaining     
|================================================================================                                             | 64% ~3 s remaining     
|==================================================================================                                           | 66% ~3 s remaining     
|=====================================================================================                                        | 68% ~3 s remaining     
|=======================================================================================                                      | 70% ~2 s remaining     
|==========================================================================================                                   | 72% ~2 s remaining     
|============================================================================================                                 | 74% ~2 s remaining     
|===============================================================================================                              | 76% ~2 s remaining     
|=================================================================================================                            | 78% ~2 s remaining     
|====================================================================================================                         | 80% ~2 s remaining     
|======================================================================================================                       | 82% ~1 s remaining     
|=========================================================================================================                    | 84% ~1 s remaining     
|===========================================================================================================                  | 86% ~1 s remaining     
|==============================================================================================================               | 88% ~1 s remaining     
|================================================================================================================             | 90% ~1 s remaining     
|===================================================================================================================          | 92% ~1 s remaining     
|=====================================================================================================================        | 94% ~0 s remaining     
|========================================================================================================================     | 96% ~0 s remaining     
|==========================================================================================================================   | 98% ~0 s remaining     
|=============================================================================================================================|100% ~0 s remaining     
mlm_long2$gAAT_RT_t <- gAAT_RT_t$x.t # make a new variable for the transformed values and attach it to the long dataset
hist(mlm_long2$gAAT_RT) # compare histograms for the transformed and non-transformed values

hist(mlm_long2$gAAT_RT_t) 

qqnorm(mlm_long2$gAAT_RT) #compare QQ plots for the transformed and non-transformed values

qqnorm(mlm_long2$gAAT_RT_t)

mlm_long2 %>% group_by(ID) %>% count(cond) # look at how many trials each person has for each condition (A/B)
# splitting up by instruction
mlm_long2_push <- filter(mlm_long2, grepl("PUSH",push_pull))
head(mlm_long2_push)
mlm_long2_pull <- filter(mlm_long2, grepl("PULL",push_pull))
head(mlm_long2_push)

Transforming really improved the RT data. Everyone looks like they have a good number of trials (out of 288; 288 = 144 trials*2 runs).

Bias scores (based on medians)

library(tidyverse)
# get the OXTR and self-report scores
# load in master dataset
master <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/selfreports_oxtr_cortisol_random_AB_bx_clean.rds") 
# subset the selected variables
subjvars <- subset(master, select=c(ID,age, age_yrs, sex_m, ethnicity_hisp, race, timesincedeath, rs2254298, rs2268498, rs53576, rs2254298_A.carrier, rs2268498_C.carrier, rs53576_A.carrier, tot_bdi, tot_bgq, tot_ecrrs_global_anx,tot_ecrrs_global_avoid, tot_ecrrs_spouse_anx, tot_ecrrs_spouse_avoid, tot_ysl, tot_icg))
head(subjvars)
# add a group variable for CG/NCG
subjvars <- subjvars %>%
  mutate(grp_cg = as.factor(ifelse(tot_icg >= 25, "CG", "NCG"))) 
# all of the OXTR vars should be factors but aren't
subjvars <- subjvars %>% mutate(rs2268498 = as.factor(rs2268498),
                        rs53576 = as.factor(rs53576))
# creating a wide dataset with the bias variables, which will then be converted into a long dataset
# The arguments to spread():
# - data: Data object
# - key: Name of column containing the new column names
# - value: Name of column containing values
data_wide <- spread(mlm_r1r2_long, key=stim, value=gAAT_RT)
colnames(data_wide)
 [1] "date"             "time"             "ID"               "blockcode"        "blocknum"         "trialcode"        "values.trialcode" "values.stimulus"  "push_pull"        "trialnum"         "run"             
[12] "visit"            "cond_v1"          "cond_v2"          "cond"             "outlier_RT"       "neutral"          "death"            "living"           "spouse"           "stranger"        
View(data_wide)
# subset and remove outliers
data_pushA <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "A" & push_pull == "PUSH") %>% group_by(ID) %>% mutate(neutralApush = median(neutral, na.rm = TRUE), deathApush = median(death, na.rm = TRUE), spouseApush = median(spouse, na.rm=TRUE), livingApush = median(living, na.rm=TRUE), strangerApush = median(stranger, na.rm=TRUE))
data_pullA <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "A" & push_pull == "PULL") %>% group_by(ID) %>% mutate(neutralApull = median(neutral, na.rm = TRUE), deathApull = median(death, na.rm = TRUE), spouseApull = median(spouse, na.rm=TRUE), livingApull = median(living, na.rm=TRUE), strangerApull = median(stranger, na.rm=TRUE))
data_pushB <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "B" & push_pull == "PUSH") %>% group_by(ID) %>% mutate(neutralBpush = median(neutral, na.rm = TRUE), deathBpush = median(death, na.rm = TRUE), spouseBpush = median(spouse, na.rm=TRUE), livingBpush = median(living, na.rm=TRUE), strangerBpush = median(stranger, na.rm=TRUE))
data_pullB <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "B" & push_pull == "PULL") %>% group_by(ID) %>% mutate(neutralBpull = median(neutral, na.rm = TRUE), deathBpull = median(death, na.rm = TRUE), spouseBpull = median(spouse, na.rm=TRUE), livingBpull = median(living, na.rm=TRUE), strangerBpull = median(stranger, na.rm=TRUE))
# summarize by ID
data_pushA1 <- data_pushA[!duplicated(data_pushA$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
data_pushB1 <- data_pushB[!duplicated(data_pushB$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
data_pullA1 <- data_pullA[!duplicated(data_pullA$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
data_pullB1 <- data_pullB[!duplicated(data_pullB$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
head(data_pullB1)
# merge the 4 subsets together
join1 <- left_join(data_pushA1, data_pushB1, by="ID")
join2 <- left_join(join1, data_pullA1, by = "ID")
medians <- left_join(join2, data_pullB1, by = "ID")
head(medians)
medians1 <- medians %>% mutate(bias_neu_A = neutralApush - neutralApull,
                                     bias_dea_A = deathApush - deathApull,
                                     bias_spo_A = spouseApush - spouseApull,
                                     bias_str_A = strangerApush - strangerApull,
                                     bias_liv_A = livingApush - livingApull,
                                     bias_neu_B = neutralBpush - neutralBpull,
                                     bias_dea_B = deathBpush - deathApull,
                                     bias_spo_B = spouseBpush - spouseBpull,
                                     bias_str_B = strangerBpush - strangerBpull,
                                     bias_liv_B = livingBpush - livingBpull
                                     )
# now add the grp_cg/oxtr/self-report variables
medians2 <- left_join(medians1, subjvars, by="ID")
head(medians2)
# drop the condition/stim specific variables
medians2 <- medians2 %>% select(-c(neutralApush, deathApush,  spouseApush,  livingApush, strangerApush, neutralBpush, deathBpush,   spouseBpush,  livingBpush,  strangerBpush, neutralApull, deathApull, spouseApull, livingApull, strangerApull, neutralBpull, deathBpull, spouseBpull, livingBpull, strangerBpull))
colnames(medians2)
 [1] "ID"                     "bias_neu_A"             "bias_dea_A"             "bias_spo_A"             "bias_str_A"             "bias_liv_A"             "bias_neu_B"             "bias_dea_B"            
 [9] "bias_spo_B"             "bias_str_B"             "bias_liv_B"             "age"                    "age_yrs"                "sex_m"                  "ethnicity_hisp"         "race"                  
[17] "timesincedeath"         "rs2254298"              "rs2268498"              "rs53576"                "rs2254298_A.carrier"    "rs2268498_C.carrier"    "rs53576_A.carrier"      "tot_bdi"               
[25] "tot_bgq"                "tot_ecrrs_global_anx"   "tot_ecrrs_global_avoid" "tot_ecrrs_spouse_anx"   "tot_ecrrs_spouse_avoid" "tot_ysl"                "tot_icg"                "grp_cg"                
# turn it into a long dataset
medians_long <- gather(medians2,
                   key = "stim",
                   value = "bias", -c(ID,age, age_yrs, sex_m, ethnicity_hisp, race, timesincedeath, rs2254298, rs2268498, rs53576, rs2254298_A.carrier, rs2268498_C.carrier, rs53576_A.carrier, tot_bdi, tot_bgq, tot_ecrrs_global_anx,tot_ecrrs_global_avoid, tot_ecrrs_spouse_anx, tot_ecrrs_spouse_avoid, tot_ysl, tot_icg, grp_cg)) 
str(medians_long)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 390 obs. of  24 variables:
 $ ID                    : chr  "D101" "D105" "D110" "D114" ...
 $ age                   : num  25597 26613 25049 27122 28295 ...
 $ age_yrs               : num  70.1 72.9 68.6 74.3 77.5 ...
 $ sex_m                 : Ord.factor w/ 2 levels "female"<"male": 2 1 1 1 2 1 2 1 1 1 ...
 $ ethnicity_hisp        : Ord.factor w/ 2 levels "not Hispanic or Latino"<..: 1 1 1 1 1 1 1 1 1 1 ...
 $ race                  : Factor w/ 2 levels "White","Other (specify)": 1 1 1 1 1 1 1 1 1 1 ...
 $ timesincedeath        : num  402 317 277 934 281 ...
 $ rs2254298             : Factor w/ 2 levels "GG","A/G": 1 1 2 2 1 1 1 1 1 2 ...
 $ rs2268498             : Factor w/ 3 levels "C/T","CC","TT": 1 1 1 1 3 2 1 1 2 2 ...
 $ rs53576               : Factor w/ 2 levels "A/G","GG": 1 2 2 1 2 1 1 1 2 1 ...
 $ rs2254298_A.carrier   : Ord.factor w/ 2 levels "0"<"1": 1 1 2 2 1 1 1 1 1 2 ...
 $ rs2268498_C.carrier   : Ord.factor w/ 2 levels "0"<"1": 2 2 2 2 1 2 2 2 2 2 ...
 $ rs53576_A.carrier     : Ord.factor w/ 2 levels "0"<"1": 2 1 1 2 1 2 2 2 1 2 ...
 $ tot_bdi               : num  3 14 6 8 2 5 5 3 5 0 ...
 $ tot_bgq               : num  6 11 9 9 9 10 8 9 8 6 ...
 $ tot_ecrrs_global_anx  : num  3.33 2.33 3.33 1.33 4 ...
 $ tot_ecrrs_global_avoid: num  1 3.5 4.17 4.5 3.33 ...
 $ tot_ecrrs_spouse_anx  : num  1 1 1 2.67 1 ...
 $ tot_ecrrs_spouse_avoid: num  1 1 1.17 2.83 2.33 ...
 $ tot_ysl               : num  44 77 48 45 62 79 57 41 49 55 ...
 $ tot_icg               : num  7 27 21 6 26 22 10 12 16 19 ...
 $ grp_cg                : Factor w/ 2 levels "CG","NCG": 2 1 2 2 1 2 2 2 2 2 ...
 $ stim                  : chr  "bias_neu_A" "bias_neu_A" "bias_neu_A" "bias_neu_A" ...
 $ bias                  : num  52 32 -76 11 -60 129 47.5 -4 1 29.5 ...
 - attr(*, "vars")= chr "ID"
 - attr(*, "drop")= logi TRUE
 - attr(*, "indices")=List of 39
  ..$ : int  0 39 78 117 156 195 234 273 312 351
  ..$ : int  20 59 98 137 176 215 254 293 332 371
  ..$ : int  21 60 99 138 177 216 255 294 333 372
  ..$ : int  22 61 100 139 178 217 256 295 334 373
  ..$ : int  1 40 79 118 157 196 235 274 313 352
  ..$ : int  23 62 101 140 179 218 257 296 335 374
  ..$ : int  2 41 80 119 158 197 236 275 314 353
  ..$ : int  24 63 102 141 180 219 258 297 336 375
  ..$ : int  3 42 81 120 159 198 237 276 315 354
  ..$ : int  4 43 82 121 160 199 238 277 316 355
  ..$ : int  5 44 83 122 161 200 239 278 317 356
  ..$ : int  6 45 84 123 162 201 240 279 318 357
  ..$ : int  7 46 85 124 163 202 241 280 319 358
  ..$ : int  8 47 86 125 164 203 242 281 320 359
  ..$ : int  25 64 103 142 181 220 259 298 337 376
  ..$ : int  26 65 104 143 182 221 260 299 338 377
  ..$ : int  9 48 87 126 165 204 243 282 321 360
  ..$ : int  27 66 105 144 183 222 261 300 339 378
  ..$ : int  10 49 88 127 166 205 244 283 322 361
  ..$ : int  28 67 106 145 184 223 262 301 340 379
  ..$ : int  11 50 89 128 167 206 245 284 323 362
  ..$ : int  29 68 107 146 185 224 263 302 341 380
  ..$ : int  30 69 108 147 186 225 264 303 342 381
  ..$ : int  31 70 109 148 187 226 265 304 343 382
  ..$ : int  12 51 90 129 168 207 246 285 324 363
  ..$ : int  32 71 110 149 188 227 266 305 344 383
  ..$ : int  33 72 111 150 189 228 267 306 345 384
  ..$ : int  13 52 91 130 169 208 247 286 325 364
  ..$ : int  14 53 92 131 170 209 248 287 326 365
  ..$ : int  15 54 93 132 171 210 249 288 327 366
  ..$ : int  34 73 112 151 190 229 268 307 346 385
  ..$ : int  16 55 94 133 172 211 250 289 328 367
  ..$ : int  17 56 95 134 173 212 251 290 329 368
  ..$ : int  35 74 113 152 191 230 269 308 347 386
  ..$ : int  36 75 114 153 192 231 270 309 348 387
  ..$ : int  18 57 96 135 174 213 252 291 330 369
  ..$ : int  37 76 115 154 193 232 271 310 349 388
  ..$ : int  38 77 116 155 194 233 272 311 350 389
  ..$ : int  19 58 97 136 175 214 253 292 331 370
 - attr(*, "group_sizes")= int  10 10 10 10 10 10 10 10 10 10 ...
 - attr(*, "biggest_group_size")= int 10
 - attr(*, "labels")='data.frame':  39 obs. of  1 variable:
  ..$ ID: chr  "D101" "D102" "D103" "D104" ...
  ..- attr(*, "vars")= chr "ID"
  ..- attr(*, "drop")= logi TRUE
# create new columns for stimulus category and condition
medians_long$cond <- as.factor(ifelse(grepl("*_A", data_long$stim), "A", "B"))
medians_long$stimulus <- as.factor(ifelse(grepl("*neu*", data_long$stim), "neutral", 
                                   ifelse(grepl("*dea*", data_long$stim), "death",
                                          ifelse(grepl("*spo*", data_long$stim), "spouse", 
                                                 ifelse(grepl("*str*", data_long$stim), "stranger", 
                                                        ifelse(grepl("*liv*", data_long$stim), "living", NA))))))
View(medians_long) # 390 obs. of 26 vars, that's correct
library(psych)
describe(medians_long$bias) # skewness is -.07, kurtosis is 1.57, SD = 74.95
   vars   n  mean    sd median trimmed   mad  min max range  skew kurtosis   se
X1    1 390 14.38 74.94  13.75   14.08 64.12 -284 295   579 -0.07     1.57 3.79
hist(medians_long$bias)

qqnorm(medians_long$bias) # not terrible, but definitely some outliers

# check out the outliers
## function from https://stackoverflow.com/questions/12866189/calculating-the-outliers-in-r
outfun <- function(x) {
  abs(x-mean(x,na.rm=TRUE)) > 3*sd(x,na.rm=TRUE)
}
# add a variable for outlier = T/F
medians_long$outlier <- outfun(medians_long$bias)
table(medians_long$outlier) # 5 observations where outlier = TRUE

FALSE  TRUE 
  385     5 
# which are these?
medians_long %>% filter(outlier==TRUE)
The five extreme outliers (>3 SD) are:
    D137 death A (-284.0) D145 death A (-247.5) D137 death B (-261.0) D147 death B (295.0) D139 spouse B (275.5)

What’s interesting here is that 4/5 of the outliers are in the CG group, and 3/4 of those show extreme avoidance bias to death stimuli. D137 has an extreme avoidance bias to death in both A and B. The the remaining two are extreme approach biases to death and spouse (NCG, condition B).

D147 will be dropped in the next chunk (low trial count), the others…??

Dropping the 4 people with low trial counts

medians_long.1 <- filter(medians_long, !grepl("D101|D141|D147|D148",ID)) 
unique(medians_long.1$ID) # dropped the 4 participants

Boxplots

panelbystim <- ggplot(medians_long.1, aes(fill=cond, y=bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p1 <- panelbystim + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
panelbyCG <- ggplot(medians_long.1, aes(fill=cond, y=bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p2 <- panelbyCG + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Stimuli") + ylab("Avoid bias                Approach bias")
# What do the boxplots look like after excluding the outliers identified above?
medians_long.2 <- medians_long.1 %>% mutate(bias_no.out = ifelse(outlier==FALSE, bias, NA)) %>% filter(!is.na(bias_no.out))
View(medians_long.2)
panelbystim.2 <- ggplot(medians_long.2, aes(fill=cond, y=bias_no.out, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p3 <- panelbystim.2 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
panelbyCG.2 <- ggplot(medians_long.2, aes(fill=cond, y=bias_no.out, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p4 <- panelbyCG.2 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Stimuli") + ylab("Avoid bias                Approach bias")
p1 # with outliers

p3 # no outliers

p2 # with outliers

p4 # no outliers

Note that the error bars are wayyyy wider if outliers not dropped.

Use plotly to explore outliers and stats

The plotly package allows you to interactively explore your ggplot objects. Conditions A/B are plotted on top of each other, which makes it hard to see both together, but you can click the legend to show just A or just B, which is pretty cool (see https://plot.ly/r/getting-started/ for more info). It doesn’t seem to ID all of the same outliers that ggplot2’s geom_boxplot does, but does provide a way to look at the values for some of them at least.

# install.packages("plotly")
# install.packages("shiny")
library(plotly)

Attaching package: ‘plotly’

The following object is masked from ‘package:ggplot2’:

    last_plot

The following object is masked from ‘package:stats’:

    filter

The following object is masked from ‘package:graphics’:

    layout
library(shiny)
ggplotly(p1) # with outliers

ggplotly(p3) # no outliers

ggplotly(p2) # with outliers

ggplotly(p4) # no outliers

[Spouse/Living/Death/Stranger] vs. Neutral plots

# computing bias relative to neutral:
medians3 <- medians2 %>% mutate(diff_bias_spo_A = bias_spo_A - bias_neu_A,
                                  diff_bias_str_A = bias_str_A - bias_neu_A,
                                  diff_bias_liv_A = bias_liv_A - bias_neu_A,
                                  diff_bias_dea_A = bias_dea_A - bias_neu_A,
                                  diff_bias_spo_B = bias_spo_B - bias_neu_B,
                                  diff_bias_str_B = bias_str_B - bias_neu_B,
                                  diff_bias_liv_B = bias_liv_B - bias_neu_B,
                                  diff_bias_dea_B = bias_dea_B - bias_neu_B)
View(medians3)
data_long2 <- gather(medians3,
                   key = "stim",
                   value = "diff_bias", -ID, -grp_cg)
attributes are not identical across measure variables;
they will be dropped
data_long3 <- filter(data_long2, grepl("^diff",stim))
data_long3$cond <- as.factor(ifelse(grepl("*_A", data_long3$stim), "A", "B"))
data_long3$stimulus <- as.factor(ifelse(grepl("*dea*", data_long3$stim), "death-neu",
                                          ifelse(grepl("*spo*", data_long3$stim), "spouse-neu", 
                                                 ifelse(grepl("*str*", data_long3$stim), "stranger-neu", 
                                                        ifelse(grepl("*liv*", data_long3$stim), "living-neu", NA)))))
minus_neu <- data_long3
View(minus_neu)
# diff_bias is chr rather than num for some reason
minus_neu$diff_bias <-as.numeric(minus_neu$diff_bias)
describe(minus_neu$diff_bias) # skewness is -0.16, kurtosis is .95
   vars   n  mean    sd median trimmed   mad    min   max range  skew kurtosis   se
X1    1 312 16.62 85.94  13.75   17.65 83.77 -304.5 292.5   597 -0.16     0.95 4.87
hist(minus_neu$diff_bias) # not bad

qqnorm(minus_neu$diff_bias) # not too bad, but definitely a few extreme values

# panel by stimulus category
panelbystim_diff <- ggplot(minus_neu, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p5 <- panelbystim_diff + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p5

ggplotly(p5)

# panel by group
panelbyCG_diff <- ggplot(minus_neu, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p6 <- panelbyCG_diff + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p6

ggplotly(p6)

# dropping low-trial count people
minus_neu.1 <- filter(minus_neu, !grepl("D101|D141|D147|D148",ID))
# panel by stimulus category
panelbystim_diff.1 <- ggplot(minus_neu.1, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p7 <- panelbystim_diff.1 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p7

ggplotly(p7)

# panel by group
panelbyCG_diff.1 <- ggplot(minus_neu.1, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p8 <- panelbyCG_diff.1 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p8

ggplotly(p8)

Spouse vs. Stranger plots

# computing spouse vs. stranger
medians4 <- medians3 %>% mutate(spo_vs_str_A = bias_spo_A - bias_str_A,
                                  spo_vs_str_B = bias_spo_B - bias_str_B,
                                  liv_vs_str_A = bias_liv_A - bias_str_A,
                                  liv_vs_str_B = bias_liv_B - bias_str_B)
data_long4 <- gather(data_means4,
                   key = "stim",
                   value = "diff_bias", -ID, -grp_cg)
data_long4 <- filter(data_long4, grepl("^spo_|^liv_",stim))
data_long4$cond <- as.factor(ifelse(grepl("*_A", data_long4$stim), "A", "B"))
data_long4$stimulus <- as.factor(ifelse(grepl("*spo*", data_long4$stim), "spouse-stranger",
                                          ifelse(grepl("*liv*", data_long4$stim), "living-stranger", NA)))
spouse_vs_str <- data_long4
View(spouse_vs_str)
# panel by stimulus category
panelbystim_diff2 <- ggplot(spouse_vs_str, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p9 <- panelbystim_diff2 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p9

ggplotly(p9)

# panel by group
panelbyCG_diff2 <- ggplot(spouse_vs_str, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p10 <- panelbyCG_diff2 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p10

ggplotly(p10)

# dropping low-trial count people
spouse_vs_str.1 <- filter(spouse_vs_str, !grepl("D101|D141|D147|D148",ID))
# panel by stimulus category
panelbystim_diff2.1 <- ggplot(spouse_vs_str.1, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p11 <- panelbystim_diff2.1 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p11

ggplotly(p11)

# panel by group
panelbyCG_diff2.1 <- ggplot(spouse_vs_str.1, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p12 <- panelbyCG_diff2.1 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p12

ggplotly(p12)

Regressions

# predict from stimulus
m1 <- lm(bias ~ stimulus, data=medians_long.3)
summary(m1)

Call:
lm(formula = bias ~ stimulus, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-211.214  -42.371    2.786   39.877  176.710 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)   
(Intercept)        1.5896     8.0430   0.198  0.84345   
stimulusliving    24.1247    11.2520   2.144  0.03274 * 
stimulusneutral   -1.8681    11.2520  -0.166  0.86823   
stimulusspouse    36.7003    11.2918   3.250  0.00127 **
stimulusstranger  -0.4324    11.2520  -0.038  0.96937   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.83 on 341 degrees of freedom
Multiple R-squared:  0.05517,   Adjusted R-squared:  0.04408 
F-statistic: 4.977 on 4 and 341 DF,  p-value: 0.0006538
# predict from condition
m2 <- lm(bias ~ cond, data=medians_long.3)
summary(m2)

Call:
lm(formula = bias ~ cond, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-206.142  -43.290   -2.142   43.767  201.494 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  13.1416     5.1269   2.563   0.0108 *
condB         0.3642     7.2505   0.050   0.9600  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 67.43 on 344 degrees of freedom
Multiple R-squared:  7.333e-06, Adjusted R-squared:  -0.0029 
F-statistic: 0.002523 on 1 and 344 DF,  p-value: 0.96
# predict from stimulus*condition
m3 <- lm(bias ~ stimulus*cond, data=medians_long.3)
summary(m3)

Call:
lm(formula = bias ~ stimulus * cond, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-204.186  -43.439    2.779   37.625  175.809 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)   
(Intercept)              -5.485     11.518  -0.476  0.63425   
stimulusliving           38.228     16.055   2.381  0.01782 * 
stimulusneutral           5.756     16.055   0.359  0.72017   
stimulusspouse           42.899     16.055   2.672  0.00791 **
stimulusstranger          5.185     16.055   0.323  0.74694   
condB                    13.941     16.169   0.862  0.38920   
stimulusliving:condB    -27.998     22.619  -1.238  0.21665   
stimulusneutral:condB   -15.041     22.619  -0.665  0.50653   
stimulusspouse:condB    -12.164     22.700  -0.536  0.59242   
stimulusstranger:condB  -11.026     22.619  -0.487  0.62623   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 66.17 on 336 degrees of freedom
Multiple R-squared:  0.0596,    Adjusted R-squared:  0.03441 
F-statistic: 2.366 on 9 and 336 DF,  p-value: 0.01328
# predict from stimulus*condition*group
m4 <- lm(bias ~ stimulus*cond*grp_cg, data=medians_long.3)
summary(m4)

Call:
lm(formula = bias ~ stimulus * cond * grp_cg, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-216.321  -42.661   -0.645   44.052  182.711 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                      -16.5769    18.2348  -0.909    0.364
stimulusliving                    38.4436    24.9135   1.543    0.124
stimulusneutral                   -0.5897    24.9135  -0.024    0.981
stimulusspouse                    31.4769    24.9135   1.263    0.207
stimulusstranger                  -9.8231    24.9135  -0.394    0.694
condB                             40.8984    25.3233   1.615    0.107
grp_cgNCG                         18.3019    23.4231   0.781    0.435
stimulusliving:condB             -37.0984    34.8944  -1.063    0.288
stimulusneutral:condB            -29.5984    34.8944  -0.848    0.397
stimulusspouse:condB              -7.8650    34.8944  -0.225    0.822
stimulusstranger:condB           -28.1650    34.8944  -0.807    0.420
stimulusliving:grp_cgNCG           0.7314    32.4491   0.023    0.982
stimulusneutral:grp_cgNCG         12.2147    32.4491   0.376    0.707
stimulusspouse:grp_cgNCG          21.0981    32.4491   0.650    0.516
stimulusstranger:grp_cgNCG        27.3731    32.4491   0.844    0.400
condB:grp_cgNCG                  -45.2734    32.7648  -1.382    0.168
stimulusliving:condB:grp_cgNCG    14.0234    45.6305   0.307    0.759
stimulusneutral:condB:grp_cgNCG   23.5734    45.6305   0.517    0.606
stimulusspouse:condB:grp_cgNCG    -9.7705    45.7550  -0.214    0.831
stimulusstranger:condB:grp_cgNCG  28.0900    45.6305   0.616    0.539

Residual standard error: 65.75 on 326 degrees of freedom
Multiple R-squared:  0.09915,   Adjusted R-squared:  0.04664 
F-statistic: 1.888 on 19 and 326 DF,  p-value: 0.01431
# predict from stimulus + condition*group
m5 <- lm(bias ~ stimulus + cond*grp_cg, data=medians_long.3)
summary(m5)

Call:
lm(formula = bias ~ stimulus + cond * grp_cg, data = medians_long.3)

Residuals:
    Min      1Q  Median      3Q     Max 
-210.25  -41.33   -0.83   43.29  177.69 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -16.78345   10.55990  -1.589 0.112915    
stimulusliving    24.62532   11.14989   2.209 0.027875 *  
stimulusneutral   -1.36754   11.14989  -0.123 0.902457    
stimulusspouse    37.18685   11.19033   3.323 0.000988 ***
stimulusstranger   0.06817   11.14989   0.006 0.995125    
condB             20.07815   10.76037   1.866 0.062916 .  
grp_cgNCG         30.59089   10.04317   3.046 0.002502 ** 
condB:grp_cgNCG  -33.76508   14.18831  -2.380 0.017877 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.23 on 338 degrees of freedom
Multiple R-squared:  0.08069,   Adjusted R-squared:  0.06165 
F-statistic: 4.238 on 7 and 338 DF,  p-value: 0.0001679
cat_plot(m5, pred="grp_cg", modx = "cond", plot.points=T, jitter=0.1, point.shape=T)

# predict from stimulus*condition + group
m6 <- lm(bias ~ stimulus*cond + grp_cg, data=medians_long.3)
summary(m6)

Call:
lm(formula = bias ~ stimulus * cond + grp_cg, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-210.069  -42.615    0.774   40.045  169.752 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)   
(Intercept)             -13.805     12.268  -1.125  0.26129   
stimulusliving           38.703     15.993   2.420  0.01605 * 
stimulusneutral           6.232     15.993   0.390  0.69705   
stimulusspouse           43.375     15.993   2.712  0.00703 **
stimulusstranger          5.660     15.993   0.354  0.72363   
condB                    14.185     16.106   0.881  0.37907   
grp_cgNCG                13.728      7.170   1.915  0.05639 . 
stimulusliving:condB    -28.243     22.530  -1.254  0.21088   
stimulusneutral:condB   -15.285     22.530  -0.678  0.49796   
stimulusspouse:condB    -12.236     22.611  -0.541  0.58877   
stimulusstranger:condB  -11.271     22.530  -0.500  0.61721   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.91 on 335 degrees of freedom
Multiple R-squared:  0.06978,   Adjusted R-squared:  0.04201 
F-statistic: 2.513 on 10 and 335 DF,  p-value: 0.006337
anova(m1,m2) # m2 > m1
Analysis of Variance Table

Model 1: bias ~ stimulus
Model 2: bias ~ cond
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1    341 1477973                                  
2    344 1564255 -3    -86282 6.6357 0.0002286 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(m2,m3) # m3 > m2
Analysis of Variance Table

Model 1: bias ~ cond
Model 2: bias ~ stimulus * cond
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
1    344 1564255                                
2    336 1471036  8     93219 2.6615 0.007619 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(m3,m4) # ns
Analysis of Variance Table

Model 1: bias ~ stimulus * cond
Model 2: bias ~ stimulus * cond * grp_cg
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    336 1471036                           
2    326 1409176 10     61860 1.4311 0.1651
anova(m4,m5) # ns
Analysis of Variance Table

Model 1: bias ~ stimulus * cond * grp_cg
Model 2: bias ~ stimulus + cond * grp_cg
  Res.Df     RSS  Df Sum of Sq      F Pr(>F)
1    326 1409176                            
2    338 1438043 -12    -28867 0.5565 0.8761
anova(m4,m6) # ns
Analysis of Variance Table

Model 1: bias ~ stimulus * cond * grp_cg
Model 2: bias ~ stimulus * cond + grp_cg
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    326 1409176                           
2    335 1455112 -9    -45937 1.1808 0.3064
# predict from stimulus*condition*tot_icg
m7 <- lm(bias ~ stimulus*cond*tot_icg, data=medians_long.3)
summary(m7) # the tot_icg*cond interaction is a significant predictor, t = 2.522, p = .012

Call:
lm(formula = bias ~ stimulus * cond * tot_icg, data = medians_long.3)

Residuals:
    Min      1Q  Median      3Q     Max 
-207.64  -41.77   -1.76   43.87  177.39 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)  
(Intercept)                     39.9964    23.8580   1.676   0.0946 .
stimulusliving                  20.3388    33.6526   0.604   0.5460  
stimulusneutral                -13.9277    33.6526  -0.414   0.6792  
stimulusspouse                  38.2107    33.6526   1.135   0.2570  
stimulusstranger                10.2604    33.6526   0.305   0.7606  
condB                          -61.1171    33.7395  -1.811   0.0710 .
tot_icg                         -2.0337     0.9399  -2.164   0.0312 *
stimulusliving:condB           -10.1968    47.5913  -0.214   0.8305  
stimulusneutral:condB           26.6026    47.5913   0.559   0.5766  
stimulusspouse:condB            13.1576    47.7113   0.276   0.7829  
stimulusstranger:condB          45.4550    47.5913   0.955   0.3402  
stimulusliving:tot_icg           0.8311     1.3136   0.633   0.5274  
stimulusneutral:tot_icg          0.9093     1.3136   0.692   0.4893  
stimulusspouse:tot_icg           0.2557     1.3136   0.195   0.8458  
stimulusstranger:tot_icg        -0.1699     1.3136  -0.129   0.8972  
condB:tot_icg                    3.3465     1.3271   2.522   0.0122 *
stimulusliving:condB:tot_icg    -0.8509     1.8561  -0.458   0.6470  
stimulusneutral:condB:tot_icg   -1.8901     1.8561  -1.018   0.3093  
stimulusspouse:condB:tot_icg    -1.1811     1.8573  -0.636   0.5253  
stimulusstranger:condB:tot_icg  -2.5368     1.8561  -1.367   0.1726  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 64.83 on 326 degrees of freedom
Multiple R-squared:  0.1241,    Adjusted R-squared:  0.073 
F-statistic:  2.43 on 19 and 326 DF,  p-value: 0.0008619
# no 3-way interaction
interact_plot(m7, pred="tot_icg", modx = "cond", plot.points=T, jitter=0.1, point.shape=T)

Note that in the last plot, people in the CG group shows increased approach bias under condition B.

Exploratory analyses with the genotype data

I wanted to take a look at attachment styles, OXTR genotype, and condition. All of the ECRRS variables are super negatively skewed. I tried some transformations using bestNormalize but their recommendations didn’t help much. However, the residuals for the models do look okay imo. But do bear that in mind as you look at these.

Also, keep in mind that this is still the long dataset, so each person has multiple observations of the bias score. This makes it look like there are more datapoints than there are, and I’m not sure if this is correct because it doesn’t model the interdependence in the data.

# library(bestNormalize)
# norm <- bestNormalize(medians_long.3$tot_ecrrs_spouse_anx)
# medians_long.3$tot_ecrrs_spouse_anx_t <- norm$x.t # pull x.t from norm and assign it back into dataset
# hist(medians_long.3$tot_ecrrs_spouse_anx_t)
# The recent Schiele et al. paper on CG, rs2254298, behavioral inhibition and separation anxiety found:
#  - No significant main effect of OXTR genotype on ICG score (same as we found)
#  - Significant interaction of OXTR x behavioral inhibition on ICG score
#  - Marginal interaction of OXTR x adult separation anxiety on ICG score
m8 <- lm(bias ~ stimulus*cond+tot_ecrrs_global_anx, data=medians_long.3)
summary(m8) # interesting - GLOBAL anxious attachment style OVERALL predicts bias

Call:
lm(formula = bias ~ stimulus * cond + tot_ecrrs_global_anx, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-210.349  -39.758   -0.459   40.775  169.249 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)   
(Intercept)              12.021     13.760   0.874  0.38296   
stimulusliving           38.429     15.954   2.409  0.01655 * 
stimulusneutral           5.957     15.954   0.373  0.70909   
stimulusspouse           43.100     15.954   2.701  0.00725 **
stimulusstranger          5.386     15.954   0.338  0.73589   
condB                    13.641     16.068   0.849  0.39652   
tot_ecrrs_global_anx     -5.484      2.393  -2.292  0.02253 * 
stimulusliving:condB    -27.698     22.477  -1.232  0.21872   
stimulusneutral:condB   -14.741     22.477  -0.656  0.51240   
stimulusspouse:condB    -12.042     22.558  -0.534  0.59381   
stimulusstranger:condB  -10.727     22.477  -0.477  0.63352   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.75 on 335 degrees of freedom
Multiple R-squared:  0.07412,   Adjusted R-squared:  0.04648 
F-statistic: 2.682 on 10 and 335 DF,  p-value: 0.003604
qqnorm(resid(m8))

m9 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx, data=medians_long.3)
summary(m9) # but SPOUSE anxious attachment style does NOT predict bias

Call:
lm(formula = bias ~ stimulus + cond + tot_ecrrs_spouse_anx, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-212.285  -42.418    0.487   38.900  175.662 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)   
(Intercept)            7.1528    10.2102   0.701  0.48406   
stimulusliving        24.3143    11.2649   2.158  0.03160 * 
stimulusneutral       -1.6786    11.2649  -0.149  0.88163   
stimulusspouse        36.8675    11.3047   3.261  0.00122 **
stimulusstranger      -0.2429    11.2649  -0.022  0.98281   
condB                  0.5470     7.0862   0.077  0.93851   
tot_ecrrs_spouse_anx  -2.6147     2.2984  -1.138  0.25608   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.9 on 339 degrees of freedom
Multiple R-squared:  0.05878,   Adjusted R-squared:  0.04212 
F-statistic: 3.528 on 6 and 339 DF,  p-value: 0.002115
qqnorm(resid(m9))

m10 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx*rs2268498, data=medians_long.3)
summary(m10) # SPOUSE anxious attachment style DOES predict bias in interaction with rs2268498, p = .009 (this also works with rs2268498_C.carrier)

Call:
lm(formula = bias ~ stimulus + cond + tot_ecrrs_spouse_anx * 
    rs2268498, data = medians_long.3)

Residuals:
    Min      1Q  Median      3Q     Max 
-230.84  -40.28    2.06   39.51  169.01 

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)                        8.7028    12.2860   0.708 0.479220    
stimulusliving                    24.7203    11.0979   2.227 0.026579 *  
stimulusneutral                   -1.2725    11.0979  -0.115 0.908780    
stimulusspouse                    37.4645    11.1385   3.364 0.000859 ***
stimulusstranger                   0.1632    11.0979   0.015 0.988277    
condB                              0.5418     6.9792   0.078 0.938165    
tot_ecrrs_spouse_anx              -5.4620     3.8413  -1.422 0.155983    
rs2268498CC                      -25.5855    15.0986  -1.695 0.091087 .  
rs2268498TT                       20.9495    15.4776   1.354 0.176797    
tot_ecrrs_spouse_anx:rs2268498CC  15.1189     5.7161   2.645 0.008555 ** 
tot_ecrrs_spouse_anx:rs2268498TT  -5.3741     5.4725  -0.982 0.326802    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 64.9 on 335 degrees of freedom
Multiple R-squared:  0.09785,   Adjusted R-squared:  0.07092 
F-statistic: 3.634 on 10 and 335 DF,  p-value: 0.0001296
qqnorm(resid(m10))

interact_plot(m10, pred="tot_ecrrs_spouse_anx", modx = "rs2268498", plot.points=T, jitter=0.1, point.shape=T)

m11 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx*rs2254298, data=medians_long.3)
summary(m11) # but SPOUSE anxious attachment style does not significantly predict bias in interaction with rs2254298, p = .066

Call:
lm(formula = bias ~ stimulus + cond + tot_ecrrs_spouse_anx * 
    rs2254298, data = medians_long.3)

Residuals:
    Min      1Q  Median      3Q     Max 
-211.34  -41.97    2.42   38.87  165.22 

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)                        11.5514    11.0908   1.042  0.29840   
stimulusliving                     22.6773    11.2490   2.016  0.04462 * 
stimulusneutral                    -1.9845    11.2490  -0.176  0.86008   
stimulusspouse                     33.6134    11.2899   2.977  0.00313 **
stimulusstranger                   -3.5066    11.2490  -0.312  0.75545   
condB                              -0.1525     7.0749  -0.022  0.98281   
tot_ecrrs_spouse_anx               -4.1191     2.5190  -1.635  0.10296   
rs2254298A/G                      -24.1837    14.7420  -1.640  0.10187   
tot_ecrrs_spouse_anx:rs2254298A/G  13.3575     7.2449   1.844  0.06613 . 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 64.84 on 327 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.06512,   Adjusted R-squared:  0.04225 
F-statistic: 2.847 on 8 and 327 DF,  p-value: 0.004524
qqnorm(resid(m11))

m12 <- lm(bias ~ stimulus+cond*tot_ecrrs_spouse_anx*rs2268498, data=medians_long.3)
summary(m12) # non-significant 3-way interaction of condition*spouse anxious attachment*rs2268498CC 

Call:
lm(formula = bias ~ stimulus + cond * tot_ecrrs_spouse_anx * 
    rs2268498, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-208.851  -42.218    0.567   40.922  173.054 

Coefficients:
                                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)                              4.2576    15.0098   0.284 0.776852    
stimulusliving                          24.6716    11.0607   2.231 0.026382 *  
stimulusneutral                         -1.3212    11.0607  -0.119 0.904990    
stimulusspouse                          37.5057    11.1014   3.378 0.000816 ***
stimulusstranger                         0.1145    11.0607   0.010 0.991747    
condB                                    9.4741    18.7429   0.505 0.613560    
tot_ecrrs_spouse_anx                    -5.0629     5.4139  -0.935 0.350385    
rs2268498CC                              4.3260    21.2774   0.203 0.839015    
rs2268498TT                             27.9779    21.9489   1.275 0.203317    
condB:tot_ecrrs_spouse_anx              -0.7982     7.6565  -0.104 0.917035    
condB:rs2268498CC                      -59.7803    30.0941  -1.986 0.047811 *  
condB:rs2268498TT                      -13.8829    30.8551  -0.450 0.653049    
tot_ecrrs_spouse_anx:rs2268498CC         4.2111     8.0519   0.523 0.601335    
tot_ecrrs_spouse_anx:rs2268498TT        -4.9553     7.7318  -0.641 0.522033    
condB:tot_ecrrs_spouse_anx:rs2268498CC  21.8513    11.3933   1.918 0.055985 .  
condB:tot_ecrrs_spouse_anx:rs2268498TT  -0.8768    10.9069  -0.080 0.935978    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 64.68 on 330 degrees of freedom
Multiple R-squared:  0.1174,    Adjusted R-squared:  0.07725 
F-statistic: 2.926 on 15 and 330 DF,  p-value: 0.0002247
### I THINK I MIGHT NEED TO REDO THIS ONE AFTER RE-LEVELING rs2268498 - REFERENCE LEVEL IS C/T, SHOULD BE TT OR CC?
qqnorm(resid(m12))

m13 <- lm(bias ~ stimulus+cond*tot_ecrrs_spouse_anx*rs2254298, data=medians_long.3)
summary(m13) # significant interaction of spouse anxious attachment*rs2254298, p = .018, no 3-way interaction so m11 > m13 for parsimony

Call:
lm(formula = bias ~ stimulus + cond * tot_ecrrs_spouse_anx * 
    rs2254298, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-211.971  -42.557    2.942   38.186  163.337 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)   
(Intercept)                               16.897     13.051   1.295  0.19635   
stimulusliving                            22.641     11.233   2.016  0.04467 * 
stimulusneutral                           -2.021     11.233  -0.180  0.85737   
stimulusspouse                            33.631     11.274   2.983  0.00307 **
stimulusstranger                          -3.543     11.233  -0.315  0.75269   
condB                                    -10.743     15.336  -0.701  0.48411   
tot_ecrrs_spouse_anx                      -7.082      3.563  -1.988  0.04769 * 
rs2254298A/G                             -35.561     20.839  -1.706  0.08888 . 
condB:tot_ecrrs_spouse_anx                 5.920      5.030   1.177  0.24009   
condB:rs2254298A/G                        22.689     29.443   0.771  0.44149   
tot_ecrrs_spouse_anx:rs2254298A/G         24.268     10.233   2.371  0.01830 * 
condB:tot_ecrrs_spouse_anx:rs2254298A/G  -21.814     14.469  -1.508  0.13262   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 64.75 on 324 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.07632,   Adjusted R-squared:  0.04496 
F-statistic: 2.434 on 11 and 324 DF,  p-value: 0.006327
qqnorm(resid(m13))

m14 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx*rs53576, data=medians_long.3)
summary(m14) # no effect of interaction

Call:
lm(formula = bias ~ stimulus + cond + tot_ecrrs_spouse_anx * 
    rs53576, data = medians_long.3)

Residuals:
    Min      1Q  Median      3Q     Max 
-211.42  -41.80    3.57   40.99  175.57 

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)   
(Intercept)                     -0.8630    12.1489  -0.071  0.94341   
stimulusliving                  24.6402    11.2592   2.188  0.02932 * 
stimulusneutral                 -1.3526    11.2592  -0.120  0.90445   
stimulusspouse                  37.2304    11.3003   3.295  0.00109 **
stimulusstranger                 0.0831    11.2592   0.007  0.99412   
condB                            0.5411     7.0809   0.076  0.93913   
tot_ecrrs_spouse_anx             0.8011     3.1744   0.252  0.80091   
rs53576GG                       15.5190    12.6943   1.223  0.22237   
tot_ecrrs_spouse_anx:rs53576GG  -7.3005     4.6192  -1.580  0.11494   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.85 on 337 degrees of freedom
Multiple R-squared:  0.06577,   Adjusted R-squared:  0.04359 
F-statistic: 2.965 on 8 and 337 DF,  p-value: 0.003203
qqnorm(resid(m14))

m15 <- lm(bias ~ stimulus+cond*tot_ecrrs_spouse_anx*rs53576, data=medians_long.3)
summary(m15) # significant 3-way interaction of condition*spouse anxious attachment*rs53576 (GG), p = 0.019

Call:
lm(formula = bias ~ stimulus + cond * tot_ecrrs_spouse_anx * 
    rs53576, data = medians_long.3)

Residuals:
     Min       1Q   Median       3Q      Max 
-207.456  -43.412   -0.263   41.221  172.034 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                           20.0558    14.6747   1.367 0.172639    
stimulusliving                        24.7431    11.1872   2.212 0.027662 *  
stimulusneutral                       -1.2498    11.1872  -0.112 0.911116    
stimulusspouse                        37.3550    11.2279   3.327 0.000976 ***
stimulusstranger                       0.1859    11.1872   0.017 0.986751    
condB                                -41.4628    18.1445  -2.285 0.022931 *  
tot_ecrrs_spouse_anx                  -6.6900     4.4587  -1.500 0.134450    
rs53576GG                            -15.8468    17.8724  -0.887 0.375900    
condB:tot_ecrrs_spouse_anx            14.9862     6.3081   2.376 0.018078 *  
condB:rs53576GG                       62.5554    25.2260   2.480 0.013638 *  
tot_ecrrs_spouse_anx:rs53576GG         3.5756     6.4942   0.551 0.582290    
condB:tot_ecrrs_spouse_anx:rs53576GG -21.7174     9.1772  -2.366 0.018529 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 65.43 on 334 degrees of freedom
Multiple R-squared:  0.08593,   Adjusted R-squared:  0.05582 
F-statistic: 2.854 on 11 and 334 DF,  p-value: 0.001366
qqnorm(resid(m15))

Some more data viz

library(lattice)
xyplot(bias ~ cond | as.factor(ID), data=medians_long.3, as.table=T, type=c("p","r")) # look at peoples' individual slopes between conditions

xyplot(bias ~ cond | as.factor(grp_cg), data=medians_long.3, as.table=T, type=c("p","r")) ## look at slopes between groups

xyplot(bias ~ stimulus | as.factor(grp_cg), data=medians_long.3, as.table=T, type=c("p","r")) ## look at slopes between groups

New ANOVAs and plots

Repeated-measures ANOVA on the aggregate RTs

Caveat: I didn’t include any covariates in the model yet.

library(tidyverse)
library(afex)
library(emmeans)
library(jtools)
# use the dataset "mlm_long2" that was created in a previous step (it has trial-level RTs; stim, cond, and push_pull are factors)
# add a group variable for CG/NCG, make push_pull & run into factors, and drop the 4 people who failed to reverse the instructions
anovaData <- mlm_long2 %>% 
  mutate(grp_cg = as.factor(ifelse(tot_icg >= 25, "CG", "NCG")), push_pull = as.factor(push_pull), run = as.factor(run)) %>% filter(!grepl("D101|D141|D147|D148",ID))
# repeated-measures ANOVA using medians, with 3 within-subjects factors (stimulus, condition, push or pull) and 1 between-subjects factor (CG vs. NCG)
a1 <- aov_ez(id = "ID", dv = "gAAT_RT", anovaData, between = "grp_cg", within = c("stim", "cond", "push_pull"), fun_aggregate = median) # using non-transformed RTs
Contrasts set to contr.sum for the following variables: grp_cg
a1 
Anova Table (Type 3 tests)

Response: gAAT_RT
                       Effect           df       MSE        F   ges p.value
1                      grp_cg        1, 33 114188.41     1.00   .02     .32
2                        stim 3.57, 117.74   2826.95 7.76 ***   .01  <.0001
3                 grp_cg:stim 3.57, 117.74   2826.95     1.24  .002     .30
4                        cond        1, 33  25949.97     2.65   .01     .11
5                 grp_cg:cond        1, 33  25949.97     0.38  .002     .54
6                   push_pull        1, 33   6543.37     2.62  .003     .12
7            grp_cg:push_pull        1, 33   6543.37     2.27  .003     .14
8                   stim:cond 3.79, 125.20   1461.71     1.01 .0009     .40
9            grp_cg:stim:cond 3.79, 125.20   1461.71     0.67 .0006     .61
10             stim:push_pull 3.46, 114.10   2332.98 8.27 ***   .01  <.0001
11      grp_cg:stim:push_pull 3.46, 114.10   2332.98     0.62 .0009     .62
12             cond:push_pull        1, 33   1615.08     0.78 .0002     .38
13      grp_cg:cond:push_pull        1, 33   1615.08  9.19 **  .003    .005
14        stim:cond:push_pull 3.37, 111.20   1847.22     0.91 .0010     .45
15 grp_cg:stim:cond:push_pull 3.37, 111.20   1847.22     0.69 .0007     .58
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

Sphericity correction method: GG 
# repeated-measures ANOVA using medians, with 3 within-subjects factors (stimulus, condition, push or pull) and 1 between-subjects factor (CG vs. NCG)
a1_t <- aov_ez(id = "ID", dv = "gAAT_RT_t", anovaData, between = "grp_cg", within = c("stim", "cond", "push_pull"), fun_aggregate = median)  # using transformed RTs
Contrasts set to contr.sum for the following variables: grp_cg
a1_t
Anova Table (Type 3 tests)

Response: gAAT_RT_t
                       Effect           df  MSE        F    ges p.value
1                      grp_cg        1, 33 4.02     0.64    .01     .43
2                        stim 3.45, 113.87 0.09 7.38 ***    .01  <.0001
3                 grp_cg:stim 3.45, 113.87 0.09     1.21   .002     .31
4                        cond        1, 33 0.81     2.66    .01     .11
5                 grp_cg:cond        1, 33 0.81     0.75   .003     .39
6                   push_pull        1, 33 0.22     1.71   .002     .20
7            grp_cg:push_pull        1, 33 0.22     2.10   .002     .16
8                   stim:cond 3.78, 124.73 0.04     0.67  .0005     .60
9            grp_cg:stim:cond 3.78, 124.73 0.04     0.92  .0007     .45
10             stim:push_pull 3.64, 120.19 0.06 7.87 ***   .009  <.0001
11      grp_cg:stim:push_pull 3.64, 120.19 0.06     0.71  .0008     .58
12             cond:push_pull        1, 33 0.05     0.19 <.0001     .66
13      grp_cg:cond:push_pull        1, 33 0.05 10.09 **   .003    .003
14        stim:cond:push_pull 3.58, 118.06 0.05     0.62  .0006     .63
15 grp_cg:stim:cond:push_pull 3.58, 118.06 0.05     0.47  .0004     .74
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1

Sphericity correction method: GG 

Results don’t differ appreciably between using the transformed and non-transformed RTs.

For both the transformed and non-transformed RTs, there is a significant three-way interaction of Group x Condition x Push_Pull, F(1,33) = 10.09, p = .003, generalized eta-squared = .003 (stats are for transformed variable). However, if I’m interpreting that effect size correctly, it’s a pretty tiny effect (.02 = small effect, .13 = medium, .26 = large).

p13 <- afex_plot(a1, x = "push_pull", trace = "cond", panel = "grp_cg", 
          error = "within", mapping="linetype", error_arg = list(width = .1))
NOTE: Results may be misleading due to involvement in interactions
p13 + theme_light() # greyscale, light theme

p13 + theme_apa() # greyscale, APA theme

p14 <- afex_plot(a1, x = "push_pull", trace = "cond", panel = "grp_cg", 
          error = "within", mapping="color", error_arg = list(width = .1))
NOTE: Results may be misleading due to involvement in interactions
p14 + theme_light() # color, light theme

p14 + theme_apa() # color, APA theme

From these, it looks like the interaction shows a significant effect of Condition (spray A/B), but only for people with CG and only on their push behavior. In the CG group, B decreased avoidance bias* as shown by the within-subjects error bars that barely overlap if at all.

In the two-way interaction plot for the bias score (from the regression), people in the CG group appeared to show increased approach bias under condition B. From the plots above, it looks like what is driving this apparent increase in approach bias is actually decreased avoidance behavior rather than increased approach behavior. It’s just not evident from the difference scores, which are agnostic to cause.

*(well, is it really bias per se if it’s not relative to pull? hmm)

Estimated marginal means

emm <- emmeans(a1, ~c(grp_cg,cond,push_pull))
NOTE: Results may be misleading due to involvement in interactions
emm
 grp_cg cond push_pull   emmean       SE   df lower.CL upper.CL
 CG     A    PULL      749.4461 21.05578 55.6 707.2597 791.6326
 NCG    A    PULL      712.5678 20.36964 50.2 671.6583 753.4773
 CG     B    PULL      765.0661 21.05578 55.6 722.8797 807.2526
 NCG    B    PULL      731.6028 20.36964 50.2 690.6933 772.5123
 CG     A    PUSH      738.1261 21.05578 55.6 695.9397 780.3126
 NCG    A    PUSH      738.4778 20.36964 50.2 697.5683 779.3873
 CG     B    PUSH      777.7661 21.05578 55.6 735.5797 819.9526
 NCG    B    PUSH      744.3128 20.36964 50.2 703.4033 785.2223

Results are averaged over the levels of: stim 
Confidence level used: 0.95 
update(pairs(emm), by=NULL, adjust = "holm")
 contrast                   estimate        SE    df t.ratio p.value
 CG,A,PULL - NCG,A,PULL   36.8783333 29.411984 52.75   1.254  1.0000
 CG,A,PULL - CG,B,PULL   -15.6200000 19.171175 37.09  -0.815  1.0000
 CG,A,PULL - NCG,B,PULL   17.8433333 29.411984 52.75   0.607  1.0000
 CG,A,PULL - CG,A,PUSH    11.3200000 10.429733 48.36   1.085  1.0000
 CG,A,PULL - NCG,A,PUSH   10.9683333 29.411984 52.75   0.373  1.0000
 CG,A,PULL - CG,B,PUSH   -28.3200000 20.814526 48.65  -1.361  1.0000
 CG,A,PULL - NCG,B,PUSH    5.1333333 29.411984 52.75   0.175  1.0000
 NCG,A,PULL - CG,B,PULL  -52.4983333 29.411984 52.75  -1.785  1.0000
 NCG,A,PULL - NCG,B,PULL -19.0350000 16.602725 37.09  -1.146  1.0000
 NCG,A,PULL - CG,A,PUSH  -25.5583333 29.411984 52.75  -0.869  1.0000
 NCG,A,PULL - NCG,A,PUSH -25.9100000  9.032414 48.36  -2.869  0.1707
 NCG,A,PULL - CG,B,PUSH  -65.1983333 29.411984 52.75  -2.217  0.8363
 NCG,A,PULL - NCG,B,PUSH -31.7450000 18.025909 48.65  -1.761  1.0000
 CG,B,PULL - NCG,B,PULL   33.4633333 29.411984 52.75   1.138  1.0000
 CG,B,PULL - CG,A,PUSH    26.9400000 20.814526 48.65   1.294  1.0000
 CG,B,PULL - NCG,A,PUSH   26.5883333 29.411984 52.75   0.904  1.0000
 CG,B,PULL - CG,B,PUSH   -12.7000000 10.429733 48.36  -1.218  1.0000
 CG,B,PULL - NCG,B,PUSH   20.7533333 29.411984 52.75   0.706  1.0000
 NCG,B,PULL - CG,A,PUSH   -6.5233333 29.411984 52.75  -0.222  1.0000
 NCG,B,PULL - NCG,A,PUSH  -6.8750000 18.025909 48.65  -0.381  1.0000
 NCG,B,PULL - CG,B,PUSH  -46.1633333 29.411984 52.75  -1.570  1.0000
 NCG,B,PULL - NCG,B,PUSH -12.7100000  9.032414 48.36  -1.407  1.0000
 CG,A,PUSH - NCG,A,PUSH   -0.3516667 29.411984 52.75  -0.012  1.0000
 CG,A,PUSH - CG,B,PUSH   -39.6400000 19.171175 37.09  -2.068  1.0000
 CG,A,PUSH - NCG,B,PUSH   -6.1866667 29.411984 52.75  -0.210  1.0000
 NCG,A,PUSH - CG,B,PUSH  -39.2883333 29.411984 52.75  -1.336  1.0000
 NCG,A,PUSH - NCG,B,PUSH  -5.8350000 16.602725 37.09  -0.351  1.0000
 CG,B,PUSH - NCG,B,PUSH   33.4533333 29.411984 52.75   1.137  1.0000

Results are averaged over the levels of: stim 
P value adjustment: holm method for 28 tests 

Not quite sure about these pairwise comparisons. Ideally, we’d be able to set up specific contrasts to test as described below but I haven’t figured that out yet. However, looking at the plots, I think it’s safe to interpret the interaction as stated above. I will work on this.

?aov_ez

The S3 object returned per default can be directly passed to emmeans::emmeans for further analysis. This allows to test any type of contrasts that might be of interest independent of whether or not this contrast involves between-subject variables, within-subject variables, or a combination thereof. The general procedure to run those contrasts is the following (see Examples for a full example):

(1) Estimate an afex_aov object with the function returned here. For example: x <- aov_car(dv ~ a*b + (id/c), d)

(2) Obtain a emmGrid-class object by running emmeans on the afex_aov object from step 1 using the factors involved in the contrast. For example: r <- emmeans(x, ~a:c)

(3) Create a list containing the desired contrasts on the reference grid object from step 2. For example: 
con1 <- list(a_x = c(-1, 1, 0, 0, 0, 0), b_x = c(0, 0, -0.5, -0.5, 0, 1))

(4) Test the contrast on the reference grid using contrast. For example: contrast(r, con1)

To control for multiple testing p-value adjustments can be specified. For example the Bonferroni-Holm correction: contrast(r, con1, adjust = "holm")

Writing plots to .pptx

This is a useful tool for high-res figures! Using the officer package to write plots directly to a Powerpoint presentation:

library(officer)
# Create a new powerpoint document
ot_plots <- read_pptx()
# Add a new slide into the ppt document 
ot_plots <- ot_plots %>% 
  add_slide(layout = "Title and Content", master = "Office Theme")
# Add the ggplot2 objects to the slide
ot_plots <- ph_with_gg(ot_plots, value=p14)
# write the document to a file
print(ot_plots, target = "ot-rt-plots.pptx") 
---
title: "OT RT plots"
author: "Saren Seeley"
date: "10-22-2018"
output:
  html_notebook:
    number_sections: no
    theme: paper
    toc: yes
    toc_float: yes
  html_document:
    df_print: paged
    toc: yes
---
# MLM stuff from FSHD617C assignment
## About
### Overview
This ADA uses data from the project “The effect of intranasal oxytocin on neural functioning in widow(er)s” (#1503744420, PI: Mary-Frances O'Connor, Co-PI: Brian Arizmendi). It was a double-blind randomized crossover study, designed to address conflicting accounts of motivated behavior in complicatedgrief using a novel approach-avoidance task variant (grief AAT; gAAT) that allows for comparison between participant-provided photos and nomothetic grief stimuli. Participants were 40 English-speaking adults ages 55-80, who experienced the death of a spouse or long-term romantic partner between six months and three years prior to their participation. 

### Task
Participants attended two identical experimental sessions, at which they received one of two intranasal sprays (treatment A/B) and then took part in an fMRI scan, which involved a structural sequence, the gAAT, and a resting state sequence.

For the gAAT, five stimulus categories were presented: (1) deceased spouse, (2) living loved one [attachment figure], (3) stranger, (4) generic grief-related scenes, and (5) neutral scenes. Photos provided by the participant were scanned and resized to a standard 750-pixel height and width, and framed in colored frames to match the stimuli used for non-idiographic categories (i.e., stranger, grief-related, neutral). Photos of the stranger were sex-matched to the participant's partner. 

During the task, participants were instructed to follow rules based on an irrelevant characteristic of the stimuli, namely, the color of the frame around the photo (e.g., _"push the joystick when the frame is blue, pull when the frame is yellow"_). The task was completed in two runs of approximately seven minutes each. In the second run, the instructions were reversed (e.g. _"now pull the joystick for blue framed photos"_). Order of instructions was counterbalanced across participants. Stimuli were presented via Inquisit 4 (2014), in a pseudorandomized order determined by genetic algorithm to optimize statistical power and psychological validity (Wager & Nichols, 2003). Each trial last 3000ms, with 2500 ms allowed for response. Intertrial interval was 500ms.

Participants completed the task using an MRI-compatible joystick. The stimuli were animated so that a joystick push would make the image smaller (as if they were pushing it away from them) and a joystick pull would make the image larger (as if they were bringing it nearer to them).

### Variables
The primary dependent variable of interest for the present analysis is **reaction time bias**, computed as RT<sub>push</sub>-RT<sub>pull</sub>. Negative bias scores (faster to push than pull) indicate relative avoidance bias, while positive scores (faster to pull than to push) indicate relative approach bias. Because the two runs were identical in every way except for the instruction provided, each trial had both a push RT and a pull RT (e.g., run1 trial1 was push, run2 trial1 was pull, so subtract RT on run2 trial 1 from run1 trial1 to get the difference - this is further elaborated in the data cleaning script below), unless they failed to reverse the instructions.

Main predictors of interest included **trial number** (in order to establish whether peoples' response bias changed over the 7-minute run), **stimulus** (neutral, spouse, stranger, living loved one, death), and **condition** (treatment A/B - oxytocin or placebo [I'm still blinded]). 

Participants also completed a number of self-report measures (state and trait) and provided biological samples for OXTR genotyping and cortisol assay, some of which were used as predictors.

## Equations
#### Level 1:
We first performed MLM to estimate the effect of trial number (i.e., the random effect of time) on response bias (push RT-pull RT) on a given trial. The level-1 model estimated, for each participant _j_ (_j_ = 1–35), a regression line that predicted each participant’s response bias in milliseconds on each trial _i_ from the trial number. This model was represented as follows: 
<blockquote><center><font size=4>
[bias<i><sub>i</sub></i>]<sub>j</sub> = <i>b</i><sub>0<i><sub>j</sub></i></sub> + <i>b</i><sub>1<i><sub>j</sub></i></sub> [trialnumber<i><sub>i</sub></i>] + <i>r<sub>ij</sub></i>
</font></center>
</blockquote>

where <i>b</i><sub>0<i><sub>j</sub></i></sub>, the intercept, is interpreted as participant _j_’s mean response bias for a given trial; <i>b</i><sub>1<i><sub>j</sub></i></sub>, the slope, represents the effect of trial number on response bias for each participant _j_ (positive coefficients represent the predictive magnitude of trial number on response bias for a given participant); and <i>r<sub>ij</sub></i> is the residual error term.

#### Level 2:
The level-2 models estimated the average effects for the entire sample. The models were represented as follows: 

<blockquote><center><font size=4>
<i>b</i><sub>0</sub><sub>j</sub> = γ<sub>00</sub> + μ<sub>0<sub>j</sub></sub>
<p>
<i>b</i><sub>1<sub><i>j</i></sub></sub> = γ<sub>10</sub> +  μ<sub>1<sub>j</sub></sub>
</font></center></blockquote>

where the intercept, γ<sub>00</sub>, is interpreted as the average response bias for the entire sample; γ<sub>10</sub> is the average effect of represents the effect of trial number on response bias for each participant _j_ (positive coefficients represent the predictive magnitude of trial number on response bias for the sample as a whole), and thus, the primary estimate of interest; and μ<sub>0<sub>j</sub></sub> and μ<sub>1<sub>j</sub></sub> are the residual error terms. 

Adding the fixed effect of stimulus type resulted in the following model:

<blockquote><center><font size=4>
<i>b</i><sub>0</sub><sub>j</sub> = γ<sub>00</sub> + γ<sub>01</sub>(stimulus<sub>i</sub>) + μ<sub>0<sub>j</sub></sub>
<p>
<i>b</i><sub>1<sub><i>j</i></sub></sub> = γ<sub>10</sub> + γ<sub>11</sub>(stimulus<sub>i</sub>) + μ<sub>1<sub>j</sub></sub></font></center></blockquote>


## Data cleaning
<i>Note: Went back to the raw data due to some inconsistencies in the `OT_ADAA_Omnibus.xlsx` file from BA that I found.</i>

<b>Pre-R steps</b>:

1. Removed all .iqdat files under 25KB or so in order to filter out pilot data and false starts.<br>
2. Removed non-study data (i.e., tests and OT undergrad files). <br>
3. Organized raw .iqdat files for each visit by run 1/run 2 (two separate folders) based on date/time stamp. <p>
<i>In Terminal, within each of the folders in turn...</i><p>
4. Changed file extensions to .tsv (tab-separated) from .iqdat: `find . -iname "*.iqdat" -exec bash -c 'mv "$0" "${0%\.iqdat}.tsv"' {} \;`<br>
5. Merged the files: `cat *.tsv > OT_gAAT_run-1.tsv` (or `OT_gAAT_run-2.tsv`)
<strike>5. Merged the files: `mlr --tsvlite cat *.tsv > OT_gAAT_run-1.tsv` (or `OT_gAAT_run-2.tsv`)
<ul>5a. `mlr --tsvlite` drops the header from all but the first file, so column names are not repeated throughout the dataset.</ul>
<ul>5b. This requires Miller, which can be installed via Homebrew on OSX (http://johnkerl.org/miller/doc/build.html)</strike> <i>tried to make this work but no luck</i></ul>

<i>Notes:<br>
<ul><li>D117: Had 3 visits due to scanner technical issues. Use "_b" and "_c" as visit 1 and visit 2 respectively.</li>
<li>D114: Has multiple files from visit 2. Use the last two.</li>
<li>D135: Use files from 6/06/16 and 6/17/18 (scanner issues on 06/13/16). Use "_b" and "_c" as visit 1 and visit 2 respectively.</li>
<li>D147: Visit 1 run 2 is partial data for some reason (I do have notes that she switched back to <i>PUSH-YELLOW</i> during run 2, but did not re-run because running way over time.)</li></ul></i>

## Creating the MLM dataset
### Import data
```{r}
library(tidyverse)
filter <- dplyr::filter
select <- dplyr::select

# read in the data (only importing a subset of columns)
raw_r1 <- data.frame(read_tsv("~/Desktop/OT-gAAT_MLM-data/Run1_tsv/OT_gAAT_run-1.tsv", col_types=cols_only(
  date = col_integer(),
  time = col_character(),
  subject = col_character(),
  blockcode = col_character(),
  blocknum = col_character(),
  trialcode = col_character(),
  values.trialcode = col_character(),
  values.stimulus = col_character(),
  values.initialresponse = col_character(),
  values.RT = col_number()
)))

raw_r2 <- data.frame(read_tsv("~/Desktop/OT-gAAT_MLM-data/Run2_tsv/OT_gAAT_run-2.tsv", col_types=cols_only(
  date = col_integer(),
  time = col_character(),
  subject = col_character(),
  blockcode = col_character(),
  blocknum = col_character(),
  trialcode = col_character(),
  values.trialcode = col_character(),
  values.stimulus = col_character(),
  values.initialresponse = col_character(),
  values.RT = col_number()
)))

# header line repeats when I used `cat` to merge, so get those out of there
raw_r1 <- raw_r1 %>% filter(!is.na(date))
raw_r2 <- raw_r2 %>% filter(!is.na(date))

unique(raw_r1$subject)
# D118 is showing up as "test" for some reason (checked against the .iqdat file), and D130_b is missing
# both of D130's and D142's visits were entered as "D130"/"D142", so change the subject ID for the visit on 052416 to D130_b and on 
raw_r1$subject[raw_r1$date == 020916] <- "D118"
raw_r1$subject[raw_r1$date == 052416] <- "D130_b"
raw_r1$subject[raw_r1$date == 111516] <- "D142_b"

# fixing some other stuff in a similar vein
raw_r1$subject[raw_r1$subject == "D101_2"] <- "D101_b"
raw_r1$subject[raw_r1$subject == "D102_B"] <- "D102_b"
raw_r1$subject[raw_r1$subject == "D117_b"] <- "D117"
raw_r1$subject[raw_r1$subject == "D117_c"] <- "D117_b"
raw_r1$subject[raw_r1$subject == "D135_c"] <- "D135_b"

unique(raw_r1$subject) # 78 unique - that is correct

unique(raw_r2$subject)
# some similar issues to fix
raw_r2$subject[raw_r2$subject == "D101_2Y"] <- "D101_b"
raw_r2$subject[raw_r2$subject == "D102_B"] <- "D102_b"
raw_r2$subject[raw_r2$subject == "D107_B_2"] <- "D107_b"
raw_r2$subject[raw_r2$subject == "D117_b"] <- "D117"
raw_r2$subject[raw_r2$subject == "D117_c"] <- "D117_b"
raw_r2$subject[raw_r2$subject == "D126_B"] <- "D126_b"
raw_r2$subject[raw_r2$subject == "D135_c"] <- "D135_b"
raw_r2$subject[raw_r2$date == 111516] <- "D142_b"


unique(raw_r2$subject) # 78 unique - that is correct

str(raw_r1)
str(raw_r2)
```

```{r}
# take out ITIs
unique(raw_r1$trialcode)
mlm_r1 <- filter(raw_r1, grepl("^A",trialcode))
unique(mlm_r1$trialcode)

unique(raw_r2$trialcode)
mlm_r2 <- filter(raw_r2, grepl("^A",trialcode))
unique(mlm_r2$trialcode)

# add a "trialnum" variable, grouped by ID
mlm_r1 <- mlm_r1 %>% group_by(subject) %>% mutate(trialnum = row_number())
View(mlm_r2)
mlm_r2 <- mlm_r2 %>% group_by(subject) %>% mutate(trialnum = row_number())
View(mlm_r2)

mlm_r1 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, hooray!
mlm_r2 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, one person has 177
mlm_r2 %>% group_by(subject) %>% count() # that person is D122_b
# no clue why that happened, although I do vaguely remember we had one session where the task seemed to go on and on and we force quit it?
# remove trials 145-177 for D122_b

mlm_r2 <- filter(mlm_r2, trialnum <=144)
mlm_r2 %>% group_by(trialnum) %>% count(trialnum)
# now everyone has 144 trials

# add a column for "run"
mlm_r1$run <- "1"
mlm_r2$run <- "2"

# add a column for "visit"
mlm_r1$visit <- as.factor(ifelse(grepl("*_b", mlm_r1$subject), "2", "1"))
mlm_r2$visit <- as.factor(ifelse(grepl("*_b", mlm_r2$subject), "2", "1"))

mlm_r1 %>% group_by(subject) %>% count(visit) # all looks good
mlm_r2 %>% group_by(subject) %>% count(visit) # all looks good

# now remove the "_b" from visit 2 IDs
mlm_r1 <- mlm_r1 %>% ungroup() %>%
  mutate(subject = str_replace(subject, "_b", ""))
unique(mlm_r1$subject)

mlm_r2 <- mlm_r2 %>% ungroup() %>%
  mutate(subject = str_replace(subject, "_b", ""))
unique(mlm_r2$subject)

# merge data from the 2 runs 
mlm_bx <- bind_rows(mlm_r1, mlm_r2)

# split up by visit
mlm_v1 <- mlm_bx %>% filter(visit == "1")
mlm_v2 <- mlm_bx %>% filter(visit == "2")

mlm_v1 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, hooray!
mlm_v2 %>% group_by(trialnum) %>% count(trialnum) # everyone has 144 trials, hooray!
```

```{r}
# add columns for "cond" at visit 1 and visit 2 [condition: A or B]
# load in the randomization data
randomize <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/randomization_clean.rds")
head(randomize)

mlm_v1 <- mlm_v1 %>%
  rename("ID" = subject)

mlm_v2 <- mlm_v2 %>%
  rename("ID" = subject)

# get the IDs for everyone who got treatment A at visit 1 
IDs_v1_txA <- randomize %>% filter(tx_v1 == "A")
View(IDs_v1_txA)

# ifelse statement: if ID = any of those listed in txA_list, make cond_v1 = "A", else make it "B"
mlm_v1 <- mlm_v1 %>%
  mutate(cond_v1 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "A", "B")), cond_v2 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "B", "A")))
head(mlm_v1)

# ifelse statement: if ID = any of those listed in txA_list, make cond_v2 = "B", else make it "A" (people who were A at visit 1 should be B for visit 2, and vice versa)
mlm_v2 <- mlm_v2 %>%
  mutate(cond_v2 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "B", "A")), cond_v1 = as.factor(ifelse(ID == "D101" | ID == "D105" | ID == "D109" | ID == "D110" | ID == "D114" | ID == "D115" | ID == "D116" | ID == "D118" | ID == "D119" | ID == "D120" | ID == "D123" | ID == "D124" | ID == "D126" | ID == "D128" | ID == "D132" | ID == "D135" | ID == "D136" | ID == "D137" | ID == "D138" | ID == "D140" | ID == "D141" | ID == "D145" | ID == "D148" | ID == "D149" | ID == "D117", "A", "B")))

levels(mlm_v1$cond_v1)
levels(mlm_v2$cond_v2)

head(mlm_v1)
  
IDs_v1_txA$ID
mlm_v1 %>% group_by(cond_v1) %>% count(ID) # IDs and cond match txA_list
mlm_v2 %>% group_by(cond_v2) %>% count(ID) # IDs and cond match txA_list

mlm_bx <- bind_rows(mlm_v1,mlm_v2)
View(mlm_bx)

# save it
saveRDS(mlm_bx, "~/Dropbox/2018Fall/FSHD617C/mlm_bx.rds")
```

```{r}
# rename some variables
colnames(mlm_bx)
mlm_bx <- mlm_bx %>% rename("push_pull" = values.initialresponse,
                  "gAAT_RT" = values.RT)
head(mlm_bx)

# add a new variable for condition
mlm_bx1 <- mlm_bx %>% mutate(cond = as.factor(ifelse(cond_v1 == "A" & visit == "1", "A",
                              ifelse(cond_v2 == "A" & visit == "2", "A",
                                    ifelse(cond_v1 == "B" & visit == "1", "B",
                                           ifelse(cond_v2 == "B" & visit == "2", "B", NA))))))

View(mlm_bx1) # check that it coded the "cond" column correctly      

# add a new variable for stimulus category
# in values.stimulus:
# 1 = spouse, 2 = living loved one/WHOTO, 3 = stranger, 4 = nomothetic death-related, 5 = neutral images

sort(unique(mlm_bx1$values.stimulus))

mlm_bx2 <- mlm_bx1 %>%
  mutate(stim = as.factor(ifelse(values.stimulus == "1B_1.jpg"|values.stimulus == "1B_2.jpg"|values.stimulus == "1B_3.jpg"|values.stimulus == "1Y_1.jpg"|values.stimulus == "1Y_2.jpg"|values.stimulus == "1Y_3.jpg", "spouse", 
                       ifelse(values.stimulus == "2B_1.jpg"|values.stimulus == "2B_2.jpg"|values.stimulus == "2B_3.jpg"|values.stimulus == "2Y_1.jpg"|values.stimulus == "2Y_2.jpg"|values.stimulus == "2Y_3.jpg", "living",
                              ifelse(values.stimulus == "3B_1.jpg"|values.stimulus == "3B_2.jpg"|values.stimulus == "3B_3.jpg"|values.stimulus == "3Y_1.jpg"|values.stimulus == "3Y_2.jpg"|values.stimulus == "3Y_3.jpg", "stranger",
                                     ifelse(values.stimulus == "4B_1.jpg"|values.stimulus == "4B_2.jpg"|values.stimulus == "4B_3.jpg"|values.stimulus == "4Y_1.jpg"|values.stimulus == "4Y_2.jpg"|values.stimulus == "4Y_3.jpg", "death",
                                            ifelse(values.stimulus == "5B_1.jpg"|values.stimulus == "5B_2.jpg"|values.stimulus == "5B_3.jpg"|values.stimulus == "5Y_1.jpg"|values.stimulus == "5Y_2.jpg"|values.stimulus == "5Y_3.jpg", "neutral", NA)))))))

levels(mlm_bx2$stim)
mlm_bx2$stim <- relevel(mlm_bx2$stim, ref = "neutral") # make neutral the reference level
                       
# save it
saveRDS(mlm_bx2, "~/Dropbox/2018Fall/FSHD617C/mlm_bx2.rds")
```

### Bias scores
Separate out push minus pull, then subtract run1 from run2.
```{r}

mlm_r1.1 <- mlm_bx2 %>% filter(run == "1")
tail(mlm_r1.1) # does it line up?
mlm_r2.1 <- mlm_bx2 %>% filter(run == "2")
tail(mlm_r2.1) 

p99r1 <- quantile(mlm_r1.1$gAAT_RT, 0.99) 
p01r1 <- quantile(mlm_r1.1$gAAT_RT, 0.01) 
p99r1 # 99th percentile RTs >= 1713.48ms 
p01r1 # 1st percentile RTs <= 472ms

p99r2 <- quantile(mlm_r2.1$gAAT_RT, 0.99) 
p01r2 <- quantile(mlm_r2.1$gAAT_RT, 0.01) # 1st percentile RTs <= 463ms
p99r2 # 99th percentile RTs >= 1712.45ms 
p01r2 # 1st percentile RTs <= 463ms

mlm_r1.2 <- mlm_r1.1 %>%
  mutate(outlier_RT = ifelse(gAAT_RT <= p01r1 | gAAT_RT >= p99r1, 1, 0))
table(mlm_r1.2$outlier_RT) 
227/11005 # 227 trials will be dropped, 11005 will be retained...about 2% of trials.

mlm_r2.2 <- mlm_r2.1 %>%
  mutate(outlier_RT = ifelse(gAAT_RT <= p01r2 | gAAT_RT >= p99r2, 1, 0))
table(mlm_r2.2$outlier_RT) 
230/11002 # 230 trials will be dropped, 11002 will be retained...about 2% of trials.

mlm_r1r2 <- bind_cols(mlm_r1.2, mlm_r2.2) # 11232 trials total.
head(mlm_r1r2) # make sure that trials in the two runs line up post-merge
tail(mlm_r1r2) # make sure that trials in the two runs line up post-merge

# get rid of any trials with outliers in either run1 or run2 
# (need run1/run2 to be the same length so that the trials match up)
mlm_r1r2.1 <- mlm_r1r2 %>% filter(outlier_RT == "0" & outlier_RT1 == "0") 
count(mlm_r1r2.1) # 10790 obs.
10790/11232 # 96% of trials retained = 4% of trials dropped. That feels acceptable.

push_pull <- mlm_r1r2.1 %>% filter(push_pull == "PUSH" & push_pull1 == "PULL") %>% mutate(bias = (gAAT_RT-gAAT_RT1))
pull_push <- mlm_r1r2.1 %>% filter(push_pull == "PULL" & push_pull1 == "PUSH") %>% mutate(bias = (gAAT_RT1-gAAT_RT))

mlm_pp <- bind_rows(push_pull,pull_push)

View(mlm_pp)
```

### Transformations
```{r}
library(psych)
describe(mlm_pp$bias) # skewness is 0.1, kurtosis is 1.69
hist(mlm_pp$bias)
qqnorm(mlm_pp$bias)
ggplot(mlm_pp, aes(sample=bias))+stat_qq()+stat_qq_line() # not a good look at all...

# install.packages("bestNormalize")
# what's the best transformation for my data?
library(bestNormalize)

orderNorm_bias <- bestNormalize(mlm_pp$bias) # suggested orderNorm transform as best
mlm_pp$bias_t <- orderNorm_bias$x.t # pull x.t from orderNorm_bias and assign it back into dataset

describe(mlm_pp$bias_t)
hist(mlm_pp$bias_t)
qqnorm(mlm_pp$bias_t)
ggplot(mlm_pp, aes(sample=bias_t))+stat_qq()+stat_qq_line() # so much better!

# back-transformation, just to show how to do it:
p <- predict(orderNorm_bias)
    x2 <- predict(orderNorm_bias, newdata = p, inverse = TRUE) # x2 is the back-transformed data
    all.equal(x2, mlm_pp$bias_t) 

mlm_bx3 <- mlm_pp
saveRDS(mlm_bx3, "~/Dropbox/2018Fall/FSHD617C/mlm_bx3.rds")
```

### Checking the data
```{r}
table <- mlm_bx3 %>% group_by(ID) %>% group_by(cond) %>% count(ID) 
table
# missing data...
# cond A: D101 (3 trials), D141 (18 trials), D148 (19 trials)
# cond B:  D147 (1 trial)

describe(table$n)
ggplot(table, aes(sample=n))+stat_qq()+stat_qq_line() # can see those 4 on the QQ plot

# where are the trials getting dropped?
mlm_r1r2 %>% group_by(ID) %>% group_by(cond) %>% count(ID) # everyone has 144 here
mlm_r1r2.1 %>% group_by(ID) %>% group_by(cond) %>% count(ID) 
# after getting rid of outliers, most people missing a few trials (D107 cond A/B and D113 cond A missing >20)

mlm_r1r2.1 %>% filter(ID=="D101" & cond=="A")
mlm_r1r2.1 %>% filter(ID=="D141" & cond=="A")
mlm_r1r2.1 %>% filter(ID=="D148" & cond=="A")
mlm_r1r2.1 %>% filter(ID=="D147" & cond=="B")
# so trials are getting dropped because they didn't effectively reverse the instructions or they reverted to the old instructions
# in essence, this removes error trials

# create a "drop" variable for these 4, just in case...
mlm_bx3 <- mlm_bx3 %>% mutate(low_n_trials = ifelse(ID == "D101" | ID == "D141" | ID == "D147" | ID == "D148", 1, 0))

# check whether the rows line up
notlinedup <- mlm_bx3 %>% filter(stim != stim1)
notlinedup
notlinedup <- mlm_bx3 %>% filter(trialcode != trialcode1)
notlinedup
notlinedup <- mlm_bx3 %>% filter(trialnum != trialnum1)
notlinedup
# 0 rows returned; all the rows match

head(mlm_bx3)
saveRDS(mlm_bx3, "~/Dropbox/2018Fall/FSHD617C/mlm_bx3.rds")
```

### Add time-invariant variables
Next step is to add demographics and other variables that we might potentially want to include in the model but that do not change over time. Looking at the codebook for the master dataset, these include:

Demographics:

* `age`
* `age_yrs`
* `sex_m`
* `ethnicity_hisp`
* `race`
* `timesincedeath`

Genetic variables:

* `rs2254298`
* `rs2268498`
* `rs53576`
* `rs2254298_A.carrier`
* `rs2268498_C.carrier`
* `rs53576_A.carrier`

Self-report measures:

* `tot_icg` [grief]
* `tot_bdi` [depression]
* `tot_bgq` [brief grief questionnaire with impairment item]
* `tot_ecrrs_global_anx` [anxious attachment - global]
* `tot_ecrrs_global_avoid` [avoidant attachment - global]
* `tot_ecrrs_spouse_anx` [anxious attachment - spouse]
* `tot_ecrrs_spouse_avoid` [avoidant attachment - spouse]
* `tot_ysl` [yearning for the deceased]


```{r}
# load in master dataset
master <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/selfreports_oxtr_cortisol_random_AB_bx_clean.rds") 

# subset the selected variables
mlm_add <- subset(master, select=c(ID,age, age_yrs, sex_m, ethnicity_hisp, race, timesincedeath, rs2254298, rs2268498, rs53576, rs2254298_A.carrier, rs2268498_C.carrier, rs53576_A.carrier, tot_bdi, tot_bgq, tot_ecrrs_global_anx,tot_ecrrs_global_avoid, tot_ecrrs_spouse_anx, tot_ecrrs_spouse_avoid, tot_ysl, tot_icg))
head(mlm_add)

# merge into the behavioral dataset
mlm_bx4 <- left_join(mlm_bx3, mlm_add, by="ID")
head(mlm_bx4)

# save it
saveRDS(mlm_bx4, "~/Dropbox/2018Fall/FSHD617C/mlm_bx4.rds")
```


## Exploratory analysis and checking assumptions

First step is to overview the entire dataset using `summarytools`. 
```{r}
library(summarytools)

# overview the entire dataset
# this will look like TOTAL crap in the console (notebook output)
# use view(dfSummary(mlm_bx4)) from the console to see a much more readable and attractive output
dfSummary(mlm_bx4)
 # no missing data on any of the variables!
```

### Descriptives
```{r}
library(psych)
library(stargazer)
library(summarytools)

# descriptives for the time-invariant variables are going to be wrong if we use mlm_bx4 (since observations are duplicated)
# so going back to the mlm_add object...
# note that n = 40, but n = 39 with RT data
# D149's behavioral data was dropped (ran outside of scanner), so drop his case

mlm_add <- filter(mlm_add, !grepl("D149",ID))
unique(mlm_add$ID) # it worked
nums <- select_if(mlm_add, is.numeric) # just get the numeric variables
view(dfSummary(nums)) 
stargazer(nums, type="text", title="Descriptives", digits=2, out="table1.txt", notes = "Note: 'stargazer' makes nicer looking tables, but doesn't provide skewness or kurtosis statistics, so we'll use 'psych' for that.") 

describe(nums)
```

### Frequencies
```{r}
# which variables are factors?
mlm_add %>% select_if(is.factor) %>% colnames()

# hmm, rs2268498 and rs53576 should also be factors, so fix that now
mlm_add <- mlm_add %>% mutate(rs2268498 = recode_factor(rs2268498, "C/T" = "C/T", "CC" = "CC", "TT" = "TT"), rs53576 = recode_factor(rs53576, "A/G" = "A/G", "AA" = "AA", "GG" = "GG"))
mlm_add %>% select_if(is.factor) %>% colnames()

# function to count frequencies for variables that are factors...this isn't working
# countAll <- function(basedata)
# {
#	facs <- sapply(basedata, is.factor)
#	facdata <- basedata[ ,facs]
#	for(i in 1:length(facdata))
#	{
#		count(facdata)
#	 }
# }
# countAll(mlm_bx4)

mlm_add %>% count(sex_m) # 28 females, 11 males
mlm_add %>% count(race) # 38 white, 1 another race not listed
mlm_add %>% count(ethnicity_hisp) # 37 non-Hispanic/Latino, 2 Hispanic/Latino
mlm_add %>% count(rs2254298) # 22 GG, 10 A/G, 1 NA/missing (0 AA)
mlm_add %>% count(rs2268498) # 17 C/T, 10 CC, 12 TT
mlm_add %>% count(rs53576) # 21 A/G, 18 GG (0 AA)
mlm_add %>% count(rs2254298_A.carrier) # yes = 10, no = 28, NA = 1
mlm_add %>% count(rs2268498_C.carrier) # yes = 27, no = 12
mlm_add %>% count(rs53576_A.carrier) # yes = 21, no = 18
```

### Checking distribution shapes and normality
Got a preview of this above when we used `summarytools`, but here's a closer look at some histograms and QQ plots.
```{r}
# a function to produce histograms for all numeric variables in a dataframe
histAll <- function(basedata)
{
	nums <- sapply((basedata), is.numeric)
	numdata <- basedata[ ,nums]
	par(mfrow=c(4,4))
	for(i in 1:length(numdata))
	{
		hist(numdata[,i], xlab=NULL, main=names(numdata[i]))
	}
}

histAll(as.data.frame(mlm_bx4))

```

### Centering L2 predictors

```{r}
# pull out only the total score variables, age, timesincedeath
gcvars <- as.data.frame(mlm_bx4[, grep('^tot|age|timesincedeath|ID', names(mlm_bx4))])
gcvars1 <- gcvars[!duplicated(gcvars$ID), ]
stargazer(gcvars1, type="text", median = T, digits=3)

gmean_age <- mean(gcvars1$age)
gmean_age_yrs <- mean(gcvars1$age_yrs)
gmean_time <- mean(gcvars1$timesincedeath)
gmean_bdi <- mean(gcvars1$tot_bdi)
gmean_bgq <- mean(gcvars1$tot_bgq)
gmean_ecrrs_g_anx <- mean(gcvars1$tot_ecrrs_global_anx)
gmean_ecrrs_g_av <- mean(gcvars1$tot_ecrrs_global_avoid)
gmean_ecrrs_s_anx <- mean(gcvars1$tot_ecrrs_spouse_anx)
gmean_ecrrs_s_av <- mean(gcvars1$tot_ecrrs_spouse_avoid)
gmean_icg <- mean(gcvars1$tot_icg)
gmean_ysl <- mean(gcvars1$tot_ysl)

mlm_bx5 <- mlm_bx4 %>% mutate(gmean_age = age-gmean_age, gmean_age_yrs = age_yrs-gmean_age_yrs, gmean_timesincedeath = timesincedeath-gmean_time, gmean_bdi = tot_bdi-gmean_bdi, gmean_bgq = tot_bgq-gmean_bgq, gmean_ecrrs_global_anx = tot_ecrrs_global_anx-gmean_ecrrs_g_anx, gmean_ecrrs_global_avoid = tot_ecrrs_global_avoid-gmean_ecrrs_g_av, gmean_ecrrs_spouse_anx = tot_ecrrs_spouse_anx-gmean_ecrrs_s_anx, gmean_ecrrs_spouse_avoid = tot_ecrrs_spouse_avoid-gmean_ecrrs_s_av, gmean_ysl = tot_ysl-gmean_ysl, gmean_icg = tot_icg-gmean_icg)
stargazer(as.data.frame(mlm_bx5), type="text") # note that means of the group-mean centered variables are not zero - I think this is because some people had more trials than others

# test that hypothesis with the non-duplicated data
gcvars2 <- gcvars1 %>% mutate(gmean_age = age-gmean_age, gmean_age_yrs = age_yrs-gmean_age_yrs, gmean_timesincedeath = timesincedeath-gmean_time, gmean_bdi = tot_bdi-gmean_bdi, gmean_bgq = tot_bgq-gmean_bgq, gmean_ecrrs_global_anx = tot_ecrrs_global_anx-gmean_ecrrs_g_anx, gmean_ecrrs_global_avoid = tot_ecrrs_global_avoid-gmean_ecrrs_g_av, gmean_ecrrs_spouse_anx = tot_ecrrs_spouse_anx-gmean_ecrrs_s_anx, gmean_ecrrs_spouse_avoid = tot_ecrrs_spouse_avoid-gmean_ecrrs_s_av, gmean_ysl = tot_ysl-gmean_ysl, gmean_icg = tot_icg-gmean_icg)
stargazer(as.data.frame(gcvars2), type="text") # yes, the latter are zero

saveRDS(mlm_bx5, "~/Dropbox/2018Fall/FSHD617C/mlm_bx5.rds")
```

### Evaluate missing data 
```{r}
mlm_bx5 %>% 
  select_if(function(x) any(is.na(x))) %>% 
  summarise_all(funs(mean(is.na(.)))) -> mlm_bx5_NAs # or use sum() instead of mean() to find n missing values

mlm_bx5_NAs <- gather(mlm_bx5_NAs) # create long data that lists variables with any NAs, and % of data missing
mlm_bx5_NAs
```

One person is missing rs2254298 genotype (`rs2254298` and `rs2254298_A.carrier`), which we already knew about.

### Scale reliabilities
This dataset does not include the item-level data, but we can get that from the master dataset we loaded in earlier.
```{r}
library(psych)
mlm_scales <- filter(master, !grepl("D149",ID))

## BDI
mlm_tot_bdi <- subset(mlm_scales, select=c(bdi_1c:bdi_21c))  
psych::alpha(mlm_tot_bdi)

## BGQ
mlm_tot_bgq <- subset(mlm_scales, select=c(bgq_1:bgq_5))
psych::alpha(mlm_tot_bgq)

## ICG
mlm_tot_icg <- subset(mlm_scales, select=c(icg_1:icg_19))
psych::alpha(mlm_tot_icg)

## YSL
mlm_tot_ysl <- subset(mlm_scales, select=c(ysl_1:ysl_21))
psych::alpha(mlm_tot_ysl)

## ECRRS-global anxiety
mlm_tot_ecrrs_global_anx <- subset(mlm_scales,  select=c(ecrrs_global_7c:ecrrs_global_9c))
psych::alpha(mlm_tot_ecrrs_global_anx)

## ECRRS-global avoid
mlm_tot_ecrrs_global_avoid <- subset(mlm_scales,  select=c(ecrrs_global_1r:ecrrs_global_4r, ecrrs_global_5c, ecrrs_global_6c))
psych::alpha(mlm_tot_ecrrs_global_avoid)

## ECRRS-spouse anxiety
mlm_tot_ecrrs_spouse_anx <- subset(mlm_scales,  select=c(ecrrs_spouse_7c:ecrrs_spouse_9c))
psych::alpha(mlm_tot_ecrrs_spouse_anx)

## ECRRS-spouse avoid
mlm_tot_ecrrs_spouse_avoid <- subset(mlm_scales,  select=c(ecrrs_spouse_1r:ecrrs_spouse_4r, ecrrs_spouse_5c, ecrrs_spouse_6c))
psych::alpha(mlm_tot_ecrrs_spouse_avoid)

```

### Drop the 4 people with low trial counts
```{r}
data <- filter(mlm_bx5, !grepl("D101|D141|D147|D148",ID))
unique(data$ID)
```

## OLS estimates for exploratory purposes

```{r}
# getting regular regression estimates one person at a time for exploratory purposes:
lmEst <- by(data, data$ID, function(x) summary(lm(bias_t ~ trialnum, data=x)))
lmEst
```

### Stem plot for the intercepts based on the OLS estimates:
```{r}
int <- by(data, data$ID, function(x) coefficients(lm(bias_t ~ trialnum, data=x))[[1]])
int

summary(int)
stem(int, scale=2)
hist(int)
```
### Stem plot for the slopes based on the OLS estimates
```{r}
slope <- by(data, data$ID, function(x) coefficients(lm(bias_t ~ trialnum, data=x))[[2]])
slope

summary(slope)
stem(slope, scale=2)
hist(slope)
```
Note that these are all extremely close to zero - probably because of the lack of time effect.

### R-square for OLS models
```{r}

rsq <- by(data, data$ID, function(x) summary(lm(bias_t ~ trialnum, data=x))$r.squared)
summary(rsq)
stem(rsq, scale=2)

cor(int, slope)
```
The correlation between intercepts and slopes is -0.475, so people who show less avoidance bias overall change less over the course of the task.
 
## Graphing multilevel data for exploratory purposes
```{r}
library(lattice)
xyplot(bias_t ~ trialnum | ID, data = data, as.table=T) # too many people/hard to see, try random subset

ids <- sample(unique(data$ID), 12)
ids
temp <- data[data$ID %in% ids, ]
xyplot(bias_t ~ trialnum | ID, data = temp, as.table=T) 

# add fit line(s)
xyplot(bias_t ~ trialnum | ID, data = temp, type=c("p","r")) # linear fit
xyplot(bias_t ~ trialnum | ID, data = temp, type=c("p","smooth")) # curvilinear fit

# graph of the raw data for each person
interaction.plot(data$trialnum, data$ID, data$bias_t)

# generate linear fitted values for each person and then plot all together in one plot
fit <- by(data, data$ID, function(x) fitted.values(lm(bias_t ~ trialnum, data=x)))
fit2 <- unlist(fit)
interaction.plot(data$trialnum, data$ID, fit2, xlab="trial number", ylab="RT bias (transformed)")
```
There does not seem to be either a linear or curvilinear effect of trial number (i.e., time) on RT bias in the task.


## Analyses (NHST)
<center><i>Remember, the effect of time refers to `trialnumber` (this is NOT the `time` variable; `time` refers to time of day when task was run.)</i></center>

### Unconditional means model
```{r}
library(nlme)

# unconditional means model
m1 <- lme(bias_t ~ 1, random = ~ 1 | ID, data=data, method="ML")
summary(m1)

# ICC
iVar <- as.numeric(VarCorr(m1)[1,1])
residVar <- as.numeric(VarCorr(m1)[2,1])
icc1 <- iVar/(iVar+residVar) 
icc1
```
The overall mean bias score (transformed) is about -0.002, with a standard deviation of 0.023.

ICC is about 1.6% of the variance. This means that 1.6% of the variance in bias is explained by people being different from one another.

### Add fixed and random intercept and slope of time
```{r}
# model with fixed and random intercept and slope of time (intercepts implied)
m2 <- lme(bias_t ~ trialnum, data = data, random = ~ trialnum | ID, method="ML")
summary(m2) 
anova(m2)
```
The average bias at trial number = 0 is -0.016 and gets increasingly approach-biased by a nonsignificant fraction of a percent with each unit of time. There is no significant effect of trial number (time) on RT bias score, _t_ = 0.74, _p_ = 0.457. 

```{r}
# ICC
iVar <- as.numeric(VarCorr(m2)[1,1])
residVar <- as.numeric(VarCorr(m2)[2,1])
icc2 <- iVar/(iVar+residVar)
icc2
```
99.99% of variance in bias is due to the between-person variance, after controlling for time. This is consistent with the lack of a significant effect of time. 
However... I have no clue how this makes sense with the ICC of 0.016 from the unconditional means model above, which suggests that <2% of the variance is due to between-person differences. ZERO clue. Atina tried to help me think through this for 20 minutes and I still don't get it.

### Adding stimulus as a fixed effect
There were 5 different types of stimuli, presented in a randomized order (consistent across participants/sessions) determined by genetic algorithm. Everyone saw the same stimuli in the same order, so it is a fixed effect. The different categories of images were: Death (generic), Deceased (personal photo), Living loved one (personal photo), Stranger (sex-matched to spouse), and Neutral (objects).
```{r}
# add the fixed effect of stim and compare its fit to m2 by inspecting the t and f tests
m3 <- lme(bias_t ~ trialnum + stim, data = data, random = ~ trialnum | ID, method="ML")
summary(m3)
```
NO effect of time. There is a significant fixed effect of spouse, _t_ = 3.55, _p_ = .0004. There is also a non-significant effect of stranger, _t_ = -1.73, _p_ = .084. This indicates that people tended to show more approach bias to photos of their spouse (across conditions). The parameter estimate is 0.12 with an _SE_ of 0.03. They also tended to show greater more avoidance bias to photos of the sex-matched stranger.

```{r}
intervals(m3, which="fixed")
```
95% CIs don't cross zero for spouse, but do for all other stimulus types.

```{r}
iVar <- as.numeric(VarCorr(m3)[1,1])
residVar <- as.numeric(VarCorr(m3)[2,1])
icc3 <- iVar/(iVar+residVar)
icc3 # Same ICC as m2
```
The ICC is identical to that in model 2. 

```{r}
anova(m2, m3) # compare this model with the first one
```
Comparing the model with the fixed effect of stimulus (m3) to the model without (m2), m3 is better (_p_ < .0001) as evidenced by lower AIC, BIC, and lL values.

### Take out fixed and random effect of time
Because there's no significant effect of time, makes sense to drop it from the model. Now we're basically doing regression but accounting for whatever interdependence exists. <font color="red">Note that these results will _NOT_ match the results of the regressions, because the regressions use medians of the bias scores, rather than means.</font>

```{r}
# without fixed or random effect of time
m4 <- lme(bias_t ~ stim, data = data, random = ~ 1 | ID, method="ML")
summary(m4)
```
Fixed effects are still more or less the same - significant for spouse, marginal for stranger - although the parameter estimates and test statistics are slightly different.

```{r}
intervals(m4, which="fixed")
```
As above, spouse is the only stimulus category whose 95 CIs don't cross zero, supporting a significant effect of spouse on bias.

Is m4 any better than m3?
```{r}
anova(m3,m4)
```
Nope. Models do not significantly differ (_p_ = 0.851), though the AIC and BIC values are a little smaller for m4 than m3. So let's forget time and go with m4...

### Adding condition as a fixed effect
Participants completed the task at two different sessions, where they were randomized to receive treatment A at one and treatment B at the other (order counterbalanced across participants). The two treatments are oxytocin vs. placebo.
```{r}
# add the interaction of stim and check its fit by inspecting t and f tests. 
m5 <- lme(bias_t ~ stim + cond, data = data, random = ~ 1 | ID, method="ML")
summary(m5)
```
The fixed effect of condition B is not significant, _t_ = 0.15, _p_ = 0.884.

```{r}
anova(m4, m5)
```
The two models are not significantly different, but the one with the fixed effect of condition (m5) has slightly higher AIC/BIC. 

### Adding the stimulus*condition interaction

We might expect that peoples' tendency to approach or avoid might differ depending on which treatment they got, but only for certain stimuli.
```{r}
# add the interaction of stim and check its fit by inspecting t and f tests. 
m6 <- lme(bias_t ~ stim*cond, data = data, random = ~ 1 | ID, method="ML")
summary(m6)
```
Interesting. With the interaction in the model, there is a non-significant interaction between death and condition B.

```{r}
anova(m6,m4,m5)
intervals(m6)
```
The model with the interaction in it is not significantly better than either the one with just the fixed effect of condition or with no effect of condition at all, so makes sense to go with the simplest model which only includes stimulus as a fixed effect.

### Adding other plausible predictors
There are a few other variables that we might expect to impact approach/avoidance bias, namely grief severity, OXTR genotype, time since death, and attachment style (avoidant, anxious). 
```{r}
# used group mean centered scores for the additional variables for interpretability
m7 <- lme(bias_t ~ stim + rs53576_A.carrier + rs2268498_C.carrier + gmean_ecrrs_global_anx + gmean_ecrrs_global_avoid + gmean_age + gmean_icg, data = data, random = ~ 1 | ID, method="ML")
summary(m7)
anova(m4,m7)
```
None of the added fixed effects are significant and the AIC/BIC values are higher, so take 'em out. 

### Final model

```{r}
# variance explained by the entire model 
fitted <- fitted(m4)
data$fitted <- as.vector(fitted)
rsq <- lm(bias ~ fitted, data=data) 
summary(rsq)
```
A case of significant, but not meaningful: The R<sup>2</sup> for the entire model is .02, indicating that 2% of the variance in bias is explained by the model. So this model is not very good at predicting approach/avoidance bias from the fixed effect of stimulus within person. Oh well.

### Check final model assumptions

Checking normality of the residuals:
```{r}
# checking normality of residuals

# level 1:
qqnorm(m4, ~resid(.), id=0.00075)
plot(m4, resid(., type="p") ~ trialnum, id=0.0025)
```
The QQ plot looks pretty okay, albeit a few outliers. D113 is the person who most falls off the line, and does so at both ends. The residuals are also pretty evenly distributed across time, though many fall outside of the +/- 2 range.

```{r}
# the following bit wants IDs to be numeric, so take out the "D" from each ID
data <- data %>% mutate(numID = as.numeric(str_replace(ID, "D", "")))

# level 2:
qqnorm(m4, ~ranef(.), id=0.25)
ranInt <- ranef(m4)[1] # create object with the random intercept estimates
ranInt <- unlist(ranInt) 
names(ranInt) <- NULL 
ID <- unlist(subset(data, !duplicated(data$numID), select="numID")) # create an ID variable
plot(ranInt ~ ID)

ranSlope <- ranef(m4)[2] # create an object with the random slopes
ranSlope<- unlist(ranSlope) 
names(ranSlope) <- NULL 
plot(ranSlope ~ ID)

```
Getting an error here <font color="red">`Error in [.data.frame(ranef(m4), 72) : undefined columns selected`</font>, which I got stuck for way too long on and am leaving alone for now.

Checking homoscedasticity:
```{r}
# checking homoscedasticity

# level 1:
data$resid <- as.numeric(residuals(m4)) # create an object with the Level-1 residuals
plot(data$resid ~ data$stim)

# level 2:
library(lattice)
stim <- unlist(subset(data, !duplicated(data$numID), select="stim"))
xyplot(ranInt ~ stim)
xyplot(ranSlope ~ stim)
```
Note that again there are a lot of residuals falling outside of the +/-2 range.

Also, there's something weird about the L2 plot. Not sure why it shows only 3 of 5 stimulus types (edit: now not showing anything).

The error message is because the ranSlope object does not exist (see error message in output from previous chunk.)

# OT RT plots
### Long dataset for regressions
#### More data cleaning
Continuing from the data cleaning above...
```{r}
library(psych)
library(tidyverse)
filter <- dplyr::filter
select <- dplyr::select

mlm_r1r2_long <- bind_rows(mlm_r1.2, mlm_r2.2)
head(mlm_r1r2_long) ## 22464 observations

# get rid of any trials with outliers
mlm_long <- mlm_r1r2_long %>% filter(outlier_RT == "0") 
count(mlm_r1r2.1) # 22007 obs.
22007/22464 # 98% of trials retained = 2% of trials dropped.

mlm_long1 <- left_join(mlm_long, mlm_add, by="ID") # add the self-report variables
head(mlm_long1)

gmeans <- select(mlm_bx5.1, starts_with("gmean")) # add grand mean centered variables from mlm_bx5.1 object
ID <- select(mlm_bx5.1, starts_with("ID")) 
ID$ID1 <- NULL
gmeans_ID <- bind_cols(ID, gmeans) %>% distinct(gmeans_ID, ID, .keep_all = TRUE)

mlm_long2 <- left_join(mlm_long1, gmeans_ID, by="ID")


library(bestNormalize)
describe(mlm_long1$gAAT_RT) # RT data has skewness of 1.17 and kurtosis of 1.7. Min is 464 and max is 1712, as we filtered out trials above and below the 99th and 1st percentiles already
bestNormalize(mlm_long2$gAAT_RT)
gAAT_RT_t <- bestNormalize(mlm_long2$gAAT_RT)  # transform the RT data
mlm_long2$gAAT_RT_t <- gAAT_RT_t$x.t # make a new variable for the transformed values and attach it to the long dataset
hist(mlm_long2$gAAT_RT) # compare histograms for the transformed and non-transformed values
hist(mlm_long2$gAAT_RT_t) 
qqnorm(mlm_long2$gAAT_RT) #compare QQ plots for the transformed and non-transformed values
qqnorm(mlm_long2$gAAT_RT_t)

mlm_long2 %>% group_by(ID) %>% count(cond) # look at how many trials each person has for each condition (A/B)

# splitting up by instruction
mlm_long2_push <- filter(mlm_long2, grepl("PUSH",push_pull))
head(mlm_long2_push)

mlm_long2_pull <- filter(mlm_long2, grepl("PULL",push_pull))
head(mlm_long2_push)

```
Transforming really improved the RT data. Everyone looks like they have a good number of trials (out of 288; 288 = 144 trials*2 runs).

### Bias scores (based on medians)

```{r}
library(tidyverse)
# get the OXTR and self-report scores
# load in master dataset
master <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/selfreports_oxtr_cortisol_random_AB_bx_clean.rds") 

# subset the selected variables
subjvars <- subset(master, select=c(ID,age, age_yrs, sex_m, ethnicity_hisp, race, timesincedeath, rs2254298, rs2268498, rs53576, rs2254298_A.carrier, rs2268498_C.carrier, rs53576_A.carrier, tot_bdi, tot_bgq, tot_ecrrs_global_anx,tot_ecrrs_global_avoid, tot_ecrrs_spouse_anx, tot_ecrrs_spouse_avoid, tot_ysl, tot_icg))
head(subjvars)

# add a group variable for CG/NCG
subjvars <- subjvars %>%
  mutate(grp_cg = as.factor(ifelse(tot_icg >= 25, "CG", "NCG"))) 

# all of the OXTR vars should be factors but aren't
subjvars <- subjvars %>% mutate(rs2268498 = as.factor(rs2268498),
                        rs53576 = as.factor(rs53576))

# creating a wide dataset with the bias variables, which will then be converted into a long dataset
# The arguments to spread():
# - data: Data object
# - key: Name of column containing the new column names
# - value: Name of column containing values
data_wide <- spread(mlm_r1r2_long, key=stim, value=gAAT_RT)
colnames(data_wide)
View(data_wide)

# subset and remove outliers
data_pushA <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "A" & push_pull == "PUSH") %>% group_by(ID) %>% mutate(neutralApush = median(neutral, na.rm = TRUE), deathApush = median(death, na.rm = TRUE), spouseApush = median(spouse, na.rm=TRUE), livingApush = median(living, na.rm=TRUE), strangerApush = median(stranger, na.rm=TRUE))

data_pullA <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "A" & push_pull == "PULL") %>% group_by(ID) %>% mutate(neutralApull = median(neutral, na.rm = TRUE), deathApull = median(death, na.rm = TRUE), spouseApull = median(spouse, na.rm=TRUE), livingApull = median(living, na.rm=TRUE), strangerApull = median(stranger, na.rm=TRUE))

data_pushB <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "B" & push_pull == "PUSH") %>% group_by(ID) %>% mutate(neutralBpush = median(neutral, na.rm = TRUE), deathBpush = median(death, na.rm = TRUE), spouseBpush = median(spouse, na.rm=TRUE), livingBpush = median(living, na.rm=TRUE), strangerBpush = median(stranger, na.rm=TRUE))

data_pullB <- data_wide %>% select(ID, cond, push_pull, visit, outlier_RT, neutral, death, living, spouse, stranger) %>% filter(outlier_RT == "0" & cond == "B" & push_pull == "PULL") %>% group_by(ID) %>% mutate(neutralBpull = median(neutral, na.rm = TRUE), deathBpull = median(death, na.rm = TRUE), spouseBpull = median(spouse, na.rm=TRUE), livingBpull = median(living, na.rm=TRUE), strangerBpull = median(stranger, na.rm=TRUE))

# summarize by ID
data_pushA1 <- data_pushA[!duplicated(data_pushA$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
data_pushB1 <- data_pushB[!duplicated(data_pushB$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
data_pullA1 <- data_pullA[!duplicated(data_pullA$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
data_pullB1 <- data_pullB[!duplicated(data_pullB$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, outlier_RT, cond, push_pull))
head(data_pullB1)

# merge the 4 subsets together
join1 <- left_join(data_pushA1, data_pushB1, by="ID")
join2 <- left_join(join1, data_pullA1, by = "ID")
medians <- left_join(join2, data_pullB1, by = "ID")
head(medians)

medians1 <- medians %>% mutate(bias_neu_A = neutralApush - neutralApull,
                                     bias_dea_A = deathApush - deathApull,
                                     bias_spo_A = spouseApush - spouseApull,
                                     bias_str_A = strangerApush - strangerApull,
                                     bias_liv_A = livingApush - livingApull,
                                     bias_neu_B = neutralBpush - neutralBpull,
                                     bias_dea_B = deathBpush - deathApull,
                                     bias_spo_B = spouseBpush - spouseBpull,
                                     bias_str_B = strangerBpush - strangerBpull,
                                     bias_liv_B = livingBpush - livingBpull
                                     )

# now add the grp_cg/oxtr/self-report variables
medians2 <- left_join(medians1, subjvars, by="ID")
head(medians2)

# drop the condition/stim specific variables
medians2 <- medians2 %>% select(-c(neutralApush, deathApush,  spouseApush,  livingApush, strangerApush, neutralBpush, deathBpush,   spouseBpush,  livingBpush,  strangerBpush, neutralApull, deathApull, spouseApull, livingApull, strangerApull, neutralBpull, deathBpull, spouseBpull, livingBpull, strangerBpull))
colnames(medians2)

# turn it into a long dataset
medians_long <- gather(medians2,
                   key = "stim",
                   value = "bias", -c(ID,age, age_yrs, sex_m, ethnicity_hisp, race, timesincedeath, rs2254298, rs2268498, rs53576, rs2254298_A.carrier, rs2268498_C.carrier, rs53576_A.carrier, tot_bdi, tot_bgq, tot_ecrrs_global_anx,tot_ecrrs_global_avoid, tot_ecrrs_spouse_anx, tot_ecrrs_spouse_avoid, tot_ysl, tot_icg, grp_cg)) 
str(medians_long)

# create new columns for stimulus category and condition
medians_long$cond <- as.factor(ifelse(grepl("*_A", data_long$stim), "A", "B"))
medians_long$stimulus <- as.factor(ifelse(grepl("*neu*", data_long$stim), "neutral", 
                                   ifelse(grepl("*dea*", data_long$stim), "death",
                                          ifelse(grepl("*spo*", data_long$stim), "spouse", 
                                                 ifelse(grepl("*str*", data_long$stim), "stranger", 
                                                        ifelse(grepl("*liv*", data_long$stim), "living", NA))))))
View(medians_long) # 390 obs. of 26 vars, that's correct

library(psych)
describe(medians_long$bias) # skewness is -.07, kurtosis is 1.57, SD = 74.95
hist(medians_long$bias)
qqnorm(medians_long$bias) # not terrible, but definitely some outliers



# check out the outliers
## function from https://stackoverflow.com/questions/12866189/calculating-the-outliers-in-r
outfun <- function(x) {
  abs(x-mean(x,na.rm=TRUE)) > 3*sd(x,na.rm=TRUE)
}

# add a variable for outlier = T/F
medians_long$outlier <- outfun(medians_long$bias)
table(medians_long$outlier) # 5 observations where outlier = TRUE (>3SD)
# which are these?
medians_long %>% filter(outlier==TRUE)

### this is another way to do it using boxplot.stats
# outlier_values <- boxplot.stats(medians_long$bias)$out
# boxplot(data_long$bias, main="Outliers", boxwex=0.3)
# mtext(paste("Outlying values: ", paste(round(outlier_values, digits=2), collapse=", ")), cex=0.9, side=1)


```
The five extreme outliers (>3 SD) are:
<ul>D137 death A (-284.0)
D145 death A (-247.5)
D137 death B (-261.0)
D147 death B (295.0)
D139 spouse B (275.5)</ul>

What’s interesting here is that 4/5 of the outliers are in the CG group, and 3/4 of those show extreme avoidance bias to death stimuli. D137 has an extreme avoidance bias to death in both A and B. The the remaining two are extreme approach biases to death and spouse (NCG, condition B).

D147 will be dropped in the next chunk (low trial count), the others…??

#### Dropping the 4 people with low trial counts
```{r}
medians_long.1 <- filter(medians_long, !grepl("D101|D141|D147|D148",ID)) 
unique(medians_long.1$ID) # dropped the 4 participants
```

### Boxplots
```{r}
panelbystim <- ggplot(medians_long.1, aes(fill=cond, y=bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p1 <- panelbystim + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")


panelbyCG <- ggplot(medians_long.1, aes(fill=cond, y=bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p2 <- panelbyCG + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Stimuli") + ylab("Avoid bias                Approach bias")

# What do the boxplots look like after excluding the outliers identified above?
medians_long.2 <- medians_long.1 %>% mutate(bias_no.out = ifelse(outlier==FALSE, bias, NA)) %>% filter(!is.na(bias_no.out))
View(medians_long.2)

panelbystim.2 <- ggplot(medians_long.2, aes(fill=cond, y=bias_no.out, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p3 <- panelbystim.2 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")

panelbyCG.2 <- ggplot(medians_long.2, aes(fill=cond, y=bias_no.out, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p4 <- panelbyCG.2 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Stimuli") + ylab("Avoid bias                Approach bias")

p1 # with outliers
p3 # no outliers
p2 # with outliers
p4 # no outliers

```
Note that the error bars are wayyyy wider if outliers not dropped.

### Use plotly to explore outliers and stats
The plotly package allows you to interactively explore your ggplot objects. Conditions A/B are plotted on top of each other, which makes it hard to see both together, but you can click the legend to show just A or just B, which is pretty cool (see https://plot.ly/r/getting-started/ for more info). It doesn't seem to ID all of the same outliers that ggplot2's `geom_boxplot` does, but does provide a way to look at the values for some of them at least. 
```{r}
# install.packages("plotly")
# install.packages("shiny")
library(plotly)
library(shiny)

ggplotly(p1) # with outliers
ggplotly(p3) # no outliers
ggplotly(p2) # with outliers
ggplotly(p4) # no outliers

# the dots indicate extreme values but that doesn't necessarily mean they are outliers
```

### [Spouse/Living/Death/Stranger] vs. Neutral plots
```{r}
# computing bias relative to neutral:
medians3 <- medians2 %>% mutate(diff_bias_spo_A = bias_spo_A - bias_neu_A,
                                  diff_bias_str_A = bias_str_A - bias_neu_A,
                                  diff_bias_liv_A = bias_liv_A - bias_neu_A,
                                  diff_bias_dea_A = bias_dea_A - bias_neu_A,
                                  diff_bias_spo_B = bias_spo_B - bias_neu_B,
                                  diff_bias_str_B = bias_str_B - bias_neu_B,
                                  diff_bias_liv_B = bias_liv_B - bias_neu_B,
                                  diff_bias_dea_B = bias_dea_B - bias_neu_B)

View(medians3)

data_long2 <- gather(medians3,
                   key = "stim",
                   value = "diff_bias", -ID, -grp_cg)

data_long3 <- filter(data_long2, grepl("^diff",stim))

data_long3$cond <- as.factor(ifelse(grepl("*_A", data_long3$stim), "A", "B"))
data_long3$stimulus <- as.factor(ifelse(grepl("*dea*", data_long3$stim), "death-neu",
                                          ifelse(grepl("*spo*", data_long3$stim), "spouse-neu", 
                                                 ifelse(grepl("*str*", data_long3$stim), "stranger-neu", 
                                                        ifelse(grepl("*liv*", data_long3$stim), "living-neu", NA)))))

minus_neu <- data_long3
View(minus_neu)
# diff_bias is chr rather than num for some reason
minus_neu$diff_bias <-as.numeric(minus_neu$diff_bias)

describe(minus_neu$diff_bias) # skewness is -0.16, kurtosis is .95
hist(minus_neu$diff_bias) # not bad
qqnorm(minus_neu$diff_bias) # not too bad, but definitely a few extreme values
```


```{r}
# panel by stimulus category
panelbystim_diff <- ggplot(minus_neu, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p5 <- panelbystim_diff + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p5
ggplotly(p5)

# panel by group
panelbyCG_diff <- ggplot(minus_neu, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p6 <- panelbyCG_diff + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p6
ggplotly(p6)

# dropping low-trial count people
minus_neu.1 <- filter(minus_neu, !grepl("D101|D141|D147|D148",ID))

# panel by stimulus category
panelbystim_diff.1 <- ggplot(minus_neu.1, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p7 <- panelbystim_diff.1 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p7
ggplotly(p7)

# panel by group
panelbyCG_diff.1 <- ggplot(minus_neu.1, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p8 <- panelbyCG_diff.1 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p8
ggplotly(p8)

```

### Spouse vs. Stranger plots
```{r}
# computing spouse vs. stranger
medians4 <- medians3 %>% mutate(spo_vs_str_A = bias_spo_A - bias_str_A,
                                  spo_vs_str_B = bias_spo_B - bias_str_B,
                                  liv_vs_str_A = bias_liv_A - bias_str_A,
                                  liv_vs_str_B = bias_liv_B - bias_str_B)


data_long4 <- gather(data_means4,
                   key = "stim",
                   value = "diff_bias", -ID, -grp_cg)

data_long4 <- filter(data_long4, grepl("^spo_|^liv_",stim))

data_long4$cond <- as.factor(ifelse(grepl("*_A", data_long4$stim), "A", "B"))
data_long4$stimulus <- as.factor(ifelse(grepl("*spo*", data_long4$stim), "spouse-stranger",
                                          ifelse(grepl("*liv*", data_long4$stim), "living-stranger", NA)))

spouse_vs_str <- data_long4
View(spouse_vs_str)
```

```{r}
# panel by stimulus category
panelbystim_diff2 <- ggplot(spouse_vs_str, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p9 <- panelbystim_diff2 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p9
ggplotly(p9)

# panel by group
panelbyCG_diff2 <- ggplot(spouse_vs_str, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p10 <- panelbyCG_diff2 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p10
ggplotly(p10)

# dropping low-trial count people
spouse_vs_str.1 <- filter(spouse_vs_str, !grepl("D101|D141|D147|D148",ID))

# panel by stimulus category
panelbystim_diff2.1 <- ggplot(spouse_vs_str.1, aes(fill=cond, y=diff_bias, x=grp_cg)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p11 <- panelbystim_diff2.1 + facet_grid(. ~ stimulus) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p11
ggplotly(p11)

# panel by group
panelbyCG_diff2.1 <- ggplot(spouse_vs_str.1, aes(fill=cond, y=diff_bias, x=stimulus)) + 
  geom_boxplot(aes(fill=cond), outlier.alpha = 0.3)
p12 <- panelbyCG_diff2.1 + facet_grid(. ~ grp_cg) +  theme(strip.text.x = element_text(size=12, color="red",
                                      face="bold.italic"),
          strip.text.y = element_text(size=12, color="red",
                                      face="bold.italic")) + geom_hline(yintercept=0, linetype="solid", color="black", size=.5) + xlab("Group") + ylab("Avoid bias                Approach bias")
p12
ggplotly(p12)

```
### Regressions
```{r}
# predict from stimulus
m1 <- lm(bias ~ stimulus, data=medians_long.3)
summary(m1)

# predict from condition
m2 <- lm(bias ~ cond, data=medians_long.3)
summary(m2)

# predict from stimulus*condition
m3 <- lm(bias ~ stimulus*cond, data=medians_long.3)
summary(m3)

# predict from stimulus*condition*group
m4 <- lm(bias ~ stimulus*cond*grp_cg, data=medians_long.3)
summary(m4)

# predict from stimulus + condition*group
m5 <- lm(bias ~ stimulus + cond*grp_cg, data=medians_long.3)
summary(m5)
cat_plot(m5, pred="grp_cg", modx = "cond", plot.points=T, jitter=0.1, point.shape=T)


# predict from stimulus*condition + group
m6 <- lm(bias ~ stimulus*cond + grp_cg, data=medians_long.3)
summary(m6)

anova(m1,m2) # m2 > m1
anova(m2,m3) # m3 > m2
anova(m3,m4) # ns
anova(m4,m5) # ns
anova(m4,m6) # ns

# predict from stimulus*condition*tot_icg
m7 <- lm(bias ~ stimulus*cond*tot_icg, data=medians_long.3)
summary(m7) # the tot_icg*cond interaction is a significant predictor, t = 2.522, p = .012
# no 3-way interaction
interact_plot(m7, pred="tot_icg", modx = "cond", plot.points=T, jitter=0.1, point.shape=T)

```
Note that in the last plot, people in the CG group shows increased approach bias under condition B.

#### Exploratory analyses with the genotype data
I wanted to take a look at attachment styles, _OXTR_ genotype, and condition. All of the ECRRS variables are super negatively skewed. I tried some transformations using `bestNormalize` but their recommendations didn't help much. However, the residuals for the models do look okay imo. But do bear that in mind as you look at these.

**Also, keep in mind that this is still the long dataset, so each person has multiple observations of the `bias` score. This makes it look like there are more datapoints than there are, and I'm not sure if this is correct because it doesn't model the interdependence in the data.**

```{r}
# library(bestNormalize)
# norm <- bestNormalize(medians_long.3$tot_ecrrs_spouse_anx)
# medians_long.3$tot_ecrrs_spouse_anx_t <- norm$x.t # pull x.t from norm and assign it back into dataset
# hist(medians_long.3$tot_ecrrs_spouse_anx_t)

# The recent Schiele et al. paper on CG, rs2254298, behavioral inhibition and separation anxiety found:
#  - No significant main effect of OXTR genotype on ICG score (same as we found)
#  - Significant interaction of OXTR x behavioral inhibition on ICG score
#  - Marginal interaction of OXTR x adult separation anxiety on ICG score
# WE HAVE BIS/BAS IN THE MASTER DATASET - COULD REPLICATE?

m8 <- lm(bias ~ stimulus*cond+tot_ecrrs_global_anx, data=medians_long.3)
summary(m8) # interesting - GLOBAL anxious attachment style OVERALL predicts bias
qqnorm(resid(m8))

m9 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx, data=medians_long.3)
summary(m9) # but SPOUSE anxious attachment style does NOT predict bias
qqnorm(resid(m9))

m10 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx*rs2268498, data=medians_long.3)
summary(m10) # SPOUSE anxious attachment style DOES predict bias in interaction with rs2268498, p = .009 (this also works with rs2268498_C.carrier)
qqnorm(resid(m10))
interact_plot(m10, pred="tot_ecrrs_spouse_anx", modx = "rs2268498", plot.points=T, jitter=0.1, point.shape=T)

m11 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx*rs2254298, data=medians_long.3)
summary(m11) # but SPOUSE anxious attachment style does not significantly predict bias in interaction with rs2254298, p = .066
qqnorm(resid(m11))

m12 <- lm(bias ~ stimulus+cond*tot_ecrrs_spouse_anx*rs2268498, data=medians_long.3)
summary(m12) # non-significant 3-way interaction of condition*spouse anxious attachment*rs2268498CC 
### I THINK I MIGHT NEED TO REDO THIS ONE AFTER RE-LEVELING rs2268498 - REFERENCE LEVEL IS C/T, SHOULD BE TT OR CC?
qqnorm(resid(m12))

m13 <- lm(bias ~ stimulus+cond*tot_ecrrs_spouse_anx*rs2254298, data=medians_long.3)
summary(m13) # significant interaction of spouse anxious attachment*rs2254298, p = .018, no 3-way interaction so m11 > m13 for parsimony
qqnorm(resid(m13))

m14 <- lm(bias ~ stimulus+cond+tot_ecrrs_spouse_anx*rs53576, data=medians_long.3)
summary(m14) # no effect of interaction
qqnorm(resid(m14))

m15 <- lm(bias ~ stimulus+cond*tot_ecrrs_spouse_anx*rs53576, data=medians_long.3)
summary(m15) # significant 3-way interaction of condition*spouse anxious attachment*rs53576 (GG), p = 0.019
qqnorm(resid(m15))
```


### Some more data viz

```{r}
library(lattice)

xyplot(bias ~ cond | as.factor(ID), data=medians_long.3, as.table=T, type=c("p","r")) # look at peoples' individual slopes between conditions

xyplot(bias ~ cond | as.factor(grp_cg), data=medians_long.3, as.table=T, type=c("p","r")) ## look at slopes between groups

xyplot(bias ~ stimulus | as.factor(grp_cg), data=medians_long.3, as.table=T, type=c("p","r")) ## look at slopes between groups
```

# New ANOVAs and plots
#### Repeated-measures ANOVA on the aggregate RTs

**Caveat: I didn't include any covariates in the model yet.**

```{r}
library(tidyverse)
library(afex)
library(emmeans)
library(jtools)

# use the dataset "mlm_long2" that was created in a previous step (it has trial-level RTs; stim, cond, and push_pull are factors)
# add a group variable for CG/NCG, make push_pull & run into factors, and drop the 4 people who failed to reverse the instructions
anovaData <- mlm_long2 %>% 
  mutate(grp_cg = as.factor(ifelse(tot_icg >= 25, "CG", "NCG")), push_pull = as.factor(push_pull), run = as.factor(run)) %>% filter(!grepl("D101|D141|D147|D148",ID))

# repeated-measures ANOVA using medians, with 3 within-subjects factors (stimulus, condition, push or pull) and 1 between-subjects factor (CG vs. NCG)
a1 <- aov_ez(id = "ID", dv = "gAAT_RT", anovaData, between = "grp_cg", within = c("stim", "cond", "push_pull"), fun_aggregate = median) # using non-transformed RTs
a1 

# repeated-measures ANOVA using medians, with 3 within-subjects factors (stimulus, condition, push or pull) and 1 between-subjects factor (CG vs. NCG)
a1_t <- aov_ez(id = "ID", dv = "gAAT_RT_t", anovaData, between = "grp_cg", within = c("stim", "cond", "push_pull"), fun_aggregate = median)  # using transformed RTs
a1_t
```

Results don't differ appreciably between using the transformed and non-transformed RTs.

For both the transformed and non-transformed RTs, there is a significant three-way interaction of `Group` x `Condition` x `Push_Pull`, _F_(1,33) = 10.09, _p_ = .003, generalized _eta_-squared = .003 (stats are for transformed variable). However, if I'm interpreting that effect size correctly, it's a pretty tiny effect (.02 = small effect, .13 = medium, .26 = large). 

```{r}
p13 <- afex_plot(a1, x = "push_pull", trace = "cond", panel = "grp_cg", 
          error = "within", mapping="linetype", error_arg = list(width = .1))
p13 + theme_light() # greyscale, light theme
p13 + theme_apa() # greyscale, APA theme

p14 <- afex_plot(a1, x = "push_pull", trace = "cond", panel = "grp_cg", 
          error = "within", mapping="color", error_arg = list(width = .1))
p14 + theme_light() # color, light theme
p14 + theme_apa() # color, APA theme
```
From these, it looks like the interaction shows a significant effect of Condition (spray A/B), but _only_ for people with CG and _only_ on their push behavior. In the CG group, B decreased avoidance bias* as shown by the within-subjects error bars that barely overlap if at all. 

In the two-way interaction plot for the bias score (from the regression), people in the CG group appeared to show increased approach bias under condition B. From the plots above, it looks like what is driving this apparent increase in approach bias is actually **decreased avoidance behavior** rather than **increased approach behavior**. It's just not evident from the difference scores, which are agnostic to cause.

*_(well, is it really bias per se if it's not relative to pull? hmm)_

#### Estimated marginal means

```{r}
emm <- emmeans(a1, ~c(grp_cg,cond,push_pull))
emm
update(pairs(emm), by=NULL, adjust = "holm")

```
Not quite sure about these pairwise comparisons. Ideally, we'd be able to set up specific contrasts to test as described below but I haven't figured that out yet. However, looking at the plots, I think it's safe to interpret the interaction as stated above. I will work on this.

```
?aov_ez

The S3 object returned per default can be directly passed to emmeans::emmeans for further analysis. This allows to test any type of contrasts that might be of interest independent of whether or not this contrast involves between-subject variables, within-subject variables, or a combination thereof. The general procedure to run those contrasts is the following (see Examples for a full example):

(1) Estimate an afex_aov object with the function returned here. For example: x <- aov_car(dv ~ a*b + (id/c), d)

(2) Obtain a emmGrid-class object by running emmeans on the afex_aov object from step 1 using the factors involved in the contrast. For example: r <- emmeans(x, ~a:c)

(3) Create a list containing the desired contrasts on the reference grid object from step 2. For example: 
con1 <- list(a_x = c(-1, 1, 0, 0, 0, 0), b_x = c(0, 0, -0.5, -0.5, 0, 1))

(4) Test the contrast on the reference grid using contrast. For example: contrast(r, con1)

To control for multiple testing p-value adjustments can be specified. For example the Bonferroni-Holm correction: contrast(r, con1, adjust = "holm")
```

#### Writing plots to .pptx
This is a useful tool for high-res figures! Using the `officer` package to write plots directly to a Powerpoint presentation:
```{r}
library(officer)
# Create a new powerpoint document
ot_plots <- read_pptx()
# Add a new slide into the ppt document 
ot_plots <- ot_plots %>% 
  add_slide(layout = "Title and Content", master = "Office Theme")
# Add the ggplot2 objects to the slide
ot_plots <- ph_with_gg(ot_plots, value=p14)
# write the document to a file
print(ot_plots, target = "ot-rt-plots.pptx") 


```

