This project uses behavioral and self-report data from the study “The effect of intranasal oxytocin on neural functioning in widow(er)s” (#1503744420, PI: Mary-Frances O’Connor, Co-PI: Brian Arizmendi). It was a double-blind randomized crossover study that used an intranasal oxytocin manipulation and an approach-avoidance task variant (the grief AAT, or gAAT) to investigate conflicting accounts of motivated behavior in people with complicated grief. Enrolled participants were older adults who had experienced the death of a spouse or long-term romantic partner between 6 and 36 months prior to their participation. Stratified sampling was used to ensure that a range of grief symptom severity scores was represented in the sample.
Participants attended two identical experimental sessions, at which they received one of two intranasal sprays (oxytocin or placebo) and then took part in an fMRI scan, during which time they completed the task.
In the gAAT, five stimulus categories were presented: (1) photos of the deceased spouse/partner (provided by the participant), (2) photos of a living loved one (identified via the WHOTO), (3) photos of a stranger, (4) nomothetic death-related images, and (5) neutral images.
Photos provided by the participant were scanned, resized, and framed with blue/yellow frames to match the other stimuli used in the task. Photos of the stranger were sex-matched to the participant’s partner.
Participants completed the task using a joystick. The stimuli were animated so that a joystick push would make the image smaller (as if they were pushing it away) and a joystick pull would make the image larger (as if they were bringing it closer).
Participants were instructed to respond during the task to the color of each photo’s frame (e.g., “push the joystick when the frame is BLUE, pull when the frame is YELLOW”). Once participants had completed the first run of the task (144 trials), they completed a second run that was identical to the first except for the instructions, which were reversed (e.g., “now push the joystick for YELLOW and pull for BLUE”). Order of instructions was counterbalanced across participants. Stimuli were presented via Inquisit 4 (2014), in a pseudorandomized order determined by genetic algorithm to optimize statistical power and psychological validity (Wager & Nichols, 2003). Each trial lasted 3000 ms, with 2500 ms allowed for response. The intertrial interval was 500 ms.
MFO:
Should it be RT to one stimulus type, RT to the contrast of one stimulus type > another, or compatibility scores? Compatibility scores have felt the easiest for me to understand and compare (and they were what at least one of our reviewers was expecting). Is it true that using compatibility scores as a DV has the advantage of reducing the number of predictors in any of our models (since they remove the need for a response-direction predictor once push and pull are combined into the DV)?
BA:
I agree with this point, and do believe that converting push/pull to one compatibility score as the DV could work well to reduce the number of predictors. This might be good given the power issue.
SS:
Yes, it is true that if we use bias scores we do not include “direction” as a predictor.
On the other hand, if we use the mixed effects modeling approach, we have vastly more power, because we then have ~288 observations per person per session (so ~576 per subject, if using data from both OT and PL sessions) rather than 10 observations per person per session (i.e., 5 stimuli x 2 directions, using medians to compute bias). So having “direction” as a predictor is less of an issue.
Either way, I think using the bias scores for figures is helpful in depicting approach/avoid tendencies.
Pros of “direction” model: Being able to identify whether the change in bias is driven by faster/slower push or pull is potentially useful. More power than bias scores (more observations - unless we use the trial-level bias scores, which I’m not sure that we should.)
Cons of “direction” model: May be more difficult to interpret given the additional level in the interactions (potential 4-way).
Side note: I find the term “compatibility scores” confusing (vs. “bias scores”). I know that’s what Rinck & Becker used, but I think the term “compatibility” only makes sense when you have a sense of what people’s natural response tendencies are – in their paper, participants were afraid of spiders, so a negative compatibility score (faster to pull than push) indicated a compatibility or congruency between their behavior on the task and real-world behaviors. What has always confused me about the use of the term in this project is: compatible with what? We need a strong a priori hypothesis about which tendency the group will show in order for this term to make sense (IMO).
MFO:
MLM seems unnecessarily complicated after all. I’m generally pro-regression.
BA:
I agree that MLM appears unnecessarily complicated. My thoughts right now are that an RM-ANOVA would be the best way to go because:
It appears to be what some others (e.g., Maccallum) are doing, and the creator of the AAT has published results using this method. It is likely more digestible by reviewers as well.
The RM-ANOVA and the MLM should theoretically yield very similar results for the reasons described in the previous point
Regarding regression: From my understanding, we have basically already done this in MLM. The last line of the MLM model is a regression with all variables included… If we did a series of regressions like Eisma did, then we cannot directly compare stimulus types. If I’m understanding the Eisma paper correctly, that group ran a series of regressions, one per stimulus type, with median RT as the DV. So those conclusions CAN say, for instance, that rumination affects RT on one stimulus type and not on another, but they CANNOT say that rumination affects RT more or less on one stimulus type than on another stimulus type, because no direct comparison is made in a series of separate regressions.
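To make this concrete, a hedged sketch of the two approaches (data frame and variable names here are hypothetical, not from our actual data):
# Eisma-style: one regression per stimulus type; each tests rumination's effect on
# that stimulus alone, so effects cannot be compared across stimulus types
fit_spouse  <- lm(bias_spouse  ~ rumination, data = d_wide)
fit_neutral <- lm(bias_neutral ~ rumination, data = d_wide)
# single long-format model: the rumination:stim interaction term directly tests
# whether rumination's effect differs between stimulus types
fit_all <- lm(bias ~ rumination * stim, data = d_long)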
SS:
Re: BA’s last point - right. I am not a fan of running 5 different regressions. We get an inflated chance of Type I error (so we would need to adjust our alpha accordingly), and as Brian said, we can’t directly compare stimulus effects. And as we saw from the reviewer comment quoted below, they’re likely to be skeptical of that approach. Regressions also don’t account for the fact that we have multiple observations coming from the same person.
My vote is that we use the linear mixed effects approach as our main approach (“mixed effects model” is just a different term for MLM; it means we include both fixed [e.g., stimulus] and random [e.g., person] effects). I think there’s a simpler way to do it than the MLM analyses Brian and I each ran previously, which were pretty complicated; I’m calling it a mixed model here just to distinguish it from those earlier analyses.
Advantages: Gives us way more power!!! Therefore, we are able to test more complicated models, such as the three- and four-way interactions.
We do need to account for the random effect of subject. Regression does not allow us to account for the interdependence of the data. In a mixed model, we address the interdependence by nesting observations within person.
But I also think it makes sense to run the RM-ANOVA, and see how our results differ (or don’t), both because it may be easier to understand and because it’s consistent with what Maccallum did.
I propose a single 2 (Group) x 5 (Stimulus Type) [x 2 (Direction) if RT is DV] model - in the placebo data only - to address Aims 1 and 2. This could be LME or ANOVA.
Then for Aims 3 and 4, run the full model: 2 (Group) x 2 (Treatment) x 5 (Stimulus Type) [x 2 (Direction) if RT is DV]. This could be LME or ANOVA.
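For concreteness, a minimal lme4 sketch of these two models with trial-level RT as the DV (data frame and column names below are hypothetical placeholders, not the variable names used in the cleaning script):
library(lme4)
# Aims 1 & 2: placebo session only; observations nested within person via a random intercept
m_pl <- lmer(RT ~ group * stim * direction + (1 | ID), data = subset(trials, cond == "placebo"))
# Aims 3 & 4: full crossover model adding Treatment (placebo vs. oxytocin)
m_full <- lmer(RT ~ group * cond * stim * direction + (1 | ID), data = trials)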
If I understand correctly, the authors ran a group (2) x response (2) x stimulus (2) x OT condition (2) analysis, for five of the possible ten stimulus pairs. It would seem more appropriate to run a single omnibus 2x2x5x2 analysis with all 5 stimuli.
SS:
I agree with this reviewer.
One result was that both grief groups showed an approach pattern for idiographic spouse images. This has some methodological implications for reaction time tests and is consistent with the reward component of grief. However, the theoretical contribution of the result is minimal.
The other primary result was that reaction time was slower in the CG group overall after OT administration. This was surprising in that there was no specific stimulus x group x OT effect (a three-way interaction), which would have been very interesting. The failure to obtain a three-way interaction is itself difficult to unpack. Because of the small sample, this study is vulnerable to a Type II error. Indeed, this is the principal flaw here. It is difficult to know whether the effect is not present or merely wasn’t detected.
SS:
Using the trial-level data in MLM/mixed model solves the power problem.
Note that the experimental design for the present study is not identical to the studies below.
median RT to pull grief, positive, negative and abstract, and median RT to push grief, positive, negative and abstract stimuli. Compatibility effect scores were calculated by subtracting median RT’s in the pull condition from median RT’s in the push condition (Table 2).
A 2 (Group) x 2 (Prime) x 2 (Direction) x 4 (Stimulus Type) repeated measures ANOVA indicated a main effect for Stimulus Type [F(3,49) = 39.88, p < .001, ηp² = 0.71], a significant 2-way interaction between Direction x Stimulus Type [F(3,49) = 6.29, p < .002, ηp² = 0.27], and a significant 3-way interaction between Group x Direction x Stimulus Type [F(3,49) = 3.79, p < .017, ηp² = 0.19].
Median RT’s for all stimuli were calculated for each stimulus type. To be able to test our hypotheses, we also determined push-pull scores for each stimulus type, which are calculated by deducting the Median RT’s for pull trials from the Median RT’s for the push trials.
Our main analyses consisted of multiple regression analyses, in which rumination levels were used as a predictor of reaction times (push-pull) for each stimulus type (deceased + loss word; deceased + neutral word; stranger + loss word; stranger + neutral word; puzzle + X’s).
In both papers, bias scores were computed as (median RTpush - median RTpull) within each stimulus category.
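In tidyverse terms, that computation is roughly the following (a sketch only; data frame and column names are hypothetical):
# median RT per person x stimulus x direction, then push minus pull
bias_scores <- trials %>%
  group_by(ID, stim, push_pull) %>%
  summarize(med_RT = median(RT, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = push_pull, values_from = med_RT) %>%
  mutate(bias = PUSH - PULL)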
MFO:
These were the questions I thought the field would want answered, and that the reviewer seemed to like.
Rephrased as aim:
Aim 3.
To identify whether effects of intranasal oxytocin differ in high-severity [CG group] versus low-severity [Non-CG group] participants.
Analysis
Look at the condition (2) x group (2) interaction in the following model:
Using bias scores:
Using push/pull RTs:
If we see the expected interaction, then look at… (not sure which one is best? I think the first?)
Whether there is a within-group difference between OT and PL sessions?
Whether there is a within-session difference between CG and NCG?
Rephrased as aim:
Aim 4. To identify whether the response to spouse images in the CG group significantly differs under intranasal oxytocin vs. placebo.
Look at the stimulus (5) x condition (2) x group (2) interaction in the following model (same model as for Aim 3 above):
Using bias scores:
Using push/pull RTs:
If we see the expected interaction, then look at planned contrasts. We hypothesize that:
spouse,placebo,CG != spouse,oxytocin,CG
and that
spouse,placebo,NCG == spouse,oxytocin,NCG
(i.e., spouse in the placebo session is not equal to spouse in the oxytocin session, in the CG group only).
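If the omnibus model were fit with lme4 (e.g., the hypothetical m_full sketched under Aims 3 and 4 above), these planned contrasts could be tested with the emmeans package; a sketch:
library(emmeans)
# estimated marginal means for placebo vs. oxytocin within each group x stimulus cell
emm <- emmeans(m_full, ~ cond | group * stim)
# pairwise placebo-vs-oxytocin contrasts; inspect the spouse rows for CG and NCG
pairs(emm)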
The scripts below generate the data files used in the analyses above.
Pre-R steps:
In Terminal, within each of the folders in turn…
find . -iname "*.iqdat" -exec bash -c 'mv "$0" "${0%\.iqdat}.tsv"' {} \;
cat *.tsv > OT_gAAT_run-1.tsv   # or OT_gAAT_run-2.tsv for the Run2 folder
Then, in R:
library(tidyverse)
# make sure filter/select refer to the dplyr versions even if other packages mask them
filter <- dplyr::filter
select <- dplyr::select
# read in the data (only importing a subset of columns)
raw_r1 <- as_tibble(read_tsv("~/Dropbox/GLASS Lab/OT Study/data/raw-data/OT-gAAT-behavioral-data/Run1_tsv/OT_gAAT_run-1.tsv", col_types=cols_only(
date = col_integer(),
time = col_character(),
subject = col_character(),
blockcode = col_character(),
blocknum = col_character(),
trialcode = col_character(),
values.trialcode = col_character(),
values.stimulus = col_character(),
values.initialresponse = col_character(),
values.RT = col_number()
)))
raw_r2 <- as_tibble(read_tsv("~/Dropbox/GLASS Lab/OT Study/data/raw-data/OT-gAAT-behavioral-data/Run2_tsv/OT_gAAT_run-2.tsv", col_types=cols_only(
date = col_integer(),
time = col_character(),
subject = col_character(),
blockcode = col_character(),
blocknum = col_character(),
trialcode = col_character(),
values.trialcode = col_character(),
values.stimulus = col_character(),
values.initialresponse = col_character(),
values.RT = col_number()
)))
# header line repeats when I used `cat` to merge, so get those out of there
raw_r1 <- raw_r1 %>% filter(!is.na(date))
raw_r2 <- raw_r2 %>% filter(!is.na(date))
unique(raw_r1$subject)
# D118 is showing up as "test" for some reason (checked against the .iqdat file), and D130_b is missing
# both of D130's and D142's visits were entered as "D130"/"D142", so change the subject ID for the visit on 052416 to D130_b and the one on 111516 to D142_b
raw_r1$subject[raw_r1$date == 020916] <- "D118"
raw_r1$subject[raw_r1$date == 052416] <- "D130_b"
raw_r1$subject[raw_r1$date == 111516] <- "D142_b"
# fixing some other stuff in a similar vein
raw_r1$subject[raw_r1$subject == "D101_2"] <- "D101_b"
raw_r1$subject[raw_r1$subject == "D102_B"] <- "D102_b"
raw_r1$subject[raw_r1$subject == "D117_b"] <- "D117"
raw_r1$subject[raw_r1$subject == "D117_c"] <- "D117_b"
raw_r1$subject[raw_r1$subject == "D135_c"] <- "D135_b"
length(unique(raw_r1$subject)) # 78 unique - that is correct
unique(raw_r2$subject)
# some similar issues to fix
raw_r2$subject[raw_r2$subject == "D101_2Y"] <- "D101_b"
raw_r2$subject[raw_r2$subject == "D102_B"] <- "D102_b"
raw_r2$subject[raw_r2$subject == "D107_B_2"] <- "D107_b"
raw_r2$subject[raw_r2$subject == "D117_b"] <- "D117"
raw_r2$subject[raw_r2$subject == "D117_c"] <- "D117_b"
raw_r2$subject[raw_r2$subject == "D126_B"] <- "D126_b"
raw_r2$subject[raw_r2$subject == "D135_c"] <- "D135_b"
raw_r2$subject[raw_r2$date == 111516] <- "D142_b"
length(unique(raw_r2$subject)) # 78 unique - that is correct
str(raw_r1)
str(raw_r2)
# take out ITIs
unique(raw_r1$trialcode)
mlm_r1 <- filter(raw_r1, grepl("^A",trialcode))
unique(mlm_r1$trialcode)
unique(raw_r2$trialcode)
mlm_r2 <- filter(raw_r2, grepl("^A",trialcode))
unique(mlm_r2$trialcode)
# add a "trialnum" variable, grouped by ID
mlm_r1 <- mlm_r1 %>% group_by(subject) %>% mutate(trialnum = row_number())
View(mlm_r1)
mlm_r2 <- mlm_r2 %>% group_by(subject) %>% mutate(trialnum = row_number())
View(mlm_r2)
mlm_r1 %>% group_by(subject) %>% count() # everyone has 144 trials
mlm_r2 %>% group_by(subject) %>% count() # everyone has 144 trials, except D122_b has 177
# no clue why that happened, although I do vaguely remember we had one session where the task seemed to go on and on and we force quit it?
# remove trials 145-177 for D122_b:
mlm_r2 <- filter(mlm_r2, trialnum <=144)
mlm_r2 %>% group_by(subject) %>% count()
# now everyone has 144 trials
# add a column for "run"
mlm_r1$run <- "1"
mlm_r2$run <- "2"
# add a column for "visit"
mlm_r1$visit <- as.factor(ifelse(grepl("_b$", mlm_r1$subject), "2", "1"))
mlm_r2$visit <- as.factor(ifelse(grepl("_b$", mlm_r2$subject), "2", "1"))
mlm_r1 %>% group_by(subject, visit, run) %>% count() # all looks good: everyone has 144 trials for each run at each visit
mlm_r2 %>% group_by(subject, visit, run) %>% count() # all looks good: everyone has 144 trials for each run at each visit
# now remove the "_b" from visit 2 IDs
mlm_r1 <- mlm_r1 %>% ungroup() %>%
mutate(subject = str_replace(subject, "_b", ""))
unique(mlm_r1$subject)
mlm_r2 <- mlm_r2 %>% ungroup() %>%
mutate(subject = str_replace(subject, "_b", ""))
unique(mlm_r2$subject)
# merge data from the 2 runs
bx <- bind_rows(mlm_r1, mlm_r2)
# split up by visit
bx_v1 <- bx %>% filter(visit == "1")
bx_v2 <- bx %>% filter(visit == "2")
bx_v1 %>% group_by(subject, run) %>% count() # everyone has 144 trials for each run at visit 1
bx_v2 %>% group_by(subject, run) %>% count() # everyone has 144 trials for each run at visit 2
# add columns for "cond" at visit 1 and visit 2 [condition: A/placebo or B/oxytocin]
# load in the randomization data
randomize <- readRDS("~/Dropbox/GLASS Lab/OT Study/data/cleaned-data/randomization.rds")
str(randomize)
bx_v1 <- bx_v1 %>%
rename("ID" = subject)
bx_v2 <- bx_v2 %>%
rename("ID" = subject)
# get the IDs for everyone who got treatment A at visit 1
IDs_v1_txA <- randomize %>% filter(tx_v1 == "A")
txA_list <- IDs_v1_txA$ID
txA_list
# if ID is in txA_list, make cond_v1 = "A", else "B" (and the reverse for cond_v2)
bx_v1 <- bx_v1 %>%
  mutate(cond_v1 = as.factor(ifelse(ID %in% txA_list, "A", "B")),
         cond_v2 = as.factor(ifelse(ID %in% txA_list, "B", "A")))
# people who were A at visit 1 should be B at visit 2, and vice versa
bx_v2 <- bx_v2 %>%
  mutate(cond_v2 = as.factor(ifelse(ID %in% txA_list, "B", "A")),
         cond_v1 = as.factor(ifelse(ID %in% txA_list, "A", "B")))
levels(bx_v1$cond_v1)
levels(bx_v2$cond_v2)
head(bx_v1)
txA_list
bx_v1 %>% group_by(cond_v1) %>% count(ID) # IDs and cond match txA_list
# Note: some IDs from the randomization dataset are listed in txA_list but absent from the
# behavioral data, because some participants were dropped or never ended up completing their
# experimental session: D109 (dropped after 1st session), D116 and D136 (never enrolled),
# D124 (incidental finding, excluded after T1 MPRAGE).
bx_v2 %>% group_by(cond_v2) %>% count(ID) # IDs and cond match txA_list
bx_long <- bind_rows(bx_v1,bx_v2)
View(bx_long)
# rename some variables
colnames(bx_long)
bx_long <- bx_long %>% rename("push_pull" = values.initialresponse,
"gAAT_RT" = values.RT)
head(bx_long)
# add a new variable for condition
bx_long <- bx_long %>%
  mutate(cond = as.factor(ifelse(visit == "1", as.character(cond_v1), as.character(cond_v2))))
View(bx_long) # check that it coded the "cond" column correctly
# add a new variable for stimulus category
# in values.stimulus:
# 1 = spouse, 2 = living loved one/WHOTO, 3 = stranger, 4 = nomothetic death-related, 5 = neutral images
sort(unique(bx_long$values.stimulus))
bx_long <- bx_long %>%
  mutate(stim = as.factor(case_when(
    values.stimulus %in% c("1B_1.jpg", "1B_2.jpg", "1B_3.jpg", "1Y_1.jpg", "1Y_2.jpg", "1Y_3.jpg") ~ "spouse",
    values.stimulus %in% c("2B_1.jpg", "2B_2.jpg", "2B_3.jpg", "2Y_1.jpg", "2Y_2.jpg", "2Y_3.jpg") ~ "living",
    values.stimulus %in% c("3B_1.jpg", "3B_2.jpg", "3B_3.jpg", "3Y_1.jpg", "3Y_2.jpg", "3Y_3.jpg") ~ "stranger",
    values.stimulus %in% c("4B_1.jpg", "4B_2.jpg", "4B_3.jpg", "4Y_1.jpg", "4Y_2.jpg", "4Y_3.jpg") ~ "death",
    values.stimulus %in% c("5B_1.jpg", "5B_2.jpg", "5B_3.jpg", "5Y_1.jpg", "5Y_2.jpg", "5Y_3.jpg") ~ "neutral"
  )))
levels(bx_long$stim)
bx_long$stim <- relevel(bx_long$stim, ref = "neutral") # make neutral the reference level
# save it
saveRDS(bx_long, "~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bx_long.rds")
Outlier removal follows conventions from other gAAT papers, which exclude trials with RTs at or above the 99th percentile or at or below the 1st percentile (percentiles calculated from the sample as a whole).
# calculate percentiles separately for placebo/oxytocin conditions
bx_pl <- bx_long %>% filter(cond == "A")
bx_ot <- bx_long %>% filter(cond == "B")
p99pl <- quantile(bx_pl$gAAT_RT, 0.99) # 1716.76 ms
p01pl <- quantile(bx_pl$gAAT_RT, 0.01) # 463 ms
p99ot <- quantile(bx_ot$gAAT_RT, 0.99) # 1711.07 ms
p01ot <- quantile(bx_ot$gAAT_RT, 0.01) # 473 ms
# identify placebo condition outliers
bx_pl <- bx_pl %>%
mutate(outlier_RT = ifelse(gAAT_RT <= p01pl | gAAT_RT >= p99pl, 1, 0))
table(bx_pl$outlier_RT)
227/11232 # about 2% of placebo trials are outliers (227 flagged, 11005 retained)
# identify oxytocin condition outliers
bx_ot <- bx_ot %>%
mutate(outlier_RT = ifelse(gAAT_RT <= p01ot | gAAT_RT >= p99ot, 1, 0))
table(bx_ot$outlier_RT)
229/11232 # about 2% of oxytocin trials are outliers (229 flagged, 11003 retained)
bx_long_noOut <- bind_rows(bx_pl, bx_ot) %>% filter(outlier_RT == 0)
dim(bx_long)
dim(bx_long_noOut) # after outliers removed, there are 22008 trials total of the original 22464: 22464 - (227+229) = 22008
saveRDS(bx_long_noOut, "~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bx_long_noOut.rds")
write_csv(bx_long_noOut, "~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bx_long_noOut.csv")
library(psych)
# how much missing data is there?
ntrials <- bx_long_noOut %>% group_by(ID, cond) %>% count() # don't group by push_pull because people pulled/pushed on different numbers of trials (e.g., responses in the wrong direction)
describe(ntrials$n/288) # everyone has at least 90% of the 288 trials expected at each visit (no more than 10% trials dropped because of outliers or missed responses)
describe(bx_long_noOut$gAAT_RT) # skewness is 1.17, kurtosis is 1.7
hist(bx_long_noOut$gAAT_RT) # distribution is skewed, as usual for RT data...
qqnorm(bx_long_noOut$gAAT_RT) # and skewness is reflected in QQ plot
# creating a wide dataset with the bias variables, which will then be converted back to a long dataset
# The arguments to spread():
# - data: Data object
# - key: Name of column containing the new column names
# - value: Name of column containing values
data_wide <- spread(bx_long_noOut, key=stim, value=gAAT_RT)
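# Side note: spread() has since been superseded in tidyr; if updating this script,
# the equivalent call should be:
# data_wide <- pivot_wider(bx_long_noOut, names_from = stim, values_from = gAAT_RT)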
colnames(data_wide)
View(data_wide)
# subset by condition and direction
data_pushA <- data_wide %>% select(ID, cond, push_pull, visit, neutral, death, living, spouse, stranger) %>% filter(cond == "A" & push_pull == "PUSH") %>% group_by(ID) %>% mutate(neutralApush = median(neutral, na.rm = TRUE), deathApush = median(death, na.rm = TRUE), spouseApush = median(spouse, na.rm=TRUE), livingApush = median(living, na.rm=TRUE), strangerApush = median(stranger, na.rm=TRUE))
data_pullA <- data_wide %>% select(ID, cond, push_pull, visit, neutral, death, living, spouse, stranger) %>% filter(cond == "A" & push_pull == "PULL") %>% group_by(ID) %>% mutate(neutralApull = median(neutral, na.rm = TRUE), deathApull = median(death, na.rm = TRUE), spouseApull = median(spouse, na.rm=TRUE), livingApull = median(living, na.rm=TRUE), strangerApull = median(stranger, na.rm=TRUE))
data_pushB <- data_wide %>% select(ID, cond, push_pull, visit, neutral, death, living, spouse, stranger) %>% filter(cond == "B" & push_pull == "PUSH") %>% group_by(ID) %>% mutate(neutralBpush = median(neutral, na.rm = TRUE), deathBpush = median(death, na.rm = TRUE), spouseBpush = median(spouse, na.rm=TRUE), livingBpush = median(living, na.rm=TRUE), strangerBpush = median(stranger, na.rm=TRUE))
data_pullB <- data_wide %>% select(ID, cond, push_pull, visit, neutral, death, living, spouse, stranger) %>% filter(cond == "B" & push_pull == "PULL") %>% group_by(ID) %>% mutate(neutralBpull = median(neutral, na.rm = TRUE), deathBpull = median(death, na.rm = TRUE), spouseBpull = median(spouse, na.rm=TRUE), livingBpull = median(living, na.rm=TRUE), strangerBpull = median(stranger, na.rm=TRUE))
# summarize by ID
data_pushA1 <- data_pushA[!duplicated(data_pushA$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, cond, push_pull))
data_pushB1 <- data_pushB[!duplicated(data_pushB$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, cond, push_pull))
data_pullA1 <- data_pullA[!duplicated(data_pullA$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, cond, push_pull))
data_pullB1 <- data_pullB[!duplicated(data_pullB$ID), ] %>% select(-c(neutral, death, living, spouse, stranger, visit, cond, push_pull))
# merge the 4 subsets together
join1 <- left_join(data_pushA1, data_pushB1, by="ID")
join2 <- left_join(join1, data_pullA1, by = "ID")
bias <- left_join(join2, data_pullB1, by = "ID")
head(bias)
bias <- bias %>% mutate(bias_neu_A = neutralApush - neutralApull,
bias_dea_A = deathApush - deathApull,
bias_spo_A = spouseApush - spouseApull,
bias_str_A = strangerApush - strangerApull,
bias_liv_A = livingApush - livingApull,
bias_neu_B = neutralBpush - neutralBpull,
bias_dea_B = deathBpush - deathBpull,
bias_spo_B = spouseBpush - spouseBpull,
bias_str_B = strangerBpush - strangerBpull,
bias_liv_B = livingBpush - livingBpull
)
# drop the condition/stim specific variables
bias <- bias %>% select(-c(neutralApush, deathApush, spouseApush, livingApush, strangerApush, neutralBpush, deathBpush, spouseBpush, livingBpush, strangerBpush, neutralApull, deathApull, spouseApull, livingApull, strangerApull, neutralBpull, deathBpull, spouseBpull, livingBpull, strangerBpull))
colnames(bias)
# turn it into a long dataset
bias_long <- gather(bias,
key = "stim",
value = "bias", -c(ID))
head(bias_long)
# create new columns for stimulus category and condition
bias_long$cond <- as.factor(ifelse(grepl("_A$", bias_long$stim), "A", "B"))
bias_long$stimulus <- as.factor(ifelse(grepl("neu", bias_long$stim), "neutral",
                                ifelse(grepl("dea", bias_long$stim), "death",
                                ifelse(grepl("spo", bias_long$stim), "spouse",
                                ifelse(grepl("str", bias_long$stim), "stranger",
                                ifelse(grepl("liv", bias_long$stim), "living", NA))))))
bias_long <- bias_long %>% select(-stim) # no longer need the "stim" variable
describe(bias_long$bias) # skewness is -.01, kurtosis is 1.34, M = 14.04, SD = 70.01
hist(bias_long$bias) # normal distribution
qqnorm(bias_long$bias) # definitely some outliers
# check out the outliers
## function from https://stackoverflow.com/questions/12866189/calculating-the-outliers-in-r
outfun <- function(x) {
abs(x-mean(x,na.rm=TRUE)) > 3*sd(x,na.rm=TRUE)
}
# add a variable for outlier = T/F
bias_long$outlier <- outfun(bias_long$bias)
table(bias_long$outlier) # 4 observations where outlier = TRUE (>3SD)
# which are these?
bias_long %>% filter(outlier==TRUE)
### this is another way to do it using boxplot.stats
# outlier_values <- boxplot.stats(bias_long$bias)$out
# boxplot(bias_long$bias, main="Outliers", boxwex=0.3)
# mtext(paste("Outlying values: ", paste(round(outlier_values, digits=2), collapse=", ")), cex=0.9, side=1)
saveRDS(bias_long, "~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bias_long.rds")
write_csv(bias_long, "~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bias_long.csv")
The four extreme outliers (>3 SD) are:
To calculate trial-level bias (aka “compatibility”) scores: pair each trial in run 1 with the same trial in run 2 (where the instructions, and thus the response direction, were reversed), then subtract the pull-trial RT from the matched push-trial RT (push − pull).
bx_long <- readRDS("~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bx_long.rds")
bx_r1.1 <- bx_long %>% filter(run == "1")
bx_r2.1 <- bx_long %>% filter(run == "2")
tail(bx_r1.1)
tail(bx_r2.1)
# run1 should line up with run2: "stim" should match (i.e., spouse trials with spouse trials) but push_pull should say PUSH at one run and PULL at the other
# calculate percentiles for outlier removal (run1 and run2)
p99r1 <- quantile(bx_r1.1$gAAT_RT, 0.99)
p01r1 <- quantile(bx_r1.1$gAAT_RT, 0.01)
p99r1 # 99th percentile RTs = 1713.38ms
p01r1 # 1st percentile RTs = 472ms
p99r2 <- quantile(bx_r2.1$gAAT_RT, 0.99)
p01r2 <- quantile(bx_r2.1$gAAT_RT, 0.01)
p99r2 # 99th percentile RTs = 1712.45ms
p01r2 # 1st percentile RTs = 463ms
bx_r1.2 <- bx_r1.1 %>%
mutate(outlier_RT = ifelse(gAAT_RT <= p01r1 | gAAT_RT >= p99r1, 1, 0))
table(bx_r1.2$outlier_RT)
227/11232 # 227 of 11232 run-1 trials will be dropped (about 2%); 11005 will be retained
bx_r2.2 <- bx_r2.1 %>%
mutate(outlier_RT = ifelse(gAAT_RT <= p01r2 | gAAT_RT >= p99r2, 1, 0))
table(bx_r2.2$outlier_RT)
230/11232 # 230 of 11232 run-2 trials will be dropped (about 2%); 11002 will be retained
bx_r1r2 <- bind_cols(bx_r1.2, bx_r2.2)
dim(bx_r1r2) # 11232 paired trials; outlier flags carried over, outlier trials removed below
# make sure that trials in the two runs line up post-merge
head(bx_r1r2)
tail(bx_r1r2)
# get rid of any trials with outliers in either run1 or run2
# (need run1/run2 to be the same length so that the trials match up)
bx_r1r2.1 <- bx_r1r2 %>% filter(outlier_RT == 0 & outlier_RT1 == 0)
count(bx_r1r2.1) # 10790 obs.
10790/11232 # ~96% of trials retained = 4% data excluded
push_pull <- bx_r1r2.1 %>% filter(push_pull == "PUSH" & push_pull1 == "PULL") %>% mutate(bias = (gAAT_RT-gAAT_RT1))
pull_push <- bx_r1r2.1 %>% filter(push_pull == "PULL" & push_pull1 == "PUSH") %>% mutate(bias = (gAAT_RT1-gAAT_RT))
bx_pp <- bind_rows(push_pull,pull_push)
head(bx_pp) # can see from this that D101 (and possibly others) has only a few trials, because they failed to reverse the instructions (i.e., push_pull and push_pull1 columns have the same values instead of PUSH at one and PULL at the other)
# in essence, this removes their "error" trials
bx_pp %>% group_by(ID, cond) %>% count() # yep, this is visible in their *very low* trialcounts:
# D101 A n = 3
# D141 A n = 18
# D147 B n = 1
# D148 A n = 19
As we see from the last part of the script above, 4 participants have extremely low trial counts at one visit or another, due to failure to reverse the instructions. This will be addressed next.
# create a "drop" variable for these 4
bx_pp <- bx_pp %>% mutate(low_n_trials = ifelse((ID == "D101" & cond == "A") | (ID == "D141" & cond == "A") | (ID == "D148" & cond == "A") | (ID == "D147" & cond == "B"), 1, 0))
# check whether the rows line up
notlinedup <- bx_pp %>% filter(stim != stim1)
notlinedup
notlinedup <- bx_pp %>% filter(trialcode != trialcode1)
notlinedup
notlinedup <- bx_pp %>% filter(trialnum != trialnum1)
notlinedup
notlinedup <- bx_pp %>% filter(push_pull == push_pull1)
notlinedup
# 0 rows returned; all the rows match. Thus we conclude the rows from the two runs line up.
saveRDS(bx_pp, "~/Dropbox/GLASS Lab/OT gAAT for meeting 5-10-19/bx_long_trialbytrialBias.rds")