Replication of Study Attention Alters Perceived Attractiveness (Experiment 3) by Störmer & Alvarez (2016, Psychological Science)

Introduction

In their study, Störmer & Alvarez (2016) tested whether attention can alter the appearance of real-world stimuli. They asked three questions: 1) Does attention alter the perceived attractiveness of faces? 2) Is the change in attractiveness judgments driven by exogenous (involuntary) attention? 3) Does the attentional cue influence attractiveness judgments by modulating the apparent local contrast of the faces? Their results showed that attention altered the perception of facial attractiveness, a higher-level aspect of perception. In the current replication project, we aim to replicate Experiment 3 of the original study, in which the authors found that attention increased the apparent contrast around the eye region of faces, a feature that has been reported to modulate facial attractiveness.

Methods

Power Analysis

The effect size for the key statistical test (the paired-samples t test) reported in the original paper was \(\eta^2=0.30\). We computed Cohen’s d_z based on statistics provided in the paper (55.8% vs. 45.8%, \(t\)(15) = 2.51, \(p\) = .02, \(\eta^2\)=0.30).

N <- 16
meanDiff <- 55.8 - 45.8
t <- 2.51  # t = meanDiff / se_of_meanDiff
se_d <- meanDiff/t
s_d <- se_d * sqrt(N) # standard deviation of the mean differences
d_z <- meanDiff / s_d
# also equals
d_z <- t/sqrt(N)

Based on the computed effect size of d_z = 0.6275, we performed a post hoc power analysis using G*Power. The analysis indicated that the power of the original study was 0.65. To detect the reported effect size, we would need a sample size of N = 22 to achieve 80% power, N = 29 to achieve 90% power, and N = 35 to achieve 95% power. We aim to achieve 80% power. However, to shorten the duration of the experiment for each participant, we will split the experiment into halves and double the number of participants, giving N = 44.
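The G*Power figures can be cross-checked in R. The following is a minimal sketch, assuming that base R's power.t.test with type = "paired" is an acceptable stand-in for G*Power's matched-pairs t test; with sd = 1, delta can be read directly as d_z. The helper name n_for is ours and purely illustrative. This is not the computation we actually ran.

```r
# Cross-check of the power analysis
# (assumption: power.t.test stands in for G*Power's matched-pairs t test)
d_z <- 2.51 / sqrt(16)  # effect size from the original t(15) = 2.51

# Post hoc power of the original study (N = 16 pairs)
orig_power <- power.t.test(n = 16, delta = d_z, sd = 1,
                           type = "paired", sig.level = 0.05)$power

# Smallest whole N (number of pairs) that reaches a target power
n_for <- function(target) {
  ceiling(power.t.test(power = target, delta = d_z, sd = 1,
                       type = "paired", sig.level = 0.05)$n)
}
n_80 <- n_for(0.80)
n_90 <- n_for(0.90)
n_95 <- n_for(0.95)

round(orig_power, 2)
c(n_80 = n_80, n_90 = n_90, n_95 = n_95)
```

Doubling n_80 then gives the planned sample for the split design.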

Planned Sample

We plan to recruit 44 US participants on Amazon Mechanical Turk.

Materials

"A small black fixation cross (0.5° × 0.5°) was presented in the center of the screen throughout the experiment. Two small horizontal lines (~0.5° long) were presented to the left and right of fixation and served as landmarks for the horizontal midline of the screen. The target display consisted of two faces (each 8° × 6°) that were presented to the left and right of fixation at an eccentricity of 6°. The face images were chosen from 20 images of female Caucasian faces (approximate age range from 20 to 30 years) taken from Bronstad and Russell’s (2007) database. They were converted to gray scale and cropped such that only their inner features (no hair or neck) were visible. All the faces were matched in overall brightness (104 cd/m2), but the contrast of the eye region was systematically manipulated for each face. Specifically, a mask (handdefined in Adobe Photoshop) covering the eyes and the eyebrows was created for each face, and the contrast within that mask was manipulated by parametrically changing the standard deviation of that section of the image using MATLAB (The MathWorks, Natick, MA). This decreased or increased the luminance differences around the eye region in the face. For each face, five different contrast levels were created. These levels were measured in terms of root-mean-square error (RMSE) of the pixels’ luminance values within the masked region. Contrast levels of 0.30, 0.35, 0.39, 0.44, and 0.50 RMSE were used." (from Störmer & Alvarez p.565.)

The experimental materials used in the original study (i.e., contrast-manipulated face images) were provided by the original authors. Although we used the same stimuli, the size, brightness, and spacing of the stimuli could not be precisely controlled because our experiment was conducted online on Amazon Mechanical Turk (e.g., different displays, web browsing environments, etc.). However, we tried to keep the relative sizes and positions of the stimuli as close to those in the original study as possible.

Procedure

"The experiment was conducted in a dimly lit room, and the stimuli were presented on a 15-in. CRT display (1,280 × 1,024 pixels; 85 Hz) whose background color was set to gray (111 cd/m2). Participants viewed the stimuli at a distance of 57 cm, and a chin rest was used to stabilize their heads. Participants’ gaze was tracked with an eye tracker (EyeLink 1000, SR Research Ltd., Mississauga, Ontario, Canada) to ensure fixation. The experiment was run in MATLAB using the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997).
Participants were instructed to maintain their gaze on the fixation cross in the center of the gray screen throughout each experimental block. When they moved their gaze more than 1.5° away from fixation, the trial was aborted. At the beginning of each trial, a black circle appeared briefly (~70 ms) on either the left or the right side of the screen. After another 58 ms, face images were presented simultaneously on the left and right for 58 ms (Fig. 1a). Thus, the stimulus onset asynchrony (SOA) between the attentional cue and the faces was 128 ms. After the offset of the faces, the gray screen with the fixation cross was presented until the participant responded. The intertrial interval varied from 1.0 to 1.5 s. On two thirds of the trials, two different faces were randomly selected from the set of 20 faces to be presented as the target display. One of the faces was presented at the standard contrast (Level 3), and the other face was presented at one of the test contrasts (Levels 1–5). On the other third of the trials, the exact same face was presented on the left and right at the standard contrast. These trials were included so that we could compare responses to cued and uncued faces while all physical attributes of the two faces were matched. The analyses of the effects of the attentional cue on attractiveness judgments focused on these matched-face trials." (from Störmer & Alvarez p.565.)  

"Experiment 3 followed the same procedure as Experiment 1 except for the task instructions. Participants were asked to report the vertical positioning (upward or downward) of the face that appeared to have higher contrast around the eye region by pressing the up- or down-arrow key on a keyboard. Prior to the experiment, participants were shown three example stimuli and were told that contrast varied around the eye region. As in the other experiments, participants were told that the black dot (the attentional cue) was task irrelevant." (from Störmer & Alvarez p.565.)

The experimental procedures were followed precisely, except for the following changes:

The presentation of stimuli was controlled by JavaScript.
As mentioned before, we did not have control over the size, resolution, and refresh rate of displays. Also, the testing environments were considerably different from the original study: Turkers presumably worked from home, viewing the screen at an uncontrolled distance without a chin rest and without an eye tracker recording their gaze positions.
There were also a few differences in the duration of stimulus presentation. Considering that the most common refresh rates for computer screens include 30Hz and 60Hz, we fixed the lengths of cue and face presentation at 66 ms instead of 70 ms and 58 ms, respectively. Thus, the stimulus onset asynchrony (SOA) between the attentional cue and the faces in our version of the experiment was 132 ms, which was 4 ms longer than in the original study. However, we did not expect this difference to be large enough to abolish the attentional cueing effect.
To reduce the length of the experiment, 10 instead of 20 faces were used for each participant. We split the faces into two sets and randomly assigned which set would be used for each participant. The remaining 10 faces appeared in practice trials.
Participants completed 20 practice trials instead of 40 prior to the experiment. Only contrast Levels 1 and 5 were used for the test face, and the first twelve trials were slower-paced than in the original experiment.
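
The choice of 66 ms reflects frame quantization: a display can only show a stimulus for a whole number of refresh cycles. A minimal sketch of that arithmetic follows; the helper names frame_ms and nearest_duration are hypothetical and are not part of our experiment code.

```r
# Duration of one refresh cycle in ms
frame_ms <- function(hz) 1000 / hz

# Closest achievable duration: a whole number of frames, at least one
nearest_duration <- function(target_ms, hz) {
  n_frames <- pmax(1, round(target_ms / frame_ms(hz)))
  n_frames * frame_ms(hz)
}

# 66 ms is close to a whole number of frames at both common refresh rates
at_66 <- nearest_duration(66, hz = c(30, 60))

# The original durations (70 ms and 58 ms) fall between frames at 60 Hz
at_orig <- nearest_duration(c(70, 58), hz = 60)

at_66
at_orig
```

On a 60 Hz display, for example, a 58 ms request would be rounded to three frames, which is why a single duration near a frame multiple was preferred.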

Analysis Plan

"There was a main effect of actual contrast level on contrast judgments, \(F\)(4, 15) = 27.93, \(p\) = .0001, \(\eta^2\) = .31. Participants chose the face with higher contrast around the eye region more often than the face with lower contrast around the eye region. As in the previous experiments, our main analysis focused on the matched-face trials, in which identical faces were presented at Contrast Level 3. As shown in Figure 4b, when the two faces were physically identical, participants tended to judge the face at the cued location to have higher contrast than the face at the uncued location (55.8% vs. 45.8%), \(t\)(15) = 2.51, \(p\) = .02, \(\eta^2\) = .30." (from Störmer & Alvarez p.568.)

Key Analysis of Interest

The dependent variable (DV) of interest in Experiment 3 of the original study was the percentage of trials where participants judged the “test” face as having higher contrast than the “standard” face (contrast fixed at Level 3). Specifically, they focused their analysis on the “matched-face” trials in which the actual contrast of test and standard faces was identical (i.e., both Level 3 and of the same identity). That is, they compared participants’ responses from the matched-face trials in Test Face Cued and Standard Face Cued conditions to scrutinize the effect of attentional cue (IV) on the contrast judgment, controlling for physical difference.

The original authors conducted a paired-samples t test to examine the effect of attention within participants and found that attention had a significant effect on participants’ responses (55.8% vs. 45.8%), \(t\)(15) = 2.51, \(p\) = .02, \(\eta^2\) = .30 (d_z = 0.6275).

The key analysis of interest in this replication is, therefore, a paired-samples t test of whether the percentage of trials on which the cued face was chosen differs from the percentage of trials on which the uncued face was chosen.
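
As an illustration of this analysis and of the effect-size conversions used in this report, here is a minimal sketch on made-up data. The toy percentages are invented for the example; the conversions d_z = t / sqrt(N) and, for a one-degree-of-freedom effect, eta squared = t^2 / (t^2 + df) are standard identities.

```r
# Toy data: percentage of "test face chosen" per participant (made-up values)
cued   <- c(60, 55, 48, 62, 58, 51, 57, 66)
uncued <- c(50, 52, 47, 49, 55, 50, 44, 53)

tt    <- t.test(cued, uncued, paired = TRUE)
t_val <- unname(tt$statistic)
df    <- unname(tt$parameter)
N     <- df + 1

d_z  <- t_val / sqrt(N)             # Cohen's d_z for paired designs
eta2 <- t_val^2 / (t_val^2 + df)    # partial eta squared for a one-df effect

# d_z computed directly from the difference scores gives the same value
diffs      <- cued - uncued
d_z_direct <- mean(diffs) / sd(diffs)

# The same conversions applied to the original report, t(15) = 2.51
orig_d_z  <- 2.51 / sqrt(16)
orig_eta2 <- 2.51^2 / (2.51^2 + 15)
round(c(orig_d_z = orig_d_z, orig_eta2 = orig_eta2), 4)
```

Applied to the original statistics, both conversions agree with the values quoted from the paper.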

As subsidiary analyses, we are going to replicate:

The two-way repeated measures ANOVA to test the main effects of the contrast level and the cue condition.
Figure 4a showing participants’ choices as a function of contrast of the test face, for test face cued and for standard face cued conditions separately.

Finally, we will perform the following additional analyses:

Mixed-effects logistic regression to model binary responses (test vs. standard)
Logistic (logit) regression to model individual participants’ responses
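
To make the planned logistic model concrete, the following is a minimal sketch on made-up counts for one hypothetical participant. It fits aggregated counts with an intercept, which is a simplification chosen for the example and may differ from the trial-level models fitted in the Results.

```r
# Made-up counts for one hypothetical participant: 20 trials per cell
toy <- data.frame(
  cue = factor(rep(c("standard", "test"), each = 5)),
  rms = rep(c(.30, .35, .39, .44, .50), times = 2),
  n   = 20,
  k   = c(4, 7, 10, 13, 16,    # test face chosen, standard face cued
          6, 9, 12, 15, 18)    # test face chosen, test face cued
)

# Binomial GLM with a logit link on aggregated counts
fit <- glm(cbind(k, n - k) ~ cue + rms, family = binomial, data = toy)
coef(fit)

# Predicted probability of choosing the test face at the standard contrast
new   <- data.frame(cue = factor(c("standard", "test")), rms = 0.39)
p_hat <- predict(fit, newdata = new, type = "response")
p_hat
```

A positive coefficient for rms means higher test contrast raises the probability of choosing the test face, and a positive coefficient for the cue term corresponds to the cueing effect of interest.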

Differences from Original Study

The original study was conducted in a highly controlled experimental setting typical for psychophysics experiments, whereas our study will be conducted online. We will have less control over the display (e.g., size, luminance, resolution, refresh rate etc.), timing of stimulus presentation, testing environment and so on.

Contrast & luminance: Although we will not be able to control the absolute contrast/luminance shown to participants, we expect that relative differences between images will be maintained. In addition, we might be able to infer whether different contrasts were distinguishable on participants’ screens by looking at the effect of contrast on each participant’s responses.
Screen dimension & stimulus size: There is a possibility that the size of stimuli or the distance to targets from the fixation point might affect experimental results. It has been well documented that visual task performance is modulated by eccentricity, and it also has been reported that the effect of exogenous attention depends on the size of the cue. However, while it is possible that such factors can make participants’ performance vary, we expect that the variance could be explained by a subject factor.
We did not include different-face trials, which made up two thirds of the trials in the original study. Because only the same-face trials were included in the analysis in the original study, we reasoned that this decision would not cause a significant difference in results.
We do not have eye-tracking data from MTurk participants, whereas the original authors aborted trials in which participants’ gaze deviated more than 1.5° from fixation. This difference could have made our data less reliable. To minimize any biases resulting from this limitation, we repeatedly emphasized the importance of fixating the center of the screen during the experiment.
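
The dependence of stimulus size and eccentricity on viewing distance can be made concrete with the standard visual-angle relation, size = 2 × distance × tan(angle / 2). The sketch below is illustrative only: the helper names and the alternative viewing distances (40 cm and 80 cm) are assumptions for the example, not measurements from our participants.

```r
# Physical size (cm) that subtends a given visual angle at a given distance
angle_to_cm <- function(angle_deg, distance_cm) {
  2 * distance_cm * tan((angle_deg / 2) * pi / 180)
}

# Visual angle (degrees) subtended by a given physical size
cm_to_angle <- function(size_cm, distance_cm) {
  2 * atan(size_cm / (2 * distance_cm)) * 180 / pi
}

# At the original 57 cm viewing distance, 1 degree is roughly 1 cm
one_deg_cm <- angle_to_cm(1, 57)

# The same 8 cm image subtends different angles at different distances
angles <- cm_to_angle(8, c(40, 57, 80))

one_deg_cm
angles
```

Without a fixed viewing distance, the same image on screen can therefore differ substantially in angular size and eccentricity across participants.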

Methods Addendum (Post Data Collection)

Actual Sample

Forty-four mTurkers participated in the experiment. Participants who got less than 50% of practice trials correct were excluded from the analysis. The remaining thirty-nine participants (13 female, 26 male) were between the ages of 21 and 65 years old (mean = 36).

Differences from pre-data collection methods plan

None.

Results

Data preparation

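###Data Preparation
# Packages the code below relies on (assumed from the functions called)
library(jsonlite)   # fromJSON
library(dplyr)      # %>%, mutate, group_by, summarise, bind_rows
library(purrr)      # map_df
library(ggplot2)    # plotting
library(scales)     # rescale_none
library(ez)         # ezANOVA
library(lsr)        # cohensD
library(lme4)       # glmer
library(gridExtra)  # grid.arrange
library(sjPlot)     # sjp.setTheme, sjp.glmer
path <- "~/class/StanfordPsych254/stormer/"
# pattern is a regular expression, so match the ".json" extension explicitly
files <- dir(paste0(path,"anonymized-results/"), 
             pattern = "\\.json$")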
d.raw <- data.frame()
d_practice <- data.frame()
idList <- data.frame()
####Import data
for (f in files) {
  jf <- paste0(path, "anonymized-results/",f)
  jd <- fromJSON(paste(readLines(jf)))
  stimcond <- jd$answers$data$exptstim
  resp <- jd$answers$data$exptresp
  id <- data.frame(workerid=jd$WorkerId)
  idList <- bind_rows(idList,id)
  whichFaceSet <- data.frame(whichSet = 
                              as.integer(jd$answers$data$whichFaces))

  id <- cbind(id,stimcond,resp,whichFaceSet)
  d.raw <- bind_rows(d.raw, id)
  
  # load practice data
  pracAns <- data.frame(testContrast = na.omit(jd$answers$data$pracstim$testContrast),
                        testPos=na.omit(jd$answers$data$pracstim$testPos),
                        leftUp=na.omit(jd$answers$data$pracstim$leftUp),
                        correct = na.omit(jd$answers$data$pracresp$correct),
                        keypress=na.omit(jd$answers$data$pracresp$keypress))
  id <- data.frame(workerid = jd$WorkerId)
  id_p <- cbind(id,pracAns)
  d_practice <- bind_rows(d_practice,id_p)
}

# Number of participants
allWorkers = length(unique(d.raw$workerid))
which(duplicated(idList))

## integer(0)

Data exclusion / filtering

# exclude those who got less than 50% correct for contrast level 1 and 5 in the practice run
d_practice<-d_practice %>%
  mutate(testVertical = factor(ifelse((testPos=="left"&leftUp==T)|(testPos=="right"&leftUp==F),"up", ifelse((testPos=="left"&leftUp==F)|(testPos=="right"&leftUp==T),"down",NA)))) %>%
  mutate(correctAns = ifelse(testContrast > 3,"test", ifelse(testContrast < 3,"standard","none"))) %>%
  mutate(chosen = factor(ifelse(testVertical==keypress,"test","standard"))) %>%
  mutate(correct = ifelse(chosen==correctAns,1, ifelse(chosen!=correctAns & correctAns!="none", 0,NA)))
# contrast 1 and 5 trials
d_prac <- d_practice[d_practice$testContrast==1 | d_practice$testContrast==5, ]
# % correct
pracResult <- d_prac %>%
  group_by(workerid) %>%
  summarise(correctPer=sum(correct)/n())
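# identify participants under 50% by worker id rather than by row index,
# so the exclusion does not depend on how ids are named or sorted
exclude <- pracResult$workerid[which(pracResult$correctPer < .5)]

d_backup <- d.raw
# also safe when nobody is excluded (negative indexing with an empty set drops all rows)
d.raw <- d.raw[!(d.raw$workerid %in% exclude), ]
numWorkers <- allWorkers - length(exclude)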

Five participants were excluded from the analysis based on their performance in the practice run (i.e., less than 50% correct).

Prepare data for analysis - create columns etc.

# as factor
d.raw[sapply(d.raw, is.character)] <- lapply(d.raw[sapply(d.raw, is.character)], as.factor)
d.raw$face <- factor(d.raw$face)
d.raw$testContrast <- factor(d.raw$testContrast)
d.raw$whichSet <- factor(d.raw$whichSet)

# retrieve the face chosen and the vertical position of target face in each trial
d<-d.raw %>%
  mutate(testVertical = factor(ifelse((testPos=="left"&leftUp==T)|(testPos=="right"&leftUp==F),"up", ifelse((testPos=="left"&leftUp==F)|(testPos=="right"&leftUp==T),"down",NA)))) %>%
  mutate(chosen = factor(ifelse(testVertical==keypress,"test","standard"))) %>%
  mutate(correctAns = ifelse(as.numeric(testContrast) > 3,"test", 
                             ifelse(as.numeric(testContrast) < 3,"standard","none"))) %>%
  mutate(correct = ifelse(chosen==correctAns,1, 
                                 ifelse((chosen!=correctAns & correctAns!="none"), 0,NA)))

d_15 <- d[d$testContrast==1 | d$testContrast==5, ]
exptCorrect <- d_15 %>%
  group_by(workerid) %>%
  summarise(correctPer=sum(correct)/n())

# calculate the % of trials where participants answered that "test" face had higher contrast than "standard" face
tbl <- d %>% 
  group_by(workerid,cue,testContrast) %>%
  summarise(n=n(),testChosen=sum(chosen=="test"), 
            standardChosen=sum(chosen=="standard"),
            pctTest=testChosen/n*100,
            whichFace=unique(whichSet))

# prep data for plotting
tbl <- tbl %>%
  group_by(workerid) %>%
  mutate(workerMean=mean(pctTest)) %>% # worker means across conditions
  ungroup() %>%
  mutate(grandMean=mean(workerMean)) %>% # grand mean across subjects
  # To compute within subject errors later
 # new value = old value – subject average + grand average
  mutate(newPct=pctTest-workerMean+grandMean)
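
The normalization in the last step (new value = old value - subject average + grand average) removes between-participant offsets before within-subject standard errors are computed. A minimal sketch on made-up scores, with hypothetical column names:

```r
# Made-up scores: 3 participants x 2 conditions
toy <- data.frame(
  id    = rep(c("s1", "s2", "s3"), each = 2),
  cond  = rep(c("standard", "test"), times = 3),
  score = c(40, 50, 60, 68, 20, 32)
)

subj_mean  <- ave(toy$score, toy$id)   # each participant's own mean
grand_mean <- mean(subj_mean)          # mean of participant means (balanced design)

# new value = old value - subject average + grand average
toy$norm <- toy$score - subj_mean + grand_mean

# Participant means now all equal the grand mean ...
norm_subj_means <- tapply(toy$norm, toy$id, mean)
# ... while the condition means are unchanged
cond_shift <- tapply(toy$norm, toy$cond, mean) - tapply(toy$score, toy$cond, mean)

norm_subj_means
cond_shift
```

Because only the participant offsets are removed, the spread of the normalized values within a condition reflects within-subject variability.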

Confirmatory analysis

ANOVA: manipulation check

aov_rep <- ezANOVA(data=tbl, dv=pctTest, wid=workerid, within=.(testContrast,cue),detailed=TRUE)
print(aov_rep)

## $ANOVA
##             Effect DFn DFd    SSn   SSd      F       p p<.05    ges
## 1      (Intercept)   1  38 976501  4822 7695.5 1.8e-45     * 0.8863
## 2     testContrast   4 152  82166 65324   47.8 5.7e-26     * 0.3962
## 3              cue   1  38   4639 42404    4.2 4.8e-02     * 0.0357
## 4 testContrast:cue   4 152    444 12676    1.3 2.6e-01       0.0035
## 
## $`Mauchly's Test for Sphericity`
##             Effect     W       p p<.05
## 2     testContrast 0.021 6.9e-26     *
## 4 testContrast:cue 0.648 7.1e-02      
## 
## $`Sphericity Corrections`
##             Effect  GGe   p[GG] p[GG]<.05  HFe   p[HF] p[HF]<.05
## 2     testContrast 0.34 2.6e-10         * 0.35 1.7e-10         *
## 4 testContrast:cue 0.85 2.7e-01           0.95 2.6e-01

Our repeated-measures analysis of variance (ANOVA) revealed a main effect of physical contrast on contrast judgments, \(F\)(4, 152) = 47.80, \(p\) < .001, \(\eta^2\) = 0.40.

There was a significant main effect of cue condition (test vs standard), \(F\)(1, 38) = 4.16, \(p\) = 0.048, \(\eta^2\) = 0.036. The interaction effect was not significant, \(F\)(4, 152) = 1.33, \(p\) = 0.26, \(\eta^2\) = 0.004.

The original authors only reported the main effect of contrast level in Experiment 3, \(F\)(4, 15) = 27.93, \(p\) = .0001, \(\eta^2\) = .31.

Replication of Figure 4a. Psychometric functions

Psychometric curve fit (Weibull) using MLE

# get mean psychometric data points across subjects
psychometric <- tbl %>% 
  group_by(cue,testContrast) %>%
  # mean and within subject errors
  summarise(avePct=mean(pctTest), se=sd(newPct)/sqrt(n()),
            ntrial=mean(n),aveTestChosen=mean(testChosen),
            sumTest=sum(testChosen),sumN=sum(n))
psychometric <- psychometric %>%
  mutate(RMScontrast = c(.30,.35,.39,.44,.50))

RMScontrast = c(.30,.35,.39,.44,.50)
x<-RMScontrast
n <- psychometric$sumN    # number of trials per condition
k <- psychometric$sumTest # number of trials on which the test face was chosen
y <- k/n
cue <- psychometric$cue
dat <- data.frame(cue, x, k, n, y)
# define a function that fits a Weibull by maximum likelihood and returns the fitted curve
fitting <- function(df){
  # binomial negative log-likelihood; p = c(shape, scale)
  nll <- function(p) {
    phi <- pweibull(df$x, p[1], p[2])
    -sum(df$k * log(phi) + (df$n - df$k) * log(1 - phi))
  }
  para <- optim(c(.7,.7), nll)$par
  xseq <- seq(.3,.5,.001)
  yseq <- pweibull(xseq, para[1], para[2])
  data.frame(xseq,yseq)
}
# plotting the curves
npoints <- 201
# prediction with fit values
curves <- dat %>%
  split(.$cue) %>%
  map_df(fitting)
curves$cue <-rbind(matrix(rep("standard",npoints)), matrix(rep("test",npoints)))

# plot
p<- psychometric %>%
  ggplot(aes(x=RMScontrast,y=avePct,group=cue,color=cue)) +
  geom_pointrange(aes(ymin=avePct-se, ymax=avePct+se)) +
  geom_line(data=curves,aes(x=xseq,y=yseq*100,color=cue)) +
  geom_segment(aes(x=.39,xend=.39, y=-Inf, yend=5),color="black",size=1.4) +
  xlim(0.28,0.52)+
  ggthemes::theme_few()+ 
  xlab("Contrast Level of Test Face (RMSE)") +
  ylab("Test Face Chosen (%)") +
  theme(legend.position = c(0,1), legend.justification = c(0,1),
        legend.background= element_rect(fill=NA, color=NA),
        legend.title = element_blank()) +
        scale_color_brewer(type="qual",palette=6, labels=c("Standard Face Cued","Test Face Cued"), guide=guide_legend(reverse=TRUE)) +
  scale_y_continuous(limits = c(20, NA),oob = rescale_none)
# p
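
The maximum-likelihood fit used above can be sanity-checked on simulated data with known parameters. The sketch below is illustrative and makes several assumptions not in our analysis: the "true" shape and scale are made up, counts are set to their rounded expected values so the example is deterministic, and the parameters are optimized on the log scale to keep them positive.

```r
x <- c(.30, .35, .39, .44, .50)
true_par <- c(shape = 4, scale = 0.42)   # made-up "ground truth"
n <- rep(10000, length(x))               # trials per contrast level
k <- round(n * pweibull(x, true_par["shape"], true_par["scale"]))

# Binomial negative log-likelihood, parameterized on the log scale
nll <- function(logp) {
  shape <- exp(logp[1])
  scale <- exp(logp[2])
  log_phi  <- pweibull(x, shape, scale, log.p = TRUE)
  log_1phi <- pweibull(x, shape, scale, lower.tail = FALSE, log.p = TRUE)
  -sum(k * log_phi + (n - k) * log_1phi)
}

fit <- optim(log(c(.7, .7)), nll)        # Nelder-Mead, same start values as above
est <- setNames(exp(fit$par), c("shape", "scale"))
est
```

If the recovered estimates are close to the generating values, the likelihood and optimizer settings are behaving as intended for this range of contrasts.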

Side-by-side comparison with the original graph

Key statistics: Paired-samples t

# matched-face trials
d_sum <- tbl %>%
  group_by(workerid) %>%
  mutate(workerMean_3 = mean(pctTest)) %>%
  ungroup() %>%
  mutate(grandMean_3 = mean(workerMean)) %>%
  mutate(newPct3 = pctTest - workerMean_3 + grandMean_3)
d_t <- d_sum[d_sum$testContrast=="3",]

bar <- d_t %>%
  group_by(cue) %>%
   # mean and within subject errors
  summarise(avePct=mean(pctTest), se=sd(newPct3)/sqrt(n()),
            ntrial=mean(n), aveTestChosen=mean(testChosen),
            sumTest=sum(testChosen), sumN=sum(n))

bargraph<-bar %>%
  ggplot(aes(x=cue,y=avePct,fill=cue)) +
  geom_bar(stat = "identity", position = "dodge",width = 0.6) +
  geom_linerange(aes(ymin=avePct-se, ymax=avePct+se)) +
  ggthemes::theme_few() + 
  xlab("Face Cued") +
  ylab("Test Face Chosen (%)") +
  theme(legend.position="none") + 
  scale_y_continuous(limits = c(30, NA),oob = rescale_none) +
  scale_fill_brewer(type="qual",palette=6)
# bargraph

Conduct a paired t test (test vs. standard face, Contrast Level 3)

#conduct a paired t-test (test VS standard face, Contrast=3)
ttest<-t.test(pctTest ~ cue, d_t, paired=TRUE)
ttest

## 
##  Paired t-test
## 
## data:  pctTest by cue
## t = -2, df = 40, p-value = 0.09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.6   1.2
## sample estimates:
## mean of the differences 
##                    -7.7

# cohen's effect size measure
dz<-cohensD(pctTest ~ cue, data=d_t, method="paired")
print(dz)

## [1] 0.28

The paired t test revealed a marginally significant (p < .10) difference in contrast judgments between the Test Face Cued and Standard Face Cued conditions, 55.9% vs. 48.21%, \(t\)(38) = -1.75, \(p\) = 0.09, \(\eta^2\) = 0.04. That is, the attended (cued) face tended, to some degree, to be judged as having higher contrast than the same face when it was unattended.

The original paper reported 55.8% vs. 45.8%, \(t\)(15) = 2.51, \(p\) = .02, \(\eta^2\) = .30.

Side-by-side comparison with the original graph

Original	Replication
\(t\)(15)=2.51, \(p\) = .02, \(\eta^2\) = .30	\(t\)(38)=-1.75, \(p\) = 0.09, \(\eta^2\) = 0.04

Exploratory analyses

Logistic regression

Fitting a logit model to our binary response data using a GLM

# data preparation
# adding a 0/1 binary response column in the raw dset
d$choice <- as.numeric(d$chosen) - 1
d$rms <-d$testContrast
levels(d$rms) <- RMScontrast
d$rms <- as.numeric(as.character(d$rms))
tbl$rms <-RMScontrast

# defining a function to perform logistic regression for individual data
logFit <- function(df,tb) {
  logreg <- glm(formula = choice ~ cue + rms - 1, family = "binomial", df)
  # get predicted responses from the model prediction
  xseq <- seq(.3,.5,.001)
  y_t <- predict(logreg,data.frame(cue="test",rms=xseq),type = "response")
  y_s <- predict(logreg,data.frame(cue="standard",rms=xseq),type = "response")
  # organize as data frame
  curve_log <- data.frame(rbind(matrix(y_s),matrix(y_t)))
  colnames(curve_log) = "yseq"
  curve_log$cue <-rbind(matrix(rep("standard",npoints)), matrix(rep("test",npoints)))
  curve_log$xseq = xseq

  logplot <- df %>% ggplot(aes(x=rms,y=choice)) +
    geom_point(aes(x=rms, y=choice, color=cue),shape=21, fill=NA,alpha=0.7,
               position=position_jitter(width =.006,height = .04)) +
    geom_line(data=curve_log,aes(x=xseq,y=yseq,color=cue)) +
    ggthemes::theme_few() +
    theme(legend.position="none") +
    geom_point(data=tb,aes(x=rms, y=pctTest/100, color=cue)) +
    ylab("p test") +
    xlab("RMSE") +
    scale_color_brewer(type="qual",palette=6)
  return(logplot)
}

# for loop to generate figures for each participant
workerID=unique(d$workerid)
plots <- list()
for (i in 1:numWorkers) {
  plots[[i]] <- logFit(d[d$workerid==workerID[i],], tbl[tbl$workerid==workerID[i],])
}
# plotting all
do.call(grid.arrange,c(plots,ncol=4))

Individual differences as a function of test contrast

# examine individual differences
d_sum %>% # all contrast levels, one panel per test contrast
  ggplot(aes(x=cue,y=pctTest,group=workerid, color=workerid)) +
  geom_line() +
  geom_point(size=1.6, shape=21) +
  facet_grid(.~testContrast) +
  ggtitle("Test Contrast") +
  xlab("Face Cued") +
  ylab("Test Face Chosen (%)") +
  theme(plot.title = element_text(hjust = 0.5,size=rel(1)),
        legend.position = "bottom", legend.text=element_text(size=4),
        legend.title=element_text(size=8), legend.key.size = unit(0.03, "npc")) +
   guides(color=guide_legend(override.aes=list(size=0.6)))

Mixed effects logistic regression

# focusing on contrast level 3 only
# model comparisons?
d3 <- d[d$testContrast==3,]
# random intercept
log1 <- glmer(choice ~ cue + (1|workerid), family = "binomial", d3)
#random slope
log2 <- glmer(choice ~ cue + (1+cue|workerid), family = "binomial", d3)
anova(log1,log2)

## Data: d3
## Models:
## log1: choice ~ cue + (1 | workerid)
## log2: choice ~ cue + (1 + cue | workerid)
##      Df  AIC  BIC logLik deviance Chisq Chi Df Pr(>Chisq)    
## log1  3 2157 2173  -1075     2151                            
## log2  5 2124 2151  -1057     2114  36.8      2      1e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# including all contrast levels.....
# model comparisons?
d$rms_sc <- scale(d$rms, scale=FALSE)
# random intercept
mixedLog1 <- glmer(choice ~ cue + rms_sc + (1|workerid), family = "binomial", d)
mixedLog2 <- glmer(choice ~ cue * rms_sc + (1|workerid), family = "binomial", d)
anova(mixedLog1,mixedLog2)

## Data: d
## Models:
## mixedLog1: choice ~ cue + rms_sc + (1 | workerid)
## mixedLog2: choice ~ cue * rms_sc + (1 | workerid)
##           Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)  
## mixedLog1  4 10131 10159  -5062    10123                          
## mixedLog2  5 10130 10165  -5060    10120  3.18      1      0.074 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# random slope (cue)
mixedLog3 <- glmer(choice ~ cue * rms_sc + (1+cue|workerid), family = "binomial", d)
anova(mixedLog2,mixedLog3)

## Data: d
## Models:
## mixedLog2: choice ~ cue * rms_sc + (1 | workerid)
## mixedLog3: choice ~ cue * rms_sc + (1 + cue | workerid)
##           Df   AIC   BIC logLik deviance Chisq Chi Df Pr(>Chisq)    
## mixedLog2  5 10130 10165  -5060    10120                            
## mixedLog3  7  9864  9912  -4925     9850   271      2     <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(mixedLog3)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: choice ~ cue * rms_sc + (1 + cue | workerid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##     9864     9912    -4925     9850     7793 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -5.618 -0.855  0.178  0.863  4.241 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  workerid (Intercept) 0.263    0.513         
##           cuetest     1.104    1.051    -1.00
## Number of obs: 7800, groups:  workerid, 39
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -0.1679     0.0892   -1.88    0.060 .  
## cuetest          0.3440     0.1754    1.96    0.050 *  
## rms_sc           9.7543     0.5196   18.77   <2e-16 ***
## cuetest:rms_sc  -1.2919     0.7291   -1.77    0.076 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cuetst rms_sc
## cuetest     -0.957              
## rms_sc      -0.014  0.007       
## ctst:rms_sc  0.010  0.003 -0.713

# random slope (cue, rms)
mixedLog4 <- glmer(choice ~ cue * rms_sc + (1+cue+rms_sc|workerid), family = "binomial", d)
anova(mixedLog4,mixedLog3)

## Data: d
## Models:
## mixedLog3: choice ~ cue * rms_sc + (1 + cue | workerid)
## mixedLog4: choice ~ cue * rms_sc + (1 + cue + rms_sc | workerid)
##           Df  AIC  BIC logLik deviance Chisq Chi Df Pr(>Chisq)    
## mixedLog3  7 9864 9912  -4925     9850                            
## mixedLog4 10 9466 9535  -4723     9446   404      3     <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# # item effect
mixedLog5 <- glmer(choice ~ cue * rms_sc + (1+cue+rms_sc|workerid) + (1|whichSet), family = "binomial", d)
anova(mixedLog5,mixedLog4)

## Data: d
## Models:
## mixedLog4: choice ~ cue * rms_sc + (1 + cue + rms_sc | workerid)
## mixedLog5: choice ~ cue * rms_sc + (1 + cue + rms_sc | workerid) + (1 | 
## mixedLog5:     whichSet)
##           Df  AIC  BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## mixedLog4 10 9466 9535  -4723     9446                        
## mixedLog5 11 9468 9544  -4723     9446   0.3      1       0.58

summary(mixedLog4)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: choice ~ cue * rms_sc + (1 + cue + rms_sc | workerid)
##    Data: d
## 
##      AIC      BIC   logLik deviance df.resid 
##     9466     9535    -4723     9446     7790 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -4.655 -0.846  0.149  0.877  3.610 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  workerid (Intercept)  0.283   0.532               
##           cuetest      1.195   1.093    -0.99      
##           rms_sc      80.156   8.953     0.22 -0.20
## Number of obs: 7800, groups:  workerid, 39
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -0.1526     0.0926   -1.65    0.099 .  
## cuetest          0.3284     0.1824    1.80    0.072 .  
## rms_sc          11.0743     1.5167    7.30  2.8e-13 ***
## cuetest:rms_sc  -1.4529     0.7646   -1.90    0.057 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) cuetst rms_sc
## cuetest     -0.955              
## rms_sc       0.186 -0.175       
## ctst:rms_sc  0.003  0.004 -0.251

sjp.setTheme(axis.textsize.x = 0.7)
sjp.glmer(mixedLog4, y.offset = .4, geom.size = 2)

## Plotting random effects...

Discussion

Summary of Replication Attempt

Our confirmatory analysis focused on replicating the effect of the attentional cue on the probability of the test face being chosen, using the same key statistical analysis (the paired t test) as in the original study. Our mTurk participants showed a slight tendency to judge the face at the cued location as having higher contrast than the face at the uncued location when the contrast of the two faces was matched, but the effect was only marginal (p = .09).

However, in our study, the two-way repeated-measures ANOVA with test face contrast and cue condition as factors revealed not only a significant main effect of contrast but also a significant main effect of the cue, which was not reported in the original study.

Therefore, our study partially replicated the original results.

Commentary

Our follow-up exploratory analysis suggested that there were substantial individual differences in the cueing effect and in contrast judgments across our participants. Despite our attempt to exclude participants who did not seem to distinguish the different contrasts shown during the practice run, our main analysis still included a considerable number of participants whose responses were barely modulated by contrast level. This is not very surprising, as we expected high variance and noise in our data owing to the lack of precise control over stimulus presentation and testing environments. Also, we cannot exclude the possibility that some workers were noncompliant, given the nature of online experiments.

Nonetheless, our mixed-effects logistic regression models indicated a marginally significant cueing effect on contrast judgments after accounting for random variation across participants in the effects of cue and contrast, suggesting some effect of attention on participants’ contrast judgments in our replication study.

Minyoung Lee (minyoung.lee@stanford.edu)

March 26, 2017
