REPLICATION REPORT

Replication of The Sound of Power: Conveying and Detecting Hierarchical Rank Through Voice by Ko et al. (2015, Psychological Science)

Justin Salloum

jsalloum@stanford.edu

Introduction

The present study replicates Experiment 2 from the original research, which assesses whether perceivers use speakers’ hierarchy-induced acoustic cues to make hierarchical inferences about speakers. The researchers found that “perceivers used higher pitch, greater loudness, and greater loudness variability to make accurate inferences of speakers’ hierarchical rank, demonstrating that acoustic cues are systematically used to detect hierarchy.” Of particular interest is the result that “speakers who had been in the high-rank condition — regardless of their sex — were rated as more likely to engage in high-rank behaviors than were those in the low-rank condition.”

Methods

Power Analysis

Original effect size: \(\eta^2\) = 0.603, \(f^2\) = 1.521. The effect size was determined using the F-statistic and the between- and within-subject degrees of freedom.

\(df_1 = 1\)

\(df_2 = 55\)

\(F(1, 55) = 83.67\)

\(\eta^2 = \frac{df_1F}{df_1F + df_2} = 0.603\)

\(f^2 = \frac{\eta^2}{1 - \eta^2} = 1.521\)
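
As a quick check of the arithmetic, the following R sketch recomputes both effect sizes from the reported F-statistic and degrees of freedom. The helper names `eta_squared` and `cohens_f2` are illustrative and are not part of the original analysis.

```r
# Eta squared from an F statistic and its degrees of freedom
eta_squared <- function(f_stat, df1, df2) {
  (df1 * f_stat) / (df1 * f_stat + df2)
}

# Cohen's f squared from eta squared
cohens_f2 <- function(eta2) {
  eta2 / (1 - eta2)
}

# Values reported for the original study: F(1, 55) = 83.67
eta2 <- eta_squared(83.67, df1 = 1, df2 = 55)
f2 <- cohens_f2(eta2)
print(round(c(eta2 = eta2, f2 = f2), 3))
```

Because \(f^2\) simplifies to \(df_1F / df_2\), the second value can also be checked by hand as 83.67 / 55.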

Power analysis was done using the software G*Power. To detect an effect size of 0.603, the following sample sizes are needed to achieve each level of power:

Power    Sample Size Needed
0.80     17
0.90     21
0.95     25

All of these sample sizes are reasonable and financially feasible. Note that the sample size in this analysis corresponds to the number of stimuli (speakers) rather than the number of participants in the experiment, because the original research averaged the scores for each speaker over all participants before analysis.
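
G*Power's exact settings are not reproduced here. As a rough sketch of the kind of calculation involved, power for a fixed-effects ANOVA F test can be computed from the noncentral F distribution. The function name `anova_power`, the choice of four cells (a 2 × 2 design), and the convention that the noncentrality parameter equals \(f^2\) times the total sample size are assumptions of this sketch, so its output need not match the table above.

```r
# Power of an ANOVA F test for a given total sample size (sketch)
# n_total:  total number of observations (here, speakers)
# f2:       assumed effect size (Cohen's f squared)
# df1:      numerator degrees of freedom of the tested effect
# n_groups: number of cells in the design (assumed 4 for a 2 x 2 design)
anova_power <- function(n_total, f2, df1 = 1, n_groups = 4, alpha = 0.05) {
  df2 <- n_total - n_groups                 # denominator degrees of freedom
  f_crit <- qf(1 - alpha, df1, df2)         # critical value under the null
  ncp <- f2 * n_total                       # assumed noncentrality parameter
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

# Illustrative effect size of 0.5, not a value taken from the report
power_by_n <- sapply(c(12, 16, 20, 24), anova_power, f2 = 0.5)
print(round(power_by_n, 3))
```

Searching over `n_total` for the smallest value that reaches a target power gives a required sample size under these assumptions.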

Planned Sample

As in the original study, 40 undergraduates will be randomly selected for the sample, without restriction on age, gender or demographics. However, as a result of our power analysis, each participant will listen to only 24 speakers, as opposed to the 60 speakers used in the original research. (The power analysis called for 25 speakers, but the number must be even to ensure equal representation of speaker sex.)

Materials

Recordings of speakers reading aloud the Negotiation Passage: “I’m glad that we are able to meet today and I am looking forward to our negotiation. I know that you and I have different perspectives on some of the key issues and that these differences would need to be resolved for us to come to an agreement.”

“The voices’ baseline acoustics served as the criterion for the subset of voices such that the chosen voices’ baseline values had a smaller average deviation from the mean of their respective sex’s baseline values.”

The following items are used to measure hierarchy-based behavioral inferences: [image: list of the 12 behavior items]

Procedure

“Each perceiver listened to a subset of recordings of the Negotiation Passage from Experiment 1 (12 female and 12 male voices). After each recording, perceivers rated the speaker on 12 hierarchy-based behaviors plausible in a negotiation context, using a scale from 1 (not at all) to 7 (very much). Six of these behaviors were associated with high rank, and six with low rank. The order of the speakers and the order of the behaviors were randomized for each perceiver. The low-rank behaviors were reverse-scored, and then scores for all 12 behaviors were averaged to create one composite hierarchical-inference score per perceiver per speaker.”

The original procedure is followed exactly, except that Experiment 1 is not carried out here; in the original research it served only to obtain the recordings of the speakers.
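
The composite scoring described in the quoted procedure can be sketched in R as follows. The ratings are made up for illustration, and the function name `composite_score` is hypothetical; only the scoring rule (reverse-score the six low-rank items on the 1 to 7 scale, then average all 12) comes from the procedure.

```r
# Hypothetical ratings from one perceiver for one speaker (1-7 scale)
high_items <- c(5, 6, 4, 5, 6, 5)   # six high-rank behaviors
low_items  <- c(2, 3, 2, 1, 3, 2)   # six low-rank behaviors

composite_score <- function(high, low, scale_min = 1, scale_max = 7) {
  # Reverse-score the low-rank items: 1 becomes 7, 2 becomes 6, and so on
  low_reversed <- (scale_min + scale_max) - low
  # Average all 12 items into one composite hierarchical-inference score
  mean(c(high, low_reversed))
}

score <- composite_score(high_items, low_items)
print(score)  # 5.5
```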

Analysis Plan

The data will be analyzed with the same approach as in the original research. Effect of condition on hierarchy-based behavioral inferences:

“We examined the extent to which perceivers’ hierarchical inferences were consistent with the speakers’ hierarchical rank using a 2 (speaker’s condition: high rank, low rank) × 2 (speaker’s sex: male, female) analysis of variance.”

As in the original research, the current research will test for a main effect of speaker’s condition, a main effect of speaker’s sex, and an interaction between speaker condition and sex.

Differences from Original Study

The biggest difference from the original study is the number of speakers that each participant listens to. In the original study each participant listened to and made inferences about 60 speakers, whereas in this study the number of speakers is reduced to 24. Another key difference is that answering the 12 questions about hierarchy-based behavior is the only inferential task that participants perform in this study. The third difference between the current study and the original study is the setting; the current study will be entirely online and distributed via Amazon Mechanical Turk.

Pilot Analysis

Initialization and Setup

Loading the libraries needed for data analysis.

options(warn=-1)

rm(list=ls())
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(rjson)
library(tidyjson)
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
## 
##     expand
library(lmerTest)
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(sum(!is.na(x)))}  # divide by the number of non-missing values
ci95 <- function(x) {sem(x) * 1.96}

Data Reading

Data is read from the various json files into a data frame in long form.

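wid = 1  # running index used as an anonymous worker id
results.dir = "./production-results"
# pattern is a regular expression, so match the .json extension explicitly
files = dir(results.dir, pattern = "\\.json$")
d.raw = data.frame()
for (f in files) {
  jf = file.path(results.dir, f)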
  jd = fromJSON(paste(readLines(jf), collapse=""))
  for (elem in jd$answers$data) {
    id = data.frame(workerId = as.factor(wid),
                    speakerId = elem$speakerId,
                    speakerSex = elem$sex,
                    plev = elem$plev,
                    behaviorScore = elem$behaviorScore)
    d.raw = bind_rows(d.raw, id)
  }
  wid = wid + 1
}
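
To make the structure that the loop relies on concrete, here is a small base-R sketch. The nested list stands in for what a JSON parser would return for one worker; its values are made up, and the name `d.example` is hypothetical. The sketch also shows an alternative to growing the data frame inside the loop: build all rows first, then bind them once.

```r
# Hypothetical parsed result for one worker, mimicking jd$answers$data
jd <- list(answers = list(data = list(
  list(speakerId = 37, sex = -1, plev = 1,  behaviorScore = 4.75),
  list(speakerId = 24, sex = -1, plev = -1, behaviorScore = 2.83)
)))

# Convert each rated speaker to a one-row data frame
rows <- lapply(jd$answers$data, function(elem) {
  data.frame(workerId = "1",
             speakerId = elem$speakerId,
             speakerSex = elem$sex,
             plev = elem$plev,
             behaviorScore = elem$behaviorScore,
             stringsAsFactors = FALSE)
})

# Bind all rows in a single step
d.example <- do.call(rbind, rows)
print(d.example)
```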

The original data is simply read in from the csv provided on OSF.

d = read.csv('S2_voice_level_Final.csv')

Data Preparation for Analysis

To prepare the data for analysis, speakerSex and plev are recoded from -1/1 to [‘Male’, ‘Female’] and [‘Low’, ‘High’], respectively. Two sets of analysis will be carried out:

  1. Analysis on the data aggregated by average behavior score (d.a and d.af), in order to replicate the analysis done in the original research. To draw comparison to the original results, plots will also be generated for the original data (d.o and d.of).
  2. Analysis on the raw data in long form (d.raw and d.rawf). Mixed model analysis will be conducted to see if there are any random effects of other variables such as speaker id or worker id.
# d.a is the aggregated data with speakerSex and plev as numeric, while d.af codes speakerSex and plev as factors. The same applies for d.o and d.of, which is the original data

d.a = aggregate(d.raw[3:5], list(d.raw$speakerId), mean)
d.a = rename(d.a, speakerId = Group.1)
d.af = d.a

d.a$speakerSex[d.a$speakerSex == -1] = 0
d.a$plev[d.a$plev == -1] = 0

d.af$speakerSex[d.af$speakerSex == -1] = 'Male'
d.af$speakerSex[d.af$speakerSex == 1] = 'Female'
d.af$plev[d.af$plev == -1] = 'Low'
d.af$plev[d.af$plev == 1] = 'High'

d.o = select(d, voice, plev, vsex, newpster)
d.o = rename(d.o, speakerSex = vsex, behaviorScore = newpster)
d.o = d.o[complete.cases(d.o),]
d.of = d.o

d.of$speakerSex[d.of$speakerSex == -1] = 'Male'
d.of$speakerSex[d.of$speakerSex == 1] = 'Female'
d.of$plev[d.of$plev == -1] = 'Low'
d.of$plev[d.of$plev == 1] = 'High'

# d.raw is the raw data with speakerSex and plev as numeric, while d.rawf codes speakerSex and plev as factors

d.rawf = d.raw

d.raw$speakerSex[d.raw$speakerSex == -1] = 0
d.raw$plev[d.raw$plev == -1] = 0

d.rawf$speakerSex[d.rawf$speakerSex == -1] = 'Male'
d.rawf$speakerSex[d.rawf$speakerSex == 1] = 'Female'
d.rawf$plev[d.rawf$plev == -1] = 'Low'
d.rawf$plev[d.rawf$plev == 1] = 'High'
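
As an alternative sketch of the same recoding, `factor()` can map the -1/1 codes to labels in one step. The vector `codes` below is made-up illustration data, not part of the data set.

```r
# Made-up -1/1 codes standing in for the speakerSex or plev column
codes <- c(-1, 1, 1, -1)

# Map codes to labels, with the level order stated explicitly
sex  <- factor(codes, levels = c(-1, 1), labels = c("Male", "Female"))
rank <- factor(codes, levels = c(-1, 1), labels = c("Low", "High"))

print(data.frame(codes, sex, rank))
```

One consequence of this design is that the explicit level order also fixes which category acts as the reference level in later models, rather than leaving it to alphabetical ordering of character strings.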

Now that the data has been prepared, here’s a quick look at our data (in both forms) that we will analyze:

print(head(d.af))
##   speakerId speakerSex plev behaviorScore
## 1         5       Male High      4.481481
## 2        16       Male High      4.555556
## 3        24       Male  Low      3.925926
## 4        37       Male High      4.444444
## 5        40       Male High      4.398148
## 6        45       Male  Low      3.601852
print(head(d.rawf))
## Source: local data frame [6 x 5]
## 
##   workerId speakerId speakerSex  plev behaviorScore
##      (chr)     (dbl)      (chr) (chr)         (dbl)
## 1        1        37       Male  High      4.750000
## 2        1        24       Male   Low      2.833333
## 3        1       163     Female   Low      4.333333
## 4        1        75       Male   Low      4.333333
## 5        1        53       Male  High      4.333333
## 6        1       155     Female   Low      2.833333

1. Replication of Original Analysis

Box Plots

We will gather an idea of the distribution of behavior scores in relation to hierarchy condition and speaker sex, and compare with the original results.

bx1 = ggplot(d.of, aes(x = plev, y = behaviorScore, fill = speakerSex)) +
  geom_boxplot() +
  labs(title = 'Original Results', x = 'Hierarchy Condition', y = 'Behavior Score') +
  scale_fill_discrete(name = 'Speaker Sex')
bx2 = ggplot(d.af, aes(x = plev, y = behaviorScore, fill = speakerSex)) +
  geom_boxplot() +
  labs(title = 'Replication Results', x = 'Hierarchy Condition', y = 'Behavior Score') +
  scale_fill_discrete(name = 'Speaker Sex')

grid.arrange(bx1, bx2, ncol = 2)

Bar Plots

We will use bar plots to get an idea of the average behavior scores between hierarchy condition and speaker sex, and compare with the original results.

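# Average behaviorScore within each condition x sex cell before plotting;
# with stat = 'identity' on one row per speaker, the bars in a cell overlap
# and only the largest value is visible.
m.of = summarise(group_by(d.of, plev, speakerSex), behaviorScore = mean(behaviorScore))
m.af = summarise(group_by(d.af, plev, speakerSex), behaviorScore = mean(behaviorScore))

bp1 = ggplot(m.of, aes(x = plev, y = behaviorScore, fill = speakerSex)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  labs(title = 'Original Results', x = 'Hierarchy Condition', y = 'Behavior Score') +
  scale_fill_discrete(name = 'Speaker Sex')
bp2 = ggplot(m.af, aes(x = plev, y = behaviorScore, fill = speakerSex)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  labs(title = 'Replication Results', x = 'Hierarchy Condition', y = 'Behavior Score') +
  scale_fill_discrete(name = 'Speaker Sex')

grid.arrange(bp1, bp2, ncol = 2)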

Analysis of Variance (ANOVA) - Fixed Effects Model Analysis

Additive Model

rs1.11 = aov(behaviorScore ~ plev + speakerSex, data = d.af)
summary(rs1.11)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## plev         1 1.8828  1.8828  26.579 4.16e-05 ***
## speakerSex   1 0.2242  0.2242   3.164   0.0897 .  
## Residuals   21 1.4876  0.0708                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rs1.12 = lm(behaviorScore ~ plev + speakerSex, data = d.a)
summary(rs1.12)
## 
## Call:
## lm(formula = behaviorScore ~ plev + speakerSex, data = d.a)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44892 -0.13807  0.00783  0.11001  0.61404 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.8582     0.1002  38.513  < 2e-16 ***
## plev          0.5275     0.1102   4.787 9.93e-05 ***
## speakerSex   -0.1960     0.1102  -1.779   0.0897 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2662 on 21 degrees of freedom
## Multiple R-squared:  0.5862, Adjusted R-squared:  0.5467 
## F-statistic: 14.87 on 2 and 21 DF,  p-value: 9.48e-05

Interaction Model

rs1.21 = aov(behaviorScore ~ plev * speakerSex, data = d.af)
summary(rs1.21)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## plev             1 1.8828  1.8828  25.621 5.97e-05 ***
## speakerSex       1 0.2242  0.2242   3.050   0.0961 .  
## plev:speakerSex  1 0.0178  0.0178   0.243   0.6277    
## Residuals       20 1.4698  0.0735                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
rs1.22 = lm(behaviorScore ~ plev * speakerSex, data = d.a)
summary(rs1.22)
## 
## Call:
## lm(formula = behaviorScore ~ plev * speakerSex, data = d.a)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41667 -0.12440 -0.00529  0.10741  0.64630 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3.8259     0.1212  31.558  < 2e-16 ***
## plev              0.5828     0.1587   3.672  0.00151 ** 
## speakerSex       -0.1407     0.1587  -0.887  0.38581    
## plev:speakerSex  -0.1106     0.2245  -0.493  0.62766    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2711 on 20 degrees of freedom
## Multiple R-squared:  0.5911, Adjusted R-squared:  0.5298 
## F-statistic: 9.638 on 3 and 20 DF,  p-value: 0.000383
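
For comparison with the original effect size, \(\eta^2\) for the condition effect can be recovered from the interaction-model ANOVA table above, either from the sums of squares or from the F value with the same formula used in the Power Analysis section. This is only a sketch of the arithmetic; the variable names are illustrative, and the numbers are copied from the printed output.

```r
# Values copied from the aov output of the interaction model
ss_plev  <- 1.8828   # sum of squares for plev
ss_resid <- 1.4698   # residual sum of squares
f_plev   <- 25.621   # F value for plev
df1 <- 1
df2 <- 20

# Partial eta squared from the sums of squares
eta2_from_ss <- ss_plev / (ss_plev + ss_resid)

# The same quantity from F and the degrees of freedom
eta2_from_f <- (df1 * f_plev) / (df1 * f_plev + df2)

print(round(c(from_ss = eta2_from_ss, from_f = eta2_from_f), 3))
```

Both routes give approximately 0.562 for the pilot data.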

2. Additional Analysis

All the data analysis thus far replicates the analysis done in the original study. From here on, the analysis follows up on the original by examining the raw data (unaggregated, in long form) to look for other effects.

Box Plot and Bar Plot

As with the aggregated data, we will examine the distribution and the average of behavior scores in relation to hierarchy condition and speaker sex.

ggplot(d.rawf, aes(x = plev, y = behaviorScore, fill = speakerSex)) +
  geom_boxplot() +
  labs(title = 'Replication Results (raw)', x = 'Hierarchy Condition', y = 'Behavior Score') +
  scale_fill_discrete(name = 'Speaker Sex')

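# Plot cell means; stat = 'identity' on the unaggregated rows would overlay one bar per rating
m.rawf = summarise(group_by(d.rawf, plev, speakerSex), behaviorScore = mean(behaviorScore))
ggplot(m.rawf, aes(x = plev, y = behaviorScore, fill = speakerSex)) +
  geom_bar(position = 'dodge', stat = 'identity') +
  labs(title = 'Replication Results (raw)', x = 'Hierarchy Condition', y = 'Behavior Score') +
  scale_fill_discrete(name = 'Speaker Sex')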

ANOVA - Mixed Model Analysis

Random effect of speakerId

rs2.2 = lmer(behaviorScore ~ (plev * speakerSex) + (1 | speakerId), data = d.raw )
summary(rs2.2)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: behaviorScore ~ (plev * speakerSex) + (1 | speakerId)
##    Data: d.raw
## 
## REML criterion at convergence: 416.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.57592 -0.63208  0.05622  0.62009  2.38285 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  speakerId (Intercept) 0.03277  0.1810  
##  Residual              0.36645  0.6053  
## Number of obs: 216, groups:  speakerId, 24
## 
## Fixed effects:
##                 Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)       3.8259     0.1212 20.0000  31.558  < 2e-16 ***
## plev              0.5828     0.1587 20.0000   3.672  0.00151 ** 
## speakerSex       -0.1407     0.1587 20.0000  -0.887  0.38581    
## plev:speakerSex  -0.1106     0.2245 20.0000  -0.493  0.62766    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) plev   spkrSx
## plev        -0.764              
## speakerSex  -0.764  0.583       
## plev:spkrSx  0.540 -0.707 -0.707

Random effect of speakerId and workerId

rs2.3 = lmer(behaviorScore ~ (plev * speakerSex) + (1 | workerId) + (1 | speakerId), data = d.raw )
summary(rs2.3)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: 
## behaviorScore ~ (plev * speakerSex) + (1 | workerId) + (1 | speakerId)
##    Data: d.raw
## 
## REML criterion at convergence: 416.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.57592 -0.63208  0.05622  0.62009  2.38285 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  speakerId (Intercept) 0.03277  0.1810  
##  workerId  (Intercept) 0.00000  0.0000  
##  Residual              0.36645  0.6053  
## Number of obs: 216, groups:  speakerId, 24; workerId, 9
## 
## Fixed effects:
##                 Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)       3.8259     0.1212 20.0000  31.558  < 2e-16 ***
## plev              0.5828     0.1587 20.0000   3.672  0.00151 ** 
## speakerSex       -0.1407     0.1587 20.0000  -0.887  0.38581    
## plev:speakerSex  -0.1106     0.2245 20.0000  -0.493  0.62766    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) plev   spkrSx
## plev        -0.764              
## speakerSex  -0.764  0.583       
## plev:spkrSx  0.540 -0.707 -0.707
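
One way to read the random-effects tables above is to express each variance component as a share of the total variance (an intraclass correlation). The sketch below does this arithmetic with values copied from the second model's output; the variable names are illustrative.

```r
# Variance components copied from the summary of the two-random-effect model
var_speaker <- 0.03277
var_worker  <- 0.00000
var_resid   <- 0.36645

var_total <- var_speaker + var_worker + var_resid

# Share of total variance attributable to each grouping factor
icc_speaker <- var_speaker / var_total
icc_worker  <- var_worker / var_total

print(round(c(speaker = icc_speaker, worker = icc_worker), 3))
```

On these pilot data roughly 8% of the variance is attributed to speakers and none to workers. The zero worker variance is consistent with the two models reporting identical fixed-effect estimates.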