For this exercise, please try to reproduce the results from Experiment 1 of the associated paper (Ko, Sadler & Galinsky, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

A sense of power has often been tied to how we perceive each other’s voice. Social hierarchy is embedded into the structure of society and provides a metric by which others relate to one another. In 1956, the Brunswik Lens Model was introduced to examine how vocal cues might influence hierarchy. In “The Sound of Power: Conveying and Detecting Hierarchical Rank Through Voice,” Ko and colleagues investigated how manipulation of hierarchal rank within a situation might impact vocal acoustic cues. Using the Brunswik Model, six acoustic metrics were utilized (pitch mean & variability, loudness mean & variability, and resonance mean & variability) to isolate a potential contribution between individuals of different hierarchal rank. In the first experiment, Ko, Sadler & Galinsky examined the vocal acoustic cues of individuals before and after being assigned a hierarchal rank in a sample of 161 subjects (80 male). Each of the six hierarchy acoustic cues were analyzed with a 2 (high vs. low rank condition) x 2 (male vs. female) analysis of covariance, controlling for the baseline of the respective acoustic cue.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 1):

The impact of hierarchical rank on speakers’ acoustic cues. Each of the six hierarchy-based (i.e., postmanipulation) acoustic variables was submitted to a 2 (condition: high rank, low rank) × 2 (speaker’s sex: female, male) between-subjects analysis of covariance, controlling for the corresponding baseline acoustic variable. Table 4 presents the adjusted means by condition. Condition had a significant effect on pitch, pitch variability, and loudness variability. Speakers’ voices in the high-rank condition had higher pitch, F(1, 156) = 4.48, p < .05; were more variable in loudness, F(1, 156) = 4.66, p < .05; and were more monotone (i.e., less variable in pitch), F(1, 156) = 4.73, p < .05, compared with speakers’ voices in the low-rank condition (all other Fs < 1; see the Supplemental Material for additional analyses of covariance involving pitch and loudness). (from Ko et al., 2015, p. 6; emphasis added)

The adjusted means for these analyses are reported in Table 4 (Table4_AdjustedMeans.png, included in the same folder as this Rmd file).


Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formating
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(emmeans)
library(effsize)
emm_options(opt.digits = F)

# #optional packages:
# library(psych)
library(car) # for ANCOVA
# library(compute.es) # for ANCOVA
# library(lsmeans) # for ANCOVA

Step 2: Load data

# Just Experiment 1
d <-read_csv("data/S1_voice_level_Final.csv")
DT::datatable(d)

Step 3: Tidy data

data_subset <-
  d %>%
  select(voice, plev, vsex, # grab voice, hierarchy rank, and speaker sex columns
         starts_with("pitch"),
         starts_with("intense"),
         starts_with("form"))

Step 4: Run analysis

Pre-processing

Descriptive statistics

In the paper, the adjusted means by condition are reported (see Table 4, or Table4_AdjustedMeans.png, included in the same folder as this Rmd file). Reproduce these values below:

model1 <-
  lm(pitch_smean ~
     plev * vsex +
     pitch_rmean,
   data = data_subset)

emmeans(model1, ~plev)
##  plev   emmean       SE  df lower.CL upper.CL
##    -1 155.5227 1.037984 156 153.4724 157.5730
##     1 158.6098 1.019801 156 156.5954 160.6242
## 
## Results are averaged over the levels of: vsex 
## Confidence level used: 0.95
es <- effectsize::eta_squared(model1, partial = F)

round(es$Eta2[1], 2)
## [1] 0
es_partial <- effectsize::eta_squared(model1, partial = T)

round(es_partial$Eta2_partial[1], 2)
## [1] 0.02
model2 <-
  lm(pitch_svar ~
     plev * vsex +
     pitch_rvar,
   data = data_subset)

emmeans(model2, ~plev)
##  plev   emmean       SE  df lower.CL upper.CL
##    -1 1648.367 73.31500 156 1503.549 1793.185
##     1 1425.016 71.93725 156 1282.920 1567.113
## 
## Results are averaged over the levels of: vsex 
## Confidence level used: 0.95
es <- effectsize::eta_squared(model2, partial = F)

round(es$Eta2[1], 2)
## [1] 0.02
es_partial <- effectsize::eta_squared(model2, partial = T)

round(es_partial$Eta2_partial[1], 3)
## [1] 0.044
model3 <-
  lm(intense_smean ~
     plev * vsex +
     intense_rmean,
   data = data_subset)

emmeans(model3, ~plev)
##  plev   emmean        SE  df lower.CL upper.CL
##    -1 58.66784 0.3205314 156 58.03470 59.30098
##     1 59.33567 0.3144773 156 58.71449 59.95685
## 
## Results are averaged over the levels of: vsex 
## Confidence level used: 0.95
es <- effectsize::eta_squared(model3, partial = F)

round(es$Eta2[1], 2)
## [1] 0.01
model4 <-
  lm(intense_svar ~
     plev * vsex +
     intense_rvar,
   data = data_subset)

emmeans(model4, ~plev)
##  plev   emmean       SE  df lower.CL upper.CL
##    -1 183.4795 4.378414 156 174.8308 192.1281
##     1 196.7301 4.295270 156 188.2457 205.2144
## 
## Results are averaged over the levels of: vsex 
## Confidence level used: 0.95
es <- effectsize::eta_squared(model4, partial = F)

round(es$Eta2[1], 2)
## [1] 0.03
model5 <-
  lm(form_smean ~
     plev * vsex +
     form_rmean,
   data = data_subset)

emmeans(model5, ~plev)
##  plev   emmean       SE  df lower.CL upper.CL
##    -1 1128.806 9.418957 156 1110.201 1147.411
##     1 1129.384 9.240729 156 1111.131 1147.637
## 
## Results are averaged over the levels of: vsex 
## Confidence level used: 0.95
es <- effectsize::eta_squared(model5, partial = F)

round(es$Eta2[1], 2)
## [1] 0
model6 <-
  lm(form_svar ~
     plev * vsex +
     form_rvar,
   data = data_subset)

emmeans(model6, ~plev)
##  plev   emmean       SE  df lower.CL upper.CL
##    -1 43654.54 1540.799 156 40611.01 46698.06
##     1 42170.78 1510.769 156 39186.58 45154.98
## 
## Results are averaged over the levels of: vsex 
## Confidence level used: 0.95
es <- effectsize::eta_squared(model6, partial = F)

round(es$Eta2[1], 2)
## [1] 0

Inferential statistics

The impact of hierarchical rank on speakers’ acoustic cues. Each of the six hierarchy-based (i.e., postmanipulation) acoustic variables was submitted to a 2 (condition: high rank, low rank) × 2 (speaker’s sex: female, male) between-subjects analysis of covariance, controlling for the corresponding baseline acoustic variable. […] Condition had a significant effect on pitch, pitch variability, and loudness variability. Speakers’ voices in the high-rank condition had higher pitch, F(1, 156) = 4.48, p < .05; were more variable in loudness, F(1, 156) = 4.66, p < .05; and were more monotone (i.e., less variable in pitch), F(1, 156) = 4.73, p < .05, compared with speakers’ voices in the low-rank condition (all other Fs < 1; see the Supplemental Material for additional analyses of covariance involving pitch and loudness).

# reproduce the above results here

Higher pitch:

infmodel1 <-
  aov(pitch_smean ~
        plev * vsex +
        pitch_rmean,
      data = data_subset)

print <- car::Anova(infmodel1, type = "III")

print[c(2,6),]
## Anova Table (Type III tests)
## 
## Response: pitch_smean
##            Sum Sq  Df F value  Pr(>F)  
## plev        380.1   1  4.4837 0.03581 *
## Residuals 13223.4 156                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

More monotone/less variable in pitch:

infmodel2 <-
  aov(pitch_svar ~
        plev * vsex +
        pitch_rvar,
      data = data_subset)

print <- car::Anova(infmodel2, type = "III")

print[c(2,6),]
## Anova Table (Type III tests)
## 
## Response: pitch_svar
##             Sum Sq  Df F value Pr(>F)  
## plev       2003882   1   4.727 0.0312 *
## Residuals 66131879 156                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

More variable in loudness:

infmodel4 <-
  aov(intense_svar ~
        plev * vsex +
        intense_rvar,
      data = data_subset)

print <- car::Anova(infmodel4, type = "III")

print[c(2,6),]
## Anova Table (Type III tests)
## 
## Response: intense_svar
##           Sum Sq  Df F value  Pr(>F)  
## plev        7042   1   4.662 0.03236 *
## Residuals 235630 156                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All other Fs are p > .05:

No difference in loudness:

infmodel3 <-
  aov(intense_smean ~
        plev * vsex +
        intense_rmean,
      data = data_subset)

print <- car::Anova(infmodel3, type = "III")

print[c(2,6),]
## Anova Table (Type III tests)
## 
## Response: intense_smean
##            Sum Sq  Df F value Pr(>F)
## plev        17.93   1  2.2118  0.139
## Residuals 1264.30 156

No difference in resonance:

infmodel5 <-
  aov(form_smean ~
        plev * vsex +
        form_rmean,
      data = data_subset)

print <- car::Anova(infmodel5, type = "III")

print[c(2,6),]
## Anova Table (Type III tests)
## 
## Response: form_smean
##            Sum Sq  Df F value Pr(>F)
## plev           13   1  0.0019 0.9651
## Residuals 1089894 156

No difference in resonance variability:

infmodel6 <-
  aov(form_svar ~
        plev * vsex +
        form_rvar,
      data = data_subset)

print <- car::Anova(infmodel6, type = "III")

print[c(2,6),]
## Anova Table (Type III tests)
## 
## Response: form_svar
##               Sum Sq  Df F value Pr(>F)
## plev      8.7633e+07   1  0.4705 0.4938
## Residuals 2.9055e+10 156

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

I was able to reproduce most of the main results, but not all of the R^2 effect sizes (see model1 and model2 in descriptives). I don’t think the values that the original authors reported in Table 4 are eta-squared effect sizes. It looks they’re partial eta-squared, at least the output matches more closely in that case, though it’s still not exactly what it’s supposed to be. I would have gone beyond 3 hours so I stopped.

How difficult was it to reproduce your results?

This data set was quite annoying to deal with. It wasn’t really clear which variable is which. Also, it would have been a huge effort to transform only the relevant outputs into a compact table format (like in the paper) within the 3 hours, so I did not do that part.

What aspects made it difficult? What aspects made it easy?

(See above)