For this exercise, please try to reproduce the results from Experiment 1 of the associated paper (Ko, Sadler & Galinsky, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

A sense of power has often been tied to how we perceive each other’s voice. Social hierarchy is embedded in the structure of society and provides a metric by which people relate to one another. In 1956, the Brunswik Lens Model was introduced to examine how vocal cues might influence hierarchy. In “The Sound of Power: Conveying and Detecting Hierarchical Rank Through Voice,” Ko and colleagues investigated how manipulating hierarchical rank within a situation might affect vocal acoustic cues. Following the Brunswik Model, they used six acoustic metrics (pitch mean and variability, loudness mean and variability, and resonance mean and variability) to isolate potential differences between individuals of different hierarchical rank. In the first experiment, Ko, Sadler & Galinsky examined the vocal acoustic cues of individuals before and after being assigned a hierarchical rank in a sample of 161 subjects (80 male). Each of the six hierarchy acoustic cues was analyzed with a 2 (high vs. low rank condition) x 2 (male vs. female) analysis of covariance, controlling for the baseline of the respective acoustic cue.

Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 1):

The impact of hierarchical rank on speakers’ acoustic cues. Each of the six hierarchy-based (i.e., postmanipulation) acoustic variables was submitted to a 2 (condition: high rank, low rank) × 2 (speaker’s sex: female, male) between-subjects analysis of covariance, controlling for the corresponding baseline acoustic variable. Table 4 presents the adjusted means by condition. Condition had a significant effect on pitch, pitch variability, and loudness variability. Speakers’ voices in the high-rank condition had higher pitch, F(1, 156) = 4.48, p < .05; were more variable in loudness, F(1, 156) = 4.66, p < .05; and were more monotone (i.e., less variable in pitch), F(1, 156) = 4.73, p < .05, compared with speakers’ voices in the low-rank condition (all other Fs < 1; see the Supplemental Material for additional analyses of covariance involving pitch and loudness). (from Ko et al., 2015, p. 6; emphasis added)

The adjusted means for these analyses are reported in Table 4 (Table4_AdjustedMeans.png, included in the same folder as this Rmd file).
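
One conventional way to write the ANCOVA described above is sketched here. The paper does not spell out its parameterisation, so the symbols are assumptions introduced for illustration: $Y$ is the post-manipulation cue, $X$ the corresponding baseline cue, $i$ indexes the rank condition, $j$ the speaker's sex, and $k$ the speaker within a cell.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma\,(X_{ijk} - \bar{X}) + \varepsilon_{ijk}
```

This model estimates five parameters (intercept, rank, sex, their interaction, and the baseline slope), so with 161 speakers the residual degrees of freedom are 161 - 5 = 156, which agrees with the F(1, 156) reported above.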

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(magrittr) # pipes

# #optional packages:
# library(psych)
library(car) # for ANCOVA
library(compute.es) # for ANCOVA
library(emmeans) # for ANCOVA

Step 2: Load data

# Just Experiment 1
d <- read_csv("data/S1_voice_level_Final.csv")
# DT::datatable(d)

Step 3: Tidy data

## drop unused columns and give the rest more intuitive names
filteredD <- d %>%
  select(-contains("MD"), -contains("Z"), -vsex) %>%
  rename(participant = voice,
         powerManipulation = pow,
         hierarchyRank = plev,
         subjectivePower = feelpower)

names(filteredD) %<>% gsub('(_[sr])([A-Za-z]+)', '\\2\\1', .) %>%
  gsub('mean', 'Mean', .) %>% gsub('var', 'Var', .) %>% 
  gsub('intense', 'loudness', .) %>% gsub('form', 'resonance', .)

## coerce vars to factors
filteredD$hierarchyRank <- as_factor(dplyr::recode(filteredD$hierarchyRank, "1" = "high", "-1" = "low"))
filteredD$sex %<>% as_factor()
filteredD$race %<>% as_factor()
filteredD$native %<>% as_factor()

## gather observed variables
filteredLongD <- filteredD %>%
  pivot_longer(contains("_"), names_to = c(".value", "stimulus"), names_pattern = "(.+)_(.)")
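
The renaming and reshaping steps above depend on two regular expressions that are easy to misread. The sketch below traces them on a handful of hypothetical column names (the real names in the data file may differ) using base R only, so it can be run without the data.

```r
# Hypothetical raw column names; the real names in the data file may differ.
rawNames <- c("pitch_rmean", "pitch_smean", "intense_rvar", "form_svar")

# Step 1: move the "_r"/"_s" stimulus tag from the middle to the end of the name.
tidyNames <- gsub("(_[sr])([A-Za-z]+)", "\\2\\1", rawNames)

# Step 2: capitalise the statistic and swap in descriptive cue names.
tidyNames <- gsub("mean", "Mean", tidyNames)
tidyNames <- gsub("var", "Var", tidyNames)
tidyNames <- gsub("intense", "loudness", tidyNames)
tidyNames <- gsub("form", "resonance", tidyNames)
print(tidyNames)
# "pitchMean_r" "pitchMean_s" "loudnessVar_r" "resonanceVar_s"

# Step 3: the names_pattern "(.+)_(.)" splits each name into the measure
# (first group) and the one-letter stimulus tag (second group).
measure  <- sub("(.+)_(.)", "\\1", tidyNames)
stimulus <- sub("(.+)_(.)", "\\2", tidyNames)
print(data.frame(tidyNames, measure, stimulus))
```

After the pivot, each participant therefore contributes one row per stimulus tag, with one column per measure.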

Step 4: Run analysis

Pre-processing

## no other pre-processing

Descriptive statistics

In the paper, the adjusted means by condition are reported (see Table 4, or Table4_AdjustedMeans.png, included in the same folder as this Rmd file). Reproduce these values below:

The original paper does not specify how the means were “adjusted”, beyond indicating that the adjustment relates to the “corresponding baseline acoustic variable”. I was therefore unable to reproduce their adjustment; the closest I could get is a similar table of unadjusted means.

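## note: filteredLongD has one row per participant per stimulus (s and r),
## so these group means pool both recordings
summaryD <- filteredLongD %>% 
  group_by(hierarchyRank) %>%
  summarise(across(c(contains("Mean"), contains("Var")), mean, .names = "avg{.col}"), .groups = "drop")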

tempTable <- t(summaryD)
summaryTable <- data.frame(tempTable[-1,])
names(summaryTable) <- as.character(unlist(tempTable[1,]))
row.names(summaryTable) <- c("Resonance", "Loudness", "Pitch", 
                             "Resonance variability", "Loudness variability", "Pitch variability")

print(summaryTable)

##                           high      low
## Resonance             1208.677 1213.288
## Loudness              58.44162 58.03299
## Pitch                 151.4545 155.2655
## Resonance variability 52611.38 54465.90
## Loudness variability  191.6928 180.9865
## Pitch variability     1559.008 1734.349
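
For context, one common reading of "adjusted means" in an ANCOVA is the model's predicted group means with the covariate held at its overall mean, averaged over the other factor. Whether the authors used exactly this definition is an assumption; the paper does not say. The sketch below illustrates the idea on simulated stand-in data (none of the numbers come from the paper) with base R only, reusing this report's column names `r`, `s`, `sex`, and `hierarchyRank`.

```r
set.seed(1)

# Simulated stand-in data; none of these values come from the paper.
n <- 160
toy <- data.frame(
  hierarchyRank = factor(rep(c("high", "low"), each = n / 2)),
  sex           = factor(rep(c("female", "male"), times = n / 2)),
  s             = rnorm(n, mean = 150, sd = 20)  # baseline cue
)
# Post-manipulation cue depends on baseline, rank, and sex, plus noise.
toy$r <- 10 + 0.9 * toy$s +
  ifelse(toy$hierarchyRank == "high", 3, 0) +
  ifelse(toy$sex == "male", -5, 0) +
  rnorm(n, sd = 8)

# Same model structure as the ANCOVA in this report.
fit <- lm(r ~ s + sex * hierarchyRank, data = toy)

# Predict every rank-by-sex cell with the baseline fixed at its grand mean.
grid <- expand.grid(
  hierarchyRank = levels(toy$hierarchyRank),
  sex           = levels(toy$sex),
  s             = mean(toy$s)
)
grid$predicted <- predict(fit, newdata = grid)

# Adjusted mean per rank: average the cell predictions over sex, equally weighted.
adjustedMeans <- sapply(split(grid$predicted, grid$hierarchyRank), mean)

# Unadjusted means for comparison.
rawMeans <- sapply(split(toy$r, toy$hierarchyRank), mean)

print(round(rbind(adjusted = adjustedMeans, unadjusted = rawMeans), 2))
```

With the `emmeans` package loaded in Step 1, `emmeans(fit, ~ hierarchyRank)` should compute the same kind of quantity, since its defaults also hold the covariate at its mean and weight the levels of the other factor equally.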

Inferential statistics

The impact of hierarchical rank on speakers’ acoustic cues. Each of the six hierarchy-based (i.e., postmanipulation) acoustic variables was submitted to a 2 (condition: high rank, low rank) × 2 (speaker’s sex: female, male) between-subjects analysis of covariance, controlling for the corresponding baseline acoustic variable. […] Condition had a significant effect on pitch, pitch variability, and loudness variability. Speakers’ voices in the high-rank condition had higher pitch, F(1, 156) = 4.48, p < .05; were more variable in loudness, F(1, 156) = 4.66, p < .05; and were more monotone (i.e., less variable in pitch), F(1, 156) = 4.73, p < .05, compared with speakers’ voices in the low-rank condition (all other Fs < 1; see the Supplemental Material for additional analyses of covariance involving pitch and loudness).

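# reproduce the above results here
## reshape to one row per participant per acoustic variable,
## with the two stimuli (s and r) as separate columns
filteredWideD <- filteredLongD %>%
  pivot_longer(c(contains("Mean"), contains("Var")),
               names_to = "variable", values_to = "value") %>%
  pivot_wider(names_from = stimulus, values_from = value)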

varVector <- c("pitchMean", "pitchVar", "loudnessMean", "loudnessVar", "resonanceMean", "resonanceVar")
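fCollated <- numeric(length(varVector))
pCollated <- numeric(length(varVector))

for (varNo in seq_along(varVector)) {
  ## r is the outcome and s the covariate in the ANCOVA below
  thisVar <- filteredWideD %>% filter(variable == varVector[varNo])
  ancovaFit <- aov(r ~ s + sex * hierarchyRank, data = thisVar)
  thisAncova <- Anova(ancovaFit, type = "III")
  ## pull the hierarchyRank main effect (row 4 of the Anova table)
  fCollated[varNo] <- thisAncova["hierarchyRank", "F value"]
  pCollated[varNo] <- thisAncova["hierarchyRank", "Pr(>F)"]
}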

ancovaCollated <- as.data.frame(cbind(fCollated, pCollated))
names(ancovaCollated) <- c("F value", "p value")
row.names(ancovaCollated) <- varVector

print(ancovaCollated)

##                  F value   p value
## pitchMean     0.07133618 0.7897539
## pitchVar      1.34311723 0.2482559
## loudnessMean  0.50957403 0.4763900
## loudnessVar   0.15195632 0.6972045
## resonanceMean 0.68651983 0.4086165
## resonanceVar  2.20562476 0.1395261
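
One detail that can change Type III results is how the factors are coded. The `car::Anova` documentation cautions that Type III tests are only meaningful with sum-to-zero (or similar) contrasts, whereas R defaults to treatment contrasts, under which a "main effect" in a model with an interaction is really the effect at the reference level of the other factor. Whether this accounts for any of the discrepancy above is untested. The sketch below only illustrates the mechanism on simulated stand-in data, using base R's `drop1()` to obtain marginal F tests in place of `Anova()`.

```r
set.seed(2)

# Simulated, deliberately unbalanced stand-in data (not the paper's data).
n <- 161
toy <- data.frame(
  hierarchyRank = factor(sample(c("high", "low"), n, replace = TRUE)),
  sex           = factor(sample(c("female", "male"), n, replace = TRUE)),
  s             = rnorm(n, mean = 150, sd = 20)
)
toy$r <- 10 + 0.9 * toy$s +
  ifelse(toy$hierarchyRank == "high", 3, 0) +
  ifelse(toy$sex == "male" & toy$hierarchyRank == "high", 6, 0) +
  rnorm(n, sd = 8)

# Same model, fitted under two different factor codings.
fitTreatment <- lm(r ~ s + sex * hierarchyRank, data = toy,
                   contrasts = list(sex = "contr.treatment",
                                    hierarchyRank = "contr.treatment"))
fitSum <- lm(r ~ s + sex * hierarchyRank, data = toy,
             contrasts = list(sex = "contr.sum",
                              hierarchyRank = "contr.sum"))

# Marginal F test for every term; with sum-to-zero contrasts these
# correspond to Type III tests.
treatmentTests <- drop1(fitTreatment, scope = . ~ ., test = "F")
sumTests       <- drop1(fitSum, scope = . ~ ., test = "F")

# Drop the "<none>" row and compare the F values side by side.
comparison <- data.frame(
  term        = rownames(sumTests)[-1],
  F_treatment = treatmentTests[["F value"]][-1],
  F_sum       = sumTests[["F value"]][-1]
)
print(comparison)

# Residual degrees of freedom: 161 observations minus 5 parameters.
print(df.residual(fitSum))
```

In this sketch the covariate and interaction tests are unaffected by the coding, while the F values for the two main effects differ between the columns.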

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

No, unfortunately. I was not able to reproduce any of the results. Reproducing even the descriptive statistics was hampered by the paper’s lack of detail about the “adjustment” process. After running what I assumed to be the authors’ ANCOVA procedure, I could not obtain results even remotely close to those they reported, despite working on this reproduction for over three hours.

How difficult was it to reproduce your results?

Very.

What aspects made it difficult? What aspects made it easy?

The non-transparent variable names made the pre-processing difficult, and the lack of a detailed description of the analytic steps made it extremely challenging to reproduce the results. Indeed, the messiness of the input data was a minor concern compared with the lack of clear, repeatable instructions.

Reproducibility Report: Group B Choice 2