Introduction

In their 2012 study, Livingston, Rosette, and Washington found that, contrary to commonly held beliefs and longstanding findings in the literature on women and leadership, there are some circumstances under which certain women may not face the so-called “agency penalty” for acting dominantly, a stereotype-incongruent type of behavior. They suggest that this agency penalty is moderated by race such that while White women (and Black men) experience a penalty for acting dominantly, Black women (and White men) do not. Further, they propose that this effect may be a function of Black women’s unique position in the social hierarchy as individuals with two subordinate identities. That is, the prescriptive and proscriptive norms that Black women face may differ from those faced by White women or Black men, with whom they share certain identities. Black women may face weaker proscriptive norms against dominant behaviors than White women or Black men, and the prescriptive norm for communal behaviors may be weaker for Black women than for White women.

Using a simple experimental design, Livingston et al. recruited 84 participants for an online experiment. Their main finding was a significant three-way interaction between target race (White vs. Black), target gender (male vs. female), and target behavior (dominant vs. communal) on leader status evaluations, F(1, 76) = 11.53, p < .001, \(\eta^2\) = .13. Among the female targets, they found a significant two-way interaction, F(1, 35) = 4.77, p < .04, \(\eta^2\) = .12, such that while White women were punished (rated less favorably) for engaging in dominant behaviors, Black women were not. They also found a significant two-way interaction among the male targets, F(1, 41) = 7.16, p < .02, \(\eta^2\) = .15, such that while Black men were punished (rated less favorably) for engaging in dominant behaviors, White men were not. Further, Livingston et al. found a significant three-way interaction in how participants tended to make attributions about target behaviors, F(1, 75) = 7.87, p < .006, \(\eta^2\) = .10. Among female targets, the two-way interaction was marginally significant, F(1, 34) = 2.95, p < .10, \(\eta^2\) = .08, such that for White women, dominant behaviors were attributed more to the person than to the situation when compared with communal behaviors, but this difference was not significant for Black women. For men, they found a significant effect of race on attributions, t(20) = 2.24, p < .04, such that participants tended to attribute dominant behaviors more to the person than to the situation for Black targets than for White targets. Finally, they found that among Black targets, there was a marginally significant effect of gender, t(18) = 1.86, p < .08, such that participants tended to attribute dominant behaviors more to the person than to the situation for Black men than for Black women.
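As a quick consistency check on the statistics above, the reported effect sizes can be recovered from the F ratios and their degrees of freedom. The sketch below assumes that the reported \(\eta^2\) values are partial eta squared, computed as F × df1 / (F × df1 + df2); this is our assumption rather than something stated in the summary above, and the helper name `partial_eta_sq` is ours.

```r
# Hypothetical helper (not from the original article): partial eta squared
# recovered from an F ratio and its degrees of freedom.
partial_eta_sq <- function(f_value, df1, df2) {
  (f_value * df1) / (f_value * df1 + df2)
}

# F tests on leader status evaluations, as reported above
reported <- data.frame(
  test = c("three-way", "two-way, female targets", "two-way, male targets"),
  f_value = c(11.53, 4.77, 7.16),
  df1 = c(1, 1, 1),
  df2 = c(76, 35, 41),
  eta_reported = c(.13, .12, .15)
)

# Recompute the effect size for each test and show it next to the reported one
reported$eta_recomputed <- with(reported, partial_eta_sq(f_value, df1, df2))
print(reported)
```

Under that assumption, the recomputed values for these three tests round to the reported ones.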

Figure 1. Plot of original findings reported by Livingston, Rosette, & Washington (2012)

Methods

Power Analysis

Livingston et al. reported an effect size for the main three-way interaction on leader status evaluations of \(\eta^2\) = .13. With the original sample size (N = 84), this effect size corresponds to 93.8% statistical power. Assuming that the effect size Livingston et al. found is the true effect size, this estimate of power seems plausible, even with the small sample. Using their effect size and our desired 95% power, we found that we would need approximately 90 participants split across the 8 conditions to detect the effect. To be conservative, we decided to use twice the original sample size as our final sample size (N = 168). If Livingston et al.’s effect size is the true size of the effect, this number of participants should be more than enough to detect it. The larger sample also gives us a margin of safety: if the true effect size is smaller than what Livingston et al. found, we may still be able to detect the effect, if it exists. Further, the sample is a reasonable size and, given that the survey takes approximately 3 minutes to complete, should not be too costly to collect.
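The power figures above can be approximated in base R from the noncentral F distribution. This is a sketch under assumptions that the text does not state: that \(\eta^2\) = .13 is a partial eta squared, that it converts to Cohen's f via f² = η² / (1 − η²), that the noncentrality parameter is f² × N, and that the interaction test has 1 numerator and N − 8 denominator degrees of freedom. The function name `power_interaction` is ours, and this is not necessarily how the original calculation was run.

```r
# Sketch: power for a 1-df interaction in a 2 x 2 x 2 between-subjects design.
power_interaction <- function(n_total, eta_sq = .13, n_cells = 8, alpha = .05) {
  f_sq <- eta_sq / (1 - eta_sq)       # Cohen's f squared (assumed conversion)
  ncp <- f_sq * n_total               # noncentrality parameter (assumed f^2 * N)
  df1 <- 1                            # 1-df three-way interaction
  df2 <- n_total - n_cells            # residual degrees of freedom
  f_crit <- qf(1 - alpha, df1, df2)   # critical F under the null hypothesis
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

power_original <- power_interaction(84)    # original sample size
power_planned <- power_interaction(168)    # planned sample size

# Smallest total N that reaches 95% power under these assumptions
candidate_n <- 20:300
n_for_95 <- candidate_n[which(sapply(candidate_n, power_interaction) >= .95)[1]]

print(round(c(original = power_original, planned = power_planned), 3))
print(n_for_95)
```

If these assumptions match the original calculation, the power at N = 84 should come out close to the 93.8% reported above, and the required N should land near 90.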

Planned Sample

The planned sample will be drawn from Amazon Mechanical Turk (mTurk). As Livingston et al. used mTurk or a similar pool as a “nationally representative” pool, so shall we. No participants will be excluded on the basis of race or gender. Our version of the study includes a manipulation check that may not have been included in the original study: a question asking whether participants attended to the manipulated race of the target. Participants who answer this question incorrectly will be excluded from analyses. As stated above, we will aim for a sample size of N = 168.

Materials

The following is the passage participants will read.

The instructions page will read:

On the following page, you will read a brief biography and account of a workplace encounter. Please take a moment to read and consider it carefully. Take your time. You will be asked to answer a series of questions about this passage at a later time.

Then the vignette will read:

[Molly/Aliyah/Mark/Darnell]* Johnson is a Senior Vice President at a major consulting firm, where [she/he] has worked for twenty years. [She/He] obtained [her/his] BA in Economics from University of Southern California and [her/his] MBA from London Business School. [She/he is currently an active member of the National Association of Business Executives./She/he is currently an active member of the National Association of Black Business Executives.] The company sets forth as its mission: “To provide our clients with the highest quality services to address their business needs. We do so by recruiting and retaining the most diverse, passionate, and knowledgeable professionals and by providing a collaborative and open work environment that enables our employees to thrive.” Values that the company emphasizes include honesty and integrity, intellectual rigor, accountability, and the pursuit of excellence.

Recently, [Molly/Aliyah/Mark/Darnell] Johnson’s group nearly missed a deadline on a major project they were working on for an important client, with whom the company has had a longstanding relationship. [Ms./Mr.] Johnson discovered that this potentially costly mistake was largely due to one of [her/his] subordinates. When [she/he] called this employee into [her/his] office to discuss this matter, [she/he] issued reprimands, saying, [“I demand that you take steps to improve your performance.”/“I encourage you to take steps to improve your performance.”]

When asked by [her/his] peers about how [she/he] was dealing with this mistake, [she/he] stated, [“I am a tough, determined boss and intend to do everything in my power to ensure that my employees’ performance improves.”/“I am a caring, committed boss and intend to do everything in my power to ensure that my employees’ performance improves.”]

*Molly and Mark are intended to be interpreted as White, as these names are more stereotypically White. Aliyah and Darnell are intended to be interpreted as Black, as these names are more stereotypically Black. Further cues to race are included in the business executive organizations targets are members of.

We were unable to obtain the original materials used in the 2012 study, as the author who was in possession of the materials could not be reached. However, the above materials were synthesized using details gleaned from the original study and other similar studies that Livingston and colleagues have run, either together or individually, and represent a conceptual replication of the original study.

Final survey to be administered can be found here: https://stanfordgsb.qualtrics.com/SE/?SID=SV_3UTmvTzvBqY0TXv

Procedure

Quoted from the original article, the procedure is as follows (with certain modifications noted):

Participants were randomly assigned to one of eight conditions in a 2 (leader’s race: White vs. Black) × 2 (leader’s gender: male vs. female) × 2 (leader’s behavior: dominant vs. communal) between-subjects design. All participants were shown a description and photograph of a fictitious senior vice president who worked for a Fortune 500 company. The photographs were matched on perceived age, attractiveness, and “babyfaceness,” [1] and the descriptions included identical information about the leader’s education, company tenure, and leadership mission. The materials described a meeting between the leader and a subordinate employee who did not meet the company’s expectations. Dominant leaders were described as communicating their disappointment by demanding action (i.e., “I demand that you take steps to improve your performance”) and expressing assertiveness (i.e., “I am a tough, determined boss and intend to do everything in my power to ensure that your performance improves”). Communal leaders were described as encouraging the subordinate (i.e., “I encourage you to take steps to improve your performance”) and communicating compassion (i.e., “I am a caring, committed boss and intend to do everything in my power to ensure that your performance improves”).

Participants rated the leader on the following questions: “How well do you think the leader handled the situation with the employee?” “How effective is the leader at maximizing the employee’s performance?” “How much do you think the leader is admired by his or her employees?” and “How respected is this leader by the other executives at the company?” (Cronbach’s α = .89). We also assessed participants’ expectations regarding the leader’s salary by asking them to indicate what they thought the leader’s annual salary should be, using a scale ranging from 1 ($100,000) to 9 ($500,000). We combined these variables into a single composite score assessing the leader’s status in the organization.

Finally, we assessed attributions for the leader’s behavior using the following item: “How much does the leader’s reaction reflect something about his/her personality versus something about the situation?” (1 = definitely the personality, 7 = definitely the situation). If dominance is proscribed for Black men and White women, then internal attributions should be higher for Black men and White women who “break the rules” by behaving dominantly than for Black men and White women who behave normatively by adhering to prescribed stereotypes. [2]

[1] In our experiment, we used only a text manipulation: target names, together with mention of membership in an organization for either Black executives or executives in general, served as the manipulation of perceived race. Previous research has found that this type of manipulation is often sufficient to manipulate participants’ perceptions of race (e.g. see Bertrand & Mullainathan, 2004), and we expected that the organization for Black executives would cue race effectively even if the name alone did not. Because we were not able to obtain the original materials, we did not want to introduce more confounds by selecting our own faces. We did not know where the original authors obtained these stimuli or what they looked like, beyond the fact that they were matched on age, attractiveness, and babyfaceness (the authors did not mention, for example, what age they were matched at). To achieve a cleaner manipulation, we simply used names and organization memberships to manipulate race.

[2] Although the authors did not explicitly state this, we assume that they collected relevant demographic information, including but not limited to participant race and gender. In addition to these two variables, we also asked about participant age, educational background, work experience, social liberalism or conservatism, and economic liberalism or conservatism. While the authors did not report controlling for any of these demographic factors, it is possible that they affect how targets are evaluated, especially political beliefs. Therefore, we include them so that we can run additional analyses beyond the main ones conducted by Livingston et al.

Analysis Plan

Livingston et al. (2012) did not specify what exclusion criteria, if any, they used, or any rules they followed in data cleaning. For our purposes, we will exclude all participants who fail the manipulation check (the question that asks about perceived target race).

Our main analysis will be examining the three-way interaction between target race, target gender, and target behavior on leader status evaluations. The original authors used an ANOVA to examine this interaction, but we will be using a linear model, which conceptually achieves the same thing and can be used more flexibly in R. We will be using the appropriate linear models throughout our analyses where the original authors used ANOVAs or t-tests.
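To illustrate why a linear model can stand in for the original ANOVA, the sketch below fits one model to simulated data with the same 2 × 2 × 2 structure and reads the interaction test both ways. The data are random noise generated purely for illustration (21 hypothetical participants per cell), not results; the variable names mirror those in our data preparation code. For a 1-df interaction, the F statistic from the ANOVA table equals the square of the t statistic for the corresponding `lm` coefficient.

```r
set.seed(254)  # arbitrary seed, for reproducibility only

# Simulated balanced design: 8 cells x 21 hypothetical participants = 168 rows
sim <- expand.grid(
  T_race = c("white", "black"),
  T_gender = c("male", "female"),
  T_behavior = c("agentic", "communal"),
  participant = 1:21
)
sim$leadev <- rnorm(nrow(sim), mean = 3, sd = 1)  # pure noise, no true effect

fit <- lm(leadev ~ T_race * T_gender * T_behavior, data = sim)

# F test for the three-way interaction from the ANOVA table
f_interaction <- anova(fit)["T_race:T_gender:T_behavior", "F value"]

# t test for the same term from the regression summary (last coefficient)
coefs <- coef(summary(fit))
t_interaction <- coefs[nrow(coefs), "t value"]

print(c(F = f_interaction, t_squared = t_interaction^2))
```

The two printed values agree up to floating-point error, which is the sense in which the two approaches test the same three-way interaction.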

In additional, exploratory analyses that are not meant to speak to the replicability of the original finding but perhaps may extend the original finding, we will also factor in political liberalism/conservatism as a covariate in our analyses after running the main analyses.
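A minimal sketch of how this exploratory step could look: add the political-orientation items as covariates and compare the nested models. The data are simulated noise, the 1 to 7 response range for `pol_s` and `pol_e` is an assumption, and entering both items as separate linear covariates is only one of several reasonable specifications.

```r
set.seed(2012)  # arbitrary seed, for reproducibility only

# Simulated balanced design with hypothetical covariate responses
sim <- expand.grid(
  T_race = c("white", "black"),
  T_gender = c("male", "female"),
  T_behavior = c("agentic", "communal"),
  participant = 1:21
)
n <- nrow(sim)
sim$pol_s <- sample(1:7, n, replace = TRUE)  # social liberalism/conservatism (assumed 1-7)
sim$pol_e <- sample(1:7, n, replace = TRUE)  # economic liberalism/conservatism (assumed 1-7)
sim$leadev <- rnorm(n, mean = 3, sd = 1)     # pure noise, no true effect

# Confirmatory model, then the same model with the covariates added
fit_main <- lm(leadev ~ T_race * T_gender * T_behavior, data = sim)
fit_cov <- update(fit_main, . ~ . + pol_s + pol_e)

# Nested-model F test: do the covariates explain additional variance?
model_comparison <- anova(fit_main, fit_cov)
print(model_comparison)
```

Running the confirmatory model first and adding covariates afterwards keeps the replication test separate from the exploratory extension.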

Differences from Original Study

The differences from the original study reported here are assumed, but it is difficult to tell how similar or different this study is from the original, as we don’t have the original materials. However, we have made every attempt to keep this study as conceptually close to the original as possible.

One difference is that our study manipulates race through text alone: we use the name manipulation and add a second text manipulation (membership in a business executive organization), and we do not use the photograph manipulation, as noted above. Again, this is because we were unsure what pictures were used and where they came from (or whether the experimenters took the photographs themselves), and we did not want to introduce too many differences into the experiment. This change is not anticipated to have a large impact on the results, as name manipulations have previously been shown to be effective in manipulating perceived race (e.g. Bertrand & Mullainathan, 2004). The photographs may therefore not be necessary to achieve the desired effect and, in our case, might do more harm than good. As such, we decided that it would be a cleaner manipulation to include only names and organization memberships rather than names and photographs.

Of course, another difference is the passage itself. As much as possible, we used the exact wording used in the original study (e.g. the parts quoted in the article itself). We also made sure to include each part mentioned in the article, using the article itself as well as other studies carried out by Livingston and colleagues, together or individually, to guide our reconstruction of the passage’s content. Although the wording and content of this passage will inevitably differ from the materials used in the original paper, we believe that the passage is a fairly good conceptual replication of the original passage, meaning that it likely manipulates the same things and should produce the same effect, provided the effect is real.

Other differences are minor and mostly technical. We added some demographic questions that may or may not have been asked in the original study, but this should not affect the main effect of interest because these questions are not manipulations and are asked at the end of the experiment. We also included a timer on the survey page containing the passage so that participants spend at least 30 seconds on that page (they may spend longer if necessary). This ensures that participants have enough time to read the passage and should not change the results, as its only purpose is to make the manipulation effective. Finally, we included a manipulation check, which Livingston et al. did not report using. This manipulation check examines whether participants attended to the manipulated race of the target and should not affect the main effect of interest, as it is asked after the main measures are administered.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

###Data Preparation

####Load Relevant Libraries and Functions
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(ggplot2)

####Import data
#d = read.csv("~/Desktop/PSYC254/livingston2012/Pilot_A.csv")

#### Data exclusion / filtering
#d.tidy = d %>%
#  filter(manipcheck == 0) %>%
#  dplyr::select(T_race, T_gender, T_behavior, lead_well, lead_max, lead_admire, lead_respect, salary, PvsS, race, age, gender,educ, workex,pol_s, pol_e) 

#### Prepare data for analysis - create columns etc.
#d.tidy$T_race = as.factor(d.tidy$T_race)
#levels(d.tidy$T_race)[1] = "white"
#levels(d.tidy$T_race)[2] = "black"
#d.tidy$T_gender = as.factor(d.tidy$T_gender)
#levels(d.tidy$T_gender)[1] = "male"
#levels(d.tidy$T_gender)[2] = "female"
#d.tidy$T_behavior = as.factor(d.tidy$T_behavior)
#levels(d.tidy$T_behavior)[1] = "agentic"
#levels(d.tidy$T_behavior)[2] = "communal"

#d.tidy$lead_well = as.numeric(d.tidy$lead_well) 
#d.tidy$lead_max = as.numeric(d.tidy$lead_max)
#d.tidy$lead_admire = as.numeric(d.tidy$lead_admire)
#d.tidy$lead_respect = as.numeric(d.tidy$lead_respect)
#d.tidy$salary = as.numeric(d.tidy$salary)
#d.tidy$PvsS = as.numeric(d.tidy$PvsS)

#d.tidy$leadev = rowMeans(d.tidy[,4:8]) #make composite 
#lead = matrix(c(d.tidy$lead_well, d.tidy$lead_max, d.tidy$lead_admire, d.tidy$lead_respect,d.tidy$salary), ncol=5)
#alpha(lead) 

#d.women = d.tidy %>%
#  filter(T_gender == "female")

#d.men = d.tidy %>%
#  filter(T_gender == "male")

Confirmatory analysis

The analyses as specified in the analysis plan.

#hist(d.tidy$leadev)
#hist(d.tidy$PvsS)

#summary(lm(leadev ~ T_race*T_gender*T_behavior, d.tidy)) #main 3 way interaction (leader evaluations)

#summary(lm(leadev ~ T_race*T_behavior, d.women)) #2 way for women (leader evaluations)

#summary(lm(leadev ~ T_race*T_behavior, d.men)) #2 way for men (leader evaluations)

#summary(lm(PvsS ~ T_race*T_gender*T_behavior,d.tidy)) #3 way interaction (attribution of behavior)
#summary(lm(PvsS ~ T_race*T_behavior,d.women)) #2 way for women (attribution of behavior)
#summary(lm(PvsS ~ T_race*T_behavior,d.men)) #2 way for men (attribution of behavior)
#summary(lm(PvsS ~ T_race,d.tidy)) #main effect of race

Comparison of main findings with original study.

original = data.frame(
  race = factor(c("White","White", "Black", "Black","White","White", "Black", "Black"), levels=c("White","Black")),
  gender = factor(c("Male", "Male","Male","Male", "Female","Female","Female","Female"), levels=c("Male","Female")),
  behavior = factor(c("Dominant", "Communal"), levels=c("Dominant","Communal")),
  leadev = c(3.11,3.25,1.97,4.06,2.23,3.85,3.05,3.32),
  sd = c(1.36,1.24,.98,1.27,.7,1.24,.8,1.02)
)

ggplot(data=original, aes(x=race, y = leadev, fill=behavior, ymin=leadev - .5*sd, ymax=leadev + .5*sd)) +
  facet_wrap(~gender)+ 
  geom_bar(position="dodge", stat="identity") + 
  geom_errorbar(position="dodge") +
  labs(title = "Original plot of main finding from Livingston et al. (2012)",
       x = "Leader type", 
       y = "Leader status")

Original means and standard deviations of leader status score as function of leader’s race, gender, and behavior – as reported in Livingston et al., 2012. Error bars extend half a standard deviation above and below each mean.

#d.tidy2 = d.tidy %>%
#  group_by(T_behavior,T_race,T_gender) %>% 
#  summarise(m_leadev = mean(leadev),
#            sd_leadev = sd(leadev)) #compute both in one call; a second summarise() would no longer see leadev

#ggplot(data=d.tidy2, aes(x=T_race, y = m_leadev,fill=T_behavior, ymin=(m_leadev - .5*sd_leadev), ymax=(m_leadev + .5*sd_leadev))) +
#  facet_wrap(~T_gender)+ 
#  geom_bar(position="dodge", stat="identity") + 
#  geom_errorbar(position="dodge") +
#  labs(title = "Results from replication of Livingston et al. (2012)",
#       x = "Leader type", 
#       y = "Leader status")

Means and standard deviations of leader status score as function of leader’s race, gender, and behavior – as found in our replication. Error bars extend half a standard deviation above and below each mean.

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.