Replication of Study 3 by Porter et al. (2016, Psychological Science)

Author

Ziwen Chen (ziwench@stanford.edu)

Published

March 27, 2024

Introduction

The experiment to be replicated is Study 3 from Porter et al. (2016). The original study investigated the social cognition of linguistic intergroup bias (LIB), which is the tendency to use abstract language to describe desirable behaviors of in-group members and undesirable behaviors of out-group members, and vice versa. The study believes that the action of using LIB could be used as a cue to infer the speaker’s social identity. In general, this study is related to the fields of linguistics, categorization, and social cognition, which are relevant to my research interests in measuring culture via language.

The replicated study will follow a similar pipeline as the original study. Specifically, 168 participants will be recruited online via Prolific and randomly assigned to one of the two conditions (favorable LIB or unfavorable LIB). In both conditions, participants will be asked to read a short paragraph describing a target person’s political affiliation and a description provided by an unknown communicator about that person’s behavior. The communicator will use favorable LIB in one condition and unfavorable LIB in another condition. After reading the paragraph, participants will be asked to guess the communicator’s political affiliation. Participants will also be asked to rate the likelihood that they will be friends with the communicator. After the experiment, participants will be asked to provide their political affiliation information as well.

The main effect of LIB on social categorization will be tested via ANOVA tests. In addition, a three-way ANOVA analysis (i.e., 2 (LIB condition) × 2 (target’s political affiliation) × 3 (participant’s political affiliation)) will be carried out to test the secondary effect on friendship likelihood.

Below are links to related materials:

Link to the Original study
Link to the First Replication
Link to the Qualtrics Survey
Link to the GitHub repository
Link to the Preregistraion

Summary of prior replication attempt

We only found one replication of the focal study, which is linked above (referred to as the “first replication”). We found no other published replications of the focus study after manually verifying all citations to it.

Method differences

Since the first replication failed to obtain the specific wording of questions and instructions from the authors, they came up with their own wording for the questions and instructions. However, while searching for the original codebook in the OSF repository, I discovered some questions that were used in the original material. The table below compares the wording in the original paper to that in the first replication.

	Original	First Replication
Political Orientation	I endorse many aspects of conservative political ideology (7-point likert scale, 1= Strongly Disagree). I endorse many aspects of conservative political ideology (7-point likert scale, 1= Strongly Disagree) In general, how would you describe your political views on social issues? (7-point likert scale, 1= Very liberal) Generally speaking, do you usually consider yourself a Democrat, a Republican, an Independent, or affliliated with another political party? (Democrat, Republican, Independent, Other political party)	How would you describe your political affiliation? (Democrat, Republican, Independent)
Future Behavior	Please estimate the percentage of future situations in which Peter is likely to help others. (Percentage 0-100) Please estimate the percentage of future situations in which Peter is likely to be rude to others. (Percentage 0-100)	Not Given
Friendship Likelihood	How likely is it that you and the communicator would be friends? (5-point likert scale, 1=not at all likely)	How likely would you personally be friends with Peter?
Communicator’s group	likely group membership of the communicator (8-point likert scale, 1 = definitely a Democrat) (exact question wording are not given)	What do you think is the communicator’s political group membership?

Note that a key difference between the original paper and the first replication is the friendship likelihood question. While the original paper asked for participants to estimate their likelihood to be friend with the communicator, the replication asked for their likelihood to be friend with the target. Apart from this obvious measurement error, it is unclear whether the wording of the other questions might contribute to the failure of the first replication.

Sampling Differences

The original paper recruited 145 participants (mean age= 30.91). Participants’ self-reported political-party affiliations were as follows: 41.4% Democrat, 17.2% Republican, 33.1% Independent, 3.4% “other,” and 4.8% not reported. Seventeen participants were excluded from the analysis.

The first replication recruited 168 participants (mean age = 33.41), based on their power analysis results. Participants’ self-reported political-party affiliations were as follows: 41.4% Democrat, 23.1% Republican, 35.5% Independent. All participants finished all the portions of the study, and none of them was excluded from data analyses.

In terms of size and demographic distribution, the first replication’s sample is generally fairly similar to that of the original research. The only difference is that there were no data exclusion in the first replication than in the original paper; this could be because the first replication did not include a “other” option for the political affiliation question, which was included in the original study. Failure to include a “other” option reduces the exclusion rate in the first replication compared to the original study.

Analysis Differences

I found no discernible analysis differences that could potentially cause differences in analysis results.

Methods

Power Analysis

The main effect of the original study is stated as follow:

Participants in the favorable-LIB condition, relative to those in the unfavorable-LIB condition, were significantly more likely to believe that the communicator and target shared political-group membership, F(1, 121)= 8.67, p < .01, d = 0.51.

For Power Analysis in G*Power, I used the following parameters:

Test Family: F tests.
Statistical Test: ANOVA: Fixed effects, special, main effects and interactions
Type of Power Analysis: A priori (for sample size calculation).
Effect Size (f): 0.255 (converted from Cohen’s d to Cohen’s f)
Alpha Level: 0.05
Number of Groups: 2
Numerator DF: 1

The sample size is estimated to be 123 for 80% power, 164 for 90% power, and 202 for 95% power. We decided to take 168 samples because the original study recruited 145 samples and the first replication recruited 168 samples. It should be noted that this power analysis is based on the main effect, and there is a risk of under-powering the secondary effect calculated using the three-way ANOVA. However, because the first replication failed to replicate both the manipulation and the main effect, we decided to concentrate on the main effect in this study.

Planned Sample

The sample in the original research is described as follows:

One hundred forty-five participants (mean age = 30.91 years) were recruited from MTurk .com. Participants’ self-reported political-party affiliation were as follows: 41.4% Democrat, 17.2% Republican, 33.1% Independent, 3.4% “other,” and 4.8% not reported. Five participants who did not complete one of the dependent measures, 5 participants who reported their political-party affiliation as “other,” and 7 participants who did not report their political-party affiliation were not included in analyses.

We decided to recruit 168 participants for the reasons stated in the power analysis section.

Materials

The following passage describes the materials used in the original study:

Participants were asked to read a passage and then respond to questions. In the Republican-target condition, the passage indicated that Peter had voted for John McCain; in the Democratic-target condition, Peter had voted for Barack Obama. In the second part of the passage, participants were again provided with an unknown communicator’s description of Peter’s helpful and rude behaviors. Following Wigboldus et al. (2000), we included a description of one discrete episode, expressed in the present tense, for each type of behavior (for the full descriptions, see Table S1 in the Supplemental Material available online). For example, the description of helpful behavior in the favorable-LIB condition was written in abstract language and read as follows: “On one occasion, there is a person in a wheelchair who needs assistance getting up a ramp. Peter reaches for the handles of the wheelchair. Peter is helpful.” In the unfavorable-LIB condition, helpful behavior was described concretely: “On one occasion, there is a person in a wheelchair who needs assistance getting up a ramp. Peter reaches for the handles of the wheelchair. Peter pushes the wheelchair up the ramp.” After reading the passage, participants indicated the likely group membership of the communicator on an 8-point scale anchored by 1, definitely a Democrat, and 8, definitely a Republican. They then rated the likelihood that they would be friends with the communicator, using a 5-point scale ranging from 1, it is not at all likely, to 5, it is extremely likely. Finally, participants completed the manipulation-check items and a demographic questionnaire on which they reported their political-party affiliation and political ideology.

We’ll use the same scale and sequence as in the original passage. However, given that the Obama-McCain presidential election took place in 2008, our manipulation may be ineffective. Instead, we decide to provide more generalized group information by stating that Peter has previously voted for the Democratic or Republican Parties. The author of the initial replication pointed out that factors such as a lack of attention check and potential memory loss may result in failure in passing the manipulation check, which would then lead to replication failure in general. The present attempt addressed these issues.

Furthermore, after contacting the original paper’s authors, it appears that the specific material used in the experiment is no longer available for reference, so I took some liberties to infer the best possible practice when the detailed original material is not provided.

Procedure

Building on the material discussed earlier, the procedure for the questionnaire is as follows:

Inform participants of the target social group’s membership.
Present the communicator’s description of the target.
Participants make inferences about the future behavior of the target, the communicator’s group affiliation, and the likelihood of being friends with the communicator.
Participants complete a demographic questionnaire.

Controls

Given the previous replication’s failure and the subtlety of the manipulation, we will add a new attention check question post-inference queries. Participants will be asked, “Which statement correctly describes Peter’s behavior on the previous page?” Possible answers include: “Peter is always polite and attentive,” “Peter assists someone in a wheelchair but is rude by yawning and ignoring his colleague,” “Peter never engages with work colleagues,” and “Peter refuses to help a person in a wheelchair.” The answer should be pretty obvious if the participant read the material.

Analysis Plan

We will conduct three sets of data analyses to replicate the results:

LIB Manipulation Check: This analysis aims to determine if favorable or unfavorable language-induced bias (LIB) conditions lead to different predictions about the target’s future behavior (helpful or rude). The following analyses will be performed: Factorial ANOVAs to estimate the effects of LIB condition (2 levels), target’s political affiliation (2 levels)
Social Category Inference: The focus here is to examine if the LIB conditions influence inferences about the likelihood of shared category membership between the target and communicator. The analyses include: Factorial ANOVAs to estimate the effects of LIB condition (2 levels), target’s political affiliation (2 levels), and participant’s political affiliation (3 levels: Democrat, Republican, Independent).
Friendship Likelihood: We will assess participants’ likelihood of becoming friends with the communicator using: Factorial ANOVAs to estimate the effects of LIB condition (2 levels), target’s political affiliation (2 levels), and participant’s political affiliation (3 levels: Democrat, Republican, Independent).

Differences from Original Study and 1st replication

There are three key differences between the current project and the original study, as well as its first replication:

Change in Wording for Disclosing Target Social Group Membership: We used “Peter voted for the Democratic or Republican Party in the past” instead of “Peter voted for Obama vs. McCain” to indicate Peter’s group information.
Inclusion of Attention Check Questions: Attention check questions have been added in the post-test questionnaire. See description in the “Control” section.
Put the material and question on the same page: To avoid potential memory loss, we now put the material as well as the questions on the same page so that participants could refer to the material while answering questions.

Given that the current version of replication is conducted in a different context (i.e., 15 years after the Obama-McCain presidential election), using slightly different procedural details and survey platform, and using slightly different stimulus to indicate the target’s group membership, we consider it to be a close replication (LeBel et al., 2018). The rationale of experimental design is explained in the sections above.

Methods Addendum (Post Data Collection)

Actual Sample

One hundred and seventy one samples were collected from the Prolific platform. Among the 171 participants, 7 failed the attention test, 7 reported “other” in their political affiliation, and 26 spent less than 100 seconds in the test. When removing the above participants, we got 136 observations. For the 136 observations, the participant mean age is 35.42. The participant’s self-reported political party affiliation was as follows: 44.1% Democrat, 14% Republican, and 41.9% Independent.

Differences from pre-data collection methods plan

Prolific provided three more data points than anticipated; nevertheless, this was not the author’s intention.

Results

Data preparation

### Data Preparation
library(dplyr)
library(tidyr)
library(readr)
library(readxl)
library(ggplot2)
library(knitr)

# Read the data
data_path = "../data"
data <- read_csv(file.path(data_path, "full_anonymized.csv"))

# remove rows that (1)failed the attention check, (2) report "other in political affiliation"(follow the original study), and (3) spent less than 100 seconds in the survey (the specific threshold did not make a huge difference)
data_cleaned <- data %>% 
  filter(attention==2 & polid !=4) %>%
  filter(`Duration (in seconds)`>=100)

# Revert wide to long, change variable names
data_cleaned <- data_cleaned %>%
  rename(future_help1 = future_help1_1,
         future_help2 = future_help2_1,
         future_help3 = future_help3_1,
         future_help4 = future_help4_1,
         future_rude1 = future_rude1_1,
         future_rude2 = future_rude2_1,
         future_rude3 = future_rude3_1,
         future_rude4 = future_rude4_1) %>%
  pivot_longer(cols = c(starts_with("future_help"), starts_with("future_rude"),
                        starts_with("pol_cate"), starts_with("likelyfriends")),
               names_to = c(".value", "condition"),
               names_sep = "(?<=\\D)(?=\\d)",
               values_drop_na = TRUE) %>%
  mutate(lib_cate = if_else(condition %in% c(1, 3), "F", "U"),
         group_cate = if_else(condition %in% c(1, 2), "R", "D"),
         shared_group = if_else(condition %in% c(1, 2), 8 - pol_cate, pol_cate))

data_cleaned$lib_cate <- factor(data_cleaned$lib_cate)
data_cleaned$group_cate <- factor(data_cleaned$group_cate)
data_cleaned$polid <- factor(data_cleaned$polid, levels = c(1, 2, 3), labels = c("Democrat", "Republican", "Independent"))

Confirmatory analysis

Manipulation Check

The manipulation check failed, according to the data analysis results shown below. Different LIB manipulations had no effect on forecasting the target’s future helpfulness and rudeness.

# Descriptive Stats: Future Helpfulness
data_cleaned %>%
  group_by(lib_cate) %>%
  summarise(
    mean = mean(future_help),
    se = sd(future_help) / sqrt(n()))%>%
  kable(align = "r")

lib_cate	mean	se
F	63.90769	2.479893
U	62.57746	2.172506

# Descriptive Stats: Future Rudeness
data_cleaned %>%
  group_by(lib_cate) %>%
  summarise(
    mean = mean(future_rude),
    se = sd(future_rude) / sqrt(n()))%>%
  kable(align = "r")

lib_cate	mean	se
F	60.73846	2.755002
U	57.49296	2.094149

# ANOVA test: Future Helpfulness
res.aov <- aov(future_help~lib_cate, data = data_cleaned)
print(summary(res.aov))

             Df Sum Sq Mean Sq F value Pr(>F)
lib_cate      1     60      60   0.164  0.686
Residuals   134  49041     366

# ANOVA test: Future Rudeness
res.aov <- aov(future_rude~lib_cate, data = data_cleaned)
print(summary(res.aov))

             Df Sum Sq Mean Sq F value Pr(>F)
lib_cate      1    357   357.4   0.897  0.345
Residuals   134  53370   398.3

# T-tests: Future Helpfulness
print(t.test(future_help~lib_cate, data = data_cleaned))


    Welch Two Sample t-test

data:  future_help by lib_cate
t = 0.40348, df = 129.95, p-value = 0.6873
alternative hypothesis: true difference in means between group F and group U is not equal to 0
95 percent confidence interval:
 -5.192348  7.852803
sample estimates:
mean in group F mean in group U 
       63.90769        62.57746

# print(t.test(future_help~lib_cate, data = data_cleaned[data_cleaned$group_cate == 'D', ]))
# print(t.test(future_help~lib_cate, data = data_cleaned[data_cleaned$group_cate == 'R', ]))

# T-tests: Future Rudeness
print(t.test(future_rude~lib_cate, data = data_cleaned))


    Welch Two Sample t-test

data:  future_rude by lib_cate
t = 0.93785, df = 122.07, p-value = 0.3502
alternative hypothesis: true difference in means between group F and group U is not equal to 0
95 percent confidence interval:
 -3.60499 10.09600
sample estimates:
mean in group F mean in group U 
       60.73846        57.49296

# print(t.test(future_rude~lib_cate, data = data_cleaned[data_cleaned$group_cate == 'D', ]))
# print(t.test(future_rude~lib_cate, data = data_cleaned[data_cleaned$group_cate == 'R', ]))

Main Effect

Despite the fact that the manipulation effect failed, the main effect has indeed been reproduced, especially when the target was a Democrat. When the target is a Republican, however, the main effect fails to reproduce. The effect of the target’s political affiliation is something that was not discovered in the original study. Consider the fact that the manipulation effect failed; it is likely that something else might cause the main effect.

## prepare data
summary_data <- data_cleaned %>%
  group_by(lib_cate, group_cate) %>%
  summarise(
    mean = mean(shared_group),
    se = sd(shared_group) / sqrt(n()))

## bar plot
# note that the scale of y in original plot is 3.5 to 6. Here we change it to 2-6 due to missing datapoints
ggplot(summary_data, aes(x = group_cate, y = mean,  fill = lib_cate)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), 
                width = 0.2, 
                position = position_dodge(0.9))+
  coord_cartesian(ylim = c(2, NA)) +
  labs(x = "Political Group", y = "Likelihood of Shared Group Membership") +
  theme_minimal()

## one way anova: LIB on shared group membership
res.aov <- aov(shared_group ~ lib_cate, data = data_cleaned)
print(summary(res.aov))

             Df Sum Sq Mean Sq F value Pr(>F)  
lib_cate      1   20.7  20.729   6.396 0.0126 *
Residuals   134  434.3   3.241                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## three way anova
res.aov3 <- aov(shared_group ~ lib_cate* group_cate* polid, data = data_cleaned)
print(summary(res.aov3))

                           Df Sum Sq Mean Sq F value   Pr(>F)    
lib_cate                    1   20.7   20.73   8.041  0.00534 ** 
group_cate                  1   93.5   93.50  36.268 1.81e-08 ***
polid                       2    6.8    3.38   1.313  0.27281    
lib_cate:group_cate         1    5.6    5.55   2.154  0.14477    
lib_cate:polid              2    7.0    3.51   1.363  0.25966    
group_cate:polid            2    1.5    0.76   0.296  0.74413    
lib_cate:group_cate:polid   2    0.2    0.11   0.041  0.95975    
Residuals                 124  319.7    2.58                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## t test
print(t.test(shared_group~lib_cate, data = data_cleaned[data_cleaned$group_cate == 'D', ]))


    Welch Two Sample t-test

data:  shared_group by lib_cate
t = 3.0639, df = 65.77, p-value = 0.003165
alternative hypothesis: true difference in means between group F and group U is not equal to 0
95 percent confidence interval:
 0.389303 1.845991
sample estimates:
mean in group F mean in group U 
       5.411765        4.294118

print(t.test(shared_group~lib_cate, data = data_cleaned[data_cleaned$group_cate == 'R', ]))


    Welch Two Sample t-test

data:  shared_group by lib_cate
t = 0.7319, df = 65.83, p-value = 0.4668
alternative hypothesis: true difference in means between group F and group U is not equal to 0
95 percent confidence interval:
 -0.5107275  1.1018348
sample estimates:
mean in group F mean in group U 
       3.322581        3.027027

Secondary Effect

We did not detect a significant interaction between Target’s Political Affiliation and Participant’s Political Affiliation. Furthermore, contrary to the original study, which found that the LIB Condition Target’s Political Affiliation interaction was strongest for democrat participants, marginal for republican participants, and nonsignificant for independent participants, we discovered that independent participants preferred being friends with communicators who used favorable LIB frames.

# Three way ANOVA
res.aov3 <- aov(likelyfriends ~ lib_cate* group_cate* polid, data = data_cleaned)
print(summary(res.aov3))

                           Df Sum Sq Mean Sq F value   Pr(>F)    
lib_cate                    1   2.45   2.446   3.329 0.070473 .  
group_cate                  1   0.71   0.708   0.964 0.328063    
polid                       2   0.75   0.375   0.510 0.601541    
lib_cate:group_cate         1   0.78   0.785   1.068 0.303371    
lib_cate:polid              2  13.71   6.856   9.332 0.000168 ***
group_cate:polid            2   1.61   0.803   1.094 0.338204    
lib_cate:group_cate:polid   2   0.70   0.352   0.480 0.620153    
Residuals                 124  91.10   0.735                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Two Way ANOVA for democrat participant
res.aov2 <- aov(likelyfriends ~ lib_cate* group_cate, data = subset(data_cleaned, polid=='Democrat'))
print(summary(res.aov2))

                    Df Sum Sq Mean Sq F value Pr(>F)
lib_cate             1   0.41  0.4074   0.576  0.451
group_cate           1   0.41  0.4111   0.581  0.449
lib_cate:group_cate  1   0.54  0.5395   0.762  0.386
Residuals           56  39.63  0.7076

# Two Way ANOVA for republican participant
res.aov2 <- aov(likelyfriends ~ lib_cate* group_cate, data = subset(data_cleaned, polid=='Republican'))
print(summary(res.aov2))

                    Df Sum Sq Mean Sq F value Pr(>F)  
lib_cate             1  1.321  1.3212   4.740 0.0458 *
group_cate           1  2.373  2.3728   8.513 0.0106 *
lib_cate:group_cate  1  0.230  0.2304   0.826 0.3777  
Residuals           15  4.181  0.2787                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Two Way ANOVA for independent participant
res.aov2 <- aov(likelyfriends ~ lib_cate* group_cate, data = subset(data_cleaned, polid=='Independent'))
print(summary(res.aov2))

                    Df Sum Sq Mean Sq F value   Pr(>F)    
lib_cate             1  14.25  14.246  15.964 0.000201 ***
group_cate           1   0.07   0.071   0.080 0.778570    
lib_cate:group_cate  1   0.63   0.632   0.708 0.403727    
Residuals           53  47.30   0.892                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The code below is used to build a table with summary data on participants’ mean self-reported likelihood of friendship with the communicator. We found no evidence of in-group favortism, contrary to what the original study found.

# recreate the table
summary_data <- data_cleaned %>%
  group_by(polid, lib_cate, group_cate) %>%
  summarise(
    mean_value = mean(likelyfriends),
    sd_value = sd(likelyfriends)
  ) %>%
  ungroup()

# Next, pivot the data to a wider format
wide_data <- summary_data %>%
  pivot_wider(
    names_from = c(group_cate, lib_cate),
    values_from = c(mean_value, sd_value),
    names_glue = "{group_cate}_{lib_cate}_{.value}"
  )

# Create a function to combine mean and sd into a single string
combine_mean_sd <- function(mean, sd) {
  sprintf("%.2f (%.2f)", mean, sd)
}

# Apply the function to each pair of mean and sd columns
for (target in unique(data_cleaned$group_cate)) {
  for (lib in unique(data_cleaned$lib_cate)) {
    mean_col_name <- paste(target, lib, "mean_value", sep = "_")
    sd_col_name <- paste(target, lib, "sd_value", sep = "_")
    combined_col_name <- paste(target, lib, sep = " ")
    wide_data <- wide_data %>%
      mutate(!!combined_col_name := combine_mean_sd(!!sym(mean_col_name), !!sym(sd_col_name)))
  }
}

# Select only the combined columns and the polid
final_columns <- c("polid", paste(rep(unique(data_cleaned$group_cate), each = length(unique(data_cleaned$lib_cate))), 
                                  unique(data_cleaned$lib_cate), sep = " "))
final_data <- wide_data %>%
  select(all_of(final_columns))

kable(final_data)

polid	R U	R F	D U	D F
Democrat	2.59 (0.80)	2.60 (0.91)	2.60 (0.91)	2.23 (0.73)
Republican	3.50 (0.58)	2.67 (0.58)	2.57 (0.53)	2.20 (0.45)
Independent	1.94 (0.77)	3.15 (0.90)	2.08 (0.79)	2.88 (1.20)

Exploratory analyses

I have imported data from the original study to test if the analysis method used above could reproduce the results based on the original data. The general conclusion is that the analysis method used above have no problem reproducing results reported in the original study. Therefore, the failure of reproduction could only be attributed to the data quality and data collection process.

original_data <-read_excel(file.path(data_path, "original_study.xlsx"))
original_data<-original_data%>%
  mutate(lib_cate = if_else(LIB_condition ==1, "F", "U"),
         group_cate = if_else(RepDemcondition==1, "D", "R"))

original_data$lib_cate <- factor(original_data$lib_cate)
original_data$group_cate <- factor(original_data$group_cate)
original_data$polid <- factor(original_data$polid, levels = c(1, 2, 3), labels = c("Democrat", "Republican", "Independent"))


# Manipulation Check: pass
res.aov <- aov(future_help~lib_cate, data = original_data)
print(summary(res.aov))

             Df Sum Sq Mean Sq F value  Pr(>F)   
lib_cate      1   4195    4195   10.34 0.00164 **
Residuals   131  53133     406                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

res.aov <- aov(future_rude~lib_cate, data = original_data)
print(summary(res.aov))

             Df Sum Sq Mean Sq F value Pr(>F)   
lib_cate      1   4504    4504   9.352 0.0027 **
Residuals   131  63092     482                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Main Result
summary_data <- original_data %>%
  group_by(lib_cate, group_cate) %>%
  summarise(
    mean = mean(target_pol_categorization),
    se = sd(target_pol_categorization) / sqrt(n()))

## bar plot
# note that the scale of y in original plot is 3.5 to 6. Here we change it to 2-6 due to missing datapoints
ggplot(summary_data, aes(x = group_cate, y = mean,  fill = lib_cate)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), 
                width = 0.2, 
                position = position_dodge(0.9))+
  coord_cartesian(ylim = c(2, NA)) +
  labs(x = "Political Group", y = "Likelihood of Shared Group Membership") +
  theme_minimal()

# Secondary Result
res.aov3 <- aov(likelyfriends ~ lib_cate* group_cate* polid, data = original_data)
print(summary(res.aov3))

                           Df Sum Sq Mean Sq F value Pr(>F)  
lib_cate                    1   0.00   0.003   0.003 0.9544  
group_cate                  1   0.11   0.107   0.135 0.7138  
polid                       2   1.26   0.632   0.802 0.4508  
lib_cate:group_cate         1   0.44   0.439   0.556 0.4572  
lib_cate:polid              2   0.84   0.421   0.533 0.5879  
group_cate:polid            2   4.95   2.475   3.140 0.0469 *
lib_cate:group_cate:polid   2   7.25   3.623   4.595 0.0119 *
Residuals                 121  95.40   0.788                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Two Way ANOVA for democrat participant
res.aov2 <- aov(likelyfriends ~ lib_cate* group_cate, data = subset(original_data, polid=='Democrat'))
print(summary(res.aov2))

                    Df Sum Sq Mean Sq F value Pr(>F)  
lib_cate             1   0.22   0.217   0.263 0.6101  
group_cate           1   0.88   0.878   1.064 0.3067  
lib_cate:group_cate  1   4.36   4.361   5.287 0.0252 *
Residuals           56  46.19   0.825                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Two Way ANOVA for republican participant
res.aov2 <- aov(likelyfriends ~ lib_cate* group_cate, data = subset(original_data, polid=='Republican'))
print(summary(res.aov2))

                    Df Sum Sq Mean Sq F value Pr(>F)  
lib_cate             1  0.113   0.113   0.105 0.7489  
group_cate           1  3.447   3.447   3.207 0.0878 .
lib_cate:group_cate  1  3.305   3.305   3.074 0.0941 .
Residuals           21 22.575   1.075                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Two Way ANOVA for independent participant
res.aov2 <- aov(likelyfriends ~ lib_cate* group_cate, data = subset(original_data, polid=='Independent'))
print(summary(res.aov2))

                    Df Sum Sq Mean Sq F value Pr(>F)
lib_cate             1  0.478  0.4775   0.789  0.379
group_cate           1  0.803  0.8025   1.326  0.256
lib_cate:group_cate  1  0.002  0.0017   0.003  0.958
Residuals           44 26.635  0.6053

# recreate the table
summary_data <- original_data %>%
  group_by(polid, lib_cate, group_cate) %>%
  summarise(
    mean_value = mean(likelyfriends),
    sd_value = sd(likelyfriends)
  ) %>%
  ungroup()

# Next, pivot the data to a wider format
wide_data <- summary_data %>%
  pivot_wider(
    names_from = c(group_cate, lib_cate),
    values_from = c(mean_value, sd_value),
    names_glue = "{group_cate}_{lib_cate}_{.value}"
  )

# Create a function to combine mean and sd into a single string
combine_mean_sd <- function(mean, sd) {
  sprintf("%.2f (%.2f)", mean, sd)
}

# Apply the function to each pair of mean and sd columns
for (target in unique(original_data$group_cate)) {
  for (lib in unique(original_data$lib_cate)) {
    mean_col_name <- paste(target, lib, "mean_value", sep = "_")
    sd_col_name <- paste(target, lib, "sd_value", sep = "_")
    combined_col_name <- paste(target, lib, sep = " ")
    wide_data <- wide_data %>%
      mutate(!!combined_col_name := combine_mean_sd(!!sym(mean_col_name), !!sym(sd_col_name)))
  }
}

# Select only the combined columns and the polid
final_columns <- c("polid", paste(rep(unique(original_data$group_cate), each = length(unique(original_data$lib_cate))), 
                                  unique(original_data$lib_cate), sep = " "))
final_data <- wide_data %>%
  select(all_of(final_columns))

# View the final table
kable(final_data)

polid	D F	D U	R F	R U
Democrat	3.15 (1.07)	2.47 (0.87)	2.33 (0.98)	2.73 (0.70)
Republican	2.00 (0.63)	2.60 (1.52)	3.50 (0.55)	2.62 (1.19)
Independent	2.50 (0.85)	2.70 (0.67)	2.23 (0.60)	2.45 (0.93)

Discussion

Mini meta analysis

	Original	1st Replication	2nd Replication
Manipulation (Future Helpful)	F(1, 129) = 10.14, p < .01	F(1, 165) =7.07, p < .01	F(1, 134) = 0.16, p = 0.69
Manipulation (Future Rude)	F(1, 129) = 9.40, p < .01	Not reported	F(1, 134) = 0.90, p = 0.35
Main Effect (Social Categorization)	F(1, 121) = 8.67, p < .01	F(1, 167) = 0.5, p = 0.48	F(1, 134) = 6.40, p = 0.01
Secondary Effect (Friendship likelihood)	F(2, 121) = 4.60, p = .01	F(2, 157) = 0.62, p = 0.53	F(1, 124) =0.48, p = 0.62

We also provided a graphic comparison of the main effect across original study, first replication, and the current replication:

Original

1st Replication

2nd Replication

Summary of Replication Attempt

As mentioned above, this study tries to replicate three effects: (1) Manipulation effect of LIB on future rude/helpfulness (2)Main Effect of LIB on social categorization (3)Secondary effect on friendship likelihood. This replication attempt failed to replicate the manipulation effect and the secondary effect, but, surprisingly, replicated the main effect. And we find that the success of the replication of the main effect is mainly driven by the particular experiment condition when the target was setted up as voted for democratic party. As a result, we consider the numerical score of the replication success to be 0.25, reflecting that only a small portion of the target effects are successfully replicated.

Commentary

I’ll avoid speculating too deeply on why the replication failed due to limited evidence, but there are a few potential reasons:

Theory-Level: The manipulation check may not actually mediate the effect of LIB. The use of favorable LIB frames by a communicator might only make people aware of the communicator’s bias towards the target, without changing their own views about the target. People could recognize the use of LIB frames and infer the communicator’s group membership, yet remain unpersuaded by these frames. This could lead to a failed manipulation but a replicated main effect.
Experiment-Level:

Participant inattention. The average survey completion time was 221 seconds, shorter than the estimated 5 minutes, suggesting that participants might not have paid enough attention to the nuances in the material.
Modification of Experimental Material: I altered the wording for disclosing the target’s social group membership. This change might have unexpectedly affected the experiment, although the specifics of this impact are unclear.
Change in Experimental Environment: The original study was conducted a decade ago, and since then, the political landscape has significantly changed. The current highly polarized political atmosphere could overshadow the subtle effects of LIB, though it’s also conceivable that it could amplify them. The exact reasons for this potential override, rather than amplification, remain unclear.

Other Comments:

Past replication efforts identified two design-level issues that could have led to replication failure: separation of the vignette and questions and absence of an attention check. However, these issues were addressed in this current version.
A minor inconsistency in the original studies was noted, though it likely didn’t significantly impact replication failure. The original study used different Likert scales for the same research question. In study 1, group membership likelihood was rated on a 7-point scale (1 being ‘definitely a Republican’, 7 being ‘definitely a Democrat’), whereas in study 3, it was an 8-point scale (1 for ‘definitely a Democrat’, 8 for ‘definitely a Republican’), with no justification for the change provided.