Replication of Study 1 by Canning et al. (2022, Social Psychological and Personality Science)

Author

Kevin Kennedy kevinrk@stanford.edu

Published

October 26, 2025

Introduction

For my replication project, I will replicate Study 1 of Canning et al. (2022). Canning et al. (2022, Study 1) examined how perceived faculty mindset affects students' anticipated belonging and performance in a hypothetical college calculus class. Perceived faculty mindset is whether the professor is perceived to believe that intelligence is fixed or malleable. The authors were specifically interested in the role of faculty mindset in producing gender disparities in belonging and performance in STEM. They manipulated perceived mindset by having participants read a course syllabus. After reading the syllabus, participants completed a manipulation check on perceived professor mindset (adapted from Dweck, 1999). They then completed the main outcome measures: perceived stereotype endorsement, anticipated belonging (Murphy & Zirkel, 2015), and math test performance. The math test consisted of 30 GRE problems used in prior research (Schmader, 2002).

For my replication project, I am specifically interested in replicating the anticipated belonging finding (i.e., the Condition x Gender interaction). This finding is the closest to my own research interests in how institutional norms, practices, and policies influence students' sense of belonging. Given time and resource constraints, I will not replicate the performance finding. I will also adapt the study in several ways so that it can run on Prolific, because the original study used a university subject pool (N = 217). With permission from the teaching team, I will make the following adaptations. First, I will change the single mention of the “Indiana University honor code” in the stimulus materials to “the university honor code.” Second, I will recruit current or recent college students using Prolific's filters, which I have done successfully before as part of my FYP. Third, the cover story will describe the syllabus as belonging to a community college course rather than a Stanford course. This addresses the concern that the Stanford name might come across as elitist and bias the results. Finally, I will try to recruit a roughly even balance of men and women so that the gender effects can be examined.

Link to GitHub repository: https://github.com/kevinrk97/canning2022

Link to original paper (in repository): https://github.com/kevinrk97/canning2022/tree/main/original_paper

Link to paradigm (on Qualtrics): https://stanforduniversity.qualtrics.com/jfe/form/SV_dj6cAGsx2dRRUeW

Methods

Power Analysis

The finding I aim to replicate is the Gender x Condition interaction on anticipated belonging. Its original effect size was partial eta squared = .049.
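For reference, each pwr.f2.test call below converts partial eta squared to Cohen's f² in the same way. The resulting denominator degrees of freedom v then translate into a total sample size. The N step assumes the five-parameter model from the analysis plan: intercept, condition, gender, their interaction, and the personal mindset covariate.

```latex
f^2 = \frac{\eta_p^2}{1 - \eta_p^2} = \frac{.049}{1 - .049} \approx 0.0515,
\qquad
N = \lceil v \rceil + 5
```

For example, at 80% power v = 152.28, so N = 153 + 5 = 158.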

For 80% power, I would need N = 158 (153 denominator df + 5 estimated parameters).

# Load appropriate packages
library(pwr)

# Power test
pwr.f2.test(u = 1, # numerator df (the single interaction term)
            f2 = (.049/(1-.049)), # effect size converted from partial eta squared
            sig.level = 0.05, # significance level
            power = 0.8) # target power


     Multiple regression power calculation 

              u = 1
              v = 152.2764
             f2 = 0.05152471
      sig.level = 0.05
          power = 0.8

For 90% power, I would need N = 209 (204 denominator df + 5 estimated parameters).

# Load appropriate packages
library(pwr)

# Power test
pwr.f2.test(u = 1, # numerator df (the single interaction term)
            f2 = (.049/(1-.049)), # effect size converted from partial eta squared
            sig.level = 0.05, # significance level
            power = 0.90) # target power


     Multiple regression power calculation 

              u = 1
              v = 203.8695
             f2 = 0.05152471
      sig.level = 0.05
          power = 0.9

For 95% power, I would need N = 258 (253 denominator df + 5 estimated parameters).

# Load appropriate packages
library(pwr)

# Power test
pwr.f2.test(u = 1, # numerator df (the single interaction term)
            f2 = (.049/(1-.049)), # effect size converted from partial eta squared
            sig.level = 0.05, # significance level
            power = 0.95) # target power


     Multiple regression power calculation 

              u = 1
              v = 252.1405
             f2 = 0.05152471
      sig.level = 0.05
          power = 0.95
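As a cross-check that does not depend on the pwr package, the three calls above can be reproduced in base R. This is a sketch that assumes power is computed from the noncentral F distribution with noncentrality λ = f² (u + v + 1), which is the convention pwr.f2.test uses. The helper names are hypothetical.

```r
# Base-R sketch of the pwr.f2.test calculation above.
# Assumption: noncentrality lambda = f2 * (u + v + 1), as in pwr.f2.test.

f2_from_eta2p <- function(eta2p) eta2p / (1 - eta2p)

# Power of the F test with u numerator df and v denominator df
f2_power <- function(f2, u, v, alpha = 0.05) {
  f_crit <- qf(alpha, u, v, lower.tail = FALSE)
  pf(f_crit, u, v, ncp = f2 * (u + v + 1), lower.tail = FALSE)
}

# Solve for the (fractional) denominator df that reaches the target power
required_v <- function(f2, u, power, alpha = 0.05) {
  uniroot(function(v) f2_power(f2, u, v, alpha) - power,
          interval = c(2, 1e4), tol = 1e-8)$root
}

f2 <- f2_from_eta2p(0.049)
n_params <- 5  # intercept, condition, gender, condition x gender, personal mindset

power_table <- data.frame(power = c(0.80, 0.90, 0.95))
power_table$v <- sapply(power_table$power,
                        function(p) required_v(f2, u = 1, power = p))
power_table$N <- ceiling(power_table$v) + n_params  # round df up, then add parameters
print(power_table)
```

The N column should match the 158 / 209 / 258 figures above. Power increases monotonically in v, so a simple root search over a wide interval is enough.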

Planned Sample

Based on the power analysis with 80% power, I plan to recruit 158 participants. To be eligible, participants must be at least 18 years old and either current or former college students. I will use Prolific's “education” filters to ensure that all participants are enrolled in college or have recently received a college degree. I will also set an upper age limit of 25 so that participants are recent college graduates. These criteria are consistent with my past research.
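Prolific's filters handle eligibility at recruitment. As an optional double check after export, a screening step might look like the following sketch. The column names `age` and `student_status` and their values are hypothetical, not Prolific's actual export schema.

```r
# Hypothetical post-hoc eligibility check (column names are assumptions).
screen_eligible <- function(df, min_age = 18, max_age = 25) {
  in_age_range <- !is.na(df$age) & df$age >= min_age & df$age <= max_age
  is_student <- df$student_status %in% c("current", "recent_graduate")
  df[in_age_range & is_student, , drop = FALSE]
}

example <- data.frame(
  id = 1:5,
  age = c(19, 24, 31, 17, 22),
  student_status = c("current", "recent_graduate", "current", "current", "none")
)
eligible <- screen_eligible(example)
eligible$id
```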

Materials

As in the original study (Canning et al., 2022, Study 1), participants were exposed to a course syllabus designed to imply that the professor held either a fixed or a growth mindset of intelligence. Per Canning et al. (2022), these materials were created through focus groups with college students. I made minor changes to adapt them for an online study. For example, I removed the one reference to “Indiana University”.

The complete outcome measures are below, taken from the Supplemental Materials:

Manipulation check: perceived faculty fixed mindset (1 = strongly disagree, 6 = strongly agree)
The professor in this class seems to believe that students have a certain amount of intelligence, and they really can’t do much to change it.
The professor in this class seems to believe that students either “have it” or they don’t.
The professor in this class seems to believe that every student can learn new things and significantly grow their intelligence.
The professor in this class seems to believe that some students are smart, while others are not.
The professor in this class seems to believe that students who are less smart will always be less smart than the other students in the class.
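One way to score these five items as a single fixed-mindset composite is sketched below. This is an assumption, not the original authors' documented scoring. The third item is growth-worded, so it is reverse-scored on the 1 to 6 scale (7 − x) before averaging, and higher scores then indicate a more fixed perceived mindset. The column names `fm1` to `fm5` are hypothetical.

```r
# Sketch: perceived faculty fixed mindset composite (column names hypothetical).
score_faculty_mindset <- function(df) {
  items <- df[, c("fm1", "fm2", "fm3", "fm4", "fm5")]
  items$fm3 <- 7 - items$fm3  # assumed reverse-scored: growth-worded item
  rowMeans(items)             # NA if any item is missing (no imputation)
}

example <- data.frame(fm1 = c(5, 2), fm2 = c(6, 1), fm3 = c(2, 6),
                      fm4 = c(5, 2), fm5 = c(4, 1))
score_faculty_mindset(example)
```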

Perceived stereotype endorsement (1 = strongly disagree, 6 = strongly agree)
I think the professor in this class would endorse gender stereotypes.
I think the professor in this class would treat male and female students differently in class.

Anticipated belonging (1 = Extremely, 6 = Not at all; all items were recoded so that higher values indicated greater anticipated belonging)
If you were a student in this class, how comfortable would you feel during this class?
If you were a student in this class, how much would you feel that you could be yourself during this class?
If you were a student in this class, how much would you feel that you “fit in” during this class?
If you were a student in this class, how alienated would you feel during this class?
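The recoding can be sketched as follows. Assume responses are stored as 1 = Extremely through 6 = Not at all. Under that coding, the three positively worded items (comfortable, be yourself, fit in) need reversing (7 − x) so that higher means more belonging. The alienation item already runs in that direction, because 6 means “not at all alienated.” This reading of the scale, and the column names, are assumptions.

```r
# Sketch: anticipated belonging composite, higher = more belonging.
# Assumes 1 = Extremely ... 6 = Not at all; column names are hypothetical.
score_belonging <- function(df) {
  positive <- 7 - df[, c("belong_comfort", "belong_self", "belong_fit")]
  items <- cbind(positive, belong_alienated = df$belong_alienated)
  rowMeans(items)  # NA if any item is missing
}

example <- data.frame(belong_comfort = c(1, 6), belong_self = c(2, 5),
                      belong_fit = c(1, 6), belong_alienated = c(6, 1))
score_belonging(example)
```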

Personal mindset, r = .70 (1 = strongly disagree, 6 = strongly agree)
You have a certain amount of intelligence, and you can’t really do much to change it.
Your basic intelligence is something about you that you can’t change very much.
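Because the personal mindset scale has only two items, its reliability is reported as an inter-item correlation (r = .70). The sketch below shows one way to compute both the composite and that correlation. The column names and example responses are made up for illustration.

```r
# Sketch: two-item personal fixed mindset composite and inter-item r.
personal_mindset <- function(df) {
  list(
    composite = (df$pm1 + df$pm2) / 2,
    r = cor(df$pm1, df$pm2, use = "complete.obs")
  )
}

example <- data.frame(pm1 = c(1, 2, 3, 4, 5, 6), pm2 = c(2, 1, 3, 5, 4, 6))
res <- personal_mindset(example)
res$composite
res$r
```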

Procedure

As in Canning et al. (2022, Study 1), participants were recruited for a study on impressions of courses. Because the replication was conducted through Prolific rather than a university subject pool, I made some slight modifications. Participants were told that we were a group of Stanford psychology researchers working with the math department of a local community college to evaluate a new calculus course. After viewing the syllabus, participants gave their impressions of the course by completing a manipulation check on perceived professor mindset (adapted from Dweck, 1999), perceived stereotype endorsement, anticipated belonging (Murphy & Zirkel, 2015), and, as a covariate, their own personal fixed vs. growth mindset. Canning et al. (2022) also had participants complete a math test (30 GRE problems; Schmader, 2002). This replication omits the math test due to time and resource constraints.

Analysis Plan

Canning et al. (2022) analyzed the key dependent variables by regressing each outcome on gender (0 = male, 1 = female), condition (0 = fixed mindset syllabus, 1 = growth mindset syllabus), the gender x condition interaction, and the personal fixed mindset covariate. They retained all participants in the final analysis. Missing data were not imputed. Instead, a participant with missing data on an outcome was excluded only from that outcome's analysis.

Key analysis of interest

As with Canning et al. (2022), I will regress each of the key outcome variables (perceived stereotype endorsement and anticipated belonging) on gender (0 = male, 1 = female), condition (0 = fixed mindset syllabus, 1 = growth mindset syllabus), the gender x condition interaction, and the personal fixed mindset covariate. The key analysis of interest is the gender x condition interaction term for the anticipated belonging outcome.
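To compare the replication's interaction effect with the original partial eta squared of .049, the interaction can be converted from the fitted model. The sketch below uses simulated data; the variable names mirror the analysis plan, but the generating values are arbitrary. For a 1-df term, partial eta squared equals t² / (t² + residual df). It can also be computed from sums of squares. R enters the interaction after the main effects and the covariate, so the interaction's sequential (Type I) SS equals its unique SS and the two routes agree.

```r
# Sketch on simulated data; generating values are arbitrary.
set.seed(251)
n <- 160
sim <- data.frame(
  gender = rbinom(n, 1, 0.5),        # 0 = male, 1 = female
  condition = rbinom(n, 1, 0.5),     # 0 = fixed syllabus, 1 = growth syllabus
  personal_mindset = runif(n, 1, 6)  # personal fixed mindset covariate
)
sim$belong <- 3.8 - 0.5 * sim$gender + 0.5 * sim$gender * sim$condition +
  rnorm(n, sd = 1)

fit <- lm(belong ~ condition * gender + personal_mindset, data = sim)

# Partial eta squared for the 1-df interaction from its t statistic
t_int <- coef(summary(fit))["condition:gender", "t value"]
eta2p_from_t <- t_int^2 / (t_int^2 + df.residual(fit))

# Same quantity from the ANOVA table (interaction is the last term)
aov_tab <- anova(fit)
ss_int <- aov_tab["condition:gender", "Sum Sq"]
ss_res <- aov_tab["Residuals", "Sum Sq"]
eta2p_from_ss <- ss_int / (ss_int + ss_res)

c(from_t = eta2p_from_t, from_ss = eta2p_from_ss)
```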

Differences from Original Study

The major known difference between Canning et al. (2022, Study 1) and my replication is that the original study used a university subject pool, with participants completing the materials in person, so all participants were college students. I will likewise recruit current (or recent) college students, but I will collect data online through Prolific rather than in person. This difference could influence the results. For example, participants may pay less attention online than in person. I have sought to mitigate this issue with an attention check (i.e., “in your opinion should we use your data”) and with timers that prevent participants from moving through the online stimulus materials too quickly. Still, in any study, whether in person or online, there is a concern that participants will not engage seriously with the material.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

Confirmatory analysis

The analyses as specified in the analysis plan.

Analysis #1: Manipulation Check (controlling for personal mindset)

# Descriptive Statistics
df.data %>% 
  group_by(condition) %>% 
  summarise(mean = mean(faculty_mindset_m, na.rm = TRUE),
            sd = sd(faculty_mindset_m, na.rm = TRUE))

# A tibble: 2 × 3
  condition  mean    sd
  <chr>     <dbl> <dbl>
1 Fixed      3.76 0.889
2 Growth     3.49 0.817

# Inferential statistics 
lm(faculty_mindset_m ~ condition_c + personal_mindset_m,
   data = df.data) %>% 
  summary()


Call:
lm(formula = faculty_mindset_m ~ condition_c + personal_mindset_m, 
    data = df.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6577 -0.4270 -0.0526  0.6258  1.7423 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)          3.3837     0.4844   6.986 1.65e-07 ***
condition_c         -0.2842     0.3137  -0.906    0.373    
personal_mindset_m   0.1055     0.1208   0.873    0.390    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8573 on 27 degrees of freedom
Multiple R-squared:  0.05224,   Adjusted R-squared:  -0.01796 
F-statistic: 0.7441 on 2 and 27 DF,  p-value: 0.4846

Analysis #2: Perceived Stereotype Endorsement (controlling for personal mindset)

# Descriptive Statistics
df.data %>% 
  group_by(condition) %>% 
  summarise(mean = mean(stereo_endorse_m, na.rm = TRUE),
            sd = sd(stereo_endorse_m, na.rm = TRUE))

# A tibble: 2 × 3
  condition  mean    sd
  <chr>     <dbl> <dbl>
1 Fixed      3.5   1.10
2 Growth     3.67  1.52

# Inferential statistics 
lm(stereo_endorse_m ~ condition_c * gender_c + personal_mindset_m,
   data = df.data) %>% 
  summary()


Call:
lm(formula = stereo_endorse_m ~ condition_c * gender_c + personal_mindset_m, 
    data = df.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.1927 -0.9930  0.1515  0.8426  2.3844 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)   
(Intercept)           3.08913    0.84413   3.660  0.00118 **
condition_c           0.22448    0.65310   0.344  0.73394   
gender_c              0.53109    0.81298   0.653  0.51955   
personal_mindset_m    0.07549    0.20153   0.375  0.71113   
condition_c:gender_c -0.37845    1.09035  -0.347  0.73143   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.389 on 25 degrees of freedom
Multiple R-squared:  0.02576,   Adjusted R-squared:  -0.1301 
F-statistic: 0.1652 on 4 and 25 DF,  p-value: 0.954

Analysis #3: Anticipated Belonging (controlling for personal mindset; key analysis)

# Descriptive Statistics
df.data %>% 
  group_by(condition) %>% 
  summarise(mean = mean(belong_m, na.rm = TRUE),
            sd = sd(belong_m, na.rm = TRUE))

# A tibble: 2 × 3
  condition  mean    sd
  <chr>     <dbl> <dbl>
1 Fixed      3.63 1.14 
2 Growth     3.57 0.980

# Inferential statistics 
lm(belong_m ~ condition_c * gender_c + personal_mindset_m,
   data = df.data) %>% 
  summary()


Call:
lm(formula = belong_m ~ condition_c * gender_c + personal_mindset_m, 
    data = df.data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.8499 -0.7047 -0.3085  0.5838  1.6779 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)            3.7605     0.6224   6.042  2.6e-06 ***
condition_c           -0.3189     0.4816  -0.662   0.5139    
gender_c              -1.2742     0.5995  -2.126   0.0436 *  
personal_mindset_m     0.0596     0.1486   0.401   0.6918    
condition_c:gender_c   1.0653     0.8040   1.325   0.1972    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.025 on 25 degrees of freedom
Multiple R-squared:  0.1689,    Adjusted R-squared:  0.03589 
F-statistic:  1.27 on 4 and 25 DF,  p-value: 0.3081
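The descriptives above are split only by condition, but the key test is a Condition x Gender interaction. Cell means and simple effects make that interaction easier to read. The sketch below assumes 0/1 coding as in the analysis plan (condition: 1 = growth; gender: 1 = female). Under that coding, the condition coefficient is the effect of the growth syllabus among men, and adding the interaction coefficient gives the effect among women. If `condition_c` and `gender_c` in df.data are centered rather than 0/1, this arithmetic does not apply as written. The data here are simulated.

```r
# Sketch: cell means and simple effects for a 2 x 2 design (0/1 coding assumed).
cell_means <- function(d) {
  aggregate(belong ~ condition + gender, data = d, FUN = mean)
}

simple_effects <- function(fit) {
  b <- coef(fit)
  c(men = unname(b["condition"]),                            # gender = 0
    women = unname(b["condition"] + b["condition:gender"]))  # gender = 1
}

# Simulated, balanced example (20 per cell)
set.seed(42)
demo <- data.frame(condition = rep(c(0, 1), times = 40),
                   gender = rep(c(0, 1), each = 40))
demo$belong <- 3.6 - 1.2 * demo$gender + 1 * demo$condition * demo$gender +
  rnorm(80)

cell_means(demo)
simple_effects(lm(belong ~ condition * gender, data = demo))
```

Without a covariate, the simple effects equal the differences between cell means. If the personal mindset covariate is added to the model, the same coefficient arithmetic gives covariate-adjusted simple effects.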

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.