Replication of Study 1 by Muenks, K., Canning, E. A., LaCosse, J., Green, D. J., Zirkel, S., Garcia, J. A., & Murphy, M. C. (2020, Journal of Experimental Psychology: General)
Author
Ramya Kumar (ramyakmr@stanford.edu)
Published
December 1, 2025
Introduction
My current research interests focus on how messaging about belonging, ability mindset, and values communicated by the environment (in the form of media, professors, teachers, peers, etc.) can impact underserved students’ experiences in STEM education. My qualifying project explores how classroom remarks related to belonging (encouraged by the instructors’ own personal experiences when they were taking the class) made by STEM instructors on the first day of class can “set the scene” for students, potentially affecting their sense of belonging in the course. I was drawn to the Muenks et al. (2020) paper because Study 1 uses stimuli and measurements similar to those in my qualifying project. In particular, the researchers created video clips of the first section of a made-up Calculus I class that highlighted either fixed or growth mindset ability beliefs from the instructor. Study 1 found that students who perceived their instructor to endorse more fixed mindset beliefs reported less belonging in the class, higher impostor feelings, and higher evaluation anxiety.
I am interested in replicating Study 1 of the paper. Study 1 had 255 Prolific participants who indicated that they were currently enrolled in college. These participants were randomly assigned to watch a video clip of the first section of a calculus class (growth mindset, fixed mindset, or control condition). The experiment manipulates the instructor’s endorsement of mindset beliefs as depicted in the video clips. The script goes through course syllabus policies but reflects the instructor’s ability mindset beliefs through language indicating who should be taking the class (the control video clip used no ability mindset language). After watching the video clip, students were asked a series of survey questions about their perceptions of the class and instructor. In particular, the study examined how the manipulation affected their belonging, interest, and motivation in the class, measured with Likert-scale response items.
Link to the Repository: https://github.com/psych251/Muenks2020#
Link to the original paper: https://github.com/psych251/Muenks2020/blob/main/original_paper/Muenks2020.pdf
Methods
Power Analysis
For Study 1, the original effect sizes are summarized in Table 2 of the paper. I am interested in replicating the main effect of condition on the same psychosocial variables in the study. The effect sizes for the main effect of condition are:
Using G*Power, an a priori analysis at 80% power suggests that I need between 87 and 244 participants in total to detect the effect sizes reported in the paper (Cohen's f between .33 and .20, respectively; ANCOVA with 3 groups and 3 covariates). Taking the average effect size (f = .28), the a priori analysis at 80% power indicates that I need 127 participants to be sufficiently powered to replicate the findings from the original study. To divide the sample evenly among the 3 conditions, I will collect data from 129 participants (43 in each condition).
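The sample sizes above can be cross-checked in base R. The sketch below is not the G*Power calculation itself. It assumes the usual noncentral F setup for an ANCOVA omnibus test (numerator df = groups - 1, denominator df = N - groups - covariates, noncentrality = f² × N), so results may differ from G*Power by a participant or two. The helper names (`eta2_to_f`, `ancova_power`, `required_n`) are hypothetical, and the conversion from partial eta squared is only needed if Table 2 reports that statistic, which this report does not state.

```r
# Convert partial eta squared to Cohen's f (only needed if the source
# table reports partial eta squared; that is an assumption here).
eta2_to_f <- function(eta2) sqrt(eta2 / (1 - eta2))

# Power of the omnibus condition test in an ANCOVA, based on the
# noncentral F distribution with noncentrality f^2 * N.
ancova_power <- function(n_total, f, n_groups = 3, n_covariates = 3,
                         alpha = 0.05) {
  df1 <- n_groups - 1
  df2 <- n_total - n_groups - n_covariates
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = f^2 * n_total, lower.tail = FALSE)
}

# Smallest total N that reaches the target power (simple upward search).
required_n <- function(f, target_power = 0.80, n_groups = 3,
                       n_covariates = 3, alpha = 0.05) {
  n <- n_groups + n_covariates + 2
  while (ancova_power(n, f, n_groups, n_covariates, alpha) < target_power) {
    n <- n + 1
  }
  n
}

# Effect sizes named in the text, crossed with 80%, 90%, and 95% power.
effect_sizes <- c(smallest = 0.20, average = 0.28, largest = 0.33)
targets <- c(0.80, 0.90, 0.95)
n_table <- sapply(targets, function(p) {
  sapply(effect_sizes, required_n, target_power = p)
})
colnames(n_table) <- paste0(targets * 100, "%")
print(n_table)
```

Each row of `n_table` is one effect size and each column one power level, which makes the feasibility trade-off between a larger sample and higher power easy to read off.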
Planned Sample
The planned sample will be college students in the United States recruited from Prolific. To obtain a sample similar to the original study's, I will use the same filters/pre-selection criteria as the original paper. The paper used 2 criteria: (1) the participant must be located in the US, and (2) the participant must currently be a college student. The same filters on Prolific will be used for this replication.
In addition to the pre-selection filters, this replication applies one exclusion criterion that is not mentioned in the original: participants who fail the audio check at the beginning of the experiment (i.e., participants who cannot hear the audio in the study session) will be excluded. These participants will be removed from the study because the video clips in each condition require participants to hear the audio. It is unclear whether an audio check was used in the original paper’s study.
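The audio-check exclusion could be applied along these lines. This is a minimal sketch on made-up data. The column names (`participant_id`, `audio_check_passed`) are hypothetical, and treating a missing audio check as a failure is an assumption rather than a rule stated in this report.

```r
# Hypothetical raw data: one row per participant (values are made up).
raw <- data.frame(
  participant_id = 1:6,
  condition = c("growth", "fixed", "control", "growth", "fixed", "control"),
  audio_check_passed = c(TRUE, TRUE, FALSE, TRUE, NA, TRUE)
)

# Keep only participants who passed the audio check.
# Assumption: a missing audio check counts as a failure.
passed <- !is.na(raw$audio_check_passed) & raw$audio_check_passed
analysis_sample <- raw[passed, ]

# Report how many participants were excluded, overall and per condition.
n_excluded <- nrow(raw) - nrow(analysis_sample)
excluded_by_condition <- table(raw$condition[!passed])
print(n_excluded)
print(excluded_by_condition)
```

Tabulating exclusions by condition is a quick way to see whether the audio check removed participants unevenly across the three videos.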
Materials and Procedures
See the original paper’s materials, procedures, and measures below:
“Institutional Review Board (IRB) approval was obtained prior to data collection. Students were invited to participate in a study about their opinions of college courses. They were randomly assigned to watch one of three short video clips from the first day of a Calculus course in which the mindset manipulations were embedded. In these course introduction videos, the same actor (an older White man) read several sections of his syllabus regarding his beliefs about what it took to do well in the class and his expectations of students (see the online supplemental materials for the class description and video scripts). Interspersed through the professor’s remarks were comments suggesting that he either endorsed (a) fixed mindset beliefs (e.g., “In this course, you either know the concepts and have the skills, or you don’t”), (b) growth mindset beliefs (“These assignments are designed to help you improve your skills throughout the semester”), or (c) no mindset beliefs. After watching the randomly assigned video, students completed a brief survey that included our dependent variables, were debriefed, and compensated for their participation. The full text of all measures and manipulations can be found in the online supplemental materials.
Professor mindset manipulation check. To assess whether students perceived the manipulation as intended, students responded to four items adapted from Dweck (1999) about their perceptions of the professor’s mindset (e.g., “The professor in this class seems to believe that students have a certain amount of intelligence, and they really can’t do much to change it”) on a scale from 1 (strongly disagree) to 6 (strongly agree). Items were recoded so that higher values indicated stronger fixed mindset perceptions and averaged to form a composite.
Demographics. Students self-reported their gender (0 = male, 1 = female) and race/ethnicity, which we then coded to denote underrepresented racial minority (URM) group membership (0 = White, East Asian, and Southeast Asian students; 1 = Black, Hispanic, Indian Subcontinent, Native American, Mixed race, Middle Eastern, and Other race students). Because previous research has found mean-level differences in students’ belonging and evaluative concerns in STEM settings based on their gender and race (e.g., Murphy & Zirkel, 2015; Rudolph & Conley, 2005), we coded and controlled for these factors in our analyses so that we could observe the effect of faculty mindset above and beyond these potential group differences.
Personal mindset beliefs. Four items (Dweck, 1999) assessed students’ own mindset beliefs (e.g., “You have a certain amount of intelligence, and you can’t really do much to change it”) rated on a scale from 1 (strongly agree) to 6 (strongly disagree). We recoded values such that higher values indicated greater fixed mindset beliefs. In our analyses, we controlled for students’ personal mindset beliefs because much previous research has found that people’s personal mindset beliefs influence their experiences and behavior (e.g., Blackwell et al., 2007; see Yeager & Dweck, 2012, for a review) and we aimed to examine the effect of faculty mindset above and beyond students’ personal mindset beliefs.
Psychological vulnerability. To assess students’ anticipated psychological vulnerability in the professor’s class, we measured their anticipated feelings of belonging using five items adapted from Murphy and Zirkel (2015; e.g., “How much would you feel that you ‘fit in’ during this class?”). We also measured their anticipated evaluative concerns using five items adapted from Wout, Murphy, and Steele (2010; e.g., “How much would you worry that you might say the wrong thing in class?”). Both scales ranged from 1 (not at all) to 7 (extremely). A composite was created by averaging the belonging and evaluative concerns items; higher numbers indicated greater anticipated psychological vulnerability in the professor’s class.
Course engagement. Anticipated course engagement was assessed by asking students how motivated and willing they would be to put in effort in the professor’s course using a three-item measure (e.g., “I think I would be willing to put in extra effort if the professor asked me to”). Items were rated on a scale ranging from 1 (strongly disagree) to 8 (strongly agree) and averaged to form a course engagement composite; higher scores indicate more course engagement.
Course interest. To assess anticipated course interest, students completed a three-item measure (e.g., “How interested would you be in taking a class taught by the professor?”). The scale ranged from 1 (not at all) to 6 (extremely) and items were averaged to form a course interest composite; higher scores indicate more course interest.
Course performance. To assess anticipated course performance, students completed a three-item measure (“I think I would get a good grade in this class”). The scale ranged from 1 (strongly disagree) to 8 (strongly agree); items were averaged to form an anticipated performance composite; higher scores indicate greater anticipated performance.”
Materials, procedures, and measures for this replication follow the original almost exactly. The materials and measures were obtained from the supplementary materials of the original paper. There are some key differences between the original and the replication of Study 1. First, we did not have access to the videos used in the original experiment. Using the script provided in the paper, we re-created the videos with a different actor (still an older White man, as in the original paper). Second, the materials provided only the minimum and maximum values and labels for each response scale, so I had to infer the labels for the intermediate scale points. Third, I added audio and attention checks to the replication to make sure the participants were able to hear the video and understand the content.
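The recoding and averaging steps described in the quoted measures can be sketched in base R. This is an illustration on made-up responses, not the original authors' scoring code. The item names (`pm_1` to `pm_4`) are hypothetical, the sketch assumes all four personal mindset items are worded in the fixed direction, and averaging over whichever items were answered (`na.rm = TRUE`) is an assumption about missing-data handling.

```r
# Reverse-score an item on a scale running from scale_min to scale_max.
reverse_score <- function(x, scale_min, scale_max) {
  (scale_min + scale_max) - x
}

# Hypothetical responses: four personal mindset items rated
# 1 (strongly agree) to 6 (strongly disagree). Values are made up.
responses <- data.frame(
  pm_1 = c(1, 6, 3),
  pm_2 = c(2, 5, 3),
  pm_3 = c(1, 6, 4),
  pm_4 = c(2, 6, NA)
)
item_cols <- c("pm_1", "pm_2", "pm_3", "pm_4")

# After reversal, higher values mean stronger fixed mindset beliefs.
recoded <- as.data.frame(lapply(responses[item_cols], reverse_score,
                                scale_min = 1, scale_max = 6))

# Average the items into a composite.
# Assumption: participants with a skipped item are scored on the rest.
responses$personalmindset <- rowMeans(recoded, na.rm = TRUE)
print(responses)
```

The same two steps apply to the other composites, with the scale endpoints changed to match each measure (for example 1 to 7 for belonging and 1 to 8 for course engagement).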
Analysis Plan
I will follow the original paper's analysis plan for Study 1 exactly:
“To analyze the manipulation check, we used an analysis of covariance (ANCOVA) and for the remaining analyses, we employed analyses of covariance (ANCOVAs) with students’ personal mindset beliefs, their gender (0 = male, 1 = female), and their race (0 = White, East Asian, and Southeast Asian students; 1 = Black, Hispanic, Indian Subcontinent, Native American, Mixed race, Middle Eastern, Other race students) entered as covariates. Table 2 includes summary statistics from the ANCOVA analyses across variables. Figure 2 depicts condition differences across variables.”
For this replication, I will be conducting the ANCOVA analyses (using personal mindset, gender, and race as covariates) in order to replicate the main effects of condition (growth mindset instructor, fixed mindset instructor, control instructor) on anticipated belonging, evaluative concerns, course engagement, course interest, and course performance. Unlike the original experiment, I will not be replicating the psychological vulnerability composite.
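To compare the replication with the original effect sizes, each ANCOVA can be paired with an effect size for condition. The sketch below runs on simulated data (all values are made up) and uses base R's `drop1()`, which tests each term after adjusting for the others. For a model with main effects only, this should give the same F tests as a Type II ANOVA. The variable names mirror those in the analysis code, and the simulated condition differences are arbitrary.

```r
set.seed(251)

# Simulated data standing in for the real sample (values are made up).
n_per_group <- 43
n_total <- 3 * n_per_group
sim <- data.frame(
  condition = factor(rep(c("control", "fixed", "growth"), each = n_per_group)),
  dc_gender = rbinom(n_total, 1, 0.5),
  dc_urm = rbinom(n_total, 1, 0.4),
  personalmindset = runif(n_total, 1, 6)
)

# Arbitrary condition differences, used only to give the model something to detect.
condition_shift <- c(control = 0, fixed = -0.6, growth = 0.3)
sim$belonging <- 4 + unname(condition_shift[as.character(sim$condition)]) +
  0.1 * sim$personalmindset + rnorm(n_total)

# ANCOVA as a linear model: condition plus the three covariates.
model <- lm(belonging ~ condition + dc_urm + dc_gender + personalmindset,
            data = sim)

# F test for each term, adjusting for all other terms in the model.
tests <- drop1(model, test = "F")
print(tests)

# Partial eta squared and Cohen's f for the condition effect.
ss_condition <- tests["condition", "Sum of Sq"]
ss_residual <- sum(residuals(model)^2)
partial_eta2 <- ss_condition / (ss_condition + ss_residual)
cohens_f <- sqrt(partial_eta2 / (1 - partial_eta2))
print(c(partial_eta2 = partial_eta2, cohens_f = cohens_f))
```

Reporting Cohen's f alongside the F test puts the replication estimate on the same scale as the values used in the power analysis.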
Differences from Original Study
Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.
Methods Addendum (Post Data Collection)
You can comment this section out prior to final report with data collection.
Actual Sample
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Differences from pre-data collection methods plan
Any differences from what was described as the original plan, or “none”.
Results
For this assignment, I am using data from both pilot A (friends, family, and course TAs) and pilot B (2 participants from Prolific) to test-run the ANCOVAs and the main analysis of interest.
Data preparation
Data preparation following the analysis plan.
## Load all packages needed for reshaping data
library(tidyverse) # for piping, useful packages
library(ltm) # for Cronbach's alpha
library(effects) # for predicted data for plotting
library(ggplot2) # for graphs
library(sjPlot) # correlation & model output
library(car) # for comparing coefficients
library(emmeans) # for unpacking interactions
library(ggpubr) # for significance on plots
library(psych)
library(effectsize)
library(jpeg)
library(grid)
library(gridExtra)

# Read the pilot B data (named df.pilotb to match the model code below)
df.pilotb <- read.csv("pilotb_data.csv")

# Fit an ANCOVA (as a linear model) for one outcome:
# condition is the factor of interest; URM status, gender, and personal mindset are covariates
run_ancova <- function(outcome_var, data) {
  formula <- as.formula(paste(outcome_var, "~ condition + dc_urm + dc_gender + personalmindset"))
  model <- lm(formula, data = data)
  return(model)
}

# List of outcome variables (spelling must match the column names in pilotb_data.csv)
outcomes <- c("belonging", "evaluation", "engagement", "interest", "preformance")

# Run ANCOVA for each outcome
models <- lapply(outcomes, run_ancova, data = df.pilotb)
names(models) <- outcomes

# View Type II ANOVA tables for each model
lapply(models, function(m) Anova(m, type = "II"))
Warning: There were 10 warnings in `summarise()`.
The first warning was:
ℹ In argument: `CI_lower = Mean - qt(0.975, df = sum(!is.na(value)) - 1) * SE`.
ℹ In group 16: `condition = NA` and `variable = "belonging"`.
Caused by warning in `qt()`:
! NaNs produced
ℹ Run `dplyr::last_dplyr_warnings()` to see the 9 remaining warnings.
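The warning above likely arises because at least one group (here the rows with a missing `condition`) has too few observations for `qt()` to have a positive number of degrees of freedom. One way to avoid it is sketched below in base R on made-up data. The data frame and the helper `summarise_group` are hypothetical, and dropping rows with a missing condition is an assumption about how those rows should be handled.

```r
# Hypothetical long-format data with a missing condition label and a
# single-observation group, two situations that can break the interval.
long <- data.frame(
  condition = c("growth", "growth", "growth", "fixed", "fixed", "control", NA),
  variable = "belonging",
  value = c(5.2, 4.8, 5.6, 3.1, 3.9, 4.4, 4.0)
)

# Assumption: rows without a condition or a value are dropped before summarising.
complete <- long[!is.na(long$condition) & !is.na(long$value), ]

# Mean and 95% confidence interval for one group of values.
summarise_group <- function(v) {
  n <- length(v)
  # qt() needs at least one degree of freedom; otherwise return NA on purpose.
  margin <- if (n > 1) qt(0.975, df = n - 1) * sd(v) / sqrt(n) else NA_real_
  c(n = n, mean = mean(v), ci_lower = mean(v) - margin,
    ci_upper = mean(v) + margin)
}

by_group <- aggregate(value ~ condition + variable, data = complete,
                      FUN = summarise_group)

# aggregate() stores the summary as a matrix column; flatten it to plain columns.
summary_table <- do.call(data.frame, by_group)
print(summary_table)
```

Groups with a single observation keep their mean but get `NA` interval bounds, so the plot can omit those error bars without any warning being raised.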
Side-by-side graph with original graph is ideal here
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Replication Attempt
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.