The paper I will be replicating for my PSYCH 251 project is Muenks et al. (2020), an educational psychology study published in the Journal of Experimental Psychology: General. The authors explored how college students’ perceptions of their STEM instructors’ mindset beliefs (i.e., beliefs about the malleability of intelligence: fixed vs. growth mindset) influence students’ psychological experiences in their STEM classes. Across two experiments and two field studies, the paper found that students who perceived their instructor to hold a fixed mindset anticipated (Studies 1 and 2) or actually experienced (Studies 3 and 4) a lower sense of belonging, greater evaluative concerns, and greater psychological vulnerability, which in turn predicted greater likelihood of dropping out, lower grades, and lower attendance in their STEM classes.
I am interested in replicating Study 1 of the paper. Study 1 recruited 255 participants on Prolific who indicated that they were currently enrolled in college. Participants were randomly assigned to watch a video clip of the first day of a calculus class (growth mindset condition, fixed mindset condition, or control condition). The experiment manipulated the instructor’s endorsement of mindset beliefs as depicted in the video clips: the script walks through class policies, and in the mindset conditions the language throughout reflects the instructor’s beliefs about ability and signals who should be taking the class (the control clip used no ability mindset language). After watching the video clip, participants answered a series of survey questions about their perceptions of the class and instructor. In particular, the study measured how the videos affected their anticipated belonging, interest, and motivation in the class using Likert-scale response items.
*Link to pre-registration: https://osf.io/8wp5k/overview
*Link to the repository: https://github.com/psych251/Muenks2020#
*Link to the survey paradigm: https://stanforduniversity.qualtrics.com/jfe/form/SV_7PNJTKIHxk8Gk7k
*Link to the original paper: https://github.com/psych251/Muenks2020/blob/main/original_paper/Muenks2020.pdf
Methods
Power Analysis
For Study 1, the original effect sizes are summarized in Table 2 of the paper. I am interested in replicating the main effect of condition on the same psychosocial variables in the study. The effect sizes (Cohen’s f) for the main effect of condition ranged from .20 to .33.
Using G*Power, an a priori analysis at 80% power suggests that I need between 87 and 244 total participants to detect the original effect sizes (Cohen’s f between .20 and .33) with the main statistical test from the paper (an ANCOVA with 3 groups and 3 covariates). Taking the average effect size (f = .28), the a priori analysis at 80% power indicates that 127 participants are needed to be sufficiently powered to replicate the findings from the original study. To divide the sample evenly among the 3 conditions, I will collect 135 participants (45 per condition).
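As a rough cross-check of the G*Power numbers, the same a priori calculation can be approximated in R with the pwr package. This is only a sketch, not the procedure used for the report: pwr.f2.test’s general-linear-model F test stands in for G*Power’s ANCOVA option, and converting denominator degrees of freedom to total N assumes 6 estimated parameters (intercept, 2 condition dummies, 3 covariates).

```r
# Sketch: approximate the a priori power analysis (the report's numbers
# come from G*Power, whose noncentrality convention differs slightly)
library(pwr)

f <- 0.28  # average Cohen's f across the original Table 2 effects
res <- pwr.f2.test(
  u = 2,           # numerator df: 3 conditions - 1
  f2 = f^2,        # pwr works on f^2, not f
  sig.level = .05,
  power = .80
)

# Total N = denominator df + number of estimated parameters
# (intercept + 2 condition dummies + 3 covariates = 6)
ceiling(res$v + 6)  # ~126-127, close to the G*Power estimate of 127
```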
Planned Sample
The planned sample will be college students in the United States recruited from Prolific. To obtain a sample similar to the original study’s, I will use the same filters/pre-selection criteria as the original paper, which used two criteria: 1) the participant must be located in the US, and 2) the participant must currently be a college student.
Materials and Procedures
See the original paper’s materials, procedure, and measurement information below:
“Institutional Review Board (IRB) approval was obtained prior to data collection. Students were invited to participate in a study about their opinions of college courses. They were randomly assigned to watch one of three short video clips from the first day of a Calculus course in which the mindset manipulations were embedded. In these course introduction videos, the same actor (an older White man) read several sections of his syllabus regarding his beliefs about what it took to do well in the class and his expectations of students (see the online supplemental materials for the class description and video scripts). Interspersed through the professor’s remarks were comments suggesting that he either endorsed (a) fixed mindset beliefs (e.g., “In this course, you either know the concepts and have the skills, or you don’t”), (b) growth mindset beliefs (“These assignments are designed to help you improve your skills throughout the semester”), or (c) no mindset beliefs. After watching the randomly assigned video, students completed a brief survey that included our dependent variables, were debriefed, and compensated for their participation. The full text of all measures and manipulations can be found in the online supplemental materials.
Professor mindset manipulation check. To assess whether students perceived the manipulation as intended, students responded to four items adapted from Dweck (1999) about their perceptions of the professor’s mindset (e.g., “The professor in this class seems to believe that students have a certain amount of intelligence, and they really can’t do much to change it”) on a scale from 1 (strongly disagree) to 6 (strongly agree). Items were recoded so that higher values indicated stronger fixed mindset perceptions and averaged to form a composite.
Demographics. Students self-reported their gender (0 = male, 1 = female) and race/ethnicity, which we then coded to denote underrepresented racial minority (URM) group membership (0 = White, East Asian, and Southeast Asian students; 1 = Black, Hispanic, Indian Subcontinent, Native American, Mixed race, Middle Eastern, and Other race students). Because previous research has found mean-level differences in students’ belonging and evaluative concerns in STEM settings based on their gender and race (e.g., Murphy & Zirkel, 2015; Rudolph & Conley, 2005), we coded and controlled for these factors in our analyses so that we could observe the effect of faculty mindset above and beyond these potential group differences.
Personal mindset beliefs. Four items (Dweck, 1999) assessed students’ own mindset beliefs (e.g., “You have a certain amount of intelligence, and you can’t really do much to change it”) rated on a scale from 1 (strongly agree) to 6 (strongly disagree). We recoded values such that higher values indicated greater fixed mindset beliefs. In our analyses, we controlled for students’ personal mindset beliefs because much previous research has found that people’s personal mindset beliefs influence their experiences and behavior (e.g., Blackwell et al., 2007; see Yeager & Dweck, 2012, for a review) and we aimed to examine the effect of faculty mindset above and beyond students’ personal mindset beliefs.
Psychological vulnerability. To assess students’ anticipated psychological vulnerability in the professor’s class, we measured their anticipated feelings of belonging using five items adapted from Murphy and Zirkel (2015; e.g., “How much would you feel that you ‘fit in’ during this class?”). We also measured their anticipated evaluative concerns using five items adapted from Wout, Murphy, and Steele (2010; e.g., “How much would you worry that you might say the wrong thing in class?”). Both scales ranged from 1 (not at all) to 7 (extremely). A composite was created by averaging the belonging and evaluative concerns items; higher numbers indicated greater anticipated psychological vulnerability in the professor’s class.
Course engagement. Anticipated course engagement was assessed by asking students how motivated and willing they would be to put in effort in the professor’s course using a three-item measure (e.g., “I think I would be willing to put in extra effort if the professor asked me to”). Items were rated on a scale ranging from 1 (strongly disagree) to 8 (strongly agree) and averaged to form a course engagement composite; higher scores indicate more course engagement.
Course interest. To assess anticipated course interest, students completed a three-item measure (e.g., “How interested would you be in taking a class taught by the professor?”). The scale ranged from 1 (not at all) to 6 (extremely) and items were averaged to form a course interest composite; higher scores indicate more course interest.
Course performance. To assess anticipated course performance, students completed a three-item measure (“I think I would get a good grade in this class”). The scale ranged from 1 (strongly disagree) to 8 (strongly agree); items were averaged to form an anticipated performance composite; higher scores indicate greater anticipated performance.”
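To make the scoring concrete, below is a minimal sketch of how one of these composites (personal mindset) could be formed in R. The item column names pm_1 through pm_4 are hypothetical; the recode 7 − x follows from the quoted 1 (strongly agree) to 6 (strongly disagree) scale, so that higher values indicate stronger fixed mindset beliefs.

```r
# Sketch: reverse-code four personal mindset items (hypothetical columns
# pm_1:pm_4, rated 1 = strongly agree to 6 = strongly disagree) so that
# higher = more fixed, then average into a composite
library(dplyr)
library(ltm)  # for cronbach.alpha()

df <- df %>%
  mutate(across(pm_1:pm_4, \(x) 7 - x)) %>%   # recode on a 1-6 scale
  mutate(personalmindset = rowMeans(across(pm_1:pm_4), na.rm = TRUE))

# Internal consistency of the composite
cronbach.alpha(select(df, pm_1:pm_4), na.rm = TRUE)
```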
Analysis Plan
I will follow the analysis plan for Study 1 exactly as described in the original paper:
“To analyze the manipulation check, we used an analysis of covariance (ANCOVA) and for the remaining analyses, we employed analyses of covariance (ANCOVAs) with students’ personal mindset beliefs, their gender (0 = male, 1 = female), and their race (0 =White, East Asian, and Southeast Asian students; 1 = Black, Hispanic, Indian Subcontinent, Native American, Mixed race, Middle Eastern, Other race students) entered as covariates. Table 2 includes summary statistics from the ANCOVA analyses across variables. Figure 2 depicts condition differences across variables.”
For this replication, I will be conducting the ANCOVA analysis (using personal mindset, gender, and race as covariates) in order to replicate the main effects of condition (growth mindset instructor, fixed mindset instructor, or control instructor) on anticipated belonging, evaluative concerns, course engagement, course interest, and anticipated course performance.
Differences from Original Study
Since the original study was also run on Prolific, the anticipated differences in sample and setting are minimal, so I anticipate no differences in the claims made by the original paper due to sample differences. For my replication, the materials, procedures, and measurements followed the original almost exactly; the materials and measurements were obtained from the supplementary materials of the original paper.
There were some key differences between the original study and this replication of Study 1. In particular, we did not have access to the original videos, so we re-created them from the script provided in the paper using a different actor (still an older White man, as specified in the paper). Additionally, the materials provided no response-scale labels beyond the minimum and maximum values, so I had to make an educated guess about the middle labels. I also added audio and attention checks to make sure participants could hear the video and understood the content. I do not expect these methodological differences (adding verbal labels to the middle scale points, using a different actor for the videos, or adding attention/audio checks) to affect the claims of the original paper, given that I used the same script and outcome questions the authors provided.
Because I included audio and attention check questions, participants who failed the audio check at the beginning of the experiment (i.e., participants who could not hear the audio in the study session) or who failed the attention check question (“Please rate how strongly you agree or disagree with each of the following statements: Select the ‘disagree’ option.”; i.e., participants who were not reading the questions carefully) were excluded from the analyses. I also added some attention checks about the video clips (e.g., remembering particular classroom policies), but I did not exclude participants for failing them: missing these questions does not necessarily mean a participant was not paying attention, only that they did not remember the specific details being asked about. It is unclear whether audio or attention checks were used in the original paper’s study.
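A minimal sketch of this exclusion step, assuming hypothetical column names and codings (audio_check recording whether the participant reported hearing the audio, attention_check recording the response to the “Select the ‘disagree’ option” item):

```r
# Sketch of the exclusion step; column names and codings are assumptions
library(dplyr)

df.prolific <- prolific %>%
  filter(audio_check == "pass",           # could hear the audio
         attention_check == "disagree")   # selected the instructed option
```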
Regarding the analysis plan, I will not be replicating the psychological vulnerability composite (I am interested in belonging and evaluative concerns as separate constructs) or the manipulation check analysis. Additionally, I will apply a Bonferroni correction because multiple outcomes are being analyzed. This was not done in the original paper, but it seems essential for controlling the increased likelihood of false positives when running multiple tests.
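Concretely, with five outcomes the Bonferroni-adjusted p value is min(1, 5p), which is what p.adjust() computes in the analysis code below. For example:

```r
# With 5 outcomes, Bonferroni multiplies each p by 5 (capped at 1)
p.adjust(c(0.030, 0.001), method = "bonferroni", n = 5)
#> [1] 0.150 0.005
```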
Methods Addendum
Actual Sample
I collected 137 participants from Prolific. One participant was excluded from the analyses for failing the attention check question, and none were removed for failing the audio check question, leaving a final sample of 136. It is unclear why I received more than the 135 participants I specified on Prolific; a few participants may have submitted the survey twice, but there is no evidence of this by IP address. The demographic breakdown of my sample is as follows: 46% identified as female; 57% identified as White, 16% as Black, 12% as Asian, 10% as Hispanic, and 5% as another race/ethnicity. This is comparable to the original sample from the paper, except that I had a higher percentage of Black students in my sample.
Differences from pre-data collection methods plan
After finishing data collection, I realized I had selected the wrong filter on Prolific. Specifically, I assumed the student-status filter pre-selected college students. The more specific approach would have been to filter on education level, allowing only participants who indicated that their “highest level of education is high school (or equivalent such as GED).” Because of this, the student-status pre-selection filter could have made both college and graduate students eligible for the study. I did specify on the study posting that participants must currently be enrolled in college, but I did not include a question in my paradigm to verify that they actually were college students. Since I did not have the budget to redo the study with a new set of participants, this is a limitation/difference I will acknowledge when interpreting the analyses and findings.
Results
Data preparation
```r
## Load all packages needed for reshaping data
library(tidyverse)   # for piping, useful packages
library(ltm)         # for Cronbach's alpha
library(effects)     # for predicted data for plotting
library(ggplot2)     # for graphs
library(sjPlot)      # correlation & model output
library(car)         # for comparing coefficients
library(emmeans)     # for unpacking interactions
library(ggpubr)      # for significance on plots
library(psych)
library(effectsize)
library(jpeg)
library(grid)
library(gridExtra)
library(kableExtra)

prolific <- read.csv("prolific_data.csv")
```
For this replication’s confirmatory analysis, I have summarized below the descriptive statistics (mean and standard deviation) for personal mindset, the manipulation check (the instructor’s endorsed mindset), and the main psychological experience outcomes, broken down by condition (growth, fixed, or control mindset instructor) and across all participants.
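The chunk that computes these summaries is not echoed in the knitted output. Below is a minimal sketch of how final_table might be assembled, assuming the outcome column names used in the analysis code further down plus a hypothetical instructormindset column for the manipulation check; the final reshape so that outcomes form the rows is omitted.

```r
# Sketch: per-condition and overall means/SDs, rounded to 2 digits
# (uses dplyr, loaded above via tidyverse; instructormindset is assumed)
summarise_stats <- function(d) {
  d %>% summarise(across(
    c(personalmindset, instructormindset, belonging, evaluation,
      interest, engagement, preformance),
    list(mean = \(x) mean(x, na.rm = TRUE), sd = \(x) sd(x, na.rm = TRUE))
  ))
}

final_table <- bind_rows(
  df.prolific %>% group_by(condition) %>% summarise_stats(),
  df.prolific %>% mutate(condition = "total sample") %>%
    group_by(condition) %>% summarise_stats()
) %>%
  mutate(across(where(is.numeric), \(x) round(x, digits = 2)))
```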
```r
# print demographic table
final_table %>%
  kable(caption = "Table 1. Descriptive statistics by instructor condition and total sample") %>%
  kable_styling(full_width = F, position = "left")
```
Table 1. Descriptive statistics by instructor condition and total sample

| Outcome | Control M | Control SD | Growth M | Growth SD | Fixed M | Fixed SD | Total M | Total SD |
|---|---|---|---|---|---|---|---|---|
| Personal mindset | 2.67 | 1.25 | 2.70 | 0.95 | 2.51 | 1.08 | 2.63 | 1.10 |
| Instructor mindset | 2.80 | 1.04 | 2.66 | 1.25 | 4.62 | 1.06 | 3.35 | 1.43 |
| Anticipated belonging | 3.92 | 1.05 | 4.27 | 1.25 | 3.12 | 1.15 | 3.77 | 1.24 |
| Evaluative concerns | 3.11 | 1.69 | 2.90 | 1.71 | 4.65 | 1.93 | 3.55 | 1.93 |
| Course interest | 3.30 | 1.48 | 4.01 | 1.24 | 2.30 | 1.62 | 3.21 | 1.60 |
| Course engagement | 5.57 | 1.59 | 5.98 | 1.30 | 4.53 | 1.97 | 5.36 | 1.74 |
| Anticipated course performance | 5.03 | 0.92 | 5.04 | 1.05 | 4.61 | 1.27 | 4.89 | 1.10 |
Table 2 shows the ANCOVA results for the replication: the main effect of condition on each psychological experience outcome, with race, gender, and personal mindset entered as covariates.
```r
# function of ANCOVA
run_ancova <- function(outcome_var, data) {
  formula <- as.formula(paste(outcome_var, "~ condition + dc_urm + dc_gender + personalmindset"))
  model <- lm(formula, data = data)
  return(model)
}

# List of outcome variables
outcomes <- c("belonging", "evaluation", "engagement", "interest", "preformance")

# Run ANCOVA for each outcome
models <- lapply(outcomes, run_ancova, data = df.prolific)
names(models) <- outcomes

# View results for each
lapply(models, function(m) Anova(m, type = "II"))
```
```r
# Create ANCOVA summary table
results_summary <- data.frame()

for (i in 1:length(models)) {
  outcome_name <- names(models)[i]
  model <- models[[i]]

  # ANOVA results
  anova_result <- Anova(model, type = "II")
  condition_row <- anova_result["condition", ]

  # Partial eta squared
  eta_sq <- eta_squared(model, partial = TRUE)
  condition_eta <- eta_sq[eta_sq$Parameter == "condition", "Eta2_partial"]

  # Cohen's f from model (omnibus effect size)
  cohens_f <- effectsize::cohens_f(model, partial = TRUE)
  condition_f <- cohens_f[cohens_f$Parameter == "condition", "Cohens_f_partial"]

  # Model R-squared
  r_sq <- summary(model)$r.squared

  results_summary <- rbind(results_summary, data.frame(
    Outcome = outcome_name,
    F_value = round(condition_row$`F value`, 2),
    df1 = condition_row$Df,
    df2 = anova_result["Residuals", "Df"],
    p_value = condition_row$`Pr(>F)`,
    partial_eta_sq = round(condition_eta, 3),
    cohens_f = round(condition_f, 3),
    R_squared = round(r_sq, 3)
  ))
}

# Add corrections and significance
results_summary <- results_summary %>%
  mutate(p_bonferroni = p.adjust(p_value, method = "bonferroni"))

# Rename and replace the outcomes
outcome_names <- c(
  "personal"    = "Personal mindset",
  "instructor"  = "Instructor mindset",
  "belonging"   = "Anticipated belonging",
  "evaluation"  = "Evaluative concerns",
  "interest"    = "Course interest",
  "engagement"  = "Course engagement",
  "preformance" = "Anticipated course performance"
)
results_summary$Outcome <- factor(results_summary$Outcome,
                                  levels = names(outcome_names),
                                  labels = outcome_names)

# print ANCOVA results
results_summary %>%
  kable(caption = "Table 2. ANCOVA analysis of condition on psychological experiences") %>%
  kable_styling(full_width = F, position = "left")
```
Table 2. ANCOVA analysis of condition on psychological experiences

| Outcome | F | df1 | df2 | p | partial η² | Cohen's f | R² | p (Bonferroni) |
|---|---|---|---|---|---|---|---|---|
| Anticipated belonging | 10.58 | 2 | 127 | 0.0000564 | 0.148 | 0.416 | 0.161 | 0.0002818 |
| Evaluative concerns | 11.47 | 2 | 127 | 0.0000263 | 0.152 | 0.424 | 0.160 | 0.0001315 |
| Course engagement | 8.94 | 2 | 127 | 0.0002321 | 0.118 | 0.366 | 0.145 | 0.0011605 |
| Course interest | 15.54 | 2 | 127 | 0.0000009 | 0.188 | 0.481 | 0.224 | 0.0000046 |
| Anticipated course performance | 2.29 | 2 | 127 | 0.1051935 | 0.035 | 0.190 | 0.052 | 0.5259673 |
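The omnibus tests above do not indicate which conditions differ from one another. As a hedged sketch of a possible follow-up (not part of the original analysis plan), the emmeans package loaded earlier could unpack the pairwise condition contrasts, taking anticipated belonging as an example:

```r
# Covariate-adjusted condition means and Tukey-corrected pairwise contrasts
# (emmeans loaded above; sketch only, not in the pre-registered plan)
emm <- emmeans(models[["belonging"]], ~ condition)
emm                           # adjusted means per condition
pairs(emm, adjust = "tukey")  # growth vs. fixed vs. control contrasts
```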
Lastly, Figure 1 shows a visualization of the main effect of condition on psychological experiences in my replication experiment. Figure 2 below it shows the corresponding findings from the original paper for direct comparison.
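For reference, a minimal sketch of how one panel of Figure 1 could be drawn, using the effects package (loaded earlier) to extract covariate-adjusted condition means; the exact aesthetics of the report’s figure may differ.

```r
# Adjusted condition means with 95% CIs for one outcome (sketch)
eff <- as.data.frame(effects::effect("condition", models[["belonging"]]))

ggplot(eff, aes(x = condition, y = fit)) +
  geom_col(fill = "grey70") +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2) +
  labs(x = "Instructor condition", y = "Anticipated belonging (adjusted mean)")
```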
Overall, the findings suggest a main effect of instructor condition on students’ psychological experiences in the classroom. Specifically, students who perceived their instructor to endorse a fixed mindset anticipated a lower sense of belonging, lower course interest, lower course engagement, and greater evaluative concerns than students who perceived their instructor to endorse a growth mindset or no intelligence mindset beliefs. This replicates the findings from Study 1 of Muenks et al. (2020). However, my replication did not find significant differences in anticipated course performance among instructor conditions, so this result failed to replicate the finding from the original study. In conclusion, both the original study and my replication suggest that instructors should be mindful of the language they use on the first day of class; in particular, endorsing fixed mindset beliefs while discussing classroom policies could affect students’ sense of belonging, interest, and engagement in the class.
Commentary
One potential reason the course performance finding did not replicate in my study is the filter I used on Prolific. As mentioned previously, I accidentally used the student-status filter, which means my sample could have contained a mix of graduate and undergraduate students, unlike the original paper (which only had undergraduate students). My hypothesis is that older, more educated students (such as graduate students) may have more prior exposure to calculus and can gauge their anticipated course performance from those prior experiences more than from the instructor’s mindset. I would also critique the original experiment for not taking this into account: people’s previous performance and confidence in their abilities could shape some of these outcomes more than the instructor’s demeanor, so it would be interesting to control for those factors to estimate the true effect of instructors’ mindset beliefs on these outcomes. Another interesting pattern is in the personal mindset beliefs (i.e., students’ own beliefs about the malleability of intelligence) across conditions (Table 1). Personal mindset was not manipulated in my experiment (only the instructor’s endorsed mindset beliefs was), yet students in the growth and control conditions reported somewhat stronger personal fixed mindset beliefs than students in the fixed instructor condition (though no significant differences in personal mindset beliefs were found across conditions).