Replication of Study 4 by Rattan, Good & Dweck (2012, Journal of Experimental Social Psychology)

Author

Micaela Bonilla, bonillam@stanford.edu

Published

October 26, 2025

Introduction

I chose to replicate Study 4 from Rattan et al. (2012) because my research interests center on formative assessment in mathematics education, particularly the mechanisms through which feedback influences students’ motivation and learning. Much of the existing research on formative assessment comes from educational interventions in authentic classroom settings, which are ecologically valid but less controlled. Experimental studies that isolate the effects of different types of feedback are rare, making this work especially relevant. This study is, to the best of my knowledge, the only one that experimentally examines how variations in feedback (comfort-oriented, strategy-oriented, or neutral) shape students’ motivation, beliefs about ability, and expectations. Replicating this design allows me to understand more about the gap between intervention-based research and more controlled experimentation, providing insight into the psychological mechanisms underlying formative assessment practices.

The stimuli in this experiment consist of written scenarios and feedback scripts. Participants are asked to imagine themselves as students in a calculus course who have just received a disappointing test score (65%). After reading an initial statement of support, they are randomly assigned to receive one of three types of feedback: (a) comfort-oriented feedback emphasizing general strengths and minimizing demands, (b) strategy-oriented feedback suggesting concrete steps to improve, or (c) control feedback expressing care without specific guidance. Following this manipulation, participants complete scales assessing their perceptions of the instructor’s implicit theory of math ability, the professor’s expectations and investment, their own motivation, and anticipated performance.

I anticipate three main challenges in conducting this replication. First, I will need to adapt the materials into an online format and ensure random assignment is implemented correctly, which requires technical accuracy in programming the survey tool, especially considering this will be my first time conducting an experiment of this kind. Second, I will need to secure a participant sample large enough to provide adequate statistical power, which will be challenging since I will not be using a Prolific sample and will instead need to recruit students in order to replicate the study with fidelity. Third, interpreting the results in relation to my broader research program will require careful consideration, since I will have to connect findings from a controlled and somewhat artificial setting to the complex realities of formative assessment in classrooms.

Repository and Paper Links:

  • Repository: https://github.com/psych251/rattan2012
  • Original paper: https://github.com/psych251/rattan2012/blob/main/original_paper/rattan2012.pdf

Methods

Power Analysis

To plan the replication study, I estimated effect sizes (Cohen’s f) from the F-values reported in the original study, since effect sizes were not explicitly provided.

The estimated effect sizes for the four dependent variables are:

  • Perceptions of Entity Theory: F(2, 51) = 15.95 → Cohen’s f = 0.79
  • Expectations and Investment: F(2, 51) = 12.83 → Cohen’s f = 0.71
  • Motivation: F(2, 51) = 6.33 → Cohen’s f = 0.5
  • Anticipated Grade: F(2, 51) = 5.25 → Cohen’s f = 0.45

The lowest estimated effect size (f = 0.45, for anticipated grade) was used to calculate the required sample size, to ensure adequate statistical power across all outcomes. The calculation assumed a balanced one-way ANOVA with three groups, a significance level of 0.05, and a desired power of 0.80.

The analysis indicated that approximately 17 participants per group would be needed. This conservative approach ensures sufficient power to detect even the smallest effect observed in the original study.
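These numbers can be reproduced in R. For a one-way ANOVA, eta² = F × df1 / (F × df1 + df2), and Cohen's f = sqrt(eta² / (1 − eta²)), which simplifies to f = sqrt(F × df1 / df2). Below is a minimal sketch of one way to run the calculation, using the pwr package (the helper f_from_F is just for illustration):

# Cohen's f from a reported F(df1, df2): f = sqrt(F * df1 / df2)
f_from_F <- function(F_val, df1, df2) sqrt(F_val * df1 / df2)
f_from_F(5.25, 2, 51)   # ~0.45, the smallest effect (anticipated grade)

# Per-group n for a balanced 3-group one-way ANOVA, alpha = .05, power = .80
library(pwr)
pwr.anova.test(k = 3, f = 0.45, sig.level = 0.05, power = 0.80)   # about 17 per group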

Planned Sample

Fifty-one students at a competitive private university on the West Coast (Stanford Psych 1 students) are planned to be recruited in exchange for class credit. Participants will be asked demographic questions about gender, ethnicity, and age in order to compare the sample with the original demographic profile of the study, which included “26 males, 28 females; 8 African-Americans, 15 Asian-Americans, 21 European-Americans/Whites, 6 Latino-Americans, 2 Native Americans, 2 Biracial; mean age = 20.2, SD = 2.36” (Rattan et al., 2012).

Procedure

The study followed the procedure and materials described in Rattan et al. (2012):

“Procedure

Participants completed an online study in which they imagined being in a calculus course at their university. They read a scenario in which, after the first calculus test of the year, they met with their professor to learn their grade and receive their test. All participants first read that they received a low score on the test (65%) and were given some initial feedback: “Your professor notices that you are not happy with your grade and says, ‘I can understand that you are probably disappointed by your grade.’” They then received the feedback manipulation, reading either comfort-oriented feedback (that focused on their strengths), strategy-oriented feedback (that provided concrete suggestions), or control feedback (that contained two statements of caring that were present in the other conditions):

Comfort Feedback: “I want to assure you that I know you are a talented student in general — it’s just not the case that everyone is a “math person.” I want you to remember how great you do in other subjects. I want you to know what I’m going to do too — I’m going to make a point not to call on you as much in class because I don’t want you to have the added pressure of putting you on the spot and I’m going to give you some easier math tasks to work on so you can get more comfortable with those skills. I want to assure you that I really care, so let’s stay in contact about how you’re doing in the class.”

Strategy Feedback: “I want to assure you that I know that you are a talented student in general. I want you to change your study strategies and consider working with a tutor. I want you to know what I’m going to do too — I’m going to make a point to call on you more in class and I’m going to give you more challenging math tasks. I want to assure you that I really care, so let’s stay in contact about how you’re doing in the class.”

Control Feedback: “I want to assure you that I know you are a talented student in general, and I want to assure you that I really care, so let’s stay in contact about how you’re doing in the class.”

Though the feedback did not explicitly communicate a theory of math intelligence, we hypothesized that a professor’s more comfort-oriented feedback would communicate more of an entity theory to students as compared with strategy-oriented or control feedback. Thus, participants completed a 4-item Perceptions of an Environmental Entity Theory (PEET) scale (Good et al., in press; e.g., “My professor believes that I have a certain amount of math intelligence, and I can’t really do much to change it,” α = 0.96, strongly disagree “1” – strongly agree “6”). Participants then responded to four items that assessed the degree to which they felt their professor had low expectations and little investment in their future in the field (e.g., “How would you characterize your professor’s assessment of your math ability?” My professor thinks I have very little ability in math “1” – My professor thinks I have a great deal of ability in math “7;” “How much do you feel that your professor is invested in your success in math?” not at all “1” – extremely “7,” α = 0.87). We also investigated whether the feedback conditions would have differential effects on students’ motivation using 2 items, “How encouraged in math do you feel by your professor’s feedback?” and “How motivated to try to improve in math do you feel by your professor’s feedback?” (not at all “1” – extremely “7,” α = 0.82). 
Finally, we asked whether students would anticipate differential performance outcomes for themselves by asking, “What do you think your final grade in this math class will be at the end of the semester?” (1 “35%” – 2 “50%” – 3 “65%” – 4 “80%” – 5 “95%”).”

To ensure precise replication, I contacted Professor Rattan to request the following:

  • Instructions before presenting the scenarios.
  • Items and labels of the 6-point Likert scale for the Perceived Environmental Entity Theory (PEET) scale.
  • Items and labels of the 7-point Likert scale for the Professor Expectations and Investment scale.
  • The response labels used for the Motivation scale (whether each point on the 7-point scale was labeled and what those labels were, or whether only the endpoints 1 “not at all” and 7 “extremely” were labeled).
  • Instructions provided to participants for each survey.

The draft has already been set up in Qualtrics and will be adjusted when the author responds with the details.

  • Qualtrics Pilot A survey: https://github.com/psych251/rattan2012/blob/main/materials/Pilot_A_Survey.pdf
  • Qualtrics Pilot A workflow: https://github.com/psych251/rattan2012/blob/main/materials/survey_flow_qualtrics.pdf

Analysis Plan

I will follow the original procedure described by Rattan et al. (2012) to analyze the data. Prior to hypothesis testing, I will compute descriptive statistics for all study variables and examine the internal consistency of each multi-item measure. Specifically, I will average participants’ responses to produce composite scores for the three multi-item measures: (1) the Perceptions of an Environmental Entity Theory (PEET) scale, (2) the expectations and investment scale, and (3) the motivation items. Cronbach’s alpha will be calculated for each scale to assess internal reliability.
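For the reliability step, a minimal sketch using the psych package (one option among several alpha implementations), applied to a scored data frame like the pilot_a frame prepared in the Results section below, where Q4 has already been reverse-scored as Q4_rev:

# Cronbach's alpha for each multi-item composite (sketch)
library(psych)
alpha_peet       <- psych::alpha(pilot_a[, c("Q1", "Q2", "Q3", "Q4_rev")])
alpha_pei        <- psych::alpha(pilot_a[, paste0("Q", 5:8)])
alpha_motivation <- psych::alpha(pilot_a[, c("Q9", "Q10")])
alpha_peet$total$raw_alpha   # raw alpha to report for the PEET scale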

After establishing reliability, I will conduct a one-way analysis of variance (ANOVA) to examine the effect of feedback condition (comfort-oriented, strategy-oriented, or control) on each of the four dependent variables: (1) perceived professor entity theory beliefs, (2) perceived professor expectations and investment, (3) participants’ self-reported motivation, and (4) anticipated final grade in the course.

When a significant main effect of feedback condition is found, I will conduct planned pairwise comparisons (t tests) to compare the comfort feedback condition with the strategy and control conditions, as well as to compare the strategy and control conditions with each other. This analytic approach will mirror that used in the original study.

For each dependent variable, I will compute means (M) and standard deviations (SD) by condition and report F- and t-statistics along with associated p-values. Statistical significance will be evaluated at an alpha level of .05. Although the original study did not report effect sizes, I will calculate and report Cohen's f for ANOVAs and Cohen's d for planned pairwise comparisons to provide estimates of the magnitude of effects.
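To make this plan concrete, the sketch below walks through the sequence for one dependent variable; dat is a hypothetical scored data frame with a three-level Condition factor and a numeric peet composite, and the other DVs follow the same pattern:

# One-way ANOVA for the effect of feedback condition
anova_fit <- aov(peet ~ Condition, data = dat)
summary(anova_fit)

# Planned pairwise comparisons (t tests), mirroring the original study
pairwise.t.test(dat$peet, dat$Condition, p.adjust.method = "none")

# Cohen's d for one planned contrast (comfort vs. control), using the pooled SD
x  <- dat$peet[dat$Condition == "comfort"]
y  <- dat$peet[dat$Condition == "control"]
sp <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
             (length(x) + length(y) - 2))
(mean(x) - mean(y)) / sp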

A figure will be included to illustrate participants’ responses across feedback conditions, modeled after Figure 3 in Rattan et al. (2012): “Students’ responses as a function of feedback condition in Study 4. Comfort feedback was significantly different from control and concrete feedback for each of the dependent variables.”

All analyses will be conducted using the statistical software R.

Differences from Original Study

Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Pilot A

Mistakes in the questionnaire:

  • I did not include an Embedded Data element inside the Randomizer, so the scenario shown to each participant was not recorded in the exported dataset. For this pilot, I identified which scenario each participant saw by manually reviewing the individual responses. For future versions, I will add an Embedded Data field within the Randomizer to automatically capture and export the scenario information.

#### Load Relevant Libraries and Functions
library(tidyr)
library(readr)
library(dplyr)
library(stringr)
library(rstatix)
library(knitr)
library(car)
library(emmeans)
library(ggplot2)

#### Import data
pilot_a <- read_csv("data/pilot_a.csv")

### Data Preparation
#Delete the first two rows: Qualtrics exports include two extra header rows
#(question text and import IDs) rather than participant responses
pilot_a <- pilot_a %>% 
  slice(-c(1, 2))

#Recode variables: pull the numeric rating out of each response label
pilot_a <- pilot_a %>%
  mutate(across(Q1:Q10, ~ as.numeric(str_extract(., "[1-7]"))))

#### Data exclusion / filtering

#### Prepare data for analysis - create columns etc.
pilot_a$Condition <- factor(pilot_a$Condition, levels = c("control", "strategy", "comfort"))

#Create PEET score (Q4 is reverse-keyed; on the 1-6 scale the reversed
#value is 7 - x, so that 1 <-> 6, 2 <-> 5, 3 <-> 4)
pilot_a <- pilot_a %>%
  mutate(Q4_rev = 7 - Q4)

pilot_a <- pilot_a %>%
  mutate(peet = rowMeans(across(c(Q1, Q2, Q3, Q4_rev)), na.rm = TRUE))

# Professor Expectations and Investment scale (PEI)
pilot_a <- pilot_a %>%
  mutate(pei = rowMeans(across(Q5:Q8), na.rm = TRUE))

# Motivation
pilot_a <- pilot_a %>%
  mutate(motivation = rowMeans(across(Q9:Q10), na.rm = TRUE))

#Sociodemographic info

proportions_gender <- pilot_a %>%
  count(Q11) %>%
  mutate(Proportion = n / sum(n))
print(proportions_gender)
# A tibble: 2 × 3
  Q11        n Proportion
  <chr>  <int>      <dbl>
1 Female     4        0.4
2 Male       6        0.6
pilot_a$Q12 <- as.numeric(as.character(pilot_a$Q12)) 
summary_age <- pilot_a %>%
  summarise(
    Mean_Age = mean(Q12, na.rm = TRUE),
    SD_Age = sd(Q12, na.rm = TRUE)
  )
print(summary_age) 
# A tibble: 1 × 2
  Mean_Age SD_Age
     <dbl>  <dbl>
1     30.4   3.10

Confirmatory analysis

#### Pilot A Confirmatory Analysis

### Descriptives
dv_vars <- c("peet", "pei", "motivation")

descriptives <- pilot_a %>%
  group_by(Condition) %>%
  summarise(across(all_of(dv_vars),
                   list(M = ~mean(., na.rm = TRUE),
                        SD = ~sd(., na.rm = TRUE)),
                   .names = "{col}_{fn}"))

descriptives
# A tibble: 3 × 7
  Condition peet_M peet_SD pei_M pei_SD motivation_M motivation_SD
  <fct>      <dbl>   <dbl> <dbl>  <dbl>        <dbl>         <dbl>
1 control     1.83   0.289  4.92  0.144         5.33         0.577
2 strategy    2.25   0.677  3.94  2.07          3.75         2.02 
3 comfort     4.83   0.629  3     1.95          2.83         0.764
### Tests of difference
#PEET
anova_peet <- aov(peet ~ Condition, data = pilot_a)
emm_peet <- emmeans(anova_peet, ~ Condition)
contrast_peet <- contrast(emm_peet, method = "pairwise", adjust = "bonferroni")
ss_effect <- summary(anova_peet)[[1]]["Condition", "Sum Sq"]
ss_error  <- summary(anova_peet)[[1]]["Residuals", "Sum Sq"]
eta2_peet <- ss_effect / (ss_effect + ss_error)
cohens_f_peet <- sqrt(eta2_peet / (1 - eta2_peet))

#PEI
anova_pei <- aov(pei ~ Condition, data = pilot_a)
emm_pei <- emmeans(anova_pei, ~ Condition)
contrast_pei <- contrast(emm_pei, method = "pairwise", adjust = "bonferroni")

ss_effect <- summary(anova_pei)[[1]]["Condition", "Sum Sq"]
ss_error  <- summary(anova_pei)[[1]]["Residuals", "Sum Sq"]
eta2_pei <- ss_effect / (ss_effect + ss_error)
cohens_f_pei <- sqrt(eta2_pei / (1 - eta2_pei))

#Motivation
anova_motivation <- aov(motivation ~ Condition, data = pilot_a)
emm_motivation <- emmeans(anova_motivation, ~ Condition)
contrast_motivation <- contrast(emm_motivation, method = "pairwise", adjust = "bonferroni")
ss_effect <- summary(anova_motivation)[[1]]["Condition", "Sum Sq"]
ss_error  <- summary(anova_motivation)[[1]]["Residuals", "Sum Sq"]
eta2_motivation <- ss_effect / (ss_effect + ss_error)
cohens_f_motivation <- sqrt(eta2_motivation / (1 - eta2_motivation))
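
# Note: the eta-squared / Cohen's f computation above is repeated verbatim
# for each DV. A small helper (a sketch) could replace the three copies;
# indexing by row position avoids relying on exact row-name matching in
# the summary.aov() table.
cohens_f_from_aov <- function(model) {
  tab  <- summary(model)[[1]]   # one-way ANOVA table: Condition, Residuals
  eta2 <- tab[1, "Sum Sq"] / sum(tab[, "Sum Sq"])
  sqrt(eta2 / (1 - eta2))
}
# e.g., cohens_f_from_aov(anova_peet)  # ~2.64, matching cohens_f_peet below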


## Results
anova_peet
Call:
   aov(formula = peet ~ Condition, data = pilot_a)

Terms:
                Condition Residuals
Sum of Squares  16.316667  2.333333
Deg. of Freedom         2         7

Residual standard error: 0.5773503
Estimated effects may be unbalanced
contrast_peet
 contrast           estimate    SE df t.ratio p.value
 control - strategy   -0.417 0.441  7  -0.945  1.0000
 control - comfort    -3.000 0.471  7  -6.364  0.0011
 strategy - comfort   -2.583 0.441  7  -5.858  0.0019

P value adjustment: bonferroni method for 3 tests 
cohens_f_peet
[1] 2.644401
anova_pei
Call:
   aov(formula = pei ~ Condition, data = pilot_a)

Terms:
                Condition Residuals
Sum of Squares   5.511458 20.463542
Deg. of Freedom         2         7

Residual standard error: 1.709785
Estimated effects may be unbalanced
contrast_pei
 contrast           estimate   SE df t.ratio p.value
 control - strategy    0.979 1.31  7   0.750  1.0000
 control - comfort     1.917 1.40  7   1.373  0.6364
 strategy - comfort    0.938 1.31  7   0.718  1.0000

P value adjustment: bonferroni method for 3 tests 
cohens_f_pei
[1] 0.5189707
anova_motivation
Call:
   aov(formula = motivation ~ Condition, data = pilot_a)

Terms:
                Condition Residuals
Sum of Squares   9.641667 14.083333
Deg. of Freedom         2         7

Residual standard error: 1.418416
Estimated effects may be unbalanced
contrast_motivation
 contrast           estimate   SE df t.ratio p.value
 control - strategy    1.583 1.08  7   1.462  0.5618
 control - comfort     2.500 1.16  7   2.159  0.2032
 strategy - comfort    0.917 1.08  7   0.846  1.0000

P value adjustment: bonferroni method for 3 tests 
cohens_f_motivation
[1] 0.8274149
# Calculate means and standard errors
plot_data <- pilot_a %>%
  pivot_longer(cols = c(peet, pei, motivation), names_to = "Variable", values_to = "Score") %>%
  group_by(Condition, Variable) %>%
  summarise(mean_score = mean(Score, na.rm = TRUE), 
            se_score = sd(Score, na.rm = TRUE) / sqrt(n()), .groups = 'drop')

# Create the bar plot
ggplot(plot_data, aes(x = Variable, y = mean_score, fill = Condition)) +
  geom_bar(stat = "identity", position = position_dodge(), width = 0.7) +
  geom_errorbar(aes(ymin = mean_score - se_score, ymax = mean_score + se_score), 
                position = position_dodge(0.7), width = 0.2) +
  scale_fill_manual(values = c("comfort" = "black", "control" = "lightgray", "strategy" = "gray"),
                    labels = c("comfort" = "Comfort Feedback", 
                               "control" = "Control Feedback", 
                               "strategy" = "Strategy Feedback")) +
  scale_x_discrete(labels = c(peet = "Perceptions of Entity Theory", 
                               pei = "Low Expectations/Investment", 
                               motivation = "Student's Motivation")) +
  labs(title = "Student Responses to Feedback", y = "Mean rating (1-7 scale)", x = NULL) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_classic() +
  theme(legend.title = element_blank())

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.