Replication of Study 4 by Rattan, Good & Dweck (2012, Journal of Experimental Social Psychology)

Author

Micaela Bonilla, bonillam@stanford.edu

Published

December 12, 2025

Introduction

I chose to replicate Study 4 from Rattan et al. (2012) because my research interests center on formative assessment in mathematics education, particularly the mechanisms through which feedback influences students’ motivation and learning. Much of the existing research on formative assessment comes from educational interventions in authentic classroom settings, which are ecologically valid but less controlled. Experimental studies that isolate the effects of different types of feedback are rare, making this work especially relevant. This study is, to the best of my knowledge, the only one that experimentally examines how variations in feedback (comfort-oriented, strategy-oriented, or neutral) shape students’ motivation, beliefs about ability, and expectations. Replicating this design allows me to understand more about the gap between intervention-based research and more controlled experimentation, providing insight into the psychological mechanisms underlying formative assessment practices.

The stimuli in this experiment consist of written scenarios and feedback scripts. Participants are asked to imagine themselves as students in a calculus course who have just received a disappointing test score (65%). After reading an initial statement of support, they are randomly assigned to receive one of three types of feedback: (a) comfort-oriented feedback emphasizing general strengths and minimizing demands, (b) strategy-oriented feedback suggesting concrete steps to improve, or (c) control feedback expressing care without specific guidance. Following this manipulation, participants complete scales assessing their perceptions of the instructor’s implicit theory of math ability, the professor’s expectations and investment, their own motivation, and anticipated performance.

I anticipate three main challenges in conducting this replication. First, I will need to adapt the materials into an online format and ensure random assignment is implemented correctly, which requires technical accuracy in programming the survey tool, especially considering this will be my first time conducting an experiment of this kind. Second, I will need to secure a participant sample large enough to provide adequate statistical power, which will be challenging since I will not be using a Prolific sample and will instead need to recruit students in order to replicate the study with fidelity. Third, interpreting the results in relation to my broader research program will require careful consideration, since I will have to connect findings from a controlled and somewhat artificial setting to the complex realities of formative assessment in classrooms.

Repository and Paper Links:

  • https://github.com/psych251/rattan2012
  • https://github.com/psych251/rattan2012/blob/main/original_paper/rattan2012.pdf

Methods

Power Analysis

To plan the replication study, I estimated effect sizes (Cohen’s f) from the F-values reported in the original study, since effect sizes were not explicitly provided.

The estimated effect sizes for the four dependent variables are:

  • Perceptions of Entity Theory: F(2, 51) = 15.95 → Cohen’s f = 0.79
  • Expectations and Investment: F(2, 51) = 12.83 → Cohen’s f = 0.71
  • Motivation: F(2, 51) = 6.33 → Cohen’s f = 0.50
  • Anticipated Grade: F(2, 51) = 5.25 → Cohen’s f = 0.45

The lowest estimated effect size (f = 0.45, for anticipated grade) was used to calculate the sample size required to ensure adequate statistical power across all outcomes. The calculation assumed a one-way ANOVA with three groups, a significance level of 0.05, and a desired power of 0.80.

The analysis indicated that approximately 17 participants per group would be needed. This conservative approach ensures sufficient power to detect even the smallest effect observed in the original study.
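This calculation is straightforward to reproduce in R. A minimal sketch, assuming the pwr package (the helper f_from_F is my own shorthand):

# Cohen's f recovered from a reported one-way ANOVA F statistic:
# eta^2 = F*df1 / (F*df1 + df2) and f = sqrt(eta^2 / (1 - eta^2)),
# which simplifies to f = sqrt(F * df1 / df2)
f_from_F <- function(fval, df1, df2) sqrt(fval * df1 / df2)
f_from_F(5.25, 2, 51)  # ~0.45, the smallest effect (anticipated grade)

# Per-group n for a 3-group one-way ANOVA at alpha = .05 and power = .80
library(pwr)
pwr.anova.test(k = 3, f = 0.45, sig.level = 0.05, power = 0.80)
# n comes out just above 16 per group; rounding up gives the 17 used here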

Planned Sample

Fifty-one students at a competitive private university on the West Coast (Stanford Psych 1 students) will be recruited in exchange for class credit. Participants will be asked demographic questions about gender, ethnicity, and age in order to compare with the original demographic profile of the study, which included “26 males, 28 females; 8 African-Americans, 15 Asian-Americans, 21 European-Americans/Whites, 6 Latino-Americans, 2 Native Americans, 2 Biracial; mean age = 20.2, SD = 2.36” (Rattan et al., 2012).

Procedure

The study followed the procedure and materials described in Rattan et al. (2012):

“Procedure

Participants completed an online study in which they imagined being in a calculus course at their university. They read a scenario in which, after the first calculus test of the year, they met with their professor to learn their grade and receive their test. All participants first read that they received a low score on the test (65%) and were given some initial feedback: “Your professor notices that you are not happy with your grade and says, ‘I can understand that you are probably disappointed by your grade.’” They then received the feedback manipulation, reading either comfort-oriented feedback (that focused on their strengths), strategy-oriented feedback (that provided concrete suggestions), or control feedback (that contained two statements of caring that were present in the other conditions):

Comfort Feedback: “I want to assure you that I know you are a talented student in general — it’s just not the case that everyone is a “math person.” I want you to remember how great you do in other subjects. I want you to know what I’m going to do too — I’m going to make a point not to call on you as much in class because I don’t want you to have the added pressure of putting you on the spot and I’m going to give you some easier math tasks to work on so you can get more comfortable with those skills. I want to assure you that I really care, so let’s stay in contact about how you’re doing in the class.”

Strategy Feedback: “I want to assure you that I know that you are a talented student in general. I want you to change your study strategies and consider working with a tutor. I want you to know what I’m going to do too — I’m going to make a point to call on you more in class and I’m going to give you more challenging math tasks. I want to assure you that I really care, so let’s stay in contact about how you’re doing in the class.”

Control Feedback: “I want to assure you that I know you are a talented student in general, and I want to assure you that I really care, so let’s stay in contact about how you’re doing in the class.”

Though the feedback did not explicitly communicate a theory of math intelligence, we hypothesized that a professor’s more comfort-oriented feedback would communicate more of an entity theory to students as compared with strategy-oriented or control feedback. Thus, participants completed a 4-item Perceptions of an Environmental Entity Theory (PEET) scale (Good et al., in press; e.g., “My professor believes that I have a certain amount of math intelligence, and I can’t really do much to change it,” α = 0.96, strongly disagree “1” – strongly agree “6”). Participants then responded to four items that assessed the degree to which they felt their professor had low expectations and little investment in their future in the field (e.g., “How would you characterize your professor’s assessment of your math ability?” My professor thinks I have very little ability in math “1” – My professor thinks I have a great deal of ability in math “7;” “How much do you feel that your professor is invested in your success in math?” not at all “1” – extremely “7,” α = 0.87). We also investigated whether the feedback conditions would have differential effects on students’ motivation using 2 items, “How encouraged in math do you feel by your professor’s feedback?” and “How motivated to try to improve in math do you feel by your professor’s feedback?” (not at all “1” – extremely “7,” α = 0.82). 
Finally, we asked whether students would anticipate differential performance outcomes for themselves by asking, “What do you think your final grade in this math class will be at the end of the semester?” (1 “35%” – 2 “50%” – 3 “65%” – 4 “80%” – 5 “95%”).”

I contacted Professor Rattan to request the following:

  • Instructions shown before presenting the scenarios.
  • Items and labels of the 6-point Likert scale for the Perceived Environmental Entity Theory (PEET) scale.
  • Items and labels of the 7-point Likert scale for the Professor Expectations and Investment scale.
  • The response labels used for the Motivation scale (whether each point on the 7-point scale was labeled and what those labels were, or whether only the endpoints were labeled: 1 = “not at all” and 7 = “extremely”).
  • Instructions provided to participants for each survey.

However, I did not receive a response to this contact (see details in Differences from Original Study section).

The final paradigm can be found at https://stanforduniversity.qualtrics.com/jfe/preview/previewId/abd34555-d22c-481a-868e-a1e6976455c8/SV_5ywvfwvuRKjJtEq?Q_CHL=preview&Q_SurveyVersionID=current

The study was preregistered at https://osf.io/5TKN2.

Analysis Plan

I will follow the original procedure described by Rattan et al. (2012) to analyze the data. Prior to hypothesis testing, I will compute descriptive statistics for all study variables and examine the internal consistency of each multi-item measure. Specifically, I will average participants’ responses to produce composite scores for the three multi-item measures: (1) the Perceptions of an Environmental Entity Theory (PEET) scale, (2) the expectations and investment scale, and (3) the motivation items. Cronbach’s alpha will be calculated for each scale to assess internal reliability.

After establishing reliability, I will conduct a one-way analysis of variance (ANOVA) to examine the effect of feedback condition (comfort-oriented, strategy-oriented, or control) on each of the four dependent variables: (1) perceived professor entity theory beliefs, (2) perceived professor expectations and investment, (3) participants’ self-reported motivation, and (4) anticipated final grade in the course.

When a significant main effect of feedback condition is found, I will conduct planned pairwise comparisons (t tests) to compare the comfort feedback condition with the strategy and control conditions, as well as to compare the strategy and control conditions with each other. This analytic approach will mirror that used in the original study.

For each dependent variable, I will compute means (M) and standard deviations (SD) by condition and report F- and t-statistics along with associated p-values. Statistical significance will be evaluated at an alpha level of .05. Although the original study did not report effect sizes, I will calculate and report Cohen’s f for ANOVAs and Cohen’s d for planned pairwise comparisons to provide estimates of the magnitude of effects.
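Because the original article does not report effect sizes, Cohen’s d will be computed directly from the raw data. A minimal sketch (the helper cohens_d is my own, using the pooled standard deviation of the two groups being compared):

# Cohen's d for a pairwise comparison, pooled-SD version
cohens_d <- function(x, y) {
  nx <- sum(!is.na(x)); ny <- sum(!is.na(y))
  sp <- sqrt(((nx - 1) * var(x, na.rm = TRUE) +
              (ny - 1) * var(y, na.rm = TRUE)) / (nx + ny - 2))
  (mean(x, na.rm = TRUE) - mean(y, na.rm = TRUE)) / sp
}
# e.g., comfort vs. strategy on the PEET composite:
# cohens_d(data$peet[data$condition == "comfort"],
#          data$peet[data$condition == "strategy"])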

A figure will be included to illustrate participants’ responses across feedback conditions, modeled after Figure 3 in Rattan et al. (2012): “Students’ responses as a function of feedback condition in Study 4. Comfort feedback was significantly different from control and concrete feedback for each of the dependent variables.”

All analyses will be conducted using the statistical software R.

Differences from Original Study

Setting, procedure, and analysis: The corresponding author did not respond to my requests, so the original materials were not available directly from them. However, sample items were provided in the published article, and I was able to retrieve the full set of items from colleagues familiar with the scales. Although I cannot assess item-level differences at this time, I do not believe there are substantial deviations from the original materials or procedures. The same applies to the analytic approach: in the absence of direct clarification from the authors, I deduced the analysis plan from the results reported in the article and do not anticipate major discrepancies.

Pilot A

Mistakes in the questionnaire:

  • I did not include an Embedded Data element inside the Randomizer, so the scenario shown to each participant was not recorded in the exported dataset. For this pilot, I identified which scenario each response belonged to by manually reviewing each questionnaire. For future versions, I will add an Embedded Data field within the Randomizer to automatically capture and export the scenario information.

#### Import data
# Packages used throughout: tidyverse (readr, dplyr, stringr, tidyr, ggplot2)
# for data wrangling and plotting, emmeans for the planned pairwise contrasts
library(tidyverse)
library(emmeans)

pilot_a <- read_csv("data/pilot_a.csv")

### Data Preparation
#Delete the first two rows: Qualtrics exports repeat the question text and
#import metadata there, not participant data
pilot_a <- pilot_a %>% 
  slice(-c(1, 2))

#Recode variables: extract the numeric point (1-7) from each response label
pilot_a <- pilot_a %>%
  mutate(across(Q1:Q10, ~ as.numeric(str_extract(., "[1-7]"))))

#### Data exclusion / filtering

#### Prepare data for analysis - create columns etc.
pilot_a$Condition <- factor(pilot_a$Condition, levels = c("comfort","strategy", "control"))

#Create PEET score
# Reverse-score Q4: on the 6-point PEET scale the reverse of x is 7 - x
pilot_a <- pilot_a %>%
  mutate(Q4_rev = 7 - Q4)

pilot_a <- pilot_a %>%
  mutate(peet = rowMeans(across(c(Q1, Q2, Q3, Q4_rev)), na.rm = TRUE))

# Professor Expectations and Investment scale (PEI)
pilot_a <- pilot_a %>%
  mutate(pei = rowMeans(across(Q5:Q8), na.rm = TRUE))

# Motivation
pilot_a <- pilot_a %>%
  mutate(motivation = rowMeans(across(Q9:Q10), na.rm = TRUE))

#Sociodemographic info

proportions_gender <- pilot_a %>%
  count(Q11) %>%
  mutate(Proportion = n / sum(n))
print(proportions_gender)
# A tibble: 2 × 3
  Q11        n Proportion
  <chr>  <int>      <dbl>
1 Female     4        0.4
2 Male       6        0.6
pilot_a$Q12 <- as.numeric(as.character(pilot_a$Q12)) 
summary_age <- pilot_a %>%
  summarise(
    Mean_Age = mean(Q12, na.rm = TRUE),
    SD_Age = sd(Q12, na.rm = TRUE)
  )
print(summary_age) 
# A tibble: 1 × 2
  Mean_Age SD_Age
     <dbl>  <dbl>
1     30.4   3.10
#### Pilot A Confirmatory Analysis

### Descriptives
dv_vars <- c("peet", "pei", "motivation")

descriptives <- pilot_a %>%
  group_by(Condition) %>%
  summarise(across(all_of(dv_vars),
                   list(M = ~mean(., na.rm = TRUE),
                        SD = ~sd(., na.rm = TRUE)),
                   .names = "{col}_{fn}"))

descriptives
# A tibble: 3 × 7
  Condition peet_M peet_SD pei_M pei_SD motivation_M motivation_SD
  <fct>      <dbl>   <dbl> <dbl>  <dbl>        <dbl>         <dbl>
1 comfort     4.83   0.629  3     1.95          2.83         0.764
2 strategy    2.25   0.677  3.94  2.07          3.75         2.02 
3 control     1.83   0.289  4.92  0.144         5.33         0.577
### Tests of difference
# PEET
pilot_anova_peet <- aov(peet ~ Condition, data = pilot_a)
pilot_emm_peet <- emmeans(pilot_anova_peet, ~ Condition)
pilot_contrast_peet <- contrast(pilot_emm_peet, method = "pairwise", adjust = "bonferroni")
pilot_ss_effect <- summary(pilot_anova_peet)[[1]]["Condition", "Sum Sq"]
pilot_ss_error  <- summary(pilot_anova_peet)[[1]]["Residuals", "Sum Sq"]
pilot_eta2_peet <- pilot_ss_effect / (pilot_ss_effect + pilot_ss_error)
pilot_cohens_f_peet <- sqrt(pilot_eta2_peet / (1 - pilot_eta2_peet))

# PEI
pilot_anova_pei <- aov(pei ~ Condition, data = pilot_a)
pilot_emm_pei <- emmeans(pilot_anova_pei, ~ Condition)
pilot_contrast_pei <- contrast(pilot_emm_pei, method = "pairwise", adjust = "bonferroni")

pilot_ss_effect <- summary(pilot_anova_pei)[[1]]["Condition", "Sum Sq"]
pilot_ss_error  <- summary(pilot_anova_pei)[[1]]["Residuals", "Sum Sq"]
pilot_eta2_pei <- pilot_ss_effect / (pilot_ss_effect + pilot_ss_error)
pilot_cohens_f_pei <- sqrt(pilot_eta2_pei / (1 - pilot_eta2_pei))

# Motivation
pilot_anova_motivation <- aov(motivation ~ Condition, data = pilot_a)
pilot_emm_motivation <- emmeans(pilot_anova_motivation, ~ Condition)
pilot_contrast_motivation <- contrast(pilot_emm_motivation, method = "pairwise", adjust = "bonferroni")
pilot_ss_effect <- summary(pilot_anova_motivation)[[1]]["Condition", "Sum Sq"]
pilot_ss_error  <- summary(pilot_anova_motivation)[[1]]["Residuals", "Sum Sq"]
pilot_eta2_motivation <- pilot_ss_effect / (pilot_ss_effect + pilot_ss_error)
pilot_cohens_f_motivation <- sqrt(pilot_eta2_motivation / (1 - pilot_eta2_motivation))
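# The eta^2 -> Cohen's f block above is repeated verbatim for each outcome.
# A reusable helper (a sketch; cohens_f_from_aov is my own name) computes the
# same quantity for any single-predictor aov fit:
cohens_f_from_aov <- function(fit) {
  tab  <- summary(fit)[[1]]
  rows <- trimws(rownames(tab))   # summary.aov row names carry padding spaces
  ss_effect <- tab[rows != "Residuals", "Sum Sq"][1]
  ss_error  <- tab[rows == "Residuals", "Sum Sq"]
  eta2 <- ss_effect / (ss_effect + ss_error)
  sqrt(eta2 / (1 - eta2))
}
# e.g., cohens_f_from_aov(pilot_anova_peet) should match pilot_cohens_f_peet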


## RESULTS
pilot_anova_peet
Call:
   aov(formula = peet ~ Condition, data = pilot_a)

Terms:
                Condition Residuals
Sum of Squares  16.316667  2.333333
Deg. of Freedom         2         7

Residual standard error: 0.5773503
Estimated effects may be unbalanced
pilot_contrast_peet
 contrast           estimate    SE df t.ratio p.value
 comfort - strategy    2.583 0.441  7   5.858  0.0019
 comfort - control     3.000 0.471  7   6.364  0.0011
 strategy - control    0.417 0.441  7   0.945  1.0000

P value adjustment: bonferroni method for 3 tests 
pilot_cohens_f_peet
[1] 2.644401
pilot_anova_pei
Call:
   aov(formula = pei ~ Condition, data = pilot_a)

Terms:
                Condition Residuals
Sum of Squares   5.511458 20.463542
Deg. of Freedom         2         7

Residual standard error: 1.709785
Estimated effects may be unbalanced
pilot_contrast_pei
 contrast           estimate   SE df t.ratio p.value
 comfort - strategy   -0.938 1.31  7  -0.718  1.0000
 comfort - control    -1.917 1.40  7  -1.373  0.6364
 strategy - control   -0.979 1.31  7  -0.750  1.0000

P value adjustment: bonferroni method for 3 tests 
pilot_cohens_f_pei
[1] 0.5189707
pilot_anova_motivation
Call:
   aov(formula = motivation ~ Condition, data = pilot_a)

Terms:
                Condition Residuals
Sum of Squares   9.641667 14.083333
Deg. of Freedom         2         7

Residual standard error: 1.418416
Estimated effects may be unbalanced
pilot_contrast_motivation
 contrast           estimate   SE df t.ratio p.value
 comfort - strategy   -0.917 1.08  7  -0.846  1.0000
 comfort - control    -2.500 1.16  7  -2.159  0.2032
 strategy - control   -1.583 1.08  7  -1.462  0.5618

P value adjustment: bonferroni method for 3 tests 
pilot_cohens_f_motivation
[1] 0.8274149
# Prepare the data
plot_data <- pilot_a %>%
  pivot_longer(cols = c(peet, pei, motivation), 
               names_to = "Variable", values_to = "Score") %>%
  group_by(Condition, Variable) %>%
  summarise(mean_score = mean(Score), 
            se_score = sd(Score) / sqrt(n()), .groups = 'drop') %>%
  mutate(Variable = factor(Variable, levels = c("peet", "pei", "motivation")))

# Make sure Condition is a factor in the desired order
plot_data <- plot_data %>%
  mutate(Condition = factor(Condition, levels = c("comfort", "control", "strategy")))

# Create the bar plot
ggplot(plot_data, aes(x = Variable, y = mean_score, fill = Condition)) +
  geom_bar(stat = "identity", position = position_dodge(), width = 0.7) +
  geom_errorbar(aes(ymin = mean_score - se_score, ymax = mean_score + se_score), 
                position = position_dodge(0.7), width = 0.2) +
  scale_fill_manual(values = c("comfort" = "black", 
                               "control" = "lightgray", 
                               "strategy" = "gray"),
                    labels = c("comfort" = "Comfort Feedback", 
                               "control" = "Control Feedback", 
                               "strategy" = "Strategy Feedback")) +
  scale_x_discrete(labels = c(peet = "Perceptions of Entity Theory", 
                              pei = "Low Expectations/Investment", 
                              motivation = "Student's Motivation")) +
  labs(title = "Student Responses to Feedback", y = "Mean rating (1-7 point scale)", x = NULL) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_classic() +
  theme(
    legend.title = element_blank(),
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    legend.text = element_text(size = 12)
  )

Pilot B

Pilot B and Data Collection Error: Due to a misunderstanding of the SONA scheduling system, I inadvertently launched the study earlier than intended. I had entered November 30 as the participation date, not realizing that SONA uses this field as a deadline, which allowed participants to enroll and complete the survey immediately. As a result, 45 participants completed the survey before I was able to implement the revised Pilot B materials. I closed the study as soon as I identified the issue to prevent additional enrollments. The planned revisions involved only a minor clarification in the information sheet, and the survey content itself remained essentially unchanged. Given the minimal differences between the pilot and the intended final version, I proceeded with analyzing the existing data while transparently documenting this deviation. Because the planned sample size was 51 participants, I will recruit the remaining participants and pool them with the original 45 in the analyses.

Methods Addendum (Post Data Collection)

Actual Sample

#### Import data
data <- read_csv("data/data_replication.csv")

#Delete the first two rows: Qualtrics exports repeat question text and
#import metadata there, not participant data
data <- data %>% 
  slice(-c(1, 2))

proportions_condition <- data %>%
  count(condition) %>%
  mutate(Proportion = n / sum(n))
print(proportions_condition)
# A tibble: 3 × 3
  condition     n Proportion
  <chr>     <int>      <dbl>
1 comfort      19      0.352
2 control      19      0.352
3 strategy     16      0.296
#Sociodemographic info

proportions_gender <- data %>%
  count(Q12) %>%
  mutate(Proportion = n / sum(n))
print(proportions_gender)
# A tibble: 4 × 3
  Q12                   n Proportion
  <chr>             <int>      <dbl>
1 Female               37     0.685 
2 Male                 13     0.241 
3 Non-Binary            1     0.0185
4 Prefer not to say     3     0.0556
proportions_race <- data %>%
  count(Q14) %>%
  mutate(Proportion = n / sum(n))
print(proportions_race)
# A tibble: 10 × 3
   Q14                                                              n Proportion
   <chr>                                                        <int>      <dbl>
 1 American Indian or Alaska Native,Black or African American,…     1     0.0185
 2 Asian                                                           21     0.389 
 3 Asian,Hispanic or Latino                                         1     0.0185
 4 Asian,White                                                      2     0.0370
 5 Black or African American                                        2     0.0370
 6 Black or African American,White                                  1     0.0185
 7 Hispanic or Latino                                               5     0.0926
 8 Hispanic or Latino,White                                         3     0.0556
 9 Prefer not to say                                                5     0.0926
10 White                                                           13     0.241 
data$Q13 <- as.numeric(as.character(data$Q13)) 
summary_age <- data %>%
  summarise(
    Mean_Age = mean(Q13, na.rm = TRUE),
    SD_Age = sd(Q13, na.rm = TRUE)
  )
print(summary_age) 
# A tibble: 1 × 2
  Mean_Age SD_Age
     <dbl>  <dbl>
1     19.1  0.998

Our final sample consisted of 54 undergraduate students enrolled in an introductory psychology course at Stanford University. No participants were excluded. Participants were, on average, 19 years old (SD = 0.99), making the sample slightly younger than that of the original study by Rattan et al. (2012). The gender distribution included 37 women, 13 men, 1 non-binary participant, and 3 individuals who preferred not to disclose their gender. In terms of racial and ethnic background, the sample comprised 2 Black or African American students, 21 Asian students, 13 White students, 5 Hispanic or Latino students, 8 students identifying as mixed race, and 5 who preferred not to report their racial/ethnic identity.

Differences from pre-data collection methods plan

My original preregistered plan was to collect data from 51 participants, with 17 students assigned to each of the three feedback conditions. However, during data collection I discovered that the groups were becoming uneven because Qualtrics assigns participants to a condition as soon as they open the survey—even if they drop out before completing any items. As a result, early exits were still counted toward a condition’s total, creating imbalance that I had not anticipated. To compensate for this and to remain consistent with the original Rattan et al. (2012) study, I increased the maximum sample size to 54 participants. Although this adjustment helped stabilize the condition counts, the final distribution was still somewhat uneven: 19 participants in the comfort condition, 19 in the control condition, and 16 in the strategy condition. This imbalance is not ideal, but it reflects the constraints of the randomization process and the practical limitations of data collection.

Results

Data preparation

#Recode variables: extract the numeric point (1-7) from each response label,
#and map the anticipated-grade response (Q11) onto the original 1-5 coding
data <- data %>%
  mutate(across(Q1:Q10, ~ as.numeric(str_extract(., "[1-7]")))) %>%
  mutate(
    Q11 = case_match(
      Q11,
      "35%" ~ 1,
      "50%" ~ 2,
      "65%" ~ 3,
      "80%" ~ 4,
      "95%" ~ 5,
      .default = NA_integer_
    )
  )

#### Prepare data for analysis - create columns etc.
data$condition <- factor(data$condition, levels = c("comfort", "strategy","control"))

#Create PEET score
# Reverse-score Q4: on the 6-point PEET scale the reverse of x is 7 - x
data <- data %>%
  mutate(Q4_rev = 7 - Q4)

data <- data %>%
  mutate(peet = rowMeans(across(c(Q1, Q2, Q3, Q4_rev)), na.rm = TRUE))

# Professor Expectations and Investment scale (PEI)
data <- data %>%
  mutate(pei = rowMeans(across(Q5:Q8), na.rm = TRUE))

# Motivation
data <- data %>%
  mutate(motivation = rowMeans(across(Q9:Q10), na.rm = TRUE))
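# Scale reliability, as planned in the Analysis Plan: Cronbach's alpha for
# each multi-item composite (a sketch, assuming the psych package is available)
alpha_peet       <- psych::alpha(data[, c("Q1", "Q2", "Q3", "Q4_rev")])
alpha_pei        <- psych::alpha(data[, c("Q5", "Q6", "Q7", "Q8")])
alpha_motivation <- psych::alpha(data[, c("Q9", "Q10")])
# alpha_peet$total$raw_alpha (etc.) holds the coefficient to report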

Confirmatory analysis

### Descriptives
dv_vars <- c("peet", "pei", "motivation")

descriptives <- data %>%
  group_by(condition) %>%
  summarise(across(all_of(dv_vars),
                   list(M = ~mean(., na.rm = TRUE),
                        SD = ~sd(., na.rm = TRUE)),
                   .names = "{col}_{fn}"))

descriptives
# A tibble: 3 × 7
  condition peet_M peet_SD pei_M pei_SD motivation_M motivation_SD
  <fct>      <dbl>   <dbl> <dbl>  <dbl>        <dbl>         <dbl>
1 comfort     4.11   1.28   3.58  1.39          3.03          1.61
2 strategy    1.77   0.942  5.58  1.11          5.5           1.13
3 control     2.51   0.884  5.49  0.810         5.34          1.04
### Tests of difference
# PEET
anova_peet <- aov(peet ~ condition, data = data)
emm_peet <- emmeans(anova_peet, ~ condition)
contrast_peet <- contrast(emm_peet, method = "pairwise", adjust = "bonferroni")
ss_effect <- summary(anova_peet)[[1]]["condition", "Sum Sq"]
ss_error  <- summary(anova_peet)[[1]]["Residuals", "Sum Sq"]
eta2_peet <- ss_effect / (ss_effect + ss_error)
cohens_f_peet <- sqrt(eta2_peet / (1 - eta2_peet))

# PEI
anova_pei <- aov(pei ~ condition, data = data)
emm_pei <- emmeans(anova_pei, ~ condition)
contrast_pei <- contrast(emm_pei, method = "pairwise", adjust = "bonferroni")

ss_effect <- summary(anova_pei)[[1]]["condition", "Sum Sq"]
ss_error  <- summary(anova_pei)[[1]]["Residuals", "Sum Sq"]
eta2_pei <- ss_effect / (ss_effect + ss_error)
cohens_f_pei <- sqrt(eta2_pei / (1 - eta2_pei))

# Motivation
anova_motivation <- aov(motivation ~ condition, data = data)
emm_motivation <- emmeans(anova_motivation, ~ condition)
contrast_motivation <- contrast(emm_motivation, method = "pairwise", adjust = "bonferroni")
ss_effect <- summary(anova_motivation)[[1]]["condition", "Sum Sq"]
ss_error  <- summary(anova_motivation)[[1]]["Residuals", "Sum Sq"]
eta2_motivation <- ss_effect / (ss_effect + ss_error)
cohens_f_motivation <- sqrt(eta2_motivation / (1 - eta2_motivation))

# Grade (Q11)
anova_grade <- aov(Q11 ~ condition, data = data)
emm_grade <- emmeans(anova_grade, ~ condition)
contrast_grade <- contrast(emm_grade, method = "pairwise", adjust = "bonferroni")

ss_effect_grade <- summary(anova_grade)[[1]]["condition", "Sum Sq"]
ss_error_grade  <- summary(anova_grade)[[1]]["Residuals", "Sum Sq"]

eta2_grade <- ss_effect_grade / (ss_effect_grade + ss_error_grade)
cohens_f_grade <- sqrt(eta2_grade / (1 - eta2_grade))

# Extract F values 

F_peet <- as.numeric(summary(anova_peet)[[1]]$"F value"[1])
F_pei <- as.numeric(summary(anova_pei)[[1]]$"F value"[1])
F_motivation <- as.numeric(summary(anova_motivation)[[1]]$"F value"[1])
F_grade <- as.numeric(summary(anova_grade)[[1]]$"F value"[1])

# Contrasts to data frames
contrast_peet <- as.data.frame(contrast_peet)
contrast_pei <- as.data.frame(contrast_pei)
contrast_motivation <- as.data.frame(contrast_motivation)
contrast_grade <- as.data.frame(contrast_grade)

## RESULTS
anova_peet
Call:
   aov(formula = peet ~ condition, data = data)

Terms:
                condition Residuals
Sum of Squares   50.90754  56.90728
Deg. of Freedom         2        51

Residual standard error: 1.056328
Estimated effects may be unbalanced
contrast_peet
 contrast             estimate        SE df t.ratio p.value
 comfort - strategy  2.3396382 0.3584231 51   6.528  <.0001
 comfort - control   1.5921053 0.3427181 51   4.646  0.0001
 strategy - control -0.7475329 0.3584231 51  -2.086  0.1261

P value adjustment: bonferroni method for 3 tests 
cohens_f_peet
[1] 0.945817
anova_pei
Call:
   aov(formula = pei ~ condition, data = data)

Terms:
                condition Residuals
Sum of Squares   46.88140  65.03063
Deg. of Freedom         2        51

Residual standard error: 1.129208
Estimated effects may be unbalanced
contrast_pei
 contrast             estimate        SE df t.ratio p.value
 comfort - strategy -1.9991776 0.3831520 51  -5.218  <.0001
 comfort - control  -1.9078947 0.3663634 51  -5.208  <.0001
 strategy - control  0.0912829 0.3831520 51   0.238  1.0000

P value adjustment: bonferroni method for 3 tests 
cohens_f_pei
[1] 0.8490657
anova_motivation
Call:
   aov(formula = motivation ~ condition, data = data)

Terms:
                condition Residuals
Sum of Squares   70.44055  85.26316
Deg. of Freedom         2        51

Residual standard error: 1.292991
Estimated effects may be unbalanced
contrast_motivation
 contrast             estimate        SE df t.ratio p.value
 comfort - strategy -2.4736842 0.4387255 51  -5.638  <.0001
 comfort - control  -2.3157895 0.4195018 51  -5.520  <.0001
 strategy - control  0.1578947 0.4387255 51   0.360  1.0000

P value adjustment: bonferroni method for 3 tests 
cohens_f_motivation
[1] 0.9089304
anova_grade
Call:
   aov(formula = Q11 ~ condition, data = data)

Terms:
                condition Residuals
Sum of Squares   11.42215  17.91118
Deg. of Freedom         2        51

Residual standard error: 0.592621
Estimated effects may be unbalanced
contrast_grade
 contrast             estimate        SE df t.ratio p.value
 comfort - strategy -1.0296053 0.2010825 51  -5.120  <.0001
 comfort - control  -0.8947368 0.1922716 51  -4.654  0.0001
 strategy - control  0.1348684 0.2010825 51   0.671  1.0000

P value adjustment: bonferroni method for 3 tests 
cohens_f_grade
[1] 0.7985677

Perceptions of Professor Entity Theory (PEET)

In the original study, feedback condition strongly influenced students’ perceptions of their professor’s entity beliefs, F(2, 51) = 15.95, p < .001, Cohen’s f = 0.79. This replication produced a similarly strong effect, F(2, 51) = 22.812, p < .001, Cohen’s f = 0.946.

As in the original, participants in the comfort condition viewed the professor as more entity-minded than those in the strategy and control conditions. The contrast between comfort and strategy was large and significant (t(51) = 6.528, p < .001), and comfort also differed significantly from control (t(51) = 4.646, p < .001). The strategy–control comparison again showed no significant difference (p = .126), mirroring the original pattern.

Overall, the replication closely matches the original findings in magnitude and direction, suggesting that the feedback manipulation reliably shapes students’ interpretations of a professor’s mindset.

Perceptions of Professors’ Expectations and Investment (PEI)

The original study reported a significant effect of condition on perceived expectations and investment, F(2, 51) = 12.83, p < .01, Cohen’s f = 0.71. Our replication again found a strong effect, F(2, 51) = 18.383, p < .001, Cohen’s f = 0.849.

Consistent with prior results, the comfort condition led to significantly lower perceptions of expectations and investment than both the strategy (t(51) = -5.218, p < .001) and control (t(51) = -5.208, p < .001) conditions. Unlike the original study, which found a significant strategy–control difference, we observed no difference between these two conditions (p = 1.00). Thus, the replication supports the main effect but provides weaker evidence for differences between the two constructive-feedback conditions.

Student Motivation

The original study found that comfort feedback significantly reduced student motivation, F(2, 51) = 6.33, p < .01, Cohen’s f = 0.50. Our replication revealed an even stronger effect, F(2, 51) = 21.067, p < .001, Cohen’s f = 0.909.

As expected, comfort feedback produced substantially lower motivation than strategy (t(51) = -5.638, p < .001) or control feedback (t(51) = -5.520, p < .001). The strategy–control comparison was nonsignificant (p = 1.00), which is consistent with the original findings.

Overall, the replication not only reproduces but amplifies the motivational pattern observed in the original work.

Expected Grade

The original study found that comfort feedback lowered expectations for end-of-year grades, F(2, 51) = 5.25, p < .01 (estimated Cohen’s f = 0.45). Our replication showed a similarly robust effect, F(2, 51) = 16.262, p < .001, Cohen’s f = 0.799.

Participants in the comfort condition expected significantly lower grades than those in the strategy (t(51) = -5.120, p < .001) or control (t(51) = -4.654, p < .001) conditions. As with the earlier measures, the strategy–control comparison was nonsignificant (p = 1.00), whereas the original study found a small difference.

Taken together, the replication strongly supports the primary claim that comfort feedback leads students to lower expectations for their own performance.

Figure from original paper (Rattan et al., 2012):

Figure from replication study.

# Prepare the data
plot_data <- data %>%
  pivot_longer(cols = c(peet, pei, motivation), 
               names_to = "Variable", values_to = "Score") %>%
  group_by(condition, Variable) %>%
  summarise(mean_score = mean(Score), 
            se_score = sd(Score) / sqrt(n()), .groups = 'drop') %>%
  mutate(Variable = factor(Variable, levels = c("peet", "pei", "motivation")))

# Make sure Condition is a factor in the desired order
plot_data <- plot_data %>%
  mutate(condition = factor(condition, levels = c("comfort", "control", "strategy")))

# Create the bar plot
ggplot(plot_data, aes(x = Variable, y = mean_score, fill = condition)) +
  geom_bar(stat = "identity", position = position_dodge(), width = 0.7) +
  geom_errorbar(aes(ymin = mean_score - se_score, ymax = mean_score + se_score), 
                position = position_dodge(0.7), width = 0.2) +
  scale_fill_manual(values = c("comfort" = "black", 
                               "control" = "lightgray", 
                               "strategy" = "gray"),
                    labels = c("comfort" = "Comfort Feedback", 
                               "control" = "Control Feedback", 
                               "strategy" = "Strategy Feedback")) +
  scale_x_discrete(labels = c(peet = "Perceptions of Entity Theory", 
                              pei = "Low Expectations/Investment", 
                              motivation = "Student's Motivation")) +
  labs(title = "Student Responses to Feedback", y = "Mean rating (1-7 point scale)", x = NULL) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_classic() +
  theme(
    legend.title = element_blank(),
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    legend.text = element_text(size = 12)
  )

Discussion

Summary of Replication Attempt

The confirmatory analyses provide strong evidence that the original study’s central findings replicated. Across all measured constructs—perceptions of the professor’s entity theory, perceptions of expectations and investment, student motivation, and expected final grade—the feedback manipulation produced the same qualitative pattern reported in the original work: comfort feedback consistently led to more negative interpretations of the professor, lower motivation, and lower expected performance compared to strategy or control feedback. Three of the four effects replicated closely, showing similar or even larger effect sizes than those observed originally. One effect (the difference between the strategy and control conditions on perceptions of expectations and investment) partially replicated, with our data supporting the main contrast involving comfort feedback but not reproducing the smaller difference between the constructive-feedback conditions. Overall, the replication was highly successful, providing convergent evidence that comfort feedback reliably produces more negative academic perceptions and expectations than strategy or control feedback.

Commentary

This replication effort closely aligns with the original study by Rattan et al. (2012), and the findings generally reproduced the original pattern of results. Across all primary outcomes, comfort feedback again led to lower motivation, lower perceived expectations, and higher perceptions of entity-theory beliefs compared to strategy or control feedback. Although some differences between strategy and control conditions were not significant in the present study, these discrepancies appear small and well within what would be expected due to sampling variability. Given the close matching of materials, procedures, and sample characteristics, these differences are unlikely to represent meaningful moderators, and I interpret the replication as successful in capturing the core effect.

One notable deviation from the pre-data collection plan concerned cell sizes. I initially planned to recruit 51 participants with equal allocation across the three conditions (17 per group). However, I later discovered that opening the Qualtrics survey counts as a condition assignment, meaning that dropout or partial engagement created uneven group sizes. To maintain consistency with the original study and ensure sufficient power, I increased the target to 54 participants—the same number used in the original paper. The final distribution (comfort = 19, control = 19, strategy = 16) is not ideal but still acceptable for the planned analyses, and the imbalance is unlikely to have materially affected the results.

Additionally, a data-collection error occurred due to a misunderstanding of the SONA scheduling system. I mistakenly set November 30 as the participation date, not realizing that SONA treats this field as a deadline rather than a start date. As a result, participants were able to enroll and complete the study immediately, and 45 participants completed the survey before I could implement the intended Pilot B revisions. Once I recognized the issue, I closed the study to prevent additional early enrollments. The planned revisions involved only a minor clarification in the information sheet, and the survey content itself remained essentially unchanged. Given that the difference was minimal and that the core materials were identical, I proceeded with analyzing the existing data while transparently documenting this deviation.

Overall, despite these practical challenges, the replication retains strong methodological fidelity, and the observed results support the robustness of the original findings.

Appendix: Repository

https://github.com/psych251/rattan2012