My choice of experiment aligns with my research interests in the field of education, particularly in the area of students’ intelligence mindset. Throughout my master’s program, I delved into the fascinating debate surrounding fixed and growth mindsets among students. Additionally, I have explored research articles investigating the connection between parents’ mindsets and their children’s mindset development. However, the unique focus of this experiment on parents’ failure mindset and its potential influence on students’ intelligence mindset offers a fresh perspective. This study promises to contribute significantly to my understanding of the factors that shape students’ intelligence mindsets, making it a valuable addition to my research program.
This experiment aimed to investigate whether parents’ failure mindsets have a causal effect on their reactions to their children’s failures. The procedure includes the following steps: 1) recruit the parents; 2) pre-assess the parents’ current beliefs and perceptions of their child’s competence; 3) randomly assign parents to one of two groups to manipulate their failure mindset (i.e., failure-is-debilitating mind-set or failure-is-enhancing mind-set); 4) ask participants to imagine their child failing a math quiz and write their response; 5) post-assess parents’ failure mindset after the manipulation; 6) code the open-ended responses; and 7) analyze the data and report the results.
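As a rough illustration of the random-assignment step, the sketch below shows one way to assign recruited parents to the two mind-set conditions in R. This is a minimal sketch only; in practice the assignment would typically be handled within the Qualtrics survey itself, and the `parent_id` variable here is hypothetical.

```r
# Minimal sketch of random assignment to the two failure mind-set conditions.
# `parent_id` is hypothetical; in the actual study, Qualtrics determines which
# biased questionnaire each parent receives.
set.seed(251)
parent_id <- 1:150
condition <- sample(rep(c("debilitating", "enhancing"), length.out = length(parent_id)))
assignment <- data.frame(parent_id, condition)
table(assignment$condition)  # check that the two groups are balanced
```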
However, crafting effective questionnaires that genuinely manipulate the desired mindset can be challenging. Moreover, even though we attempt to randomly assign participants to one of the two questionnaire conditions, ensuring truly random assignment can be difficult due to factors such as participants’ personal biases or demographic information, which may affect the mindset manipulation. Additionally, when analyzing the results of the intervention, the parents’ responses regarding their feelings related to their child’s supposed failure on a math quiz are in an open-ended descriptive format. This means that researchers should code the open-ended responses into performance-oriented and learning-oriented responses, a process that can be subject to inter-rater variability. Maintaining high reliability between coders is essential.
Links to the repository and to the original paper are provided “here”.
Based on the prior write-up, describe any differences between the original and 1st replication in terms of methods, sample, sample size, and analysis.
| | Original paper | First replication |
|---|---|---|
| Sample size | n = 132 | n = 115 |
| Sample characteristics | | |
| 1) Gender | Female (57%) | Female (49%); genderqueer, gender fluid, or non-binary (2%) |
| 2) Education | High school diploma or some college education (31%), college degree (51%), postgraduate degree (18%) | No degree (1%), high school diploma (24%), college degree (55%), postgraduate degree (19%), preferred not to answer (1%) |
| 3) Race/Ethnicity | White (75%), African American (12%), Asian American (7%), Hispanic (6%) | White (63%), Black/African American (16%), Asian (4%), Hispanic, Latino, or Spanish origin (17%), Native Hawaiian or Other Pacific Islander (1%), filled in their own option (3%), preferred not to answer (1%) |
| Crowdsourcing platform | Amazon Mechanical Turk | Prolific |
| Analysis | Unpaired, two-tailed t-test | Unpaired, two-tailed t-test |
| Coders | Two coders: 20 responses (15%) double-coded for reliability (performance-oriented responses: ICC = .91; learning-oriented responses: ICC = .90) | One coder |
| Data exclusion | No exclusion criteria mentioned | Excluded participants whose open-ended responses could not be coded (e.g., unintelligible responses or responses such as “I don’t know.”) |
| Result | Parents who were induced to hold a failure-is-debilitating mind-set were more likely to react with concerns about their child’s performance and lack of ability, t(131) = 3.246, p < .001, ηp² = .075 | Showed a similar trend, but the effect was smaller and the difference in performance-oriented responses between the two conditions was not statistically significant, t(113) = -1.9291, p = 0.0562, ηp² = .031 |
In addition to the original research, the replication study examined demographic variables such as the child’s age, parents’ socioeconomic status (specifically, education level), parents’ gender, and parents’ race/ethnicity. The study also investigated the relationship between parents’ prior knowledge of mindset research and their learning-oriented and performance-oriented responses.
While the replication study provided a transparent account of the sampling process and data analysis, it had limitations. Specifically, the study employed only a single coder, making it impossible to assess inter-coder reliability.
Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
How much power does your planned sample have for original effect? For an attenuated effect that is half the size of the original?
(If power analysis is not possible or precise, discuss more fully how you determined a sample size that would be sufficient for rescue.)
In the original study, 310 adults recruited from the crowdsourcing platform Amazon Mechanical Turk completed an initial survey asking whether they were a parent. Of these, 132 participants reported being a parent and were selected to participate in the study.
In the first replication study, 119 parents were recruited and 115 participants were included in the final analysis.
This study plans to recruit 150 parents (75 mothers and 74 fathers) from Prolific. The result of the first replication study was not significant, unlike the original paper, and the replicator noted:

> However, I think my replication was rather successful, given that my sample size was smaller.

Therefore, this study aims to collect data from more participants than the original study (n = 132) in order to differentiate the sample size from the previous research. Parents are recruited through the crowdsourcing platform Prolific, with the participant pool filtered by parental status to reach parents specifically.
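As a rough check on these sample-size considerations, the sketch below uses the `pwr` package (an assumption; it is not loaded elsewhere in this write-up) to estimate the per-group sample sizes needed for 80%, 90%, and 95% power to detect the original effect, and the power of the planned sample (roughly 75 parents per condition) for the original effect and for an attenuated effect half its size. The conversion from the reported ηp² to Cohen’s d assumes two roughly equal-sized groups.

```r
# Minimal power-analysis sketch; assumes the pwr package is installed.
library(pwr)

# Convert the original partial eta-squared (.075) to Cohen's d,
# assuming two roughly equal-sized groups: d = 2 * sqrt(eta2 / (1 - eta2))
eta2_original <- 0.075
d_original <- 2 * sqrt(eta2_original / (1 - eta2_original))

# Per-group n needed for 80%, 90%, and 95% power (two-tailed, alpha = .05)
sapply(c(0.80, 0.90, 0.95), function(target_power) {
  pwr.t.test(d = d_original, power = target_power, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")$n
})

# Power of the planned sample (~75 parents per condition) for the original effect
pwr.t.test(n = 75, d = d_original, sig.level = 0.05, type = "two.sample")$power

# Power of the planned sample for an attenuated effect half the size of the original
pwr.t.test(n = 75, d = d_original / 2, sig.level = 0.05, type = "two.sample")$power
```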
This study follows the materials and procedures outlined in the original article, as did the first replication study.
All survey items outlined in the supplementary “Materials and Measures” document were used.
Participants completed an online survey initially assessing several beliefs, including their perceptions of their child’s competence (assessed with same measure as in Study 1; α = .79). Then we temporarily manipulated failure mind-sets by randomly assigning the parents to complete one of two five-item biased questionnaires, written to foster agreement with either a failure-is-debilitating mind-set (e.g., “Experiencing failure can lead to negative feelings, like shame or sadness, that interfere with learning”) or a failure-is-enhancing mind-set (e.g., “Experiencing failure can improve performance in the long run if you learn from it”). All measures used a 6-point rating scale from 1 (strongly disagree) to 6 (strongly agree). One-sample t tests comparing the mean in each priming condition with the scale’s midpoint (3.5) showed that participants’ agreement with the intended mind-set was above the midpoint in both the failure-is-debilitating condition (M = 4.41, SD = 1.07), t(56) = 6.45, p < .001, and the failure-is-enhancing condition (M = 5.14, SD = 0.829), t(74) = 17.11, p < .001.
We then asked participants to read and vividly imagine a scenario in which their child came home from school with a failing grade on a math quiz, as in Study 2. They then wrote what they would do, think, and feel in response. Finally, participants reported on their failure mind-sets (α = .82), using the same items as in Study 1, as part of a survey that included a few other items.
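The manipulation check described above compares mean agreement with the induced mind-set against the scale midpoint (3.5) using one-sample t tests. The sketch below illustrates that check; the agreement scores are simulated purely for illustration (with means and SDs resembling the original report), whereas the real analysis would use the mean of the five biased-questionnaire items in each condition.

```r
# Minimal sketch of the manipulation check: one-sample t tests comparing mean
# agreement with the induced mind-set against the scale midpoint (3.5).
# The agreement vectors are simulated for illustration only.
set.seed(1)
agreement_debilitating <- pmin(pmax(rnorm(75, mean = 4.4, sd = 1.1), 1), 6)
agreement_enhancing    <- pmin(pmax(rnorm(75, mean = 5.1, sd = 0.8), 1), 6)

t.test(agreement_debilitating, mu = 3.5)  # failure-is-debilitating condition
t.test(agreement_enhancing,    mu = 3.5)  # failure-is-enhancing condition
```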
The first replication study highlighted the ambiguity of the Likert scale used in the original article, as well as the lack of clarity about where to collect demographic data:

> From reading the original article and supplementary document, it was not clear whether the middle points on the Likert scale were labelled. Based on similar studies examining intelligence mindsets, I decided to label the middle points with “mostly dis/agree” and “slightly dis/agree.” … As I was also not sure where to collect demographic data (i.e., sex, race, education level, child’s age), I decided to collect this information as part of the final survey at the very end of the study.

This study follows these decisions from the first replication study.
Here is a link to the survey that was used to conduct the replication study: https://stanforduniversity.qualtrics.com/jfe/form/SV_1ZfjroaQicSlE8e
The open-ended responses were coded following the explanation in the original article and the supplementary document, “Materials and Measures”:
Two raters, blind to condition, coded the open-ended responses. The first author developed a coding scheme on the basis of an initial reading of the responses and then made clarifying revisions on the basis of feedback from the two raters.
Because the coding scheme had already been developed by the original authors, this study follows it.
The codes were broken down into two main categories of interest: performance-oriented responses and learning-oriented responses. Coders gave a score of 1 each time a code was present. Codes in the performance-oriented category were responses that focused on judgments of ability, particularly as a stable trait (e.g., “I would think maybe my child is just not that good at math”); comfort for lack of ability (e.g., “It’s ok that you got an F. You tried your best”); contingent self-worth based on their child’s performance (e.g., “I’d feel bad about myself”); pity for their child’s lack of ability (“I would feel a little nervous for my child because I know how hard it can be”); grades as a goal (e.g., “I would . . . hope their grades from previous [tests] are high enough to make up for the test”); and social comparison (“I would also want to know how the other children in the class scored”). Codes in the learning-oriented category were responses that focused on judgments of effort (e.g., “I would tell my son he needs to study harder”); strategies, which included both general strategies (e.g., “he didn’t study the material in the right way”) and specific study or test-taking strategies (e.g., “I would also say that double checking your work before you hand it in is a good habit to get into”); help seeking (e.g., “I would get her a math tutor”); mastery, or conceptual understanding, as a goal (e.g., “the important thing we need to do is try to understand the concepts behind the problems he got wrong, and then study those”); interest (e.g., “I would hope that the results of the test would not stop her from enjoying the class and wonder about ways I could help keep her liking of the subject going”); and explicit characterizations of failure as enhancing, or good (e.g., “It is ok to make mistakes and fail sometimes, because that’s how people learn”).
Two statements that repeated the same sentiment were not coded as two instances (e.g., “I would question how much studying did they do” and “I would also ask . . . do they think they studied enough” would be one code for effort). However, two statements that expressed different ideas but fell under the same code were marked as two instances (e.g., “I would question my child to make sure that she studied the correct material thoroughly” and “I would ask to make sure that she was paying attention in class” would be marked as two codes for strategies, as these statements represent different strategies). If a statement fell under two codes and one was more specific than the other, only the more specific classification was counted. That is, although effort and help seeking can be different types of strategies, statements expressing these ideas were coded only as effort and help seeking, not also as strategies.
Scores for performance-oriented and learning-oriented responses were each created by summing all instances of their respective subcategories. Two coders rated 20 responses (15%) to assess reliability.”
On the other hand, the first replication study conducted the coding with a single coder:

> As there was already a coding scheme developed by the original authors, I did not create my own coding scheme. Rather, I carefully reviewed the coding scheme outlined in the supplementary document to code the responses. I considered recruiting an undergraduate RA or another PSYCH 251 student to serve as the second rater, but in the end, I decided to code all of the responses on my own. I came to this decision during the pilot study, when I noticed it takes a while to get used to the coding scheme outlined by the authors. Given that the project needed to be completed in a quarter, it didn’t seem feasible to recruit another student and have them reliably code the responses in this short time frame. By scrambling the responses before downloading and viewing the open-ended responses for coding, I was able to ensure that I remained blind to condition.

This study will follow the original paper by recruiting a second coder in addition to myself.
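Because a subset of responses will be double-coded, inter-rater reliability can be checked with an intraclass correlation, as in the original article. The sketch below uses `psych::ICC()` (the `psych` package is already loaded in the Data Preparation chunk) on simulated ratings; in practice, the rows would be the double-coded responses and the columns the two coders’ counts for a given category.

```r
# Minimal sketch of an inter-rater reliability check with the psych package.
# The coder ratings below are simulated for illustration only.
library(psych)

set.seed(2)
coder1 <- rpois(20, lambda = 1)                                        # coder 1's counts
coder2 <- pmax(coder1 + sample(c(-1, 0, 0, 1), 20, replace = TRUE), 0) # coder 2's counts

# Returns several ICC variants; choose the one matching the original study's approach
ICC(data.frame(coder1, coder2))
```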
The key analysis of interest will be an unpaired, two-tailed t-test comparing performance-oriented responses between the two conditions. The analysis will aim to replicate the following result:
> Parents who were induced to hold a failure-is-debilitating mind-set were more likely to react with concerns about their child’s performance and lack of ability, t(131) = 3.246, p < .001, ηp² = .075, and less likely to react with support for their child’s learning and mastery, t(131) = −2.04, p = .043, ηp² = .031, compared with those who were induced to hold a failure-is-enhancing mind-set (see Fig. 2). Parents in both conditions did not report performance-oriented responses (M = 0.485, SD = 0.693) nearly as often as learning-oriented responses (M = 2.38, SD = 1.53).
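For comparison with the reported effect sizes, partial eta-squared for an independent-samples t test can be recovered from the t statistic and its degrees of freedom as t² / (t² + df). A quick sketch using the values reported above and in the first replication:

```r
# Recover partial eta-squared from a t statistic and its degrees of freedom:
# eta_p^2 = t^2 / (t^2 + df)
eta_p2 <- function(t, df) t^2 / (t^2 + df)

eta_p2(3.246, 131)    # original performance-oriented result, ~ .075
eta_p2(-1.9291, 113)  # first replication performance-oriented result, ~ .031
```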
Moreover, the first replication study conducted an exploratory analysis:

> Apart from any of the analyses reported in the original study, I also explored whether parents’ knowledge of mindset research was associated with their learning-oriented and performance-oriented responses.

This study aims to replicate this exploratory analysis as well.
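One way to run this exploratory analysis, sketched below with simulated data, is to compare learning-oriented and performance-oriented counts between parents who do and do not report prior knowledge of mindset research. The `coded_lor` and `coded_por` names follow the coded-data columns used later in this write-up, but `mindset_knowledge` and the yes/no coding of the knowledge item are assumptions made for illustration.

```r
# Sketch of the exploratory analysis with simulated data; `mindset_knowledge`
# is a hypothetical yes/no indicator of prior familiarity with mindset research.
set.seed(3)
explore_example <- data.frame(
  mindset_knowledge = sample(c("yes", "no"), 150, replace = TRUE),
  coded_lor = rpois(150, lambda = 2.4),
  coded_por = rpois(150, lambda = 0.5)
)

# Compare learning- and performance-oriented counts by prior knowledge
t.test(coded_lor ~ mindset_knowledge, data = explore_example)
t.test(coded_por ~ mindset_knowledge, data = explore_example)
```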
To summarize the distinctions between the original study, the first replication, and the present study: First, the current research aims to recruit 150 parents, a sample size larger than those in the original study and the first replication. This larger sample makes it possible to examine whether the failure to replicate the original result in the first replication was due to its smaller sample size. Second, this research will code the open-ended responses with two coders, adhering to the method employed in the original study; this addresses a limitation identified in the first replication. Lastly, this study will reproduce the exploratory analysis conducted in the first replication, specifically examining the relationship between parents’ prior knowledge of mindset research and their learning-oriented and performance-oriented responses, offering an additional opportunity to validate the findings of the first replication study.
You can comment this section out prior to final report with data collection.
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Any differences from what was described as the original plan, or “none”.
Data preparation following the analysis plan.
### Data Preparation
#### Load Relevant Libraries and Functions
library(rmarkdown)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
library(qualtRics)
library(psych) # for reverse coding
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(ggsignif) # to add significance bars
## Load Data
### get Qualtrics survey data
qualtrics_data <- read_csv('/Users/myeong-eunjeong/Library/CloudStorage/OneDrive-Stanford/PSYCH 251/haimovitz2016_1_rescue-main/raw_data/1029 result.csv')
## Rows: 4 Columns: 59
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (59): StartDate, EndDate, Status, IPAddress, Progress, Duration (in seco...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
### filter out all pilot participants, scramble open-ended responses by prolific_id to blind myself to condition before coding, then re-assign prolific_id to remove prolific_id identifier
#qualtrics_data <- qualtrics_data |>
#filter(StartDate >= as.Date("2021-11-18")) |>
#arrange(prolific_id) |>
#mutate(prolific_id = 1: length(prolific_id)) |>
#select(-c("PROLIFIC_PID", "ResponseId"))
### create a new file with open-ended responses to use for coding
data_response_scrambled <- qualtrics_data |>
select("prolific_id", "oer_1", "oer_2", "oer_3")
## save both csv files
### scrambled, open-ended responses
write.csv(data_response_scrambled, "../raw_data/data_response_scrambled.csv")
### rest of the Qualtrics data, WITHOUT the open-ended responses
qualtrics_data <- qualtrics_data |>
select(-c("oer_1", "oer_2", "oer_3"))
write.csv(qualtrics_data, "../raw_data/qualtrics_data.csv")
## load coded responses
coded_response <- read.csv("/Users/myeong-eunjeong/Library/CloudStorage/OneDrive-Stanford/PSYCH 251/First_Replication/writeup/data_response_scrambled_coded.csv", header = TRUE, sep = ",")
## data exclusion / filtering: filter out participants whose open-ended response scores were both NA (i.e., responses that could not be coded)
coded_response <- coded_response |>
filter(!is.na(coded_por) | !is.na(coded_lor))
## prepare Qualtrics data (take out irrelevant variables and label each participant with condition)
qualtrics_data <- qualtrics_data |>
#label each participant with the correct condition
mutate(condition = ifelse(!is.na(fid_manip_1), "debilitating",
ifelse(!is.na(fie_manip_1), "enhancing", NA)))
## merge data
metadata <- merge(qualtrics_data, coded_response, by = "prolific_id")
How did people perform on any quality control checks or positive and negative controls?
Main Analysis of Interest: Were parents who hold a failure-is-debilitating mind-set more likely to react with concerns about their child’s performance and lack of ability?
# Make dataframes for the FID and FIE conditions to conduct the t-test
# data_coded_response_fie <- metadata |>
#   filter(condition == "enhancing")
# data_coded_response_fid <- metadata |>
#   filter(condition == "debilitating")

# T-Test (unpaired, two-tailed; Welch correction via var.equal = FALSE)
# t_test_por <- t.test(data_coded_response_fie$coded_por, data_coded_response_fid$coded_por,
#                      alternative = "two.sided", var.equal = FALSE)
# t_test_por
Original study result:

> “Parents who were induced to hold a failure-is-debilitating mind-set were more likely to react with concerns about their child’s performance and lack of ability, t(131) = 3.246, p < .001, ηp² = .075 … compared with those who were induced to hold a failure-is-enhancing mind-set.”
Three-panel graph with original, 1st replication, and your replication is ideal here
original research result
1st replication research result
Combining across the original paper, 1st replication, and 2nd replication, what is the aggregate effect size?
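Once the second replication’s t statistic is available, one way to compute an aggregate effect size (a sketch, not part of the preregistered plan) is to convert each study’s t to Cohen’s d and combine them meta-analytically, for example with the `metafor` package (assumed to be installed; it is not loaded elsewhere in this write-up). The conversion below assumes two roughly equal-sized groups per study and flips the sign of the first replication’s t so that both effects point in the same direction; a row for the second replication would be appended once coding is complete.

```r
# Sketch: aggregate effect size across studies using the metafor package.
library(metafor)

# Reported t statistics and degrees of freedom; the first replication's t is
# sign-flipped so that both effects point in the same direction. A third row
# for the 2nd replication would be added once its t test has been run.
studies <- data.frame(
  study = c("Original (Study 3)", "1st replication"),
  t     = c(3.246, 1.9291),
  df    = c(131, 113)
)

# Convert t to Cohen's d, assuming two roughly equal-sized groups: d = 2t / sqrt(df)
studies$d <- 2 * studies$t / sqrt(studies$df)

# Approximate sampling variance of d for two groups of size n/2 each
studies$n  <- studies$df + 2
studies$vi <- 4 / studies$n + studies$d^2 / (2 * studies$n)

# Fixed-effect aggregate (use method = "REML" for a random-effects model instead)
summary(rma(yi = d, vi = vi, data = studies, method = "FE"))
```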
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.