Replication of “The Illusion of Moral Decline” by Mastroianni & Gilbert (2023, DOI)
Author
Fisher Anderson
Published
December 7, 2025
Introduction
My research interests lie in Symbolic Systems, a field concerned mainly with the brain and its ever-growing similarities with computers. Specifically, I have steered this path towards Human-Centered AI (HAI), with frequent detours into the study of human consciousness. When searching for a study to replicate, I found none on consciousness that was both relevant to my research and feasible to run. Most work on consciousness has been published as reports or commentaries, and the few empirical research articles that exist require large-scale equipment such as fMRI. This article by Mastroianni and Gilbert connects to my field through its examination of how the mind perceives morality, a hot topic in HAI. It is also practically feasible to replicate directly using Prolific.
This study consists of five major segments, all of which use survey data to draw conclusions about what people think about morality. The first three segments show that people claim morality has declined when they are explicitly asked to assess moral change over a variety of time spans. The fourth segment shows that people do not, in fact, perceive a decline when asked to assess the morality of their own contemporaries. Lastly, the fifth segment shows that the perception of moral decline is also absent when people are asked about their own personal worlds (i.e., friends and family). I will be replicating Study 2c, which asks participants to rate morality today, around the time they were 20 years old, and around the time they were born.
To conduct this experiment, I need a comparable sample of US participants to survey about morality and whether it has declined around them over varying spans of time. Prolific's screeners make this straightforward, and the survey itself includes an additional screener testing knowledge of US culture.
Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
– Study 2c reported three significant effects, with effect sizes of −0.72, −1.08, and −0.37.
– The original authors collected data from 484 respondents, about 50 in each of the 10 age bins, and were left with 347 participants after exclusions. Reaching 80% power would require only 60 participants. Because recruitment on Prolific is easy and I wanted higher power, I am recruiting 100 participants (rounded up from the 97 required) to reach 95% power, as shown below.
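The sample sizes above appear to be consistent with a two-sided paired t-test at α = .05, powered for the smallest of the three effects (|d| = 0.37). The base-R sketch below shows that style of calculation. Treating each contrast as a paired t-test on standardized differences is an assumption, not necessarily the procedure used to produce the quoted numbers.

```r
# Sketch: sample size needed to detect the smallest Study 2c effect
# (|d| = 0.37), ASSUMING a two-sided paired t-test at alpha = .05.
# With sd = 1, delta is read as a standardized effect size (Cohen's d).
smallest_d <- 0.37
target_power <- c(0.80, 0.90, 0.95)

required_n <- sapply(target_power, function(pw) {
  res <- power.t.test(delta = smallest_d, sd = 1, sig.level = 0.05,
                      power = pw, type = "paired",
                      alternative = "two.sided")
  ceiling(res$n)  # round up to whole participants (pairs of ratings)
})

data.frame(power = target_power, n = required_n)
```

Under these assumptions, the 80% and 95% rows should land close to the 60 and 97 participants quoted above. The 90% row fills in the middle target that the power-analysis prompt asks for.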
The only major screener is that participants must correctly complete a three-item test of English proficiency and knowledge of US American culture (for instance, knowing that a “bell bottom” is not a type of footwear). Additionally, participants are evenly distributed across 10 age brackets spanning ages 18 to 69, in roughly 5-year bins.
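One way to express the ten age bins in code is base R's `cut()`. The bin edges below (18–24, then 5-year bins up to 65–69) are an assumption. They are inferred from the "18 to 69, roughly 5-year bins" description and from the original procedure's separate handling of participants under 25. The exact boundaries used in the Prolific quotas may differ, and `assign_age_bin` is a hypothetical helper name.

```r
# Hypothetical age-bin edges: 18-24, 25-29, ..., 65-69 (10 bins in total).
bin_edges  <- c(18, seq(25, 70, by = 5))   # 18, 25, 30, ..., 70
bin_labels <- paste(head(bin_edges, -1), tail(bin_edges, -1) - 1, sep = "-")

assign_age_bin <- function(age) {
  # right = FALSE closes bins on the left: [18, 25), [25, 30), ..., [65, 70).
  # Ages outside 18-69 fall outside every bin and return NA.
  cut(age, breaks = bin_edges, labels = bin_labels, right = FALSE)
}

assign_age_bin(c(18, 24, 25, 33, 69))
```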
Participants were excluded if they met any of the following criteria: they answered any of the 3 questions on the English proficiency and cultural screener incorrectly; their exact age, reported at the end of the study, did not match the age bin they selected at the beginning; they failed a built-in consistency check about perceived morality in the year they were born; or they failed an attention check asking them to select “other” and type “apple” manually.
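The four exclusion rules can be sketched as a single logical filter. Every column name here is a hypothetical placeholder for whatever the Qualtrics export actually contains, and the small data frame is made up purely for illustration. Trimming whitespace and ignoring case in the attention-check text is also an assumption rather than a documented rule.

```r
# Toy data: column names and values are HYPOTHETICAL, for illustration only.
responses <- data.frame(
  participant      = 1:5,
  screener_correct = c(3, 3, 2, 3, 3),        # items correct out of 3
  age              = c(31, 45, 22, 50, 40),   # exact age reported at the end
  bin_lower        = c(30, 45, 18, 50, 35),   # age bin selected at the start
  bin_upper        = c(34, 49, 24, 54, 39),
  rating_today     = c(3, 4, 5, 2, 4),
  rating_born      = c(5, 4, 5, 4, 6),
  consistency_ans  = c("less", "equal", "equal", "more", "less"),
  attn_choice      = rep("other", 5),
  attn_text        = c("apple", "Apple ", "apple", "apple", "apple")
)

# Direction implied by the participant's own ratings (today vs. year born);
# the consistency check passes if their recalled answer matches it.
actual_direction <- with(responses, ifelse(
  rating_today > rating_born, "more",
  ifelse(rating_today < rating_born, "less", "equal")))

keep <- with(responses,
  screener_correct == 3 &                    # all 3 screener items correct
  age >= bin_lower & age <= bin_upper &      # exact age matches chosen bin
  consistency_ans == actual_direction &      # consistency check passed
  attn_choice == "other" &                   # attention check: chose "other"
  tolower(trimws(attn_text)) == "apple"      # ...and typed "apple" (assumed lenient)
)

clean <- responses[keep, ]
clean$participant  # participants 1 and 2 are kept
```

In this toy example, participant 3 fails the screener, participant 4 fails the consistency check, and participant 5's reported age does not match the selected bin.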
Planned sample size and/or termination rule, sampling frame, known demographics if any, preselection rules if any.
Materials
All materials - can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.
– How do I link to the original working document with screenshots of the Qualtrics flow?
Procedure
Can quote directly from original article - just put the text in quotations and note that this was followed precisely. Or, quote directly and just point out exceptions to what was described in the original article.
– Here is the direct procedure as quoted in the paper: “Study 2c was conducted in 2020. Participants responded to an advertisement for a study on Amazon Mechanical Turk. After providing informed consent, participants reported how “kind, honest, nice and good” people are today. They then reported how “kind, honest, nice and good” people were when they (the participants) were about 20 years old, and at about the time they (the participants) were born. This was done by adjusting the wording of the subsequent questions on the basis of the participant’s age. For example, if the participant was between 30 and 34 years old, they were asked “How kind, honest, nice, and good were people about ten years ago?” and then “How kind, honest, nice, and good were people about 30 years ago?” If participants were under 25 years, they answered only the questions for today and when they were born. All questions were answered using a seven-point Likert scale with endpoints labelled ‘not very’ and ‘very’. As in previous studies, participants were then given a consistency check that required them to remember whether they had rated people today as more, equally or less moral compared to people in the year they were born. Participants then answered some further exploratory and demographic questions. Embedded among them was an attention check that required participants to select the option ‘other’ and type the word ‘apple’. Finally, participants were compensated and dismissed.”
– The only differences are that the study was conducted in 2025 and that participants were recruited through Prolific rather than MTurk.
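The age-dependent question wording in the quoted procedure can be sketched as a small function. This sketch assumes the wording is keyed to the lower bound of the participant's age bin, which reproduces the paper's example: a 30–34-year-old is asked about 10 and 30 years ago. The function name `build_questions` and the exact phrasing of the "today" question are illustrative, not taken from the original Qualtrics survey.

```r
# Hypothetical helper: builds the question stems shown to one participant,
# ASSUMING the wording is keyed to the lower bound of their age bin.
build_questions <- function(bin_lower) {
  stem <- "How kind, honest, nice, and good"
  questions <- c(today = paste0(stem, " are people today?"))
  # Participants under 25 skip the "about 20 years old" question.
  if (bin_lower >= 25) {
    questions["twenty"] <- sprintf("%s were people about %d years ago?",
                                   stem, as.integer(bin_lower - 20))
  }
  # "Year born" question: about as many years ago as the participant's age.
  questions["born"] <- sprintf("%s were people about %d years ago?",
                               stem, as.integer(bin_lower))
  questions
}

build_questions(30)  # today, ~10 years ago, ~30 years ago
build_questions(18)  # under 25: today and ~18 years ago only
```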
Analysis Plan
Can also quote directly, though it is less often spelled out effectively for an analysis strategy section. The key is to report an analysis strategy that is as close to the original - data cleaning rules, data exclusion rules, covariates, etc. - as possible.
– Here is the direct analysis as quoted in the study: “To analyse the data, we fit a linear mixed effects model using the lme4 package in R, extracted P values using the lmerTest package and calculated planned contrasts using the emmeans package, using a Holm–Bonferroni correction for multiple comparisons. The outcome was participants’ ratings and the predictor was the year of those ratings (one factor with three levels: today, the year the participant turned 20, the year the participant was born). The model included a fixed effect of the year of each rating and a random intercept for each participant. For this and all models, we checked model assumptions by plotting the outcome variable, residuals and fitted values. All tests we report are two-tailed.”
Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.
– We fit a linear mixed effects model with a random intercept for each participant, then computed planned contrasts between each pair of time points, using the Holm–Bonferroni correction for multiple comparisons.
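A minimal sketch of this model and the Holm-corrected pairwise contrasts, using the lme4/lmerTest and emmeans packages named in the quoted analysis. The data are simulated, not real responses. The column names `participant`, `time`, and `rating` mirror those in the confirmatory analysis, and the simulated means and variances are arbitrary assumptions.

```r
library(lmerTest)  # attaches lme4; lmer() then reports Satterthwaite p-values
library(emmeans)

set.seed(1)
n_participants <- 30
times <- c("today", "twenty", "born")

# Simulated long-format data: one row per participant x time point.
sim <- expand.grid(participant = factor(seq_len(n_participants)),
                   time = factor(times, levels = times))
person_effect <- rnorm(n_participants, sd = 0.8)          # random intercepts
time_means <- c(today = 4.0, twenty = 4.5, born = 5.0)    # arbitrary means
sim$rating <- unname(time_means[as.character(sim$time)]) +
  person_effect[as.integer(sim$participant)] +
  rnorm(nrow(sim), sd = 1)

# Fixed effect of time, random intercept per participant.
mod <- lmer(rating ~ time + (1 | participant), data = sim)
summary(mod)

# All pairwise contrasts between time points, Holm-Bonferroni adjusted.
time_contrasts <- pairs(emmeans(mod, ~ time), adjust = "holm")
time_contrasts
```

With three time points, `pairs()` yields three contrasts, and the `adjust = "holm"` argument applies the correction across them.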
Differences from Original Study
Explicitly describe known differences in sample, setting, procedure, and analysis plan from original study. The goal, of course, is to minimize those differences, but differences will inevitably occur. Also, note whether such differences are anticipated to make a difference based on claims in the original article or subsequent published research on the conditions for obtaining the effect.
– A major consideration is the inherent difference between the two platforms, Prolific (used here) and Amazon Mechanical Turk (used in the original). Each draws on a different participant population, and these differences may or may not affect the result.
– My quotas were built into Prolific, whereas the original authors managed their quotas manually on MTurk.
– I also rejected exceptionally fast responses.
Methods Addendum (Post Data Collection)
You can comment this section out prior to final report with data collection.
Actual Sample
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Differences from pre-data collection methods plan
Any differences from what was described as the original plan, or “none”.
Results
Data preparation
Data preparation following the analysis plan.
Confirmatory analysis
The analyses as specified in the analysis plan.
Side-by-side graph with original graph is ideal here
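#####model#####
library(lmerTest)  # lmer() with Satterthwaite t-tests, as in the output below
# Make "twenty" the reference level, so the fixed effects compare today and born against it
good_melt$time <- factor(good_melt$time, levels = c("twenty", "today", "born"))
good_mod <- lmer(rating ~ time + (1 | participant), data = good_melt) #still having errors here, but also in Pilot A
summary(good_mod)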
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: rating ~ time + (1 | participant)
Data: good_melt
REML criterion at convergence: 195.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.14615 -0.42903 -0.02105 0.31119 2.53613
Random effects:
Groups Name Variance Std.Dev.
participant (Intercept) 0.7145 0.8453
Residual 1.2353 1.1115
Number of obs: 58, groups: participant, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.5492 0.3259 46.3056 13.960 <2e-16 ***
timetoday -0.6492 0.3636 36.7965 -1.785 0.0825 .
timeborn 0.4008 0.3636 36.7965 1.102 0.2775
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) timtdy
timetoday -0.595
timeborn -0.595 0.533
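####plot####
library(ggplot2)  # mean_cl_boot also requires the Hmisc package
# Order the x-axis chronologically: year born, age ~20, today
good_melt$time <- factor(good_melt$time, levels = c("born", "twenty", "today"))
plot <- ggplot(good_melt, aes(x = time, y = rating)) +
  stat_summary(fun.data = "mean_cl_boot")  # mean with bootstrapped 95% CI
#ggplot_build(plot) #unnecessary?
plot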
Warning: Removed 35 rows containing non-finite outside the scale range (`stat_summary()`).
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Replication Attempt
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
// I used Prolific; the original authors used MTurk. This is potentially a big difference. // The only other changes I made to the Qualtrics survey were to the consent email and the debriefing note.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.