Replication of ‘Does poverty promote a different and harmful way of thinking? The links between economic scarcity, concrete construal level and risk behaviors’ experiment 2 by Caballero et al. (2021, Current Psychology)

Author

Morgan Tompkins

Published

December 8, 2025

Introduction

Justification

Caballero et al. (2021) is a social psychology paper that experimentally manipulates perceived economic scarcity to test how it shifts people’s construal level, that is, their tendency to think in concrete versus abstract terms. The paper, titled ‘Does poverty promote a different and harmful way of thinking? The links between economic scarcity, concrete construal level and risk behaviors’, argues that economic scarcity influences an individual’s construal level and could subsequently impair decision making for people in low-income settings. Caballero et al. (2021) builds on Construal Level Theory (Trope & Liberman, 2003), which proposes that psychological distance influences whether people think concretely or abstractly. Specifically, Study 2 of the paper addresses the following research question:

Does feeling economically scarce lower people’s construal level (i.e., make them think more concretely), compared to feeling economically sufficient?

To address this question, participants were randomly assigned to a “scarcity” or “non-scarcity” condition within a fictional society called Bimboola. Researchers measured how this manipulation changed participants’ thinking styles using the Behavioral Identification Form (BIF) as a pre- and post-test. They found that those in the scarcity condition thought more concretely about actions and goals, which the authors interpret as evidence that perceived financial limitation narrows one’s cognitive focus to the immediate present.

Original Authors: Pilar Carrera (pilar.carrera@uam.es); Amparo Caballero (amparo.caballero@uam.es); Itziar Fernández (ifernandez@psi.uned.es); Pilar Aguilar (mpaguilar@uloyola.es); Dolores Muñoz (lola.munnoz@uam.es)

Stimuli & Procedures

This replication will follow Caballero et al. (2021) Study 2. The original sample included 120 undergraduate participants (102 women) who were randomly assigned to a scarcity or non-scarcity condition. All data were collected individually in lab cubicles via Qualtrics.

The procedure consisted of:

  1. BIF (Pre-test): 12 items from the Behavioral Identification Form, where each action (e.g., “locking a door”) is described at two levels: a concrete “how” description (“putting the key in the lock”) and an abstract “why” description (“securing the house”). Participants select which feels more natural.

  2. Economic-scarcity manipulation (fictional society): Participants imagined living in a fictional society (Bimboola) organized by five income groups.

    • Scarcity condition: assigned to the lowest income tier (less than 400 Bimboolean € per month) and required to choose items such as housing, transportation, and leisure from unattractive, low-resource options while viewing the better alternatives available to wealthier groups.
    • Non-scarcity condition: assigned to a comfortable middle-income tier (1,201–3,000 Bimboolean € per month) and made similar choices among adequate options.

  3. BIF (Post-test): A second 12-item subset of the BIF (non-overlapping with the pre-test) measured change in abstraction.

  4. Manipulation checks: “My group is poor” and “My group is rich” on 7-point scales, plus an estimated group-income question.

  5. Basic demographic collection & debrief.

Methods

Power Analysis

The original study found a medium effect of f = 0.33 (ANOVA, F(1,118) = 13.18, p < .001) with N = 120. Using this value, I conducted an a priori power analysis (α = .05, two-tailed) targeting 80 percent power to detect an effect of this magnitude in a simple two-condition design. This analysis indicated that a minimum of approximately 70 participants would be required. I rounded to 72 to fit Prolific’s block constraints and ensure clean counterbalancing. This sample allows me to adequately detect an effect comparable to the original while avoiding unnecessary oversampling.
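
The effect-size conversion and the power calculation can be sketched in base R. This is a minimal sketch under stated assumptions, not the procedure actually used for the figure above: it treats the design as a one-way between-subjects ANOVA with two groups, and the helper names `f_from_F` and `anova_power` are hypothetical. Because the software and exact design behind the reported estimate are not stated, the sample size this sketch returns may differ from it.

```r
# Cohen's f from a reported F statistic: f^2 = F * df1 / df2
f_from_F <- function(F_value, df1, df2) sqrt(F_value * df1 / df2)

# Power of a one-way between-subjects ANOVA with k groups and total N,
# using the noncentral F distribution with noncentrality f^2 * N
# (assumed design; hypothetical helper)
anova_power <- function(N, f, k = 2, alpha = 0.05) {
  df1 <- k - 1
  df2 <- N - k
  crit <- qf(1 - alpha, df1, df2)
  pf(crit, df1, df2, ncp = f^2 * N, lower.tail = FALSE)
}

# Original result: F(1, 118) = 13.18
f_orig <- f_from_F(13.18, df1 = 1, df2 = 118)
round(f_orig, 2)  # 0.33

# Smallest even total N on this grid that reaches 80% power
candidate_N <- seq(40, 120, by = 2)
power_by_N <- sapply(candidate_N, anova_power, f = f_orig)
min_N <- candidate_N[which(power_by_N >= 0.80)[1]]
min_N
```

Searching over even totals keeps the two conditions the same size, which matches the counterbalancing concern mentioned above.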

Sample

Participants were recruited to reach the planned sample size of n = 72, based on a power analysis targeting 80% power to detect the original effect size (f = 0.33) reported by Caballero et al. (2021). The study uses a questionnaire format and corresponds to Experiment 2 in the original paper. Data collection ended once the planned sample size was reached. There are no preregistered materials or open data associated with the original study. The original sample consisted of n = 120 participants. The sampling frame for the current replication follows the same general structure, without any additional preselection criteria or known demographic quotas. Estimated participation time is 15 minutes, with compensation set at $2.00 per person (equivalent to $8.00 per hour), for a total projected cost of $191.52.

Materials

Economic Scarcity Manipulation (Bimboola Procedure)

Economic scarcity was manipulated using an adaptation of the Bimboola paradigm (Jetten et al., 2015). Participants were assigned to one of two economic conditions:

Scarcity: participants were placed in the lowest income group, described as living on the poverty line and having difficulty living comfortably.

Nonscarcity (Economic Sufficiency): participants were placed in the third income group, described as having enough money to live comfortably.

In both conditions, participants imagined themselves living in Bimboola and selected a house, vehicle, phone, and leisure activity from sets of three items associated with each income group. All groups’ options were visible during the choice process.

Behavioral Identification Form (BIF; Vallacher & Wegner, 1989)

Construal level was measured using the BIF, one of the most widely used tools for assessing abstraction. The original BIF includes 25 items; Study 2 used two sets of 12 items each. Participants first completed 12 randomly selected items before the economic manipulation, and then completed 12 additional items afterward (only one item from the original scale was not used at any point).

Each item presents an action with two descriptions: one abstract (scored 1) and one concrete (scored 0). For example, “locking a door” could be described as “securing the house” (abstract) or “putting the key in the lock” (concrete). The total number of abstract choices indexes higher-level construal.

Procedure

“Participants responded to the online survey in separate cubicles. First, they completed twelve items randomly chosen from the BIF (Vallacher & Wegner, 1989). The original version of the BIF (Vallacher & Wegner, 1989) includes 25 items. In this scale, participants are presented with different actions and must choose between two options for each action. One option describes the action in concrete terms, while the other option describes the action in abstract terms. For example, participants must choose whether “locking a door” is best described as “securing the house” (abstract level; scored as 1) or “putting the key in the lock” (concrete level; scored as 0). The number of abstract descriptions selected serves as a measure of abstraction: higher scores indicate higher abstraction. Then, participants imagined themselves living in Bimboola. The same manipulation used in Study 1 was adopted, with a slight variation in the amounts of money: the wealthiest group earned more than 100,000 Bimboolean € per month; the second group earned between 3001 and 100,000 Bimboolean € per month; the third group earned between 1201 and 3000 Bimboolean € per month; the fourth group earned between 400 and 1200 Bimboolean € per month; and the fifth group earned less than 400 Bimboolean € per month (on the poverty line). The procedure was identical to that in Study 1; participants had to choose a house, a vehicle, a phone, and a leisure activity from a group of three items associated with each economic group. Participants observed the items of all groups when making the selection. The participants in the scarcity condition were assigned to the fifth group (on the poverty line); the participants in the nonscarcity condition were assigned to the third group (enough money to live comfortably). After the Bimboola manipulation, participants completed the second part of the BIF scale, with twelve new items (only one randomly selected item from the original scale was not used).
Finally, they provided demographic information and responded to the three manipulation check items used in Study 1.”

The procedures of this study were followed with the following exceptions:

  1. Income groups were re-adjusted to match a U.S. rather than Spanish context. The new ranges approximate U.S. household income quintiles from Household Income data. They were then converted to monthly figures and reflect meaningful distinctions in purchasing power and living standards parallel to the Spanish brackets. Group 1 corresponds to income levels below the U.S. poverty threshold; Groups 2 and 3 map onto lower- and middle-income households; Group 4 reflects upper-middle-income earners; and Group 5 captures high-income households without inflating the top category unrealistically. This update keeps the relative spacing of the original categories while adapting to US cultural contexts. The revised income brackets are as follows:

    • Group 1 earns less than 1,500 $ Bimbolianos (DB) per month (they are below the poverty line).
    • Group 2 earns between 1,500 and 4,000 DB per month.
    • Group 3 earns between 4,001 and 8,000 DB per month.
    • Group 4 earns between 8,001 and 20,000 DB per month.
    • Group 5 earns more than 20,000 DB per month.

Images of housing, phone, transportation, and vacation options were also adjusted to reflect a U.S. context, borrowing images from Zillow, CarMax, and Google where applicable.

Analysis Plan

All analyses will follow procedures as closely aligned with the original article as possible. Data will first be inspected for incomplete responses, with participants excluded only if they did not finish the BIF measures or the manipulation checks, consistent with the original study’s practice of analyzing all voluntary participants who completed the survey. No additional exclusion rules or covariates were used in the original paper, and none will be introduced. Response text for BIF items will be cleaned by lowercasing and trimming whitespace to ensure accurate matching with the scoring dictionary.

A standardized dictionary will be constructed for all 25 Behavioral Identification Form items, detailing each item’s concrete and abstract response options. Participants’ BIF responses will then be reshaped into long format and coded as abstract (1) or concrete (0). For each participant, pre-test abstraction scores will be calculated using items 1–12, and post-test abstraction scores will be calculated using items 13–24, so that each score is based on 12 items. Condition-level means and standard deviations will be summarized in tables, accompanied by visualizations illustrating changes in abstraction from pre- to post-manipulation across scarcity and nonscarcity conditions. An additional item-level table will report the percentage of abstract selections for each BIF item by condition to mirror the interpretive granularity of the original study.

Key reproducibility criteria

The manipulation checks described in the original results will be reproduced through two between-subjects ANOVAs testing whether participants in the scarcity condition rated their assigned group as poorer and less rich than participants in the nonscarcity condition. These tests verify that the Bimboola manipulation successfully induced the intended subjective perceptions of scarcity.

The primary confirmatory analysis is a mixed-design ANOVA with time (pre-test vs. post-test BIF abstraction score) as a within-subject factor and condition (scarcity vs. nonscarcity) as a between-subject factor. The condition × time interaction is the critical test of the hypothesis, as the original study reported that scarcity decreased abstraction from pre to post while nonscarcity increased abstraction. Main effects of time and condition will also be reported, although they were not the focus of the original findings. To verify baseline equivalence, pre-test abstraction scores will be compared across conditions to confirm the absence of initial group differences prior to the manipulation.
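
In a design with two groups and two time points, the condition × time interaction is generally equivalent to a between-groups comparison of post-minus-pre change scores, which can make the critical test easier to inspect. The sketch below illustrates that logic, together with a baseline-equivalence check, on simulated data. The data frame `sim` and all of its values are invented for illustration and are not the study data.

```r
set.seed(1)  # simulated data for illustration only, not the study data

n_per_group <- 30
sim <- data.frame(
  condition = rep(c("scarcity", "nonscarcity"), each = n_per_group),
  pre = rbinom(2 * n_per_group, size = 12, prob = 0.55)
)
# Post-test scores: pre-test plus noise, kept inside the 0-12 score range
noise <- round(rnorm(nrow(sim), mean = 0, sd = 1.5))
sim$post <- pmin(12, pmax(0, sim$pre + noise))
sim$change <- sim$post - sim$pre

# Baseline equivalence: do the groups differ before the manipulation?
baseline_test <- t.test(pre ~ condition, data = sim)

# Pooled-variance t-test on change scores; its squared t statistic equals
# the F statistic from a one-way ANOVA on the same change scores
change_t <- t.test(change ~ condition, data = sim, var.equal = TRUE)
change_F <- anova(lm(change ~ condition, data = sim))[["F value"]][1]

c(t_squared = unname(change_t$statistic)^2, F = change_F)
```

A change-score comparison of this kind is a convenient cross-check on the interaction term; it does not replace the mixed ANOVA described above.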

This analytic strategy directly reproduces the original study’s scoring procedures, exclusion rules, manipulation checks, and inferential tests.

Differences from Original Study

The primary differences between the original study and the planned replication involve the sample, setting, and mode of data collection. The original research was conducted in person with Spanish undergraduates, whereas the replication uses American adults recruited online through Prolific. These differences were discussed with the original authors, and all adaptations were jointly agreed upon. The original materials were translated and reformatted for online administration, and several stimuli were adjusted to fit a U.S. economic and cultural context while preserving the structure of the original paradigm. Specifically, the income brackets in the fictional Bimboola society were redesigned to parallel U.S. household income quintiles. These updated values, converted to monthly income, were selected to create meaningful distinctions in purchasing power comparable to the Spanish categories: Group 1 aligns with income levels below the U.S. poverty threshold; Groups 2 and 3 correspond to lower- and middle-income households; Group 4 reflects upper–middle-income earners; and Group 5 captures high-income households without unrealistically inflating the top tier. Visual stimuli were updated accordingly. House photographs were selected from Zillow to reflect approximately 33–55 percent of each bracket’s monthly income. Phone options were chosen to mirror realistic purchasing behavior within each bracket, from flagship models (Group 5) to budget smartphones (Group 3), with Groups 1 and 2 retaining the original non-smartphone options still available on the U.S. market. Vehicles were matched to what households in each income tier could reasonably afford based on Kelley Blue Book and Consumer Reports cost estimates. Aside from these culturally necessary adjustments and the transition to an online format, no substantive procedural changes were made. 
Given that the BIF measure and the Bimboola manipulation have been validated across diverse populations and contexts, these differences are not expected to meaningfully alter the pattern of findings. The replication retains the paradigm’s internal logic while ensuring relevance for a contemporary U.S. sample.

Methods Addendum (Post Data Collection)

Actual Sample

As planned, 72 participants were recruited through Prolific. However, three participants failed the attention check administered after the simulation, which asked respondents to identify their assigned income group in the fictional society. These three participants were excluded, leaving a final sample of n = 69.

Differences from pre-data collection methods plan

None, apart from the attention-check exclusion described above.

Results

Data preparation

Load Relevant Libraries and Functions

# Packages inferred from the functions called in this script
# (the original loading chunk is not shown in this public version)
library(tidyverse)   # dplyr, tidyr, stringr, ggplot2, and the %>% pipe
library(ez)          # ezANOVA()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(sjPlot)      # tab_model()

PUBLIC ANALYSIS SCRIPT (ANONYMIZED)

NOTE: Raw data import, filtering, and Study ID exclusion steps have been removed for privacy. The analyses below begin from the anonymized dataset exported in the private preprocessing script, which includes only removing identifying information from participants and excluding test surveys.

Read in clean data

df_clean <- read.csv("full_data-anon.csv")
#head(df_clean)

Prepare data for analysis - create columns etc.

Create BIF dictionary

# The standard 25 BIF items. 
bif_item_descriptions <- data.frame(
  BIF_Item = paste0("BIF_", 1:25), 
  
  Item_Topic = c(
    "Organization", "Knowledge Acquisition", "Civic Duty/Defense", "Hygiene", "Nutrition/Consumption",
    "Resource Acquisition", "Home Improvement", "Housekeeping", "Painting", "Housing Maintenance",
    "Gardening", "Security", "Civic Duty", "Recreation", "Self-Reflection",
    "Dental Health", "Assessment", "Social Interaction", "Self-Control", "Eating/Physiology",
    "Gardening/Exercise", "Transportation", "Health Appointment", "Child Interaction", "Social Contact"
  ),
  
  # Exact text of the CONCRETE (0) choice for each item, as matched against survey responses
  Concrete_Choice = c(
    "Writing things down",                   # BIF 1: Writing things down (b)
    "Following lines of print",             # BIF 2: Following lines of print (a)
    "Signing up",                           # BIF 3: Signing up (b)
    "Putting clothes into the machine",     # BIF 4: Putting clothes into the machine (b)
    "Pulling an apple off a branch",        # BIF 5: Pulling an apple off a branch (b)
    "Wielding an axe",                      # BIF 6: Wielding an axe (a)
    "Using a yardstick",                    # BIF 7: Using a yardstick (b)
    "Vacuuming the floor",                  # BIF 8: Vacuuming the floor (b)
    "Applying brush strokes",               # BIF 9: Applying brush strokes (a)
    "Writing a check",                      # BIF 10: Writing a check (b)
    "Watering plants",                      # BIF 11: Watering plants (a)
    "Putting a key in the lock",            # BIF 12: Putting a key in the lock (a)
    "Marking a ballot",                     # BIF 13: Marking a ballot (b)
    "Holding on to branches",               # BIF 14: Holding on to branches (b)
    "Answering questions",                  # BIF 15: Answering questions (a)
    "Moving a brush around in one's mouth", # BIF 16: Moving a brush around in one's mouth (b)
    "Answering questions",                  # BIF 17: Answering questions (a)
    "Saying hellow",                         # BIF 18: Saying hello (a)
    "Saying \"no\"",                        # BIF 19: Saying "no" (a)
    "Chewing and swallowing",               # BIF 20: Chewing and swallowing (b)
    "Planting seeds",                       # BIF 21: Planting seeds (a)
    "Following a map",                      # BIF 22: Following a map (a)
    "Going to the dentist",                 # BIF 23: Going to the dentist (b)
    "Using simple words",                   # BIF 24: Using simple words (b)
    "Moving a finger"                       # BIF 25: Moving a finger (a)
  ),
  
  # Exact text of the ABSTRACT (1) choice for each item, as matched against survey responses
  Abstract_Choice = c(
    "Getting organized",                    # BIF 1: Getting organized (a)
    "Gaining knowledge",                    # BIF 2: Gaining knowledge (b)
    "Helping the Nation's defense",         # BIF 3: Helping the Nation's defense (a)
    "Removing odor from the clothes",          # BIF 4: Removing odors from clothes (a)
    "Getting something to eat",             # BIF 5: Getting something to eat (a)
    "Getting firewood",                     # BIF 6: Getting firewood (b)
    "Getting ready to remodel",             # BIF 7: Getting ready to remodel (a)
    "Showing one's cleanliness",            # BIF 8: Showing one's cleanliness (a)
    "Making the room look fresh",           # BIF 9: Making the room look fresh (b)
    "Maintaining a place to live",          # BIF 10: Maintaining a place to live (a)
    "Making the room look fresh",            # BIF 11: Making the room look nice (b)
    "Securing the house",                   # BIF 12: Securing the house (b)
    "Influencing the election",             # BIF 13: Influencing the election (a)
    "Getting a good view",                  # BIF 14: Getting a good view (a)
    "Revealing what you're like",           # BIF 15: Revealing what you're like (b)
    "Preventing tooth decay",               # BIF 16: Preventing tooth decay (a)
    "Showing one's knowledge",              # BIF 17: Showing one's knowledge (b)
    "Showing friendliness",                 # BIF 18: Showing friendliness (b)
    "Showing moral courage",                # BIF 19: Showing moral courage (b)
    "Getting nutrition",                    # BIF 20: Getting nutrition (a)
    "Getting fresh vegetables",             # BIF 21: Getting fresh vegetables (b)
    "Seeing countryside",                   # BIF 22: Seeing countryside (b)
    "Protecting your teeth",                # BIF 23: Protecting your teeth (a)
    "Teaching a child something",           # BIF 24: Teaching a child something (a)
    "Seeing if someone's home"              # BIF 25: Seeing if someone's home (b)
  ),
  stringsAsFactors = FALSE
) %>%
  mutate(
    Concrete_Choice = str_trim(tolower(Concrete_Choice)),
    Abstract_Choice = str_trim(tolower(Abstract_Choice))
  )

Confirmatory analysis

Step 0: Remove participants who didn’t pass attention check

df_manip_pass <- df_clean %>%
  filter(manip_pass == "TRUE")

# How many failed? 
nrow(df_clean) - nrow(df_manip_pass)
[1] 3

Step 1: Check manipulation checks

# descriptive
df_manip_pass %>%
  group_by(condition) %>%
  summarise(avg_rating_poor = mean(rating_poor),
            sd_rating_poor = sd(rating_poor),
            avg_rating_rich = mean(rating_rich),
            sd_rating_rich = sd(rating_rich))
# A tibble: 2 × 5
  condition   avg_rating_poor sd_rating_poor avg_rating_rich sd_rating_rich
  <chr>                 <dbl>          <dbl>           <dbl>          <dbl>
1 scarcity               6.75          0.622            1.25          0.916
2 sufficiency            2.89          1.26             2.95          1.22 
# ANOVA test - poor
summary(aov(df_manip_pass$rating_poor ~ df_manip_pass$condition))
                        Df Sum Sq Mean Sq F value Pr(>F)    
df_manip_pass$condition  1 255.42  255.42     246 <2e-16 ***
Residuals               67  69.57    1.04                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA test - rich
summary(aov(df_manip_pass$rating_rich ~ df_manip_pass$condition))
                        Df Sum Sq Mean Sq F value   Pr(>F)    
df_manip_pass$condition  1  49.35   49.35   41.39 1.54e-08 ***
Residuals               67  79.89    1.19                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
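
To put the two manipulation-check tests on a common scale, eta squared can be computed directly from the sums of squares printed in the ANOVA tables above. This is a minimal sketch; the helper name `eta_squared` is hypothetical, and the inputs are copied from the printed output rather than recomputed from the data.

```r
# Eta squared for a one-way ANOVA: effect sum of squares over total sum of squares
eta_squared <- function(ss_effect, ss_residual) {
  ss_effect / (ss_effect + ss_residual)
}

# Sums of squares copied from the two ANOVA tables above
eta_poor <- eta_squared(255.42, 69.57)
eta_rich <- eta_squared(49.35, 79.89)

round(c(poor = eta_poor, rich = eta_rich), 3)
#  poor  rich
# 0.786 0.382
```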

Step 2: Score participants’ BIF responses for variables BIF 1-24 (abstract vs. concrete)

# Select relevant data
bif_counts_df <- df_manip_pass %>%
  select(starts_with("BIF"), condition, participant_id) %>%
  
  pivot_longer(
    cols = starts_with("BIF"),
    names_to = "BIF_Item",
    values_to = "Choice" # response text; coded below as 1 = abstract, 0 = concrete
  ) %>%
  mutate(
    Choice = str_trim(tolower(Choice)), 
    condition = str_trim(tolower(condition))
  ) %>%
  
  left_join(bif_item_descriptions, by = "BIF_Item") %>%
  
  mutate(
    abstract = case_when(
     Choice == Abstract_Choice ~ 1, 
     Choice == Concrete_Choice ~ 0,
     TRUE ~ NA_real_ 
    )
  )

bif_counts_df  
# A tibble: 1,656 × 8
   condition participant_id BIF_Item Choice           Item_Topic Concrete_Choice
   <chr>              <int> <chr>    <chr>            <chr>      <chr>          
 1 scarcity               1 BIF_1    writing things … Organizat… writing things…
 2 scarcity               1 BIF_2    gaining knowled… Knowledge… following line…
 3 scarcity               1 BIF_3    signing up       Civic Dut… signing up     
 4 scarcity               1 BIF_4    putting clothes… Hygiene    putting clothe…
 5 scarcity               1 BIF_5    pulling an appl… Nutrition… pulling an app…
 6 scarcity               1 BIF_6    getting firewood Resource … wielding an axe
 7 scarcity               1 BIF_7    getting ready t… Home Impr… using a yardst…
 8 scarcity               1 BIF_8    vacuuming the f… Housekeep… vacuuming the …
 9 scarcity               1 BIF_9    making the room… Painting   applying brush…
10 scarcity               1 BIF_10   writing a check  Housing M… writing a check
# ℹ 1,646 more rows
# ℹ 2 more variables: Abstract_Choice <chr>, abstract <dbl>
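
Because responses that match neither dictionary entry are coded NA, and a later `sum(..., na.rm = TRUE)` treats an NA the same as a concrete choice, it may be worth listing unmatched responses before computing scores. The sketch below shows one way to do that on a toy data frame. `toy_scored` and its values are hypothetical and only mimic the column layout of `bif_counts_df`.

```r
# Toy long-format data with hypothetical values; `abstract` is NA when the
# response text matched neither dictionary entry
toy_scored <- data.frame(
  participant_id = c(1, 1, 2, 2, 3, 3),
  BIF_Item = c("BIF_1", "BIF_11", "BIF_1", "BIF_11", "BIF_1", "BIF_11"),
  Choice = c("getting organized", "making the room look nice",
             "writing things down", "watering plants",
             "getting organized", "making the room look nice"),
  abstract = c(1, NA, 0, 0, 1, NA),
  stringsAsFactors = FALSE
)

# Rows whose response text could not be scored
unmatched <- toy_scored[is.na(toy_scored$abstract), ]

# Count of unmatched responses per item, and the offending response texts
unmatched_by_item <- table(unmatched$BIF_Item)
unmatched_by_item
unique(unmatched$Choice)
```

Applied to the real data, an empty result would indicate that every response text matched one of the two dictionary entries for its item.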

Step 3: Compute simple pre-test BIF score (BIF 1-12) & post-test BIF score (BIF 13-24)

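bif_pre_post_df <- bif_counts_df %>%
  mutate(BIF_Index = as.numeric(str_replace(BIF_Item, "BIF_", ""))) %>%
  group_by(participant_id, condition) %>%
  summarise(
    # Pre-test = items 1-12, post-test = items 13-24 (12 items each, so both scores range 0-12).
    # Cutoffs of < 12 and >= 12 would split the items 11/13; re-render so printed output matches.
    pre_test_score = sum(abstract[BIF_Index <= 12], na.rm = TRUE),
    post_test_score = sum(abstract[BIF_Index >= 13 & BIF_Index <= 24], na.rm = TRUE),
    .groups = 'drop') 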
  
pre_post_summary_table <- bif_pre_post_df %>%
  group_by(condition) %>%
  summarise(
  respondent_count = n(),  
  pre_score_avg = round(mean(pre_test_score, na.rm = TRUE),1),
  pre_score_sd = round(sd(pre_test_score, na.rm = TRUE),1),
  post_score_avg = round(mean(post_test_score, na.rm = TRUE),1),
  post_score_sd = round(sd(post_test_score, na.rm = TRUE),1),
  .groups = 'drop'
  )

bif_summary_table_final <- pre_post_summary_table %>%
  # 1. Start with kable (mandatory for kableExtra)
  kable(
    caption = "Pre- and Post-Test BIF Abstraction Scores by Condition",
    align = "lcccrr",
    digits = 1
  ) %>%
  # 2. Add the styling (striped effect comes from 'bootstrap_options')
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE
  ) 

Step 4: Run the 2×2 ANOVA (within: time, between: condition)

# Reshape to long 
bif_long <- bif_pre_post_df %>%
  pivot_longer(
    cols = c(pre_test_score, post_test_score),
    names_to = "time",
    values_to = "BIF_score"
  ) %>%
  mutate(
    time        = factor(time,
                         levels = c("pre_test_score", "post_test_score"),
                         labels = c("pre", "post")),
    condition   = factor(condition),
    participant_id = factor(participant_id)
  )

# Mixed ANOVA
anova_results <- ezANOVA(
  data    = bif_long,
  dv      = .(BIF_score),
  wid     = .(participant_id),
  within  = .(time),
  between = .(condition),
  type    = 3,
  detailed = TRUE
)
Warning: Data is unbalanced (unequal N per group). Make sure you specified a
well-considered value for the type argument to ezANOVA().
anova_table <- anova_results$ANOVA %>%
  select(Effect, DFn, DFd, SSn, SSd, F, p, ges) %>%
  mutate(
    F   = round(F, 2),
    p   = signif(p, 3),
    ges = round(ges, 3),
    SSn = round(SSn, 0),
    SSd = round(SSd, 2)
  )

anova_kable <- anova_table %>%
  kable(
    caption = "Mixed ANOVA: BIF Score by Condition and Time",
    align   = "lrrrrrrr"
  ) %>%
  kable_styling(
    full_width       = FALSE,
    bootstrap_options = c("striped", "hover", "condensed")
  )

anova_kable
Mixed ANOVA: BIF Score by Condition and Time

Effect          DFn  DFd   SSn      SSd       F         p    ges
(Intercept)       1   67  6392  1599.36  267.77  0.00e+00  0.783
condition         1   67    26  1599.36    1.08  3.03e-01  0.014
time              1   67    73   166.91   29.33  9.00e-07  0.040
condition:time    1   67     1   166.91    0.25  6.19e-01  0.000

Step 4.1: Effect of condition at post

t.test(post_test_score ~ condition, data = bif_pre_post_df)

    Welch Two Sample t-test

data:  post_test_score by condition
t = -0.76632, df = 66.798, p-value = 0.4462
alternative hypothesis: true difference in means between group scarcity and group sufficiency is not equal to 0
95 percent confidence interval:
 -2.636619  1.173781
sample estimates:
   mean in group scarcity mean in group sufficiency 
                 7.187500                  7.918919 

Step 4.2: Effect of time within each condition

scarcity_df <- bif_pre_post_df %>% filter(condition == "scarcity")
 
t.test(scarcity_df$pre_test_score, scarcity_df$post_test_score, paired = TRUE)

    Paired t-test

data:  scarcity_df$pre_test_score and scarcity_df$post_test_score
t = -4.1583, df = 31, p-value = 0.0002348
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.375429 -0.812071
sample estimates:
mean difference 
       -1.59375 

Step 5: ANCOVA

ancova_model <- lm(post_test_score ~ condition + pre_test_score,
                   data = bif_pre_post_df)

anova(ancova_model)
Analysis of Variance Table

Response: post_test_score
               Df Sum Sq Mean Sq F value Pr(>F)    
condition       1   9.18    9.18   1.816 0.1824    
pre_test_score  1 728.00  728.00 144.012 <2e-16 ***
Residuals      66 333.64    5.06                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ancova_model)

Call:
lm(formula = post_test_score ~ condition + pre_test_score, data = bif_pre_post_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.3474 -1.5347  0.3503  1.4489  5.7183 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)            1.5018     0.6184   2.428   0.0179 *  
conditionsufficiency  -0.2859     0.5494  -0.520   0.6045    
pre_test_score         1.0164     0.0847  12.001   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.248 on 66 degrees of freedom
Multiple R-squared:  0.6884,    Adjusted R-squared:  0.679 
F-statistic: 72.91 on 2 and 66 DF,  p-value: < 2.2e-16
tab_model(
  lm(post_test_score ~ condition + pre_test_score, data = bif_pre_post_df),
  show.ci = FALSE,
  show.se = TRUE,
  digits = 3,
  dv.labels = "ANCOVA: Predicting Post-Test BIF Score",
  pred.labels = c("(Intercept)", "Condition (Sufficiency)", "Pre-Test BIF Score")
)
ANCOVA: Predicting Post-Test BIF Score

Predictors                Estimates  std. Error       p
(Intercept)                   1.502       0.618   0.018
Condition (Sufficiency)      -0.286       0.549   0.605
Pre-Test BIF Score            1.016       0.085  <0.001

Observations: 69
R2 / R2 adjusted: 0.688 / 0.679

Visual (The original paper did not include a graph)

# Shared dodge so bars, error bars, and labels align
pd <- position_dodge(width = 0.7)

pre_post_long_for_plot <- bif_pre_post_df %>%
  # wide → long
  pivot_longer(
    cols = c(pre_test_score, post_test_score),
    names_to = "Test_Phase",
    values_to = "Score"
  ) %>%
  mutate(
    Test_Phase = case_when(
      Test_Phase == "pre_test_score"  ~ "Pre-Test",
      Test_Phase == "post_test_score" ~ "Post-Test",
      TRUE ~ Test_Phase
    ),
    Test_Phase = factor(Test_Phase, levels = c("Pre-Test", "Post-Test")),
    condition  = stringr::str_to_title(condition)
  ) %>%
  group_by(condition, Test_Phase) %>%
  summarise(
    Mean_Score = mean(Score, na.rm = TRUE),
    SE_Score   = sd(Score, na.rm = TRUE) / sqrt(n()),
    .groups    = "drop"
  ) %>%
  mutate(
    SE_Score_fixed = ifelse(is.na(SE_Score), 0, SE_Score),
    label_y        = Mean_Score + SE_Score_fixed + 0.3
  )

pre_post_long_for_plot %>%
  ggplot(aes(x = Test_Phase, y = Mean_Score, fill = condition)) +
  geom_col(
    position = pd,
    color    = "white",
    width    = 0.6
  ) +
  geom_errorbar(
    aes(
      ymin = Mean_Score - SE_Score,
      ymax = Mean_Score + SE_Score
    ),
    position = pd,
    width    = 0.15,
    linewidth = 0.8,
    na.rm    = TRUE
  ) +
  geom_text(
    aes(
      y     = label_y,
      label = round(Mean_Score, 1)
    ),
    position = pd,
    vjust    = -0.1,
    size     = 4.5,
    fontface = "bold",
    color    = "black"
  ) +
  scale_fill_manual(values = c(
    "Scarcity"    = "#2C6CF5",
    "Sufficiency" = "#B0B0B0"
  )) +
  labs(
    title    = "Abstract BIF Score by Condition and Test Phase",
    subtitle = "Mean Abstract BIF Score (±1 SE, Max 12)",
    y        = NULL,
    x        = "Test Phase",
    fill     = "Condition"
  ) +
  scale_y_continuous(
    limits = c(0, NA),
    expand = expansion(mult = c(0, 0.08))
  ) +
  coord_cartesian(clip = "off") +
  theme_minimal(base_size = 16) +
  theme(
    text = element_text(family = "Mulish"),
    axis.text.y        = element_blank(),
    axis.title.y       = element_blank(),
    axis.ticks.y       = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    axis.line.x        = element_line(color = "black", linewidth = 0.5),
    plot.title         = element_text(face = "bold", size = 20),
    plot.subtitle      = element_text(size = 13),
    legend.position    = "bottom",
    plot.margin        = margin(t = 20, r = 10, b = 10, l = 10)
  )

Discussion

Summary of Replication Attempt

The primary confirmatory analysis tested whether the original interaction between economic condition (scarcity vs. sufficiency) and time (pre vs. post BIF score) would replicate. In the original study, scarcity decreased abstraction from pre- to post-test, whereas sufficiency increased abstraction, yielding a significant condition × time interaction (f = .33). In the replication, the manipulation checks again showed that participants distinguished the scarcity and sufficiency conditions as intended, confirming that the economic manipulation functioned. However, the critical mixed ANOVA revealed no evidence of the predicted interaction (F(1,67) = 0.25, p = .619, ges < .001). Participants did show a robust overall increase in abstract construal from pre to post (p < .0001), but this shift occurred regardless of condition, and the post-test means of the two groups did not significantly differ. The ANCOVA likewise found no effect of condition on post-test BIF scores when controlling for baseline abstraction (p = .60). Taken together, these results indicate that the replication failed to reproduce the central effect reported in the original article: the scarcity manipulation did not differentially lower abstraction relative to the sufficiency condition. While the manipulation checks replicated, the key theoretical effect did not.

Commentary

Several factors may help explain why the original effect did not replicate in the present study. First, the cultural context differed meaningfully: the original work was conducted with Spanish undergraduates, whereas the replication used a U.S.-based adult online sample. Construal level and perceptions of economic scarcity can vary across cultural settings, and the Bimboola paradigm may not evoke the same psychological resonance in a U.S. context. Second, the mode of administration shifted from an in-person laboratory environment to an online platform, where participants may engage less deeply with hypothetical scenarios. Third, although the stimuli were carefully adapted to U.S. economic realities, the scarcity manipulation may have been weaker or less emotionally vivid in this online format. In collaboration with the original authors, future online replications might consider strengthening the manipulation by requiring participants to construct a realistic monthly budget under their assigned income bracket, increasing immersion and perceived constraint. Finally, the replication used a smaller sample size due to course resource constraints, reducing power relative to the original and limiting the ability to detect subtler effects. Taken together, these cultural, methodological, and pragmatic differences provide reasonable possible explanations for why the central interaction did not reproduce.