1 Replication Study

1.1 Abstract

Risk presentation format refers to the way information about uncertain outcomes is communicated to decision-makers. In investment decisions, the same risky asset may be perceived differently depending on whether risk is described numerically or experienced through simulated outcomes. Kaufmann et al. (2013) found that participants allocated more to the risky fund when they used a risk tool that combined experience sampling and graphical displays than when they received a description only. Across the broader set of experiments, the authors reported that the risk tool increased risky allocation by approximately five to fifteen percentage points relative to ordinary descriptions, and that higher risky allocations were associated with lower risk perception, higher confidence, and more accurate estimates of the probability of loss. This replication plan proposes an online Qualtrics replication and extension of Experiment I from Kaufmann et al. (2013), conducted with an Indonesian sample. We will compare three conditions: a description condition, an original risk tool condition, and a path simulation extension condition. The replication component tests whether the original risk tool increases risky allocation relative to the description condition. The extension tests whether adding year-by-year wealth-path information changes risky allocation compared with the original endpoint/distribution-based risk tool. The Indonesian context is theoretically and practically relevant because investment participation, financial literacy, and digital investment adoption are growing rapidly in many emerging markets, while retail investors may still face substantial uncertainty in interpreting financial risk. The primary outcome is final allocation to the risky fund. Secondary outcomes include satisfaction after simulated payoff, subsequent hypothetical allocation, and mechanism/comprehension measures adapted from the target article.
The study is powered using a conservative effect size of d = 0.15, α = 0.05, and power = 0.80, requiring approximately 2,100 participants across the three experimental conditions.
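As a rough check on this sample size, the calculation can be approximated with base R's power.t.test. This is only a sketch: it assumes (the plan does not state this) that the power analysis targets a two-sided, two-sample t-test for a single pairwise comparison between arms, with the standardised effect size entered as delta and sd fixed at 1. The names n_per_arm and n_total are illustrative.

```r
# Sketch: per-arm sample size for one pairwise comparison between arms.
# Assumption: two-sided two-sample t-test; Cohen's d entered as delta with sd = 1.
pw <- power.t.test(delta = 0.15, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")

n_per_arm <- ceiling(pw$n)   # participants needed in each arm
n_total   <- 3 * n_per_arm   # three equally sized arms

print(c(per_arm = n_per_arm, total = n_total))
```

Under these assumptions the per-arm figure comes out just under 700, so three equal arms give a total close to the 2,100 stated above.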

2 Pre-analysis

2.1 Purpose of this pre-analysis

This report uses 1,000 synthetic completed responses, generated by an LLM, to test the analysis pipeline for the replication and extension experiment. The synthetic dataset is not intended to provide empirical evidence. Instead, it is used to verify that the survey design produces analysable variables, that the planned data cleaning steps work, and that the planned statistical models can be implemented before actual data collection.

The experiment contains three treatment arms:

Description condition.
Endpoint simulation or risk tool condition.
Path simulation condition.

The main outcome is the participant’s final allocation to the risky option. The extension-specific question is whether showing year-by-year investment paths changes allocation behaviour compared with showing endpoint simulation information.

2.2 Data preparation

The analysis-ready dataset was cleaned and standardised in the master script. Key constructed variables include:

final_total: final safe allocation plus final risky allocation.
subsequent_total: subsequent safe allocation plus subsequent risky allocation.
manual_luck: simulated portfolio outcome minus expected portfolio outcome.
risky_allocation_change: subsequent risky allocation minus final risky allocation.
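The constructed variables listed above can be sketched in base R as follows. The data frame toy is a hypothetical stand-in that reuses the first three rows shown in the glimpse output; the master script itself is not reproduced here, so the exact code may differ. The rule used for final_total_valid and subsequent_total_valid (total equals 100) is an assumption.

```r
# Hypothetical three-row stand-in for the cleaned survey export.
toy <- data.frame(
  manual_final_allocation_risky      = c(36, 53, 84),
  manual_final_allocation_safe       = c(64, 47, 16),
  manual_expected_portfolio_outcome  = c(133.8, 138.9, 148.2),
  manual_simulated_portfolio_outcome = c(156.53, 135.98, 91.05),
  manual_subsequent_allocation_risky = c(26, 43, 67),
  manual_subsequent_allocation_safe  = c(74, 57, 33)
)

# Totals: safe share plus risky share at each decision stage.
toy$final_total <- toy$manual_final_allocation_safe +
  toy$manual_final_allocation_risky
toy$subsequent_total <- toy$manual_subsequent_allocation_safe +
  toy$manual_subsequent_allocation_risky

# Luck: simulated outcome minus expected outcome.
toy$manual_luck <- toy$manual_simulated_portfolio_outcome -
  toy$manual_expected_portfolio_outcome

# Change: subsequent risky share minus final risky share.
toy$risky_allocation_change <- toy$manual_subsequent_allocation_risky -
  toy$manual_final_allocation_risky

# Assumed rule: an allocation is valid when the two shares sum to 100.
toy$final_total_valid      <- toy$final_total == 100
toy$subsequent_total_valid <- toy$subsequent_total == 100

print(toy[, c("final_total", "manual_luck", "risky_allocation_change")])
```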

glimpse(df)

## Rows: 1,000
## Columns: 23
## $ group_raw                          <chr> "tool", "pathway", "tool", "tool", …
## $ manual_final_allocation_risky      <dbl> 36, 53, 84, 86, 42, 84, 63, 60, 40,…
## $ manual_final_allocation_safe       <dbl> 64, 47, 16, 14, 58, 16, 37, 40, 60,…
## $ manual_expected_portfolio_outcome  <dbl> 133.8, 138.9, 148.2, 148.8, 135.6, …
## $ manual_simulated_portfolio_outcome <dbl> 156.53, 135.98, 91.05, 86.79, 119.7…
## $ manual_subsequent_allocation_risky <dbl> 26, 43, 67, 60, 40, 61, 69, 47, 22,…
## $ manual_subsequent_allocation_safe  <dbl> 74, 57, 33, 40, 60, 39, 31, 53, 78,…
## $ age                                <dbl> 32, 21, 29, 24, 41, 20, 29, 34, 22,…
## $ gender                             <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ education                          <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ income                             <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ stock_ownership                    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ perceived_volatility               <dbl> 6, 6, 6, 5, 7, 5, 5, 5, 3, 4, 5, 4,…
## $ satisfaction                       <dbl> 4, 4, 3, 3, 4, 4, 5, 1, 3, 4, 3, 3,…
## $ confidence                         <dbl> 4, 4, 5, 3, 4, 6, 4, 3, 4, 4, 4, 2,…
## $ comprehension                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ group                              <fct> tool, pathway, tool, tool, pathway,…
## $ final_total                        <dbl> 100, 100, 100, 100, 100, 100, 100, …
## $ subsequent_total                   <dbl> 100, 100, 100, 100, 100, 100, 100, …
## $ manual_luck                        <dbl> 22.73, -2.92, -57.15, -62.01, -15.8…
## $ final_total_valid                  <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ subsequent_total_valid             <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ risky_allocation_change            <dbl> -10, -10, -17, -26, -2, -23, 6, -13…

2.3 Data quality checks

The first step is to verify whether the synthetic survey data are complete and internally consistent.

quality_summary %>%
  gt() %>%
  fmt_number(
    columns = everything(),
    decimals = 2
  )

n	missing_group	missing_final_risky	missing_final_safe	missing_expected	missing_payoff	missing_subseq_risky	missing_subseq_safe	final_total_not_100	subsequent_total_not_100	final_risky_out_of_range	final_safe_out_of_range	subsequent_risky_out_of_range	subsequent_safe_out_of_range	missing_luck
1,000.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00

The treatment group balance is shown below:

group_balance %>%
  mutate(percent = percent(percent, accuracy = 0.1)) %>%
  gt()

group	n	percent
description	334	33.4%
tool	333	33.3%
pathway	333	33.3%

The key checks include:

whether allocation variables are missing,
whether allocation totals add up to 100, and
whether luck can be calculated.
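A minimal sketch of how such checks might be computed is shown below. It is not the master script: the toy data frame and its shortened column names (final_risky, final_safe, expected, payoff) are hypothetical, and the third row is deliberately invalid so that the counters are non-zero.

```r
# Hypothetical toy data; row 3 has a missing group, a total of 110,
# and a missing payoff.
toy <- data.frame(
  group       = c("description", "tool", NA),
  final_risky = c(40, 60, 70),
  final_safe  = c(60, 40, 40),
  expected    = c(135, 140, 143),
  payoff      = c(120, 150, NA)
)
toy$luck <- toy$payoff - toy$expected

# One-row summary: each column counts the rows that fail one check.
quality <- data.frame(
  n                        = nrow(toy),
  missing_group            = sum(is.na(toy$group)),
  missing_final_risky      = sum(is.na(toy$final_risky)),
  final_total_not_100      = sum(toy$final_risky + toy$final_safe != 100,
                                 na.rm = TRUE),
  final_risky_out_of_range = sum(toy$final_risky < 0 | toy$final_risky > 100,
                                 na.rm = TRUE),
  missing_luck             = sum(is.na(toy$luck))
)
print(quality)
```

In a clean dataset, as in the summary table above, every counter other than n is zero.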

2.4 Descriptive statistics

2.4.1 Outcome summary by treatment group

outcome_summary_by_group %>%
  gt() %>%
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  )

group	n	mean_final_risky	sd_final_risky	mean_final_safe	mean_expected	mean_payoff	mean_luck	sd_luck	mean_subsequent_risky	sd_subsequent_risky	mean_change_risky
description	334.00	43.08	15.64	56.92	135.92	134.24	−1.68	25.20	41.48	16.12	−1.60
tool	333.00	52.37	17.18	47.63	138.71	138.39	−0.32	29.93	48.43	15.96	−3.94
pathway	333.00	41.77	15.31	58.23	135.53	137.16	1.63	24.45	40.63	14.76	−1.14

2.4.2 Mechanism and comprehension analysis

mechanism_summary_by_group %>%
  gt() %>%
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  )

group	n	mean_perceived_volatility	sd_perceived_volatility	mean_satisfaction	sd_satisfaction	mean_confidence	sd_confidence	mean_comprehension	sd_comprehension
description	334.00	4.56	0.94	3.77	1.02	3.54	1.02	NA	NA
tool	333.00	4.61	0.94	3.95	1.12	4.26	0.97	NA	NA
pathway	333.00	5.42	0.89	3.77	1.09	3.74	1.03	NA	NA

2.5 Main treatment-effect analysis

The primary analysis estimates whether treatment assignment predicts final risky allocation.

The simple model is:

$ \text{FinalRiskyAllocation}_i = \alpha + \beta_1 \text{RiskTool}_i + \beta_2 \text{Pathway}_i + \varepsilon_i $

where the description condition is the reference group.

modelsummary(
  list(
    "Final risky allocation" = models$model_final_simple,
    "Final risky allocation + controls" = models$model_final_controlled
  ),
  stars = TRUE,
  statistic = "std.error"
)

	Final risky allocation	Final risky allocation + controls
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept)	43.081***	40.221***
	(0.879)	(2.365)
group_description_reftool	9.286***	9.285***
	(1.244)	(1.244)
group_description_refpathway	-1.306	-1.370
	(1.244)	(1.245)
age		0.101
		(0.078)
Num.Obs.	1000	1000
R2	0.079	0.081
R2 Adj.	0.078	0.078
AIC	8396.1	8396.4
BIC	8415.7	8420.9
Log.Lik.	-4194.041	-4193.190
RMSE	16.04	16.03

Because this analysis uses synthetic data, the coefficients should be interpreted only as a test of the analysis workflow.
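To make the simple model concrete, the sketch below fits it with lm on freshly simulated toy data. This is not the study's synthetic dataset; the arm means are assumptions chosen only to resemble the descriptive table, and the names toy, final_risky and arm_avg are illustrative.

```r
set.seed(1)  # toy data only; not the study's synthetic dataset
n_per_arm <- 50
toy <- data.frame(
  group = factor(rep(c("description", "tool", "pathway"), each = n_per_arm),
                 levels = c("description", "tool", "pathway"))
)

# Assumed arm means, for illustration only.
arm_means <- c(description = 43, tool = 52, pathway = 42)
toy$final_risky <- unname(arm_means[as.character(toy$group)]) +
  rnorm(nrow(toy), sd = 16)

# Description is the first factor level, so lm treats it as the reference group.
fit <- lm(final_risky ~ group, data = toy)
print(coef(fit))

# With a single categorical predictor, the intercept equals the reference-arm
# mean and each slope equals that arm's mean minus the reference-arm mean.
arm_avg <- tapply(toy$final_risky, toy$group, mean)
print(arm_avg)
```

The same logic links the regression table to the descriptive table: the intercept of 43.081 is the description-arm mean, and the tool coefficient of 9.286 is the gap between the tool and description means.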

2.6 Extension-specific analysis

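The extension compares the path simulation condition against the endpoint simulation/risk tool condition. This addresses the question of whether showing the year-by-year path produces different allocation behaviour than showing endpoint outcomes, within the simulated-experience conditions. The model uses the risk tool condition as the reference group, so the pathway coefficient is the direct path-versus-tool contrast.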

modelsummary(
  list(
    "Pathway vs tool" = models$model_path_vs_tool
  ),
  stars = TRUE,
  statistic = "std.error"
)

	Pathway vs tool
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept)	52.366***
	(0.880)
group_tool_refdescription	-9.286***
	(1.244)
group_tool_refpathway	-10.592***
	(1.245)
Num.Obs.	1000
R2	0.079
R2 Adj.	0.078
AIC	8396.1
BIC	8415.7
Log.Lik.	-4194.041
RMSE	16.04
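Releveling does not change the fitted model, only which contrasts are reported. In the tables above, the pathway-versus-tool coefficient equals the difference between the two coefficients from the description-referenced model: -1.306 - 9.286 = -10.592. The sketch below illustrates the same identity on simulated toy data; the variable name group_tool_ref is inferred from the coefficient labels, and everything else is hypothetical.

```r
set.seed(2)  # toy data only
toy <- data.frame(
  group = factor(rep(c("description", "tool", "pathway"), each = 40),
                 levels = c("description", "tool", "pathway")),
  final_risky = rnorm(120, mean = 45, sd = 16)
)

# Model with the description arm as the reference level.
fit_desc_ref <- lm(final_risky ~ group, data = toy)

# Same model with the risk tool arm as the reference level.
toy$group_tool_ref <- relevel(toy$group, ref = "tool")
fit_tool_ref <- lm(final_risky ~ group_tool_ref, data = toy)

# Pathway-versus-tool contrast, obtained two ways.
from_desc_ref <- coef(fit_desc_ref)[["grouppathway"]] -
  coef(fit_desc_ref)[["grouptool"]]
from_tool_ref <- coef(fit_tool_ref)[["group_tool_refpathway"]]

print(c(from_desc_ref = from_desc_ref, from_tool_ref = from_tool_ref))
```

The advantage of releveling over subtracting coefficients by hand is that the standard error and p-value of the contrast are reported directly.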

2.7 Subsequent allocation analysis

The next model examines whether treatment assignment, final risky allocation, and luck predict the subsequent hypothetical allocation.

$ \text{SubsequentRiskyAllocation}_i = \alpha + \beta_1 \text{Treatment}_i + \beta_2 \text{FinalRiskyAllocation}_i + \beta_3 \text{Luck}_i + \varepsilon_i $

modelsummary(
  list(
    "Subsequent risky allocation" = models$model_subsequent,
    "Change in risky allocation" = models$model_change
  ),
  stars = TRUE,
  statistic = "std.error"
)

	Subsequent risky allocation	Change in risky allocation
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept)	9.509***	-1.435*
	(0.969)	(0.570)
group_description_reftool	-0.117	-2.470**
	(0.762)	(0.806)
group_description_refpathway	-0.211	0.132
	(0.743)	(0.807)
manual_final_allocation_risky	0.746***
	(0.019)
manual_luck	0.103***	0.099***
	(0.011)	(0.012)
Num.Obs.	1000	1000
R2	0.643	0.073
R2 Adj.	0.641	0.070
AIC	7364.2	7529.0
BIC	7393.6	7553.5
Log.Lik.	-3676.081	-3759.493
RMSE	9.56	10.39

This model is useful because participants who receive a favourable simulated payoff may allocate more to the risky option later. In the actual experiment, this model will help distinguish the treatment effect from the effect of realised simulation luck.
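The two specifications in the table above can be sketched as follows. The toy data and the data-generating process are assumptions made purely for illustration, and the shortened column names are hypothetical. The change model omits final risky allocation as a predictor, matching the blank cell in the table.

```r
set.seed(3)  # toy data only
n <- 150
toy <- data.frame(
  group = factor(rep(c("description", "tool", "pathway"), each = n / 3),
                 levels = c("description", "tool", "pathway")),
  final_risky = round(runif(n, min = 10, max = 90)),
  luck = rnorm(n, sd = 25)
)

# Assumed data-generating process, for illustration only.
toy$subsequent_risky <- 10 + 0.75 * toy$final_risky + 0.1 * toy$luck +
  rnorm(n, sd = 9)
toy$change_risky <- toy$subsequent_risky - toy$final_risky

# Subsequent allocation: treatment, prior allocation and realised luck.
model_subsequent <- lm(subsequent_risky ~ group + final_risky + luck, data = toy)

# Change in allocation: treatment and realised luck only.
model_change <- lm(change_risky ~ group + luck, data = toy)

print(round(coef(model_subsequent), 3))
print(round(coef(model_change), 3))
```

Including luck alongside the treatment indicators means that the treatment coefficients are estimated holding realised luck constant.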

2.8 Mechanism models

Where mechanism variables are available, the synthetic analysis estimates whether treatment assignment predicts perceived volatility, satisfaction, and confidence.

mechanism_models <- list()

if (!is.null(models$model_perceived_volatility)) {
  mechanism_models[["Perceived volatility"]] <- models$model_perceived_volatility
}

if (!is.null(models$model_satisfaction)) {
  mechanism_models[["Satisfaction"]] <- models$model_satisfaction
}

if (!is.null(models$model_confidence)) {
  mechanism_models[["Confidence"]] <- models$model_confidence
}

if (length(mechanism_models) > 0) {
  modelsummary(
    mechanism_models,
    stars = TRUE,
    statistic = "std.error"
  )
} else {
  cat("No mechanism models were estimated because mechanism variables were not available in the dataset.")
}

	Perceived volatility	Satisfaction	Confidence
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
(Intercept)	4.557***	3.766***	3.536***
	(0.051)	(0.059)	(0.055)
group_description_reftool	0.056	0.185*	0.722***
	(0.072)	(0.083)	(0.078)
group_description_refpathway	0.861***	0.005	0.203**
	(0.072)	(0.083)	(0.078)
Num.Obs.	1000	1000	1000
R2	0.153	0.006	0.084
R2 Adj.	0.151	0.004	0.082
AIC	2688.6	2988.8	2860.9
BIC	2708.2	3008.5	2880.5
Log.Lik.	-1340.275	-1490.413	-1426.446
RMSE	0.92	1.07	1.01

2.9 Regression model summary

The table below summarises the main regression models used in the synthetic-data pre-analysis. These models correspond to the planned analytical strategy for the actual experiment.

The first model estimates the unadjusted treatment effect on final risky allocation. The second model adds participant-level controls. The third model relevels the treatment condition to compare the path-simulation condition directly against the original risk-tool condition. The fourth model examines subsequent risky allocation after the payoff stage, including final risky allocation and realised luck as predictors. The fifth model examines change in risky allocation between the initial decision and the subsequent hypothetical allocation.

2.10 Figures

2.10.1 Final risky allocation by treatment group

knitr::include_graphics(here("outputs", "figures", "fig_final_risky_allocation_by_group.png"))

2.10.2 Subsequent risky allocation by treatment group

knitr::include_graphics(here("outputs", "figures", "fig_subsequent_risky_allocation_by_group.png"))

2.10.3 Distribution of final risky allocation

knitr::include_graphics(here("outputs", "figures", "fig_distribution_final_risky_allocation.png"))

2.10.4 Luck and subsequent risky allocation

knitr::include_graphics(here("outputs", "figures", "fig_luck_and_subsequent_risky_allocation.png"))

volatility_path <- here("outputs", "figures", "fig_perceived_volatility_by_group.png")

if (file.exists(volatility_path)) {
  knitr::include_graphics(volatility_path)
}

3 Preliminary interpretation

This synthetic-data analysis demonstrates that the proposed data pipeline can estimate the planned comparisons across the description, endpoint simulation, and path simulation arms. The key outcomes are final risky allocation, subsequent risky allocation, expected portfolio outcome, simulated portfolio outcome, and calculated luck.

The analysis should not be interpreted as empirical evidence because the data are simulated. Instead, the results serve as a pre-analysis check that the survey instrument and analysis plan are aligned.

For the actual experiment, the same script can be reused after replacing the synthetic dataset with the real completed survey data.

Synthetic Data Pre-Analysis for Replication Experiment

Sitti Salam

2026-05-19