1 Replication Study

1.1 Abstract

Risk presentation format refers to the way information about uncertain outcomes is communicated to decision-makers. In investment decisions, the same risky asset may be perceived differently depending on whether risk is described numerically or experienced through simulated outcomes. Kaufmann et al. (2013) found that participants allocated more to the risky fund when they used a risk tool that combined experience sampling and graphical displays, compared with a description-only condition. Across the broader set of experiments, the authors reported that the risk tool increased risky allocation by approximately five to fifteen percentage points relative to ordinary descriptions, and that higher risky allocations were associated with lower risk perception, higher confidence, and more accurate estimates of the probability of loss. This replication plan proposes an online Qualtrics replication and extension of Experiment I from Kaufmann et al. (2013), conducted with an Indonesian sample. We will compare three conditions: a description condition, an original risk tool condition, and a path simulation extension condition. The replication component tests whether the original risk tool increases risky allocation relative to the description condition. The extension tests whether adding year-by-year wealth-path information changes risky allocation compared with the original endpoint/distribution-based risk tool. The Indonesian context is theoretically and practically relevant because investment participation, financial literacy, and digital investment adoption are growing rapidly in many emerging markets, while retail investors may still face substantial uncertainty in interpreting financial risk. The primary outcome is final allocation to the risky fund. Secondary outcomes include satisfaction after simulated payoff, subsequent hypothetical allocation, and mechanism/comprehension measures adapted from the target article.
The study is powered using a conservative effect size of d = 0.15, α = 0.05, and power = 0.80, requiring approximately 2,100 participants across the three experimental conditions.
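
As a rough check on the stated sample size, the per-arm n can be computed with base R's `power.t.test`. This is only a sketch: it assumes the power target refers to a pairwise, two-sided, two-sample t-test with equal allocation across arms, which the text above does not state explicitly. The names `pw`, `n_per_arm`, and `n_total` are illustrative.

```r
# Sketch: sample size for one pairwise two-sample t-test (assumed design).
# With sd = 1, delta is the standardised effect size d.
pw <- power.t.test(delta = 0.15, sd = 1, sig.level = 0.05, power = 0.80,
                   type = "two.sample", alternative = "two.sided")

n_per_arm <- ceiling(pw$n)   # participants needed in each arm
n_total   <- 3 * n_per_arm   # three arms of equal size (assumption)

cat("Per arm:", n_per_arm, " Total:", n_total, "\n")
```

Under these assumptions the calculation gives roughly 700 participants per arm, which is in line with the approximately 2,100 participants stated above.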

Full report can be requested here

2 Pre-analysis

2.1 Purpose of this pre-analysis

This report uses 1,000 synthetic completed responses, generated by an LLM, to test the analysis pipeline for the replication and extension experiment. The synthetic dataset is not intended to provide empirical evidence. Instead, it is used to verify that the survey design produces analyzable variables, that the planned data cleaning steps work, and that the planned statistical models can be implemented before actual data collection.

The experiment contains three treatment arms:

  • Description condition.
  • Endpoint simulation or risk tool condition.
  • Path simulation condition.

The main outcome is the participant’s final allocation to the risky option. The extension-specific question is whether showing year-by-year investment paths changes allocation behaviour compared with showing endpoint simulation information.

2.2 Data preparation

The analysis-ready dataset was cleaned and standardised in the master script. Key constructed variables include:

  • final_total: final safe allocation plus final risky allocation.
  • subsequent_total: subsequent safe allocation plus subsequent risky allocation.
  • manual_luck: simulated portfolio outcome minus expected portfolio outcome.
  • risky_allocation_change: subsequent risky allocation minus final risky allocation.
glimpse(df)
## Rows: 1,000
## Columns: 23
## $ group_raw                          <chr> "tool", "pathway", "tool", "tool", …
## $ manual_final_allocation_risky      <dbl> 36, 53, 84, 86, 42, 84, 63, 60, 40,…
## $ manual_final_allocation_safe       <dbl> 64, 47, 16, 14, 58, 16, 37, 40, 60,…
## $ manual_expected_portfolio_outcome  <dbl> 133.8, 138.9, 148.2, 148.8, 135.6, …
## $ manual_simulated_portfolio_outcome <dbl> 156.53, 135.98, 91.05, 86.79, 119.7…
## $ manual_subsequent_allocation_risky <dbl> 26, 43, 67, 60, 40, 61, 69, 47, 22,…
## $ manual_subsequent_allocation_safe  <dbl> 74, 57, 33, 40, 60, 39, 31, 53, 78,…
## $ age                                <dbl> 32, 21, 29, 24, 41, 20, 29, 34, 22,…
## $ gender                             <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ education                          <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ income                             <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ stock_ownership                    <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ perceived_volatility               <dbl> 6, 6, 6, 5, 7, 5, 5, 5, 3, 4, 5, 4,…
## $ satisfaction                       <dbl> 4, 4, 3, 3, 4, 4, 5, 1, 3, 4, 3, 3,…
## $ confidence                         <dbl> 4, 4, 5, 3, 4, 6, 4, 3, 4, 4, 4, 2,…
## $ comprehension                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ group                              <fct> tool, pathway, tool, tool, pathway,…
## $ final_total                        <dbl> 100, 100, 100, 100, 100, 100, 100, …
## $ subsequent_total                   <dbl> 100, 100, 100, 100, 100, 100, 100, …
## $ manual_luck                        <dbl> 22.73, -2.92, -57.15, -62.01, -15.8…
## $ final_total_valid                  <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ subsequent_total_valid             <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,…
## $ risky_allocation_change            <dbl> -10, -10, -17, -26, -2, -23, 6, -13…
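
The constructed variables listed above could be derived along the following lines. This is a minimal sketch in base R rather than the master script itself: `df_toy` is a hypothetical two-row data frame whose values are copied from the first two records of the `glimpse()` output, and the definition of the two `*_valid` flags (total equals exactly 100) is an assumption.

```r
# Toy rows copied from the first two records shown in the glimpse() output.
df_toy <- data.frame(
  manual_final_allocation_risky      = c(36, 53),
  manual_final_allocation_safe       = c(64, 47),
  manual_expected_portfolio_outcome  = c(133.8, 138.9),
  manual_simulated_portfolio_outcome = c(156.53, 135.98),
  manual_subsequent_allocation_risky = c(26, 43),
  manual_subsequent_allocation_safe  = c(74, 57)
)

# Derived variables, following the definitions in the list above.
df_toy <- transform(
  df_toy,
  final_total      = manual_final_allocation_safe + manual_final_allocation_risky,
  subsequent_total = manual_subsequent_allocation_safe + manual_subsequent_allocation_risky,
  manual_luck      = manual_simulated_portfolio_outcome - manual_expected_portfolio_outcome,
  risky_allocation_change = manual_subsequent_allocation_risky - manual_final_allocation_risky
)

# Validity flags; "valid" meaning "sums to exactly 100" is an assumption.
df_toy$final_total_valid      <- df_toy$final_total == 100
df_toy$subsequent_total_valid <- df_toy$subsequent_total == 100

print(df_toy[, c("final_total", "manual_luck", "risky_allocation_change")])
```

For these two rows the derived values match the output above: both totals are 100, luck is 22.73 and -2.92, and the change in risky allocation is -10 in both cases.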

2.3 Data quality checks

The first step is to verify whether the synthetic survey data are complete and internally consistent.

quality_summary %>%
  gt() %>%
  fmt_number(
    columns = everything(),
    decimals = 2
  )
n missing_group missing_final_risky missing_final_safe missing_expected missing_payoff missing_subseq_risky missing_subseq_safe final_total_not_100 subsequent_total_not_100 final_risky_out_of_range final_safe_out_of_range subsequent_risky_out_of_range subsequent_safe_out_of_range missing_luck
1,000.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

The treatment group balance is shown below:

group_balance %>%
  mutate(percent = percent(percent, accuracy = 0.1)) %>%
  gt()
group n percent
description 334 33.4%
tool 333 33.3%
pathway 333 33.3%

The key checks include:

  • whether allocation variables are missing,
  • whether allocation totals add up to 100, and
  • whether luck can be calculated.
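
A sketch of how checks of this kind can be counted is given here. It is not the master script: `df_toy` is a hypothetical three-row dataset whose third row is deliberately broken, `out_of_range` is an illustrative helper, and the valid range of 0 to 100 for allocations is an assumption. The output column names echo those in the quality summary table.

```r
# Hypothetical data; the third row violates every check on purpose.
df_toy <- data.frame(
  group = c("description", "tool", NA),
  manual_final_allocation_risky      = c(36, 53, 120),
  manual_final_allocation_safe       = c(64, 47, -10),
  manual_expected_portfolio_outcome  = c(133.8, 138.9, 140.0),
  manual_simulated_portfolio_outcome = c(156.53, 135.98, NA)
)

total <- df_toy$manual_final_allocation_risky + df_toy$manual_final_allocation_safe
luck  <- df_toy$manual_simulated_portfolio_outcome -
  df_toy$manual_expected_portfolio_outcome

# TRUE where a non-missing percentage lies outside [0, 100] (assumed range).
out_of_range <- function(x) !is.na(x) & (x < 0 | x > 100)

quality_toy <- data.frame(
  n                        = nrow(df_toy),
  missing_group            = sum(is.na(df_toy$group)),
  missing_final_risky      = sum(is.na(df_toy$manual_final_allocation_risky)),
  final_total_not_100      = sum(is.na(total) | total != 100),
  final_risky_out_of_range = sum(out_of_range(df_toy$manual_final_allocation_risky)),
  final_safe_out_of_range  = sum(out_of_range(df_toy$manual_final_allocation_safe)),
  missing_luck             = sum(is.na(luck))
)

print(quality_toy)
```

A clean dataset yields zeros in every column except `n`, which is the pattern shown in the quality summary for the synthetic data.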

2.4 Descriptive statistics

2.4.1 Outcome summary by treatment group

outcome_summary_by_group %>%
  gt() %>%
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  )
group n mean_final_risky sd_final_risky mean_final_safe mean_expected mean_payoff mean_luck sd_luck mean_subsequent_risky sd_subsequent_risky mean_change_risky
description 334.00 43.08 15.64 56.92 135.92 134.24 −1.68 25.20 41.48 16.12 −1.60
tool 333.00 52.37 17.18 47.63 138.71 138.39 −0.32 29.93 48.43 15.96 −3.94
pathway 333.00 41.77 15.31 58.23 135.53 137.16 1.63 24.45 40.63 14.76 −1.14

2.4.2 Mechanism and comprehension analysis

mechanism_summary_by_group %>%
  gt() %>%
  fmt_number(
    columns = where(is.numeric),
    decimals = 2
  )
group n mean_perceived_volatility sd_perceived_volatility mean_satisfaction sd_satisfaction mean_confidence sd_confidence mean_comprehension sd_comprehension
description 334.00 4.56 0.94 3.77 1.02 3.54 1.02 NA NA
tool 333.00 4.61 0.94 3.95 1.12 4.26 0.97 NA NA
pathway 333.00 5.42 0.89 3.77 1.09 3.74 1.03 NA NA

2.5 Main treatment-effect analysis

The primary analysis estimates whether treatment assignment predicts final risky allocation.

The simple model is:

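$ \text{FinalRiskyAllocation}_i = \alpha + \beta_1 \text{RiskTool}_i + \beta_2 \text{Pathway}_i + \varepsilon_i $

where $\text{RiskTool}_i$ and $\text{Pathway}_i$ are indicators for the two simulation conditions and the description condition is the reference group, so $\beta_1$ and $\beta_2$ are the differences in mean final risky allocation between each simulation condition and the description condition.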

modelsummary(
  list(
    "Final risky allocation" = models$model_final_simple,
    "Final risky allocation + controls" = models$model_final_controlled
  ),
  stars = TRUE,
  statistic = "std.error"
)
                               Final risky allocation   Final risky allocation + controls
(Intercept)                    43.081***                40.221***
                               (0.879)                  (2.365)
group_description_reftool      9.286***                 9.285***
                               (1.244)                  (1.244)
group_description_refpathway   -1.306                   -1.370
                               (1.244)                  (1.245)
age                                                     0.101
                                                        (0.078)
Num.Obs.                       1000                     1000
R2                             0.079                    0.081
R2 Adj.                        0.078                    0.078
AIC                            8396.1                   8396.4
BIC                            8415.7                   8420.9
Log.Lik.                       -4194.041                -4193.190
RMSE                           16.04                    16.03
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Because this analysis uses synthetic data, the coefficients should be interpreted only as a test of the analysis workflow.
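
How the reference group shapes the coefficients of the simple model can be illustrated with a small simulation. This is a sketch on hypothetical data, not the planned analysis code: the data frame `toy`, the seed, and the group means and standard deviation used to generate it are all assumptions chosen to loosely echo the synthetic summary table.

```r
set.seed(1)  # arbitrary seed for the toy data

# "description" is the first factor level, so it becomes the reference group.
group <- factor(rep(c("description", "tool", "pathway"), each = 200),
                levels = c("description", "tool", "pathway"))

# Hypothetical group means and noise level.
true_means <- c(description = 43, tool = 52, pathway = 42)
risky <- unname(true_means[as.character(group)]) + rnorm(length(group), sd = 16)
toy <- data.frame(group = group, risky = risky)

# Treatment (dummy) coding: intercept = reference mean,
# other coefficients = differences from the reference mean.
fit <- lm(risky ~ group, data = toy)
group_means <- tapply(toy$risky, toy$group, mean)

print(round(coef(fit), 3))
print(round(group_means, 3))
```

The same relationship is visible in the synthetic output above: the intercept (43.081) equals the description-group mean of 43.08, and 43.081 + 9.286 = 52.367 matches the tool-group mean of 52.37.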

2.6 Extension-specific analysis

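The extension compares the path simulation condition against the endpoint simulation/risk tool condition. This addresses whether showing the year-by-year path produces different allocation behaviour than showing only endpoint outcomes, among participants who receive a simulated experience. The model below re-estimates the simple model with the risk tool condition as the reference group, so the pathway coefficient gives this comparison directly.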

modelsummary(
  list(
    "Pathway vs tool" = models$model_path_vs_tool
  ),
  stars = TRUE,
  statistic = "std.error"
)
                            Pathway vs tool
(Intercept)                 52.366***
                            (0.880)
group_tool_refdescription   -9.286***
                            (1.244)
group_tool_refpathway       -10.592***
                            (1.245)
Num.Obs.                    1000
R2                          0.079
R2 Adj.                     0.078
AIC                         8396.1
BIC                         8415.7
Log.Lik.                    -4194.041
RMSE                        16.04
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
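
Changing the reference group can be sketched with base R's `relevel`. The example uses hypothetical data (`toy`, with an arbitrary seed and no built-in treatment effect) and assumes the releveled factor is named `group_tool_ref`, which is inferred from the coefficient labels in the table above rather than taken from the master script.

```r
set.seed(2)  # arbitrary seed for the toy data

group <- factor(rep(c("description", "tool", "pathway"), each = 150),
                levels = c("description", "tool", "pathway"))
toy <- data.frame(group = group,
                  risky = rnorm(length(group), mean = 45, sd = 16))

# Same factor, but with "tool" as the baseline category.
toy$group_tool_ref <- relevel(toy$group, ref = "tool")

fit_desc_ref <- lm(risky ~ group, data = toy)
fit_tool_ref <- lm(risky ~ group_tool_ref, data = toy)

# Pathway-vs-tool contrast read directly from the releveled model ...
path_vs_tool <- coef(fit_tool_ref)["group_tool_refpathway"]
# ... equals the difference of two coefficients under the original coding.
same_contrast <- coef(fit_desc_ref)["grouppathway"] - coef(fit_desc_ref)["grouptool"]

print(c(releveled = unname(path_vs_tool), difference = unname(same_contrast)))
```

Releveling changes only the parameterisation, not the fit, which is why the table above repeats the R2, AIC, and BIC of the simple model. In the synthetic output, -1.306 - 9.286 = -10.592, the pathway coefficient reported here.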

2.7 Subsequent allocation analysis

The next model examines whether treatment assignment, final risky allocation, and luck predict the subsequent hypothetical allocation.

$ \text{SubsequentRiskyAllocation}_i = \alpha + \beta_1 \text{Treatment}_i + \beta_2 \text{FinalRiskyAllocation}_i + \beta_3 \text{Luck}_i + \varepsilon_i $

modelsummary(
  list(
    "Subsequent risky allocation" = models$model_subsequent,
    "Change in risky allocation" = models$model_change
  ),
  stars = TRUE,
  statistic = "std.error"
)
                                Subsequent risky allocation   Change in risky allocation
(Intercept)                     9.509***                      -1.435*
                                (0.969)                       (0.570)
group_description_reftool       -0.117                        -2.470**
                                (0.762)                       (0.806)
group_description_refpathway    -0.211                        0.132
                                (0.743)                       (0.807)
manual_final_allocation_risky   0.746***
                                (0.019)
manual_luck                     0.103***                      0.099***
                                (0.011)                       (0.012)
Num.Obs.                        1000                          1000
R2                              0.643                         0.073
R2 Adj.                         0.641                         0.070
AIC                             7364.2                        7529.0
BIC                             7393.6                        7553.5
Log.Lik.                        -3676.081                     -3759.493
RMSE                            9.56                          10.39
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

This model is useful because participants who receive a favourable simulated payoff may allocate more to the risky option later. In the actual experiment, this model will help distinguish the treatment effect from the effect of realised simulation luck.
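
The two specifications in the table above can be sketched on hypothetical data as follows. The data-generating process, the seed, and the short variable names (`final_risky`, `luck`, `subsequent_risky`, `change`) are assumptions for illustration only. The change model is written without the final-allocation term, which is what the coefficient rows of the table suggest.

```r
set.seed(3)  # arbitrary seed for the toy data
n <- 300

group <- factor(sample(c("description", "tool", "pathway"), n, replace = TRUE),
                levels = c("description", "tool", "pathway"))
final_risky <- pmin(pmax(round(rnorm(n, 45, 16)), 0), 100)  # clamp to 0..100
luck <- rnorm(n, 0, 25)

# Hypothetical process: persistence in allocation plus a small luck effect.
subsequent_risky <- 10 + 0.75 * final_risky + 0.1 * luck + rnorm(n, 0, 9)

toy <- data.frame(group, final_risky, luck, subsequent_risky)
toy$change <- toy$subsequent_risky - toy$final_risky

# Level model: controls for the earlier allocation and realised luck.
fit_subsequent <- lm(subsequent_risky ~ group + final_risky + luck, data = toy)
# Change model: outcome is the difference, so final_risky is not a regressor.
fit_change <- lm(change ~ group + luck, data = toy)

print(round(coef(fit_subsequent), 3))
print(round(coef(fit_change), 3))
```

Because luck enters both models separately from the treatment indicators, its coefficient captures the response to the realised payoff while the group coefficients capture any remaining difference between conditions.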

2.8 Mechanism models

Where mechanism variables are available, the synthetic analysis estimates whether treatment assignment predicts perceived volatility, satisfaction, and confidence.

mechanism_models <- list()

if (!is.null(models$model_perceived_volatility)) {
  mechanism_models[["Perceived volatility"]] <- models$model_perceived_volatility
}

if (!is.null(models$model_satisfaction)) {
  mechanism_models[["Satisfaction"]] <- models$model_satisfaction
}

if (!is.null(models$model_confidence)) {
  mechanism_models[["Confidence"]] <- models$model_confidence
}

if (length(mechanism_models) > 0) {
  modelsummary(
    mechanism_models,
    stars = TRUE,
    statistic = "std.error"
  )
} else {
  cat("No mechanism models were estimated because mechanism variables were not available in the dataset.")
}
                               Perceived volatility   Satisfaction   Confidence
(Intercept)                    4.557***               3.766***       3.536***
                               (0.051)                (0.059)        (0.055)
group_description_reftool      0.056                  0.185*         0.722***
                               (0.072)                (0.083)        (0.078)
group_description_refpathway   0.861***               0.005          0.203**
                               (0.072)                (0.083)        (0.078)
Num.Obs.                       1000                   1000           1000
R2                             0.153                  0.006          0.084
R2 Adj.                        0.151                  0.004          0.082
AIC                            2688.6                 2988.8         2860.9
BIC                            2708.2                 3008.5         2880.5
Log.Lik.                       -1340.275              -1490.413      -1426.446
RMSE                           0.92                   1.07           1.01
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
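
One way a pipeline can leave a model slot empty when a mechanism variable has no data, as with `comprehension` in the synthetic dataset, is sketched here. The helper `fit_if_available`, the data frame `toy`, and the 1 to 7 response scale are hypothetical; the master script may handle this differently.

```r
# Hypothetical helper: fit outcome ~ group only if the outcome has any data.
fit_if_available <- function(outcome, data) {
  if (!outcome %in% names(data) || all(is.na(data[[outcome]]))) {
    return(NULL)  # nothing to estimate
  }
  lm(reformulate("group", response = outcome), data = data)
}

set.seed(4)  # arbitrary seed for the toy data
toy <- data.frame(
  group = factor(rep(c("description", "tool", "pathway"), each = 50),
                 levels = c("description", "tool", "pathway")),
  confidence    = sample(1:7, 150, replace = TRUE),  # assumed 1-7 scale
  comprehension = NA_real_                           # not collected
)

toy_models <- list(
  model_confidence    = fit_if_available("confidence", toy),
  model_comprehension = fit_if_available("comprehension", toy)
)

# Only non-NULL entries would be passed on to the summary table.
print(names(Filter(Negate(is.null), toy_models)))
```

With a guard of this kind, the `!is.null()` checks in the chunk above skip any mechanism variable that was not collected instead of failing.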

2.9 Regression model summary

The table below summarises the main regression models used in the synthetic-data pre-analysis. These models correspond to the planned analytical strategy for the actual experiment.

The first model estimates the unadjusted treatment effect on final risky allocation. The second model adds participant-level controls. The third model relevels the treatment condition to compare the path-simulation condition directly against the original risk-tool condition. The fourth model examines subsequent risky allocation after the payoff stage, including final risky allocation and realised luck as predictors. The fifth model examines change in risky allocation between the initial decision and the subsequent hypothetical allocation.

2.10 Figures

2.10.1 Final risky allocation by treatment group

knitr::include_graphics(here("outputs", "figures", "fig_final_risky_allocation_by_group.png"))

2.10.2 Subsequent risky allocation by treatment group

knitr::include_graphics(here("outputs", "figures", "fig_subsequent_risky_allocation_by_group.png"))

2.10.3 Distribution of final risky allocation

knitr::include_graphics(here("outputs", "figures", "fig_distribution_final_risky_allocation.png"))

2.10.4 Luck and subsequent risky allocation

knitr::include_graphics(here("outputs", "figures", "fig_luck_and_subsequent_risky_allocation.png"))

volatility_path <- here("outputs", "figures", "fig_perceived_volatility_by_group.png")

if (file.exists(volatility_path)) {
  knitr::include_graphics(volatility_path)
}

3 Preliminary interpretation

This synthetic-data analysis demonstrates that the proposed data pipeline can estimate the planned comparisons across the description, endpoint simulation, and path simulation arms. The key outcomes are final risky allocation, subsequent risky allocation, expected portfolio outcome, simulated portfolio outcome, and calculated luck.

The analysis should not be interpreted as empirical evidence because the data are simulated. Instead, the results serve as a pre-analysis check that the survey instrument and analysis plan are aligned.

For the actual experiment, the same script can be reused after replacing the synthetic dataset with the real completed survey data.