This report examines studies with unusually large z-statistics from Mertens, Herberz, Hahnel, & Brosch (2022), “The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains,” published in PNAS. The meta-analysis synthesizes evidence on “nudges”—interventions that modify the choice environment to steer behavior without restricting options or changing economic incentives.
What counts as a nudge? The meta-analysis included studies testing choice architecture interventions classified into three categories: (1) decision structure interventions that change option-related effort or set defaults; (2) decision information interventions that translate or make information more visible; and (3) decision assistance interventions like reminders or commitment devices. The nine specific techniques are: default, reminder, composition, translation, social reference, visibility, commitment, effort, and consequence.
What studies were included? Mertens et al. included experimental and quasi-experimental studies that (a) tested a choice architecture intervention, (b) measured actual behavior (not just intentions), (c) compared an intervention group to a control group, and (d) reported sufficient statistics to compute effect sizes. The final dataset contains 447 effect sizes from 212 publications spanning six behavioral domains: food, health, environment, finance, pro-social behavior, and other.
What this report examines: All 68 effect sizes (15.2% of the total) with z-statistics exceeding 5 in absolute value, meaning the estimated effect lies more than five standard errors from zero. We investigate why these z-statistics are large: is it because the interventions produced genuinely dramatic behavioral changes, or because massive sample sizes made even tiny effects statistically detectable?
We classify the 43 papers that contribute these effect sizes into three patterns based on what drives their large z-statistics:
| Pattern | Papers | Definition | Interpretation |
|---|---|---|---|
| Large Effect | 32 | Cohen’s d ≥ 0.5 | Genuinely large behavioral effects; the intervention caused dramatic changes |
| Massive Sample | 5 | Total n ≥ 10,000 and d < 0.5 | Tiny effects detected with extreme precision due to huge samples |
| Moderate Both | 6 | Otherwise | Moderate effects with moderate samples, crossing the z > 5 threshold |
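The classification rule in the table can be written as a short function. This is a minimal sketch, not the report's own code: the name `classify_pattern` is hypothetical, `n_total` is assumed to be the combined control and intervention sample of one comparison, and the absolute value of d is used on the assumption that the sign of the effect does not matter for the pattern.

```python
def classify_pattern(d: float, n_total: int) -> str:
    """Assign one of the report's three patterns from effect size and sample size.

    Assumptions (not stated in the report): n_total is the combined sample of
    one comparison, and the magnitude of d is what counts, not its sign.
    """
    if abs(d) >= 0.5:
        return "Large Effect"
    if n_total >= 10_000:
        return "Massive Sample"
    return "Moderate Both"


# Values taken from the tables in this report.
examples = {
    "Johnson et al. (2002)": (1.22, 69 + 69),
    "BIT (2013)": (0.05, 135_665 + 135_665),
    "Broman et al. (2014), first row": (0.39, 451 + 450),
}
for paper, (d, n_total) in examples.items():
    print(f"{paper}: {classify_pattern(d, n_total)}")
```

The report classifies whole papers, some of which contribute several effect sizes; how multiple effects are combined into one label is not described, so the sketch works on a single effect at a time.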
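A large z-statistic (|z| > 5) tells us the effect is reliably distinguishable from zero, but by itself says nothing about whether the effect is large or practically important. Because z is the effect size divided by its standard error, it can be large when the effect is large, when the standard error is small (as in very large samples), or both. Understanding which pattern applies is crucial: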
Example: Johnson et al. (2002) - Privacy Defaults
In a study on online privacy choices, participants were either given an opt-in default (checkbox unchecked = no marketing contact) or opt-out default (checkbox pre-checked = receive marketing contact). The opt-out condition increased consent from 48% to 96%—a 48 percentage point increase.
Why z is large: The effect size itself is enormous. Despite a small sample, the dramatic behavioral change produces high statistical significance.
Example: BIT (2013) - Organ Donation Registration
The UK’s Behavioural Insights Team tested different messages encouraging organ donor registration among people renewing vehicle tax online. Over 1 million people were quasi-randomly assigned to different message conditions.
Why z is large: The effect is tiny, about a 1 percentage point difference in registration rates. But with roughly 271,000 participants in each comparison (135,665 per group), the standard error shrinks to 0.004, making even this tiny effect highly statistically significant. At population scale, this translates to ~96,000 additional organ donor registrations per year.
Example: Broman et al. (2014) - Smart Grid Acceptance
Participants were randomly assigned to opt-in vs. opt-out framing for Smart Grid technology acceptance in Denmark, Norway, and Switzerland.
Why z is large: Neither the effect nor the sample is extreme, but the combination crosses the z > 5 threshold. The 18-19 percentage point difference with ~900-1,000 participants per country produces z values of roughly 5.8 to 8.5.
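The three examples above can be checked numerically. The sketch below assumes the usual large-sample approximation for the standard error of Cohen's d, SE = sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))), and z = d / SE. The report does not say exactly how the meta-analysis computed its z-statistics, so this is an approximation rather than a reproduction, and the function names are hypothetical.

```python
import math


def se_of_d(d: float, n_control: int, n_intervention: int) -> float:
    """Approximate standard error of Cohen's d (large-sample formula, assumed)."""
    n_sum = n_control + n_intervention
    return math.sqrt(n_sum / (n_control * n_intervention) + d**2 / (2 * n_sum))


def z_from_d(d: float, n_control: int, n_intervention: int) -> float:
    """z-statistic as effect size divided by its standard error."""
    return d / se_of_d(d, n_control, n_intervention)


# (label, d, n_control, n_intervention) taken from the tables in this report.
rows = [
    ("Johnson et al. (2002)", 1.22, 69, 69),
    ("BIT (2013), best message", 0.05, 135_665, 135_665),
    ("Broman et al. (2014), first row", 0.39, 451, 450),
]
for label, d, n_c, n_i in rows:
    se = se_of_d(d, n_c, n_i)
    print(f"{label}: SE = {se:.4f}, z = {z_from_d(d, n_c, n_i):.2f}")
```

Under this approximation the three rows come out at about 6.58, 13.0, and 5.8, close to the tabulated 6.58, 13.15, and 5.76. Some rows elsewhere in the report deviate more, presumably because d is rounded in the tables or because the underlying studies required a different variance calculation.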
Title: Using a verbal prompt to increase protein consumption in a hospital setting: A field study
Pattern: Large Effect
| Attribute | Value |
|---|---|
| Domain | food |
| Category | assistance |
| Technique | reminder |
| Experiment type | natural_field |
| Location | Outside US (Netherlands) |
| Population | Adults |
Description: Hospital staff used verbal prompts to remind patients to consume protein-rich foods. A simple reminder intervention in a healthcare setting.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 93 | 62 | 0.96 | 5.56 |
Raw data: Control mean = 0.06, Intervention mean = 0.45, difference = 0.39
Why z is large: Large effect size (d ≈ 1.0). The verbal reminder dramatically increased protein consumption, from 6% to 45% of patients.
Title: Effective use of SMS: Improving government confirmation processes
Pattern: Moderate Both
Paper link: BETA Publications
| Attribute | Value |
|---|---|
| Domain | other (civic) |
| Category | information |
| Technique | visibility |
| Experiment type | natural_field |
| Location | Australia |
| Population | Adults |
Description: The Behavioural Economics Team of the Australian Government (BETA) tested SMS-based nudges to improve government confirmation processes.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 1,437 | 1,415 | 0.24 | 6.30 |
Why z is large: Moderate effect with moderate-large sample (~2,800 total).
Title: “Yes/no/not right now”: Yes/no response formats can increase response rates even in non-forced-choice settings
Pattern: Massive Sample
| Attribute | Value |
|---|---|
| Domain | health |
| Category | structure |
| Technique | default |
| Experiment type | natural_field |
| Location | United States |
| Population | Adults |
Description: Tested how different response formats (yes/no vs. other options) affect health-related decision-making at scale.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 6,377 | 8,562 | 0.18 | 10.60 |
| 8,227 | 7,760 | 0.15 | 8.53 |
| 8,876 | 8,316 | 0.09 | 6.59 |
Why z is large: Massive samples (n > 14,000 per comparison) make small effects (d = 0.09-0.18) highly significant.
Title: Ethically deployed defaults: Transparency and consumer protection through disclosure and preference articulation
Pattern: Large Effect
Paper link: SAGE Journals
Citation: Journal of Marketing Research, 53(5), 865-880
| Attribute | Value |
|---|---|
| Domain | other/food |
| Category | structure |
| Technique | default |
| Experiment type | natural_field |
| Location | United States |
| Population | Adults |
Description: Seven experiments testing whether disclosing default intentions affects their effectiveness. Found that disclosure does not reduce default effects because consumers don’t understand how to counter the processes by which defaults bias judgment.
| Effect | n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|---|
| 1 | 53 | 52 | 2.65 | 9.90 |
| 2 | 58 | 53 | 1.67 | 20.60 |
| 3 | 58 | 53 | 1.23 | 15.71 |
| 4 | 58 | 53 | 1.01 | 12.45 |
| 5 | 58 | 53 | 0.86 | 8.53 |
| 6 | 58 | 53 | 0.90 | 8.46 |
| 7 | 55 | 48 | 2.21 | 8.94 |
Why z is large: Extremely large effect sizes (d = 0.86 to 2.65). Defaults have powerful effects even when disclosed.
Title: Increased portion size leads to increased energy intake in a restaurant meal
Pattern: Large Effect
Citation: Obesity Research, 12(3), 562-568
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | default |
| Experiment type | natural_field |
| Location | United States |
| Population | Adults |
Description: In a cafeteria-style restaurant, the portion size of a pasta entrée was varied (standard 248g vs. large 377g) without changing price. Customers purchasing the larger portion consumed 43% more calories from the entrée and 25% more from the entire meal, with no difference in perceived appropriateness of portion size.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 89 | 91 | 3.08 | 13.98 |
Raw data: Control mean = 1671 kJ, Intervention mean = 2390 kJ (SD: 122.6 vs 305.3)
Why z is large: Very large effect size (d = 3.08). Portion size had an enormous impact on consumption—people ate 43% more when given larger portions.
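For continuous outcomes, the reported d can be recovered from the raw means and standard deviations. The sketch below uses the pooled-standard-deviation form of Cohen's d; whether the meta-analysis used exactly this variant is an assumption, and `cohens_d` is a hypothetical helper name.

```python
import math


def cohens_d(mean_c: float, sd_c: float, n_c: int,
             mean_i: float, sd_i: float, n_i: int) -> float:
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n_c - 1) * sd_c**2 + (n_i - 1) * sd_i**2) / (n_c + n_i - 2)
    return (mean_i - mean_c) / math.sqrt(pooled_var)


# Portion-size study: entrée energy intake, control vs. large portion.
d_portion = cohens_d(1671, 122.6, 89, 2390, 305.3, 91)
print(f"d = {d_portion:.2f}")
```

With the values above this gives d of about 3.08, matching the table. Applying the same function to the cereal-bowl data reported later in this report (24.7 g, SD 14.9, n = 35 versus 46.1 g, SD 15.1, n = 34) gives about 1.43, again matching the tabulated value.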
Title: Red Potato Chips: Segmentation cues can substantially decrease food intake
Pattern: Large Effect
Paper link: PubMed | ResearchGate
Citation: Health Psychology, 31(3), 398-401
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | composition |
| Experiment type | conventional_lab |
| Location | United States |
| Population | Adults |
Description: Participants ate from tubes of potato chips while watching a movie. Treatment groups had red-colored chips inserted at regular intervals (every 5th, 7th, 10th, or 14th chip) as “segmentation cues.” These acted as subconscious “stop signs” that curtailed consumption by more than 50%.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 13 | 14 | 3.00 | 6.19 |
| 13 | 12 | 2.13 | 5.37 |
Why z is large: Very large effect sizes (d = 2.1-3.0). Despite tiny samples (25-27 participants per comparison), the effect was so dramatic (50%+ reduction in consumption) that it achieved high significance.
Note: This is Wansink research; interpret with awareness of subsequent concerns about his work.
Title: Applying behavioural insights to organ donation: Preliminary results from a randomised controlled trial
Pattern: Massive Sample
Paper link: BIT Publication | PMC
| Attribute | Value |
|---|---|
| Domain | pro-social |
| Category | information |
| Technique | visibility / social_reference |
| Experiment type | natural_field |
| Location | United Kingdom |
| Population | Adults |
Description: One of the largest RCTs ever conducted in the UK. Over 1 million people renewing vehicle tax or registering for driving licences online were quasi-randomly assigned to see one of 8 different message variants encouraging organ donor registration. Messages varied social norms framing, reciprocity appeals, and visual presentation.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 135,665 | 135,665 | 0.05 | 13.15 |
| 135,665 | 135,665 | 0.04 | 9.81 |
| 135,665 | 135,665 | 0.04 | 9.81 |
| 135,665 | 135,665 | 0.04 | 9.81 |
Why z is large: Massive sample (n = 271,330 per comparison). Effect sizes are negligible (d ≈ 0.04-0.05), but with 270,000+ participants, SE shrinks to 0.004. At population scale, the best message could yield ~96,000 additional registrations per year.
Title: Defaults, Framing and Privacy: Why Opting In ≠ Opting Out
Pattern: Large Effect
Citation: Marketing Letters, 13, 5-15
| Attribute | Value |
|---|---|
| Domain | other (privacy) |
| Category | structure |
| Technique | default |
| Experiment type | artefactual_field |
| Location | United States |
| Population | Adults |
Description: Classic study on opt-in vs. opt-out defaults for marketing contact. In opt-in condition, checkbox was unchecked (requiring action to agree); in opt-out, checkbox was pre-checked (requiring action to refuse). Found that default has larger effect than frame, and effects are additive.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 69 | 69 | 1.22 | 6.58 |
Raw data: Opt-in consent rate = 48%, Opt-out consent rate = 96%
Why z is large: Very large effect size (d = 1.22). The default doubled consent rates, from 48% to 96%.
Title: Bottoms Up! The Influence of Elongation on Pouring and Consumption Volume
Pattern: Large Effect
Paper link: Oxford Academic | SSRN
Citation: Journal of Consumer Research, 30(3), 455-463
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | default |
| Experiment type | natural_field |
| Location | United States |
| Population | Children/Adolescents |
Description: Children in cafeterias were given either short, wide glasses or tall, slender glasses (same volume) to pour their own juice. Due to perceptual biases, people underestimate volume in short, wide containers. Children poured 74% more juice into short glasses and thought they poured less. Even experienced bartenders poured 20.5% more into short glasses.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 49 | 48 | 1.97 | 7.95 |
Raw data: Tall glass mean = 5.54 oz, Short glass mean = 9.66 oz
Why z is large: Very large effect size (d = 1.97). Glass shape dramatically affected pouring behavior—nearly 2 standard deviations difference.
Note: Wansink research; interpret with awareness of subsequent concerns.
Title: Domestic uptake of green energy promoted by opt-out tariffs
Pattern: Large Effect
Paper link: Nature Climate Change
Citation: Nature Climate Change, 5, 868-871
| Attribute | Value |
|---|---|
| Domain | environment |
| Category | structure |
| Technique | default |
| Experiment type | natural_field |
| Location | Germany |
| Population | Adults |
Description: Nearly 42,000 German households were randomized to either opt-in or opt-out for green energy contracts (from renewable sources, ~$21/year more). Setting green energy as the default (opt-out) increased purchases nearly tenfold: 69.1% acceptance in opt-out vs. 7.2% in opt-in.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 20,976 | 20,976 | 1.42 | 141.94 |
| 145 | 145 | 1.39 | 10.63 |
Why z is large: The first effect combines a large effect (d = 1.42) with a massive sample (n ≈ 42,000), producing the highest z in the dataset (141.9). The second effect is similar in size (d = 1.39) but comes from only 290 participants, so its z is far smaller (10.6) though still well above 5.
Title: If you ask, they will come (to register and vote): Field experiments with state election agencies on encouraging voter registration
Pattern: Massive Sample
Paper link: ScienceDirect
Citation: Electoral Studies, 63, 102021
| Attribute | Value |
|---|---|
| Domain | other (civic) |
| Category | assistance |
| Technique | reminder |
| Experiment type | natural_field |
| Location | United States |
| Population | Adults |
Description: State election agencies sent low-cost postcards to eligible but unregistered citizens encouraging voter registration. The postcards served as reminders with information on how to register. Conducted in partnership with state governments.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 30,439 | 184,521 | 0.10 | 15.40 |
| 30,439 | 184,521 | 0.10 | 15.40 |
| 30,439 | 184,521 | 0.10 | 15.40 |
| 30,439 | 184,521 | 0.09 | 14.75 |
| 30,439 | 184,521 | 0.10 | 15.40 |
| 30,439 | 184,521 | 0.09 | 5.12 |
Raw data: Control registration rate ≈ 5%, Intervention rate ≈ 7%
Why z is large: Massive sample (n ≈ 215,000 per comparison). Effect sizes are small (d ≈ 0.10), but the huge sample makes them highly significant. At scale, a 2 percentage point increase means thousands more registered voters.
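The population-scale claim is simple arithmetic on the approximate rates given above. The sketch below is illustrative only: the 5% and 7% figures are the rounded rates from this report, and `additional_outcomes` is a hypothetical helper.

```python
def additional_outcomes(n_reached: int, control_rate: float,
                        intervention_rate: float) -> float:
    """Expected extra outcomes among people reached, relative to the control rate."""
    return n_reached * (intervention_rate - control_rate)


# Intervention group size and approximate registration rates from this report.
extra = additional_outcomes(184_521, 0.05, 0.07)
print(f"{extra:,.0f} additional registrations")
```

With these rounded inputs the result is roughly 3,700 additional registrations among the people who received a postcard, which is how an effect of d ≈ 0.10 can still matter in practice.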
Title: Using insights from behavioral economics to nudge individuals towards healthier choices when eating out: A restaurant experiment
Pattern: Large Effect
Paper link: ScienceDirect
Citation: Food Quality and Preference, 73, 56-64
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | default |
| Experiment type | framed_field |
| Location | Canada |
| Population | Adults |
Description: Field experiment with real restaurant customers testing default options on dessert order forms. When the healthy dessert option (e.g., fruit) was pre-selected as the default, customers had to actively opt out to choose the indulgent option (e.g., cake).
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 48 | 50 | 1.19 | 5.44 |
Raw data: Control healthy choice rate = 31%, Intervention = 86%
Why z is large: Large effect size (d = 1.19). Setting a healthy default shifted dessert selection from 31% to 86% healthy—a 55 percentage point difference.
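The report does not state how Cohen's d was obtained for binary outcomes such as choice rates. One conversion that lands close to the tabulated values is the arcsine difference, often called Cohen's h. The sketch below is offered as a plausible reading rather than the documented method, and `cohens_h` is a hypothetical helper name.

```python
import math


def arcsine_transform(p: float) -> float:
    """Variance-stabilising transform of a proportion: 2 * asin(sqrt(p))."""
    return 2 * math.asin(math.sqrt(p))


def cohens_h(p_control: float, p_intervention: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return arcsine_transform(p_intervention) - arcsine_transform(p_control)


# (label, control rate, intervention rate, tabulated d) from this report.
rows = [
    ("Healthy dessert default", 0.31, 0.86, 1.19),
    ("Privacy default", 0.48, 0.96, 1.22),
    ("Green energy default", 0.072, 0.691, 1.42),
]
for label, p_c, p_i, d_reported in rows:
    print(f"{label}: h = {cohens_h(p_c, p_i):.2f} (reported d = {d_reported})")
```

For these three studies the computed h falls within about 0.02 of the reported d, with the small gaps consistent with the percentages being rounded. Note that the same 55 percentage point difference would give a different h if it occurred elsewhere on the probability scale, so percentage point differences and d are not interchangeable.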
Title: Larger bowl size increases the amount of cereal children request, consume, and waste
Pattern: Large Effect
Paper link: PubMed | ScienceDirect
Citation: Journal of Pediatrics, 164(2), 323-326
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | default |
| Experiment type | framed_field |
| Location | United States |
| Population | Children/Adolescents |
Description: Children were randomized to receive small (8 oz) or large (16 oz) cereal bowls and asked how much cereal they wanted for a snack. Children with larger bowls requested 87% more cereal on average.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 35 | 34 | 1.43 | 5.29 |
Raw data: Small bowl mean = 24.7g (SD 14.9), Large bowl mean = 46.1g (SD 15.1)
Why z is large: Large effect size (d = 1.43). Bowl size dramatically influenced how much children requested.
Note: Wansink research; interpret with awareness of subsequent concerns.
Title: The importance of framing for consumer acceptance of the Smart Grid: A comparative study of Denmark, Norway and Switzerland
Pattern: Moderate Both
Paper link: ScienceDirect | SSRN
Citation: Energy Research & Social Science, 3, 113-123
| Attribute | Value |
|---|---|
| Domain | environment |
| Category | structure |
| Technique | default |
| Experiment type | artefactual_field |
| Location | Denmark, Norway, Switzerland |
| Population | Adults |
Description: Online experiment in three countries testing opt-in vs. opt-out vs. active choice framing for Smart Grid participation. Opt-out framing led to significantly higher participation rates than opt-in.
| Country | n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|---|
| 1 | 451 | 450 | 0.39 | 5.76 |
| 2 | 478 | 478 | 0.42 | 6.37 |
| 3 | 486 | 487 | 0.56 | 8.45 |
Raw data: Participation rates: ~58-60% opt-in vs. ~76-79% opt-out (18-19 pp difference)
Why z is large: Moderate effects (d = 0.39-0.56) with moderate samples (~900-1,000 per country) yield z values between 5.8 and 8.5.
Title: Bad popcorn in big buckets: Portion size can influence intake as much as taste
Pattern: Large Effect
Paper link: PubMed | ResearchGate
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | default |
| Experiment type | natural_field |
| Location | United States |
| Population | Adults |
Description: Moviegoers were given free popcorn in either medium (120g) or large (240g) containers, with popcorn that was either fresh or stale (14 days old). Even with stale popcorn, people ate 33.6% more from large containers. With fresh popcorn, they ate 45.3% more.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 68 | 90 | 1.45 | 5.71 |
Why z is large: Large effect size (d = 1.45). Container size influenced intake as much as taste—an environmental cue overpowered palatability.
Note: Wansink research; interpret with awareness of subsequent concerns.
Title: The effect of a default-based nudge on the choice of whole wheat bread
Pattern: Large Effect
Paper link: ScienceDirect
Citation: Appetite, 121, 179-185
| Attribute | Value |
|---|---|
| Domain | food |
| Category | structure |
| Technique | default |
| Experiment type | framed_field |
| Location | Netherlands |
| Population | Adults |
Description: Field experiment at a Dutch university testing whole wheat bread as the default sandwich option. When healthy bread was the default, 94% of participants stuck with this option.
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 57 | 56 | 2.24 | 9.50 |
| 57 | 56 | 1.48 | 6.84 |
Why z is large: Very large effect sizes (d = 1.48-2.24). Defaults had a dramatic impact on sandwich bread choice.
Title: The application of defaults to optimize parents’ health-based choices for children
Pattern: Large Effect
Paper link: PubMed
| Attribute | Value |
|---|---|
| Domain | health/food |
| Category | structure |
| Technique | default |
| Experiment type | framed_field |
| Location | United States |
| Population | Adults (parents choosing for children) |
Description: Two experiments tested “optimal defaults” on parents’ health choices for children—one on breakfast food selections and one on activity choice. Results showed default condition significantly predicted choice (healthier vs. less healthy option).
| n Control | n Intervention | Cohen’s d | z |
|---|---|---|---|
| 56 | 48 | 1.92 | 6.24 |
Why z is large: Very large effect size (d = 1.92). Optimal defaults strongly influenced parents’ food choices for their children.
The remaining 26 papers follow similar patterns. Key highlights include:
Larrick & Soll (2008): “The MPG Illusion” - Showed that expressing fuel efficiency as gallons per mile (instead of miles per gallon) helps people make better decisions. Effect d = 0.81, z = 5.06. Link: Duke News
Everett et al. (2015): Default effects in charitable giving - participants more likely to donate when donation was the default option, mediated by perceived social norms. d = 1.34, z = 7.23. Link: ResearchGate
Tannenbaum et al. (2013): Partitioning and food choice experiments. Multiple effects with d = 0.77-0.89, z = 5.96-6.90.
Bohnet et al. (2016): Gender bias and evaluation nudges - joint (vs. separate) evaluation reduces bias. d = 1.53, z = 5.20. Link: Harvard Kennedy School
Damgaard & Gravert (2017): Large-scale reminder experiment. d = 0.09, n > 40,000, z = 6.50.
Kesternich et al. (2019): Environmental default experiment with large sample. d = 0.21, n > 4,000, z = 10.65.
Papers also include work by: Young et al. (2009), Shevchenko et al. (2014), Hou (2017), Wansink et al. (2017), Shealy et al. (2018), Trevena et al. (2006), Haward et al. (2012), Schwartz (2007), Rosenkranz et al. (2017), Gartner (2018), Baek et al. (2014), Schulz et al. (2018), Van Bavel et al. (2019), Keller et al. (2011), BETA (2018), Van Dalen & Henkens (2014), and Isaksen et al. (2019).
| Pattern | Count | % |
|---|---|---|
| Large Effect | 32 | 74% |
| Massive Sample | 5 | 12% |
| Moderate Both | 6 | 14% |
Most large-z studies reflect genuinely large effects. Roughly three-quarters of the papers (32 of 43) have Cohen’s d ≥ 0.5, and many of the effects detailed above exceed d = 1, meaning the interventions produced substantial behavioral changes. Choice architecture interventions, especially defaults, can double or even triple the rate of desired behaviors.
A few studies leverage massive samples. The BIT organ donation study, Mann & Bryant voter registration study, and similar government-scale experiments have tiny individual effects (d ≈ 0.05-0.10) but achieve high z-values through sheer sample size. These effects matter at population scale.
Default interventions dominate. Of the 32 large-effect papers, the majority use default manipulations (opt-in vs. opt-out, pre-selected options). Defaults remain the most powerful nudge technique.
Food domain is well-represented. Many large-effect studies involve food choice (portion size, glass shape, bowl size, menu defaults), where environmental cues strongly influence consumption.
Interpretation requires context. A z-statistic of 15 could reflect either a highly impactful intervention or a trivially small effect detected with massive precision. Understanding the pattern is essential for policy translation.
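To make this concrete, one can ask what effect size a given z implies at different sample sizes. The sketch below inverts the large-sample approximation z = d / sqrt(a + b * d^2), with a = (n1 + n2) / (n1 * n2) and b = 1 / (2 * (n1 + n2)), which gives d = z * sqrt(a / (1 - z^2 * b)). Both the approximation and the name `implied_d` are assumptions made for illustration, not part of the meta-analysis.

```python
import math


def implied_d(z: float, n_control: int, n_intervention: int) -> float:
    """Effect size d that would produce a given z under the assumed SE formula."""
    n_sum = n_control + n_intervention
    a = n_sum / (n_control * n_intervention)
    b = 1 / (2 * n_sum)
    if z**2 * b >= 1:
        # Under this formula |z| can never reach sqrt(2 * n_sum), however large d is.
        raise ValueError("no finite d yields this z at these sample sizes")
    return z * math.sqrt(a / (1 - z**2 * b))


# The same z = 15 at two very different (hypothetical) sample sizes per group.
for n_per_group in (100, 100_000):
    d = implied_d(15, n_per_group, n_per_group)
    print(f"n = {n_per_group:>7,} per group: z = 15 implies d = {d:.3f}")
```

With 100 participants per group, z = 15 corresponds to an enormous effect (d above 3); with 100,000 per group, the same z corresponds to d of about 0.07. The group sizes here are hypothetical and chosen only to show the contrast.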
OSF Project: The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains
Citation: Mertens, S., Herberz, M., Hahnel, U.J.J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. PNAS.