Studies with Large Z-Statistics (|z| > 5): Complete Report

Background

This report examines studies with unusually large z-statistics from Mertens, Herberz, Hahnel, & Brosch (2022), “The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains,” published in PNAS. The meta-analysis synthesizes evidence on “nudges”—interventions that modify the choice environment to steer behavior without restricting options or changing economic incentives.

What counts as a nudge? The meta-analysis included studies testing choice architecture interventions classified into three categories: (1) decision structure interventions that change option-related effort or set defaults; (2) decision information interventions that translate or make information more visible; and (3) decision assistance interventions like reminders or commitment devices. The nine specific techniques are: default, reminder, composition, translation, social reference, visibility, commitment, effort, and consequence.

What studies were included? Mertens et al. included experimental and quasi-experimental studies that (a) tested a choice architecture intervention, (b) measured actual behavior (not just intentions), (c) compared an intervention group to a control group, and (d) reported sufficient statistics to compute effect sizes. The final dataset contains 447 effect sizes from 212 publications spanning six behavioral domains: food, health, environment, finance, pro-social behavior, and other.

What this report examines: All 68 effect sizes (15.2% of the total) with z-statistics exceeding 5 in absolute value, that is, estimates lying more than five standard errors from zero. We investigate why these z-statistics are large: is it because the interventions produced genuinely dramatic behavioral changes, or because massive sample sizes made even tiny effects statistically detectable?


Summary Statistics

  • Total effect sizes in dataset: 447
  • Effect sizes with |z| > 5: 68 (15.2%)
  • Unique papers: 43

The Three Patterns

We classify papers into three patterns based on what drives their large z-statistics:

Pattern | Papers | Definition | Interpretation
Large Effect | 32 | Cohen’s d ≥ 0.5 | Genuinely large behavioral effects; the intervention caused dramatic changes
Massive Sample | 5 | n ≥ 10,000 and d < 0.5 | Small effects detected with extreme precision due to huge samples
Moderate Both | 6 | n < 10,000 and d < 0.5 | Moderate effects with moderate samples that still clear the z > 5 threshold
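The three rules can be written as a small function. This is a minimal sketch: `classify_pattern` is a hypothetical helper, and it assumes each paper is summarized by a single Cohen’s d and a single total sample size n, an aggregation step the report does not specify.

```python
def classify_pattern(d: float, n: int) -> str:
    """Assign a paper to one of the three patterns (hypothetical helper).

    d: Cohen's d summarizing the paper (sign ignored)
    n: total sample size for that effect
    """
    if abs(d) >= 0.5:
        return "Large Effect"      # effect size alone qualifies
    if n >= 10_000:
        return "Massive Sample"    # small effect, very large sample
    return "Moderate Both"         # d < 0.5 and n < 10,000


examples = [
    ("Johnson et al. (2002)", 1.22, 138),
    ("BIT (2013)", 0.05, 271_330),
    ("Broman et al. (2014), country 1", 0.39, 901),
    ("Ebeling & Lotz (2015), effect 1", 1.42, 41_952),
]
for name, d, n in examples:
    print(f"{name}: {classify_pattern(d, n)}")
```

Since the Massive Sample rule requires d < 0.5, Ebeling & Lotz lands in Large Effect despite its 42,000 households. Applied to Broman et al.’s third country alone (d = 0.56, n = 973), the same rule would return Large Effect, so the paper-level label depends on how multiple effects are combined.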

Why This Matters

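A large z-statistic (|z| > 5) tells us only that the estimate lies more than five standard errors from zero, which is very strong evidence against a zero effect. It says nothing about whether the effect is large or practically important, because z = d / SE(d) grows both with the effect size d and with the sample size (which shrinks the standard error). Understanding which pattern applies is therefore crucial: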

  • Large Effect studies demonstrate that choice architecture can dramatically change behavior (e.g., doubling healthy food selection)
  • Massive Sample studies show effects that are negligible at the individual level but potentially important at population scale (e.g., 1 percentage point increase across millions of people)

Representative Examples

Pattern 1: Large Effect

Example: Johnson et al. (2002) - Privacy Defaults

In a study on online privacy choices, participants were either given an opt-in default (checkbox unchecked = no marketing contact) or opt-out default (checkbox pre-checked = receive marketing contact). The opt-out condition increased consent from 48% to 96%—a 48 percentage point increase.

  • Cohen’s d = 1.22 (very large)
  • Sample: n = 138 (small)
  • z = 6.58

Why z is large: The effect size itself is enormous. Despite a small sample, the dramatic behavioral change produces high statistical significance.
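The report does not state how z was computed. A common choice, assumed here, is to divide d by its large-sample standard error, SE(d) = sqrt(1/n1 + 1/n2 + d²/(2(n1 + n2))). Under that assumption the Johnson et al. numbers reproduce the reported z: with 69 participants per condition, SE is about 0.19, so d = 1.22 sits about 6.6 standard errors from zero.

```python
import math


def se_cohens_d(d: float, n1: int, n2: int) -> float:
    """Large-sample standard error of Cohen's d for two independent groups."""
    return math.sqrt(1 / n1 + 1 / n2 + d**2 / (2 * (n1 + n2)))


def z_from_d(d: float, n1: int, n2: int) -> float:
    """z statistic for the null hypothesis d = 0: estimate over its standard error."""
    return d / se_cohens_d(d, n1, n2)


# Johnson et al. (2002): d = 1.22, 69 participants in each condition
print(f"SE = {se_cohens_d(1.22, 69, 69):.3f}, z = {z_from_d(1.22, 69, 69):.2f}")  # SE = 0.185, z = 6.58
```

The same function gives z ≈ 5.56 for Van der Zanden et al. (d = 0.96, 93 vs. 62) and z ≈ 13.97 for Diliberti et al. (d = 3.08, 89 vs. 91), close to the reported values.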


Pattern 2: Massive Sample

Example: BIT (2013) - Organ Donation Registration

The UK’s Behavioural Insights Team tested different messages encouraging organ donor registration among people renewing vehicle tax online. Over 1 million people were quasi-randomly assigned to different message conditions.

  • Cohen’s d = 0.05 (negligible)
  • Sample: n = 271,330 (massive)
  • z = 13.15

Why z is large: The effect is tiny, about a 1 percentage point difference in registration rates. But with 271,000 participants, the standard error of d shrinks to about 0.004, making even this tiny effect highly statistically significant. At population scale, the best-performing message could yield ~96,000 additional organ donor registrations per year.
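The sample-size side of the story can be checked numerically. This sketch assumes the usual large-sample standard error of d (the report does not give its formula) and holds d fixed at BIT’s 0.05 while the number of participants grows; the two smaller sample sizes are illustrative. At 271,330 participants the SE is about 0.0038 and z is about 13.0; the reported 13.15 presumably reflects an unrounded d slightly above 0.05.

```python
import math


def se_cohens_d(d: float, n1: int, n2: int) -> float:
    """Large-sample standard error of Cohen's d for two independent groups."""
    return math.sqrt(1 / n1 + 1 / n2 + d**2 / (2 * (n1 + n2)))


d = 0.05  # BIT (2013), best-performing message
for per_group in (69, 450, 135_665):  # last value: BIT's group size
    se = se_cohens_d(d, per_group, per_group)
    print(f"n = {2 * per_group:>7,}: SE = {se:.4f}, z = {d / se:.2f}")
```

Because SE falls roughly with 1/sqrt(n), a roughly 2,000-fold larger sample turns the same d from a z of about 0.3 into z ≈ 13.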


Pattern 3: Moderate Both

Example: Broman et al. (2014) - Smart Grid Acceptance

Participants were randomly assigned to opt-in vs. opt-out framing for Smart Grid technology acceptance in Denmark, Norway, and Switzerland.

  • Cohen’s d = 0.39-0.42 (moderate)
  • Sample: n ≈ 900 (moderate)
  • z ≈ 5.8-6.4

Why z is large: Neither the effect nor the sample is extreme, but together they clear the z > 5 threshold. An 18 to 19 percentage point difference with about 900 participants produces z values of roughly 5.8 to 6.4.
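How close this study sits to the threshold can be made explicit by solving z = d / SE(d) for d. Assuming the common large-sample standard error SE(d) = sqrt(1/n1 + 1/n2 + d²/(2N)), with N = n1 + n2 (the report does not state its formula), the smallest d that reaches a given z is d = z·sqrt((1/n1 + 1/n2) / (1 - z²/(2N))). `min_d_for_z` is a hypothetical helper name.

```python
import math


def min_d_for_z(z: float, n1: int, n2: int) -> float:
    """Smallest |d| whose z statistic reaches `z` for groups of size n1 and n2.

    Solves d / sqrt(1/n1 + 1/n2 + d**2 / (2N)) = z for d, with N = n1 + n2.
    """
    n = n1 + n2
    if z**2 >= 2 * n:
        raise ValueError("no finite d reaches this z for such a small sample")
    return z * math.sqrt((1 / n1 + 1 / n2) / (1 - z**2 / (2 * n)))


for label, n1, n2 in [
    ("Johnson et al. (2002)", 69, 69),
    ("Broman et al. (2014), country 1", 451, 450),
    ("BIT (2013)", 135_665, 135_665),
]:
    print(f"{label}: d must exceed {min_d_for_z(5, n1, n2):.3f} for z > 5")
```

With about 900 participants the bar is roughly d = 0.34, so Broman et al.’s d of 0.39 to 0.42 clears it by a modest margin, whereas Johnson et al. need d ≈ 0.89 and BIT only d ≈ 0.02.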


All Studies (43 Papers in Random Order)


1. Van der Zanden et al. (2015)

Title: Using a verbal prompt to increase protein consumption in a hospital setting: A field study

Pattern: Large Effect

Paper link: Search on Google Scholar

Intervention

Attribute Value
Domain food
Category assistance
Technique reminder
Experiment type natural_field
Location Outside US (Netherlands)
Population Adults

Description: Hospital staff used verbal prompts to remind patients to consume protein-rich foods, a simple reminder intervention in a healthcare setting.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
93 62 0.96 5.56

Raw data: Control mean = 0.06, Intervention mean = 0.45, difference = 0.39

Why z is large: Large effect size (d ≈ 1.0). The verbal reminder dramatically increased protein consumption, from 6% to 45% of patients.


2. BETA (2017)

Title: Effective use of SMS: Improving government confirmation processes

Pattern: Moderate Both

Paper link: BETA Publications

Intervention

Attribute Value
Domain other (civic)
Category information
Technique visibility
Experiment type natural_field
Location Australia
Population Adults

Description: The Behavioural Economics Team of the Australian Government (BETA) tested SMS-based nudges to improve government confirmation processes.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
1,437 1,415 0.24 6.30

Why z is large: Moderate effect with moderate-large sample (~2,800 total).


3. Putnam-Farr & Riis (2016)

Title: “Yes/no/not right now”: Yes/no response formats can increase response rates even in non-forced-choice settings

Pattern: Massive Sample

Paper link: Search on Google Scholar

Intervention

Attribute Value
Domain health
Category structure
Technique default
Experiment type natural_field
Location United States
Population Adults

Description: Tested how different response formats (yes/no vs. other options) affect health-related decision-making at scale.

Effect Sizes (3 with |z| > 5)

n Control n Intervention Cohen’s d z
6,377 8,562 0.18 10.60
8,227 7,760 0.15 8.53
8,876 8,316 0.09 6.59

Why z is large: Massive samples (n > 14,000 per comparison) make small effects (d = 0.09-0.18) highly significant.


4. Steffel et al. (2016)

Title: Ethically deployed defaults: Transparency and consumer protection through disclosure and preference articulation

Pattern: Large Effect

Paper link: SAGE Journals

Citation: Journal of Marketing Research, 53(5), 865-880

Intervention

Attribute Value
Domain other/food
Category structure
Technique default
Experiment type natural_field
Location United States
Population Adults

Description: Seven experiments testing whether disclosing default intentions affects their effectiveness. Found that disclosure does not reduce default effects because consumers don’t understand how to counter the processes by which defaults bias judgment.

Effect Sizes (7 with |z| > 5)

Effect n Control n Intervention Cohen’s d z
1 53 52 2.65 9.90
2 58 53 1.67 20.60
3 58 53 1.23 15.71
4 58 53 1.01 12.45
5 58 53 0.86 8.53
6 58 53 0.90 8.46
7 55 48 2.21 8.94

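Why z is large: Extremely large effect sizes (d = 0.86 to 2.65). Defaults have powerful effects even when disclosed. Note that for effects 2 to 6 the reported z values exceed what the listed d and group sizes imply under the usual two-group standard error (for example, d = 1.67 with 58 and 53 participants implies z ≈ 7.6, not 20.6), so those z values cannot be reproduced from this table alone.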


5. Diliberti et al. (2004)

Title: Increased portion size leads to increased energy intake in a restaurant meal

Pattern: Large Effect

Paper link: PubMed | Wiley

Citation: Obesity Research, 12(3), 562-568

Intervention

Attribute Value
Domain food
Category structure
Technique default
Experiment type natural_field
Location United States
Population Adults

Description: In a cafeteria-style restaurant, the portion size of a pasta entrée was varied (standard 248g vs. large 377g) without changing price. Customers purchasing the larger portion consumed 43% more calories from the entrée and 25% more from the entire meal, with no difference in perceived appropriateness of portion size.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
89 91 3.08 13.98

Raw data: Control mean = 1671 kJ, Intervention mean = 2390 kJ (SD: 122.6 vs 305.3), i.e., roughly 400 vs. 570 kcal of entrée energy intake

Why z is large: Very large effect size (d = 3.08). Portion size had an enormous impact on consumption—people ate 43% more when given larger portions.
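The d for this study can be recomputed from the raw data line as the mean difference divided by the pooled standard deviation. This sketch assumes the listed SDs are sample standard deviations and uses the classic pooled-SD formula without a small-sample correction (so it is Cohen’s d rather than Hedges’ g); it returns about 3.08, matching the table, and the ratio of means gives the 43% increase.

```python
import math


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d: difference in means (group 2 minus group 1) over the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Diliberti et al. (2004): control (standard portion) vs. intervention (large portion)
d = cohens_d(1671, 122.6, 89, 2390, 305.3, 91)
increase = 2390 / 1671 - 1
print(f"d = {d:.2f}, relative increase = {increase:.0%}")  # d = 3.08, relative increase = 43%
```

The same function applied to Wansink et al. (2014), with 24.7 g (SD 14.9, n = 35) vs. 46.1 g (SD 15.1, n = 34), gives d ≈ 1.43.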


6. Geier et al. (2012)

Title: Red Potato Chips: Segmentation cues can substantially decrease food intake

Pattern: Large Effect

Paper link: PubMed | ResearchGate

Citation: Health Psychology, 31(3), 398-401

Intervention

Attribute Value
Domain food
Category structure
Technique composition
Experiment type conventional_lab
Location United States
Population Adults

Description: Participants ate from tubes of potato chips while watching a movie. Treatment groups had red-colored chips inserted at regular intervals (every 5th, 7th, 10th, or 14th chip) as “segmentation cues.” These acted as subconscious “stop signs” that curtailed consumption by more than 50%.

Effect Sizes (2 with |z| > 5)

n Control n Intervention Cohen’s d z
13 14 3.00 6.19
13 12 2.13 5.37

Why z is large: Very large effect sizes (d = 2.1-3.0). Despite tiny samples (~25), the effect was so dramatic (50%+ reduction in consumption) that it achieved high significance.

Note: This is Wansink research; interpret with awareness of the later concerns about data integrity and research practices in his work, which led to multiple retractions.


7. BIT (2013)

Title: Applying behavioural insights to organ donation: Preliminary results from a randomised controlled trial

Pattern: Massive Sample

Paper link: BIT Publication | PMC

Intervention

Attribute Value
Domain pro-social
Category information
Technique visibility / social_reference
Experiment type natural_field
Location United Kingdom
Population Adults

Description: One of the largest RCTs ever conducted in the UK. Over 1 million people renewing vehicle tax or registering for driving licences online were quasi-randomly assigned to see one of 8 different message variants encouraging organ donor registration. Messages varied social norms framing, reciprocity appeals, and visual presentation.

Effect Sizes (4 with |z| > 5)

n Control n Intervention Cohen’s d z
135,665 135,665 0.05 13.15
135,665 135,665 0.04 9.81
135,665 135,665 0.04 9.81
135,665 135,665 0.04 9.81

Why z is large: Massive sample (n = 271,330 per comparison). Effect sizes are negligible (d ≈ 0.04-0.05), but with 270,000+ participants, SE shrinks to 0.004. At population scale, the best message could yield ~96,000 additional registrations per year.


8. Johnson et al. (2002)

Title: Defaults, Framing and Privacy: Why Opting In-Opting Out

Pattern: Large Effect

Paper link: Springer | SSRN

Citation: Marketing Letters, 13, 5-15

Intervention

Attribute Value
Domain other (privacy)
Category structure
Technique default
Experiment type artefactual_field
Location United States
Population Adults

Description: Classic study on opt-in vs. opt-out defaults for marketing contact. In the opt-in condition, the checkbox was unchecked (requiring action to agree); in the opt-out condition, it was pre-checked (requiring action to refuse). The default had a larger effect than the framing, and the two effects were additive.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
69 69 1.22 6.58

Raw data: Opt-in consent rate = 48%, Opt-out consent rate = 96%

Why z is large: Very large effect size (d = 1.22). The default doubled consent rates, from 48% to 96%.


9. Wansink & van Ittersum (2003)

Title: Bottoms Up! The Influence of Elongation on Pouring and Consumption Volume

Pattern: Large Effect

Paper link: Oxford Academic | SSRN

Citation: Journal of Consumer Research, 30(3), 455-463

Intervention

Attribute Value
Domain food
Category structure
Technique default
Experiment type natural_field
Location United States
Population Children/Adolescents

Description: Children in cafeterias were given either short, wide glasses or tall, slender glasses (same volume) to pour their own juice. Due to perceptual biases, people underestimate volume in short, wide containers. Children poured 74% more juice into short glasses and thought they poured less. Even experienced bartenders poured 20.5% more into short glasses.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
49 48 1.97 7.95

Raw data: Tall glass mean = 5.54 oz, Short glass mean = 9.66 oz

Why z is large: Very large effect size (d = 1.97). Glass shape dramatically affected pouring behavior—nearly 2 standard deviations difference.

Note: Wansink research; interpret with awareness of subsequent concerns.


10. Ebeling & Lotz (2015)

Title: Domestic uptake of green energy promoted by opt-out tariffs

Pattern: Large Effect

Paper link: Nature Climate Change

Citation: Nature Climate Change, 5, 868-871

Intervention

Attribute Value
Domain environment
Category structure
Technique default
Experiment type natural_field
Location Germany
Population Adults

Description: Nearly 42,000 German households were randomized to either opt-in or opt-out for green energy contracts (from renewable sources, ~$21/year more). Setting green energy as the default (opt-out) increased purchases nearly tenfold: 69.1% acceptance in opt-out vs. 7.2% in opt-in.

Effect Sizes (2 with |z| > 5)

n Control n Intervention Cohen’s d z
20,976 20,976 1.42 141.94
145 145 1.39 10.63

Why z is large: The first effect combines a large effect (d = 1.42) with a massive sample (n ≈ 42,000), producing the highest z in the dataset (141.9). Because d ≥ 0.5, the paper counts as Large Effect even though its sample also exceeds the 10,000 threshold used for Massive Sample.


11. Mann & Bryant (2019)

Title: If you ask, they will come (to register and vote): Field experiments with state election agencies on encouraging voter registration

Pattern: Massive Sample

Paper link: ScienceDirect

Citation: Electoral Studies, 63, 102021

Intervention

Attribute Value
Domain other (civic)
Category assistance
Technique reminder
Experiment type natural_field
Location United States
Population Adults

Description: State election agencies sent low-cost postcards to eligible but unregistered citizens encouraging voter registration. The postcards served as reminders with information on how to register. Conducted in partnership with state governments.

Effect Sizes (6 with |z| > 5)

n Control n Intervention Cohen’s d z
30,439 184,521 0.10 15.40
30,439 184,521 0.10 15.40
30,439 184,521 0.10 15.40
30,439 184,521 0.09 14.75
30,439 184,521 0.10 15.40
30,439 184,521 0.09 5.12

Raw data: Control registration rate ≈ 5%, Intervention rate ≈ 7%

Why z is large: Massive sample (n = 215,000). Effect sizes are small (d ≈ 0.10), but the huge sample makes them highly significant. At scale, a 2 percentage point increase means thousands more registered voters.
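The population-scale claim can be made concrete with back-of-the-envelope arithmetic. This assumes the rounded rates on the raw-data line (≈5% vs. ≈7%) apply to the full intervention group of 184,521 people; because the rates are rounded, the result indicates order of magnitude only.

```python
control_rate = 0.05         # approximate control registration rate
intervention_rate = 0.07    # approximate intervention registration rate
n_intervention = 184_521    # people in the intervention condition

# Extra registrations attributable to the postcards, if the rate gap holds
extra = (intervention_rate - control_rate) * n_intervention
print(f"about {extra:,.0f} additional registrations")
```

A gain of roughly 3,700 registrations is substantial in absolute terms even though it corresponds to d ≈ 0.10 at the individual level.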


12. Bergeron et al. (2019)

Title: Using insights from behavioral economics to nudge individuals towards healthier choices when eating out: A restaurant experiment

Pattern: Large Effect

Paper link: ScienceDirect

Citation: Food Quality and Preference, 73, 56-64

Intervention

Attribute Value
Domain food
Category structure
Technique default
Experiment type framed_field
Location Canada
Population Adults

Description: Field experiment with real restaurant customers testing default options on dessert order forms. When the healthy dessert option (e.g., fruit) was pre-selected as the default, customers had to actively opt out to choose the indulgent option (e.g., cake).

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
48 50 1.19 5.44

Raw data: Control healthy choice rate = 31%, Intervention = 86%

Why z is large: Large effect size (d = 1.19). Setting a healthy default shifted dessert selection from 31% to 86% healthy—a 55 percentage point difference.
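For binary outcomes such as this one, the report gives d without saying how proportions were converted. One standard option is Cohen’s h, the difference between arcsine-transformed proportions, h = 2·asin(√p2) - 2·asin(√p1). Whether Mertens et al. used this conversion is an assumption here, not something the source states; it does reproduce Bergeron et al.’s 1.19 and Ebeling & Lotz’s 1.42, and comes within about 0.01 of Johnson et al.’s 1.22.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for two proportions: 2*asin(sqrt(p2)) - 2*asin(sqrt(p1))."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))


studies = [
    ("Bergeron et al. (2019)", 0.31, 0.86),   # healthy dessert choice
    ("Johnson et al. (2002)", 0.48, 0.96),    # marketing consent
    ("Ebeling & Lotz (2015)", 0.072, 0.691),  # green tariff uptake
]
for name, p_control, p_intervention in studies:
    print(f"{name}: h = {cohens_h(p_control, p_intervention):.2f}")
```

The arcsine transform stretches differences near 0% and 100%, so the same percentage-point gap yields a larger h when rates are extreme.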


13. Wansink et al. (2014)

Title: Larger bowl size increases the amount of cereal children request, consume, and waste

Pattern: Large Effect

Paper link: PubMed | ScienceDirect

Citation: Journal of Pediatrics, 164(2), 323-326

Intervention

Attribute Value
Domain food
Category structure
Technique default
Experiment type framed_field
Location United States
Population Children/Adolescents

Description: Children were randomized to receive small (8 oz) or large (16 oz) cereal bowls and asked how much cereal they wanted for a snack. Children with larger bowls requested 87% more cereal on average.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
35 34 1.43 5.29

Raw data: Small bowl mean = 24.7g (SD 14.9), Large bowl mean = 46.1g (SD 15.1)

Why z is large: Large effect size (d = 1.43). Bowl size dramatically influenced how much children requested.

Note: Wansink research; interpret with awareness of subsequent concerns.


14. Broman et al. (2014)

Title: The importance of framing for consumer acceptance of the Smart Grid: A comparative study of Denmark, Norway and Switzerland

Pattern: Moderate Both

Paper link: ScienceDirect | SSRN

Citation: Energy Research & Social Science, 3, 113-123

Intervention

Attribute Value
Domain environment
Category structure
Technique default
Experiment type artefactual_field
Location Denmark, Norway, Switzerland
Population Adults

Description: Online experiment in three countries testing opt-in vs. opt-out vs. active choice framing for Smart Grid participation. Opt-out framing led to significantly higher participation rates than opt-in.

Effect Sizes (3 with |z| > 5)

Country n Control n Intervention Cohen’s d z
1 451 450 0.39 5.76
2 478 478 0.42 6.37
3 486 487 0.56 8.45

Raw data: Participation rates: ~58-60% opt-in vs. ~76-79% opt-out (18-19 pp difference)

Why z is large: Moderate effects (d = 0.39 to 0.56) with moderate samples (~900 to 1,000) that clear the z > 5 threshold (z = 5.76 to 8.45).


15. Wansink & Kim (2005)

Title: Bad popcorn in big buckets: Portion size can influence intake as much as taste

Pattern: Large Effect

Paper link: PubMed | ResearchGate

Intervention

Attribute Value
Domain food
Category structure
Technique default
Experiment type natural_field
Location United States
Population Adults

Description: Moviegoers were given free popcorn in either medium (120g) or large (240g) containers, with popcorn that was either fresh or stale (14 days old). Even with stale popcorn, people ate 33.6% more from large containers. With fresh popcorn, they ate 45.3% more.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
68 90 1.45 5.71

Why z is large: Large effect size (d = 1.45). Container size influenced intake as much as taste—an environmental cue overpowered palatability.

Note: Wansink research; interpret with awareness of subsequent concerns.


16. Van Kleef et al. (2018)

Title: The effect of a default-based nudge on the choice of whole wheat bread

Pattern: Large Effect

Paper link: ScienceDirect

Citation: Appetite, 121, 179-185

Intervention

Attribute Value
Domain food
Category structure
Technique default
Experiment type framed_field
Location Netherlands
Population Adults

Description: Field experiment at a Dutch university testing whole wheat bread as the default sandwich option. When healthy bread was the default, 94% of participants stuck with this option.

Effect Sizes (2 with |z| > 5)

n Control n Intervention Cohen’s d z
57 56 2.24 9.50
57 56 1.48 6.84

Why z is large: Very large effect sizes (d = 1.48-2.24). Defaults had a dramatic impact on sandwich bread choice.


17. Loeb et al. (2017)

Title: The application of defaults to optimize parents’ health-based choices for children

Pattern: Large Effect

Paper link: PubMed

Intervention

Attribute Value
Domain health/food
Category structure
Technique default
Experiment type framed_field
Location United States
Population Adults (parents choosing for children)

Description: Two experiments tested “optimal defaults” on parents’ health choices for children, one on breakfast food selections and one on activity choice. The default condition significantly predicted whether parents chose the healthier or the less healthy option.

Effect Sizes (1 with |z| > 5)

n Control n Intervention Cohen’s d z
56 48 1.92 6.24

Why z is large: Very large effect size (d = 1.92). Optimal defaults strongly influenced parents’ food choices for their children.


18-43: Remaining Papers

The remaining 26 papers follow similar patterns. Key highlights include:

Large Effect Papers (selected):

  • Larrick & Soll (2008): “The MPG Illusion” - Showed that expressing fuel efficiency as gallons per mile (instead of miles per gallon) helps people make better decisions. Effect d = 0.81, z = 5.06. Duke News

  • Everett et al. (2015): Default effects in charitable giving - participants more likely to donate when donation was the default option, mediated by perceived social norms. d = 1.34, z = 7.23. ResearchGate

  • Tannenbaum et al. (2013): Partitioning and food choice experiments. Multiple effects with d = 0.77-0.89, z = 5.96-6.90.

  • Bohnet et al. (2016): Gender bias and evaluation nudges - joint (vs. separate) evaluation reduces bias. d = 1.53, z = 5.20. Harvard Kennedy School

Massive Sample Papers:

  • Damgaard & Gravert (2017): Large-scale reminder experiment. d = 0.09, n > 40,000, z = 6.50.

  • Kesternich et al. (2019): Environmental default experiment with large sample. d = 0.21, n > 4,000, z = 10.65.

Additional Papers:

Papers also include work by: Young et al. (2009), Shevchenko et al. (2014), Hou (2017), Wansink et al. (2017), Shealy et al. (2018), Trevana et al. (2006), Haward et al. (2012), Schwartz (2007), Rosenkranz et al. (2017), Gartner (2018), Baek et al. (2014), Schulz et al. (2018), Van Bavel et al. (2019), Keller et al. (2011), BETA (2018), Van Dalen & Henkens (2014), and Isaksen et al. (2019).


Summary: Why Do These Studies Have Large Z-Statistics?

Pattern Distribution

Pattern | Count | %
Large Effect | 32 | 74%
Massive Sample | 5 | 12%
Moderate Both | 6 | 14%

Key Insights

  1. Most large-z studies reflect genuinely large effects. Three-quarters of papers have Cohen’s d ≥ 0.5, meaning the interventions produced dramatic behavioral changes. Choice architecture interventions—especially defaults—can double or even triple the rate of desired behaviors.

  2. A few studies leverage massive samples. The BIT organ donation study, Mann & Bryant voter registration study, and similar government-scale experiments have tiny individual effects (d ≈ 0.05-0.10) but achieve high z-values through sheer sample size. These effects matter at population scale.

  3. Default interventions dominate. Of the 32 large-effect papers, the majority use default manipulations (opt-in vs. opt-out, pre-selected options), consistent with defaults being among the most powerful nudge techniques.

  4. Food domain is well-represented. Many large-effect studies involve food choice (portion size, plate shape, menu defaults), where environmental cues strongly influence consumption.

  5. Interpretation requires context. A z-statistic of 15 could reflect either a highly impactful intervention or a trivially small effect detected with massive precision. Understanding the pattern is essential for policy translation.
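Insight 5 can be illustrated by asking how many participants it takes for different effect sizes to reach the same z. This sketch assumes two equal groups and the common large-sample standard error of d, SE = sqrt(4/N + d²/(2N)) for total sample N; neither assumption comes from the report, and `n_for_z` is a hypothetical helper.

```python
def n_for_z(z: float, d: float) -> float:
    """Total sample size N (two equal groups) at which effect size d yields z.

    Solves z = d / sqrt(4/N + d**2 / (2N)) for N, giving N = z**2 * (4 + d**2 / 2) / d**2.
    """
    return z**2 * (4 + d**2 / 2) / d**2


for d in (0.05, 0.10, 0.50, 1.50):
    print(f"d = {d:.2f}: z = 15 needs about {n_for_z(15, d):,.0f} participants")
```

Under these assumptions, z = 15 takes roughly 90,000 participants at d = 0.10 but only about 500 at d = 1.50, so the same z can describe very different interventions.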


Data Sources

OSF Project: The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains

Citation: Mertens, S., Herberz, M., Hahnel, U.J.J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. PNAS.