The Delphi Method
Structured Expert Forecasting
1 What Is the Delphi Method?
The Delphi Method is a structured communication technique for obtaining reliable consensus estimates from a group of experts through iterative rounds of questionnaires and controlled feedback. Unlike a simple committee vote or group discussion, Delphi is designed to harness the collective wisdom of experts while systematically preventing the social dynamics that distort group judgment.
At its core, Delphi rests on a deceptively simple insight:
Forecasts from a group are generally more accurate than forecasts from individuals — but only when the group process is properly structured to avoid the pitfalls of face-to-face interaction.
2 Origins
2.1 The RAND Corporation and Cold War Forecasting
The Delphi Method was developed in the early 1950s at the RAND Corporation by Olaf Helmer and Norman Dalkey, under the project name Project DELPHI. The context was starkly practical: the U.S. military needed a way to forecast Soviet industrial capacity and estimate how many atomic bombs would be required to neutralize it. They needed expert judgment — but they also knew that face-to-face expert panels were prone to groupthink, deference to rank, and the loudest voice winning.
The solution was radical for its time: ask experts anonymously, aggregate their forecasts statistically, share each other’s reasoning without revealing who said what, and let them revise. Repeat until convergence. The method’s name points back, with deliberate irony, to one of history’s most famous — and famously ambiguous — sources of expert opinion.
2.2 A Note on the Name: Historical Oracles
2.2.1 The Oracle at Delphi
For nearly a thousand years (roughly 800 BCE to 390 CE), rulers, generals, and city-states traveled to the Temple of Apollo at Delphi, Greece, to consult the Pythia — the Oracle — before making consequential decisions. Two kings consulting the same Oracle before wars against Persia illustrate opposite ends of what we might call the useful forecasting spectrum.
Croesus — Ambiguity and Ruin. When the Lydian king Croesus (r. 560–547 BCE) asked whether he should attack Persia, the Oracle replied that if he crossed the Halys River, “a great empire would be destroyed.” He attacked. A great empire was destroyed — his own. The forecast was technically accurate but operationally useless: it provided no specific guidance on which side would win, and Croesus’s arrogant assumption that the falling empire must be Persia led him directly to ruin. This is the forecasting failure mode of strategic ambiguity — a prediction so hedged that any outcome confirms it, giving the decision-maker false confidence while providing no real information.
Leonidas — Clarity and Sacrifice. The prophecy received by King Leonidas I of Sparta (r. 489–480 BCE) before the Battle of Thermopylae was the opposite: stark, specific, and actionable. The Oracle told the Spartans that either their city would be destroyed by the Persians, or they would mourn the death of a king. Unlike Croesus, Leonidas understood the choice perfectly. He led his famous three hundred to Thermopylae knowing he was choosing his own death to save Sparta — and the delay his stand created proved decisive in the broader Greek defense.
The contrast between these two kings captures something essential about what makes a forecast valuable. Croesus received a forecast that was precise-sounding but operationally empty. Leonidas received a forecast that was explicit about the tradeoffs and enabled a clear, rational decision — even a tragic one. A good Delphi process aims for Leonidas-style outputs: forecasts that are specific enough to inform real choices, with uncertainty acknowledged rather than hidden behind ambiguity.
The forecasting lesson: the quality of expert judgment is not just about accuracy — it is about whether the forecast is structured in a way that improves the decision it is meant to inform. Aggregating multiple expert views through a structured method guards against single-oracle ambiguity, but facilitators must also ensure that the final forecast is specific and actionable enough to be useful.
2.2.2 Alexander the Great and the Gordian Knot
When Alexander the Great (r. 336–323 BCE) arrived at Gordium in 333 BCE, he was confronted with the legendary Gordian Knot — an intricate knot that, according to prophecy, only the future ruler of Asia could untie. Faced with the intractable problem, Alexander drew his sword and cut it.
For forecasters, this episode poses a useful question: when do you reframe the problem entirely? Sometimes the Delphi process itself reveals that experts are answering the wrong question — a finding as valuable as any point forecast. The facilitator’s skill in recognizing and redirecting this is critical.
2.2.3 Pop Culture: The Oracle in The Matrix (1999)
For a more recent illustration, consider the Oracle in The Matrix — a figure of genuine wisdom who tells Neo, “You’re not the One,” specifically to prevent him from making a fatal mistake based on overconfidence. She doesn’t lie; she gives the forecast that is useful given the decision context, not merely the forecast that is technically accurate.
This illustrates a principle the Delphi method takes seriously: the purpose of a forecast is to improve decisions, not merely to be correct in the abstract. How a forecast is framed and communicated to decision-makers matters just as much as its numerical accuracy.
3 The Delphi Process
3.1 The Seven-Stage Process
The standard Delphi procedure follows an iterative structure:
| Stage | Step | Description |
|---|---|---|
| 1 | Assemble Expert Panel | Recruit 5–20 experts with diverse, relevant expertise from varied locations and institutions. |
| 2 | Set Forecasting Tasks | Define forecasting tasks clearly and comprehensively. Distribute equally to all experts. |
| 3 | Initial Forecasts | Experts submit individual forecasts along with detailed qualitative justifications. |
| 4 | Compile & Summarize | Facilitator aggregates all forecasts and synthesizes qualitative reasoning into a summary. |
| 5 | Provide Feedback | Share summary statistics (median, IQR) and anonymized qualitative justifications with all experts. |
| 6 | Iterate | Experts revise their forecasts in light of group feedback. Repeat Stages 3–5 until convergence. |
| 7 | Final Forecast | Aggregate final-round forecasts (typically equal weighting) into the group forecast. |
3.2 Key Structural Features of Standard Delphi
Delphi’s effectiveness rests on four features that distinguish it from ordinary committee meetings. Each one addresses a specific failure mode of traditional group forecasting.
3.2.1 1. Anonymity
Anonymity is the foundational design choice that separates Delphi from a committee. By ensuring experts never know who submitted which forecast, the process eliminates several well-documented distortions:
- No political or social pressure — experts respond to the evidence, not to the room.
- Equal accountability — a junior analyst’s forecast receives the same weight as a senior director’s.
- No dominant personalities — the expert who speaks loudest in meetings has no structural advantage.
- No seating or seniority effects — research shows that in face-to-face meetings, estimates gravitate toward whoever sits at the head of the table or holds the highest title; anonymity removes this entirely.
3.2.2 2. Iteration
A single round of expert elicitation is just a survey. Delphi’s power comes from the structured repetition of the forecasting task across multiple rounds:
- Controlled feedback rounds — each round incorporates the information generated by the previous one.
- Experts revise with new information — exposure to the group’s range of estimates and reasoning prompts genuine reconsideration, not just social conformity.
- Consensus emerges gradually — convergence is earned through deliberation, not imposed by a vote or a chairperson’s summary.
- Reduces groupthink — because revision happens individually and asynchronously, experts cannot feed off each other’s enthusiasm or anxiety in real time.
A key practical decision in any Delphi is whether to run a single round or multiple rounds — and the right answer depends on the question and the resources available.
One-Shot (Single Round) Delphi collects expert estimates once, aggregates them statistically, and reports the result. It is fast and low-burden, but it captures only initial priors. Experts never see each other’s reasoning, so the forecast cannot benefit from information held by others in the panel. Think of it as a structured poll: better than asking one person, but not truly iterative. Use it when time is the binding constraint or when expert estimates are expected to be well-calibrated from the outset.
Academic journal peer review — the accept/revise/reject decision — is structurally a one-shot Delphi. An editor recruits 2–4 anonymous reviewers (the expert panel), each independently evaluates the manuscript and submits a written recommendation (the forecast), and the editor aggregates those judgments into a decision (statistical aggregation). Reviewers never interact, never see each other’s reports before submitting their own, and cannot revise in light of the group’s view.
It shares all the hallmarks: anonymity, independence, written justification, and aggregation by a facilitator (the editor). What it deliberately lacks is iteration — reviewers do not converge across rounds before the initial decision is made.
This is not an accident. The journal peer review system trades iterative accuracy for speed and scalability — it would be impractical to run three Delphi rounds for every manuscript. The cost is that reviewer disagreement is common, replication of peer review decisions is surprisingly low, and the final decision can hinge heavily on which two or three reviewers the editor happened to invite. These are precisely the failure modes that multi-round Delphi is designed to correct.
The “revise and resubmit” cycle does introduce some iteration — authors respond to reviewer comments, and reviewers re-evaluate — but this is iteration between author and reviewers, not among reviewers themselves. It is better understood as a form of the Estimate-Talk-Estimate variant than as standard multi-round Delphi.
Multi-Round Delphi (the standard form) runs at least two and typically three rounds. Round 1 surfaces the full range of initial estimates. The facilitator then compiles summary statistics and anonymized qualitative justifications and returns them to experts before Round 2. Experts who are outliers see where they diverge from the group and the reasoning behind the majority view — and can revise, defend, or hold firm. Round 3 (where used) typically achieves substantial convergence.
The practical guidance from the literature: two rounds capture most of the accuracy gains available from iteration. A third round adds marginal value and is worth including only when disagreement remains high after Round 2 or when the stakes of the forecast justify the additional time. Beyond three rounds, panel fatigue typically erodes the quality of responses faster than further iteration improves the forecast.
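One way to make the stopping decision concrete is to compare the spread of forecasts across successive rounds and run another round only while that spread is still shrinking meaningfully. A minimal R sketch follows; the function name and the 20% reduction threshold are illustrative assumptions, not a standard from the literature.
# Illustrative stopping rule: run another round only if the interquartile range
# has not shrunk by at least `min_reduction` relative to the previous round
another_round_needed <- function(current, previous, min_reduction = 0.20) {
  reduction <- 1 - IQR(current) / IQR(previous)
  reduction < min_reduction
}

r1 <- c(9800, 12100, 13500, 15000, 11200, 16800, 12900, 14100)  # Round 1 forecasts
r2 <- c(11500, 12300, 13100, 13800, 12000, 12700, 13300, 14200) # Round 2 forecasts
another_round_needed(r2, r1)  # FALSE: the spread fell enough to stop iterating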
3.2.3 3. Controlled Feedback
Between rounds, the facilitator does not simply report raw results — they curate and synthesize:
- Summary statistics of forecasts — typically the median, interquartile range, and full range of expert estimates.
- Qualitative justifications shared — the reasoning behind each forecast is shared anonymously, so experts can evaluate arguments on their merits.
- Facilitator highlights outliers — extreme forecasts are flagged and their justifications given special attention, since outliers sometimes reflect superior information.
- Focus attention where needed — the facilitator directs subsequent rounds toward the specific dimensions where expert disagreement is largest.
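The quantitative half of this feedback can be compiled mechanically before the facilitator layers the qualitative synthesis on top. A minimal R sketch, assuming one numeric forecast per expert; the 1.5 × IQR outlier rule is a common convention used here purely for illustration.
# Compile between-round feedback: summary statistics plus flags for outlying forecasts
compile_feedback <- function(forecasts) {
  q <- quantile(forecasts, c(0.25, 0.5, 0.75))
  iqr <- unname(q[3] - q[1])
  list(
    median   = unname(q[2]),
    iqr      = iqr,
    range    = range(forecasts),
    # Forecasts far outside the interquartile range get special attention
    outliers = forecasts[forecasts < q[1] - 1.5 * iqr | forecasts > q[3] + 1.5 * iqr]
  )
}

compile_feedback(c(9800, 12100, 13500, 15000, 11200, 16800, 12900, 24000))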
3.2.4 4. Statistical Aggregation
Rather than reaching consensus through discussion or voting, the Delphi forecast is computed statistically from the final round of individual submissions:
- Equal weights to all experts — in the absence of strong evidence that some experts are more accurate than others, equal weighting is the defensible default.
- Watch for extreme values — a single outlier can substantially shift a mean; always inspect the full distribution before reporting a final figure.
- Median is often more robust — the median is resistant to extreme values and is typically preferred over the mean as the group forecast.
- Final round = group forecast — the aggregated result of the last iteration is the Delphi forecast, not a negotiated compromise.
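The aggregation itself is a one-line computation, but the choice of statistic matters. A minimal sketch with made-up final-round values shows why the median is the safer default when a single extreme forecast is present.
# Final-round forecasts from eight experts, one of them extreme (illustrative values)
final_round <- c(11900, 12200, 12400, 12500, 12600, 12800, 13100, 30000)
mean(final_round)    # ~14,688: pulled upward by the single outlier
median(final_round)  # 12,550: robust group forecast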
3.3 Delphi Variants
The standard four-feature design above represents the “pure” Delphi. In practice, several variants have been developed that trade off some of these features for speed, flexibility, or richer qualitative output.
3.3.1 Estimate-Talk-Estimate (ETE)
The most widely used variant, Estimate-Talk-Estimate relaxes the anonymity constraint between rounds by allowing experts to interact — in person, by phone, or in a moderated discussion — before submitting revised estimates.
| Dimension | Standard Delphi | Estimate-Talk-Estimate |
|---|---|---|
| Anonymity | Full — no interaction between rounds | Partial — interaction allowed between rounds |
| Clarification Speed | Slow — written only | Fast — verbal clarification possible |
| Qualitative Depth | Moderate — written justifications | High — nuanced arguments surface more easily |
| Dominant Personality Risk | None | Moderate — can re-emerge in discussion |
| Group Enthusiasm Bias | Low | Moderate — group enthusiasm may return |
| Best Suited For | Questions where seniority/status distorts estimates | Technically complex questions needing verbal debate |
3.3.2 Policy Delphi
Rather than seeking consensus, the Policy Delphi is designed to surface and sharpen disagreement. It is used when decision-makers need to understand the full range of defensible positions on a contentious issue — not a single point forecast.
Key differences from standard Delphi: the goal is to identify the strongest arguments on all sides; outlier views are actively preserved rather than converged away; and the output is a structured map of positions rather than an aggregated number. Policy Delphi is common in public policy, environmental planning, and regulatory forecasting.
3.3.3 Real-Time Delphi
Real-Time Delphi replaces sequential rounds with a continuously updating web interface. Experts submit estimates at any time, immediately see the current distribution of all submissions, and can revise on their own schedule. This eliminates round-management overhead and dramatically reduces calendar time, but it also removes the facilitator’s ability to curate and contextualize feedback between rounds — a significant loss when qualitative reasoning is important.
3.3.4 Mini-Delphi (Estimate-Discuss-Estimate)
Sometimes called Estimate-Discuss-Estimate, the Mini-Delphi compresses the entire process into a single structured meeting. Experts submit individual estimates privately (often via cards or electronic polling), results are displayed to the group, discussion occurs, and a second round of private estimates is collected. This can be completed in under two hours and is well-suited to situations where assembling experts in one place is feasible and the forecasting question does not require extended reflection.
| Variant | Goal | Anonymity | Rounds | Best When |
|---|---|---|---|---|
| Standard Delphi | Consensus point forecast | Full | 3–5 async rounds | Status/seniority risk is high |
| Estimate-Talk-Estimate | Consensus with verbal clarification | Partial | 3–5 with discussion breaks | Technical nuance needs verbal debate |
| Policy Delphi | Map of positions | Full | 3–5 async rounds | Policy question with legitimate disagreement |
| Real-Time Delphi | Rapid consensus | Full | Continuous | Speed is paramount |
| Mini-Delphi | Quick consensus | Partial | 1 meeting, 2 estimates | Experts can meet; question is bounded |
4 The Facilitator: Critical Role
The facilitator is not a passive administrator. Research consistently finds that facilitator skill is the single largest determinant of Delphi quality. Responsibilities include:
- Process design: structuring the number of rounds, questionnaire format, and timeline.
- Feedback compilation: accurately summarizing the range of expert opinion — including minority views — without introducing bias.
- Attention direction: identifying where experts disagree most sharply and ensuring those disagreements receive adequate qualitative discussion.
- Anonymity maintenance: ensuring no expert can identify others’ responses throughout the process.
A poor facilitator can introduce systematic bias into every round, steering the final forecast toward whatever framing they chose at the outset. This is not hypothetical — it has been documented in applied settings, including the Tourism Australia case discussed below.
5 Advantages and Disadvantages
5.1 Advantages
| Advantage | Explanation |
|---|---|
| Eliminates Group Dynamics Issues | No dominant voices, seniority pressure, seating effects, or personality conflicts distorting estimates. |
| Geographic Flexibility | Experts can participate from anywhere. No travel costs, scheduling conflicts, or in-person coordination. |
| Cost-Effective | No meeting rooms or travel budgets required. Rounds can be conducted asynchronously. |
| Thoughtful Responses | Experts have time to research, reflect, and consult references before submitting forecasts. |
| Documented Process | Written justifications create a transparent audit trail and enable organizational learning over time. |
| Diverse Expertise | Easy to recruit specialists from different domains, institutions, and countries. |
5.2 Disadvantages
| Limitation | Explanation |
|---|---|
| Time-Consuming | Multiple rounds take days or weeks, compared to hours for a single group meeting. |
| Panel Fatigue | Experts may lose interest, disengage, or drop out over long multi-round processes. |
| Difficult Clarifications | Without face-to-face interaction, subtle or nuanced reasoning can be difficult to convey. |
| Facilitator Dependency | Success depends heavily on the facilitator's skill, experience, and impartiality. |
| Consensus Pressure | Pressure to converge may suppress genuinely divergent but valid minority views over time. |
| Written Communication Limits | Complex probabilistic arguments are often harder to express in writing than in conversation. |
6 Application: Simulated Delphi in R
The following example simulates a three-round Delphi process for a new product sales forecast. We generate synthetic expert responses, compute summary statistics, and visualize the convergence of expert opinion across rounds.
6.1 Simulate Expert Forecasts
library(tidyverse)  # provides tibble(), mutate(), and the ggplot2 functions used below

set.seed(42)
n_experts <- 8
true_value <- 12500 # "true" annual sales units
# Round 1: wide dispersion — experts have little shared information
round1 <- rnorm(n_experts, mean = true_value * 0.95, sd = 3500) |> round(0)
# Round 2: after feedback, experts begin to converge
round2 <- rnorm(n_experts, mean = true_value * 0.98, sd = 1800) |> round(0)
# Round 3: further convergence after second round of feedback
round3 <- rnorm(n_experts, mean = true_value * 0.99, sd = 900) |> round(0)
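# Assemble the simulated forecasts into a long-format tibble (one row per expert per round)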
delphi_df <- tibble(
Expert = rep(paste0("Expert ", 1:n_experts), 3),
Round = rep(c("Round 1", "Round 2", "Round 3"), each = n_experts),
Forecast = c(round1, round2, round3)
) |>
mutate(Round = factor(Round, levels = c("Round 1", "Round 2", "Round 3")))
6.2 Summary Statistics by Round
| Round | Min | Q1 | Median | Mean | Q3 | Max | IQR | SD |
|---|---|---|---|---|---|---|---|---|
| Round 1 | 9,899 | 11,534.00 | 13,218 | 13,414 | 14,735.75 | 17,165 | 3,201.75 | 2,528 |
| Round 2 | 9,750 | 11,944.50 | 12,766 | 13,236 | 14,920.00 | 16,366 | 2,975.50 | 2,260 |
| Round 3 | 9,984 | 10,623.75 | 12,109 | 11,800 | 12,532.00 | 13,563 | 1,908.25 | 1,375 |
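A table like the one above can be produced with a short dplyr pipeline from the delphi_df tibble built earlier; a minimal sketch:
# Summary statistics of the simulated forecasts by round (uses delphi_df from above)
delphi_df |>
  group_by(Round) |>
  summarise(
    Min    = min(Forecast),
    Q1     = quantile(Forecast, 0.25),
    Median = median(Forecast),
    Mean   = mean(Forecast),
    Q3     = quantile(Forecast, 0.75),
    Max    = max(Forecast),
    IQR    = IQR(Forecast),
    SD     = sd(Forecast)
  )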
6.3 Visualizing Convergence
ggplot(delphi_df, aes(x = Round, y = Forecast, group = Expert)) +
geom_line(alpha = 0.3, color = "steelblue", linewidth = 0.8) +
geom_point(aes(color = Expert), size = 3, alpha = 0.7) +
stat_summary(aes(group = 1), fun = median, geom = "line",
color = "firebrick", linewidth = 1.5, linetype = "dashed") +
stat_summary(aes(group = 1), fun = median, geom = "point",
color = "firebrick", size = 5, shape = 18) +
geom_hline(yintercept = true_value, color = "darkgreen",
linetype = "dotted", linewidth = 1.2) +
annotate("text", x = 0.6, y = true_value + 350,
label = "True Value", color = "darkgreen", size = 3.5, hjust = 0) +
scale_y_continuous(labels = scales::comma) +
scale_color_brewer(palette = "Set2") +
labs(
title = "Delphi Expert Forecast Convergence",
x = "Delphi Round",
y = "Forecast: Annual Sales Units",
color = "Expert"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold"))
ggplot(delphi_df, aes(x = Round, y = Forecast, fill = Round)) +
geom_boxplot(alpha = 0.7, outlier.shape = 16, outlier.size = 3) +
geom_jitter(width = 0.08, alpha = 0.6, size = 2.5, color = "gray30") +
geom_hline(yintercept = true_value, color = "darkgreen",
linetype = "dotted", linewidth = 1.2) +
annotate("text", x = 0.6, y = true_value + 350,
label = "True Value", color = "darkgreen", size = 3.5, hjust = 0) +
scale_y_continuous(labels = scales::comma) +
scale_fill_brewer(palette = "Blues") +
labs(
title = "Expert Forecast Distributions by Delphi Round",
x = "Delphi Round",
y = "Forecast: Annual Sales Units"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "none",
plot.title = element_text(face = "bold"))
6.4 Final Delphi Forecast
| Metric | Value |
|---|---|
| Round 3 Median (Delphi Forecast) | 12,109 |
| True Value | 12,500 |
| Absolute Error | 391 |
| Absolute Percentage Error | 3.13% |
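These figures follow directly from the simulated Round 3 values; a minimal sketch, using round3 and true_value from the simulation block above:
# Final Delphi forecast: the median of the last round of expert submissions
delphi_forecast <- median(round3)
abs_error <- abs(true_value - delphi_forecast)
pct_error <- 100 * abs_error / true_value
c(Forecast = delphi_forecast, AbsoluteError = abs_error, PctError = round(pct_error, 2))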
7 Case Study: Tourism Australia’s Forecasting Committee
7.1 The Setup
Australia’s tourism industry is large enough to take seriously. In 2005 alone, domestic tourism contributed an estimated $55.5 billion to the Australian economy — more than three times the contribution of international arrivals. Getting the forecast right matters enormously for airline capacity decisions, hotel investment, government infrastructure spending, and regional employment planning across the country.
To produce official demand forecasts, the Australian government established the Tourism Forecasting Committee (TFC) — an independent expert panel that published forecasts twice yearly. The panel read like a who’s who of relevant expertise: Tourism Australia, the Australian Standing Committee on Tourism, the Australian Tourism Export Council, the Department of Industry Tourism and Resources, the Australian Bankers Association, Qantas, and representatives of the hotel and property sectors.
On paper, this is close to an ideal Delphi panel: diverse expertise, private and public sector perspectives, and a structured iterative process. In practice, something went wrong.
7.2 What the Forecasts Actually Said
When Athanasopoulos and Hyndman (2008) published their statistical models of Australian domestic tourism demand and compared them to the TFC’s official forecasts, they found a stark pattern: the TFC forecasts were systematically more optimistic than the statistical models, particularly over longer horizons. This wasn’t a one-time miss — the bias was consistent and directional.
When the same authors revisited the comparison with data through 2011, the picture was damning. The published TFC forecasts had continued to be optimistic. Year after year, the committee’s expert panel predicted stronger growth in domestic visitor nights than actually materialized. The optimism wasn’t random noise — it was a structural feature of the forecasting process.
7.3 What Was at Stake
The consequences of systematic optimism in tourism forecasting are not abstract. Airlines that plan capacity around inflated visitor forecasts fly planes with empty seats. Hotels built to meet a projected surge in demand sit partially vacant. State governments fund infrastructure for a tourism boom that never arrives. Regional communities that depend on tourism revenue — and plan hiring, seasonal staffing, and local business investment accordingly — are left exposed.
This is the hidden cost of a biased forecasting process: it is not just an intellectual failure. It propagates through real capital allocation decisions and real employment outcomes across an entire industry.
7.4 Why Did It Happen? The Diagnostic Questions
Hyndman and Athanasopoulos pose six diagnostic questions about the TFC process that are worth examining carefully, because each one maps directly onto a known failure mode of judgmental forecasting:
- Although the TFC clearly states in its methodology that it produces ‘forecasts’ rather than ‘targets’, could this be a case where these have been confused?
- Are the forecasters and users sufficiently well-segregated in this process?
- Could the iterative process itself be improved?
- Could the adjustment process in the meetings be improved?
- Could it be that the group meetings have promoted optimism?
- Could it be that domestic tourism should have been considered earlier in the day?
That last question is more pointed than it sounds. If domestic tourism was consistently scheduled later in the day’s agenda — when panel members were tired, running behind, and less inclined to challenge or interrogate forecasts carefully — this represents a systematic procedural bias. The order of agenda items is not supposed to affect forecast accuracy. When it does, the process has failed.
The TFC case is not a story about bad experts. The panel members were highly qualified. It is a story about what happens when a well-intentioned process lacks the structural safeguards that make Delphi work:
Forecasts vs. targets confusion — when panel members are also advocates for the tourism industry, the line between “what we expect” and “what we want” blurs. The TFC explicitly claimed to produce forecasts, not targets, but the consistent optimism suggests otherwise.
Insufficient forecaster-user separation — several panel members represented organizations with a direct financial stake in strong tourism numbers. Qantas, hotel associations, and the tourism export council all benefit commercially from bullish demand forecasts. This is a structural conflict of interest that anonymity alone cannot fix.
Group meeting dynamics — the iterative process included group discussions, which means the social dynamics that Delphi is designed to eliminate may have re-entered through the back door. Enthusiasm is contagious in a room full of tourism stakeholders.
Agenda effects — if forecasts for the largest market segment (domestic tourism) were produced late in a long meeting day, cognitive fatigue and time pressure would reduce the scrutiny applied to those numbers.
7.5 The Broader Implication
The TFC case demonstrates a principle that graduate students in forecasting should internalize: structured expert processes are not self-correcting. A panel with good credentials, meeting regularly, following an iterative procedure, can still produce systematically biased forecasts if the incentive structure is misaligned, the facilitator lacks the authority or skill to challenge consensus, and the process does not explicitly guard against optimism.
The Delphi method offers the potential for superior group judgment. Realizing that potential requires more than assembling experts in a room — or even an asynchronous process. It requires careful attention to who the experts are, what their incentives are, how the feedback is curated, and whether someone in the room has the authority and independence to say: “This forecast looks too good. Defend it.”
8 Discussion Questions
Question 1. When are Delphi-style judgmental forecasts likely to be more or less accurate than statistical models?
More accurate when:
- No historical data exists (new products, unprecedented events) — a statistical model simply cannot be built.
- A structural break has occurred that the model cannot see (a major policy change, a new technology disrupting a market) — the model extrapolates a pattern that no longer holds, while experts can incorporate the new reality.
- Timely qualitative signals are available that precede the formal data — expert knowledge can nowcast what the statistics will confirm only later.
Less accurate when:
- Abundant, clean historical data exists and the underlying process is stable — in these cases, a well-specified statistical model will almost always outperform expert judgment because it is not subject to cognitive biases, anchoring, or wishful thinking.
- The forecasting horizon is short and the question is quantitative — statistical models have a structural advantage at one-to-four-step-ahead forecasts on time series data.
- Expert incentives are misaligned — as the Tourism Australia case demonstrates, when experts have a stake in the outcome, the Delphi process can produce systematically biased results that are worse than a naive statistical benchmark.
The general rule: Use Delphi where statistical models cannot reach. Use statistical models where they can, and treat Delphi-style expert input as a structured source of adjustments rather than a replacement.
Question 2. Suppose you are designing a Delphi study to forecast the market impact of a proposed regulation. Who should be on the panel, and how many rounds should you run?
Expert panel composition (aim for 8–12 members):
Recruit for diversity of perspective, not just seniority. A strong panel for a regulatory impact forecast might include: an economist specializing in the affected industry, a regulatory lawyer, a compliance officer from a large incumbent firm, a representative of a smaller firm that faces different compliance cost structures, a consumer advocate or behavioral economist, an academic with relevant sector expertise, and an international expert from a jurisdiction that has already implemented similar regulation. Crucially, exclude people whose organizations have a direct lobbying interest in the forecast’s outcome — or if they must be included, note the conflict explicitly and consider down-weighting their input.
Number of rounds:
Three rounds is the recommended design for a high-stakes question like this. Round 1 surfaces the full range of initial expert views and the reasoning behind them — this is especially important for regulatory questions, where experts often hold fundamentally different models of how firms and consumers respond. Round 2 provides structured feedback and focuses attention on the dimensions of largest disagreement. Round 3 achieves convergence and gives experts one final opportunity to defend outlier positions before the forecast is aggregated.
Key design consideration: Separate the quantitative forecast (e.g., percentage change in market size, job gains/losses) from the qualitative scenario assumptions (e.g., timeline for compliance, likelihood of legal challenges). Experts often agree more on mechanisms than on magnitudes.
Question 3. How can an organization keep its forecasts from turning into targets?
The Tourism Australia case is the canonical example of forecast-target confusion in practice. Several structural safeguards can prevent it:
Separate forecasters from users institutionally. The people who produce the forecast should have no financial or organizational stake in its outcome. If Qantas benefits from high tourism forecasts, Qantas should not be on the forecasting panel. If the government department responsible for tourism promotion sets growth targets, it should not also produce the official demand forecast.
Require explicit probability language. Asking experts for a most likely point estimate invites optimism bias. Asking them for an 80% prediction interval — a range they believe has an 80% probability of containing the true outcome — forces them to think about downside scenarios and acknowledge uncertainty.
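A minimal R sketch of what this elicitation yields in practice; the interval values and the median-of-endpoints aggregation rule are illustrative assumptions, not a prescribed method.
# Each expert's elicited 80% prediction interval for the quantity of interest
lower <- c(9000, 10500, 9800, 11000, 10200)   # lower bounds (illustrative values)
upper <- c(14000, 15500, 14800, 16000, 15200) # upper bounds (illustrative values)

# Aggregate endpoint by endpoint with medians for a robust group interval
c(lower = median(lower), upper = median(upper))  # 10,200 to 15,200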
Track and publish accuracy over time. When forecasters know their historical track record will be publicly compared to outcomes, the incentive to produce flattering forecasts diminishes. The TFC’s systematic optimism persisted partly because the historical record was not regularly audited against realized outcomes.
Build in a formal devil’s advocate role. Assign one panel member (rotating across rounds) the explicit responsibility of challenging the emerging consensus and articulating the strongest case for a pessimistic forecast. This institutionalizes the skeptical voice that good facilitators provide informally.
Question 4. When would you prefer the Estimate-Talk-Estimate variant over standard Delphi, and what tradeoffs does that choice introduce?
Prefer ETE when:
- The forecasting question is highly technical and experts genuinely need verbal clarification to understand each other’s assumptions — for example, forecasting the output of a complex engineering system where terminology and model assumptions differ significantly across disciplines.
- Written justifications have produced genuine confusion in Round 1 — if the facilitator’s feedback summary shows that experts are talking past each other rather than updating on each other’s reasoning, structured discussion before Round 2 may resolve the miscommunication faster than another written round.
- Time is a binding constraint — ETE can compress two written rounds into one meeting day.
Tradeoffs introduced:
- Dominant personality risk returns. The most senior or most confident expert will tend to anchor the discussion, even in a nominally structured conversation. The anonymity that protects against this in standard Delphi cannot be maintained in a face-to-face setting.
- Social convergence replaces genuine updating. Experts may revise their Round 2 estimates toward the group not because they have learned something new, but because disagreeing out loud is socially uncomfortable. This produces apparent consensus that masks real uncertainty.
- Group enthusiasm effects. In a room of domain experts discussing a topic they care about, optimism is contagious. ETE forecasts for new products and technologies have been shown to be more optimistic on average than strict Delphi forecasts on the same questions.
Question 5. What should organizations running their own expert forecasting panels take away from the Tourism Forecasting Committee case?
The TFC case offers five concrete lessons for any organization running a Delphi-style forecasting panel:
1. Audit your panel’s incentive structure before you begin. List every panel member and ask: does this person or their organization benefit financially from a high forecast? from a low forecast? If the answer is yes for a majority of members, the panel’s composition is compromised and the forecast should be treated with skepticism regardless of how rigorous the process appears.
2. Publish and review historical accuracy regularly. The TFC’s systematic optimism persisted for years in part because the forecasts were not routinely compared against realized outcomes in a public, accountable way. Every organizational forecasting panel should publish a vintage accuracy report annually — tracking not just whether they were right, but whether they were systematically biased in a particular direction.
3. Treat group meetings as a risk, not a feature. The TFC’s process included group discussion phases that likely reintroduced the social dynamics Delphi is designed to exclude. If face-to-face interaction is included, it should be structured (ETE-style) with explicit norms against anchoring to the most recent speaker’s view.
4. Agenda order is not neutral. If the most important or most contentious forecasting question is always addressed last — when participants are fatigued and time-pressured — the quality of that forecast will systematically suffer. Rotate the order, or schedule high-stakes questions for the start of the process when cognitive resources are freshest.
5. Distinguish forecasts from aspirations in your communication. Even if the internal process produces an honest forecast, the way it is communicated to stakeholders can blur the line. A tourism authority that publishes its demand forecast in a glossy promotional report is implicitly framing it as an aspiration. Consider how the forecast is packaged and whether that packaging invites the forecast-target confusion that appears to have afflicted the TFC.
8.1 Further Reading
Core Text: Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, 3rd ed. OTexts. Chapter 6: Judgmental Forecasting.
Foundational Delphi Research: Rowe, G., & Wright, G. (2001). Expert opinions in forecasting: The role of the Delphi technique. In J.S. Armstrong (Ed.), Principles of Forecasting. Springer.
Comprehensive Review: Lawrence, M., Goodwin, P., O’Connor, M., & Önkal, D. (2006). Judgmental forecasting: A review of progress over the last 25 years. International Journal of Forecasting, 22(3), 493–518.
9 Appendix: Delphi in Context — Judgmental Forecasting
The Delphi Method does not exist in isolation. It is one tool within the broader family of judgmental forecasting methods — approaches that rely on structured human expertise rather than (or in addition to) statistical models. Understanding where Delphi fits requires a brief orientation to when and why judgmental methods are used at all.
A separate lecture covers judgmental forecasting in full. This appendix provides just enough context to situate Delphi properly.
9.1 When Is Judgmental Forecasting Used?
Judgmental forecasting becomes necessary — or at least valuable — in three situations:
No Data Available. Statistical models require historical data. When none exists, expert judgment is the only option. Common examples include new product launches (no sales history), major policy changes such as plain cigarette packaging (no precedent), unprecedented market conditions, and new competitor entry. In these cases, Delphi is often the only structured forecasting method available.
Delayed Data. Sometimes data exists but arrives too late to be useful. GDP figures are published with a quarterly lag; by the time official statistics are available, decisions have already been made. Real-time expert judgment — structured through a Delphi-style process — can incorporate timely signals before the formal data catches up. This application is sometimes called nowcasting.
Statistical Adjustment Needed. Even when a well-specified statistical model is available, there are situations where expert knowledge should modify its output: special events not captured in historical data (the Olympics, a major holiday shift), promotions with no historical analog, or significant domain knowledge about a structural change in the market. Here, Delphi-style panels are often used to determine how and by how much to adjust a statistical forecast.
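A minimal sketch of that adjustment pattern in R; the baseline values and the size and timing of the panel's adjustment are illustrative assumptions:
# Statistical baseline forecast for the next four quarters (illustrative values)
baseline <- c(10200, 10500, 10800, 11100)

# A Delphi panel judges that a one-off special event will lift demand ~8% in quarter 3
adjustment <- c(1.00, 1.00, 1.08, 1.00)

baseline * adjustment  # judgmentally adjusted forecast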
Key Insight from the research literature: Judgmental forecasts that combine domain expertise with timely information consistently outperform purely statistical forecasts in data-sparse or rapidly changing environments. The Delphi method is among the most rigorously studied structures for eliciting and aggregating that expertise.
9.2 Where Delphi Fits
| Method | Primary Use | Key Strength | Key Risk |
|---|---|---|---|
| Delphi Method | Structured expert consensus — any domain | Anonymity + iteration eliminates group dynamics bias | Facilitator dependency; time-consuming |
| Forecasting by Analogy | New products/situations with identifiable historical parallels | Leverages known outcomes from comparable situations | No two situations are truly identical |
| Scenario Forecasting | Long-range planning under deep uncertainty | Explores multiple plausible futures rather than one point estimate | Scenarios may not be exhaustive or coherent |
| Sales Force Composite | New product forecasting from field-level knowledge | Closest to actual customers and market conditions | Incentive conflicts; systematic optimism |
| Executive Opinion | Strategic decisions requiring senior perspective | Fast; incorporates strategic context | Groupthink; removed from market reality |
| Customer Intentions Survey | New product demand from potential buyers | Direct market signal from the source | Stated intentions do not always predict behavior |
The first row (the Delphi Method) is the subject of this lecture. The other methods are covered in the judgmental forecasting lecture.