Judgmental Forecasting
Methods, Applications, and Pitfalls
What Is Judgmental Forecasting?
Judgmental forecasting refers to any forecasting approach that relies — wholly or in part — on structured human expertise rather than purely on statistical models fitted to historical data. It is not a single method but a family of techniques, each suited to different circumstances and each carrying its own characteristic failure modes.
The phrase is sometimes used dismissively — as a synonym for “guessing” or “gut feel.” This is a mistake. Done well, judgmental forecasting is systematic, documented, and reproducible. Done poorly, it is exactly what the critics say: a post-hoc rationalization of whatever the forecaster hoped or feared would happen. The difference between the two lies almost entirely in process design.
The goal of judgmental forecasting is not to replace statistical methods — it is to bring human knowledge to bear in the places where statistical methods cannot reach.
When Is Judgmental Forecasting Necessary?
Three situations make judgmental methods necessary or at least valuable:
No data available. Statistical models require historical data. When none exists — new product launches, major policy changes with no precedent, unprecedented market disruptions — expert judgment is the only option. No amount of statistical sophistication can substitute for information that simply does not exist.
Data exists but arrives too late. GDP is published with a quarterly lag. Retail scanner data takes weeks to aggregate. In the gap between when a decision must be made and when the formal data will be available, expert judgment structured through a systematic process can nowcast what the statistics will confirm only later.
Statistical models need adjustment. Even a well-specified model cannot anticipate everything. Special events not in the historical record (the Olympics, an unexpected natural disaster, a sudden competitor exit), structural breaks (a new technology disrupting pricing patterns), or significant domain knowledge about a change in the underlying process all represent situations where expert input should modify — not replace — the statistical forecast.
Judgmental forecasts that combine domain expertise with timely information consistently outperform purely statistical forecasts in data-sparse or rapidly-changing environments. However, unaided judgment — expert opinion with no systematic structure — is frequently worse than a simple statistical benchmark. Structure is what separates valuable judgmental forecasting from expensive noise.
The Cognitive Bias Problem
Before surveying the specific methods, it is essential to understand why judgmental forecasting is difficult. Human experts bring genuine knowledge — but they also bring systematic cognitive biases that distort their forecasts in predictable directions:
| Bias | Description | Implication for Process Design |
|---|---|---|
| Anchoring & Adjustment | Overreliance on the first number encountered; insufficient adjustment away from the initial anchor even when new information arrives. | Never anchor group discussions on a single initial estimate; elicit forecasts independently before sharing. |
| Availability Bias | Overweighting recent or particularly memorable events when estimating probabilities; ignoring base rates. | Explicitly require forecasters to consult base rates and historical distributions, not just recent memory. |
| Illusory Correlation | Perceiving causal patterns in random data; seeing trends that are statistical noise. | Require written justification for any perceived trend; subject it to statistical scrutiny. |
| Selective Perception | Unconsciously attending only to information that confirms existing beliefs; discounting contradictory evidence. | Assign a formal devil's advocate role; make disconfirming evidence a required part of the feedback package. |
| Optimism & Wishful Thinking | Systematically overestimating the probability of favorable outcomes; group settings amplify this effect. | Require explicit probability intervals; schedule pessimistic scenarios before optimistic ones. |
| Status Quo Bias | Forecasting minimal change from the current state; resistance to predicting disruption even when evidence supports it. | Frame questions around change from the baseline, not around the baseline itself. |
These biases are not signs of incompetence — they are features of human cognition that affect experts and novices alike. The methods described in this lecture are, at their core, process designs that systematically reduce the influence of these biases on the final forecast.
Forecasting by Analogy
The Core Idea
Forecasting by analogy addresses a specific problem: how do you forecast something genuinely new when you have no historical data for it specifically, but you do have data for comparable things? Rather than throwing up its hands, the analogical approach asks: what has happened in situations sufficiently similar to this one, and what does that tell us about the likely outcome here?
The method is older than formal forecasting. Property appraisers have used it for centuries — valuing a house by comparing it to similar recently-sold properties adjusted for differences in location, size, condition, and features. Financial analysts use it when they value a startup by comparing it to comparable acquisitions. Military historians use it to forecast the outcomes of conflicts by comparing them to structurally similar historical wars.
The Process
Forecasting by analogy follows a structured four-step procedure:
Step 1 — Identify Candidate Analogies
Compile a list of historical situations, products, or events that share meaningful structural features with the case you are trying to forecast. At this stage, be generous: it is better to begin with too many candidates and prune than to miss a relevant analog because it was too quickly dismissed.
For a new consumer technology product, relevant analogs might include: the introduction of the DVD player, the smartphone, the microwave oven, digital cameras, and e-readers. Each entered a market with established incumbents, required behavioral change from consumers, and had a pricing trajectory that evolved as production scaled.
Step 2 — Establish Similarities and Differences
For each candidate analog, score its similarity to the target case across the dimensions that matter most for the forecast. For a new product forecast, relevant dimensions might include: market size at launch, price point relative to substitutes, degree of behavioral change required, presence of network effects, competitive response, and regulatory environment.
Differences are as informative as similarities. If your new product requires substantially more behavioral change than the closest analog, the analog’s adoption curve should be adjusted downward accordingly.
Step 3 — Expert Evaluation
Convene a panel of experts (often using a Delphi-style process) to score each analog’s relevance and to provide their own assessment of the key differences. This step converts the qualitative similarity assessment into a structured weighting scheme.
Step 4 — Derive the Forecast
Weight each analog’s historical outcome by its assessed similarity to the target case. The resulting weighted average — adjusted for known differences — is the analogical forecast. Report it with an explicit uncertainty range based on the spread across analogies: if similar past situations produced a wide range of outcomes, the forecast uncertainty is high regardless of how precise the central estimate appears.
Strengths and Limitations
| Dimension | Strength | Limitation |
|---|---|---|
| Data requirements | Requires no historical data for the target — only for analogies | Requires good historical data for the analogous cases |
| Transparency | Explicit, auditable reasoning: the analogy and the adjustment are both documented | Similarity scoring is subjective and can be gamed |
| Handles novelty | Specifically designed for novel situations with no direct precedent | The more genuinely novel the situation, the harder it is to find valid analogies |
| Expert disagreement | Multiple analogies provide a natural sensitivity analysis | Experts may disagree sharply on which analogies are relevant |
| Key assumption | — | No two situations are truly identical — the adjustment for differences is inherently judgmental |
| Failure mode | — | Cherry-picking analogies that support a predetermined conclusion |
The deeper the historical analogy, the more seductive and the more dangerous it becomes. When a business analyst compares a new streaming platform to the early days of cable television, they are making implicit assumptions about regulatory similarity, consumer price sensitivity, competitive dynamics, and technology adoption curves — assumptions that may or may not hold. The discipline of analogical forecasting lies in making those assumptions explicit and subjecting them to scrutiny, rather than letting them operate unexamined beneath the surface of a confident forecast.
Case in point: AI infrastructure and the railroad boom. In the early 2020s, analysts forecasting investment in AI data centers, GPU clusters, and power infrastructure frequently invoked the 19th-century railroad buildout as the closest historical analog. The comparison is genuinely instructive: both involved massive upfront capital expenditure in enabling infrastructure before the killer applications were fully known, both triggered speculative over-investment followed by consolidation, and in both cases the long-run economic value accrued primarily to the users of the infrastructure rather than its builders. The railroad investors who went bankrupt in the 1870s and 1890s busts laid the tracks that made U.S. agricultural and industrial productivity possible for the next century. The analogy predicts that AI infrastructure providers may face margin compression and consolidation even as the broader economy benefits enormously.
But the analogy also has load-bearing differences that forecasters must confront explicitly. Railroads required physical right-of-way that created durable geographic monopolies — once a line was built between two cities, a competitor faced enormous barriers to entry. AI compute infrastructure has no such moat: a data center in Virginia competes with one in Oregon and one in Singapore, and the marginal cost of switching providers trends toward zero as the market matures. This single structural difference — the presence or absence of geographic monopoly — substantially changes the investment return forecast even if every other feature of the analogy holds. A forecaster who borrows the railroad analogy without flagging this difference is not doing analogical forecasting. They are telling a story.
Case in point: COVID-19 and the 1918 Spanish Flu. When epidemiologists and economic forecasters confronted COVID-19 in early 2020, the 1918 Spanish Flu was the most-cited historical analog — and with good reason. Both were novel respiratory viruses with no pre-existing population immunity, both spread globally before containment was possible, and both arrived in multiple waves with the later waves often more severe than the first. The Spanish Flu analogy was genuinely useful: it correctly predicted the wave structure, the disproportionate mortality burden on specific age groups (though the groups differed strikingly between 1918 and 2020), and the economic disruption pattern of stop-start activity as waves hit and receded.
Where the analogy misled was in the recovery timeline. The post-1918 economy rebounded into the “Roaring Twenties” — a decade of rapid growth, consumer spending, and cultural exuberance widely attributed in part to a release of suppressed demand after the pandemic. Many forecasters in 2020 and 2021 predicted a similar “Roaring Twenties” rebound for the 2020s. What materialized instead was a supply-chain crisis, a historic inflation surge, and a labor market restructuring that the 1918 analog had not anticipated — because the 1918 economy had no globally integrated just-in-time supply chains, no remote-work infrastructure, and no central banks or fiscal authorities ready to deploy multi-trillion-dollar stimulus at scale. The analog captured the epidemiological structure well and the macroeconomic recovery poorly, for precisely the reasons a careful analyst would have flagged in advance: the structural features of the two economies were fundamentally different even if the disease dynamics were comparable.
The lesson is not that the Spanish Flu analogy was wrong to use — it was the best available. The lesson is that its limits deserved as much attention as its similarities, and that the forecasters who performed best were those who explicitly modeled which features of the 1918 experience would and would not transfer to 2020 conditions.
Pop Culture: Tony Stark and the Analogical Forecast
In Iron Man (2008), Tony Stark does not build the Mark III suit from nothing — he builds it by reasoning from analogies. The Mark I (built in a cave from scrap) establishes the core proof of concept: a powered exoskeleton can fly and withstand combat. The Mark II tests the structural and aerodynamic principles at scale. Each iteration is an analogy for the next — Stark asks, “given what happened with that design, what should I expect from this one, and where will the differences matter?”
This is exactly the logic of analogical forecasting. You never have data on the thing you are building — but you have data on predecessors, and the discipline lies in being explicit about which features transfer and which do not. Stark’s near-fatal icing problem in the Mark II (the one thing the analogy did not predict) is a reminder that the differences between the analog and the target are where the forecast fails. A good analogical forecaster documents those differences as the primary source of uncertainty — not as a footnote, but as the center of the analysis.
Forecasting by Analogy Example
Task: Forecast sales for new smartphone model
library(tidyverse)  # provides tibble and the dplyr verbs used throughout
# Historical analogous products
analogies <- tibble(
product = c("Model A", "Model B", "Model C", "Model D"),
first_year_sales = c(2.5, 3.8, 2.1, 4.2), # millions
price_similarity = c(0.95, 0.80, 0.60, 0.85),
feature_similarity = c(0.90, 0.85, 0.70, 0.95),
market_similarity = c(0.85, 0.90, 0.75, 0.88)
)
# Calculate composite similarity score
analogies <- analogies %>%
mutate(
overall_similarity =
(0.4 * price_similarity +
0.4 * feature_similarity +
0.2 * market_similarity),
weight = overall_similarity / sum(overall_similarity)
)
forecast_by_analogy <- sum(analogies$first_year_sales * analogies$weight)
Weighted analogy forecast: 3.21 million units
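Step 4 also calls for an explicit uncertainty range based on the spread across analogies. A minimal sketch of one way to report that spread, continuing from the analogies tibble above (the similarity-weighted standard deviation is one rough choice, not a formal prediction interval):
# Spread of first-year sales across the analogous products
range(analogies$first_year_sales)   # 2.1 to 4.2 million units
# Similarity-weighted standard deviation around the weighted mean
w <- analogies$weight
wmean <- sum(analogies$first_year_sales * w)
wsd <- sqrt(sum(w * (analogies$first_year_sales - wmean)^2))
# Report the central estimate together with its spread
c(forecast = wmean, lower = wmean - wsd, upper = wmean + wsd)
With these illustrative numbers the reported forecast would be roughly 3.2 ± 0.9 million units, which makes the width of the analogy spread as visible as the central estimate.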
Scenario Forecasting
What Makes Scenarios Different
Scenario forecasting represents a fundamentally different approach to uncertainty. Every other method in this lecture attempts to produce one forecast — a most-likely outcome, possibly surrounded by a confidence interval. Scenario forecasting deliberately produces multiple parallel futures, each internally coherent and plausible, none of which is labeled “the forecast.”
This matters because in situations of deep uncertainty — long time horizons, disruptive technologies, volatile geopolitical environments — the honest answer to “what will happen?” is often not a point estimate with an error bar. It is a structured map of the possibility space: here are three or four genuinely different futures, each of which could plausibly materialize, and here is what drives the difference between them.
Decision-makers who receive a scenario set can test their strategies against each scenario: does our plan work only if the optimistic scenario materializes, or does it hold up reasonably well across all three? This is a fundamentally different and often more useful framing than asking “what is the most likely outcome?”
In Avengers: Infinity War (2018), Doctor Strange uses the Time Stone to view 14,000,605 possible futures before the battle against Thanos. He reports back: “We win in one.” This is scenario forecasting taken to its logical extreme — an exhaustive enumeration of the possibility space in order to identify which path leads to the desired outcome.
Several features of this scene map directly onto good scenario forecasting practice:
He doesn’t just pick the optimistic scenario. Strange reviews all futures, including 14,000,604 in which the heroes lose. A forecaster who only models the scenarios in which their preferred outcome materializes is not doing scenario analysis — they are doing wishful thinking with extra steps.
The winning scenario requires a specific sequence of actions. Strange’s conclusion is not just “we can win” — it is “we win only if events unfold in a particular way.” This is the strategic value of scenario forecasting: it reveals not just whether a favorable outcome is possible, but what conditions are necessary for it to occur. That is far more actionable than a point forecast.
He withholds the details deliberately. Strange tells Tony Stark only “one” without revealing the path — because telling Stark the full scenario would change the behavior that makes it possible. This maps onto a real forecasting challenge: when a forecast is published, it changes the behavior of the agents it describes. Scenario forecasts used for competitive strategy often cannot be shared externally without undermining the strategic advantage they identify.
The limitation: enumerating 14 million scenarios is not an option for real forecasters. The discipline of scenario forecasting is choosing the right three to five — the ones that span the key uncertainties without being so numerous that decision-makers cannot act on them. Strange had the Time Stone. Your planning team does not.
The Five-Step Process
Step 1 — Identify the Key Drivers
What are the major forces — technological, economic, political, social, environmental — that will shape the outcome over the forecast horizon? List all plausible drivers first, then prune to the handful that are both most uncertain and most impactful. The goal is to identify the two or three axes along which futures diverge most sharply.
Step 2 — Assess Their Relative Impacts
For each driver, estimate its likely range of outcomes and the impact of each range on the target variable. Some drivers are highly uncertain but low-impact; others are more predictable but decisive. Focus scenario construction on the high-uncertainty, high-impact drivers.
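A minimal sketch of this prioritization step; the driver names and 1–5 scores below are hypothetical, chosen only to show the filtering logic:
library(tidyverse)
# Hypothetical drivers scored on uncertainty and impact (1 = low, 5 = high)
drivers <- tribble(
  ~driver,                   ~uncertainty, ~impact,
  "Battery cost trajectory",            5,       5,
  "Government mandates",                4,       5,
  "Charging infrastructure",            4,       4,
  "Consumer preferences",               3,       3,
  "Oil prices",                         3,       2,
  "Population growth",                  1,       2
)
# Keep only the high-uncertainty, high-impact drivers as scenario axes
scenario_axes <- drivers |>
  filter(uncertainty >= 4, impact >= 4) |>
  arrange(desc(uncertainty * impact))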
Step 3 — Analyze Interactions Between Drivers
Drivers are rarely independent. A scenario in which interest rates spike and geopolitical risk rises simultaneously is structurally different from either shock in isolation — the combination may produce emergent dynamics (credit crunches, capital flight, supply chain disruption) that neither driver alone would generate. Good scenario construction maps the plausible interaction space and eliminates internally incoherent combinations.
Step 4 — Define the Forecast Targets
Specify precisely what will be measured in each scenario: market size, price levels, adoption rates, regulatory outcomes. This concreteness is essential — it forces the scenario narrative to be specific enough to be falsifiable, rather than a vague story about “disruption” or “turbulence.”
Step 5 — Build the Scenarios
Combine the driver outcomes into a small number (typically three to five) of coherent, internally consistent scenario narratives. Give each scenario a name that captures its essence. Assign rough probabilities if possible — but note that in situations of deep uncertainty, the probabilities themselves are highly uncertain, and the value of the scenario set lies in its coverage of the possibility space, not in any particular probability assignment.
Optimistic, Baseline, and Pessimistic: A Practical Caution
The most common scenario structure — optimistic, baseline, pessimistic — is a practical starting point but carries a hidden risk. If decision-makers consistently assume the baseline scenario will materialize and treat the others as theoretical, the exercise degenerates into a single-point forecast with decorative alternatives. Effective scenario planning requires the organization to genuinely grapple with what it would do differently in each scenario — not just acknowledge that the pessimistic scenario exists.
Worked example: the table below sketches a three-scenario set for electric vehicle (EV) market share, the same setting used in the R example that follows.
| Scenario | Key Driver Assumptions | Market Share by 2030 | Rough Probability |
|---|---|---|---|
| Accelerated Adoption | Battery costs fall 40% by 2028; charging infrastructure expands rapidly; government mandates accelerate | 45–55% | 25% |
| Moderate Growth | Battery costs fall 20% by 2028; infrastructure grows steadily; policy support continues but weakens | 25–35% | 50% |
| Delayed Breakthrough | Battery costs fall less than 10% by 2028; infrastructure rollout stalls; policy reversal in key markets | 10–18% | 25% |
R Example: Building and Visualising Scenarios
# Define scenario assumptions
scenario_params <- tibble(
scenario = c("Accelerated Adoption", "Moderate Growth", "Delayed Breakthrough"),
probability = c(0.25, 0.50, 0.25),
base_share = c(0.12, 0.12, 0.12), # current market share
annual_growth = c(0.12, 0.06, 0.01), # annual share gain
color = c("#2ca25f", "#2171b5", "#d94801")
)
# Simulate 5-year trajectory for each scenario
years <- 2025:2030
scenario_df <- scenario_params |>
rowwise() |>
reframe(
year = years,
scenario = scenario,
probability = probability,
color = color,
market_share = pmin(base_share * (1 + annual_growth)^(year - 2025), 0.75)
)
# Probability-weighted expected forecast
expected_df <- scenario_df |>
group_by(year) |>
summarise(expected = sum(market_share * probability))
# ── Plot ──────────────────────────────────────────────────────────────────────
ggplot() +
geom_line(data = scenario_df,
aes(x = year, y = market_share, color = scenario,
linetype = scenario), linewidth = 1.2) +
geom_ribbon(data = scenario_df |>
group_by(year) |>
summarise(lo = min(market_share), hi = max(market_share)),
aes(x = year, ymin = lo, ymax = hi), alpha = 0.08, fill = "steelblue") +
geom_line(data = expected_df,
aes(x = year, y = expected), color = "black",
linewidth = 1.5, linetype = "dotted") +
annotate("text", x = 2030.05, y = tail(expected_df$expected, 1),
label = "Expected\n(probability-weighted)", hjust = 0, size = 3.2) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1),
limits = c(0, 0.65)) +
scale_color_manual(values = setNames(scenario_params$color,
scenario_params$scenario)) +
scale_linetype_manual(values = c("solid", "dashed", "dotdash")) +
labs(title = "EV Battery Market Share: Scenario Forecasts 2025–2030",
subtitle = "Shaded band = full range; dotted = probability-weighted expected value",
x = NULL, y = "EV Market Share",
color = "Scenario", linetype = "Scenario") +
theme_minimal(base_size = 13) +
theme(plot.title = element_text(face = "bold"),
legend.position = "bottom")
Scenario Forecasting Framework
Build three scenarios with different assumptions:
scenarios <- tibble(
scenario = c("Optimistic", "Baseline", "Pessimistic"),
market_growth = c(0.15, 0.08, 0.02),
competitor_response = c("Weak", "Moderate", "Aggressive"),
marketing_effectiveness = c(1.2, 1.0, 0.8),
economic_conditions = c("Strong", "Stable", "Weak"),
base_demand = c(200, 150, 100)
)
# Calculate scenario forecasts
scenarios <- scenarios %>%
mutate(
forecast = base_demand * (1 + market_growth) * marketing_effectiveness,
probability = c(0.20, 0.60, 0.20) # Subjective probabilities
)
# Probability-weighted forecast
expected_forecast <- sum(scenarios$forecast * scenarios$probability)
# A tibble: 3 × 8
scenario market_growth competitor_response marketing_effectiveness
<chr> <dbl> <chr> <dbl>
1 Optimistic 0.15 Weak 1.2
2 Baseline 0.08 Moderate 1
3 Pessimistic 0.02 Aggressive 0.8
# ℹ 4 more variables: economic_conditions <chr>, base_demand <dbl>,
# forecast <dbl>, probability <dbl>
Expected (probability-weighted) forecast: 168.7
Sales Force Composite
The Logic
Sales force composite forecasting is the most grassroots of the methods covered here. Rather than assembling a panel of senior experts or industry analysts, it asks the people closest to actual customers — sales representatives, account managers, regional sales directors — to forecast demand for their specific territories, products, or customer segments. Those field-level forecasts are then aggregated upward through the organizational hierarchy into a company-wide demand forecast.
The appeal is obvious: no one knows what customers are planning to buy next quarter better than the sales representative who speaks to them every week. A regional sales manager who knows that a major account is about to expand capacity, or that a competitor has just lost a key product line, has information that no statistical model will capture until it shows up in the data months later.
The Aggregation Process
| Level | Forecasts | Information Advantage |
|---|---|---|
| Individual Sales Rep | Individual account-level and territory-level forecasts; most granular | Direct customer contact; early signals of demand changes |
| Regional Manager | Aggregates rep-level forecasts; adjusts for local market knowledge | Competitive intelligence; regional economic conditions |
| Product/Category Manager | Reconciles regional forecasts with product strategy and supply chain constraints | Portfolio effects; cross-selling and cannibalization |
| Executive | Final composite; aligned with financial planning and inventory decisions | Strategic context; macroeconomic outlook |
The Incentive Problem
Sales force composite forecasting has one deeply structural problem that no process tweak fully resolves: the people producing the forecasts have strong incentives to bias them.
Sales representatives whose territory forecast becomes their sales quota will systematically underforecast — setting a low bar they can comfortably exceed and collect bonuses against. Representatives who believe their compensation depends on management’s confidence in them may overforecast — projecting aggressive numbers to signal ambition. Either way, the forecast is no longer a genuine estimate of expected demand. It is a negotiating position.
Research consistently finds that sales force composite forecasts are systematically optimistic in new product contexts (where reps are enthusiastic and customers express interest more freely than they commit) and systematically conservative in established product contexts (where forecast = quota = performance target).
The single most important safeguard in sales force composite forecasting is institutional separation of the forecast from the performance target. If sales representatives know that their forecast will not directly determine their quota, the incentive to bias largely disappears. Many organizations achieve this by using a statistical baseline as the quota anchor and treating the sales composite as an adjustment layer rather than the primary forecast — combining the field-level intelligence with the model's systematic accuracy, as sketched below.
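A minimal sketch of the "statistical baseline plus adjustment layer" idea; the baseline numbers and field adjustments below are hypothetical, and in practice the baseline would come from a fitted time series model:
library(tidyverse)
territories <- tibble(
  territory      = c("North", "South", "East", "West"),
  stat_baseline  = c(480, 520, 455, 510),   # quota anchor from the statistical model (hypothetical)
  rep_adjustment = c(35, -10, 0, 20)        # field intelligence, expressed as a delta in units
)
# The composite forecast adjusts the baseline; the quota stays tied to the baseline
territories <- territories |>
  mutate(composite_forecast = stat_baseline + rep_adjustment)
sum(territories$composite_forecast)   # company-wide composite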
When to Use Sales Force Composite
Sales force composite is most valuable for short-horizon forecasts of established products in markets where customer relationships are deep and sales representatives genuinely have advance information about buying decisions. It degrades quickly for new products (where customers express intentions more readily than they act on them), long horizons (where field-level intelligence loses its edge over statistical models), and commoditized markets (where buyer relationships carry less information content).
R Example: Simulating and Diagnosing Sales Forecast Bias
set.seed(42)
n_reps <- 20
true_mean <- 500 # true expected sales per territory
# Reps shade forecasts ~12% below truth to protect their quota
rep_data <- tibble(
rep_id = 1:n_reps,
region = rep(c("North", "South", "East", "West"), 5),
actual = rnorm(n_reps, mean = true_mean, sd = 60),
forecast = rnorm(n_reps, mean = true_mean * 0.88, sd = 40) # downward bias
) |>
mutate(
error = forecast - actual,
pct_error = error / actual,
bias_label = if_else(error < 0, "Under-forecast", "Over-forecast")
)
# Summary
bias_summary <- rep_data |>
summarise(
Mean_Actual = round(mean(actual), 1),
Mean_Forecast = round(mean(forecast), 1),
Mean_Bias_Pct = scales::percent(mean(pct_error), accuracy = 0.1),
MAPE = scales::percent(mean(abs(pct_error)), accuracy = 0.1)
)
# ── Plot: forecast vs actual by rep ──────────────────────────────────────────
p1 <- ggplot(rep_data, aes(x = actual, y = forecast, color = bias_label)) +
geom_point(size = 3, alpha = 0.8) +
geom_abline(slope = 1, intercept = 0, linetype = "dashed",
color = "gray50", linewidth = 1) +
scale_color_manual(values = c("Under-forecast" = "#d94801",
"Over-forecast" = "#2ca25f")) +
scale_x_continuous(labels = scales::comma) +
scale_y_continuous(labels = scales::comma) +
labs(title = "Forecast vs. Actual by Sales Rep",
subtitle = "Points below the 45° line = under-forecasting",
x = "Actual Sales", y = "Forecast", color = NULL) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
# ── Plot: % error distribution ───────────────────────────────────────────────
p2 <- ggplot(rep_data, aes(x = pct_error, fill = bias_label)) +
geom_histogram(bins = 10, color = "white", alpha = 0.8) +
geom_vline(xintercept = 0, linetype = "dashed", linewidth = 1.1) +
geom_vline(xintercept = mean(rep_data$pct_error),
color = "firebrick", linetype = "solid", linewidth = 1.1) +
scale_x_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_fill_manual(values = c("Under-forecast" = "#d94801",
"Over-forecast" = "#2ca25f")) +
labs(title = "Distribution of Forecast Errors",
subtitle = "Red line = mean bias",
x = "Forecast Error (%)", y = "Count", fill = NULL) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
library(patchwork)
p1 + p2
| Mean Actual | Mean Forecast | Mean Bias | MAPE |
|---|---|---|---|
| 511.5 | 429.2 | -14.5% | 18.0% |
Executive Opinion
How It Works
Executive opinion is the most informal of the judgmental methods: senior leaders from across functions — sales, marketing, operations, finance, strategy — convene, pool their knowledge, discuss, and reach a consensus forecast. In many organizations this is the dominant forecasting method by default, particularly for strategic planning horizons where statistical models provide little guidance.
Its principal virtue is speed. A group of experienced executives can produce a forecast for a new market, a new product, or a new strategic scenario in a single meeting, without the time and administrative overhead of a Delphi process. When the question is genuinely urgent and the executives are genuinely knowledgeable, this can produce reasonable results.
The Groupthink Problem
Executive opinion’s principal vice is equally well-documented: groupthink. In a room of senior leaders, several dynamics systematically distort the forecast:
Hierarchy effects. The CEO speaks first, or speaks most confidently, and the room converges. This is not rational updating on superior information — it is social deference to authority. The CFO who has private doubts about the revenue projection may not voice them if the CEO has already committed to it publicly.
Shared mental models. Senior executives in the same organization have often worked together for years, attended the same industry conferences, and read the same trade publications. Their “independent” views may not be independent at all — they share the same anchors, the same recent experiences, and the same industry narratives. This produces false precision: apparent consensus that reflects a shared blind spot rather than genuine agreement on the evidence.
Distance from the market. Paradoxically, the most senior people in an organization are often the furthest from actual customer behavior. They see aggregated data, not individual purchase decisions. They hear from large accounts, not from the average customer. Their intuitions about the market may lag the reality that their sales force encounters daily.
Executive opinion need not be purely unstructured. Several modifications substantially improve its accuracy:
Pre-meeting independent estimates. Before the group convenes, ask each executive to submit a written estimate with a brief justification. Share these anonymously at the start of the meeting. This prevents the first speaker from anchoring the discussion and ensures all views enter the room before any social dynamics take hold.
Structured agenda. Rather than open discussion, structure the meeting around specific questions: What is the central estimate? What are the two most important sources of downside risk? What would have to be true for the optimistic scenario to materialize? This prevents the discussion from gravitating toward the most comfortable narrative.
Rotate the devil’s advocate. Assign one executive the explicit responsibility of arguing against the emerging consensus in each meeting, with the role rotating so it does not become associated with one person’s personality.
Track the record. Publish accuracy statistics for executive forecasts retrospectively. When executives know that their track record will be reviewed, the incentive to produce flattering forecasts diminishes.
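A minimal sketch of how the pre-meeting independent estimates from the first modification above might be summarized before discussion opens; the submitted values are hypothetical:
library(tidyverse)
# Written estimates submitted before the meeting, shared anonymously
submissions <- tibble(
  executive = c("Exec 1", "Exec 2", "Exec 3", "Exec 4", "Exec 5"),
  estimate  = c(120, 95, 140, 105, 110)   # e.g. revenue in $ millions
)
pre_meeting_summary <- submissions |>
  summarise(
    median_estimate = median(estimate),
    trimmed_mean    = mean(estimate, trim = 0.2),
    low             = min(estimate),
    high            = max(estimate)
  )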
In practice, executive opinion forecasting at major strategy consulting firms — McKinsey, BCG, Bain, and their peers — is considerably more structured than the informal “executives talk until they agree” caricature. A typical McKinsey demand or market-sizing forecast begins with analysts building a fact base independently before any senior input is solicited: market data, comparable company financials, expert interviews, and proprietary survey results are assembled into a structured briefing document.
Partners and engagement managers then convene in a structured hypothesis session where each participant is expected to defend a position with evidence, not simply assert a view — the McKinsey norm of “so what?” pressure-testing forces every claim back to its supporting data. Crucially, the most senior person in the room is expected to speak last, not first, specifically to prevent the hierarchy effects that corrupt ordinary executive opinion processes. Dissenting views are recorded explicitly in the final document rather than smoothed away — McKinsey’s internal culture of “obligation to dissent” institutionalizes the devil’s advocate role that most organizations only talk about.
The forecast is then stress-tested against two or three alternative scenarios before being presented to the client, so the deliverable is never a single number but a range with explicit assumptions attached to each bound. What distinguishes this from the executive opinion described in the academic literature is not the seniority of the participants but the process discipline imposed before, during, and after the discussion — which is, in essence, a manually enforced approximation of what the Delphi method achieves structurally.
Customer Intentions Survey
The Principle
Customer intentions surveys take the most direct possible approach to forecasting demand: ask the customers. Rather than inferring what buyers will do from historical patterns or expert opinion, this method goes to the source — surveying potential customers about their purchase intentions, timing, and price sensitivity.
The appeal is intuitive. If you want to know whether someone will buy your new product, why not ask them? And indeed, for some applications — gauging awareness, comparing concepts, identifying which customer segments are most interested — customer surveys provide genuinely useful signal.
The Intentions-Behavior Gap
The fundamental problem with customer intentions surveys as a forecasting tool is the well-documented gap between what people say they will do and what they actually do. This gap is not random — it is systematic and directional:
Stated intentions are consistently more positive than revealed behavior. Customers who say they are “very likely” to purchase a new product convert at rates far below 100%. The social desirability of appearing open-minded and interested in innovation inflates stated intentions. Customers who express strong interest in a concept are responding to the concept in the abstract — they have not yet encountered the real price, the real complexity of switching, or the real competition from alternatives.
The further the horizon, the wider the gap. Intentions expressed six months before a product launch are less predictive than intentions expressed six weeks before. Long-horizon surveys capture aspirations more than plans.
The gap varies by product category. For frequently purchased, low-involvement products, intentions surveys have poor track records — the decision at the point of purchase is driven by factors (shelf placement, price promotion, competitor availability) that the respondent cannot anticipate when completing the survey. For high-involvement purchases — major appliances, automobiles, real estate — the relationship between stated intentions and behavior is stronger, because the decision is more deliberate and the respondent has typically given it genuine prior thought.
| Stated Intention | Typical Actual Purchase Rate | Key Implication |
|---|---|---|
| Definitely will buy | ~50–70% (high-involvement); ~20–40% (low-involvement) | Even the strongest stated intention substantially overstates actual conversion |
| Probably will buy | ~20–40% (high-involvement); ~5–15% (low-involvement) | Treat as moderate interest signal, not a purchase commitment |
| Might or might not buy | ~5–15% | Low-value segment for demand forecasting purposes |
| Probably will not buy | ~1–5% | Can largely be excluded from forecast |
| Definitely will not buy | ~0–2% | Reliable signal of non-purchase |
Improving Intentions Survey Forecasts
The standard practice for improving intentions-to-behavior conversion estimates is to apply a deflation factor derived from the historical relationship between stated intentions and actual purchase rates in comparable product categories. Rather than taking stated intentions at face value, the forecaster multiplies each intention category by its empirical conversion rate, then aggregates across the distribution of responses.
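A minimal sketch of the deflation-factor calculation; the response shares are hypothetical and the conversion rates are taken from the mid-points of the high-involvement ranges in the table above:
library(tidyverse)
survey <- tribble(
  ~intention,                 ~share_of_respondents, ~conversion_rate,
  "Definitely will buy",                       0.10,             0.60,
  "Probably will buy",                         0.25,             0.30,
  "Might or might not buy",                    0.30,             0.10,
  "Probably will not buy",                     0.20,             0.03,
  "Definitely will not buy",                   0.15,             0.01
)
# Deflated demand estimate: weight each category by its empirical conversion rate
expected_purchase_rate <- with(survey, sum(share_of_respondents * conversion_rate))
# Roughly 0.17 here, versus 0.35 if "definitely" and "probably" were taken at face value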
A secondary improvement is to supplement the intentions question with behavioral specificity questions: “Have you already researched this type of product?” “Have you set a budget for this purchase?” “Have you discussed this purchase with anyone?” Respondents who answer yes to these behavioral indicators convert at substantially higher rates than those who express an equivalent intention without the accompanying behavior — because the behavioral questions distinguish genuine purchase consideration from polite survey responding.
Three concrete examples illustrate how intentions surveys perform in practice:
Classic example — the University of Michigan Consumer Sentiment Index:
The University of Michigan’s Survey of Consumers has run since 1946 and is one of the most closely watched intentions surveys in economics. Every month, ~500 households are asked whether they plan to buy a car, a home, or major appliances in the next 12 months. The index has genuine predictive value for aggregate consumer spending — not because stated intentions map cleanly to purchases, but because changes in intention levels reliably lead actual spending by one to two quarters. This illustrates a core principle: intentions surveys are most useful as directional leading indicators of turning points, not as literal demand forecasts.
New product example — the Apple Newton (1993):
Before launching the Newton PDA, Apple conducted extensive customer intentions surveys showing strong purchase interest from business professionals. Stated intention rates were high enough to justify an ambitious production run. Actual sales were a fraction of the forecast — not because the product concept lacked appeal, but because respondents had no basis for anticipating the Newton’s real-world limitations: the handwriting recognition was poor, the device was large, and the $700 price point felt very different at the point of purchase than it did in a survey. The Newton case is a textbook illustration of the concept-reality gap: survey respondents evaluate an idealized description, not the actual product they will encounter on the shelf.
Housing market example — the Fannie Mae National Housing Survey:
Fannie Mae’s monthly National Housing Survey asks consumers whether they think now is a good time to buy or sell a home, and whether they expect home prices to rise over the next 12 months. Because housing is a high-involvement purchase requiring deliberate planning, the intentions data here is more predictive than in low-involvement categories. Research finds that the “good time to buy” index leads actual home sales volume by approximately two to three months — giving mortgage lenders, real estate firms, and construction companies an early signal of demand shifts before they show up in transaction data. The key methodological point: behavioral specificity improves predictive validity. Respondents who say it is a good time to buy and report that they have already spoken to a lender convert to actual buyers at nearly three times the rate of those who express equivalent enthusiasm without the accompanying behavior.
Comparing the Methods
| Method | Best For | Core Strength | Core Risk | Horizon |
|---|---|---|---|---|
| Delphi Method | Any domain where structured expert consensus is needed; no-data situations | Anonymity + iteration eliminates group dynamics bias | Facilitator dependency; time-consuming | Any |
| Forecasting by Analogy | New products or situations with identifiable historical parallels | Leverages known outcomes from comparable situations | No two situations are truly identical; cherry-picking analogies | Medium–long |
| Scenario Forecasting | Long-range planning under deep uncertainty; strategy stress-testing | Explores multiple plausible futures rather than one point estimate | Scenarios may not be exhaustive; probabilities are uncertain | Long |
| Sales Force Composite | Short-horizon demand forecasting for established products | Field-level intelligence; closest to actual customer behavior | Incentive conflicts; systematic optimism or pessimism | Short–medium |
| Executive Opinion | Fast strategic forecasts; early-stage decisions | Speed; incorporates cross-functional strategic context | Groupthink; removed from market reality; hierarchy effects | Any |
| Customer Intentions Survey | New product concept testing; segmentation intelligence | Direct signal from the target market | Intentions consistently overstate behavior; gap varies by category | Short–medium |
In practice, organizations rarely choose one method and apply it in isolation. A robust judgmental forecasting process for a major new product launch might use customer intentions surveys for initial concept screening, forecasting by analogy to establish a baseline demand trajectory, sales force composite to refine the short-horizon forecast once the product is in-market, and scenario forecasting to test the strategic plan against plausible futures. The Delphi method can serve as the governance structure for integrating all of these inputs into a final consensus forecast.
Key Principles for Better Judgmental Forecasting
Regardless of which specific method is used, six principles consistently separate high-quality judgmental forecasting processes from low-quality ones:
Set the task carefully. Define precisely what is being forecast, over what horizon, and in what units. Vague forecasting tasks produce vague forecasts. Avoid emotive language, leading questions, and irrelevant context that anchors respondents before they have formed their own view.
Use systematic information. Require forecasters to consult base rates, historical distributions, and comparable cases before forming their estimates. Checklists that prompt consideration of specific information sources reduce the anchoring and availability biases that plague purely intuitive judgment.
Elicit independently before aggregating. In any group process, collect individual forecasts before sharing them with the group. This is the single most impactful procedural change available and it costs almost nothing.
Document assumptions and justifications. Written justifications create accountability, enable learning when forecasts are wrong, and make the underlying assumptions available for scrutiny. A forecast without a documented rationale is unfalsifiable — you can never learn from it.
Separate forecasters from users. When the people producing a forecast have a stake in its outcome — because it will become their performance target, because their budget depends on it, or because they are advocates for the outcome they are predicting — the forecast is compromised regardless of how rigorous the method appears. Structural independence is not optional.
Evaluate and learn. Track forecast accuracy over time and publish the results. Compare forecasts against outcomes and against a simple statistical benchmark. When a method consistently underperforms the benchmark, retire it. When it consistently outperforms in specific contexts, double down. Forecasting without feedback is not forecasting — it is storytelling.
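A minimal sketch of the evaluation step, comparing a judgmental forecast against outcomes and against a naive benchmark; all numbers are hypothetical:
library(tidyverse)
track_record <- tibble(
  quarter       = paste0("Q", 1:8),
  actual        = c(102, 108, 115, 111, 120, 118, 125, 131),
  judgmental_fc = c(110, 112, 118, 120, 122, 125, 128, 130),
  naive_fc      = c(100, 102, 108, 115, 111, 120, 118, 125)  # previous quarter's actual
)
accuracy <- track_record |>
  summarise(
    MAE_judgmental = mean(abs(judgmental_fc - actual)),
    MAE_naive      = mean(abs(naive_fc - actual))
  )
# If MAE_judgmental is not clearly below MAE_naive,
# the judgmental process is adding cost but not accuracy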
Discussion Questions
Why does unaided expert judgment so often underperform a simple statistical benchmark, even when the experts have genuine domain knowledge?
Experts have two things that statistical models lack: domain knowledge and cognitive biases. The research literature’s uncomfortable finding is that in many domains, the biases reliably overwhelm the knowledge.
The core mechanism is that experts tend to overweight information that is recent, memorable, and emotionally salient — and underweight base rates and historical distributions that feel abstract. When an expert says “sales will be strong next quarter because I’ve been getting good signals from our top accounts,” they are applying availability bias: a few vivid conversations are substituting for a systematic review of what happens to sales in comparable conditions. The statistical model, by contrast, weights historical observations according to an explicit, consistent rule rather than according to how vivid or recent they feel.
This does not mean experts add no value — they do, particularly when they have access to timely information that has not yet appeared in the data. But it does mean that the process through which expert knowledge is elicited and aggregated matters enormously. Structured methods that force explicit consideration of base rates, require written justification, and aggregate multiple independent views consistently outperform unstructured individual judgment.
You need to forecast first-year demand for a genuinely novel product category. Which judgmental methods would you combine, and why?
No single method is sufficient for a genuinely novel product category. A recommended combination:
Start with scenario forecasting to define the possibility space — what are the three or four materially different futures that could plausibly emerge for this category, and what drives the differences? This prevents the rest of the process from being anchored to a single optimistic narrative.
Use forecasting by analogy within each scenario to generate a baseline demand trajectory. For a new product category, look for analogies in adjacent categories that required similar behavioral change from consumers, faced similar competitive dynamics, and had comparable price points relative to substitutes. Use the spread across analogies to define the uncertainty range within each scenario.
Conduct customer intentions surveys for concept screening and segmentation — which consumer segments show the strongest genuine interest, and what features drive that interest? Apply deflation factors to convert stated intentions to behavioral estimates.
Structure the final synthesis through a Delphi process — bring together experts from marketing, consumer research, competitive intelligence, and relevant academic fields to review the analogy-based baselines, the scenario structure, and the survey results, and to produce a consensus forecast with explicit uncertainty ranges.
Avoid relying on sales force composite at this stage — sales representatives have no established relationships with buyers of a product category that does not yet exist, and their estimates will reflect enthusiasm rather than genuine market intelligence.
Given the well-documented intentions-behavior gap, why do organizations continue to run customer intentions surveys, and what more rigorous alternative exists?
Organizations continue to use intentions surveys for several reasons that are at least partially defensible. First, the rank ordering of customer segments by interest level is often more reliable than the absolute level of stated intentions — even if “definitely will buy” overstates actual purchase rates, it still identifies the most promising segments for early marketing investment. Second, concept testing surveys provide qualitative insight into the features and messages that resonate, which has value independent of their demand forecasting accuracy. Third, for many organizations, intentions surveys are the only practical way to generate any quantitative demand estimate before a product exists.
A more rigorous alternative is conjoint analysis — a survey methodology that asks respondents to make repeated choices between product bundles that trade off features and price, rather than asking directly whether they intend to buy. Because respondents are making explicit tradeoffs rather than expressing unbounded enthusiasm for a concept, conjoint results more closely predict actual purchase behavior. Combined with a deflation factor calibrated against historical intentions-to-behavior conversion rates in the relevant category, conjoint-based forecasts substantially outperform simple intentions surveys on most accuracy metrics.
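A minimal sketch of how conjoint-derived utilities translate into predicted choice shares, using the standard logit share-of-preference rule; the product profiles and utilities are hypothetical:
library(tidyverse)
# Total part-worth utilities for a set of competing profiles (hypothetical)
profiles <- tribble(
  ~profile,                ~utility,
  "New product at $499",        1.8,
  "New product at $699",        0.9,
  "Incumbent A at $549",        1.4,
  "Incumbent B at $449",        1.2,
  "No purchase",                0.0
)
# Logit share of preference: share_i = exp(U_i) / sum_j exp(U_j)
profiles <- profiles |>
  mutate(pred_share = exp(utility) / sum(exp(utility)))
Because the shares must sum to one across the competitive set, the new product's share is disciplined by the alternatives the respondent actually faces, which is part of why conjoint estimates deflate the unbounded enthusiasm that a direct intentions question invites.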
Your organization's executive forecasting meetings consistently produce optimistic, consensus-driven forecasts. What structural changes would you recommend?
The root cause of both problems — groupthink and optimism — is typically the same: the social dynamics of the meeting override the genuine independent views of the participants. The fix requires structural changes, not just cultural ones.
Structural change 1: Pre-meeting independent estimates. Before any group discussion, require each executive to submit a written estimate with a quantified uncertainty range and a one-paragraph justification. These are compiled and distributed anonymously at the start of the meeting. This prevents the most senior or most confident voice from anchoring the discussion before other views have been expressed.
Structural change 2: Pessimistic scenario first. Structure the meeting agenda to spend the first half on the pessimistic scenario — what would have to go wrong for the forecast to be substantially below the central estimate, and how likely are those conditions? This counteracts the natural tendency to spend most of the meeting discussing the optimistic case.
Structural change 3: Rotating devil’s advocate. Assign one executive per meeting the formal role of challenging the emerging consensus. Rotate the assignment so it does not become associated with one person’s personality or perceived as a career risk.
Structural change 4: Anonymous final vote. After discussion, collect final estimates anonymously (using electronic polling or written cards) rather than by show of hands or verbal declaration. Aggregate statistically. This preserves genuine independent judgment even after a discussion that may have shifted views.
Structural change 5: Publish the track record. Produce an annual retrospective that compares executive forecasts to outcomes and to a simple statistical benchmark. When the group sees its own historical optimism documented, the social norms around forecasting tend to shift.
When should a forecaster prefer scenario forecasting over a single point forecast?
Scenario forecasting is appropriate when the fundamental uncertainty of the situation makes a single point forecast misleading — not just inaccurate, but misleading, in the sense that presenting one number as “the forecast” implies a precision and a confidence that the situation does not warrant.
Specifically, prefer scenarios over point forecasts when:
The forecast horizon is long and the environment is genuinely turbulent. At a five to ten year horizon in a rapidly changing industry, the uncertainty around a point forecast is so wide that the forecast’s practical value is limited. Scenarios are more honest about this uncertainty and more useful for strategy testing.
Multiple plausible futures are structurally distinct from each other. If the key drivers could plausibly take values that produce fundamentally different market structures — not just different levels of the same variable, but different competitive dynamics, different customer behaviors, different regulatory regimes — then a single point forecast papers over a distinction that matters enormously for strategic planning.
Decision-making requires robustness testing. When the question is not “what will happen?” but “will our strategy work across a range of possible futures?”, scenarios are the right analytical tool. A strategy that works only in the optimistic scenario is a fundamentally different proposition from one that is robust across all three — and a point forecast cannot reveal that distinction.
The point forecast would be illusory precision. If the genuine 80% confidence interval around the forecast spans a range wide enough to change the strategic decision, present the range explicitly as scenarios rather than compressing it into a point estimate that implies false confidence.
Further Reading
Core Text: Hyndman, R.J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice, 3rd ed. OTexts. Chapter 6: Judgmental Forecasting.
Cognitive Biases in Forecasting: Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
Scenario Planning: Schoemaker, P.J.H. (1995). Scenario planning: A tool for strategic thinking. Sloan Management Review, 36(2), 25–40.
Sales Force Forecasting: Fildes, R., Goodwin, P., Lawrence, M., & Önkal, D. (2009). Judgmental adjustments to statistical forecasts: A comparison of commercial forecasting software. International Journal of Forecasting, 25(2), 274–285.
Customer Intentions: Morwitz, V.G., Steckel, J.H., & Gupta, A. (2007). When do purchase intentions predict sales? International Journal of Forecasting, 23(3), 347–364.
Comprehensive Review: Lawrence, M., Goodwin, P., O’Connor, M., & Önkal, D. (2006). Judgmental forecasting: A review of progress over the last 25 years. International Journal of Forecasting, 22(3), 493–518.
Appendix: The Delphi Method
The Delphi Method is covered in a separate lecture as a standalone topic. It is the most rigorously structured and most extensively validated of the judgmental forecasting methods, and merits dedicated treatment.
For context: Delphi is distinguished from the other methods in this lecture by its combination of anonymity, iteration, controlled feedback, and statistical aggregation — four structural features that systematically reduce the cognitive biases and social dynamics that compromise most other group forecasting approaches. See the Delphi lecture for a full treatment of the method, its variants (including Estimate-Talk-Estimate and Policy Delphi), and the Tourism Australia case study that illustrates what happens when Delphi’s structural safeguards are compromised.