Introduction

When an auto liability insurance claim is reported, the insurer rarely knows exactly how much it will ultimately cost. A fender-bender might settle quickly; a serious injury claim could take years to resolve. This creates a fundamental challenge for insurance companies: how do you set aside enough money today to pay claims that won’t fully develop until the future?

The answer lies in loss reserving — a set of actuarial techniques that use historical claims data to estimate how much money an insurer will ultimately owe. This project walks through two widely used methods:

  1. Chain Ladder (Development Method): projects future losses by assuming historical development patterns repeat themselves.
  2. Bornhuetter-Ferguson (BF): combines losses reported to date with an independent a priori estimate of expected losses, using the Chain Ladder development pattern to estimate how much is still unreported. This gives more stable results for immature accident years.

We’ll build everything from scratch: generating synthetic claims data, constructing a loss triangle, and applying both methods — all in reproducible R code.


Glossary of Key Terms

Before diving into the data, here’s a plain-language guide to the terminology used throughout this analysis.

  • Accident Year: The calendar year in which a loss event (accident) occurred. Claims are grouped by when the accident happened, not when they were reported or paid.
  • Development Lag: The development period, in years, counted from the start of the accident year. Lag 1 covers the accident year itself; Lag 2 is the following year; and so on.
  • Paid Losses: The actual dollars an insurer has paid out on claims so far (cash out the door).
  • Reported (Incurred) Losses: Paid losses plus case reserves, which are the insurer's estimates of the remaining payments on open claims. Together they represent the insurer's current estimate of what known claims will cost.
  • Loss Triangle: A matrix where rows are accident years and columns are development lags. Each cell shows cumulative losses as of a given lag. The upper-left portion is observed; the lower-right is unknown and must be estimated.
  • Age-to-Age Factor (LDF): A multiplier that describes how much cumulative losses grow from one development lag to the next. Also called a Loss Development Factor.
  • Cumulative Development Factor (CDF): The product of all LDFs from a given lag onward. Multiplying current losses by it projects them to ultimate.
  • Tail Factor: A factor applied beyond the last observed development period to capture any remaining loss development.
  • Ultimate Losses: The total amount an insurer expects to pay on a group of claims once all development is complete.
  • IBNR: Incurred But Not Reported. Losses that have occurred but are not yet fully reflected in reported losses, including both claims not yet reported and future development on known claims. In this analysis, IBNR = Ultimate − Reported to Date.
  • A Priori Loss Ratio: An independent, upfront estimate of losses as a percentage of premium. Used in the BF method.
  • Expected Losses: Premium × A Priori Loss Ratio. The baseline estimate of what losses should be before looking at actual data.
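To make the glossary formulas concrete, here is a small worked example. The figures are copied from tables later in this analysis (accident year 2017 paid losses at Lags 1 and 2; accident year 2024 premium, a priori loss ratio, Chain Ladder ultimate, and reported losses); the variable names are hypothetical and chosen only for illustration.

```r
# Hypothetical variable names; figures are copied from tables in this analysis.
paid_2017_lag1 <- 2192103   # AY 2017 cumulative paid, Lag 1
paid_2017_lag2 <- 3376707   # AY 2017 cumulative paid, Lag 2

# Age-to-age factor for one accident year: later cumulative / earlier cumulative
ata_2017_1_2 <- paid_2017_lag2 / paid_2017_lag1

# Expected losses = premium x a priori loss ratio (AY 2024)
premium_2024    <- 11500000
apriori_lr_2024 <- 0.71
expected_2024   <- premium_2024 * apriori_lr_2024

# IBNR = ultimate - reported to date (AY 2024, Chain Ladder, reported basis)
cl_ultimate_2024 <- 8162638
reported_2024    <- 5294338
ibnr_2024        <- cl_ultimate_2024 - reported_2024

round(ata_2017_1_2, 4)   # 1.5404
expected_2024            # 8165000
ibnr_2024                # 2868300
```

A single accident year's factor (about 1.54 here) generally differs from the averaged LDF used in the projections, which pools all accident years.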

Part 1: Generating Synthetic Claims Data

Why Synthetic Data?

Real insurance claims data is confidential and proprietary. For this portfolio demonstration, we'll simulate a dataset for a fictitious auto liability book of business. The data is designed to mimic real-world patterns: a large share of losses emerges in the first few lags, then development tapers off as claims close.
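One way to see the shape the simulation targets is to convert the cumulative percentages into the share of ultimate losses that emerges in each lag. The two vectors below are the same `cum_pct_paid` and `cum_pct_rptd` assumptions used in the simulation code in this part; the rest is a short illustrative sketch.

```r
# Cumulative share of ultimate losses paid / reported by lag (simulation assumptions)
cum_pct_paid <- c(0.38, 0.62, 0.76, 0.85, 0.91, 0.95, 0.98, 1.00)
cum_pct_rptd <- c(0.65, 0.82, 0.90, 0.95, 0.97, 0.98, 0.99, 1.00)

# Incremental share emerging in each lag: difference of consecutive cumulative values
incr_paid <- diff(c(0, cum_pct_paid))
incr_rptd <- diff(c(0, cum_pct_rptd))

data.frame(lag = 1:8, incr_paid = incr_paid, incr_rptd = incr_rptd)
```

Both incremental series are largest at Lag 1 and shrink lag by lag (paid: 38%, 24%, 14%, and so on), i.e. most development happens early and then tapers off.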

The Simulation

We’ll generate one aggregated record per accident year and development lag (rather than individual claims), with the following fields:

  • claim_id: a unique identifier for each accident year / lag cell (e.g. AY2017-L1)
  • accident_year: the year the accident occurred (2017–2024)
  • development_lag: the development lag of the record (1–8)
  • paid_losses: cumulative paid losses as of that lag
  • reported_losses: cumulative reported (incurred) losses as of that lag
  • claim_count: simulated cumulative claim count as of that lag (the accident year's total claim count scaled by the paid percentage)
  • premium: earned premium for that accident year (used in BF)
library(tidyverse)
library(knitr)
library(kableExtra)
library(scales)

set.seed(42)  # Ensures reproducibility — anyone running this gets the same numbers

# ── Parameters ──────────────────────────────────────────────────────────────────
accident_years  <- 2017:2024
n_lags          <- 8
n_claims_range  <- c(180, 320)   # Claims per accident year

# Premium grows modestly each year (reflects a growing book of business)
premiums <- tibble(
  accident_year = accident_years,
  premium       = round(seq(8e6, 11.5e6, length.out = 8), -3)
)

# ── Development pattern ──────────────────────────────────────────────────────────
# These cumulative % factors describe how much of ultimate losses are paid by each lag.
# Auto liability tends to develop faster than, say, workers' comp.
cum_pct_paid <- c(0.38, 0.62, 0.76, 0.85, 0.91, 0.95, 0.98, 1.00)
cum_pct_rptd <- c(0.65, 0.82, 0.90, 0.95, 0.97, 0.98, 0.99, 1.00)

# ── Ultimate loss ratio assumptions ─────────────────────────────────────────────
# Each accident year has a slightly different underlying loss ratio, reflecting
# real-world variability in claim severity and frequency.
ult_loss_ratios <- c(0.68, 0.71, 0.74, 0.70, 0.73, 0.75, 0.72, 0.70)

# ── Build the claims dataset ─────────────────────────────────────────────────────
claims_list <- list()

for (i in seq_along(accident_years)) {
  ay      <- accident_years[i]
  prem    <- premiums$premium[i]
  ult_lr  <- ult_loss_ratios[i]
  ult_loss <- prem * ult_lr

  # The current calendar year is 2024, so accident year 2017 has 8 lags of
  # development, 2018 has 7, ..., 2024 has only 1.
  max_lag <- n_lags - (i - 1)   # most-recent AY has fewest lags observed

  n_claims <- sample(seq(n_claims_range[1], n_claims_range[2]), 1)

  for (lag in 1:max_lag) {
    # Add realistic noise around the expected development percentages.
    # Noise is drawn independently at each lag, so cumulative values are not
    # guaranteed to increase from one lag to the next.
    noise_paid <- rnorm(1, 0, 0.015)
    noise_rptd <- rnorm(1, 0, 0.010)

    # Clamp paid to [0.01, 1]; clamp reported between paid and 1 (reported >= paid)
    paid_pct <- min(max(cum_pct_paid[lag] + noise_paid, 0.01), 1.00)
    rptd_pct <- min(max(cum_pct_rptd[lag] + noise_rptd, paid_pct), 1.00)

    claims_list[[length(claims_list) + 1]] <- tibble(
      claim_id      = paste0("AY", ay, "-L", lag),   # one row per AY / lag cell
      accident_year = ay,
      development_lag = lag,
      claim_count   = round(n_claims * paid_pct),    # claim count scaled by % paid
      paid_losses   = round(ult_loss * paid_pct, 0),
      reported_losses = round(ult_loss * rptd_pct, 0),
      premium       = prem
    )
  }
}

claims_df <- bind_rows(claims_list)

# Preview the first several rows
claims_df %>%
  head(16) %>%
  mutate(across(c(paid_losses, reported_losses, premium), dollar)) %>%
  kable(caption = "Sample of Synthetic Auto Liability Claims Data",
        align   = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE)
Sample of Synthetic Auto Liability Claims Data
claim_id accident_year development_lag claim_count paid_losses reported_losses premium
AY2017-L1 2017 1 92 $2,192,103 $3,588,003 $8,000,000
AY2017-L2 2017 2 142 $3,376,707 $4,400,710 $8,000,000
AY2017-L3 2017 3 175 $4,178,384 $4,927,563 $8,000,000
AY2017-L4 2017 4 192 $4,570,348 $5,252,586 $8,000,000
AY2017-L5 2017 5 203 $4,853,488 $5,285,059 $8,000,000
AY2017-L6 2017 6 213 $5,079,372 $5,418,967 $8,000,000
AY2017-L7 2017 7 224 $5,334,108 $5,440,000 $8,000,000
AY2017-L8 2017 8 228 $5,440,000 $5,440,000 $8,000,000
AY2018-L1 2018 1 96 $2,052,824 $3,775,468 $8,500,000
AY2018-L2 2018 2 181 $3,861,203 $4,930,194 $8,500,000
AY2018-L3 2018 3 208 $4,425,347 $5,421,125 $8,500,000
AY2018-L4 2018 4 246 $5,239,708 $5,847,625 $8,500,000
AY2018-L5 2018 5 256 $5,452,882 $5,838,424 $8,500,000
AY2018-L6 2018 6 261 $5,573,640 $5,942,067 $8,500,000
AY2018-L7 2018 7 275 $5,856,364 $6,002,136 $8,500,000
AY2019-L1 2019 1 112 $2,547,527 $4,270,403 $9,000,000
# Export the full dataset as a CSV for sharing / reproducibility
write_csv(claims_df, "auto_liability_claims.csv")
cat("✓ Exported", nrow(claims_df), "rows to auto_liability_claims.csv\n")
## ✓ Exported 36 rows to auto_liability_claims.csv

A quick summary of what we just built:

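claims_df %>%
  group_by(accident_year) %>%
  arrange(development_lag, .by_group = TRUE) %>%
  summarise(
    lags_observed   = max(development_lag),
    # Take values at the latest observed lag (max() would not be the latest
    # value if a cumulative amount ever decreased)
    total_claims    = last(claim_count),
    paid_at_latest  = dollar(last(paid_losses)),
    rptd_at_latest  = dollar(last(reported_losses)),
    premium         = dollar(first(premium))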
  ) %>%
  kable(caption = "Summary by Accident Year (latest diagonal)",
        col.names = c("Accident Year", "Lags Observed", "# Claims",
                      "Paid to Date", "Reported to Date", "Premium"),
        align = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Summary by Accident Year (latest diagonal)
Accident Year Lags Observed # Claims Paid to Date Reported to Date Premium
2017 8 228 $5,440,000 $5,440,000 $8,000,000
2018 7 275 $5,856,364 $6,002,136 $8,500,000
2019 6 280 $6,364,318 $6,537,471 $9,000,000
2020 5 176 $6,208,679 $6,493,253 $9,500,000
2021 4 185 $6,180,263 $6,951,550 $10,000,000
2022 3 161 $6,107,674 $7,160,007 $10,500,000
2023 2 114 $5,017,897 $6,620,778 $11,000,000
2024 1 86 $3,129,155 $5,294,338 $11,500,000
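Because the noise in the simulation is drawn independently at each lag, nothing forces cumulative values to rise from one lag to the next. A quick check, sketched here with accident year 2018's cumulative reported losses (Lags 4 and 5 appear in the sample table above; the full row is in the reported triangle in Part 2), flags any lag where a cumulative value fell. The variable names are illustrative.

```r
# AY 2018 cumulative reported losses, Lags 1-7 (copied from the reported triangle)
rptd_2018 <- c(3775468, 4930194, 5421125, 5847625, 5838424, 5942067, 6002136)

# Lag-to-lag change in the cumulative value; a negative change means it fell
step_change <- diff(rptd_2018)

# diff() element k compares lag k+1 with lag k, so shift the index by one
decreases <- which(step_change < 0) + 1
decreases   # 5
```

It flags Lag 5, where 2018 reported losses dip from $5,847,625 to $5,838,424. Small decreases in reported losses are not unrealistic (case reserves can be reduced), but if strictly non-decreasing simulated paths were preferred, one option would be to carry the previous lag's simulated percentage forward as a floor.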

Part 2: Building the Loss Triangle

What Is a Loss Triangle?

A loss triangle is the backbone of most reserving analyses. Imagine laying out your claims data in a grid:

  • Rows = Accident Years (2017 through 2024)
  • Columns = Development Lags (Year 1 through Year 8)
  • Each cell = Cumulative losses as of that lag for that accident year

The result looks like a triangle because more-recent accident years have fewer development periods observed — we can only see data through the current calendar year. The lower-right portion of the grid is blank — that’s exactly what we need to estimate.
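The staircase shape follows from a simple rule: a cell falls in calendar year accident year + lag − 1, and it is observed only if that calendar year is on or before the evaluation year. A minimal base-R sketch with a hypothetical four-year example evaluated as of 2024:

```r
# Hypothetical 4-year example, evaluated as of calendar year 2024
toy_ay    <- 2021:2024
toy_lags  <- 1:4
eval_year <- 2024

# Cell (AY, lag) falls in calendar year AY + lag - 1; it is observed only if
# that calendar year is on or before the evaluation year
observed <- outer(toy_ay, toy_lags, function(ay, lag) ay + lag - 1 <= eval_year)
dimnames(observed) <- list(toy_ay, paste0("Lag_", toy_lags))
observed
```

The TRUE cells form the upper-left triangle. Cells where accident year + lag − 1 equals 2024 make up the latest diagonal, which is where the projections start.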

# ── Paid Loss Triangle ───────────────────────────────────────────────────────────
paid_triangle <- claims_df %>%
  select(accident_year, development_lag, paid_losses) %>%
  pivot_wider(
    names_from   = development_lag,
    values_from  = paid_losses,
    names_prefix = "Lag_"
  ) %>%
  arrange(accident_year) %>%
  column_to_rownames("accident_year")

# ── Reported Loss Triangle ───────────────────────────────────────────────────────
rptd_triangle <- claims_df %>%
  select(accident_year, development_lag, reported_losses) %>%
  pivot_wider(
    names_from   = development_lag,
    values_from  = reported_losses,
    names_prefix = "Lag_"
  ) %>%
  arrange(accident_year) %>%
  column_to_rownames("accident_year")

# Display the paid triangle with NA cells shown as "—"
paid_triangle %>%
  mutate(across(everything(), ~ ifelse(is.na(.), "—", dollar(.)))) %>%
  kable(caption = "Cumulative Paid Loss Triangle — Auto Liability (2017–2024)",
        align   = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE) %>%
  column_spec(1:8, width = "10%")
Cumulative Paid Loss Triangle — Auto Liability (2017–2024)
Lag_1 Lag_2 Lag_3 Lag_4 Lag_5 Lag_6 Lag_7 Lag_8
2017 $2,192,103 $3,376,707 $4,178,384 $4,570,348 $4,853,488 $5,079,372 $5,334,108 $5,440,000
2018 $2,052,824 $3,861,203 $4,425,347 $5,239,708 $5,452,882 $5,573,640 $5,856,364
2019 $2,547,527 $4,223,797 $4,953,891 $5,719,035 $5,765,537 $6,364,318
2020 $2,390,514 $4,042,064 $5,010,963 $5,684,612 $6,208,679
2021 $2,463,287 $4,512,995 $5,507,478 $6,180,263
2022 $3,157,844 $5,036,363 $6,107,674
2023 $2,842,889 $5,017,897
2024 $3,129,155
rptd_triangle %>%
  mutate(across(everything(), ~ ifelse(is.na(.), "—", dollar(.)))) %>%
  kable(caption = "Cumulative Reported Loss Triangle — Auto Liability (2017–2024)",
        align   = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE)
Cumulative Reported Loss Triangle — Auto Liability (2017–2024)
Lag_1 Lag_2 Lag_3 Lag_4 Lag_5 Lag_6 Lag_7 Lag_8
2017 $3,588,003 $4,400,710 $4,927,563 $5,252,586 $5,285,059 $5,418,967 $5,440,000 $5,440,000
2018 $3,775,468 $4,930,194 $5,421,125 $5,847,625 $5,838,424 $5,942,067 $6,002,136
2019 $4,270,403 $5,414,277 $5,990,559 $6,305,865 $6,393,393 $6,537,471
2020 $4,351,282 $5,549,033 $6,028,601 $6,265,375 $6,493,253
2021 $4,816,642 $5,992,849 $6,639,349 $6,951,550
2022 $5,061,476 $6,483,948 $7,160,007
2023 $5,208,989 $6,620,778
2024 $5,294,338

Notice the staircase pattern: accident year 2017 has all 8 lags filled in, while accident year 2024 has only Lag 1 visible. The goal of our reserving methods is to fill in the missing cells and sum up what we expect each accident year to ultimately cost.


Part 3: Chain Ladder Method

How Chain Ladder Works

The Chain Ladder method (also called the Development Method) is built on one key assumption: the way losses develop over time in the past will continue in the future.

The process has three steps:

  1. Calculate Age-to-Age Factors (LDFs): for each pair of consecutive lags, measure how much cumulative losses grew from one lag to the next, then average across accident years (this analysis uses a volume-weighted average) to get a single factor per transition.
  2. Apply the factors forward: starting from the latest known diagonal, multiply each accident year's current losses by the product of its remaining LDFs to project them to ultimate.
  3. Compute IBNR: ultimate minus losses reported to date. (The total unpaid reserve is ultimate minus paid to date, i.e. IBNR plus case reserves.)
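The three steps can be sketched end to end in base R on a small, hypothetical 4 × 4 reported triangle (the numbers are made up for easy arithmetic, and a tail factor of 1.000 is assumed, as in this analysis). Step 1 uses the same volume-weighted average as the code in this part; `rev(cumprod(rev(...)))` is one compact way to form the cumulative factors.

```r
# Hypothetical 4x4 cumulative reported triangle (rows = accident years,
# cols = lags); NA marks cells not yet observed.
tri <- matrix(c(
  1000, 1500, 1700, 1750,
  1100, 1650, 1870,   NA,
  1200, 1800,   NA,   NA,
  1300,   NA,   NA,   NA
), nrow = 4, byrow = TRUE)

n <- ncol(tri)

# Step 1: volume-weighted age-to-age factors, using rows observed at both lags
ldf <- sapply(1:(n - 1), function(j) {
  rows <- !is.na(tri[, j]) & !is.na(tri[, j + 1])
  sum(tri[rows, j + 1]) / sum(tri[rows, j])
})

# Cumulative factor from each lag to ultimate (tail factor of 1 assumed)
cdf <- rev(cumprod(rev(c(ldf, 1))))

# Step 2: latest observed value and its lag for each accident year
latest_lag <- apply(tri, 1, function(r) max(which(!is.na(r))))
latest     <- tri[cbind(seq_len(nrow(tri)), latest_lag)]
ultimate   <- latest * cdf[latest_lag]

# Step 3: IBNR = ultimate - reported to date
ibnr <- ultimate - latest

data.frame(latest_lag, latest, cdf = round(cdf[latest_lag], 4),
           ultimate = round(ultimate), ibnr = round(ibnr))
```

With these inputs the LDFs are 1.5, about 1.1333, and about 1.0294, giving IBNR of 0, 55, 300, and 975 for the four accident years.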

Step 1: Calculate Age-to-Age Factors

# ── Volume-weighted age-to-age factors ──────────────────────────────────────────
# Volume-weighting sums numerators and denominators separately before dividing.
# This gives more weight to larger accident years and is the actuarial standard.

compute_ldfs <- function(triangle) {
  n_cols <- ncol(triangle)
  ldfs   <- numeric(n_cols - 1)

  for (j in 1:(n_cols - 1)) {
    # Only use rows where BOTH the current and next lag are observed
    both_obs    <- !is.na(triangle[, j]) & !is.na(triangle[, j + 1])
    numerator   <- sum(triangle[both_obs, j + 1])
    denominator <- sum(triangle[both_obs, j])
    ldfs[j]     <- numerator / denominator
  }
  ldfs
}

paid_ldfs <- compute_ldfs(as.matrix(paid_triangle))
rptd_ldfs <- compute_ldfs(as.matrix(rptd_triangle))

# Add a tail factor of 1.000 (assuming development is complete at Lag 8)
tail_factor <- 1.000
paid_ldfs_full <- c(paid_ldfs, tail_factor)
rptd_ldfs_full <- c(rptd_ldfs, tail_factor)

ldf_table <- tibble(
  Transition  = c(paste0("Lag ", 1:7, " → ", 2:8), "Lag 8 → Ult"),
  `Paid LDF`  = round(paid_ldfs_full, 4),
  `Rptd LDF`  = round(rptd_ldfs_full, 4)
)

ldf_table %>%
  kable(caption = "Age-to-Age Loss Development Factors (LDFs)",
        align   = "c") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Age-to-Age Loss Development Factors (LDFs)
Transition Paid LDF Rptd LDF
Lag 1 → 2 1.7040 1.2677
Lag 2 → 3 1.2048 1.1036
Lag 3 → 4 1.1378 1.0557
Lag 4 → 5 1.0503 1.0143
Lag 5 → 6 1.0588 1.0218
Lag 6 → 7 1.0505 1.0071
Lag 7 → 8 1.0199 1.0000
Lag 8 → Ult 1.0000 1.0000

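Reading the table: the Paid LDF of 1.7040 from Lag 1 → Lag 2 means that, pooled across accident years, cumulative paid losses grew by about 70% between the first and second years of development. Factors close to 1.000 at later lags indicate that most losses have already been paid (or reported) and little development remains. The factors do not decline perfectly smoothly (the paid Lag 4 → 5 factor of 1.0503 is below the Lag 5 → 6 factor of 1.0588, for example), which reflects the random noise in the simulated data.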
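The comment in `compute_ldfs()` notes that volume weighting gives more weight to larger accident years. To see what that changes in practice, this sketch compares a simple average of the individual Lag 1 → 2 paid factors with the volume-weighted factor, using the paid triangle values from Part 2 (variable names are illustrative).

```r
# Cumulative paid losses at Lags 1 and 2 for AY 2017-2023 (from the paid triangle)
paid_lag1 <- c(2192103, 2052824, 2547527, 2390514, 2463287, 3157844, 2842889)
paid_lag2 <- c(3376707, 3861203, 4223797, 4042064, 4512995, 5036363, 5017897)

indiv_factors <- paid_lag2 / paid_lag1            # one Lag 1 -> 2 factor per AY
simple_avg    <- mean(indiv_factors)              # every AY counts equally
vol_wtd       <- sum(paid_lag2) / sum(paid_lag1)  # larger AYs count more

round(c(simple = simple_avg, volume_weighted = vol_wtd), 4)
```

On this data the simple average comes out slightly higher than the volume-weighted 1.7040: the accident year with the largest Lag 1 losses (2022) has one of the lowest factors (about 1.59), while the smallest (2018) has the highest (about 1.88).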

Step 2 & 3: Project to Ultimate and Calculate IBNR

# ── Cumulative-to-Ultimate (CDF) factors ────────────────────────────────────────
# The CDF at lag j = product of all LDFs from lag j onward.
# This tells us: "multiply current losses by this factor to get ultimate."

compute_cdfs <- function(ldfs_full) {
  n <- length(ldfs_full)
  cdfs <- numeric(n)
  cdfs[n] <- ldfs_full[n]
  for (j in (n - 1):1) {
    cdfs[j] <- ldfs_full[j] * cdfs[j + 1]
  }
  cdfs
}

paid_cdfs <- compute_cdfs(paid_ldfs_full)
rptd_cdfs <- compute_cdfs(rptd_ldfs_full)

# ── Extract the latest diagonal ──────────────────────────────────────────────────
# For each accident year, find its most recent observed value.
get_latest_diagonal <- function(triangle) {
  apply(triangle, 1, function(row) {
    vals <- row[!is.na(row)]
    if (length(vals) == 0) NA else tail(vals, 1)
  })
}

latest_paid <- get_latest_diagonal(as.matrix(paid_triangle))
latest_rptd <- get_latest_diagonal(as.matrix(rptd_triangle))

# The lag at which each accident year currently sits (count of observed cells per row)
current_lag <- apply(as.matrix(paid_triangle), 1, function(row) sum(!is.na(row)))

# ── Compute ultimate and IBNR ────────────────────────────────────────────────────
cl_results <- tibble(
  accident_year       = accident_years,
  current_lag         = current_lag,
  paid_to_date        = latest_paid,
  reported_to_date    = latest_rptd,
  paid_cdf            = paid_cdfs[current_lag],
  rptd_cdf            = rptd_cdfs[current_lag],
  cl_ultimate_paid    = round(latest_paid * paid_cdfs[current_lag], 0),
  cl_ultimate_rptd    = round(latest_rptd * rptd_cdfs[current_lag], 0),
  ibnr_paid_basis     = round(latest_paid * paid_cdfs[current_lag] - latest_rptd, 0),
  ibnr_rptd_basis     = round(latest_rptd * rptd_cdfs[current_lag] - latest_rptd, 0)
)

cl_results %>%
  mutate(
    paid_to_date     = dollar(paid_to_date),
    reported_to_date = dollar(reported_to_date),
    paid_cdf         = round(paid_cdf, 4),
    rptd_cdf         = round(rptd_cdf, 4),
    cl_ultimate_rptd = dollar(cl_ultimate_rptd),
    ibnr_rptd_basis  = dollar(ibnr_rptd_basis)
  ) %>%
  select(accident_year, current_lag, reported_to_date, rptd_cdf,
         cl_ultimate_rptd, ibnr_rptd_basis) %>%
  kable(
    caption = "Chain Ladder Results — Reported Basis",
    col.names = c("Accident Year", "Current Lag", "Reported to Date",
                  "CDF to Ult", "CL Ultimate", "CL IBNR"),
    align = "c"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Chain Ladder Results — Reported Basis
Accident Year Current Lag Reported to Date CDF to Ult CL Ultimate CL IBNR
2017 8 $5,440,000 1.0000 $5,440,000 $0
2018 7 $6,002,136 1.0000 $6,002,136 $0
2019 6 $6,537,471 1.0071 $6,584,139 $46,668
2020 5 $6,493,253 1.0291 $6,682,080 $188,827
2021 4 $6,951,550 1.0438 $7,256,056 $304,506
2022 3 $7,160,007 1.1019 $7,889,953 $729,946
2023 2 $6,620,778 1.2161 $8,051,839 $1,431,061
2024 1 $5,294,338 1.5418 $8,162,638 $2,868,300
cat("Chain Ladder Total IBNR (Reported Basis):",
    dollar(sum(cl_results$ibnr_rptd_basis)), "\n")
## Chain Ladder Total IBNR (Reported Basis): $5,569,308
cat("Chain Ladder Total Ultimate (Reported Basis):",
    dollar(sum(cl_results$cl_ultimate_rptd)), "\n")
## Chain Ladder Total Ultimate (Reported Basis): $56,068,841
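As a check on the cumulative factors, the reverse cumulative product below computes the same products as the `compute_cdfs()` loop, here applied to the reported LDFs exactly as displayed in the LDF table. Because those LDFs are rounded to four decimals, the reproduced values can differ slightly from the results table; the names `shown_ldfs` and `shown_cdfs` are illustrative and chosen so they do not overwrite the analysis objects.

```r
# Reported-basis LDFs as displayed (rounded to 4 decimals): Lag 1->2 ... Lag 8->Ult
shown_ldfs <- c(1.2677, 1.1036, 1.0557, 1.0143, 1.0218, 1.0071, 1.0000, 1.0000)

# CDF at lag j = product of LDFs from lag j onward (a reverse cumulative product)
shown_cdfs <- rev(cumprod(rev(shown_ldfs)))

# AY 2024 sits at Lag 1, so its CDF is the first element
reported_2024 <- 5294338
cl_ult_2024   <- reported_2024 * shown_cdfs[1]
cl_ibnr_2024  <- cl_ult_2024 - reported_2024

round(shown_cdfs, 4)
round(c(ultimate = cl_ult_2024, ibnr = cl_ibnr_2024))
```

The Lag 2 through Lag 6 CDFs match the results table; the Lag 1 CDF comes out at 1.5416 instead of 1.5418, so the reproduced 2024 ultimate differs from the table by well under 0.1%.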

Part 4: Bornhuetter-Ferguson Method

Why BF?

The Chain Ladder method is powerful, but it has a known weakness: for very recent accident years with little development, it leans heavily on immature data. If accident year 2024 has only one lag of history, multiplying it by a large CDF amplifies any early-period noise.
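To put a number on that amplification, this sketch perturbs accident year 2024's reported losses by a hypothetical $100,000 and compares how much each method's ultimate moves. Inputs are the 2024 figures from this analysis (reported losses, the rounded Lag 1 CDF of 1.5418, and an a priori expected ultimate of premium × 0.71); the BF formula used is the one given later in this part.

```r
# AY 2024 inputs from this analysis (reported basis)
reported_2024 <- 5294338
cdf_2024      <- 1.5418            # Lag 1 CDF to ultimate, as displayed (rounded)
expected_2024 <- 11500000 * 0.71   # a priori expected ultimate

# Ultimate as a function of reported losses under each method
cl_ult_fn <- function(r) r * cdf_2024
bf_ult_fn <- function(r) r + expected_2024 * (1 - 1 / cdf_2024)

# Hypothetical shock: reported losses $100,000 higher purely by chance
shock <- 100000
shock_effect <- c(
  cl_change = cl_ult_fn(reported_2024 + shock) - cl_ult_fn(reported_2024),
  bf_change = bf_ult_fn(reported_2024 + shock) - bf_ult_fn(reported_2024)
)
round(shock_effect)
```

Chain Ladder passes the shock through multiplied by the CDF (about $154,000), while BF adds it only once ($100,000), because the BF unreported piece depends on the a priori estimate rather than on the data.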

The Bornhuetter-Ferguson (BF) method was introduced by Ronald Bornhuetter and Ronald Ferguson in a landmark 1972 paper. Their insight: blend two sources of information.

  • What the data says — actual losses reported to date (like Chain Ladder uses).
  • What we expected — a priori expected losses based on premium and an assumed loss ratio, set independently before looking at the data.

The BF ultimate estimate is:

BF Ultimate = Reported to Date + Expected Unreported Losses

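Where Expected Unreported Losses = Expected Ultimate × (1 − 1/CDF), and Expected Ultimate = Premium × A Priori Loss Ratio.

The term (1 − 1/CDF) is the percent unreported: the share of ultimate losses we expect not yet to be reported at the current lag. Since 1/CDF is the share already reported, BF borrows the Chain Ladder development pattern but not its extrapolation of the reported amount.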

For mature accident years (CDF near 1.0), BF gives nearly the same answer as Chain Ladder. For immature years, BF is more stable because it leans on the a priori expectation rather than extrapolating from thin data.
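One way to see why BF is more stable: with Z = 1/CDF, the BF ultimate equals a credibility-weighted blend, Z × (Chain Ladder ultimate) + (1 − Z) × (Expected Ultimate), because Z × (Reported × CDF) is just Reported. The sketch checks this identity numerically with the 2024 inputs (reported losses, the rounded CDF of 1.5418, and an expected ultimate of $8,165,000); the variable names are illustrative.

```r
reported_2024 <- 5294338   # AY 2024 reported to date
cdf_2024      <- 1.5418    # Lag 1 CDF (rounded, as displayed)
expected_2024 <- 8165000   # premium x a priori loss ratio

z <- 1 / cdf_2024          # credibility weight = share of ultimate already reported

cl_ultimate_2024 <- reported_2024 * cdf_2024
bf_direct <- reported_2024 + expected_2024 * (1 - z)          # BF formula above
bf_blend  <- z * cl_ultimate_2024 + (1 - z) * expected_2024   # credibility form

all.equal(bf_direct, bf_blend)   # TRUE
round(z, 4)                      # 0.6486
```

For 2024, about 65% of the weight sits on the Chain Ladder projection and 35% on the a priori estimate; for fully developed years Z = 1 and BF equals Chain Ladder.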

A Priori Expected Losses

# ── A priori loss ratios ─────────────────────────────────────────────────────────
# In practice, these would come from pricing, industry benchmarks, or management
# judgment. Here we use slightly smoothed versions of our true simulation ratios.
apriori_lr <- c(0.70, 0.70, 0.72, 0.72, 0.73, 0.73, 0.72, 0.71)

bf_inputs <- cl_results %>%
  mutate(
    premium           = premiums$premium,
    apriori_lr        = apriori_lr,
    expected_ultimate = round(premium * apriori_lr, 0),
    pct_unreported    = round(1 - 1 / rptd_cdf, 4),   # Expected % still to develop
    expected_unreptd  = round(expected_ultimate * pct_unreported, 0)
  )

bf_inputs %>%
  mutate(
    premium           = dollar(premium),
    expected_ultimate = dollar(expected_ultimate),
    expected_unreptd  = dollar(expected_unreptd)
  ) %>%
  select(accident_year, premium, apriori_lr, expected_ultimate,
         pct_unreported, expected_unreptd) %>%
  kable(
    caption = "BF Method — A Priori Inputs",
    col.names = c("Accident Year", "Premium", "A Priori LR",
                  "Expected Ultimate", "% Unreported", "Expected Unreported"),
    align = "c"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
BF Method — A Priori Inputs
Accident Year Premium A Priori LR Expected Ultimate % Unreported Expected Unreported
2017 $8,000,000 0.70 $5,600,000 0.0000 $0
2018 $8,500,000 0.70 $5,950,000 0.0000 $0
2019 $9,000,000 0.72 $6,480,000 0.0071 $46,008
2020 $9,500,000 0.72 $6,840,000 0.0283 $193,572
2021 $10,000,000 0.73 $7,300,000 0.0420 $306,600
2022 $10,500,000 0.73 $7,665,000 0.0925 $709,012
2023 $11,000,000 0.72 $7,920,000 0.1777 $1,407,384
2024 $11,500,000 0.71 $8,165,000 0.3514 $2,869,181

BF Ultimate and IBNR

bf_results <- bf_inputs %>%
  mutate(
    bf_ultimate = round(reported_to_date + expected_unreptd, 0),
    bf_ibnr     = round(bf_ultimate - reported_to_date, 0)
  )

bf_results %>%
  mutate(
    reported_to_date = dollar(reported_to_date),
    expected_unreptd = dollar(expected_unreptd),
    bf_ultimate      = dollar(bf_ultimate),
    bf_ibnr          = dollar(bf_ibnr)
  ) %>%
  select(accident_year, reported_to_date, expected_unreptd, bf_ultimate, bf_ibnr) %>%
  kable(
    caption = "Bornhuetter-Ferguson Results — Reported Basis",
    col.names = c("Accident Year", "Reported to Date", "Expected Unreported",
                  "BF Ultimate", "BF IBNR"),
    align = "c"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Bornhuetter-Ferguson Results — Reported Basis
Accident Year Reported to Date Expected Unreported BF Ultimate BF IBNR
2017 $5,440,000 $0 $5,440,000 $0
2018 $6,002,136 $0 $6,002,136 $0
2019 $6,537,471 $46,008 $6,583,479 $46,008
2020 $6,493,253 $193,572 $6,686,825 $193,572
2021 $6,951,550 $306,600 $7,258,150 $306,600
2022 $7,160,007 $709,012 $7,869,019 $709,012
2023 $6,620,778 $1,407,384 $8,028,162 $1,407,384
2024 $5,294,338 $2,869,181 $8,163,519 $2,869,181
cat("BF Total IBNR:", dollar(sum(bf_results$bf_ibnr)), "\n")
## BF Total IBNR: $5,531,757
cat("BF Total Ultimate:", dollar(sum(bf_results$bf_ultimate)), "\n")
## BF Total Ultimate: $56,031,290

Part 5: Method Comparison & Visualization

Side-by-Side: Chain Ladder vs. BF

comparison <- tibble(
  accident_year    = accident_years,
  reported_to_date = cl_results$reported_to_date,
  cl_ultimate      = cl_results$cl_ultimate_rptd,
  cl_ibnr          = cl_results$ibnr_rptd_basis,
  bf_ultimate      = bf_results$bf_ultimate,
  bf_ibnr          = bf_results$bf_ibnr,
  diff_ibnr        = bf_results$bf_ibnr - cl_results$ibnr_rptd_basis
)

comparison %>%
  mutate(across(c(reported_to_date, cl_ultimate, cl_ibnr,
                  bf_ultimate, bf_ibnr, diff_ibnr), dollar)) %>%
  kable(
    caption = "Chain Ladder vs. Bornhuetter-Ferguson — Full Comparison",
    col.names = c("Accident Year", "Reported to Date",
                  "CL Ultimate", "CL IBNR",
                  "BF Ultimate", "BF IBNR",
                  "BF − CL IBNR"),
    align = "c"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
  row_spec(which(abs(bf_results$bf_ibnr - cl_results$ibnr_rptd_basis) ==
                   max(abs(bf_results$bf_ibnr - cl_results$ibnr_rptd_basis))),
           bold = TRUE, background = "#fff3cd")
Chain Ladder vs. Bornhuetter-Ferguson — Full Comparison
Accident Year Reported to Date CL Ultimate CL IBNR BF Ultimate BF IBNR BF − CL IBNR
2017 $5,440,000 $5,440,000 $0 $5,440,000 $0 $0
2018 $6,002,136 $6,002,136 $0 $6,002,136 $0 $0
2019 $6,537,471 $6,584,139 $46,668 $6,583,479 $46,008 -$660
2020 $6,493,253 $6,682,080 $188,827 $6,686,825 $193,572 $4,745
2021 $6,951,550 $7,256,056 $304,506 $7,258,150 $306,600 $2,094
2022 $7,160,007 $7,889,953 $729,946 $7,869,019 $709,012 -$20,934
2023 $6,620,778 $8,051,839 $1,431,061 $8,028,162 $1,407,384 -$23,677
2024 $5,294,338 $8,162,638 $2,868,300 $8,163,519 $2,869,181 $881

Ultimate Loss Estimates by Accident Year

plot_data <- comparison %>%
  select(accident_year, cl_ultimate, bf_ultimate) %>%
  pivot_longer(cols = c(cl_ultimate, bf_ultimate),
               names_to = "method", values_to = "ultimate") %>%
  mutate(method = recode(method,
                         "cl_ultimate" = "Chain Ladder",
                         "bf_ultimate" = "Bornhuetter-Ferguson"))

ggplot(plot_data, aes(x = factor(accident_year), y = ultimate, fill = method)) +
  geom_col(position = "dodge", width = 0.65, alpha = 0.88) +
  geom_text(aes(label = paste0("$", round(ultimate / 1e6, 2), "M")),
            position = position_dodge(width = 0.65),
            vjust = -0.4, size = 3.2, fontface = "bold") +
  scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M"),
                     expand = expansion(mult = c(0, 0.12))) +
  scale_fill_manual(values = c("Chain Ladder" = "#2c7bb6", "Bornhuetter-Ferguson" = "#d7191c")) +
  labs(
    title    = "Projected Ultimate Losses by Accident Year",
    subtitle = "Auto Liability | Chain Ladder vs. Bornhuetter-Ferguson",
    x        = "Accident Year",
    y        = "Ultimate Losses",
    fill     = "Method",
    caption  = "Note: Based on synthetic data generated for illustration purposes."
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "grey40"),
    legend.position = "top",
    panel.grid.major.x = element_blank()
  )
Chain Ladder vs. BF Ultimate Estimates by Accident Year

IBNR by Accident Year

ibnr_data <- comparison %>%
  select(accident_year, cl_ibnr, bf_ibnr) %>%
  pivot_longer(cols = c(cl_ibnr, bf_ibnr),
               names_to = "method", values_to = "ibnr") %>%
  mutate(method = recode(method,
                         "cl_ibnr" = "Chain Ladder",
                         "bf_ibnr" = "Bornhuetter-Ferguson"))

ggplot(ibnr_data, aes(x = factor(accident_year), y = ibnr, fill = method)) +
  geom_col(position = "dodge", width = 0.65, alpha = 0.88) +
  scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M"),
                     expand = expansion(mult = c(0, 0.15))) +
  scale_fill_manual(values = c("Chain Ladder" = "#2c7bb6", "Bornhuetter-Ferguson" = "#d7191c")) +
  labs(
    title    = "IBNR Reserves by Accident Year",
    subtitle = "IBNR is concentrated in the most recent (immature) accident years",
    x        = "Accident Year",
    y        = "IBNR",
    fill     = "Method",
    caption  = "IBNR = Estimated Ultimate − Reported to Date"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "grey40"),
    legend.position = "top",
    panel.grid.major.x = element_blank()
  )
IBNR Estimates: Chain Ladder vs. BF

Development Pattern Visualization

dev_data <- claims_df %>%
  group_by(accident_year, development_lag) %>%
  summarise(reported_losses = max(reported_losses), .groups = "drop") %>%
  group_by(accident_year) %>%
  mutate(pct_of_latest = reported_losses / max(reported_losses)) %>%
  ungroup()

ggplot(dev_data, aes(x = development_lag, y = reported_losses / 1e6,
                     color = factor(accident_year), group = accident_year)) +
  geom_line(linewidth = 1.1, alpha = 0.8) +
  geom_point(size = 2.5) +
  scale_color_brewer(palette = "Dark2") +
  scale_x_continuous(breaks = 1:8) +
  scale_y_continuous(labels = label_dollar(suffix = "M")) +
  labs(
    title  = "Cumulative Reported Loss Development by Accident Year",
    x      = "Development Lag (Years)",
    y      = "Cumulative Reported Losses ($M)",
    color  = "Accident Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title      = element_text(face = "bold", size = 15),
    legend.position = "right"
  )
Loss Development Pattern by Lag


Part 6: Summary & Conclusions

summary_tbl <- tibble(
  Metric              = c("Total Reported to Date", "Chain Ladder Ultimate",
                           "Chain Ladder IBNR", "BF Ultimate", "BF IBNR"),
  Value               = c(
    dollar(sum(cl_results$reported_to_date)),
    dollar(sum(cl_results$cl_ultimate_rptd)),
    dollar(sum(cl_results$ibnr_rptd_basis)),
    dollar(sum(bf_results$bf_ultimate)),
    dollar(sum(bf_results$bf_ibnr))
  )
)

summary_tbl %>%
  kable(caption = "Portfolio Summary — All Accident Years Combined", align = "lr") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  row_spec(c(3, 5), bold = TRUE, background = "#e8f4f8")
Portfolio Summary — All Accident Years Combined
Metric Value
Total Reported to Date $50,499,533
Chain Ladder Ultimate $56,068,841
Chain Ladder IBNR $5,569,308
BF Ultimate $56,031,290
BF IBNR $5,531,757

Key Takeaways

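Chain Ladder is intuitive and data-driven, but it leans hardest on the least mature data. For 2024, each additional dollar reported at Lag 1 adds about $1.54 (the Lag 1 CDF) to the Chain Ladder ultimate, so any noise in that single observed lag is amplified. In this simulation the two methods happen to agree closely (total IBNR differs by $37,551, under 1%), likely because the a priori loss ratios were set close to the true simulated ratios and the simulated noise is small; with real data, the gap for immature years can be much wider.

Bornhuetter-Ferguson tempers that sensitivity by anchoring immature years to an a priori expectation. For 2024 (only one lag visible), the BF IBNR comes entirely from expected losses (about 35% of the $8,165,000 a priori ultimate is assumed still unreported) and does not move with the observed data, which is the intended behavior. As accident years mature and their CDFs approach 1.0, the BF and Chain Ladder estimates converge.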

In practice, actuaries use both methods (and often others, like the Benktander method or Cape Cod) and apply professional judgment to select or blend estimates. No single method is always right — the goal is a reasonable, defensible reserve that ensures the insurer can pay its obligations to policyholders.
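For readers curious how the two alternatives mentioned above differ mechanically, here is a rough sketch using inputs from this analysis (reported losses to date, the rounded reported CDFs, premium, and the a priori loss ratios). It follows the commonly cited textbook forms: Benktander applies the BF formula a second time with the BF ultimate as the prior, and Cape Cod replaces the a priori loss ratio with one estimated from the data as reported losses divided by "used-up" premium (premium / CDF). It is illustrative only, not part of the original analysis, and the variable names are hypothetical.

```r
# Inputs from this analysis, AY 2017-2024 (reported basis)
reported <- c(5440000, 6002136, 6537471, 6493253, 6951550, 7160007, 6620778, 5294338)
cdf      <- c(1.0000, 1.0000, 1.0071, 1.0291, 1.0438, 1.1019, 1.2161, 1.5418)  # rounded
premium  <- seq(8e6, 11.5e6, by = 0.5e6)
apriori  <- c(0.70, 0.70, 0.72, 0.72, 0.73, 0.73, 0.72, 0.71)

pct_unrep <- 1 - 1 / cdf
cl_ult    <- reported * cdf
bf_ult    <- reported + premium * apriori * pct_unrep

# Benktander: apply the BF formula again, using the BF ultimate as the prior
gb_ult <- reported + bf_ult * pct_unrep

# Cape Cod: estimate the expected loss ratio from the data as reported losses
# divided by "used-up" premium (premium / CDF), then apply the BF formula
cc_elr <- sum(reported) / sum(premium / cdf)
cc_ult <- reported + premium * cc_elr * pct_unrep

round(cc_elr, 4)
data.frame(accident_year = 2017:2024, chain_ladder = round(cl_ult), bf = round(bf_ult),
           benktander = round(gb_ult), cape_cod = round(cc_ult))
```

On these inputs the Cape Cod loss ratio comes out around 0.72, close to the a priori ratios, so all the estimates land near each other. The Benktander ultimate always sits between the BF and Chain Ladder values, since it gives the Chain Ladder projection more weight than BF does.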


Session Info

sessionInfo()
## R version 4.5.3 (2026-03-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Chicago
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] scales_1.4.0     kableExtra_1.4.0 knitr_1.51       lubridate_1.9.5 
##  [5] forcats_1.0.1    stringr_1.6.0    dplyr_1.2.0      purrr_1.2.1     
##  [9] readr_2.2.0      tidyr_1.3.2      tibble_3.3.1     ggplot2_4.0.2   
## [13] tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.10        generics_0.1.4     xml2_1.5.2         stringi_1.8.7     
##  [5] hms_1.1.4          digest_0.6.39      magrittr_2.0.4     evaluate_1.0.5    
##  [9] grid_4.5.3         timechange_0.4.0   RColorBrewer_1.1-3 fastmap_1.2.0     
## [13] jsonlite_2.0.0     viridisLite_0.4.3  textshaping_1.0.5  jquerylib_0.1.4   
## [17] cli_3.6.5          crayon_1.5.3       rlang_1.1.7        bit64_4.6.0-1     
## [21] withr_3.0.2        cachem_1.1.0       yaml_2.3.12        otel_0.2.0        
## [25] parallel_4.5.3     tools_4.5.3        tzdb_0.5.0         vctrs_0.7.1       
## [29] R6_2.6.1           lifecycle_1.0.5    bit_4.6.0          vroom_1.7.0       
## [33] pkgconfig_2.0.3    pillar_1.11.1      bslib_0.10.0       gtable_0.3.6      
## [37] glue_1.8.0         systemfonts_1.3.2  xfun_0.56          tidyselect_1.2.1  
## [41] rstudioapi_0.18.0  farver_2.1.2       htmltools_0.5.9    labeling_0.4.3    
## [45] rmarkdown_2.30     svglite_2.2.2      compiler_4.5.3     S7_0.2.1

This analysis uses fully synthetic data and is intended for educational and portfolio demonstration purposes only. It does not represent the financials of any real insurance company.