When an auto liability insurance claim is reported, the insurer rarely knows exactly how much it will ultimately cost. A fender-bender might settle quickly; a serious injury claim could take years to resolve. This creates a fundamental challenge for insurance companies: how do you set aside enough money today to pay claims that won’t fully develop until the future?
The answer lies in loss reserving, a set of actuarial techniques that use historical claims data to estimate how much money an insurer will ultimately owe. This project walks through two widely used methods: the Chain Ladder method and the Bornhuetter-Ferguson (BF) method.
We’ll build everything from scratch: generating synthetic claims data, constructing a loss triangle, and applying both methods — all in reproducible R code.
Before diving into the data, here’s a plain-language guide to the terminology used throughout this analysis.
| Term | Definition |
|---|---|
| Accident Year | The calendar year in which a loss event (accident) occurred. Claims are grouped by when the accident happened, not when they were reported or paid. |
| Development Lag | The age of an accident year's claims, counted in annual evaluations. Lag 1 = the end of the accident year itself; Lag 2 = one year later; and so on. |
| Paid Losses | The actual dollars an insurer has paid out on claims so far — cash out the door. |
| Reported (Incurred) Losses | Paid losses plus case reserves — the insurer’s best estimate of what each open claim will ultimately cost. |
| Loss Triangle | A matrix where rows are accident years and columns are development lags. Each cell shows cumulative losses as of a given lag. The upper-left portion is observed; the lower-right is unknown and must be estimated. |
| Age-to-Age Factor (LDF) | A multiplier that describes how much losses grow from one development lag to the next. Also called a Loss Development Factor. |
| Tail Factor | A factor applied beyond the last observed development period to capture any remaining loss development. |
| Ultimate Losses | The total amount an insurer expects to pay on a group of claims once all development is complete. |
| IBNR | Incurred But Not Reported — losses that have occurred but haven’t yet been fully recognized in the insurer’s books. IBNR = Ultimate − Reported to Date. |
| A Priori Loss Ratio | An independent, upfront estimate of losses as a percentage of premium. Used in the BF method. |
| Expected Losses | Premium × A Priori Loss Ratio. The baseline estimate of what losses should be before looking at actual data. |
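
To make the last few definitions concrete, here is a minimal sketch of the arithmetic for a single accident year. All figures are hypothetical round numbers chosen for illustration; they are not taken from the dataset built below.

```r
# Hypothetical figures for one accident year (illustration only)
premium       <- 10e6    # earned premium
apriori_lr    <- 0.70    # a priori loss ratio
paid_to_date  <- 4.0e6   # cumulative paid losses
case_reserves <- 1.5e6   # estimates held on open claims
ultimate      <- 7.2e6   # assumed ultimate, e.g. from a reserving method

# Expected Losses = Premium x A Priori Loss Ratio
expected_losses <- premium * apriori_lr

# Reported (Incurred) Losses = Paid Losses + Case Reserves
reported_to_date <- paid_to_date + case_reserves

# IBNR = Ultimate - Reported to Date
ibnr <- ultimate - reported_to_date

c(expected_losses = expected_losses,
  reported_to_date = reported_to_date,
  ibnr = ibnr)
```

Note that IBNR is measured against reported losses, not paid losses, so case reserves are already excluded from it.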
Real insurance claims data is confidential and proprietary. For this portfolio demonstration, we’ll simulate a realistic dataset for a fictitious auto liability book of business. The data is designed to mimic real-world patterns: losses develop slowly in early lags, then taper off as claims close.
We’ll generate individual claims with the following fields:
- claim_id: a unique identifier for each claim
- accident_year: the year the accident occurred (2017–2024)
- development_lag: years of development observed (1–8)
- paid_losses: cumulative paid losses as of that lag
- reported_losses: cumulative reported (incurred) losses as of that lag
- claim_count: number of claims in that accident year / lag combination
- premium: earned premium for that accident year (used in BF)

library(tidyverse)
library(knitr)
library(kableExtra)
library(scales)
set.seed(42) # Ensures reproducibility — anyone running this gets the same numbers
# ── Parameters ──────────────────────────────────────────────────────────────────
accident_years <- 2017:2024
n_lags <- 8
n_claims_range <- c(180, 320) # Claims per accident year
# Premium grows modestly each year (reflects a growing book of business)
premiums <- tibble(
accident_year = accident_years,
premium = round(seq(8e6, 11.5e6, length.out = 8), -3)
)
# ── Development pattern ──────────────────────────────────────────────────────────
# These cumulative percentages describe how much of ultimate losses is paid
# (cum_pct_paid) or reported (cum_pct_rptd) by each lag.
# Auto liability tends to develop faster than, say, workers' comp.
cum_pct_paid <- c(0.38, 0.62, 0.76, 0.85, 0.91, 0.95, 0.98, 1.00)
cum_pct_rptd <- c(0.65, 0.82, 0.90, 0.95, 0.97, 0.98, 0.99, 1.00)
# ── Ultimate loss ratio assumptions ─────────────────────────────────────────────
# Each accident year has a slightly different underlying loss ratio, reflecting
# real-world variability in claim severity and frequency.
ult_loss_ratios <- c(0.68, 0.71, 0.74, 0.70, 0.73, 0.75, 0.72, 0.70)
# ── Build the claims dataset ─────────────────────────────────────────────────────
claims_list <- list()
for (i in seq_along(accident_years)) {
ay <- accident_years[i]
prem <- premiums$premium[i]
ult_lr <- ult_loss_ratios[i]
ult_loss <- prem * ult_lr
# The current calendar year is 2024, so accident year 2017 has 8 lags of
# development, 2018 has 7, ..., 2024 has only 1.
max_lag <- n_lags - (i - 1) # most-recent AY has fewest lags observed
n_claims <- sample(seq(n_claims_range[1], n_claims_range[2]), 1)
for (lag in 1:max_lag) {
# Add realistic noise around the expected development percentages
noise_paid <- rnorm(1, 0, 0.015)
noise_rptd <- rnorm(1, 0, 0.010)
paid_pct <- min(max(cum_pct_paid[lag] + noise_paid, 0.01), 1.00)
rptd_pct <- min(max(cum_pct_rptd[lag] + noise_rptd, paid_pct), 1.00)
claims_list[[length(claims_list) + 1]] <- tibble(
claim_id = paste0("AY", ay, "-L", lag),
accident_year = ay,
development_lag = lag,
claim_count = round(n_claims * paid_pct),
paid_losses = round(ult_loss * paid_pct, 0),
reported_losses = round(ult_loss * rptd_pct, 0),
premium = prem
)
}
}
claims_df <- bind_rows(claims_list)
# Preview the first several rows
claims_df %>%
head(16) %>%
mutate(across(c(paid_losses, reported_losses, premium), dollar)) %>%
kable(caption = "Sample of Synthetic Auto Liability Claims Data",
align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE)

| claim_id | accident_year | development_lag | claim_count | paid_losses | reported_losses | premium |
|---|---|---|---|---|---|---|
| AY2017-L1 | 2017 | 1 | 92 | $2,192,103 | $3,588,003 | $8,000,000 |
| AY2017-L2 | 2017 | 2 | 142 | $3,376,707 | $4,400,710 | $8,000,000 |
| AY2017-L3 | 2017 | 3 | 175 | $4,178,384 | $4,927,563 | $8,000,000 |
| AY2017-L4 | 2017 | 4 | 192 | $4,570,348 | $5,252,586 | $8,000,000 |
| AY2017-L5 | 2017 | 5 | 203 | $4,853,488 | $5,285,059 | $8,000,000 |
| AY2017-L6 | 2017 | 6 | 213 | $5,079,372 | $5,418,967 | $8,000,000 |
| AY2017-L7 | 2017 | 7 | 224 | $5,334,108 | $5,440,000 | $8,000,000 |
| AY2017-L8 | 2017 | 8 | 228 | $5,440,000 | $5,440,000 | $8,000,000 |
| AY2018-L1 | 2018 | 1 | 96 | $2,052,824 | $3,775,468 | $8,500,000 |
| AY2018-L2 | 2018 | 2 | 181 | $3,861,203 | $4,930,194 | $8,500,000 |
| AY2018-L3 | 2018 | 3 | 208 | $4,425,347 | $5,421,125 | $8,500,000 |
| AY2018-L4 | 2018 | 4 | 246 | $5,239,708 | $5,847,625 | $8,500,000 |
| AY2018-L5 | 2018 | 5 | 256 | $5,452,882 | $5,838,424 | $8,500,000 |
| AY2018-L6 | 2018 | 6 | 261 | $5,573,640 | $5,942,067 | $8,500,000 |
| AY2018-L7 | 2018 | 7 | 275 | $5,856,364 | $6,002,136 | $8,500,000 |
| AY2019-L1 | 2019 | 1 | 112 | $2,547,527 | $4,270,403 | $9,000,000 |
# Export the full dataset as a CSV for sharing / reproducibility
write_csv(claims_df, "auto_liability_claims.csv")
cat("✓ Exported", nrow(claims_df), "rows to auto_liability_claims.csv\n")

## ✓ Exported 36 rows to auto_liability_claims.csv
A quick summary of what we just built:
claims_df %>%
group_by(accident_year) %>%
summarise(
lags_observed = max(development_lag),
total_claims = max(claim_count),
paid_at_latest = dollar(max(paid_losses)),
rptd_at_latest = dollar(max(reported_losses)),
premium = dollar(first(premium))
) %>%
kable(caption = "Summary by Accident Year (latest diagonal)",
col.names = c("Accident Year", "Lags Observed", "# Claims",
"Paid to Date", "Reported to Date", "Premium"),
align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Accident Year | Lags Observed | # Claims | Paid to Date | Reported to Date | Premium |
|---|---|---|---|---|---|
| 2017 | 8 | 228 | $5,440,000 | $5,440,000 | $8,000,000 |
| 2018 | 7 | 275 | $5,856,364 | $6,002,136 | $8,500,000 |
| 2019 | 6 | 280 | $6,364,318 | $6,537,471 | $9,000,000 |
| 2020 | 5 | 176 | $6,208,679 | $6,493,253 | $9,500,000 |
| 2021 | 4 | 185 | $6,180,263 | $6,951,550 | $10,000,000 |
| 2022 | 3 | 161 | $6,107,674 | $7,160,007 | $10,500,000 |
| 2023 | 2 | 114 | $5,017,897 | $6,620,778 | $11,000,000 |
| 2024 | 1 | 86 | $3,129,155 | $5,294,338 | $11,500,000 |
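A loss triangle is the backbone of most reserving analyses. Imagine laying out your claims data in a grid: each row is an accident year, each column is a development lag, and each cell holds the cumulative losses for that accident year as of that lag.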
The result looks like a triangle because more-recent accident years have fewer development periods observed — we can only see data through the current calendar year. The lower-right portion of the grid is blank — that’s exactly what we need to estimate.
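
Before building the full triangles, it may help to see the same reshaping on a tiny example. The sketch below uses base R and made-up numbers (three accident years, paid losses in arbitrary units); the object names `toy` and `toy_triangle` are illustrative and are not used elsewhere in this analysis.

```r
# Hypothetical long-format data: three accident years valued at the end of 2024
toy <- data.frame(
  accident_year   = c(2022, 2022, 2022, 2023, 2023, 2024),
  development_lag = c(1, 2, 3, 1, 2, 1),
  paid_losses     = c(100, 150, 165, 110, 168, 120)
)

ays  <- sort(unique(toy$accident_year))
lags <- sort(unique(toy$development_lag))

# Start from an all-NA grid: rows = accident years, columns = development lags
toy_triangle <- matrix(
  NA_real_, nrow = length(ays), ncol = length(lags),
  dimnames = list(as.character(ays), paste0("Lag_", lags))
)

# Fill only the observed (row, column) cells; the rest stay NA
cell_index <- cbind(match(toy$accident_year, ays),
                    match(toy$development_lag, lags))
toy_triangle[cell_index] <- toy$paid_losses

toy_triangle
```

The three `NA` cells in the lower right are the future evaluations that a reserving method has to estimate.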
# ── Paid Loss Triangle ───────────────────────────────────────────────────────────
paid_triangle <- claims_df %>%
select(accident_year, development_lag, paid_losses) %>%
pivot_wider(
names_from = development_lag,
values_from = paid_losses,
names_prefix = "Lag_"
) %>%
arrange(accident_year) %>%
column_to_rownames("accident_year")
# ── Reported Loss Triangle ───────────────────────────────────────────────────────
rptd_triangle <- claims_df %>%
select(accident_year, development_lag, reported_losses) %>%
pivot_wider(
names_from = development_lag,
values_from = reported_losses,
names_prefix = "Lag_"
) %>%
arrange(accident_year) %>%
column_to_rownames("accident_year")
# Display the paid triangle with NA cells shown as "—"
paid_triangle %>%
mutate(across(everything(), ~ ifelse(is.na(.), "—", dollar(.)))) %>%
kable(caption = "Cumulative Paid Loss Triangle — Auto Liability (2017–2024)",
align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE) %>%
column_spec(1:8, width = "10%")

| Accident Year | Lag_1 | Lag_2 | Lag_3 | Lag_4 | Lag_5 | Lag_6 | Lag_7 | Lag_8 |
|---|---|---|---|---|---|---|---|---|
| 2017 | $2,192,103 | $3,376,707 | $4,178,384 | $4,570,348 | $4,853,488 | $5,079,372 | $5,334,108 | $5,440,000 |
| 2018 | $2,052,824 | $3,861,203 | $4,425,347 | $5,239,708 | $5,452,882 | $5,573,640 | $5,856,364 | — |
| 2019 | $2,547,527 | $4,223,797 | $4,953,891 | $5,719,035 | $5,765,537 | $6,364,318 | — | — |
| 2020 | $2,390,514 | $4,042,064 | $5,010,963 | $5,684,612 | $6,208,679 | — | — | — |
| 2021 | $2,463,287 | $4,512,995 | $5,507,478 | $6,180,263 | — | — | — | — |
| 2022 | $3,157,844 | $5,036,363 | $6,107,674 | — | — | — | — | — |
| 2023 | $2,842,889 | $5,017,897 | — | — | — | — | — | — |
| 2024 | $3,129,155 | — | — | — | — | — | — | — |
rptd_triangle %>%
mutate(across(everything(), ~ ifelse(is.na(.), "—", dollar(.)))) %>%
kable(caption = "Cumulative Reported Loss Triangle — Auto Liability (2017–2024)",
align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = TRUE)

| Accident Year | Lag_1 | Lag_2 | Lag_3 | Lag_4 | Lag_5 | Lag_6 | Lag_7 | Lag_8 |
|---|---|---|---|---|---|---|---|---|
| 2017 | $3,588,003 | $4,400,710 | $4,927,563 | $5,252,586 | $5,285,059 | $5,418,967 | $5,440,000 | $5,440,000 |
| 2018 | $3,775,468 | $4,930,194 | $5,421,125 | $5,847,625 | $5,838,424 | $5,942,067 | $6,002,136 | — |
| 2019 | $4,270,403 | $5,414,277 | $5,990,559 | $6,305,865 | $6,393,393 | $6,537,471 | — | — |
| 2020 | $4,351,282 | $5,549,033 | $6,028,601 | $6,265,375 | $6,493,253 | — | — | — |
| 2021 | $4,816,642 | $5,992,849 | $6,639,349 | $6,951,550 | — | — | — | — |
| 2022 | $5,061,476 | $6,483,948 | $7,160,007 | — | — | — | — | — |
| 2023 | $5,208,989 | $6,620,778 | — | — | — | — | — | — |
| 2024 | $5,294,338 | — | — | — | — | — | — | — |
Notice the staircase pattern: accident year 2017 has all 8 lags filled in, while accident year 2024 has only Lag 1 visible. The goal of our reserving methods is to fill in the missing cells and sum up what we expect each accident year to ultimately cost.
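
The staircase follows from a simple rule: a cell is observable only if its evaluation date is not later than the valuation year. The sketch below encodes that rule for this 8 × 8 layout; the names `valuation_year` and `observed` are introduced here for illustration only.

```r
accident_years <- 2017:2024
lags           <- 1:8
valuation_year <- 2024   # latest calendar year with data

# Lag 1 is evaluated at the end of the accident year itself, so the
# evaluation year of a cell is accident_year + lag - 1
observed <- outer(accident_years, lags,
                  function(ay, lag) ay + lag - 1 <= valuation_year)
dimnames(observed) <- list(accident_years, paste0("Lag_", lags))

sum(observed)    # 36 observed cells, matching the 36 exported rows
sum(!observed)   # 28 cells left to estimate
```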
The Chain Ladder method (also called the Development Method) is built on one key assumption: the way losses develop over time in the past will continue in the future.
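The process has three steps: (1) compute age-to-age factors (LDFs) from the observed triangle, (2) chain them into cumulative development factors (CDFs) to ultimate, and (3) multiply each accident year's latest observed losses by its CDF to project ultimate losses and IBNR.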
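
The whole method can be read as three steps: factors, cumulative factors, projection. Here is a minimal sketch on a hypothetical 3 × 3 triangle that is small enough to check by hand. The numbers and the name `tri` are made up, and the tail factor of 1.000 is an assumption of the sketch.

```r
# Hypothetical cumulative triangle (arbitrary units)
tri <- matrix(c(100, 150, 165,
                110, 168,  NA,
                120,  NA,  NA),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("2022", "2023", "2024"), paste0("Lag_", 1:3)))
n <- ncol(tri)

# Step 1: volume-weighted age-to-age factors, using only rows where
# both the current and the next lag are observed
ldfs <- sapply(1:(n - 1), function(j) {
  both <- !is.na(tri[, j]) & !is.na(tri[, j + 1])
  sum(tri[both, j + 1]) / sum(tri[both, j])
})

# Step 2: cumulative factors to ultimate (assumed tail factor of 1.000)
cdfs <- rev(cumprod(rev(c(ldfs, 1.000))))

# Step 3: project each accident year's latest observed value to ultimate
current_lag <- rowSums(!is.na(tri))
latest      <- tri[cbind(seq_len(nrow(tri)), current_lag)]
ultimate    <- latest * cdfs[current_lag]
names(ultimate) <- rownames(tri)

ldfs       # (150 + 168) / (100 + 110) and 165 / 150
ultimate
```

By hand: the Lag 1 → 2 factor is 318 / 210 ≈ 1.514 and the Lag 2 → 3 factor is 1.100, so accident year 2023 projects to 168 × 1.1 = 184.8.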
# ── Volume-weighted age-to-age factors ──────────────────────────────────────────
# Volume-weighting sums numerators and denominators separately before dividing.
# This gives more weight to larger accident years and is the actuarial standard.
compute_ldfs <- function(triangle) {
n_rows <- nrow(triangle)
n_cols <- ncol(triangle)
ldfs <- numeric(n_cols - 1)
for (j in 1:(n_cols - 1)) {
# Only use rows where BOTH the current and next lag are observed
numerator <- sum(triangle[, j + 1], na.rm = TRUE)
denominator <- sum(triangle[1:(n_rows - j), j], na.rm = TRUE)
ldfs[j] <- numerator / denominator
}
ldfs
}
paid_ldfs <- compute_ldfs(as.matrix(paid_triangle))
rptd_ldfs <- compute_ldfs(as.matrix(rptd_triangle))
# Add a tail factor of 1.000 (assuming development is complete at Lag 8)
tail_factor <- 1.000
paid_ldfs_full <- c(paid_ldfs, tail_factor)
rptd_ldfs_full <- c(rptd_ldfs, tail_factor)
ldf_table <- tibble(
Transition = c(paste0("Lag ", 1:7, " → ", 2:8), "Lag 8 → Ult"),
`Paid LDF` = round(paid_ldfs_full, 4),
`Rptd LDF` = round(rptd_ldfs_full, 4)
)
ldf_table %>%
kable(caption = "Age-to-Age Loss Development Factors (LDFs)",
align = "c") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Transition | Paid LDF | Rptd LDF |
|---|---|---|
| Lag 1 → 2 | 1.7040 | 1.2677 |
| Lag 2 → 3 | 1.2048 | 1.1036 |
| Lag 3 → 4 | 1.1378 | 1.0557 |
| Lag 4 → 5 | 1.0503 | 1.0143 |
| Lag 5 → 6 | 1.0588 | 1.0218 |
| Lag 6 → 7 | 1.0505 | 1.0071 |
| Lag 7 → 8 | 1.0199 | 1.0000 |
| Lag 8 → Ult | 1.0000 | 1.0000 |
Reading the table: The Paid LDF of 1.7040 for Lag 1 → 2 means that, on a volume-weighted average, cumulative paid losses grew by about 70% between the first and second year of development. Factors closer to 1.000 in later lags indicate that most losses have already been paid and little development remains.
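
As a check on one entry, the Lag 6 → 7 paid factor can be reproduced by hand from the paid triangle shown earlier. Only accident years 2017 and 2018 have both lags observed. The sketch also computes the noise-free factors implied by the simulation's `cum_pct_paid` pattern, for comparison with the observed ones; the names `lag6`, `lag7`, and `implied_ldfs` are introduced here for illustration.

```r
# Paid losses copied from the paid triangle (accident years 2017 and 2018)
lag6 <- c(`2017` = 5079372, `2018` = 5573640)
lag7 <- c(`2017` = 5334108, `2018` = 5856364)

# Volume-weighted factor: sum the numerators and denominators first
ldf_6_7 <- sum(lag7) / sum(lag6)
round(ldf_6_7, 4)   # 1.0505, as in the table

# For contrast: the simple (unweighted) average of the individual factors
mean(lag7 / lag6)

# Factors implied by the simulation's paid pattern before noise is added
cum_pct_paid <- c(0.38, 0.62, 0.76, 0.85, 0.91, 0.95, 0.98, 1.00)
implied_ldfs <- cum_pct_paid[-1] / cum_pct_paid[-length(cum_pct_paid)]
round(implied_ldfs, 4)
```

The first implied factor is 0.62 / 0.38 ≈ 1.63, so the observed 1.7040 reflects the random noise added to the development percentages.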
# ── Cumulative-to-Ultimate (CDF) factors ────────────────────────────────────────
# The CDF at lag j = product of all LDFs from lag j onward.
# This tells us: "multiply current losses by this factor to get ultimate."
compute_cdfs <- function(ldfs_full) {
n <- length(ldfs_full)
cdfs <- numeric(n)
cdfs[n] <- ldfs_full[n]
for (j in (n - 1):1) {
cdfs[j] <- ldfs_full[j] * cdfs[j + 1]
}
cdfs
}
paid_cdfs <- compute_cdfs(paid_ldfs_full)
rptd_cdfs <- compute_cdfs(rptd_ldfs_full)
# ── Extract the latest diagonal ──────────────────────────────────────────────────
# For each accident year, find its most recent observed value.
get_latest_diagonal <- function(triangle) {
apply(triangle, 1, function(row) {
vals <- row[!is.na(row)]
if (length(vals) == 0) NA else tail(vals, 1)
})
}
latest_paid <- get_latest_diagonal(as.matrix(paid_triangle))
latest_rptd <- get_latest_diagonal(as.matrix(rptd_triangle))
# The lag at which each accident year currently sits (observed cells per row)
current_lag <- apply(as.matrix(paid_triangle), 1, function(row) sum(!is.na(row)))
# ── Compute ultimate and IBNR ────────────────────────────────────────────────────
cl_results <- tibble(
accident_year = accident_years,
current_lag = current_lag,
paid_to_date = latest_paid,
reported_to_date = latest_rptd,
paid_cdf = paid_cdfs[current_lag],
rptd_cdf = rptd_cdfs[current_lag],
cl_ultimate_paid = round(latest_paid * paid_cdfs[current_lag], 0),
cl_ultimate_rptd = round(latest_rptd * rptd_cdfs[current_lag], 0),
ibnr_paid_basis = round(latest_paid * paid_cdfs[current_lag] - latest_rptd, 0),
ibnr_rptd_basis = round(latest_rptd * rptd_cdfs[current_lag] - latest_rptd, 0)
)
cl_results %>%
mutate(
paid_to_date = dollar(paid_to_date),
reported_to_date = dollar(reported_to_date),
paid_cdf = round(paid_cdf, 4),
rptd_cdf = round(rptd_cdf, 4),
cl_ultimate_rptd = dollar(cl_ultimate_rptd),
ibnr_rptd_basis = dollar(ibnr_rptd_basis)
) %>%
select(accident_year, current_lag, reported_to_date, rptd_cdf,
cl_ultimate_rptd, ibnr_rptd_basis) %>%
kable(
caption = "Chain Ladder Results — Reported Basis",
col.names = c("Accident Year", "Current Lag", "Reported to Date",
"CDF to Ult", "CL Ultimate", "CL IBNR"),
align = "c"
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Accident Year | Current Lag | Reported to Date | CDF to Ult | CL Ultimate | CL IBNR |
|---|---|---|---|---|---|
| 2017 | 8 | $5,440,000 | 1.0000 | $5,440,000 | $0 |
| 2018 | 7 | $6,002,136 | 1.0000 | $6,002,136 | $0 |
| 2019 | 6 | $6,537,471 | 1.0071 | $6,584,139 | $46,668 |
| 2020 | 5 | $6,493,253 | 1.0291 | $6,682,080 | $188,827 |
| 2021 | 4 | $6,951,550 | 1.0438 | $7,256,056 | $304,506 |
| 2022 | 3 | $7,160,007 | 1.1019 | $7,889,953 | $729,946 |
| 2023 | 2 | $6,620,778 | 1.2161 | $8,051,839 | $1,431,061 |
| 2024 | 1 | $5,294,338 | 1.5418 | $8,162,638 | $2,868,300 |
## Chain Ladder Total IBNR (Reported Basis): $5,569,308
cat("Chain Ladder Total Ultimate (Reported Basis):",
dollar(sum(cl_results$cl_ultimate_rptd)), "\n")

## Chain Ladder Total Ultimate (Reported Basis): $56,068,841
The Chain Ladder method is powerful, but it has a known weakness: for very recent accident years with little development, it leans heavily on immature data. If accident year 2024 has only one lag of history, multiplying it by a large CDF amplifies any early-period noise.
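
The amplification is easy to quantify. The sketch below takes accident year 2024's reported losses and its rounded Lag 1 CDF from the results table, then applies a hypothetical 5% shock to the first-year reported amount. The shock size is an assumption made purely for illustration.

```r
reported_2024 <- 5294338   # reported to date, accident year 2024
cdf_lag1      <- 1.5418    # reported CDF at Lag 1 (rounded, from the table)
shock         <- 0.05      # hypothetical: first-year reported is 5% higher

base_ultimate    <- reported_2024 * cdf_lag1
shocked_ultimate <- reported_2024 * (1 + shock) * cdf_lag1

change_in_reported <- reported_2024 * shock
change_in_ultimate <- shocked_ultimate - base_ultimate

# Every extra dollar reported at Lag 1 moves the ultimate by CDF dollars
change_in_ultimate / change_in_reported
```

Because the table's CDF is rounded to four decimals, `base_ultimate` differs slightly from the CL Ultimate shown above; the ratio in the last line equals the CDF either way.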
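The Bornhuetter-Ferguson (BF) method was introduced by Ronald Bornhuetter and Ronald Ferguson in a landmark 1972 paper. Their insight: blend two sources of information, the losses actually reported to date and an a priori expectation of ultimate losses that does not depend on the claims data.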
The BF ultimate estimate is:
BF Ultimate = Reported to Date + Expected Unreported Losses
Where Expected Unreported Losses = Expected Ultimate × (1 − 1/CDF).
The term (1 − 1/CDF) is the percent unreported: how much of ultimate losses we’d expect to not yet be in the data at the current lag.
For mature accident years (CDF near 1.0), BF gives nearly the same answer as Chain Ladder. For immature years, BF is more stable because it leans on the a priori expectation rather than extrapolating from thin data.
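
A worked example for accident year 2024 shows the formula in action. The inputs are the reported losses, premium, and rounded Lag 1 CDF from the tables above, plus the 0.71 a priori loss ratio this analysis assigns to 2024. The second half rewrites BF as a weighted average of the Chain Ladder ultimate and the expected ultimate, with weight Z = 1/CDF. This is an algebraically equivalent form, shown here as a sketch rather than as part of the original workflow.

```r
reported   <- 5294338    # accident year 2024 reported to date
premium    <- 11500000
apriori_lr <- 0.71
rptd_cdf   <- 1.5418     # rounded reported CDF at Lag 1

expected_ultimate <- premium * apriori_lr
pct_unreported    <- 1 - 1 / rptd_cdf

# BF Ultimate = Reported to Date + Expected Ultimate x (1 - 1/CDF)
bf_ultimate <- reported + expected_ultimate * pct_unreported

# Equivalent credibility form: Z x Chain Ladder + (1 - Z) x Expected
z           <- 1 / rptd_cdf
cl_ultimate <- reported * rptd_cdf
bf_blend    <- z * cl_ultimate + (1 - z) * expected_ultimate

all.equal(bf_ultimate, bf_blend)
```

The weight Z rises toward 1 as the CDF approaches 1.0, which is why BF and Chain Ladder agree for mature years.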
# ── A priori loss ratios ─────────────────────────────────────────────────────────
# In practice, these would come from pricing, industry benchmarks, or management
# judgment. Here we use slightly smoothed versions of our true simulation ratios.
apriori_lr <- c(0.70, 0.70, 0.72, 0.72, 0.73, 0.73, 0.72, 0.71)
bf_inputs <- cl_results %>%
mutate(
premium = premiums$premium,
apriori_lr = apriori_lr,
expected_ultimate = round(premium * apriori_lr, 0),
pct_unreported = round(1 - 1 / rptd_cdf, 4), # Expected % still to develop
expected_unreptd = round(expected_ultimate * pct_unreported, 0)
)
bf_inputs %>%
mutate(
premium = dollar(premium),
expected_ultimate = dollar(expected_ultimate),
expected_unreptd = dollar(expected_unreptd)
) %>%
select(accident_year, premium, apriori_lr, expected_ultimate,
pct_unreported, expected_unreptd) %>%
kable(
caption = "BF Method — A Priori Inputs",
col.names = c("Accident Year", "Premium", "A Priori LR",
"Expected Ultimate", "% Unreported", "Expected Unreported"),
align = "c"
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Accident Year | Premium | A Priori LR | Expected Ultimate | % Unreported | Expected Unreported |
|---|---|---|---|---|---|
| 2017 | $8,000,000 | 0.70 | $5,600,000 | 0.0000 | $0 |
| 2018 | $8,500,000 | 0.70 | $5,950,000 | 0.0000 | $0 |
| 2019 | $9,000,000 | 0.72 | $6,480,000 | 0.0071 | $46,008 |
| 2020 | $9,500,000 | 0.72 | $6,840,000 | 0.0283 | $193,572 |
| 2021 | $10,000,000 | 0.73 | $7,300,000 | 0.0420 | $306,600 |
| 2022 | $10,500,000 | 0.73 | $7,665,000 | 0.0925 | $709,012 |
| 2023 | $11,000,000 | 0.72 | $7,920,000 | 0.1777 | $1,407,384 |
| 2024 | $11,500,000 | 0.71 | $8,165,000 | 0.3514 | $2,869,181 |
bf_results <- bf_inputs %>%
mutate(
bf_ultimate = round(reported_to_date + expected_unreptd, 0),
bf_ibnr = round(bf_ultimate - reported_to_date, 0)
)
bf_results %>%
mutate(
reported_to_date = dollar(reported_to_date),
expected_unreptd = dollar(expected_unreptd),
bf_ultimate = dollar(bf_ultimate),
bf_ibnr = dollar(bf_ibnr)
) %>%
select(accident_year, reported_to_date, expected_unreptd, bf_ultimate, bf_ibnr) %>%
kable(
caption = "Bornhuetter-Ferguson Results — Reported Basis",
col.names = c("Accident Year", "Reported to Date", "Expected Unreported",
"BF Ultimate", "BF IBNR"),
align = "c"
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

| Accident Year | Reported to Date | Expected Unreported | BF Ultimate | BF IBNR |
|---|---|---|---|---|
| 2017 | $5,440,000 | $0 | $5,440,000 | $0 |
| 2018 | $6,002,136 | $0 | $6,002,136 | $0 |
| 2019 | $6,537,471 | $46,008 | $6,583,479 | $46,008 |
| 2020 | $6,493,253 | $193,572 | $6,686,825 | $193,572 |
| 2021 | $6,951,550 | $306,600 | $7,258,150 | $306,600 |
| 2022 | $7,160,007 | $709,012 | $7,869,019 | $709,012 |
| 2023 | $6,620,778 | $1,407,384 | $8,028,162 | $1,407,384 |
| 2024 | $5,294,338 | $2,869,181 | $8,163,519 | $2,869,181 |
## BF Total IBNR: $5,531,757
## BF Total Ultimate: $56,031,290
comparison <- tibble(
accident_year = accident_years,
reported_to_date = cl_results$reported_to_date,
cl_ultimate = cl_results$cl_ultimate_rptd,
cl_ibnr = cl_results$ibnr_rptd_basis,
bf_ultimate = bf_results$bf_ultimate,
bf_ibnr = bf_results$bf_ibnr,
diff_ibnr = bf_results$bf_ibnr - cl_results$ibnr_rptd_basis
)
comparison %>%
mutate(across(c(reported_to_date, cl_ultimate, cl_ibnr,
bf_ultimate, bf_ibnr, diff_ibnr), dollar)) %>%
kable(
caption = "Chain Ladder vs. Bornhuetter-Ferguson — Full Comparison",
col.names = c("Accident Year", "Reported to Date",
"CL Ultimate", "CL IBNR",
"BF Ultimate", "BF IBNR",
"BF − CL IBNR"),
align = "c"
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = TRUE) %>%
row_spec(which(abs(bf_results$bf_ibnr - cl_results$ibnr_rptd_basis) ==
max(abs(bf_results$bf_ibnr - cl_results$ibnr_rptd_basis))),
bold = TRUE, background = "#fff3cd")

| Accident Year | Reported to Date | CL Ultimate | CL IBNR | BF Ultimate | BF IBNR | BF − CL IBNR |
|---|---|---|---|---|---|---|
| 2017 | $5,440,000 | $5,440,000 | $0 | $5,440,000 | $0 | $0 |
| 2018 | $6,002,136 | $6,002,136 | $0 | $6,002,136 | $0 | $0 |
| 2019 | $6,537,471 | $6,584,139 | $46,668 | $6,583,479 | $46,008 | -$660 |
| 2020 | $6,493,253 | $6,682,080 | $188,827 | $6,686,825 | $193,572 | $4,745 |
| 2021 | $6,951,550 | $7,256,056 | $304,506 | $7,258,150 | $306,600 | $2,094 |
| 2022 | $7,160,007 | $7,889,953 | $729,946 | $7,869,019 | $709,012 | -$20,934 |
| 2023 | $6,620,778 | $8,051,839 | $1,431,061 | $8,028,162 | $1,407,384 | -$23,677 |
| 2024 | $5,294,338 | $8,162,638 | $2,868,300 | $8,163,519 | $2,869,181 | $881 |
plot_data <- comparison %>%
select(accident_year, cl_ultimate, bf_ultimate) %>%
pivot_longer(cols = c(cl_ultimate, bf_ultimate),
names_to = "method", values_to = "ultimate") %>%
mutate(method = recode(method,
"cl_ultimate" = "Chain Ladder",
"bf_ultimate" = "Bornhuetter-Ferguson"))
ggplot(plot_data, aes(x = factor(accident_year), y = ultimate, fill = method)) +
geom_col(position = "dodge", width = 0.65, alpha = 0.88) +
geom_text(aes(label = paste0("$", round(ultimate / 1e6, 2), "M")),
position = position_dodge(width = 0.65),
vjust = -0.4, size = 3.2, fontface = "bold") +
scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M"),
expand = expansion(mult = c(0, 0.12))) +
scale_fill_manual(values = c("Chain Ladder" = "#2c7bb6", "Bornhuetter-Ferguson" = "#d7191c")) +
labs(
title = "Projected Ultimate Losses by Accident Year",
subtitle = "Auto Liability | Chain Ladder vs. Bornhuetter-Ferguson",
x = "Accident Year",
y = "Ultimate Losses",
fill = "Method",
caption = "Note: Based on synthetic data generated for illustration purposes."
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(color = "grey40"),
legend.position = "top",
panel.grid.major.x = element_blank()
)

Figure: Chain Ladder vs. BF Ultimate Estimates by Accident Year
ibnr_data <- comparison %>%
select(accident_year, cl_ibnr, bf_ibnr) %>%
pivot_longer(cols = c(cl_ibnr, bf_ibnr),
names_to = "method", values_to = "ibnr") %>%
mutate(method = recode(method,
"cl_ibnr" = "Chain Ladder",
"bf_ibnr" = "Bornhuetter-Ferguson"))
ggplot(ibnr_data, aes(x = factor(accident_year), y = ibnr, fill = method)) +
geom_col(position = "dodge", width = 0.65, alpha = 0.88) +
scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M"),
expand = expansion(mult = c(0, 0.15))) +
scale_fill_manual(values = c("Chain Ladder" = "#2c7bb6", "Bornhuetter-Ferguson" = "#d7191c")) +
labs(
title = "IBNR Reserves by Accident Year",
subtitle = "Difference is largest for the most recent (immature) accident years",
x = "Accident Year",
y = "IBNR",
fill = "Method",
caption = "IBNR = Estimated Ultimate − Reported to Date"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15),
plot.subtitle = element_text(color = "grey40"),
legend.position = "top",
panel.grid.major.x = element_blank()
)

Figure: IBNR Estimates: Chain Ladder vs. BF
dev_data <- claims_df %>%
group_by(accident_year, development_lag) %>%
summarise(reported_losses = max(reported_losses), .groups = "drop") %>%
group_by(accident_year) %>%
mutate(pct_of_latest = reported_losses / max(reported_losses)) %>%
ungroup()
ggplot(dev_data, aes(x = development_lag, y = reported_losses / 1e6,
color = factor(accident_year), group = accident_year)) +
geom_line(linewidth = 1.1, alpha = 0.8) +
geom_point(size = 2.5) +
scale_color_brewer(palette = "Dark2") +
scale_x_continuous(breaks = 1:8) +
scale_y_continuous(labels = label_dollar(suffix = "M")) +
labs(
title = "Cumulative Reported Loss Development by Accident Year",
x = "Development Lag (Years)",
y = "Cumulative Reported Losses ($M)",
color = "Accident Year"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15),
legend.position = "right"
)

Figure: Loss Development Pattern by Lag
summary_tbl <- tibble(
Metric = c("Total Reported to Date", "Chain Ladder Ultimate",
"Chain Ladder IBNR", "BF Ultimate", "BF IBNR"),
Value = c(
dollar(sum(cl_results$reported_to_date)),
dollar(sum(cl_results$cl_ultimate_rptd)),
dollar(sum(cl_results$ibnr_rptd_basis)),
dollar(sum(bf_results$bf_ultimate)),
dollar(sum(bf_results$bf_ibnr))
)
)
summary_tbl %>%
kable(caption = "Portfolio Summary — All Accident Years Combined", align = "lr") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(c(3, 5), bold = TRUE, background = "#e8f4f8")

| Metric | Value |
|---|---|
| Total Reported to Date | $50,499,533 |
| Chain Ladder Ultimate | $56,068,841 |
| Chain Ladder IBNR | $5,569,308 |
| BF Ultimate | $56,031,290 |
| BF IBNR | $5,531,757 |
Chain Ladder is intuitive and data-driven, but it amplifies noise in immature accident years, because it extrapolates from just one or two lags of data. In this synthetic dataset the effect is muted: the a priori loss ratios sit close to the simulated ones, so the two methods never differ by more than $23,677 of IBNR (accident year 2023).
Bornhuetter-Ferguson tempers that volatility by anchoring the immature years to an a priori expectation. For 2024 (only one lag visible), about 35% of the BF ultimate comes from expected losses rather than from reported data, which is the appropriate behavior for so immature a year. As an accident year matures, its CDF approaches 1.0 and BF converges toward the Chain Ladder result.
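
One way to see where the two methods part ways is to compare the loss ratio implied by each Chain Ladder ultimate with the a priori loss ratio fed to BF. The sketch below copies the Chain Ladder ultimates from the comparison table; `implied_lr` and `gap` are names introduced here for illustration.

```r
premium     <- seq(8e6, 11.5e6, length.out = 8)
cl_ultimate <- c(5440000, 6002136, 6584139, 6682080,
                 7256056, 7889953, 8051839, 8162638)
apriori_lr  <- c(0.70, 0.70, 0.72, 0.72, 0.73, 0.73, 0.72, 0.71)

# Loss ratio implied by the Chain Ladder projection
implied_lr <- cl_ultimate / premium
gap        <- implied_lr - apriori_lr

data.frame(
  accident_year = 2017:2024,
  implied_cl_lr = round(implied_lr, 4),
  apriori_lr    = apriori_lr,
  gap           = round(gap, 4)
)
```

Where the gap is near zero, as it is for 2024, BF and Chain Ladder give almost the same IBNR. Where the implied ratio runs above the a priori (2022 and 2023), BF comes in lower, in proportion to the percent unreported.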
In practice, actuaries use both methods (and often others, like the Benktander method or Cape Cod) and apply professional judgment to select or blend estimates. No single method is always right — the goal is a reasonable, defensible reserve that ensures the insurer can pay its obligations to policyholders.
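
For readers curious about the Benktander method mentioned above, it is commonly described as one further iteration of BF: the BF ultimate replaces the a priori expectation in the BF formula. The sketch below applies that description to accident year 2023, using figures from the tables above. It is an illustration under that assumption, not part of this analysis's results.

```r
reported    <- 6620778   # accident year 2023 reported to date
bf_ultimate <- 8028162   # BF ultimate from the results table
rptd_cdf    <- 1.2161    # rounded reported CDF at Lag 2

pct_unreported <- 1 - 1 / rptd_cdf

# Benktander: reuse the BF formula with the BF ultimate as the prior
benktander_ultimate <- reported + bf_ultimate * pct_unreported

# Chain Ladder ultimate on the same rounded CDF, for comparison
cl_ultimate <- reported * rptd_cdf

c(bf = bf_ultimate, benktander = benktander_ultimate, chain_ladder = cl_ultimate)
```

The Benktander figure lands between the BF and Chain Ladder ultimates, giving the reported data somewhat more weight than BF does.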
## R version 4.5.3 (2026-03-11 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scales_1.4.0 kableExtra_1.4.0 knitr_1.51 lubridate_1.9.5
## [5] forcats_1.0.1 stringr_1.6.0 dplyr_1.2.0 purrr_1.2.1
## [9] readr_2.2.0 tidyr_1.3.2 tibble_3.3.1 ggplot2_4.0.2
## [13] tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.4 xml2_1.5.2 stringi_1.8.7
## [5] hms_1.1.4 digest_0.6.39 magrittr_2.0.4 evaluate_1.0.5
## [9] grid_4.5.3 timechange_0.4.0 RColorBrewer_1.1-3 fastmap_1.2.0
## [13] jsonlite_2.0.0 viridisLite_0.4.3 textshaping_1.0.5 jquerylib_0.1.4
## [17] cli_3.6.5 crayon_1.5.3 rlang_1.1.7 bit64_4.6.0-1
## [21] withr_3.0.2 cachem_1.1.0 yaml_2.3.12 otel_0.2.0
## [25] parallel_4.5.3 tools_4.5.3 tzdb_0.5.0 vctrs_0.7.1
## [29] R6_2.6.1 lifecycle_1.0.5 bit_4.6.0 vroom_1.7.0
## [33] pkgconfig_2.0.3 pillar_1.11.1 bslib_0.10.0 gtable_0.3.6
## [37] glue_1.8.0 systemfonts_1.3.2 xfun_0.56 tidyselect_1.2.1
## [41] rstudioapi_0.18.0 farver_2.1.2 htmltools_0.5.9 labeling_0.4.3
## [45] rmarkdown_2.30 svglite_2.2.2 compiler_4.5.3 S7_0.2.1
This analysis uses fully synthetic data and is intended for educational and portfolio demonstration purposes only. It does not represent the financials of any real insurance company.