Motor Risk Pricing Analysis

Cheryn Tan

Executive Summary

Comprehensive Data Analysis
Motor Insurance Portfolio 2018-2020
Metric | Value | Insight
1. Claims Profile
Total Claims | 7,419 | 7.4% of policies result in claims
Claim Frequency | 7.4% | Relatively low, which is positive
Average Claim Value | £2,797 | Moderate severity level
Median Claim Value | £160 | Net of recoveries + 25% windscreen claims
Maximum Claim | £489,298 | Single large outlier
2018 Performance | 9.8% freq, £2,884 sev, 81.2% LR | Started mid-year, so we're missing the seasonal picture
2019 Performance | 7% freq, £2,684 sev, 71.6% LR | Consistent performance throughout, no major market disruptions
2020 Performance | 1.4% freq, £3,191 sev, 67% LR | That single large claim shows why we need capping strategies [1]
3. Capping Strategies [2]
Current System Cap | £50,000 (11 claims) | Standard reinsurance attachment point
System Cap Impact | £207 mean reduction | 11 claims affected (0.15% of claims)
Recommended 99th %ile Cap | £25,714 | More aggressive cap at 99th percentile
99th %ile Cap Impact | £308 mean reduction | Would affect 75 claims (1% of claims)
Volatility Reduction Comparison | System: 49.9%, 99th: 55.9% | 99th percentile gives us better control over the volatile claims
4. Key Insights
Class Imbalance | 92.6% no claims, 7.4% claims | Standard algorithms will fail with a 92.6% majority class
Regional Variation | Wales 98.8% LR, N. Ireland 63.1% LR | Wales significantly underpriced relative to risk
Young Driver Impact | 22.6% of policies, 75.6% loss ratio | Young driver pricing can still be improved
Large Loss Concentration | Top 1% claims = 18.2% of losses | Large claims driving volatility
5. Portfolio Context
Total Policies | 100,000 | New business only (no renewals), so data is limited
Average Premium | £272 | Mid-market price
Total Exposure Years | 71,959 | 28% lapse rate (0.72 avg exposure)
Overall Loss Ratio | 76.2% | To be improved

[1] The 2020 Q1 severity spike was driven by a single £489,298 claim, which shows the need for capping.
[2] We cap outliers for pricing (to keep the model stable) but still need to track full amounts for recovery.
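
The headline figures in the summary table are linked by simple arithmetic. The sketch below recomputes three of them from the rounded values quoted in the table. The variable names are illustrative, and it assumes that the £272 average premium applies to all 100,000 policies and that each claiming policy has one claim, so small rounding differences from the reported figures are expected.

```python
# Rounded inputs taken from the summary table.
total_policies = 100_000
total_claims = 7_419
avg_claim_value = 2_797.0   # GBP, mean incurred per claim
avg_premium = 272.0         # GBP per policy (assumed to apply to every policy)
exposure_years = 71_959.0

# Share of policies with a claim (assumes one claim per claiming policy).
claim_frequency = total_claims / total_policies

# Average fraction of a year each policy was on risk.
avg_exposure = exposure_years / total_policies

# Approximate loss ratio: total incurred over total premium.
total_incurred = total_claims * avg_claim_value
total_premium = total_policies * avg_premium
loss_ratio = total_incurred / total_premium

print(f"Claim frequency: {claim_frequency:.1%}")
print(f"Average exposure: {avg_exposure:.2f} years")
print(f"Approximate loss ratio: {loss_ratio:.1%}")
```

The recomputed loss ratio comes out within about 0.1 percentage points of the reported 76.2%, which is consistent with the inputs being rounded.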

Regional Risk Analysis

Key Insights

Critical Concerns:

  • Wales: our biggest problem at a 98.8% loss ratio; we're essentially losing money on every policy there.
  • South East: not far behind at an 87.9% loss ratio.

Different Risk Profiles:

  • London: highest frequency (13.4%) but a lower loss ratio (80.8%) than Wales and the South East, which points to mostly minor incidents.
  • East Anglia: high frequency (13.1%) with a moderate loss ratio.

Growth Opportunities:

  • N Ireland: lowest frequency (7.3%) and a much better loss ratio (63.1%).
  • West Midlands: strong performance at a 63.1% loss ratio.

Actions:

  • Further investigation showed that a single £440k claim from a young driver created a false picture. Removing large outlier claims above £50k gives the following true loss ratios.
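
The adjustment described above can be sketched as follows. This is a minimal illustration rather than the analysis code: the `loss_ratios` helper, the record layout, the region names and the toy figures are all assumptions, and only the £50k cut-off comes from the text.

```python
from collections import defaultdict

LARGE_CLAIM_THRESHOLD = 50_000.0  # GBP cut-off for "large" claims, from the text


def loss_ratios(records, exclude_above=None):
    """Return {region: incurred / premium}.

    Each record is a dict with 'region', 'premium' and 'claims' (a list of
    claim amounts). Claims above `exclude_above` are dropped from the
    numerator; the premium is always kept.
    """
    incurred = defaultdict(float)
    premium = defaultdict(float)
    for rec in records:
        premium[rec["region"]] += rec["premium"]
        for amount in rec["claims"]:
            if exclude_above is not None and amount > exclude_above:
                continue
            incurred[rec["region"]] += amount
    return {r: incurred[r] / premium[r] for r in premium if premium[r] > 0}


# Hypothetical toy data: Region A carries one very large claim.
records = [
    {"region": "Region A", "premium": 1_000_000.0,
     "claims": [440_000.0] + [10_000.0] * 30},
    {"region": "Region B", "premium": 500_000.0,
     "claims": [10_000.0] * 35},
]

raw = loss_ratios(records)
adjusted = loss_ratios(records, exclude_above=LARGE_CLAIM_THRESHOLD)
for region in sorted(raw):
    print(f"{region}: raw {raw[region]:.1%}, large claims removed {adjusted[region]:.1%}")
```

In this toy data, Region A's loss ratio falls from 74% to 30% once the single large claim is removed, while Region B is unchanged.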

True Regional LRs

Large Claims Impact Analysis

Critical to note: just 6 severe claims (0.08% of all claims) account for 7.5% of total losses, and the top 3 claims alone represent £1.29M in losses.
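
A concentration figure of this kind can be computed as the share of total incurred that comes from the k largest claims. The sketch below is illustrative only: `top_k_share` is a hypothetical helper and the claim amounts are made up.

```python
def top_k_share(claims, k):
    """Fraction of total incurred that comes from the k largest claims."""
    if k < 0:
        raise ValueError("k must be non-negative")
    total = sum(claims)
    if total <= 0:
        raise ValueError("total incurred must be positive")
    largest = sorted(claims, reverse=True)[:k]
    return sum(largest) / total


# Hypothetical amounts: three large claims among 1,000 small ones.
claims = [500_000.0, 400_000.0, 100_000.0] + [1_000.0] * 1_000
share = top_k_share(claims, 3)
print(f"Top 3 claims: {share:.1%} of total losses")
```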

Capped Incurred Column

Capping Strategy Impact
Analysis at 99th Percentile (£25,714)
Metric | Value
Total Claims | 7,419
99th Percentile | £25,714
Claims > 99th Percentile | 75
% Affected | 1%
Mean (Uncapped) | £2,797
Mean (Capped at 99th) | £2,489
Reduction | £308
Volatility Reduction | 55.9%

Key Insights

What’s happening here?

Capping claims at the 99th percentile (£25,714) would affect only 75 of the 7,419 claims, yet volatility falls by 55.9% and the average claim drops by £308 (from £2,797 to £2,489).
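
The quantities discussed above (cap level, claims affected, capped mean, volatility reduction) can be computed along the following lines. This is a sketch under stated assumptions: the report does not define "volatility reduction", so it is taken here as the relative fall in the standard deviation, and the nearest-rank percentile is only one of several conventions. The helper names and the toy data are hypothetical.

```python
import math
import statistics


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile: smallest value with at least a share q of the data at or below it."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    # The small tolerance guards against floating-point round-up in q * n.
    rank = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[rank - 1]


def cap_summary(claims, q=0.99):
    """Summarise the effect of capping claims at the q-th percentile."""
    cap = percentile_nearest_rank(claims, q)
    capped = [min(c, cap) for c in claims]
    sd_raw = statistics.pstdev(claims)
    sd_capped = statistics.pstdev(capped)
    return {
        "cap": cap,
        "claims_above_cap": sum(1 for c in claims if c > cap),
        "mean_uncapped": statistics.fmean(claims),
        "mean_capped": statistics.fmean(capped),
        # Assumed definition: relative fall in the standard deviation.
        "volatility_reduction": 0.0 if sd_raw == 0 else 1 - sd_capped / sd_raw,
    }


# Hypothetical claim amounts: 99 ordinary claims and one very large one.
claims = [float(100 * i) for i in range(1, 100)] + [500_000.0]
summary = cap_summary(claims, q=0.99)
for name, value in summary.items():
    print(f"{name}: {value:,.3f}")
```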

Why this matters for the business

Large losses, such as the £489k outlier claim, can distort the pricing model. Capping at £25,714 limits the influence of these outliers on pricing.

A closer look at the data

Interestingly, the system already caps claims at £50k (the is_capped_incurred field shows 11 claims hit this limit). The £25,714 threshold is a more aggressive approach that would reduce volatility by 55.9%, compared with 49.9% under the system cap.

What I’d suggest

Rather than a single hard cap, consider a tiered approach: for example, cap at £25k for pricing but track actual costs for reinsurance recovery.
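
One way to keep both views of a claim is to store the capped amount for pricing alongside the full amount and the excess above the cap. The sketch below is a minimal illustration of that idea: `ClaimLayers`, `split_claim` and the field names are hypothetical, and the £25k cap is the figure suggested above.

```python
from dataclasses import dataclass

PRICING_CAP = 25_000.0  # GBP, the pricing cap suggested above


@dataclass(frozen=True)
class ClaimLayers:
    incurred: float         # full amount, kept for recovery tracking
    capped_incurred: float  # amount passed to the pricing model
    excess: float           # amount above the cap


def split_claim(incurred, cap=PRICING_CAP):
    """Split one claim into the capped layer and the excess above the cap."""
    capped = min(incurred, cap)
    return ClaimLayers(incurred, capped, incurred - capped)


claims = [160.0, 2_797.0, 489_298.0]  # amounts quoted in this report
layers = [split_claim(c) for c in claims]
total_excess = sum(layer.excess for layer in layers)

for layer in layers:
    print(layer)
print(f"Total excess above cap: £{total_excess:,.0f}")
```

The pricing model would see only `capped_incurred`, while `incurred` and `excess` stay available for tracking recoveries.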

Severity Predictors Analysis

Key Insights

Young Drivers

£3,900 average severity - 50% above mean
This explains why young drivers should be charged significantly higher premiums: the extra risk comes not just from frequency but from severity too

Experience Matters more than Age

Zero NCD (no-claims discount) = £3,800+ severity
Experience matters more than demographics: a 40-year-old with zero NCD is as risky as a young driver

Low Mileage Danger

0-5k miles = highest severity
Infrequent drivers may lack practice, which could make their rare trips more dangerous and their claims more severe

Assumptions Challenged

Bodily injury drives up claim size. One example is garaged policies, which show a large claim size because 14% of their claims average £14k. A possible explanation is that older age may mean more medical conditions.

Data Quality Assessment

What does this mean?

Structural Issues

  • Consistent ~92.6% missing rate across ALL claims fields
  • Not actually “missing”: these are policies with no claims
  • Naturally occurring phenomenon: not everyone claims within 2 years

Data Quality Effects on Modelling

  • Severe class imbalance: only 7.4% positive class
  • Requires techniques such as:
    • Class weights (scale_pos_weight=12.5)
    • Probability calibration, because each claim is weighted far more heavily than a no-claim policy
  • Do not rely on standard accuracy metrics
  • Found policies with £0 written premium but with claims
    • Year-long policies with £0 earned premium
    • Same-day claim and cancellation
  • Must exclude 3,640 policies before modelling
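
The data anomalies listed above could be turned into explicit exclusion rules along these lines. This is a sketch only: the field names (`written_premium`, `earned_premium`, `exposure_years`, `n_claims`, `claim_date`, `cancel_date`) and the sample records are hypothetical, and the exact rules behind the 3,640 exclusions are not given here.

```python
from datetime import date


def exclusion_reason(policy):
    """Return the reason a policy should be excluded, or None to keep it."""
    if policy["written_premium"] == 0 and policy["n_claims"] > 0:
        return "zero written premium with claims"
    if policy["exposure_years"] >= 1.0 and policy["earned_premium"] == 0:
        return "year-long policy with zero earned premium"
    claim_date = policy.get("claim_date")
    if claim_date is not None and claim_date == policy.get("cancel_date"):
        return "claim and cancellation on the same day"
    return None


# Hypothetical records for illustration only.
policies = [
    {"id": 1, "written_premium": 272.0, "earned_premium": 272.0,
     "exposure_years": 1.0, "n_claims": 0},
    {"id": 2, "written_premium": 0.0, "earned_premium": 0.0,
     "exposure_years": 0.5, "n_claims": 1},
    {"id": 3, "written_premium": 300.0, "earned_premium": 0.0,
     "exposure_years": 1.0, "n_claims": 0},
    {"id": 4, "written_premium": 250.0, "earned_premium": 10.0,
     "exposure_years": 0.04, "n_claims": 1,
     "claim_date": date(2019, 3, 1), "cancel_date": date(2019, 3, 1)},
]

kept = [p for p in policies if exclusion_reason(p) is None]
excluded = {p["id"]: exclusion_reason(p) for p in policies
            if exclusion_reason(p) is not None}
print(f"kept {len(kept)} of {len(policies)} policies")
print(excluded)
```

Returning a reason rather than a bare flag makes it easy to report how many policies each rule removes.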

Why not SMOTE (Synthetic Minority Over-sampling Technique)? While SMOTE could balance the dataset, synthetic claims raise some issues: they create unrealistic scenarios (where would a synthetic theft or collision be inserted?), synthetic data poses regulatory challenges, and 7,400 real claims are enough to support alternatives such as class weights.
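
The class-weight alternative can be illustrated with the portfolio counts. The sketch below derives a weight close to the 12.5 quoted earlier as the ratio of no-claim to claim policies, which is the usual starting point for XGBoost's `scale_pos_weight` parameter, and shows how such a weight enters a log-loss. It assumes one claim per claiming policy; `weighted_log_loss` is a hypothetical helper, not part of any library.

```python
import math

n_policies = 100_000
n_claims = 7_419                    # positive class (assumes one claim per claiming policy)
n_no_claim = n_policies - n_claims  # negative class

# Ratio of negatives to positives, the usual starting point for scale_pos_weight.
scale_pos_weight = n_no_claim / n_claims

# Accuracy of a useless model that always predicts "no claim".
baseline_accuracy = n_no_claim / n_policies


def weighted_log_loss(y_true, p_pred, pos_weight):
    """Log loss in which each positive example counts `pos_weight` times.

    Predicted probabilities must lie strictly between 0 and 1.
    """
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


print(f"scale_pos_weight is about {scale_pos_weight:.2f}")
print(f"always-no-claim accuracy = {baseline_accuracy:.1%}")
```

A model that always predicts "no claim" scores about 92.6% accuracy while identifying no claims at all, which is why accuracy on its own is misleading for this portfolio.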

Discussion on using this information to build a model to predict individual claims

Two-Stage Approach: Frequency × Severity

Proposed Modelling Framework
First predict how many policies will claim, then the expected claim amounts
Stage | Model | Target | Key Features | Challenge
Stage 0: Data Cleaning | Exclusion Rules | Remove 3,640 invalid records | Exclude: £0 premium claims, negative premiums, exposure < 30 days | 23 severe claims with £66k losses
Stage 1: Frequency | Poisson + Exposure Offset | has_claim (0/1) | Driver age, NCD, Region + log(exposure) offset | Class imbalance (7.4%) + exposure bias
Stage 2: Severity | Gamma GLM | incurred, given claim > 0 and exposure > 0.082 (30 days) | Driver age, NCD, Mileage, Vehicle value | Incomplete claim development
Stage 3: Combined | Frequency × Severity | Expected Loss | Risk-adjusted premium | Large uncertainty
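
The frequency × severity idea can be illustrated without a modelling library. With a single categorical rating factor, a Poisson frequency model with a log(exposure) offset reproduces claims divided by exposure-years for each level, and a Gamma severity model reproduces the mean claim amount for each level, so the sketch below uses those averages directly. It is a simplification of the proposed framework, not the framework itself: a real model would combine several factors in fitted GLMs, and the function names, record layout and toy data are all hypothetical.

```python
from collections import defaultdict


def fit_frequency(records, key):
    """Claims per exposure-year for each level of `key`."""
    claims, exposure = defaultdict(float), defaultdict(float)
    for r in records:
        claims[r[key]] += r["n_claims"]
        exposure[r[key]] += r["exposure"]
    return {k: claims[k] / exposure[k] for k in exposure if exposure[k] > 0}


def fit_severity(records, key):
    """Mean incurred per claim for each level of `key`, using claims only."""
    incurred, claims = defaultdict(float), defaultdict(float)
    for r in records:
        if r["n_claims"] > 0:
            incurred[r[key]] += r["incurred"]
            claims[r[key]] += r["n_claims"]
    return {k: incurred[k] / claims[k] for k in claims}


def expected_loss(level, exposure_years, freq, sev):
    """Expected loss = frequency x exposure x severity."""
    return freq[level] * exposure_years * sev[level]


# Hypothetical toy records, far too few for real pricing.
records = [
    {"ncd_band": "0 years", "exposure": 0.5, "n_claims": 1, "incurred": 4_000.0},
    {"ncd_band": "0 years", "exposure": 1.0, "n_claims": 0, "incurred": 0.0},
    {"ncd_band": "0 years", "exposure": 0.5, "n_claims": 1, "incurred": 2_000.0},
    {"ncd_band": "5+ years", "exposure": 1.0, "n_claims": 0, "incurred": 0.0},
    {"ncd_band": "5+ years", "exposure": 1.0, "n_claims": 1, "incurred": 1_000.0},
    {"ncd_band": "5+ years", "exposure": 1.0, "n_claims": 0, "incurred": 0.0},
    {"ncd_band": "5+ years", "exposure": 1.0, "n_claims": 0, "incurred": 0.0},
]

freq = fit_frequency(records, "ncd_band")
sev = fit_severity(records, "ncd_band")
for band in sorted(freq):
    loss = expected_loss(band, 1.0, freq, sev)
    print(f"{band}: freq {freq[band]:.2f}, sev £{sev[band]:,.0f}, expected loss £{loss:,.0f}")
```

Dividing claims by exposure-years rather than by policy count is what the log(exposure) offset achieves in Stage 1: policies that lapse early contribute less exposure and are not treated as a full year of claim-free experience.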

Pricing Exercise: My Solution