Estimating Annual Profits for Auto Insurance Through Claim Simulation

Adam Shetler
12/06/2019

Background I

$ {Problem\ Setup:} $ You're an analyst working for an insurance company in the year 2005. Your boss comes to you on 10/01/2005 and asks you to estimate the annual underwriting (UW) Income (i.e. annual profit) for the current calendar year for the Auto product line.

$ {Basic\ Terminology:} $

Policy-holders: people who have purchased a 1-year Auto insurance policy.
- Note: 1 policy-holder = 1 policy
Premium ($ P $): the gross ($) amount a policy-holder pays for their insurance coverage.
Rate ($ R $): the premium paid per policy-holder for their 1-year term (assumption: each policy-holder is charged the same rate)
Losses ($ Y $): $ R.V. $ defining the claim dollar amount.
Annual Losses ($ Y^{A_k} $): $ R.V. $ which defines the Annual Losses for a given year $ A_k $
Expenses Ratio ($ E\% $): % of the annual premium which accounts for expenses ($) as a result of doing business.
Commission Rate ($ C\% $): % of the annual premium which accounts for commissions ($) (which the company must pay out).
Exposure Units ($ EU $): the pro-rated portion of a policy earned over the term of the policy (between 0 and 1)
Claim Frequency ($ F $): $ \ \frac{\sum{claim\ count}}{\sum{EU}} $
Forecasted Annual EU ($ EU^F_k $): the forecasted annual EU for year $ A_k $
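
As a minimal sketch of the claim frequency definition above, the Python snippet below computes $ F $ from a handful of made-up policy records. The original analysis was done in R; the records and the helper name `claim_frequency` are illustrative assumptions, not values from the dataset.

```python
# Hypothetical policy records as (claim_count, exposure_units); not from dataCar.
policies = [(0, 1.00), (1, 0.50), (0, 0.25), (2, 0.75), (0, 0.50)]

def claim_frequency(records):
    """F = (sum of claim counts) / (sum of exposure units)."""
    total_claims = sum(claims for claims, _ in records)
    total_eu = sum(eu for _, eu in records)
    return total_claims / total_eu

# 3 claims over 3.0 exposure units
frequency = claim_frequency(policies)
```

Dividing by exposure rather than by policy count means a policy in force for only part of the period contributes proportionally less to the denominator.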

Background II

What is Annual UW Income?

$ Annual\ UW\ Income = $ $ Annual\ Premium\ - Annual\ Losses - Annual\ Expenses - Annual\ Commissions $

Intuition:

On an annual basis the company has money coming in:

- Premium ($ P $)

- Note $ \sum_{i=1}^{12}P_i = Annual\ Premium; $ where $ P_i $ represents the total amount of premium received in month $ i $ (from all policy-holders with coverage in effect)

On an annual basis the company has money going out:

- Losses ($ Y $)

- Note $ \sum_{i=1}^{N}Y_i = Annual\ Losses = Y^{A_k} $ where $ N $ represents the Annual Claim Count in year $ A_k $

- Expenses: $ Annual\ Premium\times E\% $

- Commissions: $ Annual\ Premium\times C\% $
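
The UW Income formula can be sketched as a small Python function. This is an illustration only: the function name `annual_uw_income` and the dollar figures are hypothetical, and the ratios are passed as fractions (0.20 for 20%).

```python
def annual_uw_income(annual_premium, annual_losses, expense_ratio, commission_rate):
    """Annual UW Income = Premium - Losses - Expenses - Commissions."""
    expenses = annual_premium * expense_ratio       # Annual Premium x E%
    commissions = annual_premium * commission_rate  # Annual Premium x C%
    return annual_premium - annual_losses - expenses - commissions

# Hypothetical figures: $1.0M premium, $650K losses, E% = 20%, C% = 10%.
income = annual_uw_income(1_000_000.0, 650_000.0, 0.20, 0.10)
```

Because $ E\% $ and $ C\% $ are fixed shares of premium, the only random component of UW Income in this setup is Annual Losses.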

Background III

Why is this important?

An insurance company needs to understand its underlying risk
Senior leaders always want to know what to expect in terms of overall profitability
If a product is performing poorly, this approach can indicate when profitability is likely to change or turn positive
The simulation also quantifies the uncertainty around the estimate

Assumptions

All policies were a 1-year term and written over the $ 2004/2005 $ Calendar years
$ E $% and $ C $% are both constant $ \forall $ policies
$ log(Y) \sim N(\mu, \sigma) $
Annual $ EU $ for 2005 is obtained by forecasting OCT, NOV, DEC with simple linear regression, then summing $ EU $ over all 12 months of 2005.
$ N \sim Poisson(\lambda) $ (i.e. $ N $ represents $ Annual\ Claim\ Count $ and $ \lambda $ is its expected value)
Given the time of the analysis, data is available one month in arrears (i.e. through SEP 2005).
Excluded claims with $ \ Y < 500 $ and $ Y \geq 45K $ (this filter of the data ensured $ log(Y) \approx N(\mu, \sigma) $)
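
A minimal Python sketch of the filtering and lognormal fit described in these assumptions follows. For a lognormal model, the MLEs are the mean and the (divide-by-$ n $) standard deviation of $ log(Y) $. The loss values and the helper name `fit_lognormal` are hypothetical.

```python
import math
import statistics

def fit_lognormal(losses, lower=500.0, upper=45_000.0):
    """Keep claims with lower <= Y < upper, then return the lognormal MLEs."""
    kept = [y for y in losses if lower <= y < upper]
    logs = [math.log(y) for y in kept]
    mu_hat = statistics.fmean(logs)
    sigma_hat = statistics.pstdev(logs)  # MLE divides by n, not n - 1
    return mu_hat, sigma_hat, len(kept)

# Hypothetical claim amounts; the first and last fall outside the filter.
losses = [200.0, 650.0, 1_200.0, 3_400.0, 9_800.0, 60_000.0]
mu_hat, sigma_hat, n_kept = fit_lognormal(losses)
```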

Data I

Data Source:

library: insuranceData
dataset: dataCar
67,856 policies (1 row per policy)
4,624 claims
Source: http://www.acst.mq.edu.au/GLMsforInsuranceData
References: De Jong P., Heller G.Z. (2008), Generalized linear models for insurance data, Cambridge University Press

Data II

Summary Statistics

Claim Count	Loss ($)	EU
Min. :1.000	Min. : 500.6	Min. :0.002738
1st Qu.:1.000	1st Qu.: 936.9	1st Qu.:0.413415
Median :1.000	Median : 1679.5	Median :0.635181
Mean :1.108	Mean : 3108.1	Mean :0.609630
3rd Qu.:1.000	3rd Qu.: 3563.7	3rd Qu.:0.826831
Max. :4.000	Max. :36502.1	Max. :0.999316

Analysis I

General Methodology For Simulating Claims ($)

1. Fit lognormal distribution to $ R.V. $ $ \ Y $ (i.e. loss dollars)
- find MLEs based on all historical data
- Also try other transformations (e.g. Box-Cox, Power Transformations, etc.) to check for a better fit than $ log(Y) $
2. Forecast $ EU $ for OCT, NOV, DEC of $ 2005 $
- sum with $ EU $ data from JAN-SEP 2005 to get a forecasted annual $ EU $ for $ 2005 $ (i.e. $ \ fcst\ Annual\ EU\ or\ (EU^F_k) $)
- Note: this number is a constant and will not change throughout the entire analysis
3. Calculate claim frequency given all data (composite) up until $ 10/01/2005 $
- Note: this will be improved upon in the “Variations On My Project” section.
4. Now calculate the average number of claims for 2005 as:
- $ \ Average\ Annual\ Claim\ Count\ = Annual\ Claim\ Frequency\ \times fcst\ Annual\ EU = \alpha $
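
Steps 2 through 4 can be sketched in Python as below. This is an illustration under stated assumptions: the monthly EU values and the claim frequency are made up, the regression is an ordinary least-squares line of EU on month number, and `forecast_annual_eu` is a hypothetical helper name.

```python
def forecast_annual_eu(monthly_eu_jan_sep):
    """Fit EU = a + b * month on the observed months, forecast the rest of the year, and sum."""
    n = len(monthly_eu_jan_sep)
    months = list(range(1, n + 1))
    x_bar = sum(months) / n
    y_bar = sum(monthly_eu_jan_sep) / n
    sxx = sum((x - x_bar) ** 2 for x in months)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, monthly_eu_jan_sep))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    forecast = [intercept + slope * m for m in range(n + 1, 13)]  # OCT, NOV, DEC
    return sum(monthly_eu_jan_sep) + sum(forecast)

# Hypothetical JAN-SEP exposure units (exactly linear, for easy checking).
monthly_eu = [300.0 + 10.0 * m for m in range(1, 10)]
annual_eu = forecast_annual_eu(monthly_eu)  # step 2: fcst Annual EU

claim_frequency = 0.15                      # step 3: hypothetical composite frequency
alpha = claim_frequency * annual_eu         # step 4: average annual claim count
```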

Analysis II

Continued: General Methodology For Simulating Claims ($)

5. One of our assumptions on the “Assumptions” tab is: $ N \sim Poisson(\lambda);\ for\ N=0,1,\dots $ (i.e. $ \ N\ represents\ Annual\ Claim\ Count $).

On each iteration of the simulation I will draw a single sample from $ N \sim Poisson(\lambda = \alpha);\ for\ N=0,1,\dots $

6. Based on the integer output from the above step (call it $ n $) - draw $ n $ random samples from the lognormal distribution (the one we found the MLE's for in step 1)

This leaves us with a sequence of $ iid $ $ R.V's $ each drawn from the same distribution:

\[ Y_1, Y_2, \dots, Y_n\ s.t.\ Y_i \sim Lognormal(\mu, \sigma)\ \forall i = 1,\dots,n \]

we are interested in $ \sum_{i=1}^{N}{Y_i} = Y^{A_{k}} = Annual\ Losses $ ($), whose expected value is $ E(Y^{A_{k}}) = E(Y)E(N) $: recall this is one of the components in UW Income

from the theorem on Random Sums of Random Variables (Wald's identity) we know: $ E\left(\sum_{i=1}^{N}Y_i\right) = E(Y)E(N) $ because $ N $ is independent of the $ Y_i $ and $ Y_1, Y_2, ... $ is an $ iid $ sequence of $ R.V's $ with common finite mean.
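
The step behind this result can be written out by conditioning on $ N $ (a short derivation, assuming $ N $ is independent of the $ Y_i $ and $ E(Y) $ is finite):

\[ E\left(\sum_{i=1}^{N} Y_i\right) = E\left(E\left(\sum_{i=1}^{N} Y_i \mid N\right)\right) = E\left(N\,E(Y)\right) = E(N)E(Y) \]

With $ N \sim Poisson(\lambda = \alpha) $ and $ log(Y) \sim N(\mu, \sigma) $, where $ \sigma $ is the standard deviation of $ log(Y) $, this gives:

\[ E(Y^{A_k}) = \alpha \, e^{\mu + \sigma^2/2} \]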

7. Putting it all together - I will create a function in R which makes use of the replicate function to simulate $ Y^{A_k} $ many times; the average of the simulated values estimates $ E(Y)E(N) = E(Y^{A_k}) $. This function will also contain the logic to calculate the remaining components of the UW Income formula. Based on the assumptions I made, calculating these components is trivial, as I will choose “reasonable” values for $ Rate\ (R) $, $ Expense\ Ratio\ E $% and $ Commission\ Rate\ C $%.
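
Steps 5 through 7 can be sketched as follows. This is a Python stand-in for the R function described above, not the author's code: all parameter values are hypothetical, annual premium is assumed to equal $ R \times EU^F_k $, and `sample_poisson` is a helper written here because the Python standard library has no Poisson sampler.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) by counting arrivals of a rate-lam process in one time unit."""
    count, t = 0, rng.expovariate(lam)
    while t < 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count

def simulate_uw_income(n_sims, alpha, mu, sigma, annual_eu, rate,
                       expense_ratio, commission_rate, seed=0):
    """Return n_sims simulated annual UW incomes."""
    rng = random.Random(seed)
    annual_premium = rate * annual_eu  # assumption: premium = rate x forecasted EU
    expenses = annual_premium * expense_ratio
    commissions = annual_premium * commission_rate
    incomes = []
    for _ in range(n_sims):
        n_claims = sample_poisson(alpha, rng)               # step 5: claim count
        annual_losses = sum(rng.lognormvariate(mu, sigma)   # step 6: claim sizes
                            for _ in range(n_claims))
        incomes.append(annual_premium - annual_losses - expenses - commissions)
    return incomes

# Hypothetical inputs, chosen only to make the example run.
incomes = simulate_uw_income(n_sims=2000, alpha=200.0, mu=7.5, sigma=1.0,
                             annual_eu=1500.0, rate=600.0,
                             expense_ratio=0.20, commission_rate=0.10)
simulated_mean = statistics.fmean(incomes)

# Theoretical mean for comparison: premium net of E% and C%, minus E(N)E(Y).
expected_mean = 600.0 * 1500.0 * (1 - 0.20 - 0.10) - 200.0 * math.exp(7.5 + 1.0 ** 2 / 2)
```

The simulated mean should land close to the theoretical value, while the full list of simulated incomes describes the spread around it.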

Analysis III

Continued: General Methodology For Simulating Claims ($) Final Notes

Note: the function described in step 7 will output an $ annual\ UW\ Income $ on each iteration of the simulation.
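
Given the simulated outputs, the uncertainty around the estimate can be summarized directly. The Python sketch below assumes a list of simulated UW incomes is already available; the values used here are random stand-ins, and `summarize` is a hypothetical helper that reports an empirical percentile interval.

```python
import random
import statistics

def summarize(simulated_incomes, level=0.95):
    """Mean, standard deviation, percentile interval, and P(loss) of simulated UW income."""
    ordered = sorted(simulated_incomes)
    n = len(ordered)
    lower = ordered[round((1 - level) / 2 * (n - 1))]
    upper = ordered[round((1 + level) / 2 * (n - 1))]
    return {
        "mean": statistics.fmean(ordered),
        "sd": statistics.stdev(ordered),
        "lower": lower,
        "upper": upper,
        "prob_loss": sum(x < 0 for x in ordered) / n,  # share of runs with negative income
    }

# Stand-in values only; in practice these come from the simulation function.
rng = random.Random(1)
stand_in_incomes = [rng.gauss(30_000.0, 70_000.0) for _ in range(5000)]
summary = summarize(stand_in_incomes)
```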

Variations On My Project

One variation I plan to work into my project modifies step 3 on the tab titled $ Analysis\ I $: the simple version of the project treats the $ Annual\ Claim\ Frequency $ as a fixed number calculated from historical data.

Instead, I take my entire dataset up to 10/01/2005 (spans 2004-2005) and derive a distribution for the $ Annual\ Claim\ Frequency\ i.e.\ R.V.\ (F^{A_k}) $ using bootstrap methods. When I did this I found:

$ F^{A_k} \approx N(\mu_{B}, \sigma_{B});\ where\ B\ stands\ for\ bootstrap $
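
A minimal Python sketch of such a bootstrap is shown below, assuming policies are resampled with replacement and the frequency is recomputed on each resample. The policy records are synthetic and `bootstrap_frequency` is a hypothetical helper; the resampling scheme used in the original project may differ.

```python
import random
import statistics

def bootstrap_frequency(records, n_boot=1000, seed=0):
    """Resample (claim_count, EU) records with replacement; return one frequency per resample."""
    rng = random.Random(seed)
    n = len(records)
    freqs = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=n)
        total_claims = sum(claims for claims, _ in sample)
        total_eu = sum(eu for _, eu in sample)
        freqs.append(total_claims / total_eu)
    return freqs

# Synthetic policies: EU in [0.1, 1.0], at most one claim each (illustration only).
data_rng = random.Random(42)
records = []
for _ in range(500):
    eu = data_rng.uniform(0.1, 1.0)
    claims = 1 if data_rng.random() < 0.15 * eu else 0
    records.append((claims, eu))

point_estimate = sum(c for c, _ in records) / sum(e for _, e in records)
boot = bootstrap_frequency(records)
mu_B = statistics.fmean(boot)     # bootstrap mean of the frequency
sigma_B = statistics.stdev(boot)  # bootstrap standard deviation
```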

- Now the $ Annual\ Claim\ Frequency\ i.e.\ (F^{A_k}) $ is a $ R.V. $ instead of a constant - so my next step will be to derive the PDF for the new $ R.V. $ defined by:

\[ (F^{A_k})(fcst\_Annual\_EU) =(F^{A_k})(EU^F_k)= \beta \]

Analysis IV

- Next I will derive the PDF of $ \beta $, which is trivial because I know the distribution of $ (F^{A_k})(EU^F_k) $: a univariate Normally distributed $ R.V. $ multiplied by a constant positive scalar is again Normal, which means $ \beta\sim Normal(\mu_{\beta},\sigma_{\beta}) $ with $ \mu_{\beta} = EU^F_k\,\mu_{B} $ and $ \sigma_{\beta} = EU^F_k\,\sigma_{B} $.

The question I want to answer at this point is:

What is the relationship between: $ \beta\sim Normal(\mu_{\beta}, \sigma_{\beta}) $ and $ N\sim Poisson(\lambda = \beta) $

Luckily $ \lambda $ tends to be large for our $ R.V. $ $ N $. This implies we can use the Normal approximation to the Poisson (with a continuity correction) and approximate $ N\sim Poisson(\lambda = \beta) \approx Normal(\mu_N = \lambda, \sigma_N = \sqrt{\lambda}) $
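
The quality of this approximation can be checked numerically. The Python sketch below compares the exact Poisson CDF with a Normal CDF that has mean $ \lambda $ and standard deviation $ \sqrt{\lambda} $, evaluated at $ k + 0.5 $ as the continuity correction. The value of $ \lambda $ is hypothetical.

```python
import math
from statistics import NormalDist

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summing the pmf computed in log space."""
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
               for i in range(k + 1))

def normal_approx_cdf(k, lam):
    """Normal approximation to P(N <= k) with a continuity correction of 0.5."""
    return NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(k + 0.5)

lam = 650.0  # hypothetical expected annual claim count
# Largest absolute gap between the two CDFs over a range around the mean.
max_gap = max(abs(poisson_cdf(k, lam) - normal_approx_cdf(k, lam))
              for k in range(550, 751))
```

The gap shrinks as $ \lambda $ grows, which is why the approximation is reasonable for a large expected claim count.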

Now getting the Joint Probability Density Function (JPDF) $ P(N, \beta) $ is straightforward if we treat $ N $ and $ \beta $ as jointly Normal (an approximation: Normal marginals alone do not guarantee joint Normality). Under that approximation the JPDF is a bivariate normal distribution given by: $ P(N, \beta) = BivNorm(\mu_N, \sigma_N, \mu_{\beta}, \sigma_{\beta}, \rho) = closed\ form\ expression\ = P(\beta \mid N)P(N) $

This will allow us to solve for $ E(Y)E(E(N \mid \beta)) $ and show by the law of total expectation that:

\[ E(E(N \mid \beta)) = E(N) \implies E(Y)E(E(N \mid \beta)) = E(Y)E(N) = E(Y^{A_k}) \]
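
The law of total expectation can also be checked by simulation. In the Python sketch below, each iteration first draws $ \beta $ from a Normal distribution and then draws $ N \sim Poisson(\lambda = \beta) $; since $ E(N \mid \beta) = \beta $, the average of the simulated counts should be close to $ \mu_{\beta} $. The parameter values are hypothetical and `sample_poisson` is a helper written for this example.

```python
import random
import statistics

def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) by counting arrivals of a rate-lam process in one time unit."""
    count, t = 0, rng.expovariate(lam)
    while t < 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count

def simulate_claim_counts(n_sims, mu_beta, sigma_beta, seed=0):
    """Two-stage draw: beta ~ Normal(mu_beta, sigma_beta), then N ~ Poisson(beta)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sims):
        # Guard: a Poisson mean must be positive (a Normal draw could in principle be <= 0).
        beta = max(rng.gauss(mu_beta, sigma_beta), 1e-9)
        counts.append(sample_poisson(beta, rng))
    return counts

# Hypothetical parameters for beta.
counts = simulate_claim_counts(n_sims=4000, mu_beta=200.0, sigma_beta=15.0)
mean_count = statistics.fmean(counts)  # should be close to mu_beta = 200
```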

Conclusions

Limitations to my project:

I only had two years' worth of historical claims/policy data - a dataset that spanned a longer time frame may be helpful.
Further research could be to identify and model the relationship between claim size ($'s) and claim count (i.e. claim frequency vs claim severity).
More complex models could be designed which take into account other variables such as Gender, Vehicle type, Age, Geography, etc.