Estimating Annual Profits for Auto Insurance Through Claim Simulation

Adam Shetler
12/06/2019

Background I

\( {Problem\ Setup:} \) You're an analyst working for an insurance company in the year 2005. Your boss comes to you on 10/01/2005 and asks you to estimate the annual UW Income (i.e. annual profit) for the current calendar year for the Auto product line.

\( {Basic\ Terminology:} \)

  • Policy-holders: people who have purchased a 1-year Auto insurance policy.
    • Note: 1 policy-holder = 1 policy
  • Premium (\( P \)): the gross ($) amount a policy-holder pays for their insurance coverage.
  • Rate (\( R \)): the premium paid per policy-holder for their 1-year term (assumption: each policy-holder is charged the same rate)
  • Losses (\( Y \)): \( R.V. \) defining the claim dollar amount.
  • Annual Losses (\( Y^{A_k} \)): \( R.V. \) which defines the Annual Losses for a given year \( A_k \)
  • Expenses Ratio (\( E\% \)): % of the annual premium which accounts for expenses ($) as a result of doing business.
  • Commission Rate (\( C\% \)): % of the annual premium which accounts for commissions ($) (which the company must pay out).
  • Exposure Units (\( EU \)): the pro-rated portion of a policy earned over the term of the policy (between 0 and 1)
  • Claim Frequency (\( F \)): \( \ \frac{\sum{claim\ count}}{\sum{EU}} \)
  • Forecasted Annual EU (\( EU^F_k \)): the forecasted annual EU for year \( A_k \)
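
As a small illustration of the Claim Frequency definition above, here is a minimal Python sketch (the project itself uses R). The policy records and the helper name `claim_frequency` are hypothetical and only show how exposure units enter the denominator.

```python
# Hypothetical policy records: (exposure units earned, claim count).
policies = [(1.00, 0), (0.50, 1), (0.25, 0), (0.75, 2), (1.00, 0)]


def claim_frequency(records):
    """Claim Frequency F = sum(claim count) / sum(EU)."""
    total_claims = sum(claims for _, claims in records)
    total_eu = sum(eu for eu, _ in records)
    if total_eu <= 0:
        raise ValueError("total exposure must be positive")
    return total_claims / total_eu


frequency = claim_frequency(policies)
print(f"F = {frequency:.4f} claims per exposure unit")
```

Dividing by exposure rather than by policy count means a policy in force for only part of the period contributes proportionally less to the denominator.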

Background II

What is Annual UW Income?

\( Annual\ UW\ Income = Annual\ Premium - Annual\ Losses - Annual\ Expenses - Annual\ Commissions \)

Intuition:

On an annual basis the company has money coming in:

- Premium (\( P \))

- Note \( \sum_{i=1}^{12}P_i = Annual\ Premium; \) where \( P_i \) represents the total amount of premium received in month \( i \) (from all policy-holders with coverage in effect)

On an annual basis the company has money going out:

- Losses (\( Y \))

  • Note \( \sum_{i=1}^{M}Y_i = Annual\ Losses = Y^{A_k} \) where \( M \) represents the Annual Claim Count in year \( A_k \)

- Expenses: \( Annual\ Premium\times E\% \)

- Commissions: \( Annual\ Premium\times C\% \)
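
The UW Income formula can be written as a short function. This is a minimal Python sketch for illustration (the project itself uses R); the function name `annual_uw_income` and all dollar figures are hypothetical.

```python
def annual_uw_income(annual_premium, annual_losses, expense_ratio, commission_rate):
    """Annual UW Income = Premium - Losses - Expenses - Commissions."""
    expenses = annual_premium * expense_ratio
    commissions = annual_premium * commission_rate
    return annual_premium - annual_losses - expenses - commissions


# Hypothetical figures, for illustration only.
income = annual_uw_income(
    annual_premium=1_000_000.0,
    annual_losses=650_000.0,
    expense_ratio=0.20,
    commission_rate=0.10,
)
print(f"Annual UW Income: ${income:,.2f}")
```

Because expenses and commissions are fixed percentages of premium, the only random component of UW Income in this setup is the annual losses term.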

Background III

Why is this important?

  • An insurance company needs to understand its underlying risk
  • Senior leaders always want to know what to expect in terms of overall profitability
  • If a product is performing poorly, this approach can indicate when profitability is likely to change/become positive
  • The simulation lets us calculate the uncertainty around the estimate

Assumptions

  • All policies were a 1-year term and written over the \( 2004/2005 \) Calendar years
  • \( E \)% and \( C \)% are both constant \( \forall \) policies
  • \( log(Y) \sim N(\mu, \sigma) \)
  • Annual \( EU \) for 2005 is obtained by forecasting OCT, NOV and DEC (simple linear regression), then summing \( EU \) \( \forall \) months of 2005.
  • \( N \sim Poisson(\lambda) \) (i.e. \( N \) represents the \( Annual\ Claim\ Count \) and \( \lambda \) is its mean)
  • Given the time of the analysis, data is available one month in arrears.
  • Excluded claims with \( Y < 500 \) and \( Y \geq 45K \) (this filter of the data ensured \( log(Y) \approx N(\mu, \sigma) \))
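
The loss filter and the lognormal assumption can be sketched as follows. This is a minimal Python illustration (the project itself uses R) that draws hypothetical claim amounts instead of reading `dataCar`. It treats the filtered claims as a complete lognormal sample, so the maximum likelihood estimates are simply the mean and population standard deviation of \( log(Y) \); it does not adjust for the truncation.

```python
import math
import random
import statistics

rng = random.Random(2005)

# Hypothetical claim amounts standing in for the real loss data.
raw_claims = [rng.lognormvariate(7.5, 1.0) for _ in range(5000)]

# Apply the filter from the assumptions: keep 500 <= Y < 45,000.
claims = [y for y in raw_claims if 500 <= y < 45_000]

# Lognormal MLEs: mean and population standard deviation of log(Y).
log_claims = [math.log(y) for y in claims]
mu_hat = statistics.fmean(log_claims)
sigma_hat = statistics.pstdev(log_claims, mu=mu_hat)

print(f"kept {len(claims)} of {len(raw_claims)} claims")
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```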

Data I

Data Source:

  • library: insuranceData
  • dataset: dataCar
  • 67,856 policies (1 row per policy)
  • 4,624 claims
  • Source: http://www.acst.mq.edu.au/GLMsforInsuranceData
  • References: De Jong P., Heller G.Z. (2008), Generalized linear models for insurance data, Cambridge University Press


Data II

Summary Statistics

Statistic    Claim Count    Loss ($)    EU
Min.         1.000            500.6     0.002738
1st Qu.      1.000            936.9     0.413415
Median       1.000           1679.5     0.635181
Mean         1.108           3108.1     0.609630
3rd Qu.      1.000           3563.7     0.826831
Max.         4.000          36502.1     0.999316

Analysis I

General Methodology For Simulating Claims ($)

  1. Fit lognormal distribution to \( R.V. \) \( \ Y \) (i.e. Loss $'s)
    • find MLE's based on all historical data
    • Also try other transformations (e.g. Box-Cox, Power Transformations, etc.)… better fit than \( log(Y) \)?
  2. Forecast \( EU \) for OCT, NOV, DEC of \( 2005 \)
    • sum with \( EU \) data from JAN-SEP 2005 - now we will have a forecasted annual \( EU \) for \( 2005 \) (i.e. \( \ fcst\ Annual\ EU\ or (EU^F_k) \))
    • Note: this number is a constant and will not change throughout the entire analysis
  3. Calculate claim frequency given all data (composite) up until \( 10/01/2005 \)
    • Note: this will be improved upon in the “Extensions” section.
  4. Now calculate the average number of claims for 2005 as:
    • \( \ Average\ Annual\ Claim\ Count\ = Annual\ Claim\ Frequency\ \times fcst\ Annual\ EU = \alpha \)
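
Steps 2 to 4 can be sketched as below. This is a minimal Python illustration (the project itself uses R). The monthly EU values and the historical totals are hypothetical, and the regression is ordinary least squares of EU on month number, which is one reasonable reading of "simple linear regression" here.

```python
import statistics

# Hypothetical monthly EU for JAN-SEP 2005 (months 1..9).
months = list(range(1, 10))
monthly_eu = [310.0, 318.0, 325.0, 331.0, 340.0, 347.0, 352.0, 361.0, 368.0]

# Step 2: least-squares fit of EU on month, then forecast OCT, NOV, DEC.
x_bar = statistics.fmean(months)
y_bar = statistics.fmean(monthly_eu)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, monthly_eu)) / sum(
    (x - x_bar) ** 2 for x in months
)
intercept = y_bar - slope * x_bar
forecast_eu = [intercept + slope * m for m in (10, 11, 12)]
fcst_annual_eu = sum(monthly_eu) + sum(forecast_eu)

# Step 3: composite claim frequency from hypothetical historical totals.
historical_claims = 420
historical_eu = 6100.0
claim_frequency = historical_claims / historical_eu

# Step 4: average annual claim count (alpha).
alpha = claim_frequency * fcst_annual_eu

print(f"forecast annual EU = {fcst_annual_eu:.1f}, alpha = {alpha:.1f}")
```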

Analysis II

Continued: General Methodology For Simulating Claims ($)

5. One of our assumptions on the tab - “Assumptions” is: \( N \sim Poisson(\lambda);\ for\ N=0,1,\dots \) (i.e. \( \ N\ represents\ Annual\ Claim\ Count \)).

On each iteration of the simulation I will draw a single sample from \( N \sim Poisson(\lambda = \alpha);\ for\ N=0,1,\dots \)

6. Based on the integer output from the above step (call it \( n \)) - draw \( n \) random samples from the lognormal distribution (the one we found the MLE's for in step 1)

This leaves us with a sequence of \( iid \) \( R.V's \) each drawn from the same distribution:

\[ Y_1, Y_2, \dots, Y_n\ s.t.\ Y_i \sim Lognormal(\mu, \sigma)\ \forall i = 1,\dots,n \]

  • we are interested in \( \sum_{i=1}^{n}{Y_i} = Y^{A_k} = Annual\ Losses \) ($), whose expected value is \( E(Y^{A_k}) = E(Y)E(N) \): recall this is one of the components in UW Income
  • from the theorem on Random Sums of Random Variables we know: \( E\left(\sum_{i=1}^{N}Y_i\right) = E(Y)E(N) \) because \( Y \) and \( N \) are independent (and also b/c \( Y_1, Y_2, \dots \) is an \( iid \) sequence of \( R.V's \) with common mean and standard deviation).
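
Steps 5 and 6 and the random-sum result can be checked numerically. This is a minimal Python sketch (the project itself uses R) with hypothetical values for \( \lambda \), \( \mu \) and \( \sigma \). The helper `poisson_draw` is an illustrative stand-in for a Poisson sampler: it counts unit-rate exponential arrivals before time \( \lambda \), which is distributed \( Poisson(\lambda) \).

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw N ~ Poisson(lam) by counting unit-rate exponential arrivals before lam."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_annual_losses(rng, lam, mu, sigma):
    """One simulated year: draw n claims, then sum n lognormal claim amounts."""
    n = poisson_draw(rng, lam)
    return sum(rng.lognormvariate(mu, sigma) for _ in range(n))


rng = random.Random(42)
lam, mu, sigma = 200.0, 7.5, 1.0  # hypothetical alpha and lognormal parameters

simulated = [simulate_annual_losses(rng, lam, mu, sigma) for _ in range(2000)]
simulated_mean = statistics.fmean(simulated)

# E(Y) for a lognormal is exp(mu + sigma^2 / 2), and E(N) = lam.
theoretical_mean = lam * math.exp(mu + sigma ** 2 / 2)

print(f"simulated mean annual losses: {simulated_mean:,.0f}")
print(f"E(Y)E(N):                     {theoretical_mean:,.0f}")
```

Each simulated value is one realization of annual losses; only their average is expected to be close to \( E(Y)E(N) \).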

7. Putting it all together - I will create a function in R which makes use of the replicate function to simulate \( Y^{A_k} \) many times; the average of the simulated values estimates \( E(Y^{A_k}) = E(Y)E(N) \). This function will also contain the logic to calculate the remaining components of the UW Income formula. Based on the assumptions I made, calculating these components is trivial, as I will choose “reasonable” values for \( Rate\ (R) \), \( Expense\ Ratio\ (E\%) \) and \( Commission\ Rate\ (C\%) \).

Analysis III

Continued: General Methodology For Simulating Claims ($)

Final Notes

Note: the function described in step 7 will output an \( annual\ UW\ Income \) on each iteration of the simulation.
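
A minimal Python sketch of such a function is shown below; it is not the project's R code, where `replicate` would play the role of the list comprehension. All parameter values are hypothetical placeholders, annual premium is passed in directly as an assumed input, and `poisson_draw` is an illustrative helper that counts unit-rate exponential arrivals before time \( \lambda \).

```python
import random
import statistics


def poisson_draw(rng, lam):
    """Draw N ~ Poisson(lam) by counting unit-rate exponential arrivals before lam."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_uw_income(rng, alpha, mu, sigma, annual_premium,
                       expense_ratio, commission_rate):
    """One iteration: simulate annual losses, then apply the UW Income formula."""
    n_claims = poisson_draw(rng, alpha)
    annual_losses = sum(rng.lognormvariate(mu, sigma) for _ in range(n_claims))
    expenses = annual_premium * expense_ratio
    commissions = annual_premium * commission_rate
    return annual_premium - annual_losses - expenses - commissions


rng = random.Random(2005)

# All inputs below are hypothetical placeholders.
params = dict(alpha=200.0, mu=7.5, sigma=1.0, annual_premium=1_000_000.0,
              expense_ratio=0.20, commission_rate=0.10)

# One annual UW Income per iteration, sorted so percentiles can be read off.
incomes = sorted(simulate_uw_income(rng, **params) for _ in range(2000))

mean_income = statistics.fmean(incomes)
lower = incomes[int(0.025 * len(incomes))]  # approximate 2.5th percentile
upper = incomes[int(0.975 * len(incomes))]  # approximate 97.5th percentile

print(f"mean UW Income: {mean_income:,.0f}")
print(f"approx. 95% interval: ({lower:,.0f}, {upper:,.0f})")
```

Keeping every simulated value, rather than only the mean, is what allows the uncertainty around the estimate to be reported as an interval.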

Variations On My Project

One variation I plan to work into my project modifies step 3 on the tab titled \( Analysis\ I \): the simple version of the project treats the \( Annual\ Claim\ Frequency \) as a fixed number calculated from historical data.

Instead, I take my entire dataset up to 10/01/2005 (spanning 2004-2005) and derive a distribution for the \( Annual\ Claim\ Frequency \), i.e. the \( R.V. \) \( F^{A_k} \), using bootstrap methods. When I did this I found:

\( F^{A_k} \approx N(\mu_{B}, \sigma_{B});\ where\ B\ stands\ for\ bootstrap \)
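
One way the bootstrap could be set up is sketched below in Python (the project itself uses R). The policy data is simulated and hypothetical, and resampling whole policies with replacement is an assumption about the resampling unit, not something stated in the slides.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical policy-level data: (EU, claim count) per policy.
policies = []
for _ in range(2000):
    eu = rng.uniform(0.05, 1.0)
    claims = 1 if rng.random() < 0.07 * eu else 0
    policies.append((eu, claims))


def claim_frequency(records):
    """Claim Frequency F = sum(claim count) / sum(EU)."""
    return sum(c for _, c in records) / sum(eu for eu, _ in records)


def bootstrap_frequencies(rng, records, n_boot):
    """Resample policies with replacement and recompute F for each resample."""
    size = len(records)
    return [claim_frequency(rng.choices(records, k=size)) for _ in range(n_boot)]


point_estimate = claim_frequency(policies)
boot_frequencies = bootstrap_frequencies(rng, policies, 500)

# Bootstrap estimates of the mean and standard deviation of F.
mu_b = statistics.fmean(boot_frequencies)
sigma_b = statistics.stdev(boot_frequencies)

print(f"point estimate = {point_estimate:.4f}")
print(f"mu_B = {mu_b:.4f}, sigma_B = {sigma_b:.4f}")
```

A histogram or normal Q-Q plot of `boot_frequencies` would be the natural check on the approximate normality reported above.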

- Now the \( Annual\ Claim\ Frequency\ i.e.\ (F^{A_k}) \) is a \( R.V. \) instead of a constant - so my next step will be to derive the PDF for the new \( R.V. \) defined by:

\[ (F^{A_k})(fcst\ Annual\ EU) =(F^{A_k})(EU^F_k)= \beta \]

Analysis IV

- Next I will derive the PDF of \( \beta \), which is trivial because I know the distribution of \( (F^{A_k})(EU^F_k) \): it is simply a univariate Normally distributed \( R.V. \) multiplied by a constant scalar, which means \( \beta\sim\ Normal(\mu_{\beta},\sigma_{\beta}) \) with \( \mu_{\beta} = EU^F_k\,\mu_B \) and \( \sigma_{\beta} = EU^F_k\,\sigma_B \).

The question I want to answer at this point is:

What is the relationship between: \( \beta\sim Normal(\mu_{\beta}, \sigma_{\beta}) \) and \( N\sim Poisson(\lambda = \beta) \)

  • Luckily \( \lambda \) tends to be large for our \( R.V. \) \( N \). This implies we can use the Normal approximation to the Poisson (applying a correction for continuity) and approximate \( N\sim Poisson(\lambda = \beta) \approx Normal(\mu_N = \lambda, \sigma_N = \sqrt{\lambda}) \)
  • If we treat \( N \) and \( \beta \) as approximately jointly Normal, getting the Joint Probability Density Function (JPDF) - \( P(N, \beta) \) is straightforward: the JPDF is a bivariate normal distribution given by: \( P(N, \beta) = BivNorm(\mu_N, \sigma_N, \mu_{\beta}, \sigma_{\beta}, \rho) = closed\ form\ expression\ = P(\beta \mid N)P(N) \)
  • This will allow us to solve for \( E(Y)E(E(N \mid \beta)) \) and show by the law of total expectation that:

\[ E(E(N \mid \beta)) = E(N) \implies E(Y)E(E(N \mid \beta)) = E(Y)E(N) = E(Y^{A_k}) \]
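
A short worked version of this step, assuming \( N \mid \beta \sim Poisson(\beta) \), \( \beta = (F^{A_k})(EU^F_k) \) with \( F^{A_k} \approx N(\mu_B, \sigma_B) \), and lognormal \( Y \) (so that \( E(Y) = e^{\mu + \sigma^2/2} \)):

\[ E(N \mid \beta) = \beta \implies E(N) = E(E(N \mid \beta)) = E(\beta) = \mu_{\beta} = EU^F_k\,\mu_B \]

\[ E(Y^{A_k}) = E(Y)E(N) = e^{\mu + \sigma^2/2}\; EU^F_k\; \mu_B \]

This treats \( \beta \) as positive, which is reasonable when \( \mu_{\beta} \) is large relative to \( \sigma_{\beta} \), so that the Normal approximation puts negligible mass below zero.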

Conclusions

Limitations to my project:

  • I only had 2 years' worth of historical claims/policy data - a dataset that spanned a longer time frame may be helpful.

  • Further research could be to identify and model the relationship between claim size ($'s) and claim count (i.e. claim frequency vs claim severity).

  • More complex models could be designed which take into account other variables such as Gender, Vehicle type, Age, Geography, etc.