Adam Shetler
12/06/2019
\( {Problem\ Setup:} \) You're an analyst working for an insurance company in the year 2005. Your boss comes to you on 10/01/2005 and asks you to estimate the annual UW Income (i.e. annual profit) for the current calendar year for the Auto product line.
\( {Basic\ Terminology:} \)
What is Annual UW Income?
\( Annual\ UW\ Income = \) \( Annual\ Premium\ - Annual\ Losses - Annual\ Expenses - Annual\ Commissions \)
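As a quick arithmetic illustration of the formula above, here is a minimal Python sketch. The function name `uw_income` and the dollar amounts are hypothetical and are not taken from the project's data.

```python
def uw_income(premium, losses, expenses, commissions):
    """Annual UW Income = Premium - Losses - Expenses - Commissions."""
    return premium - losses - expenses - commissions


# Hypothetical annual amounts in dollars (illustration only).
example_income = uw_income(
    premium=1_000_000,
    losses=650_000,
    expenses=200_000,
    commissions=100_000,
)
print(example_income)  # 50000
```

A positive result means the product line earned an underwriting profit for the year; a negative result means an underwriting loss.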
Intuition:
On an annual basis the company has money coming in:
- Premium (\( P \))
- Note \( \sum_{i=1}^{12}P_i = Annual\ Premium; \) where \( P_i \) represents the total amount of premium received in month \( i \) (from all policy-holders with coverage in effect)
On an annual basis the company has money going out:
- Losses (\( Y \))
Why is this important?
Assumptions
Data Source:
Summary Statistics
| Statistic | Claim Count | Loss ($) | EU's |
|---|---|---|---|
| Min. | 1.000 | 500.6 | 0.002738 |
| 1st Qu. | 1.000 | 936.9 | 0.413415 |
| Median | 1.000 | 1679.5 | 0.635181 |
| Mean | 1.108 | 3108.1 | 0.609630 |
| 3rd Qu. | 1.000 | 3563.7 | 0.826831 |
| Max. | 4.000 | 36502.1 | 0.999316 |
General Methodology For Simulating Claims ($)
Continued: General Methodology For Simulating Claims ($)
5. One of our assumptions on the tab “Background III” is: \( N \sim Poisson(\lambda);\ for\ N=0,1,\dots \) (i.e. \( N \) represents the Annual Claim Count).
On each iteration of the simulation I will draw a single sample from \( N \sim Poisson(\lambda = \alpha);\ for\ N=0,1,\dots \)
6. Based on the integer output from the above step (call it \( n \)) - draw \( n \) random samples from the lognormal distribution (the one we found the MLE's for in step 1)
This leaves us with a sequence of \( iid \) \( R.V's \) each drawn from the same lognormal distribution (with parameters \( \hat{\mu}, \hat{\sigma} \) denoting the MLE's from step 1):
\[ Y_1, Y_2, \dots, Y_n\ s.t.\ Y_i \sim Lognormal(\hat{\mu}, \hat{\sigma})\ \forall i = 1,\dots,n \]
7. Putting it all together - I will create a function in R which makes use of the replicate function to simulate the annual loss \( Y^{A} = \sum_{i=1}^{N} Y_i \) many times (note that \( E(Y^{A}) = E(Y)E(N) \)). Also part of this function will be the logic to calculate the remaining components of the UW Income formula. Based on the assumptions I made, calculating these components is trivial, as I will choose “reasonable” values for: \( Rate\ (R) \), \( Expense\ Ratio\ (E) \)% and \( Commission\ Rate\ (C) \)%
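A single pass through steps 5 and 6 can be sketched as follows. This is a Python illustration, not the author's R code. The helper names (`draw_poisson`, `draw_claims`) and the parameter values are placeholders, not the fitted MLE's or the project's \( \alpha \).

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method.

    Adequate for the modest lam used here; exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def draw_claims(lam, mu, sigma, rng):
    """Step 5: draw n ~ Poisson(lam). Step 6: draw n iid lognormal claim sizes."""
    n = draw_poisson(lam, rng)
    severities = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return n, severities


# Placeholder parameters, NOT the author's fitted values.
rng = random.Random(2005)
n, severities = draw_claims(lam=3.0, mu=7.4, sigma=1.0, rng=rng)
annual_loss = sum(severities)  # simulated annual loss for this one iteration
print(n, round(annual_loss, 2))
```

Summing the \( n \) simulated claim sizes gives one simulated annual loss; repeating the draw many times builds up its distribution.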
Continued: General Methodology For Simulating Claims ($) Final Notes
Note: the function described in step 7 will output an \( annual\ UW\ Income \) on each iteration of the simulation.
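The function described in step 7 might look roughly like the sketch below, written in Python as a stand-in for the R function built on `replicate`. Several things here are assumptions for illustration: premium is taken as Rate times forecast exposure, expenses and commissions are taken as fixed percentages of premium, and every numeric value is a placeholder rather than a figure from the project.

```python
import math
import random
import statistics


def draw_poisson(lam, rng):
    """Knuth's multiplication method; fine for moderate lam."""
    threshold = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return count
        count += 1


def simulate_uw_income(lam, mu, sigma, exposure, rate,
                       expense_ratio, commission_rate, rng):
    """Return one simulated annual UW Income."""
    premium = rate * exposure                    # assumed: Rate x forecast EU
    n = draw_poisson(lam, rng)                   # annual claim count
    losses = sum(rng.lognormvariate(mu, sigma) for _ in range(n))
    expenses = expense_ratio * premium           # assumed: E% of premium
    commissions = commission_rate * premium      # assumed: C% of premium
    return premium - losses - expenses - commissions


def replicate(n_sims, fn):
    """Rough analogue of R's replicate(): call fn() n_sims times."""
    return [fn() for _ in range(n_sims)]


rng = random.Random(2005)
results = replicate(
    5000,
    lambda: simulate_uw_income(
        lam=50.0, mu=7.4, sigma=1.0,             # placeholder claim parameters
        exposure=400.0, rate=600.0,              # placeholder EU and Rate
        expense_ratio=0.25, commission_rate=0.10,
        rng=rng,
    ),
)
mean_uw = statistics.mean(results)
share_loss_years = sum(r < 0 for r in results) / len(results)
print(round(mean_uw), round(share_loss_years, 3))
```

Because each iteration returns one annual UW Income, the collected results can be summarized by their mean, their percentiles, or the share of simulated years with an underwriting loss.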
Variations On My Project
One variation I plan to work into my project starts with modifying step 3 on the tab titled \( Analysis\ I \): the simple version of the project has the \( Annual\ Claim\ Frequency \) as a fixed number calculated from historical data.
What I plan to do is take my entire dataset up to 10/01/2005 (it spans 2004-2005) and derive a distribution for the \( Annual\ Claim\ Frequency\ i.e.\ R.V.\ (F^{A_k}) \) using bootstrap methods. When I did this I found:
\( F^{A_k} \approx N(\mu_{B}, \sigma_{B});\ where\ B\ stands\ for\ bootstrap \)
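One way such a bootstrap could be set up is sketched below in Python. It assumes the frequency is estimated as total claim count divided by total EU's, and it resamples whole records with replacement. Both the estimator and the toy records are assumptions for illustration, not the author's data or exact procedure.

```python
import random
import statistics


def bootstrap_frequency(records, n_boot=2000, seed=0):
    """Bootstrap the claim frequency (claims per exposure unit).

    records: list of (claim_count, exposure_units) pairs, one per record.
    Returns the mean and standard deviation of the bootstrap estimates.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))  # resample with replacement
        claims = sum(count for count, _ in sample)
        exposure = sum(units for _, units in sample)
        estimates.append(claims / exposure)
    return statistics.mean(estimates), statistics.stdev(estimates)


# Hypothetical toy records: (claim count, EU's). Not the project's dataset.
toy_records = [(0, 1.0), (1, 0.5), (0, 0.8), (2, 1.0),
               (0, 0.3), (1, 0.9), (0, 1.0), (0, 0.6)]
mu_b, sigma_b = bootstrap_frequency(toy_records)
print(round(mu_b, 3), round(sigma_b, 3))
```

The two returned values play the roles of \( \mu_{B} \) and \( \sigma_{B} \). Whether a Normal approximation is reasonable can be checked with a histogram of the bootstrap estimates.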
- Now the \( Annual\ Claim\ Frequency\ i.e.\ (F^{A_k}) \) is an \( R.V. \) instead of a constant - so my next step will be to derive the PDF for the new \( R.V. \) defined by:
\[ (F^{A_k})(fcst\_Annual\_EU) =(F^{A_k})(EU^{F_k})= \beta \]
- Next I will derive the PDF of \( \beta \), which is trivial because I know the distribution of \( (F^{A_k})(EU^{F_k}) \): it is simply a univariate Normally distributed \( R.V. \) multiplied by a constant scalar, which means \( \beta\sim\ Normal(\mu_{\beta},\sigma_{\beta}) \)
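The scaling step can be written out explicitly. As notational assumptions for this sketch, let \( c = EU^{F_k} > 0 \) be the forecast annual EU (a constant) and read \( \sigma_{B} \) as a standard deviation:

\[ \beta = c\,F^{A_k},\quad F^{A_k} \approx N(\mu_{B}, \sigma_{B}) \implies \beta \approx N(\mu_{\beta}, \sigma_{\beta}),\quad \mu_{\beta} = c\,\mu_{B},\quad \sigma_{\beta} = c\,\sigma_{B} \]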
The question I want to answer at this point is:
What is the relationship between: \( \beta\sim Normal(\mu_{\beta}, \sigma_{\beta}) \) and \( N\sim Poisson(\lambda = \beta) \)?
\[ E(E(N \mid \beta)) = E(N) \implies E(Y)E(E(N \mid \beta)) = E(Y)E(N) = E(Y^{A_k}) \]
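The identity above can be made concrete with a short worked derivation. It is a sketch under two assumptions: \( N \mid \beta \sim Poisson(\beta) \), so that \( E(N \mid \beta) = Var(N \mid \beta) = \beta \), and the Normal approximation puts negligible probability on \( \beta \le 0 \):

\[ E(N) = E(E(N \mid \beta)) = E(\beta) = \mu_{\beta}, \qquad Var(N) = E(Var(N \mid \beta)) + Var(E(N \mid \beta)) = \mu_{\beta} + \sigma_{\beta}^{2} \]

So treating \( \lambda \) as random leaves the expected claim count at \( \mu_{\beta} \), but makes the claim count more dispersed than a plain Poisson with the same mean.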
Limitations to my project:
I only had 2 years' worth of historical claims/policy data; a dataset that spanned a longer time frame may be helpful.
Further research could be to identify and model the relationship between claim size ($'s) and claim count (i.e. claim frequency vs claim severity).
More complex models could be designed which take into account other variables such as Gender, Vehicle type, Age, Geography, etc.