Now, we can use the credibility approach to measure the frequency and severity of claims and contrast them with newly observed values. We will use a coverage probability of 95% and a range of 10% around the true mean.
We can start with some data.
setwd("~/Dropbox/UDLAP/Cursos/2022 Primavera/Tema Selecto/data")
data<-read.csv("insample.csv")
In this dataset, we have different years:
table(data$Year)
##
## 2006 2007 2008 2009 2010
## 1154 1138 1125 1112 1110
Imagine we have information from years 2006-2008; we can estimate a premium based on expected frequency and expected severity. In this example, let's work in small pieces: first we will review frequency for the 2006-2008 period, then contrast it with the information from 2009, and decide whether to switch to the 2009 value or stay with the previous one.
data1<-data[which(data$Year<=2008),]
data2<-data[which(data$Year==2009),]
The mean frequency for 2006-2008, that is, the mean number of claims per policy, is:
mu<-mean(data1$Freq)
mu
## [1] 1.030729
Now, imagine we get information from 2009: should we update the frequency estimate?
First, the expected number of claims based on the previous estimate is:
\[ \lambda_N = Policies \times \mu \]
policies<-length(data2$Year) # number of policies observed in 2009
lN<- policies*mu
lN
## [1] 1146.17
The full-credibility standard for the number of claims is:
\[ \lambda_F=\bigg( \frac{z_{1-\alpha/2}}{k} \bigg)^2 \]
In this example we are working with \(k=0.1\) and \(\alpha=0.05\); then:
\[ \lambda_F=\bigg( \frac{z_{0.975}}{0.1} \bigg)^2 \]
lF<-(qnorm(0.975)/0.1)^2
lF
## [1] 384.1459
Now, we have:
\[ 384.1459 = \lambda_F < \lambda_N = 1146.17 \]
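We can verify this comparison directly with the quantities already computed:
lF < lN # TRUE: the expected number of claims exceeds the full-credibility standard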
This means that we attain full credibility for claim frequency. Therefore, for the next period we have an expected frequency of:
mu2<-mean(data2$Freq)
mu2
## [1] 1.219424
Now, we can see what happens with severity. First, we need to estimate the coefficient of variation, which is:
\[ C_x=\frac{\sigma_X}{\mu_X} \]
For claims in 2009, we need the standard deviation and the mean of the claim amounts. First, let us look at the number of claims per policy:
table(data2$Freq)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 15 16 18 19 21 143
## 811 155 69 26 17 11 1 4 2 2 2 1 1 1 2 1 1 1 1 1
## 228 263
## 1 1
length(data2$Freq)
## [1] 1112
This means we have 1112 - 811 = 301 claims.
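As a quick check, the same count can be obtained directly (a minimal sketch, assuming the amount y is positive exactly when a policy has claims):
sum(data2$Freq > 0) # number of policies in 2009 with at least one claim
length(data2$y[data2$y > 0]) # equivalently, the number of positive claim amounts
Using the positive claim amounts, the variance, standard deviation, and mean of severity are: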
varx<-var(data2$y[data2$y>0])
varx
## [1] 20039852864
sigmax<-sqrt(varx)
sigmax
## [1] 141562.2
mux<-mean(data2$y[data2$y>0])
Cx<-sigmax/mux
Cx
## [1] 3.857419
Now, for claim severity the full-credibility standard is:
\[ \lambda_F \cdot C_x^2 \]
lF*Cx*Cx
## [1] 5715.97
We then have:
\[ \lambda_F \cdot C_x^2 = 5715.97 > 301 = Claims_{2009} \]
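Again, this can be checked directly in code:
lF*Cx*Cx > 301 # TRUE: the severity standard exceeds the observed number of claims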
Since the full-credibility standard exceeds the number of claims, we did not attain full credibility for claim severity. We can then estimate the credibility factor as
\[ Z=\sqrt{\frac{Claims}{\lambda_F C_x^2}} \]
Z<- sqrt(301/(lF*Cx*Cx))
Z
## [1] 0.2294765
Now, we can update our severity as:
\[ U_{X}=Z\cdot \mu_{X_{2009}} + (1-Z) \mu_{X_{2006-2008}} \]
Ux<-Z*mean(data2$y[data2$y>0])+(1-Z)*mean(data1$y[data1$y>0])
Ux
## [1] 47759.67
This compares with the observed mean severities for 2006-2008 and for 2009:
mean(data1$y[data1$y>0])
## [1] 51053.84
mean(data2$y[data2$y>0])
## [1] 36698.68
Now, how does this affect premiums?
According to the basic approach, the premium for 2006-2008 was calculated as:
p1<-mean(data1$Freq)*mean(data1$y[data1$y>0])
p1
## [1] 52622.66
Now, if we were to use our data from 2009, we would have a premium of the form:
p2<-mean(data2$Freq)*mean(data2$y[data2$y>0])
p2
## [1] 44751.26
However, based on credibility theory, the updated premium would be:
pu<-mean(data2$Freq)*Ux
pu
## [1] 58239.31
This is the consequence of the higher claim frequency observed in 2009.
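To see where this comes from, here is a short sketch, using only quantities already computed above, that separates the frequency effect from the severity effect:
mu2/mu # 2009 frequency is roughly 18% higher than the 2006-2008 mean
Ux/mean(data1$y[data1$y>0]) # credibility-weighted severity is roughly 6% below the 2006-2008 mean
pu/p1 # net effect: the updated premium is roughly 11% higher than p1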
\[\textbf{Activity: Replicate this example with 2010.}\]
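As a starting point for the activity, the steps above can be wrapped in a small helper function. This is only a sketch, not part of the original example: the name credibility_update and its arguments are hypothetical, it assumes the same column names (Year, Freq, y) used above, and it follows the decision rules of this example (switch to the new frequency only under full credibility; apply partial credibility to severity).
# Hypothetical helper: credibility update of frequency, severity, and premium
# prior = experience period data, new = evaluation year data
credibility_update <- function(prior, new, k = 0.1, alpha = 0.05){
  lF <- (qnorm(1 - alpha/2)/k)^2               # full-credibility standard
  lN <- nrow(new)*mean(prior$Freq)             # expected number of claims in the new period
  freq <- if(lN >= lF) mean(new$Freq) else mean(prior$Freq)
  x_new   <- new$y[new$y > 0]                  # observed claim amounts in the new period
  x_prior <- prior$y[prior$y > 0]
  Cx <- sd(x_new)/mean(x_new)                  # coefficient of variation of severity
  Z  <- min(1, sqrt(length(x_new)/(lF*Cx^2)))  # partial-credibility factor, capped at 1
  Ux <- Z*mean(x_new) + (1 - Z)*mean(x_prior)  # credibility-weighted severity
  list(frequency = freq, Z = Z, severity = Ux, premium = freq*Ux)
}
# One possible reading of the activity: 2006-2009 experience contrasted with 2010
# credibility_update(data[data$Year <= 2009, ], data[data$Year == 2010, ])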