Jeff Hung

Data scientist at the Institute of Manufacturing Information and Systems, National Cheng Kung University

Description

The data represent industry aggregates for private passenger auto liability/medical coverages from 1995 to 2004, in millions of dollars. They are based on insurance company annual statements. The variable “Claim” represents cumulative net payments, including defense and cost containment expenses.

Load Packages and Data

library(insuranceData)
library(dplyr)
library(ggplot2)
library(plotly)
data(IndustryAuto)

Data Description

str(IndustryAuto)
## 'data.frame':    55 obs. of  3 variables:
##  $ Incurral.Year   : int  1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 ...
##  $ Development.Year: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Claim           : int  17674 18315 18606 18816 20649 22327 23141 24301 24210 24468 ...
summary(IndustryAuto)
##  Incurral.Year  Development.Year     Claim      
##  Min.   :1995   Min.   : 1       Min.   :17674  
##  1st Qu.:1996   1st Qu.: 2       1st Qu.:35091  
##  Median :1998   Median : 4       Median :43829  
##  Mean   :1998   Mean   : 4       Mean   :39948  
##  3rd Qu.:2000   3rd Qu.: 6       3rd Qu.:46640  
##  Max.   :2004   Max.   :10       Max.   :53242

Incurral.Year: The year in which a claim was incurred
Development.Year: The number of years from incurral to the time when the payment is made
Claim: Cumulative net payments, including defense and cost containment expenses (millions of dollars)

Data Preprocessing

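# Convert cumulative payments (Claim) into incremental payments (Claim2) per incurral year
IndustryAuto <- IndustryAuto %>%
  arrange(Incurral.Year, Development.Year) %>%  # the shift below relies on development order
  group_by(Incurral.Year) %>%
  mutate(Claim2 = Claim - c(0, Claim[-length(Claim)]))  # subtract the previous cumulative value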
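The preprocessing step turns cumulative payments into incremental ones by subtracting the previous development year's value within each incurral year. A minimal base R sketch of that idea on a single toy series follows; the values in `cumulative` are made up for illustration and are not taken from IndustryAuto.

```r
# Toy cumulative payments for one incurral year (made-up values, not from IndustryAuto)
cumulative <- c(100, 150, 175, 185)

# Shift-and-subtract: each value minus the previous one, with 0 before the first
incremental_shift <- cumulative - c(0, cumulative[-length(cumulative)])

# Equivalent formulation: first differences against a leading zero
incremental_diff <- diff(c(0, cumulative))

print(incremental_shift)

# Summing the increments recovers the original cumulative series
print(cumsum(incremental_shift))
```

The first development year keeps its full cumulative amount, because nothing was paid before it.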

Data Visualization


Data Analysis

  1. Exponential regression for incurral year 1995, excluding the first development year
## 
## Call:
## lm(formula = log(Claim2) ~ Development.Year, data = Data_1995)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.054794 -0.020908  0.001446  0.029135  0.041060 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      10.913104   0.029699   367.5 2.92e-16 ***
## Development.Year -0.690007   0.004547  -151.8 1.42e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03522 on 7 degrees of freedom
## Multiple R-squared:  0.9997, Adjusted R-squared:  0.9997 
## F-statistic: 2.303e+04 on 1 and 7 DF,  p-value: 1.423e-13
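Because the response is log(Claim2), the slope is easier to read after exponentiating: exp(slope) is the ratio between the incremental payment in one development year and the year before. The sketch below uses only the two coefficients reported above. The names `decay_ratio` and `synthetic` are illustrative, and the refit runs on noise-free synthetic values generated from those coefficients, not on the original Data_1995.

```r
# Coefficients reported above for incurral year 1995
b0 <- 10.913104   # intercept on the log scale
b1 <- -0.690007   # slope on Development.Year

# Year-over-year ratio of incremental payments implied by the slope
decay_ratio <- exp(b1)

# Back-transformed fitted payments for development years 2 to 10
# (ignores the retransformation bias of a log-linear model)
dev_year <- 2:10
fitted_claim2 <- exp(b0 + b1 * dev_year)

# Sanity check on synthetic, noise-free data: refitting recovers the slope
synthetic <- data.frame(Development.Year = dev_year, Claim2 = fitted_claim2)
refit <- lm(log(Claim2) ~ Development.Year, data = synthetic)
refit_slope <- unname(coef(refit)["Development.Year"])

print(round(decay_ratio, 4))
print(round(refit_slope, 6))
```

A ratio of about 0.50 means that, from the second development year on, each year's incremental payment is roughly half of the previous year's.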

  2. Exponential regression for incurral year 1996, excluding the first development year
## 
## Call:
## lm(formula = log(Claim2) ~ Development.Year, data = Data_1996)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.19576 -0.04779  0.01004  0.06011  0.16268 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       10.8577     0.1079  100.66 6.48e-11 ***
## Development.Year  -0.6655     0.0181  -36.76 2.70e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1173 on 6 degrees of freedom
## Multiple R-squared:  0.9956, Adjusted R-squared:  0.9948 
## F-statistic:  1352 on 1 and 6 DF,  p-value: 2.703e-08

  3. Exponential regression for all incurral years, excluding the first development year
## 
## Call:
## lm(formula = log(Claim2) ~ Development.Year, data = Data_all)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35534 -0.06404 -0.00362  0.07000  0.29763 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      11.04757    0.03775  292.66   <2e-16 ***
## Development.Year -0.70164    0.00731  -95.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1084 on 43 degrees of freedom
## Multiple R-squared:  0.9954, Adjusted R-squared:  0.9952 
## F-statistic:  9213 on 1 and 43 DF,  p-value: < 2.2e-16

  4. Linear regression for all incurral years, restricted to the first and second development years
## 
## Call:
## lm(formula = Claim2 ~ Development.Year, data = Data_all_2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3219.2 -1604.9  -166.1  1465.5  3407.8 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         25833       1600   16.14 2.53e-11 ***
## Development.Year    -4939       1012   -4.88 0.000167 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2147 on 16 degrees of freedom
## Multiple R-squared:  0.5981, Adjusted R-squared:  0.573 
## F-statistic: 23.81 on 1 and 16 DF,  p-value: 0.0001669

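  5. Piecewise regression
    Model: \(y = \beta_0e^{\beta_1x+\beta_2(x-2)_+}\epsilon\), where \(y\) is the incremental claim (Claim2), \(x\) is the development year, and \((x-2)_+ = \max(x-2, 0)\)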
## 
## Call:
## lm(formula = log(Claim2) ~ Development.Year + I(pmax(Development.Year - 
##     2, 0)), data = IndustryAuto)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.35534 -0.06739 -0.00362  0.07321  0.29763 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      10.26877    0.07589 135.311  < 2e-16 ***
## Development.Year                 -0.31224    0.04425  -7.056 4.03e-09 ***
## I(pmax(Development.Year - 2, 0)) -0.38940    0.04820  -8.079 9.54e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1126 on 52 degrees of freedom
## Multiple R-squared:  0.9955, Adjusted R-squared:  0.9954 
## F-statistic:  5813 on 2 and 52 DF,  p-value: < 2.2e-16
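Taking logs of the model gives \(\log y = \log\beta_0 + \beta_1x + \beta_2(x-2)_+ + \log\epsilon\), which is the linear form passed to lm. The sketch below shows how the reported coefficients translate into two slopes, one up to the knot at development year 2 and one after it. The helper names `hinge` and `predict_claim2` are illustrative and do not come from the original analysis.

```r
# Coefficients reported in the piecewise fit above
b0 <- 10.26877   # intercept, i.e. log(beta_0)
b1 <- -0.31224   # slope on Development.Year
b2 <- -0.38940   # additional slope once Development.Year exceeds the knot

# Hinge term (x - knot)_+ as used in the lm formula
hinge <- function(x, knot = 2) pmax(x - knot, 0)

# Slope on the log scale before and after the knot
slope_before <- b1
slope_after <- b1 + b2

# Back-transformed prediction of the incremental payment
# (ignores the retransformation bias of a log-linear model)
predict_claim2 <- function(x) exp(b0 + b1 * x + b2 * hinge(x))

print(c(before = slope_before, after = slope_after))
print(round(predict_claim2(1:10)))
```

The slope after the knot, -0.31224 + (-0.38940) = -0.70164, equals the slope of the exponential regression for all incurral years without the first development year, so the piecewise model reproduces that decay while also covering the first year.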

Conclusion

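As the output shows, the piecewise regression model fits well (adjusted R-squared 0.9954) while using all 55 observations, including the first development year. With this model, an insurance company can estimate the payments still to come in each following development year and prepare for the auto liability/medical coverages accordingly.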
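One way to act on this is to sum the predicted incremental payments over the development years not yet observed for a given incurral year. The following is a rough sketch under stated assumptions, not the author's method: `remaining_payments` is a hypothetical helper, the horizon of 10 development years is assumed because the data stop there, and the back-transformed predictions ignore retransformation bias and any payments beyond year 10.

```r
# Coefficients from the piecewise regression output
b0 <- 10.26877
b1 <- -0.31224
b2 <- -0.38940

# Back-transformed predicted incremental payment (millions of dollars)
predict_claim2 <- function(dev_year) {
  exp(b0 + b1 * dev_year + b2 * pmax(dev_year - 2, 0))
}

# Hypothetical helper: predicted payments still to come after `last_observed`
# development years, up to an assumed horizon of 10 years
remaining_payments <- function(last_observed, horizon = 10) {
  if (last_observed >= horizon) return(0)
  sum(predict_claim2((last_observed + 1):horizon))
}

# Incurral year 2004 has only its first development year in the data
print(round(remaining_payments(1)))

# Incurral year 1995 is fully developed within the assumed horizon
print(remaining_payments(10))
```

Because the same coefficients are applied to every incurral year, this projection does not reflect differences in claim volume between years; scaling by each year's observed payments would be a natural refinement.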