Jeff Hung
Data scientist of the Institute of Manufacturing Information and Systems of National Cheng Kung University
Description
The data represent industry aggregates for private passenger auto liability/medical coverages from 1995 to 2004, in millions of dollars. They are based on insurance company annual statements. The variable “Claim” represents cumulative net payments, including defense and cost containment expenses.
Load Packages and Data
library(insuranceData)
library(dplyr)
library(ggplot2)
library(plotly)
data(IndustryAuto)
Data Description
str(IndustryAuto)
## 'data.frame': 55 obs. of 3 variables:
## $ Incurral.Year : int 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 ...
## $ Development.Year: int 1 1 1 1 1 1 1 1 1 1 ...
## $ Claim : int 17674 18315 18606 18816 20649 22327 23141 24301 24210 24468 ...
summary(IndustryAuto)
## Incurral.Year Development.Year Claim
## Min. :1995 Min. : 1 Min. :17674
## 1st Qu.:1996 1st Qu.: 2 1st Qu.:35091
## Median :1998 Median : 4 Median :43829
## Mean :1998 Mean : 4 Mean :39948
## 3rd Qu.:2000 3rd Qu.: 6 3rd Qu.:46640
## Max. :2004 Max. :10 Max. :53242
Incurral Year : The year in which a claim has been incurred
Development Year: The number of years from incurral to the time when the payment is made
Claim : Cumulative net payments, including defense and cost containment expenses (millions of dollars)
Data Preprocessing
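# Convert cumulative payments (Claim) into incremental payments per development year (Claim2)
IndustryAuto <- IndustryAuto %>% group_by(Incurral.Year) %>%
  arrange(Incurral.Year, Development.Year) %>%
  mutate(Claim2 = Claim - c(0, Claim[-length(Claim)]))
Data Visualization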
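The figure that belonged to this section is not reproduced here. As a minimal sketch, assuming the incremental payments are computed per incurral year as in the preprocessing step, the decay of payments over development years could be plotted as follows. The name `inc`, the log scale, and the colour grouping are illustrative choices, not necessarily what the original plot used.

```r
library(insuranceData)
library(dplyr)
library(ggplot2)

data(IndustryAuto)

# Incremental payments per development year, computed within each incurral year
inc <- IndustryAuto %>%
  group_by(Incurral.Year) %>%
  arrange(Development.Year, .by_group = TRUE) %>%
  mutate(Claim2 = Claim - dplyr::lag(Claim, default = 0L)) %>%
  ungroup()

# One line per incurral year; a log y-axis makes exponential decay look linear
p <- ggplot(inc, aes(x = Development.Year, y = Claim2,
                     colour = factor(Incurral.Year))) +
  geom_line() +
  geom_point() +
  scale_y_log10() +
  labs(x = "Development year",
       y = "Incremental claim (millions of dollars)",
       colour = "Incurral year")
print(p)
```

On a log scale, a roughly straight line from development year 2 onward would be consistent with the exponential models fitted in the analysis section.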
Data Analysis
- Exponential regression for incurral year 1995, excluding the first development year
##
## Call:
## lm(formula = log(Claim2) ~ Development.Year, data = Data_1995)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.054794 -0.020908 0.001446 0.029135 0.041060
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.913104 0.029699 367.5 2.92e-16 ***
## Development.Year -0.690007 0.004547 -151.8 1.42e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03522 on 7 degrees of freedom
## Multiple R-squared: 0.9997, Adjusted R-squared: 0.9997
## F-statistic: 2.303e+04 on 1 and 7 DF, p-value: 1.423e-13
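The code behind this output is not shown in the report. The sketch below is one way it could be reproduced, under the assumption that `Data_1995` holds the incurral year 1995 rows with the first development year dropped (9 rows, which agrees with the 7 residual degrees of freedom reported above). The intermediate name `inc` is hypothetical.

```r
library(insuranceData)
library(dplyr)

data(IndustryAuto)

# Incremental payments per development year within each incurral year
inc <- IndustryAuto %>%
  group_by(Incurral.Year) %>%
  arrange(Development.Year, .by_group = TRUE) %>%
  mutate(Claim2 = Claim - dplyr::lag(Claim, default = 0L)) %>%
  ungroup()

# Assumed definition of Data_1995: incurral year 1995, development years 2 to 10
Data_1995 <- inc %>%
  filter(Incurral.Year == 1995, Development.Year > 1)

# Log-linear fit: log(Claim2) = a + b * Development.Year
fit_1995 <- lm(log(Claim2) ~ Development.Year, data = Data_1995)
summary(fit_1995)
```

Fitting a straight line to `log(Claim2)` is equivalent to fitting an exponential decay to `Claim2` itself, which is why the report calls this an exponential regression.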
- Exponential regression for incurral year 1996, excluding the first development year
##
## Call:
## lm(formula = log(Claim2) ~ Development.Year, data = Data_1996)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.19576 -0.04779 0.01004 0.06011 0.16268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.8577 0.1079 100.66 6.48e-11 ***
## Development.Year -0.6655 0.0181 -36.76 2.70e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1173 on 6 degrees of freedom
## Multiple R-squared: 0.9956, Adjusted R-squared: 0.9948
## F-statistic: 1352 on 1 and 6 DF, p-value: 2.703e-08
- Exponential regression for all incurral years, excluding the first development year
##
## Call:
## lm(formula = log(Claim2) ~ Development.Year, data = Data_all)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35534 -0.06404 -0.00362 0.07000 0.29763
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.04757 0.03775 292.66 <2e-16 ***
## Development.Year -0.70164 0.00731 -95.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1084 on 43 degrees of freedom
## Multiple R-squared: 0.9954, Adjusted R-squared: 0.9952
## F-statistic: 9213 on 1 and 43 DF, p-value: < 2.2e-16
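Because the model is fitted on the log scale, the coefficients are easier to read after exponentiating. The sketch below back-transforms the printed estimates. The variable names are illustrative, and no correction for retransformation bias is applied.

```r
# Coefficients copied from the summary output above
intercept <- 11.04757
slope     <- -0.70164

beta0 <- exp(intercept)  # implied scale parameter, in millions of dollars
decay <- exp(slope)      # multiplicative change in payments per development year

# Fitted incremental claim for development years 2 to 10
dev_year     <- 2:10
fitted_claim <- beta0 * decay^dev_year
data.frame(dev_year, fitted_claim = round(fitted_claim))
```

Here `decay` is about 0.496, so under this model incremental payments roughly halve with each additional development year.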
- Linear regression for all incurral years, using the first and second development years
##
## Call:
## lm(formula = Claim2 ~ Development.Year, data = Data_all_2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3219.2 -1604.9 -166.1 1465.5 3407.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25833 1600 16.14 2.53e-11 ***
## Development.Year -4939 1012 -4.88 0.000167 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2147 on 16 degrees of freedom
## Multiple R-squared: 0.5981, Adjusted R-squared: 0.573
## F-statistic: 23.81 on 1 and 16 DF, p-value: 0.0001669
- Piecewise Regression
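Model: \(y = \beta_0 e^{\beta_1 x + \beta_2 (x-2)_+}\epsilon\), where \(x\) is the development year, \(y\) is the incremental claim (Claim2), and \((x-2)_+ = \max(x-2, 0)\), so the slope on the log scale changes at development year 2.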
##
## Call:
## lm(formula = log(Claim2) ~ Development.Year + I(pmax(Development.Year -
## 2, 0)), data = IndustryAuto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.35534 -0.06739 -0.00362 0.07321 0.29763
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.26877 0.07589 135.311 < 2e-16 ***
## Development.Year -0.31224 0.04425 -7.056 4.03e-09 ***
## I(pmax(Development.Year - 2, 0)) -0.38940 0.04820 -8.079 9.54e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1126 on 52 degrees of freedom
## Multiple R-squared: 0.9955, Adjusted R-squared: 0.9954
## F-statistic: 5813 on 2 and 52 DF, p-value: < 2.2e-16
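To make the two regimes explicit, the sketch below computes the implied log-scale slopes before and after the knot from the printed estimates. The helper `predict_claim2` is hypothetical and not part of the original analysis.

```r
# Coefficients copied from the piecewise summary output above
b0 <- 10.26877  # intercept on the log scale
b1 <- -0.31224  # slope up to the knot at development year 2
b2 <- -0.38940  # additional slope once Development.Year exceeds 2

# Hypothetical helper: fitted incremental claim for a given development year
predict_claim2 <- function(dev_year) {
  exp(b0 + b1 * dev_year + b2 * pmax(dev_year - 2, 0))
}

slope_before <- b1       # applies between development years 1 and 2
slope_after  <- b1 + b2  # applies from development year 2 onward

c(slope_before = slope_before, slope_after = slope_after)
predict_claim2(1:10)
```

The slope after the knot is -0.31224 + (-0.38940) = -0.70164, the same value as the slope of the regression for all incurral years that excludes the first development year. The piecewise model therefore reproduces that fit from development year 2 onward while also covering the first development year.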
Conclusion
The piecewise regression model fits the data well, with a multiple R-squared of 0.9955 on all 55 observations. With this model in hand, an insurance company can estimate the payments still to come in each following development year and prepare for its auto liability/medical coverages accordingly.