Insurance provides protection against financial loss and is a form of risk management. Buyers of insurance transfer risk to the insurer in order to reduce the loss they would otherwise bear themselves. Insurance companies use risk profiling to calculate the premium charged to each customer.
Auto insurance: drivers are required to carry auto insurance that covers the potential costs of an accident or theft. These costs may include the repair or replacement of vehicles and medical care for injuries sustained in an accident. If law enforcement deems a driver at fault in an accident, that driver's insurer pays the costs. Auto insurance underwriters use several criteria to judge how likely you are to cause an accident, such as credit history, age, address, driving record, marital status, and prior coverage. Insurers weigh these factors, along with others such as occupation, military service, and education level, to determine your premium. Your ethnicity, religion, or income cannot be used against you.
If you are judged a high risk, you may be denied insurance. Most states offer a high-risk alternative source of insurance; your driving mistakes will cost you dearly in these cases because you have demonstrated that you are likely to cause insurer payouts. Health insurance works the same way: smoking is treated as a high-risk behavior because smokers are known to be likelier to need hospitalization, so health insurers may charge smokers more on the statistical likelihood that the policy owner will cost them money (Simmons, 2015).
First Auto Insurance Company (FAIC) is an auto insurance company based in South Africa and one of the largest players in the country's auto insurance market. Every year it insures 90 out of every 150 vehicles in South Africa. It is listed on the Johannesburg Stock Exchange with a market cap of $150 million. FAIC has followed the traditional method of underwriting a policy based on the condition of the vehicle, with one general premium price for all customers. Until the early 1990s, when the market was closed to international investment, it enjoyed great success, but things began to change after 2010. Foreign auto insurance companies started investing in South Africa from 1995, and they did their groundwork well. They began with a market survey to understand the local auto insurance market and its pricing system. The survey showed that the largest auto insurance company had enjoyed a monopoly in the insurance market for several years, and that it gave minimal weight to the customer's risk profile when setting the premium. To break FAIC's monopoly, the foreign players introduced differential pricing, which takes the customer's risk profile into account and charges a higher premium to customers with a high-risk profile.
1. Policy Number: unique value identifying the policy (identifier)
2. Age: age of the policy holder (16, 17, ..., 70; numerical, discrete)
3. Years of Driving Experience: years of driving experience of the policy holder (0, 1, ..., 53; numerical, discrete)
4. Number of Vehicles: number of vehicles insured under the policy (1, 2, 3, 4; numerical, discrete)
5. Gender: gender of the policy holder (Female, Male; categorical, binary)
6. Married: marital status of the policy holder (Married, Single; categorical, binary)
7. Vehicle Age: age of the vehicle insured under the policy (0, 1, ..., 15; numerical, discrete)
8. Fuel Type: fuel type of the vehicle insured (Diesel, Petrol; categorical, binary)
9. Losses: loss amount claimed under the policy (range 13 to 3500; numerical, continuous)
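The three binary variables in the list (Gender, Married, Fuel Type) enter the regression later in the paper as 0/1 dummy columns. The source does not state which level is coded as 1. The sketch below assumes male = 1, single = 1, and petrol = 1, which is what the dummy means in Appendix 1 and the group averages in Appendices 4 to 6 suggest. The data frame and its four rows are made up for illustration.

```r
# Hypothetical mini data set using the level codes shown in the appendices
policies <- data.frame(
  Gender  = c("F", "M", "M", "F"),
  Married = c("Married", "Single", "Married", "Single"),
  Fuel    = c("P", "D", "P", "P"),
  stringsAsFactors = FALSE
)

# Assumed coding (not stated in the source): 1 = male, 1 = single, 1 = petrol
policies$gender_dummy  <- ifelse(policies$Gender == "M", 1, 0)
policies$married_dummy <- ifelse(policies$Married == "Single", 1, 0)
policies$fuel_dummy    <- ifelse(policies$Fuel == "P", 1, 0)

print(policies)
```

With this coding, a positive coefficient on a dummy means a higher expected loss for the level coded as 1.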
Age & Age band
The analysis of the age variable shows that younger policy holders have higher average losses than older ones. When ages are grouped into bands, the youngest band (16-27) has the highest average loss, 516.84, and the average falls with each band to 207.28 for ages 64-75. This explains why a higher premium is charged to the youngest drivers. (Appendix 2)
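The banding described above can be sketched as follows. The band edges (12-year bands starting at age 16) are taken from the labels in Appendix 2, and the midpoints reproduce the minimum, median, and maximum of Average Age in Appendix 1 (21.5, 45.5, 69.5). The ages and losses themselves are made-up values, and the source does not show how its own interval columns were built, so this is only one possible way to do it.

```r
# Hypothetical ages and losses for illustration only
ages   <- c(16, 27, 28, 39, 40, 51, 52, 63, 64, 70)
losses <- c(520, 510, 430, 410, 415, 405, 320, 300, 210, 200)

# Band edges: 16-27, 28-39, 40-51, 52-63, 64-75
breaks <- seq(16, 76, by = 12)
labels <- paste(head(breaks, -1), tail(breaks, -1) - 1, sep = "-")

# right = FALSE makes each band include its lower edge, e.g. [16, 28)
age_band <- cut(ages, breaks = breaks, labels = labels, right = FALSE)

# Midpoint of each band, e.g. (16 + 27) / 2 = 21.5
midpoints <- (head(breaks, -1) + tail(breaks, -1) - 1) / 2
avg_age   <- midpoints[as.integer(age_band)]

# Average loss per band, analogous to the aggregate() call in Appendix 2
band_means <- tapply(losses, age_band, mean)
print(band_means)
```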
Years of Driving experience
The more time a driver has spent behind the wheel, the lower the cost of insurance tends to be. Insurance companies typically charge a rookie driver more than an experienced one. An experienced driver is also less likely to cause accidents, which lowers the loss for the insurance company. (Appendix 3)
Gender
Females are considered safer drivers than males, and companies have historically charged females lower premiums than males. In the data, the average loss is 343.71 for females and 437.25 for males. (Appendix 4)
Marital Status
According to the analysis of the marital status variable, married policy holders tend to be safer drivers and therefore cause lower losses for auto insurers: the average loss is 323.74 for married and 458.40 for single policy holders. (Appendix 5)
Fuel Type
According to the data analysis, diesel vehicles have a much higher average loss than petrol vehicles (720.02 versus 287.44). (Appendix 6)
Number of Vehicles
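According to the data analysis, policies covering multiple vehicles have a slightly lower average loss than single-vehicle policies, although the differences are small (397.34 for one vehicle against 386.99 to 389.90 for two to four). (Appendix 7)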
Vehicle Age
According to the data analysis, the average loss for newer vehicles is higher than for older vehicles: it falls from 527.28 for vehicles aged 0-3 years to 325.53 for vehicles aged 12-15 years. (Appendix 8)
In order to test the hypothesis, the following model was proposed:
Losses = a0 + a1*Average Age + a2*Number of Vehicles + a3*Gender Dummy + a4*Dummy Married + a5*Avg Vehicle Age + a6*Dummy Fuel + a7*Avg Driving Experience + error
#regression analysis
attach(project)
fit<-lm(Losses~`Average Age`+`Avg Driving Experience`+`Number of Vehicles`+`Gender Dummy`+`Dummy Married`+`Avg Vehicle Age`+`Dummy Fuel`)
summary(fit)
##
## Call:
## lm(formula = Losses ~ `Average Age` + `Avg Driving Experience` +
## `Number of Vehicles` + `Gender Dummy` + `Dummy Married` +
## `Avg Vehicle Age` + `Dummy Fuel`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -353.4 -91.7 -3.5 74.6 3199.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 830.8209 7.3562 112.941 < 2e-16 ***
## `Average Age` -2.9390 0.2723 -10.792 < 2e-16 ***
## `Avg Driving Experience` -1.5576 0.2842 -5.481 4.31e-08 ***
## `Number of Vehicles` -2.2293 1.3190 -1.690 0.091 .
## `Gender Dummy` 49.5381 2.5642 19.319 < 2e-16 ***
## `Dummy Married` 78.1069 2.6054 29.979 < 2e-16 ***
## `Avg Vehicle Age` -11.3789 0.3313 -34.344 < 2e-16 ***
## `Dummy Fuel` -310.4897 3.5669 -87.048 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 155.5 on 15282 degrees of freedom
## Multiple R-squared: 0.6245, Adjusted R-squared: 0.6243
## F-statistic: 3630 on 7 and 15282 DF, p-value: < 2.2e-16
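As an illustration of how the estimated coefficients translate into a predicted loss, the sketch below plugs two hypothetical customer profiles into the fitted equation. The coefficients are copied from the output above. The function name, the two profiles, and the dummy coding (1 = male, 1 = single, 1 = petrol) are assumptions made for this example and are not stated in the source.

```r
# Coefficients copied from the summary(fit) output
coefs <- c(
  intercept       = 830.8209,
  avg_age         = -2.9390,
  avg_experience  = -1.5576,
  n_vehicles      = -2.2293,
  gender_dummy    = 49.5381,
  married_dummy   = 78.1069,
  avg_vehicle_age = -11.3789,
  fuel_dummy      = -310.4897
)

# Hypothetical helper: linear prediction for one customer profile
predict_loss <- function(avg_age, avg_experience, n_vehicles,
                         gender_dummy, married_dummy,
                         avg_vehicle_age, fuel_dummy) {
  x <- c(1, avg_age, avg_experience, n_vehicles,
         gender_dummy, married_dummy, avg_vehicle_age, fuel_dummy)
  sum(coefs * x)
}

# Assumed profile 1: youngest band, least experience, newest vehicle band
young_driver <- predict_loss(21.5, 5.5, 1, 1, 1, 1.5, 1)
# Assumed profile 2: oldest band, most experience, oldest vehicle band
older_driver <- predict_loss(69.5, 53.5, 1, 0, 0, 13.5, 1)

print(c(young = young_driver, older = older_driver))
```

Under these assumptions the first profile has a much higher predicted loss than the second, which is the pattern differential pricing relies on.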
We established a model to find the effect of various factors on the loss.
The variable Number of Vehicles has no statistically significant effect on the loss at the 5% level (p = 0.091 > 0.05). The other variables (Average Age, Avg Driving Experience, Gender Dummy, Dummy Married, Avg Vehicle Age, and Dummy Fuel) each have a statistically significant effect on the loss (p < 0.05). The multiple R-squared is 0.6245, so about 62.45% of the variation in the dependent variable is explained by the independent variables. Multiple R-squared never decreases when another independent variable is added, so the adjusted R-squared, which penalizes additional variables, is the more reliable measure. The adjusted R-squared is 0.6243, meaning about 62.43% of the variation in the dependent variable is explained by the independent variables.
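The figures behind this interpretation can be checked by hand from the regression output. The sketch below recomputes the t value and two-sided p-value for Number of Vehicles, and the adjusted R-squared from the multiple R-squared, using only numbers printed by summary(fit). The variable names are chosen for this example.

```r
# Values copied from the summary(fit) output
estimate  <- -2.2293    # coefficient of `Number of Vehicles`
std_error <- 1.3190     # its standard error
df_resid  <- 15282      # residual degrees of freedom

# t value = estimate / standard error; two-sided p-value from the t distribution
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r_squared <- 0.6245
n <- 15290              # number of policies
k <- 7                  # number of independent variables
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(c(t = t_value, p = p_value, adj_r2 = adj_r_squared), 4))
```

Because the sample is large relative to the number of variables, the adjustment barely changes R-squared.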
This paper was motivated by the need for research that improves our understanding of how insurance companies price premiums based on customer risk profiling. Its contribution is an investigation of the losses claimed by customers in relation to their risk profile. We found that the higher the age, the lower the loss; the average loss is higher for males than for females; the average loss is higher for single than for married policy holders; the older the vehicle, the lower the loss; and losses are higher for diesel vehicles. The number of vehicles a customer has is not statistically significant in the model.
Simmons, B. (2015, January 9). How insurance companies measure risk. Retrieved from http://www.insurancecompanies.com/insider-information-how-insurance-companies-measure-risk/
Simons Lintel (2016, September 19). The Determinants of Auto Insurance Premiums. Retrieved from http://thismatter.com/money/insurance/types/auto-insurance-cost-determinants.htm
Jessica Bosrai (2013, January 8). What Really Goes Into Determining Your Insurance Rates? Retrieved from http://www.forbes.com/sites/moneywisewomen/2013/01/08/what-really-goes-into-determining-your-insurance-rates/#6c16ba743ffa
Anne Freedman (2013, August 1). What Factors Should Underwriters Consider? Retrieved from http://riskandinsurance.com/what-factors-should-underwriters-consider/
Investopedia Staff (2014, September 16). How An Insurance Company Determines Your Premiums. Retrieved from http://www.investopedia.com/articles/pf/05/insurescore.asp
Appendix 1
#Summarizing the data
library(psych)
describe(project)
## vars n mean sd median
## Policy Number 1 15290 149910.28 28948.81 149872.0
## Age 2 15290 42.33 18.28 42.0
## Age Interval* 3 15290 NaN NA NA
## Average Age 4 15290 42.71 18.25 45.5
## Years of Driving Experience 5 15290 23.73 17.85 23.0
## Driving Experience Interval* 6 15290 NaN NA NA
## Avg Driving Experience 7 15290 24.46 17.30 17.5
## Number of Vehicles 8 15290 2.50 0.95 2.0
## Gender* 9 15290 NaN NA NA
## Gender Dummy 10 15290 0.49 0.50 0.0
## Married* 11 15290 NaN NA NA
## Dummy Married 12 15290 0.49 0.50 0.0
## Vehicle Age 13 15290 8.66 4.34 9.0
## Vehicle Age Interval* 14 15290 NaN NA NA
## Avg Vehicle Age 15 15290 8.58 4.26 9.5
## Fuel* 16 15290 NaN NA NA
## Dummy Fuel 17 15290 0.76 0.43 1.0
## Losses 18 15290 389.86 253.73 355.0
## Capped Losses 19 15290 389.86 253.73 355.0
## trimmed mad min max range
## Policy Number 149903.58 37216.97 100002.0 200000.0 99998
## Age 42.18 26.69 16.0 70.0 54
## Age Interval* NaN NA Inf -Inf -Inf
## Average Age 42.01 35.58 21.5 69.5 48
## Years of Driving Experience 23.41 26.69 0.0 53.0 53
## Driving Experience Interval* NaN NA Inf -Inf -Inf
## Avg Driving Experience 23.20 17.79 5.5 53.5 48
## Number of Vehicles 2.49 1.48 1.0 4.0 3
## Gender* NaN NA Inf -Inf -Inf
## Gender Dummy 0.49 0.00 0.0 1.0 1
## Married* NaN NA Inf -Inf -Inf
## Dummy Married 0.49 0.00 0.0 1.0 1
## Vehicle Age 8.87 4.45 0.0 15.0 15
## Vehicle Age Interval* NaN NA Inf -Inf -Inf
## Avg Vehicle Age 8.85 5.93 1.5 13.5 12
## Fuel* NaN NA Inf -Inf -Inf
## Dummy Fuel 0.83 0.00 0.0 1.0 1
## Losses 363.77 194.22 13.0 3500.0 3487
## Capped Losses 363.77 194.22 13.0 3500.0 3487
## skew kurtosis se
## Policy Number 0.00 -1.21 234.11
## Age 0.05 -1.51 0.15
## Age Interval* NA NA NA
## Average Age 0.15 -1.47 0.15
## Years of Driving Experience 0.10 -1.51 0.14
## Driving Experience Interval* NA NA NA
## Avg Driving Experience 0.25 -1.41 0.14
## Number of Vehicles 0.01 -0.93 0.01
## Gender* NA NA NA
## Gender Dummy 0.03 -2.00 0.00
## Married* NA NA NA
## Dummy Married 0.04 -2.00 0.00
## Vehicle Age -0.34 -0.93 0.04
## Vehicle Age Interval* NA NA NA
## Avg Vehicle Age -0.34 -1.14 0.03
## Fuel* NA NA NA
## Dummy Fuel -1.24 -0.47 0.00
## Losses 2.56 18.07 2.05
## Capped Losses 2.56 18.07 2.05
Appendix 2
#Age & Age band
aggregate(project$Losses,by=list(AGE=project$`Age Interval`),mean)
## AGE x
## 1 16-27 516.8375
## 2 28-39 419.5922
## 3 40-51 412.1081
## 4 52-63 311.3217
## 5 64-75 207.2843
boxplot(Losses~`Age Interval`,horizontal=TRUE,col=c("green","yellow","red","pink","orange"))
Appendix 3
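#Years of Driving experience (the band printed as "23-Dec" in the output is 12-23; the label appears to have been converted to a date by spreadsheet software)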
aggregate(project$Losses,by=list(YEARS=project$`Driving Experience Interval`),mean)
## YEARS x
## 1 0-11 505.6585
## 2 23-Dec 418.7986
## 3 24-35 417.1441
## 4 36-47 261.2160
## 5 48-59 205.8557
boxplot(Losses~`Driving Experience Interval`,horizontal=TRUE,col=c("green","yellow","blue","grey","red"))
Appendix 4
#Gender
aggregate(project$Losses,by=list(Gender=project$Gender),mean)
## Gender x
## 1 F 343.7114
## 2 M 437.2527
boxplot(Losses~Gender,horizontal=TRUE,col=c("green","yellow"))
Appendix 5
#Marital status
aggregate(project$Losses,by=list(Married=project$Married),mean)
## Married x
## 1 Married 323.7421
## 2 Single 458.4047
boxplot(Losses~Married,horizontal=TRUE,col=c("green","yellow"))
Appendix 6
#Fuel type
aggregate(project$Losses,by=list(Fuel=project$Fuel),mean)
## Fuel x
## 1 D 720.0174
## 2 P 287.4435
boxplot(Losses~Fuel,horizontal=TRUE,col=c("green","yellow"))
Appendix 7
#Number of vehicles
aggregate(project$Losses,by=list(vehicles=project$`Number of Vehicles`),mean)
## vehicles x
## 1 1 397.3399
## 2 2 389.9020
## 3 3 386.9963
## 4 4 388.0263
boxplot(Losses~`Number of Vehicles`,horizontal=TRUE,col=c("green","yellow","purple","pink"))
Appendix 8
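#Vehicle age (the bands printed as "7-Apr", "11-Aug", and "15-Dec" in the output are 4-7, 8-11, and 12-15; the labels appear to have been converted to dates by spreadsheet software)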
aggregate(project$Losses,by=list(VehicleAge=project$`Vehicle Age Interval`),mean)
## VehicleAge x
## 1 0-3 527.2806
## 2 11-Aug 362.4155
## 3 15-Dec 325.5348
## 4 7-Apr 417.8239
boxplot(Losses~`Vehicle Age Interval`,horizontal=TRUE,col=c("green","yellow","red","orange"))
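The interval labels "11-Aug", "15-Dec", and "7-Apr" above (and "23-Dec" in Appendix 3) look like the bands 8-11, 12-15, 4-7, and 12-23 after a spreadsheet turned them into dates. Assuming that is what happened, a minimal sketch for repairing such labels and restoring the band order before aggregating or plotting could look like this. The function name repair_band is hypothetical and not part of the original analysis.

```r
# Turn date-like labels such as "11-Aug" back into "8-11".
# Assumes the pattern is <upper bound>-<month abbreviation of lower bound>.
repair_band <- function(x) {
  month_number <- setNames(1:12, month.abb)   # Jan = 1, ..., Dec = 12
  mangled <- grepl("^[0-9]+-[A-Za-z]{3}$", x)
  parts <- strsplit(x[mangled], "-", fixed = TRUE)
  x[mangled] <- vapply(
    parts,
    function(p) paste(month_number[[p[2]]], p[1], sep = "-"),
    character(1)
  )
  x
}

labels_raw   <- c("0-3", "11-Aug", "15-Dec", "7-Apr")
labels_fixed <- repair_band(labels_raw)

# Order the bands by their lower bound instead of alphabetically
lower_bound <- as.numeric(sub("-.*", "", labels_fixed))
bands <- factor(labels_fixed, levels = labels_fixed[order(lower_bound)])
print(levels(bands))
```

Applied to the Vehicle Age Interval column before the aggregate() call, this would list the bands from newest to oldest vehicles.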
Appendix 9
#Distribution of the dependent variable
library(lattice)
histogram(project$Losses,col="green",main="Distribution of dependent variable-capped loss",xlab="LOSS")
Appendix 10
#Corrgram
library(corrgram)
corrgram(project, order=TRUE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram ")
9.1.
setwd("~/")
setwd("C:/Users/SUBARNA/Desktop/DATA INTERN ACTIVITY")
library(readr)
project <- read_csv("project.csv")
9.2.
attach(project)
fit<-lm(Losses~`Average Age`+`Avg Driving Experience`+`Number of Vehicles`+`Gender Dummy`+`Dummy Married`+`Avg Vehicle Age`+`Dummy Fuel`)
summary(fit)
9.3.
library(psych)
describe(project)
9.4.
aggregate(project$Losses,by=list(AGE=project$`Age Interval`),mean)
boxplot(Losses~`Age Interval`,horizontal=TRUE,col=c("green","yellow","red","pink","orange"))
9.5.
aggregate(project$Losses,by=list(YEARS=project$`Driving Experience Interval`),mean)
boxplot(Losses~`Driving Experience Interval`,horizontal=TRUE,col=c("green","yellow","blue","grey","red"))
9.6.
aggregate(project$Losses,by=list(Gender=project$Gender),mean)
boxplot(Losses~Gender,horizontal=TRUE,col=c("green","yellow"))
9.7.
aggregate(project$Losses,by=list(Married=project$Married),mean)
boxplot(Losses~Married,horizontal=TRUE,col=c("green","yellow"))
9.8.
aggregate(project$Losses,by=list(Fuel=project$Fuel),mean)
boxplot(Losses~Fuel,horizontal=TRUE,col=c("green","yellow"))
9.9.
aggregate(project$Losses,by=list(vehicles=project$`Number of Vehicles`),mean)
boxplot(Losses~`Number of Vehicles`,horizontal=TRUE,col=c("green","yellow","purple","pink"))
9.10.
aggregate(project$Losses,by=list(VehicleAge=project$`Vehicle Age Interval`),mean)
boxplot(Losses~`Vehicle Age Interval`,horizontal=TRUE,col=c("green","yellow","red","orange"))
9.11.
library(lattice)
histogram(project$Losses,col="green",main="Distribution of dependent variable-capped loss",xlab="LOSS")
9.12.
library(corrgram)
corrgram(project, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram")