1.INTRODUCTION:

Insurance provides protection against financial loss and is a form of risk management. Insurance buyers are risk avoiders: they transfer risk to the insurer in order to reduce the amount they stand to lose. Insurance companies use a risk profiling methodology to calculate the premium for each customer.

In auto insurance, all drivers are required to carry cover for the potential costs of an accident or theft. These costs may include the repair or replacement of vehicles, or medical care for injuries sustained in an accident. If law enforcement deems a driver at fault in an accident, that driver's insurer picks up the tab. Auto insurance underwriters use several criteria to determine whether you are likely to cause an accident, such as credit history, age, address, driving record, marital status and prior coverage. Auto insurance companies weigh these factors and others, including your occupation, military service and education level, to determine your premium. Your ethnicity, religion or income cannot be used against you. If you are judged a high risk, you may be denied insurance. Most states offer a high-risk alternative source of insurance; your driving mistakes will cost you dearly in these cases because you have demonstrated that you are likely to cause insurer payouts. The same logic applies in other lines of insurance. For example, smoking is a high-risk behavior because smokers are known to be likelier to need hospitalization, so health insurance companies may charge smokers more: there is a statistical likelihood that the policy owner will cost them money (Simmons, 2015).

2.BACKGROUND OF THE STUDY:

First Auto Insurance Company (FAIC) is an auto insurance company based in South Africa and one of the largest players in the country's auto insurance market. Every year it insures 90 out of every 150 vehicles in South Africa. It is listed on the Johannesburg Stock Exchange Limited with a market cap of $150 million. FAIC has followed the traditional method of underwriting policies based on the condition of the vehicle, with a general premium price for all customers. Until the early 1990s, when the market was closed to international investment, it enjoyed stupendous success, but things began to change after 2010. Foreign auto insurance companies started investing in South Africa from 1995, and they did their groundwork well. They began with a market survey to understand the local auto insurance market and its pricing system. Through this study they found that the largest auto insurance company had enjoyed a monopoly in the insurance market for several years, and that it gave minimal importance to the customer's risk profile when determining the premium. To break FAIC's monopoly, the foreign players brought in the concept of differential pricing, which takes the customer's risk profile into consideration and charges a higher premium to high-risk customers.

3.DATA DESCRIPTION:

1. Policy Number - Unique policy number (unique value identifying the policy; Identifier)
2. Age - Age of the policy holder (16, 17, ..., 70; Numerical (Discrete))
3. Years of Driving Experience - Years of driving experience of the policy holder (0, 1, ..., 53; Numerical (Discrete))
4. Number of Vehicles - Number of vehicles insured under the policy (1, 2, 3, 4; Numerical (Discrete))
5. Gender - Gender of the policy holder (Female, Male; Categorical (Binary))
6. Married - Marital status of the policy holder (Married, Single; Categorical (Binary))
7. Vehicle Age - Age of the vehicle insured under the policy (0, 1, ..., 15; Numerical (Discrete))
8. Fuel Type - Fuel type of the vehicle insured (Diesel, Petrol; Categorical (Binary))
9. Losses - Loss amount claimed under the policy (Range: 13-3500; Numerical (Continuous))
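The regression in Section 4 uses 0/1 dummy columns (`Gender Dummy`, `Dummy Married`, `Dummy Fuel`) derived from the categorical fields listed above. The source does not state which level is coded as 1. The sketch below assumes Male = 1, Single = 1 and Petrol = 1, which is consistent with the signs of the fitted coefficients and the group means in the appendices but remains an inference. The toy data frame and the column names `GenderDummy`, `MarriedDummy` and `FuelDummy` are hypothetical.

```r
# Toy policies using the level labels that appear in the appendix output
# (F/M, Married/Single, D/P). The rows are made up for illustration.
policies <- data.frame(
  Gender  = c("F", "M", "M", "F"),
  Married = c("Married", "Single", "Married", "Single"),
  Fuel    = c("D", "P", "P", "D"),
  stringsAsFactors = FALSE
)

# Assumed coding (not stated in the source): Male = 1, Single = 1, Petrol = 1.
policies$GenderDummy  <- as.integer(policies$Gender == "M")
policies$MarriedDummy <- as.integer(policies$Married == "Single")
policies$FuelDummy    <- as.integer(policies$Fuel == "P")

print(policies)
```

If the opposite coding were used, the corresponding coefficient would simply change sign and the intercept would shift; the fit itself would be unchanged.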

3.1 INDEPENDENT VARIABLES ANALYSIS

Age & Age band

Analysis of the age variable shows that younger policy holders incur higher average losses than older ones. When age is grouped into bands, the youngest band (16-27, as tabulated in Appendix 2) has the highest average loss of any band, which explains why drivers in this age group are charged a higher premium. (Appendix 2)
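The banding step can be sketched with base R's `cut()`. The band edges follow the intervals printed in Appendix 2; the `toy` data frame and its values are hypothetical and only illustrate the mechanics, not the study's results.

```r
# Hypothetical ages and losses, for illustration only.
toy <- data.frame(
  Age    = c(16, 22, 27, 28, 39, 45, 51, 60, 63, 70),
  Losses = c(600, 500, 450, 420, 400, 410, 390, 300, 310, 200)
)

# Band edges follow the intervals reported in Appendix 2 (16-27, 28-39, ...).
breaks <- c(16, 28, 40, 52, 64, 76)
labels <- c("16-27", "28-39", "40-51", "52-63", "64-75")

# right = FALSE makes each band closed on the left: [16, 28), [28, 40), ...
toy$AgeBand <- cut(toy$Age, breaks = breaks, labels = labels, right = FALSE)

# Mean loss per band, analogous to the aggregate() call in Appendix 2.
band_means <- aggregate(Losses ~ AgeBand, data = toy, FUN = mean)
print(band_means)
```

Because `cut()` returns a factor whose levels follow the order of `labels`, the bands print and plot in age order rather than alphabetically.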

Years of Driving experience

The more time a driver has spent behind the wheel, the lower the cost of insurance tends to be. Insurance companies typically charge rookie drivers more than experienced drivers, because an experienced driver is less likely to cause accidents and therefore generates lower losses for the insurance company. (Appendix 3)

Gender

Females are considered safer drivers than males. Historically, companies have charged females lower premiums than males. (Appendix 4)

Marital Status

According to the analysis of the marital status variable, married people tend to be safer drivers and so generate lower losses for auto insurers. (Appendix 5)

Fuel Type

According to the data analysis, diesel vehicles are more loss-prone than petrol vehicles: their average loss is considerably higher. (Appendix 6)

Number of Vehicles

According to the data analysis, policies covering multiple vehicles have a slightly lower average loss than single-vehicle policies, although the difference is small. (Appendix 7)

Vehicle Age

According to the data analysis, the average loss for newer vehicles is higher than for older vehicles. (Appendix 8)

4.MODEL ANALYSIS

In order to test the hypothesis, the following model was proposed:

LOSS = a0 + a1*Avg Age + a2*Number of Vehicles + a3*Gender Dummy + a4*Married Dummy + a5*Avg Vehicle Age + a6*Fuel Type Dummy + a7*Avg Driving Experience + error
#regression analysis
attach(project)
fit<-lm(Losses~`Average Age`+`Avg Driving Experience`+`Number of Vehicles`+`Gender Dummy`+`Dummy Married`+`Avg Vehicle Age`+`Dummy Fuel`)
summary(fit)
## 
## Call:
## lm(formula = Losses ~ `Average Age` + `Avg Driving Experience` + 
##     `Number of Vehicles` + `Gender Dummy` + `Dummy Married` + 
##     `Avg Vehicle Age` + `Dummy Fuel`)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -353.4  -91.7   -3.5   74.6 3199.1 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               830.8209     7.3562 112.941  < 2e-16 ***
## `Average Age`              -2.9390     0.2723 -10.792  < 2e-16 ***
## `Avg Driving Experience`   -1.5576     0.2842  -5.481 4.31e-08 ***
## `Number of Vehicles`       -2.2293     1.3190  -1.690    0.091 .  
## `Gender Dummy`             49.5381     2.5642  19.319  < 2e-16 ***
## `Dummy Married`            78.1069     2.6054  29.979  < 2e-16 ***
## `Avg Vehicle Age`         -11.3789     0.3313 -34.344  < 2e-16 ***
## `Dummy Fuel`             -310.4897     3.5669 -87.048  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 155.5 on 15282 degrees of freedom
## Multiple R-squared:  0.6245, Adjusted R-squared:  0.6243 
## F-statistic:  3630 on 7 and 15282 DF,  p-value: < 2.2e-16

We established a model to find the effect of various factors on the loss.
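To make the fitted equation concrete, the sketch below plugs the coefficient estimates from the summary output into the linear model by hand for one hypothetical policy. The policy values are assumptions chosen to lie within the ranges reported in Appendix 1, and the dummy values are used as plain 0/1 inputs without asserting which category each represents.

```r
# Coefficient estimates copied from the summary(fit) output.
coefs <- c(
  "(Intercept)"            = 830.8209,
  "Average Age"            = -2.9390,
  "Avg Driving Experience" = -1.5576,
  "Number of Vehicles"     = -2.2293,
  "Gender Dummy"           = 49.5381,
  "Dummy Married"          = 78.1069,
  "Avg Vehicle Age"        = -11.3789,
  "Dummy Fuel"             = -310.4897
)

# Hypothetical policy (illustrative values only).
policy <- c(
  "(Intercept)"            = 1,
  "Average Age"            = 21.5,
  "Avg Driving Experience" = 5.5,
  "Number of Vehicles"     = 1,
  "Gender Dummy"           = 1,
  "Dummy Married"          = 1,
  "Avg Vehicle Age"        = 1.5,
  "Dummy Fuel"             = 0
)

# Predicted loss = sum of coefficient * value, matched by name.
predicted_loss <- sum(coefs * policy[names(coefs)])
print(round(predicted_loss, 2))
```

With a fitted model object available, `predict(fit, newdata = ...)` would give the same number without copying coefficients by hand.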

5.DISCUSSION:

The variable Number of Vehicles has no statistically significant effect on the loss at the 5% level (p = 0.091 > 0.05). The other variables (Average Age, Avg Driving Experience, Gender Dummy, Dummy Married, Avg Vehicle Age and Dummy Fuel) all have a statistically significant effect on the loss (p < 0.05). The Multiple R-squared is 0.6245, so about 62% of the variation in the dependent variable is explained by the independent variables. Because Multiple R-squared never decreases when an independent variable is added, the Adjusted R-squared, which penalizes additional predictors, is the more precise measure. Here the Adjusted R-squared is 0.6243, meaning 62.43% of the variation in the loss is explained after this adjustment.
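The relationship between the two R-squared values can be checked by hand with the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below uses the figures from the regression output: n = 15290 observations and p = 7 predictors, which gives the 15282 residual degrees of freedom reported there.

```r
r2 <- 0.6245   # Multiple R-squared from summary(fit)
n  <- 15290    # number of policies
p  <- 7        # number of predictors (excluding the intercept)

# Adjusted R-squared penalizes each extra predictor via the degrees of freedom.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

With this many observations and so few predictors the penalty is tiny, which is why the two values differ only in the fourth decimal place.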

6.CONCLUSION:

This paper was motivated by the need for research that could improve our understanding of how premium pricing is set based on customer risk profiling in insurance companies. The unique contribution of this paper is that we investigated the losses generated by customers in relation to their risk profile. We found that the higher the age, the lower the loss; the average loss for males is higher than for females; the average loss for single policy holders is higher than for married ones; the older the vehicle, the lower the loss; and losses are higher for diesel vehicles. The number of vehicles the customer has is not statistically significant in the model.

7.REFERENCES:

  1. Simmons, B. (2015, January 9). How insurance companies measure risk. Retrieved from http://www.insurancecompanies.com/insider-information-how-insurance-companies-measure-risk/

  2. Simons Lintel. (2016, September 19). The Determinants of Auto Insurance Premiums. Retrieved from http://thismatter.com/money/insurance/types/auto-insurance-cost-determinants.htm

  3. Jessica Bosrai. (2013, January 8). What Really Goes Into Determining Your Insurance Rates? Retrieved from http://www.forbes.com/sites/moneywisewomen/2013/01/08/what-really-goes-into-determining-your-insurance-rates/#6c16ba743ffa

  4. Anne Freedman. (2013, August 1). What Factors Should Underwriters Consider? Retrieved from http://riskandinsurance.com/what-factors-should-underwriters-consider/

  5. Investopedia Staff. (2014, September 16). How An Insurance Company Determines Your Premiums. Retrieved from http://www.investopedia.com/articles/pf/05/insurescore.asp

8.APPENDICES:

Appendix 1

#Summarizing the data
library(psych)
describe(project)
##                              vars     n      mean       sd   median
## Policy Number                   1 15290 149910.28 28948.81 149872.0
## Age                             2 15290     42.33    18.28     42.0
## Age Interval*                   3 15290       NaN       NA       NA
## Average Age                     4 15290     42.71    18.25     45.5
## Years of Driving Experience     5 15290     23.73    17.85     23.0
## Driving Experience Interval*    6 15290       NaN       NA       NA
## Avg Driving Experience          7 15290     24.46    17.30     17.5
## Number of Vehicles              8 15290      2.50     0.95      2.0
## Gender*                         9 15290       NaN       NA       NA
## Gender Dummy                   10 15290      0.49     0.50      0.0
## Married*                       11 15290       NaN       NA       NA
## Dummy Married                  12 15290      0.49     0.50      0.0
## Vehicle Age                    13 15290      8.66     4.34      9.0
## Vehicle Age Interval*          14 15290       NaN       NA       NA
## Avg Vehicle Age                15 15290      8.58     4.26      9.5
## Fuel*                          16 15290       NaN       NA       NA
## Dummy Fuel                     17 15290      0.76     0.43      1.0
## Losses                         18 15290    389.86   253.73    355.0
## Capped Losses                  19 15290    389.86   253.73    355.0
##                                trimmed      mad      min      max range
## Policy Number                149903.58 37216.97 100002.0 200000.0 99998
## Age                              42.18    26.69     16.0     70.0    54
## Age Interval*                      NaN       NA      Inf     -Inf  -Inf
## Average Age                      42.01    35.58     21.5     69.5    48
## Years of Driving Experience      23.41    26.69      0.0     53.0    53
## Driving Experience Interval*       NaN       NA      Inf     -Inf  -Inf
## Avg Driving Experience           23.20    17.79      5.5     53.5    48
## Number of Vehicles                2.49     1.48      1.0      4.0     3
## Gender*                            NaN       NA      Inf     -Inf  -Inf
## Gender Dummy                      0.49     0.00      0.0      1.0     1
## Married*                           NaN       NA      Inf     -Inf  -Inf
## Dummy Married                     0.49     0.00      0.0      1.0     1
## Vehicle Age                       8.87     4.45      0.0     15.0    15
## Vehicle Age Interval*              NaN       NA      Inf     -Inf  -Inf
## Avg Vehicle Age                   8.85     5.93      1.5     13.5    12
## Fuel*                              NaN       NA      Inf     -Inf  -Inf
## Dummy Fuel                        0.83     0.00      0.0      1.0     1
## Losses                          363.77   194.22     13.0   3500.0  3487
## Capped Losses                   363.77   194.22     13.0   3500.0  3487
##                               skew kurtosis     se
## Policy Number                 0.00    -1.21 234.11
## Age                           0.05    -1.51   0.15
## Age Interval*                   NA       NA     NA
## Average Age                   0.15    -1.47   0.15
## Years of Driving Experience   0.10    -1.51   0.14
## Driving Experience Interval*    NA       NA     NA
## Avg Driving Experience        0.25    -1.41   0.14
## Number of Vehicles            0.01    -0.93   0.01
## Gender*                         NA       NA     NA
## Gender Dummy                  0.03    -2.00   0.00
## Married*                        NA       NA     NA
## Dummy Married                 0.04    -2.00   0.00
## Vehicle Age                  -0.34    -0.93   0.04
## Vehicle Age Interval*           NA       NA     NA
## Avg Vehicle Age              -0.34    -1.14   0.03
## Fuel*                           NA       NA     NA
## Dummy Fuel                   -1.24    -0.47   0.00
## Losses                        2.56    18.07   2.05
## Capped Losses                 2.56    18.07   2.05

Appendix 2

#Age & Age band
aggregate(project$Losses,by=list(AGE=project$`Age Interval`),mean)
##     AGE        x
## 1 16-27 516.8375
## 2 28-39 419.5922
## 3 40-51 412.1081
## 4 52-63 311.3217
## 5 64-75 207.2843
boxplot(Losses~`Age Interval`,horizontal=TRUE,col=c("green","yellow","red","pink","orange"))

Appendix 3

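#Years of driving experience
#Note: the label "23-Dec" in the output below appears to be the 12-23 band, converted to a date by spreadsheet software before import.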
aggregate(project$Losses,by=list(YEARS=project$`Driving Experience Interval`),mean)
##    YEARS        x
## 1   0-11 505.6585
## 2 23-Dec 418.7986
## 3  24-35 417.1441
## 4  36-47 261.2160
## 5  48-59 205.8557
boxplot(Losses~`Driving Experience Interval`,horizontal=TRUE,col=c("green","yellow","blue","grey","red"))

Appendix 4

#Gender
aggregate(project$Losses,by=list(Gender=project$Gender),mean)
##   Gender        x
## 1      F 343.7114
## 2      M 437.2527
boxplot(Losses~Gender,horizontal=TRUE,col=c("green","yellow"))

Appendix 5

#Marital status
aggregate(project$Losses,by=list(Married=project$Married),mean)
##   Married        x
## 1 Married 323.7421
## 2  Single 458.4047
boxplot(Losses~Married,horizontal=TRUE,col=c("green","yellow"))

Appendix 6

#Fuel type
aggregate(project$Losses,by=list(Fuel=project$Fuel),mean)
##   Fuel        x
## 1    D 720.0174
## 2    P 287.4435
boxplot(Losses~Fuel,horizontal=TRUE,col=c("green","yellow"))

Appendix 7

#Number of vehicles
aggregate(project$Losses,by=list(vehicles=project$`Number of Vehicles`),mean)
##   vehicles        x
## 1        1 397.3399
## 2        2 389.9020
## 3        3 386.9963
## 4        4 388.0263
boxplot(Losses~`Number of Vehicles`,horizontal=TRUE,col=c("green","yellow","purple","pink"))

Appendix 8

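#Vehicle age
#Note: in the output below, "7-Apr", "11-Aug" and "15-Dec" appear to be the 4-7, 8-11 and 12-15 bands, converted to dates by spreadsheet software before import.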
aggregate(project$Losses,by=list(VehicleAge=project$`Vehicle Age Interval`),mean)
##   VehicleAge        x
## 1        0-3 527.2806
## 2     11-Aug 362.4155
## 3     15-Dec 325.5348
## 4      7-Apr 417.8239
boxplot(Losses~`Vehicle Age Interval`,horizontal=TRUE,col=c("green","yellow","red","orange"))
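The interval labels printed in this appendix ("7-Apr", "11-Aug", "15-Dec") look like numeric bands that were auto-converted to dates before the CSV was read, which also makes the groups sort alphabetically rather than by vehicle age. A minimal sketch of one way to repair this is shown below. The mapping back to 4-7, 8-11 and 12-15 is an assumption inferred from the 0-15 range of Vehicle Age, and `raw` is hypothetical toy input.

```r
# Mapping from the mangled labels (as printed in Appendix 8) to the assumed bands.
fix_labels <- c(
  "0-3"    = "0-3",
  "7-Apr"  = "4-7",
  "11-Aug" = "8-11",
  "15-Dec" = "12-15"
)

# Toy input standing in for the `Vehicle Age Interval` column.
raw <- c("11-Aug", "0-3", "15-Dec", "7-Apr", "0-3")

# Recode by name lookup, then fix the level order so groups sort by age.
band <- factor(
  unname(fix_labels[raw]),
  levels  = c("0-3", "4-7", "8-11", "12-15"),
  ordered = TRUE
)

print(band)
```

The same approach would apply to the "23-Dec" label in the driving experience intervals of Appendix 3, which appears to stand for 12-23.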

Appendix 9

#Distribution of the dependent variable

library(lattice)
histogram(project$Losses,col="green",main="Distribution of dependent variable-capped loss",xlab="LOSS")
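The plot title refers to a capped loss, while the summary in Appendix 1 shows `Capped Losses` with the same statistics as `Losses`, so no cap appears to have taken effect in this data. For reference, capping a right-skewed variable can be sketched with base R's `pmin()`. The threshold of 1000 and the toy values are hypothetical and are not taken from the source.

```r
# Toy loss amounts within the reported 13-3500 range (illustrative only).
losses <- c(13, 120, 355, 410, 980, 3500)

# Hypothetical cap; the source does not state which threshold, if any, was used.
cap <- 1000

# pmin() replaces every value above the cap with the cap itself.
capped <- pmin(losses, cap)
print(capped)
```

An alternative under the same idea would be to set the cap from the data, for example with `quantile(losses, 0.99)`.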

Appendix 10

#Corrgram

library(corrgram)
corrgram(project, order=TRUE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram ")
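A corrgram is a visual summary of pairwise correlations. The numbers behind such a plot can be inspected directly with base R's `cor()`, as in the minimal sketch below. The `toy` data frame is hypothetical; with the real data the same two steps would be applied to `project`.

```r
# Hypothetical data with one non-numeric column, for illustration only.
toy <- data.frame(
  Age        = c(18, 25, 33, 41, 56, 68),
  Experience = c(1, 6, 14, 20, 35, 49),
  Losses     = c(610, 520, 430, 400, 300, 210),
  Gender     = c("M", "F", "M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# cor() needs numeric input, so keep only the numeric columns.
numeric_cols <- toy[sapply(toy, is.numeric)]

# Pearson correlation matrix of the remaining columns.
cor_matrix <- cor(numeric_cols)
print(round(cor_matrix, 2))
```

Reading the matrix alongside the plot makes it easier to spot pairs of predictors that move closely together.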

9.R code

9.1.
setwd("~/")
setwd("C:/Users/SUBARNA/Desktop/DATA INTERN ACTIVITY")
library(readr)
project <- read_csv("project.csv")

9.2.
attach(project)
fit<-lm(Losses~`Average Age`+`Avg Driving Experience`+`Number of Vehicles`+`Gender Dummy`+`Dummy Married`+`Avg Vehicle Age`+`Dummy Fuel`)
summary(fit)

9.3.
library(psych)
describe(project)

9.4.
aggregate(project$Losses,by=list(AGE=project$`Age Interval`),mean)
boxplot(Losses~`Age Interval`,horizontal=TRUE,col=c("green","yellow","red","pink","orange"))

9.5.
aggregate(project$Losses,by=list(YEARS=project$`Driving Experience Interval`),mean)
boxplot(Losses~`Driving Experience Interval`,horizontal=TRUE,col=c("green","yellow","blue","grey","red"))

9.6.
aggregate(project$Losses,by=list(Gender=project$Gender),mean)
boxplot(Losses~Gender,horizontal=TRUE,col=c("green","yellow"))

9.7.
aggregate(project$Losses,by=list(Married=project$Married),mean)
boxplot(Losses~Married,horizontal=TRUE,col=c("green","yellow"))

9.8.
aggregate(project$Losses,by=list(Fuel=project$Fuel),mean)
boxplot(Losses~Fuel,horizontal=TRUE,col=c("green","yellow"))

9.9.
aggregate(project$Losses,by=list(vehicles=project$`Number of Vehicles`),mean)
boxplot(Losses~`Number of Vehicles`,horizontal=TRUE,col=c("green","yellow","purple","pink"))

9.10.
aggregate(project$Losses,by=list(VehicleAge=project$`Vehicle Age Interval`),mean)
boxplot(Losses~`Vehicle Age Interval`,horizontal=TRUE,col=c("green","yellow","red","orange"))

9.11.
library(lattice)
histogram(project$Losses,col="green",main="Distribution of dependent variable-capped loss",xlab="LOSS")

9.12.
library(corrgram)
corrgram(project, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Corrgram")