This document is written to illustrate the issues observed in the data set that is related to the data completeness, accuracy and consistency. Do note that issues that are raised here does not necessary mean that there is an inherent issue, but rather it is potentially an area that we should follow-up with Client X to ensure the correctness of the report.

Claims Information

For claims data, there are two files in total.

  1. historical claim

  2. open claim

Issue 1: Inconsistency between cause of loss and event type - An example is shown below:

historical_claims[  EVENT_TYPE == "THEFT", table(CAUSE_OF_LOSS, EVENT_TYPE)]
##                                                      EVENT_TYPE
## CAUSE_OF_LOSS                                         THEFT
##   COLLISION                                            4187
##   COLLISION WITH ANIMAL                                2807
##   COLLISION WITH COMMERICAL VEHICLES/FLEETS            2085
##   COLLISION WITH FLYING OBJECTS (NOT AIRCRAFT)         2067
##   COLLISION WITH NON-VEHICULAR OBJECT                  2859
##   COLLISION WITH OTHER NON-VEHICULAR PROPERTY          2099
##   COLLISION WITH OTHER VEHICLE - AT FAULT              2028
##   COLLISION WITH OTHER VEHICLE - NOT AT FAULT          2088
##   COLLISION WITH TREE                                  2847
##   FIRE                                                 2940
##   FLOOD                                                5650
##   HIT BY ANOTHER VEHICLE                               2908
##   INFLUENCED BY FALLING/FLYING OBJECTS (NOT AIRCRAFT)  2735
##   LOSS OF KEYS                                         2777
##   MALICIOUS DAMAGE                                     2790
##   PART THEFT                                           2821
##   THEFT                                                2799

Issue 2: Distribution of claims - Need to confirm if there are any missing claims and if IBNR has been considered.

ALL_CLAIMS[year(DATE_OF_LOSS) >= 2019, table(format(DATE_OF_LOSS, format = "%Y-%m"), SOURCE )]
##          SOURCE
##           Historical OPEN
##   2019-01       6197    0
##   2019-02       5433    0
##   2019-03       5782    0
##   2019-04       5474    0
##   2019-05       5222    0
##   2019-06       4901    0
##   2019-07       4925    0
##   2019-08       4429    0
##   2019-09       4162    0
##   2019-10       3987    0
##   2019-11       3652    0
##   2019-12       3422    0
##   2020-01       2953    0
##   2020-02       2610    0
##   2020-03       2334    0
##   2020-04       1955    0
##   2020-05       1689    0
##   2020-06       1267    0
##   2020-07        946    0
##   2020-08        648    0
##   2020-09        301   35
##   2020-10         22   72
##   2020-11          0    8

Data Modelling of Claims Data

  1. Append “Historical Claim” and “Open Claim”
  2. Alignment of Date Format
#######################################################################################
# Appending of the claims data                                                        #
#######################################################################################
historical_claims[, SOURCE := "Historical"]
open_claims[      , SOURCE := "OPEN"      ]
ALL_CLAIMS <- rbind(historical_claims, open_claims)
names(ALL_CLAIMS)[stringr::str_detect(names(ALL_CLAIMS), pattern = "DATE")][1:15] -> col_for_Date
for(i in 1:15){
  ALL_CLAIMS[ , col_for_Date[i] := as.Date(get(col_for_Date[i]), format = "%m/%d/%Y")]
}

Exposure Information

For exposure data, there are three files in total.

  1. Vehicle details

  2. Customer details

  3. Policy details

Issue 1: Year of Manufacture of Vehicle are all 2012 or earlier. This needs to be further updated. On a similar note, the years of driving experience for customers are outdated too.

##     YEAR_OF_MANUFACTURE      N
##  1:                1995  19075
##  2:                1996  19307
##  3:                1997  19317
##  4:                1998  19118
##  5:                1999  19173
##  6:                2000 137424
##  7:                2001 118235
##  8:                2002 118183
##  9:                2003 118442
## 10:                2004 117343
## 11:                2005 117495
## 12:                2006 118023
## 13:                2007 118325
## 14:                2008 118176
## 15:                2009 117922
## 16:                2010 118232
## 17:                2011 117786
## 18:                2012 118424

Issue 2: Vehicle Gross Weight might not be the same unit for all weight. It is odd that a bus is of the same weight as a motorbike

 vehicle_new[ VEHICLE_GROSS_WEIGHT < 800, table(VEHICLE_GROSS_WEIGHT, VEHICLE_TYPE)]
##                     VEHICLE_TYPE
## VEHICLE_GROSS_WEIGHT  BUS CARRY_GOODS MOTORBIKE MOTORCAR
##                  500  373         396      6714     4028
##                  510  402         384      6763     4028
##                  520  420         396      6768     3998
##                  530  411         385      6848     4018
##                  540  369         327      6859     4190
##                  550  364         389      6745     4033
##                  560  375         361      6825     4076
##                  570  380         394      6937     4116
##                  580  376         360      6686     4042
##                  590  385         358      6780     4016
##                  600  391         381      6949     4063
##                  610  379         375      6798     4133
##                  620  399         391      6767     4034
##                  630  378         375      6755     4019
##                  640  359         358      6806     4071
##                  650  363         377      6774     4077
##                  660  333         368      6940     4025
##                  670  372         357      6837     4130
##                  680  381         390      6790     3965
##                  690  389         352      6855     4103
##                  700  368         410      6727     3941
##                  710  357         345      6919     3929
##                  720  391         360      6916     4045
##                  730  350         408      6751     4032
##                  740  402         394      6883     4021
##                  750  369         376      6805     4044
##                  760  402         401      6600     4008
##                  770  358         377      6758     3979
##                  780  353         364      6774     4100
##                  790  380         384      6631     3984

Issue 3: Engine Size might not be the same unit for all entries. It is odd that the engine size of a bus is smaller than a motorbike. I suspect that it is written in horsepower rather than cc.

 vehicle_new[ ENGINE_SIZE_CC < 1000 , table(ENGINE_SIZE_CC, VEHICLE_TYPE)]
##               VEHICLE_TYPE
## ENGINE_SIZE_CC   BUS CARRY_GOODS MOTORBIKE MOTORCAR
##            500  1522         714     17752    13031
##            520  1490         722     17673    12992
##            540  1501         697     17936    13383
##            560  1426         726     17693    13308
##            580  1450         779     17808    13094
##            600  1500         712     17912    13200
##            620  1462         736     17789    13286
##            640  1458         688     17862    13217
##            660  1453         727     17890    13151
##            680  1485         751     17738    13277
##            700  1494         724     17859    13443
##            720  1499         745     17692    13313
##            740  1437         779     17815    13144
##            760  1432         720     17964    13190
##            780  1538         736     17901    13426
##            800  1536         731     17890    13249
##            820  1457         782     17938    13145
##            840  1467         736     17655    12984
##            860  1459         798     17800    13367
##            880  1497         751     17693    13292
##            900  1481         748     17970    13389
##            920  1515         745     17750    13389
##            940  1425         705     17593    13252
##            960  1473         718     17763    13210
##            980  1423         738     17615    13406

Customer Details.

Issue 4: 103,834 customer cannot be found to have a vehicle

customer[!CUSTOMER_IDENTIFIER %in% vehicle_new$CUSTOMER_IDENTIFIER,]
##         CUSTOMER_IDENTIFIER GENDER  DRIVING_EXPERIENCE
##      1:          CUSTOMER_3      M THREE_TO_FIVE_YEARS
##      2:         CUSTOMER_28      M THREE_TO_FIVE_YEARS
##      3:         CUSTOMER_33      M THREE_TO_FIVE_YEARS
##      4:         CUSTOMER_34      M   FIVE_TO_TEN_YEARS
##      5:         CUSTOMER_39      M THREE_TO_FIVE_YEARS
##     ---                                               
## 103830:    CUSTOMER_1038528      F THREE_TO_FIVE_YEARS
## 103831:    CUSTOMER_1038531      M THREE_TO_FIVE_YEARS
## 103832:    CUSTOMER_1038532      M THREE_TO_FIVE_YEARS
## 103833:    CUSTOMER_1038535      M THREE_TO_FIVE_YEARS
## 103834:    CUSTOMER_1038551      M THREE_TO_FIVE_YEARS
# vehicle_new[!CUSTOMER_IDENTIFIER %in% customer$CUSTOMER_IDENTIFIER,]

# policies_new[, i_terdt := CANCELLATION_DATE]
# policies_new[is.na(i_terdt), i_terdt := POLICY_END_DATE]

Issue 5: 203,352 Vehicles do not have corresponding policies.

vehicle_new[!VEHICLE_IDENTIFIER %in% policies_new$VEHICLE_IDENTIFIER]
##         VEHICLE_IDENTIFIER YEAR_OF_MANUFACTURE         MAKE_OF_VEHICLE
##      1:       VEHICLE_4771                2004           MARUTI SUZUKI
##      2:       VEHICLE_4838                2005              JOHN DEERE
##      3:       VEHICLE_7194                2005                  Jaguar
##      4:       VEHICLE_7867                2002                    OPEL
##      5:      VEHICLE_11450                2008                    Mini
##     ---                                                               
## 203348:    VEHICLE_1649794                1995                     HMT
## 203349:    VEHICLE_1649831                2000                 Renault
## 203350:    VEHICLE_1649936                2007                 SHRACHI
## 203351:    VEHICLE_1649960                2002                Maserati
## 203352:    VEHICLE_1649971                2000 TRISHUL AGRO INDUSTRIES
##         VEHICLE_TYPE SEATING_CAPACITY ENGINE_SIZE_CC VEHICLE_GROSS_WEIGHT
##      1:    MOTORBIKE                1           1600                  530
##      2:  CARRY_GOODS                2           1410                 2500
##      3:     MOTORCAR                5           1000                  710
##      4:    MOTORBIKE                2           1765                  890
##      5:     MOTORCAR                5           7280                 1865
##     ---                                                                  
## 203348:          BUS               58            960                 3100
## 203349:     MOTORCAR                2            840                 2290
## 203350:  CARRY_GOODS                7           2763                 2800
## 203351:     MOTORCAR                2           1000                  630
## 203352:    MOTORBIKE                1            580                  660
##         CUSTOMER_IDENTIFIER
##      1:     CUSTOMER_936772
##      2:      CUSTOMER_29936
##      3:        CUSTOMER_107
##      4:     CUSTOMER_710934
##      5:    CUSTOMER_1018887
##     ---                    
## 203348:     CUSTOMER_231383
## 203349:     CUSTOMER_199253
## 203350:     CUSTOMER_730870
## 203351:     CUSTOMER_567963
## 203352:     CUSTOMER_767561

Data Modelling of Exposure Data

Step 1: Merge vehicle_new & customer together using “CUSTOMER_IDENTIFIER”

Step 2: Merge the result of step 1 with policies_new using “VEHICLE_IDENTIFIER”

The completion of this step will provide a policy data with information on vehicle and customer.

customer[vehicle_new, on  = c("CUSTOMER_IDENTIFIER")] -> vehicle_customer         
vehicle_customer[policies_new, on  = c("VEHICLE_IDENTIFIER")] -> policy_vehicle_customer         

Step 3: A termination date of the policy is also added to this data by taking the earlier of cancellation date or policy date. This termination date will help to calculate exposure later.

policy_vehicle_customer[, i_terdt := CANCELLATION_DATE]
policy_vehicle_customer[is.na(i_terdt), i_terdt := POLICY_END_DATE]

The data set “policy_vehicle_customer” will then be exported later.

Next - To create exposure according to period of time in-force in a year. For instance, a policy start date in 1st Dec 2012 and policy end date in 1st Dec 2013 will be given exposure of 1/12 in 2012 and 11/12 in 2013.

exposure.computation <- function(finalpol = policies_new ,survey.start = as.Date("2012-01-01", format = "%Y-%m-%d") ,
                                   survey.end = as.Date("2020-12-31", format = "%Y-%m-%d"))
{
  
  exposures_old <- finalpol
  exposures <- NULL 
  
  
  # Step 1 # - Calendar Year
  for (calendar_year in as.numeric(format(survey.start, "%Y")):as.numeric(format(survey.end, "%Y")) ) {
    #cat("Calendar Year", calendar_year, "\n")
    tmp <- exposures_old
    tmp$calendar_year <- calendar_year
    calendar_year_start <- as.Date(paste(calendar_year, '01', '01', sep = "-"))
    calendar_year_end <- as.Date(paste(calendar_year + 1, '01', '01', sep = "-"))
    if (calendar_year_start < survey.start) {
      calendar_year_start = survey.start
    }
    if (calendar_year_end > survey.end) {
      calendar_year_end = survey.end
    }
    
    tmp$start_date <- as.Date(ifelse(calendar_year_start > tmp$POLICY_START_DATE, calendar_year_start, tmp$POLICY_START_DATE), origin = "1970-01-01")
    tmp$end_date <- as.Date(ifelse(is.na(tmp$i_terdt), calendar_year_end, ifelse(calendar_year_end<tmp$i_terdt, calendar_year_end, tmp$i_terdt) ), origin = "1970-01-01")
    tmp <- tmp[which(tmp$start_date < tmp$end_date),]
    
    if (exists('exposures')) {
      exposures <- rbind(exposures, tmp)
    } else {
      exposures <- tmp
    }
  }
  
  rm(tmp)
  rm(calendar_year)
  rm(calendar_year_start)
  rm(calendar_year_end)
  rm(exposures_old)
  
  exposures$exposure <- as.numeric(exposures$end_date - exposures$start_date)/365.25
  return(exposures)
}




exposure_new <- exposure.computation(finalpol = policy_vehicle_customer ,survey.start = as.Date("2012-01-01", format = "%Y-%m-%d") ,
                                       survey.end = as.Date("2021-01-01", format = "%Y-%m-%d"))

Now that the result of the exposure has been computed, we will create a column for “Earned Premium” by using the exposure years computed and multiply it with the Gross Written Premium (GWP).

exposure_new[ , Earned_Premium := exposure * GWP]
exposure_new[, .(POLICY_IDENTIFIER, calendar_year, exposure, Earned_Premium)] -> exposure_new

Summary

Finally, we have the following data set to be uploaded into Power BI

  1. ALL_CLAIMS as ALL Claims 20220814

  2. policy_vehicle_customer as policies_new_20220814

  3. exposure_new as “exposure 20220814”

These three data set will be added into Power BI with the following diagram as the Model between them.

Data Flow of Data into Power BI

Power BI Data Model