Policy and Claims

Within the insurance industry the claims and policy departments may be completely independent to each other.

It may be useful to know when a claim is lodged as to whether there is an active policy for the insured to be able to make a claim.

Using the data files made available the following building claims and policy have been analysed.

setwd("C:/Users/tesso/Documents/VIGNETTE")

#import file_claim_building file and rename to claim
claim<-read.csv("file_claim_building.csv")

#import file_policy_building file and rename to policy
policy<-read.csv("file_policy_building.csv")

Merge the two data sets together by the common column of the policy number. This will now create one data set with all the necessary data required.

#merge claim and policy by the PolicyID and rename to building
building<-merge(claim, policy, by=("PolicyID"))

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> -- Attaching packages --------------------------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1     v purrr   0.3.2
#> v tibble  1.4.2     v dplyr   0.7.5
#> v tidyr   0.8.1     v stringr 1.3.1
#> v readr   1.1.1     v forcats 0.3.0
#> Warning: package 'purrr' was built under R version 3.5.3
#> -- Conflicts ------------------------------------------------------------------------------ tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()

Use the summary function to see summary data for the Gross Loss costs of the High Risk group.

High<-filter(building, building$GeoRisk=="High Risk" )


summary(High$GrossLossBuilding)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   700.1  2134.4  2564.3  2765.1  2979.5 12034.9

Create a histogram using the ggplot function to count the number of claims per GeoRIsk category.

ggplot(building, aes(building$GeoRisk))+geom_bar(aes())+labs(title =" Number of Claims by Risk Category", x = "Risk Category", y = "Number of Claims")

Create a box and whisker plot using the ggplot function to visualise the summary of each GeoRisk category against the Gross cost of each claim in that category.

ggplot(building, aes(building$GeoRisk, building$GrossLossBuilding))+geom_boxplot()+ labs(title =" Risk vs Gross Cost", x = "Risk Category", y = "Gross Loss")

This is just an introduction to the merging data sets and visualising the data within the merged set.

We have succesfully been able to merge the claims and policy documents and then visualised both a histogram of the categories as well as a box and whisker of claims cost by category.