Within the insurance industry the claims and policy departments may be completely independent to each other.
It may be useful to know when a claim is lodged as to whether there is an active policy for the insured to be able to make a claim.
Using the data files made available the following building claims and policy have been analysed.
setwd("C:/Users/tesso/Documents/VIGNETTE")
IMPORTING DATA
Import the two data sets required, one for the claims and the other for policy.
Import using the read.csv function
#import file_claim_building file and rename to claim
claim<-read.csv("file_claim_building.csv")
#import file_policy_building file and rename to policy
policy<-read.csv("file_policy_building.csv")
MERGING DATA
Merge the two data sets together by the common column of the policy number. This will now create one data set with all the necessary data required.
#merge claim and policy by the PolicyID and rename to building
building<-merge(claim, policy, by=("PolicyID"))
ANALYSING THE DATA
Load the tidyverse package
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.5.3
#> -- Attaching packages --------------------------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1 v purrr 0.3.2
#> v tibble 1.4.2 v dplyr 0.7.5
#> v tidyr 0.8.1 v stringr 1.3.1
#> v readr 1.1.1 v forcats 0.3.0
#> Warning: package 'purrr' was built under R version 3.5.3
#> -- Conflicts ------------------------------------------------------------------------------ tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
Filter the data such that only the GeoRisk group of High Risk is displayed.
Use the summary function to see summary data for the Gross Loss costs of the High Risk group.
This can then easily be replicated for the other groups.
High<-filter(building, building$GeoRisk=="High Risk" )
summary(High$GrossLossBuilding)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 700.1 2134.4 2564.3 2765.1 2979.5 12034.9
VISUALISING THE DATA
Create a histogram using the ggplot function to count the number of claims per GeoRIsk category.
ggplot(building, aes(building$GeoRisk))+geom_bar(aes())+labs(title =" Number of Claims by Risk Category", x = "Risk Category", y = "Number of Claims")
Create a box and whisker plot using the ggplot function to visualise the summary of each GeoRisk category against the Gross cost of each claim in that category.
ggplot(building, aes(building$GeoRisk, building$GrossLossBuilding))+geom_boxplot()+ labs(title =" Risk vs Gross Cost", x = "Risk Category", y = "Gross Loss")
This is just an introduction to the merging data sets and visualising the data within the merged set.
We have succesfully been able to merge the claims and policy documents and then visualised both a histogram of the categories as well as a box and whisker of claims cost by category.
Link to CSV documents