This data set contains 1000 observations on 39 variables concerning insurance claims made by motorists after being involved in a collision. Data can be downloaded from: https://www.kaggle.com/roshansharma/insurance-claim.
From my drivers' age table, I notice that the median (38) and the mean (38.95) are so close that the age distribution is probably roughly symmetric, possibly bell-shaped.
The shape of the histogram is fairly consistent with what I expected the age distribution to look like.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   19.00   32.00   38.00   38.95   44.00   64.00
##
##   Associate     College High School          JD     Masters          MD
##         145         122         160         161         143         144
##         PhD
##         125
##
##     Accura       Audi        BMW  Chevrolet      Dodge       Ford      Honda
##         68         69         72         76         80         72         55
##       Jeep   Mercedes     Nissan       Saab     Suburu     Toyota Volkswagen
##         67         65         78         80         80         70         68
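The mean-versus-median reasoning about driver age can be checked numerically. The sketch below is only an illustration: it uses simulated ages (`age_demo` is a hypothetical stand-in, not the real `car_claims` data) and expresses the gap between mean and median in units of the standard deviation. A value near zero is consistent with a symmetric distribution, although it does not prove that the shape is a bell curve.

```r
# Simulated ages standing in for the real age column (assumption: not the actual data)
set.seed(1)
age_demo <- round(rnorm(1000, mean = 39, sd = 9))

# Same kind of six-number summary as shown in the output above
print(summary(age_demo))

# Gap between mean and median, scaled by the standard deviation
gap <- (mean(age_demo) - median(age_demo)) / sd(age_demo)
print(gap)

# Histogram to inspect the shape visually
hist(age_demo, breaks = 20, main = "Driver age (simulated)", xlab = "Age")
```

With the real data, `age_demo` would be replaced by the age column of `car_claims`.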
From the auto_make versus total_claim_amount box plots, Toyota is the manufacturer that stands out from the rest, with a very large claim amount of apparently about 53,000.
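A box plot can make one manufacturer stand out because of a single extreme claim, so it may help to look at the median and the maximum per manufacturer side by side. This is a minimal sketch with invented numbers (`claims_demo` and all of its values are hypothetical), not the real claim data.

```r
# Toy data: invented claim amounts for three manufacturers
claims_demo <- data.frame(
  auto_make = rep(c("Toyota", "Honda", "Ford"), each = 4),
  total_claim_amount = c(5000, 6000, 7000, 53000,
                         4000, 5500, 6500, 7000,
                         3000, 5000, 6000, 8000)
)

# Median and maximum claim amount per manufacturer
med_by_make <- tapply(claims_demo$total_claim_amount, claims_demo$auto_make, median)
max_by_make <- tapply(claims_demo$total_claim_amount, claims_demo$auto_make, max)
print(rbind(median = med_by_make, max = max_by_make))

# Base R box plot of the same toy data
boxplot(total_claim_amount ~ auto_make, data = claims_demo)
```

In this toy example the medians are close to each other while one maximum is far larger, which is the kind of pattern that would make a single box look different from the rest.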
From the scatter plot of age versus claim amount, I am tempted to say that there is a relationship between the age of drivers and the number of collisions. Drivers from approximately 24 to 47 years old are involved in more car accidents than those older than 47.
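One caveat with this reading is that an age band covering more years, or more policyholders, will naturally contain more claims. The sketch below shows one way the counts per age band could be normalized by band width. The ages are simulated (`age_sim` is a hypothetical stand-in) and the band limits are simply taken from the ranges mentioned in the text.

```r
# Simulated ages between 19 and 64 (assumption: not the real data)
set.seed(2)
age_sim <- sample(19:64, 500, replace = TRUE)

# Bands: 19-23, 24-47, 48-64 (intervals are closed on the right)
age_band <- cut(age_sim, breaks = c(18, 23, 47, 64),
                labels = c("19-23", "24-47", "48-64"))
band_counts <- table(age_band)

# Number of distinct ages in each band, used to get claims per year of age
band_width <- c(5, 24, 17)
claims_per_year <- band_counts / band_width

print(band_counts)
print(round(claims_per_year, 1))
```

If the per-year rates are similar across bands, the larger raw count in the middle band mostly reflects its width rather than a higher accident risk.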
incident_severity <- table(car_claims$incident_severity)
incident_severity
ggplot(car_claims, aes(x = incident_severity, fill = incident_severity)) + geom_bar() + labs(title = "incident_severity", x = "", y = "")
This bar graph shows the number of claims at each level of incident severity. Surprisingly, drivers with very little damage are the ones who reportedly file the most claims. It seems understandable that claims for major incidents and total losses are high, probably because those drivers are seeking compensation. But what could trigger so many minor claims that they exceed every other severity category?
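The bar heights can also be read as counts and shares, which makes the comparison between severity levels explicit. This is a small sketch with invented labels and counts (`severity_demo` is hypothetical), only to show the mechanics.

```r
# Invented severity labels and counts, for illustration only
severity_demo <- c(rep("Minor", 5), rep("Major", 3),
                   rep("Total Loss", 3), rep("Trivial", 1))

# Counts per level, largest first
severity_counts <- sort(table(severity_demo), decreasing = TRUE)

# Share of all claims per level
severity_share <- round(prop.table(severity_counts), 2)

print(severity_counts)
print(severity_share)
```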
ggplot(car_claims, aes(y =vehicle_claim, x = injury_claim))+geom_point()
From the scatter plot comparing vehicle claims and injury claims, I notice a strong positive correlation between the two. Each injury claim is directly tied to a car accident, so the insurance company should expect an injury claim most of the time an accident occurs.
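The strength of a relationship like this one can be quantified with the Pearson correlation coefficient instead of being judged from the scatter plot alone. The sketch below uses simulated values (`vehicle_claim_demo` and `injury_claim_demo` are hypothetical, and the slope and noise level are arbitrary choices), so the number it prints says nothing about the real data.

```r
# Simulated claim amounts with a built-in positive relationship (assumption)
set.seed(3)
vehicle_claim_demo <- runif(200, min = 1000, max = 50000)
injury_claim_demo <- 0.15 * vehicle_claim_demo + rnorm(200, sd = 1500)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r <- cor(injury_claim_demo, vehicle_claim_demo)
print(r)

# Scatter plot with the same axes as in the text
plot(injury_claim_demo, vehicle_claim_demo,
     xlab = "injury_claim", ylab = "vehicle_claim")
```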
A hypothetical data analyst created the following graph to help him figure out whether certain types of incidents tend to occur more often at certain times of the day.
To address this question, the analyst uses two variables, hour of the day and incident type, to draw box plots.
Box plots are very useful for the analyst's question because they give a detailed answer. Some incident types do tend to occur at certain times of the day. For example, multi-vehicle and single-vehicle collisions occur mostly between 7 AM and 7 PM, and the black line shows that the median collision occurs around 2 PM. This corresponds to the period of high traffic, when most people are travelling to or from their daily activities. However, parked-car incidents and vehicle thefts mostly occur from 4 AM to 9 AM, when most people are still in bed, with the median incident happening around 6 AM.
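The box edges and the black line of a box plot correspond to the quartiles and the median of each group, and these can be computed directly. The sketch below uses a handful of invented incident hours (`hours_demo` and its group labels are hypothetical) to show how the numbers behind such a plot could be obtained.

```r
# Invented incident hours for two groups, for illustration only
hours_demo <- data.frame(
  incident_type = rep(c("collision", "parked_or_theft"), each = 5),
  hour = c(7, 10, 14, 16, 19,
           4, 5, 6, 8, 9)
)

# Quartiles per incident type: the 50% row is the line drawn inside each box
q <- sapply(split(hours_demo$hour, hours_demo$incident_type),
            quantile, probs = c(0.25, 0.5, 0.75))
print(q)

# The mean can differ from the median, so the two are worth reporting separately
print(tapply(hours_demo$hour, hours_demo$incident_type, mean))
```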
(3) Another way to address the question is to build a bar graph, which shows that only a few parked-car incidents and vehicle thefts occur at the beginning of the day, while many multi-vehicle and single-vehicle collisions occur later in the day. The incident types do not seem to be related to one another.
incident_type_table <- table(car_claims$incident_type, car_claims$incident_hour_of_the_day)
incident_type_table
prop.table(incident_type_table, 2)
ggplot(car_claims, aes(x = incident_type, fill = incident_type)) + geom_bar() + labs(title = "incident_type", x = "", y = "")
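To make the role of the second argument of `prop.table` concrete: with `margin = 2`, each column of a two-way table is divided by its own total, so every column (here, every hour) sums to 1. The sketch below uses a tiny invented data set (`incidents_demo` and its labels are hypothetical) rather than `car_claims`.

```r
# Invented incidents: type and hour of the day, for illustration only
incidents_demo <- data.frame(
  incident_type = c("Parked Car", "Vehicle Theft", "Parked Car",
                    "Single Vehicle Collision", "Multi-vehicle Collision",
                    "Single Vehicle Collision"),
  incident_hour_of_the_day = c(5, 5, 6, 14, 14, 14)
)

# Two-way table: rows are incident types, columns are hours
tab <- table(incidents_demo$incident_type,
             incidents_demo$incident_hour_of_the_day)

# Column proportions: share of each incident type within each hour
col_props <- prop.table(tab, margin = 2)
print(round(col_props, 2))
```

Reading down a column then answers the analyst's question directly: for a given hour, which incident types make up the largest share.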