This data set contains 1000 observations on 39 variables concerning insurance claims filed by motorists after an incident such as a collision or vehicle theft. The data can be downloaded from: https://www.kaggle.com/roshansharma/insurance-claim.
Summary of driver age:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   19.00   32.00   38.00   38.95   44.00   64.00
The mean driver age (38.95) is slightly larger than the median (38.00). Because a long upper tail pulls the mean above the median, this suggests that the age distribution is skewed to the right.
The histogram of driver ages is skewed to the right, which is consistent with my expectation.
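As a quick check of the skew, the minimal Python sketch below (Python is used only for illustration; the summary above appears to come from R's `summary()`) compares the mean with the median and the lengths of the two tails, using only the six values printed in the summary. Bowley's quartile skewness is an extra measure added here for comparison; it is not part of the original analysis.

```python
# Five-number summary and mean of driver age, copied from the summary output above.
age_min, q1, median, mean, q3, age_max = 19.00, 32.00, 38.00, 38.95, 44.00, 64.00

# Rule of thumb: a mean above the median hints at a right (positive) skew.
mean_minus_median = mean - median

# Compare the lengths of the lower and upper tails outside the quartiles.
lower_tail = q1 - age_min   # distance from the minimum up to Q1
upper_tail = age_max - q3   # distance from Q3 up to the maximum

# Bowley (quartile) skewness: 0 means the middle 50% is symmetric about the median.
bowley = ((q3 - median) - (median - q1)) / (q3 - q1)

print(f"mean - median = {mean_minus_median:.2f}")
print(f"lower tail = {lower_tail:.0f}, upper tail = {upper_tail:.0f}")
print(f"Bowley skewness = {bowley:.2f}")
```

With these values the middle half of the ages is symmetric (Bowley skewness of 0), so the evidence for a right skew comes from the longer upper tail (20 years versus 13 years).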
Number of observations by education level:
##   Associate     College High School          JD     Masters          MD         PhD
##         145         122         160         161         143         144         125
Number of observations by vehicle make:
##     Accura       Audi        BMW  Chevrolet      Dodge       Ford      Honda
##         68         69         72         76         80         72         55
##       Jeep   Mercedes     Nissan       Saab     Suburu     Toyota Volkswagen
##         67         65         78         80         80         70         68
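The two frequency tables above look like the output of R's `table()`. A rough Python counterpart is `collections.Counter`; the sketch below re-enters the printed counts by hand (it does not read the Kaggle file), with labels spelled exactly as printed, and checks that each table accounts for all 1000 observations.

```python
from collections import Counter

# Counts re-entered by hand from the two tables above (labels spelled as printed).
education = Counter({"Associate": 145, "College": 122, "High School": 160, "JD": 161,
                     "Masters": 143, "MD": 144, "PhD": 125})
auto_make = Counter({"Accura": 68, "Audi": 69, "BMW": 72, "Chevrolet": 76,
                     "Dodge": 80, "Ford": 72, "Honda": 55, "Jeep": 67,
                     "Mercedes": 65, "Nissan": 78, "Saab": 80, "Suburu": 80,
                     "Toyota": 70, "Volkswagen": 68})

# Each table should account for every one of the 1000 observations.
print(sum(education.values()), sum(auto_make.values()))

# most_common() lists tied categories in the order they were first entered.
print(education.most_common(1))
print(auto_make.most_common(3))

# With raw labels, Counter(list_of_labels) builds the same kind of table directly.
print(Counter(["JD", "MD", "JD"]))
```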
The manufacturer that stands out most is Toyota, which has the widest box. This means Toyota has the largest interquartile range, that is, the greatest spread in the middle 50% of claim amounts, of any manufacturer. Additionally, manufacturers such as Saab, Nissan, Honda, and Ford have many outliers below the lower whisker. These unusually low claim amounts could be the result of claims filed for older cars that are worth less than newer cars.
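The box plot reading above rests on two quantities: the interquartile range (IQR), which sets the length of the box, and the 1.5 × IQR fences beyond which points are drawn as outliers (the usual default, used by base R's `boxplot()` and ggplot2's `geom_boxplot()`). The minimal sketch below applies that rule to hypothetical claim amounts for a single manufacturer. R may compute the box hinges slightly differently from `statistics.quantiles`, so this only illustrates the technique rather than reproducing the plot.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return Q1, Q3, the IQR, and the points below/above the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    low = [v for v in values if v < lower_fence]
    high = [v for v in values if v > upper_fence]
    return q1, q3, iqr, low, high

# Hypothetical claim amounts for one manufacturer (illustrative only, not from the data set).
claims = [3200, 38000, 41000, 44500, 47000, 49000, 52000, 55000, 58000, 61000]

q1, q3, iqr, low, high = tukey_outliers(claims)
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print("low outliers:", low, "high outliers:", high)
```

Here the single small claim of 3200 falls below the lower fence and would be drawn as a point to the left of (or below) the whisker, which is the pattern described for Saab, Nissan, Honda, and Ford.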
There appears to be a relationship between driver age and claim amount, with drivers between 26 and 48 having the highest claim amounts. One possible reason is that younger drivers may be less able to afford newer, more expensive cars than drivers between 26 and 48. Likewise, drivers near retirement age may drive less often than those who still commute to work.
The most popular hobby among drivers in the data set is reading. This raises interesting questions, such as whether reading somehow contributes to distracted driving, or whether it is associated with another factor, such as age, that might increase the likelihood of filing a claim.
Claims where fraud was reported appear more likely to involve a higher claim amount. One possible explanation is that the higher the vehicle claim amount, the larger the payout the person committing fraud would receive.
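One way to make the fraud comparison concrete is to compare the median claim amount for each value of the fraud flag. The sketch below assumes the flag is coded "Y"/"N" and uses hypothetical numbers. Medians are used because, as the box plots show, claim amounts contain low outliers that would pull a mean around.

```python
import statistics

# Hypothetical (fraud_flag, claim_amount) pairs; illustrative only, not from the data set.
records = [("Y", 52000), ("Y", 48000), ("Y", 61000), ("Y", 39000),
           ("N", 35000), ("N", 41000), ("N", 5200), ("N", 44000)]

def median_claim(records, flag):
    """Median claim amount among records whose fraud flag equals `flag`."""
    amounts = [amount for f, amount in records if f == flag]
    return statistics.median(amounts)

fraud_median = median_claim(records, "Y")
no_fraud_median = median_claim(records, "N")
print(f"fraud reported: {fraud_median:.0f}, not reported: {no_fraud_median:.0f}")
```

A higher median in the fraud group is only a descriptive difference; it does not by itself show that larger claims cause fraud or the reverse.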
The two variables in the box plot are the incident type and the hour of the day at which the incident occurred.
The graph is useful for answering the question because it clearly shows that collisions tend to take place during the morning and afternoon, while incidents involving a parked car or vehicle theft typically occur early in the morning.
Another way to address the question is to construct a stacked bar chart, which makes it easier to compare incident types visually across different hours of the day.
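A stacked bar chart needs one count per (hour, incident type) pair. The sketch below builds that cross-tabulation from hypothetical records and prints a crude text version of the stacked bars; the incident labels and one-letter symbols are placeholders, not the data set's exact category names. In R, the same chart could be drawn with ggplot2's `geom_bar()`, which stacks bars by default when a `fill` aesthetic is mapped.

```python
from collections import Counter, defaultdict

def crosstab(hours, kinds):
    """Count incidents for each (hour, incident type) pair."""
    table = defaultdict(Counter)
    for hour, kind in zip(hours, kinds):
        table[hour][kind] += 1
    return table

# Hypothetical incident hours and types (illustrative only, not from the data set).
hours = [3, 3, 4, 9, 9, 9, 14, 14, 17]
kinds = ["Vehicle Theft", "Parked Car", "Vehicle Theft", "Collision", "Collision",
         "Parked Car", "Collision", "Collision", "Collision"]

table = crosstab(hours, kinds)

# Placeholder one-letter symbols for each segment of a text "stacked bar".
symbols = {"Collision": "C", "Parked Car": "P", "Vehicle Theft": "T"}
bars = {}
for hour in sorted(table):
    # Stack segments in a fixed category order so bars are comparable across hours.
    bars[hour] = "".join(symbols[k] * table[hour][k] for k in sorted(symbols))
    print(f"{hour:>2} | {bars[hour]}")
```

Keeping the segment order fixed is what makes a stacked chart readable: each incident type occupies the same position in every bar.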