Description and Source of Data

This data set contains 1000 observations on 39 variables concerning insurance claims made by motorists after being involved in a collision. Data can be downloaded from: https://www.kaggle.com/roshansharma/insurance-claim.

Age and Education Level of Drivers


Five-Number Summary and Mean

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.00   32.00   38.00   38.95   44.00   64.00


The mean age of drivers (38.95) is larger than the median age (38.00). A mean above the median suggests that the age distribution is skewed to the right, although the gap here is under one year, so any skew is likely to be mild.
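As a rough check on this reasoning, the sketch below compares the mean and median reported in the summary and also computes a quartile-based (Bowley) skewness coefficient. The Bowley coefficient is not part of the original analysis; it is added here only as a second, hedged indicator. The numbers are copied from the summary output, and the function and variable names are illustrative.

```python
# Values copied from the summary output above.
q1, median, mean, q3 = 32.00, 38.00, 38.95, 44.00

def bowley_skewness(q1: float, q2: float, q3: float) -> float:
    """Quartile skewness: > 0 suggests right skew, < 0 left skew,
    and 0 means the quartiles are symmetric about the median."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

mean_minus_median = mean - median
quartile_skew = bowley_skewness(q1, median, q3)

print(f"mean - median:   {mean_minus_median:.2f}")
print(f"Bowley skewness: {quartile_skew:.2f}")
```

The quartiles sit symmetrically around the median (32 and 44 are each 6 years from 38), so whatever right skew the mean picks up comes from the upper tail: the maximum of 64 is 26 years above the median, while the minimum of 19 is only 19 years below it.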


Age of Drivers Histogram



The histogram shows that the distribution of drivers’ ages is skewed to the right. This is consistent with my expectation from the summary statistics.


Education Level of Drivers Frequency Table


## 
##   Associate     College High School          JD     Masters          MD 
##         145         122         160         161         143         144 
##         PhD 
##         125
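The counts in the table are easier to compare as shares of the whole sample. The sketch below is a minimal illustration using only the counts printed above; the dictionary and variable names are my own and do not come from the original analysis.

```python
# Counts copied from the frequency table above.
education_counts = {
    "Associate": 145, "College": 122, "High School": 160, "JD": 161,
    "Masters": 143, "MD": 144, "PhD": 125,
}

total = sum(education_counts.values())
shares = {level: count / total for level, count in education_counts.items()}

# Print the levels from most to least common.
for level, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{level:<12}{share:6.1%}")
```

The counts sum to the 1000 observations in the data set, and no level accounts for much more or much less than one seventh of the drivers.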


Education Level of Drivers Bar Graph


Collisions and Car Manufacturers


Auto Make Table and Bar Graph


## 
##     Accura       Audi        BMW  Chevrolet      Dodge       Ford      Honda 
##         68         69         72         76         80         72         55 
##       Jeep   Mercedes     Nissan       Saab     Suburu     Toyota Volkswagen 
##         67         65         78         80         80         70         68


Auto Make vs Claim Amount Box Plot



The manufacturer that stands out the most is Toyota, since it has the widest box. The width of the box is the interquartile range, so the middle 50% of Toyota's claim amounts are more spread out than those of the other manufacturers. Additionally, manufacturers such as Saab, Nissan, Honda and Ford have many outliers to the left of the lower whisker. This could be the result of claims filed for older cars that are not worth as much as newer cars.
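For reference, box plots commonly flag a point as an outlier when it lies more than 1.5 times the interquartile range beyond the box. The sketch below illustrates that rule on made-up claim amounts; the data, the function name `tukey_outliers`, and the choice of quartile method are assumptions for illustration, and the plotting software used for the figure may compute quartiles slightly differently.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical claim amounts for one manufacturer (not taken from the data set).
claims = [4000, 48000, 50000, 52000, 55000, 57000, 60000, 61000, 63000]

lower, upper, outliers = tukey_outliers(claims)
print(f"fences: [{lower:.0f}, {upper:.0f}]  outliers: {outliers}")
```

In this made-up sample, the single small claim falls below the lower fence and would be drawn as a separate point to the left of the whisker, which is the pattern described above for Saab, Nissan, Honda and Ford.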


Age and Claim Amounts



There appears to be a relationship between driver age and claim amount, with drivers between 26 and 48 having the highest claim amounts. One potential reason is that younger drivers may be less able to afford newer, more expensive cars than drivers between 26 and 48. Likewise, drivers near retirement age may not need to drive as often as those who still commute to work.


Hobbies Histogram



The most popular hobby of drivers in the data set is reading. This raises interesting questions, such as whether reading somehow contributes to distracted driving, or whether it correlates with another factor such as age, which might increase the likelihood of filing a car claim.


Fraud Reported vs Claim Amount



Claims where fraud was reported appear more likely to have a higher claim amount. This could be because a higher vehicle claim amount means a larger payout for the person committing the fraud.
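One way to put a number on this visual impression is to compare the median claim amount in each group. The sketch below shows the idea on a handful of made-up records; the values and the `Y`/`N` coding of the fraud flag are assumptions for illustration, not results from the data set.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (fraud_reported, claim_amount) pairs, for illustration only.
records = [
    ("Y", 62000), ("N", 41000), ("Y", 58000),
    ("N", 52000), ("N", 6000), ("Y", 49000),
]

# Group the claim amounts by the fraud flag.
amounts_by_flag = defaultdict(list)
for flag, amount in records:
    amounts_by_flag[flag].append(amount)

medians = {flag: median(amounts) for flag, amounts in amounts_by_flag.items()}
print(medians)
```

The median is used rather than the mean because a few very small claims, like the outliers seen in the box plots, would otherwise pull a group's average down.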

Analysis Critique



The two variables in the box plot are the incident type of the claim and the hour of the day at which the incident occurred.

The graph is useful for answering the question because it clearly shows that collisions tend to take place during the morning and afternoon, while incidents involving a parked car and vehicle theft typically occur early in the morning.


Another way to address the question is to construct a stacked bar chart, which would make it easier to visually compare the incident types at different hours of the day.
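A stacked bar chart of this kind is built from a table of counts with one row per hour and one column per incident type. The sketch below shows how such a table could be assembled; the records are made up for illustration, and the incident labels are assumed rather than copied from the data set.

```python
from collections import Counter

# Hypothetical (incident_type, hour_of_day) records, for illustration only.
incidents = [
    ("Single Vehicle Collision", 8), ("Multi-vehicle Collision", 8),
    ("Parked Car", 3), ("Vehicle Theft", 3), ("Parked Car", 3),
    ("Multi-vehicle Collision", 15), ("Single Vehicle Collision", 15),
]

# Count how often each (hour, incident type) pair occurs.
counts = Counter((hour, kind) for kind, hour in incidents)
hours = sorted({hour for _, hour in incidents})
kinds = sorted({kind for kind, _ in incidents})

# One row per hour, one column per incident type: each row holds the
# heights of the stacked segments for that hour's bar.
table = {hour: [counts[(hour, kind)] for kind in kinds] for hour in hours}

for hour, row in table.items():
    print(f"{hour:02d}:00", dict(zip(kinds, row)))
```

Each row can then be passed to a plotting library as the segment heights of one bar, so that both the total number of incidents per hour and the mix of incident types are visible at a glance.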