Description and Source of Data

This data set contains 1000 observations on 39 variables concerning insurance claims made by motorists after being involved in a collision. Data can be downloaded from: https://www.kaggle.com/roshansharma/insurance-claim.

Age and Education Level of Drivers

  1. The median age of drivers is 38, and the mean age is 38.948. Since the median and mean are so close together, we can infer that the age distribution is approximately symmetrical, although closeness of the mean and median alone does not establish that it is Normal.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.00   32.00   38.00   38.95   44.00   64.00
  2. The histogram displayed below is slightly skewed to the right, as shown by the slightly longer tail at the right end of the graph. The summary statistics agree: the maximum (64) lies 26 years above the median, while the minimum (19) lies only 19 years below it. This is not entirely consistent with my answer to question 1, where I concluded that the distribution is symmetrical.
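The comparison of mean and median in question 1 and the visual check in question 2 can be backed by a numeric skewness value. The following is a minimal sketch only: it is written in Python rather than the R that produced the output in this report, the `ages` list is made-up illustrative data (not the Kaggle data set), and `skew_summary` is a hypothetical helper name.

```python
from statistics import mean, median


def skew_summary(values):
    """Return (mean, median, moment-based skewness) for a list of numbers."""
    n = len(values)
    m = mean(values)
    med = median(values)
    # Population standard deviation (divide by n, not n - 1).
    sd = (sum((x - m) ** 2 for x in values) / n) ** 0.5
    # Third standardized moment: positive values indicate a longer right tail.
    skew = sum((x - m) ** 3 for x in values) / (n * sd ** 3)
    return m, med, skew


# Made-up ages for illustration only; NOT taken from the data set.
ages = [19, 25, 30, 32, 35, 37, 38, 41, 44, 50, 64]

m, med, skew = skew_summary(ages)
print(f"mean={m:.2f} median={med} skewness={skew:.2f}")
```

A mean only slightly above the median can still go with a clearly positive skewness, which is why the histogram and the summary statistics are worth reading together.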

## 
##   Associate     College High School          JD     Masters          MD 
##       0.145       0.122       0.160       0.161       0.143       0.144 
##         PhD 
##       0.125

Collisions and Car Manufacturers

## 
##     Accura       Audi        BMW  Chevrolet      Dodge       Ford      Honda 
##      0.068      0.069      0.072      0.076      0.080      0.072      0.055 
##       Jeep   Mercedes     Nissan       Saab     Suburu     Toyota Volkswagen 
##      0.067      0.065      0.078      0.080      0.080      0.070      0.068

  1. The only manufacturer that stands out in the box plots below is Toyota. Its interquartile range (the spread from the first quartile to the third quartile) is noticeably larger than that of every other manufacturer. What makes Toyota stand out even more is that its overall range, from minimum to maximum, is relatively small compared to the other manufacturers.
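The two measures of spread read off the box plots (interquartile range and overall range) can also be computed directly. This is a rough sketch under stated assumptions: the group labels `maker_a` and `maker_b` and their claim amounts are invented for illustration, `spread` is a hypothetical helper, and `method="inclusive"` is chosen because it should match the default quantile rule in R.

```python
from statistics import quantiles


def spread(values):
    """Return the interquartile range and the overall range of a sample."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"iqr": q3 - q1, "range": max(values) - min(values)}


# Made-up claim amounts for two hypothetical manufacturers.
claims = {
    "maker_a": [20000, 25000, 30000, 60000, 75000, 80000, 85000],
    "maker_b": [5000, 45000, 50000, 55000, 60000, 65000, 100000],
}

results = {maker: spread(amounts) for maker, amounts in claims.items()}
for maker, stats in results.items():
    print(maker, stats)
```

In this toy example `maker_a` has the wider box but the shorter whiskers, which is the same combination described for Toyota above.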

Age and Claim Amounts

  1. Since the scatter plot shows no clear pattern, there appears to be little or no correlation between driver age and claim amount.
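A visual impression of "no pattern" can be checked against the Pearson correlation coefficient, where values near 0 indicate little linear association. The sketch below is illustrative only: the `ages` and `amounts` lists are made up, and `pearson_r` is a hypothetical helper written out by hand so that each step is visible.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up values for illustration only; NOT taken from the data set.
ages = [22, 29, 35, 38, 41, 47, 55, 61]
amounts = [52000, 48000, 61000, 45000, 58000, 50000, 62000, 47000]

r = pearson_r(ages, amounts)
print(f"r = {r:.3f}")
```

Note that a coefficient near 0 only rules out a linear relationship; a curved pattern would still need to be judged from the scatter plot.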

Your own investigation #1

Question Raised: Does gender affect how often car insurance claims are made?

Investigation Using Graphs:

Question Answered: According to the graphs displayed above, it appears that females make more car insurance claims than males. According to the relative frequency graph, about 55% of those who made claims were female and 45% were male. While this may lead one to conclude that women make more claims than men, I do not think that is a reasonable conclusion to draw from so little information: the graphs do not show how many female and male drivers are insured in the first place.
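The relative frequencies quoted above are simply each group's count divided by the total number of claims. A minimal sketch of that calculation follows; the `genders` list is made up to reproduce a 55/45 split, and `relative_freq` is a hypothetical helper, not part of the original analysis.

```python
from collections import Counter


def relative_freq(labels):
    """Map each distinct label to its share of all observations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Made-up claimant genders for illustration only (11 of 20 are female).
genders = ["FEMALE"] * 11 + ["MALE"] * 9

freq = relative_freq(genders)
print(freq)
```

These shares describe who filed the claims. Turning them into a claim rate per driver would need the number of insured drivers of each gender, which is assumed here to be unavailable.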

Your own investigation #2

Question Raised: Does the severity of the collision affect how expensive the total car insurance claim will be?

Investigation Using Graphs:

Question Answered: As the graph shows, the severity of the incident likely affects the total claim amount. Trivial damage has a far smaller spread and a far lower median than the other three categories, and minor damage has a slightly lower median and first quartile than the more severe categories, total loss and major damage.

Analysis Critique

A hypothetical data analyst created the following graph to help him figure out whether certain types of incidents tend to occur more often at certain times of the day.

  1. The two variables are hour of the day and incident type.

  2. This graph is useful because it shows that different incident types do in fact occur at different hours of the day. According to the graph, parked car and vehicle theft incidents occur more often in the early hours of the day, whereas multi-vehicle and single-vehicle collisions occur more often during the busy hours of the day, when more people are out driving.

  3. An alternative way to address this question would be to look at the number of vehicles involved in each incident and the time at which the incident occurs. If more than one vehicle is involved, it is a multi-vehicle incident; if not, it is one of the other three incident types. The only drawback of this method is that it cannot distinguish between theft, single-vehicle collision, and parked car incidents. However, I do not see that as a major issue, because this method can still clearly show whether multi-vehicle incidents occur more often during the busy hours of the day than incidents involving one vehicle.
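The alternative described in answer 3 amounts to recoding each incident as multi-vehicle or single-vehicle and then counting incidents per hour. The sketch below shows one way this could look; the `(hour, vehicles)` record layout, the sample records, and the helper name `incidents_by_hour` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def incidents_by_hour(records):
    """Count multi-vehicle and single-vehicle incidents for each hour."""
    table = defaultdict(Counter)
    for hour, vehicles in records:
        # Recode: more than one vehicle involved means a multi-vehicle incident.
        kind = "multi" if vehicles > 1 else "single"
        table[hour][kind] += 1
    return table


# Made-up (hour of day, number of vehicles involved) pairs.
records = [(2, 1), (3, 1), (8, 3), (8, 2), (17, 3), (17, 1), (23, 1)]

table = incidents_by_hour(records)
for hour in sorted(table):
    print(hour, dict(table[hour]))
```

As noted in the answer, the "single" group here mixes theft, single-vehicle collisions, and parked car incidents, so the table can only address the multi-vehicle versus one-vehicle comparison.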