Automobile insurance company located in the southwest and western regions of the United States.Its depend on accurate pricing estimates to maintain profitability. Auto policies must be priced so that the insurance company makes a profit in the long run, given the costs of their customers’ payouts for accident repairs, total loss car replacements, medical fees, and legal fees.
The executives at this insurance company have noticed declining profitability over the last several years and have hired you as a data science consultant to evaluate their claims data and make recommendations on pricing, customer behavior, and car insurance policy adjustments.
This report presents an analysis of insurance policy data from five states, including Washington, California, Arizona, Nevada, and Oregon. company.
Claim_df Dataset
The Claim_df is loaded below and consists of 6,249 auto claims submitted by customers of the insurance company.
Load and Summary the Claim_df Dataset
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(ggplot2)
library(readr)
claims_df <-
readRDS(url('https://gmubusinessanalytics.netlify.app/data/claims_df.rds'))
claims_df
## # A tibble: 6,249 × 20
## custo…¹ custo…² highe…³ emplo…⁴ gender income resid…⁵ marit…⁶ sales…⁷ cover…⁸
## <chr> <fct> <fct> <fct> <fct> <dbl> <fct> <fct> <fct> <fct>
## 1 AA11235 Nevada Bachel… Medica… Female 11167 Suburb… Married Branch Basic
## 2 AA16582 Washin… Bachel… Medica… Male 14072 Suburb… Divorc… Agent Basic
## 3 AA34092 Califo… Associ… Employ… Male 33635 Suburb… Married Web Extend…
## 4 AA56476 Arizona High S… Employ… Female 74454 Suburb… Single Call C… Basic
## 5 AA69265 Nevada Bachel… Employ… Female 60817 Suburb… Single Web Premium
## 6 AA71604 Arizona Master Employ… Female 87560 Suburb… Married Web Extend…
## 7 AA93585 Califo… Associ… Employ… Male 97024 Urban Married Branch Premium
## 8 AB21519 Califo… Associ… Employ… Female 93272 Urban Married Branch Extend…
## 9 AB23825 Califo… Associ… Employ… Male 21509 Suburb… Single Agent Extend…
## 10 AB26022 Oregon High S… Retired Male 26487 Suburb… Single Call C… Basic
## # … with 6,239 more rows, 10 more variables: policy <fct>, vehicle_class <fct>,
## # vehicle_size <fct>, monthly_premium <dbl>, months_policy_active <dbl>,
## # months_since_last_claim <dbl>, current_claim_amount <dbl>,
## # total_claims <dbl>, total_claims_amount <dbl>,
## # customer_lifetime_value <dbl>, and abbreviated variable names ¹customer_id,
## # ²customer_state, ³highest_education, ⁴employment_status, ⁵residence_type,
## # ⁶marital_status, ⁷sales_channel, ⁸coverage
summary(claims_df)
## customer_id customer_state highest_education employment_status
## Length:6249 Washington: 554 High School:1708 Employed :5154
## Class :character Oregon :1763 Associate :1843 Medical Leave: 421
## Mode :character California:2150 Bachelor :1868 Disabled : 392
## Nevada : 601 Master : 578 Retired : 282
## Arizona :1181 Doctoral : 252
##
## gender income residence_type marital_status
## Female:3177 Min. :10037 Urban :1495 Single :1027
## Male :3072 1st Qu.:27750 Suburban:3657 Married :4158
## Median :46503 Rural :1097 Divorced:1064
## Mean :49600
## 3rd Qu.:69259
## Max. :99981
## sales_channel coverage policy vehicle_class
## Agent :2359 Basic :3815 Personal :4658 Two-Door Car :1292
## Branch :1771 Extended:1858 Corporate:1328 Four-Door Car:3124
## Call Center:1218 Premium : 576 Special : 263 Sports Car : 335
## Web : 901 SUV :1246
## Luxury Car : 119
## Luxury SUV : 133
## vehicle_size monthly_premium months_policy_active months_since_last_claim
## Small :1249 Min. : 61.00 Min. :12.00 Min. : 0.00
## Midsize:4396 1st Qu.: 69.00 1st Qu.:30.00 1st Qu.: 6.00
## Large : 604 Median : 83.00 Median :38.00 Median :14.00
## Mean : 93.81 Mean :39.03 Mean :14.97
## 3rd Qu.:109.00 3rd Qu.:48.00 3rd Qu.:23.00
## Max. :297.00 Max. :71.00 Max. :35.00
## current_claim_amount total_claims total_claims_amount
## Min. : 739 Min. :1.000 Min. : 859
## 1st Qu.:1285 1st Qu.:2.000 1st Qu.:2224
## Median :1628 Median :2.000 Median :2664
## Mean :1625 Mean :2.392 Mean :2745
## 3rd Qu.:1903 3rd Qu.:3.000 3rd Qu.:3157
## Max. :3259 Max. :4.000 Max. :5688
## customer_lifetime_value
## Min. :-4285.0
## 1st Qu.: -396.0
## Median : 574.0
## Mean : 922.1
## 3rd Qu.: 1814.0
## Max. :16465.0
The objective of this project is to perform an exploratory data analysis on the claims_df dataset and produce an executive summary of your key insights and recommendations to the executive team at the insurance company.
Purpose of Automobile Insurance Company analysis.
The purpose of this analysis is to identify key patterns and trends in customer behavior and insurance coverage preferences, with a view to making recommendations that will improve the profitability and sustainability of the insurance company
1. Which gender has the highest Total Claims amount
total_claims_amount_by_gender <- ggplot(data=claims_df,
aes(x=gender,
y= total_claims_amount,
color = gender)) +
labs(x = 'Gender', y = 'Total Claims Amount', color='Gender', title = 'The total amount claimed from the auto insuarance by gender') +
geom_bar(stat="identity") +
theme_bw()
plot(total_claims_amount_by_gender)
From the above analysis, we could notice that the Male has highest number of total claims amount than female. It is adviced that the company to increase the premiums for Male as the costs of the male’s payouts for accident repairs, total loss car replacements, medical fees, and legal fees are at higher risk for the company in claims.
2. What is the variation of the total amount of claims by each state?
s<-ggplot(claims_df, aes(customer_state, total_claims_amount)) +
geom_boxplot() + theme_bw() +labs(x = '5 US States', y = 'Total Claims Amount', title = 'Variation of total amount of claims over 5 US states')
plot(s)
This dataset contains “customer_state” which has 5 states Washington, California, Arizona, Nevada, Oregon. From the above analysis, it is noticed that the average total amount of claims from all the five states are in the same range, however there are outliers in each state’s total amount of claims. It is adviced to increase the premimum for customers when the total claims amount reaches 4500.
3. What is the relationship between customer lifetime value with their highest education and their occupation?
customerlife <- ggplot(claims_df,
aes(employment_status, highest_education, fill= customer_lifetime_value)) +
geom_tile()+
labs(x="Employment Status",
y="Highest Education",
fill="Customer Lifetime Value",
title= "Effect of Employement Status and Highest Education on Customer value") +
scale_fill_gradient2(low="red", mid="white", high="blue") +
theme_bw()
plot(customerlife)
Heatmap shows that customers who are employed and are a doctorate has the lowest customer lifetime value, whereas the retired customers with associate education level has the highest customer lifetime value.
4. What is monthly premium which is paid by customers mostly?
range = ggplot(claims_df, aes(x=monthly_premium)) +
geom_histogram(color="black",fill="#decfdd")+labs(x = 'Monthly Premium', title = 'Distribution of Customers paying for Monthly premiums') +
facet_wrap(~vehicle_size, nrow = 3)
plot(range)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The visualization highlights that a higher proportion of customers who own midsize vehicles pay a monthly premium compared to those with other vehicle categories. Moreover, customers across all vehicle categories demonstrate a willingness to pay a monthly premium provided it does not exceed $150. This presents an opportunity to promote monthly premium payments among customers and incentivize those already paying a premium to increase their contributions by offering additional services.
5. In terms of customer lifetime value, which gender exhibits the highest and lowest values across different residence types?
cust_residence_type <- ggplot(claims_df, aes(x=residence_type, y=customer_lifetime_value)) +
geom_boxplot(
aes(color = residence_type),
size = 1.5,
alpha = 0.8
) +
facet_wrap(~gender) +
labs(y = 'Customer Lifetime Value', x = 'Gender', colour='Residence Types', title = 'Gender with highest and lowest customer lifetime value among the different residential categories') +
theme_bw()
plot(cust_residence_type)
Based on the analysis, it is apparent that the customer lifetime value for males residing in urban, sub-urban, and rural areas is lower than that of females. However, customer lifetime value is slightly higher for sub-urban residents compared to other residential categories. Additionally, outliers in lifetime value were observed for males residing in sub-urban areas.
6. List out top customers based on their policy type and vehicle size, that are highly profitable?
claims_df %>% group_by(customer_id) %>% dplyr::summarise(
Policy = policy,
State = vehicle_size,
Profit_customers = max(customer_lifetime_value)) %>% arrange(desc(Profit_customers)) %>% head(10)
## # A tibble: 10 × 4
## customer_id Policy State Profit_customers
## <chr> <fct> <fct> <dbl>
## 1 XT93203 Personal Midsize 16465
## 2 QW44027 Corporate Large 15204
## 3 DU50092 Personal Midsize 14217
## 4 NZ60700 Personal Midsize 14203
## 5 XZ64172 Corporate Midsize 14126
## 6 WP58340 Personal Midsize 13542
## 7 TX35222 Personal Midsize 13403
## 8 UY18770 Personal Midsize 13265
## 9 JC11405 Personal Midsize 13100
## 10 MA15172 Personal Midsize 12900
claims_df %>% group_by(customer_id) %>% dplyr::summarise(
Policy = policy,
State = vehicle_size,
Profit_customers = max(customer_lifetime_value)) %>% arrange(desc(Profit_customers))
## # A tibble: 6,249 × 4
## customer_id Policy State Profit_customers
## <chr> <fct> <fct> <dbl>
## 1 XT93203 Personal Midsize 16465
## 2 QW44027 Corporate Large 15204
## 3 DU50092 Personal Midsize 14217
## 4 NZ60700 Personal Midsize 14203
## 5 XZ64172 Corporate Midsize 14126
## 6 WP58340 Personal Midsize 13542
## 7 TX35222 Personal Midsize 13403
## 8 UY18770 Personal Midsize 13265
## 9 JC11405 Personal Midsize 13100
## 10 MA15172 Personal Midsize 12900
## # … with 6,239 more rows
claims_df %>% group_by(policy, coverage) %>%
summarise(n_customers = n(),
min_profit = min(customer_lifetime_value),
avg_profit = mean(customer_lifetime_value),
max_profit = max(customer_lifetime_value))
## `summarise()` has grouped output by 'policy'. You can override using the
## `.groups` argument.
## # A tibble: 9 × 6
## # Groups: policy [3]
## policy coverage n_customers min_profit avg_profit max_profit
## <fct> <fct> <int> <dbl> <dbl> <dbl>
## 1 Personal Basic 2853 -4285 327. 9872
## 2 Personal Extended 1371 -2767 1533. 10804
## 3 Personal Premium 434 -1703 2917. 16465
## 4 Corporate Basic 805 -3850 367. 10132
## 5 Corporate Extended 401 -2734 1566. 12185
## 6 Corporate Premium 122 -2038 2790. 15204
## 7 Special Basic 157 -3911 94.1 6184
## 8 Special Extended 86 -1090 1502. 8934
## 9 Special Premium 20 -785 2612. 7950
Based on the following observations, it can be inferred that customers with midsize vehicles and a personal policy type are the most profitable, with a maximum profit of 16,465. A further breakdown of customer profits into minimum, average, and maximum categories, based on policy type and coverage, reveals that customers with a personal policy and premium coverage generate the highest profits. In contrast, customers with personal policy and basic coverage, which is the most commonly opted for option, generate a maximum profit of 9,872.
7. Find the number of customers by insurance coverage and Marital status.
# Group by coverage and marital_status and calculate the count for each combination
summary_data <- claims_df %>%
group_by(coverage, marital_status) %>%
summarise(n_marital_status = n())
## `summarise()` has grouped output by 'coverage'. You can override using the
## `.groups` argument.
# Create a bar plot to visualize the counts for each combination of coverage and marital_status
ggplot(summary_data, aes(x = coverage, y = n_marital_status, fill = marital_status)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Counts of Marital Status by Coverage",
x = "Coverage",
y = "Count") +
theme_minimal()
Customers have been classified based on their coverage preferences and marital status. The analysis reveals that the highest number of customers, 2,570, who opted for basic coverage are married. The second-highest group, comprising 1,206 customers, opted for extended coverage and are also married.
8.Which coverage plans have the lowest monthly premium and the highest overall claim amount?
summary_data <- claims_df %>%
group_by(coverage) %>%
summarise(Premium = min(monthly_premium), claims_Amt = sum(total_claims_amount))
# Create a scatter plot to visualize the relationship between Premium and claims_Amt for each coverage
ggplot(summary_data, aes(x = Premium, y = claims_Amt)) +
geom_point(aes(color = coverage)) +
labs(title = "Premium vs. Claims Amount by Coverage",
x = "Minimum Monthly Premium",
y = "Total Claims Amount") +
theme_minimal()
The analysis reveals that all coverage plans have a low premium, which denotes the starting price for each plan type. For the basic plan, the premium starts at 61, with a total claimed amount of 10,995,269. The extended plan, which is priced at 76, has a total claimed amount of 4,717,440. The premium plan, which is priced at 101, has a total claimed amount of 1,439,928, the lowest among all coverage plans.
The analysis reveals several important insights into customer behavior and insurance coverage preferences.
First, male customers make higher total claims than female customers, and therefore, the company is advised to increase premiums for males to offset the higher risk of payouts for accident repairs, total loss car replacements, medical fees, and legal fees.
Secondly, the company is advised to increase premiums when the total claims amount reaches 4,500.
Thirdly, customer lifetime value is higher for retired customers with associate education level, while customers who are employed and have a doctorate have the lowest customer lifetime value.
Fourthly, customer lifetime value for males residing in urban, sub-urban, and rural areas is lower than that of females, but it is slightly higher for sub-urban residents compared to other residential categories.
Fifthly, customers with midsize vehicles and a personal policy type are the most profitable, with a maximum profit of 16,465, while customers with personal policy and basic coverage generate a maximum profit of 9,872.
Based on these findings, several recommendations are made to improve the profitability and sustainability of the insurance company.
Firstly, the company should consider increasing premiums for male customers to offset the higher risk of payouts.
Secondly, the company should increase premiums when the total claims amount reaches 4500.
Thirdly, the company should target retired customers with associate education level, as they have the highest customer lifetime value.
Fourthly, the company should consider offering additional services to incentivize customers to pay a monthly premium of up to $150.
Finally, the company should consider promoting personal policy and premium coverage to maximize profits, while also targeting married customers who opt for basic or extended coverage.