Introduction.

Automobile insurance company located in the southwest and western regions of the United States.Its depend on accurate pricing estimates to maintain profitability. Auto policies must be priced so that the insurance company makes a profit in the long run, given the costs of their customers’ payouts for accident repairs, total loss car replacements, medical fees, and legal fees.

The executives at this insurance company have noticed declining profitability over the last several years and have hired you as a data science consultant to evaluate their claims data and make recommendations on pricing, customer behavior, and car insurance policy adjustments.

This report presents an analysis of insurance policy data from five states, including Washington, California, Arizona, Nevada, and Oregon. company.

Claim_df Dataset

The Claim_df is loaded below and consists of 6,249 auto claims submitted by customers of the insurance company.

Load and Summary the Claim_df Dataset

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 1.0.0 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
library(ggplot2)
library(readr)

claims_df <- 
  readRDS(url('https://gmubusinessanalytics.netlify.app/data/claims_df.rds'))
claims_df
## # A tibble: 6,249 × 20
##    custo…¹ custo…² highe…³ emplo…⁴ gender income resid…⁵ marit…⁶ sales…⁷ cover…⁸
##    <chr>   <fct>   <fct>   <fct>   <fct>   <dbl> <fct>   <fct>   <fct>   <fct>  
##  1 AA11235 Nevada  Bachel… Medica… Female  11167 Suburb… Married Branch  Basic  
##  2 AA16582 Washin… Bachel… Medica… Male    14072 Suburb… Divorc… Agent   Basic  
##  3 AA34092 Califo… Associ… Employ… Male    33635 Suburb… Married Web     Extend…
##  4 AA56476 Arizona High S… Employ… Female  74454 Suburb… Single  Call C… Basic  
##  5 AA69265 Nevada  Bachel… Employ… Female  60817 Suburb… Single  Web     Premium
##  6 AA71604 Arizona Master  Employ… Female  87560 Suburb… Married Web     Extend…
##  7 AA93585 Califo… Associ… Employ… Male    97024 Urban   Married Branch  Premium
##  8 AB21519 Califo… Associ… Employ… Female  93272 Urban   Married Branch  Extend…
##  9 AB23825 Califo… Associ… Employ… Male    21509 Suburb… Single  Agent   Extend…
## 10 AB26022 Oregon  High S… Retired Male    26487 Suburb… Single  Call C… Basic  
## # … with 6,239 more rows, 10 more variables: policy <fct>, vehicle_class <fct>,
## #   vehicle_size <fct>, monthly_premium <dbl>, months_policy_active <dbl>,
## #   months_since_last_claim <dbl>, current_claim_amount <dbl>,
## #   total_claims <dbl>, total_claims_amount <dbl>,
## #   customer_lifetime_value <dbl>, and abbreviated variable names ¹​customer_id,
## #   ²​customer_state, ³​highest_education, ⁴​employment_status, ⁵​residence_type,
## #   ⁶​marital_status, ⁷​sales_channel, ⁸​coverage
summary(claims_df)
##  customer_id           customer_state   highest_education     employment_status
##  Length:6249        Washington: 554   High School:1708    Employed     :5154   
##  Class :character   Oregon    :1763   Associate  :1843    Medical Leave: 421   
##  Mode  :character   California:2150   Bachelor   :1868    Disabled     : 392   
##                     Nevada    : 601   Master     : 578    Retired      : 282   
##                     Arizona   :1181   Doctoral   : 252                         
##                                                                                
##     gender         income       residence_type  marital_status
##  Female:3177   Min.   :10037   Urban   :1495   Single  :1027  
##  Male  :3072   1st Qu.:27750   Suburban:3657   Married :4158  
##                Median :46503   Rural   :1097   Divorced:1064  
##                Mean   :49600                                  
##                3rd Qu.:69259                                  
##                Max.   :99981                                  
##      sales_channel      coverage          policy           vehicle_class 
##  Agent      :2359   Basic   :3815   Personal :4658   Two-Door Car :1292  
##  Branch     :1771   Extended:1858   Corporate:1328   Four-Door Car:3124  
##  Call Center:1218   Premium : 576   Special  : 263   Sports Car   : 335  
##  Web        : 901                                    SUV          :1246  
##                                                      Luxury Car   : 119  
##                                                      Luxury SUV   : 133  
##   vehicle_size  monthly_premium  months_policy_active months_since_last_claim
##  Small  :1249   Min.   : 61.00   Min.   :12.00        Min.   : 0.00          
##  Midsize:4396   1st Qu.: 69.00   1st Qu.:30.00        1st Qu.: 6.00          
##  Large  : 604   Median : 83.00   Median :38.00        Median :14.00          
##                 Mean   : 93.81   Mean   :39.03        Mean   :14.97          
##                 3rd Qu.:109.00   3rd Qu.:48.00        3rd Qu.:23.00          
##                 Max.   :297.00   Max.   :71.00        Max.   :35.00          
##  current_claim_amount  total_claims   total_claims_amount
##  Min.   : 739         Min.   :1.000   Min.   : 859       
##  1st Qu.:1285         1st Qu.:2.000   1st Qu.:2224       
##  Median :1628         Median :2.000   Median :2664       
##  Mean   :1625         Mean   :2.392   Mean   :2745       
##  3rd Qu.:1903         3rd Qu.:3.000   3rd Qu.:3157       
##  Max.   :3259         Max.   :4.000   Max.   :5688       
##  customer_lifetime_value
##  Min.   :-4285.0        
##  1st Qu.: -396.0        
##  Median :  574.0        
##  Mean   :  922.1        
##  3rd Qu.: 1814.0        
##  Max.   :16465.0

Objectives of Automobile Insurance Company analysis:

The objective of this project is to perform an exploratory data analysis on the claims_df dataset and produce an executive summary of your key insights and recommendations to the executive team at the insurance company.

Purpose of Automobile Insurance Company analysis.

The purpose of this analysis is to identify key patterns and trends in customer behavior and insurance coverage preferences, with a view to making recommendations that will improve the profitability and sustainability of the insurance company

1. Which gender has the highest Total Claims amount

total_claims_amount_by_gender <- ggplot(data=claims_df, 
                                        aes(x=gender, 
                                            y= total_claims_amount, 
                                            color = gender)) +
                                        labs(x = 'Gender', y = 'Total Claims Amount', color='Gender', title = 'The total amount claimed from the auto insuarance by gender') +
                                        geom_bar(stat="identity") + 
                                        theme_bw()

plot(total_claims_amount_by_gender)

From the above analysis, we could notice that the Male has highest number of total claims amount than female. It is adviced that the company to increase the premiums for Male as the costs of the male’s payouts for accident repairs, total loss car replacements, medical fees, and legal fees are at higher risk for the company in claims.

2. What is the variation of the total amount of claims by each state?

s<-ggplot(claims_df, aes(customer_state, total_claims_amount)) +
geom_boxplot() + theme_bw() +labs(x = '5 US States', y = 'Total Claims Amount', title = 'Variation of total amount of claims over 5 US states')

plot(s)

This dataset contains “customer_state” which has 5 states Washington, California, Arizona, Nevada, Oregon. From the above analysis, it is noticed that the average total amount of claims from all the five states are in the same range, however there are outliers in each state’s total amount of claims. It is adviced to increase the premimum for customers when the total claims amount reaches 4500.

3. What is the relationship between customer lifetime value with their highest education and their occupation?

customerlife <- ggplot(claims_df, 
                   aes(employment_status, highest_education, fill= customer_lifetime_value)) +
                   geom_tile()+ 
                   labs(x="Employment Status", 
                        y="Highest Education",
                        fill="Customer Lifetime Value", 
                        title= "Effect of Employement Status and Highest Education on Customer value") +
                   scale_fill_gradient2(low="red", mid="white", high="blue") +
                   theme_bw()

plot(customerlife)

Heatmap shows that customers who are employed and are a doctorate has the lowest customer lifetime value, whereas the retired customers with associate education level has the highest customer lifetime value.

4. What is monthly premium which is paid by customers mostly?

range = ggplot(claims_df, aes(x=monthly_premium)) +
  geom_histogram(color="black",fill="#decfdd")+labs(x = 'Monthly Premium', title = 'Distribution of Customers paying for Monthly premiums') +
facet_wrap(~vehicle_size, nrow = 3)
plot(range)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The visualization highlights that a higher proportion of customers who own midsize vehicles pay a monthly premium compared to those with other vehicle categories. Moreover, customers across all vehicle categories demonstrate a willingness to pay a monthly premium provided it does not exceed $150. This presents an opportunity to promote monthly premium payments among customers and incentivize those already paying a premium to increase their contributions by offering additional services.

5. In terms of customer lifetime value, which gender exhibits the highest and lowest values across different residence types?

cust_residence_type <- ggplot(claims_df, aes(x=residence_type, y=customer_lifetime_value)) +
  geom_boxplot(
    aes(color = residence_type),
    size = 1.5, 
    alpha = 0.8
  ) +
facet_wrap(~gender) +
labs(y = 'Customer Lifetime Value', x = 'Gender', colour='Residence Types', title = 'Gender with highest and lowest customer lifetime value among the different residential categories') +
theme_bw()

plot(cust_residence_type)

Based on the analysis, it is apparent that the customer lifetime value for males residing in urban, sub-urban, and rural areas is lower than that of females. However, customer lifetime value is slightly higher for sub-urban residents compared to other residential categories. Additionally, outliers in lifetime value were observed for males residing in sub-urban areas.

6. List out top customers based on their policy type and vehicle size, that are highly profitable?

claims_df %>% group_by(customer_id) %>% dplyr::summarise(
        Policy = policy, 
        State = vehicle_size, 
        Profit_customers = max(customer_lifetime_value)) %>% arrange(desc(Profit_customers)) %>% head(10)
## # A tibble: 10 × 4
##    customer_id Policy    State   Profit_customers
##    <chr>       <fct>     <fct>              <dbl>
##  1 XT93203     Personal  Midsize            16465
##  2 QW44027     Corporate Large              15204
##  3 DU50092     Personal  Midsize            14217
##  4 NZ60700     Personal  Midsize            14203
##  5 XZ64172     Corporate Midsize            14126
##  6 WP58340     Personal  Midsize            13542
##  7 TX35222     Personal  Midsize            13403
##  8 UY18770     Personal  Midsize            13265
##  9 JC11405     Personal  Midsize            13100
## 10 MA15172     Personal  Midsize            12900
claims_df %>% group_by(customer_id) %>% dplyr::summarise(
        Policy = policy, 
        State = vehicle_size, 
        Profit_customers = max(customer_lifetime_value)) %>% arrange(desc(Profit_customers))
## # A tibble: 6,249 × 4
##    customer_id Policy    State   Profit_customers
##    <chr>       <fct>     <fct>              <dbl>
##  1 XT93203     Personal  Midsize            16465
##  2 QW44027     Corporate Large              15204
##  3 DU50092     Personal  Midsize            14217
##  4 NZ60700     Personal  Midsize            14203
##  5 XZ64172     Corporate Midsize            14126
##  6 WP58340     Personal  Midsize            13542
##  7 TX35222     Personal  Midsize            13403
##  8 UY18770     Personal  Midsize            13265
##  9 JC11405     Personal  Midsize            13100
## 10 MA15172     Personal  Midsize            12900
## # … with 6,239 more rows
claims_df %>% group_by(policy, coverage) %>% 
                  summarise(n_customers = n(),
                            min_profit = min(customer_lifetime_value),
                            avg_profit = mean(customer_lifetime_value),
                            max_profit = max(customer_lifetime_value))
## `summarise()` has grouped output by 'policy'. You can override using the
## `.groups` argument.
## # A tibble: 9 × 6
## # Groups:   policy [3]
##   policy    coverage n_customers min_profit avg_profit max_profit
##   <fct>     <fct>          <int>      <dbl>      <dbl>      <dbl>
## 1 Personal  Basic           2853      -4285      327.        9872
## 2 Personal  Extended        1371      -2767     1533.       10804
## 3 Personal  Premium          434      -1703     2917.       16465
## 4 Corporate Basic            805      -3850      367.       10132
## 5 Corporate Extended         401      -2734     1566.       12185
## 6 Corporate Premium          122      -2038     2790.       15204
## 7 Special   Basic            157      -3911       94.1       6184
## 8 Special   Extended          86      -1090     1502.        8934
## 9 Special   Premium           20       -785     2612.        7950

Based on the following observations, it can be inferred that customers with midsize vehicles and a personal policy type are the most profitable, with a maximum profit of 16,465. A further breakdown of customer profits into minimum, average, and maximum categories, based on policy type and coverage, reveals that customers with a personal policy and premium coverage generate the highest profits. In contrast, customers with personal policy and basic coverage, which is the most commonly opted for option, generate a maximum profit of 9,872.

7. Find the number of customers by insurance coverage and Marital status.

# Group by coverage and marital_status and calculate the count for each combination
summary_data <- claims_df %>%
  group_by(coverage, marital_status) %>%
  summarise(n_marital_status = n())
## `summarise()` has grouped output by 'coverage'. You can override using the
## `.groups` argument.
# Create a bar plot to visualize the counts for each combination of coverage and marital_status
ggplot(summary_data, aes(x = coverage, y = n_marital_status, fill = marital_status)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Counts of Marital Status by Coverage",
       x = "Coverage",
       y = "Count") +
  theme_minimal()

Customers have been classified based on their coverage preferences and marital status. The analysis reveals that the highest number of customers, 2,570, who opted for basic coverage are married. The second-highest group, comprising 1,206 customers, opted for extended coverage and are also married.

8.Which coverage plans have the lowest monthly premium and the highest overall claim amount?

summary_data <- claims_df %>%
  group_by(coverage) %>%
  summarise(Premium = min(monthly_premium), claims_Amt = sum(total_claims_amount))

# Create a scatter plot to visualize the relationship between Premium and claims_Amt for each coverage
ggplot(summary_data, aes(x = Premium, y = claims_Amt)) +
  geom_point(aes(color = coverage)) +
  labs(title = "Premium vs. Claims Amount by Coverage",
       x = "Minimum Monthly Premium",
       y = "Total Claims Amount") +
  theme_minimal()

The analysis reveals that all coverage plans have a low premium, which denotes the starting price for each plan type. For the basic plan, the premium starts at 61, with a total claimed amount of 10,995,269. The extended plan, which is priced at 76, has a total claimed amount of 4,717,440. The premium plan, which is priced at 101, has a total claimed amount of 1,439,928, the lowest among all coverage plans.

Key Findings of Automobile Insurance Company analysis.

The analysis reveals several important insights into customer behavior and insurance coverage preferences.

Recommendations and Conclusion of Automobile Insurance Company analysis.

Based on these findings, several recommendations are made to improve the profitability and sustainability of the insurance company.