Exploratory Data Analysis of CT Insurance Company Complaints, Resolutions, Status, and Recoveries

Intro

Insurance is a promise between the insurer and the insuree to provide a service in a future in the event of an accident in exchange for a premium. Everyone needs it and everyone has it as mandated by the State of Connecticut Department of Insurance which is dependent on your assets in vehicle & home ownership as well as preference in health insurance needs. As consumers our major concern is that in the event of a car accident that our promise and agreements with our trusted insurance company are upheld and rectified with fair compensation to resume life after chaos. With access to open data and the ability to report inconsistencies within the sometimes complicated claim process we can make informed decisions about where to place our assets for the most optimal protection in Connecticut among the seemingly endless options for insurance carriers. This dataset ranges on opened claim complaints starting in 2007 to 2016 (present). Since most companies have several lines of business it makes more sense to visually assess and make general observations the data by company instead of aggregating by type, which is not included in the original data.

# get data
library(RSocrata)
url <- "https://data.ct.gov/Business/Insurance-Company-Complaints-Resolutions-Status-an/t64r-mt64"
insur <- read.socrata(url)
save(insur, file ="insur.RData")
# descriptives about the data set
str(insur)

## 'data.frame':    13935 obs. of  11 variables:
##  $ Company    : Factor w/ 677 levels "21st Century Assurance Company",..: 1 2 3 3 3 3 4 4 5 5 ...
##  $ File.No.   : int  302165 201212 404371 300917 201185 105854 204467 202096 600004 505873 ...
##  $ Opened     : POSIXlt, format: "2013-05-15" "2012-03-08" ...
##  $ Closed     : POSIXlt, format: "2013-08-13" "2012-04-03" ...
##  $ Coverage   : Factor w/ 68 levels "A & H","Auto",..: 4 5 2 26 5 2 5 6 4 5 ...
##  $ Reason     : Factor w/ 13 levels "Employer Handling",..: 10 10 9 3 10 9 10 10 10 10 ...
##  $ SubReason  : Factor w/ 70 levels "Actual Cash Value Dispute",..: 6 6 27 15 7 3 1 7 12 46 ...
##  $ Disposition: Factor w/ 53 levels "Accident Occurred in another State",..: 6 6 23 34 27 4 13 34 7 37 ...
##  $ Conclusion : Factor w/ 8 levels "Furnished Information",..: 6 6 6 3 3 6 6 3 6 6 ...
##  $ Recovery   : num  2586 0 0 0 0 ...
##  $ Status     : Factor w/ 5 levels "Closed","Open",..: 1 1 1 1 1 1 1 1 1 1 ...

dim(insur)

## [1] 13935    11

# clean the data
insur$Recovery <- gsub("^[[:punct:]]", "", insur$Recovery) # remove the $
insur$Recovery <- as.numeric(insur$Recovery) # change into a numeric data type
# make a copy of the dataset without the dates columns to get around: Error: column 'Opened' has unsupported type : POSIXlt, POSIXt
insur2 <- insur[, -c(3:4)]
# or change data type again
insur$Opened <- as.POSIXct(insur$Opened)
insur$Closed <- as.POSIXct(insur$Closed)

The most complained about insurance companies

With 677 unique insurance companies in this list is is most worthwhile to observe the top 25 companies with the most complaint counts. There may be times when parent companies use other (subsidiaries) companies to give their members and providers certain types of services and state-specific coverage which can explain the large amount of companies in this dataset with similar umbrella names however for this analysis they will be treated as separate companies as listed. The top 25 companies on this list provide a variety of coverage from auto, home, health, commercial coverage and even worker’s compensation coverage - so these complaints are not an artifact of a particular type of peril but an overarching look into how claims adjusters at particular companies settle claims. Anthem seems to have many more complaints compared to the other companies and this could be directly attributed to the fact that Anthem (Blue Cross and Blue Shield, headquartered in Wallingford, CT) is the number one ranked in enrollment in the state. It is also observable that the companies in the health and life insurance sector see more submitted claims due to the nature of healthcare reimbursement and public perception as compared to auto insurance where not every incident is filed as a claim. With the exclusion of Anthem (as an outlier) the most populous complaints still arise from health and life insurance companies from this list. However from this visualization you can interpret that Anthem Health receives more ‘Unfair Claims Practice’ claims as a reason them some companies receive in total complaints. (Source: http://www.hartfordbusiness.com/article/20150601/DATAJOE/150609990/largest-connecticut-health-insurance-companies). A detailed look into why consumers submit complaints yields that some of the top reasons are: Denial of Claim, Claim Delay, Claim Procedure, Excessive charges, Unsatisfactory Settlement, Cancelation, and Premium/Rate Increase - which all center on customer expereince, claim handling and retension.

library(ggplot2)
library(plotly) # development github version
library(dplyr)
# 677 unique companies listed
comp_count <- insur %>% count(Company)
sort(comp_count$n, decreasing = T)

##   [1] 1828  356  338  306  303  284  260  235  221  214  210  175  173  168
##  [15]  154  144  141  138  129  128  125  125  118  115  111  111  101  100
##  [29]   93   92   92   87   86   86   84   84   83   82   79   79   79   79
##  [43]   78   78   76   75   74   73   73   73   71   69   69   66   65   65
##  [57]   64   64   62   59   58   57   54   52   49   48   48   48   47   47
##  [71]   45   44   43   43   41   40   39   36   36   35   34   34   33   32
##  [85]   32   32   32   31   30   30   30   29   29   28   28   27   27   27
##  [99]   27   27   27   26   26   26   26   26   25   25   25   25   23   23
## [113]   23   22   22   22   22   22   22   22   21   21   20   20   20   20
## [127]   20   20   19   19   19   19   18   18   18   17   17   17   17   17
## [141]   17   17   17   17   17   17   16   16   16   16   16   16   16   16
## [155]   16   16   16   16   16   16   15   15   15   15   15   15   15   14
## [169]   14   14   14   14   14   14   13   13   13   13   13   13   13   13
## [183]   13   12   12   12   12   12   12   12   12   12   12   12   12   11
## [197]   11   11   11   11   11   11   11   11   10   10   10   10   10   10
## [211]   10   10   10   10    9    9    9    9    9    9    9    9    9    9
## [225]    9    9    9    9    9    8    8    8    8    8    8    8    8    8
## [239]    8    8    8    8    8    8    8    8    8    8    7    7    7    7
## [253]    7    7    7    7    7    7    7    7    7    7    7    6    6    6
## [267]    6    6    6    6    6    6    6    6    6    6    6    6    6    6
## [281]    6    6    6    6    6    6    6    6    5    5    5    5    5    5
## [295]    5    5    5    5    5    5    5    5    5    5    5    5    5    5
## [309]    5    5    5    5    5    5    5    4    4    4    4    4    4    4
## [323]    4    4    4    4    4    4    4    4    4    4    4    4    4    4
## [337]    4    4    4    4    4    4    4    4    4    4    4    4    4    4
## [351]    4    4    4    3    3    3    3    3    3    3    3    3    3    3
## [365]    3    3    3    3    3    3    3    3    3    3    3    3    3    3
## [379]    3    3    3    3    3    3    3    3    3    3    3    3    3    3
## [393]    3    3    3    3    3    3    3    3    2    2    2    2    2    2
## [407]    2    2    2    2    2    2    2    2    2    2    2    2    2    2
## [421]    2    2    2    2    2    2    2    2    2    2    2    2    2    2
## [435]    2    2    2    2    2    2    2    2    2    2    2    2    2    2
## [449]    2    2    2    2    2    2    2    2    2    2    2    2    2    2
## [463]    2    2    2    2    2    2    2    2    2    2    2    2    2    2
## [477]    2    2    2    2    2    2    2    1    1    1    1    1    1    1
## [491]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [505]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [519]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [533]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [547]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [561]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [575]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [589]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [603]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [617]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [631]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [645]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [659]    1    1    1    1    1    1    1    1    1    1    1    1    1    1
## [673]    1    1    1    1    1

# data reduction to top 25 frequency companies
top_comp <- names(sort(table(insur$Company), decreasing=TRUE)[1:25])
# create a new dataset with only the top 25 most complained about companies
new_df <- filter(insur, Company %in% top_comp)
# Clean up company names for plotting
new_df$Company_title <- gsub("Insurance Company", "", x = new_df$Company)
new_df$Company_title <- gsub("(The)", "", x = new_df$Company_title, fixed = TRUE)
# plot frequency of top complaint companies listed
most_comp <- ggplot(new_df, aes(x= Company_title, fill = Reason)) + 
                geom_bar(stat="count") +
                coord_flip() +
                labs(title = "Top 25 Connecticut Insurance Company Complaints", 
                     x = "", y = "Complaint Count") 

#most_comp # static version

(gg <- ggplotly(most_comp) %>%
      layout(margin = list(l = 230))) # interactive version

# A detailed look into why consumers submit complaints for the entire data set
library(data.table)
detail_comp <- count(insur, SubReason)
d <- data.table(detail_comp, key="n")
d[, tail(.SD, 3), by=n]

##        n                                    SubReason
##  1:    1                    External Appeal Not Given
##  2:    1               U.S. Department of Labor/Cobra
##  3:    1                             Div Illustration
##  4:    2                     Viatical/Life Settlement
##  5:    2                                 Service Fees
##  6:    2                                     Medicare
##  7:    3                                         FEMA
##  8:    4                        Other CT state agency
##  9:    5                                   Other Fees
## 10:    6                       Unfair Discriminati on
## 11:    7                 State Continuation (38A-538)
## 12:    7                                  No Response
## 13:    8           Carrier Never Received Application
## 14:    8                               Classification
## 15:    9                            Benefit Extension
## 16:    9                       Unprofessional Conduct
## 17:   10           Internal/External Appeal Procedure
## 18:   12                                   Rescission
## 19:   16                                   Time Delay
## 20:   17                              Unapproved Form
## 21:   18 After Market Parts/Unsatisfactory Settlement
## 22:   18                                   Labor Rate
## 23:   20                                    Mis-Quote
## 24:   26                     Usual and Customary Fees
## 25:   28                   Underwriting/Waivers/Rated
## 26:   29                          Underwriting Delays
## 27:   32                             Mandated Benefit
## 28:   32                                     Steering
## 29:   33                 Carrier Never Received Claim
## 30:   33                           Policy Issue Delay
## 31:   40                                 Storage Fees
## 32:   44                Discontinuation & Replacement
## 33:   51                       Pre-Existing Condition
## 34:   55                             Diminished Value
## 35:   58                              Case Management
## 36:   62                                  Replacement
## 37:   67                           Duplicate Coverage
## 38:   72                                  Subrogation
## 39:   85                      Failed to Remit Premium
## 40:  101                            Refusal to Insure
## 41:  105                      Coordination of Benefit
## 42:  111                                        Audit
## 43:  126                     Other [Enter Sub-Reason]
## 44:  145                      Provider Contract Issue
## 45:  146                                  Loss of Use
## 46:  167                       Misleading Advertising
## 47:  177                    Actual Cash Value Dispute
## 48:  185                               CPT Code Issue
## 49:  200                            Misrepresentation
## 50:  201                               Policy Service
## 51:  209                                 No Subreason
## 52:  214                      Eligibility of Provider
## 53:  226                            Producer Handling
## 54:  280                       Comparative Negligence
## 55:  280                         Policy Service Delay
## 56:  292                         Premium Refund Delay
## 57:  308                     No Coverage/Premium Paid
## 58:  314                               Premium/Notice
## 59:  315                      Not Medically Necessary
## 60:  361                           Contract Provision
## 61:  378                         Unsatisfactory Offer
## 62:  535                                  Non-Renewal
## 63:  613                        Premium/Rate Increase
## 64:  620                                 Cancellation
## 65:  785                    Unsatisfactory Settlement
## 66:  899                            Excessive Charges
## 67: 1044                              Claim Procedure
## 68: 1405                                 Claim Delays
## 69: 2258                              Denial of Claim
##        n                                    SubReason

# Denial of Claim, Claim Delay, Claim Procedure, Excessive charges, Unsatisfactory Settlement, Cancelation, and Premium/Rate Increase

Total time for insurance resolution of a claim

An insurance claim submission can seem like a arduous process and in a time of need it’s understandable to want swift fast resolution. Depending on the complexity of the event, you could spend a lot of time negotiate with your insurance carrier for a resolution or be limited in submission due to a hospital stay and unable to file immediately at the point of incurred loss. Based on these submitted complaints, a large consumer pain point would be a extended time without compensation. This box-plot shows the distribution of time in days for the amount of time from the submission date of a claim to the closed date. The five number summary includes the average amount of days until payout or resolution and the median amount of days which does not take into the account of outliers, as well as the minimum days, 1st (25%) quarterly, 3rd (75%) quartile and the maximum days. As a consumer it is positive to see that on average your claim will be resolved in approximately one month as seen above that one of the more prevalent reasons for submitting a complaint is claim delays. It is also worth interpreting that this list comprises of submitted complaints to companies after intial communication on claims so this total time for resolution can be very different than regular claims process.

new_df$total_time <- new_df$Closed - new_df$Opened # calculated in seconds 
new_df$total_time <- new_df$total_time / 86400 # in days
new_df$total_time <- as.numeric(new_df$total_time) # change from "difftime" to "numeric"
# Remove outliers
library(data.table)
source('~/Desktop/R-directory/dumas/R/outlierReplace.R')
outlierReplace(new_df, "total_time", which(new_df$total_time > 100), NA)
# remove NA since changing to zero would alter stats
new_df <- na.omit(new_df) 

total_plot <- ggplot(new_df, aes(x = Company_title, y = total_time, fill = Company_title)) +
                geom_boxplot() +
                coord_flip() +
                labs(title = "Total time for insurance resolution of a claim", 
                     x = "", y = "Time (in days)") +
                guides(fill=FALSE)

#total_plot # static version

(gg2 <- ggplotly(total_plot) %>%
      layout(margin = list(l = 230))) # interactive version

#plot_ly(new_df, y = total_time, color = Company_title, type = "box" ) # none flipped version

summary(new_df$total_time) # stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   13.00   20.00   25.12   32.00   99.96

Claims Payout

Depending on a consumers individual insurance coverage and deductible limits the claim amount will differ and the thought of a large out-of-pocket expense is a harsh reality. This plot depicts that health and life insurance companies are paying a lot (for certain outliers) in claims settlement which is inline with the notion of current healthcare expenses however the average claims payout value is less than $1000. Note: There is only a single bar for Genworth Life indicating that only one value of $604.56 was allocated as recovery for that claim. For other claims in this dataset the value for the recovery could have been resolved to $0.00 which was removed in this vizualization.

# recode zero payout to NA
new_df[new_df$Recovery == 0, ] <- NA
# Remove outliers
outlierReplace(new_df, "Recovery", which(new_df$Recovery > 1000), NA)
# omit NA values
new_df <- na.omit(new_df)

pay_res <- ggplot(new_df, aes(x=Company_title, 
                              y = Recovery, fill = Company_title)) +
                  geom_boxplot() +
                  coord_flip() + 
                   labs(title = "Claims Recovery Payout", 
                     x = "", y = "Recovery ($ dollars)") +
                  guides(fill=FALSE)

#pay_res # static version

(gg3 <- ggplotly(pay_res) %>%
      layout(margin = list(l = 230))) # interactive

summary(new_df$Recovery) # stats

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.87  151.20  336.60  384.50  585.20 1000.00

Conculsion

In summary the company with the most complaints also provides the most coverage/enrollment in Connecticut. With the amount of insurance companies licensed in Connecticut it is obvious that we are plagued and privy to our choice in coverage for our most valuable assess. If customer experience in our most sensitive time of need (ie. a car accident or weather damage to a roof) is important to you, it would be wise to research insurance companies aside from the usual websites and learn from crowd-sourced data which can indicated quality and public perception better than whose is the most popular.