Insurance is a promise between the insurer and the insuree to provide a service in a future in the event of an accident in exchange for a premium. Everyone needs it and everyone has it as mandated by the State of Connecticut Department of Insurance which is dependent on your assets in vehicle & home ownership as well as preference in health insurance needs. As consumers our major concern is that in the event of a car accident that our promise and agreements with our trusted insurance company are upheld and rectified with fair compensation to resume life after chaos. With access to open data and the ability to report inconsistencies within the sometimes complicated claim process we can make informed decisions about where to place our assets for the most optimal protection in Connecticut among the seemingly endless options for insurance carriers. This dataset ranges on opened claim complaints starting in 2007 to 2016 (present). Since most companies have several lines of business it makes more sense to visually assess and make general observations the data by company instead of aggregating by type, which is not included in the original data.
# get data
library(RSocrata)
url <- "https://data.ct.gov/Business/Insurance-Company-Complaints-Resolutions-Status-an/t64r-mt64"
insur <- read.socrata(url)
save(insur, file ="insur.RData")
# descriptives about the data set
str(insur)
## 'data.frame': 13935 obs. of 11 variables:
## $ Company : Factor w/ 677 levels "21st Century Assurance Company",..: 1 2 3 3 3 3 4 4 5 5 ...
## $ File.No. : int 302165 201212 404371 300917 201185 105854 204467 202096 600004 505873 ...
## $ Opened : POSIXlt, format: "2013-05-15" "2012-03-08" ...
## $ Closed : POSIXlt, format: "2013-08-13" "2012-04-03" ...
## $ Coverage : Factor w/ 68 levels "A & H","Auto",..: 4 5 2 26 5 2 5 6 4 5 ...
## $ Reason : Factor w/ 13 levels "Employer Handling",..: 10 10 9 3 10 9 10 10 10 10 ...
## $ SubReason : Factor w/ 70 levels "Actual Cash Value Dispute",..: 6 6 27 15 7 3 1 7 12 46 ...
## $ Disposition: Factor w/ 53 levels "Accident Occurred in another State",..: 6 6 23 34 27 4 13 34 7 37 ...
## $ Conclusion : Factor w/ 8 levels "Furnished Information",..: 6 6 6 3 3 6 6 3 6 6 ...
## $ Recovery : num 2586 0 0 0 0 ...
## $ Status : Factor w/ 5 levels "Closed","Open",..: 1 1 1 1 1 1 1 1 1 1 ...
dim(insur)
## [1] 13935 11
# clean the data
insur$Recovery <- gsub("^[[:punct:]]", "", insur$Recovery) # remove the $
insur$Recovery <- as.numeric(insur$Recovery) # change into a numeric data type
# make a copy of the dataset without the dates columns to get around: Error: column 'Opened' has unsupported type : POSIXlt, POSIXt
insur2 <- insur[, -c(3:4)]
# or change data type again
insur$Opened <- as.POSIXct(insur$Opened)
insur$Closed <- as.POSIXct(insur$Closed)
With 677 unique insurance companies in this list is is most worthwhile to observe the top 25 companies with the most complaint counts. There may be times when parent companies use other (subsidiaries) companies to give their members and providers certain types of services and state-specific coverage which can explain the large amount of companies in this dataset with similar umbrella names however for this analysis they will be treated as separate companies as listed. The top 25 companies on this list provide a variety of coverage from auto, home, health, commercial coverage and even worker’s compensation coverage - so these complaints are not an artifact of a particular type of peril but an overarching look into how claims adjusters at particular companies settle claims. Anthem seems to have many more complaints compared to the other companies and this could be directly attributed to the fact that Anthem (Blue Cross and Blue Shield, headquartered in Wallingford, CT) is the number one ranked in enrollment in the state. It is also observable that the companies in the health and life insurance sector see more submitted claims due to the nature of healthcare reimbursement and public perception as compared to auto insurance where not every incident is filed as a claim. With the exclusion of Anthem (as an outlier) the most populous complaints still arise from health and life insurance companies from this list. However from this visualization you can interpret that Anthem Health receives more ‘Unfair Claims Practice’ claims as a reason them some companies receive in total complaints. (Source: http://www.hartfordbusiness.com/article/20150601/DATAJOE/150609990/largest-connecticut-health-insurance-companies). A detailed look into why consumers submit complaints yields that some of the top reasons are: Denial of Claim, Claim Delay, Claim Procedure, Excessive charges, Unsatisfactory Settlement, Cancelation, and Premium/Rate Increase - which all center on customer expereince, claim handling and retension.
library(ggplot2)
library(plotly) # development github version
library(dplyr)
# 677 unique companies listed
comp_count <- insur %>% count(Company)
sort(comp_count$n, decreasing = T)
## [1] 1828 356 338 306 303 284 260 235 221 214 210 175 173 168
## [15] 154 144 141 138 129 128 125 125 118 115 111 111 101 100
## [29] 93 92 92 87 86 86 84 84 83 82 79 79 79 79
## [43] 78 78 76 75 74 73 73 73 71 69 69 66 65 65
## [57] 64 64 62 59 58 57 54 52 49 48 48 48 47 47
## [71] 45 44 43 43 41 40 39 36 36 35 34 34 33 32
## [85] 32 32 32 31 30 30 30 29 29 28 28 27 27 27
## [99] 27 27 27 26 26 26 26 26 25 25 25 25 23 23
## [113] 23 22 22 22 22 22 22 22 21 21 20 20 20 20
## [127] 20 20 19 19 19 19 18 18 18 17 17 17 17 17
## [141] 17 17 17 17 17 17 16 16 16 16 16 16 16 16
## [155] 16 16 16 16 16 16 15 15 15 15 15 15 15 14
## [169] 14 14 14 14 14 14 13 13 13 13 13 13 13 13
## [183] 13 12 12 12 12 12 12 12 12 12 12 12 12 11
## [197] 11 11 11 11 11 11 11 11 10 10 10 10 10 10
## [211] 10 10 10 10 9 9 9 9 9 9 9 9 9 9
## [225] 9 9 9 9 9 8 8 8 8 8 8 8 8 8
## [239] 8 8 8 8 8 8 8 8 8 8 7 7 7 7
## [253] 7 7 7 7 7 7 7 7 7 7 7 6 6 6
## [267] 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [281] 6 6 6 6 6 6 6 6 5 5 5 5 5 5
## [295] 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [309] 5 5 5 5 5 5 5 4 4 4 4 4 4 4
## [323] 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [337] 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [351] 4 4 4 3 3 3 3 3 3 3 3 3 3 3
## [365] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [379] 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [393] 3 3 3 3 3 3 3 3 2 2 2 2 2 2
## [407] 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [421] 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [435] 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [449] 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [463] 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [477] 2 2 2 2 2 2 2 1 1 1 1 1 1 1
## [491] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [505] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [533] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [547] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [561] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [575] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [589] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [603] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [617] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [631] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [645] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [659] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [673] 1 1 1 1 1
# data reduction to top 25 frequency companies
top_comp <- names(sort(table(insur$Company), decreasing=TRUE)[1:25])
# create a new dataset with only the top 25 most complained about companies
new_df <- filter(insur, Company %in% top_comp)
# Clean up company names for plotting
new_df$Company_title <- gsub("Insurance Company", "", x = new_df$Company)
new_df$Company_title <- gsub("(The)", "", x = new_df$Company_title, fixed = TRUE)
# plot frequency of top complaint companies listed
most_comp <- ggplot(new_df, aes(x= Company_title, fill = Reason)) +
geom_bar(stat="count") +
coord_flip() +
labs(title = "Top 25 Connecticut Insurance Company Complaints",
x = "", y = "Complaint Count")
#most_comp # static version
(gg <- ggplotly(most_comp) %>%
layout(margin = list(l = 230))) # interactive version
# A detailed look into why consumers submit complaints for the entire data set
library(data.table)
detail_comp <- count(insur, SubReason)
d <- data.table(detail_comp, key="n")
d[, tail(.SD, 3), by=n]
## n SubReason
## 1: 1 External Appeal Not Given
## 2: 1 U.S. Department of Labor/Cobra
## 3: 1 Div Illustration
## 4: 2 Viatical/Life Settlement
## 5: 2 Service Fees
## 6: 2 Medicare
## 7: 3 FEMA
## 8: 4 Other CT state agency
## 9: 5 Other Fees
## 10: 6 Unfair Discriminati on
## 11: 7 State Continuation (38A-538)
## 12: 7 No Response
## 13: 8 Carrier Never Received Application
## 14: 8 Classification
## 15: 9 Benefit Extension
## 16: 9 Unprofessional Conduct
## 17: 10 Internal/External Appeal Procedure
## 18: 12 Rescission
## 19: 16 Time Delay
## 20: 17 Unapproved Form
## 21: 18 After Market Parts/Unsatisfactory Settlement
## 22: 18 Labor Rate
## 23: 20 Mis-Quote
## 24: 26 Usual and Customary Fees
## 25: 28 Underwriting/Waivers/Rated
## 26: 29 Underwriting Delays
## 27: 32 Mandated Benefit
## 28: 32 Steering
## 29: 33 Carrier Never Received Claim
## 30: 33 Policy Issue Delay
## 31: 40 Storage Fees
## 32: 44 Discontinuation & Replacement
## 33: 51 Pre-Existing Condition
## 34: 55 Diminished Value
## 35: 58 Case Management
## 36: 62 Replacement
## 37: 67 Duplicate Coverage
## 38: 72 Subrogation
## 39: 85 Failed to Remit Premium
## 40: 101 Refusal to Insure
## 41: 105 Coordination of Benefit
## 42: 111 Audit
## 43: 126 Other [Enter Sub-Reason]
## 44: 145 Provider Contract Issue
## 45: 146 Loss of Use
## 46: 167 Misleading Advertising
## 47: 177 Actual Cash Value Dispute
## 48: 185 CPT Code Issue
## 49: 200 Misrepresentation
## 50: 201 Policy Service
## 51: 209 No Subreason
## 52: 214 Eligibility of Provider
## 53: 226 Producer Handling
## 54: 280 Comparative Negligence
## 55: 280 Policy Service Delay
## 56: 292 Premium Refund Delay
## 57: 308 No Coverage/Premium Paid
## 58: 314 Premium/Notice
## 59: 315 Not Medically Necessary
## 60: 361 Contract Provision
## 61: 378 Unsatisfactory Offer
## 62: 535 Non-Renewal
## 63: 613 Premium/Rate Increase
## 64: 620 Cancellation
## 65: 785 Unsatisfactory Settlement
## 66: 899 Excessive Charges
## 67: 1044 Claim Procedure
## 68: 1405 Claim Delays
## 69: 2258 Denial of Claim
## n SubReason
# Denial of Claim, Claim Delay, Claim Procedure, Excessive charges, Unsatisfactory Settlement, Cancelation, and Premium/Rate Increase
An insurance claim submission can seem like a arduous process and in a time of need it’s understandable to want swift fast resolution. Depending on the complexity of the event, you could spend a lot of time negotiate with your insurance carrier for a resolution or be limited in submission due to a hospital stay and unable to file immediately at the point of incurred loss. Based on these submitted complaints, a large consumer pain point would be a extended time without compensation. This box-plot shows the distribution of time in days for the amount of time from the submission date of a claim to the closed date. The five number summary includes the average amount of days until payout or resolution and the median amount of days which does not take into the account of outliers, as well as the minimum days, 1st (25%) quarterly, 3rd (75%) quartile and the maximum days. As a consumer it is positive to see that on average your claim will be resolved in approximately one month as seen above that one of the more prevalent reasons for submitting a complaint is claim delays. It is also worth interpreting that this list comprises of submitted complaints to companies after intial communication on claims so this total time for resolution can be very different than regular claims process.
new_df$total_time <- new_df$Closed - new_df$Opened # calculated in seconds
new_df$total_time <- new_df$total_time / 86400 # in days
new_df$total_time <- as.numeric(new_df$total_time) # change from "difftime" to "numeric"
# Remove outliers
library(data.table)
source('~/Desktop/R-directory/dumas/R/outlierReplace.R')
outlierReplace(new_df, "total_time", which(new_df$total_time > 100), NA)
# remove NA since changing to zero would alter stats
new_df <- na.omit(new_df)
total_plot <- ggplot(new_df, aes(x = Company_title, y = total_time, fill = Company_title)) +
geom_boxplot() +
coord_flip() +
labs(title = "Total time for insurance resolution of a claim",
x = "", y = "Time (in days)") +
guides(fill=FALSE)
#total_plot # static version
(gg2 <- ggplotly(total_plot) %>%
layout(margin = list(l = 230))) # interactive version
#plot_ly(new_df, y = total_time, color = Company_title, type = "box" ) # none flipped version
summary(new_df$total_time) # stats
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 13.00 20.00 25.12 32.00 99.96
Depending on a consumers individual insurance coverage and deductible limits the claim amount will differ and the thought of a large out-of-pocket expense is a harsh reality. This plot depicts that health and life insurance companies are paying a lot (for certain outliers) in claims settlement which is inline with the notion of current healthcare expenses however the average claims payout value is less than $1000. Note: There is only a single bar for Genworth Life indicating that only one value of $604.56 was allocated as recovery for that claim. For other claims in this dataset the value for the recovery could have been resolved to $0.00 which was removed in this vizualization.
# recode zero payout to NA
new_df[new_df$Recovery == 0, ] <- NA
# Remove outliers
outlierReplace(new_df, "Recovery", which(new_df$Recovery > 1000), NA)
# omit NA values
new_df <- na.omit(new_df)
pay_res <- ggplot(new_df, aes(x=Company_title,
y = Recovery, fill = Company_title)) +
geom_boxplot() +
coord_flip() +
labs(title = "Claims Recovery Payout",
x = "", y = "Recovery ($ dollars)") +
guides(fill=FALSE)
#pay_res # static version
(gg3 <- ggplotly(pay_res) %>%
layout(margin = list(l = 230))) # interactive
summary(new_df$Recovery) # stats
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.87 151.20 336.60 384.50 585.20 1000.00
In summary the company with the most complaints also provides the most coverage/enrollment in Connecticut. With the amount of insurance companies licensed in Connecticut it is obvious that we are plagued and privy to our choice in coverage for our most valuable assess. If customer experience in our most sensitive time of need (ie. a car accident or weather damage to a roof) is important to you, it would be wise to research insurance companies aside from the usual websites and learn from crowd-sourced data which can indicated quality and public perception better than whose is the most popular.