Analytics Edge: Unit 4 - Keeping An Eye on Healthcare Costs

The D2Hawkeye Story

D2Hawkeye

Founded by Chris Kryder, MD, MBA in 2001
Combines expert knowledge and databases with analytics to improve quality and cost management in healthcare
Located in Massachusetts, USA; grew very fast and was sold to Verisk Analytics in 2009

Healthcare Case Management

D2Hawkeye tries to improve healthcare case management
- Identify high-risk patients
- Work with patients to manage treatment and associated costs
- Arrange specialist care
Medical costs often relate to severity of health problems, and are an issue for both patient and provider
Goal: improve the quality of cost predictions

Impact

Many different types of clients
- Third party administrators of medical claims
- Case management companies
- Benefit consultants
- Health plans
Millions of people analyzed monthly through analytic platform in 2009
Thousands of employers processed monthly

Pre-Analytics Approach

Human judgement - MDs manually analyzed patient histories and developed rules
Limited data sets
Costly and inefficient
Can we use analytics instead?

Data Sources

Healthcare industry is data-rich, but data may be hard to access
- Unstructured - doctor’s notes
- Unavailable - hard to get due to differences in technology
- Inaccessible - strong privacy laws around healthcare data sharing
What is available?
Claims data
- Requests for reimbursement submitted to insurance companies or state-provided insurance from doctors, hospitals, and pharmacies.
Eligibility information
Demographic information

Claims Data

Rich, structured data source
Very high dimension
Doesn’t capture all aspects of a person’s treatment or health - many things must be inferred
Unlike electronic medical records, we do not know the results of a test, only that a test was administered

D2Hawkeye’s Claims Data

Available: claims data for 2.4 million people over a span of 3 years
Include only people with data for at least 10 months in both periods - 400,000 people

Variables / Cost Profiles

Variables
- Chronic condition cost indicators
- Gender and age

Cost Variables

Medical Interpretation of Buckets

Error Measures

Typically we use R² or accuracy, but others can be used
In the case of D2Hawkeye, misclassifying a high-cost patient as low-cost is worse than misclassifying a low-cost patient as high-cost
Use a “penalty error” to capture this asymmetry

Penalty Error

Key idea: use asymmetric penalties
Define a “penalty matrix” as the cost of being wrong
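
As a rough illustration of how a penalty error could be computed, the sketch below looks up one penalty per patient and averages them. The penalty values mirror the matrix defined in the R walkthrough later in these notes; the six patients in `actual` and `predicted` are made up purely for illustration.

```r
# Penalty matrix: rows = actual bucket, columns = predicted bucket.
# Under-predicting cost (below the diagonal) costs twice as much as over-predicting.
PenaltyMatrix <- matrix(c(0,1,2,3,4,
                          2,0,1,2,3,
                          4,2,0,1,2,
                          6,4,2,0,1,
                          8,6,4,2,0), byrow = TRUE, nrow = 5)

# Hypothetical outcomes for six patients (not real data)
actual    <- c(1, 1, 2, 5, 5, 3)
predicted <- c(1, 2, 2, 1, 4, 3)

# Index the matrix with (row, column) pairs to get one penalty per patient
penalties <- PenaltyMatrix[cbind(actual, predicted)]

# Penalty error = average penalty per patient
penaltyError <- mean(penalties)
penaltyError
```

In this toy case a single bucket-5 patient predicted as bucket 1 contributes a penalty of 8, which dominates the average.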

Baseline

Baseline is to simply predict that the cost in the next “period” will be the cost in the current period
Accuracy of 75%
Penalty Error of 0.56
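
A minimal sketch of the baseline idea, assuming costs have already been converted to bucket numbers. The vectors `bucketCurrent` and `bucketNext` are hypothetical values for eight patients, not D2Hawkeye data.

```r
# Hypothetical bucket assignments in two consecutive periods (not real data)
bucketCurrent <- c(1, 1, 2, 3, 1, 5, 2, 4)
bucketNext    <- c(1, 2, 2, 3, 1, 4, 1, 4)

# Baseline: predict that next period's bucket equals the current bucket
baselinePrediction <- bucketCurrent

# Accuracy = share of patients whose bucket did not change
baselineAccuracy <- mean(baselinePrediction == bucketNext)
baselineAccuracy
```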

Multi-class Classification

We are predicting a bucket number (1 to 5), so this is a multi-class classification problem

Most Important Factors

First splits are related to cost

Secondary Factors

Risk factors
Chronic Illness
“Q146”
- Asthma + depression
“Q1”
- Risk factor indicating hylan injection
- Possible knee replacement or arthroscopy

Example Groups for Bucket 5

Under 35 years old, between $3300 and $3900 in claims, C.A.D., but no office visits in last year.
Claims between $3900 and $43,000 with at least $8000 paid in last 12 months, $4300 in pharmacy claims, acute cost profile and cancer diagnosis
More than $58,000 in claims, at least $55,000 paid in last 12 months, and not an acute profile

Insights

Substantial improvement over the baseline
Double accuracy over baseline in some cases
Smaller accuracy improvement on bucket 5, but much lower penalty

Analytics Provide an Edge

Substantial improvement in D2Hawkeye’s ability to identify patients who need more attention
Because the model was interpretable, physicians were able to improve the model by identifying new variables and refining existing variables
Analytics gave D2Hawkeye an edge over competition using “last century” methods

The D2Hawkeye Story in R

Read in the data

# Read in the data
Claims = read.csv("ClaimsData.csv")
# Output structure
str(Claims)
## 'data.frame':    458005 obs. of  16 variables:
##  $ age              : int  85 59 67 52 67 68 75 70 67 67 ...
##  $ alzheimers       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ arthritis        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ cancer           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ copd             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ depression       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ diabetes         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ heart.failure    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ihd              : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ kidney           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ osteoporosis     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ stroke           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reimbursement2008: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ bucket2008       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ reimbursement2009: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ bucket2009       : int  1 1 1 1 1 1 1 1 1 1 ...

Split the data

# Percentage of patients in each cost bucket
table(Claims$bucket2009)/nrow(Claims)
## 
##           1           2           3           4           5 
## 0.671267781 0.190170413 0.089466272 0.043324855 0.005770679

# Split the data
library(caTools)

set.seed(88)

spl = sample.split(Claims$bucket2009, SplitRatio = 0.6)

ClaimsTrain = subset(Claims, spl==TRUE)

ClaimsTest = subset(Claims, spl==FALSE)
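
Because bucket 5 holds well under 1% of patients, it helps that the split is made on the outcome so that both subsets keep similar bucket proportions. The sketch below illustrates that idea in base R on synthetic data; it is an assumption-laden illustration of stratified sampling, not the internals of `sample.split`, and the seed, sample size, and class probabilities are arbitrary.

```r
set.seed(1)  # arbitrary seed for this illustration

# Synthetic, imbalanced outcome (made-up proportions, not the Claims data)
outcome <- sample(1:5, size = 10000, replace = TRUE,
                  prob = c(0.67, 0.19, 0.09, 0.04, 0.01))

# Within each class, mark roughly 60% of the rows as training rows
inTrain <- logical(length(outcome))
for (cls in sort(unique(outcome))) {
  idx <- which(outcome == cls)
  nTrain <- round(0.6 * length(idx))
  inTrain[idx[sample.int(length(idx), nTrain)]] <- TRUE
}

# Class proportions should be nearly identical in the two subsets
trainProps <- prop.table(table(factor(outcome[inTrain], levels = 1:5)))
testProps  <- prop.table(table(factor(outcome[!inTrain], levels = 1:5)))
round(rbind(train = trainProps, test = testProps), 3)
```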

Baseline Method

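# Baseline method: predict that the 2009 bucket equals the 2008 bucket
# (rows = actual 2009 bucket, columns = 2008 bucket)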
table(ClaimsTest$bucket2009, ClaimsTest$bucket2008)
##    
##          1      2      3      4      5
##   1 110138   7787   3427   1452    174
##   2  16000  10721   4629   2931    559
##   3   7006   4629   2774   1621    360
##   4   2688   1943   1415   1539    352
##   5    293    191    160    309    104

# Baseline accuracy: diagonal (bucket unchanged) divided by test set size
(110138 + 10721 + 2774 + 1539 + 104)/nrow(ClaimsTest)
## [1] 0.6838135
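
The diagonal does not have to be typed in by hand. As a sketch, the accuracy can be derived from the confusion matrix itself; here the baseline table printed above is re-entered manually so the snippet runs without `ClaimsData.csv`, and `baselineTable` is a name introduced only for this example.

```r
# Baseline confusion matrix copied from the output above
# (rows = actual 2009 bucket, columns = 2008 bucket used as the prediction)
baselineTable <- matrix(c(110138,  7787, 3427, 1452, 174,
                           16000, 10721, 4629, 2931, 559,
                            7006,  4629, 2774, 1621, 360,
                            2688,  1943, 1415, 1539, 352,
                             293,   191,  160,  309, 104),
                        byrow = TRUE, nrow = 5)

# Accuracy = correctly classified (diagonal) / all test observations
baselineAccuracy <- sum(diag(baselineTable)) / sum(baselineTable)
baselineAccuracy

# Share of each actual bucket that the baseline gets right
perBucketAccuracy <- diag(baselineTable) / rowSums(baselineTable)
```

With the real objects in memory, the same expression applied to `table(ClaimsTest$bucket2009, ClaimsTest$bucket2008)` should give the same accuracy.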

Create Penalty Matrix

# Penalty matrix: rows = actual bucket, columns = predicted bucket
PenaltyMatrix = matrix(c(0,1,2,3,4,
                         2,0,1,2,3,
                         4,2,0,1,2,
                         6,4,2,0,1,
                         8,6,4,2,0), byrow=TRUE, nrow=5)

PenaltyMatrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    0    1    2    3    4
## [2,]    2    0    1    2    3
## [3,]    4    2    0    1    2
## [4,]    6    4    2    0    1
## [5,]    8    6    4    2    0

# Penalty Error of Baseline Method
as.matrix(table(ClaimsTest$bucket2009, ClaimsTest$bucket2008))*PenaltyMatrix
##    
##         1     2     3     4     5
##   1     0  7787  6854  4356   696
##   2 32000     0  4629  5862  1677
##   3 28024  9258     0  1621   720
##   4 16128  7772  2830     0   352
##   5  2344  1146   640   618     0

sum(as.matrix(table(ClaimsTest$bucket2009, ClaimsTest$bucket2008))*PenaltyMatrix)/nrow(ClaimsTest)
## [1] 0.7386055
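
To see where the baseline's penalty comes from, the penalized table can be split into under-predictions (actual bucket higher than predicted, below the diagonal) and over-predictions (above the diagonal). This is a sketch that re-enters the tables printed above by hand; `underPenalty` and `overPenalty` are illustrative names.

```r
# Baseline confusion matrix and penalty matrix, copied from the output above
baselineTable <- matrix(c(110138,  7787, 3427, 1452, 174,
                           16000, 10721, 4629, 2931, 559,
                            7006,  4629, 2774, 1621, 360,
                            2688,  1943, 1415, 1539, 352,
                             293,   191,  160,  309, 104),
                        byrow = TRUE, nrow = 5)
PenaltyMatrix <- matrix(c(0,1,2,3,4,
                          2,0,1,2,3,
                          4,2,0,1,2,
                          6,4,2,0,1,
                          8,6,4,2,0), byrow = TRUE, nrow = 5)

# Element-wise product gives the total penalty in each cell
penalized <- baselineTable * PenaltyMatrix

# Below the diagonal: actual bucket > predicted bucket (cost under-predicted)
underPenalty <- sum(penalized[lower.tri(penalized)])
# Above the diagonal: actual bucket < predicted bucket (cost over-predicted)
overPenalty  <- sum(penalized[upper.tri(penalized)])

# Overall penalty error, matching the figure reported above
penaltyError <- (underPenalty + overPenalty) / sum(baselineTable)
```

With these numbers, under-predictions account for 100760 of the 135314 total penalty points, so most of the baseline's penalty comes from patients whose costs rose.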

CART Model

# Load necessary libraries
library(rpart)
library(rpart.plot)

# CART model: predict the 2009 bucket from age, chronic conditions, and 2008 cost
ClaimsTree = rpart(bucket2009 ~ age + alzheimers + arthritis + cancer + copd +
                     depression + diabetes + heart.failure + ihd + kidney +
                     osteoporosis + stroke + bucket2008 + reimbursement2008,
                   data=ClaimsTrain, method="class", cp=0.00005)
# Plot CART
prp(ClaimsTree)



# Make predictions
PredictTest = predict(ClaimsTree, newdata = ClaimsTest, type = "class")

table(ClaimsTest$bucket2009, PredictTest)
##    PredictTest
##          1      2      3      4      5
##   1 114141   8610    124    103      0
##   2  18409  16102    187    142      0
##   3   8027   8146    118     99      0
##   4   3099   4584     53    201      0
##   5    351    657      4     45      0

# Accuracy: correct predictions (diagonal) divided by test set size
(114141 + 16102 + 118 + 201 + 0)/nrow(ClaimsTest)
## [1] 0.7126669

# Penalty Error
as.matrix(table(ClaimsTest$bucket2009, PredictTest))*PenaltyMatrix
##    PredictTest
##         1     2     3     4     5
##   1     0  8610   248   309     0
##   2 36818     0   187   284     0
##   3 32108 16292     0    99     0
##   4 18594 18336   106     0     0
##   5  2808  3942    16    90     0

sum(as.matrix(table(ClaimsTest$bucket2009, PredictTest))*PenaltyMatrix)/nrow(ClaimsTest)
## [1] 0.7578902
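
The `cp=0.00005` used for the tree controls how readily rpart adds splits, and these notes do not say how that value was chosen. One common approach, sketched here on the built-in `iris` data as a stand-in for the claims data, is to read the cross-validated error from the fitted tree's `cptable` and prune to the cp with the lowest `xerror`. The seed and control settings below are arbitrary assumptions.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random; the seed is arbitrary

# Grow a deliberately large tree on a built-in dataset (stand-in for ClaimsTrain)
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0.0001, minsplit = 5, xval = 10))

# cptable columns: CP, nsplit, rel error, xerror (cross-validated error), xstd
cpTable <- fit$cptable

# Pick the cp whose subtree has the lowest cross-validated error
bestCp <- cpTable[which.min(cpTable[, "xerror"]), "CP"]

# Prune back to the subtree selected by cross-validation
pruned <- prune(fit, cp = bestCp)
```

On a data set the size of the claims data, cross-validating many cp values can take a while, which may be one reason to fix cp in advance.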

CART model with loss matrix

# New CART model with loss matrix: rpart weighs each misclassification
# by PenaltyMatrix (rows = actual bucket, columns = predicted bucket)
ClaimsTree = rpart(bucket2009 ~ age + alzheimers + arthritis + cancer + copd +
                     depression + diabetes + heart.failure + ihd + kidney +
                     osteoporosis + stroke + bucket2008 + reimbursement2008,
                   data=ClaimsTrain, method="class", cp=0.00005,
                   parms=list(loss=PenaltyMatrix))

# Redo predictions and penalty error
PredictTest = predict(ClaimsTree, newdata = ClaimsTest, type = "class")

table(ClaimsTest$bucket2009, PredictTest)
##    PredictTest
##         1     2     3     4     5
##   1 94310 25295  3087   286     0
##   2  7176 18942  8079   643     0
##   3  3590  7706  4692   401     1
##   4  1304  3193  2803   636     1
##   5   135   356   408   156     2

# Accuracy: correct predictions (diagonal) divided by test set size
(94310 + 18942 + 4692 + 636 + 2)/nrow(ClaimsTest)
## [1] 0.6472746

sum(as.matrix(table(ClaimsTest$bucket2009, PredictTest))*PenaltyMatrix)/nrow(ClaimsTest)
## [1] 0.6418161
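
To compare the three approaches side by side, the figures reported above can be collected into one small table. The numbers are copied from the outputs in this walkthrough; the data frame and its column names are just a convenient layout for this sketch.

```r
# Test-set results copied from the outputs above
results <- data.frame(
  model        = c("Baseline", "CART", "CART with loss matrix"),
  accuracy     = c(0.6838135, 0.7126669, 0.6472746),
  penaltyError = c(0.7386055, 0.7578902, 0.6418161)
)

# Sort by penalty error, lowest (best) first
results[order(results$penaltyError), ]
```

Plain CART has the highest accuracy but also the highest penalty error, while the loss-matrix tree gives up some accuracy in exchange for the lowest penalty error.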

Sulman Khan

October 26, 2018