Importance: Using hospital quality ratings, patients are able to make a better decision in what hospital they want to be treated and where the best care is available in state of California, based on overall hospital performance or based on particular medical condition or procedure.
Question: Can we predict hospital quality ratings based on risk adjusted mortality rates, number of deaths, number of cases, medical procedures performed and medical conditions treated for 2012-2013?
Dataset: is available from California Hospital Inpatient Mortality Rates and Quality Ratings, 2012-2013.
Description of dataset: The dataset contains risk-adjusted mortality rates, and number of deaths and cases for 6 medical conditions treated (Acute Stroke, Acute Myocardial Infarction, Heart Failure, Gastrointestinal Hemorrhage, Hip Fracture and Pneumonia) and 6 procedures performed (Abdominal Aortic Aneurysm Repair, Carotid Endarterectomy, Craniotomy, Esophageal Resection, Pancreatic Resection, Percutaneous Coronary Intervention) in California hospitals for 2012 and 2013. This dataset does not include conditions treated or procedures performed in outpatient settings.
Load the data from csv file.
setwd("C:/Users/postdoc/Dropbox (Personal)/SpringBoard Fund/Rprojects/")
data <- read.csv("California_Hospital_Inpatient_Mortality_Rates_and_Quality_Ratings__2012-2013.csv",sep=",",header=TRUE)
df <- tbl_df(data)
Dataset: 11169 observations and 12 variables.
Variables with missing values:
Remove missing values, because number of missing values consists of half of dataset.
df_clean <- df[which(is.na(df$X..of.Cases)==F),]
Clean Dataset: 6165 observations and 12 variables.
Variables with no missing values:
summary(df_clean$Hospital.Ratings)
## As Expected Better Worse
## 5797 158 210
summary(df_clean$Procedure.Condition)
## AAA Repair Acute Stroke
## 283 617
## Acute Stroke Hemorrhagic Acute Stroke Ischemic
## 466 615
## Acute Stroke Subarachnoid AMI
## 241 590
## Carotid Endarterectomy Craniotomy
## 404 298
## Esophageal Resection GI Hemorrhage
## 75 622
## Heart Failure Hip Fracture
## 616 426
## Pancreatic Cancer Pancreatic Other
## 142 130
## Pancreatic Resection PCI
## 190 299
## Pneumonia
## 151
Decoding Procedure/Condition variable.
According to the American Stroke Association (ASA), strokes can be classified into 2 main categories: 87% are ischemic strokes, caused by blockage of an artery; 13% are hemorrhagic strokes, caused by bleeding. Ischemic strokes are further divided into 2 groups: thrombotic and embolic strokes. Hemorrhagic strokes are divided into 2 main categories: intracerebral and subarachnoid hemorrhages.
Our clean dataset has four categories for Acute Stroke:
Within each hospital, there are different notations for Acute Stroke variable. It suggests that different doctor uses different notations for the condition. These four categories are combined in one: Acute Stroke.
df_clean$Procedure.Condition <- gsub("Acute Stroke .*","Acute Stroke",df_clean$Procedure.Condition)
df_clean$Procedure.Condition <- factor(df_clean$Procedure.Condition)
Two additional categories are present in Procedure/Condition variable:
These categories are separate medical conditions and are not combined in one category.
The Procedure.Condition variable contains 6 medical procedures and 8 medical conditions. To indicate what procedure was performed or what condition was treated, the Medical_Categorey variable was added to the clean dataset.
df_clean <- df_clean %>%
mutate(Medical_Category = ifelse(grepl("Repair",Procedure.Condition) | grepl("Endarterectomy",Procedure.Condition) | grepl("Craniotomy",Procedure.Condition) | grepl("Resection",Procedure.Condition) | grepl("PCI",Procedure.Condition), "Procedure", "Condition"))
Density Plots for # of Cases, # of Deaths and Risk Adjusted Mortality Rate by Hospital Ratings.
p1 <- ggplot(df_clean,aes(log(X..of.Cases),fill=factor(Hospital.Ratings),colour=factor(Hospital.Ratings)))+
geom_density(alpha = 0.1)
p2 <- ggplot(df_clean,aes(log(X..of.Deaths),fill=factor(Hospital.Ratings),colour=factor(Hospital.Ratings)))+
geom_density(alpha = 0.1)
p3 <- ggplot(df_clean,aes(log(Risk.Adjusted.Mortality.Rate),fill=factor(Hospital.Ratings),colour=factor(Hospital.Ratings)))+
geom_density(alpha = 0.1)
grid.arrange(p1, p2, p3, ncol=1)
Conclusions 1:
Density Plots for Risk Adjusted Mortality Rate by Procedures Performed and Hospital Ratings.
df_p <- df_clean[which(df_clean$Medical_Category=="Procedure"),]
p6 <- ggplot(df_p,aes(log(Risk.Adjusted.Mortality.Rate),fill=factor(Hospital.Ratings),colour=factor(Hospital.Ratings)))+
geom_density(alpha = 0.1)+
theme(legend.position='bottom')+
facet_wrap(~ Procedure.Condition, ncol=2, scales="free_y")
p6
Density Plots for Risk Adjusted Mortality Rate by Conditions Treated and Hospital Ratings.
df_c <- df_clean[which(df_clean$Medical_Category=="Condition"),]
p9 <- ggplot(df_c,aes(log(Risk.Adjusted.Mortality.Rate),fill=factor(Hospital.Ratings),colour=factor(Hospital.Ratings)))+
geom_density(alpha = 0.1)+
theme(legend.position='bottom')+
facet_wrap(~ Procedure.Condition, ncol=2, scales="free_y")
p9
Associations between medical procedures or conditions with hospital ratings, number of cases, number of deaths and risk adjusted mortality rate.
Procedures.
df_p_all <- df_p %>%
group_by(Procedure.Condition) %>%
summarise(all_cases = sum(X..of.Cases),
all_deaths = sum(X..of.Deaths),
all_mortality_rate = sum(Risk.Adjusted.Mortality.Rate))
df_p_all
## # A tibble: 6 x 4
## Procedure.Condition all_cases all_deaths all_mortality_rate
## <fctr> <int> <int> <dbl>
## 1 AAA Repair 4927 59 508.5
## 2 Carotid Endarterectomy 12478 60 290.2
## 3 Craniotomy 30164 2159 2354.6
## 4 Esophageal Resection 619 28 436.9
## 5 Pancreatic Resection 3356 93 1002.8
## 6 PCI 78660 2028 793.6
Conditions.
df_c_all <- df_c %>%
group_by(Procedure.Condition) %>%
summarise(all_cases = sum(X..of.Cases),
all_deaths = sum(X..of.Deaths),
all_mortality_rate = sum(Risk.Adjusted.Mortality.Rate))
df_c_all
## # A tibble: 8 x 4
## Procedure.Condition all_cases all_deaths all_mortality_rate
## <fctr> <int> <int> <dbl>
## 1 Acute Stroke 217956 20461 26582.9
## 2 AMI 93594 5731 3863.4
## 3 GI Hemorrhage 94804 2099 1597.8
## 4 Heart Failure 155066 4778 2200.0
## 5 Hip Fracture 32245 744 945.8
## 6 Pancreatic Cancer 1787 43 632.0
## 7 Pancreatic Other 1425 41 590.0
## 8 Pneumonia 20630 1019 552.5
The highest mortality rates is for Craniotomy and Pancreatic Resection procedures, Acute Stroke, AMI and Heart Failure conditions.
The lowest mortality rates is for Carotid Endarterectomy procedure, Pancreatic Other and Pneumonia conditions.
Hospital Ratings.
prop.table(table(df_clean$Procedure.Condition,df_clean$Hospital.Ratings))*100
##
## As Expected Better Worse
## AAA Repair 4.5579886 0.0000000 0.0324412
## Acute Stroke 29.0673155 1.2003244 1.1841038
## AMI 8.6942417 0.3730738 0.5028386
## Carotid Endarterectomy 6.4395783 0.0000000 0.1135442
## Craniotomy 4.2497972 0.2919708 0.2919708
## Esophageal Resection 1.2165450 0.0000000 0.0000000
## GI Hemorrhage 9.6512571 0.1622060 0.2757502
## Heart Failure 9.0186537 0.3730738 0.6001622
## Hip Fracture 6.8450933 0.0000000 0.0648824
## Pancreatic Cancer 2.2708840 0.0000000 0.0324412
## Pancreatic Other 2.0762368 0.0000000 0.0324412
## Pancreatic Resection 3.0170316 0.0000000 0.0648824
## PCI 4.5417680 0.0973236 0.2108678
## Pneumonia 2.3844282 0.0648824 0.0000000
Conclusions 2:
Higher the risk adjusted mortality rate, worse the hospital ratings.
Summary of hospital ratings over all conditions and procedues.
df_clean <- df_clean %>% mutate(ratings =
ifelse(grepl("As Expected",Hospital.Ratings),"0",
ifelse(grepl("Better",Hospital.Ratings),"1",
ifelse(grepl("Worse",Hospital.Ratings),"-1",NA))))
df_clean$ratings <- as.numeric(df_clean$ratings)
all_ratings <- df_clean %>%
group_by(Hospital,Latitude,Longitude) %>%
summarise(all_ratings = sum(ratings),
mean_mortality_rate = mean(Risk.Adjusted.Mortality.Rate)) %>%
mutate(ratings =
ifelse(all_ratings > 0,"Better",
ifelse(all_ratings < 0, "Worse","As Expected")))
all_ratings$ratings <- as.factor(all_ratings$ratings)
all_ratings <- tbl_df(all_ratings)
Mapping of overall hospital ratings and mean mortality rates.
CAmap <- get_map(location="California",source="google",maptype="roadmap",crop=FALSE,zoom=6)
ggmap (CAmap) +
geom_point(aes(x=Longitude,y=Latitude,size=mean_mortality_rate,colour=ratings),data=all_ratings,alpha=0.5)+
scale_colour_manual(values=c("Worse" = "darkred","Better" = "darkblue","As Expected" = "darkgrey"))+
scale_size(range = c(0, 10))
Overall Hospital Ratings:
summary(all_ratings$ratings)
## As Expected Better Worse
## 172 69 99
all_ratings %>% arrange(desc(all_ratings)) %>% select(Hospital) %>% slice(1:10)
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Kaiser Foundation Hospital â Redwood City
## 2 Kaiser Foundation Hospital â Sunset
## 3 Centinela Hospital Medical Center
## 4 Scripps Green Hospital
## 5 Cedars Sinai Medical Center
## 6 Desert Valley Hospital
## 7 Encino Hospital Medical Center
## 8 Glendale Adventist Medical Center â Wilson Terrace
## 9 Kaiser Foundation Hospital â Rehabilitation Center Vallejo
## 10 Kaiser Foundation Hospital â San Francisco
all_ratings %>% arrange(mean_mortality_rate) %>% select(Hospital) %>% slice(1:10)
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Corcoran District Hospital
## 2 Eastern Plumas Hospital â Portola Campus
## 3 Glenn Medical Center
## 4 Good Samaritan Hospital â Bakersfield
## 5 Hoag Orthopedic Institute
## 6 Kern Valley Healthcare District
## 7 Laguna Honda Hospital and Rehabilitation Center
## 8 Mammoth Hospital
## 9 Southern Inyo Hospital
## 10 Tehachapi Hospital
all_ratings %>% arrange(all_ratings) %>% select(Hospital) %>% slice(1:10)
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Palomar Health Downtown Campus
## 2 Los Angeles County/University of Southern California Medical Center
## 3 San Francisco General Hospital
## 4 Santa Barbara Cottage Hospital
## 5 Arrowhead Regional Medical Center
## 6 Grossmont Hospital
## 7 Sharp Memorial Hospital
## 8 Antelope Valley Hospital
## 9 Doctors Medical Center
## 10 Good Samaritan Hospital â San Jose
all_ratings %>% arrange(desc(mean_mortality_rate)) %>% select(Hospital) %>% slice(1:10)
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Memorial Hospital Los Banos
## 2 Mountains Community Hospital
## 3 Santa Ynez Valley Cottage Hospital
## 4 Biggs Gridley Memorial Hospital
## 5 Coalinga Regional Medical Center
## 6 George L. Mee Memorial Hospital
## 7 Adventist Medical Center â Reedley
## 8 Central Valley Specialty Hospital
## 9 San Joaquin General Hospital
## 10 Goleta Valley Cottage Hospital
Summary for Acute Stroke condition.
df_as <- df_c[which(df_c$Procedure.Condition=="Acute Stroke"),]
df_as <- df_as %>% mutate(ratings =
ifelse(grepl("As Expected",Hospital.Ratings),"0",
ifelse(grepl("Better",Hospital.Ratings),"1",
ifelse(grepl("Worse",Hospital.Ratings),"-1",NA))))
df_as$ratings <- as.numeric(df_as$ratings)
df_as_all <- df_as %>%
group_by(Hospital,Latitude,Longitude) %>%
summarise(all_ratings = sum(ratings),
mean_mortality_rate = mean(Risk.Adjusted.Mortality.Rate)) %>%
mutate(ratings =
ifelse(all_ratings > 0,"Better",
ifelse(all_ratings < 0, "Worse","As Expected")))
df_as_all$ratings <- as.factor(df_as_all$ratings)
df_as_all <- tbl_df(df_as_all)
df_as_best_rat <- df_as_all %>% arrange(desc(all_ratings)) %>% slice(1:50)
df_as_best <- df_as_best_rat %>% arrange(mean_mortality_rate) %>% slice(1:25)
df_as_best %>% slice(1:10) %>% select(Hospital)
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Pacific Alliance Medical Center, Inc.
## 2 Anaheim General Hospital
## 3 Olympia Medical Center
## 4 Los Angeles County/Olive View â UCLA Medical Center
## 5 Encino Hospital Medical Center
## 6 Marina Del Rey Hospital
## 7 Sutter Delta Medical Center
## 8 Desert Valley Hospital
## 9 Sutter Solano Medical Center
## 10 Kaiser Foundation Hospital â Oakland Campus
Summary for AMI condition.
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Southern California Hospital at Hollywood
## 2 Sherman Oaks Hospital
## 3 Kaiser Foundation Hospital â Antioch
## 4 Encino Hospital Medical Center
## 5 La Palma Intercommunity Hospital
## 6 Scripps Green Hospital
## 7 Kaiser Foundation Hospital â Santa Rosa
## 8 Paradise Valley Hospital
## 9 Kaiser Foundation Hospital â South Sacramento
## 10 Kaiser Foundation Hospital â Hayward
Summary for Heart Failure condition.
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Adventist Medical Center â Reedley
## 2 Anaheim General Hospital
## 3 Sherman Oaks Hospital
## 4 Barstow Community Hospital
## 5 Centinela Hospital Medical Center
## 6 Encino Hospital Medical Center
## 7 Desert Valley Hospital
## 8 Paradise Valley Hospital
## 9 Scripps Green Hospital
## 10 White Memorial Medical Center
Summary for PCI procedure.
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Brotman Medical Center
## 2 Fresno Heart and Surgical Hospital
## 3 El Camino Hospital
## 4 Downey Regional Medical Center
## 5 Henry Mayo Newhall Memorial Hospital
## 6 Scripps Green Hospital
## 7 California Pacific Medical Center â Pacific Campus
## 8 Kaiser Foundation Hospital â Sunset
## 9 Community Memorial Hospital â San Buenaventura
## 10 Glendale Adventist Medical Center â Wilson Terrace
Summary for Craniotomy procedure.
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Alhambra Hospital
## 2 Desert Valley Hospital
## 3 El Centro Regional Medical Center
## 4 Saint Johnâs Health Center
## 5 El Camino Hospital
## 6 City of Hope Helford Clinical Research Hospital
## 7 Community Hospital of The Monterey Peninsula
## 8 Community Hospital Monterey Peninsula
## 9 California Pacific Medical Center â Pacific Campus
## 10 Glendale Adventist Medical Center â Wilson Terrace
Summary for Pancreatic Resection procedure.
## # A tibble: 10 x 1
## Hospital
## <fctr>
## 1 Alameda County Medical Center
## 2 Community Hospital Monterey Peninsula
## 3 Community Hospital of The Monterey Peninsula
## 4 Community Memorial Hospital â San Buenaventura
## 5 Eden Medical Center
## 6 Eisenhower Medical Center
## 7 Fresno Heart and Surgical Hospital
## 8 Good Samaritan Hospital â San Jose
## 9 Grossmont Hospital
## 10 John Muir Medical Center â Concord Campus
Approach
Split the Data Set into 2012 training and 2013 test sets.
train <- df_clean[which(df_clean$Year==2012),]
test_original <- df_clean[which(df_clean$Year==2013),]
test <- subset(test_original, select = -Hospital.Ratings)
Feature Engineering using Random Forests.
fit <- randomForest(Hospital.Ratings ~ Procedure.Condition + Risk.Adjusted.Mortality.Rate + X..of.Cases + X..of.Deaths, data=train,importance=TRUE,ntree=1000)
print(fit) # view results
##
## Call:
## randomForest(formula = Hospital.Ratings ~ Procedure.Condition + Risk.Adjusted.Mortality.Rate + X..of.Cases + X..of.Deaths, data = train, importance = TRUE, ntree = 1000)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 4.32%
## Confusion matrix:
## As Expected Better Worse class.error
## As Expected 2902 10 5 0.005142269
## Better 56 24 1 0.703703704
## Worse 62 0 40 0.607843137
importance(fit) # importance of each predictor
## As Expected Better Worse
## Procedure.Condition 145.51584 24.974139 11.60154
## Risk.Adjusted.Mortality.Rate 104.67908 30.106975 73.00210
## X..of.Cases 89.16738 50.456622 17.41210
## X..of.Deaths 53.59510 6.323429 58.84989
## MeanDecreaseAccuracy MeanDecreaseGini
## Procedure.Condition 149.34129 60.14185
## Risk.Adjusted.Mortality.Rate 112.79069 125.53213
## X..of.Cases 95.16901 87.71509
## X..of.Deaths 58.43046 67.69199
varImpPlot(fit,type=1)
The most important variables: Procedure.Condition and Risk.Adjusted.Mortality.Rate.
Mean Decrease Gini is a measure of how each variable contributes to the homogeneity of the nodes and leaves in the resulting random forest. The Gini coefficient is a measure of homogeneity from 0 (homogeneous) to 1 (heterogeneous). Variables that result in nodes with higher purity have a higher decrease in Gini coefficient.
varImpPlot(fit,type=2)
Predictions with all variables.
# prediction
prediction <- predict(fit, test)
#fancyRpartPlot(fit)
# confusion matrix
cm <- as.matrix(table(Actual = test_original$Hospital.Ratings,Predicted = prediction))
cm
## Predicted
## Actual As Expected Better Worse
## As Expected 2867 7 6
## Better 49 28 0
## Worse 72 0 36
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
diag = diag(cm) # number of correctly classified instances per class
rowsums = apply(cm, 1, sum) # number of instances per class
colsums = apply(cm, 2, sum) # number of predictions per class
p = rowsums / n # distribution of instances over the actual classes
q = colsums / n # distribution of instances over the predicted classes
accuracy = sum(diag) / n
accuracy
## [1] 0.9562806
precision = diag / colsums
recall = diag / rowsums
f1 = 2 * precision * recall / (precision + recall)
data.frame(precision, recall, f1)
## precision recall f1
## As Expected 0.9595047 0.9954861 0.9771643
## Better 0.8000000 0.3636364 0.5000000
## Worse 0.8571429 0.3333333 0.4800000
Clean 2012 training set and convert to the wide format based on Procedure.Condition and Risk.Adjusted.Mortality.Rate variables.
# convert train set to wide format.
train2 <- train %>%
group_by(Hospital,Latitude,Longitude,Procedure.Condition) %>%
summarise(all_ratings = sum(ratings),
mortality_rate = sum(Risk.Adjusted.Mortality.Rate)) %>%
mutate(ratings =
ifelse(all_ratings > 0,"Better",
ifelse(all_ratings < 0, "Worse","As Expected")))
train2$ratings <- as.factor(train2$ratings)
train2 <- tbl_df(train2)
train2_wide <- train2 %>% select(Hospital,Latitude,Longitude,Procedure.Condition,ratings,mortality_rate) %>% spread(Procedure.Condition,mortality_rate)
# subset to ratings and procedures only
train2_wide_cut <- train2_wide[,-c(1,2,3)]
# remove white spaces from column names
colnames(train2_wide_cut) <- gsub(" ","",colnames(train2_wide_cut))
# replace NA with 0, beauce some hospitals does not treat these conditions, thus mortality rate is zero.
train2_wide_cut[is.na(train2_wide_cut)] <- 0
Feature Enginering with Random Forests on wide format 2012 training set
fit <- randomForest(ratings ~ AAARepair + AcuteStroke + AMI + CarotidEndarterectomy + Craniotomy + EsophagealResection + GIHemorrhage + HeartFailure + HipFracture + PancreaticCancer + PancreaticOther + PancreaticResection + PCI + Pneumonia, data=train2_wide_cut,importance=TRUE,ntree=1000)
print(fit) # view results
##
## Call:
## randomForest(formula = ratings ~ AAARepair + AcuteStroke + AMI + CarotidEndarterectomy + Craniotomy + EsophagealResection + GIHemorrhage + HeartFailure + HipFracture + PancreaticCancer + PancreaticOther + PancreaticResection + PCI + Pneumonia, data = train2_wide_cut, importance = TRUE, ntree = 1000)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 15.98%
## Confusion matrix:
## As Expected Better Worse class.error
## As Expected 296 17 13 0.09202454
## Better 8 37 2 0.21276596
## Worse 26 4 35 0.46153846
importance(fit) # importance of each predictor
## As Expected Better Worse MeanDecreaseAccuracy
## AAARepair -1.1071070 5.203811 1.501276 2.667456
## AcuteStroke 11.0114687 17.605407 40.008126 41.322957
## AMI -1.2714426 36.702337 34.383789 43.229566
## CarotidEndarterectomy 6.5132876 12.069448 12.138739 17.444599
## Craniotomy 5.6676596 20.111117 12.475400 22.838230
## EsophagealResection -2.3301798 2.435076 2.436611 0.784897
## GIHemorrhage 2.3487945 44.159897 39.138107 49.907648
## HeartFailure 1.5175070 46.135206 37.174181 49.184776
## HipFracture -5.9453389 30.079553 27.773111 31.667778
## PancreaticCancer -1.0057905 3.738689 3.682976 3.871419
## PancreaticOther -0.2044520 5.538596 8.723983 8.821532
## PancreaticResection 1.2884631 7.435238 11.966683 12.062778
## PCI 0.3144257 14.668247 7.926353 14.960126
## Pneumonia -3.9746467 10.982875 18.830663 14.365316
## MeanDecreaseGini
## AAARepair 1.04159696
## AcuteStroke 17.77417855
## AMI 24.18630962
## CarotidEndarterectomy 2.09411139
## Craniotomy 6.24026595
## EsophagealResection 0.09852053
## GIHemorrhage 26.37847005
## HeartFailure 24.45450167
## HipFracture 10.07273068
## PancreaticCancer 0.27019039
## PancreaticOther 0.93716163
## PancreaticResection 1.13203146
## PCI 5.02526795
## Pneumonia 3.08633895
varImpPlot(fit,type=1)
varImpPlot(fit,type=2)
All variables are included in tree construction.
tree0 <- rpart(ratings ~ AAARepair + AcuteStroke + AMI + CarotidEndarterectomy + Craniotomy + EsophagealResection + GIHemorrhage + HeartFailure + HipFracture + PancreaticCancer + PancreaticOther + PancreaticResection + PCI + Pneumonia, data = train2_wide_cut, method = "class")
printcp(tree0)
##
## Classification tree:
## rpart(formula = ratings ~ AAARepair + AcuteStroke + AMI + CarotidEndarterectomy +
## Craniotomy + EsophagealResection + GIHemorrhage + HeartFailure +
## HipFracture + PancreaticCancer + PancreaticOther + PancreaticResection +
## PCI + Pneumonia, data = train2_wide_cut, method = "class")
##
## Variables actually used in tree construction:
## [1] AcuteStroke AMI GIHemorrhage HeartFailure HipFracture
##
## Root node error: 112/438 = 0.25571
##
## n= 438
##
## CP nsplit rel error xerror xstd
## 1 0.059524 0 1.00000 1.00000 0.081520
## 2 0.035714 5 0.60714 0.69643 0.071490
## 3 0.010000 7 0.53571 0.66071 0.070018
fancyRpartPlot(tree0)
best_ratings <- all_ratings %>% arrange(desc(all_ratings)) %>% slice(1:50)
best_lowest <- best_ratings %>% arrange(mean_mortality_rate) %>% slice(1:25)
CAmap <- get_map(location="California",source="google",maptype="roadmap",crop=FALSE,zoom=6)
ggmap (CAmap) +
geom_point(aes(x=Longitude,y=Latitude,colour=mean_mortality_rate),data=best_lowest,size=5,alpha=0.6)+
scale_colour_gradient(limits=c(1, 5), high="red", low="darkblue")
df_as_best <- df_as_best %>% mutate(Procedure.Condition="Acute Stroke")
df_ami_best <- df_ami_best %>% mutate(Procedure.Condition="AMI")
df_hf_best <- df_hf_best %>% mutate(Procedure.Condition="Heart Failure")
best_cond <- bind_rows(df_as_best[1:10,],df_ami_best[1:10,],df_hf_best[1:10,])
CAmap <- get_map(location="California",source="google",maptype="roadmap",crop=FALSE,zoom=6)
ggmap (CAmap) +
geom_point(aes(x=Longitude,y=Latitude,size=mean_mortality_rate,colour=Procedure.Condition),data=best_cond,alpha=0.8)+
scale_colour_manual(values=c("Acute Stroke"="darkred", "AMI"="darkgrey", "Heart Failure"="darkblue"))+
scale_size(range = c(0, 10))
df_pci_best <- df_pci_best %>% mutate(Procedure.Condition="PCI")
df_cr_best <- df_cr_best %>% mutate(Procedure.Condition="Craniotomy")
df_pr_best <- df_pr_best %>% mutate(Procedure.Condition="Pancreatic Resection")
best_proc <- bind_rows(df_pci_best[1:10,],df_cr_best[1:10,],df_pr_best[1:10,])
CAmap <- get_map(location="California",source="google",maptype="roadmap",crop=FALSE,zoom=6)
ggmap (CAmap) +
geom_point(aes(x=Longitude,y=Latitude,size=mean_mortality_rate,colour=Procedure.Condition),data=best_proc,alpha=0.6)+
scale_colour_manual(values=c("PCI"="darkblue", "Craniotomy"="darkred", "Pancreatic Resection"="darkgrey"))+
scale_size(range = c(0, 10))
Recommend which hospital will have the best care in the future using predicted hospital ratings.