U.S. healthcare is the most expensive in the world. Post-acute care is a big driver of costs and variability in U.S. Healthcare. Given that it is the Medicare population that is the highest utilizer of post-acute care, the Center for Medicare and Medicaid Services (CMS) publishes data on post-acute providers.We are going to utilize the public data and provide insight into the utilization of post-acute care.
We will analyze Texas Skilled Nursing Facilities focusing on the following aspects:
# call libraries and set seed
library(readxl)
library(caret)
library(plotly)
library(corrplot)
library(dplyr)
library(data.table)
set.seed(123)
PAC <- read_excel("PAC Provider PUF_v20191002 2.xlsx", sheet = 3)
SNF_TX <- filter(PAC, State == "TX", Service_Category == "SNF")
# delect STATE_TOTAL
SNF_TX <- SNF_TX[-1, ]
# delect unrelevant variables: Summary_Category, State, Service_Category, Provider_ID, Facility_Name, City and ZIP
SNF_TX <- SNF_TX[, -(1:7)]
i <- 1 : 115
SNF_TX[i] <- lapply(SNF_TX[i], as.numeric)
# check if there is NA values
sum(is.na(SNF_TX))
## [1] 71205
# delete columns have more than 30%(363) NA values
SNF_TX <- SNF_TX[colSums(!is.na(SNF_TX)) > 846]
ncol(SNF_TX)
## [1] 47
# check the summary of each variable to see if there is outliers and zero-variable
summary(SNF_TX)
## Distinct_Beneficiaries Episode_or_Stay_Count Days_of_Service
## Min. : 11.0 Min. : 11.0 Min. : 72
## 1st Qu.: 43.0 1st Qu.: 55.0 1st Qu.: 1753
## Median : 75.0 Median : 97.5 Median : 2929
## Mean :100.2 Mean :126.6 Mean : 3541
## 3rd Qu.:132.5 3rd Qu.:170.0 3rd Qu.: 4635
## Max. :790.0 Max. :954.0 Max. :21118
##
## Percent_of_Beneficiaries_with_30_or_fewer_Service_Days
## Min. :0.1700
## 1st Qu.:0.3600
## Median :0.4400
## Mean :0.4565
## 3rd Qu.:0.5200
## Max. :1.0000
## NA's :185
## Percent_of_Beneficiaries_with_60_or_more_Service_Days Total_Charge_Amount
## Min. :0.0000 Min. : 79212
## 1st Qu.:0.1500 1st Qu.: 1075750
## Median :0.2400 Median : 1916924
## Mean :0.2503 Mean : 2494034
## 3rd Qu.:0.3400 3rd Qu.: 3337799
## Max. :0.6800 Max. :29343398
## NA's :297
## Total_Medicare_Standard_Payment_Amount Total_Medicare_Payment_Amount
## Min. : 24012 Min. : 19480
## 1st Qu.: 753128 1st Qu.: 666260
## Median : 1305099 Median :1134261
## Mean : 1598570 Mean :1431973
## 3rd Qu.: 2148067 3rd Qu.:1885673
## Max. :10138651 Max. :9506124
##
## Total_Allowed_Amount Total_Coinsurance_Amount Percent_Dual_Beneficiaries
## Min. : 24744 Min. : 494 Min. :0.0000
## 1st Qu.: 845538 1st Qu.: 182883 1st Qu.:0.2900
## Median : 1434888 Median : 291823 Median :0.4600
## Mean : 1773101 Mean : 341128 Mean :0.4514
## 3rd Qu.: 2348982 3rd Qu.: 446042 3rd Qu.:0.6000
## Max. :11414121 Max. :1907997 Max. :1.0000
## NA's :117
## Percent_Medicare_Beneficiaries_in_a_Rural_ZIP Average_Age
## Min. :0.0000 Min. :60.00
## 1st Qu.:0.0200 1st Qu.:76.00
## Median :0.0400 Median :78.00
## Mean :0.1871 Mean :77.75
## 3rd Qu.:0.1100 3rd Qu.:80.00
## Max. :1.0000 Max. :90.00
##
## Percent_Male_Beneficiaries Percent_Female_Beneficiaries
## Min. :0.2000 Min. :0.0000
## 1st Qu.:0.3600 1st Qu.:0.5300
## Median :0.4100 Median :0.5900
## Mean :0.4206 Mean :0.5796
## 3rd Qu.:0.4700 3rd Qu.:0.6400
## Max. :1.0000 Max. :0.8000
## NA's :145 NA's :145
## Percent_White_Beneficiaries
## Min. :0.0700
## 1st Qu.:0.6200
## Median :0.7900
## Mean :0.7306
## 3rd Qu.:0.8900
## Max. :1.0000
## NA's :69
## Percent_American_Indian_or_Alaska_Native_Beneficiaries Average_HCC_Score
## Min. :0 Min. :1.290
## 1st Qu.:0 1st Qu.:2.260
## Median :0 Median :2.600
## Mean :0 Mean :2.713
## 3rd Qu.:0 3rd Qu.:3.087
## Max. :0 Max. :6.440
## NA's :267
## Average_Number_of_Chronic_Conditions Percent_of_Beneficiaries_with_Alzheimer's
## Min. :4.800 Min. :0.0000
## 1st Qu.:7.100 1st Qu.:0.5600
## Median :7.400 Median :0.6300
## Mean :7.404 Mean :0.6162
## 3rd Qu.:7.800 3rd Qu.:0.6900
## Max. :9.300 Max. :0.7500
## NA's :322
## Percent_of_Beneficiaries_with_CHF
## Min. :0.2700
## 1st Qu.:0.5000
## Median :0.5600
## Mean :0.5627
## 3rd Qu.:0.6200
## Max. :0.7500
## NA's :81
## Percent_of_Beneficiaries_with_Chronic_Kidney_Disease
## Min. :0.3900
## 1st Qu.:0.6000
## Median :0.6500
## Mean :0.6431
## 3rd Qu.:0.6900
## Max. :0.7500
## NA's :303
## Percent_of_Beneficiaries_with_COPD Percent_of_Beneficiaries_with_Depression
## Min. :0.1600 Min. :0.2700
## 1st Qu.:0.3400 1st Qu.:0.5000
## Median :0.3900 Median :0.5700
## Mean :0.4003 Mean :0.5647
## 3rd Qu.:0.4500 3rd Qu.:0.6300
## Max. :0.7400 Max. :0.7500
## NA's :148 NA's :168
## Percent_of_Beneficiaries_with_Diabetes
## Min. :0.2300
## 1st Qu.:0.4500
## Median :0.5200
## Mean :0.5238
## 3rd Qu.:0.6000
## Max. :0.7500
## NA's :134
## Percent_of_Beneficiaries_with_Hyperlipidemia Percent_of_Beneficiaries_with_IHD
## Min. :0.3100 Min. :0.3400
## 1st Qu.:0.6100 1st Qu.:0.5600
## Median :0.6600 Median :0.6100
## Mean :0.6469 Mean :0.6085
## 3rd Qu.:0.7000 3rd Qu.:0.6600
## Max. :0.7500 Max. :0.7500
## NA's :282 NA's :147
## Percent_of_Beneficiaries_with_RA/OA Total_Physical_Therapy_Minutes
## Min. :0.2900 Min. : 1609
## 1st Qu.:0.5400 1st Qu.: 17967
## Median :0.6100 Median : 31928
## Mean :0.5984 Mean : 41109
## 3rd Qu.:0.6700 3rd Qu.: 54116
## Max. :0.7500 Max. :303544
## NA's :144 NA's :39
## Total_Occupational_Therapy_Minutes Total_Speech-Language_Pathology_Minutes
## Min. : 345 Min. : 0
## 1st Qu.: 18012 1st Qu.: 6970
## Median : 31533 Median : 13193
## Mean : 40061 Mean : 15469
## 3rd Qu.: 52849 3rd Qu.: 21160
## Max. :312564 Max. :128376
## NA's :39 NA's :39
## Assessment_Denominator
## Min. : 11.0
## 1st Qu.: 71.0
## Median : 122.0
## Mean : 152.3
## 3rd Qu.: 201.5
## Max. :1046.0
## NA's :39
## Percentage_of_RV_Assessments_within_10_minutes_of_Threshold
## Min. :0.0000
## 1st Qu.:0.3300
## Median :0.5700
## Mean :0.5398
## 3rd Qu.:0.7600
## Max. :1.0000
## NA's :303
## Percentage_of_RU_Assessments_within_10_minutes_of_Threshold
## Min. :0.0100
## 1st Qu.:0.5200
## Median :0.7600
## Mean :0.6838
## 3rd Qu.:0.9100
## Max. :1.0000
## NA's :87
## Percentage_with_an_Inpatient_Hospital_Discharge_Status
## Min. :0.0000
## 1st Qu.:0.1800
## Median :0.2200
## Mean :0.2166
## 3rd Qu.:0.2500
## Max. :0.5600
## NA's :302
## Percentage_with_a_Home_Health_Discharge_Status
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.00000
## Mean :0.07817
## 3rd Qu.:0.14000
## Max. :0.73000
## NA's :220
## Percentage_with_a_Hospice_Discharge_Status
## Min. :0e+00
## 1st Qu.:0e+00
## Median :0e+00
## Mean :3e-05
## 3rd Qu.:0e+00
## Max. :3e-02
## NA's :210
## Percentage_with_an_Unknown_Discharge_Status Percentage_of_Days_in_SNF_AAA_RUG
## Min. :0.0000 Min. :0.0000000
## 1st Qu.:0.1700 1st Qu.:0.0000000
## Median :0.2400 Median :0.0000000
## Mean :0.2431 Mean :0.0005785
## 3rd Qu.:0.3200 3rd Qu.:0.0000000
## Max. :0.7600 Max. :0.0500000
## NA's :151
## Percentage_of_Days_in_SNF_Low,_Medium_or_High_Rehabilitation_RUGs
## Min. :0.000
## 1st Qu.:0.030
## Median :0.050
## Mean :0.085
## 3rd Qu.:0.100
## Max. :0.950
##
## Percentage_of_Days_in_SNF_Very-High_Rehabilitation_RUGs
## Min. :0.0000
## 1st Qu.:0.1200
## Median :0.1900
## Mean :0.2087
## 3rd Qu.:0.2800
## Max. :0.8400
##
## Percentage_of_Days_in_SNF_Ultra-High_Rehabilitation_RUGs
## Min. :0.0000
## 1st Qu.:0.5500
## Median :0.6800
## Mean :0.6449
## 3rd Qu.:0.7900
## Max. :0.9900
##
## Percentage_of_Days_in_SNF_Clinically_Complex_RUGs
## Min. :0.00000
## 1st Qu.:0.00000
## Median :0.01000
## Mean :0.01515
## 3rd Qu.:0.02000
## Max. :0.73000
##
## Percentage_of_Days_in_SNF_Reduced_Physical_Functioning_RUGs
## Min. :0.000000
## 1st Qu.:0.000000
## Median :0.010000
## Mean :0.008942
## 3rd Qu.:0.010000
## Max. :0.300000
##
## Percentage_of_Days_in_SNF_Special_High_Care_RUGs
## Min. :0.00000
## 1st Qu.:0.01000
## Median :0.02000
## Mean :0.02883
## 3rd Qu.:0.04000
## Max. :0.37000
##
## Percentage_of_Days_in_SNF_Behavioral_Symptom_RUGs
## Min. :0.000000
## 1st Qu.:0.000000
## Median :0.000000
## Mean :0.001983
## 3rd Qu.:0.000000
## Max. :0.200000
##
## Percentage_of_Days_in_SNF_Extensive_Service_RUGs
## Min. :0.000000
## 1st Qu.:0.000000
## Median :0.000000
## Mean :0.004388
## 3rd Qu.:0.000000
## Max. :0.490000
##
There are no outliers in the dataset, we will replace the rest of NA values with the mean of the corresponding variables.
SNF_TX <- SNF_TX %>%
mutate_if(is.numeric, zoo::na.aggregate)
# check the sum of NA values, it should be 0
sum(is.na(SNF_TX))
## [1] 0
There are two variables have zero or near zero vairance, we will delete them since they don’t provide useful information for the analysis.
SNF_TX$Percent_American_Indian_or_Alaska_Native_Beneficiaries <- NULL
SNF_TX$Percentage_with_a_Hospice_Discharge_Status <- NULL
# move dependent variables to the end
SNF_TX <- SNF_TX %>%
select(-c(6:10), everything())
SNF_TX <- SNF_TX %>%
select(-41, everything())
corrM <- cor(SNF_TX)
# high coefficient variables impact total payments
hc <- corrM[, "Total_Charge_Amount"]
hc <- hc[which(hc > 0.7)]
hc <- as.data.frame(hc)
hc
## hc
## Distinct_Beneficiaries 0.8462809
## Episode_or_Stay_Count 0.8480204
## Days_of_Service 0.8396123
## Total_Physical_Therapy_Minutes 0.8150268
## Total_Occupational_Therapy_Minutes 0.7993695
## Assessment_Denominator 0.8055552
## Total_Medicare_Standard_Payment_Amount 0.8449340
## Total_Medicare_Payment_Amount 0.8446619
## Total_Allowed_Amount 0.8380941
## Total_Coinsurance_Amount 0.7503724
## Total_Charge_Amount 1.0000000
Above are ten factors which have correlation coefficient over 0.7. We will compare these factors from Texas State and the Nation in the next step.
# subset a dataset with Nation and Texas State data
SNF_COMP <- filter(PAC, Summary_Category %in% c("NATION", "STATE"), State %in% c("TX", "NATIONAL_TOTAL"), Service_Category == "SNF")
# subset a dataset with high correlation coefficient variables
SNF_COMP <- SNF_COMP[, c("Distinct_Beneficiaries", "Episode_or_Stay_Count", "Days_of_Service", "Total_Physical_Therapy_Minutes", "Total_Occupational_Therapy_Minutes", "Assessment_Denominator", "Total_Medicare_Standard_Payment_Amount", "Total_Medicare_Payment_Amount", "Total_Allowed_Amount", "Total_Coinsurance_Amount")]
SNF_COMP
## # A tibble: 2 x 10
## Distinct_Benefi… Episode_or_Stay… Days_of_Service Total_Physical_…
## <dbl> <dbl> <dbl> <chr>
## 1 1927773 2413877 60186065 777219672
## 2 121407 153468 4291027 48219943
## # … with 6 more variables: Total_Occupational_Therapy_Minutes <chr>,
## # Assessment_Denominator <chr>, Total_Medicare_Standard_Payment_Amount <dbl>,
## # Total_Medicare_Payment_Amount <dbl>, Total_Allowed_Amount <chr>,
## # Total_Coinsurance_Amount <chr>
# transpose the dataset for a better visualization and manipulation
SNF_NS <- transpose(SNF_COMP)
rownames(SNF_NS) <- colnames(SNF_COMP)
colnames(SNF_NS) <- c("Nation", "Texas")
SNF_NS$Nation <- as.numeric(SNF_NS$Nation)
SNF_NS$Texas <- as.numeric(SNF_NS$Texas)
SNF_NS
## Nation Texas
## Distinct_Beneficiaries 1927773 121407
## Episode_or_Stay_Count 2413877 153468
## Days_of_Service 60186065 4291027
## Total_Physical_Therapy_Minutes 777219672 48219943
## Total_Occupational_Therapy_Minutes 733831157 46985914
## Assessment_Denominator 2825031 178669
## Total_Medicare_Standard_Payment_Amount 26962514767 1937021638
## Total_Medicare_Payment_Amount 26274406911 1735050513
## Total_Allowed_Amount 31586472960 2148533654
## Total_Coinsurance_Amount 5312066049 413483141
# add columns reflect the ratio for Texas and the Nation
SNF_NS$Nation_Ratio <- round(SNF_NS$Nation / 2413877, 2)
SNF_NS$Texas_Ratio <- round(SNF_NS$Texas / 153468, 2)
SNF_NS
## Nation Texas Nation_Ratio
## Distinct_Beneficiaries 1927773 121407 0.80
## Episode_or_Stay_Count 2413877 153468 1.00
## Days_of_Service 60186065 4291027 24.93
## Total_Physical_Therapy_Minutes 777219672 48219943 321.98
## Total_Occupational_Therapy_Minutes 733831157 46985914 304.01
## Assessment_Denominator 2825031 178669 1.17
## Total_Medicare_Standard_Payment_Amount 26962514767 1937021638 11169.80
## Total_Medicare_Payment_Amount 26274406911 1735050513 10884.73
## Total_Allowed_Amount 31586472960 2148533654 13085.37
## Total_Coinsurance_Amount 5312066049 413483141 2200.64
## Texas_Ratio
## Distinct_Beneficiaries 0.79
## Episode_or_Stay_Count 1.00
## Days_of_Service 27.96
## Total_Physical_Therapy_Minutes 314.20
## Total_Occupational_Therapy_Minutes 306.16
## Assessment_Denominator 1.16
## Total_Medicare_Standard_Payment_Amount 12621.66
## Total_Medicare_Payment_Amount 11305.62
## Total_Allowed_Amount 13999.88
## Total_Coinsurance_Amount 2694.26
variables <- c("Distinct_Beneficiaries", "Days_of_Service", "Assessment_Denominator")
p <- plot_ly(SNF_NS, x = ~variables, y = ~SNF_NS$Nation_Ratio[c(1,3,6)], type = "bar", name = "Nation_Ratio") %>%
add_trace(y = ~SNF_NS$Texas_Ratio[c(1,3,6)], name = "Texas_Ratio") %>%
layout(title = "Nation and Texas State Comparison",
yaxis = list (title = "Ratio"))
p
While Assessment_Denominator and Distinct_Beneficiaries are almost the same for Texas and the Nation, Days_of_Service of Texas is above national average.
variables1 <- c("Total_Physical_Therapy_Minutes", "Total_Occupational_Therapy_Minutes", "Total_Medicare_Standard_Payment_Amount", "Total_Medicare_Payment_Amount", "Total_Allowed_Amount", "Total_Coinsurance_Amount")
p1 <- plot_ly(SNF_NS, x = ~variables1, y = ~SNF_NS$Nation_Ratio[c(4,5,7:10)], type = "bar", name = "Nation_Ratio") %>%
add_trace(y = ~SNF_NS$Texas_Ratio[c(4,5,7:10)], name = "Texas_Ratio") %>%
layout(title = "Nation and Texas State Comparison",
yaxis = list (title = "Ratio"))
p1
Total_Physical_Therapy_Minutes and Total_Occupational_Therapy_Minutes appear the be very similar for Texas and the Nation. Total_Allowed_Amount and Total_Medicare_Standard_Payment_Amount of Texas appear to be over the Nation the most.
From the correlation coefficient analysis and the comparison with National data, we find that Days_of_Service(Total count of covered days delivered by a provider in the calendar year) could be the potential direction that we can work towards to reduce the total costs.
The comparison of Texas and the Nation shows that Texas has higher Days_of_Service per stay than the Nation, while the rest of the factors are very similar for the two.
Total_Physical_Therapy_Minutes and Total_Occupational_Therapy_Minutes also have high impact on Total_Charge_Amount. At the same time, both factors also directly related to Days_of_Service.
write.csv(SNF_TX, "SNF_TX.csv")