Post-Acute Care Analysis - Skilled Nursing Facilities

Part 1: Analysis on the factors that impact the total charge amount

Overview

U.S. healthcare is the most expensive in the world. Post-acute care is a big driver of costs and variability in U.S. Healthcare. Given that it is the Medicare population that is the highest utilizer of post-acute care, the Center for Medicare and Medicaid Services (CMS) publishes data on post-acute providers.We are going to utilize the public data and provide insight into the utilization of post-acute care.

Objective

We will analyze Texas Skilled Nursing Facilities focusing on the following aspects:

  • Find high-impact factors that drive the total charge amount.
  • Analyze how the high-impact factors of Texas compared to the nation.
  • Create a model that predict total payment amount.(This will come in the second part.)
  • Investigate the factor importance through the model.(This will come in the second part.)

Analysis

Factors impact total charge amount

  • Call libraries and set seed
# call libraries and set seed
library(readxl)
library(caret)
library(plotly)
library(corrplot)
library(dplyr)
library(data.table)
set.seed(123)
  • Upload the full post acute care dataset
PAC <- read_excel("PAC Provider PUF_v20191002 2.xlsx", sheet = 3)
  • Subset and inspect SNF data from Texas
SNF_TX <- filter(PAC, State == "TX", Service_Category == "SNF")
# delect STATE_TOTAL
SNF_TX <- SNF_TX[-1, ] 
# delect unrelevant variables: Summary_Category, State, Service_Category, Provider_ID, Facility_Name, City and ZIP
SNF_TX <- SNF_TX[, -(1:7)]
  • Convert data type from character to numeric
i <- 1 : 115
SNF_TX[i] <- lapply(SNF_TX[i], as.numeric)
  • Inspect, delete and replace missing values
# check if there is NA values
sum(is.na(SNF_TX))
## [1] 71205
# delete columns have more than 30%(363) NA values
SNF_TX <- SNF_TX[colSums(!is.na(SNF_TX)) > 846]
ncol(SNF_TX)
## [1] 47
# check the summary of each variable to see if there is outliers and zero-variable
summary(SNF_TX)
##  Distinct_Beneficiaries Episode_or_Stay_Count Days_of_Service
##  Min.   : 11.0          Min.   : 11.0         Min.   :   72  
##  1st Qu.: 43.0          1st Qu.: 55.0         1st Qu.: 1753  
##  Median : 75.0          Median : 97.5         Median : 2929  
##  Mean   :100.2          Mean   :126.6         Mean   : 3541  
##  3rd Qu.:132.5          3rd Qu.:170.0         3rd Qu.: 4635  
##  Max.   :790.0          Max.   :954.0         Max.   :21118  
##                                                              
##  Percent_of_Beneficiaries_with_30_or_fewer_Service_Days
##  Min.   :0.1700                                        
##  1st Qu.:0.3600                                        
##  Median :0.4400                                        
##  Mean   :0.4565                                        
##  3rd Qu.:0.5200                                        
##  Max.   :1.0000                                        
##  NA's   :185                                           
##  Percent_of_Beneficiaries_with_60_or_more_Service_Days Total_Charge_Amount
##  Min.   :0.0000                                        Min.   :   79212   
##  1st Qu.:0.1500                                        1st Qu.: 1075750   
##  Median :0.2400                                        Median : 1916924   
##  Mean   :0.2503                                        Mean   : 2494034   
##  3rd Qu.:0.3400                                        3rd Qu.: 3337799   
##  Max.   :0.6800                                        Max.   :29343398   
##  NA's   :297                                                              
##  Total_Medicare_Standard_Payment_Amount Total_Medicare_Payment_Amount
##  Min.   :   24012                       Min.   :  19480              
##  1st Qu.:  753128                       1st Qu.: 666260              
##  Median : 1305099                       Median :1134261              
##  Mean   : 1598570                       Mean   :1431973              
##  3rd Qu.: 2148067                       3rd Qu.:1885673              
##  Max.   :10138651                       Max.   :9506124              
##                                                                      
##  Total_Allowed_Amount Total_Coinsurance_Amount Percent_Dual_Beneficiaries
##  Min.   :   24744     Min.   :    494          Min.   :0.0000            
##  1st Qu.:  845538     1st Qu.: 182883          1st Qu.:0.2900            
##  Median : 1434888     Median : 291823          Median :0.4600            
##  Mean   : 1773101     Mean   : 341128          Mean   :0.4514            
##  3rd Qu.: 2348982     3rd Qu.: 446042          3rd Qu.:0.6000            
##  Max.   :11414121     Max.   :1907997          Max.   :1.0000            
##                                                NA's   :117               
##  Percent_Medicare_Beneficiaries_in_a_Rural_ZIP  Average_Age   
##  Min.   :0.0000                                Min.   :60.00  
##  1st Qu.:0.0200                                1st Qu.:76.00  
##  Median :0.0400                                Median :78.00  
##  Mean   :0.1871                                Mean   :77.75  
##  3rd Qu.:0.1100                                3rd Qu.:80.00  
##  Max.   :1.0000                                Max.   :90.00  
##                                                               
##  Percent_Male_Beneficiaries Percent_Female_Beneficiaries
##  Min.   :0.2000             Min.   :0.0000              
##  1st Qu.:0.3600             1st Qu.:0.5300              
##  Median :0.4100             Median :0.5900              
##  Mean   :0.4206             Mean   :0.5796              
##  3rd Qu.:0.4700             3rd Qu.:0.6400              
##  Max.   :1.0000             Max.   :0.8000              
##  NA's   :145                NA's   :145                 
##  Percent_White_Beneficiaries
##  Min.   :0.0700             
##  1st Qu.:0.6200             
##  Median :0.7900             
##  Mean   :0.7306             
##  3rd Qu.:0.8900             
##  Max.   :1.0000             
##  NA's   :69                 
##  Percent_American_Indian_or_Alaska_Native_Beneficiaries Average_HCC_Score
##  Min.   :0                                              Min.   :1.290    
##  1st Qu.:0                                              1st Qu.:2.260    
##  Median :0                                              Median :2.600    
##  Mean   :0                                              Mean   :2.713    
##  3rd Qu.:0                                              3rd Qu.:3.087    
##  Max.   :0                                              Max.   :6.440    
##  NA's   :267                                                             
##  Average_Number_of_Chronic_Conditions Percent_of_Beneficiaries_with_Alzheimer's
##  Min.   :4.800                        Min.   :0.0000                           
##  1st Qu.:7.100                        1st Qu.:0.5600                           
##  Median :7.400                        Median :0.6300                           
##  Mean   :7.404                        Mean   :0.6162                           
##  3rd Qu.:7.800                        3rd Qu.:0.6900                           
##  Max.   :9.300                        Max.   :0.7500                           
##                                       NA's   :322                              
##  Percent_of_Beneficiaries_with_CHF
##  Min.   :0.2700                   
##  1st Qu.:0.5000                   
##  Median :0.5600                   
##  Mean   :0.5627                   
##  3rd Qu.:0.6200                   
##  Max.   :0.7500                   
##  NA's   :81                       
##  Percent_of_Beneficiaries_with_Chronic_Kidney_Disease
##  Min.   :0.3900                                      
##  1st Qu.:0.6000                                      
##  Median :0.6500                                      
##  Mean   :0.6431                                      
##  3rd Qu.:0.6900                                      
##  Max.   :0.7500                                      
##  NA's   :303                                         
##  Percent_of_Beneficiaries_with_COPD Percent_of_Beneficiaries_with_Depression
##  Min.   :0.1600                     Min.   :0.2700                          
##  1st Qu.:0.3400                     1st Qu.:0.5000                          
##  Median :0.3900                     Median :0.5700                          
##  Mean   :0.4003                     Mean   :0.5647                          
##  3rd Qu.:0.4500                     3rd Qu.:0.6300                          
##  Max.   :0.7400                     Max.   :0.7500                          
##  NA's   :148                        NA's   :168                             
##  Percent_of_Beneficiaries_with_Diabetes
##  Min.   :0.2300                        
##  1st Qu.:0.4500                        
##  Median :0.5200                        
##  Mean   :0.5238                        
##  3rd Qu.:0.6000                        
##  Max.   :0.7500                        
##  NA's   :134                           
##  Percent_of_Beneficiaries_with_Hyperlipidemia Percent_of_Beneficiaries_with_IHD
##  Min.   :0.3100                               Min.   :0.3400                   
##  1st Qu.:0.6100                               1st Qu.:0.5600                   
##  Median :0.6600                               Median :0.6100                   
##  Mean   :0.6469                               Mean   :0.6085                   
##  3rd Qu.:0.7000                               3rd Qu.:0.6600                   
##  Max.   :0.7500                               Max.   :0.7500                   
##  NA's   :282                                  NA's   :147                      
##  Percent_of_Beneficiaries_with_RA/OA Total_Physical_Therapy_Minutes
##  Min.   :0.2900                      Min.   :  1609                
##  1st Qu.:0.5400                      1st Qu.: 17967                
##  Median :0.6100                      Median : 31928                
##  Mean   :0.5984                      Mean   : 41109                
##  3rd Qu.:0.6700                      3rd Qu.: 54116                
##  Max.   :0.7500                      Max.   :303544                
##  NA's   :144                         NA's   :39                    
##  Total_Occupational_Therapy_Minutes Total_Speech-Language_Pathology_Minutes
##  Min.   :   345                     Min.   :     0                         
##  1st Qu.: 18012                     1st Qu.:  6970                         
##  Median : 31533                     Median : 13193                         
##  Mean   : 40061                     Mean   : 15469                         
##  3rd Qu.: 52849                     3rd Qu.: 21160                         
##  Max.   :312564                     Max.   :128376                         
##  NA's   :39                         NA's   :39                             
##  Assessment_Denominator
##  Min.   :  11.0        
##  1st Qu.:  71.0        
##  Median : 122.0        
##  Mean   : 152.3        
##  3rd Qu.: 201.5        
##  Max.   :1046.0        
##  NA's   :39            
##  Percentage_of_RV_Assessments_within_10_minutes_of_Threshold
##  Min.   :0.0000                                             
##  1st Qu.:0.3300                                             
##  Median :0.5700                                             
##  Mean   :0.5398                                             
##  3rd Qu.:0.7600                                             
##  Max.   :1.0000                                             
##  NA's   :303                                                
##  Percentage_of_RU_Assessments_within_10_minutes_of_Threshold
##  Min.   :0.0100                                             
##  1st Qu.:0.5200                                             
##  Median :0.7600                                             
##  Mean   :0.6838                                             
##  3rd Qu.:0.9100                                             
##  Max.   :1.0000                                             
##  NA's   :87                                                 
##  Percentage_with_an_Inpatient_Hospital_Discharge_Status
##  Min.   :0.0000                                        
##  1st Qu.:0.1800                                        
##  Median :0.2200                                        
##  Mean   :0.2166                                        
##  3rd Qu.:0.2500                                        
##  Max.   :0.5600                                        
##  NA's   :302                                           
##  Percentage_with_a_Home_Health_Discharge_Status
##  Min.   :0.00000                               
##  1st Qu.:0.00000                               
##  Median :0.00000                               
##  Mean   :0.07817                               
##  3rd Qu.:0.14000                               
##  Max.   :0.73000                               
##  NA's   :220                                   
##  Percentage_with_a_Hospice_Discharge_Status
##  Min.   :0e+00                             
##  1st Qu.:0e+00                             
##  Median :0e+00                             
##  Mean   :3e-05                             
##  3rd Qu.:0e+00                             
##  Max.   :3e-02                             
##  NA's   :210                               
##  Percentage_with_an_Unknown_Discharge_Status Percentage_of_Days_in_SNF_AAA_RUG
##  Min.   :0.0000                              Min.   :0.0000000                
##  1st Qu.:0.1700                              1st Qu.:0.0000000                
##  Median :0.2400                              Median :0.0000000                
##  Mean   :0.2431                              Mean   :0.0005785                
##  3rd Qu.:0.3200                              3rd Qu.:0.0000000                
##  Max.   :0.7600                              Max.   :0.0500000                
##  NA's   :151                                                                  
##  Percentage_of_Days_in_SNF_Low,_Medium_or_High_Rehabilitation_RUGs
##  Min.   :0.000                                                    
##  1st Qu.:0.030                                                    
##  Median :0.050                                                    
##  Mean   :0.085                                                    
##  3rd Qu.:0.100                                                    
##  Max.   :0.950                                                    
##                                                                   
##  Percentage_of_Days_in_SNF_Very-High_Rehabilitation_RUGs
##  Min.   :0.0000                                         
##  1st Qu.:0.1200                                         
##  Median :0.1900                                         
##  Mean   :0.2087                                         
##  3rd Qu.:0.2800                                         
##  Max.   :0.8400                                         
##                                                         
##  Percentage_of_Days_in_SNF_Ultra-High_Rehabilitation_RUGs
##  Min.   :0.0000                                          
##  1st Qu.:0.5500                                          
##  Median :0.6800                                          
##  Mean   :0.6449                                          
##  3rd Qu.:0.7900                                          
##  Max.   :0.9900                                          
##                                                          
##  Percentage_of_Days_in_SNF_Clinically_Complex_RUGs
##  Min.   :0.00000                                  
##  1st Qu.:0.00000                                  
##  Median :0.01000                                  
##  Mean   :0.01515                                  
##  3rd Qu.:0.02000                                  
##  Max.   :0.73000                                  
##                                                   
##  Percentage_of_Days_in_SNF_Reduced_Physical_Functioning_RUGs
##  Min.   :0.000000                                           
##  1st Qu.:0.000000                                           
##  Median :0.010000                                           
##  Mean   :0.008942                                           
##  3rd Qu.:0.010000                                           
##  Max.   :0.300000                                           
##                                                             
##  Percentage_of_Days_in_SNF_Special_High_Care_RUGs
##  Min.   :0.00000                                 
##  1st Qu.:0.01000                                 
##  Median :0.02000                                 
##  Mean   :0.02883                                 
##  3rd Qu.:0.04000                                 
##  Max.   :0.37000                                 
##                                                  
##  Percentage_of_Days_in_SNF_Behavioral_Symptom_RUGs
##  Min.   :0.000000                                 
##  1st Qu.:0.000000                                 
##  Median :0.000000                                 
##  Mean   :0.001983                                 
##  3rd Qu.:0.000000                                 
##  Max.   :0.200000                                 
##                                                   
##  Percentage_of_Days_in_SNF_Extensive_Service_RUGs
##  Min.   :0.000000                                
##  1st Qu.:0.000000                                
##  Median :0.000000                                
##  Mean   :0.004388                                
##  3rd Qu.:0.000000                                
##  Max.   :0.490000                                
## 

There are no outliers in the dataset, we will replace the rest of NA values with the mean of the corresponding variables.

SNF_TX <- SNF_TX %>%
        mutate_if(is.numeric, zoo::na.aggregate)
# check the sum of NA values, it should be 0
sum(is.na(SNF_TX))
## [1] 0

There are two variables have zero or near zero vairance, we will delete them since they don’t provide useful information for the analysis.

SNF_TX$Percent_American_Indian_or_Alaska_Native_Beneficiaries <- NULL
SNF_TX$Percentage_with_a_Hospice_Discharge_Status <- NULL
  • Move the dependent variable(payment) to the end, for an easier observation.
# move dependent variables to the end
SNF_TX <- SNF_TX %>%
        select(-c(6:10), everything())
SNF_TX <- SNF_TX %>%
        select(-41, everything())

Use correlation coefficients to see which factors has more impact on total payments

corrM <- cor(SNF_TX)
# high coefficient variables impact total payments
hc <- corrM[, "Total_Charge_Amount"]
hc <- hc[which(hc > 0.7)]
hc <- as.data.frame(hc)
hc
##                                               hc
## Distinct_Beneficiaries                 0.8462809
## Episode_or_Stay_Count                  0.8480204
## Days_of_Service                        0.8396123
## Total_Physical_Therapy_Minutes         0.8150268
## Total_Occupational_Therapy_Minutes     0.7993695
## Assessment_Denominator                 0.8055552
## Total_Medicare_Standard_Payment_Amount 0.8449340
## Total_Medicare_Payment_Amount          0.8446619
## Total_Allowed_Amount                   0.8380941
## Total_Coinsurance_Amount               0.7503724
## Total_Charge_Amount                    1.0000000

Above are ten factors which have correlation coefficient over 0.7. We will compare these factors from Texas State and the Nation in the next step.

Comparison of Texas and the Nation

  • Subset a dataset include SNF in Texas State and in the Nation
# subset a dataset with Nation and Texas State data
SNF_COMP <- filter(PAC, Summary_Category %in% c("NATION", "STATE"), State %in% c("TX", "NATIONAL_TOTAL"), Service_Category == "SNF")
# subset a dataset with high correlation coefficient variables
SNF_COMP <- SNF_COMP[, c("Distinct_Beneficiaries", "Episode_or_Stay_Count", "Days_of_Service", "Total_Physical_Therapy_Minutes", "Total_Occupational_Therapy_Minutes", "Assessment_Denominator", "Total_Medicare_Standard_Payment_Amount", "Total_Medicare_Payment_Amount", "Total_Allowed_Amount", "Total_Coinsurance_Amount")]
SNF_COMP
## # A tibble: 2 x 10
##   Distinct_Benefi… Episode_or_Stay… Days_of_Service Total_Physical_…
##              <dbl>            <dbl>           <dbl> <chr>           
## 1          1927773          2413877        60186065 777219672       
## 2           121407           153468         4291027 48219943        
## # … with 6 more variables: Total_Occupational_Therapy_Minutes <chr>,
## #   Assessment_Denominator <chr>, Total_Medicare_Standard_Payment_Amount <dbl>,
## #   Total_Medicare_Payment_Amount <dbl>, Total_Allowed_Amount <chr>,
## #   Total_Coinsurance_Amount <chr>
# transpose the dataset for a better visualization and manipulation 
SNF_NS <- transpose(SNF_COMP)
rownames(SNF_NS) <- colnames(SNF_COMP)
colnames(SNF_NS) <- c("Nation", "Texas")
SNF_NS$Nation <- as.numeric(SNF_NS$Nation)
SNF_NS$Texas <- as.numeric(SNF_NS$Texas)
SNF_NS
##                                             Nation      Texas
## Distinct_Beneficiaries                     1927773     121407
## Episode_or_Stay_Count                      2413877     153468
## Days_of_Service                           60186065    4291027
## Total_Physical_Therapy_Minutes           777219672   48219943
## Total_Occupational_Therapy_Minutes       733831157   46985914
## Assessment_Denominator                     2825031     178669
## Total_Medicare_Standard_Payment_Amount 26962514767 1937021638
## Total_Medicare_Payment_Amount          26274406911 1735050513
## Total_Allowed_Amount                   31586472960 2148533654
## Total_Coinsurance_Amount                5312066049  413483141
  • Compare the ratio of the factors and Episode_or_Stay_Count By comparing the factors per stay for Texas and the Nation, we hope to find out which direction we should target at inorder to reduce the cost.
# add columns reflect the ratio for Texas and the Nation
SNF_NS$Nation_Ratio <- round(SNF_NS$Nation / 2413877, 2)
SNF_NS$Texas_Ratio <- round(SNF_NS$Texas / 153468, 2)
SNF_NS
##                                             Nation      Texas Nation_Ratio
## Distinct_Beneficiaries                     1927773     121407         0.80
## Episode_or_Stay_Count                      2413877     153468         1.00
## Days_of_Service                           60186065    4291027        24.93
## Total_Physical_Therapy_Minutes           777219672   48219943       321.98
## Total_Occupational_Therapy_Minutes       733831157   46985914       304.01
## Assessment_Denominator                     2825031     178669         1.17
## Total_Medicare_Standard_Payment_Amount 26962514767 1937021638     11169.80
## Total_Medicare_Payment_Amount          26274406911 1735050513     10884.73
## Total_Allowed_Amount                   31586472960 2148533654     13085.37
## Total_Coinsurance_Amount                5312066049  413483141      2200.64
##                                        Texas_Ratio
## Distinct_Beneficiaries                        0.79
## Episode_or_Stay_Count                         1.00
## Days_of_Service                              27.96
## Total_Physical_Therapy_Minutes              314.20
## Total_Occupational_Therapy_Minutes          306.16
## Assessment_Denominator                        1.16
## Total_Medicare_Standard_Payment_Amount    12621.66
## Total_Medicare_Payment_Amount             11305.62
## Total_Allowed_Amount                      13999.88
## Total_Coinsurance_Amount                   2694.26
  • Plot the comparison for Distinct_Beneficiaries, Days_of_Service, Assessment_Denominator
variables <- c("Distinct_Beneficiaries", "Days_of_Service", "Assessment_Denominator")

p <- plot_ly(SNF_NS, x = ~variables, y = ~SNF_NS$Nation_Ratio[c(1,3,6)], type = "bar", name = "Nation_Ratio") %>%
        add_trace(y = ~SNF_NS$Texas_Ratio[c(1,3,6)], name = "Texas_Ratio") %>%
        layout(title = "Nation and Texas State Comparison",
               yaxis = list (title = "Ratio"))
p

While Assessment_Denominator and Distinct_Beneficiaries are almost the same for Texas and the Nation, Days_of_Service of Texas is above national average.

  • Plot the comparison for the rest of the factors.
variables1 <- c("Total_Physical_Therapy_Minutes", "Total_Occupational_Therapy_Minutes", "Total_Medicare_Standard_Payment_Amount", "Total_Medicare_Payment_Amount", "Total_Allowed_Amount", "Total_Coinsurance_Amount")

p1 <- plot_ly(SNF_NS, x = ~variables1, y = ~SNF_NS$Nation_Ratio[c(4,5,7:10)], type = "bar", name = "Nation_Ratio") %>%
        add_trace(y = ~SNF_NS$Texas_Ratio[c(4,5,7:10)], name = "Texas_Ratio") %>%
        layout(title = "Nation and Texas State Comparison",
               yaxis = list (title = "Ratio"))
p1

Total_Physical_Therapy_Minutes and Total_Occupational_Therapy_Minutes appear the be very similar for Texas and the Nation. Total_Allowed_Amount and Total_Medicare_Standard_Payment_Amount of Texas appear to be over the Nation the most.

Conclusion

From the correlation coefficient analysis and the comparison with National data, we find that Days_of_Service(Total count of covered days delivered by a provider in the calendar year) could be the potential direction that we can work towards to reduce the total costs.

The comparison of Texas and the Nation shows that Texas has higher Days_of_Service per stay than the Nation, while the rest of the factors are very similar for the two.

Total_Physical_Therapy_Minutes and Total_Occupational_Therapy_Minutes also have high impact on Total_Charge_Amount. At the same time, both factors also directly related to Days_of_Service.

Export SNF_TX dataset for model building

write.csv(SNF_TX, "SNF_TX.csv")