Our primary data source is Definitive Healthcare, which provides comprehensive data extracted from publicly available information, including from federal, state, and other regulatory agencies, in addition to licensed data from other companies, web research on publicly available information through technology as well as electronic and phone surveys conducted by their research team based in the United States.38 Our final dataset consists of all active, short term acute care hospitals in the United States with an active Medicare Provider Number (MPN). Veterans Administration, Military Healthcare System, and Indian Health System hospitals were excluded from the analysis as the outcome measures used in this study are not available for federal government operated hospitals. This limitation reduced our total data set to 2,900 unique hospital observations and the dataset was reduced further based on the number of institutions reporting data for each dependent variable. The total number of observations available for each measure is detailed in the section below.
Given the variety of perspectives and robust number of quality measures available, three aggregate dependent variables were used for this study. The first variable was the 2021 value-based purchasing Total Performance Score. Value-based purchasing is a CMS program that adjusts a hospital’s payments based on its performance in four weighted quality measurement domains to comprise its Total Performance Score. The domains include (1) clinical outcomes, (2) patient and community engagement, (3) safety, and (4) efficiency and cost reduction.17 A total of 2,471 observations for the Hospital Value Based Purchasing Total Performance Score from the year 2021 were available in the final analyzed data set.
The second dependent variable included in the study is the Hospital Compare overall rating. Hospital Compare rating provides consumer-focused information on hospitals’ performance on numerous dimensions of quality, such as treating heart attacks and pneumonia, readmission rates, and safety of care.14 For this study, 2,592 observations were collected from the year 2021 for analysis.
The third dependent variable includes the HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) summary star rating. The HCAHPS survey asks discharged patients numerous questions about their experience with a hospital stay including questions pertaining to communication with nurses and doctors, the responsiveness of hospital staff, the cleanliness and quietness of the hospital environment, communication about medicines, discharge information, overall rating of hospital, and would they recommend the hospital. The patient survey (HCAHPS) summary star rating is the average of all the Star Ratings of the HCAHPS measures.15 Our final data set for analysis included 2,624 observations from the year 2021 for the HCAHPS summary star rating.
require(Amelia)
## Loading required package: Amelia
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
require(car)
## Loading required package: car
## Loading required package: carData
require(caret)
## Loading required package: caret
## Loading required package: ggplot2
## Loading required package: lattice
require(fastDummies)
## Loading required package: fastDummies
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
require(ggplot2)
require(kableExtra)
## Loading required package: kableExtra
## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'
require(MASS)
## Loading required package: MASS
require(ResourceSelection)
## Loading required package: ResourceSelection
## ResourceSelection 0.3-5 2019-07-22
require(tidyverse)
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.1 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag() masks stats::lag()
## ✖ purrr::lift() masks caret::lift()
## ✖ dplyr::recode() masks car::recode()
## ✖ dplyr::select() masks MASS::select()
## ✖ purrr::some() masks car::some()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
require(lmtest)
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
require(magrittr)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
##
## The following object is masked from 'package:purrr':
##
## set_names
##
## The following object is masked from 'package:tidyr':
##
## extract
require(naniar)
## Loading required package: naniar
require(ordinal)
## Loading required package: ordinal
##
## Attaching package: 'ordinal'
##
## The following object is masked from 'package:dplyr':
##
## slice
require(psych)
## Loading required package: psych
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
##
## The following object is masked from 'package:car':
##
## logit
require(ROCR)
## Loading required package: ROCR
require(summarytools)
## Loading required package: summarytools
##
## Attaching package: 'summarytools'
##
## The following object is masked from 'package:tibble':
##
## view
require(tidyverse)
mydata=read.csv('original.csv', stringsAsFactors = TRUE)
mydata=mydata[, -c(1:2)]
gg_miss_var(mydata,show_pct = TRUE)
## Independent Variable
The independent variable of interest in this study is the ‘labor compensation ratio’ (LCR) or the percent of net patient revenue consumed by labor expense. We operationalized this variable by extracting measures from the Medicare Cost Report (MCR) as shown in Equation I below:
\[Labor Cost Ratio = \frac{(Total Salaries + Total Contract Labor Cost + Total Fringe Benefits)}{Net Patient Revenue}\]
The potential for reverse causality prompted us to utilize older (2019) labor compensation ratio data to allow for the impact of hiring practices to be fully realized within the quality outcomes and score reports (data year 2021). The practice of replacing an explanatory variable with its lagged value to counteract endogeneity is prevalent across a wide variety of disciplines in economics and finance.43-46
We also included several organizational level characteristics to control for confounding variance. These variables included the following quantitative variables: percent bed utilization, staffed beds, the complications or major complications (CC & MCC) rate, percent Medicare days, percent Medicaid days, the case mix index (CMI), and average length of stay (ALOS). Categorical variables included for-profit ownership status (FP, dichotomous), the market concentration index (MCI, dichotomous: <= median, > median), rural status (dichotomous), and the AHA region (categorical, 10 levels). Region 1 was consisting of the states of Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont was the referent group. All control variables were extracted from the year 2019.
Independent variables were largely complete (one percent missing total). Medians were imputed for the missing ordinal, interval, and ratio values, while modes were imputed for the missing nominal values. THe only remaining missing data belonged to the dependent variables. Each dependent variable was analyzed separately, and missing cases of the DV were simply dropped. TPS, Rating, and Star included 2471, 2592, 2624 observations, respectively.
missmap(mydata)
sum(!is.na(mydata$y1_TPS))
## [1] 2471
sum(!is.na(mydata$y2_Rating))
## [1] 2592
sum(!is.na(mydata$y3_Star))
## [1] 2624
The average facility score for TPS was 33.61 (sd=11.29). The median Hospital Compare and Star ratings were both 3. The ‘typical’ facility saw a mean LCR of 0.44 (sd=0.14) and a mean bed utilization of 50% (sd=20%).
Hospital Compare and Star Rating dependent variables were both collapsed into dichotomous variables with 0 representing scores/stars less than or equal to three. This recode partially balanced the classes and was an indicator of average to below average performance versus above average performance.
After recoding the Rating variable, there were 1501 observations (57.9%) of facilities with Rating 1-3 and 1091 with Rating 4-5 (42.1%) for a total of 2592 observations. For the Star variable, there were 1929 with 1-3 stars (73.5%) and 695 with 4-5 stars (26.5%) for a total of 2624 observations.
#Y2 Recode
mydata$y2_Rating[mydata$y2_Rating<4]=0
mydata$y2_Rating[mydata$y2_Rating>=4]=1
mydata$y2_Rating=as.factor(mydata$y2_Rating)
levels(mydata$y2_Rating)=c('1-3 Rating','4-5 Rating')
addmargins(table(mydata$y2_Rating))
##
## 1-3 Rating 4-5 Rating Sum
## 1501 1091 2592
#Y3 Recode
mydata$y3_Star[mydata$y3_Star<=3]=0
mydata$y3_Star[mydata$y3_Star>=4]=1
mydata$y3_Star=as.factor(mydata$y3_Star)
levels(mydata$y3_Star)=c('1-3 Stars', '4-5 Stars')
addmargins(table(mydata$y3_Star))
##
## 1-3 Stars 4-5 Stars Sum
## 1929 695 2624
tmp=subset(mydata, select=c('y2_Rating','y3_Star'))
#Graphs of Dependent Variables
par(mfrow=c(1,3))
hist(mydata$y1_TPS, main='TPS Score', col='mistyrose', xlab='TPS Score')
#Y2 and Y3 Graphs
mynames=c('Rating', 'Star')
for (i in 1:ncol(tmp)){
tbl=table(tmp[,i])
pct=tbl/sum(tbl)*100
x=barplot(tbl,col = c("lightblue", "mistyrose", "lightcyan",
"lavender", "cornsilk"),density = 50, main=mynames[i],las=3, ylim=c(0,2500))
text(x,1800, labels=paste0(round(pct,2),"%"),srt=90)
}
The mean LCR was 44% (sd=0.14). The average facility had bed utilization of 49.9% (sd=20.1%) and 203 staffed beds (sd=207). The mean CCMCC rate was 2.2% (sd=3.0%). The average percent of Medicare days was 12.8% (sd=6.2%), while the average percent of Medicaid days was 3.9% (sd=4.8%). The average facility had a CMI of 1.648 (sd=0.367), An ALOS of 4.394 (sd=3.375), and an MCI of 0.341 (sd=0.314). Bivariate pairs plots (Figure XX) illustrate the distributions and bivariate distributions for the quantitative predictors. The maximum absolute value of the correlation (bed utilization and staffed beds) was 0.533.
About 75.9% of the facilities were not-for-profit or government owned, while 90% were in urban settings. Region 4 was the modal region (15.2%). Notched boxplots of TPS and these variables depict distributional differences for each categorical variable (see Figures XXX-XXX). Chi square tests of Rating and separately Star versus FP suggested distributional differences for Rating only, \(\chi_{1}^2 = 34.359, p <.001\). Chi square tests of Rating and Star versus Rural suggested distributional distributions for Star only, \(\chi_{1}^2 = 21.616, p <.001\). For Region versus Rating and Star, there were distributional distributions found for both variables, \(\chi_{9}^2 = 71.017, p <.001\) and \(\chi_{9}^2 = 118.54, p <.001\), respectively.
Please select any graphs you would like, Brad.
myd=describe(mydata)
print(myd,digits=3)
## vars n mean sd median trimmed mad min max range
## y1_TPS 1 2471 33.611 11.288 32.375 32.856 10.564 6.000 92.667 86.667
## LCR 2 2900 0.436 0.138 0.417 0.426 0.129 0.011 0.982 0.971
## BedUtil 3 2900 0.499 0.201 0.516 0.503 0.227 0.001 0.999 0.998
## StaffedBeds 4 2900 0.203 0.207 0.141 0.166 0.136 0.001 2.735 2.734
## CC_MCC 5 2900 0.022 0.030 0.016 0.017 0.010 0.000 0.751 0.751
## MCareDays 6 2900 0.128 0.062 0.122 0.123 0.052 0.000 0.637 0.637
## MAidDays 7 2900 0.039 0.048 0.025 0.030 0.023 0.000 0.669 0.669
## CMI 8 2900 1.648 0.367 1.606 1.621 0.287 0.816 5.268 4.452
## ALOS 9 2900 4.394 3.375 4.213 4.240 0.913 1.000 144.379 143.379
## MCI 10 2900 0.341 0.314 0.236 0.298 0.257 0.025 1.000 0.975
## FP* 11 2900 1.241 0.428 1.000 1.177 0.000 1.000 2.000 1.000
## Rural* 12 2900 1.104 0.305 1.000 1.005 0.000 1.000 2.000 1.000
## Region* 13 2900 5.712 2.676 5.000 5.690 2.965 1.000 10.000 9.000
## y2_Rating* 14 2592 1.421 0.494 1.000 1.401 0.000 1.000 2.000 1.000
## y3_Star* 15 2624 1.265 0.441 1.000 1.206 0.000 1.000 2.000 1.000
## skew kurtosis se
## y1_TPS 0.700 0.704 0.227
## LCR 0.764 0.819 0.003
## BedUtil -0.163 -0.760 0.004
## StaffedBeds 3.029 17.559 0.004
## CC_MCC 8.551 144.308 0.001
## MCareDays 1.363 5.519 0.001
## MAidDays 4.407 36.288 0.001
## CMI 1.731 9.217 0.007
## ALOS 31.488 1196.519 0.063
## MCI 1.009 -0.220 0.006
## FP* 1.208 -0.541 0.008
## Rural* 2.597 4.745 0.006
## Region* 0.090 -1.104 0.050
## y2_Rating* 0.320 -1.898 0.010
## y3_Star* 1.065 -0.866 0.009
tmp=mydata[,11:13] # Quantitative Variables
tmp2=mydata[,2:10]
kdepairs(tmp2, cex.labels=2) # Quantitative Variables
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
par(mfrow=c(1,3))
for (i in 1:ncol(tmp)){
tbl=table(tmp[,i])
pct=tbl/sum(tbl)*100
x=barplot(tbl,col = c("lightblue", "mistyrose", "lightcyan",
"lavender", "cornsilk"),density = 50, main=colnames(tmp)[i],las=3, ylim=c(0,2500))
text(x,1800, labels=paste0(round(pct,2),"%"),srt=90)
}
par(mfrow=c(1,1))
boxplot(mydata$y1_TPS~mydata$FP, horizontal = T, notch=T, main='TPS~FP', col=c("lightblue", "mistyrose"), ylab='TPS', xlab='FP')
(t1=table(mydata$y2_Rating,mydata$FP))%>%kbl()%>%kable_classic(html_font='Cambria')
| No | Yes | |
|---|---|---|
| 1-3 Rating | 1123 | 378 |
| 4-5 Rating | 921 | 170 |
(t2=table(mydata$y3_Star,mydata$FP))%>%kbl()%>%kable_classic(html_font='Cambria')
| No | Yes | |
|---|---|---|
| 1-3 Stars | 1473 | 456 |
| 4-5 Stars | 540 | 155 |
boxplot(mydata$y1_TPS~mydata$Rural, horizontal = T, notch=T, main='TPS~Rural', col=c("lightblue", "mistyrose"), ylab='TPS',xlab='Rural')
(t3=table(mydata$y2_Rating,mydata$Rural))%>%kbl()%>%kable_classic(html_font='Cambria')
| No | Yes | |
|---|---|---|
| 1-3 Rating | 1369 | 132 |
| 4-5 Rating | 988 | 103 |
(t4=table(mydata$y3_Star,mydata$Rural))%>%kbl()%>%kable_classic(html_font='Cambria')
| No | Yes | |
|---|---|---|
| 1-3 Stars | 1807 | 122 |
| 4-5 Stars | 612 | 83 |
boxplot(mydata$y1_TPS~mydata$Region, horizontal = T, notch=T, main='TPS~Region', col=c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk", "navajowhite", 'lightskyblue','lightsalmon','palegreen', 'peachpuff'), ylab='TPS', xlab='Region')
(t5=table(mydata$y2_Rating,mydata$Region))%>%kbl()%>%kable_classic(html_font='Cambria')
| Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7 | Region 8 | Region 9 | Unreported | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1-3 Rating | 59 | 184 | 134 | 300 | 198 | 87 | 186 | 94 | 165 | 94 |
| 4-5 Rating | 56 | 107 | 86 | 125 | 219 | 111 | 122 | 79 | 125 | 61 |
(t6=table(mydata$y3_Star,mydata$Region))%>%kbl()%>%kable_classic(html_font='Cambria')
| Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7 | Region 8 | Region 9 | Unreported | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1-3 Stars | 81 | 250 | 173 | 347 | 267 | 121 | 223 | 133 | 246 | 88 |
| 4-5 Stars | 33 | 48 | 41 | 71 | 151 | 87 | 101 | 53 | 51 | 59 |
print('Rating/Star ~FP')
## [1] "Rating/Star ~FP"
chisq.test(t1) #Rating~FP
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: t1
## X-squared = 34.359, df = 1, p-value = 4.583e-09
chisq.test(t2) #Star~FP
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: t2
## X-squared = 0.4392, df = 1, p-value = 0.5075
print('Rating/Star ~Rural')
## [1] "Rating/Star ~Rural"
chisq.test(t3)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: t3
## X-squared = 0.24689, df = 1, p-value = 0.6193
chisq.test(t4)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: t4
## X-squared = 21.616, df = 1, p-value = 3.33e-06
print('Rating/Star ~Region')
## [1] "Rating/Star ~Region"
chisq.test(t5)
##
## Pearson's Chi-squared test
##
## data: t5
## X-squared = 71.017, df = 9, p-value = 9.618e-12
chisq.test(t6)
##
## Pearson's Chi-squared test
##
## data: t6
## X-squared = 118.54, df = 9, p-value < 2.2e-16
temp=dummy_cols(mydata$Region, remove_first_dummy = TRUE,remove_selected_columns = TRUE)
mydata=as.data.frame(cbind(mydata,temp))
mydata$Region=NULL
colnames(mydata)[15:23]=c('Region2','Region3','Region4','Region5','Region6', 'Region7', 'Region8', 'Region9', 'RegionUnreported')
After descriptive analysis and to investigate external validity of our model estimates, we split the data into training and test sets (80% and 20%, respectively.) Doing so allows estimation of the model performance on unseen data. The training and test split occurred prior to any transformations of the predictor variables, so that statistics estimated from the training set (if any) would not bed leaked to the test set.
set.seed(1234)
mys=sample(1:nrow(mydata), 580, replace=F) #get about an 80% sample depending on y variable
train=mydata[-mys,]
test=mydata[mys, ]
The distributions of the quantitative variables are shown in the pairs plot from the descriptive statistics. Many of the univariate distributions appeared non-normal. While these distributions need not be normal (only the residuals), transforming them to normality often solves modeling problems. Therefore, we investigated transformations on the training set.
The only quantitative dependent variable, TPS, was evaluated for normality based on the training data. Box-Cox methods suggested a power transformation of 0.264. Raising the dependent variable to the 1/4 power, however, results in a loss of interpretability. Although a logarithmic transformation was explored, TPS was left untransformed for explanability
par(mfrow=c(1,2))
hist(train$y1_TPS, main='TPS', xlab='TPS', col='mistyrose')
hist(log(train$y1_TPS), main='Log(TPS)', xlab='Log(TPS)', col='lightsteelblue')
LCR appeared to be slightly right-skewed (see Figure XXXX). The recommended power transformation based on the training set (Box-Cox method) was 0.369. Since interpretation of a 1/3 power transformation is not necessarily intuitive and transformation of predictors may improve model build but are not required to meet model assumptions, we left the variable untransformed.
par(mfrow=c(1,3))
hist(train$LCR, main='LCR, Training Set', xlab='LCR', col='mistyrose')
hist(train$LCR^.5,main='LCR^.5, Training Set', xlab='LCR^.5', col='lightsteelblue')
hist(log(train$LCR), main='Log(LCR)', xlab='Log(LCR)', col='lightgreen')
# Power Transform
myt=powerTransform(train$LCR)
myt$lambda
## train$LCR
## 0.3690125
Bed Utilization appeared Gaussian, and Box-Cox methods suggested a transformation of nearly 1, \(LRT_1=0.986, p=1\). An additional LRT test of normality failed to reject the null Gaussian assumption with untransformed data, \(LRT_1=0.121, p=0.728\).
hist(train$BedUtil, main='Bed Utilization', xlab='LCR', col='mistyrose')
# Power Transform
myt=powerTransform(train$BedUtil)
myt$lambda
## train$BedUtil
## 0.9862363
testTransform(myt,1)
## LRT df pval
## LR test, lambda = (1) 0.1205962 1 0.72839
Staffed beds appeared to be lognormal, although the optimal transformation calculated on the training set and based on Box-Cox power analysis was 0.12. For interpretability, an imperfect logarithmic transformation \(LRT_1=44.556, p<0.001\), which avoids using a power function generated only from the training set was applied to both the training and test set.
# Graphs
par(mfrow=c(1,2))
hist(train$StaffedBeds,main='Staffed Beds, Training Set', xlab='Staffed Beds', col='mistyrose')
hist(log(train$StaffedBeds),main='Log(Staffed Beds), Training Set', xlab='Log(Staffed Beds)', col='lightsteelblue')
# Power Transform
myt=powerTransform(train$StaffedBeds+.0001)
myt$lambda
## train$StaffedBeds + 1e-04
## 0.1186714
testTransform(myt,0)
## LRT df pval
## LR test, lambda = (0) 44.55574 1 2.4722e-11
# Transformations
mydata$StaffedBeds=log(mydata$StaffedBeds)
train$StaffedBeds=log(train$StaffedBeds)
test$StaffedBeds=log(test$StaffedBeds)
CC MCC was largely left-skewed. Although the optimal normality transformation for the training set was 0.304, by Box-Cox methods, an imperfect logarithmic transformation, \(LRT_1=4634.466, p<0.001\) produced an interpretable and more visually normal distribution, even better than a square root transformation.
par(mfrow=c(1,3))
hist(train$CC_MCC,main='CC MCC', xlab='CC MCC', col='mistyrose')
hist(log(train$CC_MCC), main='Log(CC MCC)', xlab='Log(CC MCC)', col='lightsteelblue')
hist(train$CC_MCC^.5, main='CC_MCC^.5', xlab='CC_MCC^.5', col='lightgreen')
# Power Transform
myt=powerTransform(train$CC_MCC+.01)
myt$lambda
## train$CC_MCC + 0.01
## -0.5219373
testTransform(myt,0)
## LRT df pval
## LR test, lambda = (0) 366.6845 1 < 2.22e-16
# Transformations
mydata$CC_MCC=log(mydata$CC_MCC+.01)
train$CC_MCC=log(train$CC_MCC+.01)
test$CC_MCC=log(test$CC_MCC+.01)
Medicare days were obviously right skewed. The Box-Cox recommended transformation for the training set was raising to the 0.559 power, nearly a square root transformation. Thus, an imperfect square root transformation (\(LRT_1=5.578, p=0.012\)) was sufficient for both interpretability and conversion to normality, and this transformation required no information from the training set.
par(mfrow=c(1,2))
hist(train$MCareDays,main='Medicare Days', xlab='Medicare Days', col='mistyrose')
hist(train$MCareDays^.5,main='Medicare Days^(1/2)', xlab='Medicare Days', col='lightsteelblue')
# Power Transform
myt=powerTransform(train$MCareDays)
myt$lambda
## train$MCareDays
## 0.5586571
testTransform(myt, .5)
## LRT df pval
## LR test, lambda = (0.5) 5.547714 1 0.018505
# Transformations
mydata$MCareDays=mydata$MCareDays^.5
train$MCareDays=train$MCareDays^.5
test$MCareDays=test$MCareDays^.5
Medicaid days appeared roughly lognormal. While the optimal Box-Cox transformation power for the training set was 0.196, an imperfect logarithmic transform (\(LRT_1=231.300, p<0.001\)) provided a visually more normal distribution while retaining intepretability.
par(mfrow=c(1,2))
hist(train$MAidDays,main='Medicaid Days', xlab='Medicaid Days', col='mistyrose')
hist(log(train$MAidDays),main='Log(Medicaid Days)', xlab='Medicaid Days', col='lightsteelblue')
# Power Transform
myt=powerTransform(train$MAidDays)
myt$lambda
## train$MAidDays
## 0.1958326
testTransform(myt, 0)
## LRT df pval
## LR test, lambda = (0) 231.2996 1 < 2.22e-16
# Transformations
mydata$MAidDays=log(mydata$MAidDays)
train$MAidDays=log(train$MAidDays)
test$MAidDays=log(test$MAidDays)
While the optimal transformation for CMI was -0.213, an imperfect logarithmic transformation (\(LRT_1=9.088, p=0.003\)) provided a sufficiently normal and interpretable distribution by visual inspection.
par(mfrow=c(1,2))
hist(mydata$CMI,main='CMI', xlab='CMI', col='mistyrose')
hist(log(mydata$CMI),main='CMI', xlab='CMI', col='lightsteelblue')
# Power Transform
myt=powerTransform(train$CMI)
myt$lambda
## train$CMI
## -0.2127362
testTransform(myt,0)
## LRT df pval
## LR test, lambda = (0) 9.087465 1 0.0025737
# Transformations
mydata$CMI=log(mydata$CMI)
train$CMI=log(train$CMI)
test$CMI=log(test$CMI)
An imperfect logarithmic transformation for ALOS (\(LRT_1=32.990, p<0.001\), also made this variable visually normal while interpretable. The optimal transformation based on the training data was -0.138, \(LRT_1=0, p=1\).
par(mfrow=c(1,2))
hist(train$ALOS,main='ALOS', xlab='ALOS', col='mistyrose')
hist(log(train$ALOS),main='Log(ALOS)', xlab='Log(ALOS)', col='lightsteelblue')
# Power Trainsform
myt=powerTransform(train$ALOS)
myt$lambda
## train$ALOS
## -0.1382294
testTransform(myt,0 )
## LRT df pval
## LR test, lambda = (0) 32.89989 1 9.7029e-09
# Transformations
mydata$ALOS=log(mydata$ALOS)
train$ALOS=log(train$ALOS)
test$ALOS=log(test$ALOS)
Box-Cox transformation failed to produce a visually reasonable Gaussian distribution for MCI even when selecting the optimal lambda. The distribution is close to bimodal at the ends. MCI was thus dichotomized at the median of the training set based on its intractable, bi-modal distribution. This split is interpretable and proved useful in modeling.
To avoid leaking information from the training set to the test set, the median was calculated on the training set only. The dichotomous split for ‘below or equal to median’ and ‘above median’ were then applied to to test set.
# Power Trainsform
myt=powerTransform(train$MCI)
myt$lambda
## train$MCI
## 0.1559153
par(mfrow=c(1,2))
hist(train$MCI, main='MCI', xlab='MCI', col='mistyrose')
hist(train$MCI^.1559153, main='MCI^.16', xlab='MCI^.16', col='lightsteelblue')
mymedian=median(train$MCI) #median from the training set
temp=mydata$MCI
temp[temp<=mymedian]=0
temp[temp>mymedian]=1
mydata$MCI=as.factor(temp)
addmargins(table(mydata$MCI))%>%kbl()%>%kable_classic(html_font='Cambria')
| Var1 | Freq |
|---|---|
| 0 | 1564 |
| 1 | 1336 |
| Sum | 2900 |
temp2=train$MCI
temp2[temp2<=mymedian]=0
temp2[temp2>mymedian]=1
train$MCI=as.factor(temp2)
addmargins(table(train$MCI))%>%kbl()%>%kable_classic(html_font='Cambria')
| Var1 | Freq |
|---|---|
| 0 | 1248 |
| 1 | 1072 |
| Sum | 2320 |
temp3=test$MCI
temp3[temp3<=mymedian]=0
temp3[temp3>mymedian]=1
test$MCI=as.factor(temp3)
addmargins(table(test$MCI))%>%kbl()%>%kable_classic(html_font='Cambria')
| Var1 | Freq |
|---|---|
| 0 | 316 |
| 1 | 264 |
| Sum | 580 |
#for (i in 2:9){
# tmpMu=mean(train[,i])
# tmpSD=sd(train[,i])
# mydata[,i]=(mydata[,i]-tmpMu)/tmpSD
# train[,i]=(train[,i]-tmpMu)/tmpSD
# test[,i]=(test[,i]-tmpMu)/tmpSD
#}
describe(train[,1:9])
## vars n mean sd median trimmed mad min max range skew
## y1_TPS 1 1973 33.47 11.31 32.00 32.65 10.38 8.67 92.67 84.00 0.76
## LCR 2 2320 0.44 0.14 0.42 0.43 0.13 0.01 0.98 0.97 0.75
## BedUtil 3 2320 0.50 0.20 0.51 0.50 0.23 0.00 1.00 1.00 -0.16
## StaffedBeds 4 2320 -2.03 0.99 -1.97 -2.00 1.06 -6.91 0.84 7.74 -0.34
## CC_MCC 5 2320 -3.59 0.51 -3.64 -3.63 0.37 -4.61 -0.27 4.33 1.26
## MCareDays 6 2320 0.35 0.09 0.35 0.35 0.08 0.02 0.80 0.78 -0.05
## MAidDays 7 2320 -3.81 1.20 -3.70 -3.72 1.04 -9.58 -0.40 9.17 -0.89
## CMI 8 2320 0.48 0.21 0.47 0.48 0.18 -0.16 1.47 1.63 0.18
## ALOS 9 2320 1.42 0.30 1.44 1.43 0.22 0.00 4.97 4.97 0.73
## kurtosis se
## y1_TPS 0.87 0.25
## LCR 0.79 0.00
## BedUtil -0.74 0.00
## StaffedBeds 0.05 0.02
## CC_MCC 3.64 0.01
## MCareDays 1.22 0.00
## MAidDays 1.79 0.02
## CMI 1.00 0.00
## ALOS 15.61 0.01
describe(test[,1:9])
## vars n mean sd median trimmed mad min max range skew
## y1_TPS 1 498 34.17 11.19 33.71 33.67 11.35 6.00 69.67 63.67 0.44
## LCR 2 580 0.43 0.14 0.42 0.42 0.13 0.12 0.98 0.86 0.83
## BedUtil 3 580 0.50 0.20 0.52 0.51 0.22 0.05 0.93 0.88 -0.18
## StaffedBeds 4 580 -2.06 0.96 -1.94 -2.05 0.99 -5.12 1.01 6.12 -0.19
## CC_MCC 5 580 -3.61 0.47 -3.66 -3.64 0.36 -4.61 -1.49 3.11 1.03
## MCareDays 6 580 0.35 0.09 0.35 0.34 0.07 0.04 0.77 0.73 0.42
## MAidDays 7 580 -3.84 1.19 -3.70 -3.79 1.04 -9.93 -0.48 9.45 -0.79
## CMI 8 580 0.48 0.21 0.47 0.47 0.18 -0.20 1.66 1.86 0.72
## ALOS 9 580 1.43 0.29 1.44 1.44 0.22 0.13 3.31 3.18 0.23
## kurtosis se
## y1_TPS 0.09 0.50
## LCR 0.93 0.01
## BedUtil -0.84 0.01
## StaffedBeds -0.17 0.04
## CC_MCC 2.93 0.02
## MCareDays 2.22 0.00
## MAidDays 2.26 0.05
## CMI 3.15 0.01
## ALOS 6.44 0.01
kdepairs(mydata[, 2:9], cex.labels=2)
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
## Warning in par(usr): argument 1 does not name a graphical parameter
Using multiple regression, we evaluated TPS as a function of all predictors on the training data. The initial regression model was built on the training set, and forecasts from that model were used to assess performance out of sample on the test set. The results of the statistically signficant regression model (\(F_{1952}^{20}=27.08, p<0.001\)) are shown in Table XXX and Figure YYY. Only bed utilization and some of the regions were not statistically part of the model at the \(\alpha=.10\) level.. The \(R^2\) was only a modest 0.2172, and the standard error was 10.06 TPS points.
The estimates from the training data were used to estimate the dependent variable in the test set. The forecast were reasonable, accounting for \(R^2=0.180\) of the variance (Root Mean Squared Error = 4.339.) While imperfect, the forecasts demonstrate little loss of generality.
# Build training and test sets on full data
train1=subset(train,!is.na(train$y1_TPS),)
test1=subset(test,!is.na(test$y1_TPS),)
# Run Regression
mylm=lm(y1_TPS~., data=train1[,-c(13:14)])
#stepAIC(mylm)
summary(mylm)
##
## Call:
## lm(formula = y1_TPS ~ ., data = train1[, -c(13:14)])
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.873 -6.576 -0.473 6.004 40.347
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.7723 3.9537 11.830 < 2e-16 ***
## LCR -5.2252 2.1583 -2.421 0.015571 *
## BedUtil 1.0771 1.7621 0.611 0.541083
## StaffedBeds -4.6500 0.4415 -10.533 < 2e-16 ***
## CC_MCC 3.0878 0.6315 4.890 1.09e-06 ***
## MCareDays -7.5009 3.1239 -2.401 0.016439 *
## MAidDays -0.8167 0.2427 -3.366 0.000778 ***
## CMI 8.1753 2.0497 3.988 6.89e-05 ***
## ALOS -7.2014 1.4932 -4.823 1.53e-06 ***
## MCI1 -0.7633 0.4698 -1.625 0.104356
## FPYes -2.0277 0.6075 -3.338 0.000860 ***
## RuralYes 2.6841 0.8669 3.096 0.001989 **
## Region2 -3.1436 1.2676 -2.480 0.013222 *
## Region3 -2.5298 1.3331 -1.898 0.057882 .
## Region4 -4.4943 1.2188 -3.687 0.000233 ***
## Region5 -2.6371 1.2406 -2.126 0.033654 *
## Region6 0.1982 1.3747 0.144 0.885352
## Region7 -3.9829 1.2771 -3.119 0.001843 **
## Region8 -2.1886 1.4270 -1.534 0.125261
## Region9 0.2607 1.2706 0.205 0.837438
## RegionUnreported 1.2302 1.6764 0.734 0.463132
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.01 on 1952 degrees of freedom
## Multiple R-squared: 0.2251, Adjusted R-squared: 0.2172
## F-statistic: 28.35 on 20 and 1952 DF, p-value: < 2.2e-16
anova(mylm)
## Analysis of Variance Table
##
## Response: y1_TPS
## Df Sum Sq Mean Sq F value Pr(>F)
## LCR 1 345 344.7 3.4420 0.0637075 .
## BedUtil 1 11758 11757.8 117.4099 < 2.2e-16 ***
## StaffedBeds 1 24729 24728.6 246.9326 < 2.2e-16 ***
## CC_MCC 1 4109 4109.4 41.0349 1.869e-10 ***
## MCareDays 1 497 496.9 4.9619 0.0260259 *
## MAidDays 1 1248 1248.1 12.4630 0.0004247 ***
## CMI 1 2689 2688.9 26.8506 2.425e-07 ***
## ALOS 1 2418 2418.0 24.1453 9.678e-07 ***
## MCI 1 644 643.5 6.4260 0.0113238 *
## FP 1 1884 1884.3 18.8163 1.513e-05 ***
## Rural 1 747 746.7 7.4568 0.0063765 **
## Region2 1 220 219.9 2.1963 0.1385056
## Region3 1 1 1.1 0.0109 0.9170388
## Region4 1 2003 2002.6 19.9976 8.199e-06 ***
## Region5 1 426 425.6 4.2503 0.0393761 *
## Region6 1 338 338.2 3.3774 0.0662468 .
## Region7 1 2084 2084.3 20.8136 5.377e-06 ***
## Region8 1 590 589.8 5.8895 0.0153212 *
## Region9 1 5 4.6 0.0463 0.8296137
## RegionUnreported 1 54 53.9 0.5385 0.4631322
## Residuals 1952 195480 100.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Predict regression on Unseen Test Set
par(mfrow=c(3,1))
mypred=predict(mylm,test1)
plot(test1$y1_TPS~mypred, xlab='Predictions', main='TPS~Predictions', ylab='TPS', col='lightsteelblue')
abline(lm(test1$y1_TPS~mypred), col='red')
hist(mylm$residuals, main='Residuals',xlab='Residuals', col='mistyrose')
qqnorm(mylm$residuals, col='lightgreen')
ncvTest(mylm)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 164.6829, Df = 1, p = < 2.22e-16
mys=summary(lm(test1$y1_TPS~mypred))
print('Performance on Test Set')
## [1] "Performance on Test Set"
print(noquote(paste0('R2: ', mys$r.squared)))
## [1] R2: 0.17441901020975
print(noquote(paste0('Adj R2: ', mys$adj.r.squared)))
## [1] Adj R2: 0.172754532407754
print(noquote(paste0('RMSE ', sqrt(sum(test1$y1_TPS-mypred)^2/nrow(test1)))))
## [1] RMSE 4.71063536616421
myt=powerTransform(mylm$residuals+abs(min(mylm$residuals))+.001)
myt$lambda
## mylm$residuals + abs(min(mylm$residuals)) + 0.001
## 0.7930931
testTransform(myt,1)
## LRT df pval
## LR test, lambda = (1) 21.10645 1 4.3446e-06
ncvTest(mylm)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 164.6829, Df = 1, p = < 2.22e-16
Colinearity was not a factor for this analysis (max variance inflation factor = 4.164).
vif(mylm)%>%kable()
| x | |
|---|---|
| LCR | 1.473733 |
| BedUtil | 2.079524 |
| StaffedBeds | 2.743331 |
| CC_MCC | 1.140201 |
| MCareDays | 1.161864 |
| MAidDays | 1.389762 |
| CMI | 2.385343 |
| ALOS | 1.964638 |
| MCI | 1.084715 |
| FP | 1.218265 |
| Rural | 1.153489 |
| Region2 | 3.272455 |
| Region3 | 2.859218 |
| Region4 | 4.164477 |
| Region5 | 4.036800 |
| Region6 | 2.599255 |
| Region7 | 3.346578 |
| Region8 | 2.714347 |
| Region9 | 3.312685 |
| RegionUnreported | 1.763967 |
For Rating, we fit all independent and control variables to the training set and evaluated accuracy metrics on the test set. Only bed utilization and some regional variables were not statistically significant at the \(\alpha=.10\) level.
When using the training parameter estimates to forecast the test data, the results were reasonable. The forecast were 66% accurate in classification with a sensitivity (recall) of 62% and a precision (positive predictive value) of 54%. Specificity was 68%.
# Subset for non-missing Y2 values
train2=subset(train,!is.na(train$y2_Rating),)
test2=subset(test,!is.na(test$y2_Rating),)
# GLM
myglm=glm(y2_Rating ~ ., data=train2[,-c(1,14)], family=binomial)
(mys=summary(myglm))
##
## Call:
## glm(formula = y2_Rating ~ ., family = binomial, data = train2[,
## -c(1, 14)])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8745 -0.9473 -0.6182 1.0720 2.6303
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.78080 0.89189 5.360 8.31e-08 ***
## LCR -2.29476 0.47277 -4.854 1.21e-06 ***
## BedUtil 0.42870 0.38302 1.119 0.263025
## StaffedBeds -0.41138 0.09594 -4.288 1.80e-05 ***
## CC_MCC 0.68870 0.13897 4.956 7.20e-07 ***
## MCareDays -0.30430 0.67889 -0.448 0.653983
## MAidDays -0.16593 0.05252 -3.159 0.001581 **
## CMI 2.45613 0.45758 5.368 7.98e-08 ***
## ALOS -2.22987 0.34424 -6.478 9.32e-11 ***
## MCI1 -0.20502 0.10128 -2.024 0.042938 *
## FPYes -0.87494 0.13620 -6.424 1.33e-10 ***
## RuralYes 0.41148 0.18184 2.263 0.023644 *
## Region2 -0.83321 0.27097 -3.075 0.002105 **
## Region3 -1.01973 0.28284 -3.605 0.000312 ***
## Region4 -1.31350 0.26076 -5.037 4.72e-07 ***
## Region5 -0.72288 0.26050 -2.775 0.005521 **
## Region6 -0.48251 0.28706 -1.681 0.092787 .
## Region7 -0.95333 0.27038 -3.526 0.000422 ***
## Region8 -0.89022 0.30486 -2.920 0.003500 **
## Region9 -0.87352 0.26826 -3.256 0.001129 **
## RegionUnreported -0.85820 0.30425 -2.821 0.004791 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2819.2 on 2073 degrees of freedom
## Residual deviance: 2472.2 on 2053 degrees of freedom
## AIC: 2514.2
##
## Number of Fisher Scoring iterations: 4
options(scipen=1)
temp=as.data.frame(as.matrix(mys$coefficients))
temp=round(temp,3)
temp$OR=round(exp(temp[,1]),3)
mypred=round(predict(myglm, test2, type='response'),0)
# ROC
perf_AUC=performance(prediction(mypred,test2$y2_Rating),"auc") #Calculate the AUC value
AUC=perf_AUC@y.values[[1]]
perf_ROC=performance(prediction(mypred,test2$y2_Rating),"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))
# Confusion Matrix and Accuracy Metrics
confusionMatrix(as.factor(as.numeric(test2$y2_Rating)-1), as.factor(mypred),positive='1')
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 221 73
## 1 104 120
##
## Accuracy : 0.6583
## 95% CI : (0.6157, 0.6991)
## No Information Rate : 0.6274
## P-Value [Acc > NIR] : 0.07888
##
## Kappa : 0.2922
##
## Mcnemar's Test P-Value : 0.02414
##
## Sensitivity : 0.6218
## Specificity : 0.6800
## Pos Pred Value : 0.5357
## Neg Pred Value : 0.7517
## Prevalence : 0.3726
## Detection Rate : 0.2317
## Detection Prevalence : 0.4324
## Balanced Accuracy : 0.6509
##
## 'Positive' Class : 1
##
For Star, we also fit all independent and control variables to the training set and evaluated accuracy metrics on the test set. Only CC MCC and MCI were not in the model at the \(\alpha=.10\) level.
The forecast of the test set was 78% accurate with a sensitivity of 68% but a low positive predictive value of 38% indicating over classification of the 4-5 Star factor level.
train3=subset(train,!is.na(train$y3_Star),)
test3=subset(test,!is.na(test$y3_Star),)
myglm1=glm(formula = y3_Star ~ .,data = train3[,-c(1,13)],family=binomial, n_iter=1000, control = list(maxit = 10000))
summary(myglm1)
##
## Call:
## glm(formula = y3_Star ~ ., family = binomial, data = train3[,
## -c(1, 13)], control = list(maxit = 10000), n_iter = 1000)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2660 -0.6928 -0.4625 0.2446 2.6147
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.59381 0.97152 -0.611 0.541054
## LCR -1.15299 0.54854 -2.102 0.035560 *
## BedUtil -0.91674 0.45006 -2.037 0.041655 *
## StaffedBeds -0.98380 0.10843 -9.073 < 2e-16 ***
## CC_MCC 0.59646 0.14335 4.161 3.17e-05 ***
## MCareDays -1.27511 0.78624 -1.622 0.104850
## MAidDays -0.21779 0.05894 -3.695 0.000220 ***
## CMI 3.90830 0.48177 8.112 4.96e-16 ***
## ALOS -0.46748 0.35400 -1.321 0.186645
## MCI1 -0.03277 0.11989 -0.273 0.784596
## FPYes -0.96866 0.17459 -5.548 2.89e-08 ***
## RuralYes 0.95798 0.19803 4.838 1.31e-06 ***
## Region2 -1.23858 0.32204 -3.846 0.000120 ***
## Region3 -1.33632 0.33744 -3.960 7.49e-05 ***
## Region4 -1.13911 0.30152 -3.778 0.000158 ***
## Region5 -0.57444 0.28977 -1.982 0.047433 *
## Region6 -0.59169 0.31597 -1.873 0.061125 .
## Region7 -0.96252 0.31009 -3.104 0.001909 **
## Region8 -0.97887 0.34104 -2.870 0.004101 **
## Region9 -1.28368 0.31616 -4.060 4.90e-05 ***
## RegionUnreported -0.57968 0.34701 -1.671 0.094817 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2421.4 on 2099 degrees of freedom
## Residual deviance: 1880.2 on 2079 degrees of freedom
## AIC: 1922.2
##
## Number of Fisher Scoring iterations: 5
mypred2=round(predict(myglm1, test3, type='response'),0)
# ROC
perf_AUC=performance(prediction(mypred2,test3$y3_Star),"auc") #Calculate the AUC value
AUC=perf_AUC@y.values[[1]]
perf_ROC=performance(prediction(mypred2,test3$y3_Star),"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))
confusionMatrix( as.factor(as.numeric(test3$y3_Star)-1), as.factor(mypred2), positive='1')
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 356 26
## 1 88 54
##
## Accuracy : 0.7824
## 95% CI : (0.7446, 0.817)
## No Information Rate : 0.8473
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.3618
##
## Mcnemar's Test P-Value : 1.109e-08
##
## Sensitivity : 0.6750
## Specificity : 0.8018
## Pos Pred Value : 0.3803
## Neg Pred Value : 0.9319
## Prevalence : 0.1527
## Detection Rate : 0.1031
## Detection Prevalence : 0.2710
## Balanced Accuracy : 0.7384
##
## 'Positive' Class : 1
##
When comparing directionality of the coefficients, only those that lacked statistical significance in one or more models showed difference in direction. These variables included bed utilization, CC MCC , and Regions 6, 9, and Unknown. The congruence of the directionality would indicate that most of the predictors are directionaly stable for these three quality metrics. Aside from Rural status and CMI, the predictors are negative, meaning that increases are associated with reduced quality. Rural status and CMI, however, are positive, indicating that increases result in increased quality.
tmp1=round(summary(mylm)$coefficients[,c(1,4)],3)
tmp2=round(summary(myglm)$coefficients[,c(1,4)],3)
tmp3=round(summary(myglm1)$coefficients[,c(1,4)],3)
mycoef=cbind(tmp1,tmp2, tmp3)
partial=mycoef[, c(1,3,5)]
tmp=apply(partial,1,function(x) sum((sign(x))))
tmp[abs(tmp)<3]="Mixed"
tmp[tmp==-3]="All -"
tmp[tmp==3]="All+"
mycoef=as.data.frame(mycoef)
mycoef$Congruence=as.vector(tmp)
colnames(mycoef)=c('TPS','p', 'Rating','p', 'Stars','p', 'Congruence')
mycoef=mycoef[2:nrow(mycoef),]
mycoef[order(mycoef$Congruence),]
## TPS p Rating p Stars p Congruence
## LCR -5.225 0.016 -2.295 0.000 -1.153 0.036 All -
## StaffedBeds -4.650 0.000 -0.411 0.000 -0.984 0.000 All -
## MCareDays -7.501 0.016 -0.304 0.654 -1.275 0.105 All -
## MAidDays -0.817 0.001 -0.166 0.002 -0.218 0.000 All -
## ALOS -7.201 0.000 -2.230 0.000 -0.467 0.187 All -
## MCI1 -0.763 0.104 -0.205 0.043 -0.033 0.785 All -
## FPYes -2.028 0.001 -0.875 0.000 -0.969 0.000 All -
## Region2 -3.144 0.013 -0.833 0.002 -1.239 0.000 All -
## Region3 -2.530 0.058 -1.020 0.000 -1.336 0.000 All -
## Region4 -4.494 0.000 -1.314 0.000 -1.139 0.000 All -
## Region5 -2.637 0.034 -0.723 0.006 -0.574 0.047 All -
## Region7 -3.983 0.002 -0.953 0.000 -0.963 0.002 All -
## Region8 -2.189 0.125 -0.890 0.003 -0.979 0.004 All -
## CC_MCC 3.088 0.000 0.689 0.000 0.596 0.000 All+
## CMI 8.175 0.000 2.456 0.000 3.908 0.000 All+
## RuralYes 2.684 0.002 0.411 0.024 0.958 0.000 All+
## BedUtil 1.077 0.541 0.429 0.263 -0.917 0.042 Mixed
## Region6 0.198 0.885 -0.483 0.093 -0.592 0.061 Mixed
## Region9 0.261 0.837 -0.874 0.001 -1.284 0.000 Mixed
## RegionUnreported 1.230 0.463 -0.858 0.005 -0.580 0.095 Mixed