Method and Models

Data & Sample

Our primary data source is Definitive Healthcare, which provides comprehensive data extracted from publicly available information, including from federal, state, and other regulatory agencies, in addition to licensed data from other companies, web research on publicly available information through technology as well as electronic and phone surveys conducted by their research team based in the United States.38 Our final dataset consists of all active, short term acute care hospitals in the United States with an active Medicare Provider Number (MPN). Veterans Administration, Military Healthcare System, and Indian Health System hospitals were excluded from the analysis as the outcome measures used in this study are not available for federal government operated hospitals. This limitation reduced our total data set to 2,900 unique hospital observations and the dataset was reduced further based on the number of institutions reporting data for each dependent variable. The total number of observations available for each measure is detailed in the section below.

Dependent Variables

Given the variety of perspectives and robust number of quality measures available, three aggregate dependent variables were used for this study. The first variable was the 2021 value-based purchasing Total Performance Score. Value-based purchasing is a CMS program that adjusts a hospital’s payments based on its performance in four weighted quality measurement domains to comprise its Total Performance Score. The domains include (1) clinical outcomes, (2) patient and community engagement, (3) safety, and (4) efficiency and cost reduction.17 A total of 2,471 observations for the Hospital Value Based Purchasing Total Performance Score from the year 2021 were available in the final analyzed data set.

The second dependent variable included in the study is the Hospital Compare overall rating. Hospital Compare rating provides consumer-focused information on hospitals’ performance on numerous dimensions of quality, such as treating heart attacks and pneumonia, readmission rates, and safety of care.14 For this study, 2,592 observations were collected from the year 2021 for analysis.

The third dependent variable includes the HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) summary star rating. The HCAHPS survey asks discharged patients numerous questions about their experience with a hospital stay including questions pertaining to communication with nurses and doctors, the responsiveness of hospital staff, the cleanliness and quietness of the hospital environment, communication about medicines, discharge information, overall rating of hospital, and would they recommend the hospital. The patient survey (HCAHPS) summary star rating is the average of all the Star Ratings of the HCAHPS measures.15 Our final data set for analysis included 2,624 observations from the year 2021 for the HCAHPS summary star rating.

Load Data

require(Amelia)

## Loading required package: Amelia

## Loading required package: Rcpp

## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##

require(car)

## Loading required package: car

## Loading required package: carData

require(caret)

## Loading required package: caret

## Loading required package: ggplot2

## Loading required package: lattice

require(fastDummies)

## Loading required package: fastDummies

require(GGally)

## Loading required package: GGally

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

require(ggplot2)
require(kableExtra)

## Loading required package: kableExtra

## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'

require(MASS)

## Loading required package: MASS

require(ResourceSelection)

## Loading required package: ResourceSelection

## ResourceSelection 0.3-5   2019-07-22

require(tidyverse)

## Loading required package: tidyverse

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ purrr::lift()       masks caret::lift()
## ✖ dplyr::recode()     masks car::recode()
## ✖ dplyr::select()     masks MASS::select()
## ✖ purrr::some()       masks car::some()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

require(lmtest)

## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

require(magrittr)

## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## The following object is masked from 'package:tidyr':
## 
##     extract

require(naniar)

## Loading required package: naniar

require(ordinal)

## Loading required package: ordinal
## 
## Attaching package: 'ordinal'
## 
## The following object is masked from 'package:dplyr':
## 
##     slice

require(psych)

## Loading required package: psych
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
## 
## The following object is masked from 'package:car':
## 
##     logit

require(ROCR)

## Loading required package: ROCR

require(summarytools)

## Loading required package: summarytools
## 
## Attaching package: 'summarytools'
## 
## The following object is masked from 'package:tibble':
## 
##     view

require(tidyverse)

mydata=read.csv('original.csv', stringsAsFactors = TRUE)
mydata=mydata[, -c(1:2)]
gg_miss_var(mydata,show_pct = TRUE)

## Independent Variable

The independent variable of interest in this study is the ‘labor compensation ratio’ (LCR) or the percent of net patient revenue consumed by labor expense. We operationalized this variable by extracting measures from the Medicare Cost Report (MCR) as shown in Equation I below:

\[Labor Cost Ratio = \frac{(Total Salaries + Total Contract Labor Cost + Total Fringe Benefits)}{Net Patient Revenue}\]

The potential for reverse causality prompted us to utilize older (2019) labor compensation ratio data to allow for the impact of hiring practices to be fully realized within the quality outcomes and score reports (data year 2021). The practice of replacing an explanatory variable with its lagged value to counteract endogeneity is prevalent across a wide variety of disciplines in economics and finance.43-46

Control Variables

We also included several organizational level characteristics to control for confounding variance. These variables included the following quantitative variables: percent bed utilization, staffed beds, the complications or major complications (CC & MCC) rate, percent Medicare days, percent Medicaid days, the case mix index (CMI), and average length of stay (ALOS). Categorical variables included for-profit ownership status (FP, dichotomous), the market concentration index (MCI, dichotomous: <= median, > median), rural status (dichotomous), and the AHA region (categorical, 10 levels). Region 1 was consisting of the states of Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont was the referent group. All control variables were extracted from the year 2019.

Exporatory Data Analysis

Missing Data

Independent variables were largely complete (one percent missing total). Medians were imputed for the missing ordinal, interval, and ratio values, while modes were imputed for the missing nominal values. THe only remaining missing data belonged to the dependent variables. Each dependent variable was analyzed separately, and missing cases of the DV were simply dropped. TPS, Rating, and Star included 2471, 2592, 2624 observations, respectively.

missmap(mydata)

sum(!is.na(mydata$y1_TPS))

## [1] 2471

sum(!is.na(mydata$y2_Rating))

## [1] 2592

sum(!is.na(mydata$y3_Star))

## [1] 2624

Descriptive Statistics

Dependent Variables, Re-coding, and Plots

The average facility score for TPS was 33.61 (sd=11.29). The median Hospital Compare and Star ratings were both 3. The ‘typical’ facility saw a mean LCR of 0.44 (sd=0.14) and a mean bed utilization of 50% (sd=20%).

Hospital Compare and Star Rating dependent variables were both collapsed into dichotomous variables with 0 representing scores/stars less than or equal to three. This recode partially balanced the classes and was an indicator of average to below average performance versus above average performance.

After recoding the Rating variable, there were 1501 observations (57.9%) of facilities with Rating 1-3 and 1091 with Rating 4-5 (42.1%) for a total of 2592 observations. For the Star variable, there were 1929 with 1-3 stars (73.5%) and 695 with 4-5 stars (26.5%) for a total of 2624 observations.

#Y2 Recode
mydata$y2_Rating[mydata$y2_Rating<4]=0
mydata$y2_Rating[mydata$y2_Rating>=4]=1
mydata$y2_Rating=as.factor(mydata$y2_Rating)
levels(mydata$y2_Rating)=c('1-3 Rating','4-5 Rating')
addmargins(table(mydata$y2_Rating))

## 
## 1-3 Rating 4-5 Rating        Sum 
##       1501       1091       2592

#Y3 Recode
mydata$y3_Star[mydata$y3_Star<=3]=0
mydata$y3_Star[mydata$y3_Star>=4]=1
mydata$y3_Star=as.factor(mydata$y3_Star)
levels(mydata$y3_Star)=c('1-3 Stars', '4-5 Stars')
addmargins(table(mydata$y3_Star))

## 
## 1-3 Stars 4-5 Stars       Sum 
##      1929       695      2624

tmp=subset(mydata, select=c('y2_Rating','y3_Star'))

#Graphs of Dependent Variables
par(mfrow=c(1,3))
hist(mydata$y1_TPS, main='TPS Score', col='mistyrose', xlab='TPS Score')
#Y2 and Y3 Graphs
mynames=c('Rating', 'Star')
for (i in 1:ncol(tmp)){
  tbl=table(tmp[,i])
  pct=tbl/sum(tbl)*100
  x=barplot(tbl,col = c("lightblue", "mistyrose", "lightcyan",
                "lavender", "cornsilk"),density = 50, main=mynames[i],las=3, ylim=c(0,2500))
    text(x,1800, labels=paste0(round(pct,2),"%"),srt=90)

}

Independent Variables and Controls and Plots

The mean LCR was 44% (sd=0.14). The average facility had bed utilization of 49.9% (sd=20.1%) and 203 staffed beds (sd=207). The mean CCMCC rate was 2.2% (sd=3.0%). The average percent of Medicare days was 12.8% (sd=6.2%), while the average percent of Medicaid days was 3.9% (sd=4.8%). The average facility had a CMI of 1.648 (sd=0.367), An ALOS of 4.394 (sd=3.375), and an MCI of 0.341 (sd=0.314). Bivariate pairs plots (Figure XX) illustrate the distributions and bivariate distributions for the quantitative predictors. The maximum absolute value of the correlation (bed utilization and staffed beds) was 0.533.

About 75.9% of the facilities were not-for-profit or government owned, while 90% were in urban settings. Region 4 was the modal region (15.2%). Notched boxplots of TPS and these variables depict distributional differences for each categorical variable (see Figures XXX-XXX). Chi square tests of Rating and separately Star versus FP suggested distributional differences for Rating only, \(\chi_{1}^2 = 34.359, p <.001\). Chi square tests of Rating and Star versus Rural suggested distributional distributions for Star only, \(\chi_{1}^2 = 21.616, p <.001\). For Region versus Rating and Star, there were distributional distributions found for both variables, \(\chi_{9}^2 = 71.017, p <.001\) and \(\chi_{9}^2 = 118.54, p <.001\), respectively.

Please select any graphs you would like, Brad.

myd=describe(mydata)
print(myd,digits=3)

##             vars    n   mean     sd median trimmed    mad   min     max   range
## y1_TPS         1 2471 33.611 11.288 32.375  32.856 10.564 6.000  92.667  86.667
## LCR            2 2900  0.436  0.138  0.417   0.426  0.129 0.011   0.982   0.971
## BedUtil        3 2900  0.499  0.201  0.516   0.503  0.227 0.001   0.999   0.998
## StaffedBeds    4 2900  0.203  0.207  0.141   0.166  0.136 0.001   2.735   2.734
## CC_MCC         5 2900  0.022  0.030  0.016   0.017  0.010 0.000   0.751   0.751
## MCareDays      6 2900  0.128  0.062  0.122   0.123  0.052 0.000   0.637   0.637
## MAidDays       7 2900  0.039  0.048  0.025   0.030  0.023 0.000   0.669   0.669
## CMI            8 2900  1.648  0.367  1.606   1.621  0.287 0.816   5.268   4.452
## ALOS           9 2900  4.394  3.375  4.213   4.240  0.913 1.000 144.379 143.379
## MCI           10 2900  0.341  0.314  0.236   0.298  0.257 0.025   1.000   0.975
## FP*           11 2900  1.241  0.428  1.000   1.177  0.000 1.000   2.000   1.000
## Rural*        12 2900  1.104  0.305  1.000   1.005  0.000 1.000   2.000   1.000
## Region*       13 2900  5.712  2.676  5.000   5.690  2.965 1.000  10.000   9.000
## y2_Rating*    14 2592  1.421  0.494  1.000   1.401  0.000 1.000   2.000   1.000
## y3_Star*      15 2624  1.265  0.441  1.000   1.206  0.000 1.000   2.000   1.000
##               skew kurtosis    se
## y1_TPS       0.700    0.704 0.227
## LCR          0.764    0.819 0.003
## BedUtil     -0.163   -0.760 0.004
## StaffedBeds  3.029   17.559 0.004
## CC_MCC       8.551  144.308 0.001
## MCareDays    1.363    5.519 0.001
## MAidDays     4.407   36.288 0.001
## CMI          1.731    9.217 0.007
## ALOS        31.488 1196.519 0.063
## MCI          1.009   -0.220 0.006
## FP*          1.208   -0.541 0.008
## Rural*       2.597    4.745 0.006
## Region*      0.090   -1.104 0.050
## y2_Rating*   0.320   -1.898 0.010
## y3_Star*     1.065   -0.866 0.009

tmp=mydata[,11:13] # Quantitative Variables
tmp2=mydata[,2:10]
kdepairs(tmp2, cex.labels=2) # Quantitative Variables

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

par(mfrow=c(1,3))

for (i in 1:ncol(tmp)){
  tbl=table(tmp[,i])
  pct=tbl/sum(tbl)*100
  x=barplot(tbl,col = c("lightblue", "mistyrose", "lightcyan",
                "lavender", "cornsilk"),density = 50, main=colnames(tmp)[i],las=3, ylim=c(0,2500))
    text(x,1800, labels=paste0(round(pct,2),"%"),srt=90)
}

par(mfrow=c(1,1))

boxplot(mydata$y1_TPS~mydata$FP, horizontal = T, notch=T, main='TPS~FP', col=c("lightblue", "mistyrose"), ylab='TPS', xlab='FP')

(t1=table(mydata$y2_Rating,mydata$FP))%>%kbl()%>%kable_classic(html_font='Cambria')

	No	Yes
1-3 Rating	1123	378
4-5 Rating	921	170

(t2=table(mydata$y3_Star,mydata$FP))%>%kbl()%>%kable_classic(html_font='Cambria')

	No	Yes
1-3 Stars	1473	456
4-5 Stars	540	155

boxplot(mydata$y1_TPS~mydata$Rural, horizontal = T, notch=T, main='TPS~Rural', col=c("lightblue", "mistyrose"), ylab='TPS',xlab='Rural')

(t3=table(mydata$y2_Rating,mydata$Rural))%>%kbl()%>%kable_classic(html_font='Cambria')

	No	Yes
1-3 Rating	1369	132
4-5 Rating	988	103

(t4=table(mydata$y3_Star,mydata$Rural))%>%kbl()%>%kable_classic(html_font='Cambria')

	No	Yes
1-3 Stars	1807	122
4-5 Stars	612	83

boxplot(mydata$y1_TPS~mydata$Region, horizontal = T, notch=T, main='TPS~Region', col=c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk", "navajowhite", 'lightskyblue','lightsalmon','palegreen', 'peachpuff'), ylab='TPS', xlab='Region')

(t5=table(mydata$y2_Rating,mydata$Region))%>%kbl()%>%kable_classic(html_font='Cambria')

	Region 1	Region 2	Region 3	Region 4	Region 5	Region 6	Region 7	Region 8	Region 9	Unreported
1-3 Rating	59	184	134	300	198	87	186	94	165	94
4-5 Rating	56	107	86	125	219	111	122	79	125	61

(t6=table(mydata$y3_Star,mydata$Region))%>%kbl()%>%kable_classic(html_font='Cambria')

	Region 1	Region 2	Region 3	Region 4	Region 5	Region 6	Region 7	Region 8	Region 9	Unreported
1-3 Stars	81	250	173	347	267	121	223	133	246	88
4-5 Stars	33	48	41	71	151	87	101	53	51	59

print('Rating/Star ~FP')

## [1] "Rating/Star ~FP"

chisq.test(t1) #Rating~FP

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t1
## X-squared = 34.359, df = 1, p-value = 4.583e-09

chisq.test(t2) #Star~FP

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t2
## X-squared = 0.4392, df = 1, p-value = 0.5075

print('Rating/Star ~Rural')

## [1] "Rating/Star ~Rural"

chisq.test(t3)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t3
## X-squared = 0.24689, df = 1, p-value = 0.6193

chisq.test(t4)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  t4
## X-squared = 21.616, df = 1, p-value = 3.33e-06

print('Rating/Star ~Region')

## [1] "Rating/Star ~Region"

chisq.test(t5)

## 
##  Pearson's Chi-squared test
## 
## data:  t5
## X-squared = 71.017, df = 9, p-value = 9.618e-12

chisq.test(t6)

## 
##  Pearson's Chi-squared test
## 
## data:  t6
## X-squared = 118.54, df = 9, p-value < 2.2e-16

Convert Region to Dummy Variables

temp=dummy_cols(mydata$Region, remove_first_dummy = TRUE,remove_selected_columns = TRUE)
mydata=as.data.frame(cbind(mydata,temp))
mydata$Region=NULL
colnames(mydata)[15:23]=c('Region2','Region3','Region4','Region5','Region6', 'Region7', 'Region8', 'Region9', 'RegionUnreported')

Training & Test Set Split

After descriptive analysis and to investigate external validity of our model estimates, we split the data into training and test sets (80% and 20%, respectively.) Doing so allows estimation of the model performance on unseen data. The training and test split occurred prior to any transformations of the predictor variables, so that statistics estimated from the training set (if any) would not bed leaked to the test set.

set.seed(1234)
mys=sample(1:nrow(mydata), 580, replace=F) #get about an 80% sample depending on y variable
train=mydata[-mys,]
test=mydata[mys, ]

Transformation of Variables

The distributions of the quantitative variables are shown in the pairs plot from the descriptive statistics. Many of the univariate distributions appeared non-normal. While these distributions need not be normal (only the residuals), transforming them to normality often solves modeling problems. Therefore, we investigated transformations on the training set.

Y1-TPS

The only quantitative dependent variable, TPS, was evaluated for normality based on the training data. Box-Cox methods suggested a power transformation of 0.264. Raising the dependent variable to the 1/4 power, however, results in a loss of interpretability. Although a logarithmic transformation was explored, TPS was left untransformed for explanability

par(mfrow=c(1,2))

hist(train$y1_TPS, main='TPS', xlab='TPS', col='mistyrose')
hist(log(train$y1_TPS), main='Log(TPS)', xlab='Log(TPS)', col='lightsteelblue')

LCR

LCR appeared to be slightly right-skewed (see Figure XXXX). The recommended power transformation based on the training set (Box-Cox method) was 0.369. Since interpretation of a 1/3 power transformation is not necessarily intuitive and transformation of predictors may improve model build but are not required to meet model assumptions, we left the variable untransformed.

par(mfrow=c(1,3))

hist(train$LCR, main='LCR, Training Set', xlab='LCR', col='mistyrose')
hist(train$LCR^.5,main='LCR^.5, Training Set', xlab='LCR^.5', col='lightsteelblue')
hist(log(train$LCR), main='Log(LCR)', xlab='Log(LCR)', col='lightgreen')

# Power Transform
myt=powerTransform(train$LCR)
myt$lambda

## train$LCR 
## 0.3690125

Bed Utilization

Bed Utilization appeared Gaussian, and Box-Cox methods suggested a transformation of nearly 1, \(LRT_1=0.986, p=1\). An additional LRT test of normality failed to reject the null Gaussian assumption with untransformed data, \(LRT_1=0.121, p=0.728\).

hist(train$BedUtil, main='Bed Utilization', xlab='LCR', col='mistyrose')

# Power Transform
myt=powerTransform(train$BedUtil)
myt$lambda

## train$BedUtil 
##     0.9862363

testTransform(myt,1)

##                             LRT df    pval
## LR test, lambda = (1) 0.1205962  1 0.72839

Staffed Beds

Staffed beds appeared to be lognormal, although the optimal transformation calculated on the training set and based on Box-Cox power analysis was 0.12. For interpretability, an imperfect logarithmic transformation \(LRT_1=44.556, p<0.001\), which avoids using a power function generated only from the training set was applied to both the training and test set.

# Graphs
par(mfrow=c(1,2))
hist(train$StaffedBeds,main='Staffed Beds, Training Set', xlab='Staffed Beds', col='mistyrose')
hist(log(train$StaffedBeds),main='Log(Staffed Beds), Training Set', xlab='Log(Staffed Beds)', col='lightsteelblue')

# Power Transform
myt=powerTransform(train$StaffedBeds+.0001)
myt$lambda

## train$StaffedBeds + 1e-04 
##                 0.1186714

testTransform(myt,0)

##                            LRT df       pval
## LR test, lambda = (0) 44.55574  1 2.4722e-11

# Transformations
mydata$StaffedBeds=log(mydata$StaffedBeds)
train$StaffedBeds=log(train$StaffedBeds)
test$StaffedBeds=log(test$StaffedBeds)

CC MCC

CC MCC was largely left-skewed. Although the optimal normality transformation for the training set was 0.304, by Box-Cox methods, an imperfect logarithmic transformation, \(LRT_1=4634.466, p<0.001\) produced an interpretable and more visually normal distribution, even better than a square root transformation.

par(mfrow=c(1,3))

hist(train$CC_MCC,main='CC MCC', xlab='CC MCC', col='mistyrose')
hist(log(train$CC_MCC), main='Log(CC MCC)', xlab='Log(CC MCC)', col='lightsteelblue')
hist(train$CC_MCC^.5, main='CC_MCC^.5', xlab='CC_MCC^.5', col='lightgreen')

# Power Transform
myt=powerTransform(train$CC_MCC+.01)
myt$lambda

## train$CC_MCC + 0.01 
##          -0.5219373

testTransform(myt,0)

##                            LRT df       pval
## LR test, lambda = (0) 366.6845  1 < 2.22e-16

# Transformations
mydata$CC_MCC=log(mydata$CC_MCC+.01)
train$CC_MCC=log(train$CC_MCC+.01)
test$CC_MCC=log(test$CC_MCC+.01)

Medicare Days

Medicare days were obviously right skewed. The Box-Cox recommended transformation for the training set was raising to the 0.559 power, nearly a square root transformation. Thus, an imperfect square root transformation (\(LRT_1=5.578, p=0.012\)) was sufficient for both interpretability and conversion to normality, and this transformation required no information from the training set.

par(mfrow=c(1,2))

hist(train$MCareDays,main='Medicare Days', xlab='Medicare Days', col='mistyrose')
hist(train$MCareDays^.5,main='Medicare Days^(1/2)', xlab='Medicare Days', col='lightsteelblue')

# Power Transform
myt=powerTransform(train$MCareDays)
myt$lambda

## train$MCareDays 
##       0.5586571

testTransform(myt, .5)

##                              LRT df     pval
## LR test, lambda = (0.5) 5.547714  1 0.018505

# Transformations
mydata$MCareDays=mydata$MCareDays^.5
train$MCareDays=train$MCareDays^.5
test$MCareDays=test$MCareDays^.5

Medicaid Days

Medicaid days appeared roughly lognormal. While the optimal Box-Cox transformation power for the training set was 0.196, an imperfect logarithmic transform (\(LRT_1=231.300, p<0.001\)) provided a visually more normal distribution while retaining intepretability.

par(mfrow=c(1,2))
hist(train$MAidDays,main='Medicaid Days', xlab='Medicaid Days', col='mistyrose')
hist(log(train$MAidDays),main='Log(Medicaid Days)', xlab='Medicaid Days', col='lightsteelblue')

# Power Transform
myt=powerTransform(train$MAidDays)
myt$lambda

## train$MAidDays 
##      0.1958326

testTransform(myt, 0)

##                            LRT df       pval
## LR test, lambda = (0) 231.2996  1 < 2.22e-16

# Transformations
mydata$MAidDays=log(mydata$MAidDays)
train$MAidDays=log(train$MAidDays)
test$MAidDays=log(test$MAidDays)

CMI

While the optimal transformation for CMI was -0.213, an imperfect logarithmic transformation (\(LRT_1=9.088, p=0.003\)) provided a sufficiently normal and interpretable distribution by visual inspection.

par(mfrow=c(1,2))
hist(mydata$CMI,main='CMI', xlab='CMI', col='mistyrose')
hist(log(mydata$CMI),main='CMI', xlab='CMI', col='lightsteelblue')

# Power Transform
myt=powerTransform(train$CMI)
myt$lambda

##  train$CMI 
## -0.2127362

testTransform(myt,0)

##                            LRT df      pval
## LR test, lambda = (0) 9.087465  1 0.0025737

# Transformations
mydata$CMI=log(mydata$CMI)
train$CMI=log(train$CMI)
test$CMI=log(test$CMI)

ALOS

An imperfect logarithmic transformation for ALOS (\(LRT_1=32.990, p<0.001\), also made this variable visually normal while interpretable. The optimal transformation based on the training data was -0.138, \(LRT_1=0, p=1\).

par(mfrow=c(1,2))

hist(train$ALOS,main='ALOS', xlab='ALOS', col='mistyrose')
hist(log(train$ALOS),main='Log(ALOS)', xlab='Log(ALOS)', col='lightsteelblue')

# Power Trainsform
myt=powerTransform(train$ALOS)
myt$lambda

## train$ALOS 
## -0.1382294

testTransform(myt,0 )

##                            LRT df       pval
## LR test, lambda = (0) 32.89989  1 9.7029e-09

# Transformations
mydata$ALOS=log(mydata$ALOS)
train$ALOS=log(train$ALOS)
test$ALOS=log(test$ALOS)

MCI

Box-Cox transformation failed to produce a visually reasonable Gaussian distribution for MCI even when selecting the optimal lambda. The distribution is close to bimodal at the ends. MCI was thus dichotomized at the median of the training set based on its intractable, bi-modal distribution. This split is interpretable and proved useful in modeling.

To avoid leaking information from the training set to the test set, the median was calculated on the training set only. The dichotomous split for ‘below or equal to median’ and ‘above median’ were then applied to to test set.

# Power Trainsform
myt=powerTransform(train$MCI)
myt$lambda

## train$MCI 
## 0.1559153

par(mfrow=c(1,2))

hist(train$MCI, main='MCI', xlab='MCI', col='mistyrose')
hist(train$MCI^.1559153, main='MCI^.16', xlab='MCI^.16', col='lightsteelblue')

mymedian=median(train$MCI) #median from the training set

temp=mydata$MCI
temp[temp<=mymedian]=0
temp[temp>mymedian]=1
mydata$MCI=as.factor(temp)
addmargins(table(mydata$MCI))%>%kbl()%>%kable_classic(html_font='Cambria')

Var1	Freq
0	1564
1	1336
Sum	2900

temp2=train$MCI
temp2[temp2<=mymedian]=0
temp2[temp2>mymedian]=1
train$MCI=as.factor(temp2)
addmargins(table(train$MCI))%>%kbl()%>%kable_classic(html_font='Cambria')

Var1	Freq
0	1248
1	1072
Sum	2320

temp3=test$MCI
temp3[temp3<=mymedian]=0
temp3[temp3>mymedian]=1
test$MCI=as.factor(temp3)
addmargins(table(test$MCI))%>%kbl()%>%kable_classic(html_font='Cambria')

Var1	Freq
0	316
1	264
Sum	580

Final Pairs Plot

#for (i in 2:9){
#  tmpMu=mean(train[,i])
 # tmpSD=sd(train[,i])
#  mydata[,i]=(mydata[,i]-tmpMu)/tmpSD
#  train[,i]=(train[,i]-tmpMu)/tmpSD
#  test[,i]=(test[,i]-tmpMu)/tmpSD
#}

describe(train[,1:9])

##             vars    n  mean    sd median trimmed   mad   min   max range  skew
## y1_TPS         1 1973 33.47 11.31  32.00   32.65 10.38  8.67 92.67 84.00  0.76
## LCR            2 2320  0.44  0.14   0.42    0.43  0.13  0.01  0.98  0.97  0.75
## BedUtil        3 2320  0.50  0.20   0.51    0.50  0.23  0.00  1.00  1.00 -0.16
## StaffedBeds    4 2320 -2.03  0.99  -1.97   -2.00  1.06 -6.91  0.84  7.74 -0.34
## CC_MCC         5 2320 -3.59  0.51  -3.64   -3.63  0.37 -4.61 -0.27  4.33  1.26
## MCareDays      6 2320  0.35  0.09   0.35    0.35  0.08  0.02  0.80  0.78 -0.05
## MAidDays       7 2320 -3.81  1.20  -3.70   -3.72  1.04 -9.58 -0.40  9.17 -0.89
## CMI            8 2320  0.48  0.21   0.47    0.48  0.18 -0.16  1.47  1.63  0.18
## ALOS           9 2320  1.42  0.30   1.44    1.43  0.22  0.00  4.97  4.97  0.73
##             kurtosis   se
## y1_TPS          0.87 0.25
## LCR             0.79 0.00
## BedUtil        -0.74 0.00
## StaffedBeds     0.05 0.02
## CC_MCC          3.64 0.01
## MCareDays       1.22 0.00
## MAidDays        1.79 0.02
## CMI             1.00 0.00
## ALOS           15.61 0.01

describe(test[,1:9])

##             vars   n  mean    sd median trimmed   mad   min   max range  skew
## y1_TPS         1 498 34.17 11.19  33.71   33.67 11.35  6.00 69.67 63.67  0.44
## LCR            2 580  0.43  0.14   0.42    0.42  0.13  0.12  0.98  0.86  0.83
## BedUtil        3 580  0.50  0.20   0.52    0.51  0.22  0.05  0.93  0.88 -0.18
## StaffedBeds    4 580 -2.06  0.96  -1.94   -2.05  0.99 -5.12  1.01  6.12 -0.19
## CC_MCC         5 580 -3.61  0.47  -3.66   -3.64  0.36 -4.61 -1.49  3.11  1.03
## MCareDays      6 580  0.35  0.09   0.35    0.34  0.07  0.04  0.77  0.73  0.42
## MAidDays       7 580 -3.84  1.19  -3.70   -3.79  1.04 -9.93 -0.48  9.45 -0.79
## CMI            8 580  0.48  0.21   0.47    0.47  0.18 -0.20  1.66  1.86  0.72
## ALOS           9 580  1.43  0.29   1.44    1.44  0.22  0.13  3.31  3.18  0.23
##             kurtosis   se
## y1_TPS          0.09 0.50
## LCR             0.93 0.01
## BedUtil        -0.84 0.01
## StaffedBeds    -0.17 0.04
## CC_MCC          2.93 0.02
## MCareDays       2.22 0.00
## MAidDays        2.26 0.05
## CMI             3.15 0.01
## ALOS            6.44 0.01

kdepairs(mydata[, 2:9], cex.labels=2)

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

## Warning in par(usr): argument 1 does not name a graphical parameter

Results

Y1 Models

Y1 Marginal Effects

Using multiple regression, we evaluated TPS as a function of all predictors on the training data. The initial regression model was built on the training set, and forecasts from that model were used to assess performance out of sample on the test set. The results of the statistically signficant regression model (\(F_{1952}^{20}=27.08, p<0.001\)) are shown in Table XXX and Figure YYY. Only bed utilization and some of the regions were not statistically part of the model at the \(\alpha=.10\) level.. The \(R^2\) was only a modest 0.2172, and the standard error was 10.06 TPS points.

The estimates from the training data were used to estimate the dependent variable in the test set. The forecast were reasonable, accounting for \(R^2=0.180\) of the variance (Root Mean Squared Error = 4.339.) While imperfect, the forecasts demonstrate little loss of generality.

# Build training and test sets on full data

train1=subset(train,!is.na(train$y1_TPS),)
test1=subset(test,!is.na(test$y1_TPS),)

# Run Regression

mylm=lm(y1_TPS~., data=train1[,-c(13:14)])
#stepAIC(mylm)
summary(mylm)

## 
## Call:
## lm(formula = y1_TPS ~ ., data = train1[, -c(13:14)])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.873  -6.576  -0.473   6.004  40.347 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       46.7723     3.9537  11.830  < 2e-16 ***
## LCR               -5.2252     2.1583  -2.421 0.015571 *  
## BedUtil            1.0771     1.7621   0.611 0.541083    
## StaffedBeds       -4.6500     0.4415 -10.533  < 2e-16 ***
## CC_MCC             3.0878     0.6315   4.890 1.09e-06 ***
## MCareDays         -7.5009     3.1239  -2.401 0.016439 *  
## MAidDays          -0.8167     0.2427  -3.366 0.000778 ***
## CMI                8.1753     2.0497   3.988 6.89e-05 ***
## ALOS              -7.2014     1.4932  -4.823 1.53e-06 ***
## MCI1              -0.7633     0.4698  -1.625 0.104356    
## FPYes             -2.0277     0.6075  -3.338 0.000860 ***
## RuralYes           2.6841     0.8669   3.096 0.001989 ** 
## Region2           -3.1436     1.2676  -2.480 0.013222 *  
## Region3           -2.5298     1.3331  -1.898 0.057882 .  
## Region4           -4.4943     1.2188  -3.687 0.000233 ***
## Region5           -2.6371     1.2406  -2.126 0.033654 *  
## Region6            0.1982     1.3747   0.144 0.885352    
## Region7           -3.9829     1.2771  -3.119 0.001843 ** 
## Region8           -2.1886     1.4270  -1.534 0.125261    
## Region9            0.2607     1.2706   0.205 0.837438    
## RegionUnreported   1.2302     1.6764   0.734 0.463132    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.01 on 1952 degrees of freedom
## Multiple R-squared:  0.2251, Adjusted R-squared:  0.2172 
## F-statistic: 28.35 on 20 and 1952 DF,  p-value: < 2.2e-16

anova(mylm)

## Analysis of Variance Table
## 
## Response: y1_TPS
##                    Df Sum Sq Mean Sq  F value    Pr(>F)    
## LCR                 1    345   344.7   3.4420 0.0637075 .  
## BedUtil             1  11758 11757.8 117.4099 < 2.2e-16 ***
## StaffedBeds         1  24729 24728.6 246.9326 < 2.2e-16 ***
## CC_MCC              1   4109  4109.4  41.0349 1.869e-10 ***
## MCareDays           1    497   496.9   4.9619 0.0260259 *  
## MAidDays            1   1248  1248.1  12.4630 0.0004247 ***
## CMI                 1   2689  2688.9  26.8506 2.425e-07 ***
## ALOS                1   2418  2418.0  24.1453 9.678e-07 ***
## MCI                 1    644   643.5   6.4260 0.0113238 *  
## FP                  1   1884  1884.3  18.8163 1.513e-05 ***
## Rural               1    747   746.7   7.4568 0.0063765 ** 
## Region2             1    220   219.9   2.1963 0.1385056    
## Region3             1      1     1.1   0.0109 0.9170388    
## Region4             1   2003  2002.6  19.9976 8.199e-06 ***
## Region5             1    426   425.6   4.2503 0.0393761 *  
## Region6             1    338   338.2   3.3774 0.0662468 .  
## Region7             1   2084  2084.3  20.8136 5.377e-06 ***
## Region8             1    590   589.8   5.8895 0.0153212 *  
## Region9             1      5     4.6   0.0463 0.8296137    
## RegionUnreported    1     54    53.9   0.5385 0.4631322    
## Residuals        1952 195480   100.1                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Predict regression on Unseen Test Set
par(mfrow=c(3,1))
mypred=predict(mylm,test1)
plot(test1$y1_TPS~mypred, xlab='Predictions', main='TPS~Predictions', ylab='TPS', col='lightsteelblue')
abline(lm(test1$y1_TPS~mypred), col='red')
hist(mylm$residuals, main='Residuals',xlab='Residuals', col='mistyrose')
qqnorm(mylm$residuals, col='lightgreen')

ncvTest(mylm)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 164.6829, Df = 1, p = < 2.22e-16

mys=summary(lm(test1$y1_TPS~mypred))
print('Performance on Test Set')

## [1] "Performance on Test Set"

print(noquote(paste0('R2: ', mys$r.squared)))

## [1] R2: 0.17441901020975

print(noquote(paste0('Adj R2: ', mys$adj.r.squared)))

## [1] Adj R2: 0.172754532407754

print(noquote(paste0('RMSE ', sqrt(sum(test1$y1_TPS-mypred)^2/nrow(test1)))))

## [1] RMSE 4.71063536616421

myt=powerTransform(mylm$residuals+abs(min(mylm$residuals))+.001)
myt$lambda

## mylm$residuals + abs(min(mylm$residuals)) + 0.001 
##                                         0.7930931

testTransform(myt,1)

##                            LRT df       pval
## LR test, lambda = (1) 21.10645  1 4.3446e-06

ncvTest(mylm)

## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 164.6829, Df = 1, p = < 2.22e-16

Y1-Model VIF

Colinearity was not a factor for this analysis (max variance inflation factor = 4.164).

vif(mylm)%>%kable()

	x
LCR	1.473733
BedUtil	2.079524
StaffedBeds	2.743331
CC_MCC	1.140201
MCareDays	1.161864
MAidDays	1.389762
CMI	2.385343
ALOS	1.964638
MCI	1.084715
FP	1.218265
Rural	1.153489
Region2	3.272455
Region3	2.859218
Region4	4.164477
Region5	4.036800
Region6	2.599255
Region7	3.346578
Region8	2.714347
Region9	3.312685
RegionUnreported	1.763967

Y2 Model-Rating (Logistic Regression)

For Rating, we fit all independent and control variables to the training set and evaluated accuracy metrics on the test set. Only bed utilization and some regional variables were not statistically significant at the \(\alpha=.10\) level.

When using the training parameter estimates to forecast the test data, the results were reasonable. The forecast were 66% accurate in classification with a sensitivity (recall) of 62% and a precision (positive predictive value) of 54%. Specificity was 68%.

Y2-Marginal Effects

# Subset for non-missing Y2 values

train2=subset(train,!is.na(train$y2_Rating),)
test2=subset(test,!is.na(test$y2_Rating),)

# GLM
myglm=glm(y2_Rating ~ ., data=train2[,-c(1,14)], family=binomial)
(mys=summary(myglm))

## 
## Call:
## glm(formula = y2_Rating ~ ., family = binomial, data = train2[, 
##     -c(1, 14)])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8745  -0.9473  -0.6182   1.0720   2.6303  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       4.78080    0.89189   5.360 8.31e-08 ***
## LCR              -2.29476    0.47277  -4.854 1.21e-06 ***
## BedUtil           0.42870    0.38302   1.119 0.263025    
## StaffedBeds      -0.41138    0.09594  -4.288 1.80e-05 ***
## CC_MCC            0.68870    0.13897   4.956 7.20e-07 ***
## MCareDays        -0.30430    0.67889  -0.448 0.653983    
## MAidDays         -0.16593    0.05252  -3.159 0.001581 ** 
## CMI               2.45613    0.45758   5.368 7.98e-08 ***
## ALOS             -2.22987    0.34424  -6.478 9.32e-11 ***
## MCI1             -0.20502    0.10128  -2.024 0.042938 *  
## FPYes            -0.87494    0.13620  -6.424 1.33e-10 ***
## RuralYes          0.41148    0.18184   2.263 0.023644 *  
## Region2          -0.83321    0.27097  -3.075 0.002105 ** 
## Region3          -1.01973    0.28284  -3.605 0.000312 ***
## Region4          -1.31350    0.26076  -5.037 4.72e-07 ***
## Region5          -0.72288    0.26050  -2.775 0.005521 ** 
## Region6          -0.48251    0.28706  -1.681 0.092787 .  
## Region7          -0.95333    0.27038  -3.526 0.000422 ***
## Region8          -0.89022    0.30486  -2.920 0.003500 ** 
## Region9          -0.87352    0.26826  -3.256 0.001129 ** 
## RegionUnreported -0.85820    0.30425  -2.821 0.004791 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2819.2  on 2073  degrees of freedom
## Residual deviance: 2472.2  on 2053  degrees of freedom
## AIC: 2514.2
## 
## Number of Fisher Scoring iterations: 4

options(scipen=1)
temp=as.data.frame(as.matrix(mys$coefficients))
temp=round(temp,3)
temp$OR=round(exp(temp[,1]),3)

mypred=round(predict(myglm, test2, type='response'),0)

# ROC

perf_AUC=performance(prediction(mypred,test2$y2_Rating),"auc") #Calculate the AUC value
AUC=perf_AUC@y.values[[1]]
perf_ROC=performance(prediction(mypred,test2$y2_Rating),"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))

# Confusion Matrix and Accuracy Metrics
confusionMatrix(as.factor(as.numeric(test2$y2_Rating)-1), as.factor(mypred),positive='1')

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 221  73
##          1 104 120
##                                           
##                Accuracy : 0.6583          
##                  95% CI : (0.6157, 0.6991)
##     No Information Rate : 0.6274          
##     P-Value [Acc > NIR] : 0.07888         
##                                           
##                   Kappa : 0.2922          
##                                           
##  Mcnemar's Test P-Value : 0.02414         
##                                           
##             Sensitivity : 0.6218          
##             Specificity : 0.6800          
##          Pos Pred Value : 0.5357          
##          Neg Pred Value : 0.7517          
##              Prevalence : 0.3726          
##          Detection Rate : 0.2317          
##    Detection Prevalence : 0.4324          
##       Balanced Accuracy : 0.6509          
##                                           
##        'Positive' Class : 1               
##

Y3 Model-Logistic Regression

Y3 Marginal Effects

For Star, we also fit all independent and control variables to the training set and evaluated accuracy metrics on the test set. Only CC MCC and MCI were not in the model at the \(\alpha=.10\) level.

The forecast of the test set was 78% accurate with a sensitivity of 68% but a low positive predictive value of 38% indicating over classification of the 4-5 Star factor level.

train3=subset(train,!is.na(train$y3_Star),)
test3=subset(test,!is.na(test$y3_Star),)

myglm1=glm(formula = y3_Star ~ .,data = train3[,-c(1,13)],family=binomial, n_iter=1000, control = list(maxit = 10000))

summary(myglm1)

## 
## Call:
## glm(formula = y3_Star ~ ., family = binomial, data = train3[, 
##     -c(1, 13)], control = list(maxit = 10000), n_iter = 1000)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2660  -0.6928  -0.4625   0.2446   2.6147  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -0.59381    0.97152  -0.611 0.541054    
## LCR              -1.15299    0.54854  -2.102 0.035560 *  
## BedUtil          -0.91674    0.45006  -2.037 0.041655 *  
## StaffedBeds      -0.98380    0.10843  -9.073  < 2e-16 ***
## CC_MCC            0.59646    0.14335   4.161 3.17e-05 ***
## MCareDays        -1.27511    0.78624  -1.622 0.104850    
## MAidDays         -0.21779    0.05894  -3.695 0.000220 ***
## CMI               3.90830    0.48177   8.112 4.96e-16 ***
## ALOS             -0.46748    0.35400  -1.321 0.186645    
## MCI1             -0.03277    0.11989  -0.273 0.784596    
## FPYes            -0.96866    0.17459  -5.548 2.89e-08 ***
## RuralYes          0.95798    0.19803   4.838 1.31e-06 ***
## Region2          -1.23858    0.32204  -3.846 0.000120 ***
## Region3          -1.33632    0.33744  -3.960 7.49e-05 ***
## Region4          -1.13911    0.30152  -3.778 0.000158 ***
## Region5          -0.57444    0.28977  -1.982 0.047433 *  
## Region6          -0.59169    0.31597  -1.873 0.061125 .  
## Region7          -0.96252    0.31009  -3.104 0.001909 ** 
## Region8          -0.97887    0.34104  -2.870 0.004101 ** 
## Region9          -1.28368    0.31616  -4.060 4.90e-05 ***
## RegionUnreported -0.57968    0.34701  -1.671 0.094817 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2421.4  on 2099  degrees of freedom
## Residual deviance: 1880.2  on 2079  degrees of freedom
## AIC: 1922.2
## 
## Number of Fisher Scoring iterations: 5

mypred2=round(predict(myglm1, test3, type='response'),0)


# ROC

perf_AUC=performance(prediction(mypred2,test3$y3_Star),"auc") #Calculate the AUC value
AUC=perf_AUC@y.values[[1]]
perf_ROC=performance(prediction(mypred2,test3$y3_Star),"tpr","fpr") #plot the actual ROC curve
plot(perf_ROC, main="ROC plot")
text(0.5,0.5,paste("AUC = ",format(AUC, digits=5, scientific=FALSE)))

confusionMatrix( as.factor(as.numeric(test3$y3_Star)-1), as.factor(mypred2), positive='1')

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 356  26
##          1  88  54
##                                          
##                Accuracy : 0.7824         
##                  95% CI : (0.7446, 0.817)
##     No Information Rate : 0.8473         
##     P-Value [Acc > NIR] : 1              
##                                          
##                   Kappa : 0.3618         
##                                          
##  Mcnemar's Test P-Value : 1.109e-08      
##                                          
##             Sensitivity : 0.6750         
##             Specificity : 0.8018         
##          Pos Pred Value : 0.3803         
##          Neg Pred Value : 0.9319         
##              Prevalence : 0.1527         
##          Detection Rate : 0.1031         
##    Detection Prevalence : 0.2710         
##       Balanced Accuracy : 0.7384         
##                                          
##        'Positive' Class : 1              
##

All Coefficients, Marginal

When comparing directionality of the coefficients, only those that lacked statistical significance in one or more models showed difference in direction. These variables included bed utilization, CC MCC , and Regions 6, 9, and Unknown. The congruence of the directionality would indicate that most of the predictors are directionaly stable for these three quality metrics. Aside from Rural status and CMI, the predictors are negative, meaning that increases are associated with reduced quality. Rural status and CMI, however, are positive, indicating that increases result in increased quality.

tmp1=round(summary(mylm)$coefficients[,c(1,4)],3)
tmp2=round(summary(myglm)$coefficients[,c(1,4)],3)
tmp3=round(summary(myglm1)$coefficients[,c(1,4)],3)

mycoef=cbind(tmp1,tmp2, tmp3)
partial=mycoef[, c(1,3,5)]
tmp=apply(partial,1,function(x) sum((sign(x))))
tmp[abs(tmp)<3]="Mixed"
tmp[tmp==-3]="All -"
tmp[tmp==3]="All+"

mycoef=as.data.frame(mycoef)
mycoef$Congruence=as.vector(tmp)

colnames(mycoef)=c('TPS','p', 'Rating','p', 'Stars','p', 'Congruence')
mycoef=mycoef[2:nrow(mycoef),]
mycoef[order(mycoef$Congruence),]

##                     TPS     p Rating     p  Stars     p Congruence
## LCR              -5.225 0.016 -2.295 0.000 -1.153 0.036      All -
## StaffedBeds      -4.650 0.000 -0.411 0.000 -0.984 0.000      All -
## MCareDays        -7.501 0.016 -0.304 0.654 -1.275 0.105      All -
## MAidDays         -0.817 0.001 -0.166 0.002 -0.218 0.000      All -
## ALOS             -7.201 0.000 -2.230 0.000 -0.467 0.187      All -
## MCI1             -0.763 0.104 -0.205 0.043 -0.033 0.785      All -
## FPYes            -2.028 0.001 -0.875 0.000 -0.969 0.000      All -
## Region2          -3.144 0.013 -0.833 0.002 -1.239 0.000      All -
## Region3          -2.530 0.058 -1.020 0.000 -1.336 0.000      All -
## Region4          -4.494 0.000 -1.314 0.000 -1.139 0.000      All -
## Region5          -2.637 0.034 -0.723 0.006 -0.574 0.047      All -
## Region7          -3.983 0.002 -0.953 0.000 -0.963 0.002      All -
## Region8          -2.189 0.125 -0.890 0.003 -0.979 0.004      All -
## CC_MCC            3.088 0.000  0.689 0.000  0.596 0.000       All+
## CMI               8.175 0.000  2.456 0.000  3.908 0.000       All+
## RuralYes          2.684 0.002  0.411 0.024  0.958 0.000       All+
## BedUtil           1.077 0.541  0.429 0.263 -0.917 0.042      Mixed
## Region6           0.198 0.885 -0.483 0.093 -0.592 0.061      Mixed
## Region9           0.261 0.837 -0.874 0.001 -1.284 0.000      Mixed
## RegionUnreported  1.230 0.463 -0.858 0.005 -0.580 0.095      Mixed

LCR

Sith

2023-04-16