Data Background: The dataset contains information about two sets of hospitals: one that participated in a policy intervention designed to improve patient safety (Treatment = TRUE), and one that did not. The goal was to reduce central line-associated bloodstream infections (CLABSIs), with lower rates being more desirable. The policy was implemented at the beginning of 2016. The pre-intervention results are in the SIR15 column, and the post-intervention results are in the SIR16 column.
Task: Complete an initial exploration of the data to estimate the effect of the intervention on CLABSI rates.
Hypotheses: The null hypothesis is that the change (increase or decrease) in CLABSI rates between the two years is the same for the hospitals with the intervention and those without. The alternative hypothesis is that the hospitals with the intervention had a greater decrease (or a smaller increase) in CLABSI rates from 2015 to 2016.
Outcome Variable: The outcome variable is “diffs”. Diffs is the difference between the 2016 CLABSI rate and the 2015 CLABSI rate for each hospital. This difference measures the percent change in CLABSI rate from 2015 to 2016.
Methods: I used a one-sided weighted two-sample t-test to investigate whether hospitals with the policy implementation had a greater decrease in CLABSI rates from 2015 to 2016 than hospitals that did not receive the policy implementation. The assumptions of this test (normality and equal variance) were satisfied. I also investigated the correlations between the various predictors and the difference in CLABSI rates from 2015 to 2016. Lastly, I built regression models to see whether treatment is a relevant predictor of the 2015-to-2016 differences. The regression assumptions were satisfied.
Findings: The one-sided t-test was inconclusive (p = .98), meaning there was no evidence that the hospitals that received the policy experienced a greater decrease in CLABSI rates. When performing the two-sided test to see whether a difference between the two groups exists at all, the results were in fact significant at the 5% level (p = .03). There does appear to be a difference between hospitals that received the policy intervention and those that did not with regard to the change in infection rates. However, it appears that the hospitals with no policy intervention had, on average, a greater decrease in CLABSI rates (M_intervention = -.11, M_no-intervention = -.18). The policy group experienced on average an 11% decrease in CLABSI rates, whereas the no-policy group experienced about an 18% decrease, from 2015 to 2016. This is the opposite of the outcome we expected to see. However, these results are not sufficient to say that the difference is caused by the policy intervention. It is also interesting to note that while the non-intervention group had a greater decrease, it also started off with much higher CLABSI rates in 2015.
The correlation analysis showed that receiving the policy was not highly correlated with the outcome variable. The regression on the outcome variable performed very poorly (adjusted R^2 = .01), showing that the variables did a very poor job of predicting the difference in CLABSI rates. The regression on 2016 CLABSI rates (excluding 2015 CLABSI rates) also performed very poorly (adjusted R^2 = .06), but policy was a significant variable. When including 2015 CLABSI rates, the regression performed much better, explaining approximately 18% of the variation in the response (adjusted R^2 = .18). Again, even when including 2015 CLABSI rates, the policy variable was significant. Moreover, the coefficient for having the policy is negative; therefore, having the policy intervention corresponds to a lower CLABSI rate in 2016 compared to hospitals without the intervention, after accounting for the other variables included in the model. However, the effect of the policy is very small.
Conclusion: I conclude that there is no evidence that the policy is affecting CLABSI rates. The t-test suggested that having the policy intervention was actually associated with a less drastic decrease in CLABSI rates from 2015 to 2016. However, the regression on the differences showed that the policy was not significant when accounting for other variables, so the difference between treatment groups seen in the t-test may be due to a confounding variable. The only variable that effectively predicts 2016 CLABSI rates is 2015 CLABSI rates, and from the bar charts we saw that the treatment was not evenly applied to hospitals with respect to their 2015 rates: the non-intervention group started off with higher CLABSI rates than the intervention group. The effort of decreasing CLABSI rates is likely one of diminishing returns, meaning that once the rates have reached a certain low, it becomes increasingly difficult to lower them further. This could explain why the non-intervention group saw a more dramatic decrease than the policy intervention group.
Limitations & Future Research: I would have liked to have had patient-level data to build a hierarchical model that includes both patient-level and hospital-level information. As the data stand, we do not know whether the aggregated, hospital-level data are obscuring an important finding at the patient level; we could be seeing a case of Simpson’s paradox. I would also like to know how the hospitals were chosen for treatment. Was this decided randomly? It would have been better if the treatment had been applied more evenly across hospitals with higher and lower CLABSI rates. Lastly, I would like to repeat my analysis after removing the negative outliers originally identified in the histogram of the differences.
rm(list=ls()) # Clear objects from Memory
cat("\014") # Clear Console
setwd("~/Dropbox/mathematica")
data <- read.csv("HOSPITAL_SAFETY.csv", stringsAsFactors = FALSE)
#lapply(data, str)
library(ggplot2) # plotting
library(tidyr)   # data reshaping
library(weights) # weighted t-tests (wtd.t.test)
library(VIM)     # missing-data visualization (aggr)
Six variables in the dataset have missing values; overall, about 5% of the values in the dataset are missing. Variables relating to our outcome variable cannot have missing values, so those rows need to be eliminated. Once those rows were eliminated, the remaining attributes with missing values accounted for less than 1% of the data, so the decision was made to remove the remaining rows with missing data rather than perform multiple imputation (a sketch of the imputation alternative appears after the missing-data code below).
# recode region to be character
data$REGION <- as.character(data$REGION)
# missing data
misvars <-colnames(data)[ apply(data, 2, anyNA) ]
length(misvars)
## [1] 6
misvars
## [1] "USES_EHRs" "DENOM16" "SIR16" "DENOM15" "SIR15" "TREATMENT"
ratemis <- function(y){
  mis <- sum(is.na(y))
  obs <- nrow(y)*ncol(y)
  return(mis/obs)
}
# 5% of total data is missing
ratemis(data)
## [1] 0.04728339
#######
ratemiscol <- function(x){
  mis <- sum(is.na(x))
  obs <- length(x)
  return(mis/obs)
}
# missingness for each variable
apply(data, 2, ratemiscol)
## HOSP.ID NAME COUNTY
## 0.00000000 0.00000000 0.00000000
## REGION URBAN_RURAL WAGE_INDEX
## 0.00000000 0.00000000 0.00000000
## RESIDENT_TO_BEDS BEDS DSHPCT
## 0.00000000 0.00000000 0.00000000
## COST_TO_CHARGE ANNUAL_VOLUME CASEMIX
## 0.00000000 0.00000000 0.00000000
## TYPE OWNERSHIP EMERGENCY_SERVICES
## 0.00000000 0.00000000 0.00000000
## USES_EHRs DENOM16 SIR16
## 0.01716212 0.06834202 0.38553478
## DENOM15 SIR15 TREATMENT
## 0.06895495 0.37542139 0.07753601
# combinations of missingness
aggr(data, numbers = TRUE)
## Warning in plot.aggr(res, ...): not enough horizontal space to display
## frequencies
# can't have missing data in the outcome variables
data_clean <- data[rowSums(is.na(data[c("SIR15", "SIR16", "DENOM15", "DENOM16")])) == 0,]
#apply(data_clean, 2, ratemiscol)
data_clean <- data_clean[rowSums(is.na(data_clean[c("TREATMENT", "USES_EHRs")])) == 0,]
#ratemis(data_clean)
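For reference, had the remaining missingness been substantial enough to warrant it, multiple imputation would have been the alternative to dropping rows. A minimal sketch of what that could look like, using the mice package (the package choice and settings are assumptions for illustration; this was not part of the analysis):
# library(mice)
# imp <- mice(data, m = 5, seed = 123)  # build 5 imputed copies of the dataset
# data_imputed <- complete(imp, 1)      # extract the first completed dataset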
The outcome variable is “diffs”. Diffs is the difference between the 2016 CLABSI rate and the 2015 CLABSI rate for each hospital. This difference measures the percent change in CLABSI rate from 2015 to 2016. I want to investigate whether the change in CLABSI rates is lower (i.e., a greater decrease) for the hospitals that received the policy intervention compared to those hospitals that did not.
In order to perform a t-test, we must first check the assumptions. Namely, is the outcome variable (differences) approximately normally distributed? And, is there approximately equal variance between the two groups (treated hospitals vs. untreated hospitals)? Does the outcome variable have any significant outliers?
There does not appear to be a problem with normality; the data form a nice bell-shaped curve. Based on the F-test, the assumption of equal variance does not appear to be violated (p = .2295). The F-test takes equal variances in the two groups as its null hypothesis, which is only rejected if there is evidence to the contrary. There do appear to be some negative outliers; however, I am going to ignore those for my initial analysis.
Since there does not appear to be a problem with normality or equal variance, the assumptions for the t-test are satisfied.
# outcome variable
data_clean$diffs <- data_clean$SIR16 - data_clean$SIR15
# looks like the rates are normally distributed for both the trt and control groups
#par(mfrow=c(1,3))
#qplot(data_clean[data_clean$TREATMENT == TRUE, "diffs"])
#qplot(data_clean[data_clean$TREATMENT == FALSE, "diffs"])
qplot(data_clean[, "diffs"])
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
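As a quick way to quantify the negative outliers noted above (a hypothetical follow-up, not run as part of the original analysis), the conventional 1.5 * IQR rule could be used to count how many differences fall below the lower fence:
# lower_fence <- quantile(data_clean$diffs, 0.25) - 1.5 * IQR(data_clean$diffs)
# sum(data_clean$diffs < lower_fence)  # number of low outliers being ignored for now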
# checking for equal variance
#data_clean$group <- ifelse(data_clean$TREATMENT == TRUE, 1, 0)
# based on a p-value of .2295, there is no evidence to reject the null. We cannot say
# the true ratio of variances is not equal to 1. In other words, we continue assuming
# the variances are equal.
var.test(data_clean$diffs ~ data_clean$TREATMENT, alternative = "two.sided")
##
## F test to compare two variances
##
## data: data_clean$diffs by data_clean$TREATMENT
## F = 1.0816, num df = 754, denom df = 1198, p-value = 0.2295
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.9517045 1.2316647
## sample estimates:
## ratio of variances
## 1.081636
The weighted t-test accounts for the average number of patients who received central lines at each hospital across the two years. This is important because our outcome variable is a rate and therefore does not reflect differences in volume. A hospital that achieves a 20% decrease in its CLABSI rate but saw only 100 patients is less impressive than a hospital that achieves a 20% decrease while seeing 1,000 patients.
The t-test tested whether the treatment group had a greater decrease in CLABSI rates than the non-treatment group. The test was inconclusive (p = .98), meaning there was no evidence that the treatment group experienced a greater decrease. When performing the two-sided test, to see whether a difference between the two groups exists at all, the results were in fact significant at the 5% level (p = .03). There does appear to be a difference between hospitals that received the policy intervention and those that did not with regard to the change in infection rates. However, it appears that the hospitals with no policy intervention had, on average, a greater decrease in CLABSI rates (M_intervention = -.11, M_no-intervention = -.18). The treatment group experienced on average an 11% decrease in CLABSI rates, whereas the non-treatment group experienced about an 18% decrease, from 2015 to 2016. This is the opposite of the outcome we expected. However, these results are not sufficient to say that the difference is caused by the policy intervention. It is also interesting to note that while the non-intervention group had a greater decrease, it also started off much higher in 2015. The effort of decreasing CLABSI rates is likely one of diminishing returns, meaning that once the rates have reached a certain low, it becomes increasingly difficult to lower them further. This could explain why the non-intervention group has a more dramatic decrease. It is also suspect that the non-intervention group started off with higher CLABSI rates; it would have been better if the treatment had been applied evenly across hospitals with higher and lower CLABSI rates.
data_clean$weight <- (data_clean$DENOM16 + data_clean$DENOM15)/2
# alternative = "less" tests whether the mean change for the treated group (x) is
# lower than for the untreated group (y), i.e., a greater decrease
wtd.t.test(data_clean[data_clean$TREATMENT == TRUE, "diffs"],
           data_clean[data_clean$TREATMENT == FALSE, "diffs"],
           weight  = data_clean[data_clean$TREATMENT == TRUE, "weight"],
           weighty = data_clean[data_clean$TREATMENT == FALSE, "weight"],
           alternative = "less")
## Warning in wtd.t.test(data_clean[data_clean$TREATMENT == TRUE, "diffs"], :
## Treating data for x and y separately because they are of different lengths
## $test
## [1] "Two Sample Weighted T-Test (Welch)"
##
## $coefficients
## t.value df p.value
## 2.1651947 1422.7316502 0.9847307
##
## $additional
## Difference Mean.x Mean.y Std. Err
## 0.06736212 -0.11166162 -0.17902374 0.03111134
wtd.t.test(data_clean[data_clean$TREATMENT == TRUE, "diffs"],
           data_clean[data_clean$TREATMENT == FALSE, "diffs"],
           weight  = data_clean[data_clean$TREATMENT == TRUE, "weight"],
           weighty = data_clean[data_clean$TREATMENT == FALSE, "weight"],
           alternative = "two.sided")
## Warning in wtd.t.test(data_clean[data_clean$TREATMENT == TRUE, "diffs"], :
## Treating data for x and y separately because they are of different lengths
## $test
## [1] "Two Sample Weighted T-Test (Welch)"
##
## $coefficients
## t.value df p.value
## 2.165195e+00 1.422732e+03 3.053855e-02
##
## $additional
## Difference Mean.x Mean.y Std. Err
## 0.06736212 -0.11166162 -0.17902374 0.03111134
# Visualization
data_clean$group <- ifelse(data_clean$TREATMENT == TRUE, "Policy Intervention", "No Policy")
ggplot(data_clean, aes(x = factor(group), y = diffs)) +
  stat_summary(fun.y = "mean", geom = "bar", fill = c("red", "blue")) +  # fill colours are set directly, so no fill scale is needed
  xlab("Hospital Category") +
  ylab("Average Percent Change from 2015") +
  ggtitle("Average Change in CLABSI Rates from 2015 to 2016 by Treatment Group")
par(mfrow=c(1,2))  # note: par() has no effect on ggplot output; the two plots below print separately
ggplot(data_clean, aes(x = factor(group), y = SIR15)) +
  stat_summary(fun.y = "mean", geom = "bar", fill = c("red", "blue")) +
  xlab("Hospital Category") +
  ylab("Average CLABSI rate") +
  ggtitle("Average CLABSI Rates in 2015")
ggplot(data_clean, aes(x = factor(group), y = SIR16)) +
  stat_summary(fun.y = "mean", geom = "bar", fill = c("red", "blue")) +
  xlab("Hospital Category") +
  ylab("Average CLABSI rate") +
  ggtitle("Average CLABSI Rates in 2016")
par(mfrow = c(1,1))
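The bar charts suggest the two groups differ at baseline. A simple way to check this formally (a minimal sketch, not part of the original analysis) is a two-sided t-test of the 2015 rates by treatment status; a significant result would support the concern that the intervention was not evenly applied across hospitals with higher and lower baseline rates:
# t.test(SIR15 ~ TREATMENT, data = data_clean, alternative = "two.sided")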
Aside from SIR15 and SIR16, which are the components of diffs itself, the correlations with the outcome variable are all very small in absolute value (less than .1). It does not appear that any of the other variables in the dataset are correlated with the differences in CLABSI rates from 2015 to 2016.
zzz <- data_clean[sapply(data_clean, class) %in% c("numeric", "integer")]
correlations <- cor(x=zzz$diffs, y=zzz, use="pairwise.complete.obs") # diffs is the target variable
correlations
## HOSP.ID COUNTY WAGE_INDEX RESIDENT_TO_BEDS BEDS
## [1,] 0.02156722 0.01541303 0.04028394 -0.0434569 -0.008580192
## DSHPCT COST_TO_CHARGE ANNUAL_VOLUME CASEMIX DENOM16
## [1,] -0.05314729 -0.0603475 0.003859495 -0.008651786 -0.002556181
## SIR16 DENOM15 SIR15 diffs weight
## [1,] 0.5132796 -0.0001217933 -0.6337421 1 -0.001334298
names(correlations) <- names(zzz)
correlationsNoNa <- correlations[!is.na(correlations)]
length(correlationsNoNa)
## [1] 15
StrongCorrelations <- correlationsNoNa[abs(correlationsNoNa) > 0.01]
StrongCorrelations <- sort(StrongCorrelations)
par(mar=c(5.1,10.1,2.1,1.1))
barplot(StrongCorrelations, main=paste("Strong Correlations"), horiz=T, cex.names=0.8, col="cyan", las=2, xlab="Correlation")
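Note that TREATMENT is stored as a logical, so it is not picked up by the numeric/integer filter used to build zzz and therefore does not appear in the barplot above. Its (presumably small) point-biserial correlation with the outcome could be checked directly with a sketch like the following (an illustrative addition, not run here):
# cor(as.numeric(data_clean$TREATMENT), data_clean$diffs, use = "pairwise.complete.obs")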
For the regression, I had to remove TYPE and USES_EHRs, as both of these variables contained only one distinct value. I built three regression models: (1) modeling the differences, (2) modeling CLABSI rates in 2016, and (3) modeling CLABSI rates in 2016 with CLABSI rates in 2015 included as a predictor.
#lapply(data_clean, str)
chars <- data_clean[, sapply(data_clean, class) == 'character']
#lapply(chars, unique)
logicals <- data_clean[, sapply(data_clean, class) == 'logical'] # renamed to avoid masking base::log
#lapply(logicals, unique)
# modeling differences
data_model <- subset(data_clean, select = -c(HOSP.ID, NAME, DENOM15, DENOM16, SIR15, SIR16, TYPE, USES_EHRs, group))
hosp_mod <- lm(data_model$diffs ~ . - weight, data = data_model, weights = weight)
summary(hosp_mod)
##
## Call:
## lm(formula = data_model$diffs ~ . - weight, data = data_model,
## weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -387.19 -34.64 3.58 35.47 407.48
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -1.389e+00 4.833e-01
## COUNTY 3.765e-06 1.146e-06
## REGION2 -4.572e-02 8.911e-02
## REGION3 1.160e-01 9.548e-02
## REGION4 8.596e-02 9.368e-02
## REGION40 5.160e-01 4.431e-01
## REGION5 -3.322e-02 1.130e-01
## REGION6 4.168e-02 1.044e-01
## REGION7 3.188e-02 1.045e-01
## REGION8 1.062e-02 1.060e-01
## REGION9 3.871e-02 9.294e-02
## URBAN_RURALOURBAN 4.789e-02 3.499e-02
## URBAN_RURALRURAL 8.047e-02 8.530e-02
## WAGE_INDEX 3.450e-01 1.476e-01
## RESIDENT_TO_BEDS 2.490e-02 8.775e-02
## BEDS 9.289e-05 1.162e-04
## DSHPCT -2.404e-01 1.241e-01
## COST_TO_CHARGE -3.246e-01 1.498e-01
## ANNUAL_VOLUME -6.533e-06 7.168e-06
## CASEMIX 4.011e-02 7.990e-02
## OWNERSHIPGovernment - Hospital District or Authority 5.618e-01 3.703e-01
## OWNERSHIPGovernment - Local 7.259e-01 3.753e-01
## OWNERSHIPGovernment - State 5.619e-01 3.756e-01
## OWNERSHIPPhysician 8.610e-01 5.148e-01
## OWNERSHIPProprietary 7.298e-01 3.684e-01
## OWNERSHIPVoluntary non-profit - Church 6.779e-01 3.688e-01
## OWNERSHIPVoluntary non-profit - Other 6.043e-01 3.685e-01
## OWNERSHIPVoluntary non-profit - Private 6.887e-01 3.671e-01
## EMERGENCY_SERVICESTRUE 1.383e-01 1.402e-01
## TREATMENTTRUE 3.824e-02 3.339e-02
## t value Pr(>|t|)
## (Intercept) -2.874 0.00410 **
## COUNTY 3.286 0.00103 **
## REGION2 -0.513 0.60797
## REGION3 1.215 0.22461
## REGION4 0.918 0.35898
## REGION40 1.165 0.24432
## REGION5 -0.294 0.76873
## REGION6 0.399 0.68988
## REGION7 0.305 0.76036
## REGION8 0.100 0.92025
## REGION9 0.417 0.67706
## URBAN_RURALOURBAN 1.369 0.17123
## URBAN_RURALRURAL 0.943 0.34560
## WAGE_INDEX 2.338 0.01948 *
## RESIDENT_TO_BEDS 0.284 0.77659
## BEDS 0.799 0.42432
## DSHPCT -1.937 0.05289 .
## COST_TO_CHARGE -2.167 0.03036 *
## ANNUAL_VOLUME -0.911 0.36216
## CASEMIX 0.502 0.61569
## OWNERSHIPGovernment - Hospital District or Authority 1.517 0.12946
## OWNERSHIPGovernment - Local 1.934 0.05325 .
## OWNERSHIPGovernment - State 1.496 0.13487
## OWNERSHIPPhysician 1.672 0.09460 .
## OWNERSHIPProprietary 1.981 0.04769 *
## OWNERSHIPVoluntary non-profit - Church 1.838 0.06617 .
## OWNERSHIPVoluntary non-profit - Other 1.640 0.10121
## OWNERSHIPVoluntary non-profit - Private 1.876 0.06081 .
## EMERGENCY_SERVICESTRUE 0.986 0.32404
## TREATMENTTRUE 1.145 0.25231
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.98 on 1924 degrees of freedom
## Multiple R-squared: 0.02748, Adjusted R-squared: 0.01283
## F-statistic: 1.875 on 29 and 1924 DF, p-value: 0.003241
par(mfrow = c(2,2))
plot(hosp_mod)
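Beyond the diagnostic plots, the constant-variance assumption noted in the Methods could also be checked numerically. A minimal sketch using the lmtest package (an assumed addition, not part of this analysis):
# library(lmtest)
# bptest(hosp_mod)  # Breusch-Pagan test for heteroscedasticity in the fitted model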
# modeling SIR16
data_model2 <- subset(data_clean, select = -c(HOSP.ID, NAME, DENOM15, DENOM16, SIR15, diffs, TYPE, USES_EHRs, group))
hosp_mod2 <- lm(data_model2$SIR16 ~ . - weight, data = data_model2, weights = weight)
summary(hosp_mod2)
##
## Call:
## lm(formula = data_model2$SIR16 ~ . - weight, data = data_model2,
## weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -143.31 -33.99 -7.52 26.55 408.92
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -3.523e-01 4.070e-01
## COUNTY -2.586e-06 9.650e-07
## REGION2 1.951e-01 7.506e-02
## REGION3 2.644e-01 8.042e-02
## REGION4 1.548e-01 7.891e-02
## REGION40 4.837e-01 3.732e-01
## REGION5 2.180e-01 9.514e-02
## REGION6 1.268e-01 8.796e-02
## REGION7 2.799e-01 8.802e-02
## REGION8 7.111e-03 8.932e-02
## REGION9 1.246e-02 7.828e-02
## URBAN_RURALOURBAN 2.482e-04 2.947e-02
## URBAN_RURALRURAL -5.645e-02 7.185e-02
## WAGE_INDEX 2.245e-01 1.243e-01
## RESIDENT_TO_BEDS 1.068e-01 7.391e-02
## BEDS -1.634e-04 9.790e-05
## DSHPCT 2.949e-01 1.045e-01
## COST_TO_CHARGE -5.143e-02 1.262e-01
## ANNUAL_VOLUME 4.514e-06 6.038e-06
## CASEMIX 2.011e-01 6.730e-02
## OWNERSHIPGovernment - Hospital District or Authority 1.895e-01 3.119e-01
## OWNERSHIPGovernment - Local 2.662e-01 3.161e-01
## OWNERSHIPGovernment - State 2.179e-01 3.164e-01
## OWNERSHIPPhysician 2.691e-01 4.337e-01
## OWNERSHIPProprietary 4.029e-01 3.103e-01
## OWNERSHIPVoluntary non-profit - Church 2.118e-01 3.106e-01
## OWNERSHIPVoluntary non-profit - Other 2.225e-01 3.104e-01
## OWNERSHIPVoluntary non-profit - Private 3.121e-01 3.092e-01
## EMERGENCY_SERVICESTRUE 2.546e-01 1.181e-01
## TREATMENTTRUE -1.049e-01 2.813e-02
## t value Pr(>|t|)
## (Intercept) -0.865 0.386876
## COUNTY -2.679 0.007439 **
## REGION2 2.599 0.009408 **
## REGION3 3.287 0.001031 **
## REGION4 1.962 0.049880 *
## REGION40 1.296 0.195120
## REGION5 2.292 0.022039 *
## REGION6 1.441 0.149736
## REGION7 3.179 0.001499 **
## REGION8 0.080 0.936554
## REGION9 0.159 0.873545
## URBAN_RURALOURBAN 0.008 0.993281
## URBAN_RURALRURAL -0.786 0.432166
## WAGE_INDEX 1.807 0.070989 .
## RESIDENT_TO_BEDS 1.445 0.148615
## BEDS -1.669 0.095209 .
## DSHPCT 2.822 0.004826 **
## COST_TO_CHARGE -0.408 0.683625
## ANNUAL_VOLUME 0.748 0.454796
## CASEMIX 2.988 0.002844 **
## OWNERSHIPGovernment - Hospital District or Authority 0.608 0.543509
## OWNERSHIPGovernment - Local 0.842 0.399898
## OWNERSHIPGovernment - State 0.689 0.491142
## OWNERSHIPPhysician 0.621 0.534995
## OWNERSHIPProprietary 1.298 0.194289
## OWNERSHIPVoluntary non-profit - Church 0.682 0.495371
## OWNERSHIPVoluntary non-profit - Other 0.717 0.473677
## OWNERSHIPVoluntary non-profit - Private 1.009 0.312916
## EMERGENCY_SERVICESTRUE 2.155 0.031258 *
## TREATMENTTRUE -3.731 0.000196 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 53.05 on 1924 degrees of freedom
## Multiple R-squared: 0.06922, Adjusted R-squared: 0.05519
## F-statistic: 4.934 on 29 and 1924 DF, p-value: 2.618e-16
par(mfrow = c(2,2))
plot(hosp_mod2)
# modeling SIR16, including SIR15
data_model3 <- subset(data_clean, select = -c(HOSP.ID, NAME, DENOM15, DENOM16, diffs, TYPE, USES_EHRs, group))
hosp_mod3 <- lm(data_model3$SIR16 ~ . - weight, data = data_model3, weights = weight)
summary(hosp_mod3)
##
## Call:
## lm(formula = data_model3$SIR16 ~ . - weight, data = data_model3,
## weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -142.31 -30.82 -6.94 24.67 334.78
##
## Coefficients:
## Estimate Std. Error
## (Intercept) -6.946e-01 3.799e-01
## COUNTY -4.880e-07 9.078e-07
## REGION2 1.156e-01 7.012e-02
## REGION3 2.153e-01 7.502e-02
## REGION4 1.321e-01 7.357e-02
## REGION40 4.944e-01 3.479e-01
## REGION5 1.350e-01 8.881e-02
## REGION6 9.865e-02 8.201e-02
## REGION7 1.979e-01 8.218e-02
## REGION8 8.270e-03 8.326e-02
## REGION9 2.113e-02 7.297e-02
## URBAN_RURALOURBAN 1.598e-02 2.748e-02
## URBAN_RURALRURAL -1.122e-02 6.702e-02
## WAGE_INDEX 2.643e-01 1.159e-01
## RESIDENT_TO_BEDS 7.975e-02 6.891e-02
## BEDS -7.877e-05 9.139e-05
## DSHPCT 1.181e-01 9.798e-02
## COST_TO_CHARGE -1.417e-01 1.177e-01
## ANNUAL_VOLUME 8.649e-07 5.632e-06
## CASEMIX 1.479e-01 6.281e-02
## OWNERSHIPGovernment - Hospital District or Authority 3.125e-01 2.908e-01
## OWNERSHIPGovernment - Local 4.180e-01 2.948e-01
## OWNERSHIPGovernment - State 3.315e-01 2.950e-01
## OWNERSHIPPhysician 4.646e-01 4.044e-01
## OWNERSHIPProprietary 5.109e-01 2.893e-01
## OWNERSHIPVoluntary non-profit - Church 3.658e-01 2.897e-01
## OWNERSHIPVoluntary non-profit - Other 3.486e-01 2.894e-01
## OWNERSHIPVoluntary non-profit - Private 4.365e-01 2.883e-01
## EMERGENCY_SERVICESTRUE 2.162e-01 1.101e-01
## SIR15 3.303e-01 1.935e-02
## TREATMENTTRUE -5.765e-02 2.636e-02
## t value Pr(>|t|)
## (Intercept) -1.828 0.06768 .
## COUNTY -0.538 0.59093
## REGION2 1.648 0.09948 .
## REGION3 2.871 0.00414 **
## REGION4 1.796 0.07273 .
## REGION40 1.421 0.15544
## REGION5 1.520 0.12856
## REGION6 1.203 0.22912
## REGION7 2.409 0.01611 *
## REGION8 0.099 0.92089
## REGION9 0.290 0.77215
## URBAN_RURALOURBAN 0.582 0.56093
## URBAN_RURALRURAL -0.167 0.86701
## WAGE_INDEX 2.281 0.02265 *
## RESIDENT_TO_BEDS 1.157 0.24729
## BEDS -0.862 0.38882
## DSHPCT 1.206 0.22809
## COST_TO_CHARGE -1.203 0.22902
## ANNUAL_VOLUME 0.154 0.87796
## CASEMIX 2.355 0.01862 *
## OWNERSHIPGovernment - Hospital District or Authority 1.074 0.28278
## OWNERSHIPGovernment - Local 1.418 0.15636
## OWNERSHIPGovernment - State 1.124 0.26125
## OWNERSHIPPhysician 1.149 0.25072
## OWNERSHIPProprietary 1.766 0.07755 .
## OWNERSHIPVoluntary non-profit - Church 1.263 0.20684
## OWNERSHIPVoluntary non-profit - Other 1.204 0.22860
## OWNERSHIPVoluntary non-profit - Private 1.514 0.13020
## EMERGENCY_SERVICESTRUE 1.963 0.04977 *
## SIR15 17.073 < 2e-16 ***
## TREATMENTTRUE -2.187 0.02886 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 49.45 on 1923 degrees of freedom
## Multiple R-squared: 0.1917, Adjusted R-squared: 0.1791
## F-statistic: 15.21 on 30 and 1923 DF, p-value: < 2.2e-16
par(mfrow = c(2,2))
plot(hosp_mod3)
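As noted in the Limitations, a natural follow-up is to repeat the weighted t-test after removing the negative outliers seen in the histogram of the differences. A sketch of that sensitivity analysis, using the 1.5 * IQR lower fence as an assumed cutoff (not run as part of this report):
# keep <- data_clean$diffs >= quantile(data_clean$diffs, 0.25) - 1.5 * IQR(data_clean$diffs)
# trimmed <- data_clean[keep, ]
# wtd.t.test(trimmed[trimmed$TREATMENT == TRUE, "diffs"],
#            trimmed[trimmed$TREATMENT == FALSE, "diffs"],
#            weight  = trimmed[trimmed$TREATMENT == TRUE, "weight"],
#            weighty = trimmed[trimmed$TREATMENT == FALSE, "weight"],
#            alternative = "less")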