Homework6 - Applied Regression Analysis

In a study at the Ohio State University Medical Center, doctors investigated the association between ICU acquired paresis (ICUAP) (or muscular weakness) and subject’s admission SOFA score dichotomized at 11 (sofa11) (0 for less than 11 and 1 for 11 and greater).

Specifically the investigators wanted to determine if handgrip strength (max_grip) could be used as a surrogate for ICUAP. The investigators believed that the number of days the patient was mechanically ventilated (MV_days) could affect the relationship between weakness and dichotomized SOFA.

data = read.csv("Acquired_weakness.csv", header = T, sep = ",")
attach(data)

Exercise One

Using a two-sample t-test, determine whether the two SOFA Score groups differ significantly with respect to ICUAP (as measured by max_grip).

Solving

library(ggplot2)
library(gridExtra)
library(ggthemes)
data1 = transform(data, 
                 sofa11 = factor(sofa11, 
                                 levels = 0:1, 
                                 c("SOFA < 11", "SOFA >= 11")))
p1 = ggplot(data1, aes(factor(sofa11), max_grip)) + 
        geom_boxplot(outlier.colour = "red", outlier.shape = 1) + 
        xlab(" ") +
        ylab("Handgrip Strength") +
        theme_stata() + 
        scale_color_stata()
p2 = ggplot(data1, aes(factor(sofa11), MVdays)) + 
        geom_boxplot(outlier.colour = "red", outlier.shape = 1) + 
        xlab(" ") +
        ylab("Days in Mechanically Ventilated") +
        theme_stata() + 
        scale_color_stata()
p3 = ggplot(data1, aes(MVdays, max_grip)) + 
        geom_point(aes(color = sofa11)) + 
        ylab("Handgrip Strength") +
        xlab("Days in Mechanically Ventilated") +
        theme_stata() + 
        scale_color_stata()
marrangeGrob(list(p1, p2, p3), nrow = 2, ncol = 2, top = " ")

If we take a confidence level of 5%, we have a significative evidence for reject the null hypothesis (\(H_0:\) means equals), because the probability of error in rejecting the true null hypothesis would be less than 5% previously admitted.

t.test(max_grip ~ factor(sofa11), var.equal = T)

## 
##  Two Sample t-test
## 
## data:  max_grip by factor(sofa11)
## t = 2.205, df = 135, p-value = 0.02915
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   0.924972 17.021695
## sample estimates:
## mean in group 0 mean in group 1 
##       15.140000        6.166667

Exercise Two

Perform the same analysis from Exercise One (i.e., 2 sample t-test) using a regression analysis.

Solving

Below, p-value for test the hypothesis that the true slope \(\beta_1\) is \(0\).

m1 = lm(max_grip ~ sofa11)
summary(m1)$coefficients[2,4]

## [1] 0.02914824

This is the same test given in exercise one.

Exercise Three

Using a single model determine the regression of max_grip on MVdays for each of the two SOFA score groups.

Solving

ggplot(data1, aes(MVdays, max_grip)) + 
        geom_point() + 
        facet_grid(. ~ sofa11) +
        ylab("Handgrip Strength") +
        xlab("Days in Mechanically Ventilated") +
        geom_smooth(method = "lm") + 
        theme_stata() + 
        scale_color_stata()

m2 = lm(max_grip ~ sofa11 + MVdays)
summary(m2)$coefficients

##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 20.3502597  1.8304188 11.117816 9.454278e-21
## sofa11      -4.3156769  4.0965192 -1.053499 2.940085e-01
## MVdays      -0.5980555  0.1632222 -3.664058 3.565219e-04

m3 = lm(max_grip ~ sofa11 + MVdays + MVdays*sofa11)
summary(m3)$coefficients

##                  Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)    21.1842062  1.9914508 10.637575 1.677513e-19
## sofa11        -10.3080939  6.9787903 -1.477060 1.420235e-01
## MVdays         -0.6937794  0.1864576 -3.720843 2.919292e-04
## sofa11:MVdays   0.4083585  0.3851152  1.060354 2.909054e-01

m4 = lm(MVdays ~ sofa11)
summary(m4)$coefficients

##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)    8.712  0.6077344 14.335209 6.657980e-29
## sofa11         7.788  2.0534471  3.792647 2.240056e-04

anova(m2,m3)

## Analysis of Variance Table
## 
## Model 1: max_grip ~ sofa11 + MVdays
## Model 2: max_grip ~ sofa11 + MVdays + MVdays * sofa11
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1    134 22250                           
## 2    133 22064  1    186.52 1.1244 0.2909

library(car)
1/vif(m2) # higher than 0.01

##    sofa11    MVdays 
## 0.9037102 0.9037102

The results above show us that m2 fit better. So,

General:

\[\hat y = 20.350 - 4.316\times sofa11 - 0.598 \times MVdays.\]

SOFA < 11:

\[\hat y = 20.350 - 0.598 \times MVdays.\]

SOFA >= 11:

\[\hat y = (20.350 - 4.316) - 0.598\times MVdays.\]

Exercise Four

Test whether the two SOFA cohorts differ significantly, controlling for mechanically ventilated days.

Solving

Below, p-value for test the hypothesis that the true slope \(\beta_1\) is \(0\):

summary(m2)$coefficients[2,4]

## [1] 0.2940085

If we take a confidence level of 5%, we haven’t a significative evidence for reject the null hypothesis (\(H_0: \beta_1=0\)), because the probability of error in rejecting the true null hypothesis would be higher than 5% previously admitted.

Exercise Five

Determine and Compare

Test whether the two SOFA cohorts have the same slope. Do they have the same intercept? Are they coincident? Use the homework forum to share your findings.

Solving

Below, p-value for test the hypothesis that the true slope \(\beta_2\) is \(0\):

summary(m2)$coefficients[3,4]

## [1] 0.0003565219

If we take a confidence level of 5%, we have a significative evidence for reject the null hypothesis (\(H_0: \beta_2=0\) means equals), because the probability of error in rejecting the true null hypothesis would be less than 5% previously admitted.

So,

if SOFA < 11 then \(\hat y = 20.350 - 0.598 \times MVdays\);
if SOFA >= 11 then \(\hat y = (20.350 - 4.316) - 0.598\times MVdays\); and

we can conclude that the two SOFA cohorts have the same slope (\(\beta_2\)).

As we cannot reject \(\beta_1=0\), with confidence level of 5%, the intercept is the same, this is the same straight line.

detach(data)