Due October 31st Answer all questions specified on the problem but if you see something interesting and want to do more analysis please report it as well. Don’t forget to include discussion and to make your plots look nice and presentable. Submit your .rmd file with the knitted PDF (or knitted Word Document saved as a PDF). If you are still having trouble with .rmd, let us know and we will help you, but both the .rmd and the PDF are required.

Question 1.

Following up with the Beat the Blues data from class (package HSAUR3) do the following

Part A.

Construct boxplots to compare the factor variable drug in ananalogous way to how we constructed boxplots in for treatment variable. Discuss the results.

Answer- It is clear from the box plot that there is a steady decrease in BDI with lower variability in patients who took anti-depressants. Although it seems there is a slight decrease in BDI in patients who did not take anti-depressants the variability is higher which suggests patients are more likely to reducte their Beck Depression Inventory II levels as time goes on with anti-depressant treatment.

###load all libraries
library(HSAUR3)
library(dplyr)
library(tidyr)
library(ggplot2)
library(gee)
library(Matrix)
library(lme4)
library(multcomp)
library(Hmisc)
#load the data
data("BtheB", package = "HSAUR3")
#create the labels for the facet
labels <- c('No'='Did Not Take Anti-Depressants',
            'Yes'='Took Anti-Depressants')
#pipe the data and gether the columns with bdi into two columns for labels and values. change the bdi.pre to 0. 
#factor and replace the alphabetic and punctuation characters with empty space.
#using ggplot plot a boxplot with  a facet on drug
 BtheB %>%
  gather(BDI_Type,BDI_Value, 4:8) %>%
  mutate(BDI_Type = ifelse(BDI_Type == 'bdi.pre','0',BDI_Type),
         BDI_Type = factor(gsub(BDI_Type,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
  ggplot(aes(x=BDI_Type,y=BDI_Value)) +
  geom_boxplot() +
  facet_grid(.~drug,labeller =  as_labeller(labels)) +
  labs(title='Beat the Blues ~ Anti-Depressant Treatment',
       x='Time (in months)',
       y='BDI')

Part B.

Repeat (a) for the length variable. Discuss the results.

Answer - It seems obbvious that Beck Depression Inventory II levels decrease much faster in patients that have shorter episodes of depression which is what the below box plot shows. There is much more variablility in BDI levels for patients who have longer episodes of depression and a much gentle decrease in BDI levels.

labels <- c('>6m'='Current Episode Greater than 6 Months',
            '<6m'='Current Episode Less than 6 Months')
 BtheB %>%
  gather(BDI_Type,BDI_Value, 4:8) %>%
  mutate(BDI_Type = ifelse(BDI_Type == 'bdi.pre','0',BDI_Type),
         BDI_Type = factor(gsub(BDI_Type,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
  ggplot(aes(x=BDI_Type,y=BDI_Value)) +
  geom_boxplot() +
  facet_grid(.~length,labeller =  as_labeller(labels)) +
  labs(title='Beat the Blues ~ Length of Current Episode of Depression',
       x='Time (in months)',
       y='BDI')

Part C.

Use the lm function to fit a model to the Beat the Blues data that assumes that the repeated measurements are independent. Compare the results to those from fitting the random intercept model \(BtheB_lmer1\) from class.

Answer - Using the chi-squared test it appears that the linear model produces a lower AIC, higher BIC, and a better overall fit than the linear mixed-effect model. However, this does not take into effect the repeated measures on the different subjects and is misleading as shown by the residuals plot of the mixed-effects model vs that of the linear model.

BtheB$subject <- factor(rownames(BtheB))
btb <- BtheB %>%
  gather(time,bdi, 5:8) %>%
  mutate(time = as.numeric(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement='')))


BtheB_lm1 <- lm(bdi ~ bdi.pre + time + treatment + drug + 
                    length +  subject, data = btb)

BtheB_lmer1 <- lmer(bdi ~ bdi.pre + time + treatment + drug + 
                      length + (1 | subject), data = btb, 
                    REML = FALSE, na.action = na.omit)

anova(BtheB_lmer1,BtheB_lm1)
## Data: btb
## Models:
## BtheB_lmer1: bdi ~ bdi.pre + time + treatment + drug + length + (1 | subject)
## BtheB_lm1: bdi ~ bdi.pre + time + treatment + drug + length + subject
##             Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
## BtheB_lmer1  8 1887.5 1916.6 -935.75   1871.5                             
## BtheB_lm1   99 1778.2 2138.0 -790.08   1580.2 291.32     91  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Coefficients for Mixed-Effects Model

cftest(BtheB_lmer1)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lmer(formula = bdi ~ bdi.pre + time + treatment + drug + length + 
##     (1 | subject), data = btb, REML = FALSE, na.action = na.omit)
## 
## Linear Hypotheses:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept) == 0     5.59239    2.24244   2.494   0.0126 *  
## bdi.pre == 0         0.63968    0.07789   8.212 2.22e-16 ***
## time == 0           -0.70476    0.14639  -4.814 1.48e-06 ***
## treatmentBtheB == 0 -2.32908    1.67036  -1.394   0.1632    
## drugYes == 0        -2.82495    1.72684  -1.636   0.1019    
## length>6m == 0       0.19708    1.63832   0.120   0.9043    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Univariate p values reported)

Random Intercepts & Residuals Plot of Mixed Effects Model

residuals <- function(object, obs) obs-predict(object) 
layout(matrix(1:2, ncol = 2))
qint <- ranef(BtheB_lmer1)$subject[["(Intercept)"]]
qres <- residuals(BtheB_lmer1, btb$bdi.pre)
qqnorm(qint, ylab = "Estimated random intercepts",
     xlim = c(-3, 3), ylim = c(-20, 20),
     main = "Random intercepts")
qqline(qint, col="red", lwd=3)
qqnorm(qres, xlim = c(-3, 3), ylim = c(-20, 20),
     ylab = "Estimated residuals",
     main = "Residuals")
qqline(qres, col="red", lwd=3)

Q-Q Plot of Linear Model

plot(BtheB_lm1,which=2)

Part D.

Investigate and discuss whether there is any evidence of an interaction between treatment and time for the Beat the Blues data.

Answer - There is certainly an interaction between time and treatment type. The Beat the Blues treatment type shows a clear downward tren with lower variance as time progresses than a treatment as usual which indicates the Beat the Blues treatment does reduce Back Depression Inventory II as time progresses.

Box Plot of Treatment Type to Time

labels <- c('TAU'='Treatment as Usual',
              'BtheB'='Beat The Blues')  
  BtheB %>%
  gather(time,bdi, 5:8) %>%
  mutate(time = factor(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
  ggplot(aes(x=time,y=bdi)) +
  geom_boxplot() +
  facet_grid(.~treatment,labeller =  as_labeller(labels)) + 
    labs(title='Beat the Blues ~ Treatment Type',
         x='Time (in months)',
         y='BDI')

Coefficients of Linear Model With Time & Treatment Interaction

The first output is the coefficient of the interaction between time and treatment in a linear model. The p value is significant at th 0.05% level which suggests that there is an interaction between time and treatment. The second output is the coefficients of a mixed effects model which indicates that the interaction between time and treatmetn is not significant for the repeated measures of subject.

  btb <- BtheB %>%
    gather(time,bdi, 5:8) %>%
    mutate(time = as.numeric(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement='')))
  
  
  BtheB_lm2 <- lm(bdi ~ bdi.pre + time + treatment + drug + 
                    length +  subject + treatment*time, data = btb)
  
  d <- as.data.frame(summary(BtheB_lm2)$coefficients)
  d[99,]
##                      Estimate Std. Error  t value   Pr(>|t|)
## time:treatmentBtheB 0.6435779  0.2975651 2.162814 0.03186724
  BtheB_lmer2 <- lmer(bdi ~ bdi.pre + time + treatment + drug + 
                        length + time*treatment +(1 | subject), data = btb, 
                      REML = FALSE, na.action = na.omit)  

Coefficients for Mixed-Effects Model

cftest(BtheB_lmer2)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lmer(formula = bdi ~ bdi.pre + time + treatment + drug + length + 
##     time * treatment + (1 | subject), data = btb, REML = FALSE, 
##     na.action = na.omit)
## 
## Linear Hypotheses:
##                          Estimate Std. Error z value Pr(>|z|)    
## (Intercept) == 0          6.47824    2.31905   2.793  0.00521 ** 
## bdi.pre == 0              0.64046    0.07843   8.166 2.22e-16 ***
## time == 0                -0.95555    0.20831  -4.587 4.49e-06 ***
## treatmentBtheB == 0      -4.09804    1.98490  -2.065  0.03896 *  
## drugYes == 0             -2.79209    1.73986  -1.605  0.10854    
## length>6m == 0            0.21905    1.65043   0.133  0.89441    
## time:treatmentBtheB == 0  0.49000    0.28959   1.692  0.09064 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Univariate p values reported)

Random Intercepts and Residuals Plot of Mixed Effects Model with Interaction Term

residuals <- function(object, obs) obs-predict(object) 
  layout(matrix(1:2, ncol = 2))
  qint <- ranef(BtheB_lmer2)$subject[["(Intercept)"]]
  qres <- residuals(BtheB_lmer2, btb$bdi.pre)
  qqnorm(qint, ylab = "Estimated random intercepts",
         xlim = c(-3, 3), ylim = c(-20, 20),
         main = "Random intercepts")
  qqline(qint, col="red", lwd=3)
  qqnorm(qres, xlim = c(-3, 3), ylim = c(-20, 20),
         ylab = "Estimated residuals",
         main = "Residuals")
  qqline(qres, col="red", lwd=3)

Part E.

Construct a plot of the mean profiles of both treatment groups in the Beat the Blues data, showing also standard deviation bars at each time point. (Attempt to use ggplot2 library to do this).

#function for plotting the standard deviation, mean, and upper and lower values
min.mean.sd.max <- function(x) {
  r <- c(min(x), mean(x) - sd(x), mean(x), mean(x) + sd(x), max(x))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

BtheB %>%
  gather(time,bdi, 4:8) %>%
  mutate(time = factor(gsub(time,pattern="[[:alpha:]]|[[:punct:]]",replacement=''))) %>%
  ggplot(aes(x=time,y=bdi)) +
stat_summary(fun.data = min.mean.sd.max, geom = "boxplot")+
  facet_grid(. ~ treatment,labeller =  as_labeller(labels)) + 
  labs(title='Beat the Blues ~ Treatment Type',
       x='Time (in months)',
       y='BDI')

Question 2

Consider the phosphate data from the package HSAUR3. This data shows the plasma inorganic phosphate levels for 33 subjects, 20 of whom are controls and 13 of whom have been classified as obese (Davis, 2002). Perform the following on this dataset

Part A.

Construct boxplots by group and discuss.

Answer - The phosphate levels in the control group seem to drop and then rise again after 1 and a half hours with less variability than obese subjects. Additionally, the obese subjects phosphate levels did not rise until the third hour compared to the second hour for the control group.

data(phosphate)


phosphate %>%
  gather(time,phosphate_level, 2:9) %>%
  mutate(time = factor(gsub(time,pattern="[[:alpha:]]",replacement=''))) %>%
ggplot(aes(x=time,y=phosphate_level)) +
  geom_boxplot() +
  facet_grid(.~group) +
  labs(title='Phosphate Data ~ Group Phosphate Levels By Time',
       x='Time (in hours)',
       y='Phosphate Level')

Part B.

Produce separate plots of the profiles of the individuals in each group.

phosphate$subject <- factor(rownames(phosphate))

phosphate %>%
  gather(time,phosphate_level, 2:9) %>%
  mutate(time = factor(gsub(time,pattern="[[:alpha:]]",replacement=''))) %>%
ggplot(aes(x=time,y=phosphate_level)) +
  geom_line(aes(group=subject,color=subject),size=1) +
  facet_grid(.~group) +
  labs(title='Phosphate Data ~ Group Phosphate Levels By Time & Subject',
       x='Time (in hours)',
       y='Phosphate Level')

Part C.

Guided by how these plots fit, which linear mixed effects models do you think might be sensible? ( Hint: Discuss intercept/slope and intercept/ interaction ).

Answer - It would be sensible to fit a linear mixed effects model because the slopes and intercepts in each group vary widely. Additionally, there is an interaction within the group variable (control & obese) which should take effect on the time variable. Thus a model of the form \(lme(phosphate_level ~ t0 + time * group + (1|subject),data=phos)\) should be adequate.

Part D.

Convert the data to long version and fit the model of your choice and discuss the results.

Answer - Judging by the coefficients of the model there significance in the interaction between group and time as suggested by the box plot. The random intercepts and residuals of the model seem to be centered around zero. Additionally, the mean squared error of the fixed-effects model is 0.151 which suggests the model has good prediction power overall for phosphate levels over time againsts both groups of subjects.

Coefficient Output

phosphate$subject <- factor(rownames(phosphate))
phos <- phosphate %>%
  gather(time,phosphate_level, 3:9) %>%
  mutate(time = factor(gsub(time,pattern="[[:alpha:]]",replacement='')))

phos_lme <- lmer(phosphate_level ~ t0 + time * group + (1|subject),data=phos)

cftest(phos_lme)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lmer(formula = phosphate_level ~ t0 + time * group + (1 | subject), 
##     data = phos)
## 
## Linear Hypotheses:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept) == 0         0.88700    0.38907   2.280 0.022621 *  
## t0 == 0                  0.63084    0.08954   7.045 1.85e-12 ***
## time1 == 0              -0.55500    0.13424  -4.134 3.56e-05 ***
## time1.5 == 0            -0.72000    0.13424  -5.364 8.16e-08 ***
## time2 == 0              -0.50500    0.13424  -3.762 0.000169 ***
## time3 == 0              -0.20500    0.13424  -1.527 0.126731    
## time4 == 0               0.06000    0.13424   0.447 0.654903    
## time5 == 0               0.45500    0.13424   3.389 0.000700 ***
## groupobese == 0          0.40106    0.18965   2.115 0.034458 *  
## time1:groupobese == 0    0.31654    0.21388   1.480 0.138874    
## time1.5:groupobese == 0  0.18154    0.21388   0.849 0.395996    
## time2:groupobese == 0   -0.38731    0.21388  -1.811 0.070159 .  
## time3:groupobese == 0   -0.57192    0.21388  -2.674 0.007494 ** 
## time4:groupobese == 0   -0.58308    0.21388  -2.726 0.006407 ** 
## time5:groupobese == 0   -0.67038    0.21388  -3.134 0.001722 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Univariate p values reported)

Summary Stats of the LME Model

summary(phos_lme)
## Linear mixed model fit by REML ['lmerMod']
## Formula: phosphate_level ~ t0 + time * group + (1 | subject)
##    Data: phos
## 
## REML criterion at convergence: 330.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3420 -0.6072  0.0432  0.6218  3.2895 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept) 0.08565  0.2927  
##  Residual             0.18020  0.4245  
## Number of obs: 231, groups:  subject, 33
## 
## Fixed effects:
##                    Estimate Std. Error t value
## (Intercept)         0.88700    0.38907   2.280
## t0                  0.63084    0.08954   7.045
## time1              -0.55500    0.13424  -4.134
## time1.5            -0.72000    0.13424  -5.364
## time2              -0.50500    0.13424  -3.762
## time3              -0.20500    0.13424  -1.527
## time4               0.06000    0.13424   0.447
## time5               0.45500    0.13424   3.389
## groupobese          0.40106    0.18965   2.115
## time1:groupobese    0.31654    0.21388   1.480
## time1.5:groupobese  0.18154    0.21388   0.849
## time2:groupobese   -0.38731    0.21388  -1.811
## time3:groupobese   -0.57192    0.21388  -2.674
## time4:groupobese   -0.58308    0.21388  -2.726
## time5:groupobese   -0.67038    0.21388  -3.134
## 
## Correlation matrix not shown by default, as p = 15 > 12.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it

Random Intercepts & Residuals Plot

residuals <- function(object, obs) obs-predict(object) 
layout(matrix(1:2, ncol = 2))
qint <- ranef(phos_lme)$subject[["(Intercept)"]]
qres <- residuals(phos_lme, phos$t0)
qqnorm(qint, ylab = "Estimated random intercepts",
       xlim = c(-3, 3), ylim = c(-20, 20),
       main = "Random intercepts")
qqline(qint, col="red", lwd=3)
qqnorm(qres, xlim = c(-3, 3), ylim = c(-20, 20),
       ylab = "Estimated residuals",
       main = "Residuals")
qqline(qres, col="red", lwd=3)

Mean Square Error Of Phosphate Mixed-Effects Model

Use the re.form argument to specifiy the mixed effect of \((1|subject)\).

phos.pred <- predict(phos_lme,newdata=phos,re.form=~(1|subject))

round(mean((phos$phosphate_level - phos.pred)^2), 3)
## [1] 0.151