STAT 578 Survival Analysis HW 2 - HIV data

Ryan Masson | Portland State University | 2/23/24

Exercise 6.1

The problem with these data is that large estimated treatment means are heavily associated with smaller estimates of variance, just from looking at the table of means. This could negatively affect the validity of our basic F-tests. Something needs to be done to stabilize the variance so our inferences will remain reasonabably valid. A power transformation on the data, arrived at through the Box-Cox method, could do this. This would allow us to still do inference on difference of means, but our point estimates and confidence intervals would be less reliable. There are also more complex variance models that can give estimates in the original non-transformed scale, but for which inference is not as well-developed.

Exercise 6.2

Show the code

setwd("~/Documents/datawork/psu/STAT 565/")
milk <- read.table("ex6.2", header = TRUE)
milk$method <- as.factor(milk$method)
milk_lm <- lm(count ~ method, data = milk)
milk_anova <- aov(count ~ method, data = milk)
summary(milk_anova)

            Df    Sum Sq   Mean Sq F value   Pr(>F)    
method       2 1.833e+13 9.167e+12   95.48 4.27e-08 ***
Residuals   12 1.152e+12 9.601e+10                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA F-test for difference between the treatment means has a very small p-value, indicating that we would reject the null hypothesis of equality between the treatment means. However, this is the assumptions checking chapter of the textbook, so I proceed to do that:

Show the code

plot(milk_lm, which = 2)

Show the code

plot(milk_lm, which = 1)

From a normal probability plot and a residual plot, there appears to be a departure from normality in both tails of the data, and there is nonconstant variance. This could damage the validity of my F-test.

Show the code

library(MASS)
boxcox(milk_lm)  # use lambda 0

Show the code

milk$count_trans <- log(milk$count)
milk_lm_trans <- lm(count_trans ~ method, data = milk)
plot(milk_lm_trans, which = 2)

Show the code

plot(milk_lm_trans, which = 1)

Show the code

milk_anova_trans <- aov(count_trans ~ method, data = milk)
summary(milk_anova_trans)

            Df Sum Sq Mean Sq F value   Pr(>F)    
method       2 119.77   59.88    1283 1.02e-14 ***
Residuals   12   0.56    0.05                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I used the Box-Cox method to find an appropriate power transformation for the response variable. 0 turned out to be an appropriate power so I refit the model with the log of the response, and plotting showed a stabilization of variance and normality. The p-value of the subsequent F-test is very small, so I reject the null hypothesis of equality of treatment means.

Exercise 6.4

Show the code

delivery <- read.table("ex6.4", header = TRUE)
delivery$company <- as.factor(delivery$company)
delivery_model <- lm(breakage ~ company, data = delivery)
plot(delivery_model, which = 1)

Show the code

bc <- boxcox(delivery_model)

The 95% confidence interval for the transformation power is about (-0.67, 0.75).

Exercise 6.5

This plot tells me that there is some serial dependence in the data. The residuals show correlation over time; the independence assumption is violated.
This residual plot shows a right-opening megaphone shape as you move through the treatment groups. This is a tell-tale sign of unequal variance, so the equal variance assumption is violated.
This plot appears to show reasonably equal variance. It does not show the violation of one of our assumptions.
This normal quantile-quantile plot is of the data, not the residuals, so it does not address one of the main assumptions. The normality assumption refers to the error term in our model, not the data itself.

Problem 6.1

Show the code

grass <- read.table("pr6.1", header = TRUE)
grass$trt <- as.factor(grass$trt) 
grass_model <- lm(y ~ trt, data = grass)
plot(grass_model, which = 2)

Show the code

plot(grass_model, which = 1)

The plots are showing some non-normality of residuals and nonconstant variance. A power transformation could help.

Show the code

boxcox(grass_model)   # lambda = 1.75

Show the code

grass$y_trans <- grass$y ^ 1.75
grass_model_trans <- lm(y_trans ~ trt, data = grass)
plot(grass_model_trans, which = 2)

Show the code

plot(grass_model_trans, which = 1) # little effect of transformation

Show the code

grass_anova <- aov(grass_model)
summary(grass_anova)

            Df Sum Sq Mean Sq F value  Pr(>F)    
trt          5   6398    1280    71.2 3.2e-11 ***
Residuals   18    323      18                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The transformation appeared to have little effect, which makes sense because 1 was in the Box-Cox confidence interval. An analysis of variance results in a p-value of about 3.2e-11. So the treatments are not equivalent.

Under nonirrigated conditions, there are now 4 treatment groups. We are testing for a quadratic effect of nitrogen, so consulting Table C.6 in the textbook, the contrast coefficients will be (1, -1, -1, 1). To apply this to our dataset that still includes irrigated treatment groups, the final contrast vector will be (1, 0, -1, -1, 1, 0).

Show the code

library(multcomp)

Loading required package: mvtnorm

Loading required package: survival

Warning: package 'survival' was built under R version 4.3.1

Loading required package: TH.data


Attaching package: 'TH.data'

The following object is masked from 'package:MASS':

    geyser

Show the code

K <- matrix(c(1, 0, -1, -1, 1, 0), 1)
summary(glht(grass_model, linfct = K))


     Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = y ~ trt, data = grass)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)    
1 == 0   90.750      5.608   16.18  3.6e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

The p-value is < 0.001, so the effect is significant.

STAT 578 Survival Analysis HW 2 - HIV data

Exercise 6.1

Exercise 6.2

Exercise 6.4

Exercise 6.5

Problem 6.1

Exercise 7.2

Exercise 7.3