library(survival)
## Loading required package: splines
HIV=read.csv("HIV.csv")
survdiff(Surv(Time)~Drug, data=HIV)
## Call:
## survdiff(formula = Surv(Time) ~ Drug, data = HIV)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Drug=NoSaq 17 17 13.1 1.16 2.18
## Drug=Saq 17 17 20.9 0.73 2.18
##
## Chisq= 2.2 on 1 degrees of freedom, p= 0.14
1.16 + 0.73 #calculate test statistic
## [1] 1.89
1-pchisq(1.89, 1) #calculate p-value
## [1] 0.1692
The p-value is above .05, so we fail to reject the null hypothesis. There appears to be no difference in the distribution of times at which CD4 reaches the prescribed level between the two treatments.
(17/13.1)/(17/20.9) #point estimate: HR = (OA/EA)/(OB/EB)
## [1] 1.595
The point estimate is larger than 1, indicating that patients taking the \(NoSaq\) treatment have a higher risk of having their CD4 count drop to the prespecified level.
To find the interval estimate:
1.59542*exp(-1.96*sqrt(1/13.1+1/20.9)) #lower bound for interval: [HR*e +/- 1.96*s ] where s = sd (log(HR)) = sqrt(1/EA + 1/EB)
## [1] 0.7997
1.59542*exp(1.96*sqrt(1/13.1+1/20.9)) #upper bound for interval
## [1] 3.183
Our interval estimate is [.7997, 3.183].
leaders=read.csv("Leaders.csv")
Status = (leaders$Lost > 0)+0
km=survfit(Surv(Years)~1, conf.type='plain', data=leaders)
plot(km, mark.time=F, conf.int=F)
The maximum value of the plot appears to be 0.8 rather than 1, which is odd.
To plot the graph with the changed x-axis limits:
plot(km, mark.time=F, conf.int=F, xlim=c(-1,28))
The maximum value of the plot is now 1, which is what we are used to seeing. Shifting the axis so that the \(y=0\) line is visible showed the real beginning of the plot, which was blocked by the immediate decrease in survival rate.
kmlead=survfit(Surv(Years, Status)~Manner, conf.type='plain', data=leaders)
plot(kmlead, mark.time=F, conf.int=F, col=1:2)
There is some overlap between the distributions, but they each also have distinct values, leading us to believe that the distributions are not identical.
survdiff(Surv(Years, Status)~Manner, data=leaders)
## Call:
## survdiff(formula = Surv(Years, Status) ~ Manner, data = leaders)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Manner=0 295 223 237 0.879 3.1
## Manner=1 177 134 120 1.746 3.1
##
## Chisq= 3.1 on 1 degrees of freedom, p= 0.0785
.879+1.746 #calculate test statistic
## [1] 2.625
1-pchisq(2.625, 1) #calculate p-value
## [1] 0.1052
Our p-value is 0.105 (i.e. greater than .05), leading us to fail to reject the null hypothesis. There is not enough evidence to say that there is a difference in the distributions.
kmlead=survfit(Surv(Years, Status)~Manner, conf.type='plain', data=leaders)
summary(kmlead)
## Call: survfit(formula = Surv(Years, Status) ~ Manner, data = leaders,
## conf.type = "plain")
##
## Manner=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 295 48 0.8373 0.0215 0.7952 0.879
## 1 244 25 0.7515 0.0252 0.7021 0.801
## 2 212 28 0.6522 0.0280 0.5973 0.707
## 3 177 18 0.5859 0.0292 0.5287 0.643
## 4 153 25 0.4902 0.0301 0.4313 0.549
## 5 123 25 0.3905 0.0298 0.3321 0.449
## 6 93 14 0.3318 0.0292 0.2746 0.389
## 7 74 4 0.3138 0.0290 0.2571 0.371
## 8 64 7 0.2795 0.0285 0.2235 0.335
## 9 53 1 0.2742 0.0285 0.2184 0.330
## 10 48 2 0.2628 0.0284 0.2071 0.319
## 11 44 5 0.2329 0.0282 0.1777 0.288
## 12 39 2 0.2210 0.0280 0.1662 0.276
## 13 34 2 0.2080 0.0278 0.1535 0.262
## 14 32 3 0.1885 0.0274 0.1349 0.242
## 15 29 2 0.1755 0.0270 0.1226 0.228
## 17 22 4 0.1436 0.0264 0.0919 0.195
## 18 17 2 0.1267 0.0258 0.0761 0.177
## 19 15 2 0.1098 0.0250 0.0608 0.159
## 20 13 2 0.0929 0.0238 0.0462 0.140
## 22 9 1 0.0826 0.0233 0.0369 0.128
## 24 4 1 0.0619 0.0250 0.0129 0.111
##
## Manner=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 177 51 0.712 0.0340 0.6451 0.779
## 1 122 20 0.595 0.0371 0.5224 0.668
## 2 97 10 0.534 0.0380 0.4592 0.608
## 3 85 16 0.433 0.0383 0.3583 0.508
## 4 67 10 0.369 0.0376 0.2949 0.442
## 5 57 4 0.343 0.0372 0.2700 0.416
## 6 49 3 0.322 0.0368 0.2497 0.394
## 7 43 2 0.307 0.0366 0.2351 0.379
## 8 40 2 0.291 0.0363 0.2203 0.363
## 9 34 2 0.274 0.0362 0.2035 0.345
## 10 30 4 0.238 0.0357 0.1679 0.308
## 11 23 4 0.196 0.0349 0.1279 0.265
## 12 19 2 0.176 0.0342 0.1087 0.243
## 13 15 2 0.152 0.0334 0.0868 0.218
## 14 12 1 0.140 0.0329 0.0750 0.204
## 16 8 1 0.122 0.0331 0.0572 0.187
From the summary, we see that the estimate of median survival time for \(Manner=0\) is 3, whereas for \(Manner=1\) is is 2. This would lead us to suggest that there is a difference between the distributions. This is not contradictory to our results in (c) because finding a confidence interval for median survival time compares one survival aspect, not the entire survival curve.
S = function(x) 1-pnorm(x, )
survdiff(Surv(Years)~Region, data=leaders)
## Call:
## survdiff(formula = Surv(Years) ~ Region, data = leaders)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## Region=0 65 65 74.2 1.151 1.659
## Region=1 134 134 169.7 7.495 14.620
## Region=2 99 99 102.7 0.136 0.211
## Region=3 174 174 125.4 18.879 33.274
##
## Chisq= 35.9 on 3 degrees of freedom, p= 7.68e-08
1.151+7.495+.136+18.879 #calculate test statistic
## [1] 27.66
1-pchisq(27.661, 3) #calculate p-value
## [1] 4.278e-06
The p-value is essentially 0, leading us to believe that there is a relationship between region and number of years in power.
Region 3 (Latin America) is largely influencing the decision we made in (g) because it contributes the most to the chi-square test statistic (18.879 out of 27.661).
LitCat = 1 + (leaders$Literacy > 20) + (leaders$Literacy > 40) + (leaders$Literacy > 60) + (leaders$Literacy > 80)
survdiff(Surv(Years, Status)~LitCat, data=leaders)
## Call:
## survdiff(formula = Surv(Years, Status) ~ LitCat, data = leaders)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## LitCat=1 71 49 63.2 3.191 4.543
## LitCat=2 69 50 54.1 0.316 0.435
## LitCat=3 67 44 59.1 3.864 5.412
## LitCat=4 129 105 86.2 4.113 6.374
## LitCat=5 136 109 94.4 2.266 3.636
##
## Chisq= 16.3 on 4 degrees of freedom, p= 0.00267
3.191+.316+3.864+4.113+2.266 #calculate test statistic
## [1] 13.75
1-pchisq(13.75, 4) #calculate p-value
## [1] 0.008137
The p-value is less than .05, indicating that \(LitCat\) is significantly related to the duration of leaders. We also see from the log-rank test that as literacy rate increases, so does the expected length of term for leaders.
survdiff(Surv(Years, Status)~LitCat + strata(Manner), data=leaders)
## Call:
## survdiff(formula = Surv(Years, Status) ~ LitCat + strata(Manner),
## data = leaders)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## LitCat=1 71 49 64.7 3.79 5.662
## LitCat=2 69 50 54.8 0.42 0.591
## LitCat=3 67 44 59.4 4.00 5.733
## LitCat=4 129 105 85.9 4.27 6.792
## LitCat=5 136 109 92.3 3.04 4.965
##
## Chisq= 19.6 on 4 degrees of freedom, p= 0.000596
3.333+.483+1.953+2.436+3.109 #calculate test statistic
## [1] 11.31
1-pchisq(11.314, 4) #calculate p-value
## [1] 0.02325
\(LitCat\) is significant in the presence of \(Manner\).
To perform the stratified log-rank test for \(Region\):
survdiff(Surv(Years, Status)~LitCat + strata(Region), data=leaders)
## Call:
## survdiff(formula = Surv(Years, Status) ~ LitCat + strata(Region),
## data = leaders)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## LitCat=1 71 49 49.1 5.79e-05 9.67e-05
## LitCat=2 69 50 45.8 3.91e-01 5.63e-01
## LitCat=3 67 44 53.1 1.54e+00 2.28e+00
## LitCat=4 129 105 90.2 2.43e+00 4.12e+00
## LitCat=5 136 109 118.9 8.28e-01 2.38e+00
##
## Chisq= 6.6 on 4 degrees of freedom, p= 0.159
.0154+.1604+.4241+1.7155+1.0978 #calculate test statistic
## [1] 3.413
1-pchisq(3.4132, 4) #calculate p-value
## [1] 0.4912
\(LitCat\) is not significant in the presence of \(Region\).
To make the table:
table(leaders$Region , LitCat )
## LitCat
## 1 2 3 4 5
## 0 10 10 22 23 0
## 1 54 37 21 22 0
## 2 7 20 10 19 43
## 3 0 2 14 65 93
Region is a counfounder for both \(Years\) and \(LitCat\) because it influences both the number of years in power and the literacy rate. Poorly educated regions tend to have both a lower literacy rate and more leaders with long terms.
creating the \(LitCat\) variable transformed the literacy data from numerical to categorical, which is required when evaluating significance using a log-rank test.
To fit the Weibull model:
m = survreg(Surv(Years+0.001 , Status)~Literacy , dist = 'weibull' , data=leaders)
summary(m)
##
## Call:
## survreg(formula = Surv(Years + 0.001, Status) ~ Literacy, data = leaders,
## dist = "weibull")
## Value Std. Error z p
## (Intercept) 2.5213 0.33849 7.45 9.43e-14
## Literacy -0.0138 0.00498 -2.77 5.62e-03
## Log(scale) 0.9580 0.04776 20.06 1.66e-89
##
## Scale= 2.61
##
## Weibull distribution
## Loglik(model)= -712.8 Loglik(intercept only)= -716.7
## Chisq= 7.86 on 1 degrees of freedom, p= 0.0051
## Number of Newton-Raphson Iterations: 6
## n= 472
The \(Literacy\) coefficient is negative, indicating that as literacy increases, the number of years a leader is in power decreases. This agrees with what we found in (i). The p-value for \(Literacy\) is 5.62e-03 (i.e. less than .05), indicating that there is a significant relationship between literacy and the number of years a leader is in power.
S70 = function(x) 1-pweibull(x,1/2.61, exp(2.5213-0.0138*70))
integrate(S70,0,Inf)
## 17.81 with absolute error < 0.0017
S80 = function(x) 1-pweibull(x,1/2.61, exp(2.5213-0.0138*80))
integrate(S80,0,Inf)
## 15.51 with absolute error < 0.0019
The ratio of these two predictions is 1.148, which is very close to 1.143, or 80/70.
m = survreg( Surv(Years+0.001 , Status)~Literacy + factor(Region) ,dist = 'weibull' , data=leaders )
summary(m)
##
## Call:
## survreg(formula = Surv(Years + 0.001, Status) ~ Literacy + factor(Region),
## data = leaders, dist = "weibull")
## Value Std. Error z p
## (Intercept) 2.104983 0.49160 4.2819 1.85e-05
## Literacy -0.000405 0.00646 -0.0627 9.50e-01
## factor(Region)1 0.257156 0.46962 0.5476 5.84e-01
## factor(Region)2 -0.243890 0.49324 -0.4945 6.21e-01
## factor(Region)3 -1.150246 0.47706 -2.4111 1.59e-02
## Log(scale) 0.944090 0.04787 19.7228 1.37e-86
##
## Scale= 2.57
##
## Weibull distribution
## Loglik(model)= -706.9 Loglik(intercept only)= -716.7
## Chisq= 19.61 on 4 degrees of freedom, p= 6e-04
## Number of Newton-Raphson Iterations: 6
## n= 472
When controlling for region, literacy is no longer significant (p-value=.95).
9 2x2 tables are required.
| (0,2] | Dead | Alive | Total |
|---|---|---|---|
| Group 1 | 1 | 4 | 5 |
| Group 2 | 0 | 5 | 5 |
| Total | 1 | 9 | 10 |
| (2, 7] | Dead | Alive | Total |
|---|---|---|---|
| Group 1 | 1 | 3 | 4 |
| Group 2 | 1 | 4 | 5 |
| Total | 3 | 6 | 9 |
Group 1 Dead: 4/9
Group 2 Dead: 5/9
Group 1 Alive: 5/9
Group 2 Alive: 4/9
False
True
True
True
False
True
True
True
True