R Markdown
library("survival")
library("lubridate")
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library("ggplot2")
library("survminer")
## Loading required package: ggpubr
## Loading required package: magrittr
library("Publish")
## Loading required package: prodlim
- Read in the data from the internet:
hmohiv<-read.table("https://stats.idre.ucla.edu/stat/r/examples/asa/hmohiv.csv", sep=",", header = TRUE)
- There are the following variables in the data: age, time, drug, entry and end date.
- Convert entry and end dates to R dates with the following code: as.Date(hmohiv$entdate,format=“%m/%d/%Y”).
hmohiv$entdate <- as.Date(hmohiv$entdate,format="%m/%d/%Y")
hmohiv$enddate <- as.Date(hmohiv$enddate,format="%m/%d/%Y")
- Create a histogram of entry dates with ggplot2.
ggplot(hmohiv,aes(entdate)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

- Compute the follow up time in days and compare with the “time” variable. Hint: just subtract, but convert into numeric (use as.numeric). What is the relationship between the time difference we computed and the “time” variable?
hmohiv$obstime = as.numeric(hmohiv$enddate-hmohiv$entdate)
- What is the date range of follow-up, entry and end times?
range(hmohiv$obstime)
## [1] 30 1825
range(hmohiv$entdate)
## [1] "1989-01-13" "1991-12-28"
range(hmohiv$enddate)
## [1] "1989-02-16" "1995-11-14"
- Is survival time normally distributed?
ggplot(hmohiv,aes(obstime)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

- Use Surv(hmohiv\(time,hmohiv\)censor) to create a survival variable (survVar). Create a Kaplan-Meier curve with survfit(survVar~1, …..), and plot with ggsurvplot. Try “risk.table=TRUE” argument.
hmohiv$survVar <- Surv(hmohiv$time,hmohiv$censor)
KM <- survfit(survVar ~ 1, type="kaplan-meier", conf.type="log", data=hmohiv)
ggsurvplot(survfit(survVar~drug,data=hmohiv),risk.table=TRUE, pval = TRUE)

- How many events in each group?
with(hmohiv,table(hmohiv$time,hmohiv$censor))
##
## 0 1
## 1 2 15
## 2 5 5
## 3 2 10
## 4 1 4
## 5 0 7
## 6 1 2
## 7 1 6
## 8 0 4
## 9 0 3
## 10 1 3
## 11 0 3
## 12 2 2
## 13 0 1
## 14 0 1
## 15 0 2
## 19 1 0
## 22 0 1
## 24 1 0
## 30 0 1
## 31 0 1
## 32 0 1
## 34 0 1
## 35 0 1
## 36 0 1
## 43 0 1
## 53 0 1
## 54 0 1
## 56 1 0
## 57 0 1
## 58 0 1
## 60 2 0
- What were the median survival times with 95% Cis in each group? Hint: summary(fit)$table
median(summary(KM$surv))
## [1] 0.2649738
- Use survdiff() to compute the logrank test.
survdiff(survVar~censor, data = hmohiv)
## Call:
## survdiff(formula = survVar ~ censor, data = hmohiv)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## censor=0 20 0 18.3 18.32 26.6
## censor=1 80 80 61.7 5.44 26.6
##
## Chisq= 26.6 on 1 degrees of freedom, p= 2e-07
- Was the difference statistically significant?
survdiff(survVar~censor, data = hmohiv)
## Call:
## survdiff(formula = survVar ~ censor, data = hmohiv)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## censor=0 20 0 18.3 18.32 26.6
## censor=1 80 80 61.7 5.44 26.6
##
## Chisq= 26.6 on 1 degrees of freedom, p= 2e-07
- Use coxph function to estimate the hazard ratio (HR). Give the 95% CI and interpret the parameter. Remember to exponeniate!
Cox1 <- coxph(survVar ~ age, data = hmohiv)
Cox1
## Call:
## coxph(formula = survVar ~ age, data = hmohiv)
##
## coef exp(coef) se(coef) z p
## age 0.08529 1.08903 0.01745 4.888 1.02e-06
##
## Likelihood ratio test=23.34 on 1 df, p=1.359e-06
## n= 100, number of events= 80
- Create age group variable for each decade from 20 to 60 using the “cut()” function.
cut(hmohiv$age, breaks = 20:60)
## [1] (45,46] (34,35] (29,30] (29,30] (35,36] (31,32] (35,36] (30,31]
## [9] (47,48] (46,47] (27,28] (33,34] (43,44] (31,32] (35,36] (35,36]
## [17] (53,54] (34,35] (43,44] (37,38] (39,40] (33,34] (24,25] (31,32]
## [25] (41,42] (46,47] (29,30] (46,47] (40,41] (39,40] (42,43] (40,41]
## [33] (29,30] (36,37] (41,42] (30,31] (38,39] (31,32] (50,51] (35,36]
## [41] (42,43] (38,39] (32,33] (44,45] (32,33] (27,28] (30,31] <NA>
## [49] (43,44] (38,39] (32,33] (30,31] (32,33] (49,50] (35,36] (29,30]
## [57] (41,42] (31,32] (33,34] (37,38] (32,33] (38,39] (38,39] (32,33]
## [65] (33,34] (33,34] (45,46] (21,22] (43,44] (36,37] (24,25] (37,38]
## [73] (31,32] (33,34] (28,29] (35,36] (20,21] (25,26] (31,32] (41,42]
## [81] (39,40] (36,37] (46,47] (31,32] (40,41] (45,46] (25,26] (29,30]
## [89] (31,32] (30,31] (34,35] (35,36] (40,41] (35,36] (34,35] (33,34]
## [97] (27,28] (28,29] (34,35] (33,34]
## 40 Levels: (20,21] (21,22] (22,23] (23,24] (24,25] (25,26] ... (59,60]
- Plot the relationship between decade and survival. Hint: use survfit.
ggsurvplot(survfit(survVar~censor,data=hmohiv),risk.table=TRUE, pval = TRUE)

- Was age a factor in survival? What is the HR?
ggsurvplot(survfit(survVar~age,data=hmohiv), pval = TRUE)

- Use coxph to simultaneously estimate HR due to age & drug and report CI and p-values.
Cox2 <- coxph(survVar ~ age + drug, data = hmohiv)
Cox2
## Call:
## coxph(formula = survVar ~ age + drug, data = hmohiv)
##
## coef exp(coef) se(coef) z p
## age 0.09714 1.10202 0.01864 5.211 1.87e-07
## drug 1.01670 2.76405 0.25622 3.968 7.24e-05
##
## Likelihood ratio test=39.13 on 2 df, p=3.182e-09
## n= 100, number of events= 80