1. (10%) Define a proportional hazard model using appropriate notation to first describe the relationship between hazard and Z (continuous) and then interpret the regression coefficient for Z, i.e. β.
on above attachment
2. (10%) Write down the partial likelihood of β based on the above 4 observations.
on above attachment
3. (10%) Plot the log partial likelihood for β in [-8, 3], and convince yourself that this function is concave.
| interval | slope |
|---|---|
| x = [ -8,-7 ) | 9.998847 |
| x = [ -7,-6 ) | 9.996866 |
| x = [ -6,-5 ) | 9.991481 |
| x = [ -5,-4 ) | 9.976835 |
| x = [ -4,-3 ) | 9.936885 |
| x = [ -3,-2 ) | 9.826119 |
| x = [ -2,-1 ) | 9.505480 |
| x = [ -1,0 ) | 8.688716 |
| x = [ 0,1 ) | 7.679191 |
| x = [ 1,2 ) | 7.203369 |
| x = [ 2,3 ) | 7.063979 |
4. (10%) Find β ̂ that maximizes the log partial likelihood function and then calculate the second derivative of the log partial likelihood function at β ̂.
## [1] 8.125000 8.070131 8.016569 7.964462 7.913935 7.865097 7.818031 7.772802
## [9] 7.729455 7.688015 7.648492 7.610878 7.575153 7.541283 7.509225 7.478929
## [17] 7.450335 7.423381 7.397999 7.374120 7.351673 7.330586 7.310788 7.292209
## [25] 7.274781 7.258437 7.243112 7.228746 7.215279 7.202654 7.190820 7.179725
## [33] 7.169321 7.159564 7.150413 7.141826 7.133768 7.126204 7.119101 7.112430
## [41] 7.106161 7.100269 7.094730 7.089519 7.084617 7.080004 7.075660 7.071569
## [49] 7.067715 7.064082 7.060657 7.057428 7.054381 7.051506 7.048791 7.046229
## [57] 7.043808 7.041521 7.039360 7.037317 7.035385
5. (10%) Use SAS/R to fit the above proportional hazards model to the data. How do your results compare with the SAS/R output?
## Call:
## coxph(formula = Surv(time, delta) ~ Z, data = df)
##
## n= 4, number of events= 3
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Z -0.9030 0.4054 0.8093 -1.116 0.265
##
## exp(coef) exp(-coef) lower .95 upper .95
## Z 0.4054 2.467 0.08297 1.98
##
## Concordance= 0.8 (se = 0.16 )
## Likelihood ratio test= 1.76 on 1 df, p=0.2
## Wald test = 1.24 on 1 df, p=0.3
## Score (logrank) test = 1.67 on 1 df, p=0.2
My calculations and R output do not converge My calculations are much higher than cox ph model.
Effects of ploidy on the prognosis of patients with tongue cancer. Variables Time: time to death Delta: death indicator Type: type of tumor (1: Aneuploid; 2: Diploid) Use the data to answer the following questions.
## Parsed with column specification:
## cols(
## type = col_double(),
## time = col_double(),
## delta = col_double()
## )
6. (10%) Summarize the survival data using appropriate summary statistics/graphs by tumor type.
| type | time | delta | |
|---|---|---|---|
| 1:52 | Min. : 1.00 | Min. :0.0000 | |
| 2:28 | 1st Qu.: 23.75 | 1st Qu.:0.0000 | |
| NA | Median : 69.50 | Median :1.0000 | |
| NA | Mean : 73.83 | Mean :0.6625 | |
| NA | 3rd Qu.:101.75 | 3rd Qu.:1.0000 | |
| NA | Max. :400.00 | Max. :1.0000 |
## `summarise()` regrouping output by 'type' (override with `.groups` argument)
| type | delta | Count | MaxT | MinT |
|---|---|---|---|---|
| 1 | 0 | 21 | 400 | 61 |
| 1 | 1 | 31 | 167 | 1 |
| 2 | 0 | 6 | 231 | 8 |
| 2 | 1 | 22 | 181 | 1 |
## Call: survfit(formula = Surv(time, delta) ~ type, data = tongue %>%
## mutate(type = as_factor(type)))
##
## n events median 0.95LCL 0.95UCL
## type=1 52 31 93 67 NA
## type=2 28 22 42 23 112
## Call: survfit(formula = Surv(time, follow) ~ type, data = tongue %>%
## mutate(type = as_factor(type)) %>% mutate(follow = ifelse(delta ==
## 0, 1, 0)))
##
## n events median 0.95LCL 0.95UCL
## type=1 52 21 108 93 240
## type=2 28 6 176 104 NA
This dataset looks at the outcome of death in participants who have tongue cancer with two different type of tumors, Aneuploid tumor and Diploid tumer. The sample size is 80 participants. There are 52 participants who have Aneuploid type tumor, 31 experienced outcome of death and 21 the outcome was not observed. The maximum time of the observed event for Aneuploid is time 167; the median event time is time 93 (95% CI: 67, NA). The median follow up time for Aneuploid is time 108 (95% CI: 93, 240). There are 28 participants with Diploid type tumor, 28 participants with Diploid tumor, 22 experienced outcome of death and for 6 of the participants the outcome was not observed. The maximum time of observed event for Diploid tumor is time 181; the median event time is time 42 (95% CI: 23, 112). The median follow up time for Diploid is time 176 (95% CI: 104, NA). For participants where event is not observed, they are right censored. The last participants for Aneuploid and dipoid tumor types are censored at time 400 and time 231 respectively.
Using Kaplan Meier to estimate survival time, Aneuploid tumors have a consistently higher survival cure when compared with Diploid tumors. There is no crossing of survival curves.
7. (10%) Perform logrank test to compare risk of death overtime between two tumore types and draw a conclusion.
## Call:
## survdiff(formula = Surv(time, delta) ~ type, data = tongue %>%
## mutate(type = as_factor(type)))
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## type=1 52 31 36.6 0.843 2.79
## type=2 28 22 16.4 1.873 2.79
##
## Chisq= 2.8 on 1 degrees of freedom, p= 0.09
Using the logrank test with test statistic 2.79 following the Chi^2 distribution with pvalue 0.09, there appears to be no significant differences in the the risk of death overtime between the two tumor types at significance level of 0.05.
8. (10%) Fit a Cox PH model with tumor type as the only covariate to the data and draw a conclusion.
## Call:
## coxph(formula = Surv(time, delta) ~ type, data = tongue %>% mutate(type = as_factor(type)))
##
## n= 80, number of events= 53
##
## coef exp(coef) se(coef) z Pr(>|z|)
## type2 0.4664 1.5942 0.2804 1.663 0.0963 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## type2 1.594 0.6273 0.9201 2.762
##
## Concordance= 0.564 (se = 0.036 )
## Likelihood ratio test= 2.67 on 1 df, p=0.1
## Wald test = 2.77 on 1 df, p=0.1
## Score (logrank) test = 2.81 on 1 df, p=0.09
Fitting the data using Cox proportional hazard model, a person with Diploid type tumor has 59% higher hazard of experienceing the event of death overtime when compared to Aneuploid tumor type. The covariate coeffiecient of 1.594 has 95% CI of 0.920 and 2.762 which contains 1 and has a pvalue of 0.0963 under the; this shows no statistical significant differences in hazard of experiencing the event overtime in the two tumor types.
9. (10%) Calculate the ratio of median survival time derived from Kaplan-Meier survival curve between two tumor types (Aneuploid vs. Diploid) and then compare it with the hazard ratio (Diploid vs. Aneuploid) derived from Q8. Under what distribution, those two would be identical?| KM | HR |
|---|---|
| 0.4516129 | 0.4663742 |
| 1.5708438 | 1.5942034 |
The ratio of median survival time derived from the Kaplan-Meier survial curve is 0.45 which is slightly lower when compared to hazard ratio of 0.46 using Cox proportional hazard model. Under the Weibull distribution, the two hazard ratios will be the same.
10. (10%) Compare logrank test with score test of the Cox PH model.
The logrank test statistic is 2.79 and the Cox PH model score test statistic is 2.81. Both test conclude the same with both p values 0.09 showing that there is no differene is hazard rates between the two differet tumor types.
library(tidyverse)
library(survival)
library(survminer)
library(kableExtra)
#table
df <- tibble("id" = 1:4, "time" = c(8,7,9,10), "delta" = c(1,1,0,1), "Z" = c(3,4,5,6))
#Q3
Beta <- -8:3
ln_L <- map_dbl(.x = Beta,
.f = ~.x*(3+4+6)-log(exp(.x*3)+(2*exp(.x*4))+(2*exp(.x*5))+(3*exp(.x*6))))
coord <- paste0("(", paste(Beta, ln_L %>% round(3), sep = ","), ")")
ggplot(aes(x = Beta, y = ln_L), data = tibble("Beta" = Beta, "ln_L" = ln_L)) +
geom_line() +
geom_point()+
geom_label(aes(x = Beta + 1.25, y = ln_L + 1.25, label = coord), position = "nudge")
tibble("interval" = paste("x = [", paste0(-8:2, ",", -7:3), ")"),
"slope" = map_dbl(.x = 1:11, .f = ~ ln_L[.x+1]-ln_L[.x])) %>%
kbl() %>%
kable_styling(full_width = FALSE)
#Q4
score <- function(x){
13 - (((3*exp(x*3))+(8*exp(x*4))+(10*exp(x*5))+(18*exp(x*6)))/((exp(x*3))+(2*exp(x*4))+(2*exp(x*5))+(3*exp(x*6))))
}
score(x=seq(-1,1,by = 0.01))
#Q5
coxph(Surv(time, delta)~Z, data = df) %>% summary
#Data Analysis
tongue <- read_csv("U:\\BDS 725 - Survival Analysis\\Survival Analysis\\Session 5\\tongue_KM1_11.csv")
#Q6
#tongue
tongue %>%
mutate(type = as_factor(type)) %>%
summary %>%
kbl() %>%
kable_styling(full_width = TRUE)
tongue %>%
mutate_at(c("type","delta"), as_factor) %>%
group_by(type, delta) %>%
summarize(Count=n(), MaxT = max(time), MinT = min(time)) %>%
kbl() %>%
kable_styling(full_width = TRUE)
KMdat <- survfit(Surv(time, delta)~type, data = tongue %>% mutate(type = as_factor(type)))
KMdat
survfit(Surv(time, follow)~type, data = tongue %>% mutate(type = as_factor(type)) %>% mutate(follow = ifelse(delta == 0, 1, 0)))
ggsurvplot(KMdat,
conf.int = TRUE,
data = tongue,
legend = "bottom",
legend.labs = c("Aneuploid", "Diploid"))
# Q7
survdiff(Surv(time, delta)~type, data = tongue %>% mutate(type = as_factor(type)))
#Q8
cph <- coxph(Surv(time, delta)~type, data = tongue %>% mutate(type = as_factor(type)))
cph %>% summary
#Q9
type1med <- 93
type2med <- 42
ratiomed <- type2med/type1med
tibble("KM" = c(ratiomed, exp(ratiomed)),
"HR" = c(cph$coefficients, cph$coefficients %>% exp())) %>%
kbl() %>%
kable_styling(full_width = FALSE)