The following code estimates the survival function. The days variable is used as the time variable; no_of_patients_alive gives the number of patients at risk on each day, and no_of_deaths gives the cumulative number of deaths up to each day. Because the data contain no censored observations, the proportion of patients alive is obtained by dividing the number of patients alive by the total number alive at the start. I then summarize the findings in a data frame for easier reading and, at the end, plot the survival curve as a step function.
days <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30)
no_of_patients_alive <- c(455, 423, 410, 400, 392, 386, 378, 371, 366, 360, 353,
305, 296, 295, 292, 290, 288, 286, 283, 280, 279)
no_of_deaths <- c(0, 32, 45, 55, 63, 69, 77, 84, 89, 95, 102, 150, 159, 160, 163, 165,
167, 169, 172, 175, 176)
total_alive <- 455
proportion_alive <- no_of_patients_alive / total_alive
print(proportion_alive)
## [1] 1.0000000 0.9296703 0.9010989 0.8791209 0.8615385 0.8483516 0.8307692
## [8] 0.8153846 0.8043956 0.7912088 0.7758242 0.6703297 0.6505495 0.6483516
## [15] 0.6417582 0.6373626 0.6329670 0.6285714 0.6219780 0.6153846 0.6131868
est_of_surf <- data.frame(days, no_of_patients_alive, no_of_deaths, proportion_alive)
print(est_of_surf)
## days no_of_patients_alive no_of_deaths proportion_alive
## 1 0 455 0 1.0000000
## 2 1 423 32 0.9296703
## 3 2 410 45 0.9010989
## 4 3 400 55 0.8791209
## 5 4 392 63 0.8615385
## 6 5 386 69 0.8483516
## 7 6 378 77 0.8307692
## 8 7 371 84 0.8153846
## 9 8 366 89 0.8043956
## 10 9 360 95 0.7912088
## 11 10 353 102 0.7758242
## 12 21 305 150 0.6703297
## 13 22 296 159 0.6505495
## 14 23 295 160 0.6483516
## 15 24 292 163 0.6417582
## 16 25 290 165 0.6373626
## 17 26 288 167 0.6329670
## 18 27 286 169 0.6285714
## 19 28 283 172 0.6219780
## 20 29 280 175 0.6153846
## 21 30 279 176 0.6131868
plot(days, proportion_alive, type = "s", xlab = "Days", ylab = "Proportion alive")
Another example is based on survival data for 24 patients. When the data contain censored observations, we do not count a censored observation as an event at its censoring time, but it does remain in the risk set up to the time it leaves the study; the survival estimate at each subsequent time therefore still incorporates what the censored observation contributed up to time t. In other words, up to a certain time some patients have survived and some have been censored, meaning they left the study. The variable surt_months is the survival time in months, obs_deaths is short for observed deaths, no_risk stands for the number of patients at risk, and prop holds the conditional proportions surviving. The formula differs from the previous example because this data set includes censored observations. Obtaining the final survival estimate requires cumulatively multiplying the conditional proportions, so I estimated the survival function value by value, indexing the proportions (s1 through s8), and then combined them in the variable sur_est to get the overall estimates.
Indexing like this is easy for small data sets, but it would be a genuinely difficult task for data with a large number of observations (see the cumprod() sketch after the output below). In the output, the first vector holds the conditional proportions and the second holds the survival estimates.
surt_months <- c(0, 6, 8, 12, 20, 24, 30, 42)
obs_deaths <- c(0, 4, 2, 2, 1, 1, 1, 1)
no_risk <- c(24, 23, 19, 17, 10, 8, 4, 1)
prop <- (no_risk - obs_deaths) / no_risk
print(prop)
## [1] 1.0000000 0.8260870 0.8947368 0.8823529 0.9000000 0.8750000 0.7500000
## [8] 0.0000000
s1 <- prop[1] # S(0) = 1; prop[1] is already 1
s2 <- s1 * prop[2]
s3 <- s2 * prop[3]
s4 <- s3 * prop[4]
s5 <- s4 * prop[5]
s6 <- s5 * prop[6]
s7 <- s6 * prop[7]
s8 <- s7 * prop[8]
sur_est <- c(s1, s2, s3, s4, s5, s6, s7, s8)
print(sur_est)
## [1] 1.0000000 0.8260870 0.7391304 0.6521739 0.5869565 0.5135870 0.3851902
## [8] 0.0000000
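The same estimates can be obtained in a single step with R's built-in cumprod(), which returns the running product of a vector and avoids the manual indexing entirely. A minimal sketch using the vectors defined above:
# cumprod() reproduces the chained multiplications s1..s8 in one call
sur_est_cumprod <- cumprod(prop)
all.equal(sur_est_cumprod, sur_est) # should be TRUE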
This example is the same as the one above, but the data and variable names differ. time_r stands for time to relapse, censd flags the censored observations, relp stands for relapse, riskst stands for the risk set, and cond_prob stands for conditional probability (the quantity called proportion in the previous example). cum_prob stands for cumulative probability, i.e. the estimate of the survival function.
time_r <- c(0, 6, 7, 10, 13, 16, 22, 23)
censd <- c(0, 1, 0, 1, 0, 0, 0, 0)
relp <- c(0, 3, 1, 1, 1, 1, 1, 1)
riskst <- c(21, 21, 17, 15, 12, 11, 7, 6)
cond_prob <- (riskst - relp) / riskst
print(cond_prob)
## [1] 1.0000000 0.8571429 0.9411765 0.9333333 0.9166667 0.9090909 0.8571429
## [8] 0.8333333
c1 <- cond_prob[1]
c2 <- cond_prob[2] * c1
c3 <- cond_prob[3] * c2
c4 <- cond_prob[4] * c3
c5 <- cond_prob[5] * c4
c6 <- cond_prob[6] * c5
c7 <- cond_prob[7] * c6
c8 <- cond_prob[8] * c7
cum_prob <- c(c1, c2, c3, c4, c5, c6, c7, c8)
print(cum_prob)
## [1] 1.0000000 0.8571429 0.8067227 0.7529412 0.6901961 0.6274510 0.5378151
## [8] 0.4481793
plot(time_r, cum_prob, type = "s", xlab = "Time", ylab = "Survival estimate")
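Here, too, the chained multiplications c1 through c8 can be replaced by a single cumprod() call; a quick sketch:
# One-step alternative to c1..c8
all.equal(cumprod(cond_prob), cum_prob) # should be TRUE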
The log-rank test is used to compare two survival curves (two survival estimates). The theoretical background and the computation by hand are presented in my presentation; here I only consider how to compute this test in R using its generic formulas. The expected numbers of events at failure time \(f\) are \(e_{1f}=\frac{n_{1f}}{n_{1f}+n_{2f}}(m_{1f}+m_{2f})\) and \(e_{2f}=\frac{n_{2f}}{n_{1f}+n_{2f}}(m_{1f}+m_{2f})\). The variance is \(Var(O_i-E_i)=\sum_f\frac{n_{1f}\,n_{2f}\,(m_{1f}+m_{2f})(n_{1f}+n_{2f}-m_{1f}-m_{2f})}{(n_{1f}+n_{2f})^2(n_{1f}+n_{2f}-1)}\). The test statistic is \(\chi^2=(O_i-E_i)^2/Var(O_i-E_i)\), where \(O_i=\sum_f m_{if}\) and \(E_i=\sum_f e_{if}\), and either of the two groups can be used. I have used the leukemia data here for the analysis. The following computations are the building blocks of the log-rank test; it is quite a time-consuming process to calculate by hand.
t_f <- c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 22, 23)
m1f <- c(0, 0, 0, 0, 0, 3, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1)
m2f <- c(2, 2, 1, 2, 2, 0, 0, 4, 0, 2, 2, 0, 1, 0, 1, 1, 1)
n1f <- c(21, 21, 21, 21, 21, 21, 17, 16, 15, 13, 12, 12, 11, 11, 10, 7, 6)
n2f <- c(21, 19, 17, 16, 14, 12, 12, 12, 8, 8, 6, 4, 4, 3, 3, 2, 1)
e1f <- n1f / (n1f + n2f) * (m1f + m2f)
print(e1f)
## [1] 1.0000000 1.0500000 0.5526316 1.1351351 1.2000000 1.9090909 0.5862069
## [8] 2.2857143 0.6521739 1.2380952 1.3333333 0.7500000 0.7333333 0.7857143
## [15] 0.7692308 1.5555556 1.7142857
e2f <- n2f / (n1f + n2f) * (m1f + m2f)
e2f
## [1] 1.0000000 0.9500000 0.4473684 0.8648649 0.8000000 1.0909091 0.4137931
## [8] 1.7142857 0.3478261 0.7619048 0.6666667 0.2500000 0.2666667 0.2142857
## [15] 0.2307692 0.4444444 0.2857143
sum_e2f <- sum(e2f)
sum_e2f
## [1] 10.7495
O1 <- m1f - e1f
O1
## [1] -1.0000000 -1.0500000 -0.5526316 -1.1351351 -1.2000000 1.0909091
## [7] 0.4137931 -2.2857143 0.3478261 -1.2380952 -1.3333333 0.2500000
## [13] -0.7333333 0.2142857 -0.7692308 -0.5555556 -0.7142857
sum(O1)
## [1] -10.2505
O2 <- m2f - e2f
sum_O2 <- sum(O2)
n1f_n2f <- n1f * n2f
n1f_n2f
## [1] 441 399 357 336 294 252 204 192 120 104 72 48 44 33 30 14 6
m1f_m2f <- m1f + m2f
m1f_m2f
## [1] 2 2 1 2 2 3 1 4 1 2 2 1 1 1 1 2 2
n1f_n2f_m1f_m2f <- n1f + n2f - m1f - m2f
n1f_n2f_2 <- (n1f + n2f)^2
n1f_n2f_minus_1 <- n1f + n2f - 1
numerator <- n1f_n2f * m1f_m2f * n1f_n2f_m1f_m2f
sum_numerator <- sum(numerator)
sum_numerator
## [1] 179838
denominator <- n1f_n2f_2 * n1f_n2f_minus_1
sum_denom <- sum(denominator)
sum_denom
## [1] 397124
total <- numerator / denominator
sum_total <- sum(total)
sum_total
## [1] 6.256961
leukemia <- data.frame(t_f, m1f, m2f, n1f, n2f, e1f, e2f, O1, O2, n1f_n2f, m1f_m2f, n1f_n2f_m1f_m2f, n1f_n2f_2, n1f_n2f_minus_1, numerator, denominator, total)
leukemia
log_rank_test <- ((sum_O2)^2) / sum_total
log_rank_test
## [1] 16.79294
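The same statistic can be computed in a few vectorized lines, which is less error-prone than building every intermediate column by hand. A compact sketch using only the vectors defined above (the survdiff() function used later performs this calculation automatically):
# Vectorized log-rank statistic from the formulas above
e2f_v <- n2f / (n1f + n2f) * (m1f + m2f)
var_v <- sum(n1f * n2f * (m1f + m2f) * (n1f + n2f - m1f - m2f) /
             ((n1f + n2f)^2 * (n1f + n2f - 1)))
sum(m2f - e2f_v)^2 / var_v # 16.79294, matching log_rank_test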
Chi-square test for independence: when we have two categorical variables, their association is simple to evaluate using the chi-square test or Fisher's exact test. Now consider the case where we have three categorical variables and wish to discover how they are related. The Cochran–Mantel–Haenszel test, an extension of the chi-square test of association, is useful in this scenario.
The Cochran–Mantel–Haenszel test assesses the significance of the association between two categorical variables within each stratum of a third. In this example we'll use the Arthritis data from the vcd package and the mantelhaen.test() function (from base R's stats package).
Let's look at the test hypotheses. H0: the two inner variables have no relationship (they are conditionally independent given the stratifying variable). H1: there is an association between the two inner variables.
library(vcd)
## Warning: package 'vcd' was built under R version 4.3.2
## Loading required package: grid
head(Arthritis)
Treatment, Sex, Age, and Improved are the four factors here. The goal is to figure out how these factors are related. Consider the following three variables: Treatment, Improved, and Sex (the stratifying variable).
mantelhaen.test(Arthritis$Treatment,Arthritis$Improved,Arthritis$Sex)
##
## Cochran-Mantel-Haenszel test
##
## data: Arthritis$Treatment and Arthritis$Improved and Arthritis$Sex
## Cochran-Mantel-Haenszel M^2 = 14.632, df = 2, p-value = 0.0006647
Because the p-value of 0.0006647 is less than 0.05, the null hypothesis is rejected in favor of the alternative. This suggests that at each level of Sex, the treatment received and the improvement reported are not independent. Similarly, we can divide the Age column into two or more groups and measure the association within each.
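A sketch of how Age might be stratified (the cut points below are illustrative, not prescribed by the data):
# Split Age into two illustrative groups and use them as the strata
age_group <- cut(Arthritis$Age, breaks = c(20, 50, 80),
                 labels = c("younger", "older"))
mantelhaen.test(Arthritis$Treatment, Arthritis$Improved, age_group)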
The Kaplan-Meier (KM) method is a non-parametric method used to estimate the survival probability from observed survival times (Kaplan and Meier, 1958). The estimated probability S(t) is a step function that changes value only at the time of each event, and confidence intervals can be computed for the survival probability. The KM survival curve, a plot of the KM survival probability against time, provides a useful summary of the data that can be used to estimate measures such as the median survival time. We will use two R packages: survival for computing the survival analysis, and survminer for summarizing and visualizing the results. We will use the cancer data to estimate Kaplan-Meier curves.
library(survival)
##
## Attaching package: 'survival'
## The following object is masked _by_ '.GlobalEnv':
##
## leukemia
library(survminer)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
data("cancer")
head(cancer)
fit <- survfit(Surv(time, status) ~ sex, data = cancer)
print(fit)
## Call: survfit(formula = Surv(time, status) ~ sex, data = cancer)
##
## n events median 0.95LCL 0.95UCL
## sex=1 138 112 270 212 310
## sex=2 90 53 426 348 550
summary(fit)$table
## records n.max n.start events rmean se(rmean) median 0.95LCL 0.95UCL
## sex=1 138 138 138 112 326.0841 22.91156 270 212 310
## sex=2 90 90 90 53 460.6473 34.68985 426 348 550
d <- data.frame(time = fit$time,
n.risk = fit$n.risk,
n.event = fit$n.event,
n.censor = fit$n.censor,
surv = fit$surv,
upper = fit$upper,
lower = fit$lower)
head(d)
ggsurvplot(fit,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE,
risk.table.col = "strata",
linetype = "strata",
surv.median.line = "hv",
ggtheme = theme_bw(),
palette = c("#E7B800", "#2E9FDF"))
ggsurvplot(fit,
pval = TRUE,
conf.int = TRUE,
conf.int.style = "step",
xlab = "Time in days",
break.time.by = 200,
ggtheme = theme_light(),
risk.table = "abs_pct",
risk.table.y.text.col = TRUE,
risk.table.y.text = FALSE,
ncensor.plot = TRUE,
surv.median.line = "hv",
legend.labs = c("Male", "Female"),
palette = c("#E7B800", "#2E9FDF"))
# Cumulative events
ggsurvplot(fit,
conf.int = TRUE,
risk.table.col = "strata",
ggtheme = theme_bw(),
palette = c("#E7B800", "#2E9FDF"),
fun = "event")
# Cumulative hazard
ggsurvplot(fit,
conf.int = TRUE,
risk.table.col = "strata", # Change risk table color by groups
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"),
fun = "cumhaz")
# Log-Rank Test for comparing survival of groups
surv_diff <- survdiff(Surv(time, status) ~ sex, data = cancer)
print(surv_diff)
## Call:
## survdiff(formula = Surv(time, status) ~ sex, data = cancer)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 138 112 91.6 4.55 10.3
## sex=2 90 53 73.4 5.68 10.3
##
## Chisq= 10.3 on 1 degrees of freedom, p= 0.001
names(surv_diff)
## [1] "n" "obs" "exp" "var" "chisq" "pvalue" "call"
The Cox proportional-hazards model (Cox, 1972) is essentially a regression model commonly used in medical research for investigating the association between the survival time of patients and one or more predictor variables. The variables in R's lung cancer data:
- time: survival time in days
- status: censoring status (1 = censored, 2 = dead)
- age: age in years
- sex: male = 1, female = 2
- ph.ecog: ECOG performance score (0 = good, 5 = dead)
- ph.karno: Karnofsky performance score (bad = 0, good = 100) as rated by physician
- pat.karno: Karnofsky performance score as rated by patient
- meal.cal: calories consumed at meals
- wt.loss: weight loss in the last six months
data("cancer")
head(cancer)
res.cox <- coxph(Surv(time, status) ~ sex, data = cancer)
res.cox
## Call:
## coxph(formula = Surv(time, status) ~ sex, data = cancer)
##
## coef exp(coef) se(coef) z p
## sex -0.5310 0.5880 0.1672 -3.176 0.00149
##
## Likelihood ratio test=10.63 on 1 df, p=0.001111
## n= 228, number of events= 165
The variable sex is encoded as a numeric vector (1: male, 2: female). The R summary for the Cox model gives the hazard ratio (HR) for the second group relative to the first group, that is, female versus male. The beta coefficient for sex = -0.53 indicates that females have a lower risk of death (higher survival) than males in these data.
Hazard ratios. The exponentiated coefficients (exp(coef) = exp(-0.53) = 0.59), also known as hazard ratios, give the effect sizes of covariates. For example, being female (sex = 2) reduces the hazard by a factor of 0.59, or 41%; being female is associated with a better prognosis, i.e. a lower death rate.
Statistical significance. The column marked z gives the Wald statistic value, which corresponds to the ratio of each regression coefficient to its standard error (\(z = coef/se(coef)\)). The Wald statistic evaluates whether the \(\beta\) coefficient of a given variable is statistically significantly different from 0. From the output above, we can conclude that the variable sex has a highly statistically significant coefficient.
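The hazard ratio and its 95% confidence interval can also be pulled directly from the model summary; a brief sketch:
# exp(coef) with its 95% confidence interval for the sex-only model
summary(res.cox)$conf.int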
reg2 <- coxph(formula = Surv(time,status)~sex+age+ph.ecog+ph.karno+wt.loss, data = cancer)
reg2
## Call:
## coxph(formula = Surv(time, status) ~ sex + age + ph.ecog + ph.karno +
## wt.loss, data = cancer)
##
## coef exp(coef) se(coef) z p
## sex -0.631422 0.531835 0.177134 -3.565 0.000364
## age 0.015157 1.015273 0.009763 1.553 0.120538
## ph.ecog 0.740204 2.096364 0.191332 3.869 0.000109
## ph.karno 0.015251 1.015368 0.009797 1.557 0.119553
## wt.loss -0.009298 0.990745 0.006699 -1.388 0.165168
##
## Likelihood ratio test=33.53 on 5 df, p=2.951e-06
## n= 213, number of events= 151
## (15 observations deleted due to missingness)
The variables sex and ph.ecog have highly statistically significant coefficients, while the coefficients for age, ph.karno, and wt.loss are not significant.
Age, ph.ecog, and ph.karno have positive beta coefficients, while sex and wt.loss have negative coefficients. Thus, older age and higher ph.ecog and ph.karno scores are associated with poorer survival, whereas being female (sex = 2) is associated with better survival.
res.cox <- coxph(Surv(time, status)~ age + sex + wt.loss, data = cancer)
res.cox
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + wt.loss, data = cancer)
##
## coef exp(coef) se(coef) z p
## age 0.0200882 1.0202913 0.0096644 2.079 0.0377
## sex -0.5210319 0.5939074 0.1743541 -2.988 0.0028
## wt.loss 0.0007596 1.0007599 0.0061934 0.123 0.9024
##
## Likelihood ratio test=14.67 on 3 df, p=0.002122
## n= 214, number of events= 152
## (14 observations deleted due to missingness)
test.ph <- cox.zph(res.cox)
test.ph
## chisq df p
## age 0.5077 1 0.48
## sex 2.5489 1 0.11
## wt.loss 0.0144 1 0.90
## GLOBAL 3.0051 3 0.39
From the output above, the test is not statistically significant for any of the covariates, and the global test is likewise not significant. Therefore we can assume proportional hazards.
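The scaled Schoenfeld residuals behind cox.zph() can also be inspected graphically with survminer; a minimal sketch:
# A roughly horizontal smooth for each covariate supports proportional hazards
ggcoxzph(test.ph)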
We will now examine the shapes of the hazards in a bit more detail and show how both the location and shape vary with the parameters of each distribution.
To do so we will load some needed packages: we will use flexsurv to compute the hazards, data.table as a fast alternative to data.frame, and ggplot2 for plotting.
library("flexsurv")
## Warning: package 'flexsurv' was built under R version 4.3.2
library("data.table")
library("ggplot2")
ggplot2::theme_set(theme_minimal())
We can create a general function for computing hazards for any hazard function, given combinations of parameter values at different time points. The key to the function is mapply, a multivariate version of sapply. To illustrate, let's compute the hazard from a Weibull distribution given 3 values each of the shape and scale parameters, at time points 1 and 2. The output is a matrix where each row corresponds to a time point and each column to a combination of the shape and scale parameters. For example, the second row and third column is the hazard at time point 2 given a shape parameter of 1.5 and a scale parameter of 1.75.
mapply(flexsurv::hweibull,
shape = c(.5, 1, 1.5),
scale = c(.25, 1, 1.75),
MoreArgs = list(x = 1:2))
## [,1] [,2] [,3]
## [1,] 1.0000000 1 0.6479391
## [2,] 0.7071068 1 0.9163243
The more general function uses mapply to return a data.table of hazards at all possible combinations of the parameter values and time points. Factor variables and intuitive names are also returned to facilitate plotting with ggplot2.
hazfun <- function(FUN, param_vals, times){
# Compute hazard for all possible combinations of parameters and times
values <- do.call("mapply", c(list(FUN = FUN),
as.list(expand.grid(param_vals)),
list(MoreArgs = list(x = times))))
x <- data.table(expand.grid(c(param_vals, list(time = times))),
value = c(t(values)))
# Create factor variables and intuitive names for plotting
param_names <- names(param_vals)
x[[param_names[1]]] <- factor(x[[param_names[1]]],
levels = unique(x[[param_names[1]]]))
if (length(param_vals) > 1){
for (j in 2:length(param_vals)){
ordered_vals <- unique(x[[param_names[j]]])
x[[param_names[j]]] <- paste0(param_names[j], " = ", x[[param_names[j]]])
factor_levels <- paste0(param_names[j], " = ", ordered_vals)
x[[param_names[j]]] <- factor(x[[param_names[j]]],
levels = factor_levels)
}
}
# Return
return(x)
}
The exponential distribution is parameterized by a single rate parameter and only supports a hazard that is constant over time. The hazard is simply equal to the rate parameter.
times <- seq(1, 10, 1)
rate <- seq(1, 5, 1)
haz_exp <- hazfun(flexsurv::hexp,
list(rate = rate),
times = times)
ggplot(haz_exp, aes(x = time, y = value, col = rate)) +
geom_line() + xlab("Time") + ylab("Hazard") +
scale_y_continuous(breaks = rate)
The Weibull distribution can be parameterized as both an accelerated failure time (AFT) model or as a proportional hazards (PH) model. The parameterization in the base stats package is an AFT model.
The hazard is decreasing for shape parameter \(a < 1\) and increasing for \(a > 1\). For \(a = 1\), the Weibull distribution is equivalent to an exponential distribution with rate parameter \(1/b\) where \(b\) is the scale parameter.
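This equivalence is easy to verify numerically; a quick sketch:
# With shape a = 1 the Weibull hazard is constant and equals 1/b
flexsurv::hweibull(1:3, shape = 1, scale = 2)
flexsurv::hexp(1:3, rate = 1/2) # both give 0.5 at every time point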
wei_shape <- seq(.25, 3, .25)
haz_wei <- hazfun(flexsurv::hweibull,
list(scale = seq(2, 5, .5),
shape = wei_shape),
times = times)
ggplot(haz_wei, aes(x = time, y = value, col = scale)) +
geom_line() + facet_wrap(~shape, scales = "free_y") +
xlab("Time") + ylab("Hazard")
flexsurv provides an alternative PH parameterization of the Weibull model with the same shape parameter \(a\) and a scale parameter \(m = b^{-a}\) where \(b\) is the scale parameter in the AFT model.
The hazard is again decreasing for \(a < 1\), constant for \(a = 1\), and increasing for \(a > 1\). Note that for \(a = 1\), the PH Weibull distribution is equivalent to an exponential distribution with rate parameter \(m\).
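The mapping between the two parameterizations can likewise be checked directly; a sketch with \(a = 2\) and AFT scale \(b = 3\):
# PH scale m = b^(-a) gives identical hazards to the AFT form
a <- 2; b <- 3
flexsurv::hweibull(1:3, shape = a, scale = b)
flexsurv::hweibullPH(1:3, shape = a, scale = b^(-a)) # same values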
haz_weiPH <- hazfun(flexsurv::hweibullPH,
list(scale = seq(2, 5, .5),
shape = wei_shape),
times = times)
ggplot(haz_weiPH, aes(x = time, y = value, col = scale)) +
geom_line() + facet_wrap(~shape, scales = "free_y") +
xlab("Time") + ylab("Hazard")
The Gompertz distribution is parameterized by a shape parameter \(a\) and rate parameter \(b\). The hazard is increasing for \(a > 0\), constant for \(a = 0\), and decreasing for \(a < 0\). When \(a = 0\), the Gompertz distribution is equivalent to an exponential distribution with rate parameter \(b\).
haz_gomp <- hazfun(flexsurv::hgompertz,
list(rate = seq(.5, 3, .5),
shape = seq(-2, 2, .5)),
times = times)
ggplot(haz_gomp, aes(x = time, y = value, col = rate)) +
geom_line() + facet_wrap(~shape, scales = "free_y") +
xlab("Time") + ylab("Hazard")
The gamma distribution is parameterized by a shape parameter \(a\) and a rate parameter \(b\). Like the Weibull distribution, the hazard is decreasing for \(a < 1\), constant for \(a = 1\), and increasing for \(a >1\). In the case where \(a = 1\), the gamma distribution is an exponential distribution with rate parameter \(b\).
haz_gam <- hazfun(flexsurv::hgamma,
list(rate = seq(.5, 3, .5),
shape = c(1e-5, .5, 1, seq(2, 6))),
times = times)
ggplot(haz_gam, aes(x = time, y = value, col = rate)) +
geom_line() + facet_wrap(~shape, scales = "free_y") +
xlab("Time") + ylab("Hazard")
The lognormal distribution is parameterized by the mean \(\mu\) and standard deviation \(\sigma\) of survival time on the log scale. The lognormal hazard is either monotonically decreasing or arc-shaped. Note that the shape of the hazard depends on the values of both \(\mu\) and \(\sigma\).
haz_lnorm <- hazfun(flexsurv::hlnorm,
list(meanlog = seq(0, 4, .5),
sdlog = seq(.5, 3, .5)),
times = seq(0, 20, .1))
ggplot(haz_lnorm, aes(x = time, y = value, col = meanlog)) +
geom_line() + facet_wrap(~sdlog, scales = "free_y") +
xlab("Time") + ylab("Hazard")
The log-logistic distribution is parameterized by a shape parameter \(a\) and a scale parameter \(b\). When \(a > 1\), the hazard function is arc-shaped whereas when \(a \leq 1\), the hazard function is decreasing monotonically.
haz_llogis <- hazfun(flexsurv::hllogis,
list(scale = seq(.5, 4, .5),
shape = seq(.5, 3, .5)),
times = seq(0, 20, 1))
ggplot(haz_llogis, aes(x = time, y = value, col = scale)) +
geom_line() + facet_wrap(~shape, scales = "free_y") +
xlab("Time") + ylab("Hazard")
The generalized gamma distribution is parameterized by a location parameter \(\mu\), a scale parameter \(\sigma\), and a shape parameter \(Q\). It is the most flexible distribution reviewed in this post and includes the exponential (\(Q = \sigma = 1\)), Weibull (\(Q = 1\)), gamma (\(Q = \sigma\)), and lognormal (\(Q = 0\)) distributions as special cases.
Each row in the figure corresponds to a unique value of \(\sigma\) and each column corresponds to a unique value of \(Q\). The generalized gamma distribution is quite flexible: it supports hazard functions that are monotonically increasing, monotonically decreasing, arc-shaped, and bathtub-shaped.
haz_gengamma <- hazfun(flexsurv::hgengamma,
list(mu = seq(1, 4, .5),
sigma = seq(.5, 2, .5),
Q = seq(-3, 3, 1)),
times = seq(0, 30, 1))
ggplot(haz_gengamma, aes(x = time, y = value, col = mu)) +
geom_line() +
facet_wrap(sigma ~ Q,
ncol = length(levels(haz_gengamma$Q)),
scales = "free") +
xlab("Time") + ylab("Hazard") +
theme(legend.position = "bottom") +
guides(col = guide_legend(nrow = 2, byrow = TRUE))
## Warning: Removed 7 rows containing missing values (`geom_line()`).
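To make the special cases concrete, two of the mappings noted above can be verified numerically. A sketch relying on flexsurv's documented Prentice parameterization (Weibull: shape = 1/sigma, scale = exp(mu); lognormal: meanlog = mu, sdlog = sigma):
# Q = 1: generalized gamma reduces to a Weibull
flexsurv::hgengamma(1:3, mu = 0.5, sigma = 0.5, Q = 1)
flexsurv::hweibull(1:3, shape = 1 / 0.5, scale = exp(0.5)) # identical
# Q = 0: generalized gamma reduces to a lognormal
flexsurv::hgengamma(1:3, mu = 0.5, sigma = 0.5, Q = 0)
flexsurv::hlnorm(1:3, meanlog = 0.5, sdlog = 0.5) # identical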