R for Biostatistics

Bongani Ncube
Biostatistician/Data Scientist
Feb 22, 2025

Welcome!

About me

  • University Of Witwatersrand | Master Of Science In Epidemiology Biostatistics
  • Deltas Africa Sub-Saharan Africa Consortium for Advanced Biostatistics (SSACAB) | Research Fellow
  • University of Zimbabwe | Bachelor of Science in Statistics (2023) Mathematics and Statistics

Today’s plan


  1. What is R? How can it ease the burden of repeated reporting?
  2. Basic functions for Manipulating public health data
  3. Basic Descriptive analytics for Public health data
  4. Visualizing Public health data
  5. Statistical Analysis/ Modeling for public health data

What is R?






R is an open-source (free!) scripting language for working with data

The benefits of R




The magic of R is that it’s reproducible (by someone else or by yourself in six months)


Keeps data separate from code (data preparation steps)

Getting R





You need the R language

And also the software

Using R




You use R via packages


…which contain functions


…which are just verbs

How to choose a statistical software.



|  | R | Stata | Python |
| Cost | Free | Requires license | Free |
| IDE | RStudio | Built-in editor | Many (VS Code best) |
| Strengths | Best epi / trials libraries for helpful functions | Simple functionality; powerful quasi-experimental/meta-analysis. U of U MSCI uses. | Best NLP, machine learning libraries |
| Weakness | Clunky syntax; many ‘dialects’ | Simple syntax | Moderately complex syntax |
| Explainable Programming* | Quarto | No options | Jupyter |

*The idea that code should be readable by consumers of science has caught on in more quantitative fields (Math, CS), but will be coming to medicine. Long overdue. Learn now.

Today’s data


stroke


doa dod status sex dm gcs sbp dbp wbc time2 stroke_type referral_from
17/2/2011 18/2/2011 alive male no 15 151 73 12.5 1 IS non-hospital
20/3/2011 21/3/2011 alive male no 15 196 123 8.1 1 IS non-hospital
9/4/2011 10/4/2011 dead female no 11 126 78 15.3 1 HS hospital
12/4/2011 13/4/2011 dead male no 3 170 103 13.9 1 IS hospital
12/4/2011 13/4/2011 alive female yes 15 103 62 14.7 1 IS non-hospital
4/5/2011 5/5/2011 dead female no 3 91 55 14.2 1 HS hospital



These data come from patients who were admitted to a tertiary hospital due to acute stroke. They were treated in the ward, and their status at discharge (dead or alive) was recorded. The variables are:

  • doa : date of admission
  • dod : date of discharge
  • status : event at discharge (alive or dead)
  • sex : male or female
  • dm : diabetes (yes or no)
  • gcs : Glasgow Coma Scale (value from 3 to 15)
  • sbp : Systolic blood pressure (mmHg)

  • dbp : Diastolic blood pressure (mmHg)
  • wbc : Total white cell count
  • time2 : days in ward
  • stroke_type : stroke type (Ischaemic stroke or Haemorrhagic stroke)
  • referral_from : patient was referred from a hospital or not from a hospital


The outcome of interest is the time from admission to death. The time variable is time2 (in days) and the event variable is status. The event of interest is dead. Note that the date variables and the categorical variables are stored as character; the rest are numeric.
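
Because the dates and categorical variables arrive as character, a first cleaning step might look like the sketch below. It is only an illustration: stroke_demo is a hypothetical, hand-typed stand-in for two rows of the stroke data, days_in_ward is a name made up here, and the dates are assumed to be in day/month/year order.

```r
library(dplyr)

# Hypothetical stand-in with the same layout as the first rows of stroke
stroke_demo <- data.frame(
  doa    = c("17/2/2011", "9/4/2011"),
  dod    = c("18/2/2011", "10/4/2011"),
  status = c("alive", "dead"),
  sex    = c("male", "female")
)

stroke_clean <- stroke_demo |>
  mutate(
    # Character dates become Date objects (assumed day/month/year)
    doa = as.Date(doa, format = "%d/%m/%Y"),
    dod = as.Date(dod, format = "%d/%m/%Y"),
    # Categorical variables become factors
    status = factor(status, levels = c("alive", "dead")),
    sex    = factor(sex),
    # Length of stay in days, derived from the two dates
    days_in_ward = as.numeric(dod - doa)
  )

str(stroke_clean)
```

Once the dates are real Date objects, the length of stay can be computed by subtraction rather than typed in by hand.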

Today’s data


Cancer


marital sex age surgery race first status survivaltime
Single Male 4 Yes White Yes 0 290
Single Female 4 Yes White Yes 1 9
Single Female 4 Yes Black Yes 1 10
Single Female 5 Yes White Yes 0 141
Single Female 5 No White Yes 1 12
Single Male 5 No White Yes 1 54



This study used the latest release of the 2017 submission of the SEER database (1973 to 2015 data) of the National Cancer Institute. Patients with a diagnosis of AO (anaplastic oligodendroglioma) were selected from the SEER database using the International Classification of Diseases for Oncology, Third Edition (ICD-O-3) histology code 9451. There are 1824 patients diagnosed with AO in this dataset.

The dataset contains all the variables of interest. For the purpose of this session, we want to explore the association of age, surgery status and marital status with the survival of AO patients. The variables in the dataset are as follows:



  • age : age at the beginning of the study (numeric).
  • surgery : whether the patient has undergone surgery. Coded as “Yes” and “No”.
  • marital : marital status at diagnosis. Coded as “Single”, “Married”, or “Separated/divorced/widowed”.
  • survivaltime : survival time in months (numeric).
  • status : survival status, coded as 1 = died and 0 = censored.

Basic data manipulation


Useful operators




  • <- : “save as” (keyboard shortcut: Opt + -)
  • |> : “and then” (keyboard shortcut: Cmd + Shift + M)
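
A minimal sketch of the two operators in use, with made-up blood pressure values. The native pipe |> assumes R 4.1 or later.

```r
# "save as": store three systolic blood pressure readings under the name sbp
sbp <- c(151, 196, 126)

# "and then": take sbp, and then average it, and then round the result
mean_sbp <- sbp |>
  mean() |>
  round(1)

mean_sbp
```

Reading the pipeline top to bottom gives the order in which the steps run, which is the main reason to prefer it over nested calls.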

Common functions



filter keeps or discards rows (aka observations)

select keeps or discards columns (aka variables)

arrange sorts data set by certain variable(s)

count tallies data set by certain variable(s)

mutate creates new variables

group_by/summarize aggregates data (pivot tables!)

str_* functions work easily with text

Syntax of a function




function(data, argument(s))


is the same as


data |>

    function(argument(s))
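
To make the equivalence concrete, here is a small sketch using head() and the built-in mtcars data (chosen only because it ships with R, not because it is part of today's data).

```r
# Nested form: function(data, argument)
first_rows_nested <- head(mtcars, 3)

# Piped form: data |> function(argument)
first_rows_piped <- mtcars |>
  head(3)

# Both forms return exactly the same object
identical(first_rows_nested, first_rows_piped)
```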

Filter



filter keeps or discards rows (aka observations)

the == operator tests for equality


stroke |> 
  filter(sex == "male")
doa dod status sex dm gcs sbp dbp wbc time2 stroke_type referral_from
17/2/2011 18/2/2011 alive male no 15 151 73 12.5 1 IS non-hospital
20/3/2011 21/3/2011 alive male no 15 196 123 8.1 1 IS non-hospital
12/4/2011 13/4/2011 dead male no 3 170 103 13.9 1 IS hospital
22/5/2011 23/5/2011 alive male no 11 171 80 8.7 1 IS hospital
21/10/2011 22/10/2011 alive male no 4 230 120 12.7 1 IS hospital
28/11/2011 29/11/2011 dead male no 10 207 128 10.8 1 HS non-hospital

Filter



the | operator signifies “or”


cancer |> 
  filter(marital == "Single" | 
           marital == "Married")
marital sex age surgery race first status survivaltime
Single Male 4 Yes White Yes 0 290
Single Female 4 Yes White Yes 1 9
Single Female 4 Yes Black Yes 1 10
Single Female 5 Yes White Yes 0 141
Single Female 5 No White Yes 1 12
Single Male 5 No White Yes 1 54
marital n percent
Married 1137 0.7316602
Single 417 0.2683398

Filter



the %in% operator allows for multiple options in a list


cancer |> 
  filter(marital %in% c("Separated/divorced/widowed","Single"))
marital sex age surgery race first status survivaltime
Single Male 4 Yes White Yes 0 290
Single Female 4 Yes White Yes 1 9
Single Female 4 Yes Black Yes 1 10
Single Female 5 Yes White Yes 0 141
Single Female 5 No White Yes 1 12
Single Male 5 No White Yes 1 54
marital n percent
Separated/divorced/widowed 270 0.3930131
Single 417 0.6069869

Filter



the & operator combines conditions


cancer |> 
  filter(marital %in% c("Separated/divorced/widowed", "Single") &
           status == 1)
marital sex age surgery race first status survivaltime
Single Female 4 Yes White Yes 1 9
Single Female 4 Yes Black Yes 1 10
Single Female 5 No White Yes 1 12
Single Male 5 No White Yes 1 54
Single Male 6 Yes Others Yes 1 11
Single Male 8 Yes Others Yes 1 8
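
The four operators can be compared side by side on a tiny example. This is a sketch on a hypothetical four-row data frame (toy), not on the cancer data itself.

```r
library(dplyr)

# Hypothetical data with the same coding as the cancer data
toy <- data.frame(
  marital = c("Single", "Married", "Separated/divorced/widowed", "Single"),
  status  = c(1, 0, 1, 0)
)

# == with | : keep rows matching either value
single_or_married <- toy |>
  filter(marital == "Single" | marital == "Married")

# %in% : the same rows, written more compactly
same_with_in <- toy |>
  filter(marital %in% c("Single", "Married"))

# & : both conditions must hold on the same row
not_married_died <- toy |>
  filter(marital != "Married" & status == 1)
```

When the list of accepted values grows, %in% stays readable where a chain of | comparisons does not.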

Select



select keeps or discards columns (aka variables)


stroke |> 
  select(sex,status,gcs)
sex status gcs
male alive 15
male alive 15
female dead 11
male dead 3
female alive 15
female dead 3

Select



can drop columns with -column


stroke |> 
  select(-sex)
doa dod status dm gcs sbp dbp wbc time2 stroke_type referral_from
17/2/2011 18/2/2011 alive no 15 151 73 12.5 1 IS non-hospital
20/3/2011 21/3/2011 alive no 15 196 123 8.1 1 IS non-hospital
9/4/2011 10/4/2011 dead no 11 126 78 15.3 1 HS hospital
12/4/2011 13/4/2011 dead no 3 170 103 13.9 1 IS hospital
12/4/2011 13/4/2011 alive yes 15 103 62 14.7 1 IS non-hospital
4/5/2011 5/5/2011 dead no 3 91 55 14.2 1 HS hospital

Select



the pipe |> (or %>% from magrittr) chains multiple functions together


stroke |> 
  select(sex,status,sbp) |> 
  filter(sex == "male")
sex status sbp
male alive 151
male alive 196
male dead 170
male alive 171
male alive 230
male dead 207

Mutate



mutate creates new variables (with a single =)


stroke |> 
  mutate(Outcome = "stroke")
doa dod status sex dm gcs sbp dbp wbc time2 stroke_type referral_from Outcome
17/2/2011 18/2/2011 alive male no 15 151 73 12.5 1 IS non-hospital stroke
20/3/2011 21/3/2011 alive male no 15 196 123 8.1 1 IS non-hospital stroke
9/4/2011 10/4/2011 dead female no 11 126 78 15.3 1 HS hospital stroke
12/4/2011 13/4/2011 dead male no 3 170 103 13.9 1 IS hospital stroke
12/4/2011 13/4/2011 alive female yes 15 103 62 14.7 1 IS non-hospital stroke
4/5/2011 5/5/2011 dead female no 3 91 55 14.2 1 HS hospital stroke

Mutate



much more useful with a conditional such as ifelse(), which has three arguments:

condition, value if true, value if false


stroke |> 
  mutate(Survival = ifelse(status == "dead",
                           "Died", "censored")) |> 
  select(Survival, status)
Survival status
censored alive
censored alive
Died dead
Died dead
censored alive
Died dead
censored alive
censored alive

Mutate



with multiple conditions, case_when() is much easier!

stroke |> 
  mutate(gcs_group = case_when(between(gcs, 3, 8) ~ "Severe",
                               between(gcs, 9, 12) ~ "Moderate",
                               between(gcs, 13, 15) ~ "Mild")) |> 
  select(gcs_group, gcs)
gcs_group gcs
Mild 15
Mild 15
Moderate 11
Severe 3
Mild 15
Severe 3
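
One detail worth seeing in isolation is what happens to values that match no condition, such as a missing score. The sketch below uses a hypothetical four-row data frame (gcs_demo); the "Unknown" label and the low_gcs column are assumptions added for illustration.

```r
library(dplyr)

# Hypothetical scores, including one missing value
gcs_demo <- data.frame(gcs = c(15, 11, 3, NA))

gcs_demo <- gcs_demo |>
  mutate(
    # Conditions are checked top to bottom; the first match wins
    gcs_group = case_when(
      between(gcs, 3, 8)   ~ "Severe",
      between(gcs, 9, 12)  ~ "Moderate",
      between(gcs, 13, 15) ~ "Mild",
      TRUE                 ~ "Unknown"  # fallback for anything unmatched
    ),
    # ifelse() handles the two-way case and propagates NA
    low_gcs = ifelse(gcs <= 8, "yes", "no")
  )

gcs_demo
```

Without the final TRUE line, unmatched rows would simply be NA, which may or may not be what you want in a report.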

Group by / summarize



group_by/summarize aggregates data (pivot tables!)

group_by() identifies the grouping variable(s) and summarize() specifies the aggregation


cancer |> 
  group_by(marital) |> 
  summarize(count = n())
marital count
Married 1137
Separated/divorced/widowed 270
Single 417
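
summarize() is not limited to counting. As a sketch, the hypothetical five-row data frame below (toy) is aggregated into a count, a mean and a total per group in one call; the column names mirror the cancer data but the values are invented.

```r
library(dplyr)

# Hypothetical data: marital status, age and event indicator
toy <- data.frame(
  marital = c("Single", "Single", "Married", "Married", "Married"),
  age     = c(30, 40, 50, 60, 70),
  status  = c(1, 0, 1, 1, 0)
)

by_marital <- toy |>
  group_by(marital) |>
  summarize(
    n        = n(),          # rows per group
    mean_age = mean(age),    # average age per group
    deaths   = sum(status),  # number of events per group
    .groups  = "drop"        # return an ungrouped result
  )

by_marital
```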

Exploratory data analysis

  • Now that we have seen common functions for data analysis, let us go ahead and wrangle our data.
cancer <- cancer |> 
  mutate(across(where(is.character), as.factor)) |> 
  mutate(Survival = ifelse(status == 1, "Died", "censored"))
glimpse(cancer)
Rows: 1,824
Columns: 9
$ marital      <fct> Single, Single, Single, Single, Single, Single, Single, S…
$ sex          <fct> Male, Female, Female, Female, Female, Male, Male, Female,…
$ age          <dbl> 4, 4, 4, 5, 5, 5, 6, 6, 6, 8, 9, 11, 11, 11, 11, 12, 13, …
$ surgery      <fct> Yes, Yes, Yes, Yes, No, No, Yes, Yes, Yes, Yes, Yes, Yes,…
$ race         <fct> White, White, Black, White, White, White, White, White, O…
$ first        <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Ye…
$ status       <dbl> 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, …
$ survivaltime <dbl> 290, 9, 10, 141, 12, 54, 161, 60, 11, 8, 15, 298, 83, 15,…
$ Survival     <chr> "censored", "Died", "Died", "censored", "Died", "Died", "…

Do we have any missing data?

skimr::skim(cancer)
Data summary
Name cancer
Number of rows 1824
Number of columns 9
_______________________
Column type frequency:
character 1
factor 5
numeric 3
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Survival 0 1 4 8 0 2 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
marital 0 1 FALSE 3 Mar: 1137, Sin: 417, Sep: 270
sex 0 1 FALSE 2 Mal: 1023, Fem: 801
surgery 0 1 FALSE 2 Yes: 1609, No: 215
race 0 1 FALSE 3 Whi: 1584, Oth: 157, Bla: 83
first 0 1 FALSE 2 Yes: 1651, No: 173

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1 48.94 15.30 4 39 49 59.25 87 ▁▅▇▆▂
status 0 1 0.54 0.50 0 0 1 1.00 1 ▇▁▁▁▇
survivaltime 0 1 58.25 58.36 1 13 37 91.00 368 ▇▂▁▁▁

Do we have any missing data?

colSums(is.na(cancer))
     marital          sex          age      surgery         race        first 
           0            0            0            0            0            0 
      status survivaltime     Survival 
           0            0            0 
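
The cancer data happen to be complete, so the counts above are all zero. To see what the same check reports when values are missing, here is a sketch on a hypothetical three-row data frame (demo) with two gaps.

```r
# Hypothetical data with one missing age and one missing sex
demo <- data.frame(
  age     = c(49, NA, 61),
  sex     = c("Male", "Female", NA),
  surgery = c("Yes", "Yes", "No")
)

# Missing values per column
missing_per_column <- colSums(is.na(demo))
missing_per_column

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(demo))
complete_rows
```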

EDA in tables


  • We can get the overall descriptive statistics for the cancer data by using the tbl_summary() function from the gtsummary package.
  • Ideally, at this stage you have already refined your dataset.


cancer |>
 select(-first,-age,-marital,-status,-race) |>  
  tbl_summary(type=list(surgery~"categorical")) |> 
  as_gt()
| Characteristic | N = 1,824¹ |
| sex |  |
|     Female | 801 (44%) |
|     Male | 1,023 (56%) |
| surgery |  |
|     No | 215 (12%) |
|     Yes | 1,609 (88%) |
| survivaltime | 37 (13, 91) |
| Survival |  |
|     censored | 835 (46%) |
|     Died | 989 (54%) |
¹ n (%); Median (IQR)

EDA in tables (Refined)


cancer |>
 select(-first,-age,-marital,-status,-race) |>  
  tbl_summary(type=list(surgery~"categorical"),
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) |> 
  modify_caption("Patient Characteristics (N = {N})") |>
  as_gt()
Patient Characteristics (N = 1824)
| Characteristic | N = 1,824¹ |
| sex |  |
|     Female | 801 / 1,824 (44%) |
|     Male | 1,023 / 1,824 (56%) |
| surgery |  |
|     No | 215 / 1,824 (12%) |
|     Yes | 1,609 / 1,824 (88%) |
| survivaltime | 58.25 (58.36) |
| Survival |  |
|     censored | 835 / 1,824 (46%) |
|     Died | 989 / 1,824 (54%) |
¹ n / N (%); Mean (SD)

Stratify by survival status

To stratify the descriptive statistics by survival status (the Survival variable), we use the by = argument.

tab_status <- cancer |>
 select(-first,-age,-marital,-status,-race) |>   
  tbl_summary(
    by = Survival, 
    type=list(surgery~"categorical"),
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) |>
  modify_caption("Patient Characteristics and Fatality (N = {N})")

tab_status |>
  as_gt()
Patient Characteristics and Fatality (N = 1824)
| Characteristic | Died, N = 989¹ | censored, N = 835¹ |
| sex |  |  |
|     Female | 425 / 989 (43%) | 376 / 835 (45%) |
|     Male | 564 / 989 (57%) | 459 / 835 (55%) |
| surgery |  |  |
|     No | 154 / 989 (16%) | 61 / 835 (7.3%) |
|     Yes | 835 / 989 (84%) | 774 / 835 (93%) |
| survivaltime | 37.51 (42.16) | 82.81 (64.99) |
¹ n / N (%); Mean (SD)

Stratify by gender

Next, we stratify the descriptive statistics based on gender also using the by = argument.

tab_gender <- cancer |>
 select(-first,-age,-marital,-status,-race) |>   
  tbl_summary(
    by = sex, 
    type=list(surgery~"categorical"),
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) |>
  modify_caption("**Patient Characteristics and Gender** (N = {N})")

tab_gender |>
  as_gt()
Patient Characteristics and Gender (N = 1824)
| Characteristic | Female, N = 801¹ | Male, N = 1,023¹ |
| surgery |  |  |
|     No | 83 / 801 (10%) | 132 / 1,023 (13%) |
|     Yes | 718 / 801 (90%) | 891 / 1,023 (87%) |
| survivaltime | 60.31 (59.75) | 56.64 (57.22) |
| Survival |  |  |
|     censored | 376 / 801 (47%) | 459 / 1,023 (45%) |
|     Died | 425 / 801 (53%) | 564 / 1,023 (55%) |
¹ n / N (%); Mean (SD)

Merge the two tables


We can combine the two tables using tbl_merge().

tbl_merge(
  tbls = list(tab_gender, tab_status),
  tab_spanner = c("**Gender**", "**status**")) |>
  as_gt()
Patient Characteristics and Gender (N = 1824)
| Characteristic | Gender: Female, N = 801¹ | Gender: Male, N = 1,023¹ | status: Died, N = 989¹ | status: censored, N = 835¹ |
| surgery |  |  |  |  |
|     No | 83 / 801 (10%) | 132 / 1,023 (13%) | 154 / 989 (16%) | 61 / 835 (7.3%) |
|     Yes | 718 / 801 (90%) | 891 / 1,023 (87%) | 835 / 989 (84%) | 774 / 835 (93%) |
| survivaltime | 60.31 (59.75) | 56.64 (57.22) | 37.51 (42.16) | 82.81 (64.99) |
| Survival |  |  |  |  |
|     censored | 376 / 801 (47%) | 459 / 1,023 (45%) |  |  |
|     Died | 425 / 801 (53%) | 564 / 1,023 (55%) |  |  |
| sex |  |  |  |  |
|     Female |  |  | 425 / 989 (43%) | 376 / 835 (45%) |
|     Male |  |  | 564 / 989 (57%) | 459 / 835 (55%) |
¹ n / N (%); Mean (SD)

EDA with plots

  • Let’s explore the frequency of survival status, as we have already done in the tables above.
cancer |> 
  group_by(Survival) |> 
  summarise(freq = n())
# A tibble: 2 × 2
  Survival  freq
  <chr>    <int>
1 Died       989
2 censored   835

A distribution of a categorical variable


To plot the distribution of a categorical variable, we can use a Bar chart.

ggplot(data = cancer) + 
  geom_bar(mapping = aes(x = Survival)) +
  theme_bw()

We can combine dplyr for data wrangling with ggplot2 for plotting (both are packages inside the tidyverse metapackage). For example, the dplyr part for data wrangling:

pep_age <- cancer |>
  group_by(Survival) |> 
  summarize(mean_age = mean(age)) 
pep_age
# A tibble: 2 × 2
  Survival mean_age
  <chr>       <dbl>
1 Died         52.2
2 censored     45.1


And the ggplot2 part to make the plot:

library(ggthemes)  # provides theme_stata()

ggplot(pep_age, mapping = aes(x = Survival, y = mean_age)) + 
  geom_col() +
  theme_stata()

We can also combine both tasks, dplyr and ggplot2, in a single pipeline, which saves time:

library(ggthemes)

cancer |> 
  mutate(Survival = as.factor(Survival)) |>
  group_by(Survival) |> 
  summarize(mean_age = mean(age)) |> 
  ggplot(mapping = aes(x = Survival, y = mean_age, fill = Survival)) + 
  geom_col() +
  ylab("Mean age (Years)") +
  xlab("Survival Status") +
  theme_stata()

Histogram

To plot the distribution of a numerical variable, we can use a histogram. The binwidth argument sets the width of each bin (here 10 months), and we can add some customization.

ggplot(data = cancer, mapping = aes(x = survivaltime)) + 
  geom_histogram(binwidth = 10) +
  ylab("Frequency") +
  xlab("survival time") +
  ggtitle("survival time distribution") +
  theme_stata()

Overlaying histograms and boxplot

Here we examine the distribution of a numerical variable (survivaltime) by Survival status. First, we overlay histograms and density curves in an object called hist_surv. Next, we create a boxplot object named box_surv. After that, we combine the two objects side by side using the | operator from patchwork.

hist_surv <- ggplot(data = cancer, aes(x = survivaltime, fill = Survival)) + 
  geom_histogram(binwidth = 5, aes(y = after_stat(density)),  # after_stat() replaces the deprecated ..density..
                   position = "identity", alpha = 0.75) + 
  geom_density(alpha = 0.25) +
  xlab("Survival Time") +
  ylab("Density") +
  labs(title = "Density distribution",
       caption = "Source : Survival Analysis data") +
   scale_fill_grey() +
  theme_stata()
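
The code that builds box_surv is not shown in these slides. The sketch below is one plausible way to create it, assuming a boxplot of survivaltime by Survival. The data frame cancer_demo is a hypothetical six-row stand-in typed from the first rows of the cancer data, and theme_bw() is used instead of theme_stata() so the example needs only ggplot2.

```r
library(ggplot2)

# Hypothetical stand-in; with the real data, use data = cancer instead
cancer_demo <- data.frame(
  survivaltime = c(290, 9, 10, 141, 12, 54),
  Survival     = c("censored", "Died", "Died", "censored", "Died", "Died")
)

# Assumed definition of box_surv: one box per survival status
box_surv <- ggplot(data = cancer_demo,
                   aes(x = Survival, y = survivaltime, fill = Survival)) +
  geom_boxplot(alpha = 0.75) +
  xlab("Survival status") +
  ylab("Survival time") +
  labs(title = "Boxplot") +
  scale_fill_grey() +
  theme_bw()
```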


Overlaying histograms and boxplot

The patchwork package documentation explains more about arranging multiple plots in a single figure.

library(patchwork)
hist_surv |  box_surv


Faceting the plots

It is hard to visualize three variables in a single histogram plot. But we can use the facet_*() functions (facet_wrap() or facet_grid()) to split the plot further.

We can see better plots if we split the histogram by a grouping variable. In this example, we stratify the distribution of survivaltime (a numerical variable) by Survival status and sex (both categorical variables).


ggplot(data = cancer, aes(x = survivaltime, fill = sex)) +
    geom_histogram(binwidth = 5, aes(y = after_stat(density)),  # after_stat() replaces the deprecated ..density..
                   position = "identity", alpha = 0.45) + 
  geom_density(aes(linetype = sex), alpha = 0.65) +
  scale_fill_grey() +
  xlab("survival time") +
  ylab("Density") +
  labs(title = "Density distribution of survival time for status and gender",
       caption = "Source : Survival Analysis data") +
  theme_bw() +
  facet_wrap( ~ Survival)

How to do a statistical analysis


Biostatistics is the branch of statistics that applies statistical methods and techniques to biological, medical, and health sciences.

Step 0: Save yourself a headache and collect your data in a processable format

Step 1: Data Wrangling

  • Each row is an observation (usually a patient)
  • Each column contains only 1 type of data (more below)
  • No free text (if you need to, categorize responses)


Step 2: For each data element, consider the data type

  • Binary (aka dichotomous scale): e.g. Yes or No, 0 or 1
  • Unordered Categorical (nominal scale): e.g. Utah, Colorado, Nevada, Idaho
  • Ordered Categorical (ordinal scale): e.g. Room air, nasal cannula, HFNC, intubated, ECMO, dead
  • Continuous (interval & ratio scales - differ by whether 0 is special): e.g. Temperature (Celsius or Kelvin, respectively)

How to choose a statistical test


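| Statistic | dichotomous (a.k.a. binary) | nominal (a.k.a. categorical) | ordinal (a.k.a. ordered categorical) | interval (a.k.a. continuous) |
| n | X | X | X | X |
| % | X | X | X | X |
| min |  |  | X | X |
| max |  |  | X | X |
| range |  |  | X | X |
| mode | X | X | X | X |
| mean |  |  |  | X |
| median |  |  | X | X |
| IQR |  |  | X | X |
| Std. dev. |  |  |  | X |
| Std. err. |  |  |  | X |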

From: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine.
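
As a sketch of how the table translates into R, the code below picks the summaries that suit each data type. The three vectors are typed from the first six rows of the stroke data shown earlier; treating gcs as ordinal and sbp as continuous is the assumption being illustrated.

```r
# First six rows of the stroke data, typed in by hand
sex <- c("male", "male", "female", "male", "female", "female")  # dichotomous
gcs <- c(15, 15, 11, 3, 15, 3)                                  # ordinal
sbp <- c(151, 196, 126, 170, 103, 91)                           # continuous

# Dichotomous / nominal: counts and percentages
sex_counts  <- table(sex)
sex_percent <- prop.table(sex_counts) * 100

# Ordinal: median and interquartile range
gcs_median <- median(gcs)
gcs_iqr    <- quantile(gcs, probs = c(0.25, 0.75))

# Continuous: mean, standard deviation and standard error
sbp_mean <- mean(sbp)
sbp_sd   <- sd(sbp)
sbp_se   <- sbp_sd / sqrt(length(sbp))
```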


How do you choose the right test?

What type of variables? How many groups? Are the samples correlated (e.g. observation from the same patient at two different times)?


| Level of measurement of outcome variable | Two Independent Groups | Three or more Independent Groups | Two Correlated* Samples | Three or more Correlated* Samples |
| Dichotomous | chi-square or Fisher’s exact test | chi-square or Fisher-Freeman-Halton test | McNemar test | Cochran Q test |
| Unordered categorical | chi-square or Fisher-Freeman-Halton test | chi-square or Fisher-Freeman-Halton test | Stuart-Maxwell test | Multiplicity adjusted Stuart-Maxwell tests# |
| Ordered categorical | Wilcoxon-Mann-Whitney (WMW) test | Old School***: Kruskal-Wallis analysis of variance (ANOVA); New School***: multiplicity adjusted WMW test | Wilcoxon signed rank test | Old School#: Friedman two-way ANOVA by ranks; New School#: multiplicity adjusted Wilcoxon signed rank tests |
| Continuous | independent groups t-test | Old School***: one-way ANOVA; New School***: multiplicity adjusted independent groups t-tests | paired t-test | mixed effects linear regression |
| Censored: time to event | log-rank test | Multiplicity adjusted log-rank test | Shared-frailty Cox regression | Shared-frailty Cox regression |
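
For the "Two Independent Groups" column, the corresponding base R calls can be sketched as follows. The data are invented for illustration (two groups of 20, one binary and one continuous outcome); only the function names matter here.

```r
# Invented data: two groups of 20 observations each
group  <- rep(c("A", "B"), each = 20)
binary <- c(rep(c(1, 0), times = c(14, 6)),   # group A: 14 events
            rep(c(1, 0), times = c(7, 13)))   # group B: 7 events
contin <- c(seq(5, 7, length.out = 20),       # group A values
            seq(4.5, 6.5, length.out = 20))   # group B values

p_values <- c(
  # Dichotomous outcome: chi-square or Fisher's exact test
  chi_square = chisq.test(table(group, binary))$p.value,
  fisher     = fisher.test(table(group, binary))$p.value,
  # Ordered categorical outcome: Wilcoxon-Mann-Whitney test
  wmw        = wilcox.test(contin ~ group)$p.value,
  # Continuous outcome: independent groups t-test (Welch by default)
  t_test     = t.test(contin ~ group)$p.value
)

round(p_values, 4)
```

For three or more groups, kruskal.test() and aov() play the roles of the Kruskal-Wallis and one-way ANOVA entries, and mcnemar.test() covers the paired dichotomous case.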

Examples AO cancer

What test would we use to assess whether:

  • “surgery” and “Survival” are associated beyond what’s attributable to chance?
  • “sex” and “Survival” are associated?
  • “survivaltime” and “surgery” are associated?
  • “age” and “Survival” are associated?

Test for Associations

Does Survival outcome depend on whether you have gone through surgery?


chi2_test_result <- chisq.test(cancer$Survival, cancer$surgery)
print(chi2_test_result)

    Pearson's Chi-squared test with Yates' continuity correction

data:  cancer$Survival and cancer$surgery
X-squared = 28.961, df = 1, p-value = 0.00000007386

Test for Associations

Is survival outcome associated with gender?

chi2_test_result <- chisq.test(cancer$sex, cancer$Survival)
print(chi2_test_result)

    Pearson's Chi-squared test with Yates' continuity correction

data:  cancer$sex and cancer$Survival
X-squared = 0.6967, df = 1, p-value = 0.4039

Test for Associations


Does survival time depend on surgery status?

t_test_result <- t.test(survivaltime ~ surgery, data = cancer)
print(t_test_result)

    Welch Two Sample t-test

data:  survivaltime by surgery
t = -3.4353, df = 271.52, p-value = 0.0006845
alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
95 percent confidence interval:
 -23.234358  -6.305457
sample estimates:
 mean in group No mean in group Yes 
         45.22326          59.99316 

Test for Associations


Does survival outcome depend on age?

t_test_result <- t.test(age~ Survival, data = cancer)
print(t_test_result)

    Welch Two Sample t-test

data:  age by Survival
t = -10.293, df = 1817.5, p-value < 0.00000000000000022
alternative hypothesis: true difference in means between group censored and group Died is not equal to 0
95 percent confidence interval:
 -8.419480 -5.724381
sample estimates:
mean in group censored     mean in group Died 
              45.10299               52.17492 

Test for Associations (Neat Tables)


library(gtsummary)

p <- cancer |> select(-first, -marital, -status, -race)

# Descriptive statistics by survival status, with a test statistic and p-value per row
theme_gtsummary_journal("jama")
table1 <- p |>
  tbl_summary(
    by = Survival,  # status was dropped by select() above, so stratify on Survival
    type = list(surgery ~ "categorical"),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)")) |> 
  add_p(test = all_continuous() ~ "aov",
        pvalue_fun = function(x) style_pvalue(x, digits = 2)) |> 
  modify_header(statistic ~ "**Test Statistic**") |>
  bold_labels() |> 
  modify_fmt_fun(statistic ~ style_sigfig)
table1
| Characteristic | Died, N = 989 | censored, N = 835 | Test Statistic | p-value¹ |
| sex, n / N (%) |  |  | 0.78 | 0.38 |
|     Female | 425 / 989 (43%) | 376 / 835 (45%) |  |  |
|     Male | 564 / 989 (57%) | 459 / 835 (55%) |  |  |
| surgery, n / N (%) | 835 / 989 (84%) | 774 / 835 (93%) | 30 | <0.001 |
| survivaltime, Mean (SD) | 38 (42) | 83 (65) | 321 | <0.001 |
¹ Pearson’s Chi-squared test; One-way ANOVA

Statistical Modeling

Understand the logic of regression analysis

Recall, if there is an association between an ‘exposure’ and an ‘outcome’, there are 4 possible explanations:

  1. Chance
  2. Confounding (some other factor influences both the exposure and the outcome)
  3. Bias
  4. Or, causation (a real effect)

Understand the logic of regression analysis

There are at least 3 uses of regression models:

  1. Inferential Statistics: Hypothesis testing with confounding control
  2. Descriptive Statistics: Summarize the strength of association
  3. Prediction of an outcome (e.g. statistical machine learning)

Understand the logic of regression analysis

Regression comes with additional assumptions:

  • Independent observations (special “mixed models” can relax this)
  • The form of the output variable is correct*
  • The form of the predictor variables is correct
  • The relationships between the predictors are properly specified**
  • Additional constraints (e.g. constant variance)


Thus the logic is: if the assumptions of the models hold in reality, then the described relationships are valid

No model is perfect, but some models are useful

  • Morris moment(TM)

The form of the output variable (aka the dependent or predicted variable) determines the type of regression:

EXAMPLES

| Output variable | Test | Regression model |
| Dichotomous | Chi2 test | logistic regression |
| Unordered categorical | Chi2 test | multinomial logistic regression |
| Ordered categorical | Wilcoxon-Mann-Whitney | ordinal logistic regression |
| Continuous (normally distributed) | T-test | linear regression |
| Censored: time to event | Log-rank test | Cox regression |

From: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine.

Interpretation:

Regression coefficient = the change in the outcome you expect if you change the predictor by 1 unit, holding all other variables constant

  • For linear regression: additive change in the outcome
  • For logistic regression: multiplicative change in the odds of the outcome (after exponentiating the coefficient)
  • For Cox regression: multiplicative change in the hazard of the outcome (after exponentiating the coefficient)
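
As a worked sketch of the logistic case, the code below rebuilds the surgery by survival counts reported in the stratified table earlier (154 died and 61 censored without surgery; 835 died and 774 censored with surgery) and turns the coefficient into an odds ratio. The data frame surgery_tab and the grouped cbind() form of glm() are choices made for this illustration, not the slides' own code.

```r
# Counts taken from the "Patient Characteristics and Fatality" table
surgery_tab <- data.frame(
  surgery  = factor(c("No", "Yes")),  # "No" is the reference level
  died     = c(154, 835),
  censored = c(61, 774)
)

# Grouped logistic regression: successes = died, failures = censored
fit <- glm(cbind(died, censored) ~ surgery,
           data = surgery_tab, family = binomial())

# The coefficient is on the log-odds scale
log_odds_ratio <- coef(fit)[["surgeryYes"]]

# Exponentiating gives the multiplicative change in the odds of death
odds_ratio <- exp(log_odds_ratio)
odds_ratio  # about 0.43

# Same quantity computed directly from the 2 x 2 counts
odds_ratio_by_hand <- (835 / 774) / (154 / 61)
```

Read this way, the odds of death among patients who had surgery are a little under half the odds among those who did not, before adjusting for any other variable.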

Example:


Consider, if we want to test whether ‘surgery’ and ‘Survival’ are associated, we could use a chi2 test:

chi2_test_result <- chisq.test(cancer$surgery, cancer$Survival)
print(chi2_test_result)

    Pearson's Chi-squared test with Yates' continuity correction

data:  cancer$surgery and cancer$Survival
X-squared = 28.961, df = 1, p-value = 0.00000007386

model development

Alternatively, you could specify a logistic regression.

(“GLM” stands for ‘generalised linear model’. Logistic regression is a type of GLM where the family is binomial.)

logistic_model <- glm(status ~ surgery, data = cancer, family = binomial())

# Output the summary of the model to see coefficients and statistics
summary(logistic_model)

Call:
glm(formula = status ~ surgery, family = binomial(), data = cancer)

Coefficients:
            Estimate Std. Error z value       Pr(>|z|)    
(Intercept)   0.9261     0.1513   6.121 0.000000000927 ***
surgeryYes   -0.8502     0.1593  -5.337 0.000000094385 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2515.6  on 1823  degrees of freedom
Residual deviance: 2484.7  on 1822  degrees of freedom
AIC: 2488.7

Number of Fisher Scoring iterations: 4

model development


logistic_model <- glm(status ~ age, data = cancer, family = binomial())

# Output the summary of the model to see coefficients and statistics
summary(logistic_model)

Call:
glm(formula = status ~ age, family = binomial(), data = cancer)

Coefficients:
             Estimate Std. Error z value            Pr(>|z|)    
(Intercept) -1.377002   0.167291  -8.231 <0.0000000000000002 ***
age          0.031789   0.003311   9.600 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2515.6  on 1823  degrees of freedom
Residual deviance: 2416.3  on 1822  degrees of freedom
AIC: 2420.3

Number of Fisher Scoring iterations: 4

model development


logistic_model <- glm(status ~ sex, data = cancer, family = binomial())

# Output the summary of the model to see coefficients and statistics
summary(logistic_model)

Call:
glm(formula = status ~ sex, family = binomial(), data = cancer)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  0.12250    0.07080   1.730   0.0836 .
sexMale      0.08350    0.09468   0.882   0.3778  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2515.6  on 1823  degrees of freedom
Residual deviance: 2514.8  on 1822  degrees of freedom
AIC: 2518.8

Number of Fisher Scoring iterations: 3


Survival analysis - survival estimates

The Kaplan-Meier (KM) estimator is a non-parametric estimator of the survival function. It gives the estimated probability of surviving beyond each observed event time. Using survfit() from the survival package, we can compute KM estimates.
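
To show what survfit() computes, the sketch below reproduces the KM estimate by hand on six invented observations (time and event are hypothetical vectors, with event = 1 for a death and 0 for censoring). At each event time the running survival probability is multiplied by 1 minus (deaths / number at risk).

```r
library(survival)

# Invented follow-up times and event indicators (1 = died, 0 = censored)
time  <- c(2, 3, 3, 5, 7, 8)
event <- c(1, 1, 0, 1, 0, 1)

# KM estimate from the survival package
km <- survfit(Surv(time, event) ~ 1)

# The same estimate by hand (product-limit formula)
event_times <- sort(unique(time[event == 1]))
n_risk  <- sapply(event_times, function(t) sum(time >= t))                 # still under observation at t
n_event <- sapply(event_times, function(t) sum(time == t & event == 1))   # deaths at t
surv_by_hand <- cumprod(1 - n_event / n_risk)

# Side-by-side comparison at the event times
data.frame(time = event_times, n_risk, n_event,
           by_hand = surv_by_hand, survfit = summary(km)$surv)
```

Censored observations never trigger a drop in the curve; they only shrink the number at risk for later event times.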

Let’s estimate the survival probabilities for

  • overall
  • Surgery
  • sex
  • race

The survival probabilities for all patients:

Kaplan-Meier survival estimates


library(survival)  # provides Surv() and survfit()

KM <- survfit(Surv(time = survivaltime, 
                   event = Survival == "Died") ~ 1, 
              data = cancer)
summary(KM)
Call: survfit(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ 1, data = cancer)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1   1824      43    0.976 0.00355        0.969        0.983
    2   1772      42    0.953 0.00495        0.944        0.963
    3   1725      26    0.939 0.00562        0.928        0.950
    4   1687      25    0.925 0.00619        0.913        0.937
    5   1649      18    0.915 0.00656        0.902        0.928
    6   1618      30    0.898 0.00713        0.884        0.912
    7   1580      24    0.884 0.00755        0.870        0.899
    8   1548      20    0.873 0.00787        0.858        0.888
    9   1515      28    0.857 0.00829        0.841        0.873
   10   1480      24    0.843 0.00863        0.826        0.860
   11   1444      25    0.828 0.00896        0.811        0.846
   12   1410      28    0.812 0.00931        0.794        0.830
   13   1375      26    0.796 0.00961        0.778        0.816
   14   1341      15    0.788 0.00977        0.769        0.807
   15   1317      25    0.773 0.01003        0.753        0.793
   16   1291      20    0.761 0.01023        0.741        0.781
   17   1265      21    0.748 0.01042        0.728        0.769
   18   1236      16    0.738 0.01056        0.718        0.759
   19   1207      15    0.729 0.01070        0.708        0.750
   20   1190      13    0.721 0.01080        0.700        0.743
   21   1173      15    0.712 0.01093        0.691        0.734
   22   1149      14    0.703 0.01104        0.682        0.725
   23   1122      13    0.695 0.01114        0.674        0.717
   24   1104      14    0.686 0.01124        0.665        0.709
   25   1085      11    0.679 0.01132        0.658        0.702
   26   1069      15    0.670 0.01143        0.648        0.693
   27   1049       7    0.665 0.01148        0.643        0.688
   28   1034      14    0.656 0.01157        0.634        0.679
   29   1013      10    0.650 0.01164        0.627        0.673
   30    996      12    0.642 0.01171        0.619        0.665
   31    980       9    0.636 0.01177        0.613        0.660
   32    967       7    0.632 0.01181        0.609        0.655
   33    951       5    0.628 0.01184        0.605        0.652
   34    944       3    0.626 0.01186        0.603        0.650
   35    936       6    0.622 0.01190        0.599        0.646
   36    928       5    0.619 0.01193        0.596        0.643
   37    917       6    0.615 0.01196        0.592        0.639
   38    904      10    0.608 0.01202        0.585        0.632
   39    889       6    0.604 0.01206        0.581        0.628
   40    877       8    0.598 0.01210        0.575        0.623
   41    861       8    0.593 0.01215        0.569        0.617
   42    848       9    0.587 0.01220        0.563        0.611
   43    834       7    0.582 0.01224        0.558        0.606
   44    821       3    0.579 0.01226        0.556        0.604
   45    814       4    0.577 0.01228        0.553        0.601
   46    807       5    0.573 0.01231        0.549        0.598
   47    798       6    0.569 0.01234        0.545        0.593
   48    786       5    0.565 0.01237        0.541        0.590
   49    777       7    0.560 0.01240        0.536        0.585
   50    767      11    0.552 0.01246        0.528        0.577
   51    748       6    0.548 0.01249        0.524        0.573
   52    739       7    0.542 0.01252        0.518        0.568
   53    729       7    0.537 0.01256        0.513        0.562
   54    715       5    0.533 0.01258        0.509        0.559
   56    701       3    0.531 0.01260        0.507        0.556
   57    698       2    0.530 0.01261        0.505        0.555
   58    692       5    0.526 0.01263        0.502        0.551
   59    685       4    0.523 0.01265        0.499        0.548
   60    678       7    0.517 0.01268        0.493        0.543
   61    665       7    0.512 0.01272        0.488        0.537
   62    652       2    0.510 0.01273        0.486        0.536
   63    646      11    0.502 0.01278        0.477        0.527
   64    633       4    0.498 0.01279        0.474        0.524
   65    624       4    0.495 0.01281        0.471        0.521
   66    613       6    0.490 0.01284        0.466        0.516
   67    604       1    0.490 0.01284        0.465        0.515
   68    601       5    0.486 0.01286        0.461        0.511
   69    593       4    0.482 0.01288        0.458        0.508
   70    585       5    0.478 0.01290        0.454        0.504
   71    573       1    0.477 0.01291        0.453        0.503
   73    566       2    0.476 0.01291        0.451        0.502
   74    559       2    0.474 0.01292        0.449        0.500
   75    553       1    0.473 0.01293        0.448        0.499
   76    549       3    0.470 0.01294        0.446        0.497
   77    543       2    0.469 0.01295        0.444        0.495
   78    539       4    0.465 0.01298        0.441        0.491
   79    531       2    0.464 0.01299        0.439        0.490
   80    522       1    0.463 0.01299        0.438        0.489
   81    514       2    0.461 0.01300        0.436        0.487
   82    506       4    0.457 0.01303        0.432        0.483
   83    501       3    0.454 0.01304        0.430        0.481
   84    489       1    0.454 0.01305        0.429        0.480
   86    483       1    0.453 0.01306        0.428        0.479
   87    476       3    0.450 0.01308        0.425        0.476
   88    472       1    0.449 0.01309        0.424        0.475
   89    468       1    0.448 0.01309        0.423        0.474
   90    463       3    0.445 0.01311        0.420        0.471
   91    458       1    0.444 0.01312        0.419        0.470
   92    450       2    0.442 0.01314        0.417        0.468
   93    442       3    0.439 0.01316        0.414        0.466
   94    435       1    0.438 0.01317        0.413        0.465
   95    428       5    0.433 0.01321        0.408        0.460
   96    419       5    0.428 0.01326        0.402        0.454
   97    411       4    0.424 0.01329        0.398        0.450
   98    406       4    0.419 0.01332        0.394        0.446
   99    398       2    0.417 0.01334        0.392        0.444
  100    394       1    0.416 0.01335        0.391        0.443
  102    386       4    0.412 0.01338        0.386        0.439
  103    377       1    0.411 0.01339        0.385        0.438
  105    373       1    0.410 0.01340        0.384        0.437
  106    367       1    0.409 0.01341        0.383        0.436
  107    363       1    0.407 0.01342        0.382        0.435
  108    358       2    0.405 0.01344        0.380        0.432
  110    350       2    0.403 0.01346        0.377        0.430
  111    344       2    0.400 0.01349        0.375        0.428
  112    339       1    0.399 0.01350        0.374        0.427
  113    336       3    0.396 0.01353        0.370        0.423
  114    331       2    0.393 0.01356        0.368        0.421
  115    324       5    0.387 0.01362        0.361        0.415
  116    318       1    0.386 0.01363        0.360        0.414
  117    317       2    0.384 0.01365        0.358        0.411
  118    310       3    0.380 0.01369        0.354        0.408
  119    303       3    0.376 0.01372        0.350        0.404
  120    296       2    0.374 0.01375        0.348        0.402
  123    285       2    0.371 0.01377        0.345        0.399
  124    278       2    0.368 0.01380        0.342        0.396
  125    276       1    0.367 0.01382        0.341        0.395
  127    266       1    0.366 0.01383        0.339        0.394
  129    262       2    0.363 0.01387        0.337        0.391
  130    259       2    0.360 0.01390        0.334        0.388
  133    249       2    0.357 0.01394        0.331        0.386
  135    240       1    0.356 0.01396        0.329        0.384
  136    236       1    0.354 0.01398        0.328        0.383
  137    230       2    0.351 0.01403        0.325        0.380
  139    224       1    0.349 0.01406        0.323        0.378
  140    219       1    0.348 0.01408        0.321        0.377
  141    215       3    0.343 0.01416        0.316        0.372
  142    205       1    0.341 0.01419        0.315        0.370
  143    199       1    0.340 0.01422        0.313        0.369
  145    192       2    0.336 0.01429        0.309        0.365
  146    188       1    0.334 0.01433        0.307        0.364
  154    170       1    0.332 0.01438        0.305        0.362
  155    166       1    0.330 0.01443        0.303        0.360
  156    165       2    0.326 0.01453        0.299        0.356
  160    151       1    0.324 0.01459        0.297        0.354
  163    129       1    0.322 0.01470        0.294        0.352
  169    115       2    0.316 0.01496        0.288        0.347
  173    106       1    0.313 0.01512        0.285        0.344
  177     92       2    0.306 0.01553        0.277        0.338
  182     77       1    0.302 0.01583        0.273        0.335
  184     71       1    0.298 0.01617        0.268        0.332
  187     67       1    0.294 0.01653        0.263        0.328
  189     64       1    0.289 0.01690        0.258        0.324
  191     59       1    0.284 0.01731        0.252        0.320
  192     57       1    0.279 0.01771        0.247        0.316
  194     56       1    0.274 0.01808        0.241        0.312
  201     45       2    0.262 0.01922        0.227        0.302
  207     40       1    0.255 0.01982        0.219        0.297
  209     38       1    0.249 0.02041        0.212        0.292
  211     35       1    0.242 0.02103        0.204        0.287
  213     33       1    0.234 0.02163        0.195        0.281
  215     30       2    0.219 0.02283        0.178        0.268
  238     13       1    0.202 0.02656        0.156        0.261
  253     12       1    0.185 0.02919        0.136        0.252
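
The survival column is a running product: at each event time, the previous estimate is multiplied by the fraction of at-risk patients who did not die, 1 - n.event / n.risk. A small sketch that reproduces the first five rows of the table above from its n.risk and n.event columns (the vector names are illustrative):

```r
# n.risk and n.event for times 1 to 5, copied from the summary(KM) table
n_risk  <- c(1824, 1772, 1725, 1687, 1649)
n_event <- c(43, 42, 26, 25, 18)

# Kaplan-Meier product-limit estimate: cumulative product of the
# conditional probabilities of surviving each event time
km_by_hand <- cumprod(1 - n_event / n_risk)
round(km_by_hand, 3)
```

The rounded values can be compared against the survival column of the table for times 1 to 5.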

Plot the survival probability

The KM estimates are survival probabilities over time, so plotting them shows how survival changes during follow-up. The plot shows

  1. survival probability on the \(y\)-axis
  2. time on the \(x\)-axis

Plot the survival probability

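library(ggsurvfit)
library(survminer)
# Overall curve: there is a single group, so there is nothing to compare
# with a p-value and only one line type is needed
ggsurvplot(KM, 
           data = cancer, 
           risk.table = TRUE, 
           linetype = 1,
           tables.height = 0.3)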

Add more aesthetics to the graph

library(ggsurvfit)
library(survminer)
library(paletteer)  # provides paletteer_d()
# Overall curve again, with the median survival time marked.
# No p-value is requested because there is only one curve.
ggsurvplot(KM, 
           data = cancer, 
           palette = paletteer_d("ggsci::light_blue_material")[seq(2,10,2)],
           surv.median.line = "hv",
           risk.table = "abs_pct",
           size = 1.2, 
           conf.int = FALSE, 
           legend.title = "",
           ggtheme = theme_minimal() + 
             theme(plot.title = element_text(face = "bold")),
           title = "Survival probability",
           xlab = "Time",
           ylab = "Survival probability",
           legend = "top", 
           censor = FALSE)
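
The `surv.median.line = "hv"` option marks the median survival time, that is, the first time at which the estimated survival probability drops to 0.5 or below. A minimal sketch of that rule, using a few rows copied from the summary(KM) table (the variable names are illustrative; printing the survfit object with print(KM) reports the median directly):

```r
# Rows for times 60 to 65 from the overall Kaplan-Meier table
time <- c(60, 61, 62, 63, 64, 65)
surv <- c(0.517, 0.512, 0.510, 0.502, 0.498, 0.495)

# Median survival: earliest time with estimated survival <= 0.5
median_time <- min(time[surv <= 0.5])
median_time
```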

Kaplan-Meier survival estimates

Does survival time vary between those who had surgery and those who did not?

Next, we estimate the survival probabilities by surgery status:

KM_surgery <- survfit(Surv(time = survivaltime, 
                             event = Survival == "Died" ) ~ surgery, 
                        data = cancer)
summary(KM_surgery)
Call: survfit(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ surgery, data = cancer)

                surgery=No 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1    215      13    0.940  0.0163       0.9082        0.972
    2    202      12    0.884  0.0219       0.8419        0.928
    3    190       7    0.851  0.0243       0.8049        0.900
    4    183      10    0.805  0.0270       0.7534        0.859
    5    171       5    0.781  0.0282       0.7277        0.838
    6    166       6    0.753  0.0295       0.6973        0.813
    7    160       7    0.720  0.0307       0.6622        0.783
    8    151       2    0.710  0.0310       0.6522        0.774
    9    149       5    0.687  0.0318       0.6271        0.752
   10    144       5    0.663  0.0324       0.6022        0.729
   11    138       5    0.639  0.0330       0.5773        0.707
   12    132       6    0.610  0.0335       0.5474        0.679
   13    126       1    0.605  0.0336       0.5425        0.674
   14    124       4    0.585  0.0339       0.5225        0.656
   15    120       3    0.571  0.0341       0.5077        0.642
   16    117       5    0.546  0.0343       0.4830        0.618
   17    111       3    0.532  0.0344       0.4681        0.604
   18    108       1    0.527  0.0345       0.4632        0.599
   19    107       3    0.512  0.0345       0.4484        0.584
   20    104       1    0.507  0.0346       0.4435        0.579
   22    101       1    0.502  0.0346       0.4385        0.575
   23     99       2    0.492  0.0346       0.4284        0.565
   24     97       2    0.482  0.0346       0.4183        0.555
   26     93       4    0.461  0.0347       0.3978        0.534
   27     89       1    0.456  0.0347       0.3926        0.529
   28     87       3    0.440  0.0346       0.3771        0.513
   29     83       1    0.435  0.0346       0.3719        0.508
   30     81       2    0.424  0.0346       0.3613        0.498
   31     79       1    0.419  0.0346       0.3561        0.492
   35     77       1    0.413  0.0345       0.3508        0.487
   36     76       1    0.408  0.0345       0.3454        0.481
   38     69       1    0.402  0.0345       0.3396        0.476
   39     68       1    0.396  0.0345       0.3338        0.470
   41     66       2    0.384  0.0345       0.3220        0.458
   43     64       2    0.372  0.0344       0.3102        0.446
   49     62       1    0.366  0.0344       0.3044        0.440
   51     61       1    0.360  0.0344       0.2985        0.434
   52     60       1    0.354  0.0343       0.2927        0.428
   54     59       2    0.342  0.0342       0.2811        0.416
   56     57       1    0.336  0.0341       0.2753        0.410
   63     55       1    0.330  0.0340       0.2695        0.404
   68     53       1    0.324  0.0339       0.2635        0.397
   76     50       1    0.317  0.0339       0.2572        0.391
   78     48       1    0.311  0.0338       0.2509        0.384
   84     44       1    0.303  0.0338       0.2440        0.377
   87     42       1    0.296  0.0337       0.2370        0.370
   88     41       1    0.289  0.0337       0.2300        0.363
   94     38       1    0.281  0.0336       0.2227        0.356
   95     36       1    0.274  0.0336       0.2151        0.348
  105     33       1    0.265  0.0336       0.2070        0.340
  120     28       1    0.256  0.0337       0.1976        0.331
  123     26       1    0.246  0.0338       0.1879        0.322
  140     23       1    0.235  0.0340       0.1773        0.312
  142     20       1    0.224  0.0343       0.1655        0.302
  145     19       1    0.212  0.0344       0.1540        0.291
  169     14       1    0.197  0.0351       0.1386        0.279
  177     11       1    0.179  0.0362       0.1202        0.266
  184      9       1    0.159  0.0372       0.1004        0.252
  215      5       1    0.127  0.0412       0.0674        0.240

                surgery=Yes 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1   1609      30    0.981 0.00337        0.975        0.988
    2   1570      30    0.963 0.00474        0.953        0.972
    3   1535      19    0.951 0.00541        0.940        0.961
    4   1504      15    0.941 0.00588        0.930        0.953
    5   1478      13    0.933 0.00626        0.921        0.945
    6   1452      24    0.918 0.00691        0.904        0.931
    7   1420      17    0.907 0.00732        0.892        0.921
    8   1397      18    0.895 0.00773        0.880        0.910
    9   1366      23    0.880 0.00821        0.864        0.896
   10   1336      19    0.867 0.00858        0.851        0.884
   11   1306      20    0.854 0.00895        0.837        0.872
   12   1278      22    0.839 0.00933        0.821        0.858
   13   1249      25    0.822 0.00973        0.804        0.842
   14   1217      11    0.815 0.00989        0.796        0.835
   15   1197      22    0.800 0.01021        0.780        0.820
   16   1174      15    0.790 0.01042        0.770        0.811
   17   1154      18    0.778 0.01065        0.757        0.799
   18   1128      15    0.767 0.01084        0.746        0.789
   19   1100      12    0.759 0.01099        0.738        0.781
   20   1086      12    0.750 0.01113        0.729        0.773
   21   1072      15    0.740 0.01130        0.718        0.762
   22   1048      13    0.731 0.01144        0.709        0.754
   23   1023      11    0.723 0.01156        0.701        0.746
   24   1007      12    0.714 0.01169        0.692        0.738
   25    990      11    0.706 0.01180        0.684        0.730
   26    976      11    0.698 0.01191        0.675        0.722
   27    960       6    0.694 0.01197        0.671        0.718
   28    947      11    0.686 0.01207        0.663        0.710
   29    930       9    0.679 0.01216        0.656        0.704
   30    915      10    0.672 0.01225        0.648        0.696
   31    901       8    0.666 0.01232        0.642        0.691
   32    889       7    0.661 0.01238        0.637        0.685
   33    873       5    0.657 0.01243        0.633        0.682
   34    866       3    0.655 0.01245        0.631        0.679
   35    859       5    0.651 0.01250        0.627        0.676
   36    852       4    0.648 0.01253        0.624        0.673
   37    844       6    0.643 0.01258        0.619        0.668
   38    835       9    0.636 0.01266        0.612        0.662
   39    821       5    0.632 0.01270        0.608        0.658
   40    811       8    0.626 0.01276        0.602        0.652
   41    795       6    0.621 0.01281        0.597        0.647
   42    784       9    0.614 0.01288        0.590        0.640
   43    770       5    0.610 0.01292        0.585        0.636
   44    759       3    0.608 0.01295        0.583        0.634
   45    752       4    0.605 0.01298        0.580        0.631
   46    745       5    0.601 0.01302        0.576        0.627
   47    736       6    0.596 0.01306        0.571        0.622
   48    724       5    0.592 0.01310        0.566        0.618
   49    715       6    0.587 0.01315        0.561        0.613
   50    706      11    0.577 0.01323        0.552        0.604
   51    687       5    0.573 0.01327        0.548        0.600
   52    679       6    0.568 0.01331        0.543        0.595
   53    670       7    0.562 0.01336        0.537        0.589
   54    656       3    0.560 0.01338        0.534        0.587
   56    644       2    0.558 0.01339        0.532        0.585
   57    642       2    0.556 0.01341        0.531        0.583
   58    636       5    0.552 0.01344        0.526        0.579
   59    629       4    0.548 0.01347        0.523        0.575
   60    622       7    0.542 0.01352        0.516        0.569
   61    609       7    0.536 0.01357        0.510        0.563
   62    597       2    0.534 0.01358        0.508        0.561
   63    591      10    0.525 0.01365        0.499        0.553
   64    579       4    0.521 0.01368        0.495        0.549
   65    570       4    0.518 0.01370        0.492        0.545
   66    560       6    0.512 0.01374        0.486        0.540
   67    551       1    0.511 0.01375        0.485        0.539
   68    548       4    0.508 0.01377        0.481        0.535
   69    541       4    0.504 0.01380        0.477        0.532
   70    533       5    0.499 0.01383        0.473        0.527
   71    522       1    0.498 0.01384        0.472        0.526
   73    515       2    0.496 0.01385        0.470        0.524
   74    509       2    0.494 0.01386        0.468        0.522
   75    503       1    0.493 0.01387        0.467        0.521
   76    499       2    0.491 0.01389        0.465        0.519
   77    495       2    0.489 0.01390        0.463        0.517
   78    491       3    0.486 0.01392        0.460        0.514
   79    484       2    0.484 0.01394        0.458        0.512
   80    476       1    0.483 0.01395        0.457        0.511
   81    468       2    0.481 0.01396        0.455        0.509
   82    461       4    0.477 0.01400        0.450        0.505
   83    456       3    0.474 0.01402        0.447        0.502
   86    440       1    0.473 0.01403        0.446        0.501
   87    434       2    0.471 0.01405        0.444        0.499
   89    428       1    0.470 0.01406        0.443        0.498
   90    424       3    0.466 0.01409        0.439        0.495
   91    419       1    0.465 0.01410        0.438        0.494
   92    411       2    0.463 0.01412        0.436        0.491
   93    404       3    0.459 0.01416        0.433        0.488
   95    392       4    0.455 0.01421        0.428        0.483
   96    384       5    0.449 0.01427        0.422        0.478
   97    377       4    0.444 0.01431        0.417        0.473
   98    373       4    0.439 0.01435        0.412        0.468
   99    365       2    0.437 0.01438        0.410        0.466
  100    361       1    0.436 0.01439        0.408        0.465
  102    353       4    0.431 0.01443        0.403        0.460
  103    344       1    0.429 0.01445        0.402        0.459
  106    336       1    0.428 0.01446        0.401        0.458
  107    332       1    0.427 0.01447        0.399        0.456
  108    327       2    0.424 0.01450        0.397        0.454
  110    320       2    0.422 0.01453        0.394        0.451
  111    314       2    0.419 0.01456        0.391        0.449
  112    309       1    0.418 0.01458        0.390        0.447
  113    306       3    0.414 0.01463        0.386        0.443
  114    301       2    0.411 0.01466        0.383        0.441
  115    294       5    0.404 0.01474        0.376        0.434
  116    288       1    0.402 0.01475        0.374        0.432
  117    287       2    0.400 0.01478        0.372        0.430
  118    282       3    0.395 0.01483        0.367        0.425
  119    275       3    0.391 0.01487        0.363        0.421
  120    268       1    0.390 0.01489        0.361        0.420
  123    259       1    0.388 0.01491        0.360        0.418
  124    253       2    0.385 0.01495        0.357        0.415
  125    251       1    0.383 0.01497        0.355        0.414
  127    241       1    0.382 0.01499        0.354        0.412
  129    237       2    0.379 0.01503        0.350        0.409
  130    234       2    0.375 0.01508        0.347        0.406
  133    225       2    0.372 0.01513        0.344        0.403
  135    216       1    0.370 0.01516        0.342        0.401
  136    212       1    0.369 0.01519        0.340        0.400
  137    207       2    0.365 0.01525        0.336        0.396
  139    201       1    0.363 0.01528        0.334        0.394
  141    194       3    0.358 0.01538        0.329        0.389
  143    180       1    0.356 0.01542        0.327        0.387
  145    173       1    0.354 0.01547        0.325        0.385
  146    170       1    0.351 0.01552        0.322        0.383
  154    152       1    0.349 0.01559        0.320        0.381
  155    148       1    0.347 0.01566        0.317        0.379
  156    147       2    0.342 0.01580        0.312        0.375
  160    135       1    0.340 0.01588        0.310        0.372
  163    113       1    0.337 0.01603        0.307        0.369
  169    101       1    0.333 0.01621        0.303        0.367
  173     94       1    0.330 0.01642        0.299        0.363
  177     81       1    0.326 0.01671        0.294        0.360
  182     68       1    0.321 0.01714        0.289        0.356
  187     60       1    0.315 0.01767        0.283        0.352
  189     57       1    0.310 0.01821        0.276        0.348
  191     52       1    0.304 0.01881        0.269        0.343
  192     50       1    0.298 0.01939        0.262        0.338
  194     49       1    0.292 0.01992        0.255        0.334
  201     39       2    0.277 0.02153        0.238        0.322
  207     34       1    0.269 0.02238        0.228        0.316
  209     33       1    0.261 0.02314        0.219        0.310
  211     30       1    0.252 0.02394        0.209        0.303
  213     28       1    0.243 0.02472        0.199        0.297
  215     25       1    0.233 0.02557        0.188        0.289
  238     12       1    0.214 0.02992        0.162        0.281
  253     11       1    0.194 0.03291        0.139        0.271
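
Printing every event time gives a very long table. If only a few time points are of interest, summary() on a survfit object accepts a times argument. The sketch below uses simulated data, since the workshop data are not reproduced here: the data frame `toy` and its values are invented, and only the column names mirror `cancer`.

```r
library(survival)

# Simulated stand-in for the workshop data (values are made up)
set.seed(2025)
toy <- data.frame(
  survivaltime = ceiling(c(rexp(100, rate = 0.05), rexp(100, rate = 0.02))),
  surgery      = factor(rep(c("No", "Yes"), each = 100)),
  Survival     = sample(c("Died", "Alive"), 200, replace = TRUE,
                        prob = c(0.8, 0.2))
)

km_toy <- survfit(Surv(time = survivaltime, event = Survival == "Died") ~ surgery,
                  data = toy)

# Survival estimates at 12, 24 and 36 time units only, per group;
# extend = TRUE keeps a row even if a group's follow-up ended earlier
km_at_times <- summary(km_toy, times = c(12, 24, 36), extend = TRUE)
km_at_times
```

On the real data the equivalent call would be summary(KM_surgery, times = c(12, 24, 36)), with the time points chosen to suit the study.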

Plot the survival probability

With two groups, the plot draws one KM curve per surgery status, which makes the groups easy to compare. As before, the plot shows

  1. survival probability on the \(y\)-axis
  2. time on the \(x\)-axis

Plot the survival probability

library(ggsurvfit)
library(survminer)
ggsurvplot(KM_surgery, 
           data = cancer, 
           risk.table = TRUE, 
           linetype = c(1,4),
           tables.height = 0.3,
           pval = TRUE)
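
By default, the p-value that pval = TRUE adds to the plot comes from a log-rank test comparing the curves. The same test can be run directly with survdiff() from the survival package. The sketch below uses simulated data (the `toy` data frame is invented for illustration; only its column names mirror `cancer`).

```r
library(survival)

# Simulated stand-in for the workshop data (values are made up)
set.seed(2025)
toy <- data.frame(
  survivaltime = ceiling(c(rexp(100, rate = 0.05), rexp(100, rate = 0.02))),
  surgery      = factor(rep(c("No", "Yes"), each = 100)),
  Survival     = sample(c("Died", "Alive"), 200, replace = TRUE,
                        prob = c(0.8, 0.2))
)

# Log-rank test of equal survival curves across surgery groups
lr <- survdiff(Surv(time = survivaltime, event = Survival == "Died") ~ surgery,
               data = toy)
lr

# p-value from the chi-squared statistic (df = number of groups - 1)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

On the workshop data the call would be survdiff(Surv(time = survivaltime, event = Survival == "Died") ~ surgery, data = cancer).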

Add more aesthetics to the graph

library(ggsurvfit)
library(survminer)
ggsurvplot(KM_surgery, 
           data = cancer, 
           palette = c("hotpink", "darkblue"),
           surv.median.line = "hv",
           pval = TRUE,
           risk.table = "abs_pct",
           size = 1.2, 
           conf.int = FALSE, 
           legend.labs = levels(cancer$surgery),
           legend.title = "",
           ggtheme = theme_minimal() + 
             theme(plot.title = element_text(face = "bold")),
           title = "Survival Probability",
           xlab = "Time",
           ylab = "Survival Probability",
           legend = "top", 
           censor = FALSE)

Kaplan-Meier survival estimates

Does survival differ by sex?

We can compute the Kaplan-Meier estimates by sex as well:

KM_dm <- survfit(Surv(time = survivaltime, 
                      event = Survival == "Died" ) ~ sex,
                 data = cancer)
summary(KM_dm)
Call: survfit(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ sex, data = cancer)

                sex=Female 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1    801      14    0.983 0.00463       0.9735        0.992
    2    780      20    0.957 0.00716       0.9434        0.971
    3    758      12    0.942 0.00828       0.9261        0.959
    4    739      15    0.923 0.00947       0.9047        0.942
    5    718       8    0.913 0.01004       0.8933        0.933
    6    706      12    0.897 0.01082       0.8763        0.919
    7    690       9    0.886 0.01136       0.8636        0.908
    8    679       4    0.880 0.01159       0.8579        0.903
    9    670      11    0.866 0.01219       0.8423        0.890
   10    657       8    0.855 0.01260       0.8310        0.880
   11    644      10    0.842 0.01308       0.8168        0.868
   12    628      13    0.825 0.01368       0.7982        0.852
   13    614      10    0.811 0.01410       0.7840        0.839
   14    602       4    0.806 0.01426       0.7783        0.834
   15    596       9    0.794 0.01461       0.7655        0.823
   16    587       4    0.788 0.01476       0.7598        0.818
   17    582       9    0.776 0.01508       0.7470        0.806
   18    568       6    0.768 0.01529       0.7385        0.798
   19    555       7    0.758 0.01553       0.7283        0.789
   20    547       5    0.751 0.01569       0.7211        0.783
   21    540       8    0.740 0.01594       0.7095        0.772
   22    528       7    0.730 0.01616       0.6993        0.763
   23    515       7    0.720 0.01637       0.6890        0.753
   24    505       4    0.715 0.01649       0.6831        0.748
   25    499       5    0.707 0.01663       0.6756        0.741
   26    490      10    0.693 0.01690       0.6607        0.727
   27    476       3    0.689 0.01698       0.6562        0.723
   28    467       6    0.680 0.01715       0.6470        0.714
   29    459       2    0.677 0.01720       0.6440        0.711
   30    453       8    0.665 0.01741       0.6317        0.700
   31    444       5    0.657 0.01753       0.6240        0.693
   32    439       3    0.653 0.01760       0.6193        0.688
   33    434       1    0.651 0.01762       0.6178        0.687
   34    431       2    0.648 0.01767       0.6147        0.684
   35    427       5    0.641 0.01779       0.6069        0.677
   36    422       3    0.636 0.01786       0.6022        0.672
   37    417       2    0.633 0.01790       0.5991        0.669
   38    412       8    0.621 0.01807       0.5865        0.657
   39    400       2    0.618 0.01811       0.5833        0.654
   40    395       4    0.612 0.01820       0.5769        0.648
   41    386       5    0.604 0.01830       0.5688        0.641
   42    379       3    0.599 0.01837       0.5639        0.636
   43    374       3    0.594 0.01843       0.5590        0.631
   44    368       3    0.589 0.01849       0.5541        0.627
   45    363       2    0.586 0.01853       0.5507        0.623
   46    358       1    0.584 0.01855       0.5491        0.622
   47    353       4    0.578 0.01863       0.5423        0.615
   48    347       1    0.576 0.01865       0.5406        0.614
   49    344       1    0.574 0.01867       0.5389        0.612
   50    343       3    0.569 0.01873       0.5338        0.607
   51    336       3    0.564 0.01879       0.5286        0.602
   52    332       3    0.559 0.01885       0.5234        0.597
   53    327       4    0.552 0.01893       0.5164        0.591
   54    320       3    0.547 0.01899       0.5112        0.586
   56    312       2    0.544 0.01903       0.5076        0.582
   57    310       1    0.542 0.01905       0.5058        0.581
   58    306       4    0.535 0.01912       0.4986        0.574
   59    300       3    0.529 0.01918       0.4932        0.568
   60    297       4    0.522 0.01925       0.4859        0.561
   61    291       3    0.517 0.01930       0.4805        0.556
   63    284       4    0.510 0.01937       0.4731        0.549
   64    279       4    0.502 0.01943       0.4657        0.542
   65    272       2    0.499 0.01947       0.4619        0.538
   66    267       3    0.493 0.01951       0.4562        0.533
   68    263       1    0.491 0.01953       0.4544        0.531
   69    260       1    0.489 0.01955       0.4524        0.529
   70    259       1    0.487 0.01956       0.4505        0.527
   73    254       1    0.485 0.01958       0.4486        0.525
   74    253       1    0.484 0.01959       0.4466        0.524
   75    249       1    0.482 0.01961       0.4447        0.522
   76    248       1    0.480 0.01963       0.4427        0.520
   78    246       3    0.474 0.01968       0.4368        0.514
   80    237       1    0.472 0.01970       0.4348        0.512
   82    231       2    0.468 0.01974       0.4306        0.508
   83    228       3    0.462 0.01979       0.4244        0.502
   84    222       1    0.460 0.01981       0.4223        0.500
   88    217       1    0.457 0.01984       0.4201        0.498
   90    211       1    0.455 0.01986       0.4179        0.496
   92    207       1    0.453 0.01988       0.4157        0.494
   93    203       3    0.446 0.01996       0.4089        0.487
   94    199       1    0.444 0.01999       0.4066        0.485
   95    196       3    0.437 0.02006       0.3997        0.478
   97    190       1    0.435 0.02009       0.3973        0.476
   98    189       1    0.433 0.02012       0.3950        0.474
   99    188       1    0.430 0.02014       0.3927        0.472
  102    184       3    0.423 0.02021       0.3855        0.465
  103    180       1    0.421 0.02024       0.3832        0.463
  106    173       1    0.419 0.02027       0.3807        0.460
  107    171       1    0.416 0.02030       0.3782        0.458
  110    166       1    0.414 0.02033       0.3756        0.455
  111    163       1    0.411 0.02036       0.3731        0.453
  112    161       1    0.409 0.02039       0.3705        0.451
  114    158       2    0.403 0.02046       0.3652        0.446
  115    153       2    0.398 0.02053       0.3598        0.440
  116    150       1    0.395 0.02056       0.3571        0.438
  119    145       1    0.393 0.02060       0.3543        0.435
  123    138       2    0.387 0.02069       0.3485        0.430
  130    124       2    0.381 0.02082       0.3421        0.424
  133    119       1    0.378 0.02089       0.3388        0.421
  137    108       1    0.374 0.02099       0.3351        0.418
  139    106       1    0.371 0.02109       0.3314        0.414
  141    101       3    0.360 0.02140       0.3200        0.404
  155     85       1    0.355 0.02156       0.3155        0.400
  156     84       1    0.351 0.02171       0.3110        0.396
  163     62       1    0.345 0.02209       0.3047        0.392
  169     54       1    0.339 0.02259       0.2975        0.386
  177     45       1    0.331 0.02331       0.2888        0.380
  182     37       1    0.323 0.02434       0.2782        0.374
  184     32       1    0.312 0.02558       0.2661        0.367
  187     29       1    0.302 0.02687       0.2534        0.359
  192     25       1    0.290 0.02838       0.2390        0.351
  194     24       1    0.278 0.02965       0.2251        0.342
  201     18       1    0.262 0.03176       0.2067        0.332
  207     17       1    0.247 0.03343       0.1892        0.322
  209     15       1    0.230 0.03501       0.1709        0.310
  215     13       2    0.195 0.03753       0.1336        0.284
  238      6       1    0.162 0.04309       0.0965        0.273

                sex=Male 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1   1023      29    0.972 0.00519        0.962        0.982
    2    992      22    0.950 0.00681        0.937        0.964
    3    967      14    0.936 0.00764        0.921        0.951
    4    948      10    0.926 0.00817        0.911        0.943
    5    931      10    0.917 0.00867        0.900        0.934
    6    912      18    0.898 0.00949        0.880        0.917
    7    890      15    0.883 0.01010        0.864        0.903
    8    869      16    0.867 0.01070        0.846        0.888
    9    845      17    0.850 0.01129        0.828        0.872
   10    823      16    0.833 0.01180        0.810        0.857
   11    800      15    0.817 0.01225        0.794        0.842
   12    782      15    0.802 0.01267        0.777        0.827
   13    761      16    0.785 0.01308        0.760        0.811
   14    739      11    0.773 0.01336        0.747        0.800
   15    721      16    0.756 0.01373        0.730        0.783
   16    704      16    0.739 0.01408        0.712        0.767
   17    683      12    0.726 0.01432        0.698        0.755
   18    668      10    0.715 0.01451        0.687        0.744
   19    652       8    0.706 0.01466        0.678        0.736
   20    643       8    0.697 0.01480        0.669        0.727
   21    633       7    0.690 0.01492        0.661        0.720
   22    621       7    0.682 0.01504        0.653        0.712
   23    607       6    0.675 0.01514        0.646        0.706
   24    599      10    0.664 0.01530        0.635        0.695
   25    586       6    0.657 0.01540        0.628        0.688
   26    579       5    0.651 0.01547        0.622        0.683
   27    573       4    0.647 0.01553        0.617        0.678
   28    567       8    0.638 0.01564        0.608        0.669
   29    554       8    0.629 0.01575        0.598        0.660
   30    543       4    0.624 0.01581        0.594        0.656
   31    536       4    0.619 0.01586        0.589        0.651
   32    528       4    0.615 0.01591        0.584        0.647
   33    517       4    0.610 0.01596        0.579        0.642
   34    513       1    0.609 0.01598        0.578        0.641
   35    509       1    0.607 0.01599        0.577        0.640
   36    506       2    0.605 0.01602        0.574        0.637
   37    500       4    0.600 0.01607        0.570        0.633
   38    492       2    0.598 0.01610        0.567        0.630
   39    489       4    0.593 0.01615        0.562        0.625
   40    482       4    0.588 0.01620        0.557        0.621
   41    475       3    0.584 0.01624        0.553        0.617
   42    469       6    0.577 0.01632        0.546        0.610
   43    460       4    0.572 0.01637        0.541        0.605
   45    451       2    0.569 0.01639        0.538        0.602
   46    449       4    0.564 0.01644        0.533        0.597
   47    445       2    0.562 0.01647        0.530        0.595
   48    439       4    0.557 0.01651        0.525        0.590
   49    433       6    0.549 0.01658        0.517        0.582
   50    424       8    0.538 0.01667        0.507        0.572
   51    412       3    0.535 0.01670        0.503        0.568
   52    407       4    0.529 0.01674        0.497        0.563
   53    402       3    0.525 0.01677        0.493        0.559
   54    395       2    0.523 0.01679        0.491        0.557
   56    389       1    0.521 0.01680        0.489        0.555
   57    388       1    0.520 0.01681        0.488        0.554
   58    386       1    0.519 0.01682        0.487        0.553
   59    385       1    0.517 0.01683        0.485        0.551
   60    381       3    0.513 0.01686        0.481        0.547
   61    374       4    0.508 0.01690        0.476        0.542
   62    367       2    0.505 0.01693        0.473        0.539
   63    362       7    0.495 0.01700        0.463        0.530
   65    352       2    0.492 0.01702        0.460        0.527
   66    346       3    0.488 0.01705        0.456        0.523
   67    340       1    0.487 0.01706        0.454        0.521
   68    338       4    0.481 0.01710        0.449        0.516
   69    333       3    0.477 0.01712        0.444        0.511
   70    326       4    0.471 0.01716        0.438        0.506
   71    318       1    0.469 0.01717        0.437        0.504
   73    312       1    0.468 0.01718        0.435        0.503
   74    306       1    0.466 0.01719        0.434        0.501
   76    301       2    0.463 0.01722        0.431        0.498
   77    296       2    0.460 0.01724        0.427        0.495
   78    293       1    0.458 0.01726        0.426        0.494
   79    291       2    0.455 0.01728        0.423        0.490
   81    281       2    0.452 0.01731        0.419        0.487
   82    275       2    0.449 0.01734        0.416        0.484
   86    265       1    0.447 0.01736        0.414        0.482
   87    259       3    0.442 0.01741        0.409        0.477
   89    254       1    0.440 0.01743        0.407        0.476
   90    252       2    0.437 0.01746        0.404        0.472
   91    249       1    0.435 0.01748        0.402        0.471
   92    243       1    0.433 0.01750        0.400        0.469
   95    232       2    0.429 0.01755        0.396        0.465
   96    228       5    0.420 0.01766        0.387        0.456
   97    221       3    0.414 0.01773        0.381        0.451
   98    217       3    0.409 0.01779        0.375        0.445
   99    210       1    0.407 0.01781        0.373        0.443
  100    208       1    0.405 0.01783        0.371        0.441
  102    202       1    0.403 0.01785        0.369        0.439
  105    197       1    0.401 0.01788        0.367        0.437
  108    190       2    0.396 0.01794        0.363        0.433
  110    184       1    0.394 0.01797        0.361        0.431
  111    181       1    0.392 0.01800        0.358        0.429
  113    177       3    0.385 0.01810        0.352        0.423
  115    171       3    0.379 0.01820        0.345        0.416
  117    168       2    0.374 0.01826        0.340        0.412
  118    164       3    0.367 0.01835        0.333        0.405
  119    158       2    0.363 0.01841        0.328        0.401
  120    154       2    0.358 0.01847        0.323        0.396
  124    145       2    0.353 0.01854        0.318        0.391
  125    143       1    0.351 0.01857        0.316        0.389
  127    139       1    0.348 0.01861        0.313        0.386
  129    137       2    0.343 0.01868        0.308        0.382
  133    130       1    0.340 0.01872        0.305        0.379
  135    128       1    0.338 0.01877        0.303        0.376
  136    125       1    0.335 0.01881        0.300        0.374
  137    122       1    0.332 0.01885        0.297        0.371
  140    117       1    0.329 0.01890        0.294        0.369
  142    108       1    0.326 0.01897        0.291        0.366
  143    104       1    0.323 0.01905        0.288        0.363
  145    101       2    0.317 0.01920        0.281        0.357
  146     97       1    0.313 0.01928        0.278        0.354
  154     84       1    0.310 0.01941        0.274        0.350
  156     81       1    0.306 0.01954        0.270        0.347
  160     74       1    0.302 0.01971        0.266        0.343
  169     61       1    0.297 0.02000        0.260        0.339
  173     54       1    0.291 0.02037        0.254        0.334
  177     47       1    0.285 0.02086        0.247        0.329
  189     36       1    0.277 0.02173        0.238        0.323
  191     33       1    0.269 0.02264        0.228        0.317
  201     27       1    0.259 0.02389        0.216        0.310
  211     21       1    0.247 0.02574        0.201        0.303
  213     19       1    0.234 0.02746        0.186        0.294
  253      7       1    0.200 0.03884        0.137        0.293
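
The full summary prints one row per event time, which is hard to read on a slide. A shorter report can be requested with the `times` argument of `summary()`. The sketch below is only an illustration: it uses the `lung` data shipped with the survival package as a stand-in for `cancer` (in `lung`, `status == 2` marks a death), and the time points 90, 180 and 365 are arbitrary choices.

```r
library(survival)

# Stand-in data: survival::lung, NOT the `cancer` data used in these slides
fit <- survfit(Surv(time, status == 2) ~ sex, data = lung)

# Ask for survival estimates at a few chosen follow-up times only
s <- summary(fit, times = c(90, 180, 365))

# Collect the pieces into one compact table (one row per group and time)
compact <- data.frame(
  group    = s$strata,
  time     = s$time,
  n.risk   = s$n.risk,
  survival = round(s$surv, 3),
  lower    = round(s$lower, 3),
  upper    = round(s$upper, 3)
)
compact
```

The same call should work on the fitted object from these slides by swapping in that object and time points that suit the follow-up period.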

Plot the survival probability


We can then plot the survival estimates for male and female patients:

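# Kaplan-Meier curves with a number-at-risk table and a log-rank p-value
ggsurvplot(KM_dm, 
           data = cancer, 
           risk.table = TRUE,      # show number at risk below the plot
           linetype = c(1,4),      # one line type per group
           tables.height = 0.3,    # share of the figure used by the table
           pval = TRUE)            # print the log-rank test p-value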
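
By default, the p-value that `pval = TRUE` prints on the plot comes from a log-rank test. The same test can be run directly with `survdiff()` from the survival package. The sketch below is a minimal illustration that assumes the package's built-in `lung` data as a stand-in for `cancer` (`status == 2` marks a death there).

```r
library(survival)

# Stand-in data: survival::lung, NOT the `cancer` data used in these slides
lr <- survdiff(Surv(time, status == 2) ~ sex, data = lung)
lr   # observed vs expected events per group and the chi-square statistic

# p-value from the chi-square statistic, df = number of groups - 1
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
p_value
```

A small p-value suggests that the survival curves differ between the groups; it does not say by how much, so it is best read alongside the plot.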

Kaplan-Meier survival estimates


Does survival time vary with race?

We can compute Kaplan-Meier estimates by race too:

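# Kaplan-Meier fit by race; the event is death (Survival == "Died")
# Note: this overwrites the earlier KM_dm object
KM_dm <- survfit(Surv(time = survivaltime, 
                      event = Survival == "Died") ~ race,
                 data = cancer)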
summary(KM_dm)
Call: survfit(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ race, data = cancer)

                race=Black 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1     83       2    0.976  0.0168        0.943        1.000
    2     80       1    0.964  0.0206        0.924        1.000
    3     79       5    0.903  0.0327        0.841        0.969
    5     74       1    0.891  0.0344        0.825        0.961
    6     71       2    0.865  0.0378        0.794        0.943
    8     68       1    0.853  0.0393        0.779        0.933
    9     66       1    0.840  0.0408        0.764        0.924
   10     65       3    0.801  0.0446        0.718        0.893
   11     61       1    0.788  0.0458        0.703        0.883
   12     59       4    0.734  0.0499        0.643        0.839
   13     55       1    0.721  0.0507        0.628        0.828
   14     52       1    0.707  0.0516        0.613        0.816
   15     51       2    0.680  0.0532        0.583        0.792
   17     48       1    0.665  0.0539        0.568        0.780
   20     47       1    0.651  0.0546        0.553        0.767
   21     45       1    0.637  0.0553        0.537        0.755
   22     44       1    0.622  0.0559        0.522        0.742
   24     43       2    0.593  0.0569        0.492        0.716
   27     40       2    0.564  0.0578        0.461        0.689
   28     38       1    0.549  0.0581        0.446        0.675
   41     35       3    0.502  0.0592        0.398        0.632
   42     31       1    0.486  0.0594        0.382        0.617
   44     29       1    0.469  0.0597        0.365        0.602
   47     28       1    0.452  0.0599        0.349        0.586
   50     26       1    0.435  0.0600        0.332        0.570
   97     18       1    0.411  0.0614        0.306        0.550
   98     17       1    0.386  0.0623        0.282        0.530
  102     16       1    0.362  0.0629        0.258        0.509
  119     13       2    0.307  0.0644        0.203        0.463
  177      4       1    0.230  0.0821        0.114        0.463

                race=Others 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1    157       6    0.962  0.0153       0.9323        0.992
    3    149       1    0.955  0.0165       0.9235        0.988
    4    147       2    0.942  0.0187       0.9065        0.980
    6    142       3    0.922  0.0215       0.8812        0.966
    7    139       3    0.903  0.0239       0.8568        0.951
    8    135       1    0.896  0.0247       0.8488        0.946
    9    133       1    0.889  0.0254       0.8407        0.940
   11    131       1    0.882  0.0261       0.8326        0.935
   12    130       4    0.855  0.0286       0.8009        0.913
   13    125       3    0.835  0.0303       0.7774        0.896
   14    120       1    0.828  0.0308       0.7695        0.890
   15    119       1    0.821  0.0313       0.7616        0.884
   18    116       1    0.814  0.0318       0.7536        0.879
   19    112       2    0.799  0.0329       0.7372        0.866
   20    110       1    0.792  0.0334       0.7291        0.860
   21    109       2    0.777  0.0343       0.7129        0.848
   22    106       2    0.763  0.0352       0.6967        0.835
   23    103       1    0.755  0.0356       0.6886        0.828
   24    102       1    0.748  0.0360       0.6805        0.822
   25    101       2    0.733  0.0368       0.6643        0.809
   26     99       1    0.726  0.0372       0.6563        0.802
   29     97       1    0.718  0.0375       0.6482        0.796
   30     95       1    0.711  0.0379       0.6401        0.789
   32     93       1    0.703  0.0383       0.6318        0.782
   33     92       1    0.695  0.0386       0.6236        0.775
   34     90       1    0.688  0.0389       0.6154        0.768
   37     86       1    0.680  0.0393       0.6068        0.761
   39     83       1    0.671  0.0397       0.5980        0.754
   40     79       1    0.663  0.0401       0.5889        0.746
   41     77       1    0.654  0.0405       0.5796        0.739
   43     76       1    0.646  0.0408       0.5704        0.731
   44     74       1    0.637  0.0412       0.5611        0.723
   47     71       1    0.628  0.0416       0.5516        0.715
   50     69       2    0.610  0.0423       0.5322        0.699
   52     66       1    0.601  0.0427       0.5225        0.690
   53     64       1    0.591  0.0430       0.5126        0.682
   56     63       2    0.572  0.0437       0.4929        0.665
   62     61       1    0.563  0.0439       0.4832        0.656
   63     60       2    0.544  0.0444       0.4638        0.639
   66     57       1    0.535  0.0447       0.4539        0.630
   68     55       1    0.525  0.0449       0.4439        0.621
   78     48       1    0.514  0.0453       0.4325        0.611
   79     47       1    0.503  0.0456       0.4212        0.601
   81     44       1    0.492  0.0460       0.4093        0.591
   86     42       1    0.480  0.0464       0.3972        0.580
   87     41       1    0.468  0.0467       0.3851        0.569
   96     39       1    0.456  0.0470       0.3728        0.558
   98     38       1    0.444  0.0473       0.3606        0.547
  102     36       1    0.432  0.0476       0.3481        0.536
  111     33       1    0.419  0.0479       0.3347        0.524
  115     32       1    0.406  0.0481       0.3215        0.512
  119     28       1    0.391  0.0485       0.3068        0.499
  120     26       1    0.376  0.0490       0.2915        0.485
  142     18       1    0.355  0.0505       0.2689        0.469
  143     17       1    0.334  0.0517       0.2470        0.453
  207      5       1    0.268  0.0727       0.1570        0.456
  209      4       1    0.201  0.0796       0.0922        0.436
  215      3       1    0.134  0.0761       0.0438        0.408

                race=White 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1   1584      35    0.978 0.00369        0.971        0.985
    2   1543      41    0.952 0.00538        0.941        0.963
    3   1497      20    0.939 0.00601        0.927        0.951
    4   1466      23    0.924 0.00666        0.912        0.938
    5   1433      17    0.913 0.00709        0.900        0.928
    6   1405      25    0.897 0.00767        0.882        0.912
    7   1373      21    0.884 0.00812        0.868        0.900
    8   1345      18    0.872 0.00848        0.855        0.888
    9   1316      26    0.854 0.00896        0.837        0.872
   10   1283      21    0.840 0.00931        0.822        0.859
   11   1252      23    0.825 0.00968        0.806        0.844
   12   1221      20    0.812 0.00999        0.792        0.831
   13   1195      22    0.797 0.01030        0.777        0.817
   14   1169      13    0.788 0.01047        0.767        0.809
   15   1147      22    0.773 0.01076        0.752        0.794
   16   1125      20    0.759 0.01099        0.738        0.781
   17   1100      20    0.745 0.01122        0.723        0.767
   18   1073      15    0.735 0.01138        0.713        0.757
   19   1048      13    0.726 0.01152        0.703        0.748
   20   1033      11    0.718 0.01163        0.695        0.741
   21   1019      12    0.709 0.01174        0.687        0.733
   22    999      11    0.702 0.01185        0.679        0.725
   23    976      12    0.693 0.01196        0.670        0.717
   24    959      11    0.685 0.01206        0.662        0.709
   25    943       9    0.678 0.01214        0.655        0.703
   26    930      14    0.668 0.01226        0.645        0.693
   27    912       5    0.665 0.01230        0.641        0.689
   28    899      13    0.655 0.01241        0.631        0.680
   29    880       9    0.648 0.01248        0.624        0.673
   30    865      11    0.640 0.01257        0.616        0.665
   31    851       9    0.633 0.01264        0.609        0.659
   32    838       6    0.629 0.01268        0.604        0.654
   33    824       4    0.626 0.01271        0.601        0.651
   34    819       2    0.624 0.01273        0.600        0.650
   35    812       6    0.620 0.01277        0.595        0.645
   36    804       5    0.616 0.01281        0.591        0.641
   37    796       5    0.612 0.01284        0.587        0.638
   38    785      10    0.604 0.01291        0.579        0.630
   39    771       5    0.600 0.01295        0.575        0.626
   40    763       7    0.595 0.01300        0.570        0.621
   41    749       4    0.591 0.01302        0.566        0.618
   42    741       8    0.585 0.01308        0.560        0.611
   43    729       6    0.580 0.01312        0.555        0.607
   44    718       1    0.579 0.01312        0.554        0.606
   45    714       4    0.576 0.01315        0.551        0.603
   46    707       5    0.572 0.01318        0.547        0.599
   47    699       4    0.569 0.01321        0.544        0.595
   48    690       5    0.565 0.01324        0.539        0.591
   49    682       7    0.559 0.01328        0.533        0.586
   50    672       8    0.552 0.01333        0.527        0.579
   51    657       6    0.547 0.01337        0.522        0.574
   52    649       6    0.542 0.01340        0.517        0.569
   53    641       6    0.537 0.01344        0.511        0.564
   54    628       5    0.533 0.01347        0.507        0.560
   56    615       1    0.532 0.01347        0.506        0.559
   57    614       2    0.530 0.01348        0.504        0.557
   58    608       5    0.526 0.01351        0.500        0.553
   59    601       4    0.522 0.01354        0.496        0.550
   60    594       7    0.516 0.01357        0.490        0.543
   61    581       7    0.510 0.01361        0.484        0.537
   62    568       1    0.509 0.01362        0.483        0.536
   63    563       9    0.501 0.01367        0.475        0.528
   64    552       4    0.497 0.01369        0.471        0.525
   65    544       4    0.494 0.01371        0.467        0.521
   66    533       5    0.489 0.01374        0.463        0.517
   67    525       1    0.488 0.01374        0.462        0.516
   68    523       4    0.484 0.01376        0.458        0.512
   69    516       4    0.481 0.01378        0.454        0.508
   70    510       5    0.476 0.01381        0.450        0.504
   71    499       1    0.475 0.01381        0.449        0.503
   73    493       2    0.473 0.01383        0.447        0.501
   74    486       2    0.471 0.01384        0.445        0.499
   75    480       1    0.470 0.01384        0.444        0.498
   76    476       3    0.467 0.01386        0.441        0.495
   77    471       2    0.465 0.01387        0.439        0.493
   78    468       3    0.462 0.01389        0.436        0.490
   79    461       1    0.461 0.01390        0.435        0.489
   80    455       1    0.460 0.01390        0.434        0.488
   81    447       1    0.459 0.01391        0.433        0.487
   82    441       4    0.455 0.01394        0.428        0.483
   83    437       3    0.452 0.01396        0.425        0.480
   84    426       1    0.451 0.01397        0.424        0.479
   87    414       2    0.449 0.01398        0.422        0.477
   88    411       1    0.447 0.01399        0.421        0.476
   89    407       1    0.446 0.01400        0.420        0.475
   90    402       3    0.443 0.01403        0.416        0.471
   91    397       1    0.442 0.01404        0.415        0.470
   92    390       2    0.440 0.01406        0.413        0.468
   93    384       3    0.436 0.01409        0.409        0.465
   94    378       1    0.435 0.01410        0.408        0.464
   95    371       5    0.429 0.01415        0.402        0.458
   96    362       4    0.424 0.01419        0.398        0.453
   97    355       3    0.421 0.01422        0.394        0.450
   98    351       2    0.418 0.01424        0.391        0.447
   99    346       2    0.416 0.01426        0.389        0.445
  100    342       1    0.415 0.01427        0.388        0.444
  102    334       2    0.412 0.01429        0.385        0.441
  103    327       1    0.411 0.01430        0.384        0.440
  105    324       1    0.410 0.01431        0.383        0.439
  106    320       1    0.409 0.01433        0.381        0.438
  107    317       1    0.407 0.01434        0.380        0.436
  108    312       2    0.405 0.01437        0.377        0.434
  110    304       2    0.402 0.01439        0.375        0.431
  111    298       1    0.401 0.01441        0.373        0.430
  112    294       1    0.399 0.01442        0.372        0.429
  113    291       3    0.395 0.01447        0.368        0.425
  114    286       2    0.392 0.01450        0.365        0.422
  115    279       4    0.387 0.01456        0.359        0.416
  116    274       1    0.385 0.01458        0.358        0.415
  117    273       2    0.383 0.01461        0.355        0.412
  118    267       3    0.378 0.01465        0.351        0.408
  120    259       1    0.377 0.01467        0.349        0.407
  123    249       2    0.374 0.01471        0.346        0.404
  124    244       2    0.371 0.01474        0.343        0.401
  125    242       1    0.369 0.01476        0.341        0.399
  127    234       1    0.368 0.01478        0.340        0.398
  129    230       2    0.364 0.01483        0.336        0.395
  130    227       2    0.361 0.01487        0.333        0.392
  133    220       2    0.358 0.01491        0.330        0.388
  135    212       1    0.356 0.01494        0.328        0.387
  136    208       1    0.354 0.01496        0.326        0.385
  137    203       2    0.351 0.01502        0.323        0.382
  139    197       1    0.349 0.01505        0.321        0.380
  140    192       1    0.347 0.01508        0.319        0.378
  141    188       3    0.342 0.01517        0.313        0.373
  145    168       2    0.338 0.01526        0.309        0.369
  146    164       1    0.336 0.01531        0.307        0.367
  154    147       1    0.333 0.01538        0.305        0.365
  155    143       1    0.331 0.01544        0.302        0.363
  156    142       2    0.326 0.01557        0.297        0.358
  160    130       1    0.324 0.01566        0.295        0.356
  163    113       1    0.321 0.01578        0.292        0.354
  169     99       2    0.315 0.01611        0.285        0.348
  173     92       1    0.311 0.01629        0.281        0.345
  177     81       1    0.307 0.01654        0.277        0.342
  182     67       1    0.303 0.01692        0.271        0.338
  184     61       1    0.298 0.01735        0.266        0.334
  187     58       1    0.293 0.01780        0.260        0.330
  189     55       1    0.287 0.01825        0.254        0.325
  191     50       1    0.282 0.01877        0.247        0.321
  192     49       1    0.276 0.01925        0.241        0.316
  194     48       1    0.270 0.01968        0.234        0.312
  201     38       2    0.256 0.02106        0.218        0.301
  211     30       1    0.247 0.02202        0.208        0.294
  213     28       1    0.239 0.02293        0.198        0.288
  215     25       1    0.229 0.02392        0.187        0.281
  238     12       1    0.210 0.02854        0.161        0.274
  253     11       1    0.191 0.03169        0.138        0.264
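
Median survival can be read off the tables above as the first time at which the survival estimate falls to 0.5 or below: 42 for Black patients, 81 for Others and 64 for White patients. Rather than scanning the rows, the medians and their confidence limits can be extracted with `quantile()`. The sketch below is an illustration only and assumes the survival package's built-in `lung` data as a stand-in for `cancer` (`status == 2` marks a death there).

```r
library(survival)

# Stand-in data: survival::lung, NOT the `cancer` data used in these slides
fit <- survfit(Surv(time, status == 2) ~ sex, data = lung)

# Median survival time (50th percentile) per group, with confidence limits
q <- quantile(fit, probs = 0.5)
q$quantile   # median per group
q$lower      # lower confidence limit
q$upper      # upper confidence limit
```

Printing the fitted object itself (`fit`) gives a similar one-row-per-group summary with the number of events and the median.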

Plot the survival probability


We can then plot the survival estimates by race:

ggsurvplot(KM_dm, 
           data = cancer, 
           risk.table = TRUE, 
           linetype = c(1,4,5), 
           tables.height = 0.3,
           pval = TRUE)
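
With three groups, the single log-rank p-value on the plot tests whether any of the curves differ; it does not say which pairs differ. One possible follow-up is a set of pairwise log-rank tests with a multiplicity correction. The sketch below is an assumption-laden illustration: it uses the survival package's built-in `lung` data grouped by ECOG score as a stand-in for `cancer` grouped by race, and the names `d`, `ecog`, `pair_list` and `p_raw` are made up for this example. Holm adjustment is one reasonable choice among several.

```r
library(survival)

# Stand-in data: survival::lung with three ECOG groups (0, 1, 2),
# NOT the `cancer` data used in these slides
d <- subset(lung, ph.ecog %in% 0:2)
d$ecog <- factor(d$ph.ecog)

# Overall log-rank test across all groups (df = number of groups - 1)
overall <- survdiff(Surv(time, status == 2) ~ ecog, data = d)
p_overall <- pchisq(overall$chisq, df = nlevels(d$ecog) - 1,
                    lower.tail = FALSE)

# Pairwise log-rank tests: refit on each pair of groups
pair_list <- combn(levels(d$ecog), 2, simplify = FALSE)
p_raw <- sapply(pair_list, function(pr) {
  d_pair <- droplevels(d[d$ecog %in% pr, ])
  lr <- survdiff(Surv(time, status == 2) ~ ecog, data = d_pair)
  pchisq(lr$chisq, df = 1, lower.tail = FALSE)
})
names(p_raw) <- sapply(pair_list, paste, collapse = " vs ")

# Adjust the pairwise p-values for multiple comparisons
p_adj <- p.adjust(p_raw, method = "holm")
p_adj
```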

Kaplan-Meier survival estimates


Does survival time vary with marital status?

We can compute Kaplan-Meier estimates by marital status too:

# Kaplan-Meier fit by marital status; the event is death (Survival == "Died")
KM_ms <- survfit(Surv(time = survivaltime, 
                      event = Survival == "Died") ~ marital,
                 data = cancer)
summary(KM_ms)
Call: survfit(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ marital, data = cancer)

                marital=Married 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1   1137      28    0.975 0.00460        0.966        0.984
    2   1104      25    0.953 0.00627        0.941        0.966
    3   1075      17    0.938 0.00715        0.924        0.952
    4   1050      13    0.927 0.00776        0.912        0.942
    5   1027      13    0.915 0.00831        0.899        0.931
    6   1004      14    0.902 0.00887        0.885        0.920
    7    987      15    0.888 0.00941        0.870        0.907
    8    966      11    0.878 0.00979        0.859        0.898
    9    948      17    0.863 0.01033        0.843        0.883
   10    924      16    0.848 0.01081        0.827        0.869
   11    898      17    0.832 0.01128        0.810        0.854
   12    878      17    0.815 0.01172        0.793        0.839
   13    856      17    0.799 0.01213        0.776        0.823
   14    837      11    0.789 0.01237        0.765        0.813
   15    819      14    0.775 0.01268        0.751        0.801
   16    804      12    0.764 0.01292        0.739        0.789
   17    786       9    0.755 0.01310        0.730        0.781
   18    775       9    0.746 0.01327        0.721        0.773
   19    756      10    0.736 0.01345        0.710        0.763
   20    745       9    0.727 0.01361        0.701        0.755
   21    733       9    0.718 0.01377        0.692        0.746
   22    719       7    0.711 0.01389        0.685        0.739
   23    702       8    0.703 0.01402        0.676        0.731
   24    690       8    0.695 0.01415        0.668        0.724
   25    679       8    0.687 0.01428        0.660        0.716
   26    669       8    0.679 0.01440        0.651        0.708
   27    660       3    0.676 0.01444        0.648        0.705
   28    652       9    0.666 0.01457        0.638        0.696
   29    642      10    0.656 0.01471        0.628        0.686
   30    626       6    0.650 0.01479        0.621        0.679
   31    619       5    0.644 0.01486        0.616        0.674
   32    610       3    0.641 0.01490        0.613        0.671
   33    600       5    0.636 0.01496        0.607        0.666
   34    595       3    0.633 0.01500        0.604        0.663
   35    588       4    0.628 0.01505        0.600        0.659
   36    583       2    0.626 0.01508        0.597        0.657
   37    577       3    0.623 0.01512        0.594        0.653
   38    570       7    0.615 0.01521        0.586        0.646
   39    559       4    0.611 0.01526        0.582        0.642
   40    551       4    0.607 0.01531        0.577        0.637
   41    541       3    0.603 0.01534        0.574        0.634
   42    537       5    0.598 0.01541        0.568        0.629
   43    528       5    0.592 0.01547        0.562        0.623
   44    519       2    0.590 0.01549        0.560        0.621
   45    513       2    0.587 0.01552        0.558        0.619
   46    508       3    0.584 0.01555        0.554        0.615
   47    502       4    0.579 0.01560        0.549        0.611
   48    494       3    0.576 0.01564        0.546        0.607
   49    489       7    0.567 0.01572        0.537        0.599
   50    481       8    0.558 0.01581        0.528        0.590
   51    470       2    0.556 0.01583        0.525        0.588
   52    466       5    0.550 0.01588        0.519        0.582
   53    460       4    0.545 0.01593        0.515        0.577
   54    450       3    0.541 0.01596        0.511        0.573
   56    440       3    0.538 0.01599        0.507        0.570
   57    437       2    0.535 0.01601        0.505        0.567
   58    434       2    0.533 0.01603        0.502        0.565
   59    430       1    0.531 0.01604        0.501        0.564
   60    426       3    0.528 0.01607        0.497        0.560
   61    421       5    0.521 0.01613        0.491        0.554
   62    412       1    0.520 0.01614        0.489        0.553
   63    410      10    0.507 0.01623        0.477        0.540
   64    399       2    0.505 0.01625        0.474        0.538
   65    394       3    0.501 0.01628        0.470        0.534
   66    386       4    0.496 0.01632        0.465        0.529
   67    380       1    0.495 0.01632        0.464        0.528
   68    377       2    0.492 0.01634        0.461        0.525
   69    373       1    0.491 0.01635        0.460        0.524
   70    369       4    0.485 0.01639        0.454        0.519
   74    355       1    0.484 0.01640        0.453        0.517
   75    353       1    0.483 0.01641        0.451        0.516
   76    349       3    0.478 0.01644        0.447        0.512
   77    343       2    0.476 0.01647        0.444        0.509
   78    340       3    0.471 0.01650        0.440        0.505
   79    334       2    0.469 0.01652        0.437        0.502
   81    321       1    0.467 0.01653        0.436        0.501
   82    316       2    0.464 0.01656        0.433        0.498
   83    314       1    0.463 0.01657        0.431        0.496
   84    306       1    0.461 0.01659        0.430        0.495
   87    297       2    0.458 0.01662        0.427        0.492
   90    290       2    0.455 0.01666        0.423        0.489
   91    287       1    0.453 0.01667        0.422        0.487
   93    277       2    0.450 0.01671        0.418        0.484
   94    272       1    0.448 0.01673        0.417        0.482
   95    269       1    0.447 0.01675        0.415        0.481
   96    266       2    0.443 0.01679        0.412        0.478
   97    261       4    0.437 0.01688        0.405        0.471
   98    256       3    0.431 0.01694        0.400        0.466
   99    250       2    0.428 0.01698        0.396        0.463
  100    246       1    0.426 0.01700        0.394        0.461
  102    242       3    0.421 0.01706        0.389        0.456
  107    227       1    0.419 0.01708        0.387        0.454
  108    224       1    0.417 0.01711        0.385        0.452
  111    218       1    0.415 0.01714        0.383        0.450
  113    215       1    0.413 0.01717        0.381        0.448
  114    212       1    0.411 0.01719        0.379        0.447
  115    207       3    0.406 0.01729        0.373        0.441
  116    203       1    0.404 0.01732        0.371        0.439
  118    199       1    0.402 0.01735        0.369        0.437
  119    195       2    0.397 0.01741        0.365        0.433
  120    189       2    0.393 0.01748        0.360        0.429
  123    180       2    0.389 0.01756        0.356        0.425
  124    173       2    0.384 0.01764        0.351        0.420
  129    162       2    0.380 0.01774        0.346        0.416
  130    160       2    0.375 0.01783        0.341        0.411
  133    151       1    0.372 0.01788        0.339        0.409
  135    147       1    0.370 0.01794        0.336        0.407
  136    144       1    0.367 0.01800        0.334        0.404
  137    139       1    0.365 0.01806        0.331        0.402
  139    136       1    0.362 0.01813        0.328        0.399
  141    132       3    0.354 0.01833        0.320        0.392
  142    125       1    0.351 0.01840        0.317        0.389
  143    124       1    0.348 0.01847        0.314        0.386
  146    117       1    0.345 0.01855        0.311        0.383
  154    105       1    0.342 0.01866        0.307        0.380
  155    102       1    0.338 0.01877        0.304        0.377
  156    101       2    0.332 0.01899        0.297        0.371
  160     91       1    0.328 0.01913        0.293        0.368
  169     74       2    0.319 0.01961        0.283        0.360
  173     69       1    0.315 0.01987        0.278        0.356
  177     58       2    0.304 0.02061        0.266        0.347
  182     50       1    0.298 0.02107        0.259        0.342
  187     47       1    0.291 0.02156        0.252        0.337
  192     43       1    0.285 0.02209        0.244        0.331
  194     42       1    0.278 0.02258        0.237        0.326
  201     35       2    0.262 0.02392        0.219        0.313
  207     30       1    0.253 0.02466        0.209        0.306
  209     28       1    0.244 0.02539        0.199        0.299
  211     26       1    0.235 0.02609        0.189        0.292
  215     23       1    0.225 0.02688        0.178        0.284
  238      9       1    0.200 0.03353        0.144        0.277
  253      8       1    0.175 0.03749        0.115        0.266

                marital=Separated/divorced/widowed 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1    270       8    0.970  0.0103       0.9504        0.991
    2    262       9    0.937  0.0148       0.9085        0.966
    3    253       4    0.922  0.0163       0.8908        0.955
    4    249      10    0.885  0.0194       0.8480        0.924
    5    238       4    0.870  0.0205       0.8311        0.911
    6    233       8    0.840  0.0223       0.7978        0.885
    7    221       2    0.833  0.0227       0.7894        0.879
    8    217       5    0.814  0.0238       0.7683        0.862
    9    211       8    0.783  0.0253       0.7348        0.834
   10    203       6    0.760  0.0262       0.7100        0.813
   11    197       3    0.748  0.0267       0.6976        0.802
   12    192       3    0.736  0.0271       0.6852        0.791
   13    189       3    0.725  0.0275       0.6728        0.781
   14    183       3    0.713  0.0279       0.6602        0.770
   15    179       4    0.697  0.0284       0.6435        0.755
   16    175       4    0.681  0.0288       0.6268        0.740
   17    171       3    0.669  0.0291       0.6143        0.729
   18    167       2    0.661  0.0293       0.6060        0.721
   19    164       2    0.653  0.0295       0.5976        0.713
   20    162       1    0.649  0.0296       0.5934        0.710
   21    161       3    0.637  0.0299       0.5809        0.698
   22    157       5    0.617  0.0303       0.5600        0.679
   23    151       1    0.612  0.0303       0.5558        0.675
   24    149       3    0.600  0.0305       0.5431        0.663
   25    145       2    0.592  0.0307       0.5347        0.655
   26    142       4    0.575  0.0309       0.5176        0.639
   27    138       1    0.571  0.0310       0.5134        0.635
   28    136       4    0.554  0.0312       0.4963        0.619
   30    128       2    0.546  0.0313       0.4875        0.610
   31    124       3    0.532  0.0315       0.4741        0.598
   32    121       2    0.524  0.0315       0.4652        0.589
   35    119       1    0.519  0.0316       0.4608        0.585
   36    117       2    0.510  0.0317       0.4519        0.576
   38    113       1    0.506  0.0317       0.4473        0.572
   39    112       1    0.501  0.0317       0.4427        0.567
   40    111       2    0.492  0.0318       0.4337        0.559
   42    107       3    0.478  0.0319       0.4198        0.545
   43    104       2    0.469  0.0319       0.4106        0.536
   45    102       1    0.465  0.0320       0.4060        0.532
   50     99       1    0.460  0.0320       0.4013        0.527
   51     96       1    0.455  0.0320       0.3965        0.522
   52     95       2    0.446  0.0320       0.3870        0.513
   53     93       2    0.436  0.0321       0.3775        0.504
   58     90       1    0.431  0.0321       0.3727        0.499
   59     89       2    0.421  0.0321       0.3631        0.489
   60     87       1    0.417  0.0321       0.3583        0.484
   61     86       1    0.412  0.0320       0.3535        0.480
   65     82       1    0.407  0.0320       0.3485        0.475
   66     80       1    0.402  0.0320       0.3435        0.470
   73     77       2    0.391  0.0321       0.3332        0.459
   74     74       1    0.386  0.0321       0.3279        0.454
   78     71       1    0.380  0.0321       0.3226        0.449
   81     68       1    0.375  0.0321       0.3170        0.443
   82     67       1    0.369  0.0321       0.3115        0.438
   83     66       1    0.364  0.0321       0.3060        0.432
   87     64       1    0.358  0.0321       0.3004        0.427
   88     63       1    0.352  0.0321       0.2948        0.421
   89     62       1    0.347  0.0320       0.2892        0.416
   90     60       1    0.341  0.0320       0.2835        0.410
   96     56       1    0.335  0.0320       0.2775        0.404
  102     53       1    0.328  0.0320       0.2713        0.398
  110     50       1    0.322  0.0321       0.2648        0.391
  115     47       2    0.308  0.0321       0.2512        0.378
  117     45       2    0.295  0.0321       0.2378        0.365
  118     42       1    0.287  0.0321       0.2310        0.358
  119     41       1    0.280  0.0321       0.2241        0.351
  125     39       1    0.273  0.0321       0.2171        0.344
  145     29       2    0.254  0.0325       0.1981        0.327
  163     17       1    0.239  0.0339       0.1815        0.316
  184      7       1    0.205  0.0430       0.1362        0.309
  191      4       1    0.154  0.0549       0.0765        0.310

                marital=Single 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    1    417       7    0.983 0.00629        0.971        0.996
    2    406       8    0.964 0.00917        0.946        0.982
    3    397       5    0.952 0.01054        0.931        0.973
    4    388       2    0.947 0.01104        0.925        0.969
    5    384       1    0.944 0.01128        0.922        0.967
    6    381       8    0.925 0.01304        0.899        0.950
    7    372       7    0.907 0.01436        0.879        0.936
    8    365       4    0.897 0.01504        0.868        0.927
    9    356       3    0.890 0.01553        0.860        0.921
   10    353       2    0.885 0.01585        0.854        0.916
   11    349       5    0.872 0.01660        0.840        0.905
   12    340       8    0.851 0.01773        0.817        0.887
   13    330       6    0.836 0.01850        0.800        0.873
   14    321       1    0.833 0.01862        0.798        0.871
   15    319       7    0.815 0.01945        0.778        0.854
   16    312       4    0.805 0.01989        0.766        0.845
   17    308       9    0.781 0.02080        0.741        0.823
   18    294       5    0.768 0.02127        0.727        0.811
   19    287       3    0.760 0.02155        0.719        0.803
   20    283       3    0.752 0.02182        0.710        0.796
   21    279       3    0.744 0.02208        0.702        0.788
   22    273       2    0.738 0.02225        0.696        0.783
   23    269       4    0.727 0.02259        0.684        0.773
   24    265       3    0.719 0.02282        0.676        0.765
   25    261       1    0.716 0.02290        0.673        0.763
   26    258       3    0.708 0.02314        0.664        0.755
   27    251       3    0.699 0.02337        0.655        0.747
   28    246       1    0.697 0.02345        0.652        0.744
   30    242       4    0.685 0.02375        0.640        0.733
   31    237       1    0.682 0.02383        0.637        0.730
   32    236       2    0.676 0.02398        0.631        0.725
   35    229       1    0.673 0.02405        0.628        0.722
   36    228       1    0.670 0.02413        0.625        0.719
   37    226       3    0.662 0.02435        0.616        0.711
   38    221       2    0.656 0.02449        0.609        0.705
   39    218       1    0.653 0.02456        0.606        0.703
   40    215       2    0.647 0.02471        0.600        0.697
   41    212       5    0.631 0.02505        0.584        0.682
   42    204       1    0.628 0.02512        0.581        0.679
   44    200       1    0.625 0.02519        0.578        0.676
   45    199       1    0.622 0.02525        0.574        0.673
   46    198       2    0.616 0.02539        0.568        0.667
   47    195       2    0.609 0.02552        0.561        0.661
   48    192       2    0.603 0.02564        0.555        0.655
   50    187       2    0.596 0.02577        0.548        0.649
   51    182       3    0.587 0.02596        0.538        0.640
   53    176       1    0.583 0.02603        0.534        0.637
   54    174       2    0.577 0.02616        0.528        0.630
   58    168       2    0.570 0.02629        0.520        0.624
   59    166       1    0.566 0.02636        0.517        0.620
   60    165       3    0.556 0.02654        0.506        0.611
   61    158       1    0.553 0.02660        0.503        0.607
   62    155       1    0.549 0.02667        0.499        0.604
   63    152       1    0.545 0.02674        0.495        0.600
   64    151       2    0.538 0.02687        0.488        0.593
   66    147       1    0.534 0.02693        0.484        0.590
   68    145       3    0.523 0.02712        0.473        0.579
   69    142       3    0.512 0.02729        0.462        0.569
   70    139       1    0.509 0.02734        0.458        0.565
   71    136       1    0.505 0.02740        0.454        0.562
   80    127       1    0.501 0.02747        0.450        0.558
   82    123       1    0.497 0.02754        0.446        0.554
   83    121       1    0.493 0.02762        0.441        0.550
   86    117       1    0.489 0.02770        0.437        0.546
   92    110       2    0.480 0.02790        0.428        0.538
   93    107       1    0.475 0.02800        0.423        0.533
   95    103       4    0.457 0.02839        0.404        0.516
   96     97       2    0.447 0.02858        0.395        0.507
   98     95       1    0.443 0.02866        0.390        0.502
  103     89       1    0.438 0.02877        0.385        0.498
  105     87       1    0.433 0.02887        0.380        0.493
  106     85       1    0.428 0.02898        0.374        0.488
  108     83       1    0.422 0.02908        0.369        0.483
  110     81       1    0.417 0.02919        0.364        0.478
  111     79       1    0.412 0.02929        0.358        0.473
  112     77       1    0.407 0.02940        0.353        0.468
  113     74       2    0.396 0.02961        0.342        0.458
  114     72       1    0.390 0.02970        0.336        0.453
  118     69       1    0.384 0.02981        0.330        0.447
  127     65       1    0.378 0.02993        0.324        0.442
  133     63       1    0.372 0.03005        0.318        0.436
  137     57       1    0.366 0.03023        0.311        0.430
  140     53       1    0.359 0.03043        0.304        0.424
  189     14       1    0.333 0.03754        0.267        0.416
  213      8       1    0.292 0.05097        0.207        0.411
  215      6       1    0.243 0.06143        0.148        0.399
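The survival column in these tables is the Kaplan-Meier product-limit estimate: at each event time, the previous survival probability is multiplied by (1 - n.event / n.risk). As a rough check, the sketch below recomputes the first three rows of the marital=Single table by hand. The vectors n_risk and n_event are simply copied from that table and are not part of the original code.

```r
# First three event times of the marital = Single table above
n_risk  <- c(417, 406, 397)   # patients still at risk just before each time
n_event <- c(7, 8, 5)         # deaths observed at each time

# Product-limit estimate: cumulative product of the conditional survival
km_surv <- cumprod(1 - n_event / n_risk)
round(km_surv, 3)
```

The rounded values agree with the survival column for times 1, 2 and 3 (0.983, 0.964, 0.952).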

Plot the survival probability

We can then plot the survival estimates for patients by marital status:

ggsurvplot(KM_ms, 
           data = cancer, 
           risk.table = TRUE, 
           linetype = c(1,4,5), 
           tables.height = 0.3,
           pval = TRUE)

Comparing Kaplan-Meier estimates across groups

Several tests are available to compare Kaplan-Meier survival estimates between groups. They include:

  1. Log-rank test (the default)
  2. Peto-Peto test

Log-rank test

The Kaplan-Meier survival curves give a graphical representation of the survival probabilities in different groups over time. To answer the question of whether the survival estimates differ between levels or groups, we can use statistical tests such as the log-rank and Peto-Peto tests.

For all of these tests, the null hypothesis is that the survival estimates do not differ between levels or groups. For example, to compare survival by surgery status:

survdiff(Surv(time = survivaltime, 
              event = Survival == "Died") ~ surgery, 
         data = cancer,
         rho = 0)
Call:
survdiff(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ surgery, data = cancer, rho = 0)

               N Observed Expected (O-E)^2/E (O-E)^2/V
surgery=No   215      154     92.1     41.57      46.5
surgery=Yes 1609      835    896.9      4.27      46.5

 Chisq= 46.5  on 1 degrees of freedom, p= 0.000000000009 
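To see where the numbers in this output come from, the sketch below recomputes the (O-E)^2/E column and the p-value from the rounded values printed above. The object names are illustrative, and because the inputs are rounded, the results agree with the output only approximately. Note that the reported Chisq is the (O-E)^2/V column, which uses the variance of O - E, not the sum of the (O-E)^2/E column.

```r
# Observed and expected deaths, copied from the survdiff() output above
observed <- c(No = 154, Yes = 835)
expected <- c(No = 92.1, Yes = 896.9)

# Contribution of each group, as in the (O-E)^2/E column
contrib <- (observed - expected)^2 / expected
round(contrib, 2)

# p-value of the reported chi-square statistic on 1 degree of freedom
p_value <- pchisq(46.5, df = 1, lower.tail = FALSE)
p_value
```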

For surgery status, the log-rank test rejects the null hypothesis (p-value < 0.001). For sex (male vs female), the survival estimates are not significantly different at the \(5\%\) significance level (p-value = 0.2):

survdiff(Surv(time = survivaltime, 
              event = Survival == "Died") ~ sex, 
         data = cancer,
         rho = 0)
Call:
survdiff(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ sex, data = cancer, rho = 0)

              N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female  801      425      445     0.869       1.6
sex=Male   1023      564      544     0.710       1.6

 Chisq= 1.6  on 1 degrees of freedom, p= 0.2 

The survival estimates for male and female patients are not significantly different (p-value = 0.2).

Peto-Peto test

We can be more confident in our results if other tests give similar findings. So let's now compare the survival estimates using the Peto-Peto test, which survdiff() runs when rho = 1.

This is the result of comparing survival estimates by surgery status using the Peto-Peto test:

survdiff(Surv(time = survivaltime, 
              event = Survival == "Died") ~ surgery, 
         data = cancer,
         rho = 1)
Call:
survdiff(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ surgery, data = cancer, rho = 1)

               N Observed Expected (O-E)^2/E (O-E)^2/V
surgery=No   215      120     65.9     44.70      65.2
surgery=Yes 1609      575    629.6      4.68      65.2

 Chisq= 65.2  on 1 degrees of freedom, p= 0.0000000000000007 

This is the result of comparing survival estimates between genders using the Peto-Peto test:

survdiff(Surv(time = survivaltime, 
              event = Survival == "Died") ~ sex, 
         data = cancer,
         rho = 1)
Call:
survdiff(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ sex, data = cancer, rho = 1)

              N Observed Expected (O-E)^2/E (O-E)^2/V
sex=Female  801      295      312     0.893      2.17
sex=Male   1023      401      384     0.724      2.17

 Chisq= 2.2  on 1 degrees of freedom, p= 0.1 
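The only change between the two tests in survdiff() is the rho argument: rho = 0 gives the log-rank test and rho = 1 gives the Peto-Peto test, which puts more weight on early events. Here is a minimal sketch on the lung dataset that ships with the survival package, used only because the cancer data are not bundled with R. The helper name compare_tests is hypothetical.

```r
library(survival)

# Run survdiff() for a given rho and return the statistic and its p-value
compare_tests <- function(rho) {
  fit <- survdiff(Surv(time, status == 2) ~ sex, data = lung, rho = rho)
  df  <- length(fit$n) - 1                      # number of groups minus one
  c(chisq = fit$chisq,
    p     = pchisq(fit$chisq, df = df, lower.tail = FALSE))
}

results <- rbind(log_rank = compare_tests(0), peto_peto = compare_tests(1))
results
```

If both rows lead to the same conclusion, the finding does not hinge on how early and late events are weighted.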

Semi-parametric models in survival analysis

One advantage of time-to-event data (from a cohort study) is the ability to estimate the hazard, or risk, of developing the event (outcome) of interest. However, a challenge in cohort studies is the presence of censoring. Censoring can happen because:

  • patients leave the study (loss to follow up) randomly
  • patients do not experience the event even at the termination of the study
  • patients are withdrawn from the study

For censored patients, we do not know the exact time at which they develop the event; we only know that it had not occurred by their last follow-up.
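In R, censoring is recorded when the survival object is built. The sketch below uses a small made-up data frame (the values and the name toy are assumptions for illustration) with the same column names as the cancer data. Surv() treats every patient whose status is not Died as censored, and prints those times with a trailing +.

```r
library(survival)

# Toy data: three patients, one of whom was still alive at last follow-up
toy <- data.frame(
  survivaltime = c(5, 8, 12),
  Survival     = c("Died", "Alive", "Died")
)

# event = TRUE means the death was observed; FALSE means censored
toy_surv <- Surv(time = toy$survivaltime, event = toy$Survival == "Died")
toy_surv
```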

To explore how to incorporate a regression-model-like structure into the hazard function, we can start by modeling the hazard function as a constant:

\[h(t) = \theta_0\]

The hazard function is a rate, and because of that it must be strictly positive. To constrain \(\theta_0\) to be greater than zero, we can parameterize the hazard function as:

\[h(t) = \exp(\beta_0)\]

So for a covariate \(x\) the log-hazard function is:

\[\ln[h(t, x)] = \beta_0 + \beta_1 x\]

and the hazard function is

\[h(t, x) = \exp(\beta_0 + \beta_1 x)\]

This is the exponential model, in which the hazard does not change with time; it is one example of a fully parametric hazard function. Fully parametric models accomplish two goals simultaneously:

  • They describe the basic underlying distribution of survival time (error component)
  • They characterize how that distribution changes as a function of the covariates (systematic component).
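As a minimal sketch of a fully parametric fit, survreg() from the survival package can fit the exponential model. The lung dataset bundled with survival is used here as a stand-in for the cancer data. survreg() parameterizes the model on the log-time scale, so for the exponential distribution the log-hazard coefficients are the negatives of the reported ones. With no covariates, the fitted constant hazard should equal the number of deaths divided by the total follow-up time.

```r
library(survival)

# Intercept-only exponential model: h(t) = exp(beta_0), constant over time
exp_fit <- survreg(Surv(time, status == 2) ~ 1, data = lung,
                   dist = "exponential")

# survreg() models log(time); negate the intercept to get beta_0
beta_0 <- -unname(coef(exp_fit))
hazard <- exp(beta_0)

# Closed-form estimate for comparison: events / total follow-up time
hazard_by_hand <- sum(lung$status == 2) / sum(lung$time)
c(model = hazard, by_hand = hazard_by_hand)
```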

However, even though fully parametric models can be used to accomplish the above goals, the assumptions required for their error components may be unnecessarily stringent or unrealistic. One option is to have a fully parametric regression structure but leave the dependence on time unspecified. Models that use this approach are called semi-parametric regression models.

Cox proportional hazards regression

If we want to compare the survival experience of cancer patients based on surgery status, one form of a regression model for the hazard function that addresses the study goal is:

\[h(t,x,\beta) = h_0(t)r(x,\beta)\]

We can see that the hazard function is the product of two functions:

  • The function, \(h_0(t)\), characterizes how the hazard function changes as a function of survival time.
  • The function, \(r(x,\beta)\), characterizes how the hazard function changes as a function of subject covariates.

The \(h_0(t)\) is frequently referred to as the baseline hazard function.

The hazard ratio (HR) depends only on the function \(r(x,\beta)\). If the ratio function \(HR(t,x_1,x_0)\) has a clear clinical interpretation, then the actual form of the baseline hazard function is of little importance.

Cox proposed the parameterization \(r(x,\beta) = \exp(x\beta)\). With this parameterization the hazard function is

\[h(t,x,\beta) = h_0(t)\exp(x\beta)\]

and the hazard ratio is

\[HR(t,x_1, x_0) = \exp[\beta(x_1 - x_0)]\]
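A short derivation, using only the two formulas above, shows why the baseline hazard drops out of the hazard ratio:

\[HR(t,x_1,x_0) = \frac{h(t,x_1,\beta)}{h(t,x_0,\beta)} = \frac{h_0(t)\exp(x_1\beta)}{h_0(t)\exp(x_0\beta)} = \exp[\beta(x_1 - x_0)]\]

Because \(h_0(t)\) cancels, the ratio does not depend on \(t\). This is the proportional hazards property.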

This model is referred to in the literature by a variety of terms, such as the Cox model, the Cox proportional hazards model or simply the proportional hazards model.

So for example, if we have a dichotomous (binary) covariate, such as surgery, coded as \(x_1 = 1\) for yes and \(x_0 = 0\) for no, then the hazard ratio becomes

\[HR(t,x_1, x_0) = \exp(\beta)\]
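As a hedged illustration with a binary covariate, the sketch below fits a Cox model to the lung dataset from the survival package (a stand-in, since the cancer data are not bundled with R). It checks that exponentiating the coefficient reproduces the hazard ratio reported by summary(). The object names lung2 and cox_sex are illustrative.

```r
library(survival)

# sex is coded 1 = male, 2 = female in lung; make it an explicit factor
lung2 <- transform(lung, sex = factor(sex, levels = c(1, 2),
                                      labels = c("Male", "Female")))

cox_sex <- coxph(Surv(time, status == 2) ~ sex, data = lung2)

beta <- unname(coef(cox_sex))   # log hazard ratio, Female vs Male
hr   <- exp(beta)               # HR = exp(beta) for a 0/1 covariate
hr_from_summary <- unname(summary(cox_sex)$conf.int[1, "exp(coef)"])
c(beta = beta, hr = hr, hr_from_summary = hr_from_summary)
```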

Advantages of the Cox proportional hazards regression

Recall that with Kaplan-Meier (KM) analysis we could estimate the survival probability, and with the log-rank or Peto-Peto test we could compare survival between the levels of a categorical covariate. However, the disadvantages of KM include:

  1. Numerical variables need to be categorized before survival can be compared
  2. It is a univariable analysis
  3. It is a non-parametric analysis

We also acknowledge that fully parametric regression models in survival analysis have stringent assumptions and distributional requirements. So, to overcome the limitations of both the KM analysis and the fully parametric analysis, we can model our survival data using the semi-parametric Cox proportional hazards regression.

Estimation from Cox proportional hazards regression

Using our cancer dataset, we will estimate the parameters using Cox PH regression. Remember, in our data we have

  1. the time variable: survivaltime
  2. the event variable: Survival, where the event of interest is Died. Any status other than Died is treated as censored
  3. all other covariates

Now let’s take surgery as the covariate of interest:

surgery <- 
  coxph(Surv(time = survivaltime, 
             event = Survival == "Died") ~ surgery,
                     data = cancer)
summary(surgery)
Call:
coxph(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ surgery, data = cancer)

  n= 1824, number of events= 989 

               coef exp(coef) se(coef)      z        Pr(>|z|)    
surgeryYes -0.59284   0.55276  0.08781 -6.751 0.0000000000147 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           exp(coef) exp(-coef) lower .95 upper .95
surgeryYes    0.5528      1.809    0.4654    0.6566

Concordance= 0.544  (se = 0.007 )
Likelihood ratio test= 39.73  on 1 df,   p=0.0000000003
Wald test            = 45.58  on 1 df,   p=0.00000000001
Score (logrank) test = 46.92  on 1 df,   p=0.000000000007

For nicer output (in a data frame format), we can use tidy() from the broom package. This will give us

  • the estimate, which is the log hazard ratio. If you exponentiate it, you get the hazard ratio
  • the standard error
  • the p-value
  • the confidence interval for the log hazard ratio

library(broom)
tidy(surgery,
     conf.int = TRUE)
# A tibble: 1 × 7
  term       estimate std.error statistic  p.value conf.low conf.high
  <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 surgeryYes   -0.593    0.0878     -6.75 1.47e-11   -0.765    -0.421

The simple Cox PH model with surgery shows that the crude log hazard of death for patients who had surgery is \(0.593\) lower than for patients who did not, that is, the log hazard ratio is \(-0.593\) (p-value < 0.001).

The \(95\%\) confidence interval for the crude log hazard ratio is calculated by:

\[\hat\beta \pm 1.96 \times \widehat{SE}(\hat\beta)\]

tidy(surgery, 
     exponentiate = TRUE,
     conf.int = TRUE)
# A tibble: 1 × 7
  term       estimate std.error statistic  p.value conf.low conf.high
  <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 surgeryYes    0.553    0.0878     -6.75 1.47e-11    0.465     0.657

We can also get the crude hazard ratio (HR) by exponentiating the log HR. In this example, the simple Cox PH model with covariate surgery shows that patients who had surgery have \(0.55\) times the hazard of death of patients who did not, that is, about \(45\%\) lower hazard (p-value < 0.001; \(95\%\) CI for the HR: 0.47 to 0.66).

The \(95\%\) confidence interval for the crude HR is calculated by

\[\exp[\hat\beta \pm 1.96 \times \widehat{SE}(\hat\beta)]\]
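The formula can be checked by hand with the coef and se(coef) values reported by summary(surgery) earlier. This is a minimal sketch; the variable names are illustrative.

```r
beta_hat <- -0.59284   # coef for surgeryYes from the summary() output
se_beta  <- 0.08781    # se(coef) from the same output

z <- qnorm(0.975)      # normal quantile, about 1.96

hr    <- exp(beta_hat)                            # crude hazard ratio
hr_ci <- exp(beta_hat + c(-1, 1) * z * se_beta)   # Wald 95% CI on the HR scale

round(c(HR = hr, lower = hr_ci[1], upper = hr_ci[2]), 4)
```

Up to rounding, the results agree with the exp(coef), lower .95 and upper .95 columns of the summary() output.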

Let’s model the risk for cancer death for covariate age:

cancer_age <- 
  coxph(Surv(time = survivaltime, 
             event = Survival == "Died") ~ age,
                     data = cancer)
summary(cancer_age)
Call:
coxph(formula = Surv(time = survivaltime, event = Survival == 
    "Died") ~ age, data = cancer)

  n= 1824, number of events= 989 

        coef exp(coef) se(coef)    z            Pr(>|z|)    
age 0.039872  1.040678 0.002431 16.4 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
age     1.041     0.9609     1.036     1.046

Concordance= 0.667  (se = 0.01 )
Likelihood ratio test= 272.6  on 1 df,   p=<0.0000000000000002
Wald test            = 269  on 1 df,   p=<0.0000000000000002
Score (logrank) test = 270.1  on 1 df,   p=<0.0000000000000002

The simple Cox PH model with covariate age shows that with each one-unit increase in age, the crude log hazard for death increases by \(0.039872\).

tidy(cancer_age,
     exponentiate = TRUE,
     conf.int = TRUE)
# A tibble: 1 × 7
  term  estimate std.error statistic  p.value conf.low conf.high
  <chr>    <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 age       1.04   0.00243      16.4 1.89e-60     1.04      1.05

When we exponentiate the log HR, the simple Cox PH model shows that with each one-unit increase in age, the crude hazard of death increases by about \(4\%\). The relationship between cancer death and age is highly significant (p-value \(< 0.0001\)) when not adjusting for other covariates.
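Because the model is linear on the log hazard scale, the hazard ratio for a difference of more than one unit is obtained by scaling the coefficient before exponentiating. A minimal sketch using the coefficient reported above; the 10-unit difference is just an illustrative choice.

```r
beta_age <- 0.039872            # coef for age from summary(cancer_age)

hr_1  <- exp(beta_age)          # HR for a one-unit increase in age
hr_10 <- exp(10 * beta_age)     # HR for a ten-unit increase in age

round(c(one_unit = hr_1, ten_units = hr_10), 2)
```

Here hr_10 equals hr_1 raised to the power 10, so a 10-unit difference in age corresponds to roughly a 49% higher crude hazard, not 40%.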

By using tbl_uvregression() from the gtsummary package, we can generate simple univariable models for all covariates in one call. In return, we get the crude HR for each covariate of interest.

cancer |>
  dplyr::select(survivaltime, Survival, sex,age, surgery) |>
  tbl_uvregression(
    method = coxph,
    y = Surv(survivaltime, event = Survival == "Died"),
    exponentiate = TRUE,
    pvalue_fun = ~style_pvalue(.x, digits = 3)
  ) |>
  as_gt()
Characteristic    N      HR (95% CI)¹         p-value
sex               1,824
    Female
    Male                 1.08 (0.96 to 1.23)  0.205
age               1,824  1.04 (1.04 to 1.05)  <0.001
surgery           1,824
    No
    Yes                  0.55 (0.47 to 0.66)  <0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

Multiple Cox PH regression

There are two primary reasons to include more than one covariate in the model. The first is to adjust statistically for possible imbalances in the observed data before making statistical inferences. In traditional statistical applications this is called analysis of covariance, while in clinical and epidemiological investigations it is often called control of confounding. The second is to include higher-order terms that represent interactions between covariates; covariates involved in such interactions are also called effect modifiers.

Based on our clinical expertise and on statistical significance, let's fit a Cox PH model with these covariates:

  • age
  • sex
  • surgery

The reason is that we found both age and surgery to be statistically significant. We also believe that gender may influence the outcome.

To estimate the Cox PH model with age, sex and surgery:

cancer_mv <- 
  coxph(Surv(time = survivaltime, 
             event = Survival == "Died") ~ age +  sex + surgery, 
        data = cancer)
tidy(cancer_mv, exponentiate = TRUE, conf.int = TRUE)
# A tibble: 3 × 7
  term       estimate std.error statistic  p.value conf.low conf.high
  <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 age           1.04    0.00244     16.2  3.88e-59    1.04      1.05 
2 sexMale       1.14    0.0645       2.09 3.69e- 2    1.01      1.30 
3 surgeryYes    0.593   0.0882      -5.93 3.03e- 9    0.499     0.705

Or we may use tbl_regression() for nicer output:

tbl_regression(cancer_mv) |>
  as_gt()
Characteristic    log(HR) (95% CI)¹       p-value
age               0.04 (0.03 to 0.04)     <0.001
sex
    Female
    Male          0.13 (0.01 to 0.26)     0.037
surgery
    No
    Yes           -0.52 (-0.70 to -0.35)  <0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

We can also exponentiate the log hazard ratios to obtain the hazard ratios:

tbl_regression(cancer_mv, exponentiate = TRUE) |>
  as_gt()
Characteristic    HR (95% CI)¹         p-value
age               1.04 (1.04 to 1.05)  <0.001
sex
    Female
    Male          1.14 (1.01 to 1.30)  0.037
surgery
    No
    Yes           0.59 (0.50 to 0.70)  <0.001
¹ HR = Hazard Ratio, CI = Confidence Interval

We would like to confirm whether the model with covariates surgery, age and sex is really statistically different from the model with only age and surgery:

cancer_mv2 <- 
  coxph(Surv(time = survivaltime, 
             event = Survival == "Died") ~ age + surgery, 
        data = cancer)
tidy(cancer_mv2, exponentiate = TRUE, conf.int = TRUE)
# A tibble: 2 × 7
  term       estimate std.error statistic  p.value conf.low conf.high
  <chr>         <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 age           1.04    0.00243     16.1  1.35e-58    1.04      1.04 
2 surgeryYes    0.588   0.0881      -6.02 1.73e- 9    0.495     0.699

We can confirm this by running the likelihood ratio test between the two Cox PH models:

anova(cancer_mv , cancer_mv2, test = 'Chisq')
Analysis of Deviance Table
 Cox model: response is  Surv(time = survivaltime, event = Survival == "Died")
 Model 1: ~ age + sex + surgery
 Model 2: ~ age + surgery
   loglik  Chisq Df Pr(>|Chi|)  
1 -6620.1                       
2 -6622.2 4.3771  1    0.03643 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And true enough, the two Cox PH models are different (p-value = 0.036), so we will choose the larger model.
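The likelihood ratio statistic is twice the difference in log-likelihoods between the two models, compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters. The sketch below reproduces the p-value from the numbers printed in the anova() table. The variable names are illustrative.

```r
# Log-likelihoods as printed (rounded to one decimal place in the table)
loglik_full    <- -6620.1   # age + sex + surgery
loglik_reduced <- -6622.2   # age + surgery

# Twice the difference; only approximate because the inputs are rounded
approx_chisq <- 2 * (loglik_full - loglik_reduced)

# Using the exact statistic reported by anova()
lr_chisq <- 4.3771
df       <- 1               # one extra parameter (sexMale) in the larger model
p_value  <- pchisq(lr_chisq, df = df, lower.tail = FALSE)

round(c(approx_chisq = approx_chisq, p_value = p_value), 4)
```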

Compare Univariate Cox and Logistic Regression

fancy_table <-
  tbl_merge(
    tbls        = list(uvlm_table, uvcm_table),
    tab_spanner = c("Death Status", "Time to Death")
  )

fancy_table
                  Death Status                           Time to Death
Characteristic    N      OR (95% CI)¹         p-value    N      HR (95% CI)¹         p-value
survivaltime      1,824  0.98 (0.98 to 0.99)  <0.001
sex               1,824                                  1,824
    Female
    Male                 1.09 (0.90 to 1.31)  0.38              1.08 (0.96 to 1.23)  0.205
age               1,824  1.03 (1.03 to 1.04)  <0.001     1,824  1.04 (1.04 to 1.05)  <0.001
surgery           1,824                                  1,824
    No
    Yes                  0.43 (0.31 to 0.58)  <0.001            0.55 (0.47 to 0.66)  <0.001
¹ OR = Odds Ratio, CI = Confidence Interval, HR = Hazard Ratio

Compare Multivariate Cox and Logistic Regression

fancy_table <-
  tbl_merge(
    tbls        = list(cox, logistic),
    tab_spanner = c("Cox PH","Logistic")
  )

fancy_table
                  Cox PH                          Logistic
Characteristic    HR (95% CI)¹         p-value    OR (95% CI)¹         p-value
sex
    Female
    Male          1.14 (1.01 to 1.30)  0.037      1.09 (0.90 to 1.33)  0.36
age               1.04 (1.04 to 1.05)  <0.001     1.03 (1.03 to 1.04)  <0.001
surgery
    No
    Yes           0.59 (0.50 to 0.70)  <0.001     0.45 (0.33 to 0.62)  <0.001
¹ HR = Hazard Ratio, CI = Confidence Interval, OR = Odds Ratio
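The objects merged above (uvlm_table, uvcm_table, cox and logistic) are created elsewhere and are not shown in this section. As a rough sketch of the underlying comparison, and assuming the lung dataset from the survival package as a stand-in, the code below fits a logistic regression on death status and a Cox model on time to death with the same covariate. The logistic model ignores when the deaths happen, so its odds ratio and the Cox hazard ratio generally differ.

```r
library(survival)

d <- lung
d$died <- as.integer(d$status == 2)   # 1 = died, 0 = censored

# Logistic regression: did the patient die at all during follow-up?
logit_fit <- glm(died ~ age, data = d, family = binomial)

# Cox PH regression: how quickly did death occur?
cox_fit <- coxph(Surv(time, died) ~ age, data = d)

or_age <- exp(coef(logit_fit)[["age"]])   # odds ratio per one-unit of age
hr_age <- exp(coef(cox_fit)[["age"]])     # hazard ratio per one-unit of age
round(c(OR = or_age, HR = hr_age), 3)
```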