Survival Analysis on rats Data set

Introduction :-

In this report, I perform a survival (time-to-event) analysis on the rats data set.

Exploratory Data Analysis :-

Exploratory data analysis is an approach to analyzing data sets that summarizes their main characteristics, often with visual methods.

Structure of the given dataset :-

The given dataset has 300 rats and each rat has 5 attributes. The first rows of the dataset are as follows.

##   litter rx time status sex
## 1      1  1  101      0   f
## 2      1  0   49      1   f
## 3      1  0  104      0   f
## 4      2  1   91      0   m
## 5      2  0  104      0   m
## 6      2  0  102      0   m

Explanation of all the variables :-

• litter : litter number from 1 to 100, numeric

• rx : treatment, (1=drug, 0=control), factor

• time : time to tumor or last follow-up, numeric

• status : event status, 1=tumor and 0=censored, numeric

• sex : male or female, factor

The detailed structure is as follows.

## 'data.frame':    300 obs. of  5 variables:
##  $ litter: int  1 1 1 2 2 2 3 3 3 4 ...
##  $ rx    : int  1 0 0 1 0 0 1 0 0 1 ...
##  $ time  : int  101 49 104 91 104 102 104 102 104 91 ...
##  $ status: int  0 1 0 0 0 0 0 0 0 0 ...
##  $ sex   : Factor w/ 2 levels "f","m": 1 1 1 2 2 2 1 1 1 2 ...

In the input, the type of each attribute is as follows.

##    litter        rx      time    status       sex 
## "integer" "integer" "integer" "integer"  "factor"

The type of rx is not correct: it is a treatment label (0 = control, 1 = drug), not a quantity, so I convert it to a factor. For EDA, I also convert status to a factor. After these changes, the type of each attribute is as follows.

##    litter        rx      time    status       sex 
## "integer"  "factor" "integer"  "factor"  "factor"
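The conversion described above can be sketched as follows. This is a minimal sketch that assumes df is the rats data shipped with the survival package, since the report does not show how df was loaded. The name df_eda is hypothetical: keeping a separate copy for EDA leaves status numeric in df, which matters later because Surv() reads a factor status as multi-state data rather than as a simple 0/1 event indicator.

```r
# Assumption: df is the rats data shipped with the survival package
df <- survival::rats

# Treatment is a 0/1 label, not a quantity, so store it as a factor
df$rx <- factor(df$rx)

# Hypothetical EDA copy: status becomes a factor only here, so that
# df$status stays numeric (0/1) for the survival models fitted later
df_eda <- df
df_eda$status <- factor(df_eda$status)

# Class of every column after the conversion
print(sapply(df_eda, class))
```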

Dealing with missing values :-

The number of missing (NA) values in each column is as follows.

## litter     rx   time status    sex 
##      0      0      0      0      0

As there are no missing values, we can proceed further.
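A count like the one above can be obtained with a single base R call. The sketch below assumes df is the rats data from the survival package, and na_counts is a hypothetical name. In R, missing entries in a data frame are NA values, which is what is.na() detects.

```r
# Assumption: df is the rats data shipped with the survival package
df <- survival::rats

# is.na() marks every missing cell; colSums() counts them per column
na_counts <- colSums(is.na(df))
print(na_counts)
```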

Summary :-

The overall summary of all the attributes is as follows.

##      litter       rx           time        status  sex    
##  Min.   :  1.00   0:200   Min.   : 23.00   0:258   f:150  
##  1st Qu.: 25.75   1:100   1st Qu.: 80.75   1: 42   m:150  
##  Median : 50.50           Median : 98.00                  
##  Mean   : 50.50           Mean   : 90.44                  
##  3rd Qu.: 75.25           3rd Qu.:104.00                  
##  Max.   :100.00           Max.   :104.00

The distribution of all continuous variables is as follows.

The distribution of all continuous variables in each category is as follows.

Litter:-

Time:-

The correlation between the continuous variables is as follows.

##             litter        time
## litter  1.00000000 -0.04241067
## time   -0.04241067  1.00000000
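The matrix above looks like the output of cor() on the two numeric columns. A minimal sketch, assuming df is the rats data from the survival package and using the hypothetical name cor_mat. By default cor() computes the Pearson correlation.

```r
# Assumption: df is the rats data shipped with the survival package
df <- survival::rats

# Pearson correlation between the two continuous columns
cor_mat <- cor(df[, c("litter", "time")])
print(cor_mat)
```

A value this close to zero suggests that there is essentially no linear relationship between litter number and follow-up time.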

___

Description of EDA :-

In our data set,

• There are 300 rats and 5 attributes for each rat.

• In the EDA plots, rx appears to have little effect on the time to event.

NON-PARAMETRIC SURVIVAL MODELS

Fitting Kaplan-Meier Model (without considering categorical variables) :-

In this model, all records are treated as a single group; the categorical variables rx (0 / 1) and sex (f / m) are not considered.

Overview of the fitted Kaplan-Meier model is as follows.

## Call: survfit(formula = attrib, data = df, type = "kaplan-meier")
## 
##       n  events  median 0.95LCL 0.95UCL 
##     300      42      NA      NA      NA

Survival Table :-

summary(km_model)

## Call: survfit(formula = attrib, data = df, type = "kaplan-meier")
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    34    298       1    0.997 0.00335        0.990        1.000
##    39    297       1    0.993 0.00473        0.984        1.000
##    40    295       1    0.990 0.00579        0.979        1.000
##    45    294       1    0.987 0.00668        0.974        1.000
##    49    292       1    0.983 0.00746        0.969        0.998
##    50    290       1    0.980 0.00817        0.964        0.996
##    54    285       1    0.976 0.00883        0.959        0.994
##    55    282       1    0.973 0.00946        0.955        0.992
##    64    274       1    0.969 0.01007        0.950        0.989
##    66    271       1    0.966 0.01065        0.945        0.987
##    67    270       1    0.962 0.01119        0.940        0.984
##    68    267       1    0.959 0.01172        0.936        0.982
##    70    263       1    0.955 0.01222        0.931        0.979
##    71    261       1    0.951 0.01271        0.927        0.977
##    72    259       1    0.948 0.01318        0.922        0.974
##    73    257       2    0.940 0.01408        0.913        0.968
##    75    251       1    0.936 0.01451        0.908        0.965
##    77    245       1    0.933 0.01494        0.904        0.962
##    78    238       1    0.929 0.01539        0.899        0.959
##    79    235       1    0.925 0.01582        0.894        0.956
##    80    230       2    0.917 0.01667        0.885        0.950
##    81    225       2    0.909 0.01749        0.875        0.944
##    84    215       2    0.900 0.01832        0.865        0.937
##    86    209       1    0.896 0.01873        0.860        0.933
##    88    202       1    0.891 0.01916        0.855        0.930
##    89    198       2    0.882 0.02000        0.844        0.922
##    92    176       1    0.877 0.02050        0.838        0.919
##    94    169       1    0.872 0.02103        0.832        0.914
##    96    158       2    0.861 0.02216        0.819        0.906
##   101    142       1    0.855 0.02282        0.812        0.901
##   102    139       2    0.843 0.02409        0.797        0.891
##   103    113       3    0.820 0.02669        0.770        0.874
##   104    108       1    0.813 0.02751        0.761        0.869
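The following sketch shows how a table like this one can be produced and how its first rows follow from the product-limit formula. It assumes df is the rats data from the survival package with a numeric status column. The formula stored in attrib is a guess at what the report used, and s_34, s_39 and km_first_two are hypothetical names. The type argument is left out because Kaplan-Meier is the default estimator of survfit().

```r
library(survival)

# Assumption: df is the rats data shipped with the survival package
df <- survival::rats

# Intercept-only formula: every rat belongs to one single group
attrib <- Surv(time, status) ~ 1
km_model <- survfit(attrib, data = df)

# Product-limit estimate by hand for the first two event times in the table
s_34 <- 1 - 1 / 298            # one tumor among 298 rats at risk at time 34
s_39 <- s_34 * (1 - 1 / 297)   # one tumor among 297 rats at risk at time 39

# The same two values taken from the fitted model
km_first_two <- summary(km_model, times = c(34, 39))$surv

print(round(c(s_34, s_39), 3))
print(round(km_first_two, 3))
```

Each row of the table multiplies the previous survival estimate by the fraction of at-risk rats that stayed tumor-free at that time, which is why the curve only drops at event times.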

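Kaplan-Meier curve :-

No notable pattern is visible in this curve: the estimated survival declines gradually and is still 0.813 at the last event time (104). Because it never falls below 0.5, the median survival time is reported as NA.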

Fitting Kaplan-Meier Model (considering sex category variable) :-

In this model, the records are split into two groups based on the sex value of each record.

The overview of the fitted Kaplan-Meier model is as follows.

## Call: survfit(formula = attrib, data = df, type = "kaplan-meier")
## 
##         n events median 0.95LCL 0.95UCL
## sex=f 150     40     NA      NA      NA
## sex=m 150      2     NA      NA      NA

Survival Table :-

## Call: survfit(formula = attrib, data = df, type = "kaplan-meier")
## 
##                 sex=f 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    34    150       1    0.993 0.00664        0.980        1.000
##    39    149       1    0.987 0.00937        0.968        1.000
##    40    148       1    0.980 0.01143        0.958        1.000
##    45    147       1    0.973 0.01315        0.948        0.999
##    49    145       1    0.967 0.01468        0.938        0.996
##    50    143       1    0.960 0.01606        0.929        0.992
##    54    142       1    0.953 0.01731        0.920        0.988
##    55    141       1    0.946 0.01846        0.911        0.983
##    64    138       1    0.939 0.01956        0.902        0.979
##    66    137       1    0.933 0.02058        0.893        0.974
##    67    136       1    0.926 0.02154        0.884        0.969
##    68    135       1    0.919 0.02245        0.876        0.964
##    70    132       1    0.912 0.02333        0.867        0.959
##    72    130       1    0.905 0.02418        0.859        0.954
##    73    128       2    0.891 0.02579        0.842        0.943
##    77    119       1    0.883 0.02664        0.833        0.937
##    78    114       1    0.876 0.02751        0.823        0.931
##    79    112       1    0.868 0.02835        0.814        0.925
##    80    108       2    0.852 0.03002        0.795        0.913
##    81    105       2    0.835 0.03156        0.776        0.900
##    84     99       2    0.819 0.03310        0.756        0.886
##    86     96       1    0.810 0.03384        0.746        0.879
##    88     93       1    0.801 0.03458        0.736        0.872
##    89     91       2    0.784 0.03599        0.716        0.858
##    92     82       1    0.774 0.03680        0.705        0.850
##    94     79       1    0.764 0.03761        0.694        0.842
##    96     74       2    0.744 0.03933        0.670        0.825
##   101     69       1    0.733 0.04021        0.658        0.816
##   102     67       2    0.711 0.04188        0.634        0.798
##   103     64       3    0.678 0.04412        0.597        0.770
##   104     60       1    0.666 0.04481        0.584        0.760
## 
##                 sex=m 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    71    131       1    0.992  0.0076        0.978            1
##    75    129       1    0.985  0.0108        0.964            1

Kaplan-Meier curve :-
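Curves for the two groups could be drawn along these lines. This is a sketch, not the report's own plotting code: it assumes df is the rats data from the survival package, and the name km_model_sex, the colors and the axis labels are illustrative choices.

```r
library(survival)

# Assumption: df is the rats data shipped with the survival package
df <- survival::rats

# One Kaplan-Meier curve per level of sex
km_model_sex <- survfit(Surv(time, status) ~ sex, data = df)

# Draw both curves without confidence bands; the y-axis is zoomed in
# because the estimated survival stays above 0.6 in both groups
plot(km_model_sex, col = c("red", "blue"), conf.int = FALSE,
     ylim = c(0.6, 1), xlab = "Time", ylab = "Tumor-free probability")
legend("bottomleft", legend = names(km_model_sex$strata),
       col = c("red", "blue"), lty = 1)
```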

Comparing two KM-Curves :-

The log-rank test is a hypothesis test that compares the survival distributions of two samples. It is a non-parametric test, so it is well suited to the Kaplan-Meier estimator, which is also non-parametric.

Null Hypothesis : Survival in the two groups is the same.
Alternative Hypothesis : Survival in the two groups is not the same.

## Call:
## survdiff(formula = attrib, data = df)
## 
##         N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=f 150       40     20.6      18.1      35.9
## sex=m 150        2     21.4      17.5      35.9
## 
##  Chisq= 35.9  on 1 degrees of freedom, p= 2e-09

The p-value (2e-09) is far below 0.05, so the null hypothesis is rejected: survival differs between female and male rats.
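The test above can be reproduced with survdiff(), and the p-value recovered from the chi-square statistic. A minimal sketch, assuming df is the rats data from the survival package; lr_sex and p_sex are hypothetical names.

```r
library(survival)

# Assumption: df is the rats data shipped with the survival package
df <- survival::rats

# Log-rank test comparing the survival of female and male rats
lr_sex <- survdiff(Surv(time, status) ~ sex, data = df)

# The statistic is compared with a chi-square distribution on
# (number of groups - 1) degrees of freedom
p_sex <- pchisq(lr_sex$chisq, df = length(lr_sex$n) - 1, lower.tail = FALSE)

print(lr_sex$obs)             # observed tumors per group
print(round(lr_sex$exp, 1))   # tumors expected if the groups did not differ
print(signif(p_sex, 1))
```

The gap between observed and expected counts drives the statistic: if survival were the same in both groups, roughly equal numbers of tumors would be expected in each, whereas almost all of them occurred in females.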

Fitting Kaplan-Meier Model (considering rx category variable) :-

In this model, the records are split into two groups based on the rx value of each record (0 = control, 1 = drug).

The overview of the fitted Kaplan-Meier model is as follows.

## Call: survfit(formula = attrib, data = df, type = "kaplan-meier")
## 
##        n events median 0.95LCL 0.95UCL
## rx=0 200     21     NA      NA      NA
## rx=1 100     21     NA      NA      NA

Survival Table :-

summary(km_model_rx)

## Call: survfit(formula = attrib, data = df, type = "kaplan-meier")
## 
##                 rx=0 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    40    198       1    0.995 0.00504        0.985        1.000
##    49    196       1    0.990 0.00712        0.976        1.000
##    50    195       1    0.985 0.00871        0.968        1.000
##    54    191       1    0.980 0.01008        0.960        1.000
##    55    188       1    0.974 0.01129        0.953        0.997
##    64    184       1    0.969 0.01241        0.945        0.994
##    66    182       1    0.964 0.01343        0.938        0.991
##    68    181       1    0.958 0.01438        0.931        0.987
##    71    176       1    0.953 0.01529        0.924        0.983
##    73    173       1    0.948 0.01617        0.916        0.980
##    75    168       1    0.942 0.01702        0.909        0.976
##    77    164       1    0.936 0.01786        0.902        0.972
##    78    158       1    0.930 0.01871        0.894        0.968
##    79    156       1    0.924 0.01951        0.887        0.963
##    81    149       2    0.912 0.02113        0.871        0.954
##    84    142       2    0.899 0.02270        0.856        0.945
##    96    111       1    0.891 0.02390        0.845        0.939
##   101     98       1    0.882 0.02533        0.834        0.933
##   102     96       1    0.873 0.02668        0.822        0.927
## 
##                 rx=1 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##    34     99       1    0.990  0.0100        0.970        1.000
##    39     98       1    0.980  0.0141        0.952        1.000
##    45     97       1    0.970  0.0172        0.937        1.000
##    67     89       1    0.959  0.0202        0.920        0.999
##    70     86       1    0.948  0.0228        0.904        0.993
##    72     85       1    0.937  0.0251        0.889        0.987
##    73     84       1    0.925  0.0272        0.874        0.980
##    80     78       2    0.902  0.0312        0.842        0.965
##    86     72       1    0.889  0.0332        0.826        0.957
##    88     67       1    0.876  0.0353        0.809        0.948
##    89     64       2    0.848  0.0391        0.775        0.929
##    92     54       1    0.833  0.0414        0.755        0.918
##    94     50       1    0.816  0.0438        0.735        0.907
##    96     47       1    0.799  0.0462        0.713        0.895
##   102     43       1    0.780  0.0487        0.690        0.882
##   103     41       3    0.723  0.0552        0.623        0.840
##   104     38       1    0.704  0.0569        0.601        0.825

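Kaplan-Meier curve :-

The graph shows no striking separation between the curves, although at the end of follow-up the estimated survival is lower in the drug group (0.704 at time 104) than in the control group (0.873 at time 102).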

Comparing two KM-Curves :-

Null Hypothesis : Survival in the two groups is the same.
Alternative Hypothesis : Survival in the two groups is not the same.

## Call:
## survdiff(formula = attrib, data = df)
## 
##        N Observed Expected (O-E)^2/E (O-E)^2/V
## rx=0 200       21     28.2      1.82      5.55
## rx=1 100       21     13.8      3.71      5.55
## 
##  Chisq= 5.5  on 1 degrees of freedom, p= 0.02

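The p-value (0.02) is below 0.05, so the null hypothesis is rejected at the 5% level, although the evidence is much weaker than it is for sex.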

Conclusion of Non-parametric models :-

As both p-values are below 0.05, we can reject the null hypothesis in both tests. The groups defined by sex and by rx are statistically different, so survival differs between the groups in each case.

The graphs alone do not show these differences clearly, so the conclusion rests on the log-rank tests rather than on visual inspection.

SEMI-PARAMETRIC SURVIVAL MODELS

Fitting Cox PH Model :-

In this model, all records and all covariates (litter, rx and sex) are used.

Overview of the fitted Cox PH model is as follows.

## Call:
## coxph(formula = attrib, data = df)
## 
##             coef exp(coef)  se(coef)      z        p
## litter  0.008465  1.008501  0.005344  1.584  0.11315
## rx1     0.805296  2.237359  0.309431  2.603  0.00925
## sexm   -3.085125  0.045724  0.724932 -4.256 2.08e-05
## 
## Likelihood ratio test=52.58  on 3 df, p=2.252e-11
## n= 300, number of events= 42

The hazard ratios, exp(coef), are shown in the first (unlabeled) column below, followed by their 95% confidence limits.

##                        2.5 %    97.5 %
## litter 1.00850125 0.99799405 1.0191191
## rx1    2.23735897 1.21996509 4.1032118
## sexm   0.04572433 0.01104292 0.1893262
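The sketch below shows one way to arrive at a table like this and checks the rx1 interval by hand. It assumes df is the rats data from the survival package with rx converted to a factor. The formula is inferred from the coefficient names in the output, and cox_model, hr_table and rx1_ci are hypothetical names.

```r
library(survival)

# Assumption: df is the rats data shipped with the survival package
df <- survival::rats
df$rx <- factor(df$rx)   # treatment as a factor, as in the EDA section

cox_model <- coxph(Surv(time, status) ~ litter + rx + sex, data = df)

# Hazard ratios with their 95% confidence limits
hr_table <- cbind(HR = exp(coef(cox_model)), exp(confint(cox_model)))
print(round(hr_table, 3))

# The rx1 interval by hand, using coef and se(coef) from the printed summary:
# exp(coef +/- z * se), where z is the 97.5% normal quantile
b  <- 0.805296
se <- 0.309431
rx1_ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)
print(round(rx1_ci, 3))
```

A hazard ratio is the exponential of a coefficient, so an interval that excludes 1 corresponds to a coefficient interval that excludes 0.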

Conclusion of Semi-parametric models :-

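As the hazard ratio for rx1 is 2.24 (95% CI 1.22 to 4.10), the drug group (rx = 1) has roughly 2.2 times the tumor hazard of the control group (rx = 0).

As the hazard ratio for sexm is 0.046 (95% CI 0.011 to 0.189), male rats have a far lower tumor hazard than female rats (sex = f, the reference group).

The hazard ratio for litter is 1.008 with a confidence interval that includes 1 (p = 0.113), so litter number shows no significant effect.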

SURVIVAL TREES

Fitting Conditional inference tree :-

In this model, all records and all covariates (litter, rx and sex) are used.

Overview of the fitted conditional inference tree model is as follows.

## 
##   Conditional inference tree with 3 terminal nodes
## 
## Response:  Surv(time, status) 
## Inputs:  litter, rx, sex 
## Number of observations:  300 
## 
## 1) sex == {m}; criterion = 1, statistic = 35.839
##   2)*  weights = 150 
## 1) sex == {f}
##   3) rx == {0}; criterion = 0.991, statistic = 8.707
##     4)*  weights = 100 
##   3) rx == {1}
##     5)*  weights = 50
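A tree of this kind could be fitted as sketched below. The report does not name the package it used; the printed layout resembles the output of ctree() from the party package, so that package is assumed here. df is assumed to be the rats data from the survival package, and tree_model is a hypothetical name.

```r
library(survival)
library(party)   # assumption: the report's tree was fitted with party::ctree

# Assumption: df is the rats data shipped with the survival package
df <- survival::rats
df$rx  <- factor(df$rx)    # treatment as a factor, as in the EDA section
df$sex <- factor(df$sex)   # no effect if sex is already a factor

# Conditional inference tree with a censored survival response
tree_model <- ctree(Surv(time, status) ~ litter + rx + sex, data = df)
print(tree_model)

# Number of rats that fall into each terminal node
print(table(where(tree_model)))
```

The tree splits first on sex and then, among females only, on rx, which agrees with the log-rank and Cox results: sex is the strongest factor, and the treatment effect shows up within the group where most tumors occur.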

Anil Kumar Kanasani - 11013622

2021-02-15
