The Waltons.csv contains genotypes and number of days survived in Drosophila. Column ‘T’ includes days survived. Column ‘E’ indicates events (1- death, 0-censored). Column ‘group’ indicates genotypes. For this experiment we would Fit a Weibull distribution to miR-137 group and control group respectively and provide the following results using PYTHON.
Write down the Weibull distribution model, provide the model parameters and their upper and lower bounds
Compute survival probability at 10 days and 30 days for miR-137 and control groups respectively.
Plot the survival curve and hazard function.
Discuss our findings.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from lifelines import WeibullFitter
from lifelines import KaplanMeierFitter
dat=pd.read_csv("Desktop/waltons.csv")
display(dat.head())
T=dat['Time']
E=dat['E']
w=dat['weight']
wf=WeibullFitter()
wf.fit(T,E,weights=w)
display(wf.summary)
wf.plot_survival_function()
plt.xlabel('Days Survived')
plt.ylabel('Survival Prob')
plt.show()
wf.plot_hazard()
plt.xlabel('Days Survived')
plt.ylabel('Hazard Rate')
plt.show()
# predicting weibull survival estimates at 10 and 30 days respectively
wf.predict([10,30])
| Time | E | group | weight | |
|---|---|---|---|---|
| 0 | 6 | 1 | miR-137 | 1 |
| 1 | 13 | 1 | miR-137 | 1 |
| 2 | 13 | 1 | miR-137 | 1 |
| 3 | 13 | 1 | miR-137 | 1 |
| 4 | 19 | 1 | miR-137 | 1 |
| coef | se(coef) | coef lower 95% | coef upper 95% | cmp to | z | p | -log2(p) | |
|---|---|---|---|---|---|---|---|---|
| lambda_ | 55.733097 | 1.326504 | 53.133196 | 58.332998 | 1.0 | 41.261153 | 0.000000e+00 | inf |
| rho_ | 3.450505 | 0.243323 | 2.973601 | 3.927410 | 1.0 | 10.070989 | 7.422812e-24 | 76.834308 |
png
png
10 0.997340
30 0.888706
Name: Weibull_estimate, dtype: float64
# spliting the groups into miR-137 and control group to examine the survival function
# miR-137 group
dat2=dat[dat['group']=='miR-137'] # the code subsets miR-137 group
display(dat2.head())
T2=dat2['Time']
E2=dat2['E']
w2=dat2['weight']
wf2=WeibullFitter()
wf2.fit(T2,E2,weights=w2)
display(wf2.summary)
# predicting weibull survival estimates at 10 and 30 days respectively for miR-137 group
wf2.predict([10,30])
| Time | E | group | weight | |
|---|---|---|---|---|
| 0 | 6 | 1 | miR-137 | 1 |
| 1 | 13 | 1 | miR-137 | 1 |
| 2 | 13 | 1 | miR-137 | 1 |
| 3 | 13 | 1 | miR-137 | 1 |
| 4 | 19 | 1 | miR-137 | 1 |
| coef | se(coef) | coef lower 95% | coef upper 95% | cmp to | z | p | -log2(p) | |
|---|---|---|---|---|---|---|---|---|
| lambda_ | 29.110312 | 2.547545 | 24.117216 | 34.103409 | 1.0 | 11.034274 | 2.611506e-28 | 91.629104 |
| rho_ | 2.073407 | 0.264260 | 1.555467 | 2.591347 | 1.0 | 4.061935 | 4.866754e-05 | 14.326681 |
10 0.896637
20 0.631790
Name: Weibull_estimate, dtype: float64
# control group
dat1=dat[dat['group']=='control'] # the code subsets control group
display(dat1.head())
T1=dat1['Time']
E1=dat1['E']
w1=dat1['weight']
wf1=WeibullFitter()
wf1.fit(T1,E1,weights=w1)
display(wf1.summary)
# predicting weibull survival estimates at 10 and 30 days respectively for control group
wf1.predict([10,30])
| Time | E | group | weight | |
|---|---|---|---|---|
| 34 | 33 | 1 | control | 1 |
| 35 | 54 | 1 | control | 1 |
| 36 | 54 | 1 | control | 1 |
| 37 | 61 | 1 | control | 1 |
| 38 | 61 | 1 | control | 1 |
| coef | se(coef) | coef lower 95% | coef upper 95% | cmp to | z | p | -log2(p) | |
|---|---|---|---|---|---|---|---|---|
| lambda_ | 60.599957 | 0.849588 | 58.934796 | 62.265118 | 1.0 | 70.151640 | 0.000000e+00 | inf |
| rho_ | 6.652041 | 0.496830 | 5.678272 | 7.625809 | 1.0 | 11.376211 | 5.493491e-30 | 97.20012 |
10 0.999994
30 0.990737
Name: Weibull_estimate, dtype: float64
The model parameters for both groups are different,
The upper and lower bound values for control group is higher when compared with miR-137.
The lambda and beta coefficients are higher for control when compared with miR-137 genotype.
The P-value is significantly low for both genotypes
# Comparing survival curve and hazard rate curve of both miR-137 and control group
wf2.plot_survival_function(label='miR-137')
wf1.plot_survival_function(label='control')
plt.xlabel('Days Survived')
plt.ylabel('Survival Prob')
plt.show()
wf2.plot_hazard(label='miR-137')
wf1.plot_hazard(label='control')
plt.xlabel('Days Survived')
plt.ylabel('Hazard Rate')
plt.show()
png
png
There is a difference in the curve for both groups.
The survival curve for miR-137 genotype indicates it survived in early days of Drosophila between 0 and 20days,
but starts to decline from 25days with low survival rates when compared with the control genotype which survived up to 30days mark before declining at 50days.
The hazard rate for both groups is low.
# kaplanMeier
# This is a non-parametric statistic estimator used to estimate survival function and
# is mainly a descriptive tool for point estimates
# Ploting kaplanMeier of both groups
km=KaplanMeierFitter()
#miR-137
km.fit(T2,E2,weights=w2)
km.plot(label='miR-137')
#control
km.fit(T1,E1,weights=w1)
km.plot(label='control')
<AxesSubplot:xlabel='timeline'>
png
# kaplanMeier
# Comparing the curves of both weibull and kaplanmeier, we can see that the curves are almost identical.
# kaplanMeier
km=KaplanMeierFitter()
km.fit(T,E,weights=w)
km.plot(label='kaplanMeier')
#weibull distribution
wf.plot_survival_function()
plt.xlabel('Days Survived')
plt.ylabel('Survival Prob')
plt.show()
png