Risk Analysis of Covid-19 Patient Mortality Using Extended Cox Model
Survival analysis is a collection of methods for evaluate data related to time from start to finish a special event (can be death/recovery) with given censorship status to be a comparison of endurance ability human life (Cread,2020).
Berdasarkan definisi tersebut, penelitian ini memilih analisis survival untuk mengetahui penyebab kematian pasien covid 19. Penelitian ini menggunakan data pasien covid -19 yang diperoleh dari Mexican Government, dengan variabel respon yaitu time dan status (tersensor atau tidak) dan variabel prediktor yaitu hasil laboratorium pasien berupa riwayat penyakit yang pernah diderita pasien Covid-19. Data yang akan dianalisis terlalu besar dan heterogen sehingga dilakukan clustering menggunakan algoritma K-Means Clustering.Penelitian akan melakukan pengelompokan data pasien covid-19 berdasarakan tingkat keparahan yaitu terdiri dari cluster pasien tidak parah dan pasien yang parah, sehingga hasil analisis akan lebih mengerucut sesuai dengan cluster tertentu dan memperjelas kesimpulan. Setelah melakukan clustering pada data pasien covid-19, dilanjutkan dengan melakukan analisis survival pada dua cluster yang terbentuk yaitu menerapkan model extended cox yang diestimasi menggunakan metode Breslow partial likelihood dan yang terakhir dilakukan uji kesesuaian model.
Penelitian analisis risiko kematian pasien covid-19 ini diharapkan memberikan insight mengenai factor-faktor yang mempengaruhi kematian pasien covid-19 sehingga membantu tenaga medis dalam memberikan treatment yang sesuai bagi pasien yang memiliki faktor tersebut. Berikut ini merupakan flowchart penelitian ini
1. Data Preparation
Pada Tahapan ini dilakuan mempersiapkan library yang digunakan dan membaca data untuk penelitian ini
CovidData<-read.csv2("covid.csv")
head(CovidData) id sex patient_type entry_date date_symptoms date_died intubed
1 1636bf 0 2 12/01/2020 09/01/2020 13/01/2020 1
2 7.98E+02 0 2 12/01/2020 12/01/2020 14/01/2020 2
3 1673d6 1 2 13/01/2020 10/01/2020 15/01/2020 2
4 1045fa 1 2 12/01/2020 12/01/2020 29/01/2020 2
5 1abb1c 1 2 12/01/2020 04/01/2020 30/01/2020 2
6 09ab8c 0 2 06/02/2020 05/02/2020 08/02/2020 2
pneumonia age pregnancy diabetes copd asthma inmsupr hypertension
1 1 48 0 0 0 0 0 0
2 1 74 0 0 0 0 0 0
3 1 45 0 98 0 98 98 0
4 1 75 0 0 0 0 0 1
5 1 54 0 1 0 0 0 0
6 1 55 0 0 1 1 0 1
other_disease cardiovascular obesity renal_chronic tobacco
1 2 0 0 0 0
2 2 1 0 0 1
3 2 0 0 0 0
4 1 0 1 0 0
5 2 0 1 0 0
6 2 1 1 0 0
contact_other_covid covid_res icu
1 99 2 2
2 99 2 2
3 99 2 1
4 99 2 2
5 99 2 1
6 99 2 1
2. Data Identification
Pada penelitian ini, tidak menggunakan semua kolom pada data covid tersebut, Sehingga membuang variabel yang tidak dibutuhkan dengan melakukan subseting.
CovidData=CovidData[,-c(1,3,4,7,16,21,22,23)]
The following is a description of each column:
1. Sex: Identifies the sex of the patient (0: male & 1: female).
2. date_symptoms: Identifies the date on which the patient’s symptoms began.
3. data_died: Identifies the date the patient died.
4. pneumonia:Identifies if the patient was diagnosed with pneumonia (0: No & 1: yes).
5. age: Identifies the age of the patient.
6. pregnancy: Identifies if the patient is pregnant (0:No & 1:Yes).
7. diabetes: Identifies if the patient has a diagnosis of diabetes (0: No & 1: Yes).
8. copd: Identifies if the patient has a diagnosis of Chronic obstructive pulmonary disease (COPD) (0: No & 1:Yes).
9. asthma: Identifies if the patient has a diagnosis of asthma (0: No & 1: Yes).
10. inmsupr:Identifies if the patient has immunosuppression (0: No & 1: Yes) 11. hypertension:Identifies if the patient has a diagnosis of hypertension (0:No & 1:Yes).
12. cardiovascular: Identifies if the patient has a diagnosis of cardiovascular disease (0: No & 1: Yes).
13. obesity: Identifies if the patient is diagnosed with obesity (0: No & 1: Yes).
14. renal_chronic: Identifies if the patient has a diagnosis of chronic kidney failure (0: No & 1: Yes).
15. tobacco: Identifies if the patient is a tobacco user (0: No & 1: Yes).
3. Data Acquistion
Data acquistion is done so that the data format is in accordance with the objectives of the analysis.
3.1. Check Type Data
str(CovidData)'data.frame': 125650 obs. of 15 variables:
$ sex : int 0 0 1 1 1 0 0 1 1 0 ...
$ date_symptoms : chr "09/01/2020" "12/01/2020" "10/01/2020" "12/01/2020" ...
$ date_died : chr "13/01/2020" "14/01/2020" "15/01/2020" "29/01/2020" ...
$ pneumonia : int 1 1 1 1 1 1 1 1 1 1 ...
$ age : int 48 74 45 75 54 55 63 68 32 15 ...
$ pregnancy : int 0 0 0 0 0 0 0 0 0 0 ...
$ diabetes : int 0 0 98 0 1 0 0 0 0 0 ...
$ copd : int 0 0 0 0 0 1 0 0 0 0 ...
$ asthma : int 0 0 98 0 0 1 0 0 0 0 ...
$ inmsupr : int 0 0 98 0 0 0 0 0 1 0 ...
$ hypertension : int 0 0 0 1 0 1 1 1 1 0 ...
$ cardiovascular: int 0 1 0 0 0 1 0 0 0 0 ...
$ obesity : int 0 0 0 1 1 1 0 1 0 0 ...
$ renal_chronic : int 0 0 0 0 0 0 0 0 1 0 ...
$ tobacco : int 0 1 0 0 0 0 0 0 0 0 ...
Perhatikan hasil pengecekan tipe diatas, date_symptoms dan date_died masih dalam bentuk chr seharusnya dalam bentuk date. variabel pneumonia, pregnancy, diabetes, copd, asthma, inmsupr, hypertension, cardiovascular, obesity, renal_chronic, tobaco seharusnya juga dalam bentuk factor namun karena melakukan analisis survival langsung bisa diidentifkasi sehingga tidak perlu diubah kedalam bentuk tipe data factor.
CovidData$date_died <- as.Date(CovidData$date_died, format = "%d/%m/%y")
CovidData$date_symptoms <- as.Date(CovidData$date_symptoms, format = "%d/%m/%y")
str(CovidData)'data.frame': 125650 obs. of 15 variables:
$ sex : int 0 0 1 1 1 0 0 1 1 0 ...
$ date_symptoms : Date, format: "2020-01-09" "2020-01-12" ...
$ date_died : Date, format: "2020-01-13" "2020-01-14" ...
$ pneumonia : int 1 1 1 1 1 1 1 1 1 1 ...
$ age : int 48 74 45 75 54 55 63 68 32 15 ...
$ pregnancy : int 0 0 0 0 0 0 0 0 0 0 ...
$ diabetes : int 0 0 98 0 1 0 0 0 0 0 ...
$ copd : int 0 0 0 0 0 1 0 0 0 0 ...
$ asthma : int 0 0 98 0 0 1 0 0 0 0 ...
$ inmsupr : int 0 0 98 0 0 0 0 0 1 0 ...
$ hypertension : int 0 0 0 1 0 1 1 1 1 0 ...
$ cardiovascular: int 0 1 0 0 0 1 0 0 0 0 ...
$ obesity : int 0 0 0 1 1 1 0 1 0 0 ...
$ renal_chronic : int 0 0 0 0 0 0 0 0 1 0 ...
$ tobacco : int 0 1 0 0 0 0 0 0 0 0 ...
3.2. Feature Engineering
Buat kolom waktu sebagai variabel respon dalam penelitian ini, variabel ini dipeorleh dari tanggal kematian pasien dikurangi dengan tanggal pertama kali positif Covid-19. secara matematika dituliskan seperti berikut
\[ waktu = datedied - date symptoms \]
CovidData$waktu<-CovidData$date_died - CovidData$date_symptoms
#lalu diubah ketipe data integer
CovidData$waktu<-as.integer(CovidData$waktu)
head(CovidData) sex date_symptoms date_died pneumonia age pregnancy diabetes copd asthma
1 0 2020-01-09 2020-01-13 1 48 0 0 0 0
2 0 2020-01-12 2020-01-14 1 74 0 0 0 0
3 1 2020-01-10 2020-01-15 1 45 0 98 0 98
4 1 2020-01-12 2020-01-29 1 75 0 0 0 0
5 1 2020-01-04 2020-01-30 1 54 0 1 0 0
6 0 2020-02-05 2020-02-08 1 55 0 0 1 1
inmsupr hypertension cardiovascular obesity renal_chronic tobacco waktu
1 0 0 0 0 0 0 4
2 0 0 1 0 0 1 2
3 98 0 0 0 0 0 5
4 0 1 0 1 0 0 17
5 0 0 0 1 0 0 26
6 0 1 1 1 0 0 3
Kemudian membuat kolom status sebagai variabel respon dalam penelitian ini
CovidData = CovidData %>%
mutate(status = ifelse(is.na(waktu), 0, 1))
head(CovidData) sex date_symptoms date_died pneumonia age pregnancy diabetes copd asthma
1 0 2020-01-09 2020-01-13 1 48 0 0 0 0
2 0 2020-01-12 2020-01-14 1 74 0 0 0 0
3 1 2020-01-10 2020-01-15 1 45 0 98 0 98
4 1 2020-01-12 2020-01-29 1 75 0 0 0 0
5 1 2020-01-04 2020-01-30 1 54 0 1 0 0
6 0 2020-02-05 2020-02-08 1 55 0 0 1 1
inmsupr hypertension cardiovascular obesity renal_chronic tobacco waktu
1 0 0 0 0 0 0 4
2 0 0 1 0 0 1 2
3 98 0 0 0 0 0 5
4 0 1 0 1 0 0 17
5 0 0 0 1 0 0 26
6 0 1 1 1 0 0 3
status
1 1
2 1
3 1
4 1
5 1
6 1
4. Data Filtering
Pada tahapan data filtering dilakukan pengecekan missing value dan outlier.
3.1 Check missing value
anyNA(CovidData)[1] TRUE
3.2 Check Outlier
boxplot(CovidData)5. Data Validation
Adanya missing value pada kolom waktu kemudian melakukan imputasi pada nilai yang missing value yaitu mengisi nilai yang kosong tersebut menjadi nilai maksimumnya yaitu 100 karena maksimal waktu pasien mengalami event adalah 93 hari sehingga dibulatkan menjadi 100 hari.
CovidData$waktu[is.na(CovidData$waktu)] <- 100
head(CovidData) sex date_symptoms date_died pneumonia age pregnancy diabetes copd asthma
1 0 2020-01-09 2020-01-13 1 48 0 0 0 0
2 0 2020-01-12 2020-01-14 1 74 0 0 0 0
3 1 2020-01-10 2020-01-15 1 45 0 98 0 98
4 1 2020-01-12 2020-01-29 1 75 0 0 0 0
5 1 2020-01-04 2020-01-30 1 54 0 1 0 0
6 0 2020-02-05 2020-02-08 1 55 0 0 1 1
inmsupr hypertension cardiovascular obesity renal_chronic tobacco waktu
1 0 0 0 0 0 0 4
2 0 0 1 0 0 1 2
3 98 0 0 0 0 0 5
4 0 1 0 1 0 0 17
5 0 0 0 1 0 0 26
6 0 1 1 1 0 0 3
status
1 1
2 1
3 1
4 1
5 1
6 1
Data pada umur sangat beragam sehingga perlu membuat kolom baru untuk mengelompokan variabel age menjadi umur tua dan muda. Menurut WHO, sesorang dikatakan masih muda saat usia $ <= 55$ tahun dan dikatakan tua saat lebih dari 55 tahun.
CovidData = CovidData %>%
mutate(Age = ifelse(age <= 55, 1, 0))
CovidData = CovidData[,-c(2,3,5)]
head(CovidData) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1 0 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0
3 1 1 0 98 0 98 98 0
4 1 1 0 0 0 0 0 1
5 1 1 0 1 0 0 0 0
6 0 1 0 0 1 1 0 1
cardiovascular obesity renal_chronic tobacco waktu status Age
1 0 0 0 0 4 1 1
2 1 0 0 1 2 1 0
3 0 0 0 0 5 1 1
4 0 1 0 0 17 1 0
5 0 1 0 0 26 1 1
6 1 1 0 0 3 1 1
6. Data Cleaning
tahapan selanjutnya adalah menghapus data-data outlier pada kolom pregnancy, copd, asthma, inmsupr, hypertension, cardiovascular, obesity, renal chronic, tobacco,pneumonia, diabetes dan waktu.
CovidData <- subset(CovidData, pregnancy!= 98)
CovidData <- subset(CovidData, copd!= 98)
CovidData <- subset(CovidData, asthma!= 98)
CovidData <- subset(CovidData, inmsupr!= 98)
CovidData <- subset(CovidData, hypertension!= 98)
CovidData <- subset(CovidData, cardiovascular!= 98)
CovidData <- subset(CovidData, obesity!= 98)
CovidData <- subset(CovidData, renal_chronic!= 98)
CovidData <- subset(CovidData, tobacco!= 98)
CovidData <- subset(CovidData, pneumonia!= 99)
CovidData <- subset(CovidData, diabetes!= 98)
CovidData <- subset(CovidData, waktu > 0 )
head(CovidData) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1 0 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0
4 1 1 0 0 0 0 0 1
5 1 1 0 1 0 0 0 0
6 0 1 0 0 1 1 0 1
7 0 1 0 0 0 0 0 1
cardiovascular obesity renal_chronic tobacco waktu status Age
1 0 0 0 0 4 1 1
2 1 0 0 1 2 1 0
4 0 1 0 0 17 1 0
5 0 1 0 0 26 1 1
6 1 1 0 0 3 1 1
7 0 0 0 0 14 1 0
7. Scaling Data
Scaling data dilakukan untuk normalisasi data yang nantinya digunakan untuk proses clustering data. Proses ini dilakukan karena range antara variabel satu dengan lainnya terlalu jauh. Normalisasi ini menggunakan Z-scores.
CovidDataScale<- scale(CovidData)
head(CovidDataScale) sex pneumonia pregnancy diabetes copd asthma inmsupr
1 -0.8752406 1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
2 -0.8752406 1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
4 1.1425337 1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
5 1.1425337 1.438927 -0.07518716 1.9897767 -0.1616108 -0.1652522 -0.1387816
6 -0.8752406 1.438927 -0.07518716 -0.5025649 6.1876555 6.0513084 -0.1387816
7 -0.8752406 1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
hypertension cardiovascular obesity renal_chronic tobacco waktu
1 -0.5659387 -0.1788443 -0.5017843 -0.1858564 -0.2973432 -1.754919
2 -0.5659387 5.5914099 -0.5017843 -0.1858564 3.3630899 -1.804557
4 1.7669615 -0.1788443 1.9928722 -0.1858564 -0.2973432 -1.432273
5 -0.5659387 -0.1788443 1.9928722 -0.1858564 -0.2973432 -1.208902
6 1.7669615 5.5914099 1.9928722 -0.1858564 -0.2973432 -1.779738
7 1.7669615 -0.1788443 -0.5017843 -0.1858564 -0.2973432 -1.506729
status Age
1 1.58511 0.7196612
2 1.58511 -1.3895316
4 1.58511 -1.3895316
5 1.58511 0.7196612
6 1.58511 0.7196612
7 1.58511 -1.3895316
8. Clustering Data
8.1 K-means Algoritma
Dalam menentukan center \(k\) optimum menggunakan elbow method dan dipilih \(k=5\) karena dari cluster 5 ke 6 penurunan nilai total withinss landai/tidak signifikan
RNGkind(sample.kind = "Rounding")
set.seed(298)
clusterDataCovid <- kmeans(x = CovidDataScale, centers = 5)8.2 Interpretasi/ cluster profiling
CovidData$cluster <- as.factor(clusterDataCovid$cluster)
head(CovidData) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1 0 1 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0
4 1 1 0 0 0 0 0 1
5 1 1 0 1 0 0 0 0
6 0 1 0 0 1 1 0 1
7 0 1 0 0 0 0 0 1
cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1 0 0 0 0 4 1 1 1
2 1 0 0 1 2 1 0 5
4 0 1 0 0 17 1 0 1
5 0 1 0 0 26 1 1 1
6 1 1 0 0 3 1 1 4
7 0 0 0 0 14 1 0 1
CovidProfile <- CovidData %>%
group_by(cluster) %>%
summarise_all(.funs = "mean")
head(CovidProfile)# A tibble: 5 x 16
cluster sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.343 0.742 0.00125 0.382 0 0 0 0.421
2 2 0.463 0.148 0.00765 0.113 0 0 0 0.145
3 3 0.543 0.348 0.00635 0.233 0.00881 0.620 0.405 0.276
4 4 0.439 0.568 0 0.419 1 0.0549 0.0562 0.544
5 5 0.412 0.515 0.00119 0.421 0 0.0288 0.0567 0.629
# ... with 7 more variables: cardiovascular <dbl>, obesity <dbl>,
# renal_chronic <dbl>, tobacco <dbl>, waktu <dbl>, status <dbl>, Age <dbl>
CovidProfile %>%
tidyr::pivot_longer(-cluster) %>%
group_by(name) %>%
summarize(cluster_min_val = which.min(value),
cluster_max_val = which.max(value))# A tibble: 15 x 3
name cluster_min_val cluster_max_val
<chr> <int> <int>
1 Age 4 2
2 asthma 1 3
3 cardiovascular 1 5
4 copd 1 4
5 diabetes 2 5
6 hypertension 2 5
7 inmsupr 1 3
8 obesity 2 5
9 pneumonia 2 1
10 pregnancy 4 2
11 renal_chronic 2 5
12 sex 1 3
13 status 2 1
14 tobacco 2 4
15 waktu 1 2
Keterangan
1. Cluster 1: Pada Cluster 1 bisa dikatakan Cluster darurat karena pada cluster ini hampir 90% mengalami kematian. Komorbid yang dijumpai dalam cluster ini adalah pneumonia sementara tidak pasien yang memiliki komorbid asthma, cardivascular, copd dan inmsupr.
2. Cluster 2: Pada Cluster 2 berbanding terbalik denga cluster 1 yaitu 99% pasien pada cluster ini sembuh atau survive dari Covid-19. Pada Cluster ini lebih dari 50% pasien berusia muda dan semua pasien tidak dalam keadaan hamil.
3. Cluster 3: Pada cluster 3 komorbid yang sering dijumpai adalah asthma dan banyak pasien yang menalami penurunan imun tubuh.
4. Cluster 4: Pada Cluster 4 komorbid yang sering dijumpai adalah copd.
5. Cluster 5: Pada cluster 5 komorbid yang sering dijumpai adalah cardiovascular, hypertension, diabetes, renal_chronic.
8.3 Visualize clustering
library(factoextra)
fviz_cluster(object = clusterDataCovid,
data = CovidDataScale)9. Analisis Survival
Anlisis survival akan diterapkan pada 4 cluster yaitu cluster 1, cluster 3, cluster 4 dan cluster 5. Cluster 2 tidak dikutkan karena pada cluster 2 hampir semua pasien sembuh sementara dalam penelitian ini event berupa kematian.
9.1 Cluster 1
DataCluster1 <- CovidData %>%
filter(cluster == 1)
DataCluster1[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster1[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)
summary(DataCluster1) sex pneumonia pregnancy diabetes copd asthma inmsupr
0:20487 0: 8041 0:31141 0:19272 0:31180 0:31180 0:31180
1:10693 1:23139 1: 39 1:11908
hypertension cardiovascular obesity renal_chronic tobacco waktu
0:18056 0:31180 0:24044 0:28990 0:28648 Min. : 1.00
1:13124 1: 7136 1: 2190 1: 2532 1st Qu.: 6.00
Median : 10.00
Mean : 14.99
3rd Qu.: 16.00
Max. :100.00
status Age cluster
0: 1301 0:20457 1:31180
1:29879 1:10723 2: 0
3: 0
4: 0
5: 0
Berdasarkan summary dari DataCluster1 diperoleh tidak ada pasien yang memiliki komorbid copd, asthma dan inmsupr. Jika keempat variabel tersebut diikutkan akan mempengarahui model karena nilai disetiap varaibel sama semua yaitu 0 sehingga yang harus dilakukan adalah menghapus keempat variabel tersebut.
DataCluster1 <- CovidData %>%
filter(cluster == 1) %>%
select(-c("copd","asthma",'inmsupr','cardiovascular'))
head(DataCluster1) sex pneumonia pregnancy diabetes hypertension obesity renal_chronic tobacco
1 0 1 0 0 0 0 0 0
2 1 1 0 0 1 1 0 0
3 1 1 0 1 0 1 0 0
4 0 1 0 0 1 0 0 0
5 1 1 0 0 1 1 0 0
6 0 1 0 0 0 0 0 0
waktu status Age cluster
1 4 1 1 1
2 17 1 0 1
3 26 1 1 1
4 14 1 0 1
5 2 1 0 1
6 8 1 1 1
1. Uji proportional hazard
Sebelum dilakukan analisis akan dilakukan pengujian asumsi proportional hazard dengan dua pendekatan yaitu pendekatan grafik dan pendekatan Goodness Of Fit (GOF).
a. Pendekatan Grafik
Variabel sex
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Jenis Kelamin",
legend.labs = c("Laki-Laki", "Perempuan")
)Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Pneumonia",
legend.labs = c("no Pneumonia", "yes pneumonia")
)Variabel Pregnancy
fit<-survfit(Surv(waktu,status)~ pregnancy, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Pregnancy",
legend.labs = c("no pregnancy", "yes pregnancy")
)Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Diabetes",
legend.labs = c("no Diabetes", "yes Diabetes")
)Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Hyperetension",
legend.labs = c("no hypertension", "yes hypertension")
)Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Obesity",
legend.labs = c("no obesity", "yes obesity")
)Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Renal Chronic",
legend.labs = c("no renal chronic", "yes renal chronic")
)variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Tobacco",
legend.labs = c("no tobacco", "yes tobacco")
)Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
legend = "top",
legend.title = "Age",
legend.labs = c("young", "old")
)b. Pendekatan GOF
fit_cluster1<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic, data = DataCluster1)
cek_fitt_all1<-cox.zph(fit_cluster1, transform="km", global=TRUE)
print(cek_fitt_all1) chisq df p
sex 1.17e+02 1 <2e-16
pneumonia 1.03e+01 1 0.0013
Age 1.94e+02 1 <2e-16
pregnancy 2.97e-02 1 0.8632
hypertension 6.34e+02 1 <2e-16
obesity 3.45e-01 1 0.5572
tobacco 7.03e+00 1 0.0080
diabetes 9.30e+02 1 <2e-16
renal_chronic 4.39e+02 1 <2e-16
GLOBAL 1.42e+03 9 <2e-16
Perhatikan hasil pengujian ada beberapa yang tidak memenuhi asumsi proportional hazard yaitu sex, pneumonia, age, hypertension, tobacco, diabetes, renal_chronic. maka ke-tujuh variabel ini akan dinteraksikan dengan fungsi waktu yaitu fungsi heavside pada setiap variabel. Untuk dapat menentukan fungsi heavsidenya adalah dengan melihat grafik survival probabilty.
# VARIABEL SEX
fit<-survfit(Surv(waktu,status)~ sex , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien laki-laki turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehingga fungsi heavsidenya adalah:
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien komorbid pneumonia turun cepat pada hari ke 25, namun setelah hari ke 25 konstan sampai akhir penelitian. sehingga fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
# VARIABEL AGE
fit<-survfit(Surv(waktu,status)~ Age , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien yang berusia tua turun cepat pada hari ke 25, namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
#VARIABE HYPERTENSION
fit<-survfit(Surv(waktu,status)~ hypertension , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien yang hipertensi turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien yang perokok turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien yang diabetes cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
fit<-survfit(Surv(waktu,status)~ renal_chronic , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien yang renal chronic turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
sehingga ke-tujuh variabel tersebut diinterkasikan dengan fungsi waktu masing-masing. namun sebelumnya membuat kolom interval yaitu jika t lebih besar atau sama dengan 25 maka nilanya 25 tapi kalau tidak nilainya 0
DataCluster1= DataCluster1 %>%
mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster1) sex pneumonia pregnancy diabetes hypertension obesity renal_chronic tobacco
1 0 1 0 0 0 0 0 0
2 1 1 0 0 1 1 0 0
3 1 1 0 1 0 1 0 0
4 0 1 0 0 1 0 0 0
5 1 1 0 0 1 1 0 0
6 0 1 0 0 0 0 0 0
waktu status Age cluster interval
1 4 1 1 1 0
2 17 1 0 1 0
3 26 1 1 1 25
4 14 1 0 1 0
5 2 1 0 1 0
6 8 1 1 1 0
# Variabel Pneumonia
t.pneumonia_1 = t(DataCluster1$pneumonia)*(1-(DataCluster1$interval)/25)
t.pneumonia_2 = t(DataCluster1$pneumonia)*(DataCluster1$interval/25)
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)#variabel Sex
t.sex_1 = t(DataCluster1$sex)*(1-(DataCluster1$interval)/25)
t.sex_2 = t(DataCluster1$sex)*(DataCluster1$interval/25)
sex_t1 = t(t.sex_1 )
sex_t2 = t(t.sex_2)# Variabel Sex
t.hypertension_1 = t(DataCluster1$hypertension)*(1-(DataCluster1$interval)/25)
t.hypertension_2 = t(DataCluster1$hypertension)*(DataCluster1$interval/25)
hypertension_t1 = t(t.hypertension_1 )
hypertension_t2 = t(t.hypertension_2)#Variabel Age
t.Age_1 = t(DataCluster1$Age)*(1-(DataCluster1$interval)/25)
t.Age_2 = t(DataCluster1$Age)*(DataCluster1$interval/25)
Age_t1 = t(t.Age_1 )
Age_t2 = t(t.Age_2)### Variabel Tobacco
t.tobacco_1 = t(DataCluster1$tobacco)*(1-(DataCluster1$interval)/25)
t.tobacco_2 = t(DataCluster1$tobacco)*(DataCluster1$interval/25)
tobacco_t1 = t(t.tobacco_1)
tobacco_t2 = t(t.tobacco_2)# Variabel Diabetes
t.diabetes_1 = t(DataCluster1$diabetes)*(1-(DataCluster1$interval)/25)
t.diabetes_2 = t(DataCluster1$diabetes)*(DataCluster1$interval/25)
diabetes_t1 = t(t.diabetes_1 )
diabetes_t2 = t(t.diabetes_2)#Variabel renal chronic
t.renal_chronic_1 =
t(DataCluster1$renal_chronic)*(1-(DataCluster1$interval)/25)
t.renal_chronic_2 =
t(DataCluster1$renal_chronic)*(DataCluster1$interval/25)
renal_chronic_t1 = t(t.renal_chronic_1 )
renal_chronic_t2 = t(t.renal_chronic_2)DataCluster1=cbind(DataCluster1,pneumonia_t1,pneumonia_t2,sex_t1,sex_t2,hypertension_t1,hypertension_t2,Age_t1,Age_t2,tobacco_t1,tobacco_t2,diabetes_t1,diabetes_t2,renal_chronic_t1,renal_chronic_t2)
head(DataCluster1) sex pneumonia pregnancy diabetes hypertension obesity renal_chronic tobacco
1 0 1 0 0 0 0 0 0
2 1 1 0 0 1 1 0 0
3 1 1 0 1 0 1 0 0
4 0 1 0 0 1 0 0 0
5 1 1 0 0 1 1 0 0
6 0 1 0 0 0 0 0 0
waktu status Age cluster interval pneumonia_t1 pneumonia_t2 sex_t1 sex_t2
1 4 1 1 1 0 1 0 0 0
2 17 1 0 1 0 1 0 1 0
3 26 1 1 1 25 0 1 0 1
4 14 1 0 1 0 1 0 0 0
5 2 1 0 1 0 1 0 1 0
6 8 1 1 1 0 1 0 0 0
hypertension_t1 hypertension_t2 Age_t1 Age_t2 tobacco_t1 tobacco_t2
1 0 0 1 0 0 0
2 1 0 0 0 0 0
3 0 0 0 1 0 0
4 1 0 0 0 0 0
5 1 0 0 0 0 0
6 0 0 1 0 0 0
diabetes_t1 diabetes_t2 renal_chronic_t1 renal_chronic_t2
1 0 0 0 0
2 0 0 0 0
3 0 1 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
2. Model Extended Cox with Bresslow Method
Fit_Model_Cluster1<-coxph(Surv(waktu,status)~ pregnancy+obesity+pneumonia_t1+pneumonia_t2+sex_t1+sex_t2+hypertension_t1+hypertension_t2+Age_t1+Age_t2+tobacco_t1+tobacco_t2+diabetes_t1+diabetes_t2+renal_chronic_t1+renal_chronic_t2, data = DataCluster1, method = "breslow")
summary(Fit_Model_Cluster1)Call:
coxph(formula = Surv(waktu, status) ~ pregnancy + obesity + pneumonia_t1 +
pneumonia_t2 + sex_t1 + sex_t2 + hypertension_t1 + hypertension_t2 +
Age_t1 + Age_t2 + tobacco_t1 + tobacco_t2 + diabetes_t1 +
diabetes_t2 + renal_chronic_t1 + renal_chronic_t2, data = DataCluster1,
method = "breslow")
n= 31180, number of events= 29879
coef exp(coef) se(coef) z Pr(>|z|)
pregnancy -0.031851 0.968651 0.160725 -0.198 0.8429
obesity -0.001491 0.998510 0.014045 -0.106 0.9155
pneumonia_t1 0.027843 1.028235 0.013635 2.042 0.0412 *
pneumonia_t2 -1.501625 0.222768 0.041847 -35.883 < 2e-16 ***
sex_t1 0.109628 1.115863 0.012823 8.549 < 2e-16 ***
sex_t2 -0.553374 0.575007 0.052150 -10.611 < 2e-16 ***
hypertension_t1 0.065329 1.067510 0.013242 4.934 8.07e-07 ***
hypertension_t2 -1.259851 0.283696 0.063089 -19.969 < 2e-16 ***
Age_t1 0.063498 1.065558 0.012784 4.967 6.80e-07 ***
Age_t2 -0.723086 0.485253 0.049179 -14.703 < 2e-16 ***
tobacco_t1 0.023668 1.023950 0.021906 1.080 0.2800
tobacco_t2 -0.780017 0.458398 0.092202 -8.460 < 2e-16 ***
diabetes_t1 0.111644 1.118115 0.013187 8.466 < 2e-16 ***
diabetes_t2 -1.415757 0.242742 0.068008 -20.818 < 2e-16 ***
renal_chronic_t1 0.240653 1.272079 0.024665 9.757 < 2e-16 ***
renal_chronic_t2 -0.860596 0.422910 0.126612 -6.797 1.07e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
pregnancy 0.9687 1.0324 0.7069 1.3273
obesity 0.9985 1.0015 0.9714 1.0264
pneumonia_t1 1.0282 0.9725 1.0011 1.0561
pneumonia_t2 0.2228 4.4890 0.2052 0.2418
sex_t1 1.1159 0.8962 1.0882 1.1443
sex_t2 0.5750 1.7391 0.5191 0.6369
hypertension_t1 1.0675 0.9368 1.0402 1.0956
hypertension_t2 0.2837 3.5249 0.2507 0.3210
Age_t1 1.0656 0.9385 1.0392 1.0926
Age_t2 0.4853 2.0608 0.4407 0.5344
tobacco_t1 1.0240 0.9766 0.9809 1.0689
tobacco_t2 0.4584 2.1815 0.3826 0.5492
diabetes_t1 1.1181 0.8944 1.0896 1.1474
diabetes_t2 0.2427 4.1196 0.2124 0.2774
renal_chronic_t1 1.2721 0.7861 1.2120 1.3351
renal_chronic_t2 0.4229 2.3646 0.3300 0.5420
Concordance= 0.606 (se = 0.002 )
Likelihood ratio test= 15732 on 16 df, p=<2e-16
Wald test = 6644 on 16 df, p=<2e-16
Score (logrank) test = 9498 on 16 df, p=<2e-16
3. Model Improvment
model.stepwise.cluster1<-step(object = Fit_Model_Cluster1,direction = "both", trace=T,data=DataCluster1)Start: AIC=553964.4
Surv(waktu, status) ~ pregnancy + obesity + pneumonia_t1 + pneumonia_t2 +
sex_t1 + sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 +
Age_t2 + tobacco_t1 + tobacco_t2 + diabetes_t1 + diabetes_t2 +
renal_chronic_t1 + renal_chronic_t2
Df AIC
- obesity 1 553962
- pregnancy 1 553962
- tobacco_t1 1 553964
<none> 553964
- pneumonia_t1 1 553967
- hypertension_t1 1 553987
- Age_t1 1 553987
- renal_chronic_t2 1 554021
- diabetes_t1 1 554033
- sex_t1 1 554035
- renal_chronic_t1 1 554052
- tobacco_t2 1 554054
- sex_t2 1 554088
- Age_t2 1 554215
- hypertension_t2 1 554448
- diabetes_t2 1 554481
- pneumonia_t2 1 555469
Step: AIC=553962.4
Surv(waktu, status) ~ pregnancy + pneumonia_t1 + pneumonia_t2 +
sex_t1 + sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 +
Age_t2 + tobacco_t1 + tobacco_t2 + diabetes_t1 + diabetes_t2 +
renal_chronic_t1 + renal_chronic_t2
Df AIC
- pregnancy 1 553960
- tobacco_t1 1 553962
<none> 553962
+ obesity 1 553964
- pneumonia_t1 1 553965
- hypertension_t1 1 553985
- Age_t1 1 553985
- renal_chronic_t2 1 554019
- diabetes_t1 1 554031
- sex_t1 1 554033
- renal_chronic_t1 1 554050
- tobacco_t2 1 554052
- sex_t2 1 554086
- Age_t2 1 554213
- hypertension_t2 1 554446
- diabetes_t2 1 554479
- pneumonia_t2 1 555468
Step: AIC=553960.4
Surv(waktu, status) ~ pneumonia_t1 + pneumonia_t2 + sex_t1 +
sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + Age_t2 +
tobacco_t1 + tobacco_t2 + diabetes_t1 + diabetes_t2 + renal_chronic_t1 +
renal_chronic_t2
Df AIC
- tobacco_t1 1 553960
<none> 553960
+ pregnancy 1 553962
+ obesity 1 553962
- pneumonia_t1 1 553963
- hypertension_t1 1 553983
- Age_t1 1 553983
- renal_chronic_t2 1 554017
- diabetes_t1 1 554029
- sex_t1 1 554031
- renal_chronic_t1 1 554048
- tobacco_t2 1 554050
- sex_t2 1 554084
- Age_t2 1 554211
- hypertension_t2 1 554444
- diabetes_t2 1 554477
- pneumonia_t2 1 555466
Step: AIC=553959.6
Surv(waktu, status) ~ pneumonia_t1 + pneumonia_t2 + sex_t1 +
sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + Age_t2 +
tobacco_t2 + diabetes_t1 + diabetes_t2 + renal_chronic_t1 +
renal_chronic_t2
Df AIC
<none> 553960
+ tobacco_t1 1 553960
+ pregnancy 1 553962
+ obesity 1 553962
- pneumonia_t1 1 553962
- hypertension_t1 1 553982
- Age_t1 1 553982
- renal_chronic_t2 1 554016
- sex_t1 1 554029
- diabetes_t1 1 554029
- renal_chronic_t1 1 554047
- tobacco_t2 1 554049
- sex_t2 1 554084
- Age_t2 1 554211
- hypertension_t2 1 554444
- diabetes_t2 1 554477
- pneumonia_t2 1 555468
4. Model Final
Model_Final_Cluster1<-coxph(Surv(waktu, status) ~
+ pneumonia_t2 + sex_t1 + sex_t2 + hypertension_t1 +
hypertension_t2 + Age_t1 + Age_t2 + tobacco_t2 + diabetes_t1 +
diabetes_t2 + renal_chronic_t1 +
renal_chronic_t2, data = DataCluster1, method = "breslow")
summary(Model_Final_Cluster1)Call:
coxph(formula = Surv(waktu, status) ~ +pneumonia_t2 + sex_t1 +
sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + Age_t2 +
tobacco_t2 + diabetes_t1 + diabetes_t2 + renal_chronic_t1 +
renal_chronic_t2, data = DataCluster1, method = "breslow")
n= 31180, number of events= 29879
coef exp(coef) se(coef) z Pr(>|z|)
pneumonia_t2 -1.50954 0.22101 0.04161 -36.280 < 2e-16 ***
sex_t1 0.10793 1.11397 0.01270 8.500 < 2e-16 ***
sex_t2 -0.55884 0.57187 0.05205 -10.736 < 2e-16 ***
hypertension_t1 0.06566 1.06786 0.01319 4.977 6.47e-07 ***
hypertension_t2 -1.26332 0.28271 0.06295 -20.069 < 2e-16 ***
Age_t1 0.06394 1.06602 0.01267 5.045 4.54e-07 ***
Age_t2 -0.73190 0.48099 0.04897 -14.945 < 2e-16 ***
tobacco_t2 -0.78736 0.45504 0.09213 -8.547 < 2e-16 ***
diabetes_t1 0.11329 1.11996 0.01317 8.603 < 2e-16 ***
diabetes_t2 -1.41660 0.24254 0.06789 -20.866 < 2e-16 ***
renal_chronic_t1 0.23788 1.26855 0.02459 9.675 < 2e-16 ***
renal_chronic_t2 -0.86118 0.42266 0.12661 -6.802 1.03e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
pneumonia_t2 0.2210 4.5246 0.2037 0.2398
sex_t1 1.1140 0.8977 1.0866 1.1420
sex_t2 0.5719 1.7486 0.5164 0.6333
hypertension_t1 1.0679 0.9365 1.0406 1.0958
hypertension_t2 0.2827 3.5371 0.2499 0.3198
Age_t1 1.0660 0.9381 1.0399 1.0928
Age_t2 0.4810 2.0790 0.4370 0.5294
tobacco_t2 0.4550 2.1976 0.3799 0.5451
diabetes_t1 1.1200 0.8929 1.0914 1.1492
diabetes_t2 0.2425 4.1231 0.2123 0.2771
renal_chronic_t1 1.2686 0.7883 1.2089 1.3312
renal_chronic_t2 0.4227 2.3660 0.3298 0.5417
Concordance= 0.609 (se = 0.002 )
Likelihood ratio test= 15726 on 12 df, p=<2e-16
Wald test = 6657 on 12 df, p=<2e-16
Score (logrank) test = 9498 on 12 df, p=<2e-16
9.2 Cluster 2
DataCluster2 <- CovidData %>%
filter(cluster == 2)
DataCluster2[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster2[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)
summary(DataCluster2) sex pneumonia pregnancy diabetes copd asthma inmsupr
0:43765 0:69413 0:80828 0:72215 0:81451 0:81451 0:81451
1:37686 1:12038 1: 623 1: 9236
hypertension cardiovascular obesity renal_chronic tobacco waktu
0:69655 0:81451 0:66847 0:80602 0:75397 Min. : 42.00
1:11796 1:14604 1: 849 1: 6054 1st Qu.:100.00
Median :100.00
Mean : 99.99
3rd Qu.:100.00
Max. :100.00
status Age cluster
0:81439 0:15585 1: 0
1: 12 1:65866 2:81451
3: 0
4: 0
5: 0
Karena Clsuter 2 banyak yang survive maka cluster 2 tidak dilakukan analisis karena batas penelitian ini adalah eventnya adalah kematian
9.3 Cluster 3
DataCluster3<- CovidData %>%
filter(cluster == 3)
DataCluster3[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster3[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)
summary(DataCluster3) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
0:2229 0:3180 0:4849 0:3745 0:4837 0:1852 0:2903 0:3532
1:2651 1:1700 1: 31 1:1135 1: 43 1:3028 1:1977 1:1348
cardiovascular obesity renal_chronic tobacco waktu status
0:4843 0:3592 0:4484 0:4484 Min. : 1.00 0:3294
1: 37 1:1288 1: 396 1: 396 1st Qu.: 15.00 1:1586
Median :100.00
Mean : 70.97
3rd Qu.:100.00
Max. :100.00
Age cluster
0:1473 1: 0
1:3407 2: 0
3:4880
4: 0
5: 0
DataCluster3<- CovidData %>%
filter(cluster == 3)
head(DataCluster3) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1 1 1 0 0 0 0 1 1
2 1 1 0 0 0 0 1 1
3 0 1 0 0 0 1 1 1
4 0 1 0 0 0 0 1 1
5 1 0 0 0 0 0 1 0
6 0 1 0 1 0 0 1 0
cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1 0 0 1 0 7 1 1 3
2 0 1 1 0 12 1 1 3
3 0 0 0 1 22 1 1 3
4 0 0 0 0 9 1 0 3
5 0 0 0 0 9 1 1 3
6 0 0 1 1 6 1 0 3
Sebelum diterapkan model extended cox perlu dilakukan pengujian apakah semua variabel independent terhadap waktu. uji asumsi proportional hazarad ada dua pendekatan yaitu pendekatan secara grafik dan pendekatan GOF.
a. Pendekatan Grafik
Variabel sex
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Jenis Kelamin",
legend.labs = c("Laki-Laki", "Perempuan")
)Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Pneumonia",
legend.labs = c("no Pneumonia", "yes pneumonia")
)Variabel Pregnancy
fit<-survfit(Surv(waktu,status)~ pregnancy, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Pregnancy",
legend.labs = c("no pregnancy", "yes pregnancy")
)Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Diabetes",
legend.labs = c("no Diabetes", "yes Diabetes")
)Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Hyperetension",
legend.labs = c("no hypertension", "yes hypertension")
)Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Obesity",
legend.labs = c("no obesity", "yes obesity")
)Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Renal Chronic",
legend.labs = c("no renal chronic", "yes renal chronic")
)Variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Tobacco",
legend.labs = c("no tobacco", "yes tobacco")
)Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Age",
legend.labs = c("young", "old")
)Variabel COPD
fit<-survfit(Surv(waktu,status)~ copd, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Age",
legend.labs = c("young", "old")
)Variabel inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "inmsupr",
legend.labs = c("no inmsupr", "yes inmsupr")
)Variabel asthma
fit<-survfit(Surv(waktu,status)~ asthma, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "asthma",
legend.labs = c("no asthma", "yes asthma")
)Variabel cardiovascular
fit<-survfit(Surv(waktu,status)~ cardiovascular, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
legend = "top",
legend.title = "Cardiovascular",
legend.labs = c("No", "Yes")
)b. Pendekatan GOF
fit_cluster3<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic+inmsupr+asthma+cardiovascular+copd, data = DataCluster3)
cek_fitt_all3<-cox.zph(fit_cluster3, transform="km", global=TRUE)
print(cek_fitt_all3) chisq df p
sex 4.480 1 0.03430
pneumonia 68.983 1 < 2e-16
Age 13.476 1 0.00024
pregnancy 2.327 1 0.12715
hypertension 1.995 1 0.15787
obesity 1.958 1 0.16171
tobacco 0.116 1 0.73356
diabetes 1.501 1 0.22053
renal_chronic 0.697 1 0.40380
inmsupr 8.815 1 0.00299
asthma 7.261 1 0.00705
cardiovascular 0.381 1 0.53730
copd 0.204 1 0.65118
GLOBAL 107.288 13 < 2e-16
Ada 4 variabel yang tidak memenuhi asumsi proportional hazard yaitu: pneumonia, Age, inmsupr, asthma dengan taraf signifikansi 10%. maka ke-empat variabel ini akan dinteraksikan dengan fungsi waktu yaitu fungsi heavside pada setiap variabel. Untuk dapat menentukan fungsi heavsidenya adalah dengan melihat grafik survival probabilty.
# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster3)
ggsurvplot(fit, data = DataCluster3,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien pneumonia turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
# VARIABEL Age
fit<-survfit(Surv(waktu,status)~ Age , data=DataCluster3)
ggsurvplot(fit, data = DataCluster3,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))# VARIABEL Asthma
fit<-survfit(Surv(waktu,status)~ asthma , data=DataCluster3)
ggsurvplot(fit, data = DataCluster3,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))# VARIABEL inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
sehingga ke-empat tersebut diinterkasikan dengan fungsi waktu masing-masing. namun sebelumnya membuat kolom interval yaitu jika t lebih besar atau sama dengan 25 maka nilanya 25 tapi kalau tidak nilainya 0
DataCluster3= DataCluster3 %>%
mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster3) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1 1 1 0 0 0 0 1 1
2 1 1 0 0 0 0 1 1
3 0 1 0 0 0 1 1 1
4 0 1 0 0 0 0 1 1
5 1 0 0 0 0 0 1 0
6 0 1 0 1 0 0 1 0
cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1 0 0 1 0 7 1 1 3
2 0 1 1 0 12 1 1 3
3 0 0 0 1 22 1 1 3
4 0 0 0 0 9 1 0 3
5 0 0 0 0 9 1 1 3
6 0 0 1 1 6 1 0 3
interval
1 0
2 0
3 0
4 0
5 0
6 0
t.pneumonia_1 = t(DataCluster3$pneumonia)*(1-(DataCluster3$interval)/25)
t.pneumonia_2 = t(DataCluster3$pneumonia)*(DataCluster3$interval/25)
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)t.Age_1 = t(DataCluster3$Age)*(1-(DataCluster3$interval)/25)
t.Age_2 = t(DataCluster3$Age)*(DataCluster3$interval/25)
Age_t1 = t(t.Age_1 )
Age_t2 = t(t.Age_2)t.asthma_1 = t(DataCluster3$asthma)*(1-(DataCluster3$interval)/25)
t.asthma_2 = t(DataCluster3$asthma)*(DataCluster3$interval/25)
asthma_t1 = t(t.asthma_1 )
asthma_t2 = t(t.asthma_2)t.inmsupr_1 = t(DataCluster3$inmsupr)*(1-(DataCluster3$interval)/25)
t.inmsupr_2 = t(DataCluster3$inmsupr)*(DataCluster3$interval/25)
inmsupr_t1 = t(t.inmsupr_1 )
inmsupr_t2 = t(t.inmsupr_2)DataCluster3=cbind(DataCluster3,pneumonia_t1,pneumonia_t2,asthma_t1,asthma_t2,inmsupr_t2,inmsupr_t1,Age_t1,Age_t2)
head(DataCluster3) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1 1 1 0 0 0 0 1 1
2 1 1 0 0 0 0 1 1
3 0 1 0 0 0 1 1 1
4 0 1 0 0 0 0 1 1
5 1 0 0 0 0 0 1 0
6 0 1 0 1 0 0 1 0
cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1 0 0 1 0 7 1 1 3
2 0 1 1 0 12 1 1 3
3 0 0 0 1 22 1 1 3
4 0 0 0 0 9 1 0 3
5 0 0 0 0 9 1 1 3
6 0 0 1 1 6 1 0 3
interval pneumonia_t1 pneumonia_t2 asthma_t1 asthma_t2 inmsupr_t2 inmsupr_t1
1 0 1 0 0 0 0 1
2 0 1 0 0 0 0 1
3 0 1 0 1 0 0 1
4 0 1 0 0 0 0 1
5 0 0 0 0 0 0 1
6 0 1 0 0 0 0 1
Age_t1 Age_t2
1 1 0
2 1 0
3 1 0
4 0 0
5 1 0
6 0 0
2. Model Extended Cox with Breslow Method
Fit_Model_Cluster3<-coxph(Surv(waktu,status)~ sex+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic+cardiovascular+copd+inmsupr_t1+inmsupr_t2+asthma_t1 +asthma_t2+ pneumonia_t1+pneumonia_t2+Age_t1+Age_t2, data = DataCluster3, method = "breslow")
summary(Fit_Model_Cluster3)Call:
coxph(formula = Surv(waktu, status) ~ sex + pregnancy + hypertension +
obesity + tobacco + diabetes + renal_chronic + cardiovascular +
copd + inmsupr_t1 + inmsupr_t2 + asthma_t1 + asthma_t2 +
pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2, data = DataCluster3,
method = "breslow")
n= 4880, number of events= 1586
coef exp(coef) se(coef) z Pr(>|z|)
sex 8.152e-02 1.085e+00 5.211e-02 1.564 0.117718
pregnancy 4.987e-01 1.647e+00 5.040e-01 0.990 0.322397
hypertension 2.160e-02 1.022e+00 5.570e-02 0.388 0.698121
obesity -3.732e-02 9.634e-01 6.009e-02 -0.621 0.534570
tobacco -2.475e-02 9.756e-01 8.983e-02 -0.276 0.782896
diabetes -1.131e-02 9.888e-01 5.590e-02 -0.202 0.839703
renal_chronic 6.937e-02 1.072e+00 7.199e-02 0.964 0.335284
cardiovascular -7.459e-02 9.281e-01 4.286e-01 -0.174 0.861841
copd -3.180e-02 9.687e-01 4.323e-01 -0.074 0.941368
inmsupr_t1 1.252e-01 1.133e+00 1.789e-01 0.700 0.483882
inmsupr_t2 -2.276e+01 1.302e-10 5.152e+02 -0.044 0.964762
asthma_t1 -8.851e-02 9.153e-01 1.764e-01 -0.502 0.615895
asthma_t2 -2.349e+01 6.283e-11 5.152e+02 -0.046 0.963635
pneumonia_t1 -1.922e-01 8.251e-01 5.680e-02 -3.384 0.000713 ***
pneumonia_t2 2.236e+00 9.354e+00 2.905e-01 7.697 1.4e-14 ***
Age_t1 6.596e-03 1.007e+00 5.323e-02 0.124 0.901378
Age_t2 -1.055e-01 8.998e-01 2.757e-01 -0.383 0.701872
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
sex 1.085e+00 9.217e-01 0.9796 1.2016
pregnancy 1.647e+00 6.073e-01 0.6132 4.4218
hypertension 1.022e+00 9.786e-01 0.9162 1.1397
obesity 9.634e-01 1.038e+00 0.8563 1.0838
tobacco 9.756e-01 1.025e+00 0.8181 1.1634
diabetes 9.888e-01 1.011e+00 0.8862 1.1032
renal_chronic 1.072e+00 9.330e-01 0.9308 1.2343
cardiovascular 9.281e-01 1.077e+00 0.4007 2.1500
copd 9.687e-01 1.032e+00 0.4151 2.2604
inmsupr_t1 1.133e+00 8.823e-01 0.7982 1.6093
inmsupr_t2 1.302e-10 7.682e+09 0.0000 Inf
asthma_t1 9.153e-01 1.093e+00 0.6477 1.2934
asthma_t2 6.283e-11 1.592e+10 0.0000 Inf
pneumonia_t1 8.251e-01 1.212e+00 0.7382 0.9223
pneumonia_t2 9.354e+00 1.069e-01 5.2934 16.5301
Age_t1 1.007e+00 9.934e-01 0.9069 1.1173
Age_t2 8.998e-01 1.111e+00 0.5242 1.5447
Concordance= 0.92 (se = 0.002 )
Likelihood ratio test= 5976 on 17 df, p=<2e-16
Wald test = 116.5 on 17 df, p=<2e-16
Score (logrank) test = 7121 on 17 df, p=<2e-16
Biasanya Anda dapat mengabaikan pesan tersebut, sebagian besar untuk informasi Anda. Itu hal utama yang perlu diperhatikan adalah Sebuah. Ketika salah satu koefisien menuju tak terhingga dalam model Cox, Wald uji signifikansi beta/se(beta) rusak, dan tidak lagi dapat diandalkan. Namun tes LR masih valid. Karenanya rutinitas seperti stepAIC baik-baik saja. Begitu juga nilai prediksi, residu, dll. Sebenarnya itu hanya Uji Wald yang perlu diabaikan: didasarkan pada deret Taylor yang hanya tidak bekerja sejauh itu dari nol maka dari itu perlu membuang variabel yang menghasilkan infinity
Fit_Model_Cluster3<-coxph(Surv(waktu,status)~ sex+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic+cardiovascular+copd+inmsupr_t1+asthma_t1 + pneumonia_t1+pneumonia_t2+Age_t1+Age_t2, data = DataCluster3, method = "breslow")
summary(Fit_Model_Cluster3)Call:
coxph(formula = Surv(waktu, status) ~ sex + pregnancy + hypertension +
obesity + tobacco + diabetes + renal_chronic + cardiovascular +
copd + inmsupr_t1 + asthma_t1 + pneumonia_t1 + pneumonia_t2 +
Age_t1 + Age_t2, data = DataCluster3, method = "breslow")
n= 4880, number of events= 1586
coef exp(coef) se(coef) z Pr(>|z|)
sex -0.09383 0.91044 0.05227 -1.795 0.072658 .
pregnancy 0.88582 2.42497 0.50439 1.756 0.079049 .
hypertension 0.03777 1.03849 0.05524 0.684 0.494193
obesity -0.04748 0.95363 0.06019 -0.789 0.430212
tobacco -0.11480 0.89154 0.09024 -1.272 0.203290
diabetes 0.07676 1.07978 0.05526 1.389 0.164810
renal_chronic 0.19930 1.22055 0.07214 2.763 0.005731 **
cardiovascular -1.21153 0.29774 0.33878 -3.576 0.000349 ***
copd -1.40264 0.24595 0.34149 -4.107 4e-05 ***
inmsupr_t1 2.28540 9.82961 0.07583 30.140 < 2e-16 ***
asthma_t1 1.90470 6.71736 0.07106 26.805 < 2e-16 ***
pneumonia_t1 0.22963 1.25813 0.06323 3.632 0.000282 ***
pneumonia_t2 -0.59466 0.55175 0.16427 -3.620 0.000295 ***
Age_t1 0.14136 1.15184 0.05569 2.538 0.011142 *
Age_t2 -2.98052 0.05077 0.17112 -17.418 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
sex 0.91044 1.0984 0.8218 1.0087
pregnancy 2.42497 0.4124 0.9023 6.5169
hypertension 1.03849 0.9629 0.9319 1.1572
obesity 0.95363 1.0486 0.8475 1.0730
tobacco 0.89154 1.1217 0.7470 1.0640
diabetes 1.07978 0.9261 0.9689 1.2033
renal_chronic 1.22055 0.8193 1.0596 1.4059
cardiovascular 0.29774 3.3586 0.1533 0.5784
copd 0.24595 4.0659 0.1259 0.4803
inmsupr_t1 9.82961 0.1017 8.4721 11.4046
asthma_t1 6.71736 0.1489 5.8441 7.7212
pneumonia_t1 1.25813 0.7948 1.1115 1.4241
pneumonia_t2 0.55175 1.8124 0.3999 0.7613
Age_t1 1.15184 0.8682 1.0327 1.2847
Age_t2 0.05077 19.6980 0.0363 0.0710
Concordance= 0.903 (se = 0.003 )
Likelihood ratio test= 5034 on 15 df, p=<2e-16
Wald test = 2490 on 15 df, p=<2e-16
Score (logrank) test = 7074 on 15 df, p=<2e-16
3. Model Improvment
model.stepwise.cluster3<-step(object = Fit_Model_Cluster3,direction = "both", trace=T,data=DataCluster3)Start: AIC=21381.46
Surv(waktu, status) ~ sex + pregnancy + hypertension + obesity +
tobacco + diabetes + renal_chronic + cardiovascular + copd +
inmsupr_t1 + asthma_t1 + pneumonia_t1 + pneumonia_t2 + Age_t1 +
Age_t2
Df AIC
- hypertension 1 21380
- obesity 1 21380
- tobacco 1 21381
- diabetes 1 21381
<none> 21382
- pregnancy 1 21382
- sex 1 21383
- Age_t1 1 21386
- renal_chronic 1 21387
- pneumonia_t1 1 21393
- pneumonia_t2 1 21395
- cardiovascular 1 21397
- copd 1 21404
- Age_t2 1 21882
- asthma_t1 1 21961
- inmsupr_t1 1 22185
Step: AIC=21379.93
Surv(waktu, status) ~ sex + pregnancy + obesity + tobacco + diabetes +
renal_chronic + cardiovascular + copd + inmsupr_t1 + asthma_t1 +
pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2
Df AIC
- obesity 1 21378
- tobacco 1 21380
<none> 21380
- pregnancy 1 21380
- diabetes 1 21380
- sex 1 21381
+ hypertension 1 21382
- Age_t1 1 21384
- renal_chronic 1 21386
- pneumonia_t1 1 21391
- pneumonia_t2 1 21393
- cardiovascular 1 21396
- copd 1 21403
- Age_t2 1 21895
- asthma_t1 1 21960
- inmsupr_t1 1 22183
Step: AIC=21378.43
Surv(waktu, status) ~ sex + pregnancy + tobacco + diabetes +
renal_chronic + cardiovascular + copd + inmsupr_t1 + asthma_t1 +
pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2
Df AIC
- tobacco 1 21378
<none> 21378
- diabetes 1 21379
- pregnancy 1 21379
- sex 1 21380
+ obesity 1 21380
+ hypertension 1 21380
- Age_t1 1 21382
- renal_chronic 1 21385
- pneumonia_t1 1 21390
- pneumonia_t2 1 21391
- cardiovascular 1 21396
- copd 1 21403
- Age_t2 1 21893
- asthma_t1 1 21959
- inmsupr_t1 1 22192
Step: AIC=21377.92
Surv(waktu, status) ~ sex + pregnancy + diabetes + renal_chronic +
cardiovascular + copd + inmsupr_t1 + asthma_t1 + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2
Df AIC
<none> 21378
- diabetes 1 21378
- pregnancy 1 21378
+ tobacco 1 21378
- sex 1 21379
+ obesity 1 21380
+ hypertension 1 21380
- Age_t1 1 21382
- renal_chronic 1 21384
- pneumonia_t1 1 21389
- pneumonia_t2 1 21391
- cardiovascular 1 21395
- copd 1 21404
- Age_t2 1 21897
- asthma_t1 1 21957
- inmsupr_t1 1 22190
4. Model Final
Model_Final_Cluster3<-coxph(Surv(waktu, status) ~ sex + pregnancy + diabetes + renal_chronic +
cardiovascular + copd + inmsupr_t1 + asthma_t1 + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2, data = DataCluster3, method = "breslow")
summary(Model_Final_Cluster3)Call:
coxph(formula = Surv(waktu, status) ~ sex + pregnancy + diabetes +
renal_chronic + cardiovascular + copd + inmsupr_t1 + asthma_t1 +
pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2, data = DataCluster3,
method = "breslow")
n= 4880, number of events= 1586
coef exp(coef) se(coef) z Pr(>|z|)
sex -0.08358 0.91981 0.05134 -1.628 0.103499
pregnancy 0.88464 2.42211 0.50413 1.755 0.079294 .
diabetes 0.08002 1.08331 0.05409 1.479 0.139053
renal_chronic 0.20489 1.22739 0.07092 2.889 0.003862 **
cardiovascular -1.23765 0.29007 0.33711 -3.671 0.000241 ***
copd -1.46186 0.23181 0.33921 -4.310 1.64e-05 ***
inmsupr_t1 2.28054 9.78198 0.07494 30.433 < 2e-16 ***
asthma_t1 1.89739 6.66846 0.07055 26.893 < 2e-16 ***
pneumonia_t1 0.22401 1.25108 0.06310 3.550 0.000385 ***
pneumonia_t2 -0.60041 0.54859 0.16417 -3.657 0.000255 ***
Age_t1 0.13550 1.14511 0.05523 2.453 0.014157 *
Age_t2 -2.99757 0.04991 0.17028 -17.604 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
sex 0.91981 1.0872 0.83177 1.01718
pregnancy 2.42211 0.4129 0.90174 6.50591
diabetes 1.08331 0.9231 0.97434 1.20447
renal_chronic 1.22739 0.8147 1.06812 1.41042
cardiovascular 0.29007 3.4475 0.14981 0.56162
copd 0.23181 4.3140 0.11923 0.45067
inmsupr_t1 9.78198 0.1022 8.44580 11.32955
asthma_t1 6.66846 0.1500 5.80724 7.65740
pneumonia_t1 1.25108 0.7993 1.10554 1.41578
pneumonia_t2 0.54859 1.8229 0.39765 0.75681
Age_t1 1.14511 0.8733 1.02762 1.27603
Age_t2 0.04991 20.0368 0.03575 0.06968
Concordance= 0.902 (se = 0.003 )
Likelihood ratio test= 5031 on 12 df, p=<2e-16
Wald test = 2492 on 12 df, p=<2e-16
Score (logrank) test = 7073 on 12 df, p=<2e-16
9.4 Cluster 4
1. Uji Proportional hazard
DataCluster4<- CovidData %>%
filter(cluster == 4)
DataCluster4[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster4[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)
summary(DataCluster4) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
0:1746 0:1344 0:3113 0:1809 1:3113 0:2942 0:2938 0:1421
1:1367 1:1769 1:1304 1: 171 1: 175 1:1692
cardiovascular obesity renal_chronic tobacco waktu status
0:2674 0:2249 0:2834 0:2392 Min. : 1.00 0:1136
1: 439 1: 864 1: 279 1: 721 1st Qu.: 7.00 1:1977
Median : 14.00
Mean : 42.97
3rd Qu.:100.00
Max. :100.00
Age cluster
0:2531 1: 0
1: 582 2: 0
3: 0
4:3113
5: 0
DataCluster4<- CovidData %>%
filter(cluster == 4)%>%
select(-"pregnancy")
head(DataCluster4) sex pneumonia diabetes copd asthma inmsupr hypertension cardiovascular
1 0 1 0 1 1 0 1 1
2 1 1 1 1 0 0 1 0
3 0 1 1 1 0 1 1 0
4 0 1 0 1 0 0 0 0
5 1 1 1 1 0 0 1 0
6 0 1 1 1 0 0 1 0
obesity renal_chronic tobacco waktu status Age cluster
1 1 0 0 3 1 1 4
2 0 0 0 9 1 0 4
3 0 1 0 6 1 0 4
4 0 0 0 5 1 0 4
5 1 1 0 6 1 0 4
6 0 0 0 8 1 1 4
Sebelum diterapkan model extended cox perlu dilakukan pengujian apakah semua variabe independent terhadap waktu. uji asumsi proportional hazarad ada dua pendekatan yaitu pendekatan secara grafik dan pendekatan GOF.
a. Pendekatan Grafik
Variabel sex
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Jenis Kelamin",
legend.labs = c("Laki-Laki", "Perempuan")
)Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Pneumonia",
legend.labs = c("no Pneumonia", "yes pneumonia")
)Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Diabetes",
legend.labs = c("no Diabetes", "yes Diabetes")
) ##### Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Hyperetension",
legend.labs = c("no hypertension", "yes hypertension")
)Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Obesity",
legend.labs = c("no obesity", "yes obesity")
)Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Renal Chronic",
legend.labs = c("no renal chronic", "yes renal chronic")
)Variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Tobacco",
legend.labs = c("no tobacco", "yes tobacco")
)Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Age",
legend.labs = c("old", "young")
)Variabel COPD
fit<-survfit(Surv(waktu,status)~ copd, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "copd"
)Variabel inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "inmsupr",
legend.labs = c("no inmsupr", "yes inmsupr")
)Variabel asthma
fit<-survfit(Surv(waktu,status)~ asthma, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "asthma",
legend.labs = c("no asthma", "yes asthma")
)Variabel cardiovascular
fit<-survfit(Surv(waktu,status)~ cardiovascular, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
legend = "top",
legend.title = "Cardiovascular",
legend.labs = c("No", "Yes")
)b. Pendekatan GOF
fit_cluster4<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+hypertension+obesity+tobacco+diabetes+renal_chronic+inmsupr+asthma+cardiovascular+copd, data = DataCluster4)
cek_fitt_all4<-cox.zph(fit_cluster4, transform="km", global=TRUE)
print(cek_fitt_all4) chisq df p
sex 1.651 1 0.199
pneumonia 63.795 1 1.4e-15
Age 2.289 1 0.130
hypertension 0.353 1 0.552
obesity 1.560 1 0.212
tobacco 0.620 1 0.431
diabetes 0.365 1 0.546
renal_chronic 2.298 1 0.130
inmsupr 0.520 1 0.471
asthma 0.121 1 0.728
cardiovascular 8.303 1 0.004
GLOBAL 86.854 11 6.9e-14
Ada 2 variabel yang tidak memenuhi asumsi proportional hazard yaitu: pneumonia, cardiovascular. maka ke-dua variabel ini akan dinteraksikan dengan fungsi waktu yaitu fungsi heavside pada setiap variabel. Untuk dapat menentukan fungsi heavsidenya adalah dengan melihat grafik survival probabilty.
# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster4)
ggsurvplot(fit, data = DataCluster4,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien pneumonia turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
# VARIABEL Cardiovascular
fit<-survfit(Surv(waktu,status)~ cardiovascular , data=DataCluster4)
ggsurvplot(fit, data = DataCluster4,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
sehingga ke-dua tersebut diinterkasikan dengan fungsi waktu masing-masing. namun sebelumnya membuat kolom interval yaitu jika t lebih besar atau sama dengan 25 maka nilanya 25 tapi kalau tidak nilainya 0
DataCluster4= DataCluster4 %>%
mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster4) sex pneumonia diabetes copd asthma inmsupr hypertension cardiovascular
1 0 1 0 1 1 0 1 1
2 1 1 1 1 0 0 1 0
3 0 1 1 1 0 1 1 0
4 0 1 0 1 0 0 0 0
5 1 1 1 1 0 0 1 0
6 0 1 1 1 0 0 1 0
obesity renal_chronic tobacco waktu status Age cluster interval
1 1 0 0 3 1 1 4 0
2 0 0 0 9 1 0 4 0
3 0 1 0 6 1 0 4 0
4 0 0 0 5 1 0 4 0
5 1 1 0 6 1 0 4 0
6 0 0 0 8 1 1 4 0
t.pneumonia_1 = t(DataCluster4$pneumonia)*(1-(DataCluster4$interval)/25)
t.pneumonia_2 = t(DataCluster4$pneumonia)*(DataCluster4$interval/25)
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)t.cardiovascular_1 = t(DataCluster4$cardiovascular)*(1-(DataCluster4$interval)/25)
t.cardiovascular_2 = t(DataCluster4$cardiovascular)*(DataCluster4$interval/25)
cardiovascular_t1 = t(t.cardiovascular_1 )
cardiovascular_t2 = t(t.cardiovascular_2)DataCluster4=cbind(DataCluster4,pneumonia_t1,pneumonia_t2,cardiovascular_t1,cardiovascular_t2)
head(DataCluster4) sex pneumonia diabetes copd asthma inmsupr hypertension cardiovascular
1 0 1 0 1 1 0 1 1
2 1 1 1 1 0 0 1 0
3 0 1 1 1 0 1 1 0
4 0 1 0 1 0 0 0 0
5 1 1 1 1 0 0 1 0
6 0 1 1 1 0 0 1 0
obesity renal_chronic tobacco waktu status Age cluster interval pneumonia_t1
1 1 0 0 3 1 1 4 0 1
2 0 0 0 9 1 0 4 0 1
3 0 1 0 6 1 0 4 0 1
4 0 0 0 5 1 0 4 0 1
5 1 1 0 6 1 0 4 0 1
6 0 0 0 8 1 1 4 0 1
pneumonia_t2 cardiovascular_t1 cardiovascular_t2
1 0 1 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
2. Model Extended Cox
Fit_Model_Cluster4<-coxph(Surv(waktu,status)~ sex+hypertension+obesity+tobacco+diabetes+renal_chronic+cardiovascular+copd+pneumonia_t1+pneumonia_t2+cardiovascular_t1+cardiovascular_t2+asthma+inmsupr, data = DataCluster4, method = "breslow")
summary(Fit_Model_Cluster4)Call:
coxph(formula = Surv(waktu, status) ~ sex + hypertension + obesity +
tobacco + diabetes + renal_chronic + cardiovascular + copd +
pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 + cardiovascular_t2 +
asthma + inmsupr, data = DataCluster4, method = "breslow")
n= 3113, number of events= 1977
coef exp(coef) se(coef) z Pr(>|z|)
sex 0.01785 1.01801 0.04784 0.373 0.7091
hypertension 0.07129 1.07390 0.04807 1.483 0.1380
obesity -0.13039 0.87775 0.05183 -2.516 0.0119 *
tobacco 0.02973 1.03017 0.05552 0.535 0.5923
diabetes 0.13633 1.14606 0.04711 2.894 0.0038 **
renal_chronic 0.15840 1.17163 0.07472 2.120 0.0340 *
cardiovascular -1.45769 0.23277 0.26222 -5.559 2.71e-08 ***
copd NA NA 0.00000 NA NA
pneumonia_t1 1.45571 4.28751 0.05587 26.057 < 2e-16 ***
pneumonia_t2 -1.49436 0.22439 0.13177 -11.340 < 2e-16 ***
cardiovascular_t1 1.94364 6.98412 0.26923 7.219 5.22e-13 ***
cardiovascular_t2 NA NA 0.00000 NA NA
asthma -0.18013 0.83516 0.10455 -1.723 0.0849 .
inmsupr 0.12309 1.13099 0.09396 1.310 0.1902
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
sex 1.0180 0.9823 0.9269 1.1181
hypertension 1.0739 0.9312 0.9773 1.1800
obesity 0.8778 1.1393 0.7930 0.9716
tobacco 1.0302 0.9707 0.9240 1.1486
diabetes 1.1461 0.8726 1.0450 1.2569
renal_chronic 1.1716 0.8535 1.0120 1.3564
cardiovascular 0.2328 4.2960 0.1392 0.3892
copd NA NA NA NA
pneumonia_t1 4.2875 0.2332 3.8428 4.7836
pneumonia_t2 0.2244 4.4565 0.1733 0.2905
cardiovascular_t1 6.9841 0.1432 4.1205 11.8380
cardiovascular_t2 NA NA NA NA
asthma 0.8352 1.1974 0.6804 1.0251
inmsupr 1.1310 0.8842 0.9408 1.3597
Concordance= 0.744 (se = 0.005 )
Likelihood ratio test= 1755 on 12 df, p=<2e-16
Wald test = 1295 on 12 df, p=<2e-16
Score (logrank) test = 1800 on 12 df, p=<2e-16
3. Model improvment
model.stepwise.cluster4<-step(object = Fit_Model_Cluster4,direction = "both", trace=T,data=DataCluster4)Start: AIC=28502.79
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco +
diabetes + renal_chronic + cardiovascular + copd + pneumonia_t1 +
pneumonia_t2 + cardiovascular_t1 + cardiovascular_t2 + asthma +
inmsupr
Step: AIC=28502.79
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco +
diabetes + renal_chronic + cardiovascular + copd + pneumonia_t1 +
pneumonia_t2 + cardiovascular_t1 + asthma + inmsupr
Step: AIC=28502.79
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco +
diabetes + renal_chronic + cardiovascular + pneumonia_t1 +
pneumonia_t2 + cardiovascular_t1 + asthma + inmsupr
Df AIC
- sex 1 28501
- tobacco 1 28501
- inmsupr 1 28503
<none> 28503
- hypertension 1 28503
- asthma 1 28504
- renal_chronic 1 28505
- obesity 1 28507
- diabetes 1 28509
- cardiovascular 1 28551
- cardiovascular_t1 1 28591
- pneumonia_t2 1 28690
- pneumonia_t1 1 29267
Step: AIC=28500.93
Surv(waktu, status) ~ hypertension + obesity + tobacco + diabetes +
renal_chronic + cardiovascular + pneumonia_t1 + pneumonia_t2 +
cardiovascular_t1 + asthma + inmsupr
Df AIC
- tobacco 1 28499
- inmsupr 1 28501
<none> 28501
- hypertension 1 28501
- asthma 1 28502
+ sex 1 28503
- renal_chronic 1 28503
- obesity 1 28505
- diabetes 1 28507
- cardiovascular 1 28549
- cardiovascular_t1 1 28589
- pneumonia_t2 1 28688
- pneumonia_t1 1 29265
Step: AIC=28499.13
Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic +
cardiovascular + pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 +
asthma + inmsupr
Df AIC
- inmsupr 1 28499
<none> 28499
- hypertension 1 28499
- asthma 1 28500
+ tobacco 1 28501
+ sex 1 28501
- renal_chronic 1 28502
- obesity 1 28503
- diabetes 1 28506
- cardiovascular 1 28547
- cardiovascular_t1 1 28587
- pneumonia_t2 1 28686
- pneumonia_t1 1 29265
Step: AIC=28498.87
Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic +
cardiovascular + pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 +
asthma
Df AIC
<none> 28499
- hypertension 1 28499
+ inmsupr 1 28499
- asthma 1 28500
+ tobacco 1 28501
+ sex 1 28501
- renal_chronic 1 28502
- obesity 1 28503
- diabetes 1 28506
- cardiovascular 1 28546
- cardiovascular_t1 1 28587
- pneumonia_t2 1 28686
- pneumonia_t1 1 29265
4. Model Final
Model_Final_Cluster4<-coxph(Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic +
cardiovascular + pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 +
asthma, data = DataCluster4, method = "breslow")
summary(Model_Final_Cluster4)Call:
coxph(formula = Surv(waktu, status) ~ hypertension + obesity +
diabetes + renal_chronic + cardiovascular + pneumonia_t1 +
pneumonia_t2 + cardiovascular_t1 + asthma, data = DataCluster4,
method = "breslow")
n= 3113, number of events= 1977
coef exp(coef) se(coef) z Pr(>|z|)
hypertension 0.07019 1.07272 0.04800 1.462 0.14362
obesity -0.12785 0.87998 0.05153 -2.481 0.01309 *
diabetes 0.14208 1.15267 0.04677 3.038 0.00238 **
renal_chronic 0.16555 1.18004 0.07451 2.222 0.02629 *
cardiovascular -1.44224 0.23640 0.26194 -5.506 3.67e-08 ***
pneumonia_t1 1.45612 4.28926 0.05582 26.084 < 2e-16 ***
pneumonia_t2 -1.49284 0.22473 0.13177 -11.329 < 2e-16 ***
cardiovascular_t1 1.94515 6.99470 0.26920 7.226 4.99e-13 ***
asthma -0.18858 0.82814 0.10434 -1.807 0.07072 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
hypertension 1.0727 0.9322 0.9764 1.1785
obesity 0.8800 1.1364 0.7955 0.9735
diabetes 1.1527 0.8675 1.0517 1.2633
renal_chronic 1.1800 0.8474 1.0197 1.3656
cardiovascular 0.2364 4.2302 0.1415 0.3950
pneumonia_t1 4.2893 0.2331 3.8447 4.7852
pneumonia_t2 0.2247 4.4497 0.1736 0.2910
cardiovascular_t1 6.9947 0.1430 4.1269 11.8554
asthma 0.8281 1.2075 0.6750 1.0161
Concordance= 0.743 (se = 0.005 )
Likelihood ratio test= 1753 on 9 df, p=<2e-16
Wald test = 1293 on 9 df, p=<2e-16
Score (logrank) test = 1798 on 9 df, p=<2e-16
9.5 Cluster 5
DataCluster5<- CovidData %>%
filter(cluster == 5)
DataCluster5[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster5[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)
summary(DataCluster5) sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
0:1981 0:1632 0:3363 0:1948 0:3367 0:3270 0:3176 0:1248
1:1386 1:1735 1: 4 1:1419 1: 97 1: 191 1:2119
cardiovascular obesity renal_chronic tobacco waktu status
1:3367 0:2319 0:2941 0:2998 Min. : 1.00 0:1522
1:1048 1: 426 1: 369 1st Qu.: 8.00 1:1845
Median : 20.00
Mean : 50.81
3rd Qu.:100.00
Max. :100.00
Age cluster
0:2260 1: 0
1:1107 2: 0
3: 0
4: 0
5:3367
DataCluster5<- CovidData %>%
filter(cluster == 5)%>%
select(-c("copd",'cardiovascular'))
head(DataCluster5) sex pneumonia pregnancy diabetes asthma inmsupr hypertension obesity
1 0 1 0 0 0 0 0 0
2 1 1 0 1 0 0 0 0
3 0 1 0 1 0 0 1 1
4 1 0 0 1 0 0 0 0
5 1 1 0 0 0 0 1 0
6 0 1 0 0 0 0 1 0
renal_chronic tobacco waktu status Age cluster
1 0 1 2 1 0 5
2 0 0 52 1 0 5
3 0 0 5 1 0 5
4 1 0 8 1 0 5
5 1 0 5 1 0 5
6 1 0 4 1 1 5
1. Uji Proportional Hazard
b. Pendekatan Grafik
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Jenis Kelamin",
legend.labs = c("Laki-Laki", "Perempuan")
)Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Pneumonia",
legend.labs = c("no Pneumonia", "yes pneumonia")
)Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Diabetes",
legend.labs = c("no Diabetes", "yes Diabetes")
)Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Hyperetension",
legend.labs = c("no hypertension", "yes hypertension")
)Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Obesity",
legend.labs = c("no obesity", "yes obesity")
)Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Renal Chronic",
legend.labs = c("no renal chronic", "yes renal chronic")
)Variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Tobacco",
legend.labs = c("no tobacco", "yes tobacco")
)Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "Age",
legend.labs = c("old", "young")
)Variabel inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "inmsupr",
legend.labs = c("no inmsupr", "yes inmsupr")
)Variabel asthma
fit<-survfit(Surv(waktu,status)~ asthma, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
legend = "top",
legend.title = "asthma",
legend.labs = c("no asthma", "yes asthma")
)b. Pendekatan GOF
fit_cluster5<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+hypertension+obesity+tobacco+diabetes+renal_chronic+inmsupr+asthma+pregnancy, data = DataCluster5)
cek_fitt_all5<-cox.zph(fit_cluster5, transform="km", global=TRUE)
print(cek_fitt_all5) chisq df p
sex 5.670 1 0.017
pneumonia 62.849 1 2.2e-15
Age 17.650 1 2.7e-05
hypertension 2.194 1 0.139
obesity 1.014 1 0.314
tobacco 3.506 1 0.061
diabetes 4.965 1 0.026
renal_chronic 0.735 1 0.391
inmsupr 1.134 1 0.287
asthma 1.507 1 0.220
pregnancy 0.439 1 0.508
GLOBAL 85.547 11 1.2e-13
# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster5)
ggsurvplot(fit, data = DataCluster5,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))
Perhatikan kecepatan grafik pasien yang memiliki pneumonia turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah
\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]
# VARIABEL Age
fit<-survfit(Surv(waktu,status)~ Age , data=DataCluster5)
ggsurvplot(fit, data = DataCluster5,
pval = TRUE, conf.int = TRUE,
risk.table = TRUE, # Add risk table
risk.table.col = "strata", # Change risk table color by groups
linetype = "strata", # Change line type by groups
surv.median.line = "hv", # Specify median survival
ggtheme = theme_bw(), # Change ggplot2 theme
palette = c("#E7B800", "#2E9FDF"))DataCluster5= DataCluster5 %>%
mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster5) sex pneumonia pregnancy diabetes asthma inmsupr hypertension obesity
1 0 1 0 0 0 0 0 0
2 1 1 0 1 0 0 0 0
3 0 1 0 1 0 0 1 1
4 1 0 0 1 0 0 0 0
5 1 1 0 0 0 0 1 0
6 0 1 0 0 0 0 1 0
renal_chronic tobacco waktu status Age cluster interval
1 0 1 2 1 0 5 0
2 0 0 52 1 0 5 25
3 0 0 5 1 0 5 0
4 1 0 8 1 0 5 0
5 1 0 5 1 0 5 0
6 1 0 4 1 1 5 0
t.pneumonia_1 = t(DataCluster5$pneumonia)*(1-(DataCluster5$interval)/25)
t.pneumonia_2 = t(DataCluster5$pneumonia)*(DataCluster5$interval/25)
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)t.Age_1 = t(DataCluster5$Age)*(1-(DataCluster5$interval)/25)
t.Age_2 = t(DataCluster5$Age)*(DataCluster5$interval/25)
Age_t1 = t(t.Age_1 )
Age_t2 = t(t.Age_2)DataCluster5=cbind(DataCluster5,pneumonia_t1,pneumonia_t2,Age_t1,Age_t2)
head(DataCluster5) sex pneumonia pregnancy diabetes asthma inmsupr hypertension obesity
1 0 1 0 0 0 0 0 0
2 1 1 0 1 0 0 0 0
3 0 1 0 1 0 0 1 1
4 1 0 0 1 0 0 0 0
5 1 1 0 0 0 0 1 0
6 0 1 0 0 0 0 1 0
renal_chronic tobacco waktu status Age cluster interval pneumonia_t1
1 0 1 2 1 0 5 0 1
2 0 0 52 1 0 5 25 0
3 0 0 5 1 0 5 0 1
4 1 0 8 1 0 5 0 0
5 1 0 5 1 0 5 0 1
6 1 0 4 1 1 5 0 1
pneumonia_t2 Age_t1 Age_t2
1 0 0 0
2 1 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 1 0
2. Model extended cox wit breslow method
Fit_Model_Cluster5<-coxph(Surv(waktu,status)~ sex+hypertension+obesity+tobacco+diabetes+renal_chronic+pregnancy+pneumonia_t1+pneumonia_t2+Age_t1+Age_t2+asthma+inmsupr, data = DataCluster5, method = "breslow")
summary(Fit_Model_Cluster5)Call:
coxph(formula = Surv(waktu, status) ~ sex + hypertension + obesity +
tobacco + diabetes + renal_chronic + pregnancy + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2 + asthma + inmsupr, data = DataCluster5,
method = "breslow")
n= 3367, number of events= 1845
coef exp(coef) se(coef) z Pr(>|z|)
sex -0.022076 0.978166 0.049402 -0.447 0.655
hypertension -0.024265 0.976027 0.053456 -0.454 0.650
obesity -0.048034 0.953102 0.051414 -0.934 0.350
tobacco 0.003548 1.003555 0.073564 0.048 0.962
diabetes 0.031888 1.032402 0.048972 0.651 0.515
renal_chronic 0.286411 1.331640 0.064813 4.419 9.91e-06 ***
pregnancy 0.367627 1.444303 1.003104 0.366 0.714
pneumonia_t1 1.479824 4.392174 0.059950 24.684 < 2e-16 ***
pneumonia_t2 -1.364407 0.255532 0.140013 -9.745 < 2e-16 ***
Age_t1 0.357100 1.429179 0.062935 5.674 1.39e-08 ***
Age_t2 -3.380664 0.034025 0.254798 -13.268 < 2e-16 ***
asthma -0.241937 0.785106 0.150304 -1.610 0.107
inmsupr 0.091748 1.096089 0.093032 0.986 0.324
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
sex 0.97817 1.0223 0.88790 1.07761
hypertension 0.97603 1.0246 0.87894 1.08384
obesity 0.95310 1.0492 0.86174 1.05415
tobacco 1.00355 0.9965 0.86881 1.15920
diabetes 1.03240 0.9686 0.93791 1.13641
renal_chronic 1.33164 0.7510 1.17278 1.51201
pregnancy 1.44430 0.6924 0.20222 10.31578
pneumonia_t1 4.39217 0.2277 3.90526 4.93979
pneumonia_t2 0.25553 3.9134 0.19421 0.33622
Age_t1 1.42918 0.6997 1.26333 1.61680
Age_t2 0.03402 29.3903 0.02065 0.05606
asthma 0.78511 1.2737 0.58477 1.05406
inmsupr 1.09609 0.9123 0.91339 1.31533
Concordance= 0.799 (se = 0.005 )
Likelihood ratio test= 2643 on 13 df, p=<2e-16
Wald test = 1300 on 13 df, p=<2e-16
Score (logrank) test = 2582 on 13 df, p=<2e-16
3. Model Final
model.stepwise.cluster5<-step(object = Fit_Model_Cluster5,direction = "both", trace=T,data=DataCluster5)Start: AIC=26144.92
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco +
diabetes + renal_chronic + pregnancy + pneumonia_t1 + pneumonia_t2 +
Age_t1 + Age_t2 + asthma + inmsupr
Df AIC
- tobacco 1 26143
- pregnancy 1 26143
- sex 1 26143
- hypertension 1 26143
- diabetes 1 26143
- obesity 1 26144
- inmsupr 1 26144
<none> 26145
- asthma 1 26146
- renal_chronic 1 26161
- Age_t1 1 26173
- pneumonia_t2 1 26277
- Age_t2 1 26669
- pneumonia_t1 1 26852
Step: AIC=26142.92
Surv(waktu, status) ~ sex + hypertension + obesity + diabetes +
renal_chronic + pregnancy + pneumonia_t1 + pneumonia_t2 +
Age_t1 + Age_t2 + asthma + inmsupr
Df AIC
- pregnancy 1 26141
- hypertension 1 26141
- sex 1 26141
- diabetes 1 26141
- obesity 1 26142
- inmsupr 1 26142
<none> 26143
- asthma 1 26144
+ tobacco 1 26145
- renal_chronic 1 26159
- Age_t1 1 26171
- pneumonia_t2 1 26275
- Age_t2 1 26667
- pneumonia_t1 1 26850
Step: AIC=26141.04
Surv(waktu, status) ~ sex + hypertension + obesity + diabetes +
renal_chronic + pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2 +
asthma + inmsupr
Df AIC
- sex 1 26139
- hypertension 1 26139
- diabetes 1 26140
- obesity 1 26140
- inmsupr 1 26140
<none> 26141
- asthma 1 26142
+ pregnancy 1 26143
+ tobacco 1 26143
- renal_chronic 1 26158
- Age_t1 1 26170
- pneumonia_t2 1 26273
- Age_t2 1 26665
- pneumonia_t1 1 26849
Step: AIC=26139.24
Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic +
pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2 + asthma +
inmsupr
Df AIC
- hypertension 1 26138
- diabetes 1 26138
- inmsupr 1 26138
- obesity 1 26138
<none> 26139
- asthma 1 26140
+ sex 1 26141
+ pregnancy 1 26141
+ tobacco 1 26141
- renal_chronic 1 26156
- Age_t1 1 26168
- pneumonia_t2 1 26272
- Age_t2 1 26665
- pneumonia_t1 1 26847
Step: AIC=26137.47
Surv(waktu, status) ~ obesity + diabetes + renal_chronic + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2 + asthma + inmsupr
Df AIC
- diabetes 1 26136
- inmsupr 1 26136
- obesity 1 26137
<none> 26138
- asthma 1 26138
+ hypertension 1 26139
+ sex 1 26139
+ pregnancy 1 26139
+ tobacco 1 26140
- renal_chronic 1 26154
- Age_t1 1 26167
- pneumonia_t2 1 26270
- Age_t2 1 26674
- pneumonia_t1 1 26845
Step: AIC=26135.76
Surv(waktu, status) ~ obesity + renal_chronic + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2 + asthma + inmsupr
Df AIC
- inmsupr 1 26135
- obesity 1 26135
<none> 26136
- asthma 1 26137
+ diabetes 1 26138
+ sex 1 26138
+ hypertension 1 26138
+ pregnancy 1 26138
+ tobacco 1 26138
- renal_chronic 1 26153
- Age_t1 1 26165
- pneumonia_t2 1 26268
- Age_t2 1 26685
- pneumonia_t1 1 26844
Step: AIC=26134.63
Surv(waktu, status) ~ obesity + renal_chronic + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2 + asthma
Df AIC
- obesity 1 26134
<none> 26135
- asthma 1 26136
+ inmsupr 1 26136
+ diabetes 1 26136
+ hypertension 1 26137
+ pregnancy 1 26137
+ sex 1 26137
+ tobacco 1 26137
- renal_chronic 1 26154
- Age_t1 1 26165
- pneumonia_t2 1 26268
- Age_t2 1 26685
- pneumonia_t1 1 26843
Step: AIC=26133.59
Surv(waktu, status) ~ renal_chronic + pneumonia_t1 + pneumonia_t2 +
Age_t1 + Age_t2 + asthma
Df AIC
<none> 26134
+ obesity 1 26135
+ inmsupr 1 26135
- asthma 1 26135
+ hypertension 1 26135
+ diabetes 1 26135
+ sex 1 26135
+ pregnancy 1 26136
+ tobacco 1 26136
- renal_chronic 1 26153
- Age_t1 1 26163
- pneumonia_t2 1 26267
- Age_t2 1 26684
- pneumonia_t1 1 26841
4. Model Final
Model_Final_Cluster5<-coxph(Surv(waktu, status) ~ renal_chronic + pneumonia_t1 + pneumonia_t2 +
Age_t1 + Age_t2 + asthma, data = DataCluster5, method = "breslow")
summary(Model_Final_Cluster5)Call:
coxph(formula = Surv(waktu, status) ~ renal_chronic + pneumonia_t1 +
pneumonia_t2 + Age_t1 + Age_t2 + asthma, data = DataCluster5,
method = "breslow")
n= 3367, number of events= 1845
coef exp(coef) se(coef) z Pr(>|z|)
renal_chronic 0.29806 1.34725 0.06260 4.761 1.92e-06 ***
pneumonia_t1 1.47917 4.38929 0.05994 24.677 < 2e-16 ***
pneumonia_t2 -1.36687 0.25490 0.13996 -9.766 < 2e-16 ***
Age_t1 0.35894 1.43180 0.06167 5.820 5.89e-09 ***
Age_t2 -3.38137 0.03400 0.25382 -13.322 < 2e-16 ***
asthma -0.25712 0.77328 0.14957 -1.719 0.0856 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
renal_chronic 1.3472 0.7423 1.19169 1.52311
pneumonia_t1 4.3893 0.2278 3.90277 4.93646
pneumonia_t2 0.2549 3.9230 0.19375 0.33536
Age_t1 1.4318 0.6984 1.26878 1.61578
Age_t2 0.0340 29.4110 0.02067 0.05592
asthma 0.7733 1.2932 0.57679 1.03670
Concordance= 0.797 (se = 0.005 )
Likelihood ratio test= 2641 on 6 df, p=<2e-16
Wald test = 1297 on 6 df, p=<2e-16
Score (logrank) test = 2579 on 6 df, p=<2e-16