Risk Analysis of Covid-19 Patient Mortality Using Extended Cox Model


Survival analysis is a collection of methods for evaluate data related to time from start to finish a special event (can be death/recovery) with given censorship status to be a comparison of endurance ability human life (Cread,2020).
Berdasarkan definisi tersebut, penelitian ini memilih analisis survival untuk mengetahui penyebab kematian pasien covid 19. Penelitian ini menggunakan data pasien covid -19 yang diperoleh dari Mexican Government, dengan variabel respon yaitu time dan status (tersensor atau tidak) dan variabel prediktor yaitu hasil laboratorium pasien berupa riwayat penyakit yang pernah diderita pasien Covid-19. Data yang akan dianalisis terlalu besar dan heterogen sehingga dilakukan clustering menggunakan algoritma K-Means Clustering.Penelitian akan melakukan pengelompokan data pasien covid-19 berdasarakan tingkat keparahan yaitu terdiri dari cluster pasien tidak parah dan pasien yang parah, sehingga hasil analisis akan lebih mengerucut sesuai dengan cluster tertentu dan memperjelas kesimpulan. Setelah melakukan clustering pada data pasien covid-19, dilanjutkan dengan melakukan analisis survival pada dua cluster yang terbentuk yaitu menerapkan model extended cox yang diestimasi menggunakan metode Breslow partial likelihood dan yang terakhir dilakukan uji kesesuaian model.
Penelitian analisis risiko kematian pasien covid-19 ini diharapkan memberikan insight mengenai factor-faktor yang mempengaruhi kematian pasien covid-19 sehingga membantu tenaga medis dalam memberikan treatment yang sesuai bagi pasien yang memiliki faktor tersebut. Berikut ini merupakan flowchart penelitian ini

1. Data Preparation

Pada Tahapan ini dilakuan mempersiapkan library yang digunakan dan membaca data untuk penelitian ini

CovidData<-read.csv2("covid.csv")
head(CovidData)
        id sex patient_type entry_date date_symptoms  date_died intubed
1   1636bf   0            2 12/01/2020    09/01/2020 13/01/2020       1
2 7.98E+02   0            2 12/01/2020    12/01/2020 14/01/2020       2
3   1673d6   1            2 13/01/2020    10/01/2020 15/01/2020       2
4   1045fa   1            2 12/01/2020    12/01/2020 29/01/2020       2
5   1abb1c   1            2 12/01/2020    04/01/2020 30/01/2020       2
6   09ab8c   0            2 06/02/2020    05/02/2020 08/02/2020       2
  pneumonia age pregnancy diabetes copd asthma inmsupr hypertension
1         1  48         0        0    0      0       0            0
2         1  74         0        0    0      0       0            0
3         1  45         0       98    0     98      98            0
4         1  75         0        0    0      0       0            1
5         1  54         0        1    0      0       0            0
6         1  55         0        0    1      1       0            1
  other_disease cardiovascular obesity renal_chronic tobacco
1             2              0       0             0       0
2             2              1       0             0       1
3             2              0       0             0       0
4             1              0       1             0       0
5             2              0       1             0       0
6             2              1       1             0       0
  contact_other_covid covid_res icu
1                  99         2   2
2                  99         2   2
3                  99         2   1
4                  99         2   2
5                  99         2   1
6                  99         2   1

2. Data Identification

Pada penelitian ini, tidak menggunakan semua kolom pada data covid tersebut, Sehingga membuang variabel yang tidak dibutuhkan dengan melakukan subseting.

CovidData=CovidData[,-c(1,3,4,7,16,21,22,23)]

The following is a description of each column:
1. Sex: Identifies the sex of the patient (0: male & 1: female).
2. date_symptoms: Identifies the date on which the patient’s symptoms began.
3. data_died: Identifies the date the patient died.
4. pneumonia:Identifies if the patient was diagnosed with pneumonia (0: No & 1: yes).
5. age: Identifies the age of the patient.
6. pregnancy: Identifies if the patient is pregnant (0:No & 1:Yes).
7. diabetes: Identifies if the patient has a diagnosis of diabetes (0: No & 1: Yes).
8. copd: Identifies if the patient has a diagnosis of Chronic obstructive pulmonary disease (COPD) (0: No & 1:Yes).
9. asthma: Identifies if the patient has a diagnosis of asthma (0: No & 1: Yes).
10. inmsupr:Identifies if the patient has immunosuppression (0: No & 1: Yes) 11. hypertension:Identifies if the patient has a diagnosis of hypertension (0:No & 1:Yes).
12. cardiovascular: Identifies if the patient has a diagnosis of cardiovascular disease (0: No & 1: Yes).
13. obesity: Identifies if the patient is diagnosed with obesity (0: No & 1: Yes).
14. renal_chronic: Identifies if the patient has a diagnosis of chronic kidney failure (0: No & 1: Yes).
15. tobacco: Identifies if the patient is a tobacco user (0: No & 1: Yes).

3. Data Acquistion

Data acquistion is done so that the data format is in accordance with the objectives of the analysis.

3.1. Check Type Data

str(CovidData)
'data.frame':   125650 obs. of  15 variables:
 $ sex           : int  0 0 1 1 1 0 0 1 1 0 ...
 $ date_symptoms : chr  "09/01/2020" "12/01/2020" "10/01/2020" "12/01/2020" ...
 $ date_died     : chr  "13/01/2020" "14/01/2020" "15/01/2020" "29/01/2020" ...
 $ pneumonia     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ age           : int  48 74 45 75 54 55 63 68 32 15 ...
 $ pregnancy     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ diabetes      : int  0 0 98 0 1 0 0 0 0 0 ...
 $ copd          : int  0 0 0 0 0 1 0 0 0 0 ...
 $ asthma        : int  0 0 98 0 0 1 0 0 0 0 ...
 $ inmsupr       : int  0 0 98 0 0 0 0 0 1 0 ...
 $ hypertension  : int  0 0 0 1 0 1 1 1 1 0 ...
 $ cardiovascular: int  0 1 0 0 0 1 0 0 0 0 ...
 $ obesity       : int  0 0 0 1 1 1 0 1 0 0 ...
 $ renal_chronic : int  0 0 0 0 0 0 0 0 1 0 ...
 $ tobacco       : int  0 1 0 0 0 0 0 0 0 0 ...

Perhatikan hasil pengecekan tipe diatas, date_symptoms dan date_died masih dalam bentuk chr seharusnya dalam bentuk date. variabel pneumonia, pregnancy, diabetes, copd, asthma, inmsupr, hypertension, cardiovascular, obesity, renal_chronic, tobaco seharusnya juga dalam bentuk factor namun karena melakukan analisis survival langsung bisa diidentifkasi sehingga tidak perlu diubah kedalam bentuk tipe data factor.

CovidData$date_died <- as.Date(CovidData$date_died, format = "%d/%m/%y")
CovidData$date_symptoms <- as.Date(CovidData$date_symptoms, format = "%d/%m/%y")

str(CovidData)
'data.frame':   125650 obs. of  15 variables:
 $ sex           : int  0 0 1 1 1 0 0 1 1 0 ...
 $ date_symptoms : Date, format: "2020-01-09" "2020-01-12" ...
 $ date_died     : Date, format: "2020-01-13" "2020-01-14" ...
 $ pneumonia     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ age           : int  48 74 45 75 54 55 63 68 32 15 ...
 $ pregnancy     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ diabetes      : int  0 0 98 0 1 0 0 0 0 0 ...
 $ copd          : int  0 0 0 0 0 1 0 0 0 0 ...
 $ asthma        : int  0 0 98 0 0 1 0 0 0 0 ...
 $ inmsupr       : int  0 0 98 0 0 0 0 0 1 0 ...
 $ hypertension  : int  0 0 0 1 0 1 1 1 1 0 ...
 $ cardiovascular: int  0 1 0 0 0 1 0 0 0 0 ...
 $ obesity       : int  0 0 0 1 1 1 0 1 0 0 ...
 $ renal_chronic : int  0 0 0 0 0 0 0 0 1 0 ...
 $ tobacco       : int  0 1 0 0 0 0 0 0 0 0 ...

3.2. Feature Engineering

Buat kolom waktu sebagai variabel respon dalam penelitian ini, variabel ini dipeorleh dari tanggal kematian pasien dikurangi dengan tanggal pertama kali positif Covid-19. secara matematika dituliskan seperti berikut

\[ waktu = datedied - date symptoms \]

CovidData$waktu<-CovidData$date_died - CovidData$date_symptoms
#lalu diubah ketipe data integer
CovidData$waktu<-as.integer(CovidData$waktu)
head(CovidData)
  sex date_symptoms  date_died pneumonia age pregnancy diabetes copd asthma
1   0    2020-01-09 2020-01-13         1  48         0        0    0      0
2   0    2020-01-12 2020-01-14         1  74         0        0    0      0
3   1    2020-01-10 2020-01-15         1  45         0       98    0     98
4   1    2020-01-12 2020-01-29         1  75         0        0    0      0
5   1    2020-01-04 2020-01-30         1  54         0        1    0      0
6   0    2020-02-05 2020-02-08         1  55         0        0    1      1
  inmsupr hypertension cardiovascular obesity renal_chronic tobacco waktu
1       0            0              0       0             0       0     4
2       0            0              1       0             0       1     2
3      98            0              0       0             0       0     5
4       0            1              0       1             0       0    17
5       0            0              0       1             0       0    26
6       0            1              1       1             0       0     3

Kemudian membuat kolom status sebagai variabel respon dalam penelitian ini

CovidData = CovidData %>%
        mutate(status = ifelse(is.na(waktu), 0, 1))
head(CovidData)
  sex date_symptoms  date_died pneumonia age pregnancy diabetes copd asthma
1   0    2020-01-09 2020-01-13         1  48         0        0    0      0
2   0    2020-01-12 2020-01-14         1  74         0        0    0      0
3   1    2020-01-10 2020-01-15         1  45         0       98    0     98
4   1    2020-01-12 2020-01-29         1  75         0        0    0      0
5   1    2020-01-04 2020-01-30         1  54         0        1    0      0
6   0    2020-02-05 2020-02-08         1  55         0        0    1      1
  inmsupr hypertension cardiovascular obesity renal_chronic tobacco waktu
1       0            0              0       0             0       0     4
2       0            0              1       0             0       1     2
3      98            0              0       0             0       0     5
4       0            1              0       1             0       0    17
5       0            0              0       1             0       0    26
6       0            1              1       1             0       0     3
  status
1      1
2      1
3      1
4      1
5      1
6      1

4. Data Filtering

Pada tahapan data filtering dilakukan pengecekan missing value dan outlier.

3.1 Check missing value

anyNA(CovidData)
[1] TRUE

3.2 Check Outlier

boxplot(CovidData)

5. Data Validation

Adanya missing value pada kolom waktu kemudian melakukan imputasi pada nilai yang missing value yaitu mengisi nilai yang kosong tersebut menjadi nilai maksimumnya yaitu 100 karena maksimal waktu pasien mengalami event adalah 93 hari sehingga dibulatkan menjadi 100 hari.

CovidData$waktu[is.na(CovidData$waktu)] <- 100
head(CovidData)
  sex date_symptoms  date_died pneumonia age pregnancy diabetes copd asthma
1   0    2020-01-09 2020-01-13         1  48         0        0    0      0
2   0    2020-01-12 2020-01-14         1  74         0        0    0      0
3   1    2020-01-10 2020-01-15         1  45         0       98    0     98
4   1    2020-01-12 2020-01-29         1  75         0        0    0      0
5   1    2020-01-04 2020-01-30         1  54         0        1    0      0
6   0    2020-02-05 2020-02-08         1  55         0        0    1      1
  inmsupr hypertension cardiovascular obesity renal_chronic tobacco waktu
1       0            0              0       0             0       0     4
2       0            0              1       0             0       1     2
3      98            0              0       0             0       0     5
4       0            1              0       1             0       0    17
5       0            0              0       1             0       0    26
6       0            1              1       1             0       0     3
  status
1      1
2      1
3      1
4      1
5      1
6      1

Data pada umur sangat beragam sehingga perlu membuat kolom baru untuk mengelompokan variabel age menjadi umur tua dan muda. Menurut WHO, sesorang dikatakan masih muda saat usia $ <= 55$ tahun dan dikatakan tua saat lebih dari 55 tahun.

CovidData = CovidData %>%
        mutate(Age = ifelse(age <= 55, 1, 0))
CovidData = CovidData[,-c(2,3,5)]
head(CovidData)
  sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1   0         1         0        0    0      0       0            0
2   0         1         0        0    0      0       0            0
3   1         1         0       98    0     98      98            0
4   1         1         0        0    0      0       0            1
5   1         1         0        1    0      0       0            0
6   0         1         0        0    1      1       0            1
  cardiovascular obesity renal_chronic tobacco waktu status Age
1              0       0             0       0     4      1   1
2              1       0             0       1     2      1   0
3              0       0             0       0     5      1   1
4              0       1             0       0    17      1   0
5              0       1             0       0    26      1   1
6              1       1             0       0     3      1   1

6. Data Cleaning

tahapan selanjutnya adalah menghapus data-data outlier pada kolom pregnancy, copd, asthma, inmsupr, hypertension, cardiovascular, obesity, renal chronic, tobacco,pneumonia, diabetes dan waktu.

CovidData <- subset(CovidData, pregnancy!= 98)
CovidData <- subset(CovidData, copd!= 98)
CovidData <- subset(CovidData, asthma!= 98)
CovidData <- subset(CovidData, inmsupr!= 98)
CovidData <- subset(CovidData, hypertension!= 98)
CovidData <- subset(CovidData, cardiovascular!= 98)
CovidData <- subset(CovidData, obesity!= 98)
CovidData <- subset(CovidData, renal_chronic!= 98)
CovidData <- subset(CovidData, tobacco!= 98)
CovidData <- subset(CovidData, pneumonia!= 99)
CovidData <- subset(CovidData, diabetes!= 98)
CovidData <- subset(CovidData, waktu > 0 )
head(CovidData)
  sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1   0         1         0        0    0      0       0            0
2   0         1         0        0    0      0       0            0
4   1         1         0        0    0      0       0            1
5   1         1         0        1    0      0       0            0
6   0         1         0        0    1      1       0            1
7   0         1         0        0    0      0       0            1
  cardiovascular obesity renal_chronic tobacco waktu status Age
1              0       0             0       0     4      1   1
2              1       0             0       1     2      1   0
4              0       1             0       0    17      1   0
5              0       1             0       0    26      1   1
6              1       1             0       0     3      1   1
7              0       0             0       0    14      1   0

7. Scaling Data

Scaling data dilakukan untuk normalisasi data yang nantinya digunakan untuk proses clustering data. Proses ini dilakukan karena range antara variabel satu dengan lainnya terlalu jauh. Normalisasi ini menggunakan Z-scores.

CovidDataScale<- scale(CovidData)
head(CovidDataScale)
         sex pneumonia   pregnancy   diabetes       copd     asthma    inmsupr
1 -0.8752406  1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
2 -0.8752406  1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
4  1.1425337  1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
5  1.1425337  1.438927 -0.07518716  1.9897767 -0.1616108 -0.1652522 -0.1387816
6 -0.8752406  1.438927 -0.07518716 -0.5025649  6.1876555  6.0513084 -0.1387816
7 -0.8752406  1.438927 -0.07518716 -0.5025649 -0.1616108 -0.1652522 -0.1387816
  hypertension cardiovascular    obesity renal_chronic    tobacco     waktu
1   -0.5659387     -0.1788443 -0.5017843    -0.1858564 -0.2973432 -1.754919
2   -0.5659387      5.5914099 -0.5017843    -0.1858564  3.3630899 -1.804557
4    1.7669615     -0.1788443  1.9928722    -0.1858564 -0.2973432 -1.432273
5   -0.5659387     -0.1788443  1.9928722    -0.1858564 -0.2973432 -1.208902
6    1.7669615      5.5914099  1.9928722    -0.1858564 -0.2973432 -1.779738
7    1.7669615     -0.1788443 -0.5017843    -0.1858564 -0.2973432 -1.506729
   status        Age
1 1.58511  0.7196612
2 1.58511 -1.3895316
4 1.58511 -1.3895316
5 1.58511  0.7196612
6 1.58511  0.7196612
7 1.58511 -1.3895316

8. Clustering Data

8.1 K-means Algoritma

Dalam menentukan center \(k\) optimum menggunakan elbow method dan dipilih \(k=5\) karena dari cluster 5 ke 6 penurunan nilai total withinss landai/tidak signifikan

RNGkind(sample.kind = "Rounding")
set.seed(298)

clusterDataCovid <- kmeans(x = CovidDataScale, centers = 5)

8.2 Interpretasi/ cluster profiling

CovidData$cluster <- as.factor(clusterDataCovid$cluster)
head(CovidData)
  sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1   0         1         0        0    0      0       0            0
2   0         1         0        0    0      0       0            0
4   1         1         0        0    0      0       0            1
5   1         1         0        1    0      0       0            0
6   0         1         0        0    1      1       0            1
7   0         1         0        0    0      0       0            1
  cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1              0       0             0       0     4      1   1       1
2              1       0             0       1     2      1   0       5
4              0       1             0       0    17      1   0       1
5              0       1             0       0    26      1   1       1
6              1       1             0       0     3      1   1       4
7              0       0             0       0    14      1   0       1
CovidProfile <- CovidData %>% 
  group_by(cluster) %>% 
  summarise_all(.funs = "mean")
head(CovidProfile)
# A tibble: 5 x 16
  cluster   sex pneumonia pregnancy diabetes    copd asthma inmsupr hypertension
  <fct>   <dbl>     <dbl>     <dbl>    <dbl>   <dbl>  <dbl>   <dbl>        <dbl>
1 1       0.343     0.742   0.00125    0.382 0       0       0             0.421
2 2       0.463     0.148   0.00765    0.113 0       0       0             0.145
3 3       0.543     0.348   0.00635    0.233 0.00881 0.620   0.405         0.276
4 4       0.439     0.568   0          0.419 1       0.0549  0.0562        0.544
5 5       0.412     0.515   0.00119    0.421 0       0.0288  0.0567        0.629
# ... with 7 more variables: cardiovascular <dbl>, obesity <dbl>,
#   renal_chronic <dbl>, tobacco <dbl>, waktu <dbl>, status <dbl>, Age <dbl>
CovidProfile %>% 
  tidyr::pivot_longer(-cluster) %>% 
  group_by(name) %>% 
  summarize(cluster_min_val = which.min(value),
            cluster_max_val = which.max(value))
# A tibble: 15 x 3
   name           cluster_min_val cluster_max_val
   <chr>                    <int>           <int>
 1 Age                          4               2
 2 asthma                       1               3
 3 cardiovascular               1               5
 4 copd                         1               4
 5 diabetes                     2               5
 6 hypertension                 2               5
 7 inmsupr                      1               3
 8 obesity                      2               5
 9 pneumonia                    2               1
10 pregnancy                    4               2
11 renal_chronic                2               5
12 sex                          1               3
13 status                       2               1
14 tobacco                      2               4
15 waktu                        1               2

Keterangan
1. Cluster 1: Pada Cluster 1 bisa dikatakan Cluster darurat karena pada cluster ini hampir 90% mengalami kematian. Komorbid yang dijumpai dalam cluster ini adalah pneumonia sementara tidak pasien yang memiliki komorbid asthma, cardivascular, copd dan inmsupr.
2. Cluster 2: Pada Cluster 2 berbanding terbalik denga cluster 1 yaitu 99% pasien pada cluster ini sembuh atau survive dari Covid-19. Pada Cluster ini lebih dari 50% pasien berusia muda dan semua pasien tidak dalam keadaan hamil.
3. Cluster 3: Pada cluster 3 komorbid yang sering dijumpai adalah asthma dan banyak pasien yang menalami penurunan imun tubuh.
4. Cluster 4: Pada Cluster 4 komorbid yang sering dijumpai adalah copd.
5. Cluster 5: Pada cluster 5 komorbid yang sering dijumpai adalah cardiovascular, hypertension, diabetes, renal_chronic.

8.3 Visualize clustering

library(factoextra)
fviz_cluster(object = clusterDataCovid,
             data = CovidDataScale)

9. Analisis Survival

Anlisis survival akan diterapkan pada 4 cluster yaitu cluster 1, cluster 3, cluster 4 dan cluster 5. Cluster 2 tidak dikutkan karena pada cluster 2 hampir semua pasien sembuh sementara dalam penelitian ini event berupa kematian.

9.1 Cluster 1

DataCluster1 <- CovidData %>% 
  filter(cluster == 1)

DataCluster1[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster1[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)

summary(DataCluster1)
 sex       pneumonia pregnancy diabetes  copd      asthma    inmsupr  
 0:20487   0: 8041   0:31141   0:19272   0:31180   0:31180   0:31180  
 1:10693   1:23139   1:   39   1:11908                                
                                                                      
                                                                      
                                                                      
                                                                      
 hypertension cardiovascular obesity   renal_chronic tobacco       waktu       
 0:18056      0:31180        0:24044   0:28990       0:28648   Min.   :  1.00  
 1:13124                     1: 7136   1: 2190       1: 2532   1st Qu.:  6.00  
                                                               Median : 10.00  
                                                               Mean   : 14.99  
                                                               3rd Qu.: 16.00  
                                                               Max.   :100.00  
 status    Age       cluster  
 0: 1301   0:20457   1:31180  
 1:29879   1:10723   2:    0  
                     3:    0  
                     4:    0  
                     5:    0  
                              

Berdasarkan summary dari DataCluster1 diperoleh tidak ada pasien yang memiliki komorbid copd, asthma dan inmsupr. Jika keempat variabel tersebut diikutkan akan mempengarahui model karena nilai disetiap varaibel sama semua yaitu 0 sehingga yang harus dilakukan adalah menghapus keempat variabel tersebut.

DataCluster1 <- CovidData %>% 
  filter(cluster == 1) %>% 
  select(-c("copd","asthma",'inmsupr','cardiovascular'))
head(DataCluster1)
  sex pneumonia pregnancy diabetes hypertension obesity renal_chronic tobacco
1   0         1         0        0            0       0             0       0
2   1         1         0        0            1       1             0       0
3   1         1         0        1            0       1             0       0
4   0         1         0        0            1       0             0       0
5   1         1         0        0            1       1             0       0
6   0         1         0        0            0       0             0       0
  waktu status Age cluster
1     4      1   1       1
2    17      1   0       1
3    26      1   1       1
4    14      1   0       1
5     2      1   0       1
6     8      1   1       1

1. Uji proportional hazard

Sebelum dilakukan analisis akan dilakukan pengujian asumsi proportional hazard dengan dua pendekatan yaitu pendekatan grafik dan pendekatan Goodness Of Fit (GOF).

a. Pendekatan Grafik

Variabel sex
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Jenis Kelamin",
          legend.labs = c("Laki-Laki", "Perempuan")
          )

Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Pneumonia",
          legend.labs = c("no Pneumonia", "yes pneumonia")
          )

Variabel Pregnancy
fit<-survfit(Surv(waktu,status)~ pregnancy, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Pregnancy",
          legend.labs = c("no pregnancy", "yes pregnancy")
          )

Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Diabetes",
          legend.labs = c("no Diabetes", "yes Diabetes")
          )

Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Hyperetension",
          legend.labs = c("no hypertension", "yes hypertension")
          )

Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Obesity",
          legend.labs = c("no obesity", "yes obesity")
          )

Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Renal Chronic",
          legend.labs = c("no renal chronic", "yes renal chronic")
          )

variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Tobacco",
          legend.labs = c("no tobacco", "yes tobacco")
          )

Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, fun = "cumhaz",
           legend = "top",
          legend.title = "Age",
          legend.labs = c("young", "old")
          )

b. Pendekatan GOF

fit_cluster1<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic, data = DataCluster1)

cek_fitt_all1<-cox.zph(fit_cluster1,  transform="km", global=TRUE)

print(cek_fitt_all1)
                 chisq df      p
sex           1.17e+02  1 <2e-16
pneumonia     1.03e+01  1 0.0013
Age           1.94e+02  1 <2e-16
pregnancy     2.97e-02  1 0.8632
hypertension  6.34e+02  1 <2e-16
obesity       3.45e-01  1 0.5572
tobacco       7.03e+00  1 0.0080
diabetes      9.30e+02  1 <2e-16
renal_chronic 4.39e+02  1 <2e-16
GLOBAL        1.42e+03  9 <2e-16

Perhatikan hasil pengujian ada beberapa yang tidak memenuhi asumsi proportional hazard yaitu sex, pneumonia, age, hypertension, tobacco, diabetes, renal_chronic. maka ke-tujuh variabel ini akan dinteraksikan dengan fungsi waktu yaitu fungsi heavside pada setiap variabel. Untuk dapat menentukan fungsi heavsidenya adalah dengan melihat grafik survival probabilty.

# VARIABEL SEX
fit<-survfit(Surv(waktu,status)~ sex , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien laki-laki turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehingga fungsi heavsidenya adalah:

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien komorbid pneumonia turun cepat pada hari ke 25, namun setelah hari ke 25 konstan sampai akhir penelitian. sehingga fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

# VARIABEL AGE
fit<-survfit(Surv(waktu,status)~ Age , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien yang berusia tua turun cepat pada hari ke 25, namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

#VARIABE HYPERTENSION
fit<-survfit(Surv(waktu,status)~ hypertension , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien yang hipertensi turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien yang perokok turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien yang diabetes cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

fit<-survfit(Surv(waktu,status)~ renal_chronic , data=DataCluster1)
ggsurvplot(fit, data = DataCluster1, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien yang renal chronic turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

sehingga ke-tujuh variabel tersebut diinterkasikan dengan fungsi waktu masing-masing. namun sebelumnya membuat kolom interval yaitu jika t lebih besar atau sama dengan 25 maka nilanya 25 tapi kalau tidak nilainya 0

DataCluster1= DataCluster1 %>%
        mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster1)
  sex pneumonia pregnancy diabetes hypertension obesity renal_chronic tobacco
1   0         1         0        0            0       0             0       0
2   1         1         0        0            1       1             0       0
3   1         1         0        1            0       1             0       0
4   0         1         0        0            1       0             0       0
5   1         1         0        0            1       1             0       0
6   0         1         0        0            0       0             0       0
  waktu status Age cluster interval
1     4      1   1       1        0
2    17      1   0       1        0
3    26      1   1       1       25
4    14      1   0       1        0
5     2      1   0       1        0
6     8      1   1       1        0
# Variabel Pneumonia
t.pneumonia_1 = t(DataCluster1$pneumonia)*(1-(DataCluster1$interval)/25) 
t.pneumonia_2 = t(DataCluster1$pneumonia)*(DataCluster1$interval/25) 
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)
#variabel Sex
t.sex_1 = t(DataCluster1$sex)*(1-(DataCluster1$interval)/25) 
t.sex_2 = t(DataCluster1$sex)*(DataCluster1$interval/25) 
sex_t1 = t(t.sex_1 )
sex_t2 = t(t.sex_2)
# Variabel Sex
t.hypertension_1 = t(DataCluster1$hypertension)*(1-(DataCluster1$interval)/25) 
t.hypertension_2 = t(DataCluster1$hypertension)*(DataCluster1$interval/25) 
hypertension_t1 = t(t.hypertension_1 )
hypertension_t2 = t(t.hypertension_2)
#Variabel Age
t.Age_1 = t(DataCluster1$Age)*(1-(DataCluster1$interval)/25) 
t.Age_2 = t(DataCluster1$Age)*(DataCluster1$interval/25) 
Age_t1 = t(t.Age_1 )
Age_t2 = t(t.Age_2)
### Variabel Tobacco
t.tobacco_1 = t(DataCluster1$tobacco)*(1-(DataCluster1$interval)/25) 
t.tobacco_2 = t(DataCluster1$tobacco)*(DataCluster1$interval/25) 
tobacco_t1 = t(t.tobacco_1)
tobacco_t2 = t(t.tobacco_2)
# Variabel Diabetes
t.diabetes_1 = t(DataCluster1$diabetes)*(1-(DataCluster1$interval)/25) 
t.diabetes_2 = t(DataCluster1$diabetes)*(DataCluster1$interval/25) 
diabetes_t1 = t(t.diabetes_1 )
diabetes_t2 = t(t.diabetes_2)
#Variabel renal chronic
t.renal_chronic_1 = 
  t(DataCluster1$renal_chronic)*(1-(DataCluster1$interval)/25) 
t.renal_chronic_2 = 
  t(DataCluster1$renal_chronic)*(DataCluster1$interval/25) 
renal_chronic_t1 = t(t.renal_chronic_1 )
renal_chronic_t2 = t(t.renal_chronic_2)
DataCluster1=cbind(DataCluster1,pneumonia_t1,pneumonia_t2,sex_t1,sex_t2,hypertension_t1,hypertension_t2,Age_t1,Age_t2,tobacco_t1,tobacco_t2,diabetes_t1,diabetes_t2,renal_chronic_t1,renal_chronic_t2)
head(DataCluster1)
  sex pneumonia pregnancy diabetes hypertension obesity renal_chronic tobacco
1   0         1         0        0            0       0             0       0
2   1         1         0        0            1       1             0       0
3   1         1         0        1            0       1             0       0
4   0         1         0        0            1       0             0       0
5   1         1         0        0            1       1             0       0
6   0         1         0        0            0       0             0       0
  waktu status Age cluster interval pneumonia_t1 pneumonia_t2 sex_t1 sex_t2
1     4      1   1       1        0            1            0      0      0
2    17      1   0       1        0            1            0      1      0
3    26      1   1       1       25            0            1      0      1
4    14      1   0       1        0            1            0      0      0
5     2      1   0       1        0            1            0      1      0
6     8      1   1       1        0            1            0      0      0
  hypertension_t1 hypertension_t2 Age_t1 Age_t2 tobacco_t1 tobacco_t2
1               0               0      1      0          0          0
2               1               0      0      0          0          0
3               0               0      0      1          0          0
4               1               0      0      0          0          0
5               1               0      0      0          0          0
6               0               0      1      0          0          0
  diabetes_t1 diabetes_t2 renal_chronic_t1 renal_chronic_t2
1           0           0                0                0
2           0           0                0                0
3           0           1                0                0
4           0           0                0                0
5           0           0                0                0
6           0           0                0                0

2. Model Extended Cox with Bresslow Method

Fit_Model_Cluster1<-coxph(Surv(waktu,status)~ pregnancy+obesity+pneumonia_t1+pneumonia_t2+sex_t1+sex_t2+hypertension_t1+hypertension_t2+Age_t1+Age_t2+tobacco_t1+tobacco_t2+diabetes_t1+diabetes_t2+renal_chronic_t1+renal_chronic_t2, data = DataCluster1, method = "breslow")

summary(Fit_Model_Cluster1)
Call:
coxph(formula = Surv(waktu, status) ~ pregnancy + obesity + pneumonia_t1 + 
    pneumonia_t2 + sex_t1 + sex_t2 + hypertension_t1 + hypertension_t2 + 
    Age_t1 + Age_t2 + tobacco_t1 + tobacco_t2 + diabetes_t1 + 
    diabetes_t2 + renal_chronic_t1 + renal_chronic_t2, data = DataCluster1, 
    method = "breslow")

  n= 31180, number of events= 29879 

                      coef exp(coef)  se(coef)       z Pr(>|z|)    
pregnancy        -0.031851  0.968651  0.160725  -0.198   0.8429    
obesity          -0.001491  0.998510  0.014045  -0.106   0.9155    
pneumonia_t1      0.027843  1.028235  0.013635   2.042   0.0412 *  
pneumonia_t2     -1.501625  0.222768  0.041847 -35.883  < 2e-16 ***
sex_t1            0.109628  1.115863  0.012823   8.549  < 2e-16 ***
sex_t2           -0.553374  0.575007  0.052150 -10.611  < 2e-16 ***
hypertension_t1   0.065329  1.067510  0.013242   4.934 8.07e-07 ***
hypertension_t2  -1.259851  0.283696  0.063089 -19.969  < 2e-16 ***
Age_t1            0.063498  1.065558  0.012784   4.967 6.80e-07 ***
Age_t2           -0.723086  0.485253  0.049179 -14.703  < 2e-16 ***
tobacco_t1        0.023668  1.023950  0.021906   1.080   0.2800    
tobacco_t2       -0.780017  0.458398  0.092202  -8.460  < 2e-16 ***
diabetes_t1       0.111644  1.118115  0.013187   8.466  < 2e-16 ***
diabetes_t2      -1.415757  0.242742  0.068008 -20.818  < 2e-16 ***
renal_chronic_t1  0.240653  1.272079  0.024665   9.757  < 2e-16 ***
renal_chronic_t2 -0.860596  0.422910  0.126612  -6.797 1.07e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                 exp(coef) exp(-coef) lower .95 upper .95
pregnancy           0.9687     1.0324    0.7069    1.3273
obesity             0.9985     1.0015    0.9714    1.0264
pneumonia_t1        1.0282     0.9725    1.0011    1.0561
pneumonia_t2        0.2228     4.4890    0.2052    0.2418
sex_t1              1.1159     0.8962    1.0882    1.1443
sex_t2              0.5750     1.7391    0.5191    0.6369
hypertension_t1     1.0675     0.9368    1.0402    1.0956
hypertension_t2     0.2837     3.5249    0.2507    0.3210
Age_t1              1.0656     0.9385    1.0392    1.0926
Age_t2              0.4853     2.0608    0.4407    0.5344
tobacco_t1          1.0240     0.9766    0.9809    1.0689
tobacco_t2          0.4584     2.1815    0.3826    0.5492
diabetes_t1         1.1181     0.8944    1.0896    1.1474
diabetes_t2         0.2427     4.1196    0.2124    0.2774
renal_chronic_t1    1.2721     0.7861    1.2120    1.3351
renal_chronic_t2    0.4229     2.3646    0.3300    0.5420

Concordance= 0.606  (se = 0.002 )
Likelihood ratio test= 15732  on 16 df,   p=<2e-16
Wald test            = 6644  on 16 df,   p=<2e-16
Score (logrank) test = 9498  on 16 df,   p=<2e-16

3. Model Improvment

model.stepwise.cluster1<-step(object = Fit_Model_Cluster1,direction = "both", trace=T,data=DataCluster1)
Start:  AIC=553964.4
Surv(waktu, status) ~ pregnancy + obesity + pneumonia_t1 + pneumonia_t2 + 
    sex_t1 + sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + 
    Age_t2 + tobacco_t1 + tobacco_t2 + diabetes_t1 + diabetes_t2 + 
    renal_chronic_t1 + renal_chronic_t2

                   Df    AIC
- obesity           1 553962
- pregnancy         1 553962
- tobacco_t1        1 553964
<none>                553964
- pneumonia_t1      1 553967
- hypertension_t1   1 553987
- Age_t1            1 553987
- renal_chronic_t2  1 554021
- diabetes_t1       1 554033
- sex_t1            1 554035
- renal_chronic_t1  1 554052
- tobacco_t2        1 554054
- sex_t2            1 554088
- Age_t2            1 554215
- hypertension_t2   1 554448
- diabetes_t2       1 554481
- pneumonia_t2      1 555469

Step:  AIC=553962.4
Surv(waktu, status) ~ pregnancy + pneumonia_t1 + pneumonia_t2 + 
    sex_t1 + sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + 
    Age_t2 + tobacco_t1 + tobacco_t2 + diabetes_t1 + diabetes_t2 + 
    renal_chronic_t1 + renal_chronic_t2

                   Df    AIC
- pregnancy         1 553960
- tobacco_t1        1 553962
<none>                553962
+ obesity           1 553964
- pneumonia_t1      1 553965
- hypertension_t1   1 553985
- Age_t1            1 553985
- renal_chronic_t2  1 554019
- diabetes_t1       1 554031
- sex_t1            1 554033
- renal_chronic_t1  1 554050
- tobacco_t2        1 554052
- sex_t2            1 554086
- Age_t2            1 554213
- hypertension_t2   1 554446
- diabetes_t2       1 554479
- pneumonia_t2      1 555468

Step:  AIC=553960.4
Surv(waktu, status) ~ pneumonia_t1 + pneumonia_t2 + sex_t1 + 
    sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + Age_t2 + 
    tobacco_t1 + tobacco_t2 + diabetes_t1 + diabetes_t2 + renal_chronic_t1 + 
    renal_chronic_t2

                   Df    AIC
- tobacco_t1        1 553960
<none>                553960
+ pregnancy         1 553962
+ obesity           1 553962
- pneumonia_t1      1 553963
- hypertension_t1   1 553983
- Age_t1            1 553983
- renal_chronic_t2  1 554017
- diabetes_t1       1 554029
- sex_t1            1 554031
- renal_chronic_t1  1 554048
- tobacco_t2        1 554050
- sex_t2            1 554084
- Age_t2            1 554211
- hypertension_t2   1 554444
- diabetes_t2       1 554477
- pneumonia_t2      1 555466

Step:  AIC=553959.6
Surv(waktu, status) ~ pneumonia_t1 + pneumonia_t2 + sex_t1 + 
    sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + Age_t2 + 
    tobacco_t2 + diabetes_t1 + diabetes_t2 + renal_chronic_t1 + 
    renal_chronic_t2

                   Df    AIC
<none>                553960
+ tobacco_t1        1 553960
+ pregnancy         1 553962
+ obesity           1 553962
- pneumonia_t1      1 553962
- hypertension_t1   1 553982
- Age_t1            1 553982
- renal_chronic_t2  1 554016
- sex_t1            1 554029
- diabetes_t1       1 554029
- renal_chronic_t1  1 554047
- tobacco_t2        1 554049
- sex_t2            1 554084
- Age_t2            1 554211
- hypertension_t2   1 554444
- diabetes_t2       1 554477
- pneumonia_t2      1 555468

4. Model Final

Model_Final_Cluster1<-coxph(Surv(waktu, status) ~ 
      + pneumonia_t2 + sex_t1 + sex_t2 + hypertension_t1 +
        hypertension_t2 + Age_t1 + Age_t2 + tobacco_t2 + diabetes_t1 +
        diabetes_t2 + renal_chronic_t1 + 
        renal_chronic_t2, data = DataCluster1, method = "breslow")

summary(Model_Final_Cluster1)
Call:
coxph(formula = Surv(waktu, status) ~ +pneumonia_t2 + sex_t1 + 
    sex_t2 + hypertension_t1 + hypertension_t2 + Age_t1 + Age_t2 + 
    tobacco_t2 + diabetes_t1 + diabetes_t2 + renal_chronic_t1 + 
    renal_chronic_t2, data = DataCluster1, method = "breslow")

  n= 31180, number of events= 29879 

                     coef exp(coef) se(coef)       z Pr(>|z|)    
pneumonia_t2     -1.50954   0.22101  0.04161 -36.280  < 2e-16 ***
sex_t1            0.10793   1.11397  0.01270   8.500  < 2e-16 ***
sex_t2           -0.55884   0.57187  0.05205 -10.736  < 2e-16 ***
hypertension_t1   0.06566   1.06786  0.01319   4.977 6.47e-07 ***
hypertension_t2  -1.26332   0.28271  0.06295 -20.069  < 2e-16 ***
Age_t1            0.06394   1.06602  0.01267   5.045 4.54e-07 ***
Age_t2           -0.73190   0.48099  0.04897 -14.945  < 2e-16 ***
tobacco_t2       -0.78736   0.45504  0.09213  -8.547  < 2e-16 ***
diabetes_t1       0.11329   1.11996  0.01317   8.603  < 2e-16 ***
diabetes_t2      -1.41660   0.24254  0.06789 -20.866  < 2e-16 ***
renal_chronic_t1  0.23788   1.26855  0.02459   9.675  < 2e-16 ***
renal_chronic_t2 -0.86118   0.42266  0.12661  -6.802 1.03e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                 exp(coef) exp(-coef) lower .95 upper .95
pneumonia_t2        0.2210     4.5246    0.2037    0.2398
sex_t1              1.1140     0.8977    1.0866    1.1420
sex_t2              0.5719     1.7486    0.5164    0.6333
hypertension_t1     1.0679     0.9365    1.0406    1.0958
hypertension_t2     0.2827     3.5371    0.2499    0.3198
Age_t1              1.0660     0.9381    1.0399    1.0928
Age_t2              0.4810     2.0790    0.4370    0.5294
tobacco_t2          0.4550     2.1976    0.3799    0.5451
diabetes_t1         1.1200     0.8929    1.0914    1.1492
diabetes_t2         0.2425     4.1231    0.2123    0.2771
renal_chronic_t1    1.2686     0.7883    1.2089    1.3312
renal_chronic_t2    0.4227     2.3660    0.3298    0.5417

Concordance= 0.609  (se = 0.002 )
Likelihood ratio test= 15726  on 12 df,   p=<2e-16
Wald test            = 6657  on 12 df,   p=<2e-16
Score (logrank) test = 9498  on 12 df,   p=<2e-16

9.2 Cluster 2

DataCluster2 <- CovidData %>% 
  filter(cluster == 2)

DataCluster2[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster2[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)

summary(DataCluster2)
 sex       pneumonia pregnancy diabetes  copd      asthma    inmsupr  
 0:43765   0:69413   0:80828   0:72215   0:81451   0:81451   0:81451  
 1:37686   1:12038   1:  623   1: 9236                                
                                                                      
                                                                      
                                                                      
                                                                      
 hypertension cardiovascular obesity   renal_chronic tobacco       waktu       
 0:69655      0:81451        0:66847   0:80602       0:75397   Min.   : 42.00  
 1:11796                     1:14604   1:  849       1: 6054   1st Qu.:100.00  
                                                               Median :100.00  
                                                               Mean   : 99.99  
                                                               3rd Qu.:100.00  
                                                               Max.   :100.00  
 status    Age       cluster  
 0:81439   0:15585   1:    0  
 1:   12   1:65866   2:81451  
                     3:    0  
                     4:    0  
                     5:    0  
                              

Karena Clsuter 2 banyak yang survive maka cluster 2 tidak dilakukan analisis karena batas penelitian ini adalah eventnya adalah kematian

9.3 Cluster 3

DataCluster3<- CovidData %>% 
  filter(cluster == 3)

DataCluster3[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster3[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)

summary(DataCluster3)
 sex      pneumonia pregnancy diabetes copd     asthma   inmsupr  hypertension
 0:2229   0:3180    0:4849    0:3745   0:4837   0:1852   0:2903   0:3532      
 1:2651   1:1700    1:  31    1:1135   1:  43   1:3028   1:1977   1:1348      
                                                                              
                                                                              
                                                                              
                                                                              
 cardiovascular obesity  renal_chronic tobacco      waktu        status  
 0:4843         0:3592   0:4484        0:4484   Min.   :  1.00   0:3294  
 1:  37         1:1288   1: 396        1: 396   1st Qu.: 15.00   1:1586  
                                                Median :100.00           
                                                Mean   : 70.97           
                                                3rd Qu.:100.00           
                                                Max.   :100.00           
 Age      cluster 
 0:1473   1:   0  
 1:3407   2:   0  
          3:4880  
          4:   0  
          5:   0  
                  
DataCluster3<- CovidData %>% 
  filter(cluster == 3)
head(DataCluster3)
  sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1   1         1         0        0    0      0       1            1
2   1         1         0        0    0      0       1            1
3   0         1         0        0    0      1       1            1
4   0         1         0        0    0      0       1            1
5   1         0         0        0    0      0       1            0
6   0         1         0        1    0      0       1            0
  cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1              0       0             1       0     7      1   1       3
2              0       1             1       0    12      1   1       3
3              0       0             0       1    22      1   1       3
4              0       0             0       0     9      1   0       3
5              0       0             0       0     9      1   1       3
6              0       0             1       1     6      1   0       3

Sebelum diterapkan model extended cox perlu dilakukan pengujian apakah semua variabel independent terhadap waktu. uji asumsi proportional hazarad ada dua pendekatan yaitu pendekatan secara grafik dan pendekatan GOF.

a. Pendekatan Grafik

Variabel sex
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Jenis Kelamin",
          legend.labs = c("Laki-Laki", "Perempuan")
          )

Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Pneumonia",
          legend.labs = c("no Pneumonia", "yes pneumonia")
          )

Variabel Pregnancy
fit<-survfit(Surv(waktu,status)~ pregnancy, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Pregnancy",
          legend.labs = c("no pregnancy", "yes pregnancy")
          )

Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Diabetes",
          legend.labs = c("no Diabetes", "yes Diabetes")
          )

Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Hyperetension",
          legend.labs = c("no hypertension", "yes hypertension")
          )

Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Obesity",
          legend.labs = c("no obesity", "yes obesity")
          )

Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Renal Chronic",
          legend.labs = c("no renal chronic", "yes renal chronic")
          )

Variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Tobacco",
          legend.labs = c("no tobacco", "yes tobacco")
          )

Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Age",
          legend.labs = c("young", "old")
          )

Variabel COPD
fit<-survfit(Surv(waktu,status)~ copd, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Age",
          legend.labs = c("young", "old")
          )

Variabel inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "inmsupr",
          legend.labs = c("no inmsupr", "yes inmsupr")
          )

Variabel asthma
fit<-survfit(Surv(waktu,status)~ asthma, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "asthma",
          legend.labs = c("no asthma", "yes asthma")
          )

Variabel cardiovascular
fit<-survfit(Surv(waktu,status)~ cardiovascular, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, fun = "cumhaz",
           legend = "top",
          legend.title = "Cardiovascular",
          legend.labs = c("No", "Yes")
          )

b. Pendekatan GOF

fit_cluster3<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic+inmsupr+asthma+cardiovascular+copd, data = DataCluster3)

cek_fitt_all3<-cox.zph(fit_cluster3,  transform="km", global=TRUE)

print(cek_fitt_all3)
                 chisq df       p
sex              4.480  1 0.03430
pneumonia       68.983  1 < 2e-16
Age             13.476  1 0.00024
pregnancy        2.327  1 0.12715
hypertension     1.995  1 0.15787
obesity          1.958  1 0.16171
tobacco          0.116  1 0.73356
diabetes         1.501  1 0.22053
renal_chronic    0.697  1 0.40380
inmsupr          8.815  1 0.00299
asthma           7.261  1 0.00705
cardiovascular   0.381  1 0.53730
copd             0.204  1 0.65118
GLOBAL         107.288 13 < 2e-16

Ada 4 variabel yang tidak memenuhi asumsi proportional hazard yaitu: pneumonia, Age, inmsupr, asthma dengan taraf signifikansi 10%. maka ke-empat variabel ini akan dinteraksikan dengan fungsi waktu yaitu fungsi heavside pada setiap variabel. Untuk dapat menentukan fungsi heavsidenya adalah dengan melihat grafik survival probabilty.

# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien pneumonia turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

# VARIABEL Age
fit<-survfit(Surv(waktu,status)~ Age , data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

# VARIABEL Asthma
fit<-survfit(Surv(waktu,status)~ asthma , data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

# VARIABEL inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster3)
ggsurvplot(fit, data = DataCluster3, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

sehingga ke-empat tersebut diinterkasikan dengan fungsi waktu masing-masing. namun sebelumnya membuat kolom interval yaitu jika t lebih besar atau sama dengan 25 maka nilanya 25 tapi kalau tidak nilainya 0

DataCluster3= DataCluster3 %>%
        mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster3)
  sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1   1         1         0        0    0      0       1            1
2   1         1         0        0    0      0       1            1
3   0         1         0        0    0      1       1            1
4   0         1         0        0    0      0       1            1
5   1         0         0        0    0      0       1            0
6   0         1         0        1    0      0       1            0
  cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1              0       0             1       0     7      1   1       3
2              0       1             1       0    12      1   1       3
3              0       0             0       1    22      1   1       3
4              0       0             0       0     9      1   0       3
5              0       0             0       0     9      1   1       3
6              0       0             1       1     6      1   0       3
  interval
1        0
2        0
3        0
4        0
5        0
6        0
t.pneumonia_1 = t(DataCluster3$pneumonia)*(1-(DataCluster3$interval)/25) 
t.pneumonia_2 = t(DataCluster3$pneumonia)*(DataCluster3$interval/25) 
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)
t.Age_1 = t(DataCluster3$Age)*(1-(DataCluster3$interval)/25) 
t.Age_2 = t(DataCluster3$Age)*(DataCluster3$interval/25) 
Age_t1 = t(t.Age_1 )
Age_t2 = t(t.Age_2)
t.asthma_1 = t(DataCluster3$asthma)*(1-(DataCluster3$interval)/25) 
t.asthma_2 = t(DataCluster3$asthma)*(DataCluster3$interval/25) 
asthma_t1 = t(t.asthma_1 )
asthma_t2 = t(t.asthma_2)
t.inmsupr_1 = t(DataCluster3$inmsupr)*(1-(DataCluster3$interval)/25) 
t.inmsupr_2 = t(DataCluster3$inmsupr)*(DataCluster3$interval/25) 
inmsupr_t1 = t(t.inmsupr_1 )
inmsupr_t2 = t(t.inmsupr_2)
DataCluster3=cbind(DataCluster3,pneumonia_t1,pneumonia_t2,asthma_t1,asthma_t2,inmsupr_t2,inmsupr_t1,Age_t1,Age_t2)
head(DataCluster3)
  sex pneumonia pregnancy diabetes copd asthma inmsupr hypertension
1   1         1         0        0    0      0       1            1
2   1         1         0        0    0      0       1            1
3   0         1         0        0    0      1       1            1
4   0         1         0        0    0      0       1            1
5   1         0         0        0    0      0       1            0
6   0         1         0        1    0      0       1            0
  cardiovascular obesity renal_chronic tobacco waktu status Age cluster
1              0       0             1       0     7      1   1       3
2              0       1             1       0    12      1   1       3
3              0       0             0       1    22      1   1       3
4              0       0             0       0     9      1   0       3
5              0       0             0       0     9      1   1       3
6              0       0             1       1     6      1   0       3
  interval pneumonia_t1 pneumonia_t2 asthma_t1 asthma_t2 inmsupr_t2 inmsupr_t1
1        0            1            0         0         0          0          1
2        0            1            0         0         0          0          1
3        0            1            0         1         0          0          1
4        0            1            0         0         0          0          1
5        0            0            0         0         0          0          1
6        0            1            0         0         0          0          1
  Age_t1 Age_t2
1      1      0
2      1      0
3      1      0
4      0      0
5      1      0
6      0      0

2. Model Extended Cox with Breslow Method

Fit_Model_Cluster3<-coxph(Surv(waktu,status)~ sex+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic+cardiovascular+copd+inmsupr_t1+inmsupr_t2+asthma_t1 +asthma_t2+ pneumonia_t1+pneumonia_t2+Age_t1+Age_t2, data = DataCluster3, method = "breslow")

summary(Fit_Model_Cluster3)
Call:
coxph(formula = Surv(waktu, status) ~ sex + pregnancy + hypertension + 
    obesity + tobacco + diabetes + renal_chronic + cardiovascular + 
    copd + inmsupr_t1 + inmsupr_t2 + asthma_t1 + asthma_t2 + 
    pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2, data = DataCluster3, 
    method = "breslow")

  n= 4880, number of events= 1586 

                     coef  exp(coef)   se(coef)      z Pr(>|z|)    
sex             8.152e-02  1.085e+00  5.211e-02  1.564 0.117718    
pregnancy       4.987e-01  1.647e+00  5.040e-01  0.990 0.322397    
hypertension    2.160e-02  1.022e+00  5.570e-02  0.388 0.698121    
obesity        -3.732e-02  9.634e-01  6.009e-02 -0.621 0.534570    
tobacco        -2.475e-02  9.756e-01  8.983e-02 -0.276 0.782896    
diabetes       -1.131e-02  9.888e-01  5.590e-02 -0.202 0.839703    
renal_chronic   6.937e-02  1.072e+00  7.199e-02  0.964 0.335284    
cardiovascular -7.459e-02  9.281e-01  4.286e-01 -0.174 0.861841    
copd           -3.180e-02  9.687e-01  4.323e-01 -0.074 0.941368    
inmsupr_t1      1.252e-01  1.133e+00  1.789e-01  0.700 0.483882    
inmsupr_t2     -2.276e+01  1.302e-10  5.152e+02 -0.044 0.964762    
asthma_t1      -8.851e-02  9.153e-01  1.764e-01 -0.502 0.615895    
asthma_t2      -2.349e+01  6.283e-11  5.152e+02 -0.046 0.963635    
pneumonia_t1   -1.922e-01  8.251e-01  5.680e-02 -3.384 0.000713 ***
pneumonia_t2    2.236e+00  9.354e+00  2.905e-01  7.697  1.4e-14 ***
Age_t1          6.596e-03  1.007e+00  5.323e-02  0.124 0.901378    
Age_t2         -1.055e-01  8.998e-01  2.757e-01 -0.383 0.701872    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
sex            1.085e+00  9.217e-01    0.9796    1.2016
pregnancy      1.647e+00  6.073e-01    0.6132    4.4218
hypertension   1.022e+00  9.786e-01    0.9162    1.1397
obesity        9.634e-01  1.038e+00    0.8563    1.0838
tobacco        9.756e-01  1.025e+00    0.8181    1.1634
diabetes       9.888e-01  1.011e+00    0.8862    1.1032
renal_chronic  1.072e+00  9.330e-01    0.9308    1.2343
cardiovascular 9.281e-01  1.077e+00    0.4007    2.1500
copd           9.687e-01  1.032e+00    0.4151    2.2604
inmsupr_t1     1.133e+00  8.823e-01    0.7982    1.6093
inmsupr_t2     1.302e-10  7.682e+09    0.0000       Inf
asthma_t1      9.153e-01  1.093e+00    0.6477    1.2934
asthma_t2      6.283e-11  1.592e+10    0.0000       Inf
pneumonia_t1   8.251e-01  1.212e+00    0.7382    0.9223
pneumonia_t2   9.354e+00  1.069e-01    5.2934   16.5301
Age_t1         1.007e+00  9.934e-01    0.9069    1.1173
Age_t2         8.998e-01  1.111e+00    0.5242    1.5447

Concordance= 0.92  (se = 0.002 )
Likelihood ratio test= 5976  on 17 df,   p=<2e-16
Wald test            = 116.5  on 17 df,   p=<2e-16
Score (logrank) test = 7121  on 17 df,   p=<2e-16

Biasanya Anda dapat mengabaikan pesan tersebut, sebagian besar untuk informasi Anda. Itu hal utama yang perlu diperhatikan adalah Sebuah. Ketika salah satu koefisien menuju tak terhingga dalam model Cox, Wald uji signifikansi beta/se(beta) rusak, dan tidak lagi dapat diandalkan. Namun tes LR masih valid. Karenanya rutinitas seperti stepAIC baik-baik saja. Begitu juga nilai prediksi, residu, dll. Sebenarnya itu hanya Uji Wald yang perlu diabaikan: didasarkan pada deret Taylor yang hanya tidak bekerja sejauh itu dari nol maka dari itu perlu membuang variabel yang menghasilkan infinity

Fit_Model_Cluster3<-coxph(Surv(waktu,status)~ sex+pregnancy+hypertension+obesity+tobacco+diabetes+renal_chronic+cardiovascular+copd+inmsupr_t1+asthma_t1 + pneumonia_t1+pneumonia_t2+Age_t1+Age_t2, data = DataCluster3, method = "breslow")

summary(Fit_Model_Cluster3)
Call:
coxph(formula = Surv(waktu, status) ~ sex + pregnancy + hypertension + 
    obesity + tobacco + diabetes + renal_chronic + cardiovascular + 
    copd + inmsupr_t1 + asthma_t1 + pneumonia_t1 + pneumonia_t2 + 
    Age_t1 + Age_t2, data = DataCluster3, method = "breslow")

  n= 4880, number of events= 1586 

                   coef exp(coef) se(coef)       z Pr(>|z|)    
sex            -0.09383   0.91044  0.05227  -1.795 0.072658 .  
pregnancy       0.88582   2.42497  0.50439   1.756 0.079049 .  
hypertension    0.03777   1.03849  0.05524   0.684 0.494193    
obesity        -0.04748   0.95363  0.06019  -0.789 0.430212    
tobacco        -0.11480   0.89154  0.09024  -1.272 0.203290    
diabetes        0.07676   1.07978  0.05526   1.389 0.164810    
renal_chronic   0.19930   1.22055  0.07214   2.763 0.005731 ** 
cardiovascular -1.21153   0.29774  0.33878  -3.576 0.000349 ***
copd           -1.40264   0.24595  0.34149  -4.107    4e-05 ***
inmsupr_t1      2.28540   9.82961  0.07583  30.140  < 2e-16 ***
asthma_t1       1.90470   6.71736  0.07106  26.805  < 2e-16 ***
pneumonia_t1    0.22963   1.25813  0.06323   3.632 0.000282 ***
pneumonia_t2   -0.59466   0.55175  0.16427  -3.620 0.000295 ***
Age_t1          0.14136   1.15184  0.05569   2.538 0.011142 *  
Age_t2         -2.98052   0.05077  0.17112 -17.418  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
sex              0.91044     1.0984    0.8218    1.0087
pregnancy        2.42497     0.4124    0.9023    6.5169
hypertension     1.03849     0.9629    0.9319    1.1572
obesity          0.95363     1.0486    0.8475    1.0730
tobacco          0.89154     1.1217    0.7470    1.0640
diabetes         1.07978     0.9261    0.9689    1.2033
renal_chronic    1.22055     0.8193    1.0596    1.4059
cardiovascular   0.29774     3.3586    0.1533    0.5784
copd             0.24595     4.0659    0.1259    0.4803
inmsupr_t1       9.82961     0.1017    8.4721   11.4046
asthma_t1        6.71736     0.1489    5.8441    7.7212
pneumonia_t1     1.25813     0.7948    1.1115    1.4241
pneumonia_t2     0.55175     1.8124    0.3999    0.7613
Age_t1           1.15184     0.8682    1.0327    1.2847
Age_t2           0.05077    19.6980    0.0363    0.0710

Concordance= 0.903  (se = 0.003 )
Likelihood ratio test= 5034  on 15 df,   p=<2e-16
Wald test            = 2490  on 15 df,   p=<2e-16
Score (logrank) test = 7074  on 15 df,   p=<2e-16

3. Model Improvment

model.stepwise.cluster3<-step(object = Fit_Model_Cluster3,direction = "both", trace=T,data=DataCluster3)
Start:  AIC=21381.46
Surv(waktu, status) ~ sex + pregnancy + hypertension + obesity + 
    tobacco + diabetes + renal_chronic + cardiovascular + copd + 
    inmsupr_t1 + asthma_t1 + pneumonia_t1 + pneumonia_t2 + Age_t1 + 
    Age_t2

                 Df   AIC
- hypertension    1 21380
- obesity         1 21380
- tobacco         1 21381
- diabetes        1 21381
<none>              21382
- pregnancy       1 21382
- sex             1 21383
- Age_t1          1 21386
- renal_chronic   1 21387
- pneumonia_t1    1 21393
- pneumonia_t2    1 21395
- cardiovascular  1 21397
- copd            1 21404
- Age_t2          1 21882
- asthma_t1       1 21961
- inmsupr_t1      1 22185

Step:  AIC=21379.93
Surv(waktu, status) ~ sex + pregnancy + obesity + tobacco + diabetes + 
    renal_chronic + cardiovascular + copd + inmsupr_t1 + asthma_t1 + 
    pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2

                 Df   AIC
- obesity         1 21378
- tobacco         1 21380
<none>              21380
- pregnancy       1 21380
- diabetes        1 21380
- sex             1 21381
+ hypertension    1 21382
- Age_t1          1 21384
- renal_chronic   1 21386
- pneumonia_t1    1 21391
- pneumonia_t2    1 21393
- cardiovascular  1 21396
- copd            1 21403
- Age_t2          1 21895
- asthma_t1       1 21960
- inmsupr_t1      1 22183

Step:  AIC=21378.43
Surv(waktu, status) ~ sex + pregnancy + tobacco + diabetes + 
    renal_chronic + cardiovascular + copd + inmsupr_t1 + asthma_t1 + 
    pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2

                 Df   AIC
- tobacco         1 21378
<none>              21378
- diabetes        1 21379
- pregnancy       1 21379
- sex             1 21380
+ obesity         1 21380
+ hypertension    1 21380
- Age_t1          1 21382
- renal_chronic   1 21385
- pneumonia_t1    1 21390
- pneumonia_t2    1 21391
- cardiovascular  1 21396
- copd            1 21403
- Age_t2          1 21893
- asthma_t1       1 21959
- inmsupr_t1      1 22192

Step:  AIC=21377.92
Surv(waktu, status) ~ sex + pregnancy + diabetes + renal_chronic + 
    cardiovascular + copd + inmsupr_t1 + asthma_t1 + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2

                 Df   AIC
<none>              21378
- diabetes        1 21378
- pregnancy       1 21378
+ tobacco         1 21378
- sex             1 21379
+ obesity         1 21380
+ hypertension    1 21380
- Age_t1          1 21382
- renal_chronic   1 21384
- pneumonia_t1    1 21389
- pneumonia_t2    1 21391
- cardiovascular  1 21395
- copd            1 21404
- Age_t2          1 21897
- asthma_t1       1 21957
- inmsupr_t1      1 22190

4. Model Final

Model_Final_Cluster3<-coxph(Surv(waktu, status) ~ sex + pregnancy + diabetes + renal_chronic + 
    cardiovascular + copd + inmsupr_t1 + asthma_t1 + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2, data = DataCluster3, method = "breslow")

summary(Model_Final_Cluster3)
Call:
coxph(formula = Surv(waktu, status) ~ sex + pregnancy + diabetes + 
    renal_chronic + cardiovascular + copd + inmsupr_t1 + asthma_t1 + 
    pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2, data = DataCluster3, 
    method = "breslow")

  n= 4880, number of events= 1586 

                   coef exp(coef) se(coef)       z Pr(>|z|)    
sex            -0.08358   0.91981  0.05134  -1.628 0.103499    
pregnancy       0.88464   2.42211  0.50413   1.755 0.079294 .  
diabetes        0.08002   1.08331  0.05409   1.479 0.139053    
renal_chronic   0.20489   1.22739  0.07092   2.889 0.003862 ** 
cardiovascular -1.23765   0.29007  0.33711  -3.671 0.000241 ***
copd           -1.46186   0.23181  0.33921  -4.310 1.64e-05 ***
inmsupr_t1      2.28054   9.78198  0.07494  30.433  < 2e-16 ***
asthma_t1       1.89739   6.66846  0.07055  26.893  < 2e-16 ***
pneumonia_t1    0.22401   1.25108  0.06310   3.550 0.000385 ***
pneumonia_t2   -0.60041   0.54859  0.16417  -3.657 0.000255 ***
Age_t1          0.13550   1.14511  0.05523   2.453 0.014157 *  
Age_t2         -2.99757   0.04991  0.17028 -17.604  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

               exp(coef) exp(-coef) lower .95 upper .95
sex              0.91981     1.0872   0.83177   1.01718
pregnancy        2.42211     0.4129   0.90174   6.50591
diabetes         1.08331     0.9231   0.97434   1.20447
renal_chronic    1.22739     0.8147   1.06812   1.41042
cardiovascular   0.29007     3.4475   0.14981   0.56162
copd             0.23181     4.3140   0.11923   0.45067
inmsupr_t1       9.78198     0.1022   8.44580  11.32955
asthma_t1        6.66846     0.1500   5.80724   7.65740
pneumonia_t1     1.25108     0.7993   1.10554   1.41578
pneumonia_t2     0.54859     1.8229   0.39765   0.75681
Age_t1           1.14511     0.8733   1.02762   1.27603
Age_t2           0.04991    20.0368   0.03575   0.06968

Concordance= 0.902  (se = 0.003 )
Likelihood ratio test= 5031  on 12 df,   p=<2e-16
Wald test            = 2492  on 12 df,   p=<2e-16
Score (logrank) test = 7073  on 12 df,   p=<2e-16

9.4 Cluster 4

1. Uji Proportional hazard

DataCluster4<- CovidData %>% 
  filter(cluster == 4)

DataCluster4[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster4[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)

summary(DataCluster4)
 sex      pneumonia pregnancy diabetes copd     asthma   inmsupr  hypertension
 0:1746   0:1344    0:3113    0:1809   1:3113   0:2942   0:2938   0:1421      
 1:1367   1:1769              1:1304            1: 171   1: 175   1:1692      
                                                                              
                                                                              
                                                                              
                                                                              
 cardiovascular obesity  renal_chronic tobacco      waktu        status  
 0:2674         0:2249   0:2834        0:2392   Min.   :  1.00   0:1136  
 1: 439         1: 864   1: 279        1: 721   1st Qu.:  7.00   1:1977  
                                                Median : 14.00           
                                                Mean   : 42.97           
                                                3rd Qu.:100.00           
                                                Max.   :100.00           
 Age      cluster 
 0:2531   1:   0  
 1: 582   2:   0  
          3:   0  
          4:3113  
          5:   0  
                  
DataCluster4<- CovidData %>% 
  filter(cluster == 4)%>% 
  select(-"pregnancy")
head(DataCluster4)
  sex pneumonia diabetes copd asthma inmsupr hypertension cardiovascular
1   0         1        0    1      1       0            1              1
2   1         1        1    1      0       0            1              0
3   0         1        1    1      0       1            1              0
4   0         1        0    1      0       0            0              0
5   1         1        1    1      0       0            1              0
6   0         1        1    1      0       0            1              0
  obesity renal_chronic tobacco waktu status Age cluster
1       1             0       0     3      1   1       4
2       0             0       0     9      1   0       4
3       0             1       0     6      1   0       4
4       0             0       0     5      1   0       4
5       1             1       0     6      1   0       4
6       0             0       0     8      1   1       4

Sebelum diterapkan model extended cox perlu dilakukan pengujian apakah semua variabe independent terhadap waktu. uji asumsi proportional hazarad ada dua pendekatan yaitu pendekatan secara grafik dan pendekatan GOF.

a. Pendekatan Grafik

Variabel sex
fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Jenis Kelamin",
          legend.labs = c("Laki-Laki", "Perempuan")
          )

Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Pneumonia",
          legend.labs = c("no Pneumonia", "yes pneumonia")
          )

Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Diabetes",
          legend.labs = c("no Diabetes", "yes Diabetes")
          )

##### Variabel Hypertension

fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Hyperetension",
          legend.labs = c("no hypertension", "yes hypertension")
          )

Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Obesity",
          legend.labs = c("no obesity", "yes obesity")
          )

Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Renal Chronic",
          legend.labs = c("no renal chronic", "yes renal chronic")
          )

Variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Tobacco",
          legend.labs = c("no tobacco", "yes tobacco")
          )

Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Age",
          legend.labs = c("old", "young")
          )

Variabel COPD
fit<-survfit(Surv(waktu,status)~ copd, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "copd"
          )

Variabel inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "inmsupr",
          legend.labs = c("no inmsupr", "yes inmsupr")
          )

Variabel asthma
fit<-survfit(Surv(waktu,status)~ asthma, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "asthma",
          legend.labs = c("no asthma", "yes asthma")
          )

Variabel cardiovascular
fit<-survfit(Surv(waktu,status)~ cardiovascular, data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, fun = "cumhaz",
           legend = "top",
          legend.title = "Cardiovascular",
          legend.labs = c("No", "Yes")
          )

b. Pendekatan GOF

fit_cluster4<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+hypertension+obesity+tobacco+diabetes+renal_chronic+inmsupr+asthma+cardiovascular+copd, data = DataCluster4)

cek_fitt_all4<-cox.zph(fit_cluster4,  transform="km", global=TRUE)

print(cek_fitt_all4)
                chisq df       p
sex             1.651  1   0.199
pneumonia      63.795  1 1.4e-15
Age             2.289  1   0.130
hypertension    0.353  1   0.552
obesity         1.560  1   0.212
tobacco         0.620  1   0.431
diabetes        0.365  1   0.546
renal_chronic   2.298  1   0.130
inmsupr         0.520  1   0.471
asthma          0.121  1   0.728
cardiovascular  8.303  1   0.004
GLOBAL         86.854 11 6.9e-14

Ada 2 variabel yang tidak memenuhi asumsi proportional hazard yaitu: pneumonia, cardiovascular. maka ke-dua variabel ini akan dinteraksikan dengan fungsi waktu yaitu fungsi heavside pada setiap variabel. Untuk dapat menentukan fungsi heavsidenya adalah dengan melihat grafik survival probabilty.

# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien pneumonia turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

# VARIABEL Cardiovascular
fit<-survfit(Surv(waktu,status)~ cardiovascular , data=DataCluster4)
ggsurvplot(fit, data = DataCluster4, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

sehingga ke-dua tersebut diinterkasikan dengan fungsi waktu masing-masing. namun sebelumnya membuat kolom interval yaitu jika t lebih besar atau sama dengan 25 maka nilanya 25 tapi kalau tidak nilainya 0

DataCluster4= DataCluster4 %>%
        mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster4)
  sex pneumonia diabetes copd asthma inmsupr hypertension cardiovascular
1   0         1        0    1      1       0            1              1
2   1         1        1    1      0       0            1              0
3   0         1        1    1      0       1            1              0
4   0         1        0    1      0       0            0              0
5   1         1        1    1      0       0            1              0
6   0         1        1    1      0       0            1              0
  obesity renal_chronic tobacco waktu status Age cluster interval
1       1             0       0     3      1   1       4        0
2       0             0       0     9      1   0       4        0
3       0             1       0     6      1   0       4        0
4       0             0       0     5      1   0       4        0
5       1             1       0     6      1   0       4        0
6       0             0       0     8      1   1       4        0
t.pneumonia_1 = t(DataCluster4$pneumonia)*(1-(DataCluster4$interval)/25) 
t.pneumonia_2 = t(DataCluster4$pneumonia)*(DataCluster4$interval/25) 
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)
t.cardiovascular_1 = t(DataCluster4$cardiovascular)*(1-(DataCluster4$interval)/25) 
t.cardiovascular_2 = t(DataCluster4$cardiovascular)*(DataCluster4$interval/25) 
cardiovascular_t1 = t(t.cardiovascular_1 )
cardiovascular_t2 = t(t.cardiovascular_2)
DataCluster4=cbind(DataCluster4,pneumonia_t1,pneumonia_t2,cardiovascular_t1,cardiovascular_t2)
head(DataCluster4)
  sex pneumonia diabetes copd asthma inmsupr hypertension cardiovascular
1   0         1        0    1      1       0            1              1
2   1         1        1    1      0       0            1              0
3   0         1        1    1      0       1            1              0
4   0         1        0    1      0       0            0              0
5   1         1        1    1      0       0            1              0
6   0         1        1    1      0       0            1              0
  obesity renal_chronic tobacco waktu status Age cluster interval pneumonia_t1
1       1             0       0     3      1   1       4        0            1
2       0             0       0     9      1   0       4        0            1
3       0             1       0     6      1   0       4        0            1
4       0             0       0     5      1   0       4        0            1
5       1             1       0     6      1   0       4        0            1
6       0             0       0     8      1   1       4        0            1
  pneumonia_t2 cardiovascular_t1 cardiovascular_t2
1            0                 1                 0
2            0                 0                 0
3            0                 0                 0
4            0                 0                 0
5            0                 0                 0
6            0                 0                 0

2. Model Extended Cox

Fit_Model_Cluster4<-coxph(Surv(waktu,status)~ sex+hypertension+obesity+tobacco+diabetes+renal_chronic+cardiovascular+copd+pneumonia_t1+pneumonia_t2+cardiovascular_t1+cardiovascular_t2+asthma+inmsupr, data = DataCluster4, method = "breslow")

summary(Fit_Model_Cluster4)
Call:
coxph(formula = Surv(waktu, status) ~ sex + hypertension + obesity + 
    tobacco + diabetes + renal_chronic + cardiovascular + copd + 
    pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 + cardiovascular_t2 + 
    asthma + inmsupr, data = DataCluster4, method = "breslow")

  n= 3113, number of events= 1977 

                      coef exp(coef) se(coef)       z Pr(>|z|)    
sex                0.01785   1.01801  0.04784   0.373   0.7091    
hypertension       0.07129   1.07390  0.04807   1.483   0.1380    
obesity           -0.13039   0.87775  0.05183  -2.516   0.0119 *  
tobacco            0.02973   1.03017  0.05552   0.535   0.5923    
diabetes           0.13633   1.14606  0.04711   2.894   0.0038 ** 
renal_chronic      0.15840   1.17163  0.07472   2.120   0.0340 *  
cardiovascular    -1.45769   0.23277  0.26222  -5.559 2.71e-08 ***
copd                    NA        NA  0.00000      NA       NA    
pneumonia_t1       1.45571   4.28751  0.05587  26.057  < 2e-16 ***
pneumonia_t2      -1.49436   0.22439  0.13177 -11.340  < 2e-16 ***
cardiovascular_t1  1.94364   6.98412  0.26923   7.219 5.22e-13 ***
cardiovascular_t2       NA        NA  0.00000      NA       NA    
asthma            -0.18013   0.83516  0.10455  -1.723   0.0849 .  
inmsupr            0.12309   1.13099  0.09396   1.310   0.1902    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  exp(coef) exp(-coef) lower .95 upper .95
sex                  1.0180     0.9823    0.9269    1.1181
hypertension         1.0739     0.9312    0.9773    1.1800
obesity              0.8778     1.1393    0.7930    0.9716
tobacco              1.0302     0.9707    0.9240    1.1486
diabetes             1.1461     0.8726    1.0450    1.2569
renal_chronic        1.1716     0.8535    1.0120    1.3564
cardiovascular       0.2328     4.2960    0.1392    0.3892
copd                     NA         NA        NA        NA
pneumonia_t1         4.2875     0.2332    3.8428    4.7836
pneumonia_t2         0.2244     4.4565    0.1733    0.2905
cardiovascular_t1    6.9841     0.1432    4.1205   11.8380
cardiovascular_t2        NA         NA        NA        NA
asthma               0.8352     1.1974    0.6804    1.0251
inmsupr              1.1310     0.8842    0.9408    1.3597

Concordance= 0.744  (se = 0.005 )
Likelihood ratio test= 1755  on 12 df,   p=<2e-16
Wald test            = 1295  on 12 df,   p=<2e-16
Score (logrank) test = 1800  on 12 df,   p=<2e-16

3. Model improvment

model.stepwise.cluster4<-step(object = Fit_Model_Cluster4,direction = "both", trace=T,data=DataCluster4)
Start:  AIC=28502.79
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco + 
    diabetes + renal_chronic + cardiovascular + copd + pneumonia_t1 + 
    pneumonia_t2 + cardiovascular_t1 + cardiovascular_t2 + asthma + 
    inmsupr


Step:  AIC=28502.79
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco + 
    diabetes + renal_chronic + cardiovascular + copd + pneumonia_t1 + 
    pneumonia_t2 + cardiovascular_t1 + asthma + inmsupr


Step:  AIC=28502.79
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco + 
    diabetes + renal_chronic + cardiovascular + pneumonia_t1 + 
    pneumonia_t2 + cardiovascular_t1 + asthma + inmsupr

                    Df   AIC
- sex                1 28501
- tobacco            1 28501
- inmsupr            1 28503
<none>                 28503
- hypertension       1 28503
- asthma             1 28504
- renal_chronic      1 28505
- obesity            1 28507
- diabetes           1 28509
- cardiovascular     1 28551
- cardiovascular_t1  1 28591
- pneumonia_t2       1 28690
- pneumonia_t1       1 29267

Step:  AIC=28500.93
Surv(waktu, status) ~ hypertension + obesity + tobacco + diabetes + 
    renal_chronic + cardiovascular + pneumonia_t1 + pneumonia_t2 + 
    cardiovascular_t1 + asthma + inmsupr

                    Df   AIC
- tobacco            1 28499
- inmsupr            1 28501
<none>                 28501
- hypertension       1 28501
- asthma             1 28502
+ sex                1 28503
- renal_chronic      1 28503
- obesity            1 28505
- diabetes           1 28507
- cardiovascular     1 28549
- cardiovascular_t1  1 28589
- pneumonia_t2       1 28688
- pneumonia_t1       1 29265

Step:  AIC=28499.13
Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic + 
    cardiovascular + pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 + 
    asthma + inmsupr

                    Df   AIC
- inmsupr            1 28499
<none>                 28499
- hypertension       1 28499
- asthma             1 28500
+ tobacco            1 28501
+ sex                1 28501
- renal_chronic      1 28502
- obesity            1 28503
- diabetes           1 28506
- cardiovascular     1 28547
- cardiovascular_t1  1 28587
- pneumonia_t2       1 28686
- pneumonia_t1       1 29265

Step:  AIC=28498.87
Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic + 
    cardiovascular + pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 + 
    asthma

                    Df   AIC
<none>                 28499
- hypertension       1 28499
+ inmsupr            1 28499
- asthma             1 28500
+ tobacco            1 28501
+ sex                1 28501
- renal_chronic      1 28502
- obesity            1 28503
- diabetes           1 28506
- cardiovascular     1 28546
- cardiovascular_t1  1 28587
- pneumonia_t2       1 28686
- pneumonia_t1       1 29265

4. Model Final

Model_Final_Cluster4<-coxph(Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic + 
    cardiovascular + pneumonia_t1 + pneumonia_t2 + cardiovascular_t1 + 
    asthma, data = DataCluster4, method = "breslow")

summary(Model_Final_Cluster4)
Call:
coxph(formula = Surv(waktu, status) ~ hypertension + obesity + 
    diabetes + renal_chronic + cardiovascular + pneumonia_t1 + 
    pneumonia_t2 + cardiovascular_t1 + asthma, data = DataCluster4, 
    method = "breslow")

  n= 3113, number of events= 1977 

                      coef exp(coef) se(coef)       z Pr(>|z|)    
hypertension       0.07019   1.07272  0.04800   1.462  0.14362    
obesity           -0.12785   0.87998  0.05153  -2.481  0.01309 *  
diabetes           0.14208   1.15267  0.04677   3.038  0.00238 ** 
renal_chronic      0.16555   1.18004  0.07451   2.222  0.02629 *  
cardiovascular    -1.44224   0.23640  0.26194  -5.506 3.67e-08 ***
pneumonia_t1       1.45612   4.28926  0.05582  26.084  < 2e-16 ***
pneumonia_t2      -1.49284   0.22473  0.13177 -11.329  < 2e-16 ***
cardiovascular_t1  1.94515   6.99470  0.26920   7.226 4.99e-13 ***
asthma            -0.18858   0.82814  0.10434  -1.807  0.07072 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                  exp(coef) exp(-coef) lower .95 upper .95
hypertension         1.0727     0.9322    0.9764    1.1785
obesity              0.8800     1.1364    0.7955    0.9735
diabetes             1.1527     0.8675    1.0517    1.2633
renal_chronic        1.1800     0.8474    1.0197    1.3656
cardiovascular       0.2364     4.2302    0.1415    0.3950
pneumonia_t1         4.2893     0.2331    3.8447    4.7852
pneumonia_t2         0.2247     4.4497    0.1736    0.2910
cardiovascular_t1    6.9947     0.1430    4.1269   11.8554
asthma               0.8281     1.2075    0.6750    1.0161

Concordance= 0.743  (se = 0.005 )
Likelihood ratio test= 1753  on 9 df,   p=<2e-16
Wald test            = 1293  on 9 df,   p=<2e-16
Score (logrank) test = 1798  on 9 df,   p=<2e-16

9.5 Cluster 5

DataCluster5<- CovidData %>% 
  filter(cluster == 5)

DataCluster5[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")] <- lapply(DataCluster5[,c("sex","pneumonia","pregnancy","diabetes","copd","asthma","inmsupr","hypertension","cardiovascular","obesity","renal_chronic","tobacco","status","Age")],as.factor)

summary(DataCluster5)
 sex      pneumonia pregnancy diabetes copd     asthma   inmsupr  hypertension
 0:1981   0:1632    0:3363    0:1948   0:3367   0:3270   0:3176   0:1248      
 1:1386   1:1735    1:   4    1:1419            1:  97   1: 191   1:2119      
                                                                              
                                                                              
                                                                              
                                                                              
 cardiovascular obesity  renal_chronic tobacco      waktu        status  
 1:3367         0:2319   0:2941        0:2998   Min.   :  1.00   0:1522  
                1:1048   1: 426        1: 369   1st Qu.:  8.00   1:1845  
                                                Median : 20.00           
                                                Mean   : 50.81           
                                                3rd Qu.:100.00           
                                                Max.   :100.00           
 Age      cluster 
 0:2260   1:   0  
 1:1107   2:   0  
          3:   0  
          4:   0  
          5:3367  
                  
DataCluster5<- CovidData %>% 
  filter(cluster == 5)%>% 
  select(-c("copd",'cardiovascular'))
head(DataCluster5)
  sex pneumonia pregnancy diabetes asthma inmsupr hypertension obesity
1   0         1         0        0      0       0            0       0
2   1         1         0        1      0       0            0       0
3   0         1         0        1      0       0            1       1
4   1         0         0        1      0       0            0       0
5   1         1         0        0      0       0            1       0
6   0         1         0        0      0       0            1       0
  renal_chronic tobacco waktu status Age cluster
1             0       1     2      1   0       5
2             0       0    52      1   0       5
3             0       0     5      1   0       5
4             1       0     8      1   0       5
5             1       0     5      1   0       5
6             1       0     4      1   1       5

1. Uji Proportional Hazard

b. Pendekatan Grafik

fit<-survfit(Surv(waktu,status)~ sex, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Jenis Kelamin",
          legend.labs = c("Laki-Laki", "Perempuan")
          )

Variabel pneumonia
fit<-survfit(Surv(waktu,status)~ pneumonia, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Pneumonia",
          legend.labs = c("no Pneumonia", "yes pneumonia")
          )

Variabel diabetes
fit<-survfit(Surv(waktu,status)~ diabetes, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Diabetes",
          legend.labs = c("no Diabetes", "yes Diabetes")
          )

Variabel Hypertension
fit<-survfit(Surv(waktu,status)~ hypertension, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Hyperetension",
          legend.labs = c("no hypertension", "yes hypertension")
          )

Variabel obesity
fit<-survfit(Surv(waktu,status)~ obesity, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Obesity",
          legend.labs = c("no obesity", "yes obesity")
          )

Variabel renal chronic
fit<-survfit(Surv(waktu,status)~ renal_chronic, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Renal Chronic",
          legend.labs = c("no renal chronic", "yes renal chronic")
          )

Variabel tobacco
fit<-survfit(Surv(waktu,status)~ tobacco, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Tobacco",
          legend.labs = c("no tobacco", "yes tobacco")
          )

Variabel Age
fit<-survfit(Surv(waktu,status)~ Age, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "Age",
          legend.labs = c("old", "young")
          )

Variabel inmsupr
fit<-survfit(Surv(waktu,status)~ inmsupr, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "inmsupr",
          legend.labs = c("no inmsupr", "yes inmsupr")
          )

Variabel asthma
fit<-survfit(Surv(waktu,status)~ asthma, data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, fun = "cumhaz",
           legend = "top",
          legend.title = "asthma",
          legend.labs = c("no asthma", "yes asthma")
          )

b. Pendekatan GOF

fit_cluster5<-coxph(Surv(waktu,status)~ sex+pneumonia+Age+hypertension+obesity+tobacco+diabetes+renal_chronic+inmsupr+asthma+pregnancy, data = DataCluster5)

cek_fitt_all5<-cox.zph(fit_cluster5,  transform="km", global=TRUE)

print(cek_fitt_all5)
               chisq df       p
sex            5.670  1   0.017
pneumonia     62.849  1 2.2e-15
Age           17.650  1 2.7e-05
hypertension   2.194  1   0.139
obesity        1.014  1   0.314
tobacco        3.506  1   0.061
diabetes       4.965  1   0.026
renal_chronic  0.735  1   0.391
inmsupr        1.134  1   0.287
asthma         1.507  1   0.220
pregnancy      0.439  1   0.508
GLOBAL        85.547 11 1.2e-13
# VARIABEL PNEUMONIA
fit<-survfit(Surv(waktu,status)~ pneumonia , data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

Perhatikan kecepatan grafik pasien yang memiliki pneumonia turun cepat pada hari ke 25 namun setelah hari ke 25 konstan sampai akhir penelitian. sehinggan fungsi heavsidenya adalah

\[ g(t)= \begin{cases} 1, & \mbox{jika}\ t\mbox{ < 25} \\ 0, & \mbox{jika}\ t\mbox{ >= 25} \end{cases}\]

# VARIABEL Age
fit<-survfit(Surv(waktu,status)~ Age , data=DataCluster5)
ggsurvplot(fit, data = DataCluster5, 
pval = TRUE, conf.int = TRUE,
 risk.table = TRUE, # Add risk table
 risk.table.col = "strata", # Change risk table color by groups
 linetype = "strata", # Change line type by groups
 surv.median.line = "hv", # Specify median survival
 ggtheme = theme_bw(), # Change ggplot2 theme
 palette = c("#E7B800", "#2E9FDF"))

DataCluster5= DataCluster5 %>%
        mutate(interval = ifelse(waktu<= 25, 0, 25))
head(DataCluster5)
  sex pneumonia pregnancy diabetes asthma inmsupr hypertension obesity
1   0         1         0        0      0       0            0       0
2   1         1         0        1      0       0            0       0
3   0         1         0        1      0       0            1       1
4   1         0         0        1      0       0            0       0
5   1         1         0        0      0       0            1       0
6   0         1         0        0      0       0            1       0
  renal_chronic tobacco waktu status Age cluster interval
1             0       1     2      1   0       5        0
2             0       0    52      1   0       5       25
3             0       0     5      1   0       5        0
4             1       0     8      1   0       5        0
5             1       0     5      1   0       5        0
6             1       0     4      1   1       5        0
t.pneumonia_1 = t(DataCluster5$pneumonia)*(1-(DataCluster5$interval)/25) 
t.pneumonia_2 = t(DataCluster5$pneumonia)*(DataCluster5$interval/25) 
pneumonia_t1 = t(t.pneumonia_1 )
pneumonia_t2 = t(t.pneumonia_2)
t.Age_1 = t(DataCluster5$Age)*(1-(DataCluster5$interval)/25) 
t.Age_2 = t(DataCluster5$Age)*(DataCluster5$interval/25) 
Age_t1 = t(t.Age_1 )
Age_t2 = t(t.Age_2)
DataCluster5=cbind(DataCluster5,pneumonia_t1,pneumonia_t2,Age_t1,Age_t2)
head(DataCluster5)
  sex pneumonia pregnancy diabetes asthma inmsupr hypertension obesity
1   0         1         0        0      0       0            0       0
2   1         1         0        1      0       0            0       0
3   0         1         0        1      0       0            1       1
4   1         0         0        1      0       0            0       0
5   1         1         0        0      0       0            1       0
6   0         1         0        0      0       0            1       0
  renal_chronic tobacco waktu status Age cluster interval pneumonia_t1
1             0       1     2      1   0       5        0            1
2             0       0    52      1   0       5       25            0
3             0       0     5      1   0       5        0            1
4             1       0     8      1   0       5        0            0
5             1       0     5      1   0       5        0            1
6             1       0     4      1   1       5        0            1
  pneumonia_t2 Age_t1 Age_t2
1            0      0      0
2            1      0      0
3            0      0      0
4            0      0      0
5            0      0      0
6            0      1      0

2. Model extended cox wit breslow method

Fit_Model_Cluster5<-coxph(Surv(waktu,status)~ sex+hypertension+obesity+tobacco+diabetes+renal_chronic+pregnancy+pneumonia_t1+pneumonia_t2+Age_t1+Age_t2+asthma+inmsupr, data = DataCluster5, method = "breslow")

summary(Fit_Model_Cluster5)
Call:
coxph(formula = Surv(waktu, status) ~ sex + hypertension + obesity + 
    tobacco + diabetes + renal_chronic + pregnancy + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2 + asthma + inmsupr, data = DataCluster5, 
    method = "breslow")

  n= 3367, number of events= 1845 

                   coef exp(coef)  se(coef)       z Pr(>|z|)    
sex           -0.022076  0.978166  0.049402  -0.447    0.655    
hypertension  -0.024265  0.976027  0.053456  -0.454    0.650    
obesity       -0.048034  0.953102  0.051414  -0.934    0.350    
tobacco        0.003548  1.003555  0.073564   0.048    0.962    
diabetes       0.031888  1.032402  0.048972   0.651    0.515    
renal_chronic  0.286411  1.331640  0.064813   4.419 9.91e-06 ***
pregnancy      0.367627  1.444303  1.003104   0.366    0.714    
pneumonia_t1   1.479824  4.392174  0.059950  24.684  < 2e-16 ***
pneumonia_t2  -1.364407  0.255532  0.140013  -9.745  < 2e-16 ***
Age_t1         0.357100  1.429179  0.062935   5.674 1.39e-08 ***
Age_t2        -3.380664  0.034025  0.254798 -13.268  < 2e-16 ***
asthma        -0.241937  0.785106  0.150304  -1.610    0.107    
inmsupr        0.091748  1.096089  0.093032   0.986    0.324    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

              exp(coef) exp(-coef) lower .95 upper .95
sex             0.97817     1.0223   0.88790   1.07761
hypertension    0.97603     1.0246   0.87894   1.08384
obesity         0.95310     1.0492   0.86174   1.05415
tobacco         1.00355     0.9965   0.86881   1.15920
diabetes        1.03240     0.9686   0.93791   1.13641
renal_chronic   1.33164     0.7510   1.17278   1.51201
pregnancy       1.44430     0.6924   0.20222  10.31578
pneumonia_t1    4.39217     0.2277   3.90526   4.93979
pneumonia_t2    0.25553     3.9134   0.19421   0.33622
Age_t1          1.42918     0.6997   1.26333   1.61680
Age_t2          0.03402    29.3903   0.02065   0.05606
asthma          0.78511     1.2737   0.58477   1.05406
inmsupr         1.09609     0.9123   0.91339   1.31533

Concordance= 0.799  (se = 0.005 )
Likelihood ratio test= 2643  on 13 df,   p=<2e-16
Wald test            = 1300  on 13 df,   p=<2e-16
Score (logrank) test = 2582  on 13 df,   p=<2e-16

3. Model Final

model.stepwise.cluster5<-step(object = Fit_Model_Cluster5,direction = "both", trace=T,data=DataCluster5)
Start:  AIC=26144.92
Surv(waktu, status) ~ sex + hypertension + obesity + tobacco + 
    diabetes + renal_chronic + pregnancy + pneumonia_t1 + pneumonia_t2 + 
    Age_t1 + Age_t2 + asthma + inmsupr

                Df   AIC
- tobacco        1 26143
- pregnancy      1 26143
- sex            1 26143
- hypertension   1 26143
- diabetes       1 26143
- obesity        1 26144
- inmsupr        1 26144
<none>             26145
- asthma         1 26146
- renal_chronic  1 26161
- Age_t1         1 26173
- pneumonia_t2   1 26277
- Age_t2         1 26669
- pneumonia_t1   1 26852

Step:  AIC=26142.92
Surv(waktu, status) ~ sex + hypertension + obesity + diabetes + 
    renal_chronic + pregnancy + pneumonia_t1 + pneumonia_t2 + 
    Age_t1 + Age_t2 + asthma + inmsupr

                Df   AIC
- pregnancy      1 26141
- hypertension   1 26141
- sex            1 26141
- diabetes       1 26141
- obesity        1 26142
- inmsupr        1 26142
<none>             26143
- asthma         1 26144
+ tobacco        1 26145
- renal_chronic  1 26159
- Age_t1         1 26171
- pneumonia_t2   1 26275
- Age_t2         1 26667
- pneumonia_t1   1 26850

Step:  AIC=26141.04
Surv(waktu, status) ~ sex + hypertension + obesity + diabetes + 
    renal_chronic + pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2 + 
    asthma + inmsupr

                Df   AIC
- sex            1 26139
- hypertension   1 26139
- diabetes       1 26140
- obesity        1 26140
- inmsupr        1 26140
<none>             26141
- asthma         1 26142
+ pregnancy      1 26143
+ tobacco        1 26143
- renal_chronic  1 26158
- Age_t1         1 26170
- pneumonia_t2   1 26273
- Age_t2         1 26665
- pneumonia_t1   1 26849

Step:  AIC=26139.24
Surv(waktu, status) ~ hypertension + obesity + diabetes + renal_chronic + 
    pneumonia_t1 + pneumonia_t2 + Age_t1 + Age_t2 + asthma + 
    inmsupr

                Df   AIC
- hypertension   1 26138
- diabetes       1 26138
- inmsupr        1 26138
- obesity        1 26138
<none>             26139
- asthma         1 26140
+ sex            1 26141
+ pregnancy      1 26141
+ tobacco        1 26141
- renal_chronic  1 26156
- Age_t1         1 26168
- pneumonia_t2   1 26272
- Age_t2         1 26665
- pneumonia_t1   1 26847

Step:  AIC=26137.47
Surv(waktu, status) ~ obesity + diabetes + renal_chronic + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2 + asthma + inmsupr

                Df   AIC
- diabetes       1 26136
- inmsupr        1 26136
- obesity        1 26137
<none>             26138
- asthma         1 26138
+ hypertension   1 26139
+ sex            1 26139
+ pregnancy      1 26139
+ tobacco        1 26140
- renal_chronic  1 26154
- Age_t1         1 26167
- pneumonia_t2   1 26270
- Age_t2         1 26674
- pneumonia_t1   1 26845

Step:  AIC=26135.76
Surv(waktu, status) ~ obesity + renal_chronic + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2 + asthma + inmsupr

                Df   AIC
- inmsupr        1 26135
- obesity        1 26135
<none>             26136
- asthma         1 26137
+ diabetes       1 26138
+ sex            1 26138
+ hypertension   1 26138
+ pregnancy      1 26138
+ tobacco        1 26138
- renal_chronic  1 26153
- Age_t1         1 26165
- pneumonia_t2   1 26268
- Age_t2         1 26685
- pneumonia_t1   1 26844

Step:  AIC=26134.63
Surv(waktu, status) ~ obesity + renal_chronic + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2 + asthma

                Df   AIC
- obesity        1 26134
<none>             26135
- asthma         1 26136
+ inmsupr        1 26136
+ diabetes       1 26136
+ hypertension   1 26137
+ pregnancy      1 26137
+ sex            1 26137
+ tobacco        1 26137
- renal_chronic  1 26154
- Age_t1         1 26165
- pneumonia_t2   1 26268
- Age_t2         1 26685
- pneumonia_t1   1 26843

Step:  AIC=26133.59
Surv(waktu, status) ~ renal_chronic + pneumonia_t1 + pneumonia_t2 + 
    Age_t1 + Age_t2 + asthma

                Df   AIC
<none>             26134
+ obesity        1 26135
+ inmsupr        1 26135
- asthma         1 26135
+ hypertension   1 26135
+ diabetes       1 26135
+ sex            1 26135
+ pregnancy      1 26136
+ tobacco        1 26136
- renal_chronic  1 26153
- Age_t1         1 26163
- pneumonia_t2   1 26267
- Age_t2         1 26684
- pneumonia_t1   1 26841

4. Model Final

Model_Final_Cluster5<-coxph(Surv(waktu, status) ~ renal_chronic + pneumonia_t1 + pneumonia_t2 + 
    Age_t1 + Age_t2 + asthma, data = DataCluster5, method = "breslow")

summary(Model_Final_Cluster5)
Call:
coxph(formula = Surv(waktu, status) ~ renal_chronic + pneumonia_t1 + 
    pneumonia_t2 + Age_t1 + Age_t2 + asthma, data = DataCluster5, 
    method = "breslow")

  n= 3367, number of events= 1845 

                  coef exp(coef) se(coef)       z Pr(>|z|)    
renal_chronic  0.29806   1.34725  0.06260   4.761 1.92e-06 ***
pneumonia_t1   1.47917   4.38929  0.05994  24.677  < 2e-16 ***
pneumonia_t2  -1.36687   0.25490  0.13996  -9.766  < 2e-16 ***
Age_t1         0.35894   1.43180  0.06167   5.820 5.89e-09 ***
Age_t2        -3.38137   0.03400  0.25382 -13.322  < 2e-16 ***
asthma        -0.25712   0.77328  0.14957  -1.719   0.0856 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

              exp(coef) exp(-coef) lower .95 upper .95
renal_chronic    1.3472     0.7423   1.19169   1.52311
pneumonia_t1     4.3893     0.2278   3.90277   4.93646
pneumonia_t2     0.2549     3.9230   0.19375   0.33536
Age_t1           1.4318     0.6984   1.26878   1.61578
Age_t2           0.0340    29.4110   0.02067   0.05592
asthma           0.7733     1.2932   0.57679   1.03670

Concordance= 0.797  (se = 0.005 )
Likelihood ratio test= 2641  on 6 df,   p=<2e-16
Wald test            = 1297  on 6 df,   p=<2e-16
Score (logrank) test = 2579  on 6 df,   p=<2e-16