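Cox Proportional Hazards (CoxPH) with H2O
[Reference 1] http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/coxph.html
The Cox model is a statistical model used in survival analysis. It is semiparametric: the effect of the covariates on the hazard is estimated through regression coefficients, while the baseline hazard is left unspecified, and the survival function is estimated from the two together.
The name "proportional hazards" comes from the model's basic assumption that the hazard ratio associated with a covariate stays constant, regardless of time.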
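The proportional hazards assumption described above can be written out explicitly. The notation below (baseline hazard $h_0$, coefficient vector $\beta$, covariate vectors $x_a$ and $x_b$) is the usual textbook notation and is an assumption of this sketch, not something taken from the original notes.

```latex
% Hazard for a subject with covariates x: baseline hazard times a covariate-dependent factor
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Hazard ratio between two subjects a and b: the baseline hazard cancels,
% so the ratio does not depend on time t
\frac{h(t \mid x_a)}{h(t \mid x_b)}
  = \frac{h_0(t)\,\exp\!\left(\beta^{\top} x_a\right)}{h_0(t)\,\exp\!\left(\beta^{\top} x_b\right)}
  = \exp\!\left(\beta^{\top}(x_a - x_b)\right)
```

Because $h_0(t)$ cancels, the model never needs a parametric form for the baseline hazard, which is what makes the method semiparametric.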
# Load the h2o package and connect to (or start) a local H2O cluster:
library(h2o)
h2o.init()
## Warning: package 'h2o' was built under R version 4.0.3
## Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 4 hours 29 minutes
## H2O cluster timezone: Asia/Seoul
## H2O data parsing timezone: UTC
## H2O cluster version: 3.32.0.1
## H2O cluster version age: 25 days
## H2O cluster name: H2O_started_from_R_user_uho906
## H2O cluster total nodes: 1
## H2O cluster total memory: 3.97 GB
## H2O cluster total cores: 4
## H2O cluster allowed cores: 4
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## H2O API Extensions: Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4
## R Version: R version 4.0.2 (2020-06-22)
# Import the heart dataset into H2O:
heart <- h2o.importFile("http://s3.amazonaws.com/h2o-public-test-data/smalldata/coxph_test/heart.csv")
##   start stop event        age      year surgery transplant id
## 1     0   50     1 -17.155373 0.1232033       0          0  1
## 2     0    6     1   3.835729 0.2546201       0          0  2
## 3     0    1     0   6.297057 0.2655715       0          0  3
## 4     1   16     1   6.297057 0.2655715       0          1  3
## 5     0   36     0  -7.737166 0.4900753       0          0  4
## 6    36   39     1  -7.737166 0.4900753       0          1  4
##
## [172 rows x 8 columns]
# Split the dataset into a train and test set:
heart_split <- h2o.splitFrame(data = heart, ratios = 0.8, seed = 1234)
train <- heart_split[[1]]
test <- heart_split[[2]]
# Build and train the model:
heart_coxph <- h2o.coxph(x = "age",
                         event_column = "event",
                         start_column = "start",
                         stop_column = "stop",
                         ties = "breslow",
                         training_frame = train)
## Model Details:
## ==============
##
## H2OCoxPHModel: coxph
## Model ID: CoxPH_model_R_1604363756625_4
## Call:
## Surv(start, stop, event) ~ age
##
##        coef exp(coef) se(coef)    z     p
## age 0.02257   1.02282  0.01514 1.49 0.136
##
## Likelihood ratio test=2.42 on 1 df, p=0.1196
## n= 138, number of events= 59
## H2OCoxPHMetrics: coxph
## ** Reported on training data. **
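To read the coefficient table above, it can help to recompute its derived columns by hand. The following is a minimal base R sketch that uses only the rounded numbers printed in the summary, so the results agree with the output only up to rounding. The variable names are illustrative, and treating one unit of age as one year is an assumption about the dataset.

```r
# Values copied from the printed coefficient table (rounded).
coef_age <- 0.02257   # estimated log hazard ratio per unit of age
se_age   <- 0.01514   # standard error of the coefficient

# Hazard ratio for a one-unit increase in age; this is the exp(coef) column.
hr_1yr <- exp(coef_age)

# Hazard ratio for a ten-unit increase, since the model is log-linear in age.
hr_10yr <- exp(10 * coef_age)

# Wald z statistic and its two-sided p-value (the z and p columns).
z_age <- coef_age / se_age
p_age <- 2 * pnorm(-abs(z_age))

# Approximate 95% Wald confidence interval for the one-unit hazard ratio.
ci_95 <- exp(coef_age + c(-1, 1) * qnorm(0.975) * se_age)

# p-value of the likelihood ratio test: chi-square statistic 2.42 on 1 df.
p_lrt <- pchisq(2.42, df = 1, lower.tail = FALSE)

print(round(c(hr_1yr = hr_1yr, hr_10yr = hr_10yr, z = z_age,
              p = p_age, p_lrt = p_lrt), 4))
print(round(ci_95, 4))
```

The confidence interval contains 1, which matches the non-significant p-values in the summary: with this training split, age alone does not show a clear effect on the hazard.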
# Generate predictions on a test set (if necessary):
predict <- h2o.predict(heart_coxph, newdata = test)
##           lp
## 1 0.20941803
## 2 0.13206056
## 3 0.06687515
## 4 0.21565853
## 5 0.21565853
## 6 0.20299217
##
## [34 rows x 1 column]
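The `lp` column is the linear predictor of the model, so it is easier to interpret as a comparison between rows than as an absolute number. The sketch below is a base R illustration that copies the six printed values and the rounded age coefficient from the model summary. It assumes nothing about whether H2O centers the linear predictor, because any constant offset cancels when two rows are subtracted. The variable names are illustrative.

```r
# Linear predictors copied from the first six rows of the prediction output.
lp <- c(0.20941803, 0.13206056, 0.06687515,
        0.21565853, 0.21565853, 0.20299217)

# Rounded age coefficient from the fitted model summary.
coef_age <- 0.02257

# Hazard of each row relative to row 3, the lowest lp among the rows shown.
# exp(lp_i - lp_j) is the hazard ratio between rows i and j.
rel_hazard <- exp(lp - lp[3])

# With a single covariate, an lp difference maps back to an age difference.
age_diff <- (lp - lp[3]) / coef_age

print(data.frame(lp = lp,
                 rel_hazard = round(rel_hazard, 3),
                 age_diff = round(age_diff, 2)))
```

Rows 4 and 5 have identical `lp` values, which is what one would expect if they are two (start, stop] intervals of the same subject: age is the only covariate in the model and does not change between intervals.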