Heart disease is one of the leading causes of death worldwide. The risk of heart disease is influenced by factors such as age, blood pressure, cholesterol level, and an individual's clinical condition. Statistical analysis is therefore needed to identify the main risk factors as a basis for prevention and early detection.
This study aims to analyze risk factors for heart disease using logistic regression.
The data come from the Heart Disease UCI Dataset on Kaggle. The dataset contains patient health indicators; the summary below covers 297 observations of 13 predictors plus the binary outcome condition.
## age sex cp trestbps
## Min. :29.00 Min. :0.0000 Min. :0.000 Min. : 94.0
## 1st Qu.:48.00 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:120.0
## Median :56.00 Median :1.0000 Median :2.000 Median :130.0
## Mean :54.54 Mean :0.6768 Mean :2.158 Mean :131.7
## 3rd Qu.:61.00 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:140.0
## Max. :77.00 Max. :1.0000 Max. :3.000 Max. :200.0
## chol fbs restecg thalach
## Min. :126.0 Min. :0.0000 Min. :0.0000 Min. : 71.0
## 1st Qu.:211.0 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:133.0
## Median :243.0 Median :0.0000 Median :1.0000 Median :153.0
## Mean :247.4 Mean :0.1448 Mean :0.9966 Mean :149.6
## 3rd Qu.:276.0 3rd Qu.:0.0000 3rd Qu.:2.0000 3rd Qu.:166.0
## Max. :564.0 Max. :1.0000 Max. :2.0000 Max. :202.0
##      exang          oldpeak              slope              ca
##  Min.   :0.0000   Length:297         Min.   :0.0000   Min.   :0.0000
##  1st Qu.:0.0000   Class :character   1st Qu.:0.0000   1st Qu.:0.0000
##  Median :0.0000   Mode  :character   Median :1.0000   Median :0.0000
##  Mean   :0.3266                      Mean   :0.6027   Mean   :0.6768
##  3rd Qu.:1.0000                      3rd Qu.:1.0000   3rd Qu.:1.0000
##  Max.   :1.0000                      Max.   :2.0000   Max.   :3.0000
## thal condition
## Min. :0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.000 Median :0.0000
## Mean :0.835 Mean :0.4613
## 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :2.000 Max. :1.0000
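The summary above reports oldpeak with Class :character instead of numeric quantiles, which suggests the column was read as text. A minimal sketch of how such a column might be detected and converted follows. It uses a small hypothetical data frame `toy`, not the study data.

```r
# Hypothetical toy data: oldpeak stored as text, as the summary suggests
toy <- data.frame(
  age = c(63, 45, 58),
  oldpeak = c("2.3", "0", "1.4"),
  stringsAsFactors = FALSE
)

# Inspect the storage class of every column
col_classes <- sapply(toy, class)
print(col_classes)

# Convert to numeric; entries that cannot be parsed would become NA
oldpeak_num <- suppressWarnings(as.numeric(toy$oldpeak))
n_failed <- sum(is.na(oldpeak_num) & !is.na(toy$oldpeak))
toy$oldpeak <- oldpeak_num
print(n_failed)
```

Counting the entries that fail to parse before overwriting the column helps catch stray text values that would otherwise silently turn into missing data.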
Descriptive Statistics
mean_age <- round(mean(data$age, na.rm=TRUE),2)
mean_bp <- round(mean(data$trestbps, na.rm=TRUE),2)
mean_chol <- round(mean(data$chol, na.rm=TRUE),2)
mean_age
## [1] 54.54
mean_bp
## [1] 131.69
mean_chol
## [1] 247.35
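# Logistic regression of disease status (condition) on clinical predictors.
# Note: oldpeak is stored as character at this point, so glm() treats it as
# categorical and estimates one coefficient per distinct value.
model <- glm(condition ~ age + sex + cp + trestbps + chol + thalach + oldpeak,
             data = data,
             family = binomial)
summary(model)
##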
## Call:
## glm(formula = condition ~ age + sex + cp + trestbps + chol +
## thalach + oldpeak, family = binomial, data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.244e+00 2.533e+00 -2.465 0.013711 *
## age 2.599e-02 2.409e-02 1.079 0.280777
## sex 2.480e+00 4.786e-01 5.181 2.21e-07 ***
## cp 1.128e+00 2.149e-01 5.251 1.52e-07 ***
## trestbps 3.069e-02 1.135e-02 2.705 0.006840 **
## chol 7.886e-03 3.694e-03 2.135 0.032750 *
## thalach -3.896e-02 1.062e-02 -3.668 0.000245 ***
## oldpeak0.1 -7.706e-01 1.028e+00 -0.750 0.453408
## oldpeak0.2 -4.302e-01 9.004e-01 -0.478 0.632780
## oldpeak0.3 9.884e-01 1.398e+00 0.707 0.479508
## oldpeak0.4 -2.284e+00 1.199e+00 -1.904 0.056866 .
## oldpeak0.5 -1.544e+00 1.203e+00 -1.283 0.199530
## oldpeak0.6 -2.418e-01 7.848e-01 -0.308 0.758015
## oldpeak0.7 -1.213e+01 6.523e+03 -0.002 0.998516
## oldpeak0.8 5.330e-01 7.912e-01 0.674 0.500574
## oldpeak0.9 1.802e+00 3.401e+00 0.530 0.596104
## oldpeak1 8.647e-01 8.729e-01 0.991 0.321849
## oldpeak1.1 -1.680e+01 3.760e+03 -0.004 0.996434
## oldpeak1.2 3.746e-01 6.921e-01 0.541 0.588332
## oldpeak1.3 -1.556e+01 6.523e+03 -0.002 0.998097
## oldpeak1.4 1.822e+00 9.080e-01 2.007 0.044740 *
## oldpeak1.5 -3.091e+00 1.713e+00 -1.804 0.071205 .
## oldpeak1.6 -7.068e-01 8.468e-01 -0.835 0.403879
## oldpeak1.8 1.658e+00 1.127e+00 1.472 0.141049
## oldpeak1.9 4.864e-01 1.232e+00 0.395 0.692994
## oldpeak2 9.620e-01 9.883e-01 0.973 0.330353
## oldpeak2.1 1.608e+01 6.523e+03 0.002 0.998033
## oldpeak2.2 1.698e+01 2.992e+03 0.006 0.995470
## oldpeak2.3 -2.006e+01 3.612e+03 -0.006 0.995568
## oldpeak2.4 -2.934e-01 1.675e+00 -0.175 0.860979
## oldpeak2.5 1.897e+01 4.222e+03 0.004 0.996415
## oldpeak2.6 2.576e+00 1.380e+00 1.867 0.061971 .
## oldpeak2.8 1.756e+01 2.480e+03 0.007 0.994350
## oldpeak2.9 1.559e+01 6.523e+03 0.002 0.998093
## oldpeak3 1.088e+00 1.221e+00 0.891 0.373027
## oldpeak3.1 1.771e+01 6.523e+03 0.003 0.997834
## oldpeak3.2 1.850e+01 3.924e+03 0.005 0.996238
## oldpeak3.4 1.623e+01 3.639e+03 0.004 0.996442
## oldpeak3.5 -1.670e+01 6.523e+03 -0.003 0.997958
## oldpeak3.6 1.836e+01 2.947e+03 0.006 0.995030
## oldpeak3.8 2.293e+01 6.523e+03 0.004 0.997195
## oldpeak4 1.738e+01 3.683e+03 0.005 0.996235
## oldpeak4.2 -1.276e+00 1.956e+00 -0.652 0.514161
## oldpeak4.4 1.689e+01 6.523e+03 0.003 0.997934
## oldpeak5.6 1.583e+01 6.523e+03 0.002 0.998063
## oldpeak6.2 1.926e+01 6.523e+03 0.003 0.997644
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 409.95 on 296 degrees of freedom
## Residual deviance: 211.28 on 251 degrees of freedom
## AIC: 303.28
##
## Number of Fisher Scoring iterations: 17
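The 39 oldpeak rows in the summary arise because glm() converts a character predictor into a factor with one dummy variable per level. The sketch below uses simulated data (an assumption for illustration, not the study data; the names `sim`, `x_chr`, and `x_num` are hypothetical) to contrast a character-coded and a numeric-coded version of the same predictor.

```r
set.seed(1)
n <- 200
x <- round(runif(n, 0, 4), 1)             # simulated numeric predictor
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))   # simulated binary outcome
sim <- data.frame(y = y, x_chr = as.character(x), x_num = x,
                  stringsAsFactors = FALSE)

# Character predictor: intercept plus one dummy per level beyond the first.
# Sparse levels may trigger convergence warnings, which are silenced here.
fit_chr <- suppressWarnings(glm(y ~ x_chr, data = sim, family = binomial))

# Numeric predictor: intercept plus a single slope
fit_num <- glm(y ~ x_num, data = sim, family = binomial)

n_coef_chr <- length(coef(fit_chr))
n_coef_num <- length(coef(fit_num))
cat("character coding:", n_coef_chr, "coefficients\n")
cat("numeric coding:  ", n_coef_num, "coefficients\n")
```

For the study data, the analogous step would be converting oldpeak with as.numeric() before fitting. The resulting estimates would differ from those printed above and would have to be recomputed.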
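A positive coefficient indicates higher log-odds of heart disease (increased risk), whereas a negative coefficient indicates lower log-odds (decreased risk). Several oldpeak levels have estimates between roughly ±12 and ±23 with standard errors in the thousands; this pattern is typical of levels with very few observations and these estimates should not be interpreted.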
## (Intercept) age sex cp trestbps chol
## -6.2444 0.0260 2.4797 1.1281 0.0307 0.0079
## thalach oldpeak0.1 oldpeak0.2 oldpeak0.3 oldpeak0.4 oldpeak0.5
## -0.0390 -0.7706 -0.4302 0.9884 -2.2836 -1.5436
## oldpeak0.6 oldpeak0.7 oldpeak0.8 oldpeak0.9 oldpeak1 oldpeak1.1
## -0.2418 -12.1315 0.5330 1.8024 0.8647 -16.8041
## oldpeak1.2 oldpeak1.3 oldpeak1.4 oldpeak1.5 oldpeak1.6 oldpeak1.8
## 0.3746 -15.5603 1.8225 -3.0908 -0.7068 1.6584
## oldpeak1.9 oldpeak2 oldpeak2.1 oldpeak2.2 oldpeak2.3 oldpeak2.4
## 0.4864 0.9620 16.0791 16.9850 -20.0645 -0.2934
## oldpeak2.5 oldpeak2.6 oldpeak2.8 oldpeak2.9 oldpeak3 oldpeak3.1
## 18.9677 2.5763 17.5631 15.5863 1.0879 17.7095
## oldpeak3.2 oldpeak3.4 oldpeak3.5 oldpeak3.6 oldpeak3.8 oldpeak4
## 18.5047 16.2303 -16.6958 18.3584 22.9285 17.3791
## oldpeak4.2 oldpeak4.4 oldpeak5.6 oldpeak6.2
## -1.2759 16.8863 15.8331 19.2595
## (Intercept) age sex cp trestbps chol
## 1.941285e-03 1.026329e+00 1.193759e+01 3.089742e+00 1.031169e+00 1.007917e+00
## thalach oldpeak0.1 oldpeak0.2 oldpeak0.3 oldpeak0.4 oldpeak0.5
## 9.617920e-01 4.627414e-01 6.503639e-01 2.686968e+00 1.019134e-01 2.136091e-01
## oldpeak0.6 oldpeak0.7 oldpeak0.8 oldpeak0.9 oldpeak1 oldpeak1.1
## 7.852295e-01 5.387179e-06 1.703989e+00 6.063953e+00 2.374397e+00 5.035836e-08
## oldpeak1.2 oldpeak1.3 oldpeak1.4 oldpeak1.5 oldpeak1.6 oldpeak1.8
## 1.454417e+00 1.746904e-07 6.187071e+00 4.546372e-02 4.932005e-01 5.250644e+00
## oldpeak1.9 oldpeak2 oldpeak2.1 oldpeak2.2 oldpeak2.3 oldpeak2.4
## 1.626444e+00 2.616879e+00 9.617663e+06 2.379437e+07 1.932454e-09 7.457257e-01
## oldpeak2.5 oldpeak2.6 oldpeak2.8 oldpeak2.9 oldpeak3 oldpeak3.1
## 1.728012e+08 1.314810e+01 4.241947e+07 5.875643e+06 2.967992e+00 4.910872e+07
## oldpeak3.2 oldpeak3.4 oldpeak3.5 oldpeak3.6 oldpeak3.8 oldpeak4
## 1.087651e+08 1.118730e+07 5.611943e-08 9.395865e+07 9.072578e+09 3.528833e+07
## oldpeak4.2 oldpeak4.4 oldpeak5.6 oldpeak6.2
## 2.791813e-01 2.155992e+07 7.519825e+06 2.313579e+08
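The second table appears to contain the exponentiated coefficients, that is, odds ratios (for example, exp(0.0260) is about 1.026 for age). A minimal sketch that reproduces a few of them from the rounded coefficients printed above follows. The 10-unit rescaling for trestbps is an added illustration and not part of the original output.

```r
# Coefficients copied (rounded) from the output above
coefs <- c(age = 0.0260, sex = 2.4797, cp = 1.1281,
           trestbps = 0.0307, chol = 0.0079, thalach = -0.0390)

# Odds ratio per one-unit increase in each predictor
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))

# Odds ratio for a 10-unit increase in trestbps
or_trestbps_10 <- exp(10 * coefs[["trestbps"]])
print(round(or_trestbps_10, 2))
```

Rescaling to a larger step is often easier to read for predictors such as trestbps, where a one-unit change is clinically small. The extreme ratios for some oldpeak levels (for example 9.07e+09) come from the unstable estimates and are left out here.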
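# Predicted probabilities on the training data, classified at a 0.5 cutoff
prob <- predict(model, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)
# Confusion matrix: rows = predicted class, columns = actual class
cm <- table(
  factor(pred, levels = c(0, 1)),
  factor(data$condition, levels = c(0, 1))
)
cm
##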
## 0 1
## 0 143 24
## 1 17 113
6.2 Accuracy
## [1] 0.8619529
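The code that produced this accuracy is not shown in the output. One way to obtain it, sketched here with the confusion matrix counts re-entered by hand, is to take the share of observations on the diagonal:

```r
# Confusion matrix re-entered from the output above
# (rows = predicted class, columns = actual class)
cm <- matrix(c(143, 17, 24, 113), nrow = 2,
             dimnames = list(predicted = c("0", "1"),
                             actual = c("0", "1")))

# Accuracy = correctly classified / all observations
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

With these counts the calculation is (143 + 113) / 297, which matches the value printed above.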
6.3 Sensitivity and Specificity
sensitivity <- cm[2,2] / (cm[2,2] + cm[1,2])
specificity <- cm[1,1] / (cm[1,1] + cm[2,1])
sensitivity
## [1] 0.8248175
specificity
## [1] 0.89375
Sensitivity measures the model's ability to detect patients with heart disease (about 82% here), whereas specificity measures its ability to correctly identify healthy patients (about 89%).
# McFadden's pseudo R-squared: 1 - logLik(model) / logLik(intercept-only model)
logLik_model <- logLik(model)
logLik_null <- logLik(
  glm(condition ~ 1, data = data, family = binomial)
)
pseudo_r2 <- 1 - (logLik_model / logLik_null)
pseudo_r2
## 'log Lik.' 0.4846183 (df=46)
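The pseudo R² (McFadden's, about 0.48) indicates how much the model improves the log-likelihood over an intercept-only model. It is a relative measure of fit and is not the proportion of variance explained as in linear regression.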
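For a binary outcome coded 0/1, the deviance equals -2 times the log-likelihood, so the same value can be cross-checked from the deviances reported in the model summary. A minimal sketch using those printed figures:

```r
null_deviance <- 409.95      # null deviance from the model summary
residual_deviance <- 211.28  # residual deviance from the model summary

# McFadden's pseudo R-squared expressed through deviances
pseudo_r2_check <- 1 - residual_deviance / null_deviance
print(round(pseudo_r2_check, 4))
```

Wrapping the log-likelihood ratio in as.numeric() would print a plain number without the 'log Lik.' label and df attribute.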
# Convert the selected columns to numeric (oldpeak was stored as character)
numeric_vars <- c("age", "trestbps", "chol", "thalach", "oldpeak")
data[numeric_vars] <- lapply(data[numeric_vars], as.numeric)
cor_matrix <- cor(data[numeric_vars], use = "complete.obs")
cor_matrix
## age trestbps chol thalach oldpeak
## age 1.0000000 0.29047626 2.026435e-01 -3.945629e-01 0.19712262
## trestbps 0.2904763 1.00000000 1.315357e-01 -4.910766e-02 0.19124314
## chol 0.2026435 0.13153571 1.000000e+00 -7.456799e-05 0.03859579
## thalach -0.3945629 -0.04910766 -7.456799e-05 1.000000e+00 -0.34763997
## oldpeak 0.1971226 0.19124314 3.859579e-02 -3.476400e-01 1.00000000
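High correlation between predictors can cause multicollinearity. Here the largest absolute correlation is about 0.39 (between age and thalach), so no pair of numeric predictors is strongly correlated.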
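Pairwise correlations can miss collinearity that involves several predictors at once. A common complement is the variance inflation factor (VIF), which can be read off the diagonal of the inverse correlation matrix. The sketch below re-enters the correlations printed above (rounded) and covers only these five numeric variables, not the full model.

```r
vars <- c("age", "trestbps", "chol", "thalach", "oldpeak")

# Correlation matrix re-entered (rounded) from the output above
R <- matrix(c(
   1.0000000,  0.2904763,  0.2026435, -0.3945629,  0.1971226,
   0.2904763,  1.0000000,  0.1315357, -0.0491077,  0.1912431,
   0.2026435,  0.1315357,  1.0000000, -0.0000746,  0.0385958,
  -0.3945629, -0.0491077, -0.0000746,  1.0000000, -0.3476400,
   0.1971226,  0.1912431,  0.0385958, -0.3476400,  1.0000000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# VIF of each variable regressed on the other four
vif <- diag(solve(R))
print(round(vif, 2))
```

Values close to 1 indicate little shared variance among the predictors; values above roughly 5 to 10 are commonly treated as a warning sign.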
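# Build an ROC curve by sweeping the classification threshold from 0 to 1
tpr <- c()
fpr <- c()
thresholds <- seq(0, 1, by = 0.05)
for (t in thresholds) {
  pred_t <- ifelse(prob > t, 1, 0)
  cm_t <- table(
    factor(pred_t, levels = c(0, 1)),
    factor(data$condition, levels = c(0, 1))
  )
  # True positive rate = TP / (TP + FN); false positive rate = FP / (FP + TN)
  tpr <- c(tpr, cm_t[2, 2] / (cm_t[2, 2] + cm_t[1, 2]))
  fpr <- c(fpr, cm_t[2, 1] / (cm_t[2, 1] + cm_t[1, 1]))
}
plot(fpr, tpr,
     type = "l",
     lwd = 3,
     col = "blue",
     xlab = "False Positive Rate",
     ylab = "True Positive Rate",
     main = "ROC Curve")
abline(0, 1, col = "red", lty = 2)
The ROC curve shows the model's ability to distinguish patients with heart disease from healthy patients: the further the curve lies above the red diagonal, the better the discrimination.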
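A curve like this is often summarized by the area under it (AUC). The sketch below applies the trapezoidal rule to threshold-based TPR and FPR values. It uses simulated labels and scores (the hypothetical names `y_sim` and `prob_sim`), so the printed number says nothing about the study model.

```r
set.seed(42)
# Simulated outcome and predicted probabilities (illustration only)
y_sim <- rbinom(300, 1, 0.45)
prob_sim <- plogis(-1 + 2 * y_sim + rnorm(300))

# Sweep thresholds from 1 down to 0 so that FPR increases along the curve
thresholds <- seq(1, 0, by = -0.05)
tpr <- sapply(thresholds, function(t) mean(prob_sim[y_sim == 1] > t))
fpr <- sapply(thresholds, function(t) mean(prob_sim[y_sim == 0] > t))

# Trapezoidal rule: width in FPR times average height in TPR
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

An AUC of 0.5 corresponds to the diagonal reference line and 1 to perfect separation. With a coarse grid of thresholds the trapezoidal value is only an approximation.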
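library(ggplot2)
# Scatter plot of age against resting blood pressure, colored by disease status
ggplot(data, aes(age, trestbps, color = factor(condition))) +
  geom_point(size = 3) +
  labs(
    title = "Relationship Between Age and Blood Pressure",
    x = "Age",
    y = "Blood Pressure",
    color = "Condition"
  ) +
  theme_minimal()
The results show that resting blood pressure and cholesterol are associated with heart disease risk: both have positive, statistically significant coefficients (p ≈ 0.007 and p ≈ 0.033). Age also has a positive coefficient, but it is not statistically significant once the other predictors are included (p ≈ 0.28). Other clinical factors matter as well: sex and chest pain type (cp) raise the odds of heart disease, while a higher maximum heart rate (thalach) lowers them.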
Logistic regression is effective for identifying heart disease risk factors. The model, which correctly classifies about 86% of the patients in the data, can serve as a basis for early detection and health-related decision-making.