Dataset yang digunakan bersumber dari Kaggle dan berisi informasi gaya hidup dan kesehatan individu, seperti aktivitas fisik, tekanan darah, serta gangguan tidur. Tujuan analisis ini adalah mengklasifikasi jenis gangguan tidur berdasarkan faktor gaya hidup.
library(DT)
## Warning: package 'DT' was built under R version 4.4.3
df <- read.csv("Sleep_health_and_lifestyle_dataset.csv")
datatable(df)
library(psych)
## Warning: package 'psych' was built under R version 4.4.3
describe(df)
## vars n mean sd median trimmed mad min
## Person.ID 1 374 187.50 108.11 187.5 187.50 138.62 1.0
## Gender* 2 374 1.51 0.50 2.0 1.51 0.00 1.0
## Age 3 374 42.18 8.67 43.0 41.84 10.38 27.0
## Occupation* 4 374 4.77 3.06 4.0 4.47 2.97 1.0
## Sleep.Duration 5 374 7.13 0.80 7.2 7.12 1.04 5.8
## Quality.of.Sleep 6 374 7.31 1.20 7.0 7.32 1.48 4.0
## Physical.Activity.Level 7 374 59.17 20.83 60.0 58.97 22.24 30.0
## Stress.Level 8 374 5.39 1.77 5.0 5.36 2.97 3.0
## BMI.Category* 9 374 2.30 1.43 1.0 2.25 0.00 1.0
## Blood.Pressure* 10 374 14.11 7.10 16.0 14.48 8.90 1.0
## Heart.Rate 11 374 70.17 4.14 70.0 69.74 2.97 65.0
## Daily.Steps 12 374 6816.84 1617.92 7000.0 6732.67 1482.60 3000.0
## Sleep.Disorder* 13 374 2.00 0.64 2.0 2.00 0.00 1.0
## max range skew kurtosis se
## Person.ID 374.0 373.0 0.00 -1.21 5.59
## Gender* 2.0 1.0 -0.02 -2.00 0.03
## Age 59.0 32.0 0.26 -0.92 0.45
## Occupation* 11.0 10.0 0.74 -0.53 0.16
## Sleep.Duration 8.5 2.7 0.04 -1.29 0.04
## Quality.of.Sleep 9.0 5.0 -0.21 -0.77 0.06
## Physical.Activity.Level 90.0 60.0 0.07 -1.27 1.08
## Stress.Level 8.0 5.0 0.15 -1.33 0.09
## BMI.Category* 4.0 3.0 0.28 -1.85 0.07
## Blood.Pressure* 25.0 24.0 -0.18 -0.96 0.37
## Heart.Rate 86.0 21.0 1.22 2.21 0.21
## Daily.Steps 10000.0 7000.0 0.18 -0.42 83.66
## Sleep.Disorder* 3.0 2.0 0.00 -0.60 0.03
cat_cols <- sapply(df, function(x) is.factor(x) || is.character(x))
for (col_name in names(df)[cat_cols]) {
cat("Kolom:", col_name, "\n")
print(table(df[[col_name]]))
cat("\n---------------------\n\n")
}
## Kolom: Gender
##
## Female Male
## 185 189
##
## ---------------------
##
## Kolom: Occupation
##
## Accountant Doctor Engineer
## 37 71 63
## Lawyer Manager Nurse
## 47 1 73
## Sales Representative Salesperson Scientist
## 2 32 4
## Software Engineer Teacher
## 4 40
##
## ---------------------
##
## Kolom: BMI.Category
##
## Normal Normal Weight Obese Overweight
## 195 21 10 148
##
## ---------------------
##
## Kolom: Blood.Pressure
##
## 115/75 115/78 117/76 118/75 118/76 119/77 120/80 121/79 122/80 125/80 125/82
## 32 2 2 2 1 2 45 1 1 65 4
## 126/83 128/84 128/85 129/84 130/85 130/86 131/86 132/87 135/88 135/90 139/91
## 2 2 3 2 99 2 2 3 2 27 2
## 140/90 140/95 142/92
## 4 65 2
##
## ---------------------
##
## Kolom: Sleep.Disorder
##
## Insomnia None Sleep Apnea
## 77 219 78
##
## ---------------------
missing_counts <- colSums(is.na(df))
print(missing_counts)
## Person.ID Gender Age
## 0 0 0
## Occupation Sleep.Duration Quality.of.Sleep
## 0 0 0
## Physical.Activity.Level Stress.Level BMI.Category
## 0 0 0
## Blood.Pressure Heart.Rate Daily.Steps
## 0 0 0
## Sleep.Disorder
## 0
Person ID
)df$Person.ID <- NULL
Pada kolom BMI Category
terdapat nilai yang tidak
konsisten, yaitu Normal
dan Normal Weight
.
Nilai Normal Weight
diubah menjadi Normal
agar
konsisten.
df$BMI.Category <- ifelse(df$BMI.Category == "Normal Weight", "Normal", df$BMI.Category)
Blood Pressure
menjadi
BP_Systolic
dan BP_Diastolic
Tipe data pada kolom Blood Pressure
merupakan
kategorikal yang dimana berupa angka untuk merepresentasikan tekanan
darah dalam sistolik dan diastolik. Pada data baris pertama yaitu
126/83
, dimana nilai 126 adalah nilai tekanan darah
sistolik dan nilai 83 adalah nilai tekanan darah diastolik. Nilai ini
kemudian akan dipecah menjadi kolom BP_Systolic
dan
BP_Diastolic
agar menjadi tipe data numerik.
bp_split <- strsplit(as.character(df$Blood.Pressure), "/")
df$BP_Systolic <- as.numeric(sapply(bp_split, `[`, 1))
df$BP_Diastolic <- as.numeric(sapply(bp_split, `[`, 2))
df$Blood.Pressure <- NULL
sleep_col <- df$Sleep.Disorder
df$Sleep.Disorder <- NULL
df$Sleep.Disorder <- sleep_col
numeric_cols <- c("Age", "Sleep.Duration", "Quality.of.Sleep", "Physical.Activity.Level", "Stress.Level", "Heart.Rate", "Daily.Steps", "BP_Systolic", "BP_Diastolic")
par(mfrow = c(2, ceiling(length(numeric_cols) / 2)))
for (col in numeric_cols) {
boxplot(df[[col]], main = col, col = "skyblue", horizontal = TRUE)
}
par(mfrow = c(1,1))
detect_outliers <- function(x) {
z <- scale(x)
which(abs(z) > 3)
}
outlier_index <- list()
for (var in numeric_cols) {
outlier_index[[var]] <- detect_outliers(df[[var]])
cat(var, ":", length(outlier_index[[var]]), "outlier\n")
}
## Age : 0 outlier
## Sleep.Duration : 0 outlier
## Quality.of.Sleep : 0 outlier
## Physical.Activity.Level : 0 outlier
## Stress.Level : 0 outlier
## Heart.Rate : 9 outlier
## Daily.Steps : 0 outlier
## BP_Systolic : 0 outlier
## BP_Diastolic : 0 outlier
Outlier diimputasi menggunakan nilai batas bawah atau batas atas
replace_outlier_with_bound <- function(x) {
Q1 <- quantile(x, 0.25, na.rm = TRUE)
Q3 <- quantile(x, 0.75, na.rm = TRUE)
IQR_val <- Q3 - Q1
lower_bound <- Q1 - 1.5 * IQR_val
upper_bound <- Q3 + 1.5 * IQR_val
x[x < lower_bound] <- lower_bound
x[x > upper_bound] <- upper_bound
return(x)
}
numeric_cols <- c("Age", "Sleep.Duration", "Quality.of.Sleep", "Physical.Activity.Level", "Stress.Level", "Heart.Rate", "Daily.Steps", "BP_Systolic", "BP_Diastolic")
df[numeric_cols] <- lapply(df[numeric_cols], replace_outlier_with_bound)
detect_outliers <- function(x) {
z <- scale(x)
which(abs(z) > 3)
}
outlier_index <- list()
for (var in numeric_cols) {
outlier_index[[var]] <- detect_outliers(df[[var]])
cat(var, ":", length(outlier_index[[var]]), "outlier\n")
}
## Age : 0 outlier
## Sleep.Duration : 0 outlier
## Quality.of.Sleep : 0 outlier
## Physical.Activity.Level : 0 outlier
## Stress.Level : 0 outlier
## Heart.Rate : 0 outlier
## Daily.Steps : 0 outlier
## BP_Systolic : 0 outlier
## BP_Diastolic : 0 outlier
Beberapa kolom kategorikal perlu diubah menjadi representasi numerik
agar bisa digunakan dalam model. One Hot Encoding dilakukan pada kolom
Gender
, Occupation
, dan
BMI Category
, dengan baseline yang dijadikan referensi
sebagai berikut:
Gender
:Female
Occupation
: Software Engineer
BMI Category
: Normal
df$Gender <- factor(df$Gender, levels = c("Female", "Male"))
df$Occupation <- factor(df$Occupation, levels = c("Software Engineer", sort(unique(df$Occupation))[sort(unique(df$Occupation)) != "Software Engineer"]))
df$BMI.Category <- factor(df$BMI.Category, levels = c("Normal", "Overweight", "Obese"))
gender_dummies <- model.matrix(~ Gender, data = df)[, -1, drop = FALSE]
occupation_dummies <- model.matrix(~ Occupation, data = df)[, -1, drop = FALSE]
bmi_dummies <- model.matrix(~ BMI.Category, data = df)[, -1, drop = FALSE]
colnames(gender_dummies) <- gsub("Gender", "Gender_", colnames(gender_dummies))
colnames(occupation_dummies) <- gsub("Occupation", "Occupation_", colnames(occupation_dummies))
colnames(bmi_dummies) <- gsub("BMI.Category", "BMI.Category_", colnames(bmi_dummies))
colnames(gender_dummies) <- gsub("[() ]", "", colnames(gender_dummies))
colnames(occupation_dummies) <- gsub("[() ]", "", colnames(occupation_dummies))
colnames(bmi_dummies) <- gsub("[() ]", "", colnames(bmi_dummies))
df$Gender <- NULL
df$Occupation <- NULL
df$BMI.Category <- NULL
df <- cbind(df, gender_dummies, occupation_dummies, bmi_dummies)
# atur urutan kolom
new_order <- c(
"Age",
"Gender_Male",
grep("^Occupation_", names(df), value = TRUE),
grep("^BMI.Category_", names(df), value = TRUE),
"Sleep.Duration",
"Quality.of.Sleep",
"Physical.Activity.Level",
"Stress.Level",
"Heart.Rate",
"Daily.Steps",
"BP_Systolic",
"BP_Diastolic",
"Sleep.Disorder"
)
cols_existing <- names(df)
cols_to_keep <- intersect(new_order, cols_existing)
cols_leftover <- setdiff(cols_existing, cols_to_keep)
df <- df[, c(cols_to_keep, cols_leftover)]
Sleep Disorder
Label kelas Sleep Disorder
dikonversi dari bentuk string
ke dalam tipe data faktor (unordered) agar algoritma klasifikasi seperti
Analisis Diskriminan Linear (LDA) dan Regresi Multinomial dapat
mengenali bahwa target merupakan kelas diskrit (nominal), bukan data
numerik berurutan (ordinal).
df$Sleep.Disorder <- factor(df$Sleep.Disorder, levels = c("None", "Sleep Apnea", "Insomnia"))
table(df$Sleep.Disorder)
##
## None Sleep Apnea Insomnia
## 219 78 77
barplot(table(df$Sleep.Disorder), col = "steelblue", main = "Distribusi Sleep Disorder")
library(corrplot)
## corrplot 0.95 loaded
exclude_prefixes <- c("Gender_", "Occupation_", "BMI.Category_")
numeric_df <- df[sapply(df, is.numeric)]
numeric_df <- numeric_df[ , !grepl(paste0("^(", paste(exclude_prefixes, collapse = "|"), ")"), names(numeric_df))]
corrplot::corrplot(cor(numeric_df), method = "color", tl.cex = 1, addCoef.col = "black", number.cex = 0.7)
Berdasarkan hasil visualisasi korelasi, ditemukan bahwa
BP_Systolic
dan BP_Diastolic
memiliki korelasi
tinggi (>0.85). Untuk menghindari masalah multikolinearitas, variabel
BP_Diastolic
dihapus dari dataset. Begitu juga dengan
Sleep.Duration
dan Quality.of.Sleep
yang
berkorelasi tinggi, variabel Quality.of.Sleep
dihapus dari
dataset.
df$BP_Diastolic <- NULL
df$Quality.of.Sleep <- NULL
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
numeric_cols <- c("Age", "Sleep.Duration", "Physical.Activity.Level",
"Stress.Level", "Heart.Rate", "Daily.Steps", "BP_Systolic")
data_numeric <- df[, numeric_cols]
dist_matrix <- dist(scale(data_numeric))
mds_result <- cmdscale(dist_matrix, k = 2)
mds_df <- data.frame(Dim1 = mds_result[,1],
Dim2 = mds_result[,2],
Sleep.Disorder = df$Sleep.Disorder)
ggplot(mds_df, aes(x = Dim1, y = Dim2, color = Sleep.Disorder)) +
geom_point(size = 2, alpha = 0.8) +
labs(title = "MDS Visualization (Euclidean Distance)",
x = "Dimension 1", y = "Dimension 2") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))
datatable(df)
library(psych)
describe(df)
## vars n mean sd median trimmed mad
## Age 1 374 42.18 8.67 43.0 41.84 10.38
## Gender_Male 2 374 0.51 0.50 1.0 0.51 0.00
## Occupation_Accountant 3 374 0.10 0.30 0.0 0.00 0.00
## Occupation_Doctor 4 374 0.19 0.39 0.0 0.11 0.00
## Occupation_Engineer 5 374 0.17 0.37 0.0 0.09 0.00
## Occupation_Lawyer 6 374 0.13 0.33 0.0 0.03 0.00
## Occupation_Manager 7 374 0.00 0.05 0.0 0.00 0.00
## Occupation_Nurse 8 374 0.20 0.40 0.0 0.12 0.00
## Occupation_SalesRepresentative 9 374 0.01 0.07 0.0 0.00 0.00
## Occupation_Salesperson 10 374 0.09 0.28 0.0 0.00 0.00
## Occupation_Scientist 11 374 0.01 0.10 0.0 0.00 0.00
## Occupation_Teacher 12 374 0.11 0.31 0.0 0.01 0.00
## BMI.Category_Overweight 13 374 0.40 0.49 0.0 0.37 0.00
## BMI.Category_Obese 14 374 0.03 0.16 0.0 0.00 0.00
## Sleep.Duration 15 374 7.13 0.80 7.2 7.12 1.04
## Physical.Activity.Level 16 374 59.17 20.83 60.0 58.97 22.24
## Stress.Level 17 374 5.39 1.77 5.0 5.36 2.97
## Heart.Rate 18 374 69.97 3.57 70.0 69.74 2.97
## Daily.Steps 19 374 6816.84 1617.92 7000.0 6732.67 1482.60
## BP_Systolic 20 374 128.55 7.75 130.0 128.78 7.41
## Sleep.Disorder* 21 374 1.62 0.81 1.0 1.53 0.00
## min max range skew kurtosis se
## Age 27.0 59.0 32.0 0.26 -0.92 0.45
## Gender_Male 0.0 1.0 1.0 -0.02 -2.00 0.03
## Occupation_Accountant 0.0 1.0 1.0 2.68 5.17 0.02
## Occupation_Doctor 0.0 1.0 1.0 1.58 0.48 0.02
## Occupation_Engineer 0.0 1.0 1.0 1.76 1.12 0.02
## Occupation_Lawyer 0.0 1.0 1.0 2.25 3.07 0.02
## Occupation_Manager 0.0 1.0 1.0 19.18 367.02 0.00
## Occupation_Nurse 0.0 1.0 1.0 1.53 0.35 0.02
## Occupation_SalesRepresentative 0.0 1.0 1.0 13.51 181.02 0.00
## Occupation_Salesperson 0.0 1.0 1.0 2.95 6.73 0.01
## Occupation_Scientist 0.0 1.0 1.0 9.48 88.02 0.01
## Occupation_Teacher 0.0 1.0 1.0 2.53 4.43 0.02
## BMI.Category_Overweight 0.0 1.0 1.0 0.42 -1.82 0.03
## BMI.Category_Obese 0.0 1.0 1.0 5.84 32.24 0.01
## Sleep.Duration 5.8 8.5 2.7 0.04 -1.29 0.04
## Physical.Activity.Level 30.0 90.0 60.0 0.07 -1.27 1.08
## Stress.Level 3.0 8.0 5.0 0.15 -1.33 0.09
## Heart.Rate 65.0 78.0 13.0 0.47 -0.39 0.18
## Daily.Steps 3000.0 10000.0 7000.0 0.18 -0.42 83.66
## BP_Systolic 115.0 142.0 27.0 -0.04 -0.91 0.40
## Sleep.Disorder* 1.0 3.0 2.0 0.79 -1.01 0.04
if (!require(MVN)) install.packages("MVN")
## Loading required package: MVN
## Warning: package 'MVN' was built under R version 4.4.3
library(MVN)
numeric_cols <- c("Age", "Sleep.Duration", "Physical.Activity.Level", "Stress.Level", "Heart.Rate", "Daily.Steps", "BP_Systolic")
mvn(df[numeric_cols], mvnTest = "hz", univariateTest = "AD", multivariatePlot = "qq")
## $multivariateNormality
## Test HZ p value MVN
## 1 Henze-Zirkler 21.37807 0 NO
##
## $univariateNormality
## Test Variable Statistic p value Normality
## 1 Anderson-Darling Age 3.9372 <0.001 NO
## 2 Anderson-Darling Sleep.Duration 7.3040 <0.001 NO
## 3 Anderson-Darling Physical.Activity.Level 11.3729 <0.001 NO
## 4 Anderson-Darling Stress.Level 12.6410 <0.001 NO
## 5 Anderson-Darling Heart.Rate 9.1373 <0.001 NO
## 6 Anderson-Darling Daily.Steps 8.9070 <0.001 NO
## 7 Anderson-Darling BP_Systolic 9.0151 <0.001 NO
##
## $Descriptives
## n Mean Std.Dev Median Min Max
## Age 374 42.184492 8.6731335 43.0 27.0 59.0
## Sleep.Duration 374 7.132086 0.7956567 7.2 5.8 8.5
## Physical.Activity.Level 374 59.171123 20.8308037 60.0 30.0 90.0
## Stress.Level 374 5.385027 1.7745264 5.0 3.0 8.0
## Heart.Rate 374 69.965241 3.5672581 70.0 65.0 78.0
## Daily.Steps 374 6816.844920 1617.9156791 7000.0 3000.0 10000.0
## BP_Systolic 374 128.553476 7.7481176 130.0 115.0 142.0
## 25th 75th Skew Kurtosis
## Age 35.25 50.0 0.25516254 -0.9248041
## Sleep.Duration 6.40 7.8 0.03725369 -1.2945281
## Physical.Activity.Level 45.00 75.0 0.07389048 -1.2744695
## Stress.Level 4.00 7.0 0.15309385 -1.3345700
## Heart.Rate 68.00 72.0 0.46555442 -0.3878657
## Daily.Steps 5600.00 8000.0 0.17684985 -0.4186421
## BP_Systolic 125.00 135.0 -0.03538326 -0.9088701
Hasil ini menunjukkan bahwa p-value<0.05 maka tolak H0, artinya populasi tidak berdistribusi normal multivariat, sehingga asumsi normal multivariat tidak terpenuhi
library(biotools)
## Warning: package 'biotools' was built under R version 4.4.3
## Loading required package: MASS
## ---
## biotools version 4.3
box_m_result <- boxM(df[, numeric_cols], grouping = df$Sleep.Disorder)
print(box_m_result)
##
## Box's M-test for Homogeneity of Covariance Matrices
##
## data: df[, numeric_cols]
## Chi-Sq (approx.) = 833.06, df = 56, p-value < 2.2e-16
Hasil ini menunjukkan bahwa p-value<0.05 maka tolak H0, artinya varians tidak homogen, sehingga asumsi homogenitas varians tidak terpenuhi
manova_model <- manova(as.matrix(df[, numeric_cols]) ~ Sleep.Disorder, data = df)
summary(manova_model, test = "Wilks")
## Df Wilks approx F num Df den Df Pr(>F)
## Sleep.Disorder 2 0.26577 49.001 14 730 < 2.2e-16 ***
## Residuals 371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
p-value < 0.05 → Tolak H0: Terdapat paling tidak satu kombinasi
linier dari variabel numerik yang dapat membedakan antar kelas
Sleep Disorder
.
Gunakan ANOVA untuk melihat variabel mana yang signifikan secara individual:
anova_results <- sapply(numeric_cols, function(x) {
model <- aov(as.formula(paste(x, "~ Sleep.Disorder")), data = df)
summary(model)[[1]][["Pr(>F)"]][1]
})
anova_df <- data.frame(p_value = anova_results)
anova_df
## p_value
## Age 8.852105e-23
## Sleep.Duration 1.626151e-13
## Physical.Activity.Level 6.306184e-18
## Stress.Level 1.520459e-03
## Heart.Rate 3.510225e-13
## Daily.Steps 7.942397e-11
## BP_Systolic 1.198027e-62
Semua variabel memiliki p-value < 0.05, artinya terdapat perbedaan
rata-rata yang signifikan secara statistik untuk semua variabel numerik
berdasarkan kelompok Sleep Disorder
.
library(MASS)
lda_model <- lda(Sleep.Disorder ~ ., data = df[, c(numeric_cols, "Sleep.Disorder")])
print(lda_model)
## Call:
## lda(Sleep.Disorder ~ ., data = df[, c(numeric_cols, "Sleep.Disorder")])
##
## Prior probabilities of groups:
## None Sleep Apnea Insomnia
## 0.5855615 0.2085561 0.2058824
##
## Group means:
## Age Sleep.Duration Physical.Activity.Level Stress.Level
## None 39.03653 7.358447 57.94977 5.114155
## Sleep Apnea 49.70513 7.032051 74.79487 5.666667
## Insomnia 43.51948 6.589610 46.81818 5.870130
## Heart.Rate Daily.Steps BP_Systolic
## None 69.01826 6852.968 124.0457
## Sleep Apnea 72.44872 7619.231 137.7692
## Insomnia 70.14286 5901.299 132.0390
##
## Coefficients of linear discriminants:
## LD1 LD2
## Age 6.793810e-02 0.0182843850
## Sleep.Duration -8.130318e-01 -1.3462678312
## Physical.Activity.Level 2.210341e-03 0.0227071520
## Stress.Level -3.381003e-01 0.2983645430
## Heart.Rate 2.024337e-01 -0.3276790399
## Daily.Steps 3.755992e-05 -0.0007085623
## BP_Systolic 1.134453e-01 0.0002235590
##
## Proportion of trace:
## LD1 LD2
## 0.7977 0.2023
library(caret)
## Warning: package 'caret' was built under R version 4.4.3
## Loading required package: lattice
lda_pred <- predict(lda_model, newdata = df[, numeric_cols])
# Confusion matrix
conf_matrix <- confusionMatrix(as.factor(lda_pred$class), as.factor(df$Sleep.Disorder))
print(conf_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction None Sleep Apnea Insomnia
## None 192 7 8
## Sleep Apnea 6 65 6
## Insomnia 21 6 63
##
## Overall Statistics
##
## Accuracy : 0.8556
## 95% CI : (0.8158, 0.8896)
## No Information Rate : 0.5856
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.7525
##
## Mcnemar's Test P-Value : 0.1163
##
## Statistics by Class:
##
## Class: None Class: Sleep Apnea Class: Insomnia
## Sensitivity 0.8767 0.8333 0.8182
## Specificity 0.9032 0.9595 0.9091
## Pos Pred Value 0.9275 0.8442 0.7000
## Neg Pred Value 0.8383 0.9562 0.9507
## Prevalence 0.5856 0.2086 0.2059
## Detection Rate 0.5134 0.1738 0.1684
## Detection Prevalence 0.5535 0.2059 0.2406
## Balanced Accuracy 0.8900 0.8964 0.8636
Untuk mengetahui variabel mana yang berpengaruh terhadap gangguan
tidur (Sleep Disorder
), salah satu caranya adalah dengan
melihat plot antara fungsi diskriminan.
plot(lda_model, col = as.integer(df[, c(numeric_cols, "Sleep.Disorder")]$Sleep.Disorder))
Dilihat dari plot diatas, dapat dikatakan model mampu mengelompokkan
data dengan cukup baik walaupun terdapat overlap pada ketiga kategori.
Dapat dilihat pula fungsi diskriminan LD1 berperan besar dalam
membedakan antara kategori Sleep Disorder
, sedangkan fungsi
diskriminan LD2 tidak berperan besar dalam membedakan kategori
Sleep Disorder
.
if (!require(nnet)) install.packages("nnet")
## Loading required package: nnet
library(nnet)
model <- multinom(Sleep.Disorder ~ Age + Gender_Male + Occupation_Accountant + Occupation_Doctor + Occupation_Engineer +
Occupation_Lawyer + Occupation_Manager + Occupation_Nurse + Occupation_SalesRepresentative + Occupation_Salesperson +
Occupation_Scientist + Occupation_Teacher + BMI.Category_Overweight + BMI.Category_Obese + Sleep.Duration +
Physical.Activity.Level + Stress.Level + Heart.Rate + Daily.Steps + BP_Systolic,
data = df)
## # weights: 66 (42 variable)
## initial value 410.880996
## iter 10 value 232.513420
## iter 20 value 120.129947
## iter 30 value 113.374575
## iter 40 value 106.324151
## iter 50 value 104.656806
## iter 60 value 104.138955
## iter 70 value 104.104534
## iter 80 value 104.080944
## final value 104.078396
## converged
summary(model)
## Call:
## multinom(formula = Sleep.Disorder ~ Age + Gender_Male + Occupation_Accountant +
## Occupation_Doctor + Occupation_Engineer + Occupation_Lawyer +
## Occupation_Manager + Occupation_Nurse + Occupation_SalesRepresentative +
## Occupation_Salesperson + Occupation_Scientist + Occupation_Teacher +
## BMI.Category_Overweight + BMI.Category_Obese + Sleep.Duration +
## Physical.Activity.Level + Stress.Level + Heart.Rate + Daily.Steps +
## BP_Systolic, data = df)
##
## Coefficients:
## (Intercept) Age Gender_Male Occupation_Accountant
## Sleep Apnea -90.04220 -0.009096638 -0.2926666 -15.79232
## Insomnia -93.06757 0.205253795 1.6394267 13.51101
## Occupation_Doctor Occupation_Engineer Occupation_Lawyer
## Sleep Apnea 4.0354511 3.136961 3.332353
## Insomnia 0.2703967 5.618437 3.425211
## Occupation_Manager Occupation_Nurse Occupation_SalesRepresentative
## Sleep Apnea -14.666577 7.275768 18.20412
## Insomnia -6.201642 1.557523 -17.82102
## Occupation_Salesperson Occupation_Scientist Occupation_Teacher
## Sleep Apnea 7.498148 7.526992 8.656193
## Insomnia 8.453898 -18.177851 7.871865
## BMI.Category_Overweight BMI.Category_Obese Sleep.Duration
## Sleep Apnea -0.5379552 47.15032 2.928505
## Insomnia -6.1155857 37.42976 -3.238395
## Physical.Activity.Level Stress.Level Heart.Rate Daily.Steps
## Sleep Apnea -0.05838660 0.46028362 0.3989656 0.0008301033
## Insomnia -0.02007838 -0.05453537 0.1822150 -0.0010739335
## BP_Systolic
## Sleep Apnea 0.2300954
## Insomnia 0.7556189
##
## Std. Errors:
## (Intercept) Age Gender_Male Occupation_Accountant
## Sleep Apnea 0.005460022 0.05977360 0.03373033 3.921671e-12
## Insomnia 0.004487709 0.05631287 0.02898579 1.991206e-02
## Occupation_Doctor Occupation_Engineer Occupation_Lawyer
## Sleep Apnea 0.01852534 0.01386726 0.008904411
## Insomnia 0.01426357 0.03261901 0.021296859
## Occupation_Manager Occupation_Nurse Occupation_SalesRepresentative
## Sleep Apnea 1.592468e-12 0.03449422 1.042231e-14
## Insomnia 2.606316e-10 0.01676492 1.038601e-14
## Occupation_Salesperson Occupation_Scientist Occupation_Teacher
## Sleep Apnea 0.009559529 5.443588e-03 0.004355305
## Insomnia 0.041544018 3.695043e-13 0.007034011
## BMI.Category_Overweight BMI.Category_Obese Sleep.Duration
## Sleep Apnea 0.03791723 0.003184474 0.1252124
## Insomnia 0.06462222 0.003184474 0.1225856
## Physical.Activity.Level Stress.Level Heart.Rate Daily.Steps
## Sleep Apnea 0.02494531 0.1384985 0.09556256 0.0002878376
## Insomnia 0.02370210 0.1721407 0.07180195 0.0003319279
## BP_Systolic
## Sleep Apnea 0.06978849
## Insomnia 0.04760106
##
## Residual Deviance: 208.1568
## AIC: 292.1568
model_null <- multinom(Sleep.Disorder ~ 1, data = df, trace = FALSE)
anova_result <- anova(model_null, model, test = "Chisq")
print(anova_result)
## Likelihood ratio tests of Multinomial Models
##
## Response: Sleep.Disorder
## Model
## 1 1
## 2 Age + Gender_Male + Occupation_Accountant + Occupation_Doctor + Occupation_Engineer + Occupation_Lawyer + Occupation_Manager + Occupation_Nurse + Occupation_SalesRepresentative + Occupation_Salesperson + Occupation_Scientist + Occupation_Teacher + BMI.Category_Overweight + BMI.Category_Obese + Sleep.Duration + Physical.Activity.Level + Stress.Level + Heart.Rate + Daily.Steps + BP_Systolic
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 746 722.3373
## 2 706 208.1568 1 vs 2 40 514.1805 0
Berdasarkan uji Likelihood Ratio, terdapat perbedaan yang signifikan
antara model intercept dan model penuh (χ²(40) = 514.18, p < 0.001).
Hal ini menunjukkan bahwa variabel-variabel prediktor secara
bersama-sama berkontribusi signifikan dalam memprediksi gangguan tidur
(Sleep Disorder
).
model_summary <- summary(model)
coefs <- model_summary$coefficients
std_err <- model_summary$standard.errors
z_values <- coefs / std_err
p_values <- 2 * (1 - pnorm(abs(z_values)))
print(round(p_values, 4))
## (Intercept) Age Gender_Male Occupation_Accountant
## Sleep Apnea 0 0.8790 0 0
## Insomnia 0 0.0003 0 0
## Occupation_Doctor Occupation_Engineer Occupation_Lawyer
## Sleep Apnea 0 0 0
## Insomnia 0 0 0
## Occupation_Manager Occupation_Nurse Occupation_SalesRepresentative
## Sleep Apnea 0 0 0
## Insomnia 0 0 0
## Occupation_Salesperson Occupation_Scientist Occupation_Teacher
## Sleep Apnea 0 0 0
## Insomnia 0 0 0
## BMI.Category_Overweight BMI.Category_Obese Sleep.Duration
## Sleep Apnea 0 0 0
## Insomnia 0 0 0
## Physical.Activity.Level Stress.Level Heart.Rate Daily.Steps
## Sleep Apnea 0.0193 0.0009 0.0000 0.0039
## Insomnia 0.3969 0.7514 0.0112 0.0012
## BP_Systolic
## Sleep Apnea 0.001
## Insomnia 0.000
Berdasarkan uji parsial, ditemukan bahwa sebagian besar variabel prediktor berkontribusi signifikan secara statistik terhadap probabilitas individu mengalami gangguan tidur, baik Sleep Apnea maupun Insomnia.
Untuk Sleep Apnea, variabel seperti Heart Rate
(p =
0.0000), Stress Level
(p = 0.0009), dan
Daily Steps
(p = 0.0039) menunjukkan pengaruh individual
yang signifikan terhadap prediksi gangguan tidur. Sementara itu,
variabel seperti Age
(p = 0.8790) tidak menunjukkan
kontribusi signifikan secara individual.
Sedangkan untuk Insomnia, variabel seperti Age
(p =
0.0003), Heart Rate
(p = 0.0112), dan
BP_Systolic
(p = 0.0000) berperan penting dalam model
prediktif, sedangkan variabel seperti
Physical Activity Level
(p = 0.3969) dan
Stress Level
(p = 0.7514) tidak berkontribusi secara
signifikan secara individual.
mlr_pred <- predict(model, newdata = df)
library(caret)
conf_matrix <- confusionMatrix(mlr_pred, df$Sleep.Disorder)
print(conf_matrix)
## Confusion Matrix and Statistics
##
## Reference
## Prediction None Sleep Apnea Insomnia
## None 210 5 7
## Sleep Apnea 5 71 3
## Insomnia 4 2 67
##
## Overall Statistics
##
## Accuracy : 0.9305
## 95% CI : (0.8998, 0.9541)
## No Information Rate : 0.5856
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8776
##
## Mcnemar's Test P-Value : 0.7969
##
## Statistics by Class:
##
## Class: None Class: Sleep Apnea Class: Insomnia
## Sensitivity 0.9589 0.9103 0.8701
## Specificity 0.9226 0.9730 0.9798
## Pos Pred Value 0.9459 0.8987 0.9178
## Neg Pred Value 0.9408 0.9763 0.9668
## Prevalence 0.5856 0.2086 0.2059
## Detection Rate 0.5615 0.1898 0.1791
## Detection Prevalence 0.5936 0.2112 0.1952
## Balanced Accuracy 0.9407 0.9416 0.9250
data.frame(exp(summary(model)$coefficients))
## X.Intercept. Age Gender_Male Occupation_Accountant
## Sleep Apnea 7.855423e-40 0.9909446 0.7462709 1.385106e-07
## Insomnia 3.813004e-41 1.2278366 5.1522150 7.374905e+05
## Occupation_Doctor Occupation_Engineer Occupation_Lawyer
## Sleep Apnea 56.568431 23.03376 28.00415
## Insomnia 1.310484 275.45837 30.72912
## Occupation_Manager Occupation_Nurse Occupation_SalesRepresentative
## Sleep Apnea 4.269592e-07 1444.859837 8.052818e+07
## Insomnia 2.026100e-03 4.747049 1.821502e-08
## Occupation_Salesperson Occupation_Scientist Occupation_Teacher
## Sleep Apnea 1804.697 1.857511e+03 5745.617
## Insomnia 4693.331 1.274851e-08 2622.452
## BMI.Category_Overweight BMI.Category_Obese Sleep.Duration
## Sleep Apnea 0.583941052 3.000030e+20 18.69965262
## Insomnia 0.002208182 1.801106e+16 0.03922682
## Physical.Activity.Level Stress.Level Heart.Rate Daily.Steps
## Sleep Apnea 0.9432852 1.584523 1.490282 1.0008304
## Insomnia 0.9801218 0.946925 1.199872 0.9989266
## BP_Systolic
## Sleep Apnea 1.258720
## Insomnia 2.128929
Berdasarkan hasil odds ratio, beberapa variabel memiliki pengaruh besar terhadap peningkatan risiko Sleep Apnea dan Insomnia:
A. Sleep Apnea
- Occupation_Teacher
(OR = 5745.62),
Occupation_Scientist
(OR = 1857.51),
Occupation_Salesperson
(OR = 1804.70), dan
Occupation_Nurse
(OR = 1444.86) menunjukkan bahwa individu
dengan pekerjaan ini memiliki peluang yang sangat tinggi mengalami Sleep
Apnea dibandingkan kategori referensi
(Occupation_SoftwareEngineer
).
- Sleep Duration
(OR = 18.70) dan BMI_Obese
(OR = 3.000030e+20) juga berkontribusi signifikan menaikkan risiko Sleep
Apnea.
- Physical Activity Level
(OR = 0.9432852) menurunkan
risiko Sleep Apnea.
B. Insomnia
- Occupation_Salesperson
(OR = 4693.33) dan
Occupation_Teacher
(OR = 2622.45) memiliki pengaruh besar
terhadap peningkatan risiko Insomnia.
- BMI_Obese
(OR = 1.801106e+16) sangat meningkatkan risiko
Insomnia.
- Sleep Duration
(OR = 0.039) dan Daily Steps
(OR = 0.9989266) menurunkan risiko Insomnia.