Data Transformation
General Health
A complete pipeline for transforming patient data: import, cleaning, feature engineering, outlier detection, and normalization, producing a dataset ready for machine learning.
| Section | Topic | Methods |
|---|---|---|
| 6.9.2.1 | Importing & Inspecting the Data | read.csv, str(), summary() |
| 6.9.2.2 | Cleaning the Data | Factor conversion, unit standardization |
| 6.9.2.3 | Feature Engineering | BMI, age groups, chronic conditions |
| 6.9.2.4 | Categorization & Binning | Blood pressure, activity, ordinal |
| 6.9.2.5 | Outlier Detection & Handling | IQR, Z-score |
| 6.9.2.6 | Temporal & Rolling Features | Dates, rolling mean |
| 6.9.2.7 | Encoding Categorical Variables | One-hot, label encoding |
| 6.9.2.8 | Normalization & Feature Scaling | Z-score, Min-Max |
Importing and Inspecting the Dataset
Load the data, embedded below as CSV text and read with read.csv, then inspect its structure, data types, and missing values.
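As a minimal sketch of the inspection step, assume a small hypothetical data frame `demo` (three made-up rows, not taken from the dataset in this section) that reuses a few of the same column names. Base R's `str()`, `summary()`, and `colSums(is.na())` then report the structure, summary statistics, and per-column missing-value counts:

```r
# Hypothetical three-row sample; the IDs and values are illustrative only.
# The second row leaves BMI empty so that a missing value shows up.
demo <- read.csv(text = paste0(
  "ID_Pasien,Tanggal,Usia,BMI,Lokasi\n",
  "A001,2021-07-14,27,31.5,Makassar\n",
  "A002,2020-11-16,63,,Jakarta\n",
  "A003,2023-03-22,72,18.2,Surabaya\n"
), stringsAsFactors = FALSE)

str(demo)      # column names, types, and the first few values
summary(demo)  # per-column summary statistics (NA counts for numeric columns)

# Type of each column as read by read.csv
col_types <- sapply(demo, class)
print(col_types)

# Number of missing values in each column
na_counts <- colSums(is.na(demo))
print(na_counts)
```

An empty field in a numeric column is read as `NA`, so `na_counts` flags `BMI` here. Note that `Tanggal` is read as character rather than as a date, which is the kind of type issue this inspection is meant to surface.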
# Load the required packages (install them first if they are missing)
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(dplyr)
library(ggplot2)
# The data is embedded inline so the example does not depend on an external file path
df <- read.csv(text = paste0(
"No,ID_Pasien,Tanggal,Usia,BMI,Tekanan_Darah,Kolesterol,Glukosa,Detak_Jantung,Lokasi,Kondisi_Kesehatan,Musim\n",
"1,SLy5n7T2vCfd,2021-07-14,27,31.5,122.8,131.5,77.7,70.0,Makassar,Healthy,Dry Season\n",
"2,SS9WdTh6Gp9l,2020-11-16,63,26.9,119.9,212.8,128.1,82.2,Jakarta,Diabetes,Rainy Season\n",
"3,5PBRrmglA03t,2023-03-22,72,18.2,146.0,158.5,100.3,73.9,Surabaya,Healthy,Transitional Season\n",
"4,0cAGgC7hcyxq,2023-01-02,60,19.9,121.2,220.9,103.4,79.1,Bandung,Diabetes,Rainy Season\n",
"5,0KSEA9pnVHdd,2023-06-05,40,32.5,109.4,229.8,91.9,67.0,Makassar,Healthy,Dry Season\n",
"6,Zba4dbAEtGwn,2023-03-15,71,27.4,126.2,209.3,102.0,56.3,Bandung,Obesity,Transitional Season\n",
"7,R8Qx2GZT0XQT,2021-09-25,74,17.1,111.3,117.8,103.5,78.5,Makassar,Healthy,Dry Season\n",
"8,4CDiQyhVv9KV,2020-02-18,44,25.3,108.7,140.5,78.8,72.1,Jakarta,Hypertension,Rainy Season\n",
"9,PPPsJBNOlqxa,2023-02-25,51,18.2,91.4,285.0,73.6,93.7,Jakarta,Healthy,Rainy Season\n",
"10,wsS2iEHE4Sh6,2023-08-19,33,17.6,124.6,265.0,77.5,72.7,Bandung,Healthy,Dry Season\n",
"11,dHw1Qj2Kba4S,2020-01-24,63,25.3,113.9,199.2,70.5,85.2,Bandung,Diabetes,Rainy Season\n",
"12,8iJeK256qAc2,2020-12-21,42,25.0,130.8,237.5,90.6,68.6,Bandung,Hypertension,Rainy Season\n",
"13,B1Cp6SdRoNK8,2024-06-12,45,19.6,108.7,152.1,121.4,72.5,Makassar,Obesity,Dry Season\n",
"14,qhNcbTreDrX4,2020-06-13,32,25.6,119.9,225.9,91.9,88.6,Makassar,Healthy,Dry Season\n",
"15,hkKpYRePsho5,2021-09-13,66,26.8,117.5,168.2,80.5,68.8,Makassar,Hypertension,Dry Season\n",
"16,LbjXwfGVvV5N,2024-04-04,55,19.2,124.7,165.1,98.3,77.6,Surabaya,Obesity,Transitional Season\n",
"17,BiYC2oc91bEX,2021-02-13,40,26.5,112.1,202.7,82.2,81.3,Bandung,Diabetes,Rainy Season\n",
"18,AzpV85tGCDMn,2022-06-17,78,23.9,135.2,225.9,116.3,82.5,Jakarta,Healthy,Dry Season\n",
"19,C5rG5plKpaLe,2023-08-13,36,29.6,120.2,177.2,79.1,61.3,Medan,Obesity,Dry Season\n",
"20,5UlgANBrFxA6,2021-08-23,27,23.5,149.6,176.1,96.6,68.9,Jakarta,Hypertension,Dry Season\n",
"21,wxj3dfHyqO9O,2023-07-30,30,27.6,159.1,241.1,75.2,71.4,Surabaya,Obesity,Dry Season\n",
"22,Prr6AHVPDbL2,2022-07-20,79,20.1,130.7,267.2,67.5,90.6,Jakarta,Hypertension,Dry Season\n",
"23,TMQmpYjNnZnA,2022-09-23,55,28.3,106.0,202.3,64.4,80.9,Bandung,Healthy,Dry Season\n",
"24,uRa6kvMcSZh1,2021-09-12,18,30.3,119.2,148.8,92.2,68.8,Jakarta,Hypertension,Dry Season\n",
"25,M6LVbKwm5qJ8,2024-03-20,28,30.2,113.2,195.0,72.7,76.1,Makassar,Hypertension,Transitional Season\n",
"26,eQcx4kEp8CTq,2023-05-20,62,23.4,142.3,224.5,99.0,77.5,Jakarta,Obesity,Transitional Season\n",
"27,u2UNPGzvGMXL,2022-05-09,43,28.2,140.7,150.3,96.4,68.7,Surabaya,Diabetes,Transitional Season\n",
"28,MEBXjhI3nRLw,2020-09-15,75,28.1,112.3,171.8,120.2,78.0,Medan,Healthy,Dry Season\n",
"29,RKCBNjej149x,2023-08-30,27,24.6,119.3,247.9,88.7,86.0,Jakarta,Diabetes,Dry Season\n",
"30,bZ8xp89XCPPy,2021-04-25,76,27.6,103.6,278.8,74.4,88.7,Jakarta,Hypertension,Transitional Season\n",
"31,ODOme8ic2spy,2023-08-14,62,14.9,132.8,147.9,130.2,81.3,Jakarta,Hypertension,Dry Season\n",
"32,uDDO1l3114cX,2020-01-24,32,26.0,128.4,180.9,107.4,69.5,Bandung,Diabetes,Rainy Season\n",
"33,jEvmcnozHA9A,2023-03-27,34,18.6,138.8,205.6,94.5,80.3,Medan,Diabetes,Transitional Season\n",
"34,bKdK1pp7GPSg,2020-10-25,64,25.3,103.4,213.9,72.3,85.4,Bandung,Obesity,Transitional Season\n",
"35,9rjIML4tc831,2021-02-09,25,31.4,125.5,198.0,84.0,81.6,Jakarta,Hypertension,Rainy Season\n",
"36,QdV5zOh96PeD,2020-11-09,53,27.1,112.3,171.5,125.2,73.3,Bandung,Obesity,Rainy Season\n",
"37,SsaI3yxuiCL0,2024-07-29,23,22.2,133.7,235.5,75.4,87.8,Medan,Healthy,Dry Season\n",
"38,1Aea5e7ohPlp,2023-08-08,27,23.2,139.6,224.4,98.8,72.8,Surabaya,Diabetes,Dry Season\n",
"39,LN3e0Y5sFUAC,2022-04-15,77,20.1,107.4,201.0,98.6,82.1,Bandung,Cardiovascular Disease,Transitional Season\n",
"40,czC6gasCKh5m,2022-09-08,44,15.5,126.0,207.9,98.9,70.2,Bandung,Obesity,Dry Season\n",
"41,7rm13rwzbOp8,2020-05-25,37,28.4,103.6,168.0,95.4,72.3,Surabaya,Obesity,Transitional Season\n",
"42,E6G6fkQci0Qv,2024-07-02,73,31.0,128.2,199.2,84.0,72.5,Jakarta,Healthy,Dry Season\n",
"43,QkxBgv0al3FI,2023-10-03,42,25.1,99.3,222.1,100.5,96.7,Surabaya,Diabetes,Transitional Season\n",
"44,VSuBx01A35fV,2020-07-15,80,21.0,152.7,151.0,78.4,93.4,Makassar,Obesity,Dry Season\n",
"45,GWJItsJM7kNd,2024-03-19,41,24.3,154.0,204.7,102.0,72.4,Makassar,Diabetes,Transitional Season\n",
"46,mnaimP2BXUkB,2020-08-13,33,24.5,133.4,154.8,86.7,73.1,Jakarta,Cardiovascular Disease,Dry Season\n",
"47,ur0hDynQhOda,2021-05-18,61,22.2,119.2,247.6,105.7,87.1,Bandung,Hypertension,Transitional Season\n",
"48,3ON805VB964g,2023-10-10,33,23.6,129.9,173.9,70.5,76.6,Medan,Cardiovascular Disease,Transitional Season\n",
"49,ZRxN3WBeQ5Et,2022-09-10,68,24.2,108.2,167.5,83.6,64.8,Jakarta,Hypertension,Dry Season\n",
"50,9Qjm9pRYpZdR,2021-12-27,20,27.0,96.4,166.1,107.1,88.3,Makassar,Hypertension,Rainy Season\n",
"51,3Tuxh3auRBBI,2022-08-02,73,29.5,124.7,220.5,113.5,74.8,Makassar,Obesity,Dry Season\n",
"52,C5At5ohS56fl,2023-06-22,49,16.1,82.2,187.9,95.0,70.6,Surabaya,Healthy,Dry Season\n",
"53,CZoSq9VKeUbB,2022-07-10,38,32.7,106.8,176.8,85.8,72.0,Medan,Healthy,Dry Season\n",
"54,4WmVm8Bo9fdG,2021-09-17,56,35.2,116.1,141.9,102.6,78.8,Bandung,Cardiovascular Disease,Dry Season\n",
"55,8bZ6jhOrR6FA,2020-09-18,20,20.4,138.4,256.0,98.3,72.6,Bandung,Diabetes,Dry Season\n",
"56,NjHNzGjbrfJD,2021-01-24,27,22.2,134.1,225.8,115.2,87.7,Medan,Hypertension,Rainy Season\n",
"57,P1zgCnGbR8re,2020-05-09,38,28.2,131.5,213.7,132.0,82.6,Medan,Diabetes,Transitional Season\n",
"58,0cyS8m6oM9k1,2023-10-27,59,24.7,108.8,182.7,94.1,82.1,Jakarta,Cardiovascular Disease,Transitional Season\n",
"59,C31JJycFo6si,2022-10-23,24,30.2,138.6,118.5,118.9,52.3,Bandung,Cardiovascular Disease,Transitional Season\n",
"60,Knvjs1J9ocq2,2021-01-08,36,27.0,117.5,179.3,101.9,84.5,Bandung,Hypertension,Rainy Season\n",
"61,5PwDTpgJS8FC,2023-11-29,46,19.8,144.0,211.7,103.9,67.5,Jakarta,Cardiovascular Disease,Rainy Season\n",
"62,wPuc8xUQh42z,2022-02-08,62,26.8,99.2,168.6,100.1,73.7,Jakarta,Hypertension,Rainy Season\n",
"63,KcgSMz1e4aBQ,2022-10-11,59,20.2,114.7,154.2,76.4,68.2,Jakarta,Cardiovascular Disease,Transitional Season\n",
"64,ZL2sgkrjHMp8,2021-11-28,72,24.5,120.0,251.0,94.4,73.9,Jakarta,Diabetes,Rainy Season\n",
"65,p6UkMUQ11wik,2021-10-11,63,24.3,128.9,134.6,76.2,51.4,Medan,Healthy,Transitional Season\n",
"66,dLRx0ac4ifBI,2020-02-09,41,21.4,112.8,159.9,103.9,62.9,Jakarta,Healthy,Rainy Season\n",
"67,qqi47mApPU9g,2021-05-31,46,30.6,127.0,159.5,68.1,88.8,Medan,Diabetes,Transitional Season\n",
"68,9CuO7Sbq3bUi,2020-02-02,73,25.0,89.0,165.2,117.5,65.8,Surabaya,Healthy,Rainy Season\n",
"69,iErG8BEooWia,2022-08-02,47,26.7,113.3,233.8,113.9,61.6,Jakarta,Cardiovascular Disease,Dry Season\n",
"70,NQWBJglidgSa,2020-04-12,60,33.3,114.4,133.4,115.0,84.0,Makassar,Healthy,Transitional Season\n",
"71,CjbcNQ1BA325,2023-06-05,62,23.8,123.2,192.0,99.3,73.6,Makassar,Obesity,Dry Season\n",
"72,pnvSzLykEK6v,2020-04-18,52,25.5,110.3,184.1,77.4,74.5,Makassar,Hypertension,Transitional Season\n",
"73,Gnt89gE8EkOR,2022-09-28,59,24.0,102.3,253.0,82.9,70.1,Makassar,Obesity,Dry Season\n",
"74,kE4TUQ2Oqxn5,2023-09-14,68,29.1,122.7,233.5,82.0,68.1,Makassar,Obesity,Dry Season\n",
"75,bkPRqOjg7Dzu,2024-08-19,51,26.6,122.8,175.9,65.6,88.0,Jakarta,Healthy,Dry Season\n",
"76,Gb9KfzbEYL4O,2020-03-16,56,23.0,101.0,237.1,62.1,68.4,Surabaya,Healthy,Transitional Season\n",
"77,X0epzADTlfCV,2022-09-06,69,28.8,101.9,157.7,118.1,62.5,Jakarta,Healthy,Dry Season\n",
"78,cmAl1i0EohW2,2024-12-05,38,20.6,116.2,145.6,77.8,67.1,Jakarta,Hypertension,Rainy Season\n",
"79,eA1bRTXGwuFF,2024-02-23,70,25.8,119.5,238.0,106.2,85.7,Bandung,Diabetes,Rainy Season\n",
"80,mjUSpoZDYaQF,2024-04-19,60,17.4,134.8,204.8,58.6,55.3,Bandung,Diabetes,Transitional Season\n",
"81,XXX5QdRgn3Q1,2022-01-02,75,27.8,126.3,127.6,116.4,75.5,Makassar,Obesity,Rainy Season\n",
"82,CUSV68x3zoUK,2020-01-16,60,25.2,145.2,219.6,82.4,73.2,Jakarta,Diabetes,Rainy Season\n",
"83,VHApcvpLmgUU,2023-10-12,79,24.9,123.2,180.8,124.2,56.7,Bandung,Cardiovascular Disease,Transitional Season\n",
"84,tDxONbfys5NO,2024-10-21,43,19.9,140.3,184.6,54.0,80.3,Jakarta,Cardiovascular Disease,Transitional Season\n",
"85,YO8LBK0BF0IL,2023-06-25,50,24.0,97.6,249.8,59.8,73.5,Makassar,Cardiovascular Disease,Dry Season\n",
"86,rY8HSULOLjYq,2022-10-02,70,31.6,105.5,182.2,77.3,70.9,Jakarta,Diabetes,Transitional Season\n",
"87,vO7bkFFB8qnb,2023-09-10,79,28.8,105.6,188.8,68.3,91.0,Bandung,Obesity,Dry Season\n",
"88,66A3qwSlw5zi,2022-09-14,23,28.3,93.5,175.5,116.5,65.2,Surabaya,Obesity,Dry Season\n",
"89,ZtmrXNH4ENrK,2024-07-11,75,27.8,124.3,213.2,101.9,72.3,Bandung,Healthy,Dry Season\n",
"90,lagtok7K8VYr,2024-07-23,40,27.7,90.1,219.7,86.6,83.0,Makassar,Diabetes,Dry Season\n",
"91,hmsmv4ZS8a7x,2020-03-22,59,18.7,133.3,228.3,77.3,59.7,Bandung,Obesity,Transitional Season\n",
"92,AvuBeevTFWT8,2022-07-02,58,18.1,121.1,199.7,90.5,67.0,Makassar,Diabetes,Dry Season\n",
"93,jDdv7f4E02kN,2020-11-20,77,22.9,126.7,181.4,87.2,75.5,Jakarta,Diabetes,Rainy Season\n",
"94,ipUny9bqJMsU,2024-07-03,61,30.7,122.3,103.1,98.3,77.9,Bandung,Diabetes,Dry Season\n",
"95,qKEfW1oqpuRu,2023-10-15,77,27.3,142.5,240.9,105.6,60.9,Makassar,Healthy,Transitional Season\n",
"96,63x6YJJbNTKQ,2022-08-08,62,22.0,124.9,240.1,86.8,90.4,Surabaya,Hypertension,Dry Season\n",
"97,2FQwpxz6IBQD,2020-10-22,35,25.7,127.2,225.1,109.4,73.9,Medan,Hypertension,Transitional Season\n",
"98,kUQ1IPMEaH0X,2020-05-28,64,26.0,92.6,182.0,121.9,60.9,Jakarta,Healthy,Transitional Season\n",
"99,G2YQgAA5H722,2022-09-22,38,23.5,132.9,235.3,104.1,62.0,Medan,Hypertension,Dry Season\n",
"100,gpjcTYVW0hvs,2024-05-11,50,27.3,116.0,244.6,115.0,84.1,Jakarta,Cardiovascular Disease,Transitional Season\n",
"101,NkH1VhjUwNeY,2020-04-09,28,24.4,89.8,209.2,103.4,91.8,Jakarta,Obesity,Transitional Season\n",
"102,8BTzXeZmpESW,2022-09-25,57,26.8,122.8,198.2,109.2,66.8,Surabaya,Obesity,Dry Season\n",
"103,Dnu6zMN9pgra,2020-10-24,62,27.5,123.5,123.3,77.6,76.1,Makassar,Healthy,Transitional Season\n",
"104,o4LHkRt7zeeP,2021-02-05,31,23.5,130.2,260.8,99.2,86.6,Makassar,Diabetes,Rainy Season\n",
"105,WsAIVYl7kfBM,2020-03-31,19,23.6,129.2,210.2,100.7,71.6,Bandung,Diabetes,Transitional Season\n",
"106,V18VedXktgYT,2022-02-19,55,22.4,128.4,215.8,61.1,72.6,Surabaya,Diabetes,Rainy Season\n",
"107,qtiwC54fTZz0,2023-04-19,62,26.6,149.8,240.2,128.4,63.8,Medan,Healthy,Transitional Season\n",
"108,mihMjctTxWp4,2022-12-13,79,26.3,144.3,195.4,85.6,74.7,Bandung,Hypertension,Rainy Season\n",
"109,t6qBvNsk3LEi,2022-04-30,52,25.3,122.8,163.7,51.6,92.2,Surabaya,Obesity,Transitional Season\n",
"110,PDUwyUeNrqOI,2020-10-14,45,22.6,117.9,189.8,97.4,93.4,Makassar,Cardiovascular Disease,Transitional Season\n",
"111,ZfQ2RAx1cIdd,2021-12-20,62,27.2,108.8,159.1,110.7,65.1,Makassar,Hypertension,Rainy Season\n",
"112,TNHEQp6CUlDi,2024-11-16,33,20.6,134.5,226.1,88.5,81.2,Surabaya,Obesity,Rainy Season\n",
"113,sBvZ7A6pJpTL,2020-02-29,68,23.6,106.8,190.4,92.3,90.1,Medan,Diabetes,Rainy Season\n",
"114,YzmZ9hbeGnG3,2023-08-01,33,23.6,118.5,195.0,111.4,80.7,Surabaya,Obesity,Dry Season\n",
"115,hdY4liCRXvQq,2022-04-28,62,33.9,135.3,187.0,89.7,65.4,Bandung,Hypertension,Transitional Season\n",
"116,qqUdDDlchznV,2024-07-01,62,19.3,118.2,244.2,76.1,82.8,Makassar,Obesity,Dry Season\n",
"117,ZlCR4urRpHVw,2022-08-26,25,30.0,126.8,150.2,90.8,62.0,Surabaya,Cardiovascular Disease,Dry Season\n",
"118,0hSvnQJdcv9A,2021-01-11,74,29.6,146.1,193.0,116.2,78.3,Makassar,Healthy,Rainy Season\n",
"119,hZ50IYt9PIv3,2021-09-29,78,24.8,150.9,138.9,94.8,85.6,Makassar,Cardiovascular Disease,Dry Season\n",
"120,ddtpTdkXh4L5,2021-03-02,72,27.3,140.7,198.9,101.6,71.4,Makassar,Hypertension,Transitional Season\n",
"121,wdozCMsyJJ0P,2021-03-17,62,23.2,146.9,199.6,139.3,68.9,Surabaya,Healthy,Transitional Season\n",
"122,PYApLzGm0JJG,2023-02-09,55,23.2,107.2,214.6,83.7,64.0,Surabaya,Diabetes,Rainy Season\n",
"123,pi2AvHwVA5zt,2021-08-06,55,25.8,116.5,200.7,74.8,77.4,Makassar,Hypertension,Dry Season\n",
"124,8aY7LDtN6OJh,2023-08-01,25,25.2,154.5,226.2,114.7,88.0,Makassar,Healthy,Dry Season\n",
"125,4inWi8RcDHAH,2022-08-08,75,24.4,132.9,199.5,64.3,72.1,Bandung,Healthy,Dry Season\n",
"126,X95vyI8xT0Hb,2024-03-16,48,30.7,133.6,204.0,122.3,90.2,Makassar,Cardiovascular Disease,Transitional Season\n",
"127,OnTihZC0AZmn,2023-03-10,54,27.9,138.8,255.0,78.4,71.5,Makassar,Hypertension,Transitional Season\n",
"128,yqlEfZZBWQ1L,2024-10-07,40,28.9,130.9,233.7,90.3,76.8,Medan,Cardiovascular Disease,Transitional Season\n",
"129,iIJgckvcRLie,2022-08-26,58,24.2,145.2,258.9,132.6,72.0,Surabaya,Hypertension,Dry Season\n",
"130,EJvxvSmGmBcW,2020-11-06,27,21.9,109.8,228.7,79.1,80.7,Surabaya,Healthy,Rainy Season\n",
"131,edgiQWIBNahl,2022-06-30,76,30.8,114.3,179.9,85.7,81.5,Bandung,Cardiovascular Disease,Dry Season\n",
"132,Lh1fleaBFmPI,2021-04-18,77,26.6,112.7,203.3,55.5,75.7,Medan,Diabetes,Transitional Season\n",
"133,ezw6rGGizStF,2022-03-21,75,22.2,111.9,132.6,62.0,66.9,Jakarta,Healthy,Transitional Season\n",
"134,gqesJxhGk0Ro,2024-04-26,28,25.1,117.0,208.3,69.4,87.8,Jakarta,Diabetes,Transitional Season\n",
"135,ggCqsgSA2IKf,2020-02-11,44,22.5,144.7,194.9,101.6,65.3,Medan,Diabetes,Rainy Season\n",
"136,btjo7FWMSuJC,2022-01-29,40,25.6,106.8,240.7,63.9,76.8,Bandung,Diabetes,Rainy Season\n",
"137,A15eaI8YDCmT,2023-06-28,50,29.3,126.7,221.5,91.7,57.5,Medan,Healthy,Dry Season\n",
"138,I5cFqBarWwG7,2024-01-04,39,23.1,147.5,244.8,94.0,83.3,Jakarta,Diabetes,Rainy Season\n",
"139,DNB7dv1bfU7m,2024-08-19,32,25.5,93.2,292.8,85.1,68.5,Jakarta,Hypertension,Dry Season\n",
"140,ZFtVhROHc63M,2020-01-25,21,26.9,133.5,166.3,105.6,76.4,Surabaya,Healthy,Rainy Season\n",
"141,HvOt0txvc6qE,2021-12-03,77,22.8,104.9,195.8,97.4,86.6,Makassar,Diabetes,Rainy Season\n",
"142,UKggyAVZc1hO,2024-04-16,56,18.2,141.5,197.7,89.2,93.7,Surabaya,Cardiovascular Disease,Transitional Season\n",
"143,e7jBP9Nj7AMk,2023-06-15,60,24.2,132.0,176.7,93.0,71.6,Bandung,Diabetes,Dry Season\n",
"144,KQHfhGcO8LKP,2021-06-09,27,21.6,125.6,269.9,64.9,81.8,Jakarta,Obesity,Dry Season\n",
"145,bUdrXCGGx6fd,2021-09-14,25,28.7,118.4,162.4,123.2,86.8,Medan,Healthy,Dry Season\n",
"146,GbtHKKxIVI9z,2020-09-18,48,24.7,92.5,192.0,96.0,62.1,Surabaya,Obesity,Dry Season\n",
"147,f0tZDm9Dbb2L,2024-03-07,45,24.4,106.7,213.8,90.6,83.5,Surabaya,Healthy,Transitional Season\n",
"148,NuHckq2L3hOY,2022-07-31,52,18.7,107.4,207.0,83.4,57.7,Bandung,Cardiovascular Disease,Dry Season\n",
"149,BreqO9Hove1D,2020-10-25,62,20.8,138.9,196.2,77.1,75.2,Medan,Cardiovascular Disease,Transitional Season\n",
"150,zZaHEo8y8ll6,2024-01-09,37,26.9,116.6,190.4,112.8,69.0,Surabaya,Diabetes,Rainy Season\n",
"151,q8FyriaIGguB,2021-12-20,72,24.2,117.4,200.8,93.2,80.1,Bandung,Cardiovascular Disease,Rainy Season\n",
"152,m769nsubP0HS,2021-01-03,73,22.2,109.3,177.7,66.8,62.8,Medan,Obesity,Rainy Season\n",
"153,mlwGKHf4xqPl,2021-05-26,65,19.1,119.7,180.5,107.2,85.5,Medan,Obesity,Transitional Season\n",
"154,MUFm8J5mi4Nr,2022-01-27,25,33.5,91.7,236.0,105.6,102.8,Bandung,Diabetes,Rainy Season\n",
"155,48hSGV2mFh8s,2023-10-24,49,29.5,134.0,259.5,86.4,71.3,Surabaya,Cardiovascular Disease,Transitional Season\n",
"156,TjlKqPgUDfgI,2022-01-23,20,21.5,130.9,161.3,75.7,87.8,Surabaya,Healthy,Rainy Season\n",
"157,8LlCQ3LwjjdO,2022-03-09,58,22.6,140.9,197.1,70.7,47.3,Jakarta,Diabetes,Transitional Season\n",
"158,1i1ty7Zou6vh,2022-05-11,65,25.8,117.0,228.8,99.4,81.6,Surabaya,Healthy,Transitional Season\n",
"159,Tc5U450NwgTy,2023-06-01,28,24.7,132.5,212.6,94.1,68.4,Jakarta,Healthy,Dry Season\n",
"160,o7P2t8cMcGO4,2022-02-12,30,29.9,101.9,260.2,69.6,82.8,Medan,Hypertension,Rainy Season\n",
"161,S1px0UCD8GLy,2023-05-22,56,25.2,131.1,176.8,92.4,88.2,Medan,Diabetes,Transitional Season\n",
"162,9tkEtvptrfnb,2020-10-20,35,25.3,124.8,182.6,100.0,61.5,Surabaya,Cardiovascular Disease,Transitional Season\n",
"163,o0AU26zIGix2,2021-08-29,56,13.2,136.7,219.9,93.0,71.7,Bandung,Healthy,Dry Season\n",
"164,3JRAR6ZIAtNm,2024-03-21,38,24.5,135.3,185.3,118.4,85.1,Surabaya,Cardiovascular Disease,Transitional Season\n",
"165,n5TiH9JczWFc,2022-02-02,22,18.4,103.3,280.1,127.8,84.0,Bandung,Diabetes,Rainy Season\n",
"166,jVth8YMleatt,2022-02-21,48,23.2,127.1,179.7,106.0,66.9,Jakarta,Hypertension,Rainy Season\n",
"167,sTTCdLZRb4Q3,2021-06-29,73,21.8,148.7,214.3,52.8,75.6,Medan,Cardiovascular Disease,Dry Season\n",
"168,1COjcyePU6eG,2021-03-01,37,26.8,111.8,237.9,113.1,52.2,Medan,Hypertension,Transitional Season\n",
"169,OO2tQNRF0MoV,2021-04-15,35,24.0,131.3,158.4,74.1,72.3,Medan,Healthy,Transitional Season\n",
"170,8zjwaiAwa6t6,2020-07-06,56,23.5,122.1,244.5,57.2,75.2,Makassar,Diabetes,Dry Season\n",
"171,ZsNdVtNX32YD,2020-09-10,66,22.8,118.7,143.7,96.9,93.1,Surabaya,Diabetes,Dry Season\n",
"172,Bx331NLajO3h,2020-09-24,75,24.6,130.8,163.5,94.1,81.2,Surabaya,Hypertension,Dry Season\n",
"173,Evff6VvMC5JW,2021-10-01,20,18.6,134.7,286.0,89.7,74.1,Bandung,Hypertension,Transitional Season\n",
"174,sdhWnQ74rWUz,2023-12-04,65,29.0,112.6,242.8,78.3,85.2,Bandung,Cardiovascular Disease,Rainy Season\n",
"175,giV5Qdyl5tv9,2020-02-10,25,24.6,123.6,164.6,104.4,78.0,Bandung,Hypertension,Rainy Season\n",
"176,MMEpkIL2oVAb,2023-05-01,36,28.0,131.3,120.1,96.7,83.1,Makassar,Healthy,Transitional Season\n",
"177,89tAlWQZgYW6,2024-05-20,67,33.6,101.3,96.2,69.3,69.9,Jakarta,Healthy,Transitional Season\n",
"178,W9JQbgSptK2u,2022-04-03,27,21.4,141.9,278.1,62.8,83.7,Makassar,Hypertension,Transitional Season\n",
"179,kPHjIQTxuw4W,2021-10-25,32,29.5,106.2,253.2,61.3,81.7,Jakarta,Healthy,Transitional Season\n",
"180,BwP4ZzZ0yraI,2023-01-26,78,30.5,149.1,168.3,88.4,84.3,Makassar,Hypertension,Rainy Season\n",
"181,TqZkBJlvEnA4,2021-06-20,29,28.5,121.3,145.7,102.3,73.8,Medan,Obesity,Dry Season\n",
"182,fliL3quI1aqO,2020-12-13,52,26.8,111.3,280.1,112.7,85.4,Surabaya,Obesity,Rainy Season\n",
"183,W6sZiEvxUln5,2020-04-16,69,31.9,141.9,235.5,112.6,63.4,Medan,Obesity,Transitional Season\n",
"184,85SSvOlJ4EGe,2020-11-12,65,23.2,121.6,172.4,93.9,64.5,Bandung,Hypertension,Rainy Season\n",
"185,YjB4DQxd8yal,2022-11-03,49,22.2,145.5,193.8,87.0,70.5,Makassar,Hypertension,Rainy Season\n",
"186,nFaZQikTJkwZ,2020-06-10,78,25.9,113.8,261.8,100.0,87.3,Surabaya,Hypertension,Dry Season\n",
"187,Hg790cbc9OWf,2021-09-27,71,26.1,112.2,239.2,94.5,84.4,Surabaya,Hypertension,Dry Season\n",
"188,3kpyPIrrhJpr,2021-12-06,59,29.7,116.7,154.9,88.1,84.8,Makassar,Healthy,Rainy Season\n",
"189,ibaov3Kv6sy3,2024-06-18,51,29.4,99.1,212.3,95.6,70.1,Surabaya,Obesity,Dry Season\n",
"190,OrpyNMbPhA1N,2024-04-16,57,25.7,126.6,241.6,80.8,86.6,Medan,Healthy,Transitional Season\n",
"191,dGA9ytkotAOZ,2022-06-24,18,23.9,117.3,231.8,151.8,75.0,Bandung,Diabetes,Dry Season\n",
"192,XrdueCAoPK6x,2024-09-16,30,31.4,94.8,235.0,37.5,67.2,Surabaya,Obesity,Dry Season\n",
"193,v9Pul8RCmRda,2022-03-06,24,17.0,128.0,176.9,120.1,66.8,Makassar,Hypertension,Transitional Season\n",
"194,qwo7g0L5KWif,2021-09-21,28,22.1,122.1,223.4,140.5,76.1,Jakarta,Cardiovascular Disease,Dry Season\n",
"195,D5sjjYgeWepL,2022-11-26,28,26.7,122.7,278.4,90.5,94.3,Surabaya,Cardiovascular Disease,Rainy Season\n",
"196,cfRjN2SroAao,2023-03-07,30,20.2,118.3,133.6,109.6,81.4,Surabaya,Healthy,Transitional Season\n",
"197,JGPA0xbjpt7O,2023-10-24,80,26.3,116.5,149.9,82.5,72.9,Medan,Healthy,Transitional Season\n",
"198,XBJ9DjXeElTj,2022-10-05,54,22.1,109.0,259.7,112.9,71.1,Makassar,Hypertension,Transitional Season\n",
"199,uH2CFoKot3oA,2022-02-11,51,23.0,132.2,275.5,111.0,65.8,Medan,Diabetes,Rainy Season\n",
"200,Tm6mbxXNgsVq,2024-04-19,32,22.0,131.5,259.7,69.2,54.4,Jakarta,Healthy,Transitional Season\n",
"201,4eNCvu2eYXwU,2020-03-18,54,28.9,135.9,242.2,63.5,77.8,Bandung,Hypertension,Transitional Season\n",
"202,KTglJ1eD5yGJ,2021-06-09,31,24.6,104.2,214.4,99.1,58.2,Medan,Hypertension,Dry Season\n",
"203,rCzmKce3tePL,2024-08-18,54,23.2,105.1,200.2,68.7,90.2,Jakarta,Healthy,Dry Season\n",
"204,YDmj0xjDB1X4,2023-01-25,43,28.0,132.1,249.2,118.0,73.7,Medan,Hypertension,Rainy Season\n",
"205,NlTAoycVI7Xd,2024-10-30,66,34.8,135.0,142.1,100.5,75.9,Makassar,Obesity,Transitional Season\n",
"206,o1hN4R0eehMV,2021-04-27,45,25.2,113.1,267.2,101.5,74.3,Bandung,Obesity,Transitional Season\n",
"207,eIcpJBeciKpS,2020-06-30,35,25.6,106.6,193.6,96.4,85.2,Jakarta,Healthy,Dry Season\n",
"208,X7B7Qk6uYv4X,2024-02-06,76,31.7,124.7,257.8,98.0,80.4,Bandung,Cardiovascular Disease,Rainy Season\n",
"209,ys1sR7y98qCX,2023-02-24,71,25.8,101.5,187.3,114.7,73.5,Makassar,Healthy,Rainy Season\n",
"210,1I3F923KeMvE,2024-12-27,34,21.5,127.5,275.5,92.1,89.1,Jakarta,Diabetes,Rainy Season\n",
"211,CL8YsG4sdAmJ,2024-02-25,67,24.5,145.3,198.6,98.7,85.1,Bandung,Healthy,Rainy Season\n",
"212,yUgLC2uH3Uhg,2024-04-27,56,26.6,89.2,182.9,93.0,76.5,Medan,Diabetes,Transitional Season\n",
"213,XG04TMeDVqw5,2024-09-11,62,27.4,110.6,227.4,109.7,64.4,Makassar,Hypertension,Dry Season\n",
"214,KJ5sOtZt0ULl,2024-03-09,51,20.8,130.7,177.4,79.3,87.6,Makassar,Obesity,Transitional Season\n",
"215,jeSbihwq2Dvg,2022-09-18,29,31.8,142.5,206.7,61.1,82.4,Bandung,Cardiovascular Disease,Dry Season\n",
"216,l3uHkcX1ybPM,2020-12-02,47,30.6,116.7,263.4,85.0,71.3,Makassar,Cardiovascular Disease,Rainy Season\n",
"217,11Wb15uk4Js8,2021-06-22,52,27.1,121.2,133.0,50.8,77.3,Bandung,Hypertension,Dry Season\n",
"218,EXMivtga5EQ5,2024-04-14,37,22.7,125.0,165.9,69.1,61.0,Medan,Obesity,Transitional Season\n",
"219,MzBRj8zn8iPJ,2023-04-18,43,24.2,139.0,206.7,89.7,83.5,Bandung,Healthy,Transitional Season\n",
"220,zka7MeUL2lRY,2022-05-17,42,14.6,133.1,275.2,69.3,56.6,Surabaya,Cardiovascular Disease,Transitional Season\n",
"221,8f6UaPE4hjfu,2020-07-30,25,20.2,115.4,233.6,106.0,82.5,Surabaya,Healthy,Dry Season\n",
"222,dyYthz8z6YDg,2024-05-29,37,19.9,114.0,172.4,89.0,73.8,Bandung,Obesity,Transitional Season\n",
"223,syTaQ3PZXNhP,2020-07-19,45,22.3,124.1,212.0,70.7,78.5,Medan,Diabetes,Dry Season\n",
"224,VppKtirqZivF,2023-11-25,72,31.7,107.7,215.8,107.8,66.8,Medan,Obesity,Rainy Season\n",
"225,YiaqYMlOGaDU,2020-03-02,63,24.0,91.2,195.9,82.5,79.2,Makassar,Diabetes,Transitional Season\n",
"226,w65atN3exITS,2023-07-16,50,26.8,112.2,203.2,102.1,84.1,Makassar,Diabetes,Dry Season\n",
"227,zZdWG7ohbOlQ,2023-11-06,73,22.9,124.0,159.2,92.5,71.2,Makassar,Diabetes,Rainy Season\n",
"228,bsuhpwLcaow2,2023-02-21,63,27.5,122.3,214.8,87.6,70.2,Makassar,Obesity,Rainy Season\n",
"229,ZftJrUaMnuI4,2022-12-22,38,28.6,118.4,160.3,69.8,75.2,Makassar,Obesity,Rainy Season\n",
"230,TzyAb6tRKh0k,2021-07-14,73,29.9,96.9,239.8,37.2,68.5,Bandung,Hypertension,Dry Season\n",
"231,BgPU6enpUfAs,2021-07-26,67,27.1,121.8,152.2,50.3,70.3,Surabaya,Hypertension,Dry Season\n",
"232,rzNf7lSvYdzt,2024-07-09,73,33.0,124.0,240.6,59.2,94.9,Medan,Cardiovascular Disease,Dry Season\n",
"233,brMe9OZjMInU,2024-11-27,51,21.2,150.4,252.0,75.0,71.7,Medan,Healthy,Rainy Season\n",
"234,bGvsaQ5zBnUp,2021-06-23,79,27.5,83.8,247.7,112.7,62.5,Surabaya,Hypertension,Dry Season\n",
"235,jczAkji16bd5,2020-01-02,22,27.9,104.7,307.0,77.8,52.4,Surabaya,Hypertension,Rainy Season\n",
"236,5THkntphLXHu,2021-11-01,59,22.7,125.0,123.4,52.2,68.3,Surabaya,Diabetes,Rainy Season\n",
"237,OlyN0TRPr5Qq,2024-07-29,29,28.9,101.3,169.2,71.2,68.2,Jakarta,Obesity,Dry Season\n",
"238,7IkZom6HN9k3,2023-02-24,20,26.7,147.2,126.1,97.2,71.1,Bandung,Hypertension,Rainy Season\n",
"239,D8TiPgQkQJtd,2023-05-16,75,26.8,138.9,201.3,96.0,63.4,Makassar,Healthy,Transitional Season\n",
"240,rXgFPY27AIrc,2022-02-09,35,29.8,126.3,149.7,71.1,79.7,Medan,Hypertension,Rainy Season\n",
"241,BjNTrY5fx0Sg,2022-01-10,67,24.7,109.7,212.1,78.7,51.2,Bandung,Hypertension,Rainy Season\n",
"242,0S9qOEuCr9le,2020-09-25,67,27.6,122.7,198.9,92.2,83.5,Makassar,Cardiovascular Disease,Dry Season\n",
"243,eMSGIN8FqBX4,2022-11-01,64,24.1,121.3,113.5,78.6,76.3,Makassar,Obesity,Rainy Season\n",
"244,pNzsfbRgq1K0,2021-12-05,18,25.6,130.8,218.0,91.3,88.2,Bandung,Hypertension,Rainy Season\n",
"245,fHy2GkAq8otR,2021-10-31,25,36.6,112.3,252.2,107.8,81.3,Jakarta,Obesity,Transitional Season\n",
"246,i0tslbaGaeXu,2024-05-05,68,21.5,117.8,174.8,96.2,91.3,Jakarta,Hypertension,Transitional Season\n",
"247,FsYmNZlL8t70,2021-03-19,57,27.6,125.9,140.3,99.0,76.3,Jakarta,Cardiovascular Disease,Transitional Season\n",
"248,w8QKuYqoDTAA,2021-11-07,47,36.4,128.8,247.9,104.0,78.9,Makassar,Hypertension,Rainy Season\n",
"249,G01bcnIc4sCA,2024-05-01,37,24.1,115.7,209.0,70.3,60.8,Makassar,Healthy,Transitional Season\n",
"250,t6sbs2Ftw0Bj,2023-05-06,68,31.9,112.6,259.9,80.9,73.0,Surabaya,Diabetes,Transitional Season\n",
"251,knQ7vckx31vC,2022-06-30,51,20.3,133.9,175.3,96.2,59.8,Medan,Diabetes,Dry Season\n",
"252,MuFNMK6tJi6X,2022-10-30,38,26.8,118.9,168.6,55.9,85.7,Bandung,Obesity,Transitional Season\n",
"253,vVTr096aNW0N,2022-08-24,69,20.8,104.2,244.5,68.3,75.0,Bandung,Hypertension,Dry Season\n",
"254,1B2NpMbDPt4m,2023-07-02,80,19.1,99.1,260.8,86.7,52.8,Medan,Hypertension,Dry Season\n",
"255,JX7kXghdQtcI,2023-05-09,36,21.6,133.6,150.9,70.0,79.3,Surabaya,Cardiovascular Disease,Transitional Season\n",
"256,rqVi90y4aLJu,2024-11-03,78,20.8,124.8,224.9,115.9,77.9,Bandung,Cardiovascular Disease,Rainy Season\n",
"257,RpbkRHFWkUSQ,2024-03-18,75,23.2,125.1,221.4,118.1,81.7,Medan,Hypertension,Transitional Season\n",
"258,G81NH8epjI6Y,2022-04-29,60,21.5,138.3,211.9,79.6,97.9,Surabaya,Diabetes,Transitional Season\n",
"259,vp8kG7Z0fOUl,2020-06-29,64,25.0,140.1,208.0,98.4,78.0,Makassar,Healthy,Dry Season\n",
"260,HR2X9bbLdIbF,2022-01-20,35,27.7,130.5,177.6,70.9,87.5,Surabaya,Healthy,Rainy Season\n",
"261,TWUrXmzMVh9F,2023-07-31,57,21.3,92.8,180.2,80.7,73.3,Bandung,Cardiovascular Disease,Dry Season\n",
"262,rcv5nDSj3Xmg,2024-11-22,41,25.4,113.8,145.7,81.5,71.0,Bandung,Obesity,Rainy Season\n",
"263,so709AFzB0UJ,2021-04-07,59,16.9,93.3,325.0,105.5,81.2,Bandung,Hypertension,Transitional Season\n",
"264,1TY8MRud8Zyq,2021-08-27,46,25.6,137.2,252.1,91.4,84.2,Bandung,Healthy,Dry Season\n",
"265,QyXOCV2H3fXz,2023-08-29,51,26.7,109.7,210.8,94.5,67.1,Jakarta,Obesity,Dry Season\n",
"266,UWtDSatCxFbt,2020-03-24,39,20.7,84.0,243.4,72.4,87.6,Makassar,Cardiovascular Disease,Transitional Season\n",
"267,ppLz34QlQF76,2022-09-11,21,27.2,116.6,179.0,92.6,66.4,Surabaya,Cardiovascular Disease,Dry Season\n",
"268,SXvVKhxO9k5H,2021-04-08,40,34.4,112.9,222.6,95.6,70.7,Surabaya,Obesity,Transitional Season\n",
"269,4PdDOzYypDuh,2022-09-10,34,25.5,133.0,192.6,107.6,77.0,Jakarta,Diabetes,Dry Season\n",
"270,JTwkRWM95O96,2020-08-12,61,23.9,113.8,215.3,79.8,77.0,Medan,Obesity,Dry Season\n",
"271,S3aYAlretn1e,2024-05-15,72,35.8,149.6,208.3,92.9,77.4,Jakarta,Diabetes,Transitional Season\n",
"272,Etl1pbZzSiW5,2023-01-30,37,28.1,126.9,177.2,96.6,70.8,Medan,Cardiovascular Disease,Rainy Season\n",
"273,f2ZszYEpFRk2,2020-06-19,68,26.7,103.0,232.5,69.2,67.9,Bandung,Healthy,Dry Season\n",
"274,ryXIWrR7t4L5,2021-05-03,56,31.4,93.2,235.4,63.1,73.3,Bandung,Healthy,Transitional Season\n",
"275,NmsPsoN3M6wZ,2024-08-12,26,31.6,114.1,174.1,104.6,71.5,Bandung,Diabetes,Dry Season\n",
"276,uqsmEs5opnYI,2022-07-10,80,19.6,108.5,180.1,107.4,77.8,Makassar,Healthy,Dry Season\n",
"277,tV6JyHNVRTbp,2021-11-24,79,27.8,114.3,218.8,96.6,75.7,Surabaya,Healthy,Rainy Season\n",
"278,FHZ2GJarBTDq,2024-05-09,36,23.0,125.5,153.8,108.1,82.5,Jakarta,Obesity,Transitional Season\n",
"279,mbzeOkFepMOB,2024-05-08,38,23.2,110.6,206.0,109.0,69.4,Jakarta,Cardiovascular Disease,Transitional Season\n",
"280,kW32hwyOant0,2022-10-19,60,21.9,97.7,177.1,114.5,72.6,Surabaya,Diabetes,Transitional Season\n",
"281,XMpxLfKCfXgh,2024-09-04,60,15.4,144.3,164.8,91.7,89.6,Surabaya,Diabetes,Dry Season\n",
"282,tlp9E6NbQXj3,2023-12-15,80,28.2,146.1,168.7,106.0,78.3,Jakarta,Hypertension,Rainy Season\n",
"283,4LqdVXYGleTi,2024-04-27,61,21.5,156.0,138.0,74.4,63.3,Bandung,Hypertension,Transitional Season\n",
"284,EXEkHEbV4r6d,2023-05-15,34,30.2,131.7,173.5,73.7,81.0,Makassar,Obesity,Transitional Season\n",
"285,oLDPKPzqL4Ls,2023-01-13,51,27.4,115.3,189.3,126.5,75.7,Surabaya,Healthy,Rainy Season\n",
"286,hrnNmjzMyntA,2020-06-01,69,17.0,120.2,233.2,91.1,63.9,Bandung,Obesity,Dry Season\n",
"287,6zOR135Ja5gt,2022-08-18,24,21.3,103.3,158.5,67.8,65.4,Medan,Healthy,Dry Season\n",
"288,90LIwFV8shgc,2020-08-27,66,30.5,133.7,261.5,87.2,77.2,Makassar,Hypertension,Dry Season\n",
"289,NBD5gJGNSycZ,2020-02-18,21,22.9,94.3,224.4,79.4,78.4,Makassar,Cardiovascular Disease,Rainy Season\n",
"290,TmAe8HcWj2cP,2023-03-08,62,16.2,100.4,170.0,52.3,74.2,Makassar,Healthy,Transitional Season\n",
"291,hp80OKlJvWBA,2020-06-04,68,37.8,142.6,294.2,88.3,74.4,Jakarta,Diabetes,Dry Season\n",
"292,39Xtdtoxv0Tu,2020-09-30,18,27.3,167.0,143.6,103.2,71.2,Surabaya,Diabetes,Dry Season\n",
"293,5P3P3YUBeXG2,2024-05-09,45,23.9,121.7,249.2,108.4,86.3,Surabaya,Healthy,Transitional Season\n",
"294,uCixPSkOfcMT,2023-06-03,34,34.3,147.9,148.3,91.5,84.1,Surabaya,Cardiovascular Disease,Dry Season\n",
"295,gPtTntjmISZs,2022-01-31,53,33.0,104.9,225.6,127.3,57.2,Makassar,Healthy,Rainy Season\n",
"296,9cdTKFClBjWn,2020-06-18,74,26.9,134.5,204.4,96.8,61.5,Makassar,Healthy,Dry Season\n",
"297,P5BhW3xBxZEJ,2020-10-18,51,18.6,115.6,176.4,41.3,78.6,Bandung,Hypertension,Transitional Season\n",
"298,FaWDM0xhgqiy,2020-05-27,44,23.4,96.4,163.8,120.2,64.7,Surabaya,Obesity,Transitional Season\n",
"299,izNYn65UUjuE,2021-12-19,46,26.3,115.5,210.1,73.5,63.6,Medan,Cardiovascular Disease,Rainy Season\n",
"300,2k998FltUwOr,2020-07-07,43,21.3,116.9,200.5,82.3,77.4,Makassar,Obesity,Dry Season\n",
"301,l52ppBJkeCqI,2023-02-10,74,21.2,103.0,222.6,91.0,74.3,Surabaya,Hypertension,Rainy Season\n",
"302,FZxSQAnS3gBn,2022-11-16,38,21.9,123.7,223.4,75.0,76.5,Medan,Hypertension,Rainy Season\n",
"303,SypqX84gicae,2021-08-28,57,14.3,121.5,255.8,77.3,66.8,Makassar,Hypertension,Dry Season\n",
"304,qmSpzk8kj9w0,2021-10-06,52,31.1,142.8,261.8,97.9,66.9,Jakarta,Cardiovascular Disease,Transitional Season\n",
"305,KmjldLLSxFOg,2023-01-14,75,28.7,136.9,157.4,43.9,76.1,Medan,Obesity,Rainy Season\n",
"306,D81KoV6jicfm,2024-07-07,30,20.3,127.8,263.2,92.6,75.2,Jakarta,Cardiovascular Disease,Dry Season\n",
"307,0yYHXOmDBCtz,2020-09-06,51,24.2,130.8,196.0,99.5,74.7,Medan,Hypertension,Dry Season\n",
"308,qSz2vLChWSIx,2020-06-17,72,20.8,102.2,218.7,92.4,66.5,Medan,Hypertension,Dry Season\n",
"309,4R1xUr7hdZnf,2022-06-25,53,27.6,151.8,212.3,106.3,77.9,Bandung,Hypertension,Dry Season\n",
"310,tsOhrAsGeQIj,2022-01-01,58,23.7,99.6,204.0,105.8,75.7,Jakarta,Hypertension,Rainy Season\n",
"311,pQ1PuDtRS0bd,2021-07-03,60,30.1,117.7,180.8,104.4,88.0,Makassar,Obesity,Dry Season\n",
"312,2dFvKjyu49AQ,2023-10-17,45,29.0,96.4,184.2,122.0,70.0,Makassar,Diabetes,Transitional Season\n",
"313,bXbBP6WOmWNz,2021-04-01,27,17.3,112.7,261.1,79.8,84.3,Jakarta,Hypertension,Transitional Season\n",
"314,rBmtxdOt9Sa8,2022-07-07,49,22.6,152.7,305.3,95.1,65.6,Makassar,Cardiovascular Disease,Dry Season\n",
"315,oyunyE1ULdub,2020-04-24,63,19.8,127.6,241.7,86.4,77.7,Makassar,Cardiovascular Disease,Transitional Season\n",
"316,ATMA6zCZEiCr,2024-12-07,44,24.6,114.8,239.2,103.6,64.5,Medan,Hypertension,Rainy Season\n",
"317,i91DUJv05egd,2022-01-01,50,26.2,131.4,195.4,80.0,84.5,Surabaya,Cardiovascular Disease,Rainy Season\n",
"318,7zNOZlpPaMEm,2023-05-21,18,19.4,110.4,153.3,77.2,79.6,Surabaya,Hypertension,Transitional Season\n",
"319,mN3cl8AG7VXb,2023-08-08,31,28.3,129.0,203.7,97.4,72.0,Medan,Hypertension,Dry Season\n",
"320,1JFDCW8zZLXG,2023-12-20,80,28.4,117.6,204.1,86.5,73.6,Surabaya,Diabetes,Rainy Season\n",
"321,hDxgRoy12D2V,2023-05-22,69,29.4,101.4,205.9,75.4,78.4,Surabaya,Diabetes,Transitional Season\n",
"322,9YOYS3wRK0ql,2022-09-11,75,19.8,142.2,199.5,90.0,80.3,Jakarta,Healthy,Dry Season\n",
"323,Zhogrl2SjSKu,2021-03-13,70,22.8,118.2,256.1,100.4,87.4,Bandung,Obesity,Transitional Season\n",
"324,J38fM05rC4VV,2021-09-16,49,28.6,116.8,263.7,86.1,70.9,Jakarta,Hypertension,Dry Season\n",
"325,Ogla04npfK5V,2021-06-25,45,26.7,101.1,240.1,75.7,68.8,Bandung,Healthy,Dry Season\n",
"326,JaMxT08Z7exz,2021-11-29,60,22.0,143.2,170.6,99.6,103.8,Bandung,Diabetes,Rainy Season\n",
"327,ATC7Vuq4Atb1,2024-02-01,80,24.9,104.1,205.2,67.0,65.9,Jakarta,Cardiovascular Disease,Rainy Season\n",
"328,s4YdP4dwX79q,2022-07-05,49,28.6,126.3,259.0,87.3,78.1,Bandung,Diabetes,Dry Season\n",
"329,cSBayqB5wkf9,2022-08-23,58,19.4,134.7,226.0,85.9,85.6,Surabaya,Hypertension,Dry Season\n",
"330,iAVY5YUDP4zM,2024-05-01,47,26.0,144.4,172.4,127.9,48.9,Surabaya,Healthy,Transitional Season\n",
"331,eLSzWfyr3rSn,2023-02-02,68,18.3,136.4,187.9,46.1,87.6,Jakarta,Obesity,Rainy Season\n",
"332,YebduFEo1lI1,2023-05-10,54,19.6,137.7,159.6,95.4,83.4,Surabaya,Hypertension,Transitional Season\n",
"333,DC12ZrYSAiz3,2020-03-28,70,29.1,125.3,235.3,95.1,70.3,Bandung,Hypertension,Transitional Season\n",
"334,vYGyu87hYpz2,2023-10-25,51,19.4,116.6,178.3,91.0,70.4,Makassar,Hypertension,Transitional Season\n",
"335,phLg0PlvCaxP,2021-01-25,58,18.1,103.8,240.4,117.5,76.3,Bandung,Obesity,Rainy Season\n",
"336,uSd8vj3KzXwg,2021-02-01,52,21.5,122.5,255.1,80.2,97.5,Surabaya,Diabetes,Rainy Season\n",
"337,tVg4CmJILDuL,2020-09-24,41,19.0,128.3,237.7,89.5,84.5,Medan,Obesity,Dry Season\n",
"338,1CNhii2SIOwd,2024-09-12,61,25.5,104.5,183.9,107.8,67.8,Medan,Cardiovascular Disease,Dry Season\n",
"339,Z7rIT7oOierc,2022-03-29,51,22.4,132.3,156.7,83.7,69.4,Makassar,Hypertension,Transitional Season\n",
"340,VP9k56mvK1Cd,2024-03-25,76,27.4,112.8,168.6,89.9,69.6,Makassar,Healthy,Transitional Season\n",
"341,kBez782o4SvY,2023-02-18,41,25.9,125.7,204.2,87.5,80.5,Medan,Healthy,Rainy Season\n",
"342,fbOph1Sl4io3,2024-06-12,78,21.7,120.1,186.2,102.5,97.0,Medan,Cardiovascular Disease,Dry Season\n",
"343,p35KviRKDDz3,2020-02-06,45,23.0,114.7,245.0,92.2,73.7,Medan,Diabetes,Rainy Season\n",
"344,Q52JXMjir3e9,2021-12-06,65,25.2,93.9,200.6,75.0,78.6,Bandung,Obesity,Rainy Season\n",
"345,SkHnjJFdoYL7,2020-06-11,35,22.0,120.7,113.7,72.3,77.6,Jakarta,Cardiovascular Disease,Dry Season\n",
"346,1GnuOBVAw2Qv,2023-12-12,47,21.5,103.2,235.3,88.3,96.1,Makassar,Obesity,Rainy Season\n",
"347,aPYKwTNPDJEL,2024-12-23,39,30.9,108.3,275.7,53.9,61.0,Bandung,Diabetes,Rainy Season\n",
"348,1f1KucTiWIa9,2020-02-15,43,23.6,132.7,179.8,94.9,71.3,Makassar,Obesity,Rainy Season\n",
"349,s0OCQ0pprHps,2023-08-03,66,29.5,146.5,214.7,66.0,81.7,Bandung,Diabetes,Dry Season\n",
"350,7zZHKDKcbZwV,2022-05-08,37,25.7,132.7,189.9,96.8,66.2,Surabaya,Hypertension,Transitional Season\n",
"351,2pCnUULLnaQw,2024-01-27,48,27.1,111.8,154.2,133.7,65.7,Jakarta,Hypertension,Rainy Season\n",
"352,J0vZpE9K2Da6,2022-05-17,57,21.1,123.8,180.0,137.0,108.3,Surabaya,Hypertension,Transitional Season\n",
"353,zCJx578K54Mq,2021-01-10,47,25.0,124.5,193.8,94.0,77.0,Makassar,Diabetes,Rainy Season\n",
"354,prcgqqs3ardv,2022-03-31,25,24.5,124.8,184.5,66.6,66.6,Jakarta,Healthy,Transitional Season\n",
"355,AdkqXDDmM2aT,2020-10-06,68,19.3,119.4,211.9,111.5,79.0,Surabaya,Hypertension,Transitional Season\n",
"356,zUtTBWkVB3i2,2022-07-22,64,23.1,102.8,145.1,139.8,70.3,Bandung,Cardiovascular Disease,Dry Season\n",
"357,6443ZQoAAIpG,2022-01-25,42,23.9,128.4,239.9,85.7,62.3,Makassar,Obesity,Rainy Season\n",
"358,t7kTrSXHxfpe,2022-09-07,75,25.9,111.1,163.2,132.9,80.2,Bandung,Diabetes,Dry Season\n",
"359,aCHAZqtykBFp,2023-11-28,55,23.4,102.3,192.1,101.6,93.0,Bandung,Hypertension,Rainy Season\n",
"360,CnhiAvuXTBZQ,2021-12-15,52,27.8,124.6,218.6,69.1,95.7,Surabaya,Hypertension,Rainy Season\n",
"361,0Yu4Mkn61Ghc,2023-06-02,18,27.6,117.8,232.5,74.3,64.3,Jakarta,Hypertension,Dry Season\n",
"362,di3sZD6hBrjf,2023-11-03,47,16.8,113.7,138.3,112.4,53.2,Bandung,Hypertension,Rainy Season\n",
"363,SVOZE51XeE0c,2020-07-24,64,20.4,102.3,163.0,107.3,90.7,Medan,Cardiovascular Disease,Dry Season\n",
"364,rPZ4XTJO12gV,2023-10-03,21,18.9,112.9,191.4,90.9,80.7,Makassar,Healthy,Transitional Season\n",
"365,ckL4bRnpgBke,2024-12-29,45,23.5,134.0,147.2,50.1,64.7,Bandung,Healthy,Rainy Season\n",
"366,gRGUyFiledM7,2021-08-27,58,32.4,133.7,208.3,87.6,65.4,Medan,Obesity,Dry Season\n",
"367,2JPKJu14mETB,2023-08-17,50,26.3,108.7,190.8,88.6,63.2,Surabaya,Cardiovascular Disease,Dry Season\n",
"368,urRzva7b85N7,2023-10-13,60,19.9,104.0,281.6,69.6,77.9,Bandung,Healthy,Transitional Season\n",
"369,rv4qxviAioCq,2022-07-06,23,26.8,99.1,137.7,97.5,78.6,Makassar,Cardiovascular Disease,Dry Season\n",
"370,ZjE6QONrntLn,2021-11-14,37,23.6,121.5,189.5,100.8,77.2,Jakarta,Cardiovascular Disease,Rainy Season\n",
"371,G8UijSnEjeur,2022-02-03,31,25.6,137.9,149.4,80.1,72.6,Jakarta,Diabetes,Rainy Season\n",
"372,nH2Sv4pTISuD,2021-10-08,26,25.5,102.7,165.3,89.4,72.6,Makassar,Obesity,Transitional Season\n",
"373,3UV4xVjkhj1H,2024-12-01,45,18.9,141.7,259.6,100.1,77.4,Makassar,Cardiovascular Disease,Rainy Season\n",
"374,6bBAHSwzRsFh,2020-06-07,48,30.1,137.2,142.3,101.6,78.4,Jakarta,Healthy,Dry Season\n",
"375,Rut7eKM50XlE,2022-07-16,41,28.7,124.8,252.0,68.5,63.0,Jakarta,Cardiovascular Disease,Dry Season\n",
"376,Mpb4RWPGfwao,2022-09-04,23,23.3,120.9,184.7,100.0,90.6,Bandung,Healthy,Dry Season\n",
"377,VZx0AxgHdkFn,2022-07-23,61,19.1,136.3,226.5,88.8,60.9,Surabaya,Hypertension,Dry Season\n",
"378,3JfTkvliazdH,2021-10-14,39,18.6,111.7,155.6,74.8,62.3,Jakarta,Diabetes,Transitional Season\n",
"379,UHNJlVfUy9OW,2023-11-02,66,28.7,112.5,235.6,128.5,60.9,Jakarta,Obesity,Rainy Season\n",
"380,66JocXl8Xyes,2024-04-14,28,28.6,117.2,244.0,102.3,94.7,Surabaya,Diabetes,Transitional Season\n",
"381,cvojymsdgXcy,2022-12-13,76,22.3,116.5,142.6,113.4,80.3,Surabaya,Healthy,Rainy Season\n",
"382,9RjZ5KJNyTJp,2021-05-28,26,26.9,122.8,240.7,81.0,74.8,Makassar,Obesity,Transitional Season\n",
"383,hihvKQG3oXyT,2021-06-23,35,26.8,134.6,189.8,58.1,67.1,Makassar,Hypertension,Dry Season\n",
"384,ZNCsUB0dEDsy,2021-01-21,60,31.7,121.6,257.1,78.8,74.8,Makassar,Obesity,Rainy Season\n",
"385,4fWMyK8aGlRh,2022-07-17,36,27.1,94.9,177.8,110.5,70.5,Jakarta,Hypertension,Dry Season\n",
"386,rVJlAqOxH3dU,2024-07-04,50,32.7,132.8,209.2,103.1,91.4,Bandung,Hypertension,Dry Season\n",
"387,RnP47ExaAd2A,2022-09-15,60,26.0,148.4,173.4,130.6,79.4,Bandung,Cardiovascular Disease,Dry Season\n",
"388,IX99u2SLDJbR,2023-05-20,66,28.0,82.6,217.7,66.9,59.8,Surabaya,Cardiovascular Disease,Transitional Season\n",
"389,e9qSHB6opiWj,2022-01-16,66,29.0,141.6,244.2,83.7,62.2,Medan,Obesity,Rainy Season\n",
"390,Nbc4T2JLIhQT,2023-07-12,57,24.7,127.5,228.7,94.6,82.6,Medan,Hypertension,Dry Season\n",
"391,M6YEqoAZrvfU,2023-03-07,28,33.5,130.4,157.6,99.4,65.3,Surabaya,Healthy,Transitional Season\n",
"392,BbMtrYuSbMQT,2024-09-30,67,27.9,111.5,248.9,85.9,76.8,Bandung,Cardiovascular Disease,Dry Season\n",
"393,1h2Jjrb4oYwp,2021-09-18,50,30.0,134.2,180.9,118.0,69.5,Makassar,Hypertension,Dry Season\n",
"394,sIKzpZIpmsv6,2024-06-05,72,26.3,132.3,209.9,106.4,73.0,Bandung,Hypertension,Dry Season\n",
"395,W4n0A2VNB7Yu,2023-12-03,55,30.2,120.2,252.5,103.4,76.5,Makassar,Diabetes,Rainy Season\n",
"396,GIMsJLNkDPFr,2021-09-11,51,25.0,135.8,213.2,84.8,85.2,Makassar,Hypertension,Dry Season\n",
"397,OqsMtpUVjXsS,2020-07-20,63,28.1,130.5,182.5,114.0,75.6,Jakarta,Healthy,Dry Season\n",
"398,EDOOgxjmPnb6,2020-03-13,54,27.3,119.0,217.5,97.5,75.9,Bandung,Obesity,Transitional Season\n",
"399,X3Jz0Fjukknh,2024-08-09,18,29.9,141.1,182.8,77.5,83.2,Makassar,Hypertension,Dry Season\n",
"400,PUxWn6tlW40f,2020-11-30,47,26.9,128.4,189.4,90.2,75.9,Makassar,Cardiovascular Disease,Rainy Season\n",
"401,qpFb9CKExv0r,2022-12-05,74,25.7,116.9,273.7,89.6,75.8,Makassar,Obesity,Rainy Season\n",
"402,GLtbK5tWtCyy,2023-06-24,72,22.8,97.3,246.4,56.0,74.3,Bandung,Hypertension,Dry Season\n",
"403,oj2A1d0y7W0v,2020-05-10,35,21.2,128.9,207.9,89.3,72.0,Medan,Hypertension,Transitional Season\n",
"404,MC681O6tIzPP,2023-03-19,77,20.2,100.8,146.8,88.7,91.7,Medan,Hypertension,Transitional Season\n",
"405,6avaQJoTkxo9,2022-12-20,21,28.4,102.6,270.0,101.8,67.1,Surabaya,Obesity,Rainy Season\n",
"406,S6VU7NlQ74hC,2022-11-09,32,19.6,107.5,235.7,84.7,68.3,Surabaya,Diabetes,Rainy Season\n",
"407,NR2BItfWF0lq,2021-11-12,24,27.5,89.7,147.0,68.9,54.8,Bandung,Obesity,Rainy Season\n",
"408,9xTX2WFJCDHn,2023-10-09,31,22.6,119.5,236.7,88.4,61.0,Jakarta,Healthy,Transitional Season\n",
"409,ObcgGNkbBJfK,2022-09-09,27,29.1,106.5,229.6,72.5,68.2,Bandung,Healthy,Dry Season\n",
"410,I9yJThlinRJu,2021-06-12,52,28.4,102.2,226.7,70.4,82.3,Medan,Hypertension,Dry Season\n",
"411,ixBGs0oVTeN0,2020-03-22,24,22.3,113.3,223.0,94.1,72.2,Surabaya,Cardiovascular Disease,Transitional Season\n",
"412,4dcQmu9VkClb,2023-05-19,22,23.3,128.4,223.8,68.0,73.9,Medan,Hypertension,Transitional Season\n",
"413,xnCheo0EXekL,2023-04-12,41,21.3,121.4,219.7,92.2,71.4,Makassar,Diabetes,Transitional Season\n",
"414,ta8NhoUfeRrw,2020-09-29,55,17.3,128.9,239.2,94.7,66.7,Bandung,Cardiovascular Disease,Dry Season\n",
"415,WVx2DLucrr3L,2020-04-03,22,29.0,109.7,176.1,124.8,73.7,Medan,Obesity,Transitional Season\n",
"416,Mf6cqZxO0jzv,2020-11-25,38,26.2,125.4,208.3,91.1,73.9,Jakarta,Obesity,Rainy Season\n",
"417,Y7YM5fYGGrqa,2022-07-21,80,30.8,143.0,206.4,108.5,85.9,Jakarta,Obesity,Dry Season\n",
"418,5j9BgOSANE9B,2023-12-23,67,25.6,104.2,204.3,113.3,80.9,Jakarta,Obesity,Rainy Season\n",
"419,fGHyPcbNxuUy,2024-09-09,58,34.1,119.5,164.0,101.9,77.5,Bandung,Hypertension,Dry Season\n",
"420,8XL097hnDoLE,2020-06-27,52,22.4,134.2,210.6,86.0,69.2,Bandung,Cardiovascular Disease,Dry Season\n",
"421,V3GuRsEoayxo,2023-01-02,20,23.7,123.6,191.7,84.6,59.9,Medan,Healthy,Rainy Season\n",
"422,MLiR6EQlKmha,2020-10-30,26,20.6,143.4,225.2,96.3,80.3,Bandung,Hypertension,Transitional Season\n",
"423,XFVcxXOaYrWk,2022-07-18,80,30.2,140.2,205.0,99.4,61.8,Surabaya,Obesity,Dry Season\n",
"424,MsoumdyxXScL,2022-08-11,62,25.3,111.1,171.1,93.9,66.9,Jakarta,Diabetes,Dry Season\n",
"425,dDywJDWTT8Bl,2021-04-17,44,31.1,128.1,224.2,59.1,74.0,Medan,Obesity,Transitional Season\n",
"426,Yl7bBRXZvwkV,2023-10-01,46,23.9,106.1,219.5,107.6,89.9,Jakarta,Hypertension,Transitional Season\n",
"427,xGCLt2xo981u,2022-04-08,44,29.8,113.8,179.7,78.9,88.4,Jakarta,Healthy,Transitional Season\n",
"428,Ovjpd8l85DGA,2023-06-08,56,31.4,135.1,186.0,90.2,70.1,Makassar,Cardiovascular Disease,Dry Season\n",
"429,tfXVTtblqp1y,2023-10-21,59,21.0,119.3,285.1,65.5,90.6,Medan,Healthy,Transitional Season\n",
"430,3dmwxGSgINUc,2023-02-13,34,20.9,126.1,205.1,121.7,99.3,Surabaya,Obesity,Rainy Season\n",
"431,E0CE6HfYX29N,2021-12-26,51,26.6,132.5,180.3,115.0,72.9,Surabaya,Hypertension,Rainy Season\n",
"432,QRg4C58dYqhT,2022-10-07,37,24.2,103.7,176.8,97.2,77.1,Jakarta,Hypertension,Transitional Season\n",
"433,Ay1Tg9I3xOXL,2020-04-17,66,26.2,111.2,224.1,82.5,75.5,Surabaya,Obesity,Transitional Season\n",
"434,rlexSgsKkiTl,2021-10-20,43,25.8,131.7,215.3,67.7,86.4,Jakarta,Healthy,Transitional Season\n",
"435,s3vNcPpDBbmM,2023-09-11,29,23.6,118.3,223.7,92.0,58.1,Bandung,Healthy,Dry Season\n",
"436,bFuFXRVa6cmO,2024-11-06,41,20.2,111.0,187.3,108.7,70.2,Jakarta,Hypertension,Rainy Season\n",
"437,hWXba3QaVrov,2021-08-23,41,21.2,118.9,176.7,86.9,72.0,Medan,Obesity,Dry Season\n",
"438,pzC87Mhy75TB,2024-04-14,55,18.8,90.6,137.0,106.1,87.4,Jakarta,Cardiovascular Disease,Transitional Season\n",
"439,degYdrWNh6sq,2021-09-09,47,24.9,103.8,226.1,131.8,81.0,Medan,Hypertension,Dry Season\n",
"440,Xef6abtuvpFH,2020-04-12,39,27.4,122.3,279.7,67.6,65.5,Surabaya,Cardiovascular Disease,Transitional Season\n",
"441,yUqTPm8hb1yD,2022-01-24,47,29.7,93.7,128.2,103.5,71.7,Surabaya,Healthy,Rainy Season\n",
"442,cJG2f6HKKsNh,2023-09-30,68,22.2,119.9,225.5,124.0,91.4,Bandung,Cardiovascular Disease,Dry Season\n",
"443,uHCgEwyXaeao,2023-07-05,27,27.8,122.7,144.9,65.8,69.9,Bandung,Obesity,Dry Season\n",
"444,GlJw9bn4sNYB,2023-11-17,65,19.5,99.1,184.9,112.3,77.1,Surabaya,Healthy,Rainy Season\n",
"445,sgd7Oj6RtT9u,2023-04-22,50,22.8,131.8,194.6,29.6,93.2,Surabaya,Hypertension,Transitional Season\n",
"446,kLRMO3BS96Ne,2022-04-22,59,26.5,107.4,210.4,100.7,78.6,Makassar,Cardiovascular Disease,Transitional Season\n",
"447,J1hVnpTCyyGX,2023-11-14,24,19.1,122.5,184.7,97.4,76.5,Medan,Hypertension,Rainy Season\n",
"448,BuZ1q2Cui8et,2024-12-27,42,27.2,99.4,227.1,73.4,64.4,Jakarta,Hypertension,Rainy Season\n",
"449,EApuGKR5cBy1,2020-07-15,41,23.3,118.1,183.0,61.8,79.8,Makassar,Healthy,Dry Season\n",
"450,4W7N3BAeS4CO,2024-08-04,80,22.0,136.8,207.9,139.9,87.5,Bandung,Cardiovascular Disease,Dry Season\n",
"451,DyvmNjYdeF34,2023-04-30,19,21.5,130.6,239.2,93.3,76.8,Makassar,Diabetes,Transitional Season\n",
"452,0HlV2ViGa7W6,2022-01-15,53,25.0,108.5,239.0,73.1,72.6,Jakarta,Obesity,Rainy Season\n",
"453,R2uCOGeFSpie,2024-05-01,69,26.4,132.8,207.9,132.6,68.5,Bandung,Hypertension,Transitional Season\n",
"454,M9FxwCk95a9l,2023-05-01,25,27.0,152.5,231.6,112.0,73.3,Bandung,Diabetes,Transitional Season\n",
"455,WOCqOOsKKyOS,2023-12-19,64,20.8,117.9,245.8,89.9,85.5,Bandung,Cardiovascular Disease,Rainy Season\n",
"456,Za4JGLfEKcJn,2022-12-08,31,22.4,138.1,183.7,106.5,76.3,Bandung,Obesity,Rainy Season\n",
"457,D29DYFZFeDoe,2021-04-04,34,22.4,129.8,158.0,93.1,88.0,Medan,Diabetes,Transitional Season\n",
"458,Kh1QioYTOSqZ,2022-11-11,75,27.5,116.5,163.7,81.5,84.2,Medan,Healthy,Rainy Season\n",
"459,5wa0f3uFzluT,2024-11-27,54,27.5,139.8,194.8,82.4,70.8,Medan,Healthy,Rainy Season\n",
"460,VQiKxcIqOP6c,2024-11-05,66,33.5,84.1,254.3,86.9,68.8,Makassar,Diabetes,Rainy Season\n",
"461,6GDi4wZSvA7l,2023-11-01,34,27.9,118.3,191.8,71.1,77.6,Bandung,Cardiovascular Disease,Rainy Season\n",
"462,16e6AF5XQOo2,2020-05-11,33,22.5,139.2,170.8,89.5,79.3,Surabaya,Obesity,Transitional Season\n",
"463,7FKmancgEeMb,2024-05-21,49,22.0,130.9,219.5,137.2,63.7,Jakarta,Obesity,Transitional Season\n",
"464,hMnmEiBlMFCU,2020-07-05,75,31.7,130.9,185.0,57.8,66.3,Surabaya,Diabetes,Dry Season\n",
"465,y2z0SS5BoFKN,2020-02-22,69,24.4,143.9,274.6,93.1,75.8,Makassar,Cardiovascular Disease,Rainy Season\n",
"466,Bj9fms522gKq,2022-07-22,42,29.9,121.6,241.9,110.0,55.5,Medan,Hypertension,Dry Season\n",
"467,WrmrwLZd5Yex,2021-09-28,79,27.5,127.6,192.4,73.1,72.3,Jakarta,Diabetes,Dry Season\n",
"468,qnDCBdUBwifo,2022-12-18,55,29.0,131.3,126.0,134.7,84.3,Bandung,Obesity,Rainy Season\n",
"469,kA5JsiBaQ8F6,2022-04-16,20,23.6,107.6,263.1,50.6,66.0,Medan,Hypertension,Transitional Season\n",
"470,LaCcoVLx9zzz,2021-11-12,31,20.4,130.6,158.0,69.5,69.5,Medan,Obesity,Rainy Season\n",
"471,g2ee3OkWnUVj,2021-12-02,26,26.8,134.2,190.4,51.6,87.9,Jakarta,Hypertension,Rainy Season\n",
"472,2qrcyqTcKkVS,2023-11-28,73,28.4,111.3,219.5,50.3,70.8,Medan,Obesity,Rainy Season\n",
"473,xbARna2ZAh2Y,2023-03-31,61,26.7,101.5,238.3,74.5,67.7,Jakarta,Diabetes,Transitional Season\n",
"474,aDAg9FByI6Qy,2020-03-27,72,31.7,116.7,202.3,58.9,62.7,Jakarta,Cardiovascular Disease,Transitional Season\n",
"475,wZtQL4lFXOKC,2020-08-15,26,25.3,120.6,212.7,70.1,81.6,Bandung,Diabetes,Dry Season\n",
"476,fSYOk3NrKaJ5,2024-05-11,41,20.5,110.4,188.1,135.6,71.5,Bandung,Cardiovascular Disease,Transitional Season\n",
"477,kwfXmFGEKW6b,2024-08-10,63,26.9,105.6,124.1,85.1,78.0,Bandung,Hypertension,Dry Season\n",
"478,42j843LhVj7z,2024-04-11,20,24.2,130.1,224.9,118.0,71.8,Bandung,Hypertension,Transitional Season\n",
"479,yjYWajebQL3d,2023-04-12,53,29.1,106.3,194.8,54.3,75.3,Surabaya,Obesity,Transitional Season\n",
"480,Zl43hCSMcSBp,2023-04-12,52,22.7,124.3,253.6,82.4,80.6,Makassar,Hypertension,Transitional Season\n",
"481,WIL9gOPeV6J9,2020-01-03,56,21.9,125.4,255.2,78.1,62.3,Jakarta,Obesity,Rainy Season\n",
"482,5OU7UlVt7Ups,2021-08-29,50,18.8,136.7,204.0,136.6,60.4,Medan,Healthy,Dry Season\n",
"483,joPKje470vch,2021-12-03,77,19.8,119.8,138.0,81.0,70.8,Medan,Diabetes,Rainy Season\n",
"484,wtHNX0kSpg4l,2021-02-05,73,20.3,89.7,201.7,101.7,68.2,Makassar,Cardiovascular Disease,Rainy Season\n",
"485,V59yOwE9QUNK,2020-05-17,54,15.6,129.3,185.9,105.5,86.1,Medan,Hypertension,Transitional Season\n",
"486,6xLsoAPs7fVR,2021-04-25,39,27.8,118.3,172.4,94.7,74.8,Surabaya,Healthy,Transitional Season\n",
"487,4hCiPtB3NVsM,2022-02-05,21,28.1,143.9,203.1,78.5,58.9,Makassar,Healthy,Rainy Season\n",
"488,181gVXLfQsgt,2024-06-07,46,21.1,128.2,194.2,104.4,60.8,Surabaya,Healthy,Dry Season\n",
"489,qkNip7xD8Drh,2022-03-31,44,24.4,95.1,214.1,106.8,71.9,Jakarta,Healthy,Transitional Season\n",
"490,xqnAoa9sZweh,2022-05-18,25,27.0,121.5,188.1,74.0,96.3,Bandung,Hypertension,Transitional Season\n",
"491,PiCZaEBPtZwQ,2021-10-30,61,28.4,104.2,222.2,94.4,72.8,Makassar,Obesity,Transitional Season\n",
"492,ZG3Np0BQq3x7,2024-11-08,61,26.6,89.2,187.2,72.2,66.6,Jakarta,Healthy,Rainy Season\n",
"493,dliDT4JABtSP,2021-12-08,29,26.7,132.4,169.0,100.5,73.6,Surabaya,Cardiovascular Disease,Rainy Season\n",
"494,unl0PY90NcBU,2024-02-07,64,25.8,90.9,218.8,82.5,66.5,Bandung,Healthy,Rainy Season\n",
"495,k15wZKs5igO7,2022-06-23,26,20.9,144.4,261.9,66.1,83.0,Medan,Diabetes,Dry Season\n",
"496,qiymUnCmMIs9,2023-04-06,41,25.7,114.6,65.4,102.2,68.2,Medan,Obesity,Transitional Season\n",
"497,wNO4LPM45mbU,2023-01-15,20,29.0,110.6,128.1,81.6,63.1,Surabaya,Obesity,Rainy Season\n",
"498,mGTWsuf9dMk0,2021-05-10,20,22.2,106.3,148.3,108.7,71.0,Makassar,Healthy,Transitional Season\n",
"499,ecEownET5SIA,2020-08-05,43,29.1,95.0,225.2,61.7,69.4,Surabaya,Obesity,Dry Season\n",
"500,ZN9lr0an97yz,2020-11-27,58,27.1,126.3,248.2,67.6,75.7,Surabaya,Hypertension,Rainy Season\n"
), stringsAsFactors = FALSE)
# Convert data types
df$Tanggal <- as.Date(df$Tanggal)
# Column names were set when the data was created, so no renaming is needed
# Check the dataset dimensions
cat("Dataset dimensions:", nrow(df), "rows x", ncol(df), "columns\n")
#> Dataset dimensions: 500 rows x 12 columns
# Inspect the structure and data types
str(df)
#> 'data.frame': 500 obs. of 12 variables:
#> $ No : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ ID_Pasien : chr "SLy5n7T2vCfd" "SS9WdTh6Gp9l" "5PBRrmglA03t" "0cAGgC7hcyxq" ...
#> $ Tanggal : Date, format: "2021-07-14" "2020-11-16" ...
#> $ Usia : int 27 63 72 60 40 71 74 44 51 33 ...
#> $ BMI : num 31.5 26.9 18.2 19.9 32.5 27.4 17.1 25.3 18.2 17.6 ...
#> $ Tekanan_Darah : num 123 120 146 121 109 ...
#> $ Kolesterol : num 132 213 158 221 230 ...
#> $ Glukosa : num 77.7 128.1 100.3 103.4 91.9 ...
#> $ Detak_Jantung : num 70 82.2 73.9 79.1 67 56.3 78.5 72.1 93.7 72.7 ...
#> $ Lokasi : chr "Makassar" "Jakarta" "Surabaya" "Bandung" ...
#> $ Kondisi_Kesehatan: chr "Healthy" "Diabetes" "Healthy" "Diabetes" ...
#> $ Musim : chr "Dry Season" "Rainy Season" "Transitional Season" "Rainy Season" ...
# Summary statistics for all variables
summary(df)
#> No ID_Pasien Tanggal Usia
#> Min. : 1.0 Length:500 Min. :2020-01-02 Min. :18.00
#> 1st Qu.:125.8 Class :character 1st Qu.:2021-05-08 1st Qu.:36.00
#> Median :250.5 Mode :character Median :2022-07-17 Median :51.00
#> Mean :250.5 Mean :2022-06-27 Mean :50.13
#> 3rd Qu.:375.2 3rd Qu.:2023-08-17 3rd Qu.:64.00
#> Max. :500.0 Max. :2024-12-29 Max. :80.00
#> BMI Tekanan_Darah Kolesterol Glukosa
#> Min. :13.20 Min. : 82.2 Min. : 65.4 Min. : 29.60
#> 1st Qu.:22.20 1st Qu.:110.4 1st Qu.:176.1 1st Qu.: 77.47
#> Median :25.20 Median :121.3 Median :203.5 Median : 91.90
#> Mean :25.09 Mean :120.9 Mean :202.9 Mean : 91.11
#> 3rd Qu.:27.80 3rd Qu.:131.8 3rd Qu.:233.3 3rd Qu.:103.50
#> Max. :37.80 Max. :167.0 Max. :325.0 Max. :151.80
#> Detak_Jantung Lokasi Kondisi_Kesehatan Musim
#> Min. : 47.30 Length:500 Length:500 Length:500
#> 1st Qu.: 68.50 Class :character Class :character Class :character
#> Median : 74.80 Mode :character Mode :character Mode :character
#> Mean : 75.18
#> 3rd Qu.: 81.72
#> Max. :108.30
# Check for missing values
missing_vals <- colSums(is.na(df))
cat("Missing values per column:\n")
#> Missing values per column:
print(missing_vals)
#> No ID_Pasien Tanggal Usia
#> 0 0 0 0
#> BMI Tekanan_Darah Kolesterol Glukosa
#> 0 0 0 0
#> Detak_Jantung Lokasi Kondisi_Kesehatan Musim
#> 0 0 0 0
cat("\nTotal missing values:", sum(missing_vals), "\n")
#>
#> Total missing values: 0
# Show the first 6 rows
head(df, 6)
#> No ID_Pasien Tanggal Usia BMI Tekanan_Darah Kolesterol Glukosa
#> 1 1 SLy5n7T2vCfd 2021-07-14 27 31.5 122.8 131.5 77.7
#> 2 2 SS9WdTh6Gp9l 2020-11-16 63 26.9 119.9 212.8 128.1
#> 3 3 5PBRrmglA03t 2023-03-22 72 18.2 146.0 158.5 100.3
#> 4 4 0cAGgC7hcyxq 2023-01-02 60 19.9 121.2 220.9 103.4
#> 5 5 0KSEA9pnVHdd 2023-06-05 40 32.5 109.4 229.8 91.9
#> 6 6 Zba4dbAEtGwn 2023-03-15 71 27.4 126.2 209.3 102.0
#> Detak_Jantung Lokasi Kondisi_Kesehatan Musim
#> 1 70.0 Makassar Healthy Dry Season
#> 2 82.2 Jakarta Diabetes Rainy Season
#> 3 73.9 Surabaya Healthy Transitional Season
#> 4 79.1 Bandung Diabetes Rainy Season
#> 5 67.0 Makassar Healthy Dry Season
#> 6 56.3 Bandung Obesity Transitional Season
📘 Interpretation – 6.9.2.1 The dataset contains 500
patient medical records from 5 Indonesian cities (Jakarta, Surabaya,
Bandung, Medan, Makassar) with 12 variables covering identifiers,
clinical measurements, location, health condition, and season. No
missing values were found, so the dataset is ready for the cleaning
and transformation steps. The data types are appropriate:
Tanggal is stored as Date, No and Usia as integer, the clinical
measurements as numeric, and the text columns as character.
💡 Insight – Data Structure The
Kondisi_Kesehatan column has 5 unique categories: Healthy,
Diabetes, Obesity, Hypertension, and Cardiovascular Disease. The
Musim column reflects Indonesia's tropical climate: Dry
Season, Rainy Season, and Transitional Season. It is a seasonal feature
that is relevant to time-based health analysis.
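The column types and category counts described above can also be checked programmatically rather than read off the `str()` output. The following is a minimal sketch that uses only the first three records of the embedded CSV as a sample; the names `sampel`, `kelas`, `kolom_teks`, and `n_unik` are illustrative and not part of the original pipeline.

```r
# First three records from the embedded CSV, used only as a small sample
sampel <- read.csv(text = paste0(
  "No,ID_Pasien,Tanggal,Usia,BMI,Tekanan_Darah,Kolesterol,Glukosa,Detak_Jantung,Lokasi,Kondisi_Kesehatan,Musim\n",
  "1,SLy5n7T2vCfd,2021-07-14,27,31.5,122.8,131.5,77.7,70.0,Makassar,Healthy,Dry Season\n",
  "2,SS9WdTh6Gp9l,2020-11-16,63,26.9,119.9,212.8,128.1,82.2,Jakarta,Diabetes,Rainy Season\n",
  "3,5PBRrmglA03t,2023-03-22,72,18.2,146.0,158.5,100.3,73.9,Surabaya,Healthy,Transitional Season\n"
), stringsAsFactors = FALSE)
sampel$Tanggal <- as.Date(sampel$Tanggal)

# Primary class of every column, as a named character vector
kelas <- vapply(sampel, function(x) class(x)[1], character(1))
print(kelas)

# Number of distinct values in each character column
kolom_teks <- names(kelas)[kelas == "character"]
n_unik <- vapply(sampel[kolom_teks], function(x) length(unique(x)), integer(1))
print(n_unik)
```

Running the same two `vapply()` calls on the full `df` would give the class of all 12 columns and the number of distinct cities, conditions, and seasons in one pass.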
Cleaning the Data
Converting categorical variables to factors, checking for inconsistent entries, and making sure values are in standard units.
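The embedded dataset already has clean labels, so the code in this section has nothing to repair. For data that does arrive with inconsistent spellings, a minimal sketch of normalizing labels before converting them to a factor might look like this. The raw values and the names `lokasi_mentah` and `bersih` are hypothetical and not taken from the dataset.

```r
# Hypothetical raw labels with stray whitespace and mixed capitalisation
lokasi_mentah <- c("Jakarta", " jakarta", "JAKARTA ", "Medan", "medan")

# Trim whitespace, then upper-case the first letter and lower-case the rest
bersih <- trimws(lokasi_mentah)
bersih <- paste0(toupper(substr(bersih, 1, 1)), tolower(substring(bersih, 2)))

# Convert to a factor only after the labels agree
lokasi <- factor(bersih)
print(levels(lokasi))
print(table(lokasi))
```

This simple capitalisation rule only suits single-word labels such as city names; a multi-word label like "Cardiovascular Disease" would need a different rule or an explicit lookup table.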
# 1. Convert categorical variables to factors
df$Kondisi_Kesehatan <- as.factor(df$Kondisi_Kesehatan)
df$Lokasi <- as.factor(df$Lokasi)
df$Musim <- as.factor(df$Musim)
# Check the factor levels
cat("Levels of Kondisi_Kesehatan:\n")
#> Levels of Kondisi_Kesehatan:
print(levels(df$Kondisi_Kesehatan))
#> [1] "Cardiovascular Disease" "Diabetes" "Healthy"
#> [4] "Hypertension" "Obesity"
cat("\nLevels of Musim:\n")
#>
#> Levels of Musim:
print(levels(df$Musim))
#> [1] "Dry Season" "Rainy Season" "Transitional Season"
cat("\nLevels of Lokasi:\n")
#>
#> Levels of Lokasi:
print(levels(df$Lokasi))
#> [1] "Bandung" "Jakarta" "Makassar" "Medan" "Surabaya"
# 2. Make sure the Tanggal column has type Date
df$Tanggal <- as.Date(df$Tanggal)
# Check the date range
cat("Date range:\n")
#> Date range:
cat(" Earliest :", format(min(df$Tanggal), "%d %B %Y"), "\n")
#> Earliest : 02 January 2020
cat(" Latest :", format(max(df$Tanggal), "%d %B %Y"), "\n")
#> Latest : 29 December 2024
# 3. Sanity-check value ranges
# The dataset has no weight or height columns, so their units cannot be verified;
# BMI is already computed, so only check that the values are plausible
cat("BMI range :", round(min(df$BMI), 1), "–", round(max(df$BMI), 1), "\n")
#> BMI range : 13.2 – 37.8
cat("Age range :", min(df$Usia), "–", max(df$Usia), "years\n")
#> Age range : 18 – 80 years
cat("Glucose range :", round(min(df$Glukosa), 1), "–",
    round(max(df$Glukosa), 1), "mg/dL\n")
#> Glucose range : 29.6 – 151.8 mg/dL
# 4. Check for duplicate patient IDs
n_duplikat <- sum(duplicated(df$ID_Pasien))
cat("\nNumber of duplicate IDs:", n_duplikat, "\n")
#>
#> Number of duplicate IDs: 0
# 5. Frequency distribution of health conditions
cat("Health condition distribution:\n")
#> Health condition distribution:
tabel_kondisi <- table(df$Kondisi_Kesehatan)
print(tabel_kondisi)
#>
#> Cardiovascular Disease Diabetes Healthy
#> 83 89 105
#> Hypertension Obesity
#> 125 98
cat("\nProportion (%):\n")
#>
#> Proportion (%):
print(round(prop.table(tabel_kondisi) * 100, 1))
#>
#> Cardiovascular Disease Diabetes Healthy
#> 16.6 17.8 21.0
#> Hypertension Obesity
#> 25.0 19.6
# Visualize the distribution of health conditions
ggplot(df, aes(x = Kondisi_Kesehatan, fill = Kondisi_Kesehatan)) +
  geom_bar(color = "white", linewidth = 0.3) +
  scale_fill_manual(values = c("#2d7a5f","#1e4d7a","#7a5c1e","#8b2a2a","#5a5a72")) +
  labs(title = "Distribution of Patient Health Conditions",
       x = "Health Condition", y = "Number of Patients") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 20, hjust = 1),
        plot.title = element_text(face = "bold"))
# Distribution of patients per city
ggplot(df, aes(x = Lokasi, fill = Lokasi)) +
  geom_bar(color = "white", linewidth = 0.3) +
  scale_fill_manual(values = c("#3d3d5c","#6b6b8e","#2d7a5f","#1e4d7a","#7a5c1e")) +
  labs(title = "Distribution of Patients per City", x = "City", y = "Count") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold"))
📘 Interpretation – 6.9.2.2 The cleaning step
converted three categorical columns
(Kondisi_Kesehatan, Lokasi,
Musim) into R factors. No duplicate patient IDs were
found. Most clinical values are plausible, but the extremes deserve a
second look: the minimum BMI (13.2) and the minimum glucose (29.6 mg/dL)
are unusually low and may be outliers or entry errors. The distribution
of health conditions is fairly balanced, with Hypertension as the largest
category (125 patients, 25.0%) and Cardiovascular Disease as the smallest
(83 patients, 16.6%).
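Range checks like the ones printed above can be turned into explicit flags. The sketch below assumes hypothetical plausibility limits (the `batas` list is an illustration, not clinical guidance) and a three-row sample built from values that appear in the summary output, including the minimum BMI and glucose.

```r
# Hypothetical plausibility limits; real limits should come from clinical guidance
batas <- list(
  BMI     = c(min = 10, max = 60),
  Glukosa = c(min = 40, max = 400)
)

# Small illustrative sample that includes the reported minima (13.2 and 29.6)
sampel <- data.frame(
  BMI     = c(31.5, 13.2, 22.8),
  Glukosa = c(77.7, 91.9, 29.6)
)

# For each column, flag rows that fall outside the assumed limits
di_luar <- vapply(names(batas), function(k) {
  sampel[[k]] < batas[[k]][["min"]] | sampel[[k]] > batas[[k]][["max"]]
}, logical(nrow(sampel)))

print(di_luar)
print(colSums(di_luar))
```

Flagged rows are candidates for review rather than automatic removal; whether a value is an error or a genuine extreme depends on the clinical context.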
Feature Engineering
Creating health risk indicators from the raw data: BMI categories, age groups, and chronic-condition flags.
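The dataset ships with BMI already computed. For data that has raw weight and height instead, BMI can be derived as weight in kilograms divided by the square of height in metres. A minimal sketch with hypothetical measurements (`berat_kg` and `tinggi_cm` are not columns of the patient dataset):

```r
# Hypothetical measurements; the patient dataset has no weight or height columns
berat_kg  <- c(50, 70, 90)
tinggi_cm <- c(160, 175, 170)

# BMI = weight (kg) / height (m)^2, so centimetres are converted to metres first
tinggi_m <- tinggi_cm / 100
bmi <- berat_kg / tinggi_m^2

print(round(bmi, 1))
```

Forgetting the centimetre-to-metre conversion shrinks the result by a factor of 10,000, which makes this a common source of implausible BMI values.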
# 1. BMI is already provided in the dataset
# For reference: BMI = weight (kg) / height (m)^2
# The dataset has no weight or height columns, so the formula cannot be re-checked here
# BMI categories based on WHO cut-offs
# Note: with right = TRUE, a BMI of exactly 18.5 is labelled "Underweight",
# whereas the WHO definition counts 18.5 as normal weight
df$Kategori_BMI <- cut(
  df$BMI,
  breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
  labels = c("Underweight", "Normal", "Overweight", "Obese"),
  right = TRUE
)
cat("BMI category distribution:\n")
#> BMI category distribution:
print(table(df$Kategori_BMI))
#>
#> Underweight Normal Overweight Obese
#> 25 213 204 58
# 2. Classify age groups
df$Kelompok_Usia <- cut(
  df$Usia,
  breaks = c(-Inf, 35, 59, Inf),
  labels = c("Muda (≤35)", "Dewasa (36-59)", "Lanjut Usia (≥60)"),
  right = TRUE
)
cat("\nAge group distribution:\n")
#>
#> Age group distribution:
print(table(df$Kelompok_Usia))
#>
#> Muda (≤35) Dewasa (36-59) Lanjut Usia (≥60)
#> 122 205 173
# 3. Chronic-condition flag: Has_Diabetes
# Glucose above 126 mg/dL (assumed to be a fasting value) is treated as an indication of diabetes
df$Has_Diabetes <- df$Glukosa > 126
cat("\nPatients with indicated high glucose (>126 mg/dL):\n")
#>
#> Patients with indicated high glucose (>126 mg/dL):
print(table(df$Has_Diabetes))
#>
#> FALSE TRUE
#> 475 25
cat("Proportion:", round(mean(df$Has_Diabetes) * 100, 1), "%\n")
#> Proportion: 5 %
# 4. Hypertension flag
# Systolic blood pressure above 140 mmHg is treated as hypertension
df$Has_Hypertension <- df$Tekanan_Darah > 140
cat("\nPatients with indicated hypertension (BP > 140):\n")
#>
#> Patients with indicated hypertension (BP > 140):
print(table(df$Has_Hypertension))
#>
#> FALSE TRUE
#> 442 58
cat("Proportion:", round(mean(df$Has_Hypertension) * 100, 1), "%\n")
#> Proportion: 11.6 %
# Show the newly created columns
cat("\nNew columns from feature engineering:\n")
#>
#> New columns from feature engineering:
df %>%
select(ID_Pasien, Usia, BMI, Glukosa, Tekanan_Darah,
Kategori_BMI, Kelompok_Usia, Has_Diabetes, Has_Hypertension) %>%
head(6)
#> ID_Pasien Usia BMI Glukosa Tekanan_Darah Kategori_BMI Kelompok_Usia
#> 1 SLy5n7T2vCfd 27 31.5 77.7 122.8 Obese Muda (≤35)
#> 2 SS9WdTh6Gp9l 63 26.9 128.1 119.9 Overweight Lanjut Usia (≥60)
#> 3 5PBRrmglA03t 72 18.2 100.3 146.0 Underweight Lanjut Usia (≥60)
#> 4 0cAGgC7hcyxq 60 19.9 103.4 121.2 Normal Lanjut Usia (≥60)
#> 5 0KSEA9pnVHdd 40 32.5 91.9 109.4 Obese Dewasa (36-59)
#> 6 Zba4dbAEtGwn 71 27.4 102.0 126.2 Overweight Lanjut Usia (≥60)
#> Has_Diabetes Has_Hypertension
#> 1 FALSE FALSE
#> 2 TRUE FALSE
#> 3 FALSE TRUE
#> 4 FALSE FALSE
#> 5 FALSE FALSE
#> 6 FALSE FALSE
# Visualize the BMI category distribution
ggplot(df, aes(x = Kategori_BMI, fill = Kategori_BMI)) +
  geom_bar(color = "white", linewidth = 0.3) +
  scale_fill_manual(values = c("#1e4d7a","#2d7a5f","#7a5c1e","#8b2a2a")) +
  labs(title = "BMI Category Distribution", x = "BMI Category", y = "Number of Patients") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))
# Visualize the age groups
ggplot(df, aes(x = Kelompok_Usia, fill = Kelompok_Usia)) +
  geom_bar(color = "white", linewidth = 0.3) +
  scale_fill_manual(values = c("#3d3d5c","#6b6b8e","#2d7a5f")) +
  labs(title = "Age Group Distribution", x = "Age Group", y = "Count") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))
📘 Interpretation – 6.9.2.3 Feature engineering
produced four new variables from the raw data. The BMI categories
show that most patients fall in the Normal (213) and Overweight (204)
ranges. The Has_Diabetes flag (glucose > 126 mg/dL;
25 patients, 5.0%) and the Has_Hypertension flag (systolic
pressure > 140 mmHg; 58 patients, 11.6%) are binary clinical
features that can serve as predictors in health-risk modelling.
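📌 Clinical Note The 126 mg/dL glucose cut-off follows the WHO criterion for diagnosing diabetes from fasting plasma glucose, which is defined as ≥ 126 mg/dL; the dataset does not state whether Glukosa was measured while fasting. A systolic pressure of 140 mmHg is the lower bound of Stage 2 hypertension in the AHA classification (≥ 140 mmHg). Because the code uses strict inequalities (> 126 and > 140), a value exactly at either cut-off is not flagged. Consider the clinical context before drawing final conclusions.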
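The flags in this section are built with strict comparisons, so it is worth seeing what happens to a reading that sits exactly on a cut-off. A minimal sketch with hypothetical readings placed around 126 mg/dL and 140 mmHg (the column names in `perbandingan` are illustrative only):

```r
# Hypothetical values placed just below, on, and just above each cut-off
glukosa <- c(125.9, 126.0, 126.1)
tekanan <- c(139.9, 140.0, 140.1)

perbandingan <- data.frame(
  glukosa,
  strict_126    = glukosa > 126,   # comparison used in this section
  inclusive_126 = glukosa >= 126,  # also flags a value equal to the cut-off
  tekanan,
  strict_140    = tekanan > 140,
  inclusive_140 = tekanan >= 140
)
print(perbandingan)
```

Only the middle row differs between the two versions, so the choice matters only when readings land exactly on the threshold; which operator is appropriate depends on the guideline being followed.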
Categorization and Binning
Converting numeric measurements into health levels: blood pressure, cholesterol, heart rate, and an ordinal risk score.
# 1. Categorize blood pressure (systolic, mmHg)
# Normal: < 120 | Meningkat (elevated): 120–139 | Tinggi (high): ≥ 140
df$Kategori_TD <- cut(
  df$Tekanan_Darah,
  breaks = c(-Inf, 119.9, 139.9, Inf),
  labels = c("Normal", "Meningkat", "Tinggi"),
  right = TRUE
)
cat("Blood pressure category distribution:\n")
#> Blood pressure category distribution:
print(table(df$Kategori_TD))
#>
#> Normal Meningkat Tinggi
#> 237 205 58
cat("\nProportion (%):\n")
#>
#> Proportion (%):
print(round(prop.table(table(df$Kategori_TD)) * 100, 1))
#>
#> Normal Meningkat Tinggi
#> 47.4 41.0 11.6
# 2. Categorize total cholesterol (mg/dL)
# Optimal: < 170 | Batas (borderline): 170–199 | Tinggi (high): ≥ 200
# Note: these cut-offs resemble reference ranges used for children and adolescents;
# adult guidelines commonly use < 200, 200–239, and ≥ 240 mg/dL
df$Kategori_Kolesterol <- cut(
  df$Kolesterol,
  breaks = c(-Inf, 169.9, 199.9, Inf),
  labels = c("Optimal", "Batas", "Tinggi"),
  right = TRUE
)
cat("\nCholesterol category distribution:\n")
#>
#> Cholesterol category distribution:
print(table(df$Kategori_Kolesterol))
#>
#> Optimal Batas Tinggi
#> 106 130 264
# 3. Categorize heart rate (bpm)
# Bradikardi (slow): < 60 | Normal: 60–100 | Takikardi (fast): > 100
df$Kategori_Detak <- cut(
  df$Detak_Jantung,
  breaks = c(-Inf, 59.9, 100, Inf),
  labels = c("Bradikardi", "Normal", "Takikardi"),
  right = TRUE
)
cat("\nHeart rate category distribution:\n")
#>
#> Heart rate category distribution:
print(table(df$Kategori_Detak))
#>
#> Bradikardi Normal Takikardi
#> 26 471 3
# 4. Ordinal variable: health risk score (0–3)
# Each adverse flag adds 1 point
df$Skor_Risiko <- as.integer(df$Has_Diabetes) +
  as.integer(df$Has_Hypertension) +
  as.integer(df$Kategori_BMI %in% c("Overweight", "Obese"))
cat("\nCombined risk score distribution (0-3):\n")
#>
#> Combined risk score distribution (0-3):
print(table(df$Skor_Risiko))
#>
#> 0 1 2 3
#> 199 260 38 3
# Visualize the blood pressure categories
ggplot(df, aes(x = Kategori_TD, fill = Kategori_TD)) +
  geom_bar(color = "white", linewidth = 0.3) +
  scale_fill_manual(values = c("#2d7a5f","#7a5c1e","#8b2a2a")) +
  labs(title = "Systolic Blood Pressure Categories",
       x = "Category", y = "Number of Patients") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))
# Distribution of the combined risk score
df_risiko <- as.data.frame(table(df$Skor_Risiko))
colnames(df_risiko) <- c("Skor", "Jumlah")
ggplot(df_risiko, aes(x = Skor, y = Jumlah, fill = Skor)) +
  geom_col(color = "white", linewidth = 0.3) +
  scale_fill_manual(values = c("#2d7a5f","#7a5c1e","#8b2a2a","#5a5a72")) +
  labs(title = "Health Risk Score Distribution (0–3)",
       x = "Risk Score", y = "Number of Patients") +
  theme_minimal(base_size = 11) +
  theme(legend.position = "none", plot.title = element_text(face = "bold"))
📘 Interpretation – 6.9.2.4 The blood pressure categories show that just over half of the patients (52.6%) fall in the Meningkat (elevated, 41.0%) or Tinggi (high, 11.6%) category, while 47.4% are Normal. The combined risk score (0–3) is an ordinal variable for segmenting patients by cumulative risk-factor burden: 459 of 500 patients score 0 or 1, and a higher score flags patients who are likely to need closer clinical attention.
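Category boundaries in `cut()` are easy to get wrong exactly at the thresholds. The sketch below uses made-up cholesterol readings (not taken from the dataset) placed on the cut-offs from this section; `kolesterol_demo` and `kategori_demo` are hypothetical names. It shows one way to express "< 170", "170–199", and "≥ 200" with left-closed intervals.

```r
# Hypothetical cholesterol readings (mg/dL) chosen to sit on the thresholds
kolesterol_demo <- c(150, 169.9, 170, 199.9, 200, 260)

# right = FALSE makes every interval closed on the left: [a, b)
# so 170 falls in "Batas" and 200 falls in "Tinggi"
kategori_demo <- cut(
  kolesterol_demo,
  breaks = c(-Inf, 170, 200, Inf),
  labels = c("Optimal", "Batas", "Tinggi"),
  right  = FALSE
)

print(data.frame(kolesterol_demo, kategori_demo))
```

With this form the breaks are the clinical cut-offs themselves, so the result does not depend on how many decimal places the measurements carry.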
Detecting and Handling Outliers
Uses the IQR and Z-score methods to detect abnormal values in the clinical metrics.
# Outlier detection function using the IQR method
deteksi_iqr <- function(x, nama_kolom = "x") {
  Q1 <- quantile(x, 0.25, na.rm = TRUE)
  Q3 <- quantile(x, 0.75, na.rm = TRUE)
  iqr <- Q3 - Q1  # lowercase name avoids shadowing stats::IQR()
  batas_bawah <- Q1 - 1.5 * iqr
  batas_atas <- Q3 + 1.5 * iqr
  pencilan <- x < batas_bawah | x > batas_atas
  cat("─── Kolom:", nama_kolom, "───\n")
  cat(" Q1:", round(Q1, 2), "| Q3:", round(Q3, 2),
      "| IQR:", round(iqr, 2), "\n")
  cat(" Batas bawah:", round(batas_bawah, 2),
      "| Batas atas:", round(batas_atas, 2), "\n")
  # na.rm = TRUE keeps the count defined when x has missing values
  cat(" Jumlah pencilan:", sum(pencilan, na.rm = TRUE), "\n\n")
  return(pencilan)
}
# Apply IQR detection to the main clinical columns
kolom_klinis <- c("Tekanan_Darah", "Kolesterol", "Glukosa",
                  "BMI", "Detak_Jantung")
outlier_flags <- lapply(kolom_klinis, function(k) {
  deteksi_iqr(df[[k]], k)
})
#> ─── Kolom: Tekanan_Darah ───
#> Q1: 110.38 | Q3: 131.85 | IQR: 21.48
#> Batas bawah: 78.16 | Batas atas: 164.06
#> Jumlah pencilan: 1
#>
#> ─── Kolom: Kolesterol ───
#> Q1: 176.1 | Q3: 233.27 | IQR: 57.17
#> Batas bawah: 90.34 | Batas atas: 319.04
#> Jumlah pencilan: 2
#>
#> ─── Kolom: Glukosa ───
#> Q1: 77.47 | Q3: 103.5 | IQR: 26.03
#> Batas bawah: 38.44 | Batas atas: 142.54
#> Jumlah pencilan: 4
#>
#> ─── Kolom: BMI ───
#> Q1: 22.2 | Q3: 27.8 | IQR: 5.6
#> Batas bawah: 13.8 | Batas atas: 36.2
#> Jumlah pencilan: 4
#>
#> ─── Kolom: Detak_Jantung ───
#> Q1: 68.5 | Q3: 81.73 | IQR: 13.23
#> Batas bawah: 48.66 | Batas atas: 101.56
#> Jumlah pencilan: 4
names(outlier_flags) <- kolom_klinis
# Z-score method for cholesterol
df$Z_Kolesterol <- scale(df$Kolesterol)[, 1]
# Z-score outliers: |Z| > 3
pencilan_z <- abs(df$Z_Kolesterol) > 3
cat("Pencilan Kolesterol (|Z| > 3):", sum(pencilan_z), "pasien\n")
#> Pencilan Kolesterol (|Z| > 3): 2 pasien
# Show the cholesterol outliers
if (sum(pencilan_z) > 0) {
cat("\nDetail pencilan kolesterol:\n")
print(df[pencilan_z, c("ID_Pasien", "Kolesterol", "Z_Kolesterol",
"Kondisi_Kesehatan")])
}
#>
#> Detail pencilan kolesterol:
#> ID_Pasien Kolesterol Z_Kolesterol Kondisi_Kesehatan
#> 263 so709AFzB0UJ 325.0 3.038673 Hypertension
#> 496 qiymUnCmMIs9 65.4 -3.422267 Obesity
# Handling: winsorization (cap values at the IQR fences)
# Applied to cholesterol (take domain knowledge into account)
Q1_kol <- quantile(df$Kolesterol, 0.25)
Q3_kol <- quantile(df$Kolesterol, 0.75)
IQR_kol <- Q3_kol - Q1_kol
df$Kolesterol_Clean <- pmin(
pmax(df$Kolesterol, Q1_kol - 1.5 * IQR_kol),
Q3_kol + 1.5 * IQR_kol
)
cat("\nSebelum winsorization — rentang kolesterol:",
round(min(df$Kolesterol), 1), "–", round(max(df$Kolesterol), 1), "\n")
#>
#> Sebelum winsorization — rentang kolesterol: 65.4 – 325
cat("Setelah winsorization — rentang kolesterol:",
round(min(df$Kolesterol_Clean), 1), "–",
round(max(df$Kolesterol_Clean), 1), "\n")
#> Setelah winsorization — rentang kolesterol: 90.3 – 319
# Boxplots before and after outlier handling (cholesterol)
par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))
boxplot(df$Kolesterol, main = "Kolesterol\n(Asli)",
col = "#eef4fd", border = "#1e4d7a",
ylab = "mg/dL", cex.main = 0.9)
boxplot(df$Kolesterol_Clean, main = "Kolesterol\n(Setelah Winsorization)",
col = "#eaf4f0", border = "#2d7a5f",
ylab = "mg/dL", cex.main = 0.9)
par(mfrow = c(1, 1))
# Glucose distribution with the IQR fences
Q1_g <- quantile(df$Glukosa, 0.25)
Q3_g <- quantile(df$Glukosa, 0.75)
IQR_g <- Q3_g - Q1_g
ggplot(df, aes(x = Glukosa)) +
geom_histogram(bins = 30, fill = "#3d3d5c", color = "white", alpha = 0.8) +
geom_vline(xintercept = Q1_g - 1.5 * IQR_g, color = "#8b2a2a", linetype = "dashed") +
geom_vline(xintercept = Q3_g + 1.5 * IQR_g, color = "#8b2a2a", linetype = "dashed") +
labs(title = "Distribusi Glukosa + Batas IQR",
x = "Glukosa (mg/dL)", y = "Frekuensi") +
theme_minimal(base_size = 11) +
theme(plot.title = element_text(face = "bold"))
📘 Interpretation – 6.9.2.5 The IQR rule flags a small number of outliers in every clinical column: 1 for blood pressure, 2 for cholesterol, and 4 each for glucose, BMI, and heart rate. For cholesterol, the |Z| > 3 rule flags the same two patients (325.0 and 65.4 mg/dL). Winsorization is applied to cholesterol, capping values at the IQR fences so that no patient record has to be dropped. Domain knowledge matters here: a very high glucose value may be valid for a diabetic patient, so outliers should be handled selectively according to clinical context.
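The capping step can be wrapped in a small reusable helper. The sketch below is an illustration under stated assumptions: `winsorize_iqr()` is a hypothetical function name and the glucose values are made up. It also shows that the IQR and Z-score rules need not agree, because in a very small sample a single extreme value inflates the standard deviation and may stay under |Z| > 3.

```r
# Cap values outside the fences Q1 - k * IQR and Q3 + k * IQR
# (hypothetical helper, not part of the original pipeline)
winsorize_iqr <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

# Made-up glucose readings with one extreme value
glukosa_demo <- c(80, 85, 90, 95, 100, 300)

glukosa_capped <- winsorize_iqr(glukosa_demo)
z_demo <- as.numeric(scale(glukosa_demo))

print(glukosa_capped)         # the extreme reading is pulled to the upper fence
print(round(z_demo, 2))       # Z-scores of the same readings
print(any(abs(z_demo) > 3))   # whether the |Z| > 3 rule flags anything
```

Because `pmin()` and `pmax()` propagate `NA`, missing readings stay missing rather than being replaced by a fence value.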
Temporal and Rolling Features
Extracts time-based features and computes rolling means of the clinical measurements.
# 1. Extract time components from the Tanggal column
df$Tahun <- as.integer(format(df$Tanggal, "%Y"))
df$Bulan <- as.integer(format(df$Tanggal, "%m"))
df$Kuartal <- paste0("Q", ceiling(df$Bulan / 3))
cat("Distribusi rekam per Tahun:\n")
#> Distribusi rekam per Tahun:
print(table(df$Tahun))
#>
#> 2020 2021 2022 2023 2024
#> 97 97 109 110 87
cat("\nDistribusi rekam per Kuartal:\n")
#>
#> Distribusi rekam per Kuartal:
print(table(df$Kuartal))
#>
#> Q1 Q2 Q3 Q4
#> 113 134 134 119
# 2. Monthly mean glucose (temporal trend)
df_bulanan <- df %>%
mutate(Periode = format(Tanggal, "%Y-%m")) %>%
group_by(Periode) %>%
summarise(
Rata_Glukosa = mean(Glukosa, na.rm = TRUE),
Rata_BMI = mean(BMI, na.rm = TRUE),
N = n(),
.groups = "drop"
) %>%
arrange(Periode)
cat("Rata-rata Glukosa per Periode (6 pertama):\n")
#> Rata-rata Glukosa per Periode (6 pertama):
print(head(df_bulanan, 6))
#> # A tibble: 6 × 4
#> Periode Rata_Glukosa Rata_BMI N
#> <chr> <dbl> <dbl> <int>
#> 1 2020-01 87.0 25.5 6
#> 2 2020-02 95.8 23.6 10
#> 3 2020-03 80.4 24.9 10
#> 4 2020-04 96.2 27.2 8
#> 5 2020-05 108. 23.6 7
#> 6 2020-06 90.4 25.5 12
# 3. Rolling mean of glucose with a window of 3 periods
# Simple custom function, no extra packages needed
rolling_mean <- function(x, window = 3) {
  n <- length(x)
  result <- rep(NA_real_, n)
  # Guard: with n < window, window:n would count downward
  if (n < window) return(result)
  for (i in window:n) {
    result[i] <- mean(x[(i - window + 1):i], na.rm = TRUE)
  }
  return(result)
}
df_bulanan$Rolling_Glukosa_3 <- rolling_mean(df_bulanan$Rata_Glukosa, 3)
cat("\nData tren dengan Rolling Mean (3 periode):\n")
#>
#> Data tren dengan Rolling Mean (3 periode):
print(df_bulanan[, c("Periode", "Rata_Glukosa", "Rolling_Glukosa_3", "N")])
#> # A tibble: 60 × 4
#> Periode Rata_Glukosa Rolling_Glukosa_3 N
#> <chr> <dbl> <dbl> <int>
#> 1 2020-01 87.0 NA 6
#> 2 2020-02 95.8 NA 10
#> 3 2020-03 80.4 87.7 10
#> 4 2020-04 96.2 90.8 8
#> 5 2020-05 108. 94.8 7
#> 6 2020-06 90.4 98.1 12
#> 7 2020-07 81.7 93.3 9
#> 8 2020-08 77.1 83.1 5
#> 9 2020-09 98.5 85.8 10
#> 10 2020-10 87.0 87.5 9
#> # ℹ 50 more rows
# 4. Averages per season
df_musim <- df %>%
group_by(Musim) %>%
summarise(
Rata_Glukosa = round(mean(Glukosa), 2),
Rata_TD = round(mean(Tekanan_Darah), 2),
Rata_BMI = round(mean(BMI), 2),
N_Pasien = n(),
.groups = "drop"
)
cat("Statistik rata-rata per Musim:\n")
#> Statistik rata-rata per Musim:
print(df_musim)
#> # A tibble: 3 × 5
#> Musim Rata_Glukosa Rata_TD Rata_BMI N_Pasien
#> <fct> <dbl> <dbl> <dbl> <int>
#> 1 Dry Season 91.3 122. 25.3 177
#> 2 Rainy Season 91.6 120. 25.2 152
#> 3 Transitional Season 90.5 121. 24.7 171
# Plot the monthly glucose trend with the rolling mean
ggplot(df_bulanan, aes(x = Periode, group = 1)) +
geom_line(aes(y = Rata_Glukosa), color = "#6b6b8e",
linewidth = 0.8, alpha = 0.7) +
geom_line(aes(y = Rolling_Glukosa_3), color = "#2d7a5f",
linewidth = 1.2, na.rm = TRUE) +
labs(title = "Tren Glukosa Bulanan + Rolling Mean (3 Periode)",
x = "Periode", y = "Rata-rata Glukosa (mg/dL)",
caption = "Abu-abu: aktual | Hijau: rolling mean") +
theme_minimal(base_size = 10) +
theme(axis.text.x = element_text(angle = 60, hjust = 1, size = 7),
plot.title = element_text(face = "bold", size = 10))
# Compare glucose across seasons
ggplot(df, aes(x = Musim, y = Glukosa, fill = Musim)) +
geom_boxplot(color = "#3d3d5c", alpha = 0.8, outlier.size = 1) +
scale_fill_manual(values = c("#eef4fd","#eaf4f0","#fdf8ec")) +
labs(title = "Distribusi Glukosa per Musim",
x = "Musim", y = "Glukosa (mg/dL)") +
theme_minimal(base_size = 11) +
theme(legend.position = "none", plot.title = element_text(face = "bold"))
📘 Interpretation – 6.9.2.6 Year, month, and quarter are extracted from the date column. The 3-period rolling mean smooths random fluctuation in the monthly glucose trend, which makes longer-running patterns easier to see. The per-season averages differ only slightly (roughly 1 mg/dL for glucose and 2 mmHg for blood pressure), so season on its own appears to carry little signal. The temporal features remain available as inputs to time-aware predictive models.
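A trailing rolling mean can also be computed without an explicit loop. The sketch below assumes made-up monthly values (`rata_demo` and `roll3` are hypothetical names) and uses `stats::filter()` with equal weights and `sides = 1`, which averages each value with the two before it. The `stats::` prefix is needed because dplyr masks `filter()`.

```r
# Made-up monthly averages, for illustration only
rata_demo <- c(90, 96, 84, 99, 105)

# Equal weights of 1/3 over the current and two previous periods;
# sides = 1 means only past values are used (trailing window)
roll3 <- as.numeric(stats::filter(rata_demo, rep(1 / 3, 3), sides = 1))

# The first two positions are NA because the window is incomplete there
print(roll3)
```

Unlike a loop that calls `mean(..., na.rm = TRUE)`, this version propagates missing values: any window containing an `NA` yields `NA`.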
Encoding Categorical Variables
Converts categories to numeric codes through one-hot encoding and label encoding.
# 1. One-hot encoding for Kondisi_Kesehatan
# Encodes each health condition as a binary indicator column
kondisi_levels <- levels(df$Kondisi_Kesehatan)
for (level in kondisi_levels) {
col_name <- paste0("Kondisi_", gsub(" ", "_", level))
df[[col_name]] <- as.integer(df$Kondisi_Kesehatan == level)
}
# Show the result of one-hot encoding
cat("Kolom one-hot yang dibuat:\n")
#> Kolom one-hot yang dibuat:
# Note: the pattern also matches the source column Kondisi_Kesehatan,
# so it is listed first in ohe_cols, followed by the five dummy columns
ohe_cols <- grep("^Kondisi_", names(df), value = TRUE)
print(ohe_cols)
#> [1] "Kondisi_Kesehatan" "Kondisi_Cardiovascular_Disease"
#> [3] "Kondisi_Diabetes" "Kondisi_Healthy"
#> [5] "Kondisi_Hypertension" "Kondisi_Obesity"
cat("\nContoh 5 baris pertama:\n")
#>
#> Contoh 5 baris pertama:
# ohe_cols already contains Kondisi_Kesehatan, so it is not repeated here
print(df[1:5, ohe_cols])
#> Kondisi_Kesehatan Kondisi_Cardiovascular_Disease Kondisi_Diabetes
#> 1 Healthy 0 0
#> 2 Diabetes 0 1
#> 3 Healthy 0 0
#> 4 Diabetes 0 1
#> 5 Healthy 0 0
#> Kondisi_Healthy Kondisi_Hypertension Kondisi_Obesity
#> 1 1 0 0
#> 2 0 0 0
#> 3 1 0 0
#> 4 0 0 0
#> 5 1 0 0
# 2. One-hot encoding for Musim
musim_levels <- levels(df$Musim)
for (level in musim_levels) {
col_name <- paste0("Musim_", gsub(" ", "_", level))
df[[col_name]] <- as.integer(df$Musim == level)
}
cat("Kolom one-hot Musim:\n")
#> Kolom one-hot Musim:
musim_cols <- grep("^Musim_", names(df), value = TRUE)
print(df[1:5, c("Musim", musim_cols)])
#> Musim Musim_Dry_Season Musim_Rainy_Season
#> 1 Dry Season 1 0
#> 2 Rainy Season 0 1
#> 3 Transitional Season 0 0
#> 4 Rainy Season 0 1
#> 5 Dry Season 1 0
#> Musim_Transitional_Season
#> 1 0
#> 2 0
#> 3 1
#> 4 0
#> 5 0
# 3. Label encoding (ordinal) for ranked variables
# BMI category: Underweight=1, Normal=2, Overweight=3, Obese=4
df$BMI_Label <- as.integer(factor(
df$Kategori_BMI,
levels = c("Underweight", "Normal", "Overweight", "Obese"),
ordered = TRUE
))
# Blood pressure level (ordinal)
df$TD_Label <- as.integer(factor(
df$Kategori_TD,
levels = c("Normal", "Meningkat", "Tinggi"),
ordered = TRUE
))
cat("Contoh Label Encoding:\n")
#> Contoh Label Encoding:
print(df[1:8, c("Kategori_BMI", "BMI_Label", "Kategori_TD", "TD_Label")])
#> Kategori_BMI BMI_Label Kategori_TD TD_Label
#> 1 Obese 4 Meningkat 2
#> 2 Overweight 3 Normal 1
#> 3 Underweight 1 Tinggi 3
#> 4 Normal 2 Meningkat 2
#> 5 Obese 4 Normal 1
#> 6 Overweight 3 Meningkat 2
#> 7 Underweight 1 Normal 1
#> 8 Overweight 3 Normal 1
# 4. Label encoding for Lokasi (arbitrary integers; the order carries no meaning)
df$Lokasi_Label <- as.integer(df$Lokasi)
cat("\nPemetaan Lokasi → Label:\n")
#>
#> Pemetaan Lokasi → Label:
print(data.frame(
  Lokasi = levels(df$Lokasi),
  Label = seq_along(levels(df$Lokasi))
))
#> Lokasi Label
#> 1 Bandung 1
#> 2 Jakarta 2
#> 3 Makassar 3
#> 4 Medan 4
#> 5 Surabaya 5
# Summary of the numerically encoded columns
cat("Ringkasan kolom setelah encoding:\n")
#> Ringkasan kolom setelah encoding:
# Dummy columns are named explicitly because ohe_cols[1] is the original factor
encoded_cols <- c("BMI_Label", "TD_Label", "Lokasi_Label",
                  "Kondisi_Cardiovascular_Disease", "Kondisi_Diabetes",
                  musim_cols[1:2])
summary(df[, encoded_cols])
#> BMI_Label TD_Label Lokasi_Label Kondisi_Cardiovascular_Disease
#> Min. :1.00 Min. :1.000 Min. :1.00 Min. :0.000
#> 1st Qu.:2.00 1st Qu.:1.000 1st Qu.:2.00 1st Qu.:0.000
#> Median :3.00 Median :2.000 Median :3.00 Median :0.000
#> Mean :2.59 Mean :1.642 Mean :2.93 Mean :0.166
#> 3rd Qu.:3.00 3rd Qu.:2.000 3rd Qu.:4.00 3rd Qu.:0.000
#> Max. :4.00 Max. :3.000 Max. :5.00 Max. :1.000
#> Kondisi_Diabetes Musim_Dry_Season Musim_Rainy_Season
#> Min. :0.000 Min. :0.000 Min. :0.000
#> 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
#> Median :0.000 Median :0.000 Median :0.000
#> Mean :0.178 Mean :0.354 Mean :0.304
#> 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:1.000
#> Max. :1.000 Max. :1.000 Max. :1.000
Encoding Strategy
| Method | Used for | Examples |
|---|---|---|
| One-Hot | Nominal variables with no order | Kondisi_Kesehatan, Musim, Lokasi |
| Label/Ordinal | Variables with a natural ranking | Kategori_BMI, Kategori_TD |
| Binary | Yes/no condition flags | Has_Diabetes, Has_Hypertension |
📘 Interpretation – 6.9.2.7 One-hot encoding produces one binary column per category of health condition and season, which avoids imposing an artificial order on nominal data. Label encoding is applied to variables with a natural order (Kategori_BMI, Kategori_TD). Lokasi_Label is the exception: its integer codes simply follow the alphabetical order of the city names and carry no meaning. Together the two techniques yield numeric representations that machine learning algorithms such as logistic regression, SVM, and neural networks can use.
📌 Note on Multicollinearity When one-hot encoded columns are used in a regression, drop one dummy column per variable to avoid perfect multicollinearity (the dummy variable trap). For a variable with k categories, use k-1 dummy columns.
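One way to obtain k-1 dummy columns directly is base R's `model.matrix()`, which treats the first factor level as the reference and omits its column. The sketch below is a minimal illustration on a made-up four-row data frame; `musim_demo` and `dummy_k1` are hypothetical names.

```r
# Made-up season values covering all three levels
musim_demo <- data.frame(
  Musim = factor(c("Dry Season", "Rainy Season",
                   "Transitional Season", "Rainy Season"))
)

# model.matrix() returns an intercept plus k - 1 dummy columns;
# dropping column 1 (the intercept) leaves only the dummies
dummy_k1 <- model.matrix(~ Musim, data = musim_demo)[, -1, drop = FALSE]

print(colnames(dummy_k1))
print(dummy_k1)
```

Rows belonging to the reference level ("Dry Season" here, the first level alphabetically) have zeros in every dummy column.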
Normalizing and Scaling Features
Applies Z-score normalization and min-max scaling to the numeric clinical features for modeling.
# 1. Z-score normalization (standardization)
# Z = (x - mean) / sd → mean = 0, sd = 1
fitur_numerik <- c("BMI", "Tekanan_Darah", "Kolesterol",
"Glukosa", "Detak_Jantung", "Usia")
for (col in fitur_numerik) {
col_z <- paste0(col, "_Z")
df[[col_z]] <- as.numeric(scale(df[[col]]))
}
# Check: mean ≈ 0, sd ≈ 1
cat("Verifikasi Z-score (mean & sd):\n")
#> Verifikasi Z-score (mean & sd):
z_cols <- paste0(fitur_numerik, "_Z")
# The argument is named nm so that it does not shadow base::c()
z_check <- sapply(z_cols, function(nm) {
  c(mean = round(mean(df[[nm]], na.rm = TRUE), 5),
    sd = round(sd(df[[nm]], na.rm = TRUE), 3))
})
print(t(z_check))
#> mean sd
#> BMI_Z 0 1
#> Tekanan_Darah_Z 0 1
#> Kolesterol_Z 0 1
#> Glukosa_Z 0 1
#> Detak_Jantung_Z 0 1
#> Usia_Z 0 1
# 2. Min-max normalization
# MinMax = (x - min) / (max - min) → range [0, 1]
minmax_norm <- function(x) {
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
for (col in fitur_numerik) {
col_mm <- paste0(col, "_MM")
df[[col_mm]] <- minmax_norm(df[[col]])
}
# Check: range [0, 1]
cat("\nVerifikasi Min-Max (min & max):\n")
#>
#> Verifikasi Min-Max (min & max):
mm_cols <- paste0(fitur_numerik, "_MM")
# The argument is named nm so that it does not shadow base::c()
mm_check <- sapply(mm_cols, function(nm) {
  c(min = round(min(df[[nm]], na.rm = TRUE), 3),
    max = round(max(df[[nm]], na.rm = TRUE), 3))
})
print(t(mm_check))
#> min max
#> BMI_MM 0 1
#> Tekanan_Darah_MM 0 1
#> Kolesterol_MM 0 1
#> Glukosa_MM 0 1
#> Detak_Jantung_MM 0 1
#> Usia_MM 0 1
# Compare original, Z-score, and min-max values for BMI
cat("\nPerbandingan BMI: Asli | Z-score | Min-Max (5 baris):\n")
#>
#> Perbandingan BMI: Asli | Z-score | Min-Max (5 baris):
print(df[1:5, c("BMI", "BMI_Z", "BMI_MM")])
#> BMI BMI_Z BMI_MM
#> 1 31.5 1.536953 0.7439024
#> 2 26.9 0.434473 0.5569106
#> 3 18.2 -1.650652 0.2032520
#> 4 19.9 -1.243214 0.2723577
#> 5 32.5 1.776623 0.7845528
# Final ready-to-use dataset (selected columns)
df_final <- df %>%
select(ID_Pasien, Tanggal, Usia, Lokasi, Kondisi_Kesehatan,
BMI_Z, Tekanan_Darah_Z, Kolesterol_Z, Glukosa_Z, Detak_Jantung_Z,
BMI_MM, Tekanan_Darah_MM, Kolesterol_MM, Glukosa_MM,
Kategori_BMI, Kategori_TD, Has_Diabetes, Has_Hypertension,
Skor_Risiko, BMI_Label, TD_Label)
cat("Dimensi dataset final siap ML:", nrow(df_final), "x", ncol(df_final), "\n")
#> Dimensi dataset final siap ML: 500 x 21
cat("Kolom dataset final:\n")
#> Kolom dataset final:
print(names(df_final))
#> [1] "ID_Pasien" "Tanggal" "Usia"
#> [4] "Lokasi" "Kondisi_Kesehatan" "BMI_Z"
#> [7] "Tekanan_Darah_Z" "Kolesterol_Z" "Glukosa_Z"
#> [10] "Detak_Jantung_Z" "BMI_MM" "Tekanan_Darah_MM"
#> [13] "Kolesterol_MM" "Glukosa_MM" "Kategori_BMI"
#> [16] "Kategori_TD" "Has_Diabetes" "Has_Hypertension"
#> [19] "Skor_Risiko" "BMI_Label" "TD_Label"
# Compare the BMI distribution before and after normalization
par(mfrow = c(2, 1), mar = c(3.5, 3.5, 2.5, 1))
hist(df$BMI, breaks = 30, col = "#eef4fd", border = "#1e4d7a",
main = "BMI — Nilai Asli", xlab = "BMI", ylab = "Frekuensi",
cex.main = 0.9, cex.axis = 0.8)
hist(df$BMI_Z, breaks = 30, col = "#eaf4f0", border = "#2d7a5f",
main = "BMI — Setelah Z-score Normalization",
xlab = "Z-score", ylab = "Frekuensi",
cex.main = 0.9, cex.axis = 0.8)
par(mfrow = c(1, 1))
# Scatter plot: glucose Z vs BMI Z, colored by condition
ggplot(df, aes(x = BMI_Z, y = Glukosa_Z, color = Kondisi_Kesehatan)) +
geom_point(alpha = 0.5, size = 1.5) +
scale_color_manual(values = c("#2d7a5f","#1e4d7a","#7a5c1e","#8b2a2a","#5a5a72")) +
labs(title = "BMI vs Glukosa (Z-score) per Kondisi",
x = "BMI (Z-score)", y = "Glukosa (Z-score)",
color = "Kondisi") +
theme_minimal(base_size = 10) +
theme(legend.position = "right",
legend.text = element_text(size = 8),
plot.title = element_text(face = "bold", size = 10))
📘 Interpretation – 6.9.2.8 Z-score normalization yields features with mean 0 and standard deviation 1, which suits gradient-based algorithms (logistic regression, SVM, neural networks). Min-max normalization maps features to the range [0, 1], which suits neural networks and distance-based algorithms such as k-NN. The final dataset has 21 columns ready for a machine learning pipeline: identifiers and original attributes, engineered features, label-encoded variables, and normalized versions of the clinical measurements.
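When the normalized features feed a model that is evaluated on held-out data, a common practice is to estimate the mean and standard deviation on the training rows only and reuse them for the remaining rows. The sketch below assumes an arbitrary split of eight BMI values; `bmi_demo`, `idx_train`, `mu`, and `sigma` are hypothetical names, and the split is purely illustrative.

```r
# Eight BMI values used only for illustration
bmi_demo <- c(31.5, 26.9, 18.2, 19.9, 32.5, 27.4, 17.1, 25.3)
idx_train <- 1:6   # assumed training rows; the rest act as held-out data

# scale() stores the parameters it used as attributes of its result
train_scaled <- scale(bmi_demo[idx_train])
mu    <- attr(train_scaled, "scaled:center")
sigma <- attr(train_scaled, "scaled:scale")

# Reuse the training parameters for the held-out values
test_z <- (bmi_demo[-idx_train] - mu) / sigma

print(round(c(mean = mu, sd = sigma), 3))
print(round(test_z, 3))
```

The held-out Z-scores will generally not have mean 0 and standard deviation 1, which is expected: they are expressed on the training scale.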