The "Glass Classification" dataset is a collection of data for identifying the type of a glass sample from the percentage of each material it contains. The dataset is provided by UCI Machine Learning under the Database Contents License v1.0 and can be downloaded from Kaggle.
glass <- read.csv("glass.csv")
str(glass)
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: int 1 1 1 1 1 1 1 1 1 1 ...
The glass data has 10 variables (columns) and 214 observations (rows). Based on the Kaggle dataset page, Glass Classification, each variable/column in the glass dataset is described as follows.
RI <- "Refractive index"
Na <- "Sodium (%)"
Mg <- "Magnesium (%)"
Al <- "Aluminium (%)"
Si <- "Silicon (%)"
K <- "Potassium (%)"
Ca <- "Calcium (%)"
Ba <- "Barium (%)"
Fe <- "Iron (%)"
Type <- "Glass type"
Note:
The Type column indicates the glass type, with values from \(1\) to \(7\), and is categorical. The column therefore needs to be converted from int to Factor.
glass$Type <- as.factor(glass$Type)
str(glass)
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: Factor w/ 6 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
levels(glass$Type)
## [1] "1" "2" "3" "5" "6" "7"
The output above shows that glass type \(4\) does not occur in the dataset. The dataset page confirms this.
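The effect of the conversion can be sketched on a small made-up vector (the `type_codes` values below are illustrative and not read from glass.csv). `as.factor()` creates levels only for values that actually occur, which is why level 4 is absent. Passing explicit `levels` to `factor()` is one possible way to keep the empty level visible, if that is wanted.

```r
# Toy integer codes standing in for glass$Type (illustrative values only)
type_codes <- c(1L, 1L, 2L, 3L, 5L, 6L, 7L, 7L)

# as.factor() derives the levels from the values that are present
observed <- as.factor(type_codes)
levels(observed)

# factor() with explicit levels keeps code 4 as an empty level
all_codes <- factor(type_codes, levels = 1:7)
table(all_codes)
```

Whether to keep the empty level is a design choice: dropping it keeps tables compact, while keeping it documents that the category exists but has no observations.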
The dataset page explains the numbers as follows:
1 <- "building-windows-float-processed"
2 <- "building-windows-nonfloat-processed"
3 <- "vehicle-windows-float-processed"
4 <- "vehicle-windows-nonfloat-processed" # not present in the dataset
5 <- "containers"
6 <- "tableware"
7 <- "headlamps"
To make later output easier to read, the Factor levels are renamed to the categories listed above.
type_name <- c(
  "building-windows-float-processed",
  "building-windows-nonfloat-processed",
  "vehicle-windows-float-processed",
  "containers",
  "tableware",
  "headlamps"
)
levels(glass$Type) <- type_name
str(glass)
## 'data.frame': 214 obs. of 10 variables:
## $ RI : num 1.52 1.52 1.52 1.52 1.52 ...
## $ Na : num 13.6 13.9 13.5 13.2 13.3 ...
## $ Mg : num 4.49 3.6 3.55 3.69 3.62 3.61 3.6 3.61 3.58 3.6 ...
## $ Al : num 1.1 1.36 1.54 1.29 1.24 1.62 1.14 1.05 1.37 1.36 ...
## $ Si : num 71.8 72.7 73 72.6 73.1 ...
## $ K : num 0.06 0.48 0.39 0.57 0.55 0.64 0.58 0.57 0.56 0.57 ...
## $ Ca : num 8.75 7.83 7.78 8.22 8.07 8.07 8.17 8.24 8.3 8.4 ...
## $ Ba : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Fe : num 0 0 0 0 0 0.26 0 0 0 0.11 ...
## $ Type: Factor w/ 6 levels "building-windows-float-processed",..: 1 1 1 1 1 1 1 1 1 1 ...
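Assigning to `levels()` renames by position, so the new names must be listed in exactly the same order as the existing levels. As a hedged alternative, the sketch below renames through a named lookup vector so that each code is matched explicitly. The toy factor and the `type_lookup` name are assumptions made for illustration.

```r
# Toy factor with the same codes as glass$Type (illustrative values only)
type_codes <- factor(c(1, 2, 3, 5, 6, 7, 1, 7))

# Hypothetical lookup table: code -> descriptive name
type_lookup <- c(
  "1" = "building-windows-float-processed",
  "2" = "building-windows-nonfloat-processed",
  "3" = "vehicle-windows-float-processed",
  "5" = "containers",
  "6" = "tableware",
  "7" = "headlamps"
)

# Rename each level by looking up its code instead of relying on position
levels(type_codes) <- unname(type_lookup[levels(type_codes)])
levels(type_codes)
```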
Display the first 10 rows of the glass dataset.
glass |> head(10)
Display the last 10 rows of the glass dataset.
glass |> tail(10)
The same rows displayed with rows and columns swapped (transposed).
glass |> tail(10) |> t() |> as.data.frame()
Dimensions of the dataset.
dim(glass)
## [1] 214 10
The same information appears when inspecting the structure of the glass object with str(...). The number of rows and columns of glass can also be obtained with nrow(...) and ncol(...).
glass |> nrow()
## [1] 214
glass |> ncol()
## [1] 10
A summary of the glass dataset can be viewed with summary(...).
summary(glass)
## RI Na Mg Al
## Min. :1.511 Min. :10.73 Min. :0.000 Min. :0.290
## 1st Qu.:1.517 1st Qu.:12.91 1st Qu.:2.115 1st Qu.:1.190
## Median :1.518 Median :13.30 Median :3.480 Median :1.360
## Mean :1.518 Mean :13.41 Mean :2.685 Mean :1.445
## 3rd Qu.:1.519 3rd Qu.:13.82 3rd Qu.:3.600 3rd Qu.:1.630
## Max. :1.534 Max. :17.38 Max. :4.490 Max. :3.500
## Si K Ca Ba
## Min. :69.81 Min. :0.0000 Min. : 5.430 Min. :0.000
## 1st Qu.:72.28 1st Qu.:0.1225 1st Qu.: 8.240 1st Qu.:0.000
## Median :72.79 Median :0.5550 Median : 8.600 Median :0.000
## Mean :72.65 Mean :0.4971 Mean : 8.957 Mean :0.175
## 3rd Qu.:73.09 3rd Qu.:0.6100 3rd Qu.: 9.172 3rd Qu.:0.000
## Max. :75.41 Max. :6.2100 Max. :16.190 Max. :3.150
## Fe Type
## Min. :0.00000 building-windows-float-processed :70
## 1st Qu.:0.00000 building-windows-nonfloat-processed:76
## Median :0.00000 vehicle-windows-float-processed :17
## Mean :0.05701 containers :13
## 3rd Qu.:0.10000 tableware : 9
## Max. :0.51000 headlamps :29
The output above shows the minimum, first quartile, median, mean, third quartile, and maximum of each numeric column, together with the number of observations for each glass type.
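To make the six statistics concrete, here is a minimal sketch on a made-up vector (not taken from glass.csv). It shows that the quartiles reported by `summary()` agree with `quantile()` at its default settings.

```r
# Small made-up numeric vector (not from glass.csv)
x <- c(1, 2, 3, 4, 10)

# summary() reports min, first quartile, median, mean, third quartile, max
stats <- summary(x)
stats

# The quartiles match quantile() with its default settings
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
quartiles
```

Here the mean (4) is larger than the median (3) because of the single large value, the same pattern seen in skewed columns such as K and Ba.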
Before exploring further, the dataset is checked for missing values.
anyNA(glass)
## [1] FALSE
Because anyNA(...) returns FALSE, the glass dataset has no missing/empty values. As a side note, the number of missing values in each column can also be checked by combining is.na(...) with colSums(...).
glass |> is.na() |> colSums()
## RI Na Mg Al Si K Ca Ba Fe Type
## 0 0 0 0 0 0 0 0 0 0
Both checks lead to the same conclusion: the glass dataset has no missing/empty values.
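Since glass has no missing values, the checks above never show a non-zero result. The sketch below uses a toy data frame with one deliberately missing cell (an assumption for illustration, not data from glass.csv) to show what the same calls return when something is missing.

```r
# Toy data frame with one deliberately missing value (not from glass.csv)
toy <- data.frame(
  Na = c(13.6, NA, 13.5),
  Mg = c(4.49, 3.60, 3.55)
)

# TRUE as soon as any cell is NA
anyNA(toy)

# Number of NA cells in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Keep only the rows that contain no NA at all
complete_rows <- toy[complete.cases(toy), ]
nrow(complete_rows)
```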
In this section, the data is subset and aggregated to obtain more detailed information than the summary above provides.
As a reminder, the dataset contains \(6\) glass types.
level_types <- levels(glass$Type)
names(level_types) <- c(1, 2, 3, 5, 6, 7) |> as.character()
level_types
## 1 2
## "building-windows-float-processed" "building-windows-nonfloat-processed"
## 3 5
## "vehicle-windows-float-processed" "containers"
## 6 7
## "tableware" "headlamps"
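The names are stored as character strings, so the lookup has to use a string such as "5" rather than the number 5. A minimal sketch with a two-entry toy vector (the `lookup` name is hypothetical) shows the difference.

```r
# Toy named character vector mirroring level_types (two entries only)
lookup <- c("1" = "building-windows-float-processed", "5" = "containers")

# Indexing with a character string looks up by name
lookup["5"]

# Indexing with a number looks up by position, so 5 is out of range here
lookup[5]

# [[ ]] returns the bare value without the name attached
lookup[["5"]]
```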
The following is the data for type \(1\), building-windows-float-processed.
glass_type1 <- glass |>
  subset(Type == level_types["1"])
glass_type1
glass_type1 |> dim()
## [1] 70 10
summary(glass_type1)
## RI Na Mg Al
## Min. :1.512 Min. :12.45 Min. :2.710 Min. :0.290
## 1st Qu.:1.518 1st Qu.:12.82 1st Qu.:3.480 1st Qu.:1.113
## Median :1.518 Median :13.20 Median :3.565 Median :1.230
## Mean :1.519 Mean :13.24 Mean :3.552 Mean :1.164
## 3rd Qu.:1.520 3rd Qu.:13.53 3rd Qu.:3.658 3rd Qu.:1.327
## Max. :1.527 Max. :14.77 Max. :4.490 Max. :1.690
## Si K Ca Ba
## Min. :71.35 Min. :0.0000 Min. : 7.780 Min. :0.00000
## 1st Qu.:72.08 1st Qu.:0.2000 1st Qu.: 8.430 1st Qu.:0.00000
## Median :72.81 Median :0.5600 Median : 8.675 Median :0.00000
## Mean :72.62 Mean :0.4474 Mean : 8.797 Mean :0.01271
## 3rd Qu.:73.02 3rd Qu.:0.5900 3rd Qu.: 9.053 3rd Qu.:0.00000
## Max. :73.70 Max. :0.6900 Max. :10.170 Max. :0.69000
## Fe Type
## Min. :0.000 building-windows-float-processed :70
## 1st Qu.:0.000 building-windows-nonfloat-processed: 0
## Median :0.000 vehicle-windows-float-processed : 0
## Mean :0.057 containers : 0
## 3rd Qu.:0.110 tableware : 0
## Max. :0.310 headlamps : 0
The output above gives the summary statistics for glass type \(1\) (building-windows-float-processed).
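The Type column of the summary still lists every glass type, with a count of 0 for the ones filtered out, because `subset()` keeps all original factor levels. If those empty levels are not wanted, `droplevels()` is one option. The sketch below uses a toy data frame with made-up values and level names.

```r
# Toy data frame with a two-level factor (illustrative values only)
toy <- data.frame(
  Mg   = c(3.5, 3.6, 0.0),
  Type = factor(c("float", "float", "containers"))
)

# subset() keeps every original level, even those with no rows left
float_only <- subset(toy, Type == "float")
table(float_only$Type)

# droplevels() removes the unused levels from the factor columns
float_clean <- droplevels(float_only)
table(float_clean$Type)
```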
Type \(2\) (building-windows-nonfloat-processed)
glass_type2 <- glass |>
  subset(Type == level_types["2"])
glass_type2
glass_type2 |> dim()
## [1] 76 10
glass_type2 |> summary()
## RI Na Mg Al
## Min. :1.514 Min. :10.73 Min. :0.000 Min. :0.560
## 1st Qu.:1.516 1st Qu.:12.88 1st Qu.:3.058 1st Qu.:1.248
## Median :1.517 Median :13.15 Median :3.520 Median :1.460
## Mean :1.519 Mean :13.11 Mean :3.002 Mean :1.408
## 3rd Qu.:1.518 3rd Qu.:13.43 3rd Qu.:3.623 3rd Qu.:1.570
## Max. :1.534 Max. :14.86 Max. :3.980 Max. :2.120
## Si K Ca Ba
## Min. :69.81 Min. :0.0000 Min. : 7.080 Min. :0.00000
## 1st Qu.:72.33 1st Qu.:0.4800 1st Qu.: 8.037 1st Qu.:0.00000
## Median :72.73 Median :0.5800 Median : 8.275 Median :0.00000
## Mean :72.60 Mean :0.5211 Mean : 9.074 Mean :0.05026
## 3rd Qu.:73.06 3rd Qu.:0.6500 3rd Qu.: 8.915 3rd Qu.:0.00000
## Max. :74.45 Max. :1.1000 Max. :16.190 Max. :3.15000
## Fe Type
## Min. :0.00000 building-windows-float-processed : 0
## 1st Qu.:0.00000 building-windows-nonfloat-processed:76
## Median :0.00000 vehicle-windows-float-processed : 0
## Mean :0.07974 containers : 0
## 3rd Qu.:0.15500 tableware : 0
## Max. :0.35000 headlamps : 0
Type \(3\) (vehicle-windows-float-processed)
glass_type3 <- glass |>
  subset(Type == level_types["3"])
glass_type3
glass_type3 |> dim()
## [1] 17 10
glass_type3 |> summary()
## RI Na Mg Al
## Min. :1.516 Min. :12.16 Min. :3.340 Min. :0.580
## 1st Qu.:1.517 1st Qu.:13.24 1st Qu.:3.400 1st Qu.:0.910
## Median :1.518 Median :13.42 Median :3.530 Median :1.280
## Mean :1.518 Mean :13.44 Mean :3.544 Mean :1.201
## 3rd Qu.:1.518 3rd Qu.:13.64 3rd Qu.:3.650 3rd Qu.:1.380
## Max. :1.522 Max. :14.32 Max. :3.900 Max. :1.760
## Si K Ca Ba
## Min. :71.36 Min. :0.0000 Min. :8.320 Min. :0.000000
## 1st Qu.:72.04 1st Qu.:0.1600 1st Qu.:8.530 1st Qu.:0.000000
## Median :72.64 Median :0.5600 Median :8.790 Median :0.000000
## Mean :72.40 Mean :0.4065 Mean :8.783 Mean :0.008824
## 3rd Qu.:72.70 3rd Qu.:0.5700 3rd Qu.:8.930 3rd Qu.:0.000000
## Max. :73.01 Max. :0.6100 Max. :9.650 Max. :0.150000
## Fe Type
## Min. :0.00000 building-windows-float-processed : 0
## 1st Qu.:0.00000 building-windows-nonfloat-processed: 0
## Median :0.00000 vehicle-windows-float-processed :17
## Mean :0.05706 containers : 0
## 3rd Qu.:0.09000 tableware : 0
## Max. :0.37000 headlamps : 0
Type \(5\) (containers)
glass_type5 <- glass |>
  subset(Type == level_types["5"])
glass_type5
glass_type5 |> dim()
## [1] 13 10
glass_type5 |> summary()
## RI Na Mg Al
## Min. :1.513 Min. :11.03 Min. :0.0000 Min. :1.400
## 1st Qu.:1.517 1st Qu.:12.73 1st Qu.:0.0000 1st Qu.:1.560
## Median :1.520 Median :12.97 Median :0.0000 Median :1.760
## Mean :1.519 Mean :12.83 Mean :0.7738 Mean :2.034
## 3rd Qu.:1.521 3rd Qu.:13.27 3rd Qu.:1.7100 3rd Qu.:2.170
## Max. :1.524 Max. :14.01 Max. :2.6800 Max. :3.500
## Si K Ca Ba
## Min. :69.89 Min. :0.13 Min. : 5.87 Min. :0.0000
## 1st Qu.:72.18 1st Qu.:0.38 1st Qu.: 9.70 1st Qu.:0.0000
## Median :72.69 Median :0.58 Median :11.27 Median :0.0000
## Mean :72.37 Mean :1.47 Mean :10.12 Mean :0.1877
## 3rd Qu.:73.39 3rd Qu.:0.97 3rd Qu.:11.53 3rd Qu.:0.0000
## Max. :73.88 Max. :6.21 Max. :12.50 Max. :2.2000
## Fe Type
## Min. :0.00000 building-windows-float-processed : 0
## 1st Qu.:0.00000 building-windows-nonfloat-processed: 0
## Median :0.00000 vehicle-windows-float-processed : 0
## Mean :0.06077 containers :13
## 3rd Qu.:0.00000 tableware : 0
## Max. :0.51000 headlamps : 0
Type \(6\) (tableware)
glass_type6 <- glass |>
  subset(Type == level_types["6"])
glass_type6
glass_type6 |> dim()
## [1] 9 10
glass_type6 |> summary()
## RI Na Mg Al
## Min. :1.511 Min. :13.79 Min. :0.000 Min. :0.340
## 1st Qu.:1.518 1st Qu.:14.09 1st Qu.:0.000 1st Qu.:1.190
## Median :1.519 Median :14.40 Median :1.740 Median :1.560
## Mean :1.517 Mean :14.65 Mean :1.306 Mean :1.367
## 3rd Qu.:1.519 3rd Qu.:14.56 3rd Qu.:2.240 3rd Qu.:1.660
## Max. :1.520 Max. :17.38 Max. :2.410 Max. :2.090
## Si K Ca Ba Fe
## Min. :72.37 Min. :0 Min. : 6.650 Min. :0 Min. :0
## 1st Qu.:72.50 1st Qu.:0 1st Qu.: 9.260 1st Qu.:0 1st Qu.:0
## Median :72.74 Median :0 Median : 9.570 Median :0 Median :0
## Mean :73.21 Mean :0 Mean : 9.357 Mean :0 Mean :0
## 3rd Qu.:73.48 3rd Qu.:0 3rd Qu.: 9.950 3rd Qu.:0 3rd Qu.:0
## Max. :75.41 Max. :0 Max. :11.220 Max. :0 Max. :0
## Type
## building-windows-float-processed :0
## building-windows-nonfloat-processed:0
## vehicle-windows-float-processed :0
## containers :0
## tableware :9
## headlamps :0
Type \(7\) (headlamps)
glass_type7 <- glass |>
  subset(Type == level_types["7"])
glass_type7
glass_type7 |> dim()
## [1] 29 10
glass_type7 |> summary()
## RI Na Mg Al
## Min. :1.511 Min. :11.95 Min. :0.0000 Min. :1.190
## 1st Qu.:1.516 1st Qu.:14.20 1st Qu.:0.0000 1st Qu.:1.870
## Median :1.517 Median :14.39 Median :0.0000 Median :2.060
## Mean :1.517 Mean :14.44 Mean :0.5383 Mean :2.123
## 3rd Qu.:1.517 3rd Qu.:14.86 3rd Qu.:0.0000 3rd Qu.:2.420
## Max. :1.524 Max. :15.79 Max. :3.3400 Max. :2.880
## Si K Ca Ba
## Min. :70.26 Min. :0.0000 Min. :5.430 Min. :0.00
## 1st Qu.:72.86 1st Qu.:0.0000 1st Qu.:8.440 1st Qu.:0.61
## Median :73.11 Median :0.0000 Median :8.670 Median :0.81
## Mean :72.97 Mean :0.3252 Mean :8.491 Mean :1.04
## 3rd Qu.:73.36 3rd Qu.:0.1400 3rd Qu.:8.950 3rd Qu.:1.59
## Max. :75.18 Max. :2.7000 Max. :9.760 Max. :2.88
## Fe Type
## Min. :0.00000 building-windows-float-processed : 0
## 1st Qu.:0.00000 building-windows-nonfloat-processed: 0
## Median :0.00000 vehicle-windows-float-processed : 0
## Mean :0.01345 containers : 0
## 3rd Qu.:0.00000 tableware : 0
## Max. :0.09000 headlamps :29
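The six subsets above repeat the same steps with a different type each time. As a hedged alternative, `split()` can produce one data frame per factor level in a single call. The sketch below uses a toy stand-in for glass with made-up values and only two types.

```r
# Toy stand-in for glass with two types (illustrative values only)
toy <- data.frame(
  Na   = c(13.6, 13.9, 14.4, 14.2),
  Mg   = c(4.49, 3.60, 0.00, 0.00),
  Type = factor(c("float", "float", "headlamps", "headlamps"))
)

# One data frame per factor level, stored in a named list
by_type <- split(toy, toy$Type)
names(by_type)

# Row count per type, then a summary of the numeric columns per type
rows_per_type <- sapply(by_type, nrow)
rows_per_type
summaries <- lapply(by_type, function(d) summary(d[c("Na", "Mg")]))
summaries$headlamps
```

Keeping the pieces in a named list avoids one variable per type and makes it straightforward to apply the same function to every group.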
This aggregation produces the same per-type statistics as running summary(...) on each glass type (as done during subsetting), one statistic at a time.
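A minimal sketch of formula-based aggregation on a toy data frame (made-up values, not from glass.csv) may help when reading the calls in this section. The formula `. ~ Type` means "every other column, grouped by Type", and `FUN` may also return several values at once.

```r
# Toy stand-in for glass (illustrative values only)
toy <- data.frame(
  Na   = c(13.0, 14.0, 14.5, 15.5),
  Mg   = c(4.0, 3.0, 0.0, 1.0),
  Type = factor(c("float", "float", "headlamps", "headlamps"))
)

# ". ~ Type" aggregates every other column, grouped by Type
type_means <- aggregate(. ~ Type, data = toy, FUN = mean)
type_means

# Several statistics at once by returning a named vector from FUN
na_range <- aggregate(Na ~ Type, data = toy,
                      FUN = function(v) c(min = min(v), max = max(v)))
na_range
```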
glass |> aggregate(x = . ~ Type, data = _, FUN = min)
glass |> aggregate(x = . ~ Type, data = _, FUN = max)
glass |> aggregate(x = . ~ Type, data = _, FUN = median)
glass |> aggregate(x = . ~ Type, data = _, FUN = mean)
glass |> aggregate(x = . ~ Type, data = _, FUN = length)
From the exploration steps above, the following results can be obtained:
summary(glass$Type) |> list(Count = _) |>
  as.data.frame()
glass |>
  aggregate(. ~ Type, data = _, mean)
material_max <- glass |>
  subset(select = -Type) |>
  sapply(max)
material_max
## RI Na Mg Al Si K Ca Ba
## 1.53393 17.38000 4.49000 3.50000 75.41000 6.21000 16.19000 3.15000
## Fe
## 0.51000
glass |>
  subset(
    RI == material_max["RI"]
  ) |>
  subset(select = c(RI, Type))
glass |>
  subset(
    Na == material_max["Na"]
  ) |>
  subset(select = c(Na, Type))
glass |>
  subset(
    Mg == material_max["Mg"]
  ) |>
  subset(select = c(Mg, Type))
glass |>
  subset(
    Al == material_max["Al"]
  ) |>
  subset(select = c(Al, Type))
glass |>
  subset(
    Si == material_max["Si"]
  ) |>
  subset(select = c(Si, Type))
glass |>
  subset(
    K == material_max["K"]
  ) |>
  subset(select = c(K, Type))
glass |>
  subset(
    Ca == material_max["Ca"]
  ) |>
  subset(select = c(Ca, Type))
glass |>
  subset(
    Ba == material_max["Ba"]
  ) |>
  subset(select = c(Ba, Type))
glass |>
  subset(
    Fe == material_max["Fe"]
  ) |>
  subset(select = c(Fe, Type))
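The nine lookups above follow one pattern: find the row where a material reaches its maximum and report the glass type. A possible compact variant loops over the column names instead. The sketch below runs on a toy stand-in for glass with made-up values, and the names `materials` and `max_by_material` are assumptions. Unlike the `==` comparison above, `which.max()` returns only the first maximum, so tied rows are not all listed.

```r
# Toy stand-in for glass (illustrative values only)
toy <- data.frame(
  Na   = c(13.6, 17.4, 14.4),
  Mg   = c(4.49, 0.00, 3.34),
  Type = factor(c("float", "tableware", "headlamps"))
)

# Names of the material columns, i.e. everything except Type
materials <- setdiff(names(toy), "Type")

# For each material, find the row holding the maximum and report its type.
# which.max() returns only the first maximum, so ties are not all listed.
max_by_material <- do.call(rbind, lapply(materials, function(m) {
  i <- which.max(toy[[m]])
  data.frame(Material = m, Max = toy[[m]][i], Type = toy$Type[i])
}))
max_by_material
```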