Using the Titanic Dataset
R is a programming language widely used for statistical analysis. Many free packages can be installed for R, one of which is dplyr. The Titanic dataset used here ships with base R in the datasets package; dplyr supplies the functions used to manipulate it.
The Titanic dataset records the fate of the people aboard the Titanic, which sank in 1912. The data are summarized by economic status (class), sex, age, and survival.
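As a quick check of the structure described above, the following sketch inspects Titanic using only base R. The variable name titanic_df is an assumption chosen for illustration and is not used elsewhere in this post.

```r
# Titanic ships with base R (package 'datasets'), so no extra package is needed here.
dim(Titanic)               # a 4-dimensional table: 4 x 2 x 2 x 2
names(dimnames(Titanic))   # "Class" "Sex" "Age" "Survived"

# Flatten the table into a data frame: one row per combination, with the count in Freq.
titanic_df <- as.data.frame(Titanic)
nrow(titanic_df)           # 32 combinations
sum(titanic_df$Freq)       # 2201 people in total
```

Each row of the data frame is a cell of the original table, so Freq holds a count of people, not an individual passenger record.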
Loading the dplyr Package
To follow the examples below, install the dplyr package and load it as follows:
# Remove the leading hash (#) from the next line to install dplyr
# install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
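The summarise() Function
The summarise() function collapses the data into summary values, here the total of the Freq column.
# Convert the Titanic table to a data frame (one row per combination)
data <- data.frame(Titanic)
# Total number of people on board: the sum of the Freq column
penumpang <- summarise(data, penumpang = sum(Freq))
penumpang
##   penumpang
## 1      2201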
The same result can be obtained with the pipe operator:
penumpang <- data %>%
  summarise(penumpang = sum(Freq))
penumpang
##   penumpang
## 1      2201
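summarise() can also compute several values in one call. The sketch below is a small illustration; the object name overview and the column names total, largest, and rows are assumptions, not part of the original analysis.

```r
library(dplyr)

data <- data.frame(Titanic)

# Several summaries at once; the names on the left are illustrative.
overview <- data %>%
  summarise(
    total   = sum(Freq),  # everyone on board
    largest = max(Freq),  # size of the largest single group
    rows    = n()         # number of rows (class/sex/age/survival combinations)
  )
overview
```

The result is still a single row, with one column per summary.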
The arrange() Function
The arrange() function is used to sort data. Wrapping a column in desc() sorts it in descending order.
data %>%
  arrange(desc(Freq))
## Class Sex Age Survived Freq
## 1 Crew Male Adult No 670
## 2 3rd Male Adult No 387
## 3 Crew Male Adult Yes 192
## 4 2nd Male Adult No 154
## 5 1st Female Adult Yes 140
## 6 1st Male Adult No 118
## 7 3rd Female Adult No 89
## 8 2nd Female Adult Yes 80
## 9 3rd Female Adult Yes 76
## 10 3rd Male Adult Yes 75
## 11 1st Male Adult Yes 57
## 12 3rd Male Child No 35
## 13 Crew Female Adult Yes 20
## 14 3rd Female Child No 17
## 15 3rd Female Child Yes 14
## 16 2nd Male Adult Yes 14
## 17 2nd Female Adult No 13
## 18 3rd Male Child Yes 13
## 19 2nd Female Child Yes 13
## 20 2nd Male Child Yes 11
## 21 1st Male Child Yes 5
## 22 1st Female Adult No 4
## 23 Crew Female Adult No 3
## 24 1st Female Child Yes 1
## 25 1st Male Child No 0
## 26 2nd Male Child No 0
## 27 Crew Male Child No 0
## 28 1st Female Child No 0
## 29 2nd Female Child No 0
## 30 Crew Female Child No 0
## 31 Crew Male Child Yes 0
## 32 Crew Female Child Yes 0
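arrange() accepts more than one sort key, and it combines naturally with head() to keep only the top rows. A minimal sketch, with by_class and top3 as illustrative names:

```r
library(dplyr)

data <- data.frame(Titanic)

# Sort by class first, then by descending count within each class.
by_class <- data %>%
  arrange(Class, desc(Freq))

# Keep only the three largest groups overall.
top3 <- data %>%
  arrange(desc(Freq)) %>%
  head(3)
top3
```

Because Class is a factor, it is sorted by its level order (1st, 2nd, 3rd, Crew) rather than alphabetically.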
The filter() Function
The filter() function keeps only the rows that satisfy a condition.
data %>%
  filter(Sex == "Female")
## Class Sex Age Survived Freq
## 1 1st Female Child No 0
## 2 2nd Female Child No 0
## 3 3rd Female Child No 17
## 4 Crew Female Child No 0
## 5 1st Female Adult No 4
## 6 2nd Female Adult No 13
## 7 3rd Female Adult No 89
## 8 Crew Female Adult No 3
## 9 1st Female Child Yes 1
## 10 2nd Female Child Yes 13
## 11 3rd Female Child Yes 14
## 12 Crew Female Child Yes 0
## 13 1st Female Adult Yes 140
## 14 2nd Female Adult Yes 80
## 15 3rd Female Adult Yes 76
## 16 Crew Female Adult Yes 20
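Several conditions can be passed to filter() at once; conditions separated by commas are combined with a logical AND. The sketch below uses the illustrative name female_survivors.

```r
library(dplyr)

data <- data.frame(Titanic)

# Comma-separated conditions are combined with AND.
female_survivors <- data %>%
  filter(Sex == "Female", Survived == "Yes")

nrow(female_survivors)       # 8 rows: 4 classes x 2 age groups
sum(female_survivors$Freq)   # 344 women and girls survived
```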
The select() Function
The select() function keeps a subset of the data's columns (variables).
data %>%
  select(Sex, Survived, Freq)
## Sex Survived Freq
## 1 Male No 0
## 2 Male No 0
## 3 Male No 35
## 4 Male No 0
## 5 Female No 0
## 6 Female No 0
## 7 Female No 17
## 8 Female No 0
## 9 Male No 118
## 10 Male No 154
## 11 Male No 387
## 12 Male No 670
## 13 Female No 4
## 14 Female No 13
## 15 Female No 89
## 16 Female No 3
## 17 Male Yes 5
## 18 Male Yes 11
## 19 Male Yes 13
## 20 Male Yes 0
## 21 Female Yes 1
## 22 Female Yes 13
## 23 Female Yes 14
## 24 Female Yes 0
## 25 Male Yes 57
## 26 Male Yes 14
## 27 Male Yes 75
## 28 Male Yes 192
## 29 Female Yes 140
## 30 Female Yes 80
## 31 Female Yes 76
## 32 Female Yes 20
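select() can also drop columns, take a range of adjacent columns, and rename a column while selecting it. A short sketch; the names no_freq, renamed, and Count are assumptions for illustration.

```r
library(dplyr)

data <- data.frame(Titanic)

# Drop a column with a minus sign.
no_freq <- data %>%
  select(-Freq)
names(no_freq)   # "Class" "Sex" "Age" "Survived"

# Select a contiguous range of columns and rename one on the way.
renamed <- data %>%
  select(Class:Age, Count = Freq)
names(renamed)   # "Class" "Sex" "Age" "Count"
```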
The mutate() Function
The mutate() function adds new variables (columns) to the data. Here the new column Freq_19 is Freq multiplied by 19.
data %>%
  mutate(Freq_19 = Freq * 19)
## Class Sex Age Survived Freq Freq_19
## 1 1st Male Child No 0 0
## 2 2nd Male Child No 0 0
## 3 3rd Male Child No 35 665
## 4 Crew Male Child No 0 0
## 5 1st Female Child No 0 0
## 6 2nd Female Child No 0 0
## 7 3rd Female Child No 17 323
## 8 Crew Female Child No 0 0
## 9 1st Male Adult No 118 2242
## 10 2nd Male Adult No 154 2926
## 11 3rd Male Adult No 387 7353
## 12 Crew Male Adult No 670 12730
## 13 1st Female Adult No 4 76
## 14 2nd Female Adult No 13 247
## 15 3rd Female Adult No 89 1691
## 16 Crew Female Adult No 3 57
## 17 1st Male Child Yes 5 95
## 18 2nd Male Child Yes 11 209
## 19 3rd Male Child Yes 13 247
## 20 Crew Male Child Yes 0 0
## 21 1st Female Child Yes 1 19
## 22 2nd Female Child Yes 13 247
## 23 3rd Female Child Yes 14 266
## 24 Crew Female Child Yes 0 0
## 25 1st Male Adult Yes 57 1083
## 26 2nd Male Adult Yes 14 266
## 27 3rd Male Adult Yes 75 1425
## 28 Crew Male Adult Yes 192 3648
## 29 1st Female Adult Yes 140 2660
## 30 2nd Female Adult Yes 80 1520
## 31 3rd Female Adult Yes 76 1444
## 32 Crew Female Adult Yes 20 380
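A new column can also be derived from a summary of an existing one. The sketch below adds each row's share of all people on board, in percent; the names with_share and Share are assumptions for illustration.

```r
library(dplyr)

data <- data.frame(Titanic)

# Add each row's share of all people on board, in percent.
with_share <- data %>%
  mutate(Share = 100 * Freq / sum(Freq))

# Show the three largest groups together with their share.
with_share %>%
  arrange(desc(Share)) %>%
  head(3)
```

Inside mutate(), sum(Freq) is computed over the whole column, so every row is divided by the same total.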
Data Aggregation
The data are aggregated by sex, survival, and age to show a frequency count per group. Note that n() counts the rows in each group, not the people: every group contains one row per class, which is why each count below is 4.
data %>%
  filter(Age == "Child") %>%
  group_by(Sex, Survived, Age) %>%
  summarise(freq = n(), .groups = "drop") %>%
  arrange(desc(freq))
## # A tibble: 4 x 4
## Sex Survived Age freq
## <fct> <fct> <fct> <int>
## 1 Male No Child 4
## 2 Male Yes Child 4
## 3 Female No Child 4
## 4 Female Yes Child 4
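# Survived is a factor with levels "No" and "Yes", so Survived != 0 is TRUE
# for every row and this filter keeps all 32 rows.
data %>%
  filter(Survived != 0) %>%
  group_by(Sex, Survived, Age) %>%
  summarise(freq = n(), .groups = "drop") %>%
  arrange(desc(freq))
## # A tibble: 8 x 4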
## Sex Survived Age freq
## <fct> <fct> <fct> <int>
## 1 Male No Child 4
## 2 Male No Adult 4
## 3 Male Yes Child 4
## 4 Male Yes Adult 4
## 5 Female No Child 4
## 6 Female No Adult 4
## 7 Female Yes Child 4
## 8 Female Yes Adult 4
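The aggregations above use n(), which counts rows. To count people instead, one option is to sum the Freq column within each group. This is a minimal sketch of that variation, not the original analysis; the names children and people are assumptions for illustration.

```r
library(dplyr)

data <- data.frame(Titanic)

# Sum Freq within each group to count people rather than rows.
children <- data %>%
  filter(Age == "Child") %>%
  group_by(Sex, Survived) %>%
  summarise(people = sum(Freq), .groups = "drop") %>%
  arrange(desc(people))
children
```

With this version the four groups add up to the 109 children recorded in the dataset.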