Using the Titanic Dataset

R is a programming language widely used for statistical analysis. Many packages can be installed for it free of charge, one of which is dplyr. The Titanic dataset used here is not part of dplyr: it ships with R itself, in the built-in datasets package. dplyr is used below only to manipulate it.

The Titanic dataset records the fate of the people on board the Titanic, which sank in 1912. The counts are cross-tabulated by economic status (class), sex, age, and survival.
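Before any dplyr work, it can help to look at the raw object. The sketch below uses only base R; the name titanic_df is a hypothetical one chosen for this illustration.

```r
# Titanic ships with base R (datasets package); no extra package is needed.
# It is a 4-dimensional contingency table: Class x Sex x Age x Survived.
dim(Titanic)
dimnames(Titanic)

# Flatten the table to one row per category combination,
# with the number of people stored in the Freq column.
titanic_df <- as.data.frame(Titanic)   # hypothetical name
nrow(titanic_df)       # number of category combinations
sum(titanic_df$Freq)   # total number of people recorded
```

Because the data are stored as counts per combination rather than one row per person, any total has to be computed from Freq.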

Loading the dplyr Package

To work with the dataset using dplyr, install the package once and then load it as follows:

# install.packages("dplyr")  # remove the leading # to run the installation
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
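The startup message above says that dplyr masks a few functions from stats and base. A minimal sketch of what that means in practice, assuming dplyr is the most recently attached package; the names uses_dplyr and moving_avg are hypothetical:

```r
library(dplyr)

# The bare name now refers to dplyr's version because dplyr was attached last.
uses_dplyr <- identical(filter, dplyr::filter)
uses_dplyr

# The masked stats version is still reachable through its namespace.
# Here it computes a centred 3-point moving average of 1..5.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```

Writing the namespace explicitly (dplyr::filter or stats::filter) avoids ambiguity when both versions are needed in the same script.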

The summarise() Function

The summarise() function collapses the data into a summary. Here it adds up the Freq column to get the total number of people on board.

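# Convert the Titanic table into a data frame: one row per category combination
data = data.frame(Titanic)
# "penumpang" is Indonesian for "passengers"
penumpang = summarise(data, penumpang = sum(Freq))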
penumpang
##   penumpang
## 1      2201

Alternatively, the same result can be written with the pipe operator (%>%):

penumpang = data %>% 
  summarise(penumpang = sum(Freq))
penumpang
##   penumpang
## 1      2201
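summarise() is not limited to a single value. The following sketch computes several summaries in one call; the column names total, survivors, and rate, and the object name overview, are hypothetical choices for this illustration.

```r
library(dplyr)

data <- data.frame(Titanic)

overview <- data %>%
  summarise(
    total     = sum(Freq),                     # everyone on board
    survivors = sum(Freq[Survived == "Yes"]),  # only the rows marked "Yes"
    rate      = survivors / total              # earlier columns can be reused
  )
overview
```

The result is still a one-row data frame, now with one column per summary.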

The arrange() Function

The arrange() function sorts the rows of the data. Here the rows are sorted by Freq in descending order.

data %>% 
  arrange(desc(Freq))
##    Class    Sex   Age Survived Freq
## 1   Crew   Male Adult       No  670
## 2    3rd   Male Adult       No  387
## 3   Crew   Male Adult      Yes  192
## 4    2nd   Male Adult       No  154
## 5    1st Female Adult      Yes  140
## 6    1st   Male Adult       No  118
## 7    3rd Female Adult       No   89
## 8    2nd Female Adult      Yes   80
## 9    3rd Female Adult      Yes   76
## 10   3rd   Male Adult      Yes   75
## 11   1st   Male Adult      Yes   57
## 12   3rd   Male Child       No   35
## 13  Crew Female Adult      Yes   20
## 14   3rd Female Child       No   17
## 15   3rd Female Child      Yes   14
## 16   2nd   Male Adult      Yes   14
## 17   2nd Female Adult       No   13
## 18   3rd   Male Child      Yes   13
## 19   2nd Female Child      Yes   13
## 20   2nd   Male Child      Yes   11
## 21   1st   Male Child      Yes    5
## 22   1st Female Adult       No    4
## 23  Crew Female Adult       No    3
## 24   1st Female Child      Yes    1
## 25   1st   Male Child       No    0
## 26   2nd   Male Child       No    0
## 27  Crew   Male Child       No    0
## 28   1st Female Child       No    0
## 29   2nd Female Child       No    0
## 30  Crew Female Child       No    0
## 31  Crew   Male Child      Yes    0
## 32  Crew Female Child      Yes    0
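arrange() also accepts more than one sort key. As a small sketch, the rows can be ordered by Class first and by descending Freq within each class; by_class is a hypothetical name.

```r
library(dplyr)

data <- data.frame(Titanic)

# Class is a factor, so it is sorted by its level order (1st, 2nd, 3rd, Crew).
# Ties within a class are broken by Freq, largest first.
by_class <- data %>%
  arrange(Class, desc(Freq))

head(by_class, 3)
```

Later keys are only consulted when earlier keys are tied, so the order of the arguments matters.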

The filter() Function

The filter() function keeps only the rows that satisfy a given condition. Here it keeps the rows where Sex is "Female".

data %>% 
  filter(Sex == "Female")
##    Class    Sex   Age Survived Freq
## 1    1st Female Child       No    0
## 2    2nd Female Child       No    0
## 3    3rd Female Child       No   17
## 4   Crew Female Child       No    0
## 5    1st Female Adult       No    4
## 6    2nd Female Adult       No   13
## 7    3rd Female Adult       No   89
## 8   Crew Female Adult       No    3
## 9    1st Female Child      Yes    1
## 10   2nd Female Child      Yes   13
## 11   3rd Female Child      Yes   14
## 12  Crew Female Child      Yes    0
## 13   1st Female Adult      Yes  140
## 14   2nd Female Adult      Yes   80
## 15   3rd Female Adult      Yes   76
## 16  Crew Female Adult      Yes   20
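Several conditions can be combined in one filter() call; conditions separated by commas must all hold. A minimal sketch, with girls_saved and upper_classes as hypothetical names:

```r
library(dplyr)

data <- data.frame(Titanic)

# Comma-separated conditions are combined with a logical AND.
girls_saved <- data %>%
  filter(Sex == "Female", Age == "Child", Survived == "Yes")
girls_saved

# %in% keeps rows whose value matches any element of a set.
upper_classes <- data %>%
  filter(Class %in% c("1st", "2nd"))
nrow(upper_classes)
```

Since the levels are stored as labels such as "Yes" and "Female", comparisons should use those strings rather than numbers.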

The select() Function

The select() function keeps only the chosen variables (columns) of the data.

data %>% 
  select(Sex, Survived, Freq)
##       Sex Survived Freq
## 1    Male       No    0
## 2    Male       No    0
## 3    Male       No   35
## 4    Male       No    0
## 5  Female       No    0
## 6  Female       No    0
## 7  Female       No   17
## 8  Female       No    0
## 9    Male       No  118
## 10   Male       No  154
## 11   Male       No  387
## 12   Male       No  670
## 13 Female       No    4
## 14 Female       No   13
## 15 Female       No   89
## 16 Female       No    3
## 17   Male      Yes    5
## 18   Male      Yes   11
## 19   Male      Yes   13
## 20   Male      Yes    0
## 21 Female      Yes    1
## 22 Female      Yes   13
## 23 Female      Yes   14
## 24 Female      Yes    0
## 25   Male      Yes   57
## 26   Male      Yes   14
## 27   Male      Yes   75
## 28   Male      Yes  192
## 29 Female      Yes  140
## 30 Female      Yes   80
## 31 Female      Yes   76
## 32 Female      Yes   20
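Besides listing columns by name, select() can drop columns, take a range, or rename while selecting. The sketch below shows the three forms; the object names and the new column name Gender are hypothetical.

```r
library(dplyr)

data <- data.frame(Titanic)

# Drop a column with a minus sign.
no_freq <- data %>% select(-Freq)

# Take every column from Class through Age, in their existing order.
class_to_age <- data %>% select(Class:Age)

# Rename a column while selecting it (new_name = old_name).
renamed <- data %>% select(Gender = Sex, Survived, Freq)

names(no_freq)
names(class_to_age)
names(renamed)
```

In all three cases the number of rows is unchanged; only the set of columns differs.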

The mutate() Function

The mutate() function adds new variables to the data. Here it adds Freq_19, which is Freq multiplied by 19.

data %>% 
  mutate(Freq_19 = Freq*19)
##    Class    Sex   Age Survived Freq Freq_19
## 1    1st   Male Child       No    0       0
## 2    2nd   Male Child       No    0       0
## 3    3rd   Male Child       No   35     665
## 4   Crew   Male Child       No    0       0
## 5    1st Female Child       No    0       0
## 6    2nd Female Child       No    0       0
## 7    3rd Female Child       No   17     323
## 8   Crew Female Child       No    0       0
## 9    1st   Male Adult       No  118    2242
## 10   2nd   Male Adult       No  154    2926
## 11   3rd   Male Adult       No  387    7353
## 12  Crew   Male Adult       No  670   12730
## 13   1st Female Adult       No    4      76
## 14   2nd Female Adult       No   13     247
## 15   3rd Female Adult       No   89    1691
## 16  Crew Female Adult       No    3      57
## 17   1st   Male Child      Yes    5      95
## 18   2nd   Male Child      Yes   11     209
## 19   3rd   Male Child      Yes   13     247
## 20  Crew   Male Child      Yes    0       0
## 21   1st Female Child      Yes    1      19
## 22   2nd Female Child      Yes   13     247
## 23   3rd Female Child      Yes   14     266
## 24  Crew Female Child      Yes    0       0
## 25   1st   Male Adult      Yes   57    1083
## 26   2nd   Male Adult      Yes   14     266
## 27   3rd   Male Adult      Yes   75    1425
## 28  Crew   Male Adult      Yes  192    3648
## 29   1st Female Adult      Yes  140    2660
## 30   2nd Female Adult      Yes   80    1520
## 31   3rd Female Adult      Yes   76    1444
## 32  Crew Female Adult      Yes   20     380
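A new variable can also combine a row's value with a summary of the whole column. As an illustration, the sketch below expresses each row's count as a share of everyone on board; share, pct, and with_share are hypothetical names.

```r
library(dplyr)

data <- data.frame(Titanic)

with_share <- data %>%
  mutate(
    share = Freq / sum(Freq),       # sum(Freq) is computed over all rows
    pct   = round(100 * share, 1)   # a column created above can be reused
  )

# Show the largest groups first.
with_share %>%
  arrange(desc(share)) %>%
  head(3)
```

Unlike summarise(), mutate() keeps every row and only appends columns.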

Data Aggregation

The data can be aggregated by sex, survival, and age. Note that n() counts the rows of the data frame in each group, not the number of people: every group below contains 4 rows, one for each Class.

data %>%
  filter(Age == "Child") %>% 
  group_by(Sex, Survived, Age) %>% 
  summarise(freq = n(),.groups = "drop") %>% 
  arrange(desc(freq))
## # A tibble: 4 x 4
##   Sex    Survived Age    freq
##   <fct>  <fct>    <fct> <int>
## 1 Male   No       Child     4
## 2 Male   Yes      Child     4
## 3 Female No       Child     4
## 4 Female Yes      Child     4
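# Survived is a factor with levels "No" and "Yes", so Survived != 0 is TRUE
# for every row and this filter keeps all 32 rows.
data %>%
  filter(Survived != 0) %>% 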
  group_by(Sex, Survived, Age) %>% 
  summarise(freq = n(),.groups = "drop") %>% 
  arrange(desc(freq))
## # A tibble: 8 x 4
##   Sex    Survived Age    freq
##   <fct>  <fct>    <fct> <int>
## 1 Male   No       Child     4
## 2 Male   No       Adult     4
## 3 Male   Yes      Child     4
## 4 Male   Yes      Adult     4
## 5 Female No       Child     4
## 6 Female No       Adult     4
## 7 Female Yes      Child     4
## 8 Female Yes      Adult     4
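The examples above use n(), which counts rows, so every group shows 4. To count people instead, the Freq column has to be summed within each group. A minimal sketch under that assumption, with counts and people as hypothetical names:

```r
library(dplyr)

data <- data.frame(Titanic)

counts <- data %>%
  group_by(Sex, Survived) %>%
  summarise(people = sum(Freq), .groups = "drop") %>%  # add up counts per group
  arrange(desc(people))
counts

# The group totals add back up to everyone on board.
sum(counts$people)
```

Summing Freq gives 1364 men and 126 women who did not survive, and 367 men and 344 women who did, which together account for all 2201 people in the dataset.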