Tugas Akhir STA261

Engine SQL

Database yang digunakan adalah database northwindLink Download . Diperlukan package tidyverse, DBI , danRSQLite.

Melakukan Instalasi Package

install.packages(c("RSQLite", "DBI"), dependencies = TRUE)
install.packages("tidyverse")

Memanggil Library

Selanjutnya package tidyverse,RSQLite, dan DBI dipanggil .

library(tidyverse)
library(RSQLite)
library(DBI)

Mengoneksikan Database

Koneksi terhadap database menggunakan drive DBMS SQLite dapat digunakan pada R dengan menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")

northwind<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/northwind_large.sqlite")

Menggunakan Query SQL pada Database

Setelah melakukan dbConnect(), baru kita dapat melakukan perintah SQL pada RStudio.

SELECT 
  *
FROM
  OrderDetail
WHERE
  OrderId = "10250"
ORDER BY
  ProductId;
3 records
Id OrderId ProductId UnitPrice Quantity Discount
10250/41 10250 41 7.7 10 0.00
10250/51 10250 51 42.4 35 0.15
10250/65 10250 65 16.8 15 0.15

Memutus Koneksi Database

dbDisconnect(northwind)

Menggunakan dplyr

northwind<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/northwind_large.sqlite")

Melihat Daftar Entity/Tabel dalam Database

RSQLite::dbListTables(northwind)
##  [1] "Category"             "Customer"             "CustomerCustomerDemo"
##  [4] "CustomerDemographic"  "Employee"             "EmployeeTerritory"   
##  [7] "Order"                "OrderDetail"          "Product"             
## [10] "ProductDetails_V"     "Region"               "Shipper"             
## [13] "Supplier"             "Territory"

Memilih Entity dalam Database

product <- dplyr::tbl(northwind,"Product")
product
## # Source:   table<Product> [?? x 10]
## # Database: sqlite 3.36.0 [C:\sqlite\northwind_large.sqlite]
##       Id ProductName           SupplierId CategoryId QuantityPerUnit   UnitPrice
##    <int> <chr>                      <int>      <int> <chr>                 <dbl>
##  1     1 Chai                           1          1 10 boxes x 20 ba~      18  
##  2     2 Chang                          1          1 24 - 12 oz bottl~      19  
##  3     3 Aniseed Syrup                  1          2 12 - 550 ml bott~      10  
##  4     4 Chef Anton's Cajun S~          2          2 48 - 6 oz jars         22  
##  5     5 Chef Anton's Gumbo M~          2          2 36 boxes               21.4
##  6     6 Grandma's Boysenberr~          3          2 12 - 8 oz jars         25  
##  7     7 Uncle Bob's Organic ~          3          7 12 - 1 lb pkgs.        30  
##  8     8 Northwoods Cranberry~          3          2 12 - 12 oz jars        40  
##  9     9 Mishi Kobe Niku                4          6 18 - 500 g pkgs.       97  
## 10    10 Ikura                          4          8 12 - 200 ml jars       31  
## # ... with more rows, and 4 more variables: UnitsInStock <int>,
## #   UnitsOnOrder <int>, ReorderLevel <int>, Discontinued <int>

Melihat Kelas dari Tabel

class(product)
## [1] "tbl_SQLiteConnection" "tbl_dbi"              "tbl_sql"             
## [4] "tbl_lazy"             "tbl"

Melakukan Wrangling pada Tabel

Sintaks berikut akan menyeleksi tabel dengan membuang kolom UnitsOnOrder dan ReorderLevel. Selanjutnya tabel difilter dan hanya menampilkan data dengan UnitPrice > 10.

query <- product  %>% select(-UnitsOnOrder,-ReorderLevel) %>% filter(UnitPrice>10)
query 
## # Source:   lazy query [?? x 8]
## # Database: sqlite 3.36.0 [C:\sqlite\northwind_large.sqlite]
##       Id ProductName            SupplierId CategoryId QuantityPerUnit  UnitPrice
##    <int> <chr>                       <int>      <int> <chr>                <dbl>
##  1     1 Chai                            1          1 10 boxes x 20 b~      18  
##  2     2 Chang                           1          1 24 - 12 oz bott~      19  
##  3     4 Chef Anton's Cajun Se~          2          2 48 - 6 oz jars        22  
##  4     5 Chef Anton's Gumbo Mix          2          2 36 boxes              21.4
##  5     6 Grandma's Boysenberry~          3          2 12 - 8 oz jars        25  
##  6     7 Uncle Bob's Organic D~          3          7 12 - 1 lb pkgs.       30  
##  7     8 Northwoods Cranberry ~          3          2 12 - 12 oz jars       40  
##  8     9 Mishi Kobe Niku                 4          6 18 - 500 g pkgs.      97  
##  9    10 Ikura                           4          8 12 - 200 ml jars      31  
## 10    11 Queso Cabrales                  5          4 1 kg pkg.             21  
## # ... with more rows, and 2 more variables: UnitsInStock <int>,
## #   Discontinued <int>

Melihat Query SQL untuk Menampilkan Tabel yang Telah Diubah

dplyr::show_query(query)
## <SQL>
## SELECT *
## FROM (SELECT `Id`, `ProductName`, `SupplierId`, `CategoryId`, `QuantityPerUnit`, `UnitPrice`, `UnitsInStock`, `Discontinued`
## FROM `Product`)
## WHERE (`UnitPrice` > 10.0)

Data Wrangling

Dataset yang digunakan adalah esoph yang tersedia pada package default R datasets.

Memanggil Library Dataset

library(datasets)
esoph
##    agegp     alcgp    tobgp ncases ncontrols
## 1  25-34 0-39g/day 0-9g/day      0        40
## 2  25-34 0-39g/day    10-19      0        10
## 3  25-34 0-39g/day    20-29      0         6
## 4  25-34 0-39g/day      30+      0         5
## 5  25-34     40-79 0-9g/day      0        27
## 6  25-34     40-79    10-19      0         7
## 7  25-34     40-79    20-29      0         4
## 8  25-34     40-79      30+      0         7
## 9  25-34    80-119 0-9g/day      0         2
## 10 25-34    80-119    10-19      0         1
## 11 25-34    80-119      30+      0         2
## 12 25-34      120+ 0-9g/day      0         1
## 13 25-34      120+    10-19      1         1
## 14 25-34      120+    20-29      0         1
## 15 25-34      120+      30+      0         2
## 16 35-44 0-39g/day 0-9g/day      0        60
## 17 35-44 0-39g/day    10-19      1        14
## 18 35-44 0-39g/day    20-29      0         7
## 19 35-44 0-39g/day      30+      0         8
## 20 35-44     40-79 0-9g/day      0        35
## 21 35-44     40-79    10-19      3        23
## 22 35-44     40-79    20-29      1        14
## 23 35-44     40-79      30+      0         8
## 24 35-44    80-119 0-9g/day      0        11
## 25 35-44    80-119    10-19      0         6
## 26 35-44    80-119    20-29      0         2
## 27 35-44    80-119      30+      0         1
## 28 35-44      120+ 0-9g/day      2         3
## 29 35-44      120+    10-19      0         3
## 30 35-44      120+    20-29      2         4
## 31 45-54 0-39g/day 0-9g/day      1        46
## 32 45-54 0-39g/day    10-19      0        18
## 33 45-54 0-39g/day    20-29      0        10
## 34 45-54 0-39g/day      30+      0         4
## 35 45-54     40-79 0-9g/day      6        38
## 36 45-54     40-79    10-19      4        21
## 37 45-54     40-79    20-29      5        15
## 38 45-54     40-79      30+      5         7
## 39 45-54    80-119 0-9g/day      3        16
## 40 45-54    80-119    10-19      6        14
## 41 45-54    80-119    20-29      1         5
## 42 45-54    80-119      30+      2         4
## 43 45-54      120+ 0-9g/day      4         4
## 44 45-54      120+    10-19      3         4
## 45 45-54      120+    20-29      2         3
## 46 45-54      120+      30+      4         4
## 47 55-64 0-39g/day 0-9g/day      2        49
## 48 55-64 0-39g/day    10-19      3        22
## 49 55-64 0-39g/day    20-29      3        12
## 50 55-64 0-39g/day      30+      4         6
## 51 55-64     40-79 0-9g/day      9        40
## 52 55-64     40-79    10-19      6        21
## 53 55-64     40-79    20-29      4        17
## 54 55-64     40-79      30+      3         6
## 55 55-64    80-119 0-9g/day      9        18
## 56 55-64    80-119    10-19      8        15
## 57 55-64    80-119    20-29      3         6
## 58 55-64    80-119      30+      4         4
## 59 55-64      120+ 0-9g/day      5        10
## 60 55-64      120+    10-19      6         7
## 61 55-64      120+    20-29      2         3
## 62 55-64      120+      30+      5         6
## 63 65-74 0-39g/day 0-9g/day      5        48
## 64 65-74 0-39g/day    10-19      4        14
## 65 65-74 0-39g/day    20-29      2         7
## 66 65-74 0-39g/day      30+      0         2
## 67 65-74     40-79 0-9g/day     17        34
## 68 65-74     40-79    10-19      3        10
## 69 65-74     40-79    20-29      5         9
## 70 65-74    80-119 0-9g/day      6        13
## 71 65-74    80-119    10-19      4        12
## 72 65-74    80-119    20-29      2         3
## 73 65-74    80-119      30+      1         1
## 74 65-74      120+ 0-9g/day      3         4
## 75 65-74      120+    10-19      1         2
## 76 65-74      120+    20-29      1         1
## 77 65-74      120+      30+      1         1
## 78   75+ 0-39g/day 0-9g/day      1        18
## 79   75+ 0-39g/day    10-19      2         6
## 80   75+ 0-39g/day      30+      1         3
## 81   75+     40-79 0-9g/day      2         5
## 82   75+     40-79    10-19      1         3
## 83   75+     40-79    20-29      0         3
## 84   75+     40-79      30+      1         1
## 85   75+    80-119 0-9g/day      1         1
## 86   75+    80-119    10-19      1         1
## 87   75+      120+ 0-9g/day      2         2
## 88   75+      120+    10-19      1         1

Melihat Informasi Spesifik Dataset

?trees
View(trees)

Dataset esoph berasal dari studi kasus-kontrol dari (o) kanker kerongkongan di Ille-et-Vilaine, Prancis.

Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.

Menyeleksi Data

esoph %>% select(-ncases)
##    agegp     alcgp    tobgp ncontrols
## 1  25-34 0-39g/day 0-9g/day        40
## 2  25-34 0-39g/day    10-19        10
## 3  25-34 0-39g/day    20-29         6
## 4  25-34 0-39g/day      30+         5
## 5  25-34     40-79 0-9g/day        27
## 6  25-34     40-79    10-19         7
## 7  25-34     40-79    20-29         4
## 8  25-34     40-79      30+         7
## 9  25-34    80-119 0-9g/day         2
## 10 25-34    80-119    10-19         1
## 11 25-34    80-119      30+         2
## 12 25-34      120+ 0-9g/day         1
## 13 25-34      120+    10-19         1
## 14 25-34      120+    20-29         1
## 15 25-34      120+      30+         2
## 16 35-44 0-39g/day 0-9g/day        60
## 17 35-44 0-39g/day    10-19        14
## 18 35-44 0-39g/day    20-29         7
## 19 35-44 0-39g/day      30+         8
## 20 35-44     40-79 0-9g/day        35
## 21 35-44     40-79    10-19        23
## 22 35-44     40-79    20-29        14
## 23 35-44     40-79      30+         8
## 24 35-44    80-119 0-9g/day        11
## 25 35-44    80-119    10-19         6
## 26 35-44    80-119    20-29         2
## 27 35-44    80-119      30+         1
## 28 35-44      120+ 0-9g/day         3
## 29 35-44      120+    10-19         3
## 30 35-44      120+    20-29         4
## 31 45-54 0-39g/day 0-9g/day        46
## 32 45-54 0-39g/day    10-19        18
## 33 45-54 0-39g/day    20-29        10
## 34 45-54 0-39g/day      30+         4
## 35 45-54     40-79 0-9g/day        38
## 36 45-54     40-79    10-19        21
## 37 45-54     40-79    20-29        15
## 38 45-54     40-79      30+         7
## 39 45-54    80-119 0-9g/day        16
## 40 45-54    80-119    10-19        14
## 41 45-54    80-119    20-29         5
## 42 45-54    80-119      30+         4
## 43 45-54      120+ 0-9g/day         4
## 44 45-54      120+    10-19         4
## 45 45-54      120+    20-29         3
## 46 45-54      120+      30+         4
## 47 55-64 0-39g/day 0-9g/day        49
## 48 55-64 0-39g/day    10-19        22
## 49 55-64 0-39g/day    20-29        12
## 50 55-64 0-39g/day      30+         6
## 51 55-64     40-79 0-9g/day        40
## 52 55-64     40-79    10-19        21
## 53 55-64     40-79    20-29        17
## 54 55-64     40-79      30+         6
## 55 55-64    80-119 0-9g/day        18
## 56 55-64    80-119    10-19        15
## 57 55-64    80-119    20-29         6
## 58 55-64    80-119      30+         4
## 59 55-64      120+ 0-9g/day        10
## 60 55-64      120+    10-19         7
## 61 55-64      120+    20-29         3
## 62 55-64      120+      30+         6
## 63 65-74 0-39g/day 0-9g/day        48
## 64 65-74 0-39g/day    10-19        14
## 65 65-74 0-39g/day    20-29         7
## 66 65-74 0-39g/day      30+         2
## 67 65-74     40-79 0-9g/day        34
## 68 65-74     40-79    10-19        10
## 69 65-74     40-79    20-29         9
## 70 65-74    80-119 0-9g/day        13
## 71 65-74    80-119    10-19        12
## 72 65-74    80-119    20-29         3
## 73 65-74    80-119      30+         1
## 74 65-74      120+ 0-9g/day         4
## 75 65-74      120+    10-19         2
## 76 65-74      120+    20-29         1
## 77 65-74      120+      30+         1
## 78   75+ 0-39g/day 0-9g/day        18
## 79   75+ 0-39g/day    10-19         6
## 80   75+ 0-39g/day      30+         3
## 81   75+     40-79 0-9g/day         5
## 82   75+     40-79    10-19         3
## 83   75+     40-79    20-29         3
## 84   75+     40-79      30+         1
## 85   75+    80-119 0-9g/day         1
## 86   75+    80-119    10-19         1
## 87   75+      120+ 0-9g/day         2
## 88   75+      120+    10-19         1

Mengurutkan Data

Sintaks dibawah akan mengurutkan data secara descending (besar ke kecil)

esoph %>% arrange(desc(ncontrols))
##    agegp     alcgp    tobgp ncases ncontrols
## 1  35-44 0-39g/day 0-9g/day      0        60
## 2  55-64 0-39g/day 0-9g/day      2        49
## 3  65-74 0-39g/day 0-9g/day      5        48
## 4  45-54 0-39g/day 0-9g/day      1        46
## 5  25-34 0-39g/day 0-9g/day      0        40
## 6  55-64     40-79 0-9g/day      9        40
## 7  45-54     40-79 0-9g/day      6        38
## 8  35-44     40-79 0-9g/day      0        35
## 9  65-74     40-79 0-9g/day     17        34
## 10 25-34     40-79 0-9g/day      0        27
## 11 35-44     40-79    10-19      3        23
## 12 55-64 0-39g/day    10-19      3        22
## 13 45-54     40-79    10-19      4        21
## 14 55-64     40-79    10-19      6        21
## 15 45-54 0-39g/day    10-19      0        18
## 16 55-64    80-119 0-9g/day      9        18
## 17   75+ 0-39g/day 0-9g/day      1        18
## 18 55-64     40-79    20-29      4        17
## 19 45-54    80-119 0-9g/day      3        16
## 20 45-54     40-79    20-29      5        15
## 21 55-64    80-119    10-19      8        15
## 22 35-44 0-39g/day    10-19      1        14
## 23 35-44     40-79    20-29      1        14
## 24 45-54    80-119    10-19      6        14
## 25 65-74 0-39g/day    10-19      4        14
## 26 65-74    80-119 0-9g/day      6        13
## 27 55-64 0-39g/day    20-29      3        12
## 28 65-74    80-119    10-19      4        12
## 29 35-44    80-119 0-9g/day      0        11
## 30 25-34 0-39g/day    10-19      0        10
## 31 45-54 0-39g/day    20-29      0        10
## 32 55-64      120+ 0-9g/day      5        10
## 33 65-74     40-79    10-19      3        10
## 34 65-74     40-79    20-29      5         9
## 35 35-44 0-39g/day      30+      0         8
## 36 35-44     40-79      30+      0         8
## 37 25-34     40-79    10-19      0         7
## 38 25-34     40-79      30+      0         7
## 39 35-44 0-39g/day    20-29      0         7
## 40 45-54     40-79      30+      5         7
## 41 55-64      120+    10-19      6         7
## 42 65-74 0-39g/day    20-29      2         7
## 43 25-34 0-39g/day    20-29      0         6
## 44 35-44    80-119    10-19      0         6
## 45 55-64 0-39g/day      30+      4         6
## 46 55-64     40-79      30+      3         6
## 47 55-64    80-119    20-29      3         6
## 48 55-64      120+      30+      5         6
## 49   75+ 0-39g/day    10-19      2         6
## 50 25-34 0-39g/day      30+      0         5
## 51 45-54    80-119    20-29      1         5
## 52   75+     40-79 0-9g/day      2         5
## 53 25-34     40-79    20-29      0         4
## 54 35-44      120+    20-29      2         4
## 55 45-54 0-39g/day      30+      0         4
## 56 45-54    80-119      30+      2         4
## 57 45-54      120+ 0-9g/day      4         4
## 58 45-54      120+    10-19      3         4
## 59 45-54      120+      30+      4         4
## 60 55-64    80-119      30+      4         4
## 61 65-74      120+ 0-9g/day      3         4
## 62 35-44      120+ 0-9g/day      2         3
## 63 35-44      120+    10-19      0         3
## 64 45-54      120+    20-29      2         3
## 65 55-64      120+    20-29      2         3
## 66 65-74    80-119    20-29      2         3
## 67   75+ 0-39g/day      30+      1         3
## 68   75+     40-79    10-19      1         3
## 69   75+     40-79    20-29      0         3
## 70 25-34    80-119 0-9g/day      0         2
## 71 25-34    80-119      30+      0         2
## 72 25-34      120+      30+      0         2
## 73 35-44    80-119    20-29      0         2
## 74 65-74 0-39g/day      30+      0         2
## 75 65-74      120+    10-19      1         2
## 76   75+      120+ 0-9g/day      2         2
## 77 25-34    80-119    10-19      0         1
## 78 25-34      120+ 0-9g/day      0         1
## 79 25-34      120+    10-19      1         1
## 80 25-34      120+    20-29      0         1
## 81 35-44    80-119      30+      0         1
## 82 65-74    80-119      30+      1         1
## 83 65-74      120+    20-29      1         1
## 84 65-74      120+      30+      1         1
## 85   75+     40-79      30+      1         1
## 86   75+    80-119 0-9g/day      1         1
## 87   75+    80-119    10-19      1         1
## 88   75+      120+    10-19      1         1

Melakukan Filter

Sintaks di bawah akan menampilkan data dengan kriteria tertentu .Data yang ditampilkan hanya data yang memiliki record 25-34pada kolom agegp

esoph %>% filter(agegp =="25-34")
##    agegp     alcgp    tobgp ncases ncontrols
## 1  25-34 0-39g/day 0-9g/day      0        40
## 2  25-34 0-39g/day    10-19      0        10
## 3  25-34 0-39g/day    20-29      0         6
## 4  25-34 0-39g/day      30+      0         5
## 5  25-34     40-79 0-9g/day      0        27
## 6  25-34     40-79    10-19      0         7
## 7  25-34     40-79    20-29      0         4
## 8  25-34     40-79      30+      0         7
## 9  25-34    80-119 0-9g/day      0         2
## 10 25-34    80-119    10-19      0         1
## 11 25-34    80-119      30+      0         2
## 12 25-34      120+ 0-9g/day      0         1
## 13 25-34      120+    10-19      1         1
## 14 25-34      120+    20-29      0         1
## 15 25-34      120+      30+      0         2

Menambah Variabel Baru dengan fungsi mutate()

esoph %>% mutate(ntotal=ncases+ncontrols)
##    agegp     alcgp    tobgp ncases ncontrols ntotal
## 1  25-34 0-39g/day 0-9g/day      0        40     40
## 2  25-34 0-39g/day    10-19      0        10     10
## 3  25-34 0-39g/day    20-29      0         6      6
## 4  25-34 0-39g/day      30+      0         5      5
## 5  25-34     40-79 0-9g/day      0        27     27
## 6  25-34     40-79    10-19      0         7      7
## 7  25-34     40-79    20-29      0         4      4
## 8  25-34     40-79      30+      0         7      7
## 9  25-34    80-119 0-9g/day      0         2      2
## 10 25-34    80-119    10-19      0         1      1
## 11 25-34    80-119      30+      0         2      2
## 12 25-34      120+ 0-9g/day      0         1      1
## 13 25-34      120+    10-19      1         1      2
## 14 25-34      120+    20-29      0         1      1
## 15 25-34      120+      30+      0         2      2
## 16 35-44 0-39g/day 0-9g/day      0        60     60
## 17 35-44 0-39g/day    10-19      1        14     15
## 18 35-44 0-39g/day    20-29      0         7      7
## 19 35-44 0-39g/day      30+      0         8      8
## 20 35-44     40-79 0-9g/day      0        35     35
## 21 35-44     40-79    10-19      3        23     26
## 22 35-44     40-79    20-29      1        14     15
## 23 35-44     40-79      30+      0         8      8
## 24 35-44    80-119 0-9g/day      0        11     11
## 25 35-44    80-119    10-19      0         6      6
## 26 35-44    80-119    20-29      0         2      2
## 27 35-44    80-119      30+      0         1      1
## 28 35-44      120+ 0-9g/day      2         3      5
## 29 35-44      120+    10-19      0         3      3
## 30 35-44      120+    20-29      2         4      6
## 31 45-54 0-39g/day 0-9g/day      1        46     47
## 32 45-54 0-39g/day    10-19      0        18     18
## 33 45-54 0-39g/day    20-29      0        10     10
## 34 45-54 0-39g/day      30+      0         4      4
## 35 45-54     40-79 0-9g/day      6        38     44
## 36 45-54     40-79    10-19      4        21     25
## 37 45-54     40-79    20-29      5        15     20
## 38 45-54     40-79      30+      5         7     12
## 39 45-54    80-119 0-9g/day      3        16     19
## 40 45-54    80-119    10-19      6        14     20
## 41 45-54    80-119    20-29      1         5      6
## 42 45-54    80-119      30+      2         4      6
## 43 45-54      120+ 0-9g/day      4         4      8
## 44 45-54      120+    10-19      3         4      7
## 45 45-54      120+    20-29      2         3      5
## 46 45-54      120+      30+      4         4      8
## 47 55-64 0-39g/day 0-9g/day      2        49     51
## 48 55-64 0-39g/day    10-19      3        22     25
## 49 55-64 0-39g/day    20-29      3        12     15
## 50 55-64 0-39g/day      30+      4         6     10
## 51 55-64     40-79 0-9g/day      9        40     49
## 52 55-64     40-79    10-19      6        21     27
## 53 55-64     40-79    20-29      4        17     21
## 54 55-64     40-79      30+      3         6      9
## 55 55-64    80-119 0-9g/day      9        18     27
## 56 55-64    80-119    10-19      8        15     23
## 57 55-64    80-119    20-29      3         6      9
## 58 55-64    80-119      30+      4         4      8
## 59 55-64      120+ 0-9g/day      5        10     15
## 60 55-64      120+    10-19      6         7     13
## 61 55-64      120+    20-29      2         3      5
## 62 55-64      120+      30+      5         6     11
## 63 65-74 0-39g/day 0-9g/day      5        48     53
## 64 65-74 0-39g/day    10-19      4        14     18
## 65 65-74 0-39g/day    20-29      2         7      9
## 66 65-74 0-39g/day      30+      0         2      2
## 67 65-74     40-79 0-9g/day     17        34     51
## 68 65-74     40-79    10-19      3        10     13
## 69 65-74     40-79    20-29      5         9     14
## 70 65-74    80-119 0-9g/day      6        13     19
## 71 65-74    80-119    10-19      4        12     16
## 72 65-74    80-119    20-29      2         3      5
## 73 65-74    80-119      30+      1         1      2
## 74 65-74      120+ 0-9g/day      3         4      7
## 75 65-74      120+    10-19      1         2      3
## 76 65-74      120+    20-29      1         1      2
## 77 65-74      120+      30+      1         1      2
## 78   75+ 0-39g/day 0-9g/day      1        18     19
## 79   75+ 0-39g/day    10-19      2         6      8
## 80   75+ 0-39g/day      30+      1         3      4
## 81   75+     40-79 0-9g/day      2         5      7
## 82   75+     40-79    10-19      1         3      4
## 83   75+     40-79    20-29      0         3      3
## 84   75+     40-79      30+      1         1      2
## 85   75+    80-119 0-9g/day      1         1      2
## 86   75+    80-119    10-19      1         1      2
## 87   75+      120+ 0-9g/day      2         2      4
## 88   75+      120+    10-19      1         1      2

Menggunakan fungsi summarise()

Sintaks di bawah akan menampilkan rata-rata dari peubah ncontrols pada tiap kategori yang berbeda pada kolom agegp

esoph %>% group_by(agegp)  %>% summarise(mean=mean(ncontrols))
## # A tibble: 6 x 2
##   agegp  mean
##   <ord> <dbl>
## 1 25-34  7.73
## 2 35-44 13.3 
## 3 45-54 13.3 
## 4 55-64 15.1 
## 5 65-74 10.7 
## 6 75+    4

Melakukan kelima perintah bersamaan

esoph %>% select(agegp,ncontrols,ncases)%>% arrange(desc(ncontrols))%>% filter(agegp =="25-34") %>% mutate(ntotal=ncontrols + ncases) %>% group_by(agegp)  %>% summarise(mean=mean(ncontrols))
## # A tibble: 1 x 2
##   agegp  mean
##   <ord> <dbl>
## 1 25-34  7.73