Tugas Akhir Mandarel
Engine SQL
Publikasi ini menggunakan basis data sampel ‘chinook’ yang digunakan sebagai database untuk mengakses dan mengolah melalui R. Terdapat beberapa package yang dibutuhkan, yaitu ‘DBI’ dan ‘RSQLite’.
install.packages(c("RSQLite", "DBI"), dependencies = TRUE)Selanjutnya package tidyverse yang mencakup beberapa package diretrieve. Fungsi dalam suatu package dapat juga dipanggil tanpa perintah library() atau require() jika sintaks yang ditulis menyertakan nama package tempat fungsi tersebut berasal. Penyertaan ini dengan menggunakan tanda ::.
library(tidyverse)
library(RSQLite)
library(DBI)Secara umum, koneksi terhadap database pada R dapat menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")
chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/chinook.db")
RSQLite::dbListTables(chinook)## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
Setelah melakukan dbConnect() kita dapat melakukan perintah SQL.
SELECT
*
FROM
customers;| CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
| 2 | Leonie | Köhler | NA | Theodor-Heuss-Straße 34 | Stuttgart | NA | Germany | 70174 | +49 0711 2842222 | NA | leonekohler@surfeu.de | 5 |
| 3 | François | Tremblay | NA | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | NA | ftremblay@gmail.com | 3 |
| 4 | Bjørn | Hansen | NA | Ullevålsveien 14 | Oslo | NA | Norway | 0171 | +47 22 44 22 22 | NA | bjorn.hansen@yahoo.no | 4 |
| 5 | František | Wichterlová | JetBrains s.r.o. | Klanova 9/506 | Prague | NA | Czech Republic | 14700 | +420 2 4172 5555 | +420 2 4172 5555 | frantisekw@jetbrains.com | 4 |
| 6 | Helena | Holý | NA | Rilská 3174/6 | Prague | NA | Czech Republic | 14300 | +420 2 4177 0449 | NA | hholy@gmail.com | 5 |
| 7 | Astrid | Gruber | NA | Rotenturmstraße 4, 1010 Innere Stadt | Vienne | NA | Austria | 1010 | +43 01 5134505 | NA | astrid.gruber@apple.at | 5 |
| 8 | Daan | Peeters | NA | Grétrystraat 63 | Brussels | NA | Belgium | 1000 | +32 02 219 03 03 | NA | daan_peeters@apple.be | 4 |
| 9 | Kara | Nielsen | NA | Sønder Boulevard 51 | Copenhagen | NA | Denmark | 1720 | +453 3331 9991 | NA | kara.nielsen@jubii.dk | 4 |
| 10 | Eduardo | Martins | Woodstock Discos | Rua Dr. Falcão Filho, 155 | São Paulo | SP | Brazil | 01007-010 | +55 (11) 3033-5446 | +55 (11) 3033-4564 | eduardo@woodstock.com.br | 4 |
Setelah selesai mengolah database terkait dianjurkan memutuskan koneksi dengan fungsi dbDisconnect() dan jika ingin menyambungkan kembali cukup menggunakan kembali dbConnect()
dbDisconnect(chinook)Menggunakan DPLYR
chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/chinook.db")
RSQLite::dbListTables(chinook)## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
dplyr::tbl(chinook,"customers")## # Source: table<customers> [?? x 13]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## CustomerId FirstName LastName Company Address City State Country PostalCode
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Luís Gonçalves Embrae~ Av. Br~ São ~ SP Brazil 12227-000
## 2 2 Leonie Köhler <NA> Theodo~ Stut~ <NA> Germany 70174
## 3 3 François Tremblay <NA> 1498 r~ Mont~ QC Canada H2G 1A7
## 4 4 Bjørn Hansen <NA> Ullevå~ Oslo <NA> Norway 0171
## 5 5 František Wichterl~ JetBra~ Klanov~ Prag~ <NA> Czech ~ 14700
## 6 6 Helena Holý <NA> Rilská~ Prag~ <NA> Czech ~ 14300
## 7 7 Astrid Gruber <NA> Rotent~ Vien~ <NA> Austria 1010
## 8 8 Daan Peeters <NA> Grétry~ Brus~ <NA> Belgium 1000
## 9 9 Kara Nielsen <NA> Sønder~ Cope~ <NA> Denmark 1720
## 10 10 Eduardo Martins Woodst~ Rua Dr~ São ~ SP Brazil 01007-010
## # ... with more rows, and 4 more variables: Phone <chr>, Fax <chr>,
## # Email <chr>, SupportRepId <int>
customers <- dplyr::tbl(chinook,"customers")
class(customers)## [1] "tbl_SQLiteConnection" "tbl_dbi" "tbl_sql"
## [4] "tbl_lazy" "tbl"
customers## # Source: table<customers> [?? x 13]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## CustomerId FirstName LastName Company Address City State Country PostalCode
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Luís Gonçalves Embrae~ Av. Br~ São ~ SP Brazil 12227-000
## 2 2 Leonie Köhler <NA> Theodo~ Stut~ <NA> Germany 70174
## 3 3 François Tremblay <NA> 1498 r~ Mont~ QC Canada H2G 1A7
## 4 4 Bjørn Hansen <NA> Ullevå~ Oslo <NA> Norway 0171
## 5 5 František Wichterl~ JetBra~ Klanov~ Prag~ <NA> Czech ~ 14700
## 6 6 Helena Holý <NA> Rilská~ Prag~ <NA> Czech ~ 14300
## 7 7 Astrid Gruber <NA> Rotent~ Vien~ <NA> Austria 1010
## 8 8 Daan Peeters <NA> Grétry~ Brus~ <NA> Belgium 1000
## 9 9 Kara Nielsen <NA> Sønder~ Cope~ <NA> Denmark 1720
## 10 10 Eduardo Martins Woodst~ Rua Dr~ São ~ SP Brazil 01007-010
## # ... with more rows, and 4 more variables: Phone <chr>, Fax <chr>,
## # Email <chr>, SupportRepId <int>
query <- customers %>% select(-CustomerId,-FirstName)
query## # Source: lazy query [?? x 11]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## LastName Company Address City State Country PostalCode Phone Fax Email
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Gonçalves Embraer~ Av. Br~ São ~ SP Brazil 12227-000 +55 ~ +55 ~ luis~
## 2 Köhler <NA> Theodo~ Stut~ <NA> Germany 70174 +49 ~ <NA> leon~
## 3 Tremblay <NA> 1498 r~ Mont~ QC Canada H2G 1A7 +1 (~ <NA> ftre~
## 4 Hansen <NA> Ullevå~ Oslo <NA> Norway 0171 +47 ~ <NA> bjor~
## 5 Wichterlová JetBrai~ Klanov~ Prag~ <NA> Czech ~ 14700 +420~ +420~ fran~
## 6 Holý <NA> Rilská~ Prag~ <NA> Czech ~ 14300 +420~ <NA> hhol~
## 7 Gruber <NA> Rotent~ Vien~ <NA> Austria 1010 +43 ~ <NA> astr~
## 8 Peeters <NA> Grétry~ Brus~ <NA> Belgium 1000 +32 ~ <NA> daan~
## 9 Nielsen <NA> Sønder~ Cope~ <NA> Denmark 1720 +453~ <NA> kara~
## 10 Martins Woodsto~ Rua Dr~ São ~ SP Brazil 01007-010 +55 ~ +55 ~ edua~
## # ... with more rows, and 1 more variable: SupportRepId <int>
dplyr::show_query(query)## <SQL>
## SELECT `LastName`, `Company`, `Address`, `City`, `State`, `Country`, `PostalCode`, `Phone`, `Fax`, `Email`, `SupportRepId`
## FROM `customers`
Data Wrangling
Dataset yang digunakan adalah storms yang tersedia pada package default R datasets.
library(datasets)
library(help = "datasets")
Formaldehyde## carb optden
## 1 0.1 0.086
## 2 0.3 0.269
## 3 0.5 0.446
## 4 0.6 0.538
## 5 0.7 0.626
## 6 0.9 0.782
These data are from a chemical experiment to prepare a standard curve for the determination of formaldehyde by the addition of chromatropic acid and concentrated sulphuric acid and the reading of the resulting purple color on a spectrophotometer. Untuk mengetahui informasi terkait dataset dapat menggunakan sintaks ?sebelum nama dataset .
?Formaldehyde
View(Formaldehyde)Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.
Formaldehyde%>% select(carb,optden)## carb optden
## 1 0.1 0.086
## 2 0.3 0.269
## 3 0.5 0.446
## 4 0.6 0.538
## 5 0.7 0.626
## 6 0.9 0.782
Formaldehyde %>% arrange(desc(carb))## carb optden
## 6 0.9 0.782
## 5 0.7 0.626
## 4 0.6 0.538
## 3 0.5 0.446
## 2 0.3 0.269
## 1 0.1 0.086
Formaldehyde %>% filter(carb>0.5)## carb optden
## 4 0.6 0.538
## 5 0.7 0.626
## 6 0.9 0.782
Formaldehyde %>% mutate(duration=carb*optden)## carb optden duration
## 1 0.1 0.086 0.0086
## 2 0.3 0.269 0.0807
## 3 0.5 0.446 0.2230
## 4 0.6 0.538 0.3228
## 5 0.7 0.626 0.4382
## 6 0.9 0.782 0.7038
Formaldehyde %>% group_by(carb)%>%summarise(mean=mean(carb))## # A tibble: 6 x 2
## carb mean
## <dbl> <dbl>
## 1 0.1 0.1
## 2 0.3 0.3
## 3 0.5 0.5
## 4 0.6 0.6
## 5 0.7 0.7
## 6 0.9 0.9
Formaldehyde## carb optden
## 1 0.1 0.086
## 2 0.3 0.269
## 3 0.5 0.446
## 4 0.6 0.538
## 5 0.7 0.626
## 6 0.9 0.782
Melakukan kelima perintah bersamaan
Formaldehyde%>% select(carb,optden) %>% arrange(desc(carb)) %>% filter(carb>0.5)%>% mutate(duration=carb*optden)%>% group_by(carb)%>%summarise(mean=mean(carb))## # A tibble: 3 x 2
## carb mean
## <dbl> <dbl>
## 1 0.6 0.6
## 2 0.7 0.7
## 3 0.9 0.9