Tugas Akhir Mandarel

Engine SQL

Publikasi ini menggunakan basis data sampel ‘chinook’ yang digunakan sebagai database untuk mengakses dan mengolah melalui R. Terdapat beberapa package yang dibutuhkan, yaitu ‘DBI’ dan ‘RSQLite’.

install.packages(c("RSQLite", "DBI"), dependencies = TRUE)

Selanjutnya package tidyverse yang mencakup beberapa package diretrieve. Fungsi dalam suatu package dapat juga dipanggil tanpa perintah library() atau require() jika sintaks yang ditulis menyertakan nama package tempat fungsi tersebut berasal. Penyertaan ini dengan menggunakan tanda ::.

library(tidyverse)
library(RSQLite)
library(DBI)

Secara umum, koneksi terhadap database pada R dapat menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")

chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/chinook.db")
RSQLite::dbListTables(chinook)
##  [1] "albums"          "artists"         "customers"       "employees"      
##  [5] "genres"          "invoice_items"   "invoices"        "media_types"    
##  [9] "playlist_track"  "playlists"       "sqlite_sequence" "sqlite_stat1"   
## [13] "tracks"

Setelah melakukan dbConnect() kita dapat melakukan perintah SQL.

SELECT 
  *
FROM
  customers;
Displaying records 1 - 10
CustomerId FirstName LastName Company Address City State Country PostalCode Phone Fax Email SupportRepId
1 Luís Gonçalves Embraer - Empresa Brasileira de Aeronáutica S.A. Av. Brigadeiro Faria Lima, 2170 São José dos Campos SP Brazil 12227-000 +55 (12) 3923-5555 +55 (12) 3923-5566 3
2 Leonie Köhler NA Theodor-Heuss-Straße 34 Stuttgart NA Germany 70174 +49 0711 2842222 NA 5
3 François Tremblay NA 1498 rue Bélanger Montréal QC Canada H2G 1A7 +1 (514) 721-4711 NA 3
4 Bjørn Hansen NA Ullevålsveien 14 Oslo NA Norway 0171 +47 22 44 22 22 NA 4
5 František Wichterlová JetBrains s.r.o. Klanova 9/506 Prague NA Czech Republic 14700 +420 2 4172 5555 +420 2 4172 5555 4
6 Helena Holý NA Rilská 3174/6 Prague NA Czech Republic 14300 +420 2 4177 0449 NA 5
7 Astrid Gruber NA Rotenturmstraße 4, 1010 Innere Stadt Vienne NA Austria 1010 +43 01 5134505 NA 5
8 Daan Peeters NA Grétrystraat 63 Brussels NA Belgium 1000 +32 02 219 03 03 NA 4
9 Kara Nielsen NA Sønder Boulevard 51 Copenhagen NA Denmark 1720 +453 3331 9991 NA 4
10 Eduardo Martins Woodstock Discos Rua Dr. Falcão Filho, 155 São Paulo SP Brazil 01007-010 +55 (11) 3033-5446 +55 (11) 3033-4564 4

Setelah selesai mengolah database terkait dianjurkan memutuskan koneksi dengan fungsi dbDisconnect() dan jika ingin menyambungkan kembali cukup menggunakan kembali dbConnect()

dbDisconnect(chinook)

Menggunakan DPLYR

chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/chinook.db")

RSQLite::dbListTables(chinook)
##  [1] "albums"          "artists"         "customers"       "employees"      
##  [5] "genres"          "invoice_items"   "invoices"        "media_types"    
##  [9] "playlist_track"  "playlists"       "sqlite_sequence" "sqlite_stat1"   
## [13] "tracks"
dplyr::tbl(chinook,"customers")
## # Source:   table<customers> [?? x 13]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
##    CustomerId FirstName LastName  Company Address City  State Country PostalCode
##         <int> <chr>     <chr>     <chr>   <chr>   <chr> <chr> <chr>   <chr>     
##  1          1 Luís      Gonçalves Embrae~ Av. Br~ São ~ SP    Brazil  12227-000 
##  2          2 Leonie    Köhler    <NA>    Theodo~ Stut~ <NA>  Germany 70174     
##  3          3 François  Tremblay  <NA>    1498 r~ Mont~ QC    Canada  H2G 1A7   
##  4          4 Bjørn     Hansen    <NA>    Ullevå~ Oslo  <NA>  Norway  0171      
##  5          5 František Wichterl~ JetBra~ Klanov~ Prag~ <NA>  Czech ~ 14700     
##  6          6 Helena    Holý      <NA>    Rilská~ Prag~ <NA>  Czech ~ 14300     
##  7          7 Astrid    Gruber    <NA>    Rotent~ Vien~ <NA>  Austria 1010      
##  8          8 Daan      Peeters   <NA>    Grétry~ Brus~ <NA>  Belgium 1000      
##  9          9 Kara      Nielsen   <NA>    Sønder~ Cope~ <NA>  Denmark 1720      
## 10         10 Eduardo   Martins   Woodst~ Rua Dr~ São ~ SP    Brazil  01007-010 
## # ... with more rows, and 4 more variables: Phone <chr>, Fax <chr>,
## #   Email <chr>, SupportRepId <int>
customers <- dplyr::tbl(chinook,"customers")
class(customers)
## [1] "tbl_SQLiteConnection" "tbl_dbi"              "tbl_sql"             
## [4] "tbl_lazy"             "tbl"
customers
## # Source:   table<customers> [?? x 13]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
##    CustomerId FirstName LastName  Company Address City  State Country PostalCode
##         <int> <chr>     <chr>     <chr>   <chr>   <chr> <chr> <chr>   <chr>     
##  1          1 Luís      Gonçalves Embrae~ Av. Br~ São ~ SP    Brazil  12227-000 
##  2          2 Leonie    Köhler    <NA>    Theodo~ Stut~ <NA>  Germany 70174     
##  3          3 François  Tremblay  <NA>    1498 r~ Mont~ QC    Canada  H2G 1A7   
##  4          4 Bjørn     Hansen    <NA>    Ullevå~ Oslo  <NA>  Norway  0171      
##  5          5 František Wichterl~ JetBra~ Klanov~ Prag~ <NA>  Czech ~ 14700     
##  6          6 Helena    Holý      <NA>    Rilská~ Prag~ <NA>  Czech ~ 14300     
##  7          7 Astrid    Gruber    <NA>    Rotent~ Vien~ <NA>  Austria 1010      
##  8          8 Daan      Peeters   <NA>    Grétry~ Brus~ <NA>  Belgium 1000      
##  9          9 Kara      Nielsen   <NA>    Sønder~ Cope~ <NA>  Denmark 1720      
## 10         10 Eduardo   Martins   Woodst~ Rua Dr~ São ~ SP    Brazil  01007-010 
## # ... with more rows, and 4 more variables: Phone <chr>, Fax <chr>,
## #   Email <chr>, SupportRepId <int>
query <- customers  %>% select(-CustomerId,-FirstName)
query
## # Source:   lazy query [?? x 11]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
##    LastName    Company  Address City  State Country PostalCode Phone Fax   Email
##    <chr>       <chr>    <chr>   <chr> <chr> <chr>   <chr>      <chr> <chr> <chr>
##  1 Gonçalves   Embraer~ Av. Br~ São ~ SP    Brazil  12227-000  +55 ~ +55 ~ luis~
##  2 Köhler      <NA>     Theodo~ Stut~ <NA>  Germany 70174      +49 ~ <NA>  leon~
##  3 Tremblay    <NA>     1498 r~ Mont~ QC    Canada  H2G 1A7    +1 (~ <NA>  ftre~
##  4 Hansen      <NA>     Ullevå~ Oslo  <NA>  Norway  0171       +47 ~ <NA>  bjor~
##  5 Wichterlová JetBrai~ Klanov~ Prag~ <NA>  Czech ~ 14700      +420~ +420~ fran~
##  6 Holý        <NA>     Rilská~ Prag~ <NA>  Czech ~ 14300      +420~ <NA>  hhol~
##  7 Gruber      <NA>     Rotent~ Vien~ <NA>  Austria 1010       +43 ~ <NA>  astr~
##  8 Peeters     <NA>     Grétry~ Brus~ <NA>  Belgium 1000       +32 ~ <NA>  daan~
##  9 Nielsen     <NA>     Sønder~ Cope~ <NA>  Denmark 1720       +453~ <NA>  kara~
## 10 Martins     Woodsto~ Rua Dr~ São ~ SP    Brazil  01007-010  +55 ~ +55 ~ edua~
## # ... with more rows, and 1 more variable: SupportRepId <int>
dplyr::show_query(query)
## <SQL>
## SELECT `LastName`, `Company`, `Address`, `City`, `State`, `Country`, `PostalCode`, `Phone`, `Fax`, `Email`, `SupportRepId`
## FROM `customers`

Data Wrangling

Dataset yang digunakan adalah storms yang tersedia pada package default R datasets.

library(datasets)
library(help = "datasets")
Formaldehyde
##   carb optden
## 1  0.1  0.086
## 2  0.3  0.269
## 3  0.5  0.446
## 4  0.6  0.538
## 5  0.7  0.626
## 6  0.9  0.782

These data are from a chemical experiment to prepare a standard curve for the determination of formaldehyde by the addition of chromatropic acid and concentrated sulphuric acid and the reading of the resulting purple color on a spectrophotometer. Untuk mengetahui informasi terkait dataset dapat menggunakan sintaks ?sebelum nama dataset .

?Formaldehyde
View(Formaldehyde)

Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.

Formaldehyde%>% select(carb,optden)
##   carb optden
## 1  0.1  0.086
## 2  0.3  0.269
## 3  0.5  0.446
## 4  0.6  0.538
## 5  0.7  0.626
## 6  0.9  0.782
Formaldehyde %>% arrange(desc(carb))
##   carb optden
## 6  0.9  0.782
## 5  0.7  0.626
## 4  0.6  0.538
## 3  0.5  0.446
## 2  0.3  0.269
## 1  0.1  0.086
Formaldehyde %>% filter(carb>0.5)
##   carb optden
## 4  0.6  0.538
## 5  0.7  0.626
## 6  0.9  0.782
Formaldehyde %>% mutate(duration=carb*optden)
##   carb optden duration
## 1  0.1  0.086   0.0086
## 2  0.3  0.269   0.0807
## 3  0.5  0.446   0.2230
## 4  0.6  0.538   0.3228
## 5  0.7  0.626   0.4382
## 6  0.9  0.782   0.7038
Formaldehyde %>% group_by(carb)%>%summarise(mean=mean(carb))
## # A tibble: 6 x 2
##    carb  mean
##   <dbl> <dbl>
## 1   0.1   0.1
## 2   0.3   0.3
## 3   0.5   0.5
## 4   0.6   0.6
## 5   0.7   0.7
## 6   0.9   0.9
Formaldehyde
##   carb optden
## 1  0.1  0.086
## 2  0.3  0.269
## 3  0.5  0.446
## 4  0.6  0.538
## 5  0.7  0.626
## 6  0.9  0.782

Melakukan kelima perintah bersamaan

Formaldehyde%>% select(carb,optden) %>% arrange(desc(carb)) %>%  filter(carb>0.5)%>% mutate(duration=carb*optden)%>% group_by(carb)%>%summarise(mean=mean(carb))
## # A tibble: 3 x 2
##    carb  mean
##   <dbl> <dbl>
## 1   0.6   0.6
## 2   0.7   0.7
## 3   0.9   0.9