Tugas Akhir Mandarel

Engine SQL

Database yang digunakan pada publikasi ini adalah database sampel chinook digunakan sebagai database untuk mengakses dan mengolah melalui R. Lalu, diperlukan beberapa package, yaitu DBI dan RSQLite.

install.packages(c("RSQLite", "DBI"), dependencies = TRUE)

Selanjutnya package tidyverse yang mencakup beberapa package diretrieve. Fungsi dalam suatu package dapat juga dipanggil tanpa perintah library() atau require() jika sintaks yang ditulis menyertakan nama package tempat fungsi tersebut berasal. Penyertaan ini dengan menggunakan tanda ::.

library(tidyverse)
library(RSQLite)
library(DBI)

Secara umum, koneksi terhadap database pada R dapat menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")

chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/SQLite/chinook.db")
RSQLite::dbListTables(chinook)
##  [1] "albums"          "artists"         "customers"       "employees"      
##  [5] "genres"          "invoice_items"   "invoices"        "media_types"    
##  [9] "playlist_track"  "playlists"       "sqlite_sequence" "sqlite_stat1"   
## [13] "tracks"

Setelah melakukan dbConnect() kita dapat melakukan perintah SQL.

SELECT 
  *
FROM
  employees;
8 records
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
1 Adams Andrew General Manager NA 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW Edmonton AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457
2 Edwards Nancy Sales Manager 1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322
3 Peacock Jane Sales Support Agent 2 1973-08-29 00:00:00 2002-04-01 00:00:00 1111 6 Ave SW Calgary AB Canada T2P 5M5 +1 (403) 262-3443 +1 (403) 262-6712
4 Park Margaret Sales Support Agent 2 1947-09-19 00:00:00 2003-05-03 00:00:00 683 10 Street SW Calgary AB Canada T2P 5G3 +1 (403) 263-4423 +1 (403) 263-4289
5 Johnson Steve Sales Support Agent 2 1965-03-03 00:00:00 2003-10-17 00:00:00 7727B 41 Ave Calgary AB Canada T3B 1Y7 1 (780) 836-9987 1 (780) 836-9543
6 Mitchell Michael IT Manager 1 1973-07-01 00:00:00 2003-10-17 00:00:00 5827 Bowness Road NW Calgary AB Canada T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899
7 King Robert IT Staff 6 1970-05-29 00:00:00 2004-01-02 00:00:00 590 Columbia Boulevard West Lethbridge AB Canada T1K 5N8 +1 (403) 456-9986 +1 (403) 456-8485
8 Callahan Laura IT Staff 6 1968-01-09 00:00:00 2004-03-04 00:00:00 923 7 ST NW Lethbridge AB Canada T1H 1Y8 +1 (403) 467-3351 +1 (403) 467-8772

Setelah selesai mengolah database terkait dianjurkan memutuskan koneksi dengan fungsi dbDisconnect() dan jika ingin menyambungkan kembali cukup menggunakan kembali dbConnect()

dbDisconnect(chinook)

Menggunakan DPLYR

chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/SQLite/chinook.db")

RSQLite::dbListTables(chinook)
##  [1] "albums"          "artists"         "customers"       "employees"      
##  [5] "genres"          "invoice_items"   "invoices"        "media_types"    
##  [9] "playlist_track"  "playlists"       "sqlite_sequence" "sqlite_stat1"   
## [13] "tracks"
dplyr::tbl(chinook,"employees")
## # Source:   table<employees> [?? x 15]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
##   EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City 
##        <int> <chr>    <chr>     <chr>     <int> <chr>     <chr>    <chr>   <chr>
## 1          1 Adams    Andrew    Gene~        NA 1962-02-~ 2002-08~ 11120 ~ Edmo~
## 2          2 Edwards  Nancy     Sale~         1 1958-12-~ 2002-05~ 825 8 ~ Calg~
## 3          3 Peacock  Jane      Sale~         2 1973-08-~ 2002-04~ 1111 6~ Calg~
## 4          4 Park     Margaret  Sale~         2 1947-09-~ 2003-05~ 683 10~ Calg~
## 5          5 Johnson  Steve     Sale~         2 1965-03-~ 2003-10~ 7727B ~ Calg~
## 6          6 Mitchell Michael   IT M~         1 1973-07-~ 2003-10~ 5827 B~ Calg~
## 7          7 King     Robert    IT S~         6 1970-05-~ 2004-01~ 590 Co~ Leth~
## 8          8 Callahan Laura     IT S~         6 1968-01-~ 2004-03~ 923 7 ~ Leth~
## # ... with 6 more variables: State <chr>, Country <chr>, PostalCode <chr>,
## #   Phone <chr>, Fax <chr>, Email <chr>
employees <- dplyr::tbl(chinook,"employees")
class(employees)
## [1] "tbl_SQLiteConnection" "tbl_dbi"              "tbl_sql"             
## [4] "tbl_lazy"             "tbl"
employees
## # Source:   table<employees> [?? x 15]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
##   EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City 
##        <int> <chr>    <chr>     <chr>     <int> <chr>     <chr>    <chr>   <chr>
## 1          1 Adams    Andrew    Gene~        NA 1962-02-~ 2002-08~ 11120 ~ Edmo~
## 2          2 Edwards  Nancy     Sale~         1 1958-12-~ 2002-05~ 825 8 ~ Calg~
## 3          3 Peacock  Jane      Sale~         2 1973-08-~ 2002-04~ 1111 6~ Calg~
## 4          4 Park     Margaret  Sale~         2 1947-09-~ 2003-05~ 683 10~ Calg~
## 5          5 Johnson  Steve     Sale~         2 1965-03-~ 2003-10~ 7727B ~ Calg~
## 6          6 Mitchell Michael   IT M~         1 1973-07-~ 2003-10~ 5827 B~ Calg~
## 7          7 King     Robert    IT S~         6 1970-05-~ 2004-01~ 590 Co~ Leth~
## 8          8 Callahan Laura     IT S~         6 1968-01-~ 2004-03~ 923 7 ~ Leth~
## # ... with 6 more variables: State <chr>, Country <chr>, PostalCode <chr>,
## #   Phone <chr>, Fax <chr>, Email <chr>
query <- employees  %>% select(-EmployeeId,-LastName,-FirstName, -Title)
query
## # Source:   lazy query [?? x 11]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
##   ReportsTo BirthDate  HireDate  Address   City  State Country PostalCode Phone 
##       <int> <chr>      <chr>     <chr>     <chr> <chr> <chr>   <chr>      <chr> 
## 1        NA 1962-02-1~ 2002-08-~ 11120 Ja~ Edmo~ AB    Canada  T5K 2N1    +1 (7~
## 2         1 1958-12-0~ 2002-05-~ 825 8 Av~ Calg~ AB    Canada  T2P 2T3    +1 (4~
## 3         2 1973-08-2~ 2002-04-~ 1111 6 A~ Calg~ AB    Canada  T2P 5M5    +1 (4~
## 4         2 1947-09-1~ 2003-05-~ 683 10 S~ Calg~ AB    Canada  T2P 5G3    +1 (4~
## 5         2 1965-03-0~ 2003-10-~ 7727B 41~ Calg~ AB    Canada  T3B 1Y7    1 (78~
## 6         1 1973-07-0~ 2003-10-~ 5827 Bow~ Calg~ AB    Canada  T3B 0C5    +1 (4~
## 7         6 1970-05-2~ 2004-01-~ 590 Colu~ Leth~ AB    Canada  T1K 5N8    +1 (4~
## 8         6 1968-01-0~ 2004-03-~ 923 7 ST~ Leth~ AB    Canada  T1H 1Y8    +1 (4~
## # ... with 2 more variables: Fax <chr>, Email <chr>
dplyr::show_query(query)
## <SQL>
## SELECT `ReportsTo`, `BirthDate`, `HireDate`, `Address`, `City`, `State`, `Country`, `PostalCode`, `Phone`, `Fax`, `Email`
## FROM `employees`

Data Wrangling

Dataset yang digunakan adalah storms yang tersedia pada package default R datasets.

library(datasets)
library(help = "datasets")
BOD
##   Time demand
## 1    1    8.3
## 2    2   10.3
## 3    3   19.0
## 4    4   16.0
## 5    5   15.6
## 6    7   19.8

The BOD data frame has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality. Untuk mengetahui informasi terkait dataset dapat menggunakan sintaks ?sebelum nama dataset .

?BOD
View(BOD)

Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.

BOD%>% select(Time,demand)
##   Time demand
## 1    1    8.3
## 2    2   10.3
## 3    3   19.0
## 4    4   16.0
## 5    5   15.6
## 6    7   19.8
BOD %>% arrange(desc(demand))
##   Time demand
## 1    7   19.8
## 2    3   19.0
## 3    4   16.0
## 4    5   15.6
## 5    2   10.3
## 6    1    8.3
BOD %>% filter(Time>3)
##   Time demand
## 1    4   16.0
## 2    5   15.6
## 3    7   19.8
BOD %>% mutate(duration=Time*demand)
##   Time demand duration
## 1    1    8.3      8.3
## 2    2   10.3     20.6
## 3    3   19.0     57.0
## 4    4   16.0     64.0
## 5    5   15.6     78.0
## 6    7   19.8    138.6
BOD %>% group_by(Time)%>%summarise(mean=mean(Time))
## # A tibble: 6 x 2
##    Time  mean
##   <dbl> <dbl>
## 1     1     1
## 2     2     2
## 3     3     3
## 4     4     4
## 5     5     5
## 6     7     7
BOD
##   Time demand
## 1    1    8.3
## 2    2   10.3
## 3    3   19.0
## 4    4   16.0
## 5    5   15.6
## 6    7   19.8

Melakukan kelima perintah bersamaan

BOD%>% select(Time,demand) %>% arrange(desc(demand)) %>%  filter(Time>3)%>% mutate(duration=Time*demand)%>% group_by(Time)%>%summarise(mean=mean(Time))
## # A tibble: 3 x 2
##    Time  mean
##   <dbl> <dbl>
## 1     4     4
## 2     5     5
## 3     7     7