Tugas Akhir Mandarel
Engine SQL
Database yang digunakan pada publikasi ini adalah database sampel chinook digunakan sebagai database untuk mengakses dan mengolah melalui R. Lalu, diperlukan beberapa package, yaitu DBI dan RSQLite.
install.packages(c("RSQLite", "DBI"), dependencies = TRUE)Selanjutnya package tidyverse yang mencakup beberapa package diretrieve. Fungsi dalam suatu package dapat juga dipanggil tanpa perintah library() atau require() jika sintaks yang ditulis menyertakan nama package tempat fungsi tersebut berasal. Penyertaan ini dengan menggunakan tanda ::.
library(tidyverse)
library(RSQLite)
library(DBI)Secara umum, koneksi terhadap database pada R dapat menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")
chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/SQLite/chinook.db")
RSQLite::dbListTables(chinook)## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
Setelah melakukan dbConnect() kita dapat melakukan perintah SQL.
SELECT
*
FROM
employees;| EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Adams | Andrew | General Manager | NA | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
| 2 | Edwards | Nancy | Sales Manager | 1 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
| 3 | Peacock | Jane | Sales Support Agent | 2 | 1973-08-29 00:00:00 | 2002-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
| 4 | Park | Margaret | Sales Support Agent | 2 | 1947-09-19 00:00:00 | 2003-05-03 00:00:00 | 683 10 Street SW | Calgary | AB | Canada | T2P 5G3 | +1 (403) 263-4423 | +1 (403) 263-4289 | margaret@chinookcorp.com |
| 5 | Johnson | Steve | Sales Support Agent | 2 | 1965-03-03 00:00:00 | 2003-10-17 00:00:00 | 7727B 41 Ave | Calgary | AB | Canada | T3B 1Y7 | 1 (780) 836-9987 | 1 (780) 836-9543 | steve@chinookcorp.com |
| 6 | Mitchell | Michael | IT Manager | 1 | 1973-07-01 00:00:00 | 2003-10-17 00:00:00 | 5827 Bowness Road NW | Calgary | AB | Canada | T3B 0C5 | +1 (403) 246-9887 | +1 (403) 246-9899 | michael@chinookcorp.com |
| 7 | King | Robert | IT Staff | 6 | 1970-05-29 00:00:00 | 2004-01-02 00:00:00 | 590 Columbia Boulevard West | Lethbridge | AB | Canada | T1K 5N8 | +1 (403) 456-9986 | +1 (403) 456-8485 | robert@chinookcorp.com |
| 8 | Callahan | Laura | IT Staff | 6 | 1968-01-09 00:00:00 | 2004-03-04 00:00:00 | 923 7 ST NW | Lethbridge | AB | Canada | T1H 1Y8 | +1 (403) 467-3351 | +1 (403) 467-8772 | laura@chinookcorp.com |
Setelah selesai mengolah database terkait dianjurkan memutuskan koneksi dengan fungsi dbDisconnect() dan jika ingin menyambungkan kembali cukup menggunakan kembali dbConnect()
dbDisconnect(chinook)Menggunakan DPLYR
chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/SQLite/chinook.db")
RSQLite::dbListTables(chinook)## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
dplyr::tbl(chinook,"employees")## # Source: table<employees> [?? x 15]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City
## <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <chr>
## 1 1 Adams Andrew Gene~ NA 1962-02-~ 2002-08~ 11120 ~ Edmo~
## 2 2 Edwards Nancy Sale~ 1 1958-12-~ 2002-05~ 825 8 ~ Calg~
## 3 3 Peacock Jane Sale~ 2 1973-08-~ 2002-04~ 1111 6~ Calg~
## 4 4 Park Margaret Sale~ 2 1947-09-~ 2003-05~ 683 10~ Calg~
## 5 5 Johnson Steve Sale~ 2 1965-03-~ 2003-10~ 7727B ~ Calg~
## 6 6 Mitchell Michael IT M~ 1 1973-07-~ 2003-10~ 5827 B~ Calg~
## 7 7 King Robert IT S~ 6 1970-05-~ 2004-01~ 590 Co~ Leth~
## 8 8 Callahan Laura IT S~ 6 1968-01-~ 2004-03~ 923 7 ~ Leth~
## # ... with 6 more variables: State <chr>, Country <chr>, PostalCode <chr>,
## # Phone <chr>, Fax <chr>, Email <chr>
employees <- dplyr::tbl(chinook,"employees")
class(employees)## [1] "tbl_SQLiteConnection" "tbl_dbi" "tbl_sql"
## [4] "tbl_lazy" "tbl"
employees## # Source: table<employees> [?? x 15]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City
## <int> <chr> <chr> <chr> <int> <chr> <chr> <chr> <chr>
## 1 1 Adams Andrew Gene~ NA 1962-02-~ 2002-08~ 11120 ~ Edmo~
## 2 2 Edwards Nancy Sale~ 1 1958-12-~ 2002-05~ 825 8 ~ Calg~
## 3 3 Peacock Jane Sale~ 2 1973-08-~ 2002-04~ 1111 6~ Calg~
## 4 4 Park Margaret Sale~ 2 1947-09-~ 2003-05~ 683 10~ Calg~
## 5 5 Johnson Steve Sale~ 2 1965-03-~ 2003-10~ 7727B ~ Calg~
## 6 6 Mitchell Michael IT M~ 1 1973-07-~ 2003-10~ 5827 B~ Calg~
## 7 7 King Robert IT S~ 6 1970-05-~ 2004-01~ 590 Co~ Leth~
## 8 8 Callahan Laura IT S~ 6 1968-01-~ 2004-03~ 923 7 ~ Leth~
## # ... with 6 more variables: State <chr>, Country <chr>, PostalCode <chr>,
## # Phone <chr>, Fax <chr>, Email <chr>
query <- employees %>% select(-EmployeeId,-LastName,-FirstName, -Title)
query## # Source: lazy query [?? x 11]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## ReportsTo BirthDate HireDate Address City State Country PostalCode Phone
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 NA 1962-02-1~ 2002-08-~ 11120 Ja~ Edmo~ AB Canada T5K 2N1 +1 (7~
## 2 1 1958-12-0~ 2002-05-~ 825 8 Av~ Calg~ AB Canada T2P 2T3 +1 (4~
## 3 2 1973-08-2~ 2002-04-~ 1111 6 A~ Calg~ AB Canada T2P 5M5 +1 (4~
## 4 2 1947-09-1~ 2003-05-~ 683 10 S~ Calg~ AB Canada T2P 5G3 +1 (4~
## 5 2 1965-03-0~ 2003-10-~ 7727B 41~ Calg~ AB Canada T3B 1Y7 1 (78~
## 6 1 1973-07-0~ 2003-10-~ 5827 Bow~ Calg~ AB Canada T3B 0C5 +1 (4~
## 7 6 1970-05-2~ 2004-01-~ 590 Colu~ Leth~ AB Canada T1K 5N8 +1 (4~
## 8 6 1968-01-0~ 2004-03-~ 923 7 ST~ Leth~ AB Canada T1H 1Y8 +1 (4~
## # ... with 2 more variables: Fax <chr>, Email <chr>
dplyr::show_query(query)## <SQL>
## SELECT `ReportsTo`, `BirthDate`, `HireDate`, `Address`, `City`, `State`, `Country`, `PostalCode`, `Phone`, `Fax`, `Email`
## FROM `employees`
Data Wrangling
Dataset yang digunakan adalah storms yang tersedia pada package default R datasets.
library(datasets)
library(help = "datasets")
BOD## Time demand
## 1 1 8.3
## 2 2 10.3
## 3 3 19.0
## 4 4 16.0
## 5 5 15.6
## 6 7 19.8
The BOD data frame has 6 rows and 2 columns giving the biochemical oxygen demand versus time in an evaluation of water quality. Untuk mengetahui informasi terkait dataset dapat menggunakan sintaks ?sebelum nama dataset .
?BOD
View(BOD)Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.
BOD%>% select(Time,demand)## Time demand
## 1 1 8.3
## 2 2 10.3
## 3 3 19.0
## 4 4 16.0
## 5 5 15.6
## 6 7 19.8
BOD %>% arrange(desc(demand))## Time demand
## 1 7 19.8
## 2 3 19.0
## 3 4 16.0
## 4 5 15.6
## 5 2 10.3
## 6 1 8.3
BOD %>% filter(Time>3)## Time demand
## 1 4 16.0
## 2 5 15.6
## 3 7 19.8
BOD %>% mutate(duration=Time*demand)## Time demand duration
## 1 1 8.3 8.3
## 2 2 10.3 20.6
## 3 3 19.0 57.0
## 4 4 16.0 64.0
## 5 5 15.6 78.0
## 6 7 19.8 138.6
BOD %>% group_by(Time)%>%summarise(mean=mean(Time))## # A tibble: 6 x 2
## Time mean
## <dbl> <dbl>
## 1 1 1
## 2 2 2
## 3 3 3
## 4 4 4
## 5 5 5
## 6 7 7
BOD## Time demand
## 1 1 8.3
## 2 2 10.3
## 3 3 19.0
## 4 4 16.0
## 5 5 15.6
## 6 7 19.8
Melakukan kelima perintah bersamaan
BOD%>% select(Time,demand) %>% arrange(desc(demand)) %>% filter(Time>3)%>% mutate(duration=Time*demand)%>% group_by(Time)%>%summarise(mean=mean(Time))## # A tibble: 3 x 2
## Time mean
## <dbl> <dbl>
## 1 4 4
## 2 5 5
## 3 7 7