Tugas Akhir Mandarel
Engine SQL
Database yang digunakan pada publikasi ini berasal dari sampel chinook . Supaya dapat mengakses dan mengolah database ini melalui R diperlukan packages yang akan digunakan yaitu DBI dan RSQLite.
install.packages(c("RSQLite", "DBI"), dependencies = TRUE)Selanjutnya package tidyverse yang mencakup beberapa package diretrieve. Fungsi dalam suatu package dapat juga dipanggil tanpa perintah library() atau require() jika sintaks yang ditulis menyertakan nama package tempat fungsi tersebut berasal. Penyertaan ini dengan menggunakan tanda ::.
library(tidyverse)
library(RSQLite)
library(DBI)Secara umum, koneksi terhadap database pada R dapat menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")
chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/chinook.db")
RSQLite::dbListTables(chinook)## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
Setelah melakukan dbConnect() kita dapat melakukan perintah SQL.
SELECT
*
FROM
tracks;| TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
|---|---|---|---|---|---|---|---|---|
| 1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
| 2 | Balls to the Wall | 2 | 2 | 1 | NA | 342562 | 5510424 | 0.99 |
| 3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 |
| 4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman | 252051 | 4331779 | 0.99 |
| 5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |
| 6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 |
| 7 | Let's Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 |
| 8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 |
| 9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 |
| 10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 |
Setelah selesai mengolah database terkait, untuk memutuskan hubungan kita dapat menggunakan fungsi dbDisconnect() dan jika ingin menyambungkan kembali cukup menggunakan kembali dbConnect()
dbDisconnect(chinook)Menggunakan DPLYR
chinook<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/chinook.db")
RSQLite::dbListTables(chinook)## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
dplyr::tbl(chinook,"tracks")## # Source: table<tracks> [?? x 9]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes
## <int> <chr> <int> <int> <int> <chr> <int> <int>
## 1 1 For Tho~ 1 1 1 Angus Young~ 343719 1.12e7
## 2 2 Balls t~ 2 2 1 <NA> 342562 5.51e6
## 3 3 Fast As~ 3 2 1 F. Baltes, ~ 230619 3.99e6
## 4 4 Restles~ 3 2 1 F. Baltes, ~ 252051 4.33e6
## 5 5 Princes~ 3 2 1 Deaffy & R.~ 375418 6.29e6
## 6 6 Put The~ 1 1 1 Angus Young~ 205662 6.71e6
## 7 7 Let's G~ 1 1 1 Angus Young~ 233926 7.64e6
## 8 8 Inject ~ 1 1 1 Angus Young~ 210834 6.85e6
## 9 9 Snowbal~ 1 1 1 Angus Young~ 203102 6.60e6
## 10 10 Evil Wa~ 1 1 1 Angus Young~ 263497 8.61e6
## # ... with more rows, and 1 more variable: UnitPrice <dbl>
tracks <- dplyr::tbl(chinook,"tracks")
class(tracks)## [1] "tbl_SQLiteConnection" "tbl_dbi" "tbl_sql"
## [4] "tbl_lazy" "tbl"
tracks## # Source: table<tracks> [?? x 9]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes
## <int> <chr> <int> <int> <int> <chr> <int> <int>
## 1 1 For Tho~ 1 1 1 Angus Young~ 343719 1.12e7
## 2 2 Balls t~ 2 2 1 <NA> 342562 5.51e6
## 3 3 Fast As~ 3 2 1 F. Baltes, ~ 230619 3.99e6
## 4 4 Restles~ 3 2 1 F. Baltes, ~ 252051 4.33e6
## 5 5 Princes~ 3 2 1 Deaffy & R.~ 375418 6.29e6
## 6 6 Put The~ 1 1 1 Angus Young~ 205662 6.71e6
## 7 7 Let's G~ 1 1 1 Angus Young~ 233926 7.64e6
## 8 8 Inject ~ 1 1 1 Angus Young~ 210834 6.85e6
## 9 9 Snowbal~ 1 1 1 Angus Young~ 203102 6.60e6
## 10 10 Evil Wa~ 1 1 1 Angus Young~ 263497 8.61e6
## # ... with more rows, and 1 more variable: UnitPrice <dbl>
query <- tracks %>% select(-AlbumId)
query## # Source: lazy query [?? x 8]
## # Database: sqlite 3.37.0 [C:\sqlite\chinook.db]
## TrackId Name MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
## <int> <chr> <int> <int> <chr> <int> <int> <dbl>
## 1 1 For Th~ 1 1 Angus Youn~ 343719 1.12e7 0.99
## 2 2 Balls ~ 2 1 <NA> 342562 5.51e6 0.99
## 3 3 Fast A~ 2 1 F. Baltes,~ 230619 3.99e6 0.99
## 4 4 Restle~ 2 1 F. Baltes,~ 252051 4.33e6 0.99
## 5 5 Prince~ 2 1 Deaffy & R~ 375418 6.29e6 0.99
## 6 6 Put Th~ 1 1 Angus Youn~ 205662 6.71e6 0.99
## 7 7 Let's ~ 1 1 Angus Youn~ 233926 7.64e6 0.99
## 8 8 Inject~ 1 1 Angus Youn~ 210834 6.85e6 0.99
## 9 9 Snowba~ 1 1 Angus Youn~ 203102 6.60e6 0.99
## 10 10 Evil W~ 1 1 Angus Youn~ 263497 8.61e6 0.99
## # ... with more rows
dplyr::show_query(query)## <SQL>
## SELECT `TrackId`, `Name`, `MediaTypeId`, `GenreId`, `Composer`, `Milliseconds`, `Bytes`, `UnitPrice`
## FROM `tracks`
Data Wrangling
Dataset yang digunakan adalah storms yang tersedia pada package default R datasets.
library(datasets)
library(help = "datasets")
PlantGrowth## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
## 7 5.17 ctrl
## 8 4.53 ctrl
## 9 5.33 ctrl
## 10 5.14 ctrl
## 11 4.81 trt1
## 12 4.17 trt1
## 13 4.41 trt1
## 14 3.59 trt1
## 15 5.87 trt1
## 16 3.83 trt1
## 17 6.03 trt1
## 18 4.89 trt1
## 19 4.32 trt1
## 20 4.69 trt1
## 21 6.31 trt2
## 22 5.12 trt2
## 23 5.54 trt2
## 24 5.50 trt2
## 25 5.37 trt2
## 26 5.29 trt2
## 27 4.92 trt2
## 28 6.15 trt2
## 29 5.80 trt2
## 30 5.26 trt2
Packages plantgrowth biasanya hasil dari percobaan untuk membandingkan hasil (yang diukur dengan berat kering tanaman) yang diperoleh di bawah kontrol dan dua kondisi perlakuan yang berbeda.Formatnya data frame terdiri atas dua kasus pada dua variabel [,1]berat yang berupa numerik, [,2]kelompok faktor. Untuk mengetahui informasi terkait dataset dapat menggunakan sintaks ?dan meletakannya sebelum nama dataset .
?PlantGrowth
View(PlantGrowth)Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.
PlantGrowth%>% select(weight,group)## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
## 7 5.17 ctrl
## 8 4.53 ctrl
## 9 5.33 ctrl
## 10 5.14 ctrl
## 11 4.81 trt1
## 12 4.17 trt1
## 13 4.41 trt1
## 14 3.59 trt1
## 15 5.87 trt1
## 16 3.83 trt1
## 17 6.03 trt1
## 18 4.89 trt1
## 19 4.32 trt1
## 20 4.69 trt1
## 21 6.31 trt2
## 22 5.12 trt2
## 23 5.54 trt2
## 24 5.50 trt2
## 25 5.37 trt2
## 26 5.29 trt2
## 27 4.92 trt2
## 28 6.15 trt2
## 29 5.80 trt2
## 30 5.26 trt2
PlantGrowth %>% arrange(desc(weight))## weight group
## 1 6.31 trt2
## 2 6.15 trt2
## 3 6.11 ctrl
## 4 6.03 trt1
## 5 5.87 trt1
## 6 5.80 trt2
## 7 5.58 ctrl
## 8 5.54 trt2
## 9 5.50 trt2
## 10 5.37 trt2
## 11 5.33 ctrl
## 12 5.29 trt2
## 13 5.26 trt2
## 14 5.18 ctrl
## 15 5.17 ctrl
## 16 5.14 ctrl
## 17 5.12 trt2
## 18 4.92 trt2
## 19 4.89 trt1
## 20 4.81 trt1
## 21 4.69 trt1
## 22 4.61 ctrl
## 23 4.53 ctrl
## 24 4.50 ctrl
## 25 4.41 trt1
## 26 4.32 trt1
## 27 4.17 ctrl
## 28 4.17 trt1
## 29 3.83 trt1
## 30 3.59 trt1
PlantGrowth %>% filter(weight>5.12)## weight group
## 1 5.58 ctrl
## 2 5.18 ctrl
## 3 6.11 ctrl
## 4 5.17 ctrl
## 5 5.33 ctrl
## 6 5.14 ctrl
## 7 5.87 trt1
## 8 6.03 trt1
## 9 6.31 trt2
## 10 5.54 trt2
## 11 5.50 trt2
## 12 5.37 trt2
## 13 5.29 trt2
## 14 6.15 trt2
## 15 5.80 trt2
## 16 5.26 trt2
PlantGrowth %>% mutate(kuadrat=weight^2)## weight group kuadrat
## 1 4.17 ctrl 17.3889
## 2 5.58 ctrl 31.1364
## 3 5.18 ctrl 26.8324
## 4 6.11 ctrl 37.3321
## 5 4.50 ctrl 20.2500
## 6 4.61 ctrl 21.2521
## 7 5.17 ctrl 26.7289
## 8 4.53 ctrl 20.5209
## 9 5.33 ctrl 28.4089
## 10 5.14 ctrl 26.4196
## 11 4.81 trt1 23.1361
## 12 4.17 trt1 17.3889
## 13 4.41 trt1 19.4481
## 14 3.59 trt1 12.8881
## 15 5.87 trt1 34.4569
## 16 3.83 trt1 14.6689
## 17 6.03 trt1 36.3609
## 18 4.89 trt1 23.9121
## 19 4.32 trt1 18.6624
## 20 4.69 trt1 21.9961
## 21 6.31 trt2 39.8161
## 22 5.12 trt2 26.2144
## 23 5.54 trt2 30.6916
## 24 5.50 trt2 30.2500
## 25 5.37 trt2 28.8369
## 26 5.29 trt2 27.9841
## 27 4.92 trt2 24.2064
## 28 6.15 trt2 37.8225
## 29 5.80 trt2 33.6400
## 30 5.26 trt2 27.6676
PlantGrowth %>% group_by(weight)%>%summarise(mean=mean(weight))## # A tibble: 29 x 2
## weight mean
## <dbl> <dbl>
## 1 3.59 3.59
## 2 3.83 3.83
## 3 4.17 4.17
## 4 4.32 4.32
## 5 4.41 4.41
## 6 4.5 4.5
## 7 4.53 4.53
## 8 4.61 4.61
## 9 4.69 4.69
## 10 4.81 4.81
## # ... with 19 more rows
PlantGrowth## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
## 7 5.17 ctrl
## 8 4.53 ctrl
## 9 5.33 ctrl
## 10 5.14 ctrl
## 11 4.81 trt1
## 12 4.17 trt1
## 13 4.41 trt1
## 14 3.59 trt1
## 15 5.87 trt1
## 16 3.83 trt1
## 17 6.03 trt1
## 18 4.89 trt1
## 19 4.32 trt1
## 20 4.69 trt1
## 21 6.31 trt2
## 22 5.12 trt2
## 23 5.54 trt2
## 24 5.50 trt2
## 25 5.37 trt2
## 26 5.29 trt2
## 27 4.92 trt2
## 28 6.15 trt2
## 29 5.80 trt2
## 30 5.26 trt2
Melakukan kelima perintah bersamaan
PlantGrowth%>% select(weight,group) %>% arrange(desc(weight)) %>% filter(weight>5.12)%>% mutate(kuadrat=weight^2)%>% group_by(weight)%>%summarise(mean=mean(weight))## # A tibble: 16 x 2
## weight mean
## <dbl> <dbl>
## 1 5.14 5.14
## 2 5.17 5.17
## 3 5.18 5.18
## 4 5.26 5.26
## 5 5.29 5.29
## 6 5.33 5.33
## 7 5.37 5.37
## 8 5.5 5.5
## 9 5.54 5.54
## 10 5.58 5.58
## 11 5.8 5.8
## 12 5.87 5.87
## 13 6.03 6.03
## 14 6.11 6.11
## 15 6.15 6.15
## 16 6.31 6.31