Tugas Akhir Mandarel

Engine SQL

Database yang digunakan pada publikasi ini adalah database sampel northwindSumber . Untuk mengakses dan mengolah database ini melalui R diperlukan package DBI dan RSQLite.

Melakukan Instalasi Package yang dibutuhkan

install.packages(c("RSQLite", "DBI"), dependencies = TRUE)
install.packages("tidyverse")

Selanjutnya package tidyverse yang mencakup beberapa package dipanggil. Fungsi dalam suatu package dapat juga dipanggil tanpa perintah library() atau require() jika sintaks yang ditulis menyertakan nama package tempat fungsi tersebut berasal. Penyertaan ini dengan menggunakan tanda ::.

Memanggil library

library(tidyverse)
library(RSQLite)
library(DBI)

Secara umum, koneksi terhadap database pada R dapat menggunakan sintaks berikut: DBI::dbConnect(RSQLite::SQLite(), path = ":dbname:")

Melakukan Koneksi pada Database

northwind<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/northwind_large.sqlite")

Menggunakan perintah SQL pada database

Setelah melakukan dbConnect() kita dapat melakukan perintah SQL.

SELECT 
  *
FROM
  EMPLOYEE
WHERE
Region ="North America";
5 records
Id LastName FirstName Title TitleOfCourtesy BirthDate HireDate Address City Region PostalCode Country HomePhone Extension Photo Notes ReportsTo PhotoPath
1 Davolio Nancy Sales Representative Ms. 1980-12-08 2024-05-01 507 - 20th Ave. E. Apt. 2A Seattle North America 98122 USA (206) 555-9857 5467 Education includes a BA in psychology from Colorado State University in 1970. She also completed ‘The Art of the Cold Call.’ Nancy is a member of Toastmasters International. 2 http://accweb/emmployees/davolio.bmp
2 Fuller Andrew Vice President, Sales Dr. 1984-02-19 2024-08-14 908 W. Capital Way Tacoma North America 98401 USA (206) 555-9482 3457 Andrew received his BTS commercial in 1974 and a Ph.D. in international marketing from the University of Dallas in 1981. He is fluent in French and Italian and reads German. He joined the company as a sales representative, was promoted to sales manager in January 1992 and to vice president of sales in March 1993. Andrew is a member of the Sales Management Roundtable, the Seattle Chamber of Commerce, and the Pacific Rim Importers Association. NA http://accweb/emmployees/fuller.bmp
3 Leverling Janet Sales Representative Ms. 1995-08-30 2024-04-01 722 Moss Bay Blvd. Kirkland North America 98033 USA (206) 555-3412 3355 Janet has a BS degree in chemistry from Boston College (1984). She has also completed a certificate program in food retailing management. Janet was hired as a sales associate in 1991 and promoted to sales representative in February 1992. 2 http://accweb/emmployees/leverling.bmp
4 Peacock Margaret Sales Representative Mrs. 1969-09-19 2025-05-03 4110 Old Redmond Rd. Redmond North America 98052 USA (206) 555-8122 5176 Margaret holds a BA in English literature from Concordia College (1958) and an MA from the American Institute of Culinary Arts (1966). She was assigned to the London office temporarily from July through November 1992. 2 http://accweb/emmployees/peacock.bmp
8 Callahan Laura Inside Sales Coordinator Ms. 1990-01-09 2026-03-05 4726 - 11th Ave. N.E. Seattle North America 98105 USA (206) 555-1189 2344 Laura received a BA in psychology from the University of Washington. She has also completed a course in business French. She reads and writes French. 2 http://accweb/emmployees/davolio.bmp

Memutus Koneksi Database

Setelah selesai mengolah database terkait dianjurkan memutuskan koneksi dengan fungsi dbDisconnect() dan jika ingin menyambungkan kembali cukup menggunakan kembali dbConnect()

dbDisconnect(northwind)

Menggunakan DPLYR

Melakukan Koneksi pada Database Northwind

northwind<-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/northwind_large.sqlite")

Memuat Tabel

employee <- dplyr::tbl(northwind,"Employee")
employee
## # Source:   table<Employee> [?? x 18]
## # Database: sqlite 3.36.0 [C:\sqlite\northwind_large.sqlite]
##      Id LastName  FirstName Title   TitleOfCourtesy BirthDate HireDate Address  
##   <int> <chr>     <chr>     <chr>   <chr>           <chr>     <chr>    <chr>    
## 1     1 Davolio   Nancy     Sales ~ Ms.             1980-12-~ 2024-05~ 507 - 20~
## 2     2 Fuller    Andrew    Vice P~ Dr.             1984-02-~ 2024-08~ 908 W. C~
## 3     3 Leverling Janet     Sales ~ Ms.             1995-08-~ 2024-04~ 722 Moss~
## 4     4 Peacock   Margaret  Sales ~ Mrs.            1969-09-~ 2025-05~ 4110 Old~
## 5     5 Buchanan  Steven    Sales ~ Mr.             1987-03-~ 2025-10~ 14 Garre~
## 6     6 Suyama    Michael   Sales ~ Mr.             1995-07-~ 2025-10~ Coventry~
## 7     7 King      Robert    Sales ~ Mr.             1992-05-~ 2026-01~ Edgeham ~
## 8     8 Callahan  Laura     Inside~ Ms.             1990-01-~ 2026-03~ 4726 - 1~
## 9     9 Dodsworth Anne      Sales ~ Ms.             1998-01-~ 2026-11~ 7 Hounds~
## # ... with 10 more variables: City <chr>, Region <chr>, PostalCode <chr>,
## #   Country <chr>, HomePhone <chr>, Extension <chr>, Photo <blob>, Notes <chr>,
## #   ReportsTo <int>, PhotoPath <chr>

Melihat Kelas dari Tabel

class(employee)
## [1] "tbl_SQLiteConnection" "tbl_dbi"              "tbl_sql"             
## [4] "tbl_lazy"             "tbl"

Menyeleksi Kolom pada Tabel

query <- employee  %>% select(-Photo,-Notes,-PhotoPath)
query
## # Source:   lazy query [?? x 15]
## # Database: sqlite 3.36.0 [C:\sqlite\northwind_large.sqlite]
##      Id LastName  FirstName Title   TitleOfCourtesy BirthDate HireDate Address  
##   <int> <chr>     <chr>     <chr>   <chr>           <chr>     <chr>    <chr>    
## 1     1 Davolio   Nancy     Sales ~ Ms.             1980-12-~ 2024-05~ 507 - 20~
## 2     2 Fuller    Andrew    Vice P~ Dr.             1984-02-~ 2024-08~ 908 W. C~
## 3     3 Leverling Janet     Sales ~ Ms.             1995-08-~ 2024-04~ 722 Moss~
## 4     4 Peacock   Margaret  Sales ~ Mrs.            1969-09-~ 2025-05~ 4110 Old~
## 5     5 Buchanan  Steven    Sales ~ Mr.             1987-03-~ 2025-10~ 14 Garre~
## 6     6 Suyama    Michael   Sales ~ Mr.             1995-07-~ 2025-10~ Coventry~
## 7     7 King      Robert    Sales ~ Mr.             1992-05-~ 2026-01~ Edgeham ~
## 8     8 Callahan  Laura     Inside~ Ms.             1990-01-~ 2026-03~ 4726 - 1~
## 9     9 Dodsworth Anne      Sales ~ Ms.             1998-01-~ 2026-11~ 7 Hounds~
## # ... with 7 more variables: City <chr>, Region <chr>, PostalCode <chr>,
## #   Country <chr>, HomePhone <chr>, Extension <chr>, ReportsTo <int>

Data Wrangling

Dataset yang digunakan adalah storms yang tersedia pada package default R datasets. ## Memanggil Library datasets dan memilih dataset

library(datasets)
storms
## # A tibble: 10,010 x 13
##    name   year month   day  hour   lat  long status      category  wind pressure
##    <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>       <ord>    <int>    <int>
##  1 Amy    1975     6    27     0  27.5 -79   tropical d~ -1          25     1013
##  2 Amy    1975     6    27     6  28.5 -79   tropical d~ -1          25     1013
##  3 Amy    1975     6    27    12  29.5 -79   tropical d~ -1          25     1013
##  4 Amy    1975     6    27    18  30.5 -79   tropical d~ -1          25     1013
##  5 Amy    1975     6    28     0  31.5 -78.8 tropical d~ -1          25     1012
##  6 Amy    1975     6    28     6  32.4 -78.7 tropical d~ -1          25     1012
##  7 Amy    1975     6    28    12  33.3 -78   tropical d~ -1          25     1011
##  8 Amy    1975     6    28    18  34   -77   tropical d~ -1          30     1006
##  9 Amy    1975     6    29     0  34.4 -75.8 tropical s~ 0           35     1004
## 10 Amy    1975     6    29     6  34   -74.8 tropical s~ 0           40     1002
## # ... with 10,000 more rows, and 2 more variables: ts_diameter <dbl>,
## #   hu_diameter <dbl>

Data ini adalah bagian dari data track terbaik database badai Atlantik NOAA, Sumber . Data tersebut mencakup posisi dan atribut dari 198 badai tropis, yang diukur setiap enam jam selama masa badai. ## Melihat Detail dari Dataset Sampel Untuk mengetahui informasi terkait dataset dapat menggunakan sintaks ?sebelum nama dataset .

?storms
View(storms)

Dataset tersebut akan dilakukan perintah arrange , mutate ,filter,select, dan summarise.

Memilih Kolom Tertentu

Sintaks dibawah jika dijalankan akan memuat semua kolom pada tabel storms kecuali kolom ts_diameter dan hu_diameter.

storms%>% select(-ts_diameter,-hu_diameter)
## # A tibble: 10,010 x 11
##    name   year month   day  hour   lat  long status      category  wind pressure
##    <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>       <ord>    <int>    <int>
##  1 Amy    1975     6    27     0  27.5 -79   tropical d~ -1          25     1013
##  2 Amy    1975     6    27     6  28.5 -79   tropical d~ -1          25     1013
##  3 Amy    1975     6    27    12  29.5 -79   tropical d~ -1          25     1013
##  4 Amy    1975     6    27    18  30.5 -79   tropical d~ -1          25     1013
##  5 Amy    1975     6    28     0  31.5 -78.8 tropical d~ -1          25     1012
##  6 Amy    1975     6    28     6  32.4 -78.7 tropical d~ -1          25     1012
##  7 Amy    1975     6    28    12  33.3 -78   tropical d~ -1          25     1011
##  8 Amy    1975     6    28    18  34   -77   tropical d~ -1          30     1006
##  9 Amy    1975     6    29     0  34.4 -75.8 tropical s~ 0           35     1004
## 10 Amy    1975     6    29     6  34   -74.8 tropical s~ 0           40     1002
## # ... with 10,000 more rows

Mengurutkan Data Berdasarkan Peubah Tertentu

Sintaks dibawah jika dijalankan akan mengurutkan data secara descending berdasarkan peubah pressure

storms %>% arrange(desc(pressure))
## # A tibble: 10,010 x 13
##    name    year month   day  hour   lat  long status     category  wind pressure
##    <chr>  <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>      <ord>    <int>    <int>
##  1 AL072~  2003     7    26    12  32.3 -82   tropical ~ -1          20     1022
##  2 AL072~  2003     7    26    18  32.8 -82.6 tropical ~ -1          15     1022
##  3 AL072~  2003     7    27     0  33   -83   tropical ~ -1          15     1022
##  4 Emily   1993     8    22    18  19.9 -52.6 tropical ~ -1          30     1020
##  5 Emily   1993     8    23     0  20.5 -53.6 tropical ~ -1          30     1020
##  6 Emily   1993     8    23     6  21.3 -54.8 tropical ~ -1          30     1020
##  7 Emily   1993     8    23    12  22.3 -56   tropical ~ -1          30     1020
##  8 Emily   1993     8    23    18  23.2 -57.1 tropical ~ -1          30     1020
##  9 Emily   1993     8    24     0  24.3 -57.8 tropical ~ -1          30     1020
## 10 Emily   1993     8    24     6  25.4 -58.6 tropical ~ -1          30     1020
## # ... with 10,000 more rows, and 2 more variables: ts_diameter <dbl>,
## #   hu_diameter <dbl>

Melakukan Filter pada Tabel

storms %>% filter(wind>20)
## # A tibble: 9,723 x 13
##    name   year month   day  hour   lat  long status      category  wind pressure
##    <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>       <ord>    <int>    <int>
##  1 Amy    1975     6    27     0  27.5 -79   tropical d~ -1          25     1013
##  2 Amy    1975     6    27     6  28.5 -79   tropical d~ -1          25     1013
##  3 Amy    1975     6    27    12  29.5 -79   tropical d~ -1          25     1013
##  4 Amy    1975     6    27    18  30.5 -79   tropical d~ -1          25     1013
##  5 Amy    1975     6    28     0  31.5 -78.8 tropical d~ -1          25     1012
##  6 Amy    1975     6    28     6  32.4 -78.7 tropical d~ -1          25     1012
##  7 Amy    1975     6    28    12  33.3 -78   tropical d~ -1          25     1011
##  8 Amy    1975     6    28    18  34   -77   tropical d~ -1          30     1006
##  9 Amy    1975     6    29     0  34.4 -75.8 tropical s~ 0           35     1004
## 10 Amy    1975     6    29     6  34   -74.8 tropical s~ 0           40     1002
## # ... with 9,713 more rows, and 2 more variables: ts_diameter <dbl>,
## #   hu_diameter <dbl>

Menambahkan Peubah Baru berdasarkan Peubah Lain dengan fungsi mutate()

storms %>% mutate(duration=day*hour)
## # A tibble: 10,010 x 14
##    name   year month   day  hour   lat  long status      category  wind pressure
##    <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>       <ord>    <int>    <int>
##  1 Amy    1975     6    27     0  27.5 -79   tropical d~ -1          25     1013
##  2 Amy    1975     6    27     6  28.5 -79   tropical d~ -1          25     1013
##  3 Amy    1975     6    27    12  29.5 -79   tropical d~ -1          25     1013
##  4 Amy    1975     6    27    18  30.5 -79   tropical d~ -1          25     1013
##  5 Amy    1975     6    28     0  31.5 -78.8 tropical d~ -1          25     1012
##  6 Amy    1975     6    28     6  32.4 -78.7 tropical d~ -1          25     1012
##  7 Amy    1975     6    28    12  33.3 -78   tropical d~ -1          25     1011
##  8 Amy    1975     6    28    18  34   -77   tropical d~ -1          30     1006
##  9 Amy    1975     6    29     0  34.4 -75.8 tropical s~ 0           35     1004
## 10 Amy    1975     6    29     6  34   -74.8 tropical s~ 0           40     1002
## # ... with 10,000 more rows, and 3 more variables: ts_diameter <dbl>,
## #   hu_diameter <dbl>, duration <dbl>

Menggunakan fungsi summarise()

storms %>% group_by(status)%>%summarise(mean=mean(pressure))
## # A tibble: 3 x 2
##   status               mean
##   <chr>               <dbl>
## 1 hurricane            969.
## 2 tropical depression 1008.
## 3 tropical storm       999.

Melakukan kelima perintah bersamaan

storms%>% select(-ts_diameter,-hu_diameter) %>% arrange(desc(pressure)) %>%  filter(wind>20)%>% mutate(duration=day*hour)%>% group_by(status)%>%summarise(mean=mean(pressure))
## # A tibble: 3 x 2
##   status               mean
##   <chr>               <dbl>
## 1 hurricane            969.
## 2 tropical depression 1008.
## 3 tropical storm       999.