Tugas Praktikum 03 Sains Data

Combining Multiple Tables From Database with dplyr in R

Package yang digunakan pada latihan ini adalah sebagai beikut

## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.0     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.4
## Warning: package 'tibble' was built under R version 4.0.4
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## Warning: package 'forcats' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Warning: package 'RSQLite' was built under R version 4.0.5
## Warning: package 'DBI' was built under R version 4.0.5

Impor Data

Langkah pertama adalah membuat koneksi terhdap database pada R software, Data yang digunakan dalam latihan ini adalah data chinook.

chinook <-DBI::dbConnect(RSQLite::SQLite(), "C:/sqlite/db/chinook.db")
class(chinook)
## [1] "SQLiteConnection"
## attr(,"package")
## [1] "RSQLite"

Terdapat 14 tabel pada database tersebut.

RSQLite::dbListTables(chinook)
##  [1] "albums"          "artists"         "customers"       "employees"      
##  [5] "genres"          "invoice_items"   "invoices"        "media_types"    
##  [9] "playlist_track"  "playlists"       "sqlite_sequence" "sqlite_stat1"   
## [13] "tracks"

Inner Join

inner_join() menghasilkan semua baris pada table x yang memiliki kesamaan nilai dengan table y, dan semua kolom dari x dan y. Pada latihan fungsi inner_join(), tabel yang akan digabungkan yaitu tabel albums dan tabel artists dimana ada kesamaan data pada dua tabel tersebut yaitu data ArtisId.

albums <- dplyr::tbl(chinook,"albums")
albums
## # Source:   table<albums> [?? x 3]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    AlbumId Title                                 ArtistId
##      <int> <chr>                                    <int>
##  1       1 For Those About To Rock We Salute You        1
##  2       2 Balls to the Wall                            2
##  3       3 Restless and Wild                            2
##  4       4 Let There Be Rock                            1
##  5       5 Big Ones                                     3
##  6       6 Jagged Little Pill                           4
##  7       7 Facelift                                     5
##  8       8 Warner 25 Anos                               6
##  9       9 Plays Metallica By Four Cellos               7
## 10      10 Audioslave                                   8
## # ... with more rows
artists <-dplyr::tbl(chinook,"artists")
artists
## # Source:   table<artists> [?? x 2]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    ArtistId Name                
##       <int> <chr>               
##  1        1 AC/DC               
##  2        2 Accept              
##  3        3 Aerosmith           
##  4        4 Alanis Morissette   
##  5        5 Alice In Chains     
##  6        6 Antônio Carlos Jobim
##  7        7 Apocalyptica        
##  8        8 Audioslave          
##  9        9 BackBeat            
## 10       10 Billy Cobham        
## # ... with more rows
inner_join(albums, artists)
## Joining, by = "ArtistId"
## # Source:   lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    AlbumId Title                                 ArtistId Name                
##      <int> <chr>                                    <int> <chr>               
##  1       1 For Those About To Rock We Salute You        1 AC/DC               
##  2       2 Balls to the Wall                            2 Accept              
##  3       3 Restless and Wild                            2 Accept              
##  4       4 Let There Be Rock                            1 AC/DC               
##  5       5 Big Ones                                     3 Aerosmith           
##  6       6 Jagged Little Pill                           4 Alanis Morissette   
##  7       7 Facelift                                     5 Alice In Chains     
##  8       8 Warner 25 Anos                               6 Antônio Carlos Jobim
##  9       9 Plays Metallica By Four Cellos               7 Apocalyptica        
## 10      10 Audioslave                                   8 Audioslave          
## # ... with more rows

Left Join

left_join() menghasilkan semua baris pada table x, dan semua kolom pada x dan y, dimana untuk baris pada x yang tidak memiliki kesamaan dengan y diisi dengan nilai NA pada kolom yang baru. Pada latihan fungsi lest_join(), tabel yang akan digabungkan yaitu tabel albums dan tabel artists dimana ada kesamaan data pada dua tabel tersebut yaitu data ArtisId.

left_join(artists,albums)
## Joining, by = "ArtistId"
## # Source:   lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    ArtistId Name                 AlbumId Title                                
##       <int> <chr>                  <int> <chr>                                
##  1        1 AC/DC                      1 For Those About To Rock We Salute You
##  2        1 AC/DC                      4 Let There Be Rock                    
##  3        2 Accept                     2 Balls to the Wall                    
##  4        2 Accept                     3 Restless and Wild                    
##  5        3 Aerosmith                  5 Big Ones                             
##  6        4 Alanis Morissette          6 Jagged Little Pill                   
##  7        5 Alice In Chains            7 Facelift                             
##  8        6 Antônio Carlos Jobim       8 Warner 25 Anos                       
##  9        6 Antônio Carlos Jobim      34 Chill: Brazil (Disc 2)               
## 10        7 Apocalyptica               9 Plays Metallica By Four Cellos       
## # ... with more rows

Right Join

righ_join kebalikan dari left_join, menghasilkan semua baris pada table y, dan semua kolom pada x dan y, dimana untuk baris pada y yang tidak memiliki kesamaan dengan x diisi dengan nilai NA pada kolom yang baru.

right_join(artists, albums)
## Joining, by = "ArtistId"
## # Source:   lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    ArtistId Name                 AlbumId Title                                
##       <int> <chr>                  <int> <chr>                                
##  1        1 AC/DC                      1 For Those About To Rock We Salute You
##  2        2 Accept                     2 Balls to the Wall                    
##  3        2 Accept                     3 Restless and Wild                    
##  4        1 AC/DC                      4 Let There Be Rock                    
##  5        3 Aerosmith                  5 Big Ones                             
##  6        4 Alanis Morissette          6 Jagged Little Pill                   
##  7        5 Alice In Chains            7 Facelift                             
##  8        6 Antônio Carlos Jobim       8 Warner 25 Anos                       
##  9        7 Apocalyptica               9 Plays Metallica By Four Cellos       
## 10        8 Audioslave                10 Audioslave                           
## # ... with more rows

Full Join

full_join menghasilkan semua baris dan kolom dari x dan y. Jika ada nilai yang tidak sama (match) antara x dan y, maka akan bernilai NA.

full_join(artists, albums)
## Joining, by = "ArtistId"
## # Source:   lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    ArtistId Name                 AlbumId Title                                
##       <int> <chr>                  <int> <chr>                                
##  1        1 AC/DC                      1 For Those About To Rock We Salute You
##  2        1 AC/DC                      4 Let There Be Rock                    
##  3        2 Accept                     2 Balls to the Wall                    
##  4        2 Accept                     3 Restless and Wild                    
##  5        3 Aerosmith                  5 Big Ones                             
##  6        4 Alanis Morissette          6 Jagged Little Pill                   
##  7        5 Alice In Chains            7 Facelift                             
##  8        6 Antônio Carlos Jobim       8 Warner 25 Anos                       
##  9        6 Antônio Carlos Jobim      34 Chill: Brazil (Disc 2)               
## 10        7 Apocalyptica               9 Plays Metallica By Four Cellos       
## # ... with more rows

Semi Join

semi_join() menghasilkan semua baris pada table x yang memiliki kesamaan nilai dengan table y, dan semua kolom dari x. Ini mirip dengan inner_join(), bedanya inner_join() mengembalikan semua kolom dari x dan y.

semi_join(artists, albums)
## Joining, by = "ArtistId"
## # Source:   lazy query [?? x 2]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    ArtistId Name                
##       <int> <chr>               
##  1        1 AC/DC               
##  2        2 Accept              
##  3        3 Aerosmith           
##  4        4 Alanis Morissette   
##  5        5 Alice In Chains     
##  6        6 Antônio Carlos Jobim
##  7        7 Apocalyptica        
##  8        8 Audioslave          
##  9        9 BackBeat            
## 10       10 Billy Cobham        
## # ... with more rows

Anti Join

anti_join() menghasilkan semua baris dari x yang TIDAK memiliki kesamaan dengan y, dan semua kolom yang berasal dari x.

anti_join(artists, albums)
## Joining, by = "ArtistId"
## # Source:   lazy query [?? x 2]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
##    ArtistId Name                      
##       <int> <chr>                     
##  1       25 Milton Nascimento & Bebeto
##  2       26 Azymuth                   
##  3       28 João Gilberto             
##  4       29 Bebel Gilberto            
##  5       30 Jorge Vercilo             
##  6       31 Baby Consuelo             
##  7       32 Ney Matogrosso            
##  8       33 Luiz Melodia              
##  9       34 Nando Reis                
## 10       35 Pedro Luís & A Parede     
## # ... with more rows

Spasial dan Pemetaan

Beberapa Package yang akan digunakan:

library(sf)
## Warning: package 'sf' was built under R version 4.0.4
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
library(ggplot2)
library(tigris)
## Warning: package 'tigris' was built under R version 4.0.5
## To enable 
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
library(dplyr)

Import data SHP:

Admin3Kecamatan <- "D:/S2 Statistika/Sains Data STA581/Week3/Admin3Kecamatan/idn_admbnda_adm3_bps_20200401.shp"
glimpse(Admin3Kecamatan)
##  chr "D:/S2 Statistika/Sains Data STA581/Week3/Admin3Kecamatan/idn_admbnda_adm3_bps_20200401.shp"

Merubah menjadi fungsi ‘st_read’ dari package ‘sf’

Admin3<-st_read(Admin3Kecamatan)
## Reading layer `idn_admbnda_adm3_bps_20200401' from data source `D:\S2 Statistika\Sains Data STA581\Week3\Admin3Kecamatan\idn_admbnda_adm3_bps_20200401.shp' using driver `ESRI Shapefile'
## Simple feature collection with 7069 features and 16 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 95.01079 ymin: -11.00762 xmax: 141.0194 ymax: 6.07693
## geographic CRS: WGS 84
glimpse(Admin3)
## Rows: 7,069
## Columns: 17
## $ Shape_Leng <dbl> 0.2798656, 0.7514001, 0.6900061, 0.6483629, 0.2437073, 1.35~
## $ Shape_Area <dbl> 0.003107633, 0.016925540, 0.024636382, 0.010761277, 0.00116~
## $ ADM3_EN    <chr> "2 X 11 Enam Lingkung", "2 X 11 Kayu Tanam", "Abab", "Abang~
## $ ADM3_PCODE <chr> "ID1306050", "ID1306052", "ID1612030", "ID5107050", "ID7471~
## $ ADM3_REF   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ ADM3ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ ADM3ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ ADM2_EN    <chr> "Padang Pariaman", "Padang Pariaman", "Penukal Abab Lematan~
## $ ADM2_PCODE <chr> "ID1306", "ID1306", "ID1612", "ID5107", "ID7471", "ID9432",~
## $ ADM1_EN    <chr> "Sumatera Barat", "Sumatera Barat", "Sumatera Selatan", "Ba~
## $ ADM1_PCODE <chr> "ID13", "ID13", "ID16", "ID51", "ID74", "ID94", "ID94", "ID~
## $ ADM0_EN    <chr> "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indone~
## $ ADM0_PCODE <chr> "ID", "ID", "ID", "ID", "ID", "ID", "ID", "ID", "ID", "ID",~
## $ date       <date> 2019-12-20, 2019-12-20, 2019-12-20, 2019-12-20, 2019-12-20~
## $ validOn    <date> 2020-04-01, 2020-04-01, 2020-04-01, 2020-04-01, 2020-04-01~
## $ validTo    <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((100.2811 -0..., MULTIPOLYGON (~

Import data via file CSV

library(readr)
Jabar_Subang <- read_csv("D:/S2 Statistika/Sains Data STA581/Week3/Jabar dan Subang.csv")
## Rows: 630 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (5): ADM3_EN, ADM3_PCODE, ADM2_EN, ADM2_PCODE, ADM1_EN
## dbl (3): Shape_Leng, Shape_Area, DATA
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Jabar_Subang)
## # A tibble: 6 x 8
##   Shape_Leng Shape_Area ADM3_EN    ADM3_PCODE ADM2_EN ADM2_PCODE ADM1_EN    DATA
##        <dbl>      <dbl> <chr>      <chr>      <chr>   <chr>      <chr>     <dbl>
## 1      0.348    0.00522 Arjasari   ID3204150  Bandung ID3204     Jawa Bar~    NA
## 2      0.436    0.00328 Baleendah  ID3204140  Bandung ID3204     Jawa Bar~    NA
## 3      0.350    0.00318 Banjaran   ID3204160  Bandung ID3204     Jawa Bar~    NA
## 4      0.274    0.00229 Bojongsoa~ ID3204280  Bandung ID3204     Jawa Bar~    NA
## 5      0.274    0.00190 Cangkuang  ID3204161  Bandung ID3204     Jawa Bar~    NA
## 6      0.438    0.00356 Cicalengka ID3204100  Bandung ID3204     Jawa Bar~    NA

Merge Data

merged_subang <- geo_join(spatial_data=Admin3, data_frame=Jabar_Subang, by_sp="ADM3_PCODE",by_df="ADM3_PCODE", how = "inner")
## Warning: We recommend using the dplyr::*_join() family of functions instead.

Mengatur warna

mycol <- c("green", "yellow", "red", "red4")

Menampilkan peta

Peta di bawah ini merupakan peta provinsi Jawa Barat dimana wilayah Kab. Subang sudah diberi warna yang berbeda berdasarkan nilai Data IPM setiap kecamatan yang ada di Jawa Barat.