Tugas Praktikum 03 Sains Data
Combining Multiple Tables From Database with dplyr in R
Package yang digunakan pada latihan ini adalah sebagai beikut
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.1.0 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.4
## Warning: package 'tibble' was built under R version 4.0.4
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'readr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## Warning: package 'forcats' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'RSQLite' was built under R version 4.0.5
## Warning: package 'DBI' was built under R version 4.0.5
Impor Data
Langkah pertama adalah membuat koneksi terhdap database pada R software, Data yang digunakan dalam latihan ini adalah data chinook.
## [1] "SQLiteConnection"
## attr(,"package")
## [1] "RSQLite"
Terdapat 14 tabel pada database tersebut.
## [1] "albums" "artists" "customers" "employees"
## [5] "genres" "invoice_items" "invoices" "media_types"
## [9] "playlist_track" "playlists" "sqlite_sequence" "sqlite_stat1"
## [13] "tracks"
Inner Join
inner_join() menghasilkan semua baris pada table x yang memiliki kesamaan nilai dengan table y, dan semua kolom dari x dan y. Pada latihan fungsi inner_join(), tabel yang akan digabungkan yaitu tabel albums dan tabel artists dimana ada kesamaan data pada dua tabel tersebut yaitu data ArtisId.
## # Source: table<albums> [?? x 3]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## AlbumId Title ArtistId
## <int> <chr> <int>
## 1 1 For Those About To Rock We Salute You 1
## 2 2 Balls to the Wall 2
## 3 3 Restless and Wild 2
## 4 4 Let There Be Rock 1
## 5 5 Big Ones 3
## 6 6 Jagged Little Pill 4
## 7 7 Facelift 5
## 8 8 Warner 25 Anos 6
## 9 9 Plays Metallica By Four Cellos 7
## 10 10 Audioslave 8
## # ... with more rows
## # Source: table<artists> [?? x 2]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## ArtistId Name
## <int> <chr>
## 1 1 AC/DC
## 2 2 Accept
## 3 3 Aerosmith
## 4 4 Alanis Morissette
## 5 5 Alice In Chains
## 6 6 Antônio Carlos Jobim
## 7 7 Apocalyptica
## 8 8 Audioslave
## 9 9 BackBeat
## 10 10 Billy Cobham
## # ... with more rows
## Joining, by = "ArtistId"
## # Source: lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## AlbumId Title ArtistId Name
## <int> <chr> <int> <chr>
## 1 1 For Those About To Rock We Salute You 1 AC/DC
## 2 2 Balls to the Wall 2 Accept
## 3 3 Restless and Wild 2 Accept
## 4 4 Let There Be Rock 1 AC/DC
## 5 5 Big Ones 3 Aerosmith
## 6 6 Jagged Little Pill 4 Alanis Morissette
## 7 7 Facelift 5 Alice In Chains
## 8 8 Warner 25 Anos 6 Antônio Carlos Jobim
## 9 9 Plays Metallica By Four Cellos 7 Apocalyptica
## 10 10 Audioslave 8 Audioslave
## # ... with more rows
Left Join
left_join() menghasilkan semua baris pada table x, dan semua kolom pada x dan y, dimana untuk baris pada x yang tidak memiliki kesamaan dengan y diisi dengan nilai NA pada kolom yang baru. Pada latihan fungsi lest_join(), tabel yang akan digabungkan yaitu tabel albums dan tabel artists dimana ada kesamaan data pada dua tabel tersebut yaitu data ArtisId.
## Joining, by = "ArtistId"
## # Source: lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## ArtistId Name AlbumId Title
## <int> <chr> <int> <chr>
## 1 1 AC/DC 1 For Those About To Rock We Salute You
## 2 1 AC/DC 4 Let There Be Rock
## 3 2 Accept 2 Balls to the Wall
## 4 2 Accept 3 Restless and Wild
## 5 3 Aerosmith 5 Big Ones
## 6 4 Alanis Morissette 6 Jagged Little Pill
## 7 5 Alice In Chains 7 Facelift
## 8 6 Antônio Carlos Jobim 8 Warner 25 Anos
## 9 6 Antônio Carlos Jobim 34 Chill: Brazil (Disc 2)
## 10 7 Apocalyptica 9 Plays Metallica By Four Cellos
## # ... with more rows
Right Join
righ_join kebalikan dari left_join, menghasilkan semua baris pada table y, dan semua kolom pada x dan y, dimana untuk baris pada y yang tidak memiliki kesamaan dengan x diisi dengan nilai NA pada kolom yang baru.
## Joining, by = "ArtistId"
## # Source: lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## ArtistId Name AlbumId Title
## <int> <chr> <int> <chr>
## 1 1 AC/DC 1 For Those About To Rock We Salute You
## 2 2 Accept 2 Balls to the Wall
## 3 2 Accept 3 Restless and Wild
## 4 1 AC/DC 4 Let There Be Rock
## 5 3 Aerosmith 5 Big Ones
## 6 4 Alanis Morissette 6 Jagged Little Pill
## 7 5 Alice In Chains 7 Facelift
## 8 6 Antônio Carlos Jobim 8 Warner 25 Anos
## 9 7 Apocalyptica 9 Plays Metallica By Four Cellos
## 10 8 Audioslave 10 Audioslave
## # ... with more rows
Full Join
full_join menghasilkan semua baris dan kolom dari x dan y. Jika ada nilai yang tidak sama (match) antara x dan y, maka akan bernilai NA.
## Joining, by = "ArtistId"
## # Source: lazy query [?? x 4]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## ArtistId Name AlbumId Title
## <int> <chr> <int> <chr>
## 1 1 AC/DC 1 For Those About To Rock We Salute You
## 2 1 AC/DC 4 Let There Be Rock
## 3 2 Accept 2 Balls to the Wall
## 4 2 Accept 3 Restless and Wild
## 5 3 Aerosmith 5 Big Ones
## 6 4 Alanis Morissette 6 Jagged Little Pill
## 7 5 Alice In Chains 7 Facelift
## 8 6 Antônio Carlos Jobim 8 Warner 25 Anos
## 9 6 Antônio Carlos Jobim 34 Chill: Brazil (Disc 2)
## 10 7 Apocalyptica 9 Plays Metallica By Four Cellos
## # ... with more rows
Semi Join
semi_join() menghasilkan semua baris pada table x yang memiliki kesamaan nilai dengan table y, dan semua kolom dari x. Ini mirip dengan inner_join(), bedanya inner_join() mengembalikan semua kolom dari x dan y.
## Joining, by = "ArtistId"
## # Source: lazy query [?? x 2]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## ArtistId Name
## <int> <chr>
## 1 1 AC/DC
## 2 2 Accept
## 3 3 Aerosmith
## 4 4 Alanis Morissette
## 5 5 Alice In Chains
## 6 6 Antônio Carlos Jobim
## 7 7 Apocalyptica
## 8 8 Audioslave
## 9 9 BackBeat
## 10 10 Billy Cobham
## # ... with more rows
Anti Join
anti_join() menghasilkan semua baris dari x yang TIDAK memiliki kesamaan dengan y, dan semua kolom yang berasal dari x.
## Joining, by = "ArtistId"
## # Source: lazy query [?? x 2]
## # Database: sqlite 3.36.0 [C:\sqlite\db\chinook.db]
## ArtistId Name
## <int> <chr>
## 1 25 Milton Nascimento & Bebeto
## 2 26 Azymuth
## 3 28 João Gilberto
## 4 29 Bebel Gilberto
## 5 30 Jorge Vercilo
## 6 31 Baby Consuelo
## 7 32 Ney Matogrosso
## 8 33 Luiz Melodia
## 9 34 Nando Reis
## 10 35 Pedro LuÃs & A Parede
## # ... with more rows
Spasial dan Pemetaan
Beberapa Package yang akan digunakan:
## Warning: package 'sf' was built under R version 4.0.4
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## Warning: package 'tigris' was built under R version 4.0.5
## To enable
## caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.
Import data SHP:
Admin3Kecamatan <- "D:/S2 Statistika/Sains Data STA581/Week3/Admin3Kecamatan/idn_admbnda_adm3_bps_20200401.shp"
glimpse(Admin3Kecamatan)## chr "D:/S2 Statistika/Sains Data STA581/Week3/Admin3Kecamatan/idn_admbnda_adm3_bps_20200401.shp"
Merubah menjadi fungsi ‘st_read’ dari package ‘sf’
## Reading layer `idn_admbnda_adm3_bps_20200401' from data source `D:\S2 Statistika\Sains Data STA581\Week3\Admin3Kecamatan\idn_admbnda_adm3_bps_20200401.shp' using driver `ESRI Shapefile'
## Simple feature collection with 7069 features and 16 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 95.01079 ymin: -11.00762 xmax: 141.0194 ymax: 6.07693
## geographic CRS: WGS 84
## Rows: 7,069
## Columns: 17
## $ Shape_Leng <dbl> 0.2798656, 0.7514001, 0.6900061, 0.6483629, 0.2437073, 1.35~
## $ Shape_Area <dbl> 0.003107633, 0.016925540, 0.024636382, 0.010761277, 0.00116~
## $ ADM3_EN <chr> "2 X 11 Enam Lingkung", "2 X 11 Kayu Tanam", "Abab", "Abang~
## $ ADM3_PCODE <chr> "ID1306050", "ID1306052", "ID1612030", "ID5107050", "ID7471~
## $ ADM3_REF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ ADM3ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ ADM3ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ ADM2_EN <chr> "Padang Pariaman", "Padang Pariaman", "Penukal Abab Lematan~
## $ ADM2_PCODE <chr> "ID1306", "ID1306", "ID1612", "ID5107", "ID7471", "ID9432",~
## $ ADM1_EN <chr> "Sumatera Barat", "Sumatera Barat", "Sumatera Selatan", "Ba~
## $ ADM1_PCODE <chr> "ID13", "ID13", "ID16", "ID51", "ID74", "ID94", "ID94", "ID~
## $ ADM0_EN <chr> "Indonesia", "Indonesia", "Indonesia", "Indonesia", "Indone~
## $ ADM0_PCODE <chr> "ID", "ID", "ID", "ID", "ID", "ID", "ID", "ID", "ID", "ID",~
## $ date <date> 2019-12-20, 2019-12-20, 2019-12-20, 2019-12-20, 2019-12-20~
## $ validOn <date> 2020-04-01, 2020-04-01, 2020-04-01, 2020-04-01, 2020-04-01~
## $ validTo <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((100.2811 -0..., MULTIPOLYGON (~
Import data via file CSV
library(readr)
Jabar_Subang <- read_csv("D:/S2 Statistika/Sains Data STA581/Week3/Jabar dan Subang.csv")## Rows: 630 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (5): ADM3_EN, ADM3_PCODE, ADM2_EN, ADM2_PCODE, ADM1_EN
## dbl (3): Shape_Leng, Shape_Area, DATA
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 6 x 8
## Shape_Leng Shape_Area ADM3_EN ADM3_PCODE ADM2_EN ADM2_PCODE ADM1_EN DATA
## <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 0.348 0.00522 Arjasari ID3204150 Bandung ID3204 Jawa Bar~ NA
## 2 0.436 0.00328 Baleendah ID3204140 Bandung ID3204 Jawa Bar~ NA
## 3 0.350 0.00318 Banjaran ID3204160 Bandung ID3204 Jawa Bar~ NA
## 4 0.274 0.00229 Bojongsoa~ ID3204280 Bandung ID3204 Jawa Bar~ NA
## 5 0.274 0.00190 Cangkuang ID3204161 Bandung ID3204 Jawa Bar~ NA
## 6 0.438 0.00356 Cicalengka ID3204100 Bandung ID3204 Jawa Bar~ NA
Merge Data
merged_subang <- geo_join(spatial_data=Admin3, data_frame=Jabar_Subang, by_sp="ADM3_PCODE",by_df="ADM3_PCODE", how = "inner")## Warning: We recommend using the dplyr::*_join() family of functions instead.
Menampilkan peta
Peta di bawah ini merupakan peta provinsi Jawa Barat dimana wilayah Kab. Subang sudah diberi warna yang berbeda berdasarkan nilai Data IPM setiap kecamatan yang ada di Jawa Barat.