LBB Programming For Data Science with R

Introduction

This data was downloaded from Open Data Jakarta (https://data.jakarta.go.id/dataset). It gives information about the locations of water gates in Jakarta.

Reading the Data

library(dplyr) # provides %>%, select(), and glimpse() used below

water_gate <- read.csv("Data-Pintu-Air-2021.csv")
head(water_gate)
##               nama sistem_aliran wilayah bidang  latitude longitude
## 1    Bali Matraman Aliran Tengah Selatan     NA -6.218639  106.8527
## 2       Bulak Cabe  Aliran Timur   Utara     NA -6.106905  106.9405
## 3     Cakung Drain  Aliran Timur   Timur     NA -6.183501  106.9290
## 4 Cengkareng Drain  Aliran Barat   Barat     NA -6.154056  106.7478
## 5           Cideng Aliran Tengah   Pusat     NA -6.171635  106.8110
## 6        Citraland  Aliran Barat   Barat     NA -6.167454  106.7881
##   batas_siaga1 batas_siaga2 batas_siaga3 batas_siaga4
## 1                                                    
## 2                                                    
## 3                                                    
## 4                                                    
## 5                                                    
## 6
rmarkdown::paged_table(water_gate)
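
The blank cells in the batas_siaga columns are read as empty strings rather than NA. As a minimal sketch, assuming we want blanks treated as missing values at read time, the na.strings argument of read.csv() can be used. The inline csv_text below is a hypothetical stand-in for the real file, not its actual contents.

```r
# Hypothetical stand-in for a few columns of the CSV file
csv_text <- "nama,bidang,batas_siaga1
Bali Matraman,NA,
Bulak Cabe,NA,"

# Treat both empty fields and the literal string "NA" as missing
toy <- read.csv(text = csv_text,
                na.strings = c("", "NA"),
                stringsAsFactors = FALSE)

# Number of missing values per column
colSums(is.na(toy))
```

With this option, empty columns show up in is.na() checks the same way as the bidang column does.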

Remove the columns that contain only NA or blank values, using the pipe operator (%>%) together with select()

water_clean <- water_gate %>% 
  select(-c(bidang, batas_siaga1, batas_siaga2, batas_siaga3, batas_siaga4))
head(water_clean)
##               nama sistem_aliran wilayah  latitude longitude
## 1    Bali Matraman Aliran Tengah Selatan -6.218639  106.8527
## 2       Bulak Cabe  Aliran Timur   Utara -6.106905  106.9405
## 3     Cakung Drain  Aliran Timur   Timur -6.183501  106.9290
## 4 Cengkareng Drain  Aliran Barat   Barat -6.154056  106.7478
## 5           Cideng Aliran Tengah   Pusat -6.171635  106.8110
## 6        Citraland  Aliran Barat   Barat -6.167454  106.7881
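
The columns dropped above were picked by looking at the output of head(). A rough sketch of how the same columns could be found programmatically is shown here. The data frame toy and the helper vector is_empty_col are hypothetical names used only for illustration, and toy is a small stand-in, not the real dataset.

```r
# Hypothetical stand-in with one NA-only and one blank-only column
toy <- data.frame(
  nama         = c("Bali Matraman", "Bulak Cabe"),
  bidang       = c(NA, NA),
  latitude     = c(-6.218639, -6.106905),
  batas_siaga1 = c("", ""),
  stringsAsFactors = FALSE
)

# A column counts as empty when every value is NA or an empty string
is_empty_col <- vapply(toy, function(x) all(is.na(x) | x == ""), logical(1))

# Keep only the columns that carry information
toy_clean <- toy[, !is_empty_col, drop = FALSE]
names(toy_clean)
```

This approach avoids typing each column name by hand, which helps when the number of empty columns is large.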

Now we check the dimensions and structure of the data frame using dim() and glimpse()

dim(water_clean)
## [1] 35  5
glimpse(water_clean)
## Rows: 35
## Columns: 5
## $ nama          <chr> "Bali Matraman", "Bulak Cabe", "Cakung Drain", "Cengkare~
## $ sistem_aliran <chr> "Aliran Tengah", "Aliran Timur", "Aliran Timur", "Aliran~
## $ wilayah       <chr> "Selatan", "Utara", "Timur", "Barat", "Pusat", "Barat", ~
## $ latitude      <dbl> -6.218639, -6.106905, -6.183501, -6.154056, -6.171635, -~
## $ longitude     <dbl> 106.8527, 106.9405, 106.9290, 106.7478, 106.8110, 106.78~

Change Data Types

Next, we convert the categorical columns (sistem_aliran and wilayah) from character to factor

water_clean$sistem_aliran <- as.factor(water_clean$sistem_aliran)
water_clean$wilayah <- as.factor(water_clean$wilayah)
head(water_clean)
##               nama sistem_aliran wilayah  latitude longitude
## 1    Bali Matraman Aliran Tengah Selatan -6.218639  106.8527
## 2       Bulak Cabe  Aliran Timur   Utara -6.106905  106.9405
## 3     Cakung Drain  Aliran Timur   Timur -6.183501  106.9290
## 4 Cengkareng Drain  Aliran Barat   Barat -6.154056  106.7478
## 5           Cideng Aliran Tengah   Pusat -6.171635  106.8110
## 6        Citraland  Aliran Barat   Barat -6.167454  106.7881
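
The output of head() looks the same before and after the conversion, so it does not confirm that the types changed. A small sketch of converting several columns at once and inspecting the resulting levels is shown below. The data frame toy is a hypothetical stand-in built from the first rows shown above, and cols is an illustrative name.

```r
# Hypothetical stand-in using values from the first rows of the data
toy <- data.frame(
  sistem_aliran = c("Aliran Tengah", "Aliran Timur", "Aliran Timur"),
  wilayah       = c("Selatan", "Utara", "Timur"),
  stringsAsFactors = FALSE
)

# Convert all listed columns to factor in one step
cols <- c("sistem_aliran", "wilayah")
toy[cols] <- lapply(toy[cols], as.factor)

# Check the result: column classes and the distinct levels
sapply(toy, class)
levels(toy$sistem_aliran)
nlevels(toy$wilayah)
```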

Now we want to know the names of the columns in the data frame

names(water_clean)
## [1] "nama"          "sistem_aliran" "wilayah"       "latitude"     
## [5] "longitude"

The data set includes the following information:

  • nama (character): the name of each water gate; every value is unique
  • sistem_aliran (factor): the flow system the water gate belongs to (Aliran Barat, Aliran Tengah, or Aliran Timur); values repeat
  • wilayah (factor): the region where the water gate is located; values repeat
  • latitude (numeric): the latitude of the location
  • longitude (numeric): the longitude of the location
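
The statements about unique and repeating values in the list above can be checked with a few base R calls. This is only a sketch on hypothetical stand-in vectors (toy_names and toy_flow); on the real data the same calls would be applied to water_clean$nama and water_clean$sistem_aliran.

```r
# Hypothetical stand-ins for the nama and sistem_aliran columns
toy_names <- c("Bali Matraman", "Bulak Cabe", "Cakung Drain")
toy_flow  <- c("Aliran Tengah", "Aliran Timur", "Aliran Timur")

# anyDuplicated() returns 0 when no value repeats
names_unique <- anyDuplicated(toy_names) == 0
flow_unique  <- anyDuplicated(toy_flow) == 0

names_unique
flow_unique

# How often each value occurs
table(toy_flow)
```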

Data Analysis

Check the total number of rows (water gates) in the data frame

nrow(water_clean)
## [1] 35

Check the statistical summary of the dataset (minimum, quartiles, mean, and maximum for numeric columns; counts per level for factors)

summary(water_clean)
##      nama                 sistem_aliran    wilayah      latitude     
##  Length:35          Aliran Barat :10    Barat  : 9   Min.   :-6.288  
##  Class :character   Aliran Tengah:11    Pusat  : 6   1st Qu.:-6.190  
##  Mode  :character   Aliran Timur :14    Selatan: 3   Median :-6.166  
##                                         Timur  : 5   Mean   :-6.170  
##                                         Utara  :12   3rd Qu.:-6.144  
##                                                      Max.   :-6.107  
##    longitude    
##  Min.   :106.7  
##  1st Qu.:106.8  
##  Median :106.8  
##  Mean   :106.8  
##  3rd Qu.:106.9  
##  Max.   :106.9
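
The summary reports each factor separately. As a possible next step, a cross-tabulation shows how the flow systems are spread across regions. The sketch below uses only the six rows printed by head() earlier, stored in a hypothetical data frame toy, so the counts are not those of the full dataset.

```r
# The six rows shown by head(), used as a small stand-in
toy <- data.frame(
  sistem_aliran = c("Aliran Tengah", "Aliran Timur", "Aliran Timur",
                    "Aliran Barat", "Aliran Tengah", "Aliran Barat"),
  wilayah       = c("Selatan", "Utara", "Timur", "Barat", "Pusat", "Barat"),
  latitude      = c(-6.218639, -6.106905, -6.183501,
                    -6.154056, -6.171635, -6.167454)
)

# Count water gates for every combination of flow system and region
xtab <- table(toy$sistem_aliran, toy$wilayah)
xtab

# Southernmost and northernmost latitude in the stand-in data
lat_range <- range(toy$latitude)
lat_range
```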

Visualize the Water Gate Locations

# install.packages("leaflet")
library(leaflet)

# create the marker icon
ico <- makeIcon(
    iconUrl = "water-polo.png",
    iconWidth = 20, iconHeight = 20
)

# create the leaflet() object, similar to starting a plot with ggplot()
map1 <- leaflet()

# add the tiles (the base map image)
map1 <- addTiles(map1)

# build the popup content using HTML markup
content_popup <- paste(sep = " ",
                 "Wilayah:", water_clean$wilayah, "<br>",
                 "Sistem Aliran:", water_clean$sistem_aliran, "<br>",
                 "Nama:", water_clean$nama
                 )

# add one marker per row of the data
map1 <- addMarkers(map = map1,
                   lng = water_clean$longitude, # longitude
                   lat = water_clean$latitude,  # latitude
                   icon = ico,
                   popup = content_popup, # popup text
                   clusterOptions = markerClusterOptions() # cluster markers so they do not overlap
                   )
map1
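
The same kind of map can also be written as one pipe chain, passing the data frame to leaflet() and referring to columns with the formula interface (~column). This is an alternative sketch, not the code used for the map above. It uses a hypothetical three-row data frame toy taken from the first rows of the data, the default marker instead of the custom icon, and the illustrative name map2.

```r
library(leaflet)

# Hypothetical stand-in built from the first three rows of the data
toy <- data.frame(
  nama      = c("Bali Matraman", "Bulak Cabe", "Cakung Drain"),
  latitude  = c(-6.218639, -6.106905, -6.183501),
  longitude = c(106.8527, 106.9405, 106.9290)
)

# Build the map in one chain; ~column looks the column up in `toy`
map2 <- leaflet(data = toy) %>%
  addTiles() %>%
  addMarkers(lng = ~longitude,
             lat = ~latitude,
             popup = ~nama,
             clusterOptions = markerClusterOptions())

map2
```

Passing the data once to leaflet() avoids repeating the data frame name in every argument.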