LBB Programming For Data Science with R

Introduction

This data was downloaded from Open Data Jakarta (https://data.jakarta.go.id/dataset). It lists the locations of water gates in Jakarta.

Reading the Data

# dplyr provides %>%, select(), and glimpse(), which are used below
library(dplyr)
water_gate <- read.csv("Data-Pintu-Air-2021.csv")
head(water_gate)

##               nama sistem_aliran wilayah bidang  latitude longitude
## 1    Bali Matraman Aliran Tengah Selatan     NA -6.218639  106.8527
## 2       Bulak Cabe  Aliran Timur   Utara     NA -6.106905  106.9405
## 3     Cakung Drain  Aliran Timur   Timur     NA -6.183501  106.9290
## 4 Cengkareng Drain  Aliran Barat   Barat     NA -6.154056  106.7478
## 5           Cideng Aliran Tengah   Pusat     NA -6.171635  106.8110
## 6        Citraland  Aliran Barat   Barat     NA -6.167454  106.7881
##   batas_siaga1 batas_siaga2 batas_siaga3 batas_siaga4
## 1                                                    
## 2                                                    
## 3                                                    
## 4                                                    
## 5                                                    
## 6

rmarkdown::paged_table(water_gate)
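
The `batas_siaga` columns in the preview above print as blanks rather than `NA`. As a minimal sketch, assuming the file stores missing values as empty fields, the `na.strings` argument of `read.csv()` can map them to `NA` at read time. The inline CSV text and the name `sample_gate` are made up for illustration and do not come from the real file.

```r
# Hypothetical two-row sample standing in for Data-Pintu-Air-2021.csv
csv_text <- "nama,wilayah,bidang,batas_siaga1
Bali Matraman,Selatan,NA,
Bulak Cabe,Utara,NA,"

# Treat both empty fields and the literal string "NA" as missing
sample_gate <- read.csv(text = csv_text, na.strings = c("", "NA"))

# Count missing values per column
na_counts <- colSums(is.na(sample_gate))
na_counts
```

With blanks read as `NA`, a single `is.na()` check covers both kinds of missing value.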

Drop the columns that contain only NA or blank values, using the pipe operator (%>%):

water_clean <- water_gate %>% 
  select(-c(bidang, batas_siaga1, batas_siaga2, batas_siaga3, batas_siaga4))
head(water_clean)

##               nama sistem_aliran wilayah  latitude longitude
## 1    Bali Matraman Aliran Tengah Selatan -6.218639  106.8527
## 2       Bulak Cabe  Aliran Timur   Utara -6.106905  106.9405
## 3     Cakung Drain  Aliran Timur   Timur -6.183501  106.9290
## 4 Cengkareng Drain  Aliran Barat   Barat -6.154056  106.7478
## 5           Cideng Aliran Tengah   Pusat -6.171635  106.8110
## 6        Citraland  Aliran Barat   Barat -6.167454  106.7881
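
The empty columns above were dropped by name. As a rough sketch of an alternative, they could also be detected programmatically. The example uses base R only and a small made-up data frame (`sample_gate`), so the column contents are assumptions rather than the real data.

```r
# Hypothetical sample with one all-NA column and one all-blank column
sample_gate <- data.frame(
  nama = c("Bali Matraman", "Bulak Cabe"),
  bidang = c(NA, NA),
  batas_siaga1 = c("", ""),
  latitude = c(-6.218639, -6.106905),
  stringsAsFactors = FALSE
)

# TRUE for a column whose every value is NA or an empty string
is_empty_col <- vapply(
  sample_gate,
  function(col) all(is.na(col) | as.character(col) == ""),
  logical(1)
)

# Keep only the columns that carry data
sample_clean <- sample_gate[, !is_empty_col, drop = FALSE]
names(sample_clean)
```

Detecting empty columns this way avoids hard-coding names, at the cost of silently dropping any column that happens to be empty in a given extract.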

Next, check the structure of the data frame with dim() and glimpse():

dim(water_clean)

## [1] 35  5

glimpse(water_clean)

## Rows: 35
## Columns: 5
## $ nama          <chr> "Bali Matraman", "Bulak Cabe", "Cakung Drain", "Cengkare~
## $ sistem_aliran <chr> "Aliran Tengah", "Aliran Timur", "Aliran Timur", "Aliran~
## $ wilayah       <chr> "Selatan", "Utara", "Timur", "Barat", "Pusat", "Barat", ~
## $ latitude      <dbl> -6.218639, -6.106905, -6.183501, -6.154056, -6.171635, -~
## $ longitude     <dbl> 106.8527, 106.9405, 106.9290, 106.7478, 106.8110, 106.78~

Change types of Data

Next, convert the sistem_aliran and wilayah columns from character to factor, since both hold a small set of repeating categories:

water_clean$sistem_aliran <- as.factor(water_clean$sistem_aliran)
water_clean$wilayah <- as.factor(water_clean$wilayah)
head(water_clean)

##               nama sistem_aliran wilayah  latitude longitude
## 1    Bali Matraman Aliran Tengah Selatan -6.218639  106.8527
## 2       Bulak Cabe  Aliran Timur   Utara -6.106905  106.9405
## 3     Cakung Drain  Aliran Timur   Timur -6.183501  106.9290
## 4 Cengkareng Drain  Aliran Barat   Barat -6.154056  106.7478
## 5           Cideng Aliran Tengah   Pusat -6.171635  106.8110
## 6        Citraland  Aliran Barat   Barat -6.167454  106.7881
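
The output of head() looks the same before and after the conversion, so it does not confirm that the types changed. A small sketch, using the first three rows shown above as a stand-in data frame (`sample_clean` and `factor_cols` are illustrative names), converts several columns in one step and then inspects the result:

```r
# First three rows of the cleaned data, retyped by hand for illustration
sample_clean <- data.frame(
  nama = c("Bali Matraman", "Bulak Cabe", "Cakung Drain"),
  sistem_aliran = c("Aliran Tengah", "Aliran Timur", "Aliran Timur"),
  wilayah = c("Selatan", "Utara", "Timur"),
  stringsAsFactors = FALSE
)

# Convert every listed column to factor in a single assignment
factor_cols <- c("sistem_aliran", "wilayah")
sample_clean[factor_cols] <- lapply(sample_clean[factor_cols], as.factor)

# Inspect the class of each column and the levels of one factor
sapply(sample_clean, class)
levels(sample_clean$sistem_aliran)
```

Checking `class()` or `levels()` makes the effect of `as.factor()` visible, which `head()` alone does not.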

Next, list the column names of the data frame:

names(water_clean)

## [1] "nama"          "sistem_aliran" "wilayah"       "latitude"     
## [5] "longitude"

The data set contains the following columns:

nama (character): the name of each water gate; every value is unique.
sistem_aliran (factor): the flow system the water gate belongs to (for example, Aliran Barat); values repeat across gates.
wilayah (factor): the region where the water gate is located; values repeat across gates.
latitude (numeric): the latitude of the water gate's location.
longitude (numeric): the longitude of the water gate's location.
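
The claim that nama never repeats while the factor columns do can be checked directly. The sketch below uses a few values taken from the head() output above; the vector names are illustrative only.

```r
# Values copied from the first rows of the data
sample_names <- c("Bali Matraman", "Bulak Cabe", "Cakung Drain", "Cengkareng Drain")
sample_wilayah <- c("Selatan", "Utara", "Timur", "Barat", "Pusat", "Barat")

# anyDuplicated() returns 0 when every value is unique,
# otherwise the index of the first duplicate
names_unique <- anyDuplicated(sample_names) == 0
wilayah_repeats <- anyDuplicated(sample_wilayah) > 0

names_unique
wilayah_repeats
```

On the full data frame, the same check would be `anyDuplicated(water_clean$nama) == 0`.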

Data Analysis

Check the total number of rows in the data frame:

nrow(water_clean)

## [1] 35

Check the statistical summary of the data set (minimum, quartiles, mean, and maximum for numeric columns; counts per level for factor columns):

summary(water_clean)

##      nama                 sistem_aliran    wilayah      latitude     
##  Length:35          Aliran Barat :10    Barat  : 9   Min.   :-6.288  
##  Class :character   Aliran Tengah:11    Pusat  : 6   1st Qu.:-6.190  
##  Mode  :character   Aliran Timur :14    Selatan: 3   Median :-6.166  
##                                         Timur  : 5   Mean   :-6.170  
##                                         Utara  :12   3rd Qu.:-6.144  
##                                                      Max.   :-6.107  
##    longitude    
##  Min.   :106.7  
##  1st Qu.:106.8  
##  Median :106.8  
##  Mean   :106.8  
##  3rd Qu.:106.9  
##  Max.   :106.9
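
summary() reports the count for each factor separately. To see how regions and flow systems combine, a cross-tabulation with table() is one option. This sketch uses only the six rows printed by head() earlier, so its counts describe that sample and not the full 35 rows.

```r
# The six rows shown by head(), retyped for illustration
sample_clean <- data.frame(
  sistem_aliran = factor(c("Aliran Tengah", "Aliran Timur", "Aliran Timur",
                           "Aliran Barat", "Aliran Tengah", "Aliran Barat")),
  wilayah = factor(c("Selatan", "Utara", "Timur", "Barat", "Pusat", "Barat"))
)

# Number of water gates per region
region_counts <- table(sample_clean$wilayah)
region_counts

# Regions (rows) against flow systems (columns)
cross_tab <- table(sample_clean$wilayah, sample_clean$sistem_aliran)
cross_tab
```

Applied to water_clean, the same two calls would show which flow system serves each region.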

Visualize the water gate location

# install.packages("leaflet")
library(leaflet)

# create the marker icon
ico <- makeIcon(
    iconUrl = "water-polo.png",
    iconWidth= 20, iconHeight=20
)

# create the leaflet() object, similar to starting a plot with ggplot()
map1 <- leaflet()

# add the map tiles (the base map image)
map1 <- addTiles(map1)

# build the popup content using HTML markup
content_popup <- paste(sep = " ",
                 "Wilayah:", water_clean$wilayah, "<br>",
                 "Sistem Aliran:", water_clean$sistem_aliran, "<br>",
                 "Nama:", water_clean$nama
                 )

# add one marker per row of the data
map1 <- addMarkers(map = map1, 
                   lng =  water_clean$longitude, # longitude
                   lat = water_clean$latitude, # latitude
                   icon = ico, 
                   
                   popup = content_popup, # popup text
                   
                   clusterOptions = markerClusterOptions() # cluster markers so they do not overlap
                   )
map1
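
Markers with missing or out-of-range coordinates will not appear where expected on the map, so a quick check before plotting can help. The following is a sketch under the assumption that only basic range validation is wanted; it uses three coordinate pairs from the head() output and the illustrative names `sample_coords` and `valid_coord`.

```r
# Three coordinate pairs copied from the first rows of the data
sample_coords <- data.frame(
  latitude = c(-6.218639, -6.106905, -6.183501),
  longitude = c(106.8527, 106.9405, 106.9290)
)

# A coordinate is valid when it is present and within the global ranges
valid_coord <- with(
  sample_coords,
  !is.na(latitude) & !is.na(longitude) &
    latitude >= -90 & latitude <= 90 &
    longitude >= -180 & longitude <= 180
)

# TRUE when every row can be plotted
all(valid_coord)

# Rows that would need attention (none in this sample)
invalid_rows <- sample_coords[!valid_coord, ]
invalid_rows
```

For this data set a tighter bounding box around Jakarta could be used instead of the global ranges, but the exact limits would need to be chosen by the analyst.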

DVLBB2

Widya Rahma

1/30/2022
