Code chunk

1. Load Required Modules

The modules on the below code chunk are required to execute all the code chunks on this markdown. If you haven’t installed these modules, please download and install them before you carry out code chunks.

require(rjson)
## Loading required package: rjson
require(tibble)
## Loading required package: tibble
require(RSQLite)
## Loading required package: RSQLite

2. Download YouBike 2.0 real-time data from API

The data used in this notebook is from API on data.gov.tw. The data format is JSON, and the size is 572.4 KB.

fileUrl <- "https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json"
download.file(fileUrl, 'YouBike_realtime.json')

file <- "YouBike_realtime.json"
json_data <- fromJSON(file = file)

3. Prepare the tables before appending them to RDBMS

The dataset includes the id, name, area, and address of the YouBike 2.0 stations in Taipei City, and also added DateTime and Date that the data updated.

Due to reducing the amount of data stored in RDBMS, I separated the JSON file into two data frames. I will explain the detail of the data frames next section.

sno <- c()
sna <- c()
tot <- c()
sbi <- c()
sarea <- c()
mday <- c()
lat <- c()
lng <- c()
ar <- c()
sareaen <- c()
snaen <- c()
aren <- c()
bemp <- c()
act <- c()
srcUpdateTime <- c()
updateTime <- c()
infoTime <- c()
infoDate <- c()

for (i in 1:length(json_data)) {
    sno <- append(sno, json_data[i][[1]]$sno)
    sna <- append(sna, json_data[i][[1]]$sna)
    tot <- append(tot, json_data[i][[1]]$tot)
    sbi <- append(sbi, json_data[i][[1]]$sbi)
    sarea <- append(sarea, json_data[i][[1]]$sarea)
    mday <- append(mday, json_data[i][[1]]$mday)
    lat <- append(lat, json_data[i][[1]]$lat)
    lng <- append(lng, json_data[i][[1]]$lng)
    ar <- append(ar, json_data[i][[1]]$ar)
    sareaen <- append(sareaen, json_data[i][[1]]$sareaen)
    snaen <- append(snaen, json_data[i][[1]]$snaen)
    aren <- append(aren, json_data[i][[1]]$aren)
    bemp <- append(bemp, json_data[i][[1]]$bemp)
    act <- append(act, json_data[i][[1]]$act)
    srcUpdateTime <- append(srcUpdateTime, json_data[i][[1]]$srcUpdateTime)
    updateTime <- append(updateTime, json_data[i][[1]]$updateTime)
    infoTime <- append(infoTime, json_data[i][[1]]$infoTime)
    infoDate <- append(infoDate, json_data[i][[1]]$infoDate)
}

station_df <- tibble(
    id = sno,
    station_name = sna,
    station_area = sarea,
    station_address = ar,
    station_name_en = snaen,
    station_area_en = sareaen,
    station_ar_en = aren,
    latitude = lat,
    longitude = lng,
    active = as.integer(act),
    total_parking_space = as.integer(tot)
)

state_df <- tibble(
    id = sno,
    active = act,
    number_of_bikes_at_the_station = as.integer(sbi),
    number_of_bikes_returnable = as.integer(bemp),
    station_update_time = mday,
    youbike_update_time = srcUpdateTime,
    transport_update_time = updateTime,
    info_datetime = infoTime,
    info_date = infoDate
)

station_df
## # A tibble: 1,240 × 11
##    id     stati…¹ stati…² stati…³ stati…⁴ stati…⁵ stati…⁶ latit…⁷ longi…⁸ active
##    <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>   <dbl>  <int>
##  1 50010… YouBik… 大安區  復興南… YouBik… Daan D… No.235…    25.0    122.      1
##  2 50010… YouBik… 大安區  復興南… YouBik… Daan D… No.273…    25.0    122.      1
##  3 50010… YouBik… 大安區  和平東… YouBik… Daan D… No. 7…     25.0    122.      1
##  4 50010… YouBik… 大安區  和平東… YouBik… Daan D… No. 33…    25.0    122.      1
##  5 50010… YouBik… 大安區  復興南… YouBik… Daan D… No. 36…    25.0    122.      1
##  6 50010… YouBik… 大安區  復興南… YouBik… Daan D… No. 28…    25.0    122.      1
##  7 50010… YouBik… 大安區  復興南… YouBik… Daan D… No. 34…    25.0    122.      1
##  8 50010… YouBik… 大安區  新生南… YouBik… Daan D… No. 52…    25.0    122.      1
##  9 50010… YouBik… 大安區  新生南… YouBik… Daan D… No. 66…    25.0    122.      1
## 10 50010… YouBik… 大安區  新生南… YouBik… Daan D… No. 82…    25.0    122.      1
## # … with 1,230 more rows, 1 more variable: total_parking_space <int>, and
## #   abbreviated variable names ¹​station_name, ²​station_area, ³​station_address,
## #   ⁴​station_name_en, ⁵​station_area_en, ⁶​station_ar_en, ⁷​latitude, ⁸​longitude
state_df
## # A tibble: 1,240 × 9
##    id        active number_of_…¹ numbe…² stati…³ youbi…⁴ trans…⁵ info_…⁶ info_…⁷
##    <chr>     <chr>         <int>   <int> <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 500101001 1                 2      26 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  2 500101002 1                12       9 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  3 500101003 1                 0      16 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  4 500101004 1                 1      10 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  5 500101005 1                 8       8 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  6 500101006 1                10       1 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  7 500101007 1                 1      10 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  8 500101008 1                 3      14 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  9 500101009 1                 7       9 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 10 500101010 1                 8      12 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## # … with 1,230 more rows, and abbreviated variable names
## #   ¹​number_of_bikes_at_the_station, ²​number_of_bikes_returnable,
## #   ³​station_update_time, ⁴​youbike_update_time, ⁵​transport_update_time,
## #   ⁶​info_datetime, ⁷​info_date

4. Append data to RDBMS

About Tables

As mentioned in the previous section, I separated the JSON file into two tables to store the data. Please reference the below tables to understand the structures of tables on RDBMS.

Table: station_info

Source: YouBike2.0臺北市公共自行車即時資訊
Column Name Description
id id of YouBike2.0 stations
station_name Station name in Chinese
station_area District name of Taipei City in Chinese
station_address Address of the stations in Chinese
station_name_en Station name in English
station_area_en District name of Taipei City in English
station_ar_en Address of the stations in English
latitude Latitude
longitude Longitude
active Active: 1, Inactive: 0
total_parking_space The total number of parking spaces in each station

Table: current_state

Source: YouBike2.0臺北市公共自行車即時資訊
Column Name Description
id id of YouBike2.0 stations
active Active: 1, Inactive: 0
number_of_bikes_at_the_station Number of bikes at the station
number_of_bikes_returnable Number of bikes returnable
station_update_time Updated time on each station of YouBike
youbike_update_time Updated time on the system of YouBike
transport_update_time Updated time on the system of the bureau of transportation
info_datetime Updated time on each station of YouBike
info_date Updated date on each station of YouBike

About RDBMS

I used SQLite as the default RDBMS in this notebook. If you want to use different RDBMS like MySQL, PostgreSQL and so on, you can look for the information on the official website of RStudio.

conn <- dbConnect(RSQLite::SQLite(), ':memory:')

is.exist <- FALSE
for(table.name in dbListTables(conn)) {
    if(table.name == 'station_info') {
        is.exist <- TRUE
        break
    }
}

if(!is.exist) {
    dbCreateTable(conn, 'station_info', station_df)
}

is.exist <- FALSE
for(table.name in dbListTables(conn)) {
    if(table.name == 'current_state') {
        is.exist <- TRUE
        break
    }
}

if(!is.exist) {
    dbCreateTable(conn, 'current_state', state_df)
}
 
dbWriteTable(conn, 'station_info', station_df, overwrite = TRUE)
dbWriteTable(conn, 'current_state', state_df, append = TRUE)

dbDisconnect(conn)