The way to collect the realtime data of YouBike2.0 in Taipei City on data.gov.tw

Code chunk

1. Load Required Modules

The modules on the below code chunk are required to execute all the code chunks on this markdown. If you haven’t installed these modules, please download and install them before you carry out code chunks.

require(rjson)

## Loading required package: rjson

require(tibble)

## Loading required package: tibble

require(RSQLite)

## Loading required package: RSQLite

2. Download YouBike 2.0 real-time data from API

The data used in this notebook is from API on data.gov.tw. The data format is JSON, and the size is 572.4 KB.

fileUrl <- "https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json"
download.file(fileUrl, 'YouBike_realtime.json')

file <- "YouBike_realtime.json"
json_data <- fromJSON(file = file)

3. Prepare the tables before appending them to RDBMS

The dataset includes the id, name, area, and address of the YouBike 2.0 stations in Taipei City, and also added DateTime and Date that the data updated.

Due to reducing the amount of data stored in RDBMS, I separated the JSON file into two data frames. I will explain the detail of the data frames next section.

sno <- c()
sna <- c()
tot <- c()
sbi <- c()
sarea <- c()
mday <- c()
lat <- c()
lng <- c()
ar <- c()
sareaen <- c()
snaen <- c()
aren <- c()
bemp <- c()
act <- c()
srcUpdateTime <- c()
updateTime <- c()
infoTime <- c()
infoDate <- c()

for (i in 1:length(json_data)) {
    sno <- append(sno, json_data[i][[1]]$sno)
    sna <- append(sna, json_data[i][[1]]$sna)
    tot <- append(tot, json_data[i][[1]]$tot)
    sbi <- append(sbi, json_data[i][[1]]$sbi)
    sarea <- append(sarea, json_data[i][[1]]$sarea)
    mday <- append(mday, json_data[i][[1]]$mday)
    lat <- append(lat, json_data[i][[1]]$lat)
    lng <- append(lng, json_data[i][[1]]$lng)
    ar <- append(ar, json_data[i][[1]]$ar)
    sareaen <- append(sareaen, json_data[i][[1]]$sareaen)
    snaen <- append(snaen, json_data[i][[1]]$snaen)
    aren <- append(aren, json_data[i][[1]]$aren)
    bemp <- append(bemp, json_data[i][[1]]$bemp)
    act <- append(act, json_data[i][[1]]$act)
    srcUpdateTime <- append(srcUpdateTime, json_data[i][[1]]$srcUpdateTime)
    updateTime <- append(updateTime, json_data[i][[1]]$updateTime)
    infoTime <- append(infoTime, json_data[i][[1]]$infoTime)
    infoDate <- append(infoDate, json_data[i][[1]]$infoDate)
}

station_df <- tibble(
    id = sno,
    station_name = sna,
    station_area = sarea,
    station_address = ar,
    station_name_en = snaen,
    station_area_en = sareaen,
    station_ar_en = aren,
    latitude = lat,
    longitude = lng,
    active = as.integer(act),
    total_parking_space = as.integer(tot)
)

state_df <- tibble(
    id = sno,
    active = act,
    number_of_bikes_at_the_station = as.integer(sbi),
    number_of_bikes_returnable = as.integer(bemp),
    station_update_time = mday,
    youbike_update_time = srcUpdateTime,
    transport_update_time = updateTime,
    info_datetime = infoTime,
    info_date = infoDate
)

station_df

## # A tibble: 1,240 × 11
##    id     stati…¹ stati…² stati…³ stati…⁴ stati…⁵ stati…⁶ latit…⁷ longi…⁸ active
##    <chr>  <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>   <dbl>  <int>
##  1 50010… YouBik… 大安區  復興南… YouBik… Daan D… No.235…    25.0    122.      1
##  2 50010… YouBik… 大安區  復興南… YouBik… Daan D… No.273…    25.0    122.      1
##  3 50010… YouBik… 大安區  和平東… YouBik… Daan D… No. 7…     25.0    122.      1
##  4 50010… YouBik… 大安區  和平東… YouBik… Daan D… No. 33…    25.0    122.      1
##  5 50010… YouBik… 大安區  復興南… YouBik… Daan D… No. 36…    25.0    122.      1
##  6 50010… YouBik… 大安區  復興南… YouBik… Daan D… No. 28…    25.0    122.      1
##  7 50010… YouBik… 大安區  復興南… YouBik… Daan D… No. 34…    25.0    122.      1
##  8 50010… YouBik… 大安區  新生南… YouBik… Daan D… No. 52…    25.0    122.      1
##  9 50010… YouBik… 大安區  新生南… YouBik… Daan D… No. 66…    25.0    122.      1
## 10 50010… YouBik… 大安區  新生南… YouBik… Daan D… No. 82…    25.0    122.      1
## # … with 1,230 more rows, 1 more variable: total_parking_space <int>, and
## #   abbreviated variable names ¹station_name, ²station_area, ³station_address,
## #   ⁴station_name_en, ⁵station_area_en, ⁶station_ar_en, ⁷latitude, ⁸longitude

state_df

## # A tibble: 1,240 × 9
##    id        active number_of_…¹ numbe…² stati…³ youbi…⁴ trans…⁵ info_…⁶ info_…⁷
##    <chr>     <chr>         <int>   <int> <chr>   <chr>   <chr>   <chr>   <chr>  
##  1 500101001 1                 2      26 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  2 500101002 1                12       9 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  3 500101003 1                 0      16 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  4 500101004 1                 1      10 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  5 500101005 1                 8       8 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  6 500101006 1                10       1 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  7 500101007 1                 1      10 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  8 500101008 1                 3      14 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
##  9 500101009 1                 7       9 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 10 500101010 1                 8      12 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## # … with 1,230 more rows, and abbreviated variable names
## #   ¹number_of_bikes_at_the_station, ²number_of_bikes_returnable,
## #   ³station_update_time, ⁴youbike_update_time, ⁵transport_update_time,
## #   ⁶info_datetime, ⁷info_date

4. Append data to RDBMS

About Tables

As mentioned in the previous section, I separated the JSON file into two tables to store the data. Please reference the below tables to understand the structures of tables on RDBMS.

Table: station_info

Source: YouBike2.0臺北市公共自行車即時資訊
Column Name	Description
id	id of YouBike2.0 stations
station_name	Station name in Chinese
station_area	District name of Taipei City in Chinese
station_address	Address of the stations in Chinese
station_name_en	Station name in English
station_area_en	District name of Taipei City in English
station_ar_en	Address of the stations in English
latitude	Latitude
longitude	Longitude
active	Active: 1, Inactive: 0
total_parking_space	The total number of parking spaces in each station

Table: current_state

Source: YouBike2.0臺北市公共自行車即時資訊
Column Name	Description
id	id of YouBike2.0 stations
active	Active: 1, Inactive: 0
number_of_bikes_at_the_station	Number of bikes at the station
number_of_bikes_returnable	Number of bikes returnable
station_update_time	Updated time on each station of YouBike
youbike_update_time	Updated time on the system of YouBike
transport_update_time	Updated time on the system of the bureau of transportation
info_datetime	Updated time on each station of YouBike
info_date	Updated date on each station of YouBike

About RDBMS

I used SQLite as the default RDBMS in this notebook. If you want to use different RDBMS like MySQL, PostgreSQL and so on, you can look for the information on the official website of RStudio.

conn <- dbConnect(RSQLite::SQLite(), ':memory:')

is.exist <- FALSE
for(table.name in dbListTables(conn)) {
    if(table.name == 'station_info') {
        is.exist <- TRUE
        break
    }
}

if(!is.exist) {
    dbCreateTable(conn, 'station_info', station_df)
}

is.exist <- FALSE
for(table.name in dbListTables(conn)) {
    if(table.name == 'current_state') {
        is.exist <- TRUE
        break
    }
}

if(!is.exist) {
    dbCreateTable(conn, 'current_state', state_df)
}
 
dbWriteTable(conn, 'station_info', station_df, overwrite = TRUE)
dbWriteTable(conn, 'current_state', state_df, append = TRUE)

dbDisconnect(conn)