The modules on the below code chunk are required to execute all the code chunks on this markdown. If you haven’t installed these modules, please download and install them before you carry out code chunks.
require(rjson)
## Loading required package: rjson
require(tibble)
## Loading required package: tibble
require(RSQLite)
## Loading required package: RSQLite
The data used in this notebook is from API on data.gov.tw. The data format is JSON, and the size is 572.4 KB.
fileUrl <- "https://tcgbusfs.blob.core.windows.net/dotapp/youbike/v2/youbike_immediate.json"
download.file(fileUrl, 'YouBike_realtime.json')
file <- "YouBike_realtime.json"
json_data <- fromJSON(file = file)
The dataset includes the id, name, area, and address of the YouBike 2.0 stations in Taipei City, and also added DateTime and Date that the data updated.
Due to reducing the amount of data stored in RDBMS, I separated the JSON file into two data frames. I will explain the detail of the data frames next section.
sno <- c()
sna <- c()
tot <- c()
sbi <- c()
sarea <- c()
mday <- c()
lat <- c()
lng <- c()
ar <- c()
sareaen <- c()
snaen <- c()
aren <- c()
bemp <- c()
act <- c()
srcUpdateTime <- c()
updateTime <- c()
infoTime <- c()
infoDate <- c()
for (i in 1:length(json_data)) {
sno <- append(sno, json_data[i][[1]]$sno)
sna <- append(sna, json_data[i][[1]]$sna)
tot <- append(tot, json_data[i][[1]]$tot)
sbi <- append(sbi, json_data[i][[1]]$sbi)
sarea <- append(sarea, json_data[i][[1]]$sarea)
mday <- append(mday, json_data[i][[1]]$mday)
lat <- append(lat, json_data[i][[1]]$lat)
lng <- append(lng, json_data[i][[1]]$lng)
ar <- append(ar, json_data[i][[1]]$ar)
sareaen <- append(sareaen, json_data[i][[1]]$sareaen)
snaen <- append(snaen, json_data[i][[1]]$snaen)
aren <- append(aren, json_data[i][[1]]$aren)
bemp <- append(bemp, json_data[i][[1]]$bemp)
act <- append(act, json_data[i][[1]]$act)
srcUpdateTime <- append(srcUpdateTime, json_data[i][[1]]$srcUpdateTime)
updateTime <- append(updateTime, json_data[i][[1]]$updateTime)
infoTime <- append(infoTime, json_data[i][[1]]$infoTime)
infoDate <- append(infoDate, json_data[i][[1]]$infoDate)
}
station_df <- tibble(
id = sno,
station_name = sna,
station_area = sarea,
station_address = ar,
station_name_en = snaen,
station_area_en = sareaen,
station_ar_en = aren,
latitude = lat,
longitude = lng,
active = as.integer(act),
total_parking_space = as.integer(tot)
)
state_df <- tibble(
id = sno,
active = act,
number_of_bikes_at_the_station = as.integer(sbi),
number_of_bikes_returnable = as.integer(bemp),
station_update_time = mday,
youbike_update_time = srcUpdateTime,
transport_update_time = updateTime,
info_datetime = infoTime,
info_date = infoDate
)
station_df
## # A tibble: 1,240 × 11
## id stati…¹ stati…² stati…³ stati…⁴ stati…⁵ stati…⁶ latit…⁷ longi…⁸ active
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <int>
## 1 50010… YouBik… 大安區 復興南… YouBik… Daan D… No.235… 25.0 122. 1
## 2 50010… YouBik… 大安區 復興南… YouBik… Daan D… No.273… 25.0 122. 1
## 3 50010… YouBik… 大安區 和平東… YouBik… Daan D… No. 7… 25.0 122. 1
## 4 50010… YouBik… 大安區 和平東… YouBik… Daan D… No. 33… 25.0 122. 1
## 5 50010… YouBik… 大安區 復興南… YouBik… Daan D… No. 36… 25.0 122. 1
## 6 50010… YouBik… 大安區 復興南… YouBik… Daan D… No. 28… 25.0 122. 1
## 7 50010… YouBik… 大安區 復興南… YouBik… Daan D… No. 34… 25.0 122. 1
## 8 50010… YouBik… 大安區 新生南… YouBik… Daan D… No. 52… 25.0 122. 1
## 9 50010… YouBik… 大安區 新生南… YouBik… Daan D… No. 66… 25.0 122. 1
## 10 50010… YouBik… 大安區 新生南… YouBik… Daan D… No. 82… 25.0 122. 1
## # … with 1,230 more rows, 1 more variable: total_parking_space <int>, and
## # abbreviated variable names ¹station_name, ²station_area, ³station_address,
## # ⁴station_name_en, ⁵station_area_en, ⁶station_ar_en, ⁷latitude, ⁸longitude
state_df
## # A tibble: 1,240 × 9
## id active number_of_…¹ numbe…² stati…³ youbi…⁴ trans…⁵ info_…⁶ info_…⁷
## <chr> <chr> <int> <int> <chr> <chr> <chr> <chr> <chr>
## 1 500101001 1 2 26 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 2 500101002 1 12 9 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 3 500101003 1 0 16 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 4 500101004 1 1 10 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 5 500101005 1 8 8 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 6 500101006 1 10 1 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 7 500101007 1 1 10 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 8 500101008 1 3 14 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 9 500101009 1 7 9 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## 10 500101010 1 8 12 2022-1… 2022-1… 2022-1… 2022-1… 2022-1…
## # … with 1,230 more rows, and abbreviated variable names
## # ¹number_of_bikes_at_the_station, ²number_of_bikes_returnable,
## # ³station_update_time, ⁴youbike_update_time, ⁵transport_update_time,
## # ⁶info_datetime, ⁷info_date
As mentioned in the previous section, I separated the JSON file into two tables to store the data. Please reference the below tables to understand the structures of tables on RDBMS.
Table: station_info
| Column Name | Description |
|---|---|
| id | id of YouBike2.0 stations |
| station_name | Station name in Chinese |
| station_area | District name of Taipei City in Chinese |
| station_address | Address of the stations in Chinese |
| station_name_en | Station name in English |
| station_area_en | District name of Taipei City in English |
| station_ar_en | Address of the stations in English |
| latitude | Latitude |
| longitude | Longitude |
| active | Active: 1, Inactive: 0 |
| total_parking_space | The total number of parking spaces in each station |
Table: current_state
| Column Name | Description |
|---|---|
| id | id of YouBike2.0 stations |
| active | Active: 1, Inactive: 0 |
| number_of_bikes_at_the_station | Number of bikes at the station |
| number_of_bikes_returnable | Number of bikes returnable |
| station_update_time | Updated time on each station of YouBike |
| youbike_update_time | Updated time on the system of YouBike |
| transport_update_time | Updated time on the system of the bureau of transportation |
| info_datetime | Updated time on each station of YouBike |
| info_date | Updated date on each station of YouBike |
I used SQLite as the default RDBMS in this notebook. If you want to use different RDBMS like MySQL, PostgreSQL and so on, you can look for the information on the official website of RStudio.
conn <- dbConnect(RSQLite::SQLite(), ':memory:')
is.exist <- FALSE
for(table.name in dbListTables(conn)) {
if(table.name == 'station_info') {
is.exist <- TRUE
break
}
}
if(!is.exist) {
dbCreateTable(conn, 'station_info', station_df)
}
is.exist <- FALSE
for(table.name in dbListTables(conn)) {
if(table.name == 'current_state') {
is.exist <- TRUE
break
}
}
if(!is.exist) {
dbCreateTable(conn, 'current_state', state_df)
}
dbWriteTable(conn, 'station_info', station_df, overwrite = TRUE)
dbWriteTable(conn, 'current_state', state_df, append = TRUE)
dbDisconnect(conn)