This code is an example of how to use the R programming language to access and process data from a bikeshare operator and obtain docking station bike availability data for that moment in time. This is an example for one city, but in principle the same could be done for many cities and then stored in a combined database.
The code will achieve the following objectives:
First we need to load the necessary libraries
#——– Get docking station data for current time ———-#
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## Warning: package 'tibble' was built under R version 4.1.3
## Warning: package 'tidyr' was built under R version 4.1.3
## Warning: package 'readr' was built under R version 4.1.3
## Warning: package 'stringr' was built under R version 4.1.3
## Warning: package 'forcats' was built under R version 4.1.3
library(xml2)
library(XML)
library(knitr)
library(RSQL) #Generate and Process 'SQL' Queries in R
library(RSQLite) #Can create an in-memory SQL database
library(odbc) #Contains drivers to connect to a database
library(DBI) #Contains functions for interacting with the database
Get the data
To find the url needed to obtain the data via an API, a bit of exploration is needed on the bikeshare operator site.
Below describes the process for obtaining data from the London scheme.
The first line uses the httr::GET() function to send a GET request to the URL “https://tfl.gov.uk/tfl/syndication/feeds/cycle-hire/livecyclehireupdates.xml”, which is an XML file that contains information about the status of the cycle hire stations in London. The result of the request is stored in the object status_api_call.
status_api_call <- httr::GET("https://tfl.gov.uk/tfl/syndication/feeds/cycle-hire/livecyclehireupdates.xml")
The second line uses the xmlInternalTreeParse() function to parse the XML content of status_api_call into an internal tree representation, which is easier to manipulate and query. The result is stored in the object api_xml2.
api_xml2 = xmlInternalTreeParse(status_api_call)
The printed output is in the wrong format, so we need convert into a df.
#cat(saveXML(api_xml2, nchars = 10))
The fourth line uses the Sys.time() function to get the current system time and store it in the object current_time. This is needed to ensure that the date and time is correct in the final df, as sometime bikeshare operators have errors with the datetime provided.
current_time = Sys.time()
The fifth line uses the xmlToDataFrame() function to convert the XML tree api_xml2 into a data frame, which is a tabular format that can be used for further analysis. The data frame is then piped (|>) to the mutate() function, which adds a new column called datetime with the value of current_time. This can help to keep track of when the data was obtained. The result is stored in the object xmldf.
xmldf = xmlToDataFrame(api_xml2) |>
dplyr::mutate(datetime = current_time)
# print the data frame as a table
knitr::kable(xmldf[1:5, ], caption = "Data frame from XML file")
| id | name | terminalName | lat | long | installed | locked | installDate | removalDate | temporary | nbBikes | nbStandardBikes | nbEBikes | nbEmptyDocks | nbDocks | datetime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | River Street , Clerkenwell | 001023 | 51.52916347 | -0.109970527 | true | false | 1278947280000 | false | 14 | 14 | 0 | 5 | 19 | 2024-03-28 15:19:22 | |
| 2 | Phillimore Gardens, Kensington | 001018 | 51.49960695 | -0.197574246 | true | false | 1278585780000 | false | 11 | 11 | 0 | 26 | 37 | 2024-03-28 15:19:22 | |
| 3 | Christopher Street, Liverpool Street | 001012 | 51.52128377 | -0.084605692 | true | false | 1278240360000 | false | 9 | 9 | 0 | 21 | 32 | 2024-03-28 15:19:22 | |
| 4 | St. Chad’s Street, King’s Cross | 001013 | 51.53005939 | -0.120973687 | true | false | 1278241080000 | false | 19 | 19 | 0 | 4 | 23 | 2024-03-28 15:19:22 | |
| 5 | Sedding Street, Sloane Square | 003420 | 51.49313 | -0.156876 | true | false | 1278241440000 | false | 25 | 24 | 1 | 2 | 27 | 2024-03-28 15:19:22 |
We then need to save this data to a database. This code is an example of how to use R to interact with a SQLite database.
CREATE A DATABASE
The first line uses the dbConnect() function to establish a connection to a SQLite database file located at “C:/Users/user/Documents/name/db_name”. The result of the connection is stored in the object con
con <- dbConnect(RSQLite::SQLite(), dbname="C:/Users/patri/Documents/R projs/city_comparison/cities_database.db")
WRITE TO DATABASE
The second line uses the dbWriteTable() function to write the data frame xmldf to a table named ‘cities_database_table’ in the database. The append=TRUE argument indicates that the data will be added to the existing table, rather than replacing it
# RSQLite::dbWriteTable(con, name='cities_database_table', xmldf,append=TRUE)
Check what tables exist in your database
The third line uses the dbListTables() function to list the names of all the tables in the database. This can help to check if the table was created successfully.
# RSQLite::dbListTables(con)
create dataframe from DB
The fourth line uses the dbGetQuery() function to execute a SQL query that selects all the rows and columns from the table ‘cities_database_table’ and returns them as a data frame. The result is stored in the object cities_db_df.
# cities_db_df <- dbGetQuery(con, "SELECT * FROM cities_database_table")
Setup a script schedule
Separate to this script, we need to ensure that we collect this data every 10 minutes by using a timer feature to re-run the script.
# library(taskscheduleR)
# taskscheduler_create(taskname = "my_script_task",
#rscript = "C:/Users/username/Documents/my_script.R",
#schedule = "MINUTE",
#starttime = format(Sys.time() + 50, "%H:%M"),
#modifier = 10)
This will create a task named my_script_task that will run your script every ten minutes, starting from 50 seconds after the current time. You can modify the arguments of the function to suit your needs, such as changing the task name, the script path, the schedule frequency, or the start time.