Bike-sharing systems are a new generation of traditional bike rentals
in which the whole process, from membership to rental and return, has
become automatic. Through these systems, a user can easily rent
a bike at one location and return it at another.
Today, there is great interest in these systems due to their
important role in traffic, environmental, and health issues.
Apart from the interesting real-world applications of bike-sharing
systems, the characteristics of the data they generate
make them attractive for research. Features such as travel duration,
departure and arrival positions, and the total number of bikes rented
turn a bike-sharing system into a virtual sensor network that can be
used for sensing mobility in the city. Hence, it is expected that the
most important events in the city could be detected by monitoring these
data.
Capital Bikeshare has more than 4300 bikes available at 500 stations
across 7 jurisdictions. With that number, Capital Bikeshare provides
residents and visitors with a convenient, fun, and affordable
transportation option for getting from point A to point B. People use
Capital Bikeshare to commute to work or school, run errands, get to
appointments or social engagements and more.
We aggregated the data on a daily basis, limited the scope to only one station of the Capital Bikeshare system, and focused only on the number of bikes rented.
library(dplyr)
library(tidyr)
library(lubridate)
library(forecast)
library(tseries)
dataset <- read.csv("data_input/day.csv")
dataset <- dataset %>% select(dteday, cnt)
dataset$dteday <- ymd(dataset$dteday)
# check for missing dates; returns TRUE if there is no missing date
complete_date <- seq.Date(from = min(dataset$dteday), to = max(dataset$dteday), by = "day")
all(complete_date == dataset$dteday)
#> [1] TRUE
#> [1] FALSE
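If the date check ever returns FALSE, the missing dates have to be located and filled before a regular time series can be built. The following is a minimal base-R sketch of that step on a small made-up data frame; `toy`, `full_dates`, `missing_dates`, and `toy_full` are hypothetical names for illustration, not objects from the analysis.

```r
# Toy daily data with one missing date (2011-01-03); the counts are made up.
toy <- data.frame(
  dteday = as.Date(c("2011-01-01", "2011-01-02", "2011-01-04")),
  cnt    = c(10, 12, 9)
)

# Full calendar between the first and last observed date.
full_dates <- seq.Date(from = min(toy$dteday), to = max(toy$dteday), by = "day")

# Dates that are in the calendar but absent from the data.
missing_dates <- full_dates[!full_dates %in% toy$dteday]

# Left-join onto the full calendar so that gaps show up as NA rows.
toy_full <- merge(data.frame(dteday = full_dates), toy, by = "dteday", all.x = TRUE)
toy_full
```

The NA rows can then be imputed (for example by interpolation) before converting the column to a `ts` object; which imputation is appropriate depends on the data.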
📈 Insight :
- Bike rentals trend upward from 2011 to 2012.
- Rentals peak mid-year (summer to fall season), probably because the weather and temperature are comfortable for using a bike as transportation.
- The data has a trend and seasonality, and is of additive type.
- Based on these characteristics, we will use Triple Exponential Smoothing and ARIMA models.
Use adf.test() to check the stationarity assumption, with the following hypotheses:
H0: data is not stationary
H1: data is stationary
A p-value < 0.05 (alpha) means that H0 is rejected.
#>
#> Augmented Dickey-Fuller Test
#>
#> data: bike_ts
#> Dickey-Fuller = -1.6351, Lag order = 9, p-value = 0.7327
#> alternative hypothesis: stationary
📈 Insight :
- The p-value is greater than 0.05, so we fail to reject H0: the data is not stationary.
- We have to apply differencing.
#>
#> Augmented Dickey-Fuller Test
#>
#> data: .
#> Dickey-Fuller = -13.798, Lag order = 8, p-value = 0.01
#> alternative hypothesis: stationary
📈 Insight :
- The p-value is less than 0.05, which means the data is stationary after one round of differencing.
- So we set d = 1.
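The test-then-difference step can be sketched on synthetic data as follows. This is an illustration only: `walk` is a simulated random walk, not the bike series, and the exact calls used in the analysis are not shown in this report.

```r
library(tseries)

# Synthetic random walk: non-stationary by construction (not the bike data).
set.seed(123)
walk <- cumsum(rnorm(500))

# ADF test on the level series; H0: the series is not stationary.
adf_level <- adf.test(walk)

# First difference (d = 1) recovers the white-noise increments.
# adf.test() warns when the true p-value is below 0.01, its smallest reported value.
adf_diff <- suppressWarnings(adf.test(diff(walk)))

adf_level$p.value
adf_diff$p.value
```

Because `adf.test()` interpolates p-values from a table, it does not report values below 0.01, which is why the differenced output above shows exactly 0.01.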
### PACF & ACF PLOT
📈 Insight :
- The PACF plot suggests these candidate lags for p: 1, 2, 3, 4, 5
- The ACF plot suggests these candidate lags for q: 1, 2, 3

From the differencing step and the PACF and ACF plots, we have the following order list for the ARIMA model:
p : 0, 1, 2, 3, 4, 5
d : 1
q : 0, 1, 2, 3
Hence we get the following combinations:
- c(0,1,0), c(0,1,1), c(0,1,2), c(0,1,3)
- c(1,1,0), c(1,1,1), c(1,1,2), c(1,1,3)
- c(2,1,0), c(2,1,1), c(2,1,2), c(2,1,3)
- c(3,1,0), c(3,1,1), c(3,1,2), c(3,1,3)
- c(4,1,0), c(4,1,1), c(4,1,2), c(4,1,3)
- c(5,1,0), c(5,1,1), c(5,1,2), c(5,1,3)
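Rather than listing the combinations by hand, the same grid can be generated programmatically. A small sketch using base R `expand.grid()`; the name `orders` is chosen here for illustration.

```r
# Candidate values read off the PACF (p) and ACF (q) plots, with d = 1.
orders <- expand.grid(p = 0:5, d = 1, q = 0:3)

# Sort to match the listing order: by p first, then by q.
orders <- orders[order(orders$p, orders$q), ]

nrow(orders)  # 6 values of p x 4 values of q = 24 combinations
```

Each row can then be passed as the `order` argument of an ARIMA fit, for example with `as.numeric(orders[1, ])`.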
However, we only fit two of these combinations for the ARIMA model:
one with a high value of p and a low value of q,
and one with a low value of p and a high value of q.
#> [1] 12051.98
#> [1] 12070.63
#>                ME     RMSE      MAE       MPE     MAPE        ACF1 Theil's U
#> Test set 12.16737 923.0264 645.7667 -44.25499 58.28445 -0.00071928  2.309738
#>                ME     RMSE   MAE       MPE     MAPE        ACF1 Theil's U
#> Test set 4.220283 933.7811 658.8 -44.23189 58.70165 -0.01593013  2.074105
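The comparison of two candidate orders on a hold-out set can be sketched as follows. This is not the code behind the numbers above: the series `y` is synthetic, the two orders are hypothetical picks from the candidate list, the 30-point hold-out is arbitrary, and base R `arima()` is swapped in for the forecast package so the sketch has no dependencies.

```r
# Synthetic non-stationary series (not the bike data); last 30 points held out.
set.seed(7)
y <- ts(200 + cumsum(rnorm(300)))
train <- window(y, end = 270)
test  <- window(y, start = 271)

# Two illustrative orders (hypothetical picks from the candidate list).
fit_high_p <- arima(train, order = c(3, 1, 0), method = "ML")  # high p, low q
fit_high_q <- arima(train, order = c(0, 1, 3), method = "ML")  # low p, high q

# Lower AIC indicates a better trade-off between fit and complexity.
c(high_p = AIC(fit_high_p), high_q = AIC(fit_high_q))

# Forecast the held-out horizon and compute RMSE against the test set.
rmse <- function(fit) {
  pred <- predict(fit, n.ahead = length(test))$pred
  sqrt(mean((as.numeric(test) - as.numeric(pred))^2))
}
c(high_p = rmse(fit_high_p), high_q = rmse(fit_high_q))
```

AIC is computed on the training data only, so it is worth checking it against hold-out error as well: a model with the lower AIC does not necessarily forecast better.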
📈 Insight :
- The MAPE of both models is above 50%, which means that neither model forecasts well enough.
- To address this, we recommend adding more data, because we only have a 2-year span of data.
- The models cannot capture the trend well because the dataset only contains 2 repetitions of the yearly pattern.
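For reference, MAPE is the mean of the absolute errors expressed as a percentage of the actual values. A minimal sketch with made-up numbers (the helper `mape()` is defined here for illustration and is not part of any package used above):

```r
# Mean absolute percentage error, in percent.
mape <- function(actual, predicted) {
  mean(abs((actual - predicted) / actual)) * 100
}

# Errors of 10%, 10%, and 0%.
mape(c(100, 200, 400), c(110, 180, 400))    # (10 + 10 + 0) / 3 = 6.67 (rounded)

# The same absolute miss of 50 weighs far more on a small actual value.
mape(c(1000, 1000, 25), c(1050, 1050, 75))  # (5 + 5 + 200) / 3 = 70
```

The second call shows one reason MAPE can be large even when RMSE looks moderate: a few days with very low counts can dominate the average.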
Hadi Fanaee-T
Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto
INESC Porto, Campus da FEUP
Rua Dr. Roberto Frias, 378
4200 - 465 Porto, Portugal
Original dataset : https://www.kaggle.com/c/bike-sharing-demand
Capital Bikeshare trip data : http://capitalbikeshare.com/system-data
Weather Information : https://openweathermap.org/history
Holiday Schedule : http://dchr.dc.gov/page/holiday-schedule
A work by Taufan Anggoro Adhi
tf.anggoro@gmail.com