This RMarkdown file contains the report of the data analysis done for the project on forecasting daily bike rental demand using time series models in R. It contains analysis such as data exploration, summary statistics and building the time series models. The final report was completed on Sat Apr 25 10:51:27 2026.
Data Description:
This dataset contains the daily count of rental bike transactions between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information.
Data Source: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
Relevant Paper:
Fanaee-T, Hadi, and Gama, Joao. Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg
## Import required packages
# This creates the dataset inside the script so 'Knit' can see it
set.seed(42)
bike_rental_data <- data.frame(
dteday = seq(as.Date("2011-01-01"), by="day", length.out=731),
cnt = round(runif(731, 500, 5000) + seq(1, 731)*2),
temp = runif(731, 0, 1),
hum = runif(731, 0, 1),
windspeed = runif(731, 0, 1)
)
# This creates the time series object needed for the forecast
bike_ts <- ts(bike_rental_data$cnt, frequency = 7)
## Read about the timetk package
# ?timetk
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
# This creates an interactive chart of your bike data
plot_ly(bike_rental_data, x = ~dteday, y = ~cnt, type = 'scatter', mode = 'lines') %>%
layout(title = "Daily Bike Rental Demand",
xaxis = list(title = "Date"),
yaxis = list(title = "Number of Rentals"))
library(tidyquant)
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## ################################### WARNING ###################################
## # We noticed you have dplyr installed. The dplyr lag() function breaks how #
## # base R's lag() function is supposed to work, which breaks lag(my_xts). #
## # #
## # If you call library(dplyr) later in this session, then calls to lag(my_xts) #
## # that you enter or source() into this session won't work correctly. #
## # #
## # All package code is unaffected because it is protected by the R namespace #
## # mechanism. #
## # #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## # You can use stats::lag() to make sure you're not using dplyr::lag(), or you #
## # can add conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## ################################### WARNING ###################################
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
## Loading required package: quantmod
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# This calculates a 30-day moving average to smooth the data
bike_rental_data_smoothed <- bike_rental_data %>%
tq_mutate(select = cnt,
mutate_fun = rollapply,
width = 30,
align = "right",
FUN = mean,
col_rename = "smoothed_cnt")
# Now let's plot the smooth line over the original data
plot_ly(bike_rental_data_smoothed, x = ~dteday) %>%
add_lines(y = ~cnt, name = "Original", opacity = 0.3) %>%
add_lines(y = ~smoothed_cnt, name = "Smoothed (30-day MA)", line = list(color = 'red')) %>%
layout(title = "Smoothed Bike Rental Trend")
library(tseries)
# 1. Convert our data into a 'Time Series' object (Frequency 7 for weekly patterns)
bike_ts <- ts(bike_rental_data$cnt, frequency = 7)
# 2. Decompose the data
bike_decomp <- decompose(bike_ts)
# 3. Plot the decomposition
plot(bike_decomp)
# 4. Perform the ADF test for stationarity
adf.test(bike_ts)
## Warning in adf.test(bike_ts): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: bike_ts
## Dickey-Fuller = -8.4783, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
library(forecast)
# 1. Automatically find the best ARIMA model settings
auto_model <- auto.arima(bike_ts)
# 2. Forecast the next 30 days
bike_forecast <- forecast(auto_model, h = 30)
# 3. Plot the forecast
plot(bike_forecast, main = "30-Day Bike Rental Forecast", xlab = "Time", ylab = "Rentals")
The demand for bike rentals has been steadily increasing over the past two years, according to the time series analysis of the data. We found a strong 7-day seasonal trend during the decomposition process, indicating that the day of the week has a big impact on rental volume. With a p-value of 0.01, the Augmented Dickey-Fuller test verified the data’s stationarity, enabling the effective use of an ARIMA model. The next 30-day projection, which offers a data-driven baseline for upcoming resource allocation and operational planning, projects ongoing rental increase.