Data Analysis Project
Tools: R (tidyverse, lubridate, tsibble, fable, feasts, ggplot2)
Methods: Time Series Analysis, ARIMA Forecasting, Cross-Validation
Dataset: UCI Machine Learning Repository – Bike Sharing Dataset (Washington, DC)
This report analyzes daily bike rental demand in the Washington, DC Capital Bikeshare system using historical data from 2011–2012. The objective is to explore rental patterns, identify seasonal behavior, and build forecasting models that can support operational planning and pricing decisions.
Executive Summary:
Daily bike rental demand in Washington, DC shows clear seasonal variation and a strong relationship with temperature. Exploratory analysis highlights higher ridership in warmer months and lower ridership in colder periods. In cross-validation, the ARIMA model outperforms the naive and seasonal naive benchmarks for short-term forecasting (RMSE 1493.6 versus 1832.0 and 1703.5, respectively). These forecasts can support seasonal fleet planning, maintenance scheduling, and resource allocation.
Bike sharing systems must accurately forecast demand in order to manage fleet availability, station capacity, and operational costs. Reliable demand forecasts help operators determine how many bikes should be available at different times and locations, while also supporting pricing and maintenance planning.
This analysis uses historical Capital Bikeshare data from Washington, DC to examine patterns in daily bike rental demand and develop forecasting models that support operational decision-making.
The dataset contains the daily count of rental bike transactions in the Capital Bikeshare system during 2011 and 2012, along with weather, season, and calendar variables.
Fanaee-T, H., & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence.
| missing_dates | duplicate_dates | start_date | end_date |
|---|---|---|---|
| 0 | 0 | 2011-01-01 | 2012-12-31 |
The dataset contains a complete daily time index with no missing or duplicated observations.
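A completeness check like the one summarized above can be sketched in base R. The helper name `check_daily_index` and the generated date vector are illustrative assumptions, not the report's own code; tsibble provides comparable helpers such as `has_gaps()` and `duplicates()`.

```r
# Hypothetical helper: summarize gaps and duplicates in a daily date index.
check_daily_index <- function(dates) {
  dates <- as.Date(dates)
  full  <- seq(min(dates), max(dates), by = "day")  # every calendar day in range
  data.frame(
    missing_dates   = sum(!(full %in% dates)),
    duplicate_dates = sum(duplicated(dates)),
    start_date      = min(dates),
    end_date        = max(dates)
  )
}

# Stand-in for the dataset's date column (assumed complete, 2011-2012).
dates <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
index_check <- check_daily_index(dates)
print(index_check)
```

Dropping one date from `dates`, or appending a repeated one, should raise the corresponding count above zero.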
| days | mean | sd | median | min | max |
|---|---|---|---|---|---|
| 731 | 4504.3 | 1937.2 | 4548 | 22 | 8714 |
The dataset contains 731 daily observations across two full calendar years (365 days in 2011 and 366 in the leap year 2012), which is sufficient for identifying broad trends and seasonal patterns.
The time plot shows a clear increase in bike rental demand from 2011 to 2012, along with strong seasonal fluctuations.
Seasonal differences are substantial, with the highest rental activity occurring during warmer parts of the year.
The faceted view shows that the relationship between temperature and rental demand differs somewhat by season, but demand generally rises as temperatures become milder and remains highest during warmer periods.
Monthly aggregation highlights the annual cycle more clearly, with demand peaking in the warmer months.
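A monthly aggregation of this kind might look like the following base R sketch. The helper `monthly_rentals` and the simulated series are assumptions for illustration, and the report does not state whether it summed or averaged within months, so the summary function is left as an argument.

```r
# Hypothetical helper: aggregate daily rentals to calendar months.
monthly_rentals <- function(dates, rentals, FUN = sum) {
  month <- format(as.Date(dates), "%Y-%m")          # e.g. "2011-01"
  out <- aggregate(rentals, by = list(month = month), FUN = FUN)
  names(out)[2] <- "rentals"
  out
}

# Simulated stand-in data: a yearly cycle plus noise, not the real counts.
set.seed(1)
dates   <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
doy     <- as.numeric(format(dates, "%j"))
rentals <- round(4500 - 2000 * cos(2 * pi * doy / 365.25) +
                   rnorm(length(dates), sd = 500))

monthly <- monthly_rentals(dates, rentals, FUN = mean)
print(head(monthly))
```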
The rolling averages help reveal the underlying trend by reducing short-term noise in the daily series.
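As a rough illustration of the smoothing step, a trailing moving average can be computed with `stats::filter()`. The window lengths of 7 and 30 days, the helper name `rolling_mean`, and the simulated series are assumptions; the report does not state which windows it used.

```r
# Trailing k-day moving average; the first k - 1 values are NA by construction.
rolling_mean <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

# Simulated stand-in series (not the real rental counts).
set.seed(2)
rentals <- 4500 + cumsum(rnorm(200, sd = 50)) + rnorm(200, sd = 600)
ma7  <- rolling_mean(rentals, 7)
ma30 <- rolling_mean(rentals, 30)

# Compare day-to-day variability before and after smoothing.
print(c(raw = sd(diff(rentals)), ma7 = sd(diff(ma7), na.rm = TRUE)))
```

Setting `sides = 2` instead would give a centered average, which aligns better with the underlying trend but leaves missing values at both ends of the series.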
The decomposition indicates a rising long-term trend and a strong recurring seasonal structure in the series.
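One way to produce such a decomposition is STL, sketched here with base R's `stl()` rather than the feasts workflow listed in the report's tools. The weekly period (`frequency = 7`) and the simulated series are assumptions of this example.

```r
# Simulated stand-in: upward trend + weekly pattern + noise (not the real data).
set.seed(3)
n <- 7 * 52
weekly  <- rep(c(-400, -100, 0, 100, 200, 500, -300), length.out = n)
rentals <- 3000 + 5 * seq_len(n) + weekly + rnorm(n, sd = 300)

# frequency = 7 declares a weekly seasonal period (an assumption of this sketch).
y   <- ts(rentals, frequency = 7)
dec <- stl(y, s.window = "periodic")

# Columns: seasonal, trend, remainder; they add back up to the original series.
print(head(dec$time.series))
```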
| Series | ADF p-value |
|---|---|
| Original series | 0.9824 |
| First difference | 0.0100 |
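The Augmented Dickey–Fuller test does not reject the unit-root null for the original rental series (p = 0.9824), so the series is treated as non-stationary. After first differencing, the null is rejected (p = 0.0100), so the differenced series is treated as stationary and is better suited for ARIMA modeling. If the test was run with `adf.test()` from the tseries package, 0.01 is the smallest p-value that function reports, so the second value is best read as p ≤ 0.01.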
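The differencing and testing step could be reproduced along these lines. This is a sketch on a simulated random walk, assuming the tseries package for `adf.test()`; the report lists tseries in its references but does not show the call it used.

```r
# Simulated stand-in: a random walk with drift, which has a unit root by construction.
set.seed(4)
rentals   <- 4500 + cumsum(rnorm(731, mean = 5, sd = 300))
d_rentals <- diff(rentals)            # first difference: y[t] - y[t-1]

# adf.test() lives in the tseries package; skip quietly if it is not installed.
if (requireNamespace("tseries", quietly = TRUE)) {
  # suppressWarnings() hides the note emitted when p falls outside 0.01-0.99.
  p_level <- suppressWarnings(tseries::adf.test(rentals)$p.value)
  p_diff  <- suppressWarnings(tseries::adf.test(d_rentals)$p.value)
  print(c(original = p_level, first_difference = p_diff))
}
```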
The naive and seasonal naive models provide simple benchmark forecasts that allow the ARIMA model to be evaluated against baseline performance.
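The two benchmarks are simple enough to write out by hand. The function names and the toy data below are illustrative assumptions; in fable the corresponding model functions are `NAIVE()` and `SNAIVE()`.

```r
# Naive: repeat the last observed value for every future day.
naive_forecast <- function(y, h) rep(y[length(y)], h)

# Seasonal naive: repeat the last full season (m = 7 for a weekly cycle).
snaive_forecast <- function(y, h, m = 7) rep(tail(y, m), length.out = h)

# Two made-up weeks of daily counts.
y <- c(10, 12, 14, 13, 15, 20, 22,
       11, 13, 15, 14, 16, 21, 23)
print(naive_forecast(y, 3))    # 23 23 23
print(snaive_forecast(y, 9))   # 11 13 15 14 16 21 23 11 13
```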
## Series: rentals
## Model: ARIMA(1,1,1)(1,0,2)[7]
##
## Coefficients:
##           ar1      ma1     sar1     sma1     sma2
##        0.3612  -0.9005   0.9106  -0.9035   0.0545
## s.e.   0.0427   0.0205   0.0760   0.0865   0.0417
##
## sigma^2 estimated as 844004:  log likelihood=-6014.88
## AIC=12041.75   AICc=12041.87   BIC=12069.31
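The automatic ARIMA procedure selects ARIMA(1,1,1)(1,0,2)[7]: one autoregressive and one moving-average term on the first-differenced series, plus one seasonal autoregressive and two seasonal moving-average terms at a weekly period of 7 days, with no seasonal differencing. The second seasonal moving-average coefficient (0.0545, s.e. 0.0417) is small relative to its standard error, so it contributes little to the fit.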
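For readers who want to see what fitting a model of this order involves, here is a minimal sketch using base R's `stats::arima()` instead of the automatic selection in fable. The orders are taken from the output above; the simulated series and all variable names are assumptions, so the estimated coefficients will not match the reported ones.

```r
# Simulated stand-in series (not the real rental counts): a first-differenced
# process with an MA(1) term and a weekly (lag-7) seasonal AR term.
set.seed(42)
n  <- 420
e  <- rnorm(n + 1, sd = 100)
ma <- e[-1] - 0.6 * e[-(n + 1)]                                   # MA(1) shocks
w  <- stats::filter(ma, c(rep(0, 6), 0.5), method = "recursive")  # seasonal AR(1)
y  <- ts(4500 + cumsum(as.numeric(w)), frequency = 7)

# Same orders as the reported model: ARIMA(1,1,1)(1,0,2)[7].
fit <- arima(y,
             order    = c(1, 1, 1),
             seasonal = list(order = c(1, 0, 2), period = 7),
             method   = "ML")
print(fit)
```

The middle element of `order` is the number of non-seasonal differences, and `period = 7` sets the weekly seasonal lag.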
| .model | RMSE | MAE |
|---|---|---|
| ARIMA | 1493.6 | 1099.3 |
| SNAIVE | 1703.5 | 1231.4 |
| NAIVE | 1832.0 | 1350.7 |
Cross-validation results show that the ARIMA model achieves the lowest RMSE and MAE among the models tested. Its RMSE of 1493.6 is about 12% lower than the seasonal naive benchmark (1703.5) and about 18% lower than the naive benchmark (1832.0), and MAE shows a similar ranking. This suggests that ARIMA captures the structure of the series more effectively and provides stronger short-term forecasting performance for this dataset.
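The evaluation above relies on time series cross-validation, which can be sketched as a rolling-origin loop. The helper `rolling_cv`, the initial window of 140 days, the 7-day horizon, and the simulated series are all assumptions; the report does not state its cross-validation settings. Only the two benchmarks are included here to keep the example short.

```r
# Benchmark forecasters: each takes a training vector and a horizon h.
naive_fc  <- function(train, h) rep(train[length(train)], h)
snaive_fc <- function(train, h, m = 7) rep(tail(train, m), length.out = h)

# Hypothetical helper: expand the training window by `step` days at a time,
# forecast h days ahead from each origin, and pool the errors per model.
rolling_cv <- function(y, forecasters, initial, h, step) {
  origins <- seq(initial, length(y) - h, by = step)
  do.call(rbind, lapply(names(forecasters), function(name) {
    errs <- unlist(lapply(origins, function(o) {
      y[(o + 1):(o + h)] - forecasters[[name]](y[1:o], h)
    }))
    data.frame(.model = name, RMSE = sqrt(mean(errs^2)), MAE = mean(abs(errs)))
  }))
}

# Simulated stand-in: weekly pattern plus noise (not the real rental counts).
set.seed(5)
n <- 280
weekly <- rep(c(-400, -100, 0, 100, 200, 500, -300), length.out = n)
y <- 4500 + weekly + rnorm(n, sd = 300)

cv <- rolling_cv(y, list(NAIVE = naive_fc, SNAIVE = snaive_fc),
                 initial = 140, h = 7, step = 7)
print(cv)
```

Any other model can be evaluated in the same loop by adding a function with the same `(train, h)` interface to the list, for example a wrapper around `stats::arima()` and `predict()`.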
The 60-day forecast projects future rental demand based on the historical pattern in the series. Forecast intervals widen over time, reflecting increasing uncertainty as the forecast horizon extends.
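The widening of the intervals can be checked numerically. The sketch below fits the reported model orders to a simulated series with base R's `arima()` and builds 95% intervals from the forecast standard errors; the data, the 95% level, and the variable names are assumptions, since the report does not state its interval levels.

```r
# Simulated stand-in series (not the real rental counts).
set.seed(42)
n  <- 420
e  <- rnorm(n + 1, sd = 100)
ma <- e[-1] - 0.6 * e[-(n + 1)]                                   # MA(1) shocks
w  <- stats::filter(ma, c(rep(0, 6), 0.5), method = "recursive")  # seasonal AR(1)
y  <- ts(4500 + cumsum(as.numeric(w)), frequency = 7)

fit <- arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(1, 0, 2), period = 7), method = "ML")

# 60-day-ahead point forecasts and their standard errors.
fc <- predict(fit, n.ahead = 60)
z  <- qnorm(0.975)                       # about 1.96 for a 95% interval
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se

# Interval width at selected horizons; it grows with the horizon.
width <- as.numeric(upper - lower)
print(round(width[c(1, 7, 30, 60)]))
```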
| .model | lb_stat | lb_pvalue |
|---|---|---|
| ARIMA | 16.0354 | 0.8141 |
The Ljung–Box test does not reject the null hypothesis of no residual autocorrelation (statistic 16.04, p = 0.8141), suggesting that the ARIMA model adequately captures the temporal structure of the series.
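To show how this diagnostic behaves, the following sketch applies base R's `Box.test()` to an autocorrelated series and to the residuals of a model fitted to it. The simulated AR(1) series, the lag of 14, and the `fitdf` value are assumptions for illustration; the report does not state the lag it used.

```r
# Simulated AR(1) series: strongly autocorrelated by construction.
set.seed(7)
x   <- arima.sim(model = list(ar = 0.7), n = 400)
fit <- arima(x, order = c(1, 0, 0))

# Raw series: the test should flag autocorrelation (small p-value).
lb_raw <- Box.test(x, lag = 14, type = "Ljung-Box")

# Model residuals: fitdf removes one degree of freedom per estimated ARMA term.
lb_res <- Box.test(residuals(fit), lag = 14, type = "Ljung-Box", fitdf = 1)

print(c(raw_series = lb_raw$p.value, model_residuals = lb_res$p.value))
```

A large p-value on the residuals, as in the table above, is consistent with the model having absorbed the serial dependence, though it does not prove the model is correct.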
Daily bike rental demand increased between 2011 and 2012, indicating growing system usage during the study period.
Rental activity exhibits clear seasonal patterns, with higher ridership during warmer months and reduced usage during colder periods.
Temperature shows a positive association with rental demand, suggesting that weather conditions play an important role in influencing daily ridership levels.
Forecast evaluation indicates that the ARIMA model provides more accurate short-term predictions than simple naive benchmark models.
Increase bike availability and operational capacity in anticipation of peak demand during warmer months.
Schedule maintenance and system upgrades during lower-demand periods, particularly in colder seasons.
Use demand forecasts to support staffing decisions, bike redistribution strategies, and inventory planning.
Consider weather-responsive pricing or promotional strategies to help stabilize demand during shoulder seasons.
The analysis relies on data from 2011–2012 only, which may limit the generalizability of long-term demand patterns.
The dataset contains system-wide daily totals rather than station-level observations, preventing more detailed spatial analysis.
Potential explanatory factors such as precipitation, special events, and pricing changes are not included in the dataset.
Extend the analysis using additional years of data.
Incorporate richer weather variables and event-related predictors.
Build station-level forecasting models to support redistribution decisions.
Compare ARIMA forecasts with alternative approaches such as ETS models or machine learning methods.
UCI Machine Learning Repository. Bike Sharing Dataset.
Fanaee-T, H., & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence.
Hyndman, R. J., & Athanasopoulos, G. Forecasting: Principles and Practice.
Documentation for the tsibble, fable, feasts, and tseries R packages.