Cyclists riding along the Reflecting Pool near the Lincoln Memorial in Washington, DC.

Data Analysis Project
Tools: R (tidyverse, lubridate, tsibble, fable, feasts, ggplot2)
Methods: Time Series Analysis, ARIMA Forecasting, Cross-Validation
Dataset: UCI Machine Learning Repository – Bike Sharing Dataset (Washington, DC)

1 Introduction

This report analyzes daily bike rental demand in the Washington, DC Capital Bikeshare system using historical data from 2011–2012. The objective is to explore rental patterns, identify seasonal behavior, and build forecasting models that can support operational planning and pricing decisions.

Executive Summary:
Daily bike rental demand in Washington, DC shows clear seasonal variation and a positive association with temperature. Exploratory analysis shows higher ridership in warmer months and lower ridership in colder periods. In time-series cross-validation, ARIMA achieves lower short-term forecast error than the naive and seasonal naive benchmarks (RMSE 1493.6 versus 1832.0 and 1703.5). These forecasts can support seasonal fleet planning, maintenance scheduling, and resource allocation.

1.1 Business Objective

Bike-sharing operators need accurate demand forecasts to manage fleet availability, station capacity, and operational costs. Reliable forecasts help operators decide how many bikes should be available at different times and locations, and they also support pricing and maintenance planning.

This analysis uses historical Capital Bikeshare data from Washington, DC to examine patterns in daily bike rental demand and develop forecasting models that support operational decision-making.

1.2 Data Description

The dataset contains the daily count of rental bike transactions in the Capital Bikeshare system during 2011 and 2012, along with weather, season, and calendar variables.

1.3 Data Source

UCI Machine Learning Repository – Bike Sharing Dataset

1.4 Relevant Paper

Fanaee-T, H., & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence.

2 Data Preparation

2.1 Load Required Packages

2.2 Load the Data

2.3 Clean and Prepare the Data
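The loading and preparation code is not shown in this report. A minimal base-R sketch of these steps, assuming the UCI `day.csv` file and its documented columns `dteday`, `season`, `temp`, and `cnt`, might look like the following. The helper names `load_bike_data()` and `prepare_bike_data()` and the output column names are hypothetical, and the report's own pipeline (tidyverse, lubridate, tsibble) is not reproduced here.

```r
# Hypothetical helpers; input column names come from the UCI day.csv file.
prepare_bike_data <- function(raw) {
  out <- data.frame(
    date = as.Date(raw$dteday),
    rentals = raw$cnt,
    # UCI stores temp as degrees Celsius divided by 41, so rescale it
    temp_c = raw$temp * 41,
    # Season codes as documented by UCI: 1 = spring, 2 = summer, 3 = fall, 4 = winter
    season = factor(raw$season, levels = 1:4,
                    labels = c("Spring", "Summer", "Fall", "Winter"))
  )
  # Sort chronologically so later time-series steps see an ordered index
  out[order(out$date), ]
}

load_bike_data <- function(path) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  prepare_bike_data(raw)
}

# Usage (the local file path is an assumption):
# bikes <- load_bike_data("day.csv")
```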

2.4 Data Quality Check

Table 1. Data quality check for the time index.
Missing dates	Duplicate dates	Start date	End date
0	0	2011-01-01	2012-12-31

The time index is complete: every calendar day from 1 January 2011 to 31 December 2012 appears exactly once, with no missing or duplicated dates.
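The checks behind Table 1 amount to comparing the observed dates with the full daily sequence between the first and last date. A base-R sketch follows; `check_time_index()` is a hypothetical helper, and the report's own implementation is not shown.

```r
# Counts calendar days absent from the index and dates that occur more than once.
check_time_index <- function(dates) {
  full_range <- seq(min(dates), max(dates), by = "day")
  data.frame(
    missing_dates = sum(!(full_range %in% dates)),
    duplicate_dates = sum(duplicated(dates)),
    start_date = min(dates),
    end_date = max(dates)
  )
}

# Usage on the report's calendar range: 731 days, none missing or duplicated
dates <- seq(as.Date("2011-01-01"), as.Date("2012-12-31"), by = "day")
check_time_index(dates)
```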

2.5 Summary Statistics

Table 2. Summary statistics for daily bike rentals (rentals per day).
Days	Mean	SD	Median	Min	Max
731	4504.3	1937.2	4548	22	8714

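The dataset contains 731 daily observations covering two full calendar years (365 days in 2011 plus 366 in leap-year 2012). This is ample for estimating the weekly pattern and the broad trend, but it spans only two annual cycles, so conclusions about yearly seasonality rest on limited repetition.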
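The statistics in Table 2 can be computed in base R along these lines. `summarize_rentals()` is a hypothetical helper, and rounding to one decimal place is an assumption chosen to mirror the table's presentation.

```r
summarize_rentals <- function(rentals) {
  data.frame(
    days = length(rentals),
    mean = round(mean(rentals), 1),
    sd = round(sd(rentals), 1),   # sample standard deviation (n - 1 denominator)
    median = median(rentals),
    min = min(rentals),
    max = max(rentals)
  )
}

# Toy input only; applied to the 731 daily counts it would give one row like Table 2
summarize_rentals(c(22, 4548, 8714))  # mean 4428, median 4548
```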

3 Exploratory Data Analysis

3.1 Daily Rentals Over Time

The time plot shows a clear increase in bike rental demand from 2011 to 2012, along with strong seasonal fluctuations.

3.2 Rentals by Season

Seasonal differences are substantial, with the highest rental activity occurring during warmer parts of the year.

3.3 Rentals vs Temperature

The faceted view shows that the relationship between temperature and rental demand differs somewhat by season, but demand generally rises as temperatures climb from cold toward mild levels and is highest during warmer periods.
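One way to put a number on the association the faceted plot shows, not reported in this analysis, is a Pearson correlation within each season. The sketch assumes a data frame with hypothetical columns `season`, `temp_c`, and `rentals`; a correlation summarizes only the linear part of the relationship.

```r
cor_by_season <- function(df) {
  # drop = TRUE skips factor levels with no rows, which would make cor() fail
  sapply(split(df, df$season, drop = TRUE),
         function(s) cor(s$temp_c, s$rentals))
}

# Toy data for illustration, not the Capital Bikeshare observations
toy <- data.frame(season = rep(c("Summer", "Winter"), each = 3),
                  temp_c = c(25, 28, 31, 0, 3, 6),
                  rentals = c(5000, 6000, 7000, 1000, 1500, 2500))
cor_by_season(toy)
```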

3.4 Daily Capital Bikeshare Rentals (2011–2012)

The interactive time-series visualization highlights daily variability and seasonal demand patterns in the Capital Bikeshare system.

Demand peaks during summer months and declines during winter, suggesting strong seasonal effects that should be incorporated into forecasting models.

3.5 Monthly Average Rentals

Monthly aggregation highlights the annual cycle more clearly, with demand peaking in the warmer months.
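A base-R sketch of the monthly aggregation, assuming hypothetical columns `date` (class Date) and `rentals`, groups days by year and month so that 2011 and 2012 remain separate:

```r
monthly_average <- function(df) {
  # Label each day with its year-month so the two years are not pooled
  df$month <- format(df$date, "%Y-%m")
  aggregate(rentals ~ month, data = df, FUN = mean)
}

toy <- data.frame(date = as.Date(c("2011-01-01", "2011-01-02", "2011-02-01")),
                  rentals = c(100, 200, 400))
monthly_average(toy)  # 2011-01: 150, 2011-02: 400
```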

3.6 Monthly Seasonal Subseries of Capital Bikeshare Rentals

The seasonal subseries plot highlights recurring monthly patterns in bike rental demand. Rental activity is lowest during winter months, rises steadily through spring, peaks in late summer to early fall, and then declines again toward winter. This recurring structure provides strong visual evidence of seasonality in the series.

3.7 Smoothed Trend

The rolling averages help reveal the underlying trend by reducing short-term noise in the daily series.
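The window lengths used for the rolling averages are not stated. A centered moving average can be sketched in base R as follows; `rolling_mean()` is hypothetical, and `stats::filter` is named explicitly because dplyr, loaded with the tidyverse, masks `filter()`.

```r
# Centered moving average; for even window lengths, stats::filter places
# slightly more of the window forward in time than backward.
rolling_mean <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}

# Usage with hypothetical windows: rolling_mean(rentals, 7), rolling_mean(rentals, 30)
rolling_mean(1:10, 3)  # NA at both ends, then 2, 3, ..., 9
```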

4 Trend and Seasonality

4.1 STL Decomposition

The decomposition indicates a rising long-term trend and a strong recurring seasonal structure in the series.
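The seasonal period used in the decomposition is not stated. As a sketch only, base R's `stl()` can decompose a daily series stored with a weekly frequency of 7; the simulated series and `s.window = "periodic"` are assumptions, and the report's STL (likely from feasts) may use different settings.

```r
# Simulated series for illustration only (NOT the bikeshare data):
# linear trend + fixed day-of-week pattern + noise, 20 weeks long.
set.seed(1)
n <- 140
weekly_pattern <- rep(c(-300, -100, 0, 100, 200, 300, -200), length.out = n)
y <- ts(3000 + 10 * seq_len(n) + weekly_pattern + rnorm(n, sd = 100),
        frequency = 7)

# s.window = "periodic" forces an identical seasonal shape every week
fit <- stl(y, s.window = "periodic")

# Columns: seasonal, trend, remainder; they sum back to the original series
head(fit$time.series)
```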

4.2 Stationarity Testing

Table 3. Augmented Dickey–Fuller stationarity test results.
Series	ADF p-value
Original series	0.9824
First difference	0.0100

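The Augmented Dickey–Fuller test fails to reject the unit-root null hypothesis for the original rental series (p = 0.98), indicating non-stationarity, but rejects it for the first-differenced series. If, as the cited tseries package suggests, the test was run with tseries::adf.test, the reported 0.0100 is the lower bound of that function's p-value lookup table and should be read as p ≤ 0.01. One order of differencing therefore appears sufficient, consistent with d = 1 in the ARIMA model selected in Section 5.2.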
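For reference, the test regression below is the form documented for tseries::adf.test, which includes a constant and a linear trend; whether the report used exactly this specification is an assumption. Here k is the lag order (tseries defaults to trunc((n − 1)^(1/3))), and a unit root corresponds to γ = 0, so a small p-value is evidence against non-stationarity.

$$
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad \Delta y_t = y_t - y_{t-1},
\qquad H_0\colon \gamma = 0
$$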

5 Forecast Evaluation

5.1 Benchmark Models

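The naive model forecasts every future day as the last observed value, and the seasonal naive model repeats the value from the same point in the previous seasonal cycle. These simple benchmarks provide a baseline against which the ARIMA model is evaluated.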
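Both benchmarks are simple enough to write directly. The base-R sketch below is hypothetical: the report's seasonal period for the seasonal naive model is not stated, and a weekly period of 7 is assumed.

```r
# Naive: repeat the last observation h times
naive_forecast <- function(y, h) {
  rep(y[length(y)], h)
}

# Seasonal naive: repeat the last full cycle (period assumed to be 7 days)
snaive_forecast <- function(y, h, period = 7) {
  last_cycle <- y[(length(y) - period + 1):length(y)]
  rep(last_cycle, length.out = h)
}

y <- 1:14
naive_forecast(y, 3)    # 14 14 14
snaive_forecast(y, 9)   # 8 9 10 11 12 13 14 8 9
```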

5.2 ARIMA Model

## Series: rentals 
## Model: ARIMA(1,1,1)(1,0,2)[7] 
## 
## Coefficients:
##          ar1      ma1    sar1     sma1    sma2
##       0.3612  -0.9005  0.9106  -0.9035  0.0545
## s.e.  0.0427   0.0205  0.0760   0.0865  0.0417
## 
## sigma^2 estimated as 844004:  log likelihood=-6014.88
## AIC=12041.75   AICc=12041.87   BIC=12069.31

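The automatic ARIMA procedure selects ARIMA(1,1,1)(1,0,2)[7]: one non-seasonal difference with one AR and one MA term, plus a weekly seasonal component (period 7) with one seasonal AR term, two seasonal MA terms, and no seasonal differencing. The seasonal terms therefore describe the day-of-week cycle; the annual cycle visible in the exploratory plots has no seasonal term of its own. The estimated innovation variance of 844,004 corresponds to a residual standard deviation of roughly 919 rentals per day.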
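To show what the selected specification means in code, the sketch below fits the same orders with base R's stats::arima() to a simulated series. It does not reproduce fable's automatic order search or the report's estimates; the simulated data, seed, and `method = "ML"` are assumptions made so the example runs on its own.

```r
# Simulated stand-in series (NOT the Capital Bikeshare data): an ARIMA(1,1,1)
# process around a level of 3000, stored with a weekly frequency of 7.
set.seed(42)
sim <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.4, ma = -0.5),
                 n = 364, sd = 300)
y <- ts(3000 + as.numeric(sim), frequency = 7)

# Fit ARIMA(1,1,1)(1,0,2)[7]. method = "ML" skips the conditional-sum-of-squares
# start, which stops with an error if it yields non-stationary AR estimates.
fit <- arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(1, 0, 2), period = 7),
             method = "ML")

coef(fit)          # named ar1, ma1, sar1, sma1, sma2
sqrt(fit$sigma2)   # residual standard deviation on the scale of the data
```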

5.3 Cross-Validation Comparison

Table 4. Cross-validation comparison of forecasting models (errors in rentals per day).
Model	RMSE	MAE
ARIMA	1493.6	1099.3
SNAIVE	1703.5	1231.4
NAIVE	1832.0	1350.7

Cross-validation results show that the ARIMA model achieves the lowest RMSE and MAE of the models tested. Its RMSE is about 12% lower than the seasonal naive benchmark and about 18% lower than the naive benchmark, and its MAE is about 11% and 19% lower, respectively. This indicates that ARIMA captures more of the series' structure and delivers better short-term forecasts for this dataset.
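The report does not state its cross-validation settings (initial window, step, horizon). The idea of rolling-origin evaluation can be sketched in base R: forecast from each origin, compare with the held-out values, and pool the errors into RMSE and MAE. `rolling_cv()`, the benchmark one-liners, the weekly period, and the window settings below are all hypothetical; an ARIMA wrapper around predict() could be passed as `forecast_fn` in the same way.

```r
# Rolling-origin cross-validation; forecast_fn(train, h) returns h forecasts.
rolling_cv <- function(y, forecast_fn, initial, h = 1, step = 1) {
  errors <- numeric(0)
  origin <- initial
  while (origin + h <= length(y)) {
    fc <- forecast_fn(y[seq_len(origin)], h)
    actual <- y[(origin + 1):(origin + h)]
    errors <- c(errors, actual - fc)
    origin <- origin + step
  }
  c(RMSE = sqrt(mean(errors^2)), MAE = mean(abs(errors)))
}

# Benchmark forecasters (weekly period assumed for the seasonal naive)
naive_fc <- function(train, h) rep(train[length(train)], h)
snaive_fc <- function(train, h) rep(tail(train, 7), length.out = h)

# Usage on a simulated weekly series (not the report's data)
set.seed(10)
y <- rep(c(3, 5, 6, 6, 7, 8, 4) * 500, 20) + rnorm(140, sd = 200)
rbind(NAIVE = rolling_cv(y, naive_fc, initial = 56, h = 7, step = 7),
      SNAIVE = rolling_cv(y, snaive_fc, initial = 56, h = 7, step = 7))
```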

5.4 Final Forecast

The 60-day forecast projects future rental demand based on the historical pattern in the series. Forecast intervals widen over time, reflecting increasing uncertainty as the forecast horizon extends.
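The widening intervals follow from how ARIMA forecast variance accumulates with the horizon. A base-R sketch using predict() on a model with the reported orders, fitted to simulated data (an assumption; the report's fitted model and intervals are not reproduced), shows the effect. The 1.96 multiplier gives approximate 95% intervals under a normality assumption.

```r
# Simulated stand-in series (NOT the bikeshare data)
set.seed(42)
sim <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.4, ma = -0.5),
                 n = 364, sd = 300)
y <- ts(3000 + as.numeric(sim), frequency = 7)
fit <- arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(1, 0, 2), period = 7),
             method = "ML")

# 60-step-ahead point forecasts and their standard errors
fc <- predict(fit, n.ahead = 60)
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

# Interval width grows with the horizon as forecast variance accumulates
width <- as.numeric(upper - lower)
width[c(1, 30, 60)]
```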

5.5 Model Diagnostics

Table 5. Ljung–Box test results for ARIMA residuals.
Model	Ljung–Box statistic	p-value
ARIMA	16.0354	0.8141

The Ljung–Box test does not reject the null hypothesis of no residual autocorrelation (p = 0.81), suggesting that the ARIMA model adequately captures the temporal structure of the series.
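The lag used for the test is not reported. A base-R sketch with Box.test() follows; `ljung_box_check()` is hypothetical, the default lag of 14 (twice the weekly period, following the rule of thumb in Forecasting: Principles and Practice) is an assumption, and `fitdf = 5` accounts for the five estimated ARMA coefficients so the reference distribution has lag − fitdf degrees of freedom. Simulated vectors stand in for model residuals.

```r
ljung_box_check <- function(res, lag = 14, fitdf = 5) {
  bt <- Box.test(res, lag = lag, type = "Ljung-Box", fitdf = fitdf)
  data.frame(lb_stat = unname(bt$statistic), lb_pvalue = bt$p.value)
}

set.seed(3)
ljung_box_check(rnorm(730))          # white-noise stand-in: p-value typically large
ljung_box_check(cumsum(rnorm(730)))  # strongly autocorrelated: p-value near 0
```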

6 Conclusions and Recommendations

6.1 Key Findings

Daily bike rental demand increased between 2011 and 2012, indicating growing system usage during the study period.
Rental activity exhibits clear seasonal patterns, with higher ridership during warmer months and reduced usage during colder periods.
Temperature shows a positive association with rental demand, suggesting that weather conditions play an important role in influencing daily ridership levels.
Forecast evaluation indicates that the ARIMA model provides more accurate short-term predictions than simple naive benchmark models.

6.2 Business Recommendations

Increase bike availability and operational capacity in anticipation of peak demand during warmer months.
Schedule maintenance and system upgrades during lower-demand periods, particularly in colder seasons.
Use demand forecasts to support staffing decisions, bike redistribution strategies, and inventory planning.
Consider weather-responsive pricing or promotional strategies to help stabilize demand during shoulder seasons.

6.3 Limitations

The analysis relies on data from 2011–2012 only, which may limit the generalizability of long-term demand patterns.
The dataset contains system-wide daily totals rather than station-level observations, preventing more detailed spatial analysis.
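Potential explanatory factors such as measured precipitation, special events, and pricing changes are not included in the dataset; weather is represented only by temperature, feels-like temperature, humidity, wind speed, and a categorical weather-situation code.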

6.4 Next Steps

Extend the analysis using additional years of data.
Incorporate richer weather variables and event-related predictors.
Build station-level forecasting models to support redistribution decisions.
Compare ARIMA forecasts with alternative approaches such as ETS models or machine learning methods.

6.5 References

UCI Machine Learning Repository. Bike Sharing Dataset.
Fanaee-T, H., & Gama, J. (2013). Event labeling combining ensemble detectors and background knowledge. Progress in Artificial Intelligence.
Hyndman, R. J., & Athanasopoulos, G. Forecasting: Principles and Practice.
Documentation for the tsibble, fable, feasts, and tseries R packages.

Bike Rental Demand Forecasting in Washington, DC

Nikki Carlson

2026-03-16