Introduction

In this report, I explore a dataset of daily bike rental records from 2011-2012 and build a model to better understand which factors drive rental activity. After removal of two outliers, the data consist of 729 observations covering weather, calendar information, and rental activity.

The Data

Total bike rentals, rentals from registered riders, and rentals from non-registered/“casual” riders are each present in the data. Of all riders, 81.2% are registered. I will focus on total riders to paint the most complete picture of rental activity. However, it is worth noting that ridership among registered and casual riders appears to be driven by different factors: the correlation between the two is only 0.391. This could be an area for further research.

             total  registered  casual
total        1.000       0.945   0.671
registered   0.945       1.000   0.391
casual       0.671       0.391   1.000
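As a rough illustration of how a correlation table like the one above can be computed, the following sketch implements the Pearson correlation with the Python standard library only. The function names and the toy values are assumptions made for this example; they are not the report's data or the tooling actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map every (name_a, name_b) pair of columns to its Pearson correlation."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Illustrative values only, NOT taken from the report's dataset.
registered = [3000, 3500, 4200, 2800, 2500]
casual = [300, 450, 700, 1500, 1800]
total = [r + c for r, c in zip(registered, casual)]

matrix = correlation_matrix(
    {"total": total, "registered": registered, "casual": casual}
)
for (a, b), value in matrix.items():
    print(f"{a:>10} vs {b:<10} {value: .3f}")
```

Each column correlates perfectly with itself and the matrix is symmetric, which matches the structure of the table above.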

The following table contains correlations between Total Rentals and the four numerical weather-related variables in the data. Note that actual temperature and “feels like” temperature are very tightly correlated and correlate similarly with other variables. In the predictive model, I include only the “feels like” temperature.

The number of total bike rentals appears to be approximately normally distributed with an average value of 4516.

There is meaningful variation in each of the four continuous weather variables in the data. Each appears to be approximately normally distributed, with some right skewness for wind speed.

Two clear trends emerge when visualizing daily ridership by month throughout the period covered by the data. First, there is a cyclical seasonal pattern that appears to be driven at least in part by the more favorable temperatures present in certain months. Second, there appears to be a generally positive trend in rentals over time.

As mentioned, the overall share of ridership from registered users is 81.2%. However, the split between casual and registered riders varies substantially by day of the week. On weekdays, users are predominantly registered (around 85-90%), but on weekends the percentage of casual users rises to over 30%.
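The weekday/weekend comparison can be sketched as a simple grouped ratio. The row layout, the field names (`weekday`, `registered`, `casual`), and the convention that 5 and 6 denote Saturday and Sunday are all assumptions for this example, and the numbers are made up rather than drawn from the dataset.

```python
from collections import defaultdict


def registered_share_by_day_type(rows):
    """Return the registered share of riders for weekdays and weekends."""
    registered = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        # Assumed encoding: 0 = Monday ... 6 = Sunday.
        day_type = "weekend" if row["weekday"] >= 5 else "weekday"
        registered[day_type] += row["registered"]
        total[day_type] += row["registered"] + row["casual"]
    return {day_type: registered[day_type] / total[day_type] for day_type in total}


# Illustrative rows only, NOT taken from the report's dataset.
toy_rows = [
    {"weekday": 0, "registered": 900, "casual": 100},
    {"weekday": 2, "registered": 850, "casual": 150},
    {"weekday": 5, "registered": 600, "casual": 400},
    {"weekday": 6, "registered": 700, "casual": 300},
]

shares = registered_share_by_day_type(toy_rows)
print(shares)
```

Summing riders within each group before dividing weights busy days more heavily than quiet ones; averaging daily percentages instead would be an equally reasonable choice with a slightly different interpretation.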

The Model

Based on exploration of the data and iterative development informed by regression diagnostics, the final model estimates the number of total bike rentals with a linear regression on the following variables:

Each variable was statistically significant at the 5% significance level (for categorical/factor variables, at least some levels were significant). The overall predictive power of this model is strong. In this sample, the model explains 86.1% of the variation in total rentals. Despite this, the model is not perfect, and there are fundamental regression assumptions that deserve additional investigation.
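To make "explains X% of the variation" concrete, the sketch below fits an ordinary least squares line and computes R² as one minus the ratio of residual to total sum of squares. It uses a single predictor for brevity, whereas the report's model uses several variables; the function names and toy values are assumptions for illustration only.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, predictions):
    """Share of the variation in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Illustrative values only, NOT taken from the report's dataset.
xs = [1, 2, 3]
ys = [1, 3, 2]

intercept, slope = fit_simple_ols(xs, ys)
predictions = [intercept + slope * x for x in xs]
print(f"R^2 = {r_squared(ys, predictions):.2f}")
```

An R² computed on the same sample used for fitting, as here, tends to be optimistic; a held-out set would give a more conservative estimate of predictive power.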

Intuitively, temperature is an important driver of rental levels. However, from the graphic below, it is clear that this relationship is complex and non-linear. Warmer weather generally leads to more rentals, but very hot weather appears to lead to decreased ridership. A quadratic relationship (blue line) captures this relatively well, but still appears to underestimate rentals at very low temperatures and overestimate them at very high temperatures. I use a quantile-bucketing approach (red points) to help model this non-linear behavior.
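One plausible reading of quantile bucketing is to split temperature at its sample quantiles and summarize rentals within each bucket, so that each bucket can take its own level instead of following a single curve. The sketch below follows that reading; the number of buckets, the function name, and the toy values are assumptions, and the exact procedure used in this report may differ.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def bucket_means(temps, rentals, n_buckets=4):
    """Split temps at their quantiles and average rentals within each bucket."""
    cuts = quantiles(temps, n=n_buckets)  # n_buckets - 1 cut points
    grouped = {}
    for temp, count in zip(temps, rentals):
        bucket = bisect_right(cuts, temp)  # bucket index 0 .. n_buckets - 1
        grouped.setdefault(bucket, []).append(count)
    return cuts, {bucket: mean(counts) for bucket, counts in sorted(grouped.items())}


# Illustrative values only, NOT taken from the report's dataset.
temps = [5, 10, 15, 20, 25, 30, 35, 40]
rentals = [1000, 2500, 4000, 5500, 6000, 5800, 4500, 3000]

cuts, means = bucket_means(temps, rentals)
for bucket, value in means.items():
    print(f"bucket {bucket}: mean rentals {value}")
```

In a regression, the bucket index would typically enter as a categorical variable, which lets the fitted level rise and then fall across temperature ranges without imposing a parabola.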

Recommendations

Weekend promotions

There is a significant dip in registered users and an uptick in casual users on weekends. This presents two opportunities:

- Promotions aimed at registered users could keep the revenue earned from registered users consistent.
- Registration benefits offered to casual users on weekends are likely to increase the number of registered users dramatically.

Weekday Management

There is a significantly higher number of registered users on weekdays than on weekends. Domain knowledge suggests that this is because registered users rely on bike sharing for their daily workday commute.

The demand for bike sharing among this consumer segment can be met effectively if bike sharing stations are placed at key pick-up points (where many registered users begin their commute) and near their respective places of work (which would allow for the return commute).

Climate-based pricing

The nearly parabolic relationship between “feels like” temperature and rental frequency indicates that demand drops in less-than-ideal weather conditions. The optimal “feels like” temperature is in the range of 20-30 degrees Celsius.

In addition to annual membership, Bikeshare could also offer a seasonal pass to increase market share during the high-demand season.