I plan to analyze Nice Ride MN bike share program data alongside historical weather data in order to explore opportunities for optimizing the program's operations and revenue.
The analysis addresses two main questions:
- What are the effects of weather on bike share volume?
- How do weather effects on bike share volume differ between member and casual account types?
The client is Nice Ride MN, and this research is timely: Nice Ride MN is in the process of accepting bids from for-profit bike share operators in order to phase out its non-profit operation. Given the program's current circumstances, it is worthwhile to analyze bike share user behavior to determine whether certain business decisions might increase operational efficiency or revenue potential.
Important fields in our data include daily observations of casual and member riders' trip duration and distance, along with weather variables such as average temperature, precipitation, and observed weather types (e.g., hail, snow, thunder). This rich dataset offers the opportunity to understand public bike share behavior under two different price structures. Exploration of the data shows that casual and member customers behave differently by day of the week, by time of day, and in response to weather conditions. Analyzing bike use behavior and its correlation with weather conditions can inform optimal maintenance scheduling and potential price restructuring aimed at revenue and growth.
We are unable to factor in the effects of historical city construction projects. The data cover two bike seasons, 2016 and 2017, each spanning early April to early November. During exploratory data analysis, we noticed greater biking volume in 2017 than in 2016. Based on research, and potentially supported by domain knowledge, there were more city construction projects in 2016 than in 2017; this may be a variable we cannot control for in our analysis.
The Nice Ride MN bike share datasets required little data wrangling beyond renaming a few columns, formatting the date and time columns to match the weather data, and performing a full join to match the dock station data with the trip history data. Joining bike share volume with daily weather averages proved to be a more challenging task. Initially, the data were formatted as hourly weather and riding observations; however, the hourly weather data contained multiple observations per hour, which caused excessive noise. We decided that a more reliable and consistent dataset could be created by joining ride observations with daily weather averages.
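A minimal pandas sketch of the daily weather join, assuming hypothetical column names (`start_time` and `account_type` on the trips table, `timestamp` plus numeric weather columns on the hourly weather table); the real Nice Ride MN and weather column names may differ. It averages the hourly readings into one row per day and counts trips per day for each account type before joining on the date.

```python
import pandas as pd


def daily_volume_with_weather(trips: pd.DataFrame, weather_hourly: pd.DataFrame) -> pd.DataFrame:
    """Join daily ride counts per account type with daily-average weather.

    Hypothetical columns: trips has 'start_time' (datetime) and 'account_type';
    weather_hourly has 'timestamp' (datetime) plus numeric weather columns.
    """
    # Collapse the noisy, multiple-per-hour weather readings into one row per day.
    weather = weather_hourly.assign(date=weather_hourly["timestamp"].dt.normalize())
    daily_weather = weather.drop(columns="timestamp").groupby("date").mean(numeric_only=True)

    # Count trips per day for each account type (one column per type).
    trips = trips.assign(date=trips["start_time"].dt.normalize())
    daily_rides = trips.groupby(["date", "account_type"]).size().unstack(fill_value=0)

    # Inner join keeps only days that have both ride and weather observations.
    return daily_rides.join(daily_weather, how="inner")
```

An inner join is used here so that only days with both rides and weather are kept; the full join of dock station data to trip history is a separate step not shown.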
Trip distribution shows the greatest year-over-year variance during the summer months. It is possible that road construction played a larger role in 2016 than in 2017, reducing ride volume. However, road construction is difficult to control for from one year to the next, so this is simply a hypothesis to note rather than investigate within the scope of this project.
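To put a number on the year-over-year variance, one option is to count trips per month for each season and take the difference. The sketch assumes a hypothetical `start_time` datetime column; `monthly_trips_by_year` is an illustrative helper, not part of the original analysis.

```python
import pandas as pd


def monthly_trips_by_year(trips: pd.DataFrame, years=(2016, 2017)) -> pd.DataFrame:
    """Trip counts per month (rows) and season (columns), plus the year-over-year change.

    Hypothetical column: trips['start_time'] (datetime). `years` is (earlier, later).
    """
    start = trips["start_time"]
    counts = (
        trips.groupby([start.dt.month.rename("month"), start.dt.year.rename("year")])
        .size()
        .unstack(fill_value=0)
        .reindex(columns=list(years), fill_value=0)
    )
    # Later season minus earlier season; large absolute values mark the months
    # where the two seasons diverge most.
    counts["yoy_change"] = counts[years[1]] - counts[years[0]]
    return counts
```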
The distribution of rides by day of the week is consistent year-over-year and indicates that member riders use bike share for commuting while casual riders use it for leisure; the two patterns are the inverse of one another. An hourly breakdown of bike volume would add further detail to this observation; however, it is not relevant to the broader purpose of our project.
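The commuting-versus-leisure contrast can be checked by normalizing each account type's trip counts across the days of the week, here assuming the by-day distribution refers to day of week. The sketch uses hypothetical columns `start_time` and `account_type` (values such as "Member" and "Casual").

```python
import pandas as pd


def weekday_profile(trips: pd.DataFrame) -> pd.DataFrame:
    """Share of each account type's trips on each day of the week (0 = Monday).

    Hypothetical columns: 'start_time' (datetime) and 'account_type'.
    """
    weekday = trips["start_time"].dt.dayofweek.rename("weekday")
    counts = trips.groupby([weekday, "account_type"]).size().unstack(fill_value=0)
    # Divide each column by its total so both account types sum to 1 and are
    # comparable despite different overall volumes.
    return counts / counts.sum()
```

A commuting pattern would show most member share on weekdays 0 to 4, while a leisure pattern would concentrate casual share on days 5 and 6.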
While exploring outliers in the data (observations falling outside a normal range), we found that 1,498 observations had a trip duration greater than one day. Some of these may appear to exceed one day only because the ride passed midnight. Further examination showed that a few trips did not fit this explanation and were recorded as lasting almost the entire bike season! We may have uncovered faulty recording mechanisms on a few bikes that were in maintenance. We removed these observations before measuring the strength of correlation between bike use and various weather variables.
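The one-day cutoff can be applied as a simple duration filter. This sketch assumes hypothetical `start_time`/`end_time` datetime columns and drops every trip over one day, which may be stricter than the rule actually used if midnight-crossing rides were handled separately.

```python
import pandas as pd


def drop_long_trips(trips: pd.DataFrame, max_duration: pd.Timedelta = pd.Timedelta(days=1)):
    """Remove trips longer than `max_duration`; return (kept_trips, n_removed).

    Hypothetical columns: 'start_time' and 'end_time' (datetimes including the date,
    so a ride that crosses midnight still gets its true, short duration).
    """
    duration = trips["end_time"] - trips["start_time"]
    too_long = duration > max_duration
    return trips.loc[~too_long].copy(), int(too_long.sum())
```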
Based on statistical analysis exploring the strength of relationships between bike volume and weather variables such as temperature, precipitation, wind speed, and humidity, we uncovered the following insights:
Compared with casual riders, member riders appear more willing to ride on days with the following average weather conditions:
- Lower temperatures
- Higher wind speed
- Higher precipitation
- Higher relative humidity
- Lower heat index
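One way to express the comparison in the list above is a ride-weighted average of each weather variable per account type, i.e., the average conditions under which member rides and casual rides took place. This is a swapped-in summary statistic for illustration, not necessarily the statistic used in the analysis. It assumes a daily table with hypothetical columns `Member` and `Casual` (daily ride counts) and weather columns such as `avg_temp`.

```python
import pandas as pd


def ride_weighted_weather(daily: pd.DataFrame,
                          rider_cols=("Member", "Casual"),
                          weather_cols=("avg_temp", "wind_speed", "precip", "humidity", "heat_index"),
                          ) -> pd.DataFrame:
    """Average weather experienced per ride, for each account type.

    Hypothetical columns: one daily ride-count column per account type plus
    daily-average weather columns.
    """
    weather = daily[list(weather_cols)]
    result = {}
    for rider in rider_cols:
        rides = daily[rider]
        # Each day's weather is weighted by the number of rides taken that day.
        result[rider] = weather.mul(rides, axis=0).sum() / rides.sum()
    # Rows: weather variables; columns: account types.
    return pd.DataFrame(result)
```

A lower `avg_temp` in the `Member` column than in the `Casual` column would correspond to the first item in the list.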
The next step in our analysis will be to fit a regression model to the majority of the historical data. We will seek a line of best fit by selecting which of the variables described above to include. Having chosen the optimal combination of variables, we will test the model's predictive strength on the remaining portion of the data. This serves as a secondary check that the model generalizes to unseen observations, keeping prediction error low rather than merely fitting the training data. Once this testing phase has validated the model, we can plan to apply it to bike observations for the upcoming 2018 season.
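The train/test procedure above can be sketched as an ordinary least squares fit in NumPy. The 80/20 split, the random shuffling, and the function name `fit_and_test` are assumptions for illustration; for forecasting a future season, a time-ordered split (training on earlier days, testing on later ones) may be a more realistic check. Root-mean-squared error on the held-out days stands in for "predictive strength".

```python
import numpy as np


def fit_and_test(X, y, train_frac=0.8, seed=0):
    """Fit OLS on a random training share of the days; report RMSE on the rest.

    X: (n_days,) or (n_days, n_features) weather variables; y: (n_days,) ride counts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = order[:n_train], order[n_train:]

    # Prepend an intercept column, then solve least squares on training rows only.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

    # Predictive strength on held-out days: root-mean-squared error.
    residuals = y[test] - A[test] @ coef
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return coef, rmse
```

Variable selection can then compare the held-out RMSE across candidate column subsets of `X`.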
The biggest change to our approach for this business problem is that we have scaled back the scope of our objectives, realizing that relating bike share use to the unpredictable aspects of weather is a large and complex body of work. A few questions from our original analysis proposal will remain unanswered until additional phases of this project can occur.