Overview
Divvy is a bike share program created by the Chicago Department of Transportation providing thousands of geo-tracked bikes at hundreds of stations. The program covers Chicago and its Evanston suburb, providing residents and visitors with a convenient, fun and affordable transportation option for getting around and exploring the Chicago and Evanston areas.

The program operates 24 hours/day, 7 days/week, 365 days/year, and riders have access to all bikes and stations across the system. Pricing options include single rides (charged per minute), day passes ($15 with unlimited rides up to 3 hours) and annual memberships ($83 or $119/year with unlimited rides, first 45 minutes free). Riders can choose between classic bikes and pedal-assist ebikes.

This case study identifies how member and casual riders behave differently and recommends marketing strategies to convert casual riders to members. The report must include the following:

  1. A clear statement of the business task
  2. A description of all data sources used
  3. Documentation of any cleaning or manipulation of data
  4. A summary of the analysis
  5. Supporting visualizations and key findings
  6. The top three recommendations

The Ask: What’s the Business Challenge/Objective/Question?

Prepare: Data generation, collection, storage, and data management

What type of data is provided/required?

  • Data sources: Cyclistic historical trip data (original Divvy data source), found here. Trip data source files were available for April 2020 through October 2022. This project uses one year of data, from November 2021 through October 2022.
  • Licensing: Motivate International Inc. under this data license.
  • Data credibility (ROCCC): License above covers details supporting credibility and integrity.
  • Data location: Source files stored on network with access governed by corporate policies and procedures. Work files located in RStudio Cyclistic Capstone Project folder.
  • Privacy, Security, Accessibility: Source files did not include PII. Data access is granted through the license. Work files have view-only permissions.

Collection/Measurement: What analysis does data support? Is additional data needed?

  • Columns: rider type, start/end date and time, rideable type
  • Rider type provides member/casual groups
  • Start/end dates/times provide weekday, time of day, month, season, trip length
  • Rideable type provides bike classifications
  • Stats: number of trips (avg, max, min, total), length of trips (avg, max, min, median, total), frequency (mode)
  • No additional data needs to be collected at this time

File Preparation

  • Verified that the header rows of all files contained the same fields, with the same data types, in the same order
  • Verified that no file contained duplicate records
  • Checked the number of populated values in each column, verifying that columns expected to have matching counts actually did
  • Noted use of field constraints (restricted values), and id/description fields
  • Sorted files by member_casual, started_at, rideable_type
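
The header and duplicate checks listed above could be scripted along these lines. This is a minimal Python sketch, not the workflow used in this project: the function name check_trip_files, the in-memory sample rows and the ended_at column name are assumptions for illustration. It compares field names and order only (not data types) and looks for repeated ride_id values across all files.

```python
import csv
import io


def check_trip_files(files):
    """Return (files whose header differs from the first file, repeated ride_ids)."""
    reference = None
    mismatched = []
    seen = set()
    duplicates = []
    for name, handle in files.items():
        reader = csv.reader(handle)
        header = next(reader)
        if reference is None:
            reference = header
        if header != reference:  # same fields in the same order?
            mismatched.append(name)
            continue  # column positions cannot be trusted in this file
        id_col = header.index("ride_id")
        for row in reader:
            ride_id = row[id_col]
            if ride_id in seen:
                duplicates.append(ride_id)
            seen.add(ride_id)
    return mismatched, duplicates


# Hypothetical stand-ins for three monthly files.
nov = io.StringIO(
    "ride_id,rideable_type,started_at,ended_at,member_casual\n"
    "A1,classic_bike,2021-11-01 00:00:14,2021-11-01 00:30:10,casual\n"
)
dec = io.StringIO(
    "ride_id,rideable_type,started_at,ended_at,member_casual\n"
    "A1,classic_bike,2021-12-01 08:00:00,2021-12-01 08:10:00,member\n"
    "B2,electric_bike,2021-12-01 09:00:00,2021-12-01 09:05:00,member\n"
)
jan = io.StringIO(  # columns deliberately out of order
    "ride_id,started_at,rideable_type,ended_at,member_casual\n"
    "C3,2022-01-01 09:00:00,classic_bike,2022-01-01 09:05:00,member\n"
)

mismatched, duplicates = check_trip_files({"nov": nov, "dec": dec, "jan": jan})
print(mismatched, duplicates)
```

In this toy input, the January file is reported for its reordered header and ride_id A1 is reported as a repeat.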

Process: Data cleaning/data integrity

  • Checked for duplicates
  • Checked column counts to verify columns that should contain the same count actually did
  • Used filters to find missing data; any data added was highlighted in green
  • Added day_of_week, weekday and ride_length columns to all files
  • Used filters to find ride lengths <= zero or otherwise invalid data; these rows were highlighted in red
  • Comments:
    • Some station data (names, IDs, latitude, longitude) is missing, so value counts for the station columns differ from the total record count and from one another.
    • Divvy’s legacy station master file did not enable correction of the missing data, so insights based on station data cannot be provided.
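
For readers who prefer code to spreadsheet filters, the ride-length check can be sketched in Python as follows. This is an illustration under assumptions, not the method used here: the helper names, the started_at/ended_at keys and the second and third sample rows are hypothetical (the first row uses the timestamps of the first trip in the data preview later in this report).

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def ride_length_seconds(started_at, ended_at):
    """Elapsed time between two timestamp strings, in seconds."""
    start = datetime.strptime(started_at, FMT)
    end = datetime.strptime(ended_at, FMT)
    return (end - start).total_seconds()


def split_valid(rows):
    """Separate rows with a positive ride length from rows to review."""
    valid, flagged = [], []
    for row in rows:
        try:
            length = ride_length_seconds(row["started_at"], row["ended_at"])
        except (KeyError, TypeError, ValueError):
            flagged.append(row)  # missing or unparseable timestamp
            continue
        if length > 0:
            valid.append(row)
        else:
            flagged.append(row)  # zero or negative ride length
    return valid, flagged


rows = [
    {"started_at": "2021-11-01 00:00:14", "ended_at": "2021-11-01 00:30:10"},
    {"started_at": "2021-11-01 09:00:00", "ended_at": "2021-11-01 08:59:00"},  # hypothetical
    {"started_at": "2021-11-01 10:00:00", "ended_at": ""},  # hypothetical
]
valid, flagged = split_valid(rows)
print(len(valid), len(flagged))
```

Only the first row survives; the other two would be the ones highlighted in red.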

Analyze: Data exploration, visualization and analysis

  • Added a Ride Length Greater Than Zero sheet in workbooks to serve as data source for analysis
  • Added start_hour, time_of_day and season fields for further analysis. Categories for time_of_day: Early Morning (5 - 8am), Late Morning (9 - 11am), Early Afternoon (12 - 3pm), Late Afternoon (4 - 5pm), Early Evening (6 - 7pm), Late Evening (8 - 9pm) and Night (10pm - 4am).
  • Categories for seasons: Spring (March - May), Summer (June - August), Fall (September - November), Winter (December - February).
  • Added Analysis sheet in workbooks; included stats for mean ride length, max ride length, mode of day of week and number of casual and member rides on several workbooks
  • Added pivot tables on several workbooks; for casual and member riders, these provided average ride length, average ride length by weekday, average ride length and number of rides by weekday, and number of rides by type of bike
  • Merged monthly data files using Cloud Storage/BigQuery and MS SQL Server.
  • Added ride_duration field to capture ride length in minutes, and year, month, season_sort, month_nbr, time_of_day_sort fields.
  • Exported data files (all records with ride_duration > zero) from MS SQL Server to import into RStudio.
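
The derived fields described above can be expressed compactly in code. The sketch below is in Python rather than the SQL and spreadsheet tools actually used, and it assumes each time_of_day range refers to the hour in which a ride starts (so "5 - 8am" covers 05:00 through 08:59). The function names are hypothetical.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# (first start hour, last start hour, label); any other hour is "Night".
TIME_OF_DAY = [
    (5, 8, "Early Morning"),
    (9, 11, "Late Morning"),
    (12, 15, "Early Afternoon"),
    (16, 17, "Late Afternoon"),
    (18, 19, "Early Evening"),
    (20, 21, "Late Evening"),
]


def time_of_day(hour):
    for first, last, label in TIME_OF_DAY:
        if first <= hour <= last:
            return label
    return "Night"  # 10pm - 4am


def season(month):
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    if month in (9, 10, 11):
        return "Fall"
    return "Winter"  # December - February


def derive_fields(started_at):
    start = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
    return {
        "weekday": WEEKDAYS[start.weekday()],  # datetime.weekday(): Monday is 0
        "start_hour": start.hour,
        "time_of_day": time_of_day(start.hour),
        "season": season(start.month),
    }


# Start time of the first trip shown in the cyclistic_data.csv preview.
first_preview_row = derive_fields("2021-11-01 00:00:14")
print(first_preview_row)
```

For that timestamp the sketch yields Monday, Night and Fall, which agrees with the weekday, time_of_day and season values in the first preview row.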

Summary Findings

  • Member rides occur most frequently on weekdays

  • Member rides occur most frequently during morning and afternoon

  • Member rides have a consistently short duration every day

  • Member rides have a consistent duration throughout the day

  • Casual rides occur most frequently on weekends

  • Casual rides occur most frequently during the afternoon

  • Casual rides have a longer duration on weekends

  • Casual rides have a longer duration from late morning through late afternoon

In RStudio, we begin by loading the R packages.

We import the two data files using the Import Dataset tool:

  • cyclistic_data.csv
  • cyclistic_summary_data_4trips.csv

We check the datasets for accuracy and completeness by looking at the structures, checking record counts, and previewing the data:

cyclistic_data.csv
##            ride_id rider_type            start_at year    month weekday
## 1 6BB9F79FB5BFFA0C     Casual 2021-11-01 00:00:14 2021 November  Monday
## 2 9B28379EC39C521C     Casual 2021-11-01 00:00:14 2021 November  Monday
## 3 7705C605D750A621     Casual 2021-11-01 00:01:36 2021 November  Monday
## 4 079314A319561676     Casual 2021-11-01 00:04:32 2021 November  Monday
## 5 49E9DB5878BBD249     Casual 2021-11-01 00:07:46 2021 November  Monday
##   time_of_day bike_type              end_at season ride_duration season_sort
## 1       Night  Electric 2021-11-01 00:30:10   Fall            30           3
## 2       Night  Electric 2021-11-01 00:04:06   Fall             4           3
## 3       Night  Electric 2021-11-01 00:09:44   Fall             8           3
## 4       Night  Electric 2021-11-01 00:11:37   Fall             7           3
## 5       Night   Classic 2021-11-01 00:17:11   Fall            10           3
##   weekday_sort time_of_day_sort month_nbr
## 1            1                7        11
## 2            1                7        11
## 3            1                7        11
## 4            1                7        11
## 5            1                7        11

cyclistic_summary_data_4trips.csv
##   rider_type season season_sort year    month month_nbr weekday weekday_sort
## 1     Casual   Fall           3 2021 November        11  Friday            5
## 2     Casual   Fall           3 2021 November        11  Friday            5
## 3     Casual   Fall           3 2021 November        11  Friday            5
## 4     Casual   Fall           3 2021 November        11  Friday            5
## 5     Casual   Fall           3 2021 November        11  Friday            5
##   bike_type     time_of_day time_of_day_sort nbr_of_rides total_length
## 1   Classic Early Afternoon                3         1086        33771
## 2   Classic   Early Evening                5          560        10642
## 3   Classic   Early Morning                1          281         5470
## 4   Classic  Late Afternoon                4          644        16241
## 5   Classic    Late Evening                6          280         5705
##   avg_length
## 1         31
## 2         19
## 3         19
## 4         25
## 5         20
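
As a quick sanity check, the preview rows above can be used to test how two derived columns relate to their inputs. The Python sketch below is an inference from these ten rows only, not a statement of how the fields were actually built. The fifth trip lasts 9 minutes 25 seconds but shows a ride_duration of 10, which is consistent with counting the minute boundaries crossed (the behavior of SQL Server's DATEDIFF(minute, ...)) rather than rounding elapsed time. Likewise, avg_length matches total_length / nbr_of_rides as a whole number; these rows cannot distinguish rounding from truncation.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def minute_boundaries(start_at, end_at):
    """Count minute boundaries crossed between two timestamps."""
    start = datetime.strptime(start_at, FMT).replace(second=0)
    end = datetime.strptime(end_at, FMT).replace(second=0)
    return int((end - start).total_seconds() // 60)


# (start_at, end_at, ride_duration) copied from the cyclistic_data.csv preview.
trips = [
    ("2021-11-01 00:00:14", "2021-11-01 00:30:10", 30),
    ("2021-11-01 00:00:14", "2021-11-01 00:04:06", 4),
    ("2021-11-01 00:01:36", "2021-11-01 00:09:44", 8),
    ("2021-11-01 00:04:32", "2021-11-01 00:11:37", 7),
    ("2021-11-01 00:07:46", "2021-11-01 00:17:11", 10),
]

# (nbr_of_rides, total_length, avg_length) copied from the summary preview.
summary = [
    (1086, 33771, 31),
    (560, 10642, 19),
    (281, 5470, 19),
    (644, 16241, 25),
    (280, 5705, 20),
]

duration_ok = all(minute_boundaries(s, e) == d for s, e, d in trips)
average_ok = all(round(total / n) == avg for n, total, avg in summary)
print(duration_ok, average_ok)
```

Both checks pass for the rows shown, so ride_duration values should be read as approximate to within a minute.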

Now we’ll start the analysis

From November 2021 through October 2022, there were a total of 5,685,947 rides. Here’s the breakdown by Casual and Member:

## # A tibble: 2 × 4
## # Groups:   Total Rides, Rider Type [2]
##   `Total Rides` `Rider Type` `Nbr of Rides` `Percent of Total`
##           <int> <chr>                 <int>              <dbl>
## 1       5685947 Casual              2325591                 41
## 2       5685947 Member              3360356                 59
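
The percentages in this table follow directly from the ride counts. A short Python check (illustrative only; the table itself was produced in R) reproduces them:

```python
# Ride counts taken from the table above.
rides = {"Casual": 2_325_591, "Member": 3_360_356}

total = sum(rides.values())
shares = {rider: round(100 * count / total) for rider, count in rides.items()}
print(total, shares)
```

The two counts sum to the stated total of 5,685,947, and the shares round to 41 and 59.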

Number of Rides Analysis

We can see that Members consistently ride more than Casual riders on the weekdays. Casual riders are more active on the weekends.

Time of Day Analysis

Categories Used:

  • Early Morning (5 - 8 am)
  • Late Morning (9 - 11 am)
  • Early Afternoon (12 - 3 pm)
  • Late Afternoon (4 - 5 pm)
  • Early Evening (6 - 7 pm)
  • Late Evening (8 - 9 pm)
  • Night (10pm - 4 am)

It looks like Members are consistently more active throughout the day. Both rider types show similar activity starting in the late evening hours. Let’s delve a little deeper into these numbers and look at time of day for each day of the week.

So here we’re seeing a pretty consistent pattern for Members. They are more active Monday - Friday, especially during early morning and afternoon hours.

Ride Length Analysis

Interesting. Averages show Member rides are consistently less than half as long as Casual rides. Casual riders take longer rides on the weekends. Let’s delve further and look at the median ride lengths. Are Casual riders really riding over twice as long as Members or are outliers skewing these results?

OK, this makes more sense: median ride lengths give a more representative picture. Outliers were definitely skewing the average ride lengths. The two groups are much closer on median ride length, though Casual riders still seem to take longer rides, especially on weekends.
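
To see why a few very long rides pull the average up while barely moving the median, consider a toy example in Python. The durations below are made up for illustration and are not taken from the Divvy data.

```python
from statistics import mean, median

# Hypothetical ride durations in minutes: five typical rides and one
# bike that was not docked for a full day (1440 minutes).
durations = [8, 10, 12, 14, 16, 1440]

average_minutes = mean(durations)
median_minutes = median(durations)
print(average_minutes, median_minutes)
```

A single outlier lifts the average to 250 minutes, while the median stays at 13 minutes, close to what a typical rider in this toy sample experienced.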

Let’s look at average and median ride lengths by time of day.

Member rides are consistent over the course of the day as well, and still less than half as long as Casual rides. So these averages are following the same pattern as the averages by day. Outliers are probably skewing these results as well, but let’s confirm with the median results.

Again, we see median ride lengths giving a more representative result. Outliers were definitely skewing the average ride lengths. Casual riders take predominantly longer rides from late morning to early evening. The two groups' ride lengths converge starting in the late evening hours.

Ride Duration Analysis

Now let’s look at rides in a different way. If we group the number of rides by time slots, what might that tell us?

Here we see that Members have the most rides under 30 minutes and a far greater number under 15 minutes. Casual riders have the most rides over 30 minutes and a far greater number over 45 minutes.
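
Grouping rides into time slots is a simple binning step. The Python sketch below shows one way to do it; the slot labels and the function name are assumptions, with cut points chosen to match the thresholds discussed in this report (15, 30, 45 and 90 minutes). The sample input is the ride_duration column from the five-row data preview, so the counts say nothing about the full dataset.

```python
from collections import Counter


def duration_slot(minutes):
    """Assign a ride duration (whole minutes) to a time slot."""
    if minutes <= 15:
        return "15 min or less"
    if minutes <= 30:
        return "16-30 min"
    if minutes <= 45:
        return "31-45 min"
    if minutes <= 90:
        return "46-90 min"
    return "over 90 min"


# ride_duration values from the five-row cyclistic_data.csv preview.
preview_durations = [30, 4, 8, 7, 10]

slot_counts = Counter(duration_slot(m) for m in preview_durations)
print(slot_counts)
```

Counting slots separately for each rider type then gives the comparison described above.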

Divvy’s annual membership plans include the first 45 minutes of each ride at no charge. Let’s look at the numbers over 45 minutes. Are there really that many Casual riders that would benefit from Divvy’s annual memberships?

Casual rides over 45 minutes represent 10% of total Casual rides. Here we see that 70% of these rides are between 46 and 90 minutes.

Share: Key Findings and Recommendations

How are Member and Casual riders different?

Members ride more on weekdays

  • 75% of Member rides occur on weekdays
  • 63% of Casual rides occur on weekdays

Casual riders ride more on weekends

  • 37% of Casual rides occur on weekends
  • 25% of Member rides occur on weekends
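
One way to read these percentages is against a flat baseline: if rides were spread evenly over all seven days, weekdays would account for 5/7 of them. The Python lines below work through that arithmetic. The even-spread baseline is an assumption introduced here for comparison, not a figure from the data.

```python
# Share of rides that would fall on weekdays if every day were equally busy.
baseline_weekday_share = 100 * 5 / 7

# Weekday shares reported above.
member_weekday_share = 75
casual_weekday_share = 63

member_gap = round(member_weekday_share - baseline_weekday_share, 1)
casual_gap = round(casual_weekday_share - baseline_weekday_share, 1)
print(round(baseline_weekday_share, 1), member_gap, casual_gap)
```

Against a baseline of about 71.4%, Members sit roughly 3.6 points above an even spread and Casual riders roughly 8.4 points below it, so the weekend tilt of Casual riders is the larger of the two effects.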

Members start riding earlier on weekdays

  • 13% of Member rides occur between 5 and 8 AM
  • 5% of Casual rides occur between 5 and 8 AM

Member rides are shorter

  • 75% of Member rides are 15 minutes or less
  • 6% of Casual rides are 15 minutes or less

Casual rides are longer

  • 18% of Casual rides are longer than 30 minutes
  • 6% of Member rides are longer than 30 minutes

In general, it seems like Members might be using their bikes for commuting to work, based on times and weekdays. Casual riders might be pursuing more leisure activities.

Recommendations

Collect additional information for improving analysis

  • For Casual riders, include whether the rides were single rides or day passes, and the total cost of the ride
  • Track type of ride use: tourism, work commutes, exercise/fitness, shopping/errands, leisure
  • Survey riders to determine interest in participating in sponsored cause/event based rides

Provide regular communication to riders

  • Provide a quarterly or annual rider activity report. For Casual riders, this may indicate that annual membership would be a better alternative. The report could include number of rides, types (single, day pass), bike type, duration, ride cost, usage fees, etc.

Divvy’s annual memberships include these incentives:

  • 45 free minutes
  • Speed up with 50% off ebikes
  • Bike Angels points and rewards program

Marketing/Pricing Strategies

  • Look at using different ride activity thresholds to qualify Casual riders for annual membership, reduced usage fees and Bike Angels points/rewards. Thresholds could include number of rides, duration, total cost, etc. tracked in the activity reports.
  • Look at providing a Casual rider annual membership based on their weekend and time of day activity.
  • Based on rider interest, look at partnering with sponsors for causes/events to promote community engagement and raise funds for charitable organizations.

This concludes the Divvy Bikeshare analysis. Future analysis may include insights based on additional information recommended above.