Cyclistic Bike-Share Analysis Case Study

Introduction

The Cyclistic Bike Share Analysis Case Study is the capstone project of choice for the Google Data Analytics Course. The purpose of this study is to analyze how different customer types—casual riders and annual members—use the Cyclistic bike-share service in Chicago. Throughout this study, key behavioral patterns that distinguish these two groups will be uncovered by applying the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. The ultimate goal is to generate actionable insights that will guide the marketing team in developing a targeted strategy to convert casual riders into long-term, profitable annual members. This analysis will inform future decision-making and support the company’s broader objective of sustainable growth through increased membership retention.

About the Company

In 2016, Cyclistic, a fictional company created for this study, launched a successful bike-share program. Since its inception, the service has expanded to a fleet of 5,824 geotracked bicycles connected to a network of 692 docking stations across Chicago. Users can unlock bikes from one station and return them to any other station in the system at any time.

Historically, the company’s marketing strategy focused on building general awareness and appealing to a broad range of consumer segments. A key factor in this success has been the flexibility of its pricing plans, which include single-ride passes, full-day passes, and annual memberships. Riders who purchase single-ride or full-day passes are classified as casual riders, while those with annual memberships are considered members.

Financial analysis has shown that annual members are significantly more profitable than casual riders. Although flexible pricing attracts a wider customer base, future growth is expected to depend on increasing the number of annual memberships. Rather than focusing efforts on acquiring entirely new customers, there is a strategic opportunity to convert existing casual riders into members, as they are already familiar with the service and have chosen it for their transportation needs.

To support this conversion strategy, the team must first gain a deeper understanding of how annual members and casual riders differ, the factors that might influence casual riders to purchase a membership, and the role digital media could play in marketing efforts. Historical bike trip data will be analyzed to uncover trends and inform targeted marketing strategies.

Ask

Business Task

The business task is to analyze historical trip data from Cyclistic’s bike-share program to identify how annual members and casual riders use the service differently. The goal is to uncover usage patterns and insights that can inform the development of a data-driven marketing strategy aimed at converting casual riders into annual members, thereby increasing customer retention and long-term profitability for the company.


The following questions will guide the future marketing program:
1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

Prepare

As part of the data preparation process, historical bike trip data was sourced and organized to support the analysis. The datasets Divvy 2019 Q1 and Divvy 2020 Q1 were used exclusively for this study, as they are compatible with Posit’s RStudio and remain within the platform’s free plan memory limitations. These datasets are publicly available and were originally provided by Motivate International Inc. under an open data license. While the datasets refer to “Divvy” instead of “Cyclistic” due to the fictional nature of the case study, they are appropriate and sufficient for addressing the business questions.

The data includes relevant fields necessary to examine rider behavior but excludes personally identifiable information in compliance with privacy and licensing requirements. As such, it is not possible to link individual rides to specific users or determine whether casual riders reside in the Cyclistic service area. However, the available information is adequate for identifying trends in usage between casual riders and annual members, which is the primary objective of the analysis.

Process

During the data processing phase, both Microsoft Excel and Posit’s RStudio were utilized to clean and prepare the datasets for analysis. Initial data integrity checks were performed in Excel, where the data was examined for duplicates, missing values, and inaccuracies. The data was then formatted appropriately before being imported into RStudio for further processing.

The datasets Divvy 2019 Q1 and Divvy 2020 Q1 were loaded into RStudio and assigned to the data frames q1_2019 and q1_2020, respectively. To ensure seamless integration, the column names and data types were compared and standardized before merging the two datasets into a unified data frame, all_trips. Additional data cleaning steps were taken to ensure relevance and accuracy.


library(tidyverse)  # loads readr (read_csv), dplyr, and ggplot2

q1_2019 <- read_csv("C:/Users/../Divvy_Trips_2019_Q1.csv")
q1_2020 <- read_csv("C:/Users/../Divvy_Trips_2020_Q1.csv")


In the member_casual column, inconsistent labels were identified:

  • “Subscriber” and “member” referred to the same user type
  • “Customer” and “casual” referred to the same user type

These values were standardized so that all entries use either “member” or “casual” for consistency.
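A minimal sketch of this relabeling, assuming the 2019 labels end up in a member_casual column and using dplyr's recode() on a small made-up sample. On the real data, the same mutate() could run on q1_2019 after renaming, or on the merged data frame:

```r
library(dplyr)

# Made-up sample mixing the 2019 labels (Subscriber/Customer) with the 2020 labels
trips <- tibble(
  ride_id = c("a1", "a2", "a3", "a4"),
  member_casual = c("Subscriber", "Customer", "member", "casual")
)

# Map the 2019 labels onto the 2020 vocabulary; values with no match are left unchanged
trips <- trips %>%
  mutate(member_casual = recode(member_casual,
                                "Subscriber" = "member",
                                "Customer" = "casual"))

table(trips$member_casual)
```

Because unmatched values pass through untouched, the same call is safe to run on data that already mixes both labeling schemes.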


The column names of each file were compared, and those of q1_2019 were renamed to match those of q1_2020 so that both files would merge consistently:


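# Rename the 2019 columns to the 2020 schema; usertype holds the
# Subscriber/Customer labels that become member_casual
q1_2019 <- q1_2019 %>%
  rename(ride_id = trip_id,
         rideable_type = bikeid,
         started_at = start_time,
         ended_at = end_time,
         start_station_name = from_station_name,
         end_station_name = to_station_name,
         start_station_id = from_station_id,
         end_station_id = to_station_id,
         member_casual = usertype) %>%
  # ride_id and rideable_type are character in the 2020 file, so convert before stacking
  mutate(ride_id = as.character(ride_id),
         rideable_type = as.character(rideable_type))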

all_trips <- bind_rows(q1_2019, q1_2020)  
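Matching names alone is not enough for bind_rows(): each shared column must also have a compatible type. The sketch below uses made-up stand-ins (q1_2019_demo and q1_2020_demo are hypothetical). It assumes, as in the raw Divvy files, that the 2019 IDs are read as numbers while the 2020 ride_id and rideable_type are read as text:

```r
library(dplyr)

# Made-up stand-ins: names already match, but the 2019 columns are numeric
q1_2019_demo <- tibble(ride_id = c(1001, 1002), rideable_type = c(2167, 4386))
q1_2020_demo <- tibble(ride_id = c("A1B2", "C3D4"),
                       rideable_type = c("docked_bike", "docked_bike"))

# Compare column classes side by side before stacking
sapply(q1_2019_demo, class)
sapply(q1_2020_demo, class)

# bind_rows() refuses to combine a double column with a character column
result <- tryCatch(bind_rows(q1_2019_demo, q1_2020_demo),
                   error = function(e) "type mismatch")

# Converting the 2019 columns to character lets the stack succeed
combined <- q1_2019_demo %>%
  mutate(ride_id = as.character(ride_id),
         rideable_type = as.character(rideable_type)) %>%
  bind_rows(q1_2020_demo)
```

Comparing the sapply(..., class) output for both frames is a quick way to spot such mismatches before merging.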


The following columns were removed because they were not relevant to the analysis objectives:

  • start_lat
  • start_lng
  • end_lat
  • end_lng
  • birthyear
  • gender
  • tripduration


all_trips <- all_trips %>% 
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))

Because the tripduration field is no longer included starting with the 2020 Q1 data, a new calculated field named ride_length was created by computing the difference between the ended_at and started_at timestamps:


# Force seconds; otherwise difftime() chooses its units automatically
all_trips$ride_length <- as.numeric(difftime(all_trips$ended_at, all_trips$started_at, units = "secs"))

The calculated field is measured in seconds, so a new field named ride_length_mins was created to express ride length in minutes for the study.


# Add ride length in whole minutes; added to all_trips so the cleaned copy inherits it
all_trips <- all_trips %>% 
  mutate(ride_length_mins = round(ride_length / 60))
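The seconds assumption only holds if difftime() is told which unit to use. With its default units = "auto", it picks the largest unit in which every absolute difference exceeds one. A small sketch with two made-up rides shows why pinning units = "secs" matters:

```r
# Two made-up rides, each longer than one minute
start <- as.POSIXct(c("2020-01-01 10:00:00", "2020-01-01 10:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2020-01-01 10:05:00", "2020-01-01 10:30:00"), tz = "UTC")

# With units = "auto", every difference exceeds one minute, so R reports minutes
auto_diff <- difftime(end, start)
units(auto_diff)  # "mins"

# Pinning the unit keeps the seconds-based logic downstream reliable
ride_length <- as.numeric(difftime(end, start, units = "secs"))
ride_length_mins <- ride_length / 60
```

On the full dataset, the negative and sub-minute rides happen to keep "auto" in seconds. Once those rows are filtered out, the same call could silently switch to minutes, which would break a threshold such as ride_length < 60.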

Upon further inspection, certain records were found to have negative ride lengths or to start at the “HQ QR” station, which is associated with bikes being temporarily removed from circulation for quality control. These records were filtered out to improve the reliability of the dataset. Records with ride lengths under one minute were also removed, as they were likely erroneous. The result was saved as a new data frame, all_trips_v2, so that the original data was preserved:


# Drop quality-control trips from "HQ QR" and rides shorter than 60 seconds (including negative ones)
all_trips_v2 <- all_trips %>%
  filter(start_station_name != "HQ QR", ride_length >= 60)
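A small sketch of this filtering step on made-up rows compares the base-R indexing form with dplyr::filter(). The two agree on ordinary rows but differ if a start station is ever missing (NA). Base indexing returns a row of NAs, while filter() drops the row:

```r
library(dplyr)

# Made-up rides: a quality-control trip, a negative ride, a sub-minute ride,
# a valid ride, and a ride with a missing start station
trips <- data.frame(
  start_station_name = c("HQ QR", "Station A", "Station B", "Station C", NA),
  ride_length = c(600, -120, 45, 900, 700),
  stringsAsFactors = FALSE
)

# Base-R indexing: the NA condition on the last row yields a row of NAs
base_result <- trips[!(trips$start_station_name == "HQ QR" | trips$ride_length < 60), ]

# dplyr::filter(): rows whose condition evaluates to NA are dropped
dplyr_result <- trips %>%
  filter(start_station_name != "HQ QR", ride_length >= 60)
```

Whether the NA case matters depends on the data, but filter() avoids carrying an all-NA row into later summaries.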


This finalized data frame was then used as the foundation for subsequent analysis.

Analyze

RStudio was used as the primary tool for data analysis, leading to actionable insights. With the business task centered on understanding how casual riders and annual members use Cyclistic bikes differently, a structured analytical approach was applied to ensure that all steps supported this objective. The primary goal was to extract meaningful insights from the data that could directly inform a marketing strategy aimed at converting casual riders into annual members.

Because the data frame covers only the first quarter (January to March) of 2019 and 2020, the analysis was conducted at a monthly level. Weekly trends were explored first, but they introduced excessive short-term variability, such as fluctuations caused by weather patterns or holiday schedules, which made it difficult to extract meaningful insights. Monthly aggregation was therefore selected as the most effective way to draw comparisons and identify trends relevant to the business objective of converting casual riders into annual members.

This approach was chosen for several reasons:

  1. Reduced Visual Noise: Weekly data introduced volatility and visual clutter, obscuring broader behavioral trends between rider types.
  2. Improved Comparability: Monthly summaries provided more balanced and interpretable comparisons, especially given uneven weekly data volumes.
  3. Alignment with Business Cycles: Marketing initiatives and performance tracking are often executed on a monthly basis, making this level of granularity more applicable and actionable for stakeholders.
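Because all_trips_v2 stacks Q1 2019 and Q1 2020, grouping by month() alone combines each calendar month across both years. A small sketch on made-up timestamps shows the difference between that pooled view and a per-year view, assuming lubridate is available:

```r
library(dplyr)
library(lubridate)

# Made-up ride start times from January of both years
rides <- tibble(started_at = as.POSIXct(c("2019-01-05 08:00:00",
                                          "2019-01-20 17:30:00",
                                          "2020-01-07 09:15:00"), tz = "UTC"))

# month() alone pools January 2019 with January 2020
pooled <- rides %>%
  count(month = month(started_at, label = TRUE))

# Adding year() keeps the two quarters apart if a year-over-year view is wanted
by_year <- rides %>%
  count(year = year(started_at), month = month(started_at, label = TRUE))
```

The pooled view matches the monthly overview used here. The per-year grouping is only needed if the two quarters should be compared with each other.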

Using this approach, the following key metrics were computed to support data visualization and insight generation:

  • Total number of rides per rider type
  • Average duration of rides per rider type
  • Total duration of rides per rider type
  • Proportional distribution of rider types
  • Top 10 most frequently used start stations

To organize these analyses, three summary data frames were created from the cleaned dataset (all_trips_v2):


Monthly ride summary per rider type:

library(lubridate)  # month()

q_summary <- all_trips_v2 %>%
  mutate(month = month(started_at, label = TRUE)) %>%
  group_by(member_casual, month) %>%
  summarize(
    number_of_rides = n(),
    average_duration = mean(ride_length_mins),
    total_duration = sum(ride_length_mins),
    .groups = "drop") %>%
  arrange(member_casual, month)


Top 10 most popular start stations:

top_stations <- all_trips_v2 %>%
  group_by(start_station_name) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  rename(station = start_station_name) %>%
  head(10)
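Because members account for most rides, top_stations largely reflects member behavior. The sketch below, on made-up rows, applies the same ranking only to casual riders, the group the marketing strategy targets:

```r
library(dplyr)

# Made-up trips; in practice this would be all_trips_v2
trips <- tibble(
  member_casual = c("member", "member", "casual", "casual", "casual", "member"),
  start_station_name = c("Station A", "Station A", "Station B",
                         "Station B", "Station A", "Station C")
)

# Rank start stations using casual riders' trips only
top_casual_stations <- trips %>%
  filter(member_casual == "casual") %>%
  count(start_station_name, sort = TRUE, name = "count") %>%
  rename(station = start_station_name) %>%
  slice_head(n = 10)
```

Here Station A tops the overall ranking, but Station B ranks first among casual riders, so the two lists can point to different locations.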


Ratio of casual vs. member riders:

rider_ratio <- all_trips_v2 %>%
  group_by(member_casual) %>%
  summarize(total = n())
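rider_ratio holds raw counts, so the percentages shown on the pie chart come from dividing each count by the total. A sketch with made-up counts (rider_ratio_demo is hypothetical):

```r
library(dplyr)

# Made-up counts standing in for rider_ratio
rider_ratio_demo <- tibble(member_casual = c("casual", "member"),
                           total = c(90L, 910L))

# Convert raw counts into shares of all rides, rounded to whole percentages
rider_ratio_demo <- rider_ratio_demo %>%
  mutate(share_pct = round(100 * total / sum(total)))
```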


These structured aggregations formed the foundation for the visualizations and interpretations presented in the subsequent phase.

Share

With the data fully prepared and analyzed, the next step was to effectively communicate the findings in a way that supports business decision-making. This phase focuses on translating raw metrics into clear, compelling visualizations that highlight key behavioral differences between casual riders and annual members. The following visuals were created using the ggplot2 library.

1. Distribution of Riders per Rider Type



# Centre of each slice for the count labels (member is stacked first, casual on top)
rider_ratio_2 <- rider_ratio %>%
  arrange(desc(member_casual)) %>%
  mutate(pos = cumsum(total) - total / 2)

ggplot(rider_ratio, aes(x = '', y = total, fill = member_casual)) +
  geom_col(width = 1, color = 'white') +
  coord_polar(theta = 'y') +
  geom_text(aes(label = paste0(round(total / sum(total) * 100), "%")),
            position = position_stack(vjust = 0.5), size = 5.5) +
  labs(title = 'Distribution of Riders per Rider Type',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  theme(axis.text = element_text(size = 15), axis.ticks = element_blank(),
        axis.title = element_blank(), panel.background = element_rect(fill = 'white')) +
  scale_y_continuous(breaks = rider_ratio_2$pos, labels = scales::comma(rider_ratio_2$total))

Key Insight:

  • The overwhelming majority of rides were taken by annual members, indicating that most of the system's usage already comes from long-term customers.
  • However, casual riders, while a small segment, may represent a growth opportunity if they can be effectively converted into members.


2. Number of Rides per Rider Type



ggplot(q_summary, aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_col(position = 'dodge') +
  # Label each bar with its count, dodged to sit above the matching bar
  geom_text(aes(label = scales::comma(number_of_rides)),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  labs(title = "Number of Rides per Rider Type", x = 'Month of Year', y = 'Number of Rides',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  scale_y_continuous(labels = scales::comma)

Key Insight:

  • Member ride volume is consistently high and steady across all three months.
  • Casual ride volume nearly tripled from February to March, suggesting seasonal patterns in casual usage (likely influenced by weather).
  • March shows a significant uptick in casual ridership, which may indicate higher interest or tourism in spring.
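Month-over-month changes such as the February-to-March jump in casual rides can be computed per rider type with dplyr's lag(). The sketch below uses made-up counts shaped like q_summary. A 171% increase means March is 2.71 times February, which is close to triple:

```r
library(dplyr)

# Made-up monthly counts in the same shape as q_summary (rows already in month order)
monthly <- tibble(
  member_casual = rep(c("casual", "member"), each = 3),
  month = rep(c("Jan", "Feb", "Mar"), times = 2),
  number_of_rides = c(100, 100, 271, 1000, 1000, 1000)
)

# Percentage change from the previous month, computed within each rider type
monthly <- monthly %>%
  group_by(member_casual) %>%
  mutate(pct_change = 100 * (number_of_rides / lag(number_of_rides) - 1)) %>%
  ungroup()
```

The first month of each rider type has no previous value, so its pct_change is NA.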


3. Average Duration of Rides per Rider Type



ggplot(q_summary, aes(x = month, y = average_duration, fill = member_casual)) +
  geom_col(position = 'dodge') +
  # Label each bar with its rounded average, dodged to sit above the matching bar
  geom_text(aes(label = round(average_duration)),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  labs(title = 'Average Duration of Rides per Rider Type', x = 'Month of Year',
       y = 'Average Duration (Minutes)',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  scale_y_continuous(labels = scales::comma)

Key Insight:

  • Casual riders consistently have much longer average ride durations than members, reaching more than 9× the member average in February.
  • Member ride durations are short and stable, suggesting utility-based or commuting use.
  • The sharp drop in casual duration from February to March (135 to 64 mins) might reflect a shift from leisure to shorter, possibly more practical rides as temperatures improve.
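The "9×" comparison is a per-month ratio of the two averages. A sketch on made-up values shows how that ratio could be computed from a q_summary-shaped table with tidyr's pivot_wider():

```r
library(dplyr)
library(tidyr)

# Made-up monthly averages (minutes) in the same shape as q_summary
avg_demo <- tibble(
  member_casual = rep(c("casual", "member"), each = 2),
  month = rep(c("Feb", "Mar"), times = 2),
  average_duration = c(90, 40, 10, 10)
)

# One row per month, one column per rider type, then the casual-to-member ratio
duration_ratio <- avg_demo %>%
  pivot_wider(id_cols = month, names_from = member_casual,
              values_from = average_duration) %>%
  mutate(ratio = casual / member)
```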


4. Total Duration of Rides per Rider Type



ggplot(q_summary, aes(x = month, y = total_duration, fill = member_casual)) +
  geom_col(position = 'dodge') +
  # Label each bar with its rounded total, dodged to sit above the matching bar
  geom_text(aes(label = scales::comma(round(total_duration))),
            position = position_dodge(width = 0.9), vjust = -0.3, size = 3.5) +
  labs(title = 'Total Duration of Rides per Rider Type', x = 'Month of Year',
       y = 'Total Duration (Minutes)',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  scale_y_continuous(labels = scales::comma)

Key Insight:

  • Despite accounting for only 9% of total rides, casual riders contribute a large share of total ride time. In March, their total ride duration exceeded 46% of the members' total.
  • This supports the idea that casual users, while fewer, are intensive users in terms of time spent per trip, and could benefit from longer-term pricing models.
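Comparing a group's share of rides with its share of ride time makes this contrast explicit. A sketch on made-up trip-level data, assuming a ride_length_mins column as in all_trips_v2:

```r
library(dplyr)

# Made-up trips: one long casual ride and nine short member rides
trips <- tibble(
  member_casual = c("casual", rep("member", 9)),
  ride_length_mins = c(60, rep(10, 9))
)

# Share of rides and share of total ride time per rider type
shares <- trips %>%
  group_by(member_casual) %>%
  summarize(rides = n(), minutes = sum(ride_length_mins), .groups = "drop") %>%
  mutate(ride_share = rides / sum(rides),
         time_share = minutes / sum(minutes))
```

In this toy example, casual riders take 10% of rides but account for 40% of the minutes ridden.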


Act

Business Task (Recap)

To analyze how casual riders and annual members use Cyclistic bikes differently in order to develop a targeted marketing strategy aimed at converting casual riders into annual members.

Each recommendation below directly addresses the insights uncovered in the previous phase and supports the business goal of increasing annual memberships:


1. Launch Seasonal Promotions at the Start of Spring

Insight: Casual ridership surged by 171% from February to March.
Action:

  • Introduce limited-time spring offers, such as “March Membership Trial” or “3 Rides, Get 1 Month Free”.
  • Capitalize on the seasonal increase in interest when casual riders are already increasing usage.

Outcome: Converts riders when they’re most engaged, increasing the likelihood of trial-to-membership transitions.


2. Offer Value-Based Memberships for Long-Duration Riders

Insight: Casual riders spend significantly more time per ride than members (avg. 64–135 mins vs. 13–14 mins).
Action:

Introduce or highlight a “Leisure Rider” membership plan with features like:

  • Discounted long rides
  • Flexible usage hours
  • Pay-as-you-go caps for high-duration riders

Outcome: Addresses the specific needs of casual users who prefer longer, leisure-focused rides and incentivizes them to commit.


3. Geo-Target Digital Ads and QR Promotions at Top Start Stations

Insight: The most popular start stations are located in high-traffic downtown areas (e.g., Canal St, Clinton St, Michigan Ave).
Action:

  • Place station-specific QR codes with messages like “Riding often? Save with a membership!”
  • Run geo-targeted digital ads and push notifications near these hubs via Cyclistic’s app or partner apps.

Outcome: Delivers conversion messaging where casual riders physically begin their journey, increasing relevance and engagement.


4. Educate on Cost Savings Using Personalized Ride History

Insight: Casual riders take fewer rides overall but ride for much longer each time, making them ideal candidates for personalized nudges.
Action:

  • Use past ride data to generate custom emails or in-app prompts:
    • “You spent 120+ minutes riding this month. Here’s how much you could’ve saved with a membership!”
  • Visualize ride history vs. membership costs.

Outcome: Converts value-conscious casual riders by making the cost advantage clear through personalized insights.
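A sketch of the arithmetic behind such a prompt, assuming a simple pricing model in which each casual trip costs a flat single-ride price and a membership costs a flat annual fee. The function name estimate_savings() and all prices are hypothetical placeholders, not Cyclistic rates, and the model ignores any duration-based fees:

```r
# Hypothetical helper: every price passed in is a placeholder, not a Cyclistic rate
estimate_savings <- function(trips_this_month, single_ride_price, annual_fee) {
  casual_cost <- trips_this_month * single_ride_price
  member_cost <- annual_fee / 12      # spread the annual fee evenly across months
  max(casual_cost - member_cost, 0)   # report zero when membership would not save money
}

estimate_savings(trips_this_month = 10, single_ride_price = 3, annual_fee = 120)
```

With these placeholder prices, ten trips in a month cost 30 as single rides against 10 for a month of membership, so the prompt would show a saving of 20.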


5. Use Monthly Trends for Campaign Planning and Member Retention

Insight: Member usage is consistent throughout Q1, while casual usage is more seasonal.
Action:

  • Align the marketing calendar to target casual riders in Q2–Q3 (when interest is likely to peak, given the rising March trend) and focus on member retention strategies (rewards, referral bonuses) during off-peak months.

Outcome: Ensures marketing efforts are data-informed and better timed to maximize engagement.

Conclusion

This case study demonstrates the power of data-driven insights in guiding strategic marketing decisions. By focusing on behavioral differences between rider types, Cyclistic can design more effective campaigns and product offerings to grow its base of annual members. The analysis, grounded in real data and business context, suggests that converting even a fraction of casual riders could have a significant impact on long-term growth and profitability.