The Cyclistic Bike Share Analysis Case Study is the capstone project of choice for the Google Data Analytics Course. The purpose of this study is to analyze how different customer types—casual riders and annual members—use the Cyclistic bike-share service in Chicago. Throughout this study, key behavioral patterns that distinguish these two groups will be uncovered by applying the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. The ultimate goal is to generate actionable insights that will guide the marketing team in developing a targeted strategy to convert casual riders into long-term, profitable annual members. This analysis will inform future decision-making and support the company’s broader objective of sustainable growth through increased membership retention.
In 2016, Cyclistic, a fictional company created for this study, launched a successful bike-share program. Since its inception, the service has expanded to a fleet of 5,824 geotracked bicycles connected to a network of 692 docking stations across Chicago. Users can unlock bikes from one station and return them to any other station in the system at any time.
Historically, the company’s marketing strategy focused on building general awareness and appealing to a broad range of consumer segments. A key factor in this success has been the flexibility of its pricing plans, which include single-ride passes, full-day passes, and annual memberships. Riders who purchase single-ride or full-day passes are classified as casual riders, while those with annual memberships are considered members.
Financial analysis has shown that annual members are significantly more profitable than casual riders. Although flexible pricing attracts a wider customer base, future growth is expected to depend on increasing the number of annual memberships. Rather than focusing efforts on acquiring entirely new customers, there is a strategic opportunity to convert existing casual riders into members, as they are already familiar with the service and have chosen it for their transportation needs.
To support this conversion strategy, the team must first gain a deeper understanding of how annual members and casual riders differ, the factors that might influence casual riders to purchase a membership, and the role digital media could play in marketing efforts. Historical bike trip data will be analyzed to uncover trends and inform targeted marketing strategies.
The business task is to analyze historical trip data from Cyclistic’s bike-share program to identify how annual members and casual riders use the service differently. The goal is to uncover usage patterns and insights that can inform the development of a data-driven marketing strategy aimed at converting casual riders into annual members, thereby increasing customer retention and long-term profitability for the company.
The following questions will guide the future marketing program:
1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?
As part of the data preparation process, historical bike trip data was sourced and organized to support the analysis. The datasets Divvy 2019 Q1 and Divvy 2020 Q1 were used exclusively for this study, as they are compatible with Posit’s RStudio and remain within the platform’s free plan memory limitations. These datasets are publicly available and were originally provided by Motivate International Inc. under an open data license. While the datasets refer to “Divvy” instead of “Cyclistic” due to the fictional nature of the case study, they are appropriate and sufficient for addressing the business questions.
The data includes relevant fields necessary to examine rider behavior but excludes personally identifiable information in compliance with privacy and licensing requirements. As such, it is not possible to link individual rides to specific users or determine whether casual riders reside in the Cyclistic service area. However, the available information is adequate for identifying trends in usage between casual riders and annual members, which is the primary objective of the analysis.
During the data processing phase, both Microsoft Excel and Posit’s RStudio were utilized to clean and prepare the datasets for analysis. Initial data integrity checks were performed in Excel, where the data was examined for duplicates, missing values, and inaccuracies. The data was then formatted appropriately before being imported into RStudio for further processing.
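The integrity checks described above can also be reproduced in R. The following is a minimal sketch on a small made-up data frame: the `trips` object and all of its values are hypothetical and are not taken from the Divvy files. It counts duplicated ride IDs, missing values per column, and rides that end before they start.

```r
# Hypothetical sample; values are illustrative only
trips <- data.frame(
  ride_id    = c("A1", "A2", "A2", "A4"),
  started_at = as.POSIXct(c("2020-01-01 08:00:00", "2020-01-01 09:00:00",
                            "2020-01-01 09:00:00", "2020-01-02 10:00:00"),
                          tz = "UTC"),
  ended_at   = as.POSIXct(c("2020-01-01 08:15:00", "2020-01-01 09:30:00",
                            "2020-01-01 09:30:00", "2020-01-02 09:50:00"),
                          tz = "UTC"),
  start_station_name = c("Canal St", NA, NA, "Clinton St"),
  stringsAsFactors = FALSE
)

# Number of ride IDs that repeat an earlier row
n_duplicate_ids <- sum(duplicated(trips$ride_id))

# Count of missing values in each column
missing_per_col <- colSums(is.na(trips))

# Rides whose end time precedes their start time
n_negative_rides <- sum(trips$ended_at < trips$started_at)
```

Running the same three checks on the real data frames would give a quick cross-check of what was found in Excel.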
The datasets Divvy 2019 Q1 and Divvy 2020 Q1 were loaded into RStudio and assigned to the data frames q1_2019 and q1_2020, respectively. To ensure seamless integration, the column names and data types were compared and standardized before merging the two datasets into a unified data frame, all_trips. Additional data cleaning steps were taken to ensure relevance and accuracy.
library(tidyverse)  # readr, dplyr, ggplot2
library(lubridate)  # month()
q1_2019 <- read_csv("C:/Users/../Divvy_Trips_2019_Q1.csv")
q1_2020 <- read_csv("C:/Users/../Divvy_Trips_2020_Q1.csv")
In the member_casual column, inconsistent labels were identified:
These values were standardized so that all entries use either “member” or “casual” for consistency.
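One way to sketch this standardization step is shown here. The source does not list the inconsistent labels, so the values "Subscriber" and "Customer" are an assumption made for illustration, and `sample_trips` is a hypothetical stand-in for the real data.

```r
library(dplyr)

# Hypothetical sample; "Subscriber" and "Customer" are assumed legacy labels
sample_trips <- data.frame(
  ride_id = c("1", "2", "3", "4"),
  member_casual = c("Subscriber", "Customer", "member", "casual"),
  stringsAsFactors = FALSE
)

# Map the assumed legacy labels onto "member" / "casual";
# values that are already standard pass through unchanged
sample_trips <- sample_trips %>%
  mutate(member_casual = case_when(
    member_casual == "Subscriber" ~ "member",
    member_casual == "Customer"   ~ "casual",
    TRUE                          ~ member_casual
  ))

# Confirm that only the two standard labels remain
table(sample_trips$member_casual)
```

Checking the result with `table()` before and after the recode makes it easy to confirm that no unexpected label was left behind.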
q1_2019 <- q1_2019 %>%
  rename(ride_id = trip_id,
         rideable_type = bike_id,
         started_at = start_time,
         ended_at = end_time,
         start_station_name = from_station_name,
         end_station_name = to_station_name,
         start_station_id = from_station_id,
         end_station_id = to_station_id)
all_trips <- bind_rows(q1_2019, q1_2020)
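The type standardization mentioned earlier matters because `bind_rows()` refuses to stack columns that share a name but have incompatible types. The sketch below illustrates the idea on two tiny hypothetical data frames (`old_q` and `new_q`). The specific mismatch shown, numeric IDs in one file and character IDs in the other, is an assumption for illustration and not a statement about the actual files.

```r
library(dplyr)

# Hypothetical stand-ins for the two quarterly files
old_q <- data.frame(ride_id = c(101, 102), rideable_type = c(7, 9))
new_q <- data.frame(ride_id = c("A1", "B2"),
                    rideable_type = c("docked_bike", "docked_bike"),
                    stringsAsFactors = FALSE)

# Compare column types side by side before merging
type_check <- data.frame(
  column = names(old_q),
  old    = sapply(old_q, class),
  new    = sapply(new_q[names(old_q)], class)
)

# Convert the mismatched columns to character so bind_rows() can stack them
old_q <- old_q %>%
  mutate(across(c(ride_id, rideable_type), as.character))

combined <- bind_rows(old_q, new_q)
```

Inspecting `type_check` first shows exactly which columns need conversion, so only those columns are touched.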
The following columns were removed due to their lack of pertinence to the analysis objectives:
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))
# Ride length in seconds
all_trips$ride_length <- as.numeric(difftime(all_trips$ended_at, all_trips$started_at, units = "secs"))
# Drop rides starting at the "HQ QR" station and rides shorter than 60 seconds
all_trips_v2 <- all_trips %>%
  filter(start_station_name != "HQ QR", ride_length >= 60) %>%
  # Ride length in whole minutes
  mutate(ride_length_mins = round(ride_length / 60))
This finalized data frame was then used as the foundation for subsequent analysis.
RStudio was used as the primary tool for data analysis, leading to actionable insights. With the business task centered on understanding how casual riders and annual members use Cyclistic bikes differently, a structured analytical approach was applied to ensure that all steps supported this objective. The primary goal was to extract meaningful insights from the data that could directly inform a marketing strategy aimed at converting casual riders into annual members.
With the data frame covering only the first quarter of the year (January to March), the decision was made to conduct the analysis using a monthly overview. While weekly trends were initially explored, they introduced excessive short-term variability, such as fluctuations caused by weather patterns or holiday schedules. These inconsistencies made it difficult to extract meaningful insights. As a result, monthly aggregation was selected as the most effective approach to draw comparisons and identify trends relevant to the business objective of converting casual riders into annual members.
This approach was chosen for several reasons:
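Using this approach, the following key metrics were computed to support data visualization and insight generation: the number of rides, average ride duration, and total ride duration per rider type and month; the ten most popular start stations; and the split of rides between casual riders and members.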
To organize and facilitate these analyses, three sub-data frames were created using the cleaned dataset (all_trips_v2):
Monthly ride summary per rider type:
q_summary <- all_trips_v2 %>%
  # Month label (Jan, Feb, Mar) taken from the ride start time
  mutate(month = month(started_at, label = TRUE)) %>%
  group_by(member_casual, month) %>%
  summarize(
    number_of_rides = n(),
    average_duration = mean(ride_length_mins),
    total_duration = sum(ride_length_mins),
    .groups = "drop") %>%
  arrange(member_casual, month)
Top 10 most popular start stations:
top_stations <- all_trips_v2 %>%
  group_by(start_station_name) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  rename(station = start_station_name) %>%
  head(10)
Ratio of casual vs. member riders:
rider_ratio <- all_trips_v2 %>%
  group_by(member_casual) %>%
  summarize(total = n())
These structured aggregations formed the foundation for the visualizations and interpretations presented in the subsequent phase.
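As a small illustration of how a table shaped like rider_ratio turns into percentage shares, the sketch below uses a hypothetical data frame, `rider_ratio_example`. Its totals are invented for the example and are not the actual ride counts.

```r
# Hypothetical totals, chosen only to illustrate the calculation
rider_ratio_example <- data.frame(
  member_casual = c("casual", "member"),
  total = c(900, 9100)
)

# Share of all rides per rider type, as a whole-number percentage
rider_ratio_example$share_pct <-
  round(100 * rider_ratio_example$total / sum(rider_ratio_example$total))

rider_ratio_example
```

The same expression applied to the real rider_ratio table yields the percentages used to label the rider-type chart.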
With the data fully prepared and analyzed, the next step was to effectively communicate the findings in a way that supports business decision-making. This phase focuses on translating raw metrics into clear, compelling visualizations that highlight key behavioral differences between casual riders and annual members. The following visuals were created using the ggplot2 library.
# Assumed definition of rider_ratio_2: midpoint of each pie slice along the stacked axis
rider_ratio_2 <- rider_ratio %>%
  mutate(pos = rev(cumsum(rev(total))) - total / 2)
ggplot(rider_ratio, aes(x = '', y = total, fill = member_casual)) +
  geom_col(width = 1, color = 'white') +
  coord_polar(theta = 'y') +
  # Percentage share of each rider type, centered in its slice
  geom_text(aes(label = paste0(round(total / sum(total) * 100), "%")),
            position = position_stack(vjust = 0.5), size = 5.5) +
  labs(title = 'Distribution of Riders per Rider Type',
       caption = 'Analysis done on data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  theme(axis.text = element_text(size = 15), axis.ticks = element_blank(),
        axis.title = element_blank(), panel.background = element_rect(fill = 'white')) +
  scale_y_continuous(breaks = rider_ratio_2$pos, labels = scales::comma(rider_ratio$total))
ggplot(q_summary, aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_col(position = 'dodge') +
  # Label each bar; position_dodge() aligns the text with the dodged bars
  geom_text(aes(label = number_of_rides), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = "Number of Rides per Rider Type", x = 'Month of Year', y = 'Number of Rides',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  scale_y_continuous(labels = scales::comma)
ggplot(q_summary, aes(x = month, y = average_duration, fill = member_casual)) +
  geom_col(position = 'dodge') +
  geom_text(aes(label = round(average_duration)), position = position_dodge(width = 0.9), vjust = -0.5) +
  labs(title = 'Average Duration of Rides per Rider Type', x = 'Month of Year', y = 'Average Duration (Minutes)',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  scale_y_continuous(labels = scales::comma)
ggplot(q_summary, aes(x = month, y = total_duration, fill = member_casual)) +
  geom_col(position = 'dodge') +
  geom_text(aes(label = round(total_duration)), position = position_dodge(width = 0.9), vjust = -0.5, size = 3.5) +
  labs(title = 'Total Duration of Rides per Rider Type', x = 'Month of Year', y = 'Total Duration (Minutes)',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Rider Type') +
  scale_y_continuous(labels = scales::comma)
# Fix the station order so the bars follow the ranking instead of alphabetical order
top_stations <- top_stations %>%
  mutate(station = factor(station, levels = station))
ggplot(top_stations, aes(x = count, y = station, fill = station)) +
  geom_col() +
  labs(title = 'Top 10 Most Popular Start Stations', x = 'Count', y = 'Station Name',
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"',
       fill = 'Station Name') +
  geom_text(aes(label = count), vjust = 0.5, hjust = 1.2, size = 4.5) +
  theme(panel.background = element_rect(fill = 'white'))
| Category | Casual Riders | Members |
|---|---|---|
| Ride Count | Small share (9%) | Dominant (91%) |
| Ride Duration (Avg) | Very long (~2+ hrs avg. Jan/Feb) | Very short (~13–14 mins consistently) |
| Ride Duration (Total) | Growing impact, esp. in March | Highest total time overall |
| Seasonal Behavior | Major spike in March | Stable across months |
| Start Station Use | Likely overlap with member zones downtown | Concentrated in central high-traffic areas |
The business task was to analyze how casual riders and annual members use Cyclistic bikes differently in order to develop a targeted marketing strategy aimed at converting casual riders into annual members.
Each recommendation below directly addresses the insights uncovered in the previous phase and supports the business goal of increasing annual memberships:
1. Launch Seasonal Promotions at the Start of Spring
Insight: Casual ridership surged by 171% from February to March.
Action:
Outcome: Converts riders when they’re most engaged, increasing the likelihood of trial-to-membership transitions.
2. Offer Value-Based Memberships for Long-Duration Riders
Insight: Casual riders spend significantly more time per ride than members (avg. 64–135 mins vs. 13–14 mins).
Action:
Introduce or highlight a “Leisure Rider” membership plan with features like:
Outcome: Addresses the specific needs of casual users who prefer longer, leisure-focused rides and incentivizes them to commit.
3. Geo-Target Digital Ads and QR Promotions at Top Start Stations
Insight: The most popular start stations are located in high-traffic downtown areas (e.g., Canal St, Clinton St, Michigan Ave).
Action:
Outcome: Delivers conversion messaging where casual riders physically begin their journey, increasing relevance and engagement.
4. Educate on Cost Savings Using Personalized Ride History
Insight: Casual riders use the service less often but for longer durations, making them ideal candidates for personalized nudges.
Action:
Outcome: Converts value-conscious casual riders by making the cost advantage clear through personalized insights.
5. Use Monthly Trends for Campaign Planning and Member Retention
Insight: Member usage is consistent throughout Q1, while casual usage is more seasonal.
Action:
Outcome: Ensures marketing efforts are data-informed and better timed to maximize engagement.
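The surge figure quoted in the first recommendation is a month-over-month percent change. As a minimal sketch of that arithmetic, the helper `pct_change()` below is a hypothetical function written for this example, and the ride counts are invented placeholders, not the actual February and March totals.

```r
# Month-over-month percent change: (current - previous) / previous * 100
pct_change <- function(previous, current) {
  (current - previous) / previous * 100
}

# Hypothetical ride counts, chosen only for illustration
feb_rides <- 1000
mar_rides <- 2710

round(pct_change(feb_rides, mar_rides))  # 171
```

Applying the same function to consecutive rows of the monthly summary for each rider type would show how strongly casual and member usage respond to the change of season.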
This case study demonstrates the power of data-driven insights in guiding strategic marketing decisions. By focusing on behavioral differences between rider types, Cyclistic can design more effective campaigns and product offerings to grow its base of annual members. The analysis, grounded in real data and business context, shows that converting even a fraction of casual riders could have a significant impact on long-term growth and profitability.