About the Company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno, Cyclistic’s director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a solid opportunity to convert casual riders into members.

She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics.

Executive Summary

Cyclistic Case Study – Data-Driven Strategy Over Gut Instinct


The Business Task

This case study, completed as part of the Google Data Analytics Professional Certificate program, analyzes the most recent four months of Cyclistic’s ride data to address a core business question: how to convert casual riders into annual members.

While the executive team—backed by the conclusions of the financial analysts—and my boss, the Director of Marketing, are convinced that boosting annual memberships is the path to growth, I took the contrarian route: I let the data speak, and it told a different story.

The Analysis

My analysis aimed to prove what the numbers quietly scream: it’s the casual riders who bring in the money. While the dataset lacks personal identifiers like demographics or residency—which limits deeper user segmentation—the ride behavior alone provides strong evidence. It’s more than enough to guide strategic marketing for both rider types.

I mapped out critical metrics: ride frequency, duration, time-of-day patterns, weekday vs. weekend usage, and station hotspots. One thing became clear—casuals roam where the revenue flows.

Critical Observation:

Casuals pay per use (per minute of riding). The more they ride, the more revenue they generate.

Members pay a flat annual fee. They can ride as much as they want, and it doesn’t increase revenue.

Key Findings:

Total ride minutes (Casuals): 4,894,565
Total rides (Members): 713,073

Using a standard pricing model of $0.15/minute for casuals and $120/year for members (an estimated 1,747 active members), here’s the estimated gross revenue from January to April 2025:

Casuals: ~$734,000, and that figure climbs with every ride.
Members: ~$69,880, fixed regardless of usage.

Recommendations:

If the stakeholders are willing to accept what the data plainly shows, they have three clear options:

1. Two-fold shift: Prioritize casual riders while capping membership growth.

2. Create premium casual packages instead of forcing conversion.

3. Invest in understanding the casual base—because that’s where the money is.


____________________________________

Phase 1: ASK

Business Task Statement:

This analysis focuses on the first question: understanding the behavioral differences between Cyclistic’s casual riders and annual members. This forms the foundation for data-driven strategies that aim to increase rider engagement, grow subscription rates, and enhance operational efficiency. The findings will directly inform the work of Cyclistic’s marketing team, who rely on accurate behavioral insights to make impactful decisions. Ultimately, these insights will feed into the strategic decision-making of the Cyclistic executive team, who must approve any major shifts in direction.

Given this focus, we ask a critical question: How do the usage patterns of casual and member riders differ, and can those differences guide us in designing smarter strategies to either convert more casuals into members or find new ways to enhance revenue from existing rider segments?



Phase 2: PREPARE


In this stage, we assessed data integrity and structure, ensuring it was relevant, complete, and ready for analysis. We also documented key metadata and identified potential limitations in the dataset.

a) Data Location: Public Divvy bike-share dataset, downloaded from Divvy’s official data portal.

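b) Organization: Multiple CSV files, published by month for 2025 (older releases such as 2019 were published by quarter), each containing ride details: ride ID, bike type, start/end timestamps, start/end stations, lat-long, and user type (member or casual). The 2025 files have no trip duration column; ride duration is derived from the start and end timestamps.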

c) Data Credibility & Bias: The data comes from a real-world bike-share system operated by a reputable company, making it operationally reliable. However, it lacks personal identifiers such as user demographics or residency details, which limits the ability to analyze motivations or user segmentation beyond ride behavior.

Dataset Used for This Case Study:
Files: 202501-divvy_trip_data.csv to 202504-divvy_trip_data.csv

Reasons for Excluding Other Provided Datasets:

  1. The 2019 dataset uses a different schema compared to 2020. For example, 2019 includes columns like gender and birthyear, which are missing in 2020. It also classifies users as Customer and Subscriber instead of using the standardized casual and member labels. These structural inconsistencies make direct comparison and integration difficult without heavy pre-processing.

  2. The 2024 dataset, while complete, contained a high volume of data quality issues—hundreds of rows required deletion, which would compromise the integrity of the analysis.

  3. The 2025 datasets were chosen for their cleaner structure and minimal preprocessing requirements, making them more practical and reliable for this case study.

d) Privacy & Licensing: Anonymized dataset shared publicly with open license; no PII included, compliant with privacy standards.

e) Integrity Checks: Verified for valid lat-long ranges, no duplicated trip IDs, and consistent timestamp formats.

f) Data Problems: Missing user demographics, no residency info (tourist vs. local unknown), and no direct marketing exposure data—limiting deeper causal analysis.


Phase 3: PROCESS


We cleaned and transformed the data by removing duplicates, standardizing formats, checking logical consistency, and flagging anomalies. This step ensured a reliable dataset for accurate analysis.

a) Tools Chosen: R (with tidyverse packages) for robust data wrangling, cleaning, and analysis; it handles large datasets well and supports reproducible workflows.

b) Checking for Data Integrity and Cleaning Steps

  1. Ensured that all files share a matching schema[1] before merging[2] the raw CSV data using bind_rows()
library(readr)
library(here)

# --- Run Schema Check ---
raw_data_path <- "raw_2025"
all_files <- list.files(path = here(raw_data_path), pattern = "\\.csv$", full.names = TRUE)

if (length(all_files) == 0) {
  stop("No CSV files found in the specified raw data directory.")
}

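# Use the first file's column names (and their order) as the reference schema
first_schema <- colnames(read_csv(all_files[1], n_max = 1, show_col_types = FALSE))

# TRUE when a file's column names match the reference schema exactly
check_schema <- function(file) {
  cols <- colnames(read_csv(file, n_max = 1, show_col_types = FALSE))
  identical(cols, first_schema)
}

schema_results <- sapply(all_files, check_schema)
names(schema_results) <- basename(all_files)

# Save schema check results (create the output folder first if it is missing)
clean_data_path <- "clean_2025"
dir.create(here(clean_data_path), showWarnings = FALSE, recursive = TRUE)
saveRDS(schema_results, here(clean_data_path, "schema_check_results.rds"))

# Stop before merging if any file deviates from the reference schema
if (!all(schema_results)) {
  stop("Schema mismatch in: ", paste(names(schema_results)[!schema_results], collapse = ", "))
}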
## ## Schema Check Results
## ✅ All raw data files have compatible schemas.
library(dplyr)
source("2_merge_raw_files.R")
## Saved merged CSV locally to: D:/Capstone_Project/cyclistic_case_study/case_study_final/raw_2025/merged/merged_2025.csv 
## Saved RDS to: D:/Capstone_Project/cyclistic_case_study/case_study_final/raw_2025/merged/merged_2025.rds
  2. Removed duplicate[3] trip IDs with distinct() – trimmed stray spaces from ride_id and dropped duplicate IDs so every ride is unique.
source("step_01_remove_ride_id_duplicates.R")
## Step 01 complete: Removed 0 duplicate rows based on ride_id.
  3. Trimmed whitespace[4] – ran str_trim() on all character columns (e.g., station names, rideable_type) to remove stray spaces that break joins or groupings.
source("step_02_trim_whitespace.R")
## Step 02 complete: Trimmed whitespace in NA rows.
  4. Removed exact duplicates[5] – checked every column for full-row duplicates and kept only the first copy to avoid double-counting.
source("step_03_remove_exact_duplicates.R")
## Step 03 complete: Removed 0 exact duplicate rows.
  5. Standardized column names[6] (tolower(), underscores) and key categorical fields like member_casual and rideable_type – so “Member” and “member” are not treated as different values.
source("step_04_standardize_columns.R")
## Step 04 complete: Standardized columns and values. Affected 0 rows.
  6. Checked logical values[7] – created ride_length (ended_at – started_at) and flagged any durations ≤ 0 (impossible times) for inspection.
source("step_05_verify_logical_values.R")
## Step 05 complete: Found 0 rows with logical time issues. Saved to step_05_issue.csv
  7. Checked for NA values[8] with summary() – looked for blanks or NAs in essential fields (ride_id, member_casual, started_at, ended_at) and quarantined them without deleting.
source("step_06_missing_critical_fields_check.R")
## Step 06 complete: Found 0 rows with missing critical fields; saved to step_06_issue.csv
  8. Confirmed consistent date-time formats[9] via lubridate – parsed started_at and ended_at, ensured they’re valid and that start precedes end, and saved bad ones for review.
source("step_07_Date_time_check_update.R")
## Step 07 complete: Found 3 rows with date-time issues; saved to step_07_issue.csv
  9. Used filters to validate lat-long ranges[10] – checked start_lat, start_lng (and end coordinates, if used) to confirm they’re within ±90/±180, flagging any that fall outside real-world ranges.
source("step_08_lat_long_check.R")
## Step 08 complete: Found 0 rows with invalid lat/long; saved to step_08_issue.csv
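The step_01 to step_08 scripts are sourced but not reproduced in this report. The block below is a condensed sketch of what such checks might look like in dplyr; it is not the author's scripts. The helper names clean_and_flag() and out_of_range() are hypothetical, and the sketch assumes the 2025 Divvy column names (ride_id, rideable_type, started_at, ended_at, member_casual, start_lat, start_lng, end_lat, end_lng) with started_at/ended_at already parsed as date-times. Duplicates are dropped, while duration, missing-field, and coordinate problems are only flagged, in line with the quarantine approach described above.

```r
library(dplyr)

# Hypothetical helper: TRUE where a coordinate pair falls outside real-world ranges
out_of_range <- function(lat, lng) {
  (!is.na(lat) & abs(lat) > 90) | (!is.na(lng) & abs(lng) > 180)
}

# Hypothetical helper: condensed version of the cleaning checks listed above
clean_and_flag <- function(df) {
  df %>%
    # Trim stray spaces in every character column (ride_id, station names, ...)
    mutate(across(where(is.character), trimws)) %>%
    # Keep one row per ride_id, then drop any remaining full-row duplicates
    distinct(ride_id, .keep_all = TRUE) %>%
    distinct() %>%
    # Standardize key categorical values so "Member" and "member" match
    mutate(member_casual = tolower(member_casual),
           rideable_type = tolower(rideable_type)) %>%
    # Flag (do not delete) rows that need review
    mutate(
      ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
      flag_bad_duration = !is.na(ride_length) & ride_length <= 0,
      flag_missing = is.na(ride_id) | ride_id == "" |
        is.na(member_casual) | member_casual == "" |
        is.na(started_at) | is.na(ended_at),
      flag_bad_coords = out_of_range(start_lat, start_lng) |
        out_of_range(end_lat, end_lng)
    )
}

# Toy example: a padded duplicate ride_id, a negative duration, an impossible latitude
rides <- data.frame(
  ride_id       = c("A1", "A1 ", "B2", "C3"),
  rideable_type = c("classic_bike", "classic_bike", "Electric_bike", "classic_bike"),
  started_at    = as.POSIXct(c("2025-01-01 08:00:00", "2025-01-01 08:00:00",
                               "2025-01-02 09:00:00", "2025-01-03 10:00:00"), tz = "UTC"),
  ended_at      = as.POSIXct(c("2025-01-01 08:15:00", "2025-01-01 08:15:00",
                               "2025-01-02 08:50:00", "2025-01-03 10:20:00"), tz = "UTC"),
  member_casual = c("Member", "Member", "casual", "casual"),
  start_lat = c(41.90, 41.90, 41.88, 95.00),
  start_lng = c(-87.63, -87.63, -87.62, -87.60),
  end_lat   = c(41.91, 41.91, 41.89, 41.87),
  end_lng   = c(-87.64, -87.64, -87.61, -87.59)
)

cleaned <- clean_and_flag(rides)
```

Flagged rows could then be written out for review, as the step_05 to step_08 scripts do with their issue CSVs.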

c) Verification: Each cleaning step was verified by its own summary report, and all were collated into a final report table showing rows before/after, affected counts, and deletion status (shown below). Summary statistics (mean(), max(), table()) and visual checks (glimpse(), head()) were run before and after cleaning.

Summary of Cleaning Steps
Cleaning_Step Start_No_Rows End_No_Rows Affected_Rows Deleted
step_01_remove_ride_id_duplicates 960065 960065 0 N
step_02_trim_whitespace 960065 960065 NA N
step_03_remove_exact_duplicates 960065 960065 0 N
step_04_standardize_columns 960065 960065 0 N
step_05_logical_values_check 960065 960065 0 N
step_06_missing_critical_fields_check 960065 960065 0 N
step_07_date_time_check 960065 960065 3 N
step_08_lat_long_check 960065 960065 0 N

Table 3.1 - Summary of Cleaning Steps: This table lists the data cleaning process applied to the dataset. Across all eight steps, the row count remained constant at 960,065, so no records were deleted. Step 7 flagged 3 rows for date-time inconsistencies, which were kept rather than removed. Step 2 (trimming whitespace) did not record an affected-row count (shown as NA), and the remaining steps, including duplicate removal and the logical-value check, found no affected rows. Overall, the dataset remained intact, with only minimal data quality flags and no deletions.


d) Documentation: All steps were meticulously scripted in an R Markdown file, ensuring that the entire cleaning process remains transparent, reproducible, and easy to audit. Each step, from data loading to issue checks, was clearly annotated and executed in sequence, with outputs and summary tables included for verification. This very report is the final product of that documentation effort: a structured, shareable artifact that communicates not only the outcomes but also the logic and rigor behind every cleaning decision made along the way.


Phase 4: ANALYZE


In this stage, we analyzed usage trends to uncover real behavioral patterns between casual riders and members. Members leaned toward short, frequent weekday rides—classic commuter behavior—while casuals showed a preference for longer weekend trips.

Given that casuals ride on a pay-per-use basis, these patterns hint at a different kind of value. This challenges the assumption that more members automatically mean more profit, and raises the question: who’s really paying the bills?

Load Libraries

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(lubridate)

Load Cleaned Data

library(readr)
library(here)
# Project-relative path (via here()) instead of a machine-specific absolute path
data <- read_csv(here("clean_2025", "merged_2025.csv"), show_col_types = FALSE, progress = FALSE)

Add new column: ride_duration (minutes)

library(dplyr)
data <- data %>%
  mutate(ride_duration = as.numeric(difftime(ended_at, started_at, units = "mins")))

Compute mean_ride and max_ride

# Mean and max ride_duration
mean_ride <- mean(data$ride_duration, na.rm = TRUE)
max_ride <- max(data$ride_duration, na.rm = TRUE)

Stats Summary

Ride Duration Summary by Rider Type (durations in minutes)
member_casual Mean_Duration Median_Duration SD_Duration Min_Duration Max_Duration N_Rides
casual 19.82 9.40 80.22 0.02 1559.92 246992
member 10.88 7.48 26.03 0.02 1499.97 713073

Table 4.1 - Riders’ Stats: This summary shows that casual riders take longer trips on average (mean: 19.82 minutes) compared to members (mean: 10.88 minutes), with casuals also having a much wider spread in ride durations (SD: 80.22 vs. 26.03). Casual rides have a higher maximum duration, and their median ride time (9.40 minutes) is slightly above that of members (7.48 minutes), suggesting more variability and potentially more leisurely or exploratory usage.
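Table 4.1 appears without the code that builds it, and the chart chunks below read from an object named ride_duration_summary. A minimal sketch of one way to produce it, assuming `data` already holds the ride_duration column (in minutes) created above; summarise_durations() is a hypothetical helper name, and the toy data only illustrates the output shape.

```r
library(dplyr)

# Hypothetical helper: per-rider-type duration statistics with Table 4.1's column names
summarise_durations <- function(df) {
  df %>%
    group_by(member_casual) %>%
    summarise(
      Mean_Duration   = round(mean(ride_duration, na.rm = TRUE), 2),
      Median_Duration = round(median(ride_duration, na.rm = TRUE), 2),
      SD_Duration     = round(sd(ride_duration, na.rm = TRUE), 2),
      Min_Duration    = round(min(ride_duration, na.rm = TRUE), 2),
      Max_Duration    = round(max(ride_duration, na.rm = TRUE), 2),
      N_Rides         = n(),
      .groups = "drop"
    )
}

# Usage on the real data: ride_duration_summary <- summarise_durations(data)
toy <- data.frame(member_casual = c("casual", "casual", "member", "member", "member"),
                  ride_duration = c(10, 30, 5, 7, 9))
toy_summary <- summarise_durations(toy)
```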


library(ggplot2)

ggplot(ride_duration_summary, aes(x = member_casual, y = Mean_Duration, fill = member_casual)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = round(Mean_Duration, 1)), vjust = -0.5) + # Add text labels
  labs(
    title = "Average Ride Duration by Rider Type",
    x = "Rider Type",
    y = "Mean Duration (minutes)"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("casual" = "#FF9999", "member" = "#66CC99")) +
  theme(legend.position = "none")

Chart 4.1 - Average Ride Duration by Rider Type: This chart shows that casual riders spend nearly twice as long per ride (avg. ~20 mins) compared to members (~11 mins). This suggests members use bikes for quick, possibly utilitarian trips—like commuting—while casual riders likely use them for leisurely or exploratory purposes, consistent with tourist or weekend behavior. It hints at different motivations and use cases between the groups, which marketing can exploit.


library(ggplot2)

ggplot(ride_duration_summary, aes(x = member_casual, y = N_Rides, fill = member_casual)) +
  geom_bar(stat = "identity", width = 0.6) +
  geom_text(aes(label = scales::comma(N_Rides)), vjust = 1.5, color = "white", size = 5) +
  labs(title = "Total Number of Rides by Rider Type",
       x = "User Type",
       y = "Number of Rides") +
  theme_minimal() +
  scale_fill_manual(values = c("member" = "#1f77b4", "casual" = "#ff7f0e")) +
  theme(legend.position = "none")

Chart 4.2 - Total Number of Rides per Rider Type: This chart shows members take far more rides than casuals (713,073 vs. 246,992), emphasizing their loyalty and consistent platform use. However, member ride durations don’t affect revenue directly, since members pay a fixed annual fee. For casual riders, ride duration multiplied by the per-minute rate translates directly into revenue, making the length of each casual ride more financially significant. So, members drive steady value through commitment, while casuals drive revenue through ride frequency and duration.


ggplot(ride_duration_summary, aes(x = member_casual, y = Mean_Duration, fill = member_casual)) +
  geom_bar(stat = "identity", width = 0.6) +
  geom_text(aes(label = round(Mean_Duration, 2)), vjust = -0.5, size = 5) +
  labs(title = "Average Ride Duration by Rider Type",
       x = "User Type",
       y = "Average Ride Duration (minutes)") +
  theme_minimal() +
  scale_fill_manual(values = c("member" = "#1f77b4", "casual" = "#ff7f0e")) +
  theme(legend.position = "none")

Chart 4.3 - Average Ride Duration: This chart illustrates that casual riders have a higher average ride duration compared to members. Casuals likely take longer, leisurely trips—often tourists or occasional users—whereas members generally make shorter, more frequent rides. This difference reflects usage patterns but does not directly translate to revenue for members, who pay a fixed annual fee.




library(ggplot2)
library(dplyr)
library(here)

# Load the updated merged data (.rds) used by this chart
updated_data <- readRDS(here("clean_2025", "merged_2025_updated.rds"))

# Filter only casual riders and count top 10 start stations
top_stations <- updated_data %>%
  filter(member_casual == "casual", !is.na(start_station_name)) %>%
  count(start_station_name, sort = TRUE) %>%
  slice_max(n, n = 10)

# Plot horizontal bars; hjust places each count just inside the end of its bar
ggplot(top_stations, aes(x = reorder(start_station_name, n), y = n)) +
  geom_col(fill = "#2c7fb8") +
  geom_text(aes(label = n), hjust = 1.1, color = "white", fontface = "bold", size = 4) +
  coord_flip() +
  labs(
    title = "Top 10 Start Stations Used by Casual Riders",
    x = "Start Station",
    y = "Number of Rides"
  ) +
  theme_minimal()

Chart 4.4 - Casuals’ Hotspots: This chart shows the most frequent starting locations for casual riders. While often assumed to reflect tourist-heavy zones, the data invites a closer look—these areas could also signal under-tapped local demand or habitual short-trip users. Either way, they present key opportunities for targeted engagement strategies.




library(ggplot2)
library(dplyr)
library(lubridate)
library(here)

# Define the path to your updated merged data (.rds) file
updated_file_path <- here("clean_2025", "merged_2025_updated.rds")

# Load the updated data
updated_data <- readRDS(updated_file_path)

# Extract hour and member type
hourly_rides <- updated_data %>%
  mutate(hour = hour(started_at)) %>%
  group_by(hour, member_casual) %>%
  summarise(n = n(), .groups = 'drop')

# Calculate total rides for each member type
total_rides <- updated_data %>%
  group_by(member_casual) %>%
  summarise(total = n(), .groups = 'drop')

# Join total rides to hourly counts
hourly_percentages <- hourly_rides %>%
  left_join(total_rides, by = "member_casual") %>%
  mutate(percentage = (n / total) * 100)

ggplot(hourly_percentages %>% filter(!is.na(hour)),
       aes(y = factor(hour), x = percentage, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) + # Adjust bar width
  labs(
    title = "Hourly Ride Percentage by Rider Type",
    y = "Hour of Day",
    x = "Percentage of Total Rides",
    fill = "Rider Type"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("casual" = "#FF9999", "member" = "#66CC99")) +
  scale_y_discrete(limits = factor(0:23), expand = expansion(add = c(0.5, 0.5))) # Add padding to y-axis

Chart 4.5 - Hourly Distribution of Rides by Rider Type (Percentage of Total): This chart displays the percentage of each rider type’s total rides that occur within each hour of the day. The side-by-side (dodged) bars allow a direct comparison of the hourly usage patterns between casual and member riders.




library(dplyr)
library(lubridate)
library(ggplot2)
library(here)

# Define the path to your updated merged data (.rds) file
updated_file_path <- here("clean_2025", "merged_2025_updated.rds")

# Load the updated data
updated_data <- readRDS(updated_file_path)

# Extract day of the week and member type
daily_rides <- updated_data %>%
  mutate(day_of_week = wday(started_at, label = TRUE)) %>%
  group_by(day_of_week, member_casual) %>%
  summarise(n = n(), .groups = 'drop')

# Calculate total rides for each member type (if we haven't already)
total_rides <- updated_data %>%
  group_by(member_casual) %>%
  summarise(total = n(), .groups = 'drop')

# Join total rides to daily counts
daily_percentages <- daily_rides %>%
  left_join(total_rides, by = "member_casual") %>%
  mutate(percentage = (n / total) * 100)

# Arrange days in the correct order for plotting
daily_percentages <- daily_percentages %>%
  mutate(day_of_week = factor(day_of_week, levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")))

print(ggplot(daily_percentages %>% filter(!is.na(day_of_week)),
       aes(y = day_of_week, x = percentage, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(
    title = "Daily Distribution of Rides by Rider Type (Percentage of Total)",
    y = "Day of the Week",
    x = "Percentage of Total Rides",
    fill = "Rider Type"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("casual" = "#FF9999", "member" = "#66CC99"))
)

Chart 4.6 - Daily Ride Percentage: This chart shows the percentage of each rider type’s total rides that occur on each day of the week. The side-by-side bars for casual and member riders show how their usage differs across the week: casual riders take a higher share of their rides on weekends (Saturday and Sunday) than members, while members’ rides are concentrated on weekdays.




library(dplyr)
library(lubridate)
library(ggplot2)
library(here)

# Define the path to your updated merged data (.rds) file
updated_file_path <- here("clean_2025", "merged_2025_updated.rds")

# Load the updated data
updated_data <- readRDS(updated_file_path)

# Extract month and member type
monthly_rides <- updated_data %>%
  mutate(month = month(started_at, label = TRUE)) %>%
  group_by(month, member_casual) %>%
  summarise(n = n(), .groups = 'drop')

# Calculate total rides for each member type (if we haven't already)
total_rides <- updated_data %>%
  group_by(member_casual) %>%
  summarise(total = n(), .groups = 'drop')

# Join total rides to monthly counts
monthly_percentages <- monthly_rides %>%
  left_join(total_rides, by = "member_casual") %>%
  mutate(percentage = (n / total) * 100)

# Arrange months in order
monthly_percentages <- monthly_percentages %>%
  mutate(month = factor(month, levels = c("Jan", "Feb", "Mar", "Apr")))

print(ggplot(monthly_percentages, aes(y = month, x = percentage, fill = member_casual)) +
  geom_col(position = "dodge", width = 0.7) +
  labs(
    title = "Monthly Ride Percentage by Rider Type",
    y = "Month",
    x = "Percentage of Total Rides",
    fill = "Rider Type"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("casual" = "#FF9999", "member" = "#66CC99"))
)

Chart 4.7 - Monthly Ride Behavior by Percentage: This chart illustrates each rider type’s share of its total rides in each month from January through April. Because January and February are typically colder months in Chicago, both groups show a relatively lower percentage of rides during this period than in the warmer months of March and April. The side-by-side bars allow a comparison of each group’s monthly distribution.


🔍 Strategic Revenue Insights: Casual vs. Member Riders


This analysis challenges the assumption that increasing membership alone drives profitability and raises operational concerns around unrestricted membership growth.


I. Key Revenue Findings

Supporting points:

  1. Member trip duration doesn’t directly impact revenue (flat-rate model).
  2. Member revenue scales only with the number of subscribers.
  3. Casuals pay per ride—the more they ride, the more revenue.
  4. Revenue estimates derived from the ride data point to a higher overall revenue contribution from casual riders than from annual members.


🔹 Support for Point #4: “Casuals outperform members in revenue contribution”

Although the dataset does not include direct financial figures, we can still derive meaningful revenue estimates by analyzing rider behavior and usage statistics, then mapping these patterns onto widely accepted pricing models used by bike-share systems globally.

To estimate casual rider revenue, we do not need to know the exact number of users. Since casuals are charged per minute, we can simply multiply their total ride minutes by the standard per-minute rate.

In contrast, estimating member revenue requires an indirect approach. By analyzing ride frequency and behavior patterns, we can approximate the number of active members. This estimated population is then multiplied by the annual membership fee to calculate total revenue.
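The two estimation approaches can be written as small functions. This is a sketch under the document's stated pricing assumptions ($0.15 per minute, $120 per year); the function names are hypothetical, and member revenue is prorated by months (4 of 12 for January to April). The inputs are the casual ride minutes from Reference Table 1 and the 1,747-member estimate derived later in this section.

```r
# Hypothetical helper: casual revenue = total ride minutes x per-minute rate
estimate_casual_revenue <- function(total_minutes, rate_per_min = 0.15) {
  total_minutes * rate_per_min
}

# Hypothetical helper: member revenue = member count x annual fee, prorated to the period
estimate_member_revenue <- function(n_members, annual_fee = 120, period_months = 12) {
  n_members * annual_fee * period_months / 12
}

casual_jan_apr <- estimate_casual_revenue(4894565)                  # ~734,185
member_jan_apr <- estimate_member_revenue(1747, period_months = 4)  # 69,880
member_annual  <- estimate_member_revenue(1747)                     # 209,640
```

The casual estimate needs no rider count at all, while the member estimate is only as reliable as the member count fed into it.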


REFERENCE TABLES and CHARTS



Total Rides and Ride Minutes by Rider Type
member_casual Total_Rides Total_Minutes
casual 246992 4894565
member 713073 7755146

Reference Table 1: Total Rides and Total Ride Minutes by Rider Type


Member Rides Within 45 Minutes
Total_Rides Rides_Within_45min Percent_Within_45min
713073 705113 98.88

Reference Table 2 - Member Ride Behavior: This table shows that 98.88% of member rides last 45 minutes or less, highlighting how effectively members use the subscription benefit of unlimited rides of up to 45 minutes each.
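The percentage in Reference Table 2 is a simple threshold count. A sketch, assuming a numeric vector of member ride durations in minutes (for example, ride_duration filtered to member_casual == "member"); share_within() is a hypothetical helper, and NA durations are dropped before counting.

```r
# Hypothetical helper: count and percentage of rides at or under a time limit
share_within <- function(durations, limit_min = 45) {
  durations <- durations[!is.na(durations)]
  within <- sum(durations <= limit_min)
  c(total = length(durations),
    within = within,
    percent = round(100 * within / length(durations), 2))
}

# Toy check, then the table's own counts
toy <- share_within(c(10, 50, 30, NA))
table_pct <- round(100 * 705113 / 713073, 2)   # 98.88, as in Reference Table 2
```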


Reference Chart 1 – Hourly Distribution of Rides by Rider Type (Percentage of Total): This chart illustrates the hourly breakdown of ride activity by rider type. Member usage shows clear peaks between 7–9 AM and 5–7 PM, aligning with typical commuting hours. A secondary rise appears between 12–2 PM, likely reflecting midday errands or lunch breaks. These patterns reinforce the assumption that many members use the system as part of a structured daily routine.


Reference Chart 2 – Daily Ride Percentage: This chart highlights member ride distribution across the week. Member activity remains consistently high on weekdays, reflecting common commuting behavior. However, weekend usage is also notable—this may indicate continued use for personal or leisure trips, or that some members maintain non-traditional work schedules and commute on weekends as well.


Estimating the Number of Members

To estimate the number of subscribed members, we begin by analyzing observable behavioral patterns in the ride data.

Important Note:

The estimated number of members should not exceed the available fleet size of 5,824 bikes. If member demand surpasses this capacity, it would leave limited or no availability for casual riders—creating operational strain and reducing system accessibility.

1 Hourly Usage Patterns

How many times does a member ride per day?

Reference Chart 1 is consistent with the assumption that a typical member uses the system at least twice per day: once in the morning and once in the evening. This pattern aligns with common commuting routines and suggests that individual members frequently make multiple rides each day. There is also the possibility of more than two rides, as the system does not restrict usage; members may take additional trips during midday or for other errands.

2 Company Given Fact:

While many Cyclistic users ride for leisure, an estimated 30% use the bikes to commute to work on a daily basis. This benchmark—whether referring to 30% of total utilization or 30% of the fleet size (5,824 bikes)—helps anchor our assumptions about member behavior and system demand during peak hours.

3 Estimating Ride Frequency of Members

This inference allows us to estimate the probable number of unique members in the system. Given:

  • 713,073 total member rides
  • 120 days from Jan 1 to April 30
  • 2, 3, or 3.5 assumed rides per member per day

Ride Frequency/Day | Estimated Members
2 rides/day | 713,073 / (120 × 2) ≈ 2,971
3 rides/day | 713,073 / (120 × 3) ≈ 1,981
3.5 rides/day | 713,073 / (120 × 3.5) ≈ 1,698
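The scenario table can be reproduced in a few lines. The ride frequencies are the same assumptions listed above, not measured values, and results are rounded to the nearest whole member.

```r
member_rides  <- 713073          # total member rides, Jan 1 to Apr 30, 2025
days          <- 120             # days in the period
rides_per_day <- c(2, 3, 3.5)    # assumed rides per member per day

# Estimated members = total rides / (days x rides per member per day)
estimated_members <- round(member_rides / (days * rides_per_day))
names(estimated_members) <- paste(rides_per_day, "rides/day")
estimated_members
```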

4 Estimating Active Members from Fleet Utilization

If 30% of the fleet is regularly used by members for commuting—based on Cyclistic’s internal benchmark—then we estimate:

  • 30% of 5,824 bikes = 1,747 members

This estimated member base aligns closely with the calculated range based on actual ride behavior.

Given 713,073 total member rides from January 1 to April 30 (120 days), we calculate the average ride frequency per member:

\[ \text{Ride Frequency} = \frac{713{,}073}{120 \times 1{,}747} \approx \textbf{3.4 rides/day} \]

This suggests the average member rides about 3.4 times per day, which supports the observed commuting patterns. It also implies that many members are taking more than two rides daily—not just morning and evening commutes, but also midday trips or errands.


5 Estimated Revenue Comparison

With an estimated member count and known pricing structures, we can now compare revenue between rider types. Here, we use a normalized annual membership rate of $120.00 and a rate of $0.15 per minute for single (casual) rides, based on the average across Cyclistic’s three comparable cities.

Note: Casual rider revenue is scalable based on activity. Member revenue is fixed, regardless of usage.

This highlights a key strategic consideration: members ride more frequently and consume more fleet resources, yet generate less revenue per ride.

Revenue Comparison Summary: Casual vs. Member Riders
Category | Casual Riders | Member Riders
Pricing Benchmark | $0.15 per minute | $120/year (~$0.33/day)
Total Ride Minutes (Jan–Apr 2025) | 4,894,565 minutes | 7,755,146 minutes (no effect on revenue)
Estimated Revenue (Jan–Apr 2025) | 4,894,565 × $0.15 ≈ $734,000 | 1,747 × $120 × 4/12 ≈ $69,880
Annual Revenue Projection | Scales with usage; the Jan–Apr pace alone implies ~$2.2 million/year, before summer peaks | 1,747 × $120 ≈ $209,640/year (capped)
Fleet Constraint | Uses available bikes when not occupied by members | Max 5,824 bikes; high membership risks saturation
Cost-Revenue Disconnect | Revenue grows with usage | Members ride more, but revenue is fixed
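The claim that members generate less revenue per ride can be checked directly. A sketch using the same assumptions ($0.15 per minute; 1,747 members at $120 per year, prorated to four months) and the ride counts and minutes from Reference Table 1; these are estimates under the stated pricing model, not reported financials.

```r
# Estimated Jan-Apr 2025 revenue under the stated pricing assumptions
casual_revenue <- 4894565 * 0.15        # casual ride minutes x $0.15/min
member_revenue <- 1747 * 120 * 4 / 12   # estimated members x $120/year x 4/12

# Revenue per ride for each rider type
per_ride <- c(casual = casual_revenue / 246992,
              member = member_revenue / 713073)
round(per_ride, 2)
```

On these assumptions, a casual ride brings in roughly $2.97 versus about $0.10 for a member ride.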




💡 Notes on Revenue Estimation

  1. Day passes were excluded from this estimate. They represent a minority of users and do not significantly affect overall revenue patterns.

  2. The calculation focuses on traditional bikes. While e-bike usage increases the per-minute rate for casuals, the annual member fee for e-bike access rises more moderately, widening the revenue gap further.

  3. January–April is Chicago’s low season, so summer peaks would lift annual casual revenue above a straight-line extrapolation of the Jan–Apr figures; member revenue stays fixed either way.

  4. Even doubling the current subscriber base would not match the revenue generated by casual riders—the pay-per-ride model scales more directly with usage.

  5. To match casual rider revenue, Cyclistic would need over 17,000 annual members—an unrealistic target given the fleet constraint of 5,824 bikes.


II. Additional Insights

💡 Strategic Insight
“Casual riders operate under a pay-per-use model—revenue scales with activity.
Members are on a flat-rate model—revenue is fixed no matter how much they ride.
Converting high-usage casuals into members means trading scalable income for capped returns.”

🧭 Important Note on Member Value
While members may not maximize short-term revenue, they hold intangible value as brand ambassadors. Their visibility throughout the city increases Cyclistic’s presence, credibility, and social proof.
Memberships should be optimized—not eliminated—ensuring they align with fleet capacity and strategic priorities.


🚲 “Let rides—not loyalty—drive revenue”

A message for our walls—and our business model.




Phase 5: SHARE


In this phase, the findings were compiled into a well-structured, visually compelling report tailored for Cyclistic’s analytics team. With a focus on clarity and impact, the visualizations highlighted key behavioral differences between casual riders and annual members—such as ride durations, frequency, and peak usage times. Each element was designed to guide the viewer’s attention to insights that could inform strategic decisions, particularly those related to membership growth. The report maintained accessibility and polish to ensure it communicated effectively across stakeholders.

To ensure accessibility, this .Rmd report was designed with clear headings, simple language, alt text for images, and structured formatting so that all stakeholders—including those using assistive technologies—can easily navigate and understand the findings.

This completed case study is intended for presentation to the marketing director and, ultimately, for approval by Cyclistic’s executive team. It answers the central business question and sets the stage for Phase 6, where data-driven strategies can be implemented to optimize engagement, grow revenue, and enhance Cyclistic’s service offerings.


Phase 6: ACT

Recommendations

The analysis in this case study leads to a key conclusion: increasing annual membership alone is not a guaranteed path to profitability. Members contribute consistently to system usage. However, their flat-rate pricing and high ride frequency place a disproportionate load on the fleet without generating proportional revenue. In contrast, casual riders, who pay per minute, contribute significantly more revenue, with earnings that scale directly with usage.

This leads us to three strategic recommendations:

1. Two-Fold Action: Focus on Casual Riders / Cap Membership Growth

As a team, we have identified that casual riders generate more revenue per trip and place fewer long-term demands on system resources. We recommend prioritizing marketing efforts toward tourists, occasional riders, and event-driven users. At the same time, we should implement a cap on the number of annual memberships to safeguard system performance and ensure ride availability.

Overextending the fleet with locked-in members could lead to decreased service quality and a strained user experience.




2. Offer Premium Casual Packages Instead of Pushing Conversions

We propose the development of flexible, high-margin ride options that cater to the behavior of casual users. These may include multi-day passes, ride bundles, or locally tailored tourist packages. By maintaining the pay-per-use model, we preserve revenue scalability while meeting the needs of high-value casual riders—without requiring long-term commitment.



3. Invest in Understanding the Casual Base

To better support this strategy, we recommend initiatives that deepen our understanding of the casual segment. This could include optional in-app surveys or strategic partnerships with tourism boards to gain insights into user intent, residency status, and spending habits. Ultimately, we cannot design meaningful experiences—or profitable campaigns—if we don’t understand who we’re serving.

Let rides—not loyalty—drive revenue.






About the Author

Edgar has a background in entrepreneurship and business consulting, with diverse experience across infrastructure, data analytics, and organizational development. He is focused on applying data-driven insights to enhance strategy, performance, and impact in both public and private sector projects.

Footnotes


  1. Read the script: 1_schema_check.R

  2. Read the script: 2_merge_raw_files.R

  3. Read the script: step_01_remove_id_duplicates.R

  4. Read the script: step_02_trim_whitespace.R

  5. Read the script: step_03_remove_exact_duplicates.R

  6. Read the script: step_04_standardize_columns.R

  7. Read the script: step_05_verify_logical_values.R

  8. Read the script: step_06_missing_critical_fields_check.R

  9. Read the script: step_07_Date_time_check_update.R

  10. Read the script: step_08_lat_long_check.R