The primary objective of this project is to analyze Cyclistic’s historical bike trip data to identify, quantify, and understand the distinct behavioral patterns and usage differences between two core customer segments: Annual Members and Casual Riders. These data-driven insights will directly inform targeted marketing strategies aimed at converting high-value casual riders into dedicated annual members.
How do annual members and casual riders use Cyclistic bikes differently?
Lily Moreno: Director of Marketing and manager responsible for the development of promotional campaigns across digital channels.
Cyclistic Marketing Analytics Team: Data analysts responsible for collecting, analyzing, and reporting strategic data.
Cyclistic Executive Team: A detail-oriented leadership group that determines whether to approve the recommended marketing initiatives.
The analysis utilizes Cyclistic’s historical trip data covering the previous 12 months. The underlying public datasets have been generated and made available by Motivate International Inc. under an open data license.
To comply with data-privacy standards, all personally identifiable information (PII) about riders is omitted from the source data. As a result, rides cannot be linked to an individual's purchase history or credit card, and a rider's place of residence cannot be determined.
The raw CSV files are read from the local repository with a mapping function and combined into a single data frame.
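The import code itself is not shown in this report. A minimal sketch of one way to do it follows, assuming the files are CSVs sitting in a single folder. The helper name `read_trip_files()` and the two tiny demo files are hypothetical stand-ins, not the project's actual files.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical helper: read every CSV in `dir` and stack the rows
read_trip_files <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  files %>%
    map(~ read_csv(.x, show_col_types = FALSE)) %>%
    bind_rows()
}

# Demo on two small stand-in files written to a temporary folder
demo_dir <- tempfile("trip_demo_")
dir.create(demo_dir)
write_csv(
  tibble(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
  file.path(demo_dir, "batch_1.csv")
)
write_csv(
  tibble(ride_id = "b1", member_casual = "member"),
  file.path(demo_dir, "batch_2.csv")
)

demo_data <- read_trip_files(demo_dir)
nrow(demo_data)
```

`bind_rows()` matches columns by name, so the files need consistent column names for the stacked result to be meaningful.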
Before analysis, the column names are standardized and every column is audited for missing values to protect downstream data integrity.
# Standardize column naming schemas to snake_case format
bike_data <- bike_data %>%
  clean_names()
# Audit missing value counts across variables
colSums(is.na(bike_data))
##            ride_id      rideable_type         started_at           ended_at
##                  0                  0                  0                  0
## start_station_name   start_station_id   end_station_name     end_station_id
##            1146841            1146841            1199356            1199356
##          start_lat          start_lng            end_lat            end_lng
##                  0                  0               5517               5517
##      member_casual
##                  0
Duplicate rows are removed. The timestamps need no manual parsing with ymd_hms(), because read_csv() already reads started_at and ended_at as dttm (date-time) columns.
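The de-duplication step is described but not shown. One common approach is sketched below with `dplyr::distinct()`, which is an assumption about the method rather than the report's confirmed code. The `demo` table is made-up data used only to show the behaviour, including a check that the timestamp column is already a date-time type.

```r
library(dplyr)

# Made-up rows: the first two are exact duplicates
demo <- tibble(
  ride_id = c("a1", "a1", "b2"),
  started_at = as.POSIXct(
    c("2023-06-01 08:00:00", "2023-06-01 08:00:00", "2023-06-02 17:30:00"),
    tz = "UTC"
  )
)

# Keep one copy of each fully identical row
deduped <- demo %>%
  distinct()

nrow(deduped)

# Confirm the timestamp column is a date-time (dttm) vector, so no re-parsing is needed
inherits(deduped$started_at, "POSIXct")
```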
# Calculate trip length in minutes and extract ordered time components
bike_data <- bike_data %>%
  mutate(
    ride_length = as.numeric(
      difftime(ended_at, started_at, units = "mins")
    ),
    day_of_week = weekdays(started_at),
    day_of_week = factor(
      day_of_week,
      levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
    ),
    month = month(started_at, label = TRUE),
    hour = hour(started_at)
  )
Descriptive statistics are computed for each customer segment to highlight trends.
## # A tibble: 2 × 2
##   member_casual       n
##   <chr>           <int>
## 1 casual        1991924
## 2 member        3531283
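To put the ride counts above in proportion, they can be converted to shares of all rides. The sketch below only re-uses the two counts printed in the table; the object names `rider_counts` and `rider_share` are illustrative.

```r
# Ride counts copied from the tibble above
rider_counts <- c(casual = 1991924, member = 3531283)

# Share of all rides per segment, in percent, rounded to one decimal place
rider_share <- round(100 * rider_counts / sum(rider_counts), 1)
rider_share
```

On these counts, members account for roughly 64% of rides and casual riders for roughly 36%.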
# Calculate the mean trip length (minutes) for each rider segment
bike_data %>%
  group_by(member_casual) %>%
  summarise(
    avg_ride = mean(ride_length)
  )
## # A tibble: 2 × 2
##   member_casual avg_ride
##   <chr>            <dbl>
## 1 casual            22.9
## 2 member            12.3
# Extract expanded summary metrics for granular distribution analysis
bike_data %>%
  group_by(member_casual) %>%
  summarise(
    mean = mean(ride_length),
    median = median(ride_length),
    max = max(ride_length),
    min = min(ride_length)
  )
## # A tibble: 2 × 5
##   member_casual  mean median   max      min
##   <chr>         <dbl>  <dbl> <dbl>    <dbl>
## 1 casual         22.9  11.5  1575. 0.000767
## 2 member         12.3   8.58 1500. 0.00130
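In both segments the mean sits well above the median, which points to a right-skewed distribution in which a small number of very long rides pull the average up. A quick way to compare the skew between segments is the mean-to-median ratio, sketched here from the rounded values printed above; the `summary_reported` data frame is just a hand-typed copy of that output.

```r
# Mean and median ride length (minutes), copied from the summary above
summary_reported <- data.frame(
  member_casual = c("casual", "member"),
  mean = c(22.9, 12.3),
  median = c(11.5, 8.58)
)

# A ratio well above 1 indicates that long rides inflate the mean
summary_reported$mean_to_median <- round(
  summary_reported$mean / summary_reported$median, 2
)
summary_reported
```

The ratio is about 1.99 for casual riders and about 1.43 for members, so the median is the more representative "typical" trip length, particularly for casual riders.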
# Count rides by rider segment and day of the week
rides_day <- bike_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    rides = n(),
    .groups = "drop"
  )
# Compute average trip duration (minutes) by rider segment and day of the week
duration_day <- bike_data %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    avg_duration = mean(ride_length),
    .groups = "drop"
  )
# Count rides by rider segment and month to identify seasonal trends
rides_month <- bike_data %>%
  group_by(member_casual, month) %>%
  summarise(
    rides = n(),
    .groups = "drop"
  )
Volume vs. Duration: Annual members log more rides overall (3,531,283 vs. 1,991,924), but casual riders take longer trips: a mean of 22.9 minutes against 12.3 for members, and a median of 11.5 minutes against 8.58.
Temporal Footprints: Annual members ride at a steady rate throughout the work week, a pattern consistent with commuting, while casual rider demand spikes on weekends, which suggests leisure use.
Seasonal Factors: Both user segments experience sharp contractions in demand during the winter months, with peak usage concentrated in the summer.
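The weekday versus weekend contrast can be expressed as a single number per segment: the share of rides that fall on Saturday or Sunday. The sketch below shows the calculation on a table shaped like rides_day. The counts in `rides_day_demo` are invented for illustration and are not Cyclistic results.

```r
library(dplyr)

# Illustrative counts only, in the same shape as rides_day
rides_day_demo <- tibble(
  member_casual = rep(c("casual", "member"), each = 7),
  day_of_week = rep(
    c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
    times = 2
  ),
  rides = c(300, 150, 140, 150, 160, 200, 400,
            200, 300, 320, 330, 320, 280, 250)
)

# Share of each segment's rides that fall on a weekend day
weekend_share <- rides_day_demo %>%
  mutate(is_weekend = day_of_week %in% c("Saturday", "Sunday")) %>%
  group_by(member_casual) %>%
  summarise(
    weekend_share = sum(rides[is_weekend]) / sum(rides),
    .groups = "drop"
  )
weekend_share
```

If rides were spread evenly over the week, the weekend share would be 2/7 (about 29%), which gives a baseline for judging how weekend-heavy a segment is.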
Introduce Weekend-Only Memberships: Design a specialized annual membership package that targets casual users who primarily ride on Saturdays and Sundays.
Target High-Duration Leisure Hotspots: Launch targeted seasonal marketing campaigns around parks and waterfront routes during the summer, emphasizing the financial savings of converting to an annual membership for long-duration rides.
Commuter Incentives for Seasonal Riders: Use digital media channels to showcase the cost-effectiveness and health benefits of using Cyclistic for daily work commutes, specifically aiming to convert casual weekday riders before the summer peak.