The core business question for this project is: How do members and casual riders differ in their use of Cyclistic bike-sharing?
Our goal is to identify these differences through analysis so we can develop marketing strategies to convert casual riders into annual members.
In this phase, we imported 12 months of raw CSV data and consolidated it into a single data frame for analysis.
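library(tidyverse)
library(lubridate)
# collect every monthly CSV in the working directory (anchored regex: names ending in ".csv")
file.names <- list.files(pattern = "\\.csv$")
# read each file and row-bind them into one data frame
all_trips <- file.names %>%
  map_dfr(~ read_csv(.x))
# check the total number of rows in the combined data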
nrow(all_trips)
## [1] 5620544
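As an alternative to the `map_dfr()` loop, readr 2.0 and later can read a vector of paths in a single `read_csv()` call, and its `id` argument records which file each row came from. This is a minimal sketch on two made-up files written to a temporary folder (the file names and columns are toy assumptions, not the real Cyclistic exports); a per-file row count like this makes it easy to spot a month that failed to load.

```r
library(readr)
library(dplyr)

# Write two tiny, made-up "monthly" files to a temporary folder (toy data only)
tmp <- tempfile("trips_")
dir.create(tmp)
write_csv(tibble(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(tmp, "202301-demo.csv"))
write_csv(tibble(ride_id = c("b1", "b2"), member_casual = c("casual", "casual")),
          file.path(tmp, "202302-demo.csv"))

# Anchored regex: only names that end in ".csv"
file_names <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# read_csv() accepts a vector of paths; `id` stores each row's source path
demo_trips <- read_csv(file_names, id = "source_file", show_col_types = FALSE)

# Rows per source file
demo_trips %>% count(file = basename(source_file))
```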
Next, we cleaned the data: we calculated each ride's duration in seconds (ride_length), extracted the day of the week and the month for deeper analysis, and removed invalid records (rides with a duration of zero or less, or with a missing start station name).
# clean the data
all_trips_v2 <- all_trips %>%
  # 1. ride_length in seconds (units pinned so difftime() cannot switch to minutes or hours)
  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "secs"))) %>%
  # 2. day of the week
  mutate(day_of_week = wday(started_at, label = TRUE, abbr = FALSE)) %>%
  # 3. month, to look for seasonal trends
  mutate(month = month(started_at, label = TRUE, abbr = FALSE)) %>%
  # 4. keep rides with a positive duration and a recorded start station
  filter(ride_length > 0) %>%
  filter(!is.na(start_station_name))
# 5. check how many rows are left
nrow(all_trips_v2)
## [1] 4425567
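Why pin `units = "secs"`? With its default `units = "auto"`, `difftime()` chooses the unit from the smallest absolute gap in the data, so the same code can silently return minutes on one extract and seconds on another. The sketch below uses three made-up rides (toy data, not from the Cyclistic files): a normal 10-minute ride, one whose end precedes its start, and one with no start station. It shows the auto unit drifting to minutes and the cleaning filters keeping only the valid ride.

```r
library(dplyr)
library(lubridate)

# Three made-up rides (toy data only)
toy <- tibble(
  started_at = ymd_hms(c("2023-06-03 10:00:00", "2023-06-03 11:00:00", "2023-06-04 09:00:00")),
  ended_at   = ymd_hms(c("2023-06-03 10:10:00", "2023-06-03 10:55:00", "2023-06-04 09:20:00")),
  start_station_name = c("Station A", "Station B", NA)
)

# Default units = "auto": the smallest absolute gap here is 5 minutes,
# so difftime() reports minutes rather than seconds
auto_units <- units(difftime(toy$ended_at, toy$started_at))

# Same cleaning steps as above, with the unit fixed to seconds
toy_clean <- toy %>%
  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "secs"))) %>%
  filter(ride_length > 0, !is.na(start_station_name))

auto_units  # "mins"
toy_clean   # one row, ride_length = 600
```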
# compare casual riders and members
user_analysis <- all_trips_v2 %>%
  group_by(member_casual) %>%
  summarise(
    total_rides = n(),
    avg_duration_mins = mean(ride_length) / 60,   # mean ride length, minutes
    median_duration = median(ride_length) / 60    # median ride length, minutes
  ) %>%
  # each user type's share of all rides, in percent
  mutate(percentage = total_rides / sum(total_rides) * 100)
print(user_analysis)
## # A tibble: 2 × 5
## member_casual total_rides avg_duration_mins median_duration percentage
## <chr> <int> <dbl> <dbl> <dbl>
## 1 casual 1579441 25.5 12.3 35.7
## 2 member 2846126 12.8 8.69 64.3
Casual riders show a strong “weekend effect,” with ride volume peaking on Saturdays and Sundays. Members ride most from Tuesday to Thursday (roughly 448,000 to 466,000 rides per day of the week) and least on weekends, a pattern consistent with commuting.
# rides and average duration by user type and day of the week
weekly_analysis <- all_trips_v2 %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length) / 60,   # minutes
    .groups = "drop"                             # return an ungrouped tibble
  )
print(weekly_analysis)
## # A tibble: 14 × 4
## member_casual day_of_week number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sunday 265345 29.6
## 2 casual Monday 187346 25.4
## 3 casual Tuesday 179408 22.0
## 4 casual Wednesday 174982 21.2
## 5 casual Thursday 201849 22.4
## 6 casual Friday 246288 25.3
## 7 casual Saturday 324223 28.7
## 8 member Sunday 300907 14.2
## 9 member Monday 412910 12.5
## 10 member Tuesday 465859 12.3
## 11 member Wednesday 448195 12.3
## 12 member Thursday 458570 12.3
## 13 member Friday 414887 12.8
## 14 member Saturday 344798 14.3
Although casual riders take fewer rides (35.7% of the total), their average ride lasts 25.5 minutes, nearly twice the members' 12.8 minutes. This suggests that casual riders ride mainly for leisure, while members prioritize getting from place to place efficiently.
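The “nearly twice” comparison uses means. A quick check with the values copied from the `user_analysis` output shows that the medians from the same table give a smaller gap (about 1.4 times), which suggests the casual mean is pulled up by a tail of very long rides. This is plain arithmetic on the printed numbers, not a new analysis.

```r
# Values copied from the user_analysis output above
avg_mins    <- c(casual = 25.5, member = 12.8)
median_mins <- c(casual = 12.3, member = 8.69)
rides       <- c(casual = 1579441, member = 2846126)

mean_ratio   <- unname(avg_mins["casual"] / avg_mins["member"])        # about 1.99
median_ratio <- unname(median_mins["casual"] / median_mins["member"])  # about 1.42
casual_share <- unname(rides["casual"] / sum(rides) * 100)             # about 35.7
```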
Member activity peaks mid-week (Tuesday, 465,859 rides), consistent with routine commuting. Casual ridership, by contrast, peaks on weekends: casual riders take the most rides on Saturday (324,223) and their longest average trips on Sunday (29.6 minutes).
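One way to put a number on the weekend effect is each user type's share of rides that fall on Saturday or Sunday. If rides were spread evenly across the week, weekends would account for 2/7 (about 28.6%) of them. This sketch rebuilds the counts from the `weekly_analysis` output; with these figures, casual riders put about 37.3% of their rides on weekends and members about 22.7%.

```r
library(dplyr)
library(tibble)

# Ride counts copied from the weekly_analysis output above
weekly_counts <- tribble(
  ~member_casual, ~day_of_week, ~number_of_rides,
  "casual", "Sunday",    265345,
  "casual", "Monday",    187346,
  "casual", "Tuesday",   179408,
  "casual", "Wednesday", 174982,
  "casual", "Thursday",  201849,
  "casual", "Friday",    246288,
  "casual", "Saturday",  324223,
  "member", "Sunday",    300907,
  "member", "Monday",    412910,
  "member", "Tuesday",   465859,
  "member", "Wednesday", 448195,
  "member", "Thursday",  458570,
  "member", "Friday",    414887,
  "member", "Saturday",  344798
)

# Share of each user type's rides taken on Saturday or Sunday
weekend_share <- weekly_counts %>%
  mutate(is_weekend = day_of_week %in% c("Saturday", "Sunday")) %>%
  group_by(member_casual) %>%
  summarise(weekend_share = sum(number_of_rides[is_weekend]) / sum(number_of_rides))

weekend_share
```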
# busiest day, and day with the longest average ride, for each user type
peak_analysis <- weekly_analysis %>%
  group_by(member_casual) %>%
  summarise(
    max_rides = max(number_of_rides),
    day_of_max_rides = day_of_week[which.max(number_of_rides)],
    max_duration = max(average_duration),
    day_of_max_duration = day_of_week[which.max(average_duration)]
  )
print(peak_analysis)
## # A tibble: 2 × 5
## member_casual max_rides day_of_max_rides max_duration day_of_max_duration
## <chr> <int> <ord> <dbl> <ord>
## 1 casual 324223 Saturday 29.6 Sunday
## 2 member 465859 Tuesday 14.3 Saturday
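`which.max()` returns only the first position of the maximum, so if two days ever tied, one would be dropped silently. In the output above there are no ties, so the result is unaffected; the sketch below shows a tie-aware alternative with dplyr's `slice_max()`, run on made-up counts (toy data, not the Cyclistic output) that deliberately include a tie.

```r
library(dplyr)

# Toy weekly table with made-up counts; casual Saturday and Sunday tie
toy_weekly <- tibble(
  member_casual   = c("casual", "casual", "member", "member"),
  day_of_week     = c("Saturday", "Sunday", "Tuesday", "Saturday"),
  number_of_rides = c(300, 300, 500, 400)
)

# slice_max() keeps every row tied for the maximum within each group
peak_days <- toy_weekly %>%
  group_by(member_casual) %>%
  slice_max(number_of_rides, n = 1, with_ties = TRUE) %>%
  ungroup()

peak_days  # casual: Saturday and Sunday; member: Tuesday
```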
Combining season with day of the week lets us ask a more targeted question: “During the cold winter months, do casual riders ride mainly on weekdays (for commuting) or on weekends (for recreation)?”
# map each month to a meteorological season
all_trips_v2 <- all_trips_v2 %>%
  mutate(season = case_when(
    month %in% c("December", "January", "February") ~ "Winter",
    month %in% c("March", "April", "May") ~ "Spring",
    month %in% c("June", "July", "August") ~ "Summer",
    month %in% c("September", "October", "November") ~ "Autumn"
  ))
# ride counts by season and user type
seasonal_analysis <- all_trips_v2 %>%
  group_by(season, member_casual) %>%
  summarise(number_of_rides = n(), .groups = "drop")
# order seasons chronologically rather than alphabetically
seasonal_analysis$season <- factor(seasonal_analysis$season,
                                   levels = c("Spring", "Summer", "Autumn", "Winter"))
season_day_analysis <- all_trips_v2 %>%
  # mark Saturday and Sunday as "Weekend", Monday to Friday as "Weekday"
  mutate(type_of_day = if_else(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")) %>%
  group_by(season, type_of_day, member_casual) %>%
  summarise(number_of_rides = n(), .groups = "drop")
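Raw totals in `season_day_analysis` are not directly comparable, because a week has five weekdays but only two weekend days. A fairer reading of the winter question divides each total by the number of days it covers. The sketch below uses made-up winter counts for casual riders (toy data in the same shape as `season_day_analysis`) and assumes the 5:2 weekday-to-weekend ratio holds within a season, which is only approximately true for a roughly 90-day season.

```r
library(dplyr)

# Made-up winter counts for casual riders, shaped like season_day_analysis
toy_season_day <- tibble(
  season          = "Winter",
  type_of_day     = c("Weekday", "Weekend"),
  member_casual   = "casual",
  number_of_rides = c(50000, 30000)
)

# Normalise by days per week: 5 weekdays vs. 2 weekend days
winter_casual <- toy_season_day %>%
  filter(season == "Winter", member_casual == "casual") %>%
  mutate(days_per_week = if_else(type_of_day == "Weekend", 2, 5),
         rides_per_day = number_of_rides / days_per_week)

winter_casual  # weekdays lead on totals, weekends lead per day
```

In this toy case weekdays have the larger total (50,000 vs. 30,000), yet weekends carry more rides per day (15,000 vs. 10,000), so the two readings would answer the winter question differently.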
Based on these findings, here are three recommendations for converting casual riders into annual members, followed by one for retaining existing members:
The data shows a significant spike in casual rider activity during weekends across all seasons.
Recommendation: Launch a “Weekend-Only Membership” or a seasonal summer pass. This lowers the barrier to entry for casual riders who primarily use the bikes for leisure, providing a “stepping stone” toward a full annual membership.
On average, casual riders' trips last about twice as long as members' (25.5 vs. 12.8 minutes).
Recommendation: Use marketing campaigns that highlight the cost-effectiveness of membership for long-duration trips. Messaging should focus on how “Pay-per-minute” adds up quickly for casual users compared to the “Unlimited long rides” benefit for members.
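The cost-effectiveness message can be grounded in a simple break-even calculation: how many pay-per-ride trips of typical length cost as much as an annual membership. This is a minimal sketch; the function name `break_even_rides()` and every price below are hypothetical placeholders, not Cyclistic's actual fares, and it assumes the membership covers a ride of this length with no extra charges. Only the 25.5-minute ride length comes from the analysis above.

```r
# HYPOTHETICAL helper and prices, for illustration only (not real Cyclistic fares)
break_even_rides <- function(annual_fee, unlock_fee, per_minute, ride_mins) {
  # Cost of one pay-per-ride trip of the given length
  cost_per_ride <- unlock_fee + per_minute * ride_mins
  # Smallest whole number of rides at which pay-per-ride spending reaches the annual fee
  ceiling(annual_fee / cost_per_ride)
}

# 25.5 minutes is the casual average ride length from user_analysis
break_even_rides(annual_fee = 100, unlock_fee = 1, per_minute = 0.2, ride_mins = 25.5)
```

With these placeholder prices a ride costs 6.10, so membership pays off from the 17th ride; substituting real fares would give the figure to use in campaign messaging.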
The seasonal analysis reveals that summer is the peak period for casual riders, with activity nearly rivaling members on weekends.
Recommendation: Implement a targeted summer promotion (June–August). Use digital ads at popular leisure-destination stations (parks, waterfronts) offering a discount on the first year of an annual membership if they sign up during a summer weekend.
Because annual members behave like committed commuters who keep riding on winter weekdays, the final recommendation focuses on member loyalty rather than conversion.
Recommendation: Partner with local businesses (e.g., coffee shops near transit hubs) to offer “Winter Member Perks” to maintain high engagement and ensure subscription renewals during the off-season.
While this analysis focused on the behavioral contrast between user types, a preliminary check of trip duration vs. distance suggests that casual riders often engage in non-linear, leisure paths, whereas members follow high-efficiency, predictable routes. This further supports the recommendation for targeted weekend leisure packages.
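The duration-versus-distance check is not shown in code here. One way to sketch it is to compute the straight-line (haversine) distance between start and end coordinates and divide by ride time: low straight-line speeds hint at meandering or round trips. This sketch assumes the trip data has `start_lat`, `start_lng`, `end_lat` and `end_lng` columns (an assumption, not shown in this analysis) and runs on two made-up rides. Note that straight-line distance understates real route length and is zero for rides that end where they started.

```r
library(dplyr)

# Great-circle (haversine) distance in metres; 6,371 km is the mean Earth radius
haversine_m <- function(lat1, lng1, lat2, lng2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding above 1
}

# Two made-up rides; the coordinate column names are ASSUMED, not confirmed
toy_rides <- tibble(
  member_casual = c("member", "casual"),
  ride_length   = c(600, 1800),  # seconds
  start_lat = c(41.88, 41.88), start_lng = c(-87.63, -87.63),
  end_lat   = c(41.90, 41.89), end_lng   = c(-87.63, -87.62)
)

toy_rides <- toy_rides %>%
  mutate(distance_m = haversine_m(start_lat, start_lng, end_lat, end_lng),
         # Straight-line speed in km/h: low values suggest a leisurely, indirect route
         speed_kmh = (distance_m / 1000) / (ride_length / 3600))
```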