1. Ask

The core business question for this project is: How do members and casual riders differ in their use of Cyclistic bike-sharing?

Our goal is to identify these differences through analysis so we can develop marketing strategies to convert casual riders into annual members.

2. Prepare

In this phase, we imported 12 months of raw CSV data and consolidated it into a single data frame for analysis.

library(tidyverse)
library(lubridate)

Identify and import all CSV files, then check the row count:

# "\\.csv$" matches only file names that end in .csv (pattern is a regular expression)
file.names <- list.files(pattern = "\\.csv$")
all_trips <- file.names %>%
  map_df(~read_csv(.))

# check the total rows of the file
nrow(all_trips)
## [1] 5620544
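
When twelve files are bound into one data frame, it can be useful to keep track of which file each row came from. The following is a minimal sketch, assuming readr 2.0 or later (where `read_csv()` accepts a vector of paths and an `id` argument). The two small CSV files, the `cyclistic_demo` folder, and the `source_file` column name are made up for illustration and are not part of the Cyclistic data.

```r
library(readr)
library(dplyr)

# Hypothetical demo files written to a temporary folder (not the real trip data)
demo_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "2023-01.csv"))
write_csv(data.frame(ride_id = "b1", member_casual = "member"),
          file.path(demo_dir, "2023-02.csv"))

# "\\.csv$" matches only names that end in .csv
demo_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file in one call; `id` stores the originating path for each row
demo_trips <- read_csv(demo_files, id = "source_file", show_col_types = FALSE)

# Rows contributed by each file
rows_per_file <- demo_trips %>%
  mutate(source_file = basename(source_file)) %>%
  count(source_file)

print(rows_per_file)
```

A per-file row count like this makes it easier to spot a month that failed to load or loaded twice.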

3. Process

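We cleaned the data, calculated the duration of each ride in seconds (ride_length), and extracted the day of the week and the month for deeper analysis. We then removed invalid records: rides with a duration of zero or less and rides with a missing start station name. This reduced the data from 5,620,544 to 4,425,567 rows.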

# Clean the data
all_trips_v2 <- all_trips %>%
  # 1. Ride length in seconds (units set explicitly so difftime() cannot pick minutes or hours)
  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "secs"))) %>%

  # 2. Day of the week on which the ride started
  mutate(day_of_week = wday(started_at, label = TRUE, abbr = FALSE)) %>%

  # 3. Month in which the ride started, used later to find seasonal trends
  mutate(month = month(started_at, label = TRUE, abbr = FALSE)) %>%

  # 4. Keep rides with a positive ride_length and a known start station
  filter(ride_length > 0) %>%
  filter(!is.na(start_station_name))

# 5. Check how many rows are left
nrow(all_trips_v2)
## [1] 4425567
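
The two filters remove 1,194,977 rows (5,620,544 minus 4,425,567, roughly 21% of the raw data), so it is worth knowing which rule accounts for how much. The sketch below shows one way to audit this. It runs on a made-up four-row `toy_trips` table rather than the real data, and `filter_audit` is a hypothetical name used only here.

```r
library(dplyr)

# Toy stand-in for all_trips (made-up rows, not real Cyclistic records)
toy_trips <- tibble(
  started_at = as.POSIXct(c("2023-01-01 08:00:00", "2023-01-01 09:00:00",
                            "2023-01-01 10:00:00", "2023-01-01 11:00:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2023-01-01 08:10:00", "2023-01-01 08:55:00",
                            "2023-01-01 10:00:00", "2023-01-01 11:20:00"), tz = "UTC"),
  start_station_name = c("Station A", "Station B", "Station C", NA)
) %>%
  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = "secs")))

# Count how many rows each rule flags on its own, and how many survive both
filter_audit <- toy_trips %>%
  summarise(
    rows_in               = n(),
    non_positive_length   = sum(ride_length <= 0),
    missing_start_station = sum(is.na(start_station_name)),
    rows_kept             = sum(ride_length > 0 & !is.na(start_station_name))
  )

print(filter_audit)
```

Running the same `summarise()` on `all_trips` (after adding `ride_length`) would show whether missing station names or invalid durations drive most of the loss.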

4. Analyze

4.1 User Distribution

# Ride counts and durations for casual riders vs. members
user_analysis <- all_trips_v2 %>%
  group_by(member_casual) %>%
  summarise(
    total_rides = n(),
    avg_duration_mins = mean(ride_length) / 60,   # mean ride length in minutes
    median_duration = median(ride_length) / 60    # median ride length in minutes
  ) %>%
  mutate(percentage = total_rides / sum(total_rides) * 100)   # share of all rides

print(user_analysis)
## # A tibble: 2 × 5
##   member_casual total_rides avg_duration_mins median_duration percentage
##   <chr>               <int>             <dbl>           <dbl>      <dbl>
## 1 casual            1579441              25.5           12.3        35.7
## 2 member            2846126              12.8            8.69       64.3

4.2 Time-based Analysis

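Casual users show a strong “weekend effect,” with ride volume peaking on Saturday (324,223 rides) and staying high on Sunday (265,345 rides). Members ride most from Tuesday to Thursday (about 448,000 to 466,000 rides for each of those days) and less on weekends, which is consistent with commuting.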

# Which rider type rides more on each day of the week?
weekly_analysis <- all_trips_v2 %>%
  # group by rider type and day of the week
  group_by(member_casual, day_of_week) %>%
  # .groups = "drop" returns an ungrouped tibble
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length) / 60,
    .groups = "drop"
  )

print(weekly_analysis)
## # A tibble: 14 × 4
##    member_casual day_of_week number_of_rides average_duration
##    <chr>         <ord>                 <int>            <dbl>
##  1 casual        Sunday               265345             29.6
##  2 casual        Monday               187346             25.4
##  3 casual        Tuesday              179408             22.0
##  4 casual        Wednesday            174982             21.2
##  5 casual        Thursday             201849             22.4
##  6 casual        Friday               246288             25.3
##  7 casual        Saturday             324223             28.7
##  8 member        Sunday               300907             14.2
##  9 member        Monday               412910             12.5
## 10 member        Tuesday              465859             12.3
## 11 member        Wednesday            448195             12.3
## 12 member        Thursday             458570             12.3
## 13 member        Friday               414887             12.8
## 14 member        Saturday             344798             14.3

4.3 Ride Length Analysis

Although casual users take fewer rides, their average ride duration (25.5 minutes) is nearly twice that of members (12.8 minutes). This suggests that casual users ride primarily for leisure and entertainment, while members prioritize efficiency.

Member activity peaks midweek (Tuesday, 465,859 rides), which points to routine commuting. Casual ridership peaks on Saturday (324,223 rides), and casual riders’ longest average trips occur on Sunday (29.6 minutes).

# Busiest day and longest-average-ride day for each rider type
peak_analysis <- weekly_analysis %>%
  group_by(member_casual) %>%
  summarise(
    max_rides = max(number_of_rides),
    day_of_max_rides = day_of_week[which.max(number_of_rides)],
    max_duration = max(average_duration),
    day_of_max_duration = day_of_week[which.max(average_duration)]
  )

print(peak_analysis)
## # A tibble: 2 × 5
##   member_casual max_rides day_of_max_rides max_duration day_of_max_duration
##   <chr>             <int> <ord>                   <dbl> <ord>              
## 1 casual           324223 Saturday                 29.6 Sunday             
## 2 member           465859 Tuesday                  14.3 Saturday

4.4 Multi-dimensional Analysis

By combining seasonality with the type of day, we can answer a more specific question: “During the cold winter months, do casual riders ride on weekdays (for commuting) or on weekends (for recreation)?”

# Map each month to a season
all_trips_v2 <- all_trips_v2 %>%
  mutate(season = case_when(
    month %in% c("December", "January", "February") ~ "Winter",
    month %in% c("March", "April", "May") ~ "Spring",
    month %in% c("June", "July", "August") ~ "Summer",
    month %in% c("September", "October", "November") ~ "Autumn"
  ))

# Seasonal ride counts by rider type
seasonal_analysis <- all_trips_v2 %>%
  group_by(season, member_casual) %>%
  summarise(number_of_rides = n(), .groups = "drop")

# Order the seasons chronologically rather than alphabetically
seasonal_analysis$season <- factor(seasonal_analysis$season,
                                   levels = c("Spring", "Summer", "Autumn", "Winter"))

season_day_analysis <- all_trips_v2 %>%
  # Mark Saturday and Sunday as Weekend, Monday to Friday as Weekday
  mutate(type_of_day = if_else(day_of_week %in% c("Saturday", "Sunday"), "Weekend", "Weekday")) %>%
  group_by(season, type_of_day, member_casual) %>%
  summarise(number_of_rides = n(), .groups = 'drop')

5. Share

5.1 Total Rides by Day

ggplot(data = weekly_analysis, aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("casual" = "#F8766D", "member" = "#00BFC4")) +
  theme_minimal() +
  labs(title = "Total Rides by Day: Member vs Casual",
       subtitle = "Casual riders peak on weekends; members stay steady on weekdays",
       x = "Day of Week", y = "Number of Rides")

User Roles (Commuter vs. Leisure)

Member Patterns: Annual members ride at a consistently high level from Monday to Friday (about 413,000 to 466,000 rides for each weekday) and drop to roughly 301,000 to 345,000 on Saturday and Sunday. This is consistent with members primarily using Cyclistic to commute to work or school.

Casual Patterns: Casual riders show a pronounced “Weekend Spike.” Their activity climbs on Thursday and Friday, peaks on Saturday, and remains high on Sunday. This suggests casual riders use the service mainly for recreation and sightseeing.

The Intersection: On weekends, the gap between the two groups narrows sharply (to about 21,000 rides on Saturdays, compared with more than 280,000 on Tuesdays), which makes Saturdays and Sundays the prime window for potential conversion.
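
The narrowing weekend gap can be read directly off the ride counts printed in section 4.2. The sketch below retypes those counts into a small table and computes the member-minus-casual gap for each day. The names `rides_wide`, `gap`, and `casual_share_pct` are introduced here for illustration only.

```r
library(dplyr)

# Ride counts copied from the weekly_analysis output in section 4.2
rides_wide <- tribble(
  ~day_of_week, ~casual, ~member,
  "Sunday",     265345,  300907,
  "Monday",     187346,  412910,
  "Tuesday",    179408,  465859,
  "Wednesday",  174982,  448195,
  "Thursday",   201849,  458570,
  "Friday",     246288,  414887,
  "Saturday",   324223,  344798
)

# Absolute gap and casual share of all rides on each day, smallest gap first
gap_by_day <- rides_wide %>%
  mutate(
    gap = member - casual,
    casual_share_pct = round(casual / (casual + member) * 100, 1)
  ) %>%
  arrange(gap)

print(gap_by_day)
```

Sorted this way, Saturday has the smallest gap (20,575 rides) and Tuesday the largest (286,451 rides).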

5.2 Average Duration by Day: Member vs Casual

# Average ride duration by day of the week
ggplot(data = weekly_analysis, aes(x = day_of_week, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") +
  # customize the bar colors
  scale_fill_manual(values = c("casual" = "#F8766D", "member" = "#00BFC4")) +
  theme_minimal() +
  labs(title = "Average Duration by Day: Member vs Casual",
       x = "Day of Week",
       y = "Average Duration (Mins)")

Insights: Trip Intensity & Value Perception

  • The “Double Duration” Gap: Casual riders’ average trip durations (approx. 21-30 mins) are roughly 1.7 to 2.1 times those of members (approx. 12-14 mins) on every day of the week, and at least double on Saturdays, Sundays, and Mondays.

  • Consistency vs. Variability: Members’ trip durations are very stable, staying between 12.3 and 12.8 minutes on weekdays and rising only to about 14.3 minutes on weekends. This fits the “routine commute” theory: they know exactly how long it takes to get from A to B.

  • Sunday Peak: For casual riders, the longest trips occur on Sundays (29.6 minutes on average). This suggests that their Sunday rides are often extended leisure tours or explorations, which would also be the most expensive trips under a “pay-as-you-go” model.

5.3 Comparing Weekdays vs Weekends across all seasons

ggplot(season_day_analysis, aes(x = type_of_day, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  # classify by season
  facet_wrap(~season) + 
  scale_fill_manual(values = c("casual" = "#F8766D", "member" = "#00BFC4")) +
  theme_minimal() +
  labs(title = "Riding Patterns by Season and Day Type",
       subtitle = "Comparing Weekdays vs Weekends across all seasons",
       x = "Type of Day", y = "Total Rides")

Insights:

  • Winter: In the Winter panel (bottom right), members (blue) still log about 250,000 rides on weekdays, while casual riders (pink) fall to very low volumes.

Conclusion: Members rely on bikes for “essential needs” (work, commuting), while casual use depends heavily on the weather.

The drastic drop in winter ridership is consistent with Chicago’s harsh winter weather and underlines the seasonal nature of casual usage.

  • The Summer Reversal: On summer weekends, the pink bar for casual users actually surpasses the blue bar for members.

Conclusion: Casual users are most active on summer weekends, which makes this the best time to promote membership services.

  • Weekday vs. Weekend Patterns:

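Regardless of the season, member ride volume follows a “weekdays > weekends” trend. Casual users’ ride volume is far more balanced: weekends cover only two of the seven days, yet they account for about 37% of casual rides over the year (about 23% for members), so on a per-day basis casual riders are busier on weekends. In the chart, this weekend strength is most visible in spring, summer, and fall.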
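
Because the Weekday bars add up five days and the Weekend bars only two, raw totals favor weekdays by construction. One way to make the comparison fairer is to divide each total by the number of days it covers. The sketch below does this with the yearly counts printed in section 4.2 (seasonal counts are not printed in this report, so the seasonal split is not reproduced here). The `rides_by_day` table and the `avg_rides_per_day_of_week` column are names introduced for this example.

```r
library(dplyr)

# Yearly ride counts per day of the week, copied from the output in section 4.2
rides_by_day <- tribble(
  ~member_casual, ~day_of_week, ~number_of_rides,
  "casual", "Sunday",    265345,
  "casual", "Monday",    187346,
  "casual", "Tuesday",   179408,
  "casual", "Wednesday", 174982,
  "casual", "Thursday",  201849,
  "casual", "Friday",    246288,
  "casual", "Saturday",  324223,
  "member", "Sunday",    300907,
  "member", "Monday",    412910,
  "member", "Tuesday",   465859,
  "member", "Wednesday", 448195,
  "member", "Thursday",  458570,
  "member", "Friday",    414887,
  "member", "Saturday",  344798
)

# Total rides per day type, then the average across the days in that type
per_day <- rides_by_day %>%
  mutate(type_of_day = if_else(day_of_week %in% c("Saturday", "Sunday"),
                               "Weekend", "Weekday")) %>%
  group_by(member_casual, type_of_day) %>%
  summarise(
    total_rides = sum(number_of_rides),
    days_in_type = n(),
    avg_rides_per_day_of_week = total_rides / days_in_type,
    .groups = "drop"
  )

print(per_day)
```

On this per-day basis, casual riders average about 198,000 rides per weekday against about 295,000 per weekend day, while members average about 440,000 against about 323,000.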

6. Act: Strategic Recommendations

Based on the data-driven insights from the analysis, here are four strategic recommendations to convert casual riders into annual members and retain existing ones:

6.1 “Weekend Rider” Membership Packages

The data shows a clear rise in casual rider activity on weekends, most visibly from spring through fall.

Recommendation: Launch a “Weekend-Only Membership” or a seasonal summer pass. This lowers the barrier to entry for casual riders who primarily use the bikes for leisure, providing a “stepping stone” toward a full annual membership.

6.2 High-Duration Value Proposition

Casual riders’ trips last, on average, about twice as long as members’ trips (25.5 mins vs. 12.8 mins).

Recommendation: Use marketing campaigns that highlight the cost-effectiveness of membership for long-duration trips. Messaging should focus on how “Pay-per-minute” adds up quickly for casual users compared to the “Unlimited long rides” benefit for members.

6.3 Seasonal Summer Conversion Campaign

The seasonal analysis reveals that summer is the peak period for casual riders, with casual activity surpassing member activity on summer weekends.

Recommendation: Implement a targeted summer promotion (June–August). Use digital ads at popular leisure-destination stations (parks, waterfronts) offering a discount on the first year of an annual membership if they sign up during a summer weekend.

6.4 Winter Retention for Members

Since annual members are “hardcore” commuters who keep riding on winter weekdays, focus on Member Loyalty.

Recommendation: Partner with local businesses (e.g., coffee shops near transit hubs) to offer “Winter Member Perks” to maintain high engagement and ensure subscription renewals during the off-season.

While this analysis focused on the behavioral contrast between user types, a preliminary check of trip duration vs. distance suggests that casual riders often engage in non-linear, leisure paths, whereas members follow high-efficiency, predictable routes. This further supports the recommendation for targeted weekend leisure packages.