Cyclistic Bike-Share Analysis

📍 Business Task

Cyclistic, a Chicago-based bike-share company, offers bikes to both casual riders and annual members. The marketing team aims to convert more casual riders into annual members to increase long-term revenue.

As a data analyst, my goal is to identify how annual members and casual riders use Cyclistic differently and to provide insights that support a targeted marketing strategy.

Key Question:
> How do annual members and casual riders differ in their bike usage patterns?

🗂️ Data Source Description

This analysis uses historical ride data provided by Cyclistic, a fictional bike-share program in Chicago. The dataset is part of the Google Data Analytics Capstone Case Study and reflects real-world data from Divvy, Chicago’s actual bike-share service.

The data covers two quarters: Q1 2019 and Q1 2020 (January through March of each year).

Each record represents a single bike trip and contains information such as:

  • User type: Whether the rider is a customer (casual) or subscriber (member)
  • Trip timing: Start and end times of the ride (2019 data also includes a precomputed trip duration)
  • Station data: Start and end stations (names and IDs)
  • Demographics: 2019 data contains gender and birth year
  • Location: 2020 data includes latitude and longitude

📝 Data Processing & Cleaning

To ensure consistency and comparability across the two datasets (2019 Q1 and 2020 Q1), several preprocessing steps were performed:

Key Steps:

  • Column Selection: Only relevant columns were retained: user type, trip times, station names/IDs, and trip duration.
  • Column Renaming: Different naming conventions (e.g., member_casual in 2020 vs. usertype in 2019) were standardized.
  • Trip Duration Calculation: The 2019 data already includes tripduration in seconds; for 2020 data, tripduration was computed from the start and end times, also in seconds, so both years share one unit.
  • User Type Harmonization: All user types were unified into two categories:
    • "Subscriber": Includes member (2020) and Subscriber (2019)
    • "Customer": Includes casual (2020) and Customer (2019)

⚠️ Additional Notes:

  • 2019 data contains demographic info (gender, birth year), while 2020 data includes station coordinates (lat/lng). Demographic analyses therefore use 2019 trips only, and location analyses use 2020 trips only.
  • Obvious anomalies like negative or extremely long durations were filtered out when analyzing trip time distributions.

Below is the code that performed this processing:

library(dplyr)

# Standardize the 2019 and 2020 column names before combining
data2019_clean <- data2019 %>% 
  select(
    usertype,
    start_time,
    end_time,
    from_station_name,
    to_station_name,
    from_station_id,
    to_station_id,
    tripduration        # 2019 files already report duration in seconds
  )

data2020_clean <- data2020 %>% 
  mutate(
    # 2020 has no duration column: derive it in seconds to match 2019
    tripduration = as.numeric(difftime(ended_at, started_at, units = "secs"))
  ) %>%
  select(
    usertype = member_casual,
    start_time = started_at,
    end_time = ended_at,
    from_station_name = start_station_name,
    to_station_name = end_station_name,
    from_station_id = start_station_id,
    to_station_id = end_station_id,
    tripduration
  )

# Merge and harmonize user types
data_all <- bind_rows(data2019_clean, data2020_clean) %>% 
  mutate(
    usertype = case_when(
      usertype %in% c("member", "Subscriber") ~ "Subscriber",
      usertype %in% c("casual", "Customer") ~ "Customer",
      TRUE ~ usertype
    )
  )
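The notes say trips with negative or extremely long durations were dropped before analyzing trip-time distributions, but the exact rule is not stated. A minimal sketch of such a filter follows; the helper name `filter_duration_anomalies`, the 24-hour ceiling, and the toy rows are assumptions for illustration, and it assumes `tripduration` is in seconds.

```r
library(dplyr)

# Hypothetical helper: keep only trips with a plausible duration.
# ASSUMPTIONS: tripduration is in seconds, and anything over 24 hours
# counts as an anomaly (the original cutoff is not stated).
filter_duration_anomalies <- function(df, max_secs = 24 * 60 * 60) {
  df %>%
    filter(
      !is.na(tripduration),
      tripduration > 0,          # drops zero and negative durations
      tripduration <= max_secs   # drops implausibly long trips
    )
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  usertype = c("Customer", "Subscriber", "Subscriber", "Customer"),
  tripduration = c(-30, 420, 90000, 1500)
)
filter_duration_anomalies(toy)
```

Calling it as `data_all %>% filter_duration_anomalies()` inside a plotting pipeline leaves `data_all` itself untouched, which fits the note that filtering was applied only for the distribution analysis.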

🔎 Analysis & Visualization

Gender vs User Type (2019)

  • Bar chart showing number of trips by gender and user type
  • Caption shows customer ratio by gender
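The customer ratio in the caption can be read as the share of trips taken by Customers within each gender. A minimal sketch, assuming the raw 2019 columns `gender` and `usertype` (values `Customer`/`Subscriber`) and dropping rows with no recorded gender; `customer_ratio_by_gender` is a hypothetical name and the toy rows are made up.

```r
library(dplyr)

# Hypothetical helper: trips and Customer share per gender (2019 only,
# since the 2020 files have no gender column)
customer_ratio_by_gender <- function(df) {
  df %>%
    filter(!is.na(gender)) %>%                      # skip unrecorded gender
    group_by(gender) %>%
    summarise(
      trips = n(),
      customer_ratio = mean(usertype == "Customer"),  # share of Customer trips
      .groups = "drop"
    )
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  gender = c("Male", "Male", "Male", "Female", "Female", NA),
  usertype = c("Subscriber", "Subscriber", "Customer",
               "Customer", "Subscriber", "Customer")
)
customer_ratio_by_gender(toy)
```

On the real data, `customer_ratio_by_gender(data2019)` would return one row per recorded gender.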

Insight: Female riders have a higher share of Customer trips than male riders, so promotions aimed at female casual riders may be an effective way to boost membership.

Age Group vs User Type (2019)

  • Bar chart of trip counts by 10-year birth cohort
  • Line chart showing customer ratio by age group
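A 10-year birth cohort can be formed by truncating the birth year to its decade (e.g. 1987 becomes 1980) before counting trips and computing the Customer share per cohort. This is a sketch under assumptions: it uses the raw 2019 `birthyear` and `usertype` columns, drops missing birth years, and the function name and toy rows are hypothetical.

```r
library(dplyr)

# Hypothetical helper: trips and Customer share per 10-year birth cohort
customer_ratio_by_cohort <- function(df) {
  df %>%
    filter(!is.na(birthyear)) %>%
    mutate(cohort = (birthyear %/% 10) * 10) %>%    # 1987 -> 1980, 2001 -> 2000
    group_by(cohort) %>%
    summarise(
      trips = n(),
      customer_ratio = mean(usertype == "Customer"),
      .groups = "drop"
    )
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  birthyear = c(1985, 1988, 1989, 2001, 2002, NA),
  usertype = c("Subscriber", "Subscriber", "Customer",
               "Customer", "Customer", "Subscriber")
)
customer_ratio_by_cohort(toy)
```

The `trips` column feeds the bar chart and `customer_ratio` the line chart.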

Insight: Riders born after 2000 show the highest Customer share (~87%), whereas those born in the 1980s are more likely to be Subscribers. This suggests that younger riders are more likely to ride occasionally. To convert them into members, Cyclistic may offer student/young adult discounts or flexible membership tiers.

Start Station Preference by User Type (2020)

  • Top 20 start stations by user type
  • Marker size shows trip volume
  • Color: pink = Customer, blue = Subscriber
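One way to build the table behind this map is to count trips per start station for each user type, keep the 20 busiest, and average each station's recorded coordinates to place its marker. The original map code is not shown; this sketch assumes the raw 2020 columns `member_casual`, `start_station_name`, `start_lat`, and `start_lng`, and the function name and toy rows are hypothetical.

```r
library(dplyr)

# Hypothetical helper: busiest start stations per user type, with coordinates
top_start_stations <- function(df, n_top = 20) {
  df %>%
    filter(!is.na(start_station_name)) %>%
    group_by(member_casual, start_station_name) %>%
    summarise(
      trips = n(),
      lat = mean(start_lat, na.rm = TRUE),   # average recorded position
      lng = mean(start_lng, na.rm = TRUE),
      .groups = "drop_last"                  # stays grouped by member_casual
    ) %>%
    slice_max(trips, n = n_top, with_ties = FALSE) %>%  # top n per user type
    ungroup()
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  start_station_name = c("A", "A", "B", "C", "C", "D"),
  start_lat = c(41.88, 41.90, 41.95, 41.80, 41.82, 41.85),
  start_lng = c(-87.62, -87.62, -87.65, -87.60, -87.60, -87.63)
)
top_start_stations(toy, n_top = 1)
```

`with_ties = FALSE` guarantees exactly `n_top` stations per user type. Mapping `trips` to point size and `member_casual` to colour (for example with ggplot2's `geom_point(aes(lng, lat, size = trips, colour = member_casual))`) would reproduce the encoding described above.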

Insight: Customer trips cluster around tourist areas and recreational hotspots, such as parks and waterfronts. In contrast, Subscriber rides are concentrated near downtown hubs and transit-heavy areas, indicating likely commuter behavior. This geographic segmentation offers opportunities for campaigns targeted by location and trip purpose (e.g., leisure vs. daily transit).

Average Trip Duration by User Type (2019 and 2020)

Average Duration Comparison

  • Bar chart with error bars showing mean trip durations for each user type
  • Customer trips are significantly longer than Subscriber trips on average
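The bars and error bars can be drawn from a per-user-type summary. The text does not say whether the error bars show the standard deviation or the standard error of the mean, so this sketch computes both; it assumes the combined `usertype` and `tripduration` columns, and the function name and toy values are hypothetical.

```r
library(dplyr)

# Hypothetical helper: mean trip duration per user type with SD and SE
duration_summary <- function(df) {
  df %>%
    filter(!is.na(tripduration)) %>%
    group_by(usertype) %>%
    summarise(
      trips = n(),
      mean_duration = mean(tripduration),
      sd_duration = sd(tripduration),            # spread of individual trips
      se_duration = sd_duration / sqrt(trips),   # uncertainty of the mean
      .groups = "drop"
    )
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  usertype = c("Customer", "Customer", "Customer", "Subscriber", "Subscriber"),
  tripduration = c(600, 1200, 1800, 300, 500)
)
duration_summary(toy)
```

With ggplot2, `geom_col(aes(usertype, mean_duration))` plus `geom_errorbar(aes(usertype, ymin = mean_duration - se_duration, ymax = mean_duration + se_duration))` would draw the chart. Because trip durations are strongly right-skewed, a median alongside the mean may also be worth reporting.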

Insight: Customers take much longer trips on average (~1266 seconds) than Subscribers (~402 seconds), suggesting they may be using bikes more for leisure or exploration, while Subscribers likely use them for commuting or short, routine tasks.

Trip Duration Distribution

  • Density plot compares the spread and frequency of trip durations by user type
  • X-axis is capped at 100 seconds to emphasize the core distribution
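A density plot like this can be sketched with ggplot2. This version uses `coord_cartesian()` to zoom the x-axis to the cap, so each curve is still estimated from every valid trip; filtering rows or using `xlim()` instead would drop trips beyond the cap and re-estimate the curves on what remains. Which approach the original used is not stated. The function name, the `x_cap` argument (in the same unit as `tripduration`), and the toy data are assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: overlaid trip-duration densities by user type
plot_duration_density <- function(df, x_cap = 100) {
  df %>%
    filter(!is.na(tripduration), tripduration > 0) %>%
    ggplot(aes(x = tripduration, fill = usertype)) +
    geom_density(alpha = 0.4) +
    # Zoom only: trips beyond x_cap still shape the density estimate
    coord_cartesian(xlim = c(0, x_cap)) +
    labs(x = "Trip duration", y = "Density", fill = "User type")
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  usertype = rep(c("Customer", "Subscriber"), each = 4),
  tripduration = c(20, 45, 80, 150, 5, 8, 9, 12)
)
p <- plot_duration_density(toy)
```

Printing `p` renders the plot; returning the object lets the caller add themes or save it with `ggsave()`.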

Insight: The distribution shows that most Subscriber trips are short and tightly clustered, peaking sharply below 10 minutes. In contrast, Customer trips have a flatter, wider spread with a longer right tail, indicating more variability and longer rides.

Temporal Usage Pattern Analysis (2019 and 2020)

To analyze how user behavior varies over time, we examined patterns by month, weekday, and hour of day. Because the full dataset was too large to process efficiently on a personal computer, we drew a random sample of 100,000 trips to keep code execution and visualization times manageable.

library(dplyr)
library(lubridate)   # month(), wday(), hour()

set.seed(42)
data_sample <- data_all %>% sample_n(100000)

data_sample <- data_sample %>%
  mutate(
    month = month(start_time, label = TRUE, abbr = FALSE),
    weekday = wday(start_time, label = TRUE, abbr = FALSE, week_start = 1),
    hour = hour(start_time)
  )
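The weekday/peak-hour and weekend patterns summarized in the conclusions can be quantified per user type from the sampled trips. A minimal sketch using lubridate; the commute windows (07:00 to 09:59 and 16:00 to 18:59 on weekdays), the function name, and the toy timestamps are assumptions, not part of the original analysis.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: weekend share and weekday-peak share per user type
temporal_profile <- function(df) {
  df %>%
    mutate(
      dow = wday(start_time, week_start = 1),   # Monday = 1 ... Sunday = 7
      hr = hour(start_time),
      is_weekend = dow >= 6,                     # Saturday or Sunday
      # ASSUMED commute windows, counted on weekdays only
      is_peak = !is_weekend & hr %in% c(7:9, 16:18)
    ) %>%
    group_by(usertype) %>%
    summarise(
      trips = n(),
      weekend_share = mean(is_weekend),
      weekday_peak_share = mean(is_peak),
      .groups = "drop"
    )
}

# Made-up toy rows, not Divvy data
toy <- data.frame(
  usertype = c("Subscriber", "Subscriber", "Customer", "Customer"),
  start_time = as.POSIXct(
    c("2020-01-06 08:15:00",    # Monday, morning peak
      "2020-01-07 17:30:00",    # Tuesday, evening peak
      "2020-01-04 13:00:00",    # Saturday afternoon
      "2020-01-06 12:00:00"),   # Monday midday
    tz = "UTC"
  )
)
temporal_profile(toy)
```

Applied to `data_sample`, the shares are estimates from the 100,000-trip sample rather than exact totals.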

🧾 Conclusion & Strategic Recommendations

Our analysis reveals clear and actionable differences between Customers (casual riders) and Subscribers (annual members), with key distinctions across time, geography, trip duration, and demographics:

🔍 Key Takeaways:

Temporal Usage:

  • Subscribers exhibit weekday, peak-hour usage patterns, aligning with work commutes.
  • Customers are weekend-heavy, with rides concentrated between late morning and afternoon, signaling recreational or leisure use.

Trip Duration:

  • Customers take significantly longer trips, further reinforcing non-commute, experience-driven motivations.

Geographic Patterns:

  • Customer trips cluster around tourist destinations and parks, while Subscribers favor urban centers and transit hubs.

Demographics:

  • Younger riders (born after 2000) and female users have a higher proportion of casual usage.
  • Subscribers tend to be older and predominantly male.

🚀 Recommendations for Converting Casual Riders into Annual Members

Based on the above findings, Cyclistic can implement the following targeted marketing strategies:

1. Turn Leisure into Loyalty

  • Roll out seasonal membership plans for warmer months when casual riding spikes.
  • Offer on-the-spot free trial rides in partnership with Chicago events, beaches, and tourist areas.

2. Geo-Based “Ride More, Save More” Nudges

  • Use geofenced ads, QR codes, and signs at high-traffic stations (e.g., Grant Park, Navy Pier).
  • Trigger pop-ups or push notifications promoting discounted memberships when a casual rider checks out a bike at those hubs.

3. Win Over Students & Young Riders

  • Launch student/under-25 discounted membership plans with easy sign-up via .edu emails.
  • Run short-form ads on Instagram, TikTok, and via local campus ambassadors.

4. Focus on Female Riders: Safety + Community

  • Promote the program with female-focused messaging: safety features, station lighting, flexible ride options.
  • Start women-led community rides or clubs and highlight them on social platforms.

5. Smart Retargeting Based on Ride Behavior

  • Segment casual users by ride frequency, duration, and time of day:
    • High-frequency casuals → offer a discounted annual plan
    • Infrequent users → “3 rides, 4th free” mini bundles