Brief Description

This project analyzes how casual riders and annual members use Cyclistic bikes differently, using historical bike-share data. The goal is to uncover behavior patterns to support a targeted marketing strategy focused on converting casual riders into long-term members.

Scenario

You are a junior data analyst on Cyclistic’s marketing analytics team. Your manager, Lily Moreno, has tasked you with uncovering insights into rider behavior. Your findings will guide a strategic campaign aimed at increasing annual memberships—key to Cyclistic’s sustainable growth.

About the company

Cyclistic is a bike-share program in Chicago, founded in 2016. It offers a flexible pricing structure including single-ride, daily, and annual plans. The company has:

  • 5,824 GPS-enabled bikes
  • 692 docking stations
  • Inclusive bike options (hand tricycles, reclining bikes, cargo bikes)
  • A user base that includes both leisure riders and commuters

Annual members are more profitable than casual users, and converting casual riders is central to the marketing team’s new strategy.

ASK

Business Task:

The objective is to determine how annual members and casual riders use Cyclistic bikes differently. This analysis will identify trends in rider behavior—such as ride length, frequency, or preferred days—and clarify whether casual riders show habits that suggest they could become members. These findings will inform a targeted marketing campaign to increase annual memberships.

Why This Matters

Key Stakeholders
  • Lily Moreno (Director of Marketing): Needs insights to build campaigns that resonate with casual riders.
  • Cyclistic Marketing Analytics Team: Will support the analysis and contribute to strategy refinement.
  • Executive Team: Demands clear, data-backed recommendations before approving marketing decisions.
  • Casual Riders: Target group for conversion based on behavioral trends.
  • Annual Members: Provide baseline metrics for comparison.

PREPARE

Data Source

For this project, I’m working with two datasets:

  • Divvy_Trips_2019_Q1.csv
  • Divvy_Trips_2020_Q1.csv

These files contain historical trip records for a bike-share program in Chicago, made available by Motivate International Inc. under a public license. Although Cyclistic is fictional, these datasets are representative and appropriate for this analysis. They include key attributes such as rider type, ride duration, start and end times, and station IDs—all crucial for examining usage behavior.

Data Storage and Organization

I created a dedicated folder named cyclistic_case_study on my device to house all related materials. Within it, I organized:

  • original_data/ – where the raw .csv files live
  • processed_data/ – for cleaned and transformed data
  • outputs/ – visualizations and summary files

This structure helps me stay organized as I move through the steps of the analysis.
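
The folder layout above could also be created from R. The following is a minimal sketch, not the exact steps used for this project: `project_root` is a hypothetical location, and a temporary directory stands in for the real device path.

```r
# Sketch: create the project folders described above.
# `project_root` is a hypothetical path; a temp directory is used here
# so the example does not touch real project files.
project_root <- file.path(tempdir(), "cyclistic_case_study")
subfolders <- c("original_data", "processed_data", "outputs")

for (folder in subfolders) {
  # recursive = TRUE also creates project_root if it does not exist yet
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_root)
```

Pointing `project_root` at a real directory would make the same loop set up the actual project.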

ROCCC Check for Data Credibility

To ensure the data is suitable, I evaluated it using the ROCCC framework:

  • Reliable: The datasets come from a well-established data provider and were used by many analysts.

  • Original: I’m working with raw trip-level data, not summaries.

  • Comprehensive: The data covers the first quarter (January to March) of two consecutive years and contains a variety of variables useful for behavioral segmentation.

  • Current: Although the data is from 2019–2020, it reflects consistent trends that are still valuable for strategy planning.

  • Cited: Appropriate attribution is provided to Motivate International Inc.

Licensing, Privacy, and Accessibility

This data excludes personally identifiable information (PII), aligning with standard privacy practices. I’m focusing only on publicly available variables like user type and ride duration. All code and documentation are included in this RMarkdown file and rendered as an HTML report to ensure accessibility for stakeholders and collaborators.

To prepare the files:

# Install required packages (run once, then comment these lines out before knitting)
install.packages("tidyverse")   # Data manipulation and visualization (includes readr, dplyr, ggplot2)
install.packages("lubridate")   # Date-time handling and calculations
install.packages("janitor")     # Cleaning column names and tabulations
install.packages("here")        # Optional: easier file path management
install.packages("skimr")       # Optional: quick data overviews
install.packages("data.table")  # Optional: high-performance data operations
# Load the packages used in this report
library(tidyverse)   # Data wrangling and visualization (loads readr, dplyr, ggplot2)
library(lubridate)   # Working with date and time formats
library(janitor)     # Cleaning and standardizing column names
# Import the Divvy Q1 2019 and Q1 2020 datasets
# File names must match list.files() exactly; rename the downloaded files first if they differ
divvy_2019 <- read_csv("Divvy_Trips_2019_Q1.csv")
divvy_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")
# Get a quick snapshot of the data
glimpse(divvy_2019)
glimpse(divvy_2020)
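
Alongside `glimpse()`, a quick way to see how two data frames differ is to compare their column names directly. The sketch below uses small toy data frames; their column names are illustrative assumptions, not a guaranteed copy of the real file schemas.

```r
# Toy stand-ins for the two quarterly files (column names are assumptions)
toy_2019 <- data.frame(trip_id = 1:2,
                       usertype = c("Subscriber", "Customer"))
toy_2020 <- data.frame(ride_id = c("A1", "B2"),
                       member_casual = c("member", "casual"))

# Columns that appear in one data frame but not in the other
only_in_2019 <- setdiff(names(toy_2019), names(toy_2020))
only_in_2020 <- setdiff(names(toy_2020), names(toy_2019))

only_in_2019
only_in_2020
```

Any name listed in either result has to be renamed or dropped before the data frames can be stacked cleanly.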

PROCESS

Data Import and Structure Alignment

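The datasets for Divvy 2019 Q1 and Divvy 2020 Q1 were loaded and assigned to divvy_2019 and divvy_2020, respectively, and their column names and data types were reviewed for compatibility. To ensure smooth merging, the columns of divvy_2019 were renamed to match divvy_2020, and ride_id and rideable_type were converted to character in both data frames so that the columns stack cleanly. Both datasets were then merged into a unified data frame, all_trips, to enable consolidated analysis:

library(dplyr)       # Enables %>%, mutate(), select(), rename()
library(lubridate)   # Enables ymd_hms()
library(readr)       # Enables read_csv()
library(janitor)     # Enables clean_names(), remove_empty()

# Rename the 2019 columns to the 2020 names.
# Assumption: the 2019 file uses the legacy Divvy headers on the right-hand side;
# confirm with glimpse(divvy_2019) and adjust if they differ.
divvy_2019 <- divvy_2019 %>% 
  rename(
    ride_id = trip_id,
    rideable_type = bikeid,
    started_at = start_time,
    ended_at = end_time,
    start_station_name = from_station_name,
    start_station_id = from_station_id,
    end_station_name = to_station_name,
    end_station_id = to_station_id,
    member_casual = usertype
  ) %>%
  mutate(
    ride_id = as.character(ride_id),
    rideable_type = as.character(rideable_type)
  )

divvy_2020 <- divvy_2020 %>%
  mutate(
    ride_id = as.character(ride_id),
    rideable_type = as.character(rideable_type)
  )

# Stack the two quarters into one data frame
all_trips <- bind_rows(divvy_2019, divvy_2020)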

Within the member_casual column, varied labels such as “Subscriber” and “Customer” were standardized into two groups—“member” and “casual”—to ensure consistency:

all_trips <- all_trips %>%
  mutate(member_casual = if_else(member_casual %in% c("Subscriber", "member"), "member", "casual"))
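
To illustrate the recoding, here is a small sketch on toy data (the four labels are the ones named above; the data frame itself is made up):

```r
library(dplyr)

# Toy data containing both the old and the new label styles
toy_trips <- tibble(
  member_casual = c("Subscriber", "Customer", "member", "casual")
)

# Same recoding rule as in the report: the two member spellings map to
# "member", everything else maps to "casual"
toy_trips <- toy_trips %>%
  mutate(member_casual = if_else(member_casual %in% c("Subscriber", "member"),
                                 "member", "casual"))

# Check that only two labels remain
table(toy_trips$member_casual)
```

One consequence of this rule is that any value outside the two member spellings, including a missing value, ends up as "casual", so it is worth running `table()` on the column before recoding to confirm that no unexpected labels are present.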

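Because the 2020 dataset has no built-in tripduration column, a new ride_length field was created by computing the time difference in seconds between ended_at and started_at, and ride_length_mins holds the same value converted to minutes. A day_of_week column was also derived from started_at, since the weekday analyses later in the report depend on it:

all_trips <- all_trips %>%
  mutate(
    started_at = ymd_hms(started_at),
    ended_at = ymd_hms(ended_at),
    ride_length = as.numeric(difftime(ended_at, started_at, units = "secs")),
    ride_length_mins = round(ride_length / 60, 2),
    day_of_week = format(started_at, "%A")   # Weekday name, e.g. "Monday" (English locale assumed)
  )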
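
As a worked example of the duration arithmetic, the sketch below applies the same calls to a single made-up trip (the timestamps are illustrative, not taken from the data):

```r
library(lubridate)

# One hypothetical trip: 12 minutes and 30 seconds long
started_at <- ymd_hms("2020-01-01 08:00:00")
ended_at   <- ymd_hms("2020-01-01 08:12:30")

# Duration in seconds, then converted to minutes
ride_length      <- as.numeric(difftime(ended_at, started_at, units = "secs"))
ride_length_mins <- round(ride_length / 60, 2)

ride_length       # 750
ride_length_mins  # 12.5
```

Setting `units = "secs"` explicitly matters because `difftime()` otherwise picks a unit automatically based on the size of the difference.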

Non-essential columns, such as latitude/longitude, birth year, gender, and trip duration, were removed to streamline the dataset:

all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))

To maintain analytical integrity, records with negative durations, durations under one minute, or flagged station names like “HQ QR” were filtered out. A new version of the dataset—all_trips_v2—was created to preserve original data:

all_trips_v2 <- all_trips %>%
  filter(
    ride_length >= 60,
    start_station_name != "HQ QR"
  )
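
The effect of the two filter conditions can be seen on a toy data frame (values invented for illustration). The sketch also shows a side effect that may or may not be intended: rows with a missing station name are dropped as well, because a comparison against NA is neither TRUE nor FALSE.

```r
library(dplyr)

# Five hypothetical rides covering each case
toy_trips <- tibble(
  ride_length        = c(-30, 45, 600, 900, 1200),
  start_station_name = c("A", "B", "HQ QR", NA, "C")
)

toy_clean <- toy_trips %>%
  filter(
    ride_length >= 60,              # drops negative and sub-minute rides
    start_station_name != "HQ QR"   # drops HQ QR rows, and NA names too
  )

nrow(toy_clean)  # 1: only the 1200-second ride from station "C" remains
```

If rides with a missing start station should be kept, the second condition could be written as `start_station_name != "HQ QR" | is.na(start_station_name)`.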

ANALYZE

Overview and Strategy

A step-by-step descriptive analysis of ride behavior was carried out in RStudio. The goal is to understand how casual riders and annual members differ in ride length and temporal patterns, laying the groundwork for targeted marketing initiatives. Each section below explains the rationale, followed by the RMarkdown chunk used. The results feed directly into the visualizations in the Share phase.

Objectives

  • Descriptive Statistics for Ride Length
  • Comparing Ride Length Between Rider Types
  • Average Ride Time by Day of Week

Descriptive Statistics for Ride Length

This step measures the central tendency and dispersion of ride_length (in seconds) across all trips. These metrics describe the typical ride and help identify outliers.

# Straightforward summary of ride_length
mean(all_trips_v2$ride_length)   
median(all_trips_v2$ride_length) 
max(all_trips_v2$ride_length)    
min(all_trips_v2$ride_length)    
# Condensed summary
summary(all_trips_v2$ride_length)

Comparing Ride Length Between Rider Types

Compare members vs. casual riders on mean, median, max, and min ride lengths. This reveals whether one group tends to take longer or shorter trips.

aggregate(ride_length ~ member_casual, data = all_trips_v2, FUN = mean)
aggregate(ride_length ~ member_casual, data = all_trips_v2, FUN = median)
aggregate(ride_length ~ member_casual, data = all_trips_v2, FUN = max)
aggregate(ride_length ~ member_casual, data = all_trips_v2, FUN = min)
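
The four `aggregate()` calls can also be expressed as a single grouped summary, which returns all statistics in one table. This is an alternative sketch on toy data (the ride lengths are made up), not the approach used in the report:

```r
library(dplyr)

# Hypothetical ride lengths in seconds
toy_trips <- tibble(
  member_casual = c("casual", "casual", "member", "member", "member"),
  ride_length   = c(1800, 2400, 600, 720, 900)
)

# One row per rider type, one column per statistic
ride_summary <- toy_trips %>%
  group_by(member_casual) %>%
  summarise(
    mean_secs   = mean(ride_length),
    median_secs = median(ride_length),
    max_secs    = max(ride_length),
    min_secs    = min(ride_length),
    .groups = "drop"
  )

ride_summary
```

Having the statistics side by side makes the member versus casual comparison easier to read than four separate outputs.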

Average Ride Time by Day of Week

To uncover weekly patterns, we calculate the average ride length by member_casual and day_of_week. At this point, days may be alphabetically ordered rather than chronologically.

aggregate(ride_length ~ member_casual + day_of_week, data = all_trips_v2, FUN = mean)
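
The alphabetical ordering arises because weekday names stored as plain text sort like any other strings. Converting them to a factor with explicit levels gives a chronological order instead. A small sketch, assuming English weekday names and a Sunday-first week:

```r
# Desired chronological order (Sunday-first is an assumption)
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")

# A few weekday names stored as plain character strings
toy_days <- c("Wednesday", "Friday", "Monday", "Sunday")

# Character vectors sort alphabetically
sort(toy_days)

# A factor sorts by the order of its levels
ordered_days <- sort(factor(toy_days, levels = day_levels))
ordered_days
```

Applied to the day_of_week column, the same conversion makes grouped summaries and plots list the days in calendar order.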

Analysis of Ridership by Rider Type and Weekday

# Total rides by type and weekday
rides_by_day <- all_trips_v2 %>%
  group_by(member_casual, day_of_week) %>%
  summarise(total_rides = n(), .groups = "drop")

# Average ride duration (in minutes)
duration_by_day <- all_trips_v2 %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    avg_duration_min = mean(ride_length) / 60,
    .groups = "drop"
  )

SHARE

The core question driving this work, echoing Moreno’s recommendation, is: How do annual members and casual riders use Cyclistic bikes differently? Uncovering these behavioral distinctions lets us tailor marketing, pricing, and fleet strategies to boost member conversion and meet each group’s needs.

1. Total Rides per Rider Type

# Monthly ride counts per rider type
rides_by_month_type <- all_trips_v2 %>% 
  mutate(month = month(started_at, label = TRUE, abbr = TRUE)) %>% 
  group_by(month, member_casual) %>% 
  summarise(total_rides = n(), .groups = "drop")
rides_by_month_type$month <- factor(rides_by_month_type$month, levels = month.abb)
ggplot(rides_by_month_type, aes(month, total_rides, fill = member_casual)) + 
  geom_col(position = "dodge") + 
  geom_text(aes(label = total_rides), position = position_dodge(0.9), vjust = -0.5, size = 3, color = "black") + 
  scale_fill_manual(name = "Rider Type", values = c(casual = "#1f98b4", member = "#FF6F61")) + 
  # scales::comma avoids needing library(scales); the top expansion leaves room for the bar labels
  scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.1))) + 
  labs(title = "Total Rides per Rider Type", x = "Month", y = "Number of Rides", 
       caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"') + 
  theme_minimal() + 
  theme(legend.position = "right", plot.margin = margin(10, 10, 20, 10), 
        axis.text.x = element_text(angle = 0, hjust = 0.5))

Key insights:

2. Average Ride Duration by Weekday and Rider Type

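# Put weekdays in chronological order before plotting (they sort alphabetically otherwise)
duration_by_day$day_of_week <- factor(duration_by_day$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
ggplot(duration_by_day, aes(x = day_of_week, y = avg_duration_min, color = member_casual, group = member_casual)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  scale_color_manual(values = c(casual = "#1f78b4", member = "#33a02c")) +
  scale_y_continuous(limits = c(0, 150), breaks = seq(0, 150, by = 20), expand = c(0, 0)) +
  labs(title = "Average Ride Duration by Weekday and Rider Type", x = "Day of Week", y = "Average Duration (minutes)",
       color = "Rider Type", caption = 'Analysis of data from "Divvy_Trips_2019_Q1" and "Divvy_Trips_2020_Q1"') +
  theme_minimal()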

Key insights:

3. Heatmap of Total Rides by Weekday and Rider Type

library(dplyr)
library(ggplot2)

# Order weekdays chronologically (Sunday first)
all_trips_v2$day_of_week <- factor(
  all_trips_v2$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
)

# Ride counts per rider type and weekday
rides_heatmap <- all_trips_v2 %>%
  group_by(member_casual, day_of_week) %>%
  summarise(total_rides = n(), .groups = "drop")

ggplot(rides_heatmap, aes(x = day_of_week, y = member_casual, fill = total_rides)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#fee5d9", high = "#de2d26", name = "Number of Rides") +
  labs(title = "Heatmap of Total Rides by Weekday and Rider Type", x = "Day of Week", y = "Rider Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Key Insights from the Heatmap:

Summary Across Visuals

| Category              | Casual Riders                                      | Members                                         |
|-----------------------|----------------------------------------------------|-------------------------------------------------|
| Ride Count            | ~10–15 % of total rides                            | ~85–90 % of total rides                         |
| Average Ride Duration | 35–45 min avg, longest on weekends                 | 13–15 min avg, steady across all weekdays       |
| Duration Peaks        | Highest on Saturday; widest gap on Sunday          | Small weekend uptick; peak duration on Tuesday  |
| Weekly Usage Pattern  | Light Mon–Thu; surge Fri–Sun                       | High Mon–Thu demand; tapering into the weekend  |
| Monthly Ride Trend    | Rapid growth in March (from ~12 k to ~40 k rides)  | Stable high counts Jan–Mar with a March uplift  |

Across these charts, casual riders form a smaller but fast-growing segment—favoring long, leisure-driven outings on weekends (especially in March)—whereas members conduct frequent, short trips peaking mid-week. Aligning marketing toward weekend casuals and reallocating bikes for commute versus leisure periods can unlock both operational efficiencies and membership growth.
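
Ride-count shares like those in the summary can be derived from grouped counts by dividing each group's total by the overall total. The sketch below shows the arithmetic with hypothetical counts chosen only for illustration; they are not results from the analysis.

```r
library(dplyr)

# Hypothetical ride counts per rider type (placeholders, not real results)
toy_counts <- tibble(
  member_casual = c("casual", "member"),
  total_rides   = c(120, 880)
)

# Each group's share of all rides, in percent
ride_share <- toy_counts %>%
  mutate(share_pct = round(100 * total_rides / sum(total_rides), 1))

ride_share
```

With the real data, `toy_counts` would be replaced by a `count(member_casual)` of the cleaned trips.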

ACT

Business Task (Follow up)

The task is to understand the distinct behaviors of casual riders and annual members on Cyclistic bikes, and to craft targeted marketing and operational strategies that convert more casual users into annual members while optimizing bike availability.

Deliverables

Each recommendation below maps directly to insights from our analysis and drives toward increasing annual memberships and improving resource allocation.

  1. Launch Spring Membership Trials
    Insight: Casual ridership spiked by over 170% from February to March.
    Action:
    • Offer a “March Membership Trial” at a discounted rate or risk-free one-month pass.
    • Promote via email blasts, in-app banners, and social media ads starting late February.
    Outcome: Capture casual users during their peak engagement window and guide them toward full memberships.
  2. Introduce a “Leisure Rider” Membership Plan
    Insight: Casual riders average 35–45 minutes per trip, 3–4× longer than members.
    Action:
    • Create a plan with discounted hourly rates, unlimited weekend rides, and rollover minutes.
    • Highlight flexible, cost-effective pricing for riders who exceed normal trip durations.
    Outcome: Align product features with casual users’ preferences, reducing their per-ride cost and incentivizing membership sign-up.
  3. Weekend-Focused Digital Campaigns
    Insight: Casual usage surges Friday through Sunday, while members peak midweek.
    Action:
    • Deploy geo-targeted push notifications and social ads on Friday afternoons promoting weekend-only membership perks.
    • Offer a “Weekend Warrior” promo code redeemable via the Cyclistic app.
    Outcome: Increase trial uptake when casual riders are most active and prime for conversion.
  4. Station-Level QR Codes & Geo-Ads in Leisure Hotspots
    Insight: Casual riders gravitate toward high-traffic tourist and park areas on weekends.
    Action:
    • Install station signage with QR codes linking to a one-click membership trial.
    • Run location-based ads around popular weekend start stations.
    Outcome: Deliver conversion messaging at the point of decision, boosting relevance and uptake.
  5. Personalized Cost-Savings Nudges
    Insight: Casual riders’ cumulative ride time (especially in March) rivals or exceeds member totals.
    Action:
    • Generate automated emails or in-app messages showing each user’s monthly ride minutes and potential savings as a member.
    • Include clear visual comparisons (e.g., “You spent 180 minutes riding. Here’s what you could’ve saved”).
    Outcome: Leverage real usage data to create an emotional incentive for membership, driving higher conversion rates.
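
A month-over-month growth figure such as the one in the first recommendation is a relative change: the difference between the two months divided by the earlier month. The sketch below shows the calculation with placeholder counts; `feb_rides` and `mar_rides` are hypothetical values, not results from the analysis.

```r
# Hypothetical monthly ride counts (placeholders, not real results)
feb_rides <- 1000
mar_rides <- 2700

# Percentage change from February to March
pct_change <- 100 * (mar_rides - feb_rides) / feb_rides

pct_change  # 170
```

Note that a 170% increase means March is 2.7 times February, not 1.7 times.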

Conclusion

By aligning promotional timing, product design, and messaging with the unique habits of casual riders, Cyclistic can convert a greater share of high-value, leisure-focused users into annual members. These data-driven tactics not only promise to grow membership numbers but also ensure bikes are deployed where and when demand is highest—maximizing both revenue and customer satisfaction.