Overview

Cyclistic is a Chicago-based bike-share company that offers a variety of bike options to support inclusive and flexible transportation across the city. Since its launch, Cyclistic has experienced strong growth, supported by flexible pricing options that include single-ride passes, full-day passes, and annual memberships.

While these pricing options have helped attract a broad customer base, internal financial analysis has shown that annual members are significantly more profitable than casual riders. As a result, Cyclistic’s marketing team has identified membership growth as a key driver of long-term business success.

This case study analyzes historical Cyclistic bike-trip data to better understand how annual members and casual riders use Cyclistic bikes differently. The insights generated from this analysis will support data-driven marketing strategies aimed at converting casual riders into annual members.

Business Task

The primary business task for this analysis is to answer the following question:

How do annual members and casual riders use Cyclistic bikes differently?

Understanding these behavioral differences will allow the marketing team to design targeted campaigns that encourage casual riders, who are already familiar with the service, to transition to annual memberships.

Key Stakeholders

  • Lily Moreno, Director of Marketing

  • Cyclistic Marketing Analytics Team

  • Cyclistic Executive Leadership Team

Data Sources

This analysis uses publicly available Cyclistic bike-share trip data, made available by Motivate International Inc. under a public data license.

The dataset includes historical trip-level data such as:

  • Ride start and end timestamps

  • Ride duration (derived from the start and end timestamps)

  • Day of the week (derived from the start timestamp)

  • Rider type (casual rider or annual member)

To protect rider privacy, the data does not include personally identifiable information.

Analytical Approach

This analysis follows the six phases of the data analysis process:

1. Ask: Define the business question

2. Prepare: Identify and understand the data sources

3. Process: Clean and transform the data

4. Analyze: Identify trends and differences

5. Share: Communicate insights through visualizations

6. Act: Provide data-driven recommendations

Data Preparation and Processing

Libraries and Environment Setup

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)  # already attached by tidyverse 2.0.0; kept for explicitness

Loading Prepared Data

The dataset used in this analysis was previously downloaded, standardized, and saved as a single object to ensure efficient and reproducible analysis.

load("cyclistic_12_months.RData")
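
Because load() restores objects under whatever names they were saved with, it can help to confirm what was actually loaded before continuing. The following is a minimal sketch using a throwaway file and a toy data frame. The object name toy_trips and the expected_cols vector are assumptions based on the columns used later in this analysis, not a documented schema.

```r
# Toy stand-in for the real data; the column names mirror those used later
# in this analysis and are an assumption about the saved object.
toy_trips <- data.frame(
  started_at = "2023-01-01 08:00:00",
  ended_at = "2023-01-01 08:12:00",
  member_casual = "member",
  start_station_name = "A",
  end_station_name = "B"
)

tmp_file <- tempfile(fileext = ".RData")
save(toy_trips, file = tmp_file)

# load() returns the names of the restored objects (invisibly);
# loading into a fresh environment avoids overwriting existing objects.
check_env <- new.env()
loaded_names <- load(tmp_file, envir = check_env)

# Compare the columns we expect against what the loaded object contains.
expected_cols <- c("started_at", "ended_at", "member_casual",
                   "start_station_name", "end_station_name")
missing_cols <- setdiff(expected_cols, names(check_env[[loaded_names[1]]]))

loaded_names   # names of the objects stored in the file
missing_cols   # empty when every expected column is present
```

The same check could be pointed at the real file by replacing tmp_file with its path.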

Data Cleaning and Feature Engineering

To prepare the data for analysis, date-time fields are standardized and new variables are created to support comparison across rider types.

  • Trip start and end times are converted to datetime format

  • Trip duration is calculated in minutes

  • Day of week is derived from the trip start time

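cyclistic_12_months <- cyclistic_12_months %>%
  mutate(
    # Parse timestamps; values that fail to parse become NA
    started_at = ymd_hms(started_at),
    ended_at   = ymd_hms(ended_at),
    # Trip duration in minutes
    ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
    # Abbreviated weekday label (ordered factor) derived from the start time
    day_of_week = wday(started_at, label = TRUE)
  )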
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `started_at = ymd_hms(started_at)`.
## Caused by warning:
## !  26 failed to parse.
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
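
The warning above reports values that ymd_hms() could not parse, and those values become NA. One way to investigate is to compare the raw strings against the parsed result. The sketch below uses hypothetical raw values; the actual cause of the 26 failures is not known from the output, and date-only values are just one possible explanation.

```r
library(lubridate)

# Hypothetical raw values: one full timestamp, one date with no time part,
# and one value that is not a date at all.
raw_started_at <- c("2023-01-01 08:15:00", "2023-01-02", "not a date")

# quiet = TRUE suppresses the "failed to parse" warning for this check.
parsed <- ymd_hms(raw_started_at, quiet = TRUE)

# Values that were present in the raw data but became NA after parsing.
failed <- raw_started_at[is.na(parsed) & !is.na(raw_started_at)]
failed

# truncated = 3 lets ymd_hms() accept values missing up to three trailing
# components (hours, minutes, seconds), so date-only values parse as midnight.
reparsed <- ymd_hms(raw_started_at, truncated = 3, quiet = TRUE)
sum(is.na(reparsed))   # values that still fail even with truncation allowed
```

Inspecting the failed values first shows whether an option such as truncated is appropriate or whether the rows are simply malformed.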

Removing Invalid or Incomplete Records

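Trips with non-positive durations or missing station information are removed to improve data quality and ensure reliable analysis. Trips whose timestamps failed to parse have a missing ride_length and are dropped by the same filter, because filter() keeps only rows where every condition is TRUE.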

cyclistic_12_months <- cyclistic_12_months %>%
  filter(
    ride_length > 0,
    !is.na(start_station_name),
    !is.na(end_station_name)
  )
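
It can be useful to record how many rows a cleaning step removes, so the effect on the dataset is visible. The sketch below applies the same filter conditions to a toy table; the values are invented for illustration and say nothing about how many rows are removed from the real data.

```r
library(dplyr)

# Toy trips: one valid, one with a negative duration, one with an NA duration
# (as produced by an unparsed timestamp), and one missing its end station.
toy_trips <- tibble(
  ride_length = c(12.5, -3, NA, 8),
  start_station_name = c("A", "B", "C", "D"),
  end_station_name = c("B", "C", "D", NA)
)

rows_before <- nrow(toy_trips)

cleaned <- toy_trips %>%
  filter(
    ride_length > 0,            # also drops NA: filter() keeps only TRUE rows
    !is.na(start_station_name),
    !is.na(end_station_name)
  )

rows_removed <- rows_before - nrow(cleaned)
rows_removed   # 3 of the 4 toy rows are removed
```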

Analysis

Average Ride Length by Rider Type

avg_ride_length_by_type <- cyclistic_12_months %>%
  group_by(member_casual) %>%
  summarise(
    avg_ride_length = mean(ride_length),
    .groups = "drop"
  )

avg_ride_length_by_type
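
A mean can be pulled upward by a small number of very long rides, so reporting the median alongside it is a common sanity check. The sketch below shows the idea on invented numbers; whether the Cyclistic data is skewed in this way is not established here, and median_ride_length is a hypothetical column name.

```r
library(dplyr)

# Invented ride lengths (minutes); the casual group has one long outlier.
toy_trips <- tibble(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length = c(10, 20, 120, 8, 10, 12)
)

ride_length_summary <- toy_trips %>%
  group_by(member_casual) %>%
  summarise(
    avg_ride_length = mean(ride_length),
    median_ride_length = median(ride_length),
    .groups = "drop"
  )

ride_length_summary
# casual: mean 50, median 20; member: mean 10, median 10
```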

Number of Rides by Day of Week and Rider Type

rides_by_day <- cyclistic_12_months %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    num_rides = n(),
    .groups = "drop"
  )

rides_by_day
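
Raw counts are hard to compare across rider types when one group takes more rides overall. One option is to convert the counts into each day's share of that rider type's total. The sketch below uses an invented table with the same columns as rides_by_day; share_of_rides is a hypothetical column name.

```r
library(dplyr)

# Invented counts with the same columns as rides_by_day.
toy_rides_by_day <- tibble(
  member_casual = c("casual", "casual", "member", "member"),
  day_of_week = c("Sat", "Mon", "Sat", "Mon"),
  num_rides = c(30, 10, 20, 60)
)

ride_share_by_day <- toy_rides_by_day %>%
  group_by(member_casual) %>%
  # Each day's rides as a fraction of that rider type's total rides
  mutate(share_of_rides = num_rides / sum(num_rides)) %>%
  ungroup()

ride_share_by_day
# casual: Sat 0.75, Mon 0.25; member: Sat 0.25, Mon 0.75
```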

Visualizations

Total Rides by Rider Type

cyclistic_12_months %>%
  count(member_casual) %>%
  ggplot(aes(x = member_casual, y = n, fill = member_casual)) +
  geom_col() +
  labs(
    title = "Total Rides by Rider Type",
    x = "Rider Type",
    y = "Number of Rides"
  ) +
  theme_minimal()

Average Ride Length by Day of Week and Rider Type

cyclistic_12_months %>%
  group_by(member_casual, day_of_week) %>%
  summarise(
    avg_length = mean(ride_length),
    .groups = "drop"
  ) %>%
  ggplot(aes(x = day_of_week, y = avg_length, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average Ride Length by Day of Week and Rider Type",
    x = "Day of Week",
    y = "Average Ride Length (Minutes)"
  ) +
  theme_minimal()

Summary

The analysis compares casual riders and annual members on total ride volume, average ride length, and ride patterns across the days of the week. Differences along these dimensions give the marketing team a basis for targeted campaigns aimed at increasing annual memberships and supporting Cyclistic’s long-term growth objectives.