This analysis explores how annual members and casual riders use Cyclistic bikes differently. The insights derived from this analysis will be used to design a marketing strategy aimed at converting casual riders into annual members.
The data used in this analysis is from Cyclistic’s historical trip data for Q1 2019 and Q1 2020, provided by Motivate International Inc.
The following packages are required for this analysis:
library(tidyverse)
library(lubridate)
library(scales)
The two CSV files, cleaned beforehand in Google Sheets, are imported into RStudio with read_csv().
trips_2019 <- read_csv("C:/Users/Jay/Downloads/Divvy_Trips_2019_Q1_cleaned.csv")
trips_2020 <- read_csv("C:/Users/Jay/Downloads/Divvy_Trips_2020_Q1_cleaned.csv")
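Instead of converting ride_id after import, readr can be told the column type up front, so the ID never passes through a numeric type. The sketch below is a minimal illustration only: it assumes readr 2.x (for the show_col_types argument), and the two-row file is a hypothetical stand-in written to a temporary path, not the real Divvy export.

```r
library(readr)

# Hypothetical two-row extract (not the real export), written to a temp file
path <- tempfile(fileext = ".csv")
writeLines(c("ride_id,member_casual",
             "21742443,member",
             "21742444,member"), path)

# Default type guessing reads numeric-looking IDs as <dbl>
guessed <- read_csv(path, show_col_types = FALSE)

# Declaring ride_id as character keeps it <chr>; other columns are still guessed
typed <- read_csv(path, col_types = cols(ride_id = col_character(),
                                         .default = col_guess()))

print(class(guessed$ride_id))
print(class(typed$ride_id))
```

With this approach the later mutate(ride_id = as.character(ride_id)) step would become unnecessary for that file.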
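Before they are combined with bind_rows(), the datasets need to be standardized:
# Standardize ride_id: bind_rows() cannot combine a numeric column with a
# character one, so convert the 2019 IDs to character
trips_2019 <- trips_2019 %>%
  mutate(ride_id = as.character(ride_id))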
# Select the columns common to both quarters
trips_2019_clean <- trips_2019 %>%
  select(ride_id, started_at, ended_at, start_station_id,
         start_station_name, end_station_id, end_station_name,
         member_casual, ride_length, day_of_week)

trips_2020_clean <- trips_2020 %>%
  select(ride_id, started_at, ended_at, start_station_id,
         start_station_name, end_station_id, end_station_name,
         member_casual, ride_length, day_of_week)
# Combine into a single dataset
all_trips <- bind_rows(trips_2019_clean, trips_2020_clean)
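The ride_id conversion above matters because bind_rows() checks column types and refuses to stack a numeric column on a character one. A minimal sketch of that behaviour, using hypothetical toy rows in place of the two quarterly files (the hexadecimal-style 2020 ID is invented for illustration):

```r
library(dplyr)

# Hypothetical toy rows: numeric IDs for 2019, a text ID for 2020
q1_2019 <- tibble(ride_id = c(21742443, 21742444), member_casual = "member")
q1_2020 <- tibble(ride_id = "EACB19130B0CDA4A", member_casual = "casual")

# Without standardizing, bind_rows() raises an incompatible-type error
bind_error <- tryCatch(bind_rows(q1_2019, q1_2020), error = function(e) e)
print(conditionMessage(bind_error))

# After converting ride_id to character, the rows stack cleanly
q1_2019 <- q1_2019 %>% mutate(ride_id = as.character(ride_id))
combined <- bind_rows(q1_2019, q1_2020)
print(combined)
```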
Additional columns are created to support the analysis:
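all_trips <- all_trips %>%
  mutate(
    # Parse timestamps recorded with or without seconds
    started_at = parse_date_time(started_at, orders = c("ymd HMS", "ymd HM")),
    # Hour the ride started, 0 to 23
    hour_of_day = hour(started_at),
    # ride_length is an hh:mm:ss time; as.numeric() returns seconds
    ride_length_mins = as.numeric(ride_length) / 60
  )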
glimpse(all_trips)
## Rows: 791,357
## Columns: 12
## $ ride_id <chr> "21742443", "21742444", "21742445", "21742446", "21…
## $ started_at <dttm> 2019-01-01 00:04:37, 2019-01-01 00:08:13, 2019-01-…
## $ ended_at <chr> "2019-01-01 0:11:07", "2019-01-01 0:15:34", "2019-0…
## $ start_station_id <dbl> 199, 44, 15, 123, 173, 98, 98, 211, 150, 268, 299, …
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "State St & Randolph St",…
## $ end_station_id <dbl> 84, 624, 644, 176, 35, 49, 49, 142, 148, 141, 295, …
## $ end_station_name <chr> "Milwaukee Ave & Grand Ave", "Dearborn St & Van Bur…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ ride_length <time> 00:06:30, 00:07:21, 00:13:49, 00:29:43, 00:06:04, …
## $ day_of_week <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ hour_of_day <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ ride_length_mins <dbl> 6.500000, 7.350000, 13.816667, 29.716667, 6.066667,…
sum(is.na(all_trips$started_at))
## [1] 0
The combined dataset contains 791,357 rows and 12 columns. The started_at column contains no NA values, confirming that every timestamp was parsed successfully.
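A zero NA count shows that parsing succeeded, but not that ride_length agrees with the timestamps. One possible spot check, sketched below, parses the first two trips from the glimpse() output, recomputes each duration from ended_at minus started_at, and compares it with ride_length_mins. The same parse_date_time() call could also be applied to ended_at, which the glimpse() output shows is still stored as <chr>. This assumes the hms package (installed with the tidyverse) for as_hms().

```r
library(lubridate)
library(hms)

# First two trips as they appear in the glimpse() output
started_raw <- c("2019-01-01 0:04:37", "2019-01-01 0:08:13")
ended_raw   <- c("2019-01-01 0:11:07", "2019-01-01 0:15:34")
ride_length <- as_hms(c("00:06:30", "00:07:21"))

# Same orders as the main pipeline: with seconds first, then without
orders <- c("ymd HMS", "ymd HM")
started_at <- parse_date_time(started_raw, orders = orders)
ended_at   <- parse_date_time(ended_raw, orders = orders)

# Duration recomputed from the timestamps, in minutes
recomputed_mins <- as.numeric(difftime(ended_at, started_at, units = "mins"))

# Duration from the precomputed ride_length column, in minutes
ride_length_mins <- as.numeric(ride_length) / 60

print(recomputed_mins)
print(ride_length_mins)
```

On the full data, the same comparison across all rows would flag any trips where the two durations disagree.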
This chart shows the total number of rides on each day of the week, split by rider type. Members ride most heavily Monday through Friday, while casual riders peak on weekends, suggesting that members primarily commute and casual riders ride for leisure.
all_trips %>%
  # day_of_week is coded 1 = Sunday through 7 = Saturday
  # (2019-01-01, a Tuesday, appears as 3 in the glimpse() output)
  mutate(day_of_week = factor(day_of_week,
                              levels = 1:7,
                              labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))) %>%
  group_by(member_casual, day_of_week) %>%
  summarise(total_rides = n(), .groups = "drop") %>%
  ggplot(aes(x = day_of_week, y = total_rides, fill = member_casual)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = c("casual" = "#E69F00", "member" = "#0072B2")) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Number of Rides by Day of Week",
    subtitle = "Cyclistic Q1 2019 & Q1 2020",
    x = "Day of Week",
    y = "Total Rides",
    fill = "Rider Type"
  ) +
  theme_minimal()
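Because members account for roughly ten times as many rides as casual riders, the casual segments of a stacked chart are small and the weekend pattern is easy to understate. Normalizing within each rider type makes the comparison direct. The sketch below shows one way to compute a weekend share per rider type; the trips are hypothetical toy rows, not Cyclistic data, and use the same 1 = Sunday, 7 = Saturday coding.

```r
library(dplyr)

# Hypothetical toy trips (not Cyclistic data): 1 = Sun ... 7 = Sat
trips <- tibble(
  member_casual = rep(c("member", "casual"), each = 4),
  day_of_week   = c(2, 3, 4, 1,   1, 7, 7, 4)
)

weekend_share <- trips %>%
  mutate(weekend = day_of_week %in% c(1, 7)) %>%   # Sunday or Saturday
  count(member_casual, weekend) %>%
  group_by(member_casual) %>%
  mutate(share = n / sum(n)) %>%                   # share within each rider type
  ungroup()

print(weekend_share)
```

Applied to all_trips, the same pipeline would give each rider type's weekend share on a common 0 to 1 scale.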
This chart compares the average (mean) ride duration of members and casual riders by day of week. Casual riders consistently take longer rides, averaging 32 to 40 minutes compared with 11 to 12 minutes for members, which further supports the leisure versus commute interpretation.
all_trips %>%
  mutate(day_of_week = factor(day_of_week,
                              levels = 1:7,
                              labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))) %>%
  group_by(member_casual, day_of_week) %>%
  summarise(avg_ride_length = mean(ride_length_mins), .groups = "drop") %>%
  ggplot(aes(x = day_of_week, y = avg_ride_length, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("casual" = "#E69F00", "member" = "#0072B2")) +
  labs(
    title = "Average Ride Length by Day of Week",
    subtitle = "Cyclistic Q1 2019 & Q1 2020",
    x = "Day of Week",
    y = "Average Ride Length (minutes)",
    fill = "Rider Type"
  ) +
  theme_minimal()
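The averages above are means, and a mean can be pulled upward by a small number of very long rides, if any exist in the data. Reporting the median alongside is one way to check that the casual versus member gap is not driven by outliers. The sketch below uses hypothetical ride lengths (not Cyclistic data), where one 600-minute casual ride stands in for a bike kept out for hours.

```r
library(dplyr)

# Hypothetical ride lengths in minutes (not Cyclistic data)
rides <- tibble(
  member_casual    = c(rep("casual", 4), rep("member", 4)),
  ride_length_mins = c(20, 25, 30, 600, 10, 11, 12, 13)
)

ride_summary <- rides %>%
  group_by(member_casual) %>%
  summarise(
    mean_mins   = mean(ride_length_mins),    # sensitive to the 600-minute ride
    median_mins = median(ride_length_mins),  # robust to it
    .groups = "drop"
  )

print(ride_summary)
```

Here the casual mean is 168.75 minutes while the median is 27.5, so a large mean-median gap in the real data would be worth investigating before quoting averages.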
This chart shows ride activity throughout the day for each rider type. Members show a clear double peak at 8AM and 5PM, consistent with commuting. Casual riders build gradually through the morning and peak between 1PM and 4PM, consistent with leisure riding.
all_trips %>%
  group_by(member_casual, hour_of_day) %>%
  summarise(total_rides = n(), .groups = "drop") %>%
  ggplot(aes(x = hour_of_day, y = total_rides, color = member_casual)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  # Label the two member peaks; these y positions are hard-coded,
  # so they need updating if the underlying data change
  annotate("text", x = 8, y = 81695, label = "8AM",
           color = "black", size = 4) +
  annotate("text", x = 17, y = 98843, label = "5PM",
           color = "black", size = 4) +
  scale_color_manual(values = c("casual" = "#E69F00", "member" = "#0072B2")) +
  scale_x_continuous(
    breaks = 0:23,
    labels = c("12AM", "1AM", "2AM", "3AM", "4AM", "5AM", "6AM", "7AM", "8AM", "9AM",
               "10AM", "11AM", "12PM", "1PM", "2PM", "3PM", "4PM", "5PM", "6PM", "7PM",
               "8PM", "9PM", "10PM", "11PM")
  ) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Rides by Hour of Day",
    subtitle = "Cyclistic Q1 2019 & Q1 2020",
    x = "Hour of Day",
    y = "Total Rides",
    color = "Rider Type"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
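The peak labels in the chart use hard-coded y values. An alternative is to find each rider type's busiest morning and afternoon hour from the summarised counts, so the labels follow the data. The sketch below illustrates this on hypothetical hourly totals (not Cyclistic data); splitting the day at noon is an assumption chosen to separate the two commuter peaks.

```r
library(dplyr)

# Hypothetical hourly totals (not Cyclistic data)
hourly <- tibble(
  member_casual = rep(c("casual", "member"), each = 4),
  hour_of_day   = rep(c(8, 12, 14, 17), times = 2),
  total_rides   = c(50, 90, 120, 100,   800, 400, 450, 950)
)

# Split the day at noon and keep the busiest hour in each half per rider type
peaks <- hourly %>%
  mutate(half = if_else(hour_of_day < 12, "AM", "PM")) %>%
  group_by(member_casual, half) %>%
  slice_max(total_rides, n = 1, with_ties = FALSE) %>%
  ungroup()

print(peaks)
```

The member rows of peaks could then be passed to geom_text() through its data argument in place of the two annotate() calls.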
Members dominate ridership: members account for 91% of all rides (720,126), compared with 9% (71,231) for casual riders, across Q1 2019 and Q1 2020 combined.
Different days of use: members ride most heavily Monday through Friday, while casual riders peak on weekends.
Casual riders take longer trips: casual riders average 32 to 40 minutes per ride, compared with 11 to 12 minutes for members.
Different times of day: members show commuter peaks at 8AM and 5PM, while casual riders peak between 1PM and 4PM.
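The ridership split in the first finding can be checked arithmetically from the two totals: they sum to the 791,357 rows reported by glimpse(), and the shares round to 91% and 9%. On the full dataset the same counts would come from count(all_trips, member_casual); the sketch below simply reuses the reported figures.

```r
# Ride totals by rider type, as reported in the findings
rides <- c(member = 720126, casual = 71231)

total <- sum(rides)                  # should match the glimpse() row count
share <- round(100 * rides / total)  # percentage of all rides, rounded

print(total)
print(share)
```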