This case study is the Google Data Analytics Certificate capstone project. I am acting as a junior data analyst at Cyclistic, a fictional bike-share company based in Chicago.
Cyclistic offers a fleet of bikes that can be unlocked from one station and returned to any other station in the network. Riders can purchase:
The marketing director believes that converting casual riders into annual members is key to Cyclistic’s future growth. Three questions will guide the future marketing program:
I have been assigned the first question as my focus:
How do annual members and casual riders use Cyclistic bikes differently?
The insights from this analysis will inform a marketing strategy to convert casual riders into annual members.
Data was sourced from Divvy trip data, made available by Motivate International Inc. under a public license. The dataset covers 12 months of ride data (February 2025 – January 2026), containing over 5.5 million records.
Each monthly CSV file was loaded separately and combined into one dataframe.
m01 <- read_csv("../Cyclistic_Analysis/csv_data/202501-divvy-tripdata.csv")
m02 <- read_csv("../Cyclistic_Analysis/csv_data/202502-divvy-tripdata.csv")
m03 <- read_csv("../Cyclistic_Analysis/csv_data/202503-divvy-tripdata.csv")
m04 <- read_csv("../Cyclistic_Analysis/csv_data/202504-divvy-tripdata.csv")
m05 <- read_csv("../Cyclistic_Analysis/csv_data/202505-divvy-tripdata.csv")
m06 <- read_csv("../Cyclistic_Analysis/csv_data/202506-divvy-tripdata.csv")
m07 <- read_csv("../Cyclistic_Analysis/csv_data/202507-divvy-tripdata.csv")
m08 <- read_csv("../Cyclistic_Analysis/csv_data/202508-divvy-tripdata.csv")
m09 <- read_csv("../Cyclistic_Analysis/csv_data/202509-divvy-tripdata.csv")
m10 <- read_csv("../Cyclistic_Analysis/csv_data/202510-divvy-tripdata.csv")
m11 <- read_csv("../Cyclistic_Analysis/csv_data/202511-divvy-tripdata.csv")
m12 <- read_csv("../Cyclistic_Analysis/csv_data/202512-divvy-tripdata.csv")
# Combine into one dataframe
all_trips <- bind_rows(m01, m02, m03, m04, m05, m06, m07, m08, m09, m10, m11, m12)## Rows: 5,552,092
## Columns: 13
## $ ride_id <chr> "FD0EB1D32AF0D47E", "FB27405C3F8C824F", "6FAFA17094…
## $ rideable_type <chr> "classic_bike", "classic_bike", "electric_bike", "e…
## $ started_at <dttm> 2026-01-31 09:13:09, 2026-01-15 14:25:42, 2026-01-…
## $ ended_at <dttm> 2026-01-31 09:28:10, 2026-01-15 14:33:18, 2026-01-…
## $ start_station_name <chr> "Central St & Girard Ave", "Shore Dr & 55th St", "H…
## $ start_station_id <chr> "CHI02042", "CHI00394", "CHI02087", "CHI00286", "CH…
## $ end_station_name <chr> "Dodge Ave & Church St", "Woodlawn Ave & 55th St", …
## $ end_station_id <chr> "CHI00741", "CHI00423", NA, NA, NA, NA, NA, NA, NA,…
## $ start_lat <dbl> 42.06431, 41.79521, 41.93247, 41.89453, 41.88166, 4…
## $ start_lng <dbl> -87.68615, -87.58071, -87.64242, -87.65341, -87.641…
## $ end_lat <dbl> 42.04831, 41.79526, 41.94000, 41.83000, 41.89000, 4…
## $ end_lng <dbl> -87.69822, -87.59647, -87.64000, -87.67000, -87.630…
## $ member_casual <chr> "member", "casual", "member", "member", "member", "…
The combined dataset contains 5,552,092 rows and 13 columns.
## ride_id rideable_type started_at ended_at
## 0 0 0 0
## start_station_name start_station_id end_station_name end_station_id
## 1186374 1186374 1246827 1246827
## start_lat start_lng end_lat end_lng
## 0 0 5670 5670
## member_casual
## 0
Key observations on missing data:
start_station_name and start_station_id:
1,186,374 missing — mostly electric bikes that do not require docking at
a named station. These were retained as they represent
valid rides.end_station_name and end_station_id:
1,246,827 missing — same reason as above.
Retained.end_lat and end_lng: 5,670 missing — GPS
did not record an end location. Since no map analysis was performed,
these were retained.New columns were added to support analysis:
all_trips <- all_trips %>%
mutate(
ride_length = ended_at - started_at,
ride_length_mins = as.numeric(ride_length) / 60,
day_of_week = wday(started_at, label = TRUE),
month = month(started_at, label = TRUE),
year = year(started_at)
)
colnames(all_trips)## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual" "ride_length" "ride_length_mins"
## [16] "day_of_week" "month" "year"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -54.795 5.399 9.435 16.097 16.582 1574.900
The summary reveals negative ride lengths (minimum:
-54.8 minutes), indicating data errors where ended_at was
recorded before started_at. These are impossible in real
life and must be removed.
Rides under 1 minute (including all negative values) were removed as they represent either data errors or false starts.
## [1] 148401
# Create clean dataframe
all_trips_V2 <- all_trips %>%
filter(ride_length_mins >= 1)
glimpse(all_trips_V2)## Rows: 5,403,691
## Columns: 18
## $ ride_id <chr> "FD0EB1D32AF0D47E", "FB27405C3F8C824F", "6FAFA17094…
## $ rideable_type <chr> "classic_bike", "classic_bike", "electric_bike", "e…
## $ started_at <dttm> 2026-01-31 09:13:09, 2026-01-15 14:25:42, 2026-01-…
## $ ended_at <dttm> 2026-01-31 09:28:10, 2026-01-15 14:33:18, 2026-01-…
## $ start_station_name <chr> "Central St & Girard Ave", "Shore Dr & 55th St", "H…
## $ start_station_id <chr> "CHI02042", "CHI00394", "CHI02087", "CHI00286", "CH…
## $ end_station_name <chr> "Dodge Ave & Church St", "Woodlawn Ave & 55th St", …
## $ end_station_id <chr> "CHI00741", "CHI00423", NA, NA, NA, NA, NA, NA, NA,…
## $ start_lat <dbl> 42.06431, 41.79521, 41.93247, 41.89453, 41.88166, 4…
## $ start_lng <dbl> -87.68615, -87.58071, -87.64242, -87.65341, -87.641…
## $ end_lat <dbl> 42.04831, 41.79526, 41.94000, 41.83000, 41.89000, 4…
## $ end_lng <dbl> -87.69822, -87.59647, -87.64000, -87.67000, -87.630…
## $ member_casual <chr> "member", "casual", "member", "member", "member", "…
## $ ride_length <drtn> 901.284 secs, 456.328 secs, 404.350 secs, 1850.186…
## $ ride_length_mins <dbl> 15.021400, 7.605467, 6.739167, 30.836433, 78.447200…
## $ day_of_week <ord> Sat, Thu, Tue, Mon, Sat, Tue, Wed, Fri, Fri, Thu, T…
## $ month <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ year <dbl> 2026, 2026, 2026, 2026, 2026, 2026, 2026, 2026, 202…
148,401 rides were removed, leaving 5,403,691 clean records for analysis.
all_trips_V2 %>%
group_by(member_casual) %>%
summarise(average_ride_length = mean(ride_length_mins))## # A tibble: 2 × 2
## member_casual average_ride_length
## <chr> <dbl>
## 1 casual 23.5
## 2 member 12.7
Casual riders average 23.5 minutes per ride compared to 12.7 minutes for members — almost twice as long.
## # A tibble: 2 × 2
## member_casual n
## <chr> <int>
## 1 casual 1920972
## 2 member 3482719
Members take significantly more rides than casual riders despite shorter individual ride durations.
## # A tibble: 14 × 3
## member_casual day_of_week n
## <chr> <ord> <int>
## 1 casual Sun 318480
## 2 casual Mon 220280
## 3 casual Tue 218800
## 4 casual Wed 212183
## 5 casual Thu 247670
## 6 casual Fri 307148
## 7 casual Sat 396411
## 8 member Sun 374492
## 9 member Mon 494937
## 10 member Tue 558517
## 11 member Wed 538722
## 12 member Thu 560394
## 13 member Fri 517005
## 14 member Sat 438652
all_trips_V2 %>%
group_by(member_casual, day_of_week) %>%
summarise(average_ride_length = mean(ride_length_mins), .groups = "drop")## # A tibble: 14 × 3
## member_casual day_of_week average_ride_length
## <chr> <ord> <dbl>
## 1 casual Sun 27.3
## 2 casual Mon 23.2
## 3 casual Tue 20.6
## 4 casual Wed 19.4
## 5 casual Thu 20.7
## 6 casual Fri 23.5
## 7 casual Sat 26.4
## 8 member Sun 13.9
## 9 member Mon 12.2
## 10 member Tue 12.2
## 11 member Wed 12.1
## 12 member Thu 12.2
## 13 member Fri 12.7
## 14 member Sat 13.9
## # A tibble: 24 × 3
## member_casual month n
## <chr> <ord> <int>
## 1 casual Jan 23964
## 2 casual Feb 27051
## 3 casual Mar 83064
## 4 casual Apr 105528
## 5 casual May 176162
## 6 casual Jun 279570
## 7 casual Jul 309293
## 8 casual Aug 324165
## 9 casual Sept 255223
## 10 casual Oct 214849
## 11 casual Nov 94930
## 12 casual Dec 27173
## 13 member Jan 109969
## 14 member Feb 122118
## 15 member Mar 208500
## 16 member Apr 257962
## 17 member May 314091
## 18 member Jun 379609
## 19 member Jul 430535
## 20 member Aug 443214
## 21 member Sept 441046
## 22 member Oct 414231
## 23 member Nov 252033
## 24 member Dec 109411
Based on the analysis, the following three recommendations are made to help Cyclistic convert casual riders into annual members:
Analysis shows casual riders peak on Saturdays (396,411 rides) and Sundays (318,480 rides). Cyclistic should launch targeted digital and in-app marketing campaigns from Thursday onwards to capture casual riders during their highest-activity period. Ads should highlight the cost savings of annual membership for riders who already ride frequently on weekends, showing them that a membership quickly pays for itself.
Casual riders average 23.5 minutes per ride but their weekday ridership is significantly lower than members. Marketing should reframe membership not just as a leisure upgrade, but as a practical tool for spontaneous weekday trips — avoiding car use for short errands, unexpected commutes, or emergency situations. Messaging such as “No car? No problem — your membership covers you any time” could resonate with this segment.
Casual ridership surges in summer (June–August) and drops sharply in winter. Cyclistic should launch discounted membership offers in March and April, just before the peak season begins, to convert casual riders into members before they reach their highest riding frequency. A limited-time spring offer could create urgency and capture riders already planning their cycling season.