This is a Google Data Analytics Capstone project in which I work as a junior data analyst on the Cyclistic bike-share case study. Cyclistic operates in Chicago with over 5,800 bikes and nearly 700 docking stations, serving both casual riders and annual members. The bikes are used for leisure as well as daily commuting. The purpose of this analysis is to identify differences in how casual riders and annual members use the service. The marketing team believes that increasing annual memberships is essential for long-term business growth. To address this, I follow the six steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act. By analyzing historical ride data, I aim to uncover meaningful patterns that support data-driven marketing strategies and to recommend ways to convert casual riders into annual members.
Cyclistic started its bike-share program in 2016 and has grown to over 5,800 bikes and nearly 700 stations across Chicago. Bikes can be picked up at one station and returned to any other station at any time. Cyclistic offers flexible pricing plans: single-ride passes, day passes, and annual memberships. Riders using single-ride or day passes are called casual riders, while annual pass holders are members.
Financial analysis shows that annual members are more profitable than casual riders. The current marketing strategy focuses on general awareness and attracting a wide audience. The marketing director believes future growth depends on increasing annual memberships. Casual riders are already familiar with Cyclistic and use the service regularly. The goal is to convert casual riders into annual members. Historical trip data will be analyzed to understand rider behaviour and support targeted marketing strategies.
To understand how annual members and casual riders use Cyclistic bikes differently, I aim to answer the following questions:
How are total rides divided between the two rider types (member vs. casual)?
How do usage patterns (trip days, duration) differ between annual members and casual riders?
What times of day and days of the week are most popular for each rider type?
How does trip duration vary between members and casual riders?
Which stations are most commonly used by members and casual riders?
What patterns could help convert casual riders into annual members?
Help Cyclistic grow annual memberships by analyzing rider behaviour to identify patterns and provide data-driven marketing recommendations to convert casual riders into members.
The data for this analysis is publicly available from Motivate International Inc. I used Microsoft Excel and Posit’s RStudio, working with the Divvy 2019 Q1 and Divvy 2020 Q1 datasets to stay within the free plan’s memory limits. The datasets are named “Divvy” because Cyclistic is fictional, and all data is anonymized to protect rider privacy.
The data is organized in tables, with each row representing a bike trip and columns for trip ID, start and end times, duration, stations, bike type, and rider type. Some inconsistencies were found: column names differ between 2019 and 2020; only the 2019 dataset includes gender and birth year, while only the 2020 dataset includes station coordinates; trip duration is missing from the 2020 dataset but can be calculated from the start and end times; rider type labels vary (Casual/Customer, Member/Subscriber); and some values are blank, such as missing station names or IDs in the 2020 dataset and blank gender/birth year fields in the 2019 dataset.
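The rider-type mismatch can also be resolved in R with a small lookup table. The sketch below assumes the raw values are `Subscriber`/`Customer` in 2019 and `member`/`casual` in 2020; the helper name `standardize_rider_type` and the sample values are hypothetical. Any label outside the lookup becomes `NA`, which makes unexpected values easy to spot.

```r
# Hypothetical raw labels mixing the 2019 and 2020 schemes
rider_type <- c("Subscriber", "Customer", "member", "casual", "Subscriber")

# Hypothetical helper: map both schemes onto the 2020 labels; unknown labels become NA
standardize_rider_type <- function(x) {
  lookup <- c(Subscriber = "member", Customer = "casual",
              member = "member", casual = "casual")
  unname(lookup[x])
}

member_casual <- standardize_rider_type(rider_type)
print(member_casual)
```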
In the Process stage, Microsoft Excel and Posit’s RStudio were used to clean and prepare the datasets for analysis. Preliminary data-quality checks, such as identifying duplicates, missing entries, and errors, were conducted in Excel. After the data was corrected and formatted, it was imported into RStudio for more detailed processing and analysis.
Importing the Divvy trip data from the CSV files into R and merging the two files
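library(tidyverse)
# Read the two quarterly files (already cleaned and formatted in Excel)
q12019 <- read_csv("Divvy_Trips_2019_Q1.csv")
q12020 <- read_csv("Divvy_Trips_2020_Q1.csv")
# Parse the day-first timestamps. `format` must be passed by name:
# the second positional argument of as.POSIXct() is the time zone (`tz`)
q12019 <- q12019 %>%
  mutate(started_at = as.POSIXct(started_at, format = "%d/%m/%Y %H:%M:%S"),
         ended_at   = as.POSIXct(ended_at,   format = "%d/%m/%Y %H:%M:%S"),
         # ride_id is numeric in 2019 but character in 2020; align before binding
         ride_id    = as.character(ride_id))
q12020 <- q12020 %>%
  mutate(started_at = as.POSIXct(started_at, format = "%d/%m/%Y %H:%M:%S"),
         ended_at   = as.POSIXct(ended_at,   format = "%d/%m/%Y %H:%M:%S"))
# Stack the two quarters into one table and inspect the result
comb_trips <- bind_rows(q12019, q12020)
glimpse(comb_trips)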
## Rows: 784,260
## Columns: 10
## $ ride_id <chr> "21742443", "21742444", "21742445", "21742446", "21…
## $ started_at <dttm> 0001-01-20, 0001-01-20, 0001-01-20, 0001-01-20, 00…
## $ ended_at <dttm> 0001-01-20, 0001-01-20, 0001-01-20, 0001-01-20, 00…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "State St & Randolph St",…
## $ start_station_id <dbl> 199, 44, 15, 123, 173, 98, 98, 211, 150, 268, 299, …
## $ end_station_name <chr> "Milwaukee Ave & Grand Ave", "Dearborn St & Van Bur…
## $ end_station_id <dbl> 84, 624, 644, 176, 35, 49, 49, 142, 148, 141, 295, …
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ duration <dbl> 390, 441, 829, 1783, 364, 216, 177, 100, 1727, 336,…
## $ day <chr> "Tuesday", "Tuesday", "Tuesday", "Tuesday", "Tuesda…
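The `duration` and `day` columns appear to have been derived before import. As a hedged sketch of doing the same step in R, the snippet below parses day-first timestamps with a named `format` argument, computes ride length in seconds with `difftime()` (the unit the `duration / 60` conversion later assumes), and takes the weekday of the start time. The two rows are hypothetical, and `tz = "UTC"` is an assumption chosen to sidestep daylight-saving ambiguity.

```r
# Hypothetical rows in the same day-first format parsed above
trips <- data.frame(
  started_at = c("01/01/2020 00:04:37", "05/01/2020 23:50:00"),
  ended_at   = c("01/01/2020 00:11:07", "06/01/2020 00:10:00")
)
fmt <- "%d/%m/%Y %H:%M:%S"
# Name the `format` argument; passed positionally it would be read as `tz`
start <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
end   <- as.POSIXct(trips$ended_at,   format = fmt, tz = "UTC")
# Ride length in seconds
trips$duration <- as.numeric(difftime(end, start, units = "secs"))
# Day name of the ride start; weekdays() follows the session locale
trips$day <- weekdays(start)
print(trips)
```

If the session locale is not English, `format(start, "%u")` (ISO weekday, 1 = Monday) gives a locale-independent alternative to `weekdays()`.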
RStudio was used to analyze how casual riders and annual members use the bikes differently. Because the datasets cover only Quarter 1 (January to March), rides were aggregated by day of the week to compare weekday and weekend behaviour. This aggregation is key to identifying the usage patterns that distinguish the two rider types.
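One compact way to express the weekday-versus-weekend comparison is the share of each rider type's rides that fall on Saturday or Sunday. The sketch below uses base R's `tapply()` on six hypothetical rows; on the real data the same two lines could run on `comb_trips$day` and `comb_trips$member_casual`.

```r
# Hypothetical (rider type, day) pairs shaped like comb_trips
rides <- data.frame(
  member_casual = c("member", "member", "member", "casual", "casual", "casual"),
  day = c("Monday", "Tuesday", "Saturday", "Saturday", "Sunday", "Friday")
)
# Flag weekend rides, then average the flag within each rider type
rides$weekend <- rides$day %in% c("Saturday", "Sunday")
weekend_share <- tapply(rides$weekend, rides$member_casual, mean)
print(round(weekend_share, 3))
```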
With this approach in mind, the following metrics were computed for use in the visualizations and recommendations.
# Number of rides per rider type
rider_ratio <- comb_trips %>%
  group_by(member_casual) %>%
  summarize(total = n())
# Ride-level subsets for each rider type (counted by day in the plots below)
casual_rides_by_day <- comb_trips %>%
  filter(member_casual == "casual")
member_rides_by_day <- comb_trips %>%
  filter(member_casual == "member")
# Average and total ride duration in minutes (duration is stored in seconds)
duration_trip <- comb_trips %>%
  mutate(duration = duration / 60) %>%
  group_by(member_casual, day) %>%
  summarize(avg_duration = mean(duration),
            total_duration = sum(duration),
            .groups = "drop")
# Rides per rider type at the busiest start stations overall
# (20 rows, i.e. 10 stations if each has both casual and member rides)
top10_stations <- comb_trips %>%
  group_by(start_station_name, member_casual) %>%
  summarise(rides = n(), .groups = "drop") %>%
  group_by(start_station_name) %>%
  mutate(total_rides = sum(rides)) %>%
  arrange(desc(total_rides)) %>%
  head(20) %>%
  ungroup()
# Ten most common start stations for casual riders
top10_casual <- comb_trips %>%
  filter(member_casual == "casual") %>%
  group_by(start_station_name) %>%
  summarise(c_rides = n(), .groups = "drop") %>%
  arrange(desc(c_rides)) %>%
  head(10)
# Ten most common start stations for annual members
top10_member <- comb_trips %>%
  filter(member_casual == "member") %>%
  group_by(start_station_name) %>%
  summarise(m_rides = n(), .groups = "drop") %>%
  arrange(desc(m_rides)) %>%
  head(10)
The following visualizations are used to share the findings. They are designed to present the data clearly, so overly complex plots and excessive use of color have been avoided.
# Share of rides by rider type, drawn as a pie chart
rider_ratio <- rider_ratio %>%
  mutate(percentage = total / sum(total) * 100)
ggplot(rider_ratio, aes(x = "", y = total, fill = member_casual)) +
  geom_col() +
  coord_polar(theta = "y") +
  geom_text(aes(label = paste0(round(percentage), "%")),
            position = position_stack(vjust = 0.5)) +
  labs(title = "Ratio of Rides by User Type") +
  theme_void()
Annual members account for ~91% of all rides, making them by far the most frequent users. Casual riders make up only ~9% of rides, which suggests their usage is occasional or recreational. Note: this split is based on ride counts, not unique riders (the trip data has no rider identifier), so it does not reflect the actual numbers of members and casual riders.
# Order days Monday to Sunday instead of alphabetically
casual_rides_by_day$day <- factor(casual_rides_by_day$day,
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(casual_rides_by_day, aes(x = day)) +
  geom_bar(fill = "#ff7f0e") +
  labs(title = "Rides by Day for Casual Riders", x = "Day", y = "Number of Rides") +
  theme_minimal()
For casual riders, weekday usage is moderate: each weekday from Monday to Friday accounts for roughly 5,500 to 8,000 rides across the two Q1 periods. Usage is fairly steady, rising slightly toward the end of the week (Friday). Weekend usage is much higher: Saturday rises to ~13,500 rides, and Sunday is the highest at ~18,000. This pattern suggests casual riders primarily use the bikes for leisure or weekend trips rather than daily commuting.
# Order days Monday to Sunday instead of alphabetically
member_rides_by_day$day <- factor(member_rides_by_day$day,
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(member_rides_by_day, aes(x = day)) +
  geom_bar(fill = "#77DD77") +
  labs(title = "Rides by Day for Annual Members", x = "Day", y = "Number of Rides") +
  theme_minimal()
For annual members, ride counts are consistently high on weekdays (Monday to Friday), at roughly 100,000–120,000 rides for each weekday across the two Q1 periods. Tuesday and Thursday are slightly higher than the other weekdays, which could be related to work schedules. Weekend usage is moderate: rides drop sharply to around 50,000 for each weekend day, roughly half of weekday usage. This pattern suggests members use the bikes primarily for commuting or other daily transportation.
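The bar heights sum every Monday, every Tuesday, and so on across both quarters, so they are not rides per calendar day. A hedged sketch of converting them: divide each weekday's ride total by the number of distinct dates falling on that weekday. The six dates are hypothetical; counting observed dates assumes every day in the period has at least one ride (otherwise count calendar days instead).

```r
# Hypothetical ride start dates: two Mondays (6 and 13 Jan 2020) and one Tuesday
start_date <- as.Date(c("2020-01-06", "2020-01-06", "2020-01-13",
                        "2020-01-07", "2020-01-07", "2020-01-07"))
# ISO weekday number (1 = Monday) keeps the result locale-independent
wday <- as.integer(format(start_date, "%u"))
rides_per_weekday <- table(wday)                       # total rides per weekday
days_observed <- table(wday[!duplicated(start_date)])  # distinct dates per weekday
avg_rides_per_day <- rides_per_weekday / days_observed
print(avg_rides_per_day)
```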
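# Order days Monday to Sunday; otherwise ggplot sorts them alphabetically
duration_trip$day <- factor(duration_trip$day,
  levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
ggplot(duration_trip, aes(x = day, y = avg_duration, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Average Duration by Day per Rider Type", x = "Day", y = "Average Duration (minutes)") +
  theme_minimal()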
Casual riders' average ride duration is far longer than members' on every day of the week, ranging from roughly 100 to over 120 minutes depending on the day. The peak is on Thursday, when the casual average exceeds 120 minutes, and casual durations vary more from day to day than members' do. Because these figures are means, a small number of very long trips can pull them upward, so they describe a typical casual ride only loosely.
Members are highly consistent: their average ride durations stay tightly clustered around 15–20 minutes. This suggests short, routine trips that fit predictably into daily schedules.
The Thursday peak for casual riders could reflect a mid-week leisure pattern, tourists or visitors planning activities, or promotional or weather-related effects.
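One way to check how much outliers drive the long casual averages is to compare the mean with the median after dropping impossible durations. The sketch below uses hypothetical values; treating non-positive durations as invalid is an assumption, not a rule from the original cleaning. On the real data, `aggregate(duration ~ member_casual + day, data = comb_trips, FUN = median)` would give the per-group medians for comparison.

```r
# Hypothetical ride durations in seconds: five ordinary rides, one multi-day
# outlier, and one negative value of the kind data errors may produce
duration_sec <- c(420, 600, 900, 1200, 1500, 259200, -60)
# Keep only positive durations and convert to minutes
valid_min <- duration_sec[duration_sec > 0] / 60
# The single outlier pulls the mean far above the typical ride; the median barely moves
duration_summary <- c(mean = mean(valid_min), median = median(valid_min))
print(round(duration_summary, 1))
```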
# Reverse the level order so the busiest station is drawn at the top
top10_casual$start_station_name <- factor(top10_casual$start_station_name,
  levels = rev(top10_casual$start_station_name))
ggplot(top10_casual, aes(x = c_rides, y = start_station_name)) +
  geom_col(fill = "#ff7f0e") +
  labs(title = "Top 10 Stations Used by Casual Riders", x = "Number of Rides", y = "Start Station")
# Reverse the level order so the busiest station is drawn at the top
top10_member$start_station_name <- factor(top10_member$start_station_name,
  levels = rev(top10_member$start_station_name))
ggplot(top10_member, aes(x = m_rides, y = start_station_name)) +
  geom_col(fill = "#77dd77") +
  labs(title = "Top 10 Stations Used by Annual Members", x = "Number of Rides", y = "Start Station")
The most commonly used start stations differ between casual riders and annual members. Membership marketing and conversion campaigns should therefore target the stations most used by casual riders. Further analysis of where these stations are located could clarify the purpose of casual riders' trips.
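To make the station comparison concrete, base R's set functions can split the two top-10 lists into stations shared by both groups and stations popular only with casual riders (the likeliest sites for conversion campaigns). The station names below are hypothetical placeholders; on the real data the inputs would be `as.character(top10_casual$start_station_name)` and `as.character(top10_member$start_station_name)`, converted because those columns are factors after the plotting step.

```r
# Hypothetical station names standing in for the two top-10 lists
casual_top <- c("Station A", "Station B", "Station C", "Station D")
member_top <- c("Station C", "Station E", "Station F", "Station A")
# Stations busy for both groups, in casual-ranking order
shared <- intersect(casual_top, member_top)
# Stations in the casual top list but not the member one
casual_only <- setdiff(casual_top, member_top)
print(shared)
print(casual_only)
```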
Target Casual Riders During Peak Weekend Usage. Casual riders are most active on weekends, when their engagement with the service, and likely its perceived value, is highest. Deploy push notifications and QR-code promotions at popular stations on Saturdays and Sundays.
Offer “Thursday-only membership incentives”. Casual riders show unusually long average ride durations on Thursdays, suggesting planned leisure or exploratory trips; this makes Thursday a good moment to pitch membership benefits.
Educate Casual Riders on Long-Ride Costs. Many casual riders may not realize how expensive extended rides can be, especially when they frequently exceed standard time limits. At ride completion, show a comparison of how long rides increase costs for casual riders compared with a membership.
Promote Premium Access at Busy Stations. Casual riders ride mostly on weekends, when bikes at popular stations may be in short supply. Market “priority bike access at high-demand stations” as a core membership benefit, especially at tourist and leisure-heavy locations. Premium access adds perceived value even if membership costs more.
Use Physical Signage and Deploy Ambassadors at Top Casual Stations. Place clear, benefit-focused signage at stations with high casual usage, emphasizing no pre-booking, faster access, and stress-free riding. Position trained ambassadors at these high-traffic stations on weekends to explain membership benefits.
Partner with Local Businesses. Create partnerships with restaurants, cafés, and attractions to offer member-only routes and exclusive perks for members.