Cyclistic Bike Share Analysis

Introduction

As part of the Google Data Analytics Professional Certification program, participants have the option to complete a capstone project. Google provides two recommended case studies, both of which are based on open-source data sets.

I chose to work on the first case study, which focuses on a bike-sharing service in Chicago. The data used in this project is licensed by Bikeshare, an LLC operated by Lyft Bikes and Scooters, under the City of Chicago’s Divvy bicycle sharing service. To make the project more engaging, it uses the fictional company name ‘Cyclistic’.

The data analysis process will consist of the following steps: Ask, Prepare, Process, Analyze, Share, and Act.

About the Company

Cyclistic, a bike-sharing company in Chicago, launched its bike-share program in 2016 and now boasts a fleet of 5,824 geo-tracked bicycles across 692 stations in the city. The bikes can be rented and returned at any station in the system, making them a convenient transportation option for Chicagoans.

Cyclistic offers a range of pricing options, including single-ride passes, full-day passes, and annual memberships. Casual riders purchase single-ride or full-day passes, while annual members enjoy the benefits of unlimited rides throughout the year. Cyclistic’s finance analysts have determined that annual members are more profitable than casual riders, making it crucial to increase the number of annual memberships for future growth.

Scenario

In this scenario, I am a junior data analyst on the marketing team at Cyclistic, a bike-sharing company in Chicago. I am tasked with understanding how casual riders and annual members use Cyclistic bikes differently. The director of marketing believes that the company’s future success depends on maximizing the number of annual memberships, and our team wants to help achieve this goal by designing a new marketing strategy that converts casual riders into annual members. By analyzing the data, we hope to uncover insights that will inform our marketing decisions.

ASK

Business Task

Cyclistic’s bike-share is currently focused on converting casual riders into annual members by analyzing the distinct usage patterns of both groups. To accomplish this, I will analyze historical data and provide recommendations on how to facilitate this conversion.

Stakeholders

Lily Moreno: The director of marketing and my manager.

Cyclistic executive team: A detail-oriented executive team who will decide whether to approve the recommended marketing program.

Cyclistic marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Cyclistic’s marketing strategy.

Questions to be answered.

  1. How do annual members and casual riders use Cyclistic bikes differently?

  2. Why would casual riders buy Cyclistic annual memberships?

  3. How can Cyclistic use digital media to influence casual riders to become members?

Tools: I used Microsoft Excel for an initial look at the data, R for data cleaning and data visualization, and Tableau for the dashboard.

Dataset: Cyclistic’s historical trip data from May 2022 to April 2023 which can be found here

PREPARE

To analyze and identify trends for this project, I’ll be using Cyclistic’s historical trip data, which has been made available under license by Motivate International Inc. Therefore, I can rely on its integrity.

I downloaded the ZIP files containing the csv files from the above link. For the purpose of my analysis I will use the csv files from May 2022 to April 2023.

PROCESS

To get an initial look at the data, I used Microsoft Excel. Each month’s data is contained in a separate csv file, which includes information about the ride such as the ride id, rideable type, start and end time, start and end station, and latitude and longitude of the start and end stations.

Here I did some minor cleaning, formatted the columns, and saved the results in a new folder to keep my original files safe.

Then I moved to R for further cleaning and for the analyze phase.
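
As a minimal sketch of how the monthly CSV files could be brought into R in one pass, the snippet below reads every CSV in a folder and stacks the results. It is an illustration rather than the exact loading code used in this project: the helper name `read_monthly_trips` and the folder name in the example call are hypothetical, and it assumes all monthly files share the same columns.

```r
# Hypothetical helper: read every CSV in a folder and stack them into one data frame.
# Assumes all monthly files share the same column names.
read_monthly_trips <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) stop("No CSV files found in: ", folder)
  monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, monthly)
}

# Example call; "cleaned_csv" is a placeholder folder name.
# all_trips <- read_monthly_trips("cleaned_csv")
```

Reading the files in a loop avoids typing twelve object names by hand, at the cost of failing if one month's columns differ from the rest.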

ANALYZE

To analyze the vast amount of data available at the company, I chose to use R, a powerful tool that can handle large datasets with ease. Below is a brief summary of the steps I took to analyze the data. You can find the full process, including calculations, filtering, and more, on my GitHub page.

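library(dplyr)  # provides %>%, select(), group_by(), summarise()

# Combine the twelve monthly data frames into one
all_trips <- rbind(May_2022, June_2022, July_2022, August_2022,
                   September_2022, October_2022, November_2022, December_2022,
                   January_2023, February_2023, March_2023, April_2023)

# Drop the coordinate columns, which this analysis does not use
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng))

# all_trips_clean and riders come from cleaning steps in the full code, not shown here
all_trips_clean$ride_length <- difftime(all_trips_clean$ended_at, all_trips_clean$started_at)
nrow(riders)

# Ride count and percentage share for each customer type
riders %>%
  group_by(member_casual) %>%
  summarise(count = n(),
            "%" = (n() / nrow(riders)) * 100)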

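The plots in the SHARE section rely on `ride_length` and `day_of_week` columns. The actual derivation is part of the full code; purely as an illustration, columns like these could be built as sketched below. The helper name `add_ride_columns`, the timestamp format, the choice of minutes as the unit, and the rule that drops non-positive durations are all assumptions, not the steps actually used in this project.

```r
# Hypothetical helper: add ride_length (minutes) and day_of_week to trip data.
# Assumes started_at / ended_at are text like "2022-05-01 08:15:00".
add_ride_columns <- function(trips) {
  started <- as.POSIXct(trips$started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  ended   <- as.POSIXct(trips$ended_at,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

  # Duration of each ride in minutes, stored as a plain number
  trips$ride_length <- as.numeric(difftime(ended, started, units = "mins"))

  # Day of week as an ordered factor so plots run Sunday through Saturday.
  # Indexing by wday (0 = Sunday) avoids locale-dependent day names.
  day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday")
  trips$day_of_week <- factor(day_names[as.POSIXlt(started)$wday + 1],
                              levels = day_names, ordered = TRUE)

  # Assumed cleaning rule: drop rides with a missing or non-positive duration
  trips[!is.na(trips$ride_length) & trips$ride_length > 0, , drop = FALSE]
}
```

Storing the day as an ordered factor keeps the bars in calendar order rather than alphabetical order when the data is plotted.
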
SHARE

To visualize my findings, I utilized both R and Tableau. Tableau is an excellent tool for creating visually appealing and intuitive visualizations.

Here is what I found using R.

R

library(ggplot2)  # plotting
# Total rides for each customer type
ggplot(riders, aes(member_casual, fill = member_casual)) +
  geom_bar() +
  labs(title = "Total rides by Customer Type", x = "Customer Type") +
  scale_fill_manual("legend", values = c("casual" = "orange", "member" = "blue"))

Based on the graph above, casual riders took more rides than members over the past 12 months, as indicated by the ride counts and the corresponding percentages in the data.

# Average ride duration for each customer type.
# Grouping by customer type only gives one bar per type, matching the title.
riders %>%
  group_by(member_casual) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length),
            .groups = "drop") %>%
  ggplot(aes(x = member_casual, y = average_duration, fill = member_casual)) +
  geom_col() +
  labs(title = "Average Duration by Customer Type", x = "Customer Type") +
  scale_fill_manual("legend", values = c("casual" = "orange", "member" = "blue"))

From the above graph, we can see that members' average ride duration is longer than casual riders', even though there are more casual riders in total. Let us take a closer look to see if there is still a way to convert casual riders into members.

# Number of rides per customer type for each day of the week
riders %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  arrange(member_casual, day_of_week) %>%
  ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +  # one layer only; the duplicate geom_col was removed
  labs(title = "Total rides per Customer Type by Day of week") +
  scale_y_continuous(labels = function(x) format(x, scientific = FALSE)) +
  scale_fill_manual("legend", values = c("casual" = "orange", "member" = "blue"))

Based on the visualization, it is evident that both casual and member riders have the highest number of rides during the weekends, specifically on Saturdays and Sundays. Interestingly, Wednesdays show the lowest number of rides for casual riders, while members seem to remain consistent throughout the week. Additionally, both customer types share similar behaviors over the weekend. This information suggests that there may be an opportunity to convert weekend casual riders into members.

# Average trip duration per customer type for each day of the week
riders %>%
  group_by(member_casual, day_of_week) %>%
  summarise(average_trip_duration = mean(ride_length), .groups = "drop") %>%
  ggplot(aes(x = day_of_week, y = average_trip_duration, fill = member_casual)) +
  geom_col(position = "dodge") +  # one layer only; the duplicate geom_col was removed
  labs(title = "Average trip duration per Customer type by Day of week") +
  scale_fill_manual("legend", values = c("casual" = "orange", "member" = "blue"))

Based on the above graph, member riders have the longer average trip duration on every day of the week, with the highest durations on Saturdays and Sundays. In contrast, casual riders have shorter trip durations throughout the week. This pattern suggests that although there are more casual riders, they tend to ride for shorter periods of time. This may be a contributing factor to why they choose to remain casual riders.

Now, let’s examine the bike types and how riders use them.

riders %>%
  ggplot(aes(rideable_type, fill = member_casual)) +
  geom_bar()+
  labs(x="Bike Type", title= "Total rides by Bike Type")+
  scale_fill_manual("legend", values = c("casual" = "orange", "member" = "blue"))

Based on the graph above, it is clear that both customer types have a preference for classic and electric bikes over docked bikes. Additionally, it appears that docked bikes are primarily being used by casual riders, as member riders are not utilizing them.
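
The bike-type observation can be checked numerically as well as visually. The sketch below tabulates rides by bike type and customer type and reports each customer type's percentage share of every bike type. It runs on a made-up toy data frame rather than `riders`; the helper name `bike_type_share` is hypothetical, and the bike type labels are illustrative.

```r
# Hypothetical helper: percentage of each bike type's rides taken by each customer type.
bike_type_share <- function(trips) {
  counts <- table(trips$rideable_type, trips$member_casual)
  # margin = 1 makes each row (bike type) sum to 100 percent
  round(prop.table(counts, margin = 1) * 100, 1)
}

# Toy data for illustration only; not taken from the Cyclistic data set.
toy <- data.frame(
  rideable_type = c("classic_bike", "classic_bike", "electric_bike", "docked_bike"),
  member_casual = c("casual", "member", "member", "casual")
)
bike_type_share(toy)
```

Applied to `riders`, a docked-bike row showing 100 under casual would match the reading of the graph above.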

Before displaying the visualizations in Tableau, let’s examine the ten most frequently used starting stations among casual riders.

riders %>%
  group_by(start_station_name, member_casual) %>%
  summarise(number_of_ride = n(), .groups = 'drop') %>%
  filter(start_station_name != "", member_casual != 'member') %>%
  arrange(-number_of_ride) %>%
  head(n=10) %>%
  select(-member_casual)

TABLEAU

To create a more comprehensive and visually appealing dashboard, I used Tableau. Below is a brief overview of my process (you can also find the complete code for this section here).

I made some changes to the data to help me visualize properly in Tableau. The steps are in the code above.

I went on to create the following graphs in Tableau.

I developed an interactive dashboard that displays all the graphs and pertinent information from my analysis. You can view and interact with the complete dashboard by following this link.

You can find an image of the dashboard below

ACT

The final step of the project.

INSIGHT

RECOMMENDATION