Introduction

In this case study, I’ll perform many of the real-world tasks a data analyst handles in their day-to-day job. I’ll work with a fictional company named Cyclistic and answer key business questions. To do that, I’ll follow the six-step data analysis process: ask, prepare, process, analyze, share and act. I find these steps very useful, and I think they lay out a clear path to answering the questions a data analytics project requires. Below you’ll find the scenario as well as the questions I’ll try to answer in this project.

Scenario

I’m a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. That being the case, I need to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, I will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve my recommendations, so they must be backed up with compelling data insights and professional data visualizations, which I’ll do my best to provide below.

What do we know?

Cyclistic has a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing, Lily Moreno, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

That’s all we know and that brings us to the ask phase.

Ask

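Our goal is to design marketing strategies aimed at converting casual riders into annual members. Three questions will guide the way there:

  • How do annual members and casual riders use Cyclistic bikes differently?
  • What would make casual riders buy a membership?
  • How can Cyclistic use digital media to influence casual riders to become members?

To answer those questions, our marketing director is interested in analyzing the Cyclistic historical bike trip data to identify trends.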

Prepare

In preparation for analysis, I downloaded the Cyclistic trip data for the last 12 months (January 2022 to December 2022). This is all public data that I will use to answer the questions stated above.
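
A minimal sketch of how a folder of monthly trip files could be read and stacked into one data frame, written in base R so it runs without extra packages. The helper name read_trip_files, the folder and the file names are assumptions for illustration, and the two tiny files are made-up stand-ins for the real monthly downloads.

```r
# Hypothetical helper: read every CSV in a folder and stack the rows.
# Assumes all files share the same column names, which rbind requires.
read_trip_files <- function(dir) {
  files <- sort(list.files(dir, pattern = "\\.csv$", full.names = TRUE))
  monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, monthly)
}

# Demonstration with two made-up monthly files in a temporary folder.
demo_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(
  data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
  file.path(demo_dir, "202201-tripdata.csv"), row.names = FALSE
)
write.csv(
  data.frame(ride_id = "b1", member_casual = "member"),
  file.path(demo_dir, "202202-tripdata.csv"), row.names = FALSE
)

cyclistic_demo <- read_trip_files(demo_dir)
nrow(cyclistic_demo)  # 3 rows: two from the first file, one from the second
```

With twelve real monthly files in one folder, the same call would return the full-year data frame in a single step.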

Note: The data has been made available by Motivate International Inc. under this license.

Process

In this step I stored and cleaned all the data to make it ready for analysis. Below you can see my cleaning process.

After that I went to the analyze phase.

Analyze

R

To analyze the data I opted for R, which is well suited to handling the large volume of data the company has. Below you can find a brief summary of the steps I took to analyze the data. The full process (calculations, filtering, etc.) can be found on my GitHub.

  • Loaded the libraries needed.
  • Imported the original months data into individual data frames.
  • Merged all the months into a full year data frame called cyclistic_2022.
  • Created a copy of the cyclistic_2022 data frame called cyclistic_data where all my calculations would take place.
  • Created the ride_length column by subtracting started_at from ended_at.
cyclistic_data$ride_length <- difftime(cyclistic_data$ended_at, cyclistic_data$started_at, units = "mins")
  • Created new columns called date, year, hour, time, day, month, quarter and time_of_day.
  • Changed the name of the column member_casual to membership to make it more intuitive.
  • Cleaned the data by removing duplicates and the unnecessary columns: ride_id; rideable_type; start_station_id; start_station_name; end_station_name; end_station_id; start_lat; start_lng; end_lat; end_lng.
  • Calculated the number of rides made by all riders (number of rows), by member type, time of the day, hour, day of the week, day of the month, month and quarter.
##Total number of rides
> nrow(cyclistic_data)
[1] 5667717

##Number of rides by member type
> cyclistic_data %>%
+   group_by(membership) %>% 
+   count(membership)
  membership       n
1 casual     2322032
2 member     3345685

##Number of rides by time of day
> cyclistic_data %>%
+   group_by(time_of_day) %>% 
+   count(time_of_day)
  time_of_day       n
1 Afternoon   2470323
2 Evening     1600259
3 Morning     1350303
4 Night        246832

(...)
  • Calculated the overall average ride length, as well as the average by member type, hour, time of the day, day of the week, day of the month, month and quarter.
##Overall avg ride length
> cyclistic_avgRide <- mean(cyclistic_data$ride_length)
> print(as_hms(cyclistic_avgRide)) #to get the result in hh:mm:ss format
00:19:26.281596

##by member type
> avgMember <- cyclistic_data %>% group_by(membership) %>% 
+   summarise_at(vars(ride_length),
+                list(time = mean))
> avgMember$time <- as_hms(avgMember$time) #formats time as mm:ss
> print(avgMember)
  membership time         
  <chr>      <time>       
1 casual     29:07.771699
2 member     12:42.705460

##by quarter
> cyclistic_data %>% 
+   group_by(quarter) %>% 
+   summarise_at(vars(ride_length),
+                list(time = mean))
  quarter time         
1 1Q      16.74187 mins
2 2Q      21.06115 mins
3 3Q      20.51899 mins
4 4Q      15.70774 mins

(...)
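
The derived columns listed in the steps above could be built along these lines. This is a sketch in base R on two made-up trips, not the code from the original project: the hour boundaries for time_of_day are an assumption (the write-up doesn't state them), and the day names are spelled out by hand so the result doesn't depend on the system locale.

```r
# Two made-up trips; column names follow the trip data used above.
trips <- data.frame(
  started_at = as.POSIXct(c("2022-07-16 17:05:00", "2022-01-03 08:30:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2022-07-16 17:35:00", "2022-01-03 08:42:30"), tz = "UTC"),
  member_casual = c("casual", "member"),
  stringsAsFactors = FALSE
)

# Ride length in minutes: end time minus start time.
trips$ride_length <- difftime(trips$ended_at, trips$started_at, units = "mins")

# Break the start time into its calendar parts.
lt <- as.POSIXlt(trips$started_at)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
trips$date    <- as.Date(trips$started_at)
trips$year    <- lt$year + 1900
trips$month   <- lt$mon + 1                      # 1 = January
trips$day     <- day_names[lt$wday + 1]          # wday: 0 = Sunday
trips$hour    <- lt$hour
trips$quarter <- paste0(lt$mon %/% 3 + 1, "Q")   # "1Q" to "4Q", as in the output above

# Assumed boundaries: Night 0-5, Morning 6-11, Afternoon 12-17, Evening 18-23.
trips$time_of_day <- cut(
  trips$hour,
  breaks = c(-1, 5, 11, 17, 23),
  labels = c("Night", "Morning", "Afternoon", "Evening")
)

# Rename member_casual to membership.
names(trips)[names(trips) == "member_casual"] <- "membership"

trips[, c("membership", "ride_length", "day", "quarter", "time_of_day")]
```

Under these assumed boundaries, the first trip (a Saturday, 17:05) falls in the Afternoon of 3Q with a ride length of 30 minutes, and the second (a Monday, 08:30) falls in the Morning of 1Q with 12.5 minutes.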

Share

Act

And so we reach the final step of this project. Below you will find a summary of my key findings. Based on those, I will answer the questions I posed in the Ask phase.

Key Findings

  • Members accounted for the larger share of rides: 59% of the total.
  • The average ride length for annual members (12m42s) was less than half that of casual riders (29m07s).
  • Casual riders use Cyclistic much more on weekends than on weekdays. Members don’t follow that trend; they ride even more on weekdays (you can see this on the interactive dashboard by applying the filter in the top right corner of the graph).
  • Average ride length, however, is higher on weekends for both members and casual riders.
  • Both members and casual riders ride most in the afternoon, which accounts for 43.59% of all rides. The busiest hour was 17:00 (5 PM) for both groups, with 10% of all rides.
  • The busiest month was July for casual riders and August for members. The 3rd quarter was the busiest, accounting for 40.77% of all rides, which was expected given that it includes most of the summer season.
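
The shares quoted above can be re-derived from the counts printed in the Analyze section. The sketch below does that arithmetic in base R; the variable names are made up for illustration, and the numbers are copied from the console output shown earlier.

```r
# Ride counts copied from the Analyze output.
rides_by_membership  <- c(casual = 2322032, member = 3345685)
rides_by_time_of_day <- c(Afternoon = 2470323, Evening = 1600259,
                          Morning = 1350303, Night = 246832)

total_rides <- sum(rides_by_membership)   # 5667717, matches nrow(cyclistic_data)

# Percentage shares, rounded to two decimals.
share_membership  <- round(100 * rides_by_membership / total_rides, 2)
share_time_of_day <- round(100 * rides_by_time_of_day / total_rides, 2)

share_membership[["member"]]        # 59.03
share_membership[["casual"]]        # 40.97
share_time_of_day[["Afternoon"]]    # 43.59

# Average ride lengths in seconds (29:07.771699 and 12:42.705460).
avg_casual <- 29 * 60 + 7.771699
avg_member <- 12 * 60 + 42.705460
ride_length_ratio <- avg_casual / avg_member   # a bit above 2, i.e. "more than double"
```

Both sets of counts add up to the same total of 5,667,717 rides, which is a quick consistency check on the grouped outputs.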

Suggestions

  • How do annual members and casual riders use Cyclistic bikes differently?
    Based on what I found, casual riders tend to use Cyclistic for longer rides: their average ride length (29 min) is more than double that of annual members (12 min). To me, this suggests that, in the eyes of casual riders, using Cyclistic (buying a daily pass or a single ride) “only” makes sense if they are going to take a longer ride. This is consistent with casual riders having the smaller share of total rides (~41%). Annual members, by contrast, seem to use Cyclistic with more freedom: they don’t have to worry about maximizing each ride, so they can take a short ride, stop and take another with no downside.

  • What would make casual riders buy a membership?
    Adding more stations to cover more area is always a good start. I would also advise offering a discount to casual riders buying their first annual membership. This would attract more annual members, and I believe that after the first year riders would see the true potential of an annual membership and stay on as members. Another good option would be to offer free months to new members instead of an overall first-year discount.

  • How can Cyclistic use digital media to influence casual riders to become members?
    Here I think a great start would be to invest in ads. I suggest advertising on platforms like Spotify (who doesn’t listen to music when riding a bike?); an ad there promoting the discounts suggested above would bring in a lot of annual memberships, I reckon. On that note, I would also suggest investing in ads on podcasts, YouTube channels and Twitter pages whose audience is mostly made up of cyclists; YouTube channels about biking and exercise would be a good example.
    I would also suggest creating a Cyclistic app with a personal profile that takes into consideration how each rider uses the company’s services and recommends ways to get the most out of Cyclistic. It should include the available promotions and the benefits of becoming an annual member.