Follow on LinkedIn

Introduction

Welcome to my Capstone Case Study for Cyclistic. We will dive into the data analysis process to answer stakeholder questions and recommend actionable insights for data-driven decisions.

About Cyclistic

Cyclistic, a bike-sharing company in Chicago with a fleet of over 5,800 bikes, aims to increase annual memberships. Casual riders, who currently purchase single-ride or full-day passes, are less profitable for Cyclistic than annual members. However, the pricing flexibility has helped attract more customers. Cyclistic’s finance analysts have concluded that maximizing the number of annual members will be key to future growth. Lily Moreno, director of marketing, believes converting casual riders into members is a solid opportunity and set a goal to design marketing strategies aimed at this conversion. To better understand how to achieve this goal, her team needs to analyze historical data to identify trends and differences between annual members and casual riders.

Assumptions and Theories:

Based on the information provided, we can make some initial assumptions and theories. We may assume that annual members and casual riders exhibit distinct patterns in bike usage, ride duration, and frequency. Theorizing that annual members, being more committed, would have longer average ride durations and a higher usage frequency compared to casual riders, we also speculate that factors such as cost-effectiveness, convenience, carbon footprint awareness, and exclusive benefits may motivate casual riders to transition into annual members.

Problem

Three questions will guide the future marketing program:

  1. How do annual members and casual riders use Cyclistic bikes differenty? Task given by our manager, Lily Moreno.

  2. Why would casual riders buy Cyclistic’s annual memberships?

  3. How can Cyclistic use digital media to influence casual riders to become annual members?

Deliverables

1. A clear statement of the business task.

Our overarching goal is to drive the conversion of casual riders into devoted annual members, foster long-term loyalty, and maximize the growth of Cyclistic.

2. A description of sourced data.

Cyclistic’s historical trip data, based on the last annual cycle from April 2023 to March 2024. Motivate International Inc. has made the data available under this License

3. Documentation of any cleaning or manipulation of data.

All Excel, SQL, and R cleaning and manipulation are found in the GitHub Capstone-Case-Study-Projects repository “Case Study 1 - Cyclistic Bike-Share Analysis” in extensive detail.

4. A Summary of Analysis

The analysis of bike ride data reveals several key insights into user behavior and system performance. Overall, the annual average ride length for all users stands at 16 minutes and 07 seconds, with casual riders taking longer rides averaging 23 minutes and 05 seconds compared to members’ average of 12 minutes and 18 seconds. July emerges as the month with the largest variety of ride length durations, totaling 11,042 unique durations out of 20,011 distinct lengths recorded throughout the year. While customer usage peaks in August and declines in January, both casual and member riders show consistent usage patterns from May to October, with a notable drop from November to April. Peak usage occurs during late afternoons, especially between 3-6pm, and differs between casual riders, who peak on Saturdays, and members, who peak on Thursdays. Despite members constituting the majority of riders, casual riders contribute significantly to weekend usage, comprising 35.34% of rides compared to members’ 64.66%. Further analysis of top-performing days, ride lengths, and popular rideable types enriches our understanding of user preferences and system dynamics, underscoring the importance of tailored strategies to accommodate diverse user behaviors.

5. Visualizations and key findings.

All Excel, SQL, and R-supporting visualizations are found in theGitHub Capstone-Case-Study-Projects repository “Case Study 1 - Cyclistic Bike-Share Analysis” in extensive detail.

6. Top 3 recommendations for stakeholders. To drive the conversion of casual riders into devoted annual members, foster long-term loyalty, and maximize the growth of Cyclistic, stakeholders should consider the following recommendations:

Promotional Incentives for Membership Conversion: Implement targeted marketing campaigns offering exclusive discounts, perks, or incentives to encourage casual riders to sign up for annual memberships. Providing trial memberships or discounted rates for the first year can entice casual riders to experience the benefits of membership, such as unlimited rides, priority access to bikes, and member-exclusive events.

Enhanced User Experience and Engagement: Invest in improving the overall user experience of the Cyclistic platform and mobile app to enhance convenience, ease of use, and engagement for both casual and member riders. Introduce features such as personalized recommendations, rewards programs, and interactive challenges to incentivize continued usage and foster a sense of community among members.

Data-Driven Targeting and Customization: Utilize data analytics and segmentation techniques to identify key demographics, usage patterns, and preferences among casual riders. Tailor marketing messages, service offerings, and member benefits based on these insights to effectively target and engage potential members with personalized offers and experiences that resonate with their needs and interests.

By implementing these recommendations, Cyclistic can drive higher conversion rates of casual riders into loyal annual members, foster long-term customer loyalty, and achieve sustainable growth by maximizing user retention and satisfaction.

Solution:

In truth, the solution is beyond our assigned task. Any implications are nothing more than speculation without answering the other two stakeholder questions. However, what our question can tell you are facts about the difference between annual members and causal riders. With that said, from April 2023 to March 2024:

Excel Results

Excel

Over 30 SQL Query Results

SQL

R Results

Present in this Rmd file at the bottom of the page

Conclusion:

The findings of this case study provide valuable insights into the differences between annual members and casual riders of Cyclistic. The analysis revealed that casual riders tend to take longer rides, while annual members prefer shorter rides. Additionally, casual riders are more likely to use the bikes on weekends and on top-performing days, while members exhibit a consistent usage pattern throughout the week. This information can help Cyclistic develop targeted marketing strategies to convert casual riders into annual members. By offering incentives, enhancing the user experience, and utilizing data-driven insights, Cyclistic can increase the number of annual members and achieve sustainable growth. In conclusion, leveraging the comprehensive analysis of Cyclistic’s bike ride data, the company can develop tailored strategies to drive membership conversion, foster long-term loyalty, and maximize growth. By implementing recommendations such as promotional incentives and user experience enhancements, Cyclistic is poised to capitalize on opportunities and thrive in the competitive urban transportation market.

Next steps:

  1. We should collaborate with the rest of the team and bring together their insights from the questions Lily Moreno assigned them. Then I would suggest we consider running promotions targeted at casual users and utilizing targeted advertising in their most frequent locations.
  2. I would suggest gaining additional data from all users by creating surveys for each type that they receive via email or after a purchase, which would preferably include:
    • Preferred payment options: Investigate whether annual members and casual riders have preferences for specific payment options, such as credit cards, mobile apps, or in-person payments.

    • Demographic factors: Explore if there are any demographic differences between annual members and casual riders, such as age, gender, or income level, that could influence their bike usage behavior.

    • Purpose of rides: Examine the purpose of bike rides for annual members and casual riders. Do annual members primarily use bikes for commuting, while casual riders use them for leisure or recreational purposes?

    With all that new data, create user-specific marketing, such as:

    • Develop targeted marketing campaigns for members and casual users separately. Highlight the convenience, cost-effectiveness, and environmental benefits of membership to attract and retain members. For casual users, emphasize the flexibility and leisurely experience and create promotions to encourage weekday ridership.

    • Seasonal Promotions: Capitalize on the popularity of July by introducing special promotions, such as discounted membership rates, extended riding hours, or partnerships with local events and attractions. Additionally, for December, consider offering holiday-themed incentives, such as festive decorations, seasonal rides, or charity initiatives, to engage riders and increase usage.

  3. Lastly, I would continuously monitor and analyze rider data to identify emerging trends, patterns, and areas for improvement. This will enable the bike-sharing program to make informed decisions and adapt strategies to evolving user needs.

By implementing these recommendations, Cyclistic can foster a more personalized and enjoyable experience for both members and casual users. This tailored approach will not only attract new riders but also increase rider loyalty and engagement, leading to sustainable growth and a competitive edge in the market.

The proposed solutions gained from data analysis aim to enhance marketing efficiency by utilizing targeted strategies in advertising and surveys for each respective user type. Cylistic can position itself for sustained growth while fostering a positive impact on the community and promoting sustainable transportation.

all_trips <- bind_rows(db1,db2,db3,db4,db5,db6,db7,db8,db9,db10,db11,db12)
all_trips$ride_length <- as.numeric(as.POSIXlt(all_trips$ride_length, format = "%H:%M:%S"))
colnames(all_trips)  #List of column names
## [1] "ride_id"            "rideable_type"      "started_at"        
## [4] "ended_at"           "start_station_name" "end_station_name"  
## [7] "member_casual"      "ride_length"        "day_of_week"
nrow(all_trips)  #How many rows are in data frame?
## [1] 4355499
dim(all_trips)  #Dimensions of the data frame?
## [1] 4355499       9
head(all_trips)  #See the first 6 rows of data frame.  Also tail(all_trips)
## # A tibble: 6 × 9
##   ride_id  rideable_type started_at ended_at start_station_name end_station_name
##   <chr>    <chr>         <chr>      <chr>    <chr>              <chr>           
## 1 AD30447… classic_bike  4/1/2023 … 4/1/202… Sheffield Ave & W… Sheffield Ave &…
## 2 DC3F49A… classic_bike  4/1/2023 … 4/1/202… California Ave & … California Ave …
## 3 749D486… classic_bike  4/1/2023 … 4/1/202… Sheridan Rd & Noy… Chicago Ave & S…
## 4 D50B53B… electric_bike 4/1/2023 … 4/1/202… Southport Ave & R… Southport Ave &…
## 5 563BB19… classic_bike  4/1/2023 … 4/1/202… Wentworth Ave & 3… Halsted St & 35…
## 6 C00E421… electric_bike 4/1/2023 … 4/1/202… Clark St & Armita… Clark St & Drum…
## # ℹ 3 more variables: member_casual <chr>, ride_length <dbl>, day_of_week <dbl>
str(all_trips)  #See list of columns and data types (numeric, character, etc)
## spc_tbl_ [4,355,499 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:4355499] "AD304476EF192169" "DC3F49A1DA0EABFD" "749D48646A35C4F0" "D50B53B3DBE498FF" ...
##  $ rideable_type     : chr [1:4355499] "classic_bike" "classic_bike" "classic_bike" "electric_bike" ...
##  $ started_at        : chr [1:4355499] "4/1/2023 0:00" "4/1/2023 0:02" "4/1/2023 0:01" "4/1/2023 0:05" ...
##  $ ended_at          : chr [1:4355499] "4/1/2023 0:03" "4/1/2023 0:04" "4/1/2023 0:05" "4/1/2023 0:05" ...
##  $ start_station_name: chr [1:4355499] "Sheffield Ave & Wrightwood Ave" "California Ave & Milwaukee Ave" "Sheridan Rd & Noyes St (NU)" "Southport Ave & Roscoe St" ...
##  $ end_station_name  : chr [1:4355499] "Sheffield Ave & Webster Ave" "California Ave & Milwaukee Ave" "Chicago Ave & Sheridan Rd" "Southport Ave & Roscoe St" ...
##  $ member_casual     : chr [1:4355499] "member" "member" "casual" "member" ...
##  $ ride_length       : num [1:4355499] 183 95 244 3 422 259 194 177 298 396 ...
##  $ day_of_week       : num [1:4355499] 7 7 7 7 7 7 7 7 7 7 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_character(),
##   ..   ended_at = col_character(),
##   ..   start_station_name = col_character(),
##   ..   end_station_name = col_character(),
##   ..   member_casual = col_character(),
##   ..   ride_length = col_time(format = ""),
##   ..   day_of_week = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(all_trips)  #Statistical summary of data. Mainly for numerics
##    ride_id          rideable_type       started_at          ended_at        
##  Length:4355499     Length:4355499     Length:4355499     Length:4355499    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  start_station_name end_station_name   member_casual       ride_length     
##  Length:4355499     Length:4355499     Length:4355499     Min.   :    0.0  
##  Class :character   Class :character   Class :character   1st Qu.:  344.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :  599.0  
##                                                           Mean   :  967.4  
##                                                           3rd Qu.: 1063.0  
##                                                           Max.   :86392.0  
##   day_of_week   
##  Min.   :1.000  
##  1st Qu.:2.000  
##  Median :4.000  
##  Mean   :4.113  
##  3rd Qu.:6.000  
##  Max.   :7.000
all_trips$date <- as.Date(all_trips$started_at, format = "%m/%d/%Y %H:%M") 
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
mean(all_trips_v2$ride_length) 
## [1] 967.4112
median(all_trips_v2$ride_length) 
## [1] 599
max(all_trips_v2$ride_length) 
## [1] 86392
min(all_trips_v2$ride_length)
## [1] 0
summary(all_trips_v2$ride_length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   344.0   599.0   967.4  1063.0 86392.0
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                1385.5452
## 2                     member                 738.9054
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                      778
## 2                     member                      525
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                    86261
## 2                     member                    86392
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                        0
## 2                     member                        0
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, 
  levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = function(x) round(mean(x), 2))
##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Sunday                  1605.37
## 2                      member                   Sunday                   825.14
## 3                      casual                   Monday                  1370.22
## 4                      member                   Monday                   705.95
## 5                      casual                  Tuesday                  1250.19
## 6                      member                  Tuesday                   715.22
## 7                      casual                Wednesday                  1184.54
## 8                      member                Wednesday                   707.79
## 9                      casual                 Thursday                  1197.37
## 10                     member                 Thursday                   705.39
## 11                     casual                   Friday                  1346.26
## 12                     member                   Friday                   730.54
## 13                     casual                 Saturday                  1557.03
## 14                     member                 Saturday                   823.07
all_trips_v2 <- all_trips_v2 %>%
  mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M"))
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun              255156            1605.
##  2 casual        Mon              176790            1370.
##  3 casual        Tue              180145            1250.
##  4 casual        Wed              181465            1185.
##  5 casual        Thu              200549            1197.
##  6 casual        Fri              228897            1346.
##  7 casual        Sat              316119            1557.
##  8 member        Sun              306838             825.
##  9 member        Mon              391777             706.
## 10 member        Tue              444157             715.
## 11 member        Wed              454949             708.
## 12 member        Thu              462618             705.
## 13 member        Fri              399908             731.
## 14 member        Sat              356131             823.
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
    ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Annual Rides by Weekday")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
    ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = average_duration / 60, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Average Annual Ride Duration by Weekday", y = "Average Duration (minutes)")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.