Title: “Case Study: Bike Share Analysis”

Author: Kimberly Sheffield

Date: 2024-02-25

Output: html

Introduction:

Welcome to my analysis case study on Cyclistic bike-share. In this study, I’ll address real-world challenges encountered by Cyclistic, a bike-share company in Chicago. I’ll navigate through the data analysis process to tackle crucial business inquiries and generate actionable insights.

Background:

The company’s director of marketing, Lily Moreno, believes that the future success of Cyclistic relies on maximizing the number of annual memberships. To achieve this, I aim to understand the differences in behavior between casual riders and annual members, then leverage these insights to design a new marketing strategy. We need to:

  1. Gain a comprehensive understanding of the distinct behaviors and preferences exhibited by annual members and casual riders.

  2. Develop an innovative and targeted marketing strategy that effectively converts casual riders into loyal annual members.

  3. Strengthen our thesis and marketing recommendations by supporting them with compelling data insights and professional data visualizations.

Ask:

Based on the information provided, I can form some initial hypotheses. I assume that annual members and casual riders differ in bike usage, ride duration, and ride frequency. I expect annual members, being more committed, to have a longer average ride duration and a higher usage frequency than casual riders. I also speculate that factors such as cost-effectiveness, convenience, carbon footprint awareness, and exclusive benefits may motivate casual riders to become annual members.

Three questions will guide the future marketing program:

  1. How do annual members and casual riders use Cyclistic bikes differently?

  2. Why would casual riders buy Cyclistic annual memberships?

  3. How can Cyclistic use digital media to influence casual riders to become members?

Prepare:

I will produce a report with the following deliverables:

  1. A clear statement of the business task

Our primary objective is to encourage casual riders to become dedicated annual members, nurture lasting loyalty, and ultimately maximize Cyclistic’s growth.

  2. A description of all data sources used

We will use Cyclistic’s historical trip data to analyze and identify trends. Our analysis covers the most recent full year, from January 2023 to December 2023. The data has been made available by Motivate International Inc. under this License.

  3. Documentation of any cleaning or manipulation of data in RStudio

  4. A summary of my analysis

  5. Supporting visualizations and key findings

  6. My top recommendations based on my analysis:

     a. Concentrate on specialized marketing efforts that raise awareness of the advantages of membership, and offer exclusive promotions in areas with a growing number of health-conscious people who take up low-impact physical activity to extend their health span.

     b. Develop incentives that entice occasional riders to become members, highlighting benefits such as a smaller carbon footprint, lower costs, and a longer health span through gentle physical activity.
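The data-source step listed in the deliverables could be sketched as follows. This is a minimal illustration and not the import code used for this study: the file names (`202301-trips.csv`, `202302-trips.csv`) and the toy rows are assumptions invented so the example runs on its own. Only the column names and the `all_trips` name come from the Process section.

```r
# Hypothetical stand-ins for the monthly trip files (names and rows are made up).
data_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(ride_id = c("A1", "A2"),
             started_at = c("1/21/2023 20:05", "1/10/2023 15:37"),
             member_casual = c("member", "casual")),
  file.path(data_dir, "202301-trips.csv"), row.names = FALSE
)
write.csv(
  data.frame(ride_id = "B1",
             started_at = "2/3/2023 9:12",
             member_casual = "member"),
  file.path(data_dir, "202302-trips.csv"), row.names = FALSE
)

# Read every monthly file and stack the results into one data frame.
monthly_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
monthly_trips <- lapply(monthly_files, read.csv, stringsAsFactors = FALSE)
all_trips <- do.call(rbind, monthly_trips)

nrow(all_trips)      # total rides across all files
colnames(all_trips)  # columns must match across files for rbind() to work
```

Stacking with `rbind()` requires every file to share the same column names, so it is worth comparing `colnames()` across files before combining a full year of data.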

Process:

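library(tidyverse)  #Provides dplyr (%>%, mutate, summarise) and ggplot2
library(lubridate)  #Provides wday()
colnames(all_trips)  #List of column names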
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "member_casual"     
## [10] "ride_length"        "day_of_week"
nrow(all_trips)  #How many rows are in data frame?
## [1] 4179889
dim(all_trips)  #Dimensions of the data frame?
## [1] 4179889      11
head(all_trips)  #See the first 6 rows of data frame.  Also tail(all_trips)
## # A tibble: 6 × 11
##   ride_id  rideable_type started_at ended_at start_station_name start_station_id
##   <chr>    <chr>         <chr>      <chr>    <chr>              <chr>           
## 1 F96D5A7… electric_bike 1/21/2023… 1/21/20… Lincoln Ave & Ful… TA1309000058    
## 2 13CB7EB… classic_bike  1/10/2023… 1/10/20… Kimbark Ave & 53r… TA1309000037    
## 3 BD88A2E… electric_bike 1/2/2023 … 1/2/202… Western Ave & Lun… RP-005          
## 4 C90792D… classic_bike  1/22/2023… 1/22/20… Kimbark Ave & 53r… TA1309000037    
## 5 3397017… classic_bike  1/12/2023… 1/12/20… Kimbark Ave & 53r… TA1309000037    
## 6 58E6815… electric_bike 1/31/2023… 1/31/20… Lakeview Ave & Fu… TA1309000019    
## # ℹ 5 more variables: end_station_name <chr>, end_station_id <chr>,
## #   member_casual <chr>, ride_length <dbl>, day_of_week <dbl>
str(all_trips)  #See list of columns and data types (numeric, character, etc)
## tibble [4,179,889 × 11] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:4179889] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
##  $ rideable_type     : chr [1:4179889] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr [1:4179889] "1/21/2023 20:05" "1/10/2023 15:37" "1/2/2023 7:51" "1/22/2023 10:52" ...
##  $ ended_at          : chr [1:4179889] "1/21/2023 20:16" "1/10/2023 15:46" "1/2/2023 8:05" "1/22/2023 11:01" ...
##  $ start_station_name: chr [1:4179889] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
##  $ start_station_id  : chr [1:4179889] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
##  $ end_station_name  : chr [1:4179889] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
##  $ end_station_id    : chr [1:4179889] "202480" "TA1308000002" "599" "TA1308000002" ...
##  $ member_casual     : chr [1:4179889] "member" "member" "casual" "member" ...
##  $ ride_length       : num [1:4179889] 660 540 840 540 900 180 840 540 780 720 ...
##  $ day_of_week       : num [1:4179889] 7 3 2 1 5 3 1 4 4 6 ...
summary(all_trips)  #Statistical summary of data. Mainly for numerics
##    ride_id          rideable_type       started_at          ended_at        
##  Length:4179889     Length:4179889     Length:4179889     Length:4179889    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_name start_station_id   end_station_name   end_station_id    
##  Length:4179889     Length:4179889     Length:4179889     Length:4179889    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  member_casual       ride_length        day_of_week    
##  Length:4179889     Min.   :     0.0   Min.   :1.0     
##  Class :character   1st Qu.:   360.0   1st Qu.:2.0     
##  Mode  :character   Median :   600.0   Median :4.0     
##                     Mean   :   954.9   Mean   :4.1     
##                     3rd Qu.:  1020.0   3rd Qu.:6.0     
##                     Max.   :147480.0   Max.   :7.0     
##                     NA's   :3          NA's   :326899
all_trips$date <- as.Date(all_trips$started_at, format = "%m/%d/%Y %H:%M")  #Date part of the start timestamp
all_trips$month <- format(all_trips$date, "%m")
all_trips$day <- format(all_trips$date, "%d")
all_trips$year <- format(all_trips$date, "%Y")
all_trips$day_of_week <- format(all_trips$date, "%A")  #Replaces the numeric day_of_week with the weekday name
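#Remove "HQ QR" rows and negative ride lengths. Rows where the condition is NA
#(missing station name or ride length) come back as all-NA rows.
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
#The next four calls return NA because ride_length contains missing values and na.rm = TRUE is not set
mean(all_trips_v2$ride_length)
## [1] NA
median(all_trips_v2$ride_length)
## [1] NA
max(all_trips_v2$ride_length)
## [1] NA
min(all_trips_v2$ride_length)
## [1] NA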
summary(all_trips_v2$ride_length)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0    360.0    600.0    954.9   1020.0 147480.0       24
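One possible way to avoid the `NA` results is sketched below on a small made-up data frame. The `toy_trips` values and the helper names `is_hq` and `is_negative` are hypothetical and not taken from the Cyclistic data. The idea is to build a filter that never evaluates to `NA` and to pass `na.rm = TRUE` to the summary functions.

```r
# Toy data (hypothetical values) with one missing station name and one missing ride length.
toy_trips <- data.frame(
  start_station_name = c("Clark St", "HQ QR", NA, "State St"),
  ride_length = c(600, 120, 300, NA)
)

# Original-style filter: an NA in the condition yields an all-NA row in the result.
keep <- !(toy_trips$start_station_name == "HQ QR" | toy_trips$ride_length < 0)
naive <- toy_trips[keep, ]

# NA-safe filter: %in% never returns NA, and missing ride lengths are not treated as negative.
is_hq <- toy_trips$start_station_name %in% "HQ QR"
is_negative <- !is.na(toy_trips$ride_length) & toy_trips$ride_length < 0
cleaned <- toy_trips[!(is_hq | is_negative), ]

# Skip missing ride lengths when summarising.
mean(cleaned$ride_length, na.rm = TRUE)
median(cleaned$ride_length, na.rm = TRUE)
```

In this sketch a row is removed only when it is positively flagged, so a trip with a missing station name but a valid ride length still counts toward the averages.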
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                1371.5865
## 2                     member                 727.4112
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                      780
## 2                     member                      540
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                   147480
## 2                     member                    89880
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                        0
## 2                     member                        0
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, 
                                    levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = function(x) round(mean(x), 2))
##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Sunday                  1588.96
## 2                      member                   Sunday                   816.39
## 3                      casual                   Monday                  1350.67
## 4                      member                   Monday                   693.39
## 5                      casual                  Tuesday                  1226.77
## 6                      member                  Tuesday                   698.70
## 7                      casual                Wednesday                  1171.71
## 8                      member                Wednesday                   694.53
## 9                      casual                 Thursday                  1195.88
## 10                     member                 Thursday                   696.20
## 11                     casual                   Friday                  1337.99
## 12                     member                   Friday                   720.95
## 13                     casual                 Saturday                  1549.52
## 14                     member                 Saturday                   814.48
all_trips_v2 <- all_trips_v2 %>%
  mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M"))
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 15 × 4
## # Groups:   member_casual [3]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun              244063            1589.
##  2 casual        Mon              169512            1351.
##  3 casual        Tue              175979            1227.
##  4 casual        Wed              177380            1172.
##  5 casual        Thu              193240            1196.
##  6 casual        Fri              219360            1338.
##  7 casual        Sat              296484            1550.
##  8 member        Sun              296909             816.
##  9 member        Mon              373935             693.
## 10 member        Tue              433874             699.
## 11 member        Wed              438064             695.
## 12 member        Thu              438318             696.
## 13 member        Fri              385788             721.
## 14 member        Sat              336959             814.
## 15 <NA>          <NA>                 24              NA
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Annual Rides by Weekday")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = average_duration / 60, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Average Annual Ride Duration by Weekday", y = "Average Duration (minutes)")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Warning: Removed 1 rows containing missing values (`geom_col()`).
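The `summarise()` message and the `geom_col()` warning both relate to how the summary is built: the output stays grouped by default, and the `NA` group has no average duration to plot. A possible tidy-up is sketched here on made-up data (the `toy_trips` values are hypothetical). It drops incomplete rows first and sets `.groups = "drop"` explicitly.

```r
library(dplyr)

# Toy data (hypothetical values), including one all-NA row like those in all_trips_v2.
toy_trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", NA),
  weekday = c("Sat", "Sat", "Tue", "Tue", NA),
  ride_length = c(1500, 1700, 700, 680, NA)
)

weekday_summary <- toy_trips %>%
  filter(!is.na(member_casual), !is.na(ride_length)) %>%  # remove incomplete rows
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length),
    .groups = "drop"  # return an ungrouped result and silence the message
  ) %>%
  arrange(member_casual, weekday)

weekday_summary
```

Computing the summary once and storing it, as `weekday_summary` does here, would also let both plots reuse the same table instead of repeating the pipeline.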

Analyze:

Casual users exhibit different preferences. They ride less frequently but for longer durations, with a preference for weekends, peaking on Saturdays. Their top start and end stations are Streeter Dr. and Grand Ave., a different area of interest from that of annual members, which allows for targeted advertising to casual riders.

All of these findings highlight the importance of understanding the distinct behaviors and preferences of members and casual users. With this information, Cyclistic can tailor its strategies to better serve each user type: it can optimize station placements, adjust operational schedules to accommodate peak times, and design targeted marketing campaigns to attract and retain both annual members and casual riders. All of this can enhance the overall user experience and lead to greater customer satisfaction while bringing Cyclistic sustained growth. This data-driven analysis demonstrates the power of using information to make informed decisions, drive continual growth, and ensure the success of Cyclistic.

Facts about the differences between annual members and casual riders from January 2023 to December 2023:

Over the year, members ride more often than casual users: the weekday totals in the Process section give 2,703,847 member rides versus 1,476,018 casual rides, so members took about 83% more rides and account for roughly 65% of all trips. Tuesday to Thursday are the most popular days for members, suggesting a regular weekday commuting pattern. Both members and casual users ride most often between 3 and 6 p.m., indicating high demand during the late afternoon. The top start and end stations for members are Kingsbury and Kinzie St., while casual users prefer Streeter Dr. and Grand Ave. Casual users ride most on Saturday and Sunday, indicating a preference for weekends.

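Casual users take longer rides than members when measured by duration: their mean ride length is about 22.9 minutes versus 12.1 minutes for members (about 89% longer), and their median is 13 minutes versus 9 minutes (about 44% longer), potentially indicating a preference for leisurely rides or exploring different destinations. July records the highest ridership, while December has the lowest, possibly due to weather conditions and holiday-related factors. Saturday is the most popular day of the week across all users (633,443 rides) and is the day on which casual users come closest to members (296,484 versus 336,959 rides), although members still take more rides on every day of the week.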
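The ride-count and duration comparisons can be recomputed directly from the figures reported in the Process section. The sketch below copies the weekday ride counts and the mean and median ride lengths from those outputs. The variable names are illustrative only.

```r
# Weekday ride counts (Sun to Sat) copied from the summary table in the Process section.
casual_rides <- c(244063, 169512, 175979, 177380, 193240, 219360, 296484)
member_rides <- c(296909, 373935, 433874, 438064, 438318, 385788, 336959)

total_casual <- sum(casual_rides)
total_member <- sum(member_rides)

# Percentage by which member rides exceed casual rides, and members' share of all rides.
pct_more_member_rides <- (total_member / total_casual - 1) * 100
member_share <- total_member / (total_member + total_casual) * 100

# Mean and median ride lengths in seconds, copied from the aggregate() outputs.
mean_length <- c(casual = 1371.5865, member = 727.4112)
median_length <- c(casual = 780, member = 540)
pct_longer_mean <- (mean_length[["casual"]] / mean_length[["member"]] - 1) * 100
pct_longer_median <- (median_length[["casual"]] / median_length[["member"]] - 1) * 100

round(c(pct_more_member_rides, member_share, pct_longer_mean, pct_longer_median), 1)

# Days on which casual riders take more rides than members (none in this table).
weekdays_abbr <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
weekdays_abbr[casual_rides > member_rides]
```

Keeping the percentages next to the counts they are derived from makes it easier to confirm that each headline figure matches the underlying table.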

Share:

I will bring together insights from the questions Lily Moreno assigned. I would then suggest running promotions aimed at casual users and placing targeted advertising at the locations they use most. I would also suggest gathering additional data from all users through surveys tailored to each rider type, sent by email or after a purchase, which would preferably include:

With all that new data, create user-specific marketing, such as:

Act:

I want to stress to shareholders the importance of ongoing monitoring and analysis of rider data to pinpoint emerging trends, patterns, and areas for improvement. This practice will allow the bike-sharing program to make well-informed decisions and adjust its strategies as user preferences evolve. Implementing these measures will allow Cyclistic to create a more personalized and satisfying experience for both members and casual users. This tailored approach will not only attract new riders but also foster loyalty and engagement among existing users, leading to sustainable growth and a competitive advantage in the market.

Conclusion:

In conclusion, the solutions derived from data analysis are designed to optimize marketing efficiency through targeted advertising and tailored surveys for different user segments. By implementing these strategies, Cyclistic can position itself for long-term growth while also making a positive impact on the community and advocating for sustainable transportation practices.