
---
title: "Case Study: Bike Share Analysis"
author: Kimberly Sheffield
date: "2024-02-25"
output: html_notebook
---

Introduction:

Welcome to my Cyclistic bike-share analysis case study. Here, we will tackle real-world challenges faced by Cyclistic, a bike-share company based in Chicago. Throughout this study, we will follow the data analysis process to answer key business questions and develop actionable insights.

Deliverables:

The company’s director of marketing, Lily Moreno, believes that the future success of Cyclistic relies on maximizing the number of annual memberships. To achieve this, we aim to understand the differences in behavior between casual riders and annual members, then leverage these insights to design a new marketing strategy. We need to:

Gain a comprehensive understanding of the distinct behaviors and preferences exhibited by annual members and casual riders.
Develop an innovative and targeted marketing strategy that effectively converts casual riders into loyal annual members.
Strengthen our thesis and marketing recommendations by supporting them with compelling data insights and professional data visualizations.

Assumptions and Theories:

Based on the information provided, we can make some initial assumptions and theories. We assume that annual members and casual riders exhibit distinct patterns in bike usage, ride duration, and frequency. We theorize that annual members, being more committed, have longer average ride durations and ride more often than casual riders. We also speculate that factors such as cost-effectiveness, convenience, carbon footprint awareness, and exclusive benefits may motivate casual riders to become annual members.

Problem:

Three questions will guide the future marketing program:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

Ask:

We will produce a report with the following deliverables:

A clear statement of the business task

Our overarching goal is to drive the conversion of casual riders into devoted annual members, foster long-term loyalty, and maximize the growth of Cyclistic.

A description of all data sources used

We will use Cyclistic’s historical trip data to analyze and identify trends. Our analysis covers the most recent annual cycle, January 2023 through December 2023. The data has been made available by Motivate International Inc. under this License.

Documentation of any cleaning or manipulation of data
A summary of your analysis

Over the year, members ride 34.7% more often than casual users. Tuesday through Thursday are the most popular days for members, suggesting a regular weekday commuting pattern. Both members and casual users ride most often between 3 and 6 p.m., indicating high demand during the late afternoon. The top start and end stations for members are Kingsbury and Kinzie St., while casual users prefer Streeter Dr. and Grand Ave. Casual users predominantly ride on Saturday and Sunday, indicating a preference for weekends.

Casual users ride 33% longer distances on average compared to members, potentially indicating a preference for longer rides or exploring different destinations. July records the highest ridership, while December has the lowest, possibly due to weather conditions and holiday-related factors. Saturday is the most popular day of the week across all users and is the only day casual users slightly surpass members in usage.

Supporting visualizations and key findings
Your top three recommendations based on your analysis
1. Concentrate on specialized marketing that raises awareness of the advantages of membership, and offer exclusive promotions in areas with a growing number of health-conscious people who take up low-impact physical activity to extend their health span.
2. Develop incentives that encourage occasional riders to become members, highlighting benefits such as a smaller carbon footprint, saving money, and a longer health span through gentle physical activity.
3. Design tailored surveys for diverse user demographics to gather valuable insights by leveraging data-driven analysis of user type preferences and hotspots.

Solution:

Strictly speaking, a full solution is beyond our assigned task: without answers to the other two stakeholder questions, any recommendation remains speculative. What our question can establish are facts about how annual members and casual riders differ. From January 2023 to December 2023:

Members ride 34.7% more often than casual users
Members' average ride duration was 12 minutes and 04 seconds
Members most often ride from Tuesday to Thursday and on weekdays in general
Members' top start and end stations are located at Kingsbury and Kinzie St.
Members choose classic bikes for 63.8% of their rentals; electric bikes are a distant second at 36.2%
Casual users' top start and end stations are located at Streeter Dr. and Grand Ave.
Casual users most often ride on Saturdays and on weekends in general
Casual users ride 53.1% longer than members on average
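
A percentage such as "X% more often" can be read either as a relative difference or as a gap in shares, so it helps to state the formula behind it. The following is a minimal sketch in base R that uses made-up counts, not the study's figures; `pct_more` and `share_of_total` are hypothetical helper names introduced only for illustration.

```r
# Relative difference: how much larger a is than b, as a percentage of b
pct_more <- function(a, b) (a - b) / b * 100

# Share: the percentage of the combined total that a represents
share_of_total <- function(a, b) a / (a + b) * 100

# Illustrative counts only (NOT taken from the Cyclistic data)
member_rides <- 150
casual_rides <- 100

pct_more(member_rides, casual_rides)        # relative difference
share_of_total(member_rides, casual_rides)  # members' share of all rides
```

Stating which of the two definitions a figure uses makes it easier to compare percentages reported in different parts of the study.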

Conclusion:

The analysis of the data gave us several key insights. Members ride more frequently than casual users, showing a clear preference for weekdays, which peak on Wednesdays. They also tend to ride during the late afternoon hours, indicating a commuting pattern. The top start and end stations for members are Kingsbury and Kinzie St., suggesting a concentrated area of member activity. That gives us an opportunity to create targeted surveys to gain valuable insights specifically about what makes our services worth their annual membership.

Casual users exhibit different preferences. They ride less frequently but for longer distances, with a preference for weekends, which peak on Saturdays. Their top start and end stations are located at Streeter Dr. and Grand Ave., indicating a different area of interest compared to annual members, allowing for targeted advertising to casual riders.

All of these findings highlight the importance of understanding the distinct behaviors and preferences of members and casual users. With this information, Cyclistic can tailor its strategies to better serve these different user types. Cyclistic can optimize station placements, adjust operational schedules to accommodate peak times, and design targeted marketing campaigns to attract and retain both annual members and casual riders. All of this can enhance the overall user experience and lead to greater customer satisfaction while bringing Cyclistic sustained growth. This data-driven analysis demonstrates the power of leveraging information to make informed decisions, driving continual growth, and ensuring the success of Cyclistic.

Next steps:

We should collaborate with the rest of the team and bring together their insights from the questions Lily Moreno assigned them. Then I would suggest we consider running promotions targeted at casual users and utilizing targeted advertising in their most frequent locations.

I would suggest gaining additional data from all users by creating surveys for each type that they receive via email or after a purchase, which would preferably include:

Preferred payment options: Investigate whether annual members and casual riders have preferences for specific payment options, such as credit cards, mobile apps, or in-person payments.
Demographic factors: Explore if there are any demographic differences between annual members and casual riders, such as age, gender, or income level, that could influence their bike usage behavior.
Purpose of rides: Examine the purpose of bike rides for annual members and casual riders. Do annual members primarily use bikes for commuting, while casual riders use them for leisure or recreational purposes?

With all that new data, create user-specific marketing, such as:

Targeted campaigns: Develop targeted marketing campaigns for members and casual users separately. Highlight the convenience, cost-effectiveness, and environmental benefits of membership to attract and retain members. For casual users, emphasize the flexibility and leisurely experience and create promotions to encourage weekday ridership.
Seasonal Promotions: Capitalize on the popularity of July by introducing special promotions, such as discounted membership rates, extended riding hours, or partnerships with local events and attractions. Additionally, for December, consider offering holiday-themed incentives, such as festive decorations, seasonal rides, or charity initiatives, to engage riders and increase usage.

Lastly, I would continuously monitor and analyze rider data to identify emerging trends, patterns, and areas for improvement. This will enable the bike-sharing program to make informed decisions and adapt strategies to evolving user needs.

By implementing these recommendations, Cyclistic can foster a more personalized and enjoyable experience for both members and casual users. This tailored approach will not only attract new riders but also increase rider loyalty and engagement, leading to sustainable growth and a competitive edge in the market.

The proposed solutions drawn from the data analysis aim to make marketing more efficient by using targeted advertising and surveys for each user type. Cyclistic can position itself for sustained growth while fostering a positive impact on the community and promoting sustainable transportation.

colnames(all_trips)  #List of column names

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "member_casual"     
## [10] "ride_length"        "day_of_week"

nrow(all_trips)  #How many rows are in data frame?

## [1] 4179889

dim(all_trips)  #Dimensions of the data frame?

## [1] 4179889      11

head(all_trips)  #See the first 6 rows of data frame.  Also tail(all_trips)

## # A tibble: 6 × 11
##   ride_id  rideable_type started_at ended_at start_station_name start_station_id
##   <chr>    <chr>         <chr>      <chr>    <chr>              <chr>           
## 1 F96D5A7… electric_bike 1/21/2023… 1/21/20… Lincoln Ave & Ful… TA1309000058    
## 2 13CB7EB… classic_bike  1/10/2023… 1/10/20… Kimbark Ave & 53r… TA1309000037    
## 3 BD88A2E… electric_bike 1/2/2023 … 1/2/202… Western Ave & Lun… RP-005          
## 4 C90792D… classic_bike  1/22/2023… 1/22/20… Kimbark Ave & 53r… TA1309000037    
## 5 3397017… classic_bike  1/12/2023… 1/12/20… Kimbark Ave & 53r… TA1309000037    
## 6 58E6815… electric_bike 1/31/2023… 1/31/20… Lakeview Ave & Fu… TA1309000019    
## # ℹ 5 more variables: end_station_name <chr>, end_station_id <chr>,
## #   member_casual <chr>, ride_length <dbl>, day_of_week <dbl>

str(all_trips)  #See list of columns and data types (numeric, character, etc)

## tibble [4,179,889 × 11] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:4179889] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
##  $ rideable_type     : chr [1:4179889] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
##  $ started_at        : chr [1:4179889] "1/21/2023 20:05" "1/10/2023 15:37" "1/2/2023 7:51" "1/22/2023 10:52" ...
##  $ ended_at          : chr [1:4179889] "1/21/2023 20:16" "1/10/2023 15:46" "1/2/2023 8:05" "1/22/2023 11:01" ...
##  $ start_station_name: chr [1:4179889] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
##  $ start_station_id  : chr [1:4179889] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
##  $ end_station_name  : chr [1:4179889] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
##  $ end_station_id    : chr [1:4179889] "202480" "TA1308000002" "599" "TA1308000002" ...
##  $ member_casual     : chr [1:4179889] "member" "member" "casual" "member" ...
##  $ ride_length       : num [1:4179889] 660 540 840 540 900 180 840 540 780 720 ...
##  $ day_of_week       : num [1:4179889] 7 3 2 1 5 3 1 4 4 6 ...

summary(all_trips)  #Statistical summary of data. Mainly for numerics

##    ride_id          rideable_type       started_at          ended_at        
##  Length:4179889     Length:4179889     Length:4179889     Length:4179889    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_name start_station_id   end_station_name   end_station_id    
##  Length:4179889     Length:4179889     Length:4179889     Length:4179889    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  member_casual       ride_length        day_of_week    
##  Length:4179889     Min.   :     0.0   Min.   :1.0     
##  Class :character   1st Qu.:   360.0   1st Qu.:2.0     
##  Mode  :character   Median :   600.0   Median :4.0     
##                     Mean   :   954.9   Mean   :4.1     
##                     3rd Qu.:  1020.0   3rd Qu.:6.0     
##                     Max.   :147480.0   Max.   :7.0     
##                     NA's   :3          NA's   :326899
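
The summary above shows missing values in both numeric columns and a maximum ride_length of 147,480 seconds (roughly 41 hours). A quick way to quantify such issues before cleaning is sketched below on a toy data frame. The 24-hour cutoff is an assumption chosen for illustration, not a rule taken from this analysis.

```r
# Toy data standing in for all_trips (values are illustrative)
toy_trips <- data.frame(
  ride_length = c(660, 540, NA, 147480),  # seconds
  day_of_week = c(7, NA, NA, 1)
)

# Count missing values in every column at once
na_counts <- colSums(is.na(toy_trips))
print(na_counts)

# Count rides longer than an assumed 24-hour threshold, ignoring NAs
one_day_seconds <- 24 * 60 * 60
long_rides <- sum(toy_trips$ride_length > one_day_seconds, na.rm = TRUE)
print(long_rides)
```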

all_trips$date <- as.Date(all_trips$started_at, format = "%m/%d/%Y %H:%M") 
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
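
The same parsing approach can also yield the hour of day, which is what a claim about peak riding hours rests on. The sketch below is a hedged illustration on a toy sample in the same text format as started_at. The `hour` column and the fixed UTC time zone are assumptions made so the example is reproducible, not steps confirmed by the original analysis.

```r
# Toy sample using the same "%m/%d/%Y %H:%M" text format as started_at
sample_trips <- data.frame(
  started_at = c("1/21/2023 20:05", "1/10/2023 15:37",
                 "1/2/2023 7:51", "1/22/2023 16:52"),
  member_casual = c("member", "member", "casual", "casual"),
  stringsAsFactors = FALSE
)

# Parse once into a date-time; the time zone is fixed for reproducibility
parsed <- as.POSIXct(sample_trips$started_at,
                     format = "%m/%d/%Y %H:%M", tz = "UTC")

# Hour of day as an integer (0-23)
sample_trips$hour <- as.integer(format(parsed, "%H"))

# Rides per user type and starting hour
rides_by_hour <- table(sample_trips$member_casual, sample_trips$hour)
print(rides_by_hour)
```

Parsing the timestamp once and deriving every field from the parsed value avoids repeating the format string for each new column.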

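# Drop rows starting at "HQ QR" or with a negative ride_length; rows where this test is NA are returned as all-NA rows
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]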

mean(all_trips_v2$ride_length)

## [1] NA

median(all_trips_v2$ride_length)

## [1] NA

max(all_trips_v2$ride_length)

## [1] NA

min(all_trips_v2$ride_length)

## [1] NA

summary(all_trips_v2$ride_length)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0    360.0    600.0    954.9   1020.0 147480.0       24
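
The NA results from mean(), median(), max(), and min() above come from missing values: when a logical index contains NA, R returns an all-NA row, and these functions return NA unless told to skip missing values. The sketch below reproduces the effect on toy data and shows two possible remedies. The toy values and the stricter filter are assumptions for illustration, not the cleaning rule used in this analysis.

```r
# Toy data with the same two columns used by the filter
toy <- data.frame(
  start_station_name = c("A St", "HQ QR", NA, "B St", "C St"),
  ride_length = c(600, 300, 420, NA, -60),
  stringsAsFactors = FALSE
)

# Original-style filter: an NA in the condition produces an all-NA row
keep <- !(toy$start_station_name == "HQ QR" | toy$ride_length < 0)
v_na <- toy[keep, ]
mean(v_na$ride_length)                # NA, because NA rows were kept

# Remedy 1: skip missing values when summarising
mean(v_na$ride_length, na.rm = TRUE)

# Remedy 2: build a filter that can never evaluate to NA
keep_clean <- !(toy$start_station_name %in% "HQ QR") &
  !is.na(toy$ride_length) & toy$ride_length >= 0
v_clean <- toy[keep_clean, ]
mean(v_clean$ride_length)
```

The second remedy keeps rides with a missing station name but a valid duration, so the choice between the two depends on which rows the analysis should count.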

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)

##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                1371.5865
## 2                     member                 727.4112

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)

##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                      780
## 2                     member                      540

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)

##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                   147480
## 2                     member                    89880

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)

##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                        0
## 2                     member                        0
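
The aggregate() calls above can be written with the `data` argument, which gives shorter column names in the result and makes follow-up arithmetic easier to read. The sketch below uses toy values, and the `ride_minutes` and `pct_longer` names are hypothetical additions for illustration.

```r
# Toy data (illustrative values, ride_length in seconds)
toy <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  ride_length = c(1200, 1500, 600, 840)
)

# The formula refers to columns of `data`, so output columns keep short names
avg <- aggregate(ride_length ~ member_casual, data = toy, FUN = mean)

# Convert seconds to minutes for reporting
avg$ride_minutes <- avg$ride_length / 60
print(avg)

# How much longer the casual average is, relative to the member average
casual_avg <- avg$ride_length[avg$member_casual == "casual"]
member_avg <- avg$ride_length[avg$member_casual == "member"]
pct_longer <- (casual_avg - member_avg) / member_avg * 100
print(pct_longer)
```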

all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, 
                                    levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = function(x) round(mean(x), 2))

##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Sunday                  1588.96
## 2                      member                   Sunday                   816.39
## 3                      casual                   Monday                  1350.67
## 4                      member                   Monday                   693.39
## 5                      casual                  Tuesday                  1226.77
## 6                      member                  Tuesday                   698.70
## 7                      casual                Wednesday                  1171.71
## 8                      member                Wednesday                   694.53
## 9                      casual                 Thursday                  1195.88
## 10                     member                 Thursday                   696.20
## 11                     casual                   Friday                  1337.99
## 12                     member                   Friday                   720.95
## 13                     casual                 Saturday                  1549.52
## 14                     member                 Saturday                   814.48

library(dplyr)      # %>%, mutate(), group_by(), summarise(), arrange()
library(lubridate)  # wday()
all_trips_v2 <- all_trips_v2 %>%
  mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M"))

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday)

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

## # A tibble: 15 × 4
## # Groups:   member_casual [3]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun              244063            1589.
##  2 casual        Mon              169512            1351.
##  3 casual        Tue              175979            1227.
##  4 casual        Wed              177380            1172.
##  5 casual        Thu              193240            1196.
##  6 casual        Fri              219360            1338.
##  7 casual        Sat              296484            1550.
##  8 member        Sun              296909             816.
##  9 member        Mon              373935             693.
## 10 member        Tue              433874             699.
## 11 member        Wed              438064             695.
## 12 member        Thu              438318             696.
## 13 member        Fri              385788             721.
## 14 member        Sat              336959             814.
## 15 <NA>          <NA>                 24              NA
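
Row 15 of the table above groups the 24 all-NA rows, and the summarise() message refers to the grouping left on the result. One possible way to handle both is sketched below on toy data: filter out incomplete rows first and pass `.groups = "drop"`. The toy tibble and the choice to drop incomplete rows are assumptions, not the original workflow.

```r
library(dplyr)
library(lubridate)

# Toy data with one fully missing row, mimicking the NA group
toy <- tibble(
  member_casual = c("casual", "member", "member", NA),
  started_at = as.POSIXct(
    c("2023-01-21 20:05", "2023-01-21 09:10", "2023-01-23 08:00", NA),
    format = "%Y-%m-%d %H:%M", tz = "UTC"
  ),
  ride_length = c(900, 600, 480, NA)
)

weekday_summary <- toy %>%
  # Remove incomplete rows so no <NA> group is created
  filter(!is.na(member_casual), !is.na(started_at), !is.na(ride_length)) %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length),
    .groups = "drop"  # return an ungrouped tibble and silence the message
  ) %>%
  arrange(member_casual, weekday)

print(weekday_summary)
```

Because the result has no NA group, a plot built from it would have no missing bar to drop.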

library(ggplot2)  # ggplot(), geom_col(), labs()
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Annual Rides by Weekday")

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = average_duration / 60, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Average Annual Ride Duration by Weekday", y = "Average Duration (minutes)")

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

## Warning: Removed 1 rows containing missing values (`geom_col()`).
