---
title: "Case Study: Bike Share Analysis"
author: Kimberly Sheffield
date: "2024-02-25"
output: html_notebook
---
Welcome to my Cyclistic bike-share analysis case study. Here, we will tackle real-world challenges faced by Cyclistic, a bike-share company based in Chicago. Throughout this study, we will follow the data analysis process to answer key business questions and develop actionable insights.
The company’s director of marketing, Lily Moreno, believes that the future success of Cyclistic relies on maximizing the number of annual memberships. To achieve this, we aim to understand the differences in behavior between casual riders and annual members, then leverage these insights to design a new marketing strategy. We need to:
Gain a comprehensive understanding of the distinct behaviors and preferences exhibited by annual members and casual riders.
Develop an innovative and targeted marketing strategy that effectively converts casual riders into loyal annual members.
Strengthen our thesis and marketing recommendations by supporting them with compelling data insights and professional data visualizations.
Based on the information provided, we can make some initial assumptions. We assume that annual members and casual riders exhibit distinct patterns in bike usage, ride duration, and frequency. We theorize that annual members, being more committed, have longer average ride durations and ride more often than casual riders. We also speculate that factors such as cost-effectiveness, convenience, carbon footprint awareness, and exclusive benefits may motivate casual riders to become annual members.
Three questions will guide the future marketing program:
How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
We will produce a report with the following deliverables:
Our overarching goal is to drive the conversion of casual riders into devoted annual members, foster long-term loyalty, and maximize the growth of Cyclistic.
We will use Cyclistic’s historical trip data to analyze and identify trends. Our analysis is based on the most recent full year of data, from January 2023 to December 2023. The data has been made available by Motivate International Inc. under this license.
Documentation of any cleaning or manipulation of data
A summary of your analysis
Annually, members ride more frequently than casual users, with a 34.7% higher ride frequency. Tuesday to Thursday are the most popular days for members, suggesting a regular weekday commuting pattern. Both members and casual users ride most often between 3 and 6 p.m., indicating high demand during the late afternoon hours. The top start and end stations for members are Kingsbury and Kinzie St., while casual users prefer Streeter Dr. and Grand Ave. Casual users ride predominantly on Saturday and Sunday, indicating a preference for weekends.
Casual users ride 33% longer distances on average compared to members, potentially indicating a preference for longer rides or exploring different destinations. July records the highest ridership, while December has the lowest, possibly due to weather conditions and holiday-related factors. Saturday is the most popular day of the week across all users and is the day on which casual usage comes closest to member usage (296,484 casual rides versus 336,959 member rides).
Supporting visualizations and key findings
Your top three recommendations based on your analysis
Concentrate on specialized marketing efforts to raise awareness about the advantages of membership and provide exclusive promotions in areas experiencing a rise in health-conscious individuals engaging in low-impact physical activities to extend their health span.
Develop motivating incentives aimed at enticing occasional riders to transition, highlighting benefits like minimizing carbon footprint, saving finances, and enhancing health span through gentle physical activity.
Design tailored surveys for diverse user demographics to gather valuable insights by leveraging data-driven analysis of user type preferences and hotspots.
In truth, a full solution is beyond our assigned task. Without answers to the other two stakeholder questions, any implications are only speculation. What our question can establish are facts about the differences between annual members and casual riders. With that said, from January 2023 to December 2023:
The analysis of the data gave us several key insights. Members ride more frequently than casual users, showing a clear preference for weekdays, which peak on Wednesdays. They also tend to ride during the late afternoon hours, indicating a commuting pattern. The top start and end stations for members are Kingsbury and Kinzie St., suggesting a concentrated area of member activity. That gives us an opportunity to create targeted surveys to gain valuable insights specifically about what makes our services worth their annual membership.
Casual users exhibit different preferences. They ride less frequently but for longer distances, with a preference for weekends, which peak on Saturdays. Their top start and end stations are located at Streeter Dr. and Grand Ave., indicating a different area of interest compared to annual members, allowing for targeted advertising to casual riders.
All of these findings highlight the importance of understanding the distinct behaviors and preferences of members and casual users. With this information, Cyclistic can tailor its strategies to better serve these different user types. Cyclistic can optimize station placements, adjust operational schedules to accommodate peak times, and design targeted marketing campaigns to attract and retain both annual members and casual riders. All of this can enhance the overall user experience and lead to greater customer satisfaction while bringing Cyclistic sustained growth. This data-driven analysis demonstrates the power of leveraging information to make informed decisions, drive continual growth, and ensure the success of Cyclistic.
We should collaborate with the rest of the team and bring together their insights from the questions Lily Moreno assigned them. Then I would suggest we consider running promotions targeted at casual users and utilizing targeted advertising in their most frequent locations.
I would suggest gaining additional data from all users by creating surveys for each type that they receive via email or after a purchase, which would preferably include:
Preferred payment options: Investigate whether annual members and casual riders have preferences for specific payment options, such as credit cards, mobile apps, or in-person payments.
Demographic factors: Explore if there are any demographic differences between annual members and casual riders, such as age, gender, or income level, that could influence their bike usage behavior.
Purpose of rides: Examine the purpose of bike rides for annual members and casual riders. Do annual members primarily use bikes for commuting, while casual riders use them for leisure or recreational purposes?
With all that new data, create user-specific marketing, such as:
Develop targeted marketing campaigns for members and casual users separately. Highlight the convenience, cost-effectiveness, and environmental benefits of membership to attract and retain members. For casual users, emphasize the flexibility and leisurely experience and create promotions to encourage weekday ridership.
Seasonal Promotions: Capitalize on the popularity of July by introducing special promotions, such as discounted membership rates, extended riding hours, or partnerships with local events and attractions. Additionally, for December, consider offering holiday-themed incentives, such as festive decorations, seasonal rides, or charity initiatives, to engage riders and increase usage.
Lastly, I would continuously monitor and analyze rider data to identify emerging trends, patterns, and areas for improvement. This will enable the bike-sharing program to make informed decisions and adapt strategies to evolving user needs.
By implementing these recommendations, Cyclistic can foster a more personalized and enjoyable experience for both members and casual users. This tailored approach will not only attract new riders but also increase rider loyalty and engagement, leading to sustainable growth and a competitive edge in the market.
The solutions proposed from this data analysis aim to make marketing more efficient by targeting advertising and surveys at each user type. Cyclistic can position itself for sustained growth while fostering a positive impact on the community and promoting sustainable transportation.
colnames(all_trips) #List of column names
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "member_casual"
## [10] "ride_length" "day_of_week"
nrow(all_trips) #How many rows are in data frame?
## [1] 4179889
dim(all_trips) #Dimensions of the data frame?
## [1] 4179889 11
head(all_trips) #See the first 6 rows of data frame. Also tail(all_trips)
## # A tibble: 6 × 11
## ride_id rideable_type started_at ended_at start_station_name start_station_id
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 F96D5A7… electric_bike 1/21/2023… 1/21/20… Lincoln Ave & Ful… TA1309000058
## 2 13CB7EB… classic_bike 1/10/2023… 1/10/20… Kimbark Ave & 53r… TA1309000037
## 3 BD88A2E… electric_bike 1/2/2023 … 1/2/202… Western Ave & Lun… RP-005
## 4 C90792D… classic_bike 1/22/2023… 1/22/20… Kimbark Ave & 53r… TA1309000037
## 5 3397017… classic_bike 1/12/2023… 1/12/20… Kimbark Ave & 53r… TA1309000037
## 6 58E6815… electric_bike 1/31/2023… 1/31/20… Lakeview Ave & Fu… TA1309000019
## # ℹ 5 more variables: end_station_name <chr>, end_station_id <chr>,
## # member_casual <chr>, ride_length <dbl>, day_of_week <dbl>
str(all_trips) #See list of columns and data types (numeric, character, etc)
## tibble [4,179,889 × 11] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4179889] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
## $ rideable_type : chr [1:4179889] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : chr [1:4179889] "1/21/2023 20:05" "1/10/2023 15:37" "1/2/2023 7:51" "1/22/2023 10:52" ...
## $ ended_at : chr [1:4179889] "1/21/2023 20:16" "1/10/2023 15:46" "1/2/2023 8:05" "1/22/2023 11:01" ...
## $ start_station_name: chr [1:4179889] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
## $ start_station_id : chr [1:4179889] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
## $ end_station_name : chr [1:4179889] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
## $ end_station_id : chr [1:4179889] "202480" "TA1308000002" "599" "TA1308000002" ...
## $ member_casual : chr [1:4179889] "member" "member" "casual" "member" ...
## $ ride_length : num [1:4179889] 660 540 840 540 900 180 840 540 780 720 ...
## $ day_of_week : num [1:4179889] 7 3 2 1 5 3 1 4 4 6 ...
summary(all_trips) #Statistical summary of data. Mainly for numerics
##    ride_id          rideable_type       started_at          ended_at
##  Length:4179889     Length:4179889     Length:4179889     Length:4179889
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##
##  start_station_name start_station_id   end_station_name   end_station_id
##  Length:4179889     Length:4179889     Length:4179889     Length:4179889
##  Class :character   Class :character   Class :character   Class :character
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
##
##
##
##
##  member_casual       ride_length         day_of_week
##  Length:4179889     Min.   :     0.0   Min.   :1.0
##  Class :character   1st Qu.:   360.0   1st Qu.:2.0
##  Mode  :character   Median :   600.0   Median :4.0
##                     Mean   :   954.9   Mean   :4.1
##                     3rd Qu.:  1020.0   3rd Qu.:6.0
##                     Max.   :147480.0   Max.   :7.0
##                     NA's   :3          NA's   :326899
all_trips$date <- as.Date(all_trips$started_at, format = "%m/%d/%Y %H:%M")
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
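The date columns above are all derived from the `started_at` strings. As a minimal sketch of that parsing step, the block below uses three timestamps copied from the `str()` output. The names `sample_started`, `parsed`, `sample_month`, `sample_year`, and `sample_wday` are illustrative and not part of the notebook. The weekday is read with `as.POSIXlt()$wday` rather than `format(..., "%A")` only because the former does not depend on the system locale.

```r
# Three started_at values copied from the str() output above.
sample_started <- c("1/21/2023 20:05", "1/10/2023 15:37", "1/2/2023 7:51")

# as.Date() keeps only the calendar date; the time of day is parsed and dropped.
parsed <- as.Date(sample_started, format = "%m/%d/%Y %H:%M")

# Month and year as zero-padded strings, as in the chunk above.
sample_month <- format(parsed, "%m")
sample_year  <- format(parsed, "%Y")

# Weekday number, 0 = Sunday through 6 = Saturday (locale-independent).
sample_wday <- as.POSIXlt(parsed)$wday

print(data.frame(sample_started, parsed, sample_month, sample_year, sample_wday))
```

If a format string does not match the input, `as.Date()` returns `NA` rather than raising an error, so it is worth checking `sum(is.na(parsed))` after this step.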
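#Drop rows that start at "HQ QR" or have a negative ride_length. Rows where this condition is NA come back as all-NA rows.
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]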
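mean(all_trips_v2$ride_length) #This and the next three calls return NA because ride_length contains NAs; na.rm = TRUE would skip them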
## [1] NA
median(all_trips_v2$ride_length)
## [1] NA
max(all_trips_v2$ride_length)
## [1] NA
min(all_trips_v2$ride_length)
## [1] NA
summary(all_trips_v2$ride_length)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
##      0.0    360.0    600.0    954.9   1020.0 147480.0       24
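The `NA` results above trace back to missing values. Indexing a data frame with a logical vector that contains `NA` returns an all-`NA` row for each `NA`, and `mean()`, `median()`, `max()`, and `min()` return `NA` unless told to skip missing values. The block below is a minimal sketch on a toy data frame showing one NA-safe way to apply the same filter. The `toy` data and its values are invented for illustration, and keeping rows whose condition is unknown is an assumption about what the analysis wants, not the notebook's method.

```r
# Toy data standing in for all_trips; the values are invented for illustration.
toy <- data.frame(
  start_station_name = c("A St", "HQ QR", NA, "B St", "C St"),
  ride_length        = c(600, 300, 420, -60, NA),
  stringsAsFactors   = FALSE
)

# Same condition as the filter above; it evaluates to NA for rows 3 and 5.
drop_row <- toy$start_station_name == "HQ QR" | toy$ride_length < 0

# Naive logical indexing: each NA in the index yields a row of all NAs.
naive <- toy[!drop_row, ]

# NA-safe variant: drop only the rows where the condition is definitely TRUE.
safe <- toy[!(drop_row %in% TRUE), ]

# ride_length itself can still be NA, so the statistics need na.rm = TRUE.
mean_naive <- mean(naive$ride_length)               # NA
mean_safe  <- mean(safe$ride_length, na.rm = TRUE)  # mean of 600 and 420

print(naive)
print(safe)
print(c(mean_naive = mean_naive, mean_safe = mean_safe))
```

Whether rows with a missing station name should stay in the data is an analysis decision. To drop them as well, `which(!drop_row)` keeps only the rows where the condition is known to be `FALSE`.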
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 1371.5865
## 2 member 727.4112
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 780
## 2 member 540
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 147480
## 2 member 89880
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 0
## 2 member 0
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week,
levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = function(x) round(mean(x), 2))
## all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1 casual Sunday 1588.96
## 2 member Sunday 816.39
## 3 casual Monday 1350.67
## 4 member Monday 693.39
## 5 casual Tuesday 1226.77
## 6 member Tuesday 698.70
## 7 casual Wednesday 1171.71
## 8 member Wednesday 694.53
## 9 casual Thursday 1195.88
## 10 member Thursday 696.20
## 11 casual Friday 1337.99
## 12 member Friday 720.95
## 13 casual Saturday 1549.52
## 14 member Saturday 814.48
all_trips_v2 <- all_trips_v2 %>%
mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M"))
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(
number_of_rides = n(),
average_duration = mean(ride_length)
) %>%
arrange(member_casual, weekday)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 15 × 4
## # Groups: member_casual [3]
## member_casual weekday number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sun 244063 1589.
## 2 casual Mon 169512 1351.
## 3 casual Tue 175979 1227.
## 4 casual Wed 177380 1172.
## 5 casual Thu 193240 1196.
## 6 casual Fri 219360 1338.
## 7 casual Sat 296484 1550.
## 8 member Sun 296909 816.
## 9 member Mon 373935 693.
## 10 member Tue 433874 699.
## 11 member Wed 438064 695.
## 12 member Thu 438318 696.
## 13 member Fri 385788 721.
## 14 member Sat 336959 814.
## 15 <NA> <NA> 24 NA
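The weekday counts in the table above can be totalled to compare overall ride volume by user type. The block below is a minimal sketch that re-enters the `number_of_rides` column by hand. The `rides` data frame and the derived column names are constructed here for illustration, and the 24 rows with missing values are left out.

```r
# number_of_rides copied from the summary table above, Sunday through Saturday.
weekdays_abbr <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
rides <- data.frame(
  weekday = factor(weekdays_abbr, levels = weekdays_abbr),
  casual  = c(244063, 169512, 175979, 177380, 193240, 219360, 296484),
  member  = c(296909, 373935, 433874, 438064, 438318, 385788, 336959)
)

# Total rides per user type and each type's percentage share of all rides.
totals <- colSums(rides[, c("casual", "member")])
shares <- round(100 * totals / sum(totals), 1)

# Casual rides as a fraction of member rides on each weekday.
rides$casual_to_member <- round(rides$casual / rides$member, 2)

print(totals)
print(shares)
print(rides)
```

On these counts, members account for roughly 65% of rides. The casual-to-member ratio is highest on Saturday and Sunday, which matches the weekend preference described for casual users.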
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(
number_of_rides = n(),
average_duration = mean(ride_length)
) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(title = "Total Annual Rides by Weekday")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(
number_of_rides = n(),
average_duration = mean(ride_length)
) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = average_duration / 60, fill = member_casual)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(title = "Average Annual Ride Duration by Weekday", y = "Average Duration (minutes)")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Warning: Removed 1 rows containing missing values (`geom_col()`).