Title: “Case Study: Bike Share Analysis”
Author: Kimberly Sheffield
Date: 2024-02-25
Output: html
Welcome to my analysis case study on Cyclistic bike-share. In this study, I’ll address real-world challenges encountered by Cyclistic, a bike-share company in Chicago. I’ll navigate through the data analysis process to tackle crucial business inquiries and generate actionable insights.
The company’s director of marketing, Lily Moreno, believes that the future success of Cyclistic relies on maximizing the number of annual memberships. To achieve this, I aim to understand the differences in behavior between casual riders and annual members, then leverage these insights to design a new marketing strategy. We need to:
Gain a comprehensive understanding of the distinct behaviors and preferences exhibited by annual members and casual riders.
Develop an innovative and targeted marketing strategy that effectively converts casual riders into loyal annual members.
Strengthen our thesis and marketing recommendations by supporting them with compelling data insights and professional data visualizations.
Based on the information provided, I can form some initial hypotheses. I assume that annual members and casual riders exhibit distinct patterns in bike usage, ride duration, and frequency. I hypothesize that annual members, being more committed, have longer average ride durations and a higher usage frequency than casual riders. I also speculate that factors such as cost-effectiveness, convenience, carbon footprint awareness, and exclusive benefits may motivate casual riders to become annual members.
Three questions will guide the future marketing program:
How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
Our primary objective is to encourage casual riders to become dedicated annual members, nurture lasting loyalty, and ultimately maximize Cyclistic’s growth.
We will use Cyclistic’s historical trip data to analyze and identify trends. Our analysis will be based on the last annual cycle, from January 2023 to December 2023. The data has been made available by Motivate International Inc. under this License.
I will produce a report with the following deliverables:
Documentation of any cleaning or manipulation of data in RStudio
A summary of my analysis
Supporting visualizations and key findings
My top three recommendations based on my analysis
Focus targeted marketing on raising awareness of the advantages of membership, and offer exclusive promotions in areas with a growing number of health-conscious people who take up low-impact physical activity to extend their health span.
Develop incentives that encourage occasional riders to switch to membership, highlighting benefits such as a smaller carbon footprint, cost savings, and an improved health span through gentle physical activity.
colnames(all_trips) #List of column names
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "member_casual"
## [10] "ride_length" "day_of_week"
nrow(all_trips) #How many rows are in data frame?
## [1] 4179889
dim(all_trips) #Dimensions of the data frame?
## [1] 4179889 11
head(all_trips) #See the first 6 rows of data frame. Also tail(all_trips)
## # A tibble: 6 × 11
## ride_id rideable_type started_at ended_at start_station_name start_station_id
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 F96D5A7… electric_bike 1/21/2023… 1/21/20… Lincoln Ave & Ful… TA1309000058
## 2 13CB7EB… classic_bike 1/10/2023… 1/10/20… Kimbark Ave & 53r… TA1309000037
## 3 BD88A2E… electric_bike 1/2/2023 … 1/2/202… Western Ave & Lun… RP-005
## 4 C90792D… classic_bike 1/22/2023… 1/22/20… Kimbark Ave & 53r… TA1309000037
## 5 3397017… classic_bike 1/12/2023… 1/12/20… Kimbark Ave & 53r… TA1309000037
## 6 58E6815… electric_bike 1/31/2023… 1/31/20… Lakeview Ave & Fu… TA1309000019
## # ℹ 5 more variables: end_station_name <chr>, end_station_id <chr>,
## # member_casual <chr>, ride_length <dbl>, day_of_week <dbl>
str(all_trips) #See list of columns and data types (numeric, character, etc)
## tibble [4,179,889 × 11] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4179889] "F96D5A74A3E41399" "13CB7EB698CEDB88" "BD88A2E670661CE5" "C90792D034FED968" ...
## $ rideable_type : chr [1:4179889] "electric_bike" "classic_bike" "electric_bike" "classic_bike" ...
## $ started_at : chr [1:4179889] "1/21/2023 20:05" "1/10/2023 15:37" "1/2/2023 7:51" "1/22/2023 10:52" ...
## $ ended_at : chr [1:4179889] "1/21/2023 20:16" "1/10/2023 15:46" "1/2/2023 8:05" "1/22/2023 11:01" ...
## $ start_station_name: chr [1:4179889] "Lincoln Ave & Fullerton Ave" "Kimbark Ave & 53rd St" "Western Ave & Lunt Ave" "Kimbark Ave & 53rd St" ...
## $ start_station_id : chr [1:4179889] "TA1309000058" "TA1309000037" "RP-005" "TA1309000037" ...
## $ end_station_name : chr [1:4179889] "Hampden Ct & Diversey Ave" "Greenwood Ave & 47th St" "Valli Produce - Evanston Plaza" "Greenwood Ave & 47th St" ...
## $ end_station_id : chr [1:4179889] "202480" "TA1308000002" "599" "TA1308000002" ...
## $ member_casual : chr [1:4179889] "member" "member" "casual" "member" ...
## $ ride_length : num [1:4179889] 660 540 840 540 900 180 840 540 780 720 ...
## $ day_of_week : num [1:4179889] 7 3 2 1 5 3 1 4 4 6 ...
summary(all_trips) #Statistical summary of data. Mainly for numerics
## ride_id rideable_type started_at ended_at
## Length:4179889 Length:4179889 Length:4179889 Length:4179889
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## start_station_name start_station_id end_station_name end_station_id
## Length:4179889 Length:4179889 Length:4179889 Length:4179889
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## member_casual ride_length day_of_week
## Length:4179889 Min. : 0.0 Min. :1.0
## Class :character 1st Qu.: 360.0 1st Qu.:2.0
## Mode :character Median : 600.0 Median :4.0
## Mean : 954.9 Mean :4.1
## 3rd Qu.: 1020.0 3rd Qu.:6.0
## Max. :147480.0 Max. :7.0
## NA's :3 NA's :326899
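The ride_length values shown above appear to be trip durations in seconds. As a quick cross-check, the sketch below recomputes them from the started_at and ended_at strings of the first three rows printed by str(all_trips). The column names ride_length_check and matches, and the choice of UTC, are assumptions made only for this illustration.

```r
# First three rows copied from the str(all_trips) output
trips <- data.frame(
  started_at  = c("1/21/2023 20:05", "1/10/2023 15:37", "1/2/2023 7:51"),
  ended_at    = c("1/21/2023 20:16", "1/10/2023 15:46", "1/2/2023 8:05"),
  ride_length = c(660, 540, 840),
  stringsAsFactors = FALSE
)

# Assumption: UTC is used only to keep the example free of daylight-saving shifts
fmt      <- "%m/%d/%Y %H:%M"
start_ts <- as.POSIXct(trips$started_at, format = fmt, tz = "UTC")
end_ts   <- as.POSIXct(trips$ended_at,   format = fmt, tz = "UTC")

# Duration in seconds as a plain numeric vector
trips$ride_length_check <- as.numeric(difftime(end_ts, start_ts, units = "secs"))

# TRUE when the recomputed value agrees with the stored one
trips$matches <- trips$ride_length_check == trips$ride_length
print(trips)
```

If the recomputed values agree with the stored ones, dividing ride_length by 60 gives minutes, which is the unit used in the duration chart later in this report.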
all_trips$date <- as.Date(all_trips$started_at, format = "%m/%d/%Y %H:%M")
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
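To illustrate what these date columns contain, here is a minimal sketch using the first three started_at values from the output above. The names ride_date, parts, weekday_num, and weekday_name are hypothetical. Note that format(x, "%A") returns weekday names in the language of the session locale, so the English level names used when ordering the weekdays later assume an English locale; the numeric weekday is shown as a locale-independent alternative.

```r
# Sample timestamps copied from the str(all_trips) output
started_at <- c("1/21/2023 20:05", "1/10/2023 15:37", "1/2/2023 7:51")

# Keep only the calendar date from the month/day/year timestamp
ride_date <- as.Date(started_at, format = "%m/%d/%Y %H:%M")

parts <- data.frame(
  date  = ride_date,
  month = format(ride_date, "%m"),
  day   = format(ride_date, "%d"),
  year  = format(ride_date, "%Y"),
  # Locale-independent weekday number, 1 = Sunday ... 7 = Saturday
  weekday_num = as.POSIXlt(ride_date)$wday + 1,
  # Weekday name; the language depends on the session locale
  weekday_name = format(ride_date, "%A"),
  stringsAsFactors = FALSE
)
print(parts)
```

The weekday numbers produced here use the same 1 = Sunday convention as the numeric day_of_week column in the raw data.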
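# Drop rides that start at the "HQ QR" station or have a negative duration.
# Caution: rows where this condition evaluates to NA come back as all-NA rows,
# which is why mean(), median(), max(), and min() below return NA.
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length < 0), ]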
mean(all_trips_v2$ride_length)
## [1] NA
median(all_trips_v2$ride_length)
## [1] NA
max(all_trips_v2$ride_length)
## [1] NA
min(all_trips_v2$ride_length)
## [1] NA
summary(all_trips_v2$ride_length)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 360.0 600.0 954.9 1020.0 147480.0 24
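The NA results above come from missing values: when a logical row filter contains NA, R returns an all-NA row for it, and mean(), median(), max(), and min() then return NA unless told otherwise. The sketch below reproduces the effect on made-up toy data (the station names and durations are hypothetical) and shows two common ways around it: wrapping the filter in which(), and passing na.rm = TRUE.

```r
# Hypothetical toy data with one missing station name and one missing duration
trips <- data.frame(
  start_station_name = c("A St", NA, "HQ QR", "B St"),
  ride_length        = c(600, 300, 120, NA),
  stringsAsFactors   = FALSE
)

# Same style of condition as in the report; NA inputs give NA results
drop <- trips$start_station_name == "HQ QR" | trips$ride_length < 0

# Naive filter: every NA in the index becomes an all-NA row
kept_naive <- trips[!drop, ]
print(kept_naive)
print(mean(kept_naive$ride_length))

# which() keeps only the rows where the condition is definitely FALSE
kept_safe <- trips[which(!drop), ]
print(kept_safe)
print(mean(kept_safe$ride_length))

# Alternatively, leave the rows alone and skip NAs inside the statistic
print(mean(trips$ride_length, na.rm = TRUE))
```

Which option fits depends on whether rides with an unknown station or duration should count at all; which() discards them, while na.rm = TRUE only ignores the missing durations.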
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 1371.5865
## 2 member 727.4112
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 780
## 2 member 540
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 147480
## 2 member 89880
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 0
## 2 member 0
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = function(x) round(mean(x), 2))
## all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1 casual Sunday 1588.96
## 2 member Sunday 816.39
## 3 casual Monday 1350.67
## 4 member Monday 693.39
## 5 casual Tuesday 1226.77
## 6 member Tuesday 698.70
## 7 casual Wednesday 1171.71
## 8 member Wednesday 694.53
## 9 casual Thursday 1195.88
## 10 member Thursday 696.20
## 11 casual Friday 1337.99
## 12 member Friday 720.95
## 13 casual Saturday 1549.52
## 14 member Saturday 814.48
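The same kind of summary can be written with the data argument of aggregate(), which keeps the column headers short, and the result can be converted to minutes for readability. This is a minimal sketch on hypothetical toy values, not the Cyclistic data; the names rides, by_day, and avg_minutes are assumptions for illustration.

```r
# Hypothetical toy rides (durations in seconds)
rides <- data.frame(
  member_casual = c("casual", "member", "casual", "member", "casual", "member"),
  day_of_week   = c("Monday", "Monday", "Sunday", "Sunday", "Sunday", "Monday"),
  ride_length   = c(1200, 600, 1800, 900, 1500, 720),
  stringsAsFactors = FALSE
)

# Ordered factor so that weekdays sort Sunday-first instead of alphabetically
rides$day_of_week <- factor(
  rides$day_of_week,
  levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"),
  ordered = TRUE
)

# Mean ride length per rider type and weekday, with short column names
by_day <- aggregate(ride_length ~ member_casual + day_of_week,
                    data = rides, FUN = mean)

# Convert seconds to minutes for easier reading
by_day$avg_minutes <- round(by_day$ride_length / 60, 2)
print(by_day)
```

Only weekday and rider-type combinations that actually occur in the data appear in the result, in the same casual-then-member, Sunday-first order as the output above.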
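# Parse started_at as a date-time so that lubridate::wday() can extract the weekday
# (this step relies on the dplyr and lubridate packages being loaded)
all_trips_v2 <- all_trips_v2 %>%
  mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M"))
# Number of rides and average duration (seconds) by rider type and weekday
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday)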
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 15 × 4
## # Groups: member_casual [3]
## member_casual weekday number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sun 244063 1589.
## 2 casual Mon 169512 1351.
## 3 casual Tue 175979 1227.
## 4 casual Wed 177380 1172.
## 5 casual Thu 193240 1196.
## 6 casual Fri 219360 1338.
## 7 casual Sat 296484 1550.
## 8 member Sun 296909 816.
## 9 member Mon 373935 693.
## 10 member Tue 433874 699.
## 11 member Wed 438064 695.
## 12 member Thu 438318 696.
## 13 member Fri 385788 721.
## 14 member Sat 336959 814.
## 15 <NA> <NA> 24 NA
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Annual Rides by Weekday")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(
    number_of_rides = n(),
    average_duration = mean(ride_length)
  ) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = average_duration / 60, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Average Annual Ride Duration by Weekday", y = "Average Duration (minutes)")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## Warning: Removed 1 rows containing missing values (`geom_col()`).
Casual riders show different preferences. They ride less frequently but for longer durations, and they favor weekends, with ridership peaking on Saturday. Their top start and end stations are at Streeter Dr. and Grand Ave., a different area of interest from that of annual members, which allows advertising to be targeted at casual riders.
All of these findings highlight the importance of understanding the distinct behaviors and preferences of members and casual riders. With this information, Cyclistic can tailor its strategies to better serve each user type: it can optimize station placement, adjust operational schedules to accommodate peak times, and design targeted marketing campaigns to attract and retain both annual members and casual riders. All of this can enhance the overall user experience and lead to greater customer satisfaction while bringing Cyclistic sustained growth. This data-driven analysis demonstrates the value of using information to make informed decisions, drive continued growth, and support the success of Cyclistic.
Facts about the differences between annual members and casual riders from January 2023 to December 2023:
Members ride more often than casual riders: the weekday table above shows 2,703,847 member rides against 1,476,018 casual rides for the year, about 83% more.
Tuesday through Thursday are the most popular days for members, suggesting a regular weekday commuting pattern.
Both members and casual riders ride most often between 3 and 6 p.m., indicating high demand in the late afternoon.
The top start and end stations for members are at Kingsbury and Kinzie St., while casual riders prefer Streeter Dr. and Grand Ave.
Casual riders ride most on Saturday and Sunday, indicating a preference for weekends.
Casual rides last longer than member rides: the mean ride is about 1,372 seconds (22.9 minutes) for casual riders against 727 seconds (12.1 minutes) for members, roughly 89% longer, possibly reflecting leisure rides or trips to explore different destinations.
July records the highest ridership and December the lowest, possibly due to weather conditions and holiday-related factors.
Saturday is the most popular day of the week across all users (633,443 rides) and is the day on which casual ridership comes closest to member ridership (296,484 against 336,959 rides), although members still take more rides.
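As a cross-check, percentage comparisons between the two rider types can be recomputed from the weekday table and the aggregate() results printed earlier in this report. The sketch below is one way to do that; the helper pct_more() and the variable names are hypothetical, and figures computed on a different basis (for example, per user rather than per ride) would differ.

```r
# Ride counts copied from the weekday summary table in this report
casual_rides <- c(Sun = 244063, Mon = 169512, Tue = 175979, Wed = 177380,
                  Thu = 193240, Fri = 219360, Sat = 296484)
member_rides <- c(Sun = 296909, Mon = 373935, Tue = 433874, Wed = 438064,
                  Thu = 438318, Fri = 385788, Sat = 336959)

# Hypothetical helper: how much larger a is than b, in percent
pct_more <- function(a, b) round(100 * (a - b) / b, 1)

total_casual <- sum(casual_rides)
total_member <- sum(member_rides)

# Members versus casual riders: total rides for the year
rides_pct <- pct_more(total_member, total_casual)

# Casual versus members: mean and median ride length (seconds) from aggregate()
mean_pct   <- pct_more(1371.5865, 727.4112)
median_pct <- pct_more(780, 540)

# Busiest weekday overall, and any weekday where casual rides exceed member rides
busiest_day    <- names(which.max(casual_rides + member_rides))
casual_leading <- names(which(casual_rides > member_rides))

print(c(rides_pct = rides_pct, mean_pct = mean_pct, median_pct = median_pct))
print(busiest_day)
print(casual_leading)
```

Because the mean is pulled upward by a small number of very long rides, the median comparison is a useful second view of the typical difference in ride duration.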
I recommend that stakeholders continue to monitor and analyze rider data to pinpoint emerging trends, patterns, and areas for improvement. This practice will allow the bike-share program to make well-informed decisions and adjust its strategies as user preferences evolve. Implementing these measures will allow Cyclistic to create a more personalized and satisfying experience for both members and casual riders. This tailored approach will not only attract new riders but also foster loyalty and engagement among existing users, leading to sustainable growth and a competitive advantage in the market.
In conclusion, the solutions derived from data analysis are designed to optimize marketing efficiency through targeted advertising and tailored surveys for different user segments. By implementing these strategies, Cyclistic can position itself for long-term growth while also making a positive impact on the community and advocating for sustainable transportation practices.