Welcome to my Capstone Case Study for Cyclistic. We will dive into the data analysis process to answer stakeholder questions and recommend actionable insights for data-driven decisions.
Cyclistic, a bike-sharing company in Chicago with a fleet of over 5,800 bikes, aims to increase annual memberships. Casual riders, who currently purchase single-ride or full-day passes, are less profitable for Cyclistic than annual members. However, the pricing flexibility has helped attract more customers. Cyclistic’s finance analysts have concluded that maximizing the number of annual members will be key to future growth. Lily Moreno, director of marketing, believes converting casual riders into members is a solid opportunity and set a goal to design marketing strategies aimed at this conversion. To better understand how to achieve this goal, her team needs to analyze historical data to identify trends and differences between annual members and casual riders.
Based on the information provided, we can make some initial assumptions and theories. We may assume that annual members and casual riders exhibit distinct patterns in bike usage, ride duration, and frequency. Theorizing that annual members, being more committed, would have longer average ride durations and a higher usage frequency compared to casual riders, we also speculate that factors such as cost-effectiveness, convenience, carbon footprint awareness, and exclusive benefits may motivate casual riders to transition into annual members.
Three questions will guide the future marketing program:
How do annual members and casual riders use Cyclistic bikes differenty? Task given by our manager, Lily Moreno.
Why would casual riders buy Cyclistic’s annual memberships?
How can Cyclistic use digital media to influence casual riders to become annual members?
1. A clear statement of the business task.
Our overarching goal is to drive the conversion of casual riders into devoted annual members, foster long-term loyalty, and maximize the growth of Cyclistic.
2. A description of sourced data.
Cyclistic’s historical trip data, based on the last annual cycle from April 2023 to March 2024. Motivate International Inc. has made the data available under this License
3. Documentation of any cleaning or manipulation of data.
All Excel, SQL, and R cleaning and manipulation are found in the GitHub Capstone-Case-Study-Projects repository “Case Study 1 - Cyclistic Bike-Share Analysis” in extensive detail.
4. A Summary of Analysis
The analysis of bike ride data reveals several key insights into user behavior and system performance. Overall, the annual average ride length for all users stands at 16 minutes and 07 seconds, with casual riders taking longer rides averaging 23 minutes and 05 seconds compared to members’ average of 12 minutes and 18 seconds. July emerges as the month with the largest variety of ride length durations, totaling 11,042 unique durations out of 20,011 distinct lengths recorded throughout the year. While customer usage peaks in August and declines in January, both casual and member riders show consistent usage patterns from May to October, with a notable drop from November to April. Peak usage occurs during late afternoons, especially between 3-6pm, and differs between casual riders, who peak on Saturdays, and members, who peak on Thursdays. Despite members constituting the majority of riders, casual riders contribute significantly to weekend usage, comprising 35.34% of rides compared to members’ 64.66%. Further analysis of top-performing days, ride lengths, and popular rideable types enriches our understanding of user preferences and system dynamics, underscoring the importance of tailored strategies to accommodate diverse user behaviors.
5. Visualizations and key findings.
All Excel, SQL, and R-supporting visualizations are found in theGitHub Capstone-Case-Study-Projects repository “Case Study 1 - Cyclistic Bike-Share Analysis” in extensive detail.
6. Top 3 recommendations for stakeholders. To drive the conversion of casual riders into devoted annual members, foster long-term loyalty, and maximize the growth of Cyclistic, stakeholders should consider the following recommendations:
Promotional Incentives for Membership Conversion: Implement targeted marketing campaigns offering exclusive discounts, perks, or incentives to encourage casual riders to sign up for annual memberships. Providing trial memberships or discounted rates for the first year can entice casual riders to experience the benefits of membership, such as unlimited rides, priority access to bikes, and member-exclusive events.
Enhanced User Experience and Engagement: Invest in improving the overall user experience of the Cyclistic platform and mobile app to enhance convenience, ease of use, and engagement for both casual and member riders. Introduce features such as personalized recommendations, rewards programs, and interactive challenges to incentivize continued usage and foster a sense of community among members.
Data-Driven Targeting and Customization: Utilize data analytics and segmentation techniques to identify key demographics, usage patterns, and preferences among casual riders. Tailor marketing messages, service offerings, and member benefits based on these insights to effectively target and engage potential members with personalized offers and experiences that resonate with their needs and interests.
By implementing these recommendations, Cyclistic can drive higher conversion rates of casual riders into loyal annual members, foster long-term customer loyalty, and achieve sustainable growth by maximizing user retention and satisfaction.
In truth, the solution is beyond our assigned task. Any implications are nothing more than speculation without answering the other two stakeholder questions. However, what our question can tell you are facts about the difference between annual members and causal riders. With that said, from April 2023 to March 2024:
Present in this Rmd file at the bottom of the page
The findings of this case study provide valuable insights into the differences between annual members and casual riders of Cyclistic. The analysis revealed that casual riders tend to take longer rides, while annual members prefer shorter rides. Additionally, casual riders are more likely to use the bikes on weekends and on top-performing days, while members exhibit a consistent usage pattern throughout the week. This information can help Cyclistic develop targeted marketing strategies to convert casual riders into annual members. By offering incentives, enhancing the user experience, and utilizing data-driven insights, Cyclistic can increase the number of annual members and achieve sustainable growth. In conclusion, leveraging the comprehensive analysis of Cyclistic’s bike ride data, the company can develop tailored strategies to drive membership conversion, foster long-term loyalty, and maximize growth. By implementing recommendations such as promotional incentives and user experience enhancements, Cyclistic is poised to capitalize on opportunities and thrive in the competitive urban transportation market.
Preferred payment options: Investigate whether annual members and casual riders have preferences for specific payment options, such as credit cards, mobile apps, or in-person payments.
Demographic factors: Explore if there are any demographic differences between annual members and casual riders, such as age, gender, or income level, that could influence their bike usage behavior.
Purpose of rides: Examine the purpose of bike rides for annual members and casual riders. Do annual members primarily use bikes for commuting, while casual riders use them for leisure or recreational purposes?
With all that new data, create user-specific marketing, such as:
Develop targeted marketing campaigns for members and casual users separately. Highlight the convenience, cost-effectiveness, and environmental benefits of membership to attract and retain members. For casual users, emphasize the flexibility and leisurely experience and create promotions to encourage weekday ridership.
Seasonal Promotions: Capitalize on the popularity of July by introducing special promotions, such as discounted membership rates, extended riding hours, or partnerships with local events and attractions. Additionally, for December, consider offering holiday-themed incentives, such as festive decorations, seasonal rides, or charity initiatives, to engage riders and increase usage.
By implementing these recommendations, Cyclistic can foster a more personalized and enjoyable experience for both members and casual users. This tailored approach will not only attract new riders but also increase rider loyalty and engagement, leading to sustainable growth and a competitive edge in the market.
The proposed solutions gained from data analysis aim to enhance marketing efficiency by utilizing targeted strategies in advertising and surveys for each respective user type. Cylistic can position itself for sustained growth while fostering a positive impact on the community and promoting sustainable transportation.
all_trips <- bind_rows(db1,db2,db3,db4,db5,db6,db7,db8,db9,db10,db11,db12)
all_trips$ride_length <- as.numeric(as.POSIXlt(all_trips$ride_length, format = "%H:%M:%S"))
colnames(all_trips) #List of column names
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "end_station_name"
## [7] "member_casual" "ride_length" "day_of_week"
nrow(all_trips) #How many rows are in data frame?
## [1] 4355499
dim(all_trips) #Dimensions of the data frame?
## [1] 4355499 9
head(all_trips) #See the first 6 rows of data frame. Also tail(all_trips)
## # A tibble: 6 × 9
## ride_id rideable_type started_at ended_at start_station_name end_station_name
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 AD30447… classic_bike 4/1/2023 … 4/1/202… Sheffield Ave & W… Sheffield Ave &…
## 2 DC3F49A… classic_bike 4/1/2023 … 4/1/202… California Ave & … California Ave …
## 3 749D486… classic_bike 4/1/2023 … 4/1/202… Sheridan Rd & Noy… Chicago Ave & S…
## 4 D50B53B… electric_bike 4/1/2023 … 4/1/202… Southport Ave & R… Southport Ave &…
## 5 563BB19… classic_bike 4/1/2023 … 4/1/202… Wentworth Ave & 3… Halsted St & 35…
## 6 C00E421… electric_bike 4/1/2023 … 4/1/202… Clark St & Armita… Clark St & Drum…
## # ℹ 3 more variables: member_casual <chr>, ride_length <dbl>, day_of_week <dbl>
str(all_trips) #See list of columns and data types (numeric, character, etc)
## spc_tbl_ [4,355,499 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4355499] "AD304476EF192169" "DC3F49A1DA0EABFD" "749D48646A35C4F0" "D50B53B3DBE498FF" ...
## $ rideable_type : chr [1:4355499] "classic_bike" "classic_bike" "classic_bike" "electric_bike" ...
## $ started_at : chr [1:4355499] "4/1/2023 0:00" "4/1/2023 0:02" "4/1/2023 0:01" "4/1/2023 0:05" ...
## $ ended_at : chr [1:4355499] "4/1/2023 0:03" "4/1/2023 0:04" "4/1/2023 0:05" "4/1/2023 0:05" ...
## $ start_station_name: chr [1:4355499] "Sheffield Ave & Wrightwood Ave" "California Ave & Milwaukee Ave" "Sheridan Rd & Noyes St (NU)" "Southport Ave & Roscoe St" ...
## $ end_station_name : chr [1:4355499] "Sheffield Ave & Webster Ave" "California Ave & Milwaukee Ave" "Chicago Ave & Sheridan Rd" "Southport Ave & Roscoe St" ...
## $ member_casual : chr [1:4355499] "member" "member" "casual" "member" ...
## $ ride_length : num [1:4355499] 183 95 244 3 422 259 194 177 298 396 ...
## $ day_of_week : num [1:4355499] 7 7 7 7 7 7 7 7 7 7 ...
## - attr(*, "spec")=
## .. cols(
## .. ride_id = col_character(),
## .. rideable_type = col_character(),
## .. started_at = col_character(),
## .. ended_at = col_character(),
## .. start_station_name = col_character(),
## .. end_station_name = col_character(),
## .. member_casual = col_character(),
## .. ride_length = col_time(format = ""),
## .. day_of_week = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(all_trips) #Statistical summary of data. Mainly for numerics
## ride_id rideable_type started_at ended_at
## Length:4355499 Length:4355499 Length:4355499 Length:4355499
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## start_station_name end_station_name member_casual ride_length
## Length:4355499 Length:4355499 Length:4355499 Min. : 0.0
## Class :character Class :character Class :character 1st Qu.: 344.0
## Mode :character Mode :character Mode :character Median : 599.0
## Mean : 967.4
## 3rd Qu.: 1063.0
## Max. :86392.0
## day_of_week
## Min. :1.000
## 1st Qu.:2.000
## Median :4.000
## Mean :4.113
## 3rd Qu.:6.000
## Max. :7.000
all_trips$date <- as.Date(all_trips$started_at, format = "%m/%d/%Y %H:%M")
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<0),]
mean(all_trips_v2$ride_length)
## [1] 967.4112
median(all_trips_v2$ride_length)
## [1] 599
max(all_trips_v2$ride_length)
## [1] 86392
min(all_trips_v2$ride_length)
## [1] 0
summary(all_trips_v2$ride_length)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 344.0 599.0 967.4 1063.0 86392.0
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 1385.5452
## 2 member 738.9054
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 778
## 2 member 525
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 86261
## 2 member 86392
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 0
## 2 member 0
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week,
levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = function(x) round(mean(x), 2))
## all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1 casual Sunday 1605.37
## 2 member Sunday 825.14
## 3 casual Monday 1370.22
## 4 member Monday 705.95
## 5 casual Tuesday 1250.19
## 6 member Tuesday 715.22
## 7 casual Wednesday 1184.54
## 8 member Wednesday 707.79
## 9 casual Thursday 1197.37
## 10 member Thursday 705.39
## 11 casual Friday 1346.26
## 12 member Friday 730.54
## 13 casual Saturday 1557.03
## 14 member Saturday 823.07
all_trips_v2 <- all_trips_v2 %>%
mutate(started_at = as.POSIXct(started_at, format = "%m/%d/%Y %H:%M"))
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(
number_of_rides = n(),
average_duration = mean(ride_length)
) %>%
arrange(member_casual, weekday)
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups: member_casual [2]
## member_casual weekday number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sun 255156 1605.
## 2 casual Mon 176790 1370.
## 3 casual Tue 180145 1250.
## 4 casual Wed 181465 1185.
## 5 casual Thu 200549 1197.
## 6 casual Fri 228897 1346.
## 7 casual Sat 316119 1557.
## 8 member Sun 306838 825.
## 9 member Mon 391777 706.
## 10 member Tue 444157 715.
## 11 member Wed 454949 708.
## 12 member Thu 462618 705.
## 13 member Fri 399908 731.
## 14 member Sat 356131 823.
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(
number_of_rides = n(),
average_duration = mean(ride_length)
) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(title = "Total Annual Rides by Weekday")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(
number_of_rides = n(),
average_duration = mean(ride_length)
) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = average_duration / 60, fill = member_casual)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(title = "Average Annual Ride Duration by Weekday", y = "Average Duration (minutes)")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.