Recent financial analysis has shown that annual memberships generated the biggest revenue to the business and there is the belief that this segment of customers will be key to future business growth. The Director of Marketing and my manager has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. My task is to find out how annual members and casual riders use Cyclistic bikes differently.
The business task is to convert casual riders into annual members.
I am to find out how annual members and casual riders use Cyclistic bikes differently.
Insights from this task can reveal factors which can help boost annual memberships and help formulate steps aimed at convincing casual riders to switch to annual memberships.
Executive team
Director of Marketing (Lily Moreno)
I downloaded files about historical trip dataset from Amazon aws. https://divvy-tripdata.s3.amazonaws.com/index.html. The dataset, owned by Motivate International Inc, is open source.
The downloaded dataset consisted of files spanning from August 2021 to July 2022.
Upon inspection, I found the data in each file to be structured and having a wide format and comprising 13 variables and several hundred thousand rows.
The data across the variables were of different types and formats.
The original variables or column names include: “ride_id”, “rideable_type”, “started_at”, “ended_at”, “start_station_name”, “start_station_id”, “end_station_name”, “end_station_id”, “start_lat”, “start_lng”, “end_lat”, “end_lng” and “member_casual”.
Although the columns had some missing values, there was ample information for analysis.
To aid my analysis, I added additional variables such as “ride_length” by subtracting “started_at” and “ended_at” variables and “day_of_week” by using the =WEEKDAY() function on both the “started_at” and “ended_at” variables. Furthermore, I derived column, “distance_km”, by using the beginning and ending latitude and longitudes for the station locations in the following formula:”=ACOS(COS(RADIANS(90-start_lat)) times COS(RADIANS(90-end_lat)) plus SIN(RADIANS(90-start_lat)) times SIN(RADIANS(90-end_lat)) times COS(RADIANS(start_lng-end_lng))) time 6371”. “speed” was derived by using inputs from the “ride_length” and “distance_km” variables.
The additional columns I created brought the total variables to 17.
The formatting appeared consistent for most variables except for the “ride_id”, “start_station_name”, “end_station_name”, “start_station_id” and “end_station_id” as well as some with missing data. I lopped off a relatively few number of inconsistent distance data in “distance_km” column as well as speed data in the “speed” column. With regard to speed, I referenced the fastest speed every recorded by man on flat surface and motor-paced. I eventually filtered for speeds less than 300km/h in R Studio. I also deleted a few rows in the” ride_length” variable that had inconsistent formatting before importing files to R.
Asides the csv files, I backup copies in excel file type.
Here is my dataset comprising my csv cleaned files:
Jul_2021<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202107-divvy-tripdata.csv")
Aug_2021<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202108-divvy-tripdata.csv")
Sep_2021<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202109-divvy-tripdata.csv")
Oct_2021<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202110-divvy-tripdata.csv")
Nov_2021<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202111-divvy-tripdata.csv")
Dec_2021<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202112-divvy-tripdata.csv")
Jan_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202201-divvy-tripdata.csv")
Feb_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202202-divvy-tripdata.csv")
Mar_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202203-divvy-tripdata.csv")
Apr_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202204-divvy-tripdata.csv")
May_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202205-divvy-tripdata.csv")
Jun_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202206-divvy-tripdata.csv")
Jul_2022<- read.csv("~/Data Analytics/Case Study/Original Files/Tripdata in csv/202207-divvy-tripdata.csv")
Combine the uploaded dataset with the following code chunk:
all_trips<- bind_rows(Aug_2021,Sep_2021,Oct_2021,
Nov_2021,Dec_2021,Jan_2022,Feb_2022,
Mar_2022,Apr_2022,May_2022,Jun_2022,
Jul_2022)
Clean up the data with the following code chunk:
all_trips<- all_trips %>%
select(-c(start_lat,start_lng,end_lat,end_lng)) %>%
mutate(started_at=mdy_hm(started_at,tz="UTC"),
start_date=as.Date(started_at),
start_month=format(as.Date(started_at),"%B"),
start_day=format(as.Date(started_at),"%d"),
start_year=format(as.Date(started_at),"%Y"),
start_weekday=format(as.Date(started_at),"%A"),
ended_at=mdy_hm(ended_at,tz="UTC"),
end_date=as.Date(ended_at),
end_month=format(as.Date(ended_at),"%B"),
end_day=format(as.Date(ended_at),"%d"),
end_year=format(as.Date(ended_at),"%Y"),
end_weekday=format(as.Date(ended_at),"%A")) %>%
select(-c(day_of_week)) %>%
filter(complete.cases(.)) %>%
filter(ride_length>0.00) %>%
mutate(ride_length=difftime(ended_at,started_at,units="mins"))%>%
mutate(
start_weekday=factor(start_weekday,levels=c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")),
end_weekday=factor(end_weekday,levels=c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")),
start_month=factor(start_month,levels=c("July","August","September","October","November","December","January","February","March","April","May","June")),
end_month=factor(end_month,levels=c("July","August","September","October","November","December","January","February","March","April","May","June")))
Run the following code chunk to abbreviate the weekdays and months:
all_trips<- all_trips %>%
mutate(start_weekday=wday(started_at,label=TRUE),
end_weekday=wday(ended_at,label=TRUE),
start_month=month(started_at,label=TRUE),
end_month=month(ended_at,label=TRUE))
Generate Beginning and Ending Dates for the Charts:
mindate<-min(all_trips$start_date)
maxdate<-max(all_trips$start_date)
mindate
## [1] "2021-08-01"
maxdate
## [1] "2022-07-31"
We need a mode function for our results
getmode<- function(a){
modes<- unique(a)
modes[which.max(tabulate(match(a, modes)))]
}
Aggregate statistics
Ride_length<-all_trips %>%
aggregate(ride_length~member_casual+rideable_type,FUN=mean)
all_trips %>% aggregate(distance_km~member_casual,FUN=mean)
## member_casual distance_km
## 1 casual 2.256193
## 2 member 2.088253
all_trips %>% aggregate(ride_length~member_casual,FUN=mean)
## member_casual ride_length
## 1 casual 24.34043 mins
## 2 member 12.65067 mins
all_trips %>% aggregate(ride_length~member_casual,FUN=median)
## member_casual ride_length
## 1 casual 14 mins
## 2 member 9 mins
all_trips %>% aggregate(ride_length~member_casual,FUN=max)
## member_casual ride_length
## 1 casual 41629 mins
## 2 member 1500 mins
all_trips %>% aggregate(ride_length~member_casual,FUN=min)
## member_casual ride_length
## 1 casual 0 mins
## 2 member 0 mins
all_trips %>% aggregate(ride_length~member_casual,getmode)
## member_casual ride_length
## 1 casual 8
## 2 member 5
all_trips %>% aggregate(speed~member_casual,FUN=max)
## member_casual speed
## 1 casual 930.88
## 2 member 1499.04
all_trips %>% aggregate(start_weekday~member_casual+rideable_type,getmode)
## member_casual rideable_type start_weekday
## 1 casual classic_bike Sat
## 2 member classic_bike Tue
## 3 casual docked_bike Sat
## 4 casual electric_bike Sat
## 5 member electric_bike Thu
Mod_weekday<- all_trips %>% aggregate(start_weekday~member_casual+rideable_type,getmode)
Table 1: Cyclistic Customer Behaviour
| Cyclistic's Customer Behaviour | ||||||
|---|---|---|---|---|---|---|
Monthly travel stats of casual and member riders from August '21 to July '22. Average figures. |
||||||
| bike trips (monthly) |
duration, min |
distance, km |
speed, km/h |
peak month |
peak weekday |
|
| casual | ||||||
| classic_bike | 93,320 | 24.94 | 2.12 | 3.04 | Aug | Sat |
| docked_bike1 | 18,539 | 54.25 | 2.17 | 1.99 | Aug | Sat |
| electric_bike | 95,513 | 17.95 | 2.41 | 4.24 | Jul | Sat |
| member | ||||||
| classic_bike | 159,252 | 13.27 | 1.94 | 3.97 | Aug | Tue |
| electric_bike | 120,177 | 11.83 | 2.29 | 5.09 | Jul | Thu |
| Source: Motivate International Inc, Ofili Victor | ||||||
1 Only casual riders rode docked bikes, presumably for leisure, as they travelled the longest |
||||||
Table 2: Major Findings from Analysis
Upon analysis, it was observed that casual riders rode longer, covered more distance but rode less frequently compared to annual members, especially around weekends, and were especially more active during spring and summer relative to annual subscribers. Another observation with casual riders was that they were the only category which used docked bikes. It could therefore be conclusively deduced that casual riders rode for leisure while annual members rose for more serious daily commute to work.
In order to be able to convert Casual riders to annual members, I would recommend that our proposition to Casual riders should make them have a sense that they would derive some sort of benefit so that there is a win-win situation as against a zero-sum game. Then we will compel existing and potential Casual riders to make annual subscriptions in order to take advantage of the benefits.
Casual riders are the only category who used docked bikes and so the use of this bike type can easily be subjected to registration for annual memberships. From Table 01, it can be observed that the docked bikes spend the most time out of stations; hence, higher rental fee could be demanded for this bike type compared to the other types, potentially increasing profitability.
People generally like to be part of a good cause. Since riding bikes are considered environmentally friendly, I will recommend that Cyclistic get involved with activities tied to promoting social causes such as protection of the environment. The messaging should communicate to the Casual rider the benefits of saving the environment and that a fraction of their subscription fees could, for example, support the planting of more trees in deforested areas around the world, etc.
From the results of my analysis, it was observed that most Casual riders are active in spring and summer as these are the seasons when physical social activities and interactions are on the high side. I recommend that the marketing campaign message be also based on the health benefits of riding bikes while also providing networking opportunities with fellow riders to cultivate new meaningful relationships in areas of mutual interest such as business, social, etc.
Finally, I would recommend to my supervisor for the analytics team to conduct an online survey among existing riders to understand the target audience better and get to know their mobility needs better and their reasons for their preference for flexible plans. This will give the team more information with which to design campaign messages.