Cyclistic: The uphill battle to convert casual users to members

The following notebook is an analysis of an example case study for the fictional company Cyclistic.

Because this is a sample case study for a portfolio, most of the code is shown. A version meant for stakeholders would omit code where feasible so as not to distract a casual reader.

The data is provided under a licensing agreement with Motivate International Inc. that permits public use of the data set. In keeping with that agreement and with guidelines on the ethical use of data, the data will not be used to identify individual riders or to obtain personal details such as credit card numbers.

The data for the past 12 months is contained in 12 CSV files, one per month (August 2021 through July 2022). Each file holds tens of thousands of rows or more, and together the files total approximately six million rows.

The insights gleaned here are aimed at understanding how the use of Cyclistic’s products and services differs between casual users and full-fledged members, and how that difference can inform a strategy to convert casual users into members.

\(-\)

To begin, we install and load the packages necessary for the analysis.
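The chunk that produced the log below is not shown in this rendering. As a minimal sketch of a setup step, assuming the three packages named in that log, the following installs only what is missing and then attaches everything. The helpers `missing_packages` and `load_packages` are hypothetical names introduced here, not part of the original notebook.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install anything missing, then attach every package.
load_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Usage (commented out so that sourcing this sketch has no side effects):
# load_packages(c("tidyverse", "lubridate", "ggplot2"))
```

Checking before installing avoids downloading the same packages on every run. The startup message in the log shows that tidyverse already attaches ggplot2, so listing ggplot2 separately is harmless but redundant.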

## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Tyler\AppData\Local\Temp\Rtmp6TvpgN\downloaded_packages
## package 'lubridate' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Tyler\AppData\Local\Temp\Rtmp6TvpgN\downloaded_packages
## package 'ggplot2' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\Tyler\AppData\Local\Temp\Rtmp6TvpgN\downloaded_packages
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## 
## Attaching package: 'lubridate'
## 
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

\(-\)

Next, we read each monthly data set, previously cleaned and organized, into R.

divvy_202108 <- read_csv("202108-divvy-tripdata.csv")
divvy_202109 <- read_csv("202109-divvy-tripdata.csv")
divvy_202110 <- read_csv("202110-divvy-tripdata.csv")
divvy_202111 <- read_csv("202111-divvy-tripdata.csv")
divvy_202112 <- read_csv("202112-divvy-tripdata.csv")
divvy_202201 <- read_csv("202201-divvy-tripdata.csv")
divvy_202202 <- read_csv("202202-divvy-tripdata.csv")
divvy_202203 <- read_csv("202203-divvy-tripdata.csv")
divvy_202204 <- read_csv("202204-divvy-tripdata.csv")
divvy_202205 <- read_csv("202205-divvy-tripdata.csv")
divvy_202206 <- read_csv("202206-divvy-tripdata.csv")
divvy_202207 <- read_csv("202207-divvy-tripdata.csv")

\(-\)

Combine each month’s data into a single dataframe called “all_trips.”

all_trips <- bind_rows(
  divvy_202108, divvy_202109, divvy_202110, divvy_202111,
  divvy_202112, divvy_202201, divvy_202202, divvy_202203,
  divvy_202204, divvy_202205, divvy_202206, divvy_202207
)
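The read and combine steps above can also be done in one pass. The sketch below uses only base R so that it runs without extra packages; `read_all_trips` is a hypothetical helper, and it assumes every monthly file has the same column names.

```r
# Hypothetical helper: read every matching CSV in `dir` and stack the rows.
# Assumption: all files share the same column names (rbind requires this).
read_all_trips <- function(dir = ".", pattern = "-divvy-tripdata\\.csv$") {
  files <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
  if (length(files) == 0) {
    stop("No files matching ", pattern, " found in ", dir)
  }
  monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, monthly)
}

# Usage, assuming the CSV files sit in the working directory:
# all_trips <- read_all_trips(".")
```

Listing the files by pattern means a new month is picked up without editing the code. With readr 2.0 or later, passing a vector of paths to a single `read_csv()` call is another option.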

\(-\)

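Here, we compare members and casual users on four summary statistics of ride length: mean, median, maximum, and minimum. All values are in seconds, although the median output omits the unit label. Both maxima sit just under 24 hours (86,400 seconds), which suggests that rides lasting a day or longer were removed during cleaning.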

aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = mean)
##   all_trips$member_casual all_trips$ride_length
## 1                  casual        1456.8947 secs
## 2                  member         762.1266 secs
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = median)
##   all_trips$member_casual all_trips$ride_length
## 1                  casual                   864
## 2                  member                   541
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = max)
##   all_trips$member_casual all_trips$ride_length
## 1                  casual            86398 secs
## 2                  member            86399 secs
aggregate(all_trips$ride_length ~ all_trips$member_casual, FUN = min)
##   all_trips$member_casual all_trips$ride_length
## 1                  casual                0 secs
## 2                  member                0 secs
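The four `aggregate()` calls above can be collapsed into a single pass that returns one row per rider type. This is a sketch on made-up toy data, not the Cyclistic data; `ride_length_summary` is a hypothetical helper, and it assumes `ride_length` is either a difftime or a plain number of seconds.

```r
# Hypothetical helper: mean, median, max, and min ride length (in seconds)
# for each value of member_casual.
ride_length_summary <- function(trips) {
  secs <- trips$ride_length
  secs <- if (inherits(secs, "difftime")) {
    as.numeric(secs, units = "secs")
  } else {
    as.numeric(secs)
  }
  by_type <- split(secs, trips$member_casual)
  stats <- t(vapply(by_type, function(x) {
    c(mean = mean(x), median = median(x), max = max(x), min = min(x))
  }, numeric(4)))
  data.frame(member_casual = rownames(stats), stats, row.names = NULL)
}

# Toy data for illustration only.
toy_trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  ride_length = as.difftime(c(600, 1800, 300, 500, 700), units = "secs")
)
toy_summary <- ride_length_summary(toy_trips)
print(toy_summary)
```

Converting to numeric seconds first keeps the units consistent across all four statistics, which sidesteps the missing unit label seen in the median output.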

\(-\)

Next, we aggregate the average ride time by day of the week for members and casual users. Days are coded 1 (Sunday) through 7 (Saturday).

aggregate(all_trips$ride_length ~ all_trips$member_casual + all_trips$day_of_week, FUN = mean)
##    all_trips$member_casual all_trips$day_of_week all_trips$ride_length
## 1                   casual                     1        1685.2801 secs
## 2                   member                     1         861.4598 secs
## 3                   casual                     2        1494.7334 secs
## 4                   member                     2         739.2754 secs
## 5                   casual                     3        1283.4907 secs
## 6                   member                     3         715.3556 secs
## 7                   casual                     4        1247.2417 secs
## 8                   member                     4         723.2198 secs
## 9                   casual                     5        1286.4675 secs
## 10                  member                     5         728.0092 secs
## 11                  casual                     6        1349.8715 secs
## 12                  member                     6         742.5144 secs
## 13                  casual                     7        1604.1259 secs
## 14                  member                     7         852.7000 secs
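The numeric day codes are easier to read as labels. A minimal sketch, assuming the coding 1 = Sunday through 7 = Saturday, which is consistent with the labelled weekday table later in this notebook; `label_day` is a hypothetical helper.

```r
# Assumed coding: 1 = Sunday, ..., 7 = Saturday.
day_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Hypothetical helper: turn numeric day codes into an ordered factor,
# so that tables and plots sort from Sunday to Saturday.
label_day <- function(day_of_week) {
  factor(day_of_week, levels = 1:7, labels = day_labels, ordered = TRUE)
}

# Usage, assuming all_trips has a numeric day_of_week column:
# all_trips$weekday <- label_day(all_trips$day_of_week)
```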

\(-\)

Lastly, before visualizing, we summarize ride counts and average duration by rider type and weekday.

all_trips %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarize(number_of_rides = n(), average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int> <drtn>          
##  1 casual        Sun              475626 1685.2801 secs  
##  2 casual        Mon              299656 1494.7334 secs  
##  3 casual        Tue              273826 1283.4907 secs  
##  4 casual        Wed              281783 1247.2417 secs  
##  5 casual        Thu              316118 1286.4675 secs  
##  6 casual        Fri              347642 1349.8715 secs  
##  7 casual        Sat              527575 1604.1259 secs  
##  8 member        Sun              417978  861.4598 secs  
##  9 member        Mon              472392  739.2754 secs  
## 10 member        Tue              523387  715.3556 secs  
## 11 member        Wed              522648  723.2198 secs  
## 12 member        Thu              522662  728.0092 secs  
## 13 member        Fri              466680  742.5144 secs  
## 14 member        Sat              453490  852.7000 secs
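To make the weekday pattern in the table above easier to compare, the sketch below computes the casual share of all rides on each weekday. The ride counts are copied from the table; the data frame and column names are chosen here for illustration.

```r
# Ride counts copied from the summary table above.
rides <- data.frame(
  weekday = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
  casual  = c(475626, 299656, 273826, 281783, 316118, 347642, 527575),
  member  = c(417978, 472392, 523387, 522648, 522662, 466680, 453490)
)

# Share of each day's rides that were taken by casual users.
rides$casual_share <- rides$casual / (rides$casual + rides$member)

# Days on which casual rides outnumber member rides.
casual_majority_days <- rides$weekday[rides$casual_share > 0.5]

print(rides[, c("weekday", "casual_share")], digits = 3)
```

Casual rides outnumber member rides only on Saturday and Sunday, and the casual share is lowest on Tuesday.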

\(-\)

Now let’s visualize the number of rides by rider type and weekday.

all_trips %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Number of rides by rider type at Cyclistic")

\(-\)

Finally, let’s create a visualization of average ride duration by rider type and weekday.

all_trips %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Average trip duration by rider type at Cyclistic")

\(-\)

Conclusions and Recommendations

The final recommendation is to first target casual riders who are active during the week for membership, and then to gather more data on casual riders’ motivations and preferences before targeting those who are more active on the weekends.

Casual users take longer rides in general, with a mean of about 24 minutes compared with about 13 minutes for members. They also ride mostly on or near the weekend: their ride counts peak on Saturday and Sunday and reach a low in the middle of the week.

Members tend to take shorter rides, and their number of rides peaks on weekdays, from Tuesday through Thursday. That said, members’ average trip duration is longer on weekends (about 14 minutes) than on weekdays (about 12 minutes), which suggests that they use the bicycles both for commuting and for leisure.

The data suggests that many casual users do not use Cyclistic’s services for commuting, although finer-grained data, gathered through a scientific survey or similar means, would give a clearer picture of casual users’ intent.

If the conclusions above are the only relevant ones that can be drawn from this data alone, the best path for converting casual users into full-fledged members is to target casual users who are active on weekdays, because their activity overlaps with the patterns seen among members. Other casual riders can be targeted in a later phase of outreach, once their motivations are clearer.
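As a rough gauge of how large the weekday segment is, the sketch below splits the casual ride counts reported earlier into weekday and weekend totals. Rides are not riders, since one person can account for many rides, so this is only a proxy for the size of the target group. The variable names are chosen here for illustration.

```r
# Casual ride counts by weekday, copied from the summary table earlier.
casual_rides <- c(Sun = 475626, Mon = 299656, Tue = 273826, Wed = 281783,
                  Thu = 316118, Fri = 347642, Sat = 527575)

weekday_total <- sum(casual_rides[c("Mon", "Tue", "Wed", "Thu", "Fri")])
weekend_total <- sum(casual_rides[c("Sat", "Sun")])

# Share of all casual rides that start on a weekday.
weekday_share <- weekday_total / sum(casual_rides)

# Average casual rides per day, weekday versus weekend.
per_day <- c(weekday = weekday_total / 5, weekend = weekend_total / 2)

print(round(weekday_share, 3))
print(per_day)
```

Roughly 60 percent of casual rides start on a weekday, so the weekday segment is sizeable even though each weekend day is individually busier.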

It is also worth bearing in mind that this portion of the analysis aimed to answer one question: “How do annual members and casual riders use Cyclistic bikes differently?” The rest of the analysis team is responsible for two other questions: why casual riders would buy annual memberships, and how Cyclistic can use digital media to influence casual riders to become members. Together, the three questions form the full analysis. This portion keeps the other two in mind so that it can serve as a foundation for that fuller, more detailed work.

This analysis provides a clear recommendation for a first targeted outreach to this cohort, and the results of that digital marketing effort will inform further attempts to convert as many casual riders as possible into full members.