1. Introduction

Cyclistic is a bike-sharing company that has transformed urban transportation by providing an affordable, sustainable, and flexible travel option for both residents and tourists. The service operates a fleet of bikes that users can rent from docking stations across the city, catering to a diverse range of customers.

The company serves two primary user categories: casual riders and annual members.

Cyclistic’s management has observed that annual members tend to generate higher long-term revenue and use the service more frequently. As a result, the marketing team wants to understand how the behavior of casual riders differs from that of annual members, so that it can develop strategies that encourage casual riders to become annual members.

Business Task

The key objective of this analysis is to examine how casual riders and annual members use Cyclistic’s bikes differently. By analyzing patterns such as ride duration, frequency, and peak usage times, the company can create targeted marketing campaigns and promotional strategies to drive membership growth.

This case study will address the following questions:

  1. How do ride durations differ between casual riders and annual members?
  2. On which days of the week are casual riders most active?
  3. What usage trends can be leveraged to encourage casual riders to convert into members?

Approach & Methodology

This study uses historical ride data from Q1 2020 (the file Divvy_Trips_2020_Q1.csv). The analysis involves:

  • Data Cleaning – Checking for missing values, formatting inconsistencies, and preparing the dataset for analysis.
  • Feature Engineering – Creating new variables such as ride duration and day of the week to enhance the analysis.
  • Data Visualization – Using ggplot2 to generate clear and insightful graphs to compare member and casual rider behaviors.

Tools Used

  • RStudio: Primary environment for scripting, data manipulation, and visualization.
  • tidyverse: Collection of R packages for data wrangling, transformation, and visualization; version 2.0.0 attaches ggplot2 and lubridate automatically.
  • ggplot2: Draws the bar charts that compare member and casual rider behavior.
  • lubridate: Parses date-times and extracts the day of the week.
  • RMarkdown: Compiles the analysis into a well-structured and reproducible report.

2. Data Cleaning and Processing

Loading Required Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Already attached by library(tidyverse) above; loaded explicitly here for clarity.
library(lubridate)
library(ggplot2)

Loading Dataset

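# Point R at the folder that holds the downloaded CSV (machine-specific path).
setwd("C:/Users/HM Traders/Downloads/Cyclistic_Data")
# Read the Q1 2020 trips; show_col_types = FALSE suppresses the column-type message.
divvy_2020_q1 <- read_csv("Divvy_Trips_2020_Q1.csv", show_col_types = FALSE)
# Preview the first six rows.
head(divvy_2020_q1)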
## # A tibble: 6 × 13
##   ride_id          rideable_type started_at          ended_at           
##   <chr>            <chr>         <dttm>              <dttm>             
## 1 EACB19130B0CDA4A docked_bike   2020-01-21 20:06:59 2020-01-21 20:14:30
## 2 8FED874C809DC021 docked_bike   2020-01-30 14:22:39 2020-01-30 14:26:22
## 3 789F3C21E472CA96 docked_bike   2020-01-09 19:29:26 2020-01-09 19:32:17
## 4 C9A388DAC6ABF313 docked_bike   2020-01-06 16:17:07 2020-01-06 16:25:56
## 5 943BC3CBECCFD662 docked_bike   2020-01-30 08:37:16 2020-01-30 08:42:48
## 6 6D9C8A6938165C11 docked_bike   2020-01-10 12:33:05 2020-01-10 12:37:54
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <dbl>,
## #   end_station_name <chr>, end_station_id <dbl>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

Checking for Missing Values

colSums(is.na(divvy_2020_q1))
##            ride_id      rideable_type         started_at           ended_at 
##                  0                  0                  0                  0 
## start_station_name   start_station_id   end_station_name     end_station_id 
##                  0                  0                  1                  1 
##          start_lat          start_lng            end_lat            end_lng 
##                  0                  0                  1                  1 
##      member_casual 
##                  0
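The output above shows one missing value in each of four end-of-ride columns. The following is a minimal sketch of one way to handle such rows, assuming that a ride without end-station information should be dropped rather than imputed. The tibble `toy_rides` and its values are hypothetical stand-ins, not the Divvy data.

```r
library(dplyr)

# Hypothetical toy data standing in for divvy_2020_q1; all values are made up.
toy_rides <- tibble(
  ride_id          = c("A1", "B2", "C3"),
  end_station_name = c("Station X", NA, "Station Y"),
  end_station_id   = c(10, NA, 20),
  member_casual    = c("member", "casual", "member")
)

# Count missing values per column, as in the check above.
na_counts <- colSums(is.na(toy_rides))

# Assumption: rows with missing end-station fields are dropped, not imputed.
toy_clean <- toy_rides %>%
  filter(!is.na(end_station_name), !is.na(end_station_id))

# Report how many rows the filter removed.
rows_dropped <- nrow(toy_rides) - nrow(toy_clean)
print(na_counts)
print(rows_dropped)
```

Recording `rows_dropped` makes it easy to state in the report how much data the cleaning step removed.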

Feature Engineering

# Add the ride duration in minutes and an abbreviated weekday label for each ride.
divvy_2020_q1 <- divvy_2020_q1 %>% 
  mutate(ride_duration = as.numeric(difftime(ended_at, started_at, units = "mins")),
         day_of_week = wday(started_at, label = TRUE))
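Derived durations are worth a sanity check before they are averaged. The sketch below repeats the feature-engineering step on a small hypothetical tibble (`toy_rides`, with one deliberately invalid row) and assumes that rides with a zero or negative duration are data errors to exclude. Whether the real dataset contains such rows is not established here.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; the second ride ends before it starts on purpose.
toy_rides <- tibble(
  ride_id    = c("A1", "B2", "C3"),
  started_at = ymd_hms(c("2020-01-21 20:06:59",
                         "2020-01-30 14:22:39",
                         "2020-01-09 19:29:26")),
  ended_at   = ymd_hms(c("2020-01-21 20:14:30",
                         "2020-01-30 14:20:00",
                         "2020-01-09 19:32:17"))
)

# Same derived columns as in the report: duration in minutes and weekday label.
toy_rides <- toy_rides %>%
  mutate(
    ride_duration = as.numeric(difftime(ended_at, started_at, units = "mins")),
    day_of_week   = wday(started_at, label = TRUE)
  )

# Assumption: non-positive durations are treated as errors and removed.
toy_valid <- toy_rides %>%
  filter(ride_duration > 0)

print(summary(toy_rides$ride_duration))
print(nrow(toy_rides) - nrow(toy_valid))
```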

3. Data Analysis and Visualization

Total Rides by User Type

# Count rides per user type and plot the totals as a bar chart.
divvy_2020_q1 %>% 
  group_by(member_casual) %>% 
  summarise(total_rides = n()) %>% 
  ggplot(aes(x = member_casual, y = total_rides, fill = member_casual)) + 
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  labs(title = "Total Rides by User Type", x = "User Type", y = "Total Rides") +
  theme_minimal()
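A bar chart shows which group rides more, but a small table makes the gap quotable. The sketch below computes each user type's share of total rides on a hypothetical tibble (`toy_rides`, with made-up values). The same pipeline could be applied to `divvy_2020_q1` to put a number on the difference.

```r
library(dplyr)

# Hypothetical toy data: three member rides and one casual ride.
toy_rides <- tibble(
  member_casual = c("member", "member", "member", "casual")
)

# Rides per user type plus each type's share of all rides.
ride_share <- toy_rides %>%
  count(member_casual, name = "total_rides") %>%
  mutate(share = total_rides / sum(total_rides))

print(ride_share)
```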

Average Ride Duration by User Type

# Average the ride duration (in minutes) per user type and plot it as a bar chart.
divvy_2020_q1 %>% 
  group_by(member_casual) %>% 
  summarise(avg_duration = mean(ride_duration)) %>% 
  ggplot(aes(x = member_casual, y = avg_duration, fill = member_casual)) + 
  geom_col() +  # equivalent to geom_bar(stat = "identity")
  labs(title = "Average Ride Duration by User Type", x = "User Type", y = "Duration (mins)") +
  theme_minimal()
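A mean can be pulled upward by a few very long rides, so it may help to report the median alongside it. The sketch below does this on a hypothetical tibble (`toy_rides`, with made-up durations that include one extreme value). It illustrates the comparison and is not a result from the Divvy data.

```r
library(dplyr)

# Hypothetical toy data; the 300-minute casual ride acts as an outlier.
toy_rides <- tibble(
  member_casual = c("member", "member", "member", "casual", "casual", "casual"),
  ride_duration = c(5, 7, 9, 10, 20, 300)
)

# Mean and median duration per user type.
duration_summary <- toy_rides %>%
  group_by(member_casual) %>%
  summarise(
    avg_duration    = mean(ride_duration, na.rm = TRUE),
    median_duration = median(ride_duration, na.rm = TRUE),
    .groups = "drop"
  )

print(duration_summary)
```

In this toy example the casual mean is 110 minutes while the casual median is 20 minutes, which shows how far a single long ride can move the average.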

Rides by Day of the Week

# Count rides per weekday and user type, then plot side-by-side bars.
divvy_2020_q1 %>% 
  group_by(day_of_week, member_casual) %>% 
  summarise(total_rides = n(), .groups = 'drop') %>% 
  ggplot(aes(x = day_of_week, y = total_rides, fill = member_casual)) + 
  geom_col(position = "dodge") +  # equivalent to geom_bar(stat = "identity", position = "dodge")
  labs(title = "Rides by Day of the Week", x = "Day of the Week", y = "Total Rides") +
  theme_minimal()
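To compare weekend and weekday use directly, the daily counts can be collapsed into a weekend share per user type. The sketch below assumes that "weekend" means Saturday and Sunday and uses a hypothetical tibble (`toy_rides`, with made-up timestamps). The `week_start = 1` argument makes `wday()` number Monday as 1, so Saturday and Sunday become 6 and 7.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data; the weekday of each date is noted for reference.
toy_rides <- tibble(
  started_at = ymd_hms(c("2020-01-04 10:00:00",    # Saturday
                         "2020-01-05 11:00:00",    # Sunday
                         "2020-01-06 08:00:00",    # Monday
                         "2020-01-07 08:00:00",    # Tuesday
                         "2020-01-08 17:30:00",    # Wednesday
                         "2020-01-11 09:00:00")),  # Saturday
  member_casual = c("casual", "casual", "casual", "member", "member", "member")
)

# Assumption: weekend = Saturday (6) or Sunday (7) when weeks start on Monday.
weekend_share <- toy_rides %>%
  mutate(is_weekend = wday(started_at, week_start = 1) >= 6) %>%
  group_by(member_casual) %>%
  summarise(
    total_rides   = n(),
    weekend_share = mean(is_weekend),
    .groups = "drop"
  )

print(weekend_share)
```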

4. Conclusion

Key Findings:

  • In Q1 2020, annual members took more rides than casual riders.
  • Casual riders have longer average ride durations, which suggests recreational use.
  • Casual ridership is higher on weekends, while members ride consistently throughout the week.

Recommendations for Cyclistic:

  • Introduce Weekday Promotions: Offer incentives like weekday discounts to encourage casual riders to use bikes beyond weekends.
  • Enhance Membership Benefits: Improve perks such as faster unlock times or exclusive docking stations.
  • Targeted Marketing Campaigns: Advertise the cost-effectiveness of an annual membership, particularly for frequent leisure cyclists.

These findings give Cyclistic’s marketing team a data-driven starting point for converting casual riders into annual members.