Case Study: How Does a Bike-Share Navigate Speedy Success?

Introduction

This project is a Google Data Analytics Capstone Project on Coursera, designed to reinforce the analytical skills gained throughout the course duration. For this exercise, I utilize a publicly available data from Cyclistic, a fictional bike-share company based in the city of Chicago, launched in 2016. The company features 5800 bicycles and 692 docking station. They offer various types of bikes such as classic , electric, and docked bikes.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. The finance analysts have concluded that annual members are much more profitable than casual riders.

I am the junior data analyst working in the marketing analyst team at Cyclistic. In order to answer the key business questions, uncover key insights, pattern and trends on the bike rental service usage and communicate the findings in an easy, engaging and persuasive way to the key stakeholder. i will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act

Stakeholders

⦁ Lily Moreno: The director of marketing

⦁ Cyclistic marketing analytics team

⦁ Cyclistic executive team

Ask

Business Task

⦁ How do annual members and casual riders use Cyclistic bikes differently?

⦁ Increase in revenue derived from the transition of casual riders into annual members subscriber. i aim to uncover distinctions in the utilization patterns of Cyclistic bikes between casual riders and annual members and how can Cyclistic use digital media to influence casual riders to become members?

⦁ Recommend a new marketing strategy geared towards the conversion of casual riders

Prepare

Data Source Description

This report made use of data collected in 12 months (August 2022 - July 2023) for detailed data analysis to complete this case study. To download the data, please use this link https://divvy-tripdata.s3.amazonaws.com/index.html. This data was made public by Motivate International Inc, under this license https://divvybikes.com/data-license-agreement. Due to data privacy issues, personal information has been removed or encrypted. i Ensure the “Divvy bikeshare” dataset integerity by checking if the data ROCCC.

Data Interigity(ROCCC)

Reliable: data is reliable and accurate being collected by a credible source using advanced data collection technology

Original: Primary data source, the company Cyclistic gathers this data first-hand.

Comprehensive: The dataset has more than 5 million complete data entries. With the available data we are able to compare the two groups ‘annual members & casual members’, but not the preference that lead to their choice on what product/services to opt for.

Current: Data is up-to-date as it includes data from the past 12 months.

Cited : Yes, The data is authorized under license.

Dataset

The dataset consist of 13 columns and a total of 5723606 rows. Here’s an overview of the columns:

ride_id:— a unique ID per ride

rideable_type:- the type of bicycle used

started_at: the date and time at which bicycle was checked out

ended_at: the date and time at which bicycle was checked in

start_station_name: the name of the station at the start of the trip

start_station_id: a unique identifier for the start station

end_station_name: the name of the station at the end of the trip

end_station_id: a unique identifier for the end station

Start_lat: the latitude of the start station

start_lng: the longitude of the start station

end_lat: the latitude of the end station

end_lng: the longitude of the end station

member_casual: this field indicates whether the rider type is member or a casual

Process

R Programming

R programming is the data analysis tool of my choice for this project, and preferable to spreadsheet due to the raw amount of data in this dataset. R is an open-source programming language and free software environment for statistical computing and graphics. R has a wide range of tools available and can handle massive amounts of data with ease. I imported and load the data into RStudio.

Data Extraction and Processing

# Setting up my r environment and Loading packages

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
library(readr)
library(skimr)
library(dplyr)
library(tidyr)
library(gt)
library(farver)
library(plotrix)
library(tinytex)
# setting directory

setwd("C:/Users/HP PC/Documents/Cousera/Capstone_Project/cyclist_divvy_dataset")
# assigning and importing dataset  

aug_22 <- read.csv('202208-divvy-tripdata.csv')
sep_22 <- read.csv("202209-divvy-tripdata.csv")
oct_22 <- read.csv("202210-divvy-tripdata.csv")
nov_22 <- read.csv("202211-divvy-tripdata.csv")
dec_22 <- read.csv("202212-divvy-tripdata.csv")
jan_23 <- read.csv("202301-divvy-tripdata.csv")
feb_23 <- read.csv("202302-divvy-tripdata.csv")
mar_23 <- read.csv("202303-divvy-tripdata.csv")
apr_23 <- read.csv("202304-divvy-tripdata.csv")
may_23 <- read.csv("202305-divvy-tripdata.csv")
jun_23 <- read.csv("202306-divvy-tripdata.csv")
jul_23 <- read.csv("202307-divvy-tripdata.csv")

Column Names

colnames(aug_22)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(sep_22)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(oct_22)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(nov_22)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(dec_22)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(jan_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(feb_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(mar_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(apr_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(may_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(jun_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
colnames(jul_23)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

i ensured the column are the same on each assigned data set variable which ascertained that all column name are consistent.

# merging all the data set

divvy_trips <- rbind(aug_22, sep_22, oct_22, nov_22, dec_22, jan_23, feb_23, mar_23, apr_23, may_23, jun_23, jul_23)
# broad overview of the dataset

skim_without_charts(divvy_trips)
Data summary
Name divvy_trips
Number of rows 5723606
Number of columns 13
_______________________
Column type frequency:
character 9
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ride_id 0 1 16 16 0 5723606 0
rideable_type 0 1 11 13 0 3 0
started_at 0 1 19 19 0 4821003 0
ended_at 0 1 19 19 0 4833101 0
start_station_name 0 1 0 64 868772 1802 0
start_station_id 0 1 0 36 868904 1503 0
end_station_name 0 1 0 64 925008 1800 0
end_station_id 0 1 0 36 925149 1506 0
member_casual 0 1 6 6 0 2 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
start_lat 0 1 41.90 0.05 41.64 41.88 41.90 41.93 42.07
start_lng 0 1 -87.65 0.03 -87.92 -87.66 -87.64 -87.63 -87.52
end_lat 6102 1 41.90 0.07 0.00 41.88 41.90 41.93 42.18
end_lng 6102 1 -87.65 0.12 -88.16 -87.66 -87.64 -87.63 0.00
str(divvy_trips)
## 'data.frame':    5723606 obs. of  13 variables:
##  $ ride_id           : chr  "550CF7EFEAE0C618" "DAD198F405F9C5F5" "E6F2BC47B65CB7FD" "F597830181C2E13C" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2022-08-07 21:34:15" "2022-08-08 14:39:21" "2022-08-08 15:29:50" "2022-08-08 02:43:50" ...
##  $ ended_at          : chr  "2022-08-07 21:41:46" "2022-08-08 14:53:23" "2022-08-08 15:40:34" "2022-08-08 02:58:53" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num  -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.9 42 42 41.8 ...
##  $ end_lng           : num  -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "casual" "casual" "casual" "casual" ...

Carrying Out analysis to identify obvious errors, understand patterns within the data, detect outliers, and find relation within variables.

#quick glace of the data set, showing the first 10 rows

head(divvy_trips, 10) 
# checking for duplicates

nrow(divvy_trips)
## [1] 5723606
divvy_trips %>%  distinct(ride_id, .keep_all = TRUE) %>% count()    
#no duplicates found 
divvy_trips1 <- divvy_trips %>% drop_na 

# there are 6,102 missing values, this brings the total number of observation to 5717504 

Validating and Transforming Data

# converting 'started_at' and 'ended_at' to datetime 

divvy_trips1$started_at <- as.POSIXct(divvy_trips1$started_at)
divvy_trips1$ended_at <- as.POSIXct(divvy_trips1$ended_at)
# 'started_ride_date' column
divvy_trips1$started_ride_date <- format(as.Date(divvy_trips1$started_at), "%D")

# 'started_ride_time' column
divvy_trips1$started_ride_time <- format(as.POSIXct(divvy_trips1$started_at), "%H:%M:%S")

# 'ended_date' column
divvy_trips1$ended_date <- format(as.Date(divvy_trips1$ended_at), "%D")

# 'ended_ride_time' column
divvy_trips1$ended_ride_time <- format(as.POSIXct(divvy_trips1$ended_at), "%H:%M:%S")

# 'ride_length' column
divvy_trips1$ride_length <- difftime(divvy_trips1$ended_at, divvy_trips1$started_at, units = "secs")

# 'start_hour'
divvy_trips1$start_hour <- format(as.POSIXct(divvy_trips1$started_at), "%H")
#Extracting date component for 'year', 'month', & 'weekday'

divvy_trips2 <- divvy_trips1 %>%
  mutate(year = lubridate::year(started_at),  
         month = lubridate::month(started_at, label = TRUE), # month  "abbr"
         weekday = lubridate::wday(started_at, label = TRUE)) # day of week
# overview of dataset

glimpse(divvy_trips2) #glimpse of dataset
## Rows: 5,717,504
## Columns: 22
## $ ride_id            <chr> "550CF7EFEAE0C618", "DAD198F405F9C5F5", "E6F2BC47B6…
## $ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", …
## $ started_at         <dttm> 2022-08-07 21:34:15, 2022-08-08 14:39:21, 2022-08-…
## $ ended_at           <dttm> 2022-08-07 21:41:46, 2022-08-08 14:53:23, 2022-08-…
## $ start_station_name <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ start_station_id   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ end_station_name   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ end_station_id     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ start_lat          <dbl> 41.93, 41.89, 41.97, 41.94, 41.85, 41.79, 41.89, 41…
## $ start_lng          <dbl> -87.69, -87.64, -87.69, -87.65, -87.65, -87.72, -87…
## $ end_lat            <dbl> 41.94, 41.92, 41.97, 41.97, 41.84, 41.82, 41.89, 41…
## $ end_lng            <dbl> -87.72, -87.64, -87.66, -87.69, -87.66, -87.69, -87…
## $ member_casual      <chr> "casual", "casual", "casual", "casual", "casual", "…
## $ started_ride_date  <chr> "08/07/22", "08/08/22", "08/08/22", "08/08/22", "08…
## $ started_ride_time  <chr> "21:34:15", "14:39:21", "15:29:50", "02:43:50", "20…
## $ ended_date         <chr> "08/07/22", "08/08/22", "08/08/22", "08/08/22", "08…
## $ ended_ride_time    <chr> "21:41:46", "14:53:23", "15:40:34", "02:58:53", "20…
## $ ride_length        <drtn> 451 secs, 842 secs, 644 secs, 903 secs, 352 secs, …
## $ start_hour         <chr> "21", "14", "15", "02", "20", "13", "14", "20", "21…
## $ year               <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 202…
## $ month              <ord> Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, A…
## $ weekday            <ord> Sun, Mon, Mon, Mon, Sun, Mon, Mon, Sun, Sun, Sun, S…
# removing bad data and dealing with outliers

divvy_trips2 %>% 
  select(member_casual, ride_length) %>% 
  group_by(member_casual) %>%
  dplyr::summarize(min_ride_length = min(ride_length), max_ride_length = max(ride_length)) %>% 
  ungroup()

The minimum and maximum ‘ride_length’ shows a negative values and a ‘ride_length’ more than 24 hours(86400secs) of any ride trips. This values are outliers and will be removed. A trip less than 60 sec may also be due to other factors such as faulty bikes and will also be removed.

divvy_trips3 <- divvy_trips2 %>%
  filter(ride_length >= 60 & ride_length < 86400)

nrow(divvy_trips3) 
## [1] 5565647
#brings the total number of observation to 5565647
# converting 'ride_length' to numeric before analyzing
is.factor(divvy_trips3$ride_length) #FLASE
## [1] FALSE
divvy_trips3$ride_length <- as.numeric(as.character(divvy_trips3$ride_length))

is.numeric(divvy_trips3$ride_length) #TRUE
## [1] TRUE
#ordering by 'started_at' in Ascending order
divvy_trips_clean <-  arrange(divvy_trips3, started_at)
# Dataset Overview

divvy_trips_clean %>% head(10)

Analyze

Descripitive Analysis of dataset

Using the summary function to determine the mean, median, min & max range of ride_length”

##statistical summary of trip duration
summary(divvy_trips_clean$ride_length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    60.0   344.0   592.0   931.5  1042.0 86255.0
#statistical summary of trip duration by member type
divvy_trips_clean %>% 
    group_by(member_casual) %>% 
    summarize(minmum_ride_length = min(ride_length),
              maximum_ride_length = max(ride_length),
              average_ride_length = mean(ride_length),
              median_ride_length = median(ride_length))
#yearly percentage of casual Vs. member rides
divvy_trips_clean %>% 
  group_by(member_casual) %>%
  summarise("numbers of rides" = n(), perc = n()/nrow(divvy_trips_clean) *100)
#Create data for the graph
x <-  c(3456701, 2108946)
piepercent <- paste0(round(x/sum(x)*100, 1), "%")
labels <-  c("member", "casual")
color <- c("#00BFC4","#F8766D")
# Plot the chart.
pie3D(x, labels = piepercent, explode =0.1, radius = 1,
  main = "Proportion of Total Rides %", col = color)
  legend(title = "Membership", "topright", c("member", "casual"), 
  cex = 0.5, fill = color)

The distribution chart indicates, members possessing (62.1%), and casual riders have (37.9%) of the total rides. members used rides (24.2%) more than casual riders during the 12 months period.

# Comparison of total rides n

divvy_trips_clean %>% 
  group_by(member_casual) %>%
  ggplot(aes(x=member_casual, fill = member_casual))+
  labs(title= "Membership Distribution", y= "Number of rides", x="casual vs member", fill = "Membership type") +
  geom_bar(position='dodge')

An analysis of the rides frequency over a period of 12 months to gain precise insights that will guide my approach to encourage casual members to transition into full-fledged annual members.

#Calculating rides per start_hour & Average
divvy_trips_clean %>%
  group_by(start_hour, member_casual) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length))
# Visualization of rides at early hours 
divvy_trips_clean %>%
  group_by(start_hour, member_casual) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>% 
  ggplot(aes(x=start_hour, y=ride_numbers, fill= member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Rides started per hour of the day", x="Hours", y="Number of started rides", fill="Membership") 

  • From the above chart we can see more members between 7:00-9:00 and 16:00-18:00 which would suggest that annual members use their bikes to commute to and from work. Casual riders have a spike in number of rides around 11:00-12:00 and 17:00-19:00. Also there is bigger volume rise in the afternoon for both type of riders. This information needs to be checked on day basis.
#Calculating rides per week 

divvy_trips_clean %>%
  group_by(member_casual, weekday) %>%
  summarize(Number_Rides = n(),  .groups = 'drop', Average_rides = mean(ride_length)) %>%
  arrange(weekday)
# Visualization of rides weekly
divvy_trips_clean %>%
  group_by(member_casual, weekday) %>%
  summarize(Number_Rides = n(),  .groups = 'drop', Average_rides = mean(ride_length)) %>%
  arrange(weekday) %>%
  ggplot(aes(x=weekday, y=Number_Rides, fill=member_casual))+
  geom_col(position='dodge')+
  labs(title = "Total Rides Weekly: Subscriber Vs. Casual", x="Days", y="Total Ride Numbers", fill="Membership_Type") 

  • The Chart shows, during week day members have a much higher frequency of rides than casual riders. Casual riders however, shows more frequency during weekends, while the number of rides for members actually decreases slightly.
# Visualization of average weekly rides
divvy_trips_clean %>%
  group_by(member_casual, weekday) %>%
  summarize(Number_Rides = n(),  .groups = 'drop', Average_rides = mean(ride_length)) %>%
  arrange(weekday) %>%
  ggplot(aes(x=member_casual, y=Average_rides, fill=weekday))+
  geom_bar(position='Dodge', stat='identity')+
  labs(title = "Average Rides Weekly: Subscriber Vs. Casual", x="Days", y="Average_rides_length", fill="Weekday")

  • The chart for the average rides shows casual members have longer average distance of rides than the member users, but they have relatively longer rides.
# Calculating rides per month

divvy_trips_mth <- divvy_trips_clean

divvy_trips_mth$month <- factor(divvy_trips_mth$month,levels = c( "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul"))

divvy_trips_mth %>%
  group_by(member_casual, month) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>%
  arrange(month)
# Visualization Monthly rides

divvy_trips_mth %>%
  group_by(member_casual, month) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>%
  arrange(month) %>% 
  ggplot(aes(x=month, y=ride_numbers, fill= member_casual)) +
  theme(axis.text.x = element_text(angle = 45)) +
  geom_col(width=0.5,position = position_dodge(width=0.5)) +
  labs(title = "Monthly Trips: Casual Vs. Subscribed members", x="month", y="Number of Trips", fill="Membership")

  • The months June, July, August and September are the most busy time of the year among both members and casual riders. It is possible due to winter there is a significant drop in total rides in the months of November, December, January and February for both type of customers. The chart also indicates member’s total rides are higher than casual riders throughout the year.
#Average number of rides
divvy_trips_clean %>%
  group_by(member_casual, month) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>%
  arrange(month) %>%
  ggplot(aes(x=month, y=Average_duration, fill= member_casual)) +
  geom_col(position = 'dodge') +
  labs(title = "Average ride length of members by month", x="month", y="Average Ride_length", fill="Membership")

  • Average ride length of members is about the same <1000 secs throughout the year. While casual riders average ride length is between 750 - 1500 secs throughout the year. But in the month of July average right length is higher but total rides are lowest as compared to other months.
divvy_trips_clean %>% 
  group_by(member_casual, rideable_type) %>%
  summarise(total=n(), .groups = 'drop', perc =round(n()/nrow(divvy_trips_clean)*100, 1)) %>%
  arrange(rideable_type)
# Visualization of prefered bikes
divvy_trips_clean %>% 
  group_by(member_casual, rideable_type) %>%
  summarise(total=n(), .groups = 'drop', perc =round(n()/nrow(divvy_trips_clean)*100, 1)) %>%
  arrange(rideable_type) %>%
  ggplot(aes(x=rideable_type, y= total, fill=member_casual))+
  geom_col(position='dodge')+
  labs(title = "Preferred Bikes")+ 
  facet_wrap(~member_casual)+
  geom_text(aes(label = paste(format(perc, nsmall=1), "%")), vjust= -0.2)

  • Electric bikes ranks first, being mostly preferred by the casual and members at (21.7%) & (32.2%) respectively of total bike usage.It is also noticeable that the member rides also go more for classic bikes at (29.9%). Docked bikes are only used by casual riders with marginal usage of as little as (2.2%).

Share

Key insights

  • Subscriber member use bikes more frequently than the casual riders under the same circumstances which may be influenced by many factors such as price, choice of bikes, location of docking station, weekend outings etc may constitute why annual members and casual riders use Cyclistic bikes differently.
  • casual riders cycle longer on average than the members.
  • The hotter season both segments experience the highest demand for the bikes and this most likely to be associated with the favorable whether conditions.

Recommendation

Deliverable

  • Design offers in line with the daily bike demand

  • Price is a decisive factor in consumer choice (what to buy ) and how much of it to buy. Coupons and discounts could be handed out along with the annual subscription, weekend-only membership bonuses, incentive/discount for casual riders in line with the daily targeting casual riders. Differentiated offers aligned with trip duration (short and long riders) can also be considered

  • Create marketing campaigns via email, or advertisement in the docking stations explaining why annual member is beneficial. Campaigns should be placed at the peak months of the year especially during hot seasons. Advert should also focus on electric bike and classic bikes due frequency of bike usage.

Conclusion

Note : All ride ids are unique so we cannot conclude if the same rider taken several rides. More rider data needed for further analysis