Case study: How does a bike-share navigate speedy success?

Overview

Cyclistic, a prominent bike-share company headquartered in Chicago, has rapidly gained traction in the city’s transportation landscape. In an effort to delve deeper into their customer base and refine their marketing strategies, Cyclistic seeks to understand the distinct behaviors and preferences of casual riders versus annual members.

The company recognizes the need to tailor its marketing approach to effectively convert casual riders into committed annual members. By leveraging data-driven insights, Cyclistic aims to develop a comprehensive understanding of how these two customer segments interact with their services differently.

Business Task

The objective of this business task is to develop a comprehensive marketing strategy for Cyclistic that addresses the distinct needs and behaviors of both annual members and casual riders. By answering the question below, we aim to optimize marketing efforts, increase customer engagement, and drive conversions from casual riders to annual members.

Understanding Usage Patterns: Analyze Cyclistic’s dataset to identify differences in how annual members and casual riders utilize Cyclistic bikes.

Data Background

The dataset was made available by Motivate International Inc. under license; the download and license links appear in the original report.

For this project, I downloaded data for twelve months (January to December 2020). The zipped CSVs were downloaded and unzipped into a folder.

The combined Cyclistic bike-trip dataset for the year 2020 is shown below. It has 3,541,683 rows and 13 columns.

Because the dataset is large, we use R to analyse it efficiently.

R Programming

Loading Packages

An R package is a collection of R functions, data sets, and compiled code that extends the functionality of R. Here we load six packages (tidyverse, janitor, lubridate, scales, readr, and geosphere) to analyse the data.

In R, the library() function is used to load R packages into your current R session.

library (tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library (janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library (lubridate)
library (scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(readr)
library(geosphere)
rm(list=ls())

Read CSV file

The 2020 Cyclistic bike-share data were downloaded and saved as CSV files. Here read.csv() is used to read each file, and rbind() stacks them into a single data frame.

df1 <- read.csv("Divvy_Trips_2020_Q1.csv")
df2 <- read.csv("202004-divvy-tripdata.csv")
df3 <- read.csv("202005-divvy-tripdata.csv")
df4 <- read.csv("202006-divvy-tripdata.csv")
df5 <- read.csv("202007-divvy-tripdata.csv")
df6 <- read.csv("202008-divvy-tripdata.csv")
df7 <- read.csv("202009-divvy-tripdata.csv")
df8 <- read.csv("202010-divvy-tripdata.csv")
df9 <- read.csv("202011-divvy-tripdata.csv")
df10 <- read.csv("202012-divvy-tripdata.csv")
df20 <- rbind(df1, df2, df3, df4, df5, df6, df7, df8, df9, df10)

Save the combined dataset as a CSV file.

write.csv(df20, file = "df20.CSV", row.names = FALSE)

df20 <- read_csv("C:/Users/nisha/Desktop/New folder/DataAnalytics_NishaP/Dataset/df20.CSV")

## Rows: 3541683 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
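The read-and-bind step above repeats one read.csv() call per file. A minimal sketch of the same idea with a loop, assuming every file has the same columns. The two small files written to a temporary folder are made-up stand-ins so the example runs anywhere; they are not the Divvy data, and the names tmp_dir, parts and combined are hypothetical.

```r
# Hypothetical stand-in files (not the real trip data)
tmp_dir <- file.path(tempdir(), "trip_csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"),
                     member_casual = c("member", "casual")),
          file.path(tmp_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(tmp_dir, "part2.csv"), row.names = FALSE)

# List every CSV in the folder, read each one, then stack them row-wise
csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
parts <- lapply(csv_files, read.csv)
combined <- do.call(rbind, parts)

dim(combined)
```

The loop avoids editing the code when another month is added, at the cost of reading whatever CSV files happen to be in the folder.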

In R, the head() function is used to view the first few rows of a data frame or a matrix. It allows you to quickly inspect the structure and content of your data without displaying the entire dataset.

head(df20)

## # A tibble: 6 × 13
##   ride_id          rideable_type started_at          ended_at           
##   <chr>            <chr>         <dttm>              <dttm>             
## 1 EACB19130B0CDA4A docked_bike   2020-01-21 20:06:59 2020-01-21 20:14:30
## 2 8FED874C809DC021 docked_bike   2020-01-30 14:22:39 2020-01-30 14:26:22
## 3 789F3C21E472CA96 docked_bike   2020-01-09 19:29:26 2020-01-09 19:32:17
## 4 C9A388DAC6ABF313 docked_bike   2020-01-06 16:17:07 2020-01-06 16:25:56
## 5 943BC3CBECCFD662 docked_bike   2020-01-30 08:37:16 2020-01-30 08:42:48
## 6 6D9C8A6938165C11 docked_bike   2020-01-10 12:33:05 2020-01-10 12:37:54
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

Cleaning Data

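janitor is an R package that provides a set of functions to clean and preprocess data in R data frames. Here remove_empty() checks for columns and rows that are entirely empty; the unchanged dimensions below show that there are none.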

df20_cleanedcols <- janitor::remove_empty(df20,which =c("cols"))
df20_cleanedrows <- janitor::remove_empty(df20,which =c("rows"))
dim(df20_cleanedcols)

## [1] 3541683      13

dim(df20_cleanedrows)

## [1] 3541683      13

Removing duplicates and NA values

df20_clean <- na.omit(df20)
# unique() returns the rows with duplicates removed; the result is printed here, not assigned
unique(df20_clean)

## # A tibble: 3,389,381 × 13
##    ride_id          rideable_type started_at          ended_at           
##    <chr>            <chr>         <dttm>              <dttm>             
##  1 EACB19130B0CDA4A docked_bike   2020-01-21 20:06:59 2020-01-21 20:14:30
##  2 8FED874C809DC021 docked_bike   2020-01-30 14:22:39 2020-01-30 14:26:22
##  3 789F3C21E472CA96 docked_bike   2020-01-09 19:29:26 2020-01-09 19:32:17
##  4 C9A388DAC6ABF313 docked_bike   2020-01-06 16:17:07 2020-01-06 16:25:56
##  5 943BC3CBECCFD662 docked_bike   2020-01-30 08:37:16 2020-01-30 08:42:48
##  6 6D9C8A6938165C11 docked_bike   2020-01-10 12:33:05 2020-01-10 12:37:54
##  7 31EB9B8F406D4C82 docked_bike   2020-01-10 13:07:35 2020-01-10 13:12:24
##  8 A2B24E3F9C9720E3 docked_bike   2020-01-10 07:24:53 2020-01-10 07:29:50
##  9 5E3F01E1441730B7 docked_bike   2020-01-31 16:37:16 2020-01-31 16:42:11
## 10 19DC57F7E3140131 docked_bike   2020-01-31 09:39:17 2020-01-31 09:42:40
## # ℹ 3,389,371 more rows
## # ℹ 9 more variables: start_station_name <chr>, start_station_id <chr>,
## #   end_station_name <chr>, end_station_id <chr>, start_lat <dbl>,
## #   start_lng <dbl>, end_lat <dbl>, end_lng <dbl>, member_casual <chr>

dim(df20_clean)

## [1] 3389381      13
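The row counts above show that unique() returned as many rows as df20_clean already had, so no duplicates were present. A minimal sketch on made-up toy data of how na.omit() and unique() behave, and why the result of unique() has to be assigned to take effect (toy, no_na and no_dupes are hypothetical names):

```r
# Toy data (made up): one row with a missing value, one exact duplicate
toy <- data.frame(
  ride_id = c("A1", "A2", "A2", "A3"),
  start_station_name = c("Clark St & Elm St", "Millennium Park",
                         "Millennium Park", NA),
  stringsAsFactors = FALSE
)

no_na <- na.omit(toy)       # drops the row that contains NA
unique(no_na)               # prints de-duplicated rows; no_na itself is unchanged
nrow(no_na)                 # still 3

no_dupes <- unique(no_na)   # assign the result to keep the de-duplicated data
nrow(no_dupes)              # 2
```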

df20_clean <- df20_clean %>% filter(start_station_name != " ")

Organising Data

lubridate is an R package designed to make it easier to work with dates and times in R. It provides a set of functions that simplify common tasks such as parsing, manipulating, and formatting dates and times. We use ymd_hms() and as.Date() to convert the started_at and ended_at columns.

difftime() calculates the difference between two times. This lets us find and analyse the duration of each ride.

df <- df20_clean

df$started_date <- as.Date(df$started_at)
df$ended_date <- as.Date(df$ended_at)
#time as  hours and minutes
df$started_at <- lubridate::ymd_hms(df$started_at)

## Warning: 11 failed to parse.

df$ended_at <- lubridate::ymd_hms(df$ended_at)

## Warning: 1 failed to parse.

df$Start_time <- format(df$started_at,"%H:%M:%S")
df$End_time <- format(df$ended_at,"%H:%M:%S")
df$day_of_the_week <- weekdays(df$started_at)
df$month <- month(df$started_at, label = TRUE, abbr = TRUE)

# ask for seconds explicitly so that dividing by 60 always yields minutes
df$trip_duration <- as.double(difftime(df$ended_at, df$started_at, units = "secs")) / 60

df<-df %>% 
  filter(trip_duration > 0)

glimpse(df)

## Rows: 3,378,424
## Columns: 20
## $ ride_id            <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type      <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at         <dttm> 2020-01-21 20:06:59, 2020-01-30 14:22:39, 2020-01-…
## $ ended_at           <dttm> 2020-01-21 20:14:30, 2020-01-30 14:26:22, 2020-01-…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ start_station_id   <chr> "239", "234", "296", "51", "66", "212", "96", "96",…
## $ end_station_name   <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ end_station_id     <chr> "326", "318", "117", "24", "212", "96", "212", "212…
## $ start_lat          <dbl> 41.9665, 41.9616, 41.9401, 41.8846, 41.8856, 41.889…
## $ start_lng          <dbl> -87.6884, -87.6660, -87.6455, -87.6319, -87.6418, -…
## $ end_lat            <dbl> 41.9671, 41.9542, 41.9402, 41.8918, 41.8899, 41.884…
## $ end_lng            <dbl> -87.6674, -87.6644, -87.6530, -87.6206, -87.6343, -…
## $ member_casual      <chr> "member", "member", "member", "member", "member", "…
## $ started_date       <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ ended_date         <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ Start_time         <chr> "20:06:59", "14:22:39", "19:29:26", "16:17:07", "08…
## $ End_time           <chr> "20:14:30", "14:26:22", "19:32:17", "16:25:56", "08…
## $ day_of_the_week    <chr> "Tuesday", "Thursday", "Thursday", "Monday", "Thurs…
## $ month              <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ trip_duration      <dbl> 7.516667, 3.716667, 2.850000, 8.816667, 5.533333, 4…
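When no units argument is given, difftime() chooses the unit itself from the size of the differences, so a later division by 60 is only correct if it happened to choose seconds. A small sketch of that behaviour; the first timestamp pair is the first ride shown above and the second pair is made up:

```r
start <- as.POSIXct(c("2020-01-21 20:06:59", "2020-01-21 08:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2020-01-21 20:14:30", "2020-01-21 10:00:00"), tz = "UTC")

# Default: the unit is picked automatically from the smallest difference
auto_diff <- difftime(end, start)
units(auto_diff)                 # "mins" for these two rides

# Explicit unit: the numeric value no longer depends on the data
duration_min <- as.double(difftime(end, start, units = "mins"))
round(duration_min, 2)           # 7.52 120.00
```

Stating the unit keeps trip_duration in minutes regardless of which rides are in the data.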

The dim() function retrieves or sets the dimensions of an object, such as a data frame, matrix, or array. For a data frame it returns the number of rows and columns.

Here’s how it works:

dim(df)

## [1] 3378424      20

The distHaversine() function, from the geosphere package, calculates the great-circle distance in metres between two points on the Earth's surface given their longitude and latitude coordinates. This distance is calculated using the Haversine formula, which treats the Earth as a sphere.

# distHaversine() is vectorised over two-column (lon, lat) matrices,
# so no row-by-row mapply() is needed
df$distance <- distHaversine(cbind(df$start_lng, df$start_lat),
                             cbind(df$end_lng, df$end_lat))

#change to km
df$distance <- df$distance/1000

glimpse(df)

## Rows: 3,378,424
## Columns: 21
## $ ride_id            <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type      <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at         <dttm> 2020-01-21 20:06:59, 2020-01-30 14:22:39, 2020-01-…
## $ ended_at           <dttm> 2020-01-21 20:14:30, 2020-01-30 14:26:22, 2020-01-…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ start_station_id   <chr> "239", "234", "296", "51", "66", "212", "96", "96",…
## $ end_station_name   <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ end_station_id     <chr> "326", "318", "117", "24", "212", "96", "212", "212…
## $ start_lat          <dbl> 41.9665, 41.9616, 41.9401, 41.8846, 41.8856, 41.889…
## $ start_lng          <dbl> -87.6884, -87.6660, -87.6455, -87.6319, -87.6418, -…
## $ end_lat            <dbl> 41.9671, 41.9542, 41.9402, 41.8918, 41.8899, 41.884…
## $ end_lng            <dbl> -87.6674, -87.6644, -87.6530, -87.6206, -87.6343, -…
## $ member_casual      <chr> "member", "member", "member", "member", "member", "…
## $ started_date       <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ ended_date         <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ Start_time         <chr> "20:06:59", "14:22:39", "19:29:26", "16:17:07", "08…
## $ End_time           <chr> "20:14:30", "14:26:22", "19:32:17", "16:25:56", "08…
## $ day_of_the_week    <chr> "Tuesday", "Thursday", "Thursday", "Monday", "Thurs…
## $ month              <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ trip_duration      <dbl> 7.516667, 3.716667, 2.850000, 8.816667, 5.533333, 4…
## $ distance           <dbl> 1.7394455, 0.8343444, 0.6211318, 1.2326158, 0.78450…
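To make the Haversine calculation concrete, here is a hand-written sketch of the formula in base R, applied to the coordinates of the first ride in the output above. The function name haversine_km is hypothetical, and the radius of 6378137 m is assumed to match the default used by distHaversine().

```r
# Great-circle distance in km between two points given in decimal degrees
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a)) / 1000   # metres -> km
}

# First ride above: Western Ave & Leland Ave -> Clark St & Leland Ave
first_ride_km <- haversine_km(41.9665, -87.6884, 41.9671, -87.6674)
round(first_ride_km, 3)          # about 1.739, in line with the distance column
```

This is a straight-line distance between stations, not the route actually ridden, so round trips that end at the start station come out as 0 km.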

Summarize the data

Before summarising, remove the columns that are no longer needed from the data frame.

df <- df %>% 
  select(-start_station_id,-end_station_id,-start_lat,-end_lat,-start_lng,-end_lng)
glimpse(df)

## Rows: 3,378,424
## Columns: 15
## $ ride_id            <chr> "EACB19130B0CDA4A", "8FED874C809DC021", "789F3C21E4…
## $ rideable_type      <chr> "docked_bike", "docked_bike", "docked_bike", "docke…
## $ started_at         <dttm> 2020-01-21 20:06:59, 2020-01-30 14:22:39, 2020-01-…
## $ ended_at           <dttm> 2020-01-21 20:14:30, 2020-01-30 14:26:22, 2020-01-…
## $ start_station_name <chr> "Western Ave & Leland Ave", "Clark St & Montrose Av…
## $ end_station_name   <chr> "Clark St & Leland Ave", "Southport Ave & Irving Pa…
## $ member_casual      <chr> "member", "member", "member", "member", "member", "…
## $ started_date       <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ ended_date         <date> 2020-01-21, 2020-01-30, 2020-01-09, 2020-01-06, 20…
## $ Start_time         <chr> "20:06:59", "14:22:39", "19:29:26", "16:17:07", "08…
## $ End_time           <chr> "20:14:30", "14:26:22", "19:32:17", "16:25:56", "08…
## $ day_of_the_week    <chr> "Tuesday", "Thursday", "Thursday", "Monday", "Thurs…
## $ month              <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, J…
## $ trip_duration      <dbl> 7.516667, 3.716667, 2.850000, 8.816667, 5.533333, 4…
## $ distance           <dbl> 1.7394455, 0.8343444, 0.6211318, 1.2326158, 0.78450…

## calculate rideable_type usage
sum_df <- df %>% 
  select(rideable_type,member_casual,started_at,start_station_name,day_of_the_week,month,trip_duration,distance) %>% 
  group_by(rideable_type,member_casual) %>% 
  summarise(Total_Duration = sum(trip_duration),Count = n(),Total_distance = sum(distance)) %>% 
  ungroup()

## `summarise()` has grouped output by 'rideable_type'. You can override using the
## `.groups` argument.

glimpse(sum_df)

## Rows: 6
## Columns: 5
## $ rideable_type  <chr> "classic_bike", "classic_bike", "docked_bike", "docked_…
## $ member_casual  <chr> "casual", "member", "casual", "member", "casual", "memb…
## $ Total_Duration <dbl> 261577.0, 747037.1, 58895632.9, 29011008.2, 3052981.5, …
## $ Count          <int> 11259, 59141, 1140592, 1810758, 145379, 211295
## $ Total_distance <dbl> 22434.51, 112343.41, 2422396.66, 3948031.01, 362682.23,…

## member Vs Casual distribution
Member_type<- df %>% 
  group_by(member_casual) %>% 
  summarise(Count = n(),Total_duration = sum(trip_duration),Total_distance = sum(distance)) %>% 
  ungroup()
glimpse(Member_type)

## Rows: 2
## Columns: 4
## $ member_casual  <chr> "casual", "member"
## $ Count          <int> 1297230, 2081194
## $ Total_duration <dbl> 62210191, 32533813
## $ Total_distance <dbl> 2807513, 4603502
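The totals above are easier to compare once they are turned into shares and per-ride averages. A short sketch that re-enters the four printed figures by hand (rounded as displayed) and derives them; the data frame and new column names are hypothetical:

```r
# Totals copied from the Member_type output above
member_type <- data.frame(
  member_casual  = c("casual", "member"),
  Count          = c(1297230, 2081194),
  Total_duration = c(62210191, 32533813),   # minutes
  Total_distance = c(2807513, 4603502)      # km
)

member_type$share_pct    <- round(100 * member_type$Count / sum(member_type$Count), 1)
member_type$avg_duration <- round(member_type$Total_duration / member_type$Count, 1)
member_type$avg_distance <- round(member_type$Total_distance / member_type$Count, 2)
member_type
```

Members account for about 62% of rides. Casual rides average roughly 48 minutes against about 16 minutes for members, while the average distance is close to 2.2 km for both groups.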

# Daily ride 
ride_per_day <- df %>% 
  group_by(started_date,member_casual) %>% 
  summarise(Avg_Trip = mean(trip_duration),Avg_distance = mean(distance),Count = n()) %>% 
  arrange(started_date) %>% 
  ungroup()

## `summarise()` has grouped output by 'started_date'. You can override using the
## `.groups` argument.

glimpse(ride_per_day)

## Rows: 726
## Columns: 5
## $ started_date  <date> 2020-01-01, 2020-01-01, 2020-01-02, 2020-01-02, 2020-01…
## $ member_casual <chr> "casual", "member", "casual", "member", "casual", "membe…
## $ Avg_Trip      <dbl> 82.751572, 12.622806, 102.745953, 11.360752, 31.038227, …
## $ Avg_distance  <dbl> 1.856419, 1.807829, 2.067135, 1.945271, 2.050374, 1.8494…
## $ Count         <int> 477, 1664, 663, 5816, 453, 5437, 390, 2797, 431, 2604, 2…

## weekly ride
Weekly_ride <- df %>% 
  group_by(day_of_the_week,member_casual) %>% 
  summarise(Avg_Trip = mean(trip_duration),
            Avg_distance = mean(distance),Count = n()) %>% 
  arrange(day_of_the_week) %>% 
  ungroup()

## `summarise()` has grouped output by 'day_of_the_week'. You can override using
## the `.groups` argument.

glimpse(Weekly_ride)

## Rows: 14
## Columns: 5
## $ day_of_the_week <chr> "Friday", "Friday", "Monday", "Monday", "Saturday", "S…
## $ member_casual   <chr> "casual", "member", "casual", "member", "casual", "mem…
## $ Avg_Trip        <dbl> 46.86454, 15.37371, 45.70907, 14.86617, 49.65495, 17.9…
## $ Avg_distance    <dbl> 2.145251, 2.184277, 2.042295, 2.151856, 2.299606, 2.34…
## $ Count           <int> 199038, 314621, 131980, 279927, 300853, 297345, 236559…
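Because day_of_the_week is a character column, arrange() and the plot axis order the days alphabetically (Friday, Monday, Saturday, ...). A minimal sketch of putting them in calendar order with a factor, assuming English day names and a Monday-first week (both are choices made for this example):

```r
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")

# Explicit level order; weekdays() returns locale-dependent names,
# so these labels assume an English locale
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")
days_ordered <- factor(days, levels = week_levels)

sort(days)                         # alphabetical order
as.character(sort(days_ordered))   # calendar order, Monday first
```

Applying the same factor() call to the day_of_the_week column before grouping would carry this order through to the weekly summary and chart.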

# monthly Ride
monthly_ride <- df %>% 
  group_by(month, member_casual) %>% 
  summarise(Avg_Trip = mean(trip_duration),
            Avg_distance = mean(distance),Count = n()) %>% 
  arrange(month) %>% 
  ungroup()

## `summarise()` has grouped output by 'month'. You can override using the
## `.groups` argument.

glimpse(monthly_ride)

## Rows: 24
## Columns: 5
## $ month         <ord> Jan, Jan, Feb, Feb, Mar, Mar, Apr, Apr, May, May, Jun, J…
## $ member_casual <chr> "casual", "member", "casual", "member", "casual", "membe…
## $ Avg_Trip      <dbl> 161.64949, 11.14904, 127.63017, 12.80652, 63.11613, 14.3…
## $ Avg_distance  <dbl> 1.944444, 1.783580, 1.955242, 1.768710, 1.962375, 1.9841…
## $ Count         <int> 7785, 136099, 12860, 126715, 27631, 115617, 23584, 61065…

# Popular start station 
Popular_top_start_stations <- df %>%
  count(start_station_name) %>%
  arrange(desc(n)) %>% 
  head(10)

# Top 20 start station
top_start_stations <- df %>%
  group_by(start_station_name,member_casual) %>% 
  count(start_station_name) %>%
  arrange(desc(n)) %>% 
  head(20) %>% 
  ungroup()

#Top 20 end Station
top_end_stations <- df %>%
  group_by(end_station_name,member_casual) %>% 
  count(end_station_name) %>%
  arrange(desc(n)) %>% 
  head(20) %>% 
  ungroup()

head(top_start_stations)

## # A tibble: 6 × 3
##   start_station_name        member_casual     n
##   <chr>                     <chr>         <int>
## 1 Streeter Dr & Grand Ave   casual        25859
## 2 Clark St & Elm St         member        20193
## 3 Lake Shore Dr & Monroe St casual        19892
## 4 Millennium Park           casual        18368
## 5 Kingsbury St & Kinzie St  member        16431
## 6 St. Clair St & Erie St    member        15814

head(top_end_stations)

## # A tibble: 6 × 3
##   end_station_name          member_casual     n
##   <chr>                     <chr>         <int>
## 1 Streeter Dr & Grand Ave   casual        28463
## 2 Clark St & Elm St         member        20882
## 3 Millennium Park           casual        19419
## 4 Lake Shore Dr & Monroe St casual        19253
## 5 St. Clair St & Erie St    member        17654
## 6 Kingsbury St & Kinzie St  member        16630

#  top station with large distance ride 
dis_df <- df %>% 
  group_by(start_station_name,member_casual) %>% 
  summarise(Avg_distance = mean(distance)) %>% 
  arrange(desc(Avg_distance)) %>% 
  head(20) %>% 
  ungroup()

## `summarise()` has grouped output by 'start_station_name'. You can override
## using the `.groups` argument.

head(dis_df)

## # A tibble: 6 × 3
##   start_station_name         member_casual Avg_distance
##   <chr>                      <chr>                <dbl>
## 1 Stony Island Ave & 90th St member                7.69
## 2 Vincennes Ave & 104th St   member                7.19
## 3 Dodge Ave & Main St        casual                6.87
## 4 Michigan Ave & 71st St     member                6.65
## 5 Oglesby Ave & 100th St     member                6.61
## 6 Ashland Ave & 74th St      member                6.08

#hourly Bike Demand

df <- df %>% 
  mutate(start_hour = lubridate::hour(started_at))

hourly_need <- df %>% 
  group_by(member_casual,start_hour) %>% 
  summarise(number_of_trips = n()) %>% 
  ungroup()

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
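The hourly summary boils down to extracting the hour from each start time and counting rides per hour. A base R sketch of that step on four made-up timestamps (not taken from the dataset), including how the busiest hour can be read off the counts:

```r
# Made-up start times for illustration only
started_at <- as.POSIXct(c("2020-07-04 08:15:00", "2020-07-04 17:05:00",
                           "2020-07-04 17:40:00", "2020-07-05 16:30:00"),
                         tz = "UTC")

# Hour of day as an integer (0-23), then rides counted per hour
start_hour <- as.integer(format(started_at, "%H"))
trips_per_hour <- table(start_hour)

# Hour with the most rides
peak_hour <- as.integer(names(trips_per_hour)[which.max(trips_per_hour)])
peak_hour
```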

Data Visualisation

The ggplot() function is the primary function used in the ggplot2 package, a popular data visualization package in R. It is used to create and customize plots based on a grammar of graphics approach, allowing users to create complex and highly customizable visualizations with relatively simple syntax.

The sections below use ggplot() to plot each of the summaries calculated above.

Member Vs Casual Distribution

Member_type$Percentage <- round(Member_type$Count/sum(Member_type$Count)*100)

ggplot(Member_type, mapping = aes(x = " ", y = Percentage, fill = member_casual)) +
  geom_col(color = "black") +
  geom_text(aes(label = paste(member_casual, paste(Percentage, "%"), sep = "\n")),
            position = position_stack(vjust = 0.5), color = "black") +
  labs(title = "Members vs Casual Distribution") +
  coord_polar(theta = "y") +
  scale_fill_brewer() +
  theme_bw()

Riders Bike type Usage

#Most used bike type 
 ggplot(sum_df,mapping = aes(x = rideable_type ,y = Count,fill = rideable_type)) +
  geom_bar(stat = "identity") +
  facet_wrap(~member_casual, nrow = 1) +
  theme(legend.position = "none") +
  labs(title = "Rider Bike type Usage",x = "Bike_type",y = "Count")

Rides per Day

# Total Ride per day

ggplot(ride_per_day, aes(x = started_date, y = Count,fill = factor(member_casual))) +
  geom_col()+ labs( title  ="Ride taken Per Day",
       x = "Date",
       y = "Count") +
  theme_minimal()

Weekly Ride Count

#Total Ride per Week

ggplot(Weekly_ride, aes(x = day_of_the_week, y = Count,fill = factor(member_casual))) +
  geom_col()+ labs( title  ="Weekly Ride Count",
       x = "Day of Week",
       y = "Count") +
  theme_minimal()

Monthly Ride Count

ggplot(monthly_ride, aes(x = month, y = Count ,fill = factor(member_casual))) +
  geom_col()+ labs( title  ="Monthly Ride Count",
       x = "Month(year 2020)",
       y = "Count") +theme_minimal()+
  theme(axis.text.x = element_text(angle = 45))

Average Distance Covered from Various Start Stations

## Average Distance ride by member type in a year

ggplot(dis_df, aes(x = Avg_distance, y = reorder(start_station_name,Avg_distance), fill = factor(member_casual))) +
  geom_col() +
  labs(title = "Large Distance Covered from Various Start Station(Top 20)",
       x = "Distance Covered (km)",
       y = "Station Name",
       fill = "Rider Type") +
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 90))

Popular Start Station

Popular_top_start_stations %>% 
  ggplot() +
  geom_col(aes(x = n, y = reorder(start_station_name, n))) +
  scale_x_continuous(labels = comma) +
  # x carries the ride counts and y the station names
  labs(title = "Top 10 Popular Start Stations", x = "No of Rides", y = "Station Name")

Rider most preferred Stations

#Top Start Station Name

ggplot(top_start_stations, aes(x = n, y = reorder(start_station_name,n), fill = factor(member_casual))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Top 20 Start Station Name",
       x = "Ride Count",
       y = "Station Name",
       fill = "Rider Type") +
  theme_minimal()

#Top End Station Name

ggplot(top_end_stations, aes(x = n, y = reorder(end_station_name,n), fill = factor(member_casual))) + 
  geom_bar(stat = "identity", position = "dodge") + 
  labs(title = "Top 20 End Station Name",
       x = "Ride Count",
       y = "Station Name",
       fill = "Rider Type") +
  theme_minimal()

Hourly Bike Demand

hourly_need %>% 
  ggplot() + geom_line(aes(x = start_hour, y = number_of_trips, color = member_casual)) +
  # the lines are mapped to colour, so the legend title is set with color, not fill
  labs(title = "Hourly Bike Demand",
       y = "No of Trips",
       color = "Rider Type") +
  scale_x_continuous(limits = c(0, 24), name = "Hours") +
  theme_minimal()

Recommendations for Stakeholders:

1.Enhance Membership Programs:

Since about 62% of rides are taken by members, there is an opportunity to further strengthen membership benefits to retain and attract more long-term users. Consider offering loyalty programs, discounts for long-term memberships, or exclusive benefits during peak seasons (April to September).

2.Bike Type Optimization:

Given that both members and casual riders prefer docked bikes, ensure that there are sufficient docked bikes available at high-demand stations, especially during peak hours. Consider investing in more docked bikes and maintaining a balance with other bike types.

3.Seasonal Promotions:

Since ride frequency increases from April and decreases after September, plan for seasonal promotions and marketing campaigns to maximize ridership during these months. This could include discounted rides, special events, or partnerships with local businesses to encourage more usage.

4.Improve Weekend Services:

With higher ride volumes on weekends, ensure there are adequate resources and bike availability. Consider running special weekend events or promotions to further boost ridership.

5.Focus on High-Demand Stations:

Streeter Dr & Grand Ave has the highest ride counts, and Vincennes Ave & 104th St is among the start stations with the longest average ride distance. Enhance services at these stations, such as better bike maintenance, more docks, and potentially customer service points.

6.Adjust for Peak Hours:

With peak demand between 3 pm and 6 pm, allocate more bikes and ensure efficient redistribution of bikes to meet demand. Consider offering incentives for riders who choose to ride outside these peak hours to balance the load.

Recommendations for Marketing Team:

1.Targeted Marketing Campaigns:

Develop targeted campaigns to convert casual riders to members. Highlight the benefits of membership, such as cost savings, exclusive access to promotions, and convenience.

2.Leverage Popular Stations:

Use the popularity of stations like Streeter Dr & Grand Ave to create event-based marketing. For instance, set up pop-up events, offer free refreshments, or partner with nearby attractions to draw in more riders.

3.Promote During Peak Seasons:

Utilize data showing increased rides from April to September to launch time-limited offers and campaigns. Engage with riders through social media, email newsletters, and local advertisements to promote these offers.

4.Weekend Specials:

Since weekends see higher ridership, promote special weekend passes or family packages to attract group rides. Collaborate with local tourist attractions or restaurants to offer combined deals.

5.Highlight Environmental Impact:

Emphasize the environmental benefits of using bike share programs in your marketing materials. Share statistics on carbon footprint reduction and promote the sustainable aspect of biking to attract eco-conscious riders.

6.Dynamic Pricing:

Implement dynamic pricing strategies during peak hours and seasons to manage demand and encourage off-peak usage. Offer discounted rates for rides starting early in the morning or late at night.

By acting on these recommendations, Cyclistic can enhance user experience, optimize operations, and effectively increase ridership and membership.

Thank you,

Nisha Prasanth.