Cyclistic Bike-Share Capstone Project

Introduction

This project is a Google Data Analytics Capstone Project on Coursera, designed to reinforce the analytical skills gained throughout the course duration. For this exercise, I utilize a publicly available data from Cyclistic, a fictional bike-share company based in the city of Chicago, launched in 2016. The company features 5800 bicycles and 692 docking station. They offer various types of bikes such as classic , electric, and docked bikes.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. The finance analysts have concluded that annual members are much more profitable than casual riders.

I am the junior data analyst working in the marketing analyst team at Cyclistic. In order to answer the key business questions, uncover key insights, pattern and trends on the bike rental service usage and communicate the findings in an easy, engaging and persuasive way to the key stakeholder. i will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act

Stakeholders

⦁ Lily Moreno: The director of marketing

⦁ Cyclistic marketing analytics team

⦁ Cyclistic executive team

Ask

Business Task

⦁ How do annual members and casual riders use Cyclistic bikes differently?

⦁ Increase in revenue derived from the transition of casual riders into annual members subscriber. i aim to uncover distinctions in the utilization patterns of Cyclistic bikes between casual riders and annual members and how can Cyclistic use digital media to influence casual riders to become members?

⦁ Recommend a new marketing strategy geared towards the conversion of casual riders

Prepare

Data Source Description

This report made use of data collected in 12 months (August 2022 - July 2023) for detailed data analysis to complete this case study. To download the data, please use this link https://divvy-tripdata.s3.amazonaws.com/index.html. This data was made public by Motivate International Inc, under this license https://divvybikes.com/data-license-agreement. Due to data privacy issues, personal information has been removed or encrypted. i Ensure the “Divvy bikeshare” dataset integerity by checking if the data ROCCC.

Data Interigity(ROCCC)

Reliable: data is reliable and accurate being collected by a credible source using advanced data collection technology

Original: Primary data source, the company Cyclistic gathers this data first-hand.

Comprehensive: The dataset has more than 5 million complete data entries. With the available data we are able to compare the two groups ‘annual members & casual members’, but not the preference that lead to their choice on what product/services to opt for.

Current: Data is up-to-date as it includes data from the past 12 months.

Cited : Yes, The data is authorized under license.

Dataset

The dataset consist of 13 columns and a total of 5723606 rows. Here’s an overview of the columns:

ride_id:— a unique ID per ride

rideable_type:- the type of bicycle used

started_at: the date and time at which bicycle was checked out

ended_at: the date and time at which bicycle was checked in

start_station_name: the name of the station at the start of the trip

start_station_id: a unique identifier for the start station

end_station_name: the name of the station at the end of the trip

end_station_id: a unique identifier for the end station

Start_lat: the latitude of the start station

start_lng: the longitude of the start station

end_lat: the latitude of the end station

end_lng: the longitude of the end station

member_casual: this field indicates whether the rider type is member or a casual

Process

R Programming

R programming is the data analysis tool of my choice for this project, and preferable to spreadsheet due to the raw amount of data in this dataset. R is an open-source programming language and free software environment for statistical computing and graphics. R has a wide range of tools available and can handle massive amounts of data with ease. I imported and load the data into RStudio.

Data Extraction and Processing

# Setting up my r environment and Loading packages

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(lubridate)
library(readr)
library(skimr)
library(dplyr)
library(tidyr)
library(gt)
library(farver)
library(plotrix)
library(tinytex)

# setting directory

setwd("C:/Users/HP PC/Documents/Cousera/Capstone_Project/cyclist_divvy_dataset")

# assigning and importing dataset  

aug_22 <- read.csv('202208-divvy-tripdata.csv')
sep_22 <- read.csv("202209-divvy-tripdata.csv")
oct_22 <- read.csv("202210-divvy-tripdata.csv")
nov_22 <- read.csv("202211-divvy-tripdata.csv")
dec_22 <- read.csv("202212-divvy-tripdata.csv")
jan_23 <- read.csv("202301-divvy-tripdata.csv")
feb_23 <- read.csv("202302-divvy-tripdata.csv")
mar_23 <- read.csv("202303-divvy-tripdata.csv")
apr_23 <- read.csv("202304-divvy-tripdata.csv")
may_23 <- read.csv("202305-divvy-tripdata.csv")
jun_23 <- read.csv("202306-divvy-tripdata.csv")
jul_23 <- read.csv("202307-divvy-tripdata.csv")

Column Names

colnames(aug_22)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(sep_22)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(oct_22)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(nov_22)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(dec_22)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(jan_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(feb_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(mar_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(apr_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(may_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(jun_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

colnames(jul_23)

##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"

i ensured the column are the same on each assigned data set variable which ascertained that all column name are consistent.

# merging all the data set

divvy_trips <- rbind(aug_22, sep_22, oct_22, nov_22, dec_22, jan_23, feb_23, mar_23, apr_23, may_23, jun_23, jul_23)

# broad overview of the dataset

skim_without_charts(divvy_trips)

Data summary
Name	divvy_trips
Number of rows	5723606
Number of columns	13
_______________________
Column type frequency:
character	9
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
ride_id	1	16	16	0	5723606
rideable_type	1	11	13	0	3
started_at	1	19	19	0	4821003
ended_at	1	19	19	0	4833101
start_station_name	1	0	64	868772	1802
start_station_id	1	0	36	868904	1503
end_station_name	1	0	64	925008	1800
end_station_id	1	0	36	925149	1506
member_casual	1	6	6	0	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
start_lat	0	1	41.90	0.05	41.64	41.88	41.90	41.93	42.07
start_lng	0	1	-87.65	0.03	-87.92	-87.66	-87.64	-87.63	-87.52
end_lat	6102	1	41.90	0.07	0.00	41.88	41.90	41.93	42.18
end_lng	6102	1	-87.65	0.12	-88.16	-87.66	-87.64	-87.63	0.00

str(divvy_trips)

## 'data.frame':    5723606 obs. of  13 variables:
##  $ ride_id           : chr  "550CF7EFEAE0C618" "DAD198F405F9C5F5" "E6F2BC47B65CB7FD" "F597830181C2E13C" ...
##  $ rideable_type     : chr  "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : chr  "2022-08-07 21:34:15" "2022-08-08 14:39:21" "2022-08-08 15:29:50" "2022-08-08 02:43:50" ...
##  $ ended_at          : chr  "2022-08-07 21:41:46" "2022-08-08 14:53:23" "2022-08-08 15:40:34" "2022-08-08 02:58:53" ...
##  $ start_station_name: chr  "" "" "" "" ...
##  $ start_station_id  : chr  "" "" "" "" ...
##  $ end_station_name  : chr  "" "" "" "" ...
##  $ end_station_id    : chr  "" "" "" "" ...
##  $ start_lat         : num  41.9 41.9 42 41.9 41.9 ...
##  $ start_lng         : num  -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num  41.9 41.9 42 42 41.8 ...
##  $ end_lng           : num  -87.7 -87.6 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr  "casual" "casual" "casual" "casual" ...

Carrying Out analysis to identify obvious errors, understand patterns within the data, detect outliers, and find relation within variables.

#quick glace of the data set, showing the first 10 rows

head(divvy_trips, 10)

# checking for duplicates

nrow(divvy_trips)

## [1] 5723606

divvy_trips %>%  distinct(ride_id, .keep_all = TRUE) %>% count()

#no duplicates found

divvy_trips1 <- divvy_trips %>% drop_na 

# there are 6,102 missing values, this brings the total number of observation to 5717504

Validating and Transforming Data

# converting 'started_at' and 'ended_at' to datetime 

divvy_trips1$started_at <- as.POSIXct(divvy_trips1$started_at)
divvy_trips1$ended_at <- as.POSIXct(divvy_trips1$ended_at)

# 'started_ride_date' column
divvy_trips1$started_ride_date <- format(as.Date(divvy_trips1$started_at), "%D")

# 'started_ride_time' column
divvy_trips1$started_ride_time <- format(as.POSIXct(divvy_trips1$started_at), "%H:%M:%S")

# 'ended_date' column
divvy_trips1$ended_date <- format(as.Date(divvy_trips1$ended_at), "%D")

# 'ended_ride_time' column
divvy_trips1$ended_ride_time <- format(as.POSIXct(divvy_trips1$ended_at), "%H:%M:%S")

# 'ride_length' column
divvy_trips1$ride_length <- difftime(divvy_trips1$ended_at, divvy_trips1$started_at, units = "secs")

# 'start_hour'
divvy_trips1$start_hour <- format(as.POSIXct(divvy_trips1$started_at), "%H")

#Extracting date component for 'year', 'month', & 'weekday'

divvy_trips2 <- divvy_trips1 %>%
  mutate(year = lubridate::year(started_at),  
         month = lubridate::month(started_at, label = TRUE), # month  "abbr"
         weekday = lubridate::wday(started_at, label = TRUE)) # day of week

# overview of dataset

glimpse(divvy_trips2) #glimpse of dataset

## Rows: 5,717,504
## Columns: 22
## $ ride_id            <chr> "550CF7EFEAE0C618", "DAD198F405F9C5F5", "E6F2BC47B6…
## $ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", …
## $ started_at         <dttm> 2022-08-07 21:34:15, 2022-08-08 14:39:21, 2022-08-…
## $ ended_at           <dttm> 2022-08-07 21:41:46, 2022-08-08 14:53:23, 2022-08-…
## $ start_station_name <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ start_station_id   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ end_station_name   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ end_station_id     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ start_lat          <dbl> 41.93, 41.89, 41.97, 41.94, 41.85, 41.79, 41.89, 41…
## $ start_lng          <dbl> -87.69, -87.64, -87.69, -87.65, -87.65, -87.72, -87…
## $ end_lat            <dbl> 41.94, 41.92, 41.97, 41.97, 41.84, 41.82, 41.89, 41…
## $ end_lng            <dbl> -87.72, -87.64, -87.66, -87.69, -87.66, -87.69, -87…
## $ member_casual      <chr> "casual", "casual", "casual", "casual", "casual", "…
## $ started_ride_date  <chr> "08/07/22", "08/08/22", "08/08/22", "08/08/22", "08…
## $ started_ride_time  <chr> "21:34:15", "14:39:21", "15:29:50", "02:43:50", "20…
## $ ended_date         <chr> "08/07/22", "08/08/22", "08/08/22", "08/08/22", "08…
## $ ended_ride_time    <chr> "21:41:46", "14:53:23", "15:40:34", "02:58:53", "20…
## $ ride_length        <drtn> 451 secs, 842 secs, 644 secs, 903 secs, 352 secs, …
## $ start_hour         <chr> "21", "14", "15", "02", "20", "13", "14", "20", "21…
## $ year               <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 202…
## $ month              <ord> Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, A…
## $ weekday            <ord> Sun, Mon, Mon, Mon, Sun, Mon, Mon, Sun, Sun, Sun, S…

# removing bad data and dealing with outliers

divvy_trips2 %>% 
  select(member_casual, ride_length) %>% 
  group_by(member_casual) %>%
  dplyr::summarize(min_ride_length = min(ride_length), max_ride_length = max(ride_length)) %>% 
  ungroup()

The minimum and maximum ‘ride_length’ shows a negative values and a ‘ride_length’ more than 24 hours(86400secs) of any ride trips. This values are outliers and will be removed. A trip less than 60 sec may also be due to other factors such as faulty bikes and will also be removed.

divvy_trips3 <- divvy_trips2 %>%
  filter(ride_length >= 60 & ride_length < 86400)

nrow(divvy_trips3)

## [1] 5565647

#brings the total number of observation to 5565647

# converting 'ride_length' to numeric before analyzing
is.factor(divvy_trips3$ride_length) #FLASE

## [1] FALSE

divvy_trips3$ride_length <- as.numeric(as.character(divvy_trips3$ride_length))

is.numeric(divvy_trips3$ride_length) #TRUE

## [1] TRUE

#ordering by 'started_at' in Ascending order
divvy_trips_clean <-  arrange(divvy_trips3, started_at)

# Dataset Overview

divvy_trips_clean %>% head(10)

Analyze

Descripitive Analysis of dataset

Using the summary function to determine the mean, median, min & max range of ride_length”

##statistical summary of trip duration
summary(divvy_trips_clean$ride_length)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    60.0   344.0   592.0   931.5  1042.0 86255.0

#statistical summary of trip duration by member type
divvy_trips_clean %>% 
    group_by(member_casual) %>% 
    summarize(minmum_ride_length = min(ride_length),
              maximum_ride_length = max(ride_length),
              average_ride_length = mean(ride_length),
              median_ride_length = median(ride_length))

#yearly percentage of casual Vs. member rides
divvy_trips_clean %>% 
  group_by(member_casual) %>%
  summarise("numbers of rides" = n(), perc = n()/nrow(divvy_trips_clean) *100)

#Create data for the graph
x <-  c(3456701, 2108946)
piepercent <- paste0(round(x/sum(x)*100, 1), "%")
labels <-  c("member", "casual")
color <- c("#00BFC4","#F8766D")

# Plot the chart.
pie3D(x, labels = piepercent, explode =0.1, radius = 1,
  main = "Proportion of Total Rides %", col = color)
  legend(title = "Membership", "topright", c("member", "casual"), 
  cex = 0.5, fill = color)

The distribution chart indicates, members possessing (62.1%), and casual riders have (37.9%) of the total rides. members used rides (24.2%) more than casual riders during the 12 months period.

# Comparison of total rides n

divvy_trips_clean %>% 
  group_by(member_casual) %>%
  ggplot(aes(x=member_casual, fill = member_casual))+
  labs(title= "Membership Distribution", y= "Number of rides", x="casual vs member", fill = "Membership type") +
  geom_bar(position='dodge')

An analysis of the rides frequency over a period of 12 months to gain precise insights that will guide my approach to encourage casual members to transition into full-fledged annual members.

#Calculating rides per start_hour & Average
divvy_trips_clean %>%
  group_by(start_hour, member_casual) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length))

# Visualization of rides at early hours 
divvy_trips_clean %>%
  group_by(start_hour, member_casual) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>% 
  ggplot(aes(x=start_hour, y=ride_numbers, fill= member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Rides started per hour of the day", x="Hours", y="Number of started rides", fill="Membership")

From the above chart we can see more members between 7:00-9:00 and 16:00-18:00 which would suggest that annual members use their bikes to commute to and from work. Casual riders have a spike in number of rides around 11:00-12:00 and 17:00-19:00. Also there is bigger volume rise in the afternoon for both type of riders. This information needs to be checked on day basis.

#Calculating rides per week 

divvy_trips_clean %>%
  group_by(member_casual, weekday) %>%
  summarize(Number_Rides = n(),  .groups = 'drop', Average_rides = mean(ride_length)) %>%
  arrange(weekday)

# Visualization of rides weekly
divvy_trips_clean %>%
  group_by(member_casual, weekday) %>%
  summarize(Number_Rides = n(),  .groups = 'drop', Average_rides = mean(ride_length)) %>%
  arrange(weekday) %>%
  ggplot(aes(x=weekday, y=Number_Rides, fill=member_casual))+
  geom_col(position='dodge')+
  labs(title = "Total Rides Weekly: Subscriber Vs. Casual", x="Days", y="Total Ride Numbers", fill="Membership_Type")

The Chart shows, during week day members have a much higher frequency of rides than casual riders. Casual riders however, shows more frequency during weekends, while the number of rides for members actually decreases slightly.

# Visualization of average weekly rides
divvy_trips_clean %>%
  group_by(member_casual, weekday) %>%
  summarize(Number_Rides = n(),  .groups = 'drop', Average_rides = mean(ride_length)) %>%
  arrange(weekday) %>%
  ggplot(aes(x=member_casual, y=Average_rides, fill=weekday))+
  geom_bar(position='Dodge', stat='identity')+
  labs(title = "Average Rides Weekly: Subscriber Vs. Casual", x="Days", y="Average_rides_length", fill="Weekday")

The chart for the average rides shows casual members have longer average distance of rides than the member users, but they have relatively longer rides.

# Calculating rides per month

divvy_trips_mth <- divvy_trips_clean

divvy_trips_mth$month <- factor(divvy_trips_mth$month,levels = c( "Aug", "Sep", "Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul"))

divvy_trips_mth %>%
  group_by(member_casual, month) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>%
  arrange(month)

# Visualization Monthly rides

divvy_trips_mth %>%
  group_by(member_casual, month) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>%
  arrange(month) %>% 
  ggplot(aes(x=month, y=ride_numbers, fill= member_casual)) +
  theme(axis.text.x = element_text(angle = 45)) +
  geom_col(width=0.5,position = position_dodge(width=0.5)) +
  labs(title = "Monthly Trips: Casual Vs. Subscribed members", x="month", y="Number of Trips", fill="Membership")

The months June, July, August and September are the most busy time of the year among both members and casual riders. It is possible due to winter there is a significant drop in total rides in the months of November, December, January and February for both type of customers. The chart also indicates member’s total rides are higher than casual riders throughout the year.

#Average number of rides
divvy_trips_clean %>%
  group_by(member_casual, month) %>%
  summarize(ride_numbers = n(), .groups = 'drop', Average_duration = mean(ride_length)) %>%
  arrange(month) %>%
  ggplot(aes(x=month, y=Average_duration, fill= member_casual)) +
  geom_col(position = 'dodge') +
  labs(title = "Average ride length of members by month", x="month", y="Average Ride_length", fill="Membership")

Average ride length of members is about the same <1000 secs throughout the year. While casual riders average ride length is between 750 - 1500 secs throughout the year. But in the month of July average right length is higher but total rides are lowest as compared to other months.

divvy_trips_clean %>% 
  group_by(member_casual, rideable_type) %>%
  summarise(total=n(), .groups = 'drop', perc =round(n()/nrow(divvy_trips_clean)*100, 1)) %>%
  arrange(rideable_type)

# Visualization of prefered bikes
divvy_trips_clean %>% 
  group_by(member_casual, rideable_type) %>%
  summarise(total=n(), .groups = 'drop', perc =round(n()/nrow(divvy_trips_clean)*100, 1)) %>%
  arrange(rideable_type) %>%
  ggplot(aes(x=rideable_type, y= total, fill=member_casual))+
  geom_col(position='dodge')+
  labs(title = "Preferred Bikes")+ 
  facet_wrap(~member_casual)+
  geom_text(aes(label = paste(format(perc, nsmall=1), "%")), vjust= -0.2)

Electric bikes ranks first, being mostly preferred by the casual and members at (21.7%) & (32.2%) respectively of total bike usage.It is also noticeable that the member rides also go more for classic bikes at (29.9%). Docked bikes are only used by casual riders with marginal usage of as little as (2.2%).

Conclusion

Note : All ride ids are unique so we cannot conclude if the same rider taken several rides. More rider data needed for further analysis

Additional data that could expand scope of analysis
Pricing details for members and casual riders - Based on this data, we might be to optimize cost structure for casual riders or provide discounts without affecting the profit margin.
Address neighborhood details of members to investigate if there are any location specific parameters that encourage membership.Way to determine a recurring bike user using payment information or any personal identification.

Cyclistic Bike-Share Capstone Project

Olorunfemi Taiwo

2023-11-03

Introduction

Stakeholders

Ask

Business Task

Prepare

Data Source Description

Dataset

Process

R Programming

Data Extraction and Processing

Column Names

Validating and Transforming Data

Analyze

Descripitive Analysis of dataset

Conclusion

Cyclistic Bike-Share Capstone Project

Olorunfemi Taiwo

2023-11-03

Case Study: How Does a Bike-Share Navigate Speedy Success?

Introduction

Stakeholders

Ask

Business Task

Prepare

Data Source Description

Dataset

Process

R Programming

Data Extraction and Processing

Column Names

Validating and Transforming Data

Analyze

Descripitive Analysis of dataset

Share

Key insights

Recommendation

Conclusion