This is my version of the Google Data Analytics : Case Study 1 , A Cyclistic bike-share Analysis case study. The “Cyclistic” is a fictional company for this case study. They said this case study will be a ‘Tangible’ way to demonstrate my knowledge and skills so here we are 😊 : I’m a junior ‘Data Analyst’ working in the marketing analytics team at company named “Cyclistic”, a bike-share company in Chicago. I’ve joined this team six months ago and have been busy learning about Cyclistic’s mission and buisness goals as well as how I, as a junior data analyst, can help Cyclistic achieve them. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships.
Before we begin the Process for this Project, there are a few key points that are wrapped below as these are the steps I’ll be following to ensure its completion:
I'll be following these vital steps for the Data Analysis Process:
Ask, Prepare, Process, Analyze, Share, and Act.
Data Analysis Process
The director of marketing believes the company’s future success depends on maximizing the number of annual memberships.
About the company:
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments.One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. *Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
In this phase:
I’ll be using Cyclistic’s historical trip data to analyze and identify trends, Downloaded last 12 months of data and stored it on my Google Drive
The data is located on divvy-tripdata and data has been made available by Motivate International Inc. under this license.
The company has their own licence over the dataset. Besides that, the
dataset doesn't have any personal information about the riders. So, I'm
sure that Data isn't Biased And have full credibility for the same reason.
Finally, it's ROCCC because it's reliable, original, comprehensive,
current and cited.
Verified the data's integrity by checking all the files, have consistent
columns and the data types.
I’ve also done analysis using excel but due to “Big Data” MS Excel won’t be able to load it.
R is primarily used for statistical analysis and data visualization. So, I chose ‘RStudio’ to merge all the months based Data into One Single file for further Analysis.
library(tidyverse)
library(lubridate)
library(ggplot2)
library(readxl)
setwd("D:/Case_Study/Data/CSV")
t22_05 <- read.csv("202205-divvy-tripdata.csv")
t22_06 <- read.csv("202206-divvy-tripdata.csv")
t22_07 <- read.csv("202207-divvy-tripdata.csv")
t22_08 <- read.csv("202208-divvy-tripdata.csv")
t22_09 <- read.csv("202209-divvy-publictripdata.csv")
t22_10 <- read.csv("202210-divvy-tripdata.csv")
t22_11 <- read.csv("202211-divvy-tripdata.csv")
t22_12 <- read.csv("202212-divvy-tripdata.csv")
t23_01 <- read.csv("202301-divvy-tripdata.csv")
t23_02 <- read.csv("202302-divvy-tripdata.csv")
t23_03 <- read.csv("202303-divvy-tripdata.csv")
t23_04 <- read.csv("202304_divvy_tripdata.csv")
str(t23_04) # checked all the data one by one with " str() ".
## 'data.frame': 426590 obs. of 13 variables:
## $ ride_id : chr "8FE8F7D9C10E88C7" "34E4ED3ADF1D821B" "5296BF07A2F77CB5" "40759916B76D5D52" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : chr "2023-04-02 08:37:28" "2023-04-19 11:29:02" "2023-04-19 08:41:22" "2023-04-19 13:31:30" ...
## $ ended_at : chr "2023-04-02 08:41:37" "2023-04-19 11:52:12" "2023-04-19 08:43:22" "2023-04-19 13:35:09" ...
## $ start_station_name: chr "" "" "" "" ...
## $ start_station_id : chr "" "" "" "" ...
## $ end_station_name : chr "" "" "" "" ...
## $ end_station_id : chr "" "" "" "" ...
## $ start_lat : num 41.8 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.7 -87.7 -87.7 -87.7 ...
## $ end_lat : num 41.8 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.6 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr "member" "member" "member" "member" ...
tripdata <- bind_rows(t22_05,t22_06,t22_07,t22_08,t22_09,t22_10,
t22_11,t22_12,t23_01,t23_02,t23_03,t23_04)
str(tripdata)
## 'data.frame': 5859061 obs. of 13 variables:
## $ ride_id : chr "EC2DE40644C6B0F4" "1C31AD03897EE385" "1542FBEC830415CF" "6FF59852924528F8" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-05-23 23:06:58" "2022-05-11 08:53:28" "2022-05-26 18:36:28" "2022-05-10 07:30:07" ...
## $ ended_at : chr "2022-05-23 23:40:19" "2022-05-11 09:31:22" "2022-05-26 18:58:18" "2022-05-10 07:38:49" ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "DuSable Lake Shore Dr & Monroe St" "Clinton St & Madison St" "Clinton St & Madison St" ...
## $ start_station_id : chr "TA1307000117" "13300" "TA1305000032" "TA1305000032" ...
## $ end_station_name : chr "Halsted St & Roscoe St" "Field Blvd & South Water St" "Wood St & Milwaukee Ave" "Clark St & Randolph St" ...
## $ end_station_id : chr "TA1309000025" "15534" "13221" "TA1305000030" ...
## $ start_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num -87.6 -87.6 -87.7 -87.6 -87.7 ...
## $ member_casual : chr "member" "member" "member" "member" ...
write.csv(tripdata, file = "tripdata.csv")
tripdata <- tripdata %>%
select(-c(start_lat, start_lng, end_lat, end_lng))
str(tripdata)
## 'data.frame': 5859061 obs. of 9 variables:
## $ ride_id : chr "EC2DE40644C6B0F4" "1C31AD03897EE385" "1542FBEC830415CF" "6FF59852924528F8" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-05-23 23:06:58" "2022-05-11 08:53:28" "2022-05-26 18:36:28" "2022-05-10 07:30:07" ...
## $ ended_at : chr "2022-05-23 23:40:19" "2022-05-11 09:31:22" "2022-05-26 18:58:18" "2022-05-10 07:38:49" ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "DuSable Lake Shore Dr & Monroe St" "Clinton St & Madison St" "Clinton St & Madison St" ...
## $ start_station_id : chr "TA1307000117" "13300" "TA1305000032" "TA1305000032" ...
## $ end_station_name : chr "Halsted St & Roscoe St" "Field Blvd & South Water St" "Wood St & Milwaukee Ave" "Clark St & Randolph St" ...
## $ end_station_id : chr "TA1309000025" "15534" "13221" "TA1305000030" ...
## $ member_casual : chr "member" "member" "member" "member" ...
summary(tripdata)
## ride_id rideable_type started_at ended_at
## Length:5859061 Length:5859061 Length:5859061 Length:5859061
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## start_station_name start_station_id end_station_name end_station_id
## Length:5859061 Length:5859061 Length:5859061 Length:5859061
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## member_casual
## Length:5859061
## Class :character
## Mode :character
table(tripdata$member_casual)
##
## casual member
## 2358307 3500754
and formatted as date :
tripdata$date <- as.Date(tripdata$started_at)
tripdata$day <- format(as.Date(tripdata$date), "%d")
tripdata$month <- format(as.Date(tripdata$date), "%B")
tripdata$year <- format(as.Date(tripdata$date), "%Y")
tripdata$day_of_week <- format(as.Date(tripdata$date), "%A")
tripdata$pick <- as_datetime(tripdata$started_at)
tripdata$pick_time <- format(as_datetime(tripdata$pick), "%H:%M:")
tripdata$drop <- as_datetime(tripdata$ended_at)
tripdata$drop_time <- format(as_datetime(tripdata$drop), "%H:%M")
glimpse(tripdata)
## Rows: 5,859,061
## Columns: 18
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <chr> "Monday", "Wednesday", "Thursday", "Tuesday", "Tues…
## $ pick <dttm> 2022-05-23 23:06:58, 2022-05-11 08:53:28, 2022-05-…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop <dttm> 2022-05-23 23:40:19, 2022-05-11 09:31:22, 2022-05-…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
tripdata <- tripdata %>%
select(-c(pick,drop))
tripdata_duplicate <- tripdata[!duplicated(tripdata$ride_id),]
glimpse(tripdata)
## Rows: 5,859,061
## Columns: 16
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <chr> "Monday", "Wednesday", "Thursday", "Tuesday", "Tues…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
tripdata$ride_length <- difftime(tripdata$ended_at,tripdata$started_at)
is.factor(tripdata$ride_length) # Confirming data not to be factor
## [1] FALSE
is.numeric(tripdata$ride_length) # Confirming data is to be numeric
## [1] FALSE
tripdata$ride_length <- as.numeric(as.duration(tripdata$ride_length))
is.numeric(tripdata$ride_length)
## [1] TRUE
tripdata_new <- tripdata
mean(tripdata_new$ride_length) # Average ride duration
## [1] 1136.131
median(tripdata_new$ride_length) # Median ride duration
## [1] 599
max(tripdata_new$ride_length) # Maximum ride duration
## [1] 2483235
min(tripdata_new$ride_length) # Minimum ride duration
## [1] -621201
summary(tripdata_new$ride_length) # Statistical summary of ride duration
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -621201 339 599 1136 1075 2483235
tripdata_new <- tripdata_new[!tripdata_new$ride_length <0, ]
str(tripdata_new) # Took a look on data
## 'data.frame': 5858958 obs. of 17 variables:
## $ ride_id : chr "EC2DE40644C6B0F4" "1C31AD03897EE385" "1542FBEC830415CF" "6FF59852924528F8" ...
## $ rideable_type : chr "classic_bike" "classic_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-05-23 23:06:58" "2022-05-11 08:53:28" "2022-05-26 18:36:28" "2022-05-10 07:30:07" ...
## $ ended_at : chr "2022-05-23 23:40:19" "2022-05-11 09:31:22" "2022-05-26 18:58:18" "2022-05-10 07:38:49" ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "DuSable Lake Shore Dr & Monroe St" "Clinton St & Madison St" "Clinton St & Madison St" ...
## $ start_station_id : chr "TA1307000117" "13300" "TA1305000032" "TA1305000032" ...
## $ end_station_name : chr "Halsted St & Roscoe St" "Field Blvd & South Water St" "Wood St & Milwaukee Ave" "Clark St & Randolph St" ...
## $ end_station_id : chr "TA1309000025" "15534" "13221" "TA1305000030" ...
## $ member_casual : chr "member" "member" "member" "member" ...
## $ date : Date, format: "2022-05-23" "2022-05-11" ...
## $ day : chr "23" "11" "26" "10" ...
## $ month : chr "May" "May" "May" "May" ...
## $ year : chr "2022" "2022" "2022" "2022" ...
## $ day_of_week : chr "Monday" "Wednesday" "Thursday" "Tuesday" ...
## $ pick_time : chr "23:06:" "08:53:" "18:36:" "07:30:" ...
## $ drop_time : chr "23:40" "09:31" "18:58" "07:38" ...
## $ ride_length : num 2001 2274 1310 522 301 ...
summary(tripdata_new$ride_length)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 339 599 1136 1075 2483235
aggregate(tripdata_new$ride_length~tripdata_new$member_casual, FUN= mean) #average
## tripdata_new$member_casual tripdata_new$ride_length
## 1 casual 1709.6796
## 2 member 750.0187
aggregate(tripdata_new$ride_length~tripdata_new$member_casual, FUN= median) #median
## tripdata_new$member_casual tripdata_new$ride_length
## 1 casual 751
## 2 member 520
aggregate(tripdata_new$ride_length~tripdata_new$member_casual, FUN= max) #maximun
## tripdata_new$member_casual tripdata_new$ride_length
## 1 casual 2483235
## 2 member 93580
aggregate(tripdata_new$ride_length~tripdata_new$member_casual, FUN= min) #minimum
## tripdata_new$member_casual tripdata_new$ride_length
## 1 casual 0
## 2 member 0
aggregate(tripdata_new$ride_length~tripdata_new$member_casual+
tripdata_new$day_of_week, FUN= mean)
## tripdata_new$member_casual tripdata_new$day_of_week tripdata_new$ride_length
## 1 casual Friday 1651.1089
## 2 member Friday 741.8887
## 3 casual Monday 1702.1047
## 4 member Monday 718.6881
## 5 casual Saturday 1934.6068
## 6 member Saturday 835.9089
## 7 casual Sunday 2006.2732
## 8 member Sunday 829.4256
## 9 casual Thursday 1481.9947
## 10 member Thursday 726.0527
## 11 casual Tuesday 1520.1513
## 12 member Tuesday 717.0647
## 13 casual Wednesday 1450.3821
## 14 member Wednesday 713.4792
tripdata_new$day_of_week <- ordered(tripdata_new$day_of_week,
levels = c("Sunday","Monday",
"Tuesday","Wednesday",
"Thursday","Friday","Saturday"))
aggregate(tripdata_new$ride_length~tripdata_new$member_casual+
tripdata_new$day_of_week, FUN= mean)
## tripdata_new$member_casual tripdata_new$day_of_week tripdata_new$ride_length
## 1 casual Sunday 2006.2732
## 2 member Sunday 829.4256
## 3 casual Monday 1702.1047
## 4 member Monday 718.6881
## 5 casual Tuesday 1520.1513
## 6 member Tuesday 717.0647
## 7 casual Wednesday 1450.3821
## 8 member Wednesday 713.4792
## 9 casual Thursday 1481.9947
## 10 member Thursday 726.0527
## 11 casual Friday 1651.1089
## 12 member Friday 741.8887
## 13 casual Saturday 1934.6068
## 14 member Saturday 835.9089
glimpse(tripdata_new) # Took a look on data
## Rows: 5,858,958
## Columns: 17
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <ord> Monday, Wednesday, Thursday, Tuesday, Tuesday, Wedn…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
## $ ride_length <dbl> 2001, 2274, 1310, 522, 301, 429, 533, 730, 1002, 88…
rides_per_day <- tripdata_new %>%
group_by(member_casual, day_of_week, rideable_type) %>%
summarise(number_of_rides = n(),
median_duration = median(ride_length),
maximum_duration = max(ride_length),
average_duration = mean(ride_length)) %>%
arrange(member_casual, day_of_week, rideable_type)
rides_per_day <- as.data.frame(rides_per_day)
glimpse(rides_per_day) # Took a look on it
## Rows: 35
## Columns: 7
## $ member_casual <chr> "casual", "casual", "casual", "casual", "casual", "ca…
## $ day_of_week <ord> Sunday, Sunday, Sunday, Monday, Monday, Monday, Tuesd…
## $ rideable_type <chr> "classic_bike", "docked_bike", "electric_bike", "clas…
## $ number_of_rides <int> 156683, 33968, 198160, 101921, 21193, 152634, 98284, …
## $ median_duration <dbl> 973.0, 1763.5, 710.0, 842.0, 1730.0, 604.0, 762.0, 16…
## $ maximum_duration <dbl> 90055, 2175468, 28826, 89997, 1922127, 28800, 89997, …
## $ average_duration <dbl> 1880.3107, 8064.8280, 1067.3310, 1703.8064, 7270.6341…
ggplot(data = rides_per_day,
mapping= aes(x = day_of_week, y = number_of_rides, fill= member_casual)) +
geom_bar(position = "dodge", stat = "identity") +
scale_y_continuous(labels = scales::comma)+
labs( x = "Day of Week", y = "Number of Rides", fill = "Member/Casual" ,
title = "Average Number of Rides Per Day : Memeber vs Causal",
subtitle = "High peek on Weekend by Casual riders",
caption = "Data Analyst 'JP'")
Total number of rides taken by riders on weekend is almost the same, where on weekdays, member riders engagement are high compared to casual riders.
ggplot(data= rides_per_day,
mapping = aes(x = day_of_week , y = average_duration, fill = member_casual)) +
geom_bar(position = "dodge", stat = "identity") +
labs(x = "Day of Week", y = "Average Duration", fill = "Memeber/Casual",
title = " Average Ride Duration Per Day : Member vs Casual",
subtitle = "Overall Average Ride Duration by Casual Riders are High",
caption = "Data Analyst 'JP'")
Casual riders’ average duration for using bikes are much high than Memebers. I’ll be suggesting to advertise about being ‘Members’ is way not too costly.
ggplot(data= rides_per_day,
mapping = aes(x = day_of_week , y = maximum_duration, fill = member_casual)) +
geom_bar(position = "dodge", stat = "identity") +
labs(x = "Day of Week", y = "Average Duration", fill = "Memeber/Casual",
title = " Maximum Ride Duration Per Day : Member vs Casual",
subtitle = "Overall Maximum Ride Duration by Casual Riders are High",
caption = "Data Analyst 'JP'")
Maximum Rides duration taken by Casual riders is way high than Memebers. Casuals Riders riding bikes for maximum ‘ride duration’ than Memebers.
ggplot(data= rides_per_day,
mapping = aes(x = day_of_week , y = median_duration, fill = member_casual)) +
geom_bar(position = "dodge", stat = "identity") +
labs(x = "Day of Week", y = "Average Duration", fill = "Memeber/Casual",
title = " Median Ride Duration Per Day : Member vs Casual",
subtitle = "Overall Average Ride Duration by Casual Riders are High",
caption = "Data Analyst 'JP'")
Overall Casual Riders riding bikes for longer period of time.
ggplot(data = rides_per_day) +
geom_point(mapping = aes (x = day_of_week , y= number_of_rides, color= rideable_type, shape = member_casual)) +
scale_y_continuous(labels = scales::comma)
Docked bike is being used only by casual riders.
ggplot(data= rides_per_day,
mapping = aes(x = day_of_week , y = average_duration,
fill = member_casual, shape = rideable_type)) +
geom_point() +
facet_wrap(~member_casual) +
labs(x = "Day of Week", y = "Average Duration", shape = "Rideable Types ", fill = "Member/Casual",
title = " Average Ride Duration by bike-types : Member vs Casual",
subtitle = "Average ride duration for Docked bike is high by casuals",
caption = "Data Analyst 'JP'")
Casual Riders prefer to ride “Docked Bike” for long amount of time.
ggplot(data= rides_per_day,
mapping = aes(x = rideable_type , y = day_of_week, fill = member_casual)) +
geom_bar(position = "dodge", stat = "identity") +facet_wrap(~member_casual)+
labs(x = "Rideable Type", y = "Day of Week", fill = "Memeber/Casual",
title = " Ridable Types used per Day : Member vs Casual",
subtitle = "members never used 'docked bike' as of now",
caption = "Data Analyst 'JP'")
Members never picked ‘docked bikes’ for last 12 months
ggplot(data= rides_per_day,
mapping = aes(x = member_casual , y = number_of_rides, fill = day_of_week)) +
geom_bar(position = "dodge", stat = "identity") +
scale_y_continuous(labels = scales::comma) +
labs(x = "Member vs Casual", y = "Number of Rides", fill = "Memeber/Casual",
title = " Rides used by Member & Casual",
subtitle = "Members' ride are usually on peek during weekdays",
caption = "Data Analyst 'JP'")
The number of rides taken by Casual riders is high than Member riders. advertising them to get membership since they are paying more money than members.
tripdata$drop <- as_datetime(tripdata$ended_at)
tripdata$drop_time <- format(as_datetime(tripdata$drop), "%H:%M")
glimpse(tripdata)
## Rows: 5,859,061
## Columns: 18
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <chr> "Monday", "Wednesday", "Thursday", "Tuesday", "Tues…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
## $ ride_length <dbl> 2001, 2274, 1310, 522, 301, 429, 533, 730, 1002, 88…
## $ drop <dttm> 2022-05-23 23:40:19, 2022-05-11 09:31:22, 2022-05-…
tripdata <- tripdata %>%
select(-c(drop))
glimpse(tripdata) # look on data
## Rows: 5,859,061
## Columns: 17
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <chr> "Monday", "Wednesday", "Thursday", "Tuesday", "Tues…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
## $ ride_length <dbl> 2001, 2274, 1310, 522, 301, 429, 533, 730, 1002, 88…
tripdata_new2 <- tripdata_new
tripdata_new2$pick <- as_datetime(tripdata_new2$started_at)
tripdata_new2$pick_hour <- format(as_datetime(tripdata_new2$pick), "%H")
tripdata_new2$drop <- as_datetime(tripdata_new2$ended_at)
tripdata_new2$drop_hour <- format(as_datetime(tripdata_new2$drop), "%H")
tripdata_new2 <- tripdata_new2 %>%
select(-c(pick, drop))
glimpse(tripdata_new2) # Took a look on it
## Rows: 5,858,958
## Columns: 19
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <ord> Monday, Wednesday, Thursday, Tuesday, Tuesday, Wedn…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
## $ ride_length <dbl> 2001, 2274, 1310, 522, 301, 429, 533, 730, 1002, 88…
## $ pick_hour <chr> "23", "08", "18", "07", "17", "14", "12", "19", "17…
## $ drop_hour <chr> "23", "09", "18", "07", "17", "14", "12", "19", "18…
ggplot(data= tripdata_new2,
mapping = aes(x = pick_hour, fill = member_casual)) +
geom_bar(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(x = "Bikes' Pick Hour 00:00 To 23:00", y = "Number of Rides Taken", fill = "Memeber/Casual",
title = " Rush Hours by Member & Casual",
subtitle = "High Peek Rate During Afternoon",
caption = "Data Analyst 'JP'")
Members and Casual Riders using rides during work-hour are high.
ggplot(data= tripdata_new2,
mapping = aes(x = drop_hour, fill = member_casual)) +
geom_bar(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(x = "Bikes' Drop Hour 00:00 To 23:00", y = "Number of Rides Taken", fill = "Memeber/Casual",
title = " Bikes' Drop Hours by Member & Casual",
subtitle = "High Drop Rate too During Afternoon",
caption = "Data Analyst 'JP'")
Experimented the idea of Riders picking bikes and dropping it, on time basis.
tripdata_new2$month1 <- as.Date(tripdata_new2$started_at)
tripdata_new2$month_year <- format(as.Date(tripdata_new2$month1), "%m %y")
tripdata_new2 <- tripdata_new2 %>%
select(-c(month1))
glimpse(tripdata_new2) # Checked Data
## Rows: 5,858,958
## Columns: 20
## $ ride_id <chr> "EC2DE40644C6B0F4", "1C31AD03897EE385", "1542FBEC83…
## $ rideable_type <chr> "classic_bike", "classic_bike", "classic_bike", "cl…
## $ started_at <chr> "2022-05-23 23:06:58", "2022-05-11 08:53:28", "2022…
## $ ended_at <chr> "2022-05-23 23:40:19", "2022-05-11 09:31:22", "2022…
## $ start_station_name <chr> "Wabash Ave & Grand Ave", "DuSable Lake Shore Dr & …
## $ start_station_id <chr> "TA1307000117", "13300", "TA1305000032", "TA1305000…
## $ end_station_name <chr> "Halsted St & Roscoe St", "Field Blvd & South Water…
## $ end_station_id <chr> "TA1309000025", "15534", "13221", "TA1305000030", "…
## $ member_casual <chr> "member", "member", "member", "member", "member", "…
## $ date <date> 2022-05-23, 2022-05-11, 2022-05-26, 2022-05-10, 20…
## $ day <chr> "23", "11", "26", "10", "10", "04", "27", "29", "16…
## $ month <chr> "May", "May", "May", "May", "May", "May", "May", "M…
## $ year <chr> "2022", "2022", "2022", "2022", "2022", "2022", "20…
## $ day_of_week <ord> Monday, Wednesday, Thursday, Tuesday, Tuesday, Wedn…
## $ pick_time <chr> "23:06:", "08:53:", "18:36:", "07:30:", "17:31:", "…
## $ drop_time <chr> "23:40", "09:31", "18:58", "07:38", "17:36", "14:56…
## $ ride_length <dbl> 2001, 2274, 1310, 522, 301, 429, 533, 730, 1002, 88…
## $ pick_hour <chr> "23", "08", "18", "07", "17", "14", "12", "19", "17…
## $ drop_hour <chr> "23", "09", "18", "07", "17", "14", "12", "19", "18…
## $ month_year <chr> "05 22", "05 22", "05 22", "05 22", "05 22", "05 22…
ggplot(data= tripdata_new2,
mapping = aes(x = month_year, fill = member_casual)) +
geom_bar(position = "dodge") +
scale_y_continuous(labels = scales::comma) +
labs(x = "Month-Year", y = "Number of Rides Taken", fill = "Memeber/Casual",
title = " Month Wise Rides Taken by Members vs Casuals ",
subtitle = "Causal Riders Taking Less Rides in Winter Season than Members",
caption = "Data Analyst 'JP'")
Casual Riders engagement in Winter Season is very low as compared to Members. Advertising before and during the Winter will have good impact on company too. While offering ‘becoming new member’ gets discount on “electric bike” & “classic bike” too 😊
Cyclistic’s bike-share has good amount of engagements whether riders are Casuals or Members.
Advertising about being ‘Members’ is not too costly at all.
Casual Riders can easily save money when they subscribe annual Membership.
Casual Riders engagement in Winter Season is low as compared to Member Riders, Advertising about getting Membership too on Social Media like Instagram, Facebook etc. will help them get to know about membership plans.
Before and During Winter Season, Advertising about getting membership will be beneficial too.
If data has “lat, lng” to “address” could have help analysis much deeper.like area-wise bike-share distribution.
we can increase rides cost during time but can’t say its effect on it anyhow well riders engagement are high during work-hour.
If more about updated info like start station id , lat, lng, more info about rider, address would be useful