Welcome to the Cyclistic bike-share analysis case study! In this case study, I will perform many real-world tasks of a junior data analyst. I will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
I am a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, the team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve the recommendations, so they must be backed up with compelling data insights and professional data visualizations.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno, the director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.
Moreno, the director of marketing, has assigned me the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
Key Task
I will use Cyclistic’s historical trip data to analyze and identify trends. The data has been made available by Motivate International Inc. under this license. I will work with an entire year of data, from April 2021 to March 2022. This is public data that I will use to explore how different customer types are using Cyclistic bikes. Note that data-privacy issues prohibit me from using riders’ personally identifiable information. This means that I won’t be able to connect pass purchases to credit card numbers to determine whether casual riders live in the Cyclistic service area or whether they have purchased multiple single passes.
Key tasks
Loading the Required Packages
# Loading the required packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(skimr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ✔ stringr 1.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Downloading and Importing the data
April2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202104-divvy-tripdata.csv")
May2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202105-divvy-tripdata.csv")
Jun2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202106-divvy-tripdata.csv")
July2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202107-divvy-tripdata.csv")
Aug2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202108-divvy-tripdata.csv")
Sept2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202109-divvy-tripdata.csv")
Oct2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202110-divvy-tripdata.csv")
Nov2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202111-divvy-tripdata.csv")
Dec2021 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202112-divvy-tripdata.csv")
Jan2022 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202201-divvy-tripdata.csv")
Feb2022 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202202-divvy-tripdata.csv")
Mar2022 <- read_csv("C:/Users/SUKHVIR/Downloads/Trip Data from 2021 to 2022/202203-divvy-tripdata.csv")
Combining all the data into one data frame
Master_Trip_Data<-rbind(April2021,May2021,Jun2021,July2021,Aug2021,Sept2021,Oct2021,Nov2021,
Dec2021,Jan2022,Feb2022,Mar2022)
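The twelve imports and the rbind call above could also be written as one loop over the files. The following is a minimal sketch, not the method used in this analysis: it uses base R so that it runs without extra packages, and the folder and its two toy files are hypothetical stand-ins created only for the example.

```r
# Hypothetical folder with two toy files, created only so the sketch runs on its own.
data_dir <- file.path(tempdir(), "tripdata_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(data_dir, "202104-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "casual"),
          file.path(data_dir, "202105-divvy-tripdata.csv"), row.names = FALSE)

# Find every monthly file by its naming pattern (list.files returns them sorted).
csv_files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)

# Read each file into a list of data frames, then stack them row-wise.
monthly_data <- lapply(csv_files, read.csv, stringsAsFactors = FALSE)
combined <- do.call(rbind, monthly_data)

nrow(combined)
```

The same pattern should work with read_csv in place of read.csv. It avoids keeping twelve separate objects in memory alongside the combined data frame.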
Key tasks
Using the glimpse and colnames functions to check the data
glimpse(Master_Trip_Data)
## Rows: 5,723,532
## Columns: 13
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508…
## $ rideable_type <chr> "classic_bike", "docked_bike", "docked_bike", "clas…
## $ started_at <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-…
## $ ended_at <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-…
## $ start_station_name <chr> "State St & Pearson St", "Dorchester Ave & 49th St"…
## $ start_station_id <chr> "TA1307000061", "KA1503000069", "20121", "TA1305000…
## $ end_station_name <chr> "Southport Ave & Waveland Ave", "Dorchester Ave & 4…
## $ end_station_id <chr> "13235", "KA1503000069", "20121", "13235", "20121",…
## $ start_lat <dbl> 41.89745, 41.80577, 41.74149, 41.90312, 41.74149, 4…
## $ start_lng <dbl> -87.62872, -87.59246, -87.65841, -87.67394, -87.658…
## $ end_lat <dbl> 41.94815, 41.80577, 41.74149, 41.94815, 41.74149, 4…
## $ end_lng <dbl> -87.66394, -87.59246, -87.65841, -87.66394, -87.658…
## $ member_casual <chr> "member", "casual", "casual", "member", "casual", "…
colnames(Master_Trip_Data)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
Selecting the columns needed and dropping the unused columns
Master_Trip_Data_01<- Master_Trip_Data %>%
select(ride_id,rideable_type,started_at,ended_at,member_casual)
colnames(Master_Trip_Data_01)
## [1] "ride_id" "rideable_type" "started_at" "ended_at"
## [5] "member_casual"
Renaming the columns
Master_Trip_Data_01_Rename<-Master_Trip_Data_01 %>%
rename(bike_type = rideable_type,customer_type=member_casual)
colnames(Master_Trip_Data_01_Rename)
## [1] "ride_id" "bike_type" "started_at" "ended_at"
## [5] "customer_type"
Checking for duplicates and NA values
sum(is.na(Master_Trip_Data_01_Rename))
## [1] 0
sum(duplicated(Master_Trip_Data_01_Rename$ride_id))
## [1] 0
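A single total hides where any missing values sit, so a per-column count can be more informative. The sketch below uses a small toy data frame (made-up rows, not the real trip data) to show the idea with base R.

```r
# Toy data with two missing station names and one repeated ride_id (illustrative values only).
toy <- data.frame(
  ride_id = c("A1", "A2", "A2", "A3"),
  start_station_name = c("State St & Pearson St", NA, NA, "Dorchester Ave & 49th St"),
  member_casual = c("member", "casual", "casual", "member"),
  stringsAsFactors = FALSE
)

# Count missing values column by column instead of one grand total.
na_per_column <- colSums(is.na(toy))

# List the ride_id values that appear more than once.
duplicate_ids <- toy$ride_id[duplicated(toy$ride_id)]

na_per_column
duplicate_ids
```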
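Ensuring started_at and ended_at are in POSIXct format so the ride length can be calculated (the glimpse output above shows that read_csv already parsed both columns as date-times, so this step is a safeguard)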
Master_Trip_Data_01_Rename$started_at<-as.POSIXct(Master_Trip_Data_01_Rename$started_at,format="%Y-%m-%d %H:%M:%S")
Master_Trip_Data_01_Rename$ended_at<-as.POSIXct(Master_Trip_Data_01_Rename$ended_at,format="%Y-%m-%d %H:%M:%S")
Extracting the date, day, month, year, and day of week
Master_Trip_Data_01_Rename$date<-as.Date(Master_Trip_Data_01_Rename$started_at)
Master_Trip_Data_01_Rename$day<-format(as.Date(Master_Trip_Data_01_Rename$date),"%d")
Master_Trip_Data_01_Rename$Month<-format(as.Date(Master_Trip_Data_01_Rename$date),"%m")
Master_Trip_Data_01_Rename$Year<-format(as.Date(Master_Trip_Data_01_Rename$date),"%Y")
Master_Trip_Data_01_Rename$day_of_week<-format(as.Date(Master_Trip_Data_01_Rename$date),"%A")
Calculating the ride length
Master_Trip_Data_01_Rename$ride_length<-difftime(Master_Trip_Data_01_Rename$ended_at,Master_Trip_Data_01_Rename$started_at)
glimpse(Master_Trip_Data_01_Rename)
## Rows: 5,723,532
## Columns: 11
## $ ride_id <chr> "6C992BD37A98A63F", "1E0145613A209000", "E498E15508A80BA…
## $ bike_type <chr> "classic_bike", "docked_bike", "docked_bike", "classic_b…
## $ started_at <dttm> 2021-04-12 18:25:36, 2021-04-27 17:27:11, 2021-04-03 12…
## $ ended_at <dttm> 2021-04-12 18:56:55, 2021-04-27 18:31:29, 2021-04-07 11…
## $ customer_type <chr> "member", "casual", "casual", "member", "casual", "casua…
## $ date <date> 2021-04-12, 2021-04-27, 2021-04-03, 2021-04-17, 2021-04…
## $ day <chr> "12", "27", "03", "17", "03", "25", "03", "06", "12", "2…
## $ Month <chr> "04", "04", "04", "04", "04", "04", "04", "04", "04", "0…
## $ Year <chr> "2021", "2021", "2021", "2021", "2021", "2021", "2021", …
## $ day_of_week <chr> "Monday", "Tuesday", "Saturday", "Saturday", "Saturday",…
## $ ride_length <drtn> 1879 secs, 3858 secs, 341859 secs, 1506 secs, 5477 secs…
Converting the ride_length column from difftime to numeric (seconds)
Master_Trip_Data_01_Rename$ride_length<-as.numeric(Master_Trip_Data_01_Rename$ride_length)
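By default difftime picks its units automatically, so the number that as.numeric returns depends on the data. A minimal sketch of pinning the units to seconds, using the first two start and end times shown in the glimpse output (the UTC time zone is an assumption made for the example):

```r
# First two rides from the glimpse output; tz = "UTC" is assumed for illustration.
started_at <- as.POSIXct(c("2021-04-12 18:25:36", "2021-04-27 17:27:11"), tz = "UTC")
ended_at   <- as.POSIXct(c("2021-04-12 18:56:55", "2021-04-27 18:31:29"), tz = "UTC")

# Ask for seconds explicitly so the numeric result never silently switches to minutes or hours.
ride_length <- as.numeric(difftime(ended_at, started_at, units = "secs"))

ride_length
```

These two values come out as 1879 and 3858, matching the ride_length column printed earlier.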
Filtering out rides with a zero or negative ride length
Master_Trip_Data_01_Rename_Arr<-Master_Trip_Data_01_Rename %>%
filter(ride_length>=1)
Using the arrange function to check whether the zero and negative values were removed from the dataset
Master_Trip_Data_01_Rename_Arr_1<-arrange(Master_Trip_Data_01_Rename_Arr,ride_length)
glimpse(Master_Trip_Data_01_Rename_Arr_1)
## Rows: 5,722,873
## Columns: 11
## $ ride_id <chr> "3F99442B76EC2051", "2DC9DF08B3526631", "08F12FCBFCB2E2A…
## $ bike_type <chr> "classic_bike", "classic_bike", "classic_bike", "electri…
## $ started_at <dttm> 2021-04-16 18:18:00, 2021-04-16 07:58:39, 2021-04-04 23…
## $ ended_at <dttm> 2021-04-16 18:18:01, 2021-04-16 07:58:40, 2021-04-04 23…
## $ customer_type <chr> "member", "member", "casual", "casual", "member", "membe…
## $ date <date> 2021-04-16, 2021-04-16, 2021-04-04, 2021-04-07, 2021-04…
## $ day <chr> "16", "16", "04", "07", "17", "18", "03", "18", "07", "0…
## $ Month <chr> "04", "04", "04", "04", "04", "04", "04", "04", "04", "0…
## $ Year <chr> "2021", "2021", "2021", "2021", "2021", "2021", "2021", …
## $ day_of_week <chr> "Friday", "Friday", "Sunday", "Wednesday", "Saturday", "…
## $ ride_length <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
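Sorting the whole table is one way to confirm the filter worked. A lighter check is to look at the minimum and to count how many rows were dropped. The sketch below shows this on a toy vector (made-up values), then applies the same subtraction to the two row counts printed by glimpse.

```r
# Toy ride lengths in seconds, including one negative and one zero value.
ride_length <- c(-120, 0, 1, 1879, 3858)

# Same condition as the filter step: keep rides of at least one second.
kept <- ride_length[ride_length >= 1]

# The smallest remaining value should be >= 1 if the filter worked.
min_kept <- min(kept)
rows_removed <- length(ride_length) - length(kept)

# Row counts reported by glimpse before and after filtering in this analysis.
rows_removed_in_report <- 5723532 - 5722873

min_kept
rows_removed
rows_removed_in_report
```

On the reported row counts, this gives 659 rides removed by the filter.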
Naming the final data frame
Final_file<-Master_Trip_Data_01_Rename_Arr_1
Key tasks
Calculate the average ride_length (in seconds) for members and casual riders
Final_file %>%
group_by(customer_type) %>%
summarise(Avg_ride_length=mean(ride_length))
## # A tibble: 2 × 2
## customer_type Avg_ride_length
## <chr> <dbl>
## 1 casual 1905.
## 2 member 802.
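Averages in seconds are hard to read at a glance. As a small worked example, the rounded means printed above can be converted to minutes and compared as a ratio (the inputs are the rounded figures from the table, so the results are approximate):

```r
# Rounded average ride lengths in seconds, copied from the table above.
avg_secs <- c(casual = 1905, member = 802)

# Convert to minutes.
avg_mins <- avg_secs / 60

# How many times longer the average casual ride is than the average member ride.
ratio <- avg_secs[["casual"]] / avg_secs[["member"]]

round(avg_mins, 1)
round(ratio, 2)
```

That is roughly 32 minutes for casual riders against roughly 13 minutes for members, so the average casual ride is a little over twice as long.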
Calculate the average ride_length for users by day_of_week
Final_file %>%
group_by(day_of_week,customer_type) %>%
summarise(Avg_ride_length = mean(ride_length))
## `summarise()` has grouped output by 'day_of_week'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 3
## # Groups: day_of_week [7]
## day_of_week customer_type Avg_ride_length
## <chr> <chr> <dbl>
## 1 Friday casual 1806.
## 2 Friday member 788.
## 3 Monday casual 1889.
## 4 Monday member 778.
## 5 Saturday casual 2057.
## 6 Saturday member 900.
## 7 Sunday casual 2245.
## 8 Sunday member 921.
## 9 Thursday casual 1673.
## 10 Thursday member 754.
## 11 Tuesday casual 1646.
## 12 Tuesday member 751.
## 13 Wednesday casual 1666.
## 14 Wednesday member 755.
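The table above is sorted alphabetically because day_of_week is a character column. One possible way to get calendar order is to convert it to a factor with explicit levels before summarising. This is a sketch on toy data (made-up ride lengths) using base R. It assumes English day names, since format with "%A" depends on the locale.

```r
# Toy rides with made-up lengths in seconds.
toy <- data.frame(
  day_of_week = c("Monday", "Sunday", "Saturday", "Sunday"),
  ride_length = c(600, 1200, 900, 1800),
  stringsAsFactors = FALSE
)

# Calendar order, starting the week on Sunday as the plots in this report do.
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
toy$day_of_week <- factor(toy$day_of_week, levels = day_levels)

# Grouped means now come back in factor-level order rather than alphabetical order.
avg_by_day <- aggregate(ride_length ~ day_of_week, data = toy, FUN = mean)

avg_by_day
```

A factor ordered this way would also let ggplot2 draw the days in order without the scale_x_discrete(limits = ...) call.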
Calculate the number of rides for users by day_of_week
Final_file %>%
group_by(day_of_week,customer_type) %>%
summarise(Numbers_of_rides=n())
## `summarise()` has grouped output by 'day_of_week'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 3
## # Groups: day_of_week [7]
## day_of_week customer_type Numbers_of_rides
## <chr> <chr> <int>
## 1 Friday casual 364237
## 2 Friday member 453072
## 3 Monday casual 292960
## 4 Monday member 439405
## 5 Saturday casual 549945
## 6 Saturday member 431302
## 7 Sunday casual 482746
## 8 Sunday member 387681
## 9 Thursday casual 293604
## 10 Thursday member 475298
## 11 Tuesday casual 276338
## 12 Tuesday member 490059
## 13 Wednesday casual 286364
## 14 Wednesday member 499862
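To make the weekday and weekend pattern in this table easier to compare, the counts can be turned into a weekend share per customer type. The sketch below re-enters the fourteen counts from the table by hand and uses base R; the variable names are illustrative.

```r
# Ride counts copied from the table above (rows alternate casual, member).
rides <- data.frame(
  day_of_week = rep(c("Friday", "Monday", "Saturday", "Sunday",
                      "Thursday", "Tuesday", "Wednesday"), each = 2),
  customer_type = rep(c("casual", "member"), times = 7),
  n = c(364237, 453072, 292960, 439405, 549945, 431302, 482746,
        387681, 293604, 475298, 276338, 490059, 286364, 499862),
  stringsAsFactors = FALSE
)

# Total rides per customer type across the whole week.
total_rides <- tapply(rides$n, rides$customer_type, sum)

# Rides on Saturday and Sunday only.
is_weekend <- rides$day_of_week %in% c("Saturday", "Sunday")
weekend_rides <- tapply(rides$n[is_weekend], rides$customer_type[is_weekend], sum)

# Share of each group's rides that happen on the weekend, in percent.
weekend_share <- round(100 * weekend_rides / total_rides, 1)

total_rides
weekend_share
```

On these counts, about 40.6% of casual rides fall on a weekend, compared with about 25.8% of member rides.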
Checking the minimum, maximum, and average ride length (in seconds) by customer type
Final_file %>%
group_by(customer_type) %>%
summarise(Min_ride=min(ride_length),max_ride=max(ride_length),Avg_ride=mean(ride_length))
## # A tibble: 2 × 4
## customer_type Min_ride max_ride Avg_ride
## <chr> <dbl> <dbl> <dbl>
## 1 casual 1 3356649 1905.
## 2 member 1 93594 802.
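The casual maximum of 3,356,649 seconds is roughly 39 days, so a few very long rides may be pulling the casual mean upward. The median is one possible robustness check. The sketch below uses a toy vector: five made-up short rides plus the maximum from the table. It illustrates the effect and is not a result from the real data.

```r
# Five made-up ride lengths in seconds plus the casual maximum from the table above.
ride_length <- c(300, 480, 600, 720, 900, 3356649)

# The mean is dragged far upward by the single extreme ride.
mean_length <- mean(ride_length)

# The median stays close to a typical ride.
median_length <- median(ride_length)

# The extreme ride expressed in days.
max_in_days <- max(ride_length) / 86400

mean_length
median_length
max_in_days
```

If the medians of the two customer types differ in the same direction as the means, the conclusion that casual riders take longer rides would rest on firmer ground.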
Key task
In the visualize phase, I will divide the data visualizations according to customer type, bike type, and ride length.
Total_customer<-Final_file %>%
select(customer_type) %>%
count(customer_type)
ggplot(Total_customer,aes(x=customer_type,y=n,fill=customer_type))+
geom_bar(stat = "identity")+
geom_text(aes(label=n),vjust=-0.4)+theme_minimal()+
labs(title = "Casuals and Members distribution",x="Casuals x Members",y="Total Counts")+
scale_fill_brewer(palette = "Blues")
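Seven-digit labels on bars are easier to read with thousands separators. The sketch below is one way to do that, under stated assumptions: the two totals are the sums of the per-day ride counts reported in the analysis, and the label column is a hypothetical helper added for the example.

```r
library(ggplot2)

# Totals per customer type, summed from the per-day ride counts in the analysis.
Total_customer <- data.frame(
  customer_type = c("casual", "member"),
  n = c(2546194, 3176679)
)

# Hypothetical helper column: counts formatted with comma separators for the bar labels.
Total_customer$label <- format(Total_customer$n, big.mark = ",")

# Same chart layout as above, with the formatted label instead of the raw count.
p <- ggplot(Total_customer, aes(x = customer_type, y = n, fill = customer_type)) +
  geom_col() +
  geom_text(aes(label = label), vjust = -0.4) +
  theme_minimal() +
  labs(title = "Casuals and Members distribution",
       x = "Casuals x Members", y = "Total Counts") +
  scale_fill_brewer(palette = "Blues")

Total_customer$label
```

Printing p draws the chart. geom_col() is equivalent to geom_bar(stat = "identity").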
ggplot(data = Final_file)+geom_bar(mapping = aes(x=customer_type,fill=bike_type))+
facet_wrap(~bike_type)+theme_minimal()+
labs(title = "Types of bikes used by different customers",
x="Casuals x Members",y="Number of rides")+
scale_fill_brewer(direction = -1)
Day_of_week <-Final_file %>%
group_by(customer_type,day_of_week) %>%
summarise(number_of_rides=n())
## `summarise()` has grouped output by 'customer_type'. You can override using the
## `.groups` argument.
ggplot(data = Day_of_week)+geom_col(mapping = aes(x=day_of_week,y=number_of_rides,fill=customer_type))+
facet_wrap(~customer_type)+theme_minimal()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Saturday"))+
scale_fill_brewer(palette = "Paired")+
labs(title = "Rides by Casual and Annual Members by Day of Week")
Month<-Final_file %>%
group_by(customer_type,Month) %>%
summarise(number_of_rides=n())
## `summarise()` has grouped output by 'customer_type'. You can override using the
## `.groups` argument.
ggplot(data = Month)+geom_col(mapping = aes(x=Month,y=number_of_rides,fill=customer_type))+
facet_wrap(~customer_type)+
theme_minimal()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
scale_fill_brewer(direction = -1)+
labs(title = "Rides by Casual and Annual Members by Month")
Total_bikes<-Final_file %>%
group_by(bike_type) %>%
summarise(number_of_rides=n())
ggplot(Total_bikes,aes(x=bike_type,y=number_of_rides, fill=bike_type))+
geom_col()+
theme_minimal()+
labs(title = "Total counts According to Bike_type",x="bike_type",y="Total Counts")+
scale_fill_brewer(palette = "Blues",direction = -1)+
geom_text(aes(label=number_of_rides),vjust= -0.2)
Bike_type_wd<- Final_file %>%
group_by(bike_type,day_of_week) %>%
summarise(number_of_rides= n())
## `summarise()` has grouped output by 'bike_type'. You can override using the
## `.groups` argument.
ggplot(Bike_type_wd,aes(x=day_of_week, y=number_of_rides,fill=bike_type))+
geom_col()+theme_minimal()+facet_wrap(~bike_type)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Saturday"))+
labs(title = "Types of bikes Used According to the Day of Week")+
scale_fill_brewer(direction = -1)
bike_type_mn<-Final_file %>%
group_by(bike_type,Month) %>%
summarise(number_of_rides=n())
## `summarise()` has grouped output by 'bike_type'. You can override using the
## `.groups` argument.
ggplot(bike_type_mn,aes(x=Month, y=number_of_rides,fill=bike_type))+
geom_col()+theme_minimal()+facet_wrap(~bike_type)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
scale_fill_brewer(direction = -1)
3. Visualize the data by ride length
Cus_ridelength<-Final_file %>%
group_by(customer_type) %>%
summarise(time_spent=sum(ride_length))
ggplot(Cus_ridelength,aes(x=customer_type,y=time_spent, fill=customer_type))+
geom_col()+theme_minimal()+labs(title = "Total time spent by Casual and Annual Member",y="Time Spent")+
scale_fill_brewer(palette = "Blues",direction = -1)
Bike_ridelength<-Final_file %>%
group_by(bike_type) %>%
summarise(Time_Spent=sum(ride_length))
ggplot(Bike_ridelength,aes(x=bike_type,y=Time_Spent,fill=bike_type))+
geom_col()+theme_minimal()+scale_fill_brewer(palette = "Blues",direction = -1)+
labs(title = "Total Time Spent on Each Bike Type")+
scale_x_discrete(limits=c("classic_bike","electric_bike","docked_bike"))
Day<-Final_file %>%
group_by(day_of_week) %>%
summarise(Time_Spent=sum(ride_length))
ggplot(Day,aes(x=day_of_week,y=Time_Spent,fill=Time_Spent))+
geom_col()+scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Saturday"))+theme_minimal()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs(title = "Total Time Spent According to the day of Week")+
scale_fill_distiller(direction = -1)
Month<-Final_file %>%
group_by(Month) %>%
summarise(Time_spent=sum(ride_length))
ggplot(Month,aes(x=Month,y=Time_spent,fill=Time_spent))+
geom_col()+theme_minimal()+
labs(title = "Total Time Spent According to the Month")
Findings
1. Customer type: Annual members take more rides than casual riders (3,176,679 versus 2,546,194 over the year). Both groups prefer the classic bike, and annual members use it more than casual riders do; the electric bike and the docked bike follow. Casual riders ride most on weekends, with Saturday and Sunday their busiest days. Annual members, by contrast, peak on Tuesday, Wednesday, and Thursday. By month, both casual riders and annual members peak in the middle of the year, from May through August.
2. Bike type: The classic bike is used more than the electric and docked bikes. By day of week, the classic bike is used most on Saturdays and Sundays, while electric bike use looks roughly equal across all days. By month, June and July are the peak months for the classic bike, and October appears busiest for the electric bike.
3. Ride length: Casual riders spend more total time riding than annual members. The classic bike accounts for the most riding time, followed by the electric bike. By day of week, Saturday and Sunday are the busiest. By month, the middle months of the year are busier than the rest.
Conclusions
The findings show that annual members mostly take short trips (about 802 seconds on average) but take more rides overall than casual riders. Casual riders, by contrast, take longer rides (about 1,905 seconds on average).
Recommendations
1. We should run an online campaign, or work with popular fitness models, to tell people about the benefits of cycling every day and how making it part of a daily routine supports both physical and mental health.
2. We should offer special discounts or some free months to customers who purchase an annual membership. If possible, we should also provide fitness gadgets along with the annual membership.
3. We should also explain the different uses of the three bike types so that customers can choose accordingly. For example, electric bikes suit long distances and classic bikes suit trips within the city; promoting this could increase annual memberships among electric bike users. We should also explain how electric bikes compare with petrol-powered motorbikes or cars in cost and environmental impact.